Get unique, static id from a device via web request - asp.net

I have an MVC application that I would like to add some custom stats to. For some of the stats, it would be nice to have a unique identifier for a device.
For example, if I have a unique ID for an RSS subscriber, I can monitor the number of active RSS subscribers.
I was wondering if anyone knew of anything in the web request that could be used as an ID other than the IP address (which can obviously change). Something like a device ID, perhaps?

Here are some approaches to consider.
HTTP Headers
There are a few HTTP headers you can look at that can help you identify a unique user or device - some refer to the SIM card, some to the device itself.
Here is a list I derived from the headers that Google AdSense Mobile uses to help track their advertising:
- x-dcmguid
- x-up-subno
- x-jphone-uid
- x-em-uid
These are probably some of the more popular ones, but there will be other vendor- or device-specific headers in common use. You could start gathering all the headers your site receives, count how many of each you see, and build up your own database of common headers.
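As a rough illustration of that gathering step, here is a minimal sketch assuming classic ASP.NET MVC (the question's framework); the filter name and the trace-based logging are just placeholders for whatever persistence you prefer:

using System.Web.Mvc;

// Hypothetical action filter that logs any device-identifying headers
// seen on incoming requests, so you can build up your own database.
public class DeviceHeaderLoggingAttribute : ActionFilterAttribute
{
    // Headers observed in the wild (see the list above); extend as you go.
    private static readonly string[] DeviceIdHeaders =
    {
        "x-dcmguid", "x-up-subno", "x-jphone-uid", "x-em-uid"
    };

    public override void OnActionExecuting(ActionExecutingContext filterContext)
    {
        var headers = filterContext.HttpContext.Request.Headers;
        foreach (var name in DeviceIdHeaders)
        {
            var value = headers[name];
            if (!string.IsNullOrEmpty(value))
            {
                // Swap in your own persistence (database table, log file, etc.).
                System.Diagnostics.Trace.TraceInformation(
                    "Device header {0} = {1}", name, value);
            }
        }
        base.OnActionExecuting(filterContext);
    }
}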
Some other approaches
Cookies
A cookie is a value that can be set on the requesting agent (a browser, for example) and returned when the agent visits again. For a list of methods, check out Evercookie - the virtually permanent cookie. It works by using one of the following methods, of which at least one will work:
- Standard HTTP Cookies
- Local Shared Objects (Flash Cookies)
- Silverlight Isolated Storage
- Storing cookies in RGB values of auto-generated, force-cached PNGs using the HTML5 Canvas tag to read pixels (cookies) back out
- Storing cookies in Web History
- Storing cookies in HTTP ETags
- Storing cookies in Web cache
- window.name caching
- Internet Explorer userData storage
- HTML5 Session Storage
- HTML5 Local Storage
- HTML5 Global Storage
- HTML5 Database Storage via SQLite
Combinations
It's also possible to come up with your own scheme: e.g. take the user-agent header plus some other headers like accept and x-forwarded-for, together with the IP, and make a unique hash value out of them to more accurately determine the uniqueness of the agent.
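For instance, a minimal sketch of that hashing idea in ASP.NET - the header choice and separator are arbitrary, and this is a heuristic, not a guaranteed-unique ID:

using System;
using System.Security.Cryptography;
using System.Text;
using System.Web;

public static class AgentFingerprint
{
    // Combine a few request headers and the client IP into a SHA-256 hash.
    public static string Compute(HttpRequestBase request)
    {
        string raw = string.Join("|", new[]
        {
            request.UserAgent,
            request.Headers["Accept"],
            request.Headers["X-Forwarded-For"],
            request.UserHostAddress
        });

        using (var sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(raw));
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }
}

Note that any of these inputs can change (new browser version, different proxy), so treat the hash as a best-effort signal rather than a stable identifier.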
There are many different mobile headers, as seen here. I also hit a page of mine and store mobile headers from various devices for my own purposes, here: http://wap.defza.com/ua/ua.txt (also ua1.txt, ua2.txt, etc.)

The short answer is there isn't any (and with good reason, given privacy concerns). The more helpful answer is that this is something you would normally do using cookies: you set a cookie, then check it to identify the specific browser making the request.
Of course, this is by no means fool-proof: users can reject cookies or delete them, and they can use many different browsers (each of which will have a different cookie). If you are being devious (and I wouldn't recommend this) you could use a Local Shared Object (Flash cookie), as that is less likely to be removed. At the end of the day, though, if someone doesn't want to be tracked, you can't force them to be.
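For completeness, the plain-cookie approach is only a few lines in ASP.NET; this sketch assumes a cookie name of my choosing and a one-year lifetime:

using System;
using System.Web;

public static class VisitorId
{
    private const string CookieName = "visitor-id"; // arbitrary name

    // Returns the visitor's existing ID, or issues a new one on first visit.
    public static string GetOrCreate(HttpContextBase context)
    {
        var cookie = context.Request.Cookies[CookieName];
        if (cookie != null && !string.IsNullOrEmpty(cookie.Value))
            return cookie.Value;

        string id = Guid.NewGuid().ToString("N");
        context.Response.Cookies.Add(new HttpCookie(CookieName, id)
        {
            Expires = DateTime.UtcNow.AddYears(1),
            HttpOnly = true
        });
        return id;
    }
}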
Generally, though, if you want analytics and tracking, consider using a third-party solution like Google Analytics. This will give you very detailed data (albeit still relying on cookies and JavaScript) about your visitors and their browsing habits.

other than the IP
If your site doesn't require any sort of authentication in order to serve this content, the IP address is the only thing you can get to identify clients, and even that might not be unique: two clients behind the same proxy, for example, will present the same address, leaving no way of distinguishing their requests. Another possibility is to use cookies, but that more or less falls into the first category: authentication.

There is no identifier provided by a browser; privacy concerns make it very unlikely that any vendor would ever implement one, now at least.
The only option you have is some form of cookie.
For RSS feeds, you could conceivably embed a random unique ID in the feed URL every time it's rendered, so you'd know when the person who retrieved that URL downloaded your feed. However, if the user shared that URL with others, you'd have no real way of knowing.
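One hedged sketch of that idea in ASP.NET MVC - the controller, routes and persistence here are all hypothetical - is to issue each subscriber a tokenized feed URL and count the distinct tokens that fetch the feed:

using System;
using System.Web.Mvc;

public class FeedController : Controller
{
    // GET /feed/subscribe -> returns a personalised feed URL for this user.
    public ActionResult Subscribe()
    {
        string token = Guid.NewGuid().ToString("N");
        // TODO: persist the token so later hits can be attributed to it.
        string feedUrl = Url.Action("Rss", "Feed", new { token },
                                    Request.Url.Scheme);
        return Content(feedUrl);
    }

    // GET /feed/rss?token=abc -> serves the feed and records the hit.
    public ActionResult Rss(string token)
    {
        // TODO: record (token, DateTime.UtcNow) for your subscriber stats.
        return Content(BuildFeedXml(), "application/rss+xml");
    }

    private string BuildFeedXml()
    {
        // Placeholder; generate your real RSS document here.
        return "<rss version=\"2.0\"><channel><title>My feed</title>" +
               "</channel></rss>";
    }
}

Counting distinct tokens seen in a given window then gives an active-subscriber estimate, subject to the URL-sharing caveat above.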

Related

How would you identify if a visitor to one of your sites is the same person who visited another site of yours before (different domain)?

My question is more of a conceptual one, but in my specific case I am using Google Analytics 4. If the question is unclear, here it is in scenario form: some guy visits my site x.com after a Google search. He closes the tab, does another Google search, and arrives at my other site y.com. How do I know it's the same person? I don't think there's anything I can do with User IDs in this situation. How would I solve this?
This isn't without fault, but if you are implementing it via Google Tag Manager you have more control over the data being sent, and even more so if you are transporting the data via a Google Tag Manager server-side container.
You would use a single server (but possibly different containers), or use BigQuery, and then use either the templateDataStorage API call or the BigQuery API call.
Essentially, the first time you see a Google client ID (CID), an IP address, or a combination of user agent and IP address, you would store it in the server or in a BigQuery table as a key and create a random associated value next to it.
On each request, across all your sites, you would check whether the IP address, CID, or user-agent/IP combination already exists in the server or in the BigQuery table; if it does, output the associated random value as a custom dimension, and if not, create one.
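The real implementation would live in a GTM server-side template (JavaScript) or a BigQuery job, but the lookup-or-create logic itself is tiny; here it is sketched generically in C#, with an in-memory store standing in for templateDataStorage or BigQuery:

using System;
using System.Collections.Concurrent;

public static class CrossSiteId
{
    // In production this would be BigQuery or template storage, not memory.
    private static readonly ConcurrentDictionary<string, string> Store =
        new ConcurrentDictionary<string, string>();

    // Key = whatever stable signal you have (CID, IP, UA+IP combination);
    // value = the random ID emitted as a custom dimension.
    public static string Resolve(string key)
    {
        // First sighting creates a random ID; later sightings return the same one.
        return Store.GetOrAdd(key, _ => Guid.NewGuid().ToString("N"));
    }
}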
Actually you probably wouldn't.
Presumably you could try fingerprinting, but depending on your legislation that might not be quite legal, and it tends to work a lot better in a lab than in real life. Browsers have also started to implement anti-fingerprinting measures, such as trimming the user agent and denying access to browser properties like installed plugins.
I have heard of experimental approaches that recognize users via usage patterns - e.g. how they move their mouse, etc. I am not aware of any actual product that uses this, and I am not convinced it is a useful (or even legal) approach.
But in general, when it comes to cross-domain detection for unrelated visits (moving from domain to domain works via link decorators, and even that is affected by browser protections), you have the combined power of the browser vendors against you, who try to make this harder - either out of genuine concern for privacy, or to establish themselves as the single gatekeeper for user identity. Google, for example, has a huge user base that is almost constantly logged in to Google accounts or Android smartphones, which helps it identify users all over the web.

Is there a way to prevent content caching or scraping from an API?

Imagine the following situation: I have an API, and a developer builds an application that retrieves new content from it on a daily basis. She stores this content and provides the data to all the instances of an app she developed, so these apps do not have to call the API directly.
Is there a way to prevent this and force the apps (and therefore the end users) to use the API, and not just the application on the server?
I found many questions about how to cache API data, but not how to prevent that caching. I am fairly new to this, so maybe I am overlooking something, or maybe it is not possible to prevent this.
Thank you in advance!
Assuming you are using Apigee for API management, you have some options. First, consider the options available to you contractually, if this is that sort of business relationship and you can impose certain API behavior on a business partner through a contract.
Separate from the legal side of things, remember that you control your API and the credentials you issue to your API clients. You cannot, practically, control what a client developer does with those credentials: she could promise to embed them in the mobile apps' API client, then change her mind, use them centrally, and design her mobile client to call into her central cache. If you really insist that only mobile app clients should be calling your API, and not a hub/cache server, then you could consider applying constraint policies on your API (within the Apigee proxy, such as Access Control). For instance, you could blacklist your partner's hub/cache server IP address, although that is weak security at best. Or you could apply a constraint that only clients with certain identifying User-Agent strings (mobile OS, client) are allowed to connect to your API. Or use GeoIP filtering to allow only clients from certain regions, if that applies to your use case.
Finally, depending on the data model, you might be able to rate-limit such that a bulk cache becomes impractical: if your edge-client use case is to fetch a single record, but a cache would have to hold thousands of records, then you could impose a per-client rate limit (a Quota policy) which is no bother to individual mobile clients, but makes the work of a hub/cache server untenable.
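The Quota policy itself is configured declaratively in Apigee, but the underlying idea is just a per-client counter in a time window; a rough C# sketch of the mechanism (names are mine):

using System;
using System.Collections.Concurrent;

public class FixedWindowRateLimiter
{
    private readonly int _limit;
    private readonly TimeSpan _window;
    private readonly ConcurrentDictionary<string, Tuple<DateTime, int>> _counters =
        new ConcurrentDictionary<string, Tuple<DateTime, int>>();

    public FixedWindowRateLimiter(int limit, TimeSpan window)
    {
        _limit = limit;
        _window = window;
    }

    // Returns true if the client identified by clientKey may proceed.
    public bool Allow(string clientKey)
    {
        DateTime now = DateTime.UtcNow;
        var entry = _counters.AddOrUpdate(
            clientKey,
            _ => Tuple.Create(now, 1),
            (_, old) => now - old.Item1 >= _window
                ? Tuple.Create(now, 1)                      // window expired: reset
                : Tuple.Create(old.Item1, old.Item2 + 1));  // same window: count up
        return entry.Item2 <= _limit;
    }
}

With, say, new FixedWindowRateLimiter(200, TimeSpan.FromDays(1)) per credential, an individual phone fetching single records never notices, while a hub trying to mirror thousands of records per day hits the wall.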

Is it possible to add logic to CDN

Is it possible to serve two different pages based on the user agent?
I want to serve pagename-iphone.html if the user agent matches iPhone, and pagename-other.html for all other user agents. I want all pages on the site to follow this pattern. Is it possible to do this at the CDN level (CloudFront, Akamai, etc.)?
thanks for your help!
I think what you are after is User-Agent-based caching, like Vary: User-Agent.
In theory, a server providing a cache service can definitely do so; however, as far as I can tell, CloudFront and most other major CDN providers don't support it.
The basic reason is straightforward: there are currently too many User-Agent values - the header is almost unique for every single browser, not to mention the different versions of the same browser. If you key purely on the whole User-Agent, you will lose the benefit of the CDN cache most of the time.
Some more advanced servers allow you to add conditions based on headers; in Varnish, for example, you can even add if/else logic to return different values. But this is not available from the majority of CDNs.
On the other hand, you should not rely on a CDN to return different HTML pages. A CDN is more commonly used to accelerate static artifacts (JS/CSS/images) rather than whole pages.
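If you control the origin, you can do the branching there instead and emit Vary: User-Agent so any cache in front keys on it (accepting the hit-rate cost described above); a rough ASP.NET MVC sketch, with hypothetical view names:

using System;
using System.Web.Mvc;

public class PageController : Controller
{
    // e.g. GET /page/show?pagename=home -> home-iphone or home-other view.
    public ActionResult Show(string pagename)
    {
        bool isIphone = (Request.UserAgent ?? string.Empty)
            .IndexOf("iPhone", StringComparison.OrdinalIgnoreCase) >= 0;

        // Tell any cache in front of us that the response depends on the UA.
        Response.AppendHeader("Vary", "User-Agent");

        return View(pagename + (isIphone ? "-iphone" : "-other"));
    }
}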
EDIT: Actually, I just received an email from AWS mentioning that CloudFront now supports this:
Mobile Device Detection: You can now cache and deliver customized content to your viewers on different devices (e.g. mobile vs. desktop) based on the value of the User-Agent header.
Please refer to http://aws.amazon.com/about-aws/whats-new/2014/06/26/amazon-cloudfront-device-detection-geo-targeting-host-header-cors/ for more details.

Does googlebot keep sessions when crawling?

When Googlebot crawls pages, does it have a session? For example, I am storing some variables in the session and using them in my site's pages. When Googlebot crawls these pages, will I still have the session variables? In my global.asax I store some variables in the session at session start. Will I have any problem with Googlebot?
Googlebot actively tries to avoid sessions and does not support cookies. From First date with the Googlebot: Headers and compression (March 2008):
I usually avoid cookies (so no "Cookie:" header) since I don't want the content affected too much by session-specific info. And, if a server uses a session id in a dynamic URL rather than a cookie, I can usually figure this out, so that I don't end up crawling your same page a million times with a million different session ids.
I imagine most regular search engine bots will be similar in this respect. Google is trying to build an index of unique URLs. The URL is the unique key that identifies a unique page of content. Cookies (and sessions) are not passed when a user clicks a link in the SERPs. Google is primarily indexing pages, not sites.
The answer to one of your questions is: yes, you will have problems with Googlebot.
Generally we've encountered two types of issues with Googlebot:
- it sometimes does not retain HTTP cookies between requests. Our application relies on custom cookies, and plenty of Googlebot requests were caught carrying no cookies at all.
- it takes long breaks between consecutive requests. For example, it retrieves your page and asks for its scripts much later.
Both will cause trouble with your session. First, you need the ASP.NET_SessionId cookie to be passed between requests, and Googlebot will sometimes fail to do that. Second, if there's a long time span between requests, your session will have expired even if the cookie is there.
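A small defensive sketch for exactly these cases - never assume the session variable is still there; fall back to a default instead of throwing (the helper name is mine):

using System.Web;

public static class SessionSafe
{
    // Returns session[key] if present and of the right type, else a fallback.
    public static T GetOrDefault<T>(HttpSessionStateBase session,
                                    string key, T fallback)
    {
        if (session == null) return fallback;      // e.g. sessionless request
        object value = session[key];
        return value is T ? (T)value : fallback;   // missing or expired
    }
}

From a controller this would look like int pageSize = SessionSafe.GetOrDefault(Session, "PageSize", 20); - a cookieless crawler simply gets the default.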
Generally the answer is no; however, other crawlers (of which there are plenty) work in other ways.
I should note that I have seen an instance of a Google crawler for AdWords (not the normal Googlebot) which DID present a session cookie.
It's very unlikely, I think. It should create a new session every time it crawls your website.

How can I set permissions on an RSS file?

I want to create an in-house RSS feed (I work for 3 Mobile, Australia) for consumption on an INQ1 mobile phone, or any other RSS reader for that matter. However, testing it out on the phone's built-in RSS reader, I realize that without the ability to password protect the feed, or otherwise restrict access to it, I stand little chance of being able to develop this idea further.
One thing I thought of was to periodically change the URI for the feed, so managers who had left the company couldn't continue to subscribe and see sensitive information, but making users do that would be a harder sell, and furthermore is terribly inelegant.
Does anybody know how to make it so that prior to downloading a feed, a reader would have to authenticate the user? Is it part of the metadata within the feed, or something you would set in the reader software?
Update: I should have explained that I have already placed folder-level permissions on the parent folder, which brings up the normal authentication dialog when the feed is viewed in a browser, but which just results in a failed update, with no explanation or warning, in the phone's RSS reader - indistinguishable from the file being missing when I next try to refresh the feed.
If the reader in the phone doesn't support HTTP Basic or Digest authentication, your best bet is to create a unique URL to the feed for each consumer. Have the customer log in and generate a link with some token in it that is unique to that user. If the user ever leaves, you can then deny that token, shutting down access.
If you go this route, you probably want to investigate including the Feed Access Control bits in your feed. It's not perfect, but it is respected by the bigger aggregators, so if one of your clients decides to subscribe to the feed with Reader or Bloglines, things shouldn't show up in search results.
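A sketch of the token scheme from the first paragraph, with in-memory storage standing in for a real database (names are mine):

using System;
using System.Collections.Concurrent;

public class FeedTokenRegistry
{
    private readonly ConcurrentDictionary<string, string> _tokensByUser =
        new ConcurrentDictionary<string, string>();

    // Called after the customer logs in; embed the token in their feed URL.
    public string Issue(string userName)
    {
        string token = Guid.NewGuid().ToString("N");
        _tokensByUser[userName] = token;
        return token;
    }

    // Called on every feed request. (A real store would index by token.)
    public bool IsValid(string token)
    {
        foreach (var pair in _tokensByUser)
            if (pair.Value == token) return true;
        return false;
    }

    // Called when a manager leaves the company: their feed URL goes dead.
    public void Revoke(string userName)
    {
        string removed;
        _tokensByUser.TryRemove(userName, out removed);
    }
}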
I believe you would set the permissions on the feed itself, forcing authentication, much like the Twitter feeds. The problem with this is that many readers (including Google Reader) don't yet support authenticated feeds.
The idea is to have authentication over a secure channel. These posts explain it pretty well:
RSS Security
Private RSS Feeds
Authentication by the web server is probably the best solution; however, to get round the issue of readers not supporting it (Google has been mentioned, and I have issues with Safari), you could implement a simple key-value pair appended to the URL, e.g.:
http://www.mydomain/rss.php?key=value
Your system could then "authenticate" the key-value pair and output the RSS; an invalid key could get a standard "invalid authentication" message as a single-item RSS, or return a 40x error.
It's not very secure, as you can see the key-value pair in the URL, but it's a trade-off. Unauthenticated HTTPS would be slightly more secure.
Assuming your RSS feed is served over HTTP, basic HTTP authentication would probably do the trick. This would be done either at the web-server level (in IIS, for example) or via whatever framework you're using to produce the feed (ASP.NET, for example).
The authentication scheme (basic username/password, NTLM, Kerberos, etc.) is up to you. If you're using WCF to produce the feed, then these are decisions you can make later and apply via config if needed.
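If you do end up challenging for credentials in your own code rather than in IIS, the Basic scheme is simple enough to sketch; credential validation here is deliberately stubbed out:

using System;
using System.Text;
using System.Web;

public static class BasicAuth
{
    // Returns true if the request carried valid Basic credentials;
    // otherwise sends a 401 challenge and returns false.
    public static bool Authenticate(HttpContextBase context)
    {
        string header = context.Request.Headers["Authorization"];
        if (header != null && header.StartsWith("Basic ", StringComparison.Ordinal))
        {
            // Note: malformed base64 will throw; guard this in real code.
            string decoded = Encoding.UTF8.GetString(
                Convert.FromBase64String(header.Substring(6)));
            int colon = decoded.IndexOf(':');
            if (colon > 0)
            {
                string user = decoded.Substring(0, colon);
                string pass = decoded.Substring(colon + 1);
                return CheckCredentials(user, pass);
            }
        }

        context.Response.StatusCode = 401;
        context.Response.AppendHeader("WWW-Authenticate",
                                      "Basic realm=\"RSS feed\"");
        return false;
    }

    private static bool CheckCredentials(string user, string pass)
    {
        return false; // TODO: check against your user store
    }
}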
Are you simply looking to authenticate consumers of the feed, or also to encrypt it to prevent the information from being read by a man in the middle? If you require encryption, then SSL is probably the easiest to implement.
You should avoid simply "hiding" the RSS feed by changing its name.
Update:
Your question (with its update) sounds like you're actually having issues with the RSS client on the device. You need to determine whether the phone's RSS client understands how to deal with basic/digest authentication, etc.
Assuming it doesn't, is there anything in the HTTP request that could allow you to associate a device with a user? Is there an HTTP header that gives you a unique device ID? If so, you might be able to perform a lookup against this data to do your own weak authentication, but remember that this sort of authentication could easily be spoofed.
Does the device have a client certificate that could be used for mutual SSL? If so, then that would be ideal.

Resources