Does googlebot keep sessions when crawling? - asp.net

When Googlebot crawls pages, does it have a session? For example, I am storing some variables in the session and using them in my site's pages. When Googlebot crawls these pages, will I still have the session variables? In my global.asax I store some variables in the session at session start. Will I have any problem with Googlebot?

Googlebot actively tries to avoid sessions and usually does not send cookies. From "First date with the Googlebot: Headers and compression" (March 2008):
I usually avoid cookies (so no "Cookie:" header) since I don't want
the content affected too much by session-specific info. And, if a
server uses a session id in a dynamic URL rather than a cookie, I can
usually figure this out, so that I don't end up crawling your same
page a million times with a million different session ids.
I imagine most regular search engine bots will be similar in this respect. Google is trying to build an index of unique URLs. The URL is the unique key that identifies a unique page of content. Cookies (and sessions) are not passed along when a user clicks a link in the SERPs. Google is primarily indexing pages, not sites.
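To make the "URL is the unique key" point concrete, here is a minimal sketch (in Python rather than ASP.NET, purely for illustration) of how a crawler might collapse URLs that differ only by a session id. It is not Googlebot's actual method; the parameter names in SESSION_PARAMS and the function name canonical_url are assumptions made up for this example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of query parameter names that often carry a session id.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}


def canonical_url(url: str) -> str:
    """Return the URL with session-like query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in SESSION_PARAMS
    ]
    # Rebuild the URL from the remaining parameters only.
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Two requests for page.aspx?id=7 with different sid values would then map to the same key, which is why content that only appears for a particular session is unlikely to be indexed reliably.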

The answer to one of your questions is: yes, you will have problems with Googlebot.
Generally, we've encountered two types of issues with Googlebot:
- It sometimes does not retain HTTP cookies between requests. Our application relies on custom cookies, and we caught plenty of Googlebot requests that carried no cookies at all.
- It takes long breaks between consecutive requests. For example, it retrieves your page and asks for its scripts much later.
Both will cause trouble with your session. First, the ASP.NET_SessionId cookie must be passed back on every request, and Googlebot will sometimes fail to do that. Second, if there's a long gap between requests, your session will time out even if the cookie is there.

Generally the answer is no; however, other crawlers (of which there are plenty) may behave differently.
I should note that I have seen an instance of a Google crawler for AdWords (not the normal Googlebot) which DID present a session cookie.

It's very unlikely, I think. It should create a new session every time it crawls your website.

Related

Server side redirect vs Javascript redirect in terms of SEO

I am creating my own short url website 9o9.in
When a visitor hits a short URL generated by my site, he will essentially hit my server first. But I know there may be links to potentially harmful or inappropriate sites that get shortened using my site's service.
To make sure that I am not giving my site a negative reputation in terms of SEO by linking or HTTP-referring to sites that are unacceptable in the eyes of a search engine, should I go for a server-side redirect, such as PHP's header() function, or a JavaScript-based client-side redirect?
I know the wiser solution is to prevent users from generating short links to unacceptable sites. But right now I can't afford to implement that, as it would require an extensive amount of data analysis or expensive word-filtering APIs...
Any help is highly appreciated.
Thanks.
A server-side redirect will be lower latency, as the browser can immediately begin fetching the new page whereas, with a client-side redirect in JavaScript, the browser must continue downloading your JavaScript code and then must execute this JavaScript code. Therefore, it is in your users' best interest to use a server-side redirect wherever possible over doing client-side redirecting. And, because it is in the users' best interest, it is also in a search engine's best interest to reward such behavior (indeed, Google has publicly stated that end user latency is one of many ranking signals that is used).
On the subject, though, you may want to take advantage of the safe browsing API to help you validate the URLs to which you redirect for malware, so that you don't serve malware from these links.
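As a rough illustration of the server-side option, here is a minimal sketch written as a Python WSGI application rather than PHP. The SHORT_LINKS table and the name redirect_app are hypothetical; a real shortener would look codes up in a database.

```python
# Hypothetical in-memory table of short codes (a real service would use a database).
SHORT_LINKS = {"abc": "http://example.com/some/long/path"}


def redirect_app(environ, start_response):
    """WSGI app: answer a short-URL request with an HTTP 301 redirect."""
    code = environ.get("PATH_INFO", "/").lstrip("/")
    target = SHORT_LINKS.get(code)
    if target is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Unknown short link\n"]
    # The Location header tells the browser or crawler where to go next;
    # no page body or script has to be downloaded and executed first.
    start_response("301 Moved Permanently", [("Location", target)])
    return [b""]
```

For local experiments this can be served with the standard library's wsgiref.simple_server.make_server. Whether to send 301 or 302 is a design choice; 301 is used here only as an example.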

Get unique, static id from a device via web request

I have an MVC application that I would like to add some custom stats to. For some of the stats, it would be nice to have a unique identifier for a device.
For example, if I have a unique id for a RSS subscriber, I can monitor the active number of RSS subscribers.
I was wondering if anyone knew of anything in the web request that could be used as an ID other than the IP (which can obviously change). Something like a device ID or something?
Here are some approaches to consider.
HTTP Headers
There are a few HTTP Headers you can look at that can help you identify a unique user or device - some would refer to the sim card, some refer to the device.
Here is a list that I derived from the headers that Google Adsense Mobile uses to help track their advertising:
x-dcmguid
x-up-subno
x-jphone-uid
x-em-uid
These are probably some of the most popular ones, but there are more vendor- or device-specific headers in use. You could start logging all the headers your site receives, count how often each one appears, and build up your own database of common headers.
Some other approaches
Cookies
A cookie is something that can be set on the requesting agent (a browser, for example) and returned when the agent visits again. For a list of methods, check out Ever Cookie, "the virtually permanent cookie". It works by using the following methods, of which at least one will work:
- Standard HTTP Cookies
- Local Shared Objects (Flash Cookies)
- Silverlight Isolated Storage
- Storing cookies in RGB values of auto-generated, force-cached PNGs using HTML5 Canvas tag to read pixels (cookies) back out
- Storing cookies in Web History
- Storing cookies in HTTP ETags
- Storing cookies in Web cache
- window.name caching
- Internet Explorer userData storage
- HTML5 Session Storage
- HTML5 Local Storage
- HTML5 Global Storage
- HTML5 Database Storage via SQLite
Combinations
It's also possible to come up with your own scheme, e.g. take the User-Agent header, some other headers like Accept and X-Forwarded-For, and the IP address, and make a hash value out of them to more accurately determine the uniqueness of the agent.
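The combination scheme could look something like the following sketch (Python used for illustration; the same idea carries over to an MVC action filter). The choice of headers and the function name fingerprint are assumptions, and two devices behind the same proxy with the same browser would still collide.

```python
import hashlib


def fingerprint(headers: dict, remote_addr: str) -> str:
    """Hash a few request attributes into a rough client fingerprint."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    parts = [
        lowered.get("user-agent", ""),
        lowered.get("accept", ""),
        lowered.get("x-forwarded-for", ""),
        remote_addr,
    ]
    # A separator keeps ("ab", "c") from hashing the same as ("a", "bc").
    joined = "\n".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()
```

The result is a 64-character hex string that is stable as long as none of the inputs change, so it is a heuristic rather than a true device id.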
There are many different mobile headers as seen here. I also hit a page of mine and store mobile headers from various devices for my own purposes here http://wap.defza.com/ua/ua.txt (also ua1.txt, ua2.txt etc)
The short answer is there isn't any (and with good reason, given privacy concerns). The more helpful answer is that this is something you would normally do using cookies. You set a cookie and then check it to identify the specific browser making the request.
Of course, this is by no means fool-proof as users can reject cookies, delete them and they can use many different browsers (each of which will have a different cookie). If you are being devious (and I wouldn't recommend this) you could use a Local Shared Object (Flash Cookie) as this is less likely to be removed. At the end of the day, though, if someone doesn't want to be tracked you can't force them to be.
Generally, though, if you want analytics and tracking then consider using a 3rd party solution like Google Analytics. This will give you very detailed data (albeit still relying on cookies and javascript) about your visitors and their browsing habits.
other than the IP
If your site doesn't require any sort of authentication to serve this content, the IP address is the only thing you can get to identify clients, and even this might not be unique: two clients behind the same proxy cannot be distinguished from each other. Another possibility is to use cookies, but that more or less falls into the first category, authentication.
There is no identifier that's provided by a browser; privacy concerns make it very unlikely that any vendor would ever implement one, for now at least.
The only option you have is some form of cookie.
For RSS feeds, you could conceivably embed a random unique ID in the feed URL every time it's rendered, so you'd know when the person who retrieved that URL downloaded your feed. However, if the user shared that URL with others, you'd have no real way of knowing.
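A minimal sketch of the feed-URL idea, in Python for illustration. The query parameter name subscriber, the in-memory fetch_counts store, and both function names are hypothetical; a real MVC application would persist the tokens in its database.

```python
import secrets

# Hypothetical in-memory store: token -> number of times the feed was fetched.
fetch_counts = {}


def feed_url_with_token(base_url: str) -> str:
    """Return the feed URL with a fresh random subscriber token appended."""
    token = secrets.token_urlsafe(16)
    fetch_counts[token] = 0
    separator = "&" if "?" in base_url else "?"
    return f"{base_url}{separator}subscriber={token}"


def record_fetch(token: str) -> bool:
    """Count a feed download; return False for tokens we never issued."""
    if token not in fetch_counts:
        return False
    fetch_counts[token] += 1
    return True
```

Counting the tokens fetched at least once in a recent period gives a rough number of active subscribers, with the sharing caveat mentioned above.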

ASP.NET scenario Interview question. How would you answer it?

Here's the question scenario:
Suppose you have a multiple-page ASP.NET web site with the following requirements:
1. User-specific data for the currently logged in user is loaded and is required on each individual page of the application during a user's session.
2. The application itself only allows a certain number of users to be logged in at one time.
3. The next time a specific user logs in, the user should be returned to the last page visited.
Given this information, briefly describe how you would use ASP.NET to manage the state of the application to meet these needs.
Here's my thoughts and reasons. Please provide yours.
User-specific data for the currently
logged in user is loaded and is
required on each individual page of
the application during a user's
session.
This is suggesting to me that the interviewer is looking to see if I would suggest using Master pages as a way to provide a common approach to displaying the same thing on every page.
The application itself only allows a
certain number of users to be logged
in at one time.
Could the sought response be that, because scaling isn't an issue given the limited number of users, it is OK to put this information in the Session object for performance reasons? Or is this a trap, and some other approach is better?
The next time a specific user logs in,
the user should be returned to the
last page visited
A cookie seems the best approach to track the last page accessed, since this doesn't seem to be critical information.
Please tell me how you would handle these questions if you wanted to make the best impression.
Feel free to provide input or comment on any line item.
Thanks!
As far as (3) is concerned, consider a shared PC. User A logs into a website using their site-based user name and password, does a whole load of work, and shuts down the browser. User B then comes along and, on the same PC, logs into the same site using their own details. However, they will get the cookie from User A and be redirected to the last page User A saw. This happens because cookies are tied to the browser and OS user, whereas you are potentially applying the site security separately in the application.
In this situation you would need to either put the user name into the cookie (encrypted) or use a server-side method to store the location.
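One way to tie the cookie to a user is sketched below in Python. Note that it swaps in HMAC signing rather than encryption: the user name stays readable but cannot be altered undetected. SECRET_KEY, the "user|path|signature" layout, and the function names are assumptions for this example, and it assumes neither the user name nor the path contains "|".

```python
import hashlib
import hmac

# Hypothetical secret; a real application would load it from configuration.
SECRET_KEY = b"replace-with-a-real-secret"


def _sign(payload: str) -> str:
    return hmac.new(SECRET_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def make_last_page_cookie(username: str, path: str) -> str:
    """Build a cookie value of the form 'username|path|signature'."""
    payload = f"{username}|{path}"
    return f"{payload}|{_sign(payload)}"


def last_page_for(username: str, cookie_value: str):
    """Return the stored path only if the cookie is intact and belongs to this user."""
    try:
        owner, path, signature = cookie_value.rsplit("|", 2)
    except ValueError:
        return None  # malformed cookie
    expected = _sign(f"{owner}|{path}")
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return None  # tampered cookie
    if owner != username:
        return None  # cookie left behind by a different user on a shared PC
    return path
```

With this check, User B logging in on User A's browser simply gets no stored page and lands on the default start page.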
Here are my thoughts:
They might be looking for Master Pages, but my first thought here was whether you're going to cache this user data, so you're not making a database query every time they hit a new page. To really impress them, you might mention partial caching techniques so that the repetitive portions of the page don't even need to be re-rendered with each page load.
I think you're right: they're helping you to conclude that the session state is an appropriate place to cache the user data. Just be sure you ask the appropriate questions, like "How many users?", and "How much data per user?"
The cached data could be used to keep track of the last-requested page, and when the user's session expires, you could save this data into a database table to be retrieved next time they log in.
That third item is awfully tricky. What if the user was last looking at an object that has since been deleted? What would be the intended behavior if a user logged in from one computer, did some work, and then logged in simultaneously from another computer or browser? I'd be sure to ask these kinds of questions, not least to show that I understand the implications of a requirement like this. If their responses lead you to believe that they're looking for a simple solution, go with the simple solution. Otherwise, tweak your response to be only as complicated as necessary.
Just a small thought: if the system is running in a "farmed" environment, the Session data can be cleared, and that needs to be handled in some way.
http://www.beansoftware.com/ASP.NET-Tutorials/Store-Session-State-Server.aspx

Check if anyone is currently using an ASP.Net app (site)

I build ASP.NET websites (hosted under IIS 6 usually, often with SQL Server backends and forms authentication).
Clients sometimes ask if I can check whether there are people currently browsing (and/or whether there are users currently logged in to) their website at a given moment, usually so they can safely do a deployment (they want a hotfix, for example).
I know the web is basically stateless so I can't be sure whether someone has closed the browser window, but I imagine there'd be some count of not-yet-timed-out sessions or something, and surely logged-in-users...
Is there a standard and/or easy way to check this?
Jakob's answer is correct but does rely on installing and configuring the Membership features.
A crude but simple way of tracking users online would be to store a counter in the Application object. This counter could be incremented/decremented upon their sessions starting and ending. There's an example of this on the MSDN website:
Session-State Events (MSDN Library)
Because the default Session Timeout is 20 minutes the accuracy of this method isn't guaranteed (but then that applies to any web application due to the stateless and disconnected nature of HTTP).
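The counter idea is language-neutral; here is a minimal Python sketch of it. In ASP.NET the two methods would be called from the session start and session end handlers in global.asax; the class and method names below are made up for this example.

```python
import threading


class OnlineCounter:
    """Thread-safe count of sessions that have started but not yet ended."""

    def __init__(self):
        self._lock = threading.Lock()
        self._count = 0

    def session_started(self):
        with self._lock:
            self._count += 1

    def session_ended(self):
        with self._lock:
            # Never drop below zero if an end event arrives twice.
            self._count = max(0, self._count - 1)

    @property
    def count(self):
        with self._lock:
            return self._count
```

The lock matters because requests are handled concurrently. The count still lags reality by up to the session timeout, as noted above.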
I know this is a pretty old question, but I figured I'd chime in. Why not use Google Analytics and view their real time dashboard? It will require minor code modifications (i.e. a single script import) and will do everything you're looking for...
You may be looking for the Membership.GetNumberOfUsersOnline method, although I'm not sure how reliable it is.
Sessions, suggested by other users, are a basic way of doing things, but are not too reliable. They can also work well in some circumstances, but not in others.
For example, if users are downloading large files or watching videos or listening to the podcasts, they may stay on the same page for hours (unless the requests to the binary data are tracked by ASP.NET too), but are still using your website.
Thus, my suggestion is to use the server logs to detect if the website is currently used by many people. It gives you the ability to:
See what sort of requests are done. It's quite easy to detect humans and crawlers, and with some experience, it's also possible to see if the human is currently doing something critical (such as writing a comment on a website, editing a document, or typing her credit card number and ordering something) or not (such as browsing).
See who is doing those requests. For example, if Google is crawling your website, it is a very bad idea to go offline, unless the search rating doesn't matter for you. On the other hand, if a bot is trying for two hours to crack your website by doing requests to different pages, you can go offline for sure.
Note: if a website has some critical areas (for example, writing this long answer, I would be angry if Stack Overflow went offline a few seconds before I submitted it), you can also send regular AJAX requests to the server while the user stays on the page. Of course, you must be careful when implementing such a feature, and take into account that it will increase the bandwidth used and will not work if the user has JavaScript disabled.
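On the server side, the AJAX heartbeat approach boils down to remembering when each client was last seen. A minimal Python sketch, assuming a hypothetical client_id (for example the session id) and a 60-second window chosen arbitrarily:

```python
class HeartbeatTracker:
    """Remember the last heartbeat per client and report who is still active."""

    def __init__(self, window_seconds: float = 60.0):
        self.window_seconds = window_seconds
        self._last_seen = {}  # client_id -> timestamp of the latest heartbeat

    def beat(self, client_id: str, now: float) -> None:
        """Record a heartbeat received at time `now` (seconds)."""
        self._last_seen[client_id] = now

    def active_clients(self, now: float) -> list:
        """Return ids of clients seen within the last `window_seconds`."""
        cutoff = now - self.window_seconds
        return sorted(cid for cid, seen in self._last_seen.items() if seen >= cutoff)
```

In a real handler `now` would come from time.time(); it is passed in explicitly here so the behaviour is easy to check. An empty result from active_clients suggests it is safe to deploy.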
You can run the netstat command and see how many active connections exist to your website's ports.
Default port for http is *:80.
Default port for https is *:443.
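If you want a number rather than eyeballing the output, a small script can count the established connections. This Python sketch assumes the output of "netstat -an" where the last three columns are local address, foreign address, and state, with the port after the final colon; that holds for typical Windows and Linux output but is an assumption, and the function name is made up.

```python
def count_established(netstat_output: str, ports=("80", "443")) -> int:
    """Count ESTABLISHED connections whose local port is one of `ports`."""
    count = 0
    for line in netstat_output.splitlines():
        tokens = line.split()
        # Skip headers, blank lines, and connections in other states.
        if len(tokens) < 4 or tokens[-1].upper() != "ESTABLISHED":
            continue
        local_address = tokens[-3]
        port = local_address.rsplit(":", 1)[-1]
        if port in ports:
            count += 1
    return count
```

The text to parse could be obtained with subprocess.run(["netstat", "-an"], capture_output=True, text=True).stdout. Keep in mind that browsers open several connections per visitor and keep-alive connections linger, so this is only a rough indicator.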

Session sharing issue

I am facing an issue when using multiple tabs, since they all share the same session. Are there any alternatives? Can we create a unique session when someone opens a new tab or presses Ctrl+N?
It's a Java EE/Struts2 enterprise application if this matters.
This is a problem all server-centric web applications face; it's not specific to Java EE. The problem is that most browsers store cookies on a per-user basis, not per tab. Also, this behaviour is not generally transparent to the user, adding to the confusion. A few solutions I can think of (although none of them is really satisfactory):
Host the application under more than one URI. This way, any browser will store cookies independently, and consequently, you have one session per application version.
Propagate session IDs through a different mechanism, e.g. through the URI. This, however, has a few caveats - it exposes the session ID to the user, it makes for ugly URIs, and it forms a security risk (session hijacking and such) when users copy-paste or bookmark the current URI (because they then store the session ID in the link).
Propagate session IDs through hidden fields inside the page. This solution probably requires you to rewrite part of the built-in session handling, and it loses the session ID when your page contains links to other pages within your application.
For Firefox, there's an add-on called "cookie pie", which allows users to have independent cookie stores for some or all tabs. Downside is that users have to actively enable it, and working around the tab problem becomes the user's responsibility. Also, it doesn't work under all circumstances (e.g., google finds your active login regardless).
Avoid using session state, and use other mechanisms to preserve state between requests. Like passing session IDs through hidden fields, this breaks under certain circumstances.
Make the application fully client-centric, that is, program the entire interface in javascript and communicate with the server through ajax calls. This way, you won't depend on the browser's cookie implementation at all. Chances are you'll have to rewrite substantial amounts of code though, assuming your application is basically working already.
There is no simple way to achieve this that I know of.
The usual way to fix this is to change the app so that it can deal with users using multiple tabs (if possible).
There are several workaround ideas for how to "disable" the old window if the user presses Ctrl+N while walking through a multi-step form, but you'd have to give more detailed information for ideas on that.
Usually a browser instance is treated as a single user/entity for session tracking purposes. Especially if you are using cookies to track the sessions. I am not sure that I like the idea of allowing different tabs to have different sessions. It feels unintuitive for web based applications. All IMHO, of course.
That said, if you want to change this you will have to come up with a custom implementation. Perhaps you can generate and attach different session ids to the URL for different tabs. Never tried this myself so do not know how easy or difficult it will be.
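A rough sketch of what such a custom implementation might look like, in Python rather than Struts2 for brevity. Instead of issuing a separate session per tab, it keeps the one cookie-based session and namespaces state inside it by a tab id carried in the URL, which is a variation on the idea above. The parameter name tab and all function names are hypothetical.

```python
import secrets
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TAB_PARAM = "tab"  # hypothetical query parameter carrying the tab id


def new_tab_id() -> str:
    """Generate a random id for a newly opened tab."""
    return secrets.token_urlsafe(8)


def with_tab_id(url: str, tab_id: str) -> str:
    """Return the URL with the tab id set as a query parameter."""
    parts = urlsplit(url)
    query = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name != TAB_PARAM
    ]
    query.append((TAB_PARAM, tab_id))
    return urlunsplit(parts._replace(query=urlencode(query)))


def tab_state(session: dict, tab_id: str) -> dict:
    """Return the per-tab dictionary stored inside the shared session."""
    return session.setdefault("tabs", {}).setdefault(tab_id, {})
```

Every link and form the application renders would have to carry the tab id, and a tab opened with Ctrl+N starts with the same URL (and so the same id) as its parent, so detecting that case needs extra client-side work.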
