Unknown 302 redirect at root. Crawlers cannot follow through - asp.net

The whole thing started when I noticed Facebook Debugger and other crawler tools are unable to parse my page. Facebook throws a critical error saying that it cannot follow the redirect. I believe Search Engine bots are hitting the same end. The website is functioning normally via all major web browsers.
It's probably worth to mention I am experimenting with ASP.NET Routing, using Web Forms under IIS8.
Given a website (http://example.com), here's what happens.
Case 1: Trying to access the root, this is what I get with a Web Sniffer simulator
Case 1 observations:
First thing I notice is '302' redirect instead of '200 OK'. It gives a 302 redirect with or without leading 'www'.
I noticed is that the Location Header is simply "/", confirmed by the page from IIS that I cannot see with a regular browser, which says the page is moved to "/". I believe something messes up at this point and the crawler is unable to follow through for some reason.
Case 2: Trying to access a given category page with a Web Sniffer simulator
Case 2 observations:
As you might figured out already, identical to case 1. And once again Facebook debugger cannot go through it, resulting to a redirect it cannot follow.
Questions:
1: How can I force an absolute path in the location headers instead of relative and will this be sufficient for the crawlers to follow through?
2: What could cause a 302 redirect happening in the first place in both www and non-www versions of the website?

Your web application is most likely depending on a cookie. The application sends a Set-Cookie header and redirects to the same page in order to receive a new request with the cookie data available. Search engines / bots, the Facebook bot and your Web Sniffer simulator will not send that cookie data and hence the web application keeps sending the 302 redirect responses.
The solution is to change your application to not require cookies for just simply viewing your web pages.

Related

Error 302 when the default page is changed in IIS 8

I have the following scenario. I have a website in IIS 8 and I am trying to secure it (https). I have made the web with web forms. In the process to secure it I have to change the page at the beginning (default page in the IIS administrator). When I do it, I don't get the change and I go to the website that was set by default.
I have seen the log and when trying to access the new homepage it gives an error 302 (object moved). I have seen the response header and I see that the location is configured with the old home page.
Example:
Old default page: www.namedomain.com/start.aspx
New default page: www. namedomain.com/home.aspx
The new website has as in the response header: location = /start.aspx and as I said before when trying to access it gives error 302.
Thanks.
There's a few things going on here, "securing" the site with HTTPS and also potentially <authentication mode="Forms"> in your web.config where it will try and redirect any unauthorised requests to a login page. It seems like you are just doing the HTTPS though at this stage, and maybe trying to set up a redirect from HTTP to HTTPS?
It sounds like you are also trying to change the default page for the website (in IIS or the web.config?) from default.aspx to home.aspx? I'm not sure I understand why you want to do that as it isn't necessary for HTTPS, but the effect of that will mean you can go to https://www.namedomain.com/ and you will get served the content from home.aspx instead of start.aspx (or default.aspx) but the URL will stay as just https://www.namedomain.com/
Normally to set up HTTPS, all you do is go into IIS, Bindings, and add a HTTPS binding (you'll need a TLS certificate to make the https work properly). then just make sure you include the "https://" at the start of your URL.
If you think it might be caching problem on your machine, just add a nonsense querystring to the end of your URL (like https://www.namedomain.com?blah=blahblah) and it will cause your browser to get a fresh copy of the page.
I'm not sure what is causing the 302 redirect, have you added any special code to swap HTTP requests over to HTTPS? Can you update your answer with any more info?
Yes, it is what I put in my last comment Jalpa. I do not understand very well the relationship between not configuring the session variables and the default page but once corrected in code, the application correctly loads the web by default.

When to add http(s):// to website address

I'm trying to create a web browser using Cocoa and Swift. I have an NSTextField where the user can enter the website he wants to open and a WebView where the page requested is displayed. So far, to improve the user experience, I'm checking if the website entered by the user starts with http:// and add it if it doesn't. Well, it works for most of the cases but not every time, for example when the user wants to open a local web page or something like about:blank. How can I check if adding http:// is necessary and if I should rather add https:// instead of http://?
You need to be more precise in your categorization of what the user typed in.
Here are some examples and expected reactions:
www.google.com: should be translated into http://www.google.com
ftp://www.foo.com: Should not be modified. Same goes to file:// (local)
Barrack Obama: Should probably run a search engine
about:settings: Should open an internal page
So after you figure out these rules with all their exceptions, you can use a regex to find out what should be done.
As for HTTP vs. HTTPS - if the site supports HTTPS, you'll get a redirect response (307 Internal Redirect, 301 Moved Permanently etc) if you go to the HTTP link. So for example, if you try to navigate to http://www.facebook.com, you'll receive a 307 that will redirect you to https://www.facebook.com. In other words, it's up to the site to tell the browser that it has HTTPS (unless of course you navigated to HTTPS to begin with).
A simple and fairly accurate approach would simply be to look for the presence of a different schema. If the string starts with [SomeText]: before any slashes are encountered, it is likely intended to indicate a different schema such as about:, mailto:, file: or ftp:.
If you do not see a non-http schema, try resolving the URL as an HTTP URL by prepending http://.

What happens if a 302 URI can't be found?

If I make an HTTP request to get index.html on http://www.example.com but that URL has a 302 re-direct in place that points to http://www.foo.com/index.html, what happens if the redirect target (http://www.foo.com/index.html) isn't available? Will the user agent try the original URL (http://www.example.com/index.html) or just return an error?
Background to the question: I manage a legacy site that supports a few existing customers but doesn't allow new signs ups. Pretty much all the pages are redirected (using 302s rather than 301s for some unknown reason...) to a newer site. This includes the sign up page. On one of the pages that isn't redirected there is still a link to the sign up page which itself links through to a third party payment page (i.e. on another domain). Last week our current site went down for a couple of hours and in that period someone successfully signed up through the old site. The only way I can imagine this happened is that if a 302 doesn't find its intended URL some (all?) user agents bypass the redirect and then go to originally requested URL.
By the way, I'm aware there are many better ways to handle the particular situation we're in with the two sites. We're on it! This is just one of those weird situations I want to get to the bottom of.
You should receive a 404 Not Found status code.
Since HTTP is a stateless protocol, there is no real connection between two requests of a user agent. The redirection status codes are just a way for servers to politely tell their clients that the resource they were looking for is somewhere else now. The clients, however, are in no way obliged to actually request the resource from that other URL.
Oh, the signup page is at that URL now? Well then I don't want it anymore... I'll go and look at some kittens instead.
Moreover, even if the client decides to do request the new URL (which it usually does ^^), this can be considered as a completely new communication between server and client. Neither server nor client should remember that there was a previous request which resulted in a redirection status code. Instead, the current request should be treated as if it was the first (and only) request. And what happens when you request a URL that cannot be found? You get a 404 Not Found status code.

Cookies sometimes disappear on a 303 redirect

Problem Closed
This particular user had turned off third party cookies in two of his browsers and the testing was being done on a page which was embedded in an iFrame.
We have a number of sites that use 303 redirects to share identity across them.
You GET a Page 1 on site A with a blank cache - you get a
303 to a sync page on site B.
You GET the sync page on Site B - it sets a cookie and redirects (303).
You GET a sync page on Site A - the cookie value is also in the URL.
Site A unpacks the cookie from the URL - it sets it on itself and then
303's you back to GET your original page
You should have two identical cookies set for Site A and Site B with a long lifetime and the root path.
Sometimes, for some users it goes infinite redirect. The cookie that is set by Site B for the final redirect isn't sent back when the browser processes the last redirect.
What can be causing this?
incorrect setting of cookies?
browser settings?
How can I debug it? It is a real intermittent problem in production that I can't reproduce in dev.

Redirect loop in ASP.NET app when used in America

I have a bunch of programs written in ASP.NET 3.5 and 4. I can load them fine (I'm in England) and so can my England based colleagues. My American colleagues however are suffering redirect loops when trying to load any of the apps. I have tried myself using Hide My Ass and can consistently recreate this issue.
I'm stumped. What could be causing a redirect loop for users in a specific country?!
The apps are hosted on IIS 6 on a dedicated Windows Server 2003. I have restarted IIS with no luck.
Edit
I should have made it clear that unfortunately I do not have access to the machines in the US to run Firefox Firebug/Fiddler. The message I get in Chrome is This webpage has a redirect loop..
When you say "a redirect loop", do you mean a redirect as in an http redirect? Or do you mean you have a TCP/IP routing loop?
A TCP/IP loop can be positively identified by performing a ping from one of the affected client boxes. If you get a "TTL expired" or similar message then this is routing and unlikely to be application related.
If you really meant an http redirect, try running Fiddler, or even better, HttpWatch Pro and looking at both the request headers, and the corresponding responses. Even better - try comparing the request/response headers from non-US working client/servers to the failing US counterparts
you could take a look with Live HTTP Headers in firefox and see what it's trying to redirect to. it could possibly be trying to redirect to a url based on the visitor's lang/country, or perhaps the dns is not fully propagated...
if you want to post the url, i could give you the redirect trace
What could be causing a redirect loop
for users in a specific country?!
Globalization / localization related code
Geo-IP based actions
Using different base URLs in each country, and then redirecting from one to itself. For example, if you used uk.example.com in the UK, and us.example.com in the US, and had us.example.com redirect accidentally to itself for some reason.
Incorrect redirects on 404 Not Found errors.
Spurious meta redirect tags
Incorrect redirects based on authentication errors
Many other reasons
I have tried myself using Hide My Ass
and can consistently recreate this
issue.
I have restarted IIS with no luck.
I do not have access to the machines
in the US to run Firefox
Firebug/Fiddler.
The third statement above don't make sense in light of the other two. If you can restart IIS or access the sites with a proxy, then you can run Fiddler, since it's a client-side application. Looking at the generated HTML and corresponding HTTP headers will be the best way to diagnose your problem.

Resources