Server returns 404 for a web page, but page is showing fine in browser - why? - http

A strange web page crossed my way. (And being a developer I have to solve the mystery.)
When accessing the web page in any browser, all seems normal. The web page is displayed as expected.
But when looking in the console the server acually returns a 404 status code:
So why is the browser rendering a page?
Looking at the Body shows valid HTML is returned:
Hold on. Responding 404 and sending the HTML along the way? And the browser renders it??
Why is this happening? Is this some server misconfiguration? Or is something clever going on here that I don't understand? Is there a practical reason for configuring a server on purpose to behave like this?

Another answer on Stack Overflow contains some interesting information: A HTTP status code of 404 plus HTML response body is actually recommended by the spec.
The 4xx class of status code is intended for cases in which the
client seems to have erred. Except when responding to a HEAD
request, the server SHOULD include a representation containing an
explanation of the error situation, and whether it is a temporary or
permanent condition. These status codes are applicable to any
request method. User agents SHOULD display any included
representation to the user.
This leaves me with two possible explanations:
Explanation 1: it's a server error.
the server wrongly returns a 404 status code
the browser thinks the response body contains details about the error and displays it - for the end user this is the actual page
Explanation 2: it's done on purpose to defeat crawlers and page watchers.
the server returns 404 on purpose - non-browser user agents won't process the result as they interpret it as error
browsers are unaffected, the end user doesn't care as long as the page is being displayed
The second one would indeed be kind of clever if you don't want your page to be indexed.

I faced with the same situation. My portal was hosted in a tomcat server. The portal was loaded when the host name along with the tomcat directory path was hit. But on loading the the webpage redirected to a deep-link URL and rendered the page. But if you hit the deep link URL directly in the browser it would give you 404 error in the network tab in Dev tools although the webpage would be rendered fine.
This happens because there is no resource as your deep-link URL anywhere in your server config files, so when it searches for the resource it doesn't find one and returns 404 in the network tab in Dev tools.
But browser behaves differently with the resource URL. It first loads and connects to the host name of the resource, when returned with success gets redirected as per the config files settings and renders the deep-link URL resource HTML, styling contents properly.

Note: I don't know whether this issue comes from me not being strict enough in the .htaccess or my CMS.
In my contrived .htaccess example I had the following rules to ignore these directories from being handled by the CMS.
RewriteCond $1 !^(branch|css|js|html|images) [NC]
I also had a branches directory inside my CMS' templates (created within CMS). I guess my .htaccess rule wasn't strict enough here. I had to change branch to branch\/, like so:
RewriteCond $1 !^(branch\/|css|js|html|images) [NC]
Only then would the page load without the 404 in the console.

Related

404 error on my homepage although I can see my site

I am at my wits end with the following problem:
My site www.sebastianthalhammer.com is available under that URL without any problems.
However Google Search Console as well as other external third party test tools return a 404 error.
Status report from Uptrends
It is just the main page that's affected. All the other subpages and blog content isn't affected.
I have been in contact with the server stuff but it seems alright to them. As mentioned. The site can be reached. The site runs on wordpress - latest version.
I have no real clue where to start as this error seems to be quite a tricky one. Does anyone here might have an idea what's going on?
Sebastian
The 4xx class of status code is intended for cases in which the
client seems to have erred. Except when responding to a HEAD
request, the server SHOULD include a representation containing an
explanation of the error situation, and whether it is a temporary or
permanent condition. These status codes are applicable to any
request method. User agents SHOULD display any included
representation to the user.
This leaves me with two possible explanations:
Explanation 1: it's a server error.
the server wrongly returns a 404 status code
the browser thinks the response body contains details about the error and displays it - for the end user this is the actual page
Explanation 2: it's done on purpose to defeat crawlers and page watchers.
the server returns 404 on purpose - non-browser user agents won't process the result as they interpret it as error
browsers are unaffected, the end user doesn't care as long as the page is being displayed
The second one would indeed be kind of clever if you don't want your page to be indexed.
Thanks to your feedback I could think about the problem in a different way.
Ultimately at the unholy depths of a certain plugin I could dig out a setting that caused the error.
It was a redirection plugin that (for whatever reason) sent out a 404 signal when the URL was requested.
I don't know what the purpose would be for something. All I know is that the setting was on default for quite a while now and that caused the weird situation.
thanks guys for getting me on the right track.
Sebastian

When to add http(s):// to website address

I'm trying to create a web browser using Cocoa and Swift. I have an NSTextField where the user can enter the website he wants to open and a WebView where the page requested is displayed. So far, to improve the user experience, I'm checking if the website entered by the user starts with http:// and add it if it doesn't. Well, it works for most of the cases but not every time, for example when the user wants to open a local web page or something like about:blank. How can I check if adding http:// is necessary and if I should rather add https:// instead of http://?
You need to be more precise in your categorization of what the user typed in.
Here are some examples and expected reactions:
www.google.com: should be translated into http://www.google.com
ftp://www.foo.com: Should not be modified. Same goes to file:// (local)
Barrack Obama: Should probably run a search engine
about:settings: Should open an internal page
So after you figure out these rules with all their exceptions, you can use a regex to find out what should be done.
As for HTTP vs. HTTPS - if the site supports HTTPS, you'll get a redirect response (307 Internal Redirect, 301 Moved Permanently etc) if you go to the HTTP link. So for example, if you try to navigate to http://www.facebook.com, you'll receive a 307 that will redirect you to https://www.facebook.com. In other words, it's up to the site to tell the browser that it has HTTPS (unless of course you navigated to HTTPS to begin with).
A simple and fairly accurate approach would simply be to look for the presence of a different schema. If the string starts with [SomeText]: before any slashes are encountered, it is likely intended to indicate a different schema such as about:, mailto:, file: or ftp:.
If you do not see a non-http schema, try resolving the URL as an HTTP URL by prepending http://.

How do servers behave when browsers request embedded resources that do not exist?

Let's take the following hypothetical situation:
an HTTP server has a custom error page set up /404.html and does a server-side forward for any URL that gives a 404 response (for example /blabla.html) to the 404.html page
a browser requests an existing page from the server, say /home.html
the page contains <img src="a.jpg" alt="a" />, but that resource does not exist on the server
the browser receives a 404 for the resource, marks it as missing and does not receive any response (tested this in Chrome and FF in the network tab of the dev console - the response bit is empty)
My question is: what happens on the server when the image is requested?
My guess is the browser cuts off the connection when it gets the 404 status in the header so it doesn't wait or download the response. My other guess is that it's implementation specific, but I'm curious if the servers notice that the connection has been cut off.
The browser will get your error page but he can't handle html in an image. (It will throw an error in the console)
If you would do it with a frame it will show your error page.

What does "pending" mean for request in Chrome Developer Window?

What does "Pending" mean under the status column in the "Network" tab of Google Chrome Developer window?
This happens when my page script issues a GET request whose response contains content-headers for downloading a CSV file:
Content-type: text/csv;
Content-Disposition: attachment; filename=myfile.csv
This works fine in FF and IE7, downloading a CSV file as expected and opening a file picker to save the file, but Chrome does nothing. I confirmed that the server responds to the request, so it appears that Chrome will not process the response.
Curiously, all works as expected if I type the URL into Chromes address bar and hit <enter>.
FYI: Chrome 10.0.648.204 on Windows XP
In my case, I found that the "pending" status was caused by the AdBlock extension. The image that I couldn't get to load had the word "ad" in the URL, so AdBlock kept it from loading.
Disabling AdBlock fixes this issue.
Renaming the file so that it doesn't contain "ad" in the URL also fixes it, and is obviously a better solution. Unless it's an advertisement, in which case you should leave it like that.
I also get this when using the HTTPS everywhere plugin.
This plugin has a list of sites that also have https instead of http. So I assume before the actual request is made it is already being cancelled somehow.
So for example when I go to http://stackexchange.com, in Developer I first see a request with status (terminated). This request has some headers, but only the GET, User-Agent, and Accept. No response as well.
Then there is request to https://stackexchange.com with full headers etc.
So I assume it is used for requests that aren't sent.
I had some problems with pending request for mp3 files.
I had a list of mp3 files and one player to play them. If I picked a file that had already been downloaded, Chrome would block the request and show "pending request" in the network tab of the developer tools.
All versions of Chrome seem to be affected.
Here is a solution I found:
player[0].setAttribute('src','video.webm?dummy=' + Date.now());
You just add a dummy query string to the end of each url. This forces Chrome to download the file again.
Another example with popcorn player (using jquery) :
url = $(this).find('.url_song').attr('url');
pop = Popcorn.smart( "#player_", url + '?i=' + Date.now());
This works for me. In fact, the resource is not stored in the cache system. This should also work in the same way for .csv files.
I had the same issue on OSX Mavericks, it turned out that Sophos anti-virus was blocking certain requests, once I uninstalled it the issue went away.
If you think that it might be caused by an extension one easy way to try and test this is to open chrome with the '--disable-extensions flag to see if it fixes the problem. If that doesn't fix it consider looking beyond the browser to see if any other application might be causing the problem, specifically security apps which can affect requests.
I had a similar issue with application/json ajax calls. In ff/IE they were fine. In chrome in the Developer Network window Status was always (pending) because a different status code was being returned.
In my case I changed my Json response to send a HttpStatusCode of 200 then Chrome was fine and the Status Text changed to 200 OK.
For example using ASP.NET Web Api
return new HttpResponseMessage(HttpStatusCode.OK ) {
Content = request.Content
};
The Network pending state on time, means your request is in progressing state. As soon as it responds the time will be updated with total elapsed time.
This picture shows the network call is in processing state(Pending)
This picture shows the time taken in processing by network call.
The fix, for me, was to add the following to the top of the php file which was being requested.
header("Cache-Control: no-cache,no-store");
Same problem with Chrome : I had in my html page the following code :
<body>
...
<script src="http://myserver/lib/load.js"></script>
...
</body>
But the load.js was always in status pending when looking in the Network pannel.
I found a workaround using asynchronous load of load.js:
<body>
...
<script>
setTimeout(function(){
var head, script;
head = document.getElementsByTagName("head")[0];
script = document.createElement("script");
script.src = "http://myserver/lib/load.js";
head.appendChild(script);
}, 1);
</script>
...
</body>
Now its working fine.
Encountered a similar issue recently.
My App is in angular 11 and we have a form with some validators which have regex to validate the data. One of data element had a special character which the regex wasn't handling and it made the entire browser hung up. Infact, even though all network calls were successful with 200 Ok, chrome was not showing any response returned by the backend and was also showing the requests in Pending State when infact all network calls are successful, there was no console log errors or anything. Handling the regex fixed the issue.
After i found the issue, i googled more about it. Here is more explanation about it.
https://javascript.info/regexp-catastrophic-backtracking
I came across this issue when I was debugging a local web application. The issue turned out to be AVG Antivirus and Firewall restrictions. I had to allow an exception through the firewall to get rid of the "Pending" status.
In my case, a simple restart to my browser (chrome) and it worked straight away afterwards like magic!
Little bit of context, I happen to refresh my frontend web page and straight away went onto making a changes to my API which led it to restart. During that instance, the frontend was making calls to API which led into "pending" due to that API is reloading. Browser at this point cached that pending state. For me to get out of it is either I set no-cache (which I didn't want to) or simply restart the browser, I chose the restart.
A little background
I encountered such an issue when requesting an url in my Django project. The server is setup using Apache HTTP web server and basic auth for user authentication.
The url I was accessing required no authentication to access i.e. in my Apache config, I had set Require all granted on the url using the LocationMatch directive.
The issue
The url I was trying to access returned 200 status (in the Network tab in Chrome), but the static assets being used for styling of the requested webpage (css, javascript, font files etc.) associated with the request url were not loading and returned pending status.
In the meanwhile, the page loaded partially and still kept on loading. All this was happening in the presence of basic-auth dialog in browser, even though my url was granted all access.
What worked for me
Interestingly, as I entered my credentials and logged in, the requested page loaded all the static assets. This made it very clear to me that the static assets directory might NOT have the necessary access permissions.
Then, I granted the access to the static assets directory by updating my Apache config and then the requested url and the webpage loaded up fine (200 status) without any basic auth dialog OR pending status.
In my case, there's an update for Chrome that makes it won't load before you restart the browser. Cheers
I encountered the same problem when I request certain images from page. I use JavaScript to set the src attribute of an img object and if the network is poor pending will be displayed in the network panel of chrome developer window. I think it's due to the poor network.

using customErrors for vanity URLs / asp.net url redirection

So, from here...
In ASP.NET, you have a choice about how to respond to that - it's in the web.config as CustomErrors. Turn that on, then redirect to a fancy 404 page (maybe you already do). The fancy 404 page, then, could be checking the requested querystring (which gets passed over to the custom error page as yet another querystring) to see if it's a valid redirect, lives in your database, etc. Just do a Response.Redirect() from there.
Then schooner writes:
Thanks, we do have a 404 now but we would prefer this not to be detected as a 404 in the process. We would like ot handle it directly and seperately if possible.
..and I'd like to know just how bad a practice this is. I don't expect to put my "pretty" URLs on the internet (just business cards) and I have a sample of 404-redirecting-to-a-helpful-site code working, but I don't want to get to production and have an issue with a browser that takes the initial 404 too seriously. Can anyone help me understand more about why I wouldn't want to use customErrors / 404 to flow users to the page they actually wanted?
The main problem with using customeErrors as your 404 error handler is that every time customErrors picks up an errored request rather than throwing a 404 error back to your browser and letting your browser know there was a bad request, it instead returns a 302 which indicates that a page has been relocated to whatever your customErrors page is. This isn't bad for most users because they don't know or even notice the difference, the problem comes from the fact that web crawlers DO know the difference and the status code they receive directly affects how their indexing works.
Consider the scenario where you have a page at http://mysite.com/MyAwesomePageAboutStuff.aspx for some period of time and then one day you decide you no longer need it and delete the file. If Google or some other crawler has already indexed that URL and goes back to it after you delete it the crawler will get a 302 status code instead of a 404 error and because of this status code the crawler will update the page's url to point to your error page rather deleting the non-existent link. Now, whenever someone finds that url by way of a search engine they'll end up at your error page.
It's not really a huge issue, but you can definitely see the headaches this can create for your users in the long run.
Look here for some corroborating data.
I created a vanity url system using the 404 as the handler. There's no need for a 302 on my side as the 404 dynamically loads the content and returns that. I am fully able to handle any and all POST / GET and SERVER data.
Works great. If you are interested TarantulaHawk is up on SourceForge.

Resources