Which HTTP status should be returned when a SPECIFIC page will be down for several days - asp.net

We are uploading a new version of our website.
For various reasons, some pages that exist in the older version aren't ready for the new version yet, and we need to take them down temporarily.
Which HTTP status should we return for these pages, considering they will be up and running again within several days?
Is using ServiceUnavailable = 503 only for these pages the right way or will it have negative impact on the entire website?
(Using ASP.NET in case it's related in some way...)

The status code 503 seems to be the best choice here:
The 503 (Service Unavailable) status code indicates that the server is currently unable to handle the request due to a temporary overload or scheduled maintenance, which will likely be alleviated after some delay.
It shouldn’t be relevant that it’s not due to overload or maintenance in your case. What is relevant is that it’s your fault (hence a status code from the 5xx class), and that it’s temporary (hence 503), so there’s no need to let them know the real reason.
While 503 is typically used for the whole site, I can see no reason why it shouldn’t be used for specific pages only. A possible drawback: If a bot successively crawls a few documents that give 503, it might think that the whole site is affected and stop crawling for now.
If you know when the page will be available again, you can send the Retry-After header:
When sent with a 503 (Service Unavailable) response, Retry-After indicates how long the service is expected to be unavailable to the client.
(FWIW, the Googlebot seems to support this.)
In the post Website outages and blackouts the right way (by a Google employee at the time), my assumptions are confirmed as far as Google Search is concerned: 503 should also be used for specific pages only; crawling rate might be affected if Googlebot gets many 503 answers.
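To make the per-page idea concrete, here is a minimal sketch in Python (not ASP.NET, purely for illustration) of an app that answers 503 with a Retry-After header for a few pages while the rest of the site keeps returning 200. The paths, the three-day delay, and the helper name fetch are assumptions made up for this example.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical paths of the pages that are temporarily offline.
UNAVAILABLE_PATHS = {"/reports", "/gallery"}
# Assumed downtime of three days, in seconds, for the Retry-After header.
RETRY_AFTER_SECONDS = 3 * 24 * 60 * 60


def app(environ, start_response):
    """Answer 503 plus Retry-After for offline pages and 200 for the rest."""
    if environ.get("PATH_INFO", "/") in UNAVAILABLE_PATHS:
        status = "503 Service Unavailable"
        body = b"This page is temporarily unavailable. Please try again later.\n"
        extra = [("Retry-After", str(RETRY_AFTER_SECONDS))]
    else:
        status = "200 OK"
        body = b"Regular page content.\n"
        extra = []
    headers = [("Content-Type", "text/plain; charset=utf-8"),
               ("Content-Length", str(len(body)))] + extra
    start_response(status, headers)
    return [body]


def fetch(path):
    """Call the app in-process and return (status line, headers as a dict)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    seen = {}

    def start_response(status, headers):
        seen["status"], seen["headers"] = status, dict(headers)

    b"".join(app(environ, start_response))
    return seen["status"], seen["headers"]
```

Calling fetch("/reports") exercises the 503 branch and fetch("/") the normal one, so only the listed pages are reported as temporarily down.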

Related

What status code should I set when updating and reloading the website?

What status code should I set when updating and reloading the website for search engines to understand?
Currently, I use the code 503, but Google registers it as a site error!
503 is the most appropriate HTTP response code for the scenario you describe. There are no 4xx or 3xx errors that indicate "try again later".
But let's step back:
It shouldn't be a problem if Google occasionally fails to crawl your website. If your business model is significantly impacted by this, I would suggest that there is something a bit wrong with your model. (And besides, that is a matter for discussion between you and Google!)
It would be more of a problem if your website's (real) users see 503s.
There is an obvious solution for you. Reimplement your mechanisms for reloading your website so that you don't need to send a 5xx response while reloading.
For example, implement a pair of sites and flip between them using a load balancer or similar. That way you can be updating one site while the other site is serving requests. When the new site is ready, switch it into production in the load balancer.
There are probably other ways to do this ... depending on how your site works.
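As a rough illustration of the "pair of sites" idea, the sketch below models the flip in plain Python. It is only a stand-in for a real load balancer; the class name BlueGreenSwitch and the handler functions are hypothetical.

```python
class BlueGreenSwitch:
    """Tracks which of two site copies is live (a toy load balancer)."""

    def __init__(self, blue, green):
        self.backends = {"blue": blue, "green": green}
        self.live = "blue"

    @property
    def idle(self):
        """Name of the copy that is not serving traffic right now."""
        return "green" if self.live == "blue" else "blue"

    def route(self, path):
        """Send the request to whichever copy is currently live."""
        return self.backends[self.live](path)

    def deploy(self, new_handler):
        """Update the idle copy first, then flip traffic over to it."""
        target = self.idle
        self.backends[target] = new_handler
        self.live = target


def site_v1(path):
    return "v1 " + path


def site_v2(path):
    return "v2 " + path
```

With switch = BlueGreenSwitch(site_v1, site_v1), a call to switch.deploy(site_v2) replaces the idle copy and only then moves traffic, so requests keep being served during the update and no 5xx response is needed.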

HTTP status code for "never going to exist"

I'm building a service where resources are divided into categories, e.g. example.com/category/foo would represent "foo" (think: similar to SO tags).
New categories may be added at any time and, when attempting to view a category that doesn't yet exist, users would be able to suggest that it be added.
However, I'd like to just outright ban some category names (e.g. NSFW terms). This means that (assuming "bar" is one such name) example.com/category/bar not only never existed, but is guaranteed never to exist in the future.
Which HTTP status code is appropriate for this situation?
Several ideas come to mind:
410 Gone - while this makes it clear the resource will not be available in the future, I'm not sure if it's appropriate as it also seems to imply it used to exist in the past.
400 Bad Request - the request is technically not malformed, so this is probably not the way to go.
404 Not Found initially seemed like the logical option, but doesn't convey the permanence of the ban, especially since I plan to use 404 for categories that don't yet exist, but can be suggested.
301 Moved Permanently and redirect to another page, either the home page or some other page explaining that some category names are banned.
I don't think there's such a thing as "will never exist" in the HTTP specs. Let's think about it slightly differently:
The resource is created through your website (and only through your website). If a resource is created, that means that whatever validation you put in place succeeded.
So, to keep it simple, you should stick to HTTP semantics. If someone hits a URL like example.com/category/{cat}, either you know {cat} (it's in your DB and has a valid name, right?) and process the request safely, or you have never seen {cat} before and you just return 404.
After all, there's an infinity of possible values that one could use for {cat}, and all of them would be valid URLs.
Hope it helps
I would suggest a 410 Gone is an appropriate response, assuming that your web clients are common and implement the HTTP spec correctly (most popular ones do).
On the Status Code Definitions page of the HTTP/1.1 specification, the section about 410 Gone says:
10.4.11 410 Gone
The requested resource is no longer available at the server and no
forwarding address is known. This condition is expected to be
considered permanent. Clients with link editing capabilities SHOULD
delete references to the Request-URI after user approval. If the
server does not know, or has no facility to determine, whether or not
the condition is permanent, the status code 404 (Not Found) SHOULD be
used instead. This response is cacheable unless indicated otherwise.
The 410 response is primarily intended to assist the task of web
maintenance by notifying the recipient that the resource is
intentionally unavailable and that the server owners desire that
remote links to that resource be removed. Such an event is common for
limited-time, promotional services and for resources belonging to
individuals no longer working at the server's site. It is not
necessary to mark all permanently unavailable resources as "gone" or
to keep the mark for any length of time -- that is left to the
discretion of the server owner.
I take that to mean the resource definitely does not exist now and will not exist in the future, giving the permanence that you want, and it is also cacheable unless stated otherwise.
Usually I would not need to know that it had "never existed before"; I just know that it doesn't exist right now when I'm requesting information about it, and that it is not going to exist in the future if I request it again. If I did need to learn that certain categories never existed in the past (e.g. a blacklist of categories), I would probably want a separate explicit request that I could use to get all of those categories, rather than having to check each category one at a time at run time.
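The two suggestions in this thread (plain 404 for everything unknown, or 410 Gone for banned names) can be put side by side in a small Python sketch. The sets KNOWN_CATEGORIES and BANNED_CATEGORIES and the flag use_gone_for_banned are hypothetical names chosen for illustration, not part of the asker's service.

```python
from http import HTTPStatus

# Hypothetical data: categories that exist and names that are banned outright.
KNOWN_CATEGORIES = {"foo"}
BANNED_CATEGORIES = {"bar"}


def category_status(name, use_gone_for_banned=True):
    """Pick a status code for a request to /category/<name>."""
    if name in KNOWN_CATEGORIES:
        return HTTPStatus.OK
    if name in BANNED_CATEGORIES and use_gone_for_banned:
        # Signals permanence, as argued in the 410 answer.
        return HTTPStatus.GONE
    # Unknown (or banned, when following the "just return 404" answer).
    return HTTPStatus.NOT_FOUND
```

With the flag left on, category_status("bar") yields 410 while a not-yet-suggested name yields 404; switching the flag off gives the simpler behavior where both return 404.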

Is my website being attacked/spammed?

My website returns a 503 Service Unavailable error several times a day. Sometimes it gives a Request Time Limit error for specific requests.
My website isn't receiving enough traffic for it to be busy.
The trace error logs show a lot of failed requests, sometimes 503 and sometimes 500. It wasn't like this a few months ago. No changes were made in the code/settings/design of the website.
I found some log entries that trace to the +Ahrefsbot and Googlebot web crawlers, so I suspect they are crawling my website too fast. I can't block them, as they come from different IP addresses.
Please help
My website is hosted by networksolutions, so I can't perform any modifications or analyze what's going on behind my website (server-side).
If you're sure it's crawlers that are causing your problems, you could try using a /robots.txt file.
You can learn more about it here, but to block them completely use this:
User-agent: *
Disallow: /
You can adjust Google's crawl rate through Google Webmaster tools.
There's an existing answer here that should be of help when dealing with Google's bots.
That said, I honestly doubt it's robots causing spikes in traffic large enough to bump you offline. Try analyzing your logs to see what the actual cause is. Maybe you're using more memory than your host allows.
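If blocking every crawler is too drastic, a narrower robots.txt can be checked locally before uploading it. The sketch below uses Python's standard urllib.robotparser; the rules themselves and the user-agent token "AhrefsBot" are assumptions for illustration, and robots.txt is only honored by crawlers that choose to respect it.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules: shut out one crawler by name, leave everything else open.
ROBOTS_TXT = """\
User-agent: AhrefsBot
Disallow: /

User-agent: *
Disallow:
"""


def build_parser(text):
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def allowed(parser, user_agent, path):
    """Report whether the given crawler may fetch the given path."""
    return parser.can_fetch(user_agent, path)


rules = build_parser(ROBOTS_TXT)
```

Here allowed(rules, "AhrefsBot", "/any/page") is False while allowed(rules, "Googlebot", "/any/page") is True, which confirms the file blocks only the named crawler.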

Is there any reason a website should return appropriate HTTP status codes?

I'm working on a small web application, and I'm trying to decide if I should make the effort to emit semantically appropriate HTTP status codes from within the application.
I mean, it makes sense for the web server itself to emit proper response codes. 500 Internal Server Error for a misconfigured Apache or 404 Not Found for a missing index.php or whatever all make sense, since there's nothing else the server can really do.
It also makes sense to manipulate the browser with 303 See Other or other HTTP mechanisms which actually produce behavior.
But if all that happened is a missing GET parameter, for example, is there any reason to go out of my way to return 400 Bad Request? Or how about 404 Not Found, if my application is handling all the routing by itself? From what I can tell there isn't any behavior associated with either of those error codes.
My general opinion: provide codes if the code provides actionable data for the user.
If all you're doing is presenting content, then in most cases I think it's less important. If YouTube fails to load a video, I mostly care about the fact that I can't watch my video. That it failed with a 418 status might be intellectually interesting, but it doesn't really provide me with any helpful information (Even assuming a non-silly failure code).
On the other hand, if you're allowing some kind of user interaction with a server, then the codes become much more important. I might actually care about why my request failed, because I'm now in a position to do something about it.
However, there are some codes that are actionable. 410 Gone for example: If my request failed for that reason, but I just got back a generic "Stuff Broke" message, I'd probably repeat the request a bunch of times, get nowhere, and give up in frustration. Knowing that the thing I'm looking for doesn't exist is a pretty useful thing for me to know.
I think it's very important for a web service to respond with appropriate codes, as sometimes the developer using the service might not know what's wrong or why the app stopped working unless they view the status code.

Which HTTP status codes do you actually use when developing web applications? [closed]

The HTTP/1.1 specification (RFC 2616) defines a number of status codes that can be returned by an HTTP server to signal certain conditions. Some of those codes can be utilized by web applications (and frameworks). Which of those codes are the most useful in practice in both classic and asynchronous (XHR) responses, and in what situations do you use each of them?
Which codes should be avoided, e.g. should applications mess with the 5xx code range at all? What are your conventions when returning HTTP codes in REST web services? Do you ever use redirects other than 302?
The ones I'm using (that I could find with a quick grep 'Status:' anyway):
200 Successfully retrieved a resource without affecting it
201 Sent whenever a form submission puts something significant into the database (forum post, user account, etc.), creating a new resource
204 Sent with empty body, for example after a DELETE
304 HTTP caching. I've found this one is very hard to get right since it has to account for users changing display settings and so on. The best idea I've come up with for that is using a hash of the user's preferences as the ETag. It doesn't help that most browsers have unpredictable and inconsistent behaviour here...
400 Used for bad form submissions that fail some validation check.
403 Used whenever someone is somewhere they shouldn't be (though I try to avoid that by not displaying links to stuff the users shouldn't access).
404 Apart from the normal webserver ones I use these when the URL contains invalid ID numbers. I suppose it'd be a good idea to check in this case whether a higher valid ID exists and send a 410 instead...
429 When a user's requests are too frequent
500 I tend to put these in catch{ } blocks where the only option is to give up, to make sure something meaningful is sent to the browser.
I realise I could get away with simply letting the server send "200" for everything, but they save a lot of pain when users are seeing (or causing) errors and not telling you about them. I've already got functions to display access-denied messages and so on, so it's not much work to add these anyway.
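One way to keep this kind of convention in a single place is a small table from application errors to status codes, with 500 as the give-up fallback. This is only a sketch in Python; the exception classes and the run helper are hypothetical and do not come from any framework.

```python
from http import HTTPStatus


class ValidationError(Exception):
    """A form submission failed a validation check."""


class AccessDenied(Exception):
    """The user is somewhere they should not be."""


class UnknownId(Exception):
    """The URL contains an ID that does not exist."""


STATUS_FOR_ERROR = {
    ValidationError: HTTPStatus.BAD_REQUEST,
    AccessDenied: HTTPStatus.FORBIDDEN,
    UnknownId: HTTPStatus.NOT_FOUND,
}


def run(handler):
    """Run a request handler and return the status line to send."""
    try:
        handler()
        status = HTTPStatus.OK
    except Exception as exc:
        # Anything unexpected falls back to 500, the "give up" case.
        status = STATUS_FOR_ERROR.get(type(exc), HTTPStatus.INTERNAL_SERVER_ERROR)
    return f"{status.value} {status.phrase}"


def bad_form():
    raise ValidationError("email is malformed")


def crash():
    raise RuntimeError("database went away")
```

For example, run(bad_form) produces "400 Bad Request" and run(crash) produces "500 Internal Server Error", so the display functions mentioned above only need to read the mapped status.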
418: I'm a teapot
From http://www.ietf.org/rfc/rfc2324.txt Hyper Text Coffee Pot Control Protocol (HTCPCP/1.0)
Don't forget about 503 - Service Unavailable. This one is essential for site downtime. Especially where Search Engines are concerned.
Say you're taking the site down for a few hours for maintenance or upgrade work. By directing all requests to a friendly page that returns a 503 code, it tells spiders to "try again later".
If you simply display a "Temporarily Down" page but still return 200 OK, the spider may index your error pages or, worse, replace existing indexing with this "new" content.
This could seriously impact your SEO rankings, especially if you're a large, popular site.
303 See Other is a must for PRG, which you should be using now if you aren't already.
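For anyone unfamiliar with PRG (Post/Redirect/Get), here is a minimal, framework-free sketch in Python. The handle function, the /posts routes, and the in-memory store are assumptions made up for the example.

```python
import itertools

_posts = {}                      # in-memory stand-in for a database
_next_id = itertools.count(1)    # simple ID generator starting at 1


def handle(method, path, form=None):
    """Return (status, headers, body) for a tiny post-and-view flow."""
    if method == "POST" and path == "/posts":
        post_id = next(_next_id)
        _posts[post_id] = form["text"]
        # Redirect so that a browser refresh repeats the GET, not the POST.
        return 303, {"Location": f"/posts/{post_id}"}, ""
    if method == "GET" and path.startswith("/posts/"):
        post_id = path.rsplit("/", 1)[1]
        if post_id.isdigit() and int(post_id) in _posts:
            return 200, {}, _posts[int(post_id)]
    return 404, {}, "Not Found"
```

The POST never renders a page itself; it stores the data and answers 303 with a Location header, and the follow-up GET to that location returns the content.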
Here are the most common response codes, from my experience:
The response codes in the 1xx-2xx range are typically handled automatically by the underlying webserver (e.g. Apache, IIS), so you don't need to worry about those.
Codes 301 and 302 are typically used for redirects, and 304 is used a lot when the client or proxy already contains a valid copy of the data and does not need a new version from the server (see the RFC for details on exactly how this works).
Code 400 is typically used to indicate that the client sent bad or unexpected data that caused a problem in the server.
Code 401 is for performing authentication, and 403 means the server refuses to authorize the request; both are again usually handled somewhat automatically by the server configuration.
Code 404 is the error code for a page not found.
Code 500 indicates an error condition in the server that is not necessarily caused by data sent from the client. For example, database connection failures, programming errors, or other unhandled exceptions.
Code 502 is typically seen if you are proxying from a webserver (such as Apache) to an app server (such as Tomcat) in the backend, and the connection cannot be made to the app server.
For asynchronous calls (i.e. AJAX/JSON responses) it's usually safest to always return a 200 response. Even if there was an error in the server while processing the request, it's best to include the error in the JSON object and let the client deal with it that way. The reason being that not all web browsers allow access to the response body for non-200 response codes.
I tend to use 405 Method Not Allowed when somebody tries to GET a URL that can only be POSTed. Does anyone else do it the same way?
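A sketch of that check in Python, assuming a hypothetical ROUTES table that lists the methods each URL accepts. A 405 response is also expected to carry an Allow header naming the supported methods, so the sketch includes one.

```python
# Hypothetical routing table: which methods each URL accepts.
ROUTES = {
    "/subscribe": {"POST"},
    "/about": {"GET", "HEAD"},
}


def check_method(method, path):
    """Return (status, headers) before any real handler runs."""
    allowed = ROUTES.get(path)
    if allowed is None:
        return 404, {}
    if method not in allowed:
        # Tell the client which methods would have worked.
        return 405, {"Allow": ", ".join(sorted(allowed))}
    return 200, {}
```

So check_method("GET", "/subscribe") gives 405 with Allow: POST, while an unknown path still gets 404 rather than 405.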
In Aida/Web framework we use just
200 Ok
302 Redirect
404 Not Found
500 Internal Server Error
I basically use all of them, where appropriate. The spec itself defines each code and the circumstances in which they should be used.
When building a RESTful web application, I don't recommend picking and choosing status codes and restricting oneself to a subset of the full range. That is, unless one's building a web application for a specific HTTP client, in which case one isn't really building a web application; one's building an application for that specific client.
At my firm, we have some Flex clients. They can't properly handle status codes other than 200, so we have them send a special parameter with their requests, which tells our servers to always send a 200, even when it's not the proper response.
I've had nightmares about the number 500.
500 Internal Server Error is returned in Aida/Web when your web application raises an exception. Because this is a Smalltalk application, an exception window is raised on the server (if you run it headful) while the user gets a 500 and a short stack trace.
On the server you can then open a full debugger and try to solve the problem, while the server continues running, of course. Another nice thing is that the exception window with the full stack waits for you until you come around.
Conclusion: 500 is a blessing, not a nightmare!
I'm using 409 instead of 400 for bad user data (i.e. form submission).
The specs for 409 go on about version conflicts, but they mention that information on how to fix the issue should be sent in the response. Perfect for malformed email or wrong password messages.
400 only addresses syntax issues, which to me sounds like the request just doesn't make sense at all rather than failing some regex.
I use webmachine, which automagically generates proper error codes. But there are cases when I need to supply my own. I have found it helpful for development and debugging to return 666 in those cases, so I can easily tell which ones come from my code, and which from webmachine. Besides, I get a chuckle out of it when it comes up. You do have to remember to change this before deployment for real.

Resources