Is there any reason a website should return appropriate HTTP status codes? - http

I'm working on a small web application, and I'm trying to decide if I should make the effort to emit semantically appropriate HTTP status codes from within the application.
I mean, it makes sense for the web server itself to emit proper response codes. 500 Internal Server Error for a misconfigured Apache or 404 Not Found for a missing index.php or whatever all make sense, since there's nothing else the server can really do.
It also makes sense to manipulate the browser with 303 See Other or other HTTP mechanisms which actually produce behavior.
But if all that happened is a missing GET parameter, for example, is there any reason to go out of my way to return 400 Bad Request? Or how about 404 Not Found, if my application is handling all the routing by itself? From what I can tell there isn't any behavior associated with either of those error codes.

My general opinion: provide codes if the code provides actionable data for the user.
If all you're doing is presenting content, then in most cases I think it's less important. If YouTube fails to load a video, I mostly care about the fact that I can't watch my video. That it failed with a 418 status might be intellectually interesting, but it doesn't really provide me with any helpful information (even assuming a non-silly failure code).
On the other hand, if you're allowing some kind of user interaction with a server, then the codes become much more important. I might actually care about why my request failed, because I'm now in a position to do something about it.
However, some codes are actionable even when you're only presenting content. Take 410 Gone: if my request failed for that reason but I just got back a generic "Stuff Broke" message, I'd probably repeat the request a bunch of times, get nowhere, and give up in frustration. Knowing that the thing I'm looking for no longer exists is genuinely useful.
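A sketch of why that distinction matters to a client. The helper below is purely illustrative (the function name and retry policy are hypothetical, not any real library's API): a 410 tells the client to stop immediately, while a generic 5xx might justify a bounded retry.

```python
# Hypothetical client-side retry policy keyed on the HTTP status code.
def should_retry(status: int, attempt: int, max_attempts: int = 3) -> bool:
    """Decide whether to retry a failed request based on its status code."""
    if status == 410:          # Gone: the resource will never come back
        return False
    if status == 429:          # Too Many Requests: back off, then retry
        return attempt < max_attempts
    if 500 <= status < 600:    # Server errors are often transient
        return attempt < max_attempts
    return False               # Other 4xx errors are the client's fault

print(should_retry(410, 1))  # False: repeating the request is pointless
print(should_retry(500, 1))  # True: the failure might be transient
```

With only a generic "Stuff Broke" response, a client has no basis for this decision and can only guess.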

I think it's very important for a web service to respond with appropriate codes, as sometimes a developer using the service won't know what's wrong, or why the app stopped working, unless they check the status code.


URL manipulation always returns a 200:OK in meteor - getting flagged as violation in OWASP-ZAP

I ran OWASP ZAP and the tool flagged a high-severity vulnerability for a possible SQL injection issue. Although we know for sure we do not use any SQL databases as part of our application stack, I poked around and have a few questions.
The payload that detected this “vulnerability” was as below:
https://demo.meteor.app/sockjs/info?cb=n6_udji55a+AND+7843%3D8180--+UFVTsdsds
Running this on the browser, I get a response:
{"websocket":true,"origins":["*:*"],"cookie_needed":false,"entropy":3440653497}
I am able to make any sort of manipulation to what comes after the cb= part and I still get the same response. I believe this is what has tricked the tool into flagging a vulnerability: it injected a -- with some characters and still got a proper response.
How can I make sure that changing the URL parameter to something that does not exist, returns a 404 or a forbidden message?
Along the same lines, when I try to do a GET (or simply a browser call) for:
https://demo.meteor.app/packages/accounts-base.js?hash=13rhofnjarehwofnje
I get the auto generated JS file for accounts-base.js.
If I manipulate the hash= value, I still get the same accounts-base.js file rendered. Shouldn't it return a 404? If not, what role does the hash play? I feel that the vulnerability testing tool is wrongly flagging such URL manipulations and concluding that the application is vulnerable.
Summarizing my question:
How do I make sure that manipulating the URL gives me a 404, or at the very least a 403 Forbidden, instead of always returning 200 OK in a Meteor application?

Why and how do HTTP response status codes matter?

Genuine question; there are so many HTTP status codes, and people always encourage you to use the correct one in a given situation.
Except for the common ones (200, 404, and 500), does it really matter what HTTP status code I send back to the user?
Does the browser do something different with each status code?
In many cases: no. As long as you use the correct class of errors (4xx, 5xx) many things will just work.
The purpose of using more specific HTTP status codes is twofold:
It can be helpful documentation for a programmer running into them.
It allows a generic client to handle results in a generic way.
Some examples for #2:
A 401 error can allow a generic client to initiate authentication.
A 429 error can allow a generic client to automatically back-off and retry a request after an amount of time.
A good client can make these types of decisions completely independent of your API. It knows how to behave because these are agreed upon standards.
But if you don't make use of clients that can handle any of these advanced features, then it is less important to use them.
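The 401 and 429 behaviors above can be sketched as a tiny generic handler. The function name and return values here are illustrative, not part of any real client library; the point is that the decisions depend only on the status code and standard headers, never on knowledge of the specific API:

```python
# Hypothetical generic response handler driven purely by agreed-upon standards.
import time

def handle_response(status: int, headers: dict) -> str:
    """React to a response using only its status code and standard headers."""
    if status == 401:
        return "authenticate"            # e.g. prompt for credentials
    if status == 429:
        delay = int(headers.get("Retry-After", "1"))
        time.sleep(delay)                # back off for as long as the server asked
        return "retry"
    if 200 <= status < 300:
        return "ok"
    return "fail"

print(handle_response(401, {}))                      # authenticate
print(handle_response(429, {"Retry-After": "0"}))    # retry
```

Because the handler never inspects the response body, it works unchanged against any API that follows the standard.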
All of these status codes are more or less designed with this idea in mind; perhaps it can allow clients to automatically resolve an error, perhaps it will immediately understand whether a cache should be marked stale, or perhaps it can render good default feedback to an end-user.
However, most APIs and clients will only make use of a few of these features. So my general advice would be:
It doesn't really hurt to use the right status code.
Familiarity with the status codes helps you design your API around well-established, common features, the way someone else might build it, instead of reinventing the wheel for your specific use-case.
Don't sweat it if you can't decide whether to return (for example) a 400, 403, 409, or 422 and can't see a practical benefit to each.
If you want to read more, see what one of the major authors of recent HTTP standards has to say about this: https://www.mnot.net/blog/2017/05/11/status_codes
I also wrote a series of blog post about each status along with real-world uses for each: https://evertpot.com/http/

Is circular redirect good option when Retry-After is specified?

I need to notify the client that it should revisit the resource in 30 seconds, because I can't give it a satisfactory response immediately. It's not a typical situation, but it happens from time to time.
Is a 302 redirect pointing back to the same URL the client requested, combined with a Retry-After: 30 header, a good option? Or are such circular redirects always bad?
That kind of redirect looping seems like a bad idea to me.
You could just inform the client that they need to initiate a reload. "Sorry, try again later"
Another option to consider is SignalR. By using this, you can automatically update your UI without reloading the page.
It's a reasonable thing to do; it's the client's responsibility to do loop detection (and HTTP itself talks about this).
James Snell goes into some depth here which might be relevant, depending on exactly what you're trying to do:
http://chmod777self.blogspot.com.au/2013/01/asynchronous-patterns.html
You can respond with 503 Service Unavailable and a Retry-After header that indicates how long until you can provide a satisfactory response. Even though your service is available in a sense, it really isn't, since it can't provide the required response.
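A minimal sketch of that 503 approach using only the standard library. The readiness check (`result_is_ready`) and the response texts are placeholders for whatever your application actually does:

```python
# Sketch: answer 503 + Retry-After while the real response is being prepared.
from http.server import BaseHTTPRequestHandler, HTTPServer

def build_response(ready: bool):
    """Return (status, headers, body) for a request we may not be able to answer yet."""
    if not ready:
        # 503 with Retry-After tells well-behaved clients "come back in 30 seconds"
        return 503, {"Retry-After": "30"}, b"Please retry in 30 seconds.\n"
    return 200, {}, b"Here is your result.\n"

def result_is_ready() -> bool:
    return False  # stand-in for "is the slow result available yet?"

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, headers, body = build_response(result_is_ready())
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)

# To serve: HTTPServer(("", 8000), Handler).serve_forever()
```

Unlike a circular 302, this makes the "not yet" state explicit, and generic clients that understand Retry-After will back off on their own.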

Why shouldn't data be modified on an HTTP GET request?

I know that using non-GET methods (POST, PUT, DELETE) to modify server data is The Right Way to do things. I can find multiple resources claiming that GET requests should not change resources on the server.
However, if a client were to come up to me today and say "I don't care what The Right Way to do things is, it's easier for us to use your API if we can just use call URLs and get some XML back - we don't want to have to build HTTP requests and POST/PUT XML," what business-conducive reasons could I give to convince them otherwise?
Are there caching implications? Security issues? I'm kind of looking for more than just "it doesn't make sense semantically" or "it makes things ambiguous."
Edit:
Thanks for the answers so far regarding prefetching. I'm not as concerned with prefetching, since this mostly concerns internal network API use, not visitable HTML pages with links that a browser could prefetch.
Prefetch: a lot of web browsers use prefetching, which means they load a page before you click on the link, anticipating that you will click it later.
Bots: there are several bots that scan and index the internet for information. They will only issue GET requests. You don't want something deleted by a GET request for this reason.
Caching: GET requests should be safe and idempotent. Safe means they have no side effects; idempotent means that issuing a request once or several times gives the same result. For this reason, GET responses are the ones that caches can freely store.
HTTP standard says so: the HTTP standard defines what each method is for. Many programs are built against that standard and assume you use it the way you are supposed to, so if you don't, you get undefined behavior from a slew of random programs.
How about Google finding a link to that page with all the GET parameters in the URL and revisiting it every now and then? That could lead to a disaster.
There's a funny article about this on The Daily WTF.
GETs can be forced on a user and result in Cross-site Request Forgery (CSRF). For instance, if you have a logout function at http://example.com/logout.php, which changes the server state of the user, a malicious person could place an image tag on any site that uses the above URL as its source: http://example.com/logout.php. Loading this code would cause the user to get logged out. Not a big deal in the example given, but if that was a command to transfer funds out of an account, it would be a big deal.
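A common mitigation for the logout example above can be sketched as follows. This is framework-free and all names are hypothetical (real frameworks bundle this into their session/CSRF machinery): require POST plus a per-session CSRF token, so a forged `<img>` GET can't trigger the action.

```python
# Sketch: block CSRF by requiring POST and a per-session token (illustrative names).
import hmac
import secrets

SESSION_TOKENS = {}  # session_id -> CSRF token (stand-in for real session storage)

def issue_token(session_id: str) -> str:
    """Mint a token, store it in the session, and embed it in the real form."""
    token = secrets.token_hex(16)
    SESSION_TOKENS[session_id] = token
    return token

def logout(method: str, session_id: str, submitted_token: str) -> int:
    """Return the HTTP status we'd send for a logout attempt."""
    if method != "POST":
        return 405  # an <img> tag can only issue GET, so the forgery stops here
    expected = SESSION_TOKENS.get(session_id, "")
    if not hmac.compare_digest(expected, submitted_token):
        return 403  # token missing or wrong: likely a forged request
    return 200      # genuine logout

token = issue_token("alice")
print(logout("GET", "alice", token))   # 405: the forged image-tag request fails
print(logout("POST", "alice", token))  # 200: the real form submission succeeds
```

The method check alone defeats the image-tag trick; the token additionally defeats forged POSTs from attacker-controlled forms.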
Good reasons to do it the right way...
They are industry standard, well documented, and easy to secure. While you fully support making life as easy as possible for the client you don't want to implement something that's easier in the short term, in preference to something that's not quite so easy for them but offers long term benefits.
One of my favourite quotes
Quick and Dirty... long after the
Quick has departed the Dirty remains.
For you this one is a "A stitch in time saves nine" ;)
Security:
CSRF is so much easier in GET requests.
Using POST won't protect you by itself, but GET makes exploitation, and mass exploitation, easier via forums and other places that accept image tags.
Depending on what you do server-side, GET can also help an attacker launch a DoS (denial of service). An attacker can spam thousands of websites with your expensive GET request embedded in an image tag, and every visitor of those sites will then issue that expensive request against your web server, costing you a lot of CPU cycles.
I'm aware that some pages are heavy anyway and this is always a risk, but the risk is bigger if every single GET request adds 10 big records.
Security for one. What happens if a web crawler comes across a delete link, or a user is tricked into clicking a hyperlink? A user should know what they're doing before they actually do it.
I'm kind of looking for more than just "it doesn't make sense semantically" or "it makes things ambiguous."
...
I don't care what The Right Way to do things is, it's easier for us
Tell them to think of the worst API they've ever used. Can they not imagine how that was caused by a quick hack that got extended?
It will be easier (and cheaper) in 2 months if you start with something that makes sense semantically. We call it the "Right Way" because it makes things easier, not because we want to torture you.

Which HTTP status codes do you actually use when developing web applications? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
The HTTP/1.1 specification (RFC2616) defines a number of status codes that can be returned by HTTP server to signal certain conditions. Some of those codes can be utilized by web applications (and frameworks). Which of those codes are the most useful in practice in both classic and asynchronous (XHR) responses, in what situations you use each of them?
Which codes should be avoided, e.g. should applications mess with the 5xx code range at all? What are your conventions when returning HTTP codes in REST web services? Do you ever use redirects other than 302?
The ones I'm using (that I could find with a quick grep 'Status:' anyway):
200 Successfully retrieved a resource without affecting it
201 Sent whenever a form submission puts something significant into the database (forum post, user account, etc.), creating a new resource
204 Sent with empty body, for example after a DELETE
304 HTTP caching. I've found this one is very hard to get right since it has to account for users changing display settings and so on. The best idea I've come up with for that is using a hash of the user's preferences as the ETag. It doesn't help that most browsers have unpredictable and inconsistent behaviour here...
400 Used for bad form submissions that fail some validation check.
403 Used whenever someone is somewhere they shouldn't be (though I try to avoid that by not displaying links to stuff the users shouldn't access).
404 Apart from the normal webserver ones I use these when the URL contains invalid ID numbers. I suppose it'd be a good idea to check in this case whether a higher valid ID exists and send a 410 instead...
429 When a user's requests are too frequent (rate limiting).
500 I tend to put these in catch { } blocks where the only option is to give up, to make sure something meaningful is sent to the browser.
I realise I could get away with simply letting the server send "200" for everything, but they save a lot of pain when users are seeing (or causing) errors and not telling you about them. I've already got functions to display access-denied messages and so on, so it's not much work to add these anyway.
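The 304/ETag idea mentioned above (deriving the validator from a hash of the user's preferences) can be sketched like this. Function and field names are illustrative, and a real implementation would hash everything that affects the rendered page:

```python
# Sketch: ETag derived from user preferences + content version, driving 200 vs 304.
import hashlib
import json

def make_etag(prefs: dict, content_version: int) -> str:
    """Hash everything that affects the rendered page into an opaque validator."""
    payload = json.dumps(prefs, sort_keys=True) + str(content_version)
    return '"' + hashlib.sha256(payload.encode()).hexdigest()[:16] + '"'

def respond(if_none_match, prefs: dict, content_version: int):
    """Return (status, etag): 304 if the client's cached copy is still valid."""
    etag = make_etag(prefs, content_version)
    if if_none_match == etag:
        return 304, etag   # client copy is still valid; send no body
    return 200, etag       # send the full body along with the new ETag

prefs = {"theme": "dark", "posts_per_page": 25}
status, etag = respond(None, prefs, 7)   # first visit: 200 plus an ETag
status2, _ = respond(etag, prefs, 7)     # revisit with If-None-Match: 304
```

Because the preferences feed the hash, a user changing their display settings invalidates the cached copy automatically, which is exactly the hard case the answer describes.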
418: I'm a teapot
From http://www.ietf.org/rfc/rfc2324.txt Hyper Text Coffee Pot Control Protocol (HTCPCP/1.0)
Don't forget about 503 - Service Unavailable. This one is essential for site downtime. Especially where Search Engines are concerned.
Say you're taking the site down for a few hours for maintenance or upgrade work. By directing all requests to a friendly page that returns a 503 code, it tells spiders to "try again later".
If you simply display a "Temporarily Down" page but still return 200 OK, the spider may index your error pages or, worse, replace existing indexing with this "new" content.
This could seriously impact your SEO rankings, especially if you're a large, popular site.
303 See Other is a must for PRG (Post/Redirect/Get), which you should be using now if you aren't already.
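The PRG pattern can be sketched in a few lines. The handler and `save_post` names are hypothetical stand-ins for your framework's routing and persistence: a successful POST is answered with 303 See Other, so a browser refresh re-issues a harmless GET instead of resubmitting the form.

```python
# Sketch of Post/Redirect/Get: 303 after a successful POST (illustrative names).

def save_post(form: dict) -> int:
    """Stand-in persistence step: pretend the database assigned id 42."""
    return 42

def handle_post(form: dict):
    """Answer a form submission with a redirect to the newly created resource."""
    new_id = save_post(form)
    # 303 tells the browser: fetch this Location with GET, regardless of the
    # original method, so a refresh cannot duplicate the submission.
    return 303, {"Location": f"/posts/{new_id}"}

status, headers = handle_post({"title": "hello"})
print(status, headers["Location"])  # 303 /posts/42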
Here are the most common response codes, from my experience:
The response codes in the 1xx-2xx range are typically handled automatically by the underlying webserver (i.e. Apache, IIS), so you don't need to worry about those.
Codes 301 and 302 are typically used for redirects, and 304 is used a lot when the client or proxy already contains a valid copy of the data and does not need a new version from the server (see the RFC for details on exactly how this works).
Code 400 is typically used to indicate that the client sent bad or unexpected data that caused a problem in the server.
Code 401 is for demanding authentication, which is usually handled somewhat automatically by the server configuration; code 403 means the server understood the request but refuses to authorize it.
Code 404 is the error code for a page not found.
Code 500 indicates an error condition in the server that is not necessarily caused by data sent from the client. For example, database connection failures, programming errors, or other unhandled exceptions.
Code 502 is typically seen if you are proxying from a webserver (such as Apache) to an app server (such as Tomcat) in the backend, and the connection cannot be made to the app server.
For asynchronous calls (i.e. AJAX/JSON responses) it's usually safest to always return a 200 response. Even if there was an error in the server while processing the request, it's best to include the error in the JSON object and let the client deal with it that way. The reason is that not all web browsers allow access to the response body for non-200 response codes.
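The always-200 envelope described above can be sketched as follows. The ok/data/error field names are just an illustration, not a standard, and this pattern matters mostly for older XHR clients; modern fetch/XHR can read the bodies of non-2xx responses.

```python
# Sketch: success/failure travels in the JSON body while HTTP stays 200
# (field names are illustrative, not any standard envelope format).
import json

def json_envelope(ok: bool, data=None, error=None) -> str:
    """Wrap a result or an error in a uniform JSON envelope."""
    return json.dumps({"ok": ok, "data": data, "error": error})

print(json_envelope(True, data={"id": 7}))
print(json_envelope(False, error="Validation failed: email is required"))
```

The trade-off is that generic HTTP machinery (caches, monitoring, retry logic) can no longer see failures, which is exactly what the more status-code-driven answers in this thread argue against.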
I tend to use 405 Method Not Allowed when somebody tries to GET a URL that can only be POSTed. Does anyone else do it the same way?
In Aida/Web framework we use just
200 Ok
302 Redirect
404 Not Found
500 Internal Server Error
I basically use all of them, where appropriate. The spec itself defines each code and the circumstances in which they should be used.
When building a RESTful web application, I don't recommend picking and choosing status codes and restricting oneself to a subset of the full range. That is, unless one is building a web application for one specific HTTP client; in which case one isn't really building a web application, one is building an application for that specific client.
At my firm, we have some Flex clients. They can't properly handle status codes other than 200, so we have them send a special parameter with their requests, which tells our servers to always send a 200, even when it's not the proper response.
I've had nightmares about the number 500.
500 Internal Server Error is returned in Aida/Web when your web application raises an exception. Because this is a Smalltalk application, an exception window opens on the server (if you run it headful), while the user gets a 500 and a short stack trace.
On the server you can then open a full debugger and try to solve the problem while the server keeps running, of course. Another nice thing is that the exception window, with its full stack, waits for you until you come around.
Conclusion: 500 is a blessing, not a nightmare!
I'm using 409 instead of 400 for bad user data (i.e. form submissions).
The spec for 409 talks mostly about version conflicts, but it mentions that information on how to fix the issue should be sent in the response. Perfect for malformed-email or wrong-password messages.
400 only addresses syntax issues, which to me sounds like the request doesn't make sense at all, rather than failing some validation check.
I use webmachine, which automagically generates proper error codes. But there are cases when I need to supply my own. I have found it helpful for development and debugging to return 666 in those cases, so I can easily tell which ones come from my code, and which from webmachine. Besides, I get a chuckle out of it when it comes up. You do have to remember to change this before deployment for real.