Related
I feel like this has to be easy to Google, but I can't find it: from the perspective of an HTTP cache, what determines if two requests are equivalent?
I imagine one ingredient is that that their URLs need to be identical; for example, rearranging (but not changing) query string parameters seems to cause a cache miss. Presumably they need to have the same Accept header. What else determines if a request can be served from cache?
This is mostly described in this RFC: https://www.rfc-editor.org/rfc/rfc7234#section-4
Summary:
The method
The full uri
Caching-related headers in response influence whether something got stored.
Any request headers that appeared in the list of the Vary response header.
It also matters whether you are caching for a specific user (for example a browser), or many users (for example a proxy).
I also struggled with this. Changing my google search to use "http cache key" generated better results. Using the URL seems to be the most common. Query strings are also generally included.
https://support.cloudflare.com/hc/en-us/articles/115004290387-Using-Custom-Cache-Keys describes what the default is for cloudflare and a discussion on the impact of using different keys.
Another parameter that could be useful is to identifying the type of assets that you want to cache. Or leave it open (no filtering)
"Authorization" header is specifically mentioned in the HTTP spec (https://www.rfc-editor.org/rfc/rfc7234) and needs to be handled.
Upon further reading, I noticed the section on "Secondary keys" in the standard (https://www.rfc-editor.org/rfc/rfc7234#section-4.1) and the use of "Vary" header in a response. Headers presented in the "Vary" response header have to match in both the original and the new request for the cache to declare it as a match.
And as for the primary key, standard says "The primary cache key consists of the request method and target URI." in https://www.rfc-editor.org/rfc/rfc7234#section-2
There are all the conditional requests for cache control like If-match, If-unmodified-since, If-none-match and If-modified-since. For example If-modified-since works this way: suppose you have already requested a page and now you want to reload it. If the header is present then a new page will be sent back from the server ONLY if it was modified since the date indicated as a value for If-modified-since, otherwise 304(not-modified) status will be returned.
Accept and Accept-* instead are necessary for Content-Negotiation, like in which language the page should be returned.
More on conditional requests here: https://www.rfc-editor.org/rfc/rfc7232#page-13
I want to redirect my users browser using HTTP code 303 to a GET URL that I secure using HMAC. Because the request will come from the users browser, I will not have fore-knowledge of the request headers. So I am generating the HMAC hash using the values of the HTTP method and URL only. For example, the URL I want the browser to do to might be:
GET /download
?name=report.pdf
&include=http://url1
&include=http://url2
This create report.pdf for me, containing the contents of all the urls specified using the include query param.
My HMAC code will change this URL to be
GET /download
?name=report.pdf
&include=http://url1
&include=http://url2
&hmac-algorithm=simple-hmac
&hmac-signature=idhihhoaiDOICNK
I can issue HTTP 303 to the user using this URL, and the user will get their report.pdf.
As I am not including the request headers in the signature, I am wondering two things:
1) Can a would-be attacker take advantage of the fact that I am not signing the request headers?
2) Is there a better way to achieve what I am trying to do?
When I realised that what I am talking about here is a signed URL, I checked the Amazon Docs and found "REST Authentication Example 3: Query String Authentication Example" in this document: http://s3.amazonaws.com/doc/s3-developer-guide/RESTAuthentication.html.
This example is about a signed URL for use through a browser. About signing the headers, the document says:
You know that when the browser makes the GET request, it won't provide a Content-Md5 or a Content-Type header, nor will it set any x-amz- headers, so those parts are all kept empty.
In other words, Amazon leave the headers out of the signature.
Amazon make no mention of potential security holes, so until I hear otherwise (or get hacked :) ), I will assume my approach above is fine.
I am building a web service that exclusively uses JSON for its request and response content (i.e., no form encoded payloads).
Is a web service vulnerable to CSRF attack if the following are true?
Any POST request without a top-level JSON object, e.g., {"foo":"bar"}, will be rejected with a 400. For example, a POST request with the content 42 would be thus rejected.
Any POST request with a content-type other than application/json will be rejected with a 400. For example, a POST request with content-type application/x-www-form-urlencoded would be thus rejected.
All GET requests will be Safe, and thus not modify any server-side data.
Clients are authenticated via a session cookie, which the web service gives them after they provide a correct username/password pair via a POST with JSON data, e.g. {"username":"user#example.com", "password":"my password"}.
Ancillary question: Are PUT and DELETE requests ever vulnerable to CSRF? I ask because it seems that most (all?) browsers disallow these methods in HTML forms.
EDIT: Added item #4.
EDIT: Lots of good comments and answers so far, but no one has offered a specific CSRF attack to which this web service is vulnerable.
Forging arbitrary CSRF requests with arbitrary media types is effectively only possible with XHR, because a form’s method is limited to GET and POST and a form’s POST message body is also limited to the three formats application/x-www-form-urlencoded, multipart/form-data, and text/plain. However, with the form data encoding text/plain it is still possible to forge requests containing valid JSON data.
So the only threat comes from XHR-based CSRF attacks. And those will only be successful if they are from the same origin, so basically from your own site somehow (e. g. XSS). Be careful not to mistake disabling CORS (i.e. not setting Access-Control-Allow-Origin: *) as a protection. CORS simply prevents clients from reading the response. The whole request is still sent and processed by the server.
Yes, it is possible. You can setup an attacker server which will send back a 307 redirect to the target server to the victim machine. You need to use flash to send the POST instead of using Form.
Reference: https://bugzilla.mozilla.org/show_bug.cgi?id=1436241
It also works on Chrome.
It is possible to do CSRF on JSON based Restful services using Ajax. I tested this on an application (using both Chrome and Firefox).
You have to change the contentType to text/plain and the dataType to JSON in order to avaoid a preflight request. Then you can send the request, but in order to send sessiondata, you need to set the withCredentials flag in your ajax request.
I discuss this in more detail here (references are included):
http://wsecblog.blogspot.be/2016/03/csrf-with-json-post-via-ajax.html
I have some doubts concerning point 3. Although it can be considered safe as it does not alter the data on the server side, the data can still be read, and the risk is that they can be stolen.
http://haacked.com/archive/2008/11/20/anatomy-of-a-subtle-json-vulnerability.aspx/
Is a web service vulnerable to CSRF attack if the following are true?
Yes. It's still HTTP.
Are PUT and DELETE requests ever vulnerable to CSRF?
Yes
it seems that most (all?) browsers disallow these methods in HTML forms
Do you think that a browser is the only way to make an HTTP request?
I need to authenticate a client when he sends a request to an API. The client has an API-token and I was thinking about using the standard Authorization header for sending the token to the server.
Normally this header is used for Basic and Digest authentication. But I don't know if I'm allowed to customize the value of this header and use a custom authentication scheme, e.g:
Authorization: Token 1af538baa9045a84c0e889f672baf83ff24
Would you recommend this or not? Or is there a better approach for sending the token?
You can create your own custom auth schemas that use the Authorization: header - for example, this is how OAuth works.
As a general rule, if servers or proxies don't understand the values of standard headers, they will leave them alone and ignore them. It is creating your own header keys that can often produce unexpected results - many proxies will strip headers with names they don't recognise.
Having said that, it is possibly a better idea to use cookies to transmit the token, rather than the Authorization: header, for the simple reason that cookies were explicitly designed to carry custom values, whereas the specification for HTTP's built in auth methods does not really say either way - if you want to see exactly what it does say, have a look here.
The other point about this is that many HTTP client libraries have built-in support for Digest and Basic auth but may make life more difficult when trying to set a raw value in the header field, whereas they will all provide easy support for cookies and will allow more or less any value within them.
In the case of CROSS ORIGIN request read this:
I faced this situation and at first I chose to use the Authorization Header and later removed it after facing the following issue.
Authorization Header is considered a custom header. So if a cross-domain request is made with the Autorization Header set, the browser first sends a preflight request. A preflight request is an HTTP request by the OPTIONS method, this request strips all the parameters from the request. Your server needs to respond with Access-Control-Allow-Headers Header having the value of your custom header (Authorization header).
So for each request the client (browser) sends, an additional HTTP request(OPTIONS) was being sent by the browser. This deteriorated the performance of my API.
You should check if adding this degrades your performance. As a workaround I am sending tokens in http parameters, which I know is not the best way of doing it but I couldn't compromise with the performance.
This is a bit dated but there may be others looking for answers to the same question. You should think about what protection spaces make sense for your APIs. For example, you may want to identify and authenticate client application access to your APIs to restrict their use to known, registered client applications. In this case, you can use the Basic authentication scheme with the client identifier as the user-id and client shared secret as the password. You don't need proprietary authentication schemes just clearly identify the one(s) to be used by clients for each protection space. I prefer only one for each protection space but the HTTP standards allow both multiple authentication schemes on each WWW-Authenticate header response and multiple WWW-Authenticate headers in each response; this will be confusing for API clients which options to use. Be consistent and clear then your APIs will be used.
I would recommend not to use HTTP authentication with custom scheme names. If you feel that you have something of generic use, you can define a new scheme, though. See http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p7-auth-latest.html#rfc.section.2.3 for details.
Kindly try below on postman :-
In header section example work for me..
Authorization : JWT eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyIkX18iOnsic3RyaWN0TW9kZSI6dHJ1ZSwiZ2V0dGVycyI6e30sIndhc1BvcHVsYXRlZCI6ZmFsc2UsImFjdGl2ZVBhdGhzIjp7InBhdGhzIjp7InBhc3N3b3JkIjoiaW5pdCIsImVtYWlsIjoiaW5pdCIsIl9fdiI6ImluaXQiLCJfaWQiOiJpbml0In0sInN0YXRlcyI6eyJpZ25vcmUiOnt9LCJkZWZhdWx0Ijp7fSwiaW5pdCI6eyJfX3YiOnRydWUsInBhc3N3b3JkIjp0cnVlLCJlbWFpbCI6dHJ1ZSwiX2lkIjp0cnVlfSwibW9kaWZ5Ijp7fSwicmVxdWlyZSI6e319LCJzdGF0ZU5hbWVzIjpbInJlcXVpcmUiLCJtb2RpZnkiLCJpbml0IiwiZGVmYXVsdCIsImlnbm9yZSJdfSwiZW1pdHRlciI6eyJkb21haW4iOm51bGwsIl9ldmVudHMiOnt9LCJfZXZlbnRzQ291bnQiOjAsIl9tYXhMaXN0ZW5lcnMiOjB9fSwiaXNOZXciOmZhbHNlLCJfZG9jIjp7Il9fdiI6MCwicGFzc3dvcmQiOiIkMmEkMTAkdTAybWNnWHFjWVQvdE41MlkzZ2l3dVROd3ZMWW9ZTlFXejlUcThyaDIwR09IMlhHY3haZWUiLCJlbWFpbCI6Im1hZGFuLmRhbGUxQGdtYWlsLmNvbSIsIl9pZCI6IjU5MjEzYzYyYWM2ODZlMGMyNzI2MjgzMiJ9LCJfcHJlcyI6eyIkX19vcmlnaW5hbF9zYXZlIjpbbnVsbCxudWxsLG51bGxdLCIkX19vcmlnaW5hbF92YWxpZGF0ZSI6W251bGxdLCIkX19vcmlnaW5hbF9yZW1vdmUiOltudWxsXX0sIl9wb3N0cyI6eyIkX19vcmlnaW5hbF9zYXZlIjpbXSwiJF9fb3JpZ2luYWxfdmFsaWRhdGUiOltdLCIkX19vcmlnaW5hbF9yZW1vdmUiOltdfSwiaWF0IjoxNDk1MzUwNzA5LCJleHAiOjE0OTUzNjA3ODl9.BkyB0LjKB4FIsCtnM5FcpcBLvKed_j7rCCxZddwiYnU
I'm designing a Web service. The request is idempotent, so I chose the GET method. The response is relatively expensive to calculate and not small, so I want to get caching (on the protocol level) right. (Don't worry about memoisation at my part, I have that already covered; my question here is actually also paying attention to the Web as a whole.)
There's only one mandatory parameter and a number of optional parameter with default values if missing. For example, the following two map to the same representation of the response. (If this is a dumb way to go about it the interface, propose something better.)
GET /service?mandatory_parameter=some_data HTTP/1.1
GET /service?mandatory_parameter=some_data;optional_parameter=default1;another_optional_parameter=default2;yet_another_optional_parameter=default3 HTTP/1.1
However, I imagine clients do not know this and would treat them separate and therefore waste cache storage. What should I do to avoid violating the golden rule of caching?
Make up a canonical form, document it (e.g. all parameters are required after all and need to be sorted in a specific order) and return a client error unless the required form is met?
Instead of an error, redirect permanently to the canonical form of a request?
Or is it enough to not mind how the request looks like, and just respond with the same ETag for same responses?
First, don't use semicolons as a delimiter in a query string. You should be using ? to begin a query string and & to delimit variable/value pairs. RFC 3986 doesn't explicitly say you have to use &, but the vast majority of existing code uses this delimiter because of the application/x-www-form-urlencoded precedent.
Second, you're right, in that parameters in a query string result in a different URI, and thus, as far as caches are concerned, a different resource. Assuming you want optimal caching performance, if you know that an optional parameter has been specified, and its inclusion is unnecessary and does not affect the representation that will be transmitted, you should be making a redirect to a canonical representation that omits the parameter. (i.e., An optional parameter is given with a value that is set to the default value. For example, if you have http://example.com:80/, you can normalize to http://example.com/ because 80 is the default value for the port with HTTP. You can do the same for query parameters since you control the URI space.) If you have parameters included (optional or otherwise) that appear in an order other than the canonical order, you should redirect for that too. A 301 redirect would be preferred if you know that the relationship between URIs will be stable. Otherwise, do a 302/307 redirect as appropriate. I would recommend defining your canonical form the same way that OAuth does: Sort each parameter alphabetically, first by key, then by value. Other normalization operations will also help out here. RFC 3986 has an entire section on URI normalization that will be relevant to you. This technique will really only work for GET, and redirects on PUT/POST/DELETE are not generally recommended.
Third, ETags are great, and they provide a huge performance improvement if implemented well by both the client and server. However, it's unfortunately rare for both sides to do it right. Ditto for Last-Modified. You should pursue these, because the CPU and bandwidth savings are significant when it works, but they are not sufficient on their own. Other headers like Cache-Control are also frequently necessary. It's worth familiarizing yourself with Section 13 of RFC 2616 if you're planning on going into great detail on this stuff.
Finally, a word of warning — there is an issue with these redirects you need to be aware of: Clients trying to access your resources may frequently be redirected to other locations. This introduces overhead that only gives you an overall savings if the clients make subsequent requests against the same resource, maintaining state to avoid the subsequent redirect. Unless you've open-sourced a reference client implementation that takes advantage of your caching optimizations, you may never benefit from these tweaks.
I would pick option (2) in your list - I would make the request RESTful, rather than RPC like.
I.e. in this case, if you make all of the parameters parts of the request path:
/service/mandatory_parameter/some_data/optional_parameter/default1/another_optional_parameter/default2/yet_another_optional_parameter/default3
In the case where not all of the optional parameters are specified, return a 301 (Permanent redirect) to the full resource name with the defaults filled in. This will (or should) be cached by clients and web caches appropriately, and even if it gets to your backend then making the 301 should be very cheap for you.
At which point, you have one canonical form for the URI, and caching will work as normal/expected.
This does mean that every combination of parameters will be cached separately (as a 301), however that's fine really as the non-canonical requests will have an independent cache policy to the full request and clients which are worried about the extra round trip can fill in all the parameters themselves.
Your option (3) won't work as you expect - each form will be cached independently as they're different URIs.
It should also be noted that a lot of downstream caches / software won't cache your response at all due to the query parameters, which is why I suggest turning it into a 'proper' resource..
First it's a good thing you choice GET since other methods don't have as good caching support. As far as I know browsers do cache URIs with respect to the parameters so I don't think It's a good idea to use a canonical form.
One thing that you don't state here is how this service is going to be used. If those requests are made from a browser (and it looks to me that those are probably issued from a script) requests will probably look the same even if they are asked for more than once. So make sure that whatever generate the URI end up with the same URI for equal input data (remove default parameters or always include them).
When it comes to the ETag I recommend you to have this, though I would like to clarify how it works; You get the request, you process all your "expensive calculations" and then if there were a If-None-Match header with the same hash (ETag) as your processed response you may return 304 Not-Modified. So ETag is used to avoid transmitting the response if the client already have it. (Sure you may implement caching on server-side, but this is better to do based on input parameters).
To further improve cache hits on client side you may want to set proper caching headers in you response.
I asked almost the same question for me some month ago. My answer I describe on an example of my realization.
On the server side I have WFC service which receive requests in one of the following forms
GET /Service/RequestedData?param1=data1¶m2=data2…
GET /Service/RequestedData/IdOfData?param1=data1¶m2=data2…
PUT /Service/RequestedData/IdOfData // with param1=data1¶m2=data2… in body
POST /Service/RequestedData/IdOfData // with param1=data1¶m2=data2… in body
DELETE /Service/RequestedData/IdOfData
So requests are in REST for, but GET requests have some optional parameters. Especially this part is a port of your interest.
Because WFC support a URL templates, the prototype of functions which reply to a client request looks like
[WebGet (UriTemplate = "RequestedData?param1={myParam1}¶m2={myParam2}",
ResponseFormat = WebMessageFormat.Json)]
[OperationContract]
MyResult GetData (string myParam1, int myParam2);
All requests like
GET /Service/RequestedData?param1=¶m2=data2
GET /Service/RequestedData?param2=data2¶m1=
GET /Service/RequestedData?param2=data2
will be mapped to the same call from the side of my WCF service. So I have one problem less.
Now at the beginning of implementation of every method which response to HTTP GET request I set in the HTTP header "Cache-Control: max-age=0". It means that client always try to verify client browser cache and no ajax requests will be not easy responded from the local cache like it can do Internet Explorer.
Next I calculate always an ETag based on my data. The exact algorithm is a subject of separate discussion, but important is, that in all responses to HTTP GET requests exist ETag in the HTTP header.
So clients every time verify his local cache and send GET request to server. They send the ETag, which come from its local cache, inside of "If-None-Match" HTTP header. Server computes the ETag which has data, which will be sending back to this GET request. It ETag of data is the same as in the client request server send back response with empty body and the code "304 Not Modified" back. In this case browser gives data from the local cache.
If the same client from a unknown reason create a new version of URL request, which will be interpret from the web browser as a new URL, then web browser will not find old server response in the local cache and send one more time the same request to the server. Is it a real problem? The server send the data one more time. If you have a server side caching you can makes a little more optimization. In the most cases, the URL of GET requests will be produced by a client side JavaScript so you will be no time have such situation.
Calculation of ETag and setting of "Cache-Control: max-age=0" and Etag header as well as setting "304 Not Modified" code should do WFC service, but it is very easy.
The most important is that my implementation of ETag calculation is not as expansive as getting the whole data from the database server and calculation MD5 cache from there. I use permanently rowversion data type in every row of data in the SQL Server database. This rowversion is nothing other as a counter of changes in the database. If one change a row of data rowversion value in the corresponding row will be incremented. So if one makes SELECT statement from maximum value of rowversion value, and this value is not changed comparing with the previous requests, one can be sure that the data were not changed in the time period. The algorithm of calculation of ETag should be only sensitive to deleting of data from the table. But it is also a solved problem. A little more about this you can read in Concurrency handling of Sql transactrion.
I don’t want suggest my ETag calculation as a best choice, I want only say, that calculation of ETag can be much cheaper as calculation MD5 from the whole data.
In case of errors Server throws an exception which will be mapped to a HTTP code, which I define in the throw statement. As a body WFC sends a standard JSON object {"description":"My error text"}. A custom error object is also possible (see Is WebProtocolException included in .net 4.0?). On the client side I use jQuery and in the corresponding jQuery.ajax inside of error event handler the error message will be decoded and displayed to the user.
So my recommendation: usage of ETag together with "Cache-Control: max-age=0" for all HTTP GET requests. For all other requests I’ll recommend you implement RESTfull service. For the error implementation you should look at the most native way which is supported by the software used for server and client implementation and use this.
UPDATED: To clear the URL structure I should add following. In my service the main part like GET /Service/RequestedData/IdOfData describes data objects requested. Parameters param1=data1¶m2=data2 corresponds mostly the information about sorting, paging and filtering of data. I use active jqGrid plugin for jQuery and if the end-user scroll in the grid to the next page, click on the column header (sorting of data) or if he set a filter with respect of searching feature, all these follows to different optional parameters appended the main URL.