I'm using playframework and nginx. playframework may add following cookies to http response: PLAY_SESSION, PLAY_FLASH, PLAY_LANG.
I want to make sure that only above cookies (PLAY_*) are allowed in nginx level. If there are other cookies (let's say they're added accidentally) they should be removed by nginx.
How can I allow only predefined cookies in http response in nginx?
PS: If it's not possible to solve this issue in nginx, I need to fix by using playframework.
How cookies work?
First, let's establish what's cookies — they're little pieces of "sticky" hidden information that lets you keep state on your web-site for a given User-Agent. These cookies are often used for tracking users, keeping session and storing minor preference information for the site.
Set-Cookie HTTP response header (from server to client)
Cookies can be set by the server through the Set-Cookie response header (with a separate header for each cookie), or, after the page has already been transferred from the server to the client, through JavaScript.
Note that setting cookies is a pretty complex job — they have expiration dates, http/https settings, path etc — hence the apparent necessity to use a separate Set-Cookie header for each cookie.
This requirement to have a separate header is usually not an issue, since cookies aren't supposed to be modified all that often, as they usually store very minimal information, like a session identifier, with the heavy-duty information being stored in an associated database on the server.
Cookie HTTP request header (from client to server)
Regardless how they were first set, cookies would then included in eligible subsequent requests to the server by the client, using the Cookie request header, with a whole list of eligible cookies in one header.
Note that, as such, these cookies that are sent by the client back to the server is a simple list of name and attribute pairs, without any extra information about the underlying cookies that store these attributes on the client side (e.g., the expiration dates, http/https setting and paths are saved by the client internally, but without being revealed in subsequent requests to the server).
This conciseness of the Cookie request header field is important, because, once set, eligible cookies will be subsequently included in all forthcoming requests for all resources with the eligible scheme / domain / path combination.
Caching issues with cookies.
The normal issue of using cookies, especially in the context of acceleration and nginx, is that:
cookies invalidate the cache by default (e.g., unless you use proxy_ignore_headers Set-Cookie;),
or, if you do sloppy configuration, cookies could possibly spoil your cache
e.g., through the client being able to pass cookies to the upstream in the absence of proxy_set_header Cookie "";,
or, through the server insisting on setting a cookie through the absence of proxy_hide_header Set-Cookie;.
How nginx handles cookies?
Cookie from the client
Note that nginx does support looking through the cookies that the client sends to it (in the Cookie request header) through the $cookie_name scheme.
If you want to limit the client to only be sending certain cookies, you could easily re-construct the Cookie header based on these variables, and send only whichever ones you want to the upstream (using proxy_set_header as above).
Or, you could even make decisions based on the cookie to decide which upstream to send the request to, or to have a per-user/per-session proxy_cache_key, or make access control decisions based on the cookies.
Set-Cookie from the backend
As for the upstream sending back the cookies, you can, of course, decide to block it all to considerably improve the caching characteristics (if applicable to your application, or parts thereof), or fix up the domain and/or path with proxy_cookie_domain and/or proxy_cookie_path, respectively.
Otherwise, it's generally too late to make any other routing decision — the request has already been processed by the selected upstream server, and the response is ready to be served — so, naturally, there doesn't seem to be a way to look into these individual Set-Cookie cookies through normal means in nginx (unless you want to go third-party modules, or lua, or perl), since it'd already be too late to make any important routing decisions for a completed request.
Basically, these Set-Cookie cookies have more to do with the content than with the way it is served or routed, so, it doesn't seem appropriate to have integrated functionality to look into them through nginx.
(If you do need to make routing decisions after the completion of the request, then nginx does support X-Accel-Redirect, as well as some other special headers.)
If your issue is security, then, as I've pointed out above, the upstream developer can already use JavaScript to set ANY extra cookies however they want, so, effectively, trying to use nginx to limit some, but not all, Set-Cookie responses from the server is kind of a pointless endeavour in the real world (as there is hardly any difference between the cookies set through JavaScript compared to Set-Cookie).
In summary:
you can easily examine and reconstruct the Cookie header sent by the client to the server before passing it over to the backend, and only include the sanctioned cookies in the request to upstream backend,
but, unless you want to use lua/perl, or have your own nginx module (as well as possibly quarantine the JavaScript from the pages you serve), then you cannot pass only certain Set-Cookie headers back from the upstream backend to the client with a stock nginx.conf — with the Set-Cookie headers, it's an all-or-nothing situation, and there doesn't seem to be a good-enough use-case for a distinct approach.
For an Nginx solution it might be worth asking over at serverfault. Here is a potential solution via Play Framework.
package filters
import javax.inject._
import play.api.mvc._
import scala.concurrent.ExecutionContext
#Singleton
class ExampleFilter #Inject()(implicit ec: ExecutionContext) extends EssentialFilter {
override def apply(next: EssentialAction) = EssentialAction { request =>
next(request).map { result =>
val cookieWhitelist = List("PLAY_SESSION", "PLAY_FLASH", "PLAY_LANG")
val allCookies = result.newCookies.map(c => DiscardingCookie(c.name))
val onlyWhitelistedCookies = result.newCookies.filter(c => cookieWhitelist.contains(c.name))
result.discardingCookies(allCookies: _*).withCookies(onlyWhitelistedCookies: _*)
}
}
}
This solution utilizes Filters and Result manipulation. Do test for adverse effects on performance.
Related
Does/should server code run when the server return status code 304?
I understand that the server should not return anything (client should use the cache), but I cant find any info on whether the server will executed the code in an api endpoint for example.
RFC 2616 section 10.3.5 describes the 304 Not Modified response:
If the client has performed a conditional GET request and access is
allowed, but the document has not been modified, the server SHOULD
respond with this status code. The 304 response MUST NOT contain a message-body
The server will send Date, ETag and/or Content-Location (200 only), or Expires, Cache-Control, and/or Vary if the respond may vary.
What is a Conditional GET?
An cliet application, browser, or proxy with a retained Last-Modified or Etag value will issue a Conditional GET as an initial header only. This allows the client to determine if the resource has been updated.
How does the client know if the resource changed?
Well it depends on how the server is configured.
The Origin Server May:
Ignore caching, serve every request new.
Similar to ignoring, you may develop the application to change query strings. This prevents caching at Proxy Servers and invalidates the client cache.
If configured to do so, issue a Last-Modified or Etag value. Often done for static content. Proxy Servers and Client Caches use to invalidate their version.
A Web application could issue a Last-Modified far
into the future then change the URL to invalidate stale content. This requires the application be developed with this feature in mind.
Resources may also be issued version numbers. This allows them to preserve Proxy Caches but invalidate Client Caches.
Does server code run when the server return status code 304?
Almost Never. Disregarding that it is technically possible for an incorrectly configured application to respond with a 304 Not Modified code instead of a 200 code.
With a ETag and/or Content-Location value, a server (nginx for example) can confirm nothing has changed without issued a call to the application. This also neatly handles resources with version numbers the same way.
For query strings (image.jpg?version=12), the client cache will invalidate the content. A Proxy Server will also invalidate, and the query will be requested fresh.
I understand that the server should not return anything (client should use the cache), but I cant find any info on whether the server will executed the code in an api endpoint for example.
I'm a fan of nginx, here is a good resource on how caching applies to it.
In short, as much as you can do to support various caches between your client and the application the more requests you can support per day.
I was thinking about asking on Software Recommendations, but then I've found out that it may be a too strange request and it needs some clarification first.
My points are:
Each response contains an etag
which is a hash of the content
and which is globally unique (with sufficient probability)
The content is (mostly) dynamic and may change anytime (expires and max-age headers are useless here).
The content is partly user-dependent, as given by the permissions (which itself change sometimes).
Basically, the proxy should contain a cache mapping the etag to the response content. The etag gets obtained from the server and in the most common case, the server does not deal with the response content at all.
It should go like follows: The proxy always sends a request to the server and then either
1 the server returns only the etag and the proxy makes a lookup based on it and
1.1 on cache hit,
it reads the response data from cache
and sends a response to the client
1.2 on cache miss,
it asks the server again and then
the server returns the response with content and etag,
the proxy stores it in its cache
and sends a response to the client
2 or the server returns the response with content and etag,
the proxy stores the data in its cache
and sends a response to the client
For simplicity, I left out the handling of the if-none-match header, which is rather obvious.
My reason for this is that the most common case 1.1 can be implemented very efficiently in the server (using its cache mapping requests to etags; the content isn't cached in the server), so that most requests can be handled without the server dealing with the response content. This should be better than first getting the content from a side cache and then serving it.
In case 1.2, there are two requests to the server, which sounds bad, but is no worse than the server asking a side cache and getting a miss.
Q1: I wonder, how to map the first request to HTTP. In case 1, it's like a HEAD request. In case 2, it's like GET. The decision between the two is up to the server: If it can serve the etag without computing the content, then it's case 1, otherwise, it's case 2.
Q2: Is there a reverse proxy doing something like this? I've read about nginx, HAProxy and Varnish and it doesn't seem to be the case. This leads me to Q3: Is this a bad idea? Why?
Q4: If not, then which existing proxy is easiest to adapt?
An Example
A GET request like /catalog/123/item/456 from user U1 was served with some content C1 and etag: 777777. The proxy stored C1 under the key 777777.
Now the same request comes from user U2. The proxy forwards it, the server returns just etag: 777777 and the proxy is lucky, finds C1 in its cache (case 1.1) and sends it to U2. In this example, neither the clients not the proxy knew the expected result.
The interesting part is how could the server know the etag without computing the answer. For example, it can have a rule stating that requests of this form return the same result for all users, assuming that the given user is allowed to see it. So when the request from U1 came, it computed C1 and stored the etag under the key /catalog/123/item/456. When the same request came from U2, it just verified that U2 is permitted to see the result.
Q1: It is a GET request. The server can answer with an "304 not modified" without body.
Q2: openresty (nginx with some additional modules) can do it, but you will need to implement some logic yourself (see more detailed description below).
Q3: This sounds like a reasonable idea given the information in your question. Just some food for thought:
You could also split the page in user-specific and generic parts which can be cached independently.
You shouldn't expect the cache to keep the calculated responses forever. So, if the server returns a 304 not modified with etag: 777777 (as per your example), but the cache doesn't know about it, you should have an option to force re-building the answer, e.g. with another request with a custom header X-Force-Recalculate: true.
Not exactly part of your question, but: Make sure to set a proper Vary header to prevent caching issues.
If this is only about permissions, you could maybe also work with permission infos in a signed cookie. The cache could derive the permission from the cookie without asking the server, and the cookie is tamper proof due to the signature.
Q4: I would use openresty for this, specifically the lua-resty-redis module. Put the cached content into a redis key-value-store with the etag as key. You'd need to code the lookup logic in Lua, but it shouldn't be more than a couple of lines.
MDN says, when the credentials like cookies, authorisation header or TLS client certificates has to be exchanged between sites Access-Control-Allow-Crendentials has to be set to true.
Consider two sites A - https://example1.xyz.com and another one is B- https://example2.xyz.com. Now I have to make a http Get request from A to B. When I request B from A I am getting,
"No 'Access-Control-Allow-Origin' header is present on the requested
resource. Origin 'http://example1.xyz.com' is therefore not allowed
access."
So, I'm adding the following response headers in B
response.setHeader("Access-Control-Allow-Origin", request.getHeader("origin"));
This resolves the same origin error and I'm able to request to B. When and why should I set
response.setHeader("Access-Control-Allow-Credentials", "true");
When I googled to resolve this same-origin error, most of them recommended using both headers. I'm not clear about using the second one Access-Control-Allow-Credentials.
When should I use both?
Why should I set Access-Control-Allow-Origin to origin obtained from request header rather than wildcard *?
Please quote me an example to understand it better.
Allow-Credentials would be needed if you want the request to also be able to send cookies. If you needed to authorize the incoming request, based off a session ID cookie would be a common reason.
Setting a wildcard allows any site to make requests to your endpoint. Setting allow to origin is common if the request matches a whitelist you've defined. Some browsers will cache the allow response, and if you requested the same content from another domain as well, this could cause the request to be denied.
Setting Access-Control-Allow-Credentials: true actually has two effects:
Causes the browser to actually allow your frontend JavaScript code to access the response if credentials are included
Causes any Set-Cookie response header to actually have the effect of setting a cookie (the Set-Cookie response header is otherwise ignored)
Those effects combine with the effect that setting XMLHttpRequest.withCredentials or credentials: 'include' (Fetch API) have of causing credentials (HTTP cookies, TLS client certificates, and authentication entries) to actually be included as part of the request.
https://fetch.spec.whatwg.org/#example-cors-with-credentials has a good example.
Why should I set Access-Control-Allow-Origin to origin obtained from request header rather than wildcard *?
You shouldn’t unless you’re very certain what you’re doing.
It’s actually safe to do if:
The resource for which you’re setting the response headers that way is a public site or API endpoint intended to be accessible by everyone, and
You’re just not setting cookies that could enable an attacker to get access to sensitive information or confidential data.
For example, if your server code is just setting cookies just for the purpose of saving application state or session state as a convenience to your users, then there’s no risk in taking the value of the Origin request header and reflecting/echoing it back in the Access-Control-Allow-Origin value while also sending the Access-Control-Allow-Credentials: true response header.
On the other hand, if the cookies you’re setting expose sensitive information or confidential data, then unless you’re really certain you have things otherwise locked down (somehow…) you really want to avoid reflecting the Origin back in the Access-Control-Allow-Origin value (without checking it on the server side) while also sending Access-Control-Allow-Credentials: true.
If you do that, you’re potentially exposing sensitive information or confidential data in way that could allow malicious attackers to get to it. For an explanation of the risks, read the following:
https://web-in-security.blogspot.jp/2017/07/cors-misconfigurations-on-large-scale.html
http://blog.portswigger.net/2016/10/exploiting-cors-misconfigurations-for.html
And if the resource you’re sending the CORS headers for is not a public site or API endpoint intended to be accessible by everyone but is instead inside an intranet or otherwise behind some IP-address-restricted firewall, then you definitely really want to avoid combining Access-Control-Allow-Origin-reflects-Origin and Access-Control-Allow-Credentials: true. (In the intranet case you almost always want to only be allowing specific hardcoded/whitelisted origins.)
I have a route that uses an authentication cookie set by another route. I've created it like this:
This method no longer works in the new version. Paw complains that there is no set-cookie header in the response from the Authenticate request.
It seems this is because Paw now takes the cookies and handles them differently from other headers. I like this approach because it should make this sort of authentication easier, but, unfortunately, it's not working like I would expect.
Here's how I have configured a newer request:
So, I've set the cookie header to the Response Cookies dynamic value which, I believe, should pass along the cookies set previously. I would think I should select the Authenticate request from the dropdown (since it's the response from this request that actually sets the cookie, but the cookie value disappears if I do that. Instead, I have left the request value as Current Request since that seems to contain the correct value.
I've also noticed the Automatically send cookies setting which I thought might be an easy solution. I removed the manual cookie header from my request leaving this checked in hopes it might automatically send over any cookies from the cookie jar along with the request, but that doesn't seem to work either. No matter what I try, my request fails to produce the desired results because of authentication.
Can you help me understand how to configure these requests so that I can continue using Paw to test session-authenticated routes?
Here are a couple of things that will make you understand how cookies work in Paw (starting from version 2.1):
1. Cookies are stored in Jars
To allow users to keep multiple synchronous sessions, cookies are stored in jars, so you can switch between sessions (jars) easily.
Cookies stored in jars will be sent only if they match the request (hostname, path, is secure, etc.).
2. Cookies from jars are sent by default, unless Cookie header is overridden
If you set a Cookie header manually, cookies stored in jars wont' be sent.
And obviously unless Automatically Send Cookies is disabled.
3. The previous use of "Response Headers" was hacky. Use Response Cookies.
In fact, Set-Cookie (for responses) and Cookie (for requests) have different syntaxes. So you can't send back the original value of Set-Cookie (even though it seemed to be working in most cases).
The new Response Cookies dynamic value you mentioned has this purpose: send back the cookies set by a specific request.
Now, in your case, I would use a Response Cookies dynamic value all the time. As you have only 1 request doing auth / cookie setting, it might be the easiest to handle. Also, maybe check Ignore Domain, Path, Is Secure and Date to make sure your cookie is always sent even if you switch the host (or something else).
Deleting the cookies in the cookie jar and hitting the authentication endpoint again seems to have fixed the problem. Not sure why, but it seems to be working now either manually sending a cookie or using the Automatically send cookies setting.
I've looked around but haven't been able to figure out if I should use both an ETag and an Expires Header or one or the other.
What I'm trying to do is make sure that my flash files (and other images and what not only get updated when there is a change to those files.
I don't want to do anything special like changing the filename or putting some weird chars on the end of the url to make it not get cached.
Also, is there anything I need to do programatically on my end in my PHP scripts to support this or is it all Apache?
They are slightly different - the ETag does not have any information that the client can use to determine whether or not to make a request for that file again in the future. If ETag is all it has, it will always have to make a request. However, when the server reads the ETag from the client request, the server can then determine whether to send the file (HTTP 200) or tell the client to just use their local copy (HTTP 304). An ETag is basically just a checksum for a file that semantically changes when the content of the file changes.
The Expires header is used by the client (and proxies/caches) to determine whether or not it even needs to make a request to the server at all. The closer you are to the Expires date, the more likely it is the client (or proxy) will make an HTTP request for that file from the server.
So really what you want to do is use BOTH headers - set the Expires header to a reasonable value based on how often the content changes. Then configure ETags to be sent so that when clients DO send a request to the server, it can more easily determine whether or not to send the file back.
One last note about ETag - if you are using a load-balanced server setup with multiple machines running Apache you will probably want to turn off ETag generation. This is because inodes are used as part of the ETag hash algorithm which will be different between the servers. You can configure Apache to not use inodes as part of the calculation but then you'd want to make sure the timestamps on the files are exactly the same, to ensure the same ETag gets generated for all servers.
Etag and Last-modified headers are validators.
They help the browser and/or the cache (reverse proxy) to understand if a file/page, has changed, even if it preserves the same name.
Expires and Cache-control are giving refresh information.
This means that they inform, the browser and the reverse in-between proxies, up to what time or for how long, they may keep the page/file at their cache.
So the question usually is which one validator to use, etag or last-modified, and which refresh infomation header to use, expires or cache-control.
Expires and Cache-Control are "strong caching headers"
Last-Modified and ETag are "weak caching headers"
First the browser check Expires/Cache-Control to determine whether or not to make a request to the server
If have to make a request, it will send Last-Modified/ETag in the HTTP request. If the Etag value of the document matches that, the server will send a 304 code instead of 200, and no content. The browser will load the contents from its cache.
Another summary:
You need to use both. ETags are a "server side" information. Expires are a "Client side" caching.
Use ETags except if you have a load-balanced server. They are safe and will let clients know they should get new versions of your server files every time you change something on your side.
Expires must be used with caution, as if you set a expiration date far in the future but want to change one of the files immediatelly (a JS file for instance), some users may not get the modified version until a long time!
One additional thing I would like to mention that some of the answers may have missed is the downside to having both ETags and Expires/Cache-control in your headers.
Depending on your needs it may just add extra bytes in your headers which may increase packets which means more TCP overhead. Again, you should see if the overhead of having both things in your headers is necessary or will it just add extra weight in your requests which reduces performance.
You can read more about it on this excellent blog post by Kyle Simpson: http://calendar.perfplanet.com/2010/bloated-request-response-headers/
In my view, With Expire Header, server can tell the client when my data would be stale, while with Etag, server would check the etag value for client' each request.
ETag is used to determine whether a resource should use the copy one. and Expires Header like Cache-Control is told the client that before the cache decades, client should fetch the local resource.
In modern sites, There are often offer a file named hash, like app.98a3cf23.js, so that it's a good practice to use Expires Header. Besides this, it also reduce the cost of network.
Hope it helps ;)
Etag is a hash for indicating the version of the resource. When the server returns data, it hashes the data and set this hash value under ETAG. When you send a "PUT" request to the server to update a record, maybe simultaneously another user made the same "PUT" request and its request has been processed. The server will check your "PUT" data and will see that it is the same update so it wont make another update, it will send you the updated data (by another user) and you will update your cache.
when the time for caching expires, the browser automatically makes a new request to get the fresh data. That is why "Expires" header is used
If a response includes both an Expires header and a max-age directive,
the max-age directive overrides the Expires header, even if the
Expires header is more restrictive. This rule allows an origin server
to provide, for a given response, a longer expiration time to an
HTTP/1.1 (or later) cache than to an HTTP/1.0 cache. This might be
useful if certain HTTP/1.0 caches improperly calculate ages or
expiration times, perhaps due to desynchronized clocks.