How to send a response with HAProxy without passing the request to web servers

The server is receiving thousands of OPTIONS requests due to CORS (Cross-Origin Resource Sharing). Right now, every OPTIONS request is sent to one of the servers, which is wasteful, since HAProxy can add the CORS headers itself without the help of a web server.
frontend https-in
...
use_backend cors_headers if METH_OPTIONS
...
backend cors_headers
rspadd Access-Control-Allow-Origin:\ https://www.example.com
rspadd Access-Control-Max-Age:\ 31536000
However, for this to work I need to specify at least one live server in the cors_headers backend, and that server will still receive the requests.
How can I handle the request in the backend without specifying any servers? How can I stop the propagation of the request to servers, while sending the response to the browser and keeping the connection alive?

Edit for HAProxy 2.2 and above: if you need to support a whitelist of origins, Lua scripts can now generate the entire response without passing the request to a backend server. A sample Lua script with simple integration instructions can be found here: https://github.com/haproxytech/haproxy-lua-cors
The only way to do this in HAProxy 1.5.14 is to manually trigger the 503 error (no servers available to handle the request) and set the error page to a file containing the custom CORS headers.
backend cors_headers
errorfile 503 /path/to/custom/file.http
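The file.http should contain the complete raw response (status line plus the desired headers) and 2 empty lines at the end. The <REMOVE THIS LINE COMPLETELY> marker below only shows where the file ends and is not part of it: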
HTTP/1.1 200 OK
Access-Control-Allow-Origin: https://www.example.com
Access-Control-Max-Age: 31536000
Content-Length: 0
Cache-Control: private
<REMOVE THIS LINE COMPLETELY>
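If you generate file.http from a script rather than by hand, the two things to get right are the line endings and the terminating empty line. Below is a minimal Python sketch, assuming CRLF line endings (the line terminator HTTP/1.1 specifies); the helper names `build_cors_errorfile` and `write_cors_errorfile` are hypothetical:

```python
from pathlib import Path


def build_cors_errorfile(origin: str, max_age: int = 31536000) -> bytes:
    """Return a raw HTTP/1.1 200 response for use as an HAProxy errorfile.

    Every line ends with CRLF, and the header block is closed by an
    empty line, as an HTTP message requires.
    """
    lines = [
        "HTTP/1.1 200 OK",
        f"Access-Control-Allow-Origin: {origin}",
        f"Access-Control-Max-Age: {max_age}",
        "Content-Length: 0",
        "Cache-Control: private",
    ]
    # CRLF after every header line, plus one more CRLF for the empty line.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def write_cors_errorfile(path: str, origin: str) -> None:
    """Write the response to `path` (a hypothetical location) in binary mode,
    so no platform newline translation can turn CRLF into something else."""
    Path(path).write_bytes(build_cors_errorfile(origin))
```

Point `errorfile 503` at the written file. The sketch emits the single empty line that ends the header block; the answer above recommends two, so append another "\r\n" if you want to match it exactly.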
This "method" has a couple of limitations:
there is no way to check the origin before sending the CORS headers, so you will either have to hard-code the allowed origin (Access-Control-Allow-Origin accepts a single origin or *) or allow all origins
lack of dynamic headers: you can't do
http-response set-header Date %[date(),http_date]
or set an Expires header.
Note: if you update the HTTP file over time, you will have to restart HAProxy to apply the changes, because errorfiles are read together with the configuration and kept in memory. Either a graceful or a hard restart works; in both cases the new file will be loaded, cached and served immediately.

Good news: HAProxy 2.2 introduced the "Native Response Generator" feature. It works with the http-request return directive and can serve static files or text strings, including dynamic parameters.
The goal is to avoid the usual hacks with errorfile.
Taking advantage of another directive introduced in version 2.2 (http-after-response), the OP's goal can be achieved with the following:
backend cors_headers
# http-response won't work here as the response is generated by HAProxy
http-after-response set-header Access-Control-Allow-Origin \
"%[req.hdr(Origin)]"
http-after-response set-header Access-Control-Max-Age "31536000"
http-request return status 200 content-type "text/plain" string "" if TRUE
The set-header and http-request return can be made conditional with an if clause based on the request headers or origin, depending on your needs (see the doc for examples).
With this technique, the headers and the response body can use variables:
http-request return status 200 content-type "text/plain" \
lf-string "Hello, you are: %[src]"
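The if-clause mentioned above is what turns the unconditional echo of the Origin header into an allowlist; in HAProxy itself you would typically express it as an ACL on req.hdr(Origin). The decision that condition has to make is sketched below in Python for illustration only. `ALLOWED_ORIGINS` and `cors_headers_for` are hypothetical names, and adding Vary: Origin is an extra assumption (it keeps caches from reusing a response across origins):

```python
from typing import Dict, Optional

# Hypothetical allowlist; replace with your own origins.
ALLOWED_ORIGINS = frozenset({
    "https://www.example.com",
    "https://app.example.com",
})


def cors_headers_for(origin: Optional[str],
                     max_age: int = 31536000) -> Dict[str, str]:
    """Return the CORS headers to add for a request's Origin header.

    The Origin is echoed back only when it is on the allowlist, because
    Access-Control-Allow-Origin holds a single origin (or "*"), not a list.
    """
    if origin is None or origin not in ALLOWED_ORIGINS:
        return {}  # no CORS headers: the browser will refuse the response
    return {
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Max-Age": str(max_age),
        # The response now depends on Origin, so caches must key on it.
        "Vary": "Origin",
    }
```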

This should work in most versions of HAProxy:
backend cors_headers
http-request deny deny_status 200
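Whichever approach you choose, it helps to check from the outside that OPTIONS requests are answered with the expected headers. A minimal sketch using Python's standard http.client; `send_preflight` is a hypothetical helper, and the host, port and path in the usage note are placeholders:

```python
import http.client
from typing import Dict, Tuple


def send_preflight(host: str, port: int, path: str, origin: str,
                   method: str = "POST",
                   use_tls: bool = False) -> Tuple[int, Dict[str, str]]:
    """Send a CORS preflight (OPTIONS) request and return (status, headers)."""
    conn_cls = (http.client.HTTPSConnection if use_tls
                else http.client.HTTPConnection)
    conn = conn_cls(host, port, timeout=5)
    try:
        conn.request("OPTIONS", path, headers={
            "Origin": origin,
            # Browsers send this header on every preflight request.
            "Access-Control-Request-Method": method,
        })
        resp = conn.getresponse()
        resp.read()  # drain the (usually empty) body
        # Header names are case-insensitive, so normalize them to lower case.
        headers = {name.lower(): value for name, value in resp.getheaders()}
        return resp.status, headers
    finally:
        conn.close()
```

For example, `send_preflight("www.example.com", 443, "/", "https://www.example.com", use_tls=True)` should return status 200 and an `access-control-allow-origin` entry if the backend is configured as intended.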

Related

Which changes do a browser make when using an HTTP Proxy?

Imagine a web browser that makes an HTTP request to a remote server, such as site.example.com.
If the browser is then configured to use a proxy server, let's call it proxy.example.com, on port 8080, in what ways is the request now different?
Obviously the request is now sent to proxy.example.com:8080, but there must surely be other changes to enable the proxy to make a request to the original url?
RFC 7230 - Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing, Section 5.3.2. absolute-form:
When making a request to a proxy, other than a CONNECT or server-wide OPTIONS request (as detailed below), a client MUST send the target URI in absolute-form as the request-target.
absolute-form = absolute-URI
The proxy is requested to either service that request from a valid cache, if possible, or make the same request on the client's behalf to either the next inbound proxy server or directly to the origin server indicated by the request-target. Requirements on such "forwarding" of messages are defined in Section 5.7.
An example absolute-form of request-line would be:
GET http://www.example.org/pub/WWW/TheProject.html HTTP/1.1
So, without a proxy, the connection is made to www.example.org:80 and the request is:
GET /pub/WWW/TheProject.html HTTP/1.1
Host: www.example.org
With a proxy, the connection is made to proxy.example.com:8080 and the request is:
GET http://www.example.org/pub/WWW/TheProject.html HTTP/1.1
Host: www.example.org
In the latter case, the Host header is optional (for HTTP/1.0 clients) and must be recalculated by the proxy anyway.
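The difference between the two request lines can be sketched in Python. This only builds the request head and sends nothing; `request_head` is a hypothetical helper, and it always includes Host, as HTTP/1.1 clients are expected to:

```python
from urllib.parse import urlsplit


def request_head(url: str, via_proxy: bool) -> str:
    """Build the request line and Host header for a GET of `url`.

    Without a proxy the request-target is in origin-form (path and query);
    through a forward proxy it is in absolute-form (the full URI).
    """
    parts = urlsplit(url)
    if via_proxy:
        # absolute-form: the whole URI, minus any #fragment
        target = parts._replace(fragment="").geturl()
    else:
        # origin-form: the path (at least "/") plus the query string
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
    # Host is sent in both cases; a proxy recalculates it when forwarding.
    return f"GET {target} HTTP/1.1\r\nHost: {parts.netloc}\r\n\r\n"
```

Only the request-target changes; the TCP connection goes to www.example.org:80 in the first case and to proxy.example.com:8080 in the second.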
The proxy simply makes the request on behalf of the original client, hence the name "proxy", which means the same as in legal usage. The browser sends its request to the proxy; the proxy makes a request to the requested server (or not, depending on whether it decides to forward or deny the request); the server returns a response to the proxy; and the proxy returns the response to the original client. There's no fundamental difference in what the server sees, except that the originating client appears to be the proxy server. The proxy may or may not alter the request, and it may or may not cache the response; this means the server may not receive a request at all if the proxy decides to deliver a cached version instead.

Do http proxies have some request memory?

Imagine we have an HTTP client, a proxy, and a web server that serves as the backend. The proxy is configured to cache the backend's responses.
A request arrives, the proxy forwards it to the backend, the backend responds, and the proxy caches the response and sends it to the client.
Imagine the backend has set some cache-related headers in its response to the proxy's request. For example:
Cache-Control: no-cache (or)
Cache-Control: max-age=100000 (or)
Expires: 'Next Friday'
The question is: will the next request from the client be processed by the proxy in compliance with these headers?
Another flavour of the question: Is there a way for a proxy to understand that the resource is stale, except for its own static resource-lifetime settings?
Third variation: Can the client force the proxy to load the fresh resource version if the proxy's version of the resource is not considered stale by the proxy?
My question may look a little overgeneralized and not specific enough. I'll try to address this by working with a browser + nginx proxy + nginx web server setup. If my setup works correctly, then once a resource has been cached by the proxy and nginx's proxy_cache_valid timeout has not yet expired, nothing can prevent the proxy from serving a stale response; whatever I do, the requests do not hit the backend.
It looks like nginx decides whether the cache is stale based only on the proxy_cache_valid setting, and the headers of the backend's response do not matter at all. I'm wondering whether my guess is correct, and whether it might not hold for other HTTP proxy setups: a reverse proxy such as nginx, an internal office-network proxy such as Squid, or a public internet proxy.
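For reference, RFC 7234 (section 4.2.1) defines how a cache that honors the origin's headers derives a response's freshness lifetime: s-maxage (shared caches only), then max-age, then Expires minus Date. The Python sketch below is a simplified illustration of that rule, not any proxy's actual code; the function names are hypothetical, and it ignores no-cache, no-store and heuristic freshness. Whether a particular proxy applies the rule depends on that proxy and its configuration:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Dict, Optional


def _parse_http_date(value: str) -> Optional[datetime]:
    """Parse an HTTP date; return None if it is missing or invalid."""
    try:
        dt = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    # Treat naive results (a "-0000" zone) as UTC so they can be compared.
    return dt if dt.tzinfo is not None else dt.replace(tzinfo=timezone.utc)


def freshness_lifetime(headers: Dict[str, str],
                       shared_cache: bool = True) -> Optional[int]:
    """Explicit freshness lifetime in seconds (simplified RFC 7234, 4.2.1).

    `headers` maps lower-case response header names to values. Returns
    None when the response carries no explicit expiration time.
    """
    directives = {}
    for part in headers.get("cache-control", "").split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = value.strip('"')
    # s-maxage applies only to shared caches (such as a proxy) and wins.
    keys = ("s-maxage", "max-age") if shared_cache else ("max-age",)
    for key in keys:
        if directives.get(key, "").isdigit():
            return int(directives[key])
    if "expires" in headers:
        expires = _parse_http_date(headers["expires"])
        if expires is None:
            return 0  # an invalid Expires value means "already expired"
        # Without a usable Date header, fall back to the time of receipt.
        date = _parse_http_date(headers.get("date", "")) or datetime.now(timezone.utc)
        return max(0, int((expires - date).total_seconds()))
    return None
```

Note that an unparsable value such as the question's Expires: 'Next Friday' counts as already expired under this rule.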

What HTTP client headers should I use to instruct proxies to refetch from origin, and cache the response?

I'm currently working on a system where a client makes HTTP 1.1 requests of an origin server. I control both the client and the server software, so have free reign over HTTP headers set. Between the client are multiple, hierarchical layers of web proxy / cache devices (think, Squid or similar).
The data served up by the origin is usually highly cacheable, and I intend to set HTTP response headers to indicate this. Specifically, I plan to use Cache-Control: public, max-age=<value>. I understand that this will mean that intermediate proxies will cache the response up to the specified max-age, at which point they will revalidate against the origin (presumably with a Last-Modified header, looking for a 304 response).
The problem I have is that the client might become aware that the data held by caches might now be invalid. In this case, I need the client to make a request which instructs the caches to either fetch or revalidate their response with the origin. If the origin response is now different, the cache should store this new response. In my mind, this would involve the client making the request, and each cache in the chain should revalidate its response with the next upstream device, all the way back to the origin. The new response can then be served from the closest cache which actually has it.
What's the correct HTTP headers that need to be set on the client request to achieve this? At first I thought that setting Cache-control: no-cache in the HTTP request would make this happen, but reading the RFC, it seems that this will instruct the intermediate caches to both go back to the origin (desired) but also not cache the new response (not desired). I then saw an article in which an HTTP request header of Cache-control: max-age=0 would perhaps do this, but I'm not sure.
Will max-age=0 do what I need here, or do I need some other combination of HTTP headers?
I asked a similar question here: How to make proxy revalidate resource from origin. I have since learned that proxy revalidation wasn't supported by nginx at the time of writing; it is scheduled for the 1.5 release.
Sending max-age=0 from the client should trigger this revalidation mechanism in the proxy, provided the original response from the origin contained the right cache control headers.
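The decision a cache faces when such a request arrives can be sketched as follows. This is a simplified reading of RFC 7234 (sections 4.2 and 5.2.1), not any particular proxy's code; `must_revalidate` is a hypothetical name, and ages are in seconds:

```python
def must_revalidate(age: int, freshness_lifetime: int,
                    request_cache_control: str = "") -> bool:
    """Must a cache check with the origin before reusing a stored response?

    `age` and `freshness_lifetime` are in seconds; `request_cache_control`
    is the Cache-Control header of the incoming client request.
    """
    directives = {}
    for part in request_cache_control.split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = value.strip('"')
    if "no-cache" in directives:
        return True  # request no-cache: never reuse without validation
    if age >= freshness_lifetime:
        return True  # the stored response is stale
    # Request max-age=N: the client rejects stored responses older than N s.
    max_age = directives.get("max-age", "")
    return max_age.isdigit() and age > int(max_age)
```

With max-age=0, any stored response older than zero seconds is rejected, which in practice forces revalidation, and the directive does not stop the cache from storing the response it gets back.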
However, whether your upstream server(s) will respect these headers and revalidate with their origin is not something you can just assume. If you have control over your upstream servers, I think it could work.
Also, as far as I know, ETag validation (If-None-Match) is preferred over Last-Modified validation (If-Modified-Since).
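The two validators can be illustrated with a short Python sketch. A cache sends both when it has both (RFC 7232, section 2.4), and the origin evaluates If-None-Match first, ignoring If-Modified-Since when both are present (section 3.3), which is where the ETag preference comes from. The helper names are hypothetical, and the sketch skips details such as method restrictions:

```python
from email.utils import parsedate_to_datetime
from typing import Dict, Optional


def revalidation_headers(stored: Dict[str, str]) -> Dict[str, str]:
    """Conditional headers a cache sends to revalidate a stored response.

    `stored` maps lower-case header names of the cached response to values;
    both validators are sent when both are known.
    """
    headers = {}
    if "etag" in stored:
        headers["If-None-Match"] = stored["etag"]
    if "last-modified" in stored:
        headers["If-Modified-Since"] = stored["last-modified"]
    return headers


def _opaque(tag: str) -> str:
    # If-None-Match uses weak comparison, so a W/ prefix is ignored.
    return tag[2:] if tag.startswith("W/") else tag


def not_modified(request: Dict[str, str], current_etag: Optional[str],
                 current_last_modified: Optional[str]) -> bool:
    """Origin side: may the server answer 304 Not Modified?"""
    if "If-None-Match" in request:
        # When If-None-Match is present, If-Modified-Since is ignored.
        if current_etag is None:
            return False
        tags = {_opaque(t.strip()) for t in request["If-None-Match"].split(",")}
        return "*" in tags or _opaque(current_etag) in tags
    ims = request.get("If-Modified-Since")
    if ims and current_last_modified:
        try:
            return (parsedate_to_datetime(current_last_modified)
                    <= parsedate_to_datetime(ims))
        except (TypeError, ValueError):
            return False  # an invalid date is ignored: send the full response
    return False
```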
I found these to be helpful articles on the subject:
caching tutorial
cache control directives
http specs on validation
section 14.9.4 of this spec
[UPDATE]
Nginx 1.5.8 has since been released, and I can confirm that this mechanism now works!

What is the canonical method for an HTTP client to instruct an HTTP server to disable gzip responses?

I thought this would be a simple Google search, but apparently I was wrong.
I've seen that you should supply:
Accept-Encoding: gzip;q=0,deflate;q=0
in the request headers. However, the article that suggested it also noted that proxies routinely ignore that header. Also, when I supplied it to nginx, it still compressed the response message body.
http://forgetmenotes.blogspot.ca/2009/05/how-to-disable-gzip-compression-in.html
So, how do I tell a web server to disable compression on the response message body?
Many web servers ignore the 'q' parameter. The compressed version of a static resource is often cached and is returned whenever the request accepts it. To avoid getting compressed resources, use
Accept-Encoding: identity
In this case, the server should not serve you a compressed representation of the resource, and neither should any proxy. Identity is the default accepted encoding if none is given, but your client might add a default that accepts gzip, so explicitly sending 'identity' should do the trick.
Do you wish to disable encoding altogether? Then omit the Accept-Encoding header from the HTTP request headers.
Do you wish only gzip compression to be absent from the HTTP response? Then leave gzip out of the list of values in the Accept-Encoding request header.
Do you wish to prioritize among the compression techniques that servers support? Then give each value in the Accept-Encoding request header its own q value between 0 and 1. (Currently you are using conflicting values: by giving every coding a weight of 0 you indicate that you cannot handle any of them, yet you still want the response to be encoded.)
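The q-value rules these answers rely on are defined in RFC 7231, section 5.3.4. The Python sketch below (hypothetical helper names) shows what a server that honors q-values should conclude; as noted above, real servers may ignore them. Per the RFC, a request with no Accept-Encoding header at all expresses no preference:

```python
from typing import Dict, Optional


def parse_accept_encoding(value: str) -> Dict[str, float]:
    """Map each content-coding listed in Accept-Encoding to its q-value."""
    prefs = {}
    for item in value.split(","):
        coding, *params = [p.strip() for p in item.split(";")]
        if not coding:
            continue
        q = 1.0  # q defaults to 1 when absent
        for param in params:
            name, _, val = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(val)
                except ValueError:
                    pass  # this sketch ignores an unparsable q-value
        prefs[coding.lower()] = q
    return prefs


def is_acceptable(accept_encoding: Optional[str], coding: str) -> bool:
    """May the response use `coding` ("identity" meaning no encoding)?"""
    if accept_encoding is None:
        return True  # header absent: the client states no preference
    prefs = parse_accept_encoding(accept_encoding)
    coding = coding.lower()
    if coding in prefs:
        return prefs[coding] > 0  # q=0 means "not acceptable"
    if "*" in prefs:
        return prefs["*"] > 0  # "*" covers codings not listed explicitly
    # Unlisted codings are refused, except identity, which stays acceptable
    # unless excluded explicitly (identity;q=0) or via *;q=0.
    return coding == "identity"
```

For the header from the question, gzip;q=0,deflate;q=0, a compliant server may not use gzip or deflate but may still send the uncompressed (identity) body.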
I think this module for Apache:
http://httpd.apache.org/docs/2.2/mod/mod_deflate.html (2)
and this one for Nginx:
http://wiki.nginx.org/HttpGzipModule (1)
sound like what you need, depending on which server you plan to use. The rest is up to you!
Please note that the nginx module allows both turning compression off:
gzip
Syntax: gzip on | off
Default: off
Context: http, server, location, if in location
Reference: gzip
Enables or disables gzip compression.
And dealing with proxies:
gzip_proxied
Syntax: gzip_proxied off | expired | no-cache | no-store | private | no_last_modified | no_etag | auth | any ...
Default: off
Context: http, server, location
Reference: gzip_proxied
Enables or disables gzip compression of responses to proxied requests, depending on the request and the response. Whether a request is proxied is determined by the presence of the "Via" header in the request. Several parameters can be specified at once:
off - disables compression for all proxied requests
expired - enables compression if the "Expires" header prevents caching
no-cache - enables compression if "Cache-Control" header is set to "no-cache"
no-store - enables compression if "Cache-Control" header is set to "no-store"
private - enables compression if "Cache-Control" header is set to "private"
no_last_modified - enables compression if "Last-Modified" isn't set
no_etag - enables compression if there is no "ETag" header
auth - enables compression if there is an "Authorization" header
any - enables compression for all proxied requests
Best wishes!
Sources:
1) http://forum.nginx.org/read.php?11,96472,214303
2) http://httpd.apache.org/docs/2.2/mod/mod_deflate.html#inflate (Section Output Decompression)

X-Cache Header Explanation

I was going through the Firefox local cache folder and found a lot of files containing the X-Cache header. Can someone explain the purpose of this header?
thanks
A CDN (Content Delivery Network) adds the X-Cache header to HTTP responses. X-Cache: HIT means that your request was served by the CDN, not by the origin servers. A CDN is a network designed to cache content, so that user requests are served faster and the origin servers are offloaded.
The 'X' prefix in X-Cache indicates that the header is not a standard HTTP header field, and its meaning varies from one proxy implementation to another. A common place to find these header fields is Squid servers. Organizations and universities place proxy (Squid) servers between their internal network and the outer network. This serves two purposes: security, and caching frequently requested web pages (in order to limit network traffic).
X-Cache indicates whether the proxy served the response from its cache (HIT for yes, MISS for no).
X-Cache-Lookup indicates whether the proxy had a cacheable response for the request (HIT for yes, MISS for no).
If both are HIT, the client made a cacheable request, and the proxy had a matching cacheable response that was forwarded back to the client.
If X-Cache is MISS and X-Cache-Lookup is HIT, the client made a request that had a cacheable response but forced the proxy to bypass the cache. This is a hard refresh, which can be triggered with Ctrl + F5 or simulated by sending the headers:
Pragma: no-cache (HTTP/1.0) and Cache-Control: no-cache (HTTP/1.1)
If both are MISS, the proxy had no valid cached object corresponding to the client's request.
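These combinations can be summarized in a small Python sketch. It assumes Squid-style values such as "HIT from proxy.example.com" and, since these headers are non-standard, it only encodes the interpretation given in this answer; `explain_x_cache` is a hypothetical helper:

```python
from typing import Optional


def _verdict(value: Optional[str]) -> Optional[str]:
    # Squid sends e.g. "HIT from proxy.example.com"; keep the first word.
    if not value or not value.split():
        return None
    return value.split()[0].upper()


def explain_x_cache(x_cache: Optional[str],
                    x_cache_lookup: Optional[str]) -> str:
    """Interpret X-Cache / X-Cache-Lookup the way this answer describes."""
    hit, lookup = _verdict(x_cache), _verdict(x_cache_lookup)
    if hit == "HIT":
        return "served from cache"
    if hit == "MISS" and lookup == "HIT":
        return "cacheable copy existed, but the client forced a refresh"
    if hit == "MISS" and lookup == "MISS":
        return "no valid cached object; fetched from upstream"
    return "unknown"
```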
Some Useful Resources:
X-Cache and X-Cache-Lookup Headers
Understanding cache HIT and MISS headers with shielded services
X-Cache "is NOT a standard HTTP header field".
Also, check out X-Cache and X-Cache-Lookup headers explained.
For me, this was caused by a FastCGI cache header being added in an Nginx server block:
add_header X-Cache $upstream_cache_status;
Just commenting out this line and restarting nginx removed the header.
