adding cache_control = nocache to http request using squid transparent proxy - http

I am trying to edit the header of http headers to set the cache_control option to " no-cache" to prevent users from being able to cache the contents even if they are using any type of caching engines. the squid proxy will work in a transparent mode.
PS: We are an ISP and we are trying to prevent the clients from caching all HTTP data.

Related

HTTP connect with Transparent proxy

I am trying to write (and understand) a transparent proxy.
My setup would look like this
Client Browser ---> TProxy ----> Upstream Proxy ------> cloud
When the client browser makes a GET request, the idea is TProxy would then CONNECT to the Upstream proxy. The upstream proxy requires digest authentication. So, essentially the flow would look like
Client Browser ---> TProxy --------> Upstream Proxy ---------------> cloud
GET BBC.co.uk
CONNECT
407 PROXY AUTH REQUIRED
CONNECT
(with proxy-authorization)
200 OK
GET BBC.co.uk
I am confused what happens once CONNECT with authorization succeeds.
Am I suppose to modify the original GET request now to include a
Proxy-Authorization header?
or would the original GET request be then tunnelled in another http header something like
HTTP Header
Proxy Authorization
HTTP Header (GET BBC.CO.UK)
Data
or I can just pass the original GET request as is?
I am just starting with http and would appreciate any help.
Thanks
When you authenticate upstream from your transparent proxy, the Proxy-Authorization header applies only to the CONNECT.
The GET requests happen within the tunnel, so the upstream explicit proxy is not supposed to see them, and for sure does not expect any proxy authentication headers on them.
In short, you do not need to worry about the GET, but not because of the answer given above, but because there is a tunnel between the transparent proxy and the site, and the explicit proxy only sees and authenticates the CONNECT.
There is no such thing as nested headers in HTTP.
A proxy - whether transparent or not - always terminates the HTTP connection from the client, and initiates a new one to the server.
That means that the HTTP GET from the client goes to your TProxy. TProxy creates a new GET request to Upstream Proxy. Ideally, TProxy will simply pass on all the headers. That would make it (nearly) undetectable.
The same goes in reverse for the response headers.
In reality, proxy servers will, and in many cases have to, manipulate some headers. They will often add their own header (for instance, to alert the communication partners to the presence of a proxy), and they can also manipulate existing headers.
So, the short answer to your question: whatever header field your TProxy receives, pass it on unchanged unless you fully understand the implications.

How to make squid proxy server cache response with vary: * in header?

I'm building a system to serve the same page (even though it's not fresh anymore) when requesting the same URL within a run which can be about an hour, so I try using squid cache to cache everything. I add this to squid.conf:
refresh_pattern ^http: 600000 100% 700000 override-expire
override-lastmod reload-into-ims ignore-reload ignore-no-cache
ignore-private ignore-no-store ignore-must-revalidate ignore-auth
However, it doesn't seem to work when the HTTP response has "Vary: *" in the header. For example, I cannot cache http://stackoverflow.com. I'm using squid version 3.1.19, if that matters.
Is there a way to get around this?
"Vary: *" essentially means that there are factors other than headers in the HTTP request that determine the uniqueness of a request (for example, client IP address, etc), so a intermediate cache (squid) cannot really reliably cache.
Unfortunately, Squid has no mechanism for ignoring the Vary header, either completely or for select headers. I'm running into this problem myself.

Do http proxies have some request memory?

Imagine we have http client, some proxy and web-server that serves as backend. The proxy is configured to cache the responses of the backend.
A request arrives, the proxy transfers it to the backend, the latter responses, the proxy caches the response and sends it to the client.
Imagine the backend has set some cache-related headers in its response to the proxy's request. For example:
Cache-Control: no-cache (or)
Cache-Control: max-age=100000 (or)
Expires: 'Next Friday'
A question is: Will the next request from the client be processed by the proxy in compliance with that headers?
Another flavour of the question: Is there a way for a proxy to understand that the resource is stale, except for its own static resource-lifetime settings?
Third variation: Can the client force the proxy to load the fresh resource version if the proxy's version of the resource is not considered stale by the proxy?
My question may look a little bit overgeneralized, not specific enough. I'll try to address this issue by working with browser + nginx proxy + nginx web-server setup. If my setup works correctly, in case some resource has been cached by proxy and nginx's proxy_cache_valid timeout is still on - nothing can prevent the proxy from serving a stale response; whatever I do the requests do not hit the backend.
It looks like nginx decides whether the cache is stale only basing on proxy_cache_valid setting, and the headers of backend's response do not matter at all. I'm wondering whether my guess is correct and whether it can be incorrect for some other http proxy setups, serving as reverse proxy such as nginx, office network internal proxy such as squid, internet public proxy.

How to configure squid to be a Transparent proxy?

I am working with Squid Proxy Server as I have also used cyberoam,Sonicwall and Clear OS.
I want to setup my own proxy like above products ie authentication in transparent proxy.
Actually I setup transparent proxy but at that time my HTTPS site is not working.Then I configure one iptables rule that redirect all http & https traffic to 3128(squid port) only. but here I can access all my https websites but I cant block them.
My requirement is when I am going to access any website at first time it will ask me to authentication and then and only i can access internet. In log reports also I can show its Username and one more thing it will also possible in thinclient(terminal service).
Anybody help me short-out this problem ?
Proxy authentication doesn't work in transparent proxies setups. The browser should have the proxy configured to catch the authentication request from a proxy and to request the credentials to an user.
Another thing is that you can create a transparent proxy for HTTPS. Why? Because when the browser connects, it's connected to the proxy, not the real server. The browser will try to negotiate SSL which is a thing that Squid won't support. There are tricks to do this, but you'll break the SSL security, browser will complain, etc. There are one tool that I used to get this working: u2nl, but it's a hack that tunnels HTTPS trought the proxy.
The best option, is to use a non-transparent proxy. If you want to avoid browsers configuration, have a look at WPAD
As said before, you can't really block HTTPS sites with Squid, and you can't really use authentication with the proxy running at his transparent mode.
As far as I could use and cofigure, you can use an external acl to force a kind of login, but the login requests will not be treated by the proxy, but you can work it with some PERL.
And about the HTTPS thing, you could work it with some hacks, but it is a very sensible question, because the server performance with be punished with this kind of use and you could be pointed as a fraudulent service, which isn't cool... Believe me.

How to write "transparent" HTTP proxy?

Inspired by this article http://www.catonmat.net/http-proxy-in-nodejs
Any idea, how to convert this proxy to acts as transparent proxy?
PS: I know how to set up my firewall, etc... Just trying to run this toy instead of transparent squid.
In general, the difference between transparent and explicit proxy is that the full URL is not sent in the HTTP command in transparent mode.
The proxy would use the Host header to determine the upstream server instead of extracting it from the URL: otherwise the processing is the same. Note that this works for HTTP only, and transparent HTTPS proxy is much harder.
I'm not familiar with node.js: my guess is that the Host header will be available in the request.headers field, and then it's a matter of fixing the proxy_request object to have the proper full upstream URL.

Resources