Bandwidth quota in squid - networking

Can anyone help me with finding a way to do 'per user quota' on squid like people login to the proxy they have a cap like 1GB or something (Might be different for each user) and when they use 1GB of Bandwidth it stops their internet and redirects them to a webpage on the a web server saying they have run out of internet etc..

Squid doesn't have anything to do that. You have to code an external helper and a process that counts the bytes transferred from the logs. I'd write a daemon that receives the logs via UDP which could be queried by the external helper to check if the user is over quota.

Related

Monitoring website visit length with Nginx logs

I am trying to build a solution to monitor my website visits without relying on cookies and third parties. Currently, by monitoring the access logs I can get enough and useful information but I am missing the length of the visits (i.e. to check whether people actually read what I write).
What would be a good strategy to monitor visit length with access logs? (I am using Nginx, but presumably the same ideas will be valid for Apache)
If not already part of your build then install the Nchan websockets module for Nginx.
Configure a websocket subscriber location directive on your Nginx server and specify nchan_subscribe_request and nchan_unsubscribe_request directives within it.
Insert a line of code into your page to establish a client connection to your websocket location upon page load.
That's it, done.
Now when I visit your page my browser will connect to your Nginx/Nchan websocket server. Nginx will make an internal request to whatever address you set as the nchan_subscribe_request URL, you can pass my IP in the headers of this request or whatever you need to identify me. Log this in your main log, a separate log, pass it to an upstream server, php, node, make a database entry, save my ip+timestamp in memcached, whatever.
Then when I leave the site my websocket connection will disconnect and Nginx will do the same thing but to the nchan_unsubscribe_request URL instead. Depending upon what you did when I connected you can now do whatever you need to do in order to work out how long I spent on your site.
As you now have a persistent connection to your clients you could take it a step further and include some code to monitor certain client behaviours or watch for certain events.
You are trying to determine whether or not people are reading what you write so you could use a few lines of javascript to monitor how far down the page visitors had scrolled. Each time they scrolled to a new maximum scroll position send that data over the websocket back to your server.
Due to the disconnected nature of HTTP, your access log would probably not give you what you need.
Not totally familiar with nginx or apache log, but I think most logs contain a timestamp, an HTTP request (the document requested and status, etc.) and an IP address.
Potential issues
Without a session cookie, all IP addresses (same household, same company, etc.) would be seen as the same session.
If someone goes to your site (1 HTTP request), consumes content on your site, doesn't proceed to another page, and leaves, your log will only contain that request (which is essentially a bounce, and you won't be able to calculate duration). If your application makes uses of a lot of javascript calls, then you might be able to log from the server side application,
2) If you use a tool like GA, etc., you can still use timer and javascript events (etc. though not perfect) to tell GA that the session is still active. Not sure if it works for typical server logs.
It might not be as big as an issue if a typical visit contains more than 1 request, knowing that there is no easy way to get the duration after the last server request.

SQUID Monitoring tool

I have install "Squid for windows" server on my network to provide shared Internet access to about 6 users and it is work fine.
My main purpose to setup squid is save bans width of my 4G Internet connection, but I do not find any traffic monitoring tool, to monitor users activity, hit rate and miss rate of server.
I want to know what are the available GUI monitoring tool for Squid and how to configure in windows environment?
squid cache manager tool will give you information about request rates, failure, CPU and memory utilization and such details. I'm not sure whether it give user-wise/ip-wise details of requests.
If you are using any authentication in proxy, you can log username against each log entry, write an app to process your squid access log so that you can get all details against ip/username
You can use squid analysis and report generator which will give you all required information.

nginx behind load balancers

I've found at that Instagram share their technology implementation with other developers trough their blog. They've some great solutions for the problems they run into. One of those solutions they've is an Elastic Load Balancer on Amazon with 3 nginx instances behind it. What is the task of those nginx servers? And what is the task of the Elastic Load balancers, and what is the relation between them?
Disclaimer: I am no expert on this in any way and am in the process of learning about AWS ecosystem myself.
The ELB (Elastic load balancer) has no functionality on its own except receiving the requests and routing it to the right server. The servers can run Nginx, IIS, Apache, lighthttpd, you name it.
I will give you a real use case.
I had one Nginx server running one WordPress blog. This server was, like I said, powered by Nginx serving static content and "upstreaming" .php requests to phpfpm running on the same server. Everything was going fine until one day. This blog was featured on a tv show. I had a ton of users and the server could not keep up with that much traffic.
My first reaction would be to just use the AMI (Amazon machine image) to spin up a copy of my server on a more powerful instance like m1.heavy. The problem was I knew I would have traffic increasing over time over the next couple of days. Soon I would have to spin an even more powerful machine, which would mean more downtime and trouble.
Instead, I launched an ELB (elastic load balancer) and updated my DNS to point website traffic to the ELB instead of directly to the server. The user doesn’t know server IP or anything, he only sees the ELB, everything else goes on inside amazon’s cloud.
The ELB decides to which server the traffic goes. You can have ELB and only one server on at the time (if your traffic is low at the moment), or hundreds. Servers can be created and added to the server array (server group) at any time, or you can configure auto scaling to spawn new servers and add them to the ELB Server group using amazon command line, all automatically.
Amazon cloud watch (another product and important part of the AWS ecosystem) is always watching your server’s health and decides to which server it will route that user. It also knows when all the servers are becoming too loaded and is the agent that gives the order to spawn another server (using your AMI). When the servers are not under heavy load anymore they are automatically destroyed (or stopped, I don’t recall).
This way I was able to serve all users at all times, and when the load was light, I would have ELB and only one Nginx server. When the load was high I would let it decide how many servers I need (according to server load). Minimal downtime. Of course, you can set limits to how many servers you can afford at the same time and stuff like that so you don’t get billed over what you can pay.
You see, Instagram guys said the following - "we used to run 2 Nginx machines and DNS Round-Robin between them". This is inefficient IMO compared to ELB. DNS Round Robin is DNS routing each request to a different server. So first goes to server one, second goes to server two and on and on.
ELB actually watches the servers' HEALTH (CPU usage, network usage) and decides which server traffic goes based on that. Do you see the difference?
And they say: "The downside of this approach is the time it takes for DNS to update in case one of the machines needs to get decommissioned."
DNS Round robin is a form of a load balancer. But if one server goes kaput and you need to update DNS to remove this server from the server group, you will have downtime (DNS takes time to update to the whole world). Some users will get routed to this bad server. With ELB this is automatic - if the server is in bad health it does not receive any more traffic - unless of course the whole group of servers is in bad health and you do not have any kind of auto-scaling setup.
And now the guys at Instagram: "Recently, we moved to using Amazon’s Elastic Load Balancer, with 3 NGINX instances behind it that can be swapped in and out (and are automatically taken out of rotation if they fail a health check).".
The scenario I illustrated is fictional. It is actually more complex than that but nothing that cannot be solved. For instance, if users upload pictures to your application, how can you keep consistency between all the machines on the server group? You would need to store the images on an external service like Amazon s3. On another post on Instagram engineering – “The photos themselves go straight to Amazon S3, which currently stores several terabytes of photo data for us.”. If they have 3 Nginx servers on the load balancer and all servers serve HTML pages on which the links for images point to S3, you will have no problem. If the image is stored locally on the instance – no way to do it.
All servers on the ELB would also need an external database. For that amazon has RDS – All machines can point to the same database and data consistency would be guaranteed.
On the image above, you can see an RDS "Read replica" - that is RDS way of load balancing. I don't know much about that at this time, sorry.
Try and read this: http://awsadvent.tumblr.com/post/38043683444/using-elb-and-auto-scaling
Can you please point the blog entry out?
Load balancers balance load. They monitor the Web servers health (response time etc) and distribute the load between the Web servers. On more complex implementations it is possible to have new servers spawn automatically if there is a traffic spike. Of course you need to make sure there is a consistency between the servers. THEY CAN share the same databases for instance.
So I believe the load balancer gets hit and decides to which server it will route the traffic according to server health.
.
Nginx is a Web server that is extremely good at serving a lot of static content for simultaneous users.
Requests for dynamic pages can be offloaded to a different server using cgi. Or the same servers that run nginx can also run phpfpm.
.
A lot of possibilities. I am on my cell phone right now. tomorrow I can write a little more.
Best regards.
I am aware that I am late to the party, but I think the use of NGINX instances behind ELB in Istagram blogpost is to provide high available load balancer as described here.
NGINX instances do not seem to be used as web servers in the blogpost.
For that role they mention:
Next up comes the application servers that handle our requests. We run Djangoon Amazon High-CPU Extra-Large machines
So ELB is used just as a replacement for their older solution with DNS Round-Robin between NGINX instances that was not providing high availability.

Redirecting http traffic to another server temporarily

Assume you have one box (dedicated server) that's on 24 7 and several other boxes that are user machines that have unused bandwidth. Assume you want to host several web pages. How can the dedicated server redirect http traffic to the user machines. It is desirable that the address field in the web browser still displays the right address, and not an ip. Ie. I don't want to redirect to another web page, I want to tell the web browser that it should request the same web page from a different server. I have been browsing through the 3xx codes, and I don't think they are made for anything like this.
It should work some what along these lines:
1. Dedicated server is online all the time.
2. User machine starts and tells the dedicated server that it's online.
(several other user machines can do similarly)
3. Web browser looks up domain name and finds out that it points to dedicated server.
4. Web browser requests page.
5. Dedicated server tells web browser to repeat request to user machine
Is it possible to use some kind of redirect, and preferably tell the browser to keep sending further requests to user machine. The user machine can close down at almost any point of time, but it is assumed that the user machine will wait for ongoing transactions to finish, no closing the server program in the middle of a get or something.
What you want is called a Proxy server or load balancer that would sit in front of your web server.
The web browser would always talk to the load balancer, and the load balancer would forward the request to one of several back-end servers. No redirect is needed on the client side, as the client always thinks it is just talking to the load balancer.
ETA:
Looking at your various comments and re-reading the question, I think I misunderstood what you wanted to do. I was thinking that all the machines serving content would be on the same network, but now I see that you are looking for something more like a p2p web server setup.
If that's the case, using DNS and HTTP 30x redirects would probably be what you need. It would probably look something like this:
Your "master" server would serve as an entry point for the app, and would have a well known host name, e.g. "www.myapp.com".
Whenever a new "user" machine came online, it would register itself with the master server and a the master server would create or update a DNS entry for that user machine, e.g. "user123.myapp.com".
If a request came to the master server for a given page, e.g. "www.myapp.com/index.htm", it would do a 302 redirect to one of the user machines based on whatever DNS entry it had created for that machine - e.g. redirect them to "user123.myapp.com/index.htm".
Some problems I see with this approach:
First, Once a user gets redirected to a user machine, if the user machine went offline it would seem like the app was dead. You could avoid this by having all the links on every page specifically point to "www.myapp.com" instead of using relative links, but then every single request has to be routed through the "master server" which would be relatively inefficient.
You could potentially solve this by changing the DNS entry for a user machine when it goes offline to point back to the master server, but that wouldn't work without an extremely short TTL.
Another issue you'll have is tracking sessions. You probably wouldn't be able to use sessions very effectively with this setup without a shared session state server of some sort accessible by all the user machines. Although cookies should still work.
In networking, load balancing is a technique to distribute workload evenly across two or more computers, network links, CPUs, hard drives, or other resources, in order to get optimal resource utilization, maximize throughput, minimize response time, and avoid overload. Using multiple components with load balancing, instead of a single component, may increase reliability through redundancy. The load balancing service is usually provided by a dedicated program or hardware device (such as a multilayer switch or a DNS server).
and more interesting stuff in here
apart from load balancing you will need to set up more or less similar environment on the "users machines"
This sounds like 1 part proxy, 1 part load balancer, and about 100 parts disaster.
If I had to guess, I'd say you're trying to build some type of relatively anonymous torrent... But I may be wrong. If I'm right, HTTP is entirely the wrong protocol for something like this.
You could use dns, off the top of my head, you could setup a hostname for each machine that is going to serve users:
www in A xxx.xxx.xxx.xxx # ip address of machine 1
www in A xxx.xxx.xxx.xxx # ip address of machine 2
www in A xxx.xxx.xxx.xxx # ip address of machine 3
Then as others come online, you could add then to the dns entries:
www in A xxx.xxx.xxx.xxx # ip address of machine 4
Only problem is you'll have to lower the time to live (TTL) entry for each record down to make it smaller (I think the default is 86400 - 1 day)
If a machine does down, you'll have to remove the dns entry, though I do think this is the least intensive way of adding capacity to any website. Jeff Attwood has more info here: is round robin dns good enough?

Troubleshooting an SSL flood

Users connect to our webserver via https, and stay on a secured connection throughout their use of our service. A typical user session will establish a small handful of connections to the server (one or two).
There are a very small number of exceptions we are trying to track down. Particular users will intermittently have handfuls of hundreds of connections established. When we happen to catch the problem in the act, we can see the exchange of the SSL handshake, and from the perspective of the server, all appears to be in order. Yet we never observe a payload - the client instead connects on a new port and initiates a new handshake.
We do not have access to the client, and cannot observe the behavior from that side of the connection. Nor do we have a local scenario that can reproduce the problem.
It is our belief (though not confirmed) that the user agent is connecting to our server directly, and not through a proxy.
Does anybody recognize these symptoms? Can anyone suggest steps to further identify the problem?
Are there any patterns you can see to this traffic, aside from making many repeated requests?
For example, do the requests come from the same IP ranges? Possibly search engines or other spiders, or maybe from countries that you normally don't get users from, possibly indicating some sort of weird botnet or at least something you could block?
Do these rogue requests always negotiate to use a particular cipher suite, potentially indicating the client software?
Does it make any difference if you change the available cipher suites available for negotiation?
What server software are you using, and are there any firewalls within your network that could potentially be dropping some responses to the user?
i've seen a botnet flooding https sites being mentoned.
this is probably not your situation, but i thought i mention it.
i'm seeing chrome (12.0.742.60 beta) flooding my server with https connections, some half a dozen or more connections for a single static picture being served... as if it had an optimization to build up connections with ready https handshakes waiting for requests to send, and then after the page (file) has been served it closes them all.
on plain http i see only two connections (one extra for favicon.ico).

Resources