I am trying to build a solution to monitor my website visits without relying on cookies and third parties. Currently, by monitoring the access logs, I get plenty of useful information, but I am missing the length of the visits (i.e. to check whether people actually read what I write).
What would be a good strategy to monitor visit length with access logs? (I am using Nginx, but presumably the same ideas will be valid for Apache.)
If it is not already part of your build, install the Nchan websockets module for Nginx.
Configure a websocket subscriber location directive on your Nginx server and specify nchan_subscribe_request and nchan_unsubscribe_request directives within it.
Insert a line of code into your page to establish a client connection to your websocket location upon page load.
That's it, done.
Now when I visit your page, my browser will connect to your Nginx/Nchan websocket server. Nginx will make an internal request to whatever address you set as the nchan_subscribe_request URL; you can pass my IP in the headers of this request, or whatever else you need to identify me. Log this in your main log or a separate log, pass it to an upstream server (PHP, Node), make a database entry, save my IP and timestamp in memcached, whatever.
Then when I leave the site my websocket connection will disconnect and Nginx will do the same thing but to the nchan_unsubscribe_request URL instead. Depending upon what you did when I connected you can now do whatever you need to do in order to work out how long I spent on your site.
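As a rough illustration of that last step, here is a minimal Python sketch that pairs connect and disconnect events into visit lengths. It assumes your subscribe/unsubscribe handlers have already written events as (ISO timestamp, client id, kind) tuples in chronological order; that event format and the function name are made up for this example and are not something Nchan produces by itself.

```python
from datetime import datetime

def visit_durations(events):
    """Pair each 'subscribe' with the next 'unsubscribe' from the same client.

    events: iterable of (iso_timestamp, client_id, kind) in chronological order,
    where kind is 'subscribe' or 'unsubscribe' (hypothetical format).
    Returns a list of (client_id, seconds_on_page).
    """
    open_visits = {}   # client_id -> time the websocket connected
    durations = []
    for ts, client_id, kind in events:
        when = datetime.fromisoformat(ts)
        if kind == "subscribe":
            open_visits[client_id] = when
        elif kind == "unsubscribe" and client_id in open_visits:
            started = open_visits.pop(client_id)
            durations.append((client_id, (when - started).total_seconds()))
    return durations
```

Note that a visitor with several tabs open would overwrite their own start time here; identifying a connection by something more specific than the IP would avoid that.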
As you now have a persistent connection to your clients you could take it a step further and include some code to monitor certain client behaviours or watch for certain events.
You are trying to determine whether or not people are reading what you write, so you could use a few lines of JavaScript to monitor how far down the page visitors have scrolled. Each time they scroll to a new maximum scroll position, send that data over the websocket back to your server.
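On the receiving end, keeping only the furthest position per visitor is enough. A small Python sketch of that bookkeeping follows, assuming the page sends its scroll depth as a percentage; the message format and the function name are assumptions for illustration only.

```python
def update_max_scroll(max_scroll, client_id, percent):
    """Store a scroll report; return True if it is a new maximum for this client.

    max_scroll: dict mapping client_id -> deepest scroll position seen (0-100).
    percent: scroll depth reported by the page, clamped to the 0-100 range.
    """
    percent = max(0.0, min(100.0, float(percent)))
    if percent > max_scroll.get(client_id, 0.0):
        max_scroll[client_id] = percent
        return True
    return False
```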
Due to the disconnected nature of HTTP, your access log would probably not give you what you need.
I am not totally familiar with the Nginx or Apache log formats, but I think most logs contain a timestamp, the HTTP request (the document requested, status, etc.) and an IP address.
Potential issues
1) Without a session cookie, all requests from one IP address (same household, same company, etc.) would be seen as the same session.
2) If someone goes to your site (1 HTTP request), consumes content on your site, doesn't proceed to another page, and leaves, your log will only contain that request (which is essentially a bounce, and you won't be able to calculate duration). If your application makes use of a lot of JavaScript calls, then you might be able to log from the server-side application.
For 2), if you use a tool like GA, you can still use timers and JavaScript events (though not perfect) to tell GA that the session is still active. I am not sure whether that works for typical server logs.
It might not be as big an issue if a typical visit contains more than 1 request, keeping in mind that there is no easy way to get the duration after the last server request.
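To make those limits concrete, here is a hedged Python sketch that groups access-log lines into sessions per IP. It assumes the common/combined log format (IP first, timestamp in square brackets), lines in chronological order, and a 30-minute inactivity gap to split sessions; the gap and the function name are arbitrary choices for this example.

```python
import re
from datetime import datetime, timedelta

# Matches the start of a common/combined log line: IP, ident, user, [timestamp]
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')
SESSION_GAP = timedelta(minutes=30)  # assumed inactivity limit

def session_durations(lines):
    """Return {ip: [seconds, ...]} with one entry per estimated session."""
    sessions = {}  # ip -> list of [first_request, last_request]
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        ip = match.group(1)
        when = datetime.strptime(match.group(2), "%d/%b/%Y:%H:%M:%S %z")
        spans = sessions.setdefault(ip, [])
        if spans and when - spans[-1][1] <= SESSION_GAP:
            spans[-1][1] = when          # same session, extend it
        else:
            spans.append([when, when])   # first request of a new session
    return {
        ip: [(last - first).total_seconds() for first, last in spans]
        for ip, spans in sessions.items()
    }
```

A single-request visit comes out as 0 seconds, which is exactly the bounce problem described above, and every visitor behind one IP is merged into one session.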
Suppose I have a website that is normally accessed at address www.mywebsite.com.
Now let's say the website is down completely (think server has melted). I want the users trying to reach www.mywebsite.com to end up on a maintenance URL on another server instead of having a 404.
Is this easily possible without having to route all the traffic through a dispatcher/load balancer?
I could imagine something like this:
When the default server is UP, traffic flows like this:
[USER]<---->[www.mywebsite.com]<---->[DISPATCHER]<---->[DEFAULT SERVER]
When the default server is DOWN, traffic flows like this:
[USER]<---->[www.mywebsite.com]<---->[DISPATCHER]<---->[MAINTENANCE SERVER]
Where [DISPATCHER] figures out where to route the traffic. Problem is that in this scenario all the traffic goes through [DISPATCHER]. Can I make it so that the first connection goes through dispatcher, and then, if the default server is up, the traffic goes directly from the user to the default server? (with a check every 10 - 15 minutes for example)
[USER]<---->[www.mywebsite.com]<-------->[DEFAULT SERVER] after the first successful connection
Thanks in advance!
Unfortunately, maybe the most practical solution is to give up. Until browsers finally add support for SRV records....
You can achieve what you want with dynamic DNS: set up a monitoring script on a "maintenance server" that checks whether your website is down and, if so, updates the DNS for your site to point to the maintenance server. This approach has its own problems, the biggest of which is that any monitoring may generate false positives, so your users will see the maintenance page while the site is actually up.
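One common way to reduce false positives is to switch only after several consecutive failed checks. The Python sketch below shows that idea under some assumptions: `point_dns_to` is a placeholder callback for whatever update mechanism your DNS provider offers, and the threshold of 3 is arbitrary.

```python
import urllib.request

def site_is_up(url, timeout=5.0):
    """Return True if the URL answers without an HTTP or network error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except OSError:  # URLError, HTTPError and timeouts all derive from OSError
        return False

class FailoverMonitor:
    """Switch DNS to the maintenance server after `threshold` failed checks in a row."""

    def __init__(self, check, point_dns_to, threshold=3):
        self.check = check                # callable returning True when the site is up
        self.point_dns_to = point_dns_to  # hypothetical callback: 'main' or 'maintenance'
        self.threshold = threshold
        self.failures = 0
        self.target = "main"

    def tick(self):
        """Run one check and update DNS only when the desired target changes."""
        if self.check():
            self.failures = 0
            wanted = "main"
        else:
            self.failures += 1
            wanted = "maintenance" if self.failures >= self.threshold else self.target
        if wanted != self.target:
            self.point_dns_to(wanted)
            self.target = wanted
        return self.target
```

You would call `tick()` from a scheduler every minute or so, for example with `check=lambda: site_is_up("http://www.mywebsite.com/")`. DNS caching still delays the switch by up to the record's TTL.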
Another possible approach (even worse): make www.example.com point to your dispatcher server and www2.example.com point to your main server. The dispatcher would then HTTP-redirect all incoming requests to www2.example.com.
But what will you do when your dispatcher melts? While trying to handle one point of failure, you have just added another one.
Maybe it's practical to handle all page links in some JavaScript that checks whether the server is up first, and only then follows the link. This approach requires some scripting, but at least it provides the best results when your server goes down while the user is already on your site. It does nothing, however, for those who try to enter the site for the first time.
If only browsers would support SRV records....
My webapp is deployed in a cluster of multiple JBoss instances. There is an admin page in the webapp to perform certain JBoss instance-specific operations.
The problem is that requests are sent to a load balancer instead of directly hitting a specific instance.
Is there any way to direct a request to a specific instance? Or at least, once the admin page is up, to make all subsequent (Ajax) requests stick to the instance that originally served the page?
I don't think HttpSession is going to help here. I need to target a specific instance, not maintain the state of an individual client.
Thanks.
You are looking for how to configure sticky sessions.
Sending all requests in a user session consistently to the same backend server is known as persistence or stickiness. A significant downside to this technique is its lack of automatic failover: if a backend server goes down, its per-session information becomes inaccessible, and any sessions depending on it are lost. The same problem usually applies to central database servers, even if the web servers are "stateless" and not "sticky".
Assignment to a particular server might be based on a username, client IP address, or random assignment. Each approach has advantages and disadvantages.
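As an illustration of IP-based assignment, here is a minimal Python sketch that hashes the client address to pick a backend. It is not how JBoss or any particular load balancer implements stickiness; the function name and backend names are invented for the example.

```python
import hashlib

def pick_backend(client_ip, backends):
    """Map a client IP to one backend so the same IP always gets the same server."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]

# Example: the same client always lands on the same (hypothetical) node.
nodes = ["jboss-node-1", "jboss-node-2", "jboss-node-3"]
assert pick_backend("192.0.2.10", nodes) == pick_backend("192.0.2.10", nodes)
```

The downside mentioned above shows up directly: if the list of backends changes, most clients are remapped to a different server and lose their per-session state.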
I suggest going through the articles below on configuring JBoss in a cluster, rather than digging into the theory, unless you want a deeper understanding.
http://docs.jboss.org/jbossas/docs/Clustering_Guide/beta422/html/clustering-http-nodes.html
https://community.jboss.org/wiki/HTTPLoadbalancer
Assume you have one box (dedicated server) that's on 24/7 and several other boxes that are user machines with unused bandwidth. Assume you want to host several web pages. How can the dedicated server redirect HTTP traffic to the user machines? It is desirable that the address field in the web browser still displays the right address, and not an IP. I.e. I don't want to redirect to another web page; I want to tell the web browser that it should request the same web page from a different server. I have been browsing through the 3xx codes, and I don't think they are made for anything like this.
It should work somewhat along these lines:
1. Dedicated server is online all the time.
2. User machine starts and tells the dedicated server that it's online.
(several other user machines can do similarly)
3. Web browser looks up domain name and finds out that it points to dedicated server.
4. Web browser requests page.
5. Dedicated server tells web browser to repeat request to user machine
Is it possible to use some kind of redirect, and preferably tell the browser to keep sending further requests to the user machine? The user machine can close down at almost any point in time, but it is assumed that the user machine will wait for ongoing transactions to finish, i.e. no closing the server program in the middle of a GET or something.
What you want is called a proxy server or load balancer, which sits in front of your web servers.
The web browser would always talk to the load balancer, and the load balancer would forward the request to one of several back-end servers. No redirect is needed on the client side, as the client always thinks it is just talking to the load balancer.
ETA:
Looking at your various comments and re-reading the question, I think I misunderstood what you wanted to do. I was thinking that all the machines serving content would be on the same network, but now I see that you are looking for something more like a p2p web server setup.
If that's the case, using DNS and HTTP 30x redirects would probably be what you need. It would probably look something like this:
Your "master" server would serve as an entry point for the app, and would have a well known host name, e.g. "www.myapp.com".
Whenever a new "user" machine came online, it would register itself with the master server and the master server would create or update a DNS entry for that user machine, e.g. "user123.myapp.com".
If a request came to the master server for a given page, e.g. "www.myapp.com/index.htm", it would do a 302 redirect to one of the user machines based on whatever DNS entry it had created for that machine - e.g. redirect them to "user123.myapp.com/index.htm".
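The registration and redirect logic on the master server can be sketched roughly as follows in Python. This leaves out the DNS update and the HTTP server itself; the class name, the round-robin choice and the plain http scheme are assumptions made for illustration.

```python
class UserMachineRegistry:
    """Tracks which user machines are online and picks one for each redirect."""

    def __init__(self, domain):
        self.domain = domain   # e.g. "myapp.com"
        self.hosts = []        # names such as "user123", in registration order
        self._next = 0         # round-robin position

    def register(self, name):
        if name not in self.hosts:
            self.hosts.append(name)

    def unregister(self, name):
        if name in self.hosts:
            self.hosts.remove(name)

    def redirect_for(self, path):
        """Return (302, location) for a request, or None if the master must serve it."""
        if not self.hosts:
            return None
        host = self.hosts[self._next % len(self.hosts)]
        self._next += 1
        return 302, f"http://{host}.{self.domain}{path}"
```

For example, after `register("user123")`, a request for "/index.htm" yields `(302, "http://user123.myapp.com/index.htm")`.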
Some problems I see with this approach:
First, once a user gets redirected to a user machine, if that machine went offline it would seem like the app was dead. You could avoid this by having all the links on every page specifically point to "www.myapp.com" instead of using relative links, but then every single request has to be routed through the "master server", which would be relatively inefficient.
You could potentially solve this by changing the DNS entry for a user machine when it goes offline to point back to the master server, but that wouldn't work without an extremely short TTL.
Another issue you'll have is tracking sessions. You probably wouldn't be able to use sessions very effectively with this setup without a shared session state server of some sort accessible by all the user machines, although cookies should still work.
In networking, load balancing is a technique to distribute workload evenly across two or more computers, network links, CPUs, hard drives, or other resources, in order to get optimal resource utilization, maximize throughput, minimize response time, and avoid overload. Using multiple components with load balancing, instead of a single component, may increase reliability through redundancy. The load balancing service is usually provided by a dedicated program or hardware device (such as a multilayer switch or a DNS server).
and more interesting stuff in here
Apart from load balancing, you will need to set up a more or less similar environment on the "user machines".
This sounds like 1 part proxy, 1 part load balancer, and about 100 parts disaster.
If I had to guess, I'd say you're trying to build some type of relatively anonymous torrent... But I may be wrong. If I'm right, HTTP is entirely the wrong protocol for something like this.
You could use DNS. Off the top of my head, you could set up the same hostname for each machine that is going to serve users:
www in A xxx.xxx.xxx.xxx # ip address of machine 1
www in A xxx.xxx.xxx.xxx # ip address of machine 2
www in A xxx.xxx.xxx.xxx # ip address of machine 3
Then as others come online, you could add them to the DNS entries:
www in A xxx.xxx.xxx.xxx # ip address of machine 4
The only problem is that you'll have to lower the time to live (TTL) of each record (I think the default is 86400, i.e. 1 day).
If a machine goes down, you'll have to remove its DNS entry, though I do think this is the least intensive way of adding capacity to any website. Jeff Atwood has more info here: Is round robin DNS good enough?
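If the set of machines changes often, the records could be generated rather than edited by hand. A small Python sketch follows, assuming standard zone-file syntax (name, TTL, class, type, address) and a 300-second TTL picked arbitrarily; the addresses are documentation placeholders.

```python
def render_round_robin(name, addresses, ttl=300):
    """Return one zone-file A record per address, all sharing the same name and TTL."""
    return [f"{name} {ttl} IN A {address}" for address in addresses]

# Machines currently online (placeholder addresses from the documentation range).
online = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]
for record in render_round_robin("www", online):
    print(record)
```

Dropping a machine from the list and reloading the zone removes it from rotation, but clients may keep using a cached address until the TTL expires.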
I have servers spread across several data centers, each storing different files. I want users to be able to access the files on all servers through a single domain and have the individual servers return the files directly to the users.
The following shows a simple example:
1) The user's browser requests http://www.example.com/files/file1.zip
2) Request goes to server A, based on the DNS A record for example.com.
3) Server A analyzes the request and works out that /files/file1.zip is stored on server B.
4) Server A forwards the request to server B.
5) Server B returns file1.zip directly to the user without going through server A.
Note: steps 4 and 5 must be transparent to the user and cannot involve sending a redirect to the user as that would violate the requirement of a single domain.
From my research, what I want to achieve is called "Direct Server Return" and it is a common setup for load balancing. It is also sometimes called a half reverse proxy.
For step 4, it sounds like I need to do MAC address translation and then pass the request back onto the network; for servers outside the network of server A, tunneling will be required.
For step 5, I simply need to configure server B, as per the real servers in a load balancing setup. Namely, server B should have server A's IP address on the loopback interface and it should not answer any ARP requests for that IP address.
My problem is how to actually achieve step 4?
I have found plenty of hardware and software that can do this for simple load balancing at layer 4, but these solutions fall short and cannot handle the kind of custom routing I require. It seems like I will need to roll my own solution.
Ideally, I would like to do the routing / forwarding at the web server level, i.e. in PHP or C# / ASP.net. However, I am open to doing it at a lower level such as Apache or IIS, or at an even lower level, i.e. a custom proxy service in front of everything.
Forgive my ignorance, but why not set up Server A to mount the files that are located on the other servers, via NFS or SMB depending on whether you're using a Unix variant or Windows?
Seems like what you're trying to do is overly complicate something that could be very simple. In addition, using network-mounted files will allow you to mount those files on additional machines in the future when you need them. At that point, then you could put a load balancer in front of server A (and servers x, y, and z, which also all mount files from server B).
Granted, this would not solve the problem of bypassing server A on the return; technically server A would be returning the file instead of server B. But if a load balancer were put in front of A, then A would become B anyway, so technically B would still be returning the file, because the load balancer would use direct server return (it has been a standard feature for a long time now).
If I did miss something, please do elaborate.
Edit: Yes I realize this was posted nearly 3 years ago. Oh well.
Why not send an HTTP response with status code 307 Temporary Redirect?
The client will then repeat the request against the server you specify.
I know you want a single domain, but you could have both individual subdomains plus a single common domain.
For example:
example.com has IP1, IP2, IP3.
example1.example.com has IP1
example2.example.com has IP2
example3.example.com has IP3
If a request comes to a server that can't handle it itself, that server redirects the user to make another request to the correct specific server. A web browser will follow this redirect transparently, by the way.
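The decision each server makes can be sketched in Python like this. The lookup table of which server holds which file, the function name and the plain http scheme are assumptions for the example; the subdomain names follow the ones listed above.

```python
def route(path, local_name, file_locations, domain="example.com"):
    """Decide how this server answers a request for `path`.

    file_locations: dict mapping a path to the subdomain label of the server
    that stores it, e.g. {"/files/file1.zip": "example2"} (hypothetical table).
    Returns (status, location); location is only set for redirects.
    """
    owner = file_locations.get(path)
    if owner is None:
        return 404, None                 # nobody has this file
    if owner == local_name:
        return 200, None                 # serve it from local disk
    return 307, f"http://{owner}.{domain}{path}"  # send the client to the owner
```

For instance, on the server named "example1", `route("/files/file1.zip", "example1", {"/files/file1.zip": "example2"})` returns `(307, "http://example2.example.com/files/file1.zip")`.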
Users connect to our webserver via https, and stay on a secured connection throughout their use of our service. A typical user session will establish a small handful of connections to the server (one or two).
There are a very small number of exceptions we are trying to track down. Particular users will intermittently have handfuls of hundreds of connections established. When we happen to catch the problem in the act, we can see the exchange of the SSL handshake, and from the perspective of the server, all appears to be in order. Yet we never observe a payload - the client instead connects on a new port and initiates a new handshake.
We do not have access to the client, and cannot observe the behavior from that side of the connection. Nor do we have a local scenario that can reproduce the problem.
It is our belief (though not confirmed) that the user agent is connecting to our server directly, and not through a proxy.
Does anybody recognize these symptoms? Can anyone suggest steps to further identify the problem?
Are there any patterns you can see to this traffic, aside from making many repeated requests?
For example, do the requests come from the same IP ranges? Possibly search engines or other spiders, or maybe from countries that you normally don't get users from, possibly indicating some sort of weird botnet or at least something you could block?
Do these rogue requests always negotiate to use a particular cipher suite, potentially indicating the client software?
Does it make any difference if you change the cipher suites available for negotiation?
What server software are you using, and are there any firewalls within your network that could potentially be dropping some responses to the user?
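If you can get the client IP and the negotiated cipher suite for each connection out of your server logs, a quick tally helps answer the questions above. This Python sketch assumes you have already parsed the log into (ip, cipher) pairs; the function name and the threshold of 100 connections are arbitrary choices.

```python
from collections import Counter

def noisy_clients(connections, threshold=100):
    """Summarise which clients open many connections and which ciphers they use.

    connections: iterable of (client_ip, cipher_suite) pairs, one per handshake.
    Returns (noisy, ciphers): connection counts for IPs at or above the threshold,
    and a Counter of the cipher suites negotiated by those IPs.
    """
    connections = list(connections)
    per_ip = Counter(ip for ip, _ in connections)
    noisy = {ip: count for ip, count in per_ip.items() if count >= threshold}
    ciphers = Counter(cipher for ip, cipher in connections if ip in noisy)
    return noisy, ciphers
```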
I've seen a botnet flooding HTTPS sites being mentioned.
This is probably not your situation, but I thought I'd mention it.
I'm seeing Chrome (12.0.742.60 beta) flooding my server with HTTPS connections, some half a dozen or more connections for a single static picture being served... as if it had an optimization to build up connections with ready HTTPS handshakes waiting for requests to send, and then after the page (file) has been served it closes them all.
On plain HTTP I see only two connections (one extra for favicon.ico).