Set up a maintenance URL in a generic way - networking

Suppose I have a website that is normally accessed at address www.mywebsite.com.
Now let's say the website is down completely (think: the server has melted). I want users trying to reach www.mywebsite.com to end up on a maintenance URL on another server instead of getting a 404.
Is this easily possible without having to route all the traffic through a dispatcher/load balancer?
I could imagine something like this:
When the default server is UP, traffic flows like this:
[USER]<---->[www.mywebsite.com]<---->[DISPATCHER]<---->[DEFAULT SERVER]
When the default server is DOWN, traffic flows like this:
[USER]<---->[www.mywebsite.com]<---->[DISPATCHER]<---->[MAINTENANCE SERVER]
Where [DISPATCHER] figures out where to route the traffic. The problem is that in this scenario all the traffic goes through [DISPATCHER]. Can I make it so that the first connection goes through the dispatcher and then, if the default server is up, the traffic goes directly from the user to the default server (with a check every 10-15 minutes, for example)?
[USER]<---->[www.mywebsite.com]<-------->[DEFAULT SERVER] after the first successful connection
Thanks in advance!

Unfortunately, the most practical solution may be to give up, at least until browsers finally add support for SRV records...
You can achieve what you want with dynamic DNS: set up a monitoring script on a "maintenance server" that checks whether your website is down and, if so, updates the DNS for your site to point it at the maintenance server. This approach has its own problems, the biggest of which is that any monitoring may generate false positives, so your users may see the maintenance page while the site is actually up.
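As a rough illustration of such a monitoring script, here is a minimal Node.js sketch. The site URL, check interval and failure threshold are made up, and updateDnsToMaintenance() is a placeholder, since the actual DNS update call depends entirely on your DNS provider's API:

// Minimal health-check loop; requires Node 18+ for the built-in fetch.
const SITE = 'https://www.mywebsite.com/';
const CHECK_INTERVAL_MS = 60 * 1000;
let consecutiveFailures = 0;

async function updateDnsToMaintenance() {
  // Placeholder: call your DNS provider's API here to point the record
  // for www.mywebsite.com at the maintenance server.
}

async function check() {
  try {
    const res = await fetch(SITE, { method: 'HEAD' });
    consecutiveFailures = res.ok ? 0 : consecutiveFailures + 1;
  } catch {
    consecutiveFailures += 1;
  }
  // Require several failed checks in a row to reduce false positives.
  if (consecutiveFailures >= 3) {
    await updateDnsToMaintenance();
  }
}

setInterval(check, CHECK_INTERVAL_MS);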
Another (even worse) possible approach: make www.example.com point to your dispatcher server and www2.example.com to your main server. The dispatcher would then HTTP-redirect all incoming requests to www2.example.com.
But what will you do when your dispatcher melts? While trying to handle one point of failure, you have just added another one.
It may also be practical to handle all page links with some JavaScript that first checks whether the server is up and only then follows the link. This approach requires some scripting, but it at least gives the best results when your server goes down while the user is already on your site. It does nothing, however, for those who try to enter the site for the first time.
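A minimal sketch of that client-side check, assuming same-origin links; the maintenance URL and the 3-second timeout are placeholders, and a HEAD request is used only as a cheap availability probe:

// Intercept link clicks, probe the server first, and fall back to a
// maintenance page if the probe fails.
const MAINTENANCE_URL = 'https://maintenance.example.com/';

document.addEventListener('click', async (event) => {
  const link = event.target instanceof Element ? event.target.closest('a[href]') : null;
  if (!link) return;
  event.preventDefault();
  try {
    // Any network error or timeout counts as "down".
    await fetch(link.href, { method: 'HEAD', signal: AbortSignal.timeout(3000) });
    window.location.href = link.href;
  } catch {
    window.location.href = MAINTENANCE_URL;
  }
});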
If only browsers would support SRV records....

Related

Monitoring website visit length with Nginx logs

I am trying to build a solution to monitor my website visits without relying on cookies or third parties. Currently, by monitoring the access logs I can get enough useful information, but I am missing the length of the visits (i.e. to check whether people actually read what I write).
What would be a good strategy to monitor visit length with access logs? (I am using Nginx, but presumably the same ideas will be valid for Apache)
If it is not already part of your build, install the Nchan websockets module for Nginx.
Configure a websocket subscriber location directive on your Nginx server and specify nchan_subscribe_request and nchan_unsubscribe_request directives within it.
Insert a line of code into your page to establish a client connection to your websocket location upon page load.
That's it, done.
Now when I visit your page, my browser will connect to your Nginx/Nchan websocket server. Nginx will make an internal request to whatever address you set as the nchan_subscribe_request URL; you can pass my IP in the headers of this request, or whatever else you need to identify me. Log this in your main log or a separate log, pass it to an upstream server, PHP, Node, make a database entry, save my IP + timestamp in memcached, whatever.
Then when I leave the site, my websocket connection will disconnect and Nginx will do the same thing, but to the nchan_unsubscribe_request URL instead. Depending on what you did when I connected, you can now do whatever you need to do in order to work out how long I spent on your site.
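The client-side "line of code" from step 3 could look roughly like this; the /sub/visit path is a placeholder for whatever subscriber location you configured in Nginx:

// Open a websocket to the Nchan subscriber location when the page loads.
const visitSocket = new WebSocket(
  (location.protocol === 'https:' ? 'wss://' : 'ws://') + location.host + '/sub/visit'
);
// Nothing else is needed here: Nginx fires the subscribe/unsubscribe
// requests when this connection opens and closes.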
As you now have a persistent connection to your clients you could take it a step further and include some code to monitor certain client behaviours or watch for certain events.
Since you are trying to determine whether or not people are reading what you write, you could use a few lines of JavaScript to monitor how far down the page visitors have scrolled. Each time they reach a new maximum scroll position, send that data over the websocket back to your server.
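A small sketch of that scroll tracking, reusing the hypothetical visitSocket connection from the snippet above and assuming the location also accepts client publishes (e.g. it is configured with nchan_pubsub):

// Report each new maximum scroll depth (as a percentage) over the websocket.
let maxScrollPct = 0;

window.addEventListener('scroll', () => {
  const scrollable = document.documentElement.scrollHeight - window.innerHeight;
  if (scrollable <= 0) return;
  const pct = Math.round((window.scrollY / scrollable) * 100);
  if (pct > maxScrollPct && visitSocket.readyState === WebSocket.OPEN) {
    maxScrollPct = pct;
    visitSocket.send(JSON.stringify({ maxScrollPct }));
  }
});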
Due to the disconnected nature of HTTP, your access log would probably not give you what you need.
I'm not totally familiar with Nginx or Apache logs, but I think most logs contain a timestamp, the HTTP request (the document requested, the status, etc.) and an IP address.
Potential issues
Without a session cookie, all visitors behind the same IP address (same household, same company, etc.) would be seen as a single session.
If someone goes to your site (one HTTP request), consumes the content, doesn't proceed to another page, and leaves, your log will only contain that request (which is essentially a bounce, and you won't be able to calculate a duration). If your application makes use of a lot of JavaScript calls, then you might be able to log those from the server-side application.
If you use a tool like GA, you can still use timers and JavaScript events (though not perfect) to tell GA that the session is still active; I'm not sure whether something similar works for plain server logs.
It might not be as big an issue if a typical visit contains more than one request, given that there is no easy way to get the duration after the last server request.

My Azure Website has an odd "HTTP success" pattern in the (Monitor) portal

I have a website hosted in Azure Websites as a Basic tier website.
I'm currently in the development stage, yet the site is live and accessible by the outside world (at least at a basic level), so I wanted to better understand the monitoring features in the Azure management portal.
When I looked at the monitoring tab inside the portal, I see an odd pattern for HTTP success. Looking at the past 60 minutes (which I personally have not been active on), the HTTP successes are very cyclic, with 80 connections, then 0, then 40, then 0, then repeat.
Does anyone have any pointers on how I can figure out what the 80 and 40 connections are? I certainly don't have any timed events in my code, so there shouldn't be any calls being made unless a person is actually hitting the site.
UPDATE:
I set up a staging server and blocked all incoming traffic except from my own IP, so it's the same code running, just without access from the outside world. There, the HTTP successes appear only when I hit the server myself (as expected). This suggests that my site is perhaps being hit by an outside bot? Does anyone know how to protect against this, or at least how to diagnose whether the requests are legitimate?
I'd say it's this setting that causes the traffic:
Always On. By default, websites are unloaded if they are idle for some period of time. This lets the system conserve resources. In Basic or Standard mode, you can enable Always On to keep the site loaded all the time. If your site runs continuous web jobs, you should enable Always On, or the web jobs may not run reliably
http://azure.microsoft.com/en-us/documentation/articles/web-sites-configure/
It's just a keep-alive to avoid cold starts every time you or someone else visits your site.
Here's another reference that describes this behavior:
What the always-on feature does is simply ping your site every now and
then, to keep the application pool up and running.
And Scott Gu says:
One of the other useful Web Site features that we are introducing
today is a feature we call “Always On”. When Always On is enabled on a
site, Windows Azure will automatically ping your Web Site regularly to
ensure that the Web Site is always active and in a warm/running state.
This is useful to ensure that a site is always responsive (and that
the app domain or worker process has not paged out due to lack of
external HTTP requests).
About the traffic in general: first of all, the requests could really only come from Microsoft, since any traffic pattern like this would quickly be detected and blocked automatically when using Azure Websites - you cannot set up a keep-alive like this yourself. Second, no modern bot would ping a specific page with that kind of regularity, since it's far too obvious; any modern datacenter security appliance would catch that kind of traffic and block/ignore/null-route it.
As for your question regarding protection and security: Microsoft cannot protect your code from yourself. However, everything at the perimeter is managed and handled by Microsoft. That's one of the selling points of Azure: firewall, load balancing, anti-spoofing, anti-bot and DDoS protection, etc. There will of course always be security concerns with any publicly exposed service, but you can stay focused on your application while Microsoft manages the rest.
When running Azure Websites, you're in Microsoft's hands regarding security outside of your application's scope. That's a great thing, but if you really want to be able to use other security measures, you'll have to set up a virtual machine instead and run your site from there.
You may want to first understand what these requests are. Enable web server logging for the website in the Azure Management portal and download the IIS logs for your website after seeing this pattern. Then check them to understand the URLs, client IP addresses and user-agent fields of the requests, to identify whether they really come from search bots. Based on that observation, you can either block some IPs statically, use dynamic IP restrictions, or configure URL Rewrite to block requests with specific patterns in the request or request headers.
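As a quick way to get an overview of those logs, here is a small Node.js sketch that tallies requests per user agent in a W3C-format IIS log. The file name is a placeholder, and the field list is read from the log's own #Fields: header:

// Count requests per user agent in a downloaded IIS (W3C format) log file.
const fs = require('fs');

const lines = fs.readFileSync('u_ex.log', 'utf8').split(/\r?\n/);
let fields = [];
const counts = new Map();

for (const line of lines) {
  if (line.startsWith('#Fields:')) {
    fields = line.slice('#Fields:'.length).trim().split(' ');
    continue;
  }
  if (!line || line.startsWith('#')) continue;
  const values = line.split(' ');
  const ua = values[fields.indexOf('cs(User-Agent)')] || '(none)';
  counts.set(ua, (counts.get(ua) || 0) + 1);
}

// Print user agents sorted by request count, busiest first.
[...counts.entries()]
  .sort((a, b) => b[1] - a[1])
  .forEach(([ua, n]) => console.log(`${n}\t${ua}`));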
EDIT
This is how you can block search bots - http://moz.com/ugc/blocking-bots-based-on-useragent
You can configure URL Rewrite locally on an IIS server in the way described in the article above and then copy the generated configuration into your web.config, or connect to the Azure website directly using IIS Manager as described in http://azure.microsoft.com/blog/2014/02/28/remote-administration-of-windows-azure-websites-using-iis-manager/ and configure the rewrite rule there.

Website accessible from everywhere except for client's network

My client has a website that is showing some strange behavior. The site is built in ASP.Net and used to be hosted on their internal network. It's now been moved to a different server outside their network. They have other sites hosted on the same server, some built using DotNetNuke, and some classic ASP. All these sites are hosted on one application server, with a database (SQL Server 2008) on a separate server (which is on the same network as the application server). They share the application server, and the database server.
Now that this site has been moved to the outside server, they can't access it. I can, and so can others that I work with (from different IPs, across the country). But the client can't from their network. They can access the landing page subsite.clientdomain.com (no db access), but nothing else. So, for instance, there's a link to subsite.clientdomain.com/folder. When they click that link, the URL changes to subsite.com/folder, which does not work. For myself and others not at the client site, the URL does not change and opens with no problems.
I didn't write the site, and didn't even know it existed before this problem cropped up, so I know very little more than this. Any help is appreciated.
I'm going to go with Martijn B's answer: there's a DNS issue on the internal network. Somewhere on one of the DNS servers is a definition that maps http://companywebsite to an IP address like 192.168.1.20 or whatever.
I would open a command prompt on your PC and type
ping new_website_name.com
Take a look at the IP address that comes back. You can also do an nslookup on new_website_name.com, which will give you more information. If you (person A) get one IP address and person B (inside the network) gets a different IP address... there is definitely a DNS issue on the internal network.
You're going to have to do some network tracing to determine exactly where any redirection is occurring. Given that the problem is only manifested in certain locations, it is likely that it is a function of network configuration in that location (as previously suggested). Without understanding exactly what redirection is occurring, it would be unwise to make configuration changes that might make the problem worse or introduce new issues.
A DNS server cannot, AFAIK, redirect to a different URL. So something is redirecting from subsite.clientdomain.com/folder to subsite.com/folder, which could be caused by an HTTP redirect. This can be triggered by the software/website itself or by IIS.

Redirecting http traffic to another server temporarily

Assume you have one box (a dedicated server) that's on 24/7 and several other boxes that are user machines with unused bandwidth. Assume you want to host several web pages. How can the dedicated server redirect HTTP traffic to the user machines? It is desirable that the address field in the web browser still displays the right address, and not an IP. I.e. I don't want to redirect to another web page; I want to tell the web browser that it should request the same web page from a different server. I have been browsing through the 3xx codes, and I don't think they are made for anything like this.
It should work somewhat along these lines:
1. Dedicated server is online all the time.
2. User machine starts and tells the dedicated server that it's online.
(several other user machines can do similarly)
3. Web browser looks up domain name and finds out that it points to dedicated server.
4. Web browser requests page.
5. Dedicated server tells the web browser to repeat the request to a user machine.
Is it possible to use some kind of redirect, and preferably tell the browser to keep sending further requests to the user machine? The user machine can shut down at almost any point in time, but it is assumed that it will wait for ongoing transactions to finish, i.e. not close the server program in the middle of a GET or something.
What you want is called a Proxy server or load balancer that would sit in front of your web server.
The web browser would always talk to the load balancer, and the load balancer would forward the request to one of several back-end servers. No redirect is needed on the client side, as the client always thinks it is just talking to the load balancer.
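To make the idea concrete, here is a minimal round-robin forwarding sketch using only Node's built-in http module; the backend addresses and port are placeholders, and a real deployment would normally use nginx, HAProxy or a managed load balancer rather than hand-rolled code:

// Tiny round-robin reverse proxy: every incoming request is forwarded to
// the next backend in the list, and the backend's response is relayed back.
const http = require('http');

const backends = [
  { host: '192.0.2.10', port: 8080 },
  { host: '192.0.2.11', port: 8080 },
];
let next = 0;

http.createServer((clientReq, clientRes) => {
  const backend = backends[next++ % backends.length];
  const proxyReq = http.request({
    host: backend.host,
    port: backend.port,
    method: clientReq.method,
    path: clientReq.url,
    headers: clientReq.headers,
  }, (proxyRes) => {
    // Relay the backend's status, headers and body to the client.
    clientRes.writeHead(proxyRes.statusCode, proxyRes.headers);
    proxyRes.pipe(clientRes);
  });
  proxyReq.on('error', () => {
    clientRes.writeHead(502);
    clientRes.end('Bad gateway');
  });
  clientReq.pipe(proxyReq);
}).listen(8080);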
ETA:
Looking at your various comments and re-reading the question, I think I misunderstood what you wanted to do. I was thinking that all the machines serving content would be on the same network, but now I see that you are looking for something more like a p2p web server setup.
If that's the case, using DNS and HTTP 30x redirects would probably be what you need. It would probably look something like this:
Your "master" server would serve as an entry point for the app, and would have a well known host name, e.g. "www.myapp.com".
Whenever a new "user" machine came online, it would register itself with the master server, and the master server would create or update a DNS entry for that user machine, e.g. "user123.myapp.com".
If a request came to the master server for a given page, e.g. "www.myapp.com/index.htm", it would do a 302 redirect to one of the user machines based on whatever DNS entry it had created for that machine - e.g. redirect them to "user123.myapp.com/index.htm".
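A toy sketch of that master server in Node.js; the /register endpoint, the hostname scheme and the random pick are all invented for illustration, and the DNS update and any authentication are left out entirely:

// "Master" server: user machines register themselves, and page requests are
// answered with a 302 redirect to one of the registered machines.
const http = require('http');

const online = new Map(); // name -> hostname, e.g. "user123" -> "user123.myapp.com"

http.createServer((req, res) => {
  const url = new URL(req.url, 'http://www.myapp.com');

  if (url.pathname === '/register') {
    // e.g. a user machine coming online calls /register?name=user123
    const name = url.searchParams.get('name');
    online.set(name, `${name}.myapp.com`);
    // (Here you would also create or update the DNS entry for that hostname.)
    res.writeHead(204);
    res.end();
    return;
  }

  const hosts = [...online.values()];
  if (hosts.length === 0) {
    res.writeHead(503);
    res.end('No user machines online');
    return;
  }

  // Pick a machine and bounce the browser to the same path on it.
  const target = hosts[Math.floor(Math.random() * hosts.length)];
  res.writeHead(302, { Location: `http://${target}${url.pathname}` });
  res.end();
}).listen(8080);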
Some problems I see with this approach:
First, once a user gets redirected to a user machine, if that machine went offline it would seem like the app was dead. You could avoid this by having all the links on every page point specifically to "www.myapp.com" instead of using relative links, but then every single request has to be routed through the "master server", which would be relatively inefficient.
You could potentially solve this by changing the DNS entry for a user machine when it goes offline to point back to the master server, but that wouldn't work without an extremely short TTL.
Another issue you'll have is tracking sessions. You probably wouldn't be able to use sessions very effectively with this setup without a shared session state server of some sort accessible by all the user machines. Although cookies should still work.
In networking, load balancing is a technique to distribute workload evenly across two or more computers, network links, CPUs, hard drives, or other resources, in order to get optimal resource utilization, maximize throughput, minimize response time, and avoid overload. Using multiple components with load balancing, instead of a single component, may increase reliability through redundancy. The load balancing service is usually provided by a dedicated program or hardware device (such as a multilayer switch or a DNS server).
and more interesting stuff in here
Apart from load balancing, you will need to set up a more or less similar environment on the "user machines".
This sounds like 1 part proxy, 1 part load balancer, and about 100 parts disaster.
If I had to guess, I'd say you're trying to build some type of relatively anonymous torrent... But I may be wrong. If I'm right, HTTP is entirely the wrong protocol for something like this.
You could use DNS. Off the top of my head, you could set up a host record for each machine that is going to serve users:
www    IN    A    xxx.xxx.xxx.xxx    ; IP address of machine 1
www    IN    A    xxx.xxx.xxx.xxx    ; IP address of machine 2
www    IN    A    xxx.xxx.xxx.xxx    ; IP address of machine 3
Then, as other machines come online, you could add them to the DNS entries:
www    IN    A    xxx.xxx.xxx.xxx    ; IP address of machine 4
The only problem is that you'll have to lower the time-to-live (TTL) for each record (I think the default is 86400 seconds, i.e. one day).
If a machine goes down, you'll have to remove its DNS entry, though I do think this is the least intensive way of adding capacity to any website. Jeff Atwood has more info here: is round robin DNS good enough?

Forwarding HTTP Request with Direct Server Return

I have servers spread across several data centers, each storing different files. I want users to be able to access the files on all servers through a single domain and have the individual servers return the files directly to the users.
The following shows a simple example:
1) The user's browser requests http://www.example.com/files/file1.zip
2) Request goes to server A, based on the DNS A record for example.com.
3) Server A analyzes the request and works out that /files/file1.zip is stored on server B.
4) Server A forwards the request to server B.
5) Server B returns file1.zip directly to the user without going through server A.
Note: steps 4 and 5 must be transparent to the user and cannot involve sending a redirect to the user as that would violate the requirement of a single domain.
From my research, what I want to achieve is called "Direct Server Return" and it is a common setup for load balancing. It is also sometimes called a half reverse proxy.
For step 4, it sounds like I need to do MAC address translation and then pass the request back onto the network; for servers outside server A's network, tunneling will be required.
For step 5, I simply need to configure server B, as per the real servers in a load balancing setup. Namely, server B should have server A's IP address on the loopback interface and it should not answer any ARP requests for that IP address.
My problem is how to actually achieve step 4?
I have found plenty of hardware and software that can do this for simple load balancing at layer 4, but these solutions fall short and cannot handle the kind of custom routing I require. It seems like I will need to roll my own solution.
Ideally, I would like to do the routing / forwarding at the web server level, i.e. in PHP or C# / ASP.net. However, I am open to doing it at a lower level such as Apache or IIS, or at an even lower level, i.e. a custom proxy service in front of everything.
Forgive my ignorance, but why not set up server A to mount the files located on the other servers via NFS or SMB, depending on whether you're using a Unix variant or Windows?
Seems like what you're trying to do is overly complicate something that could be very simple. In addition, using network-mounted files will allow you to mount those files on additional machines in the future when you need them. At that point, then you could put a load balancer in front of server A (and servers x, y, and z, which also all mount files from server B).
Granted, this would not solve the problem of bypassing server A on the return; technically server A would be returning the file instead of server B. But if a load balancer were put in front of A, then A would become B anyway, so technically B would still be returning the file, because the load balancer would use direct server return (it has been a standard feature for a long time now).
If I did miss something, please do elaborate.
Edit: Yes I realize this was posted nearly 3 years ago. Oh well.
Why not send an HTTP response of status code 307 Temporary Redirect?
At that point the client will re-issue the request to the specified server.
I know you want a single domain, but you could have both individual subdomains plus a single common domain.
For example:
example.com has IP1, IP2, IP3.
example1.example.com has IP1
example2.example.com has IP2
example3.example.com has IP3
If the request comes to a server that can't handle it itself, it will redirect the user to make another request to the correct specific server. An HTTP browser will follow this redirect transparently, by the way.
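A sketch of that redirect logic in Node.js, under the assumption that each server knows (or can look up) which subdomain holds a given file; the fileLocations map and hostnames here are invented for illustration:

// Redirect requests for files stored elsewhere to the subdomain that holds
// them; anything we own is served (stubbed with a 404 placeholder here).
const http = require('http');

const MY_HOST = 'example1.example.com';
const fileLocations = {
  '/files/file1.zip': 'example2.example.com',
  '/files/file2.zip': 'example3.example.com',
};

http.createServer((req, res) => {
  const owner = fileLocations[req.url];
  if (owner && owner !== MY_HOST) {
    // 307 keeps the request method and is followed transparently by browsers.
    res.writeHead(307, { Location: `http://${owner}${req.url}` });
    res.end();
    return;
  }
  // Placeholder: serve the file from local storage here.
  res.writeHead(404);
  res.end('Not found on this server');
}).listen(8080);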
