Value of proxying HTTP requests with node.js - http

I have very recently started development on a multiplayer browser game that will use nowjs to synchronize player states from the server state. I am new to server-side development (so many of the things I'm saying are probably being said incorrectly), and while I understand how node.js works on its own I have seen discussions about proxying HTTP requests through another server technology (a la NGinx or Apache) for efficiency.
I don't understand why it would be beneficial to do so, even though I've seen plenty of explanations of how to do so. My current plan is to have the game's website and info on the same server as the game itself, so if there is any gain from proxying node I'd love to know why.

In the context of your question it seems you are looking for an answer on the benefits of implementing a reverse proxy in front of your node.js webserver. In summary, a reverse proxy (depending on implementation) can provide the following features out of the box:
Load balancing
Caching of static content
Failover
Compression of responses (e.g gzip)
SSL support
All these features are cross-cutting concerns that you should not need to accommodate in your application tier/code. By implementing these features within the proxy it allows you to focus on developing the code for your application and leaves the web server to do what it's good at, serving the HTTP requests for your application.
nginx appears to be a common choice in a reverse proxy/node configuration and if you take a look at the modules reference you should get a feel for what features the proxy can provide.

When you say "through another technology" I assume you mean through a dedicated web server such as NGinx or Apache.
The reason you do that is b/c in a production environment there are a number of considerations you don't want your application to have to do on its own. Caching, domain (or sub-domain) mapping, perhaps security, SSL, load balancing, and serving static files to name a few.
The web servers are already built to do all those things for you, and so they can handle them and then pass only the requests on to your app that actually need to be handled by your app. They're also optimized for doing those things and will probably do them as well or better than the average developer can.
Hope that helps.

Another issue that people haven't added in here is that with a front-end proxy, when you need to take your service down for maintenance (or even just restart it), nginx can serve up a pretty "YourCompanyName is currently under maintenance" page, making for a much more pleasant user experience.

Related

No need for HTTPS at all, but there seems to be no other way from the web browser

Given a web application running over an HTTPS connection. It also has to communicate with a Java application on the local area network.
This server is literally in the same room with the PC on which the web app is running, a simple HTTP connection would be completely fine between the two, but since the web app is running over HTTPS, the browser forces the HTTPS.
It's already stupid and a big overkill that I must employ an HTTPS server in the Java application just because of that, but still now it doesn't work yet, because now the browser is complaining about the certificate that it is self-signed..
I mean, do I really need to purchase an SSL certificate so two of my computers in the same room can communicate over HTTP? Even if I wanted I couldn't. There's not even a fix domain.
I'm confused, is there a way around?
UPDATE:
The web application is served from the Internet, that's why the HTTPS connection. Whereas it should receive data from a Java application running locally. Hundreds of megabytes in every couple of minutes (confidential medical images) so sending all that through a proxy is not really an option.
I also wanted to avoid the need of any manual configuration from the user's side to make the communication work (like importing a certificate into the web browser and similar) but maybe I have no other option.

My Azure Website has an odd "HTTP success" pattern in the (Monitor) portal

I have a website hosted in Azure Websites as a Basic tier website.
I'm currently in the development stage, yet the site is live and accessible by the outside world (at least at a basic level), so I wanted to better understand the monitoring features in the Azure management portal.
When I looked at the monitoring tab inside the portal, I see an odd pattern for HTTP success. Looking at the past 60 minutes (which I personally have not been active on), the HTTP successes are very cyclic, with 80 connections, then 0, then 40, then 0, then repeat.
Does anyone have any pointers how I can figure out what the 80 and 40 connections are. I certainly don't have any timed events in my code, so there shouldn't be any calls being made unless a person is actually hitting the site.
UPDATE:
I setup a staging server and blocked all incoming traffic except my own IP. So the same code running, just without access from the outside world. And the HTTP success appears only when I hit the server myself (as expected). This suggests that my site is being hit by an outside bot maybe? Does anyone know how to protect against this? Or at least diagnose if the requests are not legitimate, etc?
I'd say it's this setting that causes the traffic:
Always On. By default, websites are unloaded if they are idle for some period of time. This lets the system conserve resources. In Basic or Standard mode, you can enable Always On to keep the site loaded all the time. If your site runs continuous web jobs, you should enable Always On, or the web jobs may not run reliably
http://azure.microsoft.com/en-us/documentation/articles/web-sites-configure/
It's just a keep alive to avoid cold starts every time you or someone else visit your site.
Here's another reference that describes this behavior:
What the always-on feature does is simply ping your site every now and
then, to keep the application pool up and running.
And Scott Gu says:
One of the other useful Web Site features that we are introducing
today is a feature we call “Always On”. When Always On is enabled on a
site, Windows Azure will automatically ping your Web Site regularly to
ensure that the Web Site is always active and in a warm/running state.
This is useful to ensure that a site is always responsive (and that
the app domain or worker process has not paged out due to lack of
external HTTP requests).
About the traffic in general: First of all, the requests could really only come from Microsoft, since any traffic pattern like this will quickly be automatically detected and blocked when using Azure Websites - you cannot set up a keep alive like this yourself. Second, no modern bot whatsoever would regularily ping a specific page with that kind of regularity since it's all to obvious. Any modern datacenter security appliance would catch that kind of traffic and block/ignore/nullroute it.
As for your question regarding protection and security: Microsoft cannot protect your code from yourself. However, everything at the perimeter is managed and handled by Microsoft. That's one of the USP features of Azure - Firewall, Load Balancing, Spoofing, Anti-bot and DDOS protection etc. There will of course always be security concerns regarding any publicly exposed service but you can stay focused on your application while Microsoft manages the rest.
When running Azure Websites, you're in the hands of Microsoft regarding security outside of your application scope. That's a great thing, but if you really like to be able to use other security measures you'll have to set up a virtual machine instead and run your site from there.
You may want to first understand what are these requests. Enable web server logging for the website on Azure Management portal and download IIS logs for your website after seeing this pattern. Then check those to understand the URL, client ip addresses for the requests and user agent field to identify if the requests are really from search bots. Based on the observation, you can either disable some IP statically, use dynamic ip restrictions or configure URLREWRITE to block requests with specific patterns in request or request headers
EDIT
This is how you can block search bots - http://moz.com/ugc/blocking-bots-based-on-useragent
You can configure the URLREWRITE locally on an IIS server in the way described in the above article and then copy the configuration generated in the web.config or connect to the azure website directly using IIS manager as described in http://azure.microsoft.com/blog/2014/02/28/remote-administration-of-windows-azure-websites-using-iis-manager/ and configure urlrewrite rule

Should I always use a reverse proxy for a web app?

I'm writing a web app in Go. Currently I have a layout that looks like this:
[CloudFlare] --> [Nginx] --> [Program]
Nginx does the following:
Performs some redirects (i.e. www.domain.tld --> domain.tld)
Adds headers such as X-Frame-Options.
Handles static images.
Writes access.log.
In the past I would use Nginx as it performed SSL termination and some other tasks. Since that's now handled by CloudFlare, all it does, essentially, is static images. Given that Go has a built in HTTP FileServer and CloudFlare could take over handling static images for me, I started to wonder why Nginx is in-front in the first place.
Is it considered a bad idea to put nothing in-front?
In your case, you can possibly get away with not running nginx, but I wouldn't recommend it.
However, as I touched on in this answer there's still a lot it can do that you'll need to "reinvent" in Go.
Content-Security headers
SSL (is the connection between CloudFlare and you insecure if they are terminating SSL?)
SSL session caching & HSTS
Client body limits and header buffers
5xx error pages and maintenance pages when you're restarting your Go application
"Free" logging (unless you want to write all that in your Go app)
gzip (again, unless you want to implement that in your Go app)
Running Go standalone makes sense if you are running an internal web service or something lightweight, or genuinely don't need the extra features of nginx. If you're building web applications then nginx is going to help abstract "web server" tasks from the application itself.
I wouldn't use nginx at all to be honest, some nice dude tested fast cgi go + nginx and just go standalone library. The results he came up with were quite interesting, the standalone hosting seemed to be much better in handling requests than using it behind nginx, and the final recommendation was that if you don't need specific features of nginx don't use it. full article
You could run it as standalone and if you're using partial/full ssl on your site you could use another go http server to redirect to safe https routes.
Don't use ngnix if you do not need it.
Go does SSL in less lines then you have to write in ngnix configure file.
The only reason is a free logging but I wonder how many lines of code is logging in Go.
There is nice article in Russian about reverse proxy in Go in 200 lines of code.
If Go could be used instead of ngnix then ngnix is not required when you use Go.
You need ngnix if you wish to have several Go processes or Go and PHP on same site.
Or if you use Go and you have some problem when you add ngnix then it fix the problem.

understanding load balancing in asp.net

I'm writing a website that is going to start using a load balancer and I'm trying to wrap my head around it.
Does IIS just do all the balancing for you?
Do you have a separate web layer that sits on the distributed server that does some work before sending to the sub server, like auth or other work?
It seems like a lot of the articles I keep reading don't really give me a straight answer, or I'm just not understanding them correctly, I'd like to get my head around how true load balancing works from a techincal side, and if anyone has any code to share that would also be nice.
I understand caching is gonna be a problem but that's a different topic, session as well.
IIS do not have a load balancer by default but you can use at least two Microsoft technologies:
Application Request Routing that integrates with IIS, there you should ideally have a separate web layer to do routing work,
Network Load Balancing that is integrated with Microsoft Windows Server, there you can join existing servers into NLB cluster.
Both of those technologies do not require any code per se, it a matter of the infrastructure. But you must of course remember about load balanced environment during development. For example, to make a web sites truly balanced, they should be stateless. Otherwise you will have to provide so called stickiness between client and the server, so the same client will be connecting always to the same server.
To make service stateless, do not persist any state (Session, for example, in case of ASP.NET website) on the server but on external server shared between all servers in the farm. So it is common for example to use external ASP.NET Session server (StateServer or SQLServer modes) for all sites in the cluster.
EDIT:
Just to clarify a few things, a few words about both mentioned technologies:
NLB works on network level (as a networking driver in fact), so without any knowledge about applications used. You create so called clusters consisting of a few machines/servers and expose them as a single IP address. Then another machine can use this IP as any other IP, but connections will be routed to the one of the cluster's machines automatically. A cluster is configured on each server, there is no external, additional routing machine. Depending on the clusters settings, as we have already mentioned, a stickiness can be enabled or disabled (called here a Single or None Affinity). There is also a Load weight parameter, so you can set weighed load distribution, sending more connections to the fastest machine for example. But this parameter is static, it can't be dynamically based on network, CPU or any other usage. In fact NLB does not care if target application is even running, it just route network traffic to the selected machine. But it notices servers went offline, so there will be no routing there. The advantages of NLB is that it is quite lightweight and requires no additional machines.
ARR is much more sophisticated, it is built as a module on top of IIS and is designed to make the routing decisions at application level. Network load balancing is only one of its features as it is a more complete, routing solution. It has "rule-based routing, client and host name affinity, load balancing of HTTP server requests, and distributed disk caching" as Microsoft states. You create there Server Farms with many options like load balance algorithm, load distribution and client stickiness. You can define health tests and routing rules to forward request to other servers. Disadvantage of all of it is that there should be a dedicated machine where ARR is installed, so it takes more resources (and costs).
NLB & ARR - as using a single ARR machine can be the single point of failure, Microsoft states that it is worth consideration to create a NLB cluster of ARR machines.
Does IIS just do all the balancing for you?
Yes,if you configure Application Request Routing:
Do you have a separate web layer that sits on the distributed server
Yes.
that does some work before sending to the sub server, like auth or other work?
No, ARR is pretty 'dumb':
IIS ARR doesn't provide any pre-authentication. If pre-auth is a requirement then you can look at Web Application Proxy (WAP) which is available in Windows Server 2012 R2.
It just acts as a transparent proxy that accepts and forwards requests, while adding some caching when configured.
For authentication you can look at Windows Server 2012's Web Application Proxy.
Some tips, and perhaps items to get yourself fully acquainted with:
ARR as all the above answers above state is a "proxy" that handles the traffic from your users to your servers.
You can handle State as Konrad points out, or you can have ARR do "sticky" sessions (ensure that a client always goes to "this server" - presumably the server that maintains state for that specific client). See the discussion/comments on that answer - it's great.
I haven't worn an IT/server hat for so long and frankly haven't touched clustering hands on (always "handled for me automagically" by some provider), so I did ask this question from our host, "what/how is replication among our cluster/farm" done?" - The question covers things like
I'm only working/setting things on 1 server, does that get replicated across X VMs in our cluster/farm? How long?
What about dynamically generated,code and/or user generated files (file system)? If it's on VM1's file system, and I have 10 load balanced VMs, and the client can hit any one of them at any time, then...?
What about encryption? e.g. if you use DPAPI to encrypt web.config stuff (e.g.db conn strings/sections), what is the impact of that (because it's based on machine key, and well, the obvious thing is now you have machine(s) or VM(s). RSA re-write....?
SSL: ARR can handle this for you as well, and that's great! But as with all power, comes a "con" - if you check/validate in your code, e.g. HttpRequest.IsSecureConnection, well, it'll always be false. Your servers/VMs don't have the cert, ARR does. The encrypted conn is between client and ARR. ARR to your servers/VMs isn't. As the link explains, if you prefer it the other way around (no offloading), you can...but that means all your servers/VMs should then have a cert (and how that pertains to "replication" above starts popping in your head).
Not meant to be comprehensive, just listing things out from memory...Hth

How to enable a maintenance page on the frontend server while you are restarting the backend service?

I am trying to improve the user experience while a backend service is down due to maintenance, shutdown manually.
We do had a frontend web proxy, which happens to be nginx but it could also be something else like a NetScaler instance. An important note is that the frontend proxy is running on a different machine than the backend application.
Now, for the backend service it takes a lot of time to start, even more than 10 minutes in some cases.
Note, I am asking this question on StackOverflow, as opposed to ServerFault because providing a solution for this problem is more likely to require writing some bash code inside the daemon startup script.
What we want to achive:
service mydaemon stop should enable the maintenance page on the frontend proxy
service mydaemon start should disabled the maintenance page on the frontend proxy
In the past we used to create a maintenance.html page and had nginx configured to check the existence of this page using try, before falling back to the backend.
Still, because we decided to move nginx to another machine we cannot do this and doing this using ssh raises security concerns.
We already considered writing this file to a NFS drive which would be accessible by both machine, but even this solution does not scale for a service that has a lot of traffic. Nginx will end-up trying this for every request, slowing down the responses quite a lot.
We are looking for another solution for this problem, one that would ideally be more flexible.
As a note, we still want to be able to trigger this behaviour from inside the daemon script of the backend application. So if the backend application stops responsing for other reasons, we do expect to see the same from the frontend.

Resources