Users connect to our webserver via https, and stay on a secured connection throughout their use of our service. A typical user session will establish a small handful of connections to the server (one or two).
There are a very small number of exceptions we are trying to track down: particular users will intermittently have hundreds of connections established. When we happen to catch the problem in the act, we can see the SSL handshake exchange, and from the server's perspective all appears to be in order. Yet we never observe a payload; the client instead connects on a new port and initiates a new handshake.
We do not have access to the client, and cannot observe the behavior from that side of the connection. Nor do we have a local scenario that can reproduce the problem.
It is our belief (though not confirmed) that the user agent is connecting to our server directly, and not through a proxy.
Does anybody recognize these symptoms? Can anyone suggest steps to further identify the problem?
Are there any patterns you can see to this traffic, aside from making many repeated requests?
For example, do the requests come from the same IP ranges? They might belong to search engines or other spiders, or come from countries you don't normally get users from, which could indicate some sort of botnet, or at least give you something you could block.
Do these rogue requests always negotiate to use a particular cipher suite, potentially indicating the client software?
Does it make any difference if you change the cipher suites available for negotiation?
What server software are you using, and are there any firewalls within your network that could potentially be dropping some responses to the user?
I've seen a botnet flooding HTTPS sites being mentioned.
This is probably not your situation, but I thought I'd mention it.
I'm seeing Chrome (12.0.742.60 beta) flooding my server with HTTPS connections, some half a dozen or more connections for a single static picture being served... as if it had an optimization that builds up connections with ready HTTPS handshakes waiting for requests to send, and then closes them all after the page (file) has been served.
Over plain HTTP I see only two connections (one extra for favicon.ico).
I'm going to be using gRPC for a device-to-device connection over a network (my device will be running Linux and collecting patient data from various monitors; gRPC will be used by a Windows client system to grab and display that data).
I obviously want to encrypt the data on the wire, but dealing with certificates is going to be a problem for various reasons. I can easily have the server not ask for the client cert, but so far I've been unable to find a way around the client validating the server's cert.
I've got several reasons I don't want to bother with a server cert:
The data collection device (the gRPC server) is going to be assigned an IP and name via DHCP in most cases. Which means that when that name changes (at install time, or when they move the device to a different part of the hospital), I have to automatically fixup the certs. Other than shipping a self-signed CA cert and key with the device, I don't know how to do that.
There are situations where we're going to want to point client to server via IP, not name. Given that gRPC can't do a cert for an IP (https://github.com/grpc/grpc/issues/2691), this becomes a configuration that we can't support without doing something to give a name to a thing we only have an IP for (hosts file on the Windows client?). Given the realities of operating in a hospital IT environment, NOT supporting use of IPs instead of names is NOT an option.
Is there some simple way to accommodate this situation? I'm far from an expert on any of this, so it's entirely possible I've missed something very basic.
Is there some simple way to set the name that the client uses to check the server to be different than the name it uses to connect to the server? That way I could just set a fixed name, use that all the time and be fine.
Is there some way to get a gRPC client to not check the server certificate? (I already have the server setup to ignore the client cert).
Is there some other way to get gRPC to encrypt the connection?
I could conceivably set things up to have the client open an ssh tunnel to the server and then run an insecure gRPC connection across that tunnel, but obviously adding another layer to opening the connection is a pain in the neck, and I'm not at all sure how comfortable the client team is going to be with that.
Thanks for raising this question! Please see my inline replies below:
I obviously want to encrypt the data on the wire, but dealing with
certificates is going to be a problem for various reasons. I can
easily have the server not ask for the client cert, but so far I've
been unable to find a way around the client validating the server's
cert.
There are actually two types of checks happening on the client side: the certificate check and the hostname verification check. The former checks the server certificate to make sure it is trusted by the client; the latter checks the target name against the server's identity in the peer certificate. It seems you are suffering from the latter; I just want to make sure, because you will need to get both of these checks right on the client side in order to establish a good connection.
The data collection device (the gRPC server) is going to be assigned
an IP and name via DHCP in most cases. Which means that when that name
changes (at install time, or when they move the device to a different
part of the hospital), I have to automatically fixup the certs. Other
than shipping a self-signed CA cert and key with the device, I don't
know how to do that.
There are situations where we're going to want to point client to
server via IP, not name. Given that gRPC can't do a cert for an IP
(https://github.com/grpc/grpc/issues/2691), this becomes a
configuration that we can't support without doing something to give a
name to a thing we only have an IP for (hosts file on the Windows
client?). Given the realities of operating in a hospital IT
environment, NOT supporting use of IPs instead of names is NOT an
option.
gRPC supports IP addresses (this is also mentioned in the last comment of the issue you brought up). You will have to put the IP address in the SAN field of the server's certificate, instead of the CN field. It's true that this becomes a problem if your IP changes dynamically; that's why a DNS domain name and a PKI infrastructure are normally used. If that's too heavy an amount of work for your team, see below :)
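If it helps, here is a minimal sketch of generating such a certificate with the third-party Python "cryptography" package; the IP address, common name, and file names are placeholders, not anything your setup requires:

import datetime
import ipaddress
from cryptography import x509
from cryptography.x509.oid import NameOID
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa

# Self-signed server certificate whose SAN carries the device's IP address,
# so the client's hostname check can match a bare IP.
key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
name = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, "datacollector")])
now = datetime.datetime.utcnow()
cert = (
    x509.CertificateBuilder()
    .subject_name(name)
    .issuer_name(name)                     # self-signed
    .public_key(key.public_key())
    .serial_number(x509.random_serial_number())
    .not_valid_before(now)
    .not_valid_after(now + datetime.timedelta(days=365))
    .add_extension(
        x509.SubjectAlternativeName([x509.IPAddress(ipaddress.ip_address("10.0.0.5"))]),
        critical=False,
    )
    .sign(key, hashes.SHA256())
)
with open("server.pem", "wb") as f:
    f.write(cert.public_bytes(serialization.Encoding.PEM))
with open("server.key", "wb") as f:
    f.write(key.private_bytes(
        serialization.Encoding.PEM,
        serialization.PrivateFormat.TraditionalOpenSSL,
        serialization.NoEncryption(),
    ))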
Is there some simple way to accommodate this situation? I'm far from
an expert on any of this, so it's entirely possible I've missed
something very basic.
Is there some simple way to set the name that the client uses to check
the server to be different than the name it uses to connect to the
server? That way I could just set a fixed name, use that all the time
and be fine.
You can directly use IP address to connect, and override the target name in the channel args. Note that the overridden name should match the certificate sent from the server. Depending on which credential type you use, it could be slightly different. I suggest you read this question.
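For reference, a minimal sketch of that override with the Python binding; the CA file, IP address, port, and the fixed name "datacollector" are placeholders, and the fixed name must match an entry in the server's certificate:

import grpc

# Connect by IP, but override the name used for the hostname check via the
# "grpc.ssl_target_name_override" channel argument.
with open("ca.pem", "rb") as f:        # the CA (or self-signed) cert the client trusts
    creds = grpc.ssl_channel_credentials(root_certificates=f.read())

channel = grpc.secure_channel(
    "10.0.0.5:50051",                  # target by IP address
    creds,
    options=[("grpc.ssl_target_name_override", "datacollector")],
)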
Is there some way to get a gRPC client to not check the server
certificate? (I already have the server setup to ignore the client
cert).
Is there some other way to get gRPC to encrypt the connection?
Note that even if you don't use any certificate on the wire, as long as the correct credential type (SSL/TLS) is used, the data on the wire is encrypted. Certificates help you make sure the endpoint you are connecting to is the one you expect. Failing to use certificates will leave your application open to man-in-the-middle attacks. Hope this helps you better understand the goals and make the right judgement for your team.
I am trying to build a solution to monitor my website visits without relying on cookies and third parties. Currently, by monitoring the access logs I can get enough useful information, but I am missing the length of the visits (i.e. to check whether people actually read what I write).
What would be a good strategy to monitor visit length with access logs? (I am using Nginx, but presumably the same ideas will be valid for Apache)
If it's not already part of your build, install the Nchan websockets module for Nginx.
Configure a websocket subscriber location directive on your Nginx server and specify nchan_subscribe_request and nchan_unsubscribe_request directives within it.
Insert a line of code into your page to establish a client connection to your websocket location upon page load.
That's it, done.
Now when I visit your page, my browser will connect to your Nginx/Nchan websocket server. Nginx will make an internal request to whatever address you set as the nchan_subscribe_request URL; you can pass my IP in the headers of this request, or whatever else you need to identify me. Log this in your main log or a separate log, pass it to an upstream server, PHP, Node, make a database entry, save my IP+timestamp in memcached, whatever.
Then when I leave the site my websocket connection will disconnect and Nginx will do the same thing but to the nchan_unsubscribe_request URL instead. Depending upon what you did when I connected you can now do whatever you need to do in order to work out how long I spent on your site.
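As a rough illustration only, here is a minimal Python sketch of the upstream endpoints such a setup could call; the /sub and /unsub paths and the X-Visitor-IP header are placeholders for whatever you configure in nchan_subscribe_request / nchan_unsubscribe_request, and the requests are assumed to arrive as plain GETs:

import time
from http.server import BaseHTTPRequestHandler, HTTPServer

connected_at = {}  # visitor IP -> timestamp of the websocket subscribe

class VisitTracker(BaseHTTPRequestHandler):
    def do_GET(self):
        ip = self.headers.get("X-Visitor-IP", "unknown")
        now = time.time()
        if self.path.startswith("/sub"):
            connected_at[ip] = now                    # visitor arrived
        elif self.path.startswith("/unsub"):
            started = connected_at.pop(ip, None)      # visitor left
            if started is not None:
                print(f"{ip} stayed for {now - started:.0f} seconds")
        self.send_response(200)
        self.end_headers()

HTTPServer(("127.0.0.1", 9000), VisitTracker).serve_forever()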
As you now have a persistent connection to your clients you could take it a step further and include some code to monitor certain client behaviours or watch for certain events.
You are trying to determine whether or not people are reading what you write, so you could use a few lines of JavaScript to monitor how far down the page visitors have scrolled. Each time they scroll to a new maximum scroll position, send that data over the websocket back to your server.
Due to the disconnected nature of HTTP, your access log would probably not give you what you need.
I'm not totally familiar with the Nginx or Apache log formats, but I think most logs contain a timestamp, the HTTP request (the document requested, the status, etc.) and an IP address.
Potential issues
Without a session cookie, all visits from the same IP address (same household, same company, etc.) would be seen as the same session.
If someone goes to your site (one HTTP request), consumes the content, doesn't proceed to another page, and leaves, your log will only contain that request (which is essentially a bounce, and you won't be able to calculate a duration). Two partial workarounds: 1) if your application makes a lot of JavaScript calls, you might be able to log those from the server-side application; 2) if you use a tool like GA, you can still use timers and JavaScript events (though not perfect) to tell GA that the session is still active. I'm not sure whether the same works for typical server logs.
It might not be as big an issue if a typical visit contains more than one request, given that there is no easy way to get the duration after the last server request.
My question is the same as this one, but hopefully adds enough clarity to get an answer. After reading this fantastic article on the specifics behind NAT traversal, along with a general summary of methods found here, I'm wondering whether this scenario has been accomplished or is even possible. I'm writing software that serves web pages on any specified port, and am wondering if it is possible to have a web client on the WAN side connect to this server while it sits behind a NAT router. The reasons I'm finding this difficult:
I don't want to tell the user (who owns the web server) to configure their router to port forward (and in many cases the user may not have the privileges to do so).
UPnP I believe is often default-disabled, and is another configuration privilege not afforded to the user.
UDP hole punching looked promising until I realized the client is using a browser over HTTP, and thus can communicate only through TCP, and that being limited to browser scripts restricts my options further.
I have not found a successful implementation of TCP hole punching, considering the difficulties of maintaining state information (currently I'm looking at chownat, but am wondering how to implement TCP over a UDP tunnel from a web browser, or whether that's even possible).
Using a proxy to forward all traffic doesn't scale well (though using an external server that is not behind a NAT would be perfectly fine for setting up the initial connection or the NAT traversal). By scaling, I mean many users each running their own web server, not one user's web server seeing high traffic (which is not a concern, given that a user's upload bandwidth is often severely limited).
Right now I'm starting to think there will have to be some client-side browser script to help implement this, so the task won't be completely handled by the server. If anybody has any ideas or experience with trying to have a user connect to a web server behind a NAT router, I'd greatly appreciate some direction! Thanks!
Assume you have one box (a dedicated server) that's on 24/7 and several other boxes that are user machines with unused bandwidth. Assume you want to host several web pages. How can the dedicated server redirect HTTP traffic to the user machines? It is desirable that the address field in the web browser still displays the right address, and not an IP. I.e., I don't want to redirect to another web page; I want to tell the web browser that it should request the same web page from a different server. I have been browsing through the 3xx codes, and I don't think they are made for anything like this.
It should work somewhat along these lines:
1. Dedicated server is online all the time.
2. User machine starts and tells the dedicated server that it's online.
(several other user machines can do similarly)
3. Web browser looks up domain name and finds out that it points to dedicated server.
4. Web browser requests page.
5. Dedicated server tells web browser to repeat request to user machine
Is it possible to use some kind of redirect, and preferably tell the browser to keep sending further requests to the user machine? The user machine can close down at almost any point in time, but it is assumed that the user machine will wait for ongoing transactions to finish, i.e. not closing the server program in the middle of a GET or something.
What you want is called a Proxy server or load balancer that would sit in front of your web server.
The web browser would always talk to the load balancer, and the load balancer would forward the request to one of several back-end servers. No redirect is needed on the client side, as the client always thinks it is just talking to the load balancer.
ETA:
Looking at your various comments and re-reading the question, I think I misunderstood what you wanted to do. I was thinking that all the machines serving content would be on the same network, but now I see that you are looking for something more like a p2p web server setup.
If that's the case, using DNS and HTTP 30x redirects would probably be what you need. It would probably look something like this:
Your "master" server would serve as an entry point for the app, and would have a well known host name, e.g. "www.myapp.com".
Whenever a new "user" machine came online, it would register itself with the master server and a the master server would create or update a DNS entry for that user machine, e.g. "user123.myapp.com".
If a request came to the master server for a given page, e.g. "www.myapp.com/index.htm", it would do a 302 redirect to one of the user machines based on whatever DNS entry it had created for that machine - e.g. redirect them to "user123.myapp.com/index.htm".
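As an illustration only (not a production setup), a minimal Python sketch of that master-server behaviour; the host names are placeholders for whatever DNS entries the master has created:

import itertools
from http.server import BaseHTTPRequestHandler, HTTPServer

# Rotate through the currently registered user machines.
user_machines = itertools.cycle(["user123.myapp.com", "user456.myapp.com"])

class MasterRedirect(BaseHTTPRequestHandler):
    def do_GET(self):
        target = next(user_machines)
        self.send_response(302)                              # temporary redirect
        self.send_header("Location", f"http://{target}{self.path}")
        self.end_headers()

HTTPServer(("0.0.0.0", 8080), MasterRedirect).serve_forever()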
Some problems I see with this approach:
First, once a user gets redirected to a user machine, if that machine goes offline it will seem like the app is dead. You could avoid this by having all the links on every page point specifically to "www.myapp.com" instead of using relative links, but then every single request has to be routed through the master server, which would be relatively inefficient.
You could potentially solve this by changing the DNS entry for a user machine when it goes offline to point back to the master server, but that wouldn't work without an extremely short TTL.
Another issue you'll have is tracking sessions. You probably wouldn't be able to use sessions very effectively with this setup without a shared session state server of some sort accessible by all the user machines. Although cookies should still work.
In networking, load balancing is a technique to distribute workload evenly across two or more computers, network links, CPUs, hard drives, or other resources, in order to get optimal resource utilization, maximize throughput, minimize response time, and avoid overload. Using multiple components with load balancing, instead of a single component, may increase reliability through redundancy. The load balancing service is usually provided by a dedicated program or hardware device (such as a multilayer switch or a DNS server).
and there is more interesting stuff in here.
Apart from load balancing, you will need to set up a more or less similar environment on the "user machines".
This sounds like 1 part proxy, 1 part load balancer, and about 100 parts disaster.
If I had to guess, I'd say you're trying to build some type of relatively anonymous torrent... But I may be wrong. If I'm right, HTTP is entirely the wrong protocol for something like this.
You could use DNS. Off the top of my head, you could set up a hostname for each machine that is going to serve users:
www  IN  A  xxx.xxx.xxx.xxx  ; IP address of machine 1
www  IN  A  xxx.xxx.xxx.xxx  ; IP address of machine 2
www  IN  A  xxx.xxx.xxx.xxx  ; IP address of machine 3
Then as others come online, you could add them to the DNS entries:
www  IN  A  xxx.xxx.xxx.xxx  ; IP address of machine 4
The only problem is that you'll have to lower the time to live (TTL) for each record (I think the default is 86400 seconds, i.e. one day).
If a machine goes down, you'll have to remove its DNS entry, though I do think this is the least intensive way of adding capacity to any website. Jeff Atwood has more info here: is round robin DNS good enough?
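If your zone accepts dynamic updates (RFC 2136), adding and removing those records can be scripted instead of edited by hand; a minimal sketch using the third-party dnspython package, with a placeholder zone, address, and nameserver:

import dns.query
import dns.update

update = dns.update.Update("myapp.com")
# Add another A record for "www" with a short TTL (300 s) as a machine comes online.
update.add("www", 300, "A", "203.0.113.10")
# When that machine goes down, delete its record instead:
# update.delete("www", "A", "203.0.113.10")
dns.query.tcp(update, "198.51.100.1")   # send the update to the zone's primary nameserver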
Hey, I am writing an app in Twisted, and as it stands I have four servers bound to different ports, all communicating with the client via JSON. Is there any way to bind these four servers to the same port and have the interactions remain the same?
For instance say the client subscribes to two different feeds, transmitted via a direct socket.
Right now I just do something like:
server1.read_string()
server2.read_string()
and it will read the correct JSON string from the respective feeds. Is there any way to maintain this type of functionality but contact my server on the same port?
I do not want to throw all of the server functionality into one massive server and partition the data by header prefixes.
I don't want to do something like
s = server.read_string()
header = s.split(SOME_DELIMITER)[0]  # SOME_DELIMITER stands in for whatever separator the header uses
if header == "SERVER1":
    # blahh: handle the SERVER1 feed here
    pass
It sounds like you have many clients interacting with your servers via HTTP. The standard solution is to throw a reverse proxy between the client and your servers - that proxy then forwards connections to the appropriate server depending on the URL. The reverse proxy can run on any one of your existing servers or on its own server to lighten the load.
If your data is cachable, the reverse proxy can do caching on your results too.
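Since you are already on Twisted, you could even prototype the proxy with Twisted itself; a minimal sketch, assuming the four back ends speak HTTP as described above, with placeholder ports 8001-8004 and URL prefixes /feed1 .. /feed4:

from twisted.internet import reactor
from twisted.web import server
from twisted.web.proxy import ReverseProxyResource
from twisted.web.resource import Resource

# One listener on port 8000 dispatches to the four existing servers by URL prefix.
root = Resource()
root.putChild(b"feed1", ReverseProxyResource("127.0.0.1", 8001, b""))
root.putChild(b"feed2", ReverseProxyResource("127.0.0.1", 8002, b""))
root.putChild(b"feed3", ReverseProxyResource("127.0.0.1", 8003, b""))
root.putChild(b"feed4", ReverseProxyResource("127.0.0.1", 8004, b""))

reactor.listenTCP(8000, server.Site(root))
reactor.run()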
There are many reverse proxies available, and you will want to choose one based on what sort of workload you have. Do you need it to be highly configurable? Is the data public or based on logins? How long does each connection last / how many connections do you want to hold open at once?
Squid, Varnish, and HAProxy are good reverse proxies, and even Apache could do this for you.
I plan to use HAProxy for Gridspy, my project, as I have many ongoing connections with my clients and want to place an Orbited server in the same URL path as my Django server. See this tutorial for more information on how to forward many connections on port 80 from one server to many. The tutorial is focused on Comet, but your problem is even simpler than that.
If you are considering an ongoing TCP/IP connection from the browser back to your servers, seriously consider Orbited. See this tutorial about graphs via Orbited and MorbidQ. Orbited will also punch through firewalls and proxies better than most custom solutions will, as it looks like normal HTTP traffic.
In order to have multiple servers running on the same machine all bound to the same port, they need to be bound to different IP addresses. The only way to bind to the same port on the same IP is to enable the socket's SO_REUSEADDR option, but then multiple servers would be able to receive each other's inbound data, really messing up your communications.
Otherwise, having a single server that uses headers to identify the particular feeds is best. Why do you not want to do that?