Kubernetes sticky session / load balance by header value - nginx

I'm working on a project where I want to use Kubernetes and Docker. The microservice I'm about to implement must create a permanent HTTP/2 connection per user / client to another service (provided by others, and I can't modify anything in that service) in order to send asynchronous, cloud-initiated messages to that user. Also, each subsequent request from that client must use the same connection.
Obviously that is a challenge in terms of scalability, because every request from a client must be routed to the same instance of my microservice, namely the one that created the permanent connection to the other service. What makes things worse is that my clients can change their IPs and can't use cookies. What they can do is send a custom header value that identifies them.
I thought about HAProxy and nginx, but can't find an option in either of them to load balance requests by a header value. Is there really no way to do that? How would you approach that issue? Any ideas?
Thanks!
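To make the idea of "load balancing by a header value" concrete, here is a minimal sketch of the underlying technique: hash the identifying header and use the hash to pick an instance, so the same client always lands on the same one. The function name, the header value and the backend names are all hypothetical. As far as I know, nginx's upstream `hash` directive (keyed on a `$http_<name>` variable) and HAProxy's `balance hdr(<name>)` do this kind of thing for you; the code only illustrates the principle.

```python
import hashlib


def pick_backend(header_value: str, backends: list[str]) -> str:
    """Deterministically map a client-identifying header value to one backend.

    Assumption: `backends` is a non-empty, stable list of instance names.
    """
    digest = hashlib.sha256(header_value.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer and reduce it modulo
    # the number of backends. The same header value always yields the same index.
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]
```

Note that plain modulo hashing remaps most clients whenever the number of instances changes; a consistent-hash ring reduces that, at the cost of more code.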

Related

When implementing a web proxy, how should the server report lower-level protocol errors?

I'm implementing an HTTP proxy. Sometimes when a browser makes a request via my proxy, I get an error such as ECONNRESET, Address not found, and the like. These indicate errors below the HTTP level. I'm not talking about bugs in my program -- but how other servers behave when I send them an HTTP request.
Some servers might simply not exist, others close the socket, and still others not answer at all.
What is the best way to report these errors to the caller? Is there a standard method that, if I use it, browsers will convert my HTTP message to an appropriate error message? (i.e. they get a reply from the proxy that tells them ECONNRESET, and they act as though they received the ECONNRESET themselves).
If not, how should it be handled?
Motivations
I really want my proxy to be totally transparent and for the browser or other client to work exactly as if it wasn't connected to it, so I want to replicate the organic behavior of errors such as ECONNRESET instead of sending an HTTP message with an error code, which would be totally different behavior.
I kind of thought that was the intention when writing an HTTP proxy.
There are several things to keep in mind.
Firstly, if the client is configured to use the proxy (which actually I'd recommend) then fundamentally it will behave differently than if it were directly connecting out over the Internet. This is mostly invisible to the user, but affects things like:
FTP URLs
some caching differences
authentication to the proxy if required
reporting of connection errors etc <= your question.
In the case of reporting errors, a browser will show a connectivity error if it can't connect to the proxy, or open a tunnel via the proxy, but for upstream errors, the proxy will be providing a page (depending on the error, e.g. if a response has already been sent the proxy can't do much but close the connection). This page won't look anything like your browser page would.
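As a rough illustration of the proxy "providing a page" for upstream errors, the sketch below maps socket-level failures to gateway status codes. The mapping (502 for resolution and connection failures, 504 for timeouts) is one common convention and an assumption on my part, not something HTTP forces on you; the function name is hypothetical.

```python
import socket


def status_for_upstream_error(exc: BaseException) -> tuple[int, str]:
    """Translate a low-level upstream failure into an HTTP status for the client."""
    if isinstance(exc, socket.gaierror):
        # DNS lookup failed ("Address not found").
        return 502, "Bad Gateway: upstream host name could not be resolved"
    if isinstance(exc, (socket.timeout, TimeoutError)):
        # The upstream server did not answer in time.
        return 504, "Gateway Timeout: upstream server did not respond"
    if isinstance(exc, (ConnectionResetError, ConnectionRefusedError)):
        # ECONNRESET / ECONNREFUSED from the upstream server.
        return 502, "Bad Gateway: upstream connection was reset or refused"
    # Anything else is treated as a generic upstream failure in this sketch.
    return 502, "Bad Gateway: upstream connection failed"
```

This only works if no response bytes have been sent to the client yet; once they have, closing the connection is about all the proxy can do.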
If the browser is NOT configured to use a proxy, then you would need to divert or intercept the connection to the proxy. This can cause problems if you decide you want to authenticate your users against the proxy (to identify them / implement user-specific rules etc).
Secondly, HTTPS can be a real pain in the neck. This problem is growing as more and more sites move to HTTPS only. There are several issues:
browsers configured to use a proxy will, for HTTPS URLs, first open a tunnel via the proxy using the CONNECT method. If your proxy wants to prevent this, any information it provides in the block response is ignored by the browser, and instead you get the generic browser connectivity error page.
if you want to provide any of the other benefits one normally expects from a proxy (e.g. caching / scanning etc) you need to implement a MitM (man-in-the-middle) and spoof server SSL certificates etc. In fact you need to do this even if you just want to send back a block page to deny things.
There is a way a browser can act a bit more as if it were directly connected even though it goes via a proxy, and that's using SOCKS. SOCKS has a way to return an error code if there's an upstream connection error. It's not the actual socket error code, however.
These are all reasons why we wrote the WinGate Internet Client, which is an LSP-based product for our product WinGate. Client applications then learn the actual upstream error codes etc.
It's not a favoured approach nowadays though, as it requires installation of software on the client computer.
I wouldn't provide them too much info. Report what you need through internal logs in case you have to solve the problem. Return a 400, 403 or 418. Why? Perhaps they're just hacking.

HTTP vs HTTPS from developer view

I need to build a Web site which will have a secure connection (HTTPS) on some pages. I need to know whether there will be any difference for me (as a developer) when I write the code. Must I treat some data differently, or what? What is the main difference from the back-end point of view?
From the backend point of view, there is no difference. The difference between the two is the connection between the server and the client. HTTPS traffic is encrypted and HTTP traffic of course is not, but it's all decrypted by the time it hits your code. The server will have some flags available so you can determine whether the connection is HTTP or HTTPS (names vary depending on the server), but unless you're using that information to change the behavior of the page, you don't need to worry about it.
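As one example of such a flag, a Python WSGI application receives the scheme in `environ["wsgi.url_scheme"]`. The sketch below is only an illustration of reading it; the application itself is hypothetical and other servers and frameworks name the flag differently.

```python
def app(environ, start_response):
    """Tiny WSGI app that reports whether the request arrived over HTTP or HTTPS."""
    # The WSGI server sets this key to "http" or "https".
    scheme = environ.get("wsgi.url_scheme", "http")
    body = f"served over {scheme}\n".encode("utf-8")
    start_response(
        "200 OK",
        [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]
```

If TLS is terminated by a load balancer in front of your server, this flag may say `http` even though the client used HTTPS, so check how your setup forwards that information.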

Will a request to api.myapp.com be slower than a request to api-myapp.herokuapp.com when hosted on Heroku?

I'm trying to understand the best way to handle SOA on Heroku. I've got it into my head that making requests to custom domains will somehow be slower. Or do all requests go "out" via the internet?
On previous projects which were SOA in nature we had dedicated hosting, so we could make requests like http://blogs/ (obviously on the internal network). I'm wondering whether Heroku treats *.herokuapp.com requests as "internal"... Or is it clever enough to know that myapp.com is actually myapp.herokuapp.com and route locally? Or am I missing the point completely, and in fact all requests are "external"?
What you are asking about is general knowledge of how requests on the internet work.
Whenever your application makes a request to, let's say, example.com, the domain name is first translated into an IP address by DNS servers.
So this is how it works: no matter whether you request myapp.com or myapp.herokuapp.com, you always request information from a specific IP address, and the domain name you asked for is passed along as part of the request headers.
The server that receives the request looks the domain name up in its internal records and handles the request accordingly.
So the conclusion is that it does not matter whether you use myapp.com or myapp.herokuapp.com; the speed of the request will be the same.
PS: As Heroku load balances your requests across the running instances of myapp.com, the speed depends on several factors: how quickly your application responds, how many instances you have running and the load average per instance, and how busy the load balancer is at the moment. But it will not depend on which domain name you use.
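The two steps described in this answer (resolve the name to an IP address, then send the name along in the request headers) can be sketched like this. The function names are hypothetical, and `resolve` needs working DNS, so treat it as an illustration rather than something to copy as-is.

```python
import socket


def resolve(host: str, port: int = 80) -> list[str]:
    """Step 1: ask DNS which IP addresses the host name maps to."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def build_get_request(host: str, path: str = "/") -> bytes:
    """Step 2: the requested domain name travels in the Host header."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")
```

Two requests built for myapp.com and myapp.herokuapp.com differ only in the `Host:` line; the receiving side uses that line to decide which application should handle the request.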

HTTP client acting as a pseudo-server

Let's say I am going to deploy a server application that's likely to be placed behind a NAT/firewall and I don't want to ask users to tweak their NAT port mapping. In other words, connections to the server are impossible, but my app is a server application by nature, i.e. it sends back objects per URI.
Now, I'm thinking about initiating connections from the server periodically to see what requests are there to be responded to. I'm going to use HTTP via port 80 as something that would likely be working through NAT/firewall from virtually anywhere.
The question is, are there any standard considerations and common practices of implementing a client that can act as a server at the application level, specifically using HTTP? Any special HTTP headers? Design patterns?
E.g. I am thinking about the following scheme:
The client (which is my logical server) sends a dummy HTTP request to the server
The server responds back with non-standard headers X-Request-URI:, X-Host:, X-If-Modified-Since: etc, in other words, request headers wrapped into X-xxx as they are not standard in this situation; also requests to keep the connection alive
The client responds with a POST request that sends the requested object; again, uses wrapped headers (e.g. X-Status:, etc)
Unless there is a more "standard" way of doing something like this, do you think my approach is plausible?
Edit: an interesting discussion took place on reddit here
I've done something similar. This is very common. The client initiates the connection to the server and keeps it alive. If the session is shut down, the client re-initiates it. While the session is up, the server can push anything to the client, since the connection was client-initiated.
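Here is a minimal sketch of that pattern, under loud assumptions: it uses a made-up line-based protocol instead of HTTP, and both function names are hypothetical. The point is only that once the side behind the NAT has dialled out, the other side can send requests down that same connection and get objects back.

```python
import socket
import threading


def logical_server(conn: socket.socket, objects: dict[str, bytes]) -> None:
    """Runs behind the NAT. It opened the connection, yet it answers requests on it."""
    with conn, conn.makefile("rwb") as stream:
        # Each incoming line is a URI requested by the publicly reachable side.
        for line in stream:
            uri = line.decode("utf-8").strip()
            payload = objects.get(uri, b"")
            # Reply with a length line followed by the raw object bytes.
            stream.write(f"{len(payload)}\n".encode("ascii") + payload)
            stream.flush()


def request_object(stream, uri: str) -> bytes:
    """Runs on the publicly reachable side, over the already established connection."""
    stream.write(uri.encode("utf-8") + b"\n")
    stream.flush()
    length = int(stream.readline())
    return stream.read(length)
```

To try it locally, `socket.socketpair()` can stand in for the outbound TCP connection: run `logical_server` on one end in a `threading.Thread` and call `request_object` on a `makefile("rwb")` stream of the other end. In a real deployment the NAT side would also need the reconnect logic described above.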

stateless protocol and stateful protocol

How should I understand stateless and stateful protocols? HTTP is a stateless protocol and FTP is a stateful protocol. For web applications requiring a lot of interaction, the underlying protocol should be a stateful one. Is my understanding right?
HTTP is a stateless protocol; in other words, the server forgets everything related to the client/browser state between requests. Web applications have nevertheless made it look virtually stateful.
A stateless protocol can be forced to behave as if it were stateful. This can be accomplished if the server sends the state to the client, and if the client sends it back again to the server, every time.
There are three ways this may be accomplished in HTTP:
a) One is cookies, in which case the state is sent and returned in HTTP headers.
b) The second is URL extension, in which case the state is sent as part of the request URL.
c) The third is "hidden form fields", in which the state is sent to the client as part of the response, and returned to the server as part of a form's hidden data
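The three mechanisms above can be sketched with the Python standard library as follows. The field name `state` and the helper names are hypothetical; the snippet only shows where the state travels in each case.

```python
from html import escape
from http.cookies import SimpleCookie
from urllib.parse import parse_qs, urlencode, urlsplit


def state_as_cookie_header(state: str) -> str:
    """a) Cookie: the server sends the state in a Set-Cookie response header."""
    cookie = SimpleCookie()
    cookie["state"] = state
    return cookie.output(header="Set-Cookie:")


def state_in_url(base_url: str, state: str) -> str:
    """b) URL extension: the state is appended to the URL as a query parameter."""
    return f"{base_url}?{urlencode({'state': state})}"


def state_from_url(url: str) -> str:
    """Read the state back out of a URL produced by state_in_url."""
    return parse_qs(urlsplit(url).query)["state"][0]


def state_as_hidden_field(state: str) -> str:
    """c) Hidden form field: the state is embedded in the HTML form sent to the client."""
    return f'<input type="hidden" name="state" value="{escape(state, quote=True)}">'
```

In all three cases the client simply returns what it was given, so the server does not have to remember anything between requests.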
SCALABILITY AND HIGH AVAILABILITY
One of the major reasons why HTTP scales so well is its statelessness. A stateless protocol eases replication concerns, as the state itself doesn't need to be stored on the server.
Stateful protocols are logically heavy to implement reliably on the Internet. Stateless servers are also easy to scale, while for stateful servers scalability is problematic. A stateless request can be sent to any node at any time, while with a stateful protocol this is not the case.
HTTP as a stateless protocol increases availability for stateless web applications, which would otherwise be difficult or impossible to implement. If the connection is lost, no state is lost; simply resending the request resolves the problem. Stateless requests are also cacheable.
see more here
Since you're asking about a Web application, the protocol will always be stateless -- the protocol for the Web is http (or https), and that's all she wrote.
I think what you're thinking of is providing a state mechanism in your Web application itself. The typical approach to this is that you create a unique identifier for the user's session in your Web application (a sessionID of one form or another is the common practice) which is handed back and forth between browser and server. That's typically done in a cookie, though it can be done, with a bit more hassle for you depending on your platform/framework, on the URL as well.
Your server-side code stores stateful information (again, typically called the user's session) however it wants to using the sessionID to look it up. The http traffic simply hands back the sessionID. As long as that identifier is there, each http transaction is completely independent of all others, hence the protocol traffic itself is stateless.
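A minimal sketch of that arrangement, assuming an in-memory store and hypothetical names (`SessionStore`, `handle_request`, a `visits` counter): the only thing exchanged with the browser is the session ID, and the server looks up the rest.

```python
import secrets
from typing import Optional


class SessionStore:
    """In-memory session storage keyed by session ID (illustration only)."""

    def __init__(self) -> None:
        self._sessions: dict[str, dict] = {}

    def create(self) -> str:
        # An unguessable random identifier handed to the browser, e.g. in a cookie.
        session_id = secrets.token_urlsafe(32)
        self._sessions[session_id] = {}
        return session_id

    def get(self, session_id: str) -> Optional[dict]:
        return self._sessions.get(session_id)


def handle_request(store: SessionStore, session_id: Optional[str]) -> tuple[str, int]:
    """Process one independent request; state is found only via the session ID."""
    session = store.get(session_id) if session_id else None
    if session is None:
        # Unknown or missing ID: start a fresh session.
        session_id = store.create()
        session = store.get(session_id)
    session["visits"] = session.get("visits", 0) + 1
    return session_id, session["visits"]
```

Each call to `handle_request` stands for a separate HTTP transaction; nothing links them except the ID the client sends back.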
HTTP is a stateless protocol, so web-based applications are stateless too unless they add state themselves.
When a request is sent to the server, a connection is established between client and server. The server receives the request, processes it and sends back the response; in the simplest case the connection is then closed.
If another request is sent after that, it is treated as a new request and, in that simple case, a new connection is established.
In order to make HTTP behave statefully, we use session management techniques.
That way the server can use data from a previous request while processing the current one, i.e. a series of client-server interactions is tied together into one session.
The session management techniques are:
hidden form field
cookie
session
URL-rewriting
Anything that forgets whatever it did in the past is stateless, such as HTTP.
Anything that can keep the history is stateful, such as a database.
HTTP is a stateless protocol; that's why it forgets the user information.
We make HTTP behave statefully using a JSON Web Token (JWT), i.e. on each request going to the server, the server first verifies the user using the JWT.
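To show what "verifying the user on each request" amounts to, here is a minimal HS256-style signed token built from the standard library only. It is a sketch, not a complete JWT implementation: it does not check expiry or the `alg` header, and the function names are hypothetical. In practice you would use an established JWT library.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional


def _b64url(data: bytes) -> str:
    # URL-safe base64 without padding, as used in JWTs.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _signature(header: str, payload: str, secret: bytes) -> str:
    signing_input = f"{header}.{payload}".encode("utf-8")
    return _b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())


def sign_token(claims: dict, secret: bytes) -> str:
    """Issue a token of the form header.payload.signature."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}, separators=(",", ":")).encode("utf-8"))
    payload = _b64url(json.dumps(claims, separators=(",", ":")).encode("utf-8"))
    return f"{header}.{payload}.{_signature(header, payload, secret)}"


def verify_token(token: str, secret: bytes) -> Optional[dict]:
    """Return the claims if the signature matches, otherwise None."""
    try:
        header, payload, signature = token.split(".")
    except ValueError:
        return None
    expected = _signature(header, payload, secret)
    # Constant-time comparison on bytes to avoid leaking where the mismatch is.
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return None
    return json.loads(_b64url_decode(payload))
```

The server keeps only the secret; everything it needs to know about the user arrives with each request inside the token.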
Your question is spot on, and yes, it would be great if your web transactions with your bank were done over a stateful connection. Alas, HTTP is stateless due to a quirky bug in FTP and a 12 socket limit in the partial socket table in BSD of 1989. Marcus Ranum explained it all here.
So HTTP throws away the state it inherits from TCP and has to recreate state at the application layer in the form of cookies. Crappy internet security is the result.
The Seif project proposes to fix all that using "secure JSON over TCP". DNS and certificate authorities are not required. The protocol and seifnode.js are finished and on github with an MIT license.
HTTP doesn't 'inherit' from TCP, but rather uses it as a transport. HTTP uses TCP for a stateful connection, but then disconnects. Later it will connect again, if needed. So, while you browse through a web site you create many different connections. Each one of those connections is stateful, but the conversation as a whole is not, since the connection is dropped at the end of each exchange.
From this link
Basically yes, but you have no choice but to use HTTP, which is what websites are served over. So you have to deal with compromises to make HTTP stateful, aka session management. The possibilities are basically passing a session id along in the URL of each call, so you know when you're talking to someone you've talked to before, or using cookies, which achieve the same goal without cluttering the URL. However, most modern web development languages take care of that for you; if you google for the language of your choice + "session management" you should get some ideas of how it's done.
