http push - http streaming method with ssl - do proxies interfere whith https traffic? - http

My Question is related to the HTTP Streaming Method for realizing HTTP Server Push:
The "HTTP streaming" mechanism keeps a request open indefinitely. It
never terminates the request or closes the connection, even after the
server pushes data to the client. This mechanism significantly
reduces the network latency because the client and the server do not
need to open and close the connection.
The HTTP streaming mechanism is based on the capability of the server
to send several pieces of information on the same response, without
terminating the request or the connection. This result can be
achieved by both HTTP/1.1 and HTTP/1.0 servers.
The HTTP protocol allows for intermediaries
(proxies, transparent proxies, gateways, etc.) to be involved in
the transmission of a response from server to the client. There
is no requirement for an intermediary to immediately forward a
partial response and it is legal for it to buffer the entire
response before sending any data to the client (e.g., caching
transparent proxies). HTTP streaming will not work with such
intermediaries.
Do I avoid the descibed problems whith proxy servers if i use HTTPS?

HTTPS doesn't use HTTP proxies - this would make security void. HTTPS connection can be routed via some HTTP proxy or just HTTP redirector by using HTTP CONNECT command, which establishes transparent tunnel to the destination host. This tunnel is completely opaque to the proxy, and proxy can't get to know, what is transferred (it can attempt to modify the dataflow, but SSL layer will detect modification and send an alert and/or close connection), i.e. what has been encrypted by SSL.
Update: for your task you can try to use one of NULL cipher suites (if the server allows) to reduce the number of operations, such as perform no encryption, anonymous key exchange etc. (this will not affect proxy's impossibility to alter your data).

Related

Close HTTP request socket connection

I'm implementing HTTP over TLS proxy server (sni-proxy) that make two socket connection:
Client to ProxyServer
ProxyServer to TargetServer
and transfer data between Client and TargetServer(TargetServer detected using server_name extension in ClientHello)
The problem is that the client doesn't close the connection after the response has been received and the proxy server waits for data to transfer and uses resources when the request has been done.
What is the best practice for implementing this project?
The client behavior is perfectly normal - HTTP keep alive inside the TLS connection or maybe even a Websocket connection. Given that the proxy does transparent forwarding of the encrypted traffic it is not possible to look at the HTTP traffic in order to determine exactly when the connection can be closed. A good approach is therefore to keep the connection open as long as the resources allow this and on resource shortage close the connections which were idle (no traffic) the longest time.

HTTP protocol connectionless and use TCP by default, How does that make sense?

I read that HTTP protocol uses the reliable TCP connections by default
,and one of the HTTP feature is connectionless.
Now I am confused how does it make sense ?
How does it use TCP and the same time it is connectionless and as I know TCP is connection-oriented
HTTP and TCP are different things. TCP is a transport layer protocol, whereas HTTP is an application layer protocol. HTTP uses TCP for data transmissions.
IMO this website has a nice explanation:
HTTP is connectionless: The HTTP client, i.e., a browser initiates an HTTP request and after a request is made, the client waits for the response. The server processes the request and sends a response back after which client disconnect the connection. So client and server knows about each other during current request and response only. Further requests are made on new connection like client and server are new to each other.
However, Wikipedia defines HTTP as stateless:
HTTP is a stateless protocol. A stateless protocol does not require the HTTP server to retain information or status about each user for the duration of multiple requests. However, some web applications implement states or server side sessions using for instance HTTP cookies or hidden variables within web forms.
Based on their explanations, the terms seems to be used interchangeably. However, these are not really true as the in-use HTTP versions allow you to identify the users through cookies etc. and create persistent connections.

Are communication protocol Half-duplex or Full-duplex?

I know there are a lot of different comminucation protocals like: http, tcp, ssh, socks5, SMTP, POP,etc.
I also know that to achieve a comminication, we need to connect localhost:localport to remotehost:remoteport. For example, if we google something, we would connect a random local port to www.google.com: 80. If we ssh a remote host, we would connect a random local port to remotehost: 22.
My question is: Are communication protocol Half-duplex or Full-duplex?
I guess the answer is Half-duplex. Because I think in http connection, at first we send the request from localhost:localport to remotehost:80, and then the remote server send its response from remotehost:80 to localhost:localport. Similarly, in ssh connection, at first we sent the ssh commands to remote host, after receiving the commands, the remote host do something and send the results back to the local host.
So I think in one connection between localhost:localport and remotehost:remoteport, the message is sent either from localhost:localport to remotehost:remoteport, or from remotehost:remoteport to localhost:localport.
Am I right?
As explained in this article:
SSH is a bidirectional full duplex protocol, which means that it’s not synchronous like HTTP where you need to send a message for a response to happen.
With SSH the remote host might want to tell you something even if you have remained silent. This connector uses a callback flow approach to decouple the “sending” operation from the “receiving” operation.
As documented in this IETF draft, most implementations do allow full-duplex HTTP (for 2xx responses).
Full-duplex HTTP follows the basic HTTP request-response semantics but also allows the server to send response body to the client at the same time when the client is transmitting request body to the server.
Requirements for full-duplex HTTP are under-specified in the existing HTTP (Hypertext Transfer Protocol -- HTTP/1.1) specification, and this memo intends to clarify the requirements of full-duplex HTTP on top of the basic HTTP protocol semantics.

HTTP and Sessions

I just went through the specification of http 1.1 at http://www.w3.org/Protocols/rfc2616/rfc2616.html and came across a section about connections http://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html#sec8 that says
" A significant difference between HTTP/1.1 and earlier versions of HTTP is that persistent connections are the default behavior of any HTTP connection. That is, unless otherwise indicated, the client SHOULD assume that the server will maintain a persistent connection, even after error responses from the server.
Persistent connections provide a mechanism by which a client and a server can signal the close of a TCP connection. This signaling takes place using the Connection header field (section 14.10). Once a close has been signaled, the client MUST NOT send any more requests on that connection. "
Then I also went through a section on http state management at https://www.rfc-editor.org/rfc/rfc2965 that says in its section 2 that
"Currently, HTTP servers respond to each client request without relating that request to previous or subsequent requests;"
A section about the need to have persistent connections in the RFC 2616 also said that prior to persistent connections every time a client wished to fetch a url it had to establish a new TCP connection for each and every new request.
Now my question is, if we have persistent connections in http/1.1 then as mentioned above a client does not need to make a new connection for every new request. It can send multiple requests over the same connection. So if the server knows that every subsequent request is coming over the same connection, would it not be obvious that the request is from the same client? And hence would this just not suffice to maintain the state and would this just nit be enough for the server to understand that the request was from the same client ? In this case then why is a separate state management mechanism required at all ?
Basically, yes, it would make sense, but HTTP persistent connections are used to eliminate administrative TCP/IP overhead of connection handling (e.g. connect/disconnect/reconnect, etc.). It is not meant to say anything about the state of the data moving across the connection, which is what you're talking about.
No. For instance, there might an intermediate (such as a proxy or a reverse proxy) in the request path that aggregates requests from multiple TCP connections.
See http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p1-messaging-21.html#intermediaries.

Why are HTTP proxies able to support protocols like IRC and FTP?

I understand that a SOCKS proxy only establishes a connection at the TCP level while an HTTP proxy interprets traffic at HTTP level. Thus a SOCKS proxy can work for any kind of protocol while an HTTP Proxy can only handle HTTP traffic. But why does an HTTP Proxy like Squid can support protocol like IRC, FTP ? When we use an HTTP Proxy for an IRC or FTP connection, what does specifically happen? Is there any metadata added to the package when it is sent to the proxy over the HTTP protocol?
HTTP proxy is able to support high level protocols other than HTTP if it supports CONNECT method, which is primarily used for HTTPS connections, here is description from Squid wiki:
The CONNECT method is a way to tunnel any kind of connection through an HTTP proxy. By default, the proxy establishes a TCP connection to the specified server, responds with an HTTP 200 (Connection Established) response, and then shovels packets back and forth between the client and the server, without understanding or interpreting the tunnelled traffic
If client software supports connection through 'HTTP CONNECT'-enabled (HTTPS) proxy it can be any high level protocol that can work with such a proxy (VPN, SSH, SQL, version control, etc.)
As others have mentioned, the "HTTP CONNECT" method allows you to establish any TCP-based connection via a proxy. This functionality is needed primarily for HTTPS connections, since for HTTPS connections, the entire HTTP request is encrypted (so it appears to the proxy as a "meaningless" TCP connection). In other words, an HTTPS session over a proxy, or a SSH/FTPS session over a proxy, will both appear as "encrypted sessions" to the proxy, and it won't be able to tell them apart, so it has to either allow them all or none of them.
During normal operation, the HTTP proxy receives the HTTP request, and is "smart enough" to understand the request to be able to do high level things with it (e.g. search its cache to see if it can serve the response without going to the destination server, or consults a whitelist/blacklist to see if this URL is allowed, etc.). In "CONNECT" mode, none of this happens. The proxy establishes a TCP connection to the destination server, and simply forwards all traffic from the client to the destination server and all traffic from the destination server to the client. That means any TCP protocol can work (HTTPS, SSH, FTP - even plain HTTP)

Resources