How to know when to stop data tunneling in an HTTP proxy connection

I am trying to write my own HTTP proxy server and I have a question about the protocol.
First, I would like to mention that I am using this page as a reference. I think it's accurate but it's also from 1998. If anyone has a better reference for me I would be grateful to them.
So basically I understand that the connection starts with a handshake. I receive a CONNECT request, proxy-authorization, etc. Then I connect to the host and port specified in the request's resource URI. Then I send a status line, ideally HTTP/1.1 200 Connection established, followed by some headers and a CRLF like normal.
Once this handshake is complete my client and the host my client asked for are connected through my proxy server. I am supposed to tunnel data in both directions, which makes sense since I could be supporting any type of TCP based protocol, including HTTPS or even WebSocket, over this HTTP based proxy connection.
What doesn't make sense to me is how I know when to stop. If this proxy can really support any TCP based protocol then I don't understand how to know when the interaction is over. An HTTP message would be a simple 2 step read-write, an HTTPS interaction would involve several such exchanges, and a WebSocket interaction would involve indefinitely many exchanges.
I'm not asking for a perfect solution. I would be happy with something pragmatic like a timeout, but I would like to know what standard best practices are in order to do this project as well as I can.
Thanks to everyone for any help.

Just copy data in both directions simultaneously until you read an end of stream (EOS) on one of the sockets. Then:
Shut down the opposite socket for writing and stop copying in that direction. That propagates the EOS to the peer.
If the socket you read EOS from was already shut down for writing (which you will have to keep track of), close both sockets.
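The steps above can be sketched in Python roughly as follows. This is a minimal sketch, not a full proxy: the names pump and tunnel are made up for illustration, it uses one thread per direction, and it assumes the CONNECT handshake has already been completed on both sockets. Instead of tracking the "already shut down for writing" flag explicitly, it waits for both directions to reach EOS and then closes both sockets, which comes to the same thing.

```python
import socket
import threading


def pump(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes from src to dst until src reports end of stream."""
    try:
        while True:
            data = src.recv(65536)
            if not data:  # recv() returning b"" means the peer sent EOS
                break
            dst.sendall(data)
    except OSError:
        pass  # treat a reset or broken pipe like an end of stream
    try:
        # Propagate the EOS: half-close dst so its peer reads EOS too.
        dst.shutdown(socket.SHUT_WR)
    except OSError:
        pass  # dst is already closed or was reset


def tunnel(client: socket.socket, upstream: socket.socket) -> None:
    """Relay in both directions; close both sockets once both hit EOS."""
    back = threading.Thread(target=pump, args=(upstream, client), daemon=True)
    back.start()
    pump(client, upstream)  # client -> upstream in the calling thread
    back.join()             # upstream -> client has finished as well
    client.close()
    upstream.close()
```

An idle timeout (for example via settimeout on both sockets) could be layered on top as a pragmatic safety net, but that is a policy choice rather than something the protocol requires.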

Related

What are the benefits of HTTP reverse shell over TCP reverse shell?

I made a multi-client TCP reverse shell and then saw a course video which said that HTTP reverse shells are better because they are more difficult to trace back to the attacker than TCP ones. I didn't understand why.
I have tried googling this question without much success.
Are HTTP reverse shells actually beneficial over TCP? How?
I personally think an HTTP reverse shell is worse: since HTTP is connectionless, the attacker can't communicate with the host whenever they want to, because there is no standing connection to it. The attacker can only communicate when a request (like a GET) comes from the host. Am I missing anything here?
Please explain.
First, I am going to answer for HTTPS rather than plain HTTP, because I don't see much reason to use HTTP over HTTPS, and there are a lot of benefits to encrypting your traffic this way.
It's unlikely to be auto-filtered
Many networks will block outbound traffic on all but a few special ports, so using something like port 6666 is likely to set off a few alerts. If you try to use a port for something other than its intended use, some software can use deep packet inspection (DPI) to detect or block this. In other words, if your payload tries to use port 80/443 without actually speaking HTTP/HTTPS, it may raise an alert and get your payload caught.
It's stealthier.
I would say two of the most important factors to being a stealthy payload are looking like normal traffic so as to avoid attracting attention in the first place and to be difficult to inspect if attention does come to your connection. HTTPS accomplishes both of these rather well.
This is because on most networks, it is extremely common to see nodes on your network making requests to the internet all the time. Compare a beaconing payload making HTTPS requests to some payload connecting over some random port.
Now, as far as your question at the end... it depends on your situation, but you are right that there will often be a delay if you use something like HTTP(S) over maintaining an established connection. I alluded to this earlier, but we are able to communicate through beaconing. Essentially, that just means that the payload will check back with the server on a set interval (often with a jitter to make it a little harder to detect).
The victim will make an HTTP(S) request to your command and control (C2) server that contains the results of the previous command you told it to run. Your server will return an HTTP(S) response that contains the next instructions for the payload.

Implementing a WebServer

I am trying to create a web server of my own and have several questions about how the web servers we use today work:
After receiving an HTTP request from a client on port 80, does the server respond using the same port 80?
If yes, then while sending a large file, say a picture several MB in size, will the web server be unable to receive requests from other clients?
Is a computer port duplex or simplex? (Can it send and receive at the same time?)
If another port on the server side is used to send the response to the client, then (if TCP is used, as it generally is) the 3-way handshake will be done again, which is overhead...
http://beej.us/guide/bgnet/output/html/singlepage/bgnet.html is a good guide to what's going on with web servers. It's in C, but the concepts are all there. It explains the whole client-server relationship as well as some implementation details.
I'll just give a high-level overview of what's going on:
Usually, when your server gets a new request, it forks a child process to handle it, so the server is not bogged down by each request. The child process is handed a new file descriptor to write to (again, this is all implementation detail).
So really you have one server waiting for requests, and for each request it receives it spawns a child process to deal with that request. I'm sure there are much easier languages to implement this in than C (I have had to write both a C and a Java server in the past), but C really gets you to understand what is going on, and I'm betting that is what you are looking for here.
Now there are a couple of things to think about:
How you want the web server to work. The example explains the parent-child process model.
Whether you want to use TCP or UDP. There are differences in the way the payload gets delivered.
You don't have to listen on port 80. That's just the default for the web.
Hopefully the guide will help you.
Yes. The server sends the response using the TCP connection established by the client, so it also responds using the same port. The server can handle connections from multiple clients using the same port because TCP connections are identified by (local-ip, local-port, remote-ip, remote-port), so the server can even handle multiple connections from same client provided that the source ports are different.
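To see the four-tuple point concretely, here is a small sketch that can be run on a single machine. It assumes the loopback interface is available; the function name same_port_demo is made up for this example. It accepts two connections on one listening port and checks that both accepted sockets share the server's local port while the clients' source ports differ.

```python
import socket
from typing import Tuple


def same_port_demo() -> Tuple[bool, bool]:
    """Return (same local port on both connections, different remote ports)."""
    # Port 0 asks the OS to pick any free port for the listener.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen()
    server_addr = listener.getsockname()

    # Two clients connect to the same server address and port.
    c1 = socket.create_connection(server_addr)
    c2 = socket.create_connection(server_addr)
    s1, peer1 = listener.accept()
    s2, peer2 = listener.accept()
    try:
        # Both accepted sockets use the listener's port on the server side...
        same_local = s1.getsockname()[1] == s2.getsockname()[1] == server_addr[1]
        # ...and are told apart by the clients' differing source ports.
        different_remote = peer1[1] != peer2[1]
        return same_local, different_remote
    finally:
        for s in (c1, c2, s1, s2, listener):
            s.close()
```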
There are different techniques you can use to be able to serve multiple clients at the same time. These include
using multiple processes or threads: when one is busy serving a client the others can serve other clients.
using events: the server listens for events from the OS: when it can write a block of data to a connection it writes it, when a new client connects it accepts the connection, ...
Frequently both approaches are combined.
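As an illustration of the first technique, here is a minimal thread-per-connection sketch in Python. The names handle and serve and the fixed response are assumptions made for the example; it does no real request parsing and is not production-ready. The point is only that a client which is slow to send its request does not block the others.

```python
import socket
import threading

# A fixed response, just enough to be valid HTTP for the demonstration.
RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Length: 2\r\n"
    b"Connection: close\r\n"
    b"\r\n"
    b"ok"
)


def handle(conn: socket.socket) -> None:
    """Serve one client: read until the end of the headers, reply, close."""
    with conn:
        buf = b""
        while b"\r\n\r\n" not in buf:
            chunk = conn.recv(4096)
            if not chunk:
                return  # client went away before finishing its request
            buf += chunk
        conn.sendall(RESPONSE)


def serve(listener: socket.socket) -> None:
    """Accept loop: every accepted connection gets its own thread."""
    while True:
        try:
            conn, _addr = listener.accept()
        except OSError:
            return  # the listening socket was closed
        threading.Thread(target=handle, args=(conn,), daemon=True).start()
```

An event-based server would replace the threads with a single loop around something like the selectors module, writing to each connection only when the OS reports it is writable.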
A TCP connection is duplex: you can send and receive at the same time. The HTTP protocol is based on a simple request-response model though: at any given time only one party is "talking."

Why exactly is a pipe simpler than a TCP connection for communication between an SSL proxy and an HTTP proxy?

I began studying protocols recently.
I learned that in the old method, incoming data is first delivered to the SSL proxy, where it is decrypted and then sent to the HTTP proxy through another TCP connection. For every packet that passes through this connection, a connection table lookup is needed to determine the other endpoint of the connection.
But pipe setup and teardown require one function call each and no packets sent. Sending data through the pipe does not require a connection table lookup, as the data structures are already tied together with pointers.
I tried to search for the answer to my question, but couldn't find a good way to understand it. I guess it may be related to the structure of TCP or of pipes. Could anyone tell me why exactly a pipe is simpler than a TCP connection between an SSL proxy and an HTTP proxy? Or please suggest what book to read or how I can understand it.
Two pictures related to this question:
http://www.tripntale.com/pic/19254/857880/pipe-jpg#pid-857880
http://www.tripntale.com/pic/19254/857880/pipe-jpg#pid-857882
So what you want to know is how these two diagrams compare?
I'm sorry to say that these diagrams don't make much sense to me either. Hopefully they make sense alongside the text they were originally published with.
The diagrams relate to software engineering approaches to a problem, but the objects in them aren't defined functionally, they appear to me to be used in different ways, and it isn't clear what problem these are approaches to.
HTTP proxies can be used as:
Forward proxies (client sends its HTTP requests to the proxy, proxy fetches the resources and returns them to the client)
Or
Reverse proxies (proxy sits in front of server(s) for service engineering reasons)
The term "SSL Proxy" could refer to either application and would have differing implications to how it was designed.
See here for more explanation: http://en.wikipedia.org/wiki/SSL_Proxy
Do you just want to understand these diagrams? Or are you trying to solve a problem and think that these diagrams can help you? If so, what is the problem you are trying to solve?
"For every packet passes through this connection, we need to do a connection table"
Why? I've written several proxies without a connection table.

Working with persistent HTTP connections

We are trying to implement a proxy proof of concept but have encountered an interesting question: since a single HTTP connection can, and indeed should, carry multiple requests, and HTTP transactions are sent via multiple packets due to TCP's magic, is it possible for an HTTP request to begin in the middle of a packet?
Bear in mind that this is not a theoretical question regarding possible optimization of the browser, but whether it actually happens in real life. It would be even better if someone could point me to a written reference on whether or not this is possible and if so how often it can occur.
Clarification update: We know that if we work in the HTTP layer alone we would not need to bother with this question, however we're trying to figure out if some advanced technique could be applied by working on the TCP layer first.
Assuming that you are talking about IP packets: yes, it is possible for an HTTP request to start in the middle of an IP packet.
When you are using persistent HTTP connections, that is, using the same TCP connection for several HTTP requests, it is entirely possible for a request boundary to fall in the middle of an IP packet.
Also, TCP sits between IP and HTTP. TCP has its own headers, so an IP packet may start with a TCP header, with the rest of the packet consisting of the HTTP request.
An HTTP request may also span several IP packets (in the case of file uploads, transmission errors followed by retransmissions, etc.).
However, I wonder why you are interested in packets if you are working at the HTTP level. TCP should hide the IP packet details.
First of all, TCP is a stream-based protocol: it exposes no concept of packets to the application. HTTP itself has message delimiters, but TCP doesn't.
This page might be helpful: Structure of HTTP Transactions
From your question it sounds like you think that each read from a TCP socket is a "packet" of data. In reality, each read simply reads as many bytes as are in the buffer up to the maximum that you requested, without any concept of records or packets.
So for instance, let's say you read 2048 bytes from the socket. You could have the tail end of one transaction, followed by the beginning of a second response halfway through the data you read, and only get the remainder of that second response on your next read from the socket.
If you're here in Jerusalem or near by maybe I could help you out.
Unless you are implementing your own TCP stack, you should not need to worry about packets, but rather about the API that TCP provides; in the case of POSIX interfaces that would be recv() or read(). So I treat the question as "Can more than one HTTP request arrive in a single read(), and can an HTTP request be split across multiple read() calls?" The answer to both is "yes, it is possible".
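In practice this means the proxy has to buffer whatever each read returns and cut requests out of the buffer itself. Here is a minimal Python sketch of that idea. The class name RequestSplitter is made up for the example, and it deliberately handles only body-less requests (such as GET) by looking for the blank line that ends the header block; a real proxy would also have to honor Content-Length or chunked encoding to skip over request bodies.

```python
from typing import List


class RequestSplitter:
    """Accumulates bytes from arbitrary reads and cuts out header blocks.

    Simplification: requests are assumed to have no body, so each one
    ends at the first CRLF CRLF sequence.
    """

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, data: bytes) -> List[bytes]:
        """Add the bytes of one read; return every request now complete."""
        self._buf += data
        complete = []
        while True:
            end = self._buf.find(b"\r\n\r\n")
            if end < 0:
                break  # header block not finished yet; wait for more bytes
            complete.append(bytes(self._buf[: end + 4]))
            del self._buf[: end + 4]  # keep any leftover bytes for later
        return complete
```

Feeding it the result of each recv() call gives back zero, one, or several complete requests, regardless of where the read boundaries happened to fall.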
An example of where this can happen is HTTP pipelining. This is not frequent in real life (ironically, at least some browsers disable it by default because of "buggy proxies" :-), but when it happens it can be a bit of a problem for users to diagnose, especially if they have no access to the proxy.
One very notable place where it does happen by default is apt-get on Debian-derived Linux systems. Just install a Debian or Ubuntu server and try to use it through your proxy. You can do that by editing the /etc/apt/apt.conf.d/proxy file and placing the following there:
Acquire::http::Proxy "http://your.proxy.address:8080";
It depends on which abstraction layer's "packet" you are talking about: there are many layers underneath HTTP.
HTTP --> TCP (byte stream) --> IP (packet) --> (possibly something else) Ethernet (frame) --> (possibly) some other transport
If you are talking about the IP layer, then yes, the HTTP data would start later on in the packet. Note that TCP presents a "byte stream interface" to the layer above it, hence there is no concept of a packet there.
I think I understand where you are trying to go with this question.
If you don't use persistent HTTP connections, the HTTP GET request header is always the very first thing sent over the TCP connection, so we can be sure that the start of the HTTP GET request header does "not start in the middle of some TCP packet". But keep in mind that there may be one or more TCP packets without any user data, e.g. only a SYN, which precede the TCP packet containing the start of the HTTP GET request header. Also keep in mind that the HTTP GET request header may not fit in a single TCP packet.
If you do use persistent HTTP connections, the HTTP GET request header for request number N+1 can start in the middle of a TCP packet, namely right after the end of request number N.
If you are asking these questions you are possibly "doing it wrong". As several other responders have already pointed out, in the vast majority of cases you should probably just be a TCP client and deal with a TCP stream of data and let the TCP code worry about the TCP packets. (Unless, of course, you are working on some special hardware which is looking at individual IP packets as they fly by and try to do some processing at the HTTP layer.)

Detecting missing responses to long running HTTP (SOAP) requests

I need a way to detect a missing response to a long running HTTP POST request. This problem arises when the network infrastructure (firewalls, proxies, unplugged cables, etc.) drops the response packets. The server may detect this failure, but the client cannot send additional bytes after the POST to probe the state of the TCP connection. The failure may be limited to a single TCP connection. For example I may be able to subsequently open a new TCP connection to the server.
I'm looking for a solution that still uses HTTP POST and does not change the duration of the server side processing.
Some solutions that I can think of are:
Provide a side channel interface to retrieve request and response history. If the history lists the response as having been sent (presumably resulting in a TCP error) but I have not received it within a reasonable time, I can generate a local error.
Use an X header to request that the server deliver "spurious" 100 Continue provisional responses on a regular interval. If I fail to see an expected 100 Continue or a non-provisional response I can generate a local error.
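The client side of the second idea could look something like the Python sketch below. Everything in it is hypothetical: the class name ResponseWatchdog, the interval, and the tolerance factor are made up, and it assumes the server (and any intermediaries) really do deliver the periodic provisional responses, which is not established here. It only captures the bookkeeping: note the time whenever any provisional or final response arrives, and declare the response lost once the gap exceeds the agreed interval by some margin.

```python
import time
from typing import Optional


class ResponseWatchdog:
    """Client-side liveness check for the periodic 100 Continue idea."""

    def __init__(self, interval: float, tolerance: float = 2.0) -> None:
        self.interval = interval    # seconds between expected responses
        self.tolerance = tolerance  # how many intervals of silence to allow
        self.last_seen = time.monotonic()

    def saw_response(self, now: Optional[float] = None) -> None:
        """Call whenever a provisional or final response is received."""
        self.last_seen = time.monotonic() if now is None else now

    def is_lost(self, now: Optional[float] = None) -> bool:
        """True once the silence has exceeded interval * tolerance."""
        now = time.monotonic() if now is None else now
        return now - self.last_seen > self.interval * self.tolerance
```

The optional now parameter exists only so the logic can be exercised with fixed timestamps; in real use both methods would be called without it.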
Is there a state of the art solution for this problem?
It sounds to me like you are using SOAP for something that would be much better done using a stateful connection or a server-side push technology.

Resources