Writing a Proxy/Caching server using Lua! - http

I'm still starting out with Lua, and would like to write a (relatively) simple proxy using it.
This is what I would like to get to:
Listen on port.
Accept connection.
Since this is a proxy, I'm expecting HTTP (GET/POST etc.), HTTPS, FTP or whatever other requests from my browser.
Inspect the request (Just to extract the host and port information?)
Create a new socket and connect to the host specified in the request.
Relay the exact request as it was received, with POST data and all.
Receive the response (header/body/anything else..) and respond to the initial request.
Close Connections? I suppose Keep-Alive shouldn't be respected?
I realize it's not supposed to be trivial, but I'm having a lot of trouble setting this up using LuaSocket or Copas. How do I receive the entire request? Do I keep receiving until I see \r\n\r\n? Then how do I pull the POST data and the body, or accept a "download" file? I read about the "sink", but admittedly didn't understand most of what that meant, so maybe I should read up more on that?
In case it matters, I'm working on a Windows machine, using LuaForWindows, and am still rather new to Lua. Loving it so far though, tables are simply amazing :)
I discovered lua-http, but it seems to have been merged into Xavante (and I didn't find any version for Lua 5.1 and LuaForWindows), so I'm not sure whether it makes my life easier.
Thanks in advance for any tips, pointers, libraries/source I should be looking at etc :)

Not as easy as you may think. Requests to proxies and requests to servers are different. In RFC 2616 you can see that, when querying a proxy, a client includes the absolute URL of the requested document instead of the usual relative one.
So, as a proxy, you have to parse incoming requests, modify them, query the appropriate servers, and return the responses.
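The "modify" step can be sketched as follows. This is a minimal illustration in Python rather than Lua, assuming a plain http:// request whose header block has already been read in full; the function name rewrite_proxy_request is hypothetical. It does not handle CONNECT, which browsers use for HTTPS and which carries host:port instead of a URL.

```python
from urllib.parse import urlsplit

def rewrite_proxy_request(head: bytes):
    """Split a proxy-style request head into (host, port, origin-style head).

    `head` is the raw header block, including the terminating blank line.
    """
    request_line, _, rest = head.partition(b"\r\n")
    method, target, version = request_line.split(b" ", 2)

    url = urlsplit(target.decode("ascii"))
    if url.scheme != "http" or not url.hostname:
        raise ValueError("expected an absolute http:// URL in the request line")

    # Rebuild the relative form an origin server expects: path plus query.
    path = url.path or "/"
    if url.query:
        path += "?" + url.query

    new_line = b" ".join([method, path.encode("ascii"), version])
    return url.hostname, url.port or 80, new_line + b"\r\n" + rest
```

The returned host and port tell the proxy where to connect, and the rewritten head is what it would send there, followed by any request body.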
Parsing incoming requests is quite complex, as the body length depends on various parameters (method, Content-Length, Transfer-Encoding, etc.).
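To make the framing question concrete, here is a rough Python sketch (not Lua) of reading one request: receive until the blank line that ends the headers, then use Content-Length to decide how many body bytes follow. The names parse_head, request_body_framing and read_request are made up for this example, and chunked bodies are detected but deliberately not decoded.

```python
def parse_head(head: bytes):
    """Return (request_line, headers) with header names lower-cased."""
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:
        if not line:
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers

def request_body_framing(headers):
    """Decide how the request body is delimited."""
    encodings = headers.get("transfer-encoding", "").lower().split(",")
    if encodings[-1].strip() == "chunked":
        return "chunked", None
    if "content-length" in headers:
        return "length", int(headers["content-length"])
    return "none", 0  # a request with neither header has no body

def read_request(recv):
    """Read one request using `recv`, a callable like socket.recv."""
    buf = b""
    while b"\r\n\r\n" not in buf:
        chunk = recv(4096)
        if not chunk:
            raise ConnectionError("peer closed before end of headers")
        buf += chunk
    head, _, body = buf.partition(b"\r\n\r\n")
    _, headers = parse_head(head)
    kind, length = request_body_framing(headers)
    if kind == "chunked":
        raise NotImplementedError("chunked bodies are left out of this sketch")
    while len(body) < length:
        chunk = recv(4096)
        if not chunk:
            raise ConnectionError("peer closed before end of body")
        body += chunk
    return head + b"\r\n\r\n", body[:length]
```

With a real socket this would be called as read_request(conn.recv). Responses need extra rules (for example, a reply to HEAD never has a body), which is where the method comes in.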

You may try to use lua-http-parser.

Related

The reason for a mandatory 'Host' clause in HTTP 1.1 GET

Last week I started quite a fuss in my Computer Networks class over the need for a mandatory Host clause in the header of HTTP 1.1 GET messages.
The reason I'm provided with, be it written on the Web or shouted at me by my classmates, is always the same: the need to support virtual hosting. However, and I'll try to be as clear as possible, this does not appear to make sense.
I understand that in order to allow two domains to be hosted in a single machine (and by consequence, share the same IP address), there has to exist a way of differentiating both domain names.
What I don't understand is why it isn't possible to achieve this without a Host clause (HTTP 1.0 style) by using an absolute URL (e.g. GET http://www.example.org/index.html) instead of a relative one (e.g. GET /index.html).
When the HTTP message got to the server, it (the server) would redirect the message to the appropriate host, not by looking at the Host clause but, instead, by looking at the hostname in the URL present in the message's request line.
I would be very grateful if any of you hardcore hackers could help me understand what exactly am I missing here.
This was discussed in this thread:
modest suggestions for HTTP/2.0 with their rationale.
Add a header to the client request that indicates the hostname and
port of the URL which the client is accessing.
Rationale: One of the most requested features from commercial server
maintainers is the ability to run a single server on a single port
and have it respond with different top level pages depending on the
hostname in the URL.
Making an absolute request URI required (because there's no way for the client to know beforehand whether the server hosts one or more sites) was suggested:
Re the first proposal, to incorporate the hostname somewhere. This
would be cleanest put into the URL itself :-
GET http://hostname/fred http/2.0
This is the syntax for proxy redirects.
To which this argument was made:
Since there will be a mix of clients, some supporting host name reporting
and some not, it just doesn't matter how this info gets to the server.
Since it doesn't matter, the easier to implement solution is a new HTTP
request header field. It allows all clients and servers to operate as they
do now with NO code changes. Clients and servers that actually need host
name information can have tiny mods made to send the extra header field
containing the URL and process it.
[...]
All I'm suggesting is that there is a better way to
implement the delivery of host name info to the server that doesn't involve
hacking the request syntax and can be backwards compatible with ALL clients
and servers.
Feel free to read on to discover the final decision yourself. But be warned, it's easy to get lost in there.
The reason for adding support for specifying a host in an HTTP request was the limited supply of IP addresses (which was not an issue yet when HTTP 1.0 came out).
If your question is "why specify the host in a Host header as opposed to on the Request-Line", the answer is the need for interoperability between HTTP/1.0 and 1.1.
If the question is "why is the Host header mandatory", this has to do with the desire to speed up the transition away from assigned IP addresses.
Here's some background on the Internet address conservation with respect to HTTP/1.1.
The reason for the 'Host' header is to make explicit which host this request refers to. Without 'Host', the server must know ahead of time that it is supposed to route 'http://joesdogs.com/' to Joe's Dogs while it is supposed to route 'http://joscats.com/' to Jo's Cats even though they are on the same webserver. (What if a server has 2 names, like 'joscats.com' and 'joescats.com' that should refer to the same website?)
Having an explicit 'Host' header makes these kinds of decisions much easier to program.
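As a rough illustration of that routing decision, the Python sketch below picks a site from either an absolute URL in the request line or the Host header. The SITES table and its document roots are invented for the example; the rule that an absolute request URL takes precedence over Host follows the HTTP/1.1 specification.

```python
from urllib.parse import urlsplit

# Hypothetical virtual-host table: two names may share one site.
SITES = {
    "joesdogs.com": "/var/www/joesdogs",
    "joscats.com": "/var/www/joscats",
    "joescats.com": "/var/www/joscats",
}

def effective_host(request_line: str, headers: dict):
    """Return the host a request is addressed to, or None if it names none.

    `headers` maps lower-cased header names to values.
    """
    method, target, version = request_line.split(" ", 2)
    if "://" in target:
        # Absolute URL in the request line: it wins over any Host header.
        return urlsplit(target).hostname
    host = headers.get("host")
    if host is None:
        if version == "HTTP/1.1":
            raise ValueError("400 Bad Request: HTTP/1.1 requires Host")
        return None  # HTTP/1.0 client: the server can only use a default site
    return urlsplit("//" + host).hostname  # strips an optional :port

def document_root(request_line: str, headers: dict, default="/var/www/default"):
    return SITES.get(effective_host(request_line, headers), default)
```

An HTTP/1.0 client that sends neither form still works; it simply always lands on the default site, which is the backwards compatibility the thread argues for.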

Why exactly is a pipe simpler than a TCP connection for communication between an SSL proxy and an HTTP proxy?

I began studying networking protocols recently.
I learned that in the old method, incoming data is first delivered to the SSL proxy, where it is decrypted and then sent to the HTTP proxy through another TCP connection. For every packet that passes through this connection, a connection table lookup is needed to determine the other endpoint of the connection.
Pipe setup and teardown, by contrast, each require only one function call and no packets are sent. Sending data through the pipe does not require a connection table lookup, as the data structures are already tied together with pointers.
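The difference in setup cost is visible even from user space. The Python sketch below moves the same bytes once through a pipe and once through a loopback TCP connection; the function names are made up for the example. The per-packet lookup described above happens inside the operating system's TCP stack and does not appear in the code, so this only shows the extra setup steps (bind, listen, connect, accept) that a TCP connection needs.

```python
import os
import socket

def through_pipe(data: bytes) -> bytes:
    """Send data through a pipe: one call creates both ends."""
    read_fd, write_fd = os.pipe()
    os.write(write_fd, data)      # fine for small payloads that fit the pipe buffer
    os.close(write_fd)
    received = os.read(read_fd, len(data))
    os.close(read_fd)
    return received

def through_tcp(data: bytes) -> bytes:
    """Send data through a loopback TCP connection."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))       # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())  # TCP handshake
    server, _ = listener.accept()
    client.sendall(data)
    client.close()                        # teardown also exchanges packets
    received = b""
    while True:
        chunk = server.recv(4096)
        if not chunk:
            break
        received += chunk
    server.close()
    listener.close()
    return received
```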
I tried to search for an answer to my own question, but couldn't find a good way to understand it. I guess it may be related to the structure of TCP or of pipes. Could anyone tell me why exactly a pipe is simpler than a TCP connection between an SSL proxy and an HTTP proxy? Or please suggest what book to read or how I can understand it.
Two Pics related to this question:
http://www.tripntale.com/pic/19254/857880/pipe-jpg#pid-857880
http://www.tripntale.com/pic/19254/857880/pipe-jpg#pid-857882
So what you want to know is how these two diagrams compare?
I'm sorry to say that these diagrams don't make much sense to me either; hopefully they make sense alongside the text they were published with.
The diagrams relate to software engineering approaches to a problem, but the objects in the diagrams aren't defined functionally, appear to me to be used in different ways and it isn't clear what the problem is that these are approaches to.
HTTP proxies can be used as:
Forward proxies (client sends its HTTP requests to the proxy, proxy fetches the resources and returns them to the client)
Or
Reverse proxies (proxy sits in front of server(s) for service engineering reasons)
The term "SSL Proxy" could refer to either application and would have differing implications to how it was designed.
See here for more explanation: http://en.wikipedia.org/wiki/SSL_Proxy
Do you just want to understand these diagrams? Or are you trying to solve a problem and think that these diagrams can help you? If so, what is the problem you are trying to solve?
For every packet passes through this connection, we need to do a
connection table
Why? I've written several proxies without a connection table.

Why can't I view Omegle's HTTP request/response headers?

I'm trying to write a small program for school that lets me talk to Omegle strangers from the command line. However, I'm having some issues. I'm sure I could solve the problem if I could view the headers being sent, but if you talk to a stranger on Omegle while Live HTTP Headers (or a similar plug-in or program) is running, the headers don't show. Why is this? Are they not sending HTTP headers and using a different protocol instead?
I'm really lost with this, any ideas?
I had success in writing a command line Omegle chat client. However it is hardcoded in C for POSIX and curses.
I'm not sure what exactly your problem is, maybe it's just something with your method of reverse engineering Omegle's protocol. If you want to make a chat client, use a network packet analyzer such as Wireshark (or if you're on a POSIX system I recommend tcpdump), study exactly what data is sent and received during a chat session and have your program emulate what the default web client is doing. Another option is to de-compile/reverse engineer the default web client itself, which would be a more thorough method but more complicated.

"Proxying" HTTP requests

I have some software which runs as a black box, I have no access to it. This software makes HTTP requests. What I want to do is intercept these requests, forward them on, catch the response, do something with it, before passing the response back to the software.
Can this be done? What's the best method?
Thanks
Edit: Requests are to the public internet from a local intranet via a gateway/router. I have root access to my machine. Another machine could be used as intermediate gateway.
Edit 2: Requests are not encrypted. What I am actually trying to do is save down any images that are requested.
Try yellosoft-alchemy.
If the communication isn't encrypted, use Ethereal (or any other similar program) to sniff the communication on the wire.
edit: since the communication isn't encrypted, you can do that easily with Ethereal. You can save each TCP stream independently from there.
Edit2: Ok, you want to do this automatically. In this case, I would suggest you look at two tools available on Linux called tcpflow and tcpreen.
tcpreen creates a proxy similar to what you want between a local port and a remote one. It's a TCP proxy, not an HTTP proxy, so you'll have to write some parsing tool to isolate the HTTP streams that contain the images you want (probably based on the MIME type of the response). It's not too complex a task, though, if you understand how HTTP works.
tcpflow is similar to tcpreen except that it's a sniffer instead of a proxy. Use whichever tool you think is better suited to your environment.
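The parsing step could look roughly like the Python sketch below. It assumes you already have the raw bytes of a single HTTP response (for example from a saved TCP stream), that the body is neither chunked nor compressed, and that the stream holds one response only. The function name extract_image is hypothetical.

```python
def extract_image(response: bytes):
    """Return (mime_type, body) if the response carries an image, else None."""
    head, sep, body = response.partition(b"\r\n\r\n")
    if not sep:
        return None  # headers are incomplete

    headers = {}
    for line in head.split(b"\r\n")[1:]:      # skip the status line
        name, _, value = line.partition(b":")
        headers[name.strip().lower()] = value.strip()

    # Drop parameters such as "; charset=..." and compare case-insensitively.
    mime = headers.get(b"content-type", b"").split(b";")[0].strip().lower()
    if not mime.startswith(b"image/"):
        return None

    # Trust Content-Length when present so trailing bytes are not saved.
    if b"content-length" in headers:
        body = body[: int(headers[b"content-length"])]
    return mime.decode("ascii"), body
```

The returned bytes can then be written to a file whose extension is derived from the MIME type.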

What things should be kept in mind while designing an HTTP based protocol?

I have heard that HTTP is a nice basis for designing my own protocol. Although I could design a binary protocol, I would prefer to follow the HTTP standard.
Basically, the flow of the application is that the client sends some parameter strings to the server with the request, and the server sends a response string back to the application. This procedure repeats several times before the connection thread terminates.
I am using Java servlets for the above.
How should the client send the HTTP parameters so that parsing is easy at the server?
GET /a HTTP/1.1
Host: localhost
??? what comes here
??? what comes here
Since that is a GET request, nothing.
I'd suggest using querystring parameters, then you can access them using ServletRequest.getParameterNames(), getParameterValues(), getParameterMap() etc.
So, your request line would take the form:
GET /a?x=1&y=1 HTTP/1.1
Since this is the standard way of passing parameter data, other clients, such as web browsers, will be able to use your service easily.
This assumes that the operation does not cause side-effects on the server. If it does then you should be using a POST, PUT or DELETE request depending on the exact nature of the operation.
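To show what the query-string approach looks like on the wire, here is a small Python sketch using only the standard library; the helper names build_request and parse_params are invented for the example. In a servlet you would not parse this yourself, since the container exposes the same values through getParameterMap() and related methods.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

def build_request(path: str, params: dict, host: str = "localhost") -> str:
    """Build a GET request whose parameters travel in the query string."""
    return f"GET {path}?{urlencode(params)} HTTP/1.1\r\nHost: {host}\r\n\r\n"

def parse_params(request: str) -> dict:
    """Recover the parameters from the request line, as a server would."""
    request_line = request.split("\r\n", 1)[0]
    target = request_line.split(" ")[1]
    return parse_qs(urlsplit(target).query)

request = build_request("/a", {"x": 1, "y": 1})
params = parse_params(request)
```

urlencode also percent-encodes characters such as spaces and ampersands, which is the main thing that goes wrong when query strings are assembled by hand.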
HTTP Made Really Easy is a useful document since, at least initially, the HTTP Spec can be a bit daunting.
Why not base your protocol on something that already exists for example SOAP?
What you're designing is a data exchange format, not a protocol really.
So the question is, really, what sort of data do you want to send? To answer that, you need to consider who is receiving it. If it's yourself, then just keep it simple.

Resources