I am using the following Plone + urllib2 code to proxy responses from another server through a BrowserView:
req = urllib2.Request(full_url)

try:
    # Important: without a timeout, if the remote server is slow
    # all our web server threads get stuck here.
    # But this is UGLY as Python does not provide per-thread
    # or per-socket timeouts through urllib.
    original_timeout = socket.getdefaulttimeout()
    try:
        socket.setdefaulttimeout(10)
        response = urllib2.urlopen(req)
    finally:
        # restore original timeout
        socket.setdefaulttimeout(original_timeout)

    # XXX: How to stream the response through Zope?
    # AFAIK we cannot do it currently
    return response.read()
My question is: how could I make this function not block, but instead start streaming the proxied response through Zope as soon as the first bytes arrive? What interfaces, objects or patterns are used to make streamable Zope responses?
I think there are two ways you can do this. Firstly, the Zope response itself is file-like, so you can use the response's write() method to push successive chunks of data onto the response as they come in. For example, I have used a Zope response directly as the file-like object for a csv.writer.
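Applied to the proxying case in the question, a minimal sketch of that first approach might look like this (the view class and the 'url' request parameter are assumptions for illustration, not code from the original answer; timeout handling as in the question is omitted):

import urllib2
from Products.Five.browser import BrowserView

class ProxyView(BrowserView):
    """Sketch: stream a proxied response through response.write()."""

    def __call__(self):
        # Assumption: the target URL is passed as a request parameter.
        full_url = self.request.get('url')
        upstream = urllib2.urlopen(urllib2.Request(full_url))

        response = self.request.response
        response.setHeader('Content-Type',
                           upstream.info().getheader('Content-Type',
                                                     'application/octet-stream'))
        while True:
            chunk = upstream.read(65536)
            if not chunk:
                break
            # Each write() hands another chunk to the response as it arrives,
            # instead of buffering the whole body with read().
            response.write(chunk)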
Or you can use ZPublisher's IStreamIterator support: wrap the proxied data in a ZPublisher.Iterators.filestream_iterator and return that wrapper from the view.
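A corresponding sketch of the iterator approach, assuming the proxied data has already been spooled to a file on disk (the path below is made up; filestream_iterator streams a file by name, and Content-Length should be set before returning it):

import os
from Products.Five.browser import BrowserView
from ZPublisher.Iterators import filestream_iterator

class FileStreamView(BrowserView):
    """Sketch: return an iterator instead of a string."""

    def __call__(self):
        path = '/tmp/proxied-data.bin'   # assumption: data already written here
        response = self.request.response
        response.setHeader('Content-Type', 'application/octet-stream')
        response.setHeader('Content-Length', str(os.path.getsize(path)))
        # Returning the iterator lets the server push the file out in blocks
        # after the published method returns, without holding it all in memory.
        return filestream_iterator(path, streamsize=65536)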
This should actually be a comment, but I don't have the reputation yet.
I am trying to do the same thing as you, Mikko, and RESPONSE.write() does exactly that, as Ross said it would. Note, however, that the bytes won't actually leave the interface until there are 64 KB of them (or the connection closes). Flushing stdout won't help, so it seems you will have to interfere further down, at the socket level, to promptly send the first few bytes right away.
I have built a router with extended logging capabilities using Go. It works properly for most use cases. However, it encounters problems when clients send non-standard HTTP messages on port 80.
To date, I have solved this by implementing my own version of ServeHTTP():
func (myproxy *MyProxy) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    // Inspect headers
    // Determine if it is a custom protocol (e.g. websockets, CONNECT requests)
    // Implement handlers for each type
}
In the event that I determine a request is a non-standard HTTP protocol, the request is played back to the original destination (via http.Request.Write()) and everyone is happy.
At least, most of the time. The problem arises with edge cases. TiVo clients do not send "Host" headers and appear not to like all kinds of other standard things that Go does (such as capitalizing header names). The number of possible variations on this is endless, so what I would very much like to do is buffer the original request - exactly as it was sent to me, without any modifications - and replay it to the original destination server.
I could hack around this by re-implementing net/http's Server, but that seems like an extraordinarily difficult and brittle approach. What I would prefer to do is to somehow hook into net/http/server.go right around the point where it accepts a connection, and wrap the connection in a spy wrapper that logs the request to a buffer.
func (srv *Server) Serve(l net.Listener) error {
    // Lots of listener code...
    c := srv.newConn(rw)
    c.setState(c.rwc, StateNew) // before Serve can return
    // I'd like to wrap c here and save the request for possible replay later.
    go c.serve(ctx)
}
https://golang.org/src/net/http/server.go?s=93229:93284#L2805
Edit: I have looked at httputil.DumpRequest, but it slightly modifies the original request (it changes the case and order of headers). If it didn't do that, it would be an ideal solution.
https://godoc.org/net/http/httputil#DumpRequest
Is there a way to hook connections around this point, before they are parsed into an http.Request?
In the interest of helping others, I wanted to answer my own question. The approach I suggested above does in fact work and is the proper way to do this. In summary (a rough sketch follows the list):
Implement ListenAndServe()
Wrap the incoming net.Conn in a TeeReader or other multiplexed connection wrapper
Record the request
Dial the original destination and connect with the inbound connection, replaying the original request if necessary.
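A bare-bones sketch of that record-and-replay idea, reduced to plain sockets (shown in Python purely for illustration; the actual solution above is in Go, and the helper name, ports and header-only assumption below are made up):

import socket

def record_and_replay(listen_port, upstream_host, upstream_port):
    # Accept one client, capture its raw request bytes before any HTTP
    # parsing touches them, replay them unmodified upstream, relay the reply.
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("", listen_port))
    srv.listen(1)
    client, _ = srv.accept()

    recorded = b""
    client.settimeout(2.0)
    try:
        while b"\r\n\r\n" not in recorded:   # crude: a request without a body is assumed
            chunk = client.recv(4096)
            if not chunk:
                break
            recorded += chunk
    except socket.timeout:
        pass

    upstream = socket.create_connection((upstream_host, upstream_port))
    upstream.sendall(recorded)               # byte-for-byte replay, headers untouched

    while True:
        reply = upstream.recv(4096)
        if not reply:
            break
        client.sendall(reply)                # relay the response back unchanged
    upstream.close()
    client.close()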
A similar approach is needed when upgrading connections for websocket servers. A nice writeup can be found here:
https://medium.freecodecamp.org/million-websockets-and-go-cc58418460bb
I have a client-side GUI app for human use that consumes some SOAP web services and uses cURL as the underlying HTTP communication library. Depending on the input, processing a request can take a long time, even up to an hour. Neither the client nor the server times out on its own for that reason; that's tested and works. Most requests get processed within a few minutes anyway, so this is an edge case.
One of my users is forced to use a proxy between my client app and my server and, for various reasons, has no control over it. That proxy has a timeout configured and closes the connection to my client after 4 minutes of no data transfer. So the user can (and did) upload data for e.g. 30 minutes; afterwards the server starts to process the data, after 4 minutes the proxy closes the connection, the server silently continues to process the request, but the user is left with an error message AND won't get the processing result. My app already uses TCP keep-alive, so that shouldn't be the problem; instead, the timeout seems to be defined for higher-level data. It behaves like the read_timeout option of Squid, which I used to reproduce the behaviour in our internal setup.
What I would like to do now is start a background thread in my web service which simply outputs some garbage data to my client for as long as the request is being processed; the client ignores it, and it tells the proxy that the connection is still active. I can recognize my client using the user agent and can configure server-side whether to output that data or not, so other clients consuming the web service wouldn't get a problem.
What I'm asking is whether there's any HTTP-compliant way to output such garbage data before the actual HTTP response. Would it, for example, be enough to simply output \r\n without any additional content over and over again to stay HTTP-compliant with all requesting libs? Or maybe even a binary 0? Or some full-fledged HTTP headers stating something like "real answer about to come, please be patient"? From my investigation this sounds a lot like chunked HTTP encoding, but I'm not sure yet if it is applicable.
I would like to have something like the following, where all the "Wait" lines are simply ignored in the end and the real HTTP response at the end contains Content-Length and such.
Wait...\r\n
Wait...\r\n
Wait...\r\n
[...]
HTTP/1.1 200 OK\r\n
Server: Apache/2.4.23 (Win64) mod_jk/1.2.41\r\n
[...]
<?xml version="1.0" encoding="UTF-8"?><soap:Envelope[...]
Is that possible in some standard HTTP way and if so, what's the approach I need to take? Thanks!
HTTP Status 102
Isn't HTTP Status 102 exactly what I need? As I understand the spec, I can simply print that response line over and over again until the final response is available?
HTTP Status 102 was a dead end. Two things might work, depending on the proxy used. The first is an NPH script, which can be used to regularly print headers directly to the client. The important thing is that NPH scripts normally bypass the web server's header buffering, so the headers can be put on the wire as needed. They "only" need to be correct HTTP headers, and depending on the web server, proxy and such, it might be a good idea to create incrementing, unique headers, simply by adding some counter to the header name.
The second thing is chunked transfer-encoding, in which case small chunks of dummy data can be printed to the client in the response body. The good thing is that such small amounts of data can be pushed onto the wire as needed using server-side flushes and such; the bad thing is that the client receives this data and by default treats it as part of the expected response body. That might break the application, of course, but most HTTP libs provide callbacks for processing received data, and if you print something unique, the client should be able to filter the garbage out.
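A bare-bones sketch of the chunked variant on a raw socket (a toy server, not the real SOAP service; chunk contents, timings and the port are made up): the status line and Transfer-Encoding: chunked header go out immediately, dummy chunks keep data flowing through the proxy, and the last chunk carries the real payload.

import socket
import time

def chunk(data):
    # chunked encoding: hex length, CRLF, data, CRLF
    return ("%x\r\n" % len(data)).encode("ascii") + data + b"\r\n"

def handle(conn):
    conn.recv(65536)   # read (and ignore) the request for this sketch

    conn.sendall(b"HTTP/1.1 200 OK\r\n"
                 b"Content-Type: text/xml\r\n"
                 b"Transfer-Encoding: chunked\r\n\r\n")

    for _ in range(3):                          # pretend the real work takes a while
        conn.sendall(chunk(b"Wait...\r\n"))     # dummy chunk the client must filter out
        time.sleep(1)

    body = b'<?xml version="1.0" encoding="UTF-8"?><soap:Envelope>...</soap:Envelope>'
    conn.sendall(chunk(body))
    conn.sendall(b"0\r\n\r\n")                  # terminating zero-length chunk
    conn.close()

srv = socket.socket()
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("127.0.0.1", 8080))
srv.listen(1)
handle(srv.accept()[0])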
In my case the web service spawns a background thread and, depending on the entry point of the service requested, either prints headers using NPH or chunks of data. In both cases the data can be the same, so an NPH-style header line can serve as the chunked dummy data as well.
My NPH solution doesn't work with Squid, but the chunked one does. The problem with Squid is that its read_timeout setting is not a low-level setting about receiving any data on the connection at all, but some logical HTTP thing. This means that Squid does receive my headers, but it expects a complete HTTP header section within the period of time defined by read_timeout. With my NPH approach this isn't the case, simply because by design I only send some garbage headers to be ignored until the real headers arrive.
Additionally, one has to be careful about NPH in Apache httpd, but in my use case it works. I can see the individual headers in Squid's log, without any garbage after the response body or such. Avoid the Action directive.
Apache2 sends two HTTP headers with a mapped "nph-" CGI
I have a program already written in gawk that downloads a lot of small bits of information from the internet (a media scanner and indexer).
At present it launches wget to get the information. This is fine, but I'd like to simply reuse the connection between invocations. It's possible a run of the program might make between 200 and 2000 calls to the same API service.
I've just discovered that gawk can do networking and found geturl. However, the advice at the bottom of that page is well heeded: I can't find an easy way to read the last line and keep the connection open.
As I'm mostly reading JSON data, I can set RS="}" and exit when the body length reaches the expected Content-Length. This might break with any trailing whitespace, though. I'd like a more robust approach. Does anyone have a nicer way to implement sporadic HTTP requests in awk that keep the connection open? Currently I have the following structure...
con="/inet/tcp/0/host/80";
send_http_request(con);
RS="\r\n";
read_headers();
# now read the body - but do not close the connection...
RS="}"; # for JSON
while ( con |& getline bytes ) {
body = body bytes RS;
if (length(body) >= content_length) break;
print length(body);
}
# Do not close con here - keep open
It's a shame this one little thing seems to be spoiling all the potential here. Also, in case anyone asks :) ..
awk was originally chosen for historical reasons - there were not many other language options on this embedded platform at the time.
Gathering up all of the URLs in advance and passing to wget will not be easy.
Re-implementing in Perl/Python etc. is not a quick solution.
I've looked at trying to pipe URLs through a named pipe into wget -i -, but that doesn't work: data gets buffered and unbuffer is not available; also, I think wget gathers up all the URLs until EOF before processing them.
The data is small so lack of compression is not an issue.
The problem with connection reuse comes from the HTTP/1.0 standard, not gawk. To reuse the connection you must either use HTTP/1.1 or try some other non-standard solutions for HTTP/1.0. Don't forget to add the Host: header in your HTTP/1.1 request, as it is mandatory.
You're right about the lack of robustness when reading the response body. For line-oriented protocols this is not an issue. Moreover, even when using HTTP/1.1, if your script locks up waiting for more data when it shouldn't, the server will, again, close the connection due to inactivity.
As a last resort, you could write your own HTTP retriever in whatever language you like which reuses connections (all to the same remote host, I presume) and also inserts a special record separator for you. Then you could control it from the awk script.
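A rough sketch of such a helper (hypothetical, not an existing tool; the host name and separator are made up): it reads URL paths on stdin, fetches each one over a single persistent HTTP/1.1 connection, and prints the body followed by a record separator the awk script can use as RS. From gawk it would run as a coprocess via |&.

import sys
import httplib   # http.client in Python 3

HOST = "api.example.com"     # assumed API host
SEPARATOR = "\x1e\n"         # ASCII record separator; anything the payload cannot contain

conn = httplib.HTTPConnection(HOST)   # one TCP connection, reused for every request
while True:
    line = sys.stdin.readline()       # readline() avoids stdin read-ahead buffering
    if not line:
        break
    path = line.strip()
    if not path:
        continue
    conn.request("GET", path)
    resp = conn.getresponse()
    sys.stdout.write(resp.read())     # reading the whole body leaves the connection reusable
    sys.stdout.write(SEPARATOR)
    sys.stdout.flush()                # flush so the awk coprocess sees each record immediately
conn.close()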
I'm still starting out with Lua, and would like to write a (relatively) simple proxy using it.
This is what I would like to get to:
Listen on port.
Accept connection.
Since this is a proxy, I'm expecting HTTP (GET/POST etc.)/HTTPS/FTP/whatever requests from my browser.
Inspect the request (Just to extract the host and port information?)
Create a new socket and connect to the host specified in the request.
Relay the exact request as it was received, with POST data and all.
Receive the response (header/body/anything else..) and respond to the initial request.
Close Connections? I suppose Keep-Alive shouldn't be respected?
I realize it's not supposed to be trivial, but I'm having a lot of trouble setting this up using LuaSocket or Copas. How do I receive the entire request? Keep receiving until I see \r\n\r\n? Then how do I pull the POST data and the body? Or accept a "download" file? I read about the "sink", but admittedly didn't understand most of what that meant, so maybe I should read up more on that?
In case it matters, I'm working on a windows machine, using LuaForWindows and am still rather new to Lua. Loving it so far though, tables are simply amazing :)
I discovered lua-http, but it seems to have been merged into Xavante (and I didn't find any version for Lua 5.1 and LuaForWindows); I'm not sure whether it makes my life easier.
Thanks in advance for any tips, pointers, libraries/source I should be looking at etc :)
Not as easy as you may think. Requests to proxies and requests to servers are different. In RFC 2616 you can see that, when querying a proxy, a client includes the absolute URL of the requested document instead of the usual relative one.
So, as a proxy, you have to parse incoming requests, modify them, query the appropriate servers, and return the responses.
Parsing incoming requests is quite complex, as the body length depends on various parameters (method, content encoding, etc.).
You may try to use lua-http-parser.
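A rough sketch of the parsing steps (written in Python only to keep it short; the same logic applies with LuaSocket, and the function name here is made up): read up to the blank line that ends the headers, take the body length from Content-Length, and take host/port from the absolute URL on the request line. Chunked bodies and keep-alive are ignored in this sketch.

import socket
from urlparse import urlsplit   # urllib.parse in Python 3

def read_proxy_request(client):
    # Read header bytes until the blank line that ends the header section.
    data = b""
    while b"\r\n\r\n" not in data:
        chunk = client.recv(4096)
        if not chunk:
            break
        data += chunk
    head, _, rest = data.partition(b"\r\n\r\n")

    lines = head.decode("iso-8859-1").split("\r\n")
    method, absolute_url, version = lines[0].split(" ", 2)

    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    # A proxy request carries the absolute URL, so host/port come from there.
    parts = urlsplit(absolute_url)
    host, port = parts.hostname, parts.port or 80

    # Body length (for POST etc.) is taken from Content-Length here;
    # a real proxy must also handle chunked transfer-encoding.
    body = rest
    length = int(headers.get("content-length", 0))
    while len(body) < length:
        body += client.recv(4096)

    # Return everything needed to open a socket to `host:port` and relay.
    return method, host, port, head + b"\r\n\r\n" + body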
I have heard that HTTP is a nice basis for designing my own protocol. Although I could design a binary protocol, I would prefer to follow the HTTP standard.
Basically, the flow of the application is: with each request the client sends some parameter strings to the server, and the server sends a response string back to the application. This procedure repeats several times before the connection thread terminates.
I am using Java servlets for the above.
How should the client send the HTTP parameters so that parsing is easy at the server?
GET /a HTTP/1.1
Host: localhost
??? what comes here
??? what comes here
Since that is a GET request, nothing.
I'd suggest using query-string parameters; then you can access them using ServletRequest.getParameterNames(), getParameterValues(), getParameterMap(), etc.
So, your request line would take the form:
GET /a?x=1&y=1 HTTP/1.1
Since this is the standard way of passing parameter data, other clients, such as web browsers, will be able to use your service easily.
This assumes that the operation does not cause side-effects on the server. If it does then you should be using a POST, PUT or DELETE request depending on the exact nature of the operation.
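As a rough client-side sketch of both cases (hypothetical host and values; shown here in Python, while on the server side a servlet reads both forms the same way via getParameter()):

import httplib            # http.client in Python 3
import urllib             # urllib.parse in Python 3

params = urllib.urlencode({"x": "1", "y": "1"})

conn = httplib.HTTPConnection("localhost", 8080)

# Read-only operation: parameters travel in the query string of a GET.
conn.request("GET", "/a?" + params)
print(conn.getresponse().read())

# Operation with side effects: same parameters, form-encoded in a POST body.
conn.request("POST", "/a", params,
             {"Content-Type": "application/x-www-form-urlencoded"})
print(conn.getresponse().read())
conn.close()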
HTTP Made Really Easy is a useful document since, at least initially, the HTTP Spec can be a bit daunting.
Why not base your protocol on something that already exists, for example SOAP?
What you're designing is a data exchange format, not a protocol really.
So the question is, really, what sort of data do you want to send? To answer that, you need to consider who is receiving it. If it's yourself, then just keep it simple.