How to make a GET request with python-requests without downloading the content

If I run
r = requests.get('http://github.com', stream=True)
and watch the traffic in tcpdump, the page content is downloaded right after requests.get returns. When I then access r.content, tcpdump shows no further transfer activity. The same happens with requests.Session(stream=True).

Don't use GET if you don't want the server to send a response body. Use a HEAD request instead if all you need is the header information.
All stream=True does is stop requests from reading the response body off the socket. The server can still start sending that body, so the socket receive buffer will already hold some of it for Python to read.
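To illustrate the HEAD suggestion, here is a minimal sketch that uses only the standard library, so it runs without network access. DemoHandler and head_headers are hypothetical names, and the local server merely stands in for the real site. With requests itself, the equivalent call would be requests.head(url).

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical local server standing in for the real site."""

    BODY = b"x" * 100_000

    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(self.BODY)))
        self.end_headers()

    def do_HEAD(self):
        self._send_headers()  # headers only, the body is never written

    def do_GET(self):
        self._send_headers()
        self.wfile.write(self.BODY)

    def log_message(self, *args):
        pass  # keep the demo quiet


def head_headers(host, port, path="/"):
    """Send a HEAD request and return (status, headers, body bytes)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, dict(response.getheaders()), response.read()
    finally:
        conn.close()


server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

status, headers, body = head_headers("127.0.0.1", server.server_port)
print(status, headers["Content-Length"], len(body))

server.shutdown()
server.server_close()
```

The Content-Length header still reports the size a GET would return, but no body bytes travel over the connection.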

Related

HTTP in simple terms

I came across the term HTTP. I have done some research and wanted to ensure that I correctly understood the term.
So, is it true that HTTP is, in simple words, a letter containing information in a language that both client and server can understand?
Then that letter is sent to the server via TCP/IP, which serves as a car that takes the letter to the server.
Then, after the letter is delivered, the server reads its content. If it is a GET request, the server takes the necessary data, ATTACHES that data to the letter and sends it back to the client, again via TCP/IP. But if it was a POST request, the client ATTACHES the DATA to the letter and sends it to the server so that the server saves that data in the database.
Is that true?
Basically, it is true.
However, the server can decide what to do with a GET, POST or any other request (it doesn't have to, e.g., append the data to a file).
I will add some information and try to explain it in my own words:
TCP is another communication protocol. It allows a client to open a connection to a server, after which the two can communicate.
HTTP (Hypertext Transfer Protocol) builds on TCP.
At first, the client opens a connection to the server.
After that, the client sends the HTTP Request. The first line contains the type of the request, the path and the version. For example, it could be GET / HTTP/1.1.
The next part of the request contains the request headers (the request parameters). Every header is a line, sent in the following form: paramName: paramValue
This part of the request ends with an empty line.
If it is a POST request, the request body follows, and it usually carries the submitted parameters. If it is a GET request, the parameters are instead appended to the path as a query string (e.g. /index.html?paramName=paramValue).
After receiving the request, the server sends an HTTP response back to the client.
The first line of the response contains the HTTP version, the status code and the status message. For example, it could be HTTP/1.1 200 OK.
Then, just like in the request, the response headers (parameters) follow, for example Content-Length: 1024.
The response headers also end with an empty line.
The last part of the response is the body/content. For example, this could be the HTML code of the website you are visiting.
When a Content-Length header is present, the length of the body in bytes has to match it.
After that, the connection may be closed (HTTP/1.0 does this by default, while HTTP/1.1 keeps the connection open for further requests unless told otherwise). If the client needs more resources, it sends another request. The server has NO WAY to send data to the client on its own until the client sends another request (WebSockets work around this limitation).
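The exchange described above can be sketched by writing the request by hand over a TCP socket. This is only an illustration under some assumptions: HelloHandler and raw_get are hypothetical names, and a tiny local server from the standard library plays the role of the website.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in server that answers every GET with a tiny page."""

    def do_GET(self):
        body = b"<h1>hello</h1>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def raw_get(host, port, path="/"):
    """Send a hand-written GET request and return the parsed response."""
    request = (
        f"GET {path} HTTP/1.1\r\n"   # request line: method, path, version
        f"Host: {host}:{port}\r\n"   # headers, one "Name: value" per line
        "Connection: close\r\n"
        "\r\n"                       # an empty line ends the headers
    ).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(request)
        raw = b""
        while chunk := sock.recv(4096):  # read until the server closes
            raw += chunk
    # The first empty line separates the status line and headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    headers = dict(line.split(": ", 1) for line in header_lines)
    return status_line, headers, body


server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

status_line, headers, body = raw_get("127.0.0.1", server.server_port)
print(status_line)
print(headers["Content-Length"], body)

server.shutdown()
server.server_close()
```

The status line, the header lines and the body come back exactly in the order described, and the Content-Length value equals the number of body bytes.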
GET is meant to fetch the content of a site. A web browser sends a GET request when you type in a URL. POST is typically used to update something on the site, but in the end the server decides what it does. POST can also be used if the server doesn't want the parameters to show up in the address bar.
There are other methods like PATCH or DELETE that are used by some APIs.
Some important status codes (and status messages) are:
200 OK (everything went well)
204 No Content (like OK, but there is no body in the response)
400 Bad Request (something is wrong with the request)
404 Not Found (the requested file (the path) was not found on the server)
500 Internal Server Error (an error occurred while processing the request)
Every status code beginning with 1 is informational.
If it starts with 2, everything went right.
Status codes beginning with 3 redirect the client to another location.
If it starts with 4, the error is on the client side.
Codes starting with 5 represent an error that occurred on the server side.
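The first-digit rule above is simple enough to express in a few lines. The following sketch uses the standard library's http.HTTPStatus for the status messages. STATUS_CLASSES and describe are hypothetical names chosen for this example.

```python
from http import HTTPStatus

# The first digit of a status code determines its class.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code):
    """Return 'code phrase (class)' for an HTTP status code."""
    category = STATUS_CLASSES.get(code // 100, "unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "Unregistered"  # code is not known to http.HTTPStatus
    return f"{code} {phrase} ({category})"


for code in (200, 204, 400, 404, 500):
    print(describe(code))
```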
TCP is a network protocol that establishes a connection with the server over a network (or the Internet) and allows two-way communication. HTTP traffic travels inside this TCP connection. TCP is a very useful protocol that helps keep things sane: it ensures data packets are read in the correct order and that packets lost during transmission are sent again.
Sometimes there is another protocol layer between HTTP and TCP, called SSL (nowadays TLS). It is responsible for encrypting the data that travels over TCP, so that it is transmitted safely over unsafe networks. This is known as HTTPS, and it is just HTTP using this additional layer.
Although it is almost always the case, HTTP doesn't necessarily use TCP. UPnP requests use HTTP over UDP, a network protocol that uses standalone packets instead of a connection.
HTTP is a plain text protocol, meaning it's designed in such a way that a human can understand it without using any tools. This is very convenient for learning.
If you're using Firefox or Chrome, you can press Ctrl-Shift-C to open the Developer Tools. Under the Network tab you will see every HTTP request your browser makes, exactly what was requested, what the server answered, etc., which gives you a better view of how this protocol works.
Explaining it in detail is too extensive for this answer. But as you will see, it's not that complicated.

How to get the crc32 of a resource in the response headers?

I need to get a CRC32 checksum of a file I'm downloading through an HTTP GET request, without actually opening the response body.
I am building a proxy app, which gets a request from a client and does the actual GET call. I'd like the response the proxy gets from the server to contain the checksum, without having to read through the actual data in the response body. I connect the response body reader stream to the writer stream which I return to the client.
I read about the "Want-Digest" header, which I can add to the request and which should result in the response containing a "Digest" header with a checksum, but it did not work.
I also looked into the Content-MD5 header, but when I try to download some photos, I see I'm not getting it in the response (also, I read that it is deprecated).
Thanks in advance!
Whether headers such as 'Want-Digest' or 'Content-MD5' are honored is up to the server to implement. Most servers will probably ignore those headers, which is why they aren't working for you. If you want the CRC32 of the body, you'll have to open the body and calculate it yourself.
If you have access to the TCP headers, I suppose you could read the TCP checksum, though that is a relatively weak checksum even compared to CRC32, and it covers the entire packet, not just the body.
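Since the proxy already pipes the body from a reader to a writer, the checksum can be calculated while the chunks pass through, without buffering the whole file. Below is a minimal sketch, assuming the reader and writer behave like Python file objects. copy_with_crc32 is a hypothetical helper name, and io.BytesIO stands in for the real streams.

```python
import io
import zlib


def copy_with_crc32(source, sink, chunk_size=64 * 1024):
    """Copy source to sink chunk by chunk and return the CRC32 of the data."""
    crc = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        crc = zlib.crc32(chunk, crc)  # continue from the running checksum
        sink.write(chunk)
    return crc & 0xFFFFFFFF


payload = b"example response body " * 1000
upstream = io.BytesIO(payload)   # stands in for the server's response body
downstream = io.BytesIO()        # stands in for the stream sent to the client
checksum = copy_with_crc32(upstream, downstream, chunk_size=4096)
print(f"{checksum:08x}")
```

Note that the value is only known once the last chunk has passed through, so it cannot be placed in the regular response headers that precede the body.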

Should http server read request body before sending a response?

Is there any rule for http server implementation to read or skip the request body before sending a response?
Once you have received the full HTTP request, you are free to do whatever you want with it. If you don't care about the body, you can simply read it and discard it. You should, of course, empty the read buffer.

Ask CURL to disconnect as soon as it receives a header

I'm pulling data from a server but need to know the type of data before I pull it. I know I can look at content-type in the response header, and I've looked into using
curl --head http://x.com/y/z
However, some servers do not support the HEAD method (I get a 501 Not Implemented response).
Is it possible to somehow do a GET with curl, and immediately disconnect after all headers have been received?
Check out the following answer:
https://stackoverflow.com/a/5787827
Streaming. UNIX philosophy and pipes: they are data streams. Since curl and GET are unix filters, ending the receiving pipe (dd) will terminate curl or GET early (SIGPIPE). There is no telling whether the server will be smart enough to stop transmission. However on a TCP level I suppose it would stop retrying packets once there is no more response. (quoted from sehe)
Using this method you should be able to download as many bytes as you want and then cancel the request. You could also work some magic to terminate after receiving a blank line, which marks the end of the headers.
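The same "stop after the blank line" idea can be sketched outside of curl. In Python's http.client, getresponse() parses only the status line and the headers, so closing the connection right afterwards skips the body. BigFileHandler and content_type_via_get are hypothetical names, and the local server is an assumption that stands in for the real one.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY_SIZE = 10 * 1024 * 1024  # pretend the resource is 10 MiB


class BigFileHandler(BaseHTTPRequestHandler):
    """Hypothetical server that serves a large body on GET."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "image/png")
        self.send_header("Content-Length", str(BODY_SIZE))
        self.end_headers()
        try:
            for _ in range(BODY_SIZE // 1024):
                self.wfile.write(b"\0" * 1024)
        except ConnectionError:
            pass  # the client hung up early, which is the point of the demo

    def log_message(self, *args):
        pass  # keep the demo quiet


def content_type_via_get(host, port, path="/"):
    """Issue a GET, read only the headers, then drop the connection."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        response = conn.getresponse()  # parses status line and headers only
        return response.status, response.getheader("Content-Type")
    finally:
        conn.close()  # hang up without reading the body


server = HTTPServer(("127.0.0.1", 0), BigFileHandler)  # port 0: free port
threading.Thread(target=server.serve_forever, daemon=True).start()

status, content_type = content_type_via_get("127.0.0.1", server.server_port)
print(status, content_type)

server.shutdown()
server.server_close()
```

As the quoted answer points out, the server may already have pushed part of the body into the network buffers before it notices the disconnect, so some extra bytes can still be transferred.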

How do I report an error midway through a chunked http response without closing the connection?

I have an HTTP server that returns large bodies in response to POST requests (it is a SOAP server). These bodies are "streamed" via chunking. If I encounter an error midway through streaming the response how can I report that error to the client and still keep the connection open? The implementation uses a proprietary HTTP/SOAP stack so I am interested in answers at the HTTP protocol level.
Once the server has sent the status line (the very first line of the response) to the client, you can't change the status code of the response anymore. Many servers delay sending the response by buffering it internally until the buffer is full. While the buffer is filling up, you can still change your mind about the response.
If your client has access to the response headers, you could use the fact that chunked encoding allows the server to add a trailer with headers after the chunked-encoded body. So, your server, having encountered the error, could gracefully stop sending the body, and then send a trailer that sets some header to some value. Your client would then interpret the presence of this header as a sign that an error happened.
Also keep in mind that chunked responses can contain "footers" (also called trailers), which are just like HTTP headers. After failing, you can send a footer such as:
X-RealStatus: 500 Some bad stuff happened
Or if you succeed:
X-RealStatus: 200 OK
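At the protocol level, the footer comes after the terminating zero-length chunk. The following sketch builds such a body by hand to show the byte layout. encode_chunked is a hypothetical helper, not part of any HTTP stack. The response headers would also need Transfer-Encoding: chunked, and should announce the footer with Trailer: X-RealStatus.

```python
def encode_chunked(chunks, trailers=None):
    """Encode body chunks plus optional trailer fields as a chunked body."""
    out = bytearray()
    for chunk in chunks:
        if chunk:  # skip empty chunks: a zero size would end the body early
            out += f"{len(chunk):x}\r\n".encode("ascii")  # size in hex
            out += chunk + b"\r\n"
    out += b"0\r\n"  # the last chunk has size zero
    for name, value in (trailers or {}).items():
        out += f"{name}: {value}\r\n".encode("ascii")  # footer lines
    out += b"\r\n"  # an empty line ends the trailer section
    return bytes(out)


# The body was cut short by an error, so the footer reports the real outcome.
wire = encode_chunked(
    [b"hello", b"wor"],
    {"X-RealStatus": "500 Some bad stuff happened"},
)
print(wire.decode("ascii"))
```

Because the message still ends with a well-formed last chunk, the connection stays usable for the next request. Whether the client actually inspects the footer depends on its HTTP stack.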
You can change the status code as long as response.isCommitted() returns false.
(This is for HttpServletResponse in Java; I'm sure an equivalent exists in other languages.)
