Parsing a previously saved HTTP response

I have previously captured HTTP responses on disk (in the WARC format, to be specific) and need a way to parse them.
I need to be able to access the headers and their values, and optionally get the body if the headers fit my criteria.
A WARC file contains the full response, including the HTTP/1.1 status line, all headers and the body. I can find plenty of Clojure libraries that make HTTP requests, but I'm unable to use only the response-parsing part of such libraries. I'm a Clojure beginner, so I might be missing something.
The WARC parser I currently use gives me a stream containing the HTTP response, if that matters.
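Since the WARC reader already yields a stream, one option is to parse the status line and header block by hand and leave the body unread until the headers have been checked. The following is a minimal stdlib-only sketch in Ruby (not Clojure, and not any particular library's API); `ParsedResponse` and `parse_http_response` are hypothetical names. It assumes the stream is positioned at the status line, and it does not undo chunked transfer coding, Content-Encoding, or obsolete folded header lines.

```ruby
require "stringio"

# Status-line fields plus headers in arrival order. The body is left unread in
# `io`, so the caller can inspect headers before deciding whether to consume it.
ParsedResponse = Struct.new(:version, :status, :reason, :headers, :io) do
  # Case-insensitive lookup; returns every value of a repeated header.
  def header(name)
    headers.select { |key, _| key.casecmp?(name) }.map { |_, value| value }
  end

  # Reads the remaining bytes as the body. No de-chunking or decompression.
  def body
    io.read
  end
end

# Parses the status line and header block of a raw HTTP/1.x response from an IO
# positioned at the start of the response (e.g. a stream handed out by a WARC reader).
def parse_http_response(io)
  status_line = io.gets
  raise ArgumentError, "empty response stream" if status_line.nil?

  version, status, reason = status_line.chomp.split(" ", 3)
  headers = []
  while (line = io.gets)
    line = line.chomp          # strips "\r\n" or "\n"
    break if line.empty?       # the blank line separates headers from the body
    name, value = line.split(":", 2)
    headers << [name.strip, value.to_s.strip]
  end
  ParsedResponse.new(version, status.to_i, reason, headers, io)
end
```

Because `body` only reads when called, something like `resp.body if resp.header("Content-Type").first.to_s.start_with?("text/html")` skips the body for records that don't match. The same approach (read lines until the first empty one, then hand over the rest of the stream) carries over to any language that exposes the record as a byte stream.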

Related

What is gRPC trailers metadata used for?

I was looking through the gRPC documentation and found out that on the server side you are able to set metadata both in the form of headers and trailers.
Headers seem like the usual replacement for normal HTTP headers, with a key-value mapping. I don't see any need for trailers, since headers seem to serve a similar purpose. Or am I missing something here?
Trailers can be used for anything the server wishes to send to the client after processing the request. Typically this should be used for information common to all methods a service provides, for example, data about the load created by the RPC for metrics purposes.
gRPC uses HTTP trailers for two purposes.
First, it sends its final status (grpc-status) as a trailer header after the content has been sent.
When an application or runtime error occurs during an RPC a Status and Status-Message are delivered in Trailers.
For responses end-of-stream is indicated by the presence of the END_STREAM flag on the last received HEADERS frame that carries Trailers.
The second reason is to support streaming use cases. These use cases last much longer than normal HTTP requests. The HTTP trailer is used to carry the post-processing result of the request or the response. For example, if an error occurs while streaming data, you can send an error code in the trailer, which is not possible with headers, because they are sent before the message body.
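To make the ordering concrete, here is a small Ruby simulation (no real HTTP/2 framing) of a server-streaming RPC. Headers go out before the first message, each message uses gRPC's length-prefixed format (1-byte compressed flag plus a 4-byte big-endian length), and grpc-status is only decided after the last message, so it can only travel in the trailers. `grpc_frame` and `stream_rpc` are hypothetical helpers; status 13 is gRPC's INTERNAL code, and real servers also percent-encode grpc-message, which this sketch skips.

```ruby
# Frame one gRPC message: 1-byte compressed flag, 4-byte big-endian length, payload.
def grpc_frame(payload, compressed: false)
  [compressed ? 1 : 0, payload.bytesize].pack("CN") + payload.b
end

# Simulates a server-streaming RPC. The headers are fixed before any message is
# sent; the outcome is only known at the end, so it is reported in the trailers.
def stream_rpc(items)
  headers = { ":status" => "200", "content-type" => "application/grpc" }
  data = "".b
  begin
    items.each { |item| data << grpc_frame(yield(item)) }
    trailers = { "grpc-status" => "0" } # OK
  rescue StandardError => e
    # Messages already sent stay sent; the failure is reported afterwards.
    trailers = { "grpc-status" => "13", "grpc-message" => e.message } # 13 = INTERNAL
  end
  { headers: headers, data: data, trailers: trailers }
end
```

Note that the HTTP `:status` stays 200 even when the RPC fails halfway through: by the time the error happens the headers have long been sent, which is exactly the gap trailers fill.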
Source 1 2

HTTP status code for sending back just the meta-data not full data

I am looking for an appropriate HTTP status code that tells the receiver that just the metadata is being sent, not the complete data.
For example, say you do an HTTP GET:
GET /foo?meta_data_only=yes
The server won't look up the complete data; it will just send back some metadata about the endpoint, for example. Is there an HTTP status code for the response that can represent this? I would guess it's somewhere in the 200s or 300s?
Since your metadata is being returned in the headers, I would send a status code of 204 No Content.
https://httpstatuses.com/204
The 204 (No Content) status code indicates that the server has successfully fulfilled the request and that there is no additional content to send in the response payload body. Metadata in the response header fields refer to the target resource and its selected representation after the requested action was applied.
This sounds exactly like what you’re looking for: a successful response that contains no body, with metadata in the headers that provides additional information about the resource.
Another thing worth noting is that it’s common practice to use the HTTP method HEAD when you only want metadata. HEAD is identical to GET, except that the server does not send a body in the response. For example, if you send a HEAD request to an image URL, you will get a 200 OK response with the same headers a GET would return, such as Content-Type, Content-Length and maybe ETag, but you won’t be sent any of the file data. A lot of web servers (such as Nginx) support this behavior out of the box for static files. I would recommend that you stop using your query string parameter and instead implement HEAD versions of your endpoints. That would make the intention even clearer and more intuitive.
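Both options can be sketched as a Rack-style Ruby callable (an env hash in, `[status, headers, body_parts]` out), so no server is needed to try it. The resource body, the `representation_headers` helper and the ETag derivation are hypothetical. HEAD answers 200 with exactly the headers a GET would produce and an empty body; the query-parameter variant answers 204 and drops Content-Length, because a server must not send Content-Length in a 204 response.

```ruby
require "digest"
require "uri"

# Hypothetical resource body standing in for "the complete data" in the question.
RESOURCE_BODY = '{"id":42,"items":[1,2,3]}'

# Metadata describing the resource; identical for GET and HEAD.
def representation_headers(body)
  {
    "content-type"   => "application/json",
    "content-length" => body.bytesize.to_s,
    "etag"           => %("#{Digest::SHA256.hexdigest(body)[0, 16]}")
  }
end

# Rack-style app: takes an env hash, returns [status, headers, body_parts].
APP = lambda do |env|
  headers = representation_headers(RESOURCE_BODY)
  params = URI.decode_www_form(env["QUERY_STRING"].to_s).to_h
  case env["REQUEST_METHOD"]
  when "HEAD"
    [200, headers, []] # same status and headers as a GET, but no body
  when "GET"
    if params["meta_data_only"] == "yes"
      # A 204 must not carry Content-Length, so drop it and keep the rest.
      [204, headers.reject { |name, _| name == "content-length" }, []]
    else
      [200, headers, [RESOURCE_BODY]]
    end
  else
    [405, { "allow" => "GET, HEAD" }, []]
  end
end
```

In a real Rack stack, middleware such as `Rack::Head` can strip the body from HEAD responses generically, so the application only has to produce the GET response.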

HTTP Header Accept-Encoding

I have difficulty understanding how this header works.
Briefly, my question is:
Say I send a POST to a certain resource.
In the first case the response is a JSON string, and in the second case the response is a .jar file.
1. Should the client include Accept-Encoding: gzip, deflate in both cases when sending the HTTP request, knowing that the first one results in a JSON string?
2. What if the response is already zipped? Doesn't zipping the response over already zipped data create problems?
3. What happens if I include Accept-Encoding: gzip in the first case, where a JSON string is received? Do I receive zipped data as my response? (I am not even sure whether I get zipped data or some encoded data as the response. I think zipped data means something zipped like a .jar/.zip file, and encoded data means an encoding of the original data. Which one is happening, zipping or encoding?)
4. Let's say the server sends the response with the Content-Type header "application/octet-stream". Is it then required to use Accept-Encoding: gzip, deflate?
A client can use Accept-Encoding HTTP request header to tell the server that it can accept a compressed response.
The server can use the request header to decide if it should send a compressed response or not. It can ignore the header and always send a non-compressed response (possibly less efficient). It can ignore the header and always send a compressed response (risking giving a client a response it can't decode).
Should the client include Accept-Encoding: gzip, deflate in both cases
I can't think of any reason to not tell the server that a client can handle a compressed response (assuming that fact is true).
What if the response is already zipped? Doesn't zipping the response over already zipped data create problems?
It might be a waste of processor power for little or no saving in bytes.
That's not a reason for the client to say it can't handle a compressed response though. That's a decision to be made on the server.
What happens if I include Accept-Encoding: gzip in the first case, where a JSON string is received?
Then the client has told the server that a compressed response is acceptable.
Do I receive zipped data as my response?
The server might send a compressed response. It might ignore the header.
I am not even sure whether I get zipped data or some encoded data as the response
There isn't an "or" here.
The data is encoded using a compression algorithm.
Let's say the server sends the response with the Content-Type header "application/octet-stream"
That just means the server doesn't know what type of data it is sending. Instead of saying "This is JSON" or "This is a jar file" it is saying "I dunno what this is, it's just a stream of bytes to me".
Is it then required to use Accept-Encoding: gzip, deflate?
It doesn't make a difference.
The server can compress the data. It can send uncompressed data. It can use the Accept-Encoding request header to decide which of the two.
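On the client side, what matters is the Content-Encoding response header, not Content-Type: a JSON body or an application/octet-stream body is decoded the same way. The following is a minimal Ruby sketch of that step; `decode_content` is a hypothetical helper, and it assumes "deflate" means zlib-wrapped data (some servers send raw deflate, which this does not handle). Codings are listed in the order they were applied, so they are undone in reverse. In recent Ruby versions, Net::HTTP already sends an Accept-Encoding header and decodes gzip/deflate for you as long as you don't set that header yourself.

```ruby
require "zlib"

# Undo the content codings named in a Content-Encoding response header.
# The header lists codings in the order they were applied, so reverse them.
def decode_content(body, content_encoding)
  codings = content_encoding.to_s.split(",").map { |c| c.strip.downcase }
  codings.reverse.reduce(body) do |data, coding|
    case coding
    when "gzip", "x-gzip" then Zlib.gunzip(data)
    when "deflate"        then Zlib::Inflate.inflate(data) # assumes zlib-wrapped deflate
    when "identity", ""   then data
    else raise ArgumentError, "unsupported content coding: #{coding}"
    end
  end
end
```

A response that carries no Content-Encoding header passes through unchanged, which is the "server ignored Accept-Encoding" case from the answer above.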
Yes, why not? If the JSON payload is big, compressing it will make a lot of sense.
Compressing already-compressed data is just overhead.
You might receive gzipped data, not a ZIP file. You may want to read RFC 7230 and RFC 7231 for details.
The internet media type of the payload is completely independent of the content coding.

Sending SOAP XML manually and receiving an HTTP 500 error code and binary data in the response

I am using Ruby to send a SOAP request to a very enterprisey bla-bla service, so unfortunately I cannot attach any samples. There's nobody to send me any server-side logs, nobody knows what's wrong on the provider side or what the actual HTTP requests need to look like (except for a single XML example I got, with no HTTP headers), and the docs are very Microsoft-centric, with C# examples ("instantiate AbstractFactoryFactory..." and whatnot). Long live enterprise software.
The bottom line is that I eventually took one of their own XMLs from their logs and sent it to the endpoint from the WSDL using the Savon gem's raw XML option. I got an HTTP 500 error from their host and a bunch of non-ASCII binary data in the body; literally no ASCII characters are in it.
I guessed that maybe Savon does some bad magic, or that the XML option is not working as expected, so I tried sending the same request via Faraday, but got the same thing.
The HTTP response headers say it's an HTTP response, XML encoded, from an ASP.NET host:
"content-type"=>"text/xml; charset=utf-8",
"server"=>"Microsoft-IIS/7.5",
"x-aspnet-version"=>"2.0.50727",
"x-powered-by"=>"ASP.NET",
but again, 440 bytes' worth of binary data in the response:
method=:post,
body=
"\x1F\x8B\b\x00\x00...
etc.
Am I missing some weird aspect of the SOAP specification, and do I need to do something to decode this data? Or has their server gone bonkers from my XML, HTTP headers or something else, and I need to ping the provider?
Update 1
I noticed that their original XML had UTF-16 encoding set, so I tried encoding the raw string as UTF-16. Savon then spewed errors at me about bad data, so I updated the encoding in the Savon client config. But I still get an HTTP 500 error and binary data as the response, and if I try to log anything, Savon hits a bug:
Encoding::CompatibilityError: incompatible encoding regexp match (US-ASCII regexp with UTF-16 string)
from /home/bbozo/.rvm/gems/ruby-2.2.4/gems/savon-2.11.1/lib/savon/log_message.rb:13:in `to_s'
Faraday reported basically the same behavior: a binary blob.
Update 2
I tried converting the body from every known encoding and got nothing. Even though the HTTP headers imply the encoding is UTF-8, it obviously isn't:
Encoding.name_list.map{ |e_in| [ e_in, ( response.body.dup.force_encoding(e_in).encode('utf-8') rescue 'incompatible' ) ] }
There is nothing in the WSDL files that would indicate the encoding, and the API spec doesn't even mention encoding except that the request XMLs need to be UTF-8 encoded. I tried encoding the body and changing the XML encoding declaration and the HTTP headers, but I still get the same binary blob with the same leading bytes (\x1F\x8B\b\x00\x00), so it's not some weird encryption either.
Compression maybe?
I tried HTTPS for good measure, and nothing changed.
The response body was gzip-compressed! In the end I just gunzipped it, and there it was.
See: How to decompress Gzip string in ruby?
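The leading bytes `\x1F\x8B\b\x00` in the question are the gzip signature (`\x1F\x8B`) followed by compression method 8, deflate (`\b` is `\x08`). A defensive client can sniff for that signature before trying to read the XML. This Ruby sketch uses the stdlib `Zlib.gunzip` (Ruby 2.5+); `maybe_gunzip` is a hypothetical helper, and labelling the result with the charset from Content-Type (UTF-8 here) is an assumption about this particular service.

```ruby
require "zlib"

GZIP_MAGIC = "\x1F\x8B".b # first two bytes of every gzip stream

# Gunzips the body if it starts with the gzip magic bytes, then labels the
# result with the given charset; otherwise returns the body unchanged.
def maybe_gunzip(body, charset = "UTF-8")
  return body unless body.b.start_with?(GZIP_MAGIC)

  Zlib.gunzip(body).force_encoding(charset)
end
```

Sniffing the bytes covers servers that compress without saying so; when a Content-Encoding: gzip header is present, checking that header is the more reliable signal.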

HTTP POST: content-length header required?

I'm currently trying to optimize http-based data transfer between several applications.
Our current approach, downloading first and then creating the POST request, obviously adds extra IO/memory load and latency, which I'd like to avoid.
The core question of all:
Is it required to send a "Content-Length" header in HTTP POST requests?
IIRC, RFC 2616 declares that it's optional, but I'm not sure how applications actually behave here.
Depends what you mean by optional. If you mean that you can just omit the header any time you like, then no, it is not optional. The HTTP spec has very specific rules about when to use that header. There are other ways of sending the data if you don't know the length in advance, chunked transfer encoding for example.
See RFC 2616, section 4.4 (Message Length).
