I'm curious about the proper format of chunked data in comparison to the spec and what Twitter returns from their activity stream.
When using curl to try to get a Chunked stream from Twitter, curl reports:
~$ curl -v https://stream.twitter.com/1/statuses/sample.json?delimited=length -u ...:...
< HTTP/1.1 200 OK
< Content-Type: application/json
< Transfer-Encoding: chunked
<
1984
{"place":null,"text":...
1984
{"place":null,"text":...
1984
{"place":null,"text":...
I've written a chunked-data emitter based upon the Wikipedia info and the HTTP spec (essentially: chunk-size in hex, CRLF, chunk-data, CRLF), and my result looks like this:
~$ curl -vN http://localhost:7080/stream
< HTTP/1.1 200 OK
< Content-Type: application/json; charset=UTF-8
< Transfer-Encoding: chunked
<
{"foo":{"bar":...
{"foo":{"bar":...
{"foo":{"bar":...
The difference being that Twitter appears to include the length of the string as part of the body of the chunk, as an integer (in addition to the hex chunk-size value that must also be there), and I wanted to make sure that I wasn't missing something. The Twitter docs make no mention of the length value, it's not in their examples, nor do I see anything about it in the spec.
If your code does not emit length information then it is clearly incorrect. See http://greenbytes.de/tech/webdav/rfc2616.html#rfc.section.3.6.1.
RFC 2616, section 19.4.6: Introduction of Transfer-Encoding
A process for decoding the "chunked" transfer-coding (section 3.6) can be represented in pseudo-code as:
length := 0
read chunk-size, chunk-extension (if any) and CRLF
while (chunk-size > 0) {
    read chunk-data and CRLF
    append chunk-data to entity-body
    length := length + chunk-size
    read chunk-size and CRLF
}
read entity-header
while (entity-header not empty) {
    append entity-header to existing header fields
    read entity-header
}
Content-Length := length
Remove "chunked" from Transfer-Encoding
As the RFC says, the chunk-size is not appended to the entity-body, so it is normal that you cannot see the chunk-size. I have also read the source code of curl (function Curl_httpchunk_read) and confirmed that it skips the chunk-size and its CRLF, appending only the chunk-size bytes that follow to the body.
As for why Twitter's reply shows a length in the body: the request above uses delimited=length, which asks Twitter to prefix each message with its length in bytes, so those numbers are part of the chunk payload rather than the HTTP chunk-size.
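And for the emitting side, a minimal sketch of compliant framing in Python (illustrative only, not the asker's actual server code):

def encode_chunked(messages):
    """Frame an iterable of byte strings as HTTP/1.1 chunks."""
    for msg in messages:
        yield b"%x\r\n" % len(msg)  # chunk-size in hexadecimal, then CRLF
        yield msg + b"\r\n"         # chunk-data, then CRLF
    yield b"0\r\n\r\n"              # last-chunk plus empty trailer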
Related
I wanted to POST an image file as multipart/form-data with Vegeta.
But when I use this setup, it didn't work well. Given the image size, I expected the mean of 'Bytes In' in the Vegeta report to be over 20000, but it was just 55.00.
I run the command like this because it is on Windows PowerShell:
PS >vegeta attack -duration=10s -rate 100 -targets .\targets_formdata.txt -output output\results.bin
targets_formdata.txt
POST http://url/to/request
Content-Type: multipart/form-data; boundary=Boundary+1234
#body.txt
body.txt
--Boundary+1234
Content-Disposition: form-data; name="file"; filename="DvBp50cVYAEIfxd.jpg"
Content-Type: image/jpeg
I wrote --Boundary+1234 literally, exactly as it appears above; could that be the problem? I don't know what the real problem is.
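For reference, a minimal Python sketch of what a complete multipart/form-data body looks like; the blank line after the part headers and the trailing -- on the closing boundary are both required (boundary and filename copied from the question):

boundary = "Boundary+1234"
jpeg_bytes = open("DvBp50cVYAEIfxd.jpg", "rb").read()
body = (
    "--" + boundary + "\r\n"
    'Content-Disposition: form-data; name="file"; filename="DvBp50cVYAEIfxd.jpg"\r\n'
    "Content-Type: image/jpeg\r\n"
    "\r\n"                          # blank line separates part headers from part data
).encode() + jpeg_bytes + (
    "\r\n--" + boundary + "--\r\n"  # closing boundary ends with an extra "--"
).encode()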
I am trying to use the requests library to upload some files; the goal is to achieve this:
------WebKitFormBoundary61N9vqJ7380nh6iv
Content-Disposition: form-data; name="files"; filename="photo-2.jpeg"
Content-Type: image/jpeg
------WebKitFormBoundary61N9vqJ7380nh6iv
Content-Disposition: form-data; name="fileId"
b3duLWZpbGVzL2ZmZmZmZmZmYTQyNDVmODAvMjAxNTY*
------WebKitFormBoundary61N9vqJ7380nh6iv
Content-Disposition: form-data; name="extract"
false
------WebKitFormBoundary61N9vqJ7380nh6iv--
and now I have this:
${data}= Evaluate {'files': open("C:/testautomation/resources/Assets/photo-2.jpeg", 'r+b'), 'extract': (None, 'false'), 'fileId': (None, 'b3duLWZpbGVzL2ZmZmZmZmZmYTQyNDVmODAvMjAxNTY*')}
log ${data}
${result}= Post Request rest ${url} headers=${HEADERS} files=${data}
I THINK that the only bit I am missing is the "Content-Type: image/jpeg" from the first part, but how on earth can I add that? Currently the file gets uploaded, but it is not considered to be an image file.
The answer was:
${data}= Evaluate {'files': ('photo-1.jpeg', open("C:/testautomation-robot/resources/Assets/photo-1.jpeg", 'r+b'), 'image/jpeg'), 'extract': (None, 'false'), 'fileId': (None, 'b3duLWZpbGVzL2ZmZmZmZmZmYTQyNDVmODAvMjAxNTY*')}
Found an example from here: https://code.i-harness.com/en/q/bcfb9b
>>> url = 'http://httpbin.org/post'
>>> files = {'file': ('report.xls', open('report.xls', 'rb'), 'application/vnd.ms-excel', {'Expires': '0'})}
In the above, the tuple is composed as follows:
(filename, data, content_type, headers)
Create a Python resource and call it through Robot Framework.
Call it from Robot Framework with parameters:
upload multipart files post request ${headers} ${url} resources/files/upload_file/testfile1_upload.pdf
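A minimal sketch of what that Python resource might look like, assuming the requests library is used underneath; the function name mirrors the keyword above, and the PDF content type is an assumption:

import requests

def upload_multipart_files_post_request(headers, url, file_path):
    """POST a file as multipart/form-data with an explicit per-part content type."""
    with open(file_path, "rb") as f:
        # (filename, file object, content type) tuple, as in the requests example above
        files = {"files": (file_path.split("/")[-1], f, "application/pdf")}
        return requests.post(url, headers=headers, files=files)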
I am using the http module of the NodeMCU dev branch to make a GET request to the Google Calendar API. However, when I retrieve the events and parse the answer, I get strange characters due to improper encoding.
I tried to add Accept-Charset: utf-8 to the header of the request, but the request fails (code=-1).
Is there a way to set the charset, or to convert it afterwards in Lua?
function bdayshttps(curr_date)
    if (string.len(curr_date) == 10) then
        http.get("https://www.googleapis.com/calendar/v3/calendars/"..
                 "<CalendarID>/events"..
                 "?timeMax="..curr_date.."T23%3A59%3A59-00%3A00"..
                 "&timeMin="..curr_date.."T00%3A00%3A00-00%3A00&fields=items%2Fsummary"..
                 "&key=<Google Calendar API key>", "Accept-Charset: utf-8", function(code, data)
            if (code < 0) then
                print("msg:birthdays error")
            else
                if (code == 200) then
                    output = ""
                    for line in data:gmatch"\"summary\": \"[^\n]*" do
                        output = output..line:sub(13, line:len()-1)..";"
                    end
                    print("bday:"..output)
                end
            end
        end)
    end
end
For obvious reasons, I erased the calendarID and API key.
EDIT:
The result of this code returns msg:birthdays error, meaning the GET request returns code=-1.
When replacing the "Accept-Charset: utf-8" with nil in the header, I get:
LoÃ¯c Simonetti instead of Loïc Simonetti.
The API docs say that you need to append \r\n to every header you set. There's an example in the docs for http.post().
Hence, instead of "Accept-Charset: utf-8" you should set "Accept-Charset: utf-8\r\n".
For the second part of your question (now that the response is working), it appears you are receiving back valid UTF-8.
Loïc corresponds to the following UTF-8 code units: 4C 6F C3 AF 63.
LoÃ¯c has the following byte values in a common encoding (code page 1252): 4C 6F C3 AF 63.
So my guess is that the terminal or other device you're using to view the bytes is mangling things, as opposed to your request/response being incorrect. Per @Marcel's link above too, Lua does not handle Unicode at all, so this is about the best you can do (safely receive, store, and then send it back later).
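A quick Python check of that guess: encode the correct string as UTF-8, then misread the bytes as code page 1252, and the garbled form from the question appears:

text = "Loïc"
utf8_bytes = text.encode("utf-8")      # 4C 6F C3 AF 63
mangled = utf8_bytes.decode("cp1252")  # misinterpret the UTF-8 bytes as CP1252
print(mangled)                         # prints LoÃ¯c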
The accepted answer on this page says we should check HTTP server responses for both \r\n\r\n and \n\n as the sequence that separates the headers from the content.
Like:
HTTP/1.1 200 Ok\r\n
Server: AAA\r\n
Cache-Control: no-cache\r\n
Date: Fri, 07 Nov 2014 23:20:27 GMT\r\n
Content-Type: text/html\r\n
Connection: close\r\n\r\n <--------------
or:
HTTP/1.1 200 Ok\r\n
Server: AAA\r\n
Cache-Control: no-cache\r\n
Date: Fri, 07 Nov 2014 23:20:27 GMT\r\n
Content-Type: text/html\r\n
Connection: close\n\n <--------------
In all the responses I've seen in Wireshark, servers use \r\n\r\n.
Is it really necessary to check for both? Which servers/protocol versions would use \n\n?
I started off with \r\n\r\n but soon found some sites that used \n\n. And looking at some professional libraries like curl, they also handle \n\n even if it's not according to the standard.
I don't really know the curl code, but see for example here: https://github.com/curl/curl/blob/7a33c4dff985313f60f39fcde2f89d5aa43381c8/lib/http.c#L1406-L1413
/* find the end of the header line */
end = strchr(start, '\r'); /* lines end with CRLF */
if(!end) {
    /* in case there's a non-standard compliant line here */
    end = strchr(start, '\n');
    if(!end)
        /* hm, there's no line ending here, use the zero byte! */
        end = strchr(start, '\0');
}
Looking at that I think even \0\0 would be handled.
So:
If you want to handle "anything" out there, then yes.
If you want to strictly follow the standard, then no.
HTTP spec says:
The line terminator for message-header fields is the sequence CRLF. However, we recommend that applications, when parsing such headers, recognize a single LF as a line terminator and ignore the leading CR.
In practice I've never seen a web server use a bare CR line separator.
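As a sketch of the tolerant approach in Python (my own illustration; it accepts CRLFCRLF and bare LFLF, whichever comes first):

def split_response(raw: bytes):
    """Split a raw HTTP response into (header, body), tolerating bare-LF servers."""
    hits = [(raw.find(sep), sep) for sep in (b"\r\n\r\n", b"\n\n")]
    hits = [(i, sep) for i, sep in hits if i != -1]
    if not hits:
        return raw, b""            # no blank line yet: headers are incomplete
    idx, sep = min(hits)           # earliest terminator wins
    return raw[:idx], raw[idx + len(sep):]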
Using heritrix, I have crawled a site which contained some PDF files. The crawl log shows that the content type for the PDF link is "application/pdf", whereas the response record in the .warc file (crawl output) shows the content type as "application/http" as well as "application/pdf" (see the example below):
WARC/1.0^M
WARC-Type: response^M
WARC-Target-URI: `http://example.com/b/c/files/abc.pdf`^M
WARC-Date: 2014-05-29T10:48:03Z^M
WARC-Payload-Digest: sha1:JMRPMGSNIPHBPSBNPD2VJ2NIOGD75UUK^M
WARC-IP-Address: 86.36.67.50^M
WARC-Record-ID: <urn:uuid:00c8b80f-2851-42a1-a449-3cd9e238bfe9>^M
Content-Type: application/http; msgtype=response^M <--------------
Content-Length: 592173^M
WARC-Block-Digest: sha256:0a56d251257dbcbd6a54e19a528a56aae3e0c9e92a6702f4048e3b69bb3e0920^M
^M
HTTP/1.1 200 OK^M
Date: Thu, 29 May 2014 10:48:04 GMT^M
Server: Apache/2.4.4 (Unix) OpenSSL/0.9.7d PHP/5.3.12 mod_jk/1.2.35^M
Last-Modified: Wed, 20 Nov 2013 08:13:50 GMT^M
ETag: "90805-4eb975c6bcb80"^M
Accept-Ranges: bytes^M
Content-Length: 591877^M
Connection: close^M
Content-Type: application/pdf^M <--------------
followed by the content of the PDF file
I do not understand how this is happening. Can anyone please explain?
The WARC file contains:
First, the WARC-Header-Metadata, from the beginning to the first empty line. This header describes what follows, i.e. a full HTTP response, with header and content. Hence the Content-Type of application/http.
Then comes the HTTP-Response-Metadata. This header is the actual HTTP header and describes what follows, i.e. a PDF document.
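A small Python sketch makes the nesting concrete, pulling both Content-Type values out of a single response record (record framing simplified; names are my own):

def warc_content_types(record: bytes):
    """Return the WARC-level and HTTP-level Content-Type from a response record."""
    warc_header, _, rest = record.partition(b"\r\n\r\n")  # WARC header ends at the first blank line
    http_header, _, _ = rest.partition(b"\r\n\r\n")       # HTTP header ends at the next one
    def content_type(block):
        for line in block.split(b"\r\n"):
            if line.lower().startswith(b"content-type:"):
                return line.split(b":", 1)[1].strip().decode()
    # for the record above this returns
    # ("application/http; msgtype=response", "application/pdf")
    return content_type(warc_header), content_type(http_header)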