Most servers impose an HTTP request header length limit (typically 4k to 8k).
Usually we split long headers into several parts.
For Go's http package, I remember that it combines headers with the same key into one giant header. Is this correct?
For example, if I have a token whose length exceeds the 8k limit, I'd like to split it into several parts under the same header key, Authorization, and then send the request using the http package.
Does this split make sense or not?
Hmm, I'm not sure that's quite valid. The Header object is actually a map of string keys to string slices (map[string][]string):
https://golang.org/pkg/net/http/#Header
As such, if you Set the same key twice, the second value overwrites the first, as per standard Go map behavior. Add, by contrast, appends to the slice, and each value in the slice is written to the wire as its own header line rather than being combined into one.
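A minimal sketch of that distinction (Set replaces, Add appends):

package main

import (
    "fmt"
    "net/http"
)

func main() {
    // Set replaces any existing values for the key.
    h := http.Header{}
    h.Set("Authorization", "part1")
    h.Set("Authorization", "part2")
    fmt.Println(h.Values("Authorization")) // [part2]

    // Add appends; each value goes out as its own header line.
    h2 := http.Header{}
    h2.Add("Authorization", "part1")
    h2.Add("Authorization", "part2")
    fmt.Println(h2.Values("Authorization")) // [part1 part2]
}

Note, though, that recipients are allowed to fold repeated fields into one comma-separated value (RFC 7230 §3.2.2), so even with Add there's no guarantee a split Authorization token survives intact.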
How can sent and received bytes be counted from within a ServeHTTP function in Go?
The count needs to be relatively accurate. Skipping connection establishment is not ideal, but acceptable. But headers must be included.
It also needs to be fast; iterating over the headers is generally too slow.
The counting itself doesn't need to occur within ServeHTTP, as long as the count for a given connection can be made available to ServeHTTP.
This must also not break HTTPS or HTTP/2.
Things I've Tried
It's possible to get a rough, slow estimate of received bytes by iterating over the Request headers. This is far too slow, and the Go standard library removes and combines headers, so it's not accurate either.
I tried writing an intercepting Listener that creates an internal tls.Listen or net.Listen Listener. Its Accept() gets a net.Conn from the internal Listener's Accept(), then wraps that in an intercepting net.Conn whose Read and Write functions call the real net.Conn and count the bytes they read and write. Those counts can then be made available to the ServeHTTP function via mutexed shared variables.
The problem is, the intercepting Conn breaks HTTP/2, because Go's internal libraries cast the net.Conn as a *tls.Conn (e.g. https://golang.org/src/net/http/server.go#L1730), and it doesn't appear possible in Go to wrap the object while still making that cast succeed (if it is, it would solve this problem).
Counting sent bytes can be done relatively accurately by counting what is written to the ResponseWriter. Counting received bytes in the HTTP body is also achievable, via Request.Body. The critical issue here appears to be quickly and accurately counting request header bytes. Though again, also counting connection establishment bytes would be ideal.
Is this possible? How?
I think it is possible, but I can't say I've done it. However, based on browsing the stdlib implementation of the HTTP server and TLS listener, I don't see why it shouldn't be possible; the key is wrapping the connection before TLS instead of after. This also gets you a more accurate count of bytes on the wire, rather than a count of decrypted bytes.
You've already got an intercepting Listener, you just need to insert it in the right spot. Rather than passing your Listener to http.Serve (or wherever you're inserting it), you want to pass it to tls.NewListener first, which wraps it in the TLS handler, and then pass the result, which will be a TLS listener (making Go's HTTP/2 support happy) into the HTTP server.
Of course, if you want a count of decrypted bytes rather than wire bytes, you may be SOL - wrapping the net.Conn just won't get you there. You'll likely have to do the best you can with counting headers & body.
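A minimal sketch of that ordering, with aggregate per-listener counters (tying counts back to a specific connection, e.g. via a map keyed by RemoteAddr, is left out for brevity; cert.pem and key.pem are hypothetical paths):

package main

import (
    "crypto/tls"
    "log"
    "net"
    "net/http"
    "sync/atomic"
)

// countingConn wraps a net.Conn and tallies wire bytes read and written.
type countingConn struct {
    net.Conn
    read, written *int64
}

func (c countingConn) Read(p []byte) (int, error) {
    n, err := c.Conn.Read(p)
    atomic.AddInt64(c.read, int64(n))
    return n, err
}

func (c countingConn) Write(p []byte) (int, error) {
    n, err := c.Conn.Write(p)
    atomic.AddInt64(c.written, int64(n))
    return n, err
}

// countingListener wraps each accepted conn *before* TLS sees it.
type countingListener struct {
    net.Listener
    read, written int64
}

func (l *countingListener) Accept() (net.Conn, error) {
    conn, err := l.Listener.Accept()
    if err != nil {
        return nil, err
    }
    return countingConn{Conn: conn, read: &l.read, written: &l.written}, nil
}

func main() {
    inner, err := net.Listen("tcp", ":8443")
    if err != nil {
        log.Fatal(err)
    }
    counted := &countingListener{Listener: inner}

    cert, err := tls.LoadX509KeyPair("cert.pem", "key.pem")
    if err != nil {
        log.Fatal(err)
    }
    // Wrap the counting listener in TLS, so the HTTP server still sees
    // *tls.Conn connections and its HTTP/2 support stays happy.
    tlsLn := tls.NewListener(counted, &tls.Config{
        Certificates: []tls.Certificate{cert},
        NextProtos:   []string{"h2", "http/1.1"},
    })

    log.Fatal((&http.Server{}).Serve(tlsLn)) // nil Handler: DefaultServeMux
}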
I would like to generate a multipart byte range response. Is there a way to do this without scanning each segment I am about to send out, given that I need to generate the multipart boundary strings?
For example, a user can request a byte range that would have me fetch and scan 2 GB of data, which in my case involves loading that data into my (slow) VM as strings and so forth. Ideally, I would like to simply state in the response that a part has a certain length in bytes, and be done with it. Is there any tooling that could provide me with this option? I see that many developers just grab a UUID as the boundary, apparently willing to risk the tiny probability that it appears somewhere within a part; is that risk really small enough that so many people can take it?
To explain in more detail: scanning the parts ahead of time (before generating the response) is not really feasible in my case, since I need to fetch them via HTTP from an upstream service. That means I would effectively have to prefetch each entire part just to compute a non-matching multipart boundary, and only then could I splice the part into the response.
Assuming the data can be arbitrary, I don’t see how you could guarantee absence of collisions without scanning the data.
If the format of the data is very limited (like... base 64 encoded?), you may be able to pick a boundary that is known to be an illegal sequence of bytes in that format.
Even if your boundary does collide with the data, it must be followed by headers such as Content-Range, which is even more improbable, so the client is likely to treat it as an error rather than consume the wrong data.
Major Web servers use very simple strategies. Apache grabs 8 random bytes at startup and renders them in hexadecimal. nginx uses a sequential counter left-padded with zeroes.
UUIDs are designed to avoid collisions with other UUIDs, not with arbitrary data. A UUID is no more likely to be a good boundary than a completely random string of the same length. Moreover, some UUID variants include information that you may not want to disclose, such as your machine’s MAC address.
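For illustration, the Apache-style strategy is only a few lines of Go (a sketch of the idea, not Apache's actual code):

package main

import (
    "crypto/rand"
    "encoding/hex"
    "fmt"
    "log"
)

func main() {
    // 8 random bytes, rendered as hex: the Apache-style boundary.
    var b [8]byte
    if _, err := rand.Read(b[:]); err != nil {
        log.Fatal(err)
    }
    fmt.Println("boundary:", hex.EncodeToString(b[:])) // e.g. 3af1c2d400b7e9aa
}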
Ideally I would like to simply state in the response that a part has a length of a certain number of bytes, and be done with it. Is there any tooling that could provide me with this option?
Maybe you can avoid supporting multiple ranges and simply tell the clients to request each range separately. In that case, you don’t use the multipart format, so there is no problem.
If you do want to send multiple ranges in one response, then RFC 7233 requires the multipart format, which requires the boundary string.
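If you do use the multipart format from Go, note that mime/multipart already generates a long random boundary for you (30 random bytes, hex-encoded), and you can stream each part straight through without scanning it. A sketch; byteRange and fetchRange are assumptions standing in for your range bookkeeping and the upstream fetch:

package main

import (
    "fmt"
    "io"
    "log"
    "mime/multipart"
    "net/http"
    "net/textproto"
)

type byteRange struct{ start, end int64 }

// fetchRange is hypothetical: it would issue a Range request upstream
// and return the part body as a stream.
func fetchRange(r byteRange) (io.ReadCloser, error) {
    panic("sketch only")
}

func serveRanges(w http.ResponseWriter, ranges []byteRange, total int64) error {
    mw := multipart.NewWriter(w)
    w.Header().Set("Content-Type", "multipart/byteranges; boundary="+mw.Boundary())
    w.WriteHeader(http.StatusPartialContent)
    for _, rng := range ranges {
        part, err := mw.CreatePart(textproto.MIMEHeader{
            "Content-Type":  {"application/octet-stream"},
            "Content-Range": {fmt.Sprintf("bytes %d-%d/%d", rng.start, rng.end, total)},
        })
        if err != nil {
            return err
        }
        body, err := fetchRange(rng)
        if err != nil {
            return err
        }
        _, err = io.Copy(part, body) // streamed, never scanned
        body.Close()
        if err != nil {
            return err
        }
    }
    return mw.Close()
}

func main() {
    http.HandleFunc("/blob", func(w http.ResponseWriter, r *http.Request) {
        // Ranges would normally be parsed from the Range header; hardcoded here.
        _ = serveRanges(w, []byteRange{{0, 99}, {200, 299}}, 1000)
    })
    log.Fatal(http.ListenAndServe(":8080", nil))
}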
You can, of course, invent your own mechanism instead of that of RFC 7233. In that case:
You cannot use 206 (Partial Content). You must use 200 (OK) or some other applicable status code.
You cannot use the multipart/byteranges media type. You must come up with your own media type.
You cannot use the Range request header.
Because a 200 (OK) response to a GET request is supposed to carry a (full) representation of the resource, you must do one of the following:
encode the requested ranges in the URL; or
use something like POST instead of GET; or
use a custom, non-standard status code instead of 200 (OK); or
(not sure if this is a correct approach) use media type parameters, send them in Accept, and add Accept to Vary.
The chunked transfer coding may be useful, but you cannot rely on it alone, because it is a property of the connection, not of the payload.
RFC 2616 (HTTP/1.1):
A response to a request for a single range MUST NOT be sent using the multipart/byteranges media type.
A response to a request for multiple ranges, whose result is a single range, MAY be sent as a multipart/byteranges media type with one part.
A client that cannot decode a multipart/byteranges message MUST NOT ask for multiple byte-ranges in a single request.
If I understand this correctly, multiple ranges in a single request MAY use multipart/byteranges and clients MUST be able to decode it or shouldn't request it at all.
Does the "MAY" imply that there are also alternatives to multipart/byteranges that could be used? Do any exist? If so, are there headers to request them?
For example, could a server potentially concatenate all byte ranges into a single part response?
If a request asks for multiple ranges and the server can concatenate the requested ranges into a single continuous range, then the response can either:
use multipart/byteranges with a single MIME part for the concatenated range, where the part has its own Content-Range header; or
send the concatenated data by itself and include a top-level Content-Range header.
Based on my experience back in 2012, I would recommend sticking to the first rule quoted above, i.e. "A response to a request for a single range MUST NOT be sent using the multipart/byteranges media type," because some clients will choke on a single-part multipart response.
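Incidentally, if you're serving this from Go, http.ServeContent implements the RFC 7233 logic for you: a single range gets a plain 206 with a top-level Content-Range, and multiple ranges get multipart/byteranges. A minimal sketch (data.bin is a hypothetical file):

package main

import (
    "log"
    "net/http"
    "os"
)

func main() {
    http.HandleFunc("/data", func(w http.ResponseWriter, r *http.Request) {
        f, err := os.Open("data.bin")
        if err != nil {
            http.Error(w, "not found", http.StatusNotFound)
            return
        }
        defer f.Close()
        fi, err := f.Stat()
        if err != nil {
            http.Error(w, "stat failed", http.StatusInternalServerError)
            return
        }
        // ServeContent parses Range/If-Range and writes 206, Content-Range,
        // and multipart/byteranges as appropriate.
        http.ServeContent(w, r, fi.Name(), fi.ModTime(), f)
    })
    log.Fatal(http.ListenAndServe(":8080", nil))
}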
I'm writing a Web service that is going to use HMAC for authentication. Quick overview: an HMAC is a message digest calculated using the body of a message along with a secret key. The sender calculates the HMAC and attaches it to the request, then the receiver calculates the message digest on receipt using the secret key, which it has on file. If the digests are the same, then the receiver can be sure that the message was sent by the person who they claim to be.
My question is about the parameter order. Let's say the Web service request has three parameters, foo, bar and baz. The body of the HTTP POST will look something like:
foo=1&bar=2&baz=3&hmac=de7c9b85b8b78aa6bc8a7a36f70a90701c9db4d9
(The HMAC in this case is a fake example.)
Normally HTTP parameter order is not significant, but when it comes to calculating the hash, it is. Should the server take the raw incoming request body, drop the "hmac" parameter (which is, of course, not part of the hash calculation), and hash what remains? Or should there be an agreed-upon order of parameters that must be followed for the hash to be calculated correctly?
The former approach puts a bit more of a burden on the implementor on the server side, but it's more robust. What I'm really asking about is the expectations of developers building things on the client side: do they expect that things will just work regardless of parameter order?
I would say that manipulating the body of the request after you have calculated a hash over that body, a hash that determines whether the request is accepted, is generally bad practice (for reasons that, I feel, are obvious). The HMAC should not be appended to the request body, but sent as either a GET parameter, a cookie, or a custom header.
This also reduces the burden on the server-side implementor relative to your first suggestion, and it is the path I would recommend.
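A minimal sketch of that header-based approach in Go, assuming HMAC-SHA256 and a hypothetical X-Signature header; because the server hashes the raw body bytes exactly as received, parameter order never matters:

package main

import (
    "bytes"
    "crypto/hmac"
    "crypto/sha256"
    "encoding/hex"
    "fmt"
    "io"
    "log"
    "net/http"
)

var secret = []byte("shared-secret") // assumed to be provisioned out of band

func sign(body []byte) string {
    mac := hmac.New(sha256.New, secret)
    mac.Write(body)
    return hex.EncodeToString(mac.Sum(nil))
}

// Client side: sign the raw body and carry the digest in a custom header.
func newSignedRequest(url string, body []byte) (*http.Request, error) {
    req, err := http.NewRequest("POST", url, bytes.NewReader(body))
    if err != nil {
        return nil, err
    }
    req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
    req.Header.Set("X-Signature", sign(body)) // hypothetical header name
    return req, nil
}

// Server side: hash the body exactly as received; compare in constant time.
func verify(r *http.Request) (bool, error) {
    body, err := io.ReadAll(r.Body)
    if err != nil {
        return false, err
    }
    expected := sign(body)
    return hmac.Equal([]byte(expected), []byte(r.Header.Get("X-Signature"))), nil
}

func main() {
    req, err := newSignedRequest("https://example.com/api", []byte("foo=1&bar=2&baz=3")) // hypothetical endpoint
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(req.Header.Get("X-Signature"))
}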
But that's me, others may have differing opinions on all of this...
In HTTP there are two ways to POST data: application/x-www-form-urlencoded and multipart/form-data. I understand that most browsers are only able to upload files if multipart/form-data is used. Is there any additional guidance when to use one of the encoding types in an API context (no browser involved)? This might e.g. be based on:
data size
existence of non-ASCII characters
existence of (unencoded) binary data
the need to transfer additional data (like filename)
I basically found no formal guidance on the web regarding the use of the different content-types so far.
TL;DR
If you have binary (non-alphanumeric) data (or a significantly sized payload) to transmit, use multipart/form-data. Otherwise, use application/x-www-form-urlencoded.
The MIME types you mention are the two Content-Type headers for HTTP POST requests that user-agents (browsers) must support. The purpose of both of those types of requests is to send a list of name/value pairs to the server. Depending on the type and amount of data being transmitted, one of the methods will be more efficient than the other. To understand why, you have to look at what each is doing under the covers.
For application/x-www-form-urlencoded, the body of the HTTP message sent to the server is essentially one giant query string -- name/value pairs are separated by the ampersand (&), and names are separated from values by the equals symbol (=). An example of this would be:
MyVariableOne=ValueOne&MyVariableTwo=ValueTwo
According to the specification:
[Reserved and] non-alphanumeric characters are replaced by `%HH', a percent sign and two hexadecimal digits representing the ASCII code of the character
That means that for each non-alphanumeric byte that exists in one of our values, it's going to take three bytes to represent it. For large binary files, tripling the payload is going to be highly inefficient.
That's where multipart/form-data comes in. With this method of transmitting name/value pairs, each pair is represented as a "part" in a MIME message (as described by other answers). Parts are separated by a particular string boundary (chosen specifically so that this boundary string does not occur in any of the "value" payloads). Each part has its own set of MIME headers like Content-Type, and particularly Content-Disposition, which can give each part its "name." The value piece of each name/value pair is the payload of each part of the MIME message. The MIME spec gives us more options when representing the value payload -- we can choose a more efficient encoding of binary data to save bandwidth (e.g. base 64 or even raw binary).
Why not use multipart/form-data all the time? For short alphanumeric values (like most web forms), the overhead of adding all of the MIME headers is going to significantly outweigh any savings from more efficient binary encoding.
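A short Go sketch of both encodings (the endpoint and photo.jpg are assumptions):

package main

import (
    "bytes"
    "io"
    "log"
    "mime/multipart"
    "net/http"
    "net/url"
    "os"
)

func main() {
    endpoint := "https://example.com/submit" // hypothetical endpoint

    // application/x-www-form-urlencoded: one query-string-like body.
    form := url.Values{"MyVariableOne": {"ValueOne"}, "MyVariableTwo": {"ValueTwo"}}
    if _, err := http.PostForm(endpoint, form); err != nil {
        log.Fatal(err)
    }

    // multipart/form-data: one MIME part per field, raw bytes allowed.
    file, err := os.Open("photo.jpg")
    if err != nil {
        log.Fatal(err)
    }
    defer file.Close()

    var buf bytes.Buffer
    mw := multipart.NewWriter(&buf)
    mw.WriteField("MyVariableOne", "ValueOne")
    fw, _ := mw.CreateFormFile("upload", "photo.jpg") // the part carries the filename
    io.Copy(fw, file)
    mw.Close()

    if _, err := http.Post(endpoint, mw.FormDataContentType(), &buf); err != nil {
        log.Fatal(err)
    }
}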
READ AT LEAST THE FIRST PARA HERE!
I know this is 3 years too late, but Matt's (accepted) answer is incomplete and will eventually get you into trouble. The key here is that, if you choose to use multipart/form-data, the boundary must not appear in the file data that the server eventually receives.
This is not a problem for application/x-www-form-urlencoded, because there is no boundary. x-www-form-urlencoded can also always handle binary data, by the simple expedient of turning one arbitrary byte into three 7-bit bytes. Inefficient, but it works (and note that the comment about not being able to send filenames as well as binary data is incorrect; you just send the filename as another key/value pair).
The problem with multipart/form-data is that the boundary separator must not be present in the file data (see RFC 2388; section 5.2 also includes a rather lame excuse for not having a proper aggregate MIME type that avoids this problem).
So, at first sight, multipart/form-data is of no value whatsoever in any file upload, binary or otherwise. If you don't choose your boundary correctly, then you will eventually have a problem, whether you're sending plain text or raw binary - the server will find a boundary in the wrong place, and your file will be truncated, or the POST will fail.
The key is to choose an encoding and a boundary such that your selected boundary characters cannot appear in the encoded output. One simple solution is to use base64 (do not use raw binary). In base64, 3 arbitrary bytes are encoded into four 7-bit characters, where the output character set is [A-Za-z0-9+/=] (i.e. alphanumerics, '+', '/' or '='). '=' is a special case, and may only appear at the end of the encoded output, as a single = or a double ==.
Now choose your boundary as a 7-bit ASCII string that cannot appear in base64 output. Many choices you see on the net fail this test: the MDN forms docs, for example, use "blob" as a boundary when sending binary data, which is not good. Something like "!blob!", however, will never appear in base64 output.
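In Go's mime/multipart you can pin the boundary yourself. Note that SetBoundary enforces RFC 2046's allowed boundary characters, which exclude '!', so this sketch picks legal characters ('(', ')', ':') that are still outside the base64 alphabet and therefore can never collide with the payload:

package main

import (
    "bytes"
    "encoding/base64"
    "io"
    "log"
    "mime/multipart"
    "strings"
)

func main() {
    var buf bytes.Buffer
    mw := multipart.NewWriter(&buf)

    // '(', ')' and ':' are legal boundary characters per RFC 2046 and are
    // outside the base64 alphabet, so this boundary cannot appear in a
    // base64-encoded part body.
    if err := mw.SetBoundary("(base64:boundary)"); err != nil {
        log.Fatal(err)
    }

    fw, err := mw.CreateFormField("file")
    if err != nil {
        log.Fatal(err)
    }
    enc := base64.NewEncoder(base64.StdEncoding, fw)
    // Stand-in for the raw binary source.
    if _, err := io.Copy(enc, strings.NewReader("\x00\x01\x02 raw bytes")); err != nil {
        log.Fatal(err)
    }
    enc.Close()
    mw.Close()

    log.Println(buf.String())
}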
I don't think HTTP POST is limited to multipart or x-www-form-urlencoded. The Content-Type header is orthogonal to the HTTP POST method (you can use whatever MIME type suits you). This is also the case for typical HTML-representation-based webapps (e.g. JSON payloads have become very popular for transmitting ajax request payloads).
Regarding RESTful APIs over HTTP, the most popular content types I have come in touch with are application/xml and application/json.
application/xml:
data size: XML is very verbose, but usually not an issue when using compression, and considering that the write-access case (e.g. through POST or PUT) is much rarer than read access (in many cases under 3% of all traffic). There were rarely cases where I had to optimize write performance.
existence of non-ASCII chars: you can use UTF-8 as the encoding in XML
existence of binary data: you would need to use base64 encoding
filename data: you can encapsulate this in a field inside the XML
application/json
data size: more compact than XML, still text, but compressible
non-ASCII chars: JSON is UTF-8
binary data: base64 (also see json-binary-question)
filename data: encapsulate as its own field section inside the JSON
binary data as its own resource
I would try to represent binary data as its own asset/resource. It adds another call but decouples things better. Example with images:
POST /images
Content-type: multipart/mixed; boundary="xxxx"
... multipart data
201 Created
Location: http://imageserver.org/../foo.jpg
In later resources you could simply reference the binary resource as a link:
<main-resource>
...
<link href="http://imageserver.org/../foo.jpg"/>
</main-resource>
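A client-side sketch of that two-step pattern in Go, simplified to a raw image/jpeg upload rather than the multipart/mixed shown above (URLs and foo.jpg are the hypothetical ones from the example):

package main

import (
    "log"
    "net/http"
    "os"
)

func main() {
    img, err := os.Open("foo.jpg")
    if err != nil {
        log.Fatal(err)
    }
    defer img.Close()

    // Step 1: POST the binary as its own resource.
    resp, err := http.Post("http://imageserver.org/images", "image/jpeg", img)
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()

    // Step 2: the Location header is what later resources link to.
    log.Println("created:", resp.Header.Get("Location"))
}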
I agree with much of what Manuel has said. In fact, his comments refer to this URL...
http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4
... which states:
The content type "application/x-www-form-urlencoded" is inefficient for sending large quantities of binary data or text containing non-ASCII characters. The content type "multipart/form-data" should be used for submitting forms that contain files, non-ASCII data, and binary data.
However, for me it would come down to tool/framework support.
What tools and frameworks do you expect your API users to be building their apps with?
Do they have frameworks or components they can use that favour one method over the other?
If you get a clear idea of your users, and of how they'll make use of your API, that will help you decide. If you make file uploads hard for your API users, they'll move away, or you'll spend a lot of time supporting them.
Secondary to this would be the tool support YOU have for writing your API, and how easy it is for you to accommodate one upload mechanism over the other.
Just a little hint from my side for uploading HTML5 canvas image data:
I am working on a project for a print shop and had some problems uploading images to the server that came from an HTML5 canvas element. I struggled for at least an hour and could not get the image to save correctly on my server.
Once I set the contentType option of my jQuery ajax call to application/x-www-form-urlencoded, everything went the right way: the base64-encoded data was interpreted correctly and successfully saved as an image.
Maybe that helps someone!
If you need to use Content-Type: application/x-www-form-urlencoded, then DO NOT use FormDataCollection as the parameter: in ASP.NET Core 2+, FormDataCollection has no default constructor, which the formatters require. Use IFormCollection instead:
public IActionResult Search([FromForm]IFormCollection type)
{
return Ok();
}
In my case the issue was that the response content type was application/x-www-form-urlencoded, but the body of the request actually contained JSON. When we access request.data in Django, it cannot properly convert it, so access request.body instead.
Refer to this answer for a better understanding:
Exception: You cannot access body after reading from request's data stream