Foreign characters when sending an image file over HTTP via multipart/form-data

I'm trying to understand HTTP requests. When I send a string using the POST method, the request body contains the value like this:
--------------------------d74496d66958873e
Content-Disposition: form-data; name="person"
anonymous
--------------------------d74496d66958873e
But when I send a file using the POST method, it looks like this:
--------------------------d74496d66958873e
Content-Disposition: form-data; name="fileToUpload"; filename="icon.png"
Content-Type: image/png
-O9†q#ë#ÞÿËà3l†v}uá#t(<‡c3f
úS©59ñõCáa#Ž¡#Za%ð.ž zxý˜F#ZqÄð&^
jx[1…ÕЊËÂ$Æ‚#Þ
--------------------------d74496d66958873e
My question is:
What are the foreign characters between the --------------------------d74496d66958873e boundaries when we send the file?
I mean:
-O9†q#ë#ÞÿËà3l†v}uá#t(<‡c3f
úS©59ñõCáa#Ž¡#Za%ð.ž zxý˜F#ZqÄð&^
jx[1…ÕЊËÂ$Æ‚#Þ
Are these characters binary, hex, base64, or something else?
How do I turn an image file into these characters when I want to write the HTTP request manually in a programming language?

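Those are plain binary bytes: all the bytes that the file icon.png contained when it was sent by the HTTP client. They look like foreign characters only because the tool you are using displays the raw bytes as text.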
The format is described in RFC 1867 (the current definition of multipart/form-data is RFC 7578) and basically works like this:
--[boundary]
[headers]
[empty line]
[N bytes]
--[boundary]
[headers]
[empty line]
[M bytes]
--[boundary]--
To extract the contents, you'll need to parse past the boundary, parse the headers (0, 1 or many), skip the empty line that ends them, and then read the binary data until you reach the next boundary. (The final boundary has two extra dashes on the right.)
There can be any number of such parts in a single multipart POST.
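The parsing steps described in this answer can be sketched in Python roughly as follows. This is a minimal illustration, not a complete implementation: the function name parse_multipart is made up for this example, the boundary is assumed to be known already (normally it comes from the request's Content-Type header), and line endings are assumed to be CRLF.

```python
def parse_multipart(body: bytes, boundary: bytes):
    """Split a multipart body into a list of (headers, content) pairs.

    Hypothetical helper for illustration; assumes CRLF line endings and
    that the boundary never occurs inside any part's content.
    """
    delimiter = b"\r\n--" + boundary
    # Prepend CRLF so the very first delimiter matches the same pattern.
    chunks = (b"\r\n" + body).split(delimiter)
    parts = []
    for chunk in chunks[1:]:          # chunks[0] is the (usually empty) preamble
        if chunk.startswith(b"--"):   # "--boundary--" closes the message
            break
        # Headers end at the first empty line; everything after is raw bytes.
        header_block, _, content = chunk.partition(b"\r\n\r\n")
        headers = {}
        for line in header_block.split(b"\r\n"):
            if line:
                name, _, value = line.partition(b":")
                key = name.strip().lower().decode("ascii")
                headers[key] = value.strip().decode("utf-8", "replace")
        parts.append((headers, content))
    return parts


# Example input: one text field and one (tiny, fake) binary file.
fake_png = b"\x89PNG\r\n\x1a\n\x00\xff"
example_body = (
    b"--B\r\n"
    b'Content-Disposition: form-data; name="person"\r\n\r\n'
    b"anonymous\r\n"
    b"--B\r\n"
    b'Content-Disposition: form-data; name="fileToUpload"; filename="icon.png"\r\n'
    b"Content-Type: image/png\r\n\r\n" + fake_png + b"\r\n--B--\r\n"
)
example_parts = parse_multipart(example_body, b"B")
```

The file content comes back as the same bytes that went in; no hex or base64 step is involved.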

Related

How does Servlet HttpServletResponse::setCharacterEncoding() work?

I have learned that in general, Java uses UTF-16 as the internal String representation.
My question is what actually happens when composing a response in Java and applying different char encoding, e.g. response.setCharacterEncoding("ISO-8859-1").
Does it actually convert the response's body bytes from UTF-16 to ISO-8859-1, or does it just add some metadata to the response object?
I'm assuming you're talking about a class that works along the lines of HttpServletResponse. If that's the case, then yes, it changes the body of the response, if you call getWriter. The writer it returns has to convert any strings written to it into bytes, and it uses that encoding to do so.
If you've set the content type, then setting the character encoding will also make that information available via the Content-Type header. As per the ServletResponse docs:
Calling setContentType(java.lang.String) with the String of text/html and calling this method with the String of UTF-8 is equivalent with calling setContentType with the String of text/html; charset=UTF-8.
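To make the "strings into bytes" step concrete, here is a small Python sketch (Python rather than Java, purely for illustration; encode_body is a made-up name, not part of any servlet API). It shows that the chosen charset changes both the header value and the bytes that go on the wire.

```python
def encode_body(text: str, charset: str, media_type: str = "text/html"):
    """Return (Content-Type header value, encoded body bytes).

    Hypothetical helper: mimics what a response writer does when it
    converts characters to bytes with the configured encoding.
    """
    header = f"{media_type}; charset={charset}"
    return header, text.encode(charset)


latin1_header, latin1_bytes = encode_body("café", "ISO-8859-1")
utf8_header, utf8_bytes = encode_body("café", "UTF-8")
# "é" is one byte (0xE9) in ISO-8859-1 and two bytes (0xC3 0xA9) in UTF-8.
```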

HTTP multirange requests - headers in response

I'm using multirange http requests like this
"curl --range 1-2,2-3 http://some.url"
The response is like
--00000000000000030705
Content-Type: text/html; charset=utf-8
Content-Range: bytes 1-2/13882393
il
--00000000000000030705
Content-Type: text/html; charset=utf-8
Content-Range: bytes 2-3/13882393
le
--00000000000000030705--
How can I remove the Content-Type and Content-Range fields from the response to get the raw data from the server (without parsing on the client side)?
I want to get response like:
"ille"
Thanks a lot!
You probably can't. The server is conforming to the spec, as described by the RFC (RFC 7233, section 4.1).
If multiple parts are being transferred, the server generating the 206 response must generate a "multipart/byteranges" payload, as defined in Appendix A, and a Content-Type header field containing the multipart/byteranges media type and its required boundary parameter. To avoid confusion with single-part responses, a server must not generate a Content-Range header field in the HTTP header section of a multiple part response (this field will be sent in each part instead).
For overlapping or nearly adjacent ranges, the server may coalesce them and send the response without the multipart boundaries, but this is optional.
When multiple ranges are requested, a server may coalesce any of the ranges that overlap, or that are separated by a gap that is smaller than the overhead of sending multiple parts, regardless of the order in which the corresponding byte-range-spec appeared in the received Range header field. Since the typical overhead between parts of a multipart/byteranges payload is around 80 bytes, depending on the selected representation's media type and the chosen boundary parameter length, it can be less efficient to transfer many small disjoint parts than it is to transfer the entire selected representation.
Within the header area of each body part in the multipart payload, the server must generate a Content-Range header field corresponding to the range being enclosed in that body part. If the selected representation would have had a Content-Type header field in a 200 (OK) response, the server should generate that same Content-Type field in the header area of each body part. For example:
Assuming your server conforms to the spec, requesting the single range 1-3 instead will get you a single body.
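If you do have to keep the multi-range request, the client-side parsing is small. The following Python sketch is only an illustration under assumptions: split_byteranges is a hypothetical helper, the boundary is taken as already extracted from the response's Content-Type header, and line endings are assumed to be CRLF.

```python
import re

# Matches e.g. "Content-Range: bytes 1-2/13882393" inside a part's headers.
RANGE_RE = re.compile(rb"content-range:\s*bytes\s+(\d+)-(\d+)/", re.IGNORECASE)


def split_byteranges(body: bytes, boundary: bytes):
    """Return a list of (first, last, data) for a multipart/byteranges body."""
    delimiter = b"\r\n--" + boundary
    parts = []
    for chunk in (b"\r\n" + body).split(delimiter)[1:]:
        if chunk.startswith(b"--"):      # closing "--boundary--"
            break
        header_block, _, data = chunk.partition(b"\r\n\r\n")
        match = RANGE_RE.search(header_block)
        first, last = (int(match.group(1)), int(match.group(2))) if match else (None, None)
        parts.append((first, last, data))
    return parts


example_response = (
    b"--00000000000000030705\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Range: bytes 1-2/13882393\r\n\r\n"
    b"il\r\n"
    b"--00000000000000030705\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Range: bytes 2-3/13882393\r\n\r\n"
    b"le\r\n"
    b"--00000000000000030705--\r\n"
)
ranges = split_byteranges(example_response, b"00000000000000030705")
raw = b"".join(data for _, _, data in ranges)
```

Joining the part bodies in order gives the raw data the question asks for.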

HTTP post multipart-form data length format?

In an HTTP POST multipart/form-data stream, what do the long ------------- lines mean? What is the hex-looking value at the end of these lines? Can you figure out the length of the variables from them? Or is this a specially designed sequence so you can find the break between variables?
-----------------------------7dc34719970524
Content-Disposition: form-data; name="my variable"
blah content here
-----------------------------7dc34719970524
Content-Disposition: form-data; name="asdfasdf"
heaps of data here
It is a boundary that is used to separate the different sets of data in case of a multi-part data submission. Read more about it:
http://www.w3.org/Protocols/rfc1341/7_2_Multipart.html
To quote from the link:
In the case of multiple part messages, in which one or more different sets of data are combined in a single body, a "multipart" Content-Type field must appear in the entity's header. The body must then contain one or more "body parts," each preceded by an encapsulation boundary, and the last one followed by a closing boundary. Each part starts with an encapsulation boundary, and then contains a body part consisting of header area, a blank line, and a body area. Thus a body part is similar to an RFC 822 message in syntax, but different in meaning.
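The boundary itself is declared in the request's Content-Type header, and each separator line in the body is that value with two extra leading dashes. A small Python sketch using the standard library's email.message.Message to read the parameter (delimiter_from_content_type is a made-up name, and the header value below is an assumed example with 27 dashes, not taken from the question):

```python
from email.message import Message


def delimiter_from_content_type(content_type: str) -> bytes:
    """Return the separator line ("--" + boundary) for a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type
    boundary = msg.get_boundary()        # None if there is no boundary parameter
    if boundary is None:
        raise ValueError("no boundary parameter in Content-Type")
    return b"--" + boundary.encode("ascii")


# Assumed header value: 27 dashes followed by a token chosen by the client.
example_header = "multipart/form-data; boundary=" + "-" * 27 + "7dc34719970524"
separator = delimiter_from_content_type(example_header)
```

The token carries no length information; it only has to be a sequence that does not occur in the data.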

ASP.Net not populating Request.Files on receiving multipart data

I'm sending files from an Android app to an ASP.NET web form using multipart/form-data as the content type. However, the Request.Files property does not get populated. Reading the Request object I get the following:
Request.Params("ALL_HTTP")
"HTTP_CONNECTION:Keep-Alive HTTP_CONTENT_LENGTH:8913 HTTP_CONTENT_TYPE:multipart/form-data;boundary=*********************** HTTP_HOST:192.168.1.2 HTTP_USER_AGENT:Dalvik/1.2.0 (Linux; U; Android 2.2; sdk Build/FRF91) "
The HTTP_CONTENT_LENGTH shows the correct length. I guess I will have to do a binary read and then parse the content and store the file contents. Has anyone done this before or is there a library/class available?
Thanks
How are you writing the files to the request stream? The following rules should be followed when programmatically uploading files (binary streams):
1) Write a boundary (it could be anything prefixed by two dashes). Here is an example boundary:
private string boundary = "----" + DateTime.Now.Ticks;
2) Write content disposition in the form:
Content-Disposition: form-data; name="{name}"; filename="{filename}"
3) Write the content type
4) Write an empty line
5) Write the bytes to the request stream
6) Write the end boundary, it marks the end of the request. It should be in the following form:
"--" + boundary + "--"
7) Write an empty line and flush (if needed) the request.
Here is how a sample upload request should look inside an HTTP debugging tool like Fiddler:
------634388181001966332
Content-Disposition: form-data; name="files"; filename="cald_3d.JPG"
Content-Type: application/octet-stream
1010101001... (more bytes)
------634388181001966332--
Then, on the server, access the file with Request.Files[name], the same name which you have used when specifying Content Disposition. Good luck :)
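The steps above can be sketched in Python (not the C# of the answer; shown only to make the byte layout explicit). build_multipart is a hypothetical helper, the boundary is generated with uuid as one possible choice, and field names and file names are assumed to contain no double quotes.

```python
import uuid


def build_multipart(fields, files, boundary=None):
    """Build a multipart/form-data body.

    fields: {name: text}
    files:  {name: (filename, content_type, data_bytes)}
    Returns (Content-Type header value, body bytes).
    """
    boundary = boundary or "----" + uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for name, value in fields.items():
        lines += [
            delimiter,
            f'Content-Disposition: form-data; name="{name}"'.encode("utf-8"),
            b"",                          # empty line ends the part headers
            value.encode("utf-8"),
        ]
    for name, (filename, content_type, data) in files.items():
        lines += [
            delimiter,
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"'.encode("utf-8"),
            f"Content-Type: {content_type}".encode("ascii"),
            b"",
            data,                         # raw file bytes, written unchanged
        ]
    lines += [delimiter + b"--", b""]     # closing boundary plus final CRLF
    return f"multipart/form-data; boundary={boundary}", b"\r\n".join(lines)


header_value, request_body = build_multipart(
    {"person": "anonymous"},
    {"files": ("a.bin", "application/octet-stream", b"\x00\xff\r\n")},
    boundary="----634388181001966332",
)
```

The returned header value has to be sent as the request's Content-Type so the server knows which boundary to look for.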

multipart/form-data, what is the default charset for fields?

What is the default encoding one should use to decode multipart/form-data if no charset is given? RFC 2388 states:
4.5 Charset of text in form data
Each part of a multipart/form-data is supposed to have a content-type. In the case where a field element is text, the charset parameter for the text indicates the character encoding used.
For example, a form with a text field in which a user typed 'Joe owes
<eu>100' where <eu> is the Euro symbol might have form data returned
as:
--AaB03x
content-disposition: form-data; name="field1"
content-type: text/plain;charset=windows-1250
content-transfer-encoding: quoted-printable
Joe owes =80100.
--AaB03x
In my case, the charset isn't set and I don't know how to decode the data within that text/plain section. As I do not want to enforce something that isn't standard behavior I'm asking what the expected behavior in this case is. The RFC does not seem to explain this so I'm kinda lost.
Thank you!
This apparently has changed in HTML5 (see http://dev.w3.org/html5/spec-preview/constraints.html#multipart-form-data).
The parts of the generated multipart/form-data resource that correspond to non-file fields must not have a Content-Type header specified.
So where is the character set specified? As far as I can tell from the encoding algorithm, the only place is within a form data set entry named _charset_.
If your form does not have a hidden input named _charset_, what happens? I've tested this in Chrome 28, sending a form encoded in UTF-8 and one in ISO-8859-1 and inspecting the sent headers and payload, and I don't see charset given anywhere (even though the text encoding definitely changes). If I include an empty _charset_ field in the form, Chrome populates that with the correct charset type. I guess any server-side code must look for that _charset_ field to figure it out?
I ran into this problem while writing a Chrome extension that uses XMLHttpRequest.send of a FormData object, which always gets encoded in UTF-8 no matter what the source document encoding is.
Let the request entity body be the result of running the multipart/form-data encoding algorithm with data as form data set and with utf-8 as the explicit character encoding.
Let mime type be the concatenation of "multipart/form-data;", a U+0020 SPACE character, "boundary=", and the multipart/form-data boundary string generated by the multipart/form-data encoding algorithm.
As I found earlier, charset=utf-8 is not specified anywhere in the POST request, unless you include an empty _charset_ field in the form, which in this case will automatically get populated with "utf-8".
This is my understanding of the state of things. I welcome any corrections to my assumptions!
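On the server side, the _charset_ lookup described above could look roughly like this Python sketch. It assumes the parts have already been split into a mapping of field name to raw bytes; decode_fields and the UTF-8 fallback are choices made for this example, not something the spec mandates.

```python
import codecs


def decode_fields(raw_fields, fallback="utf-8"):
    """Decode {name: bytes} to {name: str}, honouring a _charset_ field if present."""
    charset = raw_fields.get("_charset_", b"").decode("ascii", "replace").strip() or fallback
    try:
        codecs.lookup(charset)            # reject unknown encoding names
    except LookupError:
        charset = fallback
    return {name: value.decode(charset, "replace") for name, value in raw_fields.items()}


with_charset = decode_fields({"_charset_": b"ISO-8859-1", "field1": b"caf\xe9"})
without_charset = decode_fields({"field1": b"caf\xc3\xa9"})
```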
The default charset for HTTP/1.1 is ISO-8859-1 (Latin-1); I would guess that this also applies here.
3.7.1 Canonicalization and Text Defaults
--snip--
The "charset" parameter is used with some media types to define the character set (section 3.4) of the data. When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined to have a default charset value of "ISO-8859-1" when received via HTTP. Data in character sets other than "ISO-8859-1" or its subsets MUST be labeled with an appropriate charset value. See section 3.4.1 for compatibility problems.
Thanks to @owlman for the detailed explanation.
Just some more info here:
Upload request payload fragment:
------WebKitFormBoundarydZAwJIasnBbGaUqM
Content-Disposition: form-data; name="file"; filename="xxx.txt"
Content-Type: text/plain
If the file name "xxx.txt" contains Unicode characters encoded as UTF-8, Resin (as of 4.0.40) can't decode it correctly, but Jetty (9.x) can.
I think the reason for Resin's behavior is that the Content-Type doesn't specify any encoding, so Resin decodes the file name using ISO-8859-1, which may result in garbled characters.
I did some googling:
https://mail-archives.apache.org/mod_mbox/struts-user/200310.mbox/%3C3FA0395B.1080209#kumachan.net.nz%3E
It seems that Resin's behavior is according to Servlet Spec 2.3
And I can't find any settings from http://www.caucho.com/resin-4.0/reference.xtp
which can change this behavior for Resin.
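The garbling is easy to reproduce outside any server. This Python sketch (the file name and the redecode helper are made up for illustration) decodes UTF-8 bytes as ISO-8859-1 and then reverses the mistake, which is a common workaround when the container's decoding cannot be configured:

```python
def redecode(text: str, wrong: str = "iso-8859-1", right: str = "utf-8") -> str:
    """Undo a decode that used the wrong charset (hypothetical helper)."""
    return text.encode(wrong).decode(right)


original_name = "résumé.txt"
wire_bytes = original_name.encode("utf-8")       # what the browser sends
garbled_name = wire_bytes.decode("iso-8859-1")   # what an ISO-8859-1 decode yields
repaired_name = redecode(garbled_name)
```

The round trip works because ISO-8859-1 maps every byte to exactly one character, so no information is lost by the wrong decode.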

Resources