I'm working with Amazon S3 multipart uploading, and I read that you can upload parts of a file in parallel. However, looking through the documentation I see that Amazon's response to an uploaded file part does not contain a part number. So my question is: if I upload Part 1 and Part 2 of a file asynchronously and then check for a response from Amazon, how do I know whether the response refers to Part 1 or Part 2 of the file?
Here's an example request and response.
Request:
PUT /my-movie.m2ts?partNumber=1&uploadId=VCVsb2FkIElEIGZvciBlbZZpbmcncyBteS1tb3ZpZS5tMnRzIHVwbG9hZR HTTP/1.1
Host: example-bucket.s3.amazonaws.com
Date: Mon, 1 Nov 2010 20:34:56 GMT
Content-Length: 10485760
Content-MD5: pUNXr/BjKK5G2UKvaRRrOA==
Authorization: AWS AKIAIOSFODNN7EXAMPLE:VGhpcyBtZXNzYWdlIHNpZ25lZGGieSRlbHZpbmc=
***part data omitted***
Response:
HTTP/1.1 200 OK
x-amz-id-2: Vvag1LuByRx9e6j5Onimru9pO4ZVKnJ2Qz7/C1NPcfTWAtRPfTaOFg==
x-amz-request-id: 656c76696e6727732072657175657374
Date: Mon, 1 Nov 2010 20:34:56 GMT
ETag: "b54357faf0632cce46e942fa68356b38"
Content-Length: 0
Connection: keep-alive
Server: AmazonS3
The ETag you get back in the response to each part is the MD5 sum of the part you just uploaded.
In the case of your example, unless I have made an error, your Content-MD5 decodes to a54357aff06328ae46d942af69146b38 ... so I would suggest that, unless you have a problem with your MD5 calculation, the request and the response you've posted don't actually belong together.
The multipart uploader that I wrote is extremely pedantic, because I use it to archive critical data (so pedantic, in fact, that it turns around and re-downloads the file after it thinks the multipart upload succeeded, to be absolutely certain the final product is perfect). This utility submits the parts sequentially with a blocking call that doesn't return until the response comes back, and one of its sanity tests is to compare the locally-calculated MD5 of the block with the returned ETag - it's a fatal error if they don't match. So unless you have identical blocks, you should be able to correlate the parts that way.
Additional:
I didn't use the missing body to calculate an MD5 :) I took your header:
Content-MD5: pUNXr/BjKK5G2UKvaRRrOA==
Converted from base64 -> binary -> hex and got a54357aff06328ae46d942af69146b38.
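That conversion is one line in most languages; for example, in Python, using the header value from the question:

import base64
print(base64.b64decode("pUNXr/BjKK5G2UKvaRRrOA==").hex())
# prints: a54357aff06328ae46d942af69146b38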
I do my verification downloads by stringing together 2 command line utilities, like this:
wget --server-response "$signed_url" -O - | md5sum
This downloads the file and pipes the bytes into md5sum to calculate the checksum, so I can download an arbitrarily large file without using any disk space and very little memory. The wget utility has built-in retry capability and will try to continue from the byte position where it left off if something breaks the connection. The outputs of this pipeline are the MD5 sum of the file (stdout) and the headers sent by the server plus a progress meter (stderr). My utility captures stdout and does the comparison, while letting stderr leak through to the console for observation.
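If you'd rather do the same check without shelling out to wget and md5sum, here is a rough Python equivalent (a sketch only; it assumes the requests library and a pre-signed URL in signed_url):

import hashlib
import requests

def streaming_md5(url):
    # Stream the object and feed it to MD5 chunk by chunk, so an
    # arbitrarily large file needs almost no memory and no disk.
    md5 = hashlib.md5()
    with requests.get(url, stream=True) as resp:
        resp.raise_for_status()
        for chunk in resp.iter_content(chunk_size=1024 * 1024):
            md5.update(chunk)
    return md5.hexdigest()

signed_url = "https://example-bucket.s3.amazonaws.com/my-movie.m2ts"  # placeholder
print(streaming_md5(signed_url))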
When you upload each part, you include the part number in the request. From the AWS multipart upload documentation:
PUT /ObjectName?partNumber=PartNumber&uploadId=UploadId HTTP/1.1
Host: BucketName.s3.amazonaws.com
Date: date
Content-Length: Size
Authorization: Signature
Therefore there's no ambiguity about which part you have just uploaded.
EDIT So the basic process is the following:
1. Initiate a multipart upload and get an UploadId.
2. Upload all the parts in parallel. In each response you will get an ETag header - you need to remember it and the part number it goes with, so AWS can reassemble the file (see the sketch below).
3. Send all the ETag values and the part numbers and complete the multipart upload.
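Here is a hedged sketch of that flow in Python with boto3 (the bucket name, key, file name, part size and thread count are all illustrative; real code would also handle errors and call abort_multipart_upload on failure):

import boto3
from concurrent.futures import ThreadPoolExecutor

s3 = boto3.client("s3")
BUCKET, KEY = "example-bucket", "my-movie.m2ts"  # placeholders
PART_SIZE = 5 * 1024 * 1024  # S3's minimum part size, except for the last part

upload_id = s3.create_multipart_upload(Bucket=BUCKET, Key=KEY)["UploadId"]

def upload_part(numbered_chunk):
    part_number, data = numbered_chunk
    resp = s3.upload_part(Bucket=BUCKET, Key=KEY, UploadId=upload_id,
                          PartNumber=part_number, Body=data)
    # The response carries no part number; we sent one, so we pair
    # the returned ETag with it ourselves.
    return {"PartNumber": part_number, "ETag": resp["ETag"]}

# Read the file into numbered chunks (kept in memory for brevity).
chunks = []
with open("my-movie.m2ts", "rb") as f:
    n = 1
    while True:
        data = f.read(PART_SIZE)
        if not data:
            break
        chunks.append((n, data))
        n += 1

with ThreadPoolExecutor(max_workers=4) as pool:
    parts = list(pool.map(upload_part, chunks))  # map preserves part order

s3.complete_multipart_upload(Bucket=BUCKET, Key=KEY, UploadId=upload_id,
                             MultipartUpload={"Parts": parts})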
Related
I'm trying to send AT commands to my ESP32 module and am not getting any response back.
I need to perform a POST request that contains the username and password and other requests later on. I am not structuring these correctly and there is not a lot of good documentation for this.
NOTE: because I cannot share my complete url due to privacy I will use something with the same length ********connected.com:443
Send login information to ********connected.com/login (POST) body{"email":"myemail.ca", "password":"xxxxx"}
once I get the token I will make other requests.
get information regarding the user profile ********connected.com/getRoutine (GET) query param username="bob"
I really want to understand how these requests are structured so if someone can explain it to me elegantly that would be great!
Here is what I have tried:
AT
OK
AT+CIPSTART="TCP","********connected.com",443
CONNECT
OK
AT+CIPSEND=48
> "GET ********connected.com:443/getUsersOnline"
OK
>
Recv 48 bytes
SEND OK
CLOSED
THE POST REQUEST I HAVE USED:
AT+CIPSEND=177 “POST \r Host: ********connected.com\r\n Accept: application/json\r\n Content-Length: 224r\n Content-Type: application/jsonr\n { "email":"myemail.com", "password":"myPassword" } “
There are actually several parts of your system that might be causing the malfunction:
The AT commands sent (it is not clear how you check for server responses; responses could provide clues about what's wrong)
The server side app seems to be a custom implementation that might have bugs as well
The POST request might be malformed
Let's focus on the last one.
POST is described in RFC 7231, and though it is an obscure description without examples, it makes one thing clear: there's not actually a well-defined standard... because the payload is strictly application dependent!
I also quote the relevant part of this brilliant answer to an old SO question:
When receiving a POST request, you should always expect a "payload", or, in HTTP terms: a message body. The message body in itself is pretty useless, as there is no standard.
For this reason, all we can do is build a POST request that is as accurate as possible and then debug the system as a whole, making sure the request matches what the server-side application expects.
In order to do this, let's check another external link I found: POST request examples. There we find this request:
POST /test HTTP/1.1
Host: foo.example
Content-Type: application/x-www-form-urlencoded
Content-Length: 27
field1=value1&field2=value2
Now let's compare this example to your request:
POST
Host: ********connected.com
Accept: application/json
Content-Length: 224
Content-Type: application/jsonr
{ "email":"myemail.com", "password":"myPassword" }
You are telling the server that you want to pass a resource to an unspecified application (no path), and that this resource is 224 bytes long (wrong! The message body is shorter).
For these reasons, at least these things can be improved:
POST /path/invic18app.php HTTP/1.1 //The path to the application and HTTP version are missing
Content-Length: 48 //This must be the length of the message body alone, without the headers. I would also write it as the last header, just before the message body
Write an empty line (a double CRLF) between the headers and the message body, otherwise the server will interpret the body as further (malformed) header fields
I hope this helps, even if it is a tentative answer (I cannot try this request myself). But, again, you definitely need to sniff packets at the TCP level, in order to avoid debugging the server when you are not sure the data is actually received! If you cannot install Wireshark, tcpdump will do as well.
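To make the shape of a correct request concrete, here is a hedged Python sketch that builds the raw bytes (the host is a placeholder standing in for the redacted one in the question; the path and credentials come from the question). Note how Content-Length is computed from the actual body, and how a single empty line ends the headers:

import json
import socket
import ssl

HOST = "example-connected.com"  # placeholder for the redacted host
body = json.dumps({"email": "myemail.ca", "password": "xxxxx"}).encode("utf-8")
headers = (
    "POST /login HTTP/1.1\r\n"
    "Host: " + HOST + "\r\n"
    "Accept: application/json\r\n"
    "Content-Type: application/json\r\n"
    "Content-Length: " + str(len(body)) + "\r\n"  # length of the body, not the whole request
    "Connection: close\r\n"
    "\r\n"  # the mandatory empty line that ends the headers
).encode("utf-8")

ctx = ssl.create_default_context()
with socket.create_connection((HOST, 443)) as sock:
    with ctx.wrap_socket(sock, server_hostname=HOST) as tls:
        tls.sendall(headers + body)
        print(tls.recv(4096).decode("utf-8", "replace"))  # first bytes of the response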
This is inside bash file:
s3cmd --add-header='Content-Encoding':'gzip' put /home/media/main.js s3://myproject/media/main.js
This is what I do to upload my backbone compressed file into Amazon S3.
I run this command every time I make changes to my javascript files.
However, when I refresh the page in Chrome, Chrome still uses the cached version.
Request headers:
Accept:*/*
Accept-Encoding:gzip, deflate, sdch
Accept-Language:en-US,en;q=0.8,es;q=0.6
AlexaToolbar-ALX_NS_PH:AlexaToolbar/alxg-3.3
Cache-Control:max-age=0
Connection:keep-alive
Host:myproject.s3.amazonaws.com
If-Modified-Since:Thu, 04 Dec 2014 09:21:46 GMT
If-None-Match:"5ecfa32f291330156189f17b8945a6e3"
User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36
Response headers:
Accept-Ranges:bytes
Content-Encoding:gzip
Content-Length:70975
Content-Type:application/javascript
Date:Thu, 04 Dec 2014 09:50:06 GMT
ETag:"85041deb28328883dd88ff761b10ece4"
Last-Modified:Thu, 04 Dec 2014 09:50:01 GMT
Server:AmazonS3
x-amz-id-2:4fGKhKO8ZQowKIIFIMXgUo7OYEusZzSX4gXgp5cPzDyaUGcwY0h7BTAW4Xi4Gci0Pu2KXQ8=
x-amz-request-id:5374BDB48F85796
Notice that the ETag is different. I made changes to the file, but when I refreshed the page, that's what I got. Chrome is still using my old file.
It looks like your script has been aggressively cached, either by Chrome itself or some other interim server.
If it's a js file called from an HTML page (which it sounds like it is), one technique I've seen is having the page add a parameter to the file:
<script src="/media/main.js?v=123"></script>
or
<script src="/media/main.js?v=2015-01-03_01"></script>
... which you change whenever the JS is updated (and which will be ignored by the server). Neither the browser nor any interim caching servers will recognise it as the same resource and will therefore not attempt to use the cached version - even though on your S3 server it is still the same filename.
Whenever you do a release you can update this number/date/whatever, ideally automatically if the templating engine has access to the application's release number or id.
It's not the most elegant solution but it is useful to have around if ever you find you have used an optimistically long cache duration.
Obviously, this only works if you have uploaded the new file correctly to S3 and S3 is genuinely sending out the new version of the file. If you have any doubts about this, try using a command-line utility like curl or wget on the URL of the javascript to check.
The Invalidation Method
s3cmd -P --cf-invalidate put /home/media/main.js s3://myproject/media/main.js
--cf-invalidate: invalidate the uploaded file in CloudFront
-P, --acl-public: store objects with an ACL allowing read for anyone
This will invalidate the cache for the file you specify. It's also possible to invalidate your entire site, however, the command above shows what I would imagine you'd want in this scenario.
Note: The first 1000 requests/month are free. After that it's approximately $0.005 per file, so if you do a large number of invalidation requests this might be a concern.
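If you script your deploys in Python rather than s3cmd, the same invalidation can be issued with boto3 (a sketch; DISTRIBUTION_ID is a placeholder for your CloudFront distribution's ID, and paths must start with a slash):

import time
import boto3

cf = boto3.client("cloudfront")
cf.create_invalidation(
    DistributionId="DISTRIBUTION_ID",  # placeholder
    InvalidationBatch={
        "Paths": {"Quantity": 1, "Items": ["/media/main.js"]},
        # CallerReference must be unique for each invalidation request
        "CallerReference": str(time.time()),
    },
)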
The Query String / Object Key Method
CloudFront includes the query string (on origin) from the given URL when caching the object. This means that even if you have the exact same object duplicated, but the query strings are different, each one will be cached as a different object. For this to work properly, you'll need to select Yes for Forward Query Strings in the CloudFront console, or specify true for the value of the QueryString element in the DistributionConfig complex type when you're using the CloudFront API.
Example:
http://myproject/media/main.js?parameter1=a
Summary:
The most convenient method of ensuring the object being served is current is invalidation, although if you don't mind managing the query string parameters you should find that just as effective. In my opinion, adjusting the headers won't be nearly as reliable as either method above; clients handle caching in so many different ways that it's hard to pin down where caching issues arise.
You need the response from S3 to include the Cache-Control header. You can set this when uploading the file:
s3cmd --add-header="cache-control:max-age=0,no-cache" put file s3://your_bucket/
The lack of whitespace and uppercase in my example is due to some odd signature issue with s3cmd. Your mileage may vary.
After updating the file with that command, you should get the Cache-Control header in the S3 response.
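For what it's worth, the same header can be set when uploading with boto3 instead of s3cmd (a sketch; the bucket and paths are taken from the question above):

import boto3

s3 = boto3.client("s3")
s3.upload_file(
    "/home/media/main.js", "myproject", "media/main.js",
    ExtraArgs={
        "CacheControl": "max-age=0,no-cache",
        "ContentEncoding": "gzip",  # matches the original gzip upload
        "ContentType": "application/javascript",
    },
)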
I need to test some client application code I've written, to verify its handling of various status codes returned in an HTTP response from a web server.
I have Fiddler 2 (Web Debugging Proxy) installed and I believe there's a way to modify responses using this application, but I'm struggling to find out how. This would be the most convenient way, as it would allow me to leave both client and server code unmodified.
Can anyone assist as I'd like to intercept the HTTP response being sent from server to client and modify the status code before it reaches the client?
Any advice would be much appreciated.
Ok, so I assume that you're already able to monitor your client/server traffic. What you want to do is set a breakpoint on the response then fiddle with it before sending it on to the client.
Here are a couple of different ways to do that:
Rules > Automatic Breakpoints > After Responses
In the quickexec box (the black box at the bottom) type "bpafter yourpage.svc". Now Fiddler will stop at a breakpoint before all requests to any URL that contains "yourpage.svc". Type "bpafter" with no parameters to clear the breakpoint.
Programmatically tamper with the response using FiddlerScript. The best documentation for FiddlerScript is on the official site: http://www.fiddler2.com/Fiddler/dev/
Once you've got a response stopped at the breakpoint, just double click it to open it in the inspectors. You've got a couple of options now:
Right next to the green Run to Completion button (which you click to send the response) there's a dropdown that lets you choose some default response types.
Or, on the Headers inspector, change the response code & message in the textbox at the top.
Or, click the "Raw" inspector and mess with the raw response to do arbitrary things to it. Also a good way to see what your client does when it gets a malformed response, which you'll probably test accidentally :)
Another alternative is to use Fiddler's AutoResponder tab (on the right-hand panel). This allows you to catch a request to any URI that matches a string and serve a "canned" response from a file. The file can contain both headers and payload. The advantage of this approach is that you don't have to write FiddlerScript and you don't have to handle each request manually via a breakpoint.
You would set the rule up in Fiddler as shown below (ensure you enable "Unmatched requests passthrough", otherwise all other HTTP requests will fail).
In this example, any request whose URI includes "fooBar" will get the canned response. The format of the file will vary depending on your APIs (you can use your browser to intercept a "real" response and base it on that) but mine looked like the following:
HTTP/1.1 409 Conflict
Server: Apache-Coyote/1.1
X-Powered-By: Servlet 2.5; JBoss-5.0/JBossWeb-2.1
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET, POST, DELETE, PUT, PATCH, OPTIONS
Access-Control-Allow-Headers: Origin, X-Requested-With, Content-Type, Accept, Authorization
Access-Control-Max-Age: 86400
Content-Type: application/vnd.api+json
Content-Length: 149
Date: Tue, 28 Mar 2017 10:03:29 GMT
{"errors":[{"code":"OutOfStock","detail":"Item not in stock","source":{"lineId":{"type":"Order line Number","id":"1"}},"meta":{"availableStock":0}}]}
I found that it needed a carriage return at the end of the last line (i.e. after the json), and that the Content-Length header had to match the number of characters in the json, otherwise the webapp would hang. Your mileage may vary.
Create a FiddlerScript rule. Here's what I used in order to generate a local copy of a website that was intentionally using 403 on every page to thwart HTTrack/WGET.
https://gist.github.com/JamoCA/22db8d68a9a2fb20cb04a85360185333
/* 20180615 Fiddler rule to ignore all 403 HTTP status errors so WGET or HTTrack can generate a local copy of a remote website.
SCENARIO: Changing the user agent or setting a delay isn't enough and the entire remote server is configured to respond w/403.
CONFIGURE: Add the rule below to the FiddlerScript OnBeforeResponse() section. Configure HTTrack/WGET/CRON to use proxy 127.0.0.1:8888 */
static function OnBeforeResponse(oSession: Session) {
if (oSession.HostnameIs("TARGETHOSTNAME_FILTER.com") && oSession.responseCode == 403) {
oSession.responseCode = 200;
oSession.oResponse.headers.HTTPResponseCode = 200;
oSession.oResponse.headers.HTTPResponseStatus = "200 OK";
}
}
Use case: user clicks the link on a webpage - boom! load of files sitting in his folder.
I tried to pack the files using a multipart/mixed message, but it seems to work only in Firefox.
This is what my response looks like:
HTTP/1.0 200 OK
Connection: close
Date: Wed, 24 Jun 2009 23:41:40 GMT
Content-Type: multipart/mixed;boundary=AMZ90RFX875LKMFasdf09DDFF3
Client-Date: Wed, 24 Jun 2009 23:41:40 GMT
Client-Peer: 127.0.0.1:3000
Client-Response-Num: 1
MIME-Version: 1.0
Status: 200
--AMZ90RFX875LKMFasdf09DDFF3
Content-type: image/jpeg
Content-transfer-encoding: binary
Content-disposition: attachment; filename="001.jpg"
<< here goes binary data >>
--AMZ90RFX875LKMFasdf09DDFF3
Content-type: image/jpeg
Content-transfer-encoding: binary
Content-disposition: attachment; filename="002.jpg"
<< here goes binary data >>
--AMZ90RFX875LKMFasdf09DDFF3
--AMZ90RFX875LKMFasdf09DDFF3--
Thank you
P.S. No, zipping files is not an option
Zipping is the only option that will have consistent result on all browsers. If it's not an option because you don't know zips can be generated dynamically, well, they can. If it's not an option because you have a grudge against zip files, well..
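For the record, generating a zip on the fly takes only a few lines in most stacks. A minimal Python sketch, entirely in memory with no temp files (the filenames mirror the ones in the question):

import io
import zipfile

def zip_bytes(files):
    # files: a dict of {filename: content_bytes}; returns the archive as bytes.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()

# Serve the result with Content-Type: application/zip and
# Content-Disposition: attachment; filename="files.zip"
payload = zip_bytes({"001.jpg": b"...", "002.jpg": b"..."})  # placeholder image bytes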
MIME/multipart is for email messages and/or POST transmission to the HTTP server. It was never intended to be received and parsed on the client side of an HTTP transaction. Some browsers do implement it, some others don't.
As another alternative, you could have a JavaScript script open windows that download the individual files. Or a Java applet (which requires a Java runtime on the machines; if it's an enterprise application, that shouldn't be a problem, as the NetAdmin can deploy it on the workstations) that would download the files into a directory of the user's choice.
I remember doing this more than 10 years ago, in the Netscape 4 days. It used boundaries like what you're doing, and it didn't work at all with the other browsers of that time.
While it does not answer your question, HTTP 1.1 supports request pipelining, so that at least the same TCP connection can be reused to download multiple images.
You can use base64 encoding to embed a (very small) image into an HTML document; however, from a browser/server standpoint, you're technically still sending only 1 document. Maybe this is what you intend to do?
Embed Images into HTML using Base64
EDIT: I just realized that most methods I found in my Google search only support Firefox, and not IE.
You could make a json with multiple data urls.
Eg:
{
"stamp.png": "data:image/png;base64,...",
"document.pdf": "data:application/pdf;base64,..."
}
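Building that JSON server-side is straightforward; a hedged Python sketch (the file names and MIME types are the ones from the example above):

import base64
import json

def data_url(path, mime):
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    return "data:%s;base64,%s" % (mime, b64)

payload = json.dumps({
    "stamp.png": data_url("stamp.png", "image/png"),
    "document.pdf": data_url("document.pdf", "application/pdf"),
})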
(extending trinalbadger587's answer)
You could return an HTML page with multiple clickable, downloadable, in-place data links:
<html>
<body>
<a download="yourCoolFilename.png" href="data:image/png;base64,...">PNG</a>
<a download="theFileGetsSavedWithThisName.pdf" href="data:application/pdf;base64,...">PDF</a>
</body>
</html>
Do web browsers send the file size in the http header when uploading a file to the server? And if that is the case, then, is it possible to refuse the file just by reading the header and not wait for the whole upload process to finish?
http://www.faqs.org/rfcs/rfc1867.html
HTTP clients are encouraged to supply content-length for overall file input so that a busy server could detect if the proposed file data is too large to be processed reasonably
But the content-length is not required, so you cannot rely on it. Also, an attacker can forge a wrong content-length.
To read the file content is the only reliable way. Having said that, if the content-length is present and is too big, closing the connection would be a reasonable thing to do.
Also, the content is sent as multipart, so most modern frameworks decode it first. That means you won't get the file byte stream until the framework is done, which could mean "until the whole file is uploaded".
EDIT: before going too far, you may want to check this other answer relying on Apache configuration: Using jQuery, Restricting File Size Before Uploading. The description below is only useful if you really need even more custom feedback.
Yes, you can get some information upfront, before allowing the upload of the whole file.
Here's an example of the headers coming from a form with the enctype="multipart/form-data" attribute:
POST / HTTP/1.1
Host: 127.0.0.1:8000
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.3) Gecko/2008092414 Firefox/3.0.3
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.7,fr-be;q=0.3
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Content-Type: multipart/form-data; boundary=---------------------------886261531333586100294758961
Content-Length: 135361
-----------------------------886261531333586100294758961
Content-Disposition: form-data; name=""; filename="IMG_1132.jpg"
Content-Type: image/jpeg
(data starts here and ends with -----------------------------886261531333586100294758961 )
You have the Content-Length in the header, and additionally there is the Content-Type in the header of the file part (each file has its own header, which is the purpose of multipart encoding). Beware that it's the browser's responsibility to set a relevant Content-Type by guessing the file type; you can't guarantee it, but it should be fairly reliable for early rejection (yet you'd better check the whole file when it's entirely available).
Now, there is a gotcha. I used to filter image files like that, not on the size but on the content-type; but as you want to stop the request as soon as possible, the same problem arises: the browser only gets your response once the whole request is sent, including the form content and thus the uploaded files.
If you don't want the provided content and want to stop the upload, you have no choice but to brutally close the socket. The user will only see a confusing "connection reset by peer" message. And that sucks, but it's by design.
So you only want to use this method for background asynchronous checks (using a timer that watches the file field). So I had this hack:
I use jQuery to tell me if the file field has changed.
When a new file is chosen, disable all other file fields on the same form to get only that one.
Send the file asynchronously (jQuery can do it for you; it uses a hidden frame).
Server-side, check the headers (content-length, content-type, ...) and cut the connection as soon as you have what you need (a sketch of this check follows below).
Set a session variable telling whether that file was OK or not.
Client-side, as the file is uploaded to a frame, you don't get any kind of feedback if the connection is closed. Your only alternative is a timer.
Client-side, a timer polls the server to get a status for the uploaded file. Server-side, you have that session variable set; send it back to the browser.
The client has the status code; render it to your form: error message, green checkmark/red X, whatever. Reset the file field or disable the form, you decide. Don't forget to re-enable the other file fields.
Quite messy, eh? If any of you has a better alternative, I'm all ears.
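For the server-side step above ("cut the connection as soon as you have what you need"), here is a bare-bones sketch of the idea in Python (raw sockets, illustrative only; the port and size limit are arbitrary, and a real app would do this in front of, or instead of, the framework):

import socket

MAX_UPLOAD = 5 * 1024 * 1024  # arbitrary cap for this sketch

def too_big(conn):
    # Read only up to the end of the request headers.
    data = b""
    while b"\r\n\r\n" not in data:
        chunk = conn.recv(4096)
        if not chunk:
            return False
        data += chunk
    for line in data.split(b"\r\n\r\n", 1)[0].split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            return int(value) > MAX_UPLOAD
    return False  # no Content-Length header: the headers alone can't tell us

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("127.0.0.1", 8000))
server.listen(5)
conn, _ = server.accept()
if too_big(conn):
    conn.close()  # the "brutal" close described above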
I'm not sure, but you should not really trust anything sent in the header, as it could be faked by the user.
It depends on how the server works. For example in PHP your script will not run until the file upload is complete, so this wouldn't be possible.