How to perform an action when a remote (HTTP) file changes?

I want to create a script that checks a URL and performs an action (download + unzip) when the "Last-Modified" header of the remote file changes. I thought about fetching the header with curl, but then I'd have to store it somewhere for each file and do a date comparison.
Does anyone have a different idea using (mostly) standard unix tools?
Thanks.

A possible solution would be to periodically run this algorithm on the client box.
Create an HTTP request with the If-Modified-Since header set to the date of your local file. If the file does not exist yet, do not include this header.
The server will either send you the file (if it changed since the date in the If-Modified-Since header) or reply with a 304 Not Modified status.
If you receive 200 OK, take the payload from the HTTP body and unzip the file.
If, on the other hand, you receive 304 Not Modified, you know your file is up to date.
Use the Last-Modified response header to touch your local file. That way you stay in sync with the server's datetime.
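A minimal shell sketch of that loop; the URL and file names are placeholders. curl's -z sends If-Modified-Since from the local file's mtime, and -R stamps the download with the server's Last-Modified, which covers the touch step:
#!/bin/sh
URL="http://remote.server.com/file.zip"
LOCAL="file.zip"

# Only send If-Modified-Since when a local copy exists (first step above).
COND=""
[ -f "$LOCAL" ] && COND="-z $LOCAL"

# -w prints the HTTP status so we can tell 200 from 304; download to a
# temp file so a 304 never disturbs the existing copy.
STATUS=$(curl -s -R $COND -o "$LOCAL.tmp" -w '%{http_code}' "$URL")

if [ "$STATUS" = "200" ]; then
    mv "$LOCAL.tmp" "$LOCAL"   # new payload arrived: keep it...
    unzip -o "$LOCAL"          # ...and unzip it
else
    rm -f "$LOCAL.tmp"         # 304 Not Modified: local copy is current
fi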
Another way would be for the server to push a notification (a broadcast packet, for example) when the file changes. When the notification is received, the client would then execute the above algorithm. This implies code living on the HTTP server box that listens for file-system changes and broadcasts them to interested parties.
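A rough sketch of that server-side watcher on Linux, assuming inotify-tools and socat are installed (the watched path, port, and message format are all made up):
inotifywait -m -e close_write /var/www/files/file.zip |
while read -r path events file; do
    # One UDP broadcast datagram per change; clients listening on
    # port 9999 would run the polling algorithm above on receipt.
    echo "changed: $path" |
        socat - UDP-DATAGRAM:255.255.255.255:9999,broadcast
done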
Perhaps this info for the curl command is of some importance:
TIME CONDITIONS
HTTP allows a client to specify a time condition for the document it requests. It is If-Modified-Since or If-Unmodified-Since. Curl allows you to specify them with the -z/--time-cond flag.
For example, you can easily make a download that only gets performed if the remote file is newer than a local copy. It would be made like:
curl -z local.html http://remote.server.com/remote.html
Or you can download a file only if the local file is newer than the remote one. Do this by prepending the date string with a '-', as in:
curl -z -local.html http://remote.server.com/remote.html
You can specify a "free text" date as condition. Tell curl to only download the file if it was updated since yesterday:
curl -z yesterday http://remote.server.com/remote.html
Curl will then accept a wide range of date formats. You always make the date check the other way around by prepending it with a dash '-'.
To sum up, you will need:
curl command
touch command
some bash scripting
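For the touch part, one way (assuming GNU touch, which parses HTTP dates; the URL and file name are placeholders) is to read Last-Modified from a headers-only request and feed it straight to touch -d:
# Fetch only the headers, pull out Last-Modified, and stamp the local file.
LASTMOD=$(curl -sI "http://remote.server.com/remote.html" |
          tr -d '\r' | sed -n 's/^[Ll]ast-[Mm]odified: //p')
[ -n "$LASTMOD" ] && touch -d "$LASTMOD" local.html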

Is Java applicable in your case? I did a similar thing in one of my homework assignments using the Apache HttpCore library. Add the If-Modified-Since header to your HTTP request before you send it to the server; if the status code of the response is not 304, then you know the file has changed since the time value you're checking against.

Related

Is there a way to set the http Header values for an esp_https_ota call?

I'm trying to download a firmware.bin file that is produced in a private GitHub repository. I have code that finds the right asset URL to download the file, and per GitHub's instructions the Accept header needs to be set to application/octet-stream in order to get the binary file. I'm only getting JSON in response. If I run the same request through Postman, I get a binary file as the body.
I've tried downloading it using HTTPClient and I get the same JSON response. It seems the headers aren't being set as requested, so GitHub is never told to send the binary content. As for the ArduinoOTA abstraction, I can't see how to even try to set headers, and digging into the esp_https_ota and http_client functions, there doesn't appear to be a way to set headers for any of these higher-level abstractions, because the http_config object has no place for headers as far as I can tell. I might file a feature request to allow for this, but I'm new to this programming area and want to check whether I'm missing something first.
The code below returns JSON, not binary. The URL is the GitHub REST API URL for the asset (it works in Postman):
HTTPClient http2;
http2.setAuthorization(githubname,githubpass);
http2.addHeader("Authorization","token MYTOKEN");
http2.addHeader("accept","application/octet-stream");
http2.begin( firmwareURL, GHAPI_CERT); //Specify the URL and certificate
With the ESP-IDF HTTP client you can add headers to an initialized HTTP client using the function esp_http_client_set_header().
esp_http_client_handle_t client = esp_http_client_init(&config);
esp_http_client_set_header(client, "HeaderKey", "HeaderValue");
esp_err_t err = esp_http_client_perform(client);
If using the HTTPS OTA API, you can register a callback which gives you a handle to the underlying HTTP client. You can then do exactly the same as in the example above.
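A sketch of that callback route, assuming a recent ESP-IDF where esp_https_ota_config_t exposes http_client_init_cb; firmwareURL, GHAPI_CERT, and the token are placeholders taken from the question:
#include "esp_http_client.h"
#include "esp_https_ota.h"

extern const char *firmwareURL;  // asset URL from the question
extern const char GHAPI_CERT[];  // GitHub certificate from the question

// Runs right after the OTA client is created, before the request is sent.
static esp_err_t http_client_init_cb(esp_http_client_handle_t http_client)
{
    // Placeholder token; use your real GitHub token here.
    esp_http_client_set_header(http_client, "Authorization", "token MYTOKEN");
    esp_http_client_set_header(http_client, "Accept", "application/octet-stream");
    return ESP_OK;
}

void start_ota(void)
{
    esp_http_client_config_t http_config = {
        .url = firmwareURL,
        .cert_pem = GHAPI_CERT,
    };
    esp_https_ota_config_t ota_config = {
        .http_config = &http_config,
        .http_client_init_cb = http_client_init_cb,
    };
    esp_err_t err = esp_https_ota(&ota_config);  // blocks until OTA finishes
    // handle err ...
}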

Curl redirect without sending the first POST

I'm using "curl -L --post302 -request PUT --data-binary #file " to post a file to a redirected address. At the moment the redirection is not optional since it will allow for signed headers and a new destination. The GET version works well. The PUT version under a certain file size threshold works also. I need a way for the PUT to allow itself to be redirected without sending the file on the first request (to the redirectorURL) and then only send the file when the POST is redirected to a new URL. In other words, I don't want to transfer the same file twice. Is this possible? According to the RFC (https://www.rfc-editor.org/rfc/rfc2616#section-8.2) it appears that a server may send a 100 "with an undeclared wait for 100 (Continue) status, applies only to HTTP/1.1 requests without the client asking to send its payload" so what I'm asking for may be thwarted by the server. Is there a way around this with one curl call? If not, two curl calls?
Try curl -L -T file $URL as the more "proper" way to PUT that file. (Often repeated by me: -X and --request should be avoided if possible; they cause misery.)
curl will use "Expect: 100-continue" by itself in this case, but you'll also probably learn that servers widely don't care about supporting it anyway, so it'll most likely still end up having to PUT twice...
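For the two-call fallback, a hedged sketch: resolve the redirect yourself with a body-less PUT, then PUT the file once, straight to the final URL. $REDIRECTOR_URL is a placeholder, %{redirect_url} needs a reasonably recent curl, and whether the redirector accepts an empty PUT is an assumption:
# Call 1: no -L, no file; just capture where the redirect points.
DEST=$(curl -s -o /dev/null -w '%{redirect_url}' -T /dev/null "$REDIRECTOR_URL")
# Call 2: transfer the file a single time, to the final destination.
curl -T file "$DEST"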

Using If-Modified-Since header for dynamically generated remote files

Our web server regularly downloads images from other web servers. To prevent our server from having to download the same image every day even if it has not changed, I plan to store the Last-Modified header when the image downloads and then put that date in the If-Modified-Since header of subsequent requests for the same file.
I have this working fine except when the remote file is generated on the fly at request time (e.g. when the remote server renders a certain sized version for the web from a separate original file). In that case the Last-Modified header is the date the remote server responded to the request, so the stored Last-Modified from the previous download will always be earlier than the one on subsequent requests; the image will always get downloaded and I'll never get the 304 Not Modified status code.
So, is there a way to reduce the download frequency when the remote server is serving up images that are generated on the fly?
It sounds to me like this is not possible, but I thought I'd ask anyway.
If you can create some form of hash for the images, use ETags. Your server will have to check the If-None-Match request header against the hash, and if they match you can return a 304 response.
Clients will still send If-Modified-Since, but if your hashing method does not generate many collisions you should be able to ignore it and just match the ETags.
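On the downloading side, newer curl can manage the ETag round-trip for you. A sketch, assuming curl 7.68+ (the URL is made up; with recent versions a missing or empty ETag file simply means no condition is sent):
# --etag-save stores the response's ETag; --etag-compare replays it as
# If-None-Match on the next run, so an unchanged image yields a 304 and
# no body transfer.
curl -s -o image.jpg --etag-compare image.etag --etag-save image.etag \
     "http://images.example.com/image.jpg"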

What tool should I use to fetch HTTP header of a remote web server?

I am basically looking for a simpler, cURL-like tool that fetches the HTTP header without the body. I'm not interested in downloading the body; I noticed cURL seems to download it, which consumes unnecessary bandwidth for my needs.
Use the -I flag with curl to make it issue a HEAD request, i.e., fetch just the headers.
(The HEAD response is not guaranteed to be exactly the same as the GET headers, but it is supposed to be.)
If you are using the libcurl library, the curl_easy_setopt() function has a CURLOPT_NOBODY option available, which causes libcurl to send a HEAD request to download just the headers, instead of a GET request that downloads the entire body.
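A minimal libcurl sketch of that option (the URL is a placeholder):
#include <curl/curl.h>

int main(void)
{
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL *curl = curl_easy_init();
    if (curl) {
        curl_easy_setopt(curl, CURLOPT_URL, "http://remote.server.com/remote.html");
        curl_easy_setopt(curl, CURLOPT_NOBODY, 1L);  // HEAD request: no body
        curl_easy_setopt(curl, CURLOPT_HEADER, 1L);  // print headers to stdout
        curl_easy_perform(curl);
        curl_easy_cleanup(curl);
    }
    curl_global_cleanup();
    return 0;
}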

nginx resumable upload with upload_module and multipart/form

I currently upload to a webservice on an nginx server using the upload module (http://www.grid.net.ru/nginx/upload.en.html) from a custom desktop application doing a simple multipart-form POST that sends a file in one part and a base64 encoded XML with the file's metadata in another part.
The server receives this POST, passes it to my webservice which reads the metadata, processes the file and all is good.
What I want to do now is use the upload module's upload_resumable directive to do the POST in several chunks to minimize disconnection chances and allow resume. I can currently do this following the protocol described here: http://www.grid.net.ru/nginx/resumable_uploads.en.html
One sends byte ranges of the file, along with some headers that identify the chunk and the session, in several POSTs; once all the parts have been uploaded, nginx composes the final POST containing the file name and path and passes it to your upload_pass location (which in my case CGIs to a Django app).
However, I am not clear on how one would send a multipart POST with this method, since the protocol requires that the body of each POST be exactly the bytes of the indicated byte range. I need the final POST to also contain the XML I wrote about above.
I can think of sending the XML as the first bytes of the body, plus a header indicating how many bytes belong to it, but that would mean extra handling of the final file to strip that prefix, and the final files are potentially in the GB size range.
Any other ideas?
Since the protocol supported by nginx specifically states that the POST must not be multipart, I ended up sending the file in the body and the rest of the parameters encoded in the URL. Not the prettiest URLs, but it works.
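A sketch of what one chunk looks like on the wire, following the module's documented resumable protocol and pushing the metadata into the query string as described. Every value below is a placeholder (host, session id, byte ranges, file names), and the metadata value must be URL-encoded:
# Chunk 1 of 3: bytes 0-1048575 of a 3145728-byte file.
curl "http://example.com/upload?meta=PGZpbGUvPg%3D%3D&name=big.bin" \
     -H 'Content-Type: application/octet-stream' \
     -H 'Content-Disposition: attachment; filename="big.bin"' \
     -H 'X-Content-Range: bytes 0-1048575/3145728' \
     -H 'Session-ID: 1111215056' \
     --data-binary @chunk-0.bin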
