Is If-Modified-Since strong or weak validation?

HTTP 1.1 states that ETag/If-None-Match validation can be either strong or weak. My question is: is Last-Modified/If-Modified-Since validation strong or weak?
This has implications for whether sub-range requests can be made or not.

From http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p5-range-23.html#rfc.section.4.3:
"A response might transfer only a subrange of a representation if the connection closed prematurely or if the request used one or more Range specifications. After several such transfers, a client might have received several ranges of the same representation. These ranges can only be safely combined if they all have in common the same strong validator, where "strong validator" is defined to be either an entity-tag that is not marked as weak (Section 2.3 of [Part4]) or, if no entity-tag is provided, a Last-Modified value that is strong in the sense defined by Section 2.2.2 of [Part4]."

An ETag can be strong or weak depending on its prefix: a weak ETag is marked with "W/". Normally it will be strong, except for dynamic content where the content management system (CMS) marks it as weak, which is IMHO very uncommon.
The result of If-Modified-Since validation should be strong too, if and only if nobody manipulates the metadata of the files in the filesystem. On Linux that is trivial to do with the touch command, but normally you don't need to care about it; if somebody can manipulate your server, you have a different problem entirely.
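Concretely, RFC 7232 (section 2.2.2) lets a recipient treat a Last-Modified value as a strong validator when it is at least 60 seconds older than the response's Date header, because the one-second resolution of HTTP dates can then no longer hide a second change within the same second. A minimal sketch of that check (the function name is my own):

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime

def last_modified_is_strong(last_modified: str, date: str) -> bool:
    """Heuristic from RFC 7232, section 2.2.2: a Last-Modified value may be
    treated as a strong validator if it is at least 60 seconds before the
    Date header of the same response."""
    lm = parsedate_to_datetime(last_modified)
    dt = parsedate_to_datetime(date)
    return dt - lm >= timedelta(seconds=60)
```

Under this rule, a freshly modified file yields only a weak Last-Modified validator, which is one reason combining sub-ranges is only safe once the timestamp has aged.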

Related

Why is the Nginx etag created from last-modified-time and content-length?

Nginx etag source
etag->value.len = ngx_sprintf(etag->value.data, "\"%xT-%xO\"",
                              r->headers_out.last_modified_time,
                              r->headers_out.content_length_n)
                  - etag->value.data;
r->headers_out.etag = etag;
If the file's last-modified time on the server is changed but the file content has not been updated, will the etag value be the same?
Why not the etag value generated by content hash?
Why not the etag value generated by content hash?
Unless nginx has documented the reason, it's hard to say why.
My speculation is that they did it this way because it's very fast and takes only a constant amount of time. Computing a hash can be a costly operation whose running time depends on the size of the response. nginx, with a reputation for simplicity and speed, may not have been willing to add that overhead.
If the file's last-modified time on the server is changed but the file content has not been updated, will the etag value be the same?
No, it will not be the same and therefore the file will have to be re-served. The result is a slower response than you would get with a hash-based ETag, but the response will be correct.
The bigger concern with this algorithm is that the content could change while the ETag stays the same, in which case the response will be incorrect. This could happen if the file changes (in a way that keeps the same length) faster than the one-second precision of the Last-Modified time. (In theory a hash-based approach has the same issue—that is, it's possible for two different files to produce the same hash—but collisions are so unlikely that it's not a concern in practice.)
So presumably nginx weighed this tradeoff—a faster response, but one that has a slight chance of being incorrect—and decided that it was worth it.
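For illustration, the "%xT-%xO" format above can be mimicked in a few lines of Python (an approximation, not nginx's actual code): the hex-encoded modification time, a dash, and the hex-encoded size. It makes clear why touch(1) alone changes the ETag even when the content is identical.

```python
import os

def nginx_style_etag(path: str) -> str:
    # Approximation of nginx's ETag format: hex(mtime) "-" hex(size), quoted.
    st = os.stat(path)
    return '"%x-%x"' % (int(st.st_mtime), st.st_size)
```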

How to determine if a DAV folder had parallel updates while I was modifying it

I'm syncing local client with a DAV folder (CardDAV in this particular case).
For this folder, I have an ETag (a CTag in SabreDAV dialect, to distinguish folder ETags from item ETags). If the CTag has changed, I need to resync. But if this change was caused by myself (e.g. I just uploaded a contact into this CardDAV folder), isn't there any way to avoid the resync?
Ideally, I wanted the DAV server to return this info on each request which changes anything on the server:
CTag1, CTag of the current folder as it was before my action was applied
CTag2, CTag of the current folder after my action was applied
ETag assigned to the item in question (although it's not relevant to this particular question).
This would let me understand if CTag change was only caused by my own actions (and no resync needed) or something else occurred in between (and thus resync is needed).
Currently, I can only query the folder for its CTag at any time but I have no clue what to do if CTag changed (in pseudo-code):
cTag0 = ReadStoredValue() ' The value left from the previous sync.
cTag1 = GetCTag()
If cTag0 <> cTag1 Then
    Resync()
End If
UploadItem() ' Can get a race condition if another client changes anything right now.
cTag2 = GetCTag()
cTag2 will obviously not be the same as cTag1, but that provides zero information on whether something else happened in the middle (another client changing something in the same folder). So the cTag0 <> cTag1 comparison won't save me from race conditions: I could think that I'm in sync while some other update sneaked by unnoticed.
Would be great to have:
cTag0 = ReadStoredValue() ' The value left from the previous sync.
(cTag1, cTag2) = UploadItem()
If cTag0 == cTag1 Then
    ' No resync needed, just remember the new CTag for the next sync cycle.
    cTag0 = cTag2
Else
    Resync()
    cTag0 = cTag2
End If
I'm aware of DAV-Sync protocol extension but this would be a different story. In this task, I'm referring to the standard DAV, no extensions allowed.
EDIT: One thought that just crossed my mind. I noticed that the CTag is sequential: it's a number which gets incremented by 1 on each operation on the folder. So if it increases by more than 1 between obtaining the CTag, making my change, and obtaining the CTag again, that would indicate something else has occurred. But this does not seem reliable; I'm afraid it's too implementation-specific to rely on this behavior. Looking for a more robust solution.
How to determine if a DAV folder had parallel updates while I was modifying it
This is very similar to
How to avoid time conflict or overlap for CalDAV?
Technically, in pure DAV you are not guaranteed to be able to do this. In the real world, though, most servers will return the ETag in the response to the PUT that was used to create or update the resource. This allows you to reconcile concurrent changes to the same resource.
There is also the Calendar Server Bulk Change Requests for *DAV Protocols extension, which is supported by some servers and provides a more specific way to do this.
Since it isn't an RFC, I wouldn't suggest relying on it, though.
So what you would probably do is a PUT. If that returns you the ETag, you are good and can reconcile by syncing the collection (by whatever mechanism: PROPFIND depth 1, CTag, or sync-report). If not, you have the option of either reconciling by other means (e.g. comparing/hashing the content) or just treating the change as a concurrent edit, which I think most implementations do.
If you are very lucky, the server may also return the CTag/sync-token in the PUT. But AFAIK there is no standard for that, servers are not required to do it.
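The reconcile-on-PUT flow described above can be sketched with a toy in-memory model. Everything here (FakeDavServer, upload_and_reconcile, and the ETag-on-PUT behavior) is an assumption for illustration; real servers differ, and not all of them return an ETag on PUT.

```python
class FakeDavServer:
    """Toy stand-in for a DAV collection; not a real DAV implementation."""
    def __init__(self):
        self.ctag = 0
        self.items = {}

    def put(self, href: str, body: str):
        self.ctag += 1
        etag = '"%d"' % self.ctag
        self.items[href] = (etag, body)
        return etag  # many real servers return the ETag on PUT, but not all

def upload_and_reconcile(server, local: dict, href: str, body: str) -> bool:
    """Upload an item; return True when a full resync is still needed."""
    etag = server.put(href, body)
    if etag is not None:
        local[href] = (etag, body)  # reconciled locally, no resync needed
        return False
    return True  # no ETag returned: treat as a concurrent edit and resync
```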
For this folder, I have ETag (CTag in SabreDAV dialect)
This is a misconception of yours. A CTag is absolutely not the same as an ETag; it is its own thing, documented over here:
CalDAV CTag.
I'm aware of DAV-Sync protocol extension but this would be a different story. In this task, I'm referring to the standard DAV, no extensions allowed.
CTag is not a DAV standard at all, it is a private Apple extension (there is no RFC for that).
Standard HTTP/1.1 specifies the ETag. It corresponds to the resource representation and doesn't apply to WebDAV collection contents, which are distinct from that. WebDAV collections often also have contents of their own (that can be retrieved by GET etc.); the ETag corresponds to those.
The official standard which replaces the proprietary CTag extension is in fact DAV-Sync, aka RFC 6578. Its sync-token property and header are what replace the CTag.
So if "no extensions allowed" is your use case, you need to do resource comparison on the client side. Pure WebDAV doesn't provide this capability.
I noticed that CTag is sequential
CTags are not sequential; they are opaque tokens. A specific server may happen to use a sequence, but that is completely arbitrary. (The same is true for all DAV tokens: they are always opaque.)

Generating a multipart/byterange response without scanning the parts ahead of sending

I would like to generate a multipart byte range response. Is there a way for me to do it without scanning each segment I am about to send out, since I need to generate multipart boundary strings?
For example, a user can request a byte range that would have me fetch and scan 2 GB of data, which in my case involves loading that data into my (slow) VM as strings and so forth. Ideally I would like to simply state in the response that a part has a length of a certain number of bytes, and be done with it. Is there any tooling that could provide me with this option? I see that many developers just grab a UUID as the boundary and are probably willing to risk the tiny probability that it appears somewhere within a part, but that risk seems to be small enough that multiple people are taking it?
To explain in more detail: scanning the parts ahead of time (before generating the response) is not really feasible in my case since I need to fetch them via HTTP from an upstream service. This means that I effectively have to prefetch the entire part first to compute a non-matching multipart boundary, and only then can I splice that part into the response.
Assuming the data can be arbitrary, I don’t see how you could guarantee absence of collisions without scanning the data.
If the format of the data is very limited (like... base 64 encoded?), you may be able to pick a boundary that is known to be an illegal sequence of bytes in that format.
Even if your boundary does collide with the data, it would have to be followed by headers such as Content-Range, which makes a false match even more improbable, so the client is likely to treat it as an error rather than consume the wrong data.
Major Web servers use very simple strategies. Apache grabs 8 random bytes at startup and renders them in hexadecimal. nginx uses a sequential counter left-padded with zeroes.
UUIDs are designed to avoid collisions with other UUIDs, not with arbitrary data. A UUID is no more likely to be a good boundary than a completely random string of the same length. Moreover, some UUID variants include information that you may not want to disclose, such as your machine’s MAC address.
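A sketch of the Apache-style strategy mentioned above, plus a part-header builder that needs only the byte offsets and total size, never the part contents. The function names are my own; the header layout follows the multipart/byteranges example in RFC 7233, Appendix A.

```python
import secrets

def make_boundary() -> str:
    # 16 random bytes rendered as hex; a collision with arbitrary part data
    # is possible in principle but negligible in practice.
    return secrets.token_hex(16)

def part_header(boundary: str, ctype: str, start: int, end: int, total: int) -> bytes:
    # Headers introducing one part. The part's length is implied by the
    # Content-Range values, so the body itself can be streamed unscanned.
    return ("\r\n--%s\r\n"
            "Content-Type: %s\r\n"
            "Content-Range: bytes %d-%d/%d\r\n\r\n"
            % (boundary, ctype, start, end, total)).encode("ascii")
```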
Ideally I would like to simply state in the response that a part has a length of a certain number of bytes, and be done with it. Is there any tooling that could provide me with this option?
Maybe you can avoid supporting multiple ranges and simply tell the clients to request each range separately. In that case, you don’t use the multipart format, so there is no problem.
If you do want to send multiple ranges in one response, then RFC 7233 requires the multipart format, which requires the boundary string.
You can, of course, invent your own mechanism instead of that of RFC 7233. In that case:
You cannot use 206 (Partial Content). You must use 200 (OK) or some other applicable status code.
You cannot use the multipart/byteranges media type. You must come up with your own media type.
You cannot use the Range request header.
Because a 200 (OK) response to a GET request is supposed to carry a (full) representation of the resource, you must do one of the following:
encode the requested ranges in the URL; or
use something like POST instead of GET; or
use a custom, non-standard status code instead of 200 (OK); or
(not sure if this is a correct approach) use media type parameters, send them in Accept, and add Accept to Vary.
The chunked transfer coding may be useful, but you cannot rely on it alone, because it is a property of the connection, not of the payload.

Understand the weak comparison function

HTTP 1.1 defines a weak comparison function for cache validators:
"in order to be considered equal, both validators MUST be identical in every way, but either or both of them MAY be tagged as "weak" without affecting the result."
I understand that the following statement (for two ETags) is true:
W/"Foo" = "Foo"
Now I'm wondering what real world use case might exist where a server compares a weak ETag against a strong one.
There are cases where servers first assign a weak ETag and later promote it to a strong ETag (by removing the "W/" prefix). An example is Apache mod_dav (or is it plain httpd?) when configured to create entity tags based on the filesystem timestamp of the file being served.
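The quoted rule is short enough to transcribe directly (a sketch; real servers parse validators more carefully):

```python
def weak_etag_compare(a: str, b: str) -> bool:
    # Strip an optional "W/" prefix from each validator, then compare
    # what remains byte for byte.
    strip = lambda tag: tag[2:] if tag.startswith("W/") else tag
    return strip(a) == strip(b)
```

With this function, weak_etag_compare('W/"Foo"', '"Foo"') evaluates to True, matching the statement above.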

How do you determine HTTP request parameter order when calculating HMACs?

I'm writing a Web service that is going to use HMAC for authentication. Quick overview: an HMAC is a message digest calculated from the body of a message along with a secret key. The sender calculates the HMAC and attaches it to the request; the receiver recalculates the digest on receipt using the secret key it has on file. If the digests match, the receiver can be sure that the message was sent by the party it claims to come from.
My question is about the parameter order. Let's say the Web service request has three parameters, foo, bar and baz. The body of the HTTP POST will look something like:
foo=1&bar=2&baz=3&hmac=de7c9b85b8b78aa6bc8a7a36f70a90701c9db4d9
(The HMAC in this case is a fake example.)
Normally HTTP parameter order is not significant, but when it comes to calculating the hash, it is. Should the server take the raw incoming request, drop the "hmac" parameter which is, of course, not part of the hash calculation, and hash that? Or should there be an agreed upon order of parameters which must be followed in order for the hash to be calculated correctly?
The former approach puts a bit more of a burden on the implementor on the server side, but it's more robust. What I'm really asking about is the expectation of developers who are building things on the client side. Do they expect that things will just work regardless of the order of the parameters?
I would say that manipulating the body of the request after you have calculated a hash based on that body (a hash which determines whether the request is accepted) is generally bad practice, for reasons that, I feel, are obvious. The HMAC should not be appended to the request body, but set in either a GET parameter, a cookie, or a custom header.
This also reduces the burden on the implementor on the server side for your first suggestion, and this is the path I would recommend.
But that's me, others may have differing opinions on all of this...
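For what it's worth, the "agreed upon order" option usually means canonicalizing before signing: sort the parameters by name, exclude the hmac parameter itself, re-encode, and sign the resulting string. A sketch under those assumptions (sign/verify are hypothetical names, and the sort-then-urlencode convention is one choice among many; client and server just have to apply the exact same canonicalization):

```python
import hashlib
import hmac
from urllib.parse import urlencode

def sign(params: dict, key: bytes) -> str:
    # Canonical form: parameters sorted by name, "hmac" itself excluded.
    canonical = urlencode(sorted((k, v) for k, v in params.items() if k != "hmac"))
    return hmac.new(key, canonical.encode("ascii"), hashlib.sha256).hexdigest()

def verify(params: dict, key: bytes) -> bool:
    # Constant-time comparison to avoid leaking digest prefixes.
    return hmac.compare_digest(params.get("hmac", ""), sign(params, key))
```

Because the signature is computed over the canonical form, the client may send foo, bar, and baz in any order and verification still succeeds.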
