How to determine if a DAV folder had parallel updates while I was modifying it - webdav

I'm syncing a local client with a DAV folder (CardDAV in this particular case).
For this folder, I have ETag (CTag in SabreDAV dialect, used to distinguish folder ETags from item ETags). If the CTag has changed, I need to resync. But if the change was caused by me (e.g. I just uploaded a contact into this CardDAV folder), is there any way to avoid a resync?
Ideally, I would like the DAV server to return the following on each request that changes anything on the server:
CTag1, CTag of the current folder as it was before my action was applied
CTag2, CTag of the current folder after my action was applied
ETag assigned to the item in question (although it's not relevant to this particular question).
This would tell me whether the CTag change was caused only by my own action (no resync needed) or whether something else happened in between (resync needed).
Currently, I can query the folder for its CTag at any time, but I have no idea what to do if the CTag has changed (in pseudo-code):
cTag0 = ReadStoredValue() ' The value left from the previous sync.
cTag1 = GetCTag()
If cTag0 <> cTag1 Then
Resync()
End If
UploadItem() ' Race condition possible if another client changes anything right now
cTag2 = GetCTag()
cTag2 will obviously differ from cTag1, but this provides no information on whether something else happened in the meantime (another client changing something in the same folder). So the cTag0 <> cTag1 comparison won't save me from race conditions: I could think I'm in sync while some other update slipped through unnoticed.
It would be great to have:
cTag0 = ReadStoredValue() ' The value left from the previous sync.
(cTag1, cTag2) = UploadItem()
If cTag0 = cTag1 Then
' No resync needed, just remember new CTag for the next sync cycle.
cTag0 = cTag2
Else
Resync()
cTag0 = cTag2
End If
I'm aware of the DAV-Sync protocol extension, but that would be a different story. In this task, I'm referring to standard DAV; no extensions allowed.
EDIT: One thought that just crossed my mind: I noticed that CTag is sequential. It's a number that gets incremented by 1 on each operation on the folder. So if it increased by more than 1 between reading the CTag, performing my action and reading the CTag again, that would indicate something else happened in the meantime. But this doesn't seem reliable; I'm afraid it's too implementation-specific to depend on. I'm looking for a more robust solution.

How to determine if a DAV folder had parallel updates while I was modifying it
This is very similar to: How to avoid time conflict or overlap for CalDAV?
Technically, pure DAV doesn't guarantee that you can do this. In the real world, though, most servers return the ETag in the response to the PUT that created or updated the resource. This allows you to reconcile concurrent changes to the same resource.
There is also "Calendar Server Bulk Change Requests for *DAV Protocols", which is supported by some servers and provides a more specific way to do this. Since it isn't an RFC, though, I wouldn't suggest relying on it.
So what you would probably do is a PUT. If it returns the ETag, you are good and can reconcile by syncing the collection (by whatever mechanism: a Depth: 1 PROPFIND, CTag, or sync-collection REPORT). If not, you can either reconcile by other means (e.g. comparing or hashing the content) or simply treat the change as a concurrent edit, which I think most implementations do.
If you are very lucky, the server may also return the CTag or sync-token in the PUT response. But AFAIK there is no standard for that, so servers are not required to do it.
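The reconciliation step can be sketched as follows, assuming you keep an href → ETag snapshot from the last sync, record the ETag returned by each of your own PUTs (or None for your own DELETEs), and can fetch a fresh href → ETag listing of the collection. All names are hypothetical, and the sketch assumes the server reports the same ETag string in the PUT response and in the listing, which is not guaranteed:

```python
from typing import Dict, Optional, Set


def foreign_changes(previous: Dict[str, str], current: Dict[str, str],
                    own_writes: Dict[str, Optional[str]]) -> Set[str]:
    """Return the hrefs that changed for reasons other than our own requests.

    previous:   href -> ETag recorded at the end of the last sync
    current:    href -> ETag from a fresh listing of the collection
    own_writes: href -> ETag returned by our own PUT (None for our own DELETE)
    """
    foreign = set()
    for href in previous.keys() | current.keys():
        now = current.get(href)  # None if the item no longer exists
        if previous.get(href) == now:
            continue  # untouched since the last sync
        if href in own_writes and own_writes[href] == now:
            continue  # exactly the state our own PUT/DELETE left behind
        foreign.add(href)
    return foreign
```

An empty result means every difference since the last sync is explained by your own requests, so only the stored snapshot needs updating; anything else has to be fetched again.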
For this folder, I have ETag (CTag in SabreDAV dialect)
This is a misconception on your part. A CTag is absolutely not the same as an ETag; it is its own thing, documented here: CalDAV CTag.
I'm aware of the DAV-Sync protocol extension, but that would be a different story. In this task, I'm referring to standard DAV; no extensions allowed.
CTag is not a DAV standard at all; it is a private Apple extension (there is no RFC for it).
Standard HTTP/1.1 specifies the ETag. It corresponds to the resource representation and doesn't apply to the contents of a WebDAV collection, which are a separate thing. WebDAV collections often also have a representation of their own (which can be retrieved with GET, etc.); the ETag corresponds to that.
The official standard that replaces the proprietary CTag extension is in fact DAV-Sync, aka RFC 6578, and its DAV:sync-token property is what replaces the CTag property.
So if "no extensions allowed" is your use case, you need to do resource comparison on the client side. Pure WebDAV doesn't provide this capability.
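A minimal sketch of that client-side comparison, using only standard WebDAV: PROPFIND the collection with Depth: 1 asking for DAV:getetag, map each href to its ETag, and diff against the snapshot from the last sync. The function names are hypothetical, error handling is omitted, and real code should normalize hrefs (servers may percent-encode them differently):

```python
import xml.etree.ElementTree as ET
from typing import Dict, Set, Tuple

DAV = "{DAV:}"

# Request body to send with method PROPFIND and header "Depth: 1".
PROPFIND_BODY = (b'<?xml version="1.0" encoding="utf-8"?>\n'
                 b'<d:propfind xmlns:d="DAV:"><d:prop><d:getetag/></d:prop></d:propfind>')


def etags_from_multistatus(body: bytes) -> Dict[str, str]:
    """Map each href in a 207 multistatus body to its DAV:getetag value."""
    etags = {}
    for response in ET.fromstring(body).iter(DAV + "response"):
        href = (response.findtext(DAV + "href") or "").strip()
        for propstat in response.findall(DAV + "propstat"):
            status = propstat.findtext(DAV + "status") or ""
            etag = propstat.findtext(DAV + "prop/" + DAV + "getetag")
            if href and etag and " 200 " in status:  # skip 404 propstats
                etags[href] = etag.strip()
    return etags


def diff_snapshots(old: Dict[str, str],
                   new: Dict[str, str]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (added, removed, changed) hrefs between two snapshots."""
    added = set(new.keys() - old.keys())
    removed = set(old.keys() - new.keys())
    changed = {h for h in old.keys() & new.keys() if old[h] != new[h]}
    return added, removed, changed
```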
I noticed that CTag is sequential
CTags are not sequential; they are opaque tokens. A specific server may use a sequence, but that is completely arbitrary (the same is true for all DAV tokens: they are always opaque).

Related

Generating a multipart/byterange response without scanning the parts ahead of sending

I would like to generate a multipart byte range response. Is there a way for me to do it without scanning each segment I am about to send out, since I need to generate multipart boundary strings?
For example, a user could request a byte range that would have me fetch and scan 2GB of data, which in my case involves loading that data into my (slow) VM as strings and so forth. Ideally I would like to simply state in the response that a part has a length of a certain number of bytes, and be done with it. Is there any tooling that could provide me with this option? I see that many developers just grab a UUID as the boundary and are presumably willing to accept a tiny probability that it appears somewhere within the part; is that risk small enough that multiple people are taking it?
To explain in more detail: scanning the parts ahead of time (before generating the response) is not really feasible in my case since I need to fetch them via HTTP from an upstream service. This means that I effectively have to prefetch the entire part first to compute a non-matching multipart boundary, and only then can I splice that part into the response.
Assuming the data can be arbitrary, I don’t see how you could guarantee the absence of collisions without scanning the data.
If the format of the data is very limited (like... Base64-encoded?), you may be able to pick a boundary that is known to be an illegal sequence of bytes in that format.
Even if your boundary does occur in the data, a real delimiter must be followed by part headers such as Content-Range, and an accidental match followed by such headers is even more improbable, so the client is likely to treat it as an error rather than consume the wrong data.
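A small sketch of that idea, assuming you know the part data consists only of characters from a restricted alphabet (the alphabets and the function name are illustrative): if the delimiter line contains any character the data can never contain, it cannot collide.

```python
import string

# Standard Base64 alphabet (RFC 4648, section 4) and the URL-safe variant
# (section 5), each including the "=" padding character.
BASE64 = frozenset(string.ascii_letters + string.digits + "+/=")
BASE64URL = frozenset(string.ascii_letters + string.digits + "-_=")


def delimiter_cannot_occur(boundary: str, alphabet: frozenset) -> bool:
    """True if part data made only of `alphabet` characters (plus CR/LF line
    breaks) can never contain the delimiter line "--" + boundary."""
    allowed = alphabet | {"\r", "\n"}
    return any(ch not in allowed for ch in "--" + boundary)
```

For standard Base64 this holds for any boundary, because the "--" that starts every delimiter line is not in the alphabet; for the URL-safe variant, which does use "-", you need a boundary containing a character such as "." or ":".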
Major Web servers use very simple strategies. Apache grabs 8 random bytes at startup and renders them in hexadecimal. nginx uses a sequential counter left-padded with zeroes.
UUIDs are designed to avoid collisions with other UUIDs, not with arbitrary data. A UUID is no more likely to be a good boundary than a completely random string of the same length. Moreover, some UUID variants include information that you may not want to disclose, such as your machine’s MAC address.
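To put a rough number on "small enough", here is a union bound under stated assumptions: the boundary carries b bits of uniform randomness (rendered so that distinct random values give distinct strings), and the part data does not depend on the boundary, i.e. it was not crafted by someone who has seen it (note that a boundary fixed at server startup is visible in every response). The boundary can start at fewer than N positions in an N-byte part, and each position matches with probability at most 2^{-b}:

$$
\Pr[\text{boundary appears in the part}] \;\le\; N \cdot 2^{-b}
$$

$$
N = 2^{31}\ (2\,\text{GiB}),\ b = 64\ (8\ \text{random bytes}):\quad 2^{31} \cdot 2^{-64} = 2^{-33} \approx 1.2 \times 10^{-10}
$$

$$
N = 2^{31},\ b = 122\ (\text{random version 4 UUID}):\quad 2^{31} \cdot 2^{-122} = 2^{-91} \approx 4 \times 10^{-28}
$$

Only the random bits enter the bound, which is why a UUID is no better than a random string with the same number of random bits; a sequential counter has no randomness, so the bound says nothing about it.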
Ideally I would like to simply state in the response that a part has a length of a certain number of bytes, and be done with it. Is there any tooling that could provide me with this option?
Maybe you can avoid supporting multiple ranges and simply tell the clients to request each range separately. In that case, you don’t use the multipart format, so there is no problem.
If you do want to send multiple ranges in one response, then RFC 7233 requires the multipart format, which requires the boundary string.
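A minimal Python sketch of such a response body, assuming a hypothetical fetch(first, last) callback that streams each range from the upstream service. It never looks at the part data: it picks a random boundary up front and accepts the small residual collision risk. It also shows that the part lengths, and therefore the whole Content-Length, are known before any data is fetched:

```python
import secrets
from typing import Callable, Iterable, Iterator, Sequence, Tuple

Fetch = Callable[[int, int], Iterable[bytes]]


def make_boundary() -> str:
    # 16 random bytes as hex: 128 bits of randomness, no MAC address inside.
    return secrets.token_hex(16)


def multipart_byteranges(ranges: Sequence[Tuple[int, int]], complete_length: int,
                         content_type: str, fetch: Fetch,
                         boundary: str) -> Iterator[bytes]:
    """Yield a multipart/byteranges body; part data is streamed, never scanned.

    `fetch(first, last)` streams the bytes of the inclusive range [first, last].
    """
    delimiter = b"--" + boundary.encode("ascii")
    for i, (first, last) in enumerate(ranges):
        # The CRLF before each later delimiter belongs to the delimiter itself.
        yield (b"\r\n" if i else b"") + delimiter + b"\r\n"
        yield (f"Content-Type: {content_type}\r\n"
               f"Content-Range: bytes {first}-{last}/{complete_length}\r\n\r\n"
               ).encode("ascii")
        yield from fetch(first, last)
    yield b"\r\n" + delimiter + b"--\r\n"  # close delimiter


def multipart_body_length(ranges: Sequence[Tuple[int, int]], complete_length: int,
                          content_type: str, boundary: str) -> int:
    """Content-Length of the body above, computed without touching any data."""
    framing = sum(len(chunk) for chunk in multipart_byteranges(
        ranges, complete_length, content_type, lambda first, last: (), boundary))
    return framing + sum(last - first + 1 for first, last in ranges)
```

The body would be sent with status 206 and Content-Type: multipart/byteranges; boundary=<the boundary>. The length calculation only holds if fetch really returns exactly last - first + 1 bytes per range.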
You can, of course, invent your own mechanism instead of that of RFC 7233. In that case:
You cannot use 206 (Partial Content). You must use 200 (OK) or some other applicable status code.
You cannot use the multipart/byteranges media type. You must come up with your own media type.
You cannot use the Range request header.
Because a 200 (OK) response to a GET request is supposed to carry a (full) representation of the resource, you must do one of the following:
encode the requested ranges in the URL; or
use something like POST instead of GET; or
use a custom, non-standard status code instead of 200 (OK); or
(not sure if this is a correct approach) use media type parameters, send them in Accept, and add Accept to Vary.
The chunked transfer coding may be useful, but you cannot rely on it alone, because it is a property of the connection, not of the payload.
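If you go the custom route, the part lengths can simply be declared up front, which is what the question was after. A minimal sketch of such a framing, entirely made up here (the layout, field sizes and function names are assumptions, not any standard), to be served under your own media type as described above:

```python
import struct
from typing import Iterable, Iterator, List, Tuple

# Made-up framing (NOT multipart/byteranges): each part is a fixed 24-byte
# header of three big-endian unsigned 64-bit integers (first, last, complete
# length) followed by exactly last - first + 1 data bytes. No boundary needed.
HEADER = struct.Struct(">QQQ")

Part = Tuple[int, int, int, Iterable[bytes]]


def encode_parts(parts: Iterable[Part]) -> Iterator[bytes]:
    for first, last, total, chunks in parts:
        yield HEADER.pack(first, last, total)
        expected = last - first + 1
        sent = 0
        for chunk in chunks:
            sent += len(chunk)
            if sent > expected:
                raise ValueError("part data longer than its declared range")
            yield chunk
        if sent != expected:
            # The header is already out; abort the connection rather than
            # finishing the response normally.
            raise ValueError("part data shorter than its declared range")


def decode_parts(body: bytes) -> List[Tuple[int, int, int, bytes]]:
    parts = []
    offset = 0
    while offset < len(body):
        first, last, total = HEADER.unpack_from(body, offset)  # struct.error if cut off
        offset += HEADER.size
        length = last - first + 1
        data = body[offset:offset + length]
        if len(data) != length:
            raise ValueError("truncated part data")
        parts.append((first, last, total, data))
        offset += length
    return parts
```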

Understand the weak comparison function

HTTP 1.1 defines a weak comparison function for cache validators: "in order to be considered equal, both validators MUST be identical in every way, but either or both of them MAY be tagged as "weak" without affecting the result."
I understand that the following statement (for two ETags) is true:
W/"Foo" = "Foo"
Now I'm wondering what real-world use case might exist in which a server compares a weak ETag against a strong one.
There are cases where servers first assign a weak ETag and later promote it to a strong ETag (by removing the "W/" prefix). An example is Apache mod_dav (or is it plain httpd?), when configured to create entity tags based on the filesystem timestamp of the file being served.
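The two comparison functions can be sketched as below, following the comparison table in RFC 7232, section 2.3.2 (the successor to RFC 2616's wording); the function names are hypothetical. One everyday place where a weak tag meets a strong one is If-None-Match, which RFC 7232 requires to use weak comparison: a cache holding W/"Foo" from before a promotion like the one described above still gets a match (and a 304) against the server's current "Foo".

```python
from typing import Tuple


def parse_entity_tag(tag: str) -> Tuple[bool, str]:
    """Split an entity-tag into (is_weak, opaque_tag), e.g. 'W/"Foo"' -> (True, '"Foo"')."""
    weak = tag.startswith("W/")  # the weak indicator is case-sensitive
    opaque = tag[2:] if weak else tag
    if len(opaque) < 2 or opaque[0] != '"' or opaque[-1] != '"':
        raise ValueError(f"not an entity-tag: {tag!r}")
    return weak, opaque


def strong_compare(a: str, b: str) -> bool:
    """Match only if neither tag is weak and the opaque tags are identical."""
    weak_a, opaque_a = parse_entity_tag(a)
    weak_b, opaque_b = parse_entity_tag(b)
    return not weak_a and not weak_b and opaque_a == opaque_b


def weak_compare(a: str, b: str) -> bool:
    """Match if the opaque tags are identical, ignoring any W/ prefix."""
    return parse_entity_tag(a)[1] == parse_entity_tag(b)[1]
```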

Is If-Modified-Since strong or weak validation?

HTTP 1.1 states that ETag/If-None-Match validation can be either strong or weak. My question is: is Last-Modified/If-Modified-Since validation strong or weak?
This has implications for whether sub-range requests can be made.
From http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p5-range-23.html#rfc.section.4.3:
"A response might transfer only a subrange of a representation if the connection closed prematurely or if the request used one or more Range specifications. After several such transfers, a client might have received several ranges of the same representation. These ranges can only be safely combined if they all have in common the same strong validator, where "strong validator" is defined to be either an entity-tag that is not marked as weak (Section 2.3 of [Part4]) or, if no entity-tag is provided, a Last-Modified value that is strong in the sense defined by Section 2.2.2 of [Part4]."
An ETag can be strong or weak depending on its prefix (a weak one starts with W/). Normally it will be strong, except when you access dynamic content for which the content management system (CMS) generates the ETag itself, which is IMHO very uncommon.
However, the If-Modified-Since header's result should be strong too, if and only if nobody manipulates the metadata of the files in the filesystem. On Linux that is easy to do with the touch command; however, I think you normally don't need to worry about it. If somebody is manipulating your server, you have a different problem entirely.
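The section cited in the question (Section 2.2.2 of Part 4, published as RFC 7232) spells out when a Last-Modified value may be treated as strong: it is implicitly weak unless one of a few conditions holds. One of them, for a client or cache that stored the response's Date, is that Last-Modified is at least 60 seconds before Date. A sketch of that rule only, assuming both values are IMF-fixdate strings ending in GMT; the server-side condition (the origin has reliably determined the representation did not change twice within that second) is not shown, and the function name is hypothetical:

```python
from email.utils import parsedate_to_datetime


def last_modified_usable_as_strong(last_modified: str, date: str) -> bool:
    """Client/cache-side rule: Last-Modified is at least 60 s before Date."""
    modified = parsedate_to_datetime(last_modified)  # timezone-aware (GMT)
    sent = parsedate_to_datetime(date)
    return (sent - modified).total_seconds() >= 60
```

This matters for the range case in the question, because RFC 7233 only lets a client put a date in If-Range when the date is strong in this sense.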

Does if-match HTTP header require two-phase commits?

I'm trying to design a RESTful web API, so I've been studying RFC 2616. I like the idea of using ETags for optimistic concurrency and was trying to use them to make a safe way to add resources without race conditions. However, I noticed the following two statements in section 14.24:
If the request would, without the If-Match header field, result in anything other than a 2xx or 412 status, then the If-Match header MUST be ignored.
A request intended to update a resource (e.g., a PUT) MAY include an If-Match header field to signal that the request method MUST NOT be applied if the entity corresponding to the If-Match value (a single entity tag) is no longer a representation of that resource.
I'm using an RDBMS and don't know whether a transaction will commit successfully until I try it, so the first requirement seems a bit onerous. Consider a case where somebody supplies an If-Match header with a mismatched ETag: if the commit would succeed, then I should heed the If-Match header, NOT attempt the commit, and return 412. If the commit would fail, then a request without the If-Match header would have resulted in a non-2xx/412 response, so I MUST ignore the If-Match header, meaning I should attempt the commit.
As far as I can figure out, I have 2 options:
Use 2-phase commits to gain foresight into whether the commit will succeed before attempting it.
Ignore the first requirement above, and return 412 even if ignoring If-Match would have resulted in a non-2XX/412 response. (this is the one I'm leaning towards)
Any other ideas? Am I misinterpreting the specs?
Wouldn't something like "update unless modified" (optimistic locking) work? The entity would need to store a version number or the etag in the database.
run validations that don't require a commit, ignoring the ETag; return an error if necessary
update entity where id = :the_id and etag = :expected_etag
this returns either 0 or 1 for affected rows
if 0, the resource has seen a concurrent update (or the id is completely wrong, which you could check separately); in this case return 412
commit
if the commit fails, return an error as appropriate
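A minimal sketch of those steps with Python's built-in sqlite3 module, assuming a hypothetical items table with id, body and etag columns; the status codes mirror the steps above, with 404 standing in for the "check separately" case:

```python
import sqlite3


def update_if_match(conn: sqlite3.Connection, item_id: int, expected_etag: str,
                    new_body: str, new_etag: str) -> int:
    """Conditional update returning an HTTP status code (hypothetical schema:
    items(id INTEGER PRIMARY KEY, body TEXT, etag TEXT))."""
    # Step 1: validations that need no database write, done before the ETag check.
    if not new_body.strip():
        return 400
    # `with conn` commits on normal exit and rolls back if an exception escapes;
    # a failing commit raises, which the caller maps to a 5xx response.
    with conn:
        cur = conn.execute(
            "UPDATE items SET body = ?, etag = ? WHERE id = ? AND etag = ?",
            (new_body, new_etag, item_id, expected_etag),
        )
        if cur.rowcount == 1:
            return 200
        # 0 rows: either the id is unknown or someone else updated it first.
        exists = conn.execute("SELECT 1 FROM items WHERE id = ?", (item_id,)).fetchone()
        return 412 if exists else 404
```

Because the ETag check and the write are one statement, which the database applies atomically, there is no window between "check" and "write" for another request to slip into.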
Maybe this is somewhat on the theoretical side, but based on my current understanding of the HTTP specification, I would classify If-Match-like headers as practically unusable for all but perhaps the safe methods, because of this:
"If the request would, without the If-Match header field, result in anything other than a 2xx or 412 status, then the If-Match header MUST be ignored".
Why? Simply because in most practical cases it's impossible to foresee what would happen if the request were carried out.
For example, who can foresee an I/O-level error or some exceptional case occurring in code that must be run?
It would be more "solvable" if 5xx were added to 2xx and 412.

What is the difference between Application("Something") and Session("Something")

While debugging a classic ASP application (and learning about classic ASP at the same time) I've encountered the following
Application("Something") = "some value"
and elsewhere in the code this value gets used thus:
someObj.Property = Session("Something")
How does the Application object relate to Session?
A Session variable is linked to a user. An Application variable is shared between all users.
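Application is a handy vault for storing things you want to persist, but you can't guarantee they'll always be there, so think low-end caching, short-term variable storage, etc.
In this context, they have very little to do with each other, except that getting and setting variables works roughly the same way for both. In particular, Session("Something") in your snippet does not read the value stored in Application("Something"); they are separate collections, so unless something else sets Session("Something"), that read returns an empty value.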
Note: there can be concurrency issues when using Application (because you could easily have more than one user hitting something that reads or writes to it) so I suggest you use Application.Lock before you write and Application.Unlock after you're done. This only really applies to writing.
Note 2: I'm not sure whether it automatically unlocks after the request is done (that would be sensible), but I wouldn't rely on it. Make sure that any part of the application that could conceivably blow up isn't inside a lock; otherwise you might end up locking other users out.
Note 3: In that same vein, don't put things that take a long time to process inside a lock, only the bit where you write the data. If you do something that takes 10 seconds while in a lock, you lock everybody else out.
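The three notes boil down to a familiar locking pattern. As a rough Python analogue (not ASP: the `shared` dict stands in for the Application collection and `threading.Lock` for Application.Lock/Unlock; all names are hypothetical):

```python
import threading
import time

shared = {"hits": 0, "last_report": None}  # plays the role of Application(...)
shared_lock = threading.Lock()             # plays the role of Application.Lock/Unlock


def build_report() -> str:
    time.sleep(0.01)  # stands in for slow work that must NOT run under the lock
    return "report"


def record_hit() -> None:
    report = build_report()  # slow part done outside the lock (Note 3)
    with shared_lock:        # released even if the block raises (Note 2)
        # A read-modify-write must be inside the lock, not just the final write.
        shared["hits"] += 1
        shared["last_report"] = report
```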
