I'm trying to design a RESTful web API, so I've been studying rfc2616. I like the idea of using ETags for optimistic concurrency and was trying to use it to make a safe way to add resources without race-conditions. However, I noticed the following two statements in section 14.24:
If the request would, without the If-Match header field, result in anything other than a 2xx or 412 status, then the If-Match header MUST be ignored.
A request intended to update a resource (e.g., a PUT) MAY include an If-Match header field to signal that the request method MUST NOT be applied if the entity corresponding to the If-Match value (a single entity tag) is no longer a representation of that resource.
I'm using an RDBMS and don't know whether a transaction will successfully commit until I try it, so the first requirement seems a bit onerous. Consider a case where somebody supplies an If-Match header with mismatched ETags: if the commit would succeed, then I should heed the If-Match header, NOT attempt the commit, and return 412. If the commit would fail, then a request without the If-Match header would have resulted in a non-2xx/412 response, so I MUST ignore the If-Match header, meaning I should attempt the commit.
As far as I can figure out, I have 2 options:
Use 2-phase commits to gain foresight into whether the commit will succeed before attempting it.
Ignore the first requirement above, and return 412 even if ignoring If-Match would have resulted in a non-2XX/412 response. (this is the one I'm leaning towards)
Any other ideas? Am I misinterpreting the specs?
Wouldn't something like "update unless modified" (optimistic locking) work? The entity would need to store a version number or the etag in the database.
run validations that don't require a commit, ignoring the etag, return error if necessary
update entity where id = :the_id and etag = :expected_etag
this returns either 0 or 1 for affected rows
if 0 the resource has seen a concurrent update (or the id is completely wrong, which you could check separately). In this case return 412
commit
if the commit fails, return error as appropriate
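The steps above can be sketched roughly as follows. This is a minimal illustration using Python's built-in sqlite3 module; the table name `entity`, its columns, the helper names, and the chosen status codes (204, 404, 412, 500) are assumptions made for the example, not something prescribed by the question or by the HTTP specification.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one hypothetical entity row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entity (id INTEGER PRIMARY KEY, body TEXT, etag TEXT)")
    conn.execute("INSERT INTO entity VALUES (1, 'old body', 'v1')")
    conn.commit()
    return conn


def update_if_match(conn, entity_id, expected_etag, new_body, new_etag) -> int:
    """Apply the update only if the stored etag still matches; return a status code."""
    try:
        cur = conn.execute(
            "UPDATE entity SET body = ?, etag = ? WHERE id = ? AND etag = ?",
            (new_body, new_etag, entity_id, expected_etag),
        )
        if cur.rowcount == 0:
            # Nothing was updated: either a concurrent update or a wrong id.
            conn.rollback()
            exists = conn.execute(
                "SELECT 1 FROM entity WHERE id = ?", (entity_id,)
            ).fetchone()
            return 412 if exists else 404
        conn.commit()
        return 204
    except sqlite3.Error:
        # The statement or the commit failed: report a server-side error.
        conn.rollback()
        return 500
```

With `conn = make_demo_db()`, a first call `update_if_match(conn, 1, "v1", "new body", "v2")` succeeds, while repeating it with the stale `"v1"` yields 412 because the stored etag has moved on.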
Maybe this is somewhat on the theoretical side, but based on my current understanding of the HTTP specification, I would classify the usage of If-Match-like headers practically unusable for all but maybe the safe methods, because of this:
"If the request would, without the If-Match header field, result in anything other than a 2xx or 412 status, then the If-Match header MUST be ignored".
Why? Simply because in most practical cases, it's just impossible to foresee what should happen if the request was carried out.
As an example, who can foresee an I/O-level error or some exceptional case occurring in code that must be run?
It'd be more "solvable" if 5xx were added to 2xx and 412.
Let's say we have an API with a route /foo/<id> that represents an instance of an object like this:
from typing import Optional

class Foo:
    bar: Optional["Bar"]  # quoted because Bar is defined below
    name: str
    ...

class Bar:
    ...
(Example in Python just because it's convenient, this is about the HTTP layer rather than the application logic.)
We want to expose full serialized Foo instances (which may have many other attributes) under /foo/<id>, but, for the sake of efficiency, we also want to expose /foo/<id>/bar to give us just the .bar attribute of the given Foo.
It feels strange to me to use 404 as the status response when bar is None here, since that's the same status code you'd get if you requested some arbitrarily incorrect route like /random/gibberish, too; if we were to have automatic handling of 404 status in our client-side layer, it would be misinterpreting this with likely explanations such as "we forgot to log in" or "the client-side URL routing was wrong".
However, 200 with a response-body of null (if we're serializing using JSON) feels odd as well, because the presence or absence of the entity at the given endpoint is usually communicated via a status rather than in-line in the body. Would 204 with an empty response-body be the right thing to say here? Is a 404 the right way to go, and if so, what's the right way for the server to communicate nuances like "but that was a totally expected and correct route" or "actually the foo-ID you specified was incorrect, this isn't missing because the attribute was un-set".
What are the advantages and disadvantages of representing the missing-ness of this attribute in different ways?
I wonder if you could more clearly articulate why a 200 with a null response body is odd. I think it communicates exactly what you want, as long as you're not trying to differentiate between a given Foo not having a bar (e.g. Foo.has_key?(bar)) and Foo having a bar explicitly set to null.
Of 404, https://developer.mozilla.com says,
In an API, this can also mean that the endpoint is valid but the resource itself does not exist.
so I think it's acceptable. 204 doesn't strike me as particularly outlandish in this situation, but is more commonly associated (IME, at least) with DELETEs (and occasionally PUTs/POSTs that don't return results.)
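To make the trade-offs concrete, here is a small sketch of the three options discussed (404, 200 with null, 204) for the case where the Foo exists but its bar is unset. The in-memory `FOOS` store, the `style` parameter, and the error codes in the JSON bodies are hypothetical names invented for the illustration.

```python
import json
from typing import Optional, Tuple

# Hypothetical in-memory store: foo 2 exists but has no bar.
FOOS = {
    "1": {"name": "first", "bar": {"size": 3}},
    "2": {"name": "second", "bar": None},
}


def get_bar(foo_id: str, style: str = "404") -> Tuple[int, Optional[str]]:
    """Return (status, body) for GET /foo/<id>/bar under one of three conventions."""
    foo = FOOS.get(foo_id)
    if foo is None:
        # The foo itself is missing: the body says so explicitly.
        return 404, json.dumps({"error": "foo_not_found"})
    bar = foo["bar"]
    if bar is not None:
        return 200, json.dumps(bar)
    if style == "200":
        return 200, "null"  # absence is expressed in the body
    if style == "204":
        return 204, None  # absence is expressed by an empty response
    # Default: 404, with a body that distinguishes it from a missing foo.
    return 404, json.dumps({"error": "bar_not_set"})
```

Whichever status is chosen, a machine-readable body on the 404 branch is one way to let the client tell "wrong foo ID" apart from "bar is simply unset".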
I also struggle a lot with this because:
1. 404 can point to a nonexistent URL, or to a path that is acceptable but whose particular referenced resource does not exist. I have also used it to error out on request bodies that carry nonexistent identifiers.
2. A lot of people shoehorn these errors into the bad request (400) error code, which is somewhat acceptable but also a cop-out. (Literally anything the server did not process successfully can be classified as a bad request, if you think about it.)
3. With 2 (above) in mind, a 400 with some helpful message body is sometimes used to wash out the guilt of not committing outright to a 404, but this demands some parsing expectations on the client's side, which is not always nice. Also, returning a 400 is, according to this, kind of gaslighting the client, because 400 errors are supposed to be the client's fault entirely with regard to the structure of the request, not because the client asked for something not in your DB:
400 Bad Request response status code indicates that the server cannot or will not process the request due to something that is perceived to be a client error (e.g., malformed request syntax, invalid request message framing, or deceptive request routing).
4. The general feeling is that 200 means all is good, and therefore there's always a tacit expectation that the response will contain some form of body, not null. (Right??) I wouldn't encourage using a 200 for these situations. While 204s don't carry the responsibility of having to carry a response body, they also sort of convey the message that "something worked", which is not the message you want to send here, right?
What I'm trying to say? Thoughtful API design is hard.
I'm syncing local client with a DAV folder (CardDAV in this particular case).
For this folder, I have ETag (CTag in SabreDAV dialect, to distinguish folder etags from item etags). If CTag has changed, I need to re-sync again. But if this change was caused by myself (e.g. I just uploaded a contact into this CardDAV folder), isn't there any way to avoid resync?
Ideally, the DAV server would return the following on each request that changes anything on the server:
CTag1, CTag of the current folder as it was before my action was applied
CTag2, CTag of the current folder after my action was applied
ETag assigned to the item in question (although it's not relevant to this particular question).
This would let me understand if CTag change was only caused by my own actions (and no resync needed) or something else occurred in between (and thus resync is needed).
Currently, I can only query the folder for its CTag at any time but I have no clue what to do if CTag changed (in pseudo-code):
cTag0 = ReadStoredValue() ' The value left from the previous sync.
cTag1 = GetCTag()
If cTag0 <> cTag1 Then
    Resync()
End If
UploadItem() ' Can get race condition if another client changes anything right now
cTag2 = GetCTag()
cTag2 will obviously differ from cTag1, but this provides zero information on whether something else occurred in the middle (another client changed something in the same folder). So the cTag0 <> cTag1 comparison won't save me from race conditions: I could think that I'm in sync while some other update sneaked in unnoticed.
Would be great to have:
cTag0 = ReadStoredValue() ' The value left from the previous sync.
(cTag1, cTag2) = UploadItem()
If cTag0 = cTag1 Then
    ' No resync needed, just remember new CTag for the next sync cycle.
    cTag0 = cTag2
Else
    Resync()
    cTag0 = cTag2
End If
I'm aware of DAV-Sync protocol extension but this would be a different story. In this task, I'm referring to the standard DAV, no extensions allowed.
EDIT: One thought which just crossed my mind. I noticed that CTag is sequential. It's a number which gets incremented by 1 on each operation with the folder. So if it's increased by more than 1 between obtaining CTag, making my action and then obtaining CTag again, this will indicate something else has just occurred. But this does not seem to be reliable, I'm afraid it's too implementation-specific to rely on this behavior. Looking for a more robust solution.
How to determine if a DAV folder had parallel updates while I was modifying it
This is very similar to
How to avoid time conflict or overlap for CalDAV?
Technically in pure DAV you are not guaranteed to be able to do this. Though in the real world, most servers will return you the ETag in the response to the PUT which was used to create/update the resource. This allows you to reconcile concurrent changes to the same resource.
There is also the Calendar Server Bulk Change Requests for *DAV Protocols
which is supported by some servers and which provides a more specific way to do this.
Since it isn't an RFC, I wouldn't suggest relying on that though.
So what you would probably do is a PUT. If that returns you the ETag, you are good and can reconcile by syncing the collection (by whatever mechanism, PROPFIND:1, CTag or sync-report). If not, you either have the option of reconciling by other means (e.g. comparing/hashing the content), or of just treating the change as a concurrent edit, which I think most implementations do.
If you are very lucky, the server may also return the CTag/sync-token in the PUT. But AFAIK there is no standard for that, servers are not required to do it.
For this folder, I have ETag (CTag in SabreDAV dialect)
This is a misconception of yours. A CTag is absolutely not the same as an ETag; it is its own thing, documented over here:
CalDAV CTag.
I'm aware of DAV-Sync protocol extension but this would be a different story. In this task, I'm referring to the standard DAV, no extensions allowed.
CTag is not a DAV standard at all, it is a private Apple extension (there is no RFC for that).
Standard HTTP/1.1 specifies the ETag. It corresponds to the resource representation and doesn't apply to WebDAV collection contents, which are distinct from that. WebDAV collections often also have contents of their own (that can be retrieved by GET etc.), and the ETag corresponds to those.
The official standard which replaces the proprietary CTag extension is in fact DAV-Sync aka RFC 6578. And the sync-token property and header is what replaces CTag header.
So if "no extensions allowed" is your use case, you need to do the resource comparison on the client side. Pure WebDAV doesn't provide this capability.
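A client-side comparison could look something like the sketch below. It assumes the client keeps a snapshot of the folder as a mapping from item href to ETag (for instance built from a Depth: 1 PROPFIND that requests getetag) and takes a second snapshot after its own PUT. The function name and its parameters are hypothetical; if the server did not return an ETag for the PUT, the uploaded item is conservatively reported as a foreign change.

```python
from typing import Dict, Optional, Set


def foreign_changes(
    before: Dict[str, str],
    after: Dict[str, str],
    own_href: str,
    own_etag: Optional[str],
) -> Set[str]:
    """Return hrefs that changed between two snapshots for reasons other than our PUT."""
    changed = set()
    for href in before.keys() | after.keys():
        if before.get(href) == after.get(href):
            continue  # untouched item
        if href == own_href and own_etag is not None and after.get(href) == own_etag:
            continue  # this is exactly the version we uploaded
        changed.add(href)  # added, removed, or modified by someone else
    return changed
```

An empty result means the only difference between the snapshots is the client's own upload, so a full resync can be skipped; anything else names the items that need reconciling.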
I noticed that CTag is sequential
CTags are not sequential, they are opaque tokens. Specific servers may use a sequence, but that is completely arbitrary. (the same is true for all DAV tokens, they are always opaque)
We are running loadtests and Riak is sometimes responding with 204 No content when we do a PUT operation, even though we've passed returnbody=true (this answer implies this is not expected)
It happens rarely, so what might be the possible reasons?
Our riak has 9 nodes, w=5, n_val=6, r=2.
You will get a 204 whenever there is no data to return. I've seen this sometimes using the PHP client when json_encode returns false, which results in the object being stored with a 0-byte value. Since there is no data to include in the body, the server returns '204 No Content'
That is just one possible way this can occur. I'm sure there are a multitude of situations where a key is stored with just metadata and no value.
Can adding an extra, redundant header to an HTTP request cause any functional harm?
For example, adding:
myheader=blablabla
An X- prefix was customary for those headers, but no longer. It shouldn't break anything as long as your headers are formatted correctly (so: myheader: blablabla, not myheader=blablabla)
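As a rough illustration of the formatting point, the sketch below checks a single header line: the name must be a valid token and must be separated from the value by a colon. `parse_header_line` is a hypothetical helper written for this example and is deliberately simplified; it is not a complete HTTP header parser.

```python
import re
from typing import Tuple

# Characters allowed in an HTTP header field name (the "token" rule).
TOKEN = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")


def parse_header_line(line: str) -> Tuple[str, str]:
    """Split 'name: value' into (lowercased name, stripped value) or raise ValueError."""
    name, sep, value = line.partition(":")
    if not sep or not TOKEN.fullmatch(name):
        raise ValueError(f"malformed header line: {line!r}")
    return name.lower(), value.strip()
```

Here `parse_header_line("myheader: blablabla")` is accepted, whereas `parse_header_line("myheader=blablabla")` raises ValueError because there is no colon separating name and value.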
The HTTP 1.1 specification says (about entity headers):
Unrecognized header fields SHOULD be ignored by the recipient and MUST be forwarded by transparent proxies.
In other words - since the wording is SHOULD, not MUST - recipients are allowed to react to unknown headers, so technically your extra header could cause harm.
In practice though I have never seen a recipient do this, and with the surfacing of newer RFCs regarding custom header use seeing an adverse effect is very unlikely.
HTTP 1.1 states that ETag/If-None-Match validation can be either strong or weak. My question is: is Last-Modified/If-Modified-Since validation strong or weak?
This has implications for whether sub-range requests can be made or not.
From http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p5-range-23.html#rfc.section.4.3:
"A response might transfer only a subrange of a representation if the connection closed prematurely or if the request used one or more Range specifications. After several such transfers, a client might have received several ranges of the same representation. These ranges can only be safely combined if they all have in common the same strong validator, where "strong validator" is defined to be either an entity-tag that is not marked as weak (Section 2.3 of [Part4]) or, if no entity-tag is provided, a Last-Modified value that is strong in the sense defined by Section 2.2.2 of [Part4]."
An ETag can be strong or weak depending on its prefix: a weak entity tag begins with W/, and anything else is strong. Normally it will be strong, except if you access dynamic content where the content management system (CMS) handles that, which is IMHO very uncommon.
However, the result of the If-Modified-Since header should be strong too, if and only if nobody manipulates the metadata of the files in the filesystem. On Linux that is easy to do with the touch command, but I think you normally don't need to care about that. If somebody manipulates your server, you have a different problem entirely.
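For reference, both checks can be sketched in a few lines. The weak-tag test follows the W/ prefix convention. The Last-Modified test encodes only one of the conditions the HTTP specification gives for treating Last-Modified as strong from a client's point of view, namely that the Last-Modified time is at least 60 seconds before the response's Date value; the server-side conditions are not modeled. Both function names are hypothetical.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime


def is_weak_etag(etag: str) -> bool:
    """A weak entity tag is marked with a leading W/ before the quoted value."""
    return etag.startswith("W/")


def last_modified_is_strong_for_client(last_modified: str, date: str) -> bool:
    """True if Last-Modified is at least 60 seconds earlier than the Date header."""
    delta = parsedate_to_datetime(date) - parsedate_to_datetime(last_modified)
    return delta >= timedelta(seconds=60)
```

Under this rule, a file whose Last-Modified value lies within the minute before the response was generated does not yield a strong validator, regardless of whether anyone has touched the file's metadata.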