Is there a defined meaning to the getcontentlength property on a collection? - webdav

The response fragment below is part of a PROPFIND reply:
<D:response>
<D:href>https://dav.mystery-meat.com/top</D:href>
<D:propstat>
<D:prop>
<D:creationdate ns0:dt="dateTime.tz">1970-01-01T00:00:00Z</D:creationdate>
<D:getcontentlanguage>en</D:getcontentlanguage>
<D:getcontentlength>16384</D:getcontentlength>
<D:getcontenttype>httpd/unix-directory</D:getcontenttype>
<D:getlastmodified ns0:dt="dateTime.rfc1123">Thu, 01 Jan 1970 00:00:00 GMT</D:getlastmodified>
<D:resourcetype><D:collection/></D:resourcetype>
</D:prop>
<D:status>HTTP/1.1 200 OK</D:status>
</D:propstat>
</D:response>
The getcontentlength value isn't the total bytes of items within this directory. Is there any predefined meaning for this value in WebDAV or is it simply implementor-defined by each server that happens to report a value?
I.e. is it of any real use?

Read the RFC; as usual, it has a perfect definition:
Purpose: Contains the Content-Length header returned by a GET without accept headers.
If that isn't clear, it basically says, if you perform a GET request on the same resource with no Accept-* headers, the response will report a Content-Length that is this value.
So if you have a WebDAV implementation that conforms to the standard, you should be able to easily test this by just executing a GET request on the collection. Chances are you'll get some automatically generated HTML response.
If the response to this GET request differs in size (in bytes) from the value reported via {DAV:}getcontentlength, it should be considered a bug.
I think in your particular case it might be a bug. The fact that the reported size for the collection is exactly a power of two leads me to believe that this particular server returns the result of stat() for that directory, which is simply how much space the directory entry itself takes up on the filesystem (the same number ls -l shows for a directory).
If my hunch is true, the server basically has broken behavior.
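The check described above can be sketched as follows. This is a minimal illustration, not part of any WebDAV library: the helper names `reported_lengths` and `is_consistent` are made up here, and the sketch assumes you have already fetched the PROPFIND body and, separately, the Content-Length of a plain GET on the same URL.

```python
import xml.etree.ElementTree as ET

DAV = "{DAV:}"  # ElementTree spelling of the DAV: namespace


def reported_lengths(multistatus_xml: str) -> dict:
    """Map each href in a PROPFIND body to its getcontentlength as an int.

    Hypothetical helper: responses without the property are skipped.
    """
    root = ET.fromstring(multistatus_xml)
    lengths = {}
    # iter() also yields the root itself, so a bare <D:response> works too
    for response in root.iter(DAV + "response"):
        href = response.findtext(DAV + "href")
        value = response.findtext(
            DAV + "propstat/" + DAV + "prop/" + DAV + "getcontentlength"
        )
        if href is not None and value is not None:
            lengths[href.strip()] = int(value.strip())
    return lengths


def is_consistent(reported: int, get_content_length: int) -> bool:
    """True when the property equals the Content-Length of a plain GET."""
    return reported == get_content_length
```

If `is_consistent` returns False for a collection, that would support the suspicion that the server reports a filesystem value rather than the size of the GET response.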

Related

Proper REST response for empty table?

Let's say you want to get a list of users by sending a GET to api/users, but the table was just truncated, so there are no users. What is the proper response for this scenario: 404 or 204?
I'd say, neither.
Why not 404 (Not Found) ?
The 404 status code should be reserved for situations in which a resource is not found. In this case, your resource is a collection of users. This collection exists, but it's currently empty. Personally, I'd be very confused as an author of a client for your application if I got a 200 one day and a 404 the next day just because someone happened to remove a couple of users. What am I supposed to do? Is my URL wrong? Did someone change the API and neglect to leave a redirection?
Why not 204 (No Content) ?
Here's an excerpt from the W3C's description of the 204 status code:
The server has fulfilled the request but does not need to return an entity-body, and might want to return updated metainformation.
While this may seem reasonable in this case, I think it would also confuse clients. A 204 is supposed to indicate that some operation was executed successfully and no data needs to be returned. This is perfect as a response to a DELETE request or perhaps firing some script that does not need to return data. In case of api/users, you usually expect to receive a representation of your collection of users. Sending a response body one time and not sending it the other time is inconsistent and potentially misleading.
Why I'd use a 200 (OK)
For reasons mentioned above (consistency), I would return a representation of an empty collection. Let's assume you're using XML. A normal response body for a non-empty collection of users could look like this:
<users>
<user>
<id>1</id>
<name>Tom</name>
</user>
<user>
<id>2</id>
<name>IMB</name>
</user>
</users>
and if the list is empty, you could just respond with something like this (while still using a 200):
<users/>
Either way, a client receives a response body that follows a certain, well-known format. There's no unnecessary confusion and status code checking. Also, no status code definition is violated. Everybody's happy.
You can do the same with JSON or HTML or whatever format you're using.
I'd answer with one of two codes, depending on the situation at runtime:
404 (Not Found)
This answer is correct if you have no table at all. Not just an empty table, but NO USER TABLE. It conveys exactly that idea: there is no resource. You can go further and give more detail about WHY the table is absent (there are a couple of more specific codes), but 404 is a good fit for the situation where you really have no table.
200 (OK)
All cases where you have the table but it is empty, or where your request processor filtered out all results. This means "your request is correct and everything is OK, but nothing matched, either because we have no data or because we have no data that matches your request". This should be distinct from a security-denial answer. I'd also return 200 when there is some data and you are generally allowed to access the table, but you have no access to any of the data that matches your request (it was filtered out by object-level security, but in general you are allowed to make the request).
If you are expecting a list of user objects, the best solution is to return an empty list ([]) with 200 OK rather than a 404 or a 204 response.
Definitely return 200.
404 means the resource was not found. But the resource exists. Also, if the response has a 404 status, how can you tell whether the users list is empty or filled?
'/users', if it is empty, should return 200.
'/users/1', if the id is not found, should return 404.
It must be 200 OK with an empty list.
Why: An empty table means the table exists but does not have any records.
404 Not Found means the requested endpoint does not exist.
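The distinction the answers above draw (an empty collection versus a missing item) can be sketched like this. It is only an illustration under assumptions: `get_users` and `get_user` are hypothetical handler names, the "table" is a plain dict keyed by id, and JSON is used as the representation.

```python
import json


def get_users(users: dict) -> tuple:
    """GET /users: the collection exists even when empty, so always 200."""
    body = json.dumps(list(users.values()))
    return 200, body


def get_user(users: dict, user_id: int) -> tuple:
    """GET /users/<id>: 404 only when this specific resource is missing."""
    if user_id not in users:
        return 404, json.dumps({"error": "user not found"})
    return 200, json.dumps(users[user_id])
```

With an empty table, `get_users({})` yields a 200 with the body `[]`, so a client can parse every response the same way.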

HTTP Range Header for Entity lists

I have resources like this
/entities # GET, POST
/entities/<id> # GET, PUT, DELETE
GET /entities gets the list of all entities.
Now I want to poll for updates. The case for a single entity is straightforward:
GET /entities/2
If-Modified-Since: <http date>
The list is tricky. I want the response to be a list of entities, updated or created since a given point in time. I'd intuitively use
GET /entities
Range: after <http date>
Which is a valid request by HTTP specification http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35.2 . But the spec also mandates a 206 Partial Content response, which has to include a Content-Range header. A Content-Range header, in turn, mandates a byte range to be specified http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.16 . This is obviously very inconvenient for my use case.
How would you request a semantic range over HTTP?
From reading section 14.35.1, I would say that the Range header is used to request a specific range of bytes from a resource, not to request a group of entities according to when they were modified.
In this case, I believe you should treat your range as a filter and pass the date as a query string parameter:
GET /entities?modified-since=<date>
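A minimal sketch of the filter approach, under stated assumptions: the function name `entities_modified_since` is hypothetical, entities are dicts with a naive UTC `modified` datetime, and the parameter value is an ISO 8601 timestamp without an offset (the answer does not prescribe a date format).

```python
from datetime import datetime
from urllib.parse import parse_qs, urlsplit


def entities_modified_since(url: str, entities: list) -> list:
    """Return the entities changed after the 'modified-since' query parameter.

    Without the parameter, the full list is returned (a plain GET /entities).
    """
    query = parse_qs(urlsplit(url).query)
    values = query.get("modified-since")
    if not values:
        return list(entities)
    # Assumption: naive ISO 8601 timestamp, interpreted as UTC
    cutoff = datetime.fromisoformat(values[0])
    return [e for e in entities if e["modified"] > cutoff]
```

The response is then an ordinary 200 with a smaller list, so no Content-Range bookkeeping is needed.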

Is it OK to return most recent version of the entity in case of a 412 "Precondition failed"

When doing a PUT or DELETE with an "If-Match" header, in case the ETag sent by a client indicates staleness, rather than just returning a 412, I'd like to return the whole up-to-date entity (including its new ETag in the HTTP header), so the client does not have to perform another GET round trip, which they otherwise would certainly do - in my use-case at least they'd do in probably 100% of the cases.
I don't see anything for or against it in the docs for 412:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.13
And looking at, say, status code 409, it doesn't seem to be a problem in general to do whatever one likes with the response body of a 4xx error:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.10
So, does anything (especially in the HTTP specs) speak against returning the full up-to-date entity and its ETag?
Should be fine:
All 1xx (informational), 204 (no content), and 304 (not modified)
responses MUST NOT include a message-body. All other responses do
include a message-body, although it MAY be of zero length.
Source: http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.3
What is the request? GET with If-None-Match? In that case, the server isn't supposed to return 412 anyway.
For PUT, DELETE, you certainly can return the current representation. For large representations, it will be inconvenient for clients that don't need it though.
You may also want to label the payload as a representation of the resource by using the Content-Location header; see http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p2-semantics-16.html#identifying.response.associated.with.representation.
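As a rough sketch of what the question proposes: on a stale If-Match, answer 412 but include the current representation and its ETag. Everything here is an assumption made for illustration: `handle_put` and `make_etag` are hypothetical names, the store is a dict of path to bytes, ETags are derived from a hash of the body, and the payload is labelled with a Content-Location header as the linked draft section describes.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Hypothetical strong ETag: a quoted, truncated SHA-256 of the body."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def handle_put(store: dict, path: str, if_match: str, new_body: bytes) -> tuple:
    """Conditional PUT returning (status, headers, body).

    On a stale If-Match the reply is 412 and carries the current entity,
    saving the client a follow-up GET.
    """
    current_body = store[path]
    current_etag = make_etag(current_body)
    if if_match != current_etag:
        headers = {"ETag": current_etag, "Content-Location": path}
        return 412, headers, current_body
    store[path] = new_body
    return 204, {"ETag": make_etag(new_body)}, b""
```

For large representations a client that does not need the body still pays for it, which is the trade-off noted above.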

How can I find out whether a server supports the Range header?

I have been trying to stream audio from a particular point by using the Range header values but I always get the song right from the beginning. I am doing this through a program so am not sure whether the problem lies in my code or on the server.
How can I find out whether the server supports the Range header param?
Thanks.
The way the HTTP spec defines it, if the server knows how to support the Range header, it will. That, in turn, requires it to return a 206 Partial Content response code with a Content-Range header when it returns content to you. Otherwise, it will simply ignore the Range header in your request and return a 200 response code.
This might seem silly, but are you sure you're crafting a valid HTTP request header? All too commonly, I forget to specify HTTP/1.1 in the request, or forget to specify the Range specifier, such as "bytes".
Oh, and if all you want to do is check, then just send a HEAD request instead of a GET request. Same headers, same everything, just "HEAD" instead of "GET". If you receive a 206 response, you'll know Range is supported, and otherwise you'll get a 200 response.
This is for others searching how to do this. You can use curl:
curl -I http://exampleserver.com/example_video.mp4
In the header you should see
Accept-Ranges: bytes
You can go further and test retrieving a range
curl --header "Range: bytes=100-107" -I http://exampleserver.com/example_video.mp4
and in the headers you should see
HTTP/1.1 206 Partial Content
and
Content-Range: bytes 100-107/10000000
Content-Length: 8
[instead of 10000000 you'll see the length of the file]
Although I am a bit late in answering this question, I think my answer will help future visitors. Here is a python method that detects whether a server supports range queries or not.
def accepts_byte_ranges(self, effective_url):
    """Test if the server supports multi-part file download. Method expects effective (absolute) url."""
    import io
    import re
    import pycurl

    c = pycurl.Curl()
    header = io.BytesIO()  # pycurl passes header lines as bytes on Python 3
    # Get http header only (no body)
    c.setopt(c.URL, effective_url)
    c.setopt(c.NOBODY, 1)
    c.setopt(c.HEADERFUNCTION, header.write)
    c.perform()
    c.close()
    header_text = header.getvalue().decode('iso-8859-1')
    header.close()
    # Check if server accepts byte-ranges (header names are case-insensitive)
    if re.search(r'Accept-Ranges:\s*bytes', header_text, re.IGNORECASE):
        return True
    # If server explicitly specifies "Accept-Ranges: none" in the header, we do not attempt partial download.
    if re.search(r'Accept-Ranges:\s*none', header_text, re.IGNORECASE):
        return False
    # There is still hope, try a simple byte range query
    c = pycurl.Curl()
    c.setopt(c.RANGE, '0-0')  # First byte
    c.setopt(c.URL, effective_url)
    c.setopt(c.NOBODY, 1)
    c.perform()
    http_code = c.getinfo(c.HTTP_CODE)
    c.close()
    # Http status code 206 means byte-ranges are accepted
    return http_code == 206
One way is just to try, and check the response. In your case, it appears the server doesn't support ranges.
Alternatively, do a GET or HEAD on the URI, and check for the Accept-Ranges response header.
You can use the GET method with a 0-0 Range request header and check whether the response code is 206; a server that supports ranges will respond with only the first byte of the response body.
You can also use the HEAD method to do the same thing as in the first case; you get the same response headers and status code without a response body.
Furthermore, you can check Accept-Ranges in the response headers to judge whether ranges are supported. But note that a value of none in the Accept-Ranges field means ranges are not supported, while the absence of an Accept-Ranges field tells you nothing either way: you can't conclude from it that ranges are unsupported.
One more thing to know: if you use a 0- Range in the request header with the GET method just to check the response code, the server sends the whole body, and it will be buffered automatically in the TCP receive window until the window is full.
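The rules from the answers above can be condensed into one small decision function. This is a sketch, not an established API: `range_support` is a hypothetical name, it takes an already received status code and header dict, and the `sent_range` flag (also an assumption) says whether the request carried a Range header.

```python
def range_support(status: int, headers: dict, sent_range: bool = False) -> str:
    """Classify a response as 'supported', 'unsupported' or 'unknown'."""
    # Header names are case-insensitive, so normalise them first
    normalized = {k.lower(): v.strip().lower() for k, v in headers.items()}
    accept = normalized.get("accept-ranges")
    if status == 206:
        return "supported"      # the server honoured a Range request
    if accept == "none":
        return "unsupported"    # explicit opt-out
    if sent_range and status == 200:
        return "unsupported"    # the Range header was ignored
    if accept is not None and "bytes" in accept:
        return "supported"      # advertised, though not yet exercised
    return "unknown"            # a missing Accept-Ranges proves nothing
```

The "unknown" result is the case where sending an actual range request, for example for bytes 0-0, is the only way to find out.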

What does "subrange" mean in the HTTP spec?

See, for example, §13.3.3 and §13.3.4.
It doesn't seem to me that this could be related to "media range" (§14.1, e.g. Accept: text/*), nor "language range" (§14.4, e.g. Accept-Language: da, en-gb;q=0.8, en;q=0.7).
Maybe it's the "accept range" (§14.5), which puts byte limitations on a response? If that's true, how do ETags relate?
I'm pretty sure it's for range retrieval requests, i.e. requesting part of a document (resuming a file download, for example).
14.35.2 Range Retrieval Requests
HTTP retrieval requests using conditional or unconditional GET methods MAY request one or more sub-ranges of the entity, instead of the entire entity, using the Range request header, which applies to the entity returned as the result of the request:
If the ETag is weak (starts with W/) then it can't be used for a range retrieval - only strong validators can be used for that or the client may end up with an inconsistent file.
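To make "sub-range" concrete, here is a small sketch that cuts a byte sub-range out of an entity and checks whether an ETag is strong enough to validate a range request. The names `slice_subrange` and `is_strong_validator` are hypothetical, and the sketch assumes a single, well-formed bytes range; multi-range requests and malformed input are out of scope.

```python
def is_strong_validator(etag: str) -> bool:
    """Weak ETags start with W/ and cannot be used for range retrieval."""
    return not etag.startswith("W/")


def slice_subrange(entity: bytes, range_header: str) -> bytes:
    """Return the bytes selected by a header value such as 'bytes=2-4'."""
    unit, _, spec = range_header.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        raise ValueError("only a single bytes range is handled in this sketch")
    first, _, last = spec.strip().partition("-")
    if first == "":
        # Suffix form '-N': the last N bytes of the entity
        n = int(last)
        return entity[-n:] if n > 0 else b""
    start = int(first)
    # Open-ended form 'N-': everything from N to the end
    end = int(last) if last else len(entity) - 1
    return entity[start:end + 1]  # HTTP byte ranges are inclusive
```

Each returned slice is one sub-range of the entity; a client resuming a download would request the open-ended form starting at the number of bytes it already has.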
