REST: What is a good Hypermedia and Resource Caching Strategy? - http

If I have a RESTful service that has discoverable resources via an endpoint such as:
Request:
GET http://acme.org/someInfo
Response:
HTTP/1.1 200 OK
Content-Length: ...
Content-Type: application/vnd.acme+xml
Date: Fri, 16 Dec 2012 12:40:00 GMT
Last-Modified: Tue, 1 Mar 2012 11:45:00 GMT
<someInfo xmlns="http://schemas.acme.org/someInfo" xmlns:dap="http://schemas.acme.org/dap">
<dap:link rel="http://relations.acme.org/someInfo" uri="http://acme.org/someInfo/foo" />
<dap:link rel="http://relations.acme.org/someInfo" uri="http://acme.org/someInfo/bar" />
<dap:link rel="http://relations.acme.org/someInfo" uri="http://acme.org/someInfo/baz" />
</someInfo>
With this response, a client may then follow one of the hypermedia links:
Request:
GET http://acme.org/someInfo/foo
Response:
HTTP/1.1 200 OK
Content-Length: ...
Content-Type: application/vnd.acme+xml
Date: Fri, 16 Dec 2012 12:45:00 GMT
Last-Modified: Wed, 28 Sep 2012 11:45:00 GMT
<fooInfo xmlns="http://schemas.acme.org/fooInfo">
...
</fooInfo>
The first resource may change rarely (e.g., every few months), and the second one slightly more often (e.g., every month or so). What is a good HTTP caching strategy for this sort of scenario: by date, by client ETag comparison, or something else?
EDIT: If the data is stale on the order of a day or so, that is fine. Anything more would probably be problematic.

This is a performance versus consistency issue that really can only be answered by the business.
For each resource you need to ask two questions:
If the resource changes and users do not see that change for X hours, what is the business impact? Will reactors explode if the user does not see the temperature change?
How much does it cost to retrieve a new version of that resource? Are you on a 1 Gbps local network, or accessing it from a mobile phone in Siberia?
Once you know how valuable it is to have that data up to date and how much it costs to get it, you can decide on the best caching strategy.
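As a rough illustration for the tolerance stated in the question (about a day of staleness), a server could combine a freshness lifetime with validators: Cache-Control: max-age lets clients and proxies reuse a response without contacting the server, and ETag/Last-Modified let them revalidate cheaply afterwards, getting 304 Not Modified when nothing changed. This is a minimal Python sketch, assuming a one-day max-age and a SHA-256-based ETag; the helper names are hypothetical and not part of any framework.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

ONE_DAY = 24 * 60 * 60  # assumption: roughly one day of staleness is acceptable


def make_etag(body: bytes) -> str:
    # Hypothetical helper: a strong validator derived from the representation bytes.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_get(request_headers: dict, body: bytes, last_modified: datetime,
                    max_age: int = ONE_DAY):
    """Return (status, headers, body); last_modified must be a UTC-aware datetime."""
    last_modified = last_modified.replace(microsecond=0)  # HTTP dates have 1 s precision
    headers = {
        "Cache-Control": f"max-age={max_age}",  # caches may reuse the response this long
        "ETag": make_etag(body),
        "Last-Modified": format_datetime(last_modified, usegmt=True),
    }
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match takes precedence; If-Modified-Since is then ignored.
        tags = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in tags or headers["ETag"] in tags:
            return 304, headers, b""
        return 200, headers, body
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable date: ignore the header
        if since is not None and since.tzinfo is not None and last_modified <= since:
            return 304, headers, b""
    return 200, headers, body
```

With max-age=86400, a client or proxy serves its copy for up to a day and then revalidates, and the 304 path keeps that revalidation cheap. The rarely changing /someInfo resource could get a longer max-age if the business accepts the risk.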

Related

Testing specific Azure Web Site Instance

I have an Azure Web Site configured to use multiple (two) instances.
I have a service bus that should pass messages (i.e. cache evictions) between the instances. I need to test this mechanism.
In a conventional (on-premises) system I would point a browser to instance 1 (i.e. http://myserver1.example.com), perform an action, then point my browser to the other instance (http://myserver2.example.com) to test.
However, in Azure I can't see a way to hit a specific instance. Is it possible? Or is there an alternative way to run through this test scenario (act on instance 1, ensure instance 2 behaves appropriately)?
Unfortunately, there isn't an official way of doing this. However, you can achieve that by setting a cookie called ARRAffinity on your request.
Try hitting your site from any client (Chrome, Firefox, curl, httpie, etc) and inspect the response headers that you are getting back.
For example, with curl you would run
curl -I <siteName>.azurewebsites.net
and you would get something like this:
HTTP/1.1 200 OK
Content-Length: 2
Content-Type: text/html
Last-Modified: Wed, 17 Sep 2014 16:57:26 GMT
Accept-Ranges: bytes
ETag: "2ba0757598d2cf1:0"
Server: Microsoft-IIS/8.0
X-Powered-By: ASP.NET
Set-Cookie: ARRAffinity=<very long hash>; Path=/;Domain=<siteName>.azurewebsites.net
Date: Fri, 28 Nov 2014 03:13:07 GMT
What you are interested in is the ARRAffinity cookie. If you send a couple of requests, you will notice that the hash keeps alternating between two values, which represent your two instances.
Setting one of those values in the Cookie header of your request will route it to that instance and not the other.
curl --cookie ARRAffinity=<one of the hashes you got> <siteName>.azurewebsites.net
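To script the test instead of running curl by hand, a minimal Python sketch could collect the ARRAffinity values from a few unpinned requests and then pin later requests to one instance. It assumes, as described above, that the site returns ARRAffinity in Set-Cookie; this relies on ARR's observed behaviour rather than a documented API, and the helper names are hypothetical.

```python
import urllib.request
from http.cookies import SimpleCookie


def arr_affinity(set_cookie_header: str):
    """Return the ARRAffinity value from one Set-Cookie header, or None if absent."""
    jar = SimpleCookie()
    jar.load(set_cookie_header)
    morsel = jar.get("ARRAffinity")
    return morsel.value if morsel is not None else None


def discover_instances(url: str, attempts: int = 10) -> set:
    """Send several HEAD requests without cookies and collect the distinct affinity values."""
    seen = set()
    for _ in range(attempts):
        # urlopen keeps no cookie jar, so each request is unpinned.
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request) as response:
            for header in response.headers.get_all("Set-Cookie") or []:
                value = arr_affinity(header)
                if value:
                    seen.add(value)
    return seen


def pinned_request(url: str, affinity: str) -> urllib.request.Request:
    """Build a request carrying ARRAffinity so ARR routes it to one instance."""
    return urllib.request.Request(url, headers={"Cookie": f"ARRAffinity={affinity}"})
```

With two instances, discover_instances on your <siteName>.azurewebsites.net URL should return two values; passing pinned_request(url, value) to urllib.request.urlopen for each value lets you act on one instance and then check the other.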

Datapower adds characters outside soap envelope. What are they?

First off, I'm a total newbie and way over my head. I have googled my brains out and can't get what seems like a super easy question answered. My company just started sending requests via DataPower, and although we haven't changed anything else, a partner is receiving a constant 3 characters above the SOAP envelope and a 0 below it, exactly like the example below. I can't for the life of me find what these numbers are supposed to represent, or why we are suddenly sending them because of the switch to DataPower. Any insight would be much appreciated. TIA.
In this example the 191 and 0 are the problem.
HTTP/1.1 200 OK
Date: Sat, 16 Feb 2008 00:30:34 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Transfer-Encoding: chunked
X-Transport-Caps-Negotiation-Flags: 0,0,0,0,0
Content-Type: text/xml
191
<soap:Envelope/>
0
I see Transfer-Encoding: chunked in the headers, so the 191 and 0 are chunk sizes, written in hexadecimal: 0x191 means 401 bytes of data follow, and the final 0 marks the last chunk. Read about it on Wikipedia and in RFC 2616.
If the receiver really does not understand chunked encoding, you might have to disable it somewhere in the SOAP handler or IIS.
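To show why 191 and 0 appear, here is a minimal Python sketch of how a receiver removes chunked framing: each chunk is a hexadecimal size line, that many bytes of data, and a CRLF, and a zero-size chunk ends the body. It ignores chunk extensions and trailers, so it is an illustration rather than a complete HTTP parser.

```python
def decode_chunked(raw: bytes) -> bytes:
    """Undo HTTP/1.1 chunked transfer coding (chunk extensions and trailers are ignored)."""
    body = bytearray()
    pos = 0
    while True:
        line_end = raw.index(b"\r\n", pos)  # end of the chunk-size line
        size = int(raw[pos:line_end].split(b";", 1)[0], 16)  # the size is hexadecimal
        pos = line_end + 2
        if size == 0:
            return bytes(body)  # last chunk ("0"): the message body is complete
        body += raw[pos:pos + size]
        pos += size
        if raw[pos:pos + 2] != b"\r\n":
            raise ValueError("chunk data not followed by CRLF")
        pos += 2
```

A receiver that does not apply this step sees the size lines as part of the XML, which matches the partner's symptom.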

Http range header requests entire file

I am working with the C# web server from CodePlex, version 1.1. I have implemented the Accept-Ranges header, and it does work. However, when I use Wireshark (version 1.4.1 (SVN Rev 34476 from /trunk-1.4)) to capture the traffic, I see the following:
GET /movies/i_am_legend%20dvd/main.m4v HTTP/1.1
Host: 10.100.1.199:8081
Accept: */*
Range: bytes=0-1
Accept-Encoding: identity
Connection: keep-alive
User-Agent: AppleCoreMedia/1.0.0.9B206 (iPad; U; CPU OS 5_1_1 like Mac OS X; nl_nl)
X-Playback-Session-Id: 9CED81CC-BFAE-4CF6-A477-0EA62B2C652F
HTTP/1.1 206 PartialContent
Content-Range: bytes 0-1/652965648
Accept-Ranges: bytes
ETag: "0daA8D4/wgt4MFvxdNIPLw=="
Date: Wed, 13 Jun 2012 09:10:18 GMT
Content-Length: 2
Content-Type: video/x-m4v
Server: Tiny WebServer
Connection: keep-alive
.. << 2 bytes data
GET /movies/i_am_legend%20dvd/main.m4v HTTP/1.1
Host: 10.100.1.199:8081
Accept: */*
Range: bytes=0-652965647
Accept-Encoding: identity
Connection: keep-alive
User-Agent: AppleCoreMedia/1.0.0.9B206 (iPad; U; CPU OS 5_1_1 like Mac OS X; nl_nl)
X-Playback-Session-Id: 9CED81CC-BFAE-4CF6-A477-0EA62B2C652F
HTTP/1.1 206 PartialContent
Content-Range: bytes 0-652965647/652965648
Accept-Ranges: bytes
ETag: "0daA8D4/wgt4MFvxdNIPLw=="
Date: Wed, 13 Jun 2012 09:10:18 GMT
Content-Length: 652965648
Content-Type: video/x-m4v
Server: Tiny WebServer
Connection: keep-alive
The web server tries to send the entire file (>600 MB), but Wireshark shows that the entire conversation is only 159774 bytes. If I do the same thing with IIS, I get similar headers:
GET /ipod/main.m4v HTTP/1.1
Host: 10.100.1.199
User-Agent: AppleCoreMedia/1.0.0.9B206 (iPad; U; CPU OS 5_1_1 like Mac OS X; nl_nl)
Accept: */*
Range: bytes=0-1
Accept-Encoding: identity
X-Playback-Session-Id: C5BBF91D-78AB-42BA-ACE0-D74AB9D845CE
Connection: keep-alive
HTTP/1.1 206 Partial Content
Content-Type: video/x-m4v
Last-Modified: Mon, 11 Jun 2012 10:33:41 GMT
Accept-Ranges: bytes
ETag: "7243cabbd47cd1:0"
Server: Microsoft-IIS/7.5
X-Powered-By: ASP.NET
Date: Wed, 13 Jun 2012 09:21:03 GMT
Content-Length: 2
Content-Range: bytes 0-1/652965648
.. << 2 bytes of data
GET /ipod/main.m4v HTTP/1.1
Host: 10.100.1.199
User-Agent: AppleCoreMedia/1.0.0.9B206 (iPad; U; CPU OS 5_1_1 like Mac OS X; nl_nl)
Accept: */*
Range: bytes=0-652965647
Accept-Encoding: identity
X-Playback-Session-Id: C5BBF91D-78AB-42BA-ACE0-D74AB9D845CE
Connection: keep-alive
HTTP/1.1 206 Partial Content
Content-Type: video/x-m4v
Last-Modified: Mon, 11 Jun 2012 10:33:41 GMT
Accept-Ranges: bytes
ETag: "7243cabbd47cd1:0"
Server: Microsoft-IIS/7.5
X-Powered-By: ASP.NET
Date: Wed, 13 Jun 2012 09:21:03 GMT
Content-Length: 652965648
Content-Range: bytes 0-652965647/652965648
Wireshark shows that the entire conversation is 175615 bytes.
I have searched for more information on the Accept-Ranges header, and so far I can only find that the server must send the requested range. But I can't believe that a range request was meant to be used for requesting a huge file in one go.
My web server tries to send the entire file because it has been requested as such, but I see new range requests coming in with further huge ranges like these (only the Range header is copied from each request; the (# time ...) value is the Wireshark timestamp):
Range: bytes=2162688-652965647 (# time == 1.646204)
Range: bytes=4980736-652965647 (# time == 2.754322)
Range: bytes=6356992-652965647 (# time == 2.922479)
After reading this, I tried sending a shorter range whenever I got a range request for the entire file, but then it did not work at all.
I would like to know:
Is the range request for the entire file some kind of bug in iOS (I have seen it with 4.3.3 as well)? I would have expected Range: bytes=0-1 and, after the reply, something like Range: bytes=0-65535/652965648
Can I somehow gracefully deny this large request and tell the requester that I can deliver only a maximum size at once? (I did not find this in the RFC.)
Is IIS simply aborting this request after a certain number of bytes?
EDIT: For number 3: it is not IIS; the browser seems to simply abort (and close) the connection and then make a new request. I can't imagine that the Range request was meant to request the entire file or HUGE parts of the file.
EDIT: In iOS 7 it seems to have changed. The first range request is still the same (bytes=0-1). After that, I see 2 or 3 range requests as mentioned above, where the last request keeps transferring bytes for a longer period. However, multiple requests are still made.
Is the range request for the entire file some kind of bug in iOS (I have seen it with 4.3.3 as well)? I would have expected Range: bytes=0-1 and, after the reply, something like Range: bytes=0-65535/652965648
I don't know whether it is a bug. However, I can think of reasons for a media player to request the entire file in one request. In this way the media player gets a data stream from which it can read all data from start to finish.
As soon as the media player has read enough data from the stream, it can start playing the media file. It then chooses how much more data it should buffer in the background while the media is playing. There are several possible approaches:
Eagerly buffer the entire media file. This is a good strategy when bandwidth is cheap (user is not paying or paying a flat rate for data transfer). It is assumed that the user will want to see/listen to the entire media file.
Lazily buffer just enough to avoid lagging. This is a good strategy when bandwidth is expensive (user is paying by the byte).
In an ideal setup, the media player wouldn't have to buffer anything at all and instead decode data from the stream while playing the media in real time. However, that would require that the underlying network channel would be super stable and transfer data at the required pace at all times.
In practice this is not the case, so the media player will choose to buffer a couple of seconds or minutes ahead.
It is important to note that whatever strategy is chosen it could still make sense for the media player to request the entire resource in a single request.
However, range requests are vital to the media player when:
The connection is aborted (for any reason).
The user jumps ahead in the media. (for example, wants to see what's 10 minutes into a movie)
The media player can then close the data stream it originally opened and send a range request for the desired position.
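On the server side, a single request for a >600 MB range is easier to live with if the response body is streamed in small blocks instead of being read into memory: when the player aborts the connection, the server simply stops iterating. A minimal Python sketch, assuming a seekable binary file object; the 64 KiB block size is an arbitrary choice and the function name is hypothetical.

```python
import io
from typing import BinaryIO, Iterator


def iter_range(f: BinaryIO, first: int, last: int,
               block_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield bytes first..last (inclusive) from a seekable binary file, one block at a time."""
    f.seek(first)
    remaining = last - first + 1
    while remaining > 0:
        block = f.read(min(block_size, remaining))
        if not block:
            break  # file shorter than expected: stop instead of looping forever
        remaining -= len(block)
        yield block


# Usage with an in-memory file: a web server would write each block to the socket
# and stop iterating as soon as the client closes the connection.
demo = b"".join(iter_range(io.BytesIO(b"0123456789"), 2, 5, block_size=3))
```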
Can I somehow gracefully deny this large request and tell the requester that I can deliver only a maximum size at once?
No, you cannot. Range requests are initiated by the client/browser, and a server that has stated that it supports range requests (via the Accept-Ranges header) must obey the client and respond with whatever range it requests.
What you can do, however, is send the data with a Transfer-Encoding: chunked header. This lets your server control the size of each chunk of data it transmits. However, it is still all sent over a single HTTP connection.
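For a server like the asker's, honouring a single byte range comes down to turning the Range header into first/last positions and echoing them in Content-Range. A minimal Python sketch that handles only one bytes= range (no multipart ranges); returning None is a simplification meaning the caller should reply 416 or ignore the header, and the function names are hypothetical.

```python
import re

_BYTE_RANGE = re.compile(r"bytes=(\d*)-(\d*)")


def single_range(range_header: str, size: int):
    """Resolve one "bytes=first-last" range against a resource of `size` bytes.

    Returns (first, last) inclusive, or None if unsupported or unsatisfiable.
    """
    m = _BYTE_RANGE.fullmatch(range_header.strip())
    if m is None or m.group(1) == m.group(2) == "":
        return None  # multiple ranges or malformed: not handled in this sketch
    first_s, last_s = m.groups()
    if first_s == "":
        # Suffix range "bytes=-N": the final N bytes of the resource.
        n = int(last_s)
        return (max(size - n, 0), size - 1) if n > 0 and size > 0 else None
    first = int(first_s)
    last = size - 1 if last_s == "" else min(int(last_s), size - 1)  # clamp to the end
    if first >= size or last < first:
        return None
    return first, last


def partial_content_headers(first: int, last: int, size: int) -> dict:
    """Headers for a 206 Partial Content response carrying bytes first..last."""
    return {
        "Accept-Ranges": "bytes",
        "Content-Range": f"bytes {first}-{last}/{size}",
        "Content-Length": str(last - first + 1),
    }
```

With the iPad captures above, bytes=0-652965647 yields Content-Range: bytes 0-652965647/652965648 and Content-Length: 652965648, the same values both servers sent.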

How does HTTP download work?

Let's say I want to download a file called example.pdf from http://www.xxx.yyy/example.pdf
Presumably, I send a GET request like this:
GET /example.pdf HTTP/1.1␍␊
Host: www.xxx.yyy␍␊
␍␊
But what's next?
What does the exchange of HTTP headers look like?
I'm assuming you've read the Wikipedia article on the HTTP protocol. If you just need more examples I'd highly recommend you download Wireshark. Wireshark is an extremely powerful packet sniffer which will allow you to watch packet communications between you and any website. In addition it will actually break down the packets and tell you a little bit about their meanings in more "human terms". It has a bit of a learning curve but it can teach you a lot about a number of different protocols including HTTP.
http://www.wireshark.org/
I'm not sure what your ultimate goal is, but you can view real-time HTTP header interaction with the Live HTTP Headers Firefox add-on. It's also possible in Chrome, but it's a little more work.
Check the HTTP 1.1 RFC.
You might want to look at http://www.w3.org/Protocols/rfc2616/rfc2616.html. But also, there is rarely a need to recreate the protocol.
To answer such a GET request, the server sends back a response with headers like the following:
HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Length: 6475593
Content-Type: application/x-msdownload
Etag: "qwfw473usll"
Last-Modified: Sun, 18 Jul 2021 12:02:31 GMT
Server: Caddy
Date: Sun, 18 Jul 2021 12:03:47 GMT
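Each header line ends with CRLF, and after the last header line you must send an empty line (so the header block ends with two consecutive CRLFs), followed by the raw bytes of the file being transmitted.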
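Putting the request and the response framing together, a bare-bones download over a plain TCP socket might look like this Python sketch. It rests on stated assumptions: plain HTTP on port 80 (no TLS), an added Connection: close header so the server ends the body by closing the socket, and no handling of chunked transfer coding or redirects. The host is the placeholder from the question and the function names are hypothetical.

```python
import socket


def build_get(host: str, path: str) -> bytes:
    # Every request line and header line ends with CRLF; an empty line ends the header block.
    return (f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n"  # assumption: ask the server to close when the body is done
            "\r\n").encode("ascii")


def split_response(raw: bytes):
    """Split a raw HTTP/1.1 response into (status line, headers, body)."""
    head, sep, body = raw.partition(b"\r\n\r\n")  # the blank line separates headers from body
    if not sep:
        raise ValueError("incomplete header block")
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # field names are case-insensitive
    if "content-length" in headers:
        body = body[:int(headers["content-length"])]  # ignore anything past the declared length
    return lines[0], headers, body


def download(host: str, path: str, port: int = 80):
    """Fetch one resource over plain HTTP (no TLS, no chunked decoding, no redirects)."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(build_get(host, path))
        chunks = []
        while True:
            data = sock.recv(65536)
            if not data:  # server closed the connection: the response is complete
                break
            chunks.append(data)
    return split_response(b"".join(chunks))
```

download("www.xxx.yyy", "/example.pdf") would return the status line, a dict of headers like those shown above, and the file bytes.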

Cneonction and nnCoection HTTP headers

We often have interoperability issues on the Web. One of these, for browser vendors, is the misspelled Connection HTTP header. The two most common misspellings are:
nnCoection:
Cneonction:
There have been a few articles about this, including "Fun with HTTP headers". It often shows up for a period, then disappears. It seems that some of these are created by load balancers, for example the NetScaler appliance.
Do you know any other instances of hardware or software that create these issues?
Update: Here is one example (among others) of a site that doesn't send back a correct Connection HTTP header.
curl -sI ehg-nokiafin.hitbox.com
HTTP/1.1 200 OK
Date: Tue, 25 Jan 2011 20:35:45 GMT
Server: Hitbox Gateway 9.3.6-rc1
P3P: policyref="/w3c/p3p.xml", CP="NOI DSP LAW NID PSA ADM OUR IND NAV COM"
Cneonction: close
Pragma: no-cache
Cache-Control: max-age=0, private, proxy-revalidate
Expires: Tue, 25 Jan 2011 20:35:46 GMT
Content-Type: text/plain
Content-Length: 23
Update 2011-01-26:
On the Amazon AWS forum, there is a thread about nnCoection. A comment says:
FYI, the reason it misspells the word connection is so that the internet check-sum (a simple sum) still adds up, this way the change can occur at the packet level. If it completely removed the header, it would have to stall forwarding the response until the header was entirely read, so it could rewrite the headers, recompute the checksum and then send it along.
Indeed, sum(ord(c) for c in "Connection") and sum(ord(c) for c in "nnCoection") both give 1040.
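The byte sum above is a simplification: the Internet checksum used by IPv4 and TCP (RFC 1071) adds 16-bit big-endian words with end-around carry, so shuffling letters only leaves it unchanged if every letter keeps its even or odd offset. Both misspellings do exactly that, which this Python sketch of the RFC 1071 algorithm checks; the function name is hypothetical.

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum: ones' complement of the ones' complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in (end-around carry)
    return ~total & 0xFFFF
```

"Connection", "nnCoection" and "Cneonction" place the same letters at even offsets (C, n, e, t, o) and at odd offsets (o, n, c, i, n), so they checksum identically; swapping just the first two letters ("oCnnection") keeps the byte sum at 1040 but changes the checksum.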
Are you sure it's an actual issue? The linked article suggests that these sorts of headers are "misspelled on purpose" so that a load balancer, reverse proxy or other middlebox can defeat the server's wishes that the connection be kept alive, without having to track a delta in TCP stream position over the life of the connection. Something like this may actually be necessary to bring a downed and recovered server back into active duty, by forcing kept-alive connections to other servers to migrate to the one brought online.
If you have a protocol that's dependent on HTTP Connection: keep-alive to function (cough), you're probably doing it wrong.
