ASP.NET Web API split response in chunks - asp.net

Is it possible to split a response from Web API into chunks, as follows?
I have a WinForms app that can handle 100 KB of data at a time.
When this client makes a request to my ASP.NET Web API for some data, let's assume the Web API response is 2 MB. Can I somehow cache this response, split it into 100 KB chunks, and return the first chunk to my app, with that response containing a link/token to the next chunk, and so forth? Does this sound crazy? Is this feasible?
One more question: when we talk about request/response content length (size), what does that mean: that the content itself cannot be bigger than 100 KB, or the content plus headers and so on? I want to know whether headers are included in the content length.
Example: if my response body is 99 KB and the headers are 10 KB (109 KB total), will this pass if the limit is 100 KB?

Pagination is a pretty common solution to large data sets in web services. Facebook, for example, will paginate API results when they exceed a certain number of rows. Your idea of locally caching the full result is a good optimization, though you can probably get away without caching in a first implementation if you are unsure whether you will keep this as your final solution.
Without caching, you can just pass the page number and total number of pages back to your client, and it can then re-make the call with a specific page number in mind for the next set. This makes for an easy loop on the client, and your data access layer is marginally more complicated since it will only re-serialize certain row numbers depending on the page parameter, but that should still be pretty simple.
Web API has access to the same HttpRuntime.Cache object as the rest of the ASP.NET project types do, so it should be easy to write a wrapper around your data access call and stick the result of a larger query into that cache. Use a token as the key to that value in the cache and pass the key, likely an instance of the GUID class, back to the client along with the current page number. On subsequent calls, skip your normal persistence method (DB, file, etc.), look up the GUID key in HttpRuntime.Cache instead, and return the appropriate rows. One wrinkle is multiple web servers hosting your service: the HttpRuntime.Cache will exist only on the machine that took the first call, so unless your load balancer has IP affinity or you have a distributed caching layer, this will be more difficult to implement.
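As a rough sketch of that token-based approach (the ChunkedDataController name, the MyRow type, the GetLargeResult() call, and the page size are all illustrative placeholders, not part of the question), something like this could work:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Web;
    using System.Web.Caching;
    using System.Web.Http;

    // Sketch only: the first call caches the full result under a GUID token,
    // later calls page through the cached copy.
    public class ChunkedDataController : ApiController
    {
        private const int PageSize = 100;   // rows per page; tune so one page stays under ~100 KB

        public object Get(Guid? token = null, int page = 0)
        {
            var rows = token.HasValue
                ? HttpRuntime.Cache[token.Value.ToString()] as List<MyRow>
                : null;

            if (rows == null)
            {
                token = Guid.NewGuid();
                rows = GetLargeResult();    // hypothetical expensive data-access call
                HttpRuntime.Cache.Insert(token.Value.ToString(), rows, null,
                    DateTime.UtcNow.AddMinutes(10), Cache.NoSlidingExpiration);
            }

            return new
            {
                Token = token,                                        // client sends this back for the next page
                Page = page,
                TotalPages = (rows.Count + PageSize - 1) / PageSize,  // ceiling division
                Rows = rows.Skip(page * PageSize).Take(PageSize)
            };
        }

        private List<MyRow> GetLargeResult() { /* stand-in for the real query */ return new List<MyRow>(); }
    }

    public class MyRow { /* whatever the real row type is */ }

The client loops, passing back Token and an incrementing Page until Page reaches TotalPages; remember the per-server caveat above if you run on a farm.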

Related

How to efficiently stream video over HTTP directly from SQL Server?

I'm trying to implement a video-streaming service. I use ASP.NET Web API, and from what I've found, PushStreamContent is exactly what I want. It works fine, sending HTTP response 206 (partial content) to the client, keeping the connection alive, and pushing (writing) streams of bytes to the output.
However, I can't scale, because I can't retrieve partial binary data from the database. For example, consider that I have a 300 MB video in my SQL Server table (a varbinary field), and I use Entity Framework to get the record and then push it to the client using PushStreamContent.
However, this hugely impacts RAM: each seek the client performs costs the server another 600 MB of memory. Look at it in action:
1) First request for video
2) Second request (seeking to the middle of the video)
3) Third request (seeking into the last quarter of the video)
This cannot scale at all. Ten users watching this movie, and our server is down.
What should I do? How can I stream video directly from a SQL Server table without loading the entire video into RAM with Entity Framework and then pushing it to the client via PushStreamContent?
You could combine the SUBSTRING function with VARBINARY fields to return portions of your data. But I suspect you'd prefer a solution that doesn't require jumping from one chunk to the next.
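A rough sketch of that SUBSTRING idea, reading the varbinary(max) column in fixed chunks and pushing each chunk to the client so the full video is never held in RAM (the Videos table, Data and Id columns, the connection string, and the media type are assumptions, not taken from the question; range/206 handling is omitted):

    using System;
    using System.Data.SqlClient;
    using System.Net;
    using System.Net.Http;
    using System.Web.Http;

    public class VideosController : ApiController
    {
        private readonly string connectionString = "<your connection string>";   // placeholder

        public HttpResponseMessage GetVideo(int id)
        {
            var response = Request.CreateResponse(HttpStatusCode.OK);
            response.Content = new PushStreamContent(async (output, content, context) =>
            {
                const int chunkSize = 64 * 1024;
                try
                {
                    using (var conn = new SqlConnection(connectionString))
                    {
                        await conn.OpenAsync();
                        long offset = 1;                                  // SUBSTRING is 1-based
                        while (true)
                        {
                            using (var cmd = new SqlCommand(
                                "SELECT SUBSTRING(Data, @offset, @length) FROM Videos WHERE Id = @id", conn))
                            {
                                cmd.Parameters.AddWithValue("@offset", offset);
                                cmd.Parameters.AddWithValue("@length", chunkSize);
                                cmd.Parameters.AddWithValue("@id", id);

                                var chunk = (await cmd.ExecuteScalarAsync()) as byte[];
                                if (chunk == null || chunk.Length == 0)
                                    break;                                // past the end of the data

                                await output.WriteAsync(chunk, 0, chunk.Length);
                                offset += chunk.Length;
                            }
                        }
                    }
                }
                finally
                {
                    output.Close();                                       // signals the host that we're done
                }
            }, "video/mp4");
            return response;
        }
    }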
You may also want to review this similar question.

Handle ActionResults as cacheable, "static content" in ASP.NET MVC (4)

I have a couple of action methods that return content from the database that does not change very often (e.g. a polygon list of available ZIP areas, returned as JSON; it changes twice per year).
I know there is the [OutputCache(...)] attribute, but it has some disadvantages (long client-side caching is not good; if the server/IIS/process gets restarted, the server-side cache is lost too).
What I want is for MVC to store the result in the file system, calculate a hash, and, if the hash hasn't changed, return HTTP status code 304, like it does for images by default.
Does anybody know a solution for that?
I think it's a bad idea to try to cache data on the file system because:
It is not going to be much faster to read your data from the file system than to get it from the database, even if you already have it in JSON format.
You are going to add a lot of logic to calculate and compare the hash, and also to read data from a file. That means new bugs and more complexity.
If I were you I would keep it as simple as possible. Store your data in the Application container. Yes, you will have to reload it every time the application starts, but that should not be a problem at all, since the application is not supposed to restart often. Also consider using a distributed cache like AppFabric if you have a web farm, so that you don't end up with different data in the Application containers on different servers.
And one more important note: caching means really fast access, and you can't achieve that with file system or database storage; it is in-memory storage you should consider.
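A minimal sketch that combines the in-memory suggestion above with the 304 idea from the question: keep the serialized JSON in the ASP.NET cache and answer If-None-Match with 304 when the payload hasn't changed. The controller name, cache key, and LoadZipAreasJson() helper are assumptions.

    using System;
    using System.Security.Cryptography;
    using System.Text;
    using System.Web;
    using System.Web.Mvc;

    public class ZipAreasController : Controller
    {
        public ActionResult ZipAreas()
        {
            var json = HttpRuntime.Cache["zipAreasJson"] as string;
            if (json == null)
            {
                json = LoadZipAreasJson();                       // hypothetical DB call + serialization
                HttpRuntime.Cache.Insert("zipAreasJson", json);
            }

            // ETag derived from the cached payload.
            string etag;
            using (var md5 = MD5.Create())
                etag = "\"" + BitConverter.ToString(md5.ComputeHash(Encoding.UTF8.GetBytes(json)))
                                          .Replace("-", "") + "\"";

            if (Request.Headers["If-None-Match"] == etag)
                return new HttpStatusCodeResult(304);            // unchanged: no body sent

            Response.AppendHeader("ETag", etag);
            return Content(json, "application/json");
        }

        private string LoadZipAreasJson() { /* stand-in for the real query */ return "[]"; }
    }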

Consequences of POST not being idempotent (RESTful API)

I am wondering if my current approach makes sense or if there is a better way to do it.
I have multiple situations where I want to create new objects and let the server assign an ID to those objects. Sending a POST request appears to be the most appropriate way to do that.
However since POST is not idempotent the request may get lost and sending it again may create a second object. Also requests being lost might be quite common since the API is often accessed through mobile networks.
As a result I decided to split the whole thing into a two-step process:
First, sending a POST request to create a new object, which returns the URI of the new object in the Location header.
Second, performing an idempotent PUT request to the supplied Location to populate the new object with data. If a new object is not populated within 24 hours, the server may delete it through some kind of batch job.
Does that sound reasonable or is there a better approach?
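For reference, a minimal client-side sketch of the two-step flow described above, assuming an HttpClient and a made-up "orders" route: the POST only reserves an id/URI, and the PUT, being idempotent, can be retried safely.

    using System;
    using System.Net.Http;
    using System.Text;
    using System.Threading.Tasks;

    static async Task CreateAndPopulateAsync()
    {
        using (var client = new HttpClient { BaseAddress = new Uri("https://api.example.com/") })
        {
            // Step 1: create an empty resource; the server returns its URI in Location.
            var created = await client.PostAsync("orders", new StringContent(string.Empty));
            Uri location = created.Headers.Location;

            // Step 2: populate it. If this request is lost, just send it again.
            var body = new StringContent("{\"item\":\"book\",\"qty\":1}", Encoding.UTF8, "application/json");
            var populated = await client.PutAsync(location, body);
            populated.EnsureSuccessStatusCode();
        }
    }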
The only advantage of POST-creation over PUT-creation is the server generation of IDs.
I don't think it's worth the lack of idempotency (and thus the need to remove duplicates or empty objects).
Instead, I would use a PUT with a UUID in the URL. Thanks to UUID generators, you can be nearly certain that the ID you generate client-side will be unique server-side.
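A sketch of that PUT-creation idea (the orders/{id} route is an assumption): retrying the exact same request cannot create a second resource, because the id never changes.

    using System;
    using System.Net.Http;
    using System.Text;
    using System.Threading.Tasks;

    static async Task CreateOrderAsync(HttpClient client)
    {
        var id = Guid.NewGuid();                               // generated client-side
        var body = new StringContent("{\"item\":\"book\"}", Encoding.UTF8, "application/json");

        // PUT to the final URI; the server creates the resource if it does not exist yet.
        var response = await client.PutAsync("orders/" + id, body);
        response.EnsureSuccessStatusCode();
    }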
Well, it all depends. To start with, you should talk more about URIs, resources, and representations, and not be concerned with objects.
The POST method is designed for non-idempotent requests, or requests with side effects, but it can be used for idempotent requests.
On POST of form data to /some_collection/:
    normalize the natural key of your data (e.g. lowercase the Title field for a blog post)
    calculate a suitable hash value (e.g. in the simplest case, your normalized field value)
    look up the resource by hash value
    if none is found:
        generate a server identity and create the resource
        respond => 201 Created, Location: /some_collection/<new_id>
    if found, but no updates should be carried out due to app logic:
        respond => 302 Found (Moved Temporarily) or 303 See Other
        (the client will need to GET that resource, which might include fields required for updates, like version numbers)
    if found, and updates may occur:
        respond => 307 Temporary Redirect, Location: /some_collection/<id>
        (like a 302, but the client should reuse the original HTTP method, and might do so automatically)
A suitable hash function might be as simple as some concatenated fields, or for large fields or values a truncated MD5 could be used. See [hash function] for more details. (A code sketch of this outline follows below.)
I've assumed that you need a different identity value than the hash value, and that the data fields used for identity can't be changed.
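A server-side sketch of the outline above, assuming a Web API controller; BlogPostDto, BlogPost, and the repository are placeholders, not part of the original answer.

    using System;
    using System.Net;
    using System.Net.Http;
    using System.Security.Cryptography;
    using System.Text;
    using System.Web.Http;

    public class BlogPostsController : ApiController
    {
        private readonly IBlogPostRepository repository;                 // assumed to be injected

        public HttpResponseMessage Post(BlogPostDto dto)
        {
            // 1. Normalize the natural key (here: the title).
            string normalized = dto.Title.Trim().ToLowerInvariant();

            // 2. Hash it; a truncated MD5 is enough for a lookup key.
            string hash;
            using (var md5 = MD5.Create())
                hash = BitConverter.ToString(md5.ComputeHash(Encoding.UTF8.GetBytes(normalized)))
                                   .Replace("-", "").Substring(0, 16);

            // 3. Look the resource up by hash.
            BlogPost existing = repository.FindByHash(hash);             // hypothetical repository call
            if (existing == null)
            {
                BlogPost created = repository.Create(dto, hash);         // server generates the identity
                var response = Request.CreateResponse(HttpStatusCode.Created);
                response.Headers.Location = new Uri("/some_collection/" + created.Id, UriKind.Relative);
                return response;
            }

            // 4. Found, and no update should happen: point the client at the existing resource.
            var seeOther = Request.CreateResponse(HttpStatusCode.SeeOther);
            seeOther.Headers.Location = new Uri("/some_collection/" + existing.Id, UriKind.Relative);
            return seeOther;
        }
    }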
Your method of generating ids at the server, in the application, in a dedicated request-response, is a very good one! Uniqueness is very important, but clients, like suitors, are going to keep repeating the request until they succeed, or until they get a failure they're willing to accept (unlikely). So you need to get uniqueness from somewhere, and you only have two options: either the client, with a GUID as Aurélien suggests, or the server, as you suggest. I happen to like the server option. Seed columns in relational DBs are a readily available source of uniqueness with zero risk of collisions. Around 2000, I read an article advocating this solution, called something like "Simple Reliable Messaging with HTTP", so this is an established approach to a real problem.
Reading REST stuff, you could be forgiven for thinking a bunch of teenagers had just inherited Elvis's mansion. They're excitedly discussing how to rearrange the furniture, and they're hysterical at the idea they might need to bring something from home. The use of POST is recommended because it's there, without ever broaching the problems with non-idempotent requests.
In practice, you will likely want to make sure all unsafe requests to your api are idempotent, with the necessary exception of identity generation requests, which as you point out don't matter. Generating identities is cheap and unused ones are easily discarded. As a nod to REST, remember to get your new identity with a POST, so it's not cached and repeated all over the place.
Regarding the sterile debate about what idempotent means, I say it needs to be everything. Successive requests should generate no additional effects, and should receive the same response as the first processed request. To implement this, you will want to store all server responses so they can be replayed, and your ids will be identifying actions, not just resources. You'll be kicked out of Elvis's mansion, but you'll have a bombproof api.
But now you have two requests that can be lost? And the POST can still be repeated, creating another resource instance. Don't over-think stuff. Just have the batch process look for dupes. Possibly have some "access" count statistics on your resources to see which of the dupe candidates was the result of an abandoned post.
Another approach: screen incoming POSTs against some log to see whether a request is a repeat. That should be easy to detect: if the body content of a request is the same as that of a request just x time ago, consider it a repeat. And you could check extra parameters like the originating IP, the same authentication, ...
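A sketch of that screening idea: remember a hash of each POST body for a short window and treat an identical body from the same client as a repeat. This is purely in-memory and per-server, the key shape and window are assumptions, and old entries would still need pruning.

    using System;
    using System.Collections.Concurrent;
    using System.Security.Cryptography;
    using System.Text;

    static class PostScreen
    {
        private static readonly ConcurrentDictionary<string, DateTime> Recent =
            new ConcurrentDictionary<string, DateTime>();

        public static bool IsRepeat(string clientId, string body, TimeSpan window)
        {
            string key;
            using (var md5 = MD5.Create())
                key = clientId + ":" + BitConverter.ToString(md5.ComputeHash(Encoding.UTF8.GetBytes(body)));

            DateTime seen;
            if (Recent.TryGetValue(key, out seen) && DateTime.UtcNow - seen < window)
                return true;                       // same body seen recently: treat as a repeat

            Recent[key] = DateTime.UtcNow;         // record (or refresh) this request
            return false;
        }
    }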
No matter what HTTP method you use, it is theoretically impossible to make an idempotent request without generating the unique identifier client-side, temporarily (as part of some request checking system) or as the permanent server id. An HTTP request being lost will not create a duplicate, though there is a concern that the request could succeed getting to the server but the response does not make it back to the client.
If the end client can easily delete duplicates and they don't cause inherent data conflicts it is probably not a big enough deal to develop an ad-hoc duplication prevention system. Use POST for the request and send the client back a 201 status in the HTTP header and the server-generated unique id in the body of the response. If you have data that shows duplications are a frequent occurrence or any duplicate causes significant problems, I would use PUT and create the unique id client-side. Use the client created id as the database id - there is no advantage to creating an additional unique id on the server.
I think you could also collapse the creation and update requests into a single request (upsert). To create a new resource, the client POSTs to a "factory" resource, located for example at /factory-url-name, and the server then returns the URI of the new resource.
Why don't you use a request ID at your originating point? Your originating point should do two things: first, send a GET for request_id=2 to see whether its request has already been applied, e.g. a response saying the person was created as part of request_id=2.
This ensures your originating system knows which request was executed last, since the request ID is stored in the DB.
Second, if your originating point finds that the last request was still 1 and not yet 2, it may try again with 3, to make sure that it wasn't just the GET response that got lost while request 2 was in fact created in the DB.
You can build in a number of retries for your GET request, a wait time before firing it again, and that kind of mechanism.

How long should I cache an object which can be changed at any time?

I'm in the process of making a fairly complex application. It is expected to run across multiple web servers, otherwise this would be easy.
Basically, we have a set of Client records. Each Client record has an XML column which contains all of the "real" data, such as the client's name and other fields which are created dynamically. Our users can update a client's record at any time. We also have Application records; each application is tied to multiple clients, usually more than 3. Each client's XML data is usually greater than 5 KB of text.
In some profiling I've done, obtaining and deserializing this XML data is a fairly expensive operation. In one portion of our web application, we must have very low latencies (related). During this portion, our web application is a JSON web service. When a request is made to it, usually every client record will be needed (in full, due to how it's currently coded). I'm trying to make as few database hits as possible in this portion.
How long should I cache the Client records' XML objects? Knowing the user can change it at anytime, I'm not sure if I should cache it at all, but can users live with slightly stale data?
Instead of refreshing the cache on any kind of schedule, just compare the last modified date of any critical records with the cached value when accessed, which should be a very inexpensive operation. Then update the cache only when needed.
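A sketch of that check-on-access approach: a cheap scalar query decides whether the cached, already deserialized client record is still valid. The table/column names, the Client type, and the LoadAndDeserializeClient helper are assumptions.

    using System;
    using System.Data.SqlClient;
    using System.Web;

    public class CachedClient { public Client Client; public DateTime LastModified; }

    public Client GetClient(int clientId, string connectionString)
    {
        var cached = HttpRuntime.Cache["client:" + clientId] as CachedClient;

        DateTime lastModified;
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("SELECT LastModified FROM Clients WHERE Id = @id", conn))
        {
            cmd.Parameters.AddWithValue("@id", clientId);
            conn.Open();
            lastModified = (DateTime)cmd.ExecuteScalar();      // cheap compared to pulling the XML
        }

        if (cached != null && cached.LastModified == lastModified)
            return cached.Client;                              // cache is still current

        Client fresh = LoadAndDeserializeClient(clientId);     // the expensive path, taken only when stale
        HttpRuntime.Cache.Insert("client:" + clientId,
            new CachedClient { Client = fresh, LastModified = lastModified });
        return fresh;
    }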
You could store a hash of the XML in the database that the clients validate their cached XML against.
Then, if it doesn't match, invalidate your cache and retrieve fresh data.
When the XML is updated, update the hash along with it, and your clients will notice and update their cache.
Maybe you should use a SqlCacheDependency to ensure the data is removed from the cache and reloaded from the database whenever it changes.
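A minimal sketch of that suggestion: the cached entry is evicted automatically when the Clients table changes. This assumes the <sqlCacheDependency> section has been configured in web.config and SqlCacheDependencyAdmin.EnableTableForNotifications has been run; the "ClientDb" entry name, table name, and LoadClientsFromDatabase() are placeholders.

    using System.Web;
    using System.Web.Caching;

    var clients = LoadClientsFromDatabase();                    // hypothetical load
    var dependency = new SqlCacheDependency("ClientDb", "Clients");
    HttpRuntime.Cache.Insert("clients", clients, dependency);   // evicted when the table changes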

Caching across requests with HttpModule

I have written an HttpModule that accepts the request, processes it (via a database lookup), and outputs results (html) back to the response. (Note that the HttpModule actually ends the request after it is done, so there is no normal ASP.NET processing of the request.)
As the database lookup can be expensive/time-consuming, I would like to store the results in-memory so that subsequent (identical) requests can be served the same content without going to the database.
Where can I store (in-memory) data so that it is available for subsequent invocations of the HttpModule?
You could store it in the Application Cache; the result would then be available application-wide. Be sure to check for "new data" every now and then if necessary.
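A sketch of that idea inside an HttpModule, using HttpRuntime.Cache (one way to reach the application cache from a module); keying on the raw URL, the expiration, and the RenderFromDatabase helper are assumptions.

    using System;
    using System.Web;
    using System.Web.Caching;

    public class LookupModule : IHttpModule
    {
        public void Init(HttpApplication app)
        {
            app.BeginRequest += (sender, e) =>
            {
                var ctx = ((HttpApplication)sender).Context;
                string key = "lookup:" + ctx.Request.RawUrl;

                var html = HttpRuntime.Cache[key] as string;
                if (html == null)
                {
                    html = RenderFromDatabase(ctx.Request);      // the expensive database lookup
                    HttpRuntime.Cache.Insert(key, html, null,
                        DateTime.UtcNow.AddMinutes(5), Cache.NoSlidingExpiration);
                }

                ctx.Response.ContentType = "text/html";
                ctx.Response.Write(html);
                ctx.Response.End();                              // module ends the request, as described
            };
        }

        public void Dispose() { }

        private static string RenderFromDatabase(HttpRequest request)
        {
            return "<html><body>result for " + request.RawUrl + "</body></html>";  // stand-in
        }
    }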
How does the response size compare to the data you fetch from the DB? And what about rendering once you have that data in memory? If rendering is not an issue, I would just cache the data and render it on each request. If rendering takes a lot of CPU, then you should cache the full response and serve it directly from cache.
