How to efficiently stream video over HTTP directly from SQL Server? - asp.net

I'm trying to implement a video-streaming service. I use ASP.NET Web API, and from what I've researched, PushStreamContent is exactly what I want. It works well, sending an HTTP 206 (Partial Content) response to the client, keeping the connection alive, and pushing (writing) streams of bytes to the output.
However, I can't scale, because I can't retrieve partial binary data from the database. For example, consider that I have a 300 MB video in my SQL Server table (a varbinary field), and I use Entity Framework to get the record and then push it to the client using PushStreamContent.
This hugely impacts RAM: each seeking action the client performs costs roughly another 600 MB of memory. Here's what happens:
1) First request for video
2) Second request (seeking to the middle of the video)
3) Third request (seeking into the last quarter of the video)
This cannot scale at all: ten users watching this movie, and our server is down.
What should I do? How can I stream video directly from a SQL Server table, without loading the entire video into RAM with Entity Framework and then pushing it to the client via PushStreamContent?

You could combine the SUBSTRING function with VARBINARY fields, to return portions of your data. But I suspect you'd prefer a solution that doesn't require jumping from one chunk to the next.
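For illustration, here is a rough C# sketch of that approach using plain ADO.NET: it reads the varbinary column one chunk at a time with SUBSTRING, so the full 300 MB value never has to sit in memory. The table and column names (Videos, VideoData, Id) are placeholders, and you would call something like this from inside the PushStreamContent callback, writing to its output stream.

// Rough sketch only: stream a varbinary(max) column in fixed-size chunks via SUBSTRING.
// Table/column names are hypothetical placeholders.
using System;
using System.Data.SqlClient;
using System.IO;

public static class VideoChunkReader
{
    private const int ChunkSize = 64 * 1024; // 64 KB per round trip

    public static void CopyVideoToStream(string connectionString, int videoId, Stream output)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            long offset = 1; // SUBSTRING is 1-based in T-SQL

            while (true)
            {
                using (var command = new SqlCommand(
                    "SELECT SUBSTRING(VideoData, @offset, @length) FROM Videos WHERE Id = @id",
                    connection))
                {
                    command.Parameters.AddWithValue("@offset", offset);
                    command.Parameters.AddWithValue("@length", ChunkSize);
                    command.Parameters.AddWithValue("@id", videoId);

                    var result = command.ExecuteScalar();
                    if (result == null || result == DBNull.Value)
                        break; // no such row, or NULL data

                    var chunk = (byte[])result;
                    if (chunk.Length == 0)
                        break; // past the end of the data

                    output.Write(chunk, 0, chunk.Length);
                    offset += chunk.Length;
                }
            }
        }
    }
}

An alternative worth considering is a single-query read through SqlDataReader with CommandBehavior.SequentialAccess and GetBytes, which also streams the column without repeated round trips.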
You may also want to review this similar question.

Related

HTTP data streaming for beginners?

I'm starting to work on a project where I need to stream Twitter data using PowerTrack/GNIP, and I have to be honest: I'm very inexperienced when it comes to networks and have essentially no knowledge of how HTTP data streams work.
Are there any resources out there that go through all of this in simple terms? I would love to be able to map the data-streaming process out in my head before I start looking at APIs etc.
Thanks
Take a look at the following two resources, which give a good overview of video streaming. Video streaming probably has more background material available, and it should help you understand the concepts:
https://developer.apple.com/library/ios/documentation/NetworkingInternet/Conceptual/StreamingMediaGuide/Introduction/Introduction.html
http://www.jwplayer.com/blog/what-is-video-streaming/
In very simple terms, streaming breaks a large file or live stream into chunks and sends those chunks one after another to a client (e.g. a browser). For content that is not a live stream, the client can generally request a start point. In the background, this generally works by the client sending a request for each individual chunk (rather than just one request with multiple responses).
The advantage of the multiple-request approach is that you know the client is actually still interested (e.g. the user has not browsed to another page), and for video and audio the client can dynamically request files of different bandwidths depending on the current network connection - see: http://en.wikipedia.org/wiki/Adaptive_bitrate_streaming.
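As a concrete illustration of one such per-chunk request, here is a minimal C# sketch that asks a server for only the first megabyte of a file via an HTTP Range header; the URL is a placeholder.

// Minimal sketch: request one "chunk" of a file with an HTTP Range header.
// A video player issues many small requests like this rather than one giant download.
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

public static class RangeRequestExample
{
    public static async Task FetchFirstChunkAsync()
    {
        using (var client = new HttpClient())
        {
            var request = new HttpRequestMessage(HttpMethod.Get, "https://example.com/video.mp4");
            request.Headers.Range = new RangeHeaderValue(0, 1048575); // bytes 0-1048575 (first 1 MB)

            using (var response = await client.SendAsync(request))
            {
                // A server that supports ranges answers 206 Partial Content.
                Console.WriteLine((int)response.StatusCode);
            }
        }
    }
}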
Twitter does have a streaming page as well, but you have probably already seen this:
https://dev.twitter.com/streaming/overview

ASP.NET Web API split response in chunks

Is it possible to split a response from Web API into chunks, as follows?
I have a win forms app that can handle 100 KB of data at a time.
When this client makes a request to my ASP.NET Web API for some data, let's assume the Web API response is 2 MB ... can I somehow cache this response, split it into 100 KB chunks, and return the first chunk to my app? That response would contain a link/token to the next chunk, and so forth. Does this sound crazy? Is this feasible?
One more question: when we are talking about request/response content length (size), what does this mean - that the content itself cannot be bigger than 100 KB, or the content together with the headers? I want to know whether headers are included in the content length.
Example: if my response is 99 KB and the headers are 10 KB (109 KB total), will this pass if the limit is 100 KB?
Pagination is a pretty common solution to large data sets in webservices. Facebook, for example, will paginate API results when they exceed a certain number of rows. Your idea for locally caching the full result is a good optimization, though you can probably get away with not caching it as a first implementation if you are unsure of whether or not you will keep this as your final solution.
Without caching, you can just pass the page number and total number of pages back to your client, and it can then re-make the call with a specific page number in mind for the next set. This makes for an easy loop on the client, and your data access layer is marginally more complicated since it will only re-serialize certain row numbers depending on the page parameter, but that should still be pretty simple.
Web API has access to the same HttpRuntime.Cache object as the rest of the ASP.NET project types, so it should be easy to write a wrapper around your data access call and stick the result of a larger query into that cache. Use a token as the key to that value in the cache - likely an instance of the Guid class - and pass the key back to the client with the current page number. On subsequent calls, skip your normal persistence method (DB, file, etc.) and instead look up the GUID key in HttpRuntime.Cache to find the appropriate rows. One wrinkle: if you have multiple web servers hosting your service, the HttpRuntime.Cache will exist only on the machine that took the first call, so unless your load balancer has IP affinity or you have a distributed caching layer, this will be more difficult to implement.
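A rough sketch of that token approach, assuming a single web server and using hypothetical names (Row, CacheResult, GetPage):

// Rough sketch of GUID-token paging over HttpRuntime.Cache (single-server only).
// Type and member names are hypothetical placeholders, not an existing API.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using System.Web.Caching;

public static class PagedResultCache
{
    private const int PageSize = 100; // rows per page; tune to stay under the 100 KB limit

    // First request: run the expensive query once, cache the full result under a GUID token.
    public static Guid CacheResult(List<Row> allRows)
    {
        var token = Guid.NewGuid();
        HttpRuntime.Cache.Insert(
            token.ToString(), allRows, null,
            DateTime.UtcNow.AddMinutes(10),   // drop the cached result after 10 minutes
            Cache.NoSlidingExpiration);
        return token;
    }

    // Subsequent requests: skip the database and page out of the cached result.
    public static List<Row> GetPage(Guid token, int pageNumber)
    {
        var allRows = (List<Row>)HttpRuntime.Cache.Get(token.ToString());
        if (allRows == null)
            throw new InvalidOperationException("Cached result expired; re-run the original query.");

        return allRows.Skip(pageNumber * PageSize).Take(PageSize).ToList();
    }

    public static int TotalPages(List<Row> allRows)
    {
        return (allRows.Count + PageSize - 1) / PageSize;
    }
}

public class Row { /* whatever one row of your result looks like */ }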

When a Firebase node syncs, is the full new value sent to the server or just the difference?

And does it work the same way when an object is being updated via a callback like ref.on('value', ...)?
I tried to figure it out myself in the Chrome dev tools but wasn't able to.
This makes a difference for me because I'm working on an app where users might store large amounts of text. If only diffs are sent over the wire, it's a lot more lightweight and I can sync much more frequently. If full values are sent, I wouldn't want to do that.
When data is written, the Firebase client currently sends all the data being written down to the server. If you write a large object and then rewrite the whole object with the same object again, the entire object will be sent over the wire. (This may be changing in the future, but that's the current implementation.)
When sending data from the server back out to other clients, we do do some optimization and don't transmit some of the duplicate data.
Firebase is designed to allow you to granularly access data. I would strongly suggest you address into the data that is changing and only update the relevant portions. For example:
//inefficient method:
ref.set(HUGE_BLOCK_OF_JSON);
//efficient method:
ref.child("a").child("b").child("c").set(SOME_SMALL_PIECE_OF_DATA);
When you address into a piece of data, only that small piece is transmitted and rebroadcast to other clients.
Firebase is intended for true real-time apps where updates are made as soon as data changes. If you find yourself intentionally caching changes for a while and saving them as big blobs for performance reasons, you should probably be breaking up your data and only writing the relevant portions.

Grab website content that's not in the source code

I want to grab some financial data from sites like http://www.fxstreet.com/rates-charts/currency-rates/
Up to now I've been using liburl to grab the source code and some regexp searches to extract the data, which I afterwards store in a file.
Yet there is a little problem:
On the page as I see it in the browser, the data is updated almost every second. When I open the source code, however, the data I'm looking for changes only every two minutes.
So my program only gets the data at a much lower time resolution than is possible.
I have two questions:
(i) How is it possible that source code which remains static for two minutes produces a table that changes every second? What is the mechanism?
(ii) How do I get the data with one-second time resolution, i.e. how do I read out such a changing table that isn't shown in the source code?
thanks in advance,
David
You can use the network panel in FireBug to examine the HTTP requests being sent out (typically to fetch data) while the page is open. This particular page you've referenced appears to be sending POST requests to http://ttpush.fxstreet.com/http_push/, then receiving and parsing a JSON response.
Try sending a POST request to http://ttpush.fxstreet.com/http_push/connect and see what you get - it will continuously load new data.
EDIT:
You can use liburl or Python; it doesn't really matter. Under HTTP, when you browse the web, you send GET or POST requests.
Go to the website, open the Developer Tools (Chrome) or Firebug (Firefox plugin), and you will see that after all the data is loaded, there's a request that doesn't close - it stays open.
When you have a website and you want to fetch data continuously, you can do it with a few techniques:
make separate requests (using AJAX) every few seconds - this opens a connection for each request, and if you want frequent data updates it's wasteful
use long polling or server push - make one request that fetches the data; it stays open, and the server flushes data to the socket (to your browser) whenever it needs to. When the connection times out, you can reopen it. This is normally more effective than the above, but the connection stays open.
use XMPP or some other protocol (not HTTP) - used mainly for chat, like Facebook/MSN I think, and probably Google's and some others.
The website you posted uses the second method - when it detects a POST request to that page, it keeps the connection open and dumps data continuously.
What you need to do is make a POST request to that page; you need to see which parameters (if any) have to be sent. It doesn't matter how you make the request, as long as you send the right parameters.
You need to read the response with a delimiter - probably every time they push data, they send \n or some other delimiter.
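To make that concrete, here is a rough C# sketch that sends the POST and reads the never-ending response line by line; the POST parameters are placeholders, so inspect the request your browser actually sends for the real ones.

// Rough sketch: POST to the push endpoint and read the open response as lines arrive.
using System;
using System.Collections.Generic;
using System.IO;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class StreamingReader
{
    public static async Task ReadPushStreamAsync()
    {
        using (var client = new HttpClient { Timeout = Timeout.InfiniteTimeSpan })
        {
            var content = new FormUrlEncodedContent(new Dictionary<string, string>
            {
                ["someParameter"] = "someValue" // placeholder POST parameters
            });

            var request = new HttpRequestMessage(HttpMethod.Post,
                "http://ttpush.fxstreet.com/http_push/connect") { Content = content };

            // ResponseHeadersRead lets us start reading before the (never-ending) body completes.
            using (var response = await client.SendAsync(request, HttpCompletionOption.ResponseHeadersRead))
            using (var stream = await response.Content.ReadAsStreamAsync())
            using (var reader = new StreamReader(stream))
            {
                string line;
                while ((line = await reader.ReadLineAsync()) != null)
                {
                    Console.WriteLine(line); // each newline-delimited line is one pushed update
                }
            }
        }
    }
}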
Hope this helps. If you find that you still can't get around this, let me know and I'll get into more technical details.

Flex - best strategy for keeping client data in synch with backend database?

In an Adobe Flex application using BlazeDS AMF remoting, what is the best strategy for keeping the local data fresh and in sync with the backend database?
In a typical web application, web pages refresh the view each time they are loaded, so the data in the view is never too old.
In a Flex application, there is the temptation to load more data up front to be shared across tabs, panels, etc. This data is typically refreshed from the backend less often, so there is a greater chance of it being stale - leading to problems when saving, etc.
So, what's the best way to overcome this problem?
a. build the Flex application as if it was a web app - reload the backend data on every possible view change
b. ignore the problem and just deal with stale data issues when they occur (at the risk of annoying users who are more likely to be working with stale data)
c. something else
In my case, keeping the data channel open via LiveCycle RTMP is not an option.
a. Consider optimizing back-end changes through a proxy that does its own notification or polling: it knows if any of the data is dirty, and will quick-return (a la a 304) if not.
b. Often, users look more than they touch. Consider one level of refresh for looking and another when they start and continue to edit.
Look at BuzzWord: it locks on edit, but also automatically saves and unlocks frequently.
Cheers
If you can't use the messaging protocol in BlazeDS, then I would have to agree that you should do RTMP polling over HTTP. The data is compressed when using RTMP in AMF, which helps speed things up so the client isn't waiting long between updates. This would also allow you to later scale up to the push methods if the product's customer decides to pay for the extra hardware and licenses.
You don't need LiveCycle and RTMP in order to have a notification mechanism; you can do it with the channels from BlazeDS and use a streaming/long-polling strategy.
In the past I have gone with choice "a". If you were using Remote Objects, you could set up some cache-style logic to keep them in sync on the remote end.
Sam
Can't you use RTMP over HTTP (HTTP polling)?
That way you can still use RTMP, and although it is much slower than real RTMP, you can still broadcast updates this way.
We have an app that uses RTMP to signal inserts, updates and deletes by simply broadcasting RTMP messages containing the table/primary-key pair, leaving the app to automatically update its data. We do this over HTTP using RTMP.
I found this article about synchronization:
http://www.databasejournal.com/features/sybase/article.php/3769756/The-Missing-Sync.htm
It doesn't go into technical details, but you can guess what kind of coding will implement these strategies.
I also don't have fancy notifications from my server so I need synchronization strategies.
For instance, I have a list of companies in my modelLocator. It doesn't change very often, it's not big enough to consider pagination, and I don't want to reload it all (removeAll()) on each user action; yet I don't want my application to crash or UPDATE corrupted data in case it has been UPDATED or DELETED from another instance of the application.
What I do now is save the SELECT datetime in a SESSION variable. When I come back to refresh the data, I SELECT WHERE last_modified>$SESSION['lastLoad'].
This way I get only the rows modified after I last loaded the data (most of the time, 0 rows).
Obviously you need to UPDATE last_modified on each INSERT and UPDATE.
For DELETE it's trickier. As the author points out in his article:
"How can we send up a record that no longer exists?"
You need to tell Flex which item it should delete (say, by ID), so you cannot really DELETE on DELETE :)
When a user deletes a company, you do an UPDATE instead: deleted=1.
Then, when refreshing companies, for rows where deleted=1 you just send back the ID to Flex so that it makes sure the company isn't in the model anymore.
Last but not least, you need to write a function that cleans up rows where deleted=1 and last_modified is older than ... 3 days or whatever suits your needs.
The good thing is that if a user deletes a row by mistake, it's still in the database, and you can recover it before the real delete within those 3 days.
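For illustration only, here is a rough sketch of those queries with ADO.NET against SQL Server (the approach itself is database-agnostic, and the table and column names are placeholders):

// Rough sketch: incremental refresh plus soft delete, as described above.
using System;
using System.Data.SqlClient;

public static class CompanySync
{
    // Return only rows changed since the last load, including soft-deleted ones.
    public static void RefreshSince(string connectionString, DateTime lastLoad)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(
            "SELECT Id, Name, Deleted FROM Companies WHERE last_modified > @lastLoad",
            connection))
        {
            command.Parameters.AddWithValue("@lastLoad", lastLoad);
            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    // Deleted = 1 -> tell the client to drop this Id from its model;
                    // otherwise add or update the row.
                }
            }
        }
    }

    // "Delete" is really an update, so the change can still be sent to clients on refresh.
    public static void SoftDelete(string connectionString, int companyId)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(
            "UPDATE Companies SET Deleted = 1, last_modified = GETUTCDATE() WHERE Id = @id",
            connection))
        {
            command.Parameters.AddWithValue("@id", companyId);
            connection.Open();
            command.ExecuteNonQuery();
        }
    }
}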
Rather than caching on the Flex client, why not cache on the server side? Some reasons:
1) When you cache data on the server side, it's centralized, and you can make sure all clients see the same state of the data.
2) There are much better caching options available on the server side than in Flex. You can also have a cron job that refreshes the data at some frequency, say every 24 hours (a rough sketch follows this list).
3) As the data is cached on the server and doesn't need to be fetched from the DB every time, communication with Flex will be much faster.
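As a rough sketch of point 2, assuming a .NET backend and hypothetical names (ReferenceDataCache, loadFromDatabase):

// Rough sketch: keep one server-side copy of the data and refresh it on a schedule.
using System;
using System.Collections.Generic;
using System.Runtime.Caching;

public static class ReferenceDataCache
{
    private static readonly MemoryCache Cache = MemoryCache.Default;

    public static List<string> GetCompanies(Func<List<string>> loadFromDatabase)
    {
        var cached = (List<string>)Cache.Get("companies");
        if (cached != null)
            return cached; // every client gets the same centralized copy

        var fresh = loadFromDatabase();
        // Expire after 24 hours; a scheduled job could also repopulate it proactively.
        Cache.Set("companies", fresh, DateTimeOffset.UtcNow.AddHours(24));
        return fresh;
    }
}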
Regards,
Tejas
