Implementing an audio streaming service similar to Spotify - networking

High Level Description
Let's say I have a client program (an iOS app in my specific case) that should communicate with a server program running on a remote host. The system should work as follows:
The server has a set of indexed audio files and exposes them to the client using the indexes as identifiers
The client can query the server for an item with a given identifier and the server should stream its contents so the client can play it in real time
The data streamed by the server should only be used by the client itself, i.e. someone sniffing the traffic should not be able to interpret the contents and the user should not be able to access the data.
From my perspective, this is a simple implementation of what Spotify does.
Technical Questions
How should the audio data be streamed between server and client? What protocols should be used? I'm aware that using something on top of TLS will protect the information from someone sniffing the traffic, however it won't protect it from the user himself if he has access to the encryption keys.

The data streamed by the server should only be used by the client itself, i.e. someone sniffing the traffic should not be able to interpret the contents…
HTTPS is the best way for this.
…and the user should not be able to access the data.
That's not possible. Even if you had some sort of magic to prevent capture of decrypted data (which isn't possible), someone can always record the audio output, even digitally.
From my perspective, this is a simple implementation of what Spotify does.
Spotify doesn't do this. Nobody does, and nobody can. It's impossible. If the client must decode data, then you can't stop someone from modifying how that data gets decoded.
What you can do
Use HTTPS
Sign your URLs so that the raw media is only accessible for short periods of time. Everyone effectively gets their own URL to the media. (Check out how AWS S3 handles this for an excellent example; a rough sketch of the idea follows this list.)
If you're really concerned, you can watermark your files on-the-fly, encoding an ID within them so that should someone leak the media, you can go after them based on their account data. This is expensive, so make sure you really have a business case for doing so.
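A minimal sketch of the signed-URL idea, assuming a server-side shared secret (SECRET_KEY) and a hypothetical /stream/&lt;id&gt; endpoint; this is not the actual S3 signing algorithm, just the same principle of a URL that expires and cannot be tampered with:

import hashlib
import hmac
import time
from urllib.parse import urlencode

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical; known only to the server

def sign_url(path: str, lifetime_seconds: int = 300) -> str:
    """Return a URL that is only valid until the embedded expiry time."""
    expires = int(time.time()) + lifetime_seconds
    payload = f"{path}?expires={expires}".encode()
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return f"{path}?{urlencode({'expires': expires, 'signature': signature})}"

def verify_url(path: str, expires: str, signature: str) -> bool:
    """Server-side check: reject expired or tampered URLs."""
    if int(expires) < time.time():
        return False
    payload = f"{path}?expires={expires}".encode()
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

print(sign_url("/stream/42"))  # each client gets its own short-lived URL for track 42

Because the signature covers both the path and the expiry, changing either one invalidates the URL.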

Related

Do servers remember clients and user agents?

Do the big guys (Google, Microsoft, etc...) remember all HTTP clients and, more importantly, the User-Agents that connected to them?
If so, should you implement this as a startup? (Make your server remember the clients.)
I'm not asking for advice, only about practicality, or whether there's some protocol somewhere that requires it. I want to know the standard practice, not your opinion.
The standard is: if there is data you need then you collect and store it. If you don't need the data then don't bother.
That information is in the request header sent by the browser. Anything the browser sends to the server can be collected, processed, stored, etc...
There is no protocol that requires it and you do not have to store this information.
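For what it's worth, here is a minimal sketch of where that data lives, using Python's standard http.server (the port and handler are made up for illustration); the User-Agent is just another request header, and "remembering" it is simply a matter of writing it somewhere:

from http.server import BaseHTTPRequestHandler, HTTPServer

class LoggingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The User-Agent arrives as an ordinary request header.
        user_agent = self.headers.get("User-Agent", "unknown")
        print(f"{self.client_address[0]} requested {self.path} using {user_agent}")
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

HTTPServer(("", 8080), LoggingHandler).serve_forever()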

NGINX as warm cache in front of wowza for HLS live streams - Get per stream data duration and data transferred?

I've set up NGINX as a warm cache server in front of a Wowza > HTTP-Origin application to act as an edge server. The config is working great, streaming over HTTPS with nDVR and adaptive streaming support. I've combed the internet looking for examples and help on configuring NGINX and/or other solutions to give me live statistics (# of viewers per stream_name) as well as to parse the logs to give me stream duration per stream_name/session and data_transferred per stream_name/session. The logging in NGINX for HLS streams logs each video chunk. With Wowza, it is a bit easier to get this data by reading the duration or bytes-transferred values from the logs when the stream is destroyed... Any help on this subject would be hugely appreciated. Thank you.
Nginx isn't aware of what the chunks are. It's only serving resources to clients over HTTP, and doesn't know or care that they're interrelated. Therefore, you'll have to derive the data you need from the logs.
To associate client requests together as one, you need some way to track state between requests, and then log that state. Cookies are a common way to do this. Alternatively, you could put some sort of session identifier in the request URI, but this hurts your caching ability since each client is effectively requesting a different resource.
Once you have some sort of session ID logged, you can process those logs with tools such as Elastic Stack to piece together the reports you're looking for.
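As a rough sketch of that log-processing step, assume a hypothetical nginx log_format that writes a session cookie (sid), the request timestamp ($msec), and $bytes_sent; a short script can then aggregate duration and bytes per session:

import collections
import re

# Assumed log_format, e.g.: '$remote_addr sid=$cookie_sid t=$msec bytes=$bytes_sent "$request"'
LINE = re.compile(r'sid=(?P<sid>\S+) t=(?P<t>[\d.]+) bytes=(?P<bytes>\d+)')

sessions = collections.defaultdict(lambda: {"first": None, "last": None, "bytes": 0})

with open("access.log") as log:
    for line in log:
        match = LINE.search(line)
        if not match:
            continue
        entry = sessions[match["sid"]]
        t = float(match["t"])
        entry["first"] = t if entry["first"] is None else min(entry["first"], t)
        entry["last"] = t if entry["last"] is None else max(entry["last"], t)
        entry["bytes"] += int(match["bytes"])

for sid, entry in sessions.items():
    duration = entry["last"] - entry["first"]
    print(f"{sid}: ~{duration:.0f}s watched, {entry['bytes']} bytes transferred")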
Depending on your goals, you might find it better to get your data client-side. There, you have a better idea of what a session actually is, and you can log client-side items such as buffer levels, latency, and whatnot. The HTTP requests don't really tell you much about the experience the end users are getting. If that's what you want to know, you should use logs from the clients, not from your HTTP servers. Your HTTP server log is much more useful for debugging underlying technical infrastructure issues.

Why was HTTP designed to be a pull protocol?

I was watching many presentations about HTML5 WebSockets, where the server can push data to the client without the client having to request it.
We don't need polling, etc.
I am curious: why was HTTP designed as a "pull" protocol and not a full-duplex protocol in the first place? What were the reasons behind that decision?
Because when HTTP was first designed, it was meant to be used to retrieve documents from a server, and the easiest way to do that is for the client to ask the server for a document and get it delivered as the response (or an error in case it does not exist). With a push protocol, the server would need to keep client connections around for potentially a long time, creating more resource-management problems; remember, we are talking about the early 1990s here.
HTTP was designed simply for retrieving hypertext documents from a server. There was no reason to push anything to the client when pages were just pure, static HTML without scripting capabilities.
Since there was no need at the time for pushing things back to the client, the protocol was kept simple.
HTTP is mainly a pull protocol: someone loads information on a Web server and users use HTTP to pull the information from the server at their convenience. In particular, the TCP connection is initiated by the machine that wants to receive the file.

Find out connection speed on HTTP request?

Is it possible to find out the connection speed of the client when it requests a page on my website?
I want to serve video files, but depending on how fast the client's network is, I would like to serve higher or lower quality videos. Google Analytics shows me the clients' connection types; how can I find out what kind of network the visitor is connected to?
Thanks.
No, there's no feasible way to detect this server-side, short of monitoring the network stream's send buffer while streaming something. If you can switch quality mid-stream, that is a viable approach: if the user's Internet connection suddenly gets burdened by a download, you can detect this and switch to a lower-quality stream.
But if you just want to detect the speed initially, you'd be better off doing this detection on the client and sending the results to the server with the video request.
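A minimal sketch of the client-side measurement, assuming a small test file at a hypothetical /probe.bin URL of known size: download it, time it, and attach the result (or the chosen quality) to the video request.

import time
import urllib.request

PROBE_URL = "https://example.com/probe.bin"    # hypothetical ~1 MB test file
VIDEO_URL = "https://example.com/videos/data"  # hypothetical video endpoint

def measure_kbps(url: str = PROBE_URL) -> float:
    """Download the test file and return the observed throughput in kbit/s."""
    start = time.monotonic()
    data = urllib.request.urlopen(url).read()
    elapsed = time.monotonic() - start
    return (len(data) * 8 / 1000) / elapsed

kbps = measure_kbps()
quality = "high" if kbps > 3000 else "low"  # threshold is arbitrary for the sketch
print(f"requesting {VIDEO_URL}?quality={quality}")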
Assign each request a token, e.g. /videos/data.flv?token=uuid123, and calculate the amount of data your webserver sends for this token per second (you could also check multiple tokens for a single username over a time period). You can do this with the Apache sources and APR.

Understanding REST: REST as a high volume transport?

I'm designing a system that will need to move multi-GB backup images over TCP, and I'm looking at REST as an alternative to ONC RPC.
For example, I might have
POST http://site/backups/image1
where image1 is a 50GB file whose data is contained in the HTTP body.
My question: is this within the scope of what REST is meant for? Is it inappropriate to move massive files over HTTP? My preliminary testing shows that the performance isn't too bad, and I like the clean, debuggable protocol, as opposed to a custom ONC RPC server. But is this overloading the role of a webserver?
Thanks,
-Steve
HTTP has about the same overheads as FTP.
An HTTP server is often asked to do more work than an FTP server. But otherwise, using HTTP to send a large file is about the same as using FTP.
The only consideration is making sure your web server and web application framework are configured to stream this kind of upload without needlessly buffering the entire 50GB file inside Apache.
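On the client side, the key point is to stream the body from disk rather than read it into memory first. A rough sketch with the third-party requests library (which streams a file object passed as data in chunks), against the hypothetical endpoint from the question:

import requests  # third-party: pip install requests

URL = "http://site/backups/image1"  # endpoint from the question

with open("image1.img", "rb") as body:
    # Passing a file object as `data` lets requests stream it in chunks
    # instead of loading the whole 50GB image into memory.
    response = requests.post(URL, data=body,
                             headers={"Content-Type": "application/octet-stream"})

print(response.status_code)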
Steve,
HTTP has a look-before-you-leap 'feature' that allows the client to ask the server whether it will accept the data submission before it actually sends out the data. I'd look into using this to avoid transferring GBs of data only to find out that the server is currently not willing to handle them. Look at the HTTP Expect header and 100 Continue status codes.
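A rough illustration of that exchange at the socket level (assuming a server that actually implements 100 Continue; support in high-level HTTP libraries varies): send the headers with Expect: 100-continue, and only transmit the body once the interim response arrives.

import os
import socket

HOST, PORT, PATH = "site", 80, "/backups/image1"  # hypothetical endpoint from the question
BODY_FILE = "image1.img"

length = os.path.getsize(BODY_FILE)
headers = (
    f"POST {PATH} HTTP/1.1\r\n"
    f"Host: {HOST}\r\n"
    f"Content-Length: {length}\r\n"
    "Content-Type: application/octet-stream\r\n"
    "Expect: 100-continue\r\n"
    "\r\n"
)

with socket.create_connection((HOST, PORT)) as sock:
    sock.sendall(headers.encode("ascii"))
    # Wait for the server's verdict before shipping gigabytes of data.
    interim = sock.recv(4096).decode("ascii", "replace")
    if interim.startswith("HTTP/1.1 100"):
        with open(BODY_FILE, "rb") as body:
            while chunk := body.read(1 << 20):
                sock.sendall(chunk)
        print(sock.recv(4096).decode("ascii", "replace").splitlines()[0])
    else:
        print("Server declined the upload:", interim.splitlines()[0])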
Also, you can use FTP within a RESTful approach, IOW, think along the lines of
<backup-store href="ftp://example.org/site/backup/images/"/>
and make your clients understand the ftp URI scheme.
Finally, the T in HTTP means transfer and not transport, an important distinction to make because the former is an application semantic (HTTP is an application protocol) and the latter is not.
HTH,
Jan
REST has nothing to do with how large your data is or which method you use to transport it.
