Can gRPC be used for audio-video streaming over the Internet?

I understand in a client-server model gRPC can do a bidirectional streaming of data.
I have not tried it yet, but I want to know: will it be possible to stream audio and video data from a source to a cloud server using gRPC, and then broadcast to multiple clients, all in real time?

TLDR: I would not recommend video over gRPC. Primarily because it wasn't designed for it, so doing it would take a lot of hacking. You should probably take a look at WebRTC + a specific video codec.
More information below:
gRPC has no video compression
When sending video, we want to send it efficiently, because sending it raw could require on the order of 1 GB/s of connectivity.
So we use video compression / video encoding. For example, H.264, VP8 or AV1.
It helps to understand how video compression works: codecs save bandwidth largely by encoding only the differences between consecutive, similar frames, rather than each frame in full.
There is no video encoder for protobufs (the format used by gRPC).
You could instead try image compression and save each frame in a bytes field (e.g. bytes image_frame = 1;), but compressing frames independently is less efficient and definitely takes up unnecessary space for video.
It's probably possible to encode frames with a video encoder (e.g. H.264) and carry them inside protobufs, then decode them for playback in applications. However, it might take a lot of hacking/engineering effort. This use case is not what gRPC/protobufs are designed for and it is not commonly done. Let me know if you hack something together; I would be curious.
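For illustration, here is a minimal sketch in Python of what that hack might look like: chunks produced by some external H.264 encoder are wrapped in a bytes field and pushed over a client-streaming RPC. Everything here is hypothetical: the VideoChunk message, the VideoIngest service, and the video_pb2 / video_pb2_grpc modules assume a .proto you would define and compile yourself with grpcio-tools.

# Hypothetical sketch: streaming pre-encoded video chunks over gRPC.
# Assumes a proto like:
#   message VideoChunk { bytes data = 1; int64 pts_ms = 2; }
#   service VideoIngest { rpc Upload(stream VideoChunk) returns (UploadAck); }
# compiled into video_pb2 / video_pb2_grpc.
import grpc
import video_pb2
import video_pb2_grpc

def encoded_chunks(path):
    # Read H.264 data produced by an external encoder, in small chunks.
    with open(path, "rb") as f:
        pts = 0
        while True:
            data = f.read(64 * 1024)
            if not data:
                break
            yield video_pb2.VideoChunk(data=data, pts_ms=pts)
            pts += 33  # pretend each chunk is roughly one 30 fps frame

def main():
    channel = grpc.insecure_channel("cloud-server:50051")
    stub = video_pb2_grpc.VideoIngestStub(channel)
    ack = stub.Upload(encoded_chunks("clip.h264"))  # client-streaming RPC
    print("server ack:", ack)

if __name__ == "__main__":
    main()

Note that this only gets bytes from A to B; all of the actual video work (encoding, timing, decoding, playback) still happens outside gRPC.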
gRPC is reliable
gRPC uses TCP (not UDP), which is reliable.
At a glance, reliability might seem handy, to avoid corrupted or lost data. However, depending on the use case (realtime video or audio calls), we may prefer to skip frames that are dropped or delayed. The losses may be unnoticeable or painless to the user (see the sketch at the end of this answer). TCP does the opposite:
If a packet is delayed, it waits for that packet before delivering anything that came after it (in-order delivery, which causes head-of-line blocking).
If a packet is dropped, it resends it (retransmission after packet loss).
Therefore, video conferencing apps usually use WebRTC/RTP (configured to be unreliable)
Having mentioned that, it looks like Zoom was able to implement video-over-WebSockets, which is also a reliable transport (over TCP). So it's not "game over", just highly not recommended and a lot more effort. They have since moved over to WebRTC, though. From the write-up:
Data received on the WebSockets goes into a WebAssembly (WASM) based decoder. Audio is fed to an AudioWorklet in browsers that support that. From there the decoded audio is played using the WebAudio “magic” destination node.
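To make the skip-instead-of-wait behaviour concrete, here is a minimal, hypothetical sketch of the receiving side of an unreliable transport in Python: frames arrive over a plain UDP socket tagged with a sequence number, and anything older than what has already been played is simply discarded rather than waited for. The wire format and port are made up for the example.

import socket
import struct

# Hypothetical wire format: 4-byte big-endian sequence number + frame payload.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 5004))

last_played = -1
while True:
    packet, _ = sock.recvfrom(65535)
    seq = struct.unpack("!I", packet[:4])[0]
    frame = packet[4:]
    if seq <= last_played:
        continue  # late or duplicate frame: skip it instead of stalling playback
    last_played = seq
    # hand `frame` to the decoder here; gaps in `seq` are simply tolerated

Over TCP you never get the chance to make that per-frame decision; the kernel holds back everything behind the missing packet.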

Related

Multi-party WebRTC without SFU

Based on this article, when implementing a WebRTC solution without a server (by which I assume it means without an SFU), the bottleneck is that only 4-6 participants can be supported.
Is there a solution that can work around this? For example, I just want to use Firebase as the only backend, mainly for signaling, with no SFU. What is the general implementation strategy to achieve at least 25-50 participants in WebRTC?
Update: This GitHub project makes a different claim. It states "A full mesh is great for up to ~100 connections".
Your real bottleneck with MESH is that each RTCPeerConnection will do its own video encoding in the browser.
The p2p concept naturally includes the requirement that both peers should adjust encoding quality based on network conditions. So, when your browser sends two streams to peers X (good download speed) and Y (bad download speed), the encodings for X and Y will be different - Y will receive lower framerate and bitrate than X.
Sounds reasonable, right? But, unfortunately, this mandates a separate video encoding for each peer connection.
If multiple peer connections could re-use the same video encoding, then MESH would be much more viable. But Google didn't provide that option in the browser. Simulcast requires an SFU, so that's not your case.
So, how many concurrent video encodings can a browser perform on a typical machine for 720p 30 fps video? 5-6, not more. For 640x480 15 fps? Maybe 20 encodings.
In my opinion, the encoding layer and networking layer could be separated in WebRTC design, and even getUserMedia could be extended to getEncodedUserMedia, so that you could send the same encoded content to multiple peers.
So that's the real practical reason people use SFU for multi-peer WebRTC.
If you want to make a conference with 25 people all sending their video, then a regular WebRTC setup will not work, unless you massively lower your video quality. The reason for this is that every participant would need to send a separate stream to each of the 24 other clients. So let's say your stream is 128 KB/s; then you would need 3 MB/s of upload speed available, which isn't always there, and you'd be downloading the same amount. A quick sanity check of that arithmetic is sketched below.
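A back-of-the-envelope Python sketch of the mesh-vs-SFU upload bandwidth (the 128 KB/s per-stream figure is just the example number from above):

def mesh_upload_kbs(participants, stream_kbs=128):
    # Full mesh: each client uploads its stream to every other participant.
    return (participants - 1) * stream_kbs

def sfu_upload_kbs(participants, stream_kbs=128):
    # SFU: each client uploads exactly one stream, regardless of group size.
    return stream_kbs

for n in (4, 10, 25):
    print(n, "participants -> mesh:", mesh_upload_kbs(n), "KB/s,",
          "sfu:", sfu_upload_kbs(n), "KB/s")
# 25 participants -> mesh: 3072 KB/s (~3 MB/s), sfu: 128 KB/s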
The problem is that this isn't scalable. That's why you need an SFU: then you send only a single stream and receive the others' streams from the SFU. The other positive thing about SFUs is that you can use simulcast, which adapts the quality of the streams you receive to your network speed.
You can use the Janus gateway or mediasoup, for example. Here is an already set-up, scalable mediasoup video conferencing application: github repository

Streaming different kinds of data over local network: tcp or udp?

I don't have much experience with network programming, but an interesting problem came up that requires it. The server will be transmitting multiple streams of different types of data to other machines. Each machine should be able to choose which of the streams (one or more) it would like to receive. The whole setup is confined to the local network only. Initially there will be only two clients, but I would like to design a scalable approach if possible.
The existing server code, which streams only a single stream, uses a TCP streaming socket to do so. However, from some reading on the subject, I am not sure this approach will scale well to multiple streams and multiple clients. The reason is: wouldn't two clients who want to receive the same stream, but connect via different TCP sockets, waste bandwidth? Especially compared to UDP, which allows multicast.
Due to my inexperience, I am relying on better-informed people out there to advise me: considering that I do want the streams to be reliable, would it be worth it to start from scratch with UDP and implement reliability on top, rather than keep using TCP? Or will this be better solved by designing an appropriate network structure? I'd be happy to provide more details if needed. Thanks.
UPDATE: I am looking at PGM and emcaster for reliable multicasting at the moment. I must have a C# implementation on the server side and a Python implementation on the client side.
Since you want a scalable program, UDP would be the better choice: it does not go to the extra length of verifying that the data has been received, which makes sending faster, and it is also the transport that supports the multicast you mention.
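As a starting point, here is a minimal, hypothetical sketch of UDP multicast on a local network in Python (the group address 224.1.1.1 and the port are arbitrary example values; reliability, as you note, would have to be layered on top, e.g. via PGM):

import socket
import struct

GROUP, PORT = "224.1.1.1", 5007

def send(data: bytes):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Keep multicast traffic on the local network (TTL 1).
    s.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    s.sendto(data, (GROUP, PORT))

def receive():
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("", PORT))
    # Join the multicast group on all interfaces.
    mreq = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    while True:
        data, addr = s.recvfrom(65535)
        print("got", len(data), "bytes from", addr)

With one multicast group per stream, each client joins only the groups it wants, and the server sends each packet once per stream no matter how many clients are listening.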

Video Streaming Protocols Comparison

I'm trying to figure out the best way to stream video, and I'm stuck choosing my technology stack. What I ultimately want is to stream video (for now, strictly H.264) as efficiently as possible. I'm new to streaming video and am quite confused about choosing the transport protocol. I want to implement a streaming service where packet loss is favored over latency; what would be the best transport protocol for that?
I've been working with RTMP and still can't get the latency below 30 seconds for 720p via a 50 Mbit Wi-Fi connection. I couldn't find any performance comparisons between transport protocols, so either I haven't been looking in the right places, or no such comparison can be made without specifying the data being transferred. In any case, could anyone shed some light on the issue?
I don't know if it makes any difference but I'm using NGINX-based media streaming server, and I'm open to any other suggestions for the media server.
Thanks
For low latency you could choose a protocol like RTP/RTSP, which is intended for realtime use. If you want to further reduce latency, turn off B-frames on your H.264 encoder (an example follows below).
WebRTC may be another option.
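To illustrate the B-frame advice, here is a hypothetical Python sketch that shells out to ffmpeg. It assumes ffmpeg with libx264 is installed and that some RTSP server is already listening at the target URL; the file name and URL are example values.

import subprocess

# -bf 0 disables B-frames; -tune zerolatency turns off other
# latency-adding lookahead/buffering in libx264.
subprocess.run([
    "ffmpeg",
    "-re", "-i", "input.mp4",        # read input at its native frame rate
    "-c:v", "libx264",
    "-bf", "0",                      # no B-frames
    "-tune", "zerolatency",
    "-f", "rtsp", "rtsp://localhost:8554/stream",
], check=True)

B-frames reference both past and future frames, so the encoder has to buffer frames ahead before it can emit anything; dropping them removes that buffering delay.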
A) RTMP
Drawbacks:
1) For a CDN it will be costly, since it works without caching.
2) From a security point of view, RTMP Encrypted is somewhat flawed, as it is prone to man-in-the-middle attacks; the RTMP toolkit RTMPDump can easily be used to download RTMP streams.
3) RTMP streams your video in chunks and never loads ahead of time, so viewers on slow connections might experience a bit of buffering as they wait for the rest of the file to stream.
4) You can use RTMP only with the Flash plugin.
5) RTMP works on port 1935 only (blocked by most corporate firewalls).
B) HLS (from Apple), adaptive streaming
Can be used for both live and stored video streaming.
Drawback:
It is platform dependent (good for iOS only).
C) Progressive / pseudo streaming
Downloads video in chunks ahead of the current playback position.
Platform independent.
Supports all video formats and is served over plain HTTP.
Uses pre-transcoded videos to work on par with adaptive streaming.
Nginx has support for all of the above methods; see ngx_http_mp4_module.

flash/flex: progressive download vs. rtmp

I'm trying to understand, and really pinpoint, when to use progressive download vs. rtmp in Flex/Flash. It seems the main point is that rtmp is not served over http, whereas progressive download is. Because rtmp is not http, the resource is protected: there is no way to connect to the rtmp server from outside the swf.
Even if the user can see that object code and can figure out the location
<object data="http://media.example.com/jw-player/player.swf">
  <param name="flashvars" value="streamer=rtmp://sub.example.com/video
    &file=1330/title/folder2/theflvresource.flv
    &id=FlvPlayer">
</object>
they would not be able to connect to rtmp. So rtmp seems to be more useful when you want to protect a resource? Is that all there is to it?
I agree with xtat, but want to add much more.
The pros and cons of RTMP (and of UDP-based streaming protocols generally; strictly speaking RTMP runs over TCP, but it shares the streaming-server model below) vs. 'progressive download' (which is really just a subset of HTTP-based streaming), in my not-so-humble opinion:
UDP-based streaming
Pros
Currently significantly more difficult to pilfer streams
Currently supports live, which HTTP-based does not
Multi-cast capable, which can be desirable on intranets
Cons
Dramatically higher resource usage relative to the HTTP-based approach
Need for specialized servers (FMS, Red5, Wowza, whatever)
More noticeable buffering
Firewall issues, especially with corporate customers
HTTP-based streaming
Pros
Dead simple
Can seek into media: FLV and MP4 (with some effort)
Cons
Trivial to pilfer streams. E.g.: Real Downloader
Live streams not currently possible, but give it a year. Apple is making this a reality.
No multi-casting
The entire HTTP-based approach is filled with and/but/if situations, lots of misunderstandings about what is and is not possible, and a lack of common definitions.
There are two basic characteristics people are looking at when discussing HTTP-based streaming: seeking and regulated bandwidth. From that, we get all these terms like 'pseudo-streaming', 'progressive download', etc.
These are the definitions I use to describe HTTP-based streaming servers:
regulated bit-rate: the flat media file is parsed by the server, and it sends media only as fast as the player needs it to play without buffering.
seeking: the ability of a web server to seek into the media and effectively create a new 'file' on the fly for use by the client. Similar to an HTTP byte-range request (sketched just after this list), except that headers and media metadata are added/modified.
progressive download: just send the file, as fast as possible. Basically, put the media file on a web server that sends it to the client in a 'dumb' manner, like a large .iso or .zip file.
pseudo streaming: the ability of a web server to send media files to the client with a regulated bit-rate and to seek into files.
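For comparison, here is what the plain HTTP byte-range mechanism looks like in Python using the requests library (the URL is a placeholder; real pseudo-streaming servers additionally rewrite headers and media metadata, which this sketch does not do):

import requests

# Ask the web server for bytes 1000000 onward of the file, i.e. "seek"
# into the media without downloading everything before that offset.
resp = requests.get(
    "http://media.example.com/video.mp4",
    headers={"Range": "bytes=1000000-"},
    stream=True,
)
print(resp.status_code)               # 206 Partial Content if ranges are supported
print(resp.headers.get("Content-Range"))
for chunk in resp.iter_content(64 * 1024):
    pass  # feed chunks to a player/decoder here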
Personally, the main reason to choose RTMP over progressive download is it allows your user to skip to the middle of a video without having to download the whole file.
These days, unless you need to record, there's not really any point in using RTMP. HTTP is simpler, obviously much more widely supported, and easier to debug, and it does indeed allow seeking, even over CDNs. This is what I have set up at Viddler.

How to tunnel multi-player game data over HTTP with IIS to minimize latency?

I would like to create a browser-based network game. To ensure it can be used by as many people as possible, I'd like to embed all the traffic in standard HTTP messages.
Assuming I use IIS as my back end, how should I code this to minimize latency?
Is it reasonable to start with an ASP application of some kind (ASP.NET MVC perhaps) using shared state in memory? Or should I plan from the outset on writing some kind of IIS plugin of my own? Or should I abandon IIS and write a custom server instead?
Is it reasonable to start with each client repeatedly querying, say ten times per second, or should I make the data more stream-like somehow (and if so how)?
This can work just fine, but you're going to have to do something less "conventional."
To make this work, don't do individual requests; keep the request open and transmit data over the open connection.
You could try using a protocol like Bayeux, but there are no rules here.
Here's an example with ASP.NET using COMET.
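The essence of the COMET/long-polling trick, sketched on the client side in Python (the endpoint URL and event format are placeholders; the server is assumed to hold each request open until it has something to deliver, or until a ~30 s timeout):

import requests

last_event_id = 0
while True:
    # The server intentionally does not answer until an event is available,
    # so this "poll" is nothing like 10 requests per second.
    resp = requests.get(
        "http://game.example.com/events",
        params={"since": last_event_id},
        timeout=35,
    )
    for event in resp.json():
        last_event_id = event["id"]
        # apply the event to local game state here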
Designing an app to hit a web server 10 times a second is not a good idea. Web servers are designed for less frequent client requests, and for probably larger processing times and response sizes, than your game will be using. That's not to say a web server couldn't cope, just that it would not be an efficient client-server match.
For any type of app that requires multiple packets per second you really should think about a lighter protocol than HTTP which is fairly verbose. For example if your game needs to send 4 bytes to register a user's battleship co-ordinates you don't really want to transmit an extra few hundred bytes of HTTP headers.
I'd recommend a browser plugin technology like Silverlight or Flash. Both of those support TCP socket connections. You would need to write your own server to sit at the other end of the TCP socket; a sketch of what that might look like follows. With that approach you'd also have the advantage of being able to push data out to the clients without having to rely on the client polling mechanisms that HTTP requires, e.g. AJAX.
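A minimal, hypothetical Python sketch of such a push server (the port and the raw-echo "protocol" are arbitrary; a real game server would add message framing, auth, and error handling):

import asyncio

clients: set[asyncio.StreamWriter] = set()

async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter):
    clients.add(writer)
    try:
        while data := await reader.read(1024):
            # Push each client's input to all connected players immediately,
            # instead of waiting for them to poll.
            for w in clients:
                w.write(data)
                await w.drain()
    finally:
        clients.discard(writer)

async def main():
    server = await asyncio.start_server(handle, "0.0.0.0", 9000)
    async with server:
        await server.serve_forever()

asyncio.run(main())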
One problem you will face with standard HTTP messages is the size (and lack of control) over the header data.
I was involved in the design of a flash game which was played by several million people. We needed to communicate with the server every few seconds and ended up using ultra-lightweight JSON messages to save on bandwidth and keep it snappy.
Size of JSON message: 16 bytes
Size of HTTP Header: 200+ bytes
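A quick sanity check of that overhead in Python (the 16- and 200-byte figures are the ones quoted above; the coordinate update is a made-up example):

import json

msg = json.dumps({"x": 120, "y": 45}, separators=(",", ":"))
print(len(msg), "byte payload")       # 16 bytes
header_bytes = 200                    # typical minimal HTTP header size
total = len(msg) + header_bytes
print(f"{header_bytes / total:.0%} of each message would be HTTP overhead")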
HTTP is not really a good protocol for high traffic, low latency requirements. It was designed for larger, less frequent request/response models and has status codes like 304 Not Modified for this very reason.
You'll probably want to drop down to a custom TCP/IP implementation.
