I was capturing YouTube video packets using Wireshark. I saw that it was HTTP tunneled over TCP packets (even in the case of YouTube live streaming).
But as far as I know, YouTube uses Flash video technology and HTML5. Some websites also mention the DASH protocol.
My question is: what is the exact protocol used by YouTube? And how can we interpret the data that I have captured in Wireshark? In the capture it is shown as just 'Data'. There is nothing marked as video data or anything like that.
YouTube primarily uses the VP9 and H.264/MPEG-4 AVC video formats, and the Dynamic Adaptive Streaming over HTTP (DASH) protocol.
By January 2019, YouTube had begun rolling out videos in AV1 format.
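To make that concrete, here is a hedged sketch (TypeScript) of what DASH boils down to: the player fetches an MPD manifest over plain HTTP, then pulls short media segments with ordinary GET requests, choosing a bitrate per segment. The URLs are made up; a real player parses the MPD and feeds segments to Media Source Extensions.

    // Illustrative only: placeholder URLs, no real MPD parsing.
    const mpd = await fetch('https://video.example.com/stream.mpd');
    const manifestXml = await mpd.text();
    // ...parse manifestXml, pick a representation based on measured bandwidth...
    for (let i = 1; i <= 10; i++) {
      const seg = await fetch(`https://video.example.com/video-720p-seg${i}.m4s`);
      const bytes = await seg.arrayBuffer();
      // append `bytes` to a MediaSource SourceBuffer for playback
    }

This is also why the capture just shows HTTP/TCP: each segment is an ordinary HTTP payload of encoded video, not a distinct "video protocol".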
For mobile, YouTube servers sometimes send data using RTSP, which is an application-layer protocol.
At the transport layer, RTSP uses both TCP and UDP.
If you want to parse YouTube data captured in Wireshark, you will have to store it and play it back in a Flash player. The video is sent as a Flash object embedded in the HTML page, which is delivered to you via HTTPS.
Source:
https://en.wikipedia.org/wiki/YouTube#Features
The exact protocol is TCP, although YouTube has been switching over to UDP of late. The inability to interpret packet data is intentional: the way YouTube breaks up streaming data prevents capture apps like Wireshark from exposing anything about the data being transferred. To interpret the data, you would need to capture a substantial number of packets and reassemble them to form part of the file being sent. It's best to just take the source IP from the packet sender and use DNS to resolve it to a domain name, then research what type of data can be expected from that domain, but obviously this is extremely unreliable.
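For the IP-to-domain step suggested above, something like this works in Node (the IP is a placeholder taken from a capture; reverse DNS often fails or returns a generic CDN hostname, so treat the result as a hint, not ground truth):

    import { reverse } from 'node:dns/promises';

    const names = await reverse('173.194.55.7'); // source IP from the capture
    console.log(names); // Google video servers typically resolve under 1e100.net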
I have never deployed a video monitoring system. But now I am going to deploy one with the following structure.
Office:
static IP
IP-cameras
Home:
static IP
NVR
I decided to place the NVR at home, because if an intruder breaks into the office, he will be able to take the NVR away, and then the whole point of video monitoring will be lost.
So my design needs a way to protect the channel between the NVR and the cameras. RTSP is supposed to be used.
I'm going to restrict access to rtsp://... using the office's networking equipment. The cameras will be reachable only from my home IP. Then I can be sure that no one else can access the office's cameras.
My question: are the following statements true?
To decrypt the video stream packets on the NVR side, the camera's password is used.
This password is transmitted from the NVR to the camera in plain text when the connection via RTSP is established.
Having this password, it is possible to view the video stream if the traffic between the NVR and the camera is sniffed.
I would be extremely grateful for a competent answer.
Firstly, it's worth mentioning that RTSP is a streaming control protocol, used to control streaming servers. It defines how to package data for streaming and how both ends of the connection should behave to support the protocol. Thinking of it as a remote control is a nice way to picture it; see: https://stackoverflow.com/a/43045354/334402.
RTP (the Real-time Transport Protocol) is what is used to actually stream the media.
Regular RTP is not encrypted end to end, and encryption is what I think you are looking for here.
So someone 'sniffing' the stream will be able to view the audio and video stream without any need for a password.
It sounds like what you need for your use case is secure RTP - SRTP: https://datatracker.ietf.org/doc/html/rfc3711 (this is the spec but you can find overviews online with some quick searches also).
Many (or maybe most) leading IP security cameras will support SRTP now.
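To make the plain-text concern concrete, here is a rough sketch of what the RTSP control exchange looks like on the wire (Node's net module; camera IP, path, and credentials are placeholders). With Basic auth the password is merely base64-encoded, i.e. readable by anyone sniffing; Digest auth sends a hash instead; and either way plain RTP media stays unencrypted, which is the gap SRTP closes.

    import { connect } from 'node:net';

    const sock = connect(554, '192.168.0.20', () => {
      // Basic auth: base64 of "user:password" -- trivially decoded by a sniffer
      const creds = Buffer.from('admin:secret').toString('base64');
      sock.write(
        'DESCRIBE rtsp://192.168.0.20/stream1 RTSP/1.0\r\n' +
        'CSeq: 2\r\n' +
        `Authorization: Basic ${creds}\r\n` +
        '\r\n'
      );
    });

    sock.on('data', (d) => console.log(d.toString())); // SDP describing the RTP streams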
I was hoping to build an application that streams audio (mp3, ogg, etc.) from my microphone to a web browser.
I think I can use the HTML5 audio tag to read/play the stream from my server.
The area I'm really stuck on is how to set up the streaming HTTP endpoint. What technologies will I need, and how should my server be structured to get the live audio from my mic and make it accessible from my server?
For example, for streaming mp3, do I constantly respond with mp3 frames as they are recorded?
Thanks for any help!
First off, let's split this problem up into a few parts. You have the audio capture (recording), the encoding/codec, the server, and the receiving clients.
Capture -> Codec -> Server -> Several Clients
For audio capture, you will need to use the Web Audio API along with getUserMedia. This will allow you to get 32-bit floating point PCM samples from the recording device. This data stream takes up a ton of bandwidth... a few megabits per second for a stereo stream. It is not directly playable in an HTML5 audio tag, and while you could play it on the receiving end with the Web Audio API, it takes up too much bandwidth to be useful. You need to use a codec to get the bandwidth usage down.
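For illustration, here is a minimal capture sketch. ScriptProcessorNode matches the era of this answer but is deprecated today; AudioWorklet is the modern path.

    async function captureMic(onPcm: (samples: Float32Array) => void) {
      const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
      const ctx = new AudioContext();
      const source = ctx.createMediaStreamSource(stream);
      const proc = ctx.createScriptProcessor(4096, 1, 1); // 4096-sample mono buffers
      proc.onaudioprocess = (e) => {
        // copy the buffer: the audio engine reuses it between callbacks
        onPcm(new Float32Array(e.inputBuffer.getChannelData(0)));
      };
      source.connect(proc);
      proc.connect(ctx.destination); // some browsers only fire connected nodes
    }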
The codecs you want to look at include MP3, AAC (and its variants such as HE-AAC), and Opus. Not all browsers support all codecs. MP3 is the most widely compatible, but AAC provides better quality at a given bitrate. Opus is a free and open codec but still doesn't have the greatest client adoption. In any case, there isn't yet a codec that you can run in-browser with any real stability. (Although it's being worked on! There are a lot of test projects made with Emscripten.) I solved this problem by reducing the bit depth of my samples to 16-bit signed integers and sending this PCM stream to a server for encoding, over a binary WebSocket.
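A sketch of that bit-depth reduction: clamp the 32-bit float samples (range -1..1) and scale them to 16-bit signed integers, then ship them over a binary WebSocket. The endpoint URL is hypothetical.

    const ws = new WebSocket('wss://encoder.example.com/pcm'); // placeholder endpoint
    ws.binaryType = 'arraybuffer';

    function sendAsInt16(samples: Float32Array) {
      const out = new Int16Array(samples.length);
      for (let i = 0; i < samples.length; i++) {
        const s = Math.max(-1, Math.min(1, samples[i]));      // clamp first
        out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;             // asymmetric scale avoids overflow
      }
      if (ws.readyState === WebSocket.OPEN) ws.send(out.buffer);
    }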
This encoding server took the PCM stream and ran it through a codec server-side. Here you can use whatever you'd like, such as a licensed codec binary or a tool like FFmpeg which encapsulates multiple codecs.
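One way that encode step might look with FFmpeg (assuming ffmpeg is on the PATH): raw 16-bit little-endian PCM in on stdin, MP3 frames out on stdout. The sample rate and channel count must match what the browser sent.

    import { spawn } from 'node:child_process';

    const ffmpeg = spawn('ffmpeg', [
      '-f', 's16le', '-ar', '44100', '-ac', '1', '-i', 'pipe:0', // raw PCM in
      '-f', 'mp3', '-b:a', '128k', 'pipe:1',                     // MP3 out
    ]);
    // write incoming WebSocket PCM buffers to ffmpeg.stdin, and forward
    // ffmpeg.stdout chunks to the streaming server (e.g. an Icecast source).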
Next, this server streamed the data to a real streaming media server like Icecast. SHOUTcast and Icecast servers take the encoded stream and relay it to many clients over an HTTP-like connection. (Icecast is HTTP compliant whereas SHOUTcast is close but not quite there which can cause compatibility issues.)
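To tie this back to your "do I constantly respond with MP3 frames?" question: conceptually, yes, that is what these servers do. They hold an HTTP response open and keep writing encoded frames to every connected listener. A toy sketch of the idea (not a substitute for Icecast):

    import { createServer, ServerResponse } from 'node:http';

    const listeners = new Set<ServerResponse>();

    createServer((req, res) => {
      res.writeHead(200, { 'Content-Type': 'audio/mpeg' });
      listeners.add(res);                            // response stays open
      req.on('close', () => listeners.delete(res));  // drop disconnected clients
    }).listen(8000);

    // call this with each encoded MP3 frame coming out of the encoder
    function broadcast(frame: Buffer) {
      for (const res of listeners) res.write(frame);
    }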
Once you have your streaming server set up, it's as simple as referencing the stream URL in your <audio> tag.
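For example, from script (the stream URL is a placeholder for your Icecast mount point):

    const player = new Audio('https://radio.example.com/stream'); // hypothetical mount
    player.play(); // most browsers require a user gesture before this succeeds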
Hopefully that gets you started. Depending on your needs, you might also look into WebRTC which does all of this for you but doesn't give you options for quality and also doesn't scale beyond a few users.
I just sniffed some traffic using Wireshark and noticed that the YouTube traffic relies on TCP. I thought they were using UDP? But it seems as if they use HTTP octet streams. Is YouTube really using TCP for streams, or am I missing something?
Because they need everything TCP provides (slow start, transmit pacing, exponential backoff, receive windows, reordering, duplicate rejection, and so on) they would either have to use TCP or try to do all those things themselves. There's no way they could do that better than each operating system's optimized TCP implementation.
Obviously, Google is currently experimenting with its own protocol implementations, like QUIC (Quick UDP Internet Connections), as one can see when examining the HTTP response:
HTTP/1.1 200 OK
...
Content-Type: video/mp4
Alternate-Protocol: 80:quic
...
However, currently, they seem to rely on TCP, just like David mentioned before.
From http://www.crazyengineers.com/threads/youtube-use-tcp-or-udp.38419/:
...of course the YouTube page uses HTTP [which is over TCP]. The real thing does not happen via the HTTP page but via the Flash object that is embedded in that page. The Flash object that appears on YouTube is the Flash video player. The video player acts as an iframe (technically an incorrect term) for the content called for streaming via the Flash object. For storing media content, YouTube has set up a media server whose content gets called when you press the play button.
For streaming media to the Flash player, the Real Time Streaming Protocol (RTSP) is used. The play button on the Flash player acts as an RTSP invoker for the media being called, and the media is streamed via UDP packets. In fact, you don't need to navigate away from the page, because it is the embedded object, not the HTTP page, that calls for the video; but as the object is embedded in the HTTP page, once you close the page, the object also gets closed.
What protocols do web cameras use for streaming audio/video feeds over the internet? HTTP? TCP? How is each frame sent inside the protocol? For example, if they use HTTP, does the web cam software encode each frame and tack it on as a query string parameter like:
http://www.some-url.com?encoded-frame=WJDJ84FU84F85594DK3DK
or, is the encoded frame set as the HTTP request's body? Similar question for TCP or any other protocol that is used.
I'm asking because I'd like to stream a web cam to a web server and have software that receives each encoded frame, decodes it, and does something with it. Thanks in advance.
Well, the question in the OP is open-ended, because there isn't one fixed set of protocols (TCP/UDP) used in this kind of application, and its scope is large due to the various technologies involved in this end-to-end solution: camera capture, encoding, streaming, and decoding/processing. In the case you mentioned, if the webcam and the web server are likely to be on the same LAN, then you can use TCP/IP and the server can process the stream; on a LAN, latencies won't be high, so TCP serves well. If they are on a WAN, then UDP/IP can be of help.
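As a hedged sketch of the LAN/TCP option (host, port, and framing are illustrative, not any standard), a common pattern is to length-prefix each encoded frame so the receiver knows where one frame ends and the next begins:

    import { connect } from 'node:net';

    const sock = connect(9000, '192.168.1.10'); // server address is a placeholder

    function sendFrame(encodedFrame: Buffer) {
      const header = Buffer.alloc(4);
      header.writeUInt32BE(encodedFrame.length, 0); // 4-byte big-endian length prefix
      sock.write(Buffer.concat([header, encodedFrame]));
    }

The server reads four bytes, learns the frame length, reads that many bytes, decodes, and repeats.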
There are plenty of tutorials online covering the basics of TCP/IP and UDP/IP sockets and their programming concepts, and there are tutorials about streaming, packetization, etc., of video data.
I don't see how HTTP can be of use here for sending from a webcam to a server.
For starters
http://streaminglearningcenter.com/streaming-video-consulting.html
Hope this is good to get you started.
While using RTMP, if the request is tunneled through HTTP, how different is it from an HTTP request?
What would be the performance implications of tunneling while using RTMP?
The advantage of RTMP streams over casual HTTP-based progressive downloading is far too significant to ignore.
You can serve Flash Video over the Internet using RTMP, a special protocol for real-time server applications ranging from instant messaging to collaborative data sharing to video streaming. Whereas HTTP-delivered Flash Video is referred to as progressive download video, RTMP-delivered Flash Video is called streaming video. However, because the term streaming is so often misused, I prefer the term real-time streaming video.
One of the benefits of RTMP delivery for the viewer is near-instantaneous playback of video, provided the Flash Video file is encoded at a bitrate appropriate to the viewer's connection speed. Real-time streaming video also lets the viewer seek to any point in the content. This feature is particularly advantageous for long-duration content, because the viewer doesn't have to wait for the video file to load before jumping ahead, as is the case for HTTP-delivered video.
http://www.cisco.com/en/US/prod/collateral/video/ps11488/ps11791/ps11802/white_paper_c11-675935.html
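As for the tunneling question itself, here is a rough sketch of the RTMPT polling pattern as I understand it (host, paths, and session id are illustrative): RTMP chunks travel as the bodies of repeated HTTP POSTs, so every message pays for HTTP headers plus a polling round trip, which is why tunneled RTMP tends to have noticeably higher latency and overhead than a raw RTMP socket.

    const session = 'abc123'; // would come from the initial POST to /open/1

    async function rtmptSend(rtmpChunk: Uint8Array, seq: number) {
      const res = await fetch(`http://media.example.com/send/${session}/${seq}`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/x-fcs' }, // RTMPT's content type
        body: rtmpChunk,
      });
      // the response body carries any RTMP data the server had queued for us
      return new Uint8Array(await res.arrayBuffer());
    }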