I am trying to send a screenshot of a desktop over Winsock.
As such, there are four tasks:
Save bitmap to buffer
Write data across wire using a socket
Read data from wire using a socket
Load a bitmap from a buffer
I have saved the bitmap to a char array using GetDIBits.
Writing the data on the server side I have done, but I have questions.
For sending the data from the server to the client, do I need only one recv() call (I am using TCP), or do I need to split it up into multiple parts? I've read that TCP is a stream-based protocol and that I shouldn't have to worry about packets, because that is abstracted away for me. Is that right?
How would I go about loading the information from GetDIBits into a bitmap and displaying it on the main window?
I am guessing I have to use SetDIBits, but which device contexts do I use?
The server-side screenshot capture code is here:
HDC handle_ScreenDC = GetDC(NULL);
HDC handle_MemoryDC = CreateCompatibleDC(handle_ScreenDC);
BITMAP bitmap;
int x = GetDeviceCaps(handle_ScreenDC, HORZRES);
int y = GetDeviceCaps(handle_ScreenDC, VERTRES);
HBITMAP handle_Bitmap = CreateCompatibleBitmap(handle_ScreenDC, x, y);
SelectObject(handle_MemoryDC, handle_Bitmap);
BitBlt(handle_MemoryDC, 0, 0, x, y, handle_ScreenDC, 0, 0, SRCCOPY);
GetObject(handle_Bitmap, sizeof(BITMAP), &bitmap);
BITMAPINFOHEADER bi;
bi.biSize = sizeof(BITMAPINFOHEADER);
bi.biWidth = bitmap.bmWidth;
bi.biHeight = bitmap.bmHeight;
bi.biPlanes = 1;
bi.biBitCount = 16;
bi.biCompression = BI_RGB;
bi.biSizeImage = 0;
bi.biXPelsPerMeter = 0;
bi.biYPelsPerMeter = 0;
bi.biClrUsed = 0;
bi.biClrImportant = 0;
// Each scanline is padded to a 4-byte (DWORD) boundary, hence the +31.
DWORD dwBmpSize = ((bitmap.bmWidth * bi.biBitCount + 31) / 32) * 4 * bitmap.bmHeight;
HANDLE hDIB = GlobalAlloc(GHND, dwBmpSize);
char* bufptr = (char *)GlobalLock(hDIB);
GetDIBits(handle_ScreenDC, handle_Bitmap, 0, (UINT)bitmap.bmHeight, bufptr, (BITMAPINFO *)&bi, DIB_RGB_COLORS);
send(clientsock, bufptr, (int)GlobalSize(hDIB), 0); // GlobalSize expects the HGLOBAL handle, not a locked pointer
/* Do I need to packetize/split it up? Or is one send() enough for the matching recv() on the client? */
/* I am assuming I must send the bi structure over Winsock as well, correct? */
And the receiving client code:
case WM_PAINT:{
    // I'm a GDI beginner, so I don't have a clue what I'm doing here as far
    // as blitting the received bits; this is just some stuff I tried myself
    // before asking for help.
    PAINTSTRUCT paintstruct;
    HDC handle_WindowDC = BeginPaint(hwnd, &paintstruct);
    handle_MemoryDC = CreateCompatibleDC(handle_WindowDC);
    handle_Bitmap = CreateCompatibleBitmap(handle_WindowDC, 640, 360);
    std::cout << SetDIBits(handle_MemoryDC, handle_Bitmap, 0, bi.biHeight, buffer, (BITMAPINFO *)&bi, DIB_RGB_COLORS);
    SelectObject(handle_MemoryDC, handle_Bitmap);
    StretchBlt(handle_WindowDC, 50, 50, 640, 360, handle_MemoryDC, 0, 0, x, y, SRCCOPY);
    EndPaint(hwnd, &paintstruct);
}
Sockets do have limited buffer sizes at both ends, typically on the order of a few kilobytes. So if you dump a large block of data (like a full screen dump) into one call on a non-blocking socket, send() will likely accept only part of it (or fail with WSAEWOULDBLOCK), and you will need to manage your own buffers and call send() multiple times. If you are using a blocking socket, however, you should be OK, as send() will simply block until all the data is sent.
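A minimal sketch of such a send loop (assuming a connected, blocking SOCKET and that Winsock has been initialized elsewhere); the commented lines at the end show one possible framing for the screenshot case above, with the header sent ahead of the pixels:
#include <winsock2.h>
// Sketch: keep calling send() until the whole buffer has gone out.
bool send_all(SOCKET s, const char* data, int len)
{
    int total = 0;
    while (total < len) {
        int n = send(s, data + total, len - total, 0);
        if (n == SOCKET_ERROR)
            return false;  // connection problem
        total += n;        // send() may accept fewer bytes than requested
    }
    return true;
}
// One possible framing: header first, then pixel bytes, so the receiver
// knows the dimensions and how many bytes to expect:
//   send_all(clientsock, (const char*)&bi, sizeof(bi));
//   send_all(clientsock, bufptr, (int)dwBmpSize);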
On the receiving side it is trickier: recv() only waits until some data is available and then returns whatever has arrived so far, whether the socket is blocking or not, so it will not necessarily hand you the full size you asked for in one call. The data filters through bit by bit, and you will need to reassemble it from multiple recv() calls.
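A matching sketch for the receive side, again assuming a connected, blocking SOCKET; it loops until exactly the requested number of bytes has been reassembled:
#include <winsock2.h>
// Sketch: loop until exactly `len` bytes have arrived, reassembling the
// data across multiple recv() calls.
bool recv_all(SOCKET s, char* data, int len)
{
    int total = 0;
    while (total < len) {
        int n = recv(s, data + total, len - total, 0);
        if (n <= 0)
            return false;  // 0 = peer closed the connection, <0 = error
        total += n;        // recv() may return fewer bytes than requested
    }
    return true;
}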
I have also heard of issues with sending really large blocks of data in one hit, so if you are pushing something like 5 megabytes at once, be aware that other issues might come into play as well.
I have been trying to use the NDI SDK 4.5 in an Objective-C iOS 13 app to broadcast camera capture from an iPhone device.
My sample code is in public Github repo: https://github.com/bharatbiswal/CameraExampleObjectiveC
Following is how I send CMSampleBufferRef sampleBuffer:
CVPixelBufferRef pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
NDIlib_video_frame_v2_t video_frame;
video_frame.xres = VIDEO_CAPTURE_WIDTH;
video_frame.yres = VIDEO_CAPTURE_HEIGHT;
video_frame.FourCC = NDIlib_FourCC_type_UYVY; // kCVPixelFormatType_420YpCbCr8BiPlanarFullRange
video_frame.line_stride_in_bytes = VIDEO_CAPTURE_WIDTH * VIDEO_CAPTURE_PIXEL_SIZE;
video_frame.p_data = CVPixelBufferGetBaseAddress(pixelBuffer);
NDIlib_send_send_video_v2(self.my_ndi_send, &video_frame);
I have been using "NewTek NDI Video Monitor" to receive the video from the network. However, even though my sender shows up as a source, the video does not play.
Has anyone used the NDI SDK on iOS to build broadcast sender or receiver functionality? Please help.
You should use kCVPixelFormatType_32BGRA in the video settings, and NDIlib_FourCC_type_BGRA as the FourCC in NDIlib_video_frame_v2_t.
Are you sure about your VIDEO_CAPTURE_PIXEL_SIZE?
When I worked with NDI on macOS I had the same black-screen problem, and it was due to a wrong line stride.
Maybe this can help: https://developer.apple.com/documentation/corevideo/1456964-cvpixelbuffergetbytesperrow?language=objc
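For illustration, a small drop-in fragment (CoreVideo's C API, so it fits straight into the Objective-C code) that takes the stride from the pixel buffer itself rather than computing width * pixel size by hand; pixelBuffer and video_frame are the variables from the question's code:
#include <CoreVideo/CoreVideo.h>
// Sketch: query the real bytes-per-row; CoreVideo may pad each row.
CVPixelBufferLockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly);
size_t stride = CVPixelBufferGetBytesPerRow(pixelBuffer);
// For planar formats, query each plane instead:
//   size_t yStride = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 0);
video_frame.line_stride_in_bytes = (int)stride;
CVPixelBufferUnlockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly);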
Also, it seems the pixel formats from Core Video and NDI don't match.
On the core video side you are using Bi-Planar Y'CbCr 8-bit 4:2:0, and on the NDI side you are using NDIlib_FourCC_type_UYVY which is Y'CbCr 4:2:2.
I cannot find any Bi-Planar Y'CbCr 8-bit 4:2:0 pixel format on the NDI side.
You may have more luck using the following combination:
core video: https://developer.apple.com/documentation/corevideo/1563591-pixel_format_identifiers/kcvpixelformattype_420ypcbcr8planarfullrange?language=objc
NDI: NDIlib_FourCC_type_YV12
Hope this helps!
In my experience, you have made two mistakes. First, to use CVPixelBuffer's CVPixelBufferGetBaseAddress, the CVPixelBufferLockBaseAddress method must be called first; otherwise it returns a null pointer.
https://developer.apple.com/documentation/corevideo/1457128-cvpixelbufferlockbaseaddress?language=objc
Secondly, NDI does not support biplanar YUV420 (the default format for iOS cameras). More precisely, NDI only accepts a single data pointer, so you have to merge the two planar memory areas into one and pass the result in NV12 format. See the NDI documentation for details.
So your code should look like the following. Note that if you send asynchronously instead of using the synchronous NDIlib_send_send_video_v2, a strong reference to the transferred memory area must be maintained until the NDI library has completed the transfer.
CVPixelBufferRef pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
CVPixelBufferLockBaseAddress(pixelBuffer, 0);
int width = (int)CVPixelBufferGetWidth(pixelBuffer);
int height = (int)CVPixelBufferGetHeight(pixelBuffer);
OSType pixelFormat = CVPixelBufferGetPixelFormatType(pixelBuffer);
NDIlib_FourCC_video_type_e ndiVideoFormat;
uint8_t* pixelData;
int stride;
if (pixelFormat == kCVPixelFormatType_32BGRA) {
    ndiVideoFormat = NDIlib_FourCC_type_BGRA;
    pixelData = (uint8_t*)CVPixelBufferGetBaseAddress(pixelBuffer); // Or copy for asynchronous transmit.
    stride = (int)CVPixelBufferGetBytesPerRow(pixelBuffer); // may be wider than width * 4 due to row padding
} else if (pixelFormat == kCVPixelFormatType_420YpCbCr8BiPlanarFullRange) {
    ndiVideoFormat = NDIlib_FourCC_type_NV12;
    uint8_t* yPlane = (uint8_t*)CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0);
    int yPlaneBytesPerRow = (int)CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 0);
    int ySize = yPlaneBytesPerRow * height;
    uint8_t* uvPlane = (uint8_t*)CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 1);
    int uvPlaneBytesPerRow = (int)CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 1);
    int uvSize = uvPlaneBytesPerRow * (height / 2); // the UV plane of 4:2:0 has half the rows
    stride = yPlaneBytesPerRow;
    pixelData = (uint8_t*)malloc(ySize + uvSize); // merge both planes into one contiguous NV12 block
    memcpy(pixelData, yPlane, ySize);
    memcpy(pixelData + ySize, uvPlane, uvSize);
} else {
    return;
}
NDIlib_video_frame_v2_t video_frame;
video_frame.xres = width;
video_frame.yres = height;
video_frame.FourCC = ndiVideoFormat;
video_frame.line_stride_in_bytes = stride;
video_frame.p_data = pixelData;
NDIlib_send_send_video_v2(self.my_ndi_send, &video_frame); // synchronous sending
CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);
// For the synchronous sending case: free the merged copy (or use pre-allocated memory).
if (pixelFormat == kCVPixelFormatType_420YpCbCr8BiPlanarFullRange) {
    free(pixelData);
}
We have two Qt applications. App1 accepts a connection from App2 through QTcpServer and stores it in QTcpSocket* tcpSocket. App1 runs a simulation at 30 Hz. For each simulation run, a QByteArray of a few kilobytes is sent using the following code (from the main/GUI thread):
QByteArray block;
/* lines omitted which write data into block */
tcpSocket->write(block, block.size());
tcpSocket->waitForBytesWritten(1);
The receiver listens to the QTcpSocket::readyRead signal (in the main/GUI thread) and prints the corresponding timestamp to the GUI.
When both App1 and App2 run on the same system, the packages are perfectly in sync. However, when App1 and App2 run on different systems connected through a network, App2 is no longer in sync with the simulation in App1. The packages come in much more slowly. Even more surprising (and indicating that our implementation is wrong) is the fact that when we stop the simulation loop, no more packages are received. This surprises us, because we expect from the TCP protocol that all packages will arrive eventually.
We built the TCP logic based on Qt's fortune example. The fortune server, however, is different, because it only sends one package per incoming client. Could someone identify what we have done wrong?
Note: we use MSVC2012 (App1), MSVC2010 (App2) and Qt 5.2.
Edit: By a package I mean the result of a single simulation experiment: a bunch of numbers written into the QByteArray block. The first bytes, however, contain the length of the QByteArray, so that the client can check whether all the data has been received. This is the code which is called when the QTcpSocket::readyRead signal is emitted:
QDataStream in(tcpSocket);
in.setVersion(QDataStream::Qt_5_2);
if (blockSize == 0) {
    if (tcpSocket->bytesAvailable() < (int)sizeof(quint16))
        return; // cannot yet read the size from the data block
    in >> blockSize; // read the data size for the data block
}
// if the whole data block has not yet been received, ignore it
if (tcpSocket->bytesAvailable() < blockSize)
    return;
// if we get here, the whole object is available to parse
QByteArray object;
in >> object;
blockSize = 0; // reset blockSize for handling the next package
return;
The problem in our implementation was caused by data packages piling up and by incorrect handling of packages that had only partially arrived.
The answer goes in the direction of Tcp packets using QTcpSocket. However, that answer could not be applied in a straightforward manner, because we rely on QDataStream instead of a plain QByteArray.
The following code (run each time QTcpSocket::readyRead is emitted) works for us and shows how a raw series of bytes can be read from QDataStream. Unfortunately, it seems that it is not possible to process the data in a cleaner way (using operator>>).
QDataStream in(tcpSocket);
in.setVersion(QDataStream::Qt_5_2);
while (tcpSocket->bytesAvailable())
{
    if (tcpSocket->bytesAvailable() < (int)(sizeof(quint16) + sizeof(quint8) + sizeof(quint32)))
        return; // cannot yet read size and type info from the data block
    in >> blockSize;
    in >> dataType;
    char* temp = new char[4]; // read and ignore the quint32 length that QDataStream serializes in front of a QByteArray
    int bufferSize = in.readRawData(temp, 4);
    delete[] temp; // new[] must be paired with delete[]
    temp = NULL;
    QByteArray buffer;
    int objectSize = blockSize - (sizeof(quint16) + sizeof(quint8) + sizeof(quint32));
    temp = new char[objectSize];
    bufferSize = in.readRawData(temp, objectSize);
    buffer.append(temp, bufferSize);
    delete[] temp;
    temp = NULL;
    if (buffer.size() == objectSize)
    {
        // ready for parsing
    }
    else if (buffer.size() > objectSize)
    {
        // buffer size larger than expected object size, but still ready for parsing
    }
    else
    {
        // buffer size smaller than expected object size
        while (buffer.size() < objectSize)
        {
            tcpSocket->waitForReadyRead();
            char* temp = new char[objectSize - buffer.size()];
            int bufferSize = in.readRawData(temp, objectSize - buffer.size());
            buffer.append(temp, bufferSize);
            delete[] temp;
            temp = NULL;
        }
        // now ready for parsing
    }
    if (dataType == 0)
    {
        // deserialize the object
    }
}
Please note that the first three bytes of the expected QDataStream are part of our own protocol: blockSize indicates the number of bytes of a complete single package, and dataType helps in deserializing the binary chunk.
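For completeness, a hedged sketch of what the matching sender side could look like under this protocol (the name sendPackage and its parameters are hypothetical; the posts above only show the reader). It follows the fortune example's pattern of writing a placeholder size and patching it in afterwards:
#include <QByteArray>
#include <QDataStream>
#include <QTcpSocket>
// Sketch: serialize one package as [quint16 blockSize][quint8 dataType]
// [QByteArray object]; blockSize counts the bytes of the whole block,
// matching objectSize = blockSize - (2 + 1 + 4) on the reading side.
void sendPackage(QTcpSocket* socket, quint8 dataType, const QByteArray& object)
{
    QByteArray block;
    QDataStream out(&block, QIODevice::WriteOnly);
    out.setVersion(QDataStream::Qt_5_2);
    out << quint16(0) << dataType << object; // placeholder size first
    out.device()->seek(0);
    out << quint16(block.size());            // patch in the real size
    socket->write(block);
}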
Edit
For reducing the latency of sending objects through the TCP connection, disabling packet bunching was very useful:
// disable Nagle's algorithm to avoid delay and bunching of small packages
tcpSocketPosData->setSocketOption(QAbstractSocket::LowDelayOption,1);
I read about netmap, which allows programmers to access packets in user space; that means user applications can read and send network packets very quickly using netmap.
netmap: http://info.iet.unipi.it/~luigi/netmap/
Can anyone who is familiar with netmap tell me: should we create the entire packet that we want to send out, or can we use the kernel stack's features to send it?
Edit: here is an example of how to use this API:
https://www.freebsd.org/cgi/man.cgi?query=netmap&sektion=4
#include <net/netmap_user.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <strings.h>
void sender(void)
{
    struct netmap_if *nifp;
    struct netmap_ring *ring;
    struct nmreq nmr;
    struct pollfd fds;
    void *p;
    char *buf;
    unsigned int i;
    int fd;
    fd = open("/dev/netmap", O_RDWR);
    bzero(&nmr, sizeof(nmr));
    strcpy(nmr.nr_name, "ix0");
    nmr.nr_version = NETMAP_API;
    ioctl(fd, NIOCREGIF, &nmr);
    p = mmap(0, nmr.nr_memsize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    nifp = NETMAP_IF(p, nmr.nr_offset);
    ring = NETMAP_TXRING(nifp, 0);
    fds.fd = fd;
    fds.events = POLLOUT;
    for (;;) {
        poll(&fds, 1, -1);
        while (!nm_ring_empty(ring)) {
            i = ring->cur;
            buf = NETMAP_BUF(ring, ring->slot[i].buf_idx);
            // here they are saying to construct the packet
            ... prepare packet in buf ...
            ring->slot[i].len = ... packet length ...
            ring->head = ring->cur = nm_ring_next(ring, i);
        }
    }
}
You need to create the entire packet, including the Ethernet, IP and TCP headers. netmap completely bypasses the kernel network stack, so you need to do all of that work yourself.
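For example, a minimal sketch (plain C, placeholder MAC addresses and payload) of building a raw Ethernet frame in a netmap slot buffer; a real sender would have to construct the IP/TCP (or UDP) headers and checksums by hand in the same way:
#include <stdint.h>
#include <string.h>
// Sketch: write a minimal Ethernet header plus payload into a netmap
// slot buffer and return the frame length for slot->len. The payload
// here would itself have to be a hand-built IP packet, since the
// kernel stack is bypassed.
static size_t build_eth_frame(char *buf, const uint8_t dst[6],
                              const uint8_t src[6],
                              const void *payload, size_t payload_len)
{
    memcpy(buf, dst, 6);             // destination MAC
    memcpy(buf + 6, src, 6);         // source MAC
    buf[12] = 0x08; buf[13] = 0x00;  // EtherType 0x0800 = IPv4
    memcpy(buf + 14, payload, payload_len);
    return 14 + payload_len;
}
// In the loop above:
//   ring->slot[i].len = build_eth_frame(buf, dst_mac, src_mac, ip_pkt, ip_len);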
In a TcpClient/TcpListener setup, is there any difference, from the receiving endpoint's point of view, between:
// Will sending a prefixed length before the data...
client.GetStream().Write(data, 0, 4); // Int32 payload size = 80000
client.GetStream().Write(data, 0, 80000); // payload
// Appear as 80004 bytes in the stream?
// i.e. there is no end of stream to demarcate the first Write() from the second?
client.GetStream().Write(data, 0, 80004);
// Which means I can potentially read more than 4 bytes on the first read
var read = client.GetStream().Read(buffer, 0, 4082); // read could be any value from 0 to 4082?
I noticed that DataAvailable and the return value of GetStream().Read() do not reliably tell whether there is more incoming data on the way. Do I always need to write a Read() loop to read exactly the first 4 bytes?
// Read() loop
var ms = new MemoryStream();
while (ms.Length < 4)
{
    read = client.GetStream().Read(buffer, 0, (int)(4 - ms.Length));
    if (read > 0)
        ms.Write(buffer, 0, read);
}
The answer seems to be yes: we always have to take responsibility for reading the same number of bytes that were sent. In other words, there has to be an application-level protocol to read exactly what was written to the underlying stream, because the stream itself does not know where a message starts or ends.
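To make that concrete, here is a C++ sketch of such a length-prefix protocol (the thread is C#, but the framing idea is identical); recv_all is the loop-until-n-bytes helper sketched earlier in this document, and the 4-byte prefix is assumed to be sent in network byte order:
#include <winsock2.h>
#include <cstdint>
#include <vector>
// Sketch: read exactly 4 prefix bytes, decode the payload length, then
// read exactly that many payload bytes.
bool read_message(SOCKET s, std::vector<char>& payload)
{
    uint32_t len_be = 0;
    if (!recv_all(s, (char*)&len_be, (int)sizeof(len_be)))
        return false;
    uint32_t len = ntohl(len_be);
    payload.resize(len);
    return len == 0 || recv_all(s, payload.data(), (int)len);
}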
Using .NET 4.0 and IIS 7.5 (Windows Server 2008 R2), I would like to stream out binary content of about 10 MB. The content is already in a MemoryStream. I wonder whether IIS 7 automatically chunks the output stream. From the point of view of the client receiving the stream, is there any difference between these two approaches:
//#1: Output the entire stream in one single chunk
Response.OutputStream.Write(memoryStr.ToArray(), 0, (int) memoryStr.Length);
Response.Flush();
//#2: Output by 4K chunks
byte[] buffer = new byte[4096];
int byteReadCount;
while ((byteReadCount = memoryStr.Read(buffer, 0, buffer.Length)) > 0)
{
Response.OutputStream.Write(buffer, 0, byteReadCount);
Response.Flush();
}
Thanks in advance for any help.
I didn't try your second suggestion of passing the original data stream. The memory stream was indeed populated from the response stream of a web request. Here is the code:
HttpWebRequest webreq = (HttpWebRequest)WebRequest.Create(this._targetUri);
using (HttpWebResponse httpResponse = (HttpWebResponse)webreq.GetResponse())
{
    using (Stream responseStream = httpResponse.GetResponseStream())
    {
        byte[] buffer = new byte[4096];
        int byteReadCount = 0;
        MemoryStream memoryStr = new MemoryStream(4096);
        while ((byteReadCount = responseStream.Read(buffer, 0, buffer.Length)) > 0)
        {
            memoryStr.Write(buffer, 0, byteReadCount);
        }
        // ... etc ... //
    }
}
Do you think it is safe to pass the responseStream to Response.OutputStream.Write()? If yes, can you suggest an economical way of doing so? How do I send the byte array plus the exact stream length to Response.OutputStream.Write()?
The second option is the better one, as ToArray will in fact create a copy of the internal array stored in the MemoryStream.
But preferably, you can use memoryStr.GetBuffer(), which returns a reference to this internal array without copying. In that case, you also need to use the memoryStr.Length property, because the buffer returned by GetBuffer() is in general bigger than the actual stored data (it is allocated chunk by chunk, not byte by byte).
Note that it would be best to pass the original data as a stream directly to the ASP.NET output stream, instead of going through an intermediate MemoryStream. It depends on how you get your binary data in the first place.
Another option, if you often serve the exact same content, is to save the MemoryStream to a physical file (using a FileStream) and use Response.TransmitFile on all subsequent requests. Response.TransmitFile uses low-level Windows socket layers, and there is nothing faster for sending a file.