Winsock asynchronous multiple WSASend with one single buffer - asynchronous

MSDN states "For a Winsock application, once the WSASend function is called, the system owns these buffers and the application may not access them."
In a server application, does that mean that if I want to broadcast a message to multiple clients I cannot use a single buffer that holds the data and invoke WSASend on each socket with that one buffer?

I don't have a documentation reference that confirms this is possible but I've been doing it for years and it hasn't failed yet, YMMV.
You CAN use a single data buffer as long as you have a unique OVERLAPPED structure per send. Since the WSABUF array is duplicated by the WSASend() call and can be stack based I would expect that you COULD have a single WSABUF array, but I've never done that.
What you DO need to make sure that you keep that single data buffer "alive" until all of the data writes complete.
Broadcasting like this can complicate a design if you tend to structure your extended OVERLAPPED so that it includes the data buffer, but it does avoid memory allocation and memory copying.
Note: I have a system whereby my extended OVERLAPPED structures include the data buffer and operation code and these are reference counted and pool and used for sends and recvs. When broadcasting a buffer I use a separate "buffer handle" per send, this handle is just an OVERLAPPED structure extended in a different way, it holds a reference to the original data buffer and has its own reference count. When all of the broadcast sends have completed all of the buffer handles will have been released and these will, in turn, have released the underlying data buffer for reuse.

Related

In Trio, how do you write data to a socket without waiting?

In Trio, if you want to write some data to a TCP socket then the obvious choice is send_all:
my_stream = await trio.open_tcp_stream("localhost", 1234)
await my_stream.send_all(b"some data")
Note that this both sends that data over the socket and waits for it to be written. But what if you just want to queue up the data to be sent, but not wait for it to be written (at least, you don't want to wait in the same coroutine)?
In asyncio this is straightforward because the two parts are separate functions: write() and drain(). For example:
writer.write(data)
await writer.drain()
So of course if you just want to write the data and not wait for it you can just call write() without awaiting drain(). Is there equivalent functionality in Trio? I know this two-function design is controversial because it makes it hard to properly apply backpressure, but in my application I need them separated.
For now I've worked around it by creating a dedicated writer coroutine for each connection and having a memory channel to send data to that coroutine, but it's quite a lot of faff compared to choosing between calling one function or two, and it seems a bit wasteful (presumably there's still a send buffer under the hood, and my memory channel is like a buffer on top of that buffer).
I posted this on the Trio chat and Nathaniel J. Smith, the creator of Trio, replied with this:
Trio doesn't maintain a buffer "under the hood", no. There's just the kernel's send buffer, but the kernel will apply backpressure whether you want it to or not, so that doesn't help you.
Using a background writer task + an unbounded memory channel is basically what asyncio does for you implicitly.
The other option, if you're putting together a message in multiple pieces and then want to send it when you're done would be to append them into a bytearray and then call send_all once at the end, at the same place where you'd call drain in asyncio
(but obviously that only works if you're calling drain after every logical message; if you're just calling write and letting asyncio drain it in the background then that doesn't help)
So the question was based on a misconception: I wanted to write into Trio's hidden send buffer, but no such thing exists! Using a separate coroutine that waits on a stream and calls send_all() makes more sense than I had thought.
I ended up using a hybrid of the two ideas (using separate coroutine with a memory channel vs using bytearray): save the data to a bytearray, then use a condition variable ParkingLot to signal to the other coroutine that it's ready to be written. That lets me coalesce writes, and also manually check if the buffer's getting too large.

read a well defined frame with qtserialport

I'm developing a desktop application with qt which communicates with stm32 to send and receive data.
The thing is, the data to transfer, follow a well-defined shape, with a previously defined fields. My problem is that I can't find how read () or readall() work or how Qserialport even treats the data. So my question is how can I read data (in real time, whenever there is data in the buffer) and analyze it field by field (or per byte) in order to be displayed in the GUI.
There's nothing to it. read() and readAll() give you bytes, optionally wrapped in a QByteArray. How you deal with those bytes is up to you. The serial port doesn't "treat" or interpret the data in any way.
The major point of confusion is that somehow people think of a serial port as if it was a packet oriented interface. It's not. When the readyRead() signal fires, all that you're guaranteed is that there's at least one new byte available to read. You must cope with such fragmentation.

I don't understand what exactly does the function bytesToWrite() Qt

I searched for bytesToWrite in doc and that what I found "For buffered devices, this function returns the number of bytes waiting to be written. For devices with no buffer, this function returns 0."
First what does mean buffered devices. And can anyone please explain to me what exactly this function does and where or how can I use it.
Many IO devices are buffered, which means that data isn't sent straight away, but it is accumulated to be sent in bulk when there is a sufficient amount.
This is done essentially to have better performance, as sending data normally has some fixed overhead (at the very least the syscall overhead), which is well amortized when sending data in bulk, but would have to be paid for each write if no buffering would be used.
(notice that here we are only talking about QIODevice buffers, normally there are also all kinds of kernel-mode buffers and buffers internal to hardware devices themselves)
bytesToWrite tells you how much stuff is in the QIODevice write buffer, i.e. how many bytes you wrote that are waiting to be actually written (as in, given to the OS to write).
I never actually had to use that member, but I suppose it could be useful e.g. to in a producer-consumer scenario (=if the write buffer is lower than something, then you have to actually calculate the next chunk of data to send), to manually handle buffering in some places or even just for debugging/logging purposes.
it's actually very usefull when you're using an asynchronous API.
you can for example, use it inside a bytesWritten() slot to tell wether the buffer is empty and the data has been fully written or not.

Handling messages over TCP

I'm trying to send and receive messages over TCP using a size of each message appended before the it starts.
Say, First three bytes will be the length and later will the message:
As a small example:
005Hello003Hey002Hi
I'll be using this method to do large messages, but because the buffer size will be a constant integer say, 200 Bytes. So, there is a chance that a complete message may not be received e.g. instead of 005Hello I get 005He nor a complete length may be received e.g. I get 2 bytes of length in message.
So, to get over this problem, I'll need to wait for next message and append it to the incomplete message etc.
My question is: Am I the only one having these difficulties to appending messages to each other, appending lengths etc.. to make them complete Or is this really usually how we need to handle the individual messages on TCP? Or, if there is a better way?
What you're seeing is 100% normal TCP behavior. It is completely expected that you'll loop receiving bytes until you get a "message" (whatever that means in your context). It's part of the work of going from a low-level TCP byte stream to a higher-level concept like "message".
And "usr" is right above. There are higher level abstractions that you may have available. If they're appropriate, use them to avoid reinventing the wheel.
So, there is a chance that a complete message may not be received e.g.
instead of 005Hello I get 005He nor a complete length may be received
e.g. I get 2 bytes of length in message.
Yes. TCP gives you at least one byte per read, that's all.
Or is this really usually how we need to handle the individual messages on TCP? Or, if there is a better way?
Try using higher-level primitives. For example, BinaryReader allows you to read exactly N bytes (it will internally loop). StreamReader lets you forget this peculiarity of TCP as well.
Even better is using even more higher-level abstractions such as HTTP (request/response pattern - very common), protobuf as a serialization format or web services which automate pretty much all transport layer concerns.
Don't do TCP if you can avoid it.
So, to get over this problem, I'll need to wait for next message and append it to the incomplete message etc.
Yep, this is how things are done at the socket level code. For each socket you would like to allocate a buffer of at least the same size as kernel socket receive buffer, so that you can read the entire kernel buffer in one read/recv/resvmsg call. Reading from the socket in a loop may starve other sockets in your application (this is why they changed epoll to be level-triggered by default, because the default edge-triggered forced application writers to read in a loop).
The first incomplete message is always kept in the beginning of the buffer, reading the socket continues at the next free byte in the buffer, so that it automatically appends to the incomplete message.
Once reading is done, normally a higher level callback is called with the pointers to all read data in the buffer. That callback should consume all complete messages in the buffer and return how many bytes it has consumed (may be 0 if there is only an incomplete message). The buffer management code should memmove the remaining unconsumed bytes (if any) to the beginning of the buffer. Alternatively, a ring-buffer can be used to avoid moving those unconsumed bytes, but in this case the higher level code should be able to cope with ring-buffer iterators, which it may be not ready to. Hence keeping the buffer linear may be the most convenient option.

Out of memoryexception in SignalR due to a large real time data

I'm using SignalR for a real time data ticker with up to 10k rows contained in a single object being sent to the client per sec. The memory of IIS worker processor goes on increasing until the ticking finally freezes.
First of all, as you already could read SignalR is built on a premise that you will not send large messages. However, in a real word it is a perfectly valid scenario.
So, what's the deal with a memory issues. SignalR has a circular buffer with a default size of 1000 elements, it stores every single message into that buffer per opened connection. So basically, if you have 100 opened connection, and you've sent 1000 messages it will be total of 100*1000 messages stored inside your memory.
Another thing you should take into consideration is .Net framework's large objects heap and garbage collection. Every object with a size greater than 85kB will went to the large object heap, next thing I should point out that garbage collector treats objects inside large object heap as a second generation objects. Taken that into account, you could realize that your objects once they have been dereferenced from the SignalR's circular buffer won't be garbage collected immediatelly due to their size.
As #davidfowl said, you really could make your data smaller, but sometimes you will not be able to do it, without introducing some pretty complex mechanic both on client and on server.
Fortunatelly, there is a way to reduce default size of SignalR's circular buffer, and you could do it by setting:
GlobalHost.Configuration.DefaultMessageBufferSize = 32

Resources