How to efficiently decode gobs and wait for more to arrive via TCP connection - asynchronous

I'd like to have a TCP connection for a gaming application. It's important to be time-efficient: I want to receive many objects with low latency. It also needs to be CPU-efficient because of the load.
So far, using Go's net package, I can make sure handleConnection is called every time a connection is dialed. However, once the connection is created, I have to poll (check over and over again) to see if new data is ready on the connection. This seems inefficient; I don't want to run that check if it's needlessly sucking up CPU.
I was looking for something such as the following two options but didn't find what I was looking for.
(1) Do a read operation that somehow blocks (without sucking CPU) and then unblocks when new stuff is ready on the connection stream. I could not find that.
(2) Do an async approach where a function is called when new data arrives on the connection stream (not just when a new connection is dialed). I could not find that.
I don't want to put any sleep calls in here because that will increase the latency of responding to single messages.
I also considered dialing out for every single message, but I'm not sure if that's efficient or not.
So I came up with the code below, but it still seems to be doing a whole lot of checking for new data with the Decode(p) call, which does not seem optimal.
How can I do this more efficiently?
func handleConnection(conn net.Conn) {
    dec := gob.NewDecoder(conn)
    p := &P{}
    for {
        result := dec.Decode(p)
        if result != nil {
            // do nothing
        } else {
            fmt.Printf("Received : %+v", p)
            fmt.Println("result", result, "\n")
        }
    }
    conn.Close()
}

You say:
So I came up with code below, but it's still doing a whole lot of checking for new data with the Decode(p) call.
Why do you think that? The gob decoder will issue a Read to the conn and wait for it to return data before figuring out what it is and decoding it. This is a blocking operation, and will be handled asynchronously by the runtime behind the scenes. The goroutine will sleep until the appropriate io signal comes in. You should not have to do anything fancy to make that more performant.
You can trace this yourself in the code for decoder.Decode.
I think your code will work just fine. CPU will be idle until it receives more data.
Go is not Node. Every API is "blocking" for the most part, but that is not as much of a problem as on other platforms. The runtime manages goroutines very efficiently and delivers the appropriate signals to sleeping goroutines as needed.
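For reference, here is a minimal sketch of the same loop with explicit error handling (assuming the same P type from the question, with encoding/gob, fmt, io, log and net imported): Decode blocks cheaply while waiting for a full gob, and the loop exits on EOF or any other error instead of spinning.

func handleConnection(conn net.Conn) {
    defer conn.Close()
    dec := gob.NewDecoder(conn)
    for {
        p := &P{}
        if err := dec.Decode(p); err != nil {
            if err != io.EOF {
                log.Println("decode error:", err)
            }
            return // peer closed the connection or it broke; stop reading
        }
        fmt.Printf("Received: %+v\n", p)
    }
}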

Related

Efficiently connecting an asynchronous IMFSourceReader to a synchronous IMFTransform

Given an asynchronous IMFSourceReader connected to a synchronous-only IMFTransform.
Then, within the IMFSourceReaderCallback::OnReadSample() callback, is it a good idea not to call IMFTransform::ProcessInput directly, but instead to push the produced sample onto another queue for another thread to call the transform's ProcessInput on?
Or would I just be replicating work that source readers typically do internally? Put another way, does work within OnReadSample run the risk of blocking further decoding work within the source reader that could otherwise have happened more asynchronously?
So I am suggesting something like:
WorkQueue transformInputs;
...

// Called back asynchronously by the source reader
HRESULT OnReadSampleCallback(... IMFSample* sample)
{
    // Push the sample and return immediately
    Push(transformInputs, sample);
    return S_OK;
}

// Different worker thread, woken for samples queued in transformInputs
void OnTransformInputWork()
{
    // Transform object is not async capable
    transform->ProcessInput(0, Pop(transformInputs), 0);
    ...
}
This is touched on, but not elaborated on, under 'Implementing the Callback Interface' here:
https://learn.microsoft.com/en-us/windows/win32/medfound/using-the-source-reader-in-asynchronous-mode
Or is it completely dependent on whatever the source reader sets up internally and not easily determined?
It is not a good idea to perform a long blocking operation in IMFSourceReaderCallback::OnReadSample. Nothing is going to be fatal or serious but this is not the intended usage.
Taking into consideration your previous question about audio format conversion, though, audio sample data conversion is fast enough to happen in such a callback.
Also, it is not clear or documented (it depends on the actual implementation), but ProcessInput is often instant and only references the input data; ProcessOutput would be the computationally expensive call in this case. If you don't do ProcessOutput right there in the same callback, you might run into a situation where the MFT is no longer accepting input, and so you'd have to implement a queue anyway.
With all this in mind, you would just do the processing in the callback, neglecting the performance impact, assuming your processing is not too heavy; otherwise you would go with the queue.
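Not Media Foundation code, but a rough sketch of the same hand-off pattern in Go (the language of the main question above), where the callback only enqueues work and a separate worker drives the synchronous consumer. The names sample, onReadSample and transformWorker are made up for illustration.

type sample struct{ /* payload produced by the reader */ }

// Bounded queue between the callback and the worker.
var transformInputs = make(chan sample, 64)

// Called from the "callback" context: push and return immediately.
// It only blocks if the queue is full, which applies natural backpressure.
func onReadSample(s sample) {
    transformInputs <- s
}

// Runs on its own goroutine and feeds the synchronous-only processor.
func transformWorker(process func(sample)) {
    for s := range transformInputs {
        process(s) // the potentially slow, synchronous step
    }
}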

In Trio, how do you write data to a socket without waiting?

In Trio, if you want to write some data to a TCP socket then the obvious choice is send_all:
my_stream = await trio.open_tcp_stream("localhost", 1234)
await my_stream.send_all(b"some data")
Note that this both sends that data over the socket and waits for it to be written. But what if you just want to queue up the data to be sent, but not wait for it to be written (at least, you don't want to wait in the same coroutine)?
In asyncio this is straightforward because the two parts are separate functions: write() and drain(). For example:
writer.write(data)
await writer.drain()
So of course if you just want to write the data and not wait for it you can just call write() without awaiting drain(). Is there equivalent functionality in Trio? I know this two-function design is controversial because it makes it hard to properly apply backpressure, but in my application I need them separated.
For now I've worked around it by creating a dedicated writer coroutine for each connection and having a memory channel to send data to that coroutine, but it's quite a lot of faff compared to choosing between calling one function or two, and it seems a bit wasteful (presumably there's still a send buffer under the hood, and my memory channel is like a buffer on top of that buffer).
I posted this on the Trio chat and Nathaniel J. Smith, the creator of Trio, replied with this:
Trio doesn't maintain a buffer "under the hood", no. There's just the kernel's send buffer, but the kernel will apply backpressure whether you want it to or not, so that doesn't help you.
Using a background writer task + an unbounded memory channel is basically what asyncio does for you implicitly.
The other option, if you're putting together a message in multiple pieces and then want to send it when you're done, would be to append them into a bytearray and then call send_all once at the end, at the same place where you'd call drain in asyncio
(but obviously that only works if you're calling drain after every logical message; if you're just calling write and letting asyncio drain it in the background then that doesn't help).
So the question was based on a misconception: I wanted to write into Trio's hidden send buffer, but no such thing exists! Using a separate coroutine that waits on a stream and calls send_all() makes more sense than I had thought.
I ended up using a hybrid of the two ideas (a separate coroutine with a memory channel vs. a bytearray): save the data to a bytearray, then use a ParkingLot as a condition variable to signal to the other coroutine that data is ready to be written. That lets me coalesce writes, and also manually check whether the buffer is getting too large.
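For comparison, here is roughly what the "background writer task + memory channel" pattern looks like in Go (the language used for the examples in this thread); the channel plays the role of the memory channel, and the function name, buffer size and error handling are illustrative (assumes the net package is imported).

// startWriter queues writes on a channel and drains it from a dedicated
// goroutine, so callers can hand off data without waiting on the socket.
func startWriter(conn net.Conn) (chan<- []byte, <-chan error) {
    out := make(chan []byte, 64) // the in-memory send queue
    errc := make(chan error, 1)
    go func() {
        for msg := range out {
            if _, err := conn.Write(msg); err != nil {
                errc <- err
                return
            }
        }
        errc <- nil // out was closed: all queued data has been written
    }()
    return out, errc
}

// Usage: queue data without waiting for it to reach the wire.
//   out, errc := startWriter(conn)
//   out <- []byte("some data")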

Will the write() system call block further operation till read() is invoked, or vice versa?

Written as part of a TCP/IP client-server:
Server:
write(nfds,data1,sizeof(data1));
usleep(1000);
write(nfds,data2,sizeof(data2));
Client:
read(fds,s,sizeof(s));
printf("%s",s);
read(fds,s,sizeof(s));
printf("%s",s);
Without usleep(1000) between the two calls to write(), the client prints data1 twice. Why is this?
Background:
I am doing a Client-Server program where the server has to send two consecutive pieces of information after their acquisition, via the network (socket); nfds is the file descriptor we get from accept().
On the client side, we receive this information via read(); here fds is the file descriptor obtained via socket().
My issue is that when I am NOT using the usleep(1000) between the write() functions, the client just prints the info represented by data1 twice, instead of printing data1 and then data2. When I put in the usleep() it's fine. Exactly WHY is this happening? Is write() blocking the operation till the buffer is read or is read() blocking the operation till info is written into the buffer? Or am I completely off the page?
You are making several false assumptions. There is nothing in TCP that guarantees that one send equals one receive. There is a lot of buffering, at both ends, and there are deliberate delays in sending so as to coalesce packets (the Nagle algorithm). When you call read(), or recv() and friends, you need to store the result into a variable and examine it for each of the following cases:
-1: an error: examine/log/print errno, or strerror(), or call perror(), and in most cases close the socket and exit the reading loop.
0: end of stream; the owner has closed the connection; close the socket and exit the reading loop.
a positive value but less than you expected: keep reading and accumulate the data until you have everything you need.
a positive value that is more than you expected: process the data you expected, and save the rest for next time.
exactly what you expected: process the data, discard it all, and repeat. This is the easy case, and it is rare, but it is the only case you are currently programming for.
Don't add sleeps into networking code. It doesn't solve problems, it only delays them.
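As an illustration of "keep reading and accumulate the data until you have everything you need", here is a sketch (in Go, for consistency with the examples above, assuming the net, io and encoding/binary packages) that frames each message with a 4-byte length prefix, which is one common way to avoid relying on one write equalling one read.

// readMessage reads one length-prefixed message, looping over short reads.
func readMessage(conn net.Conn) ([]byte, error) {
    var length uint32
    if err := binary.Read(conn, binary.BigEndian, &length); err != nil {
        return nil, err
    }
    msg := make([]byte, length)
    if _, err := io.ReadFull(conn, msg); err != nil { // accumulates until full
        return nil, err
    }
    return msg, nil
}

// writeMessage sends the length first, then the payload.
func writeMessage(conn net.Conn, msg []byte) error {
    if err := binary.Write(conn, binary.BigEndian, uint32(len(msg))); err != nil {
        return err
    }
    _, err := conn.Write(msg)
    return err
}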

A MailboxProcessor that operates with a LIFO logic

I am learning about F# agents (MailboxProcessor).
I am dealing with a rather unconventional problem.
I have one agent (dataSource) which is a source of streaming data. The data has to be processed by an array of agents (dataProcessor). We can consider dataProcessor as some sort of tracking device.
Data may flow in faster than the speed with which the dataProcessor may be able to process its input.
It is OK to have some delay. However, I have to ensure that the agent stays on top of its work and does not get piled under obsolete observations.
I am exploring ways to deal with this problem.
The first idea is to implement a stack (LIFO) in dataSource. dataSource would send over the latest observation available when dataProcessor becomes available to receive and process the data. This solution may work, but it may get complicated, as dataProcessor may need to be blocked and re-activated and communicate its status to dataSource, leading to a two-way communication problem. This problem may boil down to a blocking queue in the producer-consumer problem, but I am not sure.
The second idea is to have dataProcessor take care of message sorting. In this architecture, dataSource would simply post updates to dataProcessor's queue. dataProcessor would use Scan to fetch the latest data available in its queue. This may be the way to go. However, I am not sure whether, in the current design of MailboxProcessor, it is possible to clear a queue of messages, deleting the older, obsolete ones. Furthermore, here, it is written that:
Unfortunately, the TryScan function in the current version of F# is broken in two ways. Firstly, the whole point is to specify a timeout but the implementation does not actually honor it. Specifically, irrelevant messages reset the timer. Secondly, as with the other Scan function, the message queue is examined under a lock that prevents any other threads from posting for the duration of the scan, which can be an arbitrarily long time. Consequently, the TryScan function itself tends to lock-up concurrent systems and can even introduce deadlocks because the caller's code is evaluated inside the lock (e.g. posting from the function argument to Scan or TryScan can deadlock the agent when the code under the lock blocks waiting to acquire the lock it is already under).
Having the latest observation bounced back may be a problem.
The author of this post, Jon Harrop, suggests that:
I managed to architect around it and the resulting architecture was actually better. In essence, I eagerly Receive all messages and filter using my own local queue.
This idea is surely worth exploring but, before starting to play around with code, I would welcome some inputs on how I could structure my solution.
Thank you.
Sounds like you might need a destructive-scan version of the mailbox processor. I implemented this with TPL Dataflow in a blog series that you might be interested in.
My blog is currently down for maintenance but I can point you to the posts in markdown format.
Part1
Part2
Part3
You can also check out the code on github
I also wrote about the issues with scan in my lurking horror post
Hope that helps...
tl;dr I would try this: take the Mailbox implementation from FSharp.Actor or Zach Bray's blog post, replace ConcurrentQueue with ConcurrentStack (plus add some bounded-capacity logic), and use this changed agent as a dispatcher to pass messages from dataSource to an army of dataProcessors implemented as ordinary MBPs or Actors.
tl;dr2 If workers are a scarce and slow resource and we need to process the message that is the latest at the moment a worker becomes ready, then it all boils down to an agent with a stack instead of a queue (with some bounded-capacity logic) plus a BlockingQueue of workers. The dispatcher dequeues a ready worker, then pops a message from the stack and sends this message to the worker. After the job is done, the worker enqueues itself back onto the queue when it becomes ready (e.g. before let! msg = inbox.Receive()). The dispatcher's consumer thread then blocks until any worker is ready, while the producer thread keeps the bounded stack updated. (A bounded stack could be done with an array + offset + size inside a lock; the one below is overly complex.)
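To make the bounded-stack part concrete, here is a simplified sketch in Go (the illustration language used in this thread) rather than F#: in this variant workers pull the most recent observation themselves instead of going through a separate dispatcher thread. Observation is a placeholder type, and the sync package is assumed to be imported.

type Observation struct{ /* latest tracking data */ }

type latestStack struct {
    mu    sync.Mutex
    cond  *sync.Cond
    items []Observation // newest observation at the end
    cap   int
}

func newLatestStack(capacity int) *latestStack {
    s := &latestStack{cap: capacity}
    s.cond = sync.NewCond(&s.mu)
    return s
}

// Post pushes the newest observation, dropping the oldest if the stack is full.
func (s *latestStack) Post(o Observation) {
    s.mu.Lock()
    if len(s.items) == s.cap {
        s.items = append(s.items[:0], s.items[1:]...) // discard the oldest
    }
    s.items = append(s.items, o)
    s.cond.Signal()
    s.mu.Unlock()
}

// Take blocks (without spinning) until an observation is available and
// returns the most recent one (LIFO).
func (s *latestStack) Take() Observation {
    s.mu.Lock()
    for len(s.items) == 0 {
        s.cond.Wait()
    }
    o := s.items[len(s.items)-1]
    s.items = s.items[:len(s.items)-1]
    s.mu.Unlock()
    return o
}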
Details
MailboxProcessor is designed to have only one consumer. This is even commented on in the source code of the MBP here (search for the word 'DRAGONS' :) ).
If you post your data to an MBP, then only one thread can take it from the internal queue or stack.
In your particular use case I would use ConcurrentStack directly, or better, wrapped in a BlockingCollection:
It will allow many concurrent consumers
It is very fast and thread safe
BlockingCollection has a BoundedCapacity property that allows you to limit the size of the collection. It throws on Add, but you could catch the exception or use TryAdd. If A is the main stack and B is a standby, then TryAdd to A; on false, Add to B and swap the two with Interlocked.Exchange, then process the needed messages in A, clear it, and make a new standby - or use three stacks if processing A could take longer than B takes to fill up again. This way you do not block and do not lose any messages, but can discard unneeded ones in a controlled way.
BlockingCollection has methods like AddToAny/TakeFromAny, which work on arrays of BlockingCollections. This could help, e.g.:
dataSource produces messages to a BlockingCollection with ConcurrentStack implementation (BCCS)
another thread consumes messages from the BCCS and sends them to an array of processing BCCSs. You said that there is a lot of data, so you may sacrifice one thread to block and dispatch your messages indefinitely
each processing agent has its own BCCS, or is implemented as an Agent/Actor/MBP to which the dispatcher posts messages. In your case you need to send a message to only one processorAgent, so you may store the processing agents in a circular buffer to always dispatch a message to the least recently used processor.
Something like this:
            (data stream produces 'T)
                       |
              [dispatcher's BCCS]
                       |
 (a dispatcher thread consumes 'T and pushes to processors,
      manages capacity of the BCCS and the LRU queue)
              |                               |
[processor1's BCCS/Actor/MBP]  ...  [processorN's BCCS/Actor/MBP]
              |                               |
          (process)                       (process)
Instead of ConcurrentStack, you may want to read about the heap data structure. If you need your latest messages by some property of the messages, e.g. a timestamp, rather than by the order in which they arrive on the stack (e.g. if there could be delays in transit and arrival order <> creation order), you can get the latest message by using a heap.
If you still need Agent semantics/API, you could read several sources in addition to Dave's links, and somehow adapt the implementation to multiple concurrent consumers:
An interesting article by Zach Bray on an efficient Actors implementation. There you do need to replace (under the comment // Might want to schedule this call on another thread.) the line execute true with async { execute true } |> Async.Start or similar, because otherwise the producing thread will also be the consuming thread - not good for a single fast producer. However, for a dispatcher like the one described above this is exactly what is needed.
The FSharp.Actor (aka Fakka) development branch and the F# MBP source code (first link above) here could be very useful for implementation details. The FSharp.Actor library has been in a freeze for several months, but there is some activity in the dev branch.
You should not miss the discussion about Fakka in Google Groups in this context.
I have a somewhat similar use case, and for the last two days I have researched everything I could find on F# Agents/Actors. This answer is a kind of TODO for myself to try these ideas, half of which were born while writing it.
The simplest solution is to greedily eat all messages in the inbox when one arrives and discard all but the most recent. Easily done using TryReceive:
let rec readLatestLoop oldMsg =
    async { let! newMsg = inbox.TryReceive 0
            match newMsg with
            | None -> return oldMsg
            | Some newMsg -> return! readLatestLoop newMsg }

let readLatest() =
    async { let! msg = inbox.Receive()
            return! readLatestLoop msg }
When faced with the same problem I architected a more sophisticated and efficient solution I called cancellable streaming, described in an F# Journal article here. The idea is to start processing messages and then cancel that processing if they are superseded. This significantly improves concurrency if significant processing is being done.
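The cancellable-streaming idea can be sketched outside F# as well; in Go terms (the illustration language used in this thread), each newly received message cancels the context of the previous, still-running piece of work. Msg and process are placeholders, and the context package is assumed to be imported.

type Msg struct{ /* observation payload */ }

// consume starts processing each message as it arrives and cancels the
// processing of the previous message if it has been superseded.
func consume(msgs <-chan Msg, process func(context.Context, Msg)) {
    var cancel context.CancelFunc = func() {}
    for m := range msgs {
        cancel() // abandon work on the now-obsolete message
        var ctx context.Context
        ctx, cancel = context.WithCancel(context.Background())
        go process(ctx, m) // process should poll ctx.Done() and stop early
    }
    cancel()
}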

Receiving image through winsocket

I have a proxy server running on my local machine, used to cache images while surfing. I set up my browser with a proxy to 127.0.0.1, receive the HTTP requests, take the data and send it back to the browser. It works fine for everything except large images. When I receive the image info, it only displays half the image (e.g. the top half of the Google logo). Here's my code:
char buffer[1024] = "";
string ret("");
while (true)
{
    valeurRetour = recv(socketClient_, buffer, sizeof(buffer), 0);
    if (valeurRetour <= 0) break;
    string t;
    t.assign(buffer, valeurRetour);
    ret += t;
    longueur += valeurRetour;
}
closesocket(socketClient_);
valeurRetour = send(socketServeur_, ret.c_str(), longueur, 0);
the socketClient_ is non-blocking. Any idea how to fix this problem?
You're not making fine enough distinctions among the possible return values of recv.
There are two levels here.
The first is, you're lumping 0 and -1 together. 0 means the remote peer closed its sending half of the connection, so your code does the right thing here, closing its socket down, too. -1 means something happened besides data being received. It could be a permanent error, a temporary error, or just a notification from the stack that something happened besides data being received. Your code lumps all such possibilities together, and on top of that treats them the same as when the remote peer closes the connection.
The second level is that not all reasons for getting -1 from recv are "errors" in the sense that the socket is no longer useful. I think if you start checking for -1 and then calling WSAGetLastError to find out why you got -1, you'll get WSAEWOULDBLOCK, which is normal since you have a non-blocking socket. It means the recv call cannot return data because it would have to block your program's execution thread to do so, and you told Winsock you wanted non-blocking calls.
A naive fix is to not break out of the loop on WSAEWOULDBLOCK but that just means you burn CPU time calling recv again and again until it returns data. That goes against the whole point of non-blocking sockets, which is that they let your program do other things while the network is busy. You're supposed to use functions like select, WSAAsyncSelect or WSAEventSelect to be notified when a call to the API function is likely to succeed again. Until then, you don't call it.
You might want to visit The Winsock Programmer's FAQ. (Disclaimer: I'm its maintainer.)
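The same principle, sketched in Go (the illustration language used in this thread) since its sockets block by default without burning CPU: forward data to the browser as it arrives, and treat end-of-stream separately from real errors instead of lumping them together. Assumes the net and io packages; the function name is illustrative.

// relay copies data from the upstream server to the client as it arrives,
// distinguishing a clean end of stream from a real error.
func relay(dst net.Conn, src net.Conn) error {
    buf := make([]byte, 1024)
    for {
        n, err := src.Read(buf) // blocks until data, end of stream, or error
        if n > 0 {
            if _, werr := dst.Write(buf[:n]); werr != nil {
                return werr // could not forward the data
            }
        }
        if err == io.EOF {
            return nil // the peer closed its sending half: normal termination
        }
        if err != nil {
            return err // a real error, not just "no data yet"
        }
    }
}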
Have you analyzed the transaction at the HTTP level, i.e. checked the headers?
Are you accounting for things like chunked transfers?
I do not have a definite answer, in part because of the lack of details given here.
