The product I work on uses Boost ASIO (TCP) for network communication. During a test I noticed something very strange: ASIO would send at ~60MB/s for 13 seconds, then drop to ~300K/s for 9 seconds, then go back to ~60MB/s for 13 seconds; this pattern repeated for the entire transfer. I wrote a test application to see if I could reproduce this and, to my surprise, I could.
In my test app, after the client and server connect, the code is very basic. Of note, blocking and non-blocking sockets produce the same results. Here is how the server receives data in the blocking case:
while(true)
{
    boost::system::error_code ec;
    // Test code: the byte count returned and any error are deliberately ignored.
    serverSocket.read_some(boost::asio::buffer(serverBuffer, bufferLength), ec);
}
The client does this:
while(true)
{
    boost::asio::write(clientSocket, boost::asio::buffer(clientBuffer, bufferLength));
}
If bufferLength is 4K, I see the problem, but if I push it up to 32K the transfer speed is fast (>120MB/s) and consistent.
Can anyone shed some light on what may be happening? This is a Windows application running on Windows Server 2008.
Edit: I did a Wireshark capture and there is sometimes a delay of >150ms before the server sends an ACK, i.e. the client sends the last bit of data [PSH,ACK] and the server responds ~150ms later with an [ACK]. No other traffic in between.
2nd Edit: I wrote a C# app that exhibits the same behavior. The client sends 4K packets with NetworkStream.Write, and the server reads them with NetworkStream.Read. Any suggestions on where to go from here?
I don't know exactly what is going on in your program, but the 150ms gap you see in the Wireshark capture is abnormal. I would check whether the network cards are working in the same mode: full or half duplex, 100M or 1000M. Then check the error packet counter; it should be 0 or a very small number. If it is large and increases along with your client/server communication, you should change your cable.
I hope this helps. :)
Did you try to disable Nagle's algorithm (tcp::no_delay)?
Also try changing the buffer sizes:
socket->set_option(boost::asio::ip::tcp::no_delay(true));
socket->set_option(boost::asio::socket_base::send_buffer_size(65536));
socket->set_option(boost::asio::socket_base::receive_buffer_size(65536));
Related
I am designing software that controls several serial ports; the operating system is OpenWrt. The device the application runs on is a single-core ARM9 @ 450 MHz. The protocol used on the serial ports is Modbus.
The problem is with the Modbus slave implementation. I designed it in a real-time manner, looping and reading data from the serial port (the port is open in non-blocking mode with 0 characters to wait for and no timeouts). The sequence/data-stream timeout is about 4 milliseconds @ 9600/8/N/1 (3.5 characters, as advised). The timeout is checked when the application sees nothing in the buffer, so if the application is slower than the incoming stream of characters, the timeout mechanism will not kick in until all characters are removed from the buffer and the bus is quiet.
But I see that the CPU switches between threads and this thread is not scheduled for about 40-60 milliseconds, which is far too long for measuring timeouts. While I guess the serial port buffer still receives the data (how large is this buffer?), I am unable to assess how much time passed between characters, so I may treat the next message as a continuation of the previous one and miss a Modbus request targeted at my device.
Therefore, I guess, something must be redesigned (for the slave; the master is a different story). The first idea that comes to mind is to forget about timeouts and just parse the incoming data after synchronizing with the whole stream (by finding an initial timeout). However, there are several problems: I must know everything about all types of Modbus messages to parse them correctly and to find where each message ends and the next one starts, and I do not see a way to differentiate a Modbus request from a Modbus response coming from another device. If the designers of the Modbus protocol had put a special bit in the command field identifying whether a message is a request or a response... but that is not the case, and I see no reliable way to tell whether the message I am receiving is a request or a response without reading the following bytes and checking CRC16 at the would-be byte counts. That costs time while the bytes are arriving, and I may miss the window for responding to a request targeted at me.
Another idea would be to use a blocking read with the VTIME timeout setting, but this value can only be set in tenths of a second (so the minimum is 100 ms), which is too much given another +50 ms for possible CPU switching between threads; I think something like a 10 ms timeout is needed here. It is also a good question whether VTIME is hardware time or software/driver time that is itself subject to CPU thread interruptions, and how large the FIFO in the chip/driver is, i.e. how many bytes it can accumulate.
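For reference, this is roughly how that blocking variant would be set up via termios (the device path and helper name are illustrative); it also shows where the 100 ms floor comes from, since VTIME is expressed in tenths of a second:

#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

// Sketch: open the UART in blocking mode and use VMIN/VTIME as an
// inter-character timeout. VTIME is in deciseconds, so 100 ms is the
// finest granularity available this way.
int open_modbus_port(const char* dev)        // e.g. "/dev/ttyS0"
{
    int fd = open(dev, O_RDWR | O_NOCTTY);   // no O_NONBLOCK: blocking reads
    if (fd < 0)
        return -1;

    termios tio{};
    tcgetattr(fd, &tio);
    cfmakeraw(&tio);                         // raw 8-bit mode, no line editing
    cfsetispeed(&tio, B9600);
    cfsetospeed(&tio, B9600);
    tio.c_cflag |= CLOCAL | CREAD;

    tio.c_cc[VMIN]  = 255;  // read() returns after 255 bytes...
    tio.c_cc[VTIME] = 1;    // ...or after 100 ms of silence following the first byte

    if (tcsetattr(fd, TCSANOW, &tio) != 0)
    {
        close(fd);
        return -1;
    }
    return fd;
}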
Any help/idea/advice will be greatly appreciated.
Update per @sawdust's comments:
non-blocking use has turned out not to be a good approach, because I am not polling the hardware but the software, so the whole design is again subject to CPU switching and execution scheduling;
"using the built-in event-based blocking mode" would work if I were able to configure the UART timeout (VTIME) in 100 µs steps, and if I could also be sure I am not interrupted during execution - only then would I be able to measure the timing properly and restart the read in time to assess the next character's timing.
I currently have a game server with a customizable tick rate, but for this example let's say the server only ticks once per second, i.e. 1hz. I'm wondering what's the best way to handle incoming packets when the client send rate is faster than the server's, as my current setup doesn't seem to work.
I have my blocking UDP receive (with a timeout) inside my tick function, and it works; however if the client tick rate is higher than the server's, not all of the packets are received, only the one being read at that moment. So essentially the server is missing packets sent by clients. The image below demonstrates my issue.
So my question is, how is this done correctly? Is there a separate thread where packets are read constantly, queued up and then the queue is processed when the server ticks or is there a better way?
The image was taken from the video https://www.youtube.com/watch?v=KA43TocEAWs&t=7s but it demonstrates exactly what I'm explaining.
There's a bunch going on with the scenario you describe, but here's my guidance.
If you are running a server at 1hz, having a blocking socket prevent your main logic loop from running is not a great idea. There is a chance you won't be receiving messages at the rate you expect (due to packet loss, a network lag spike, or the client app closing).
A) you certainly could create another thread, continue to make blocking recv/recvfrom calls, and then enqueue them onto a thread safe data structure
B) you could also just use a non-blocking socket, and keep reading packets until it returns -1. The OS will buffer a certain (usually configurable) number of incoming messages before it starts dropping them if you aren't reading; a sketch of this approach follows below.
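A rough sketch of option B (the function name, buffer size, and queue type are mine, not from any particular engine):

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <cerrno>
#include <vector>

// Sketch: at the start of each server tick, drain everything the OS has
// buffered on a non-blocking UDP socket and queue it for the game logic.
void drain_socket(int sock, std::vector<std::vector<char>>& packets)
{
    char buf[1500];                            // roughly one MTU per datagram
    sockaddr_in from{};
    socklen_t fromLen = sizeof(from);

    for (;;)
    {
        ssize_t n = recvfrom(sock, buf, sizeof(buf), 0,
                             reinterpret_cast<sockaddr*>(&from), &fromLen);
        if (n < 0)
        {
            // -1 with EWOULDBLOCK/EAGAIN means the OS buffer is empty and we are
            // done for this tick; other errno values would be handled here too.
            break;
        }
        packets.emplace_back(buf, buf + n);    // hand the datagram to the game loop
    }
}

// The socket itself was made non-blocking once at startup, e.g. with
// fcntl(sock, F_SETFL, fcntl(sock, F_GETFL, 0) | O_NONBLOCK).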
Either way is fine; however, for individual game clients I prefer the second, simpler approach when I know the thread servicing the socket runs at a reasonable rate (5hz seems pretty low, but may be appropriate for your game). If there's a chance you are stalling the servicing thread (level loading, etc.), then go with the first approach, so that a missed periodic keepalive message isn't wrongly detected as a disconnection.
On the server side, if I'm planning on a large number of clients/data, I go to great lengths to efficiently read from sockets - using IO Completion Ports on Windows, or epoll() on Linux.
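For the Linux side, a minimal sketch of what that looks like with epoll() (registration plus one wait; error handling omitted):

#include <sys/epoll.h>
#include <unistd.h>

// Sketch: register many client sockets with epoll and wake up only when one
// of them actually has data, instead of blocking on a single recv().
void service_sockets(const int* sockets, int count)
{
    int epfd = epoll_create1(0);
    for (int i = 0; i < count; ++i)
    {
        epoll_event ev{};
        ev.events = EPOLLIN;                        // notify when readable
        ev.data.fd = sockets[i];
        epoll_ctl(epfd, EPOLL_CTL_ADD, sockets[i], &ev);
    }

    epoll_event ready[64];
    int n = epoll_wait(epfd, ready, 64, 10);        // wait up to 10 ms for activity
    for (int i = 0; i < n; ++i)
    {
        // recv()/recvfrom() on ready[i].data.fd here
    }
    close(epfd);
}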
Your server could have a thread that ticks every 5 seconds, just like the client, to receive all the packets. Anything not received during that tick would be dropped, as the server was not listening for it. You can then pass the data from that thread to the server as one chunk after 5 ticks. The more reliable option, though, is to set the server to 5hz just like the client and hand every packet that comes in from the client off to a thread, so that it does not lock up the main thread.
For example, if the client update rate is 20, and the server tick rate is 64, the client might as well be playing on a 20 tick server.
In our client/server based online game project, we use TCP for network transmission. We include Libevent and use a bufferevent for each connection to handle the network I/O automatically.
It worked well before, but a lagging problem has surfaced recently. When I do some stress testing to make the network busier, the latency becomes extremely high, several seconds or more. The server sinks into a confusing state:
the average CPU usage decreased (oscillating 0%-60%-0%-60%; waiting for something?)
the net traffic decreased (nethogs)
the clients connected to server still alive (netstat & tcpdump)
It looks like something magically slowed the whole system down, yet new connections to the server were still answered quite promptly.
When I changed the protocol to UDP, it works well in the same situation: no obvious latency, and the system runs fast. Net traffic is around 3M/S.
The project is running on an Intranet. I also tested the max download speed, nearly 18M/S.
I studied part of Libevent's header files and documentation, and tried to set up a rate limit on all connections. It made some improvement, but did not completely resolve the problem even though I tried several different configurations. Here are my parameters: read_rate 163840, read_burst 163840, write_rate 163840, write_burst 163840, tick_len 500ms.
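For context, this is roughly how those numbers are applied per connection, assuming I am reading Libevent's token-bucket API correctly (bev is the connection's bufferevent):

#include <sys/time.h>
#include <event2/bufferevent.h>

// Sketch: build one shared token-bucket config from the parameters above and
// attach it to a connection's bufferevent.
void limit_connection(bufferevent* bev)
{
    static timeval tick_len = { 0, 500000 };              // tick_len 500ms
    static ev_token_bucket_cfg* cfg = ev_token_bucket_cfg_new(
            163840, 163840,                               // read_rate, read_burst
            163840, 163840,                               // write_rate, write_burst
            &tick_len);
    bufferevent_set_rate_limit(bev, cfg);
}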
Thank you for your help!
TCP = Transmission Control Protocol. It responds to packet loss by retransmitting unacknowledged packets after a delay. In the case of repeated loss, it backs off exponentially. Take a look at this network capture of an attempt to open a connection to a host that is not responding:
It sends the initial SYN, and then after not getting an ACK for 1s it tries again. After not getting an ACK it sends another after ~2s, then ~4s, then ~8s, and so on. So you can see that you can get some serious latency in the face of repeated packet loss.
Since you said you were deliberately stressing the network, and that the CPU usage is inconsistent, one possible explanation is that TCP is waiting to retransmit lost packets.
The best way to see what is going on is to get a network capture of what is actually transmitted. If your hosts are connected to a single switch, you can "span" a port of interest to the port of another host where you can make the capture.
If your switch isn't capable of this, or if you don't have administrative control of the switch, then you will have to take the capture on one of the hosts involved in your online game. The disadvantage is that taking the capture may alter what happens, and it doesn't see what is actually on the wire. For example, you might have TCP segmentation offload enabled on your interface, in which case the capture will see large packets that are only broken up later by the network interface.
I would suggest installing Wireshark to analyse the network capture (which you can do in real time by using Wireshark to do the capture as well). Any time you are working with a networked system I would recommend using Wireshark so that you have some visibility into what is actually happening on the network. The first filter I would suggest you use is tcp.analysis.flags, which will show you packets suggestive of problems.
I would also suggest turning off the rate limiting first to try to see what is going on (rate limiting is adding another reason to not send packets, which is probably going to make it harder to diagnose what is going on). Also, 500ms might be a longish tick_len depending on how your game operates. If your burst configuration allows the rate to be used up in 100ms, you will end up waiting 400ms before you can transmit again. The IO Graph is a very helpful feature of Wireshark in this regard. It can help you see transmission rates, although the default tick interval and unit are not very helpful in this regard. Here is an example of a bursty flow being rate limited to 200mbit/s:
Note that the tick interval is 1ms and the unit is bits/tick, which makes the top of the chart 1gb/s, the speed of the interface in question.
I have been testing a program which has simple communication between two machines over a 1Gbps line. While running TCP communications over the line I occasionally receive write errors on the client side (due to a timeout) when the network is totally flooded (running at or close to 100% usage). This generally happens when I am running multiple instances of the same program going to different ports.
My question is: is it possible to get a write error but still have the message received on the server side? It appears that is what is happening, and I am not quite sure why. Could it be that the ACK coming back to the client is what is timing out?
Yes, that is possible. TCP does not guarantee that data you sent "successfully" was received, nor that data whose send failed was not received. This problem is unsolvable; it is known as the Two Generals' Problem. There is always a way to lose messages/packets such that the sender comes to the wrong conclusion. TCP guarantees that the receiver receives the same stream of bytes that the sender sent, but possibly cut off at an arbitrary point.
This unreliability also exists for performance reasons: TCP data is buffered on both hosts as well as in the network, and acknowledgements are delayed.
You have to live with this. If you make your scenario more concrete I can suggest some strategies of dealing with this.
send puts data into the TCP send buffer.
If the send buffer does not have enough space, send will block until the data is completely or partly copied into the send buffer, or until the configured timeout expires.
Read timeouts and write timeouts are normal; you should check for them and handle them, typically by restarting the read/write operation after the timeout. You should also pay attention to read/write errors other than timeouts.
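As a rough sketch of that advice on a POSIX socket (SO_SNDTIMEO, the 5-second value, and the helper name are my own choices, not from the question):

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <cerrno>
#include <cstddef>

// Sketch: give the socket a send timeout, restart the send after a timeout or
// interrupt, and treat any other error as fatal. Partial sends are resumed.
bool send_all(int sock, const char* data, size_t len)
{
    timeval tv{5, 0};                                   // 5-second send timeout
    setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv));

    size_t sent = 0;
    while (sent < len)
    {
        ssize_t n = send(sock, data + sent, len - sent, 0);
        if (n > 0)
            sent += static_cast<size_t>(n);             // (part of) the data was copied into the send buffer
        else if (errno == EWOULDBLOCK || errno == EAGAIN || errno == EINTR)
            continue;                                   // timeout or interrupt: restart the operation
        else
            return false;                               // other error: the connection is probably broken
    }
    return true;
}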
I'm writing a Comet-like app using Flex on the client and my own hand-written server.
I need to be able to send short bursts of data from the client at quite a high frequency (e.g. of the order of 10ms between sends).
I also need the server to push short bursts of data at a similarly high frequency.
I'm using NetConnection.call() to send the data to the server, and URLStream (with chunked encoding) to push the data from the server to the client.
What I've found is that the data isn't being sent/received as soon as it's available. For example, in IE, it seems the data is sent every 200ms rather than as soon as NetConnection.call() is called. Similarly, URLStream isn't making the data available as soon as the server is sending it.
Judging by the difference in behaviour between the browsers, it seems as though the Flash Player (version 10) is relying on the host browser to do all the comms. Can anyone confirm this? Update: This is very likely as only the host browser would know about the proxy settings that might be set.
I've tried using the Socket class and there's no problem with speed there: it works perfectly. However, I'd like to be able to use HTTP-based (port 80) connections so that my app can run in heavily fire-walled environments (I tried using a Socket over port 80, but that has its problems).
Incidentally, all development/testing has been done on an internal LAN, so bandwidth/latency is not an issue.
Update: The data being sent/received is in small packets and doesn't need to be in any particular format. For example, I might need to send a short array of Numbers, and this could either be encoded in AMF (e.g. via NetConnection.call()) or could be put into GET parameters (e.g. using sendToURL()). The main point of my question is really to see whether anyone else has experienced the same problem in calling NetConnection/URLStream frequently, and whether there is a workaround (it's also possible that the fault lies with my server code of course, rather than Flash).
Thanks.
Turns out the problem had nothing to do with Flash/Flex or any of the host browsers. The problem was in my server code (written in C++ on Linux), and without access to my source code the cause is hard to find (so I couldn't have hoped for an answer from this forum).
Still - thank you everyone who chipped in.
It was only after looking carefully at the output shown in Wireshark that I noticed the problem, which was twofold:
Nagle's algorithm
I was sending replies in multiple packets by calling write() multiple times (e.g. once for the HTTP response header, and again for the HTTP response body). Because of Nagle's algorithm, the server's TCP/IP stack was waiting for an ACK for the first small packet before sending the second, and the client's delayed-ACK mechanism meant it waited up to 200ms before sending back that ACK, so the server took at least 200ms to send the full HTTP response.
The solution is to use send() with the MSG_MORE flag for all but the last of the logically connected blocks, so they go out as a single packet. I could also have used writev() or setsockopt() with TCP_CORK, but using send() suited my existing code better.
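For illustration, the header/body case ends up looking roughly like this (Linux-specific; error and partial-send handling omitted, and the function name is just a placeholder):

#include <cstddef>
#include <sys/types.h>
#include <sys/socket.h>

// Sketch: flag the header with MSG_MORE so the stack holds it back, then let
// the final, unflagged send() flush header + body as one packet.
void send_http_response(int sock,
                        const char* header, size_t headerLen,
                        const char* body, size_t bodyLen)
{
    send(sock, header, headerLen, MSG_MORE);   // more data coming: don't transmit yet
    send(sock, body, bodyLen, 0);              // last block: everything goes out together
}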
Chunk-encoded streams
I'm using a never-ending HTTP response with chunked encoding to push data back to the client. Nagle's algorithm needs to be turned off here because even if each chunk is written as one packet (using MSG_MORE), the client OS's TCP/IP stack will still wait up to 200ms before sending back an ACK, and with Nagle on, the server can't push a subsequent small chunk until it gets that ACK.
The solution here is to tell the server's stack not to wait for an ACK of each sent packet before sending the next one, which is done by calling setsockopt() with the TCP_NODELAY option.
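Turning Nagle off is essentially a one-liner (sketch):

#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

// Sketch: disable Nagle's algorithm so each small chunk is transmitted
// immediately instead of waiting for the previous packet's ACK.
void disable_nagle(int sock)
{
    int one = 1;
    setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}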
The MSG_MORE/TCP_CORK part only works on Linux and isn't POSIX-compliant (I think), while TCP_NODELAY is portable; either way, that isn't a problem for me.
I'm almost 100% sure the player relies on the browser for such communications. Can't find an official page stating so atm, but check this out for example:
Applications hosting the Flash Player ActiveX control or Flash Player plug-in can use the EnforceLocalSecurity and DisableLocalSecurity API calls to control security settings.
Which I think somehow implies the idea. Also, I've suffered some network-related bugs on FF/IE only, which again points to the player using each browser for networking (otherwise there wouldn't be such differences).
And regarding your latency problem, I think that if speed is critical, your best bet is sockets. You have some work to do, but it seems possible; check out the docs again:
This error occurs in SWF content. Dispatched if a call to Socket.connect() attempts to connect either to a server outside the caller's security sandbox or to a port lower than 1024. You can work around either problem by using a cross-domain policy file on the server.
HTH,
Juan