I'm having an issue with some embedded mobile devices that have a buggy TCP stack. We're trying to update these devices but the firmware download fails, unless the mobile connection is very very good. Since it's an EDGE connection, it's usually bad.
Part of the problem is that the devices need quite a bit of time to write the data to storage. This is probably what leads to packet loss, but the connection never recovers.
I'm thinking that if I could control the connection at TCP level, I might be able to get around this problem. We tried changing the congestion control and it doesn't help, but we're still looking into that.
In the meantime I'd like to look into this option. Is there any way to do it, without writing my own TCP stack / kernel module?
I didn't find any way to do this so in the end I set up a new server and recompiled the linux kernel with a modified TCP_RTO_MAX value (5s instead of 120s). This seems to have solved my problem. My guess is that the network was not the actual issue but rather devices taking too long to store the data. This is a very specific case and this solution wouldn't help in any situation where the network connection is actually slow.
Related
I want multiple IoT devices (Say 50) communicating to a server directly asynchronously via TCP. Assume all of them have a heartbeat pulse every 30 seconds and may drop off and reconnect at variable times.
Can anyone advice me the best way to make sure no data is dropped or blocked when multiple devices are communicating simultaneously?
TCP by itself ensures no data loss during the communication between a client and a server. It does that by the use of sequence numbers and ACK messages.
Technically, before the actual data transfer happens, a TCP connection is created between the client (which can be an IoT device, or any other device) and the server. Then, the data is split into multiple packets and sent over the network through that connection. All TCP-related mechanisms like flow-control, error-detection, congestion-detection, and many others, take place once the data starts to flow.
The wiki page for TCP is a pretty good start if you want to learn more about how it works.
Apart from that, as long as your server has enough capacity to support the flow of requests coming from the devices, then everything should work (at least in theory).
I don't think you are asking the right question. There is no way to make sure that no data is dropped or blocked. Networks do not always work (that is why the word work is in network, to convince you otherwise ).
The right question is: how do I make my distributed system as available and reliable as possible? The answer involves viewing interruption and congestion as part of the normal operation, and build your software appropriately.
There is a timeless usenix/acm/? paper from the late 70s early 80s that invigorated the notion that end-to-end protocols are much more effective then over-featured middle to middle protocols; and most guarantees of middle to middle amount to best effort. If you rely upon those guarantees, you are bound to fail. Sorry, cannot find the reference right now, but it is widely cited.
At the moment I have a problem I cannot pin down. Seemingly at random my communication with my RS232 Alicat Device will get held up. It will get held up somewhere in the read or write process and be unable to complete it. Upon closing the VI I will get a "Resetting VI" error in Labview 2020. I am using 7 of the 9 RS232 ports. My question is:
How do I fix this problem so that I do not get a communication drop OR (more likely)
How do I code the system such that I can catch and move through this problem or reset the connection. Something of a VISA read/write timeout? Open to ideas on how to move past the block
Here is what I have gathered about the problem:
Windows 10, I’ve tested everything on multiple computers. It happens no matter what.
It happens at random. It might happen twice within 20 minutes or not for a couple of hours.
I have never experienced the error when probing the line. I don’t know if that is a clue, or if that speaks to the randomness of the problem
Baud Rate = 9600, Prior to this I was running at 19,200 and experienced equivalent issues. The manufacturer recommended lowering the baud rate to reduce noise. I have also isolated the cable from other parts of the hardware. At this point noise on the connection is not an issue, but I am still experiencing the error.
My buffer size is 1000 bytes.
By termination character is \r. I cannot imagine a scenario where it fails to read a termination character due to the size of my buffer
I'm querying it every 50ms. Far below the threshold of a standard timeout. Too much?
What I am currently testing.
Due to how my code block is setup I cannot yet confirm if it is getting locked up on the read or write block or both. I'm attempting to isolate the problem with only minor modifications to see if I can isolate it.
Attached is slimmed down version of my code that I isolated the error to.
I have experienced similar problems with some RS232 devices from different suppliers. The (quite bad) solution was to connect and disconnect for each communication command. The question would be what sample rate you need.
Another idea is to replace that device with an ethernet device. If I am not mistaken Alicat supplies those with Modbus (TCP).
The issue turned out to be specific to windows/my laptop. There is a USB setting that disables inactive USB's after a certain amount of time. The setting to disable the timeout was unavailable through the control panel on my laptop, though it was available on my coworkers. I had to use powershell commands to change the setting
I am writing an application where the client side will be uploading data to the server through a wireless link.
The connection should be very reliable.The link is expected to break many times and there will be many clients connected to the server.
I am confused whether to use TCP or reliable UDP.
Please share your thoughts.
Thanks.
RUDP is not, of course, a formal standard, and there's no telling if you will find existing implementations you can use. Given a choice between rolling this from scratch and just re-making TCP connections, I'd chose TCP.
To be safe, I would go with TCP just because it's a reliable, standard protocol. RUDP has the disadvantage of not being an established standard (although it's been mentioned in several IETF discussions).
Good luck with your project!
It's likely that both your TCP and RUDP links would be broken by your environment, so the fact that you're using RUDP is unlikely to help there; there will likely be times when no datagrams can get through...
What you actually need to make sure of is that a) you can handle the number of connected clients, b) your application protocol can detect reasonably quickly when you've lost connectivity with a client (or server) and c) you can handle the required reconnection and maintenance of cross connection session state for clients.
As long as you deal with b) and c) it doesn't really matter if the connection keeps being broken. Make sure you design your application protocol so that you can get things done in short batches; so if you're uploading files, make sure that you're sending small blocks and that the application protocol can resume a transfer that was broken half way through; you don't want to get 99% of the way through a 2gb transfer and lose the connection and have to start again.
For this to work your server needs some kind of client session state cache where you can keep the logical state of a client's connection beyond the life of the connection itself. Design from the start to expect a given session to include multiple separate connections. The session state should possibly have some kind of timeout so if the client goes away for along time it doesn't continue to consume resources on the server but, to be honest, it may simply be a case of saving the state off to disk after a while.
In summary, I don't think the choice of transport matters and I'd go with TCP at least to start with. What will really matter is being able to manage your client's session state on the server and deal with the fact that clients will connect and disconnect regularly.
If you aren't sure, odds are that you should use TCP. For one thing, it's certain to be part of the network stack for anything supporting IP. "Reliable UDP" is rarely supported out of the box, so you'll have some extra support work for your clients.
I am doing a measurement project where I send and receive data from numerous devices on my network. The send/receive can be considered fast and intensive, as there is almost no pause and a continuous flow of data. However, the data to/from each device is quite small, on the order of a couple of bytes each. For some reason, I am experiencing a reset of my entire ethernet connection where my internet connection also goes down, and I lose connection to all my devices as well. I have never experienced such a situation and am wondering what are some of the common situations that might lead to resets like this?
Actually, it turns out this had to do with the way I constructed my thread, in which I would create a new socket continuously without discarding the previous one. Stupid right? Well I fixed the code and the ethernet no longer crashes.
A bit of history: We have an application, which was originally written many years ago (1998 is the first date in PVCS but the app is about 5 years older than that as it originally was a DOS program). This application communicates with a piece of hardware via serial. When we got to Windows XP we started receiving reports of the app dying after a short time of running. It seems that the serial comms just 'died' and the app was left in a stuck state. The only way to recover from this situation was to restart the application.
The only information I can find regarding this problem was apparently the Windows Message system would miss that information was received, the buffer would fill and the system would get stuck. This snippet of information was left in a old word document, but there's no evidence to back this up. It also mentions that this is only prevalent at high baud rates (115200+).
The solution was to provide customers with USB->Serial converters along with the hardware.
Today: We are working on a new version of the hardware that will run across a network as well as serial ports. So to allow me to work on the network code, minus the actual hardware we are using a VSCOM NetCom113 device. It also installs a virtual comm port on the users (ie: mine) machine.
Now I have got the network code integrated with the app, it appears that the NetCom device exhibits the same behaviour as a physical commport. This is undesirable as I need the app to run longer than ~30 seconds.
Google turns up zero problems that we experience.
I was wondering:
Has anyone experienced this before? If so what did you do to fix/workaround the problem?
Does anyone have any suggestions as to whether the original author of the document is correct and what I can do to test the theory?
Unfortunately I can't post code as the serial code is tightly couple with the rest of the system, though if you have questions regarding it I can answer questions about it.
Updates:
The code is written using Win32 Comm routines - so I am using CreateFile, ReadFile. There's also judicious calls to GetOverlappedResult.
It's not hanging per se, it's just that the comms stops. You can access the menus, click the buttons, but nothing can interact with the connected hardware. Using realterm you can see that no data is coming in or going out.
I think the reference to the windows message is that the problem is internal to windows. Data has arrived but the kernal has missed it and thus not told the rest of the system about it.
Flow control is not used.
Writing a 'simple' test is difficult due the the fact that the code is tightly coupled and the underlying protocol is quite complex and would require a lot of work.
Are you using DOS-style serial code, or the Win32 CreateFile approach?
If the former, be very suspicious: if at all possible I'd convert to the latter.
If the latter, do you know on what kind of system call it's hanging? Are you in a blocking read call? or an overlapped I/O call? or waiting on an event? (I'm not sure I have enough experience to help, but those are the kinds of questions that come to mind)
You might also check into the queue size, which you can set with the SetupComm function.
I don't buy the "Windows Message system" stuff -- it sounds fishy; you can write good Win32 serial i/o code that never uses Windows messages.
edit: does your Overlapped I/O use events? I seem to remember something about auto-reset events occasionally missing their trigger... check your overlapped I/O calls very carefully to see whether you're handling the possible outcomes properly. Perhaps there's a way to make your code more robust by automatically cancelling the overlapped i/o and restarting another read. (I assume the problem is in the read half, not the write half?)
edit 2: A suggestion: assuming the win32 side has missed a byte or packet, and your devices are in deadlock because they're both expecting each other to respond to something, can you tweak the other side of the serial I/O to regularly send some type of "ping" packet with an incrementing counter? (and log the ping packets on the PC side; that way you can see whether you've missed any)
Are you sure you have your flow control set up correctly? DTR, RTS, etc...
-Adam
i have written apps that use usb / bluetooth serial ports and have never had an issue. with bluetooth i have seen bit rates (sustained) of 800,000 bps for long periods of time. most people don't properly implement the port.
My serial port
Not sure if this is a possibility for you, but if you could re-write the code using C#.NET you'd have access to the SerialPort class there. It might remedy your problem. I know a lot of legacy code based around the Win32 API for hardware I/O ports tended to fail in XP due to timing (had a small bit of experience with MIDI).
In addition, I don't know if you can use the Win32 method of Serial Port access in Vista, so that might shut out future MS OSes from being able to use your code.