I have a TCP/IP socket interface to a third-party software app. I've implemented this interface for several customer sites with no problem. The latest customer, though... problems. We've turned on logging in the apps on either end and also installed Wireshark on the PC to log raw TCP/IP traffic. With that, we've proved that my server app successfully sends the message out and the PC receives it, but the client app doesn't see it. (This is a totally intermittent problem, which is why it's such a pain to troubleshoot.)
The socket details are as simple as they come: one socket handling two-way communication between the server and the PC. The messages are plain ASCII text and fairly short (not XML). The server initiates communication by sending the first message, and then the client responds with several messages. The socket is kept open at all times while the apps are running. The client app is designed so that the end user can only process one case at a time, which prevents message collisions. They have some sort of polling set up: their app "hibernates" until it sees the initiating message from the server.
The third-party vendor has advised me to add a delay of a few seconds before I send them the initiating message. I can't see how that helps. If the client is "sleeping", just polling the socket waiting for a message, how does adding a delay before the first message help? It's not like we send two messages and the second one gets lost; it's losing the first message. So I don't see how it matters whether we send that message now or two seconds from now.
I've asked them and they haven't given me details. It could be some proprietary detail in their code that they don't want to disclose to me, and that's fair. So I'm asking here because I'm always learning new things about socket programming. Maybe you guys can shed some light on how polling a TCP/IP socket can be affected by message timing?
Since it's someone else's client and they won't tell you what it's doing (other than saying "insert a delay"), the answer is probably that their client is reading and discarding the message because it's not yet in a state to deal with it. The delay gives the client time to get into a state where it can respond to the message properly.
In other words, the client has a race condition. One easy way this can happen is if they have one thread for reading messages and another for dealing with them.
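A minimal sketch of how that race can look, in Java purely for illustration (the real client is a closed third-party app, so every name here is hypothetical):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

// Hypothetical reconstruction of the race: the reader thread starts pulling
// lines off the socket as soon as the connection is up, but the handler is
// only registered after some slow initialization. Anything read before that
// point is silently discarded - exactly the symptom described above.
public class RacyClient {
    private volatile Consumer<String> handler;   // null until the app is "ready"

    public void start(Socket socket) throws IOException {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));

        Thread reader = new Thread(() -> {
            try {
                String line;
                while ((line = in.readLine()) != null) {
                    Consumer<String> h = handler;
                    if (h != null) {
                        h.accept(line);          // normal path
                    }
                    // else: message read and dropped - the race window
                }
            } catch (IOException ignored) { }
        });
        reader.start();

        slowInitialization();                    // e.g. loading case data, UI setup
        handler = msg -> System.out.println("processing " + msg);
    }

    private void slowInitialization() {
        try { Thread.sleep(2000); } catch (InterruptedException ignored) { }
    }
}

With a structure like that, a couple of seconds' delay before the server sends the first message gives the slow initialization time to finish, which would explain the vendor's advice.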
Short of running strace(1) on the client to see what system calls it is making, it's tough to tell what the client is actually doing.
I have an embedded device running BT5 with a GATT server set up. On the server I have set up a service with various characteristics that allow a client (PC or mobile device) to adjust various parameters of the device by writing to the characteristics.
I would like the device to send a response back from the application level for each write. It's not clear to me what the recommended way to do this would be.
I thought about having the client read or subscribe to a general status characteristic, but I want to make sure I am not missing an easier way to do this. I looked at the BT write with response command, but it seems the acknowledgement for that may happen at a lower level than the application.
You should be able to use the Write Response as an application-level response. I have not seen any Bluetooth stack where this response is sent at a lower level before the application has processed the request. The reason is probably that the application can even send an Application Error code instead of a Write Response, so it would be stupid to move the Write Response handling to a lower level. Even in Android (if you set up a GATT server), you send the Write Response from the application.
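As a concrete illustration, on Android the Write Response is sent explicitly by the application from the GATT server callback. A sketch (the class and applyParameter are made up; the callback signature and sendResponse are the actual Android API):

import android.bluetooth.BluetoothDevice;
import android.bluetooth.BluetoothGatt;
import android.bluetooth.BluetoothGattCharacteristic;
import android.bluetooth.BluetoothGattServer;
import android.bluetooth.BluetoothGattServerCallback;

class ParameterWriteCallback extends BluetoothGattServerCallback {
    // Set this after BluetoothManager.openGattServer(context, callback) returns.
    BluetoothGattServer gattServer;

    @Override
    public void onCharacteristicWriteRequest(BluetoothDevice device, int requestId,
            BluetoothGattCharacteristic characteristic, boolean preparedWrite,
            boolean responseNeeded, int offset, byte[] value) {

        // Application-level processing happens first...
        boolean accepted = applyParameter(characteristic, value);

        // ...and only then does the application send the Write Response
        // (or an error status) back over the air.
        if (responseNeeded) {
            gattServer.sendResponse(device, requestId,
                    accepted ? BluetoothGatt.GATT_SUCCESS : BluetoothGatt.GATT_FAILURE,
                    offset, value);
        }
    }

    private boolean applyParameter(BluetoothGattCharacteristic c, byte[] value) {
        return value != null && value.length > 0;   // placeholder validation
    }
}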
The situation is different with Indications, though, where the Bluetooth stack sometimes sends the Confirmation at a lower level than the application, before it even informs the application that an Indication has arrived, which I find a bit strange and makes Indications kind of pointless compared to Notifications.
I solved this using a notification characteristic. The client first subscribes to notifications on that characteristic (via its CCCD), and then every command sent to the host/device is acknowledged by the host firing a notification. To better synchronize command and response, you could add an incremental command-id to every command, and have the command-id be part of the notification data that is sent back to the client.
However, I implemented this because I needed a response after the device had processed the command, with the results sent back to the client. If all you want to know is whether or not the host has received the command, Write With Response is the way to go.
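A sketch of that acknowledge-by-notification pattern, using the Android GATT server API purely for illustration (the host in the question is embedded firmware, and all names here are hypothetical):

import android.bluetooth.BluetoothDevice;
import android.bluetooth.BluetoothGattCharacteristic;
import android.bluetooth.BluetoothGattServer;

// After the host has processed a command, it fires a notification on a status
// characteristic the client has subscribed to. Echoing the command-id lets the
// client match the acknowledgement to the command it sent.
class CommandAcknowledger {
    private final BluetoothGattServer gattServer;
    private final BluetoothGattCharacteristic statusCharacteristic;

    CommandAcknowledger(BluetoothGattServer server, BluetoothGattCharacteristic status) {
        this.gattServer = server;
        this.statusCharacteristic = status;
    }

    void acknowledge(BluetoothDevice client, byte commandId, byte resultCode) {
        // First byte: command id, second byte: result of processing the command.
        statusCharacteristic.setValue(new byte[] { commandId, resultCode });
        gattServer.notifyCharacteristicChanged(client, statusCharacteristic, false /* notify, don't indicate */);
    }
}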
I looked at the BT write with response command, but it seems the acknowledgement for that may happen at a lower level than the application.
Indeed, the Write With Response handling is almost always implemented in the BLE stack, not at the application level. However, I don't see why this would be a problem; your BLE stack should report errors in some form when a Write With Response fails. If it's a blocking call, it might even return a success value.
I'm using SignalR with Redis as a message bus on a server that sits behind an Nginx proxy for load balancing. I used SignalR's PersistentConnection class to write a simple chat program that broadcasts messages to users belonging to the same certain group. Users are added to a group in OnConnectedAsync, removed in OnDisconnectAsync, and the user-to-group mapping is deterministic.
Currently, the client side falls back to long polling for whatever reason (I'm not entirely sure why). Whenever the client sets up a new connection after waiting for and receiving a response, the server will sometimes, seemingly at random, respond to the new connection immediately with the previous response, despite there having been only one POST.
The message IDs tend to differ by exactly one (the smaller ID coming first), with the rest of the response remaining the same. I logged some debug info and am quite positive that my override of OnReceivedAsync is sending one response per request. I tried the same implementation without the Redis message bus and got the same problem. Running locally (with long polling), however, yielded good results, so I suspect the problem might lie in the way the message bus buffers messages to refresh clients that might not be caught up, combined with some odd timing in the tearing down and setting up of connections through the Nginx load balancer - but beyond that, I am very much at a loss.
Any help would be appreciated.
EDIT: Further investigation reveals that duplication occurs at somewhat regular intervals of approximately 20-30 seconds. I'm led to believe that the message expiration in the message bus might have something to do with the bug.
EDIT: Bug can be seen here: http://tinyurl.com/9q5t3va
The server is simply broadcasting a counter being sent by the client. You will notice some responses are duplicated every 20 or so seconds.
Reducing the number of worker processes in the IIS (6.0) Server Manager from 2 to 1 solved the problem.
Say my network connection drops for a few seconds and I miss some SignalR server-pushed messages.
When I regain network connectivity, are the messages I missed lost, or does SignalR handle them and push them out when I reconnect?
If it can't handle missed messages, then what is the recommended approach for ensuring consistency?
Periodically (every 2-3 minutes) poll to check the server data?
Somehow detect loss of network on the client side and make an AJAX call to get the data once the network is restored?
Something else?
Here are a couple of thoughts:
If you aren't sending a lot of messages per second, consider sending no data in the messages themselves. Instead, the message is just a "ping" to the clients telling them to go get the server data when they can. Combine that with a periodic poll, as you said, and you can be assured that you won't miss messages. They just might be delayed.
If you are sending a lot of messages quickly, how about adding a sequential ID to each one? Think of a SQL Identity column. Your clients would need to keep track of the most recent ID received. After a network reconnect, the client could ask for all messages since [Last ID]. If a message is received whose ID is not contiguous with the most recently received, you know that there was a disconnect and can ask the server for the missing information.
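A sketch of the client-side bookkeeping for the second idea, deliberately not tied to the SignalR API (Message, fetchMissed and the class are hypothetical):

import java.util.List;
import java.util.function.BiFunction;

// Tracks the most recent sequential ID and backfills any gap. fetchMissed is a
// hypothetical callback that asks the server for all messages in a given ID
// range, e.g. via a plain HTTP request after a reconnect.
class GapDetectingReceiver {
    private long lastSeenId = 0;
    private final BiFunction<Long, Long, List<Message>> fetchMissed;

    GapDetectingReceiver(BiFunction<Long, Long, List<Message>> fetchMissed) {
        this.fetchMissed = fetchMissed;
    }

    void onMessage(Message m) {
        if (m.id > lastSeenId + 1) {
            // Gap detected: we were disconnected or a push was lost, so ask
            // the server for everything between lastSeenId and m.id.
            for (Message missed : fetchMissed.apply(lastSeenId, m.id)) {
                handle(missed);
            }
        }
        handle(m);
    }

    private void handle(Message m) {
        if (m.id > lastSeenId) {
            lastSeenId = m.id;
            // ...apply the message to the client's state...
        }
    }

    static class Message {
        final long id;
        final String payload;
        Message(long id, String payload) { this.id = id; this.payload = payload; }
    }
}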
I have an FMS 3 server running a video chat room application. It runs well, except that every day it will die once or twice. After restarting the FMS server, everything works again.
I really need to know why the FMS server can die like this.
I checked its log and saw many of these:
"Server rejected an invalid flow."
Any hint will be most welcome.
This error can be caused by attempting to make a P2P connection to the server's peer ID. Connections to the server need to use a normal NetConnection rather than a P2P NetStream addressed to the server's peer ID; see:
http://forums.adobe.com/thread/845685
I believe the problem is that you are attempting to make a P2P connection to the server's peer ID; that is, something like
// netConnection.farID is the server's own peer ID, so this opens a P2P flow to the server itself
var ns:NetStream = new NetStream(netConnection, netConnection.farID);
ns.play(...);
Under the covers, this will open a new RTMFP flow to the server that appears to the server as a new incoming client, but the initial handshake will be incorrect (the first/only command message is "play" instead of "connect"). I see this on Cirrus all the time.
It's possible that FMS doesn't account properly for these rejected flows (leaving the connection count higher than it should be), or perhaps it leaves the flow open waiting for a "connect" message that will never come, so the connection count is legitimately higher than you think it is.
In any case, make sure you're not opening a P2P stream to the server's peer ID.
However, this error may not actually be related to the crashes. Also, are you sure FMS itself is crashing and not just your application? If it's just your application, review your application logs (instead of the core FMS logs), and if they don't contain anything useful, add more logging to your application.
We currently experience a problem with a self-written server application running on Windows (occurs on different versions). The server listens at a TCP port, accepts connections, exchanges some data and then closes the connections again. There are about 100 clients that connect from time to time.
Sometimes the server stops working: log files show that connections are still accepted, but that a socket error (10054 - Connection reset by peer) occurs at the first read attempt. I don't think it is a client issue, because it suddenly stops working for all clients.
Now we have found out that the same problem occurs with our old server software, which is even written in another programming language. So it doesn't seem to be an error in our program - I think it has to be some kind of OS/firewall issue? Of course, firewalls have been deactivated, but that didn't solve the issue.
Any ideas where to look? Wireshark logs will follow soon.
Excerpt from the log (Timestamp, Thread Id, message)
11:37:56.137 T#3960 Connection from 10.21.13.3
11:37:56.138 T#3960 Client Exception: Socket Error # 10054
Connection reset by peer.
11:37:56.138 T#3960 ClientDisconnected
11:38:00.294 T#4144 Connection from 10.21.13.3
You can see that the exception occurs almost at the same time as the connection is accepted, in this case the client reconnects after a few seconds.
A "stateful" firewall or NAT keeps track of connections, and ought to send RSTs for connectiosn it doesn't know about. If the firewall loses track of connections for some reason, then you'll probably see random connections being reset.
Our router at work does this — it forgets about connections when the PPP connection dies, which is remarkably unhelpful when it rains and the DSL restart takes a bit too long. However, instead of resetting connections, it just drops packets (even more unhelpful!).
Sounds like a firewall or routing issue - maybe stale connections get disconnected after a timeout period. Are you using a ping/keepalive inside your protocol?
Otherwise you could use Wireshark to see what is going on.
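If the protocol doesn't have one, an application-level keepalive can be as simple as a timer that periodically writes a small ping line the other side ignores. A sketch in Java (the real server is written in another language, so this only shows the shape of it; the interval and "PING" text are placeholders). TCP-level keepalive (SO_KEEPALIVE / setKeepAlive(true)) is an alternative, but its default interval is typically hours, which is too slow to keep firewall state alive.

import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sends a one-line "PING" every 30 seconds so that stateful firewalls and NAT
// devices between client and server never see the connection as idle.
class KeepAlive {
    static ScheduledExecutorService start(Socket socket) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(() -> {
            try {
                OutputStream out = socket.getOutputStream();
                out.write("PING\r\n".getBytes(StandardCharsets.US_ASCII));
                out.flush();
            } catch (IOException e) {
                timer.shutdown();   // connection is gone; stop pinging
            }
        }, 30, 30, TimeUnit.SECONDS);
        return timer;
    }
}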
First, thanks for the many hints - I'm afraid the problem was a completely different one, which you couldn't possibly have solved by reading my question.
The server application uses log4net, configured with a log file and ImmediateFlush = true. When every log statement is written directly to the file while multiple socket connections are active, this slows down the whole application.
The server needed about a minute to actually accept a connection, which was far more than the timeout on the client side. So the log showed only "accepted" followed by "disconnected" - even the log itself was delayed!
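For reference, this is roughly the kind of log4net configuration change that avoids the per-statement disk flush - either turning off immediateFlush on the file appender or putting a BufferingForwardingAppender in front of it (a sketch; the appender names and file path are placeholders):

<log4net>
  <!-- Buffer log events in memory and write them in batches instead of
       flushing the file for every single statement. -->
  <appender name="BufferingForwarder" type="log4net.Appender.BufferingForwardingAppender">
    <bufferSize value="256" />
    <lossy value="false" />
    <appender-ref ref="LogFile" />
  </appender>
  <appender name="LogFile" type="log4net.Appender.FileAppender">
    <file value="server.log" />
    <immediateFlush value="false" />
    <layout type="log4net.Layout.PatternLayout">
      <conversionPattern value="%date %thread %message%newline" />
    </layout>
  </appender>
  <root>
    <level value="INFO" />
    <appender-ref ref="BufferingForwarder" />
  </root>
</log4net>

The trade-off is that a crash can lose the most recently buffered entries, so bufferSize and lossy have to be chosen with that in mind.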
Sorry for the inconvenience...
Have you tried changing the backlog, and then seeing how much time passes or how many clients are served before this problem occurs?
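(For illustration: in Java the backlog is the second argument to the ServerSocket constructor; with raw Winsock or POSIX sockets it is the second argument to listen(). The values below are placeholders.)

import java.io.IOException;
import java.net.ServerSocket;

class BacklogExample {
    // The backlog is the queue of connections that have completed the TCP
    // handshake but have not yet been accept()ed by the application. If the
    // application is slow to call accept(), a larger backlog buys time before
    // new connection attempts are refused or reset.
    static ServerSocket listen() throws IOException {
        return new ServerSocket(12345, 200);   // port and backlog are placeholders
    }
}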
You don't say what Windows versions you're using for the server, but you should be aware that the Windows TCP/IP stack behaves differently in server and client OSes. There are limits on how many simultaneous incoming connections a client OS will allow, and they are significantly lower than you might expect.
What do the logs look like from the client side?
Since the error states that the client is dropping the connection: if you see the same error on the client side, then it is a firewall or proxy that is dropping the connection (both sides seeing the opposite side drop the connection is indicative of a proxy/firewall).
If the error is not present on the client side, then I would say the client side is where you will find the actual error.