FMS server died daily while enable rtmfp - apache-flex

I have a fms 3 server runing a video chat room application. It goes well except everyday it will die once or twice. After restarting the fms server, everything goes working again.
I really need to know the reason why fms server can die.
I checked its log, i saw many
"Server rejected an invalid flow."
Any hint will be greatest welcome.

This error can be caused by making an attempt to make a P2P connection to the server's peer ID. Connections to the server need to use
http://forums.adobe.com/thread/845685
i believe the problem is that you are attempting to make a P2P connection to the server's peer ID; that is, something like
var ns:NetStream = new NetStream(netConnection, netConnection.farID);
ns.play(...);
under the covers, this will open a new RTMFP flow to the server that will appear to the server as a new incoming client, but the initial handshake will be incorrect (the first/only command message is "play" instead of "connect"). i see this on Cirrus all the time.
it's possible that FMS doesn't account properly when rejecting these flows (leaving the connection count higher than it should be), or perhaps it leaves the flow open waiting for a "connect" message that will never come, so the connection count is legitimately higher than you think it is.
in any case, make sure you're not opening a P2P stream to the server's peer ID.
However, this error may not actually be related to the crashes. Additionally, are you even sure FMS is crashing and not just your application? If it's just your application, review your application logs (instead of the core FMS logs) and if you don't have anything useful add more logging to your application.

Related

What are the general rules for getting the 104 "Connection reset by peer" error?

Are there any general rules on when a website sends out a TCP reset, triggering the Connection reset by peer error?
Like
too many open connections
too high bandwidth use
connected for too long
…?
I'm pretty certain that there is no law governing this and that different websites/web developers have different tastes, but I would be interested if there are some general rule sets (from websites or textbooks on the subject or what you have been taught in school/at work) that are mostly followed.
Reason why I'm asking, of course, is that I want to get around being blocked…
I'm downloading some government data that is freely available, but is lacking an API or something, so the two official ways to get it are either clicking around in some web-GIS a few thousand times or going along the Kafkaesque path of explaining various levels of clerks the concepts of databases, csv files, zip files and that you can't (and won't need to, if they'd just did what you try to explain them) just drive to their agency with a "giant" harddrive, so I'm trying to just go the most resource saving way for everyone involved…
A website is not "sending" a "Connection reset by peer" error. This error is generated by the OS kernel on the client site if it gets a TCP reset for an active connection. There are many reasons this TCP reset might be sent. A TCP reset might be sent by design from some kind of load limit, for example to limit the number of connections from the same IP address within a specific time as a form of DOS protection, to restrict data scraping or to enforce some kind of fair use. There is no general rule or even law for this kind of explicit limits.
A TCP reset might also be caused by the application being overloaded, application crashing, system running out of resources ... .
And a TCP reset will happen if the client writes to a connection which the server already considers as closed. This can happen for example with HTTP keep alive: the server might close the connection on inactivity at any time after the HTTP response was sent. If the client sends a new request on the same connection at the same time the server closes the connection, the server will reject the new request (since the connection is closed on the server end) and will send a TCP RST, causing a connection reset by peer at the client. The client needs to properly handle this situation by creating a new connection and sending the request again (provided that the request was not state changing, i.e. is idempotent).

lost server replies/errors with netty's object decoder

I have a very simple netty app which serves both as server and a client.
Client uses channel.writeAndFlush() to send request to server and then blocks on monitor.wait().
In client's InboundAdapter in channelRead() I find the appropriate monitor and do monitor.notify() to let the requesting client thread to proceed working on the server's reply.
On the server in ChannelHandler's channelRead() I do the following:
To limit the amount of requests being processed I submit a task which does the real work as a new task to existing EventLoop: ctx.executor().submit(new Task()). I that task I do heavy IO operations and after that I writeAndFlush() results back to client.
Here is my pipeline setup:
new ObjectEncoder(),
new ObjectDecoder(LibConstants.Search.MAX_REQUIEST_SIZE, ClassResolvers.cacheDisabled(null))
Here is the bootstrap config:
new ServerBootstrap()
.channel(NioServerSocketChannel.class)
.option(ChannelOption.SO_BACKLOG, 1000)
.option(ChannelOption.SO_KEEPALIVE, true)
I have 2 problems:
Rather often I get io.netty.handler.codec.DecoderException: java.io.UTFDataFormatException on the client when receiving a reply from server. I cannot find any obvious reason for this. Since my pipeline setup is so simple.
A reply from the server just wouldn't appear on the client. I the logs I see a successful flush on the server but the reply never arrived at the client. This is very hard to deal with since my app is very latency sensitive. Any timeout I would set will kill my user experience.
This all happens over a VPN network so there is a possibility that VPN device misbehaves in some weird way but I hoping that TCP would handle any sort of packet loss/corruption which can happen in the channel.
Any advice or experience you can share will be very appreciated!

TcpListener stops accepting or accepts broken connections

We currently experience a problem with a self-written server application running on Windows (occurs on different versions). The server listens at a TCP port, accepts connections, exchanges some data and then closes the connections again. There are about 100 clients that connect from time to time.
Sometimes the server stops to work: Log files show that connections are still accepted, but that at the first read attempt a socket error (10054 - Connection reset by peer) occurs. I don't think it is a client issue because it suddenly stops working for all clients.
Now we found out, that the same problem occurs with our old server software, that is even written in another programming language. So it doesn't seem to be an error in our program - I think it has to be some kind of OS / firewall issue? Of course, firewalls have been deactivated, which didn't solve the issue yet.
Any ideas where to look into? Wireshark logs will follow soon..
Excerpt from the log (Timestamp, Thread Id, message)
11:37:56.137 T#3960 Connection from 10.21.13.3
11:37:56.138 T#3960 Client Exception: Socket Error # 10054
Connection reset by peer.
11:37:56.138 T#3960 ClientDisconnected
11:38:00.294 T#4144 Connection from 10.21.13.3
You can see that the exception occurs almost at the same time as the connection is accepted, in this case the client reconnects after a few seconds.
A "stateful" firewall or NAT keeps track of connections, and ought to send RSTs for connectiosn it doesn't know about. If the firewall loses track of connections for some reason, then you'll probably see random connections being reset.
Our router at work does this — it forgets about connections when the PPP connection dies, which is remarkably unhelpful when it rains and the DSL restart takes a bit too long. However, instead of resetting connections, it just drops packets (even more unhelpful!).
Sounds like a firewall or routing issue - maybe stale connections get disconnected after a timeout period. Are you using a ping/keepalive inside your protocol.
Otherwise you may ask Wireshark to see what is going on.
First, thanks for many hints - I'm afraid the problem was a completely different one which you couldn't possibly solve by reading my question.
The server application uses log4net, configured with a log file an ImmediateFlush = true. If every log statement is directly written into the file and multiple socket connections occur this slows down the whole application.
The server needed about a minute to really accept the connection. This was far more than the timeout on clientside. So in the log there was only shown "accepted" followed by "disconnected" - even the log was delayed!
Sorry for the inconvenience...
Have you tried changing the backlog and then see how much time or how many clients are served before this problem occurs
You don't say what Windows versions you're using for the server, but you should be aware that the Windows TCP/IP stack behaves differently in server and client OSes. There are limits on how many simultaneous incoming connections a client OS will allow, and they are significantly lower than you might expect.
What do the logs look like from the client side?
Since the error is stating that the client is dropping the connection; if you see the same error on the client side then it is a firewall or proxy that is dropping the connection (both side seeing the opposite side dropping the connection is indicative of a proxy/firewall).
If the error is not present on the client side; then I would say that your client side is where you will see the actual error.

How to detect if a client has crashed (or exit) for a server using Qt

The client use ssh login and start up a server on remote machine, then the clinet create a tcp connect to the server.
The server need exit when the client has exit normally or crashed or network is dropped.
So the question is how to detect if the client which the server has connected to is crashed.
The first try is using error() signal, catch QAbsoluteSocket::NetworkError to determine the network has dropped. But I can't receive error() signal at all even if i pull out the network cable.
The second try is using the SocketState, i think whenever SocketState is UnconnectedState,the client may has exit normally and the server should exit too. This way works fine for "normal exit", but I don't know how to deal with "crash" and "dead network".
Help me, thanks!
I'd recommend using TCP keep alive. It is not exposed through the public QTcpSocket interface, but you can use setsockopt with QAbstractSocker::socketDescriptor to activate the SO_KEEPALIVE feature.
EDIT: It appears that keep alive was added to QAbstractSocket at some point. So, simply call QAbstractSocket::setSocketOption with QAbstractSocket::KeepAliveOption.
You can find information about adjusting the timeout of keep alive request here: http://www.gnugk.org/keepalive.html
Most of the time, the only way you will know there is a problem with a socket connection is when you try to read or write with it. There are some exceptions: Windows will change the state of sockets if the network cable is unplugged, Linux (in my experience) will not.
The most reliable way to detect connection problems is to have the client regularly send a small message at an agreed upon interval with the server. If the server does not see this message within a reasonable time, it should consider the client dead and drop the connection. This will also give both sides regular opportunities to detect a problem via reads and writes.

Delay before sending message over socket - how does that help?

I have a tcpip socket interface to a third party software app. I've implemented this interface for several customer sites with no problem. The latest customer, though... problems. We've turned on logging in the apps on either end, and also installed Wireshark on the PC to log raw tcpip traffic. With that, we've proved that my server app successfully sends the message out, the pc receives the message, but the client app doesn't see it. (This is a totally intermittent problem, which is why it's such a pain to troubleshoot.)
The socket details are as simple as they come: one socket handling two way communications between the server and the pc. The messages are plain ascii text and fairly short (not XML). The server initiates communications by sending the first message, and then the client responds with several messages. The socket is kept open at all times while the apps are running. The client app is designed so that the end user can only process one case at a time, which prevents message collisions from happening. They have some sort of polling set up, their app "hibernates" until it sees the initiating message from the server.
The third party vendor has advised me to add a few second delay before I send them the initiating message. I can't see how that helps. If the client is "sleeping", just polling the socket waiting for a message, how does adding a delay before the first message help? It's not like we send two messages and the second one gets lost. It's losing the first message. So I don't see how it matters if we send that message now or two seconds from now.
I've asked them and they haven't given me details. It could be some proprietary details in their coding that they don't want to disclose to me, and that's fair. So I'm asking here because I'm always learning new things about socket programming. Maybe you guys can shed some light on how polling a tcpip socket can be affected by message timing?
Since its someone else's client and they won't tell you what its doing (other than saying 'insert a delay'), the answer is probably that their client is reading and discarding the message because its not yet in a state to deal with it. The delay will allow the client time to get into a state where it can respond to the message properly.
In other words, the client has a race condition. One easy way this can happen is if they have one thread for reading messages and another for dealing with them.
Short of running strace(1) on the client to see what system calls it is making, its tough to tell what the client is actually doing.

Resources