I am working with IBM MQ. I managed to get a basic Handshake / Put Message(s) / Get Message(s) / Disconnect .net solution going on, a couple of days ago, but it only works on a local level, and I now need to update the solution so it works remotely as well.
After reading and experimenting for a while, I decided to follow IBM Knowledge Center's Point to Point scenario step by step. However, I can't start the Sender Channel as instructed by the guide's last step; the Sender Channel's status ping-pongs between Binding and Retrying, and the logs come up with the following error codes; AMQ9002, AMQ9202 and AMQ9999, meaning, as far as I can tell, there is some kind of trouble finding and/or connecting with the host, as explained by the error logs.
I have looked through a lot of questions regarding these errors in particular, but while I have followed most of the proposed solutions (I made sure the Receiver's listener is running, I tried turning off Firewalls, I tried with different ports, I have performed tests Telnet, I have stopped/restarted/resolved the Sender channel a few times, and I have tried setting this up from both, the command line and MQ Explorer), I have yet to get a successful communication going on between two different PCs.
I am aware the error could be either temporary, or the result of problems within the Network itself, but I have been trying to establish a successful connection for almost three days now, and before I pass this unto my bosses I would like to make sure I have exhausted every other possibility.
How can I complete IBM's Point To Point set up guide, or is there anything that could point me towards a different / better approach to get two PCs talking with each other via IBM MQ v9?
Although hastily translated from Japanese, you can find the detailed error logs below.
2017/09/19 17:34:09 - Process (234212.1) User (MUSR_MQADMIN) Program
(runmqchl.exe)
Host (DESKTOP - UP 4 D 363) Installation (Installation 1)
VRMF (9.0.3.0) QMgr (QM 1)
Time (2017-09-19T08: 34: 09.201 Z)
AMQ9002: Channel 'TO.QM2' is starting.
Description: Channel 'TO.QM2' is starting.
ACTION: None.
2017/09/19 17:34:30 - Process (234212.1) User (MUSR_MQADMIN) Program
(runmqchl.exe)
Host (DESKTOP - UP4D363) Installation (Installation 1)
VRMF (9.0.3.0) QMgr (QM 1)
Time (2017-09-19T08: 34: 30.824Z)
AMQ 9202: The remote host 'DESKTOP-1AV4LM3 (The correct ip address) (1415)' can not be used.Please try again later.
Description: Using TCP / IP to host 'DESKTOP-1AV4LM3 (The correct ip
address) of channel TO.QM2 (1415) 'trying to allocate a conversation,
but it did not succeed. However, It is temporary and there is also the
possibility that TCP / IP conversation can be allocated normally
later.
If the remote host can not be determined, '????' is displayed. .
ACTION: Please try the connection later. If the failure persists,
record the error value Please contact the stem administrator. The
return code from TCP / IP is 10060 (X'274C ').The cause of this
failure may be that the host can not reach the destination host.
Alternatively, There is a possibility that the host 'DESKTOP-1AV4LM3
(The correct ip address) (1415)' listener isn't running. If that is
the case, start the listener and try again.
2017/09/19 17:34:30 - Process (234212.1) User (MUSR_MQADMIN) Program (runmqchl.exe)
Host (DESKTOP - UP 4 D 363) Installation (Installation 1)
VRMF (9.0.3.0) QMgr (QM 1)
Time (2017-09-19T08: 34: 30.825Z)
AMQ9999: Channel 'TO.QM2' for host 'DESKTOP-1AV4LM3 (1415)' terminated abnormally
Description: The host 'DESKTOP-1AV4LM3 (1415)' cannot be determined.
ACTION: Check the error log for the preceding error message for
this channel program Please determine the cause of failure....
".
The 'interesting' bit of the error messages above is that the sender is attempting to start a channel to port 1415 on the destination and is getting a 10060 return code (WSAETIMEDOUT). This is different from an immediate rejection because the other end doesnt have a socket open, for example.
You will also note its timing out after about 21 seconds if your times are to be believed. The only time I've seen this kind of things is DNS resolution - There was an APAR for example showing that reverse DNS can cause delays in channel startup, and this could be for a successful or unsuccessful startup
http://www-01.ibm.com/support/docview.wss?uid=swg1IC96408
A new attribute was added to MQ to disable reverse DNS lookups if its the cause - See https://www.ibm.com/support/knowledgecenter/en/SSFKSJ_8.0.0/com.ibm.mq.pro.doc/q113120_.htm#q113120___chlauth
If this is the case, on the receiving end (or both!) try runmqsc , 'ALTER QMGR REVDNS(DISABLED)'. You might have to restart the qmgr for it to be effective (I'm not sure, sorry)
I'd also echo the comment added to your question by JoshMc, to check the receiving end logs for messages (both global errors but more likely the qmgr specific AMQERR01.LOG files) when this occurs - I have a feeling that the timeout is only part of your problem.
Related
In datapower, the operational state of queue manager object is pending. The information provided for this operational state is as follows : "This message indicates that the configuration of the object has changed, but has not been committed and has yet to take effect. No user intervention is required." What is exactly causing this problem and how can this be resolved?
If it is "pending" for a MQ QM object it means that DataPower is trying to figure out if it has a connection to it or not.
Normally if a QM object is in "pending" for a while, more than 20 seconds, it would mean that it didn't get the connection.
Check the System log and you'll probably see a ton of connection errors to the QM server.
First go to Troubleshooting from the Control Panel and do a TCP test to make sure you have a connection to the MQ server using the IP and port of the listener on the QM.
If you get a connection then check the MQ logs for any authentication issues, eg. user and/or auth-records. You need a Server-Connection channel for DataPower!
If you don't get a connection in TCP test then check your firewalls and also make sure that the DataPower network is setup correctly if you have multiple network cards (NIC) setup and set a static route for the MQ on the correct NIC.
I am inside a network where I need proxy settings to access the internet.
I have a weird problem.
The internet is working fine.
But it is one particular instance when i get this error:
Network Error (tcp_error)
A communication error occurred: "Operation timed out"
The Web Server may be down, too busy, or experiencing other problems preventing it from responding to requests. You may wish to try again at a later time.
For assistance, contact your network support team.
This happens when I use hadoop in local mode.
I can access the UI interface. I can see the jobs running. but when I try to see the logs of each task.. i am not able to access those logs.
UI--> job-->map--> task--> all <-- this is where the error is..
Any clues?
THanks
Not sure about exactly what your tcp action is, or about Hadoop or your proxy setup, but if you can reliably repeat the error, and the timeout error happens at approximately the same time each time you test, and that time is on the order of minutes, my guess would be that you've got a true processing delay (perhaps caused by blocking somewhere) at the server, but not necessarily.
We currently experience a problem with a self-written server application running on Windows (occurs on different versions). The server listens at a TCP port, accepts connections, exchanges some data and then closes the connections again. There are about 100 clients that connect from time to time.
Sometimes the server stops to work: Log files show that connections are still accepted, but that at the first read attempt a socket error (10054 - Connection reset by peer) occurs. I don't think it is a client issue because it suddenly stops working for all clients.
Now we found out, that the same problem occurs with our old server software, that is even written in another programming language. So it doesn't seem to be an error in our program - I think it has to be some kind of OS / firewall issue? Of course, firewalls have been deactivated, which didn't solve the issue yet.
Any ideas where to look into? Wireshark logs will follow soon..
Excerpt from the log (Timestamp, Thread Id, message)
11:37:56.137 T#3960 Connection from 10.21.13.3
11:37:56.138 T#3960 Client Exception: Socket Error # 10054
Connection reset by peer.
11:37:56.138 T#3960 ClientDisconnected
11:38:00.294 T#4144 Connection from 10.21.13.3
You can see that the exception occurs almost at the same time as the connection is accepted, in this case the client reconnects after a few seconds.
A "stateful" firewall or NAT keeps track of connections, and ought to send RSTs for connectiosn it doesn't know about. If the firewall loses track of connections for some reason, then you'll probably see random connections being reset.
Our router at work does this — it forgets about connections when the PPP connection dies, which is remarkably unhelpful when it rains and the DSL restart takes a bit too long. However, instead of resetting connections, it just drops packets (even more unhelpful!).
Sounds like a firewall or routing issue - maybe stale connections get disconnected after a timeout period. Are you using a ping/keepalive inside your protocol.
Otherwise you may ask Wireshark to see what is going on.
First, thanks for many hints - I'm afraid the problem was a completely different one which you couldn't possibly solve by reading my question.
The server application uses log4net, configured with a log file an ImmediateFlush = true. If every log statement is directly written into the file and multiple socket connections occur this slows down the whole application.
The server needed about a minute to really accept the connection. This was far more than the timeout on clientside. So in the log there was only shown "accepted" followed by "disconnected" - even the log was delayed!
Sorry for the inconvenience...
Have you tried changing the backlog and then see how much time or how many clients are served before this problem occurs
You don't say what Windows versions you're using for the server, but you should be aware that the Windows TCP/IP stack behaves differently in server and client OSes. There are limits on how many simultaneous incoming connections a client OS will allow, and they are significantly lower than you might expect.
What do the logs look like from the client side?
Since the error is stating that the client is dropping the connection; if you see the same error on the client side then it is a firewall or proxy that is dropping the connection (both side seeing the opposite side dropping the connection is indicative of a proxy/firewall).
If the error is not present on the client side; then I would say that your client side is where you will see the actual error.
The client use ssh login and start up a server on remote machine, then the clinet create a tcp connect to the server.
The server need exit when the client has exit normally or crashed or network is dropped.
So the question is how to detect if the client which the server has connected to is crashed.
The first try is using error() signal, catch QAbsoluteSocket::NetworkError to determine the network has dropped. But I can't receive error() signal at all even if i pull out the network cable.
The second try is using the SocketState, i think whenever SocketState is UnconnectedState,the client may has exit normally and the server should exit too. This way works fine for "normal exit", but I don't know how to deal with "crash" and "dead network".
Help me, thanks!
I'd recommend using TCP keep alive. It is not exposed through the public QTcpSocket interface, but you can use setsockopt with QAbstractSocker::socketDescriptor to activate the SO_KEEPALIVE feature.
EDIT: It appears that keep alive was added to QAbstractSocket at some point. So, simply call QAbstractSocket::setSocketOption with QAbstractSocket::KeepAliveOption.
You can find information about adjusting the timeout of keep alive request here: http://www.gnugk.org/keepalive.html
Most of the time, the only way you will know there is a problem with a socket connection is when you try to read or write with it. There are some exceptions: Windows will change the state of sockets if the network cable is unplugged, Linux (in my experience) will not.
The most reliable way to detect connection problems is to have the client regularly send a small message at an agreed upon interval with the server. If the server does not see this message within a reasonable time, it should consider the client dead and drop the connection. This will also give both sides regular opportunities to detect a problem via reads and writes.
When I run F# compiler - fsc.exe - on our build server it takes ages (~20sec) to run even when there are no input files. After some investigation I found out that it's because the application tries to access crl.microsoft.com (probably to check if some certificates aren't revoked). However, the account under which it runs doesn't have an access to the Internet. And because our routers/firewalls/whatever just drops the SYN packets, fsc.exe tries several times before giving up.
The only solution which comes to mind is to set clr.microsoft.com to 127.0.0.1 in hosts file but it's pretty nasty solution. Moreover, I'll need fsc.exe on our production box, where I can't do such things. Any other ideas?
Thanks
Come across this myself - here are some links... to better descriptions and some alternatives
http://www.eggheadcafe.com/software/aspnet/29381925/code-signing-performance-problems-with-certificate-revocation-chec.aspx
I dug up this form an old MS KB for Exchange when we hit it... Just got the DNS Server to reply as stated (might be the solution for your production box.)
MS Support KB
The CRL check is timing out because it
never receives a response. If a router
were to send a “no route to host” ICMP
packet or similar error instead of
just dropping the packets, the CRL
check would fail right away, and the
service would start. You can add an
entry to crl.microsoft.com in the
hosts file or on the DNS server and
send the packets to a legitimate
location on the network, such as
127.0.0.1, which will reject the connection..."