I have a .NET Core Worker process using the latest IBMMQDotnetClient library (9.2.4). I am running into an issue during a test when there are about 10k+ messages in the queue. The application processes about 5.5k messages and then processes only a random number of messages (mostly one message, but I have seen 100+) per heartbeat interval (currently the default value of 5). Has anyone run into this issue? It would be great if someone knows the cause.
Update: There is no issue when putting messages on the queue; it only occurs when getting from the queue.
I have been having a lot of trouble getting SignalR to function reliably: messages inexplicably never arrive despite being sent from the Hub, almost as if the groups are being lost, even though the WebSocket connection is open on both clients I am trying to communicate between.
I have now noticed that IIS 8.5 is set to a maximum of 10 worker processes for the site, and all of them are running.
Is this a possible cause of the erratic behavior? Should I implement a backplane even though I have just one server but multiple worker processes?
Any help will be much appreciated. It's been weeks. :(
In Corda 4.1, I started up a few nodes and then put a few thousand transactions through them. As expected, CPU usage spiked during this time. However, an hour later, the Java process is still using 100% of available CPU, and nothing additional is being output to the logs.
When I look at a thread dashboard, there appear to be two JVM threads called rpc-client-observation-pool-0, each chewing up 50% of the CPU, even though the RPC client that was running against the node has quit. Not gonna lie - the machine hasn't got the most RAM allocated to it.
I can see this message in the logs: "A hot observable returned from an RPC was never subscribed to. This wastes server-side resources because it was queueing observations for retrieval. It is being closed now, but please adjust your code to call .notUsed() on the observable to close it explicitly." However, I would have hoped that after an hour or more all of the hot observables would have been closed (as the warning says).
Any pointers as to how I can tell the node to close these explicitly, or find out what is causing this continuous CPU usage?
You can set -Dnet.corda.client.rpc.trackRpcCallSites=true on the JVM command line, and you will get a stack trace with the warning explaining where the leak is coming from.
If you are calling an RPC method that returns an Observable but you don't want the observable, then subscribe and immediately unsubscribe to clear the client-side buffers and to stop the server from streaming.
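The subscribe-then-unsubscribe pattern can be sketched as follows. Corda 4.x RPC returns RxJava 1 Observables, where the real calls are Observable.subscribe(...) followed by Subscription.unsubscribe() (Corda also ships a Kotlin .notUsed() helper); the JDK's java.util.concurrent.Flow API stands in for RxJava below only so the sketch is self-contained and runnable.

```java
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

// Sketch of the "subscribe then unsubscribe immediately" pattern for an
// RPC feed you do not want. The JDK Flow API is used here as a stand-in
// for the RxJava 1 Observable that Corda actually returns.
public class DiscardFeed {

    // Subscribe and cancel straight away: the producer is told to stop
    // before it delivers a single item, so nothing accumulates in
    // client-side buffers and the server stops streaming.
    static <T> void discardImmediately(Flow.Publisher<T> feed) {
        feed.subscribe(new Flow.Subscriber<T>() {
            @Override public void onSubscribe(Flow.Subscription s) {
                s.cancel(); // the RxJava 1 equivalent is unsubscribe()
                System.out.println("cancelled before any item was delivered");
            }
            @Override public void onNext(T item) { }
            @Override public void onError(Throwable t) { t.printStackTrace(); }
            @Override public void onComplete() { }
        });
    }

    public static void main(String[] args) {
        // A same-thread executor keeps the demo deterministic.
        SubmissionPublisher<String> feed =
                new SubmissionPublisher<>(Runnable::run, Flow.defaultBufferSize());
        discardImmediately(feed);
        feed.close();
    }
}
```

In the real Corda client the equivalent one-liner is to call subscribe() on the unwanted Observable and unsubscribe() on the returned Subscription (or .notUsed() from Kotlin).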
We have a replicated cluster cache set up with two instances, using NCache Community Edition 4.8, and everything runs well when both instances are online.
When we take an instance offline, cache management becomes very slow: even stopping and starting the cache from the NCache Manager GUI takes a very long time and then shows a message stating that an instance is unreachable.
Also, when trying to fetch data from the cache or add data to it, we get an operation-timeout exception, and there is no response from the single instance that is still running.
From what I understand, this scenario should be handled by the cache service itself, since it is replicated and should handle the failure of an instance going offline.
Thank you,
I would like to explain the cause of the slowness in your application when one of the server nodes is removed from the cache cluster.
Whenever a node is removed from the cache cluster, the surviving node(s) go into a recovery process and try to re-establish a connection with the downed server node. By default the connection retry value is set to "2", which means the surviving nodes try to reconnect with the downed node twice; only after those reconnection attempts have failed does the cache cluster consider the downed server offline and start handling requests as before.
Each reconnection attempt can take up to 90 seconds, as this is the default TCP/IP timeout interval, so with the connection retry set to "2" the recovery process can take up to around 200 seconds. Your application (or NCache Manager calls) can experience slowness or request timeouts during this 2-3 minute window while the cluster is in recovery mode, but once the recovery process finishes, the application should start working without any issues. If the slowness or request timeouts last more than a few minutes, the problem likely lies elsewhere and needs further investigation.
The connection retry value can be changed in the NCache "Config.ncconf" file. Increasing the number of connection retries means the cluster spends more time in the recovery process. The purpose of this feature is that if there is a network glitch in the environment and the server nodes lose their connection to each other, the servers get reconnected automatically through this recovery process. This is why it is recommended to keep the connection retry value set to at least 1.
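For reference, a sketch of roughly where this setting lives in Config.ncconf. The element and attribute names below are assumptions based on NCache 4.x and should be verified against your installed file before editing:

```xml
<!-- Sketch only: treat "connection-retries" and "retry-interval" as
     assumptions to confirm against your own Config.ncconf. -->
<cache-settings>
  <!-- Raising connection-retries lengthens the recovery window; each
       reconnection attempt can block for up to the 90-second TCP timeout. -->
  <cluster-settings connection-retries="2" retry-interval="2" />
</cache-settings>
```

Restart the cache service after changing the file so the new value takes effect.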
We are using BizTalk (2013 R2 CU6) EDI functionality to batch EDI files. This uses the Microsoft.BizTalk.Edi.BatchingOrchestration.BatchingService orchestration, which is always running in a waiting state (for lack of a better term), dehydrated most of the time. While running, the orchestration builds up instances of BTXTimerMessages in the Queued (awaiting processing) state. As far as I can tell, these messages are never removed or processed. This eventually causes us to pass the 50k message threshold and start throttling.
As far as I can tell, there is no way to set up a recurring schedule for the batcher; it must always run or be started manually. If we leave the batcher off, we get routing errors.
Currently, the only way we have found to eliminate these messages is to terminate the EDI batcher for each party and then restart it.
Is there a better way to purge these messages from the system, or to stop them from being generated altogether?
We're currently having an issue with our BizTalk server running too few concurrent orchestrations, which is causing delays in delivering messages. The system has been running for years and the issue appeared only recently. In a normal state it would run 20-40 concurrent orchestrations, but currently it runs only 4 or fewer at a time.
The same configuration works properly on a test server, so at first we thought clearing the database would help, but unfortunately it hasn't.
Any advice will be greatly appreciated.
The first thing to do is fire up Performance Monitor and see whether the system is throttling and, if so, why. See: Host Throttling Performance Counters.
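For example, the Message Agent throttling-state counters can be sampled from a command prompt on the BizTalk server. The host instance name BizTalkServerApplication below is a placeholder; substitute the name of your own host:

```
typeperf "\BizTalk:Message Agent(BizTalkServerApplication)\Message delivery throttling state" "\BizTalk:Message Agent(BizTalkServerApplication)\Message publishing throttling state" -sc 10
```

A sustained non-zero value in either counter indicates which throttling condition is active, which you can then look up in the Host Throttling Performance Counters documentation.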