AMQP handshake timeout error while deploying AWS Corda Enterprise Template

I am deploying the AWS Corda Enterprise Template. The Quick Start deployed the stack as per the defined CloudFormation template. I can see 2 AWS instances, up and running as Corda nodes, in a hot-cold setup with a load balancer.
However, the log for the Corda node has the following ERROR related to AMQP communication.
[ERROR] 2018-10-18T05:47:55,743Z [Thread-3 (ActiveMQ-scheduled-threads)] core.server.lambda$channelActive$0 - AMQ224088: Timeout (10 seconds) while handshaking has occurred. {}
What can be the possible reason for this error? It keeps recurring at a regular interval, so it looks like a connectivity issue to me.
Note: The load balancer shows the status of these AWS Corda instances as healthy (In Service). So I believe the Corda nodes have booted up successfully.

The ERROR message isn't necessarily tied to AMQP. Perhaps you were confused by the "AMQ" in the error ID (AMQ224088)?
In any event, this error indicates that something on the network is connecting to the ActiveMQ Artemis broker, but it's not completing any protocol handshake. This is commonly seen with, for example, load balancers that do a health check by creating a socket connection without sending any real data just to see if the port is open on the target machine.
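For illustration, here is a minimal sketch of that failure mode, assuming a hypothetical hostname and Corda's usual P2P port: a bare TCP connect that never sends any protocol bytes, which is exactly what a TCP-level health check does.

import socket
import time

# Hypothetical address; substitute your node's Artemis/P2P endpoint.
HOST, PORT = "corda-node.example.com", 10002

# Open a plain TCP connection but never send an AMQP/Artemis handshake.
# After 10 seconds the broker gives up, logs AMQ224088, and closes the
# socket -- one log entry per probe, hence the regular interval.
with socket.create_connection((HOST, PORT)) as sock:
    time.sleep(15)  # stay idle past the 10-second handshake timeout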

Related

Trouble connecting to gRPC server on AWS Fargate

I have a Python gRPC server running on AWS Fargate (configured very similarly to this AWS guide here), and another AWS Fargate task (call it the "client") that attempts to make a connection to my gRPC server (also using Python gRPC). However, the client is unable to make a call to my server, failing with the following error:
<_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "failed to connect to all addresses"
debug_error_string = "{"created":"#1619057124.216955000","description":"Failed to pick subchannel",
"file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":5397,
"referenced_errors":[{"created":"#1619057124.216950000","description":"failed to connect to all addresses",
"file":"src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc",
"file_line":398,"grpc_status":14}]}"
Based on my reading online, it seems like there are myriad situations in which this error is thrown, and I'm having trouble figuring out which one pertains to my case. Here is some additional information:
When running client and server locally, I am able to successfully connect by having the client connect to localhost:[PORT]
I have configured an application load balancer target group following the guide from AWS here that makes health check requests to the / route of my gRPC server, using the gRPC protocol, and expect gRPC response code 12 (UNIMPLEMENTED); these health check requests are coming back as expected, which I believe implies the load balancer is able to successfully communicate with the server (although I could be misunderstanding)
I configured a service discovery system (following this guide here) that should allow me to reach my gRPC server within my VPC via the name service-name.dev.co.local. I can confirm that the corresponding DNS record exists in Route 53, and when I SSH into my VPC, I am indeed able to ping service-name.dev.co.local successfully.
Anyone have any ideas? Would appreciate any and all advice, and I'm happy to answer any further questions.
Thank you for your help!
On your gRPC server, bind to 0.0.0.0:[PORT] rather than localhost, and expose that port as TCP on your container.
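A minimal sketch of that binding with Python gRPC (the servicer registration and port number are placeholders for your own generated code):

from concurrent import futures
import grpc

server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
# add_YourServiceServicer_to_server(YourServicer(), server)  # your generated stub registration

# Bind to all interfaces, not 127.0.0.1/localhost, so traffic arriving
# from outside the container (load balancer, other tasks) can reach it.
server.add_insecure_port("0.0.0.0:50051")
server.start()
server.wait_for_termination()

Binding to localhost inside a container only accepts connections originating in that container's own network namespace, which is consistent with the local test working while the Fargate-to-Fargate call fails.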

Corda CENM network map server starts failing to connect to the database after a few weeks running

We operate CENM (1.2, deployed with a Helm template on a Kubernetes cluster) to run our own private network. After the CENM network map server has been running for a few weeks, launching new nodes starts failing.
On further investigation, it appears that a request timeout on http://nmap:10000/network-map causes the problem.
In the network map server's log, we found the following output when accessing the above URL with curl.
[NMServer] - Error while handling socket client message com.r3.enm.servicesapi.networkmap.handlers.LatestUnsignedNetworkParametersRetrievalMessage#760c53ea: HikariPool-1 - Connection is not available, request timed out after 30000ms.
netstat shows at least 3 established connections to the database from the container running the network map server, and I can also connect to the database directly using the CLI.
So I don't think it is either database saturation or a network configuration problem.
Does anyone have an idea why this happens? I think a restart would probably solve the problem, but I want to know the root cause...
regards,
Please test the following options.
Since it is the HikariCP (connection pool) component that is throwing the error, it would be worth seeing if increasing the pool size in the network map configuration helps (see below). Note that the 30000 ms in the error message is the pool's connection timeout, i.e. how long a caller waits for a free connection before giving up.
Corda uses HikariCP for the connection pool. To configure it, any custom properties can be set in the dataSourceProperties section.
dataSourceProperties = {
    dataSourceClassName = "org.postgresql.ds.PGSimpleDataSource"
    ...
    maximumPoolSize = 10
    connectionTimeout = 50000
}
Has a health check been conducted to verify there are sufficient resources on that Postgres database, i.e. basic diagnostic checks?
Another option, to get more information logged from the network map service, is to run with TRACE logging as well:
From https://docs.corda.net/docs/cenm/1.2/troubleshooting-common-issues.html
Enabling debug/trace logging
Each service can be configured to run with a deeper log level via command line flags passed at startup:
java -DdefaultLogLevel=TRACE -DconsoleLogLevel=TRACE -jar <enm-service-jar>.jar --config-file <config file>

BizTalk 2016 sFTP WinSCP - No more messages can be received

Our BizTalk 2016 environment consists of 2 application servers running in a group with CU5 and FP3.
We have 24 applications deployed. Across these applications we have 27 receive locations configured with the new sFTP WinSCP adapter. For all sFTP receive applications we have the "Connection Limit" set to 5. We are connecting to 6 different sFTP servers.
After approximately 2 hours, we get the following event log warnings and the receive locations stop working:
"The adapter "SFTP" raised an error message. Details "The WCF service
host at address "sftp://..." has faulted and as a result no more
messages can be received on the corresponding receive location. To fix
the issue, BizTalk Server will automatically attempt to restart the
service host."
Contrary to the event log message, the service host does not restart automatically.
Does anyone have an idea how to fix this issue?
Try out CU7 as it includes a couple of SFTP fixes.
The latest version of BizTalk Health Monitor comes up with the following Important Warning:
The host instances of 'hostinstancename' need more worker threads per CPU to run SFTP receive locations correctly. Increase the "Maximum Worker Threads" property to 500 for these host instances and be sure they are dedicated to these SFTP receive locations.
So things to look at are:
Have a dedicated host for receive locations using SFTP
Increase the Maximum Worker threads setting to 500
Check how frequently you poll (the default is 5 seconds)
Put a schedule on to only poll during the periods you need.
Disable message body tracking if it is not needed.

Kafka Receives Messages But Fails To Add To Topic - With Setup Local Kafka VM and Minikube Kubernetes Cluster

Set Up
Laptop with:
Kafka in a VirtualBox VM (Vagrant), with port 9092 forwarded from the laptop's localhost
Kubernetes cluster in a VirtualBox VM (Minikube)
Desired Outcome
Microservices on my Minikube cluster can fire messages at the Kafka VM.
Note that this works in Google Container Engine (GKE).
Actual Outcome
From the laptop I can use a console producer to send messages to the Kafka VM, and it happily obliges, adding them to the topic. But when a microservice from the Kubernetes cluster sends a message, the message is received but never added to the topic.
Instead I get the error on the microservice ...
Batch containing 1 record(s) expired due to timeout while requesting metadata from brokers for generated-test-script-0
If I tail kafka-request.log I see ...
[2017-02-08 21:57:05,891] TRACE Completed request:{api_key=3,api_version=1,correlation_id=0,client_id=producer-5} -- {topics=[generated-test-script]} from connection 10.0.2.15:9092-10.0.2.2:50124;totalTime:0,requestQueueTime:0,localTime:0,remoteTime:0,responseQueueTime:0,sendTime:0,securityProtocol:PLAINTEXT,principal:User:ANONYMOUS (kafka.request.logger)
In the "success" case, when I simply use a console producer on the laptop, I see 2 lines: one the same as above, plus what I guess is an ACK ...
[2017-02-08 22:08:12,764] TRACE Completed request:{api_key=3,api_version=2,correlation_id=0,client_id=console-producer} -- {topics=[test]} from connection 10.0.2.15:9092-10.0.2.2:50748;totalTime:6,requestQueueTime:0,localTime:6,remoteTime:0,responseQueueTime:0,sendTime:0,securityProtocol:PLAINTEXT,principal:User:ANONYMOUS (kafka.request.logger)
[2017-02-08 22:08:13,799] TRACE Completed request:{api_key=0,api_version=2,correlation_id=1,client_id=console-producer} -- {acks=1,timeout=1500,topic_data=[{topic=test,data=[{partition=0,record_set=java.nio.HeapByteBuffer[pos=0 lim=39 cap=39]}]}]} from connection 10.0.2.15:9092-10.0.2.2:53696;totalTime:22,requestQueueTime:1,localTime:21,remoteTime:0,responseQueueTime:0,sendTime:0,securityProtocol:PLAINTEXT,principal:User:ANONYMOUS (kafka.request.logger)
Conclusion And Thoughts
So there is no ERROR as such on the Kafka server side, just on the client. My guess is that this is a network setup issue (NAT?) whereby the microservice in the virtual Kubernetes cluster can talk to my Kafka VM but the reply route is dropped?
Kafka has to return metadata for the first message sent, so setting the batch size to 0 or "acks" to 0 doesn't really help as a hack, because that initial metadata still has to come back.
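Worth noting: the metadata the broker returns contains the broker's advertised host:port, and the producer connects to that address for the actual sends, not to the address it originally dialed. A minimal sketch with the kafka-python client (hypothetical addresses) of where this bites:

from kafka import KafkaProducer

# Hypothetical bootstrap address: the forwarded port the pod can reach.
producer = KafkaProducer(bootstrap_servers="10.0.2.2:9092")

# The first send triggers a metadata request (api_key=3, as in the log above).
# The broker's reply names its *advertised* listener, and the produce request
# then goes to that address; if the pod cannot route to it, the batch sits
# until it expires with exactly the timeout error shown above.
producer.send("generated-test-script", b"hello")
producer.flush(timeout=30)

If that is what is happening here, the request log would show a completed metadata request (as it does) but never a produce request from the pod.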
Any thoughts or pointers would be great as I really want to run this cluster and Kafka VM locally for dev work.

BizTalk MSMQ Receive Locations Not receiving. Error details: The Messaging Engine is shutting down

So I have been running into this issue since we set up our BizTalk Server on a new network. We have the same MSMQ settings between the two servers.
The data stays in our AX MSMQ folders and has the correct permissions.
The system does not ever throw an error until I stop/restart the Receive Host Instances.
(we get one of these errors per message in any of our MSMQ ports)
Full error:
A message received by adapter "MSMQ" on receive location "recv_loc_file_ax_2012_customer_message" with URI "FORMATNAME:DIRECT=OS:AXSERVER\AXOUTPPDCUSTOMER" is suspended. Error details: The Messaging Engine is shutting down. MessageId: {65E24FE1-317E-4636-AFC7-B43FACBDBEDF} InstanceID: {6618EEB3-9B72-4123-BD8C-422661A59BDD}
Then the messages finally appear under suspended instances after this error occurs. I am able to resume them and they all process as expected.
I have looked almost everywhere. Does anyone have suggestions for what is causing these messages not to be read into my MSMQ receive ports properly?
EDIT: This BizTalk server is connecting to a remote AX server's MSMQ, but I doubt this changes anything I have not already looked into.
Thank you very much.
The error was on the side of the people who installed MSMQ on the remote server; Active Directory was not set up, and thus my BizTalk Server account was not being properly authenticated.
Answer for BizTalk: the way to find these hidden errors was to change the MSMQ BizTalk receive port to a WCF-NetMsmq one.
Other: we are a little baffled that BizTalk was able to take the messages out of MSMQ and "force messages through" despite AD not being set up, but this is a minor detail to note.
BizTalk reference
