Our BizTalk 2016 environment consists of 2 application servers running in a group, with CU5 and FP3 installed.
We have 24 applications deployed. Across these applications we have 27 receive locations configured with the new sFTP WinSCP adapter. For all sFTP receive locations we have the "Connection Limit" set to 5. We connect to 6 different sFTP servers.
After approximately 2 hours, we get the following event log warnings and the receive locations stop working:
"The adapter "SFTP" raised an error message. Details "The WCF service
host at address "sftp://..." has faulted and as a result no more
messages can be received on the corresponding receive location. To fix
the issue, BizTalk Server will automatically attempt to restart the
service host."
Contrary to the event log message, the service host does not restart automatically.
Does anyone have any idea how to fix this issue?
Try out CU7 as it includes a couple of SFTP fixes.
The latest version of BizTalk Health Monitor comes up with the following Important Warning:
The host instances of 'hostinstancename' need more worker threads per cpu to run correctly SFTP Receive Locations. Increase so the "Maximum Worker Threads" property to 500 for these host instances and be sure they are dedicated for this SFTP Receive Locations
So the things to look at are:
Have a dedicated host for receive locations using SFTP
Increase the Maximum Worker threads setting to 500
Check how frequently you poll (the default is 5 seconds)
Put a schedule on to only poll during the periods you need.
Disable message body tracking if it is not needed.
We operate CENM (1.2, deployed on a k8s cluster using the Helm templates) to run our own private network. After the CENM network map server had been running for a few weeks, launching new nodes started failing.
On further investigation, it appeared that a request timeout for http://nmap:10000/network-map causes the problem.
In the nmap server's log, we found the following output when accessing the above URL with curl:
[NMServer] - Error while handling socket client message com.r3.enm.servicesapi.networkmap.handlers.LatestUnsignedNetworkParametersRetrievalMessage#760c53ea: HikariPool-1 - Connection is not available, request timed out after 30000ms.
netstat shows there are at least 3 established connections to the database from the container in which the network map server runs, and I can also connect to the database directly using the CLI.
So I don't think the cause is either a saturated database or a network configuration problem.
Does anyone have an idea why this happens? I think a restart would probably solve the problem, but I want to know the root cause...
regards,
Please test the following options.
Since it is the HikariCP (connection pool) component that is throwing the error, it would be worth seeing whether increasing the pool size in the network map configuration helps (see below).
Corda uses Hikari Pool for creating the connection pool. To configure the connection pool any custom properties can be set in the dataSourceProperties section.
dataSourceProperties = {
    dataSourceClassName = "org.postgresql.ds.PGSimpleDataSource"
    ...
    maximumPoolSize = 10
    connectionTimeout = 50000
}
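To make the error message easier to reason about, here is a minimal sketch of how a fixed-size pool produces this kind of timeout when every connection is borrowed and none is returned. `TinyPool` and `demo` are hypothetical names for illustration only; this is a toy model in Python, not HikariCP or CENM code, and it does not tell you why connections are being held in your deployment.

```python
import queue


class TinyPool:
    """Toy fixed-size pool; a stand-in for illustration, not HikariCP itself."""

    def __init__(self, size):
        self._free = queue.Queue()
        for i in range(size):
            self._free.put(f"conn-{i}")

    def acquire(self, timeout_ms):
        # Blocks until a connection is returned or the timeout elapses.
        try:
            return self._free.get(timeout=timeout_ms / 1000)
        except queue.Empty:
            raise TimeoutError(
                f"Connection is not available, request timed out after {timeout_ms}ms"
            ) from None

    def release(self, conn):
        self._free.put(conn)


def demo():
    pool = TinyPool(size=3)
    # Borrow every connection and do not return any of them.
    held = [pool.acquire(timeout_ms=100) for _ in range(3)]
    try:
        pool.acquire(timeout_ms=100)
        outcome = "acquired"
    except TimeoutError as exc:
        outcome = str(exc)
    # Once a connection is returned, the next request succeeds again.
    pool.release(held.pop())
    recovered = pool.acquire(timeout_ms=100)
    return outcome, recovered
```

In this model the database connections stay established the whole time, which is consistent with seeing live connections in netstat while requests still time out. A larger pool only postpones the timeout if connections are never given back.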
Has a health check been conducted to verify that there are sufficient resources on that Postgres database, i.e. basic diagnostic checks?
Another option for getting more information logged from the network map service is to run it with TRACE logging:
From https://docs.corda.net/docs/cenm/1.2/troubleshooting-common-issues.html
Enabling debug/trace logging
Each service can be configured to run with a deeper log level via command line flags passed at startup:
java -DdefaultLogLevel=TRACE -DconsoleLogLevel=TRACE -jar <enm-service-jar>.jar --config-fi
I am deploying the AWS Corda Enterprise template. The Quick Start deployed the stack as per the defined CloudFormation template. I can see 2 AWS instances up and running as Corda nodes in a hot-cold setup with a load balancer.
However, the log for the Corda node has the following ERROR related to AMQP communication:
[ERROR] 2018-10-18T05:47:55,743Z [Thread-3
(ActiveMQ-scheduled-threads)] core.server.lambda$channelActive$0 -
AMQ224088: Timeout (10 seconds) while handshaking has occurred. {}
What could be the reason for this error? It keeps recurring at a regular interval, so it looks like a connectivity issue to me.
Note: The load balancer shows the status of these AWS Corda instances as healthy (In Service). So I believe the Corda node has booted up successfully.
The ERROR message isn't necessarily tied to AMQP. Perhaps you were confused by the "AMQ" in the error ID (AMQ224088)?
In any event, this error indicates that something on the network is connecting to the ActiveMQ Artemis broker, but it's not completing any protocol handshake. This is commonly seen with, for example, load balancers that do a health check by creating a socket connection without sending any real data just to see if the port is open on the target machine.
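The kind of health check described above can be sketched in a few lines of Python. This is only an illustration of the pattern (connect, send nothing, disconnect), not the actual load balancer implementation; the function name `bare_tcp_check` is hypothetical.

```python
import socket


def bare_tcp_check(host, port, timeout=3.0):
    """Open a TCP connection, send nothing, and close it again.

    Returns True if the port accepted the connection. From the broker's
    side this looks like a client that never starts a protocol handshake.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Pointing a check like this at the broker port should be enough to see whether the error in the log lines up with the probe interval.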
So I have been running into this issue since we set up our BizTalk Server on a new network. We have the same MSMQ settings between the two servers.
The data stays in our AX MSMQ folders and has the correct permissions.
The system does not ever throw an error until I stop/restart the Receive Host Instances.
(we get one of these errors per message in any of our MSMQ ports)
Full error:
A message received by adapter "MSMQ" on receive location
"recv_loc_file_ax_2012_customer_message" with URI
"FORMATNAME:DIRECT=OS:AXSERVER\AXOUTPPDCUSTOMER" is suspended. Error
details: The Messaging Engine is shutting down. MessageId:
{65E24FE1-317E-4636-AFC7-B43FACBDBEDF} InstanceID:
{6618EEB3-9B72-4123-BD8C-422661A59BDD}
Then the messages finally appear under suspended instances after this error occurs. I am able to resume them and they all process as expected.
I have looked almost everywhere. Does anyone have suggestions for what is causing these messages not to be read into my MSMQ receive ports properly?
EDIT: This BizTalk server is connecting to a remote AX server's MSMQ, but I doubt this changes anything I have not already looked into.
Thank you very much.
The fault lay with the people who installed MSMQ on the remote server: Active Directory was not set up, so my BizTalk Server account was not being authenticated properly.
Answer for BizTalk: The way to find these hidden errors was to change the BizTalk MSMQ receive port to the WCF-NetMsmq adapter.
Other: we are a little baffled that BizTalk was able to take the messages out of MSMQ despite the AD not being set up and "force messages through" but this is a minor detail to note.
BizTalk reference
I have an application that is used to make hotel bookings. The application takes an XML message, transforms the XML into another XML message and sends this new XML to another application. I am able to book hotels successfully.
When I try to amend this booking (different XML request, same application, same URL) I get a 'Connection refused' error.
I would have thought that there'd be consistency (all work or none) but there's not.
Anyone any idea why?
"Connection refused" means that no application is accepting connections on the host and port that you are trying to connect to. It can be caused by any of the following:
The application is actually running on a different host or a different port
The application crashed and hasn't been restarted
The application is buggy: it closes the listening server socket from time to time, so that it is not listening for connection attempts all the time
A firewall is configured to respond to new connections with "connection refused" even though the application would be able to accept the connection
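To tell a refused connection apart from a timeout or a working port, a small probe can help. The following is a minimal Python sketch; `classify_connect` is a hypothetical helper, and it only reports what a single TCP connect attempt sees from the machine it runs on.

```python
import socket


def classify_connect(host, port, timeout=3.0):
    """Return 'open', 'refused', 'timeout', or 'error: ...' for one TCP connect attempt."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Nothing is listening, or something on the path actively rejected us.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout (e.g. packets silently dropped).
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

Running it from the sending application's host against the exact host and port of the amend URL, at the moment the amend request fails, shows whether the target is really unreachable or whether the two requests end up at different endpoints.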
Background
We have a number of web applications on different web servers that connect to a single database server. Over the past couple of months, we have noticed that every once in a while, our web servers are unable to connect to the database server.
Our Environment
We have a couple different web environments, some running ColdFusion and others running .NET. The .NET apps are both Web Forms and MVC. They span multiple versions from 2.0 to 4.5. Both the ColdFusion and .NET web servers are windows based machines. Both the ColdFusion and .NET web environments are clustered and some of the machines are physical while others are virtual.
Our database server is SQL Server 2008 R2. It houses multiple databases. Each application connects to the server with its own database user, which only has access to that application's database.
Other Facts
When we notice issues, they occur in short bursts that last anywhere from a couple seconds to a couple minutes.
When we notice issues, the burst contains errors from multiple different applications, not just one app at a time.
When we notice issues, the burst contains errors from applications from different web environments. (This makes us think we can rule out that the apps themselves are the issue)
The bursts of connection issues happen at various times throughout the day and night. They are not always during times of high usage.
We have monitored things like number of user connections, memory, IO, CPU usage, etc... and we have not seen spikes or anything else that might point to a problem.
We have installed Wireshark on the web and db servers in hopes of catching the problem, without any success.
Questions
Does anyone have suggestions on where I should look next?
Are there properties of the database that could cause this?
Is there any way to "monitor" the connection between the database and web server in a better manner?
Is there anything that can be done on the app side to better understand what is happening?
Errors Caught by Apps
.NET errors
A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server)
Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - The semaphore timeout period has expired.)
Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.
ColdFusion errors
Error Executing Database Query. The TCP/IP connection to the host has failed. java.net.ConnectException: Connection timed out: connect The error occurred on line 38.
Error Executing Database Query. Connection reset by peer: socket write error The error occurred on line 91.
Error Executing Database Query. Timed out trying to establish connection The error occurred on line 38.
In CF, I once had an issue like the one you are seeing. I had CF on one server and SQL Server 2008 R2 on another. I would see CF errors like the ones you posted. To help trace it to a network error, I wrote something like this:
1) created a down.bat
tracert serverip
2) I then put a <cftry><cfcatch> around the query.
When the query generated the error I would execute
<cfexecute name="C:\path\to\down.bat" variable="log" timeout="60" />
<cfmail to="ME" from="Server" subject="SQL DOWN">
Server Debugging Info:
------------------------------------------------------------
#now()#
#cfcatch.Detail#
#cfcatch.Message#
#log#
</cfmail>
This helped me fix my situation which ended up being hardware at the datacenter.
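For apps that are not on ColdFusion, the same pattern (run the operation, and on failure capture a network trace together with the error) can be sketched in Python. `run_with_diagnostics` is a hypothetical helper; the diagnostic command is whatever you choose to pass in, for example `["tracert", "serverip"]` on Windows.

```python
import datetime
import subprocess


def run_with_diagnostics(operation, diagnostic_cmd, timeout=60):
    """Run operation(); on failure, capture the output of diagnostic_cmd.

    Returns (result, None) on success, or (None, report) on failure.
    """
    try:
        return operation(), None
    except Exception as exc:  # broad on purpose: any failure triggers the trace
        proc = subprocess.run(
            diagnostic_cmd, capture_output=True, text=True, timeout=timeout
        )
        report = "\n".join([
            datetime.datetime.now().isoformat(),
            f"{type(exc).__name__}: {exc}",
            proc.stdout,
        ])
        return None, report
```

The report string can then be mailed or logged, much like the `cfmail` body above. Because the trace is taken at the moment of failure, it has a better chance of showing a bad hop than a trace run by hand afterwards.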