Connection from the Java client failed after the NebulaGraph database was restarted - nebula-graph

Due to a slow query, NebulaGraph went down.
I ran enter code here, and then the connection to the NebulaGraph database was lost. The error message is java.net.SocketException: Broken pipe (Write failed).
I would like to ask whether this is normal, and whether the client can be improved to automatically detect this and re-initialize the connection pool.
The application uses the Java client com.vesoft.client-3.3.0, and the connection pool is com.vesoft.nebula.client.graph.SessionPool.

This is normal: after the database kernel restarts, the application's current connections become invalid. It is better to catch the corresponding exception in the business layer and re-initialize the session pool. Note also that a session pool and a connection pool are two different concepts.
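For example, here is a minimal sketch of catching the failure in the business layer and rebuilding the pool, assuming the SessionPool / SessionPoolConfig API of the 3.x Java client; the host, port, space name and credentials are placeholders, and the exact exception type (IOErrorException is where a broken pipe typically surfaces) should be verified against the client version you actually use:

import com.vesoft.nebula.client.graph.SessionPool;
import com.vesoft.nebula.client.graph.SessionPoolConfig;
import com.vesoft.nebula.client.graph.data.HostAddress;
import com.vesoft.nebula.client.graph.data.ResultSet;
import com.vesoft.nebula.client.graph.exception.IOErrorException;

import java.util.Collections;
import java.util.List;

public class ResilientGraphClient {

    // Placeholder graphd address and credentials; replace with your own.
    private final List<HostAddress> addresses =
            Collections.singletonList(new HostAddress("graphd-host", 9669));
    private volatile SessionPool pool = newPool();

    private SessionPool newPool() {
        SessionPoolConfig config =
                new SessionPoolConfig(addresses, "my_space", "root", "nebula");
        SessionPool p = new SessionPool(config);
        if (!p.init()) {
            throw new IllegalStateException("SessionPool init failed");
        }
        return p;
    }

    public ResultSet execute(String stmt) throws Exception {
        try {
            return pool.execute(stmt);
        } catch (IOErrorException e) {
            // A graphd restart typically surfaces here as a broken pipe: drop the
            // stale pool, build a fresh one, and retry the statement once.
            pool.close();
            pool = newPool();
            return pool.execute(stmt);
        }
    }
}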

Related

Is it possible to specify a connection timeout for Maxscale connections from applications?

I have set up a two-node MariaDB Galera cluster on Ubuntu systems. A simple application connects to a database through MaxScale and it works fine. But when the node currently in use, say node 1, fails, the application gets errors such as 1927 or 1045. On receiving such an error, the application tries to connect to the database again; it keeps failing until failover from node 1 to node 2 is complete and MaxScale routes database connections to node 2. The retry period ranges from 20 to 50 seconds in my cluster environment.
My question is whether there is a MaxScale connection-timeout parameter I can set to some value such as 50 seconds, so that the application tries just once for a new connection instead of retrying many times. (I used the connectTimeout parameter in the JDBC URL for the database, but it was not effective for my application, which I think is expected.)
MaxScale is most likely sending these errors because no master server is available. This error cannot be prevented with MaxScale 2.2, and client-side re-connection is required.
In MaxScale 2.3, a new feature will be available that allows similar behavior to what you describe (see MXS-1501).
If you are performing read-only requests, it might be beneficial to enable master_failure_mode=error_on_write. This will allow read-only requests to be done even when no master server is available.
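As a rough illustration of the client-side re-connection mentioned above, here is a hedged Java/JDBC sketch; the host, port, credentials and retry budget are placeholders, and connectTimeout in the URL only bounds each TCP connect attempt, so the loop is what bounds the overall retry period during failover:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class MaxScaleReconnect {

    // Placeholder MaxScale endpoint; connectTimeout (ms) limits each connect attempt.
    private static final String URL =
            "jdbc:mariadb://maxscale-host:4006/mydb?connectTimeout=5000";

    public static Connection connectWithin(long budgetMillis)
            throws SQLException, InterruptedException {
        long deadline = System.currentTimeMillis() + budgetMillis;
        SQLException last = null;
        while (System.currentTimeMillis() < deadline) {
            try {
                return DriverManager.getConnection(URL, "app_user", "app_password");
            } catch (SQLException e) {
                // Errors such as 1927/1045 are expected while no master is available;
                // back off briefly and retry until failover completes or the budget runs out.
                last = e;
                Thread.sleep(2000);
            }
        }
        throw last != null ? last : new SQLException("No connection within " + budgetMillis + " ms");
    }
}

Called as connectWithin(50_000), this gives the application one bounded attempt spanning the observed 20-50 second failover window instead of many independent failures.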

Connection pool is full

In my IIS server I have many application pools (6 or 7), and many ASP.NET applications run in each of them (e.g. 25 applications per pool). They all connect to an Oracle database using ADO.NET.
All applications work just fine, but sometimes we get an error like
Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.
I know the likely cause: we are not closing our database connections properly. So here is my headache... I don't want to go through each and every project to see where we forgot to close connections; that is a very time-consuming task for us.
So is there any way to identify which application is leaving connections open? Can we see it from IIS itself? Can we build some kind of utility to track which project is leaving connections open?
I'm not sure that it's a problem with the database connection. I think your applications are not disposing the context, so the garbage collector can't reclaim memory. You can try reducing the recycling interval of your application pools and then check whether your memory usage decreases.

MSDTC fails for self-hosted NServiceBus ASP.NET endpoints but not other processes

I have a Windows 2008 R2 server that hosts many back-end NServiceBus endpoints. All of the services that rely on the NServiceBus.Host.exe host (installed as Windows Services) are able to interact with MSDTC perfectly, averaging a small handful of concurrent distributed transactions throughout the day. There are two small Web.API applications, however, that self-host NServiceBus endpoints (as publishers) and constantly receive the following error when trying to process subscription requests:
NServiceBus.Transports.Msmq.MsmqDequeueStrategy Error in receiving
messages. System.Transactions.TransactionAbortedException: The
transaction has aborted. --->
System.Transactions.TransactionManagerCommunicationException:
Communication with the underlying transaction manager has failed. --->
System.Runtime.InteropServices.COMException: The Transaction Manager
is not available. (Exception from HRESULT: 0x8004D01B) at
System.Transactions.Oletx.IDtcProxyShimFactory.ConnectToProxy(String
nodeName, Guid resourceManagerIdentifier, IntPtr managedIdentifier,
Boolean& nodeNameMatches, UInt32& whereaboutsSize, CoTaskMemHandle&
whereaboutsBuffer, IResourceManagerShim& resourceManagerShim) at
System.Transactions.Oletx.DtcTransactionManager.Initialize() ---
End of inner exception stack trace --- at
System.Transactions.Oletx.OletxTransactionManager.ProxyException(COMException
comException) at
System.Transactions.Oletx.DtcTransactionManager.Initialize() at
System.Transactions.Oletx.DtcTransactionManager.get_ProxyShimFactory()
at
System.Transactions.Oletx.OletxTransactionManager.CreateTransaction(TransactionOptions
properties) at
System.Transactions.TransactionStatePromoted.EnterState(InternalTransaction
tx) --- End of inner exception stack trace --- at
System.Transactions.TransactionStateAborted.CheckForFinishedTransaction(InternalTransaction
tx) at System.Transactions.Transaction.Promote() at
System.Transactions.TransactionInterop.ConvertToOletxTransaction(Transaction
transaction) at
System.Transactions.TransactionInterop.GetDtcTransaction(Transaction
transaction) at
System.Messaging.MessageQueue.StaleSafeReceiveMessage(UInt32 timeout,
Int32 action, MQPROPS properties, NativeOverlapped* overlapped,
ReceiveCallback receiveCallback, CursorHandle cursorHandle, IntPtr
transaction) at
System.Messaging.MessageQueue.ReceiveCurrent(TimeSpan timeout, Int32
action, CursorHandle cursor, MessagePropertyFilter filter,
MessageQueueTransaction internalTransaction,
MessageQueueTransactionType transactionType) at
System.Messaging.MessageQueue.Receive(TimeSpan timeout,
MessageQueueTransactionType transactionType) at
NServiceBus.Transports.Msmq.MsmqDequeueStrategy.ReceiveMessage(Func`1
receive) in
c:\BuildAgent\work\31f8c64a6e8a2d7c\src\NServiceBus.Core\Transports\Msmq\MsmqDequeueStrategy.cs:line
313
Some other notes:
Both the erroring ApplicationPools' identities and the Windows Services' Log On users are the same.
This actually worked well before a recent reboot, as the Web.API services were able to successfully process subscription requests, and are able to publish messages just fine (though publishing does not automatically use MSDTC, and we are not using a TransactionScope explicitly). Since the local reboot, we simply get the above error if a subscription request message sits in either of the Web.API publisher's input queues.
I've used both procmon.exe and MSDTC tracing and have found nothing of interest. The typical event viewer logs also do not provide any information.
All endpoints are running .NET 4.5 and NServiceBus 4.6.
We cannot recreate this in any other environment.
Additional notes from the conversations below
The thread which throws the exception is pure NServiceBus subscription management where none of "my" code is involved. When the application pool starts the w3wp.exe worker process on demand, NSB spawns a worker thread, unbeknownst to the application, to process subscription requests. It should only ever work across the publisher's input queue and the subscription storage, for which I'm also using MSMQ, in a queue right beside the other (i.e. no other server is involved, to my knowledge).
The "code" of the website didn't change across reboots, and the application pool stopped and restarted several times before the reboot without issue.
Not really an answer, but too long for a comment.
What part of your operation requires DTC? A Distributed Transaction gets enlisted automatically when needed, usually when you are talking to two different DTC-supporting bits of infrastructure (e.g. MSMQ and a database).
You said you tested via DTC tracing--do you mean DTC Ping? Did you test by having it run on both machines (or all machines if there are more than two involved in the transaction)? The DTC tool is pretty esoteric, and its output can be confusing.
Also, if it did work before the reboot, is it possible the reboot reset firewall settings? Firewalls are a common cause of DTC problems.
Also, I assume you checked and rechecked your DTC settings on the local machine? Did you ensure that your MSMQ queues are set up to be transactional?
From your comments:
Note that this particular failure occurs when attempting to dequeue a
message from a local private MSMQ queue [...]
The stack trace makes it appear that that's all it's doing, but I suspect that as it is attempting to dequeue, it is also trying to enlist a transaction spanning multiple servers. See below.
Why MSDTC? It's the original way to support exactly-once messaging in
NServiceBus (see here).
Right, but what I'm asking is why the particular operation requires a distributed transaction. If all a handler is doing is reading from a queue and (for example) writing output to the console, MSDTC will never be enlisted, even though the handler is wrapped in a transaction scope. It will simply use a local transaction to read from the queue. The escalation to a distributed transaction is automatic, and only happens when it is needed to support multiple bits of infrastructure.
So if you recently deployed code in a handler that writes data to a new database server, you may be getting a failure because you are now enlisting a transaction that includes the new server, which may be where the failure is happening.
So determining all the servers involved in the distributed transaction is the first step. The next step would be to check the DTC settings on all involved servers. If DTC settings aren't the problem, I'd recommend testing communication between the servers using DTCPing. The NServiceBus documentation has some good instructions for using DTCPing.
What "fixed" this for us in the production environment was adding the application pool identity user to the local Administrators group on the server. Unfortunately we don't have time to determine what setting required that security setup, as this isn't a required configuration in other similar servers. Also, this isn't the most desirable solution from a security perspective, but in our particular situation, we're willing to live with it.

Calling SqlConnection.ClearAllPools() in Application_Start & Application_End?

We are trying to diagnose an issue that occurred in our production environment last week. Long story short, the database connection pool seemed to be full of active connections from our ASP.NET 3.5 app that would not clear, even after restarting the application pool and IIS.
The senior DBA said that because the network connections occur at the operating system level, recycling the app and IIS did not sever the actual network connections, so SQL Server left the database connections to continue running, and our app was still unable to reach the database.
In looking up ways to force a database connection pool to reset, I found the static method SqlConnection.ClearAllPools(), with documentation explaining what it does, but little to nothing explaining when to call it. It seems like calling it at the beginning of Application_Start and the end of Application_End in my global.asax.cs is a good safety measure to protect the app from poisoned connection pools, though it would of course incur a performance hit on startup/shutdown times.
Is what I've described a good practice? Is there a better one? The goal is to allow a simple app restart to reset an app's mangled connection pool without having to restart the OS or the SQL Server service, which would affect many other apps.
Any guidance is much appreciated.
When a process dies, all network connections are always, always, always closed immediately. That happens at the TCP level, has nothing to do with ADO.NET, and applies to all applications. Kill the browser, and all downloads stop. Kill the FTP client, and all connections are closed immediately.
Also, the connection pool is per process. Clearing it when starting the app is useless because the pool is empty; clearing it at shutdown is not necessary because all connections will be (gracefully) shut down moments later anyway.
Probably, your app is not returning connections to the pool. You must dispose of all connections after use in all cases. If you fail to do that, dangling connections will accumulate for an indefinite amount of time.
Clearing the pool does not free up dangling connections because those appear to be in use. How could ADO.NET tell that you'll never use them again? It can't.
Look at sys.dm_exec_connections to see who is holding connections open. You might increase the ADO.NET pool size as a stop-gap measure. SQL Server can take over 30k connections per instance. You'll normally never saturate that.
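The disposal discipline itself is language-agnostic. As a hedged illustration in Java/JDBC terms (the C# equivalent being a using block), the reader, statement and connection below are closed deterministically, so the physical connection always returns to the pool even when an exception is thrown; the class, DataSource and table names are made up for the example:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

public class OrderRepository {

    private final DataSource dataSource; // a pooled DataSource provided by your driver or pool library

    public OrderRepository(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public int countOrders() throws SQLException {
        // try-with-resources closes the ResultSet, PreparedStatement and Connection
        // in reverse order even when an exception is thrown, so no connection is
        // left dangling and "in use" from the pool's point of view.
        try (Connection conn = dataSource.getConnection();
             PreparedStatement stmt = conn.prepareStatement("SELECT COUNT(*) FROM orders");
             ResultSet rs = stmt.executeQuery()) {
            rs.next();
            return rs.getInt(1);
        }
    }
}

In ADO.NET the same effect comes from wrapping SqlConnection, SqlCommand and SqlDataReader in using blocks (or try/finally with Dispose).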

ASP.NET connection pool question

Does the same connection string used on two different physical servers, hosting different web applications that talk to the same database, draw connections from the same connection pool? Or are pooled connections confined to the application level?
I ask because I inherited a 7-year-old .NET 1.1 web application which is riddled with in-line SQL and unclosed, undisposed SqlConnection and DataReader objects. Recently, I was tasked with writing a small web app that is hosted on another server and talks to the same database, and it therefore uses the same database connection string. I created a LINQ object to read and write the one table required by the app. Now the original .NET 1.1 app is throwing exceptions like
"Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached."
Maybe these are unreleated, but wanted to get your opinions to make sure I cover all my bases.
Thanks!
There is no way connections can be pooled between two separate machines. Your SQL Server does, however, have a limit on total connections.
This error is most likely occurring because the application is not returning connections to the connection pool, which happens when connections are not disposed of correctly. That can be due to poor code (does it use a using block, or a try/catch/finally?) or because a SqlDataReader keeps the connection open after the code that executed the SQL has exited.
Connection pools are kept in your application pool, so it shouldn't be possible for a separate machine to steal connections out of another box's pool. Have a look here for some info on the connection pool. I'd also recommend enabling the performance counters (see the bottom of that article) to see what's going on in there a bit more.
You might also want to check the maximum number of connections on SQL Server. In Management Studio:
Right-click on the server name --> Properties --> Connections
Look for "Maximum number of concurrent connections (0 = unlimited)".
