First some background:
1. Understand how COnnection Pooling is being used by NPGSQL in ASP.NET REST API
Environment:
- We have a REST API controller that queries first a list of items (to RDS) then per each item in this list we need to obtain some additional values so we use a Parallel.ForEach statement
Every time we use a connection we dispose it properly
I've seen that every time this endpoint is called the number of connections increase and then they are removed ok.
Process:
I've followed http://www.npgsql.org/doc/performance.html#performance-counters to check on how NPGSQL is handling connections, also added the following to the connectionstring:
"CommandTimeout=50000;TIMEOUT=1024;POOLING=True;MINPOOLSIZE=1;MAXPOOLSIZE=100;Use Perf Counters=true;"
but I found a strange outcome:
NumberOfNonPooledConnections and NumberOfPooledConnections is always the same in my case (56) we are using a Parallel.ForEach to query several items.
The value for NumberOfActiveConnectionPools is 1.
At first I couldn't really understand how this is working, was it really using the connection pool ?
Then I stop the process removed the ";POOLING=True;" from the connection string and I have the same result.
Finally I set ";POOLING=false;" and execute again, now the NumberOfPooledConnections went to the roof it reached 2378, and then it started timing out opening new connections.
I also noted in RDS performance metrics that the number of connections never exeeced 110 connections.
So the questions would be:
What would be the criteria to set the MaxPoolSize parameter ? 100 seems the usual.
In ASP.NET the connection pool is handled by instance ? so all connections being made from the same Application Pool in IIS will be reused or is per execution?.
First, ASP.NET (the web side) has absolutely no effect on Npgsql's connection pooling or on ADO.NET in general, so it's better to reason about Npgsql and ADO.NET without thinking about web.
Second, you aren't saying which version of Npgsql you're using.
Beyond that, before looking at performance counters, what exactly is the problem you are seeing? Are you seeing too many connections at the PostgreSQL side? You can check this by querying pg_stat_activity.
If Npgsql pooling is on (Pooling=true in the connection string, it's also the default), then when you call NpgsqlConnection.Open() a physical connection will be taken from the pool if one is available. When you close or dispose that NpgsqlConnection, it will be returned to the pool to be reused later. If you're seeing physical connections going up too much at the PostgreSQL side, that is a probable sign that you are forgetting to close/dispose a connection in your code and you have a leak.
The performance counters feature can be useful to understand what's happening, but unfortunately it isn't well-tested and may contain bugs. So please make sure there's an actual issue before starting to look at it (and at the very least report the Npgsql version you're using).
Related
We have an ASP.NET API web site which connects using NHibernate to a SQL Server.
The problem we are experiencing is that gradually throughout the day, the number of connections to the SQL server creeps up, and there are many connections that do not appear to be returned to the pool. By this, I mean that if I run the following query:
select * from master..sysprocesses s where datediff(minute, s.last_batch, getdate())>10
the number of rows returned just keeps climbing. Nothing in the API should be taking 10 minutes to complete. And there are connections in there from hours ago.
Here's another clue: the open_tran column of all these rows has a value of 1. So it seems to me that somewhere inside the API call, we're creating a transaction boundary, and that transaction is never being closed. Perhaps DTC may have a hand in this (we sometimes do connect to more than one database in a call).
The thing is, I haven't a clue how to troubleshoot this further. I've tried running DBCC INPUTBUFFER on the rogue spids, and there's nothing consistent between them.
What are some of the anti-patterns/other possible causes that might lead to this behavior?
Update: here's how the DB connection is being created. We're using StructureMap for Dependency Injection. We create two DB connections on each unit of work: one "normal" connection for regular read/write access, and an "uncommitted" connection that runs in a transaction with "ReadUncommitted" access (we were having a problem with table locking when reading from large tables).
Here's the code from the DI Registry:
For<ISession>().Transient().Use(context => context.GetInstance<ISessionFactory>().OpenSession());
For<ISessionUncommittedWrapper>().Transient().Use(context => new SessionUncommittedWrapper { Session = context.GetInstance<ISessionFactory>().OpenSession() });
Then, inside the unit of work middleware, we create a UnitOfWork (with a using block, of course), which takes an ISession and an ISessionUncommittedWrapper in the constructor. In the Begin() method, we have:
_uncommittedTransaction = SessionUncommittedWrapper.Session.BeginTransaction(IsolationLevel.ReadUncommitted);
which gets disposed (along with the ISession and ISessionUncommittedWrapper) in the UnitOfWork's Dispose() method.
I eventually found the problem.
The way I found the problem was by creating a logging table that tracked the creation and disposal of Sessions, along with the URI of the endpoint called. By querying all the undisposed connections, I found that in every case where the connection was not disposed, the path began with "/signalr".
<facepalm>D'oh!</facepalm>
Since the OWIN middleware was proactively creating the Sql connections, it was also doing so for SignalR, which in its nature, keeps the transaction open! So every client that logged in with SignalR was hogging two Sql connections.
I made the appropriate changes to exclude SignalR connections from the middleware, and now we have no more hanging Sql connections.
I'm curious how should I structure JSON REST API server in Go language with Mgo library. I have dozens of collections related with each other. I've created the gist with sample part of file structure in my current approach.
It works great, but from time to time I encounter downtime caused by this error: "read tcp 10.168.30.100:37288: i/o timeout". I suppose that I handle mgo connection pool inapropriately. Are there any examples showing how should I create big applications based on mgo?
This error message implies a roundtrip to the database took longer than the timeout period you defined. Just increasing that timeout should get rid of the problem, assuming you don't have any real issues that are causing the application to behave in a sluggish manner.
In general, this error doesn't imply you have any kind of scale issues, other than the fact maybe you have an increasing amount of data in some collections and certain queries may be getting too slow and need re-thinking (indexes, etc).
There's also no need to restart the application. You can either Refresh the problematic session, or Close and re-create the session in case you're using copies of a master session. The state of mgo and the pool of connections is still fine. It's just warning you that this specific session observed an issue on the wire, and so you have to acknowledge it before the session will be valid again.
As usual, also make sure to be using the latest release to avoid problems that have already been fixed, if any.
First off, I am not an AS 400 guy - at all. So please forgive me for asking any noobish questions here.
Basically, I am working on a .Net application that needs to access the AS400 for some real-time data. Although I have the system working, I am getting very different performance results between queries. Typically, when I make the 1st request against a SPROC on the AS400, I am seeing ~ 14 seconds to get the full data set. After that initial call, any subsequent calls usually only take ~ 1 second to return. This performance improvement remains for ~ 20 mins or so before it takes 14 seconds again.
The interesting part with this is that, if the stored procedure is executed directly on the iSeries Navigator, it always returns within milliseconds (no change in response time).
I wonder if it is a caching / execution plan issue but I can only apply my SQL SERVER logic to the AS400, which is not always a match.
Any suggestions on what I can do to recieve a more consistant response time or simply insight as to why the AS400 is acting in this manner when I was using the iSeries Data Provider for .Net? Is there a better access method that I should use?
Just in case, here's the code I am using to connect to the AS400
Dim Conn As New IBM.Data.DB2.iSeries.iDB2Connection(ConnectionString)
Dim Cmd As New IBM.Data.DB2.iSeries.iDB2Command("SPROC_NAME_HERE", Conn)
Cmd.CommandType = CommandType.StoredProcedure
Using Conn
Conn.Open()
Dim Reader = Cmd.ExecuteReader()
Using Reader
While Reader.Read()
'Do Something
End While
Reader.Close()
End Using
Conn.Close()
End Using
EDIT: after looking about a bit on this issue and using some of the comments below, I am beginning to wonder if I am experiencing this due to the gains from connection pooling? Thoughts?
I've found the Redbook Integrating DB2 Universal Database for iSeries with Microsoft ADO .NET useful for diagnosing issues like these.
Specifically look into the client and server side traces to help isolate the issue. And don't be afraid to call IBM software support. They can help you set up profiling to figure out the issue.
You may want to try a different driver to connect to the AS400-DB2 system. I have used 2 options.
the standard jt400.jar driver to create a simple java web service to get my data
the drivers from the company called HIT software (www.hitsw.com)
Obviously the first option would be the slower of the two, but thats the free way of doing things.
Each connection to the iSeries is backed by a job. Upon the first connection, the iSeries driver needs to create the connection pool, start a job, and associate that job with the connection object. When you close a connection, the driver will return that object to the connection pool, but will not end the job on the server.
You can turn on tracing to determine what is happening on your first connection attempt. To do so, add "Trace=StartDebug" to your connection string, and enable trace logging on the box that is running your code. You can do this by using the 'cwbmptrc' tool in the Client Access program directory:
c:\Program Files (x86)\IBM\Client Access>cwbmptrc.exe +a
Error logging is on
Tracing is on
Error log file name is:
C:\Users\Public\Documents\IBM\Client Access\iDB2Log.txt
Trace file name is:
C:\Users\Public\Documents\IBM\Client Access\iDB2Trace.txt
The trace output will give you insight into what operations the driver is performing and how long each operation takes to complete. Just don't forget to turn tracing off once you are done (cwbmptrc.exe -a)
If you don't want to mess with the tracing, a quick test to determine if connection pooling is behind the delay is to disable it by adding "Pooling=false" to your connection string. If connection pooling the is reason that your 2nd attempt is much faster, disabling connection pooling should make each request perform as slowly as the first.
I have seen similar performance from iSeries SQL (ODBC) queries for several years. I think it's part of the nature of the beast-- OS/400 dynamically moves data from disk to memory when it's accessed.
FWIW, I've also noticed that the iSeries is more like a tractor than a race car. It deals much better with big loads. In one case, I consolidated about a dozen short queries into a single monstrous one, and reduced the execution time from something like 20 seconds to about 2 seconds.
I have had to pull data from the AS/400 in the past, basically there were a couple of things that worked for me:
1) Dump data into a SQL Server table nightly where I could control the indexes, the native SqlClient beats the IBM DB2 .NET Client every time
2) Talk to one of your AS400 programmers and make sure the command you are using is hitting a logical file as opposed to a physical (logical v/s physical in their world is akin to our tables v/s views)
3) Create Views using a Linked Server on SQL server and query your views.
I have observed the same behavior when connecting to Iseries data from Java solutions hosted on Websphere Application Server (WAS) as well as .Net solutions hosted on IIS. The first call of the day is always more "expensive" than the second.
The delay on the first call is caused by the "setup" time for the Iseries to set up the job to service the request, (job name is QZDASOINIT in subsystem QUSRWRK). Subsequent calls will reuse the existing jobs that stay active waiting for more incoming requests.
The number of QZDASOINIT jobs and how long they stay active is configurable on the Iseries.
One document on how to tune your prestart job entries:
http://www.ibmsystemsmag.com/ibmi/tipstechniques/systemsmanagement/Tuning-Prestart-Job-Entries/
I guess it would be a reasonable assumption that there is also some overhead to the "first call of the day" on both WAS and IIS.
Try creating a stored procedure. This will create and cache your access plan with the stored procedure, so optimizer doesn't have to look in the SQL cache or reoptimize.
A part of the application I'm working on is an swf that shows a test with some 80 questions. Each question is saved in SQL Server through WebORB and ASP.NET.
If a candidate finishes the test, the session needs to be validated. The problem is that sometimes 350 candidates finish their test at the same moment, and the CPU on the web server and SQL Server explodes (350 validations concurrently).
Now, how should I implement queuing here? In the database, there's a table that has a record for each session. One column holds the status. 1 is finished, 2 is validated.
I could implement queuing in two ways (as I see it, maybe you have other propositions):
A process that checks the table for records with status 1. If it finds one, it validates the session. So, sessions are validated one after one.
If a candidate finishes its session, a message is sent to a MSMQ queue. Another process listens to the queue and validates sessions one after one.
Now:
What would be the best approach?
Where do you start the process that will validate sessions? In your global.asax (application_start)? As a windows service? As an exe on the root of the website that is started in application_start?
To me, using the table and looking for records with status 1 seems the easiest way.
The MSMQ approach decouples your web-facing application from the validation logic service and the database.
This brings many advantages, a few of which:
It would be easier to handle situations where the validation logic can handle 5 sessions per second, and it receives 300 all at once. Otherwise you would have to handle copmlicated timeouts, re-attempts, etc.
It would be easier to do maintanance on the validation service, without having to interrupt the rest of the application. When the validation service is brought down, messages would queue up in MSMQ, and would get processed again as soon as it is brought up.
The same as above applies for database maintanance.
If you don't have experience using MSMQ and no infrastructrure set up, I would advice against it. Sure, it might be the "proper" way of doing queueing on the Microsoft platform, but it is not very straight-forward and has quite a learning curve.
The same goes for creating a Windows Service; don't do it unless you are familiar with it. For simple cases such as this I would argue that the pain is greater than the rewards.
The simplest solution would probably be to use the table and run the process on a background thread that you start up in global.asax. You probably also want to create an admin page that can report some status information about the process (number of pending jobs etc) and maybe a button to restart the process if it for some reason fails.
What is validating? Before working on your queuing strategy, I would try to make validating as fast as possible, including making it set based if it isn't already so.
I have recently been investigating this myself so wanted to mention my findings. The location of the Database in comparison to your application is a big factor on deciding which option is faster.
I tested inserting the time it took to insert 100 database entries versus logging the exact same data into a local MSMQ message. I then took the average of the results of performing this test several times.
What I found was that when the database is on the local network, inserting a row was up to 4 times faster than logging to an MSMQ.
When the database was being accessed over a decent internet connection, inserting a row into the database was up to 6 times slower than logging to an MSMQ.
So:
Local database - DB is faster, otherwise MSMQ is.
Having deployed a new build of an ASP.NET site in a production environment, I am logging dozens of data errors every second, almost always with the error "Cannot find table 0." We use datasets and frequently refer to Table[0], and while I understand the defensive coding practice of checking the dataset for tables before accessing Table[0], it's never been a problem in the past. A certain page will load fine one second, and then be missing one of its data-driven components the next. Just seeing if this rings a bell for anyone.
More detail: I used a different build server this time, and while I imagine the compiler settings are the same on both, I have a hard time thinking that there's a switch that makes 50% of my database calls come back with no tables. I also switched the project to VS 2008, but then reverted all of those changes when I switched back to VS 2005. I notice that the built assembly has a new MyLibrary.XmlSerializers.dll, where it didn't used to, but I also can't imagine that that's causing all the trouble. (It also doesn't fall down on calls to MyLibrary, or at least no more than any other time.)
Updated to add: I've discovered that the troublesome build is a "Release" build, where the working build was compiled as "Debug". Could that explain it?
Rolling back to the build before these changes fixed it. (Rebooting the SQL Server, the step we tried before that, did not.)
The trouble also seems to be load-based - this cruised through our integration and QA environments without a problem, and even our smoke test environment - the one that points to production data - is fine under light load.
Does this have the distinguishing characteristics of anything you might have seen in the past?
Bumping this old question because we have encountered the same issue and perhaps our solution would give more insight in what causes this.
Essentially this problem occurs in a production environment that is under very heavy load in a Windows service that uses multiple threads to process several jobs simultaneously (100 users use the same DB via ASP.NET web app and there are about 60 transactions/second on older hardware with SQL Server 2000).
No variables are shared, that is connections are opened anew, transaction is started, operations executed, transaction committed and connection closes.
Under heavy load sometimes one of the following exceptions occurs:
NullReferenceException: Object reference not set to an instance of an
object.
at System.Data.SqlClient.SqlInternalConnectionTds.get_IsLockedForBulkCopy()
or
System.Data.SqlClient.SqlException:
The server failed to resume the transaction. Desc:3400000178
or
New request is not allowed to start because it should come with valid transaction descriptor
or
This SqlTransaction has completed; it is no longer usable
It seems somehow the connection that is within the pool becomes corrupted and remains associated with previously used transactions. Furthermore, if such connection is retrieved from pool then sqlAdapter.Fill(dataset) results in an empty dataset, causing "Cannot find table 0". Because our service would retry the operation (reading job list) on failure and it would always get the same corrupt connection from the pool it would fail with this error until restarted.
We removed the issue by using SqlConnection.ClearPool(connection) on exception to make sure this connection is discarded from the pool and restructuring the application so less threads access the same resources simultaneously.
I have no clue who exactly caused this issue so I am not sure we have really fixed that, maybe just made it so rare it had not occurred again yet.
I've fought precisely this error message before. The key is that an underlying data method is swallowing a timeout exception.
You're probably doing something like this:
var table = GetEmployeeDataSet().Tables[0];
GetEmployeeDataSet is swallowing an exception, probably a timeout exception, which is why it only happens sporadically - it happens under load. You need to do the following to fix it:
Modify the underlying code to not swallow the exception, but rather let it bubble up to the next level so you can identify it properly.
Identify the query(s) causing the problem, and then rewrite, reindex, denormalize or throw hardware at the problem. See this for more info: System.Data.SqlClient.SqlException: Timeout expired
I've seen something similar. I believe our problem had to do with failed sessions being re-used (once the session object failed it went into a poor state and could not recover.) We fixed it by increasing the memory for the session pool and increasing the frequency of the web application recycling.
It also was "caused" by a new version that at first blush did not seem to have any change to cause such an effect. However, eventually it became clear that the logic of the program was opening and closing a lot more connections (maybe 20% more) than it used to. This small change pushed the limit of our prior configuration.
You might check the SQL Server logs for errors. Or, the Web server event log. It sounds like your connection pool could be out of open connections or your db could be out.
Which database calls changed between versions?
The error is obviously telling you one of your database calls isn't returning any data on occasion; I can't think of any cases where a code/assembly issue would cause it.
I have seen something like this when doing something with nHibernate Sessions in a non-thread-safe manner. That would explain why you only see it under load. Would need to see your code to guess at what isn't thread-safe though.