Oracle configuration to keep a firewall connection open during time-consuming DB transactions - oracle11g

I have an EAR app running on WAS that connects to an Oracle DB to run a stored proc.
Now the stored proc takes a long time to run, so the firewall between WAS and the Oracle server closes the connection after 30 minutes. Is there an Oracle configuration that will keep the connection open through the firewall? Increasing the firewall timeout is not an option here.

If the firewall is closing the connection because of inactivity, you can set the sqlnet.ora parameter sqlnet.expire_time on the server to ping the client every N minutes. Normally, this is used for dead connection detection so that the server can determine that a client application died with an open connection. But that may work to prevent the firewall from deciding that the connection has been inactive too long. If, on the other hand, the firewall simply disallows connections that last longer than 30 minutes regardless of inactivity, this setting won't have any impact.
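For reference, a minimal sketch of that setting in the server-side sqlnet.ora (the 10-minute value is just an illustration; pick something comfortably below the firewall's 30-minute idle limit):
# server-side sqlnet.ora
# probe idle client connections every 10 minutes (dead connection detection)
SQLNET.EXPIRE_TIME = 10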
Architecturally, do you really want your application making a 30-minute stored procedure call, though? It would seem more appropriate for the application to make a call that submits a job which runs the stored procedure asynchronously. The web application can then periodically poll the database to see whether the job is still running if you want to display some sort of progress bar to the user.
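A rough sketch of that asynchronous approach, assuming the long-running procedure is called MY_LONG_PROC (the procedure and job names here are placeholders):
BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name   => 'RUN_MY_LONG_PROC',   -- placeholder job name
    job_type   => 'STORED_PROCEDURE',
    job_action => 'MY_LONG_PROC',       -- placeholder procedure name
    enabled    => TRUE,                 -- start the job immediately
    auto_drop  => TRUE);                -- remove the job definition once it finishes
END;
/
The call that submits the job returns immediately, and the application can poll USER_SCHEDULER_JOBS (or a status table the procedure maintains) to drive a progress indicator.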

Related

Is it possible to specify a connection timeout for Maxscale connections from applications?

I have set up a two-node MariaDB Galera cluster on Ubuntu systems. A simple application connects to a database through MaxScale and it works fine. But when the cluster node currently in use, say node 1, fails, the application gets errors such as 1927 or 1045. On receiving this error, the application tries to connect to the database again; it keeps failing many times and only succeeds once failover from node 1 to node 2 is complete and MaxScale hands the application a connection to node 2. The reconnection attempts last from 20 to 50 seconds in my cluster environment.
My question is whether there is any MaxScale connection-timeout parameter I can set to a value such as 50 seconds so that the application tries just once for a new connection instead of retrying many times. (I used the connectTimeout parameter in the JDBC URL for the database, but it was not effective for my application, and I think this is expected.)
MaxScale is most likely sending errors because no master server is available. This cannot be prevented with MaxScale 2.2, and client-side reconnection is required.
In MaxScale 2.3, a new feature will be available that allows behavior similar to what you describe (see MXS-1501).
If you are performing read-only requests, it might be beneficial to enable master_failure_mode=error_on_write. This allows read-only requests to be served even when no master server is available.
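A sketch of where that option goes in maxscale.cnf, assuming a readwritesplit service (the service name, server names, and credentials are placeholders):
# maxscale.cnf fragment (names and credentials are placeholders)
[Split-Service]
type=service
router=readwritesplit
servers=node1,node2
user=maxscale_user
password=maxscale_pw
# keep serving read-only queries when no master is available
master_failure_mode=error_on_write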

Issues with an NCache replicated cluster cache running on a single instance after the other instance goes offline

We have a replicated cluster cache set up with two instances. Everything runs well when both instances are online. We are using Community Edition 4.8.
When we take one instance offline, cache management becomes very slow: even stopping and starting the cache from the NCache Manager GUI takes a very long time and then shows a message stating that an instance is unreachable.
Also, when trying to fetch data from the cache or add data to it, we get an operation-timeout exception, and there is no response from the single instance that is still running.
From what I understand, this scenario should be handled by the cache service itself, since it is replicated, and it should handle the failure of an instance going offline.
Thank you,
I would like to explain the cause of the slowness in your application when one of the server nodes is removed from the cache cluster.
Whenever a node is removed from the cache cluster, the surviving node(s) go into a recovery process and try to re-establish a connection with the downed node. By default this connection-retry value is set to "2", which means the surviving nodes try to reconnect with the downed node twice; only after those reconnection attempts have failed does the cluster consider the downed server offline and start handling requests as before. Each reconnection attempt can take up to 90 seconds, the default TCP/IP timeout interval, so with the connection retry set to "2" the recovery process can take around 200 seconds in total.
Your application (or NCache Manager calls) can experience slowness or request timeouts during this 2-3 minute window while the cluster is in recovery mode, but once the recovery process is finished the application should start working again without any issues. If the slowness or request timeouts last more than a few minutes, something else is going on and the issue needs to be investigated further.
The connection-retry value can be changed in the NCache "Config.ncconf" file. Increasing the number of connection retries means the cluster spends more time in the recovery process. The purpose of this feature is that if there is a network glitch in the environment and the server nodes lose their connections with each other, the servers reconnect automatically through this recovery process. This is why it is recommended to keep the connection-retry value set to at least 1.
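If it helps, the setting being described typically appears in Config.ncconf as a connection-retries attribute on the cache's cluster settings; the exact element name and attribute spelling vary between NCache versions, so treat this fragment as an assumption and compare it against your own Config.ncconf:
<!-- Config.ncconf fragment; element and attribute names may differ by NCache version -->
<cluster-settings connection-retries="2" />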

Connection pool is full

On my IIS server I have many application pools (6 to 7), and there are many ASP.NET applications running in each of them (around 25 applications per pool). They all connect to an Oracle database using ADO.NET.
All applications work just fine, but sometimes we get an error like
Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.
I know the likely causes for this, such as not closing our database connections properly. So here is my headache... I don't want to go through each and every project to find where we forgot to close connections; that is a very time-consuming task for us.
So is there any way to identify which application is leaving connections open? Can we see it from IIS itself? Can we build some kind of utility to track which project is leaving connections open?
I'm not sure that it's a problem with the connection to the database. I think your applications are not disposing the context, so the garbage collector can't clear the memory. You can try reducing the recycling interval of your application pools and then check whether your memory usage decreases or not.
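Whatever the root cause turns out to be, the usual fix for leaked pooled connections is deterministic disposal with using blocks. A minimal sketch assuming the managed ODP.NET provider (the connection string and query are placeholders):
using System;
using Oracle.ManagedDataAccess.Client;   // assumes the managed ODP.NET provider is used

static class OrderQueries
{
    public static int GetOrderCount(string connectionString)
    {
        // 'using' guarantees Dispose even when an exception is thrown,
        // which returns the underlying connection to the pool.
        using (var conn = new OracleConnection(connectionString))
        using (var cmd = new OracleCommand("SELECT COUNT(*) FROM orders", conn))   // placeholder query
        {
            conn.Open();
            return Convert.ToInt32(cmd.ExecuteScalar());
        }
    }
}
On the database side, querying V$SESSION and grouping by MACHINE, PROGRAM, and MODULE is one way to see which client application is holding the most sessions open.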

Calling SqlConnection.ClearAllPools() in Application_Start & Application_End?

We are trying to diagnose an issue that occurred in our production environment last week. Long story short, the database connection pool seemed to be full of active connections from our ASP.NET 3.5 app that would not clear, even after restarting the application pool and IIS.
The senior DBA said that because the network connections occur at the operating system level, recycling the app and IIS did not sever the actual network connections, so SQL Server left the database connections to continue running, and our app was still unable to reach the database.
In looking up ways to force a database connection pool to reset, I found the static method SqlConnection.ClearAllPools(), with documentation explaining what it does, but little to nothing explaining when to call it. It seems like calling it at the beginning of Application_Start and the end of Application_End in my global.asax.cs is a good safety measure to protect the app from poisoned connection pools, though it would of course incur a performance hit on startup/shutdown times.
Is what I've described a good practice? Is there a better one? The goal is to allow a simple app restart to reset an app's mangled connection pool without having to restart the OS or the SQL Server service, which would affect many other apps.
Any guidance is much appreciated.
When a process dies, all of its network connections are always, always, always closed immediately. That happens at the TCP level, has nothing to do with ADO.NET, and goes for all applications. Kill the browser, and all downloads stop. Kill the FTP client, and all connections are closed immediately.
Also, the connection pool is per process. So clearing it when starting the app is useless because the pool is empty. Clearing it at shutdown is not necessary because all connections will (gracefully) shut down any moment.
Probably, your app is not returning connections to the pool. You must dispose of all connections after use in all cases. If you fail to do that, dangling connections will accumulate for an indefinite amount of time.
Clearing the pool does not free up dangling connections because those appear to be in use. How could ADO.NET tell that you'll never use them again? It can't.
Look at sys.dm_exec_connections to see who is holding connections open. You might increase the ADO.NET pool size as a stop-gap measure. SQL Server can take over 30k connections per instance. You'll normally never saturate that.
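A sketch of the kind of query meant here, joining sys.dm_exec_sessions so you can see which host and program own each connection:
-- one row per connection, with the client host/program that owns it
SELECT c.session_id,
       c.connect_time,
       c.client_net_address,
       s.host_name,
       s.program_name,
       s.login_name,
       s.status
FROM   sys.dm_exec_connections AS c
JOIN   sys.dm_exec_sessions    AS s ON s.session_id = c.session_id
ORDER BY c.connect_time;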

Can low memory on IIS server cause SQL Timeouts (SQL Server on separate box)?

I have an IIS web server that hosts 400 web applications (distributed across 30 application pools). They are a mix of ASP.NET applications and WCF service endpoints. The server has 32 GB of RAM and usually runs fast, although it sits at 95% memory usage. Worker processes each take between 500 MB and 1.5 GB of RAM.
I also have another box running SQL Server. That one has plenty of free memory.
Sometimes, the web server starts throwing SQL timeout exceptions: a few per minute at first, rapidly increasing to hundreds per minute, effectively taking the server down. This problem affects applications in all pools. Some requests still complete, but most of them don't. While this happens, the CPU usage on the server is around 30% (which is the normal load on that box).
While this is happening, we can still use SQL Server Management Studio (from the IIS Server) to execute requests successfully (and fast).
The fix is to restart IIS. And then everything goes back to normal until the next time.
Because the server is running with very low memory, I feel like this is the cause. But I cannot explain the relationship between low memory and sudden bursts of SQL Timeout exceptions.
Any idea?
Memory pressure can trigger paging and garbage collection. Both introduce latency which would not be present otherwise.
GC'ing 32GB of data can take seconds. Why would all app processes GC at the same time? Because at about 95% memory utilization Windows sets a "low memory" event that the CLR listens to. It will try to release memory to help other processes.
If the applications get into a paging frenzy that would also explain huge delays in normal execution.
This is just guessing, though. You can try proving it by looking at the "Hard page faults/sec" counter. There also must be a counter for "full GC" or "Gen 2 GC".
The fix would be to run with a larger margin below the physical memory limit.
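A small console sketch for sampling the counters mentioned above (the category and counter names are standard Windows/.NET performance counters; "_Global_" aggregates all managed processes on the box):
using System;
using System.Diagnostics;
using System.Threading;

class MemoryPressureProbe
{
    static void Main()
    {
        // Hard page faults that had to be resolved by reading from disk.
        var pageReads = new PerformanceCounter("Memory", "Page Reads/sec");
        // Cumulative count of full (Gen 2) garbage collections across managed processes.
        var gen2 = new PerformanceCounter(".NET CLR Memory", "# Gen 2 Collections", "_Global_");

        while (true)
        {
            // Rate counters need two samples; the first NextValue() just sets a baseline.
            Console.WriteLine("page reads/sec: {0:F1}   gen 2 collections: {1:F0}",
                              pageReads.NextValue(), gen2.NextValue());
            Thread.Sleep(1000);
        }
    }
}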
The first problem is to discover where the timeout is happening. Can you tell from the stack trace if the timeout is happening when executing a request against the database, or when connecting to the database? (Or even connecting to the web server?)
Timeouts executing database requests can have a variety of causes. The problem might be in the database: blocking processes, database maintenance (also locking), deadlocks, etc. When the apps are running slowly, do you see a lot of entries in sys.dm_exec_requests, and if so, what are their wait types?
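A sketch of that check (standard DMV columns; session_id > 50 just filters out most system sessions):
-- active requests, their wait types, and any session blocking them
SELECT r.session_id,
       r.status,
       r.command,
       r.wait_type,
       r.wait_time,
       r.blocking_session_id,
       r.total_elapsed_time
FROM   sys.dm_exec_requests AS r
WHERE  r.session_id > 50;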
Even if you can run SQL in the query window while the web server is timing out, that doesn't mean there isn't massive blocking or deadlocking going on.
If it is a timeout connecting to the database, then it is possible the ADO connection pools are overwhelmed and not getting cleaned up, or the database has a connection limit, and the web services are timing out waiting for a connection.
One of the best ways to find out what is going on is to capture a memory dump of the w3wp.exe process and analyze it. Even if you aren't adept at a debugger like WinDbg, Microsoft's DebugDiag tool can produce some nice reports with helpful information.
SqlCommand.CommandTimeout
This property is the cumulative time-out for all network reads during command execution or processing of the results. A time-out can still occur after the first row is returned, and does not include user processing time, only network read time.
It is a client-side timeout. If work is getting queued up due to memory constraints, that could cause a timeout.
Are you retrieving a lot of data with these queries?
If some queries return a lot of data, consider breaking them up and giving the user Next and Previous buttons.
Have you considered going asynchronous, for example with BeginExecuteReader?
The advantage is that there is no timeout, and it does not tie up the calling thread.
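// Note: on .NET Framework versions before 4.5, the connection string must include
// "Asynchronous Processing=true" for the Begin/End async command APIs to work.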
isExecutingFTSindexWordOnce = true;
sqlCmdFTSindexWordOnce.BeginExecuteNonQuery(callbackFTSindexWordOnce, sqlCmdFTSindexWordOnce);
// isExecutingFTSindexWordOnce set to false in the callback
Debug.WriteLine("Calling thread active");
But I agree with your comment about how to respond to the request, since the answer does not come back on the calling thread.
Sorry, I am used to WPF, where I just update a public property in the callback.
