website down with mariadb "too many connections" error - mariadb

I am running a single high-visited website on a high-end Centos 7 VPS (16 vCore / 128 GB of RAM) running Plesk Onyx on
Centos 7 / MariaDB 10.1 / PHP-FPM 5.6 setup.
Everything is usually smooth and fast, but it happened twice in a year that the website went down with the message "Too Many Connections" from MariaDB.
Being in a hurry to restore website I launched a " service mariadb restart " without actually launching a SHOW PROCESSLIST.
I checked mariadb logs and web server logs afterwards and I haven't find anything useful to troubleshoot the issue.
Note that when it happened first time, I raised the max_connections value to 300 in my.cnf and constantly monitored the "max_used_connections" variabile seeing that value never went over 50 so I guessed it happened because of some DDOS attack or malicious attempt.
Questions :
Any advice on how to troubleshoot this ?
How can I be alerted if the max_used_connections value is approaching the max_connections value ? Any tool ?
I am using external pingdom service to check website uptime but it didn't detect this kind of problem (the web response is 200 OK) and also a netdata instance on the server (https://netdata.io/) that didn't help...

Troubleshoot it by turning on the slowlog, preferably with a low value for long_query_time (such as "1"). Probably some naughty query will show up there.
Yes, do SHOW FULL PROCESSLIST next time. (Note "FULL".) Instead of restarting mysqld, look for the offending query. It will have one of the highest values in Time and it probably won't be in Sleep mode. It may be something potentially long like ALTER or a dump. Killing that one process will probably uncork the problem, and the problem will vanish in, perhaps, seconds.
Deleting a file that is "open" by a process (such as mysqld) will not help -- disk space is not recycled until all processes have closed the file. Killing the process closes any open files. Some logs are can be handled with FLUSH LOGS; -- this should be harmless, though it may not help.
If your tables are MyISAM, switching to InnoDB will avoid many cases of table locks (if that is what you are experiencing).
What is the value of innodb_buffer_pool_size? For that sized RAM, about 80G is reasonable.
There might be some clues in the GLOBAL STATUS; see http://mysql.rjweb.org/doc.php/mysql_analysis#tuning for analyzing it. (Caution: It will be useless immediately after a reboot.)

Related

Teradata Express Does not respond after restart

Instead of commenting on thread Teradata Viewpoint on Teradata Express 16.10 is not working, I am creating new thread but it is with respect to linked so added this link to post,
I have restarted Teradata Express with command tpareset -x "increasing memory size" and increased the memory, but after restart, queries on system are running very slowly and sometimes SQl Assistant goes into Not Respoding state.
To resolve this temporary, I used tpa stop and then tpa start, after this, system runs smoothly without lag but sometimes later it again goes into slow state.
Not sure what is happening here, can someone guide on how to resolve it permanently ?

Debugging MySQL: Too many connections

After we deployed the new version of our ASP.NET C# app with a MySQL DB we are having issues with the connections.
Yesterday I got the "Too many connections" error and I'm watching the open connections with
SHOW FULL PROCESSLIST and they keep increasing during the day.
Is there a good way to figure out where our bug could be? Like checking the last query that a sleeping connection made?
Make sure your app is closing connection to db properly....if your app not closing connections then you will get above error
Connection pooling usually solved this problem. In your case, connections seem to stay open much too long, which means that in some branches of your software, there's not definitive finally that closes the connection after it has been used. It's especially useful to diagnose problematic connection usage at a central point, because it can keep an eye on the number of open connections at any given time and maybe alert somebody to make a dump to analyze later.
You could also increase the number of connections allowed to your MySQL instance. The settings in your my.cnf is "max_connections".
Lastly, if you want you can try to decrease the "wait_timeout" or "interactive_timeout" properties of your instance. These settings regulate atomatic closing of connections after certain amounts of time.

Postgresql Database Slow Down

I have installed PostgreSQL 9.1 server on my production server.
I have Changed all the configuration(postgresql.conf) according to system.
Everything has been working fine for a week.
After this suddenly, postgresql server becomes very slow.
Even for count(*) query on a table.It is taking too much time.
After this I have done so many activities like:
Monitor Load on system :- Normal within range of 0.5 to 1.5.
Monitor No of users logged in application :- 200 to 400. Normal
Recreated Index
Kill the ideal transactions.
Check Locks (No DeadLock Found)
Application server restarted.
Database server restarted.
After doing this all activities the performance of database server is not increased.
It is taking so much time for normal queries also.
Then I drop the database and recreated then Performance Increases
Everything working after recreating the database.
But after some days suddenly performance goes down.
This sounds like the active portion of the database is growing large enough that it doesn't fit in cache, causing actual disk access (which is orders of magnitude slower) for many of your reads. This is often caused by not vacuuming aggressively enough.
Other factors could be related to your configuration of PostgreSQL and the operating system. It's hard to give much advice without knowing:
exactly what version of PostgreSQL you're using (9.1 tells us the major release, but the minor release sometimes matters),
how you have PostgreSQL configured,
what OS you're running on,
what hardware you are using (cores, RAM, drive arrays, controllers), etc.
Part of that can be supplied by posting the results of running the query on this page:
http://wiki.postgresql.org/wiki/Server_Configuration
It might also help to select relname, relpages, and reltuples from pg_class for the tables involved and compare numbers when things are running well to when they are slow.
With the additional information, people should be able to make some pretty specific recommendations.

What Causes "Internal connection fatal errors"

I've got a number of ASP.Net websites (.Net v3.5) running on a server with a SQL 2000 database backend. For several months, I've been receiving seemingly random InvalidOperationExceptions with the message "Internal connection fatal error". Sometimes there's a few days in between, while other times there are multiple errors per day.
The exception is not limited to one site in particular, though they share business and data access assemblies. The error seems to always be thrown from SqlClient.TdsParser.Run(). It sometimes is thrown from old-school direct SqlCommand.Execute() calls, while other times it is thrown from Linq2Sql code.
I've been assured by the network guys that there are no errors or packets lost on their end. Has anyone else experienced this? Could it be a driver problem? We have been unable as of yet to pinpoint a specific trigger for this exception.
We're running II6 on Windows Server 2003.
After a few months of ignoring this issue, it started to reach a critical mass as traffic gradually increased. Under heavy load, including some crawlers, things got crazy and these errors poured in nonstop.
Through trial and error, we eventually tracked down a handful of SqlCommand or LINQ queries whose SqlConnection wasn't closed immediately after use. Instead, through some sloppy programming originating from a misunderstanding of LINQ connections, the DataContext objects were disposed (and connections closed) only at the end of a request rather than immediately.
Once we refactored these methods to immediately close the connection with a C# "using" block (freeing up that pool for the next request), we received no more errors. While we still don't know the underlying reason that a connection pool would get so mixed up, we were able to cease all errors of this type. This problem was resolved in conjunction with another similar error I posted, found here: Why is my SqlCommand returning a string when it should be an int?
Sounds like the database connection is getting dropped or timing out.
We recently had similar issues moving to IIS 6 from IIS 5 connecting to SQL 2000. Our issue was solved by increasing number of ephemeral ports available.
Look at the usage of the ephemeral ports by the IIS server. The default max no. of ports available is normally 4000. You might want to consider increasing this if the sites on your server are particularly busy or your application is making a lot of database calls.
You can monitor these first to see if going over max limit.
Search Microsoft Knowledge base for "MaxUserPort" and "TcpTimedWaitDelay" and make necessary registry changes. Make sure you back up registry or snapshot server before making the changes. Will need to reboot for changes to take effect.
You should double check your database and recordset connection are being closed after use. Not closing will use up this port range unnecessarily.
Check the efficiency of your stored procedures anyway as they might be taking longer than they need too.
"If you rapidly open and close 4000 sockets in less than four minutes, you will reach the default maximum setting for client anonymous ports, and new socket connection attempts fail until the existing set of TIME_WAIT sockets times out." - from http://support.microsoft.com/kb/328476
Check your server's LOG folder (\program files\Microsoft SQL Server\MSSQL.1\MSSQL\LOG or similar) for files named SqlDump*.mdmp and SqlDump*.txt. If you do find any you'll have to take it to Product Support.
I was creating a new EF Core project and was trying to create the database to an external Linux server instead of a Windows Server or local one. After hours of searching I found out that I am using MySQL instead of the Microsoft SQL server.
I found it weird that everyone was using 1433 instead of the usual 3306. So to fix my 'Internal connection fatal error' I had to set up a docker instance of SQL Server bound to its default port of 1433.
It literally was that simple. In the docker repo look for "microsoft-mssql-server" and run the image as described neatly in the description below. Everything works now and I am able to push my database from my EF Core project to an external server.

Slow BizTalk File Receive

I have an application with a file receive location. After the host instance has been running for a few hours the receive location fails to identify new files dropped into the folder that it is monitoring. It doesn't forget about them altogether, it's just that performance grinds to a crawl. The receive location is configured to poll the target folder every 60 seconds but after host instance has been running for an hour or so, then it seems that the target folder is being polled only every thirty minutes. If I restart the host instance then the files waiting in the target folder are collected right away and performance is fine for the next hour or so.
The same application runs fine in a different environment.
There are now obvious entries in the event log related to the problem.
All the BizTalk SQL jobs are running fine except for Backup BizTalk Server (BizTalkMgmtDb).
Any suggestions gratefully received.
Thanks
Rob
Here are some additional tools which may help you identify and diagnose BizTalk database issues.
BizTalk MsgBox Viewer
Here is a tool to repair identified errors:
Terminator
Use at your own risk... read the glogs and docs. Start with the message box viewer and let us know our results.
Without more details, the biggest tell is that your Backup Job is failing. If the backup job is failing, it may not be properly configured. If it is properly configured and still failing, then you've got other issues. Can you give us some more information about your BizTalk install.
What version are you running?
What are our database sizes?
What are your purge and archive settings like?
Is there any long running blocks in your SQL Server DB coming from BizTalk?
Another thing to consider is the user accounts the send, receive and orchestration hosts are running under. Please check the BizTalk Administration Console. If they are all running the same account, sometimes the orchestrations can starve the send and receive processes of CPU time. I believe priority is given to orchestrations then receive, then send. Even if you are just developing, it is useful to use separate accounts for this. This also improves security.
The Wrox BizTalk Server 2006 will also supply tuning advice.
What other things are going on with the server? Is BizTalk pegged otherwise or is it idle?
You mention that the solution does not have any problems in another environment, so it's likely that there is a configuration problem.
Check the following:
** On SQL Server, set some upper memory limit for SQL Server. By default, SQL Server uses whatever it can get and then hangs onto it, so set a reasonable limit so that your system can operate without spending a lot of time paging memory onto and from your hard drive(s).
** Ensure that you have available disk space - maybe you are running low - this can lead to all kinds of strange problems.
** Try to split up the system's paging file among its physical drives (if you have more than one drive on the system). Also consider using a faster drive, or if you have lots of cash laying around, get a SAN.
** In BizTalk, is tracking enabled? If so, are you also tracking message bodies? Disable tacking or message body tracking and see if there is a difference.
** Start performance monitor and monitor the following counters when running your solution
Object: BizTalk Messaging
Instance: (select the receiving host) %%
Counter: Documents Received/Sec
Object: BizTalk Messaging
Instance: (select the transmitting host) %%
Counter: Documents Sent/Sec
Object: XLANG/s Orchestrations
Instance: (select the processing host) %%
Counter: Orchestrations Completed/Sec.
%% You may have only one host, so just use it. Since BizTalk configurations vary, I am using generic names for hosts.
The preceding counters monitor the most basic aspects of your server, but may help to narrow down places to look further. You can, of course, add CPU and Memory too. If you have time (days...maybe weeks) you could monitor for processes that allocate memory and never release it. Use the following counter...
Object: Memory
Counter: Pool Nonpaged Bytes
Slow decline of this counter indicates that a process is not releasing memory, which affects everything on the system.
Let us know how things turn out!
I had the same problem with, when my orchestration was idle for some time it took a long time to process the first msg. A article of EvYoung helped me solve this problem.
"This is caused by application domain unloading within the BizTalk host process. If an AppDomain is shutdown after idle, the next message that comes needs to wait for the Orchestration to compile again. Depending on the complexity of your design, this can be a noticeable wait. To prevent this in low latency requirement scenario, you can modify the BTSNTSVC.EXE.config file and set SecondsIdleBeforeShutdown property to -1. This will prevent AppDomain shutdown due to idle."
You can find the article in here:
http://blogs.msdn.com/b/biztalkcpr/archive/2008/05/08/thoughts-on-orchestration-performance.aspx
It took me to long to respond but i thought i might help someone. cheers :)
Some good suggestions from others. I will add :
Do you have any custom receive pipeline components on the receive location ? If so perhaps one is leaking memory, calling some external component eg database which is taking a long time ?
How big are the files you are receiving ?
On the File transport properties of your receive location, set "file renaming" on, do the files get renamed within 60s.

Resources