I am running a single high-traffic website on a high-end CentOS 7 VPS (16 vCores / 128 GB of RAM) with a
Plesk Onyx / MariaDB 10.1 / PHP-FPM 5.6 setup.
Everything is usually smooth and fast, but twice in the past year the website went down with the message "Too Many Connections" from MariaDB.
In a hurry to restore the website, I ran "service mariadb restart" without first running SHOW PROCESSLIST.
I checked the MariaDB logs and web server logs afterwards and did not find anything useful for troubleshooting the issue.
Note that after the first occurrence I raised max_connections to 300 in my.cnf and kept monitoring the "max_used_connections" variable. That value never went over 50, so I guessed the outage was caused by a DDoS attack or some other malicious attempt.
Questions:
Any advice on how to troubleshoot this?
How can I be alerted when max_used_connections is approaching max_connections? Is there a tool for this?
I use the external Pingdom service to check website uptime, but it didn't detect this kind of problem (the web response was 200 OK). I also run a netdata instance on the server (https://netdata.io/), and that didn't help either...
Troubleshoot it by turning on the slowlog, preferably with a low value for long_query_time (such as "1"). Probably some naughty query will show up there.
Yes, do SHOW FULL PROCESSLIST next time. (Note "FULL".) Instead of restarting mysqld, look for the offending query. It will have one of the highest values in Time and it probably won't be in Sleep mode. It may be something potentially long like ALTER or a dump. Killing that one process will probably uncork the problem, and the problem will vanish in, perhaps, seconds.
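The selection rule described above (highest Time, not in Sleep) can be sketched in a few lines of Python. This is only an illustration: the column names follow the SHOW FULL PROCESSLIST output, but the function name find_offender and the sample rows are invented, and fetching the rows from the server is left out.

```python
from typing import Optional

def find_offender(rows: list) -> Optional[dict]:
    """Return the longest-running thread that is not idle, or None."""
    # Sleeping connections hold a slot but are not executing anything.
    active = [r for r in rows if r["Command"] != "Sleep"]
    if not active:
        return None
    # Time is the number of seconds the thread has spent in its current state.
    return max(active, key=lambda r: r["Time"])

# Invented sample rows, shaped like SHOW FULL PROCESSLIST output.
sample = [
    {"Id": 11, "Command": "Sleep", "Time": 900, "Info": None},
    {"Id": 12, "Command": "Query", "Time": 2, "Info": "SELECT 1"},
    {"Id": 13, "Command": "Query", "Time": 480, "Info": "ALTER TABLE t ADD COLUMN c INT"},
]

offender = find_offender(sample)
if offender is not None:
    # KILL <id> is the statement you would then run by hand in the client.
    print(f"KILL {offender['Id']};  -- {offender['Time']}s: {offender['Info']}")
```

Long-lived system threads (replication, event scheduler) can also show a large Time, so look at the Info column before killing anything.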
Deleting a file that is "open" by a process (such as mysqld) will not help -- disk space is not recycled until all processes have closed the file. Killing the process closes any open files. Some logs can be handled with FLUSH LOGS; -- this should be harmless, though it may not help.
If your tables are MyISAM, switching to InnoDB will avoid many cases of table locks (if that is what you are experiencing).
What is the value of innodb_buffer_pool_size? For that sized RAM, about 80G is reasonable.
There might be some clues in the GLOBAL STATUS; see http://mysql.rjweb.org/doc.php/mysql_analysis#tuning for analyzing it. (Caution: It will be useless immediately after a reboot.)
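On the alerting question, a small check like the following could be run from cron or a monitoring agent. It is a sketch under assumptions: Threads_connected (current connections) and Max_used_connections (high-water mark since startup) are read from SHOW GLOBAL STATUS, max_connections from SHOW GLOBAL VARIABLES, and the 80% threshold and the function names are arbitrary choices, not a recommendation from the server documentation.

```python
def connection_usage(status: dict, variables: dict, key: str = "Threads_connected") -> float:
    """Fraction of max_connections in use, based on the given status counter."""
    # The server returns these values as strings, so convert explicitly.
    return int(status[key]) / int(variables["max_connections"])

def should_alert(status: dict, variables: dict, threshold: float = 0.8) -> bool:
    """True when current connections reach the (assumed) alert threshold."""
    return connection_usage(status, variables) >= threshold

# Invented sample values; in practice they come from the two SHOW statements.
status = {"Threads_connected": "250", "Max_used_connections": "280"}
variables = {"max_connections": "300"}

print(f"current usage: {connection_usage(status, variables):.0%}")
print(f"peak usage:    {connection_usage(status, variables, 'Max_used_connections'):.0%}")
print("alert" if should_alert(status, variables) else "ok")
```

Checking Threads_connected catches a spike while it is happening. Max_used_connections only shows that a spike occurred at some point since the last restart.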
I'm executing a load test against an application hosted in Azure. It's a cloud service with 3 instances behind an internal load balancer (Hash based load balancing mode).
When I run the load test, requests are queued even though the requests/sec and the total number of current requests to IIS are quite low. I'm not sure what the problem could be.
Any suggestions?
I'm adding a few screenshots of performance counters that might help with the diagnosis.
Edit-1: Per request from Rohit Rajan,
The cloud service has 2 instances (meaning 2 VMs), each with 14 GB of RAM and 8 cores.
I'm running a step load pattern that starts with 100 users and adds 100 to 150 users every 5 minutes, over 4-5 hours, until the load reaches 10,000 VUs.
All calls to external systems are written async. Database calls are synchronous.
There is no straightforward answer to your question. The next step is to investigate further.
Based on your explanation, there seems to be a bottleneck within the application which is causing the requests to queue-up.
In order to investigate this, collect a memory dump when you see the requests queuing up and then use DebugDiag to run a hang analysis on it.
There are several ways to gather the memory dump.
Task Manager
Procdump.exe
Debug Diagnostics
Process Explorer
Once you have the memory dump, install DebugDiag and run an analysis on it. It will generate a report that can help you get started.
Debug Diagnostics download: https://www.microsoft.com/en-us/download/details.aspx?id=49924
I have 3 machines - one which is IIS, one with a database and one from which I test the efficiency of my application - which means:
Using The Grinder I run 1000 instances of my application (hosted on the IIS and operating with the database on the machine with SQL Server). And using perfmon I observe that there really are 1000 requests.
BUT the problem is that connecting to this application (IIS) from another computer is very slow. I suppose there is some bottleneck on the IIS side but I cannot find it - CPU usage is less than 10%.
I think I changed every option in the IIS Manager and machine.config and web.config files - nothing seems to have any effect.
The first thing is to confirm whether you actually have a slowness issue while browsing the site.
Check the IIS logs and look at the time-taken field (recorded in milliseconds). If the time taken is more than 10 seconds, the request is considered slow.
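The log check can be scripted. The sketch below assumes the W3C extended log format, where a "#Fields:" directive names the space-separated columns and time-taken is in milliseconds. The function name slow_requests and the sample log lines are invented for illustration.

```python
def slow_requests(lines, threshold_ms=10_000):
    """Return (uri, time_taken_ms) for requests slower than threshold_ms."""
    fields = []
    slow = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The directive lists the column names for the rows that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue  # skip other directives, blanks, rows before a header
        row = dict(zip(fields, line.split()))
        if "time-taken" not in row:
            continue  # time-taken logging is not enabled for this log
        taken = int(row["time-taken"])
        if taken > threshold_ms:
            slow.append((row.get("cs-uri-stem"), taken))
    return slow

# Invented sample log content.
sample_log = """#Software: Microsoft Internet Information Services 8.5
#Fields: date time cs-method cs-uri-stem sc-status time-taken
2015-01-01 10:00:00 GET /default.aspx 200 120
2015-01-01 10:00:05 GET /report.aspx 200 15400
""".splitlines()

for uri, ms in slow_requests(sample_log):
    print(f"{uri} took {ms / 1000:.1f}s")
```

If time-taken does not appear in the "#Fields:" line, it has to be enabled in the IIS logging settings first.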
The slowness can have several causes. It might be the network, or it might be something in your code.
If you suspect the network, I suggest capturing a network trace with Netmon or Wireshark.
If it is not the network, you can collect a process dump with the Debug Diag 2 Update 2 tool.
The link below explains how to collect the dumps and analyze them to find the source of the slowness:
https://msdn.microsoft.com/en-us/library/ff420662.aspx
On a 5-node Riak cluster, we have observed very slow writes - about 2 docs per second. Upon investigation, I noticed that some of the nodes were low on disk space. After making more space available and restarting the nodes, we are seeing this error (or something similar) on all of the nodes in console.log:
2015-02-20 16:16:29.694 [info] <0.161.0>#riak_core_handoff_manager:handle_info:282 An outbound handoff of partition riak_kv_vnode 182687704666362864775460604089535377456991567872 was terminated for reason: {shutdown,max_concurrency}
Currently, the cluster is not being written to or read from.
I would appreciate any help in getting the cluster back to good health.
I will add that we are writing documents to an index that is tied to a Solr index.
This is not critical production data, and I could technically wipe everything and start fresh, but I would like to properly diagnose and fix the issue so that I am prepared to handle it if it should happen in a production environment in the future.
Thanks!
I have an application with a file receive location. After the host instance has been running for a few hours, the receive location is slow to pick up new files dropped into the folder that it monitors. It doesn't forget about them altogether; performance just grinds to a crawl. The receive location is configured to poll the target folder every 60 seconds, but after the host instance has been running for an hour or so, the target folder seems to be polled only every thirty minutes. If I restart the host instance, the files waiting in the target folder are collected right away and performance is fine for the next hour or so.
The same application runs fine in a different environment.
There are no obvious entries in the event log related to the problem.
All the BizTalk SQL jobs are running fine except for Backup BizTalk Server (BizTalkMgmtDb).
Any suggestions gratefully received.
Thanks
Rob
Here are some additional tools which may help you identify and diagnose BizTalk database issues.
BizTalk MsgBox Viewer
Here is a tool to repair identified errors:
Terminator
Use at your own risk... read the blogs and docs. Start with the message box viewer and let us know your results.
Without more details, the biggest tell is that your Backup Job is failing. If the backup job is failing, it may not be properly configured. If it is properly configured and still failing, then you've got other issues. Can you give us some more information about your BizTalk install?
What version are you running?
What are your database sizes?
What are your purge and archive settings like?
Are there any long-running blocks in your SQL Server DB coming from BizTalk?
Another thing to consider is the user accounts that the send, receive and orchestration hosts are running under. Please check the BizTalk Administration Console. If they are all running under the same account, the orchestrations can sometimes starve the send and receive processes of CPU time. I believe priority is given to orchestrations, then receive, then send. Even if you are just developing, it is useful to use separate accounts for this. This also improves security.
The Wrox book on BizTalk Server 2006 also offers tuning advice.
What other things are going on with the server? Is BizTalk pegged otherwise or is it idle?
You mention that the solution does not have any problems in another environment, so it's likely that there is a configuration problem.
Check the following:
** On SQL Server, set some upper memory limit for SQL Server. By default, SQL Server uses whatever it can get and then hangs onto it, so set a reasonable limit so that your system can operate without spending a lot of time paging memory onto and from your hard drive(s).
** Ensure that you have available disk space - maybe you are running low - this can lead to all kinds of strange problems.
** Try to split up the system's paging file among its physical drives (if you have more than one drive on the system). Also consider using a faster drive, or if you have lots of cash lying around, get a SAN.
** In BizTalk, is tracking enabled? If so, are you also tracking message bodies? Disable tracking or message body tracking and see if there is a difference.
** Start Performance Monitor and watch the following counters while running your solution:
Object: BizTalk Messaging
Instance: (select the receiving host) %%
Counter: Documents Received/Sec
Object: BizTalk Messaging
Instance: (select the transmitting host) %%
Counter: Documents Sent/Sec
Object: XLANG/s Orchestrations
Instance: (select the processing host) %%
Counter: Orchestrations Completed/Sec.
%% You may have only one host, so just use it. Since BizTalk configurations vary, I am using generic names for hosts.
The preceding counters monitor the most basic aspects of your server, but may help to narrow down places to look further. You can, of course, add CPU and Memory too. If you have time (days...maybe weeks) you could monitor for processes that allocate memory and never release it. Use the following counter...
Object: Memory
Counter: Pool Nonpaged Bytes
A slow, steady rise in this counter indicates that memory is being allocated and never released, which affects everything on the system.
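To tell a steady drift from normal fluctuation over days of samples, fitting a straight line to the counter values gives a single number to watch. The following is a rough sketch. The function name slope_per_hour and the sample data are invented, and it assumes you have already exported (hours elapsed, counter value) pairs from Performance Monitor.

```python
def slope_per_hour(samples):
    """Least-squares slope of (hours_elapsed, value) pairs, in units per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("all samples share the same timestamp")
    return num / den

# Invented hourly samples of a byte counter.
samples = [(0, 100_000_000), (1, 101_000_000), (2, 102_000_000), (3, 103_000_000)]

print(f"trend: {slope_per_hour(samples):,.0f} bytes/hour")
```

A slope that keeps the same sign across several days of data, rather than hovering around zero, is the pattern worth investigating.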
Let us know how things turn out!
I had the same problem: when my orchestration had been idle for some time, it took a long time to process the first message. An article by EvYoung helped me solve it.
"This is caused by application domain unloading within the BizTalk host process. If an AppDomain is shutdown after idle, the next message that comes needs to wait for the Orchestration to compile again. Depending on the complexity of your design, this can be a noticeable wait. To prevent this in low latency requirement scenario, you can modify the BTSNTSVC.EXE.config file and set SecondsIdleBeforeShutdown property to -1. This will prevent AppDomain shutdown due to idle."
You can find the article here:
http://blogs.msdn.com/b/biztalkcpr/archive/2008/05/08/thoughts-on-orchestration-performance.aspx
It took me too long to respond, but I thought this might help someone. Cheers :)
Some good suggestions from others. I will add:
Do you have any custom receive pipeline components on the receive location? If so, perhaps one is leaking memory, or calling some external component (e.g. a database) that takes a long time?
How big are the files you are receiving?
On the File transport properties of your receive location, turn "file renaming" on. Do the files get renamed within 60 seconds?