Why can a large SQLite database cause TCP connection delays on Windows Server?

I have a Windows Server 2016 machine that runs a server program handling about 2,200 concurrent requests per second. The server program costs the machine only about 25% CPU, 25% memory, and 30% bandwidth. It's written in C++, much like the Boost example: it just does some calculation and returns the result to the client over TCP, and it doesn't use the disk.
But it's very laggy. I can see the lag not only from my clients but also from Remote Desktop Connection: it takes about 10 seconds to establish an RDP connection, while it's very quick (less than 2 seconds) if I close the server program.
I guess some resource on my server is exhausted, but how can I find it? Is there any tool that can profile the system and find the bottleneck?
Update
The server program spreads its load evenly by running 8 threads on 8 cores. I did take care of this, and Task Manager confirms that all 8 cores are used almost equally.
I found the problem: I'm using a sqlite3 database (my.db) to log all client access, and the server gets laggier as the .db grows. It is now 1.2 GB, which causes the lag.
Then I tried:
Keeping the 1.2 GB .db but only loading it once at startup to read some configuration: no new log records, no read/write access while the server is running. It still lags.
Executing delete from log_table and vacuum to delete the previous log and shrink the .db to 16 KB. The lag problem is then gone and client requests become very quick (a sketch of this fix appears after the environment list below).
Question
Why can a large database make the whole server lag? Not only the server program itself, but also other applications such as the RDP connection, even though the load is low?
Server Environment
Windows Server 2016
CPU: 8 cores (25% used)
Memory: 16 GB (25% used)
Disk: 40 GB (30% used)
Server program written in C++ with Boost coroutines
sqlite3 database with PRAGMA journal_mode=WAL; enabled.
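For what it's worth, here is a minimal sketch of the pruning fix described above. The server itself is C++, but the sketch uses C# with the Microsoft.Data.Sqlite package for brevity; the same SQL and PRAGMAs apply verbatim through the sqlite3 C API, and wal_autocheckpoint only bounds the -wal file, not the main database file.

using Microsoft.Data.Sqlite;

class LogPruner
{
    public static void PruneAndCompact(string dbPath)
    {
        using (var conn = new SqliteConnection("Data Source=" + dbPath))
        {
            conn.Open();
            using (var cmd = conn.CreateCommand())
            {
                // Keep WAL mode, but cap how many pages the -wal file may hold
                // before a checkpoint folds it back into the main .db file.
                cmd.CommandText = "PRAGMA journal_mode=WAL; PRAGMA wal_autocheckpoint=1000;";
                cmd.ExecuteNonQuery();

                // The fix that worked above: drop the old log rows, then rebuild
                // the file so the 1.2 GB shrinks back to a few KB.
                cmd.CommandText = "DELETE FROM log_table; VACUUM;";
                cmd.ExecuteNonQuery();
            }
        }
    }
}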

Install the Sysinternals tools.
Launch procexp.exe (Process Explorer) and use it to find the memory and disk usage of your process and of the others.
Use resmon (Win+R, then type "resmon") to monitor network bandwidth while your program is running and while it is not.
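If you prefer logging numbers over watching a GUI, here is a minimal hypothetical probe (C#; the counter and instance names are standard Windows ones, and a sustained disk-queue length is exactly what a growing SQLite log would show):

using System;
using System.Diagnostics;
using System.Threading;

class BottleneckProbe
{
    static void Main()
    {
        // Standard Windows counters -- the same data resmon displays.
        var cpu = new PerformanceCounter("Processor", "% Processor Time", "_Total");
        var diskQueue = new PerformanceCounter("PhysicalDisk", "Avg. Disk Queue Length", "_Total");

        for (int i = 0; i < 30; i++)
        {
            // The first NextValue() of a rate counter always returns 0, so sample in a loop.
            Console.WriteLine("cpu={0:F1}%  disk queue={1:F2}", cpu.NextValue(), diskQueue.NextValue());
            Thread.Sleep(1000);
        }
    }
}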

Related

How to make a UNIX socket faster?

I'm running a Google Cloud Compute VM as my application server for an app that's available on iOS and Android. The server runs Django within uWSGI, fronted with nginx. The communication between uWSGI and nginx happens through a unix file socket.
Recently I started noticing timeouts at the client end. I did a bit of experimenting and found that uWSGI sometimes errors out while writing data to the file socket. For example, a sample request that returns about 200 KB of JSON data takes about 1 second for Django to compute, but the UNIX socket seems to take another 1-2 seconds, which seems too high for a 200 KB response. If the client expects a response within 2 seconds, this often leads to a write error at uWSGI; when I increase the 'max-time' parameter at the client end, it goes through smoothly.
I want to know if there are configuration changes that can make reading from and writing to a UNIX socket faster. 200 KB is a very modest size for a JSON response from my server, so I won't be able to bring it down, and I can't have a timeout of more than 2 seconds at my client (iOS or Android), for business reasons.
Several Unix entities are represented by files but are not files at all; pipes and sockets are two examples.
So writing to and reading from a Unix socket is not bound by file-system I/O and does not share file-system response times. In fact, a Unix socket is one of the fastest IPC mechanisms available, more efficient than a TCP socket, since it does not go through the network stack at all.
That said, here are some hints on how to attack your particular problem:
Evaluate your app for performance issues. Profile it and check where it might be spending too much time. Usually I/O is the main villain in performance issues; bad algorithms and linear searches over long lists are also common culprits.
Check the configuration of both your web server and your application gateway (a sketch of the usual uWSGI knobs follows this list).
Check process scheduling. If everything is running on the same box, process concurrency may be an issue under heavy load. Be sure to have all processes running at proper priorities.
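As a hedged sketch of the configuration check: these are real uWSGI directives, but every value below is an illustrative assumption, not a recommendation tuned for this app.

; uwsgi.ini -- illustrative values only
[uwsgi]
socket = /tmp/app.sock    ; hypothetical path to the nginx-facing unix socket
listen = 1024             ; accept-queue depth (default 100; raise net.core.somaxconn too)
buffer-size = 65535       ; request-header buffer (default 4096)
workers = 4               ; worker processes serving Django
socket-timeout = 10       ; seconds before socket reads/writes error out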
Good luck!

Increase RAM usage for IIS server

I am running a large-scale ERP system on the following server configuration. The application is developed using AngularJS and ASP.NET 4.5.
Dell PowerEdge R730 (quad-core 2.7 GHz, 32 GB RAM, 5 x 500 GB hard disks in RAID 5).
Software: the host OS is VMware ESXi 6.0, with two VMs running on it. One is Windows Server 2012 R2 with 16 GB of memory allocated; this contains the IIS 8 server and my application code. The other is also Windows Server 2012 R2, with SQL Server 2012 and 16 GB of memory allocated; this contains just my application database.
You see, I separated the application server and database server for load balancing purposes.
My application contains a registration module where the load is expected to be very high (around 10,000 visitors over 10 minutes).
To support this volume of requests, I have done the following on my IIS server:
increased the request-queue length of each application pool to 5000
enabled output caching for .aspx files
enabled static and dynamic compression on the IIS server
set the virtual memory limit and private memory limit of each application pool to 0
increased the maximum worker processes of each application pool to 6
I then used Gatling to load-test my application, injecting 500 users at once into the registration module.
However, I see that only 40-45% of my RAM is being used, and each worker process is using only around 130 MB.
Gatling reports that around 20% of my requests get a 403 error, and that more than 60% of all HTTP requests have a response time greater than 20 seconds.
A single user makes 380 HTTP requests over a span of around 3 minutes, transferring 1.5 MB in total. I simulated 500 users like this.
Is there anything missing in my server tuning? I have already tuned my application code to minimize memory leaks, increase timeouts, and so on.
There is a known issue with the newest generation of PowerEdge servers that use the Broadcom network chipset: apparently the "VM" feature of the NIC is broken, which results in horrible network latency on VMs.
Head to Dell and get the most recent firmware and Windows drivers for the Broadcom NIC.
Head to VMware Downloads and get the latest Broadcom driver.
As for the worker-process setting: for maximum performance, you should consider running the same number of worker processes as there are NUMA nodes, so that there is a 1:1 affinity between worker processes and NUMA nodes. This can be done by setting the "Maximum Worker Processes" app-pool setting to 0; IIS then determines how many NUMA nodes are available on the hardware and starts the same number of worker processes.
I guess the one caveat to the answer you received would be that if your server isn't NUMA-aware and uses symmetric multiprocessing, you won't see those IIS options under CPU, but the above poster seems to know a good bit more than I do about the machine. Sorry I don't have enough street cred to add this as a comment. As far as IIS goes, you may also want to make sure your app pool doesn't use the default recycle conditions; pick a time like midnight for the recycle instead. If you have root-level settings applied, the default app-pool recycling at 29 hours may also trigger garbage collection against your child pool, causing delays even with concurrent GC. It sounds like you may benefit a bit from gcServer=true, though that's pretty tough to assess. Both app-pool changes are sketched below.
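A hedged sketch of both changes, assuming the Microsoft.Web.Administration assembly that ships with IIS; the pool name "MyAppPool" is hypothetical, and this must run elevated on the IIS box.

using System;
using Microsoft.Web.Administration;

class PoolTuning
{
    static void Main()
    {
        using (var serverManager = new ServerManager())
        {
            var pool = serverManager.ApplicationPools["MyAppPool"];

            // 0 = one worker process per NUMA node (1:1 affinity), instead of a
            // fixed web-garden count such as 6.
            pool.ProcessModel.MaxProcesses = 0;

            // Replace the 29-hour default recycle with a fixed recycle at midnight.
            pool.Recycling.PeriodicRestart.Time = TimeSpan.Zero;
            pool.Recycling.PeriodicRestart.Schedule.Clear();
            pool.Recycling.PeriodicRestart.Schedule.Add(TimeSpan.Zero);

            serverManager.CommitChanges();
        }
    }
}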
Has your SQL Server been optimized for that type of workload? If your data isn't paramount, you could squeeze out faster execution times with delayed durability, then look at queries that are returning too much data and at async I/O wait types. In general there's not enough here to really assess SQL optimizations, but if the database isn't configured right (size/growth options) you could be hitting a lot of timeouts due to autogrowth, VLF fragmentation, etc.
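For illustration, a hedged sketch of both suggestions (System.Data.SqlClient; the database name "AppDb" and the connection string are assumptions). Note that delayed durability requires SQL Server 2014 or later, so it would need an upgrade from the SQL Server 2012 instance described above, and it trades a small window of possible data loss for fewer synchronous log flushes.

using System;
using System.Data.SqlClient;

class SqlTuning
{
    static void Main()
    {
        using (var conn = new SqlConnection("Server=.;Database=AppDb;Integrated Security=true"))
        {
            conn.Open();

            // Only acceptable if the data "isn't paramount", as the answer says.
            using (var alter = new SqlCommand(
                "ALTER DATABASE AppDb SET DELAYED_DURABILITY = FORCED;", conn))
            {
                alter.ExecuteNonQuery();
            }

            // Snapshot of what running requests are currently waiting on.
            using (var waits = new SqlCommand(
                "SELECT session_id, wait_type, wait_time, command " +
                "FROM sys.dm_exec_requests WHERE session_id > 50;", conn))
            using (var reader = waits.ExecuteReader())
            {
                while (reader.Read())
                    Console.WriteLine("{0}  {1}  {2} ms  {3}",
                        reader["session_id"], reader["wait_type"],
                        reader["wait_time"], reader["command"]);
            }
        }
    }
}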

Constant SQL Server 80% CPU Utilization

We have a small (for now) ASP.NET MVC 5 website on a dedicated VPS. When I go to the server and fire up Task Manager, I see that "SQL Server Windows NT - 64 bit" is using around 80% of the CPU and 170 MB of RAM, while IIS is using 6% CPU and 400 MB of RAM. The server specs are:
CPU: 1.90 GHz dual core
Memory: 2 GB
Windows Server 2012
SQL Server Express 2012
Disk space: 25 GB, 2.35 GB free
The database is not very big; its backup is less than 10 MB.
I have tried to optimize the website as much as I could. I added caching to a lot of controllers and implemented donut caching for quite a few of them. But today, even though there were only 5 users online, our search wouldn't work. I restarted Windows on the server and it started working again, but the high CPU usage returned the minute the server started. Interestingly, when I open SQL Server Management Studio and try to get the report of top CPU-consuming queries, it says there are no queries currently consuming any CPU, yet at the same time I can see that SQL Server is consuming a lot of it. How can I examine what is taking all the CPU? Below is a picture from the server:
I was (and am) very careful with designing and implementing the website. All database access goes through the latest version of Entity Framework. I just wonder if the server's specs are too low. Any help would be very much appreciated.
Update:
Here's the result of the sp_who2 stored procedure.
This can happen if SQL Server's memory setting is higher than the memory available on the box; the default "max server memory" setting is 2147483647 MB, i.e. effectively unlimited. In our case the AWS box had only 30.5 GB, so we changed the setting to 26 GB and CPU usage fell to 40%. You generally want to leave about 20% of memory for the OS and its operations.
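For reference, a hedged sketch of that change (System.Data.SqlClient; the connection string is an assumption, sysadmin rights are required, and the same two sp_configure calls can be run directly in Management Studio):

using System.Data.SqlClient;

class MaxServerMemory
{
    static void Main()
    {
        using (var conn = new SqlConnection("Server=.;Integrated Security=true"))
        {
            conn.Open();
            using (var cmd = new SqlCommand(
                "EXEC sp_configure 'show advanced options', 1; RECONFIGURE; " +
                "EXEC sp_configure 'max server memory (MB)', 26624; RECONFIGURE;", conn))
            {
                // 26624 MB = the 26 GB that fixed the 30.5 GB AWS box above.
                cmd.ExecuteNonQuery();
            }
        }
    }
}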
I would agree with running SQL Profiler to spot long query durations and large write operations. Also try running perfmon and looking for potential connection leaks (reclaimed connections).

Can low memory on IIS server cause SQL Timeouts (SQL Server on separate box)?

I have an IIS web server that hosts 400 web applications (distributed across 30 application pools). They are a mix of ASP.NET applications and WCF service endpoints. The server has 32 GB of RAM and usually runs fast, although it sits at 95% memory usage. Worker processes each take between 500 MB and 1.5 GB of RAM.
I also have another box running SQL Server. That one has plenty of free memory.
Sometimes, the web server starts throwing SQL timeout exceptions: a few per minute at first, rapidly increasing to hundreds per minute, effectively taking the server down. The problem affects applications in all pools. Some requests still complete, but most of them don't. While this happens, CPU usage on the server is around 30% (which is the normal load on that box).
While this is happening, we can still use SQL Server Management Studio (from the IIS server) to execute queries successfully (and fast).
The fix is to restart IIS. Then everything goes back to normal until the next time.
Because the server is running very low on memory, I feel this is the cause, but I cannot explain the relationship between low memory and sudden bursts of SQL timeout exceptions.
Any idea?
Memory pressure can trigger paging and garbage collection. Both introduce latency which would not be present otherwise.
GC'ing 32 GB of data can take seconds. Why would all app processes GC at the same time? Because at about 95% memory utilization, Windows raises a "low memory" event that the CLR listens for, and each process then tries to release memory to help the others.
If the applications get into a paging frenzy, that would also explain huge delays in normal execution.
This is just guessing, though. You can try proving it by looking at the hard-page-fault counter (the closest built-in one is Memory\Pages/sec). There is also a counter for full collections, .NET CLR Memory\# Gen 2 Collections; a small probe is sketched below.
The fix would be to run with a bigger margin below the physical-memory limit.
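A minimal sketch of such a probe (C#; the counter names are standard Windows/.NET ones, but the "w3wp" instance name is an assumption; with 30 pools the instances show up as w3wp, w3wp#1, and so on):

using System;
using System.Diagnostics;
using System.Threading;

class MemoryPressureProbe
{
    static void Main()
    {
        // Memory\Pages/sec counts pages read from / written to disk, i.e. hard faults.
        var pages = new PerformanceCounter("Memory", "Pages/sec");
        var gen2 = new PerformanceCounter(".NET CLR Memory", "# Gen 2 Collections", "w3wp");

        while (true)
        {
            Console.WriteLine("pages/sec={0:F0}  gen2 collections={1:F0}",
                pages.NextValue(), gen2.NextValue());
            Thread.Sleep(1000);
        }
    }
}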
The first problem is to discover where the timeout is happening. Can you tell from the stack trace whether the timeout occurs while executing a request against the database, or while connecting to the database? (Or even while connecting to the web server?)
Timeouts executing database requests can have a variety of causes. The problem might be in the database itself: blocking processes, database maintenance (also locking), deadlocks, and so on. When the apps are running slowly, do you see a lot of entries in sys.dm_exec_requests, and if so, what are their wait_types?
Even if you can run SQL in the query window while the web server is timing out, that doesn't mean there isn't massive blocking or deadlocking going on.
If it is a timeout connecting to the database, then it is possible the ADO.NET connection pools are overwhelmed and not getting cleaned up, or the database has a connection limit and the web services are timing out waiting for a connection; a sketch of the relevant settings follows below.
One of the best ways to find out what is going on is to capture a memory dump of the w3wp.exe process and analyze it. Even if you aren't adept at a debugger like WinDbg, Microsoft's DebugDiag tool can produce some nice reports with helpful information.
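On the connection-pool possibility above: the pool limits live in the connection string, so here is a hedged sketch of where to look (the server and database names are hypothetical; raising the limits only helps if connections are leaking or genuinely all busy):

using System.Data.SqlClient;

class PooledConnection
{
    static void Main()
    {
        // Defaults are Max Pool Size=100 and Connect Timeout=15 seconds.
        var builder = new SqlConnectionStringBuilder
        {
            DataSource = "sqlbox",        // hypothetical server name
            InitialCatalog = "AppDb",     // hypothetical database
            IntegratedSecurity = true,
            MaxPoolSize = 200,
            ConnectTimeout = 30
        };
        using (var conn = new SqlConnection(builder.ConnectionString))
        {
            conn.Open(); // throws the pool-exhausted timeout if none is free in time
        }
    }
}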
SqlCommand.CommandTimeout
This property is the cumulative time-out for all network reads during command execution or processing of the results. A time-out can still occur after the first row is returned, and does not include user processing time, only network read time.
It is a client-side timeout. If work is getting queued due to memory constraints, that could cause a timeout.
Are you retrieving a lot of data from these queries?
If some queries return a lot of data, consider breaking them up and giving the user Next and Previous buttons.
Have you considered async calls like BeginExecuteReader?
The advantage is that no command timeout applies.
It also does not block the calling thread.
// Kick off the command asynchronously; the callback runs on a pool thread.
isExecutingFTSindexWordOnce = true;
sqlCmdFTSindexWordOnce.BeginExecuteNonQuery(callbackFTSindexWordOnce, sqlCmdFTSindexWordOnce);
// isExecutingFTSindexWordOnce is set back to false in the callback.
Debug.WriteLine("Calling thread active");
But I agree with your comment about how to respond to the request, since the answer does not come back on the calling thread.
Sorry, I am used to WPF, where I would just update a public property in the callback.

NonStop ODBC: how the connections (ODBC servers) are assigned to CPUs?

We have an ODBC pool running on a NonStop server. The pool is connected to SQL/MX.
This pool is used by a few external Java applications, each of which has a JDBC pool connected to the ODBC pool (e.g. 14 connections per application).
Over time (after a few application recycles) we see an imbalance between CPUs -- some have 8 ODBC processes running, some only 5. That leads to a CPU-time imbalance too.
Up to this point we assumed that CPUs are assigned to ODBC processes in round-robin fashion, which would keep the number of ODBC processes more or less equally distributed. That's not the case, though.
Is there any information on how the ODBC pool decides which CPU to pick for each newly allocated process? Does it look at CPU load? Available memory? Something else?
Sadly, even HP's own people (available to us, that is) couldn't answer those questions with certainty. :-(
And in fact connections are assigned to CPUs in round-robin fashion. But if one of the consumers (with its own pool) is restarted for any reason, its connections are released on the CPUs where they were allocated (obviously), while the new ones are allocated starting from the next CPU in the round-robin order. Thus some CPUs end up less busy and some more: hence the imbalance. A toy simulation follows below.
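A toy C# simulation of that drift; the numbers are illustrative assumptions (4 CPUs, two consumers of 14 connections each, like the pools described above), not a model of the actual NonStop allocator.

using System;
using System.Linq;

class RoundRobinDrift
{
    static int[] load = new int[4];  // ODBC server processes per CPU
    static int next = 0;             // round-robin pointer

    static int[] Allocate(int count)
    {
        var placed = new int[count];
        for (int i = 0; i < count; i++)
        {
            placed[i] = next;
            load[next]++;
            next = (next + 1) % load.Length;
        }
        return placed;
    }

    static void Main()
    {
        Allocate(14);                         // consumer A: spread evenly
        var b = Allocate(14);                 // consumer B: still even (7,7,7,7)

        foreach (var cpu in b) load[cpu]--;   // B recycles: frees the CPUs it sat on...
        Allocate(14);                         // ...but restarts from the current pointer

        // Prints cpu0:8 cpu1:8 cpu2:6 cpu3:6 -- the imbalance described above.
        Console.WriteLine(string.Join(" ",
            load.Select((n, i) => string.Format("cpu{0}:{1}", i, n))));
    }
}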
