Why is DataStax OpsCenter eating too much CPU?

Environment:
Machines: 2.1 GHz Xeon, 128 GB RAM, 32 CPUs
OS: CentOS 7.2 (1511)
Cassandra version: 2.1.15
OpsCenter version: 5.2.5
3 keyspaces: Opscenter (3 tables), OpsCenter (10 tables), and the application's keyspace (485 tables)
2 datacenters: one for Cassandra (5 machines) and another, DCOPS, to store the OpsCenter data (1 machine).
Right now the agents on the nodes consume on average ~1300 CPU (out of the 3200 available, i.e. roughly 13 of the 32 cores), while the only traffic is ~1500 writes/s on the application keyspace.
Is there any relation between the number of tables and OpsCenter load? Is it eating so much CPU because the agents are trying to write metrics for too many tables, or is it some kind of bug?
Note: the same behaviour occurred on the previous OpsCenter version, 5.2.4. For that reason I first tried upgrading OpsCenter to the newest available version.
From the OpsCenter 5.2.5 release notes:
"Fixed an issue with high CPU usage by agents on some cluster topologies. (OPSC-6045)"
Any help/opinion much appreciated.
Thank you.

Observing the specific agent's PID with the excellent tool you provided, Chris, I noticed that heap utilisation was constantly above 90%, which triggered a lot of GC activity with huge GC pauses of almost 1 second. During those periods I suspect the polling threads had to wait, blocking the CPU a lot. Anyway, I am not a specialist in this area.
I decided to enlarge the agent's heap from the default 128 MB to a nicer value of 512 MB, and all the GC pressure went away; thread allocation is now behaving nicely.
Overall, CPU utilization for the OpsCenter agent dropped from 40-50% down to 1-2%. I can live with 1-2%, since I know for sure that CPU is being consumed by the JMX metrics collection.
So my advice is to edit the file:
datastax-agent-env.sh
and change the default 128 MB -Xmx value to:
-Xmx512M
Then save the file, restart the agent, and monitor for a while.
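For reference, a minimal sketch of the edit, assuming the agent heap is set through a JVM options line in datastax-agent-env.sh (the exact variable name and file path can differ between agent versions and installs, so treat this as an illustration):

# /usr/share/datastax-agent/bin/datastax-agent-env.sh (path may vary)
# Before (default 128 MB heap):
#   JVM_OPTS="$JVM_OPTS -Xmx128M"
# After (512 MB heap, to relieve GC pressure):
JVM_OPTS="$JVM_OPTS -Xmx512M"

# Then restart the agent, for example:
sudo service datastax-agent restart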
http://s000.tinyupload.com/?file_id=71546243887469218237
Thank you again Chris.
Hope this will help other folks.

Related

Troubleshooting slow writes on HP-UX

Could anyone offer any troubleshooting ideas or pointers on where/how to get more information on the difference between sys and real time from the output below?
My understanding is that the command finished processing in the OS in about 4 seconds, but that I/O was then queued and processed for the remaining 38.3 seconds (is that right?). It is somewhat of a black box to me at this point how to get additional details.
time prealloc /myfolder/testfile 2147483648
real 42.5
user 0.0
sys 4.2
You are writing 2 GB to disk on an HP-UX system; this is most likely using spinning disks (physical hard disks).
The system is writing 2GiB / 42s = 51 MB/s which doesn't sound slow to me.
On these systems you can use tools such as sar. Use sar -ud 5 to see CPU and disk usage during your prealloc command; you will likely see disk usage pegged at 100%.
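For illustration, one way to line the two up (the path and size come from the question above):

# In one terminal, time the 2 GB preallocation
time prealloc /myfolder/testfile 2147483648

# In another terminal, sample CPU and disk activity every 5 seconds, 10 times
sar -ud 5 10

# In the sar -d output, watch the %busy column for the disk that holds /myfolder;
# a value pegged near 100% means the write is disk-bound, which explains why
# real is so much larger than user + sys.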

Query execution taking time in Presto with Pinot connector

We are using Apache Pinot as the source system and have loaded 10 GB of TPC-H data into Pinot. We are using Presto as the query execution engine, via the Pinot connector.
We are trying a simple configuration. Presto is installed on a CentOS machine with 8 CPUs and 64 GB RAM, with only one worker instance running an embedded coordinator. Pinot is installed on a CentOS machine with 4 CPUs and 64 GB RAM, running one controller, one broker, one server, and one ZooKeeper.
Running a query on the Lineitem table involving a GROUP BY ROLLUP takes 23 seconds, of which around 20 seconds is spent transferring 2.3 GB of data from Pinot to Presto.
Another query, involving a join between Lineitem, Nation, Partsupply, and Region with a GROUP BY CUBE, takes around 2 minutes. Data transfer accounts for around 25 seconds of that; most of the remaining time is spent in the join and aggregation computation.
Is this normal performance with Presto-Pinot?
If not, what am I missing?
Do I need to increase hardware? Increase the number of Presto/Pinot processes?
Are there any specific Presto properties I should consider modifying?
Thanks for your help in advance.
Please list the queries so that we can provide a better answer. At a high level, the Presto Pinot connector tries to push down most of the computation (filter, aggregation, group by) to Pinot and minimize the amount of data it needs to pull from Pinot.
There are always queries that require a full table scan, where the computation cannot be pushed to Pinot; query latency can be higher in such cases. Pinot recently added a streaming API that can improve the latency further.
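If it helps while you gather the queries, one way to see what the connector pushes down is to EXPLAIN the query from the Presto CLI and check whether the aggregation appears inside the Pinot scan or as separate Presto operators. The server address, catalog/schema names, and column names below are placeholders for your setup:

# Hypothetical presto-cli invocation; adjust --server, --catalog and --schema
./presto-cli --server localhost:8080 --catalog pinot --schema default \
  --execute "EXPLAIN SELECT returnflag, linestatus, sum(extendedprice)
             FROM lineitem GROUP BY ROLLUP (returnflag, linestatus)"

If the plan shows only a plain table scan followed by Presto-side aggregation, the 2.3 GB transfer you measured is expected, and the streaming API mentioned above is the main lever for improving it.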

NuoDB memory and CPU usage very high

While accessing a NuoDB database from a Java application, Task Manager shows CPU and memory usage reaching almost 99%. I tried NuoDB versions 2.4, 2.5, and 2.6, but I get the same issue with all of them.
My current Windows server hardware configuration is below.
RAM: 12 GB (3 processors)
Hard disk: 100 GB
Please give any suggestions to overcome this issue.
Thanks in advance
I see from the Task Manager screenshot MANY "NuoDB Server" processes running
(the picture shows 7 NuoDB processes running on that server).
The potential problem might be too many TEs (Transaction Engines), too much memory configured for NuoDB
on that single server, or NuoDB being configured incorrectly.
The following link can help you understand how to check your system settings.
http://doc.nuodb.com/Latest/Default.htm#Mgr-Show-Domain.htm?Highlight=--memory
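As a starting point, something along these lines should list the processes in the domain and the memory each one was started with (a sketch only; the broker host, password, and exact command output depend on your NuoDB 2.x installation):

# Hypothetical NuoDB Manager session; adjust the broker host and domain password
nuodbmgr --broker localhost --password <domain-password> --command "show domain summary"

# Count the TEs (Transaction Engines) reported for that host and note the memory
# each was given; several TEs each holding a large cache on a 12 GB machine would
# explain CPU and memory sitting near 99%.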

ASP.NET application and CPU usage

We have a vanilla ASP.NET application (ASP.NET Web Forms, Entity Framework, SQL Server 2005) without any explicit multithreading in code. It has been deployed to the staging environment (OS: Windows Server 2008 R2 64-bit, CPU: Intel Xeon E5507 @ 2.27 GHz, RAM: 7.5 GB). This environment is composed of a web server, a database server, and a reporting server, each a separate instance in the cloud (Amazon EC2). When testing for concurrency, the observations are as follows:
1 user - CPU usage ~25%, Response time 2-4 seconds
2 users - CPU usage 40-50%, Response time 3-6 seconds
4 users - CPU usage 60-80%, Response time 4-8 seconds
8 users - CPU usage 80-100%, Response time 4-10 seconds
My questions are:
Is CPU usage proportional to the number of concurrent users? And can response time vary to the extent seen in the observations above?
From the observations above, the CPU will be maxed out at roughly 10 concurrent users. Shouldn't the CPU handle many more concurrent users without a drastic increase in response time? In an ideal scenario, how many concurrent users can a CPU handle for a basic ASP.NET application?
If so, what could be causing the high CPU usage and long response times here? How should we go about debugging effectively to find the bottlenecks in the code or IIS settings?
PS: IIS settings (in machine.config) which have been changed:
maxWorkerThreads = 100
minWorkerThreads = 50
maxIOThreads = 100
minIOThreads = 50
minFreeThreads = 176
maxConnections = 100
The high CPU usage could be caused by a wide variety of things. The easiest way to find out what's going on is to use a profiling tool:
ANTS Memory Profiler:
http://www.red-gate.com/products/dotnet-development/ants-memory-profiler/
ANTS Performance Profiler:
http://www.red-gate.com/products/dotnet-development/ants-performance-profiler/
They're very affordable, but you should be able to work unrestricted with the trial versions. These tools do a fantastic job of identifying bottlenecks and memory leaks.
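Before (or alongside) profiling, it can also help to reproduce the user counts from the question with a simple load sweep and watch the CPU counters while it runs; the URL and request counts below are placeholders:

# Hypothetical load sweep with Apache Bench: 200 requests at 1, 2, 4 and 8 concurrent users
for c in 1 2 4 8; do
  ab -n 200 -c $c http://staging.example.com/Default.aspx
done

Comparing ab's "Time per request" against % Processor Time on the web server and on SQL Server shows whether the bottleneck scales with concurrency or stays pinned to one tier, which narrows down where the profiler should be pointed first.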

Out of Memory Exception - ASP.NET - IIS 7

The problem is with memory management: I keep receiving an "Out of Memory" exception.
Here are the scenarios where we face the problem:
Please note:
1. The site/application is developed in ASP.Net and uploaded on a server with the following specs:
- Windows Server 2008 (R2) Standard
- Intel Xeon L5520 @ 2.27 GHz
- RAM = 8GB
- System Type = 64bit
The application is an event-management web application whose requirements include saving huge amounts of data in Session state, etc. (mentioning this in case it is relevant).
The application/site works fine until we:
Edit a file directly on the server
Update a file from the repository
Copy/paste a file (we don't usually edit code using this technique)
Please note, all of the above hold true ONLY when traffic to the site is high; that is,
the "Out of Memory" error is not produced when traffic/visits are low.
Details of:
System Properties > Advanced > Performance Settings > Advanced tab
Total paging file size for all drives: 16362 MB
In web.config
Is there any way we can debug this problem to its core and find a solution? Can you please provide links or guidance on where we can investigate this problem further?
Best regards,
Farrukh
Out of Memory Exceptions are common with applications that see periodic transaction surges while keeping larger volumes of data in memory. This problem does, however, depend on your application and architecture. Below are a few pointers:
Hardware - you have Xeon 5500 (Intel Nehalem chips). These are very good at handling memory. You should be good here.
OS - Windows Server 2008 R2 - As an OS this system will handle more than enough memory for you (you are good here, see link for capabilities: Memory Limits for Windows)
Physical Memory - Did you say you have 8 GB on the server? Note your app is being allowed 16 GB (the paging file). That is one issue: if your app requests more memory than is physically available, you will see this error. But this is not your only concern ...
CLR / GC limitations - Your application has a "paging file size" of 16+ GB. This is probably your issue.
GC is the heart of your problem. As for why, it is the same reason Java and the JVM have issues whenever an application heap exceeds 2-4 GB; it requires a look at the actual GC process.
You have "old generation" and "young generation" garbage collection processes. As your app runs, the CLR tries to keep your memory space organized. These processes force all threads to pause (phase changes) when the GC mark-and-sweep passes occur. The problem here is that, depending on how your code is written and how much memory you keep around for long periods, you can run into memory issues.
Any time you press a runtime environment to exceed the 4 GB threshold you will see exponential increases in collection times. When you hit the "stop the world" pause (the old gen GC where everything gets cleaned up) the CLR has to go through the entire heap and de-allocate memory. Based on your app, 16 GB may give you issues even with more physical memory (Windows Server 2008 R2 - Enterprise or DataCenter can support 2 TB). Even if you feed it more physical memory you may see LONG collection times when your full GC hits.
Ideally I would do the following:
Get more physical memory (you never want to come within 600 MB of your total physical memory being allocated to your application, to avoid out-of-memory errors, but your buffer does depend on your load and the application's ability to handle it ... you may want a larger safety net to be safe).
Once you have the physical memory you need, run GC logs while stressing the app (a sketch follows after the link below). This will give you an idea of where you see exponential degradation in performance and what heap size (memory) your app can support. You may want to find a way to get your 16 GB paging file down to a smaller size. I do know that with .NET 4.0 Microsoft has made some solid improvements to the GC process, including allowing a background thread to handle GC. This should give you the ability to support larger heaps (in theory) ... but nothing beats real tests on the app. Check out this link for more info:
Garbage Collection Performance (ASP.NET 4.0) - also, as I am limited on links, navigate to the Fundamentals page for some great explanations of the new GC features in .NET 4.0
(http://msdn.microsoft.com/en-us/library/ee787088.aspx#concurrent_garbage_collection)
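For the "run GC logs while stressing the app" step above, one lightweight option on Windows Server 2008 R2 is to log the CLR GC performance counter with typeperf, e.g. from a PowerShell prompt; the output path below is a placeholder:

# Sample the global "% Time in GC" counter every 5 seconds into a CSV file
typeperf "\.NET CLR Memory(_Global_)\% Time in GC" -si 5 -o C:\perflogs\gc.csv

Sustained values well above 10% under load are usually a sign that the heap is too large or too heavily churned for the collector to keep up.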
Hope this helps!
PS - Anyone out there on lesser hardware will need to be aware of ASP.NET's use of the GC thread. If you are running something in development like a Core Duo, you have to consider that up to 50% of your compute power (one of your two cores) can go to GC work, so hardware (the number of cores) is important to consider. If you have more cores than you need, this process should theoretically help performance; if you are constrained on cores, either get better hardware, use an older version of ASP.NET, or consider turning the feature off (if possible). Second, if latency is a concern, "hyper-threading" has an impact on performance as well; you always get better performance from physical cores, but that will not be a concern for 99.9% of the applications out there.
A 32-bit process gets 2 GB of address space by default. If the application is large-address-space aware (linked with /LARGEADDRESSAWARE), it gets 4 GB on 64-bit Windows (see http://msdn.microsoft.com/en-us/library/aa366778.aspx).
They're still often limited to 2 GB, since many applications depend on the top bit of pointers being zero.
