I'm trying to understand the Impala memory settings on my cluster.
The hosts in the cluster have 48 GB of memory each. When I look at the memory resources for each service on a host in Cloudera Manager, I see that 38 GB of memory is allocated to the Impala Daemon.
But the Impala Daemon Memory Limit is set to 0 in the Impala configuration.
So where is the 38 GB value for the Impala Daemon coming from?
And I believe the Impala Daemon memory limit is a node-level limit, not a cluster-level one. Is that right?
Please note that static and dynamic pools are also not configured.
If you don't set a process memory limit, Impala defaults to using 80% of the system's memory as its process memory limit. (And yes, the process memory limit is a per-node value, not a cluster-wide value.)
Note that this does not mean 80% of the system memory is actually available; it just means Impala will limit itself to 80% of it. If other processes are using that memory, you'll see swapping.
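As a rough sanity check, that default lines up with the ~38 GB you see in Cloudera Manager for a 48 GB host (a back-of-the-envelope sketch, not an exact accounting of what impalad reserves):

```python
# Impala's default process memory limit is roughly 80% of physical RAM
# when no explicit memory limit is configured.
host_ram_gb = 48          # per-host memory from the question
default_fraction = 0.80   # Impala's default share

print(f"Expected per-node Impala limit: ~{host_ram_gb * default_fraction:.1f} GB")
# -> ~38.4 GB, which matches the ~38 GB shown in Cloudera Manager
```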
Related
I have a Windows Server 2016 machine that runs a server program handling about 2.2k concurrent requests per second. The server program only uses about 25% CPU, 25% memory, and 30% bandwidth. It's written in C++, much like the Boost example; it just does some calculation and returns the result to the client over TCP, and it doesn't use the disk.
But it's very laggy. I can see the lag not only from my clients but also from Remote Desktop: it takes about 10 seconds to establish an RDP connection, whereas it's very quick (less than 2 seconds) if I close the server program.
I guess some resource on my server is exhausted. But how can I find it? Is there any tool that can profile the system to find the bottleneck?
Update
The server program spreads its load evenly by running 8 threads on 8 cores. I did take care of this, and it's confirmed in Task Manager: all 8 cores show nearly the same usage.
I found the problem: I'm using an sqlite3 database (my.db) to log all client accesses, and the server becomes laggier as the .db file grows. It is now 1.2 GB, which is what causes the lag.
Then I tried:
Keeping the 1.2 GB .db but only loading it once at startup to read some configuration, with no new log records and no read/write access while the server is running. It's still laggy.
Executing DELETE FROM log_table and VACUUM to remove the previous log entries and shrink the .db to 16 KB. With that, the lag problem is gone and client requests become very quick.
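For reference, that cleanup step can be scripted; here is a minimal sketch using Python's built-in sqlite3 module (the server itself is C++, this is just for illustration, with the file and table names taken from above):

```python
import sqlite3

# Purge the access log and reclaim the file space.
# "my.db" and "log_table" are the names used in the question.
conn = sqlite3.connect("my.db")
try:
    conn.execute("DELETE FROM log_table")   # drop the old log rows
    conn.commit()                           # close the implicit transaction first
    conn.execute("VACUUM")                  # VACUUM must run outside a transaction;
                                            # it rebuilds the file and shrinks it on disk
finally:
    conn.close()
```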
Question
Why can a large database cause the whole server to lag? It affects not only the server program itself but also other applications such as RDP connections, even though the load is low.
Server Environment
Windows Server 2016
cpu: 8 cores (25% used)
memory: 16 GB (25% used)
disk: 40 GB (30% used)
server program written in C++ with Boost coroutines
sqlite3 database with PRAGMA journal_mode=WAL; enabled.
Install the Sysinternals tools.
Launch procexp.exe (Process Explorer) and use it to find the memory and disk usage of your process and others.
Use Resource Monitor (Win+R, then type "resmon") to monitor the network bandwidth both when your program is running and when it isn't.
I'm checking a server that has 32 GB of RAM, and I see 99% memory usage.
The machine runs IIS, MongoDB, and Elasticsearch.
None of the processes seemed to be that big. The largest was MongoDB at about 1 GB.
So I shut down everything, and now the memory usage is 88%.
After a reboot, with all services running, the memory usage is 23%.
Those are the largest processes on the system with everything shut down. As you can see, each one is very small, but most of the RAM remains unaccounted for.
How can I track down what is eating up all the RAM? I tried Process Explorer, but it doesn't give me any more useful info.
Try RAMMap from Sysinternals; it will give you more details about memory usage, such as the Metafile category, for example.
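As a quick cross-check alongside RAMMap, you can sum the working sets of every visible process and compare that with the total memory in use; the difference is what lives in the kernel, the file cache, or pools such as the Metafile. A minimal sketch, assuming the third-party psutil package is installed:

```python
import psutil

# Sum the resident memory of every process we can see.
# This is approximate: shared pages are counted once per process,
# and protected processes report None for memory_info.
process_total = 0
for proc in psutil.process_iter(attrs=["memory_info"]):
    mem = proc.info["memory_info"]
    if mem is not None:
        process_total += mem.rss

vm = psutil.virtual_memory()
gb = 1024 ** 3
print(f"in use (system view):   {vm.used / gb:.1f} GB of {vm.total / gb:.1f} GB")
print(f"sum of process RSS:     {process_total / gb:.1f} GB")
print(f"kernel / cache / pools: {(vm.used - process_total) / gb:.1f} GB")
```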
Elasticsearch generally uses a lot of the available RAM to cache search results and aggregations. This is to avoid memory swapping. It's very evident and observable on Linux servers. That's why it's recommended to run ES on a separate server in production under heavy usage.
So please check the cache memory as well.
Have a look at the heap size allotted to Elasticsearch. You could check the values of -Xms and -Xmx in the jvm.options file. Usually, 50% of physical RAM is allotted to ES, and with bootstrap.memory_lock set to true, it locks that RAM. Ideally, as another answer mentions, Elasticsearch should run on its own machine.
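If it's easier than digging through config files, the same information is visible at runtime through the nodes stats API; a minimal sketch, assuming Elasticsearch is listening on localhost:9200 without authentication:

```python
import json
import urllib.request

# Ask each node for its current JVM heap usage and configured maximum.
url = "http://localhost:9200/_nodes/stats/jvm"   # adjust host/port for your cluster
with urllib.request.urlopen(url) as resp:
    stats = json.load(resp)

for node_id, node in stats["nodes"].items():
    mem = node["jvm"]["mem"]
    used_gb = mem["heap_used_in_bytes"] / 1024 ** 3
    max_gb = mem["heap_max_in_bytes"] / 1024 ** 3
    print(f"{node.get('name', node_id)}: heap {used_gb:.1f} GB used of {max_gb:.1f} GB max")
```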
I am running a large scale ERP system on the following server configuration. The application is developed using AngularJS and ASP.NET 4.5
Hardware: Dell PowerEdge R730 (quad-core 2.7 GHz, 32 GB RAM, 5 x 500 GB hard disks in a RAID 5 configuration).
Software: the host OS is VMware ESXi 6.0, with two VMs running on it. One is Windows Server 2012 R2 with 16 GB of memory allocated; this contains the IIS 8 server with my application code. The other VM is also Windows Server 2012 R2, with SQL Server 2012 and 16 GB of memory allocated; this contains just my application database.
You see, I separated the application server and database server for load balancing purposes.
My application contains a registration module where the load is expected to be very high (around 10,000 visitors over 10 minutes).
To support this volume of requests, I have done the following on my IIS server:
-> increased the request queue length of each application pool to 5000
-> enabled output caching for .aspx files
-> enabled static and dynamic compression in IIS
-> set the virtual memory limit and private memory limit of each application pool to 0
-> increased the maximum worker processes of each application pool to 6
I then used Gatling to run load tests against my application, injecting 500 users at once into the registration module.
However, I see that only 40-45% of my RAM is being used, and each worker process is using a maximum of only about 130 MB.
Gatling reports that around 20% of my requests get a 403 error, and more than 60% of all HTTP requests have a response time greater than 20 seconds.
A single user makes 380 HTTP requests over a span of around 3 minutes. The total data transfer of a single user is 1.5 MB. I have simulated 500 users like this.
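To put that in perspective, the aggregate load works out roughly as follows (simple arithmetic from the numbers above, ignoring ramp-up and think time):

```python
# Rough offered load from the figures above.
users = 500
requests_per_user = 380
session_seconds = 3 * 60        # each user's 380 requests span about 3 minutes
data_per_user_mb = 1.5

total_requests = users * requests_per_user
print(f"total requests:  {total_requests:,}")                             # 190,000
print(f"average rate:    {total_requests / session_seconds:,.0f} req/s")  # ~1,056 req/s
print(f"total transfer:  {users * data_per_user_mb:.0f} MB")              # 750 MB
```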
Is there anything missing in my server tuning? I have already tuned my application code to minimize memory leaks, increase timeouts, and so on.
There is a known issue with the newest generation of PowerEdge servers that use the Broadcom network chipset. Apparently, the "VM" feature of the NIC is broken, which results in horrible network latency on VMs.
Head to Dell and get the most recent firmware and Windows drivers for the Broadcom NIC.
Head to VMware Downloads and get the latest Broadcom driver.
As for the worker process settings: for maximum performance, you should consider running the same number of worker processes as there are NUMA nodes, so that there is a 1:1 affinity between worker processes and NUMA nodes. This can be done by setting the "Maximum Worker Processes" app pool setting to 0. With this setting, IIS determines how many NUMA nodes are available on the hardware and starts the same number of worker processes.
I guess the one caveat to the answer you received is that if your server isn't NUMA-aware and uses symmetric processing, you won't see those IIS options under CPU, but the above poster seems to know a good bit more than I do about the machine. Sorry, I don't have enough street cred to add this as a comment. As far as IIS goes, you may also want to make sure your app pool doesn't use the default recycle conditions, and pick a time like midnight for recycling. If you have root-level settings applied, the default app pool recycling at 29 hours may also trigger garbage collection against your child pool, causing delays even with concurrent GC; it sounds like you might benefit a bit from gcServer=true. Pretty tough to assess that, though.
Has your SQL Server been optimized for that type of workload? If your data isn't paramount, you could squeeze out faster execution times with delayed durability, then assess queries that return too much data by looking at async I/O wait types. In general, there's not enough here to really assess SQL optimizations, but if the database files aren't configured right (size/growth options), you could be hitting a lot of timeouts due to file growth, VLF fragmentation, etc.
I have Cloudera Express 5.3.2 installed on a cluster. I would like to use it for Impala querying.
I want to let Impala set the limit based on the cluster's capacity. In the Impala configuration in Cloudera Manager, it says to "leave it blank to let Impala pick its own limit". However, I can't leave the field blank because the web interface tells me that "this field is required".
http://i.imgur.com/c9RA8mV.png
Unfortunately Impala cannot set its own memory limit. You don't have to set a memory limit (use -1), but your queries will perform poorly if you run out of physical memory and the OS is forced to swap. If you're only using Impala on this cluster (i.e. not Hive, MapReduce, Spark, etc.), you can set this to most of the physical memory; we typically recommend 80%. If you do need to share resources with other systems, you should look at the resource management options available in CDH.
We have an ODBC pool running on a NonStop server. The pool is connected to SQL/MX.
This pool is used by a few external Java applications, each of which has a JDBC pool connected to the ODBC pool (e.g. 14 connections per application).
Over time (after a few application recycles) we see an imbalance between CPUs: some have 8 ODBC processes running, some only 5. That leads to a CPU time imbalance too.
Up to this point we assumed that CPUs are assigned to ODBC processes in round-robin fashion, which would keep the number of ODBC processes more or less equally distributed. That doesn't seem to be the case, though.
Is there any information on how the ODBC pool decides which CPU to use for each newly allocated process? Does it look at CPU load? Available memory? Something else?
Sadly, even HP's own people (available to us, that is) couldn't answer those questions with certainty. :-(
As it turns out, connections are in fact assigned to CPUs in round-robin fashion. But if one of the consumers (with its own pool) is restarted for any reason, its connections are released on the CPUs where they were allocated (obviously), while the new ones are allocated starting from the next CPU according to the round-robin algorithm. So some CPUs end up less busy and others busier, hence the imbalance.
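The effect is easy to reproduce with a tiny simulation of that allocation pattern (the CPU count and pool sizes here are illustrative, not measured on the NonStop system):

```python
from collections import Counter
from itertools import cycle

CPUS = 4                          # illustrative CPU count
rr = cycle(range(CPUS))           # the server-side round-robin pointer; it is never reset

def allocate(n):
    """Allocate n ODBC server processes, each on the next CPU in round-robin order."""
    return [next(rr) for _ in range(n)]

# Two consumers, each with a 14-connection pool (the figure from the question).
pool_a = allocate(14)
pool_b = allocate(14)
print("initial distribution:", Counter(pool_a + pool_b))    # even: 7 processes per CPU

# Consumer B restarts: its processes disappear wherever they were,
# but the replacements continue from wherever the round-robin pointer happens to be.
pool_b = allocate(14)
print("after one restart of B:", Counter(pool_a + pool_b))  # skewed: 8/8/6/6 here
```

Each restart shifts the starting point of the new allocations relative to the processes that survived, so the per-CPU counts drift apart over time exactly as described above.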