Openstack Swift high number of processes in D state

Openstack Swift high number of processes in D state - openstack

I recently set up a standalone swift cluster with one proxy node and three storage nodes. I put some 100gb of data on that. I have attached a volume of 1tb to all three servers and the storage is mounted on those volumes. Everyday at some point of time the storage nodes seems to be unresponsive and unreachable to the point where I have to restart the servers. After doing some monitoring it seemed that the system.cpu.load would start increasing at random times the reason for which being that many processes were waiting for IO and they were stuck in D states resulting in degradation of performance of the machines. On doing ps aux | awk '{if ($8 ~ "D") print $0}' I found these processes that were in D states. In syslog I see these kind of errors too.
I have no other processes running on the machines and cpu, memory everything else seems to be normal. The puzzling part is I have another instance of the cluster with no data and those servers have no issues. I cannot figure out what exactly is causing the processes to be in wait state and some help would greatly be appreciated.

Related

Puma Stops Running for Rails App on EC2 Instance with Nginx (using Capistrano/Capistrano Puma)

My top-level question is, how can I get Puma to stop failing. But that is really made up of lots of smaller questions. I will number and bold each of them, to try to make this question answerable.
I am hosting a Rails application on an EC2 instance that is a t2.nano. This is admittedly, a very small box--but I don't expect my website to receive any traffic. I configured everything successfully with Nginx and Puma using Capistrano and Capistrano Puma. Everything was great, until one day I went to my website and saw the Nginx 504 message.
I opened the Nginx error log and saw that it could not connect to Puma:
connect() to unix:/home/deploy/myapp/shared/tmp/sockets/puma.sock failed (111: Connection refused) while connecting to upstream, client: xxx.xxx.xxx.xxx, server: localhost, request: "GET / HTTP/1.0", upstream: "http://unix:/home/deploy/myapp/shared/tmp/sockets/puma.sock:/500.html", host: "myapp.com"
Debugging this, I learned that Puma had stopped running. That is why Nginx could not connect to it. I think there are two problems here: the first, is that Puma should not stop running. The server is tiny, but there is no traffic. the second, is that when Puma does fail, it should restart gracefully. However, I am just focusing on the first issue for now. Because if Puma is constantly restarting, it seems reasonable that sometimes it kills the process in a harsh way.
To debug this, I opened htop. Sure enough, the machine was running without any memory to spare. This makes sense--I am running a database, rails app, webserver, and memcache on one tiny machine. It keeps running out of memory and killing Puma.
I looked into the Puma configuration I had set up with Capistrano. In config/deploy.rb I had these lines--
set :puma_threads, [0, 8]
set :puma_workers, 0
I read all about puma_workers and puma_threads. I also learned that Nginx has its own workers. Puma processes are very expensive. What makes Puma cool is that it is properly muli-threaded--so the independent processes are awesome. It sounds like each worker has its own set of threads--so if there are 4 workers with 8 threads, there will be 32 processes. But in my case, I want to use very little memory. 2 processes sound good to me. 1. Is my understanding of workers and threads correct?
I updated my config/deploy.rb file and deployed, with 0 puma_workers and min=0, max=2 threads.
It appears the configuration for Nginx lives here: /etc/nginx/nginx.conf. And the configuration for Puma lives here: /home/deploy/myapp/shared/puma.rb. I would have expected my updates in config/deploy.rb to have had Capistano edit the config files. No luck--my min, max threads was still set to 0,8. 2. Is it correct to try and update these values through config/deploy.rb when using Capistano?
Also--I opened the nginx.conf and saw worker_processes 4;. 3. Was this set to four when I installed Nginx or did Capistano set this default?
I opened htop and sure enough I had lots of Puma processes. Therefore, I edited my config files manually and restarted Puma and Nginx.
I changed the number of Nginx workers from 4 to 1. Looking in htop, this worked. I now only had 1 Nginx worker. However, the Nginx workers were never very expensive (compared to the Puma threads). So I don't think this matters much.
However, there were still more than 2 Puma threads--there were 6. On a lark, I changed the minimum number of threads from 0 to 1--thinking 0 isn't a possible number so maybe it's setting a default. This increased the number of Puma processes to 9. I also tried changing the number of puma_workers to 1, for the same reason, and the number of processes increased. 4. What does it mean to have 0 threads and/or workers?
I then tried to kill one of the puma processes manually (sudo kill xxxxx), and then all of the Puma processes died.
5. What do I have to do to have just 2 puma processes?
As you can see, my understanding of Puma is not great and the lines between what Puma vs Nginx vs Capistano touches is not clear. Any help is greatly appreciated. I haven't been able to find great resources regarding this issue.

This is what I've learned--
if Puma stops working, make sure you have enough memory to handle to number of workers and threads that you specified. each Puma process is pretty expensive.
if you set 0 workers, Puma will not run in cluster mode. it is recommended to run MRI using cluster mode.
threads are set per cluster. if you have 2 works and 0,8 threads that means you will have two works and each will have between 1 and 8 threads.
Puma uses processes in addition to the threads. Puma has a PID for the parent process. if you are using cluster mode, it has a PID to manager the clusters. if you are using cluster mode, it also has a PID for each cluster. then, there are a fixed number of PIDs to run other tasks (per cluster). without cluster mode, there are 5 fixed PIDs. with cluster mode, there are 7 fixed PIDs.
this is all to say--if you see more processes than you expect, this is why. also--when you add a new worker you add a significant amount of expensive processes. make sure you have the space.
i have a small app, and things seem to be working nicely with 1 worker and min=1, max=4 threads. having a max of 8 threads looks to be what kept killing puma for me.
To answer my original questions--
Yes, the explanation above of workers and threads is correct.
capistrano-puma appears to only set puma config with the first deploy.
I think the nginx config is created when nginx is installed.
0 workers means you are running puma without cluster mode. It is impossible to have 0 threads. I believe 0,8 is the same as 1,8.
Puma needs to run processes in addition to the threads you request. It is impossible to have puma running with only 2 or 3 PIDs. These processes run addition tasks.

A suspect for Puma hangs
The thing with Puma is that it's the only mainstream project that encourages the use of threading in MRI Ruby (well, anyway, Heroku encourages).
This is why we sometimes see statements from people working on Puma about how people think that Puma has various kinds of issues, while the problem is elsewhere, and it is, and it affects only Puma :P
"We" have discovered and fixed in the past some very freaky and nasty Ruby GC issues on heavy duty use of threads in Ruby MRI with some freaky corner cases (remember http://blog.skylight.io/hunting-for-leaks-in-ruby/) and who is to say this is not the last of such freaky issues that people attribute to Puma?
Try disabling threading for a while, see how it goes, and let us know, maybe the rabbit lies there, again
Docs explaining threads vs clustered mode vs workers
Thread pool docs: https://github.com/puma/puma#thread-pool
Clustered mode docs: https://github.com/puma/puma#clustered-mode
puma.rb options: https://github.com/puma/puma/blob/master/examples/config.rb
Under Thread pool the docs explain how to set up the number of worker threads. Remember, Puma is/was primarily a JRuby thing and MRI support & forking was added only later as an afterthought, the ordering of configuration entries in the docs (how to set up threading before how to set up forking) is a consequence of this.
The docs state:
Puma utilizes a dynamic thread pool which you can modify. You can set the minimum and maximum number of threads that are available in the pool with the -t (or --threads) flag:
Puma 2 offers clustered mode, allowing you to use forked processes to handle multiple incoming requests concurrently, in addition to threads already provided.
Meaning, Puma will always thread, it's what it does, if you tell it to do 0/1 thread, it will do 1 thread so it can serve requests.
Additionally, if you set the number of workers (processes) to > 1, Puma will run in "Clustered mode" which means it will fork and each fork will thread,
i.e. -w 3 -t4:4 will result with 3 processes running 4 threads each, allowing you to concurrently server 12 requests.
Puma docs don't specify which and how many processes Puma will use for it's internals, but just an educated guess is that at the very minimum it needs to run all of the workers + 1 master process to manage them, deliver data to them, start them, stop them, channel their logs etc.

How many Tornado Instances and How many Nginx Worker Processes

Suppose I am running a web application using Tornado and running them behind Nginx as a Load Balancer. Please tell me the best practices for certain things.
1. If I am running the service in an AWS EC2 instance, then How many NGINX worker processes should I run for a given x number of VCPUs for any particular instance. Lets say I am running on an EC2 instance with 2 VCPUs, then how many worker processes should I run? It would be better if I know the general rule for it. Also, in what conditions should I increase the number of workers as against the general rule?
2. Now after I set my Nginx as load balancer, it boils down to my Tornado Application. So, how many Tornado instances should I run given x number of VCPUs in an EC2 instance? As mentioned in the doc, its good to have 1 instance per processor, but is that the best condition? If yes, then in what scenario, should I look for increasing the number of instances per processor? If not, than what is the best rule?
NOTE : I am running the instances via Supervisord as my process management program.
3. Now if my application does a lot of async calls to MySQL Database and MongooseIM server, all running on the same host, then will the number of Tornado Instances per processor should be changed? If yes, then what is the rule? If not, then what is the best practice?

If you are running nginx on a machine by itself, then you should give it as many worker processes as you have CPUs. If you're running it on the same machine as Tornado then you probably want to give it fewer (maybe just one). But it's better to be too high than too low here, so if you're unsure it's fine to use the number of CPUs. You'll want more nginx workers if you're using TLS (especially with stronger security settings) or serving a lot of static files, and fewer if it's just a proxy to Tornado.
One Tornado instance per CPU is the best starting point. You might decrease this number if your application does a lot with threads or if there are other things running on the same machine, and you might increase it if you do any synchronous database/network calls without threads.
As long as your database calls are asynchronous, they do not affect how many Tornado processes you should run.

How can I see detailed work of nodes on a Rocks Cluster?

I built a Rocks Cluster for my school project, which is matrix multiplication, with one frontend and 5 other computers which are nodes. Over MPI I send them partions of matrix which they use for multiplication and then they send data back. Command which I run is:
mpirun -hostfile myhostfile ./myprogram
where myhostfile is a file of names of nodes and their slots(thread) numbers.
My program is working and I'm trying to analize it now.
My question is how can i see the work of each nodes core/processor working on his task, are the all processors working, is there some kind of overload?
I tried to install Vampir profiler and Intels Vtune Amplifierbut but I have some problems attaching them to my program with this command above (other comands dont allow me to run my programs on all threads of a node). All that i have accomplished (to see my nodes working good besides Ganglia) is to login to a node from the frontend and with the command "top" I could see when my program is executing by the number of threads and almost 100% CPU usage on each thread.

Take a look at mpstat
With no params it will show aggregated load for all cores
mpstat -P ALL shows load for each core
This will give you realtime stats for your nodes:
watch pdsh -w compute-01-[01-10] mpstat
(use your compute nodes names)

How do I kill running map tasks on Amazon EMR?

I have a job running using Hadoop 0.20 on 32 spot instances. It has been running for 9 hours with no errors. It has processed 3800 tasks during that time, but I have noticed that just two tasks appear to be stuck and have been running alone for a couple of hours (apparently responding because they don't time out). The tasks don't typically take more than 15 minutes. I don't want to lose all the work that's already been done, because it costs me a lot of money. I would really just like to kill those two tasks and have Hadoop either reassign them or just count them as failed. Until they stop, I cannot get the reduce results from the other 3798 maps!
But I can't figure out how to do that. I have considered trying to figure out which instances are running the tasks and then terminate those instances, but
I don't know how to figure out which instances are the culprits
I am afraid it will have unintended effects.
How do I just kill individual map tasks?

Generally, on a Hadoop cluster you can kill a particular task by issuing:
hadoop job -kill-task [attempt_id]
This will kill the given map task and re-submits it on an different
node with a new id.
To get the attemp_id navigate on the Jobtracker's web UI to the map task
in question, click on it and note it's id (e.g: attempt_201210111830_0012_m_000000_0)

ssh to the master node as mentioned by Lorand, and execute:
bin/hadoop job -list
bin/hadoop job –kill <JobID>

Munin Graphs meaning

I've been using Munin for some days and I think it's very interesting information, but I don't understand some of the graphs, and how they can be used/read to get information to improve system.
The ones I don't understand are:
Disk
Disk throughput per device
Inode usage in percent
IOstat
Firewall Throughput
Processes
Fork rate
Number of threads
VMstat
System
Available entropy
File table usage
Individual interrupts
Inode table usage
Interrupts and context switches
Ty!

Munin creates graphs that enable you to see trends. This is very useful to see if a change you made doesn't negatively impact the performance of the system.
Disk
Disk throughput per device & IOstat
The amount of data written or read from a disk device. Disks are always slow compared to memory. A lot of disk reads could for example indicate that your database server doesn't have enough RAM.
Inode usage in percent
Every filesystem has a index where information about the files is stored, like name, permissions and location on the disk. With many small files the space available to this index could run out. If that happens no new files can be saved to that filesystem, even if there is enough space on the device.
Firewall Throughput
Just like it says, the amount of packets going though the iptables firewall. Often this firewall is active on all interfaces on the system. This is only really interesting if you run munin on a router/firewall/gateway system.
Processes
Fork rate
Processes are created by forking a existing process into two processes. This is the rate at wich new processes are created.
Number of threads
The total number of processes running in the system.
VMstat
Usage of cpu time.
running: time spent running non-kernel code
I/O sleep: Time spent waiting for IO
System
Available entropy: The entropy is the measure of the random numbers available from /dev/urandom. These random numbers are needed to create SSL connections. If you create a large number of SSL connections this randomness pool could possibly run out of real random numbers.
File table usage
The total number of files open in the system. If this number suddenly goes up there might be a program that is not releasing its file handles properly.

Categories

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex