How to kill a running EL pipeline in Meltano / Airflow

When running an Extract/Load pipeline with Meltano, what's the best way (or ways) to kill a running job?
Generally these would be run via Airflow, but it would be nice to have a process that also works with bare meltano elt and/or meltano run invocations from orphaned terminal sessions, which can't always be canceled simply by hitting Ctrl+C.
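One generic Unix-level approach, independent of Meltano specifics: find the pipeline's PID and signal its whole process group, so that the extractor/loader child processes die along with the parent. A minimal Python sketch, assuming the pipeline was started in its own session (e.g. via setsid or start_new_session=True); the sleep command below is just a stand-in for a long-running meltano run:

```python
import os
import signal
import subprocess

def kill_pipeline_group(pid, sig=signal.SIGTERM):
    """Signal the whole process group so extractor/loader children die too."""
    try:
        os.killpg(os.getpgid(pid), sig)
    except ProcessLookupError:
        pass  # process already gone

# Demo with a stand-in for a long-running `meltano run` invocation,
# started in its own session so it owns its process group.
proc = subprocess.Popen(["sleep", "300"], start_new_session=True)
kill_pipeline_group(proc.pid)
print(proc.wait(timeout=5))  # negative return code => killed by a signal (-15 = SIGTERM)
```

For an orphaned session you would first locate the PID, for example with pgrep -f "meltano (elt|run)", then escalate to SIGKILL if SIGTERM is ignored.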

Related

Openresty dynamic spawning of worker process

Is it possible to spawn a new worker process and gracefully shut down an existing one dynamically using Lua scripting in openresty?
Yes but No
Openresty itself doesn't really offer this kind of functionality directly, but it does give you the necessary building blocks:
nginx workers can be terminated by sending a signal to them
openresty allows you to read the PID of the current worker process
LuaJIT's FFI allows you to use the kill() system call, or
using os.execute, you can just call the kill command directly.
Combining those, you should be able to achieve what you want :D
Note: After reading the question again, I noticed that I really only answered the second part.
nginx uses a set number of worker processes, so you can only shut down running workers which the master process will then restart, but the number will stay the same.
If you just want to change the number of worker processes, you would have to restart the nginx instance completely (I just tried nginx -s reload -g 'worker_processes 4;' and it didn't actually spawn any additional workers).
However, I can't see a good reason why you'd ever do that. If you need additional threads, there's a separate API for that; other than that, you'll probably just have to live with a hard restart.
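The signal-sending step in that recipe is plain POSIX, so it can be sketched outside Lua as well. A minimal Python illustration, where the sleep process stands in for an nginx worker whose PID you've already read; the OpenResty-specific parts (reading the worker PID, and the fact that nginx treats SIGQUIT as a graceful shutdown) are left to your Lua code:

```python
import os
import signal
import subprocess

# Stand-in for a worker process whose PID we've read.
worker = subprocess.Popen(["sleep", "300"])

# nginx interprets SIGQUIT as "graceful shutdown"; for our stand-in
# the default disposition simply terminates it.
os.kill(worker.pid, signal.SIGQUIT)
print(worker.wait(timeout=5))  # -3 == terminated by SIGQUIT
```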

Infinite process creation of PHP script by Supervisord

This is a Unix/queues/Supervisor question. :)
I'm running a PHP command from the CLI as a one-shot script called in my service. It connects to an Amazon SQS queue and checks if there are jobs to do.
Currently I'm running an infinite loop with Supervisord using the autorestart=true option. Things work great, but the uptime of this process is always 0:00 (which is understandable), and each time the script is called Supervisor creates a new process with a new PID.
So basically my question is: is it fine to recreate the process with a new PID every time? It's like: initialize process, run process, end process, looping 10x per second. Obviously the PID is increasing fast.
Can I leave it as it is, or are there other ways to run it as a single process that runs subprocesses? (Supervisor already runs its jobs in subprocesses, so I guess there cannot be a subprocess of a subprocess?)
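The usual alternative to respawning is a single long-lived process that loops internally, so the PID stays stable and Supervisor only restarts it on a crash. A framework-agnostic sketch in Python; fetch_job and handle_job are placeholders for your SQS receive call and job handler:

```python
import time

def worker_loop(fetch_job, handle_job, idle_sleep=0.1, max_polls=None):
    """One long-lived process: poll for work, handle it, sleep when idle."""
    polls = 0
    while max_polls is None or polls < max_polls:
        job = fetch_job()           # e.g. an SQS ReceiveMessage call
        if job is not None:
            handle_job(job)
        else:
            time.sleep(idle_sleep)  # back off instead of exiting
        polls += 1

# Demo with an in-memory list standing in for SQS.
queue = ["job-1", "job-2"]
done = []
worker_loop(lambda: queue.pop(0) if queue else None, done.append, max_polls=5)
print(done)  # ['job-1', 'job-2']
```

With a loop like this, Supervisor's autorestart=true becomes a crash-recovery mechanism rather than the scheduler.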

makeCluster function in R snow hangs indefinitely

I am using the makeCluster function from the R package snow on a Linux machine to start a SOCK cluster on a remote Linux machine. All seems settled for the two machines to communicate successfully (I am able to establish ssh connections between the two). But:
makeCluster("192.168.128.24",type="SOCK")
does not throw any result, just hangs indefinitely.
What am I doing wrong?
Thanks a lot
Unfortunately, there are a lot of things that can go wrong when creating a snow (or parallel) cluster object, and the most common failure mode is to hang indefinitely. The problem is that makeSOCKcluster launches the cluster workers one by one, and each worker (if successfully started) must make a socket connection back to the master before the master proceeds to launch the next worker. If any of the workers fail to connect back to the master, makeSOCKcluster will hang without any error message. The worker may issue an error message, but by default any error message is redirected to /dev/null.
In addition to ssh problems, makeSOCKcluster could hang because:
R not installed on a worker machine
snow not installed on the worker machine
R or snow not installed in the same location as on the local machine
current user doesn't exist on a worker machine
networking problem
firewall problem
and there are many more possibilities.
In other words, no one can diagnose this problem without further information, so you have to do some troubleshooting in order to get that information.
In my experience, the single most useful troubleshooting technique is manual mode which you enable by specifying manual=TRUE when creating the cluster object. It's also a good idea to set outfile="" so that error messages from the workers aren't redirected to /dev/null:
cl <- makeSOCKcluster("192.168.128.24", manual=TRUE, outfile="")
makeSOCKcluster will display an Rscript command to execute in a terminal on the specified machine, and then it will wait for you to execute that command. In other words, makeSOCKcluster will hang until you manually start the worker on host 192.168.128.24, in your case. Remember that this is a troubleshooting technique, not a solution to the problem, and the hope is to get more information about why the workers aren't starting by trying to start them manually.
Obviously, the use of manual mode bypasses any ssh issues (since you're not using ssh), so if you can create a SOCK cluster successfully in manual mode, then probably ssh is your problem. If the Rscript command isn't found, then either R isn't installed, or it's installed in a different location. But hopefully you'll get some error message that will lead you to the solution.
If makeSOCKcluster still just hangs after you've executed the specified Rscript command on the specified machine, then you probably have a networking or firewall issue.
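One quick way to separate networking/firewall problems from R problems is to test the raw TCP path the worker needs: from the worker machine, try to open a connection back to the master on the port makeSOCKcluster listens on (visible in the Rscript command shown in manual mode). A small Python check, assuming you know the master's host and port:

```python
import socket

def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If this returns False from the worker host while the master is listening, suspect routing or a firewall rather than the R or snow installation.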
For more troubleshooting advice, see my answer to "making cluster in doParallel / snowfall hangs".

How do I kill running map tasks on Amazon EMR?

I have a job running using Hadoop 0.20 on 32 spot instances. It has been running for 9 hours with no errors. It has processed 3800 tasks during that time, but I have noticed that just two tasks appear to be stuck and have been running alone for a couple of hours (apparently responding because they don't time out). The tasks don't typically take more than 15 minutes. I don't want to lose all the work that's already been done, because it costs me a lot of money. I would really just like to kill those two tasks and have Hadoop either reassign them or just count them as failed. Until they stop, I cannot get the reduce results from the other 3798 maps!
But I can't figure out how to do that. I have considered trying to figure out which instances are running the tasks and then terminate those instances, but
I don't know how to figure out which instances are the culprits
I am afraid it will have unintended effects.
How do I just kill individual map tasks?
Generally, on a Hadoop cluster you can kill a particular task by issuing:
hadoop job -kill-task [attempt_id]
This will kill the given map task and resubmit it on a different node with a new attempt id.
To get the attempt_id, navigate in the JobTracker's web UI to the map task in question, click on it, and note its id (e.g. attempt_201210111830_0012_m_000000_0)
ssh to the master node as mentioned by Lorand, and execute:
bin/hadoop job -list
bin/hadoop job -kill <JobID>

Tie the life of a process to the shell that started it

In a UNIX-y way, I'm trying to start a process, background it, and tie the lifetime of that process to my shell.
What I'm talking about isn't simply backgrounding the process. I want the process to be sent SIGTERM, or to have an open file descriptor that is closed, or something similar when the shell exits, so that the user of the shell doesn't have to explicitly kill the process or get a "you have running jobs" warning.
Ultimately I want a program that can run, uniquely, for each shell and carry state along with that shell, and close when the shell closes.
IBM's DB2 console commands work this way. When you connect to the database, it spawns a "db2bp" process, that carries the database state and connection and ties it to your shell. You can connect in multiple different terminals or ssh connections, each with its own db2bp process, and when those are closed the appropriate db2bp process dies and that connection is closed.
DB2 queries are then started with the db2 command, which simply hands it off to the appropriate db2bp process. I don't know how it communicates with the correct db2bp process, but maybe it uses the tty device connected to stdin as a unique key? I guess I need to figure that out too.
I've never written anything that does tty manipulation, so I have no clue where to even start. I think I can figure the rest out if I can just spawn a process that is automatically killed on shell exit. Anyone know how DB2 does it?
If your shell isn't a subshell, you can do the following. Put this into a script called "ttywatch":
#!/usr/bin/perl
my $p=open(PI, "-|") || exec @ARGV; sleep 5 while(-t); kill 15,$p;
Then run your program as:
$ ttywatch commandline... & disown
Disowning the process will prevent the shell from complaining that there are running processes, and when the terminal closes, it will cause SIGTERM (15) to be delivered to the subprocess (your app) within 5 seconds.
If the shell is a subshell, you can use a program like ttywrap to at least give it its own tty, and then the above trick will work.
Okay, I think I figured it out. I was making it too complicated :)
I think all db2 is daemon-izing db2bp, then db2bp is calling waitpid on the parent PID (the shell's PID) and exiting after waitpid returns.
The communication between the db2 command and db2bp seems to be done via fifo with a filename based on the parent shell PID.
Waaaay simpler than I was thinking :)
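The parent-watching part of that scheme is easy to sketch. One caveat: waitpid only works on your own children, so a daemonized helper can't literally waitpid the shell; instead it can poll whether the shell's PID still exists (signal 0 checks existence without delivering anything) and clean up when it disappears. A minimal Python sketch of that polling approach:

```python
import os
import time

def watch_pid(pid, on_exit, interval=0.5):
    """Poll until the given PID disappears, then run the cleanup callback.
    Signal 0 checks for existence without actually delivering a signal."""
    while True:
        try:
            os.kill(pid, 0)
        except ProcessLookupError:
            on_exit()
            return
        time.sleep(interval)
```

A helper spawned from your shell would call watch_pid(os.getppid(), cleanup) after daemonizing.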
For anyone who is curious, this whole endeavor was to be able to tie a python or groovy interactive session to a shell, so I could test code while easily jumping in and out of a session that would retain database connections and temporary classes / variables.
Thank you all for your help!
Your shell should be sending a SIGHUP signal to any running child processes when it shuts down. Have you tried adding a SIGHUP handler to your application to shut it down cleanly when the shell exits?
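A minimal sketch of such a handler in Python; the handler body is a placeholder for whatever your application needs to close before exiting:

```python
import signal
import sys

def handle_hup(signum, frame):
    """Runs when the terminal/shell goes away and SIGHUP is delivered."""
    # close connections, flush state, etc., then exit cleanly
    sys.exit(0)

signal.signal(signal.SIGHUP, handle_hup)
```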
Is it possible that your real problem here is the shell and not your process. My understanding agrees with Jim Lewis' that when the shell dies its children should get SIGHUP. But what you're complaining about is the shell (or perhaps the terminal) trying to prevent you from accidentally killing a running shell with active children.
Consider reading the manual for the shell or the terminal to see if this behavior is configurable.
From the bash manual on my MacBook:
The shell exits by default upon receipt of a SIGHUP. Before exiting, an interactive shell resends the SIGHUP to all jobs, running or stopped. Stopped jobs are sent SIGCONT to ensure that they receive the SIGHUP. To prevent the shell from sending the signal to a particular job, it should be removed from the jobs table with the disown builtin (see SHELL BUILTIN COMMANDS below) or marked to not receive SIGHUP using disown -h. If the huponexit shell option has been set with shopt, bash sends a SIGHUP to all jobs when an interactive login shell exits.
which might point you in the right direction.
