Grunt task to kill a PID based on port number - gruntjs

Currently I manually kill the PID that is listening on a port (4444, of course the Selenium server for my Protractor tests). For the rest, i.e. starting the Selenium server and running my Protractor tests, I have Grunt tasks and they work like a charm. But if I do not manually kill the process left over from a previous test run that is still holding the Selenium server's port, my tests will not start because the port is already occupied.
What I would ideally want as a solution is a Grunt task that fetches all the processes, filters them by port 4444, and kills the matching process. Can someone please help me with this Grunt task? Thanks in advance.
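For reference, the core of what such a task needs to do can be sketched independently of Grunt: list the PIDs bound to port 4444 and send them a kill signal. Below is a minimal sketch in Python around lsof (a Unix-like system is assumed; the same lsof command could equally be wrapped in a grunt-shell or grunt-exec task).

```python
# Minimal sketch: find whatever is listening on port 4444 and kill it.
# Assumes a Unix-like system with lsof available; the port is the only
# non-illustrative detail.
import os
import signal
import subprocess

PORT = 4444

# "lsof -t -i :4444" prints only the PIDs of processes using the port.
result = subprocess.run(
    ["lsof", "-t", "-i", f":{PORT}"],
    capture_output=True,
    text=True,
)

for line in result.stdout.split():
    pid = int(line)
    print(f"Killing PID {pid} on port {PORT}")
    os.kill(pid, signal.SIGTERM)  # escalate to SIGKILL if the process ignores SIGTERM
```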

Related

How to kill a running EL pipeline in Meltano

When running an Extract/Load pipeline with Meltano, what's the best way (or ways) to kill a running job?
Generally these would be run via Airflow, but it would be nice to have a process that also works with bare meltano elt and/or meltano run invocations from orphaned terminal sessions, which might not be cancelable simply by hitting Ctrl+C.

Running airflow DAG/tasks on different hosts

We currently have a bunch of independent jobs running on different servers & being scheduled with crontab. The goal would be to have a single view of all the jobs across the servers and whether they've run successfully etc.
Airflow is one of the tools we are considering using to achieve this. But our servers are configured very differently. Is it possible to set up Airflow so that DAG1 (and the Airflow scheduler & webserver) runs on server1 and DAG2 runs on server2, without RabbitMQ?
Essentially I'd like to achieve something like the first answer given here (or just at a DAG level): Airflow DAG tasks parallelism on different worker nodes
in the quickest & simplest way possible!
Thanks
You can check out Running Apache Airflow with Celery Executor in Docker.
To use Celery, you can instantiate a Redis node as a pod and proceed to manage tasks across multiple hosts.
The link above also gives you a starter docker-compose YAML to help you get started quickly with Apache Airflow on the Celery executor.
Is it possible to set up Airflow so that DAG1 (and the Airflow scheduler & webserver) runs on server1 and DAG2 runs on server2, without RabbitMQ?
With the Celery executor, Airflow will by default spread work across multiple hosts, and the division is always at the task level, not the DAG level.
This post might help you with spawning specific tasks on a specific worker node.
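For illustration, a minimal sketch of how that task-level routing can still pin an entire DAG to one machine: give every task the same Celery queue via default_args and run a worker subscribed to that queue only on server2. Airflow 2.x import paths are assumed and the DAG and queue names are hypothetical.

```python
# Hypothetical sketch: every task in this DAG inherits queue="server2_queue",
# so it will only run on a worker started on server2 and subscribed to that
# queue (e.g. "airflow celery worker -q server2_queue" on Airflow 2.x).
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="dag2_on_server2",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"queue": "server2_queue"},  # applied to every task below
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extract")
    load = BashOperator(task_id="load", bash_command="echo load")
    extract >> load
```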

Is there way to have 3 set of worker nodes (groups) for airflow

We are setting up Airflow for scheduling/orchestration. Currently we have Spark Python loads and non-Spark loads on different servers, and push files to GCP from yet another server. Is there an option to decide which worker nodes the Airflow tasks are submitted to? Currently we are using SSH connections to run all workloads. Our processing is mostly on-prem.
We use the Celery executor model. How do we make sure that a specific task runs on its appropriate node?
Task 1 runs on a non-Spark server (no Spark binaries available).
Task 2 executes a PySpark submit (this server has the Spark binaries).
Task 3 pushes the files created by task 2 from another server/node (only this one has the GCP utilities installed to push the files, for security reasons).
If I create a DAG, is it possible to specify which set of worker nodes a task executes on?
Currently we have a wrapper shell script for each task and make 3 SSH runs to complete the process. We would like to avoid such wrapper shell scripts and instead use the built-in PythonOperator, SparkSubmitOperator, SparkJdbcOperator and SFTPToGCSOperator, making sure each specific task runs on a specific server or worker node.
In short, can we have 3 worker node groups and make each task execute on a group of nodes based on the operation?
We can assign a queue to each worker node.
Start each Airflow worker with its queue specified:
airflow worker -q sparkload
airflow worker -q non-sparkload
airflow worker -q gcpload
Then set the queue on each task (see the sketch below). A similar thread was found as well:
How can Airflow be used to run distinct tasks of one workflow in separate machines?
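For illustration, here is a hedged sketch of the DAG side of that setup: queue is a parameter accepted by every operator, so each of the three tasks can be routed to its own worker group. BashOperator stands in for the PythonOperator/SparkSubmitOperator/SFTPToGCSOperator mentioned above, all commands are placeholders, and Airflow 2.x import paths are assumed.

```python
# Minimal sketch: three tasks routed to three worker groups via Celery queues.
# The queue names match the workers started above; commands are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="three_worker_groups",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    prepare = BashOperator(
        task_id="prepare",
        bash_command="echo 'runs on the non-Spark server'",
        queue="non-sparkload",  # only workers started with -q non-sparkload pick this up
    )
    spark_job = BashOperator(
        task_id="spark_submit",
        bash_command="spark-submit /jobs/my_job.py",  # placeholder
        queue="sparkload",  # runs where the Spark binaries live
    )
    push_to_gcp = BashOperator(
        task_id="push_to_gcp",
        bash_command="gsutil cp /data/out/* gs://my-bucket/",  # placeholder
        queue="gcpload",  # runs on the node with the GCP utilities
    )

    prepare >> spark_job >> push_to_gcp
```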

El-cheapo way to monitor tasks in a cluster and restart if they crash (self-healing)?

Consider a linux cluster of N nodes. It needs to run M tasks. Each task can run on any node. Assume the cluster is up and working normally.
Question: what's the simplest way to monitor the M tasks are running, and if a task exits abnormally (exit code != 0), start a new task on any of the up machines. Ignore network partitions.
Two of the M tasks have a dependency, so that if task 'm' goes down, task 'm1' should be stopped. Then 'm' is started and, once it is up, 'm1' can be restarted. 'm1' depends on 'm'. I can provide an orchestration script for this.
I eventually want to work up to Kubernetes which does self-healing but I'm not there yet.
The right (tm) way to do it is to set up a retry, potentially with some back-off strategy. There have been many similar questions here on Stack Overflow about how to do this; this is one of them.
If you still want to do the monitoring and explicit task restart, then you can implement a service based on the task events that will do it for you. It is extremely simple, and proof of how brilliant Celery is. The service should handle the task-failed event. An example of how to do it is on the same page.
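For illustration, a minimal sketch of such a monitoring service built on Celery's real-time event API. The broker URL and the restart_task hook are assumptions; the handler is where you would call your own orchestration/restart logic.

```python
# Minimal sketch of a Celery event monitor that reacts to failed tasks.
# Assumes a Redis broker at the URL below and workers started with events
# enabled (e.g. "celery -A proj worker -E").
from celery import Celery

app = Celery(broker="redis://localhost:6379/0")  # assumed broker URL


def restart_task(task):
    # Hypothetical hook: re-queue the task, or call the orchestration script
    # that stops 'm1', restarts 'm', then brings 'm1' back up.
    print(f"would restart {task.name}[{task.uuid}]")


def monitor(app):
    state = app.events.State()

    def on_task_failed(event):
        state.event(event)
        task = state.tasks.get(event["uuid"])
        print(f"task failed: {task.name}[{task.uuid}] {task.info()}")
        restart_task(task)

    with app.connection() as connection:
        recv = app.events.Receiver(
            connection,
            handlers={"task-failed": on_task_failed, "*": state.event},
        )
        # Blocks and processes events as workers emit them.
        recv.capture(limit=None, timeout=None, wakeup=True)


if __name__ == "__main__":
    monitor(app)
```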
If you just need an initialization task to run for each computation task, you can use the Job concept along with an init container. Jobs are tasks that run just once until completion; Kubernetes will restart them if they crash.
Init containers run before the actual pod containers are started and are used for initialization tasks: https://kubernetes.io/docs/concepts/workloads/pods/init-containers/
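For illustration, a hedged sketch of that Job-plus-init-container idea using the Kubernetes Python client. The client library, image names, service names and commands are all assumptions; the answer above only describes the concept.

```python
# Hypothetical sketch: run task m1 as a Kubernetes Job whose init container
# waits for the service of task m before m1 starts. The Job controller
# recreates the pod (up to backoff_limit times) if it exits non-zero.
# All names, images and commands are placeholders.
from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() inside the cluster

job = client.V1Job(
    api_version="batch/v1",
    kind="Job",
    metadata=client.V1ObjectMeta(name="task-m1"),
    spec=client.V1JobSpec(
        backoff_limit=4,
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                init_containers=[
                    client.V1Container(
                        name="wait-for-m",
                        image="busybox:1.36",
                        # Block until the dependency 'm' answers on its port.
                        command=["sh", "-c", "until nc -z m-service 8080; do sleep 2; done"],
                    )
                ],
                containers=[
                    client.V1Container(
                        name="m1",
                        image="example/m1:latest",
                        command=["./run-m1"],
                    )
                ],
            )
        ),
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
```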

Infinite process creation of PHP script by Supervisord

This is a Unix/queues/Supervisor question. :)
I'm running a PHP command from the CLI as a one-shot script called in my service. It connects to an Amazon SQS queue and checks if there are jobs to do.
Currently I'm running an infinite loop with supervisord using the autorestart=true option. Things work great, but the uptime of this process is always 0:00 (which is understandable) and each time the script is called Supervisor creates a new process with a new PID.
So basically my question is: is it fine to recreate the process with a new PID every time? It's like: initialize process, run process, end process, looping 10x per second. Obviously the PID is increasing fast.
Can I leave it as it is, or are there other ways to run it as a single process and run subprocesses? (Supervisor already runs its jobs in subprocesses, so I guess there cannot be a subprocess of a subprocess?)
