I'm sure this is a misconfiguration, but I can't find it after looking through the GearmanBundle and Gearman docs.
I've set up Gearman and the gearman PHP extension, and have them running with Symfony2's GearmanBundle.
I can set up jobs and send them to a worker, but the worker doesn't execute them automatically; they sit there until I run the app/console gearman:worker:execute command.
Essentially all I want to do is write some images to S3 as a background process so my application doesn't wait for the process to complete before continuing.
When I do run that command, it works fine and everything seems OK.
Am I running my Gearman daemon with the wrong arguments, or am I missing a configuration somewhere?
Or do I need to do something like
foreach ($this->gearman->getWorkers() as $worker) {
    $worker->doWork();
}
Or, should I be adding all of these as tasks? Just a little confused I guess.
Thanks a bunch. Here's what the code that calls the worker looks like:
foreach ($photos as $photo) {
    $payload = array(
        'tempBase64Data' => $photo->getTempBase64Data(),
        'path' => $photo->getPath(),
        'mime' => $photo->getMime(),
    );
    $jobIds[] = $this->gearman->doBackgroundJob(
        'imageWorker~writeFileToS3',
        json_encode($payload)
    );
}
After more research I've found that this is the intended functionality of the GearmanBundle. It will not automatically execute jobs unless you run the execute console command.
I intend to use Supervisor to make sure these workers are always running.
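What a process supervisor does for a worker that exits after a fixed number of jobs can be sketched roughly as follows. This is only an illustration in Python, not Supervisor itself and not part of GearmanBundle; the function name keep_running and the max_runs parameter are hypothetical, and a real supervisor would loop indefinitely rather than stop after a fixed count.

```python
import subprocess
import sys
import time


def keep_running(command, max_runs=3, delay_seconds=0.0):
    """Run `command` again each time it exits, for at most `max_runs` runs.

    Returns the exit codes observed, one per run. A real supervisor
    would restart forever; the cap keeps this sketch easy to try out.
    """
    exit_codes = []
    for _ in range(max_runs):
        # Blocks until the worker process exits, e.g. after its
        # configured number of iterations.
        completed = subprocess.run(command)
        exit_codes.append(completed.returncode)
        time.sleep(delay_seconds)  # small pause before restarting
    return exit_codes


if __name__ == "__main__":
    # Stand-in for: php app/console gearman:worker:execute
    fake_worker = [sys.executable, "-c", "print('worker ran and exited')"]
    print(keep_running(fake_worker, max_runs=2))
```

In practice the restart policy would live in Supervisor's own configuration rather than in hand-written code like this.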
I just started using Airflow to coordinate our ETL pipeline.
I encountered the pipe error when I run a dag.
I've seen a general stackoverflow discussion here.
My case is more on the Airflow side. According to the discussion in that post, the possible root cause is:
The broken pipe error usually occurs if your request is blocked or
takes too long and after request-side timeout, it'll close the
connection and then, when the respond-side (server) tries to write to
the socket, it will throw a pipe broken error.
This might be the real cause in my case: I have a PythonOperator that starts another job outside of Airflow, and that job can be very lengthy (i.e. 10+ hours). I wonder what mechanism Airflow provides that I can leverage to prevent this error.
Can anyone help?
UPDATE1 20190303-1:
Thanks to @y2k-shubham for the SSHOperator suggestion. I was able to use it to set up an SSH connection and run some simple commands on the remote side (the default SSH connection has to be set to localhost, because the job runs on localhost), and I see the correct results for hostname and pwd.
However, when I attempted to run the actual job, I received the same error. Again, the error comes from the pipeline job rather than from the Airflow DAG/task.
UPDATE2: 20190303-2
I had a successful run (airflow test) with no error, followed by another failed run (scheduler) with the same error from the pipeline.
While I'd suggest you keep looking for a more graceful way to achieve what you want, I'm putting up example usage as requested.
First you've got to create an SSHHook. This can be done in two ways:
The conventional way, where you supply all requisite settings such as host, user, password (if needed) etc. from the client code where you instantiate the hook. I'm citing an example from test_ssh_hook.py, but you should go through SSHHook as well as its tests to understand all possible usages.
ssh_hook = SSHHook(remote_host="remote_host",
                   port="port",
                   username="username",
                   timeout=10,
                   key_file="fake.file")
The Airflow way, where you put all connection details inside a Connection object that can be managed from the UI, and pass only its conn_id to instantiate your hook:
ssh_hook = SSHHook(ssh_conn_id="my_ssh_conn_id")
Of course, if you're relying on SSHOperator, you can pass the ssh_conn_id directly to the operator.
ssh_operator = SSHOperator(ssh_conn_id="my_ssh_conn_id")
Now, if you're planning to have a dedicated task for running a command over SSH, you can use SSHOperator. Again I'm citing an example from test_ssh_operator.py, but go through the sources for a better picture.
task = SSHOperator(task_id="test",
                   command="echo -n airflow",
                   dag=self.dag,
                   timeout=10,
                   ssh_conn_id="ssh_default")
But then you might want to run a command over SSH as part of a bigger task. In that case you don't need an SSHOperator; the SSHHook alone is enough. The get_conn() method of SSHHook gives you an instance of paramiko's SSHClient, with which you can run a command using exec_command():
# obtain the paramiko SSHClient from the hook created earlier
ssh_client = ssh_hook.get_conn()
my_command = "echo airflow"
stdin, stdout, stderr = ssh_client.exec_command(
    command=my_command,
    get_pty=my_command.startswith("sudo"),
    timeout=10)
If you look at SSHOperator's execute() method, it is a rather complicated (but robust) piece of code trying to achieve a very simple thing. For my own usage, I had created some snippets that you might want to look at
For using SSHHook independently of SSHOperator, have a look at ssh_utils.py
For an operator that runs multiple commands over SSH (you can achieve the same thing by using bash's && operator), see MultiCmdSSHOperator
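For the 10+ hour job described in the question, one alternative worth considering is to avoid holding any pipe or session open for the whole run: start the job detached, send its output to a log file, and poll for completion. The sketch below uses only the Python standard library; launch_detached and wait_by_polling are hypothetical helper names, and whether this removes the broken pipe error depends on where the pipe actually breaks, which the question does not establish.

```python
import subprocess
import time


def launch_detached(command, log_path):
    """Start `command` in its own session, writing its output to a log file.

    The job writes to a file instead of a pipe owned by the caller,
    so there is no caller-side pipe for it to write into.
    """
    with open(log_path, "wb") as log_file:
        process = subprocess.Popen(
            command,
            stdin=subprocess.DEVNULL,
            stdout=log_file,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # POSIX: detach from the caller's session
        )
    # The child keeps its own copy of the log file descriptor.
    return process


def wait_by_polling(process, interval_seconds=0.1):
    """Poll until the job exits, then return its exit code."""
    while process.poll() is None:
        time.sleep(interval_seconds)
    return process.returncode
```

Inside a PythonOperator callable, launch_detached would start the pipeline and wait_by_polling (with a much longer interval) would keep the task alive until the job ends; splitting the two into a launch task and a separate polling task is another option.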
Again, I'm stuck with Gearman. I have been implementing the Ulabox GearmanBundle, which works nicely. But there are two things I don't understand yet.
How do I start a worker?
According to the documentation, I should first execute a worker and then start the code.
https://github.com/ulabox/GearmanBundle/blob/master/README.md
Open the first console and run:
$ php app/console gearman:worker:execute --worker=AcmeDemoBundle:AcmeWorker
Now open another console and run:
$ php app/console gearman:client:execute --client=UlaboxGearmanBundle:GearmanClient:hello_world --worker=AcmeDemoBundle:AcmeWorker --params="{\"foo\": \"bar\" }"
So if I don't start the worker manually, the job won't be done by itself. If I start the worker, everything is fine. Still, it seems a bit strange to start it manually, even if an iteration count of x is set so that the worker kills itself after that number of jobs.
So please, can anyone help me out of this :((((
Heeeeeelp :) lol
Thanks in advance and kind regards
Phil
Yes. To run a task in the background, not only the Gearman server but also the workers need to be running.
So you have Gearman running, waiting for commands (e.g. sending an email).
In addition, you have workers waiting.
When Gearman receives a new command, it looks for the first free worker and passes the command to it.
The worker then executes the command and, once finished, reports back to the Gearman server that it is done and ready for a new command.
The more workers you have, the faster the commands in the queue are processed.
You can use Supervisor to keep the workers running automatically.
Below you can find a few links with more information:
http://www.daredevel.com/php-jobs-with-gearman-and-supervisor/
http://www.masnun.com/2011/11/02/gearman-php-and-supervisor-processing-background-jobs-with-sanity.html
Running Gearman Workers in the Background
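The server/worker split described above can be illustrated with a small in-process model. This is only an analogy in Python, not Gearman code: a queue.Queue stands in for the Gearman job server, a thread stands in for a worker process, and upper-casing a file name stands in for the real work.

```python
import queue
import threading

job_queue = queue.Queue()  # plays the role of the Gearman job server
results = []


def worker_loop():
    """Take jobs until a None sentinel arrives, like a long-running worker."""
    while True:
        job = job_queue.get()
        if job is None:
            break
        results.append(job.upper())  # stand-in for the real work


# The client submits background jobs and carries on immediately.
for name in ["a.jpg", "b.jpg"]:
    job_queue.put(name)

print(results)  # [] because no worker has been started yet

# Only once a worker is running do the queued jobs get processed.
worker = threading.Thread(target=worker_loop)
worker.start()
job_queue.put(None)  # ask the worker to stop after draining the queue
worker.join()
print(results)  # ['A.JPG', 'B.JPG']
```

The first print shows the situation from the question: jobs are accepted and queued, but nothing happens until a worker is started.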
When using Symfony\Component\Process\Process, which user does the command run as?
I tried running the whoami command through Process, but it returns nothing.
$return = exec('whoami');
echo $return."\n"; // return [myname]
$process = new Process('whoami'); // The symfony process
echo $process->getOutput(); // return nothing #_#
Yes, it runs as the user who runs the command, or as the user of your web server.
Your code seems a bit incomplete. I suggest adding $process->run(); before trying to get the output.
I almost guarantee that that Process runs as whatever user your webserver is running as. If you're running apache for instance, try running:
ps aux | egrep '(apache|httpd)'
Run it in your terminal to discover which user Apache is running as. My money would be on either apache or httpd as the user that Process runs under. Hope that helps.
According to the documentation, it is better to use start() instead of run() if you want to create a background process. The process_max_time could kill your process if you create it with run().
"Instead of using run() to execute a process, you can start() it: run() is blocking and waits for the process to finish, start() creates a background process."
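The same blocking versus non-blocking distinction exists in other languages. As an analogy only (this is Python's subprocess module, not the Symfony Process API), subprocess.run behaves like run() and subprocess.Popen behaves like start(). In both cases the output exists only once the process has actually been executed, which is why reading it before running returns nothing.

```python
import subprocess
import sys

command = [sys.executable, "-c", "print('hello')"]

# Blocking, comparable to run(): returns only after the command has finished.
finished = subprocess.run(command, capture_output=True, text=True)
blocking_output = finished.stdout.strip()

# Non-blocking, comparable to start(): returns at once with a handle.
handle = subprocess.Popen(command, stdout=subprocess.PIPE, text=True)
# ... the caller is free to do other work here ...
raw_output, _ = handle.communicate()  # collect the output when it is needed
background_output = raw_output.strip()

print(blocking_output, background_output)
```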
All,
I'm looking for a good way to do some job backgrounding through either of these two services.
I see PHPFog supports IronWorks, but I need something more realtime. On these cloud-based PaaS services, I'm not able to use popen(background.php --token=1234), so I'm thinking the best solution might be to kick off a Gearman worker to handle the job. (Actually, my preferred method would be to use websockets to keep a connection open and receive feedback from the job, rather than long-polling a DB table through AJAX, but none of these providers support websockets.)
Question 1 is, is there a better solution than using gearman to offload the job?
Question 2 is, http://help.pagodabox.com/customer/portal/articles/430779 I see pagodabox supports 'worker listeners' ... has anybody set this up with gearman? Would it work?
Thanks
I am using PagodaBox with a background worker in an application I am building right now. Basically, PagodaBox daemonizes a PHP process for you (meaning it will continually run in the background), so all you really have to do is create a script that checks a database table for tasks to run, runs them, and then sleeps a bit so it's not running too many queries against your database.
This is a simplified version of what I have running:
// Remove time limit
set_time_limit(0);
// Show ALL errors
error_reporting(-1);
// Run daemon
echo "--- Starting Daemon ---\n";
while (true) {
    // Query 'work_queue' table for new tasks
    // Loop over items and do whatever tasks are associated with them
    // Update row to mark task as completed
    // Wait a bit
    sleep(30);
}
A benefit to this approach is that it's easy to test via CLI:
php tasks.php
You will see all the echo statements come through in console as it's running, and of course it's much easier to do than a more complicated setup with other dependencies like Gearman.
So whenever you add a new task to the table, the maximum time you'll wait for that task to be started in a batch is 30 seconds (or whatever your sleep time is). This is preferable to cron jobs: if you set up a cron job to run every minute (the lowest possible interval) and the work takes longer than a minute, another cron process will start working on the same queue, and you could end up with a lot of duplicated task work that is hard to debug and troubleshoot. If you instead have either a single background worker that runs all tasks, or multiple background workers that each handle a different task type, you will never run into this issue.
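The polling loop outlined above can be filled in as a small runnable sketch. It is written in Python with sqlite3 purely for illustration; the table name work_queue comes from the snippet above, while the payload and done columns, the function names, and the max_cycles parameter are assumptions made for the example.

```python
import sqlite3
import time


def process_pending(conn, handler):
    """Run every pending task once and mark it completed. Returns the count."""
    rows = conn.execute(
        "SELECT id, payload FROM work_queue WHERE done = 0 ORDER BY id"
    ).fetchall()
    for task_id, payload in rows:
        handler(payload)  # do whatever work the task requires
        conn.execute("UPDATE work_queue SET done = 1 WHERE id = ?", (task_id,))
    conn.commit()
    return len(rows)


def run_daemon(conn, handler, sleep_seconds=30, max_cycles=None):
    """Poll forever, or for max_cycles cycles (handy when testing)."""
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        process_pending(conn, handler)
        cycles += 1
        time.sleep(sleep_seconds)  # wait a bit between polls
```

Because a single process both selects and marks the rows, a task is handled once per queue entry, which is the property the comparison with overlapping cron runs relies on.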
I have a php script which does the accepted answer described here.
It doesn't work unless I add the following before fclose($fp)
while (!feof($fp)) {
    $httpResponse .= fgets($fp, 128);
}
Even a blank for loop would do the job instead of the above!!
But what's the point? I wanted async calls :(
To add to my pain, the same code runs fine without the above snippet in an Apache-driven environment.
Does anybody know whether Nginx or php-fpm has a problem with such requests?
What you're looking for can only be done on Linux-flavored systems with a PHP build that includes the Process Control functions (PCNTL library).
You'll find its documentation here:
http://php.net/manual/en/book.pcntl.php
Specifically what you want to do is "fork" a process. This creates an identical copy of the current PHP script's process including all memory references and then allows both scripts to continue executing simultaneously.
The "parent" script is aware that it is still the primary script, and the "child" script (or scripts; you can do this as many times as you want) is aware that it is a child. This allows you to choose a different action for the parent and the child once the child is spun off and turned into a daemon.
To do this, you'd use something along these lines:
$pid = pcntl_fork(); // store the process ID of the child when the script forks
if ($pid == -1) {
    die('could not fork'); // -1 return value means the process could not fork properly
} else if ($pid) {
    // a process ID will only be set in the parent script. this is the main script that can output to the user's browser
} else {
    // this is the child script executing. Any output from this script will NOT reach the user's browser
}
That will enable a script to spin off a child process that can continue executing alongside (or long after) the parent script outputs its content and exits.
You should keep in mind that these functions must be compiled into your PHP build and that the vast majority of hosting companies will not allow access to them on their servers. To use these functions, you will generally need a Virtual Private Server (VPS) or a dedicated server. Even cloud hosting setups usually do not offer these functions, because if used incorrectly (or maliciously) they can easily bring a server to its knees.
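The fork pattern itself is not specific to PHP. As a cross-language illustration, here is the same parent/child split using Python's os.fork, which wraps the same POSIX call. The helper name fork_and_run is hypothetical, and the sketch runs only on Unix-like systems.

```python
import os


def fork_and_run(child_task):
    """Fork; run child_task in the child and return the child's PID in the parent."""
    pid = os.fork()  # raises OSError on failure (PHP's pcntl_fork returns -1)
    if pid == 0:
        # Child process: do the work, then exit without returning to the caller.
        exit_code = 0
        try:
            child_task()
        except Exception:
            exit_code = 1
        finally:
            os._exit(exit_code)  # leave immediately, skipping parent cleanup
    # Parent process: pid holds the child's process ID.
    return pid
```

The parent can carry on and respond to the user straight away; calling os.waitpid on the returned PID later collects the child's exit status so it does not linger as a zombie.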