I just started using Airflow to coordinate our ETL pipeline.
I encountered a broken pipe error when I ran a DAG.
I've seen a general Stack Overflow discussion here.
My case is more on the Airflow side. According to the discussion in that post, the possible root cause is:
The broken pipe error usually occurs if your request is blocked or takes too long: after the request-side timeout, the client closes the connection, and when the response side (server) then tries to write to the socket, it throws a broken pipe error.
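The mechanism in the quoted explanation can be reproduced with only the Python standard library. This is a hedged illustration, not an Airflow fix: a local socket pair stands in for the client and server, and the helper name write_after_peer_closed is hypothetical. CPython ignores SIGPIPE at startup, so the failed write surfaces as a BrokenPipeError (errno EPIPE) instead of killing the process.

```python
import errno
import socket


def write_after_peer_closed(payload: bytes = b"result") -> int:
    """Return the errno raised when writing to a socket whose peer already hung up.

    Hypothetical helper: socket.socketpair() stands in for a client/server pair.
    The "client" end is closed (as if it had timed out), then the "server" end
    tries to write its response anyway.
    """
    server_end, client_end = socket.socketpair()
    try:
        client_end.close()  # the requesting side gives up and closes
        try:
            server_end.sendall(payload)  # the responding side writes anyway
        except BrokenPipeError as exc:  # SIGPIPE is ignored, so we get EPIPE
            return exc.errno
        return 0  # no error (not expected for a closed stream peer)
    finally:
        server_end.close()


if __name__ == "__main__":
    code = write_after_peer_closed()
    print(errno.errorcode.get(code, code))
```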
This might be the real cause in my case: I have a PythonOperator that starts another job outside of Airflow, and that job can be very lengthy (e.g. 10+ hours). Is there a mechanism in Airflow that I can leverage to prevent this error?
Can anyone help?
UPDATE 1 (20190303-1):
Thanks to @y2k-shubham for suggesting SSHOperator. I was able to use it to set up an SSH connection successfully and run some simple commands on the remote side (the default SSH connection does indeed have to point to localhost, because the job runs on localhost), and I see the correct output for hostname and pwd.
However, when I attempted to run the actual job, I received the same error. Again, the error comes from the pipeline job, not from the Airflow DAG/task.
UPDATE 2 (20190303-2):
I had one successful run (airflow test) with no error, followed by a failed run (scheduler) with the same error from the pipeline.
While I'd suggest you keep looking for a more graceful way of achieving what you want, I'm putting up example usage as requested.
First, you've got to create an SSHHook. This can be done in two ways:
The conventional way, where you supply all requisite settings like host, user, password (if needed), etc. from the client code where you instantiate the hook. I'm citing an example from test_ssh_hook.py here, but you should thoroughly go through SSHHook as well as its tests to understand all possible usages.
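from airflow.contrib.hooks.ssh_hook import SSHHook  # Airflow 1.10.x path; Airflow 2 uses airflow.providers.ssh.hooks.ssh

ssh_hook = SSHHook(remote_host="remote_host",
                   port=22,  # an int; the test passes a placeholder string because the connection is mocked
                   username="username",
                   timeout=10,
                   key_file="fake.file")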
The Airflow way, where you put all connection details inside a Connection object that can be managed from the UI, and only pass its conn_id to instantiate your hook:
ssh_hook = SSHHook(ssh_conn_id="my_ssh_conn_id")
Of course, if you're relying on SSHOperator, you can pass the ssh_conn_id directly to the operator.
ssh_operator = SSHOperator(task_id="my_ssh_task", ssh_conn_id="my_ssh_conn_id", command="hostname")  # task_id and command are required too
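Besides the UI, Airflow can also pick up a connection from an environment variable named AIRFLOW_CONN_ followed by the upper-cased conn_id, whose value is a connection URI; query-string parameters become the connection's extras (SSHHook reads key_file from there). The sketch below assumes that convention, so check it against the "Managing Connections" page for your Airflow version; the helper names ssh_conn_uri and conn_env_var are hypothetical.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def ssh_conn_uri(host: str, port: int, username: str,
                 password: Optional[str] = None, **extras: str) -> str:
    """Build an Airflow-style SSH connection URI (hypothetical helper).

    User name and password are percent-encoded so characters like '@' or ':'
    do not break the URI; extras such as key_file go into the query string.
    """
    userinfo = quote(username, safe="")
    if password is not None:
        userinfo += ":" + quote(password, safe="")
    uri = f"ssh://{userinfo}@{host}:{port}"
    if extras:
        uri += "?" + urlencode(extras)
    return uri


def conn_env_var(conn_id: str) -> str:
    """Name of the environment variable Airflow checks for this conn_id."""
    return "AIRFLOW_CONN_" + conn_id.upper()


if __name__ == "__main__":
    name = conn_env_var("my_ssh_conn_id")
    value = ssh_conn_uri("remote_host", 22, "username", key_file="/path/to/key")
    print(f"export {name}='{value}'")
```

With the variable exported in the environment of the scheduler and workers, SSHHook(ssh_conn_id="my_ssh_conn_id") should resolve it without any UI setup.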
Now, if you're planning to have a dedicated task for running a command over SSH, you can use SSHOperator. Again, I'm citing an example from test_ssh_operator.py, but go through the sources for a better picture.
task = SSHOperator(task_id="test",
                   command="echo -n airflow",
                   dag=dag,  # your DAG object (the test uses self.dag)
                   timeout=10,
                   ssh_conn_id="ssh_default")
But you might want to run a command over SSH as part of a bigger task. In that case you don't want an SSHOperator; you can still use just the SSHHook. The get_conn() method of SSHHook gives you an instance of paramiko's SSHClient, with which you can run a command using exec_command():
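ssh_client = ssh_hook.get_conn()  # a connected paramiko.SSHClient
my_command = "echo airflow"
stdin, stdout, stderr = ssh_client.exec_command(
    command=my_command,
    get_pty=my_command.startswith("sudo"),  # sudo usually needs a pseudo-terminal
    timeout=10)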
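exec_command() returns as soon as the command has been started, so to know whether the remote command actually succeeded you also need its exit status from the underlying channel. A minimal sketch, assuming ssh_client is the paramiko SSHClient obtained from SSHHook.get_conn(); the helper name run_remote is hypothetical. Output is drained before waiting on the status, because waiting first can hang when the command fills the channel's buffer.

```python
from typing import Tuple


def run_remote(ssh_client, command: str, timeout: int = 10) -> Tuple[int, str, str]:
    """Run `command` on a connected paramiko SSHClient (hypothetical helper).

    Returns (exit_status, stdout_text, stderr_text).
    """
    _stdin, stdout, stderr = ssh_client.exec_command(
        command=command,
        get_pty=command.startswith("sudo"),  # same heuristic as the snippet above
        timeout=timeout,
    )
    # Read the streams first: calling recv_exit_status() before draining them
    # can block forever if the remote side is waiting for buffer space.
    out = stdout.read().decode()
    err = stderr.read().decode()
    exit_status = stdout.channel.recv_exit_status()  # blocks until the command ends
    return exit_status, out, err
```

Usage would look like status, out, err = run_remote(ssh_hook.get_conn(), "echo airflow"), after which a non-zero status can be turned into an exception so the Airflow task fails visibly.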
If you look at SSHOperator's execute() method, it is a rather complicated (but robust) piece of code trying to achieve a very simple thing. For my own usage, I created some snippets that you might want to look at:
For using SSHHook independently of SSHOperator, have a look at ssh_utils.py
For an operator that runs multiple commands over SSH (you can achieve the same thing by using bash's && operator), see MultiCmdSSHOperator
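The && alternative mentioned above can be sketched in a few lines: join the commands into one shell string so each runs only if the previous one succeeded, and hand that string to a single SSH command. Here a local sh process stands in for the remote shell so the short-circuit behaviour can be checked; both helper names are hypothetical, and commands that themselves contain || or ; may need grouping before being chained.

```python
import subprocess
from typing import List


def chain_commands(commands: List[str]) -> str:
    """Join shell commands with && so a failure stops the rest (hypothetical helper)."""
    return " && ".join(cmd.strip() for cmd in commands)


def run_locally(command: str) -> subprocess.CompletedProcess:
    """Run the chained string through a local POSIX shell, as a remote shell would."""
    return subprocess.run(["sh", "-c", command], capture_output=True, text=True)


if __name__ == "__main__":
    chained = chain_commands(["echo step1", "false", "echo step3"])
    result = run_locally(chained)
    print(chained)
    print(result.returncode, result.stdout)
```

The same chained string could be passed as the single command of an SSHOperator, at the cost of losing per-command error reporting.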
I have a requirement to transfer files from one server to another. I used the rcp command for this and it was working fine. Please find the command below:
rcp tst.txt usrname@hostname:/home/username/destination_folder
I tried to automate it with expect, so I created the shell script below:
#!/bin/bash
/usr/bin/expect -d<<EOD
spawn rcp tst.txt usrname@hostname:/home/username/destination_folder
expect "*userid@hostname's password:*"
send "mypassword\n"
EOD
I didn't get any error while executing the shell script, but the file was not transferred. Can someone help me figure out what the issue is?
I have tried password-less transfer through key generation, but with no luck, so I am trying the rcp approach.
Thanks in advance,
Vijay
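After sending the password, wait for rcp to finish by expecting eof; otherwise the script exits and rcp is terminated before the transfer completes. Also send \r rather than \n, since \r is what a terminal sends when you press Enter: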
send "mypassword\r"
expect eof
If the transfer is password-less, just expect eof right after spawning rcp.
Can anyone show me an example of a script that can be run from the sqoop2 client in batch mode?
I referred to http://sqoop.apache.org/docs/1.99.2/Sqoop5MinutesDemo.html
and it says we can run the sqoop2 client in batch mode using the following command:
sqoop.sh client /path/to/your/script.sqoop
but that script.sqoop isn't like a sqoop1 script, so what should it look like?
A batch file is nothing but a list of the same commands you would otherwise type in interactive mode (plus comment lines starting with a pound sign).
However, some commands require manual input and therefore cannot easily be fully automated (e.g., the 'create link' command). See this thread for details.
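Since a batch file is just interactive commands one per line plus # comments, it can be generated from a list. A minimal sketch with a hypothetical helper write_sqoop_batch; the two commands are taken from the 1.99.x five-minute demo and are assumptions to adjust for your server (host, port, webapp). Commands that prompt for input, such as 'create link', are deliberately left out.

```python
from pathlib import Path
from typing import Iterable


def write_sqoop_batch(path: str, commands: Iterable[str], comment: str = "") -> None:
    """Write a sqoop2 batch script: one interactive-shell command per line.

    Hypothetical helper. An optional first line is a '#' comment; commands that
    need manual input will not run unattended, so do not include them.
    """
    lines = [f"# {comment}"] if comment else []
    lines.extend(cmd.strip() for cmd in commands)
    Path(path).write_text("\n".join(lines) + "\n")


if __name__ == "__main__":
    write_sqoop_batch(
        "script.sqoop",
        [
            # Assumed commands from the 5-minute demo; change host/port/webapp.
            "set server --host localhost --port 12000 --webapp sqoop",
            "show version --all",
        ],
        comment="sqoop2 batch run",
    )
    # Then run: sqoop.sh client script.sqoop
```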
I found the following check before a log statement:
if (LOG.isDebugEnabled())
How can we enable or disable debug statements when running a Giraph program?
And where can one find the logs of these statements?
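The LOG.isDebugEnabled() check exists so that a possibly expensive debug message is only built when debug output is actually turned on. Python's logging module has the same idiom through Logger.isEnabledFor, so here is the pattern as a Python analogy (not Giraph or log4j code); the logger name and expensive_summary function are hypothetical.

```python
import logging

logger = logging.getLogger("giraph_demo")  # hypothetical logger name
calls = {"expensive": 0}  # counts how often the costly message is built


def expensive_summary() -> str:
    """Stand-in for costly message building, such as dumping a vertex's state."""
    calls["expensive"] += 1
    return "lots of detail"


def process_vertex() -> None:
    # Equivalent of `if (LOG.isDebugEnabled())`: skip building the message
    # entirely unless DEBUG is enabled for this logger.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("vertex state: %s", expensive_summary())
```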
Setting the custom argument
-ca giraph.logLevel=debug
should do the work.
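For reference, this is where -ca sits in a typical GiraphRunner invocation. The jar path and computation class below are hypothetical placeholders, and -w (number of workers) stands in for whatever input/output options your job already uses; shlex.join needs Python 3.8+.

```python
import shlex
from typing import List


def giraph_command(jar: str, computation_class: str, job_args: List[str],
                   log_level: str = "debug") -> List[str]:
    """Assemble a GiraphRunner command line with the log level set via -ca."""
    return [
        "hadoop", "jar", jar,
        "org.apache.giraph.GiraphRunner",
        computation_class,
        *job_args,  # your usual GiraphRunner options (-vif, -vip, -w, ...)
        "-ca", f"giraph.logLevel={log_level}",
    ]


if __name__ == "__main__":
    cmd = giraph_command(
        "giraph-examples.jar",        # hypothetical jar path
        "com.example.MyComputation",  # hypothetical computation class
        ["-w", "1"],
    )
    print(shlex.join(cmd))
```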
To access the logs, try something like:
yarn logs --applicationId application_1399469361545_0003
where you can find the application ID in the console output.
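Picking the application ID out of the console output can be scripted. YARN application IDs have the form application_<cluster timestamp>_<sequence>, so a regular expression finds them; the helper names are hypothetical, and the command list uses the single-dash -applicationId spelling from the YARN command reference.

```python
import re
from typing import List, Optional

# YARN application IDs look like application_<cluster timestamp>_<sequence>.
_APP_ID = re.compile(r"application_\d+_\d+")


def find_application_id(console_output: str) -> Optional[str]:
    """Return the first YARN application ID in the job's console output, if any."""
    match = _APP_ID.search(console_output)
    return match.group(0) if match else None


def yarn_logs_command(app_id: str) -> List[str]:
    """Argument list for fetching aggregated logs of one application."""
    return ["yarn", "logs", "-applicationId", app_id]
```

The list can be passed straight to subprocess.run(yarn_logs_command(app_id), check=True) once log aggregation has finished for the application.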
I am attempting to kick off a third-party program using the Exec command in PeopleSoft. It returns error code 127. When I run the program from the Unix command line, I get no error. Does anybody know what code 127 means, or have a list of all the return codes?
I think it is likely the Unix shell return code, in which case 127 is "command not found".
See http://tldp.org/LDP/abs/html/exitcodes.html
You may need to make sure your Exec call specifies the correct path, relative or absolute, and that any expected environment variables are available. You could also test with a simple program to see whether calling through Exec works at all. On the server, the program runs under the ID that started the app server, whose environment may be sourced differently from an individual user's. If you use relative paths, I believe they start from $PS_HOME.
If you can provide the code snippet, someone may be able to offer other suggestions.
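The 127 convention is easy to reproduce outside PeopleSoft. This sketch stands in for Exec by running a command through /bin/sh, optionally with a replaced PATH to mimic a process (such as an app server) whose environment differs from your login shell; the helper name run_via_shell is hypothetical. A command the shell cannot find yields 127, while the same program invoked by absolute path runs fine.

```python
import os
import subprocess
from typing import Optional


def run_via_shell(command: str, path: Optional[str] = None) -> int:
    """Run `command` through /bin/sh and return its exit status.

    If `path` is given it replaces PATH, imitating a caller whose environment
    was set up differently from an interactive login.
    """
    env = dict(os.environ)
    if path is not None:
        env["PATH"] = path
    return subprocess.run(command, shell=True, env=env,
                          stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL).returncode
```

For example, run_via_shell("ls", path="/nonexistent") returns 127 because the shell cannot locate ls, whereas spelling out the program's absolute path avoids the PATH lookup entirely.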