Airflow HDFS Sensor

I'm trying to get HdfsSensor working. I have set up the HDFS connection and the file is there, but it keeps poking for the file and never completes:
Poking for file hdfs://user/airflow/stamps/test/ds=2018-10-15/_SUCCESS
The code is as below:
hdfs_sense_open = HdfsSensor(
    task_id='hdfs_sense_open',
    filepath='hdfs://user/airflow/stamps/test/ds=2018-10-15/_SUCCESS',
    hdfs_conn_id='hdfs_leo',
    dag=dag)
Actually, it works without the file name in the path. I would also like to add one more point: when you create the HDFS connection, you need to use the HDFS NameNode port, i.e. 8020 (maybe 9000 if it's localhost), and not the WebHDFS port like 50070.
hdfs_sense_open = HdfsSensor(
    task_id='hdfs_sense_open',
    filepath='/user/airflow/stamps/test/ds=2018-10-15/',
    hdfs_conn_id='hdfs_leo',
    dag=dag)
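On that connection point, Airflow can also pick the connection up from an environment variable named after the conn id instead of the UI. This is only a sketch; the NameNode hostname below is made up, but it shows the HDFS RPC port (8020) rather than the WebHDFS port:
# Hypothetical NameNode host; use your cluster's NameNode address and its RPC port (not 50070)
export AIRFLOW_CONN_HDFS_LEO='hdfs://namenode.example.com:8020'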
Thank you so much, both of you, for trying to help me out.

Try it with the filepath set without the protocol, like:
hdfs_sense_open = HdfsSensor(
    task_id='hdfs_sense_open',
    filepath='/user/airflow/stamps/test/ds=2018-10-15/_SUCCESS',
    hdfs_conn_id='hdfs_leo',
    dag=dag)

Related

mariadb slow query not logged

As the title says, nothing is recorded in the log file even though the related settings are in place.
slow_query_log_file = /var/log/mysql/mariadb-slow.log
slow_query_log = 1
long_query_time = 1
log_slow_rate_limit = 1000
log_slow_verbosity = query_plan
log-queries-not-using-indexes
This is MariaDB's conf content.
When you open the log file, only the header lines exist:
Tcp port: 3306 Unix socket: /run/mysqld/mysqld.sock
Time Id Command Argument
logrotate seems to work fine.
After connecting to mysql, I ran select sleep(); to trigger a slow query, but it did not work properly.
The result of the command is 0, which seems normal, but nothing is written to the log.
Why wouldn't it work?
The new settings apply only after the MariaDB server instance is restarted. The solution, as mentioned in the comments, is therefore to restart the MariaDB server so the new settings take effect.
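A quick sketch of that step, assuming a systemd-based install where the service is named mariadb (adjust the service name and credentials to your setup):
# Restart so the changes in the conf file are picked up
sudo systemctl restart mariadb
# Confirm the slow-log settings are now active
mysql -u root -p -e "SHOW GLOBAL VARIABLES LIKE 'slow_query%'; SHOW GLOBAL VARIABLES LIKE 'long_query_time';"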

Oozie sample on EMR

Can someone please explain to me what the name node and job tracker are for an Oozie action when working on EMR (EMRFS)? I do understand that the name node is specific to HDFS, but if I'm using EMRFS then what should its value be in Oozie?
name-node should be the NameNode's FQDN:port or IP:port on the EMR master, where the HDFS NameNode daemon runs. job-tracker is the YARN ResourceManager's address. They remain unchanged with or without EMRFS, because Oozie itself still uses HDFS (not S3). Depending on the action, the YARN containers (mappers/reducers) might use EMRFS, and you do not need to set anything for that.
You can check this ports list to find the necessary ports for EMR:
http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-4.2.0/emr-release-differences.html#w2ab1c66c15
You can also find them in the fs.default.name and mapred.job.tracker settings of the core-site.xml / yarn-site.xml / mapred-site.xml files.
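As a rough sketch of that last step, you can read the values straight from the config files on the EMR master node (the paths below assume the usual EMR layout under /etc/hadoop/conf; adjust them if your release differs):
# name-node: usually something like hdfs://<master-private-dns>:8020
grep -A1 'fs.default' /etc/hadoop/conf/core-site.xml
# job-tracker: the YARN ResourceManager address, usually <master-private-dns>:8032
grep -A1 'yarn.resourcemanager.address' /etc/hadoop/conf/yarn-site.xml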

how to write a file to a remote server using rhdfs, R and hadoop dfs [duplicate]

I have a small example script (script_p.r), like the following, which I intend to run in a terminal.
#!/usr/bin/Rscript
sink("output_capture.txt")
mn <- mean(1:10)
# and so on, much longer list of tasks
I want to run this script remotely on another iMac host computer (IP address, e.g., not real: 111.111.111.111) which allows me to log in and work (e.g., not real: username user101, password p12334).
Is there a way to run this script remotely (say using ssh) from another computer, with IP address 222.222.222.222 and user name user102?
First, put script_p.r on the remote machine.
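For that first step, a minimal sketch using scp (the destination path and the chmod are just one way to do it):
scp script_p.r user101@111.111.111.111:~/
ssh user101@111.111.111.111 'chmod +x ~/script_p.r'   # make it executable so ./script_p.r works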
Then either just do:
ssh user101@111.111.111.111
user101:~$ ./script_p.r
or ssh user101@111.111.111.111 './script_p.r'
or put it in a script:
runremote.sh:
#!/bin/bash
ssh user101@111.111.111.111 './script_p.r'
and run it locally:
user102:~$ ./runremote.sh

How can I serve HTTP slowly?

I'm working on an HTTP client and I would like to test it on requests that take some time to finish. I could certainly come up with a Python script to suit my needs, something like:
import time

def slow_server(environ, start_response):
    # getSomeFile is a placeholder for however the request is mapped to a file
    start_response('200 OK', [('Content-Type', 'application/octet-stream')])
    with getSomeFile(environ) as file_to_serve:
        block = file_to_serve.read(1024)
        while block:
            yield block
            time.sleep(1.0)
            block = file_to_serve.read(1024)
but this feels like a problem others have already encountered. Is there an easy way to serve static files with an absurdly low bandwidth cap, short of a full-scale server like Apache or nginx?
I'm working on Linux, and the way I've been testing so far is with python -m SimpleHTTPServer 8000 in a directory full of files to serve. I'm equally interested in another simple command-line server, or a way to do bandwidth limiting with one or a few iptables commands on TCP port 8000 (or whatever would work).
The solution I'm going with for now uses a "real" web server, but a much easier one to configure: lighttpd. I've added the following file to my path (it's in ~/bin):
#! /usr/sbin/lighttpd -Df
server.document-root = "/dev/null"
server.modules = ("mod_proxy")
server.kbytes-per-second = env.LIGHTTPD_THROTTLE
server.port = env.LIGHTTPD_PORT
proxy.server = ( "" => (( "host" => "127.0.0.1", "port" => env.LIGHTTPD_PROXY )))
This is a lighttpd config file that acts as a reverse proxy to localhost; the source and destination ports, as well as the server's total maximum bandwidth, are given as environment variables, so it can be invoked like:
$ cd /path/to/some/files
$ python -m SimpleHTTPServer 8000 &
$ LIGHTTPD_THROTTLE=60 LIGHTTPD_PORT=8001 LIGHTTPD_PROXY=8000 throttle.lighttpd
to proxy the Python file server on port 8000 at a low 60 KB per second on port 8001. Obviously, lighttpd could be used to serve the files itself, but this little script can be used to make any HTTP server slow.
On Windows you can use Fiddler, which is an HTTP proxy debugging tool, to simulate very slow speeds. Maybe a similar tool exists on whatever OS you are using.
I remember I once had the same question, and my search turned up an Apache2 module that goes by the name of mod_bw (mod_bandwidth, that is). It served me well for my testing.

Boot script execution order (rc.local)?

With some great help from another user on here, I've managed to create a script which writes the necessary network configuration to /etc/network/interfaces and allows public access to a DomU server.
I've placed this script in the /etc/rc.local file and executed chmod u+x /etc/rc.local to enable it.
The server is a DomU Ubuntu server on a host (Dom0), and rc.local doesn't seem to be executing before the network is brought up at boot/creation time.
So the configuration changes are being made to the /etc/network/interfaces file, but are not active once the boot process completes. I have to reboot once more before the changes take effect.
I've tried adding /etc/init.d/networking restart to the end of the rc.local script (before exit 0), but with no joy.
I also tried adding the script to the S35networking file, but again without success.
Any advice or suggestions on getting this script to execute before the network device is brought up would be greatly appreciated.
