I have a bunch of pig scripts that I'm running as a workflow in oozie. Some of the output files are very short and there are a couple I'd like to concatenate and include in the body of an email action. How would I go about doing this?
Use a shell action and send the email from a script.
workflow.xml :
...
<shell>
<exec>email_hdfs_file.sh</exec>
<file>scripts/email_hdfs_file.sh</file>
</shell>
...
Make sure scripts/email_hdfs_file.sh exists in HDFS, relative to the folder that contains workflow.xml.
email_hdfs_file.sh :
#1 download and merge multiple files into one
hadoop fs -getmerge /path/to/your/files part-all.txt
#2 put a command that emails part-all.txt file here
It's up to you how to implement #2
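As one possible way to fill in step #2, the script run by the shell action could be written in Python instead of bash. The following is only a minimal sketch: the function names, the example.com addresses, and the assumption that an SMTP relay listens on localhost are all hypothetical and need to be adapted to your cluster.

```python
import smtplib
import subprocess
from email.message import EmailMessage


def merge_hdfs_files(hdfs_dir, local_path):
    """Step #1: concatenate the files under hdfs_dir into one local file."""
    subprocess.run(["hadoop", "fs", "-getmerge", hdfs_dir, local_path], check=True)


def build_message(body, sender, recipient, subject):
    """Step #2a: put the merged text into the body of an email message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_message(msg, host="localhost"):
    """Step #2b: hand the message to an SMTP relay (host is an assumption)."""
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(msg)
```

A caller would run merge_hdfs_files("/path/to/your/files", "part-all.txt"), read part-all.txt, and pass its text through build_message and send_message.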
In my Airflow task, I create a file with open() inside the DAG and write records into it, then send it by email within the same task. Will the file be deleted automatically, or will it persist alongside the DAG?
filename = to_report_name(context)+'_'+currentNextRunTime.strftime('%m.%d.%Y_%H-%M')+'_'+currentNextRunTime.tzname()+'.'+extension.lower()
with open(filename, "w+b") as file:
    file.write(download_response.content)
    print(file.name)
    send_report(context,file)
The file will not be deleted automatically. The code you execute is pure Python, so if you want the file to be deleted once the operation is done, use the tempfile module, which guarantees that the file is deleted once it is closed. Example:
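import tempfile
# prefix/suffix shape the file name; renaming the file would break the automatic cleanup
with tempfile.NamedTemporaryFile(prefix='my_custom_name_', suffix='.txt') as file:
    file.write(...)
    file.flush()  # make sure the data is on disk before anything reads the file by name
    # send the file here, inside the with block; it is deleted when the block exits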
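To see the cleanup behaviour for yourself, here is a small self-contained sketch. The helper name write_and_use and the report_ prefix are made up for illustration; the callback stands in for whatever sends the email. It assumes a POSIX system, where a file can be read by name while it is still open.

```python
import os
import tempfile


def write_and_use(payload, use):
    """Write payload to a temporary file, call use(path), return the path."""
    with tempfile.NamedTemporaryFile(prefix="report_", suffix=".csv") as tmp:
        tmp.write(payload)
        tmp.flush()      # push the bytes to disk before the callback reads them
        use(tmp.name)    # the file still exists at this point
        path = tmp.name
    # Leaving the with block closed the file, which also deleted it.
    return path
```

After write_and_use returns, os.path.exists(path) is False, so nothing is left behind on the worker.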
I want to create an event-driven Oozie coordinator, but the directory path changes regularly, and I don't want to hard-code the directory.
<datasets>
<dataset name="test_co" frequency="${coord:minutes(120)}" initial-instance="${coordStartDate}" timezone="${timezone}">
<uri-template>${nameNode}/dynamicName</uri-template>
<done-flag>_OK</done-flag>
</dataset>
</datasets>
How can I run a shell script before this action is triggered, so that it creates the folder name and checks whether the _OK file is present inside that folder?
Oozie supports dynamic directory structures, i.e. dated directories, through coordinator datasets (if possible use with ).
e.g.
<datasets>
<dataset name="logs" frequency="${coord:hours(1)}" initial-instance="2009-01-01T01:00Z" timezone="UTC">
<uri-template>hdfs://bar:9000/app/logs/${YEAR}${MONTH}/${DAY}/${HOUR}</uri-template>
</dataset>
</datasets>
For the dataset instance at 16:00 on 22-03-2017, the template above resolves to:
hdfs://bar:9000/app/logs/201703/22/16
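The substitution can be mimicked outside Oozie to check what a given template will resolve to. This is only an illustrative sketch and not Oozie's own code; resolve_uri_template is a hypothetical helper that fills in the YEAR, MONTH, DAY, HOUR and MINUTE variables for one instance time. Note that ${YEAR}${MONTH} with no slash in between produces a single combined path component such as 201703.

```python
from datetime import datetime, timezone
from string import Template


def resolve_uri_template(template, instant):
    """Fill the date variables of a dataset uri-template for one instance."""
    fields = {
        "YEAR": f"{instant.year:04d}",
        "MONTH": f"{instant.month:02d}",
        "DAY": f"{instant.day:02d}",
        "HOUR": f"{instant.hour:02d}",
        "MINUTE": f"{instant.minute:02d}",
    }
    # safe_substitute leaves unrelated variables such as ${nameNode} untouched.
    return Template(template).safe_substitute(fields)


instance = datetime(2017, 3, 22, 16, 0, tzinfo=timezone.utc)
resolved = resolve_uri_template(
    "hdfs://bar:9000/app/logs/${YEAR}${MONTH}/${DAY}/${HOUR}", instance
)
```

Adding a slash between ${YEAR} and ${MONTH} in the template gives a separate directory level per year and month instead.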
What is the procedure for uploading an HTML file to the cloud? I tried the link that Predix mailed me, but it returns a 403 error. Has anybody uploaded HTML files to the cloud using the Predix platform? Please help.
The Hello World example is an HTML file. Please try this.
https://www.predix.io/resources/tutorials/journey.html#1719
You can serve up files within your microservice that you push up to the cloud. See the rest endpoint /static in the Spring Boot doc. Try out the microservice-template project for some ideas
https://www.predix.io/resources/tutorials/tutorial-details.html?tutorial_id=1523&tag=1719&journey=Hello%20World&resources=1475,1569,1523
I have my collection of Predix.io apps in different languages including static HTML, angular, java even Scala etc.
Please look at
https://github.com/SVyatkin/CF-Hello-World-examples
As for your question about HTML, it is in the collection as well; just use
https://github.com/SVyatkin/CF-Hello-World-examples/tree/master/hello-html
$ git clone https://github.com/SVyatkin/CF-Hello-World-examples.git
$ cd CF-Hello-World-examples/hello-html
$ cf push -m 128M -b predix_openresty_buildpack html-hello
$ curl https://html-hello.run.aws-usw02-pr.ice.predix.io
which returns "HTML Predix.io - Hello World Example"
I'm a newbie in Oozie and I've read some Oozie shell action examples but this got me confused about certain things.
There are examples I've seen where there is no <file> tag.
Some examples, like this one from Cloudera, repeat the shell script in the <file> tag:
<shell xmlns="uri:oozie:shell-action:0.2">
<exec>check-hour.sh</exec>
<argument>${earthquakeMinThreshold}</argument>
<file>check-hour.sh</file>
</shell>
The example on Oozie's website, meanwhile, writes the shell script (referenced as ${EXEC} from job.properties, which points to the script.sh file) twice, separated by #.
<shell xmlns="uri:oozie:shell-action:0.1">
...
<exec>${EXEC}</exec>
<argument>A</argument>
<argument>B</argument>
<file>${EXEC}#${EXEC}</file>
</shell>
There are also examples I've seen where the path (HDFS or local?) is prepended before the script.sh#script.sh within the <file> tag.
<shell xmlns="uri:oozie:shell-action:0.1">
...
<exec>script.sh</exec>
<argument>A</argument>
<argument>B</argument>
<file>/path/script.sh#script.sh</file>
</shell>
As I understand, any shell script file can be included in the workflow HDFS path (same path where workflow.xml resides).
Can someone explain the differences in these examples and how <exec>, <file>, script.sh#script.sh, and the /path/script.sh#script.sh are used?
<file>hdfs:///apps/duh/mystuff/check-hour.sh</file> means "download that HDFS file into the Current Working Dir of the YARN container that runs the Oozie Launcher for the Shell action, using the same file name by default, so that I can reference it as ./check-hour.sh or simply check-hour.sh in the <exec> element".
<file>check-hour.sh</file> means "download that HDFS file -- a relative path is resolved against the workflow application directory, i.e. the folder that holds workflow.xml -- into etc. etc.".
<file>hdfs:///apps/duh/mystuff/check-hour.sh#youpi</file> means "download that HDFS file etc. etc., renaming it as youpi, so that I can reference it as ./youpi or simply youpi in the <exec> element".
Note that the Hue UI often inserts unnecessary # stuff with no actual name change. That's why you will see it so often.
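The naming rule described above can be written down as a small function. This is just a sketch of the rule, not Oozie code, and split_file_spec is a made-up name: the part before # is the file to fetch, and the part after # (or the file's base name when there is no #) is the name you use in <exec>.

```python
import posixpath


def split_file_spec(spec):
    """Return (source_path, local_name) for the text of a <file> element."""
    source, _, alias = spec.partition("#")
    # Without an explicit alias the file keeps its own base name.
    local_name = alias if alias else posixpath.basename(source)
    return source, local_name
```

For example, /path/script.sh#script.sh and /path/script.sh both yield the local name script.sh, which is why the # part is redundant there.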
I am trying to translate a .bat file into an INI file so that I can use WinRun4J to launch a small Java app as a service.
Working from the demo that ships with the download, the web page https://github.com/poidasmith/winrun4j and a few samples that have been posted, I've come up with an .ini file that reads as...
terrainserver.class=ru.ibs.JEPPEG3.ProjectionServer.ProjectionServerDaemon
terrainserver.id=TerrainServer
terrainserver.name=WinRun4J TerrainServer terrainserver
terrainserver.description=Pegasus Terrain Service
classpath.1=*.jar
classpath.2=*.zip
arg.1=prjsrvConfig=.\prjsrv.properties
vmarg.1=-Xdebug
vmarg.2=-Xnoagent
vmarg.3=-Xrunjdwp:transport=dt_socket,address=2121,server=y,suspend=n
vm.heapsize.min.percent=256m
vm.heapsize.preferred=1000m
vm.location=C:\Program Files (x86)\Java\jdk1.7.0_55\jre\bin\server\jvm.dl
from the original batch file...
set JAVA_HOME=c:\jdk1.3.1_03
set PRJSRV_CLASSPATH=.\ProjServer.jar;.\ode.jar;.\classes12.zip;.\JAGR-client.jar;.\PegasusElevAdapter.jar
set PRJSRV_PARAM1=prjsrvConfig=.\prjsrv.properties
start %JAVA_HOME%\bin\java.exe -classpath %PRJSRV_CLASSPATH% -D%PRJSRV_PARAM1% -Xms256m -Xmx1000m ru.ibs.JEPPEG3.ProjectionServer.ProjectionServerDaemon
My question is: is using the arg key the correct way to set a reference to the prjsrv.properties file, or is there a better method? Java isn't my strongest language, so please bear with me.
From what I can see your batch will have to be translated into:
vmarg.4=-DprjsrvConfig=.\prjsrv.properties
Besides that I think you need to rename these:
terrainserver.class=ru.ibs.JEPPEG3.ProjectionServer.ProjectionServerDaemon
terrainserver.id=TerrainServer
terrainserver.name=WinRun4J TerrainServer terrainserver
terrainserver.description=Pegasus Terrain Service
to
service.class=ru.ibs.JEPPEG3.ProjectionServer.ProjectionServerDaemon
service.id=TerrainServer
service.name=WinRun4J TerrainServer terrainserver
service.description=Pegasus Terrain Service
because WinRun4J does not recognize terrainserver.* keys; it expects service.* (or main.class) instead.
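The mapping from the batch file's java command line to INI keys can be sketched mechanically. The helper below is hypothetical and only uses the key names that appear in this thread (service.class, classpath.N, vmarg.N); it assumes every option starting with a dash, including -D properties and -X settings, is passed through as a vmarg entry.

```python
def java_command_to_ini(args, class_key="service.class"):
    """Map the arguments that follow java.exe onto INI-style lines."""
    classpath, vmargs, main_class = [], [], None
    it = iter(args)
    for arg in it:
        if arg in ("-classpath", "-cp"):
            # The next token is the ';'-separated Windows classpath.
            classpath = next(it).split(";")
        elif arg.startswith("-"):
            # -D system properties and -X options are JVM arguments.
            vmargs.append(arg)
        else:
            # The first non-option token is the class to launch.
            main_class = arg
            break
    lines = [f"{class_key}={main_class}"]
    lines += [f"classpath.{i}={p}" for i, p in enumerate(classpath, 1)]
    lines += [f"vmarg.{i}={a}" for i, a in enumerate(vmargs, 1)]
    return lines
```

The numbering starts at 1 here; if the INI file already has vmarg.1 to vmarg.3 for debugging, the -D property simply takes the next free number, as in the vmarg.4 line above.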