I submitted an Oozie coordinator job as the user "runner". When I try to either kill or suspend it, I get the following error message:
[runner@hadooptools ~]$ oozie job -oozie http://localhost:11000/oozie -kill 0000005-140722025226945-oozie-oozi-C
Error: E0509 : E0509: User [?] not authorized for Coord job [0000005-140722025226945-oozie-oozi-C]
From the logs on the Oozie server I see the following message:
2014-07-25 03:10:07,324 INFO oozieaudit:539 - USER [runner], GROUP [null], APP [cron-coord], JOBID [0000005-140722025226945-oozie-oozi-C], OPERATION [start], PARAMETER [null], STATUS [SUCCESS], HTTPCODE [200], ERRORCODE [null], ERRORMESSAGE [null]
From time to time even the user under which I issue the command is not logged correctly.
I am using CentOS 6.3, Oozie client build version 4.0.0.2.0.6.0-101, and Oozie server build version 4.0.0.2.0.6.0-101.
I am not even able to stop it as the oozie user that runs the server. As the user who submitted the job, I cannot suspend, kill, etc. The only operations that go through are submit, run (which starts the flow), and info.
Any hints/tricks, or have I misconfigured something obvious?
UPDATE:
Security settings for the instance I am using:
<property>
  <name>oozie.authentication.type</name>
  <value>simple</value>
</property>
<property>
  <name>oozie.authentication.simple.anonymous.allowed</name>
  <value>true</value>
</property>
My conf/adminusers.txt contains:
# Admin Users, one user by line
runner
Hadoop core-site.xml:
<property>
  <name>hadoop.security.authentication</name>
  <value>simple</value>
</property>
<property>
  <name>hadoop.proxyuser.oozie.groups</name>
  <value>users</value>
</property>
Here runner is a member of the users group. According to the Oozie documentation:
Oozie has a basic authorization model:
Users have read access to all jobs
Users have write access to their own jobs
Users have write access to jobs based on an Access Control List (a list of users and groups)
Users have read access to admin operations
Admin users have write access to all jobs
Admin users have write access to admin operations
Did I overlook something in the configuration?
Do I need to specify/configure something like this:
Pseudo/simple authentication requires the user to specify the user name on the request; this is done by the PseudoAuthenticator class by injecting the user.name parameter into the query string of all requests. The value of the user.name parameter is taken from the client process's Java system property user.name.
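One way to rule this out is to force the user.name system property on the client side and retry the kill. This is only a sketch: it assumes the bin/oozie wrapper script passes OOZIE_CLIENT_OPTS through to the client JVM, which is worth verifying in the script itself.
# Force the Java system property that the PseudoAuthenticator reads
# (assumption: the oozie wrapper script honors OOZIE_CLIENT_OPTS).
export OOZIE_CLIENT_OPTS='-Duser.name=runner'
oozie job -oozie http://localhost:11000/oozie -kill 0000005-140722025226945-oozie-oozi-C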
Old question, but eh, I ran into the same problem. It seems related to https://issues.apache.org/jira/browse/OOZIE-800
Simply removing ~/.oozie-auth-token before issuing the oozie command solved it for me.
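For example, reusing the job id from the question:
rm -f ~/.oozie-auth-token
oozie job -oozie http://localhost:11000/oozie -kill 0000005-140722025226945-oozie-oozi-C
Per OOZIE-800, the token cached in ~/.oozie-auth-token can go stale and carry the wrong identity; removing it forces the client to authenticate again.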
Temporarily resolved by disabling the security model.
The following setting disables the security model; with it in place everything worked as expected.
<property>
  <name>oozie.service.AuthorizationService.security.enabled</name>
  <value>false</value>
  <description>
    Specifies whether security (user name/admin role) is enabled or not.
    If disabled any user can manage Oozie system and manage any job.
  </description>
</property>
I will look deeper into how to solve this correctly, but as a temporary workaround, or for development, this works fine.
Related
When I used NebulaGraph Explorer, I expected to use a workflow, but the task failed with this error:
There are 0 NebulaGraph Analytics available. clusterSize should be less than or equal to it
You can check according to the following procedure:
Check whether SSH password-free login between the nodes is configured correctly. You can run the ssh <user_name>@<node_ip> command on the Dag Controller machine to check whether the login succeeds (see the sketch after this list).
Note that if the Dag Controller and Analytics are on the same machine, you also need to configure SSH password-free login.
Check the configuration file of the Dag Controller.
Check whether the SSH user in etc/dag-ctrl-api.yaml is the same as the user who starts the Dag Controller service and configures SSH password-free login.
Check whether the algorithm path in etc/tasks.yaml is correct.
Check whether Hadoop and Java paths in scripts/set_env.sh are correct.
Restart the Dag Controller for the settings to take effect.
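A quick way to run the SSH check from the first step non-interactively (the user and IP are placeholders from the steps above):
# Fails immediately instead of prompting for a password if key-based login is not set up.
ssh -o BatchMode=yes <user_name>@<node_ip> 'echo ssh-ok'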
As part of a batch job I create 4 command lines through Control-M which invoke a legacy console application written in VB6. The console application invokes an ActiveX server which performs a set of analytic jobs, calculating outputs. The ActiveX server was coded as a singleton, but when invoked through Control-M I get 4 instances running. The ActiveX server does not tear down once the job has completed and the command line has closed itself.
I created 4 .bat files which, once launched manually on the server, simulate the calls made through Control-M, and the ActiveX server behaves as expected, i.e. there is only ever 1 instance running and once complete it closes down gracefully.
What am I doing wrong?
Control-M jobs are run under a service account, the same as if we logged in as a user and executed the job. How did you test this? Did you manually execute each batch job one after another, or did you execute all the batch jobs at the same time from different terminals? You can try one thing: run the Control-M jobs with a time interval, e.g. the first at 09:00, the second at 09:05, the third at 09:10, and the fourth at 09:15, and see if that fixes your issue.
Maybe your job cannot use the Desktop environment.
Check your agent service settings:
Log on As:
User account under which the Control-M Agent service will run.
Valid values:
- Local System Account – Service logs on as the system account.
- Allow Service to Interact with Desktop – This option is valid only if the service is running as a local system account.
  - Selected – the service provides a user interface on a desktop that can be used by whoever is logged in when the service is started. Default.
  - Unselected – the service does not provide a user interface.
- This Account – User account under which the Control-M Agent service will run.
NOTE: If the owner of any Control-M/Server jobs has a "roaming profile" or if job output (OUTPUT) will be copied to or from other computers, the Log in mode must be set to This Account.
Default: Local System Account
I'm looking for a way to launch a custom script when a coordinator starts.
So when a coordinator starts running a job, I'd need to make, for example, an API call to a third-party service.
Is there a way or a workaround to make this possible?
Thank you
Solution found: the key is the property oozie.wf.workflow.notification.url.
Add the following property to the workflow configuration file:
<property>
  <name>oozie.wf.workflow.notification.url</name>
  <value>http://server_name:8080/oozieNotification/jobUpdate?jobId=$jobId%26status=$status</value>
</property>
and create a webservice listening on this url
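For illustration, here is a minimal sketch of such a listener in Python; the port and path mirror the notification URL above, while everything else (including where the third-party API call goes) is an assumption:
# Minimal notification listener sketch (assumes Python 3 on server_name and port 8080 free).
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

class JobUpdateHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        parsed = urlparse(self.path)
        if parsed.path == "/oozieNotification/jobUpdate":
            params = parse_qs(parsed.query)
            job_id = params.get("jobId", [""])[0]
            status = params.get("status", [""])[0]
            # Hypothetical hook: call your third-party service here,
            # e.g. only when the workflow reports that it is starting.
            print("workflow %s is now %s" % (job_id, status))
        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), JobUpdateHandler).serve_forever()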
Is there any Oozie action that can be used in workflow.xml to purge the logs generated by Oozie older than 2 days from the Oozie job execution?
You can use:
log4j.appender.oozie.RollingPolicy.FileNamePattern=${log4j.appender.oozie.File}-%d{yyyy-MM-dd-HH}
log4j.appender.oozie.RollingPolicy.MaxHistory=720
These settings define the amount of time you keep your logs. Older logs will be gzipped and, after the retention period, deleted.
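Since the question asks for roughly a 2-day window, a sketch of the corresponding oozie-log4j.properties values could look like this (assuming the default hourly rotation, so MaxHistory counts hourly files and 48 of them is about 2 days):
log4j.appender.oozie.RollingPolicy.FileNamePattern=${log4j.appender.oozie.File}-%d{yyyy-MM-dd-HH}
log4j.appender.oozie.RollingPolicy.MaxHistory=48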
Can anyone please suggest which scheduler is best suited for Hadoop? If it is Oozie, how is Oozie different from cron jobs?
Oozie is the best option.
Oozie Coordinator allows triggering actions when files arrive in HDFS. This would be challenging to implement anywhere else.
Oozie gets callbacks from MapReduce jobs, so it knows when they finish and whether they hang, without expensive polling. No other workflow manager can do this.
There are some benefits over crontab or any other scheduler; here are some points and a link:
https://prodlife.wordpress.com/2013/12/09/why-oozie/
Oozie is able to start jobs based on data availability; this is not free, since someone has to declare when the data are available.
Oozie allows you to build complex workflows using the mouse.
Oozie allows you to schedule workflow execution using the coordinator.
Oozie allows you to bundle one or more coordinators.
Using cron with Hadoop is a bad idea, but it is still fast, reliable, and well known. Most of the work that comes for free with Oozie has to be hand-coded if you go with cron.
Using Oozie without Java means (at the current date) running into a long list of dependency problems.
If you are a Java programmer, Oozie is a must.
Cron is still a good choice when you are in the test/verify stage.
Oozie separates specifications for workflow and schedule into a workflow specification and a coordinator specification, respectively. Coordinator specifications are optional, only required if you want to run a job repeatedly on a schedule. By convention you usually see workflow specifications in a file called workflow.xml and a coordinator specification in a file called coordinator.xml. The new cron-like scheduling affects these coordinator specifications. Let’s take a look at a coordinator specification that will cause a workflow to be run every weekday at 2 AM.
<coordinator-app name="weekdays-at-two-am"
                 frequency="0 2 * * 2-6"
                 start="${start}" end="${end}" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.2">
  <action>
    <workflow>
      <app-path>${workflowAppUri}</app-path>
      <configuration>
        <property>
          <name>jobTracker</name>
          <value>${jobTracker}</value>
        </property>
        <property>
          <name>nameNode</name>
          <value>${nameNode}</value>
        </property>
        <property>
          <name>queueName</name>
          <value>${queueName}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
The key thing here is the frequency attribute of the coordinator-app element: it is a cron-like specification that instructs Oozie when to run the workflow. The values for variables such as ${start}, ${end}, and ${workflowAppUri} are specified in another properties file. The specification is "cron-like", and you might notice one important difference: days of the week are numbered 1-7 (1 being Sunday), as opposed to the 0-6 numbering used in standard cron.
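For illustration, that companion properties file could look something like the sketch below; the hosts, dates, and paths are placeholders rather than values from the original post:
nameNode=hdfs://namenode-host:8020
jobTracker=jobtracker-host:8032
queueName=default
start=2015-01-05T02:00Z
end=2016-01-05T02:00Z
workflowAppUri=${nameNode}/user/someuser/apps/my-workflow
oozie.coord.application.path=${nameNode}/user/someuser/apps/my-coordinator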
For more info visit: http://hortonworks.com/blog/new-in-hdp-2-more-powerful-scheduling-options-in-oozie/
Apache Oozie is built to work with YARN and HDFS.
Oozie provides many features such as data dependencies, coordinators, and workflow actions.
Oozie documentation
I think Oozie is the best option.
Sure, you can use cron, but you will have to put in a lot of effort to make it work with Hadoop.