How to get Oozie's dependency check to list the dataset full path name instead of coord:latest(0)?

With an Oozie coordinator and workflow, I see the following in the Coord Job Log for a specific action:
JOB[0134742-190911204352052-oozie-oozi-C] ACTION[0134742-190911204352052-oozie-oozi-C#1] [0134742-190911204352052-oozie-oozi-C#1]::CoordActionInputCheck:: Missing deps: ${coord:latest(0)}#${coord:latest(0)}#${coord:latest(0)}#${coord:latest(0)}#${coord:latest(0)}#${coord:latest(0)}
It seems the full path names are missing. When the dataset instance is not specified in the coordinator with latest(0), the full paths do appear, as seen here:
JOB[0134742-190911204352052-oozie-oozi-C] ACTION[0134742-190911204352052-oozie-oozi-C#1] [0134742-190911204352052-oozie-oozi-C#1]::CoordActionInputCheck:: Missing deps:hdfs://labs-xxx/data/funcxx/inputs/uploads/reports-for-targeting/20190923/14
Later the path is resolved as:
JOB[0134742-190911204352052-oozie-oozi-C] ACTION[0134742-190911204352052-oozie-oozi-C#1] [0134742-190911204352052-oozie-oozi-C#1]::ActionInputCheck:: File:hdfs://labs-xxx/data/funcxx/inputs/uploads/reports-for-targeting/20190923/14, Exists? :true
How can I see the full path name instead of the ${coord:latest(0)} strings?
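For context, the behaviour above comes from how the dataset and input event are declared. Below is a rough sketch of the kind of coordinator definition involved; the dataset name, frequency, and initial-instance are hypothetical, and only the uri-template pattern and the ${coord:latest(0)} instance mirror the logs above:
<datasets>
  <dataset name="reports" frequency="${coord:hours(1)}"
           initial-instance="2019-09-23T00:00Z" timezone="UTC">
    <uri-template>hdfs://labs-xxx/data/funcxx/inputs/uploads/reports-for-targeting/${YEAR}${MONTH}${DAY}/${HOUR}</uri-template>
  </dataset>
</datasets>
<input-events>
  <data-in name="reports-in" dataset="reports">
    <!-- coord:latest(0) is resolved from what actually exists in HDFS at input-check time,
         so the check logs the unresolved ${coord:latest(0)} expression until then.
         With coord:current(n) or an explicit instance, the resolved HDFS path is logged instead. -->
    <instance>${coord:latest(0)}</instance>
  </data-in>
</input-events>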

You can check this via the Oozie CLI:
oozie job -info 0134742-190911204352052-oozie-oozi-C#1

Related

How can I resolve the exception reported by rocksdb when using NebulaGraph Exchange?

I want to export data from NebulaGraph, so I used the configuration file provided in the official documentation.
The core content of the configuration file used in this example is as follows:
# Processing tags
# There are tag config examples for different dataSources.
tags: [
  # Export NebulaGraph tag data to CSV (only CSV export is supported for now).
  {
    name: player
    type: {
      source: Nebula
      sink: CSV
    }
    # The path to save the NebulaGraph data; make sure the path doesn't exist yet.
    path:"hdfs://192.168.12.177:9000/vertex/player"
    # If there is no need to export any properties of the tag data,
    # set noField to true to export only the vertexId.
    noField:false
    # Define which properties to export from the NebulaGraph tag data.
    # If return.fields is configured as an empty list, all properties are exported.
    return.fields:[]
    # NebulaGraph space partition number
    partition:10
  }
]
Then I executed the Spark task as follows:
./spark-submit --master "local" --class com.vesoft.nebula.exchange.Exchange ~/exchange-ent/nebula-exchange-ent-3.3.0.jar -c ~/exchange-ent/export_application.conf
However, the task fails, and the following error message is displayed:
org.rocksdb.RocksDBException: While open a file for appending: /path/sst/1-102.sst: No such file or directory
I checked that the path /path/sst/ exists, and that the folder is owned by the NebulaGraph user.
Could anyone hint at where I might be going wrong?

App Engine Python remote_api: 'module' object has no attribute 'GoogleCredentials'

AttributeError: 'module' object has no attribute 'GoogleCredentials'
I have an App Engine app which is running on localhost.
I have some tests which I run, and I want to use the remote_api to check the db values.
When I try to access the remote_api by visiting:
'http://127.0.0.1:8080/_ah/remote_api'
I get:
"This request did not contain a necessary header"
but it's working in the browser.
When I now try to call the remote_api from my tests by calling
remote_api_stub.ConfigureRemoteApiForOAuth('localhost:35887','/_ah/remote_api')
I get the error:
Error
Traceback (most recent call last):
File "/home/dan/src/gtup/test/test_users.py", line 38, in test_crud
remote_api_stub.ConfigureRemoteApiForOAuth('localhost:35887','/_ah/remote_api')
File "/home/dan/Programs/google-cloud-sdk/platform/google_appengine/google/appengine/ext/remote_api/remote_api_stub.py", line 747, in ConfigureRemoteApiForOAuth
credentials = client.GoogleCredentials.get_application_default()
AttributeError: 'module' object has no attribute 'GoogleCredentials'
I did try to reinstall the whole Google Cloud SDK, but that didn't help.
When I open the client.py at
google-cloud-sdk/platform/google_appengine/lib/google-api-python-client/oauth2client/client.py
which is used by remote_api_stub.py, I can see that there is no GoogleCredentials class inside of it.
The GoogleCredentials class does exist, but inside other client.py files which lie at:
google-cloud-sdk/platform/google_appengine/lib/oauth2client/oauth2client/client.py
google-cloud-sdk/platform/gsutil/third_party/oauth2client/oauth2client/client.py
google-cloud-sdk/platform/bq/third_party/oauth2client/client.py
google-cloud-sdk/lib/third_party/oauth2client/client.py
My app.yaml looks like this:
application: myapp
version: 1
runtime: python27
api_version: 1
threadsafe: true
libraries:
- name: webapp2
  version: latest
builtins:
- remote_api: on
handlers:
- url: /.*
  script: main.app
Is this just a wrong import/bug inside of App Engine, or am I doing something wrong when using the remote_api inside my unit tests?
I solved this problem by replacing the folder:
../google-cloud-sdk/platform/google_appengine/lib/google-api-python-client/oauth2client
with:
../google-cloud-sdk/platform/google_appengine/lib/oauth2client/oauth2client
The copy that now gets included from the google-api-python-client folder has the needed GoogleCredentials class in its client.py.
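If you want to script that swap, something along these lines should work; this is only a sketch assuming a default google-cloud-sdk install layout, and it backs up the original folder first:
# back up the bundled oauth2client that lacks GoogleCredentials
mv google-cloud-sdk/platform/google_appengine/lib/google-api-python-client/oauth2client \
   google-cloud-sdk/platform/google_appengine/lib/google-api-python-client/oauth2client.bak
# copy in the oauth2client that does ship GoogleCredentials
cp -r google-cloud-sdk/platform/google_appengine/lib/oauth2client/oauth2client \
      google-cloud-sdk/platform/google_appengine/lib/google-api-python-client/oauth2client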
Then I had a second problem with the connection, and now I have to call:
remote_api_stub.ConfigureRemoteApiForOAuth('localhost:51805','/_ah/remote_api', False)
Note that the port changes every time the server gets restarted.
Answering instead of commenting, as I cannot post a comment with my reputation:
Similar things have happened to me when running these types of scripts on a Mac. Sometimes your PATH variable gets confused as to which files to actually check for functions, especially when you have gcloud installed alongside the App Engine launcher. If on a Mac, I would suggest editing your ~/.bash_profile to fix this (or possibly ~/.bashrc, if on Linux). For example, on my Mac I have the following lines to fix my PATH variable:
export PATH="/usr/local/bin:$PATH"
export PYTHONPATH="/usr/local/google_appengine:$PYTHONPATH"
These basically make sure that Python and the command line look in /usr/local/bin (or /usr/local/google_appengine, in the case of the PYTHONPATH line) BEFORE anything else on the PATH (or PYTHONPATH).
The PATH variable is where the command line looks for executables when you type them into the prompt. The PYTHONPATH is where Python looks for modules to load at runtime.
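One quick way to verify which oauth2client the interpreter actually picks up after adjusting PYTHONPATH is a check like this:
python -c "import oauth2client.client as c; print(c.__file__); print(hasattr(c, 'GoogleCredentials'))"
If the second line prints False, the remote_api call will keep failing with the same AttributeError.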

How to retrieve currently applied node configuration from Riak v2.0+

Showing currently applied configuration values
In v2.0+ of Riak there is a new command option: riak config effective
which I read as meaning it tells you the values Riak is currently running with. The docs say:
"At any time, you can get a snapshot of currently applied configurations through the command line. For a listing of all of the configs currently applied in the node..."
Are config changes applied only on start of each node?
In multiple places the Riak documentation says things like:
"Remember that you must stop and then re-start each node when you change storage backends or modify any other configuration."
Problem:
However, when I make a change to a setting (I've tested this in both riak.conf and advanced.config), I see the newest value when running riak config effective without restarting the node, i.e.:
Start the node: riak start
View the current log level: riak config effective | grep log.console.level
log.console.level = info
Change the level to debug (something that will output a lot to console.log).
Re-run riak config effective | grep log.console.level and we get:
log.console.level = debug
Checking the console log for debug output with cat /var/log/riak/console.log | grep debug gives no results (indicating the config change has not actually been applied).
So the question is, how can I retrieve and verify what config setting each Riak node is running under?
When Riak starts, it creates two files: 'app.<date>.config' and 'vm.<date>.config'. The default location is a 'generated.configs' directory under the platform data directory (usually /var/lib/riak).
These files will contain the settings that were in place when Riak was started. The command riak config effective processes the current riak.conf and advanced.config files.
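So to verify what a node is actually running with, compare the generated files against the effective output. A minimal sketch, assuming the default platform data directory /var/lib/riak:
# settings the node was actually started with
ls /var/lib/riak/generated.configs/
less /var/lib/riak/generated.configs/app.<date>.config
# settings riak.conf / advanced.config would produce now (may differ until the node is restarted)
riak config effective | grep log.console.level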

Using KeyczarTool to create new keyset

Following the documentation noted in the wiki, I'm trying to use the KeyczarTool to generate a new keyset. Has anyone else come across this FileNotFoundException? The KeyczarTool.jar has rwx permissions, and I also tried running it via sudo.
From docs
Command Usage:
create --location=/path/to/keys --purpose=(crypt|sign) [--name="A name"] [--asymmetric=(dsa|rsa|ec)]
Creates a new, empty key set in the given location.
This key set must have a purpose of either "crypt" or "sign"
and may optionally be given a name. The optional version
flag will generate a public key set of the given algorithm.
The "dsa" and "ec" asymmetric values are valid only for sets
with "sign" purpose.
Cmd:
$ java -jar KeyczarTool-0.71f-060112.jar create --location=/keys --purpose=crypt -name="first key" --asymmetric=rsa
output:
org.keyczar.exceptions.KeyczarException: Unable to write to: /keys/meta
at org.keyczar.KeyczarTool.create(KeyczarTool.java:366)
at org.keyczar.KeyczarTool.main(KeyczarTool.java:123)
Caused by: java.io.FileNotFoundException: /keys/meta (No such file or directory)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
at java.io.FileOutputStream.<init>(FileOutputStream.java:145)
at org.keyczar.KeyczarTool.create(KeyczarTool.java:362)
... 1 more
With the current version of Java Keyczar, the directory "keys" needs to be created before running the program.
This is a known issue; KeyczarTool should create directories automatically.
As @jbtule kindly pointed out, you must create the keys directory first. But also include . before the slash, i.e. use a relative ./keys location.
The correct working command is:
$ java -jar KeyczarTool-0.71f-060112.jar create --location=./keys --purpose=crypt -name="first key" --asymmetric=rsa
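Putting both answers together, a minimal working sequence would be something like the following (the ./keys directory and the key name are just examples; the --name flag follows the documented usage quoted above):
# create the key set directory first; KeyczarTool will not create it for you
mkdir -p ./keys
java -jar KeyczarTool-0.71f-060112.jar create --location=./keys --purpose=crypt --name="first key" --asymmetric=rsa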

Get LastWriteTime of SymLink's Target

(Get-Item $SymLink).LastWriteTime returns the SymLink's last modified time, not the target's modified time.
How do I get the target's last modified time?
There appears to be no direct way, so for now this has to be done in two steps:
Get the path of the SymLink's target.
Get the LastWriteTime from the target's path.
To determine if it's a symlink: see "Check if SymLink - PowerShell".
To get the target path, either:
use the Dir command's summary output, from which the target information can be snipped out using RegEx,
or use a native API call, GetFinalPathNameByHandle; see "Calling Unmanaged Code from PS".
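On newer PowerShell (5.1 and later), the FileSystem provider exposes a Target property on the link item, which makes the two steps straightforward. A minimal sketch, assuming $SymLink holds the link's path:
# Step 1: resolve the symlink's target path
$target = (Get-Item $SymLink).Target
# Step 2: read LastWriteTime from the target itself
(Get-Item $target).LastWriteTime
Note that Target may come back as a path relative to the link's directory, in which case it needs to be resolved against that directory before the second Get-Item call.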
