Is it possible to put multiple jobs on ice, e.g. using a wildcard for the job name?
Not by using wildcards for the job name. You can use the Group or Application job definitions to put multiple jobs on ice. These support wildcards.
You can also retrieve multiple jobs using wildcards and use a script to put each job on ICE, as in the sketch below.
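A minimal sketch of that scripted approach in Python, assuming the AutoSys client tools (autorep, sendevent) are on the PATH; the MYAPP% pattern and the two-line header parsing are assumptions about your environment and report format:

    import subprocess

    # autorep accepts % as a wildcard in the job name
    report = subprocess.run(
        ["autorep", "-J", "MYAPP%"],
        capture_output=True, text=True, check=True,
    ).stdout

    # Skip the header lines; the first column of each row is the job name.
    # (The exact header length can vary by AutoSys version.)
    for line in report.splitlines()[2:]:
        if not line.strip():
            continue
        job = line.split()[0]
        # JOB_ON_ICE puts a single job on ice
        subprocess.run(["sendevent", "-E", "JOB_ON_ICE", "-J", job], check=True)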
I have an Airflow DAG which calls a particular bash command using a variable. On the backend, we have an Aurora DB. Do we know if there are any tables in the Aurora DB which store information about the variables used in Airflow DAGs? I need to create a report out of it, hence the need to access the variables from the backend.
I tried using the operational_insights schema but could not find any tables with the desired information.
If you are using Airflow Variables, you should be able to query a list of them via the REST API no matter which backend you use.
curl "http://<your Airflow host>/api/v1/variables" --user "login:password"
This is preferred over querying the Airflow metadata database directly, because if you accidentally modify or drop a table you can corrupt your Airflow installation.
With that caveat: the standard table where Airflow Variables are stored is variable, so after logging into the database, SELECT * FROM variable; should return the list.
Again, this is for Airflow Variables. From your question I am not entirely sure if you mean those or, in general, any variables that tasks use. In the latter case you might be looking for the rendered_fields of the task instances, which can also be retrieved via the API.
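For example, with the stable REST API in Airflow 2.x (a sketch; the host and credentials are placeholders, and basic auth must be enabled in your API auth backend):

    import requests

    AIRFLOW_HOST = "http://your-airflow-host:8080"  # placeholder

    # Same endpoint as the curl example above; the response is paginated
    resp = requests.get(
        f"{AIRFLOW_HOST}/api/v1/variables",
        auth=("login", "password"),
        params={"limit": 100},
    )
    resp.raise_for_status()

    for var in resp.json()["variables"]:
        print(var["key"], "=", var["value"])

For the rendered-fields case, recent Airflow versions expose a rendered_fields object on the task-instance endpoints of the same API.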
I am trying to define a dependency to trigger my Oozie workflow from my coordinator. My source file path is dynamic and cannot be known in advance, so I want to match the dependency against a certain pattern. For example, the file can be hdfs://path/to/file/a01b/a.parquet or hdfs://path/to/file/c01d/a.parquet. I want to match this file with
hdfs://path/to/file/*01*/
in the <uri-template> of an Oozie coordinator dataset, but it seems Oozie cannot recognise such a wildcard pattern.
Any idea how to achieve this?
In Airflow, we can perform SQL operations against databases like MySQL and PostgreSQL, or cloud databases like BigQuery.
We can also pass parameters into the SQL using user_defined_macros, which replaces them with concrete values, e.g. parameterizing the database/schema name to avoid two different versions of the SQL for the Dev/QA/Prod environments.
However, is there any way to optimize this further for different schedules, provided the SQL is the same?
E.g.:
For a regular run use table: Dev_A/QA_A/Prod_A
For a snapshot run use table: Dev_B/QA_B/Prod_B
This would help us avoid two different versions of the SQL for the regular and snapshot runs; a sketch of one possible approach follows.
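One possible pattern (a sketch, not the only way): generate both DAGs from a single factory so the SQL string stays shared, and let a user-defined macro pick the table suffix per schedule. The DAG ids, schedules, conn_id, and the "Dev" prefix below are placeholder assumptions:

    # One shared SQL string; two DAGs that differ only in their macros.
    from datetime import datetime
    from airflow import DAG
    from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator

    SQL = "SELECT * FROM {{ env }}_{{ table_suffix }};"  # renders to e.g. Dev_A

    def make_dag(dag_id, schedule, table_suffix):
        with DAG(
            dag_id=dag_id,
            start_date=datetime(2023, 1, 1),
            schedule=schedule,  # Airflow 2.4+; use schedule_interval on older versions
            catchup=False,
            user_defined_macros={"env": "Dev", "table_suffix": table_suffix},
        ) as dag:
            SQLExecuteQueryOperator(task_id="run_sql", conn_id="my_db", sql=SQL)
        return dag

    regular_dag = make_dag("report_regular", "@daily", "A")      # Dev_A
    snapshot_dag = make_dag("report_snapshot", "@monthly", "B")  # Dev_B

The same idea works if the SQL lives in a file: both DAGs point at one file, and only the macro values differ.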
I am a newbie to Beats. I am using Topbeat to monitor system health.
Up to this point everything is fine.
Now I need to monitor the resource utilization of a Java process, so I configured topbeat.yml with: procs: ["java"]
On my Linux box there are 4 Java processes running, but I am interested in only one of them. So:
Is there any way to monitor a specific Java process using a regex?
Is there any way to differentiate the processes by name (not by PID)?
If you wish to view specific processes, you can use the sample Topbeat dashboards; one of the saved searches there is for proc stats. From there, select proc.name from the available fields and filter further on the relevant proc.name.
A suggestion from the Elastic forum (https://discuss.elastic.co/t/topbeat-monitor-specific-java-process/65594/2): try Metricbeat and see if it helps.
I would like to create an Oozie coordinator that depends on five different input folders; once data is available in all of these folders, it should trigger the job. Is this possible?
Yes. You can do this by calling a UDF jar which checks the input locations defined in the given configuration and exits with an exception if the list of input locations is empty.
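The answer describes a Java UDF; as a language-neutral sketch of the same "check inputs, fail if missing" idea, the pre-check could look like this in Python via the hdfs CLI (the five paths are placeholders, and the hdfs binary is assumed to be on the PATH):

    import subprocess
    import sys

    INPUT_DIRS = [f"hdfs://path/to/input{i}" for i in range(1, 6)]

    def exists(path):
        # `hdfs dfs -test -e <path>` returns 0 if the path exists;
        # swap -e for -s to also require non-empty contents on newer Hadoop
        return subprocess.run(["hdfs", "dfs", "-test", "-e", path]).returncode == 0

    missing = [p for p in INPUT_DIRS if not exists(p)]
    if missing:
        # A non-zero exit makes the calling action fail, so the workflow
        # does not proceed until all five inputs are present.
        sys.exit(f"Missing inputs: {missing}")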