Airflow GoogleCloudStorageObjectSensor with wildcard (file*.xml) not working

I wanted to use a file mask with the GoogleCloudStoragePrefixSensor, but I can't use the GoogleCloudStoragePrefixSensor because I also need to check the ending of the file mask. Basically, my files are named like "tv_link_input_*.xml". So I tried using GoogleCloudStorageObjectSensor, but it just keeps running without any output. I'm using Airflow 1.10.14. Code:
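from airflow.contrib.sensors.gcs_sensor import GoogleCloudStorageObjectSensor

check_file_gcs = GoogleCloudStorageObjectSensor(
    task_id='check_file_gcs',
    bucket=source_bucket,
    # checked as an exact object name; "*" is not treated as a glob
    object="processing/tv_link_input_*xml",
    poke_interval=10,  # seconds between checks
    dag=dag
)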
Thanks
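Because the object sensor checks for one exact object name, a mask like "tv_link_input_*.xml" is never found and the sensor pokes forever. One workaround, sketched here under assumptions rather than taken from an answer in this thread, is to list the objects under the fixed prefix and filter the names with a shell-style pattern. The helper names and the list_objects callable are hypothetical; in a DAG, list_objects could wrap the GCS hook's list method, whose parameter names should be checked against your Airflow version.

```python
import fnmatch
from typing import Callable, Iterable, List


def matching_objects(names: Iterable[str], pattern: str) -> List[str]:
    """Return the object names that match a shell-style mask such as '*.xml'.

    Note: fnmatch's '*' also matches '/', so 'processing/tv_link_input_*.xml'
    would also match objects in deeper "subfolders".
    """
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]


def make_mask_poke(
    list_objects: Callable[[str], Iterable[str]], prefix: str, pattern: str
) -> Callable[[], bool]:
    """Build a poke-style callable: list objects under `prefix`, then filter by `pattern`.

    `list_objects` is a hypothetical stand-in for whatever returns the object
    names in the bucket (e.g. a thin wrapper around a GCS hook's list method).
    """
    def poke() -> bool:
        # True as soon as at least one listed object matches the mask.
        return bool(matching_objects(list_objects(prefix), pattern))

    return poke
```

Listing by the fixed prefix first ("processing/tv_link_input_") keeps the number of names fetched small, and the pattern then enforces the ".xml" ending that the prefix sensor alone cannot check.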

Related

Using an alias to get further into directory

I created an alias for the directory data.
Now I want to be able to access its subdirectories in a script's arguments the same way as a normal directory, for example data/sub1.
The alias on its own works in the terminal, but I can't use the path it stands for (/User/usr/Folder/Subfolder/data) in the command-line arguments of the script:
python skript.py -input data/path/ -output data/path.
I tried $data, {data} and different quotation marks around the path but wasn't able to access it.
I saw that in some similar questions a link (ln) was suggested, but it sounds similar to an environment variable; the latter didn't work reliably for me, so I haven't tried the link yet.
Any suggestions? Again, I would prefer the alias, but I'm open to anything else.
Kind regards
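Bash expands an alias only when it is the first word of a simple command, so an alias cannot stand in for part of a path inside an argument. An exported shell variable can: export data=/User/usr/Folder/Subfolder/data, then python skript.py -input "$data/path/" -output "$data/path". If the variable still reaches the script unexpanded (for example because the argument was single-quoted), the script can expand it itself. A minimal sketch, assuming the script parses its arguments with argparse and that the variable is exported; the resolve and parse_args helpers are hypothetical, not from the question:

```python
import argparse
import os
from typing import List, Optional


def resolve(path: str) -> str:
    """Expand $VARS and ~ in a path that the shell passed through unexpanded."""
    return os.path.expanduser(os.path.expandvars(path))


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    # Single-dash long options mirror the question's "-input"/"-output" style;
    # argparse stores them as args.input and args.output.
    parser.add_argument("-input", required=True, type=resolve)
    parser.add_argument("-output", required=True, type=resolve)
    return parser.parse_args(argv)
```

Only exported variables are visible to the Python process, which is one common reason an environment variable seems to work "unreliably".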

Snakemake: wildcards do not expand in script line of rule

I am running a pipeline and was trying to optimize it by declaring the paths in a config file (config.yaml). The config.yaml file contains the path to find the scripts to run inside the pipeline, but when I expand the wildcard of the path, the pipeline does not run the script. The script itself runs fine.
To explain my problem:
rule with_script:
    input: someinput
    output: someoutput
    script: expand("{script_path}/scriptfile", script_path = config["scriptpath"])
Neither input, output, nor rule all contains the script's path wildcard, so this is the first place I declare it. The line in config.yaml that contains the path looks like this:
scriptpath: /path/to/the/script
Is there a way to keep the wildcard and the config file path (to make it easier for others to make changes if needed) and still have the script run? As written, Snakemake doesn't even enter the script file. Or is it possible to declare global wildcards outside rule all?
Thank you for your help!
P.S.: I'm sorry if this question has already been answered, but I couldn't find anything to help me with this.
You cannot call a function like expand() in the script directive: expand() returns a list, whereas Snakemake expects a single path to your script.
As the documentation states:
The script path is always relative to the Snakefile containing the directive (in contrast to the input and output file paths, which are relative to the working directory). It is recommended to put all scripts into a subfolder "scripts"
If you need to define different paths to your scripts, you can always do it in Python outside of your rules. Remember that all Python code outside of rules is executed before the DAG is built, so you can define any variables you want and use them in your rules.
SCRIPTSPATH = config["scriptpath"]

rule with_script:
    input: someinput
    output: someoutput
    # f-string: Python substitutes SCRIPTSPATH when the Snakefile is parsed
    script: f"{SCRIPTSPATH}/scriptfile"
Note:
Do not mix wildcards and "variables". In an expand() call such as
expand("{script_path}/scriptfile", script_path = config["scriptpath"])
{script_path} is not a wildcard but just a placeholder for the values given in the second parameter of the function.

How do you set edge.options in serenity.config file?

I am using Serenity BDD (Cucumber). In the serenity.config file I can set Chrome options like
chrome.switches = """--headless;"""
I can also pass them in through mvn like
-Dheadless.mode=true
But I cannot set any Edge options. I think the correct way is to use
edge.options = """ """
But I cannot find what the valid inputs are, and nothing I pass through the mvn command works.
I can't seem to find an answer online. Does anyone know?

AWSAthenaOperator output S3 name?

I'm using Airflow 2 and trying to use the AWSAthenaOperator. I can run the operator and it works, but I can't find any way to determine what the file names are that it wrote.
from airflow.providers.amazon.aws.operators.athena import AWSAthenaOperator

task = AWSAthenaOperator(
    task_id="foo",
    database="mydb",
    query='select * from mytable limit 10;',
    aws_conn_id="athena_conn",
    output_location='s3://mybucket/myfolder',
)
It drops files in s3://mybucket/myfolder, which is great, but how do I find out from the task output what those file names are? I need to then take those names and pass them to other tasks downstream.
I have been digging through the AWSAthenaOperator and AWSAthenaHook code that seems to do the work underneath, but I can't find where it stores that information or how I'd retrieve it.
I think the problem is that the path you pass as output_location isn't unique. output_location is a templated field, so you can use execution_date to make it unique. Each task run then saves its files to a different, known location, so files from different runs don't get mixed, and you can use the same path in downstream tasks.
task = AWSAthenaOperator(
    task_id="foo",
    database="mydb",
    query='select * from mytable limit 10;',
    aws_conn_id="athena_conn",
    output_location='s3://mybucket/myfolder/{{ execution_date }}',
)
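To get the file names rather than just the folder, one option relies on two behaviours that are worth verifying against your Amazon provider version: the operator's execute returns the Athena query execution ID, which Airflow pushes to XCom as the task's return value, and Athena names the result of a SELECT query QueryExecutionId.csv under output_location. Under those assumptions, a downstream task can pull the ID with ti.xcom_pull(task_ids="foo") and build the S3 key. A minimal sketch; the helper name athena_result_location is hypothetical:

```python
from typing import Tuple
from urllib.parse import urlparse


def athena_result_location(output_location: str, query_execution_id: str) -> Tuple[str, str]:
    """Return (bucket, key) of the CSV that Athena is assumed to write for a SELECT query.

    Assumes the <output_location>/<QueryExecutionId>.csv naming convention.
    """
    parsed = urlparse(output_location)
    if parsed.scheme != "s3":
        raise ValueError(f"expected an s3:// URI, got {output_location!r}")
    prefix = parsed.path.strip("/")  # "" when the location is the bucket root
    filename = f"{query_execution_id}.csv"
    key = f"{prefix}/{filename}" if prefix else filename
    return parsed.netloc, key
```

Pass it the rendered output_location (for example the templated path above) together with the pulled ID, and hand the resulting bucket and key to the downstream tasks.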

Use bash_profile aliases in jupyter notebook

It looks like my Jupyter notebook picks up everything that I export in my .bash_profile, but nothing that I alias.
I think ! uses /bin/sh, so it's understandable that the aliases from the bash profile don't carry over, but the %%bash magic also does not pick up the aliases I've written.
Is there a way to make the aliases in my bash profile available through ! (ideally) or, at the least, using %%bash?
This seems to work (Python 3, modified from a hack I found in a Jupyter issue):
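import subprocess

# Run bash explicitly: /bin/sh may not support `source` (e.g. dash on Linux).
output = subprocess.check_output(['bash', '-c', 'source ~/.bash_profile; alias'])
manager = get_ipython().alias_manager
for line in output.decode('utf-8').splitlines():
    # bash prints "alias name='cmd'" (in POSIX mode, just "name='cmd'")
    if line.startswith('alias '):
        line = line[len('alias '):]
    split_index = line.find('=')
    if split_index == -1:
        continue  # skip blank or unexpected lines
    alias = line[:split_index]
    cmd = line[split_index + 1:]
    cmd = cmd[1:-1]  # drop the surrounding single quotes
    print("ALIAS:{}\t\tCMD:{}".format(alias, cmd))
    manager.soft_define_alias(alias, cmd)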
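The cmd[1:-1] slice in the snippet above only removes the outer quotes, so an alias whose command itself contains single quotes (bash prints each one as '\'') comes through mangled. A sketch of a more careful parser that uses shlex to undo bash's quoting; the function name parse_alias_output is hypothetical, and its result could be fed to get_ipython().alias_manager.soft_define_alias(name, cmd) in the same kind of loop:

```python
import shlex
from typing import Dict


def parse_alias_output(text: str) -> Dict[str, str]:
    """Parse the output of bash's alias builtin into {name: command}.

    Accepts both "alias ll='ls -l'" (normal bash) and "ll='ls -l'"
    (bash in POSIX mode), and undoes bash's quoting with shlex.
    """
    aliases = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("alias "):
            line = line[len("alias "):]
        if "=" not in line:
            continue  # blank or unexpected line
        name, raw = line.split("=", 1)
        # bash always single-quotes the value, so shlex yields one token here
        aliases[name] = " ".join(shlex.split(raw))
    return aliases
```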
Here's another alternative, which is less a solution than a workaround: you can define aliases locally in the notebook using the %alias magic, and make those aliases available in future sessions using the %store magic. More alias trickiness here: https://github.com/ipython/ipython/wiki/Cookbook:-Storing-aliases
More on the %store magic here: http://ipython.readthedocs.io/en/stable/config/extensions/storemagic.html
The next step is hacking the %store magic to persist these aliases: https://github.com/ipython/ipython/blob/master/IPython/extensions/storemagic.py
For posterity, here are the results of some experiments I ran before finally finding a solution:
I sourced my .bash_profile in a %%bash cell. From within that cell, I was able to interrogate the values of variables I defined in my .bash_profile, and was able to list aliased commands by invoking alias. However, I was still not able to use aliased commands. Additionally, variables defined in my .bash_profile were only accessible inside the cell with the source call: trying to access them in a subsequent %%bash cell didn't work, and the alias command also failed. More interesting still: if I sourced using !, I wasn't able to interrogate variables defined in my bash profile or list my aliases with ! shell commands in the same cell.
Suffice it to say, the %%bash magic is finicky.
