Airflow Dag Folder - How to ignore notebook checkpoints - jupyter-notebook

Airflow is being too clever and trying to pick up DAGs inside the Jupyter notebook checkpoints folder "dags/.ipynb_checkpoints/", which throws an error.
Is there a way to configure Airflow to ignore folders matching a certain pattern, like I would with .gitignore?
Thanks

You can create a .airflowignore file in the dags folder containing:
.ipynb_checkpoints
From the docs:
A .airflowignore file specifies the directories or files in DAG_FOLDER that Airflow should intentionally ignore. Each line in .airflowignore specifies a regular expression pattern, and directories or files whose names (not DAG id) match any of the patterns would be ignored (under the hood, re.findall() is used to match the pattern). Overall it works like a .gitignore file.
.airflowignore file should be put in your DAG_FOLDER. For example, you can prepare a .airflowignore file with contents
project_a
tenant_[\d]
Then files like project_a_dag_1.py, TESTING_project_a.py, tenant_1.py, project_a/dag_1.py, and tenant_1/dag_1.py in your DAG_FOLDER would be ignored (If a directory’s name matches any of the patterns, this directory and all its subfolders would not be scanned by Airflow at all. This improves efficiency of DAG finding).
The scope of a .airflowignore file is the directory it is in plus all its subfolders. You can also prepare .airflowignore file for a subfolder in DAG_FOLDER and it would only be applicable for that subfolder.
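The matching rule quoted above can be tried out with a minimal Python sketch. It only mimics the described behaviour (each line is a regular expression searched for with re.findall()) and is not Airflow's actual code; the helper name is_ignored and the sample paths are assumptions made for illustration, and applying the pattern to the whole relative path is a simplification.

```python
import re

# Patterns as they might appear in .airflowignore, one regular expression per line.
patterns = [".ipynb_checkpoints", "project_a", r"tenant_[\d]"]


def is_ignored(relative_path, patterns):
    """Return True if any pattern is found anywhere in the path (hypothetical helper)."""
    return any(re.findall(pattern, relative_path) for pattern in patterns)


# Sample paths relative to DAG_FOLDER (made up for this sketch).
candidates = [
    ".ipynb_checkpoints/my_dag-checkpoint.py",
    "project_a_dag_1.py",
    "TESTING_project_a.py",
    "tenant_1/dag_1.py",
    "tenant_x.py",
    "my_dag.py",
]

for path in candidates:
    print(path, "ignored" if is_ignored(path, patterns) else "scanned")
```

Because the patterns are regular expressions rather than globs, project_a matches anywhere in a name, which is why TESTING_project_a.py is ignored as well.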

Place a file named .airflowignore in your DAG folder and list in it the directory you want Airflow to ignore.

Related

WinSCP script to synchronize directories, but exclude several subdirectories

I need to write a script that synchronizes local files with a remote machine.
My file structure is:
ProjectFolder/
.git/
input/
output/
classes/
main.py
readme.md
I need to synchronize everything, but:
completely ignore .git folder
ignore files in input and output folders, but copy the folder
So far my code is:
open sftp://me:password@server -hostkey="XXXXXXXX"
option batch abort
option confirm off
synchronize remote "C:\Users\MYNAME\Documents\MY FOLDER\Python Projects\ProjectFolder" "/home/MYNAME/py_proj/ProjectFolder" -filemask="|C:\Users\MYNAME\Documents\MY FOLDER\Python Projects\ProjectFolder\.git"
close
exit
First question: it doesn't seem to work.
Second question: how do I add masks for the input and output folders if I have spaces in the file paths?
Thanks to all in advance.
Masks for directories have to end with a slash.
To exclude files in a specific folder, use something like */folder/*
-filemask="|.git/;*/input/*;*/output/*"

What is the order of the config file for NGINX

I have the following config files and locations:
etc/nginx/nginx.conf
var/etc/nginx/sites-available/myproject
etc/nginx/conf.d/default.conf
etc/nginx/conf.d/web.conf
I'm confused about the role of each conf file and when to use one or another. Are they loaded one after another, or is only one loaded? Do directives in one file override those in another?
The nginx configuration file is called nginx.conf and on most systems is located at /etc/nginx/nginx.conf.
nginx.conf may optionally contain include statements to read parts of the configuration from other files. See this document for more. Read your nginx.conf file to identify which files and directories are sourced and in which context and which order.
Some distributions are delivered with an nginx.conf file that sources additional files from directories such as /conf.d/ and /sites-enabled/.
There is also a convention on some distributions to symlink files between /sites-available/ and /sites-enabled/.
The nginx -T command (uppercase T) is useful to list the entire configuration across all the included files.
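As a rough way to see which files an nginx.conf sources and in which order, the include directives can be listed with a short Python sketch. The sample configuration text and the helper name list_includes are assumptions for illustration; this is a simple line-based scan, not a real nginx parser, and it does not expand wildcards or follow nested includes.

```python
import re

# Matches an "include <pattern>;" directive at the start of a line.
# Lines starting with "#" (comments) do not match.
INCLUDE_RE = re.compile(r"^\s*include\s+([^;]+);", re.MULTILINE)


def list_includes(conf_text):
    """Return include patterns in the order they appear (hypothetical helper)."""
    return [match.strip() for match in INCLUDE_RE.findall(conf_text)]


# Made-up configuration resembling what some distributions ship.
sample = """
user www-data;
http {
    # include /etc/nginx/disabled/*.conf;
    include /etc/nginx/mime.types;
    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}
"""

print(list_includes(sample))
```

For the authoritative, fully expanded view, nginx -T remains the better tool; the sketch is only meant to show where the ordering comes from.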

How to exclude a folder from rsync

I am using rsync to deploy a git branch to my production server. Currently, I have js files stored in two locations:
assets/js/
js/
When I run rsync using --exclude js, neither folder is synced, while I want the assets/js/ folder to be synced and the js/ folder inside my root folder to be skipped. How can I achieve this?
You need to specify a pattern for those files and directories, using the filter-rule syntax:
RULE [PATTERN_OR_FILENAME]
RULE,MODIFIERS [PATTERN_OR_FILENAME]
so you would have something like
- /js/
The leading slash anchors the pattern to the root of the transfer, so only the top-level js/ folder is excluded; a bare js/ matches a js directory at any depth, which is why assets/js/ was skipped too.
For more detail, see the "Include/Exclude Pattern Rules" section of the rsync man page. Hope it helps.
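For a deploy script, the same idea can be sketched in Python by building the rsync argument list with an anchored exclude. In rsync, a pattern that starts with a slash is anchored to the root of the transfer, so /js/ skips only the top-level js/ directory. The helper name build_rsync_command, the source path, and the destination host are hypothetical placeholders.

```python
def build_rsync_command(source, destination, excludes):
    """Build an rsync argument list with one --exclude per pattern (hypothetical helper)."""
    command = ["rsync", "-av"]
    for pattern in excludes:
        command.append(f"--exclude={pattern}")
    command += [source, destination]
    return command


# "/js/" is anchored to the transfer root, so assets/js/ is still synced.
cmd = build_rsync_command("./", "user@example.com:/var/www/site/", ["/js/"])
print(" ".join(cmd))
```

The list can then be passed to subprocess.run(cmd, check=True); adding rsync's --dry-run option first is a cheap way to confirm that only the intended folder is skipped.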

How to stop git (.gitignore) from tracking minqueue (wordpress plugin) cached changes

After entering "git status", I keep getting messages like wp-content/uploads/minqueue-cache/minqueue-9cbb4cb4-9cb6af13.js even though I have added the following line to my .gitignore file: /wp-content/uploads/minqueue-cache/*
Why is this?
The leading slash in /wp-content/uploads/minqueue-cache/* anchors the pattern to the directory that contains the .gitignore file, so your pattern will match all files and folders inside wp-content/uploads/minqueue-cache/ but not the files inside www.apis.de/wp-content/uploads/minqueue-cache/.
If you change the pattern to **/wp-content/uploads/minqueue-cache/* it will match all files and folders in all wp-content/uploads/minqueue-cache/ folders, no matter where they start. Dropping only the leading slash is not enough, because a pattern with a slash in the middle is still relative to the directory of the .gitignore file.
If you change the pattern to /www.apis.de/wp-content/uploads/minqueue-cache/* it will match all files and folders exactly in this one directory.

finding the last common directory in a jar file

Is there a way to extract the last common directory structure in a jar file?
For example,
jar -tf test.jar
META-INF/
META-INF/MANIFEST.MF
com/
com/a/
com/a/b/
com/a/b/c/
com/a/b/c/file1.class
com/a/b/d/
com/a/b/d/file2.class
Manifest.txt
The last common directory structure would be
com/a/b
I can code something in java/bash to just split by / and see if there is anything common with the next string, but I was wondering if there is a magical jar option that will save me some time.
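The split-and-compare approach described above can be sketched in a few lines of Python. This is only an illustration under one assumption: that just the .class entries count, which is what makes the example yield com/a/b rather than an empty prefix once META-INF/ and Manifest.txt are included. The helper name last_common_directory is hypothetical.

```python
import posixpath


def last_common_directory(entries, suffix=".class"):
    """Return the deepest directory shared by all entries ending in suffix."""
    directories = [
        posixpath.dirname(entry) for entry in entries if entry.endswith(suffix)
    ]
    if not directories:
        return ""
    return posixpath.commonpath(directories)


# Entry names copied from the jar -tf listing in the question.
entries = [
    "META-INF/",
    "META-INF/MANIFEST.MF",
    "com/",
    "com/a/",
    "com/a/b/",
    "com/a/b/c/",
    "com/a/b/c/file1.class",
    "com/a/b/d/",
    "com/a/b/d/file2.class",
    "Manifest.txt",
]

print(last_common_directory(entries))
```

Since a jar is a zip archive, the entry list could also come from zipfile.ZipFile("test.jar").namelist() instead of being typed in by hand.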

Resources