Hello everyone, I'm playing with Airflow and reading this helpful tutorial. I'm asking for help to better understand how Admin->Connections works regarding Conn Type: File (path).
I suppose this type of connection is meant to make a local filesystem folder accessible to my operator?
I just understood how to configure a connection for a local file thanks to your comment, #desimetallica. I will put it here for the next person who needs it.
If the local path should be the dags folder and you are running Airflow inside Docker, the connection should look like this:
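A minimal sketch of such a connection, assuming the official docker-compose setup where the local dags folder is mounted at /opt/airflow/dags inside the containers (the connection id and file name below are placeholders; adjust the path to your own volume mapping):

Conn Id: fs_default
Conn Type: File (path)
Extra: {"path": "/opt/airflow/dags"}

The connection can then be referenced from a FileSensor through fs_conn_id, with filepath resolved relative to the "path" set in Extra:

from airflow.sensors.filesystem import FileSensor

wait_for_file = FileSensor(
    task_id="wait_for_file",
    fs_conn_id="fs_default",       # the connection defined above
    filepath="my_input_file.csv",  # resolved relative to the "path" in Extra
    poke_interval=30,
)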
I have the following problem. I have a data pipeline at work that transforms raw data and loads it into a cloud database, for various projects. There are Python scripts for the project-based transformations, but everything must be done manually (defining the transformer's project-based inputs, running the transformer, loading the data).
I want to automate this process with Airflow. I created the above steps as tasks in Python. The Airflow instance is running on some computer, which must reach a network drive, where the raw data and the transformer scripts are located. The required connection type is Samba.
I managed to connect to the drive and create a SambaHook object:
from typing import Final
from airflow.providers.samba.hooks.samba import SambaHook

samba_file_share: Final[object] = SambaHook(connection_id, file_share_name)
In one task, I need to call and run the transformer script. In a former solution (without Samba) I used Popen, which worked fine. However, I must use Samba now, and I face the following problem.
I get the path of the transformer script by reading the root folder of the file share from the Samba object and joining the transformer's path to it:
samba_file_share._join_path(transformer_path)
If I print this out, the path is correct and the network share is available. If I feed it as a string to Popen (or as a byte string or a path-like object) I get the error "No such file or directory".
Can anyone help with this? How can I feed it to Popen to run the script, or should I use something other than Popen to run it? The Samba documentation is very incomplete; I could not find anything there so far.
Thanks,
Marci
This automated Airflow solution works perfectly if I connect from a machine that can directly access the network drive.
However, that is only for development; in production it must run on some other machine which has no direct access to the drive. I must use Samba to connect to it, and that breaks everything.
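A note on why Popen fails here: the string returned by _join_path is a remote SMB location (of the form //server/share/...), and Popen can only execute files that exist on the local filesystem of the machine running the task. One possible workaround, purely a sketch (it uses the provider hook's open_file wrapper around smbclient, and the connection id, share name and script path below are hypothetical placeholders), is to copy the script locally and run the copy:

import subprocess
import tempfile

from airflow.providers.samba.hooks.samba import SambaHook

connection_id = "samba_default"            # placeholder connection id
file_share_name = "projects"               # placeholder share name
transformer_path = "transformers/run.py"   # placeholder path on the share

hook = SambaHook(connection_id, file_share_name)

# Read the script from the share and write it to a local temporary file.
with hook.open_file(transformer_path, mode="rb") as remote:
    script_bytes = remote.read()
with tempfile.NamedTemporaryFile(suffix=".py", delete=False) as local:
    local.write(script_bytes)
    local_script = local.name

# Run the local copy; Popen/subprocess only work with local paths.
subprocess.run(["python", local_script], check=True)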
The accepted answer to this question states that
"...the gs://my-bucket/dags folder is available in the scheduler, web server, and workers at /home/airflow/gcs/dags."
(which is supported by the newer docs)
So I wrote a bash operator like this:
t1 = bash.BashOperator(
    task_id='my_test',
    bash_command="touch /home/airflow/gcs/data/test.txt",
)
I thought that by prefacing my file creation with the path specified in the answer, it would write to the data folder in my Cloud Composer environment's associated storage bucket. Similarly, touch test.txt also ran successfully but didn't actually create a file anywhere I can see it (I assume it's written to the worker's temp storage, which is then deleted when the worker is shut down after the DAG finishes). I can't seem to persist any data from simple commands run through a DAG. Is it even possible to simply write out some files from a bash script running in Cloud Composer? Thank you in advance.
Bizarrely, I needed to add a space at the end of the string containing the Bash command.
t1 = bash.BashOperator(
    task_id='my_test',
    bash_command="touch /home/airflow/gcs/data/test.txt ",
)
The frustrating thing was that the error said the path didn't exist, so I went down a rabbit hole mapping the directories of the Airflow worker until I was absolutely certain it did; then I found a similar issue here. Oddly, I didn't get the 'Jinja Template not Found Error' I should have gotten according to this note.
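As a side note, one way to sanity-check the mapping without touching the Bash templating quirk at all is a small Python task that writes through the same mount (a sketch; the task id and file name are arbitrary):

from airflow.operators.python import PythonOperator

def write_marker():
    # /home/airflow/gcs/data is the worker-side mount of the environment
    # bucket's data/ folder, so the file should persist there after the run.
    with open("/home/airflow/gcs/data/test_from_python.txt", "w") as fh:
        fh.write("hello from the worker\n")

t2 = PythonOperator(task_id="write_marker", python_callable=write_marker)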
I wrote a daemon using Skycoder42/QtService.
It works when run from Qt Creator, but on the server I get the error below:
qtservice: No backend found for the name "standard"
I can't find anything about this in the project's documentation.
What is the backend? How can I install the service and start it?
Finally, I found the solution:
We must copy the path/to/qt/gcc_64/plugins/servicebackends directory next to the executable.
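In other words, the deployed layout should look roughly like this (the executable name is just an example; the servicebackends folder is copied verbatim and is what provides the "standard" backend the error message complains about):

my_service            (the QtService executable)
servicebackends/      (copied from path/to/qt/gcc_64/plugins/servicebackends)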
I'm trying to evaluate CoreOS. It really looks like an interesting product, and I wanted to start by simply getting networking up. I got a static configuration to work by doing the following:
Create a static network file in the /etc/systemd/network/ folder.
It is my understanding that the important parts of the file name I drop into this directory are the number at the beginning (when there are multiple network files, it determines the order in which they are applied) and the ".network" suffix, which declares that this is a network configuration file.
The contents of /etc/systemd/network/10-static.network are as follows (yes, this is a very simple configuration):
[Network]
Address=192.168.1.102/24
Gateway=192.168.1.2
I then tried starting the service: sudo systemctl start systemd-networkd
This actually worked and assigned a static IP address that was visible when running ifconfig.
Here is my problem. I rebooted the CoreOS virtual machine and noticed that the networking was no longer configured after the reboot. When I check the /etc/systemd/network/ folder, it is empty; my configuration file apparently disappeared on reboot.
Does anyone know why this would have happened?
Thanks in advance for any help on this!
You must remove the ISO image; CoreOS may be booting from the same ISO image again. If you remove the ISO image, the system can boot from the newly installed system.
I experienced the same situation before.
Files on disk shouldn't disappear on you like that. Did you happen to PXE-boot this VM or somehow use a file system in RAM?
A better way to do this config is with cloud-config, which CoreOS uses to configure machines at boot. It's intended to provide a repeatable way to set up networking, mount disks and things like that. The steps that you completed manually can be done with cloud-config like this: https://coreos.com/docs/cluster-management/setup/network-config-with-networkd/
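A rough sketch of what that guide describes, reusing the addresses from the question and assuming the interface is named eth0 (the unit name is illustrative):

#cloud-config

coreos:
  units:
    - name: 00-eth0.network
      runtime: true
      content: |
        [Match]
        Name=eth0

        [Network]
        Address=192.168.1.102/24
        Gateway=192.168.1.2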
More info about cloud-config in general: https://coreos.com/docs/cluster-management/setup/cloudinit-cloud-config/
I just want to transfer a file from an FTP server to a Unix folder; this is straightforward.
If the file doesn't exist on the FTP server yet, then the script needs to keep running until it finds the file. Please let me know how I can get that file.
Please remember the script has to run on the FTP server.
Thanks
CK
I'd mount the FTP server with curlftpfs (http://curlftpfs.sourceforge.net) and then use it as if it were a local file system, for example with find(1).
You need to write a program to automate your FTP session. You can either write your own custom FTP client, which is not that hard if you know a few things about network programming, or write a script to automate a session for an existing client. For the latter approach, I suggest using Expect if you are proficient with Tcl, or Pexpect if you prefer Python. Expect is a tool designed to automate interactive tasks like downloading a file with FTP.
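Another option, if Python is available, is the standard library's ftplib, which avoids automating an interactive client altogether. A minimal sketch, with a simple polling loop standing in for "run until it finds the file"; the host, credentials and paths are hypothetical placeholders:

import time
from ftplib import FTP, error_perm

HOST = "ftp.example.com"                 # placeholder server
USER = "anonymous"                       # placeholder credentials
PASSWORD = ""
REMOTE_FILE = "outgoing/report.csv"      # placeholder remote path
LOCAL_FILE = "/data/inbox/report.csv"    # placeholder Unix target
POLL_SECONDS = 60


def remote_file_exists(ftp, path):
    try:
        ftp.size(path)   # most servers answer SIZE; a 550 reply means "not there yet"
        return True
    except error_perm:
        return False


def fetch_when_available():
    while True:
        with FTP(HOST) as ftp:
            ftp.login(USER, PASSWORD)
            ftp.voidcmd("TYPE I")        # binary mode (SIZE also needs it on some servers)
            if remote_file_exists(ftp, REMOTE_FILE):
                with open(LOCAL_FILE, "wb") as fh:
                    ftp.retrbinary(f"RETR {REMOTE_FILE}", fh.write)
                return
        time.sleep(POLL_SECONDS)


if __name__ == "__main__":
    fetch_when_available()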