I am trying to read the contents of a simple text file residing in a GCS bucket using an Airflow DAG. I have to check first whether the file exists and, if so, read the contents. Basically the flow will branch from if_exit_or_not into yes_exists and no_exists. Upon yes_exists, read the contents of the file and use them further in the data flow. Any help is appreciated. Thanks
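In case it helps, here is a minimal sketch of that branching pattern, assuming the Google provider package is installed; the bucket and object names are placeholders, and the task ids follow the names in the question:

from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from airflow.operators.python import BranchPythonOperator, PythonOperator
from airflow.providers.google.cloud.hooks.gcs import GCSHook

BUCKET = "my-bucket"          # hypothetical bucket name
OBJECT = "path/to/file.txt"   # hypothetical object name


def _check_file_exists():
    # Return the task_id to follow: yes_exists when the object is present.
    hook = GCSHook()
    return "yes_exists" if hook.exists(BUCKET, OBJECT) else "no_exists"


def _read_file():
    # Download the object into memory; the return value is pushed to XCom
    # so downstream tasks can pick the contents up.
    hook = GCSHook()
    return hook.download(bucket_name=BUCKET, object_name=OBJECT).decode("utf-8")


with DAG(
    dag_id="read_gcs_file_if_exists",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    if_exit_or_not = BranchPythonOperator(
        task_id="if_exit_or_not",
        python_callable=_check_file_exists,
    )
    yes_exists = PythonOperator(
        task_id="yes_exists",
        python_callable=_read_file,
    )
    no_exists = EmptyOperator(task_id="no_exists")

    if_exit_or_not >> [yes_exists, no_exists]

On older Airflow versions you may need DummyOperator instead of EmptyOperator; GCSObjectExistenceSensor is another option if you only need to wait for the file rather than branch on it.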
Related
I am trying to create a replica of my main database for the sake of testing. However, I am having a hard time figuring out how to do that.
What I have tried is exporting the entire main-database into a bucket. Then I downloaded the 2022-10-24T16-etc.overall_export_metadata file from that bucket and uploaded it to a bucket for test-database. However, when I try to import that file, I get an error:
Google Cloud Storage file does not exist: /database-copy/database-copy.overall_export_metadata
I'm a little confused as to why it's looking for /database-copy/database-copy.overall_export_metadata when the file I'm trying to upload looks more like /database-copy/2022-10-24T16-etc.overall_export_metadata.
Any help would be appreciated. Thanks!
I just found a document that explains how to do this:
https://cloud.google.com/firestore/docs/manage-data/move-data
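For reference, a minimal sketch of running such an import programmatically, assuming the google-cloud-firestore client library is installed; the project id and the bucket/prefix are placeholders, and the prefix must point at the export folder exactly as the export job wrote it, with the metadata file keeping its original name:

from google.cloud import firestore_admin_v1

client = firestore_admin_v1.FirestoreAdminClient()
database = client.database_path("my-project", "(default)")  # hypothetical project id

operation = client.import_documents(
    request={
        "name": database,
        # Point at the export folder produced by the export job, not at a
        # renamed or relocated *.overall_export_metadata file.
        "input_uri_prefix": "gs://test-database-bucket/EXPORT_PREFIX",
    }
)
operation.result()  # wait for the long-running import to finish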
I didn't find an answer for that here, so I thought someone could help:
I'm receiving a CSV file from a GET request.
I want to upload it to S3 (and then continue the pipeline...).
I'm using Airflow on the managed MWAA platform.
When uploading to S3, the script requires a file path for the CSV file.
How can I pass a file path when the DAG runs on the MWAA platform? Is the file even stored anywhere?
Do I need a middleman to store it in between?
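One option is to skip the local file path entirely and upload the response body from memory. A minimal sketch, assuming the Amazon provider package is installed; the URL, bucket, key, and connection id are placeholders:

import requests

from airflow.decorators import task
from airflow.providers.amazon.aws.hooks.s3 import S3Hook


@task
def fetch_and_upload_csv():
    # Fetch the CSV over HTTP and keep it in memory; no local file path needed.
    response = requests.get("https://example.com/report.csv", timeout=60)
    response.raise_for_status()

    # Write the CSV body straight to S3 from memory.
    hook = S3Hook(aws_conn_id="aws_default")
    hook.load_string(
        string_data=response.text,
        key="incoming/report.csv",
        bucket_name="my-bucket",
        replace=True,
    )

If you do need an actual file on disk, the worker's local temporary storage is available while the task runs, but it is ephemeral, so the in-memory approach avoids relying on it.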
The accepted answer to this question states that
"...the gs://my-bucket/dags folder is available in the scheduler, web server, and workers at /home/airflow/gcs/dags."
(which is supported by the newer docs)
So I wrote a bash operator like this:
t1 = bash.BashOperator(
    task_id='my_test',
    bash_command="touch /home/airflow/gcs/data/test.txt",
)
I thought that by prefixing my file creation with the path specified in the answer, it would write to the data folder in my Cloud Composer environment's associated storage bucket. Similarly, touch test.txt also ran successfully but didn't actually create a file anywhere I can see it (I assume it's written to the worker's temporary storage, which is then deleted when the worker is shut down after the DAG finishes). I can't seem to persist any data from simple commands run through a DAG. Is it even possible to write out some files from a bash script running in Cloud Composer? Thank you in advance.
Bizarrely, I needed to add a space at the end of the string containing the Bash command.
t1 = bash.BashOperator(
    task_id='my_test',
    bash_command="touch /home/airflow/gcs/data/test.txt ",
)
The frustrating thing was that the error said the path didn't exist, so I went down a rabbit hole mapping the directories of the Airflow worker until I was absolutely certain it did exist - then I found a similar issue here. Although I didn't get the 'Jinja template not found' error I should have got according to this note.
I have a DAG that downloads a file from Cloud Storage and saves it to the following path: /home/airflow/gcs/data/FILENAME.txt
This file then appears in the Cloud Composer storage bucket under /data.
However, when I originally wrote the DAG I didn't specify the download location as /home/airflow/gcs/data/ and simply had it download the file without a path. I would like to go delete those files, but I don't know where to find them.
Where do downloaded files in Cloud Composer reside when you don't specify the folder?
It looks like you don't need to worry about cleanup from when you first wrote the DAG - if you're using the gcs_download_operator, then according to its source code, if you did not specify a value for the filename parameter, the downloaded file won't be stored on the local file system.
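For completeness, a minimal sketch of pinning the download location explicitly, assuming the current Google provider (where the old gcs_download_operator is exposed as GCSToLocalFilesystemOperator); the bucket and object names are placeholders:

from airflow.providers.google.cloud.transfers.gcs_to_local import (
    GCSToLocalFilesystemOperator,
)

download_file = GCSToLocalFilesystemOperator(
    task_id="download_file",
    bucket="my-bucket",
    object_name="FILENAME.txt",
    # Writing under /home/airflow/gcs/data/ makes the file appear in the
    # Composer environment bucket's /data folder; if filename is omitted,
    # nothing is written to the local file system.
    filename="/home/airflow/gcs/data/FILENAME.txt",
)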
I want to set up a File Watcher job to monitor a file shared in an Active Directory environment. The filename is always the same and does not contain the date/time, and the file stays in its location until replaced, as others might use the file.
How can I create a File Watcher job to look for a file less than 24 hours old?
AutoSys Automation AE - Release:11.4.6.20180302-b425
There is no easy way of doing that. I would suggest, if you know when the file is created, having the FT start a little after the creation, or deleting the file after it is processed.
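One workaround is to add an age check around the watcher: a small script run by a command job (before or after the watcher fires) that only succeeds when the file was modified in the last 24 hours, so stale copies are skipped. A minimal Python sketch, assuming Python is available on the agent; the UNC path is a placeholder:

import os
import sys
import time

WATCHED_FILE = r"\\share\path\to\file.dat"  # hypothetical UNC path
MAX_AGE_SECONDS = 24 * 60 * 60


def is_fresh(path: str) -> bool:
    """Return True if the file exists and was modified within the last 24 hours."""
    try:
        age = time.time() - os.path.getmtime(path)
    except OSError:
        return False
    return age < MAX_AGE_SECONDS


if __name__ == "__main__":
    # Exit 0 when the file is fresh so the downstream job runs, 1 otherwise.
    sys.exit(0 if is_fresh(WATCHED_FILE) else 1)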