Airflow cannot open .cache file - airflow

I am trying to automate a flow that calls the Spotify API. I am nearly there, but I am held up at one point.
I am using Airflow (an ETL tool) to run my Python scripts.
I am able to connect using the spotipy module, but it cannot read or write the .cache-'username' file that it needs. I am running on a Debian Google Cloud Compute instance.
The error is
Couldn't write token to cache at: .cache-'username'
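For what it's worth, spotipy builds the default cache path as a relative ".cache-<username>" file, so it is resolved against the task's current working directory, which the Airflow task user often cannot write to. Below is a minimal sketch of pinning an absolute, writable cache path instead, assuming a spotipy version that ships CacheFileHandler (the credentials, scope, and paths are placeholders):

import spotipy
from spotipy.oauth2 import SpotifyOAuth
from spotipy.cache_handler import CacheFileHandler

# Pin the token cache to an absolute path the Airflow task user can
# write to, instead of the default relative ".cache-<username>" file.
cache_handler = CacheFileHandler(cache_path="/home/airflow/.cache-myuser")

sp = spotipy.Spotify(
    auth_manager=SpotifyOAuth(
        client_id="...",  # placeholder credentials
        client_secret="...",
        redirect_uri="http://localhost:8888/callback",
        scope="user-read-recently-played",
        cache_handler=cache_handler,
    )
)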

Related

How to run Python files with Airflow using Samba connection

I have the following problem. I have a data pipeline at work that transforms raw data and loads it to a cloud database for various projects. There are Python scripts for the project-based transformations, but everything must be done manually (defining the transformer's project-based inputs, running the transformer, loading the data).
I want to automate this process with Airflow. I created the above steps as tasks in Python. The Airflow instance is running on a computer that must reach a network drive where the raw data and the transformer scripts are located. The required connection type is Samba.
I managed to connect to the drive and create a SambaHook object:
samba_file_share: Final[object] = SambaHook(connection_id, file_share_name)
In one task, I need to call and run the transformer script. With a former solution (without Samba) I used Popen, which worked fine. However, I must use Samba now, and I face the following problem.
I get the path of the transformer script by reading the root folder of the file share from the Samba object and joining the transformer's path to it:
samba_file_share._join_path(transformer_path)
If I print this out, the path is correct, and the network is available. But if I feed it to Popen as a string (or as a byte string or a path-like object), I get the error "No such file or directory".
Can anyone help with this? How can I feed it to Popen to run the script, or should I use something other than Popen? The Samba documentation is very incomplete; I could not find anything there so far.
Thanks,
Marci
This automated Airflow solution works perfectly if I connect from a machine that can easily access the network drive.
However, that is only for development; in production it must run on another machine that has no direct access to the drive. I must use Samba to connect to it, and that breaks everything.
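Popen spawns the process on the local machine, so it can only execute files that exist on the local filesystem; the joined path points into the SMB share, which is not mounted locally, hence "No such file or directory". One workaround is to copy the script to a local temporary file through the hook and run that. A sketch, assuming the provider's open_file wrapper around smbclient (the connection id, share, and script path are placeholders):

import subprocess
import sys
import tempfile
from airflow.providers.samba.hooks.samba import SambaHook

hook = SambaHook("my_samba_conn", share="my_share")  # placeholder ids

# Popen only sees the local filesystem, so pull the script off the share first.
with hook.open_file("transformers/transform.py", mode="rb") as remote:
    script_bytes = remote.read()

with tempfile.NamedTemporaryFile(suffix=".py", delete=False) as local:
    local.write(script_bytes)
    local_path = local.name

# Run the local copy with the same interpreter that runs Airflow.
result = subprocess.run([sys.executable, local_path], capture_output=True, text=True)
print(result.stdout)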

How to execute Python Script through Informatica Cloud

I have a Python script that I need to execute and automate via IICS. The output of the script is a CSV file, and this output should be loaded to the Target. How can I achieve this via Informatica Cloud? Please help with some info and documentation on this.
Thanks
There are two ways to do this.
You can create an executable (using py2exe or a similar tool) from your .py script, put that file on the Informatica Cloud agent server, and call it using a shell command. Note that this way you do not need to install Python or any packages on the server.
You can also put the .py file on the agent server and run it from the shell, like $PYTHON_HOME/python your_script.py. You need to make sure the Python version is compatible and that all required packages are installed on the agent server.
You can refer to the screenshot below for how to set up the shell command. You can then run it as part of a workflow and schedule it if needed.
https://i.stack.imgur.com/wnDOV.png
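For illustration, here is a rough sketch of the kind of .py script such a shell command would invoke (the output path is hypothetical); the key point is writing the CSV somewhere the secure agent can read it, so a mapping can then load the file to the target:

import csv

OUTPUT_PATH = "/opt/infaagent/output/daily_extract.csv"  # placeholder path

rows = [
    {"id": 1, "value": "foo"},
    {"id": 2, "value": "bar"},
]

# Write the CSV the mapping will pick up as its source.
with open(OUTPUT_PATH, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "value"])
    writer.writeheader()
    writer.writerows(rows)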

Amazon Web Services - how to run a script daily

I have an R script that I run every day to scrape data from a couple of different websites and write the scraped data to a couple of different CSV files. Each day, at a specific time (that changes daily), I open RStudio, open the file, and run the script. I check that it runs correctly each time, and then I save the output to a CSV file. It is often a pain to have to do this every day (it takes ~10-15 minutes). I would love it if I could somehow have this script run automatically at a pre-defined time, and a buddy of mine said AWS is capable of doing this?
Is this true? If so, what is the specific feature / aspect of AWS that is able to do this, this way I can look more into it?
Thanks!
Two options come to mind here:
Host an EC2 instance with R on it and configure a cron job to execute your R script regularly.
One easy way to get started: Use this AMI.
To execute the script, R offers a CLI, Rscript. See e.g. here for how to set this up.
Go serverless: AWS Lambda is a hosted microservice. Currently R is not natively supported, but the official AWS blog here offers a step-by-step guide on how to run R. Basically, you execute R from Python using the rpy2 package.
Once you have this set up, schedule the function via CloudWatch Events (roughly a hosted cron job). Here you can find a step-by-step guide on how to do that.
One more thing: you say your function outputs CSV files. To save them properly you will need to put them in a file store like AWS S3. You can do this in R via the aws.s3 package. Another option is to use the AWS SDK for Python (boto3), which is preinstalled in the Lambda environment: you could, for example, write a CSV file to the /tmp/ directory and, once the R script is done, move the file to S3 via boto3's upload_file function.
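To make that concrete, here is a rough sketch of such a Lambda handler (the bucket, key, and R script names are placeholders, and it assumes R and rpy2 are packaged with the function as the AWS blog post describes):

import boto3
import rpy2.robjects as robjects

s3 = boto3.client("s3")

def handler(event, context):
    # The R script is expected to write its output to /tmp/output.csv,
    # the only writable directory in the Lambda sandbox.
    robjects.r.source("scrape.R")
    s3.upload_file("/tmp/output.csv", "my-results-bucket", "output.csv")
    return {"status": "ok"}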
IMHO the first option is easier to set up, but the second one is more robust.
It's a bit counterintuitive, but you'd use CloudWatch with an event rule to run periodically. It can run a Lambda or send a message to an SNS topic or SQS queue. The challenge you'll have is that Lambda doesn't support R, so you'd either have to have the Lambda kick off something else or have something listening on the SNS topic or SQS queue run the script for you. It isn't a perfect solution, as there are potentially quite a few moving parts.
@stdunbar is right about using CloudWatch Events to trigger a Lambda function. You can set the frequency of the trigger or use a cron expression. But as he mentioned, Lambda does not natively support R.
This may help you to use R with Lambda: R Statistics ready to run in AWS Lambda and x86_64 Linux VMs
If you are running Windows, one of the easier solutions is to write a .BAT script to run your R script and then use Windows Task Scheduler to run it as desired.
To call your R script from your batch file, use the following syntax:
"C:\Program Files\R\R-3.2.4\bin\Rscript.exe" C:\rscripts\hello.R
Just verify that the path to the Rscript executable and the path to your R code are correct.
Dockerize your script (write a Dockerfile, build an image)
Push the image to AWS ECR
Create an AWS ECS cluster and an ECS task definition within the cluster that will run the image from ECR every time it is spun up
Use EventBridge to create a time-based trigger that will run the ECS task definition (a rough boto3 sketch of this step follows below)
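A rough boto3 sketch of that last step (every name and ARN below is a placeholder; the console or an infrastructure-as-code tool works just as well):

import boto3

events = boto3.client("events")

# Fire once a day at 09:00 UTC.
events.put_rule(
    Name="daily-r-script",
    ScheduleExpression="cron(0 9 * * ? *)",
    State="ENABLED",
)

# Point the rule at the ECS task definition.
events.put_targets(
    Rule="daily-r-script",
    Targets=[
        {
            "Id": "run-r-task",
            "Arn": "arn:aws:ecs:us-east-1:123456789012:cluster/my-cluster",
            "RoleArn": "arn:aws:iam::123456789012:role/ecsEventsRole",
            "EcsParameters": {
                "TaskDefinitionArn": "arn:aws:ecs:us-east-1:123456789012:task-definition/my-r-task",
                "LaunchType": "FARGATE",
                "NetworkConfiguration": {
                    "awsvpcConfiguration": {
                        "Subnets": ["subnet-0abc1234"],
                        "AssignPublicIp": "ENABLED",
                    }
                },
            },
        }
    ],
)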
I recently gave a seminar walking through this at the Why R? 2022 conference.
You can check out the video here: https://www.youtube.com/watch?v=dgkm0QkWXag
And the GitHub repo here: https://github.com/mrismailt/why-r-2022-serverless-r-in-the-cloud

How to perform a local shell command in OpenStack HOT template file?

The user data section in HOT is for running shell commands on the newly created Nova instance; however, I want to run a shell command locally (on the machine that runs the heat stack-create command) when performing "heat stack-create".
But after reading through the OpenStack HOT docs, I didn't find any syntax to support this.
There is no syntax to support this. Heat templates are processed on the server. There is also no guarantee that the submitting system even has a standard Unix shell (heat templates can be submitted via a web browser, or with tools other than the heat CLI).
If your deployment workflow requires coordinating some local action with the Heat deployment, your best bet is to use another tool (Ansible? A shell script?) to automate the workflow.
An alternative would be to have Heat (or your VMs) signal your local system using some sort of web hook, and have a web service running locally that would receive the signal and perform whatever action is appropriate.
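As a rough illustration of that web hook idea (the port, path, and post-deploy script are arbitrary placeholders), the local service only needs to accept an HTTP request and run the desired command:

import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

class DeployHook(BaseHTTPRequestHandler):
    def do_POST(self):
        # Run whatever local action should follow stack creation,
        # e.g. when an instance's user data curls this endpoint.
        subprocess.run(["./post-deploy.sh"], check=False)  # placeholder script
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok\n")

HTTPServer(("0.0.0.0", 8080), DeployHook).serve_forever()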

Meteor application dies when I close my ssh session with server

Standard procedure to start with
meteor app -p 3000&
This works, except that when I close the SSH connection, the application is no longer running. I have no clue why this is happening.
Awakening Edit:
I use PM2 for traditional Node apps, but if I have to set up process monitors, logs, and the database all by myself, I could just go back to React and socket.io and rock it with Node.
Consider using the mup package from Arunoda to easily deploy and run your app in production.
You could launch meteor with nohup (no hang-up), which serves this purpose.
nohup meteor --production &
But it's not a good idea to run a site in production with meteor anyway.
What should I do to run meteor forever?
You can use forever, a Node.js tool designed to run Node apps as services.
I also want to point out that forever is getting old, and I've heard of better, more recent alternatives, but it still seems to be a pretty common tool. You could also use systemd, which integrates better with the UNIX service ecosystem, but that's another story.
But first, you'll have to "demeteorize" your meteor application like this:
cd my-project-meteor
meteor bundle --directory ../my-project-node
this is going to take some time
cd ../my-project-node/programs/server
npm install
this is going to take some time too
So now you have a plain Node app that you can run with node main.js.
Let me mention that it might be a good idea to use the Node version used by Meteor, which is 0.10.29 as of Meteor 0.9.1. You can install it traditionally, or you can use the Node version that is shipped with the meteor tool.
sudo ln -s ~/.meteor/packages/meteor-tool/1.0.27/meteor-tool-os.linux.x86_64/dev_bundle/bin/node /usr/bin/node
sudo ln -s ~/.meteor/packages/meteor-tool/1.0.27/meteor-tool-os.linux.x86_64/dev_bundle/bin/npm /usr/bin/npm
Note that this way of "installing" node + npm on your system is problematic because:
it assumes you're doing only Meteor-related stuff.
it is dependent on the release process of the meteor tool (you'll need to rerun these commands if the meteor tool is updated).
You can install the forever tool using npm:
-g means global: it gives all users on the system access to forever
sudo npm install -g forever
To launch your Node app as a service, you can use the following command, which sets some environment variables and runs the app using forever:
sudo PORT=80 MONGO_URL=mongodb://localhost/my-project-mongodb ROOT_URL=http://localhost forever start my-project-node/main.js
You can stop it using forever stop my-project-node/main.js (and list running apps with forever list).
Also, what's the point of using a third-party database service like https://mongolab.com/?
When you use the meteor tool, it launches a mongod process automatically for you, and the underlying Node process that meteor runs for your app connects to this Mongo instance.
When we want to launch our Meteor app as a plain Node app, we have to handle the Mongo side ourselves, which kind of answers the question: why not use another service to handle it for us? They know better, right?
Doesn't it slow down the website, because now the application has to connect to their database instead of a local one?
Of course, relying on a third-party database service has its drawbacks, and this is one of them. Network communication will always be slower than inter-process communication on localhost (this is especially true on the cheap SSD-backed VPSes you can find nowadays).
And how exactly do I connect to mongolab, for example?
By setting an appropriate value for the MONGO_URL environment variable: the database service provider will give you a URL that corresponds to your hosted MongoDB, and this is what you need to pass to the Node process on the command line if you want your app to connect to the remote database and work as usual.
If you want to launch a dedicated local mongod instance and let your application connect to it, well, this is another topic, but you'll have to follow these steps:
First, install MongoDB correctly on your server, using the reference documentation for your OS version. By correctly, I mean choose the same version Meteor currently uses (2.4.9) and run it as a service so that it will actually restart when your server reboots.
Test that mongod is running by launching a client with the mongo command.
Pass the correct MONGO_URL when launching your app with forever (something like mongodb://localhost/my-project-mongodb).
Understand now why meteor deploy is amazing :D
Answer copied from here.
