I have a Google AI Platform Notebook that I would like to run from some kind of command-line tool.
The end goal is to schedule this notebook so that it runs at a specified time every day or week, fully scripted.
I think I first need to start a VM instance and then run the notebook on that instance. I know that GCP has a lot of cloud products that should facilitate this (Scheduler, Pub/Sub, Compute, Functions), but I'm not ready to "marry" Google at this point.
Br, Torben
You can use papermill.
papermill is a tool for parameterizing, executing, and analyzing Jupyter Notebooks.
Then you can write a script.sh on your VM (Linux):
papermill local/input.ipynb gs://bkt/output.ipynb -p alpha 0.6 -p l1_ratio 0.1
Create a cron job to run your script.sh, for example every day at 10:10 AM:
crontab -e
10 10 * * * /path/to/script.sh
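A slightly fuller script.sh might look like the sketch below. It is only an illustration: the notebook path, bucket name, and parameter values are the placeholders from the command above, and the date-stamped output name is an assumption added here so that each run keeps its own copy of the executed notebook. It also assumes papermill is installed with its Google Cloud Storage support, since the output goes to a gs:// path.

```shell
#!/bin/bash
# Sketch of script.sh: run a notebook with papermill, one output file per day.
# INPUT_NB and OUTPUT_PREFIX are placeholders; adjust them to your own paths.
INPUT_NB="${INPUT_NB:-local/input.ipynb}"
OUTPUT_PREFIX="${OUTPUT_PREFIX:-gs://bkt/output}"

# Build an output name such as gs://bkt/output-2021-03-01.ipynb
output_path() {
    printf '%s-%s.ipynb\n' "$OUTPUT_PREFIX" "$1"
}

# Execute the notebook; each -p passes one parameter name and value.
run_notebook() {
    run_date=$(date +%Y-%m-%d)
    papermill "$INPUT_NB" "$(output_path "$run_date")" -p alpha 0.6 -p l1_ratio 0.1
}

# In the real script.sh, the last line would be a call to: run_notebook
```

Because cron runs jobs with a minimal PATH, it may be necessary to call papermill by its full path or to set PATH at the top of the script.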
I would like to run an R script on a Linux server (CentOS) in an automated way. This should be done once a day (if possible several times a day). I would like to download stock prices using R (and later enter them into a database).
For example, the R script looks like this:
library(tidyquant)
library(lubridate)
data<-tq_get("AAPL", from="2021-01-01", to=today())
How should I write a job so that I can run the script automatically within a certain interval?
Can anyone help me?
Many thanks in advance!
You may want to create a service. Which type of service (systemd or an init daemon) depends on the CentOS version.
Full information on timed services and how they work is here.
A simple tutorial on how to create services is here.
This lets you create a service with the desired conditions and run your application/script.
Service example:
Services are located in /etc/systemd/system/
For example, create the file from the command line: sudo touch /etc/systemd/system/updatestockdb.service
Then open the file and write your service: sudo vim /etc/systemd/system/updatestockdb.service
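# /etc/systemd/system/updatestockdb.service
[Unit]
Description=Update stock price DB

[Service]
Type=simple
ExecStart=/opt/scripts/fetch_Stonks.sh --full --to-external
Restart=on-failure
PIDFile=/tmp/yourservice.pid

# /etc/systemd/system/updatestockdb.timer (timer settings go in their own unit file)
[Unit]
Description=Run updatestockdb.service once a day

[Timer]
OnCalendar=daily
AccuracySec=12h
Persistent=true

[Install]
WantedBy=timers.target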
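Once the unit files exist, systemd has to reload them and the timer has to be enabled. A minimal sketch, assuming the [Timer] settings are stored in a unit named updatestockdb.timer next to updatestockdb.service (systemd reads timer settings from .timer units) and that the commands run as root. The function names are made up for illustration.

```shell
#!/bin/bash
# Reload unit files, then enable and start the timer for the given unit name.
# Run as root (or prefix the systemctl calls with sudo).
enable_timer() {
    unit="$1"
    systemctl daemon-reload &&
    systemctl enable "$unit.timer" &&
    systemctl start "$unit.timer"
}

# Show when the timer will fire next.
show_timer() {
    systemctl list-timers "$1.timer"
}

# Example: enable_timer updatestockdb && show_timer updatestockdb
```

Enabling and starting are kept as two separate calls so the sketch does not depend on newer systemctl conveniences.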
I am using the Google AI platform which provides jupyterlab notebooks. I have 2 notebook instances set up to run R of which only one notebook now opens. The first notebook will not open regardless of the number of stops and resets I performed. The notebook overview can be seen in this image and circled is a difference (it is 'started'):
The only reason I can imagine for this difficulty is that I changed the machine type for the notebook, decreasing the number of CPUs from 4 to 2 and the RAM from 15 to 7.5. Now I cannot open it, and the environment field is blank where it should say R3.6. I would not mind deleting it and starting over if it did not hold work that is not backed up.
What can be done to bring the notebook back to operation and if it cannot be done, how can I download it or extract some key files?
As was mentioned in the comments, there are two ways to inspect the Notebook instance using the Cloud Console:
GCP Console > AI Platform > Notebooks
GCP Console > Compute Engine > VM Instances. The name of the GCE VM Instance will be the same as the Notebook Instance name.
It looks like you were able to connect to your Notebook instance via the SSH button. You can also use the gcloud command to connect via SSH to instances that you have permission to access:
gcloud compute ssh --project <PROJECT> --zone <ZONE> <INSTANCE>
After you connect, use the terminal to run commands to verify the status of your jupyter service and the service logs by running:
sudo service jupyter status
sudo journalctl -u jupyter.service --no-pager
You can restart the jupyter service to try to recover it:
sudo service jupyter restart
If you want to use other methods or third-party tools to create an SSH connection to your Notebook instance, you can follow this.
If you are not able to recover your jupyter service, you can copy your files off the VM by clicking the gear icon in the upper right of the SSH-from-the-browser window and selecting Download file.
As mentioned before, the gsutil cp command allows you to copy data between your local file system and the cloud, within the cloud, and between cloud storage providers. For example, to upload all files from a local directory to a bucket, you can run:
gsutil cp -r dir gs://my-bucket
Use the -r option to copy an entire directory tree
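If the goal is to rescue work before deleting the instance, the copy can be scripted from the SSH session. A rough sketch: the bucket name and the source directory are placeholders, and the dated folder name is an assumption made here to keep separate backups apart.

```shell
#!/bin/bash
# Copy a directory tree from the notebook VM to a Cloud Storage bucket.
# Usage: backup_dir <source-directory> <bucket-name>
backup_dir() {
    src="$1"
    bucket="$2"
    stamp=$(date +%Y%m%d)
    # -r copies the whole directory tree
    gsutil cp -r "$src" "gs://$bucket/backup-$stamp/"
}

# Example (placeholder names): backup_dir /home/jupyter my-bucket
```

The VM's service account needs write access to the bucket for the copy to succeed.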
I have a meteor.js application I've created to run locally on OSX on my laptop. Every so often, meteor stops running for unexplained reasons (like after the computer returns from sleep).
I'd like to schedule a process on cron to check every minute if meteor is working, and if it's not, launch it. I placed the following in my crontab:
* * * * * ps aux | grep meteor | grep -v grep || cd ~/path_to_my_meteor_project/; nohup meteor &
This command works at launching meteor when I enter it manually in terminal. But when scheduled in cron, it does not seem to do anything.
This is likely because cron does not run your command the way your terminal does. Cron passes the line to /bin/sh with a minimal environment (a short PATH, and without your profile), so a command that works when typed into an interactive bash session can fail silently.
In order to fix this, create a bash script containing the command that works in your terminal, then ask cron to run that script every minute.
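Such a script could look like the following sketch. The project path is the placeholder from the question, the log file name is made up for illustration, and pgrep is swapped in for the ps | grep chain. Since cron runs jobs with a minimal PATH, the directory that contains the meteor binary may also need to be added to PATH at the top of the script.

```shell
#!/bin/bash
# Start a command only if no process matching the pattern is running.

# True when some running process has a command line matching the pattern.
is_running() {
    pgrep -f "$1" > /dev/null 2>&1
}

# Usage: ensure_running <pattern> <project-dir> <command> [args...]
ensure_running() {
    pattern="$1"
    project_dir="$2"
    shift 2
    if is_running "$pattern"; then
        echo "already running"
        return 0
    fi
    cd "$project_dir" || return 1
    echo "starting"
    # Detach from cron and send output to a log file (hypothetical name)
    nohup "$@" > "$project_dir/meteor-cron.log" 2>&1 &
}

# Example: ensure_running meteor "$HOME/path_to_my_meteor_project" meteor
```

Because pgrep -f matches against full command lines, the script file itself should not have the pattern in its name, or the check would find the script rather than meteor.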
I am trying to schedule my R script using cron, but it is not working. It seems R cannot find packages when run from cron. Can anyone help me? Thanks.
The following is my bash script
# source my profile
. /home/winie/.profile
# script.R will load packages
R CMD BATCH /home/script.R
Consider these tips
Use Rscript (or littler) rather than R CMD BATCH
Make sure the cron job is running as you
Make sure the script runs by itself
Test it a few times in verbose mode
My box is running the somewhat visible CRANberries via a cron job calling an R script (which I execute via littler, but Rscript should work just as well). For this, the entry in /etc/crontab on my Ubuntu server is
# every few hours, run cranberries
16 */3 * * * edd cd /home/edd/cranberries && ./cranberries.r
so at sixteen minutes past every third hour, a shell command is run with my id. It changes into the working directory and calls the R script (which has the executable mode set, etc.).
Looking at this, I could actually just run the script and have a setwd() command in it.
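Putting the tips together, the bash script from the question could be reshaped roughly as below. This is only a sketch: the function name and the log file location are assumptions, and the script path in the example is the one from the question.

```shell
#!/bin/bash
# Run an R script with Rscript and append the output of each run to a log.
# Usage: run_r_script <path/to/script.R> <path/to/logfile>
run_r_script() {
    script="$1"
    log="$2"
    status=0
    # Work from the script's own directory so relative paths resolve
    cd "$(dirname "$script")" || return 1
    {
        echo "=== run started: $(date) ==="
        Rscript --verbose "$(basename "$script")" || status=$?
        echo "=== exit status: $status ==="
    } >> "$log" 2>&1
    return "$status"
}

# Example: run_r_script /home/script.R /home/winie/script.log
```

The log then shows whether a missing package is the actual failure. If the packages live in a personal library, cron may not know about it, and setting R_LIBS_USER in the script is one thing to try.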
I am a novice as far as using cloud computing but I get the concept and am pretty good at following instructions. I'd like to do some simulations on my data and each step takes several minutes. Given the hierarchy in my data, it takes several hours for each set. I'd like to speed this up by running it on Amazon's EC2 cloud.
After reading this, I know how to launch an AMI, connect to it via the shell, and launch R at the command prompt.
What I'd like help on is being able to copy data (.rdata files) and a script and just source it at the R command prompt. Then, once all the results are written to new .rdata files, I'd like to copy them back to my local machine.
How do I do this?
I don't know much about R, but I do similar things with other languages. What I suggest would probably give you some ideas.
Set up an FTP server on your local machine.
Create a "startup script" that you launch with your instance.
Let the startup script download the R files from your local machine, initialize R and do the calculations, then upload the new files to your machine.
Startup script:
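#!/bin/bash
set -e -x
# Install curl plus any other packages you need (R itself, for example)
apt-get update && apt-get install -y curl
# Download the R script from the FTP server on your local machine
wget -O /mnt/data_old.R ftp://yourlocalmachine:21/r_files
# Run the script; R CMD BATCH writes its output to the second file
R CMD BATCH /mnt/data_old.R /mnt/data_new.R
# Upload the result back to your local machine
/usr/bin/curl -T /mnt/data_new.R -u user:pass ftp://yourlocalmachine:21/new_r_files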
Start instance with a startup script
ec2-run-instances --key KEYPAIR --user-data-file my_start_up_script ami-xxxxxx
First, I'd use Amazon S3 for storing the files, both those going from your local machine to the instance and those coming back.
As stated before, you can create startup scripts, or even bundle your own customized AMI with all the needed settings and run your instances from it.
So: download the files from a bucket in S3, execute and process, and finally upload the results back to the same or a different bucket in S3.
Assuming the data is small (how big can scripts be?), S3 would be very effective in terms of cost and usability.
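On the instance, that S3 round trip could be scripted along these lines. It is a sketch under assumptions: it uses the AWS CLI (aws s3 cp), which must be installed and given credentials on the instance, and the function name, bucket layout, and file names are all hypothetical.

```shell
#!/bin/bash
# Download inputs from S3, run the R script, upload the results.
# Usage: s3_round_trip <bucket> <workdir>
s3_round_trip() {
    bucket="$1"
    workdir="$2"
    mkdir -p "$workdir" && cd "$workdir" || return 1
    # Fetch the data and the script (hypothetical object names)
    aws s3 cp "s3://$bucket/input/data.rdata" data.rdata || return 1
    aws s3 cp "s3://$bucket/input/script.R" script.R || return 1
    # Run the simulation; the script is expected to write results.rdata
    Rscript script.R || return 1
    # Send the results back to the bucket
    aws s3 cp results.rdata "s3://$bucket/output/results.rdata"
}

# Example (placeholder names): s3_round_trip my-bucket /mnt/work
```

From the local machine, the same aws s3 cp command can upload the inputs beforehand and fetch the results afterwards.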