Best practice to run `servr::rmdv2()` in background/forever - r

I'm planning to use `servr::rmdv2()` to host some R Markdown generated files on a CentOS 6.3 server. I'm wondering what the suggested best practice is for keeping this running in the background, and preferably restarting it when my server restarts. Some options I'm considering:
1) Run in the background with `Rscript ... &`. Won't restart after a reboot.
2) Run in a screen session. Not sure that this will automatically restart either.
3) Use `nohup Rscript ...` (or `nohup servr` with the servr-provided shell script) and place the command in /etc/rc.d/rc.local so it runs when the system restarts.
Any other options? I'm thinking #3 is the way to go, but I haven't done anything like this before, so I'm not sure what issues I may run into.

I settled on option #3 (`nohup Rscript ...`). To make sure it stays running, I put the command in crontab to run hourly, using flock so that the cron job only restarts the server if it isn't already running.
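For reference, a minimal sketch of that crontab entry; the lock-file path, served directory, port, log file, and the `rmdv2()` arguments are illustrative assumptions, not details from the original setup:

```bash
# crontab -e: check hourly; flock -n exits immediately if another instance
# already holds the lock, so at most one servr process ever runs.
0 * * * * /usr/bin/flock -n /tmp/servr.lock -c 'Rscript -e "servr::rmdv2(dir = \"/srv/rmd\", port = 4321)" >> /var/log/servr.log 2>&1'
```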

Related

Can I use zsh as the default non-interactive shell for WSL2 Ubuntu?

I am trying to use Run/Debug Configurations in WebStorm; however, it doesn't seem to source .zshrc, and it produces errors about not finding commands and environment variables. (An example of this would be `yarn tauri dev` when using Tauri.)
I have installed Ubuntu 20.04 in WSL and the project I opened in WebStorm resides under the $HOME directory. WebStorm is installed in Windows.
For the interactive shell, I have made zsh the default with chsh -s $(which zsh), but Run/Debug Configurations use the default non-interactive shell, which as far as I know is dash. My environment variables and PATH are all set in .zshrc, which dash does not source.
It seems in CLion, it is possible to execute commands in the login shell according to this YouTrack issue, but such an option is not available on WebStorm.
Is it possible to use zsh instead of dash as the default non-interactive shell? If not, it would help me a lot to know what the best practice is in such situations.
There are several questions and points you make:
First, from the question title (and the summary at the end):
Can I use zsh as the default non-interactive shell for WSL2 Ubuntu?
Well, maybe (using symlinks), but it would be a really bad idea. So many built-in scripts rely on /bin/sh pointing to Dash, or at least Bash. While Zsh might be compatible with 99.9% of them, there's a strong likelihood that eventually some difference in Zsh would cause a system-level script to fail (or at least produce results inconsistent with those from Dash).
It is possible in Ubuntu to change the default non-interactive ("system") shell from Dash to Bash with sudo dpkg-reconfigure dash. If you select "No" in the resulting dialog, the system will be updated to point /bin/sh to bash instead of dash.
But not to Zsh, no.
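If you would rather script that change than answer the dialog, a sketch using debconf preseeding (the effect should be the same as choosing "No" interactively):

```bash
# Tell debconf that /bin/sh should not be dash, then apply the change
# without the interactive dialog (Ubuntu/Debian).
echo "dash dash/sh boolean false" | sudo debconf-set-selections
sudo dpkg-reconfigure -f noninteractive dash
```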
when using Run/Debug Configurations it uses the default non-interactive shell, which is dash as far as I know
I don't run WebStorm myself, so I'm not sure about this exactly. Maybe @lena's answer (or another) will cover it for you, but if it doesn't, I'm noticing this doc page. It might be worth trying to specify Zsh in those settings, but again, I can't be sure.
And my environment variables and PATH are all set in .zshrc, which is not sourced by dash.
Hmm. I'm guessing you would need these set in a .profile/.zprofile equivalent regardless. I would assume that WebStorm is executing the shell as a non-interactive one, which means it wouldn't even parse ~/.bashrc if Bash were your default shell.
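You can see this behavior for yourself; in the sketch below, VAR_FROM_BASHRC stands in for any variable that is exported only in ~/.bashrc:

```bash
# Non-interactive, non-login bash never reads ~/.bashrc:
bash -c 'echo "${VAR_FROM_BASHRC:-unset}"'    # prints "unset"

# Forcing an interactive shell with -i makes bash read ~/.bashrc:
bash -ic 'echo "${VAR_FROM_BASHRC:-unset}"'   # prints the value
```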
... it would help me a lot to know what is the best practice in such situations.
Best practice is probably to make sure that your ~/.profile has any environment changes needed. Yes, this violates DRY (don't repeat yourself), but it's probably the best route. One common way to limit the repetition is sketched below.
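A sketch of that setup (a common idiom rather than anything from this question): keep shared variables in ~/.profile and source it from ~/.zprofile, so POSIX shells and login zsh sessions see the same environment:

```bash
# ~/.profile -- shared, sh-compatible environment (entries are illustrative)
export PATH="$HOME/.cargo/bin:$PATH"

# ~/.zprofile -- have login zsh read the same file;
# `emulate sh -c` parses it in sh-compatible mode
[ -f "$HOME/.profile" ] && emulate sh -c '. "$HOME/.profile"'
```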
Thanks to the answer here and the discussion below, I was able to figure it out. (Thank you, @NotTheDr01ds and @lena.)
The main problem is that WebStorm is installed on Windows and therefore knows only the environment variables in Windows. There are two ways to solve the problem as follows.
Sharing WSL's environment variables with Windows through WSLENV
Add the line below to .zshrc so that it sets $WSLENV when zsh starts.
export WSLENV=VAR_I_WANT_TO_SHARE:$WSLENV
# Don't forget to insert the colon
# And for some reason, appending the variable after $WSLENV
# (i.e. WSLENV=$WSLENV:VAR_I_WANT_TO_SHARE) didn't work well
In Windows, run
wsl -e zsh -lic powershell.exe
This runs WSL using zsh (as a login, interactive shell), then runs PowerShell, which brings you back to Windows. Although this doesn't seem to achieve anything, going through zsh in WSL means .zshrc was sourced and therefore $WSLENV was set as well. You can check that it worked by running the command below in the PowerShell session you just created.
$env:VAR_I_WANT_TO_SHARE
Run WebStorm from the PowerShell that was just created.
& 'C:\Program Files (x86)\JetBrains\WebStorm 2022.1.3\bin\webstorm64.exe'
When you run or debug any of the Run/Debug Configurations, you will see that the environment variable is shared successfully.
Setting the PATH in Windows
For most environment variables, the previous method works well. However, PATH is an exception. The Windows PATH is shared with WSL by default, but the opposite doesn't work, probably because the PATH in WSL should not interfere with Windows. I've tried adding the $PATH of WSL to $WSLENV, but it didn't seem to work.
In the end, what I did was manually add each needed WSL path to the Windows PATH.
For example, if there was export PATH=$PATH:/home/(username)/.cargo/bin in .zshrc, you can add \\wsl$\Ubuntu\home\(username)\.cargo\bin to the Windows $env:Path using the Environment Variables window.
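As an aside, WSLENV also supports per-variable flags; a sketch assuming the documented /p flag, which translates a single path between WSL and Win32 form (the variable name is made up for illustration):

```bash
# In .zshrc: share one directory path with automatic path translation.
export CARGO_BIN_DIR="$HOME/.cargo/bin"
export WSLENV=CARGO_BIN_DIR/p:$WSLENV
# On the Windows side, $env:CARGO_BIN_DIR then appears as a
# \\wsl$\Ubuntu\home\...\.cargo\bin style path.
```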
I might have made some mistakes, so feel free to leave an edit or comments.
You can try using the npm config set script-shell command to set the shell for your scripts, like npm config set script-shell "/usr/bin/zsh".
When npm run <script name> spawns a child process, the shell being used depends on the npm environment. See https://docs.npmjs.com/cli/run-script:

The actual shell your script is run within is platform dependent. By default, on Unix-like systems it is the /bin/sh command, on Windows it is cmd.exe. The actual shell referred to by /bin/sh also depends on the system. As of npm@5.1.0 you can customize the shell with the script-shell configuration.
See also https://github.com/npm/npm-lifecycle/blob/10c0c08fc25fea3c18c7c030d4618a401963355a/index.js#L293-L304
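A sketch of the per-project variant, assuming a project-level .npmrc is honored the same way as the user-level config:

```bash
# Pin the script shell to zsh for this repository only.
echo 'script-shell=/usr/bin/zsh' >> .npmrc

# Scripts launched via npm run now execute under zsh:
npm run dev
```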

Automating R to run scripts

I'm basically looking for any way to automatically run R scripts just as they would run if I were copying and pasting them into the console. I've tried the 'taskscheduleR' package; however, it just seems to write output to a log file in the directory, which isn't the same as running the code inside the RStudio application.
An example might be: say I want to get the last closing prices of 5 stocks each night. The script would run just as it does in RStudio, with the variables available and all of the code kept in the script file.
Any thoughts?
I would suggest the built-in Task Scheduler application if you are using Windows.
Create a task that runs a batch script file. This batch script has only 1 line, which executes the Rscript you want. Set it to run each night (or whatever time you want).
I am not that well-versed in Linux and macOS, but here's what I know:
Linux has cron. Add a job to crontab with your preferred timing and execute your script with /path/to/bin/Rscript /path/to/script.r (see the sketch below).
macOS has Automator + iCal (for scheduling). It also has crontab, like Linux.
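A minimal crontab sketch for the nightly stock-price example; the schedule, paths, and log file are illustrative assumptions:

```bash
# crontab -e: run the script every night at 23:30 and append the console
# output to a log so the run can be inspected afterwards.
30 23 * * * /usr/bin/Rscript /home/user/get_prices.R >> /home/user/get_prices.log 2>&1
```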

Run r script in background on Ubuntu server

I am working on an Ubuntu server. I have an R script which will run for several days. How would I run it in the background, so that it keeps running when I log out?
When I try R script.R, it says ARGUMENT 'script.R' __ignored__.
First off, to run an R script in batch mode, you have several possibilities; I use the following, which works well:
Rscript scriptname.r
This, however, will run the script in the foreground. That isn't a problem in tmux per se; just run it in a background tab. However, you can of course also run it in the background in the usual way, by appending &:
Rscript scriptname.r &
Again, this needs to be run inside tmux (or similar) to stay alive once you log out.
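A sketch of the tmux route, assuming tmux is installed; the session name is arbitrary:

```bash
# Start the script in a detached tmux session; it survives logout.
tmux new-session -d -s rjob 'Rscript scriptname.r'

# Reattach later to check on progress:
tmux attach -t rjob
```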

When using mpirun with an R script, should I manually copy the file/script onto the clusters?

I'm trying to understand how Open MPI's mpirun handles a script file associated with an external program, here an R process (doMPI/Rmpi).
I can't imagine that I have to copy my script onto each host before running something like:
mpirun --prefix /home/randy/openmpi -H clust1,clust2 -n 32 R --slave -f file.R
But apparently it doesn't work until I copy the script 'file.R' onto the cluster nodes and then run mpirun. And when I do this, the results are written on the cluster nodes, whereas I expected them to be returned to the working directory on localhost.
Is there another way to send an R job from localhost to multiple hosts, including the script to be evaluated?
Thanks!
I don't think it's surprising that mpirun doesn't know the details of how scripts are specified to commands such as R, but the Open MPI version of mpirun does include the --preload-files option to help in such situations:
--preload-files <files>
    Preload the comma-separated list of files to the current working directory of the remote machines where processes will be launched, prior to starting those processes.
Unfortunately, I couldn't get it to work, which may be because I misunderstood something, but I suspect it isn't well tested; very few people use that option, since it is quite painful to do parallel computing without a distributed file system.
If --preload-files doesn't work for you either, I suggest that you write a little script that calls scp repeatedly to copy the script to the cluster nodes (a sketch follows). There are some utilities that do that, but none seem to be very common or popular, which I again think is because most people prefer to use a distributed file system. Another option is to set up an sshfs file system.
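A minimal sketch of such an scp wrapper, reusing the host names, file name, and mpirun invocation from the question; the assumption that the same working directory exists on every node is mine:

```bash
#!/bin/sh
# Copy the R script into the matching working directory on every node,
# then launch the MPI job from localhost.
for host in clust1 clust2; do
  scp file.R "$host:$PWD/" || exit 1
done
mpirun --prefix /home/randy/openmpi -H clust1,clust2 -n 32 R --slave -f file.R
```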

Update deployed meteor app while running with minimum downtime - best practice

I run my meteor app on EC2 like this: node main.js (in a tmux session).
Here are the steps I use to update my meteor app:
1) meteor bundle app.tgz
2) scp app.tgz EC2-server:/path
3) ssh EC2-server and attach to tmux
4) kill the current meteor-node process by C-c
5) extract app.tgz
6) run "node main.js" of the extracted app.tgz
Is this the standard practice?
I realize forever can be used too, but still: do you have to kill the old node process and start a new one every time you update the app? Can the upgrade be more seamless, without killing the node process?
You can't do this without killing the node process, but I haven't found that it really matters. What's actually more annoying is the browser refresh on the client, but there isn't much you can do about that.
First, let's assume the application is already running. We start our app via forever with a script like the one in my answer here. I'd show you my whole upgrade script but it contains all kinds of Edthena-specific stuff, so I'll outline the steps we take below:
Build a new bundle. We do this on the server itself, which avoids any missing fibers issues. The bundle file is written to /home/ubuntu/apps/edthena/edthena.tar.gz.
We cd into the /home/ubuntu/apps/edthena directory and rm -rf bundle. That will blow away the files used by the currently running process. Because the server is already loaded into memory, it will keep executing. However, this step is problematic if your app regularly does uncached disk operations, like reading from the private directory after startup. We don't, and all of the static assets are served by nginx, so I feel safe doing this. Alternatively, you can move the old bundle directory to something like bundle.old, and it should work.
tar xzf edthena.tar.gz
cd bundle/programs/server && npm install
forever restart /home/ubuntu/apps/edthena/bundle/main.js
There really isn't any downtime with this approach - it just restarts the app in the same way it would if the server threw an exception. Forever also keeps the environment from your original script, so you don't need to specify your environment variables again.
Finally, you can have a look at the log files in your ~/.forever directory. The exact path can be found via forever list.
David's method is better than this one, because there's less downtime when using forever restart compared to forever stop; ...; forever start.
Here's the deploy script spelled out, using the latter technique. In ~/MyApp, I run this bash script:
echo "Meteor bundling..."
meteor bundle myapp.tgz
mkdir ~/myapp.prod 2> /dev/null
cd ~/myapp.prod
forever stop myapp.js
rm -rf bundle
echo "Unpacking bundle"
tar xzf ~/MyApp/myapp.tgz
mv bundle/main.js bundle/myapp.js
# `pwd` is there because ./myapp.log would create the log in ~/.forever/myapp.log actually
PORT=3030 ROOT_URL=http://myapp.example.com MONGO_URL=mongodb://localhost:27017/myapp forever -a -l `pwd`/myapp.log start myapp.js
You're asking about best practices.
I'd recommend mup and cluster.
They allow for horizontal scaling and a bunch of other nice features, while using simple commands and configuration; a rough sketch of the mup workflow is below.
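The command names come from mup's CLI; the details live in the generated config files, so treat this as an outline rather than a recipe:

```bash
npm install -g mup   # install Meteor Up
mup init             # generates mup.js and settings.json for you to edit
mup setup            # provisions the servers listed in mup.js
mup deploy           # bundles and deploys the app with a short restart
```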
