E-mail (or similar) notification when code execution is finished - r

I am currently running several simulations in R that each take quite a long time to execute, and the time it takes for each to finish varies from case to case. To use the time in between more efficiently, I wondered if it would be possible to set up something (like an e-mail notification system or similar) that would notify me as soon as a chunk of simulation is completed.
Does somebody here have experience with setting up something similar, or does someone know of a resource that could teach me how to implement a notification system via R?

I recently saw an R package for this kind of thing: pushoverr. However, I haven't used it myself, so I can't say how well it works, but it seems like it might be useful in your case.
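I haven't tested it, but going by the package's README, a call along the following lines should send a push notification once a chunk finishes (run_simulation() is a hypothetical stand-in for your long-running code, and the user key and app token are placeholders):
library(pushoverr)
run_simulation()  # hypothetical stand-in for one long simulation chunk
pushover(message = "Simulation chunk finished",
         user = "YOUR_PUSHOVER_USER_KEY",
         app = "YOUR_PUSHOVER_APP_TOKEN")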

I assume you run the time-consuming simulations on a server, correct? If they run on your own PC, your PC will be slow as hell anyway, and I would not see much benefit in sending a mail to myself.
For long calculations: run them on a virtual machine. I use the following workflow for my own calculations.
Write your R script. Important: have the script write a .txt file at the end, when the calculation is finished. The shell script below will loop until that file exists.
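For example, the last line of script.R could simply create the marker file that the shell script below waits for:
file.create("/tmp/finished.txt")  # signals to the shell script that the calculation is done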
Copy the code below and save it as a Python script. I once tried to get mailR running on Linux and it did not work; this code worked on the first try.
#!/usr/bin/env python3
import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
from email.mime.base import MIMEBase
from email import encoders
email_user = 'youownmail@gmail.com'
email_password = 'password'
email_send = 'theothersmail.com'
subject = 'yourreport'
msg = MIMEMultipart()
msg['From'] = email_user
msg['To'] = email_send
msg['Subject'] = subject
body = 'Calculation is done'
msg.attach(MIMEText(body, 'plain'))
# Optional attachment; the path below is a placeholder for whatever report you want to send
attachment = open('/path/report.txt', 'rb')
part = MIMEBase('application', 'octet-stream')
part.set_payload(attachment.read())
encoders.encode_base64(part)
part.add_header('Content-Disposition', 'attachment', filename='report.txt')
msg.attach(part)
text = msg.as_string()
server = smtplib.SMTP('smtp.gmail.com', 587)
server.starttls()
server.login(email_user, email_password)
server.sendmail(email_user, email_send, text)
server.quit()
Make sure you are allowed to run the script.
sudo chmod 777 /path/script.R
sudo chmod 777 /path/script.py
Run both your script.R and script.py inside a script.sh file. It looks like the following:
R < /path/script.R --no-save
while [ ! -f /tmp/finished.txt ]
do
sleep 2
done
python3 /path/script.py
This may sound a bit overwhelming if you are not familiar with these technologies, but I think it is a pretty much fully automated workflow that frees up your own resources and can be used "in production". (I use this workflow to send myself my own stock reports.)

Related

How to release R's prompt when using 'system'?

I am writing R code on a Linux system using RStudio. At some point in the code, I need to use a system call to a command that will download a few thousand files listed in the lines of a text file:
down.command <- paste0("parallel --gnu -a links.txt wget")
system(down.command)
However, this command takes a little while to run (a couple of hours), and the R prompt stays locked while the command runs. I would like to keep using R while the command runs in the background.
I tried to use nohup like this:
down.command <- paste0("nohup parallel --gnu -a links.txt wget > ~/down.log 2>&1")
system(down.command)
but the R prompt still gets "locked" waiting for the end of the command.
Is there any way to circumvent this? Is there a way to submit system commands from R and keep them running in the background?
Using ‘processx’, here’s how to create a new process that redirects both stdout and stderr to the same file:
args = c('--gnu', '-a', 'links.txt', 'wget')
p = processx::process$new('parallel', args, stdout = '~/down.log', stderr = '2>&1')
This launches the process and resumes execution of the R script. You can then interact with the running process via the p object. Notably, you can send signals to it, query its status (e.g. is_alive()), and synchronously wait for its completion (optionally with a timeout after which to kill it):
p$wait()
result = p$get_exit_status()
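For example, a small sketch of waiting with a one-minute limit (an arbitrary value chosen here for illustration) and killing the process if it is still running:
p$wait(timeout = 60 * 1000)  # timeout is given in milliseconds
if (p$is_alive()) {
  p$kill()  # stop the download if it exceeded the limit
}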
Based on the comment by @KonradRudolph, I became aware of the processx R package, which very smartly deals with system process submissions from within R.
All I had to do was:
library(processx)
down.command <- c("parallel","--gnu", "-a", "links.txt", "wget", ">", "~/down.log", "2>&1")
processx::process$new("nohup", down.comm, cleanup=FALSE)
As simple as that, and very effective.

Python3.8 asyncio behavior difference between Windows and Unix

I am working on a script where I will be dealing with a huge amount of data to process via Python.
I have written a script using asyncio in Python 3.8 on a Windows box which works perfectly fine, but when I execute the same script on Unix with Python 3.8 it completes the execution but does not terminate the program at the end. It seems like it is not releasing resources/locks.
When I debugged further, I found that on Windows asyncio uses ProactorEventLoop whereas on Unix it uses _UnixSelectorEventLoop, but I am not sure if this matters in any way.
I can't share the full script, but it follows the structure below:
import asyncio
async def myCoroutine():
    print("My Coroutine")
try:
    loop = asyncio.get_event_loop()
    loop.run_until_complete(myCoroutine())
    print("Execution Completed")
finally:
    print("Closing the loop")
    loop.close()
    print("loop Closed")
Output:
Execution Completed
Closing the loop
loop closed
But the program is not terminating.
Has anyone faced a similar issue before? Any inputs?
Thanks in advance!!

R system functions always return error 127

I need to execute an external tool from R and process any errors that occur in that tool.
I know of 3 functions that do something similar to my task:
shell, system and system2.
Trying to test those, I see that command
shell("notepad")
opens Notepad. As far as I know, shell doesn't allow error checking (there's no interface to inspect stderr).
When I call
system("notepad")
or
system2("notepad")
R freezes while trying to run those commands.
Calling
system("start notepad")
or
system2("start notepad")
returns warning
Warning message:
running command '"start notepad"' had status 127
Adapting @DavidTseng's answer (sorry for not having enough reputation to upvote it)...
system("cmd.exe", input = "notepad")
worked for me in Windows.
As I mentioned in my comments, the R documentation reveals that on Windows the system() function does not launch a separate shell (if one is needed). This is why command-line commands run with system(), but Notepad, which needs a separate window, does not run:
From the documentation for system():
The most important difference is that on a Unix-alike system launches a shell which then runs command. On Windows the command is run directly – use shell for an interface which runs command via a shell (by default the Windows shell cmd.exe, which has many differences from a POSIX shell).
system("bash -l", input = "notepad")
I'm not sure if there's been an update to R that allows this since the question was asked nearly four years ago, but system("\"C:\\path\\to\\exe.exe\" args", intern = T) works for me, does bring up a separate child window, and works on Windows 10 + R 3.6 + RStudio.
Not using intern = T gave me a return code of 127 and did not run the process.
I had the same issue. There is an additional step in the installation process which I did not do.
Refer to the URL
https://cran.r-project.org/bin/windows/Rtools/
and look for "Putting Rtools on the PATH":
writeLines('PATH="${RTOOLS40_HOME}\\usr\\bin;${PATH}"', con = "~/.Renviron")
For Windows users:
wrong: system(path("c:", "program files", "r", "anysoft.EXE"))
but works: system(path("c:", shQuote("program files"), "r", "anysoft.EXE"))
You guys are making it too complicated. I solved this problem by referring to this answer. The problem is with the PATH. Type Sys.which('') in R and you will see nothing, so you have to set the path in CMD and then use Sys.setenv(PATH = '') in R to get this to work.
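As a rough sketch of that idea (the tool name and folder below are made-up placeholders), you can check and extend the PATH from within R:
Sys.which("sometool")  # hypothetical tool name; returns "" if R cannot find it
Sys.setenv(PATH = paste("C:\\path\\to\\tool", Sys.getenv("PATH"), sep = ";"))  # Windows uses ";" as the separator
Sys.which("sometool")  # should now return the full path to the executable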

What to do when a py.test hangs silently?

While using py.test, I have some tests that run fine with SQLite but hang silently when I switch to PostgreSQL. How would I go about debugging something like that? Is there a "verbose" mode I can run my tests in, or a way to set a breakpoint? More generally, what is the standard plan of attack when pytest stalls silently? I've tried using pytest-timeout and ran the tests with $ py.test --timeout=300, but the tests still hang with no activity on the screen whatsoever.
I ran into the same SQLite/Postgres problem with Flask and SQLAlchemy, similar to Gordon Fierce. However, my solution was different. Postgres is strict about table locks and connections, so explicitly closing the session connection on teardown solved the problem for me.
My working code:
@pytest.yield_fixture(scope='function')
def db(app):
    # app is an instance of a flask app, _db a SQLAlchemy DB
    _db.app = app
    with app.app_context():
        _db.create_all()
    yield _db
    # Explicitly close DB connection
    _db.session.close()
    _db.drop_all()
Reference: SQLAlchemy
To answer the question "How would I go about debugging something like that?":
Run with py.test -m trace --trace to get a trace of Python calls.
One option (useful for any stuck Unix binary) is to attach to the process using strace -p <PID> and see what system call it might be stuck on, or which loop of system calls it repeats, e.g. stuck calling gettimeofday.
For more verbose py.test output, install pytest-sugar (pip install pytest-sugar) and run the tests with py.test --verbose . . .
https://pypi.python.org/pypi/pytest-sugar
I had a similar problem with pytest and Postgresql while testing a Flask app that used SQLAlchemy. It seems pytest has a hard time running a teardown using its request.addfinalizer method with Postgresql.
Previously I had:
@pytest.fixture
def db(app, request):
    def teardown():
        _db.drop_all()
    _db.app = app
    _db.create_all()
    request.addfinalizer(teardown)
    return _db
(_db is an instance of SQLAlchemy that I import from extensions.py)
But if I drop the database every time the database fixture is called:
@pytest.fixture
def db(app, request):
    _db.app = app
    _db.drop_all()
    _db.create_all()
    return _db
Then pytest won't hang after your first test.
Not knowing what is breaking in the code, the best approach is to isolate the failing test and set a breakpoint in it to have a look. Note: I use pudb instead of pdb, because it's really the best way to debug Python if you are not using an IDE.
For example, you can add the following to your test file:
import pudb
...
def test_create_product(session):
    pudb.set_trace()
    # Create the Product instance
    # Create a Price instance
    # Add the Product instance to the session.
    ...
Then run it with
py.test -s --capture=no test_my_stuff.py
Now you'll be able to see exactly where the script locks up, and examine the stack and the database at this particular moment of execution. Otherwise it's like looking for a needle in a haystack.
I ran into this problem myself and it took quite some time to figure out (though I wasn't using SQLite). The test suite ran fine locally, but failed in CircleCI (Docker).
My problem was ultimately that:
An object's underlying implementation used threading
The object's __del__ normally would end the threads
My test suite wasn't calling __del__ as it should have
I figured I'd add how I figured this out. Other answers suggest these:
Found that pytest-timeout didn't help; the test hung after completion
Invoked via pytest --timeout 5
Versions: pytest==6.2.2, pytest-timeout==1.4.2
Running pytest -m trace --trace or pytest --verbose yielded no useful information either
I ended up having to comment literally everything out, including:
All conftest.py code and test code
Slowly uncommented/re-commented regions and identified the root cause
Ultimate solution: using a factory fixture to add a finalizer to call __del__
In my case the Flask application did not check if __name__ == '__main__': and so it executed app.start() when that was not my intention.
You can read many more details here.
In my case, computing the diff was very slow when comparing 4 MB of data after an assert failed.
with open(path, 'rb') as f:
    assert f.read() == data
Fixed by:
with open(path, 'rb') as f:
    eq = f.read() == data
assert eq

R Import - CSV file from password protected URL - in .BAT file

Okay - so here is what I'm trying to do.
I've got this password protected CSV file I'm trying to import into R.
I can import it fine using:
read.csv()
and when I run my code in RStudio everything works perfectly.
However, when I try to run my .R file using a batch file (Windows .bat), it doesn't work. I want to use the .BAT file so that I can set up a scheduled task to run my code every morning.
Here is my .BAT file:
"E:\R-3.0.2\bin\x64\R.exe" CMD BATCH "E:\Control Files\download_data.R" "E:\Control Files\DailyEmail.txt"
And here is my .R file:
url <- "http://username:password#www.url.csv"
data <- read.csv(url, skip=1)
** note, I've put my username/password and the exact location of the CSV in my code. I've used generic stuff here, as this is work related and posting usernames and passwords is probably frowned upon.
As I've said, this code works fine when I use it in RStudio. But fails when I use the .BAT file.
I get the following error message:
Error in download.file(url, "E:/data/data.csv") :
cannot open URL 'websiteurl'
In addition: Warning message:
In download.file(url, "E:/data/data.csv") :
unable to resolve 'username'
Execution halted
** above websiteurl is the http above (I can't post links)
So obviously, the .BAT is having trouble with the username/password? Any thoughts?
* EDIT *
I've gone so far as trying this on Linux, thinking maybe Windows was playing silly buggers.
Just from the terminal, I ran Rscript -e "download_data.r" and got the EXACT same error message as I did on Windows. So I suspect this may be a problem with where I'm getting the data from? Could the provider be blocking requests from the command line, but not from within RStudio?
I have had similar problems which had to do with file permissions. The .bat file somehow does not have the same privileges as you running the code directly from RStudio. Try using Rscript (http://stat.ethz.ch/R-manual/R-devel/library/utils/html/Rscript.html) within your .bat file, like
Rscript "E:\Control Files\download_data.R"
What is the purpose of the argument "E:\Control Files\DailyEmail.txt"? Is the program supposed to use it in any way?
So, I've found a solution which is likely not the most practical for most people, but it works for me.
What I did was migrate my project over to a Linux system; running daily scripts is easier on Linux anyway.
The solution makes use of the wget utility on Linux.
You can either run wget right in your shell script, or make use of the system() function in R to run it.
The code looks like:
wget -O /home/user/.../file.csv --user=userid --password='password' http://www.url.com/file.csv
And you can do something like:
syscomand >- "wget -O /home/.../file.csv --user=userid --password='password' http://www.url.com/file.csv"
system (syscommand)
in R to download the CSV to a location on your hard drive, then grab the CSV using read.csv()
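For completeness, a minimal sketch of that last step, mirroring the read.csv() call from the question (the local path is simply whatever you passed to wget's -O flag above):
data <- read.csv("/home/user/.../file.csv", skip = 1)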
Doing it this way gave me some more insight into the potential root cause of the problem. While the system(syscommand) is running, I get the following output:
Connecting to www.website.com (www.website.com)|ip.ad.re.ss|:80... connected.
HTTP request sent, awaiting response... 401 Unauthorized
Reusing existing connection to www.weburl.com:80.
HTTP request sent, awaiting response... 200 OK
Not sure why it has to send the request twice, or why I'm getting a 401 Unauthorized on the first try?
