How to capture the names of all files read and written by an R script? - r

In a project, many separate scripts are executed and they read and write files from / for each other. It has become quite confusing, which file is coming from where. Of course, this is bad software design but that is how this has grown over a long time.
Now, I would to execute all scripts in their proper order and capture which files are read and which ones are written by each script.
Is there, e.g., a way to monitor and log the input and output of the R process while the script is running (from within R)? Or any other ideas for a solution?
I am running R under Windows 10.

Related

Deploy R function with directory as argument as executable or web application

I've written an R function (code available on demand) that improves some analysis workflows in my research group (~10 people), and now I would like to deploy it so it's easily accessible to the rest of the group. I am the only person in the group with any knowledge of R.
In a nutshell, the function does the following:
Takes one argument, the directory in which to search for microscopy images in proprietary formats
Asks user (with readline()) which channels should be analysed and what they are named
Generate several histograms and scatter plots of intensity levels per image channel, after various normalisation steps, these are deposited in a .pdf file for each image stack
Perform linear regression, generate a .txt file per image stack
The .pdf and .txt files get output to the directory the user specifies as the argument when running the function. I want to turn it into something somewhat more user-friendly, essentially removing the need to install R + function dependencies. For the sake of universality I would like to deploy it as a web application that takes a .zip file of the images as input, extracts them and then runs the function with that newly created directory as the argument. When it's done, it should output a .zip file of the created .pdfs and .txts. I'm not sure how feasible this is. I have looked into using Shiny for this but I'm having a hard time figuring out how to apply it as I do not have experience with it. I have experience in unix server administration and have a remote server that I can play around with.
The second option would be somewhat less universal, but it would be to deploy it as a Windows executable (I am the only person in my group not to use Windows as a daily OS, but I do have access to a Windows environment). Here, ideally the executable should ask the user to navigate to a directory, then use that directory as the argument to the function and output the generated files in said directory.
As my experience with R is limited, I cannot tell which option would be more feasible and worth it in the long run. My gut feeling says the web application would be the most flexible and user friendly, but more challenging to deploy. Are either of my ideas implementable, and if so, what would be a good way to do so?
Thanks in advance!

automating R script using Mac's Automator and Calendar

I have been trying to run a script automatically using the steps that I found online.
I am trying to run the following R script called AUTO.R
Here is what the script contains:
library(quantmod)
obs <- last(Ad(getSymbols("SPY", auto.assign=FALSE)))
saveRDS(obs, "SAMPLE.rds")
When I build the application it prints Workflow completed
I believe all is well until the time comes to run the script. The alarm pop-up in my desktop is displayed from Calendar but nothing runs. After a few minutes the folder where the .rds file should be saved does not contain anything.
Two suggested changes:
Your Automator task should be more like just /usr/local/bin/Rscript --vanilla /Users/rimeallthetime/Desktop/AUTO.R
You should explicitly set the path in saveRDS; i.e. saveRDS(obs, "/Users/rimeallthetime/Desktop/SAMPLE.rds")
Honestly, though, you should at least make a ~/bin dir (i.e. a directory called bin under your home directory, so in your case /Users/rimeallthetime/bin and put both the workflow and R script in there, and I'd also suggest creating another directory for output files vs the desktop.
UPDATE
I just let the calendar event run and this is really a crude way to automate what you want to do. You'd be better off in the long run using launchd, that way it's fully automated and requires no human intervnention at all (but you may need to adjust your script to send you a notification or "append" to the rds file).

Reading LabVIEW TDMS files with R

As part of a transition from MATLAB to R, I am trying to figure out how to read TDMS files created with National Instruments LabVIEW using R. TDMS is a fairly complex binary file format (http://www.ni.com/white-paper/5696/en/).
Add-ons exist for excel and open-office (http://www.ni.com/white-paper/3727/en/), and I could make something in LabVIEW to make the conversion, but I am looking for a solution that would let me read the TDMS files directly into R. This would allow us to test out the use of R for certain data processing requirements without changing what we do earlier in the data acquisition process. Having a simple process would also reduce the barriers to others trying out R for this purpose.
Does anyone have any experience with reading TDMS files directly into R, that they could share?
This is far from supporting all TDMS specifications but I started a port of a python npTDMS package into R here https://github.com/msuefishlab/tdmsreader and it has been tested out in the context of a shiny app here
You don't say if you need to automate the reading of these files using R, or just convert the data manually. I'm assuming you or your colleagues don't have any access to LabVIEW yourselves otherwise you could just create a LabVIEW tool to do the conversion (and build it as a standalone application or DLL, if you have the professional development system or app builder - you could run the built app from your R code by passing parameters on a command line).
The document on your first link refers to (a) add-ins for OpenOffice Calc and for Excel, which should work for a manual conversion and which you might be able to automate using those programs' respective macro languages, and (b) a C DLL for reading TDMS - would it be possible for you to use one of those?

Can my CGI call R?

I know barely more than zero about R: until yesterday I didn't know how to spell it. But I'm suicidal: for my web site, I'm thinking about letting a visitor type in an R "program" ( is it even called a "program") and then, at submit time, blindly calling the R interpreter from my CGI. I'd then return the interpreter's output to the visitor.
Does this make sense? Or does it amount to useless noise?
If it's workable, what are the pitfalls in this approach? For example, what are the security issues, if any? Is it possible to make R crash, killing my CGI program? Do I have to clean up the R code before calling the interpreter? And the like.
you could take a look to Rserve which allows to execute R scripts via the TCP/IP interface available in PHP for example if I'm not mistaken.
Its just asking for trouble to let people run arbitrary R code on your server. You could try running it in a chroot jail, but these things can be broken out of. Even in a chroot, the R process could delete or alter files, or spawn a long-running process, or download a file to your server, and all manner of nastiness.
You might look at Rweb, which has exactly this behavior: http://www.math.montana.edu/Rweb/
Since you can read and write files in R, it would not be safe to let people run arbitrary R code at your server. I would look if R has something like PHP's safe mode... If not, and if you are root, you can try to run R under user nobody in a chroot (you must also place there packages and libraries - for readonly access, and some temporary directory for RW access).

monitoring for changes in file(s) in real time

I have a program that monitors certain files for change. As soon as the file gets updated, the file is processed. So far I've come up with this general approach of handing "real time analysis" in R. I was hoping you guys have other approaches. Maybe we can discuss their advantages/disadvantages.
monitor <- TRUE
start.state <- file.info$mtime # modification time of the file when initiating
while(monitor) {
change.state <- file.info$mtime
if(start.state < change.state) {
#process
} else {
print("Nothing new.")
}
Sys.sleep(sleep.time)
}
Similar to the suggestion to use a system API, this can be also done using qtbase which will be a cross-platform means from within R:
dir_to_watch <- "/tmp"
library(qtbase)
fsw <- Qt$QFileSystemWatcher()
fsw$addPath(dir_to_watch)
id <- qconnect(fsw, "directoryChanged", function(path) {
message(sprintf("directory %s has changed", path))
})
cat("abc", file="/tmp/deleteme.txt")
If your system provides an API for monitoring filesystem changes, then you should use that. I believe Macs come with this. Not sure about other platforms though.
Edit:
A quick goog gave me:
Linux - http://wiki.linuxquestions.org/wiki/FAM
Win32 - http://msdn.microsoft.com/en-us/library/aa364417(VS.85).aspx
Obviously, these APIs will eliminate any polling that you require. On the other hand, they may not always be available.
Java has this: http://jnotify.sourceforge.net/ and http://java.sun.com/developer/technicalArticles/javase/nio/#6
I have a hack in mind: you can setup a CRON job/Scheduled task to run R script every n seconds (or whatever). R script checks the file hash, and if hashes don't match, runs the analysis. You can use digest::digest function, just check out the manual.
If you have lots of files that you want to monitor, then R may be too slow for this purpose. Go to your c: or / dir and see how long it takes to do file.info(dir(recursive = TRUE)). A dos or bash script may be quicker.
Otherwise, the code looks fine.
You could use the tclTaskSchedule function in the tcltk2 package to set up a function that checks for updates and runs your code. This would then be run on a regular basis (you set the timing) but would still allow you to use your R session.
I'll offer another solution for windows that I have been using in a production environment that works perfectly and that I find very easy to set up and, under the hood it basically accesses the system API for monitoring folder changes as others have mentioned, but all the "hard work" is taken care of for you. I use a freely available piece of software called Folder Monitor by Nodesoft and well described here. Once you execute this program, it appears in your system tray and from there you can specify a given directory to monitor. When files are written to the directory (or changed or modified - there are a few options from which you can choose), the program executes any program you like. I simply link the program to a windows batch that that calls my R Script. So for example, I have Folder Monitor set up to monitor a "\myservername\DropOff" UNC path for any new data files written to it. When Folder Monitor detects new files, it executes RunBatch.bat file that simply runs an R script (see here for information on setting that up) that validates the format of the expected file based on an expected naming convention for files received and then it unzips and processes the data, creating a dataframe and ultimately loads that into a SQL Server Database. It just doesn't get any easier.
One note if you decide to use this solution: take a look at the optional delay execution parameter, which might be important if files take a while to copy into the target directory from the source location.

Resources