Deploy R function with directory as argument as executable or web application

I've written an R function (code available on demand) that improves some analysis workflows in my research group (~10 people), and now I would like to deploy it so it's easily accessible to the rest of the group. I am the only person in the group with any knowledge of R.
In a nutshell, the function does the following:
Takes one argument, the directory in which to search for microscopy images in proprietary formats
Asks the user (with readline()) which channels should be analysed and what they are named
Generates several histograms and scatter plots of intensity levels per image channel after various normalisation steps; these are deposited in a .pdf file for each image stack
Performs linear regression and generates a .txt file per image stack
The .pdf and .txt files are written to the directory the user specifies as the argument when running the function. I want to turn this into something more user-friendly, essentially removing the need to install R and the function's dependencies.

For the sake of universality, I would like to deploy it as a web application that takes a .zip file of the images as input, extracts it, and then runs the function with the newly created directory as the argument. When it's done, it should return a .zip file of the generated .pdfs and .txts. I'm not sure how feasible this is. I have looked into using Shiny for this, but I'm having a hard time figuring out how to apply it, as I have no experience with it. I do have experience in Unix server administration and have a remote server that I can play around with.
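From the Shiny documentation, I imagine the skeleton would be something like the sketch below, where analyse_images() stands in for my function and the readline() prompts would have to become Shiny inputs, but I am not sure this is the right approach:

library(shiny)

options(shiny.maxRequestSize = 500 * 1024^2)   # raise the 5 MB default upload limit

ui <- fluidPage(
  fileInput("zipfile", "Upload a .zip of image stacks", accept = ".zip"),
  actionButton("run", "Run analysis"),
  downloadButton("results", "Download results (.zip)")
)

server <- function(input, output, session) {
  work_dir <- tempfile("job_")   # per-session folder, so concurrent users don't collide

  observeEvent(input$run, {
    req(input$zipfile)
    dir.create(work_dir, showWarnings = FALSE)
    unzip(input$zipfile$datapath, exdir = work_dir)   # extract the uploaded images
    analyse_images(work_dir)                          # placeholder for my function
  })

  output$results <- downloadHandler(
    filename = "results.zip",
    content = function(file) {
      outputs <- list.files(work_dir, pattern = "\\.(pdf|txt)$", full.names = TRUE)
      zip(file, outputs, flags = "-j")   # needs a 'zip' binary on the server
    }
  )
}

shinyApp(ui, server)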
The second option would be somewhat less universal: deploying it as a Windows executable (I am the only person in my group who does not use Windows as a daily OS, but I do have access to a Windows environment). Ideally, the executable would ask the user to navigate to a directory, then use that directory as the argument to the function and write the generated files to that same directory.
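On the Windows side, I imagine something like the small launcher below, sourced from an interactive R session (choose.dir() is Windows-only, and analyse_images() again stands in for my function); whether this can be bundled into a true double-clickable executable is part of my question:

# launcher.R -- source this from an interactive R session on Windows.
# choose.dir() opens the native folder picker (Windows-only).
img_dir <- utils::choose.dir(caption = "Select the folder containing the image stacks")

if (is.na(img_dir)) {
  message("No directory selected; nothing to do.")
} else {
  analyse_images(img_dir)   # placeholder for the actual analysis function
  message("Done. Results written to: ", img_dir)
}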
As my experience with R is limited, I cannot tell which option is more feasible and worth it in the long run. My gut feeling says the web application would be the most flexible and user-friendly, but also the more challenging to deploy. Is either of my ideas implementable, and if so, what would be a good way to go about it?
Thanks in advance!

Related

How to capture the names of all files read and written by an R script?

In a project, many separate scripts are executed, and they read and write files for each other. It has become quite confusing which file comes from where. Of course, this is bad software design, but that is how it has grown over a long time.
Now, I would like to execute all the scripts in their proper order and capture which files are read and which are written by each script.
Is there, e.g., a way to monitor and log the input and output of the R process while the script is running (from within R)? Or any other ideas for a solution?
I am running R under Windows 10.
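The closest I have come up with so far is to trace() the common base readers and writers before sourcing each script and log their file argument, roughly like the sketch below (it assumes the scripts call these functions directly with file paths rather than connections), but I don't know whether this is a sensible approach:

# Log the 'file' argument of common readers/writers while the scripts run.
# Extend the traced functions to whatever the scripts actually use (readRDS, readLines, ...).
trace(read.csv,
      tracer = quote(cat("READ  ", file, "\n", file = "io_log.txt", append = TRUE)),
      print = FALSE)
trace(write.table,   # write.csv builds a call to write.table, so this catches both
      tracer = quote(cat("WRITE ", file, "\n", file = "io_log.txt", append = TRUE)),
      print = FALSE)

source("01_prepare_data.R")   # hypothetical script names, run in their proper order
source("02_fit_model.R")

untrace(read.csv)
untrace(write.table)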

How do I unit test functions in my R package that interact with the file system?

I'm working on an R package at work. My package has gotten large enough that I've decided I need some form of repeatable testing. I settled upon using testthat and mockery. I'm not a developer, so this is the first time I'm writing tests at this level.
I deal with a lot of data files and it's very convenient to have functions in my package to help locate files. These functions interact with the file system via calls to dir. For example,
Data from one event can be split over multiple files. If I have file datafile_2017.10.20_12.00.00, I have a function that can find the next file that is part of the same event, i.e. datafile_2017.10.20_12.05.00.
My question is this: what is the best way to test functions like this? My intuition is to avoid using actual files stored somewhere else in my repository, because that can fail for a number of reasons, e.g. different paths or different repo states between systems. I searched around and it looks like different languages have mocking libraries that allow for mocking directory structures. I haven't found anything like that for R (except for testthatsomemore, but it was removed from CRAN sometime in 2016).
Is there an R package that allows for mocking directory structures? Or am I wrong to move away from storing small test files in my repo?
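For what it's worth, the pattern I have been experimenting with is to build a throwaway directory inside tempdir() within each test, roughly like this (find_next_file() stands in for my real lookup function):

library(testthat)

test_that("the next file of an event is found", {
  # Build a fake data directory for this test only.
  fake_dir <- tempfile("testdata_")
  dir.create(fake_dir)
  on.exit(unlink(fake_dir, recursive = TRUE), add = TRUE)

  file.create(file.path(fake_dir, c(
    "datafile_2017.10.20_12.00.00",
    "datafile_2017.10.20_12.05.00",
    "datafile_2017.10.21_08.00.00"   # a different event, should be ignored
  )))

  # find_next_file() is a placeholder for the package's real function.
  nxt <- find_next_file("datafile_2017.10.20_12.00.00", dir = fake_dir)
  expect_equal(basename(nxt), "datafile_2017.10.20_12.05.00")
})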

Is it possible to define a cross-platform working directory for R?

I am teaching R tutorials in person with a large number of undergraduate R novices. I am also trying to format my notes on RPubs so that they can easily be used by other people. Nothing derails things faster than people mis-specifying working directories or saving spreadsheet files somewhere other than their working directory.
Is it possible to define a working directory that is universal across platforms? For example, a line of code or a function like
setwd( someplace that is likely to exist on every computer)
This could involve a function that finds some place that usually exists on all computers, such as the desktop, downloads folder, or R directory.
In general, your best bet is to go for the user's home area:
setwd("~")
path.expand("~")
Since you are teaching novices, a common problem is that students notice the R package directory ~/R/ and assume that they should put their scripts in this directory, thereby creating odd bugs. To avoid this, I would go for
dir.create("~/RCourse", FALSE)
setwd("~/RCourse")
If you use RStudio, you could get them to create an RStudio project.
In the past, I have come across situations where this doesn't work. For example, some people have their home area as a network drive, but can't connect to the internet or get through a firewall.
You asked about
someplace that is likely to exist on every computer
and yes, R works hard to ensure this returns a valid directory: tempdir().
The main danger, though, is that this directory will vanish at the end of the session (unless you override the default behaviour of removing the per-session temporary directory on exit). Until then, it works.
Still, this can be useful. I sometimes use it to write temporary files that I don't want cluttering the current directory or ~.
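For example, something along these lines (the file name is arbitrary):

scratch <- file.path(tempdir(), "scratch.csv")   # lives in the per-session temp directory
write.csv(head(iris), scratch, row.names = FALSE)
read.csv(scratch)
# scratch (and tempdir() itself) disappears when the R session ends.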
Otherwise @csgillespie gave you a good answer pertaining to $HOME, aka ~.

R shiny concurrent file access

I am using the R shiny package to build a web interface for my executable program. The web interface provides user input and shows output.
In the background on the server, the R script formats the user inputs and saves them to a local input file. R then calls a system command to run the executable program.
My concern is that if multiple users run the web app at the same time, it is possible that the input file generated by the first user will be overwritten by the second user's input before it is read by the executable program.
One way to solve the conflict is to have R create a temporary folder for each user and generate/run the input file under that folder. But I'd like to know whether there is a better or automatic way to resolve this potential conflict with Shiny. For example, if I use Shiny's fileInput, the uploaded files are automatically stored in a temporary folder.
Update
Thanks for the advice, @Symbolix and @Mike Wise.
I had read the persistent data storage article before, but I don't think it is exactly what I want; maybe my understanding is not correct. I ended up creating a temporary folder per user and running my executable from there.
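Roughly, the server side now looks like the sketch below ("settings", "run", and my_executable are placeholders for my actual inputs and program):

library(shiny)

ui <- fluidPage(
  textInput("settings", "Settings"),   # hypothetical user input
  actionButton("run", "Run")
)

server <- function(input, output, session) {
  # One private working folder per session, so two users can never
  # overwrite each other's input file.
  session_dir <- tempfile("session_")
  dir.create(session_dir)
  session$onSessionEnded(function() unlink(session_dir, recursive = TRUE))

  observeEvent(input$run, {
    input_file <- file.path(session_dir, "input.txt")
    writeLines(input$settings, input_file)       # format and save the user input
    system2("my_executable", args = input_file)  # placeholder external program
  })
}

shinyApp(ui, server)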

How do I connect to the data in my folder without writing the specific path, like "C:\\Users\\Dima\\Desktop\\NewData\\..."?

I am writing a script that requires data that is in a folder on my computer.
But eventually this script will be used on another computer, by another person.
I can't tell them to change all the paths to the data in the script.
How do I connect to the data in my folder without writing the specific path, like "C:\Users\Dima\Desktop\NewData\..."?
The best way of making your code shareable depends upon your use case.
As Carl Witthoft pointed out, most code should be encapsulated in functions. These functions can then be packaged into packages and easily redistributed to other people's machines. Writing packages is easier than you think.
For one-off analyses, scripts are appropriate. How you make them user-independent depends on who your users are. If you are sharing the script with colleagues, try to keep your data on a network drive; then the link to the data will be the same for everyone. If you are sharing your script with the world, keep your data on the internet, and the link to the data will be a hyperlink, again the same for everyone.
If you are sharing your script with a few people who don't have access to a common drive, and you can't put your data on the internet, then some directory manipulation is acceptable.
Change your working directory to the root of where your project files are.
setwd("c:/Users/Dima/My Project")
Then you can reference the location of the data using relative paths.
data_file <- "Data/My data file.csv"
my_data <- read.csv(data_file)
Assuming that you keep the directory structure within your project the same, you only need to change the call to setwd on each machine.
Also note that the special location "~" refers to your user home directory. Try
normalizePath("~")
That way, if you keep your project in that location, you can avoid reference to "Dima" entirely.
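For example, something like this, with "My Project" standing in for whatever the project folder is called:

# Works for any user on any OS, as long as the project sits in the home area.
project_dir <- file.path(path.expand("~"), "My Project")
setwd(project_dir)
my_data <- read.csv("Data/My data file.csv")   # relative path from the project root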