Sourcing R files housed within an R project and maintaining relative paths

Okay, so I like to use R projects in RStudio for the scripts and data I'm working with. However, say I want to source those scripts from another directory... R does not detect the .Rproj file unless the script is called from the directory where it lives. Is there any way to source an R script that is part of an R project from another directory?
This is relevant as I have a system where I perform analyses and make figures in one directory, but then produce LaTeX documents that use those figures in another directory. I like to be able to source the R scripts that make the figures and save them to the directory where I'm writing in LaTeX.
Here's an MRE:
Start with an R project already created in a directory via RStudio... let's call it ~/test.
Create some data:
a <- 1:10
dat <- data.frame(a = a, b = a + rnorm(length(a), 10, 2))
save(dat, file = "test.RData")
Place the following script in ~/test. Let's call it test.R.
load("test.RData")
pdf(file = "plot.pdf")
plot(b ~ a, data = dat)
dev.off()
Works great, right? But if we try the following from any other directory, R can't find the files, because the working directory is no longer ~/test.
cd ~
Rscript ~/test/test.R
Any thoughtful solutions? I suppose it's easy enough to just setwd() in the script that I'm sourcing the original script from, but this sort of defeats the whole purpose of using R projects.

You could use setwd("~/test/") at the beginning of the script and if necessary change it back later on.
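A minimal sketch along those lines, placed in the script that sources test.R (the ~/test path is the one from the question):
old_wd <- getwd()     # remember the directory we are calling from
setwd("~/test")       # switch to the project folder so relative paths resolve
source("test.R")      # test.R now finds test.RData and writes plot.pdf in ~/test
setwd(old_wd)         # change it back afterwards
Base R's source() also accepts chdir = TRUE (e.g. source("~/test/test.R", chdir = TRUE)), which temporarily switches the working directory to the script's own folder for the duration of the call.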

Related

How to transfer my files to R Projects, and then to GitHub?

I have 3 R scripts:
data1.r
data2.r
graph1.r
The two data scripts run some math and generate two separate data files, which I save in my working directory. I then load these two files in graph1.r and use them to plot the data.
How can I organise and create an R project which has:
the two data scripts - data1.r and data2.r
another file which calls these scripts (graph1.r)
the output of graph1.r
I would then like to share all of this on GitHub (I know how to do this part).
Edit -
Here is the data1 script
df1 <- data.frame(x = seq(1,100,1), y=rnorm(100))
save(df1, file = "data1.Rda")
Here is the data2 script
df2 <- data.frame(x = seq(1,100,1), y=rnorm(100))
save(df2, file = "data2.Rda")
Here is the graph1 script
load(file = "data1.Rda")
load(file = "data2.Rda")
library(ggplot2)
ggplot() +
  geom_point(data = df1, aes(x = x, y = y)) +
  geom_point(data = df2, aes(x = x, y = y))
Question worded differently -
How would the above need to be executed inside a project?
I have looked at the following tutorials -
https://r4ds.had.co.nz/workflow-projects.html
https://martinctc.github.io/blog/rstudio-projects-and-working-directories-a-beginner's-guide/
https://swcarpentry.github.io/r-novice-gapminder/02-project-intro/
https://www.tidyverse.org/blog/2017/12/workflow-vs-script/
https://chrisvoncsefalvay.com/2018/08/09/structuring-r-projects/
https://support.rstudio.com/hc/en-us/articles/200526207-Using-Projects
I have broken my answer into three parts:
The question in your title
The reworded question in your text
What I, based on your comments, believe you are actually asking
How to transfer my files to R Projects, and then to GitHub?
From RStudio, just create a new project and move your files to this folder. You can then initialize this folder with git using git init.
How would [my included code] need to be executed inside a project?
You don't need to change anything in your example code. If you just place your files in a project folder they will run just fine.
An R project mainly takes care of the following for you:
Working directory (it's always set to the project folder)
File paths (all paths are relative to the project root folder)
Settings (you can set project specific settings)
Further, many external packages are meant to work with projects, making many tasks easier for you. A project is also a very good starting point for sharing your code with Git.
What would be a good workflow for working with multiple scripts in an R project?
One common way of organizing multiple scripts is to make a new script calling the other scripts in order. Typically, I number the scripts so it's easy to see the order to call them. For example, here I would create 00_main.R and include the code:
source("01_data.R")
source("02_data.R")
source("03_graph.R")
Note that I've renamed your scripts to make the order clear.
In your code, you do not need to save the data to pass it between the scripts. The above code would run just fine if you delete the save() and load() parts of your code. The objects created by the scripts would still be in your global environment, ready for the next script to use them.
If you do need to save your data, I would save it to a folder named data/. The output from your plot I would probably save to outputs/ or plots/.
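For instance, a sketch of what the save/load lines could look like if you do keep them and adopt those folders (the folder and file names below are just the suggestions above, not anything fixed):
## in 01_data.R
if (!dir.exists("data")) dir.create("data")   # create the folder on first run
df1 <- data.frame(x = seq(1, 100, 1), y = rnorm(100))
save(df1, file = file.path("data", "data1.Rda"))

## in 03_graph.R
load(file.path("data", "data1.Rda"))
load(file.path("data", "data2.Rda"))
library(ggplot2)
p <- ggplot() +
  geom_point(data = df1, aes(x = x, y = y)) +
  geom_point(data = df2, aes(x = x, y = y))
if (!dir.exists("plots")) dir.create("plots")
ggsave(file.path("plots", "graph1.png"), p)   # save the figure alongside the code
Because the project sets the working directory to the project root, all of these paths are relative to that root.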
When you get used to working with R, the next step to organize your code is probably to create a package instead of using only a project. You can find all the information you need in this book.

R package to knit a markdown document given some data

I am writing a basic R package that reads in data from a user-specified database and spits out a markdown report with predefined graphs, tables, etc. I have placed the .Rmd file in the R folder, and have a user-level function that reads in the data and knits it.
# create_doc.R
# note: tclvalue() and tkchooseDirectory() come from the tcltk package
create_doc <- function(directory = NULL,
                       database_name) {
  if (is.null(directory)) directory <- tclvalue(tkchooseDirectory(
    title = "Choose Folder for Input and Output"))
  rmarkdown::render("R/doc_generator.Rmd", output_dir = directory)
}
This works fine on my computer, but when I build the package, the .Rmd file is deleted. This means I can't give it to other users for use on other computers. I realise that the R folder may not be the correct place for this file (I guess the build deletes any files not ending in .R), but I'm not sure where else to put it. It is not package documentation; it creates the end result of using the package.
Googling has not helped so far. Is it possible to knit a document using a function in an R package? If yes, what am I doing wrong? If no, are there any other suggestions on how to achieve this?
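One common convention, offered here only as a sketch rather than anything stated in the question, is to ship non-R files under inst/ (everything in inst/ is installed with the package) and locate them at run time with system.file(). The inst/rmd/ folder and the "mypackage" name below are placeholders:
# sketch assuming the template lives at inst/rmd/doc_generator.Rmd in the package source
create_doc <- function(directory = NULL, database_name) {
  if (is.null(directory)) {
    directory <- tcltk::tclvalue(tcltk::tkchooseDirectory(
      title = "Choose Folder for Input and Output"))
  }
  template <- system.file("rmd", "doc_generator.Rmd", package = "mypackage")
  if (template == "") stop("doc_generator.Rmd not found in the installed package")
  rmarkdown::render(template, output_dir = directory)
}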

I want to know an automated way to load the data set when the files are moved to another computer

Someone, please guide me. Suppose I choose the location of a data file using file.choose() and then load the dataset. Also, suppose I have sent the script and data set to a friend of mine by e-mail. When my friend downloads the files and runs the R script, he has to choose the location of the file to run the script. I want to know an automated way to load the data set when the files are moved to another computer.
First, consider having a "project" directory where you have a directory for scripts and one for data. There's a 📦 called rprojroot that has filesystem helpers which will aid you in writing system-independent code and works well if you have a "project" directory (see the sketch after this answer). RStudio has a concept of projects & project directories which makes this even easier.
Second, consider using a public or private GitHub for this work (scripts & data). If the data is sensitive, make it a private repo and grant access as you need. If it's not, then it's even easier to share. You'll get data and code version control this way as well.
Third --- as a GitHub alternative --- consider using Keybase shared directories or git spaces. You can grant/remove access to specific individuals and they remain private and secure as well as easy to use.
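A minimal sketch of the rprojroot approach mentioned in the first point (the data/ subfolder and file name are placeholders):
library(rprojroot)
# locate the project root from anywhere inside the project,
# using the presence of an .Rproj file as the marker
root <- find_root(is_rstudio_project)
dat <- read.table(file.path(root, "data", "mydata.dat"))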
These solutions will work on any computer without changing the script.
1) use current dir If you assume the data and script are in the same directory then this will work on any computer provided the user first does a setwd("/my/dir") or starts R in that directory. One invokes the script using source("myscript.R") and the script reads the data using read.table("mydata.dat"). This approach is the simplest, particularly if the script is only going to be used once or a few times and then never used again.
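In other words, a session using option (1) would look something like this, with the file names from the text:
setwd("/my/dir")       # or start R in the directory holding both files
source("myscript.R")   # myscript.R reads the data with read.table("mydata.dat")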
2) use R options A slightly more general approach is to assume that R option DATADIR (pick any name you like) contains that directory or the current directory if not defined. In the script write:
datadir <- getOption("DATADIR", ".") # use DATADIR or . if DATADIR not defined
read.table(file.path(datadir, "mydata.dat"))
Then the user can define DATADIR in their R session or in their .Rprofile:
options(DATADIR = "/my/dir")
or not define it at all but setwd to that directory in their R session prior to running the script or start R in that directory.
This might be better than (1) if the script is going to be used over a long period of time and moved around without the data. If you put the options statement in your .Rprofile then it will help remind you where the data is if you don't use the script for a long time and lose track of its location.
3) include data in script If the script always uses the same data and it is not too large you could include the data in the script. Use dput(DF) where DF is the data frame in order to get the R code corresponding to DF and then just paste that into your script. Here is such a sample script where we used the output of dput(BOD):
DF <- structure(list(Time = c(1, 2, 3, 4, 5, 7), demand = c(8.3, 10.3,
19, 16, 15.6, 19.8)), .Names = c("Time", "demand"), row.names = c(NA,
-6L), class = "data.frame", reference = "A1.4, p. 270")
plot(demand ~ Time, DF)
Of course if you always use the same data you could create a package and include the script and the data.
4) config package You could use the config package to define a configuration file for your script. That still leaves the question of how to find the configuration file, but config can search the current directory and all ancestors (parent dir, grandparent dir, etc.) for the config file, so specifying its location may not be needed.
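A minimal sketch of this option, assuming a config.yml such as the one sketched in the comments below (the field name datadir is just an example):
# config.yml, placed in the project root or any ancestor of the working directory:
# default:
#   datadir: "/my/dir"
datadir <- config::get("datadir")   # config searches the current dir and its parents for config.yml
read.table(file.path(datadir, "mydata.dat"))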

Editing a .r file from within another .r file

I am trying to make my current project reproducible, and so am creating a master document (eventually a .rmd file) that will be used to call and execute several other documents. This way myself and other investigators only need to open and run one file.
There are three layers to the current setup: master file, 2 read-in files, 2 databases. The master file calls the read-in files using source(), and the read-in files parse the .csv databases and apply labels.
The read-in files and the databases are generated automatically with the data management software I'm currently using (REDCap) each time I download the updated data.
However, the read-in files have a line of code that removes all of the objects in my environment. I would like to edit the read-in files directly from the master file so that I do not have to open the read-in files individually each time I run my report. Specifically, since all the read-in files are the same, I would like to remove line #2 in each.
I've tried searching Google, and tried file.edit(), but have been unable to find anything. Not even sure it is possible, but figured I would ask. Let me know if I can improve this question or if you need any additional code to answer it. Thanks!
Current relevant master code (edited for generality):
source("read-in1")
source("read-in2")
Current relevant read-in file code (same in each file, except for the database name):
#Clear existing data and graphics
rm(list=ls())
graphics.off()
#Load Hmisc library
library(Hmisc)
#Read Data
data=read.csv('database.csv')
#Setting Labels
[read-in code truncated]
Additional details:
OS: Windows 7 Professional x86
R version: 3.1.3
R Studio version: 0.99.441
You might try readLines() and something like the following (which was simplified greatly by a suggestion from @Hong Ooi below):
eval(parse(text = readLines("read-in1.R")[-2]))
My original solution which was much more pedantic:
f <- file("read-in1.R", open="r")
t <- readLines(f)
close(f)
for (l in t[-2]) { eval(parse(text=l)) }
The for() loop just parses and evaluates each line from the text file except for the second one (that's what the -2 index value does). If you're reading and writing longer files, then the following will be much faster than the second option, though still less preferable than @Hong Ooi's:
f <- file("read-in1.R", open="r")
t <- readLines(f)
close(f)
f <- file("out.R", open="w")
o <- writeLines(t[-2], f)
close(f)
source("out.R")
Sorry I'm so late in noticing this question, but you may want to investigate getting access to the REDCap API and using either the redcapAPI package or the REDCapR package. Both of those packages will allow you to export the data from REDCap directly into R without having to use the download scripts. redcapAPI will even apply all the formats and dates (REDCapR might do this now too. It was in the plan, but I haven't used it in a while).
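For example, a rough sketch of the REDCapR route (the URI and token are placeholders you would get from your REDCap administrator):
library(REDCapR)
result <- redcap_read(
  redcap_uri = "https://redcap.example.edu/api/",   # placeholder API endpoint
  token = "YOUR_API_TOKEN"                          # placeholder project token
)
data <- result$data   # a data.frame pulled straight from REDCap, no download script needed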
You could try this. It just calls some shell commands: (1) renames the file, then (2) copies all lines not containing rm(list=ls()) to a new file with the same name as the original file, then (3) removes the copy.
files_to_change <- c("read-in1.R", "read-in2.R")
for (f in files_to_change) {
  old <- paste0(f, ".old")
  system(paste("cmd.exe /c ren", f, old))
  system(paste('cmd.exe /c findstr /v "rm(list=ls())"', old, ">", f))
  system(paste("cmd.exe /c del", old))
}
After calling this loop you should have
#Clear existing data and graphics
graphics.off()
#Load Hmisc library
library(Hmisc)
#Read Data
data=read.csv('database.csv')
#Setting Labels
in your read-in*.R files. You could put this in a batch script
@echo off
ren "%~f1" "%~nx1.old"
findstr /v "rm(list=ls())" "%~f1.old" > "%~f1"
del "%~f1.old"
say, "example.bat", and call that in the same way using system.

Specify output directory for R script with knit_hooks$set(purl = hook_purl)

I understand that we shouldn't purl() a chunk with knitr but instead use knit_hooks$set(purl = hook_purl). That works, but it puts the R script in the working directory. I would like to put it in an R/ directory. It's probably due to my own incompetence, but I couldn't find anything about specifying the directory for the R script (I looked in the R documentation as well as several places online). Anyone have any ideas? I'm knitting from within RStudio, by the way.
You can generate the script under the current directory, and file.rename() it to the R/ directory.
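A minimal sketch of that idea, run after knitting (the name myreport.R is a placeholder for whatever hook_purl produces for your document):
if (!dir.exists("R")) dir.create("R")                     # make sure the target folder exists
file.rename("myreport.R", file.path("R", "myreport.R"))   # move the purled script into R/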
