.Rprofile's search path is not the same as the default one

Consider the two lines below:
Sys.setenv(R_IMPORT_PATH = "/path/to/my/r_import")
foo <- modules::import("foo")
If I execute this code from within an already-established interactive R session, it works fine.
But if I put the same two lines in my .Rprofile and start a new interactive R session, the modules::import line fails with
Error in module_init_files(module, module_path) :
could not find function "setNames"
If I then attempt the following fix/hack
Sys.setenv(R_IMPORT_PATH = "/path/to/my/r_import")
library(stats)
foo <- modules::import("foo")
...then the modules::import line still fails, but with the following
Error in lapply(x, f) : could not find function "lsf.str"
So the idea of patching the missing names seems like it will be an unmaintainable nightmare...
The crucial issue is this: It appears that the search path right after an interactive session starts is different from the one that an .Rprofile script sees.
Q1: Is there a way that I can tell R to make the search path exactly as it will be when the first > prompt appears in the interactive session?
Q2: Alternatively, is there a way for the .Rprofile to schedule some code to run after the session's default search path is in place?
NB: Solutions like the following:
Sys.setenv(R_IMPORT_PATH = "/path/to/my/r_import")
library(stats)
library(utils)
foo <- modules::import("foo")
...are liable to break every time that the (third-party) modules package is modified.
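One hedged workaround (a sketch only, not something the modules documentation prescribes, and untested against every modules release) is to have .Rprofile attach whatever R itself lists as the session's default packages before calling modules::import, rather than hard-coding individual library() calls:
Sys.setenv(R_IMPORT_PATH = "/path/to/my/r_import")
local({
  # Attach the same packages that .First.sys() will attach later;
  # its later require() calls are then no-ops, so nothing loads twice.
  for (pkg in getOption("defaultPackages")) {
    library(pkg, character.only = TRUE)
  }
  foo <<- modules::import("foo")  # assign into the global environment
})
Because the list comes from getOption("defaultPackages"), this does not depend on which particular functions the modules package happens to call.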

Related

Copy script after executing

Is there a way to copy the executed lines, i.e. basically the script itself, into the working directory?
My normal scenario is this: I have stand-alone scripts that just need to be sourced within a working directory and they will do everything I need.
After a few months, I make updates to these scripts, and I would love to have a snapshot of the script as it was when I executed the source...
So basically file.copy(ITSELF, '.') or something like this.
I think this is what you're looking for:
file.copy(sys.frame(1)$ofile,
          to = file.path(dirname(sys.frame(1)$ofile),
                         paste0(Sys.Date(), ".R")))
This will take the current file and copy it to a new file in the same directory with the name of currentDate.R, so for example 2015-07-14.R
If you want to copy to the working directory instead of the original script directory, use
file.copy(sys.frame(1)$ofile,
          to = file.path(getwd(),
                         paste0(Sys.Date(), ".R")))
Just note that sys.frame(1)$ofile only works if a saved script is source()d; trying to run it directly in the terminal will fail. It is worth mentioning, though, that this might not be the best practice. Perhaps looking into a version control system would be better.
Explanation:
TBH, I might not be the best person to explain this (I copied the idea from somewhere and use it occasionally), but I'll try. Basically, for R to have information about the script file, it needs to be running that file inside an environment that carries this information, and when that environment belongs to a source() call it contains the ofile entry. We use (1) to select the next environment after the global environment (which is 0), i.e. source()'s environment. When you run the code from the terminal, there is no frame/environment other than the global one (hence the error message), since no file is being run: the commands are sent straight to the interpreter.
To illustrate that, we can do a simple test:
> sys.frame(1)
Error in sys.frame(1) : not that many frames on the stack
But if we call that from another function:
> myf <- function() sys.frame(1)
> myf()
<environment: 0x0000000013ad7638>
Our function's environment doesn't have anything in it, so it exists but, in this case, does not have ofile:
> myf <- function() names(sys.frame(1))
> myf()
character(0)
I just wanted to add my solution, since I decided to wrap the copy command in try() before executing it... because I have the feeling I'd otherwise miss some control...
try({
  script_name <- sys.frame(1)$ofile
  copy_script_name <-
    paste0(sub('\\.R', '', basename(script_name)),
           '_',
           format(Sys.time(), '%Y%m%d%H%M%S'),
           '.R')
  file.copy(script_name,
            copy_script_name)
})
This will copy the script into the current directory and also adds a timestamp to the filename. In case something goes wrong, the rest of the script will still execute.
I originally posted this other thread, and I think it addresses your problem: https://stackoverflow.com/a/62781925/9076267
In my case, I needed a way to copy the executing file to back up the original script together with its outputs. This is relatively important in research.
What worked for me while running my script on the command line was a mixture of other solutions presented here, and it looks like this:
library(scriptName)
file_dir <- paste0(gsub("\\", "/", fileSnapshot()$path, fixed = TRUE))
file.copy(from = file.path(file_dir, scriptName::current_filename()),
          to   = file.path(getwd(), scriptName::current_filename()))
Alternatively, one can add the date and hour to the file name, to help distinguish that file from the source, like this:
file.copy(from = file.path(current_dir, current_filename()),
          to   = file.path(getwd(), subDir, paste0(current_filename(), "_", Sys.time(), ".R")))

Object created via read.csv disappears after script finishes

I am not a programmer and have less than a month of experience with R but have been writing simple scripts that read from external CSV files for the past week. The function below, which reads data from a CSV file, was originally more complex but I repeatedly shortened it during troubleshooting until I was left with this:
newfunction <- function(input1, input2) {
  processingobject <- read.csv("processing-file.csv")
  print(head(processingobject))
}
I can print both the head and the entire processingobject within the script without a problem, but after the script ends processingobject no longer exists. It never appears in the RStudio global environment pane. Shouldn't processingobject still exist after the script terminates?
The script runs without displaying any error or warning messages. I tried assigning processingobject to a second variable, processingobject2 <- processingobject, but the second variable doesn't exist after the script ends either. I also tried clearing the global environment and restarting RStudio, but that did not help. If I type processingobject at the prompt after the script finishes, I get the message "Error: object 'processingobject' not found". The CSV file itself is perfectly normal as far as I can tell.
Obviously, I must be doing something very stupid. Please help me. Thanks.
Objects created inside a function live in that function's environment and vanish when the function returns, so you need to use return in your function and assign the result when you call it. Also, reproducible examples are always a good idea; they make it easier for people to help you out.
ncol <- 10
nrow <- 100
x <- matrix(runif(nrow * ncol), nrow, ncol)
write.csv(x, file = "example_data.csv")
newfunction <- function(input1) {
  processingobject <- read.csv("example_data.csv")
  result <- apply(processingobject, 2, function(x) x * input1)  # doing something to the csv with the input
  print(head(result))
  return(result)
}
newcsv <- newfunction(3)
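Applied to the original function, the same idea might look like this (keeping the unused input2 argument and the hypothetical processing-file.csv from the question):
newfunction <- function(input1, input2) {
  processingobject <- read.csv("processing-file.csv")  # exists only inside the function
  print(head(processingobject))
  return(processingobject)  # hand the data frame back to the caller
}
processingobject <- newfunction(1, 2)  # assign the returned value in the global environment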

R: Improving workflow and keeping track of output

I have what I think is a common enough issue, about optimising workflow in R. Specifically, how can I avoid ending up with a folder full of output (plots, RData files, csv, etc.) and, after some time, no clue where each file came from or how it was produced? In part, it surely involves trying to be intelligent about folder structure. I have been looking around, but I'm unsure of what the best strategy is.
So far, I have tackled it in a rather unsophisticated (overkill) way: I created a function MetaInfo (see below) that writes a text file with metadata, with a given file name. The idea is that if a plot is produced, this command is issued to produce a text file with exactly the same file name as the plot (except, of course, the extension), with information on the system, session, packages loaded, R version, and the function and file the metadata function was called from, etc. The questions are:
(i) How do people approach this general problem? Are there obvious ways to avoid the issue I mentioned?
(ii) If not, does anyone have any tips on improving this function? At the moment it's perhaps clunky and not ideal. In particular, getting the file name from which the plot is produced doesn't necessarily work (the solution I use is the one provided by @hadley in [1]). Any ideas would be welcome!
The function assumes git, so please ignore the probable warning produced. This is the main function, stored in a file metainfo.R:
MetaInfo <- function(message=NULL, filename)
{
  # message - character string - Any message to be written into the information
  #           file (e.g., data used).
  # filename - character string - the name of the txt file (including relative
  #            path). Should be the same as the output file it describes (RData,
  #            csv, pdf).
  #
  if (is.null(filename))
  {
    stop('Provide an output filename - parameter filename.')
  }
  filename <- paste(filename, '.txt', sep='')
  # Try to get as close as possible to getting the file name from which the
  # function is called.
  source.file <- lapply(sys.frames(), function(x) x$ofile)
  source.file <- Filter(Negate(is.null), source.file)
  t.sf <- try(source.file <- basename(source.file[[length(source.file)]]),
              silent=TRUE)
  if (class(t.sf) == 'try-error')
  {
    source.file <- NULL
  }
  func <- deparse(sys.call(-1))
  # MetaInfo isn't always called from within another function, so func could
  # return as NULL or as general environment.
  if (any(grepl('eval', func, ignore.case=TRUE)))
  {
    func <- NULL
  }
  time <- strftime(Sys.time(), "%Y/%m/%d %H:%M:%S")
  git.h <- system('git log --pretty=format:"%h" -n 1', intern=TRUE)
  meta <- list(Message=message,
               Source=paste(source.file, ' on ', time, sep=''),
               Functions=func,
               System=Sys.info(),
               Session=sessionInfo(),
               Git.hash=git.h)
  sink(file=filename)
  print(meta)
  sink(file=NULL)
}
which can then be called in another function, stored in another file, e.g.:
source('metainfo.R')
RandomPlot <- function(x, y)
{
  fn <- 'random_plot'
  pdf(file=paste(fn, '.pdf', sep=''))
  plot(x, y)
  MetaInfo(message=NULL, filename=fn)
  dev.off()
}
x <- 1:10
y <- runif(10)
RandomPlot(x, y)
This way, a text file with the same file name as the plot is produced, with information that could hopefully help figure out how and where the plot was produced.
In terms of general R organization: I like to have a single script that recreates all work done for a project. Any project should be reproducible with a single click, including all plots or papers associated with that project.
So, to stay organized: keep a different directory for each project, each project has its own functions.R script to store non-package functions associated with that project, and each project has a master script that starts like
## myproject
source("functions.R")
source("read-data.R")
source("clean-data.R")
etc... all the way through. This should help keep everything organized, and if you get new data you just go to early scripts to fix up headers or whatever and rerun the entire project with a single click.
There is a package called ProjectTemplate that helps organize and automate the typical workflow with R scripts, data files, charts, etc. There are also a number of helpful documents, like this one: Workflow of statistical data analysis by Oliver Kirchkamp.
If you use Emacs and ESS for your analyses, learning Org-Mode is a must. I use it to organize all my work. Here is how it integrates with R: R Source Code Blocks in Org Mode.
There is also this new free tool called Drake which is advertised as "make for data".
I think my question betrays a certain level of confusion. Having looked around, as well as explored the suggestions provided so far, I have reached the conclusion that it is probably not important to know where and how a file was produced. You should in fact be able to wipe out any output and reproduce it by rerunning the code. So while I might still use the above function for extra information, it really is a question of being ruthless and indeed cleaning up folders every now and then. These ideas are more eloquently explained here. This of course does not preclude the use of Make/Drake or ProjectTemplate, which I will try to pick up on. Thanks again for the suggestions @noah and @alex!
There is also now an R package called drake (Data Frames in R for Make), independent from Factual's Drake. The R package is also a Make-like build system that links code/dependencies with output.
install.packages("drake") # It is on CRAN.
library(drake)
load_basic_example()
plot_graph(my_plan)
make(my_plan)
Like its predecessor remake, it has the added bonus that you do not have to keep track of a cumbersome pile of files. Objects generated in R are cached during make() and can be reloaded easily.
readd(summ_regression1_small) # Read objects from the cache.
loadd(small, large) # Load objects into your R session.
print(small)
But you can still work with files as single-quoted targets. (See 'report.Rmd' and 'report.md' in my_plan from the basic example.)
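To give a flavour of how a custom plan might be declared (a sketch only; drake's interface has changed over time, so drake_plan() and file_in() assume a release newer than the basic example above, and data.csv is a hypothetical input file):
library(drake)
plan <- drake_plan(
  raw  = read.csv(file_in("data.csv")),  # rebuilt when data.csv changes
  summ = summary(raw)                    # rebuilt when raw changes
)
make(plan)   # builds only what is outdated and caches the results
readd(summ)  # later: reload the cached object without rerunning anything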
There is also a package developed by RStudio called pins that might address this problem.

How to stop normalizePath from returning a tokenized path

I have an R program that does a bunch of data analysis and outputs the results to a text file. Unfortunately when I call
system2("open", file_path_as_absolute(toFile));
The R interpreter states the following
'../output/HLA-A,B,C,DR,DP,DQ' not found
'GT2' not found
'vs' not found
'LT2_DRB1_output2of9.10.11.12.13.14.16.25.26.28.30.31.32.33.37.38.40.47.57.58.60
.67.70.71.73.74.77.78.85.86.txt' not found
I am assuming, based upon this error, that the file_path_as_absolute tokenizes the file name, but I am not sure how to disable this. I have also tried normalizePath(), but I get the same error.
Edit
The file itself is called
"HLA-A,B,C,DR,DP,DQ GT2 vs LT2_DRB1_output2of9.10.11.12.13.14.16.25.26.28.30.31.32.33.37.38.40.47.57.58.60.67.70.71.73.74.77.78.85.86.txt"
and is located in ../output
Here is the code I ran to open the file, which gave me the same error
openSesame <- paste0('"', file_path_as_absolute(toFile), '"');
system2("open", openSesame);
I can get the following code to work on my system, starting in a directory where sister directory ../output exists (I don't have open on my system so I used gedit as the external command):
fn <- "HLA-A,B,C,DR,DP,DQ GT2 vs LT2_DRB1_output2of9.10.11.12.13.14.16.25.26.28.30.31.32.33.37.38.40.47.57.58.60.67.70.71.73.74.77.78.85.86.txt"
prot <- function(x) paste0('"',x,'"')
writeLines(c("a","b"),con=paste0("../output/",fn))
n <- tools::file_path_as_absolute(paste0("../output/",fn))
file.show(n)
system2("gedit",prot(n))
Can you make a reproducible example along these lines that fails on your system? (Does file.show() work for you?)
(Posting as an answer rather than a comment for purposes of code formatting ...)
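As an aside, those "not found" messages are consistent with the unquoted path being split on its spaces into separate arguments before it reaches the external command (file_path_as_absolute itself does not tokenize anything), which is why quoting fixes it. Base R also ships shQuote() for this kind of quoting, so the hand-rolled prot() helper could probably be replaced with the following (untested sketch):
n <- tools::file_path_as_absolute(paste0("../output/", fn))  # fn as defined above
system2("gedit", shQuote(n))  # shQuote() wraps the path in shell-safe quotes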

R - Keep log of all plots

I do a lot of data exploration in R and I would like to keep every plot I generate (from the interactive R console). I am thinking of a directory where everything I plot is automatically saved as a time-stamped PDF. I also do not want this to interfere with the normal display of plots.
Is there something that I can add to my ~/.Rprofile that will do this?
The general idea is to write a script generating the plot in order to regenerate it. The ESS documentation (in a README) says it well under 'Philosophies for using ESS':
The source code is real. The objects are realizations of the
source code. Source for EVERY user modified object is placed in a
particular directory or directories, for later editing and
retrieval.
With any editor that allows stepwise (or region-wise) execution of commands, you can keep track of your work this way.
The best approach is to use a script file (or sweave or knitr file) so that you can just recreate all the graphs when you need them (into a pdf file or other).
But here is the start of an approach that does the basics of what you asked:
savegraphs <- local({
  i <- 1
  function() {
    if (dev.cur() > 1) {
      filename <- sprintf('graphs/SavedPlot%03d.pdf', i)
      dev.copy2pdf(file = filename)
      i <<- i + 1
    }
  }
})
setHook('before.plot.new', savegraphs)
setHook('before.grid.newpage', savegraphs)
Now just before you create a new graph the current one will be saved into the graphs folder of the current working folder (make sure that it exists). This means that if you add to a plot (lines, points, abline, etc.) then the annotations will be included. However you will need to run plot.new in order for the last plot to be saved (and if you close the current graphics device without running another plot.new then that last plot will not be saved).
This version will overwrite plots saved from a previous R session in the same working directory. It will also fail if you use something other than base or grid graphics (and maybe even with some complicated plots). I would not be surprised if the occasional extra plot shows up (when a plot is created internally just to get some parameters and is then immediately replaced with the one of interest). There are probably other things that I have overlooked as well, but this might get you started.
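If you want this available in every session, a sketch of wiring the same hooks into ~/.Rprofile might look like the following (it assumes the graphs/ convention above, guards against non-interactive sessions, and references grDevices:: explicitly because few packages are attached while .Rprofile runs):
## Sketch for ~/.Rprofile: save the previous plot to graphs/ before each new one.
if (interactive()) local({
  i <- 0
  savegraphs <- function() {
    if (grDevices::dev.cur() > 1) {
      i <<- i + 1
      dir.create("graphs", showWarnings = FALSE)
      grDevices::dev.copy2pdf(file = sprintf("graphs/SavedPlot%03d.pdf", i))
    }
  }
  setHook("before.plot.new", savegraphs)
  setHook("before.grid.newpage", savegraphs)
})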
You could write your own wrapper functions for your commonly used plot functions. The wrapper would produce both the on-screen display and a time-stamped PDF version. You could source() this function in your ~/.Rprofile so that it's available every time you run R.
For lattice's xyplot, using the windows device for the on-screen display:
library(lattice)
my.xyplot <- function(...){
  dir.create(file.path("~", "RPlots"))
  my.chart <- xyplot(...)
  trellis.device(device = "windows", height = 8, width = 8)
  print(my.chart)
  trellis.device(device = "pdf",
                 file = file.path("~", "RPlots",
                                  paste("xyplot", format(Sys.time(), "_%Y%m%d_%H-%M-%S"),
                                        ".pdf", sep = "")),
                 paper = "letter", width = 8, height = 8)
  print(my.chart)
  dev.off()
}
my.data <- data.frame(x=-100:100)
my.data$y <- my.data$x^2
my.xyplot(y~x,data=my.data)
As others have said, you should probably get in the habit of working from an R script, rather than working exclusively from the interactive terminal. If you save your scripts, everything is reproducible and modifiable in the future. Nonetheless, a "log of plots" is an interesting idea.

Resources