multiple working directories in R - r

I wrote a list of different functions and script and I put them in some subfolders of the working directory so I can divide all my functions in arguments (descriptive statistic, geostatistic, regression....)
When I type source("function_in_subfolder") R tells me that there is no function.
I understood that it happens because functions have to stay in the working directory.
Is there a way to set also subfolders of the working directory as source for the functions (let's say in a hierarchical way)?

The source function has a chdir argument which, if set to TRUE, temporarily sets the working directory to the one where the sourced script resides. The new working directory is only in effect while the script runs; afterwards it is changed back. Assuming the following structure:
main.R
one/
    script.R
    two/
        subscript.R
you can call source("one/script.R", chdir=T) from main.R and, in script.R, call source("two/subscript.R", chdir=T).
However, by default, R will start its search from the current directory. There is no such thing as a "list of search paths" like, e.g., the PATH environment variable, although apparently someone attempted to create such a thing. I would strongly advise against attempting to find a script file "anywhere". Instead, indicate precisely which script is to be run at which point. Otherwise, name clashes resulting from simply adding a file to your scripts can lead to unpredictable behavior which is also difficult to debug.
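Putting the two calls together, a minimal untested sketch of the two files, using the layout above:
# main.R (project root)
source("one/script.R", chdir = TRUE)

# one/script.R -- while this runs, chdir = TRUE has made "one/" the working
# directory, so the nested path below is relative to "one/"
source("two/subscript.R", chdir = TRUE)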

One solution is to use list.files to get the full path of your function, for example:
myfunction.path <- list.files(getwd(),
                              recursive = TRUE, full.names = TRUE,
                              pattern = "^myfunction\\.R$")
Then you can call it:
source(myfunction.path)
The recursive call of list.files can be expensive, so you may want to call it once at the beginning of your analysis and store all of the function paths in a named list. And be careful: the result may not be unique if you have two source files with the same name in two different sub-directories.
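For example, a rough untested sketch of caching the paths up front, assuming every function file ends in .R and has a unique base name:
# Scan the project tree once and build a named lookup of function paths.
# If two files share a base name, the lookup below returns only the first match.
fun_files <- list.files(getwd(), recursive = TRUE, full.names = TRUE,
                        pattern = "\\.R$")
fun_paths <- setNames(fun_files, tools::file_path_sans_ext(basename(fun_files)))

# Later, source a function by name without re-scanning the tree:
source(fun_paths[["myfunction"]])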

Related

Designing a centralized file import system

This is an environment design question. I have a number of analysis/forecasting scripts I run each week, and each one relies on a number of files, with most files used by more than one script. I just had to change the name of one of the files, which was a real pain because I had to search through all my scripts and change the path declared in each one.
I would like to use a single .csv master file with file names and their paths, and create a centralized function that takes a list of file names, looks up their file paths, and then imports them all into the global environment. I could use this function in every script I run. Something like:
files_needed <- c("File_1", "File_2", "File_4", "File_6")
import_files(files_needed)
But then the function would require indirect variable assignment and declaring global variables, which I know are frowned upon, and I don't even know how to do both at once. I know I can write the logic for importing the file path names manually in every script, but there must be a better option, where I can just write the import logic once.
Currently I have a master file that I source at the beginning of every script which loads my most commonly used packages and declares some helper functions I use frequently. I'd love to add this importing functionality in some capacity, but I'm open to solutions that look completely different to what I described. How do people generally solve this problem?
As a final note, many files have another twist, where they incorporate e.g. a date into the file name, so I need to be able to pass additional parameters in order to get the one I need.
Without a worked example this is untested code, but why not just make a list of imported files using those names?
files_needed <- c("File_1", "File_2", "File_4", "File_6")
my_imported_files <- setNames(lapply(files_needed, read.csv),
                              paste0(files_needed, "_df"))
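Building on that, a rough untested sketch of the lookup-based importer the question describes. The master file "file_paths.csv" and its name/path columns are assumptions and would need to match your own master file:
# Sketch of a lookup-based importer using a master CSV of names and paths.
import_files <- function(files_needed, master = "file_paths.csv") {
  lookup <- read.csv(master, stringsAsFactors = FALSE)
  paths <- lookup$path[match(files_needed, lookup$name)]
  if (anyNA(paths)) {
    stop("No path recorded for: ",
         paste(files_needed[is.na(paths)], collapse = ", "))
  }
  setNames(lapply(paths, read.csv), paste0(files_needed, "_df"))
}

# Returning a named list avoids global assignment; if you really want the data
# frames in the global environment you could still do:
# list2env(import_files(files_needed), envir = globalenv())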

setwd error: directories within directories

Sorry this is long, but I'm a novice and want to be specific.
I have varied numbers of dataframes within a set of directories, within a set of directories. (That's 60 inner directories, hence I'm attempting to automate this.) My goal is to list and open each outer directory; within it, list and open each inner directory; and within that, perform some simple functions with the dataframes there (average some values, etc.).
The script returns "Error in setwd(inner) : cannot change working directory", and instead performs the function on files in the first outer directory only. I think the script is calling the functions in the wrong order; perhaps it's because I nested for loops so that both setwd(inner) and setwd('..') are inside setwd(outer) and setwd('..'), in order to access every directory within every directory. It's not a recursion or path-name issue, because the same error results whether recursive and full.names are TRUE or FALSE in my list of directories (from list.dirs).
I've read about the downfalls of using setwd, but I'm the only analyst and don't need to share the script with other people/machines/OSs (I use RStudio in Mac OS 10.7.5). Are there better functions than setwd for analyzing all files in each directory in each directory? Or do I need to use a simpler script to work only within an inner directory, and apply it by hand individually to those 60 directories? Thank you for reading and thank you in advance for any advice you can offer!
I would use the list.files function that ships with base R. list.files will search a folder recursively for files, and you can also supply a pattern so that the function only returns files whose names match it.
list.files will return the relative path to the files that you are looking for so you can read each dataframe without having to change your working directory.
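For example, an untested sketch along those lines; the outer path and the per-file summary are placeholders for your own location and calculations:
# Read every CSV under the outer directory without any setwd().
csv_paths <- list.files("path/to/outer", pattern = "\\.csv$",
                        recursive = TRUE, full.names = TRUE)
results <- lapply(csv_paths, function(p) {
  df <- read.csv(p)
  mean(df[[1]], na.rm = TRUE)   # replace with your real per-file calculation
})
names(results) <- csv_paths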
I hope you will find this useful.
Let me know if you need any other help.
Cheers

How to converge multiple R files into one single file

Situation
I wrote an R program which I split up into multiple R-files for the sake of keeping a good code structure.
There is a Main.R file which references all the other R-files with the 'source()' command, like this:
source(paste(getwd(), dirname1, 'otherfile1.R', sep="/"))
source(paste(getwd(), dirname3, 'otherfile2.R', sep="/"))
...
As you can see, the working directory needs to be set correctly in advance; otherwise, this could go wrong.
Now, if I want to share this R program with someone else, I have to pass all the R files and folders in relative order of each other for things to work. Hence my next question.
Question
Is there a way to replace all the 'source' commands with the actual R script code which it refers to? That way, I have a SINGLE R script file, which I can simply pass along without having to worry about setting the working directory.
I'm not looking for a solution which is an 'R package' (which, by the way, is one single directory, so I would lose my own directory structure). I'm simply wondering if there is an easy way to combine these self-referencing R files into one single file.
Thanks,
OK, I think you could do something like scanning all the files and then writing them out again into a single new one. This can be done using readLines and sink:
sink("mynewRfile.R")
for(i in Nfiles){
current_file = readLines(filedir[i])
cat("\n\n#### Current file:",filedir[i],"\n\n")
cat(current_file, sep ="\n")
}
sink()
Here I have assumed that all your file paths are in a vector filedir of length Nfiles; I guess you can adapt that.
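If it helps, one untested way to build that filedir vector, assuming the scripts live under the current working directory:
# Scan the project for .R files and drop the combined output file itself,
# so it is not copied into itself.
filedir <- list.files(getwd(), pattern = "\\.R$", recursive = TRUE,
                      full.names = TRUE)
filedir <- setdiff(filedir, file.path(getwd(), "mynewRfile.R"))
Nfiles <- length(filedir)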

Defining a 'scripts' directory in R?

I am working with R in several directories containing model output I'd like to analyse and plot. I maintain a single 'scripts' directory for this project.
I'd like to be able to 'point' an environment variable at this scripts directory so that I could tab complete source(...) commands. Is this a possibility?
So far, I've managed to create an RPATH environment variable and have written a function in my .Rprofile which lists the directory's contents without me having to type it out. I can't quite figure out how I'd get tab completion, though.
Any help/advice would be greatly appreciated.
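No answer is recorded here, but a rough untested sketch of a .Rprofile helper along the lines the question describes; list_scripts and source_script are made-up names, and this does not solve the tab-completion part:
# In .Rprofile: resolve script names against the RPATH environment variable.
list_scripts <- function() list.files(Sys.getenv("RPATH"), pattern = "\\.R$")
source_script <- function(name) source(file.path(Sys.getenv("RPATH"), name))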

Passing values to a sourced file in R

Is there a way in R to pass the values of some variables, say strings, defined in a script to another script that is being sourced so that the latter can use them without having to declare them? Eg:
# some R code
# ...
# ...
var1 <- "some string"
var2 <- "some param"
source("header.r")
Within header.r a list() has the slots with the names of the strings in var1 and var2:
tabl <- alldata.list[["some string"]][["some param"]]
Such that when I run the original script and call the header, tabl will be addressed properly?
Additionally, is there a restriction on the number and type of elements that can be passed?
When you use source to load a .R file, this sequentially runs the lines in that script, merging everything in that script into your running R session. All variables and functions are available from that moment onwards.
To make your code more readable/maintainable/debuggable, though, I would recommend not using variables to communicate between source files. Instead, I would use functions. In practice, for me this means that I have one or more files which contain helper functions (sort of a package-light). These helper functions abstract away some of the functionality you need in the main script, making it shorter and more to-the-point. The goal is to create a main script that roughly fills a screen. In this way you can easily grasp the main idea of the script, and any details can be found in the helper functions.
Using functions makes the main script self contained and not dependent on what happens in executable code in other source files. This requires less reasoning by yourself and others to determine what the script is exactly doing as you basically just have to read 40-50 lines of code.
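For illustration, an untested sketch of that pattern, reusing names from the question; lookup_table is a made-up helper, and alldata.list is assumed to already exist in the main script:
# helpers.R -- only function definitions, no top-level executable code
lookup_table <- function(alldata.list, var1, var2) {
  alldata.list[[var1]][[var2]]
}

# main script -- the values are passed explicitly as arguments instead of
# being picked up from the calling environment by header.r
source("helpers.R")
tabl <- lookup_table(alldata.list, "some string", "some param")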
