Passing values to a sourced file in R

Is there a way in R to pass the values of some variables, say strings, defined in a script to another script that is being sourced, so that the latter can use them without having to declare them? E.g.:
some R code
...
...
var1 <- "some string"
var2 <- "some param"
source("header.r")
Within header.r, a list() has slots named after the strings held in var1 and var2:
tabl <- alldata.list[["some string"]][["some param"]]
Such that, when I run the original script and source the header, tabl is addressed properly?
Additionally, is there a restriction on the number and type of elements that can be passed?

When you use source to load a .R file, this sequentially runs the lines in that script, merging everything in that script into your running R session. All variables and functions are available from that moment onwards.
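As a minimal illustration of this (the contents of alldata.list are invented here, since the question does not show them):

# main.R
var1 <- "some string"
var2 <- "some param"
source("header.r")   # header.r is run in this session and can see var1 and var2
print(tabl)

# header.r
alldata.list <- list("some string" = list("some param" = data.frame(x = 1:3)))
tabl <- alldata.list[[var1]][[var2]]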
To make your code more readable, maintainable and debuggable, though, I would recommend not using variables to communicate between source files. Instead, I would use functions. In practice this means that I have one or more files containing helper functions (a sort of package-light). These helper functions abstract away some of the functionality you need in the main script, making it shorter and more to the point. The goal is to create a main script that roughly fills one screen. That way you can easily grasp the main idea of the script; any details can be found in the helper functions.
Using functions makes the main script self-contained and not dependent on what happens in executable code in other source files. This requires less reasoning by yourself and others to determine what exactly the script is doing, as you basically just have to read 40-50 lines of code.
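For example, header.r could expose a function instead of relying on variables defined by whoever sources it. A sketch (the function name get_table and the contents of alldata.list are made up for illustration):

# header.r
get_table <- function(name, param) {
  alldata.list <- list("some string" = list("some param" = data.frame(x = 1:3)))
  alldata.list[[name]][[param]]
}

# main.R
source("header.r")
tabl <- get_table(var1, var2)

The main script then states explicitly which values are passed, rather than depending on variables that happen to exist in the calling environment.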

Related

How to use a file modified by a R chunk in a Python one

I am working in R Markdown, primarily with R chunks, which I use to modify data frames. Now that they are ready, a colleague gave me Python code to process some of the data. But when transitioning from an R chunk to a Python one, the environment changes and I do not know how to use the previous data frames directly.
reticulate::repl_python()
biodata_file = women_personal_data
NameError: name 'women_personal_data' is not defined
Ideally, I would like not to have to save the files on my computer between R and Python, and then back at R again, to avoid accumulating files that are not completely clean yet (because I figured it could be a solution).
I tried this solution but it does not seem to work with data frames.
Thanks !
biodata_file = r.women_personal_data
The r. prefix makes Python take it from R, where the variable was called women_personal_data.
Tip: to come back to R, the Python variable is accessed as py$biodata_file.
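For illustration, a minimal round trip using the variable names from the question (an untested sketch; it assumes reticulate and pandas are available, with chunk boundaries marked by comments):

# --- R chunk ---
library(reticulate)
women_personal_data <- data.frame(id = 1:3, age = c(31, 42, 28))

# --- Python chunk ---
biodata_file = r.women_personal_data        # r. pulls the R data frame into Python (as pandas)
biodata_file["age_months"] = biodata_file["age"] * 12

# --- R chunk again ---
head(py$biodata_file)                       # py$ pulls the Python object back into R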

How to combine multiple similar scripts into one in R?

I have 48 scripts used to clean data corresponding to 48 different tests. The cleaning protocols for each test used to be unique and test-specific, but the final project guideline now allows all tests to use the same cleaning protocol, provided they save all output files to the appropriate directory (each test's own folder of results). I'm trying to combine these tests into one master cleaning script that any team member can use to clean data as more is collected, or make small changes to, given they have the raw data files and a folder for each test (which I would give to them).
Currently I have tried two approaches:
The first is to include all necessary libraries in the body of a master cleaning script and then source() each individual cleaning script. Inside each script, the libraries are require()d, the appropriate files are read in, and the outputs are saved to their correct destinations. This method seems to work best, but if the whole script is run, some subtests are successfully cleaned and saved to their correct locations while the rest need to be saved individually; I'm not sure why.
library(readr)
library(dplyr)
library(data.table)
library(lubridate)
source("~/SF_Cleaning_Protocol.R")
# etc., one source() call per cleaning script
The second is to save the body of the general cleaning script as a function, and then call that function in a series of if statements based on the test one wants to clean.
For example:
if (testname == "SF"){
  setwd("~/SF")
  # read in the csv files
  subtest <- read_csv()
  path_map <- read_csv()
  SpecIDs <- read_csv()
  CleaningProtocol(subtest, path_map, SpecIDs)
  write.csv("output1.csv")
  write.csv("output2.csv")
  write.csv("output3.csv")
  write.csv("output4.csv")
} else if (testname == "EV"){
  # etc.
}
The code reads in and prints out files fine when a test is selected individually, but when testname is specified and the script is run as a whole, it ignores the if statements, runs all tests, and fails to print results for any of them.
Is there a better option I haven't tried, or can anyone help me diagnose my issues?
Many thanks.
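One way to avoid the long if/else chain above would be to drive every test through the same parameterized function. A rough sketch under stated assumptions (the per-test directory layout, the input file names, and the CleaningProtocol() signature are guesses, not tested code):

library(readr)

clean_one_test <- function(testname) {
  test_dir <- file.path("~", testname)              # e.g. ~/SF, ~/EV, ...
  subtest  <- read_csv(file.path(test_dir, "subtest.csv"))
  path_map <- read_csv(file.path(test_dir, "path_map.csv"))
  SpecIDs  <- read_csv(file.path(test_dir, "SpecIDs.csv"))
  out <- CleaningProtocol(subtest, path_map, SpecIDs)
  write.csv(out, file.path(test_dir, "output1.csv"), row.names = FALSE)
}

for (testname in c("SF", "EV")) {                   # extend to all 48 test codes
  clean_one_test(testname)
}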

Designing a centralized file import system

This is an environment design question. I have a number of analysis/forecasting scripts I run each week, and each one relies on a number of files, with most files used by more than one script. I just had to change the name of one of the files, which was a real pain because I had to search through all my scripts and change the path declared in each one.
I would like to use a single .csv master file with file names and their paths, and create a centralized function that takes a list of file names, looks up their file paths, and then imports them all into the global environment. I could use this function in every script I run. Something like:
files_needed <- c("File_1", "File_2", "File_4", "File_6")
import_files(files_needed)
But then the function would require indirect variable assignment and declaring global variables, which I know are frowned upon, and I don't even know how to do both at once. I know I can write the logic for importing the file path names manually in every script, but there must be a better option, where I write the import logic only once.
Currently I have a master file that I source at the beginning of every script which loads my most commonly used packages and declares some helper functions I use frequently. I'd love to add this importing functionality in some capacity, but I'm open to solutions that look completely different to what I described. How do people generally solve this problem?
As a final note, many files have another twist, where they incorporate e.g. a date into the file name, so I need to be able to pass additional parameters in order to get the one I need.
Without a worked example this is untested code, but why not just make a list of imported files using those names?
files_needed <- c("File_1", "File_2", "File_4", "File_6")
my_imported_files <-
  setNames(lapply(files_needed, read.csv), paste0(files_needed, "_df"))
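Building on that idea, the lookup against a master file could live inside the function, so each script only names the files it needs. A sketch (the master file "file_paths.csv" and its columns name and path are hypothetical):

import_files <- function(files_needed, master = "file_paths.csv") {
  lookup <- read.csv(master, stringsAsFactors = FALSE)   # columns: name, path
  paths  <- lookup$path[match(files_needed, lookup$name)]
  setNames(lapply(paths, read.csv), files_needed)
}

my_files <- import_files(c("File_1", "File_2", "File_4", "File_6"))
head(my_files[["File_1"]])

Returning a named list sidesteps assigning into the global environment from inside the function.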

R load script objects to workspace

This is a rookie question that I cannot seem to figure out. Say you've built an R script that manipulates a few data frames. You run the script, it prints out the result. All is well. How is it possible to load objects created within the script to be used in the workspace? For example, say the script creates data frame df1. How can we access that in the workspace? Thanks.
Here is the script: a simple function that just reads a csv file and computes the difference between columns 2 and 3. Basically I would like to access spdat in the workspace.
mspreaddata <- function(filename){
  # read csv file
  rdat <- read.csv(filename, header = T, sep = ",")
  # compute spread value: column 2 - column 3
  spdat$sp <- rdat[,2] - rdat[,3]
}
You should use the source function.
i.e. use source("script.R")
EDIT:
Check the documentation for further details. It'll run the script you call. The objects will then be in your workspace.
Alternatively you can save those objects using save and then load them using load.
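For the save/load route, a minimal sketch (the object name df1 is taken from the question; the file name is arbitrary):

# inside the script
df1 <- data.frame(a = 1:3)
save(df1, file = "df1.RData")

# later, in your interactive session
load("df1.RData")   # recreates df1 in the workspace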
When you source that, the function mspreaddata is available in your workspace, but spdat is never created: you are just defining a function, not running it. The object spdat would only exist within that function, not in any environment external to it. You should add something like
newObject <- mspreaddata("filename.csv")
Then you can access newObject
EDIT:
It is also the case that spdat is never created inside your function, so the line spdat$sp <- rdat[,2] - rdat[,3] is itself incorrect. Simply use return(rdat[,2] - rdat[,3]) instead.
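Putting both fixes together, a corrected version might look like this (a sketch; it assumes columns 2 and 3 of the csv are numeric):

mspreaddata <- function(filename){
  rdat <- read.csv(filename, header = TRUE, sep = ",")
  rdat[,2] - rdat[,3]   # the last expression is the function's return value
}

spdat <- mspreaddata("filename.csv")   # spdat now exists in the workspace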

multiple working directories in R

I wrote a number of functions and scripts and put them in subfolders of the working directory, so I can divide all my functions into categories (descriptive statistics, geostatistics, regression, ...).
When I type source("function_in_subfolder") R tells me that there is no function.
I understood that it happens because functions have to stay in the working directory.
Is there a way to set also subfolders of the working directory as source for the functions (let's say in a hierarchical way)?
The source function has a chdir argument which, if set to TRUE, will set the working directory to the one where the sourced script resides. The new working directory is valid for the duration of the execution of the script; after that, it is changed back. Assuming the following structure
main.R
one/
    script.R
    two/
        subscript.R
you can call source("one/script.R", chdir=T) from main.R and, in script.R, call source("two/subscript.R", chdir=T).
However, by default, R will start its search from the current directory. There is no such thing as a "list of search paths" like, e.g., the PATH environment variable, although apparently someone attempted to create such a thing. I would strongly advise against attempting to find a script file "anywhere". Instead, indicate precisely which script is to be run at which point. Otherwise, name clashes resulting from simply adding a file to your scripts can lead to unpredictable behavior which is also difficult to debug.
One solution is to use list.files to get the full path of your function, for example:
myfunction.path <- list.files(getwd(),
                              recursive = TRUE, full.names = TRUE,
                              pattern = '^myfunction.R$')
Then you can call it :
source(myfunction.path)
The recursive call to list.files can be expensive, so you may want to call it once at the beginning of your analysis and store all function paths in a named list. And BE CAREFUL: the result may not be unique if you create two source files with the same name in two different sub-directories.
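A sketch of that approach (it assumes every helper file has a unique base name across the sub-directories):

# scan once and store the full paths, keyed by file name
fun_files <- list.files(getwd(), pattern = "\\.R$",
                        recursive = TRUE, full.names = TRUE)
fun_paths <- setNames(as.list(fun_files), basename(fun_files))

source(fun_paths[["myfunction.R"]])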
