I am performing a set of analyses in R. The flow of each analysis is reading in a data frame (i.e. input_dataframe) and performing a set of calculations that results in a new, smaller data frame (called final_result). The exact same set of calculations is performed on 23 different files, each of which contains a data frame.
My question is as follows: for each of the 23 files that is read in, I am trying to save a unique R object. How do I do so? When I save the resulting final_result data frame to an R object (using save()), I cannot later read all 23 objects into a new R session without the different R objects overriding each other. Other suggestions (such as Create a variable name with "paste" in R?) did not work for me, since they rely on calling the new variable by its name once it has been assigned, which I cannot do in this case.
To Summarize/Reword: Is there a way to save an object in R but change the name of the object for when it will be loaded later?
For example:
x=5
magicSave(x,file="saved_variable_1.r",to_save_as="result_1")
x=93
magicSave(x,file="saved_variable_2.r",to_save_as="result_2")
load("saved_variable_1.r")
load("saved_variable_2.r")
result_1
#returns 5
result_2
#returns 93
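For what it's worth, the hypothetical magicSave() sketched above can actually be written in a few lines: assign() binds the value to the new name, and save(list = ...) saves it under that name. (magicSave and to_save_as are the question's invented names, not base R; the tempdir() paths are just for the demo.)

```r
# Sketch of the hypothetical magicSave(): save `obj` under the name `to_save_as`
magicSave <- function(obj, file, to_save_as) {
  assign(to_save_as, obj)                       # bind value to the new name locally
  save(list = to_save_as, file = file, envir = environment())
}

x <- 5
magicSave(x, file = file.path(tempdir(), "saved_variable_1.RData"), to_save_as = "result_1")
x <- 93
magicSave(x, file = file.path(tempdir(), "saved_variable_2.RData"), to_save_as = "result_2")

load(file.path(tempdir(), "saved_variable_1.RData"))
load(file.path(tempdir(), "saved_variable_2.RData"))
result_1  # 5
result_2  # 93
```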
In R it's generally a good idea to store as a list everything that can be seen as a list; it will make everything more elegant afterwards.
First you put all your paths in a list or a vector :
paths <- c("C:/somewhere/file1.csv",
"C:/somewhere/file2.csv") # etc
Then you read them :
objects <- lapply(paths,read.csv) # objects is a list of tables
Then you apply your transformation on each element :
output <- lapply(objects,transformation_function)
And then you can save your output (I find saveRDS cleaner than save, as you know what variables you'll be bringing into your workspace when loading):
saveRDS(output,"C:/somewhere/output.RDS")
which you will load with
output <- readRDS("C:/somewhere/output.RDS")
OR if you prefer for some reason to save as different objects:
output_paths <- paste0("C:/somewhere/output",seq_along(output),".RDS")
Map(saveRDS,output,output_paths)
To load later with:
output <- lapply(output_paths, readRDS)
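Putting the whole pipeline together as a self-contained sketch (the file names and the multiply-by-10 "transformation" are made up for illustration):

```r
# Demo of the list approach: write two small csv files, read, transform, save, reload
tmp <- tempdir()
paths <- file.path(tmp, c("file1.csv", "file2.csv"))
write.csv(data.frame(x = 1:3), paths[1], row.names = FALSE)
write.csv(data.frame(x = 4:6), paths[2], row.names = FALSE)

objects <- lapply(paths, read.csv)                    # list of data frames
output  <- lapply(objects, function(d) transform(d, x = x * 10))

saveRDS(output, file.path(tmp, "output.RDS"))         # one file holds the whole list
output2 <- readRDS(file.path(tmp, "output.RDS"))
output2[[2]]$x  # 40 50 60
```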
x=5
write.csv(x,"one_thing.csv", row.names = F)
x=93
write.csv(x,"two_thing.csv", row.names = F)
result_1 <- read.csv("one_thing.csv")
result_2 <- read.csv("two_thing.csv")
result_1
# x
# 1 5
result_2
# x
# 1 93
I am working with a large dataset in an R package.
I need to get all of the separate data frames into my global environment, preferably into a list of data frames so that I can use lapply to do some repetitive operations later.
So far I've done the following:
l.my.package <- data(package="my.package")
lc.my.package <- l.my.package[[3]]
lc.df.my.package <- as.data.frame(lc.my.package)
This effectively creates a data frame of the location and name of each of the .RData files in my package, so I can load them all.
I have figured out how to load them all using a for loop.
I create a vector of path names and feed it into the loop:
f <- path('my/path/folder', lc.df.my.package$Item, ext="rdata")
f.v <- as.vector(f)
for (i in f.v) {load(i)}
This loads everything into separate data frames (as I want), but it obviously doesn't put the data frames into a list. I thought lapply would work here, but when I use lapply, the resulting list is a list of character strings (the title of each dataframe with no data included). That code looks like this:
f.l <- as.list(f)
func <- function(i) {load(i)}
df.list <- lapply(f.l, func)
I am looking for one of two possible solutions:
how can I efficiently collect the output of for loop into a list (a "while" loop would likely be too slow)?
how can I adjust lapply so the output includes each entire dataframe instead of just the title of each dataframe?
Edit: I have also tried introducing the "envir=.GlobalEnv" argument into load() within lapply. When I do that, the data frames load, but still not in a list. The list still contains only the names as character strings.
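The reason for that behaviour is that load() returns the names of the restored objects (character strings), not the objects themselves. One way around it (a sketch, assuming one data frame per file) is to load each file into a scratch environment and pull the object out, so lapply() really returns the data frames:

```r
# Load one .RData file into its own environment and return the object inside
load_one <- function(path) {
  e  <- new.env()
  nm <- load(path, envir = e)   # load() returns the restored object's name
  e[[nm[1]]]                    # look the object up and return it
}

# Demo with a temporary .RData file standing in for the package data files
demo_path <- file.path(tempdir(), "m1.RData")
m1 <- data.frame(a = 1:2)
save(m1, file = demo_path)

df.list <- lapply(c(demo_path), load_one)
nrow(df.list[[1]])  # 2
```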
If you are willing to use a packaged solution, I wrote a package called libr that does exactly what you are asking for. Here is an example:
library(libr)
# Create temp directory
tmp <- tempdir()
# Save some data to temp directory
# for illustration purposes
saveRDS(trees, file.path(tmp, "trees.rds"))
saveRDS(rock, file.path(tmp, "rocks.rds"))
# Create library
libname(dat, tmp)
# library 'dat': 2 items
# - attributes: not loaded
# - path: C:\Users\User\AppData\Local\Temp\RtmpCSJ6Gc
# - items:
# Name Extension Rows Cols Size LastModified
# 1 rocks rds 48 4 3.1 Kb 2020-11-05 23:25:34
# 2 trees rds 31 3 2.4 Kb 2020-11-05 23:25:34
# Load library
lib_load(dat)
# Examine workspace
ls()
# [1] "dat" "dat.rocks" "dat.trees" "tmp"
# Unload the library from memory
lib_unload(dat)
# Examine workspace again
ls()
# [1] "dat" "tmp"
@rawr's response works perfectly:
df.list <- mget(l.my.package$results[, 'Item'], inherits = TRUE)
I have several .RData files, each of which has letters and numbers in its name, eg. m22.RData. Each of these contains a single data.frame object, with the same name as the file, eg. m22.RData contains a data.frame object named "m22".
I can generate the file names easily enough with something like datanames <- paste0(c("m","n"),seq(1,100)) and then use load() on those, which will leave me with a few hundred data.frame objects named m1, m2, etc. What I am not sure of is how to do the next step -- prepare and merge each of these dataframes without having to type out all their names.
I can make a function that accepts a data frame as input and does all the processing. But if I pass it datanames[22] as input, I am passing it the string "m22", not the data frame object named m22.
My end goal is to repeatedly do the same steps on a bunch of different data frames without manually typing out "prepdata(m1) prepdata(m2) ... prepdata(n100)". I can think of two ways to do it, but I don't know how to implement either of them:
Get from a vector of the names of the data frames to a list containing the actual data frames.
Modify my "prepdata" function so that it can accept the name of the data frame, but then still somehow be able to do things to the data frame itself (possibly by way of "assign"? But the last step of the function will be to merge the prepared data to a bigger data frame, and I'm not sure if there's a method that uses "assign" that can do that...)
Can anybody advise on how to implement either of the above methods, or another way to make this work?
See this answer and the corresponding R FAQ
Basically:
temp1 <- c(1,2,3)
save(temp1, file = "temp1.RData")
x <- c()
x[1] <- load("temp1.RData")
get(x[1])
#> [1] 1 2 3
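The same load()/get() pattern scales to many files: collect the names that load() returns, then mget() the objects into a named list (the file names here are illustrative):

```r
# Save two objects under different names, then reload both into one named list
temp1 <- c(1, 2, 3); save(temp1, file = file.path(tempdir(), "temp1.RData"))
temp2 <- c(4, 5);    save(temp2, file = file.path(tempdir(), "temp2.RData"))
rm(temp1, temp2)

files <- file.path(tempdir(), c("temp1.RData", "temp2.RData"))
loaded_names <- unlist(lapply(files, load, envir = .GlobalEnv))  # names, not objects
obj_list <- mget(loaded_names, envir = .GlobalEnv)               # fetch the objects
obj_list$temp2  # 4 5
```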
Assuming all your data exists in the same folder, you can create an R object with all the paths, then create a function that takes a path to an .RData file, reads it, and calls "prepdata". Finally, using the purrr package you can apply the same function over the input vector.
Something like this should work:
library(purrr)
rdata_paths <- list.files(path = "path/to/your/files", full.names = TRUE)
read_rdata <- function(path) {
  # load() returns the NAME of the restored object, not the object itself
  obj_name <- load(path)
  return(get(obj_name))
}
prepdata <- function(data) {
### your prepdata implementation
}
master_function <- function(path) {
data <- read_rdata(path)
result <- prepdata(data)
return(result)
}
merged_rdatas <- map_df(rdata_paths, master_function) # This creates one dataset, merging all results together
Does anyone know the best way to carry out a "for loop" that would read in different subject id's and append them to the name of an exported csv?
As an example, I have multiple output files from an electrocardiogram software program (each file belongs to one individual). The files are named C800_HR.bdf.evt, C801_HR.bdf.evt, C802_HR.bdf.evt etc. Each file gets read into r and then has a script applied to calculate heart rate variability. At the end of the script, I need to add a loop that will extract the subject id (e.g., C800, C801, C802) and write a new file name for each individual so that it becomes C800_RtoR.csv. Essentially, I would like to avoid changing the syntax every time I read in and export a file name.
I am currently using the following syntax to read in multiple files:
setwd("/Users/kmpc/Downloads")
myhrvdata <- lapply(Sys.glob("C8**_HR.bdf.evt"), read.delim)
Try this out:
cardio_files <- list.files(pattern = "C8\\d{2}_HR\\.bdf\\.evt")
subject_ids <- sub("^(C8\\d{2})_.*", "\\1", cardio_files)
myList <- lapply(cardio_files, read.delim)
names(myList) <- subject_ids
## do calculations on the list
for (id in names(myList)) {
  write.csv(myList[[id]], paste0(id, "_RtoR.csv"))
}
The only thing is, you have to deal with using a list when doing your calculations. You could combine them to a single data.frame, but it would be best to leave it as a list to write the files at the end.
Consider generalizing your process by creating a function that: 1) reads in file, 2) processes data, 3) outputs to csv. Then have lapply call the defined method iteratively across all Sys.glob items and even return a list of calculated data frames.
proc_heart_rate <- function(f_name) {
# READ IN .evt FILE INTO df
df <- read.delim(f_name)
# CALCULATE HEART RATE VARIABILITY WITH df
...
# OUTPUT df TO CSV
subject_id <- gsub("\\_.*", "", f_name)
write.csv(df, paste0(subject_id, "_RtoR.csv"))
# RETURN df FOR OTHER USES
return(df)
}
# LIST OF DATA FRAMES WITH CALCULATIONS
myhrvdata_list <- lapply(Sys.glob("C8**_HR.bdf.evt"), proc_heart_rate)
I am new to R and don't know exactly how to use for loops.
Here is my problem: I have about 160 csv files in a folder, each with a specific name. Each file name contains a pattern "HL.X.Y.Z", where X = region, Y = cluster, and Z = point. What I need to do is read all these csv files, extract those strings from the names, create a column with the strings for each csv file, and bind all the csv files into a single data frame.
Here is some of the code for what I am trying to do:
setwd("C:/Users/worddirect")
files.names<-list.files(getwd(),pattern="*.csv")
files.names
head(files.names)
>[1] "HL.1.1.1.2F31CA.150722.csv" "HL.1.1.2.2F316A.150722.csv"
[3] "HL.1.1.3.2F3274.150722.csv" "HL.1.1.4.2F3438.csv"
[5] "HL.1.10.1.3062CD.150722.csv" "HL.1.10.2.2F343D.150722.csv"
Reading all the files like this works just fine:
files.names
for (i in 1:length(files.names)) {
assign(files.names[i], read.csv(files.names[i],skip=18))
}
Adding an extra column for an individual csv files like this works fine:
test<-cbind("Region"=rep(substring(files.names[1],4,4),times=nrow(HL.1.1.1.2F31CA.150722.csv)),
"Cluster"=rep(substring(files.names[1],6,6),times=nrow(HL.1.1.1.2F31CA.150722.csv)),
"Point"=rep(substring(files.names[1],8,8),times=nrow(HL.1.1.1.2F31CA.150722.csv)),
HL.1.1.1.2F31CA.150722.csv)
head(test)
Region Cluster Point Date.Time Unit Value
1 1 1 1 6/2/14 11:00:01 PM C 24.111
2 1 1 1 6/3/14 1:30:01 AM C 21.610
3 1 1 1 6/3/14 4:00:01 AM C 20.609
However, a for loop version of the above doesn't work.
files.names
for (i in 1:length(files.names)) {
assign(files.names[i], read.csv(files.names[i],skip=18))
cbind("Region"=rep(substring(files.names[i],4,4),times=nrow(i)),
"Cluster"=rep(substring(files.names[i],6,6),times=nrow(i)),
"Point"=rep(substring(files.names[i],8,8),times=nrow(i)),
i)
}
>Error in rep(substring(files.names[i], 4, 4), times = nrow(i)) :
invalid 'times' argument
The final step would be to bind all the csv files in a single data frame.
I appreciate any suggestion. If there is a simpler way to do what I did, I'd appreciate that too!
There are many ways to solve a problem in R. A more R-like way to solve this one is with an apply() function. The apply() family of functions acts like an implied for loop, applying one or more operations to each item passed to it via a function argument.
Another important feature of R is the anonymous function. Combining lapply() with an anonymous function we can solve your multi file read problem.
setwd("C:/Users/worddirect")
files.names<-list.files(getwd(),pattern="*.csv")
# read csv files and return them as items in a list()
theList <- lapply(files.names,function(x){
theData <- read.csv(x,skip=18)
# bind the region, cluster, and point data and return
cbind(
"Region"=rep(substring(x,4,4),times=nrow(theData)),
"Cluster"=rep(substring(x,6,6),times=nrow(theData)),
"Point"=rep(substring(x,8,8),times=nrow(theData)),
theData)
})
# rbind the data frames in theList into a single data frame
theResult <- do.call(rbind,theList)
i is a number, which doesn't have an nrow property (and neither does the file name files.names[i], which is a character string — nrow() only works on the data frame itself).
You can use the following code:
result = data.frame()
for (i in 1:length(files.names)) {
  df <- read.csv(files.names[i], skip = 18)  # read once, keep the data frame
  result = rbind(result,
    cbind(
      "Region"  = rep(substring(files.names[i], 4, 4), times = nrow(df)),
      "Cluster" = rep(substring(files.names[i], 6, 6), times = nrow(df)),
      "Point"   = rep(substring(files.names[i], 8, 8), times = nrow(df)),
      df))
}
I have to load data from files related to multiple experiments, and later process them to generate a plot. Each experiment generated multiple files. Files related to experiment 1 have names starting with "Experiment1", postfixed by the type of data they contain, i.e. "Experiment1-per0", "Experiment1-per50", "Experiment1-per100".
These postfixes are fixed for all experiments, so to load the files I want to give only the experiment names and append the postfixes later in the R script. Consequently, for each experiment name "ExperimentX" I give, I will load three separate data files by appending the postfixes (i.e. "ExperimentX-per0", "ExperimentX-per50", "ExperimentX-per100").
I am unable to figure out in which data structure I should store the initial experiment names and then the postfixed names.
Sample file (Experiment1-per50):
# the last column also shows the type of data i.e postfix of file
Obj TGiven TUsed TOGiven TOServed per50
16570 8 7 12 6 per50
18430 8 8 12 9 per50
16890 8 7 12 9 per50
Currently, I type in every file name manually, which takes a lot of time.
If each experiment will have the same set of suffixes, you can store your list of experiment names and suffix names separately. Then, using a nested loop, you can combine the experiment name and suffix name using the paste function to get the filename.
You code might look something like this:
experiments = c("Experiment1","Experiment2","Experiment3")
suffixes = c("per0","per50","per100")
for (experiment in experiments) {
for (suffix in suffixes) {
filename <- paste(experiment, suffix, sep="-")
df <- read.table(filename)
df$experiment <- experiment
# Do something with the dataframe here
}
}
Alternatively, if you just want a vector of all the filenames from given experiments and suffixes lists, this would combine them:
as.vector(sapply(experiments, paste, suffixes, sep="-"))
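For example, with two experiments the sapply() call returns a character matrix (one column per experiment), and as.vector() flattens it column by column:

```r
experiments <- c("Experiment1", "Experiment2")
suffixes <- c("per0", "per50", "per100")

# sapply gives a 3x2 matrix; as.vector flattens it in column-major order
filenames <- as.vector(sapply(experiments, paste, suffixes, sep = "-"))
filenames
# "Experiment1-per0"  "Experiment1-per50" "Experiment1-per100"
# "Experiment2-per0"  "Experiment2-per50" "Experiment2-per100"
```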
If all the columns are different
If the columns are different between the experiments, I would wrap the experiments in lists as follows:
library(plyr);
library(stringr); # for str_c
experiments <- c("Experiment1","Experiment2","Experiment3");
suffixes <- c("per0","per50","per100");
# if you want to go ahead and get the data
data <- llply( experiments, function(experiment) {
llply( suffixes, function(suffix) {
fn <- str_c(experiment,'_',suffix,'.csv'); # make filename
# later, try to read fn, now just return
return(fn);
})
})
You can then iterate through data for further processing. llply is part of the plyr package. It iterates over a list (the first l in llply) and returns a list (the second l).
If all the columns are the same
library(plyr);
library(stringr); # for str_c
experiments <- c("Experiment1","Experiment2","Experiment3");
suffixes <- c("per0","per50","per100");
data <- ldply( experiments, function(experiment) {
ldply( suffixes, function(suffix) {
data.frame(
experiment = experiment,
suffix= suffix,
fn = str_c(experiment,'_',suffix,'.csv'))
})
})
This will collect the experiment, suffix, and filename combinations into a single data.frame; replace the fn column with an actual read of that filename to pull all the data into one data.frame, which you can then parse as needed (for example, using plyr and/or subset).
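If you prefer to avoid the plyr dependency, the same read-and-stack step can be sketched in base R. The demo files, the Obj column, and the "-" separator below are assumptions for illustration; in the real case the files would be the experiment outputs:

```r
# Base-R sketch: build every experiment/suffix combination, read each file,
# label its rows, and stack everything into one data frame
tmp <- tempdir()
experiments <- c("Experiment1", "Experiment2")
suffixes <- c("per0", "per50")

# demo files standing in for the real experiment outputs
for (e in experiments) for (s in suffixes)
  write.csv(data.frame(Obj = 1:2),
            file.path(tmp, paste0(e, "-", s, ".csv")), row.names = FALSE)

combos <- expand.grid(experiment = experiments, suffix = suffixes,
                      stringsAsFactors = FALSE)
data <- do.call(rbind, lapply(seq_len(nrow(combos)), function(i) {
  fn <- file.path(tmp, paste0(combos$experiment[i], "-", combos$suffix[i], ".csv"))
  df <- read.csv(fn)
  df$experiment <- combos$experiment[i]  # scalar label recycled down the rows
  df$suffix     <- combos$suffix[i]
  df
}))
nrow(data)  # 8
```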