Use multiple dataframes from a package of data in R

I am working with a large dataset in an R package.
I need to get all of the separate data frames into my global environment, preferably into a list of data frames so that I can use lapply to do some repetitive operations later.
So far I've done the following:
l.my.package <- data(package="my.package")
lc.my.package <- l.my.package[[3]]
lc.df.my.package <- as.data.frame(lc.my.package)
This effectively creates a data frame of the location and name of each of the .RData files in my package, so I can load them all.
I have figured out how to load them all using a for loop.
I create a vector of path names and feed it into the loop:
library(fs) # path() comes from the fs package
f <- path('my/path/folder', lc.df.my.package$Item, ext="rdata")
f.v <- as.vector(f)
for (i in f.v) {load(i)}
This loads everything into separate data frames (as I want), but it obviously doesn't put the data frames into a list. I thought lapply would work here, but when I use lapply, the resulting list is a list of character strings (the title of each dataframe with no data included). That code looks like this:
f.l <- as.list(f)
func <- function(i) {load(i)}
df.list <- lapply(f.l, func)
I am looking for one of two possible solutions:
How can I efficiently collect the output of the for loop into a list (a "while" loop would likely be too slow)?
How can I adjust lapply so that the output includes each entire data frame instead of just its name?
Edit: I have also tried adding the envir=.GlobalEnv argument to load() within lapply. When I do that, the data frames load, but still not into a list. The list still contains only the names as character strings.

If you are willing to use a packaged solution, I wrote a package called libr that does exactly what you are asking for. Here is an example:
library(libr)
# Create temp directory
tmp <- tempdir()
# Save some data to temp directory
# for illustration purposes
saveRDS(trees, file.path(tmp, "trees.rds"))
saveRDS(rock, file.path(tmp, "rocks.rds"))
# Create library
libname(dat, tmp)
# library 'dat': 2 items
# - attributes: not loaded
# - path: C:\Users\User\AppData\Local\Temp\RtmpCSJ6Gc
# - items:
# Name Extension Rows Cols Size LastModified
# 1 rocks rds 48 4 3.1 Kb 2020-11-05 23:25:34
# 2 trees rds 31 3 2.4 Kb 2020-11-05 23:25:34
# Load library
lib_load(dat)
# Examine workspace
ls()
# [1] "dat" "dat.rocks" "dat.trees" "tmp"
# Unload the library from memory
lib_unload(dat)
# Examine workspace again
ls()
# [1] "dat" "tmp"

rawr's response works perfectly (mget() looks up every name in the Item column and returns the matching objects as a named list):
df.list <- mget(l.my.package$results[, 'Item'], inherits = TRUE)
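When the .RData files are read from disk instead, the lapply() in the question returns character strings because load() only returns (invisibly) the names of the objects it created. A minimal sketch of a fix is to give load() a fresh environment per file and then fetch the objects from it. The helper name load_to_list() and the temporary demo files are hypothetical, made up so the example runs on its own:

```r
# Sketch: collect .RData files into a named list of data frames.
# load_to_list() is a hypothetical helper; the demo files live in a temp folder.
load_to_list <- function(path) {
  env <- new.env()
  obj_names <- load(path, envir = env)  # load() returns only the object names
  mget(obj_names, envir = env)          # so fetch the objects themselves by name
}

# Create two example .RData files from built-in datasets
demo_dir <- tempfile("rdata_demo")
dir.create(demo_dir)
trees_copy <- trees
rock_copy <- rock
save(trees_copy, file = file.path(demo_dir, "trees_copy.RData"))
save(rock_copy, file = file.path(demo_dir, "rock_copy.RData"))

files <- list.files(demo_dir, pattern = "\\.RData$", full.names = TRUE)

# Each call returns a small named list; c() merges them into one list
df.list <- do.call(c, lapply(files, load_to_list))
names(df.list)
```

Because nothing is written to the global environment, df.list can go straight into lapply() for the repetitive operations, e.g. lapply(df.list, nrow).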

Related

loop through .rds files from directory and convert to data frame

I am trying to loop through a vector containing the file paths of all .rds files in a folder.
I can easily load one .rds file and then convert it to a data frame as shown below. However, the issue is how to load all of the files and convert each one to a separate data frame. I suspect a for loop using file_path as input is necessary.
user_x <- readRDS("/Users/marcoliedecke/Desktop/thesis/data/VK_Data/75315975_VK_user.rds") # load data
user_x_df <- as.data.frame(user_x) # convert to dataframe
file_path <- list.files(".../VK_Data", pattern="\\.rds$", full.names=TRUE)
print(file_path)
[1] "/Users/marcoliedecke/Desktop/thesis/data/VK_Data/103656622_VK_user.rds"
[2] "/Users/marcoliedecke/Desktop/thesis/data/VK_Data/11226063_VK_user.rds"
[3] "/Users/marcoliedecke/Desktop/thesis/data/VK_Data/112552215_VK_user.rds"
(...)
Yes, you have the right idea. You can write a for loop to do this. Since you have your vector of file paths, you could do something like I do below. Note that I don't know how you want your data frames named, but this will read everything in.
library(tidyverse)
## counter used to give each data frame a unique name
file_number <- 1
for (file in file_path) {
name <- paste0("df_", file_number) ## getting a new name ready
x <- readRDS(file) ## reading in the file
assign(name, x) ## assigning the data the name we created two lines up
file_number <- file_number + 1
}
This gives you as many data frames named df_* as there are files in your file_path vector.
You could then append all of them together (assuming they have the same column names and column types) using this:
full_data <- mget(ls(pattern = "^df_")) %>%
reduce(bind_rows)
In the above code, ls(pattern = "^df_") returns the names of all objects in your global environment that start with "df_". The mget function fetches those objects as a list, and reduce(bind_rows) appends them all together.
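Rather than creating df_1, df_2, ... with assign() and collecting them again with ls(), the files can also be read straight into a named list. A minimal base-R sketch; the temporary folder and the two small files are made up so the example runs standalone, so point rds_dir at your own folder instead. dplyr's bind_rows() also accepts a list of data frames, so bind_rows(df_list) would work in place of the do.call() line.

```r
# Sketch: read every .rds file into one named list, then stack them.
# rds_dir and the demo files are hypothetical stand-ins for the VK_Data folder.
rds_dir <- tempfile("rds_demo")
dir.create(rds_dir)
saveRDS(head(mtcars, 3), file.path(rds_dir, "111_VK_user.rds"))
saveRDS(tail(mtcars, 2), file.path(rds_dir, "222_VK_user.rds"))

file_path <- list.files(rds_dir, pattern = "\\.rds$", full.names = TRUE)

# One data frame per file, named after the file without its extension
df_list <- lapply(file_path, function(f) as.data.frame(readRDS(f)))
names(df_list) <- tools::file_path_sans_ext(basename(file_path))

# Base-R counterpart of reduce(bind_rows), assuming identical columns
full_data <- do.call(rbind, df_list)
nrow(full_data)  # 5 rows (3 + 2)
```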

R: Doing the same steps on many data frames with their names stored in a vector

I have several .RData files, each of which has letters and numbers in its name, e.g. m22.RData. Each of these contains a single data.frame object with the same name as the file, e.g. m22.RData contains a data.frame object named "m22".
I can generate the file names easily enough with something like datanames <- paste0(rep(c("m","n"), each = 100), 1:100) and then use load() on those, which leaves me with a few hundred data.frame objects named m1, m2, etc. What I am not sure of is the next step: how to prepare and merge each of these data frames without having to type out all their names.
I can make a function that accepts a data frame as input and does all the processing. But if I pass it datanames[22] as input, I am passing it the string "m22", not the data frame object named m22.
My end goal is to repeatedly do the same steps on a bunch of different data frames without manually typing out "prepdata(m1) prepdata(m2) ... prepdata(n100)". I can think of two ways to do it, but I don't know how to implement either of them:
Get from a vector of the names of the data frames to a list containing the actual data frames.
Modify my "prepdata" function so that it can accept the name of the data frame, but then still somehow be able to do things to the data frame itself (possibly by way of "assign"? But the last step of the function will be to merge the prepared data to a bigger data frame, and I'm not sure if there's a method that uses "assign" that can do that...)
Can anybody advise on how to implement either of the above methods, or another way to make this work?
See this answer and the corresponding R FAQ
Basically:
temp1 <- c(1,2,3)
save(temp1, file = "temp1.RData")
x <- c()
x[1] <- load("temp1.RData")
get(x[1])
#> [1] 1 2 3
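For the first option in the question (from a vector of names to a list of the actual data frames), mget() does the name-to-object step for a whole vector at once and returns a named list. A minimal sketch; m1, m2 and the placeholder prepdata() are hypothetical stand-ins for the question's objects and function:

```r
# Sketch: turn a vector of object names into a list of the objects themselves.
# m1, m2 and prepdata() are hypothetical placeholders.
m1 <- data.frame(id = 1:2, value = c(10, 20))
m2 <- data.frame(id = 3:4, value = c(30, 40))
datanames <- c("m1", "m2")

df_list <- mget(datanames)  # named list: df_list$m1, df_list$m2

prepdata <- function(df) {
  df$value2 <- df$value * 2  # stand-in for the real preparation steps
  df
}

prepared <- lapply(df_list, prepdata)  # same steps on every data frame
merged <- do.call(rbind, prepared)     # then merge into one bigger data frame
```

With the real data, calling mget(datanames) after the load() calls should give the list to pass to lapply().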
Assuming all your data is in the same folder, you can create an R object with all the paths, then write a function that takes the path to an .RData file, reads it and calls "prepdata". Finally, using the purrr package, you can apply the same function to an input vector.
Something like this should work:
library(purrr)
rdata_paths <- list.files(path = "path/to/your/files", pattern = "\\.RData$", full.names = TRUE)
read_rdata <- function(path) {
  env <- new.env()
  obj_name <- load(path, envir = env) # load() returns the object's name, not the data
  env[[obj_name[1]]]                  # each file holds a single data frame
}
prepdata <- function(data) {
  ### your prepdata implementation
}
master_function <- function(path) {
  data <- read_rdata(path)
  result <- prepdata(data)
  return(result)
}
merged_rdatas <- map_df(rdata_paths, master_function) # combines all results into one data set

Assign new names to dataframes and save as separate objects in R

I am performing a set of analyses in R. The flow of the analysis is reading in a data frame (i.e. input_dataframe) and performing a set of calculations that result in a new, smaller data frame (called final_result). The exact same set of calculations is performed on 23 different files, each of which contains a data frame.
My question is as follows: for each of the 23 files that is read in, I am trying to save a unique R object. How do I do so? When I save the resulting final_result data frame to a file (using save()), I then cannot read all 23 objects into a new R session without the different R objects overwriting each other. Other suggestions (such as Create a variable name with "paste" in R?) did not work for me, since they rely on calling the new variable by its name once it has been assigned, which I cannot do in this case.
To Summarize/Reword: Is there a way to save an object in R but change the name of the object for when it will be loaded later?
For example:
x=5
magicSave(x,file="saved_variable_1.r",to_save_as="result_1")
x=93
magicSave(x,file="saved_variable_2.r",to_save_as="result_2")
load("saved_variable_1.r")
load("saved_variable_2.r")
result_1
#returns 5
result_2
#returns 93
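The hypothetical magicSave() can be built from base R: bind the value to the desired name in a scratch environment, then save() only that name using the list and envir arguments. A minimal sketch, using temporary files instead of the file names above:

```r
# Sketch of the hypothetical magicSave(): store x in `file` under the name `to_save_as`.
magicSave <- function(x, file, to_save_as) {
  env <- new.env()
  assign(to_save_as, x, envir = env)                 # bind the value to the new name
  save(list = to_save_as, envir = env, file = file)  # save just that binding
}

f1 <- tempfile(fileext = ".RData")
f2 <- tempfile(fileext = ".RData")
x <- 5
magicSave(x, file = f1, to_save_as = "result_1")
x <- 93
magicSave(x, file = f2, to_save_as = "result_2")

load(f1)  # creates result_1
load(f2)  # creates result_2
result_1
result_2
```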
In R it's generally a good idea to store everything that can be seen as a list as an actual list. It makes everything more elegant afterwards.
First you put all your paths in a list or a vector:
paths <- c("C:/somewhere/file1.csv",
"C:/somewhere/file2.csv") # etc
Then you read them:
objects <- lapply(paths,read.csv) # objects is a list of tables
Then you apply your transformation to each element:
output <- lapply(objects,transformation_function)
And then you can save your output (I find saveRDS cleaner than save: readRDS simply returns the object, so you decide what it is called and know exactly what you are adding to your workspace when loading):
saveRDS(output,"C:/somewhere/output.RDS")
which you will load with
output <- readRDS("C:/somewhere/output.RDS")
OR if you prefer for some reason to save as different objects:
output_paths <- paste0("C:/somewhere/output",seq_along(output),".RDS")
Map(saveRDS,output,output_paths)
To load later with:
output <- lapply(output_paths, readRDS)
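Alternatively, write each result to its own CSV file and choose the name when you read it back:
x=5
write.csv(x,"one_thing.csv", row.names = FALSE)
x=93
write.csv(x,"two_thing.csv", row.names = FALSE)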
result_1 <- read.csv("one_thing.csv")
result_2 <- read.csv("two_thing.csv")
result_1
# x
# 1 5
result_2
# x
# 1 93

R: Reading and writing multiple csv files into a loop then using original names for output

Apologies if this may seem simple, but I can't find a workable answer anywhere on the site.
My data is in the form of CSV files, each with a filename made of a name and a number. It's not quite as simple as having files with a generic word and an increasing number...
I've achieved exactly what I want to do with just one file, but there are a couple of hundred to do, so changing the name each time is quite tedious.
I'm posting my original single-file code here in the hope that someone can ease the growing tension of failed searches.
# set workspace
getwd()
setwd(".../Desktop/R Workspace")
# bring in original file, skipping first four rows
Person_7<- read.csv("PersonRound7.csv", header=TRUE, skip=4)
# cut matrix down to 4 columns
Person7<- Person_7[,c(1,2,9,17)]
# give columns names
colnames(Person7) <- c("Time","Spare", "Distance","InPeriod")
# find the empty rows, create new subset. Take 3 rows away for empty lines.
nullrow <- (which(Person7$Spare == "Velocity"))-3
Person7 <- Person7[(1:nullrow), ]
#keep 3 needed columns from matrix
Person7<- Person7[,c(1,3,4)]
colnames(Person7) <- c("Time","Distance","InPeriod")
#convert distance and time columns to numeric
options(digits=9)
Person7$Distance <- as.numeric(as.character(Person7$Distance))
Person7$Time <- as.numeric(as.character(Person7$Time))
#Create the differences column for distance
Person7$Diff <- c(0, diff(Person7$Distance))
# ...whole heap of other stuff...
#export Minutes to an external file
write.csv(Person7_maxs, ".../Desktop/GPS Minutes/Person7.csv")
So the three-part issue is as follows:
I can create a list or vector to read through the file names, but not a data frame for each one (if that's even a good way to do it).
The variable names throughout the code will need to change: instead of just being "Person1", "Person2", they'll be more like "Johnny1", "Lou23".
I need to export each resulting data frame to its own CSV file with the original name.
Taking any and all suggestions on board; struggling with this one.
Cheers!
Consider using a single list of the ~200 data frames. There is no need for separate named objects flooding the global environment (though list2env() is still shown below). Use lapply() to iterate through all the CSV files in the working directory, then name each element of the list after the corresponding file's basename:
setwd(".../Desktop/R Workspace")
files <- list.files(path=getwd(), pattern="\\.csv$")
# CREATE DATA FRAME LIST
dfList <- lapply(files, function(f) {
df <- read.csv(f, header=TRUE, skip=4)
df <- setNames(df[c(1,2,9,17)], c("Time","Spare","Distance","InPeriod"))
# ...same code referencing temp variable, df
write.csv(df_max, paste0(".../Desktop/GPS Minutes/", f))
return(df)
})
# NAME EACH ELEMENT TO CORRESPONDING FILE'S BASENAME
dfList <- setNames(dfList, sub("\\.csv$", "", files))
# REFERENCE A DATAFRAME WITH LIST INDEXING
str(dfList$PersonRound7) # PRINT STRUCTURE
View(dfList$PersonRound7) # VIEW DATA FRAME
dfList$PersonRound7$Time # OUTPUT ONE COLUMN
# OUTPUT ALL DFS TO SEPARATE OBJECTS (THOUGH NOT NEEDED)
list2env(dfList, envir = .GlobalEnv)
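For the third part of the question (one output CSV per input, keeping the original name), the list names can be reused to build the output paths. A minimal sketch; the two folders, the tiny input files and process_one() are hypothetical stand-ins for the real GPS files and processing steps:

```r
# Sketch: process every CSV and write each result under its original file name.
# in_dir, out_dir, the demo files and process_one() are hypothetical placeholders.
in_dir <- tempfile("gps_in")
dir.create(in_dir)
out_dir <- tempfile("gps_out")
dir.create(out_dir)
write.csv(data.frame(Time = 1:3, Distance = c(0, 5, 12)),
          file.path(in_dir, "Johnny1.csv"), row.names = FALSE)
write.csv(data.frame(Time = 1:3, Distance = c(0, 4, 9)),
          file.path(in_dir, "Lou23.csv"), row.names = FALSE)

process_one <- function(df) {
  df$Diff <- c(0, diff(df$Distance))  # same idea as the Diff column in the question
  df
}

files <- list.files(in_dir, pattern = "\\.csv$", full.names = TRUE)
results <- lapply(files, function(f) process_one(read.csv(f)))
names(results) <- tools::file_path_sans_ext(basename(files))

# Map() pairs each result with an output path built from its name
out_files <- file.path(out_dir, paste0(names(results), ".csv"))
invisible(Map(function(df, path) write.csv(df, path, row.names = FALSE),
              results, out_files))
list.files(out_dir)
```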

How to use an index to read multiple files at a time?

I want R to read the five files with names like
"alpha_rarefaction_8000_0.txt" ... "alpha_rarefaction_12000_0.txt"
and write it as
"alpha8000" ... "alpha12000", respectively.
I used the following code, but it did not work. Please help. What's wrong with my code?
I tried searching for things like "how to use index in R function" or "how to write executable loop in R", but nothing helped. What kind of search strategy should I use to get effective results when searching for answers on Google?
for(i in seq(8000,12000,by=1000)) {
paste("rare",i,sep="")<-read.table(paste("alpha_rarefaction",i,"0.txt",sep="_"))
}
or
read.rare<-function(i){
paste("rare",$i,sep="")<-read.table(paste("alpha_rarefaction",$i,"0.txt",sep="_"))
}
i<-seq(8000,12000,by=1000)
read.rare(i)
I would recommend reading the files into a list, for example like this:
## create the sequence for the file names
s <- 8:12 * 1e3
# [1] 8000 9000 10000 11000 12000
## create the full file names from the sequence above
files <- sprintf("alpha_rarefaction_%d_0.txt", s)
# [1] "alpha_rarefaction_8000_0.txt" "alpha_rarefaction_9000_0.txt"
# [3] "alpha_rarefaction_10000_0.txt" "alpha_rarefaction_11000_0.txt"
# [5] "alpha_rarefaction_12000_0.txt"
## Now we can loop the file names, reading the data into a list
## and setting the names for each element
datalist <- setNames(lapply(files, read.table), paste0("alpha", s))
This keeps all the data frames in a list, which makes working with them later a lot easier. You can access each one individually with the $ operator, since they are named:
names(datalist)
[1] "alpha8000" "alpha9000" "alpha10000" "alpha11000" "alpha12000"
so datalist$alpha9000, for example, accesses the second data set (as does datalist[[2]]).
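If separate objects really are wanted, the loops in the question fail because paste(...) <- value tries to assign to the result of a function call rather than to a variable name. assign() takes the new variable's name as a character string. A minimal sketch; the temporary folder and the file contents are made up so it runs standalone, and header = TRUE only matches these stand-in files (the real files' layout is not shown in the question):

```r
# Sketch: create variables alpha8000 ... alpha12000 with assign().
# The stand-in files below are hypothetical, written to a temp folder first.
data_dir <- tempfile("rarefaction_demo")
dir.create(data_dir)
s <- 8:12 * 1e3
for (i in s) {
  write.table(data.frame(depth = i, value = i / 1000),
              file.path(data_dir, sprintf("alpha_rarefaction_%d_0.txt", i)))
}

# assign() accepts the name as a string, which `paste(...) <- value` cannot do
for (i in s) {
  assign(paste0("alpha", i),
         read.table(file.path(data_dir, sprintf("alpha_rarefaction_%d_0.txt", i)),
                    header = TRUE))
}

ls(pattern = "^alpha")
```

The named list remains the easier structure for later lapply() work; assign() just answers the question as asked.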
