I want to save data into an .RData file.
For instance, I'd like to save into 1.RData with two csv files and some information.
Here, I have two csv files
1) file_1.csv contains object city[[1]]
2) file_2.csv contains object city[[2]]
and additionally save other values, country and population as follows.
So, I guess I need to make objects 'city' from two csv files first of all.
The structure of 1.RData might look like this:
> data = load("1.RData")
> data
[1] "city" "country" "population"
> city
[[1]]
NEW YORK 1.1
SAN FRANCISCO 3.1
[[2]]
TEXAS 1.3
SEATTLE 1.4
> class(city)
[1] "list"
> country
[1] "east" "west" "north"
> class(country)
[1] "character"
> population
[1] 10 11 13 14
> class(population)
[1] "integer"
file_1.csv and file_2.csv each have a bunch of rows and columns.
How can I create this type of RData with csv files and values?
Alternatively, when you want to save individual R objects, I recommend using saveRDS.
You can save R objects using saveRDS, then load them into R with a new variable name using readRDS.
Example:
# Save the city object
saveRDS(city, "city.rds")
# ...
# Load the city object as city
city <- readRDS("city.rds")
# Or with a different name
city2 <- readRDS("city.rds")
But when you want to save many/all your objects in your workspace, use Manetheran's answer.
There are three ways to save objects from your R session:
Saving all objects in your R session:
The save.image() function will save all objects currently in your R session:
save.image(file="1.RData")
These objects can then be loaded back into a new R session using the load() function:
load(file="1.RData")
Saving some objects in your R session:
If you want to save some, but not all objects, you can use the save() function:
save(city, country, file="1.RData")
Again, these can be reloaded into another R session using the load() function:
load(file="1.RData")
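For the original question, the pieces above can be combined roughly as follows. This is only a sketch: it assumes file_1.csv and file_2.csv are ordinary comma-separated files with a header row. The two files are written to a temporary directory first so the example runs on its own, and the column names and values are made up for illustration.

```r
# Work in a temporary directory so the sketch is self-contained
tmp <- tempdir()
csv_paths <- file.path(tmp, c("file_1.csv", "file_2.csv"))

# Stand-in data (hypothetical); in practice the CSV files already exist
write.csv(data.frame(name = c("NEW YORK", "SAN FRANCISCO"), value = c(1.1, 3.1)),
          csv_paths[1], row.names = FALSE)
write.csv(data.frame(name = c("TEXAS", "SEATTLE"), value = c(1.3, 1.4)),
          csv_paths[2], row.names = FALSE)

# Read each CSV into one element of the list 'city'
city <- lapply(csv_paths, read.csv)

# The other values to store alongside it
country <- c("east", "west", "north")
population <- c(10L, 11L, 13L, 14L)  # the L suffix makes these integers

# Save all three objects into a single .RData file
rdata_path <- file.path(tmp, "1.RData")
save(city, country, population, file = rdata_path)

# load() restores the objects and returns their names
rm(city, country, population)
loaded <- load(rdata_path)
```

After this, loaded holds the names of the restored objects, and city is again a list with one data frame per CSV file.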
Saving a single object:
If you want to save a single object you can use the saveRDS() function:
saveRDS(city, file="city.rds")
saveRDS(country, file="country.rds")
You can load these into your R session using the readRDS() function, but you will need to assign the result to the desired variable:
city <- readRDS("city.rds")
country <- readRDS("country.rds")
But this also means you can give these objects new variable names if needed (i.e. if those variables already exist in your new R session but contain different objects):
city_list <- readRDS("city.rds")
country_vector <- readRDS("country.rds")
Just to add an additional option should you need it: you can include a variable in the file name, for example a date identifier:
date <- format(Sys.Date(), "%Y%m%d")  # today's date as yyyymmdd
save(city, file=paste0("c:\\myuser\\somelocation\\",date,"_RData.Data"))
This way you can always keep track of when it was run.
I am working with a large dataset in an R package.
I need to get all of the separate data frames into my global environment, preferably into a list of data frames so that I can use lapply to do some repetitive operations later.
So far I've done the following:
l.my.package <- data(package="my.package")
lc.my.package <- l.my.package[[3]]
lc.df.my.package <- as.data.frame(lc.my.package)
This effectively creates a data frame of the location and name of each of the .RData files in my package, so I can load them all.
I have figured out how to load them all using a for loop.
I create a vector of path names and feed it into the loop:
f <- path('my/path/folder', lc.df.my.package$Item, ext="rdata")
f.v <- as.vector(f)
for (i in f.v) {load(i)}
This loads everything into separate data frames (as I want), but it obviously doesn't put the data frames into a list. I thought lapply would work here, but when I use lapply, the resulting list is a list of character strings (the title of each dataframe with no data included). That code looks like this:
f.l <- as.list(f)
func <- function(i) {load(i)}
df.list <- lapply(f.l, func)
I am looking for one of two possible solutions:
how can I efficiently collect the output of for loop into a list (a "while" loop would likely be too slow)?
how can I adjust lapply so the output includes each entire dataframe instead of just the title of each dataframe?
Edit: I have also tried introducing the "envir=.GlobalEnv" argument into load() within lapply. When I do that, the data frames load, but still not in a list. The list still contains only the names as character strings.
If you are willing to use a packaged solution, I wrote a package called libr that does exactly what you are asking for. Here is an example:
library(libr)
# Create temp directory
tmp <- tempdir()
# Save some data to temp directory
# for illustration purposes
saveRDS(trees, file.path(tmp, "trees.rds"))
saveRDS(rock, file.path(tmp, "rocks.rds"))
# Create library
libname(dat, tmp)
# library 'dat': 2 items
# - attributes: not loaded
# - path: C:\Users\User\AppData\Local\Temp\RtmpCSJ6Gc
# - items:
# Name Extension Rows Cols Size LastModified
# 1 rocks rds 48 4 3.1 Kb 2020-11-05 23:25:34
# 2 trees rds 31 3 2.4 Kb 2020-11-05 23:25:34
# Load library
lib_load(dat)
# Examine workspace
ls()
# [1] "dat" "dat.rocks" "dat.trees" "tmp"
# Unload the library from memory
lib_unload(dat)
# Examine workspace again
ls()
# [1] "dat" "tmp"
@rawr's response works perfectly:
df.list <- mget(l.my.package$results[, 'Item'], inherits = TRUE)
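The lapply attempt in the question returns strings because load() returns only the names of the objects it restored. If you would rather not load everything into the global environment first, one alternative is to load each file into a scratch environment and return that environment's contents. The sketch below assumes each file is a regular .RData file; the two demo files and the helper name load_as_list are made up for illustration.

```r
# Create two demo .RData files (stand-ins for the package's data files)
tmp <- tempdir()
trees_df <- trees
rock_df <- rock
save(trees_df, file = file.path(tmp, "trees_df.RData"))
save(rock_df, file = file.path(tmp, "rock_df.RData"))
rm(trees_df, rock_df)

f.v <- file.path(tmp, c("trees_df.RData", "rock_df.RData"))

# load() returns only the object names, so load into a scratch
# environment and return that environment's contents instead
load_as_list <- function(path) {
  e <- new.env()
  load(path, envir = e)
  as.list(e)
}

# Combine the per-file lists into one named list of data frames
df.list <- do.call(c, lapply(f.v, load_as_list))

names(df.list)
sapply(df.list, nrow)
```

The resulting df.list can be passed straight to lapply, and nothing is added to the global environment apart from the list itself.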
I have about 30 separate dataframes loaded in my R session, each with various names. I also have a character vector called mydfs which contains the names of all those dataframes. I am trying to loop over mydfs and save each dataframe named in it as an rds file, but for some reason I'm only able to save the character string of the dataframe's name (not the dataframe itself). Here is a simulated, reproducible example of what I have:
#Create vector of dataframes that exist in base r to create a reproducible example
mydfs<-c("cars","iris","iris3","mtcars")
#My code that creates files, but they don't contain my dataframe data for some reason
for (i in 1:length(mydfs)){
savefile<-paste0(paste0("D:/Data/", mydfs[i]), ".Rds")
saveRDS(mydfs[i], file=savefile)
print(paste("Dataframe Saved:", mydfs[i]))
}
This results in the following log output:
[1] "Dataframe Saved: cars"
[1] "Dataframe Saved: iris"
[1] "Dataframe Saved: iris3"
[1] "Dataframe Saved: mtcars"
Then I try to read back in any of the files I created:
#But when read back in only contain a single character string of the dataframe name
a<-readRDS("D:/Data/iris3.Rds")
str(a)
chr "iris3"
Note that when I read iris3.Rds back into a new R session using readRDS, I don't get a dataframe as I was expecting, but a single character vector containing the name of the dataframe and not the data.
I haven't been programming in R for a while, since my current client preferred SAS, so I think I am somehow getting macro variable looping in SAS confused with R and so that when I call saveRDS, I'm passing in a single character vector instead of the actual dataframe. How can I get the dataframe to be passed into saveRDS instead of the character?
Thanks for helping me untangle my SAS thinking with my somewhat rusty R thinking.
You're currently just saving the names of the dataframes. You can use the get function as follows:
mydfs<-c("cars","iris","iris3","mtcars")
for (i in 1:length(mydfs)){
savefile<-paste0(paste0("D:/Data/", mydfs[i]), ".Rds")
saveRDS(get(mydfs[i]), file=savefile)
print(paste("Dataframe Saved:", mydfs[i]))
}
readRDS('D:/Data/iris3.Rds')
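The same idea can be written without an explicit loop by using mget(), which looks up several names at once. This is a sketch rather than a drop-in replacement: it writes to a temporary directory instead of D:/Data/, and it uses three built-in data frames as stand-ins for your own.

```r
mydfs <- c("cars", "iris", "mtcars")
out_dir <- tempdir()  # assumed output folder; replace with your own path

# mget() looks up every name at once and returns a named list of the objects
df_list <- mget(mydfs, inherits = TRUE)

# One .Rds file per data frame, named after the object
out_files <- file.path(out_dir, paste0(names(df_list), ".Rds"))
invisible(Map(saveRDS, df_list, out_files))

# Reading a file back gives the data frame itself, not its name
a <- readRDS(file.path(out_dir, "iris.Rds"))
str(a)
```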
I am performing a set of analyses in R. The flow of the analysis is reading in a dataframe (i.e. input_dataframe), performing a set of calculations that then result in a new, smaller dataframe (called final_result). A set of exact calculations is performed on 23 different files, each of which contains a dataframe.
My question is as follows: for each file that is read in (i.e. each of the 23 files), I am trying to save a unique R object. How do I do so? When I save the resulting final_result dataframe to an R object (using save()), I then cannot read all 23 objects into a new R session without the different R objects overwriting each other. Other suggestions (such as Create a variable name with "paste" in R?) did not work for me, since they rely on the fact that once the new variable name is assigned, you then call that new variable by its name, which I cannot do in this case.
To Summarize/Reword: Is there a way to save an object in R but change the name of the object for when it will be loaded later?
For example:
x=5
magicSave(x,file="saved_variable_1.r",to_save_as="result_1")
x=93
magicSave(x,file="saved_variable_2.r",to_save_as="result_2")
load(saved_variable_1)
load(saved_variable_2)
result_1
#returns 5
result_2
#returns 93
In R it's generally a good idea to actually store as a list everything that can be seen as a list. It will make everything more elegant afterwards.
First you put all your paths in a list or a vector :
paths <- c("C:/somewhere/file1.csv",
"C:/somewhere/file2.csv") # etc
Then you read them :
objects <- lapply(paths,read.csv) # objects is a list of tables
Then you apply your transformation on each element :
output <- lapply(objects,transformation_function)
And then you can save your output (I find saveRDS cleaner than save as you know what variables you'll be inviting in your workspace when loading) :
saveRDS(output,"C:/somewhere/output.RDS")
which you will load with
output <- readRDS("C:/somewhere/output.RDS")
OR if you prefer for some reason to save as different objects:
output_paths <- paste0("C:/somewhere/output",seq_along(output),".RDS")
Map(saveRDS,output,output_paths)
To load later with:
output <- lapply(output_paths, readRDS)
x=5
write.csv(x,"one_thing.csv", row.names = FALSE)
x=93
write.csv(x,"two_thing.csv", row.names = FALSE)
result_1 <- read.csv("one_thing.csv")
result_2 <- read.csv("two_thing.csv")
result_1
# x
# 1 5
result_2
# x
# 1 93
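If you do want load() itself to restore the object under a different name, something close to the magicSave in the question can be sketched with save(list = , envir = ). The helper name save_as below is hypothetical, not an existing R function, and the files are written to a temporary directory for the sake of the example.

```r
# Hypothetical helper: save 'value' so that load() later restores it as 'name'
save_as <- function(value, name, file) {
  e <- new.env()
  assign(name, value, envir = e)             # store the value under the new name
  save(list = name, envir = e, file = file)  # save it from that environment
}

tmp <- tempdir()
x <- 5
save_as(x, "result_1", file.path(tmp, "saved_variable_1.RData"))
x <- 93
save_as(x, "result_2", file.path(tmp, "saved_variable_2.RData"))

# Each file restores its object under the name chosen at save time
load(file.path(tmp, "saved_variable_1.RData"))
load(file.path(tmp, "saved_variable_2.RData"))
result_1  # 5
result_2  # 93
```

Because each file carries its own object name, loading all of the files into one session no longer overwrites anything.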
I would like to load a data file in R using data(), with the data set's name stored in a variable. Doing this without the data set name stored in a variable is trivial:
> library(ChIPpeakAnno)
> data(TSS.human.NCBI36)
> # Use data:
> TSS.human.NCBI36 # Prints out contents of data set
When the data set name is stored in a variable, however, I'm not sure how to accomplish the same task.
> library(ChIPpeakAnno)
> assembly <- 'TSS.human.NCBI36'
> data(list=c(assembly)) # Hackish way of loading the data from a variable
> # Now I wish to access the data, but I don't know how.
data()'s return value is simply the name of the data set loaded. The data file I'm trying to load is located at ~/R/2.15/library/ChIPpeakAnno/data/TSS.human.NCBI36.rda -- I do not believe there is anything Bioconductor-specific to it.
Thanks!
If you're trying to figure out how to access data programmatically when you just have the object's name in a character vector, you can use get.
library(ChIPpeakAnno)
assembly <- 'TSS.human.NCBI36'
data(list=c(assembly))
# Now store the data into 'dat'
dat <- get(assembly)
# Now you can use 'dat' anywhere you would normally use TSS.human.NCBI36
head(start(dat))
#[1] 1873 4274 20229 24417 24417 42912
head(start(TSS.human.NCBI36))
#[1] 1873 4274 20229 24417 24417 42912
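A variation on the same idea keeps the data set out of the workspace altogether by passing an environment to data() and to get(). The sketch below uses the built-in mtcars data set from the datasets package as a stand-in, so that it runs without ChIPpeakAnno installed; the variable names are only illustrative.

```r
dataset_name <- "mtcars"  # stand-in for 'TSS.human.NCBI36'

# Load the data set into a scratch environment instead of the workspace
e <- new.env()
data(list = dataset_name, package = "datasets", envir = e)

# Fetch the object by name from that environment
dat <- get(dataset_name, envir = e)
head(dat)
```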