How to load a single object from an .Rdata file? [duplicate]

I have an Rdata file containing various objects:
New.Rdata
|_ Object 1 (e.g. data.frame)
|_ Object 2 (e.g. matrix)
|_...
|_ Object n
Of course I can load the data frame with load('New.Rdata'). However, is there a smart way to load only one specific object out of this file and discard the others?

.RData files don't have an index (the contents are serialized as one big pairlist). You could hack a way to go through the pairlist and assign only entries you like, but it's not easy since you can't do it at the R level.
However, you can simply convert the .RData file into a lazy-load database which serializes each entry separately and creates an index. The nice thing is that the loading will be on-demand:
# convert .RData -> .rdb/.rdx
e = local({load("New.RData"); environment()})
tools:::makeLazyLoadDB(e, "New")
Loading the DB then only loads the index but not the contents. The contents are loaded as they are used:
lazyLoad("New")
ls()
x # if you had x in the New.RData it will be fetched now from New.rdb
Just like with load() you can specify an environment to load into so you don't need to pollute the global workspace etc.
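As a rough sketch of that last point, the conversion and a selective load into a separate environment might look like the following. The example file, the object names x and y, and the lazydb file base are made up for illustration. tools:::makeLazyLoadDB is an unexported internal function, so its behaviour may change between R versions. The filter argument of lazyLoad() restricts which names are set up in the target environment.

```r
# Create a small example file (hypothetical objects x and y).
src <- tempfile(fileext = ".RData")
local({
  x <- data.frame(id = 1:3)
  y <- matrix(0, 2, 2)
  save(x, y, file = src)
})

# One-time conversion: load everything once, then write <db>.rdb / <db>.rdx.
e <- new.env()
load(src, envir = e)
db <- tempfile("lazydb")            # file base, without extension
tools:::makeLazyLoadDB(e, db)       # internal function, not a stable API

# Later: set up a promise for x only, in an environment of our choosing.
target <- new.env()
lazyLoad(db, envir = target, filter = function(nm) nm == "x")
ls(target)                          # lists only the names kept by the filter
head(target$x)                      # x is read from the .rdb on first use
```

The conversion step still has to load the whole .RData file once. The benefit only shows up on later sessions that read from the .rdb/.rdx pair.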

You can use attach() rather than load(), which attaches the contents of the file to the search path. You can then copy the one object you are interested in and detach the .Rdata file.
This still loads everything, but is simpler to work with than loading everything into the global workspace (possibly overwriting things you don't want overwritten) then getting rid of everything you don't want.
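A minimal sketch of the attach approach, assuming a file that contains objects called df and m (both invented here so the example runs on its own). The search-path name rdata_tmp is also just an illustrative choice.

```r
# Build a small example file so the sketch is self-contained.
path <- tempfile(fileext = ".RData")
local({
  df <- data.frame(a = 1:3)
  m <- matrix(1:4, 2)
  save(df, m, file = path)
})

# Attach the file's contents to the search path under an explicit name.
env <- attach(path, name = "rdata_tmp", warn.conflicts = FALSE)

# Copy only the object we want, then remove the attached entry again.
wanted <- get("df", envir = env, inherits = FALSE)
detach("rdata_tmp", character.only = TRUE)
```

Giving the attached entry an explicit name makes the later detach() call unambiguous, whatever the file path looks like.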

Simon Urbanek's answer is very, very nice. A drawback is that it doesn't seem to work if an object to be saved is too large:
tools:::makeLazyLoadDB(
local({
x <- 1:1e+09
cat("size:", object.size(x) ,"\n")
environment()
}), "lazytest")
size: 4e+09
Error: serialization is too large to store in a raw vector
I'm guessing that this is due to a limitation of the current implementation of R (I have 2.15.2) rather than running out of physical memory and swap. The saves package might be an alternative for some uses, however.

A small function can extract a single object without loading everything into the global workspace. Note that load() still reads the whole file, but into a temporary environment that is discarded afterwards.
#' Extract one object from an .RData file created by R's save() command.
#' Inputs: path to the RData file, name of the object (character string).
extractorRData <- function(file, object) {
  E <- new.env()
  load(file = file, envir = E)
  get(object, envir = E, inherits = FALSE)
}
See full answer here. https://stackoverflow.com/a/65964065/4882696

This blog post describes a neat practice that prevents this sort of issue in the first place. The gist of it is to use the saveRDS() and readRDS() functions instead of the regular save() and load() functions.
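As an illustration of that practice (not taken from the blog post itself), each object can be written to its own .rds file so that any one of them can be read back independently. The object names, file names, and directory below are hypothetical.

```r
# Two example objects that would otherwise go into one .RData file.
df <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))
m  <- matrix(1:6, nrow = 2)

# Write one file per object.
dir <- tempfile("rds_store")
dir.create(dir)
saveRDS(df, file.path(dir, "df.rds"))
saveRDS(m,  file.path(dir, "m.rds"))

# Read back only the data frame; m.rds is never touched.
only_df <- readRDS(file.path(dir, "df.rds"))
```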

Related

regarding reading data information saved in a rda file

For a .rda file, after loading it, are there any functions, apart from viewing it in the RStudio window, that can list all the data stored in the file?
When I want to see what's in an .rda file without loading it into my current environment, I load it into a temp env:
e <- new.env(parent = emptyenv())
load("path/to/file.rda", envir=e)
ls(e) # shows names of variables stored in it
ls.str(e) # shows a `str` presentation of all variables within it
Not the most efficient way, as it requires that you load the contents before listing them. I don't think it's easy to look at a raw rda file on-disk and know its contents without loading it in some fashion.
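A small extension of the same idea, sketched under the assumption of an example file with two invented objects: load() invisibly returns the names of the objects it created, so those names can be used to summarise the classes of the stored objects.

```r
# Example file with two hypothetical objects.
path <- tempfile(fileext = ".rda")
local({
  cars_small <- head(mtcars, 3)
  n <- 42L
  save(cars_small, n, file = path)
})

# Load into a throwaway environment and capture the object names.
e <- new.env(parent = emptyenv())
stored <- load(path, envir = e)

# First class of each stored object, as a named character vector.
classes <- vapply(stored, function(nm) class(get(nm, envir = e))[1], character(1))
classes
```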

How to output a list of dataframes, which is able to be used by another user

I have a list whose elements are several data frames.
It is hard for another user to reproduce these data by re-running my original code, so I would like to export the list. The data frames in the list have different numbers of rows. Is there a way to export the list to a file without losing any information, so that it can be used in RStudio? I have tried to save it as RData, but I don't know how to save the information.
Thanks a lot
To output objects in R, here are 4 common methods:
dput() writes a text representation of an R object
This is very convenient if you want to allow someone to get your object by copying and pasting text (for instance on this site), without having to email or upload and download a file. The downside however is that the output is long and re-reading the object into R (simply by assigning the copied text to an object) can hang R for large objects. This works best to create reproducible examples. For a list of data frames, this would not be a very good option.
You can write an object to a .csv, .xlsx, etc. file with write.table(), write.csv(), readr::write_csv(), xlsx::write.xlsx(), etc.
While the file can then be used by other software (and re-imported into R with read.csv(), readr::read_csv(), readxl::read_excel(), etc.), the data can be transformed in the process and some objects cannot be written to a single file without prior modifications. So this is not ideal in your case either.
save.image() saves your entire workspace (objects + environment)
The workspace can then be recreated with load(). This can be useful, but here you are only interested in saving one object. In that case, it is preferable to use:
saveRDS(), which writes a single object to a file
The object can then be re-created with readRDS(). This is the best option to save an R object to file, without any modification and then re-create it.
In your situation, this is definitely the best solution.
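For a list of data frames with different numbers of rows, a minimal sketch could look like this. The list contents and the file name results.rds are made up for illustration.

```r
# A list of data frames with different shapes.
results <- list(
  small = data.frame(id = 1:2, score = c(0.5, 0.9)),
  large = data.frame(id = 1:5, label = letters[1:5])
)

# Write the whole list to one file.
out <- file.path(tempdir(), "results.rds")
saveRDS(results, out)

# The other user reads it back under any name they like.
restored <- readRDS(out)
sapply(restored, nrow)   # row counts per data frame
```

The recipient only needs the .rds file and a call to readRDS(); column types and the list structure come back as they were saved.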

R language saving SpatialPixelsDataFrame objects

Excuse me in advance for the basic question
Converting SpatialGridDataFrame objects to SpatialPixelsDataFrame ones may be a time (and computer memory) demanding task, especially when big grids are involved.
I have been unsuccessfully googling for a way to create the SpatialPixelsDataFrame object once and save it in such a way that I will be able to load it later as... a SpatialPixelsDataFrame object.
Can anybody tell me how to do that?
Thank you very much
perep
You can save R objects in RDS files:
saveRDS(anything, file="anything.rds")
and then load it back:
anything = readRDS(file="anything.rds")
Someone may suggest you use save() to an RData file instead:
save(anything, file="mything.RData")
but that means unless you do a bit of fiddling you will have to load it into a thing called anything:
rm(anything)
load(file="mything.RData")
summary(anything) # A magic "anything" has appeared!
So use RDS files, then you can load them back to any object name you like:
foo = readRDS("anything.rds")
bar = readRDS("anything.rds")
and so on.
