Reading the data information saved in an .rda file - r

For an .rda file, after loading it, other than viewing it in the RStudio window, are there any functions that can list all the data information stored in the file?

When I want to see what's in an .rda file without loading it into my current environment, I load it into a temp env:
e <- new.env(parent = emptyenv())
load("path/to/file.rda", envir=e)
ls(e) # shows names of variables stored in it
ls.str(e) # shows a `str` presentation of all variables within it
Not the most efficient way, as it requires that you load the contents before listing them. I don't think it's easy to look at a raw rda file on-disk and know its contents without loading it in some fashion.
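As a rough sketch of the same idea wrapped in a function: the helper name summarise_rda and the demo objects below are made up for illustration. It still loads the whole file, just into a throwaway environment, and reports the name, class and size of each object.

```r
# Hypothetical helper (not part of base R): summarise what an .rda file holds.
summarise_rda <- function(path) {
  e <- new.env(parent = emptyenv())
  nms <- load(path, envir = e)  # load() returns the names of the loaded objects
  data.frame(
    name  = nms,
    class = vapply(nms, function(n) class(get(n, envir = e))[1],
                   character(1), USE.NAMES = FALSE),
    bytes = vapply(nms, function(n) as.numeric(object.size(get(n, envir = e))),
                   numeric(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

# Usage with a temporary file so nothing real is touched
tmp <- tempfile(fileext = ".rda")
df <- data.frame(a = 1:3)
m  <- matrix(0, 2, 2)
save(df, m, file = tmp)

info <- summarise_rda(tmp)
print(info)
```

The temporary environment goes away when the function returns, so nothing is left behind in the workspace.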

Related

How to load a single object from .Rdata file? [duplicate]

I have a Rdata file containing various objects:
New.Rdata
|_ Object 1 (e.g. data.frame)
|_ Object 2 (e.g. matrix)
|_...
|_ Object n
Of course I can load the data frame with load('New.Rdata'), however, is there a smart way to load only one specific object out of this file and discard the others?
.RData files don't have an index (the contents are serialized as one big pairlist). You could hack a way to go through the pairlist and assign only entries you like, but it's not easy since you can't do it at the R level.
However, you can simply convert the .RData file into a lazy-load database which serializes each entry separately and creates an index. The nice thing is that the loading will be on-demand:
# convert .RData -> .rdb/.rdx
e = local({load("New.RData"); environment()})
tools:::makeLazyLoadDB(e, "New")
Loading the DB then only loads the index but not the contents. The contents are loaded as they are used:
lazyLoad("New")
ls()
x # if you had x in the New.RData it will be fetched now from New.rdb
Just like with load() you can specify an environment to load into so you don't need to pollute the global workspace etc.
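For what it's worth, here is a self-contained sketch of that conversion that writes everything to a temporary directory and loads the index into a separate environment instead of the global workspace. The object names and paths are invented for the demo, and tools:::makeLazyLoadDB is an unexported function, so its behaviour could change between R versions.

```r
# Build a small .RData file in a temporary directory (demo data only)
demo_dir <- tempfile("lazydemo")
dir.create(demo_dir)
rdata <- file.path(demo_dir, "New.RData")
x <- 1:10
y <- letters
save(x, y, file = rdata)

# Convert .RData -> .rdb/.rdx; this step still reads the whole file once
src <- new.env()
load(rdata, envir = src)
filebase <- file.path(demo_dir, "New")
tools:::makeLazyLoadDB(src, filebase)

# Load only the index into its own environment; the objects are promises
target <- new.env()
lazyLoad(filebase, envir = target)
print(ls(target))

# Touching x fetches x from New.rdb; y stays on disk until it is used
total <- sum(target$x)
```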
You can use attach rather than load which will attach the data object to the search path, then you can copy the one object you are interested in and detach the .Rdata object.
This still loads everything, but is simpler to work with than loading everything into the global workspace (possibly overwriting things you don't want overwritten) then getting rid of everything you don't want.
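A minimal sketch of the attach approach, assuming a file that holds two objects; the object names and the search-path label "rdata_tmp" are arbitrary choices for the example.

```r
# Demo file containing two objects
tmp <- tempfile(fileext = ".RData")
big_vector <- 1:5
wanted_df  <- data.frame(id = 1:2)
save(big_vector, wanted_df, file = tmp)

# Attach the file to the search path under an explicit name
attach(tmp, name = "rdata_tmp", warn.conflicts = FALSE)

# Copy only the object of interest, then drop the attached database
my_df <- get("wanted_df", pos = "rdata_tmp")
detach("rdata_tmp", character.only = TRUE)
```

Giving the attached database an explicit name makes the detach call easier to write than the default "file:..." label.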
Simon Urbanek's answer is very, very nice. A drawback is that it doesn't seem to work if an object to be saved is too large:
tools:::makeLazyLoadDB(
local({
x <- 1:1e+09
cat("size:", object.size(x) ,"\n")
environment()
}), "lazytest")
size: 4e+09
Error: serialization is too large to store in a raw vector
I'm guessing that this is due to a limitation of the current implementation of R (I have 2.15.2) rather than running out of physical memory and swap. The saves package might be an alternative for some uses, however.
A small function is useful for extracting a single object without loading everything into your workspace. Note that the whole file is still read, but only into a temporary environment that is thrown away afterwards.
extractorRData <- function(file, object) {
  # Extract one object from a .RData file created by R's save() command.
  # Inputs: path to the RData file, name of the object (character string).
  E <- new.env()
  load(file = file, envir = E)
  get(object, envir = E, inherits = FALSE)
}
See full answer here. https://stackoverflow.com/a/65964065/4882696
This blog post describes a neat practice that prevents this sort of issue in the first place. The gist of it is to use the saveRDS() and readRDS() functions instead of the regular save() and load() functions.
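A small sketch of the one-object-per-file pattern, using a made-up data frame and a temporary file. Unlike load(), reading the file back returns the object itself, so you pick the variable name at read time and nothing gets overwritten by accident.

```r
# Save a single object to its own .rds file (demo data only)
tmp <- tempfile(fileext = ".rds")
scores <- data.frame(id = 1:3, value = c(0.5, 1.5, 2.5))
saveRDS(scores, tmp)

# Read it back under whatever name you like
restored <- readRDS(tmp)
print(restored)
```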

How to open .rdb file using R

My question is quite simple, but I couldn't find the answer anywhere.
How do I open a .rdb file using R?
It is placed inside an R package.
I have been able to solve the problem, so I am posting the answer here in case someone needs it in the future.
#### Importing data from .rdb file ####
setwd("path...\\Rsafd\\Rsafd\\data") # Set the working directory to the folder that contains
# your .rds and .rdb files.
readRDS("Rdata.rds") # see metadata contained in .rds file
# lazyLoad is the function we use to open a .rdb file:
lazyLoad(filebase = "path...\\Rsafd\\Rsafd\\data\\Rdata", envir = parent.frame())
# for filebase, Rdata is the name of the .rdb file.
# envir is the environment on which the objects are loaded.
The result of using the lazyLoad function is that every dataset contained in the .rdb file shows up in your variable environment as a "promise". This means that a dataset will not actually be read from disk until you use it.
Any use of the object forces the promise and loads the data:
find("HOWAREYOU") # check where the object named HOWAREYOU is found (find() takes the name as a string)
head(HOWAREYOU) # look at the first entries; this is what actually loads the data
Edit: readRDS is not part of the process to open the .rdb file, it is just to look at the metadata. The lazyLoad function indeed opens .rdb files.
Posting a slightly more direct answer since I keep Googling to this Q&A when trying to examine .rdb objects inside an R package (in particular the help/package.rdb file) and not seeing the answer clearly enough.
R keeps the help Rd objects for the installed package pkg at help/$pkg.{rdb,rdx}.
We can load these Rd objects into environment e like so:
lazyLoad(
file.path(system.file("help", package=pkg), pkg),
envir = e
)
Note that we can't use system.file("help", pkg, package=pkg) because system.file() requires the file to exist or it returns "", and here we've truncated the .rdb/.rdx extension as required by lazyLoad().
We can skip supplying envir=e, but the objects will be loaded into the global environment (assuming you're running this interactively) and I wanted my default answer to avoid polluting it.
See ?lazyLoad for more.
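Putting those pieces together, a hedged sketch: list_help_topics is a name I made up, and "stats" is just an example of a package that is normally installed. The function returns an empty vector if the help database is missing, which can happen when a package was installed without help pages.

```r
# Hypothetical helper: list the Rd objects stored in a package's help database.
list_help_topics <- function(pkg) {
  filebase <- file.path(system.file("help", package = pkg), pkg)
  # lazyLoad() needs both <filebase>.rdb and <filebase>.rdx to exist
  if (!all(file.exists(paste0(filebase, c(".rdb", ".rdx"))))) {
    return(character(0))
  }
  e <- new.env()
  lazyLoad(filebase, envir = e)  # loads promises only, not the Rd contents
  ls(e)
}

topics <- list_help_topics("stats")
print(head(topics))
```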

Open file generated by loading .rda files

I followed your advice about creating a loop that loads files in R and did:
dataFiles<-lapply(Sys.glob("kwo*.rda*"), load)
Now I have my dataFiles which contains the files I wanted to load
head(dataFiles)
[[1]]
[1] "kw"
[[2]]
[1] "kw"
[[3]]
[1] "kw"
Now I need to work with the information contained in the files I loaded, what should I do to open the files and to 'identify' them?
The standard behavior of load in this kind of loop is to load the data into a temporary environment and then discard that environment again. What load returns is only the names of the objects it loaded, which is why dataFiles contains "kw" three times rather than the data. If you want the objects in the global environment, you need to explicitly load them into the global environment; see this SO post for more info. This will load all the objects contained in all the .rda files into your global environment, aka workspace. Since every file here stores an object called kw, each load will overwrite the previous one.
Could you provide some more information about what exactly you are doing? What generated the .rda files, and what do you want to do with the data you read in? More information can help us help you. You also refer to an earlier SO question of yours ("I followed your advice about creating a loop that loads files in R"), but I cannot find this question in your profile.
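In the meantime, one possible approach is to give every file its own environment so that objects with the same name do not clash. This is only a sketch: load_rda_files is a hypothetical helper, and the kwo*.rda files are recreated in a temporary directory because I do not have yours.

```r
# Hypothetical helper: load each .rda file into its own environment.
load_rda_files <- function(paths) {
  envs <- lapply(paths, function(p) {
    e <- new.env()
    load(p, envir = e)
    e
  })
  names(envs) <- basename(paths)  # identify each environment by its file name
  envs
}

# Demo files, each storing an object called kw (as in the question)
demo_dir <- tempfile("kwo")
dir.create(demo_dir)
for (i in 1:3) {
  kw <- data.frame(run = i)
  save(kw, file = file.path(demo_dir, sprintf("kwo%d.rda", i)))
}

files  <- Sys.glob(file.path(demo_dir, "kwo*.rda"))
loaded <- load_rda_files(files)

# Work with the kw object that came from the second file
second_kw <- loaded[["kwo2.rda"]]$kw
print(second_kw)
```

Naming the list by file name is what lets you "identify" which kw came from which file.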

Resources