Rds File size increasing after reading [duplicate]


What are the main differences between .RData, .Rda and .Rds files?
Are there differences in compression, etc.?
When should each type be used?
How can one type be converted to another?

Rda is just a short form of RData: you can save(), load(), attach(), etc. these files exactly as you do with .RData files.
Rds stores a single R object. Beyond that simple explanation, there are several differences from "standard" storage; the R manual page for readRDS() probably clarifies these distinctions sufficiently.
So, answering your questions:
The difference is not about compression (both save() and saveRDS() gzip-compress by default, controlled by their compress argument) but about serialization (see this page).
As shown in the manual page, you may want to use Rds to restore an object under a different name, for instance.
To convert between the formats, combine readRDS() with save(), or load() with saveRDS(), selectively.
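A minimal sketch of both conversions, using temporary files (the file and object names are illustrative only):

```r
x <- 1:5

# Rds -> Rda: readRDS() returns the object, so it can be bound to any name
rds_file <- tempfile(fileext = ".rds")
saveRDS(x, rds_file)
restored <- readRDS(rds_file)
rda_file <- tempfile(fileext = ".rda")
save(restored, file = rda_file)

# Rda -> Rds: load() into a scratch environment, then write the object on its own
scratch <- new.env()
loaded_names <- load(rda_file, envir = scratch)  # names of the restored objects
rds_again <- tempfile(fileext = ".rds")
saveRDS(get(loaded_names[1], envir = scratch), rds_again)
```

Because an .Rda file can hold several objects while an Rds file holds exactly one, a multi-object .Rda would need either one saveRDS() call per name in loaded_names, or saveRDS(as.list(scratch), ...) to store them all as a single named list.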

In addition to @KenM's answer, another important distinction is that, when loading a saved object, you can assign the contents of an Rds file to a new name. Not so for Rda:
> x <- 1:5
> save(x, file="x.Rda")
> saveRDS(x, file="x.Rds")
> rm(x)
## ASSIGN USING readRDS
> new_x1 <- readRDS("x.Rds")
> new_x1
[1] 1 2 3 4 5
## 'ASSIGN' USING load -- note the result
> new_x2 <- load("x.Rda")
loading in to <environment: R_GlobalEnv>
> new_x2
[1] "x"
# NOTE: `load()` simply returns the name of the objects loaded. Not the values.
> x
[1] 1 2 3 4 5

Related

How can i write as a csv file for S4 class?

I am using the DEP proteomics package to analyse my mass spectrometry data. I want to remove the batch effect from my data, so after preprocessing I want to export it as a CSV file that I can upload to a batch server. But I am not able to write it as a CSV: whenever I try, I get the error "no method for coercing this S4 class to a vector". I am new to R; I have read something about this, but I still don't have a clear idea.
Can someone help me with this?
> LFQ_columns <- grep("LFQ.", colnames(data_unique))
> data_se <- make_se(data_unique, LFQ_columns, experimental_design)
> data_se_parsed <- make_se_parse(data_unique, LFQ_columns)
> is(data_se)
[1] "SummarizedExperiment" "RectangularData" "Vector" "Annotated"
[5] "vector_OR_Vector"
> data_se@metadata
list()
> View(data_se)
> colnames(data_se)
[1] "Ubi4_1" "Ubi4_2" "Ubi4_3" "Ubi6_1" "Ubi6_2" "Ubi6_3" "Ctrl_1" "Ctrl_2" "Ctrl_3" "Ubi1_1" "Ubi1_2" "Ubi1_3"
> write.csv2(data_se, "/home/dell/Desktop/Preoteomics_TMT_data/ubi_data_se.csv")
Error in as.vector(x) : no method for coercing this S4 class to a vector

Sorry, the question is not too clear (not sure what you want to upload for example), but you can access and write to file what looks like mass spec readings using the following:
my_data <- data_se@assays@data@listData
write.csv2(my_data, "my_upload.csv")
It probably won't be in the format needed for your batch server, but that is a separate question.
Not sure if this is the data you want! If you have no luck here, try Biostars, where this type of question is more common.
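write.csv2() only knows how to write data frames and matrices (anything else it tries to coerce to a data frame), so passing an S4 object itself fails, while passing a plain matrix pulled out of it works. The sketch below uses a hypothetical toy S4 class, ToyExperiment, as a stand-in so it runs without Bioconductor; for a real SummarizedExperiment, the assay() accessor from the SummarizedExperiment package is the usual way to get the intensity matrix rather than reaching into slots with @.

```r
# Hypothetical toy S4 class standing in for a SummarizedExperiment-like object
setClass("ToyExperiment", representation(intensities = "matrix"))

m <- matrix(c(1.5, 2.25, 3, 4), nrow = 2,
            dimnames = list(c("protA", "protB"), c("Ctrl_1", "Ubi1_1")))
toy <- new("ToyExperiment", intensities = m)

# Passing the S4 object itself fails: write.csv2() cannot coerce it
res <- tryCatch(write.csv2(toy, tempfile()),
                error = function(e) conditionMessage(e))

# Extracting the matrix slot gives write.csv2() something it can write
out <- tempfile(fileext = ".csv")
write.csv2(toy@intensities, out)

# Read it back (semicolon separator, decimal comma) to check the round trip
back <- read.csv2(out, row.names = 1)
```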

Hide data in global environment

Is there a function that allows for data to be hidden in the global environment but it can still be accessed?
For example, I have a very long script with 100+ lines, and my global environment looks messy: there is so much in it that it strains my brain to find what is necessary.
I have searched for similar questions, and the answers involve creating a package, which, quite frankly, I have no time to learn at the moment.

If you give all the objects you don't want to appear in the global environment names starting with a dot (.), for example .foo <- 'bar', they will still be accessible but will be hidden from the global environment listing and from any ls() call:
> .foo <- 'bar'
> .foo
[1] "bar"
> ls()
character(0)
>
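Dot-prefixed names are only left out of the default listing; ls(all.names = TRUE) still shows them. A sketch using a fresh environment, so the result doesn't depend on whatever else is in the session:

```r
# A fresh environment keeps the example independent of the current session
demo_env <- new.env()
assign(".foo", "bar", envir = demo_env)
assign("visible", "baz", envir = demo_env)

default_listing <- ls(demo_env)                    # ".foo" is left out
full_listing    <- ls(demo_env, all.names = TRUE)  # ".foo" is included
```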

Possible solutions would be:
removing objects once they're not needed any more
put related variables into a list (hello lapply, sapply)
move into a separate custom environment (new.env())
simplify the script to not use as many objects
run script in batch mode and miss out on what the environment looks like altogether
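A sketch of the list and custom-environment suggestions (all object names are illustrative):

```r
# (a) Related results in one named list instead of result_1, result_2, ...
results <- lapply(1:3, function(i) i^2)
names(results) <- paste0("result_", 1:3)

# (b) Temporary helpers kept in their own environment
scratch <- new.env()
scratch$tmp_values <- c(4, 8, 15)
scratch$tmp_mean <- mean(scratch$tmp_values)
final_answer <- scratch$tmp_mean * 2

# Clear every temporary at once without touching the global environment
rm(list = ls(scratch), envir = scratch)
```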

This sounds like an XY problem: 100 lines is quite small, so it's likely that you are using too many temporary variables or numbering objects that should be in a list.
You also don't mention why you don't want your environment to be "messy". My guess is that you don't like the output of ls()?
If so, you may be happy to learn about the pattern argument of ls(), which lets you filter the result; it is mostly useful with prefixes or suffixes, as in the following examples:
something <- 1
some_var <- 2
another_var <- 3
ls(pattern ="^some")
#> [1] "some_var" "something"
ls(pattern="var$")
#> [1] "another_var" "some_var"
Created on 2019-11-17 by the reprex package (v0.3.0)

Examining contents of .rdata file by attaching into a new environment - possible?

I am interested in listing objects in an RDATA file and loading only selected objects, rather than the whole set (in case some may be big or may already exist in the environment). I'm not quite clear on how to do this when there are conflicts in names, as attach() doesn't work as nicely.
1: Examining the contents of an R data file without loading it. This question is similar to, but different from, the one asked at "listing contents of an R data file without loading".
In that case, the solution offered was:
attach(filename)
ls(pos = 2)
detach()
If there are naming conflicts between objects in the file and those in the global environment, this warning appears:
The following object(s) are masked _by_ '.GlobalEnv':
I tried creating a new environment, but I cannot seem to attach into it.
For instance, this produces the same warning:
lsfile <- function(filename){
tmpEnv <- new.env()
evalq(attach(filename), envir = tmpEnv)
tmpls <- ls(pos = 2)
detach()
return(tmpls)
}
lsfile(filename)
Maybe I've made a mess of things with evalq (or eval). Is there some other way to avoid the naming conflict?
2: If I want to access an object - if there are no naming conflicts, I can just work with the one from the .rdat file, or copy it to a new one. If there are conflicts, how does one access the object in the file's namespace?
For instance, if my file is "sample.rdat", and the object is surveyData, and a surveyData object already exists in the global environment, then how can I access the one from the file:sample.rdat namespace?
I currently solve this problem by loading everything into a temporary environment, and then copy out what's needed, but this is inefficient.

Since this question has just been referenced, let's clarify two things:
attach() simply calls load() so there is really no point in using it instead of load
if you want selective access to prevent masking it's much easier to simply load the file into a new environment:
e = local({load("foo.RData"); environment()})
You can then use ls(e) and access contents like e$x. You can still use attach on the environment if you really want it on the search path.
FWIW, .RData files have no index (the objects are stored in one big pairlist), so you can't list the contained objects without loading them. If you want convenient access, convert the file to the lazy-load format instead, which simply adds an index so each object can be loaded separately (see Get specific object from Rdata file).
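A small self-contained run of the local()/load() idiom, using a temporary file and the surveyData name from the question (the values are made up):

```r
# Create a small .RData file in a temporary location
f <- tempfile(fileext = ".RData")
x <- 1:3
surveyData <- "from file"
save(x, surveyData, file = f)

# A same-named object in the global environment that must not be clobbered
surveyData <- "from global"
rm(x)

# Inside local(), load() writes into local()'s own environment, which is returned
e <- local({ load(f); environment() })
```

Afterwards ls(e) lists the file's objects, e$surveyData holds the value from the file, and the global surveyData is untouched.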

I just use the envir= argument to load():
> x <- 1; y <- 2; z <- "foo"
> save(x, y, z, file="/tmp/foo.RData")
> ne <- new.env()
> load(file="/tmp/foo.RData", envir=ne)
> ls(envir=ne)
[1] "x" "y" "z"
> ne$z
[1] "foo"
>
The cost of this approach is that you do read the whole RData file, but that seems to be unavoidable anyway, as no other method seems to offer a list of the 'contents' of such a file.

You can suppress the warning by setting warn.conflicts=FALSE in the call to attach. If an object is masked by one in the global environment, you can use get to retrieve it from your attached data.
x <- 1:10
save(x, file="x.rData")
#attach("x.rData", pos=2, warn.conflicts=FALSE)
attach("x.rData", pos=2)
(x <- 1)
# [1] 1
(x <- get("x", pos=2))
# [1] 1 2 3 4 5 6 7 8 9 10

Thanks to @Dirk and @Joshua.
I had an epiphany. The foreach package with SMP or MC backends seems to produce environments that only inherit from, but do not seem to conflict with, the global environment.
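library(foreach)  # %dopar% needs a registered backend (e.g. via doParallel);
                  # without one it runs sequentially and warns
lsfile <- function(list_files){
  aggregate_ls <- foreach(ix = seq_along(list_files)) %dopar% {
    attach(list_files[ix])
    tmpls <- ls(pos = 2)
    detach(pos = 2)  # take the attached file off the search path again
    tmpls
  }
  return(aggregate_ls)
}
lsfile("f1.rdat")
lsfile(dir(pattern = "\\.rdat$"))  # pattern is a regular expression, not a glob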
This is useful to me because I can now parallelize this. This is a bare-bones version, and I will modify it to give more detailed information, but so far it seems to be the only way to avoid conflicts, even without ignoring the warnings.
So, question #1 can be resolved either by ignoring the warnings (as @Joshua suggested) or by using whatever magic foreach summons.
For part 2, loading an object, I think @Joshua has the right idea: get() will do.
The foreach magic can also work, by using the .noexport option. However, this has risks: whatever isn't specifically excluded will be inherited/exported from the global environment (I could do ls(), but there's always the possibility of attached datasets). For safety, this means that get() must still be used to avoid the risk of a naming conflict. Loading into a subenvironment avoids the naming conflict, but doesn't avoid the loading of unnecessary objects.
@Joshua's answer is far simpler than my foreach detour.
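For comparison, a sketch of a variant of the lsfile() helper that needs neither attach() nor foreach: each file is loaded into its own throwaway environment, and only the character vector of names that load() returns is kept. It still reads every file in full. The file contents below are made up for the demonstration.

```r
lsfile <- function(files) {
  # Name each result by its file; load() into a new environment per file
  # and keep the character vector of object names that load() returns
  lapply(setNames(files, files), function(f) load(f, envir = new.env()))
}

# Usage with two throwaway files
f1 <- tempfile(fileext = ".rdat")
f2 <- tempfile(fileext = ".rdat")
a <- 1
b <- "two"
save(a, b, file = f1)
surveyData <- data.frame(q1 = c(3, 5))
save(surveyData, file = f2)

contents <- lsfile(c(f1, f2))
```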

Loading someone else's .rdata file, can't access the data

My professor has sent me an .rdata file and wants me to do some analysis on the contents. Although I'm decent with R, I've never saved my work in .rdata files, and consequently haven't ever worked with them.
When I try to load the file, it looks like it's working:
> load('/home/swansone/Desktop/anes.rdata')
> ls()
[1] "25383-0001-Data"
But I can't seem to get at the data:
> names("25383-0001-Data")
NULL
I know that there is data in the .rdata file (it's 13 MB; there's definitely a lot in there). Am I doing something wrong? I'm at a loss.
Edit:
I should note, I've also tried not using quotes:
> names(25383-0001-Data)
Error: object "Data" not found
And renaming:
> ls()[1] <- 'nes'
Error in ls()[1] <- "nes" : invalid (NULL) left side of assignment

You're going to run into a lot of issues with an object whose name doesn't begin with a letter, or with a dot followed by a letter (as mentioned in An Introduction to R).
Use backticks to access this object (the "Names and Identifiers" section of help("`") explains why this works) and assign it to a new object with a syntactically valid name.
Data <- `25383-0001-Data`
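A minimal reproduction, assuming a small data frame stands in for the professor's object; make.names() from base R also shows what a syntactically valid version of such a name could look like:

```r
# Simulate what load() produced: a name starting with digits and containing dashes
assign("25383-0001-Data", data.frame(a = 1:3, b = c("x", "y", "z")))

cols <- names(`25383-0001-Data`)  # backticks let the parser accept the name
Data <- `25383-0001-Data`         # copy to a syntactically valid name
rm(`25383-0001-Data`)             # drop the awkward original

# make.names() prepends "X" and turns the dashes into dots
valid_name <- make.names("25383-0001-Data")
```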

It probably has to do with the unusual dashes in the name; backquotes work:
names(`25383-0001-Data`)
Edit:
More for reference (since Joshua already answered the main question perfectly), you can also reassign an object from ls() (what Wilduck tried in the question) using get(). This might be useful if the object's name contains very weird characters:
foo <- 1:5
bar <- get(ls()[1])
bar
[1] 1 2 3 4 5
This of course requires the index of foo in ls() to be [1], but looking up the index of the required object is not too hard.
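Instead of relying on an object's position in ls(), one can use the character vector that load() returns, which holds exactly the names it restored. A sketch with a temporary file and a made-up data frame stored under the awkward name:

```r
f <- tempfile(fileext = ".rdata")

# Write a file containing one object with a syntactically invalid name
assign("25383-0001-Data", data.frame(v = 1:3))
save(list = "25383-0001-Data", file = f)
rm(list = "25383-0001-Data")

# load() returns (invisibly) the names of the objects it restored
e <- new.env()
loaded <- load(f, envir = e)

# Fetch the object by that name and bind it to a friendlier one
nes <- get(loaded[1], envir = e)
```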
