Efficiently cleaning R from attached objects

Efficiently cleaning R from attached objects - r

I want to initialise some variables from an external data file. One way is to set a file like the following foo.csv:
var1,var2,var3
value1,value2,value3
Then issue:
attach(read.csv('foo.csv'))
The problem is that in this way var1, var2, var3 are not shown by ls() and most of all rm(all=ls()) doesn't clean all anymore and var1, var2, var3 are still there.
As the default position for new objects is '2', I can remove the workspace where this variables live via:
detach(pos=2)
or simply
detach()
Since pos=2 is the default for detach too.
But detach() is "too" powerful and it can delete R objects loaded by default. This means that, if one attaches many datasets, removing them with repeated detach can lead to delete also default R objects and you have to restart it. Besides the simplicity of the single rm(all=ls()) goes away.
One solution will be to attach var1, var2, var3 straight to the the global environment.
Do you know how to do that?
attach(read.csv('foo.csv'), pos=1)
issues a warning (future error).
attach(read.csv('foo.csv'), pos=-1)
seems ineffective.

If you want to read the variables straight into the global environment, you can do this:
{
foo<-read.csv('foo.csv')
for(n in names(foo)) assign(n,foo[[n]],globalenv())
}
The braces will prevent foo from also being added to the global environment. You can also make this into a function if you want.

Use the named variant of attach and detach:
attach(read.csv(text='var1,var2,var3\nvalue1,value2,value3'),
name = 'some_name')
and
detach('some_name')
This will prevent mistakes. You’d obviously wrap these two into functions and generate the names automatically in an appropriate manner (the easiest being via a monotonically increasing counter).

What about a 'safer' attach?
attach<-function(x) {for(n in names(x)) assign(n,x[[n]],globalenv()); names(x)}
'Safer' means you can see attached vars with ls() and most of all remove them with a single rm(list=ls())
Inspired by mrip

Related

Is there a way to delete existing variables in my R environment via the terminal? [duplicate]

I would like to remove some data from the workspace. I know the "Clear All" button will remove all data. However, I would like to remove just certain data.
For example, I have these data frames in the data section:
data
data_1
data_2
data_3
I would like to remove data_1, data_2 and data_3, while keeping data.
I tried data_1 <- data_2 <- data_3 <- NULL, which does remove the data (I think), but still keeps it in the workspace area, so it is not fully what I would like to do.

You'll find the answer by typing ?rm
rm(data_1, data_2, data_3)

A useful way to remove a whole set of named-alike objects:
rm(list = ls()[grep("^tmp", ls())])
thereby removing all objects whose name begins with the string "tmp".
Edit: Following Gsee's comment, making use of the pattern argument:
rm(list = ls(pattern = "^tmp"))
Edit: Answering Rafael comment, one way to retain only a subset of objects is to name the data you want to retain with a specific pattern. For example if you wanted to remove all objects whose name do not start with paper you would issue the following command:
rm(list = grep("^paper", ls(), value = TRUE, invert = TRUE))

Following command will do
rm(list=ls(all=TRUE))

In RStudio, ensure the Environment tab is in Grid (not List) mode.
Tick the object(s) you want to remove from the environment.
Click the broom icon.

You can use the apropos function which is used to find the objects using partial name.
rm(list = apropos("data_"))

Use the following command
remove(list=c("data_1", "data_2", "data_3"))

If you just want to remove one of a group of variables, then you can create a list and keep just the variable you need. The rm function can be used to remove all the variables apart from "data". Here is the script:
0->data
1->data_1
2->data_2
3->data_3
#check variables in workspace
ls()
rm(list=setdiff(ls(), "data"))
#check remaining variables in workspace after deletion
ls()
#note: if you just use rm(list) then R will attempt to remove the "list" variable.
list=setdiff(ls(), "data")
rm(list)
ls()

paste0("data_",seq(1,3,1))
# makes multiple data.frame names with sequential number
rm(list=paste0("data_",seq(1,3,1))
# above code removes data_1~data_3

If you're using RStudio, please consider never using the rm(list = ls()) approach!* Instead, you should build your workflow around frequently employing the Ctrl+Shift+F10 shortcut to restart your R session. This is the fastest way to both nuke the current set of user-defined variables AND to clear loaded packages, devices, etc. The reproducibility of your work will increase markedly by adopting this habit.
See this excellent thread on Rstudio community for (h/t #kierisi) for a more thorough discussion (the main gist is captured by what I've stated already).
I must admit my own first few years of R coding featured script after script starting with the rm "trick" -- I'm writing this answer as advice to anyone else who may be starting out their R careers.
*of course there are legitimate uses for this -- much like attach -- but beginning users will be much better served (IMO) crossing that bridge at a later date.

To clear all data:
click on Misc>Remove all objects.
Your good to go.
To clear the console:
click on edit>Clear console.
No need for any code.

Adding one more way, using ls() and remove()
ls() return a vector of character strings giving the names of the objects in the specified environment.
Create a list of objects you want to remove from the environment using ls() and then use remove() to remove it.
remove(list = ls()[ls() != "data"])

You can also use tidyverse
# to remove specific objects(s)
rm(list = ls() %>% str_subset("xxx"))
# or to keep specific object(s)
rm(list = setdiff(ls(), ls() %>% str_subset("xxx")))

Maybe this can help as well
remove(list = c(ls()[!ls() %in% c("what", "to", "keep", "here")] ) )

Doubts on running multiple scripts at once

I'm using
sapply(list.files('scritps/', full.names=TRUE),source)
to run 80 scripts at once in the folder "scripts/" and I do not know exactly how does this work. There are "intermediate" objects equally named across scripts (they are iterative scritps across 80 different biological populations). Does each script only use its own objects? Is there any risk of an script taking the objects of other "previous" script that has not been yet deleted out of the memory, or does this process works exactly like if was run manually sequentially one by one?
Many thanks in advance.

The quick answer is: each script runs independently. Imagine you run a for loop iterating through all the script files instead of using sapply - it should be the same in results.
To prove my thoughts, I just did an experiment:
# This is foo.R
x <- mtcars
write.csv(x, "foo.csv")
# This is bar.R
x <- iris
write.csv(x, "bar.csv")
# Run them at once
sapply(list.files(), source)
Though the default of "local" argument in source is FALSE, it turns out that I have two different csv files in my working directory, one named "foo.csv" with mtcars data frame, and the other named "bar.csv" with iris data frame.

There are global variables that you can declare out a function. As it's name says they are global and can be re-evaluated. If you declare a var into a function it will be local variable and only will take effect inside this concrete function, it will not exists out of its own function.
Example:
Var globalVar = 'i am global';
Function foo(){
Var localVar = 'i don't exist out of foo function';
}
If you declared globalVar on the first script, and you call it on the latest one, it will answer. If you declared localVar on some script and you call it into another or out of the functions or in another function you'll get an error (var localVar is not declared / can't be found).
Edit:
Perhaps, if there aren't dependences between scripts (you don't need one to finish to continue with another) there's no matter on running them on parallel or running them secuentialy. The behaviour will be the same.
You've only to take care with global vars, local ones can't infer into another script/function.

Using delayed assignment in a function: How do I send the promise back to the parent environment?

I would like to used delayedAssign to load a series of data from a set of files only when the data is needed. But since these files will always be in the same directory (which may be moved around), instead of hard coding the location of each file (which would be tedious to change later on if the directory was moved), I would like to simply make a function that accepts the filepath for the directory.
loadLayers <- function(filepath) {
delayedAssign("dataset1", readRDS(file.path(filepath, "experiment1.rds")))
delayedAssign("dataset2", readRDS(file.path(filepath, "experiment2.rds")))
delayedAssign("dataset3", readRDS(file.path(filepath,"experiment3.rds")))
return (list <- (setOne = dataset1, setTwo = dataset2, setThree = dataset3)
}
So instead of loading all the data sets at the start, I'd like to have each data set loaded only when needed (which speeds up the shiny app).
However, I'm having trouble when doing this in a function. It works when the delayedAssign is not in a function, but when I put them in a function, all the objects in the list simply return null, and the "promise" to evaluate them when needed doesn't seem to be fulfilled.
What would be the correct way to achieve this?
Thanks.

Your example code doesn't work in R, but even conceptually, you're using delayedAssign and then you immediately resolve it by referencing it in return() so you end up loading everything anyway. To make it clear, assignments are binding a symbol to a value in an enviroment. So in order for it to make any sense your function must return the environment, not a list. Or, you can simply use the global environment and the function doesn't need to return anything as you use it for its side-effect.
loadLayers <- function(filepath, where=.GlobalEnv) {
delayedAssign("dataset1", readRDS(file.path(filepath, "experiment1.rds")),
assign.env=where)
delayedAssign("dataset2", readRDS(file.path(filepath, "experiment2.rds")),
assign.env=where)
delayedAssign("dataset3", readRDS(file.path(filepath, "experiment3.rds")),
assign.env=where)
where
}

retrieving object definition in R studio

I've been defining some variables with the "<-", like
elect<- copPob12$nivelaprob[copPob12$dispelect==5]
I can see their numerical values on the global environment,
but I want to see how I defined them, to be sure about the function I used, because they are subsets within subsets, I can find them in the "History" tab, but that takes too long,
any function that can retrieve the way I defined the variable on the console?
thanks a lot

As I see the problem, may be you are looking for this:
elect <<- copPob12$nivelaprob[copPob12$dispelect==5]
or you can write
elect <- copPob12$nivelaprob[copPob12$dispelect==5]
assign("elect", elect , envir = .GlobalEnv)
here it changes environment as global so it works within function also

How do I clear only a few specific objects from the workspace?

I would like to remove some data from the workspace. I know the "Clear All" button will remove all data. However, I would like to remove just certain data.
For example, I have these data frames in the data section:
data
data_1
data_2
data_3
I would like to remove data_1, data_2 and data_3, while keeping data.
I tried data_1 <- data_2 <- data_3 <- NULL, which does remove the data (I think), but still keeps it in the workspace area, so it is not fully what I would like to do.

You'll find the answer by typing ?rm
rm(data_1, data_2, data_3)

A useful way to remove a whole set of named-alike objects:
rm(list = ls()[grep("^tmp", ls())])
thereby removing all objects whose name begins with the string "tmp".
Edit: Following Gsee's comment, making use of the pattern argument:
rm(list = ls(pattern = "^tmp"))
Edit: Answering Rafael comment, one way to retain only a subset of objects is to name the data you want to retain with a specific pattern. For example if you wanted to remove all objects whose name do not start with paper you would issue the following command:
rm(list = grep("^paper", ls(), value = TRUE, invert = TRUE))

Following command will do
rm(list=ls(all=TRUE))

In RStudio, ensure the Environment tab is in Grid (not List) mode.
Tick the object(s) you want to remove from the environment.
Click the broom icon.

You can use the apropos function which is used to find the objects using partial name.
rm(list = apropos("data_"))

Use the following command
remove(list=c("data_1", "data_2", "data_3"))

If you just want to remove one of a group of variables, then you can create a list and keep just the variable you need. The rm function can be used to remove all the variables apart from "data". Here is the script:
0->data
1->data_1
2->data_2
3->data_3
#check variables in workspace
ls()
rm(list=setdiff(ls(), "data"))
#check remaining variables in workspace after deletion
ls()
#note: if you just use rm(list) then R will attempt to remove the "list" variable.
list=setdiff(ls(), "data")
rm(list)
ls()

paste0("data_",seq(1,3,1))
# makes multiple data.frame names with sequential number
rm(list=paste0("data_",seq(1,3,1))
# above code removes data_1~data_3

If you're using RStudio, please consider never using the rm(list = ls()) approach!* Instead, you should build your workflow around frequently employing the Ctrl+Shift+F10 shortcut to restart your R session. This is the fastest way to both nuke the current set of user-defined variables AND to clear loaded packages, devices, etc. The reproducibility of your work will increase markedly by adopting this habit.
See this excellent thread on Rstudio community for (h/t #kierisi) for a more thorough discussion (the main gist is captured by what I've stated already).
I must admit my own first few years of R coding featured script after script starting with the rm "trick" -- I'm writing this answer as advice to anyone else who may be starting out their R careers.
*of course there are legitimate uses for this -- much like attach -- but beginning users will be much better served (IMO) crossing that bridge at a later date.

To clear all data:
click on Misc>Remove all objects.
Your good to go.
To clear the console:
click on edit>Clear console.
No need for any code.

Adding one more way, using ls() and remove()
ls() return a vector of character strings giving the names of the objects in the specified environment.
Create a list of objects you want to remove from the environment using ls() and then use remove() to remove it.
remove(list = ls()[ls() != "data"])

You can also use tidyverse
# to remove specific objects(s)
rm(list = ls() %>% str_subset("xxx"))
# or to keep specific object(s)
rm(list = setdiff(ls(), ls() %>% str_subset("xxx")))

Maybe this can help as well
remove(list = c(ls()[!ls() %in% c("what", "to", "keep", "here")] ) )

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Efficiently cleaning R from attached objects - r

If you want to read the variables straight into the global environment, you can do this: { foo<-read.csv('foo.csv') for(n in names(foo)) assign(n,foo[[n]],globalenv()) } The braces will prevent foo from also being added to the global environment. You can also make this into a function if you want.

What about a 'safer' attach? attach<-function(x) {for(n in names(x)) assign(n,x[[n]],globalenv()); names(x)} 'Safer' means you can see attached vars with ls() and most of all remove them with a single rm(list=ls()) Inspired by mrip

Related

Is there a way to delete existing variables in my R environment via the terminal? [duplicate]

Doubts on running multiple scripts at once

Using delayed assignment in a function: How do I send the promise back to the parent environment?

retrieving object definition in R studio

How do I clear only a few specific objects from the workspace?

Categories

Resources