My memory is getting clogged by a bunch of intermediate files (call them temp1, temp2, etc.), and I would like to know if it is possible to remove them from memory without doing repeated rm calls (i.e. rm(temp1), rm(temp2))?
I tried rm(list(temp1, temp2, etc.)), but that doesn't seem to work.
Make the list a character vector (not a vector of names)
rm(list = c('temp1','temp2'))
or
rm(temp1, temp2)
An other solution rm(list=ls(pattern="temp")), remove all objects matching the pattern.
Or using regular expressions
"rmlike" <- function(...) {
names <- sapply(
match.call(expand.dots = FALSE)$..., as.character)
names = paste(names,collapse="|")
Vars <- ls(1)
r <- Vars[grep(paste("^(",names,").*",sep=""),Vars)]
rm(list=r,pos=1)
}
rmlike(temp)
Another variation you can try is (expanding #mnel's answer)
if you have many temp'x'.
Here, "n" could be the number of temp variables present:
rm(list = c(paste("temp",c(1:n),sep="")))
Related
My memory is getting clogged by a bunch of intermediate files (call them temp1, temp2, etc.), and I would like to know if it is possible to remove them from memory without doing repeated rm calls (i.e. rm(temp1), rm(temp2))?
I tried rm(list(temp1, temp2, etc.)), but that doesn't seem to work.
Make the list a character vector (not a vector of names)
rm(list = c('temp1','temp2'))
or
rm(temp1, temp2)
An other solution rm(list=ls(pattern="temp")), remove all objects matching the pattern.
Or using regular expressions
"rmlike" <- function(...) {
names <- sapply(
match.call(expand.dots = FALSE)$..., as.character)
names = paste(names,collapse="|")
Vars <- ls(1)
r <- Vars[grep(paste("^(",names,").*",sep=""),Vars)]
rm(list=r,pos=1)
}
rmlike(temp)
Another variation you can try is (expanding #mnel's answer)
if you have many temp'x'.
Here, "n" could be the number of temp variables present:
rm(list = c(paste("temp",c(1:n),sep="")))
This question already has answers here:
How to assign values to dynamic names variables
(2 answers)
Closed 7 years ago.
I keep running into situations where I want to dynamically create variables using a for loop (or similar / more efficient construct using dplyr perhaps). However, it's unclear to me how to do it right now.
For example, the below shows a construct that I would intuitively expect to generate 10 variables assigned numbers 1:10, but it doesn't work.
for (i in 1:10) {paste("variable",i,sep = "") = i}
The error
Error in paste("variable", i, sep = "") = i :
target of assignment expands to non-language object
Any thoughts on what method I should use to do this? I assume there are multiple approaches (including a more efficient dplyr method). Full disclosure: I'm relatively new to R and really appreciate the help. Thanks!
I've run into this problem myself many times. The solution is the assign command.
for(i in 1:10){
assign(paste("variable", i, sep = ""), i)
}
If you wanted to get everything into one vector, you could use sapply. The following code would give you a vector from 1 to 10, and the names of each item would be "variable i," where i is the value of each item. This may not be the prettiest or most elegant way to use the apply family for this, but I think it ought to work well enough.
var.names <- function(x){
a <- x
names(a) <- paste0("variable", x)
return(a)
}
variables <- sapply(X = 1:10, FUN = var.names)
This sort of approach seems to be favored because it keeps all of those variables tucked away in one object, rather than scattered all over the global environment. This could make calling them easier in the future, preventing the need to use get to scrounge up variables you'd saved.
No need to use a loop, you can create character expression with paste0 and then transform it as uneveluated expression with parse, and finally evaluate it with eval.
eval(parse(text = paste0("variable", 1:10, "=",1:10, collapse = ";") ))
The code you have is really no more useful than a vector of elements:
x<-1
for(i in 2:10){
x<-c(x,i)
}
(Obviously, this example is trivial, could just use x<-1:10 and be done. I assume there's a reason you need to do non-vectored calculations on each variable).
I have a data.frame mapping which contains path and map.
I also have another data.frame DATA which contains the raw path and value.
EDIT: Path might have two components or more: e.g. "A>C" or "A>C>B"
set.seed(24);
DATA <- data.frame(
path=paste0(sample(LETTERS[1:3], 25, replace=TRUE), ">", sample(LETTERS[1:3], 25, replace=TRUE)),
value=rnorm(25)
)
mapping <- data.frame(path=c("A","B","C"), map=c("X","Y","Z"))
lapply(mapping, function (x) {
for (i in 1:nrow(DATA)) {
DATA$path[i] <- gsub(as.character(x["path"]),as.character(x["map"]),as.character(DATA$path[i]))
}
})
I'm trying to replace the path in DATA with the map value in mapping but this doesn't seem to be working for me.
"A>C" will be converted to "X>Z".
I understand that for loops are not good in R, but I can't think of another way to code it. Data size I'm working with is 6m row in DATA and 16k rows in mapping.
Clarification on Data: While the path consists of alphabets (ABC) now, the real path are actually domain names. Number of steps in a path is also not fixed at 2 and can be any number.
You can use chartr
DATA$path <- chartr('ABC', 'XYZ', DATA$path)
Or if we are using the data from 'mapping'
DATA$path <- chartr(paste(mapping$path, collapse=''),
paste(mapping$map, collapse=''), DATA$path)
Or using gsubfn
library(gsubfn)
pat <- paste0('[', paste(mapping$path, collapse=''),']')
indx <- setNames(as.character(mapping$map), mapping$path)
gsubfn(pat, as.list(indx), as.character(DATA$path))
Or a base R option based on #smci's comment
vapply(strsplit(as.character(DATA$path), '>'), function(x)
paste(indx[x], collapse=">"), character(1L))
Using data.table (1.9.5+), especially advisable b/c of the size of your data.
library(data.table)
setDT(DATA); setDT(mapping)
DATA[,paste0("path",1:2):=tstrsplit(path,split=">")]
setkey(DATA,path1)[mapping,new.path1:=i.map]
setkey(DATA,path2)[mapping,new.path2:=i.map]
DATA[,new.path:=paste0(new.path1,">",new.path2)]
If you want to get rid of the extra columns:
DATA[,paste0(c("","","new.","new."),"path",rep(1:2,2)):=NULL]
If you just want to overwrite path, use path on the LHS of the last line instead of new.path.
This could also be written more concisely:
library(data.table)
setDT(mapping)
setkey(setkey(setDT(DATA)[,paste0("path",1:2):=tstrsplit(path,split=">")
],path1)[mapping,new.path1:=i.map],path2
)[mapping,new.path:=paste0(new.path1,">",i.map)]
I think you're using the wrong apply.
mapply allows you to use two arguments to the function, here the path and the map. Note that in mapply, the argument FUN comes first. You also do not need to do this row by row, you can just do the entire column at once. Finally, in an apply the variables do not get updated as they do in a for loop, so you need to assign them in the .GlobalEnv. You can do this with an explicit call to assign() or using <<- which assigns them in the first place it finds them in the stack. In this case, that will be back in .GlobalEnv.
After defining mapping and DATA as you do above, try this.
head(DATA)
invisible(mapply( function (x,y) {
DATA$path <<- gsub(x,y,DATA$path)
},mapping$path, mapping$map))
head(DATA)
note that the call to invisible suppresses output from mapply.
If you really want to use lapply, you can. But you need to transpose mapping. You can do that but it will be converted to a matrix, so you have to convert it back. Then, you can just use the same tricks with <<- and not using a for loop as above to get this code:
invisible(lapply(as.data.frame(t(mapping)), function (x) {
DATA$path <<- gsub(x[1],x[2],DATA$path)
}))
head(DATA)
Thanks for sharing, I learned a lot answering this question.
I am trying to use rbind on them. But I need a list of all the dataframes that are already in my global environment. How can I do it?
Code I used to import the 20 csv files in a directory. Basically, have to combine into a single dataframe.
temp = list.files(pattern = "*.csv")
for (i in 1:length(temp)) assign(temp[i], read.csv(temp[i]))
This function should return a proper list with all the data.frames as elements
dfs <- Filter(function(x) is(x, "data.frame"), mget(ls()))
then you can rbind them with
do.call(rbind, dfs)
Of course it's awfully silly to have a bunch of data.frames lying around that are so related that you want to rbind them. It sounds like they probably should have been in a list in the first place.
I recommend you say away from assign(), that's always a sign things are probably afoul. Try
temp <- list.files(pattern="*.csv")
dfs <- lapply(temp, read.csv)
that should return a list straight away.
From your posted code, I would recommend you start a new R session, and read the files in again with the following code
do.call(rbind, lapply(list.files(pattern = ".csv"), read.csv))
The ls function lists all things in your environment. The get function gets a variable with a given name. You can use the class function to get the class of a variable.
If you put them all together, you can do this:
ls()[sapply(ls(), function(x) class(get(x))) == 'data.frame']
which will return a character vector of the data.frames in the current environment.
If you only have data.frames with the same number of columns and column names in you global environment, the following should work (non-data.frame object don't matter):
do.call(rbind, eapply(.GlobalEnv,function(x) if(is.data.frame(x)) x))
This is a slight improvement on MentatOfDune's answer, which does not catch data.frames with multiple classes:
ls()[grepl('data.frame', sapply(ls(), function(x) class(get(x))))]
To improve MentatOfDune's answer (great username by the way):
ls()[sapply(ls(), function(x) any(class(get(x)) == 'data.frame'))]
or even more robust:
ls()[sapply(ls(), function(x) is.data.frame(get(x)))]
This also supports tibbles (created with dplyr for example), because they contain multiple classes, where data.frame is one of them.
A readable version to get TRUEs and FALSEs using R 4 and higher:
ls() |> sapply(get) |> sapply(is.data.frame)
Finally super, super robust, also for package developers:
ls()[sapply(ls(), function(x) is.data.frame(eval(parse(text = x), envir = globalenv())))]
I have some data.frames
dat1=read.table...
dat2=read.table...
dat3=read.table...
And I would to count the rows for each data set. So
the names are saved like this (cannot "change" it) vector=c("dat1","dat2","dat3...)
p <- vector(numeric, length=1:length(dat))
counting <- function(x) {for (i in 1:x){
p[i]<-nrow(dat[i])}
return(p)
}
This is not working because the input for nrow is a character, but i need integer(?) or?
Thx for help
You can use get for this, but be careful! Instead reading the tables at a list is the R-ish way:
file.names <- list.files()
dat <- lapply(file.names, read.table)
Then you have all the conveniences of lapply and the apply family at your disposal, e.g.:
lapply(dat, nrow)
The solution using get (also vector is a bad variable name since its a very important function):
lapply(vector, function(x) nrow(get(x)))
Your method fails since there is no object called dat to index into. The for loop could look like:
p = NULL
for(v in vector) {
p <- c(p, nrow(get(v)))
}
But that technique is poor form for lotsa reasons...
If you want to determine properties of items you know to be in the .GlobalEnv, this works:
> sapply( c("A","B"), function(objname) nrow(.GlobalEnv[[objname]]) )
A B
5 4
You could substitute any character vector for c("A","B")`. If the object is not in the global environment it just returns NULL, so it's reasonably robust.