calling data frames in a for loop by a vector - r

I have some data.frames
dat1=read.table...
dat2=read.table...
dat3=read.table...
And I would like to count the rows of each data set.
The names are saved in a character vector like this (I cannot change that): vector=c("dat1","dat2","dat3",...)
p <- vector(numeric, length=1:length(dat))
counting <- function(x) {for (i in 1:x){
p[i]<-nrow(dat[i])}
return(p)
}
This is not working because the input for nrow is a character string, but it needs an integer (?), or something else?
Thanks for the help.

You can use get for this, but be careful! Reading the tables into a list instead is the R-ish way:
file.names <- list.files()
dat <- lapply(file.names, read.table)
Then you have all the conveniences of lapply and the apply family at your disposal, e.g.:
lapply(dat, nrow)
The solution using get (also, vector is a bad variable name since it is the name of a very important function):
lapply(vector, function(x) nrow(get(x)))
Your method fails since there is no object called dat to index into. The for loop could look like:
p = NULL
for(v in vector) {
p <- c(p, nrow(get(v)))
}
But that technique is poor form for lots of reasons; for one, growing p with c() copies the vector on every iteration.
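If the objects already exist and only their names are available, mget can collect them into a named list in one step, after which the list-based approach applies. This is a minimal sketch: the three small data frames and the name df_names are made up for illustration (df_names stands in for the asker's vector).

```r
# Hypothetical stand-ins for the tables read with read.table
dat1 <- data.frame(x = 1:3)
dat2 <- data.frame(x = 1:5)
dat3 <- data.frame(x = 1:2)

# The names as they are stored (called `vector` in the question)
df_names <- c("dat1", "dat2", "dat3")

# mget() looks up every name and returns a named list of the objects
dat_list <- mget(df_names, envir = environment())

# vapply() returns one integer per data frame, named after the list elements
row_counts <- vapply(dat_list, nrow, integer(1))
print(row_counts)
```

From here on, dat_list can be passed to lapply or sapply without any further name lookups.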

If you want to determine properties of items you know to be in the .GlobalEnv, this works:
> sapply( c("A","B"), function(objname) nrow(.GlobalEnv[[objname]]) )
A B
5 4
You could substitute any character vector for c("A","B"). If the object is not in the global environment, nrow just returns NULL, so it's reasonably robust.
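One caveat with the NULL behaviour is worth seeing in action: when any name is missing, sapply can no longer simplify to a vector and returns a list instead. The sketch below uses a scratch environment (env, a made-up name) in place of .GlobalEnv so that it does not depend on what is in your workspace; [[ works the same way on both.

```r
# Scratch environment standing in for .GlobalEnv (assumption for the demo)
env <- new.env()
env$A <- data.frame(x = 1:5)
env$B <- data.frame(x = 1:4)

# All names exist: the result simplifies to a named integer vector
counts <- sapply(c("A", "B"), function(nm) nrow(env[[nm]]))

# "nope" does not exist: env[["nope"]] is NULL, nrow(NULL) is NULL,
# and sapply() falls back to returning a list
with_missing <- sapply(c("A", "nope"), function(nm) nrow(env[[nm]]))

print(counts)
str(with_missing)
```

Code that consumes the result may therefore want to check for NULL entries before treating it as a numeric vector.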

Related

How do I make all my dataframes' variables in my environment numeric?

I have a lot of data frames in my R environment and I want to apply as.numeric() to all of the variables in the data frames and overwrite them. I do not know how to address all of them.
The following is my attempt, but ls() seemingly just writes the name to x:
for (i in 1:length(ls())){
x <- ls()[i]
for (i in 1:length(x)){
x[i] <- as.numeric(x[i])
}
}
So, there were two helpful answers to my question: one that was later deleted, and another one by @Henrik.
The deleted one followed my approach and converts all data frames in the global environment (those with a "V" in their name, in my example) to numeric. This is the code:
res <- lapply(mget(ls(pattern = 'V')), \(x) {
x[] <- lapply(x, as.numeric)
return(x)
})
list2env(res, .GlobalEnv)
# Check
str(VA01.000306__ft2)
The second approach uses a list instead of multiple objects, with my multiple csv files stored in that list. This is the csv-to-list import:
F_EB_names <- list.files(pattern="\\.csv$") ### Store the data in a list?
F_EB <- lapply(F_EB_names, read.csv2)
names(F_EB) <- gsub(".wav.csv","_ft2",F_EB_names)
And this is the conversion to numeric:
F_EB <- type.convert(F_EB) # Conversion
str(F_EB) # Check
Thank you both for the help.
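For readers without the original csv files, the column-wise conversion can be tried on a small made-up list. This is only a sketch: the list dfs and its columns are invented, and it uses the x[] <- lapply(x, as.numeric) idiom from the first answer rather than type.convert.

```r
# Hypothetical list of data frames whose columns were read in as character
dfs <- list(
  a = data.frame(v1 = c("1", "2"), v2 = c("3.5", "4"), stringsAsFactors = FALSE),
  b = data.frame(v1 = c("10", "20"), v2 = c("0.1", "0.2"), stringsAsFactors = FALSE)
)

# Convert every column of every data frame; df[] keeps the data.frame structure
dfs_num <- lapply(dfs, function(df) {
  df[] <- lapply(df, as.numeric)
  df
})

str(dfs_num)
```

The assignment into df[] matters: a plain df <- lapply(df, as.numeric) would turn each data frame into an ordinary list.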

Why does formals() not require names() when looping with sapply?

I wanted to find which function in base R has the greatest number of arguments.
objs <- mget(ls("package:base"), inherits = TRUE)
funs <- Filter(is.function, objs)
One easy way was to use sapply as follows:
f_arg_length <- sapply(funs, function(x) length(formals(x)))
f_arg_length[which.max(f_arg_length)]
But I also tried an explicit loop to do the same, and my code was:
max_fun_name <- ""
max_fun <- 0
for(x in 1:length(funs)) {
if (length(formals(names(funs[x]))) > max_fun)
{
max_fun <- length(formals(names(funs[x])))
max_fun_name <- names(funs[x])
}
}
max_fun_name
max_fun
I am unable to understand why elements of funs are passed in formals() with names() when referencing with index (as seen in the explicit loop), while the same can be achieved without names() when referencing without index (as seen in the case of sapply). Can someone please explain the reason why these two ways of referencing the same thing produce noticeable differences?
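One likely explanation is the difference between single and double brackets. sapply hands each element of funs to the function, as funs[[x]] would, whereas funs[x] is a list of length one, not a function. formals accepts a function or a character string naming one, so names(funs[x]) happens to work by name lookup. The sketch below uses a two-element list for brevity; the variable names are made up.

```r
# A small named list of functions, standing in for the full `funs`
funs <- list(paste = paste, lapply = lapply)

by_single <- funs["paste"]    # a list of length 1 containing the function
by_double <- funs[["paste"]]  # the function itself

# formals() on the function itself, as sapply() effectively does
n_from_function <- length(formals(by_double))

# formals() on a character string: the function is looked up by name
n_from_name <- length(formals(names(funs["paste"])))

# formals() on a list is not meaningful: it warns and returns NULL
n_from_list <- length(suppressWarnings(formals(by_single)))

print(c(n_from_function, n_from_name, n_from_list))
```

Under this reading, the explicit loop would also work without names() if it indexed with funs[[x]].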

How can I assign a function to a variable that its name should be made by paste function? [duplicate]

Possible Duplicate:
How to name variables on the fly in R?
I have 10 objects of type list named: list1, list2, ..., list10. And I wanted to use the output from my function FUN to replace the first index of these list variables through a for loop like this:
for (i in 1:10){
eval(parse(text=(paste("list",i,sep=""))))[[1]] <- FUN()
}
but this does not work. I also used lapply in this way and again it is wrong:
lapply(eval(parse(text=(paste("list",i,sep=""))))[[1]], FUN)
Any opinion would be appreciated.
This is R FAQ 7.21 ("How can I turn a string into a variable?"). The most important part of that FAQ is the last part, which says not to do it that way, but to put everything into a list and operate on the list.
You can put your objects into a list using code like:
mylists <- lapply( 1:10, function(i) get( paste0('list',i) ) )
Then you can do the replacement with code like:
mylists <- lapply( mylists, function(x) {
  x[[1]] <- FUN()
  x
} )
Now if you want to save all the lists, or delete all the lists, you just have a single object that you need to work with instead of looping through the names again. If you want to do something else to each list, then you just use lapply or sapply on the overall list in a single easy step without worrying about the loop. You can name the elements of the list to match the original names if you want and access them that way as well. Keeping everything of interest in a single list will also make your code safer and much less likely to accidentally overwrite or delete another object.
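The naming idea mentioned above can be sketched as follows. Everything here is made up for the demonstration: FUN is a placeholder that returns 42, only three lists are created, and a scratch environment (env) stands in for the workspace. mget is used instead of the lapply/get pair because it returns the list already named.

```r
# Placeholder for the asker's FUN (assumption: it takes no arguments)
FUN <- function() 42

# Scratch environment holding list1, list2, list3
env <- new.env()
for (i in 1:3) {
  assign(paste0("list", i), list(0, "keep"), envir = env)
}

# Collect the lists by name; the result is named list1, list2, list3
list_names <- paste0("list", 1:3)
mylists <- mget(list_names, envir = env)

# Replace the first component of each list; lapply() keeps the names
mylists <- lapply(mylists, function(x) {
  x[[1]] <- FUN()
  x
})

print(mylists$list2[[1]])
```

Each list is then reachable as mylists$list1, mylists$list2, and so on, without any eval or parse.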
You probably want something like
for (i in 1:10) {
nam <- paste0("list", i) ## build the name of the object
tmp <- get(nam) ## fetch the list by name, store in tmp
tmp[[1]] <- FUN() ## alter 1st component of tmp using FUN()
assign(nam, value = tmp) ## assign tmp back to the current list
}
as the loop.

How to index (subset) over a list of data.frames

I have a list of several data.frames and I want to remove the first 2 columns from each of the data.frames. I did it as follows, but feel this could be more R-ish.
data(mtcars)
data(iris)
myList <- list(A = mtcars, B = iris)
# helper function
removeCols <- function(df,vec) {
res <- df[,-vec]
}
lapply(myList,removeCols,1:2)
Obviously this does the job, but to me it seems like I must have missed something here (such as using an operator within lapply, because it's technically a function too).
The major disadvantage of this approach is that you need a little helper function for every little task you want to do to all elements of that list.
Your code is perfectly good R. But you have two alternative options:
Use an anonymous function - this is a general solution
Use the [ operator - specific to this case
Your original:
xx <- lapply(myList,removeCols,1:2)
An anonymous function:
yy <- lapply(myList, function(df, vec){df[,-vec]}, 1:2)
Use the [ operator:
zz <- lapply(myList, "[", -(1:2))
These yield identical results
identical(xx, yy)
[1] TRUE
identical(xx, zz)
[1] TRUE
The only thing I can imagine at the moment to be more R-ish is to make it shorter and get rid of the helper-function.
data(mtcars)
data(iris)
myList <- list(A = mtcars, B = iris)
lapply(myList,function(x) x[,-(1:2)])
If you are asking for a direct way to modify something:
myList[[1]][,-(1:2)]
But as lists are a quite open structure with no requirements on their content, you cannot index across their contents, as the elements can be really different. However, if your two data sets have the same dimensions (n x m), then you can combine them into a 3D array on which all the known indexing tricks will work.
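The 3D-array idea can be sketched with two small matrices of equal dimensions. The matrices m1 and m2 are invented for illustration; mtcars and iris themselves would not qualify, since they differ in size and iris is not purely numeric.

```r
# Two hypothetical data sets with the same dimensions (2 x 3)
m1 <- matrix(1:6, nrow = 2)
m2 <- matrix(7:12, nrow = 2)

# Stack them along a third dimension: arr[, , 1] is m1, arr[, , 2] is m2
arr <- array(c(m1, m2), dim = c(2, 3, 2))

# Drop the first two columns from every data set in a single indexing step
trimmed <- arr[, -(1:2), , drop = FALSE]

print(dim(trimmed))
```

The drop = FALSE argument keeps the result three-dimensional even though only one column remains.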

Iterating over separate lists in R

I have lots of variables in R, all of type list
a100 = list()
a200 = list()
# ...
p700 = list()
Each variable is a complicated data structure:
a200$time$data # returns 1000 x 1000 matrix
Now, I want to apply code to each variable in turn. However, since R doesn't support pass-by-reference, I'm not sure what to do.
One idea I had was to create a big list of all these lists, i.e.,
biglist = list()
biglist[[1]] = a100
...
And then I could iterate over biglist:
for (i in 1:length(biglist)){
biglist[[i]]$newstuff = "profit"
# more code here
}
And finally, after the loop, go backwards so that existing code (that uses variable names) still works:
a100 = biglist[[1]]
# ...
The question is: is there a better way to iterate over a set of named lists? I have a feeling that I'm doing things horribly wrong. Is there something easier, like:
# FAKE, Idealized code:
foreach x in (a100, a200, ....){
x$newstuff = "profit"
}
a100$newstuff # "profit"
To walk over lists in parallel you can use mapply, which takes parallel lists and walks over them in lock-step. Furthermore, in a functional language you should emit the object that you want rather than modify the data structure within a function call.
You should use the sapply, apply, lapply, ... family of functions.
jim
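A minimal sketch of the mapply suggestion, assuming two small made-up lists (xs, with elements named after the asker's variables) and a parallel vector of labels. The function returns the modified copy instead of changing its argument in place.

```r
# Hypothetical stand-ins for a100 and a200
xs <- list(a100 = list(id = 1), a200 = list(id = 2))

# A second vector walked in lock-step with xs
labels <- c("first", "second")

# SIMPLIFY = FALSE keeps the result as a list of lists
out <- mapply(function(x, lab) {
  x$newstuff <- lab
  x  # emit the updated object
}, xs, labels, SIMPLIFY = FALSE)

print(out$a200$newstuff)
```

The result takes its names from xs, so out$a100 and out$a200 correspond to the original variables.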
jimmyb is quite right. lapply and sapply are specifically designed to work on lists, so they would work with your biglist as well. You shouldn't forget to return the object in the nested function though. An example:
X <- list(A=list(A1=1:2,A2=3:4),B=list(B1=5:6,B2=7:8))
lapply(X,function(i){
i$newstuff = "profit"
return(i)
})
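Because lapply returns a new list rather than changing X, the result has to be assigned to be of any use. If existing code still expects the individual variables, list2env can copy the named elements back out. The sketch below is an assumption-laden illustration: biglist is given names matching the asker's variables, and a scratch environment (env) is the target; passing .GlobalEnv instead would overwrite the variables in the workspace.

```r
# Named list standing in for the asker's biglist
biglist <- list(a100 = list(), a200 = list())

# Keep the result of lapply() by assigning it
biglist <- lapply(biglist, function(x) {
  x$newstuff <- "profit"
  x
})

# Copy each named element into an environment as a separate variable
env <- new.env()
list2env(biglist, envir = env)

print(env$a100$newstuff)
```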
Now as you said, R passes by value, so you have multiple copies of the data roaming around. If you work with really big lists, you might want to try toning the memory usage down by working on each variable separately, using assign and get. The following is considered bad coding, but can sometimes be necessary to avoid memory trouble:
A <- X[[1]] ; B <- X[[2]] #make the data
list.names <- c("A","B")
for (i in list.names){
tmp <- get(i)
tmp$newstuff <- "profit"
assign(i,tmp)
rm(tmp)
}
Make sure you are well aware of the implications this code has, as you're working within the global environment. If you need to do this more often, you might want to work with environments instead:
my.env <- new.env() # make the environment
my.env$A <- X[[1]];my.env$B <- X[[2]] # put vars in environment
for (i in list.names){
tmp <- get(i,envir=my.env)
tmp$newstuff <- "profit"
assign(i,tmp,envir=my.env)
rm(tmp)
}
my.env$A
my.env$B
