Why does formals() not require names() when looping with sapply?

I wanted to find which function in base R has the greatest number of arguments.
objs <- mget(ls("package:base"), inherits = TRUE)
funs <- Filter(is.function, objs)
One easy way was to use sapply as follows:
f_arg_length <- sapply(funs, function(x) length(formals(x)))
f_arg_length[which.max(f_arg_length)]
But I also tried an explicit loop to do the same, and my code was:
max_fun_name <- ""
max_fun <- 0
for(x in 1:length(funs)) {
  if (length(formals(names(funs[x]))) > max_fun)
  {
    max_fun <- length(formals(names(funs[x])))
    max_fun_name <- names(funs[x])
  }
}
max_fun_name
max_fun
I don't understand why, in the explicit loop (where I reference elements of funs by index), I have to pass names() of each element to formals(), while with sapply (no index) the element can be passed to formals() directly. Can someone please explain why these two ways of referencing the same thing behave differently?
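The difference most likely comes from the brackets. lapply() (and therefore sapply()) calls FUN on X[[i]], so each call receives the function object itself. In the loop, funs[x] uses single brackets and returns a one-element list, not the function. formals(names(funs[x])) only works because formals() also accepts a character string naming a function and looks the function up by that name. A minimal sketch that rewrites the loop with [[ ]], so names() is no longer needed:

```r
# Functions exported by package:base, as in the question
objs <- mget(ls("package:base"), inherits = TRUE)
funs <- Filter(is.function, objs)

# Single brackets keep the list wrapper; double brackets extract the element
class(funs[1])            # "list"
is.function(funs[[1]])    # TRUE

# formals() also accepts the name of a function as a character string
identical(formals("paste"), formals(paste))  # TRUE

# The loop from the question, extracting each function with [[ ]]
max_fun_name <- ""
max_fun <- 0
for (x in seq_along(funs)) {
  n_args <- length(formals(funs[[x]]))
  if (n_args > max_fun) {
    max_fun <- n_args
    max_fun_name <- names(funs)[x]
  }
}
max_fun_name
max_fun
```

Because both versions keep the first maximum they encounter, max_fun_name should match the name returned by which.max() on the sapply() result.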


Using for loop to append vectors of variable length

I am trying to create a vector or list of values based on the output of a function performed on individual elements of a column.
library(hpoPlot)
xyz_hpo <- c("HP:0003698", "HP:0007082", "HP:0006956")
getallancs <- function(hpo_col) {
  for (i in 1:length(hpo_col)) {
    anc <- get.ancestors(hpo.terms, hpo_col[i])
    output <- list()
    output[[length(anc) + 1]] <- append(output, anc)
  }
  return(anc)
}
all_ancs <- getallancs(xyz_hpo)
get.ancestors outputs a character vector whose length varies from term to term. How can I loop through hpo_col, adding the length of each anc vector to the output vector?
Welcome to Stack Overflow :) Great job on providing a minimal reproducible example!
As mentioned in the comments, you need to move output <- list() outside of your for loop and return it after the loop. At present it is reset on every iteration, which is not what you want. I also think you want to return a vector rather than a list, so I have changed the type of output.
Also, in your original question, you say that you want to return the length of each anc vector in the loop, so I have changed the function to output the length of each iteration, rather than the whole vector.
getallancs <- function(hpo_col) {
  output <- numeric()
  # seq_along() also behaves correctly when hpo_col is empty
  for (i in seq_along(hpo_col)) {
    anc <- get.ancestors(hpo.terms, hpo_col[i])
    output <- append(output, length(anc))
  }
  return(output)
}
If you are only doing this for a few cases, such as your example, this approach is fine. However, growing a vector inside a for loop is typically slow in R, because each append() copies the whole vector, so it's better to use the apply family (or genuinely vectorised functions) where possible. This matters especially if you are running this for a large number of elements, where computation will take more than a few seconds.
For example, the function above could be written with sapply like so:
all_ancs <- sapply(xyz_hpo, function(x) length(get.ancestors(hpo.terms, x)))
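The corrected loop and the sapply version can be compared without hpoPlot installed. This is only an illustration: fake_db and fake_ancestors() are hypothetical stand-ins for get.ancestors(hpo.terms, ...), with made-up ancestor labels. The sketch also preallocates the loop's result with integer(length(hpo_col)) instead of growing it, and uses vapply(), which works like sapply() but checks that every call returns exactly one integer.

```r
# Hypothetical stand-in for get.ancestors(hpo.terms, term):
# made-up ancestor labels, NOT real HPO data
fake_db <- list(
  "HP:0003698" = c("anc1", "anc2"),
  "HP:0007082" = c("anc1", "anc2", "anc3"),
  "HP:0006956" = c("anc1")
)
fake_ancestors <- function(term) fake_db[[term]]

xyz_hpo <- c("HP:0003698", "HP:0007082", "HP:0006956")

# Loop version: preallocate the result instead of appending to it
getallancs_loop <- function(hpo_col) {
  output <- integer(length(hpo_col))
  for (i in seq_along(hpo_col)) {
    output[i] <- length(fake_ancestors(hpo_col[i]))
  }
  output
}

# apply-family version: one integer per term, named by the term
getallancs_apply <- function(hpo_col) {
  vapply(hpo_col, function(x) length(fake_ancestors(x)), integer(1))
}

getallancs_loop(xyz_hpo)    # 2 3 1
getallancs_apply(xyz_hpo)   # same counts, with the terms as names
```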
If in fact you did mean to output the whole vector of anc, not just the lengths, the original function would look like this:
getallancs <- function(hpo_col) {
  output <- character()
  for (i in seq_along(hpo_col)) {
    anc <- get.ancestors(hpo.terms, hpo_col[i])
    output <- c(output, anc)
  }
  return(output)
}
Or a version using lapply could be:
all_ancs <- unlist(lapply(xyz_hpo, function(x) get.ancestors(hpo.terms, x)))
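If you need both the full ancestor vectors and their lengths, a single lapply() call can provide both. A minimal sketch, again using the hypothetical fake_ancestors() stand-in with made-up labels instead of the real hpoPlot data: naming the list by term records where each ancestor came from, lengths() gives the per-term counts, and unlist() flattens the list the same way the c() loop does.

```r
# Hypothetical stand-in data: made-up ancestor labels, NOT real HPO output
fake_db <- list(
  "HP:0003698" = c("anc1", "anc2"),
  "HP:0007082" = c("anc1", "anc2", "anc3"),
  "HP:0006956" = c("anc1")
)
fake_ancestors <- function(term) fake_db[[term]]
xyz_hpo <- c("HP:0003698", "HP:0007082", "HP:0006956")

# One list element per term, named by the term it came from
anc_list <- setNames(lapply(xyz_hpo, fake_ancestors), xyz_hpo)

all_ancs   <- unlist(anc_list, use.names = FALSE)  # flat vector, like the c() loop
anc_counts <- lengths(anc_list)                    # per-term lengths
unique(all_ancs)                                   # drop ancestors shared between terms
```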
Hope that helps. If it solves your problem, please mark this as the answer.

How to loop through mapply in R?

I am trying to concatenate strings using the mapply function in R. However, I want one of the strings to vary in the mapply call. Here is a snippet of my code:
library(magrittr)  # provides %>%
strings<-data.frame(x=c("dsf","sdf","sdf"))
strings2<-data.frame(extension=c(".csv",".json",".xml"))
for (i in 1:3)
{
  strings_concat<-mapply(function(string1,string2) paste0(string1,string2),strings$x,strings2$extension[i])%>%
    data.frame()%>%
    unlist()%>%
    data.frame()
  #dosomething with strings_concat
}
But this gives me only the last iteration:
strings_concat
dsf.xml
sdf.xml
sdf.xml
but the desired output is:
strings_concat
dsf.csv
sdf.csv
sdf.csv
dsf.json
sdf.json
sdf.json
dsf.xml
sdf.xml
sdf.xml
At every iteration, I want to combine strings_concat with another data frame and save it. Is there an easy way to do this in R?
Perhaps outer is a better option here:
strings_concat <- c(outer(strings$x, strings2$extension, paste0))
strings_concat
#[1] "dsf.csv" "sdf.csv" "sdf.csv" "dsf.json" "sdf.json" "sdf.json"
# "dsf.xml" "sdf.xml" "sdf.xml"
You can put it in a data.frame:
df <- data.frame(strings_concat)
If you want to add some additional steps at each iteration, you can use lapply:
lapply(strings2$extension, function(x) {
  strings_concat <- paste0(strings$x, x)
  #do something with strings_concat
})
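For the "combine with another data frame and save it" step in the question, the lapply body can do that work for each extension. A sketch under assumptions: other_df is a hypothetical data frame to combine with, and each batch is written with write.csv() to a file under tempdir(), with file names made up for illustration.

```r
strings  <- data.frame(x = c("dsf", "sdf", "sdf"))
strings2 <- data.frame(extension = c(".csv", ".json", ".xml"))

# Hypothetical data frame to combine with each batch (not from the question)
other_df <- data.frame(id = 1:3)

out_dir <- tempdir()
saved <- lapply(strings2$extension, function(ext) {
  strings_concat <- paste0(strings$x, ext)                 # one batch per extension
  combined <- cbind(other_df, strings_concat = strings_concat)
  # e.g. ".json" -> "batch_json.csv" (illustrative naming scheme)
  path <- file.path(out_dir, paste0("batch", sub(".", "_", ext, fixed = TRUE), ".csv"))
  write.csv(combined, path, row.names = FALSE)
  path                                                     # return the file written
})
```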
All you should need to do is make sure you are continually augmenting your dataset. So I think this should do the trick:
library(magrittr)  # provides %>%
strings<-data.frame(x=c("dsf","sdf","sdf"))
strings2<-data.frame(extension=c(".csv",".json",".xml"))
# We are going to keep adding things to results
results = NULL
for (i in 1:3)
{
  strings_concat<-mapply(function(string1,string2) paste0(string1,string2),strings$x,strings2$extension[i])%>%
    data.frame()%>%
    unlist()%>%
    data.frame()
  # Here is where we keep adding things to results
  results = rbind(results, strings_concat)
}
print(results)
Caution: I'm not in front of a computer with R, so this code is untested.
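Calling rbind() inside the loop works, but it copies the accumulated results on every iteration. A common alternative, sketched below with plain paste0() in place of the mapply() pipeline (so magrittr isn't needed), stores each iteration's data frame in a preallocated list and binds them once at the end with do.call(rbind, ...).

```r
strings  <- data.frame(x = c("dsf", "sdf", "sdf"))
strings2 <- data.frame(extension = c(".csv", ".json", ".xml"))

# Preallocate a list with one slot per extension
pieces <- vector("list", nrow(strings2))
for (i in seq_len(nrow(strings2))) {
  # paste0() is already vectorised over strings$x
  pieces[[i]] <- data.frame(strings_concat = paste0(strings$x, strings2$extension[i]))
}

# Bind all pieces in a single step
results <- do.call(rbind, pieces)
results
```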

calling data frames in a for loop by a vector

I have some data.frames
dat1=read.table...
dat2=read.table...
dat3=read.table...
And I would like to count the rows of each data set. The names are saved like this (I cannot change that):
vector=c("dat1","dat2","dat3", ...)
p <- vector(numeric, length=1:length(dat))
counting <- function(x) {for (i in 1:x){
p[i]<-nrow(dat[i])}
return(p)
}
This does not work, because the input to nrow is a character string. Do I need an integer here?
Thanks for your help.
You can use get for this, but be careful! Instead, reading the tables into a list is the R-ish way:
file.names <- list.files()
dat <- lapply(file.names, read.table)
Then you have all the conveniences of lapply and the apply family at your disposal, e.g.:
lapply(dat, nrow)
The solution using get (also, vector is a bad variable name, since it's a very important function):
lapply(vector, function(x) nrow(get(x)))
Your method fails since there is no object called dat to index into. The for loop could look like:
p = NULL
for(v in vector) {
  p <- c(p, nrow(get(v)))
}
But that technique is poor form for many reasons, not least that p is grown (and copied) on every iteration of the loop.
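mget() offers another base R route: it looks up several objects by name at once and returns a named list, so no explicit get() loop is needed. A minimal sketch with small made-up data frames standing in for dat1, dat2 and dat3, and a hypothetical df_names in place of vector, following the advice above to avoid that name:

```r
# Small made-up data frames standing in for dat1, dat2, dat3
dat1 <- data.frame(a = 1:5)
dat2 <- data.frame(a = 1:4)
dat3 <- data.frame(a = 1:7)
df_names <- c("dat1", "dat2", "dat3")   # avoids reusing the name of base::vector()

# mget() fetches all objects by name and returns a named list
dats <- mget(df_names)
row_counts <- vapply(dats, nrow, integer(1))
row_counts   # named integer vector: dat1 = 5, dat2 = 4, dat3 = 7
```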
If you want to determine properties of items you know to be in the .GlobalEnv, this works:
> sapply( c("A","B"), function(objname) nrow(.GlobalEnv[[objname]]) )
A B
5 4
You could substitute any character vector for c("A","B"). If an object is not in the global environment, .GlobalEnv[[objname]] just returns NULL (and so does nrow() on it), so it's reasonably robust.

How to index (subset) over a list of data.frames

I have a list of several data.frames and I want to remove the first 2 columns from each of them. I did it as follows, but I feel this could be more R-ish.
data(mtcars)
data(iris)
myList <- list(A = mtcars, B = iris)
# helper function
removeCols <- function(df,vec) {
res <- df[,-vec]
}
lapply(myList,removeCols,1:2)
Obviously this does the job, but it seems to me that I must have missed something here (such as using an operator within lapply, because it's technically a function too).
However, the major disadvantage of this approach is that you need a little helper function for every little task you want to do to all elements of that list.
Your code is perfectly good R. But you have two alternative options:
Use an anonymous function - this is a general solution
Use the [ operator - specific to this case
Your original:
xx <- lapply(myList,removeCols,1:2)
An anonymous function:
yy <- lapply(myList, function(df, vec){df[,-vec]}, 1:2)
Use the [ operator:
zz <- lapply(myList, "[", -(1:2))
These yield identical results:
identical(xx, yy)
[1] TRUE
identical(xx, zz)
[1] TRUE
The only thing I can imagine at the moment to be more R-ish is to make it shorter and get rid of the helper-function.
data(mtcars)
data(iris)
myList <- list(A = mtcars, B = iris)
lapply(myList,function(x) x[,-(1:2)])
If you are asking for a direct way to modify something:
myList[[1]][,-(1:2)]
But because lists are an open structure with no requirements on their contents, you cannot index across their elements, as those can be very different. However, if your two data sets have the same dimensions (n x m) and the same type (for example, all numeric), then you can combine them into a 3-d array, on which all the familiar indexing tricks work.
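A minimal sketch of the 3-d array idea, assuming two numeric data sets of identical dimensions. Here they are the first and second five rows of mtcars, chosen only for illustration; iris would not fit, because of its factor column and different size. Once the data sets are stacked, one index expression drops the first two columns from both:

```r
# Two numeric data sets with identical dimensions (5 x 11), for illustration
m1 <- as.matrix(mtcars[1:5, ])
m2 <- as.matrix(mtcars[6:10, ])

# Stack them along a third dimension: rows x columns x data set
arr <- array(c(m1, m2), dim = c(dim(m1), 2))

# Ordinary indexing now applies to every data set at once
trimmed <- arr[, -(1:2), ]
dim(trimmed)   # 5 9 2
```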

Iterating over separate lists in R

I have lots of variables in R, all of type list
a100 = list()
a200 = list()
# ...
p700 = list()
Each variable is a complicated data structure:
a200$time$data # returns 1000 x 1000 matrix
Now, I want to apply code to each variable in turn. However, since R doesn't support pass-by-reference, I'm not sure what to do.
One idea I had was to create a big list of all these lists, i.e.,
biglist = list()
biglist[[1]] = a100
...
And then I could iterate over biglist:
for (i in 1:length(biglist)){
  biglist[[i]]$newstuff = "profit"
  # more code here
}
And finally, after the loop, go backwards so that existing code (that uses variable names) still works:
a100 = biglist[[1]]
# ...
The question is: is there a better way to iterate over a set of named lists? I have a feeling that I'm doing things horribly wrong. Is there something easier, like:
# FAKE, Idealized code:
foreach x in (a100, a200, ....){
x$newstuff = "profit"
}
a100$newstuff # "profit"
To walk over several lists in parallel, you can use mapply, which takes parallel lists and walks over them in lock-step. Furthermore, in a functional language you should return the object that you want rather than modify the data structure within a function call.
You should use the sapply, apply, lapply, ... family of functions.
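A short sketch of walking two lists in lock-step. Map() is base R's wrapper for mapply() with SIMPLIFY = FALSE; labels is a hypothetical second list, and the function returns the modified element instead of trying to change X in place:

```r
# X and labels are walked together: the i-th element of each goes to the function
X <- list(A = list(A1 = 1:2), B = list(B1 = 5:6))
labels <- c("profit", "loss")   # hypothetical per-element values

X <- Map(function(item, label) {
  item$newstuff <- label   # modify the local copy...
  item                     # ...and return it; X itself is only changed by reassignment
}, X, labels)

X$A$newstuff   # "profit"
X$B$newstuff   # "loss"
```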
jimmyb is quite right. lapply and sapply are specifically designed to work on lists, so they would work with your biglist as well. Don't forget to return the object from the anonymous function, though. An example:
X <- list(A=list(A1=1:2,A2=3:4),B=list(B1=5:6,B2=7:8))
lapply(X,function(i){
  i$newstuff = "profit"
  return(i)
})
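To get the results back under the original variable names (the a100 = biglist[[1]] step in the question), mget() and list2env() can do the bookkeeping. A minimal sketch with small hypothetical stand-ins for a100 and a200; list2env() writes into the current environment, which is the global environment when run at top level, so the usual caveats about modifying it apply:

```r
# Hypothetical, tiny stand-ins for the question's a100, a200, ... variables
a100 <- list(time = list(data = matrix(0, 2, 2)))
a200 <- list(time = list(data = matrix(1, 2, 2)))

var_names <- c("a100", "a200")
biglist <- mget(var_names)            # named list of the current values

biglist <- lapply(biglist, function(i) {
  i$newstuff <- "profit"
  i                                   # return the modified copy
})

# Copy the modified lists back under their original names
invisible(list2env(biglist, envir = environment()))
a100$newstuff   # "profit"
```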
Now, as you said, R passes by value (strictly, it copies on modify), so you can end up with multiple copies of the data in memory. If you work with really big lists, you might want to reduce memory usage by working on each variable separately, using assign and get. The following is considered bad coding, but can sometimes be necessary to avoid memory trouble:
A <- X[[1]] ; B <- X[[2]] #make the data
list.names <- c("A","B")
for (i in list.names){
  tmp <- get(i)
  tmp$newstuff <- "profit"
  assign(i,tmp)
  rm(tmp)
}
Make sure you are well aware of the implications of this code, as you're working within the global environment. If you need to do this more often, you might want to work with environments instead:
my.env <- new.env() # make the environment
my.env$A <- X[[1]];my.env$B <- X[[2]] # put vars in environment
for (i in list.names){
  tmp <- get(i,envir=my.env)
  tmp$newstuff <- "profit"
  assign(i,tmp,envir=my.env)
  rm(tmp)
}
my.env$A
my.env$B
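Because environments are modified in place, the get()/assign() round trip in that loop can also be written with [[ ]] on the environment. A minimal sketch reusing X, list.names and my.env from above (redefined here so the block runs on its own):

```r
X <- list(A = list(A1 = 1:2, A2 = 3:4), B = list(B1 = 5:6, B2 = 7:8))
list.names <- c("A", "B")

my.env <- new.env()
my.env$A <- X[[1]]
my.env$B <- X[[2]]

# [[<- on an environment assigns into that environment,
# so there is no separate get()/assign() step and no tmp to remove
for (i in list.names) {
  my.env[[i]]$newstuff <- "profit"
}

my.env$A$newstuff   # "profit"
```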
