Does anybody know if it is possible to define my own function that takes an arbitrary number of arguments? My concrete problem is that I would like to write my own rbind function that can rbind data.frames with overlapping rownames (and just assigns new, numerical rownames).
This approach is obviously wrong, but I hope you get my problem/idea:
rbindDF <- function(x){
N <- length(x)
# Join x[1] and x[2]
...
# Join x[n-1] and x[n]
}
I tried to find out how this is done in, e.g., rbind or sum, but I cannot remember how to view the source code of .Internal functions.
Calling rbindDF(list(...)) might be one compromise, but I would prefer to call it as rbindDF(data1, data2, data3) when there are three data frames and as rbindDF(data1, data2) when there are two.
Thanks a lot for any hint!
You can use the ellipsis (...) argument. For example:
rbindDF <- function(...) {
df_list <- list(...)
do.call(rbind, df_list)
}
This will allow any number of data frames to be passed in:
rbindDF(df1, df2, df3)
I take it this question was just about the need for passing an unknown number of arguments rather than the contents of the function itself.
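If you also want the fresh numeric row names mentioned in the question, you can reset them after binding. A minimal sketch; rbindDF2, df1 and df2 are hypothetical names used only for illustration:

```r
# Hypothetical variant of rbindDF that also resets the row names
rbindDF2 <- function(...) {
  df_list <- list(...)                    # collect every argument passed in `...`
  if (length(df_list) == 0) return(NULL)  # nothing to bind
  out <- do.call(rbind, df_list)          # same as rbind(df1, df2, ...)
  rownames(out) <- NULL                   # replace old row names with 1, 2, ..., n
  out
}

# Two data frames sharing the row names "a" and "b"
df1 <- data.frame(x = 1:2, row.names = c("a", "b"))
df2 <- data.frame(x = 3:4, row.names = c("a", "b"))
res <- rbindDF2(df1, df2)
```

Because list(...) collects however many arguments were supplied, the same function works for two, three or more data frames.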
Related
This is probably a very simple problem, but I have had trouble searching for it. Basically, I am using lapply to convert the column names to upper case in a list of data frames. My first attempt did not work; however, adding ;x makes it work. What exactly is going on?
This does not work:
df.list <- lapply(df.list,function(x) colnames(x) <- toupper(colnames(x)))
This does:
df.list <- lapply(df.list,function(x) {colnames(x) <- toupper(colnames(x));x})
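A function returns the value of the last expression it evaluates. In the first version that expression is the assignment colnames(x) <- toupper(colnames(x)), whose value is the new character vector of names, so lapply returns a list of name vectors rather than data frames. Since you are modifying the object x (in this case only its colnames) inside the function, you have to return the modified x explicitly. That is what ;x does: it acts like a new line whose only job is to return x.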
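The difference is easy to see on a tiny example. A minimal sketch, assuming a hypothetical two-element df.list; the setNames() variant is an alternative idiom, not the one from the question:

```r
df.list <- list(data.frame(a = 1, b = 2), data.frame(c = 3))

# Without ;x the function returns the value of the assignment,
# i.e. the new column names, not the data frame
bad <- lapply(df.list, function(x) colnames(x) <- toupper(colnames(x)))

# Returning x explicitly gives back the modified data frames
good <- lapply(df.list, function(x) {
  colnames(x) <- toupper(colnames(x))
  x
})

# setNames() returns the renamed object directly, so no extra line is needed
alt <- lapply(df.list, function(x) setNames(x, toupper(names(x))))
```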
I have a list of data frames. I want to use lapply on a specific column of each of those data frames, but I keep getting errors when I try methods from similar answers:
The setup is something like this:
a <- list(*a series of data frames that each have a column named DIM*)
dim_loc <- lapply(1:length(a), function(x){paste0("a[[", x, "]]$DIM")})
Eventually, I'll want to write something like results <- lapply(dim_loc, *some function on the DIMs*)
However, when I try get(dim_loc[[1]]), say, I get an error: Error in get(dim_loc[[1]]) : object 'a[[1]]$DIM' not found
But I can return values from function(a[[1]]$DIM) all day long. It's there.
I've tried working around this by using as.name() in the dim_loc assignment, but that doesn't seem to do the trick either.
I'm curious (1) what is going on with get(), and (2) whether there is a better solution. I'm restricting myself to the apply family of functions because I want to get out of the for-loop habit, and this name-as-list method seems to be preferred based on questions like R- how to dynamically name data frames?, but I'd be interested in other, more elegant solutions, too.
I'd say that if you want to modify an object in place, you are better off using a for loop, since with lapply you would need the <<- assignment operator (a plain <- inside the function passed to lapply only modifies a local copy). Like so:
set.seed(1)
aList <- list(cars = mtcars, iris = iris)
for(i in seq_along(aList)){
aList[[i]][["newcol"]] <- runif(nrow(aList[[i]]))
}
As opposed to...
invisible(
lapply(seq_along(aList), function(x){
aList[[x]][["newcol"]] <<- runif(nrow(aList[[x]]))
})
)
You need invisible(); otherwise the list returned by lapply would be printed to the console. The <<- assigns the vector runif(...) to the newly created column of aList, which <<- finds in the enclosing (here global) environment.
If you instead want lapply to return a new list of data frames, leaving aList itself unchanged, do:
lapply(seq_along(aList), function(x){
aList[[x]][["newcol"]] <- runif(nrow(aList[[x]]))
return(aList[[x]])
})
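To see why the plain <- version does not work in place, and how reassigning lapply's result avoids <<- altogether, here is a minimal sketch (reassigning aList is an alternative to the approaches above, not taken from them):

```r
set.seed(1)
aList <- list(cars = mtcars, iris = iris)

# A plain <- inside the function only modifies a local copy of aList
invisible(lapply(seq_along(aList), function(x) {
  aList[[x]][["newcol"]] <- runif(nrow(aList[[x]]))
}))
unchanged <- !("newcol" %in% names(aList$cars))

# Returning each modified data frame and reassigning keeps lapply free of side effects
aList <- lapply(aList, function(d) {
  d[["newcol"]] <- runif(nrow(d))
  d
})
```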
Also, I suggest using seq_along(list) rather than 1:length(list) in lapply calls and for loops, since it avoids unexpected behavior such as:
# a zero-length list
seq_along(list()) # prints integer(0)
1:length(list()) # prints 1 0, so a loop over it would run twice
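On the get() part of the question: get() looks up a single object by its name, and "a[[1]]$DIM" is an R expression rather than a name, so no object with that name exists. A minimal sketch, assuming a small hypothetical stand-in list a, contrasts this with eval(parse()) and with iterating over the list directly:

```r
# A small stand-in for the question's list of data frames
a <- list(data.frame(DIM = 1:3), data.frame(DIM = 4:6))

# get() looks up an object by name, so this string is not found
name_lookup <- tryCatch(get("a[[1]]$DIM"), error = function(e) "not found")

# Parsing and evaluating the string works, but is rarely needed
parsed <- eval(parse(text = "a[[1]]$DIM"))

# Idiomatic: iterate over the list itself and extract the column
dims <- lapply(a, `[[`, "DIM")
dim_means <- sapply(a, function(d) mean(d$DIM))
```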
Here is how I created a number of data sets named data_1, data_2, data_3, and so on.
Initially, dim(data) is 500 rows by 17 columns.
for ( i in 1:length(unique( data$cluster ))) {
assign(paste("data", i, sep = "_"),subset(data[data$cluster == i,]))
}
Up to this point everything is fine.
Now I am trying to use these one by one inside another loop, like this:
for (i in 1:5) {
data<- paste(data, i, sep = "_")
}
However, this does not give me the data in the required format.
Any help will be really appreciated.
Thank you in advance!
Let me give you a tip here: don't assign everything in the global environment; use lists instead. That way you avoid all the things that can go wrong when meddling with the global environment. The code in your question will overwrite the original dataset data, because paste(data, i, sep = "_") does not fetch the object data_1: it pastes the contents of data and i together into character strings and assigns those to data. So you'll be in trouble if you want to rerun that code when something goes wrong; you'll have to reconstruct the original data frame.
Second: If you need to split a data frame based on a factor and carry out some code on each part, you should take a look at split, by and tapply, or at the plyr and dplyr packages.
Using Base R
With base R, it depends on what you want to do. In the most general case you can use a combination of split() and lapply or even a for loop:
mylist <- split(data, f = data$cluster)
for(mydata in mylist){
  print(head(mydata))  # head() alone does not auto-print inside a loop
  # ... further processing of each subset
}
Or
mylist <- split( data, f = data$cluster)
result <- lapply(mylist, function(mydata){
doSomething(mydata)
})
Which one you use depends largely on what the result should be. If you need some kind of summary for every subset, lapply will give you a list with one result per subset. If you need the subsets for a simulation, plotting, or similar, you are better off using the for loop.
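For the summary case, a minimal sketch of split() followed by lapply() or sapply(), using the built-in mtcars as a hypothetical stand-in for data (its cyl column plays the role of cluster):

```r
# Hypothetical stand-in for `data`: mtcars, with cyl playing the role of cluster
data <- mtcars
data$cluster <- data$cyl

mylist <- split(data, f = data$cluster)

# One summary per subset, returned as a named list
mean_mpg <- lapply(mylist, function(mydata) mean(mydata$mpg))

# sapply() simplifies the same result to a named vector
mean_mpg_vec <- sapply(mylist, function(mydata) mean(mydata$mpg))

# Subsets are accessed by name instead of via assign()/get()
n_rows <- nrow(mylist[["4"]])
```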
If you want to add variables based on other variables, the plyr or dplyr packages come in handy.
Using plyr and dplyr
These packages come in especially handy if the result of your code is going to be an array or data frame of some kind. This is similar to using split and lapply, but in a way Hadley approves of :-)
For example:
library(plyr)
result <- ddply(data, .(cluster),
function(mydata){
doSomething(mydata)
})
Use dlply if the result should be a list.
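If you prefer to stay in base R, the split-apply-combine pattern that ddply() performs can be approximated with split(), lapply() and do.call(rbind, ...). A minimal sketch, using mtcars as a hypothetical stand-in for data and a hypothetical summarise_group() function in place of doSomething():

```r
data <- mtcars
data$cluster <- data$cyl

# Hypothetical per-group function returning a one-row data frame
summarise_group <- function(mydata) {
  data.frame(cluster = mydata$cluster[1],
             n = nrow(mydata),
             mean_mpg = mean(mydata$mpg))
}

# split -> apply -> combine, roughly what ddply() does
pieces <- lapply(split(data, data$cluster), summarise_group)
result <- do.call(rbind, pieces)
rownames(result) <- NULL  # drop the group labels used as row names
```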
Say you have a list of functions
funList=list()
for (i in 1:5){
funList[[i]]=approxfun(0:5,(0:5)^i,method="linear", rule=2)
}
Later you want a matrix of values in which each row (or column, whichever makes the code simpler; a list of arrays instead of a matrix would also be fine) has the form of, let's say,
funList[[i]](1:3)
I've tried using lapply, but I haven't been able to get it to work.
I would do:
eval.with.args <- function(FUN, ...) FUN(...)
Then one of:
lapply(funList, eval.with.args, 1:3)
sapply(funList, eval.with.args, 1:3)
mapply(eval.with.args, funList, list(1:3))
Map(eval.with.args, funList, list(1:3))
I think I remember asking on the forums if there was a function that already implemented function(FUN, ...)FUN(...) but the answer was "no" at the time. It could make a nice addition to the base or functional packages IMHO.
You're looking for do.call:
lapply(funList, do.call, list(1:3))
You can replace eval.with.args in all of @flodel's examples with do.call if you wrap the second argument in an additional call to list.
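To check the shape of the result, here is a minimal sketch that rebuilds funList from the question and evaluates every function at 1:3. Since 1:3 are knots of the interpolation, the values should equal (1:3)^i; the names vals, vals_by_row and vals_list are hypothetical:

```r
# Rebuild the list of interpolating functions from the question
funList <- list()
for (i in 1:5) {
  funList[[i]] <- approxfun(0:5, (0:5)^i, method = "linear", rule = 2)
}

# sapply() puts one function per column: a 3 x 5 matrix
vals <- sapply(funList, do.call, list(1:3))

# Transpose if you prefer one row per function
vals_by_row <- t(vals)

# Map() with an anonymous function gives the same values as a list
vals_list <- Map(function(f) f(1:3), funList)
```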
I have a data frame and would like to calculate the quantiles of each of its columns:
tert <- c(0:3)/3
data <- dbGetQuery(dbCon, "SELECT * FROM tablename")
quans <- mapply(quantile, data, probs=tert, names=FALSE)
But the result contains only a single value from each quantile() call instead of the full vector of quantiles. I also get the warning "longer argument not a multiple of length of shorter". How can I modify my code to make it work?
PS: The function works like a charm on its own, so I could use a for loop:
quans <- quantile(a$fileName, probs=tert, names=FALSE)
PPS: What also works is not specifying probs:
quans <- mapply(quantile, data, names=FALSE)
The problem is that mapply applies the given function to the first elements of all the arguments passed through ..., then to the second elements, and so on. So probs=tert is looped over together with the columns of data, and each call receives one column and a single probability. Since you only want to iterate over one argument, you should use lapply, not mapply:
lapply(data, quantile, probs=tert, names=FALSE)
Alternatively, you can still use mapply but specify the arguments that are not to be looped over in the MoreArgs argument:
mapply(quantile, data, MoreArgs=list(probs=tert, names=FALSE))
I finally found a workaround which I don't like but which kind of works. Perhaps someone can tell me the right way to do it:
q <- function(x) { quantile(x, probs=c(0:3)/3, names=FALSE) }
mapply(q, data)
This works, but I have no idea where the difference lies.
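The workaround works because q() takes only x, so probs is no longer one of the arguments mapply() iterates over; it is fixed inside the wrapper, which is exactly what MoreArgs achieves. A minimal sketch comparing the three approaches on a small hypothetical data frame (note that quantile()'s argument is spelled names; name= only works through partial matching):

```r
data <- data.frame(u = c(1, 2, 3, 4), v = c(10, 20, 30, 40))
tert <- (0:3) / 3

# lapply: probs and names are passed unchanged to every call
q1 <- lapply(data, quantile, probs = tert, names = FALSE)

# mapply: non-vectorized arguments go in MoreArgs; results simplify to a matrix
q2 <- mapply(quantile, data, MoreArgs = list(probs = tert, names = FALSE))

# Workaround from the question: probs fixed inside a one-argument wrapper
q <- function(x) quantile(x, probs = tert, names = FALSE)
q3 <- mapply(q, data)
```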