rbind multiple dataframes within a function

I found the code below on SO, and it worked like a charm outside a function to find the data frames by name and join them using rbind.
mylist<-ls(pattern='leg_')
mleg <- do.call(rbind, lapply(mylist, get))
But when I put the same lines inside a function, after a loop that builds the data frames, I get an error message. I have tested the individual steps of the function and they work, so I must be missing something that causes this error.
for(i in 1:(length(blg_idx))){
assign(paste(deparse(substitute(leg_)),i,sep=''),l_merge(get(paste(deparse(substitute(blg)),i,sep='')),get(paste(deparse(substitute(bsg)),i,sep=''))))
}
mylist<-ls(pattern='leg_')
#return(mylist) # this returns a good list of dataframes
#mlegleg <- rbind(leg_1,leg_2) # this works
mleg <- do.call(rbind, lapply(mylist, get))
return(mleg)
} #end function read_leg
Error in FUN(c("leg_1", "leg_2")[[1L]], ...) :
object 'leg_1' not found
When I return mylist from the function, it is able to identify all the dataframes and list them. The function is able to return leg_1 or leg_2 dataframe when I choose to return those in debugging.
[1] "leg_1" "leg_2"
Any help?
update
I found another way of achieving what I need, but I am sure it is inefficient, although my list of data frames has at most 4 entries.
for(i in 1:(length(blg_idx))){
assign(paste(deparse(substitute(leg_)),i,sep=''),l_merge(get(paste(deparse(substitute(blg)),i,sep='')),get(paste(deparse(substitute(bsg)),i,sep=''))))
}
mylist<-ls(pattern='leg_')
#return(mylist)
#mlegleg <- rbind(leg_1,leg_2) # this works
# mleg <- do.call(rbind, lapply(mylist, get))
mleg <- leg_1
for(i in 2:(length(blg_idx))){
mleg <- rbind(mleg,get(paste(deparse(substitute(leg_)),i,sep='')))
}
return(mleg)
} #end read_leg
update 2
Here is a reproducible example of the issue I am facing. For some reason, do.call with get cannot resolve the names in mylist when the data frames were created inside the function.
read_date <- function(x){
pur_1 <- data.frame(sku=859, X = sample(1:10),Y = sample(c("yes", "no"), 10, replace = TRUE))
pur_2 <- data.frame(sku=859, X = sample(11:20),Y = sample(c("yes", "no","na"), 10, replace = TRUE))
mylist<-ls(pattern='pur_')
pur_final <- do.call(rbind, lapply(mylist, get))
#fancier version that I want to achieve is below
#assign(paste('pur_',eval(pur_1$sku[1]),sep=''),do.call(rbind, lapply(mylist, get)))
return(pur_final)
}
read_date()
Error message is
read_date()
Error in FUN(c("pur_1", "pur_2")[[1L]], ...) : object 'pur_1' not found
update 3
I am sorry for the unconventional management of this post; I will do better with my next ones.
Here is what I stumbled upon. It works for me, with one exception.
pur_final <- do.call(rbind, mget(paste0("pur_", 1:2),envir = as.environment(-1)))
The remaining, smaller issue is the row names that get added to the combined data frame. Any suggestions for suppressing the row names in this context?
> read_date()
sku X Y
pur_1.1 859 8 yes
pur_1.2 859 4 no
pur_1.3 859 3 yes
....
pur_2.8 859 14 na
pur_2.9 859 13 na
pur_2.10 859 19 no
>

You did not provide a reproducible example to test this against, but take a look at the help page for get and try this:
mleg <- do.call(rbind, lapply(mylist, get, envir = globalenv() ))

The answer above contains the key to your question: envir = globalenv()
It took me a while to realize that R creates a separate environment for every function call, and that get only searches the environment it is given (and that environment's parents). When get is called through lapply, its default lookup does not start in your function's environment, so you have to say where to look with the envir argument: globalenv() for objects in the Global Environment, as above.
Here's a function that takes a character string as input and identifies all variables (e.g. data frames) in the Global Environment whose names contain that string. It then tries to bind those variables (data frames) together.
If all of the variables are data frames with the same column names, it returns a single row-bound data frame:
myBindedDF <- bindCompatibleTables("mypattern")
bindCompatibleTables <- function(x){
  if(is.character(x)){
    # names of all objects in the global environment that match the pattern
    mylist <- grep(x, ls(pos = 1), value = TRUE)
    # fetch the objects by name and row-bind them
    bindedDF <- do.call(rbind, mget(mylist, envir = as.environment(1)))
    return(bindedDF)
  } else {
    stop("Input is not a character string")
  }
}

A late response, but I just faced an issue similar to the one in update 2, where "object 'pur_1' not found" is raised.
If you want to use the following within a function when you have an unknown number of data frames whose names start with "pur_", for example:
mylist <- ls(pattern='pur_')
mleg <- do.call(rbind, lapply(mylist, get))
then you need to point get to the correct environment, which here is the function's own:
mylist <- ls(pattern='pur_')
mleg <- do.call(rbind, lapply(mylist, get, envir=environment()))
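As a minimal sketch of this approach, applied to a shortened version of the example from update 2: mget fetches the objects from the function's own environment, and resetting the row names afterwards removes the pur_1.1-style labels asked about in update 3. The function name read_data and the small fixed data frames are assumptions made for this illustration.

```r
read_data <- function() {
  # two local data frames, visible only inside this function call
  pur_1 <- data.frame(sku = 859, X = 1:3,   Y = c("yes", "no", "yes"))
  pur_2 <- data.frame(sku = 859, X = 11:13, Y = c("no", "na", "no"))

  # ls() called directly in the function body lists the local objects
  nms <- ls(pattern = "^pur_")

  # fetch them from this function's environment and row-bind them
  out <- do.call(rbind, mget(nms, envir = environment()))

  # drop the "pur_1.1"-style row names that rbind builds from the list names
  rownames(out) <- NULL
  out
}

res <- read_data()
print(res)
```

Anchoring the pattern with ^ keeps ls() from picking up unrelated objects that merely contain "pur_" somewhere in their name.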

Related

R : How to create objects with a function which name and value depend on an argument, and that these objects are found in the global environment?

I have the following situation: I have several data frames, and for each one I would like to create two data frames according to the value of one of the columns (log2FoldChange > 1 and log2FoldChange < -1).
For this I use the following code:
DJ29_T0_Overexpr = DJ29_T0[which(DJ29_T0$log2FoldChange > 1),]
DJ29_T0_Underexpr = DJ29_T0[which(DJ21_T0$log2FoldChange < -1),]
DJ29_T0 being one of my data frames.
First problem: the sign is not taken into account for the data frame where log2FoldChange < -1.
But the main problem comes when I turn this into a function. I wrote the following:
spliteOverUnder <- function(res){
nm <-deparse(substitute(res))
assign(paste(nm,"_Overexpr", sep=""), res[which(as.numeric(as.character(res$log2FoldChange)) > 1),])
assign(paste(nm,"_Underexpr", sep=""), res[which(as.numeric(as.character(res$log2FoldChange)) < -1),])
}
Which I then ran with :
spliteOverUnder(DJ29_T0)
No error message, but the objects are not created in my global environment. I tried return(paste(nm,"_Overexpr", sep="")), but that only returns the object name, not the associated data frame.
Using paste() to build the name forces me to use assign(), so I can't do:
spliteOverUnder <- function(res){
nm <-deparse(substitute(res))
paste(nm,"_Overexpr", sep="") <<- res[which(as.numeric(as.character(res$log2FoldChange)) > 1),]
paste(nm,"_Underexpr", sep="") <<- res[which(as.numeric(as.character(res$log2FoldChange)) < -1),]
}
spliteOverUnder(DJ24_T0)
I encounter the following error:
Error in paste(nm, "_Overexpr", sep = "") <<- res[which(as.numeric(as.character(res$log2FoldChange)) > :
could not find function "paste<-"
If you've encountered this difficulty before, I'd appreciate a little help.
And if you know how, once the function works, to loop over a list containing all my data frames and apply the function to each of them, I would be glad to hear that too.
Thanks

When assigning, use the pos argument to hoist the new objects out of the function.
function(){
assign(x = ..., value = ...,
pos = 1 ## see below
)
}
... where pos = 1 is the first entry of the search path, i.e. the global environment; the default, pos = -1, is the environment from which assign() is called (here, the function's local environment).
edit
A general function to create the split dataframes in your global environment follows. However, you might rather want to save the new dataframes (from within the function) or just forward them to downstream functions than cram your workspace with intermediary objects.
splitOverUnder <- function(the_name_of_the_frame){
  df <- get(the_name_of_the_frame)
  ## label each row by its fold-change category
  ## note: cut() intervals are right-closed by default, so a value of exactly -1 is labelled 'underexpr'
  df$cat <- cut(df$log2FoldChange,
                breaks = c(-Inf, -1, 1, Inf),
                labels = c('underexpr', 'normal', 'overexpr')
  )
  split_data <- split(df, df$cat)
  sapply(c('underexpr', 'overexpr'),
         function(n){
           new_df_name <- paste(the_name_of_the_frame, n, sep = '_')
           assign(x = new_df_name,
                  value = split_data[[n]], ## split_data$n would look for an element literally named "n"
                  envir = .GlobalEnv
           )
         }
  )
}
## say, df1 and df2 are your initial dataframes to split:
sapply(c('df1', 'df2'), function(n) splitOverUnder(n))
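If you would rather forward the pieces to downstream code than create objects in the workspace, one possible sketch is to return both subsets in a list and apply the function to a named list of data frames. The function name split_over_under, the threshold argument and the toy data are assumptions for illustration, not part of the original code.

```r
# Return the over- and under-expressed rows instead of assigning them globally.
split_over_under <- function(df, threshold = 1) {
  list(
    overexpr  = df[which(df$log2FoldChange >  threshold), , drop = FALSE],
    underexpr = df[which(df$log2FoldChange < -threshold), , drop = FALSE]
  )
}

# Toy input: a named list of data frames (made-up values).
frames <- list(
  A = data.frame(gene = c("g1", "g2", "g3", "g4"),
                 log2FoldChange = c(2.5, -0.3, -1.8, 0.9),
                 stringsAsFactors = FALSE),
  B = data.frame(gene = c("g5", "g6"),
                 log2FoldChange = c(-2.0, 1.2),
                 stringsAsFactors = FALSE)
)

# One call handles every data frame; results stay grouped under the original names.
result <- lapply(frames, split_over_under)
print(result$A$overexpr)
print(result$A$underexpr)
```

Each subset is then reachable as result$A$overexpr, result$B$underexpr and so on, which also answers the question about looping over a list of data frames without a for loop.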

trying to get a proper names(list) output

I'm trying to split a 2 level deep list of characters into a 1 level list using a suffix.
More precisely, I have a list of genes, each containing 6 lists of probes corresponding to 6 bins. The architecture looks like :
feat_indexed_probes_bin$HSPB6$bin1
[1] "cg14513218" "cg22891287" "cg20713852" "cg04719839" "cg27580050" "cg18139462" "cg02956481" "cg26608795" "cg15660498" "cg25654926" "cg04878216"
I'm trying to get a list "bins_indexed_probes" with the following architecture:
bins_indexed_probes$HSPB6_bin6 containing the same probes, so I can pass it to my map-reduce function.
I tried many approaches such as melt(), for loops, etc., but I can't figure out how to perform a doubly nested loop (over genes and over bins) and get a list output that is only one level deep.
For the moment, my function for this is the following:
create_map <- function(indexes = feat_indexed_probes_bin, binlist = c("bin1", "bin2", "bin3", "bin4", "bin5", "bin6"), genes = features) {
map <- list()
ret <- lapply(binlist, function(bin) {
lapply(rownames(features), function(gene) {
map[[paste(gene, "_", bin, sep = "")]] <- feat_indexed_probes_bin[[gene]][[bin]]
tmp_names <<- paste(gene, "_", bin, sep = "")
return(map)
})
names(map) <- tmp_names
rm(tmp_names)
})
return(ret)
}
it returns:
[[6]][[374]]
GDF10_bin6
"cg13565300"
[[6]][[375]]
NULL
[[6]][[376]]
[[6]][[376]]$HNF1B_bin6
[1] "cg03433642" "cg09679923" "cg17652435" "cg03348978" "cg02435495" "cg02701059" "cg05110178" "cg11862993" "cg09463047"
[[6]][[377]]
[[6]][[377]]$GPIHBP1_bin6
[1] "cg01953797" "cg00152340"
instead, I would expect something like
$GPIHBP1_bin1
"cg...." "cg...."
...
$GPIHBP1_bin6
"someotherprobe"
$someothergene_bin1
"probe" "probe"
...
I hope I'm being clear. Since this is my first time asking a question, I apologise in advance if I didn't follow the Stack Overflow conventions.
Thank you in advance for reading.

Consider a nested lapply that uses the extract operator, [[, together with setNames, all wrapped in do.call with c to bind the returned elements into a single list.
bins_indexed_probes <- do.call(c,
  lapply(1:6, function(i)
    setNames(lapply(feat_indexed_probes_bin, `[[`, i),
             paste0(names(feat_indexed_probes_bin), "_bin", i))
  )
)
# RE-ORDER ELEMENTS BY NAME
bins_indexed_probes <- bins_indexed_probes[sort(names(bins_indexed_probes))]
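To try the idea on something small, here is a self-contained sketch with a toy two-gene, two-bin list. The probe IDs are made up, and the bins are looked up by name rather than by position, which is an assumption that differs slightly from the snippet above.

```r
# Toy stand-in for feat_indexed_probes_bin (made-up probe IDs).
feat_indexed_probes_bin <- list(
  HSPB6 = list(bin1 = c("cg01", "cg02"), bin2 = "cg03"),
  GDF10 = list(bin1 = "cg04",            bin2 = c("cg05", "cg06"))
)
bins <- c("bin1", "bin2")

# For each bin, pull that bin out of every gene and name the pieces gene_bin;
# do.call(c, ...) then concatenates the per-bin lists into one flat list.
bins_indexed_probes <- do.call(c, lapply(bins, function(b) {
  setNames(lapply(feat_indexed_probes_bin, `[[`, b),
           paste(names(feat_indexed_probes_bin), b, sep = "_"))
}))

# Re-order elements by name so that all bins of a gene sit together.
bins_indexed_probes <- bins_indexed_probes[sort(names(bins_indexed_probes))]
print(names(bins_indexed_probes))
```

Looking bins up by name avoids depending on the order in which the bins happen to be stored for each gene.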

Error message when using lapply to apply a function to multiple dataframes in a list.

My dataset looks like this, and I have a list of data.
Plot_ID Canopy_infection_rate DAI
1 YO01 5 7
2 YO01 8 14
3 YO01 10 21
What I want to do is apply a function called "audpc_Canopyinfactionrate" to a list of data frames.
However, when I run lapply, I get the error below:
Error in FUN(X[[i]], ...) : argument "DAI" is missing, with no default
I've checked that the columns in my data are not shifted.
Does anyone know what's wrong? Thanks
Here is part of my code:
#Read files in to list
for(i in 1:length(files)) {
lst[[i]] <- read.delim(files[i], header = TRUE, sep=" ")
}
#Apply a function to the list
densities <- list()
densities<- lapply(lst, audpc_Canopyinfactionrate)
#canopy infection rate
audpc_Canopyinfactionrate <- function(Canopy_infection_rate,DAI){
n <- length(DAI)
meanvec <- matrix(-1,(n-1))
intvec <- matrix(-1,(n-1))
for(i in 1:(n-1)){
meanvec[i] <- mean(c(Canopy_infection_rate[i],
Canopy_infection_rate[i+1]))
intvec[i] <- DAI[i+1] - DAI[i]
}
infprod <- meanvec * intvec
sum(infprod)
}

As pointed out in the comments, the problem lies in the way you are using lapply.
lapply has the signature lapply(X, FUN, ...). FUN is the function that gets applied to each element of the list (or vector) X. So far so good.
Back to your case: you want to apply the function audpc_Canopyinfactionrate() to every data frame in lst. This function takes two arguments, and I think this is where things got mixed up in your code. The way you call lapply, each element lst[[1]], lst[[2]], etc. is passed as the only argument to audpc_Canopyinfactionrate(), so the whole data frame arrives as Canopy_infection_rate and DAI is missing, whereas the function requires two arguments!
If you reformulate your function a bit, you can pass lst[[1]], lst[[2]], etc. as its only argument, because you know that each data frame contains the columns you need, Canopy_infection_rate and DAI:
audpc_Canopyinfactionrate <- function(df){
n <- nrow(df)
meanvec <- matrix(-1, (n-1))
intvec <- matrix(-1, (n-1))
for(i in 1:(n-1)){
meanvec[i] <- mean(c(df$Canopy_infection_rate[i],
df$Canopy_infection_rate[i+1]))
intvec[i] <- df$DAI[i+1] - df$DAI[i]
}
infprod <- meanvec * intvec
return(sum(infprod))
}
Call lapply in the following way:
lapply(lst, audpc_Canopyinfactionrate)
Note: lapply can also pass additional arguments to FUN through the ... in lapply(X, FUN, ...). In your case, however, I think this is not the best option.
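If you prefer to keep a two-argument function, another option is to wrap the call in an anonymous function that unpacks the columns. The sketch below assumes a vectorised rewrite of the calculation (named audpc_two_args here, a made-up name) and two small data frames; the first one reuses the three rows shown in the question.

```r
# Two-argument version, written without the explicit loop (an assumption;
# it computes the same sum of mean rate times interval length).
audpc_two_args <- function(Canopy_infection_rate, DAI) {
  n <- length(DAI)
  # mean of each pair of consecutive infection rates
  mean_rate <- (Canopy_infection_rate[-n] + Canopy_infection_rate[-1]) / 2
  # diff(DAI) gives the length of each interval between observations
  sum(mean_rate * diff(DAI))
}

lst <- list(
  YO01 = data.frame(Canopy_infection_rate = c(5, 8, 10), DAI = c(7, 14, 21)),
  YO02 = data.frame(Canopy_infection_rate = c(0, 4),     DAI = c(7, 14))
)

# The anonymous function receives one data frame and passes its columns on.
densities <- lapply(lst, function(d) audpc_two_args(d$Canopy_infection_rate, d$DAI))
print(densities)  # YO01: 6.5*7 + 9*7 = 108.5, YO02: 2*7 = 14
```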

Using lapply to subset rows from data frames -- incorrect number of dimensions error

I have a list called "scenbase" that contains 40 data frames, which are each 326 rows by 68 columns. I would like to use lapply() to subset the data frames so they only retain rows 33-152. I've written a simple function called trim() (below), and am attempting to apply it to the list of data frames but am getting an error message. The function and my attempt at using it with lapply is below:
trim <- function(i)
{ (i <- i[33:152,]) }
lapply(scenbase, trim)
Error in i[33:152, ] : incorrect number of dimensions
When I try to do the same thing to one of the individual data frames (soil11base.txt) that are included in the list (below), it works as expected:
soil11base.txt <- soil11base.txt[33:152,]
Any idea what I need to do to get the dimensions correct?

You have two options. You can either
(a) assign the result to a new list:
newList = lapply(scenbase, function(x) { x[33:152,,drop=FALSE]} )
(b) or use the <<- operator, which assigns your trimmed data in place:
lapply(1:length(scenbase), function(x) { scenbase[[x]] <<- scenbase[[x]][33:152,,drop=FALSE]} )
Your call does not modify scenbase because i is local to the function, not a variable in the global scope. You can work around that with the <<- operator, which assigns to the first matching variable it finds in successive parent environments, or by creating a new trimmed list.
Here is some code that reproduces solution (a):
listOfDfs = list()
for(i in 1:10) { listOfDfs[[i]] = data.frame("x"=sample(letters,200,replace=TRUE),"y"=sample(letters,200,replace=TRUE)) }
choppedList = lapply(listOfDfs, function(x) { x[33:152,,drop=FALSE]} )
Here is some code that reproduces solution (b):
listOfDfs = list()
for(i in 1:10) { listOfDfs[[i]] = data.frame("x"=sample(letters,200,replace=TRUE),"y"=sample(letters,200,replace=TRUE)) }
lapply(1:length(listOfDfs), function(x) { listOfDfs[[x]] <<- listOfDfs[[x]][33:152,,drop=FALSE]} )
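The message "incorrect number of dimensions" is what R reports when two-index subsetting is applied to an object that does not have two dimensions. One possible cause, and this is an assumption since scenbase itself is not shown, is that some element of the list is not a data frame. A small sketch for checking that before trimming, using a made-up stand-in for scenbase:

```r
# Stand-in for scenbase: a named list of data frames (made-up contents).
scenbase <- list(
  soil11base.txt = data.frame(x = 1:200, y = 201:400),
  soil12base.txt = data.frame(x = 1:200, y = 201:400)
)

# TRUE for every element that really is a data frame.
is_df <- vapply(scenbase, is.data.frame, logical(1))
print(is_df)

# Trim only the elements that can be indexed by row and column.
trimmed <- lapply(scenbase[is_df], function(d) d[33:152, , drop = FALSE])
print(sapply(trimmed, nrow))
```

If is_df contains any FALSE values on the real list, those elements are the ones to inspect, for example with str().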

Create a list of functions from a vector of characters

Thanks in advance, and sorry if this question has been answered previously; I have looked pretty extensively. I have a dataset containing a row with concatenated information, specifically: name, color code, and some function expression. For example, one value may be:
cost#FF0033#log(x)+6.
I have all of the code to extract the information, and I end up with a vector of expressions that I would like to convert to a list of actual functions.
For example:
func.list <- list()
test.func <- c("x","x+1","x+2","x+3","x+4")
where test.func is the vector of expressions. What I would like is:
func.list[[3]]
To be equivalent to
function(x){x+2}
I know that I can create a function using:
somefunc <- function(x){eval(parse(text="x+1"))}
to convert a character value into a function. The problem comes when I try and loop through to make multiple functions. For an example of something I tried that didn't work:
for(i in 1:length(test.func)){
temp <- test.func[i]
f <- assign(function(x){eval(expr=parse(text=temp))})
func.list[[i]] <- f
}
Based on another post (http://stats.stackexchange.com/questions/3836/how-to-create-a-vector-of-functions) I also tried this:
makefunc <- function(y){y;function(x){y}}
for(i in 1:length(test.func)){
func.list[[i]] <- assign(x=paste("f",i,sep=""),value=makefunc(eval(parse(text=test.func[i]))))
}
Which gives the following error: Error in eval(expr, envir, enclos) : object 'x' not found
The eventual goal is to take the list of functions and apply the jth function to the jth column of the data.frame, so that the user of the script can specify how to normalize each column within the concatenated information given by the column header.

Maybe initialize your list with a single generic function, and then update the body of each one using:
foo <- function(x){x+3}
body(foo) <- quote(x+4)
foo
## function (x)
## x + 4
More specifically, starting from a character string, you'd probably do something like:
body(foo) <- parse(text = "x+5")

Just to add onto joran's answer, this is what finally worked:
test.data <- matrix(data=rep(1,25),5,5)
test.data <- data.frame(test.data)
test.func <- c("x","x+1","x+2","x+3","x+4")
func.list <- list()
for(i in 1:length(test.func)){
func.list[[i]] <- function(x){}
body(func.list[[i]]) <- parse(text=test.func[i])
}
processed <- mapply(do.call,func.list,lapply(test.data,list))
Thanks again, joran.

This is what I do:
f <- list(identity="x",plus1 = "x+1", square= "x^2")
funCreator <- function(snippet){
txt <- snippet
function(x){
exprs <- parse(text = txt)
eval(exprs)
}
}
listOfFunctions <- lapply(setNames(f,names(f)),function(x){funCreator(x)}) # I like to have some control of the names of the functions
listOfFunctions[[1]] # try to see what the actual function looks like?
library(pryr)
unenclose(listOfFunctions[[3]]) # good way to see the actual function http://adv-r.had.co.nz/Functional-programming.html
# Call your functions
listOfFunctions[[2]](3) # 3+1 = 4
do.call(listOfFunctions[[3]],list(3)) # 3^2 = 9
attach(listOfFunctions) # you can also attach your list of functions and call them by name
square(3) # 3^2 = 9
identity(7) # 7 ## masked object identity, better detach it now!
detach(listOfFunctions)
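Putting the pieces from the answers together, here is a sketch of the stated end goal: build one function per expression string and apply the jth function to the jth column. The helper name make_fun is made up for this illustration, and the data are the all-ones test.data from the post above.

```r
# Build a function of x whose body is the parsed expression string.
make_fun <- function(txt) {
  f <- function(x) NULL
  body(f) <- parse(text = txt)[[1]]  # first (and only) parsed expression
  f
}

test.func <- c("x", "x+1", "x+2", "x+3", "x+4")
func.list <- lapply(test.func, make_fun)

test.data <- data.frame(matrix(1, 5, 5))

# Apply the jth function to the jth column; assigning into processed[]
# keeps the data frame structure and the original column names.
processed <- test.data
processed[] <- Map(function(f, col) f(col), func.list, test.data)
print(processed)
```

Because each body is set once at creation time, printing func.list[[3]] shows x + 2 directly rather than a call to eval(parse(...)).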
