R store output from lapply with multiple functions - r

I'm using lapply to loop through a list of dataframes and apply the same set of functions. This works fine when lapply has just one function, but I'm struggling to see how I store/print the output from multiple functions - in that case, I seem to only get output from one 'loop'.
So this:
output <- lapply(dflis,function(lismember) vss(ISEQData,n=9,rotate="oblimin",diagonal=F,fm="ml"))
works, while the following doesn't:
output <- lapply(dflis,function(lismember){
outputvss <- vss(lismember,n=9,rotate="oblimin",diagonal=F,fm="ml")
nefa <- (EFA.Comp.Data(Data=lismember, F.Max=9, Graph=T))
})
I think this dummy example is an analogue, so in other words:
nbs <- list(1==1,2==2,3==3,4==4)
nbsout <- lapply(nbs,function(x) length(x))
Gives me something I can access, while I can't see how to store output using the below (e.g. the attempt to use nbsout[[x]][2]):
nbs <- list(1==1,2==2,3==3,4==4)
nbsout <- lapply(nbs,function(x){
nbsout[[x]][1]<-typeof(x)
nbsout[[x]][2]<-length(x)
}
)
I'm using RStudio and will then be printing outputs/knitting html (where it makes sense to display the results from each dataset together, rather than each function-output for each dataset sequentially).

You should return a structure that includes all your outputs. A named list is the best choice. You can also return a data.frame if all your outputs have the same dimensions.
output <- lapply(dflis,function(lismember){
outputvss <- vss(lismember,n=9,rotate="oblimin",diagonal=F,fm="ml")
nefa <- (EFA.Comp.Data(Data=lismember, F.Max=9, Graph=T))
list(outputvss=outputvss,nefa=nefa)
## or data.frame(outputvss=outputvss,nefa=nefa)
})
When each call returns a data.frame with the same columns, you can stack the results into one big data.frame. Note that sapply will not do this for you (it simplifies to a vector, matrix, or list), so use the classic:
do.call(rbind,output)
to aggregate the result.
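A minimal sketch of both options, using the dummy nbs list from the question. The element names type and len are chosen here purely for illustration.

```r
nbs <- list(1 == 1, 2 == 2, 3 == 3, 4 == 4)

# Option 1: return a named list from each call (names are illustrative)
nbsout <- lapply(nbs, function(x) {
  list(type = typeof(x), len = length(x))
})
nbsout[[1]]$type   # both results for the first element are now stored
nbsout[[1]]$len

# Option 2: return a one-row data.frame from each call, then stack the rows
rows <- lapply(nbs, function(x) {
  data.frame(type = typeof(x), len = length(x), stringsAsFactors = FALSE)
})
combined <- do.call(rbind, rows)
combined
```

The named list is the more general choice because the elements can have different shapes. The data.frame variant only fits when every call yields the same columns.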

A function should always have an explicit return value, e.g.
output <- lapply(dflis,function(lismember){
outputvss <- vss(lismember,n=9,rotate="oblimin",diagonal=F,fm="ml")
nefa <- (EFA.Comp.Data(Data=lismember, F.Max=9, Graph=T))
#return value:
list(outputvss, nefa)
})
output is then a list of lists.
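To illustrate how such a list of lists can be accessed afterwards, here is a small sketch. It assumes stand-in data (mtcars and iris) and two trivial summaries in place of vss and EFA.Comp.Data. The names n_rows and col_names are hypothetical.

```r
# Stand-in input: a named list of data frames
dflis <- list(cars = mtcars, flowers = iris)

output <- lapply(dflis, function(lismember) {
  # two placeholder "analyses" per data frame
  list(n_rows = nrow(lismember), col_names = names(lismember))
})

# Everything for one dataset sits together
output$cars$n_rows
output[["flowers"]][["col_names"]]

# Pull the same piece out of every dataset
n_rows_all <- sapply(output, `[[`, "n_rows")
n_rows_all
```

Because the results for each dataset are grouped, looping over output when knitting prints them dataset by dataset.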

Related

Avoid getting the output of sapply() in R Markdown

I made a function systemFail(x), where x is one of the columns in my data frame (df$location).
Now I want to create a new column in my data frame (df$outcome) with the result of this function (Pass/Fail result). I used the line of code below which made the extra column as I wanted. However, annoyingly the result (a long column of Passes and Fails) also shows up in my R Markdown document.
How can I get the extra column in my data frame without getting the result also in my R Markdown document?
df$outcome <- sapply(df$location, systemFail)
It is difficult to answer without knowing the details of the systemFail function. However, from your previous question it seems you are printing the values inside the function. Instead of printing, use return to return the result.
Look at this simple example -
systemFail <- function(x) {
print(x)
}
res <- systemFail('out')
#[1] "out"
When we use return instead, nothing is printed and the result is available in res.
systemFail <- function(x) {
return(x)
}
res <- systemFail('out')
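Applied to the data frame case, a sketch might look like the following. The pass/fail rule (values above 10 fail) and the sample data are invented for illustration, since the real systemFail is not shown.

```r
# Hypothetical rule: locations above 10 fail. The real logic is unknown.
systemFail <- function(x) {
  if (x > 10) "Fail" else "Pass"   # the value is returned, not printed
}

df <- data.frame(location = c(3, 12, 7))

# Assigning the result stores it without echoing anything in the document
df$outcome <- sapply(df$location, systemFail)
```

Since the function no longer calls print and the sapply result is assigned, nothing is written to the knitted output unless df is printed explicitly.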

How to loop through mapply in R?

I am trying to concatenate strings using mapply function in R. However, I want one of the strings to be variable in mapply function. I have a snippet of my code below:
strings<-data.frame(x=c("dsf","sdf","sdf"))
strings2<-data.frame(extension=c(".csv",".json",".xml"))
for (i in 1:3)
{
strings_concat<-mapply(function(string1,string2) paste0(string1,string2),strings$x,strings2$extension[i])%>%
data.frame()%>%
unlist()%>%
data.frame()
#dosomething with strings_concat
}
But this is giving me the last iteration only
strings_concat
dsf.xml
sdf.xml
sdf.xml
but instead, the desired output is as follows:
strings_concat
dsf.csv
sdf.csv
sdf.csv
dsf.json
sdf.json
sdf.json
dsf.xml
sdf.xml
sdf.xml
At every iteration, I want to combine strings_concat with another dataframe and save it. Can anyone help me if there is an easy way to do this in R?
Perhaps, outer is a better option here :
strings_concat <- c(outer(strings$x, strings2$extension, paste0))
strings_concat
#[1] "dsf.csv" "sdf.csv" "sdf.csv" "dsf.json" "sdf.json" "sdf.json"
# "dsf.xml" "sdf.xml" "sdf.xml"
You can add it in a data.frame :
df <- data.frame(strings_concat)
If you want to add some additional steps at each iteration you can use lapply :
lapply(strings2$extension, function(x) {
strings_concat <- paste0(strings$x, x)
#do something with strings_concat
})
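If the per-iteration results also need to be kept, one possible sketch is to return a data.frame from each lapply call and stack them at the end. The name per_ext is introduced here for illustration only.

```r
strings  <- data.frame(x = c("dsf", "sdf", "sdf"), stringsAsFactors = FALSE)
strings2 <- data.frame(extension = c(".csv", ".json", ".xml"),
                       stringsAsFactors = FALSE)

# One data.frame per extension. Any per-iteration work goes inside the function.
per_ext <- lapply(strings2$extension, function(ext) {
  data.frame(strings_concat = paste0(strings$x, ext),
             stringsAsFactors = FALSE)
})

# Stack the pieces once, after the loop
results <- do.call(rbind, per_ext)
results
```

Collecting the pieces in a list and binding once avoids overwriting strings_concat on every pass, which is what left only the last iteration in the original loop.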
All you should need to do is make sure you are continually augmenting your dataset. So I think this should do the trick:
library(magrittr) # provides the %>% pipe (dplyr re-exports it as well)
strings<-data.frame(x=c("dsf","sdf","sdf"))
strings2<-data.frame(extension=c(".csv",".json",".xml"))
# We are going to keep adding things to results
results = NULL
for (i in 1:3)
{
strings_concat<-mapply(function(string1,string2) paste0(string1,string2),strings$x,strings2$extension[i])%>%
data.frame()%>%
unlist()%>%
data.frame()
# Here is where we keep adding things to results
results = rbind(results, strings_concat)
}
print(results)
Caution: I'm not in front of a computer with R, so this code is untested.

get() not working for column in a data frame in a list in R (phew)

I have a list of data frames. I want to use lapply on a specific column for each of those data frames, but I keep throwing errors when I tried methods from similar answers:
The setup is something like this:
a <- list(*a series of data frames that each have a column named DIM*)
dim_loc <- lapply(1:length(a), function(x){paste0("a[[", x, "]]$DIM")})
Eventually, I'll want to write something like results <- lapply(dim_loc, *some function on the DIMs*)
However, when I try get(dim_loc[[1]]), say, I get an error: Error in get(dim_loc[[1]]) : object 'a[[1]]$DIM' not found
But I can return values from function(a[[1]]$DIM) all day long. It's there.
I've tried working around this by using as.name() in the dim_loc assignment, but that doesn't seem to do the trick either.
I'm curious 1. what's up with get(), and 2. if there's a better solution. I'm constraining myself to the apply family of functions because I want to try to get out of the for-loop habit, and this name-as-list method seems to be preferred based on something like R- how to dynamically name data frames?, but I'd be interested in other, more elegant solutions, too.
I'd say that if you want to modify an object in place you are better off using a for loop, since lapply would require the <<- assignment operator (a plain <- inside the function passed to lapply only modifies a local copy). Like so:
set.seed(1)
aList <- list(cars = mtcars, iris = iris)
for(i in seq_along(aList)){
aList[[i]][["newcol"]] <- runif(nrow(aList[[i]]))
}
As opposed to...
invisible(
lapply(seq_along(aList), function(x){
aList[[x]][["newcol"]] <<- runif(nrow(aList[[x]]))
})
)
You have to use invisible(), otherwise lapply would print its output to the console. The <<- assigns the vector runif(...) to the newly created column.
If you want to produce another set of data.frames using lapply then you do:
lapply(seq_along(aList), function(x){
aList[[x]][["newcol"]] <- runif(nrow(aList[[x]]))
return(aList[[x]])
})
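A slightly shorter variant of the same idea, sketched under the assumption that the indices are not needed: iterate over the data frames themselves and assign the result back to the list. This keeps the element names and avoids <<- entirely.

```r
set.seed(1)
aList <- list(cars = mtcars, iris = iris)

# Each call works on a copy of one data frame and returns the modified copy
aList <- lapply(aList, function(df) {
  df[["newcol"]] <- runif(nrow(df))
  df
})

names(aList)         # element names are preserved
head(aList$cars$newcol)
```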
Also, may I suggest the use of seq_along(list) in lapply and for loops as opposed to 1:length(list) since it avoids unexpected behavior such as:
# no length list
seq_along(list()) # prints integer(0)
1:length(list()) # prints 1 0.
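On the get() part of the question: get() looks up an object by its name, and "a[[1]]$DIM" is an expression, not the name of any object, which is why the lookup fails. A sketch of the usual alternative, with two small made-up data frames standing in for the real list, is to let lapply hand each data frame to the function directly:

```r
# Made-up stand-ins for the real data frames
a <- list(data.frame(DIM = 1:3), data.frame(DIM = 4:6))

# Extract the DIM column from every data frame
dims <- lapply(a, `[[`, "DIM")

# Or apply some function to each DIM column (mean is just an example)
results <- lapply(a, function(df) mean(df$DIM))

# get() works on a plain object name, and indexing happens afterwards
first_dim <- get("a")[[1]]$DIM
```

No strings describing the location of the column are needed, so neither get() nor as.name() has to be involved.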

Calculating variable values using paste function in R

First let me say that I am not an expert coder and any advice about this particular question or my general technique will be greatly appreciated.
I have a large data set that is made up of similar data frames named Table6.# such as: Table6.1, Table6.2, etc. I have variables in each data frame that repeat as well, such as: ST1_Delta_PV%, ST2_Delta_PV%, etc. and ST1_Realloc_Margin, ST2_Reallocation_Margin, etc.
I am trying to write several nested loops that will calculate values in each table across these similar variables. I have tried to do this with the paste function as shown below, but this is obviously not the correct way to do this.
for (i in 1:25){
for (j in 1:4){
for (k in 1:length(paste("Table6.",i,"sep="")[,1]){
paste("Table6.",i,sep="")$paste("ST",j,"NonTgt_Shr",sep="")[k] <- paste("Table6.",i,sep="")$paste("ST",j,"_Delta_PV%",sep="")[k] * paste("Table6.",i,sep="")$paste("ST",j,"_Reallocation_Margin",sep="")[k]
}
}
}
I apologize if this is a complete mess. I appreciate your help.
As akrun says, you should put your data frames in a list
Tables <- list(Table6.1, Table6.2, …)
for (Table in Tables) { … }
This way, you do not need to use paste to construct the different Table names.
For accessing the different columns, you can use the df[["column"]] syntax - this is similar to df$column, except that inside the brackets, you can use any string
nonTgt_Shr.column.name <- paste0("ST",j,"NonTgt_Shr")
delta.column.name <- paste0("ST",j,"_Delta_PV%")
for (k in seq_len(nrow(Table))) {
Table[[nonTgt_Shr.column.name]][k] <- Table[[delta.column.name]][k] * …
}
Note how I use variables for storing the name, making the line with the actual computation much more readable.
Also, nrow is more intuitive than length(Table[,1]).
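Putting these pieces together, a self-contained sketch could look like this. The two small tables and the helper make_table are invented for illustration. Only the column naming pattern comes from the question. Because a for loop over Tables works on copies, the sketch uses lapply and stores its return value, and it multiplies whole columns instead of looping over rows.

```r
# Invented toy tables that follow the question's column naming pattern
make_table <- function(scale) {
  data.frame(
    "ST1_Delta_PV%"           = c(0.1, 0.2) * scale,
    "ST1_Reallocation_Margin" = c(10, 20),
    "ST2_Delta_PV%"           = c(0.3, 0.4) * scale,
    "ST2_Reallocation_Margin" = c(30, 40),
    check.names = FALSE   # keep the % in the column names
  )
}
Tables <- list(Table6.1 = make_table(1), Table6.2 = make_table(2))

Tables <- lapply(Tables, function(Table) {
  for (j in 1:2) {
    out    <- paste0("ST", j, "NonTgt_Shr")
    delta  <- paste0("ST", j, "_Delta_PV%")
    margin <- paste0("ST", j, "_Reallocation_Margin")
    # vectorised: all rows are computed at once
    Table[[out]] <- Table[[delta]] * Table[[margin]]
  }
  Table   # return the modified copy so lapply can collect it
})

Tables$Table6.1
```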
The calculations could be wrapped in a function, which improves readability, scalability, and robustness.
In the actual calculation function, the function get is used to retrieve the data frame based on the name.
#Calculation Function
fn_CalcVariables <- function(
tableName="Table6.1",
outputVarName="NonTgt_Shr",
inputVarNames=c("_Delta_PV%", "_Reallocation_Margin"),
variablePrefix="ST1"
) {
DF <- get(tableName)
outputVarName <- paste0(variablePrefix, outputVarName)
inputVarNames <- paste0(variablePrefix, inputVarNames)
DF[,outputVarName] <- DF[,inputVarNames[1]] * DF[,inputVarNames[2]]
return(DF)
}
This function should be called by nested lapply calls.
lapply iterates over the lists of the arguments, calls the function (second argument), and collects a list of the return values.
(As an exercise, try l <- list(a=1, b=2); lapply(l, function(x) { x*2 }).)
#List object names for tables and variable names
tableNamesList <- paste0("Table6.",1:25)
variablePrefixList <- paste0("ST",1:4)
#Nested loops to invoke custom function from above
lapply(variablePrefixList, function(alpha) {
lapply(tableNamesList, function(x, varprefix=alpha) {
cat("Begin Processing Table",x,"varPrefix",varprefix,"\n")
result <- fn_CalcVariables(
tableName=x,
outputVarName="NonTgt_Shr",
inputVarNames=c("_Delta_PV%","_Reallocation_Margin"),
variablePrefix=varprefix
)
cat("End Processing Table", x, "varPrefix", varprefix, "\n")
result # return the data frame; otherwise lapply would collect the NULL from cat()
}) #End of inner lapply
}) #End of outer lapply
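As a possible alternative to calling get inside the function, the tables could be gathered into a named list once with mget and then processed with a single lapply. The sketch below uses two tiny invented tables and a placeholder calculation (doubling a column) rather than the real variables.

```r
# Tiny invented tables standing in for Table6.1 ... Table6.25
Table6.1 <- data.frame(x = 1:2)
Table6.2 <- data.frame(x = 3:4)

tableNamesList <- paste0("Table6.", 1:2)

# mget() fetches several objects by name and returns them as a named list
Tables <- mget(tableNamesList, envir = environment())

# Placeholder calculation. The value of the last expression is what lapply keeps.
Tables <- lapply(Tables, function(DF) {
  DF$doubled <- DF$x * 2
  DF
})

names(Tables)
```

Keeping the tables in one list means the results stay attached to their table names and no global objects have to be looked up again later.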

How to use extract function in a for loop?

I am using the extract function in a loop. See below.
for (i in 1:length(list_shp_Tanzania)){
LU_Mod2000<- extract(x=rc_Mod2000_LC, y=list_shp_Tanzania[[i]], fun=maj)
}
Where maj function is:
maj <- function(x){
y <- as.numeric(names(which.max(table(x))))
return(y)
}
I was expecting to get i outputs, but I get only one output once the loop is done. Does anybody know what I am doing wrong? Thanks.
One solution in this kind of situation is to create a list and then assign the result of each iteration to the corresponding element of the list:
LU_Mod2000 <- vector("list", length(list_shp_Tanzania))
for (i in 1:length(list_shp_Tanzania)){
LU_Mod2000[[i]] <- extract(x=rc_Mod2000_LC, y=list_shp_Tanzania[[i]], fun=maj)
}
Do not do
LU_Mod2000 <- c(LU_Mod2000, extract(x=rc_Mod2000_LC, y=list_shp_Tanzania[[i]], fun=maj))
inside the loop. This creates unnecessary copies and takes a long time to run. Use the list method, and after the loop, convert the list of results to the desired format (usually using do.call(<some function>, LU_Mod2000))
Alternatively, you could replace the for loop with lapply, which is what many people seem to prefer:
LU_Mod2000 <- lapply(list_shp_Tanzania, function(z) extract(x=rc_Mod2000_LC, y=z, fun=maj))
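The pattern can be tried without any spatial data. In the sketch below, plain numeric vectors (the hypothetical zones list) stand in for the values that extract would pull out for each shapefile, and maj is applied to them directly.

```r
maj <- function(x) {
  as.numeric(names(which.max(table(x))))
}

# Hypothetical stand-in for the per-shapefile raster values
zones <- list(c(1, 1, 2), c(3, 3, 3, 4), c(5, 6, 6))

# Preallocate a list and fill one element per iteration
res <- vector("list", length(zones))
for (i in seq_along(zones)) {
  res[[i]] <- maj(zones[[i]])
}

# Convert the list to the desired format after the loop
res_vec <- do.call(c, res)

# The lapply version gives the same list in one step
res_lapply <- lapply(zones, maj)
```

Note that the function comes first in do.call and the list of arguments second.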

Resources