I am trying to create a vector or list of values from the output of a function applied to individual elements of a column.
library(hpoPlot)
xyz_hpo <- c("HP:0003698", "HP:0007082", "HP:0006956")
getallancs <- function(hpo_col) {
for (i in 1:length(hpo_col)) {
anc <- get.ancestors(hpo.terms, hpo_col[i])
output <- list()
output[[length(anc) + 1]] <- append(output, anc)
}
return(anc)
}
all_ancs <- getallancs(xyz_hpo)
get.ancestors outputs a character vector whose length varies from term to term. How can I loop through hpo_col, adding the length of each anc vector to the output vector?
Welcome to Stack Overflow :) Great job on providing a minimal reproducible example!
As mentioned in the comments, you need to move output <- list() outside your for loop and return output (rather than anc) after the loop. At present output is reset on every iteration of the loop, which is not what you want. I also think you want to return a vector rather than a list, so I have changed the type of output.
Also, in your original question you say that you want the length of each anc vector, so I have changed the function to store the length of anc at each iteration rather than the whole vector.
getallancs <- function(hpo_col) {
output <- numeric()
for (i in seq_along(hpo_col)) {
anc <- get.ancestors(hpo.terms, hpo_col[i])
output <- append(output, length(anc))
}
return(output)
}
If you are only doing this for a few cases, such as your example, this approach will be fine. However, growing a vector with append() inside a loop copies it on every iteration, which is typically slow in R, so it is better to use R's apply functions (or true vectorisation) where possible. This is especially important if you are running this for a large number of elements, where the computation can take more than a few seconds.
For example, the function above can be written more compactly with sapply:
all_ancs <- sapply(xyz_hpo, function(x) length(get.ancestors(hpo.terms, x)))
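Two variations on the same idea, sketched under the assumption that hpoPlot is not installed: fake_ancestors and fake_get_ancestors below are hypothetical stand-ins for get.ancestors(hpo.terms, x), and the "ancestor" IDs are invented. Preallocating the result avoids growing it with append(), and vapply() behaves like sapply() but checks that every call returns exactly one integer.

```r
# Hypothetical lookup standing in for get.ancestors(hpo.terms, term);
# these "ancestors" are invented for illustration only.
fake_ancestors <- list(
  "HP:0003698" = c("HP:0003698", "ROOT"),
  "HP:0007082" = c("HP:0007082", "ANC_1", "ROOT"),
  "HP:0006956" = c("HP:0006956", "ROOT")
)
fake_get_ancestors <- function(term) fake_ancestors[[term]]

xyz_hpo <- c("HP:0003698", "HP:0007082", "HP:0006956")

# 1. Loop with a preallocated result vector (no append(), no repeated copying)
n_ancs_loop <- integer(length(xyz_hpo))
for (i in seq_along(xyz_hpo)) {
  n_ancs_loop[i] <- length(fake_get_ancestors(xyz_hpo[i]))
}

# 2. vapply(): like sapply(), but errors if a call does not return one integer
n_ancs <- vapply(xyz_hpo, function(x) length(fake_get_ancestors(x)),
                 integer(1), USE.NAMES = FALSE)
```

The vapply() form is usually preferred inside functions, because a term that unexpectedly returns something other than a single count fails loudly instead of silently producing a list.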
If you did in fact mean to output the whole anc vector, not just its length, the corrected function would look like this:
getallancs <- function(hpo_col) {
output <- character()
for (i in seq_along(hpo_col)) {
anc <- get.ancestors(hpo.terms, hpo_col[i])
output <- c(output, anc)
}
return(output)
}
Or, more compactly, with lapply and unlist:
all_ancs <- unlist(lapply(xyz_hpo, function(x) get.ancestors(hpo.terms, x)))
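If you need both the lengths and the ancestors themselves, one option (a sketch, again with a made-up lookup in place of get.ancestors) is to keep the lapply() result as a named list and derive both from it. lengths() is base R (3.2.0 and later); the unique() step is an optional extra, since different terms often share ancestors.

```r
# Hypothetical lookup standing in for get.ancestors(hpo.terms, term)
fake_ancestors <- list(
  "HP:0003698" = c("HP:0003698", "ROOT"),
  "HP:0007082" = c("HP:0007082", "ANC_1", "ROOT"),
  "HP:0006956" = c("HP:0006956", "ROOT")
)
xyz_hpo <- c("HP:0003698", "HP:0007082", "HP:0006956")

# Keep one character vector per term, in a list named by term
anc_list <- lapply(xyz_hpo, function(x) fake_ancestors[[x]])
names(anc_list) <- xyz_hpo

n_ancs    <- lengths(anc_list)                    # number of ancestors per term
all_ancs  <- unlist(anc_list, use.names = FALSE)  # one flat character vector
uniq_ancs <- unique(all_ancs)                     # shared ancestors kept only once
```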
Hope that helps. If it solves your problem, please mark this as the answer.
Related
I am trying to concatenate strings using the mapply function in R. However, I want one of the strings to change on each iteration. Here is a snippet of my code:
strings<-data.frame(x=c("dsf","sdf","sdf"))
strings2<-data.frame(extension=c(".csv",".json",".xml"))
for (i in 1:3)
{
strings_concat<-mapply(function(string1,string2) paste0(string1,string2),strings$x,strings2$extension[i])%>%
data.frame()%>%
unlist()%>%
data.frame()
#dosomething with strings_concat
}
But this only gives me the result of the last iteration:
strings_concat
dsf.xml
sdf.xml
sdf.xml
but the desired output is:
strings_concat
dsf.csv
sdf.csv
sdf.csv
dsf.json
sdf.json
sdf.json
dsf.xml
sdf.xml
sdf.xml
At every iteration, I want to combine strings_concat with another data frame and save it. Is there an easy way to do this in R?
Perhaps outer is a better option here:
strings_concat <- c(outer(strings$x, strings2$extension, paste0))
strings_concat
#[1] "dsf.csv" "sdf.csv" "sdf.csv" "dsf.json" "sdf.json" "sdf.json"
# "dsf.xml" "sdf.xml" "sdf.xml"
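To see why outer() produces this order, here is a sketch that builds the same vector by hand with rep(): outer() pairs every element of its first argument with every element of its second, and c() then reads the resulting matrix column by column, that is, one extension at a time.

```r
strings  <- data.frame(x = c("dsf", "sdf", "sdf"), stringsAsFactors = FALSE)
strings2 <- data.frame(extension = c(".csv", ".json", ".xml"),
                       stringsAsFactors = FALSE)

n_x   <- nrow(strings)
n_ext <- nrow(strings2)

# Whole name vector repeated once per extension; each extension repeated once per name
manual <- paste0(rep(strings$x, times = n_ext),
                 rep(strings2$extension, each = n_x))

# The outer() version from the answer
via_outer <- c(outer(strings$x, strings2$extension, paste0))
```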
You can put it in a data frame:
df <- data.frame(strings_concat)
If you want to add some additional steps at each iteration, you can use lapply:
lapply(strings2$extension, function(x) {
strings_concat <- paste0(strings$x, x)
#do something with strings_concat
})
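If you also want to keep every iteration's result (for example, to combine them later), a minimal sketch is to return a data frame from each lapply() call and bind them all once at the end with do.call(rbind, ...). The names per_ext and results are illustrative; where you merge with your other data frame or write to disk is up to you.

```r
strings  <- data.frame(x = c("dsf", "sdf", "sdf"), stringsAsFactors = FALSE)
strings2 <- data.frame(extension = c(".csv", ".json", ".xml"),
                       stringsAsFactors = FALSE)

# One data frame per extension; nothing is overwritten between iterations
per_ext <- lapply(strings2$extension, function(ext) {
  strings_concat <- paste0(strings$x, ext)
  # ...per-iteration work (e.g. combining with another data frame and saving it)...
  data.frame(strings_concat = strings_concat, stringsAsFactors = FALSE)
})

# Stack all iterations with a single rbind call
results <- do.call(rbind, per_ext)
```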
You need to accumulate the results across iterations instead of overwriting strings_concat each time. I think this should do the trick:
library(magrittr) # provides the %>% pipe used below
strings<-data.frame(x=c("dsf","sdf","sdf"))
strings2<-data.frame(extension=c(".csv",".json",".xml"))
# We are going to keep adding things to results
results = NULL
for (i in 1:3)
{
strings_concat<-mapply(function(string1,string2) paste0(string1,string2),strings$x,strings2$extension[i])%>%
data.frame()%>%
unlist()%>%
data.frame()
# Here is where we keep adding things to results
results = rbind(results, strings_concat)
}
print(results)
Caution: I am not in front of a computer with R, so this code is untested.
Suppose I have two vectors, and I would like a function that takes one value from each vector at a time and returns the result. I would then like another function to check the output of each run: if the new output is larger than the previous one, the function should stop and return all the previous values. My real function is very complicated (estimation models), so I will use a simple example to explain the idea.
Suppose that I have these two vectors:
set.seed(123)
x <- rnorm(1:20)
y <- rnorm(1:20)
I would like to write a function that takes only one value from each vector, multiplies them and returns the result. Then I would like the function to check whether the previous product is smaller than the new one. If it is, the function should stop and return all the previous products.
I tried the following:
myfun <- function(x, y, n){
multi <- list()
for ( i in 1:n){
multi[[i]] <- x[[i]]*y[[i]]
}
return(multi)
}
myfun(x,y,10)
However, this function takes all the values at once and returns a list of the products. I was thinking about using lapply to process one element at a time, but I do not know how to handle the conditions.
Here is another attempt:
x <- rnorm(1:20)
y <- rnorm(1:20)
myfun <- function(x, y){
multi <- x*y
return(multi)
}
This is the first function. I would like to run it element by element, so that each call returns only one product. Then another function (a wrapper function) should check the result: if the second output of the multiplication function is larger than the first one, stop; otherwise keep going.
I would like the multiplication to be in a separate function and then check its output, so I need a wrapper function.
You can use a for loop with a stopping condition, similar to what you already have:
# example input
set.seed(123)
x <- rnorm(20)
y <- rnorm(20)
# example function
f = function(xi, yi) xi*yi
# wrapper
stopifnot(length(x) == length(y))
res = vector(length(x), mode="list")
for (i in seq_along(x)){
res[[i]] = f(x[[i]], y[[i]])
if (i > 1L && res[[i]] > res[[i-1L]]) break
}
res[seq_len(i)]
Comments:
It is better to predefine the max length res might need (here, length(x)), rather than expanding it in the loop.
For this function (multiplication), there is no good reason to proceed elementwise. R's multiplication function is vectorized and fast.
You don't need to use a list-class output for this function, since it is returning doubles; res = double(length(x)) should also work.
You don't need to use list-style accessors for x, y and res unless lists are involved; res[i] = f(x[i], y[i]) should work, etc.
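Following the point about vectorisation, here is a sketch of the same stopping rule without an explicit loop: compute all products at once, find the first position where a product is larger than the one before it, and keep everything up to and including that position (the same values the loop above returns, as a numeric vector rather than a list). This only pays off when the per-element function is cheap and vectorised; with an expensive estimation model you would keep the loop so that you actually stop computing early.

```r
set.seed(123)
x <- rnorm(20)
y <- rnorm(20)

p <- x * y                               # all products in one vectorised call
rises <- which(p[-1] > p[-length(p)])    # positions i where p[i + 1] > p[i]
stop_at <- if (length(rises) > 0) rises[1] + 1L else length(p)
res_vec <- p[seq_len(stop_at)]           # products up to the first increase
```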
First let me say that I am not an expert coder and any advice about this particular question or my general technique will be greatly appreciated.
I have a large data set made up of similar data frames named Table6.#, such as Table6.1, Table6.2, etc. Each data frame also has repeating variables, such as ST1_Delta_PV%, ST2_Delta_PV%, etc. and ST1_Reallocation_Margin, ST2_Reallocation_Margin, etc.
I am trying to write several nested loops that calculate values in each table across these similar variables. I have tried to do this with the paste function as shown below, but this is obviously not the correct way to do it.
for (i in 1:25){
for (j in 1:4){
for (k in 1:length(paste("Table6.",i,"sep="")[,1]){
paste("Table6.",i,sep="")$paste("ST",j,"NonTgt_Shr",sep="")[k] <- paste("Table6.",i,sep="")$paste("ST",j,"_Delta_PV%",sep="")[k] * paste("Table6.",i,sep="")$paste("ST",j,"_Reallocation_Margin",sep="")[k]
}
}
}
I apologize if this is a complete mess. I appreciate your help.
As akrun says, you should put your data frames in a list:
Tables <- list(Table6.1, Table6.2, …)
for (Table in Tables) { … }
This way, you do not need to use paste to construct the different Table names.
For accessing the different columns, you can use the df[["column"]] syntax. This is equivalent to df$column, except that inside the brackets you can use any expression that evaluates to a column name, such as a variable. (Single brackets, df["column"], return a one-column data frame rather than the column itself.)
nonTgt_Shr.column.name <- paste0("ST",j,"NonTgt_Shr")
delta.column.name <- paste0("ST",j,"_Delta_PV%")
for (k in 1:nrow(Table)) {
Table[[nonTgt_Shr.column.name]][k] <- Table[[delta.column.name]][k] * …
}
Note how I use variables to store the column names, which makes the line with the actual computation much more readable.
Also, nrow is more intuitive than length(Table[,1]).
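Putting these pieces together, here is a sketch of the list-based approach, assuming two small made-up tables and two prefixes (ST1, ST2) instead of 25 tables and 4 prefixes. Two details matter: a whole column can be multiplied at once, so the loop over k is not needed, and the function passed to lapply() returns the modified Table, because assigning to the loop variable in for (Table in Tables) does not change the copy stored in the list. make_table() is a hypothetical helper that only exists to create test data.

```r
# Hypothetical helper: builds a small table with the same column layout as yours
make_table <- function(seed) {
  set.seed(seed)
  df <- data.frame(matrix(runif(12), nrow = 3))
  names(df) <- c("ST1_Delta_PV%", "ST1_Reallocation_Margin",
                 "ST2_Delta_PV%", "ST2_Reallocation_Margin")
  df
}
Tables <- list(Table6.1 = make_table(1), Table6.2 = make_table(2))

Tables <- lapply(Tables, function(Table) {
  for (j in 1:2) {
    out_col   <- paste0("ST", j, "NonTgt_Shr")
    delta_col <- paste0("ST", j, "_Delta_PV%")
    marg_col  <- paste0("ST", j, "_Reallocation_Margin")
    # Whole-column multiplication: no loop over rows needed
    Table[[out_col]] <- Table[[delta_col]] * Table[[marg_col]]
  }
  Table  # return the modified table so lapply() collects it
})
```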
The calculations could be moved into a function, which improves readability, scalability and robustness.
In the actual calculation function, the function get is used to retrieve the data frame based on the name.
#Calculation Function
fn_CalcVariables <- function(
tableName="Table6.1",
outputVarName="NonTgt_Shr",
inputVarNames=c("_Delta_PV%", "_Reallocation_Margin"),
variablePrefix="ST1"
) {
DF <- get(tableName)
outputVarName <- paste0(variablePrefix, outputVarName)
inputVarNames <- paste0(variablePrefix, inputVarNames)
DF[,outputVarName] <- DF[,inputVarNames[1]] * DF[,inputVarNames[2]]
return(DF)
}
This function should be called from nested lapply calls.
lapply iterates over the elements of its first argument, calls the function (second argument) on each element, and collects the return values in a list.
(As an exercise, try l <- list(a=1, b=2); lapply(l, function(x) { x*2 }).)
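For comparison, a short sketch of the exercise with lapply() and its two simplifying relatives: lapply() always returns a list, sapply() simplifies the result to a vector where it can, and vapply() does the same but requires you to declare the type and length of each result.

```r
l <- list(a = 1, b = 2)

res_l <- lapply(l, function(x) { x * 2 })               # a named list
res_s <- sapply(l, function(x) { x * 2 })               # simplified to a named numeric vector
res_v <- vapply(l, function(x) { x * 2 }, numeric(1))   # same, with the result type checked
```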
#List object names for tables and variable names
tableNamesList <- paste0("Table6.",1:25)
variablePrefixList <- paste0("ST",1:4)
#Nested loops to invoke custom function from above
results <- lapply(variablePrefixList, function(alpha) {
lapply(tableNamesList, function(x, varprefix=alpha) {
cat("Begin Processing Table",x,"varPrefix",varprefix,"\n")
result <- fn_CalcVariables(
tableName=x,
outputVarName="NonTgt_Shr",
inputVarNames=c("_Delta_PV%","_Reallocation_Margin"),
variablePrefix=varprefix
)
cat("End Processing Table", x, "varPrefix", varprefix, "\n")
result #return the data frame; ending with cat() would return NULL
}) #End of inner lapply
}) #End of outer lapply
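One variation, sketched under the assumption that the tables live in the current environment: mget() fetches all tables by name in one call (a vectorised get()), Reduce() applies every prefix to the same table in turn so the new columns accumulate in one data frame (the nested lapply above starts again from the original table for each prefix), and list2env() optionally writes the results back under their original names. The two tables and the add_nontgt() helper are hypothetical.

```r
# Two small hypothetical tables in the current (here: global) environment
Table6.1 <- data.frame(v1 = c(0.1, 0.2), m1 = c(10, 20), v2 = c(0.3, 0.4), m2 = c(1, 2))
names(Table6.1) <- c("ST1_Delta_PV%", "ST1_Reallocation_Margin",
                     "ST2_Delta_PV%", "ST2_Reallocation_Margin")
Table6.2 <- Table6.1

tableNamesList     <- paste0("Table6.", 1:2)
variablePrefixList <- paste0("ST", 1:2)

# Hypothetical helper: adds one <prefix>NonTgt_Shr column and returns the table
add_nontgt <- function(DF, prefix) {
  DF[[paste0(prefix, "NonTgt_Shr")]] <-
    DF[[paste0(prefix, "_Delta_PV%")]] * DF[[paste0(prefix, "_Reallocation_Margin")]]
  DF
}

# Fetch all tables by name as a named list
Tables <- mget(tableNamesList, envir = environment())
# Fold over the prefixes: add_nontgt(add_nontgt(DF, "ST1"), "ST2")
Tables <- lapply(Tables, function(DF) Reduce(add_nontgt, variablePrefixList, init = DF))
# Overwrite Table6.1, Table6.2, ... with the updated versions
invisible(list2env(Tables, envir = environment()))
```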
I am having trouble optimising a piece of R code. The following example code should illustrate my optimisation problem:
Some initialisations and a function definition:
a <- c(10,20,30,40,50,60,70,80)
b <- c("a","b","c","d","z","g","h","r")
c <- c(1,2,3,4,5,6,7,8)
myframe <- data.frame(a,b,c)
columns <- 6
values <- vector(length=columns)
solution <- matrix(nrow=nrow(myframe),ncol=columns+3)
myfunction <- function(frame,columns){
athing = 0
if(columns == 5){
athing = 100
}
else{
athing = 1000
}
values[columns+1] = athing
return(values)}
The problematic for-loop looks like this:
columns = 6
for(i in 1:nrow(myframe)){
values <- myfunction(as.matrix(myframe[i,]), columns)
values[columns+2] = i
values[columns+3] = myframe[i,3]
#more columns added with simple operations (i.e. sum)
solution <- rbind(solution,values)
#solution is a large matrix from outside the for-loop
}
The problem seems to be the rbind function. I frequently get error messages about the size of solution, which becomes too large after a while (more than 50 MB).
I want to replace this loop and the rbind with a list and lapply and/or foreach. I have started by converting myframe to a list.
myframe_list <- lapply(seq_len(nrow(myframe)), function(i) myframe[i,])
I have not got any further than this, although I tried applying this very good introduction to parallel processing.
How do I have to reconstruct the for-loop without having to change myfunction? Obviously I am open to different solutions...
Edit: This problem seems to be straight from the 2nd circle of hell from the R Inferno. Any suggestions?
Using rbind in a loop like this is bad practice because in each iteration you enlarge solution and copy it to a new object, which is a very slow process and can also lead to memory problems. One way around this is to create a list whose ith component stores the output of the ith loop iteration, and then call rbind on that list just once, at the end. This will look something like:
my.list <- vector("list", nrow(myframe))
for(i in 1:nrow(myframe)){
# Call all necessary commands to create values
my.list[[i]] <- values
}
solution <- rbind(solution, do.call(rbind, my.list))
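A self-contained version of this pattern, as a sketch: make_row() is a hypothetical stand-in for myfunction() plus the extra columns your loop adds, rebuilt from the example data so the code runs on its own. The last line produces the same matrix with lapply() instead of an explicit loop, which is what you were aiming for.

```r
myframe <- data.frame(a = c(10, 20, 30, 40, 50, 60, 70, 80),
                      b = c("a", "b", "c", "d", "z", "g", "h", "r"),
                      c = c(1, 2, 3, 4, 5, 6, 7, 8))
columns <- 6

# Hypothetical stand-in: one numeric row of length columns + 3
make_row <- function(i) {
  values <- numeric(columns + 3)
  values[columns + 1] <- if (columns == 5) 100 else 1000
  values[columns + 2] <- i
  values[columns + 3] <- myframe[i, 3]
  values
}

# Preallocate the list, fill it, then bind once
my.list <- vector("list", nrow(myframe))
for (i in seq_len(nrow(myframe))) {
  my.list[[i]] <- make_row(i)
}
solution <- do.call(rbind, my.list)

# Equivalent without an explicit loop
solution2 <- do.call(rbind, lapply(seq_len(nrow(myframe)), make_row))
```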
A bit too long for a comment, so I put it here:
If columns is known in advance:
myfunction <- function(frame){
athing = 0
if(columns == 5){
athing = 100
}
else{
athing = 1000
}
values[columns+1] = athing
return(values)}
apply(myframe, 1, myfunction)
If columns is not available in the calling environment, pass it as an extra argument and keep your original myfunction definition:
apply(myframe, 1, myfunction, columns)
I am using the extract function in a loop. See below.
for (i in 1:length(list_shp_Tanzania)){
LU_Mod2000<- extract(x=rc_Mod2000_LC, y=list_shp_Tanzania[[i]], fun=maj)
}
where the maj function is:
maj <- function(x){
y <- as.numeric(names(which.max(table(x))))
return(y)
}
I was expecting to get one output per iteration, but I only get a single output once the loop is done. Does anybody know what I am doing wrong? Thanks.
One solution in this kind of situation is to create a list and then assign the result of each iteration to the corresponding element of the list:
LU_Mod2000 <- vector("list", length(list_shp_Tanzania))
for (i in 1:length(list_shp_Tanzania)){
LU_Mod2000[[i]] <- extract(x=rc_Mod2000_LC, y=list_shp_Tanzania[[i]], fun=maj)
}
Do not do
LU_Mod2000 <- c(LU_Mod2000, extract(x=rc_Mod2000_LC, y=list_shp_Tanzania[[i]], fun=maj))
inside the loop. This creates unnecessary copies and will take a long time to run. Use the list method and, after the loop, convert the list of results to the desired format (usually with do.call(<some function>, LU_Mod2000), e.g. do.call(rbind, LU_Mod2000)).
Alternatively, you could replace the for loop with lapply, which many people prefer:
LU_Mod2000 <- lapply(list_shp_Tanzania, function(z) extract(x=rc_Mod2000_LC, y=z, fun=maj))
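Since extract() needs the raster data, here is a sketch of the same collect-then-combine pattern using only maj(); zones is a hypothetical list of cell values standing in for what extract() would pass to maj() for each polygon. Because each result here is a single number, unlist() (or vapply()) flattens the list; for results that are vectors of equal length, do.call(rbind, res_list) would build a matrix instead.

```r
maj <- function(x){
  y <- as.numeric(names(which.max(table(x))))
  return(y)
}

# Hypothetical cell values, one vector per polygon
zones <- list(c(1, 1, 2), c(3, 3, 3, 2), c(2, 2, 1))

res_list <- lapply(zones, maj)               # one result per zone, nothing overwritten
res_vec  <- unlist(res_list)                 # flatten: each result is a single number
res_vec2 <- vapply(zones, maj, numeric(1))   # same, with the result type checked
```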