Repeat a line of code in R

How can I repeat a code-line in R?
I have created a function called 'func1' and I want 'data' to run through 'func1' 10 times in a row.
This is what I have now:
data<-func1(data);data<-func1(data);data<-func1(data);data<-func1(data);data<-func1(data);data<-func1(data)
data<-func1(data);data<-func1(data);data<-func1(data);data<-func1(data)
This is what I would like to have:
data<-func1(data,times=10)
Thanks in advance
Jannik

A simple loop would do this:
for(i in 1:10) {
  data <- func1(data)
}

You can write a higher-order function which, given a function f, a seed value s, and an integer n, computes
f(f(...f(s)...))
(with n function evaluations):
iterate <- function(f,s,n){
  if(n == 0){
    s
  }
  else{
    f(iterate(f,s,n-1))
  }
}
Then what you want is data <- iterate(func1,data,10)
You can also write iterate using a loop (in a way similar to the excellent answer by @JamesElderfield), but the recursive approach given above is fairly common in the functional programming paradigm (which is one of R's native paradigms).
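A minimal sketch of the loop-based iterate mentioned above, assuming func1 takes one object and returns an object of the same kind. The func1 defined here (doubling its input) is a hypothetical stand-in for the question's function. Using seq_len(n) rather than 1:n means that n = 0 returns s unchanged, and the loop avoids the deep call nesting the recursive version builds up, which can hit R's limit on nested evaluation for large n.

```r
# Loop-based iterate(): applies f to s, n times in succession.
# seq_len(n) is empty when n == 0, so s is returned unchanged in that case.
iterate <- function(f, s, n) {
  for (i in seq_len(n)) {
    s <- f(s)
  }
  s
}

# Hypothetical stand-in for the question's func1
func1 <- function(data) data * 2

data <- 1
data <- iterate(func1, data, 10)
data  # [1] 1024
```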

Related

Operate on multiple rows of dataframe simultaneously in R

I'm sure someone has asked this (very basic) question before, but I must be searching for the wrong thing because I can't find an answer:
I frequently need to perform operations that involve combining data from multiple rows of the same dataframe. I know how to do this with a looping construct, e.g.
for (i in 2:nrow(df)) { df$result[i] <- df$data[i] - df$data[i-1] }
for (i in 12:nrow(df)) { j <- i - 11; df$result[i] <- prod(df$data[j:i]) }
Is there a general solution for these types of operations that does not involve looping? Or is looping actually the best way to do it in R?
You may try subsetting your data frame; for example, this:
for (i in 2:nrow(df)) { df$result[i] <- df$data[i] - df$data[i-1] }
becomes:
df$result[2:nrow(df)] <- df$data[2:nrow(df)] - df$data[1:(nrow(df)-1)]
Note: nrow() is a function, so it must be called with parentheses, not square brackets.
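In base R:
df$result[2:nrow(df)] = diff(df$data)
df$result2[13:nrow(df)] = diff(df$data,12)
Note that diff(df$data, 12) gives the lag-12 differences df$data[i] - df$data[i-12]; it does not compute the 12-row rolling product from the second loop.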
Or, with dplyr, subtract the lagged values:
df$result = df$data - dplyr::lag(df$data)
df$result2 = df$data - dplyr::lag(df$data, 12)
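For the second loop (a product over a 12-row window), one base R option is embed(), which builds a matrix with one row per complete window so that apply() can take the product of each row. A minimal sketch on hypothetical data, checked against the question's loop:

```r
# Hypothetical example data (14 rows)
df <- data.frame(data = c(2, 1, 3, 1, 2, 1, 1, 2, 1, 3, 1, 2, 1, 2))
n <- nrow(df)
k <- 12  # window length: rows i-11 to i, as in the question's loop

# The question's loop, for comparison
loop.prod <- rep(NA_real_, n)
for (i in k:n) loop.prod[i] <- prod(df$data[(i - k + 1):i])

# embed(x, k) has n - k + 1 rows; row r holds x[r + k - 1], ..., x[r]
win <- embed(df$data, k)
df$result2 <- c(rep(NA, k - 1), apply(win, 1, prod))

all.equal(df$result2, loop.prod)
```

If the zoo package is available, zoo::rollapply(df$data, 12, prod, align = "right", fill = NA) is another way to express the same rolling product.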

Calculating variable values using paste function in R

First let me say that I am not an expert coder and any advice about this particular question or my general technique will be greatly appreciated.
I have a large data set that is made up of similar data frames named Table6.#, such as Table6.1, Table6.2, etc. I have variables in each data frame that repeat as well, such as ST1_Delta_PV%, ST2_Delta_PV%, etc. and ST1_Reallocation_Margin, ST2_Reallocation_Margin, etc.
I am trying to write several nested loops that will calculate values in each table across these similar variables. I have tried to do this with the paste function as shown below, but this is obviously not the correct way to do it.
for (i in 1:25){
for (j in 1:4){
for (k in 1:length(paste("Table6.",i,"sep="")[,1]){
paste("Table6.",i,sep="")$paste("ST",j,"NonTgt_Shr",sep="")[k] <- paste("Table6.",i,sep="")$paste("ST",j,"_Delta_PV%",sep="")[k] * paste("Table6.",i,sep="")$paste("ST",j,"_Reallocation_Margin",sep="")[k]
}
}
}
I apologize if this is a complete mess. I appreciate your help.
As akrun says, you should put your data frames in a list:
Tables <- list(Table6.1, Table6.2, …)
for (Table in Tables) { … }
This way, you do not need to use paste to construct the different Table names.
For accessing the different columns, you can use the df[["column"]] syntax: this is equivalent to df$column, except that inside the brackets you can use any string, including one stored in a variable. (Single brackets, df["column"], return a one-column data frame rather than the column itself.)
nonTgt_Shr.column.name <- paste0("ST",j,"NonTgt_Shr")
delta.column.name <- paste0("ST",j,"_Delta_PV%")
for (k in 1:nrow(Table)) {
  Table[[nonTgt_Shr.column.name]][k] <- Table[[delta.column.name]][k] * …
}
Note how I use variables for storing the name, making the line with the actual computation much more readable.
Also, nrow is more intuitive than length(Table[,1]).
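Looping with for (Table in Tables) hands you a copy of each data frame, so columns added to Table are lost unless they are written back into the list. A minimal sketch, assuming small hypothetical tables that contain only the ST1 and ST2 columns: it loops over list positions, stores each modified table back, and multiplies whole columns at once, which also makes the inner loop over k unnecessary.

```r
# Hypothetical tables with two of the ST prefixes; check.names = FALSE keeps the "%"
make_table <- function(delta, margin) {
  data.frame(`ST1_Delta_PV%` = delta, ST1_Reallocation_Margin = margin,
             `ST2_Delta_PV%` = delta / 2, ST2_Reallocation_Margin = margin,
             check.names = FALSE)
}
Tables <- list(make_table(c(0.1, 0.2), c(10, 20)),
               make_table(c(0.5, 0.5), c(2, 4)))

for (idx in seq_along(Tables)) {
  Table <- Tables[[idx]]                # a copy of the idx-th data frame
  for (j in 1:2) {                      # would be 1:4 with all four prefixes
    out.name    <- paste0("ST", j, "NonTgt_Shr")
    delta.name  <- paste0("ST", j, "_Delta_PV%")
    margin.name <- paste0("ST", j, "_Reallocation_Margin")
    # Whole-column multiplication replaces the loop over rows k
    Table[[out.name]] <- Table[[delta.name]] * Table[[margin.name]]
  }
  Tables[[idx]] <- Table                # store the modified copy back in the list
}
Tables[[1]]
```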
The calculation could be moved into a function, which improves readability, scalability and robustness.
In the actual calculation function, the function get is used to retrieve the data frame based on the name.
#Calculation Function
fn_CalcVariables <- function(
  tableName="Table6.1",
  outputVarName="NonTgt_Shr",
  inputVarNames=c("_Delta_PV%", "_Reallocation_Margin"),
  variablePrefix="ST1"
) {
  DF <- get(tableName)
  outputVarName <- paste0(variablePrefix, outputVarName)
  inputVarNames <- paste0(variablePrefix, inputVarNames)
  DF[,outputVarName] <- DF[,inputVarNames[1]] * DF[,inputVarNames[2]]
  return(DF)
}
This function can be called from nested lapply calls.
lapply iterates over the elements of its first argument (a list or vector), calls the function given as its second argument on each element, and collects the return values in a list.
(As an exercise, try l <- list(a=1, b=2); lapply(l, function(x) { x*2 }).)
#List object names for tables and variable names
tableNamesList <- paste0("Table6.",1:25)
variablePrefixList <- paste0("ST",1:4)
#Nested loops to invoke custom function from above
results <- lapply(variablePrefixList, function(alpha) {
  lapply(tableNamesList, function(x, varprefix=alpha) {
    cat("Begin Processing Table",x,"varPrefix",varprefix,"\n")
    DF <- fn_CalcVariables(
      tableName=x,
      outputVarName="NonTgt_Shr",
      inputVarNames=c("_Delta_PV%","_Reallocation_Margin"),
      variablePrefix=varprefix
    )
    cat("End Processing Table", x, "varPrefix", varprefix, "\n")
    DF #Return the updated data frame; otherwise the NULL from cat() is returned
  }) #End of inner lapply
}) #End of outer lapply
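However the lapply calls above are arranged, fn_CalcVariables returns a modified copy, so Table6.1, Table6.2, ... themselves are never updated, and each prefix is applied to a separate copy. A minimal sketch of another arrangement, assuming hypothetical two-prefix tables: mget() collects the tables into a named list, a small helper (add_nontgt, a hypothetical name) adds the output column for every prefix to one table, and list2env() writes the updated tables back under their original names.

```r
# Hypothetical example tables; the question has 25 tables and four prefixes
Table6.1 <- data.frame(`ST1_Delta_PV%` = 1:3, ST1_Reallocation_Margin = 2,
                       `ST2_Delta_PV%` = 3:1, ST2_Reallocation_Margin = 10,
                       check.names = FALSE)
Table6.2 <- Table6.1[1:2, ]

# Add the NonTgt_Shr column for every prefix to a single data frame
add_nontgt <- function(DF, prefixes) {
  for (p in prefixes) {
    DF[[paste0(p, "NonTgt_Shr")]] <-
      DF[[paste0(p, "_Delta_PV%")]] * DF[[paste0(p, "_Reallocation_Margin")]]
  }
  DF
}

tableNames <- paste0("Table6.", 1:2)
tables <- mget(tableNames, envir = environment())    # named list of data frames
tables <- lapply(tables, add_nontgt, prefixes = paste0("ST", 1:2))
invisible(list2env(tables, envir = environment()))   # overwrite Table6.1, Table6.2
Table6.1
```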

Looping a function over Lists in R

I have a list of samples, each of varying lengths. I need to compare sample means (using a Mann-Whitney-Wilcoxon test) for all samples in the list. Current code is as follows:
wilcox.v = list() ##This creates the list of samples
for (i in df){
treat = list(i$treatment)
wilcox.v = c(wilcox.v,treat)
}
###This *should* iterate over all items in the list
wilcox = sapply(wilcox.v, function(i){ wilcox.test(as.numeric(wilcox.v[i,]), as.numeric(wilcox.v[-i,]), exact = FALSE)$p.value
})
I'd like to have the function return a vector of p-values, so that the broader function can re-sample if necessary.
The problem seems to lie in the need to compare a sample mean to all other sample means in the list.
I'm sure there's an easy way to do this (and I think it has something to do with using indices correctly), but I'm not sure how!
As joran said, your apply function is written a little wonky. There are two ways you can fix this.
Modify it so that i really is an index reference, and pool all the other samples with unlist() (wilcox.v[[-i]] fails as soon as the list holds more than two samples):
wilcox = sapply(seq_along(wilcox.v), function(i) {
  wilcox.test(as.numeric(wilcox.v[[i]]),
              as.numeric(unlist(wilcox.v[-i])), exact = FALSE)$p.value
})
Or modify your function so it appropriately treats i as a list element. I'll leave this as an exercise for you (primarily since I don't want to deal with the wilcox.v[-i,] term).
Thanks for your help! This is the solution I ended up using. It's hardly elegant but it gets the job done.
mannwhit = vector()
for (i in mannwhit.v){
  for (j in mannwhit.v){
    if (identical(i,j) == FALSE){
      p.val = wilcox.test(i, j, paired=FALSE)$p.value
      mannwhit = c(mannwhit, p.val)
    }
  }
}
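If each pair of samples only needs to be tested once, combn() can enumerate the unordered pairs. That avoids growing mannwhit inside the loop and testing every pair twice, as (i, j) and (j, i); note also that identical(i, j) would skip two different samples that happen to contain the same values. A minimal sketch, with simulated samples as hypothetical placeholders for mannwhit.v:

```r
set.seed(42)
# Hypothetical samples of different lengths standing in for mannwhit.v
samples <- list(a = rnorm(8), b = rnorm(12, mean = 1), c = rnorm(10, mean = 0.5))

pairs <- combn(length(samples), 2)  # each column is one unordered pair of indices
p.vals <- apply(pairs, 2, function(ix) {
  wilcox.test(samples[[ix[1]]], samples[[ix[2]]], exact = FALSE)$p.value
})
names(p.vals) <- apply(pairs, 2, function(ix) paste(names(samples)[ix], collapse = "-"))
p.vals
```

Base R's stats package also provides pairwise.wilcox.test(), which takes one response vector plus a grouping factor and can adjust the p-values for multiple comparisons through its p.adjust.method argument.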

apply and updating arguments in R

I have a question regarding R apply (and all its variants). Is there a way to update the arguments of the function while apply is working?
For example, I have a function NextSol(Prev_Sol) that generates a new solution from Prev_Sol, compares it with the original one in some way and then returns either the original or the new, depending on the result of the comparison. I need to save all the solutions returned. Currently, I am doing this:
for( i in 2:N ) {
Results[[i]] <- NextSol(Results[[i-1]])
}
But maybe there is a (faster) way to do it using apply? I have also seen that Reduce could help, but I have no idea how to use it. Any help will be much appreciated!
As Thomas said, the for loop is the standard way of looping when one iteration depends on the previous one. (Just make sure that you correctly handle the case N = 1 in your code: 2:N is then c(2, 1), so the loop runs backwards instead of not running at all.)
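A minimal sketch of that loop with a preallocated Results list and a guard for N = 1, assuming a toy NextSol that adds 1 and a hypothetical initial solution of 0:

```r
NextSol <- function(prev) prev + 1  # hypothetical stand-in for the real NextSol

N <- 5
Results <- vector("list", N)        # preallocate the list instead of growing it
Results[[1]] <- 0                   # hypothetical initial solution

# seq_len(N)[-1] is 2, ..., N and is empty when N == 1 (unlike 2:N, which is c(2, 1))
for (i in seq_len(N)[-1]) {
  Results[[i]] <- NextSol(Results[[i - 1]])
}
unlist(Results)
```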
An alternative is to use the Reduce function. This example is adapted from the one on the ?Reduce help page.
NextSol <- function(x) x + 1 #Or whatever you want
Funcall <- function(f, ...) f(...)
Reduce(Funcall, rep.int(list(NextSol), 5), 0, right = TRUE)
## [1] 5
It's unlikely that this will be much faster, and it's arguably harder to read, so you may well decide to stick with a for loop.
Well, I suppose we can make it easier to read by wrapping it in an Iterate function.
Iterate <- function(f, init, n)
{
  Reduce(
    function(f, ...) f(...),
    rep.int(list(f), n),
    init,
    right = TRUE
  )
}
Iterate(NextSol, 0, 5) #same as before
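Since the question needs every solution, not just the last one, Reduce() with accumulate = TRUE may fit better: it returns init followed by each successive result. A minimal sketch, with IterateAll as a hypothetical helper name and the same toy NextSol as above:

```r
NextSol <- function(x) x + 1  # same toy NextSol as above

# Hypothetical helper: returns init followed by N - 1 successive solutions.
# The elements of seq_len(N - 1) are ignored; they only set how many steps run.
IterateAll <- function(f, init, N) {
  Reduce(function(prev, i) f(prev), seq_len(N - 1), init, accumulate = TRUE)
}

IterateAll(NextSol, 0, 5)  # [1] 0 1 2 3 4
```

When every result has length one, Reduce() simplifies the accumulated results to a vector, as here; when each solution is a larger object, the result stays a list, matching Results in the loop. For N = 1, Reduce() returns init itself rather than a list.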

R: Confusion with apply() vs for loop

I know that I should avoid for-loops, but I'm not exactly sure how to do what I want to do with an apply function.
Here is a slightly simplified model of what I'm trying to do. Essentially, I have a big matrix of predictors and I want to run a regression using a window of 5 predictors on each side of the indexed predictor (i in the case of a for loop). With a for loop, I can just say something like:
results<-NULL
window<-5
for(i in 1:ncol(g))
{
  first<-i-window #Set window boundaries
  if(first<1){
    1->first
  }
  last<-i+window-1
  if(last>ncol(g)){
    ncol(g)->last
  }
  predictors<-g[,first:last]
  #Do regression stuff and return some result
  results[i]<-regression stuff
}
Is there a good way to do this with an apply function? My problem is that the vector that apply would be shoving into the function really doesn't matter. All that matters is the index.
This question touches several points that are made in 'The R Inferno' http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
There are some loops you should avoid, but not all of them. And using an apply function hides the loop more than it avoids it. This example seems like a good choice to leave in a 'for' loop.
Growing objects is generally bad form -- it can be extremely inefficient in some cases. If you are going to have a blanket rule, then "not growing objects" is a better one than "avoid loops".
You can create a list with the final length up front by:
result <- vector("list", ncol(g))
for(i in 1:ncol(g)) {
  # stuff
  result[[i]] <- #results
}
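In some circumstances you might intend the command:
window<-5
to mean "give me a logical vector stating which values of 'window' are less than -5", but R parses it as the assignment window <- 5.
Spaces are good to use mostly so as not to confuse humans, but in a case like the one directly above they also keep R from reading something other than what you meant: window < -5 versus window <- 5.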
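A quick illustration with a hypothetical vector x, showing that the same characters mean a comparison or an assignment depending only on the spacing:

```r
x <- c(-10, 0, 10)
below <- x < -5  # with spaces: a comparison, giving TRUE FALSE FALSE
x<-5             # without spaces: parsed as the assignment x <- 5
x                # x has been overwritten and is now just 5
```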
Using an apply function to do your regression is mostly a matter of preference in this case; it can handle some of the bookkeeping for you (and so possibly prevent errors) but won't speed up the code.
I would suggest using vectorized functions to compute your firsts and lasts, though; perhaps something like:
window <- 5
ng <- 15 #or ncol(g)
xy <- data.frame(first = pmax( (1:ng) - window, 1 ),
last = pmin( (1:ng) + window, ng) )
Or be even smarter with
xy <- data.frame(first= c(rep(1, window), 1:(ng-window) ),
last = c((window+1):ng, rep(ng, window)) )
Then you could use this in a for loop like this:
results <- vector("list", nrow(xy))
for(i in 1:nrow(xy)) {
  results[[i]] <- xy$first[i] : xy$last[i]
}
results
or with lapply like this:
results <- lapply(1:nrow(xy), function(i) {
  xy$first[i] : xy$last[i]
})
where in both cases I just return the sequence between first and last; you would substitute your actual regression code.
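A minimal sketch of plugging a regression into that pattern, assuming a hypothetical simulated predictor matrix g and response y, with R-squared standing in for whatever "regression stuff" returns. mapply() walks the vectorized first and last boundaries in parallel, so only the boundaries, not a column of g, are passed to the function:

```r
set.seed(1)
n.obs <- 50
g <- matrix(rnorm(n.obs * 15), nrow = n.obs)  # hypothetical predictor matrix
y <- rnorm(n.obs)                             # hypothetical response
window <- 5

idx <- seq_len(ncol(g))
first <- pmax(idx - window, 1)                # vectorized window boundaries
last <- pmin(idx + window, ncol(g))

# For each predictor i, regress y on columns first[i]:last[i] and keep R-squared
r2 <- mapply(function(f, l) {
  predictors <- g[, f:l, drop = FALSE]
  summary(lm(y ~ predictors))$r.squared
}, first, last)
r2
```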
