R apply() function issue with custom function included - r

I have coded following function:
one_way_anova <- function(m, n, sample_means, sample_vars) {
keskiarvo = 1/m*sum(sample_means)
otosv = (sum((sample_means-keskiarvo)^2))/(m-1)
TS = (n*otosv)/(sum(sample_vars)/m)
parvo = 1-pf(TS, m-1, m*(n-1))
return(parvo)
}
And using following data:
set.seed(1)
dat <- matrix(rnorm(300*20), nrow=300)
sample_means <- matrix(rowMeans(dat), nrow=100)
sample_vars <- matrix(apply(dat, 1, var), nrow=100)
m <- nrow(sample_means)
n <- ncol(sample_means)
Now I try to use apply -function to calculate "parvo" with my function one_way_anova for dataset sample_means by individual rows with three samples (matrix is 100x3).
apply(sample_means, 1, one_way_anova)
Which gives following error
Error in FUN(newX[, i], ...) : argument "sample_means" is missing, with no default

Since your function one_way_anova needs multiple arguments, you need to pass all other arguments besides sample_means if you used apply.
If you want to run it over rows in sample_means and sample_vars, maybe you can try sapply like below
sapply(1:m,function(k) one_way_anova(m,n,sample_means[k,],sample_vars[k,]))

Related

Apply function to dataset when function calls from two sources

I have a function that I want to apply to a dataset, but the function also uses global variables as arguments as these variables are needed elsewhere.
With this reduced example I want to apply 'pterotest' to the rows of 'data'. This test case works when the function is given V as a vector, and M and g as a single value.
df<- data.frame(matrix(ncol = 1, nrow = 3))
row.names(df) <- c("Apsaravis_ukhaana", "Jeholornis_prima", "Changchengornis_hengdaoziensis")
colnames(df) <- "M"
mass_var <- c(0.1840000, 1.6910946, 0.0858997)
df$M <- mass_var
V <- seq(0.25,30, by = 0.05)
g <- 9.81
pterotest <- function(V, M, g) {
out1 <- M*g
out2 <- V*M
return(list(V, out1, out2))
}
apply(df,1,pterotest, M = "M", g = g, V = V)
However, all I get is an error of the form:
Error in match.fun(FUN) : '1' is not a function, character or symbol
EDIT: Turning this on it's head, what I could do would be to run a loop over each row, using the multiple columns as different arguments to the function, but with a 4.2M line dataset I feel vectorising might be quicker...

R Convert loop into function

I would like to clean up my code a bit and start to use more functions for my everyday computations (where I would normally use for loops). I have an example of a for loop that I would like to make into a function. The problem I am having is in how to step through the constraint vectors without a loop. Here's what I mean;
## represents spectral data
set.seed(11)
df <- data.frame(Sample = 1:100, replicate(1000, sample(0:1000, 100, rep = TRUE)))
## feature ranges by column number
frm <- c(438,563,953,963)
to <- c(548,803,1000,993)
nm <- c("WL890", "WL1080", "WL1400", "WL1375")
WL.ps <- list()
for (i in 1:length(frm)){
## finds the minimum value within the range constraints and returns the corresponding column name
WL <- colnames(df[frm[i]:to[i]])[apply(df[frm[i]:to[i]],1,which.min)]
WL.ps[[i]] <- WL
}
new.df <- data.frame(WL.ps)
colnames(new.df) <- nm
The part where I iterate through the 'frm' and 'to' vector values is what I'm having trouble with. How does one go from frm[1] to frm[2].. so-on in a function (apply or otherwise)?
Any advice would be greatly appreciated.
Thank you.
You could write a function which returns column name of minimum value in each row for a particular range of columns. I have used max.col instead of apply(df, 1, which.min) to get minimum value in a row since max.col would be efficient compared to apply.
apply_fun <- function(data, x, y) {
cols <- x:y
names(data[cols])[max.col(-data[cols])]
}
Apply this function using Map :
WL.ps <- Map(apply_fun, frm, to, MoreArgs = list(data = df))

R function ambigous call

I'm working on a project, trying to convert an R function to CUDA C++, but I can't understand some R function call, I'm really new to R and I can't find what I'm really looking after. To be exactly, this is the main R function code:
for (i in 1:ncy) {
res <- apply(allsubset, 2, banddepthforonecurve, xdata=x, ydata=y[,i], tau=tau, use=use)
depth[i] <- sum(res[1,])
localdepth[i] <- sum(res[2,])
}
The part that I can't really understand is "banddepthforonecurve" function call, this is the "banddepthforonecurve" function code:
banddepthforonecurve <- function(x, xdata, ydata, tau, use) {
envsup <- apply(xdata[,x], 1, max)
envinf <- apply(xdata[,x], 1, min)
inenvsup <- ydata <= envsup
inenvinf <- ydata >= envinf
depth <- all(inenvsup) & all(inenvinf)
localdepth <- depth & use(envsup-envinf) <= tau
res <- c(depth,localdepth)
return(res)
}
When it is called in:
res <- apply(allsubset, 2, banddepthforonecurve, xdata=x, ydata=y[,i], tau=tau, use=use)
I don't really get what it set for the first parameter "x" of the "banddepthforonecurve", I supposed its like banddepthforonecurve(i, xdata=x, ydata=y[,i], tau = tau, use=use)
but if I try to run it separately on R studio to try to understand it better I get:
apply(xdata[, x], 1, max) : dim(X) must have a positive length
Why when I compile the whole R project there isn't this error? What it set for the "x" parameter when called in the "res <- apply(...)"? I hope I was clear, sorry for my bad english, Thank you in advance !
# This apply function
res = apply(X = input, MAR = 2, FUN = foo, ...)
# is essentially syntactical sugar for this:
res = list()
for(i in 1:ncol(X)) {
res[[i]] = foo(X[, i], ...)
}
# plus an attempt simplify `res` (e.g., to a matrix or vector)
So in your line:
apply(allsubset, 2, banddepthforonecurve, xdata=x, ydata=y[,i], tau=tau, use=use)
In a single iteration of your for loop, the first parameter of banddepthforonecurve (x) will be allubset[, 1], then allsubset[, 2], ..., allsubset[, ncol(allsubset)].
The xdata parameter is always x, the tau and use parameters are always tau and use, and the for loop iterates over the columns of y to use as the ydata argument. You can think of it as a nested loop, for each column of y, use it as ydata and (via apply) iterate over all columns of allsubset.
(If the MAR argument of apply was 1, then it would iterate over rows instead of columns.)

How to sum over range and calculate a series in R?

Here is the formula which I am trying to calculate in R.
So far, this is my approach using a simplified example
t <- seq(1, 2, 0.1)
expk <- function(k){exp(-2*pi*1i*t*k)}
set.seed(123)
dat <- ts(rnorm(100), start = c(1994,3), frequency = 12)
arfit <- ar(dat, order = 4, aic = FALSE) # represent \phi in the formula
tmp1 <- numeric(4)
for (i in seq_along(arfit$ar)){
ek <- expk(i)
arphi <- arfit$ar[i]
tmp1[i] <- ek * arphi
}
tmp2 <- sum(tmp1)
denom = abs(1-tmp2)^2
s2 <- t/denom
Error : Warning message:
In tmp1[i] <- ek * arphi :
number of items to replace is not a multiple of replacement length
I was trying to avoid using for loop and tried using sapply as in solutions to this question.
denom2 <- abs(1- sapply(seq_along(arfit$ar), function(x)sum(arfit$ar[x]*expf(x))))^2
but doesnt seem to be correct. The problem is to do the sum of the series(over index k) when it is taking values from another vector as well, in this case, t which is in the numerator.
Any solutions ?
Any suggestion for a test dataset, maybe using 0 and 1 to check if the calculation is done correctly in this loop here ?
Typing up the answer determined in chat. Here's a solution involving vapply.
First correct expk to:
expk <- function(k){sum(exp(-2*pi*1i*t*k))}
Then you can create this function and vapply it:
myFun <- function(i) return(expk(i) * arfit$ar[i])
tmp2 <- sum(vapply(seq_along(arfit$ar), myFun, complex(1)))

Creating a loop that calculates the rolling mean of a vector for different rolling mean lengths

I am trying to create a "for loop" setup that is going calculate different rolling means of a return series, where I use rolling means ranging from the last 2 observations to the last 16 observations. kϵ[2,16]. I've been trying to use a function like this, where the "rollmean" is a function from zoo. This produces the warning "Warning message:
In roll[i] <- rollmean(x, i) :
number of items to replace is not a multiple of replacement length"
Can someone please help me?
rollk <- function(x, kfrom= 2, kto=16){
roll <- as.list(kto-kfrom+1)
for (i in kfrom:kto){
roll[i]<- rollmean(x, i)
return(roll)
}}
I suppose you want
# library(zoo)
rollk <- function(x, kfrom = 2, kto = 16){
roll <- list()
ft <- kfrom:kto
for (i in seq_along(ft)){
roll[[i]]<- rollmean(x, ft[i])
}
return(roll)
}
There are several problems in your function:
You need [[ to access a single list element, not [.
You want a list of length length(krom:kto). Now, i starts at 1, not at kfrom.
Now, roll is returned after the for loop. Hence, the function returns a single list containing all values.
A shorter equivalent of the function above:
rollk2 <- function(x, kfrom = 2, kto = 16)
lapply(seq(kfrom, kto), function(i) na.omit(filter(x, 1 / rep(i, i))))
It does not require loading additional packages.
Try this:
library(zoo)
lapply(2:16, rollmean, x = x)

Resources