Function not finding an argument when vectorized in R? - r

I'm wondering why my vectorized function below works fine when I call cii(peta = c(.3, .4), N = 120, df1 = 3, df2 = 116), but fails with an error that "peta" is missing when I call cii(F.value = c(30, 40), N = 120, df1 = 3, df2 = 116).
P.S. I have made it clear in my function that when F.value is NA the function should use peta, and otherwise use F.value. So why am I getting the "peta" is missing error?
cii <- function(peta, F.value = NA, N, df1, df2, conf.level = .9){ # Upper-Level FUNCTION
ci <- Vectorize(function(peta, F.value, N, df1, df2, conf.level){ # Lower-Level FUNCTION
options(warn = -1)
q = ifelse(is.na(F.value), (-peta * df2) / ((peta * df1) - df1), F.value)
alpha = (1 - conf.level)/2
f <- function (ncp, alpha, q, df1, df2) {
abs(suppressWarnings(pf(q = q, df1 = df1, df2 = df2, ncp, lower.tail = FALSE)) - alpha)}
I <- sapply(c(alpha, 1-alpha), function(x) optimize(f, interval = c(-30, 30), alpha = x, q = q, df1 = df1, df2 = df2)[[1]])
round(data.frame(lower = I[1], upper = I[2], conf.level = conf.level, F.value = q), 6)
})
data.frame(t(ci(peta = peta, F.value = F.value, N = N, df1 = df1, df2 = df2, conf.level = conf.level)))
}
### TWO EXAMPLES OF USE: ###
cii(F.value = c(30, 40), N = 120, df1 = 3, df2 = 116) # Gives Error!!!
cii(peta = c(.3, .4), N = 120, df1 = 3, df2 = 116) # Works Fine !!!

Your problem is that peta has no default value in cii(), and your call does not supply one. You have a few options:
1. Call cii() with peta = NA
cii(peta = NA, F.value = c(30, 40), N = 120, df1 = 3, df2 = 116)
# Output:
lower upper conf.level F.value
1 29.99996 29.99996 0.9 30
2 29.99996 29.99996 0.9 40
2. Re-write cii() to assign a default value to peta
e.g.
cii <- function(peta = c(.3, .4), F.value = NA, N, df1, df2, conf.level = .9){ ... }
cii(F.value = c(30, 40), N = 120, df1 = 3, df2 = 116)
# Output
lower upper conf.level F.value
1 29.99996 29.99996 0.9 30
2 29.99996 29.99996 0.9 40
As for this case:
cii(peta = c(.3, .4), N = 120, df1 = 3, df2 = 116)
You don't get an error because in your function you've already assigned a default value of NA to F.value.
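A more neutral variant of option 2 is to give peta the same NA default that F.value already has. The sketch below only illustrates that idea and is not your full cii(): pick_value() and pick_vec() are made-up names, and the inner function simply returns whichever input was supplied.

```r
# Hypothetical illustration: both arguments default to NA, so either one
# can be omitted by the caller without triggering a "missing" error.
pick_value <- function(a = NA, b = NA) {
  # Inner vectorized function: use b when it is available, otherwise a.
  pick_vec <- Vectorize(function(a, b) if (is.na(b)) a else b)
  pick_vec(a = a, b = b)
}

from_b <- pick_value(b = c(30, 40))   # a falls back to its NA default
from_a <- pick_value(a = c(.3, .4))   # b falls back to its NA default
print(from_b)
print(from_a)
```

Because both defaults exist, the order in which the caller names the arguments no longer matters.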

BTW, although Marcus's solution works, your error occurs because you are using Vectorize.
R evaluates arguments lazily, so you can call a function with a single argument as long as the second argument is never needed during evaluation:
myfun <- function(a, b) {
ifelse(a==1, print(a), print(b))
if (a==1) { print(a) } else { print(b) }
}
myfun(1)
# [1] 1
# [1] 1
myfun(2)
# Error in print(b) : argument "b" is missing, with no default
This is also true for a nested (internal) function:
myfun <- function(a, b) {
internalfun <- function(a, b) { ifelse(a==3, print(a), print(b)) }
if (a == 1) { print(a) } else { print(b) }
internalfun(a = 3)
}
myfun(1)
# [1] 1
# [1] 3
# [1] 3
# Note: `3` appears twice because print(a) prints it once, and the value
# returned by myfun() is then auto-printed at top level
This still works when you explicitly name the arguments when calling the nested function
myfun <- function(a, b) {
internalfun <- function(a, b) { ifelse(a==3, print(a), print(b)) }
if (a == 1) { print(a) } else { print(b) }
internalfun(a = 3, b = b)
}
myfun(1)
# [1] 1
# [1] 3
# [1] 3
But it fails when using Vectorize
myfun <- function(a, b) {
internalfun <- Vectorize(function(a, b) { ifelse(a==3, print(a), print(b)) })
if (a == 1) { print(a) } else { print(b) }
internalfun(a = 3, b = b)
}
myfun(1)
# [1] 1
# Error in FUN(X[[i]], ...) : argument "b" is missing, with no default
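The reason is that the wrapper returned by Vectorize evaluates every argument you pass to it up front, before handing them on to mapply. Lazy evaluation therefore no longer protects the missing b: the call internalfun(a = 3, b = b) forces b immediately, which raises the error.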
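The difference can be seen in isolation with a minimal sketch (lazy_fun, outer_plain and outer_vec are hypothetical names used only here). The plain version never touches b, while the vectorized version fails as soon as it is called:

```r
# A function that only needs b when a is not 1.
lazy_fun <- function(a, b) if (a == 1) "a only" else b
vec_fun  <- Vectorize(lazy_fun)

# Both wrappers pass their own (possibly missing) b straight through.
outer_plain <- function(a, b) lazy_fun(a = a, b = b)
outer_vec   <- function(a, b) vec_fun(a = a, b = b)

plain_result <- outer_plain(1)   # b is never evaluated, so no error
print(plain_result)

# The vectorized wrapper evaluates b before calling lazy_fun, so it errors.
vec_result <- tryCatch(outer_vec(1), error = function(e) e)
print(conditionMessage(vec_result))
```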

Related

Change Error Message to An Instruction for Users

When I run this R code, the first function call below fails with Error in order(res2$seed): argument 1 is not a vector. When I change the range of i, as in the second function call, I get the data frame I expect.
The Function
abc <- function(a, z, n, ar11, p, d, q, sd = sd, j1, arr1, n_cores){
future::plan(future::multisession)
n_cores <- parallel::detectCores()
cl <- parallel::makeCluster(n_cores)
doParallel::registerDoParallel(cores = n_cores)
message('processing...')
`%dopar%` <- foreach::`%dopar%`
i <- a:z
res <- foreach::foreach(i = a:z, .packages = c('foreach', 'forecast')) %dopar% {
set.seed(i)
mod <- stats::arima.sim(n = n, model = list(ar = c(ar11), order = c(p, d, q)), sd = sd)
best.mod <- forecast::auto.arima(mod, ic = "aicc")
(cf <- best.mod$coef)
if (length(cf) == 0) {
rep(NA, 2)
} else if (all(grepl(c("ar1|intercept"), names(cf))) &
substr(cf["ar1"], 1, j1) %in% arr1) {
c(cf, seed = i)
} else {
rep(NA, 2)
}
}
message(' done!\n')
res1 = res[!sapply(res, anyNA)]
parallel::stopCluster(cl)
options(max.print = .Machine$integer.max)
res2 <- tibble::tibble(Reduce(function(...) merge(..., all = T), lapply(res1, function(x) as.data.frame(t(x)))))
res2[order(res2$seed), ]
res2 <- Reduce(function(...) merge(..., all = T), lapply(res1, function(x) as.data.frame(t(x))))
res2[order(res2$seed), ]
}
Call Function at First Instance
abc(a = 280000, z = 281000, n = 10, p = 1, d = 0, q = 0, ar11 = 0.8, sd = 1, j1 = 4, arr1 = "0.80")
#Error in order(res2$seed) : argument 1 is not a vector
Call Function at Second Instance
abc(a = 289800, z = 289989, n = 10, p = 1, d = 0, q = 0, ar11 = 0.8, sd = 1, j1 = 4, arr1 = "0.80")
#ar1 seed
#1 0.8000000 289805
#2 0.8000368 289989
When this happens, I want to replace Error in order(res2$seed): argument 1 is not a vector with an instruction telling users of this function to "Try another range of seeds".
You can either look before you leap by testing if the seed column exists:
abc <- function(a, z, n, ar11, p, d, q, sd = sd, j1, arr1, n_cores){
# ...code as in OP...
res2 <- tibble::tibble(Reduce(function(...) merge(..., all = T), lapply(res1, function(x) as.data.frame(t(x)))))
if (!("seed" %in% colnames(res2))) {
warning("Try another range of seeds", call. = FALSE)
} else {
res2[order(res2$seed), ]
}
}
abc(a = 280000, z = 281000, n = 10, p = 1, d = 0, q = 0, ar11 = 0.8, sd = 1, j1 = 4, arr1 = "0.80")
# processing...
# done!
#
# Warning message:
# Try another range of seeds
Or ask for forgiveness instead of permission using tryCatch() and suppressWarnings() for a slightly more generic approach:
abc <- function(a, z, n, ar11, p, d, q, sd = sd, j1, arr1, n_cores){
# ...code as in OP...
res2 <- tibble::tibble(Reduce(function(...) merge(..., all = T), lapply(res1, function(x) as.data.frame(t(x)))))
tryCatch(
suppressWarnings(res2[order(res2$seed), ]),
error = \(err) {
if (grepl("argument 1 is not a vector", err$message)) {
warning("Try another range of seeds", call. = FALSE)
} else {
stop(err)
}
}
)
}
abc(a = 280000, z = 281000, n = 10, p = 1, d = 0, q = 0, ar11 = 0.8, sd = 1, j1 = 4, arr1 = "0.80")
# processing...
# done!
#
# Warning message:
# Try another range of seeds
That said, it’s better in my opinion to throw an error than a warning when a function doesn’t return the expected output. Especially if other code will depend on the result of this function. You can throw an error with your desired message by replacing warning() with stop().
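As a sketch of that last suggestion, here is the check pulled out into a small helper that uses stop(). The name order_by_seed() and the toy data frames are assumptions for illustration; in abc() the check would sit at the end of the function body.

```r
# Hypothetical helper: fail with a clear instruction when there is no
# seed column, otherwise return the rows ordered by seed.
order_by_seed <- function(res2) {
  if (!("seed" %in% colnames(res2))) {
    stop("Try another range of seeds", call. = FALSE)
  }
  res2[order(res2$seed), , drop = FALSE]
}

# Toy inputs: one with no usable results, one with two matching seeds.
no_hits <- data.frame()
hits    <- data.frame(ar1 = c(0.8000368, 0.8), seed = c(289989, 289805))

msg <- tryCatch(order_by_seed(no_hits), error = function(e) conditionMessage(e))
print(msg)
sorted <- order_by_seed(hits)
print(sorted)
```

Callers that want to recover from the failure can still wrap the call in tryCatch(), as done above.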

Using app function from {terra} package on raster stacks? (in parallel)

I have four high resolution rasters for a country. I have split each raster into tiles and done some other processing to them. I now want to apply a function to each cell, of each 'stack' of the raster tiles, to produce one set of output tiles. The function is a little complex. I have tried to synthesise some data below to reproduce my current approach. It works (ish) but I'm convinced that there's a better way to do this. To use parallel processing on my unix box, I simply swap mapply for mcmapply, but I haven't done that in the example below as I presume many will be working on Windows machines. I'd welcome ideas on my approach and particularly optimisation.
library("terra")
library("glue")
## Make some toy data
dir.create("temp_folder")
dir.create("result_folder")
x <- rast(ncols = 10, nrows = 10)
a <- rast(ncol = 100, nrow = 100)
some_values <- as.integer(runif(10000, min = 1, max = 100))
ind <- which(some_values %in% sample(some_values, 15))
some_values[ind] <- NA
values(a) <- some_values
a_tiles <- makeTiles(a, x, glue("temp_folder/tile_a_{1:100}.tif"), overwrite = TRUE)
b <- rast(ncol = 100, nrow = 100)
some_values <- as.integer(runif(10000, min = 1, max = 100))
ind <- which(some_values %in% sample(some_values, 15))
some_values[ind] <- NA
values(b) <- some_values
b_tiles <- makeTiles(b, x, glue("temp_folder/tile_b_{1:100}.tif"), overwrite = TRUE)
c <-rast(ncol = 100, nrow = 100)
some_values <- as.integer(runif(10000, min = 1, max = 100))
ind <- which(some_values %in% sample(some_values, 15))
some_values[ind] <- NA
values(c) <- some_values
c_tiles <- makeTiles(c, x, glue("temp_folder/tile_c_{1:100}.tif"), overwrite = TRUE)
d <- rast(ncol = 100, nrow = 100)
some_values <- as.integer(runif(10000, min = 1, max = 100))
ind <- which(some_values %in% sample(some_values, 15))
some_values[ind] <- NA
values(d) <- some_values
d_tiles <- makeTiles(d, x, glue("temp_folder/tile_d_{1:100}.tif"), overwrite = TRUE)
## Outer function so that this can be used in parallel ? But maybe this is a silly way to do it?
outer_function <- function(a_tiles, b_tiles, c_tiles, d_tiles, output_files) {
one_a_tile <- rast(unlist(a_tiles))
one_b_tile <- rast(unlist(b_tiles))
one_c_tile <- rast(unlist(c_tiles))
one_d_tile <- rast(unlist(d_tiles))
output_file <- output_files
# I replace any NAs with 0 as an NA will break my 'if' statement of the inner_function.
# I get Error in if (z["a"] <= z["b"]) { : missing value where TRUE/FALSE needed
one_a_tile[is.na(one_a_tile)] <- 0
one_b_tile[is.na(one_b_tile)] <- 0
one_c_tile[is.na(one_c_tile)] <- 0
one_d_tile[is.na(one_d_tile)] <- 0
z <- sds(one_a_tile, one_b_tile, one_c_tile, one_d_tile)
## Inner function that actually does the work I want doing
inner_function <- function(z) {
names(z) <- c('a', 'b', 'c', 'd')
if (z['a'] <= z['b']) {
y <- rowSums(cbind((z['c'] + z['a'] * 10),
(z['c'] + z['a'] * 20)))
}
if (z['a'] >= z['b']) {
y <- rowSums(cbind((z['c'] + z['a'] * 40),
(z['c'] + z['a'] * 10)))
}
if (z['a'] == z['b']) {
y <- rowSums(cbind((z['c'] + z['a'] * 60),
(z['c'] + z['a'] * 10)))
}
y <- ifelse(y == 0, NA, y)
return(y)
}
app(z,
inner_function,
filename = output_file,
overwrite = TRUE,
wopt = list(datatype = "INT4U"))
return(output_file)
}
results <- mapply(outer_function,
a_tiles = a_tiles,
b_tiles = b_tiles,
c_tiles = c_tiles,
d_tiles = d_tiles,
output_files = output_files <- glue("result_folder/result_tile_{1:length(d_tiles)}.tif"))
names(results) <- NULL
unlink("temp_folder", recursive = TRUE)
unlink("result_folder", recursive = TRUE)
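One possible direction for the NA problem mentioned in the code comments, offered as a base-R sketch rather than a tested terra solution: replace the scalar if statements with vectorised ifelse(), which passes NA through instead of raising an error. The function name cell_value() is made up, the branch multipliers (30, 50, 70) are what the three if blocks in inner_function reduce to when applied in order, and d is left out because inner_function never uses it. Note that this propagates NA instead of treating it as 0, so it is not a drop-in replacement.

```r
# Hypothetical vectorised version of the per-cell rule.
# a < b  -> (c + a*10) + (c + a*20) = 2c + 30a
# a > b  -> (c + a*40) + (c + a*10) = 2c + 50a
# a == b -> (c + a*60) + (c + a*10) = 2c + 70a  (last if block wins)
cell_value <- function(a, b, c) {
  mult <- ifelse(a < b, 30, ifelse(a > b, 50, 70))
  y <- 2 * c + a * mult
  ifelse(y == 0, NA, y)   # zero results become NA, as in the original
}

# Plain numeric vectors standing in for cell values, including one NA.
out <- cell_value(a = c(1, 5, 3, NA), b = c(2, 4, 3, 1), c = c(1, 1, 1, 1))
print(out)
```

Because the function works on whole vectors, it could in principle be applied to many cells at once instead of one cell at a time.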

Why does lapply work when apply doesn't?

My data:
df_1 <- data.frame(
x = replicate(
n = 3,
expr = runif(n = 30, min = 20, max = 100)
),
y = sample(
x = 1:3, size = 30, replace = TRUE
)
)
The following code with lapply works:
lapply(X = names(df_1)[c(1:3)], FUN = function(x) {
pairwise.t.test(
x = df_1[, x],
g = df_1[['y']],
p.adj = 'bonferroni'
)
})
But the same thing with apply doesn't:
apply(X = names(df_1)[c(1:3)], MARGIN = 2, FUN = function(x) {
pairwise.t.test(
x = df_1[, x],
g = df_1[['y']],
p.adj = 'bonferroni'
)
})
Error in apply(X = names(df_1)[c(1:3)], MARGIN = 2, FUN = function(x) { :
dim(X) must have a positive length
Why does this fail? Are they not equivalent?
For apply you should instead use
apply(X = df_1[1:3], MARGIN = 2, FUN = function(x) {
pairwise.t.test(
x = x,
g = df_1[['y']],
p.adj = 'bonferroni'
)
})
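That is because apply() works over the margins of an array or matrix, so X must have dimensions. From ?apply:
apply returns a vector if MARGIN has length 1 and an array of dimension dim(X)[MARGIN] otherwise.
In your attempt you pass names(df_1)[c(1:3)] to apply. That is a plain character vector, so it has no dim attribute: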
dim(names(df_1)[c(1:3)])[2]
#NULL
Hence, you get the error.
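The difference between the two inputs can be checked directly. This is a small sketch with a simpler FUN (mean) standing in for pairwise.t.test, and df, by_apply and by_lapply are names chosen for the example:

```r
set.seed(1)
df <- data.frame(
  x = replicate(n = 3, expr = runif(n = 30, min = 20, max = 100)),
  y = sample(x = 1:3, size = 30, replace = TRUE)
)

# A character vector of names has no dimensions, so apply() cannot use it.
print(dim(names(df)[1:3]))   # NULL
# The data frame columns themselves do have dimensions.
print(dim(df[1:3]))          # 30 rows, 3 columns

# apply() iterates over the columns; lapply() iterates over the names.
by_apply  <- apply(X = df[1:3], MARGIN = 2, FUN = mean)
by_lapply <- unlist(lapply(names(df)[1:3], function(nm) mean(df[, nm])))
print(by_apply)
print(by_lapply)
```

Both calls give the same numbers; they differ only in what they iterate over.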

How to execute pairwise.t.test on a list with a `for` loop?

My list (lt):
df_1 <- data.frame(
x = replicate(
n = 2,
expr = runif(n = 30, min = 20, max = 100)
),
y = sample(
x = 1:3, size = 30, replace = TRUE
)
)
lt <- split(
x = df_1,
f = df_1[['y']]
)
vars <- names(df_1)[1:2]
I tried:
for (i in vars) {
for (i in i) {
print(pairwise.t.test(x = lt[, i], g = lt[['y']], p.adj = 'bonferroni'))
}
}
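But the error message is:
Error in lt[, i] : incorrect number of dimensions
What's the problem?
lt is a list of data frames, not a data frame, so it has no rows and columns and lt[, i] fails. We don't need to split at all: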
pairwise.t.test(unlist(df_1[1:2]), g = rep(df_1$y, 2), p.adj = 'bonferroni')
#Pairwise comparisons using t tests with pooled SD
#data: unlist(df_1[1:2]) and rep(df_1$y, 2)
# 1 2
#2 1.00 -
#3 0.91 1.00
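If you do want one test per variable, as in your loop, a sketch along these lines loops over the column names of the unsplit data frame. The grouping column is built with rep() instead of sample() here only so the example is reproducible, and results is a name chosen for the example.

```r
set.seed(42)
df_1 <- data.frame(
  x = replicate(n = 2, expr = runif(n = 30, min = 20, max = 100)),
  y = rep(1:3, times = 10)   # assumption: fixed groups for reproducibility
)
vars <- names(df_1)[1:2]

# One pairwise t test per variable, stored in a named list.
results <- list()
for (v in vars) {
  results[[v]] <- pairwise.t.test(
    x = df_1[[v]],
    g = df_1[["y"]],
    p.adjust.method = "bonferroni"
  )
}
print(results)
```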

Is it possible to `for` loop the `sapply` in R?

I was wondering why my object CI doesn't correctly return the full (11 paired answers) outputs from the for() loop in the following function? Instead, the CI returns 11 single numbers.
N = 30 ; df = 118 ; d = 1
f <- function (ncp, alpha, q, df) {
abs(suppressWarnings(pt(q = d*sqrt(N), df = df, ncp, lower.tail = FALSE)) -
alpha)
}
a = mapply(c, as.list(20:30), as.list(-20:-30), SIMPLIFY = FALSE) # a list of paired values
CI <- numeric(length(a))
for(i in 1:length(a)){
CI[i] = sapply(c(0.025, 0.975),
function(x) optimize(f, interval = a[[i]], alpha = x, q = d*sqrt(N), df = df, tol = 1e-10)[[1]])
}
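CI # returns only one value per pair instead of the 11 pairs expected!
CI is a plain numeric vector, so CI[i] can hold only one of the two values that sapply returns on each iteration. How about making CI a matrix with one row per pair: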
N = 30 ; df = 118 ; d = 1
f <- function (ncp, alpha, q, df) {
abs(suppressWarnings(pt(q = d*sqrt(N), df = df, ncp, lower.tail = FALSE)) -
alpha)
}
a = mapply(c, as.list(20:30), as.list(-20:-30), SIMPLIFY = FALSE) # a list of paired values
CI <- matrix(NA, 11,2)
for(i in 1:length(a)){
CI[i,] = sapply(c(0.025, 0.975),
function(x) optimize(f, interval = a[[i]], alpha = x, q = d*sqrt(N), df = df, tol = 1e-10)[[1]])
}
CI
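The same result can be built without an explicit loop. This is a sketch under the assumption that f takes q from its argument (the original f reads d and N from the global environment instead), using vapply() so that each interval is checked to yield exactly two numbers:

```r
N <- 30; df <- 118; d <- 1

# Distance between the noncentral t tail probability and the target alpha.
f <- function(ncp, alpha, q, df) {
  abs(suppressWarnings(pt(q = q, df = df, ncp = ncp, lower.tail = FALSE)) - alpha)
}

# A list of paired interval end points, as in the question.
a <- mapply(c, as.list(20:30), as.list(-20:-30), SIMPLIFY = FALSE)

# For each interval, find the ncp for both tail probabilities.
CI <- t(vapply(a, function(int) {
  sapply(c(0.025, 0.975), function(x) {
    optimize(f, interval = int, alpha = x, q = d * sqrt(N), df = df, tol = 1e-10)[[1]]
  })
}, FUN.VALUE = numeric(2)))

print(dim(CI))
print(CI)
```

vapply() returns one column per interval, so t() is used to get one row per pair, matching the matrix filled by the loop.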
