How to call `eval` within the `with` function in R?

Given an lm object, I need to create a function based on its variables, which I have as a character vector. I have tried to use a combination of eval and expr to create a function f that would then be used in obj, with the latter minimised by nlm.
library(tidyverse)
df <- drop_na(airquality)
model <- lm(Ozone~. - Temp, data = df, x=TRUE, y=TRUE)
base_vars <- all.vars(formula(model)[-2])
k <- length(base_vars)
f <- function(base_df, x, y, parms) {
  with(base_df, parms[1] +
         eval(expr(paste(paste(paste0('parms[', 2:(k+1), ']'), base_vars, sep = '*'), collapse = '+'))) +
         log(parms[k+2] * (x - parms[k+3] ^ 2)))
}
obj <- function(parms, y, x) mean((residuals(model) - f(df, x, y, parms))^2)
fit <- with(data, nlm(obj, c(0, 0, 0, 0, 0, 0, 0), y = e, x = x))
But calling f(model$x, df$Temp, model$y, c(0, 0, 0, 0, 0, 0, 0)) results in the following error:
Error in eval(substitute(expr), data, enclos = parent.frame()) :
  numeric 'envir' arg not of length one
4. eval(substitute(expr), data, enclos = parent.frame())
3. with.default(base_df, parms[1] + eval(expr(paste(paste(paste0("parms[",
   2:(k + 1), "]"), base_vars, sep = "*"), collapse = "+"))) +
   log(parms[k + 2] * (x - parms[k + 3]^2)))
2. with(base_df, parms[1] + eval(expr(paste(paste(paste0("parms[",
   2:(k + 1), "]"), base_vars, sep = "*"), collapse = "+"))) +
   log(parms[k + 2] * (x - parms[k + 3]^2)))
1. f(model$x, df$Temp, model$y, c(0, 0, 0, 0, 0, 0, 0))
I believe there might be a conflict between the eval environment and the environment implied by the with function, but I can't figure out why. Any ideas on how I can create a custom function f for varying models?
Expected output for the f(model$x, df$Temp, model$y, c(0, 0, 0, 0, 0, 0, 0)) would be:
with(base_df, parms[1] + parms[2]*Solar.R + parms[3]*Wind + parms[4]*Temp +
       parms[5]*Month + parms[6]*Day + log(parms[7] * (Temp - parms[8] ^ 2)))
but for a different model it could be something like:
with(base_df,
     parms[1] + parms[2]*var1 + parms[3]*var2 + log(parms[4]*(var3 - parms[5]^2)))
so the number of variables and parameters is different with every call.

R supports computing on the language, but it should not be your first option, and if you do use it, it should never involve text processing of code. You don't have a case here where you need to compute on the language. I have no idea how you thought your attempt would work; I don't know the expr function, and I refuse to install the tidyverse package and its ginormous dependency tree.
Also, you generally should avoid with outside of interactive use. But with is not the problem here.
Here is how I would do this:
df <- airquality[complete.cases(airquality),]
model <- lm(Ozone~. - Temp, data = df)
f <- function(base_df, x, parms) {
  m <- model.matrix(model, data = base_df)
  k <- ncol(m)
  stopifnot(length(parms) == (k + 2L))
  # I use exp(parms[k+1]) to ensure a positive value within the log
  m %*% parms[seq_len(k)] + log(exp(parms[k + 1L]) * (x - parms[k + 2L] ^ 2))
}
obj <- function(parms, y, x, base_df) mean((residuals(model) - f(base_df, x, parms))^2)
#some x:
x <- rpois(nrow(df), 10)
fit <- nlm(obj, c(0, 0, 0, 0, 0, 0, 0), x = x, base_df = df)
#works
You don't seem to use y and thus I removed it from the code.
Note how I create the design matrix for the linear part (using model.matrix) and use matrix multiplication with the parameters. You also need to ensure that log doesn't return Inf/-Inf/NaN.
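For instance, a minimal guard (just a sketch; the 1e10 penalty is an arbitrary choice of mine) could replace non-finite values with a large penalty so that nlm can still evaluate the objective:
f_safe <- function(base_df, x, parms) {
  m <- model.matrix(model, data = base_df)
  k <- ncol(m)
  inner <- exp(parms[k + 1L]) * (x - parms[k + 2L] ^ 2)
  out <- m %*% parms[seq_len(k)] + log(inner)
  # inner <= 0 makes log() return NaN/-Inf; swap those for a large
  # (arbitrary) penalty so the optimiser is pushed away from that region
  out[!is.finite(out)] <- 1e10
  out
}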

I think @Roland gave a good answer covering your actual problem. I am isolating what I think you were specifically asking based on the question title, with no comment on whether it is a good idea or not. It probably isn't in this use case.
But what you were looking for is more than likely eval_tidy() from rlang. I left the :: function notation in just so it's obvious which package is being used here.
Note I fixed a couple of things that seemed to be errors in the code. I am also using all ones instead of zeros to test in parms, due to the log.
library(rlang)
library(tidyr)
# dropped y since it was an unused argument
f <- function(base_df, x, parms) {
  # set an expression to evaluate using parse_expr()
  .f <- rlang::parse_expr(paste(paste(paste0('parms[', 2:(k+1), ']'),
                                      base_vars, sep = '*'), collapse = '+'))
  # use eval_tidy() with the data mask
  y_part1 <- rlang::eval_tidy(.f, data = base_df)
  y_part2 <- log(parms[k + 2] * (x - parms[k + 3] ^ 2))
  parms[1] + y_part1 + y_part2
}
# using your code
df <- tidyr::drop_na(airquality)
model <- lm(Ozone~. - Temp, data = df, x=TRUE, y=TRUE)
base_vars <- all.vars(formula(model)[-2])
k <- length(base_vars)
# changed to all ones, I think this is what you wanted for length
parms <- rep(1, k + 3)
method_1 <- f(df, df$Temp, parms)
method_2 <- with(df, parms[1] + parms[2]*Solar.R + parms[3]*Wind + parms[4]*Temp +
                   parms[5]*Month + parms[6]*Day + log(parms[7] * (Temp - parms[8] ^ 2)))
all.equal(method_1, method_2)
# [1] TRUE

Related

Integrating under a curve in R

I apologise if this is a duplicate; I've read answers to similar questions to no avail.
I'm trying to integrate under a curve, given a specific formula (below) for said integration.
As a toy example, here's some data:
library(deSolve) # for lsoda()

Antia_Model <- function(t, y, p1){
  r <- p1[1]; k <- p1[2]; p <- p1[3]; o <- p1[4]
  P <- y[1]; I <- y[2]
  dP <- r*P - k*P*I
  dI <- p*I*(P/(P + o))
  list(c(dP, dI))
}
r <- 0.25; k <- 0.01; p <- 1; o <- 1000 # Note that r can range between 0.1 and 10 in this model
parms <- c(r, k, p, o)
P0 <- 1; I0 <- 1
N0 <- c(P0, I0)
TT <- seq(0.1, 50, 0.1)
results <- lsoda(N0, TT, Antia_Model, parms, verbose = FALSE)
P <- results[,2]; I <- results[,3]
As I understand it, I should be able to use the auc() function from the MESS package (can I just use the integrate() function? Unclear...), which should look something like this:
auc(P, TT, from = x1, to = x2, type = "spline")
Though I don't really understand how to use the "from" and "to" arguments, or how to incorporate "u" from the original integration formula...
Using the integrate() function seems more intuitive, but if I try:
u <- 1
integrand <- function(P) {u*P}
q <- integrate(integrand, lower = 0, upper = Inf)
I get this error:
# Error in integrate(integrand, lower = 0, upper = Inf) :
# the integral is probably divergent
As you can tell, I'm pretty lost, so any help would be greatly appreciated! Thank you so much! :)
Your integrand is technically acceptable, but with u = 1 it is just the identity function f(x) = x. The area under it on [0, Inf) is infinite, i.e. the integral is divergent.
From the documentation of integrate the first argument is:
an R function taking a numeric first argument and returning a numeric vector of the same length. Returning a non-finite element will generate an error.
If instead you use a pulse function:
pulse <- function(x) {ifelse(x < 5 & x >= 0, 1, 0)}
integrate(pulse, lower = 0, upper = Inf)
#> 5 with absolute error < 8.5e-05
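To integrate under the simulated P curve itself, one option (a sketch, reusing TT and P from the lsoda output above) is to interpolate the discrete solution and integrate the interpolant; from and to are simply the limits of integration on the time axis:
# spline-interpolate the discrete solution, then integrate the interpolant
u <- 1
P_fun <- splinefun(TT, P)
integrate(function(t) u * P_fun(t), lower = min(TT), upper = max(TT))

# MESS::auc() wraps the same idea; note the time vector comes first
# MESS::auc(TT, P, from = min(TT), to = max(TT), type = "spline")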

Include function in linear constraint in r

I am trying to set up an optimization with the constraint sum(abs(x - y)) * 0.5, where x are my decision variables and y are given inputs.
x <- c(0.25, 0.3, 0.05, 0.25, 0.15)
y <- rep(0.2, 5)
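For reference, evaluating the constraint expression by hand on these inputs:
sum(abs(x - y)) * 0.5
# x - y is c(0.05, 0.1, -0.15, 0.05, -0.05), so this is 0.4 * 0.5
# [1] 0.2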
I have added the function
AcS <- function(g, h, dir, rhs) {
  AS <- sum(abs(g - h)) * 0.5
  return(AS)
  L_constraint(L = L, dir = dir, rhs = rhs)
}
And then, with the ROI package, I attempted to create the constraint as:
AcS_cons <- L_constraint(AcS(x, y), "<=", 0.25)
With
c1 <- c(1:24, -25)
C <- matrix(c1,nrow=5,ncol=5,byrow=TRUE)
I then have my optimization problem setup to solve for x as:
QPL <- OP(Q_objective(Q = C, L = rep(0, NCOL(count_y))),
          act_share_2,
          max = FALSE)
When I run this I get the following error:
Error in .check_constraints.L_constraint(constr, x) :
  dimension missmatch! OP has 5 variables the constraints have 1
Any help on how to adjust the above, or how to setup a constraint for my function AcS would be appreciated.

lme4 allFit() giving confusing results when wrapped in a function

I am using allFit() in lme4 to automatically scan through possible optimizers, since the default optimizer usually doesn't converge in this situation. My code works fine when I run it line-by-line, but when I run it wrapped in a simple function, it gives different results.
I've looked at the output of the allFit call and it seems that when it's NOT inside the function, it returns a list of lmerModLmerTest objects as desired.
However, inside the function, it returns a list with the values simpleError, error, and condition. Why is it doing this?
I'm using RStudio, R 3.6, lme4 1.1-21, lmerTest 3.1-0.
UPDATE: The problem is that the update() method used by allFit cannot find the 'tt' data frame when re-fitting the models. I have put breakpoints into the code, and it seems that the 'tt' data exists in the function environment, so I don't understand why it can't be found...
UPDATE 2: It appears that if I change the assignment of the test data to <<-, it works. This is dangerous, though, as it breaks functional programming, and I think it may fail when I try to parallelize. I am testing further... still open to suggestions!
Here is the code that works, not inside the function:
library(lme4)
multi_arm_var_sim <- function(nsub = 20, nclust = 100, narm = 2,
                              iccs = c(.01, .04), betas = c(0, .3)){
  sig_b2 <- -1 * iccs / (iccs - 1)
  n <- nsub * nclust * narm
  y <- rep_len(NA, n)
  arm <- as.factor(rep(0:(narm - 1), each = nsub * nclust))
  clustid <- rep(1:(nclust * narm), each = nsub)
  clustRElist <- rnorm(narm * nclust, mean = 0, sd = rep(sqrt(sig_b2), each = nclust))
  clustRE <- rep(clustRElist, each = nsub)
  sig_b2 <- rep(sig_b2, each = nclust * nsub)
  error <- rnorm(n, mean = 0, sd = 1)
  beta <- rep(betas, each = nclust * nsub)
  linpred <- beta + clustRE + error
  output <- cbind.data.frame(arm, clustid, sig_b2, clustRE, linpred)
  return(output)
}
set.seed(2)
test_1 <- multi_arm_var_sim()
model_flex_1 <- lmer(linpred ~ arm + (1 + arm | clustid),
                     data = test_1)
diff_optims_1 <- allFit(model_flex_1, verbose = TRUE)
print(class(diff_optims_1[[1]]))
is.OK_1 <- sapply(diff_optims_1, is, "lmerMod")
print(is.OK_1)
And here is the code that doesn't work, same setup, wrapped in a function.
library(lme4)
# multi_arm_var_sim() is defined exactly as above
get_pval <- function(){
  tt <- multi_arm_var_sim()
  model_flex <- lme4::lmer(linpred ~ arm + (1 + arm | clustid),
                           data = tt)
  diff_optims <- lme4::allFit(model_flex, data = tt, verbose = TRUE)
  print(class(diff_optims[[1]]))
  is.OK <- sapply(diff_optims, is, "merMod")
  print(is.OK)
}
set.seed(2)
get_pval()
Thanks!!
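A sketch of one possible workaround that avoids <<- (untested against the exact versions above): build the model with do.call() so that the data frame's value, rather than the symbol tt, is stored in the fitted model's call; update() inside allFit() then has nothing to look up by name:
get_pval_2 <- function(){
  tt <- multi_arm_var_sim()
  # do.call() evaluates its arguments, so the model's call embeds the data
  # frame itself instead of the unevaluated symbol `tt`
  model_flex <- do.call(lme4::lmer,
                        list(formula = linpred ~ arm + (1 + arm | clustid),
                             data = tt))
  diff_optims <- lme4::allFit(model_flex, verbose = TRUE)
  sapply(diff_optims, is, "merMod")
}
The trade-off is that the embedded data frame makes the stored call, and hence printed model summaries, very verbose.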

Error supplying appropriate start parameters to nlsLM

I was playing around with the nlsLM function, from the minpack.lm library, and encountered some behaviour that I don't understand.
Given that the following function produces output when I supply a numeric vector b as input, I wanted to use it to fit a nonlinear model to my data.
volEquation <- function(DBH, PHt, b){
  b[1] * DBH^b[2] * PHt^b[3]
}
However I have become stuck when it comes to correctly specifying the initial parameter values. R code follows:
library(minpack.lm)
n <- 20
x <- seq(12, 60, length.out = n)
y <- seq(22, 45, length.out = n)
z <- x^2 * y ^ 3 + rnorm(n, 0, 0.1)
Data <- data.frame(DBH = x, PHt = y, TVT = z)
nlsFormula <- "TVT ~ volEquation(DBH, PHt, b)"
nlsInitial <- list(b = c(0.5, 2.25, 3.25))
nlsLMOutput <- nlsLM(formula = nlsFormula, data = Data, start = nlsInitial)
nlsOutput <- nls(formula = nlsFormula, data = Data, start = nlsInitial)
nls was successful at fitting the data, while nlsLM gave me this error message:
Error in `rownames<-`(`*tmp*`, value = "b") :
  length of 'dimnames' [1] not equal to array extent
Can anyone provide insight as to why this problem occurs in the nlsLM function? I've tried sifting through the nlsLM code but I still don't understand what's going on.
Try separating your parameters; nlsLM does not appear to accept a vector-valued start parameter the way nls does, so give each coefficient its own scalar name:
volEquation <- function(DBH, PHt, x, y, z){
  x * DBH^y * PHt^z
}
nlsFormula <- "TVT ~ volEquation(DBH, PHt, x, y, z)"
nlsInitial <- c(x = 5e-3, y = 2, z = 1)
nlsOutput <- nlsLM(formula = nlsFormula, data = Data, start = nlsInitial,
                   control = nls.lm.control(maxiter = 100))
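As a quick sanity check (my suggestion, not part of the original answer): since z was simulated as x^2 * y^3 plus small noise, the fitted parameters should, if the optimiser converges, land near x = 1, y = 2, z = 3:
coef(nlsOutput)
# expected to be close to x = 1, y = 2, z = 3 (exact values vary with the noise)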

summing the positive values of k variables in R

I want to first find the max of 0 or j, where j is any variable, and then sum these over the k variables of a data frame data. In Stata, I did this as follows:
gen sum=max(0,x)+max(0,y)+max(0,z)+...+max(0,k)
In R I used following approach:
data$sum <- ifelse(data$x < 0, 0, data$x * 1) +
            ifelse(data$y < 0, 0, data$y * 1) +
            ifelse(data$z < 0, 0, data$z * 1) +
            ... +
            ifelse(data$k < 0, 0, data$k * 1)
I was wondering whether there is an alternative and efficient approach in R to do the same thing.
Try this:
mycols <- c("x", "y", "z", "k")
data$sum <- rowSums(data[mycols] * (data[mycols] > 0))
Check with some sample data:
data <- data.frame(x = runif(10) - 0.5,
                   y = runif(10) - 0.5,
                   z = runif(10) - 0.5,
                   k = runif(10) - 0.5)
identical(rowSums(data[mycols] * (data[mycols] > 0)), # mine
          ifelse(data$x < 0, 0, data$x * 1) +         # yours
            ifelse(data$y < 0, 0, data$y * 1) +
            ifelse(data$z < 0, 0, data$z * 1) +
            ifelse(data$k < 0, 0, data$k * 1))
# [1] TRUE
Alternatives to flodel's excellent solution, noting the first looks quite a bit like your Stata code.
with(data,  # terrible name for an R object, BTW
     pmax(x, 0) + pmax(y, 0) + pmax(z, 0) + pmax(k, 0))
rowSums( apply(data[-5], 2, pmax, 0) )
The second one is probably slower, but it is in the running for this R-golf competition. Also a matrix math solution:
as.matrix((data[, 1:4] > 0) * data[, 1:4]) %*% rep(1, 4)
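One more variant in the same spirit (my addition, assuming all four columns are numeric): pmax() applied to the matrix as a whole combines neatly with rowSums():
mycols <- c("x", "y", "z", "k")
rowSums(pmax(as.matrix(data[mycols]), 0))  # the 0 is recycled across every cell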
Not the question, but writing out every variable in Stata is likely to be tedious and error-prone. There is likely to be scope for a loop here:
gen sum = 0
quietly foreach v of var varlist {
replace sum = sum + `v' if inrange(`v', 0, .)
}
where you must work out what the varlist should be.
