How to define function arguments based on data.frame columns (R)? - r

I have a script that runs maximum likelihood estimation for a linear model. The model has several variables and I need to vary them occasionally, maybe add or drop some. The usual way to define the likelihood function is like this:
LL <- function(beta0, beta1, beta2, mu, sigma){
R = y - beta0*X$x0 - beta1*X$x1 - beta2*X$x2
R = dnorm(R, mu, sigma, log = T)
-sum(R)
}
I have the dependent variable in the vector y and the covariates in the data.frame X:
X <- data.frame(x0 = 1, x1 = runif(100), x2 = runif(100)*2)
y <- X$x0 + X$x1 + X$x2 + rnorm(100)
The number of variables changes from one application to the next, so I need to reformulate the function so that it takes as many covariates as there are columns in the data.frame X. I was already able to reformulate this to a more general form:
cols <- 0:(ncol(X)-1)
betas <- paste0("beta", cols)
eqR <- paste0("y - ", paste0(betas, "*X$x", cols, collapse = " - "))
LL <- function(beta0, beta1, beta2, mu, sigma){
R = as.formula(eqR)
R = dnorm(R, mu, sigma, log = T)
-sum(R)
}
I'm still struggling to find a way to dynamically define the function so that it would take the same number of beta arguments as there are columns in the covariate matrix. Ellipsis is perhaps useful here? I also tried with do.call:
LL <- function(betas, mu, sigma){
R <- do.call(dnorm(as.formula(eqR), mu, sigma, log = T), betas)
-sum(R)
}
That doesn't work when you fit the model, which has another stumbling block in the list of initial values:
require(stats4)
fit <- mle(LL, start = list(beta0 = 0, beta1 = 0, beta2 = 0, mu = 0, sigma = 1))
Any ideas for this?
EDIT:
I made some progress with the bbmle package:
require(bbmle)
dfModel <- cbind(y, X)
cols <- 0:(ncol(X)-1)
betas <-paste0("beta",cols)
betaList <- as.list(rep(0, length(betas)))
names(betaList) <- betas
initList <- c(betaList, mu = 0, sigma = 1)
fitML <- mle2(mu ~ dnorm(mean = y - beta0*x0 - beta1*x1 - beta2*x2, sd = sigma),
start = initList,
data = dfModel)
The above example works. But when I try to define the function beforehand with as.formula, I can't get it working. So the following does not work.
eqR <- paste0("y - ", paste0(betas, "*x", cols, collapse = " - "))
fitML <- mle2(mu ~ dnorm(mean = as.formula(eqR), sd = sigma),
start = initList,
data = dfModel)
The error message is:
Error in eval(expr, envir, enclos) : object 'beta0' not found
I suspect that this might have something to do with scoping, perhaps a conflict between dnorm and as.formula? I just can't find a workaround for that.

Try this:
betas = c(0,0,0)
X <- data.frame(x0 = 1, x1 = runif(100), x2 = runif(100)*2)
y <- apply(X,1,sum) + rnorm(100)
where betas is (b0, b1, b2, ...etc) and its length must be equal to the number of columns of X.
Since X could have a different number of columns y should be defined as above.
Your LL function should change to:
LL <- function(betas, mu, sigma){
R = y - as.matrix(X) %*% as.matrix(betas)
R = dnorm(R, mu, sigma, log = T)
-sum(R)
}
where %*% is the matrix product. This is the same as computing betas[1]*X[,1] + betas[2]*X[,2] + betas[3]*X[,3] + ... + betas[n]*X[,n].
With these changes, X can be a data frame with any number of columns, as long as betas is a vector with one element per column of X.
I hope I understood what you needed.
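If the likelihood still has to expose one named argument per column (as stats4::mle expects), one option is to set the function's formals programmatically. The following is only a sketch under stated assumptions: make_LL is a hypothetical helper, mu is fixed at 0 because it cannot be separated from the intercept beta0, and sigma is replaced by log_sigma so the optimizer cannot propose a negative standard deviation. None of these choices come from the original posts.

```r
library(stats4)

set.seed(42)
X <- data.frame(x0 = 1, x1 = runif(100), x2 = runif(100) * 2)
y <- rowSums(X) + rnorm(100)

# Hypothetical helper: builds a negative log-likelihood whose arguments
# are beta0, beta1, ... (one per column of X) plus log_sigma.
make_LL <- function(X, y) {
  Xm <- as.matrix(X)
  beta_names <- paste0("beta", seq_len(ncol(Xm)) - 1)
  LL <- function() {
    env <- environment()                          # frame holding this call's arguments
    betas <- unlist(mget(beta_names, envir = env))
    res <- y - drop(Xm %*% betas)                 # residuals via the matrix product
    # Assumption: mu is fixed at 0 and sigma = exp(log_sigma)
    -sum(dnorm(res, mean = 0, sd = exp(log_sigma), log = TRUE))
  }
  # One argument per column of X, each defaulting to 0
  formals(LL) <- c(setNames(rep(list(0), length(beta_names)), beta_names),
                   list(log_sigma = 0))
  LL
}

LL <- make_LL(X, y)
start <- as.list(formals(LL))   # named list of starting values, built from the formals
fit <- mle(LL, start = start)
coef(fit)
```

Because the starting values are read off the formals, adding or dropping a column in X changes both the argument list and the start list without further edits.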

Related

Estimating quantile correlations with rolling window

I would like to estimate the quantile correlations between two variables, say Y and X, using a rolling window. I am using the R package QCSIS for this purpose. I tried the following:
library(QCSIS)
#Generate some random variables
n <- 4000
x <- rnorm(n)
y <- 2 * x + rt(n,df = 1)
tau <- 9 / 10
#calculate the static quantile correlation
fit<-qc(x = x, y = y, tau = tau)
fit$rho
#calculate the rolling window quantile correlations
s<-260 #The window size
Rho.mat <- matrix(0,1,(n-s+1)) #create empty matrix to store the correlation coefficients
#running the loop
for(i in 1:(n-s+1)) {
fit <- qc(x = x, y = y, tau = tau)
Rho.mat[,i] <- fit$rho
}
However, this code does not give the quantile correlation for each window and only repeats the static quantile correlation! Most of the other solutions I found online are related to linear regression and do not fit with the function I am using. That is why I am using a loop.
Use rollapplyr as follows to avoid loops:
library(zoo)
rollapplyr(cbind(x, y), s, function(z) qc(z[, 1], z[, 2], tau)$rho,
fill = NA, by.column = FALSE)
or over the indexes:
rollapplyr(seq_along(x), s, function(ix) qc(x[ix], y[ix], tau)$rho, fill = NA)
We can check the result like this:
library(zoo)
r.roll <- rollapplyr(cbind(x, y), s, function(z) qc(z[, 1], z[, 2], tau)$rho,
fill = NA, by.column = FALSE)
r.for <- x
for(i in seq_along(r.for)) {
r.for[i] <- if (i < s) NA else {
ix <- seq(to = i, length = s)
qc(x[ix], y[ix], tau = tau)$rho
}
}
identical(r.roll, r.for)
## [1] TRUE
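The loop in the question repeats the static value because it passes the full x and y to qc on every iteration and never subsets them to the current window. A minimal base R sketch of the indexing is shown here; roll_stat is a hypothetical helper, and cor is used only as a stand-in statistic so that the example runs without QCSIS.

```r
# Hypothetical helper: apply any two-argument statistic over a trailing window.
roll_stat <- function(x, y, width, stat) {
  n <- length(x)
  out <- rep(NA_real_, n)
  for (i in seq_len(n)) {
    if (i >= width) {
      ix <- (i - width + 1):i        # indices of the current window only
      out[i] <- stat(x[ix], y[ix])   # the statistic sees just this window
    }
  }
  out
}

set.seed(1)
x <- rnorm(500)
y <- 2 * x + rnorm(500)

# Stand-in statistic: Pearson correlation instead of the quantile correlation
r <- roll_stat(x, y, width = 50, stat = cor)
head(r[!is.na(r)])
```

With QCSIS loaded, the same helper should accept function(a, b) qc(x = a, y = b, tau = tau)$rho as the stat argument, using the qc call from the question.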

How to estimate the Kalman Filter with 'KFAS' R package, with an AR(1) transition equation and covariates?

I am using 'KFAS' package from R to estimate a state-space model with the Kalman filter. My measurement and transition equations are:
y_t = b_0 + b_1 * xx_t + Z_t * x_t + \eps_t (measurement)
x_t = T_t * x_{t-1} + R_t * \eta_t (transition),
with \eps_t ~ N(0,H_t) and \eta_t ~ N(0,Q_t),
where xx_t are covariates. I have read this question and wrote the following code
library(KFAS)
set.seed(100)
xx <- rnorm(200)
beta0 <- 0.1
beta1 <- 0.1
eps <- rt(200, 4, 1)
y <- as.matrix(beta0 + beta1*xx + (arima.sim(n=200, list(ar=0.6), innov = rnorm(200)*sqrt(0.5)) + eps),
ncol=1)
Zt <- 1
Ht <- matrix(NA)
Tt <- matrix(NA)
Rt <- 1
Qt <- matrix(NA)
ss_model <- SSModel(y ~ xx + SSMcustom(Z = Zt, T = Tt, R = Rt,
Q = Qt), H = Ht)
updatefn <- function(pars, model) {
model$H[1] <- pars[1]
model$T[1] <- pars[2]
model$Q[1] <- pars[3]
model
}
fit <- fitSSM(ss_model, c(1, 0.5, 1), updatefn, method = "L-BFGS-B",
lower = c(0, -0.99, 0), upper = c(100, 0.99, 100))
I get the error
Error in is.SSModel(do.call(updatefn, args = c(list(inits, model), update_args)), :
System matrices (excluding Z) contain NA or infinite values, covariance matrices contain values larger than 1e+07
I have tried changing the initial vector to c(1, 0.5, 1, 1, 1), but it returns the same message. Does anyone know how I can do this?
Thanks!

Calculating RSS manually with given pairs of beta0 and beta1

I am trying to manually calculate the RSS for a dataset with given pairs of beta0 and beta1. For each (beta_0, beta_1) pair of values, I need to calculate the residual sum of squares and store the results in data as a vector called RSS. Here's the code provided.
x = pinotnoir$Aroma
y = pinotnoir$Quality
fit = lm(y ~ x)
summary(fit)
b0s <- seq(0, 10, .1)
b1s <- seq(0, 4, .01)
data <- expand.grid(beta0=b0s, beta1=b1s)
Here's what I have so far. I think the residual calculation is wrong but I'm not sure how to fix it.
rows = length(b1s)
rsd <- rep(NA,rows)
for (i in 1:rows){
residual = (y - (b0s[i] + b1s[i] * x))^2
rsd[i] <- residual
}
data <- expand.grid(beta0=b0s, beta1=b1s, RSS=rsd)
Any help would be appreciated. Thanks in advance!
I am not sure this is exactly what you are aiming for, but by adapting your code slightly you can get the sum of squared residuals for every pair and find which betas minimize it (using the mtcars data for the example):
mtcars
x = mtcars$drat
y = mtcars$wt
(fit = lm(y ~ x))
summary(fit)
grid_len <- 20
b0s <- seq(5, 10, length.out = grid_len)
b1s <- seq(-3, -1, length.out = grid_len)
(data <- expand.grid(beta0=b0s, beta1=b1s))
rows = nrow(data)
resids <- rep(NA,rows)
for (i in 1:rows) {
fitted <- (data$beta0[i] + (data$beta1[i] * x))
squared_resid <- (y - fitted)^2
SSR <- sum(squared_resid)
resids[i] <- SSR
cat(i, ": ", SSR, "\n")
}
data[which.min(resids), ]
fit
results:
> data[which.min(resids), ]
beta0 beta1
332 7.894737 -1.315789
> fit
Call:
lm(formula = y ~ x)
Coefficients:
(Intercept) x
7.906 -1.304
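The same grid search can also be written without an explicit loop and with the RSS stored as a column of the grid, which is what the question asked for. This is a sketch on the mtcars example above; the names grid and rss_at are illustrative.

```r
x <- mtcars$drat
y <- mtcars$wt

grid <- expand.grid(beta0 = seq(5, 10, length.out = 20),
                    beta1 = seq(-3, -1, length.out = 20))

# Residual sum of squares for one (beta0, beta1) pair
rss_at <- function(b0, b1) sum((y - (b0 + b1 * x))^2)

# Evaluate every row of the grid and keep the result next to its betas
grid$RSS <- mapply(rss_at, grid$beta0, grid$beta1)

grid[which.min(grid$RSS), ]
```

Every grid value is bounded below by the RSS of the least-squares fit, so the minimizing row should sit close to the lm coefficients.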

R convert regression model fit to a function

I want to quickly extract the fit of a regression model to a function.
So I want to get from:
# generate some random data
set.seed(123)
x <- rnorm(n = 100, mean = 10, sd = 4)
z <- rnorm(n = 100, mean = -8, sd = 3)
y <- 9 * x - 10 * x ^ 2 + 5 * z + 10 + rnorm(n = 100, 0, 30)
df <- data.frame(x, y, z) # include z so that df$z is available later
plot(df$x,df$y)
model1 <- lm(formula = y ~ x + I(x^2) + z, data = df)
summary(model1)
to a model_function(x) that describes the fitted values for me.
Of course I could do this by hand in a way like this:
model_function <- function(x, z, model) {
fit <- coefficients(model)["(Intercept)"] + coefficients(model)["x"]*x + coefficients(model)["I(x^2)"]*x^2 + coefficients(model)["z"]*z
return(fit)
}
fit <- model_function(df$x,df$z, model1)
which I can compare to the actual fitted values and (with some rounding errors) works perfectly.
all(round(as.numeric(model1$fitted.values),5) == round(fit,5))
But of course this is not a universal solution (e.g. more variables etc.).
So to be clear:
Is there an easy way to extract the fitted values relationship as a function with the coefficients that were just estimated?
Note: I know of course about predict and the ability to generate fitted values from new data - but I'm really looking for that underlying function. Maybe that's possible through predict?
Grateful for any help!
If you want an actual function you can do something like this:
get_func <- function(mod) {
vars <- as.list(attr(mod$terms, "variables"))[-(1:2)]
funcs <- lapply(vars, function(x) list(quote(`*`), 1, x))
terms <- mapply(function(x, y) {x[[2]] <- y; as.call(x)}, funcs, mod$coefficients[-1],
SIMPLIFY = FALSE)
terms <- c(as.numeric(mod$coefficients[1]), terms)
body <- Reduce(function(a, b) as.call(list(quote(`+`), a, b)), terms)
vars <- setNames(lapply(seq_along(vars), function(x) NULL), sapply(vars, as.character))
f <- as.function(c(do.call(alist, vars), body))
formals(f) <- formals(f)[!grepl("\\(", names(formals(f)))]
f
}
Which allows:
my_func <- get_func(model1)
my_func
#> function (x = NULL, z = NULL)
#> 48.6991866925322 + 3.31343108778127 * x + -9.77589420188036 * I(x^2) + 5.38229596972984 * z
<environment: 0x00000285a1982b48>
and
my_func(x = 1:10, z = 3)
#> [1] 58.38361 32.36936 -13.19668 -78.31451 -162.98413 -267.20553
#> [7] -390.97872 -534.30371 -697.18048 -879.60903
and
plot(1:10, my_func(x = 1:10, z = 3), type = "b")
At the moment, this would not work with interaction terms and the like, but it should work for most simple linear models.
Any of these give the fitted values:
fitted(model1)
predict(model1)
model.matrix(model1) %*% coef(model1)
y - resid(model1)
X <- model.matrix(model1); X %*% qr.solve(X, y)
X <- cbind(1, x, x^2, z); X %*% qr.solve(X, y)
Any of these give the predicted values for any particular x and z:
cbind(1, x, x^2, z) %*% coef(model1)
predict(model1, list(x = x, z = z))
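If a callable function of the predictors is all that is needed, and not its printed formula, a closure around predict is a shorter route. This is a sketch; as_predict_function is a hypothetical helper name, and the data setup repeats the question's example with z included in the data frame.

```r
set.seed(123)
x <- rnorm(n = 100, mean = 10, sd = 4)
z <- rnorm(n = 100, mean = -8, sd = 3)
y <- 9 * x - 10 * x^2 + 5 * z + 10 + rnorm(n = 100, 0, 30)
df <- data.frame(x, y, z)
model1 <- lm(y ~ x + I(x^2) + z, data = df)

# Hypothetical helper: returns a function of the predictors that
# evaluates the fitted relationship through predict().
as_predict_function <- function(model) {
  function(...) unname(predict(model, newdata = data.frame(...)))
}

model_function <- as_predict_function(model1)
model_function(x = 1:10, z = 3)
```

Since predict rebuilds the design matrix from the model's terms, this form should also handle transformed and interaction terms, at the cost of not exposing the coefficients in the function body.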

fitting function for a given data set

I'm trying to fit the function y(x) = a*( 1 + (x^2)/(b^2) )^t to a particular set of data, where a, b and t are constants that I want to determine by fitting.
I tried the following, for example:
len <- 24
x = runif(len)
y = x^3 + runif(len, min = -0.1, max = 0.1)
plot(x, y)
s <- seq(from = 0, to = 1, length = 50)
lines(s, s^3, lty = 2)
df <- data.frame(x, y)
m <- nls(y~a*( 1 + (x^2)/(b^2) )^t, data = df, start = list(a=1,t=0, b=1), trace = T)
> Error in nlsModel(formula, mf, start, wts) :
singular gradient matrix at initial parameter estimates
Can someone help me fit this function to these points? Even if the fit turns out to be poor, the important thing is that the fit of this function runs on the data.
Thanks, everyone.
Because your data are generated randomly, for some draws the value of a is close to zero and your function becomes zero. The curve fit procedure fails at that point. Randomizing the start parameters might work in some situations.
A slightly more stable output can be computed using the LM algorithm:
require("minpack.lm")
LMCurveFit <- function(df) {
# The function to be fit
FitFunction <- function(params, x) {
with (
as.list(params), {
a*(1 + x^2/b^2)^t
}
)
}
# Residual
Residual <- function(params, x, y) {
FitFunction(params, x) - y
}
# Sum of squares of residuals
ssqfun <- function(params, x, y) {
sum(Residual(params, x, y)^2)
}
# Normalize the data
x_max = max(df$x) # use the function's own data, not the global x and y
y_max = max(df$y)
df$x = df$x/x_max
df$y = df$y/y_max
# Define start parameters
a_start = 0.1
b_start = 1.0
t_start = 1.0
param_start = c(a = a_start,
b = b_start,
t = t_start)
# Do LM fit
nls.out <- nls.lm(par = param_start,
fn = Residual,
control = nls.lm.control(nprint=0,
ftol=.Machine$double.eps,
ptol=.Machine$double.eps,
maxfev=10000, maxiter=1024),
x = df$x,
y = df$y)
# Revert scaling
nls.out$par[1] = nls.out$par[1]*y_max
nls.out$par[2] = nls.out$par[2]*x_max
# Get the parameters
params_fit = coef(nls.out)
print(params_fit)
# Compute predicted values
predicted = FitFunction(as.list(params_fit), df$x*x_max)
}
# LM fit
pred_y = LMCurveFit(df)
ord <- order(x) # draw the curve left to right instead of in data order
lines(x[ord], pred_y[ord])
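Separately from the randomness of the data, the start value t = 0 in the question probably contributes to the "singular gradient" error. At t = 0 the model reduces to the constant a, so its derivative with respect to b is zero for every x and the gradient matrix has a zero column. The sketch below checks this numerically; model and num_grad_b are illustrative names, not part of the answer above.

```r
# The model from the question
model <- function(x, a, b, t) a * (1 + x^2 / b^2)^t

# Central finite difference of the model with respect to b
num_grad_b <- function(x, a, b, t, h = 1e-6) {
  (model(x, a, b + h, t) - model(x, a, b - h, t)) / (2 * h)
}

x <- seq(0.1, 1, length.out = 10)

g_t0 <- num_grad_b(x, a = 1, b = 1, t = 0)  # start values used in the question
g_t1 <- num_grad_b(x, a = 1, b = 1, t = 1)  # a nonzero exponent for comparison

g_t0
g_t1
```

Starting nls from a nonzero t avoids this particular degeneracy, although it does not guarantee convergence on every random data set.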
