Estimating quantile correlations with rolling window - r

I would like to estimate the quantile correlations between two variables, say Y and X, using a rolling window. I am using the R package QCSIS for this purpose. I tried the following:
library(QCSIS)
#Generate some random variables
n <- 4000
x <- rnorm(n)
y <- 2 * x + rt(n,df = 1)
tau <- 9 / 10
#calculate the static quantile correlation
fit<-qc(x = x, y = y, tau = tau)
fit$rho
#calculate the rolling window quantile correlations
s<-260 #The window size
Rho.mat <- matrix(0,1,(n-s+1)) #create empty matrix to store the correlation coefficients
#running the loop
for(i in 1:(n-s+1)) {
fit <- qc(x = x, y = y, tau = tau)
Rho.mat[,i] <- fit$rho
}
However, this code does not give the quantile correlation for each window and only repeats the static quantile correlation! Most of the other solutions I found online are related to linear regression and do not fit with the function I am using. That is why I am using a loop.

Use rollapplyr as follows to avoid loops:
library(zoo)
rollapplyr(cbind(x, y), s, function(z) qc(z[, 1], z[, 2], tau)$rho,
fill = NA, by.column = FALSE)
or over the indexes:
rollapplyr(seq_along(x), s, function(ix) qc(x[ix], y[ix], tau)$rho, fill = NA)
We can check the result like this:
library(zoo)
r.roll <- rollapplyr(cbind(x, y), s, function(z) qc(z[, 1], z[, 2], tau)$rho,
fill = NA, by.column = FALSE)
r.for <- x
for(i in seq_along(r.for)) {
r.for[i] <- if (i < s) NA else {
ix <- seq(to = i, length = s)
qc(x[ix], y[ix], tau = tau)$rho
}
}
identical(r.roll, r.for)
## [1] TRUE
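The loop in the question repeats the static value because x and y are never subset by i inside the loop body. As a minimal sketch of the windowing logic in base R, the helper roll_stat below is a hypothetical name (not part of zoo or QCSIS), and cor is used as a stand-in statistic so the example runs without QCSIS installed:

```r
# Hypothetical helper: apply f(x_window, y_window) over trailing windows of size s.
# The first s - 1 positions stay NA, mirroring rollapplyr(..., fill = NA).
roll_stat <- function(x, y, s, f) {
  n <- length(x)
  out <- rep(NA_real_, n)
  for (i in s:n) {
    ix <- (i - s + 1):i          # indexes of the window ending at i
    out[i] <- f(x[ix], y[ix])    # statistic computed on the window only
  }
  out
}

set.seed(1)
n <- 1000
x <- rnorm(n)
y <- 2 * x + rnorm(n)
s <- 260

# cor is only a stand-in (assumption) for the quantile correlation
r <- roll_stat(x, y, s, cor)
```

With QCSIS loaded, the same helper should accept function(x, y) qc(x = x, y = y, tau = tau)$rho in place of cor.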

Related

R convert regression model fit to a function

I want to quickly extract the fit of a regression model to a function.
So I want to get from:
# generate some random data
set.seed(123)
x <- rnorm(n = 100, mean = 10, sd = 4)
z <- rnorm(n = 100, mean = -8, sd = 3)
y <- 9 * x - 10 * x ^ 2 + 5 * z + 10 + rnorm(n = 100, 0, 30)
df <- data.frame(x, y, z)  # include z so that df$z is available below
plot(df$x,df$y)
model1 <- lm(formula = y ~ x + I(x^2) + z, data = df)
summary(model1)
to a model_function(x) that describes the fitted values for me.
Of course I could do this by hand in a way like this:
model_function <- function(x, z, model) {
fit <- coefficients(model)["(Intercept)"] + coefficients(model)["x"]*x + coefficients(model)["I(x^2)"]*x^2 + coefficients(model)["z"]*z
return(fit)
}
fit <- model_function(df$x,df$z, model1)
which I can compare to the actual fitted values and (with some rounding errors) works perfectly.
all(round(as.numeric(model1$fitted.values),5) == round(fit,5))
But of course this is not a universal solution (e.g. more variables etc.).
So to be clear:
Is there an easy way to extract the fitted values relationship as a function with the coefficients that were just estimated?
Note: I know of course about predict and the ability to generate fitted values from new data - but I'm really looking for that underlying function. Maybe that's possible through predict?
Grateful for any help!
If you want an actual function you can do something like this:
get_func <- function(mod) {
vars <- as.list(attr(mod$terms, "variables"))[-(1:2)]
funcs <- lapply(vars, function(x) list(quote(`*`), 1, x))
terms <- mapply(function(x, y) {x[[2]] <- y; as.call(x)}, funcs, mod$coefficients[-1],
SIMPLIFY = FALSE)
terms <- c(as.numeric(mod$coefficients[1]), terms)
body <- Reduce(function(a, b) as.call(list(quote(`+`), a, b)), terms)
vars <- setNames(lapply(seq_along(vars), function(x) NULL), sapply(vars, as.character))
f <- as.function(c(do.call(alist, vars), body))
formals(f) <- formals(f)[!grepl("\\(", names(formals(f)))]
f
}
Which allows:
my_func <- get_func(model1)
my_func
#> function (x = NULL, z = NULL)
#> 48.6991866925322 + 3.31343108778127 * x + -9.77589420188036 * I(x^2) + 5.38229596972984 * z
#> <environment: 0x00000285a1982b48>
and
my_func(x = 1:10, z = 3)
#> [1] 58.38361 32.36936 -13.19668 -78.31451 -162.98413 -267.20553
#> [7] -390.97872 -534.30371 -697.18048 -879.60903
and
plot(1:10, my_func(x = 1:10, z = 3), type = "b")
At the moment, this would not work with interaction terms, etc., but it should work for most simple linear models.
Any of these give the fitted values:
fitted(model1)
predict(model1)
model.matrix(model1) %*% coef(model1)
y - resid(model1)
X <- model.matrix(model1); X %*% qr.solve(X, y)
X <- cbind(1, x, x^2, z); X %*% qr.solve(X, y)
Any of these give the predicted values for any particular x and z:
cbind(1, x, x^2, z) %*% coef(model1)
predict(model1, list(x = x, z = z))
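If a callable object is wanted rather than a one-off call, one option is to wrap predict in a closure. This is only a sketch: make_predictor is a hypothetical name, and the data are regenerated here under the same recipe as in the question so that the block is self-contained.

```r
# Hypothetical helper: returns a function of the model's predictors
# that delegates to predict(), so any formula terms lm() understands are handled.
make_predictor <- function(mod) {
  function(...) unname(predict(mod, newdata = data.frame(...)))
}

set.seed(123)
d <- data.frame(x = rnorm(100, mean = 10, sd = 4),
                z = rnorm(100, mean = -8, sd = 3))
d$y <- 9 * d$x - 10 * d$x^2 + 5 * d$z + 10 + rnorm(100, 0, 30)

mod <- lm(y ~ x + I(x^2) + z, data = d)
f <- make_predictor(mod)

# Fitted values at the original data, and predictions at new points
fit_orig <- f(x = d$x, z = d$z)
fit_new  <- f(x = 1:10, z = 3)
```

Because the work is done by predict, this does not expose the coefficients as a formula the way get_func does, but it should not need special handling for transformed terms.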

Performing residual bootstrap using kernel regression in R

Kernel regression is a non-parametric technique for estimating the conditional expectation of a random variable. It uses local averaging of the response value, Y, to find a possibly non-linear relationship between X and Y.
I have used the bootstrap for kernel density estimation and now want to use it for kernel regression as well. I have been told to use residual bootstrapping for kernel regression and have read a couple of papers on this. I am, however, unsure how to perform it. The programming has been done in R using the FKSUM package. I have made an attempt to use standard resampling on kernel regression:
library(FKSUM)
set.seed(1)
n <- 5000
sample.size <- 500
B.replications <- 200
x <- rbeta(n, 2, 2) * 10
y <- 3 * sin(2 * x) + 10 * (x > 5) * (x - 5)
y <- y + rnorm(n) + (rgamma(n, 2, 2) - 1) * (abs(x - 5) + 3)
#taking x.y to be the population
x.y <- data.frame(x, y)
xs <- seq(min(x), max(x), length = 1000)
ftrue <- 3 * sin(2 * xs) + 10 * (xs > 5) * (xs - 5)
#Sample from the population
seqx<-seq(1,5000,by=1)
sample.ind <- sample(seqx, size = sample.size, replace = FALSE)
sample.reg<-x.y[sample.ind,]
x_s <- sample.reg$x
y_s <- sample.reg$y
fhat_loc_lin.pop <- fk_regression(x, y)
fhat_loc_lin.sample <- fk_regression(x = x_s, y = y_s)
plot(x, y, col = rgb(.7, .7, .7, .3), pch = 16, xlab = 'x',
ylab = 'y', main = 'Local linear estimator with amise bandwidth')
lines(xs, ftrue, col = 2, lwd = 3)
lines(fhat_loc_lin.pop, lty = 2, lwd = 2)
#Bootstrap
n.B.sample = sample.size # sample bootstrap size
boot.reg.mat.X <- matrix(0,ncol=B.replications, nrow=n.B.sample)
boot.reg.mat.Y <- matrix(0,ncol=B.replications, nrow=n.B.sample)
fhat_loc_lin.boot <- matrix(0,ncol = B.replications, nrow=100)
Temp.reg.y <- matrix(0,ncol = B.replications,nrow = 1000)
for(i in 1:B.replications){
sequence.x.boot <- seq(from=1,to=n.B.sample,by=1)
sample.ind.boot <- sample(sequence.x.boot, size = sample.size, replace = TRUE)
boot.reg.mat <- sample.reg[sample.ind.boot,]
boot.reg.mat.X <- boot.reg.mat$x
boot.reg.mat.Y <- boot.reg.mat$y
fhat_loc_lin.boot <- fk_regression(x = boot.reg.mat.X ,
y = boot.reg.mat.Y,
h = fhat_loc_lin.sample$h)
lines(y=fhat_loc_lin.boot$y,x= fhat_loc_lin.sample$x, col =c(i) )
Temp.reg.y[,i] <- fhat_loc_lin.boot$y
}
quan.reg.l <- vector()
quan.reg.u <- vector()
for(i in 1:length(xs)){
quan.reg.l[i] <- quantile(x = Temp.reg.y[i,],probs = 0.025)
quan.reg.u[i] <- quantile(x = Temp.reg.y[i,],probs = 0.975)
}
# Lower Bound
Temp.reg.2 <- quan.reg.l
lines(y=Temp.reg.2,x=fhat_loc_lin.boot$x ,col="red",lwd=4,lty=1)
# Upper Bound
Temp.reg.3 <- quan.reg.u
lines(y=Temp.reg.3,x=fhat_loc_lin.boot$x ,col="navy",lwd=4,lty=1)
Asking the question on here now since I haven't received any response on CV. Any help would be greatly appreciated!
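For reference, the residual bootstrap differs from the pairs resampling above in that x is held fixed and only the centred residuals are resampled. The following is a rough sketch of that idea, not an FKSUM implementation: loess from base R stands in for fk_regression, the names fit_smooth and grid are made up for illustration, and the simulated noise is homoskedastic (with heteroskedastic errors like those in the data above, a wild bootstrap variant is usually suggested instead).

```r
set.seed(1)
n <- 300
x <- sort(runif(n, 0, 10))
y <- 3 * sin(2 * x) + rnorm(n)

# Stand-in smoother (assumption): local linear loess instead of fk_regression
fit_smooth <- function(x, y, span = 0.2) loess(y ~ x, span = span, degree = 1)

grid <- seq(min(x), max(x), length.out = 100)

# Step 1: fit once, keep fitted values and centred residuals
fit0  <- fit_smooth(x, y)
m_hat <- fitted(fit0)
res   <- y - m_hat
res   <- res - mean(res)

# Step 2: rebuild responses as fitted value + resampled residual, then refit
B <- 200
boot_curves <- replicate(B, {
  y_star <- m_hat + sample(res, n, replace = TRUE)
  predict(fit_smooth(x, y_star), newdata = data.frame(x = grid))
})

# Step 3: pointwise 95% percentile band on the evaluation grid
lower <- apply(boot_curves, 1, quantile, probs = 0.025)
upper <- apply(boot_curves, 1, quantile, probs = 0.975)
```

Evaluating every bootstrap fit on the same grid keeps the rows of boot_curves comparable, which the pointwise quantiles rely on.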

How to define function arguments based on data.frame columns (R)?

I have a script that runs maximum likelihood estimation for a linear model. The model has several variables and I need to vary them occasionally, maybe add or drop some. The usual way to define the likelihood function is like this:
LL <- function(beta0, beta1, beta2, mu, sigma){
R = y - beta0*X$x0 - beta1*X$x1 - beta2*X$x2
R = dnorm(R, mu, sigma, log = T)
-sum(R)
}
I have dependent variable in vector y and covariates in data.frame X:
X <- data.frame(x0 = 1, x1 = runif(100), x2 = runif(100)*2)
y <- X$x0 + X$x1 + X$x2 + rnorm(100)
Now the number of variables is subject to change by application and I need to reformulate the function so that it will take as many covariates as there are columns in the data.frame X. I was already able to reformulate this to a more general form:
cols <- 0:(ncol(X)-1)
betas <- paste0("beta", cols)
eqR <- paste0("y - ", paste0(betas, "*X$x", cols, collapse = " - "))
LL <- function(beta0, beta1, beta2, mu, sigma){
R = as.formula(eqR)
R = dnorm(R, mu, sigma, log = T)
-sum(R)
}
I'm still struggling to find a way to dynamically define the function so that it would take the same number of beta arguments as there are columns in the covariate matrix. Ellipsis is perhaps useful here? I also tried with do.call:
LL <- function(betas, mu, sigma){
R <- do.call(dnorm(as.formula(eqR), mu, sigma, log = T), betas)
-sum(R)
}
That doesn't work when you fit the model, which has another stumbling block in the list of initial values:
require(stats4)
fit <- mle(LL, start = list(beta0 = 0, beta1 = 0, beta2 = 0, mu = 0, sigma = 1))
Any ideas for this?
EDIT:
I made some advance with bbmle package:
require(bbmle)
dfModel <- cbind(y, X)
cols <- 0:(ncol(X)-1)
betas <-paste0("beta",cols)
betaList <- as.list(rep(0, length(betas)))
names(betaList) <- betas
initList <- c(betaList, mu = 0, sigma = 1)
fitML <- mle2(mu ~ dnorm(mean = y - beta0*x0 - beta1*x1 - beta2*x2, sd = sigma),
start = initList,
data = dfModel)
The above example works. But when I try to define the function beforehand with as.formula, I can't get it working. So the following does not work.
eqR <- paste0("y - ", paste0(betas, "*x", cols, collapse = " - "))
fitML <- mle2(mu ~ dnorm(mean = as.formula(eqR), sd = sigma),
start = initList,
data = dfModel)
The error message is:
Error in eval(expr, envir, enclos) : object 'beta0' not found
I suspect that this might have something to do with scoping - conflict between dnorm and as.formula? I just can't find workaround for that.
Try this:
betas = c(0,0,0)
X <- data.frame(x0 = 1, x1 = runif(100), x2 = runif(100)*2)
y <- apply(X,1,sum) + rnorm(100)
where betas is (b0, b1, b2, ...etc) and its length must be equal to the number of columns of X.
Since X could have a different number of columns y should be defined as above.
Your LL function should change to:
LL <- function(betas, mu, sigma){
R = y - as.matrix(X) %*% as.matrix(betas)
R = dnorm(R, mu, sigma, log = T)
-sum(R)
}
where %*% is the matrix product. This is the same as doing b[1]*X[,1] + b[2]*X[,2] + b[3]*X[,3] + ... + b[n]*X[,n]
With these changes, X can be a data frame with any number of columns, and betas a vector with one element per column of X.
I hope I understood what you needed.
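A vector-valued betas argument fits more naturally with optim than with mle, which expects named scalar arguments. As a sketch under some assumptions: negLL is a hypothetical name, sigma is parameterised on the log scale to keep it positive, and mu is fixed at 0 because it cannot be separated from the intercept when X contains a column of ones.

```r
set.seed(42)
X <- data.frame(x0 = 1, x1 = runif(100), x2 = runif(100) * 2)
y <- rowSums(X) + rnorm(100)
Xm <- as.matrix(X)

# Negative log-likelihood over a single parameter vector:
# first ncol(Xm) entries are the betas, the last one is log(sigma)
negLL <- function(par) {
  k <- ncol(Xm)
  betas <- par[seq_len(k)]
  sigma <- exp(par[k + 1])
  -sum(dnorm(y - Xm %*% betas, mean = 0, sd = sigma, log = TRUE))
}

# Start vector adapts to however many columns X has
start <- c(rep(0, ncol(Xm)), 0)
fit <- optim(start, negLL, method = "BFGS")

beta_hat  <- setNames(fit$par[seq_len(ncol(Xm))], colnames(Xm))
sigma_hat <- exp(fit$par[ncol(Xm) + 1])
```

Adding or dropping a column of X changes only the length of start; the likelihood function itself stays the same.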

fitting function for a given data set

I'm trying to fit the function y(x)=a*( 1 + (x^2)/(b^2) )^t to a particular set of data, where a, b and t are constants that I want to determine by fitting.
I tried the following, for example:
len <- 24
x = runif(len)
y = x^3 + runif(len, min = -0.1, max = 0.1)
plot(x, y)
s <- seq(from = 0, to = 1, length = 50)
lines(s, s^3, lty = 2)
df <- data.frame(x, y)
m <- nls(y~a*( 1 + (x^2)/(b^2) )^t, data = df, start = list(a=1,t=0, b=1), trace = T)
> Error in nlsModel(formula, mf, start, wts) :
singular gradient matrix at initial parameter estimates
Can someone help me fit this function to these points? Even if the fit turns out to be poor, the important thing is that the fitting procedure runs on the data.
Thanks everyone.
Because your data are changing randomly, for some situations the value of a is close to zero and your function becomes zero. The curve fit procedure fails at that point. Randomizing the start parameters might work for some situations.
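It may also be worth looking at the start values themselves. With t = 0, the derivative of a*(1 + x^2/b^2)^t with respect to b is a*t*(1 + x^2/b^2)^(t-1) * (-2*x^2/b^3), which is zero for every x, so the gradient matrix at the initial estimates has a zero column. A small sketch to check this numerically (jac is a hypothetical helper, and the x grid is illustrative rather than the random data from the question):

```r
x <- seq(0.05, 1, length.out = 24)

# Analytic Jacobian of a * (1 + x^2 / b^2)^t with respect to (a, b, t)
jac <- function(a, b, t, x) {
  u <- 1 + x^2 / b^2
  cbind(a = u^t,
        b = a * t * u^(t - 1) * (-2 * x^2 / b^3),
        t = a * u^t * log(u))
}

# At the start values from the question (a = 1, t = 0, b = 1)
J0 <- jac(a = 1, b = 1, t = 0, x)
b_col_is_zero <- all(J0[, "b"] == 0)
rank_at_t0 <- qr(J0)$rank

# At a start value with t != 0
J1 <- jac(a = 1, b = 1, t = 1, x)
rank_at_t1 <- qr(J1)$rank
```

Comparing rank_at_t0 with rank_at_t1 shows whether a non-zero start value for t gives a gradient matrix of full column rank.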
A slightly more stable output can be computed using the LM algorithm:
require("minpack.lm")
LMCurveFit <- function(df) {
# The function to be fit
FitFunction <- function(params, x) {
with (
as.list(params), {
a*(1 + x^2/b^2)^t
}
)
}
# Residual
Residual <- function(params, x, y) {
FitFunction(params, x) - y
}
# Sum of squares of residuals
ssqfun <- function(params, x, y) {
sum(Residual(params, x, y)^2)
}
# Normalize the data
x_max = max(df$x)
y_max = max(df$y)
df$x = df$x/x_max
df$y = df$y/y_max
# Define start parameters
a_start = 0.1
b_start = 1.0
t_start = 1.0
param_start = c(a = a_start,
b = b_start,
t = t_start)
# Do LM fit
nls.out <- nls.lm(par = param_start,
fn = Residual,
control = nls.lm.control(nprint=0,
ftol=.Machine$double.eps,
ptol=.Machine$double.eps,
maxfev=10000, maxiter=1024),
x = df$x,
y = df$y)
# Revert scaling
nls.out$par[1] = nls.out$par[1]*y_max
nls.out$par[2] = nls.out$par[2]*x_max
# Get the parameters
params_fit = coef(nls.out)
print(params_fit)
# Compute predicted values
predicted = FitFunction(as.list(params_fit), df$x*x_max)
}
# LM fit
pred_y = LMCurveFit(df)
# Sort by x so the fitted curve is drawn left to right
ord <- order(df$x)
lines(df$x[ord], pred_y[ord])

loop linear regression over samples that contain multiple observations

I have a linear regression model y = 50 + 10x + e, where e is normally distributed.
Every time I fit the model, I'm required to use 20 pairs of x and y values, where x is seq(from = 0.5, to = 10, by = 0.5).
My first task is to fit the model 100 times. In other words, generate 100 samples, where each sample consists of 20 pairs of x and y values.
My second task is to save the intercept and slope of each of the 100 instances of model-fitting.
My un-successful code is below:
linear_model <- c()
intercept <- c()
slope <- c()
for (i in 1:100) {
e <- rnorm(n = 20, mean = 0, sd = 4)
x <- seq(from = 0.5, to = 10, by = 0.5)
y <- 50 + 10 * x + e
linear_model[i] <- lm(formula = y ~ x)
intercept[i] <- summary(object = linear_model[i])$coefficients[1, 1]
slope[i] <- summary(object = linear_model[i])$coefficients[2, 1]
}
You've generated 10 random variables for error but 20 x values so that the dimensions don't match. Either 20 random variables or 10 x values should work.
Below is my trial - note that loops are made only twice (times = 2) while it is 100 in your example.
errs <- lapply(rep(x=20, times=2), rnorm, mean=0, sd=4)
x <- seq(0.5, 10, 0.5)
y <- lapply(errs, function(err) 50 * x + err)
myLM <- function(res) {
mod <- lm(formula = res ~ x)
out <- list(intercept = mod$coefficients[1],
slope = mod$coefficients[2])
out
}
fit <- sapply(y, myLM)
fit
          [,1]        [,2]
intercept 0.005351345 -2.362931
slope     50.13638    50.60856
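Note that the trial above simulates y as 50 * x + err, which is why its intercepts are near 0 and its slopes near 50. For the model stated in the question, y = 50 + 10x + e, repeated 100 times, a compact sketch could look like this (n_sims and coefs are illustrative names):

```r
set.seed(1)
x <- seq(from = 0.5, to = 10, by = 0.5)   # 20 x values, reused in every sample
n_sims <- 100

# Each replicate draws fresh errors, fits the model, and returns both coefficients
coefs <- t(replicate(n_sims, {
  e <- rnorm(length(x), mean = 0, sd = 4)
  y <- 50 + 10 * x + e
  coef(lm(y ~ x))
}))
colnames(coefs) <- c("intercept", "slope")

intercept <- coefs[, "intercept"]
slope     <- coefs[, "slope"]
```

Storing only the coefficients avoids keeping 100 lm objects around; if the fitted models themselves are needed, a list filled with linear_model[[i]] <- lm(...) would be the usual alternative.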
