I am using a spatial lag and durbin regression models and I would like to estimate the standard errors of the impacts. Any ideas on how to do this?
Reproducible example below using a durbin model
# data
data(oldcol)
# neighbours lists
lw <- nb2listw(COL.nb, style="W")
# regression
fit_durb <- lagsarlm(CRIME ~ INC + HOVAL, data=COL.OLD, type="Durbin",
listw=lw, method="eigen",
zero.policy=T, na.action="na.omit")
# power traces
W <- as(lw, "CsparseMatrix")
trMC <- trW(W, type="MC", listw = lw)
# Impacts
imp <- summary(impacts(fit_durb, tr=trMC, R=1000), zstats=TRUE, short=TRUE)
You should be able to use the MC samples stored in the imp object to get the standard errors, for instance:
test1<-lapply(imp$sres, function(x){apply(x, 2, mean)})
test2<-lapply(imp$sres, function(x){apply(x, 2, sd)}
test1$direct/test2$direct
give the same z values as returned by imp
Related
I am trying to understand the way the e1071 package obtains its SVM predictions in a two-class classification framework. Consider the following toy example.
library(mvtnorm)
library(e1071)
n <- 50
### Gaussians
eps <- 0.05
data1 <- as.data.frame(rmvnorm(n, mean = c(0,0), sigma=diag(rep(eps,2))))
data2 <- as.data.frame(rmvnorm(n, mean = c(1,1), sigma=diag(rep(eps,2))))
### Train Model
data_df <- as.data.frame(rbind(data1, data2))
data <- as.matrix(data_df)
data_df$y <- as.factor(c(rep(-1,n), rep(1,n)))
svm <- svm(y ~ ., data = data_df, kernel = "radial", gamma=1, type = "C-classification", scale = FALSE)
Having trained the SVM, I would like to write a function that uses the coefficients and the intercept to predict on a new data point.
Recall that the kernel trick guarantees that we can write the prediction on a new point as the weighted sum of the kernel evaluated at the support vectors and the new point itself (plus some intercept).
In other words: How to combine the following three terms
supportv <- svm$SV
coefs <- svm$coefs
intercept <- svm$rho
to get the prediction associated with the corresponding SVM?
If this is not possible, or too complicated, I would also switch to a different package.
Two conceptually plausible methods of retrieving in-sample predictions (or "conditional expectations") of y[t] given y[t-1] from a bsts model yield different results, and I don't understand why.
One method uses the prediction errors returned by bsts (defined as e=y[t] - E(y[t]|y[t-1]); source: https://rdrr.io/cran/bsts/man/one.step.prediction.errors.html):
library(bsts)
get_yhats1 <- function(fit){
# One step prediction errors defined as e=y[t] - yhat (source: )
# Recover yhat by y-e
bsts.pred.errors <- bsts.prediction.errors(fit, burn=SuggestBurn(0.1, fit))$in.sample
predictions <- t(apply(bsts.pred.errors, 1, function(e){fit$original.series-e}))
return(predictions)
}
Another sums the contributions of all model component at time t.
get_yhats2 <- function(fit){
burn <- SuggestBurn(0.1, fit)
X <- fit$state.contributions
niter <- dim(X)[1]
ncomp <- dim(X)[2]
nobs <- dim(X)[3]
# initialize final fit/residuals matrices with zeros
predictions <- matrix(data = 0, nrow = niter - burn, ncol = nobs)
p0 <- predictions
comps <- seq_len(ncomp)
for (comp in comps) {
# pull out the state contributions for this component and transpose to
# a niter x (nobs - burn) array
compX <- X[-seq_len(burn), comp, ]
# accumulate the predictions across each component
predictions <- predictions + compX
}
return(predictions)
}
Fit a model:
## Air passengers data
data("AirPassengers")
# 11 years, monthly data (timestep=monthly) --> 132 observations
Y <- stats::window(AirPassengers, start=c(1949,1), end=c(1959,12))
y <- log(Y)
ss <- AddLocalLinearTrend(list(), y)
ss <- AddSeasonal(ss, y, nseasons=12, season.duration=1)
bsts.model <- bsts(y, state.specification=ss, niter=500, family='gaussian')
Compute and compare predictions using each of the functions
p1 <- get_yhats1(bsts.model)
p2 <- get_yhats2(bsts.model)
# Compare predictions for t=1:5, first MCMC iteration:
p1[1,1:5]; p2[1,1:5]
I'm the author of bsts.
The 'prediction errors' in bsts come from the filtering distribution. That is, they come from p(state | past data). The state contributions come from the smoothing distribution, i.e. p(state | all data). The filtering distribution looks backward in time, while the smoothing distribution looks both forward and backward. One typically needs the filtering distribution while using a fitted model, and the smoothing distribution while fitting the model in the first place.
I am trying to recover in-sample predictions (fitted values) from a bsts model with a specified poisson response using the bsts package in R. The following results in an error: Prediction errors are not supported for Poisson or logit models.
data("AirPassengers")
# 11 years, monthly data (timestep=monthly) --> 132 observations
Y <- stats::window(AirPassengers, start=c(1949,1), end=c(1959,12))
y <- log10(Y)
ss <- AddLocalLinearTrend(list(), y)
ss <- AddSeasonal(ss, y, nseasons=12, season.duration=1)
bsts.model <- bsts(Y, state.specification=ss, niter=150, family='poisson')
bsts.prediction.errors(bsts.model)
Is there a way to retrieve predictions on model-training data with a poisson model in bsts?
One way to do it is to extract the contribution of each model component at time t and sum them.
get_yhats2 <- function(fit){
burn <- SuggestBurn(0.1, fit)
X <- fit$state.contributions
niter <- dim(X)[1]
ncomp <- dim(X)[2]
nobs <- dim(X)[3]
# initialize final fit/residuals matrices with zeros
predictions <- matrix(data = 0, nrow = niter - burn, ncol = nobs)
p0 <- predictions
comps <- seq_len(ncomp)
for (comp in comps) {
# pull out the state contributions for this component and transpose to
# a niter x (nobs - burn) array
compX <- X[-seq_len(burn), comp, ]
# accumulate the predictions across each component
predictions <- predictions + compX
}
return(predictions)
}
get_yhats2(bsts.model)
But I also posted here, showing that this method didn't necessarily match expectations I had even in the Gaussian case.
I am using spgwr::ggwr() to fit generalized geographically weighted regression with Poisson model and log-link function. The results provide local coefficient estimates, but i am missing how to get their standard errors (or t statistics) to compute pseudo p-values.
Below is a toy example using SpatialEpi::NYleukemia dataset:
library(SpatialEpi)
library(spgwr)
## Load data
data(NYleukemia)
population <- NYleukemia$data$population
cases <- ceiling(NYleukemia$data$cases * 100)
centroids <- latlong2grid(NYleukemia$geo[, 2:3])
# data frame
nyleuk <- data.frame(centroids, cases, population)
# set coordinates as vector
coordny <- cbind(centroids[,1],centroids[,2])
# set a kernel bandwidth
bw <- 0.5
# fit ggwr()
m_pois <- ggwr(cases ~ offset(log(population)),
data = nyleuk, gweight = gwr.Gauss,
adapt = bw, family = poisson(link="log"),
type="working", coords = coordny)
# returns spatial point with coefficients
# but no standard errors :(
head(m_pois$SDF#data)
Is there any way i can get standard errors of the coefficients?
Thanks!
You may obtain standard errors from local coefficients running the function GWmodel::ggwr.basic. Function spgwr::ggwr() returns coefficients but no standard errors.
I am trying to write my own gradient boosting algorithm. I understand there are existing packages like gbm and xgboost, but I wanted to understand how the algorithm works by writing my own.
I am using the iris data set, and my outcome is Sepal.Length (continuous). My loss function is mean(1/2*(y-yhat)^2) (basically the mean squared error with 1/2 in front), so my corresponding gradient is just the residual y - yhat. I'm initializing the predictions at 0.
library(rpart)
data(iris)
#Define gradient
grad.fun <- function(y, yhat) {return(y - yhat)}
mod <- list()
grad_boost <- function(data, learning.rate, M, grad.fun) {
# Initialize fit to be 0
fit <- rep(0, nrow(data))
grad <- grad.fun(y = data$Sepal.Length, yhat = fit)
# Initialize model
mod[[1]] <- fit
# Loop over a total of M iterations
for(i in 1:M){
# Fit base learner (tree) to the gradient
tmp <- data$Sepal.Length
data$Sepal.Length <- grad
base_learner <- rpart(Sepal.Length ~ ., data = data, control = ("maxdepth = 2"))
data$Sepal.Length <- tmp
# Fitted values by fitting current model
fit <- fit + learning.rate * as.vector(predict(base_learner, newdata = data))
# Update gradient
grad <- grad.fun(y = data$Sepal.Length, yhat = fit)
# Store current model (index is i + 1 because i = 1 contain the initialized estiamtes)
mod[[i + 1]] <- base_learner
}
return(mod)
}
With this, I split up the iris data set into a training and testing data set and fit my model to it.
train.dat <- iris[1:100, ]
test.dat <- iris[101:150, ]
learning.rate <- 0.001
M = 1000
my.model <- grad_boost(data = train.dat, learning.rate = learning.rate, M = M, grad.fun = grad.fun)
Now I calculate the predicted values from my.model. For my.model, the fitted values are 0 (vector of initial estimates) + learning.rate * predictions from tree 1 + learning rate * predictions from tree 2 + ... + learning.rate * predictions from tree M.
yhats.mymod <- apply(sapply(2:length(my.model), function(x) learning.rate * predict(my.model[[x]], newdata = test.dat)), 1, sum)
# Calculate RMSE
> sqrt(mean((test.dat$Sepal.Length - yhats.mymod)^2))
[1] 2.612972
I have a few questions
Does my gradient boosting algorithm look right?
Did I calculate the predicted values yhats.mymod correctly?
Yes this looks correct. At each step you are fitting to the psuedo-residuals, which are computed as the derivative of loss with respect to the fit. You have correctly derived this gradient at the start of your question, and even bothered to get the factor of 2 right.
This also looks correct. You are aggregating across the models, weighted by learning rate, just as you did during training.
But to address something that was not asked, I noticed that your training setup has a few quirks.
The iris dataset is split equally between 3 species (setosa, versicolor, virginica) and these are adjacent in the data. Your training data has all of the setosa and versicolor, while the test set has all of the virginica examples. There is no overlap, which will lead to out-of-sample problems. It is preferable to balance your training and test sets to avoid this.
The combination of learning rate and model count looks too low to me. The fit converges as (1-lr)^n. With lr = 1e-3 and n = 1000 you can only model 63.2% of the data magnitude. That is, even if every model predicts every sample correctly, you would be estimating 63.2% of the correct value. Initializing the fit with an average, instead of 0s, would help since then the effect is a regression to the mean instead of just a drag.