I'm trying to use bootcov to get clustered standard errors for a regression analysis on panel data. In the analysis, I'm including the cluster variable as a fixed effect to address cluster-level confounding. However, including the cluster variable as a fixed effect causes bootcov to fail ("Warning message:...fit failure in 200 resamples. Might try increasing maxit"). I imagine this is because the coefficient matrix varies across bootstrap replications depending on which clusters are resampled (here's a similar issue and solution in Stata).
Does anyone know a way around this problem? If not, I can try to manually edit the function myself. Unfortunately, I can't use the cluster option in robcov because my analysis actually requires the Glm function rather than the ols function. Furthermore, I want to stick with the rms package because my analysis involves restricted cubic splines, which rms makes easy to visualize and test via ANOVA (although I'm open to other suggestions).
Thanks for the help. I copied an example below.
#load package
library(rms)
#make df
x <- rnorm(1000)
y <- sample(c(1:100),1000, replace=TRUE)
z <- factor(rep(1:50, 20))
df <- data.frame(y,x,z)
#set datadist
dd <- datadist(df)
options(datadist='dd')
#works when cluster variable isn't included as fixed effect in regression
reg <- ols(x ~ y, df, x=TRUE, y=TRUE)
reg_clus <- bootcov(reg, df$z)
summary(reg_clus)
#doesn't work when cluster variable included as fixed effect in regression
reg2 <- ols(x ~ y + z, df, x=TRUE, y=TRUE)
reg_clus2 <- bootcov(reg2, df$z)
summary(reg_clus2)
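One fallback I could try in the meantime is a hand-rolled cluster bootstrap. This is only a minimal sketch, not part of rms: it resamples whole clusters, drops unused factor levels before refitting, and matches coefficients by name, so replicates that miss some clusters simply contribute NA for those dummies.
# Sketch of a manual cluster bootstrap for reg2 (the same loop should work for Glm)
set.seed(1)
B <- 200
coef_names <- names(coef(reg2))
boot_coefs <- matrix(NA_real_, nrow = B, ncol = length(coef_names),
                     dimnames = list(NULL, coef_names))
clusters <- unique(df$z)
for (b in seq_len(B)) {
  samp <- sample(clusters, replace = TRUE)                        # resample whole clusters
  df_b <- do.call(rbind, lapply(samp, function(cl) df[df$z == cl, ]))
  df_b$z <- droplevels(df_b$z)                                    # avoid empty-level singularities
  fit_b <- try(ols(x ~ y + z, data = df_b), silent = TRUE)
  if (!inherits(fit_b, "try-error")) {
    cf <- coef(fit_b)
    boot_coefs[b, names(cf)] <- cf                                # fill by coefficient name
  }
}
# cluster-bootstrap SEs, ignoring replicates where a coefficient was absent
apply(boot_coefs, 2, sd, na.rm = TRUE)
Note this only gives bootstrap SEs; you lose bootcov's machinery (e.g. summary and anova on the updated fit).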
I am trying to plot my svm model.
library(foreign)
library(e1071)
x <- read.arff("contact-lenses.arff")
#alt: x <- read.arff("http://storm.cis.fordham.edu/~gweiss/data-mining/weka-data/contact-lenses.arff")
model <- svm(`contact-lenses` ~ . , data = x, type = "C-classification", kernel = "linear")
The contact-lenses arff is one of the data files that ships with Weka.
However, I now run into an error when trying to plot the model:
plot(model, x)
Error in plot.svm(model, x) : missing formula.
The problem is that your model has multiple covariates. plot() will only run automatically if your data= argument has exactly three columns (one of which is a response). For example, in the ?plot.svm help page, you can call
data(cats, package = "MASS")
m1 <- svm(Sex~., data = cats)
plot(m1, cats)
Since you can only show two dimensions on a plot, you need to specify what to use for x and y when there are more than two predictors to choose from:
cplus <- cats
cplus$Oth <- rnorm(nrow(cplus))
m2 <- svm(Sex ~ ., data = cplus)
plot(m2, cplus) # error: missing formula
plot(m2, cplus, Bwt ~ Hwt) # OK
plot(m2, cplus, Hwt ~ Oth) # OK
So that's why you're getting the "Missing Formula" error.
There is another catch as well: plot.svm will only plot continuous variables along the x and y axes, and the contact-lenses data.frame has only categorical variables. As far as I can tell, plot.svm simply does not support this case, so you'll have to decide how to summarize that information in your own visualization.
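For instance (a sketch, not part of plot.svm; it assumes the model and x objects from the question), you could tabulate fitted classes against the observed ones, or against a single categorical attribute such as astigmatism:
# confusion table of observed vs. fitted classes
pred <- predict(model, x)
table(observed = x$`contact-lenses`, predicted = pred)
# fitted class broken down by one attribute
table(x$astigmatism, pred)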
I am trying to streamline my code to avoid for loops, but I am having a hard time extracting p-values and standard errors for the coefficients after running my Cox proportional hazards code. My code is as follows:
library(survival)
#Generate Data
x = matrix(rbinom(10000,1,.5),ncol=100)
y = rexp(ncol(x),.02)
censor = rbinom(ncol(x),1,.5)
event = ifelse(censor==1,0,1)
#Fit a coxph model to each row of x (each row is passed as the inner x)
ans = apply(x,1,function(x,y,event)coxph(Surv(y,event)~x),y=y,event=event)
#Extract the coefficients from ans
coef = unname(sapply(ans,function(x)x$coef))
So as you can see, I am able to extract the coefficients from the object ans, but I cannot extract the p-values and standard errors. Is there an easy way to do this from my ans object? Or a simple way to modify this code to do it?
You can just add these two lines of code to get the p-values and the standard errors.
#column 5 of coxph's summary coefficient matrix is Pr(>|z|)
pValues <- sapply(1:length(ans), function(x) {summary(ans[[x]])$coefficients[5]})
#column 3 is se(coef)
sd <- sapply(1:length(ans), function(x) {summary(ans[[x]])$coefficients[3]})
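Equivalently, you can sapply over ans directly and index the coefficient matrix by column name, which is a bit more defensive than numeric indices (these are the standard column names of a coxph summary):
pValues <- sapply(ans, function(m) summary(m)$coefficients[, "Pr(>|z|)"])
se <- sapply(ans, function(m) summary(m)$coefficients[, "se(coef)"])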
Does anyone know how to get stargazer to display clustered SEs for lm models? (And the corresponding F-test?) If possible, I'd like to follow an approach similar to computing heteroskedasticity-robust SEs with sandwich and popping them into stargazer as in http://jakeruss.com/cheatsheets/stargazer.html#robust-standard-errors-replicating-statas-robust-option.
I'm using lm to get my regression models, and I'm clustering by firm (a factor variable that I'm not including in the regression models). I also have a bunch of NA values, which makes me think multiwayvcov is going to be the best package (see the bottom of landroni's answer here - Double clustered standard errors for panel data - and also https://sites.google.com/site/npgraham1/research/code). Note that I do not want to use plm.
Edit: I think I found a solution using the multiwayvcov package...
library(lmtest) # load packages
library(multiwayvcov)
library(stargazer)
data(petersen) # load data
petersen$z <- petersen$y + 0.35 # create new variable
ols1 <- lm(y ~ x, data = petersen) # create models
ols2 <- lm(y ~ x + z, data = petersen)
cl.cov1 <- cluster.vcov(ols1, petersen$firmid) # cluster-robust SEs for ols1
cl.robust.se.1 <- sqrt(diag(cl.cov1))
cl.wald1 <- waldtest(ols1, vcov = cl.cov1)
cl.cov2 <- cluster.vcov(ols2, petersen$firmid) # cluster-robust SEs for ols2
cl.robust.se.2 <- sqrt(diag(cl.cov2))
cl.wald2 <- waldtest(ols2, vcov = cl.cov2)
stargazer(ols1, ols2, se=list(cl.robust.se.1, cl.robust.se.2), type = "text") # create table in stargazer
The only downside of this approach is that you have to manually re-enter the F-statistics from the waldtest() output for each model.
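One way around the manual step (a sketch using stargazer's add.lines argument; the formatting is up to you) is to pull the F statistics out of the waldtest() objects, which are anova-style tables with the F statistic in row 2:
f1 <- sprintf("%.2f", cl.wald1$F[2])
f2 <- sprintf("%.2f", cl.wald2$F[2])
stargazer(ols1, ols2, se = list(cl.robust.se.1, cl.robust.se.2),
          add.lines = list(c("Clustered Wald F", f1, f2)), type = "text")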
Using the packages lmtest and multiwayvcov causes a lot of unnecessary overhead. The easiest way to compute clustered standard errors in R is a modified summary() function, which adds an additional parameter, called cluster, to the conventional summary(). The following post describes how to use it to compute clustered standard errors in R:
https://economictheoryblog.com/2016/12/13/clustered-standard-errors-in-r/
You can easily use this summary function to obtain clustered standard errors and add them to the stargazer output. Based on your example, you could simply use the following code:
# estimate models (assumes y, x, and a cluster_id variable exist in your workspace)
ols1 <- lm(y ~ x)
# summary with cluster-robust SEs (requires sourcing the modified summary() from the post above)
summary(ols1, cluster="cluster_id")
# create table in stargazer
stargazer(ols1, se=list(coef(summary(ols1, cluster = c("cluster_id")))[, 2]), type = "text")
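If you would rather avoid sourcing a blog-post function, a similar result can be had with vcovCL() from the sandwich package (available in recent sandwich versions); this is a sketch using the petersen data loaded in the question's edit:
library(sandwich)
ols_p <- lm(y ~ x, data = petersen)
cl_se <- sqrt(diag(vcovCL(ols_p, cluster = petersen$firmid))) # SEs clustered by firm
stargazer(ols_p, se = list(cl_se), type = "text")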
I would recommend the lfe package, which is much more powerful than lm. You can easily specify the cluster in the regression model:
library(lfe)
ols1 <- felm(y ~ x + z | 0 | 0 | firmid, data = petersen)
summary(ols1)
stargazer(ols1, type="html")
The clustered standard errors are produced automatically, and stargazer reports them accordingly.
By the way (allow me to do more marketing), felm is highly recommended for micro-econometric analysis. You can specify fixed effects and IVs easily using felm. The syntax is:
ols1 <- felm(y ~ x + z|FixedEffect1 + FixedEffect2 | IV | Cluster, data = Data)
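For instance, with the petersen data from above (firmid and year are its identifier columns), a two-way fixed-effects model clustered by firm would look like this sketch:
ols_fe <- felm(y ~ x | firmid + year | 0 | firmid, data = petersen)
summary(ols_fe) # reported SEs are clustered by firmid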
I was trying to predict the future values of a sample using polynomial regression in R. The y values within the sample form a wave pattern.
For example
x = 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16
y= 1,2,3,4,5,4,3,2,1,0,1,2,3,4,5,4
But when the graph is plotted for future values, the resulting y values are completely different from what was expected: instead of a wave pattern, I get a graph where the y values keep increasing.
futurY = 17,18,19,20,21,22
I tried different degrees of polynomial regression, but the predicted results for futurY were drastically different from what was expected.
Following is the sample R code used to get the results:
dfram <- data.frame('x'=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16))
dfram$y <- c(1,2,3,4,5,4,3,2,1,0,1,2,3,4,5,4)
plot(dfram$x,dfram$y,type="l", lwd=3)
pred <- data.frame('x'=c(17,18,19,20,21,22))
myFit <- lm(y ~ poly(x,5), data=dfram)
newdata <- predict(myFit, pred)
print(newdata)
plot(pred[,1],data.frame(newdata)[,1],type="l",col="red", lwd=3)
Is this the correct technique to be used for predicting the unknown future y values OR should I be using other techniques like forecasting?
# Reproducing your data frame
dfram <- data.frame("x" = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16),
"y" = c(1,2,3,4,5,4,3,2,1,0,1,2,3,4,5,4))
From your graph I read off the phase and period of the signal. There are better ways of estimating these automatically (see the sketch after the next code block).
# Phase and period
fase = 1
per = 10
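For example, one way to estimate the period automatically is the raw periodogram; this is only a sketch, and with just 16 points the frequency grid is coarse, so treat the result as a starting value:
# dominant frequency from the periodogram (stats::spec.pgram)
sp <- spec.pgram(dfram$y, plot = FALSE)
per_est <- 1 / sp$freq[which.max(sp$spec)] # rough period in sampling steps
per_est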
In the linear model formula I've encoded the triangular-wave equations.
fit <- lm(y ~ I((((trunc((x-fase)/(per/2))%%2)*2)-1) * (x-fase)%%(per/2))
+ I((((trunc((x-fase)/(per/2))%%2)*2)-1) * ((per/2)-((x-fase)%%(per/2))))
,data=dfram)
# Predict the old data
p_olddata <- predict(fit,type="response")
# Predict the new data
newdata <- data.frame('x'=c(17,18,19,20,21,22))
p_newdata <- predict(fit,newdata,type="response")
# Plotting old and new data
plot(x=c(dfram$x,newdata$x),
y=c(p_olddata,p_newdata),
col=c(rep("blue",length(p_olddata)),rep("green",length(p_newdata))),
xlab="x",
ylab="y")
lines(dfram)
In the plot, the black line is the original signal, the blue circles are the predictions for the original points, and the green circles are the predictions for the new data.
The graph shows a perfect fit because there is no noise in the data. A real dataset will have noise, so the fit will not look as clean as this.