I want to fit a Fine-Gray competing-risks model that includes an offset. In other types of models I am used to simply adding offset(x) to the formula, which includes x with its coefficient fixed at 1.
I tried to do the same using the FGR function from the riskRegression package. I didn't get a warning message, but I then noticed that the coefficients of the other variables were exactly the same with and without offset(x).
Example:
#install.packages("riskRegression")
library(riskRegression)
matrix <- matrix(c(3,6,3,2,5,4,7,2,8,2,                         # x
                   0.8,0.6,0.4,0.25,0.16,0.67,0.48,0.7,0.8,0.78, # y
                   60,55,61,62,70,49,59,63,62,64,                # z
                   15,16,18,12,16,13,19,12,15,14,                # a
                   0,2,1,0,1,1,0,1,2,0,                          # event
                   345,118,225,90,250,894,128,81,530,268),       # time
                 nrow=10, ncol=6)
df <- data.frame(matrix)
colnames(df) <- c("x","y","z", "a","event","time")
fit <- FGR(Hist(time,event)~ offset(x)+a+y+z, data=df, cause=1)
fit
fit2 <- FGR(Hist(time,event)~ a+y+z, data=df, cause=1)
fit2
If you run this script, you can see that the coefficients of a, y and z do not change, and no warning is given that the offset cannot be used (so apparently offset(x) is simply ignored).
Does anybody know a way to include x as an offset (i.e. with its coefficient fixed at 1) in FGR? (Edit: or another way to calculate the correct coefficients for a, y and z with x held fixed?)
You can use the survival package, which does accept offsets: just wrap the variable you want as an offset in offset(var). I set the model below to model event 1. Note, though, that coxph() on Surv(time, event==1) fits a cause-specific hazards model rather than a Fine-Gray subdistribution model; a Fine-Gray sketch using finegray() follows the code.
library(survival)
# df as constructed in the question above
coxph(Surv(time,event==1)~ offset(x)+a+y+z, data=df)
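For a true Fine-Gray (subdistribution hazard) model with an offset, here is a minimal sketch using survival's finegray() helper (the factor labels below are my own choice):
# finegray() expands the data for the subdistribution hazard of one cause;
# the event variable must be a factor whose first level is censoring.
df$event <- factor(df$event, levels = 0:2,
                   labels = c("censored", "cause1", "cause2"))
fgdat <- finegray(Surv(time, event) ~ ., data = df, etype = "cause1")
# A weighted Cox fit on the expanded data is the Fine-Gray model, offset included.
coxph(Surv(fgstart, fgstop, fgstatus) ~ offset(x) + a + y + z,
      weights = fgwt, data = fgdat)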
I'm running into some problems while running plm regressions on my panel dataset. Basically, I have to drop one year from my data and also all observations where a certain variable is zero. I made a reproducible example using a dataset from the AER package.
library(AER)
library(plm)
data("Grunfeld", package = "AER")
View(Grunfeld)
#Here I randomly set some observations of the third variable (capital) to zero, to mimic my dataset
set.seed(1) # added so the randomization is reproducible
for (i in 1:220) {
  x <- rnorm(10,0,1)
  if (mean(x) >= 0) {
    Grunfeld[i,3] <- 0
  }
}
View(Grunfeld)
#First Method
#This is how I was originally manipulating my data and running my regression
panel <- Grunfeld
dd <-pdata.frame(panel, index = c('firm', 'year'))
dd <- dd[dd$year!=1935, ]
dd <- dd[dd$capital !=0, ]
ols_model_2 <- plm(log(value) ~ (capital), data=dd)
summary(ols_model_2)
#However, I couldn't plot the variables of this dataset, because pdata.frame columns are not plain vectors. So I tried another way:
#Second Method
panel <- panel[panel$year!= 1935, ]
panel <- panel[panel$capital != 0,]
ols_model <- plm(log(value) ~ log(capital), data=panel, index = c('firm','year'))
summary(ols_model)
#But this gave extremely different results for the OLS regression!
To my understanding, both approaches should have yielded the same output in the OLS regression. Now I'm afraid my entire analysis is wrong, because I was doing it the first way. Could anyone explain to me what is happening?
Thanks in advance!
You are running two different models, so I am not sure why you would expect the results to be the same.
Your first model is:
ols_model_2 <- plm(log(value) ~ (capital), data=dd)
While the second is:
ols_model <- plm(log(value) ~ log(capital), data=panel, index = c('firm','year'))
As you can see from the model summaries, both are a "Oneway (individual) effect Within Model". In the first one you don't specify the index, since dd is already a pdata.frame object; in the second you do, because panel is a plain data.frame. However, this makes no difference at all.
The difference is that the first model uses capital in levels, while the second uses log(capital).
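A quick check (just a sketch, using the objects defined above): fitting the second model's formula on dd should reproduce ols_model, since both use the same rows and the same transformation.
ols_model_3 <- plm(log(value) ~ log(capital), data=dd)
summary(ols_model_3) # should match summary(ols_model)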
As a side note, dropping zero observations is often very problematic. If you do that, make sure you also try alternative ways of dealing with zeros, and see how much your results change. You can get started here: https://stats.stackexchange.com/questions/1444/how-should-i-transform-non-negative-data-including-zeros
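For instance, one common alternative (only a sketch, not a recommendation) is log1p(), which is defined at zero and therefore keeps the zero observations:
# log1p(capital) = log(1 + capital), so no rows need to be dropped
dd0 <- pdata.frame(Grunfeld, index=c("firm","year"))
dd0 <- dd0[dd0$year != 1935, ]
ols_model_alt <- plm(log(value) ~ log1p(capital), data=dd0)
summary(ols_model_alt)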
I was looking for a way to compute clustered standard errors based on ID-Year clusters (each ID-Year combination is treated as its own cluster). I found that no such function exists for plm objects, but I had an idea and would like to know whether it makes sense:
In my plm formula, let's say I have
p <- plm(y ~ x + factor(year), df, model="within", index=c("ID","Date"), effect="individual")
pce <- coeftest(p, vcov=vcovHC(p, method = "arellano", type="sss",cluster="group"))
Could I simply assign a LSDV model with an index which simply represents ID-Year combinations like this:
df$IDYEAR <- paste(df$ID,df$YEAR)
p1 <- plm(y ~ x + factor(year) + factor(ID), df, model="pooling", index="IDYEAR")
p1ce <- coeftest(p1, vcov=vcovHC(p1, method = "arellano", type="sss",cluster="group"))
This should estimate almost exactly the same model while tricking plm into thinking that the group level is IDYEAR, so that I get the right standard errors. Is my thinking correct here?
I think a minor adjustment, double clustering via a vcovDC-style function, should do:
vcovDC <- function(x, ...){
  # double clustering (Cameron/Gelbach/Miller): cluster by group and by time,
  # then subtract the White (heteroskedasticity-only) matrix to avoid
  # double counting
  vcovHC(x, cluster="group", ...) + vcovHC(x, cluster="time", ...) -
    vcovHC(x, method="white1", ...)
}
Pretty neat explanation here.
This should work for your LSDV example, too.
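For example (a sketch reusing the names from the question):
# hypothetical usage: double-clustered standard errors for the LSDV model p1
p1ce_dc <- coeftest(p1, vcov=vcovDC(p1, type="sss"))
p1ce_dc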
My model includes one response variable, five predictors and one interaction term for predictor_1 and predictor_2. I would like to plot partial residual plots for every predictor variable, which I would normally do using the crPlots function from the car package. Unfortunately the function complains that it doesn't work with models that include interaction terms.
Is there another way of doing what I want?
EDIT: I created a small example illustrating the problem:
require(car)
R <- c(0.53,0.60,0.64,0.52,0.75,0.66,0.71,0.49,0.52,0.59)
P1 <- c(3.1,1.8,1.8,1.8,1.8,3.2,3.2,2.8,3.1,3.3)
P2 <- c(2.1,0.8,0.3,0.5,0.4,1.3,0.5,1.2,1.6,2.1)
lm.fit1 <- lm(R ~ P1 + P2)
summary(lm.fit1)
crPlots(lm.fit1) # works fine
lm.fit2 <- lm(R ~ P1*P2)
summary(lm.fit2)
crPlots(lm.fit2) # not available
Another way to do this is to put the interaction term in as a separate variable (which avoids hacking the code for crPlot(...)).
df <- data.frame(R,P1,P2,P1.P2=P1*P2)
lm.fit1 <- lm(R ~ ., df)
summary(lm.fit1)
crPlots(lm.fit1)
Note that summary(lm.fit1) yields exactly the same result as summary(lm(R~P1*P2,df)).
I must admit I'm not that familiar with partial residual plots, so I'm not entirely sure what their proper interpretation should be given an interaction term. But basically, the equivalent of
crPlot(lm.fit1, "P1")
is
x <- predict(lm.fit1, type="terms", terms="P1")
y <- residuals(lm.fit1, type="partial")[,"P1"]
plot(x, y)
abline(lm(y~x), col="red", lty=2)
loessLine(x, y, col="green3", log.x=FALSE, log.y=FALSE, smoother.args=list())
so really, there's no reason the same idea couldn't work with an interaction term as well. We just leave the partial contribution a variable makes through the interaction as a separate entity and focus on its non-interaction contribution. So what I'm going to do is take out the check for the interaction term so that we can use the function. Seeing that
body(car:::crPlot.lm)[[11]]
# if (any(attr(terms(model), "order") > 1)) {
# stop("C+R plots not available for models with interactions.")
# }
we can copy and modify the function to create a new one without the check
crPlot2 <- car:::crPlot.lm
body(crPlot2) <- body(crPlot2)[-11]
environment(crPlot2) <- asNamespace("car")
And then we can run
layout(matrix(1:2, ncol=2))
crPlot2(lm.fit2, "P1")
crPlot2(lm.fit2, "P2")
to get the two partial residual plots.
I'm sure the authors had a good reason for not incorporating models with interaction terms so use this hack at your own risk. It's just unclear to me what should happen to the residual from the interaction term when making the plot.
I have been looking forever for how to do this in R and cannot find anything! Basically, I want to shrink predictors using a linear mixed model (LMM). So I have a set of fixed effects, X, and a set of predictors, Z, that I want to put a random effect on, so the model is
Y = X*beta + Z*u + e
where u ~ N(0, sigma_u^2 * I) and e ~ N(0, sigma_e^2 * I). I thought I could do this in lme with
fit <- lme(Y~X,random=pdIdent(~-1+Z))
but I only get the error:
Error in getGroups.data.frame(dataMix, groups) :
invalid formula for groups
Any help on this issue is much appreciated.
Have you tried giving lme an explicit grouping factor? lme requires a grouping structure, and a single constant group puts all observations into one random-effects block:
N <- length(Y) # sample size
group <- rep(1, N)
fit <- lme(Y~X, random=list(group=pdIdent(~-1+Z)))
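A self-contained sketch of the same trick with simulated data (all names below are my own; this follows the standard pdIdent ridge construction):
library(nlme)
set.seed(42)
n <- 60
X <- rnorm(n)
Z <- matrix(rnorm(n*4), n, 4, dimnames=list(NULL, paste0("z", 1:4)))
Y <- as.numeric(0.5 + 1.5*X + Z %*% c(0.3,-0.2,0.4,0.1) + rnorm(n, sd=0.5))
dat <- data.frame(Y=Y, X=X, Z=Z, group=1) # Z expands to columns Z.z1..Z.z4
# one constant group => u ~ N(0, sigma_u^2 * I) on all Z coefficients
fit <- lme(Y ~ X, data=dat,
           random=list(group=pdIdent(~ -1 + Z.z1 + Z.z2 + Z.z3 + Z.z4)))
summary(fit)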
In R I use nls to do a nonlinear least-squares fit. How then do I plot the model function using the values of the coefficients that the fit provided?
(Yes, this is a very naive question from an R relative newbie.)
Using the first example from ?nls and following the example I pointed you to line by line achieves the following:
#This is just our data frame
DNase1 <- subset(DNase, Run == 1)
DNase1$lconc <- log(DNase1$conc)
#Fit the model
fm1DNase1 <- nls(density ~ SSlogis(lconc, Asym, xmid, scal), DNase1)
#Plot the original points
# first argument is the x values, second is the y values
plot(DNase1$lconc,DNase1$density)
#This adds to the already created plot a line
# once again, first argument is x values, second is y values
lines(DNase1$lconc,predict(fm1DNase1))
The predict method for an nls object automatically returns the fitted y values. Alternatively, you can add a step and do
yFitted <- predict(fm1DNase1)
and pass yFitted as the second argument to lines instead, which draws the same fitted line over the scatterplot.
Or, if you want a "smooth" curve, simply repeat this but evaluate the function at more points:
r <- range(DNase1$lconc)
xNew <- seq(r[1],r[2],length.out = 200)
yNew <- predict(fm1DNase1,list(lconc = xNew))
plot(DNase1$lconc,DNase1$density)
lines(xNew,yNew)
coef(x) returns the coefficients of a fitted model x. For example:
model <- nls(y ~ a + b*x^k, my.data, start=list(a=0, b=1, k=1))
plot(y ~ x, my.data)
a <- coef(model)[1]
b <- coef(model)[2]
k <- coef(model)[3]
x <- 1:10
lines(x, a + b*x^k, col="red")
I know what you want (I'm a Scientist). This isn't it, but it at least shows how to use curve to plot your fitted function over any range, and the curve will be smooth. Using the same data set as above:
nonlinFit <- nls(density ~ a - b*exp(-c*conc), data = DNase1, start = list(a=1, b=1, c=1) )
fitFnc <- function(x) predict(nonlinFit, list(conc=x))
curve(fitFnc, from=.5, to=10)
or,
curve(fitFnc, from=8.2, to=8.4)
or,
curve(fitFnc, from=.1, to=50) # well outside the data range
or whatever (without setting up a sequence of evaluation points first).
I'm a rudimentary R programmer, so I don't know how to implement (elegantly) something like Mathematica's ReplaceAll (/.), which one would use to replace occurrences of the symbolic parameters in the model with the fitted parameters. This first step works, although it looks horrible:
myModel <- "a - b*exp(-c*conc)"
nonlinFit <- nls(as.formula(paste("density ~", myModel)), data = DNase1, start = list(a=1, b=1, c=1) )
It leaves you with a separate "model" (as a character string) that you might be able to combine with the fitted parameters. Doing it cleanly (not digging out a, b, c one by one) would presumably use nonlinFit directly; I'm not sure how, though.
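Here is a rough sketch of such a substitution (my own attempt; fitted_expr and fitFnc2 are made-up names): parse the model string once, then evaluate it with the fitted coefficients supplied as a list, much like /. in Mathematica.
# evaluate the parsed model expression with a, b, c taken from coef(nonlinFit)
fitted_expr <- parse(text=myModel)[[1]]
fitFnc2 <- function(conc) eval(fitted_expr, c(as.list(coef(nonlinFit)), list(conc=conc)))
curve(fitFnc2, from=.5, to=10)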
The function "curve" will plot functions for you.