I built two datasets based on the following equations.
spend1 <- runif(100,5,40)
spend2 <- runif(100,30,45)
rev1 <- 2*spend1 + 3*spend2
total_spend<- spend1 + spend2
rev2 <- runif(100,75,120)
total_rev<- rev1 + rev2
After generating these figures, I fit a regression model and then tried to recover the original coefficients from it.
model1 <- lm(total_rev~total_spend + rev2)
yhat <- total_spend*as.vector(model1$coefficients)[2] + model1$residuals
model2 <- lm(yhat~spend1+spend2)
summary(model2)
[enter image description here][1]
Why does adding model1$residuals help to recover the correct coefficients for spend1 and spend2?
If I don't use it, spend1 and spend2 coefficients will be just the coefficient of total_spend from model1.
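One way to see what the residual term contributes is to write out the OLS identity behind yhat. The sketch below reuses the simulation from the question (the seed and the names b and check are my own additions) and verifies that the total_spend coefficient times total_spend plus the residuals equals total_rev minus the intercept and the fitted rev2 term. Since total_rev was built as 2*spend1 + 3*spend2 + rev2, that quantity still carries the separate contributions of spend1 and spend2, whereas the fitted total_spend term alone is a single coefficient times spend1 + spend2.

```r
set.seed(42)  # assumption: seed added only for reproducibility
spend1 <- runif(100, 5, 40)
spend2 <- runif(100, 30, 45)
rev2   <- runif(100, 75, 120)
total_spend <- spend1 + spend2
total_rev   <- 2 * spend1 + 3 * spend2 + rev2

model1 <- lm(total_rev ~ total_spend + rev2)
b <- coef(model1)  # b[1] intercept, b[2] total_spend, b[3] rev2

# yhat as constructed in the question
yhat <- b[2] * total_spend + residuals(model1)

# OLS identity: residual = total_rev - b[1] - b[2]*total_spend - b[3]*rev2,
# hence yhat = total_rev - b[1] - b[3]*rev2
check <- total_rev - b[1] - b[3] * rev2
all.equal(unname(yhat), unname(check))

# With the residuals: the separate spend1/spend2 effects are still in yhat
coef(lm(yhat ~ spend1 + spend2))

# Without them: b[2]*total_spend is an exact linear function of
# spend1 + spend2, so both slopes equal the total_spend coefficient
coef(lm(I(b[2] * total_spend) ~ spend1 + spend2))
```

Because the rev2 coefficient is only approximately 1, a small multiple of rev2 remains in yhat, so the recovered slopes are close to 2 and 3 rather than exactly equal to them.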
[1]: https://i.stack.imgur.com/fO74j.png
I am currently running a mixed effects model using lmer in which random slopes and correlated random intercepts are estimated. After fitting the model, I would like to plot the result, allowing for random slopes and intercepts as well as one overall fixed line. This is how I currently implement it:
library(lme4)    # provides lmer()
library(readr)   # provides read_csv()
library(sjPlot)  # provides plot_model()
df <- read_csv("anonymized_test.csv")
m1 <- lmer("DV ~ IV + (1 + IV| iso)", df)
plot_model(m1, type="pred",
terms=c("IV","iso"),
pred.type="re", ci.lvl = NA)
The result is not what I expected: according to the extracted random effects of the model, there should be some negative and some positive slopes in addition to the random intercepts.
The problem is that sjPlot seems to plot only the random intercepts. Judging from an older sjPlot vignette, this used to be implemented in a function that is now deprecated (see here). How do I get this functionality back? Thanks for any insight.
This is actually straightforward, even without the sjPlot package. We can extract fixef and ranef as fe and re and combine them in a plot. Both fe and re contain an intercept and a slope, which are added together to give each subject's line.
library(lme4)
fm1 <- lmer("Reaction ~ Days + (Days | Subject)", sleepstudy)
fe <- fixef(fm1)
re <- ranef(fm1)$Subject
clr <- rainbow(nrow(re)) ## define n colors
par(mfrow=c(1, 2))
plot(Reaction ~ Days, sleepstudy, col=clr[as.numeric(Subject)], main='Pred w/ points')
lapply(seq_len(nrow(re)), \(x) abline(fe[1] + re[x, 1], fe[2] + re[x, 2], col=clr[x]))
plot(Reaction ~ Days, sleepstudy, col=clr[as.numeric(Subject)], main='Pred w/o points', type='n')
lapply(seq_len(nrow(re)), \(x) abline(fe[1] + re[x, 1], fe[2] + re[x, 2], col=clr[x]))
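As a possible shortcut, coef() on the fitted model returns, for each subject, the fixed effect plus that subject's random effect, so the sums of fe and re do not have to be formed by hand. A minimal sketch, assuming the same sleepstudy model as above (the name co is illustrative); it also adds the overall fixed-effect line the question asks for:

```r
library(lme4)

fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

# coef() combines fixef() and ranef(): one row per subject,
# with columns "(Intercept)" and "Days"
co <- coef(fm1)$Subject

clr <- rainbow(nrow(co))
plot(Reaction ~ Days, sleepstudy, col = clr[as.numeric(Subject)],
     main = "Subject-specific lines")
for (i in seq_len(nrow(co))) {
  abline(co[i, "(Intercept)"], co[i, "Days"], col = clr[i])
}
# overall fixed-effect line, drawn thicker
abline(fixef(fm1)[1], fixef(fm1)[2], lwd = 3)
```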
However, I also get the random slopes using sjPlot. Not sure what went wrong, maybe you are using outdated software?
sjPlot::plot_model(fm1, type="pred", terms=c("Days","Subject"), pred.type="re", ci.lvl=NA)
# Warning message:
# In RColorBrewer::brewer.pal(n, pal) :
# n too large, allowed maximum for palette Set1 is 9
# Returning the palette you asked for with that many colors
I am externally validating and updating a Cox model in R. The model predicts 5 year risk. I don't have access to the original data, just the equation for the linear predictor and the value of the baseline survival probability at 5 years.
I have assessed calibration and discrimination of the model in my dataset and found that the model needs to be updated.
I want to update the model by adjusting baseline risk only, so I have been using a Cox model with the linear predictor ("beta.sum") included as an offset term, to restrict its coefficient to be 1.
I want to be able to use cph instead of coxph as it makes internal validation by bootstrapping much easier. However, when including the linear predictor as an offset I get the error:
"Error in exp(object$linear.predictors) :
non-numeric argument to mathematical function"
Is there something I am doing incorrectly, or does the cph function not allow an offset within the formula? If so, is there another way to restrict the coefficient to 1?
My code is below:
library(survival)  # provides coxph() and Surv()
library(rms)       # provides cph()
load(file="k.Rdata")
### Predicted risk ###
# linear predictor (LP)
k$beta.sum <- -0.2201 * ((k$age/10)-7.036) + 0.2467 * (k$male - 0.5642) - 0.5567 * ((k$epi/5)-7.222) +
0.4510 * (log(k$acr_mgmmol/0.113)-5.137)
k$pred <- 1 - 0.9365^exp(k$beta.sum)
# Recalibrated model
# Using coxph:
cox.new <- coxph(Surv(time, rrt) ~ offset(beta.sum), data = k, x=TRUE, y=TRUE)
# new baseline survival at 5 years
library(pec)
predictSurvProb(cox.new, newdata=data.frame(beta.sum=0), times = 5) #baseline = 0.9570
# Using cph
cph.new <- cph(Surv(time, rrt) ~ offset(beta.sum), data=k, x=TRUE, y=TRUE, surv=TRUE)
The model will run without surv=TRUE included, but this means a lot of the commands I want to use cannot work, such as calibrate, validate and predictSurvProb.
EDIT:
I will include a way to reproduce the error
library(purrr)  # provides rbernoulli()
library(rms)
n <- 1000
set.seed(1234)
status <- as.numeric(rbernoulli(n, p=0.1))
time <- -5* log(runif(n))
lp <- rnorm(1000, mean=-2.7, sd=1)
mydata <- data.frame(status, time, lp)
test <- cph(Surv(time, status) ~ offset(lp), data=mydata, surv=TRUE)
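If the goal is only to re-estimate the baseline while holding the coefficient of the linear predictor at 1, one option that sidesteps the model-fitting functions is to compute the Breslow estimate of the baseline cumulative hazard by hand. The sketch below is an illustration under stated assumptions, not a replacement for the rms validation tools: breslow_surv is a hypothetical helper, rbinom stands in for rbernoulli, and the simulated data only mimic the example above.

```r
# Hypothetical helper: Breslow baseline survival at time t0, with the
# linear predictor lp entering as a fixed offset (coefficient = 1)
breslow_surv <- function(time, status, lp, t0) {
  risk <- exp(lp)
  ev <- which(status == 1 & time <= t0)  # events up to t0
  # each event contributes 1 / (sum of exp(lp) over subjects still at risk)
  h0 <- vapply(ev, function(i) 1 / sum(risk[time >= time[i]]), numeric(1))
  exp(-sum(h0))                          # S0(t0) = exp(-H0(t0))
}

set.seed(1234)
n <- 1000
status <- rbinom(n, 1, 0.1)
time <- -5 * log(runif(n))
lp <- rnorm(n, mean = -2.7, sd = 1)

# recalibrated baseline survival at 5 years
s0_5 <- breslow_surv(time, status, lp, t0 = 5)
# updated 5-year predicted risk, same form as 1 - 0.9365^exp(beta.sum)
risk_5 <- 1 - s0_5^exp(lp)
```

With lp equal to 0 for everyone, the helper reduces to the Nelson-Aalen estimate, which gives a quick sanity check.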
Using bam, I made a logistic mixed model with the following form:
PresAbs ~ s(Var 1) + s(Var 2) + ... + s(Var n) + s(RandomVar, bs = "re")
RandomVar is a factor, and I am not interested in predictions for each of its levels. How can I obtain population-level predictions, comparable to those from predict.lme?
One way is simply to exclude the random effect spline from the predictions.
Using the example from ?gam.models
library("mgcv")
dat <- gamSim(1,n=400,scale=2) ## simulate 4 term additive truth
## Now add some random effects to the simulation. Response is
## grouped into one of 20 groups by `fac' and each groups has a
## random effect added....
fac <- as.factor(sample(1:20,400,replace=TRUE))
dat$X <- model.matrix(~fac-1)
b <- rnorm(20)*.5
dat$y <- dat$y + dat$X%*%b
m1 <- gam(y ~ s(fac,bs="re")+s(x0)+s(x1)+s(x2)+s(x3),data=dat,method="ML")
We want to exclude the term s(fac), named exactly as it appears in the output from
summary(m1)
For the observed data, population effects are
predict(m1, exclude = 's(fac)')
but you can supply newdata to generate predictions for other combinations of the covariates.
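To illustrate the last point, here is a minimal sketch of population-level predictions on new covariate values. It assumes the grouping factor is stored in the data frame as fac (unlike the help-page example, where it lives in the workspace); the seed and the names m, nd and pop are illustrative. The new data still carry a fac column with a valid level, but its value does not change the result once s(fac) is excluded.

```r
library(mgcv)

set.seed(1)  # assumption: seed added only for reproducibility
dat <- gamSim(1, n = 400, scale = 2, verbose = FALSE)
dat$fac <- factor(sample(1:20, 400, replace = TRUE))
b <- rnorm(20) * 0.5
dat$y <- dat$y + b[as.integer(dat$fac)]  # add a random effect per group

m <- gam(y ~ s(fac, bs = "re") + s(x0) + s(x1) + s(x2) + s(x3),
         data = dat, method = "ML")

# new covariate combinations; fac is a placeholder level
nd <- data.frame(x0 = seq(0, 1, length.out = 5),
                 x1 = 0.5, x2 = 0.5, x3 = 0.5,
                 fac = factor("1", levels = levels(dat$fac)))

# s(fac) is set to zero, giving population-level predictions
pop <- predict(m, newdata = nd, exclude = "s(fac)")
```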
I would like to create confusion matrices for a multinomial logistic regression as well as a proportional odds model but I am stuck with the implementation in R. My attempt below does not seem to give the desired output.
This is my code so far:
library(nnet)  # provides multinom()
library(MASS)  # provides polr()
CH <- read.table("http://data.princeton.edu/wws509/datasets/copen.dat", header=TRUE)
CH$housing <- factor(CH$housing)
CH$influence <- factor(CH$influence)
CH$satisfaction <- factor(CH$satisfaction)
CH$contact <- factor(CH$contact)
CH$satisfaction <- factor(CH$satisfaction,levels=c("low","medium","high"))
CH$housing <- factor(CH$housing,levels=c("tower","apartments","atrium","terraced"))
CH$influence <- factor(CH$influence,levels=c("low","medium","high"))
CH$contact <- relevel(CH$contact,ref=2)
model <- multinom(satisfaction ~ housing + influence + contact, weights=n, data=CH)
summary(model)
preds <- predict(model)
table(preds,CH$satisfaction)
omodel <- polr(satisfaction ~ housing + influence + contact, weights=n, data=CH, Hess=TRUE)
preds2 <- predict(omodel)
table(preds2,CH$satisfaction)
I would really appreciate some advice on how to correctly produce confusion matrices for my 2 models!
You can refer to the question "Predict() - Maybe I'm not understanding it".
In predict() you need to pass the unseen data you want predictions for.
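One thing to watch with these data is that each row is a covariate pattern with a count, so a plain table() counts rows rather than respondents. A possible fix is to weight the cross-tabulation by the counts with xtabs(). The sketch below uses the housing data shipped with MASS, which I am assuming is the same Copenhagen survey in a different layout (Sat, Infl, Type, Cont, Freq in place of satisfaction, influence, housing, contact, n). Note that these are predictions on the data used for fitting, not on unseen data.

```r
library(MASS)  # polr() and the housing data
library(nnet)  # multinom()

h <- housing   # working copy; Freq holds the cell counts

fit_mn <- multinom(Sat ~ Infl + Type + Cont, weights = Freq,
                   data = h, trace = FALSE)
fit_po <- polr(Sat ~ Infl + Type + Cont, weights = Freq,
               data = h, Hess = TRUE)

# predicted class for each covariate pattern
h$pred_mn <- predict(fit_mn, newdata = h)
h$pred_po <- predict(fit_po, newdata = h)

# confusion matrices weighted by the number of respondents per row
cm_mn <- xtabs(Freq ~ pred_mn + Sat, data = h)
cm_po <- xtabs(Freq ~ pred_po + Sat, data = h)
cm_mn
cm_po
```

The entries of each matrix sum to the total number of respondents, and the diagonal divided by that total gives the in-sample accuracy.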
Is there a way to extract the coefficients of globally fitted terms in local regression modeling?
Maybe I misunderstand the role of globally fitted terms in the function loess, but what I would like to have is the following:
# baseline:
x <- sin(seq(0.2,0.6,length.out=100)*pi)
# noise:
x_noise <- rnorm(length(x),0,0.1)
# known structure:
x_1 <- sin(seq(5,20,length.out=100))
# signal:
y <- x + x_1*0.25 + x_noise
# fit loess model:
x_seq <- seq_along(x)
mod <- loess(y ~ x_seq + x_1,parametric="x_1")
The fit works well. However, how can I extract the estimated coefficient of the globally fitted term x_1 (i.e. some value near 0.25 for the example above)?
Finally, I found a solution to my problem using the function gam from the package gam:
require(gam)
mod2 <- gam(y ~ lo(x_seq,span=0.75,degree=2) + x_1)
However, the fits from the two models are not exactly the same (which might be due to different control settings?)...
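For the original goal of reading off the global coefficient, coef() on a fitted additive model returns the parametric coefficients by name. The sketch below swaps in mgcv, with a penalized spline s() in place of the loess smoother lo() from the gam package, so it is a different smoother and its estimate need not match either fit above. The seed and the name mod3 are my additions.

```r
library(mgcv)

set.seed(1)  # assumption: seed added only for reproducibility
x <- sin(seq(0.2, 0.6, length.out = 100) * pi)  # baseline
x_noise <- rnorm(length(x), 0, 0.1)              # noise
x_1 <- sin(seq(5, 20, length.out = 100))         # known structure
y <- x + x_1 * 0.25 + x_noise                    # signal
x_seq <- seq_along(x)

# smooth term in x_seq, linear (global) term in x_1
mod3 <- mgcv::gam(y ~ s(x_seq) + x_1, method = "REML")

# estimated global coefficient of x_1
coef(mod3)["x_1"]
```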