I'm trying to write a function for fitting a logistic curve using nls with the self-starting model SSlogis. I've got some data, which will actually come as a data frame, so this first step is really just to set up a data set similar to what I'll be working with...
int <- c(.1, .2, .5, .8, 1.5, 2.5, 4.1, 6.1, 8.8, 13.8, 25.5, 29.2, 35, 37.9, 41.4)
yr2 <- 1:15
newdata <- data.frame(int, yr2)
So, onto the function I've written - with the help of Fox & Weisberg...
log.fit <- function(dep, ind, yourdata){
#Self-starting...
log.ss <- nls(dep ~ SSlogis(ind, phi1, phi2, phi3), data=yourdata)
#C
C <- summary(log.ss)$coef[1]
#a
A <- exp((summary(log.ss)$coef[2]) * (1/summary(log.ss)$coef[3]))
#k
K <- (1 / summary(log.ss)$coef[3])
plot(dep ~ ind, data=yourdata, main = "Logistic Function", xlab=ind, ylab=dep)
with(data, lines(seq(0, max(ind), 1), predict(log.ss, data.frame(ind=seq(0,max(ind),1)))))
p <- (yourdata$ind)
m <- mean(p)
r1 <- sum((p-m)^2)
r2 <- sum(residuals(log.ss)^2)
r_sq <- (r1-r2) / r1
R <- sqrt(r_sq)
return(c(C=C, a=A, k=K, R.value=R))
}
I can't even get out of the blocks with this to see if the rest of my function code is OK. I get an error when I run the function:
log.fit(int, yr2, newdata)
Then, I get this error:
Error in SSlogis(ind, phi1, phi2, phi3) : object 'phi2' not found
phi2 is a parameter that is estimated by SSlogis, so I'm lost. This runs like a champ outside the function.
I cannot reproduce the error you mention. I would suggest you update to the latest version of R and make sure any packages you are using are also up to date. However, there are two errors in your code:
with(yourdata, lines(seq(0, max(ind), 1), predict(log.ss, data.frame(ind=seq(0,max(ind),1)))))
p <- (yourdata$int)
For the second correction, please make sure that you indeed want to calculate the empirical variance of the int variable.
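As a further aside (not part of the original answer), one hedged way to avoid scoping surprises inside such a function is to pass the column names as strings and build the formula explicitly, so that nls() resolves every variable inside yourdata rather than in the calling environment; log.fit2 is a hypothetical name:
# sketch: build the formula from column-name strings so all variables
# are looked up in `yourdata`
log.fit2 <- function(dep, ind, yourdata) {
  fo <- as.formula(paste0(dep, " ~ SSlogis(", ind, ", phi1, phi2, phi3)"))
  nls(fo, data = yourdata)
}
# log.fit2("int", "yr2", newdata)   # note the quotes around the column names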
I was trying to fit a model on the training set and then apply it to the full set to see the result. The catch is that I have to use a data.frame to store x and y, since I am planning on using the bootstrap. I was doing some testing like so:
library(splines)
# generating the data
x_ = c(0.2, 0.9, 1.4, 1.7)
y_ = c(2.5, 4.3, 5.2, 2.5)
n = 50 - length(x_)
set.seed(0)
x = seq(0,3, length.out=n) + runif(n,0,0.1)
y = x*sin(3*x) + runif(n)
x = sort(c(x, x_))
y = c(y, y_)
df <- data.frame(x=sort(x), y=y)
# fitting the model
df1 <- df[c(2,4,6,8,16,20,25,30,35,40,45,50),]
ft <- lm(df1$y ~ bs(df1$x, knots=knots, degree=3))
pr <- predict(ft, df$x)
length(pr)
The problem is that predict does not like df$x; it only works with data.frame(df$x) (why?). Also, it refuses to predict more than 12 values, which is extremely strange to me.
Because predict() expects newdata to be a data.frame whose columns match the variable names used in the model formula. And since the model was fitted with df1$y ~ bs(df1$x, ...), there is no variable called x for predict() to look up in the new data; it falls back to the original df1$x and simply returns the 12 fitted values. Fit the model with a data argument and bare variable names instead:
ft <- lm(y ~ bs(x, knots=knots, degree=3),data=df1)
pr <- predict(ft, newdata=df)
length(pr)
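Note that knots is never defined in the question, so the snippet above still needs a value for it. A minimal end-to-end sketch, with placeholder knot positions (an assumption, not taken from the original):
knots <- c(1, 2)                  # placeholder knot positions, not given in the question
ft <- lm(y ~ bs(x, knots = knots, degree = 3), data = df1)
pr <- predict(ft, newdata = df)   # one prediction per row of df
length(pr)                        # now equals nrow(df), not 12
# (x values outside the range of df1$x may trigger a boundary-knot warning)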
I am using this code to fit a model using LASSO regression.
library(glmnet)
IV1 <- data.frame(IV1 = rnorm(100))
IV2 <- data.frame(IV2 = rnorm(100))
IV3 <- data.frame(IV3 = rnorm(100))
IV4 <- data.frame(IV4 = rnorm(100))
IV5 <- data.frame(IV5 = rnorm(100))
DV <- data.frame(DV = rnorm(100))
data <- data.frame(IV1, IV2, IV3, IV4, IV5, DV)
x <- model.matrix(DV ~ . - IV5, data)[,-1]
y <- data$DV
AB<-glmnet(x=x, y=y, alpha=1)
plot(AB,xvar="lambda")
lambdas = NULL
for (i in 1:100)
{
fit <- cv.glmnet(x,y)
errors = data.frame(fit$lambda,fit$cvm)
lambdas <- rbind(lambdas,errors)
}
lambdas <- aggregate(lambdas[, 2], list(lambdas$fit.lambda), mean)
bestindex = which(lambdas[2]==min(lambdas[2]))
bestlambda = lambdas[bestindex,1]
fit <- glmnet(x,y,lambda=bestlambda)
I would like to calculate some sort of R2 using the training data. I assume that one way to do this is by using the cross-validation that I performed in choosing lambda. Based on this post, it seems this can be done using
r2<-max(1-fit$cvm/var(y))
However, when I run this, I get this warning:
Warning message:
In max(1 - fit$cvm/var(y)) :
no non-missing arguments to max; returning -Inf
Can anyone point me in the right direction? Is this the best way to compute R2 based off of the training data?
The function glmnet does not return cvm as a component of the fitted object:
?glmnet
What you want to do is use cv.glmnet
?cv.glmnet
The following works (note that you must specify more than one lambda, or let cv.glmnet figure out the sequence itself):
fit <- cv.glmnet(x,y,lambda=lambdas[,1])
r2<-max(1-fit$cvm/var(y))
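If you would rather report the value at the lambda actually selected by cross-validation, rather than the best value along the whole path, something like this should also work (a hedged variant, not from the original answer):
i_min <- which(fit$lambda == fit$lambda.min)   # position of the CV-selected lambda
1 - fit$cvm[i_min]/var(y)                      # R^2-style value at that lambda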
I'm not sure I understand what you are trying to do. Maybe do this?
lambdas <- NULL
r2 <- NULL            # initialise before filling r2[i] inside the loop
for (i in 1:100)
{
  fit <- cv.glmnet(x, y)
  errors <- data.frame(fit$lambda, fit$cvm)
  lambdas <- rbind(lambdas, errors)
  r2[i] <- max(1 - fit$cvm/var(y))
}
lambdas <- aggregate(lambdas[, 2], list(lambdas$fit.lambda), mean)
bestindex = which(lambdas[2]==min(lambdas[2]))
bestlambda = lambdas[bestindex,1]
r2[bestindex]
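As a hedged aside (not part of the original answer), for a Gaussian model the glmnet object itself reports the fraction of (null) deviance explained, which is the training-data R-squared at each lambda; fit_best below is my own name:
fit_best <- glmnet(x, y, lambda = bestlambda)
fit_best$dev.ratio   # fraction of deviance explained = training R^2 at bestlambda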
I am working through the textbook "Bayesian Ideas and Data Analysis" by Christensen et al.
There is a simple exercise in the book that involves cutting and pasting the following code to run in WinBUGS:
model{
  y ~ dbin(theta, n)        # Model the data
  ytilde ~ dbin(theta, m)   # Prediction of future binomial
  theta ~ dbeta(a, b)       # The prior
  prob <- step(ytilde - 20) # Pred prob that ytilde >= 20
}
list(n=100, m=100, y=10, a=1, b=1) # The data
list(theta=0.5, ytilde=10) # Starting/initial values
I am trying to translate this into R2jags code and am running into some trouble. I thought I could fairly directly write my R2jags model in this fashion:
model {
#Likelihoods
y ~ dbin(theta,n)
yt ~ dbin(theta,m)
#Priors
theta ~ dbeta(a,b)
prob <- step(yt - 20)
}
with the R code:
library(R2jags)
n <- 100
m <- 100
y <- 10
a <- 1
b <- 1
jags.data <- list(n = n,
m = m,
y = y,
a = a,
b = b)
jags.init <- list(
list(theta = 0.5, yt = 10), #Chain 1 init
list(theta = 0.5, yt = 10), #Chain 2 init
list(theta = 0.5, yt = 10) #Chain 3 init
)
jags.param <- c("theta", "yt")
jags.fit <- jags.model(data = jags.data,
inits = jags.inits,
parameters.to.save = jags.param,
model.file = "hw21.bug",
n.chains = 3,
n.iter = 5000,
n.burnin = 100)
print(jags.fit)
However, calling the R code brings about the following error:
Error in jags.model(data = jags.data, inits = jags.inits, parameters.to.save = jags.param, :
unused arguments (parameters.to.save = jags.param, model.file = "hw21.bug", n.iter = 5000, n.burnin = 100)
Is it because I am missing a necessary for loop in my R2jags model code?
The error is coming from the R function jags.model (not from JAGS): you are passing arguments such as parameters.to.save to the wrong function. jags.model() belongs to the rjags package and does not accept them.
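If you want to stay with R2jags, the function that does accept those arguments is jags() (a sketch, assuming the model block above is saved as "hw21.bug"; note also that the question defines jags.init but passes jags.inits):
library(R2jags)
jags.fit <- jags(data = jags.data,
                 inits = jags.init,   # the object defined above is jags.init, not jags.inits
                 parameters.to.save = jags.param,
                 model.file = "hw21.bug",
                 n.chains = 3,
                 n.iter = 5000,
                 n.burnin = 100)
print(jags.fit)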
If you want to keep the model as similar to WinBUGS as possible, there is an easier way than specifying the data and initial values in R. Put the following into a text file called 'model.txt' in your working directory:
model{
y ~ dbin(theta, n) # Model the data
ytilde ~ dbin(theta, m) # Prediction of future binomial
theta ~ dbeta(a, b) # The prior
prob <- step(ytilde - 20) # Pred prob that ytilde >= 20
}
data{
list(n=100, m=100, y=10, a=1, b=1) # The data
}
inits{
list(theta=0.5, ytilde=10) # Starting/initial values
}
And then run this in R:
library('runjags')
results <- run.jags('model.txt', monitor='theta')
results
plot(results)
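A small extension of the same call (a usage sketch): the exercise also tracks the predictive probability that ytilde >= 20, and the extra nodes can simply be added to the monitor argument:
results <- run.jags('model.txt', monitor = c('theta', 'ytilde', 'prob'))
results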
For more information on this method of translating WinBUGS models to JAGS see:
http://runjags.sourceforge.net/quickjags.html
Matt
This old blog post has an extensive example of converting BUGS to JAGS accessed via the package rjags, not R2jags. (I like the package runjags even better.) I know we're supposed to present self-contained answers here, not just links, but the post is rather long. It goes through each logical step of a script (a minimal sketch of the same steps, applied to the model in this question, follows the list), including:
loading the package
specifying the model
assembling the data
initializing the chains
running the chains
examining the results
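For concreteness, a hedged minimal sketch of those steps with rjags, using the model from this question (file and object names below are my own):
library(rjags)   # load the package

# specify the model and write it to a file
model_string <- "
model{
  y ~ dbin(theta, n)
  ytilde ~ dbin(theta, m)
  theta ~ dbeta(a, b)
  prob <- step(ytilde - 20)
}
"
writeLines(model_string, 'binom_pred.bug')

# assemble the data and initialise the chains
jags_data  <- list(n = 100, m = 100, y = 10, a = 1, b = 1)
jags_inits <- list(theta = 0.5, ytilde = 10)

# run the chains
jm <- jags.model('binom_pred.bug', data = jags_data, inits = jags_inits, n.chains = 3)
update(jm, 100)   # burn-in
samples <- coda.samples(jm, c('theta', 'ytilde', 'prob'), n.iter = 5000)

# examine the results
summary(samples)
plot(samples)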
I am using the 'KFAS' package in R to estimate a state-space model with the Kalman filter. My measurement and transition equations are:
y_t = Z_t * x_t + ε_t (measurement)
x_t = T_t * x_{t-1} + R_t * η_t (transition),
with ε_t ~ N(0, H_t) and η_t ~ N(0, Q_t).
So, I want to estimate the variances H_t and Q_t, but also T_t, the AR(1) coefficient. My code is as follows:
library(KFAS)
set.seed(100)
eps <- rt(200, 4, 1)
meas <- as.matrix((arima.sim(n=200, list(ar=0.6), innov = rnorm(200)*sqrt(0.5)) + eps),
ncol=1)
Zt <- 1
Ht <- matrix(NA)
Tt <- matrix(NA)
Rt <- 1
Qt <- matrix(NA)
ss_model <- SSModel(meas ~ -1 + SSMcustom(Z = Zt, T = Tt, R = Rt,
Q = Qt), H = Ht)
fit <- fitSSM(ss_model, inits = c(0,0.6,0), method = 'L-BFGS-B')
But it returns: "Error in is.SSModel(do.call(updatefn, args = c(list(inits, model), update_args)),: System matrices (excluding Z) contain NA or infinite values, covariance matrices contain values larger than 1e+07"
The NA definitions for the variances work well, as documented in the package's paper. However, it seems this cannot be done for the AR coefficients. Does anyone know how I can do this?
Note that I am aware of the SSMarima function, which eases the definition of the transition equation for ARIMA models. Although I am able to estimate the AR(1) coefficient and Q_t this way, I still cannot estimate the ε_t variance (H_t). Moreover, I am migrating my Kalman filter code from EViews to R, so I need to learn SSMcustom for other, more complicated models.
Thanks!
It seems that you are missing something in your example, as your error message comes from the function fitSSM. If you want to use fitSSM for estimating general state-space models, you need to provide your own model-updating function: the default behaviour can only handle NAs in the covariance matrices H and Q. The main goal of fitSSM is just to get you started with simple models. For complex models and/or large data, I would recommend writing your own objective function (with the help of the logLik method) and driving your favourite numerical optimization routine manually for maximum performance. Something like this:
library(KFAS)
set.seed(100)
eps <- rt(200, 4, 1)
meas <- as.matrix((arima.sim(n=200, list(ar=0.6), innov = rnorm(200)*sqrt(0.5)) + eps),
ncol=1)
Zt <- 1
Ht <- matrix(NA)
Tt <- matrix(NA)
Rt <- 1
Qt <- matrix(NA)
ss_model <- SSModel(meas ~ -1 + SSMcustom(Z = Zt, T = Tt, R = Rt,
Q = Qt), H = Ht)
objf <- function(pars, model, estimate = TRUE) {
model$H[1] <- pars[1]
model$T[1] <- pars[2]
model$Q[1] <- pars[3]
if (estimate) {
-logLik(model)
} else {
model
}
}
opt <- optim(c(1, 0.5, 1), objf, method = "L-BFGS-B",
lower = c(0, -0.99, 0), upper = c(100, 0.99, 100), model = ss_model)
ss_model_opt <- objf(opt$par, ss_model, estimate = FALSE)
Same with fitSSM:
updatefn <- function(pars, model) {
model$H[1] <- pars[1]
model$T[1] <- pars[2]
model$Q[1] <- pars[3]
model
}
fit <- fitSSM(ss_model, c(1, 0.5, 1), updatefn, method = "L-BFGS-B",
lower = c(0, -0.99, 0), upper = c(100, 0.99, 100))
identical(ss_model_opt, fit$model)
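As a usage sketch (relying on fitSSM returning its optim() output in the optim.out component, as documented), the two routes can be compared directly and the fitted model passed on to the Kalman filter and smoother:
opt$par                    # estimated H, T and Q from the manual optim() call
fit$optim.out$par          # the same parameters via fitSSM
out <- KFS(ss_model_opt)   # filtering and smoothing with the estimated model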
I want to find the lethal dose (LD50) with its confidence interval in R. Other software such as Minitab, SPSS, and SAS provides three different versions of such confidence intervals. I could not find such intervals in any R package (I also used the findFn function from the sos package).
How can I find such intervals? I have coded one type of interval based on the delta method (though I am not sure about its correctness), but I would like to use an established function from an R package. Thanks.
MWE:
dose <- c(10.2, 7.7, 5.1, 3.8, 2.6, 0)
total <- c(50, 49, 46, 48, 50, 49)
affected <- c(44, 42, 24, 16, 6, 0)
finney71 <- data.frame(dose, total, affected)
fm1 <- glm(cbind(affected, total-affected) ~ log(dose),
family=binomial(link = logit), data=finney71[finney71$dose != 0, ])
summary(fm1)$coef
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.886912 0.6429272 -7.601035 2.937717e-14
log(dose) 3.103545 0.3877178 8.004650 1.198070e-15
library(MASS)
xp <- dose.p(fm1, p=c(0.50, 0.90, 0.95)) # from MASS
xp.ci <- xp + attr(xp, "SE") %*% matrix(qnorm(1 - 0.05/2)*c(-1,1), nrow=1)
zp.est <- exp(cbind(xp, attr(xp, "SE"), xp.ci[,1], xp.ci[,2]))
dimnames(zp.est)[[2]] <- c("LD", "SE", "LCL","UCL")
zp.est
LD SE LCL UCL
p = 0.50: 4.828918 1.053044 4.363708 5.343724
p = 0.90: 9.802082 1.104050 8.073495 11.900771
p = 0.95: 12.470382 1.133880 9.748334 15.952512
From the package drc, you can get the ED50 (same calculation), along with confidence intervals.
library(drc) # Directly borrowed from the drc manual
mod <- drm(affected/total ~ dose, weights = total,
data = finney71[finney71$dose != 0, ], fct = LL2.2(), type = "binomial")
#intervals on log scale
ED(mod, c(50, 90, 95), interval = "fls", reference = "control")
Estimated effective doses
(Back-transformed from log scale-based confidence interval(s))
Estimate Lower Upper
1:50 4.8289 4.3637 5.3437
1:90 9.8021 8.0735 11.9008
1:95 12.4704 9.7483 15.9525
Which matches the manual output.
The "finney71" data is included in this package, and your calculation of confidence intervals exactly matches the example given by the drc folks, down to the "# from MASS" comment. You should give credit to them, rather than claiming you wrote the code.
There are a few other ways to figure this out. One is a nonparametric (case-resampling) bootstrap, which is conveniently available through the boot package.
First, we'll refit the model.
library(boot)
finney71 <- finney71[finney71$dose != 0,] # pre-clean data
fm1 <- glm(cbind(affected, total-affected) ~ log(dose),
family=binomial(link = logit),
data=finney71)
And for illustration, we can figure out the LD50 and LD75.
statfun <- function(dat, ind) {
  mod <- update(fm1, data = dat[ind,])
  coefs <- coef(mod)
  c(exp(-coefs[1]/coefs[2]),                    # LD50: solve 0 = b0 + b1*log(d)
    exp((log(0.75/0.25) - coefs[1])/coefs[2]))  # LD75: solve logit(0.75) = b0 + b1*log(d)
}
boot_out <- boot(data = finney71, statistic = statfun, R = 1000)
The boot.ci function can work out a variety of confidence intervals for us, using this object.
boot.ci(boot_out, index = 1, type = c('basic', 'perc', 'norm'))
##BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
##Based on 999 bootstrap replicates
##
##CALL :
##boot.ci(boot.out = boot_out, type = c("basic", "perc", "norm"),
## index = 1)
##Intervals :
##Level Normal Basic Percentile
##95% ( 3.976, 5.764 ) ( 4.593, 5.051 ) ( 4.607, 5.065 )
The confidence intervals using the normal approximation are thrown off quite a bit by a few extreme values, which the basic and percentile-based intervals are more robust to.
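The LD75 interval, the second component returned by statfun, can be pulled out the same way:
boot.ci(boot_out, index = 2, type = c('basic', 'perc', 'norm'))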
One interesting thing to note: if the sign of the slope is sufficiently unclear, we can get some rather extreme values (simulated as in this answer, and discussed more thoroughly in this blog post by Andrew Gelman).
set.seed(1)
x <- rnorm(100)
z = 0.05 + 0.1*x*rnorm(100, 0, 0.05) # small slope and more noise
pr = 1/(1+exp(-z))
y = rbinom(1000, 1, pr)
sim_dat <- data.frame(x, y)
sim_mod <- glm(y ~ x, data = sim_dat, family = 'binomial')
statfun <- function(dat, ind) {
mod <- update(sim_mod, data = dat[ind,])
-coef(mod)[1]/coef(mod)[2]
}
sim_boot <- boot(data = sim_dat, statistic = statfun, R = 1000)
hist(sim_boot$t[,1], breaks = 100,
main = "Bootstrap of simulated model")
The delta method above gives us mean = 6.448, lower ci = -36.22, and upper ci = 49.12, and all of the bootstrap CIs give us similarly extreme estimates.
##Level Normal Basic Percentile
##95% (-232.19, 247.76 ) ( -20.17, 45.13 ) ( -32.23, 33.06 )