I have a runjags object that has two chains that mixed very well (chains 1 and 3), and one that did not (chain 2). How can I go about trimming the runjags object to just contain chains 1 and 3?
Here is a reproducible example of generating a JAGS model using runjags (although, the chains here mix fine).
library(runjags)
#generate the data
x <- seq(1,10, by = 0.1)
y <- x + rnorm(length(x))
#write a jags model
j.model = "
model{
    # this is the model loop.
    for(i in 1:N){
        y[i] ~ dnorm(y.hat[i], tau)
        y.hat[i] <- m*x[i]
    }
    # priors
    m ~ dnorm(0, .0001)
    tau <- pow(sigma, -2)
    sigma ~ dunif(0, 100)
}
"
#put data in a list.
data = list(y=y, x=x, N=length(y))
#run the jags model.
jags.out <- run.jags(j.model,
                     data = data,
                     n.chains = 3,
                     monitor = c('m'))
One way to achieve this is to convert the runjags object to a mcmc.list, then remove the chain using the following code:
trim.jags <- as.mcmc.list(jags.out)
trim.jags <- mcmc.list(trim.jags[[1]], trim.jags[[3]])
However, once converted in this direction, the data cannot be put back into the runjags format. I would really like a solution that keeps the output in the runjags format, as my current workflows rely on that formatting generated by the runjags summary output.
Take a look at the (admittedly not very obviously named) divide.jags function:
jags_13 <- divide.jags(jags.out, which.chains=c(1,3))
jags_13
extend.jags(jags_13)
# etc
Hopefully this does exactly what you want.
Matt
I am modelling a time series as a GARCH(1,1)-process:
And the z_t are t-distributed.
In R, I do this in the fGarch-package via
model <- garchFit(formula = ~garch(1,1), cond.dist = "std", data=r)
Is this correct?
Now, I would like to understand the output of this to check my formula.
Obviously, model@fit$coefs gives me the coefficients and model@fitted gives me the fitted r_t.
But how do I get the fitted sigma_t and z_t?
I believe that the best way is to define extractor functions when generics are not available and methods when generics already exist.
The first two functions extract the values of interest from the fitted objects.
get_sigma_t <- function(x, ...){
  x@sigma.t
}
get_z_t <- function(x, ...){
  x@fit$series$z
}
Here a logLik method for objects of class "fGARCH" is defined.
logLik.fGARCH <- function(x, ...){
  x@fit$value
}
Now use the functions, including the method. The data comes from the first example in help("garchFit").
N <- 200
r <- as.vector(garchSim(garchSpec(rseed = 1985), n = N)[,1])
model <- garchFit(~ garch(1, 1), data = r, trace = FALSE)
get_sigma_t(model) # output not shown
get_z_t(model) # output not shown
logLik(model)
#LogLikelihood
# -861.9494
Note also that coef and fitted methods exist, so there is no need for model@fitted or model@fit$coefs as written in the question.
fitted(model) # much simpler
coef(model)
# mu omega alpha1 beta1
#3.541769e-05 1.081941e-06 8.885493e-02 8.120038e-01
The fitted model is an S4 object whose fit slot is a list. You can inspect its structure with
str(model)
Once you know the structure, it is easy to extract components with $ or @:
model@fit$series$z
model@sigma.t
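The difference between slots (reached with @) and list elements (reached with $) can be seen on a toy S4 class. This is only a sketch: the class toyFit and its contents are made up for illustration and merely mimic the layout described above; they are not part of fGarch.

```r
library(methods)

# Hypothetical class that mimics the layout of a fitted model object.
setClass("toyFit", representation(sigma.t = "numeric", fit = "list"))

obj <- new("toyFit",
           sigma.t = c(1.0, 1.2, 0.9),
           fit = list(value = -12.5,
                      series = list(z = c(0.1, -0.3, 0.2))))

slotNames(obj)   # slots are reached with @
names(obj@fit)   # the 'fit' slot is a plain list, reached with $

# Extractor functions hide the internal layout from calling code.
get_sigma_t <- function(x, ...) x@sigma.t
get_z_t     <- function(x, ...) x@fit$series$z

# An S3 method for an existing generic also dispatches on the S4 class.
logLik.toyFit <- function(object, ...) object@fit$value

get_sigma_t(obj)
get_z_t(obj)
logLik(obj)
```

If the internal layout of the object ever changes, only the extractor functions need updating, not the code that calls them.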
I need to estimate a fractional (response taking values between 0 and 1) model with R. I also want to cluster the standard errors. I have found several examples in SO and elsewhere and I built this function based on my findings:
require(sandwich)
require(lmtest)
clrobustse <- function(fit, cl){
  M <- length(unique(cl))
  N <- length(cl)
  K <- fit$rank
  dfc <- (M/(M - 1))*((N - 1)/(N - K))
  uj <- apply(estfun(fit), 2, function(x) tapply(x, cl, sum))
  vcovCL <- dfc*sandwich(fit, meat = crossprod(uj)/N)
  coeftest(fit, vcovCL)
}
I estimate my model like this:
fit <- glm(dep ~ exp1 + exp2 + exp3, data = df, fam = quasibinomial("probit"))
clrobustse(fit, df$cluster)
Everything works fine and I get the results. However, I suspect that something is not right as the non-clustered version:
coeftest(fit)
gives the exact same standard errors. I checked what Stata reports, and it displays different clustered standard errors. I suspect that I have misspecified the function clrobustse, but I just don't know how. Any ideas about what could be going wrong here?
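One way to cross-check a hand-written function like this is to compare it against vcovCL() from the sandwich package (available in recent versions). The sketch below uses simulated data; the data frame sim and its columns are made up for illustration, and the example does not diagnose what is wrong with clrobustse.

```r
library(sandwich)
library(lmtest)

set.seed(1)
n_cl <- 40   # number of clusters (made up)
per  <- 10   # observations per cluster (made up)

sim <- data.frame(cluster = rep(seq_len(n_cl), each = per),
                  exp1    = rnorm(n_cl * per))
shock <- rep(rnorm(n_cl), each = per)   # shared within-cluster shock
# Fractional response strictly between 0 and 1
sim$dep <- pnorm(0.5 * sim$exp1 + shock + rnorm(n_cl * per, sd = 0.3))

fit <- glm(dep ~ exp1, data = sim, family = quasibinomial("probit"))

se_naive <- sqrt(diag(vcov(fit)))
se_cl    <- sqrt(diag(vcovCL(fit, cluster = sim$cluster)))
cbind(se_naive, se_cl)

# Coefficient table with cluster-robust standard errors
coeftest(fit, vcov. = vcovCL(fit, cluster = sim$cluster))
```

If the two columns of standard errors are identical on data with a clear cluster structure, the clustering information is probably not reaching the covariance estimator.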
I want to be able to view the p-values for the lmekin object produced by the coxme package.
eg.
model= lmekin(formula = height ~ score + sex + age + (1 | IID), data = phenotype_df, varlist = kinship_matrix)
I tried:
summary(model)
coef(summary(model))
summary(model$coefficient$fixed)
fixef(model) / sqrt(diag(vcov(model))) # calculates Z-scores but not p-values
But these did not work. So how do I view the p-values for this linear mixed model?
It took me ages of searching to figure this out, but I noticed a lot of other similar questions without proper answers, so I wanted to answer this.
You use:
library(coxme)
print(model)
Note it is important to load the coxme package beforehand or it will not work.
I've also noticed a lot of posts about how to extract the p-value from lmekin objects, or how to extract the p-value from coxme objects in general. I wrote this function, which is based on the coxme:::print.coxme function code (to view code type coxme:::print.coxme directly into R). print calculates p-values on the fly - this function allows the extraction of p-values and saves them to an object.
Note that mod is your model, eg. mod <- lmekin(y~x+a+b)
Use print(mod) to double check that the tables match.
extract_coxme_table <- function (mod){
  beta <- mod$coefficients$fixed
  nvar <- length(beta)
  nfrail <- nrow(mod$var) - nvar
  se <- sqrt(diag(mod$var)[nfrail + 1:nvar])
  z <- round(beta/se, 2)
  p <- signif(1 - pchisq((beta/se)^2, 1), 2)
  table <- data.frame(cbind(beta, se, z, p))
  return(table)
}
I arrived at this topic because I was looking for the same thing for just the coxme object. The function of IcedCoffee works with a micro adjustment:
extract_coxme_table <- function (mod){
  beta <- mod$coefficients # $fixed is not needed
  nvar <- length(beta)
  nfrail <- nrow(mod$var) - nvar
  se <- sqrt(diag(mod$var)[nfrail + 1:nvar])
  z <- round(beta/se, 2)
  p <- signif(1 - pchisq((beta/se)^2, 1), 2)
  table <- data.frame(cbind(beta, se, z, p))
  return(table)
}
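The p-value line in these functions is a Wald test: the squared z-score is compared with a chi-squared distribution with one degree of freedom, which is the same as a two-sided normal test on z. A small base R sketch with made-up estimates and standard errors (not taken from any fitted model) shows the equivalence:

```r
# Made-up estimates and standard errors, for illustration only.
beta <- c(score = 0.42, sex = -1.10, age = 0.03)
se   <- c(score = 0.20, sex =  0.35, age = 0.05)

z <- beta / se
p_chisq  <- 1 - pchisq(z^2, df = 1)   # form used in the functions above
p_normal <- 2 * pnorm(-abs(z))        # equivalent two-sided normal form

all.equal(p_chisq, p_normal)

# Same layout as the table returned by extract_coxme_table
data.frame(beta, se, z = round(z, 2), p = signif(p_normal, 2))
```

The normal form tends to be numerically safer for very large z-scores, where 1 - pchisq(...) rounds to zero.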
I am forecasting a time series using harmonic regression created as such:
(Packages used: tseries, forecast, TSA, plyr)
airp <- AirPassengers
TIME <- 1:length(airp)
SIN <- COS <- matrix(nrow = length(TIME), ncol = 6,0)
for (i in 1:6){
  SIN[,i] <- sin(2*pi*i*TIME/12)
  COS[,i] <- cos(2*pi*i*TIME/12)
}
SIN <- SIN[,-6]
decomp.seasonal <- decompose(airp)$seasonal
seasonalfit <- lm(airp ~ SIN + COS)
The fitting works just fine. The problem occurs when forecasting.
TIME.NEW <- seq(length(TIME)+1, length(TIME)+12, by=1)
SINNEW <- COSNEW <- matrix(nrow=length(TIME.NEW), ncol = 6, 0)
for (i in 1:6) {
  SINNEW[,i] <- sin(2*pi*i*TIME.NEW/12)
  COSNEW[,i] <- cos(2*pi*i*TIME.NEW/12)
}
SINNEW <- SINNEW[,-6]
prediction.harmonic.dataframe <- data.frame(TIME = TIME.NEW, SIN = SINNEW, COS = COSNEW)
seasonal.predictions <- predict(seasonalfit, newdata = prediction.harmonic.dataframe)
This causes the warning:
Warning message:
'newdata' had 12 rows but variables found have 144 rows
I went through and found that the names were SIN.1, SIN.2, et cetera, instead of SIN1 and SIN2... So I manually changed those and it still didn't work. I also manually removed the SIN.6 because it, for some reason, was still there.
Help?
Edit: I have gone through the similar posts as well, and the answers in those questions did not fix my problem.
Trying to predict with a data.frame after fitting an lm model with variables not inside a data.frame (especially matrices) is not fun. It's better if you always fit your model from data in a data.frame.
For example if you did
seasonalfit <- lm(airp ~ ., data.frame(airp=airp,SIN=SIN,COS=COS))
Then your predict would work.
Alternatively you can try to cram matrices into data.frames but this is generally a bad idea. You would do
prediction.harmonic.dataframe <- data.frame(TIME = TIME.NEW,
SIN = I(SINNEW), COS = I(COSNEW))
The I() (or AsIs function) will keep them as matrices.
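The data.frame approach can be sketched end to end in base R. The helper harmonics() and its column names (S1, C1, ...) are made up for this example; the point is only that the same function builds both the training and the prediction data, so the column names always match.

```r
# Hypothetical helper: harmonic regressors for a vector of time indices.
harmonics <- function(time, period = 12, K = 6) {
  S <- sapply(1:K, function(k) sin(2 * pi * k * time / period))
  C <- sapply(1:K, function(k) cos(2 * pi * k * time / period))
  colnames(S) <- paste0("S", 1:K)
  colnames(C) <- paste0("C", 1:K)
  # With period = 12 and K = 6 the last sine column is sin(pi * t),
  # which is zero for integer t, so it is dropped.
  data.frame(S[, -K, drop = FALSE], C)
}

airp <- as.numeric(AirPassengers)
n <- length(airp)

# Fit from a data.frame so that predict() can match columns by name.
train <- data.frame(airp = airp, harmonics(1:n))
fit <- lm(airp ~ ., data = train)

# Twelve steps ahead, built with the same helper.
newdata <- harmonics((n + 1):(n + 12))
seasonal.predictions <- predict(fit, newdata = newdata)
length(seasonal.predictions)
```

Because newdata has exactly the columns the model was fitted on, predict() returns one value per new row instead of falling back to the 144 training rows.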
I want to compare two classification methods, ctree and C5.0, from the packages party and C50 respectively. The comparison is meant to test their sensitivity to the initial starting points. The test should be run 30 times; each time, the number of misclassified items is calculated and stored in a vector, and a t-test is then used to see whether the two methods really differ.
library("foreign"); # for read.arff
library("party") # for ctree
library("C50") # for C5.0
trainTestSplit <- function(data, trainPercentage){
  newData <- list();
  all <- nrow(data);
  splitPoint <- floor(all * trainPercentage);
  newData$train <- data[1:splitPoint, ];
  # start at splitPoint + 1 so the boundary row is not in both sets
  newData$test <- data[(splitPoint + 1):all, ];
  return (newData);
}
ctreeErrorCount <- function(st,ss){
  set.seed(ss);
  model <- ctree(Class ~ ., data=st$train);
  class <- st$test$Class;
  st$test$Class <- NULL;
  pre = predict(model, newdata=st$test, type="response");
  errors <- length(which(class != pre)); # counting number of misclassified items
  return(errors);
}
C50ErrorCount <- function(st,ss){
  model <- C5.0(Class ~ ., data=st$train, seed=ss);
  class <- st$test$Class;
  pre = predict(model, newdata=st$test, type="class");
  errors <- length(which(class != pre)); # counting number of misclassified items
  return(errors);
}
compare <- function(n = 30){
  data <- read.arff(file.choose());
  set.seed(100);
  errors = list(ctree = c(), c50 = c());
  seeds <- floor(abs(rnorm(n) * 10000));
  for(i in 1:n){
    splitData <- trainTestSplit(data, 0.66);
    errors$ctree[i] <- ctreeErrorCount(splitData, seeds[i]);
    errors$c50[i] <- C50ErrorCount(splitData, seeds[i]);
  }
  cat("\n\n");
  cat("============= ctree Vs C5.0 =================\n");
  cat(paste(errors$ctree, " ", errors$c50, "\n"))
  tt <- t.test(errors$ctree, errors$c50);
  print(tt);
}
The program above is supposed to perform the comparison, but because the number of errors never changes across iterations, the t.test function produces an error. I tested it with iris from within R (after renaming class to Class) and with the Wisconsin breast cancer data, which can be downloaded here, but any data set can be used as long as it has a Class attribute.
The problem is that the results of both methods stay constant when I change the random seed. According to their documentation, both functions use random seeds: ctree uses set.seed(x), while C5.0 takes an argument called seed. Unfortunately, I cannot see any effect.
Could you please tell me how to control the initial state of these functions?
ctree only depends on a random seed when you configure it to use a random selection of input variables (i.e. mtry > 0 in ctree_control). See http://cran.r-project.org/web/packages/party/party.pdf (p. 11)
In regards to C5.0-trees the seed is used this way:
ctrl = C5.0Control(sample=0.5, seed=ss);
model <- C5.0(Class ~ ., data=st$train, control = ctrl);
Notice that the seed is used to select a sample of the data, not within the algorithm itself. See http://cran.r-project.org/web/packages/C50/C50.pdf (p. 5)
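Since the seed only matters when some part of the procedure is random, and trainTestSplit always returns the same rows, the error counts in the question cannot vary between iterations. One option is to let the seed drive the train/test split instead. This is an assumption about what the comparison is meant to measure, not something stated in the question, and randomSplit is a hypothetical replacement for trainTestSplit:

```r
# Hypothetical replacement for trainTestSplit: rows are drawn at random,
# so each seed gives a different train/test partition.
randomSplit <- function(data, trainPercentage, seed) {
  set.seed(seed)
  n <- nrow(data)
  trainIdx <- sample(n, floor(n * trainPercentage))
  list(train = data[trainIdx, ], test = data[-trainIdx, ])
}

# iris with the response renamed to Class, as in the question
d <- iris
names(d)[names(d) == "Species"] <- "Class"

s1 <- randomSplit(d, 0.66, seed = 1)
s2 <- randomSplit(d, 0.66, seed = 2)

nrow(s1$train)
nrow(s1$test)
# Different seeds should select different training rows
identical(rownames(s1$train), rownames(s2$train))
```

With a split like this, both classifiers see the same data within an iteration but different data across iterations, so the two error vectors are no longer constant and a paired comparison becomes meaningful.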