I am trying to fit a multivariate t distribution to the CRSPday data in R, but optimization fails every time with the error non-finite finite-difference value [11]. Can someone suggest what I am doing wrong?
Basically, I am passing the means, the variances/correlations, and the degrees of freedom as parameters and optimizing over them.
I tried a few hacks, such as changing the bounds and using BFGS, but nothing is working.
Code Snippet:
library(mvtnorm)  # for dmvt

mtfit <- function(series){
  loglik <- function(par) {
    mean <- par[1:4]
    rho12 <- par[5]; rho13 <- par[6]; rho14 <- par[7]
    rho23 <- par[8]; rho24 <- par[9]; rho34 <- par[10]
    var1 <- par[11]; var2 <- par[12]; var3 <- par[13]; var4 <- par[14]
    nu <- par[15]
    cov12 <- rho12*sqrt(var1*var2); cov13 <- rho13*sqrt(var1*var3)
    cov14 <- rho14*sqrt(var1*var4)
    cov23 <- rho23*sqrt(var2*var3); cov24 <- rho24*sqrt(var2*var4)
    cov34 <- rho34*sqrt(var3*var4)
    covar <- matrix(c(var1,cov12,cov13,cov14,
                      cov12,var2,cov23,cov24,
                      cov13,cov23,var3,cov34,
                      cov14,cov24,cov34,var4), 4, 4)
    # negative log-likelihood
    f <- -sum(log(dmvt(x=series, delta=mean, sigma=covar, df=nu, log=FALSE)))
    f
  }
  cov1 <- c(1e-2,1e-2,1e-2,1e-2,1e-2,1e-2,1e-10,1e-10,1e-10,1e-10)
  cov2 <- c(0.99,0.99,0.99,0.99,0.99,0.99,1e-3,1e-3,1e-3,1e-3)
  lower <- append(append(c(-1,-1,-1,-1),cov1),2.1)
  upper <- append(append(c(0.01,0.001,0.001,0.001),cov2),7)
  start <- lower
  results <- optim(start, loglik, method="L-BFGS-B",
                   lower=lower, upper=upper, hessian=TRUE)
  return(results)
}
fit_mine <- mtfit(CRSPday[,c(4:7)])
Found a solution to this, hence sharing: parameterize the covariance matrix by its Cholesky factor instead of by correlations. Since t(A) %*% A is positive semi-definite for any upper-triangular A, the box bounds no longer push the optimizer into a region where the density is undefined, so the finite-difference gradient stays finite.
mtfit <- function(series){
  loglik <- function(par) {
    mean <- par[1:4]
    nu <- par[15]
    # upper-triangular Cholesky factor of the covariance matrix
    A <- matrix(c(par[5], par[6], par[7],  par[8],
                  0,      par[9], par[10], par[11],
                  0,      0,      par[12], par[13],
                  0,      0,      0,       par[14]), nrow=4, byrow=TRUE)
    covar <- t(A) %*% A
    f <- -sum(log(dmvt(x=series, delta=mean, sigma=covar, df=nu, log=FALSE)))
    f
  }
  # start from the Cholesky factor of the sample covariance
  A <- chol(cov(series))
  cov1 <- rep(-0.1, 10)
  cov2 <- rep(0.1, 10)
  lower <- append(append(c(-0.02,-0.02,-0.02,-0.02),cov1),2.1)
  upper <- append(append(c(0.02,0.02,0.02,0.02),cov2),15)
  start <- as.vector(c(apply(series,2,mean),
                       A[1,1],A[1,2],A[1,3],A[1,4],A[2,2],
                       A[2,3],A[2,4],A[3,3],A[3,4],A[4,4],4))
  results <- optim(start, loglik, method="L-BFGS-B",
                   lower=lower, upper=upper, hessian=TRUE)
  return(results)
}
fit_mine <- mtfit(CRSPday[,c(4:7)])
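For reference, here is a minimal sketch (my addition, assuming the fit converged) of how to map the optimized parameter vector back to the estimated covariance matrix and degrees of freedom:
par_hat <- fit_mine$par
A_hat <- matrix(c(par_hat[5:8],
                  0, par_hat[9:11],
                  0, 0, par_hat[12:13],
                  0, 0, 0, par_hat[14]), nrow=4, byrow=TRUE)
Sigma_hat <- t(A_hat) %*% A_hat  # estimated covariance (scale) matrix
nu_hat <- par_hat[15]            # estimated degrees of freedom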
Related
I'm having problems with the identify function.
I am trying to identify points on the residual plot of the fitted model, but the identify function gives an error.
library(mgcv)
require(gamm4)
fit4.gamm <- gamm(log(massaseca) ~ factor(Trat) +
                    s(Tempo, k=10, bs="ps", m=2, by=factor(Trat)),
                  random=list(id=pdSymm(~Tempo)), data=dados)
comp_lme = fit4.gamm$lme
x11()
plot(comp_lme)
Error:
identify(comp_lme)
Error in xy.coords(x, y, setLab = FALSE) :
'x' is a list, but does not have components 'x' and 'y'
Take a reproducible example from the documentation:
library(mgcv)
library(gamm4)
sig <- 2
set.seed(0)
n.g <- 10
n <- n.g*10*4
## simulate smooth part...
dat <- gamSim(1,n=n,scale=2)
f <- dat$f
## simulate nested random effects....
fa <- as.factor(rep(1:10,rep(4*n.g,10)))
ra <- rep(rnorm(10),rep(4*n.g,10))
fb <- as.factor(rep(rep(1:4,rep(n.g,4)),10))
rb <- rep(rnorm(4),rep(n.g,4))
for (i in 1:9) rb <- c(rb,rep(rnorm(4),rep(n.g,4)))
## simulate auto-correlated errors within groups
e <- array(0,0)
for (i in 1:40) {
  eg <- rnorm(n.g, 0, sig)
  for (j in 2:n.g) eg[j] <- eg[j-1]*0.6 + eg[j]
  e <- c(e,eg)
}
dat$y <- f + ra + rb + e
dat$fa <- fa;dat$fb <- fb
## fit model ....
b <- gamm(y ~ s(x0,bs="cr") + s(x1,bs="cr") + s(x2,bs="cr") + s(x3,bs="cr"),
          data=dat, random=list(fa=~1,fb=~1), correlation=corAR1())
What you are trying:
plot(b$lme) #lattice plot
identify(b$lme) #doesn't work with lattice plots
Instead, make your own base plot:
plot(fitted(b$lme), resid(b$lme))
identify(fitted(b$lme), resid(b$lme))
This will work.
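If you want the clicked points labelled by observation, identify() also takes a labels argument (a small sketch of mine, assuming you want the row names of dat):
identify(fitted(b$lme), resid(b$lme), labels = rownames(dat))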
I am using this code to fit a model using LASSO regression.
library(glmnet)
IV1 <- data.frame(IV1 = rnorm(100))
IV2 <- data.frame(IV2 = rnorm(100))
IV3 <- data.frame(IV3 = rnorm(100))
IV4 <- data.frame(IV4 = rnorm(100))
IV5 <- data.frame(IV5 = rnorm(100))
DV <- data.frame(DV = rnorm(100))
data <- data.frame(IV1, IV2, IV3, IV4, IV5, DV)
x <- model.matrix(DV ~ . - IV5, data)[,-1]
y <- data$DV
AB <- glmnet(x=x, y=y, alpha=1)
plot(AB,xvar="lambda")
lambdas <- NULL
for (i in 1:100) {
  fit <- cv.glmnet(x,y)
  errors <- data.frame(fit$lambda, fit$cvm)
  lambdas <- rbind(lambdas, errors)
}
lambdas <- aggregate(lambdas[, 2], list(lambdas$fit.lambda), mean)
bestindex <- which(lambdas[2] == min(lambdas[2]))
bestlambda <- lambdas[bestindex, 1]
fit <- glmnet(x, y, lambda=bestlambda)
I would like to calculate some sort of R2 on the training data. I assume one way to do this is via the cross-validation I performed when choosing lambda. Based on this post, it seems this can be done using
r2<-max(1-fit$cvm/var(y))
However, when I run this, I get this warning:
Warning message:
In max(1 - fit$cvm/var(y)) :
no non-missing arguments to max; returning -Inf
Can anyone point me in the right direction? Is this the best way to compute R2 based off of the training data?
The glmnet function does not return cvm as part of its fit object:
?glmnet
What you want is cv.glmnet:
?cv.glmnet
The following works (note that you must specify more than one lambda, or let it pick the sequence itself):
fit <- cv.glmnet(x,y,lambda=lambdas[,1])
r2<-max(1-fit$cvm/var(y))
I'm not sure I understand what you are trying to do. Maybe do this?
lambdas <- NULL
r2 <- numeric(100)  # r2 must be initialized before indexing into it
for (i in 1:100) {
  fit <- cv.glmnet(x,y)
  errors <- data.frame(fit$lambda, fit$cvm)
  lambdas <- rbind(lambdas, errors)
  r2[i] <- max(1 - fit$cvm/var(y))
}
lambdas <- aggregate(lambdas[, 2], list(lambdas$fit.lambda), mean)
bestindex <- which(lambdas[2] == min(lambdas[2]))
bestlambda <- lambdas[bestindex, 1]
r2[bestindex]
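Alternatively (a sketch I am adding, assuming x, y, and bestlambda from the code above), you can compute a training R2 directly from the predictions of the final fit; note it will typically be more optimistic than the cross-validated estimate:
fit_full <- glmnet(x, y, alpha = 1)                  # full lambda path
pred <- predict(fit_full, newx = x, s = bestlambda)  # predictions at bestlambda
r2_train <- 1 - sum((y - pred)^2) / sum((y - mean(y))^2)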
How can I do a difference-in-means test (t-test) for a multivariate outcome using R and WinBUGS14?
I have a multivariate outcome y and a categorical variable x. I am able to get the means of the MCMC-sampled values from the multivariate model using the code below, but how can I test for a difference in means by the variable x?
Here is the R code
library(R2WinBUGS)
library(MASS)     # needed for mvrnorm
library(MCMCpack) # needed for rwish
# Generate synthetic data
N <- 500
#we use this to simulate the data
S <- matrix(c(1,.2,.2,5),nrow=2)
#Produces one or more samples from the specified multivariate normal distribution.
#produces 2 variables with the given distribution
y <- mvrnorm(n=N,mu=c(1,3),Sigma=S)
x <- rbinom(500, 1, 0.5)
# Set up for WinBUGS
# set up the mu0 values
mu0 <- as.vector(c(0,0))
# covariance matrices (the precisions)
S2 <- matrix(c(1,0,0,1),nrow=2)/1000  # precision for the unknown mu
# precision matrix to be passed to the Wishart distribution for tau
S3 <- matrix(c(1,0,0,1),nrow=2)/10000
# the data for the WinBUGS code
data <- list("y","N","S2","S3","mu0")
inits <- function(){
  list(mu  = mvrnorm(1, mu0, matrix(c(10,0,0,10), nrow=2)),
       tau = rwish(3, matrix(c(.02,0,0,.04), nrow=2)))  # note: tau = ..., not tau <- ...
}
# Run WinBUGS
bug_file <- paste0(getwd(), "/codes/mult_normal.bug")
multi_norm.sim <- bugs(data, inits, model.file=bug_file,
                       parameters=c("mu","tau"),
                       n.chains=2, n.iter=4010, n.burnin=10, n.thin=1,
                       bugs.directory="../WinBUGS14/", codaPkg=FALSE)
print(multi_norm.sim,digits=3)
and this is the WinBUGS14 code called mult_normal.bug
model{
  for(i in 1:N)
  {
    y[i,1:2] ~ dmnorm(mu[],tau[,])
  }
  mu[1:2] ~ dmnorm(mu0[],S2[,])
  # parameters of a Wishart
  tau[1:2,1:2] ~ dwish(S3[,],3)
}
Two steps:
Load a function that runs the t-test from sample statistics instead of calling t.test on the raw data directly.
t.test2 <- function(m1, m2, s1, s2, n1, n2, m0=0, equal.variance=FALSE)
{
  if( equal.variance==FALSE )
  {
    se <- sqrt( (s1^2/n1) + (s2^2/n2) )
    # Welch-Satterthwaite degrees of freedom
    df <- ( (s1^2/n1 + s2^2/n2)^2 )/( (s1^2/n1)^2/(n1-1) + (s2^2/n2)^2/(n2-1) )
  } else
  {
    # pooled standard deviation, scaled by the sample sizes
    se <- sqrt( (1/n1 + 1/n2) * ((n1-1)*s1^2 + (n2-1)*s2^2)/(n1+n2-2) )
    df <- n1+n2-2
  }
  t <- (m1-m2-m0)/se
  dat <- c(m1-m2, se, t, 2*pt(-abs(t), df))
  names(dat) <- c("Difference of means", "Std Error", "t", "p-value")
  return(dat)
}
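As a quick sanity check (a hypothetical example of mine, not part of the original answer), t.test2 should reproduce the Welch test that t.test runs on raw data:
set.seed(1)
a <- rnorm(30); b <- rnorm(30, mean = 0.5)
t.test2(mean(a), mean(b), sd(a), sd(b), 30, 30)  # from summary statistics
t.test(a, b)  # Welch t-test on the raw data; t, df, and p-value should match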
Parse out the mean and standard deviation of the things we want to test against x, then pass them to the function.
mu1 <- as.data.frame(multi_norm.sim$mean)$mu[1]
sdmu1 <- multi_norm.sim$sd$mu[1]
t.test2( mean(x), as.numeric(mu1), s1 = sd(x), s2 = sdmu1, 500, 500)
Difference of means           Std Error                   t             p-value
      -4.950656e-01        2.246905e-02       -2.203323e+01        5.862968e-76
So I have this data and I would like to extract the coefficients from the equation it produces. That way I would be able to plug in a new data point and see where it would be placed.
library(MASS)
Iris <- data.frame(rbind(iris3[,,1], iris3[,,2], iris3[,,3]),
Sp = rep(c("s","c","v"), rep(50,3)))
train <- sample(1:150, 75)
table(Iris$Sp[train])
## your answer may differ
## c s v
## 22 23 30
z <- lda(Sp ~ ., Iris, prior = c(1,1,1)/3, subset = train)
I know I can get this:
> z
Call:
lda(Sp ~ ., data = Iris, prior = c(1, 1, 1)/3, subset = train)
Prior probabilities of groups:
c s v
0.3333333 0.3333333 0.3333333
Group means:
Sepal.L. Sepal.W. Petal.L. Petal.W.
c 5.969231 2.753846 4.311538 1.3384615
s 5.075000 3.541667 1.500000 0.2583333
v 6.700000 2.936000 5.552000 1.9880000
Coefficients of linear discriminants:
LD1 LD2
Sepal.L. -0.5458866 0.5215937
Sepal.W. -1.5312824 1.7891248
Petal.L. 1.8087255 -1.2637188
Petal.W. 2.8620894 3.2868849
Proportion of trace:
LD1 LD2
0.9893 0.0107
but is there a way to get just the equation so I would not have to calculate the new observation by hand?
Just turning this into an answer: you need predict(). The predict.lda method in the MASS package has your exact example on its help page:
tr <- sample(1:50, 25)
train <- rbind(iris3[tr,,1], iris3[tr,,2], iris3[tr,,3])
test <- rbind(iris3[-tr,,1], iris3[-tr,,2], iris3[-tr,,3])
cl <- factor(c(rep("s",25), rep("c",25), rep("v",25)))
z <- lda(train, cl)
predict(z, test)$class
The default method is "plug-in", so this is the relevant code from MASS:::predict.lda; object is the fitted model and x comes from the newdata argument, converted to a matrix:
# snipped preamble and error checking
means <- colSums(prior * object$means)
scaling <- object$scaling
x <- scale(x, center = means, scale = FALSE) %*% scaling
dm <- scale(object$means, center = means, scale = FALSE) %*% scaling
method <- match.arg(method)
dimen <- if (missing(dimen)) length(object$svd)
         else min(dimen, length(object$svd))
N <- object$N
if (method == "plug-in") {
    dm <- dm[, 1L:dimen, drop = FALSE]
    dist <- matrix(0.5 * rowSums(dm^2) - log(prior), nrow(x),
                   length(prior), byrow = TRUE) -
        x[, 1L:dimen, drop = FALSE] %*% t(dm)
    dist <- exp(-(dist - apply(dist, 1L, min, na.rm = TRUE)))
}
# snipped two other methods
posterior <- dist/drop(dist %*% rep(1, ng))
This is mostly put in to demonstrate why Gregor's answer is the most sensible approach: trying to pull out an "equation" seems unfruitful. (I can remember using the results of linear regression for such an exercise in my first-year regression class in grad school.)
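That said, if you really do want to apply the "equation" by hand, the discriminant scores for a new observation can be reproduced from the fitted object (a sketch of mine mirroring the MASS snippet above; it assumes z and Iris from the question):
# center at the prior-weighted grand mean, then project onto the LD basis
grand_mean <- colSums(z$prior * z$means)
new_obs <- as.matrix(Iris[1, 1:4])  # any new row with the same four columns
ld_scores <- scale(new_obs, center = grand_mean, scale = FALSE) %*% z$scaling
ld_scores  # should match predict(z, Iris[1, 1:4])$x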
I am trying to estimate the predicted values of y, with confidence intervals, for specific values of x in an OLS regression. My model includes an interaction term, and I use clustered standard errors and weights.
A similar question was asked and answered previously; I thought it could be a good starting point:
robust standard errors in ggplot2
The problem is that the solution offered there does not work when there is an interaction term OR weights in the model, yet it does produce an outcome when there are both weights and an interaction term. I find this confusing, but I am relatively new to R and could not track down the source of the problem.
In the second and third examples (lm2 and lm3) I get "Error in X %*% V : non-conformable arguments". My best guess for the source of the error in the third case is that model.frame(lm3) does not include the interaction term, but I don't know whether I am on the right track, and I could not find a way to fix it. Besides, it's not clear to me how I can set x1 to a specific value in this example. Can someone help me revise the code below, or offer an alternative way to get predicted standard errors when x is set to a specific value?
df <- data.frame(x1 = rnorm(100), x2 = rnorm(100), w1 = runif(100, 0.1, 2),
                 y = rnorm(100), group = as.factor(sample(1:10, 100, replace=TRUE)))
lm1 <- lm(y ~ x1+x2, data = df)
lm2 <- lm(y ~ x1+x2, data = df, weights=w1)
lm3 <- lm(y ~ x1*x2, data = df)
lm4 <- lm(y ~ x1*x2, data = df, weights=w1)
getvcov <- function(fm, dfcw, cluster) {
  library(sandwich); library(lmtest)
  M <- length(unique(cluster))
  N <- length(cluster)
  K <- fm$rank
  dfc <- (M/(M-1))*((N-1)/(N-K))
  uj <- apply(estfun(fm), 2, function(x) tapply(x, cluster, sum))
  dfc * sandwich(fm, meat=crossprod(uj)/N) * dfcw
}
V <- getvcov(lm1, 1, df$group)
X <- as.matrix(model.frame(lm1))
se <- predict(lm1, se=TRUE)$se.fit
se_robust1 <- sqrt(diag(X %*% V %*% t(X)))

V <- getvcov(lm2, 1, df$group)
X <- as.matrix(model.frame(lm2))
se <- predict(lm2, se=TRUE)$se.fit
se_robust2 <- sqrt(diag(X %*% V %*% t(X)))

V <- getvcov(lm3, 1, df$group)
X <- as.matrix(model.frame(lm3))
se <- predict(lm3, se=TRUE)$se.fit
se_robust3 <- sqrt(diag(X %*% V %*% t(X)))

V <- getvcov(lm4, 1, df$group)
X <- as.matrix(model.frame(lm4))
se <- predict(lm4, se=TRUE)$se.fit
se_robust4 <- sqrt(diag(X %*% V %*% t(X)))
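A sketch of the fix suggested by that guess (my assumption, not a verified solution): model.frame() returns the raw variables, including y and, when weights are used, a "(weights)" column, so its column count matches the K x K matrix V only by coincidence; model.matrix() returns the actual design matrix, with the intercept and the x1:x2 column, which conforms with V:
# hypothetical sketch: use the design matrix rather than the model frame
V <- getvcov(lm3, 1, df$group)
X <- model.matrix(lm3)                      # includes intercept and x1:x2
se_robust3 <- sqrt(diag(X %*% V %*% t(X)))

# to evaluate at a specific x1, build a one-row design matrix
newdat <- data.frame(x1 = 1, x2 = mean(df$x2))  # hypothetical values
X0 <- model.matrix(delete.response(terms(lm3)), newdat)
se_at_x1 <- sqrt(diag(X0 %*% V %*% t(X0)))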