Confusion matrix on logistic regression - r

I am trying to perform logistic regression on the dataset provided here, using 5-fold cross-validation.
My goal is to predict the Classification column of the dataset, which can take the value 1 (if no cancer) and the value 2 (if cancer).
Here is the full code:
library(ISLR)
library(boot)
dataCancer <- read.csv("http://archive.ics.uci.edu/ml/machine-learning-databases/00451/dataR2.csv")
#Randomly shuffle the data
dataCancer<-dataCancer[sample(nrow(dataCancer)),]
#Create 5 equally sized folds
folds <- cut(seq(1,nrow(dataCancer)),breaks=5,labels=FALSE)
#Perform 5 fold cross validation
for(i in 1:5){
#Segment your data by fold using the which() function
testIndexes <- which(folds == i)
testData <- dataCancer[testIndexes, ]
trainData <- dataCancer[-testIndexes, ]
#Use the test and train data partitions however you desire...
classification_model = glm(as.factor(Classification) ~ ., data = trainData,family = binomial)
summary(classification_model)
#Use the fitted model to do predictions for the test data
model_pred_probs = predict(classification_model , testData , type = "response")
model_predict_classification = rep(0 , length(testData))
model_predict_classification[model_pred_probs > 0.5] = 1
#Create the confusion matrix and compute the misclassification rate
table(model_predict_classification , testData)
mean(model_predict_classification != testData)
}
I need help with the last two lines of the loop:
table(model_predict_classification , testData)
mean(model_predict_classification != testData)
I get the following error:
Error in table(model_predict_classification, testData) : all arguments must have the same length
I don't fully understand how to build the confusion matrix.
I want five misclassification rates, one per fold. The data has been split into 5 folds, so I expected the test set to have the same size as model_predict_classification.
Thanks for your help.

Here is a solution using the caret package to perform 5-fold cross validation on the cancer data after splitting it into test and training data sets. Confusion matrices are generated against both the test and training data.
caret::train() reports an average accuracy across the 5 hold out folds. The results for each individual fold can be obtained by extracting them from the output model object.
library(caret)
data <- read.csv("http://archive.ics.uci.edu/ml/machine-learning-databases/00451/dataR2.csv")
# set classification as factor, and recode to
# 0 = no cancer, 1 = cancer
data$Classification <- as.factor((data$Classification - 1))
# split data into training and test, based on values of dependent variable
trainIndex <- createDataPartition(data$Classification, p = .75,list=FALSE)
training <- data[trainIndex,]
testing <- data[-trainIndex,]
trCntl <- trainControl(method = "CV",number = 5)
glmModel <- train(Classification ~ .,data = training,trControl = trCntl,method="glm",family = "binomial")
# print the model info
summary(glmModel)
glmModel
confusionMatrix(glmModel)
# generate predictions on hold back data
trainPredicted <- predict(glmModel,testing)
# generate confusion matrix for hold back data
confusionMatrix(trainPredicted,reference=testing$Classification)
...and the output:
> # print the model info
> summary(glmModel)

Call:
NULL

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.1542  -0.8358   0.2605   0.8260   2.1009

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.4039248  3.9159157  -1.125   0.2607
Age         -0.0190241  0.0177119  -1.074   0.2828
BMI         -0.1257962  0.0749341  -1.679   0.0932 .
Glucose      0.0912229  0.0389587   2.342   0.0192 *
Insulin      0.0917095  0.2889870   0.317   0.7510
HOMA        -0.1820392  1.2139114  -0.150   0.8808
Leptin      -0.0207606  0.0195192  -1.064   0.2875
Adiponectin -0.0158448  0.0401506  -0.395   0.6931
Resistin     0.0419178  0.0255536   1.640   0.1009
MCP.1        0.0004672  0.0009093   0.514   0.6074
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 119.675 on 86 degrees of freedom
Residual deviance:  89.804 on 77 degrees of freedom
AIC: 109.8

Number of Fisher Scoring iterations: 7

> glmModel
Generalized Linear Model

87 samples
 9 predictor
 2 classes: '0', '1'

No pre-processing
Resampling: Cross-Validated (5 fold)
Summary of sample sizes: 70, 69, 70, 69, 70
Resampling results:

  Accuracy   Kappa
  0.7143791  0.4356231

> confusionMatrix(glmModel)
Cross-Validated (5 fold) Confusion Matrix

(entries are percentual average cell counts across resamples)

          Reference
Prediction    0    1
         0 33.3 17.2
         1 11.5 37.9

 Accuracy (average) : 0.7126

> # generate predictions on hold back data
> trainPredicted <- predict(glmModel,testing)
> # generate confusion matrix for hold back data
> confusionMatrix(trainPredicted,reference=testing$Classification)
Confusion Matrix and Statistics

          Reference
Prediction  0  1
         0 11  2
         1  2 14

               Accuracy : 0.8621
                 95% CI : (0.6834, 0.9611)
    No Information Rate : 0.5517
    P-Value [Acc > NIR] : 0.0004078

                  Kappa : 0.7212
 Mcnemar's Test P-Value : 1.0000000

            Sensitivity : 0.8462
            Specificity : 0.8750
         Pos Pred Value : 0.8462
         Neg Pred Value : 0.8750
             Prevalence : 0.4483
         Detection Rate : 0.3793
   Detection Prevalence : 0.4483
      Balanced Accuracy : 0.8606

       'Positive' Class : 0
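For reference, the error in the question's loop comes from passing the whole testData data frame to table() and mean(), and from sizing the prediction vector with length(testData), which counts columns rather than rows. The predictions are also coded 0/1 while Classification is coded 1/2. Below is a minimal base-R sketch of a corrected loop. It runs on simulated data so that it is self-contained: the data frame dataSim and its columns x1 and x2 are hypothetical stand-ins, not the cancer data.

```r
# Simulated stand-in for the cancer data (hypothetical predictors x1, x2).
# Classification is coded 1 = no cancer, 2 = cancer, as in the question.
set.seed(42)
n <- 100
dataSim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dataSim$Classification <- factor(ifelse(dataSim$x1 + rnorm(n) > 0, 2, 1),
                                 levels = c(1, 2))

# Shuffle the rows and assign each row to one of 5 folds
dataSim <- dataSim[sample(nrow(dataSim)), ]
folds <- cut(seq_len(nrow(dataSim)), breaks = 5, labels = FALSE)

misclass <- numeric(5)
for (i in 1:5) {
  testIndexes <- which(folds == i)
  testData  <- dataSim[testIndexes, ]
  trainData <- dataSim[-testIndexes, ]

  # With a factor response, glm() models the probability of the second level ("2")
  fit <- glm(Classification ~ ., data = trainData, family = binomial)
  probs <- predict(fit, newdata = testData, type = "response")

  # One prediction per test row, on the same 1/2 coding as the response
  pred <- ifelse(probs > 0.5, "2", "1")
  actual <- as.character(testData$Classification)

  # Compare against the response column, not the whole data frame
  print(table(predicted = pred, actual = actual))
  misclass[i] <- mean(pred != actual)
}

# Five misclassification rates, one per fold
misclass
```

The same loop should carry over to dataCancer once Classification is converted to a factor before the loop, as the caret answer does.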

Related

R Nonlinear Least Squares (nls) function: Using indexed vectors as inputs?

I am trying to run the nls function in R using indexed vectors as inputs, however I am getting an error:
> a=c(1,2,3,4,5,6,7,8,9,10)
> b=c(6,7,9,11,14,18,23,30,38,50) #make some example data
>
> nls(b[1:6]~s+k*2^(a[1:6]/d),start=list(s=2,k=3,d=2.5)) #try running nls on first 6 elements of a and b
Error in parse(text = x, keep.source = FALSE) :
<text>:2:0: unexpected end of input
1: ~
^
I can run it on the full vectors:
> nls(b~s+k*2^(a/d),start=list(s=2,k=3,d=2.5))
Nonlinear regression model
model: b ~ s + k * 2^(a/d)
data: parent.frame()
s k d
1.710 3.171 2.548
residual sum-of-squares: 0.3766
Number of iterations to convergence: 3
Achieved convergence tolerance: 1.2e-07
I am fairly certain that the indexed vectors have the same variable type as the full vectors:
> a
[1] 1 2 3 4 5 6 7 8 9 10
> typeof(a)
[1] "double"
> class(a)
[1] "numeric"
> a[1:6]
[1] 1 2 3 4 5 6
> typeof(a[1:6])
[1] "double"
> class(a[1:6])
[1] "numeric"
I can run nls if I save the indexed vectors in new variables:
> a_part=a[1:6]
> b_part=b[1:6]
> nls(b_part~s+k*2^(a_part/d),start=list(s=2,k=3,d=2.5))
Nonlinear regression model
model: b_part ~ s + k * 2^(a_part/d)
data: parent.frame()
s k d
2.297 2.720 2.373
residual sum-of-squares: 0.06569
Number of iterations to convergence: 3
Achieved convergence tolerance: 1.274e-07
Furthermore, lm accepts both full and indexed vectors:
> lm(b~a)
Call:
lm(formula = b ~ a)
Coefficients:
(Intercept) a
-4.667 4.594
> lm(b[1:6]~a[1:6])
Call:
lm(formula = b[1:6] ~ a[1:6])
Coefficients:
(Intercept) a[1:6]
2.533 2.371
Is there a way to run nls on indexed vectors without saving them in new variables?
Use the subset argument. (It would also be possible to use the weights argument, giving a weight of 1 to each of the first 6 observations and 0 to the rest.)
Also you might want to use the plinear algorithm to avoid having to give the starting values for the two parameters that enter linearly. In that case provide a matrix on the RHS with column names s and k such that its first column multiplies s and the second column multiplies k.
nls(b ~ cbind(s = 1, k = 2^(a/d)), subset = 1:6, start = list(d = 2.5),
algorithm = "plinear")
giving:
Nonlinear regression model
model: b ~ cbind(s = 1, k = 2^(a/d))
data: parent.frame()
d .lin.s .lin.k
2.373 2.297 2.720
residual sum-of-squares: 0.06569
Number of iterations to convergence: 3
Achieved convergence tolerance: 7.186e-08
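The subset argument can also be combined with the default algorithm and the original parameterisation. A minimal sketch using the question's vectors, checked against the fit on the copied vectors a_part and b_part (the names start_vals, fit_subset and fit_part are only illustrative):

```r
a <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
b <- c(6, 7, 9, 11, 14, 18, 23, 30, 38, 50)
start_vals <- list(s = 2, k = 3, d = 2.5)

# Fit on the first 6 observations without creating new variables
fit_subset <- nls(b ~ s + k * 2^(a / d), subset = 1:6, start = start_vals)

# Reference fit on explicitly copied vectors, as in the question
a_part <- a[1:6]
b_part <- b[1:6]
fit_part <- nls(b_part ~ s + k * 2^(a_part / d), start = start_vals)

# Both fits use the same six observations, so the estimates should agree
coef(fit_subset)
coef(fit_part)
```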

Logistic Regression on NBA shot data

I am using NBA shot data and am attempting to create shot prediction models using different regression techniques. However, I am running into the following warning message when trying to use a logistic regression model:
Warning message: glm.fit: algorithm did not converge
Also, it seems that the predictions do not work at all (they are unchanged from the original Y variable, make or miss). I will provide my code below. I got the data from here: Shot Data.
nba_shots <- read.csv("shot_logs.csv")
library(dplyr)
library(ggplot2)
library(data.table)
library("caTools")
library(glmnet)
library(caret)
nba_shots_clean <- data.frame("game_id" = nba_shots$GAME_ID, "location" =
nba_shots$LOCATION, "shot_number" = nba_shots$SHOT_NUMBER,
"closest_defender" = nba_shots$CLOSEST_DEFENDER,
"defender_distance" = nba_shots$CLOSE_DEF_DIST, "points" = nba_shots$PTS,
"player_name" = nba_shots$player_name, "dribbles" = nba_shots$DRIBBLES,
"shot_clock" = nba_shots$SHOT_CLOCK, "quarter" = nba_shots$PERIOD,
"touch_time" = nba_shots$TOUCH_TIME, "game_result" = nba_shots$W
, "FGM" = nba_shots$FGM)
mean(nba_shots_clean$shot_clock) # NA
# this gave NA return which means that there are NAs in this column that we
# need to clean up
# if the shot clock was NA I assume that this means it was the end of a
# quarter and the shot clock was off.
# For now I'm going to just set all of these NAs equal to zero, so all zeros
# mean it is the end of a quarter
# checking the amount of NAs
last_shots <- nba_shots_clean[is.na(nba_shots_clean$shot_clock),]
nrow(last_shots) # this tells me there is 5567 shots taken when the shot
# clock was turned off at the end of a quarter
# setting these NAs equal to zero
nba_shots_clean[is.na(nba_shots_clean)] <- 0
# checking to see if it worked
nrow(nba_shots_clean[is.na(nba_shots_clean$shot_clock),]) # it worked
# create a test and train set
split = sample.split(nba_shots_clean, SplitRatio=0.75)
nbaTrain = subset(nba_shots_clean, split==TRUE)
nbaTest = subset(nba_shots_clean, split==FALSE)
# logistic regression
nbaLogitModel <- glm(FGM ~ location + shot_number + defender_distance +
points + dribbles + shot_clock + quarter + touch_time, data=nbaTrain,
family="binomial", na.action = na.omit)
nbaPredict = predict(nbaLogitModel, newdata=nbaTest, type="response")
cm = table(nbaTest$FGM, nbaPredict > 0.5)
print(cm)
This gives me the output of the following, which tells me the prediction didn't do anything, as it's the same as before.
FALSE TRUE
0 21428 0
1 0 17977
I would really appreciate any guidance.
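The confusion matrix of your model (model prediction vs. nbaTest$FGM) tells you that your model has 100% accuracy!
This is because the points variable in your dataset is perfectly associated with the dependent variable: a missed shot always scores 0 points and a made shot scores 2 or 3. That perfect separation is also what triggers the convergence warning. The cross-tabulation shows it: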
table(nba_shots_clean$points, nba_shots_clean$FGM)
0 1
0 87278 0
2 0 58692
3 0 15133
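A quick way to screen for this kind of leak is to cross-tabulate each candidate predictor against the outcome and check whether every predictor value occurs with only one outcome. The sketch below uses simulated data; the helper determines_outcome() and the variables made, points and distance are hypothetical names for illustration.

```r
set.seed(7)
n <- 500
made <- rbinom(n, 1, 0.45)

# Hypothetical leak: points are only awarded on a made shot
points <- made * sample(c(2, 3), n, replace = TRUE)

# An ordinary predictor, generated independently of the outcome
distance <- runif(n, 0, 10)

# TRUE when every value of x occurs with exactly one value of y,
# i.e. x alone determines y
determines_outcome <- function(x, y) {
  tab <- table(x, y)
  all(rowSums(tab > 0) == 1)
}

table(points = points, made = made)
determines_outcome(points, made)   # TRUE by construction
determines_outcome(round(distance), made)
```

This check only makes sense for predictors with a small number of distinct values; a continuous predictor with all-unique values would pass it trivially.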
Try removing points from your model:
# create a test and train set
set.seed(1234)
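# note: sample.split() expects the outcome vector (e.g. nba_shots_clean$FGM);
# the call is kept as in the question so that the output below matches
split = sample.split(nba_shots_clean, SplitRatio=0.75)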
nbaTrain = subset(nba_shots_clean, split==TRUE)
nbaTest = subset(nba_shots_clean, split==FALSE)
# logistic regression
nbaLogitModel <- glm(FGM ~ location + shot_number + defender_distance +
dribbles + shot_clock + quarter + touch_time, data=nbaTrain,
family="binomial", na.action = na.omit)
summary(nbaLogitModel)
No warning messages now and the estimated model is:
Call:
glm(formula = FGM ~ location + shot_number + defender_distance +
dribbles + shot_clock + quarter + touch_time, family = "binomial",
data = nbaTrain, na.action = na.omit)
Deviance Residuals:
Min 1Q Median 3Q Max
-3.8995 -1.1072 -0.9743 1.2284 1.6799
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.427688 0.025446 -16.808 < 2e-16 ***
locationH 0.037920 0.012091 3.136 0.00171 **
shot_number 0.007972 0.001722 4.630 0.000003656291 ***
defender_distance -0.006990 0.002242 -3.117 0.00182 **
dribbles 0.010582 0.004859 2.178 0.02941 *
shot_clock 0.032759 0.001083 30.244 < 2e-16 ***
quarter -0.043100 0.007045 -6.118 0.000000000946 ***
touch_time -0.038006 0.005700 -6.668 0.000000000026 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 153850 on 111532 degrees of freedom
Residual deviance: 152529 on 111525 degrees of freedom
AIC: 152545
Number of Fisher Scoring iterations: 4
The confusion matrix is:
nbaPredict = predict(nbaLogitModel, newdata=nbaTest, type="response")
cm = table(nbaTest$FGM, nbaPredict > 0.5)
print(cm)
FALSE TRUE
0 21554 5335
1 16726 5955
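To put this matrix in perspective, the accuracy can be compared with the rate obtained by always predicting the more common class. A small sketch that re-enters the counts printed above (the object names are illustrative):

```r
# Counts copied from the confusion matrix above: rows = actual FGM, columns = prediction > 0.5
cm <- matrix(c(21554, 16726, 5335, 5955), nrow = 2,
             dimnames = list(actual = c("0", "1"),
                             predicted = c("FALSE", "TRUE")))

# Share of shots classified correctly (diagonal of the matrix)
accuracy <- sum(diag(cm)) / sum(cm)

# Share obtained by always predicting the majority class (a miss)
baseline <- max(rowSums(cm)) / sum(cm)

round(c(accuracy = accuracy, baseline = baseline), 3)
# accuracy is about 0.555, baseline about 0.542
```

So with these predictors the model is only slightly better than always predicting a miss.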

Order of predictions from merTools predictInterval()

I'm encountering an issue with predictInterval() from merTools. The predictions seem to be out of order when compared to the data and midpoint predictions using the standard predict() method for lme4. I can't reproduce the problem with simulated data, so the best I can do is show the lmerMod object and some of my data.
> # display input data to the model
> head(inputData)
id y x z
1 calibration19 1.336 0.531 001
2 calibration20 1.336 0.433 001
3 calibration22 0.042 0.432 001
4 calibration23 0.042 0.423 001
5 calibration16 3.300 0.491 001
6 calibration17 3.300 0.465 001
> sapply(inputData, class)
id y x z
"factor" "numeric" "numeric" "factor"
>
> # fit mixed effects regression with random intercept on z
> lmeFit = lmer(y ~ x + (1 | z), inputData)
>
> # display lmerMod object
> lmeFit
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ x + (1 | z)
Data: inputData
REML criterion at convergence: 444.245
Random effects:
Groups Name Std.Dev.
z (Intercept) 0.3097
Residual 0.9682
Number of obs: 157, groups: z, 17
Fixed Effects:
(Intercept) x
-0.4291 5.5638
>
> # display new data to predict in
> head(predData)
id x z
1 29999900108 0.343 001
2 29999900207 0.315 001
3 29999900306 0.336 001
4 29999900405 0.408 001
5 29999900504 0.369 001
6 29999900603 0.282 001
> sapply(predData, class)
id x z
"factor" "numeric" "factor"
>
> # estimate fitted values using predict()
> set.seed(1)
> preds_mid = predict(lmeFit, newdata=predData)
>
> # estimate fitted values using predictInterval()
> set.seed(1)
> preds_interval = predictInterval(lmeFit, newdata=predData, n.sims=1000) # wrong order
>
> # estimate fitted values just for the first observation to confirm that it should be similar to preds_mid
> set.seed(1)
> preds_interval_first_row = predictInterval(lmeFit, newdata=predData[1,], n.sims=1000)
>
> # display results
> head(preds_mid) # correct prediction
1 2 3 4 5 6
1.256860 1.101074 1.217913 1.618505 1.401518 0.917470
> head(preds_interval) # incorrect order
fit upr lwr
1 1.512410 2.694813 0.133571198
2 1.273143 2.521899 0.009878347
3 1.398273 2.785358 0.232501376
4 1.878165 3.188086 0.625161201
5 1.605049 2.813737 0.379167003
6 1.147415 2.417980 -0.108547846
> preds_interval_first_row # correct prediction
fit upr lwr
1 1.244366 2.537451 -0.04911808
> preds_interval[round(preds_interval$fit,3)==round(preds_interval_first_row$fit,3),] # the correct prediction ends up as observation 1033
fit upr lwr
1033 1.244261 2.457012 -0.0001299777
>
To put this into words, the first observation of my data frame predData should have a fitted value around 1.25 according to the predict() method, but it has a value around 1.5 using the predictInterval() method. This does not seem to be simply due to differences in the prediction approaches, because if I restrict the newdata argument to the first row of predData, the resulting fitted value is around 1.25, as expected.
The fact that I can't reproduce the problem with simulated data leads me to believe it has to do with an attribute of my input or prediction data. I've tried reclassifying the factor variable as character and enforcing the row order, both before fitting the model and between fitting and predicting, but without success.
Is this a known issue? What can I do to avoid it?
I have attempted to make a minimal reproducible example of this issue, but have been unsuccessful.
library(merTools)
d <- data.frame(x = rnorm(1000), z = sample(1:25L, 1000, replace=TRUE),
id = sample(LETTERS, 1000, replace = TRUE))
d$z <- as.factor(d$z)
d$id <- factor(d$id)
d$y <- simulate(~x+(1|z),family = gaussian,
newdata=d,
newparams=list(beta=c(2, -1.1), theta=c(.25),
sigma = c(.23)), seed =463)[[1]]
lmeFit <- lmer(y ~ x + (1|z), data = d)
predData <- data.frame(x = rnorm(25), z = sample(1:25L, 25, replace=TRUE),
id = sample(LETTERS, 25, replace = TRUE))
predData$z <- as.factor(predData$z)
predData$id <- factor(predData$id)
predict(lmeFit, predData)
predictInterval(lmeFit, predData)
predictInterval(lmeFit, predData[1, ])
However, playing around with this code, I was not able to recreate the error observed above. Can you post a synthetic example, or see if you can create one?
Alternatively, can you first coerce the factors to characters and check whether the same re-ordering issue appears?

How Do You Use Post-Stratification Output to Influence Variables in a Predictive Model in R?

My current dataset oversampled females to the point that they make up 74% of the total sample of 411, when the split should be 50/50. How can I use my post-stratification output to influence my (logistic regression) predictive model?
This is what I did to get the new mean and coefficients of my support when changing the amount of women surveyed:
> library(foreign)
> library(survey)
>
> mydata <- read.csv("~/Desktop/R/mydata.csv")
>
> #Enter Actual Population Size
> mydata$fpc <- 1200
>
> #Enter ID Column Name
> id <- mydata$My.ID
>
> #Enter Column to Post-Stratify
> type <- mydata$Male
>
> #Enter Column Variables
> x1 <- 0
> y1 <- 1
>
> #Enter Corresponding Frequencies
> x2 <- 600
> y2 <- 600
>
> #Enter the Variable of Interest
> mydata$interest <- mydata$Support
>
> preliminary.design <- svydesign(id = ~1, data = mydata, fpc = ~fpc)
>
> ps.weights <- data.frame(type = c(x1,y1), Freq = c(x2, y2))
>
> mydesign <- postStratify(preliminary.design, ~type, ps.weights)
>
> #Print Original Mean of Variable of Interest
> mean(mydata$Support)
[1] 0.6666666667
>
> #Total Actual Population Size
> sum(ps.weights$Freq)
[1] 1200
>
> #Unweighted Observations Where the Variable of Interest is Not Missing
> unwtd.count(~interest, mydesign)
counts SE
counts 411 0
>
> #Print the Post-Stratified Mean and SE of the Variable
> svymean(~interest, mydesign)
mean SE
interest 0.71077946 0.01935
>
> #Print the Weighted Total and SE of the Variable
> svytotal(~interest, mydesign)
total SE
interest 852.93535 23.21552
>
> #Print the Mean and SE of the Interest Variable, by Type
> svyby(~interest, ~type, mydesign, svymean)
type interest se
0 0 0.6196721311 0.02256768435
1 1 0.8018867925 0.03142947839
>
> mysvyby <- svyby(~interest, ~type, mydesign, svytotal)
>
> #Print the Coefficients of each Type
> coef(mysvyby)
0 1
371.8032787 481.1320755
>
> #Print the Standard Error of each Type
> SE(mysvyby)
[1] 13.54061061 18.85768704
>
> #Print Confidence Intervals for the Coefficient Estimates
> confint(mysvyby)
2.5 % 97.5 %
0 345.2641696 398.3423878
1 444.1716880 518.0924629
All of the output above seems right -- but I can't figure out how to utilize that data to influence the output of my logistic regression model. This is the code without any post-stratification influence:
> mydata <- read.csv("~/Desktop/R/mydata.csv")
>
> attach(mydata)
>
> # Define variables
>
> Y <- cbind(Support)
> X <- cbind(Black, vote, Male)
>
> # Descriptive statistics
>
> summary(Y)
Support
Min. :0.0000000
1st Qu.:0.0000000
Median :1.0000000
Mean :0.6666667
3rd Qu.:1.0000000
Max. :1.0000000
>
> summary(X)
Black vote Male
Min. :0.0000000 Min. : 0.8100 Min. :0.0000000
1st Qu.:0.0000000 1st Qu.:24.0350 1st Qu.:0.0000000
Median :0.0000000 Median :47.6300 Median :0.0000000
Mean :0.4355231 Mean :48.0447 Mean :0.2579075
3rd Qu.:1.0000000 3rd Qu.:72.1300 3rd Qu.:1.0000000
Max. :1.0000000 Max. :91.3200 Max. :1.0000000
>
> table(Y)
Y
0 1
137 274
>
> table(Y)/sum(table(Y))
Y
0 1
0.3333333333 0.6666666667
>
>
> # Logit model coefficients
>
> logit<- glm(Y ~ X, family=binomial (link = "logit"))
>
> summary(logit)
Call:
glm(formula = Y ~ X, family = binomial(link = "logit"))
Deviance Residuals:
Min 1Q Median 3Q Max
-2.1658288 -1.1277933 0.5904486 0.9190314 1.3256407
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.462496014 0.265017604 1.74515 0.0809584 .
XBlack 1.329633506 0.244053422 5.44812 5.0904e-08 ***
Xvote -0.008839950 0.004262016 -2.07412 0.0380678 *
XMale 0.781144950 0.283218355 2.75810 0.0058138 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 523.21465 on 410 degrees of freedom
Residual deviance: 469.48706 on 407 degrees of freedom
AIC: 477.48706
Number of Fisher Scoring iterations: 4
>
> # Logit model odds ratios
>
> exp(logit$coefficients)
(Intercept) XBlack Xvote XMale
1.5880327947 3.7796579101 0.9911990073 2.1839713716
Is there a way to combine these two scripts in R to update my logit model so that it looks at gender as 50/50 instead of 74% female/26% male when I predict?
Thanks!
Since you want to create predictions from your model, here's a possible solution: (1) fit the logistic regression model with the data you have at hand (that is, with 74% female and 26% male) and then (2) extract predicted probabilities from your model setting the gender variable equal to 0.5. See ?predict.glm for more information.
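A minimal sketch of those two steps, on simulated data so that it runs on its own. The data frame sim and the coefficients used to generate it are hypothetical; only the column names follow the question. The model is fitted with data = and column names rather than the cbind() matrices Y and X, since that makes predict() with newdata straightforward. The last part shows a possible alternative, not taken from the answer: averaging the predictions for Male = 0 and Male = 1 with equal weights.

```r
set.seed(1)
n <- 411
sim <- data.frame(
  Black = rbinom(n, 1, 0.44),
  vote  = runif(n, 0, 90),
  Male  = rbinom(n, 1, 0.26)   # males under-represented, as in the question
)

# Hypothetical data-generating model, used only to create an outcome
eta <- 0.46 + 1.33 * sim$Black - 0.009 * sim$vote + 0.78 * sim$Male
sim$Support <- rbinom(n, 1, plogis(eta))

# (1) Fit the model on the sample as it is
fit <- glm(Support ~ Black + vote + Male, data = sim, family = binomial)

# (2) Predict with the gender variable set to 0.5 for everyone
p_half <- predict(fit, newdata = transform(sim, Male = 0.5), type = "response")
mean(p_half)

# Alternative (an assumption, not from the answer): average the predictions
# for Male = 0 and Male = 1 with 50/50 weights
p_female <- predict(fit, newdata = transform(sim, Male = 0), type = "response")
p_male   <- predict(fit, newdata = transform(sim, Male = 1), type = "response")
p_avg <- 0.5 * p_female + 0.5 * p_male
mean(p_avg)
```

The two versions generally give slightly different numbers because the logistic link is nonlinear: plugging in 0.5 averages on the log-odds scale, while the second version averages on the probability scale.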

Doing calculations on summary elements

Is there an easy way to run followup mathematical calculations on elements of a summary? I have log transformed data that is run through an anova analysis. I would like to calculate the antilog of the summary output.
I have the following code:
require(multcomp)
inc <- log(Inc)
myanova <- aov(inc ~ educ)
tukey <- glht(myanova, linfct = mcp(educ = "Tukey"))
summary(tukey)
Which produces an output as follows:
Estimate Std. Error t value Pr(>|t|)
12 - under12 == 0 0.32787 0.08493 3.861 0.00104 **
13to15 - under12 == 0 0.49187 0.08775 5.606 < 0.001 ***
16 - under12 == 0 0.89775 0.09217 9.740 < 0.001 ***
over16 - under12 == 0 0.99856 0.09316 10.719 < 0.001 ***
13to15 - 12 == 0 0.16400 0.04674 3.509 0.00394 **
etc.
How can I easily execute an antilog calculation on the Estimate values?
This is a bit of a hack, so I'd recommend further checking, but if all you want is to see exponentiated estimates and standard errors I think something similar to the following will work (I used different data).
> amod <- aov(breaks ~ tension, data = warpbreaks)
> tukey = glht(amod, linfct = mcp(tension = "Tukey"))
> tsum = summary(tukey)
> tsum[[10]]$coefficients = exp(tsum[[10]]$coefficients)
> tsum[[10]]$sigma = exp(tsum[[10]]$sigma)
> tsum
If you want to use coef(tukey) to give you the estimates then you would reverse transform with:
exp(coef(tukey))
I think this should work:
coef(tukey)
to get the estimated values. Here is an example:
amod <- aov(breaks ~ tension, data = warpbreaks)
tukey <- glht(amod, linfct = mcp(tension = "Tukey"))
To see all the elements of the Tukey summary, apply head or tail to it; the result is a named list of the summary elements.
head(summary(tukey))
$model
Call:
aov(formula = breaks ~ tension, data = warpbreaks)
Terms:
tension Residuals
Sum of Squares 2034.259 7198.556
Deg. of Freedom 2 51
Residual standard error: 11.88058
Estimated effects may be unbalanced
$linfct
(Intercept) tensionM tensionH
M - L 0 1 0
H - L 0 0 1
H - M 0 -1 1
attr(,"type")
[1] "Tukey"
$rhs
[1] 0 0 0
$coef
(Intercept) tensionM tensionH
36.38889 -10.00000 -14.72222
$vcov
(Intercept) tensionM tensionH
(Intercept) 7.841564 -7.841564 -7.841564
tensionM -7.841564 15.683128 7.841564
tensionH -7.841564 7.841564 15.683128
$df
[1] 51
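If the multcomp output format is not essential, a possible alternative is base R's TukeyHSD(), which returns the pairwise differences and their confidence limits as a plain matrix that can be exponentiated directly. This is a sketch of a different function than the glht() used above, shown on warpbreaks with a log-transformed response; the object names are illustrative.

```r
# One-way ANOVA on a log-transformed response
amod_log <- aov(log(breaks) ~ tension, data = warpbreaks)

# Tukey comparisons: a matrix with columns diff, lwr, upr and "p adj"
tk <- TukeyHSD(amod_log, "tension")

# Antilog of the estimates and confidence limits. On the original scale
# these read as ratios of geometric means, not differences.
ratios <- exp(tk$tension[, c("diff", "lwr", "upr")])
ratios
```

Note that exponentiating makes sense for the estimates and interval limits, but an exponentiated standard error is not a standard error on the original scale.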
