From the rpart package, it is possible to get the following output from the printcp function. But how can I extract the root node error value?
Classification tree:
rpart(formula = survived ~ ., data = ptitanic, control = rpart.control(cp = 1e-04))

Variables actually used in tree construction:
[1] age    parch  pclass sex    sibsp

Root node error: 500/1309 = 0.38

n= 1309

      CP nsplit rel error xerror  xstd
1 0.4240      0      1.00   1.00 0.035
2 0.0210      1      0.58   0.58 0.030
3 0.0150      3      0.53   0.57 0.030
4 0.0113      5      0.50   0.57 0.030
5 0.0026      9      0.46   0.53 0.029
6 0.0020     16      0.44   0.53 0.029
7 0.0001     18      0.44   0.53 0.029
You can get the root node error value from the frame component of your fit via:
fit$frame[1, 'dev'] / fit$frame[1, 'n']
or from the yval2.V5 entry in the first row of fit$frame.
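Putting it together, a minimal sketch (assuming the ptitanic data come from the rpart.plot package, matching the call in the question):

library(rpart)
library(rpart.plot)  # provides the ptitanic data

fit <- rpart(survived ~ ., data = ptitanic,
             control = rpart.control(cp = 1e-04))

# Row 1 of fit$frame is the root node: dev is the number of misclassified
# observations there and n is the node size, so dev/n reproduces the
# "Root node error: 500/1309 = 0.38" line printed by printcp().
fit$frame[1, 'dev'] / fit$frame[1, 'n']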
For the "rd" parameter, I got an error message while running t.test using the ggpubr::stat_compare_means() function. Moreover, TukeyHSD analysis of my data categorized all the individual group as "a", implying that there was no significance differences. This seems a bit weird as I'm expecting the opposite by looking at the plot (attached my plot). Moreover, there was no issue for identical t.test and TukeyHSD analysis of other parameters (fv,fq,npq, and in_situ etc. in data frame ). Please find my scripts and datas below, thanks.
This is just an example of a similar plot for another parameter ("fv" in the data frame), where the results of the t.test from ggpubr::stat_compare_means() are shown above the error bars; an identical script was used. (expected plot)
library(readr)   # read_csv()
library(Rmisc)   # summarySE()

Exp1 <- read_csv("Raw data/Exp1.csv")
Exp1$time <- factor(Exp1$time)
Exp1$growth_condition <- factor(Exp1$growth_condition)
summary_anti_PsbS_both <- summarySE(data = Exp1, measurevar = "rd",
                                    groupvars = c("time", "growth_condition"))
> data.frame(Exp1)
id growth_condition time fv fq npq in_situ rd
1 1 Control 0 0.81 0.56 0.72 0.797 1.000
2 2 Control 0 0.81 0.58 0.78 0.788 1.000
3 3 Control 0 0.80 0.59 0.76 0.793 1.000
4 4 High light+Chilled 0 0.82 0.57 0.85 0.799 1.000
5 5 High light+Chilled 0 0.81 0.59 0.75 0.796 1.000
6 6 High light+Chilled 0 0.81 0.56 0.69 0.782 1.000
7 7 Control 0.5 0.81 0.53 1.08 0.759 1.279
8 8 Control 0.5 0.81 0.56 0.72 0.759 0.668
9 9 Control 0.5 0.79 0.50 1.04 0.771 0.877
10 10 High light+Chilled 0.5 0.70 0.46 1.04 0.540 0.487
11 11 High light+Chilled 0.5 0.60 0.43 0.69 0.652 1.341
12 12 High light+Chilled 0.5 0.73 0.46 1.19 0.606 0.904
13 13 Control 8 0.82 0.52 1.20 0.753 0.958
14 14 Control 8 0.81 0.55 1.09 0.759 0.642
15 15 Control 8 0.80 0.55 1.07 0.747 0.612
16 16 High light+Chilled 8 0.44 0.28 0.58 0.230 0.471
17 17 High light+Chilled 8 0.35 0.21 0.45 0.237 0.777
18 18 High light+Chilled 8 0.54 0.35 0.68 0.186 0.342
19 19 Control 24 0.81 0.49 1.17 0.762 0.915
20 20 Control 24 0.82 0.67 1.25 0.749 0.876
21 21 Control 24 0.82 0.48 1.18 0.756 0.836
22 22 High light+Chilled 24 0.40 0.25 0.45 0.089 0.392
23 23 High light+Chilled 24 0.43 0.27 0.51 0.106 0.627
24 24 High light+Chilled 24 0.34 0.21 0.37 0.140 0.258
25 25 Control 48 0.81 0.48 1.05 0.773 0.662
26 26 Control 48 0.80 0.45 1.14 0.785 0.914
27 27 Control 48 0.82 0.47 1.09 0.792 0.912
28 28 High light+Chilled 48 0.73 0.45 0.90 0.750 0.800
29 29 High light+Chilled 48 0.70 0.51 0.79 0.626 1.305
30 30 High light+Chilled 48 0.66 0.43 0.74 0.655 0.579
Script for plot
ggplot(data = summary_anti_PsbS_both,
       mapping = aes(x = factor(time), y = rd, fill = growth_condition)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(x = "Time (hr)", y = "Relative density", fill = "Growth conditions") +
  ylim(0, 1.5) +
  geom_errorbar(aes(ymin = rd - se, ymax = rd + se), width = .2,
                position = position_dodge(width = 0.9)) +
  annotate(geom = "text", x = 1, y = 1.45, label = "n=3") +
  stat_compare_means(data = Exp1, label = "p.signif", label.y = 1.35,
                     method = "t.test") +
  theme_bw() +
  theme(text = element_text(size = 15))
Error message
Warning message:
Computation failed in `stat_compare_means()`:
Problem while computing `p = purrr::map(...)`.
Script for TukeyHSD
library(multcompView)  # multcompLetters4()

res.both88 <- aov(rd ~ growth_condition * time, data = Exp1)
summary(res.both88)
t8 <- TukeyHSD(res.both88)
multcompLetters4(res.both88, t8)
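One possible direction, offered purely as a sketch (it assumes the failure comes from stat_compare_means() receiving the raw Exp1 data while inheriting aesthetics meant for the summary data frame): build the comparison layer on the raw replicates and map the grouping variable explicitly, following the grouped-comparison pattern from the ggpubr documentation.

library(ggpubr)

# Hypothetical sketch on the raw data: compare the two growth conditions
# within each time point; all column names come from Exp1 above.
ggplot(Exp1, aes(x = time, y = rd, fill = growth_condition)) +
  geom_boxplot() +
  stat_compare_means(aes(group = growth_condition),
                     label = "p.signif", method = "t.test")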
I'm running a multinomial logistic regression. The outcome has four categories and there are three predictors: Male (= 1 for male), a measure of the number of books in the home on a 5-point scale, and a measure of motivation to read, which is continuous. Here are the essential aspects of the code. I'm following the "K" parameterization in https://mc-stan.org/docs/stan-users-guide/multi-logit.html. Thanks.
DOWELL <- as.factor(DOWELL)
Canadareg2 <- data.frame(Canadareg2)
n <- nrow(Canadareg2)
f <- as.formula("DOWELL ~ Male + booksHome + motivRead")
m <- model.matrix(f, Canadareg2)
data.list <- list(n = nrow(Canadareg2),
                  k = length(unique(Canadareg2[, 1])),
                  d = ncol(m), x = m, Male = Male, booksHome = booksHome,
                  motivRead = motivRead, DOWELL = as.numeric(Canadareg2[, 1]))
ReadMultiNom <- "
data {
int<lower = 2> k; // The variable has at least two categories
int<lower = 1> d; // number of predictors
int<lower = 0> n;
vector[n] Male;
vector[n] booksHome;
vector[n] motivRead;
int <lower=1, upper=k> DOWELL[n];
matrix[n, d] x;
}
parameters {
matrix[d, k] beta;
}
transformed parameters {
matrix[n, k] x_beta= x * beta;
}
model {
to_vector(beta) ~ normal(0,2); // vectorizes beta and assigns same prior
for (i in 1:n) {
DOWELL[i] ~ categorical_logit(x_beta[i]');
}
}
generated quantities {
int <lower=1, upper=k> DOWELL_rep[n];
vector[n] log_lik;
for (i in 1:n) {
DOWELL_rep[i] = categorical_logit_rng(x_beta[i]');
log_lik[i] = categorical_logit_lpmf(DOWELL[i] |x_beta[i]');
}
}
"
nChains <- 4
nIter <- 10000
thinSteps <- 10
burnInSteps <- floor(nIter / 2)
DOWELL <- data.list$DOWELL
MultiNomRegFit <- stan(data = data.list, model_code = ReadMultiNom,
                       chains = nChains, control = list(adapt_delta = 0.99),
                       iter = nIter, warmup = burnInSteps, thin = thinSteps)
Everything runs beautifully and all convergence criteria are met. However, I am struggling to interpret the betas. I'm not sure where the Male effects are located; it seems there are coefficients only for the two other predictors, and even then one of them is on a 5-point scale. It would seem to me that each beta should have three elements, e.g. beta(1,1,1), beta(1,2,1), etc. Here is the output. I'm just unsure how to interpret the betas.
Inference for Stan model: 7be33c603bd35d82ad7f6b200ccee16f.
## 4 chains, each with iter=10000; warmup=5000; thin=10;
## post-warmup draws per chain=500, total post-warmup draws=2000.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
## beta[1,1] -2.07 0.03 1.05 -4.16 -2.80 -2.08 -1.36 0.00 1705 1
## beta[1,2] 1.16 0.03 1.04 -0.97 0.47 1.18 1.84 3.25 1652 1
## beta[1,3] 1.06 0.03 1.07 -1.03 0.36 1.06 1.76 3.15 1703 1
## beta[1,4] -0.01 0.03 1.07 -2.17 -0.70 -0.01 0.68 2.06 1657 1
## beta[2,1] 0.51 0.03 1.04 -1.52 -0.18 0.52 1.20 2.51 1618 1
## beta[2,2] -0.01 0.03 1.04 -2.08 -0.72 -0.02 0.67 2.07 1636 1
## beta[2,3] -0.31 0.03 1.03 -2.31 -1.00 -0.31 0.38 1.70 1617 1
## beta[2,4] -0.26 0.03 1.04 -2.28 -0.94 -0.26 0.44 1.72 1648 1
## beta[3,1] 0.26 0.03 1.01 -1.73 -0.40 0.27 0.95 2.17 1525 1
## beta[3,2] 0.03 0.03 1.01 -2.01 -0.65 0.05 0.70 1.93 1525 1
## beta[3,3] 0.02 0.03 1.01 -2.04 -0.64 0.04 0.70 1.95 1522 1
## beta[3,4] -0.30 0.03 1.01 -2.36 -0.99 -0.29 0.38 1.59 1528 1
## beta[4,1] 0.18 0.02 0.98 -1.70 -0.49 0.14 0.85 2.17 1561 1
## beta[4,2] -0.09 0.02 0.98 -1.98 -0.77 -0.14 0.58 1.89 1568 1
## beta[4,3] -0.12 0.02 0.98 -2.01 -0.80 -0.15 0.56 1.85 1566 1
## beta[4,4] 0.11 0.02 0.99 -1.79 -0.56 0.07 0.79 2.10 1559 1
##
## Samples were drawn using NUTS(diag_e) at Tue Oct 18 09:58:48 2022.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
I'm not quite sure what is going on, so any advice would be appreciated.
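For what it's worth, here is one way to read the output, stated as an assumption based on the model.matrix() call above rather than a verified answer: m has columns (Intercept), Male, booksHome, motivRead, so beta is a d x k = 4 x 4 matrix whose rows index predictors and whose columns index outcome categories. Row 2 (beta[2,1] through beta[2,4]) would then hold the Male effect on each of the four categories. A sketch for pulling those draws:

# Hypothetical sketch: extract the posterior draws for the Male row of beta.
post <- rstan::extract(MultiNomRegFit, pars = "beta")$beta  # draws x d x k
male_draws <- post[, 2, ]  # row 2 of beta = Male, given the assumed column order
colMeans(male_draws)       # posterior mean Male effect for each category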
library(mirt) #this contains a dataset called deAyala.
library(psych) #this contains the alpha() function.
alpha(deAyala)
Using this function gives me the following output:
Some items ( Item.4 Item.5 ) were negatively correlated with the total scale and
probably should be reversed.
To do this, run the function again with the 'check.keys=TRUE' option
Reliability analysis
Call: alpha(x = deAyala)
 raw_alpha std.alpha G6(smc) average_r  S/N     ase mean  sd median_r
   0.00097      0.21    0.32     0.042 0.27 0.00057  103 134        0
lower alpha upper 95% confidence boundaries
0 0 0
Reliability if an item is dropped:
          raw_alpha std.alpha G6(smc) average_r   S/N alpha se var.r med.r
Item.1      5.9e-05     0.019   0.057    0.0038 0.019  0.00056 0.011     0
Item.2      6.5e-04     0.178   0.293    0.0414 0.216  0.00055 0.044     0
Item.3      8.4e-04     0.220   0.342    0.0535 0.283  0.00054 0.047     0
Item.4      1.2e-03     0.289   0.385    0.0750 0.406  0.00052 0.044     0
Item.5      1.3e-03     0.306   0.387    0.0812 0.442  0.00051 0.041     0
Frequency   0.0e+00     0.000   0.000    0.0000 0.000  0.27951 0.000     0
Item statistics
n raw.r std.r r.cor r.drop mean sd
Item.1 32 0.60 0.59 0.657 0.60 0.5 0.51
Item.2 32 0.22 0.45 0.202 0.22 0.5 0.51
Item.3 32 0.10 0.41 0.080 0.10 0.5 0.51
Item.4 32 -0.11 0.33 -0.059 -0.11 0.5 0.51
Item.5 32 -0.17 0.31 -0.079 -0.17 0.5 0.51
Frequency 32 1.00 0.61 0.722 0.29 612.5 804.25
Non missing response frequency for each item
0 1 miss
Item.1 0.5 0.5 0
Item.2 0.5 0.5 0
Item.3 0.5 0.5 0
Item.4 0.5 0.5 0
Item.5 0.5 0.5 0
I ONLY want the raw.r column ([1:6] 0.6 0.223 0.103 -0.112 -0.174) from the item statistics table, and I want to store it in a variable. How can I do that? I tried the following:
str(alpha(deAyala)$item.stats$raw.r)
But this gives me a lot of text:
Some items ( Item.4 Item.5 ) were negatively correlated with the total scale and
probably should be reversed.
To do this, run the function again with the 'check.keys=TRUE' option num [1:6] 0.6 0.223 0.103 -0.112 -0.174 ...
Warning message:
In alpha(deAyala) :
Some items were negatively correlated with the total scale and probably
should be reversed.
To do this, run the function again with the 'check.keys=TRUE' option
You can get rid of the warning part by wrapping it in suppressWarnings(), but the first part of the message looks like just a print statement inside the alpha() function. This will work, though:
invisible(capture.output(x <- suppressWarnings(alpha(deAyala)$item.stats$raw.r)))
EDIT: Actually, I just looked at the help for alpha() and it has a warnings argument you can simply set to FALSE.
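A minimal sketch of that simpler route (assuming warnings = FALSE also silences the printed note; if any console output remains, the capture.output() wrapper above still applies):

library(mirt)   # deAyala data
library(psych)  # alpha()

raw_r <- alpha(deAyala, warnings = FALSE)$item.stats$raw.r
raw_r  # the raw.r values from the item statistics table above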
I have the following data:
library(tidyverse)
age_grp <- c(10,9,8,7,6,5,4,3,2,1)
start <- c(0.420,0.420,0.420,0.420,0.420,0.420,0.420,0.420,0.420,0.420)
change <- c(0.020,0.033,0.029,0.031,0.027,0.032,0.032,0.030,0.027,0.034)
final_outcome <- c(0.400,0.367,0.338,0.307,0.28,0.248,0.216,0.186,0.159,0.125)
my_data <- data.frame(age_grp,start,change,final_outcome)
my_data1 <- my_data %>%
dplyr::arrange(age_grp)
I would like to subtract the values in the variable change from the values in the variable start such that it is an iterative decrease from the oldest age group to the youngest. The final values I am looking for are in the variable final_outcome. For example, starting with age_grp 10, I want to subtract 0.020 from 0.420 to get 0.400. Then I would like to subtract 0.033 from 0.400 to get 0.367, and so forth. I am struggling with how to store those differences. I have made an attempt below, but I don't know how to carry the difference forward to continue the subtraction (or backward, depending on how you look at it). Any advice or suggestions would be appreciated.
my_data1$attempt <- NA
#calculating the decreases
for(i in 2:nrow(my_data1))
my_data1$attempt[i] <- round(my_data1$start[i] - my_data1$change[i-1], 4)
If we need the same output as in final_outcome:
library(dplyr)
my_data %>%
mutate(attempt = start - cumsum(change)) %>%
arrange(age_grp)
Output
# age_grp start change final_outcome attempt
#1 1 0.42 0.034 0.125 0.125
#2 2 0.42 0.027 0.159 0.159
#3 3 0.42 0.030 0.186 0.186
#4 4 0.42 0.032 0.216 0.216
#5 5 0.42 0.032 0.248 0.248
#6 6 0.42 0.027 0.280 0.280
#7 7 0.42 0.031 0.307 0.307
#8 8 0.42 0.029 0.338 0.338
#9 9 0.42 0.033 0.367 0.367
#10 10 0.42 0.020 0.400 0.400
Or as a base R one-liner:
my_data$final <- my_data$start - cumsum(my_data$change)
library(tidyverse)
my_data %>%
mutate(attempt = accumulate(change, ~ .x - .y, .init = start[1])[-1])
Note: accumulate() comes from the purrr package, which is part of the tidyverse. It also has a .dir argument that lets the accumulation run "forward" or "backward".
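A tiny illustration of .dir on made-up values (not the question's data):

library(purrr)
accumulate(1:4, `+`)                     # 1  3  6 10  (default: "forward")
accumulate(1:4, `+`, .dir = "backward")  # 10  9  7  4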
Or in base R using Reduce():
within(my_data, attempt <- Reduce("-", change, init = start[1], accumulate = TRUE)[-1])
Reduce() has a right argument that can likewise run the computation forwards or backwards.
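Again on made-up values, right mirrors purrr's .dir:

Reduce(`+`, 1:4, accumulate = TRUE)                # 1  3  6 10  (left fold)
Reduce(`+`, 1:4, accumulate = TRUE, right = TRUE)  # 10  9  7  4  (right fold)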
Output
age_grp start change final_outcome attempt
1 10 0.42 0.020 0.400 0.400
2 9 0.42 0.033 0.367 0.367
3 8 0.42 0.029 0.338 0.338
4 7 0.42 0.031 0.307 0.307
5 6 0.42 0.027 0.280 0.280
6 5 0.42 0.032 0.248 0.248
7 4 0.42 0.032 0.216 0.216
8 3 0.42 0.030 0.186 0.186
9 2 0.42 0.027 0.159 0.159
10 1 0.42 0.034 0.125 0.125
Subject var1 var2 var3 var4 var5
1 0.2 0.78 7.21 0.5 0.47
1 0.52 1.8 11.77 -0.27 -0.22
1 0.22 0.84 7.32 0.35 0.36
2 0.38 1.38 10.05 -0.25 -0.2
2 0.56 1.99 13.76 -0.44 -0.38
3 0.35 1.19 7.23 -0.16 -0.06
4 0.09 0.36 4.01 0.55 0.51
4 0.29 1.08 9.48 -0.57 -0.54
4 0.27 1.03 9.42 -0.19 -0.21
4 0.25 0.9 7.06 0.12 0.12
5 0.18 0.65 5.22 0.41 0.42
5 0.15 0.57 5.72 0.01 0.01
6 0.26 0.94 7.38 -0.17 -0.13
6 0.14 0.54 5.13 0.16 0.17
6 0.22 0.84 6.97 -0.66 -0.58
6 0.18 0.66 5.79 0.23 0.25
# The above is a sample data matrix (dat11).
# The following code calculates an approximate p-value (p.z) for the
# variable pair var2 and var3 using lmer().
library(lme4)  # provides lmer()

fit1 <- lmer(var2 ~ var3 + (1|Subject), data = dat11)
summary(fit1)
coefs <- data.frame(coef(summary(fit1)))
# use normal distribution to approximate p-value
coefs$p.z <- 2 * (1 - pnorm(abs(coefs$t.value)))
round(coefs,6)
# the following is the result
             Estimate Std.Error   t.value      p.z
(Intercept) -0.280424  0.110277 -2.542913 0.010993
var3         0.163764  0.013189 12.417034 0.000000
The real data contain 65 variables (var1, var2, ..., var65). I would like to use the above code to get this result for all possible pairs of the 65 variables, e.g., var1 ~ var2, var1 ~ var3, ..., var1 ~ var65; var2 ~ var3, var2 ~ var4, ..., var2 ~ var65; var3 ~ var4, and so on. There will be about 2,000 pairs. Can somebody help me with loop code and with getting the results into a .csv file? Thank you.
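One possible shape for that loop, offered as a sketch only (it assumes the full data frame is called dat11 with columns Subject and var1 ... var65, and it reuses the normal-approximation p-value from the question):

library(lme4)

vars  <- paste0("var", 1:65)
pairs <- t(combn(vars, 2))  # choose(65, 2) = 2080 unordered pairs

results <- do.call(rbind, lapply(seq_len(nrow(pairs)), function(i) {
  y   <- pairs[i, 1]
  x   <- pairs[i, 2]
  fit <- lmer(reformulate(c(x, "(1|Subject)"), response = y), data = dat11)
  co  <- data.frame(coef(summary(fit)))
  # normal approximation to the p-value, as in the question's code
  p.z <- 2 * (1 - pnorm(abs(co[x, "t.value"])))
  data.frame(outcome = y, predictor = x,
             estimate = co[x, "Estimate"],
             std.error = co[x, "Std..Error"],
             t.value = co[x, "t.value"],
             p.z = p.z)
}))

write.csv(results, "pairwise_lmer_results.csv", row.names = FALSE)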