I can't get confidence intervals for Friedman's chi-squared - R

I have a dataset (not normally distributed) with repeated measures over time, so I was planning to use friedman.test in R:
friedman.test(dash ~ time | nr)
this gave:
Friedman rank sum test
data: dash and time and nr
Friedman chi-squared = 105.26, df = 2, p-value < 2.2e-16
To calculate the CI of the chi-squared I tried the following:
friedman_boot <- function(data, indices) {
return(friedman.test(dash ~ time | nr, data = data_t[indices, ])$statistic)
}
boot_results <- boot(data = data_t, statistic = friedman_boot, R = 1000)
boot_ci <- boot.ci(boot.out = boot_results, type = "perc", conf = 0.95)
boot_ci
which gave:
boot_results <- boot(data = data_t, statistic = friedman_boot, R = 1000)
Error in friedman.test.default(mf[[1L]], mf[[2L]], mf[[3L]]) :
not an unreplicated complete block design
I do not quite understand why this happens. Does anyone have another way to calculate CI for the test statistic?
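A likely cause: data_t[indices, ] resamples individual rows of the long-format data, so some subject/time combinations appear twice and others disappear. friedman.test requires exactly one observation per subject and time point, hence "not an unreplicated complete block design". A minimal sketch of one workaround is to resample whole subjects instead. The data below are simulated stand-ins for data_t (the real data are not shown in the question), and the name wide is made up for illustration:

```r
library(boot)

set.seed(1)
# Hypothetical stand-in for data_t: 30 subjects (nr), each measured at 3 time points
n_subj <- 30
data_t <- data.frame(
  nr   = rep(seq_len(n_subj), each = 3),
  time = rep(1:3, times = n_subj),
  dash = rnorm(3 * n_subj, mean = rep(c(40, 30, 20), times = n_subj), sd = 8)
)

# One row per subject, one column per time point
# (each cell holds a single value, so mean() just returns it)
wide <- with(data_t, tapply(dash, list(nr, time), mean))

friedman_boot <- function(mat, indices) {
  # Resample whole subjects (rows), so every block keeps all its time points
  unname(friedman.test(mat[indices, , drop = FALSE])$statistic)
}

boot_results <- boot(data = wide, statistic = friedman_boot, R = 1000)
boot_ci <- boot.ci(boot_results, type = "perc", conf = 0.95)
boot_ci
```

Whether an interval for the chi-squared statistic itself is meaningful is a separate question. If the aim is an effect size with an interval, Kendall's W (the chi-squared statistic divided by n(k - 1), with n subjects and k time points) could be bootstrapped the same way.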

Related

Stata vs. R: Delta Method provides different results for relative risk SE's from logit model

I've been trying to estimate the conditional mean treatment effect of covariates in a logit regression (expressed as a relative risk), along with standard errors for inference. The delta method is needed to calculate the standard errors of these treatment effects. I've been trying to recreate in R the results of the Stata user-written command adjrr, which calculates the relative risk with standard errors and confidence intervals.
The adjrr command in Stata calculates the adjusted relative risk (the conditional mean treatment effect of interest for my project) and its SEs using the delta method. The deltamethod function in R (msm package) should produce the same results, but it does not.
How can I replicate the results from Stata in R?
I used the following self-generated data (see https://migariane.github.io/DeltaMethodEpiTutorial.nb.html).
R code below:
generateData <- function(n, seed){
set.seed(seed)
age <- rnorm(n, 65, 5)
age65p <- ifelse(age>=65, T, F)
cmbd <- rbinom(n, size=1, prob = plogis(1 - 0.05 * age))
Y <- rbinom(n, size=1, prob = plogis(1 - 0.02* age - 0.02 * cmbd))
data.frame(Y, cmbd, age, age65p)
}
# Describing the data
data <- generateData(n = 1000, seed = 777)
str(data)
logfit <- glm(Y ~ age65p + cmbd, data = data, family = binomial)
summary(logfit)
p1 <- predict(logfit, newdata = data.frame(age65p = T, cmbd = 0), type="response")
p0 <- predict(logfit, newdata = data.frame(age65p = F, cmbd = 0), type="response")
rr <- p1 / p0
rr
# 0.8123348 (result)
library(msm)
se_rr_delta <- deltamethod( ~(1 + exp(-x1)) / (1 + exp(-x1 -x2)), coef(logfit), vcov(logfit))
se_rr_delta
# 0.6314798 (result)
Stata Code (using same data):
logit Y i.age65p i.cmbd
adjrr age65p
//results below
R1 = 0.3685 (0.0218) 95% CI (0.3259, 0.4112)
R0 = 0.4524 (0.0222) 95% CI (0.4090, 0.4958)
ARR = 0.8146 (0.0626) 95% CI (0.7006, 0.9471)
ARD = -0.0839 (0.0311) 95% CI (-0.1449, -0.0229)
p-value (R0 = R1): 0.0071
p-value (ln(R1/R0) = 0): 0.0077
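One way to cross-check the deltamethod call is to apply the delta method by hand: take the gradient of the relative risk with respect to the coefficients and compute sqrt(g' V g). The sketch below does this with a numerical gradient. The names rr_fun and grad and the step size h are made up for illustration. Note that it evaluates the relative risk at cmbd = 0, as in the R code above, whereas adjrr (as far as I understand it) averages predicted risks over the observed covariates, so an exact match with the Stata output is not guaranteed.

```r
# Data generator copied from the question
generateData <- function(n, seed) {
  set.seed(seed)
  age <- rnorm(n, 65, 5)
  age65p <- age >= 65
  cmbd <- rbinom(n, size = 1, prob = plogis(1 - 0.05 * age))
  Y <- rbinom(n, size = 1, prob = plogis(1 - 0.02 * age - 0.02 * cmbd))
  data.frame(Y, cmbd, age, age65p)
}
data <- generateData(n = 1000, seed = 777)
logfit <- glm(Y ~ age65p + cmbd, data = data, family = binomial)

b <- coef(logfit)   # intercept, age65pTRUE, cmbd
V <- vcov(logfit)

# Relative risk at cmbd = 0 as a function of the coefficient vector
rr_fun <- function(b) unname(plogis(b[1] + b[2]) / plogis(b[1]))

# Central-difference gradient of rr_fun at the fitted coefficients
h <- 1e-6
grad <- sapply(seq_along(b), function(j) {
  e <- replace(numeric(length(b)), j, h)
  (rr_fun(b + e) - rr_fun(b - e)) / (2 * h)
})

# Delta-method standard error: sqrt(g' V g)
se_rr <- sqrt(drop(t(grad) %*% V %*% grad))
c(rr = rr_fun(b), se = se_rr)
```

If this hand-computed SE disagrees with the deltamethod result, the formula passed to deltamethod is the first thing to check; if both agree and still differ from Stata, the difference more likely lies in how adjrr defines the adjusted risks.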

Error when Bootstraping a Beta regression model in R with {betareg}

I need to bootstrap a beta regression model with the boot package to check its robustness, because one data point has a large Cook's distance (other suggestions welcome).
I have the following error:
Error in t.star[r, ] <- res[[r]] :
incorrect number of subscripts on matrix
Here's a reproducible example:
library(betareg)
library(boot)
fake_data <- data.frame(diet = as.factor(c(rep("A",10),rep("B",10))),
fat = c(runif(10,.1,.5),runif(10,.4,.9)) )
plot(fat~diet, data = fake_data)
my_beta_reg <- function(data,i){
data_i <- data[i,]
mod <- betareg(data_i[,"fat"] ~ data_i[,"diet"])
return(mod$coef)
}
b = boot(fake_data, statistic = my_beta_reg, R= 50)
Error in t.star[r, ] <- res[[r]] :
incorrect number of subscripts on matrix
What's the issue?
Thanks in advance.
The issue is that mod$coef is a list:
betareg(fat ~ diet, data = fake_data)$coef
#$mean
#(Intercept) dietB
# -1.275793 2.490126
#
#$precision
# (phi)
#20.59014
You need to unlist it or preferably use the function you are supposed to use for extraction of coefficients:
my_beta_reg <- function(data,i){
mod <- betareg(fat ~ diet, data = data[i,])
#unlist(mod$coef)
coef(mod)
}
b = boot(fake_data, statistic = my_beta_reg, R= 50)
print(b)
#ORDINARY NONPARAMETRIC BOOTSTRAP
#
#
#Call:
#boot(data = fake_data, statistic = my_beta_reg, R = 50)
#
#
#Bootstrap Statistics :
# original bias std. error
#t1* -1.275793 -0.019847377 0.2003523
#t2* 2.490126 0.009008892 0.2314521
#t3* 20.590142 8.265394485 17.2271497

Bootstrap t test in R without reliance on MKinfer package

I would like to get a CI for a paired t-test using the bootstrap in R. Unfortunately, I don't have privileges to install MKinfer. With MKinfer I would do the following (as in: T-test with bootstrap in R):
boot.t.test(
x = iris["Petal.Length"],
y = iris["Sepal.Length"],
alternative = c("two.sided"),
mu = 0,
#paired = TRUE,
conf.level = 0.95,
R = 9999
)
How would I do this for paired data, with CIs and p-values, without relying on MKinfer (relying on boot would be fine)?
Here is an example using boot with R = 1000 bootstrap replicates:
library(boot)
x <- iris$Petal.Length
y <- iris$Sepal.Length
change_in_mean <- function(df, indices) t.test(
df[indices, 1], df[indices, 2], paired = TRUE, var.equal = FALSE)$estimate
model <- boot(
data = cbind(x, y),
statistic = change_in_mean,
R = 1000)
We can calculate the confidence interval of the estimated change in the mean using boot.ci
boot.ci(model, type = "norm")
#BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
#Based on 1000 bootstrap replicates
#
#CALL :
#boot.ci(boot.out = model, type = "norm")
#
#Intervals :
#Level Normal
#95% (-2.262, -1.905 )
#Calculations and Intervals on Original Scale
Note that this is very close to the CI reported by t.test
t.test(x, y, paired = TRUE, var.equal = FALSE)
#
# Paired t-test
#
#data: x and y
#t = -22.813, df = 149, p-value < 2.2e-16
#alternative hypothesis: true difference in means is not equal to 0
#95 percent confidence interval:
# -2.265959 -1.904708
#sample estimates:
#mean of the differences
# -2.085333
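The question also asks for a p-value, which the interval above does not give directly. One common approach, sketched here under the assumption that a bootstrap of the t statistic under the null is acceptable, is to centre the paired differences so that the null (mean difference = 0) holds and then compare the observed t statistic with the bootstrap distribution. The names d0, t_stat and null_boot are made up for illustration:

```r
library(boot)

set.seed(42)
d <- iris$Petal.Length - iris$Sepal.Length      # paired differences
t_obs <- mean(d) / (sd(d) / sqrt(length(d)))    # observed paired t statistic

# Shift the differences so that the null hypothesis (mean = 0) is true
d0 <- d - mean(d)

t_stat <- function(v, indices) {
  s <- v[indices]
  mean(s) / (sd(s) / sqrt(length(s)))
}

null_boot <- boot(data = d0, statistic = t_stat, R = 9999)

# Two-sided bootstrap p-value; the +1 terms keep it away from exactly 0
p_value <- (sum(abs(null_boot$t) >= abs(t_obs)) + 1) / (null_boot$R + 1)
p_value
```

With R replicates the smallest p-value this can report is 1 / (R + 1), so for a difference as large as the one in the iris example it only says "smaller than about 1e-4" rather than reproducing the 2.2e-16 from t.test.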

gofCopula in R giving same p-values

I am working with copulas and trying to select the best one. I want to use a goodness-of-fit test, but my code gives me the same p-value for all the copulas, which is certainly wrong. Here is my code:
dat <- read_excel("SMI.xlsx")
data <- cbind(dat[,2], dat[,3], dat[4])
dataNZ <- cbind(data[,1], data[,3])
#definition of copulas
#Normal copula
nc <- ellipCopula(family = "normal", param= 0.5, dim = 2, dispstr = "un")
#Gumbel-Hougaard copula
gc <- gumbelCopula(dim = 2, param = 1.1)
#GOF tests
gofmplnNZ <- gofCopula(nc, u, estim.method= "mpl")
gofmplgNZ <- gofCopula(gc, u, estim.method= "mpl")
To show exactly what I mean, here are the results of gofmplnNZ and gofmplgNZ:
> gofmplnNZ
Parametric bootstrap-based goodness-of-fit test of Normal copula, dim. d = 2, with 'method'="Sn",
'estim.method'="mpl":
data: x
statistic = 0.062807, parameter = 0.45013, p-value = 0.0004995
> gofmplgNZ
Parametric bootstrap-based goodness-of-fit test of Gumbel copula, dim. d = 2, with 'method'="Sn",
'estim.method'="mpl":
data: x
statistic = 0.1805, parameter = 1.4038, p-value = 0.0004995
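A possible reading of the identical p-values: 0.0004995 looks like the smallest p-value a bootstrap test with 1000 replicates can report. As far as I recall, the copula package computes the parametric bootstrap p-value as (number of bootstrap statistics >= observed statistic + 0.5) / (N + 1), with N = 1000 by default; treat that formula as an assumption and check it against the package documentation. The arithmetic under that assumption:

```r
# Assumed p-value formula for the parametric bootstrap test (not verified here)
N <- 1000        # assumed default number of bootstrap replicates
exceed <- 0      # bootstrap statistics at least as large as the observed one
p_min <- (exceed + 0.5) / (N + 1)
signif(p_min, 4)
```

If that reading is right, both copulas are rejected as strongly as the test can express, and the statistics themselves (0.062807 vs 0.1805) carry more information for comparing them than the p-values do.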

R : Calculating p-value using simulations

I wrote this code to compute a test statistic on two random samples x and y:
mean.test <- function(x, y, B=10000,
alternative=c("two.sided","less","greater"))
{
p.value <- 0
alternative <- match.arg(alternative)
s <- replicate(B, (mean(sample(c(x,y), B, replace=TRUE))-mean(sample(c(x,y), B, replace=TRUE))))
t <- mean(x) - mean(y)
p.value <- 2*(1- pnorm(abs(quantile(T,0.01)), mean = 0, sd = 1, lower.tail =
TRUE, log.p = FALSE)) #try to calculate p value
data.name <- deparse(substitute(c(x,y)))
names(t) <- "difference in means"
zero <- 0
names(zero) <- "difference in means"
return(structure(list(statistic = t, p.value = p.value,
method = "mean test", data.name = data.name,
observed = c(x,y), alternative = alternative,
null.value = zero),
class = "htest"))
}
The code uses Monte Carlo simulation to generate the distribution of the test statistic mean(x) - mean(y) and then calculates the p-value, but apparently I defined the p-value incorrectly, because for:
> set.seed(0)
> mean.test(rnorm(1000,3,2),rnorm(2000,4,3))
the output should look like:
mean test
data: c(rnorm(1000, 3, 2), rnorm(2000, 4, 3))
difference in means = -1.0967, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
but I got this instead:
mean test
data: c(rnorm(1000, 3, 2), rnorm(2000, 4, 3))
difference in means = -1.0967, p-value = 0.8087
alternative hypothesis: true difference in means is not equal to 0
Can someone explain the bug to me?
As far as I can tell, your code has numerous mistakes and errors in it:
quantile(T, 0.01) - here T == TRUE, so you're calculating the quantile of 1.
The object s is never used.
mean(sample(c(x,y), B, replace=TRUE)): what are you trying to do here? The c() function combines x and y. Sampling makes no sense since you don't know what population they come from.
When you calculate the test statistic t, it should depend on the variance (and sample size).
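For comparison, here is a minimal sketch of a simulation-based p-value that actually uses the simulated values. It swaps in a permutation test, which is not what the original code does: under the null hypothesis that x and y come from the same distribution, the group labels are shuffled and the observed difference is compared with the shuffled differences. The function name perm_mean_test is made up for illustration, and the statistic is left as the raw difference in means rather than a variance-scaled one:

```r
perm_mean_test <- function(x, y, B = 10000) {
  t_obs  <- mean(x) - mean(y)     # observed difference in means
  pooled <- c(x, y)
  n_x    <- length(x)

  # Null distribution: reassign group labels at random B times
  s <- replicate(B, {
    idx <- sample(length(pooled), n_x)
    mean(pooled[idx]) - mean(pooled[-idx])
  })

  # Two-sided Monte Carlo p-value; the +1 terms avoid reporting exactly 0
  p_value <- (sum(abs(s) >= abs(t_obs)) + 1) / (B + 1)
  list(statistic = t_obs, p.value = p_value)
}

set.seed(0)
res <- perm_mean_test(rnorm(1000, 3, 2), rnorm(2000, 4, 3), B = 2000)
res
```

A Monte Carlo p-value can never be smaller than 1 / (B + 1), so this will not print "p-value < 2.2e-16"; getting that would need an analytic reference distribution, as in t.test.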
