I wrote this code to compute a test statistic for two random samples x and y:
mean.test <- function(x, y, B=10000,
alternative=c("two.sided","less","greater"))
{
p.value <- 0
alternative <- match.arg(alternative)
s <- replicate(B, (mean(sample(c(x,y), B, replace=TRUE))-mean(sample(c(x,y), B, replace=TRUE))))
t <- mean(x) - mean(y)
p.value <- 2*(1- pnorm(abs(quantile(T,0.01)), mean = 0, sd = 1, lower.tail =
TRUE, log.p = FALSE)) #try to calculate p value
data.name <- deparse(substitute(c(x,y)))
names(t) <- "difference in means"
zero <- 0
names(zero) <- "difference in means"
return(structure(list(statistic = t, p.value = p.value,
method = "mean test", data.name = data.name,
observed = c(x,y), alternative = alternative,
null.value = zero),
class = "htest"))
}
The code uses a Monte Carlo simulation to generate the distribution of the test statistic mean(x) - mean(y) and then calculates the p-value, but apparently I defined this p-value incorrectly, because for:
> set.seed(0)
> mean.test(rnorm(1000,3,2),rnorm(2000,4,3))
The output should look like:
mean test
data: c(rnorm(1000, 3, 2), rnorm(2000, 4, 3))
difference in means = -1.0967, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
But I got this instead:
mean test
data: c(rnorm(1000, 3, 2), rnorm(2000, 4, 3))
difference in means = -1.0967, p-value = 0.8087
alternative hypothesis: true difference in means is not equal to 0
Can someone explain the bug to me?
As far as I can tell, your code has several mistakes:
quantile(T, 0.01): here T == TRUE, so you're calculating the quantile of 1.
The object s is never used.
mean(sample(c(x,y), B, replace=TRUE)): What are you trying to do here? The c() function combines x and y. Sampling makes no sense since you don't know what population they come from.
When you calculate the test statistic t, it should depend on the variance (and the sample size).
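To illustrate the last point, here is a minimal sketch of a statistic that is scaled by its standard error and referred to the standard normal distribution. The function name z_mean_test and the choice of a large-sample normal approximation are my assumptions, not something given in the question.

```r
# Sketch only: difference in means divided by its estimated standard error,
# with a p-value from the standard normal distribution (large-sample assumption).
z_mean_test <- function(x, y,
                        alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  # the standard error uses both sample variances and both sample sizes
  se <- sqrt(var(x) / length(x) + var(y) / length(y))
  z  <- (mean(x) - mean(y)) / se
  p  <- switch(alternative,
               two.sided = 2 * pnorm(-abs(z)),
               less      = pnorm(z),
               greater   = pnorm(z, lower.tail = FALSE))
  list(statistic = z, p.value = p)
}

set.seed(0)
res <- z_mean_test(rnorm(1000, 3, 2), rnorm(2000, 4, 3))
res$p.value   # should be very small for these two samples
```

With samples this large the standardised difference is many standard errors away from 0, which is why a tiny p-value is the plausible answer here rather than 0.8087.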
I would like to get a CI for a paired t-test using the bootstrap in R. Unfortunately, I don't have privileges to install MKinfer. With MKinfer I would do the following (as in: T-test with bootstrap in R):
boot.t.test(
x = iris["Petal.Length"],
y = iris["Sepal.Length"],
alternative = c("two.sided"),
mu = 0,
#paired = TRUE,
conf.level = 0.95,
R = 9999
)
How would I do this for paired data, with CIs and p-values, without relying on MKinfer (relying on boot would be fine)?
Here is an example using boot with R = 1000 bootstrap replicates:
library(boot)
x <- iris$Petal.Length
y <- iris$Sepal.Length
change_in_mean <- function(df, indices) t.test(
df[indices, 1], df[indices, 2], paired = TRUE, var.equal = FALSE)$estimate
model <- boot(
data = cbind(x, y),
statistic = change_in_mean,
R = 1000)
We can calculate the confidence interval of the estimated change in the mean using boot.ci:
boot.ci(model, type = "norm")
#BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
#Based on 1000 bootstrap replicates
#
#CALL :
#boot.ci(boot.out = model, type = "norm")
#
#Intervals :
#Level Normal
#95% (-2.262, -1.905 )
#Calculations and Intervals on Original Scale
Note that this is very close to the CI reported by t.test:
t.test(x, y, paired = TRUE, var.equal = FALSE)
#
# Paired t-test
#
#data: x and y
#t = -22.813, df = 149, p-value < 2.2e-16
#alternative hypothesis: true difference in means is not equal to 0
#95 percent confidence interval:
# -2.265959 -1.904708
#sample estimates:
#mean of the differences
# -2.085333
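The example above gives the interval only. For the p-value part of the question, one possible sketch is to bootstrap under the null hypothesis by centring the paired differences at 0. This uses base R only; the names d0, boot_null and p_boot, the seed, and R = 1999 are my own choices for illustration.

```r
# Sketch only: bootstrap p-value and percentile CI for a paired mean difference.
x <- iris$Petal.Length
y <- iris$Sepal.Length
d   <- x - y          # paired differences
obs <- mean(d)        # observed mean difference

set.seed(42)          # arbitrary seed, for reproducibility only
R <- 1999

# Impose H0 (mean difference = 0) by centring, then resample the centred values
d0 <- d - obs
boot_null <- replicate(R, mean(sample(d0, replace = TRUE)))
# Two-sided p-value; the +1 terms keep it strictly above 0
p_boot <- (1 + sum(abs(boot_null) >= abs(obs))) / (R + 1)

# Percentile CI from resampling the uncentred differences
boot_est <- replicate(R, mean(sample(d, replace = TRUE)))
ci <- quantile(boot_est, c(0.025, 0.975))
```

Note that the smallest p-value this can report is 1 / (R + 1), so a very strong effect shows up as that floor rather than as something like 2.2e-16.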
I am working with copulas and trying to select the best one. I want to use a goodness-of-fit test, but my code gives me the same p-value for every copula, which is certainly wrong. Here is my code:
dat <- read_excel("SMI.xlsx")
data <- cbind(dat[,2], dat[,3], dat[4])
dataNZ <- cbind(data[,1], data[,3])
#definition of copulas
#Normal copula
nc <- ellipCopula(family = "normal", param= 0.5, dim = 2, dispstr = "un")
#Gumbel-Hougaard copula
gc <- gumbelCopula(dim = 2, param = 1.1)
#GOF tests
gofmplnNZ <- gofCopula(nc, u, estim.method= "mpl")
gofmplgNZ <- gofCopula(gc, u, estim.method= "mpl")
To show exactly what I mean, here are the results of gofmplnNZ and gofmplgNZ:
> gofmplnNZ
Parametric bootstrap-based goodness-of-fit test of Normal copula, dim. d = 2, with 'method'="Sn",
'estim.method'="mpl":
data: x
statistic = 0.062807, parameter = 0.45013, p-value = 0.0004995
> gofmplgNZ
Parametric bootstrap-based goodness-of-fit test of Gumbel copula, dim. d = 2, with 'method'="Sn",
'estim.method'="mpl":
data: x
statistic = 0.1805, parameter = 1.4038, p-value = 0.0004995
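One thing worth checking about the identical value: 0.0004995 is what a bootstrap p-value of the form (count + 0.5) / (N + 1) gives when none of N = 1000 replicates reaches the observed statistic. That formula and N = 1000 are my assumptions about how the parametric bootstrap p-value is computed here; they are not stated in the output above. Under those assumptions the arithmetic is:

```r
# Assumption: N = 1000 bootstrap replicates and a p-value computed as
# (number of replicates >= observed statistic + 0.5) / (N + 1).
N     <- 1000
count <- 0                        # no simulated statistic as large as the observed one
p_min <- (count + 0.5) / (N + 1)
signif(p_min, 4)
```

If that reading is right, both tests simply hit the smallest reportable p-value, and the two statistics (0.062807 and 0.1805) are the numbers that actually differ between the copulas.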
I realize that this kind of question has been asked before, but I don't understand why my code is breaking.
I have tried mapply alone and with do.call as well as the purrr package's pmap function. I keep getting "unused argument" errors among others. Since all 3 keep failing, I figure I must be referencing my data incorrectly in the arguments. I have used mdply from the plyr package to do something like this, but that was over a year ago. Of course, any alternative approaches would be appreciated too.
To create the dataframe, compar:
obs = floor(runif(500, 1,99))
p = round(runif(500,0,1), digits = 4)
n = floor(runif(500, 100,150))
test = rep("two.sided", 500)
conf = rep(0.95, 500)
compar = as.data.frame(cbind(obs, p, n))
compar$test = test
compar$conf = conf
head(compar, 3)
obs p n test conf
1 47 0.2432 133 two.sided 0.95
2 52 0.3391 118 two.sided 0.95
3 22 0.2790 115 two.sided 0.95
I try pmap:
pmap(.l = compar, .f = binom.test)
Error in .f(obs = .l[[c(1L, i)]], p = .l[[c(2L, i)]], n = .l[[c(3L, i)]], :
unused arguments (obs = .l[[c(1, i)]], test = .l[[c(4, i)]])
Next up, mapply:
mapply(compar, FUN = binom.test)
Error in (function (x, n, p = 0.5, alternative = c("two.sided", "less", :
incorrect length of 'x'
Finally, do.call and mapply
do.call(mapply, c(binom.test, compar[c("obs", "n", "p", "test", "conf")]))
Error in (function (x, n, p = 0.5, alternative = c("two.sided", "less", :
unused arguments (obs = dots[[1]][[1]], test = dots[[4]][[1]])
The column names don't match the binom.test arguments. For the pmap version, renaming the columns to match the binom.test arguments should work:
pmap(select(compar, x=obs, n, p, alternative=test, conf), binom.test)
#[[1]]
# Exact binomial test
#data: .l[[c(1L, i)]] and .l[[c(2L, i)]]
#number of successes = 5, number of trials = 149, p-value < 2.2e-16
#alternative hypothesis: true probability of success is not equal to 0.435
#95 percent confidence interval:
# 0.01098400 0.07657136
#sample estimates:
#probability of success
# 0.03355705
#[[2]]
# Exact binomial test
#data: .l[[c(1L, i)]] and .l[[c(2L, i)]]
#number of successes = 20, number of trials = 113, p-value = 1.391e-10
#alternative hypothesis: true probability of success is not equal to 0.4681
#95 percent confidence interval:
# 0.1115928 0.2600272
#sample estimates:
#probability of success
# 0.1769912
# more output
Or: pmap(rename(compar, x=obs, alternative=test), binom.test)
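Since the question also tried mapply, here is a base R sketch of the same idea: pass each column under the argument name that binom.test expects. The small data frame is made up for illustration (5 rows instead of 500), and the names res and pvals are mine.

```r
# Sketch only: row-wise binom.test with Map(), naming every argument explicitly.
set.seed(1)
k <- 5
compar <- data.frame(
  obs  = floor(runif(k, 1, 99)),
  p    = round(runif(k, 0, 1), digits = 4),
  n    = floor(runif(k, 100, 150)),
  test = rep("two.sided", k),
  conf = rep(0.95, k),
  stringsAsFactors = FALSE   # keep 'test' as character for match.arg()
)

# Map() is mapply() with SIMPLIFY = FALSE, so the result is a list of htest objects
res <- Map(binom.test,
           x           = compar$obs,
           n           = compar$n,
           p           = compar$p,
           alternative = compar$test,
           conf.level  = compar$conf)

# Pull one number out of each test
pvals <- vapply(res, function(r) r$p.value, numeric(1))
```

The earlier mapply(compar, FUN = binom.test) call fails for the same reason as pmap: the columns are handed over under the names obs and test, which binom.test does not know.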
I need to write my own test in R, based on the difference in means of two given random variables X and Y whose distributions are unknown.
I am given the following code:
mean.test <- function(x, y, B=10000,
alternative=c("two.sided","less","greater"))
{
p.value <- 0
alternative <- match.arg(alternative)
s<-replicate(B, (mean(sample(c(x,y), B, replace=TRUE))-mean(sample(c(x,y), B, replace=TRUE)))) # random samples of test statistics
t <- mean(x) - mean(y) #teststatistics t
p.value <- 2 * (1- pnorm(mean(s))) #try to calculate p value
data.name <- deparse(substitute(c(x,y)))
names(t) <- "difference in means"
zero <- 0
names(zero) <- "difference in means"
return(structure(list(statistic = t, p.value = p.value,
method = "mean test", data.name = data.name,
observed = c(x,y), alternative = alternative,
null.value = zero),
class = "htest"))
}
Here t is the difference between the sample means of X and Y. I am given the expected output for some function calls, but I never reproduce it.
For example, the following:
set.seed(0)
mean.test(rnorm(100,50,4),rnorm(100,51,5),alternative="less")
Should output:
mean test
data: c(rnorm(100, 50, 4), rnorm(100, 51, 5))
difference in means = -2.0224, p-value = 0.0011
alternative hypothesis: true difference in means is less than 0
But it outputs:
mean test
data: c(rnorm(100, 50, 4), rnorm(100, 51, 5))
difference in means = -0.68157, p-value = 1
alternative hypothesis: true difference in means is less than 0
I am sure that I am calculating the p-value the wrong way. Also, the difference in means is wrong for this example, but right for other examples in the exercise. I am really confused about how to calculate the p-value. How do I calculate it?
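A short sketch of why the p-value comes out as (about) 1: both resamples in s are drawn from the same pooled vector, so s is centred near 0, pnorm(mean(s)) is near 0.5, and 2 * (1 - 0.5) is 1 regardless of the data. One common alternative, which I am assuming here and which may not be what the exercise intends, is to compare the observed t with the simulated values directly. The resample size 200 and B = 2000 are chosen only to keep the example fast.

```r
# Sketch only: diagnosis of the p-value in the question, plus a resampling alternative.
set.seed(0)
x <- rnorm(100, 50, 4)
y <- rnorm(100, 51, 5)
pooled <- c(x, y)
B <- 2000

# What the question computes: differences of two resamples from the same pool
s_question <- replicate(B, mean(sample(pooled, 200, replace = TRUE)) -
                           mean(sample(pooled, 200, replace = TRUE)))
p_question <- 2 * (1 - pnorm(mean(s_question)))   # close to 1 whatever x and y are

# Alternative: reshuffle the pooled data into two groups of the original sizes
t_obs  <- mean(x) - mean(y)
s_perm <- replicate(B, {
  idx <- sample(length(pooled), length(x))
  mean(pooled[idx]) - mean(pooled[-idx])
})
# For alternative = "less": share of simulated differences at or below the observed one
p_less <- mean(s_perm <= t_obs)
```

For "greater" the comparison flips to s_perm >= t_obs, and for "two.sided" it uses abs(s_perm) >= abs(t_obs).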
I have a problem with the ks.test function in R. I have a Laplace distribution:
ldes <- function(y, a) {
if(y < 0.5) 1/a*log(2*y, 2)
else 1/a*log(2*(1-y), 2)
}
a <- 1
set.seed(1)
y = runif(1000, 0, 1)
ld <- ldes(y, a)
So, I need to do the KS test, but I can't find anything about the second parameter that should go in there, like:
ks.test(my_lnorm, plnorm, mean = -5, sd = 5)
for the lognormal distribution, or:
ks.test(my_log, plogis, location = 2, scale = 3)
for the logistic distribution.
Thanks.
You can try a package for the Laplace distribution, for example disclap (if it satisfies your need; otherwise use some continuous analog).
library(disclap)
ks.test(ld, "pdisclap", 0.5) # choose the right value of parameter p (p=0.5 is arbitrary)
One-sample Kolmogorov-Smirnov test
data: ld
D = 0.3333, p-value < 2.2e-16
alternative hypothesis: two-sided
As can be seen from the result of the hypothesis test, the null hypothesis (that the samples are drawn from the same population distribution) is rejected.
y2 <- rdisclap(1000, p=0.5) # generate some simulated datapoints
plot(ecdf(ld), xlim = range(c(ld, y2))) # compare ecdfs
plot(ecdf(y2), add = TRUE, lty = "dashed")
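For the continuous case, ks.test also accepts a CDF given as an R function, so a hand-written Laplace CDF can be passed as the second argument. The sketch below assumes a Laplace distribution with location 0 and scale 1 and uses natural logarithms; plaplace and qlaplace are names I made up (they are not base R functions), and the sampler is a vectorised inverse-CDF version rather than the ldes function from the question.

```r
# Sketch only: continuous Laplace CDF and quantile function written by hand.
plaplace <- function(q, location = 0, scale = 1) {
  z <- (q - location) / scale
  ifelse(z < 0, 0.5 * exp(z), 1 - 0.5 * exp(-z))
}

qlaplace <- function(p, location = 0, scale = 1) {
  # ifelse() keeps this vectorised, unlike if/else on a whole vector
  location + scale * ifelse(p < 0.5, log(2 * p), -log(2 * (1 - p)))
}

set.seed(1)
u  <- runif(1000, 0, 1)
ld <- qlaplace(u, location = 0, scale = 1)   # inverse-CDF sampling

# Extra arguments after the CDF are passed on to it, as with plnorm or plogis
res <- ks.test(ld, plaplace, location = 0, scale = 1)
res$statistic
```

Because ld is generated from the same distribution it is tested against, the KS statistic should be small here; with real data the location and scale would have to be specified in advance rather than estimated from the same sample.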