I realize that this kind of question has been asked before, but I don't understand why my code is breaking.
I have tried mapply alone and with do.call as well as the purrr package's pmap function. I keep getting "unused argument" errors among others. Since all 3 keep failing, I figure I must be referencing my data incorrectly in the arguments. I have used mdply from the plyr package to do something like this, but that was over a year ago. Of course, any alternative approaches would be appreciated too.
To create the dataframe, compar:
obs = floor(runif(500, 1,99))
p = round(runif(500,0,1), digits = 4)
n = floor(runif(500, 100,150))
test = rep("two.sided", 500)
conf = rep(0.95, 500)
compar = as.data.frame(cbind(obs, p, n))
compar$test = test
compar$conf = conf
head(compar, 3)
obs p n test conf
1 47 0.2432 133 two.sided 0.95
2 52 0.3391 118 two.sided 0.95
3 22 0.2790 115 two.sided 0.95
I try pmap:
pmap(.l = compar, .f = binom.test)
Error in .f(obs = .l[[c(1L, i)]], p = .l[[c(2L, i)]], n = .l[[c(3L, i)]], :
unused arguments (obs = .l[[c(1, i)]], test = .l[[c(4, i)]])
Next up, mapply:
mapply(compar, FUN = binom.test)
Error in (function (x, n, p = 0.5, alternative = c("two.sided", "less", :
incorrect length of 'x'
Finally, do.call and mapply
do.call(mapply, c(binom.test, compar[c("obs", "n", "p", "test", "conf")]))
Error in (function (x, n, p = 0.5, alternative = c("two.sided", "less", :
unused arguments (obs = dots[[1]][[1]], test = dots[[4]][[1]])
The column names don't match the binom.test argument names (x, n, p, alternative, conf.level), which is why obs and test are reported as unused. For the pmap version, renaming the columns to match the binom.test arguments should work:
pmap(select(compar, x=obs, n, p, alternative=test, conf), binom.test)
#[[1]]
# Exact binomial test
#data: .l[[c(1L, i)]] and .l[[c(2L, i)]]
#number of successes = 5, number of trials = 149, p-value < 2.2e-16
#alternative hypothesis: true probability of success is not equal to 0.435
#95 percent confidence interval:
# 0.01098400 0.07657136
#sample estimates:
#probability of success
# 0.03355705
#[[2]]
# Exact binomial test
#data: .l[[c(1L, i)]] and .l[[c(2L, i)]]
#number of successes = 20, number of trials = 113, p-value = 1.391e-10
#alternative hypothesis: true probability of success is not equal to 0.4681
#95 percent confidence interval:
# 0.1115928 0.2600272
#sample estimates:
#probability of success
# 0.1769912
# more output
Or: pmap(rename(compar, x=obs, alternative=test), binom.test)
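A base R alternative that avoids renaming is to hand each column to mapply under the argument name binom.test expects. This is only a minimal sketch: it assumes a 5-row stand-in for compar, and the seed and row count are arbitrary choices for illustration, not taken from the question.

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible

# Small stand-in for the question's data frame (5 rows instead of 500).
compar <- data.frame(
  obs  = floor(runif(5, 1, 99)),
  p    = round(runif(5, 0, 1), digits = 4),
  n    = floor(runif(5, 100, 150)),
  test = "two.sided",
  conf = 0.95,
  stringsAsFactors = FALSE
)

# Map each column onto the binom.test argument it stands for.
# SIMPLIFY = FALSE keeps one "htest" object per row in a list.
results <- mapply(
  binom.test,
  x = compar$obs,
  n = compar$n,
  p = compar$p,
  alternative = compar$test,
  conf.level = compar$conf,
  SIMPLIFY = FALSE
)

# Pull a single number out of each test, here the p-value.
p_values <- vapply(results, function(r) r$p.value, numeric(1))
```

Spelling out the names makes the mapping explicit, so it does not rely on conf being partially matched to conf.level.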
I need to compare the variances of several independent samples. I don't have the data stored in vectors. I only know the mean, standard deviation and sample size of each sample. Does anyone know a way to test whether the variances are equal with only those three statistics in R?
Here is an implementation of the Bartlett test that doesn't require the samples themselves, only their sizes and their standard deviations or variances.
The arguments are:
n, a vector of sample sizes;
S, a vector of standard deviations or variances;
se, a logical value: if TRUE, S holds the standard deviations (they are squared inside the function); if FALSE, S holds the variances.
Tested below with data set iris.
Bartlett_test <- function(n, S, se = TRUE){
  dname <- deparse(substitute(S))
  N <- sum(n)
  k <- length(n)
  S2 <- if(se) S^2 else S
  S2p <- sum((n - 1)* S2)/(N - k)
  numer <- (N - k)*log(S2p) - sum((n - 1)*log(S2))
  denom <- 1 + (sum(1/(n - 1)) - 1/(N - k))/(3*(k - 1))
  statistic <- c(X2 = numer/denom)
  parameter <- k - 1
  p.value <- pchisq(statistic, df = parameter, lower.tail = FALSE)
  ht <- list(
    statistic = statistic,
    data.name = dname,
    parameter = parameter,
    p.value = p.value,
    method = "Bartlett test of homogeneity of variances",
    alternative = "there are at least two unequal variances"
  )
  class(ht) <- "htest"
  ht
}
n <- with(iris, tapply(Sepal.Length, Species, FUN = length))
s <- with(iris, tapply(Sepal.Length, Species, FUN = sd))
s2 <- with(iris, tapply(Sepal.Length, Species, FUN = var))
Bartlett_test(n, s)
Bartlett_test(n, s2, se = FALSE)
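As a sanity check, the statistic built from summary statistics can be compared with stats::bartlett.test run on the raw iris data. The sketch below repeats the same formula inline so that it runs on its own; the names pooled, stat_summary and ref are introduced here only for illustration.

```r
# Group sizes and variances, the only inputs the summary version needs.
n  <- tapply(iris$Sepal.Length, iris$Species, length)
s2 <- tapply(iris$Sepal.Length, iris$Species, var)

N <- sum(n)      # total sample size
k <- length(n)   # number of groups

# Pooled variance and Bartlett's statistic, same formula as in the function.
pooled <- sum((n - 1) * s2) / (N - k)
numer  <- (N - k) * log(pooled) - sum((n - 1) * log(s2))
denom  <- 1 + (sum(1 / (n - 1)) - 1 / (N - k)) / (3 * (k - 1))
stat_summary <- numer / denom
p_summary <- pchisq(stat_summary, df = k - 1, lower.tail = FALSE)

# Reference result computed from the raw observations.
ref <- bartlett.test(Sepal.Length ~ Species, data = iris)

# Both comparisons should be TRUE up to floating-point tolerance.
all.equal(stat_summary, unname(ref$statistic))
all.equal(p_summary, ref$p.value)
```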
Please help me understand why I am getting that error and how to fix it. Thank you!
a<- filter(combine2, NEIGHBORHOOD_NAME=="ASTORIA", YEAR_BUILT>="2009")$SALE_PRICE
b<- filter(combine2, NEIGHBORHOOD_NAME=="CORONA", YEAR_BUILT>="2009")$SALE_PRICE
t.test(a,b, alternative = "greater", mu=0, paired= "false", conf.level = .95)
The error comes from the paired argument of t.test, which takes a logical value (TRUE or FALSE), not the string "false":
t.test(a,b, alternative = "greater", mu=0, paired= FALSE, conf.level = .95)
i.e. the error is reproducible with
t.test(1:10, y = c(7:20), paired = "false")
Error in paired || !is.null(y) : invalid 'x' type in 'x || y'
instead it would be
t.test(1:10, y = c(7:20))
# Welch Two Sample t-test
#data: 1:10 and c(7:20)
#t = -5.4349, df = 21.982, p-value = 1.855e-05
#alternative hypothesis: true difference in means is not equal to 0
#95 percent confidence interval:
# -11.052802 -4.947198
#sample estimates:
#mean of x mean of y
# 5.5 13.5
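If the flag really does arrive as text (for example, read from a file), it can be converted before the call. A minimal sketch, assuming a hypothetical variable paired_flag that holds the string; as.logical() accepts "false", "FALSE", "F" and "False" as FALSE and returns NA for anything it does not recognise.

```r
x <- 1:10
y <- 7:20

paired_flag <- "false"  # hypothetical text input standing in for user data

# Convert once, and fail early if the text is not a recognised logical.
paired_lgl <- as.logical(paired_flag)
stopifnot(!is.na(paired_lgl))

# t.test now receives a genuine logical value.
res <- t.test(x, y, paired = paired_lgl)
res$method
```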
I would like to get a CI for a paired t-test using the bootstrap in R. Unfortunately, I don't have privileges to install the MKinfer package. With MKinfer I would do the following (as in: T-test with bootstrap in R):
boot.t.test(
x = iris["Petal.Length"],
y = iris["Sepal.Length"],
alternative = c("two.sided"),
mu = 0,
#paired = TRUE,
conf.level = 0.95,
R = 9999
)
How would I do this for paired data with CI's and p-values not relying on MKinfer (relying on boot would be fine)?
Here is an example using boot with R = 1000 bootstrap replicates:
library(boot)
x <- iris$Petal.Length
y <- iris$Sepal.Length
change_in_mean <- function(df, indices) t.test(
  df[indices, 1], df[indices, 2], paired = TRUE, var.equal = FALSE)$estimate
model <- boot(
  data = cbind(x, y),
  statistic = change_in_mean,
  R = 1000)
We can calculate the confidence interval of the estimated change in the mean using boot.ci:
boot.ci(model, type = "norm")
#BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
#Based on 1000 bootstrap replicates
#
#CALL :
#boot.ci(boot.out = model, type = "norm")
#
#Intervals :
#Level Normal
#95% (-2.262, -1.905 )
#Calculations and Intervals on Original Scale
Note that this is very close to the CI reported by t.test:
t.test(x, y, paired = TRUE, var.equal = FALSE)
#
# Paired t-test
#
#data: x and y
#t = -22.813, df = 149, p-value < 2.2e-16
#alternative hypothesis: true difference in means is not equal to 0
#95 percent confidence interval:
# -2.265959 -1.904708
#sample estimates:
#mean of the differences
# -2.085333
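The question also asks for a p-value. One common way to get one, sketched here in base R without boot, is to resample the paired differences after centring them so that the null hypothesis (mean difference 0) holds. This is an illustration under that assumption, not the method MKinfer uses; the seed, the number of replicates and the variable names are arbitrary.

```r
set.seed(42)  # arbitrary seed for reproducibility

# Paired data reduce to one vector of within-pair differences.
d <- iris$Petal.Length - iris$Sepal.Length
obs_mean <- mean(d)
R <- 2000  # number of bootstrap replicates

# Percentile CI: resample the differences and take empirical quantiles.
boot_means <- replicate(R, mean(sample(d, replace = TRUE)))
ci <- quantile(boot_means, c(0.025, 0.975))

# p-value: centre the differences so the null is true, then count how
# often a resampled mean is at least as extreme as the observed one.
d0 <- d - obs_mean
null_means <- replicate(R, mean(sample(d0, replace = TRUE)))
p_value <- (sum(abs(null_means) >= abs(obs_mean)) + 1) / (R + 1)

ci
p_value
```

Adding 1 to the numerator and denominator keeps the estimated p-value away from exactly zero, so the smallest value it can report is 1 / (R + 1).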
I wrote this code to compute a test statistic on two random samples x and y:
mean.test <- function(x, y, B=10000,
alternative=c("two.sided","less","greater"))
{
p.value <- 0
alternative <- match.arg(alternative)
s <- replicate(B, (mean(sample(c(x,y), B, replace=TRUE))-mean(sample(c(x,y), B, replace=TRUE))))
t <- mean(x) - mean(y)
p.value <- 2*(1- pnorm(abs(quantile(T,0.01)), mean = 0, sd = 1, lower.tail =
TRUE, log.p = FALSE)) #try to calculate p value
data.name <- deparse(substitute(c(x,y)))
names(t) <- "difference in means"
zero <- 0
names(zero) <- "difference in means"
return(structure(list(statistic = t, p.value = p.value,
method = "mean test", data.name = data.name,
observed = c(x,y), alternative = alternative,
null.value = zero),
class = "htest"))
}
The code uses a Monte Carlo simulation to generate the distribution of the test statistic mean(x) - mean(y) and then calculates the p-value, but apparently I defined this p-value incorrectly, because for:
> set.seed(0)
> mean.test(rnorm(1000,3,2),rnorm(2000,4,3))
the output should look like:
mean test
data: c(rnorm(1000, 3, 2), rnorm(2000, 4, 3))
difference in means = -1.0967, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
but I got this instead:
mean test
data: c(rnorm(1000, 3, 2), rnorm(2000, 4, 3))
difference in means = -1.0967, p-value = 0.8087
alternative hypothesis: true difference in means is not equal to 0
Can someone explain the bug to me?
As far as I can tell, your code has several mistakes in it:
quantile(T, 0.01): here T is R's shorthand for TRUE (your statistic is the lowercase t), so you're calculating the quantile of 1.
The object s is never used.
mean(sample(c(x,y), B, replace=TRUE)): what are you trying to do here? The c() function combines x and y, and both means are drawn from that same combined vector, so their difference says nothing about how x differs from y.
When you calculate the test statistic t, it should depend on the variance (and sample size).
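One way to turn these points into working code is a permutation test: shuffle the group labels of the pooled data, recompute the difference in means each time, and see how often it is at least as extreme as the observed one. The sketch below is a swapped-in technique offered as an illustration, not the intended solution to the exercise; perm_mean_test is a hypothetical name, and the test assumes the two samples are exchangeable under the null hypothesis.

```r
perm_mean_test <- function(x, y, B = 2000) {
  observed <- mean(x) - mean(y)
  pooled <- c(x, y)
  n_x <- length(x)

  # Each replicate reassigns n_x pooled values to "x" and the rest to "y".
  perm_stats <- replicate(B, {
    idx <- sample(length(pooled), n_x)
    mean(pooled[idx]) - mean(pooled[-idx])
  })

  # Two-sided p-value: share of shuffles at least as extreme as observed.
  p_value <- (sum(abs(perm_stats) >= abs(observed)) + 1) / (B + 1)
  list(statistic = observed, p.value = p_value)
}

set.seed(0)
res <- perm_mean_test(rnorm(1000, 3, 2), rnorm(2000, 4, 3))
res$statistic
res$p.value
```

Because the p-value is estimated from B shuffles, it cannot go below 1 / (B + 1), so it will not print as < 2.2e-16 the way a parametric test can.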
I need to write my own test in R, based on the difference in means of two given random variables X and Y whose distributions are unknown.
I am given following code:
mean.test <- function(x, y, B=10000,
alternative=c("two.sided","less","greater"))
{
p.value <- 0
alternative <- match.arg(alternative)
s<-replicate(B, (mean(sample(c(x,y), B, replace=TRUE))-mean(sample(c(x,y), B, replace=TRUE)))) # random samples of test statistics
t <- mean(x) - mean(y) #teststatistics t
p.value <- 2 * (1- pnorm(mean(s))) #try to calculate p value
data.name <- deparse(substitute(c(x,y)))
names(t) <- "difference in means"
zero <- 0
names(zero) <- "difference in means"
return(structure(list(statistic = t, p.value = p.value,
method = "mean test", data.name = data.name,
observed = c(x,y), alternative = alternative,
null.value = zero),
class = "htest"))
}
Here t is the difference between the sample means of X and Y. I am given the expected output for some function calls, but I never reproduce it.
For example, the following:
set.seed(0)
mean.test(rnorm(100,50,4),rnorm(100,51,5),alternative="less")
Should output:
mean test
data: c(rnorm(100, 50, 4), rnorm(100, 51, 5))
difference in means = -2.0224, p-value = 0.0011
alternative hypothesis: true difference in means is less than 0
But it outputs:
mean test
data: c(rnorm(100, 50, 4), rnorm(100, 51, 5))
difference in means = -0.68157, p-value = 1
alternative hypothesis: true difference in means is less than 0
I am sure that I am calculating the p-value incorrectly. Also, the difference in means is wrong for this example, but right for other examples in the exercise. I am really confused about how to calculate the p-value. How do I calculate it?
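A short sketch of why the reported p-value comes out as 1: both resampled means are drawn from the same pooled vector, so the values in s are centred on 0, and 2 * (1 - pnorm(0)) is exactly 1. The observed statistic t never enters the calculation. The code below assumes a smaller B than the question's 10000 just to run quickly; centre and p_naive are names introduced here for illustration.

```r
set.seed(0)
x <- rnorm(100, 50, 4)
y <- rnorm(100, 51, 5)
pooled <- c(x, y)
B <- 1000  # smaller than the question's 10000 to keep the sketch quick

# Same construction as in the question: two means from the same pool.
s <- replicate(B, mean(sample(pooled, B, replace = TRUE)) -
                  mean(sample(pooled, B, replace = TRUE)))

centre <- mean(s)                    # close to 0 by construction
p_naive <- 2 * (1 - pnorm(centre))   # therefore close to 1

centre
p_naive
```

A p-value has to compare the observed t against the simulated distribution, for example as the share of simulated values at or below t for alternative = "less".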