I am trying to calculate the correlation between two numeric columns in a data frame for each level of a factor. Here is an example data frame:
concentration <- c(3, 8, 4, 7, 3, 1, 3, 3, 8, 6)
area <- c(0.5, 0.9, 0.3, 0.4, 0.5, 0.8, 0.9, 0.2, 0.7, 0.7)
area_type <- c("A", "B", "A", "B", "A", "B", "A", "B", "A", "B")
data_frame <- data.frame(concentration, area, area_type)
In this example, I want to calculate the correlation between concentration and area for each level of area_type. I want to use cor.test rather than cor because I want p-values and Kendall tau estimates. I have tried to do this using ddply (from plyr):
ddply(data_frame, "area_type", summarise,
      corr = (cor.test(data_frame$area, data_frame$concentration,
                       alternative = "two.sided", method = "kendall")))
However, I am having a problem with the output: it is organized differently from the normal Kendall cor.test output, which reports the z value, p-value, alternative hypothesis, and tau estimate. Instead I get the output below, and I don't know what each row indicates. In addition, the values are identical for each level of area_type.
area_type corr
1 A 0.3766218
2 A NULL
3 A 0.7064547
4 A 0.1001252
5 A 0
6 A two.sided
7 A Kendall's rank correlation tau
8 A data_frame$area and data_frame$concentration
9 B 0.3766218
10 B NULL
11 B 0.7064547
12 B 0.1001252
13 B 0
14 B two.sided
15 B Kendall's rank correlation tau
16 B data_frame$area and data_frame$concentration
What am I doing wrong with ddply? Or are there other ways of doing this? Thanks.
You can add an additional column with the names of the corr components. Also, your syntax is slightly incorrect: the .() notation tells ddply that the variable comes from the data frame you specified. Then remove the data_frame$ prefixes, or else cor.test will use the entire columns rather than each subset:
library(plyr)
ddply(data_frame, .(area_type), summarise,
      corr = (cor.test(area, concentration,
                       alternative = "two.sided", method = "kendall")),
      name = names(corr))
Which gives:
area_type corr name
1 A -0.285133 statistic
2 A NULL parameter
3 A 0.7755423 p.value
4 A -0.1259882 estimate
5 A 0 null.value
6 A two.sided alternative
7 A Kendall's rank correlation tau method
8 A area and concentration data.name
9 B 6 statistic
10 B NULL parameter
11 B 0.8166667 p.value
12 B 0.2 estimate
13 B 0 null.value
14 B two.sided alternative
15 B Kendall's rank correlation tau method
16 B area and concentration data.name
statistic is the z value (for group A; group B has no ties and a small n, so cor.test reports the exact T statistic there instead) and estimate is the tau estimate.
EDIT: You can also do it like this to only pull what you want:
corfun <- function(x, y) {
  cor.test(x, y, alternative = "two.sided", method = "kendall")
}
ddply(data_frame, .(area_type), summarise,
      z = corfun(area, concentration)$statistic,
      pval = corfun(area, concentration)$p.value,
      tau.est = corfun(area, concentration)$estimate,
      alt = corfun(area, concentration)$alternative)
Which gives:
area_type z pval tau.est alt
1 A -0.285133 0.7755423 -0.1259882 two.sided
2 B 6.000000 0.8166667 0.2000000 two.sided
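Note that this calls corfun, and hence cor.test, four times per group. As a sketch of a variant, you can run the test once per group by passing ddply a function that returns a one-row data frame:
ddply(data_frame, .(area_type), function(df) {
  # run the test once and pull out the scalar components
  ct <- cor.test(df$area, df$concentration,
                 alternative = "two.sided", method = "kendall")
  data.frame(z = unname(ct$statistic), pval = ct$p.value,
             tau.est = unname(ct$estimate), alt = ct$alternative)
})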
Part of the reason this is not working is that cor.test returns:
Pearson's product-moment correlation
data: data_frame$concentration and data_frame$area
t = 0.5047, df = 8, p-value = 0.6274
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.5104148 0.7250936
sample estimates:
cor
0.1756652
This information cannot be put into a data.frame (which is what ddply produces) without further complicating the code. If you can provide the exact information you need, I can offer further assistance. I would look at just using
corrTest <- ddply(.data = data_frame,
                  .variables = .(area_type),
                  .fun = function(df) cor(df$concentration, df$area,
                                          method = "kendall"))
I haven't tested this code, but this is the route I would take initially and work from there.
Everything is running fine, but I'm checking to make sure that non-numeric values don't totally screw up the test. I turned the variables into numeric using as.numeric and it warned "NAs introduced by coercion" - but it worked!
I'm running this line of code with a file of 2020 Presidential election data by county and unemployment data:
cor.test(Unemployment2020, PercentD2020, method = "spearman", exact = FALSE)
Does the "exact = FALSE" piece make it unnecessary for there to be the same number of numeric values for each variable?
This has nothing to do with exact = FALSE.
Since cor.test is an S3 generic, when you pass two numeric vectors to it, you will invoke the stats:::cor.test.default method. Reviewing the source code of this function, you will see that it silently drops the NA values in lines 10 to 13 of the function body:
OK <- complete.cases(x, y)
x <- x[OK]
y <- y[OK]
n <- length(x)
The complete.cases(x, y) here will drop NA values from both vectors, so that only matching entries where neither are NA will be considered.
We can see this in action with the following example. Suppose we have an x and a y vector and want to run cor.test, but each has an NA value at a different point:
x <- c(1, 2, NA, 3, 4, 5)
y <- c(1.1, 1.9, 7, 3.3, 4.5, NA)
cor.test(x, y)
#>
#> Pearson's product-moment correlation
#>
#> data: x and y
#> t = 13.671, df = 2, p-value = 0.005308
#> alternative hypothesis: true correlation is not equal to 0
#> 95 percent confidence interval:
#> 0.7634907 0.9998944
#> sample estimates:
#> cor
#> 0.9946918
We should get the same result if we drop the third entry from each vector (since x has an NA there) and drop the 6th entry where y has an NA:
x <- c(1, 2, 3, 4)
y <- c(1.1, 1.9, 3.3, 4.5)
cor.test(x, y)
#>
#> Pearson's product-moment correlation
#>
#> data: x and y
#> t = 13.671, df = 2, p-value = 0.005308
#> alternative hypothesis: true correlation is not equal to 0
#> 95 percent confidence interval:
#> 0.7634907 0.9998944
#> sample estimates:
#> cor
#> 0.9946918
Created on 2022-07-22 by the reprex package (v2.0.1)
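As a quick check, calling complete.cases directly shows which pairs are kept:
x <- c(1, 2, NA, 3, 4, 5)
y <- c(1.1, 1.9, 7, 3.3, 4.5, NA)
complete.cases(x, y)  # TRUE only where neither vector is NA
#> [1]  TRUE  TRUE FALSE  TRUE  TRUE FALSE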
I have a (big) dataset which looks like this:
dat <- data.frame(m = c(rep("a", 4), rep("b", 3), rep("c", 2)),
                  n1 = round(rnorm(mean = 20, sd = 10, n = 9)))
g <- rnorm(20, 10, 5)
dat
m n1
1 a 15.132
2 a 17.723
3 a 3.958
4 a 19.239
5 b 11.417
6 b 12.583
7 b 32.946
8 c 11.970
9 c 26.447
I want to perform a t-test on each category of "m" against the vector g. For example, for category a:
n1.a <- c(15.132, 17.723, 3.958, 19.239)
t.test(n1.a, g)
I initially thought about breaking the data up into a list using split(dat, dat$m) and then using lapply, but it is not working.
Any thoughts on how to go about it?
Here's a tidyverse solution using map from purrr:
library(tidyverse)
dat %>%
  split(.$m) %>%
  map(~ t.test(.x$n1, g))
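If you only need a single component, such as the p-value, a sketch of a variant uses map_dbl from purrr:
dat %>%
  split(.$m) %>%
  map_dbl(~ t.test(.x$n1, g)$p.value)  # named numeric vector, one p-value per group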
Or, using lapply as you mentioned, which will store all of your t-test results in a list (or a shorter version using by, thanks #markus):
dat_split <- split(dat, dat$m)
res <- lapply(dat_split, function(x) t.test(x$n1, g))
Or
res <- by(dat, dat$m, function(x) t.test(x$n1, g))
Which gives us:
$a
Welch Two Sample t-test
data: .x$n1 and g
t = 1.5268, df = 3.0809, p-value = 0.2219
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-11.61161 33.64902
sample estimates:
mean of x mean of y
21.2500 10.2313
$b
Welch Two Sample t-test
data: .x$n1 and g
t = 1.8757, df = 2.2289, p-value = 0.1883
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-7.325666 20.863073
sample estimates:
mean of x mean of y
17.0000 10.2313
$c
Welch Two Sample t-test
data: .x$n1 and g
t = 10.565, df = 19, p-value = 2.155e-09
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
7.031598 10.505808
sample estimates:
mean of x mean of y
19.0000 10.2313
In base R you can do
lapply(split(dat, dat$m), function(x) t.test(x$n1, g))
Output
$a
Welch Two Sample t-test
data: x$n1 and g
t = 1.9586, df = 3.2603, p-value = 0.1377
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-6.033451 27.819258
sample estimates:
mean of x mean of y
21.0000 10.1071
$b
Welch Two Sample t-test
data: x$n1 and g
t = 2.3583, df = 2.3202, p-value = 0.1249
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-5.96768 25.75349
sample estimates:
mean of x mean of y
20.0000 10.1071
$c
Welch Two Sample t-test
data: x$n1 and g
t = 13.32, df = 15.64, p-value = 6.006e-10
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
13.77913 19.00667
sample estimates:
mean of x mean of y
26.5000 10.1071
Data
set.seed(1)
dat <- data.frame(m = c(rep("a", 4), rep("b", 3), rep("c", 2)),
                  n1 = round(rnorm(mean = 20, sd = 10, n = 9)))
g <- rnorm(20, 10, 5)
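If you prefer a compact table rather than a list of htest objects, something along these lines (a sketch; tests is just an illustrative name) collects the p-values:
tests <- lapply(split(dat, dat$m), function(x) t.test(x$n1, g))
data.frame(m = names(tests),
           p.value = sapply(tests, function(tt) tt$p.value))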
I'm working on a dataset with several different types of proteins as columns; a simplified version is created below, but the original dataset contains over 100 types of proteins. I wanted to see if the concentration of a protein differs by treatment when taking a random effect (= id) into consideration. I managed to run multiple repeated-measures ANOVAs at once, but I would also like to do pairwise comparisons for all proteins based on the treatment. The first thing that came to mind was the emmeans package, but I had trouble coding this.
#install packages
library(tidyverse)
library(emmeans)
#Create a data set
set.seed(1)
id <- rep(c("1", "2", "3", "4", "5", "6"), 3)
Treatment <- c(rep("A", 6), rep("B", 6), rep("C", 6))
Protein1 <- c(rnorm(3, 1, 0.4), rnorm(3, 3, 0.5), rnorm(3, 6, 0.8), rnorm(3, 1.1, 0.4), rnorm(3, 0.8, 0.2), rnorm(3, 1, 0.6))
Protein2 <- c(rnorm(3, 1, 0.4), rnorm(3, 3, 0.5), rnorm(3, 6, 0.8), rnorm(3, 1.1, 0.4), rnorm(3, 0.8, 0.2), rnorm(3, 1, 0.6))
Protein3 <- c(rnorm(3, 1, 0.4), rnorm(3, 3, 0.5), rnorm(3, 6, 0.8), rnorm(3, 1.1, 0.4), rnorm(3, 0.8, 0.2), rnorm(3, 1, 0.6))
DF <- data.frame(id, Treatment, Protein1, Protein2, Protein3) %>%
mutate(id = factor(id),
Treatment = factor(Treatment, levels = c("A","B","C")))
#First, I tried to run multiple ANOVAs at once using lapply
responseList <- names(DF)[c(3:5)]
modelList <- lapply(responseList, function(resp) {
mF <- formula(paste(resp, " ~ Treatment + Error(id/Treatment)"))
aov(mF, data = DF)
})
lapply(modelList, summary)
#Pairwise comparison using emmeans. This did not work
wt_emm <- emmeans(modelList, "Treatment")
> wt_emm <- emmeans(modelList, "Treatment")
Error in ref_grid(object, ...) : Can't handle an object of class “list”
Use help("models", package = "emmeans") for information on supported models.
So I tried a different approach:
anova2 <- aov(cbind(Protein1, Protein2, Protein3) ~ Treatment + Error(id/Treatment), data = DF)
summary(anova2)
#Pairwise comparison using emmeans.
#I got only one result for the whole dataset, instead of results by protein.
wt_emm2 <- emmeans(anova2, "Treatment")
pairs(wt_emm2)
> pairs(wt_emm2)
contrast estimate SE df t.ratio p.value
A - B -1.704 1.05 10 -1.630 0.2782
A - C 0.865 1.05 10 0.827 0.6955
B - C 2.569 1.05 10 2.458 0.0793
I don't understand why, even though I used "cbind(Protein1, Protein2, Protein3)" in the ANOVA model, R still gives me only one result instead of something like the following, which is what I was hoping to get:
> Protein1
contrast
A - B
A - C
B - C
> Protein2
contrast
A - B
A - C
B - C
> Protein3
contrast
A - B
A - C
B - C
How do I code this or should I try a different package/function?
I don't have trouble running one protein at a time. However, since I have over 100 proteins to run, it would be really time-consuming to code them one by one.
Any suggestion is appreciated. Thank you!
Here
#Pairwise comparison using emmeans. This did not work
wt_emm <- emmeans(modelList, "Treatment")
you need to lapply over the list like you did with lapply(modelList, summary)
modelList <- lapply(responseList, function(resp) {
mF <- formula(paste(resp, " ~ Treatment + Error(id/Treatment)"))
aov(mF, data = DF)
})
But when you do this, there is an error:
lapply(modelList, function(x) pairs(emmeans(x, "Treatment")))
Note: re-fitting model with sum-to-zero contrasts
Error in terms(formula, "Error", data = data) : object 'mF' not found
attr(modelList[[1]], 'call')$formula
# mF
Note that mF was the name of the formula object, so it seems emmeans needs the original formula for some reason. You can add the formula to the call:
modelList <- lapply(responseList, function(resp) {
mF <- formula(paste(resp, " ~ Treatment + Error(id/Treatment)"))
av <- aov(mF, data = DF)
attr(av, 'call')$formula <- mF
av
})
lapply(modelList, function(x) pairs(emmeans(x, "Treatment")))
# [[1]]
# contrast estimate SE df t.ratio p.value
# A - B -1.89 1.26 10 -1.501 0.3311
# A - C 1.08 1.26 10 0.854 0.6795
# B - C 2.97 1.26 10 2.356 0.0934
#
# P value adjustment: tukey method for comparing a family of 3 estimates
#
# [[2]]
# contrast estimate SE df t.ratio p.value
# A - B -1.44 1.12 10 -1.282 0.4361
# A - C 1.29 1.12 10 1.148 0.5082
# B - C 2.73 1.12 10 2.430 0.0829
#
# P value adjustment: tukey method for comparing a family of 3 estimates
#
# [[3]]
# contrast estimate SE df t.ratio p.value
# A - B -1.58 1.15 10 -1.374 0.3897
# A - C 1.27 1.15 10 1.106 0.5321
# B - C 2.85 1.15 10 2.480 0.0765
#
# P value adjustment: tukey method for comparing a family of 3 estimates
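If you also name the list by protein, a single protein's contrasts can be looked up directly (pairsList is just an illustrative name):
names(modelList) <- responseList
pairsList <- lapply(modelList, function(x) pairs(emmeans(x, "Treatment")))
pairsList$Protein1  # contrasts for Protein1 only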
Alternatively, loop over the column names:
responseList <- names(DF)[c(3:5)]
for (n in responseList) {
  anova2 <- aov(get(n) ~ Treatment + Error(id/Treatment), data = DF)
  summary(anova2)
  wt_emm2 <- emmeans(anova2, "Treatment")
  print(pairs(wt_emm2))
}
This returns
Note: re-fitting model with sum-to-zero contrasts
Note: Use 'contrast(regrid(object), ...)' to obtain contrasts of back-transformed estimates
contrast estimate SE df t.ratio p.value
A - B -1.41 1.26 10 -1.122 0.5229
A - C 1.31 1.26 10 1.039 0.5705
B - C 2.72 1.26 10 2.161 0.1269
Note: contrasts are still on the get scale
P value adjustment: tukey method for comparing a family of 3 estimates
Note: re-fitting model with sum-to-zero contrasts
Note: Use 'contrast(regrid(object), ...)' to obtain contrasts of back-transformed estimates
contrast estimate SE df t.ratio p.value
A - B -2.16 1.37 10 -1.577 0.2991
A - C 1.19 1.37 10 0.867 0.6720
B - C 3.35 1.37 10 2.444 0.0810
Note: contrasts are still on the get scale
P value adjustment: tukey method for comparing a family of 3 estimates
Note: re-fitting model with sum-to-zero contrasts
Note: Use 'contrast(regrid(object), ...)' to obtain contrasts of back-transformed estimates
contrast estimate SE df t.ratio p.value
A - B -1.87 1.19 10 -1.578 0.2988
A - C 1.28 1.19 10 1.077 0.5485
B - C 3.15 1.19 10 2.655 0.0575
Note: contrasts are still on the get scale
P value adjustment: tukey method for comparing a family of 3 estimates
If you want to have the output as a list:
responseList <- names(DF)[c(3:5)]
output <- list()
for (n in responseList) {
  anova2 <- aov(get(n) ~ Treatment + Error(id/Treatment), data = DF)
  summary(anova2)
  wt_emm2 <- emmeans(anova2, "Treatment")
  output[[n]] <- pairs(wt_emm2)
}
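A single protein's result can then be pulled from the list by name:
output$Protein1  # pairwise contrasts for Protein1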
I'm not sure if this is more a programming or statistical (i.e. my lack of understanding) question.
I have a Poisson mixed model that I want to use to compare average counts across groups at different time periods.
mod <- glmer(Y ~ TX_GROUP * time + (1|ID), data = dat, family = poisson)
mod_em <- emmeans(mod, c("TX_GROUP","time"), type = "response")
TX_GROUP time rate SE df asymp.LCL asymp.UCL
0 1 5.743158 0.4566671 Inf 4.914366 6.711723
1 1 5.529303 0.4639790 Inf 4.690766 6.517741
0 2 2.444541 0.2981097 Inf 1.924837 3.104564
1 2 1.467247 0.2307103 Inf 1.078103 1.996855
0 3 4.570218 0.4121428 Inf 3.829795 5.453790
1 3 1.676827 0.2472920 Inf 1.255904 2.238826
Now, I want to estimate the marginal count for the combined time period (2 + 3) for each group. Is it not a simple case of exponentiating the sum of the logged counts from:
contrast(mod_em, list(`2 + 3` = c(0, 0, 1, 0, 1, 0)))
contrast(mod_em, list(`2 + 3` = c(0, 0, 0, 1, 0, 1)))
If I try that, the value does not come close to matching the simple mean of the combined groups.
First, I suggest that you put both of your contrasts in one list, e.g.,
contr = list(`2+3|0` = c(0, 0, 1, 0, 1, 0),
             `2+3|1` = c(0, 0, 0, 1, 0, 1))
You have to decide when you want to back-transform. See the vignette on transformations and note the discussion on "timing is everything". The two basic options are:
One option: Obtain the marginal means of the log counts, and then back-transform:
mod_con = update(contrast(mod_em, contr), tran = "log")
summary(mod_con, type = "response")
[The update call is needed because contrast strips off transformations except in special cases, because it doesn't always know what scale to assign to arbitrary linear functions. For example, the difference of two square roots is not on a square-root scale.]
Second option: Back-transform the predictions, then sum them:
mod_emmr = regrid(mod_em)
contrast(mod_emmr, contr)
The distinction between these results is the same as the distinction between a geometric mean (option 1) and an arithmetic mean (option 2). I doubt that either of them will yield the same results as the raw marginal mean counts, because they are based on the predictions from your model. Personally, I think the first option is the better choice, because sums are a linear operation, and the model is linear on the log scale.
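A toy calculation (illustrative numbers only, loosely based on the group-0 rates at times 2 and 3 above) makes the distinction concrete: with sum weights, option 1 multiplies the rates while option 2 adds them.
a <- 2.44; b <- 4.57   # back-transformed rates for two cells
exp(log(a) + log(b))   # option 1: sum on the log scale, then exp -> a * b = 11.15
a + b                  # option 2: back-transform first, then sum -> 7.01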
Addendum
There is actually a third option, which is to create a grouping variable. I will illustrate with the pigs dataset.
> pigs.lm <- lm(log(conc) ~ source + factor(percent), data = pigs)
Here are the EMMs for percent:
> emmeans(pigs.lm, "percent")
percent emmean SE df lower.CL upper.CL
9 3.445307 0.04088810 23 3.360723 3.529890
12 3.624861 0.03837600 23 3.545475 3.704248
15 3.662706 0.04372996 23 3.572244 3.753168
18 3.745156 0.05296030 23 3.635599 3.854713
Results are averaged over the levels of: source
Results are given on the log (not the response) scale.
Confidence level used: 0.95
Now let's create a grouping factor group:
> pigs.emm = add_grouping(ref_grid(pigs.lm), "group", "percent", c("1&2","1&2","3&4","3&4"))
> str(pigs.emm)
'emmGrid' object with variables:
source = fish, soy, skim
percent = 9, 12, 15, 18
group = 1&2, 3&4
Nesting structure: percent %in% group
Transformation: “log”
Now get the EMMs for group and note they are just the averages of the respective levels:
> emmeans(pigs.emm, "group")
group emmean SE df lower.CL upper.CL
1&2 3.535084 0.02803816 23 3.477083 3.593085
3&4 3.703931 0.03414907 23 3.633288 3.774574
Results are averaged over the levels of: source, percent
Results are given on the log (not the response) scale.
Confidence level used: 0.95
And here is a summary on the response scale:
> summary(.Last.value, type = "response")
group response SE df lower.CL upper.CL
1&2 34.29790 0.961650 23 32.36517 36.34605
3&4 40.60662 1.386678 23 37.83703 43.57893
Results are averaged over the levels of: source, percent
Confidence level used: 0.95
Intervals are back-transformed from the log scale
These are averages rather than sums, but otherwise it works, and the transformation doesn't get zapped like it does in contrast().
Using the example data from the package, this seems to work fine, though I'd put the grouping in the formula instead (see the second version below).
> warp.lm <- lm(breaks ~ wool*tension, data = warpbreaks)
> warp.emm <- emmeans(warp.lm, c("tension", "wool"))
> warp.emm
tension wool emmean SE df lower.CL upper.CL
L A 44.55556 3.646761 48 37.22325 51.88786
M A 24.00000 3.646761 48 16.66769 31.33231
H A 24.55556 3.646761 48 17.22325 31.88786
L B 28.22222 3.646761 48 20.88992 35.55453
M B 28.77778 3.646761 48 21.44547 36.11008
H B 18.77778 3.646761 48 11.44547 26.11008
Confidence level used: 0.95
The sum of L and M should be 44 + 24 ~ 68 for A and 28 + 29 ~ 57 for B.
> contrast(warp.emm, list(A.LM = c(1, 1, 0, 0, 0, 0),
+ B.LM = c(0, 0, 0, 1, 1, 0)))
contrast estimate SE df t.ratio p.value
A.LM 68.55556 5.157299 48 13.293 <.0001
B.LM 57.00000 5.157299 48 11.052 <.0001
As noted, I'd use the grouping in the formula:
> warp.em2 <- emmeans(warp.lm, ~tension|wool)
> contrast(warp.em2, list(LM = c(1, 1, 0)))
wool = A:
contrast estimate SE df t.ratio p.value
LM 68.55556 5.157299 48 13.293 <.0001
wool = B:
contrast estimate SE df t.ratio p.value
LM 57.00000 5.157299 48 11.052 <.0001
Thanks. The second method works for me, but not the first (which seems more intuitive) - it doesn't seem to give me back-transformed values:
(mod_em_inj <- emmeans(mod_inj, c("TX_GROUP","time"), type = "response"))
TX_GROUP time rate SE df asymp.LCL asymp.UCL
0 1 5.743158 0.4566671 Inf 4.914366 6.711723
1 1 5.529303 0.4639790 Inf 4.690766 6.517741
0 2 2.444541 0.2981097 Inf 1.924837 3.104564
1 2 1.467247 0.2307103 Inf 1.078103 1.996855
0 3 4.570218 0.4121428 Inf 3.829795 5.453790
1 3 1.676827 0.2472920 Inf 1.255904 2.238826
# Marginal means for combined period (7 - 24 months) - Method 1
(mod_em_inj2 <- emmeans(mod_inj, c("TX_GROUP","time")))
TX_GROUP time emmean SE df asymp.LCL asymp.UCL
0 1 1.7480092 0.07951497 Inf 1.59216273 1.9038557
1 1 1.7100619 0.08391274 Inf 1.54559591 1.8745278
0 2 0.8938574 0.12194916 Inf 0.65484147 1.1328734
1 2 0.3833880 0.15724024 Inf 0.07520279 0.6915732
0 3 1.5195610 0.09018011 Inf 1.34281119 1.6963107
1 3 0.5169035 0.14747615 Inf 0.22785558 0.8059515
contr = list(`2+3|0` = c(0, 0, 1, 0, 1, 0),
`2+3|1` = c(0, 0, 0, 1, 0, 1))
summary(contrast(mod_em_inj2, contr), type = "response")
contrast estimate SE df z.ratio p.value
2+3|0 2.4134184 0.1541715 Inf 15.654 <.0001
2+3|1 0.9002915 0.2198023 Inf 4.096 <.0001
# Marginal means for combined period (7 - 24 months) - Method 2
mod_emmr = regrid(mod_em_inj)
contrast(mod_emmr, contr)
contrast estimate SE df z.ratio p.value
2+3|0 7.014759 0.5169870 Inf 13.569 <.0001
2+3|1 3.144075 0.3448274 Inf 9.118 <.0001
The values of 7.01 and 3.14 are about what I should be getting. Apologies if I'm missing something obvious in your response.
I have a table like so:
 1  2  3  4  5
10 22 15 14  3
15 44 22 26  9
...more rows
I want to run a t test on a single row to find out if it's plausible that its mean is less than 3. Using t.test(table[x, ]) doesn't work, because it assumes I'm interested in the mean of the values in the row, which I'm not: the values just indicate the number of responses to each value on a scale of 1-5.
How do?
You could use the following approach:
Ungroup your data
Apply the t.test to each row
apply(data, 1, function(data) {
  t.test(rep(1:5, times = data), alternative = "less", mu = 3)
})
Which will return a t-test for each row, e.g.:
[[1]]
One Sample t-test
data: rep(1:5, times = data)
t = -2.4337, df = 63, p-value = 0.008896
alternative hypothesis: true mean is less than 3
95 percent confidence interval:
-Inf 2.892043
sample estimates:
mean of x
2.65625
[[2]]
One Sample t-test
data: rep(1:5, times = data)
t = -2.3745, df = 115, p-value = 0.009613
alternative hypothesis: true mean is less than 3
95 percent confidence interval:
-Inf 2.921981
sample estimates:
mean of x
2.741379
If you want just the p-values then add $p.value:
apply(data, 1, function(data) {
  t.test(rep(1:5, times = data), alternative = "less", mu = 3)$p.value
})
[1] 0.008895887 0.009613075