I'm trying to perform a simple hypothesis test, but now I need t-values for alpha = 0.01 instead of 0.05 (the default). Is there a way to do this in R?
This is what I am trying to get for alpha = 0.01:
If you use the t.test function in R, you can pass the argument conf.level = 0.99, since the confidence level is equivalent to 1 minus the alpha level. You can also read the Rdocumentation page on the t.test function for more information on what arguments can be used.
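For example, a minimal sketch (the sample x here is made up for illustration):

set.seed(1)
x <- rnorm(30, mean = 0.5)    # hypothetical sample
t.test(x, conf.level = 0.99)  # 99% confidence interval, i.e. alpha = 0.01
# If you need the critical t-value itself, qt() gives quantiles of the t distribution:
qt(1 - 0.01/2, df = 29)       # two-sided critical value for alpha = 0.01, df = n - 1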
This seems to be a statistical question rather than a programming question, and as such probably belongs on CrossValidated ...
Results table:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 93.10386 44.482243 2.093057 3.647203e-02
educ 39.82828 3.314467 12.016497 3.783286e-32
When you change alpha (the cutoff value for significance testing), nothing in the table above changes: neither the t-statistic (t value) nor the p-value (Pr(>|t|)). The only thing that changes is the judgment of whether you reject or fail to reject the null hypothesis. In this case, since the p-value for the intercept (0.036) is between 0.01 and 0.05, the conclusion would change from "reject H0" (alpha = 0.05) to "fail to reject H0" (alpha = 0.01). The p-value for educ is far less than 0.01, so the conclusion would be "reject" either way.
In most cases, base-R functions don't specify an alpha value; they let you make the decision yourself. If you do have a vector of p-values, you could implement an alpha threshold by saying
result <- ifelse(pval<alpha, "reject H0", "fail to reject H0")
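For example, applying this to the two p-values from the table above:

pval <- c(3.647203e-02, 3.783286e-32)  # intercept and educ p-values
alpha <- 0.01
ifelse(pval < alpha, "reject H0", "fail to reject H0")
# [1] "fail to reject H0" "reject H0"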
I am trying to get the sample size for a single AUC using power.roc.test in the pROC package. For example, the expected AUC is 0.97 (alternative hypothesis) and the value I am comparing to is 0.95 (null hypothesis). At the significance level of 0.05 and power of 0.80, I get 433 positives and 433 negatives using MedCalc, a statistical software. However, I want to carry this out in R. I cannot find any package that allows me to set the null hypothesis value.
Does anyone know how to do this in R?
In pROC, I can use power.roc.test but there is no argument to set the null hypothesis value; it defaults to 0.50.
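For reference, a minimal sketch of what I can do now (which, as noted, tests against a null of AUC = 0.50 rather than 0.95):

library(pROC)
power.roc.test(auc = 0.97, sig.level = 0.05, power = 0.80)
# returns the required ncases and ncontrols for H0: AUC = 0.50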
I have the exact same question. It is possible to compute a sample size using two roc objects with power.roc.test(), e.g.
power.roc.test(roc1, roc2, power = 0.8)
although I want to compute a sample size just for two different AUC values (one representing the null hypothesis, like MedCalc).
I could also not find a solution in R.
But if this helps, I found out that in the MedCalc software you can define the null hypothesis value, e.g., as 0.95.
This is shown, for example, here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6683590/
I need to run several one-sample t-tests. Is there a way of changing the alpha level for this test? Or is it necessary to do some kind of correction for a one-sample t-test (like a Bonferroni correction for a paired t-test)?
Many thanks!
The t.test function returns a p-value:
t.test(rnorm(10))$p.value
You can set the cut-off. The function does have an argument conf.level for the confidence interval.
To correct for multiple comparisons, see p.adjust.
p_values = c(0.1, 0.01, 0.05)
p.adjust(p_values, method="bonferroni")
[1] 0.30 0.03 0.15
If you look at the output of t.test you may have noticed that it is independent of alpha. The test gives you the information you would need to make a decision, but the decision criteria are not in it. For that reason it's difficult for people to help you, because your question doesn't seem to have anything to do with the t.test R command. I do caution against adjusting the p-values post hoc using p.adjust. This is especially problematic because many adjustments actually vary alpha (although, as you indicate, Bonferroni uses a single one). It is more honest to report your modified alpha value, which for Bonferroni is just 0.05 / number of tests.
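A minimal sketch of that reporting style, reusing the p-values from above:

p_values <- c(0.1, 0.01, 0.05)
alpha_adj <- 0.05 / length(p_values)  # Bonferroni-adjusted alpha: 0.05/3 ~ 0.0167
p_values < alpha_adj                  # same decisions as p.adjust(...) < 0.05
# [1] FALSE  TRUE FALSE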
I am attempting to use mlogit in R to produce a transportation mode choice model. The problem is that I have a variable that only applies to certain alternatives.
To be more specific, I am attempting to predict the probability of using auto, transit, and non-motorized modes of transportation. My predictors are: distance, transit wait time, number of vehicles in the household, and in-vehicle travel time.
It works when I format it this way:
> amres<-mlogit(mode~ivt+board|distance+nveh,data=AMLOGIT)
However, the results I get for in-vehicle travel time (ivt) do not make sense:
> summary(amres)
Call:
mlogit(formula = mode ~ ivt + board | distance + nveh, data = AMLOGIT,
method = "nr", print.level = 0)
Frequencies of alternatives:
auto tansit nonmotor
0.24654 0.28378 0.46968
nr method
5 iterations, 0h:0m:2s
g'(-H)^-1g = 6.34E-08
gradient close to zero
Coefficients :
Estimate Std. Error t-value Pr(>|t|)
tansit:(intercept) 7.8392e-01 8.3761e-02 9.3590 < 2.2e-16 ***
nonmotor:(intercept) 3.2853e+00 7.1492e-02 45.9532 < 2.2e-16 ***
ivt 1.6435e-03 1.2673e-04 12.9691 < 2.2e-16 ***
board -3.9996e-04 1.2436e-04 -3.2161 0.001299 **
tansit:distance 3.2618e-04 2.0217e-05 16.1336 < 2.2e-16 ***
nonmotor:distance -2.9457e-04 3.3772e-05 -8.7224 < 2.2e-16 ***
tansit:nveh -1.5791e+00 4.5932e-02 -34.3799 < 2.2e-16 ***
nonmotor:nveh -1.8008e+00 4.8577e-02 -37.0720 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Log-Likelihood: -10107
McFadden R^2: 0.30354
Likelihood ratio test : chisq = 8810.1 (p.value = < 2.22e-16)
As you can see, the stats look great, but ivt should have a negative coefficient, not a positive one. My thought is that the non-motorized portion, which is all 0, is affecting it. I believe what I have to do is use the third part of the formula, as seen below:
> amres<-mlogit(mode~board|distance+nveh|ivt,data=AMLOGIT)
However, this results in:
Error in solve.default(H, g[!fixed]) :
Lapack routine dgesv: system is exactly singular: U[10,10] = 0
I believe this is, again, because the variable is all 0's for non-motorized but I am unsure how to fix this. How do I include an alternative specific variable if it does not apply to all alternatives?
I am not well versed in the various implementations of logit models, but I imagine it has to do with making sure you have enough variation across alternatives and choosers so that the matrix can be properly determined.
What do you get from saying
amres<-mlogit(mode~distance| nveh | ivt+board,data=AMLOGIT)
mlogit separates the formula into groups with the pipes. As I understand it: the first part is your basic formula, the second part is variables that don't vary across alternatives (i.e. are only person-specific, like gender or income; I think nveh should be here), while the third part varies by alternative.
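In other words, a sketch of the three-part partition (this is my reading of it, so treat the variable placement as a suggestion rather than a definitive specification):

# mode ~ alternative-specific vars, generic coefficients |
#        individual-specific vars                         |
#        alternative-specific vars, alternative-specific coefficients
amres <- mlogit(mode ~ distance | nveh | ivt + board, data = AMLOGIT)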
Ken Train, incidentally, has a set of vignettes on mlogit specifically that might be helpful. Viton mentions the partition with pipes.
Ken Train's Vignettes
Philip Viton's Vignettes
Yves Croissant's Vignettes
Looks like you may have perfect separation. Have you checked this by e.g. looking at crosstables of the variables? (Can't fit a model if one combination of predictors allows for perfect prediction...) Would be helpful to know size of dataset in this regard - you may be over-fitting for the amount of data you have. This is a general problem in modelling, not specific to mlogit.
You say "the stats look great", but the values for Pr(>|t|) and the likelihood ratio test look implausibly significant, which would be consistent with this problem. This means the estimates of the coefficients are likely to be inaccurate. (Are they similar to the coefficients produced by univariate modelling?) Perhaps a simpler model would be more appropriate.
Edit @user3092719:
You're fitting a generalized linear model, which can easily be overfit (as the outcome variable is discrete or nominal - i.e. has a restricted no. of values). mlogit is an extension of logistic regression; here's a simple example of the latter to illustrate:
> df1 <- data.frame(x=c(0, rep(1, 3)),
                    y=rep(c(0, 1), 2))
> xtabs( ~ x + y, data=df1)
y
x 0 1
0 1 0
1 1 2
Note the zero in the top right corner. This shows 'perfect separation', which means that if x=0 you know for sure that y=0, based on this data set. So a probabilistic predictive model doesn't make much sense.
Some output from
> summary(glm(y ~ x, data=df1, binomial(link = "logit")))
gives
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -18.57 6522.64 -0.003 0.998
x 19.26 6522.64 0.003 0.998
Here the standard errors are suspiciously large relative to the values of the coefficients. You should also be alerted by Number of Fisher Scoring iterations: 17 - the large number of iterations needed to fit suggests numerical instability.
Your solution seems to involve ensuring that this problem of complete separation does not occur in your model, although it is hard to be sure without a minimal working example.
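As a first check on your own data, a sketch along the lines of the xtabs call above (assuming AMLOGIT contains the columns from your model):

xtabs(~ mode + nveh, data = AMLOGIT)  # cross-tabulate outcome vs. a predictor
# zero cells for some mode/nveh combination would suggest (quasi-)separation,
# which is consistent with the 'system is exactly singular' error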
I want to simulate the effect of different kinds of multiple testing corrections, such as Bonferroni, Fisher's LSD, Duncan, Dunn-Sidak, Newman-Keuls, Tukey, etc., on ANOVA.
I guess I should simply run a regular ANOVA, and then accept as significant the p-values which I calculate using p.adjust. But I'm not getting how this p.adjust function works. Could someone give me some insight into p.adjust()?
when running:
> p.adjust(c(0.05,0.05,0.1),"bonferroni")
# [1] 0.15 0.15 0.30
Could someone explain what this means?
Thank you for your answer. I kinda know a bit of all that. But I still don't understand the output of p.adjust. I'd expect that...
p.adjust(0.08, 'bonferroni', n=10)
... would return 0.008 and not 0.8. Doesn't n=10 mean that I'm doing 10 comparisons? And isn't 0.08 the "original alpha" (I mean the threshold I'd use to reject the NULL hypothesis if I had one simple comparison)?
You'll have to read about each multiple testing correction technique, whether it controls the False Discovery Rate (FDR) or the Family-Wise Error Rate (FWER). (Thanks to @thelatemail for pointing out that the abbreviations should be expanded.)
Bonferroni correction controls the FWER by setting the significance level alpha to alpha/n where n is the number of hypotheses tested in a typical multiple comparison (here n=3).
Let's say you are testing at 5% alpha, meaning that if your p-value is < 0.05 you reject your NULL. For n=3, with the Bonferroni correction you would divide alpha by 3 (0.05/3 ~ 0.0167) and then check whether your p-values are < 0.0167.
Equivalently, instead of checking pval < alpha/n, you could move the n to the other side and check pval * n < alpha, so that alpha keeps its original value. Your p-values are therefore multiplied by 3 and then checked against, for example, alpha = 0.05.
Therefore, the output you obtain is the FWER-controlled p-value, and if it is < alpha (5%, say) you would reject the NULL; otherwise you would fail to reject the NULL hypothesis.
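A minimal sketch of this equivalence, using the p-values from your example:

p <- c(0.05, 0.05, 0.1)
p.adjust(p, method = "bonferroni")  # [1] 0.15 0.15 0.30
pmin(p * length(p), 1)              # the same by hand: p * n, capped at 1
p.adjust(p, method = "bonferroni") < 0.05  # decisions at alpha = 0.05
# [1] FALSE FALSE FALSE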
There are different procedures to control the false positives due to multiple testing. Wikipedia might be a good starting point to learn about the other tests and how they correct for false positives.
In general, the output of p.adjust gives the multiple-testing corrected p-value. In the case of Bonferroni, it is a FWER-controlled p-value; in the case of the BH method, it is an FDR-corrected p-value (also called a q-value).
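For example, a quick comparison of the two methods on some made-up p-values:

p <- c(0.005, 0.009, 0.02, 0.04)
p.adjust(p, method = "bonferroni")  # FWER-controlled p-values
p.adjust(p, method = "BH")          # FDR-corrected p-values (q-values)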
Hope this helps a bit.
I'm trying to write a script in R that allows me to approximate by simulation the critical values (p-values) for a Pearson chi-squared test, taking different alpha values.
I know that an option in "chisq.test" exists, but I want to know how to do this simulation by hand.
For example:
Please check the code at http://www.biostat.wisc.edu/~kbroman/teaching/stat371/comp21.R (I don't know how to post the code here properly).
If you check the last part ("p-value by simulation"), you'll see the way p-values are obtained in the script. I want to do this, but taking different alpha values.
Thank you very much!
The calculation of the p-value of any statistical test (whatever the method: classical, bootstrap) has nothing to do with the alpha value, if by that you mean the significance level. You need the alpha value when making a decision to accept or reject the null hypothesis (if the p-value is less than the chosen alpha, then reject the null).
If you have done a simulation as shown in the script, and have derived a vector of simulation values xsqsim, then the critical value for an alpha level of alpha is approximately
quantile(xsqsim,1-alpha)
You have to be a little bit careful if you have a small sample, because the critical value should be the value of the test statistic q such that the probability of the observed value being greater than or equal to q is equal to alpha ...
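To make this concrete, here is a minimal sketch under made-up assumptions (a goodness-of-fit test with four equally likely cells; the name xsqsim follows the linked script):

set.seed(1)
probs <- rep(0.25, 4)  # null hypothesis: equal cell probabilities
n <- 100               # sample size per simulated data set
nsim <- 10000
xsqsim <- replicate(nsim, {
  counts <- rmultinom(1, size = n, prob = probs)
  expected <- n * probs
  sum((counts - expected)^2 / expected)  # Pearson chi-squared statistic
})
alpha <- 0.01
quantile(xsqsim, 1 - alpha)  # simulated critical value for this alpha
qchisq(1 - alpha, df = 3)    # theoretical critical value, for comparison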