My ANOVA doesn't produce a p-value in R

I'm trying to find out if soil pH levels differ between 3 species of violet. I'm performing a one-way ANOVA in R and entered the code as follows:
anova<-aov(viola$PH~viola$species)
summary(anova)
This seems like it should be simple and I've done similar tests before, but when I run this, all I get is the degrees of freedom, sum sq, and mean sq. Any help would be appreciated.

ANOVA requires replicates to estimate the within-group variance and compute the F statistic. Discard the mean values and use the original replicated data as the input to aov(). Then you can extract the p-values with summary(anova) or TukeyHSD(anova).
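A minimal sketch of that suggestion, assuming viola holds one pH measurement per individual plant (the column names follow the question):
# One-way ANOVA on the replicated, per-plant data rather than on species means
fit <- aov(PH ~ species, data = viola)
summary(fit)    # the ANOVA table now includes the F value and the Pr(>F) column
TukeyHSD(fit)   # pairwise species comparisons with adjusted p-values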

Related

MICE: Paired-sample t-test and Cohen's d estimation using imputed datasets?

I've created 12 imputed samples using the MICE package and wish to run paired-sample t-tests and Cohen's d calculations on the imputed datasets, but I'm not sure how to do this. My end goal is to compare parameter estimates, t-test results, and effect-size estimates from the complete-case analysis with those from the MICE-adjusted analysis, but while I have no issue with the parameter estimates, I can't figure out the t-tests and Cohen's d.
I'm a bit confused about how to approach this, and searching online and in the mice package documentation has not led to much progress. I did find mi.t.test from the MKmisc package, but this appears to be for datasets imputed using Amelia, not MICE, and I can't quite figure it out. Would anyone have any advice or resources here, please?
So far I have:
Identified auxiliary variables
Created Predictor Matrix
Imputed missing data m times
Fit linear models on the imputed datasets using with() and pooled the estimates, extracting parameter estimates with summary()
Is there perhaps a way I can create an object of an imputed dataset that is usable with other analyses or am I looking at this in the wrong way?
I used multiple imputation for the first time in my research, but maybe I can help by passing on the tips I received.
Perform the t-test on every imputed dataset
Use the mice pool.scalar function; you can find its documentation online. For Q, fill in the mean difference from each imputed dataset, and for U the corresponding squared standard error of the difference (the within-imputation variance).
Then your pooled t-value is: qbar / sqrt(t)
You can find the values of qbar, t, and the degrees of freedom df in the output of pool.scalar.
And your pooled p-value is: 2 * (1 - pt(abs(statistic), pmax(df, 0.001)))
Hope this helps!
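A minimal sketch of those steps, assuming imp is the mids object returned by mice() and that the paired difference is stored in a column called diff (both names are hypothetical placeholders):
library(mice)
# 1. Run the one-sample t-test on the paired differences in every imputed dataset
fits <- lapply(complete(imp, "all"), function(d) t.test(d$diff))
Q <- sapply(fits, function(f) f$estimate)   # mean difference from each imputed dataset
U <- sapply(fits, function(f) f$stderr^2)   # squared standard error (within-imputation variance)
# 2. Pool with Rubin's rules
pooled <- pool.scalar(Q, U, n = nrow(complete(imp, 1)))
# 3. Pooled t statistic and two-sided p-value, as described above
statistic <- pooled$qbar / sqrt(pooled$t)
2 * (1 - pt(abs(statistic), pmax(pooled$df, 0.001)))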

What post-hoc test should be used for a glmer model with a continuous and a categorical predictor variable?

I'm a bit of a newbie with stats and R, so I need a bit of direction to find a suitable post-hoc test for my glmer model.
The model has a binary dependent variable (absent/present) and the predictor variables are interaction terms between a continuous variable (e.g. temp) and a categorical variable (species, n = 3). Only the interaction terms, rather than the continuous variables in isolation, produce significant results when an ANOVA is run on the model. Species by itself has a large effect because one species is much rarer than the others. I'm trying to tease apart how the presence of these species varies across pH and between species.
I've tried lsmeans/emmeans with a Tukey adjustment, and Firth's bias-reduced logistic regression (logistf). I ran the effects function on the interaction terms, so I had a rough expectation of what a post hoc could show, but the results logistf (Firth's) produced were not what I was expecting. emmeans and Tukey both gave the same results and ignored the continuous variable, I assume because it's not a factor.
When I run Firth's regression it produces chi-squared and p-values where the chi-squared values are infinite or the p-values astronomically small, even though what I saw through effects suggested no significant difference. With the interaction term I can't tell whether there truly is an effect of the environmental variable or whether the significance is driven by the difference between species. Based on what I had seen of the logistf function, I didn't think it would produce a chi-squared score. Is this a coding issue or is it because of my data?
If I wasn't clear enough about something please let me know and if anyone has any suggestions or advice, they would be massively appreciated. Thanks!
The model and test code I used are below:
### glmer model
library(lme4)
Large <- glmer(Abs.Pres ~ Species:Q.Depth + Species:Conductivity + Species:Temp +
                 Species:pH + Species:DO.P + (1|QID),
               nAGQ = 0, family = binomial, data = Stacked_Pref)
anova(Large)
Output: Analysis of Variance Table
npar Sum Sq Mean Sq F value
Species:Q.Depth 3 234.904 78.301 78.3014
Species:Conductivity 3 32.991 10.997 10.9970
Species:Temp 3 39.001 13.000 13.0004
Species:pH 3 25.369 8.456 8.4562
Species:DO.P 3 34.930 11.643 11.6434
### Firth's
library(logistf)
Lp <- logistf(Abs.Pres ~ Species:pH, data = Stacked_Pref,
              contrasts.arg = list(pH = "contr.treatment", Species = "contr.sum"))
> Lp
logistf(formula = Abs.Pres ~ Species:pH, data = Stacked_Pref,
contrasts.arg = list(pH = "contr.treatment", Species = "contr.sum"))
Model fitted by Penalized ML
Confidence intervals and p-values by Profile Likelihood
coef se(coef) lower 0.95 upper 0.95 Chisq p
(Intercept) 1.9711411 0.57309880 0.8552342 3.1015114 12.09107 5.066380e-04
SpeciesGoby:pH -0.3393185 0.07146049 -0.4804047 -0.2003108 23.31954 1.371993e-06
SpeciesMosquito:pH -0.3001385 0.07127771 -0.4408186 -0.1614419 18.24981 1.937453e-05
SpeciesRFBE:pH -0.4771393 0.07232469 -0.6200179 -0.3365343 45.73750 1.352096e-11
Likelihood ratio test=267.0212 on 3 df, p=0, n=3945
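For reference, a minimal sketch of post-hoc calls on the fitted Large model using the emmeans package. The question does not mention emtrends(); it is the emmeans function for comparing the slope of a continuous covariate (here pH) between factor levels, which is one way to keep the continuous variable in the comparison rather than have it ignored:
library(emmeans)
# Compare species at a fixed pH (roughly what emmeans/lsmeans with a Tukey adjustment does)
emmeans(Large, pairwise ~ Species, at = list(pH = mean(Stacked_Pref$pH)), type = "response")
# Compare the per-species slopes of pH between species
emtrends(Large, pairwise ~ Species, var = "pH")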

Negative Binomial Regression Assumption Testing

First post!
I'm a biologist with a limited background in applied statistics and R. I basically know enough to be dangerous, so I'd appreciate it if someone could confirm or deny that I'm on the right path.
My datasets consists of count data (wildlife visits to water wells) as a response variable and multiple continuous predictor variables (environmental measurements).
First, I eliminated multicollinearity by dropping a few predictor variables. Second, I investigated the distribution of the response variable. Initially, it looked Poisson. However, a Poisson exact test came back significant, and the variance of the response variable was around 200 with a mean around 9, i.e. overdispersed. Because of this, I decided to move forward with negative binomial and quasi-Poisson regressions. Both selected the same model, whose residuals are approximately normally distributed. Further, a plot of residuals against predicted values shows no bias and is homoscedastic.
Questions:
1. Have I selected the correct regressions to model this data?
2. Are there additional assumptions of the NBR and QpR that I need to test? How should I test them, or where can I learn how to do so?
3. Did I check for overdispersion correctly? Is there a difference in comparing the mean and variance vs comparing the conditional mean and variance of the response variable?
4. While the NBR and QpR selected the same model, is there a way to decide which is the "better" approach?
5. I would like to eventually publish. Are there more analyses I should perform on my selected model?
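For reference, a minimal sketch of the kind of dispersion check and model fits described in the question (the data frame wells, response visits, and predictors temp and rainfall are hypothetical placeholders):
library(MASS)
# Crude check: variance of the raw counts far above the mean suggests overdispersion
mean(wells$visits); var(wells$visits)
# Conditional check: Poisson fit, then residual deviance relative to residual df
pois <- glm(visits ~ temp + rainfall, family = poisson, data = wells)
deviance(pois) / df.residual(pois)   # values well above 1 indicate overdispersion
# Negative binomial and quasi-Poisson fits of the same model
nb <- glm.nb(visits ~ temp + rainfall, data = wells)
qp <- glm(visits ~ temp + rainfall, family = quasipoisson, data = wells)
summary(nb); summary(qp)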

Beta coefficient in ANOVA with R and XLStat

I'm working with R and XLStat. I've run a one-way ANOVA (my categorical variable has 3 levels (1, 2, 3) and my response variable is quantitative, on a 1-10 scale).
I've run this ANOVA in both R and XLStat, and the outputs for the F statistic, p-value, coefficient estimates, t-values, standard errors, etc. are exactly the same.
However, XLStat offers an extra output: the standardized coefficients (also called beta coefficients). At first I was surprised, because I didn't think beta coefficients could be calculated for a categorical variable, and according to the literature I've read, doing so doesn't make sense.
Anyway, I tried to reproduce these coefficients in R using the only formula I found: beta = estimate * sd(x)/sd(y), where sd(x) is the standard deviation of the categorical variable (which R automatically converts to a numeric variable in order to calculate sd(x), which seems logical) and sd(y) is the standard deviation of my response variable.
The first beta I obtained in R is the same as in XLStat, but not the second and the third. Given that the first one matches, I suppose XLStat also converts the categorical variable to numeric (which makes no sense, but that is not the question).
Moreover, I ran the ANOVA in Statistica to see whether XLStat had made a mistake, but its beta coefficients are the same as XLStat's.
So my question is: what is the formula for the beta coefficients in a one-way ANOVA?
I'd also like to ask about the relevance of these beta coefficients for a categorical variable. Based on my own thinking and the publications I've read, they don't make sense.
P.S. The contrasts in R and XLStat are sum(ai) = 0. For the beta coefficients, XLStat removes the intercept. I suspect this could be important, but I'm not sure how.
The formula for obtaining beta coefficients from the unstandardized (metric) coefficients in an ANOVA is the same as for a linear regression. The coefficients have no sensible interpretation for categorical variables, but standardized coefficients are useful for comparing the relative effects of IVs measured on different scales.
In R, either use scale() to transform the data to z-scores before fitting the model, or apply lm.beta() (for example from the lm.beta package) to the fitted lm() model.
It is not clear why you would obtain different beta coefficients with XLStat, but if it is not an error it could have something to do with degrees of freedom. This example compares the lm.beta() function in R with SAS and obtains the same coefficients.
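A minimal sketch of those two options in R (the data frame dat, response y, and 3-level factor group are hypothetical; lm.beta() here is the function from the lm.beta package applied to a fitted model):
library(lm.beta)
# One-way ANOVA as a linear model with sum-to-zero contrasts, as in the question
fit <- lm(y ~ group, data = dat, contrasts = list(group = contr.sum))
# Option 1: standardize the response to z-scores before fitting
fit_z <- lm(scale(y) ~ group, data = dat, contrasts = list(group = contr.sum))
coef(fit_z)
# Option 2: standardized coefficients from the fitted model
lm.beta(fit)
As the answer notes, with a dummy- or sum-coded factor these standardized coefficients have no clean interpretation; the sketch only shows how to reproduce the kind of numbers XLStat reports.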

How to perform a bootstrapped paired t-test in R?

I would like to perform a bootstrapped paired t-test in R. I have tried this for multiple datasets that returned p < .05 when using a parametric paired t-test; however, when I run the bootstrap I get p-values between 0.4 and 0.5. Am I running this incorrectly?
differences <- groupA - groupB
t.test(differences)   # To get the t-statistic, e.g. 1.96
Repnumber <- 10000
tstat.values <- numeric(Repnumber)
for (i in 1:Repnumber) {
  group1 <- sample(differences, size = length(differences), replace = TRUE)
  tstat.values[i] <- t.test(group1)$statistic
}
# To get the bootstrap p-value, compare the number of tstat.values greater
# (or lesser) than or equal to the original t-statistic, divided by the number of reps:
sum(tstat.values <= -1.96) / Repnumber
Thank you!
It looks like you're comparing apples and oranges. For the single t-test of differences you're getting a t-statistic, which, if greater than a critical value indicates whether the difference between group1 and group2 is significantly different from zero. Your bootstrapping code does the same thing, but for 10,000 bootstrapped samples of differences, giving you an estimate of the variation in the t-statistic over different random samples from the population of differences. If you take the mean of these bootstrapped t-statistics (mean(tstat.values)) you'll see it's about the same as the single t-statistic from the full sample of differences.
sum(tstat.values<=-1.96)/Repnumber gives you the percentage of bootstrapped t-statistics less than -1.96. This is an estimate of the percentage of the time that you would get a t-statistic less than -1.96 in repeated random samples from your population. I think this is essentially an estimate of the power of your test to detect a difference of a given size between group1 and group2 for a given sample size and significance level, though I'm not sure how robust such a power analysis is.
In terms of properly bootstrapping the t-test, I think what you actually need to do is some kind of permutation test that checks whether your actual data is an outlier when compared with repeatedly shuffling the labels on your data and doing a t-test on each shuffled dataset. You might want to ask a question on CrossValidated, in order to get advice on how to do this properly for your data. These CrossValidated answers might help: here, here, and here.
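As one concrete version of the label-shuffling idea, here is a sketch of a sign-flip permutation test on the paired differences (assuming the differences vector from the question; this is only one of several valid resampling schemes):
obs.t <- t.test(differences)$statistic
Repnumber <- 10000
perm.t <- numeric(Repnumber)
for (i in 1:Repnumber) {
  # Under the null of no difference, each pair's sign is equally likely to be + or -
  signs <- sample(c(-1, 1), length(differences), replace = TRUE)
  perm.t[i] <- t.test(signs * differences)$statistic
}
# Two-sided p-value: how often the permuted statistic is at least as extreme as the observed one
mean(abs(perm.t) >= abs(obs.t))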
