R is making wrong contingency tables for me

I am creating a contingency table for Fisher's exact test from my 'covid' data frame using the table() function. The table is the following:
table1 <- matrix(c(117,390,861,669),ncol=2,byrow=TRUE)
colnames(table1) <- c("No","Yes")
rownames(table1) <- c("Hospital","House")
table1 <- as.table(table1)
table1
The code I wrote to make the table was the following:
table1 = table(covid$Treatment_place, covid$Cognitive_dysfunction)
table1
              No       Yes
Hospital  (a) 117  (b) 390
House     (c) 861  (d) 669
However, my suspicion is that R is not counting my positive and negative cases correctly. In this case, cells b and c would be the positive cases (I have manually labelled the cells for your understanding), but R is counting them as negative and therefore giving a flawed analysis. The following is the odds ratio given by R, which is possible only if R counts cells a and d as positive cases (when they should be negative cases).
fisher.test(table1)
Fisher's Exact Test for Count Data
data: table1
p-value < 2.2e-16
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
0.1836178 0.2947950
sample estimates:
odds ratio
0.2332679
Is there any way I can specify which cells are to be counted as positives? I think interchanging my 'Yes' and 'No' columns would also give me a correct result, but I don't know how to do that.
Also, is there any way I can transfer the original column names as table labels?
Thanks.
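One way to do both, sketched below with the counts from the question (the 'covid' data frame itself isn't reproduced here, so a matrix stands in for table()'s output): reversing the "Yes"/"No" columns inverts the odds ratio, and naming the dimnames carries the variable names into the printed table.
table1 <- matrix(c(117, 390, 861, 669), ncol = 2, byrow = TRUE,
                 dimnames = list(Treatment_place       = c("Hospital", "House"),
                                 Cognitive_dysfunction = c("No", "Yes")))
table1 <- table1[, c("Yes", "No")]  # put the positive ("Yes") cases first
fisher.test(table1)                 # odds ratio becomes 1/0.233, about 4.29

# With the real data frame, the same reordering can be done up front:
# covid$Cognitive_dysfunction <- factor(covid$Cognitive_dysfunction,
#                                       levels = c("Yes", "No"))
# table1 <- table(covid$Treatment_place, covid$Cognitive_dysfunction)
Note that the p-value is unchanged by the swap; only the direction of the odds ratio flips.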

Related

Calculating significance of two variables in a dataset for every column

I would like to find out how to create a column of p-values to check the significance of a variable for every observation. Specifically, I would like to check the p-values for the two columns on the right side of the data set. I think the most efficient way to do so is to calculate a t-test for every column, but I don't know how.
This is what I tried, but it didn't give me the significance of every column.
t.test(Elasapp,Elashuis,var.equal=TRUE)
Results:
Two Sample t-test
data: Elasapp and Elashuis
t = 41.674, df = 48860, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.07461778 0.08198304
sample estimates:
  mean of x   mean of y
0.085672044 0.007371636
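For running the same test over every column, sapply can loop over the data frame. A minimal sketch under assumptions (the data are simulated here, and the column names are only illustrative):
set.seed(1)
df <- data.frame(Elasapp  = rnorm(100, mean = 0.086),
                 Elashuis = rnorm(100, mean = 0.007),
                 Other    = rnorm(100))
# Two-sample t-test of each column against Elashuis; one p-value per column
pvals <- sapply(df, function(col) t.test(col, df$Elashuis, var.equal = TRUE)$p.value)
pvals
Testing Elashuis against itself gives p = 1 by construction.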

Fisher's Exact Test

In this post https://stats.stackexchange.com/questions/94909/course-of-action-for-2x2-tables-with-0s-in-cell-and-low-cell-counts, the OP said that they got a p-value of 0.5152 when conducting a Fisher's exact test on the following data:
  Control Cases
A       8     0
B      14     0
But I am getting p-value = 1 and odds ratio = 0 for the data. My R code is:
a <- matrix(c(8,14,0,0),2,2)
(res <- fisher.test(a))
Where am I making a mistake?
Good afternoon :)
https://en.wikipedia.org/wiki/Fisher%27s_exact_test
Haven't used these in a while, but I'm assuming it's your column of two 0's:
p = choose(14, 14) * choose(8, 8) / choose(22, 22)
which is 1.0. For odds ratio, read here: https://en.wikipedia.org/wiki/Odds_ratio
The 0's are either the numerators or the denominators. I think this makes sense, as a column of 0's effectively means you have a group with no observations in it.
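As a quick check in R: conditional on the margins (22 controls, 0 cases, and 8 observations in row A), the observed table is the only one possible, so its probability is 1:
dhyper(8, m = 22, n = 0, k = 8)  # probability of the observed table: 1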
You get the strange p-value = 1 and OR = 0 because one or more of your counts is 0. The table should not be analysed with the chi-square statistic, which produces undefined (0/0) contributions for these cells:
χ² = Σ (observed − expected)² / expected, summed cell by cell.
Instead, you should use Fisher's exact test (fisher.test()), which to some extent can handle very low cell counts (the usual rule of thumb is to use Fisher's whenever at least 20% of cells have counts below 5; source: https://www.ncbi.nlm.nih.gov/pubmed/23894860). If you do use a chi-square analysis, you will need to apply Yates' correction (e.g. chisq.test(matrix, correct = TRUE)).
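A short illustration of both points, using the table from the question (the NaN is what I'd expect from chisq.test here, since the zero expected counts give 0/0):
a <- matrix(c(8, 14, 0, 0), 2, 2)
chisq.test(a, correct = FALSE)$statistic  # NaN (with a warning): expected counts of 0
fisher.test(a)$p.value                    # 1: only one table fits these margins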

Permutation Distribution in R

So I need to create a permutation distribution of the difference in proportions for a data set; however, I'm not sure of the best way to go about doing so.
This is the table I need it for. I have to assess whether the difference between 2010 and 2011 is significant for "Yes".
mytable1 <- matrix(c(3648,25843,3407,26134), byrow=T, ncol=2)
dimnames(mytable1) <- list(c("2010","2011"),c("Yes","No"))
names(dimnames(mytable1)) <- c("Year","Response")
How do I code this in a for-loop?
Thank you so much!
Why use a permutation-based test if you can calculate exact probabilities? Is this a homework exercise?
fisher.test(mytable1)
Fisher's Exact Test for Count Data
data: mytable1
p-value = 0.001799
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
1.029882 1.138384
sample estimates:
odds ratio
1.082775
gives you the exact probability (the p-value) of seeing a ratio of "Yes" to "No" in 2010 relative to 2011 (i.e. the odds ratio) as extreme as or more extreme than what was observed. Note that the null hypothesis corresponds to an odds ratio of 1.
I assume this is what you mean by the "difference between 2010 and 2011 is significant [for Yes]". If not, please clarify and be more precise in specifying your test statistic (and null hypothesis). If it needs to be a permutation-based test, can you show us how far you have gotten?
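If it does have to be permutation-based, a minimal for-loop sketch could look like this (an illustration only: it rebuilds individual-level records from the counts in mytable1 and permutes the year labels):
set.seed(1)
year <- rep(c("2010", "2011"), times = c(3648 + 25843, 3407 + 26134))
yes  <- rep(c(1, 0, 1, 0),     times = c(3648, 25843, 3407, 26134))
obs  <- mean(yes[year == "2010"]) - mean(yes[year == "2011"])  # observed difference

B <- 2000
perm <- numeric(B)
for (b in 1:B) {
  shuffled <- sample(year)  # permute year labels under the null of no difference
  perm[b]  <- mean(yes[shuffled == "2010"]) - mean(yes[shuffled == "2011"])
}
mean(abs(perm) >= abs(obs))  # two-sided permutation p-value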

ANOVA (AOV function) in R: Misleading p_value reported on equal values

I would greatly appreciate any guidance on the following: I am running ANOVA (aov) to retrieve p-values for a number of subsets of a larger data set. I bumped into a subset where all my numeric values are equal to 36. Because it is part of a loop, the ANOVA is still executed and reports a seemingly infinitesimal p-value of 1.2855e-134. Correct me if I am wrong, but doesn't a smaller p-value mean stronger evidence that the groups differ significantly?
For simplicity, here is the subset:
[screenshot: SUBSET_FOR_ANOVA]
Here is how I calculate the ANOVA and retrieve the p-value, where TEMP_DF2 is just the subset shown above:
anova_sweep <- aov(GOOD_PTS ~ MACH, data = TEMP_DF2)
p_value <- summary(anova_sweep)[[1]][["Pr(>F)"]]
p_value <- p_value[1]
Many thanks for any guidance,
I can't replicate your findings. Let's produce an example dataset with all values being 36:
df <- data.frame(gr = rep(letters[1:2], 100),
                 y = 36)
summary(aov(y ~ gr, data = df))
Gives:
             Df    Sum Sq   Mean Sq F value Pr(>F)
gr            1 1.260e-27 1.262e-27       1  0.319
Residuals   198 2.499e-25 1.262e-27
Basically, depending on the sample size, we obtain a p-value around 0.3 or so. The F statistic is (by definition) always 1, since the between and within group variances are equal.
Are these results misleading? To some extent, yes. The estimated SS and MS should be 0; aov calculates them as very, very small. Some other statistical tests in R and in some packages check for zero variance and would produce an error, but aov apparently does not.
More importantly, however, I would say your data violates the assumptions of ANOVA, and therefore no result can be trusted as a basis for conclusions. The expectation in R when it comes to statistical tests is usually that it is up to the user to apply the tests in the correct circumstances.
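If the loop has to keep running over such subsets, one option (a sketch, reusing the toy data above rather than the original TEMP_DF2) is to check for zero variance before fitting:
df <- data.frame(gr = rep(letters[1:2], 100), y = 36)
if (var(df$y) == 0) {
  p_value <- NA  # all responses identical: an ANOVA is not meaningful
} else {
  p_value <- summary(aov(y ~ gr, data = df))[[1]][["Pr(>F)"]][1]
}
p_value  # NA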

Chi-squared test in R and Excel

I'm working on independence testing for some stuff at work. I usually do this sort of thing in R, but my boss wanted me to do it in Excel for the graphs. My problem is that when I use R's chi-squared test, it gives me a different result from the one Excel gives. I'm not sure if I'm setting things up wrong or if there's a difference in the methods used, but the results are pretty much polar opposites. Are the null hypotheses different in these two programs?
Here's what I've got:
Observed Values                               Expected Values
         Total Errors  Priority 1 + 2  Total           Total Errors  Priority 1 + 2
Non-V&T           342             188    530  Non-V&T   171.0759494     93.92405063
V&T               117              64    181  V&T       58.42405063     32.07594937
Total             459             252   1422
Test value:
2.68619E-79
R:
tbl1 <- matrix(c(342,117,188,64),ncol=2)
chisq.test(tbl1)
Pearson's Chi-squared test with Yates' continuity correction
data: tbl1
X-squared = 1.6653e-30, df = 1, p-value = 1
chisq.test(tbl1)$expected
         [,1]     [,2]
[1,] 342.1519 187.8481
[2,] 116.8481  64.1519
P.S. I can't seem to paste in what I had from Excel properly. The main point is that the p-value and the expected values are different from what R gives me.
I too am not sure how to paste from Excel at the moment, but I can describe the formulas I used in Excel. They produced a p-value of 0.9782, close to the one given by R. The steps were as follows:
1. I use the actual values as input (cells A2:B3).
2. I compute the marginal row and column sums.
3. I compute the expected cell values by taking the product of the appropriate marginal row and column sums and dividing by the overall sum (cells A7:B8).
4. I compute the p-value using the actual and expected counts.
If you re-do the R procedure without the Yates correction, i.e. chisq.test(tbl1, correct = FALSE), you get a p-value of 0.9782, which matches Excel's p-value.
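The same steps can be reproduced in R, which may help cross-check the Excel sheet (a sketch using the observed counts from the question):
obs <- matrix(c(342, 117, 188, 64), ncol = 2)
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)  # marginal products / total
expected                                  # matches chisq.test(obs)$expected
chisq.test(obs, correct = FALSE)$p.value  # 0.9782, matching Excel
Incidentally, the expected values shown in the question (171.0759494, 93.92405063, ...) are exactly what these formulas give if the grand total is taken as 1422 rather than 711, which suggests the totals row and column were accidentally included in the Excel sums.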
