I am running Box's M test from the biotools R package to test the homogeneity of variance-covariance matrices. I am working with the "wine" dataset from the R package rattle.
First, I loaded library(car). Then I made the following conversions for all variables to prepare the dataset for LDA or QDA after Box's M test.
wine1 <- data.frame(wine)
wine1$Type <- as.factor(wine1$Type)
wine1$Alcohol <- as.numeric(wine1$Alcohol) # I converted all other variables to "numeric" class.
When I ran the boxM test in RStudio (Version 0.99.489, on OS X 10.11.4) as
boxM(wine2[,-c(4, 11, 12, 13)], wine2[,1]) # No physical factors
boxM(wine2[,-c(2, 3, 5, 6, 7, 8, 9, 10, 14)], wine2[,1]) # No chemical factor
it returned the following error:
Error: is.numeric(x) || is.logical(x) is not TRUE
Two things need fixing in that setup: change wine2 to wine1 (although that alone will not fix the error) and remove column 1, the grouping factor Type, from the first argument to boxM:
> boxM(wine1[,-c(1,4, 11, 12, 13)], wine1[,1])
Box's M-test for Homogeneity of Covariance Matrices
data: wine1[, -c(1, 4, 11, 12, 13)]
Chi-Sq (approx.) = 378.32, df = 90, p-value < 2.2e-16
> boxM(wine2[,-c(1,2, 3, 5, 6, 7, 8, 9, 10, 14)], wine2[,1])
Error in inherits(data, c("data.frame", "matrix")) :
object 'wine2' not found
> boxM(wine1[,-c(1,2, 3, 5, 6, 7, 8, 9, 10, 14)], wine1[,1])
Box's M-test for Homogeneity of Covariance Matrices
data: wine1[, -c(1, 2, 3, 5, 6, 7, 8, 9, 10, 14)]
Chi-Sq (approx.) = 147.69, df = 20, p-value < 2.2e-16
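More generally, boxM's first argument must be all-numeric; it is the factor in column 1 that triggers the is.numeric(x) || is.logical(x) is not TRUE error. If you wanted the test on all numeric columns at once, a small sketch that avoids hand-picking indices (assuming the grouping factor is Type, as in the question):
library(biotools)
num_cols <- sapply(wine1, is.numeric)  # TRUE for numeric columns, FALSE for the factor Type
boxM(wine1[, num_cols], wine1$Type)    # grouping column excluded from the data automatically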
Consider the following:
library(tidyverse)
library(car)
a <- c(2, 3, 4, 5, 6, 7, 8, 9, 100, 11)
b <- c(5, 6, 7, 80, 9, 10, 11, 12, 13, 14)
c <- c(15, 16, 175, 18, 19, 20, 21, 22, 23, 24)
x <- c(17, 18, 50, 15, 64, 15, 3, 5, 6, 9)
y <- c(55, 66, 99, 83, 64, 51, 23, 64, 89, 101)
z <- c(98, 78, 56, 21, 45, 34, 61, 98, 45, 64)
abc <- data.frame(cbind(a, b, c))
First, I ran a regression of the abc values on x, y, and z as follows (this went according to plan):
dep_vars <- as.matrix(abc)
lm <- lm(dep_vars ~ x + y + z, data = abc)
From here, I want to get the variance inflation factor using the vif() function:
vif(lm)
But then I get an error that says Error in if (names(coefficients(mod)[1]) == "(Intercept)") { : argument is of length zero.
Can anybody help me understand where I went wrong? Or is there an alternative?
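For what it's worth, the error likely arises because the fit is multivariate: with a matrix response, coefficients(mod) is a matrix with no names, so vif's check against "(Intercept)" compares a length-zero value. Since VIFs depend only on the predictors, a minimal sketch of a workaround is to fit any single response:
library(car)

a <- c(2, 3, 4, 5, 6, 7, 8, 9, 100, 11)
x <- c(17, 18, 50, 15, 64, 15, 3, 5, 6, 9)
y <- c(55, 66, 99, 83, 64, 51, 23, 64, 89, 101)
z <- c(98, 78, 56, 21, 45, 34, 61, 98, 45, 64)

fit_a <- lm(a ~ x + y + z)  # single-response fit; also avoids naming the object "lm"
vif(fit_a)                  # VIFs for x, y, z are the same whichever response is used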
I have two datasets that each contain a distribution of 90 data points across their rows, into 2 and 4 groups respectively, and I would like to determine which of the two has distributed the data better, then plot the result so I can see this visually. "Better distributed" means each group ends up with a similar or equal amount of data. For example, in the result of "Grouped into 2" below, the second group contains larger values in every column than the first group, so one of the two groups holds most of the data, which means it is not well distributed between the 2 groups.
I am quite new to R, so I am unsure how to go about this. I would appreciate any insight into what approach could be used.
Grouped into 4
Values <- matrix(c(1, 6, 3, 6, 6, 8,
3, 3, 5, 3, 3, 3,
6, 7, 6, 7, 5, 4,
9, 4, 4, 5, 5, 3), nrow = 4, ncol = 6, byrow = TRUE)
Grouped into 2
Values <- matrix(c(3, 6, 4, 3, 4, 6,
12, 9, 12, 12, 11, 9), nrow = 2, ncol = 6, byrow = TRUE)
You can do this with some basic statistics, using hypothesis testing, i.e. testing whether the two groups are statistically different or not. The stats package in R has many tests you can try, each with its own assumptions. Here is one:
Making the matrix
values <- matrix(c(3, 6, 4, 3, 4, 6,
12, 9, 12, 12, 11, 9), nrow = 2, ncol = 6, byrow = TRUE)
Conducting the t-test
t.test(values[1, ], values[2, ], paired = FALSE)
Will give you this:
Welch Two Sample t-test
data: values[1, ] and values[2, ]
t = -7.9279, df = 9.945, p-value = 1.318e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-8.328203 -4.671797
sample estimates:
mean of x mean of y
4.333333 10.833333
The mean of values[1, ] is smaller than that of values[2, ], with a p-value of 1.3e-05, so the data are significantly unevenly split between the two groups.
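If you also want the visual check mentioned in the question, one option (a sketch using base-R graphics; the labels are made up) is side-by-side bars per column, where roughly equal bar heights within each column would indicate an even split:
values <- matrix(c(3, 6, 4, 3, 4, 6,
                   12, 9, 12, 12, 11, 9), nrow = 2, ncol = 6, byrow = TRUE)
barplot(values, beside = TRUE,
        names.arg = paste0("V", 1:ncol(values)),  # one label per column
        legend.text = c("Group 1", "Group 2"),
        main = "Values per column, by group")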
I get a warning when selecting rows based on the median of one of the variables in a tibble; see the details and warning below. So I wonder if there is a more tidyverse-style solution to this.
Example data:
library(tibble)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
z <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
xyz <- tibble(x, y, z)
group1 <- xyz[xyz[2] < stats::median(purrr::as_vector(xyz$y), na.rm = TRUE), ]
Warning message:
The `i` argument of ``[`()` can't be a matrix as of tibble 3.0.0.
Convert to a vector.
Thanks in advance
The tidyverse way is to filter on the column directly:
xyz %>%
  filter(y < stats::median(y))
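For completeness: the warning appears because xyz[2] is a one-column tibble, so xyz[2] < median(...) produces a logical matrix, and tibble 3.0.0 deprecated matrix indices in [. Indexing with a plain logical vector also works in base subsetting (a sketch):
library(tibble)

xyz <- tibble(x = 1:10, y = 1:10, z = 1:10)
# xyz$y is an ordinary vector, so the comparison yields a logical vector, not a matrix
group1 <- xyz[xyz$y < stats::median(xyz$y, na.rm = TRUE), ]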
I have the following data,
Data <- c(8, 15, 8, 10, 7, 5, 2, 11, 8, 7, 6, 6, 4, 6, 10,
3, 9, 7, 15, 6, 5, 9, 8, 3, 3, 8, 5, 14, 8, 11,
8, 10, 7, 4, 6, 4, 6, 7, 11, 7, 8, 7, 8, 6, 5,
12, 7, 8, 13, 10, 6, 9, 7)
and I want to perform a KS test in R using the dgof package, but I have no idea how to use it. I have also fitted the above data with binomial and Poisson distributions.
Now I want to use the KS test to identify which model (binomial or Poisson) better represents the data.
Thank you.
Well, first, you have two problems:
The classical Kolmogorov-Smirnov test applies only to continuous distributions (see Kolmogorov-Smirnov Goodness-of-Fit Test), and the Poisson and binomial are discrete.
The data contain ties.
Even so, if I wanted to apply the test anyway, it would be in the following way: eliminate the duplicate values and use the maximum-likelihood estimators for the Poisson and binomial distributions.
x <- unique(Data)
ks.test(x,"ppois",lambda <- mean(x))
One-sample Kolmogorov-Smirnov test
data: x
D = 0.2058, p-value = 0.5273
alternative hypothesis: two-sided
ks.test(x,"pbinom",n <- length(x),p <- mean(x)/n)
One-sample Kolmogorov-Smirnov test
data: x
D = 0.3126, p-value = 0.103
alternative hypothesis: two-sided
Comparing the D statistics (0.2058 for the Poisson vs. 0.3126 for the binomial), we could conclude that the Poisson model represents the data better, even without relying on the p-values.
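As for the dgof part of the question: dgof::ks.test extends stats::ks.test to discrete null distributions when the second argument is a discrete CDF supplied as a stepfun (or ecdf) object, which sidesteps both objections above. A sketch for the Poisson case, with lambda estimated from the data (an assumption that, strictly speaking, invalidates the nominal p-value):
library(dgof)

lambda <- mean(Data)                        # ML estimate of the Poisson mean
support <- 0:50                             # generous support for the step CDF
pois_cdf <- stepfun(support, c(0, ppois(support, lambda)))
dgof::ks.test(Data, pois_cdf)               # one-sample KS test against a discrete null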
R code:
x <- c(9, 5, 9, 10, 13, 8, 8, 13, 18, 30)
y <- c(10, 6, 9, 8, 11, 4, 1, 3, 3, 10)
library(exactRankTests)
wilcox.exact(y, x, paired = TRUE, alternative = "two.sided")
The results: V = 3, p-value = 0.01562
SAS code:
data aaa;
set aaa;
diff=x-y;
run;
proc univariate;
var diff;
run;
The results: S=19.5 Pr >= |S| 0.0156
How can I get the statistic S in R?
If n <= 20, the exact p-value is the same in SAS and R, but if n > 20 the results differ.
x <- c(9, 5, 9, 10, 13, 8, 8, 13, 18, 30, 9, 5, 9, 10, 13, 8, 8, 13, 18, 30, 9, 11, 12, 10)
y <- c(10, 6, 9, 8, 11, 4, 1, 3, 3, 10, 10, 6, 9, 8, 11, 4, 1, 3, 3, 10, 10, 12, 11, 12)
wilcox.exact(y, x, paired = TRUE, alternative = "two.sided", exact = FALSE)
The results: V = 34, p-value = 0.002534
The SAS results:S=92.5 Pr >= |S| 0.0009
How can I get the same statistic S and p-value in SAS and R? Thank you!
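For the S statistic itself: as far as I can tell, SAS's signed-rank S is R's V (the sum of the positive ranks) centered at its null mean m(m+1)/4, where m is the number of nonzero differences; this reproduces both S values quoted above (19.5 and 92.5). A minimal sketch for the first data set:
x <- c(9, 5, 9, 10, 13, 8, 8, 13, 18, 30)
y <- c(10, 6, 9, 8, 11, 4, 1, 3, 3, 10)

d <- x - y              # same direction as the SAS step diff = x - y
d <- d[d != 0]          # SAS and R both drop zero differences
r <- rank(abs(d))       # ranks of |d|, with ties averaged
V <- sum(r[d > 0])      # R's V for wilcox.exact(x, y, paired = TRUE)
m <- length(d)
V - m * (m + 1) / 4     # SAS's S: 19.5 here
As for the p-values: for n > 20, PROC UNIVARIATE switches to a Student's t approximation for S, while wilcox.exact(..., exact = FALSE) uses a normal approximation, which is a likely source of the discrepancy.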