I have the following data,
Data <- c(8, 15, 8, 10, 7, 5, 2, 11, 8, 7, 6, 6, 4, 6, 10,
3, 9, 7, 15, 6, 5, 9, 8, 3, 3, 8, 5, 14, 8, 11,
8, 10, 7, 4, 6, 4, 6, 7, 11, 7, 8, 7, 8, 6, 5,
12, 7, 8, 13, 10, 6, 9, 7)
and I want to perform a KS test in R using the dgof package, but I have no idea how to use it. I have also fitted binomial and Poisson distributions to the data above.
Now I want to use the KS test to identify which model (binomial or Poisson) better represents the data.
Thank you.
Well, first, you have two problems:
The Kolmogorov-Smirnov test applies only to continuous distributions, and both the binomial and the Poisson are discrete. See: Kolmogorov-Smirnov Goodness-of-Fit Test
Your data contain ties.
Even so, if I wanted to apply the test, I would do it as follows: eliminate the duplicate values and use the maximum likelihood estimators for the parameters of the Poisson and binomial distributions.
x <- unique(Data)
ks.test(x, "ppois", lambda = mean(x))
One-sample Kolmogorov-Smirnov test
data: x
D = 0.2058, p-value = 0.5273
alternative hypothesis: two-sided
ks.test(x, "pbinom", size = length(x), prob = mean(x) / length(x))
One-sample Kolmogorov-Smirnov test
data: x
D = 0.3126, p-value = 0.103
alternative hypothesis: two-sided
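Comparing the D statistics rather than the p-values, the Poisson fit has the smaller distance (D = 0.2058 versus D = 0.3126 for the binomial), so we could conclude that the Poisson model represents the data better.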
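To make the D statistic above less of a black box, here is a minimal sketch that recomputes it by hand for the Poisson case and compares it with what ks.test reports. It uses only base R; the names Fx, D_manual and D_ks are mine, and the formula is the usual two-sided one-sample KS distance for tie-free data, so it carries the same caveat as the answer (the test is not really meant for discrete distributions).

```r
Data <- c(8, 15, 8, 10, 7, 5, 2, 11, 8, 7, 6, 6, 4, 6, 10,
          3, 9, 7, 15, 6, 5, 9, 8, 3, 3, 8, 5, 14, 8, 11,
          8, 10, 7, 4, 6, 4, 6, 7, 11, 7, 8, 7, 8, 6, 5,
          12, 7, 8, 13, 10, 6, 9, 7)

x <- sort(unique(Data))            # ties removed, as in the answer above
n <- length(x)
Fx <- ppois(x, lambda = mean(x))   # fitted Poisson CDF at each observed value

# Largest gap between the fitted CDF and the empirical CDF,
# checked just before and just after each jump of the empirical CDF
D_manual <- max(c(Fx - (0:(n - 1)) / n, (1:n) / n - Fx))

# The same statistic as reported by ks.test
D_ks <- unname(ks.test(x, "ppois", lambda = mean(x))$statistic)

print(c(manual = D_manual, ks.test = D_ks))
```

Replacing ppois(x, lambda = mean(x)) with pbinom(x, size = n, prob = mean(x) / n) should give the corresponding distance for the binomial fit.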
I have two datasets, each containing a distribution of 90 data points into 2 and 4 groups (rows) respectively. I would like to determine which of the two has distributed the data better, and plot the result to see this visually. By "better distribution" I mean that each group ends up with a similar (or the same) amount of data. For example, in the "Grouped into 2" result the second group contains larger values in every column than the first group, so one of the 2 groups holds most of the data, which means it is not well distributed among the 2 groups.
I am quite new to R, so I am unsure how to go about this. I would appreciate any insight into what approach could be used.
Grouped into 4
Values <- matrix(c(1, 6, 3, 6, 6, 8,
3, 3, 5, 3, 3, 3,
6, 7, 6, 7, 5, 4,
9, 4, 4, 5, 5, 3), nrow = 4, ncol = 6, byrow = TRUE)
Grouped into 2
Values <- matrix(c(3, 6, 4, 3, 4, 6,
12, 9, 12, 12, 11, 9), nrow = 2, ncol = 6, byrow = TRUE)
You can do this with some basic statistics, using hypothesis testing i.e. testing whether the two groups are statistically different or not. The stats package in R has a lot of tests that you can try and use, each with its own assumptions. Here is one:
Making the matrix
values <- matrix(c(3, 6, 4, 3, 4, 6,
12, 9, 12, 12, 11, 9), nrow = 2, ncol = 6, byrow = TRUE)
Conducting t-test
t.test(values[1, ], values[2, ], paired = FALSE)
Will give you this:
Welch Two Sample t-test
data: values[1, ] and values[2, ]
t = -7.9279, df = 9.945, p-value = 1.318e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-8.328203 -4.671797
sample estimates:
mean of x mean of y
4.333333 10.833333
The mean of values[1, ] (4.33) is significantly smaller than the mean of values[2, ] (10.83), with a p-value of 1.3e-05, so the two groups are clearly not balanced.
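If the goal is to compare the two groupings against each other rather than to test one of them, a rough alternative is to look at how much the group totals vary. The sketch below is only one possible measure, not the method used above: it computes the coefficient of variation of the row totals (lower means more even) and draws a bar plot of the totals for each grouping. The names grouped4, grouped2 and cv_of_totals are made up for this example.

```r
grouped4 <- matrix(c(1, 6, 3, 6, 6, 8,
                     3, 3, 5, 3, 3, 3,
                     6, 7, 6, 7, 5, 4,
                     9, 4, 4, 5, 5, 3), nrow = 4, ncol = 6, byrow = TRUE)
grouped2 <- matrix(c(3, 6, 4, 3, 4, 6,
                     12, 9, 12, 12, 11, 9), nrow = 2, ncol = 6, byrow = TRUE)

# Coefficient of variation of the group totals: sd / mean.
# A value near 0 means every group holds about the same amount.
cv_of_totals <- function(m) {
  totals <- rowSums(m)
  sd(totals) / mean(totals)
}

cv4 <- cv_of_totals(grouped4)
cv2 <- cv_of_totals(grouped2)
print(c(grouped_into_4 = cv4, grouped_into_2 = cv2))

# Side-by-side bar plots of the group totals
op <- par(mfrow = c(1, 2))
barplot(rowSums(grouped4), names.arg = 1:4, main = "Grouped into 4",
        xlab = "Group", ylab = "Total")
barplot(rowSums(grouped2), names.arg = 1:2, main = "Grouped into 2",
        xlab = "Group", ylab = "Total")
par(op)
```

With these two matrices the 4-group version has the smaller coefficient of variation, which agrees with the impression that the 2-group split is lopsided.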
I am running the boxM test from the biotools R package to test the homogeneity of variance-covariance matrices. I am working with the "wine" dataset from the rattle R package.
First, I loaded library(car). Then I did the following conversions for all variables to prepare the dataset for LDA or QDA after Box's M test.
wine1 <- data.frame(wine)
wine1$Type <- as.factor(wine1$Type)
wine1$Alcohol <- as.numeric(wine1$Alcohol) # I converted all other variables to "numeric" class.
When I ran the boxM test in RStudio [Version 0.99.489 and Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/601.5.17 (KHTML, like Gecko)] as
boxM(wine2[,-c(4, 11, 12, 13)], wine2[,1]) # No physical factors
boxM(wine2[,-c(2, 3, 5, 6, 7, 8, 9, 10, 14)], wine2[,1]) # No chemical factor
it returned the following error:
Error: is.numeric(x) || is.logical(x) is not TRUE
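Two things are needed with that data setup. First, change wine2 to wine1 (that alone does not fix your error). Second, remove column 1 from the first argument to boxM: column 1 is Type, which you converted to a factor, and boxM requires every column of its data argument to be numeric or logical, which is exactly what the error message says.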
> boxM(wine1[,-c(1,4, 11, 12, 13)], wine1[,1])
Box's M-test for Homogeneity of Covariance Matrices
data: wine1[, -c(1, 4, 11, 12, 13)]
Chi-Sq (approx.) = 378.32, df = 90, p-value < 2.2e-16
> boxM(wine2[,-c(1,2, 3, 5, 6, 7, 8, 9, 10, 14)], wine2[,1])
Error in inherits(data, c("data.frame", "matrix")) :
object 'wine2' not found
> boxM(wine1[,-c(1,2, 3, 5, 6, 7, 8, 9, 10, 14)], wine1[,1])
Box's M-test for Homogeneity of Covariance Matrices
data: wine1[, -c(1, 2, 3, 5, 6, 7, 8, 9, 10, 14)]
Chi-Sq (approx.) = 147.69, df = 20, p-value < 2.2e-16
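Instead of dropping the factor column by position, the numeric columns can be picked programmatically. The sketch below uses the built-in iris data as a stand-in (assumption: I am not loading rattle or biotools here, so the boxM call itself is left as a comment), and the names dat, group_col, predictors and groups are only illustrative.

```r
dat <- iris                 # stand-in for wine1; Species plays the role of Type
group_col <- "Species"

# Flag the numeric columns, then exclude the grouping column by name
is_num <- vapply(dat, is.numeric, logical(1))
predictors <- dat[, is_num & names(dat) != group_col, drop = FALSE]
groups <- dat[[group_col]]

# The same kind of check that produced the error in the question:
# every column handed over as data must be numeric (or logical)
stopifnot(all(vapply(predictors, is.numeric, logical(1))))

str(predictors)

# Assumption: with biotools installed, the call would then look like
# biotools::boxM(predictors, groups)
```

The same pattern should carry over to wine1 with group_col set to "Type", further restricted to whichever chemical or physical columns you want to test.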
This is my data
x = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22)
y = c(1, 6, 2, 5, 4, 7, 9, 6, 8, 4, 5, 6, 5, 5, 6, 7, 5, 8, 9,
5, 4, 7)
plot(x, y)
fit <- lm(y ~ x)
fit
abline(fit, col = "black", lwd = "1")
I would like the plot to split the data into two groups: observations above the regression line and those below it. How can I do this?
You can use predict to get the fitted value at each x, and then a logical comparison between the observed and fitted to test if they're above or below the line. Then set the colors you plot based on this logical comparison.
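# Fitted value on the regression line at each x
prediction <- predict(fit)
# Colour 1 (black) for points above the line, colour 2 (red) for points below
colors <- ifelse(y > prediction, 1, 2)
plot(x, y, col = colors)
abline(fit, col = "black", lwd = 1)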
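If you also want the two groups as separate data frames rather than just as colours, the same comparison can drive split(). This is a small sketch using the data from the question; it relies on the fact that a positive residual means the observation lies above the fitted line, and the names side and groups are mine.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22)
y <- c(1, 6, 2, 5, 4, 7, 9, 6, 8, 4, 5, 6, 5, 5, 6, 7, 5, 8, 9,
       5, 4, 7)
fit <- lm(y ~ x)

# residual = observed - fitted, so residual > 0 means "above the line"
side <- ifelse(residuals(fit) > 0, "above", "below")

# One data frame per group
groups <- split(data.frame(x, y), side)
print(groups$above)
print(groups$below)
```

A point lying exactly on the line (residual of 0) would land in the "below" group with this rule; change the comparison to >= if you prefer the opposite.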
R code:
x <- c(9, 5, 9 ,10, 13, 8, 8, 13, 18, 30)
y <- c(10, 6, 9, 8, 11, 4, 1, 3, 3, 10)
library(exactRankTests)
wilcox.exact(y,x, paired = TRUE, alternative = "two.sided")
The results: V = 3, p-value = 0.01562
SAS code:
data aaa;
set aaa;
diff=x-y;
run;
proc univariate;
var diff;
run;
The results: S=19.5 Pr >= |S| 0.0156
How can I get the statistic S in R?
If n <= 20, the exact p-value is the same in SAS and R, but if n > 20 the results differ:
x <- c(9, 5, 9 ,10, 13, 8, 8, 13, 18, 30,9, 5, 9 ,10, 13, 8, 8, 13, 18, 30,9,11,12,10)
y <- c(10, 6, 9, 8, 11, 4, 1, 3, 3, 10,10, 6, 9, 8, 11, 4, 1, 3, 3, 10,10,12,11,12)
wilcox.exact(y,x,paired=TRUE, alternative = "two.sided",exact = FALSE)
The results: V = 34, p-value = 0.002534
The SAS results: S = 92.5, Pr >= |S| 0.0009
How can I get the same statistic S and p-value in SAS and R? Thank you!
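For the statistic itself, the numbers above are consistent with S being the centred signed rank statistic: the sum of the ranks of the positive differences x - y, minus n_t(n_t + 1)/4, where n_t is the number of nonzero differences. The sketch below computes that in base R; signed_rank_S is a hypothetical helper name, not a function from exactRankTests or SAS.

```r
# Centred signed rank statistic for paired data.
# Zero differences are dropped and tied |differences| get average ranks.
signed_rank_S <- function(x, y) {
  d <- x - y
  d <- d[d != 0]
  n_t <- length(d)                      # number of nonzero differences
  t_plus <- sum(rank(abs(d))[d > 0])    # sum of ranks of positive differences
  c(n_t = n_t, T_plus = t_plus, S = t_plus - n_t * (n_t + 1) / 4)
}

x1 <- c(9, 5, 9, 10, 13, 8, 8, 13, 18, 30)
y1 <- c(10, 6, 9, 8, 11, 4, 1, 3, 3, 10)
res1 <- signed_rank_S(x1, y1)
print(res1)

x2 <- c(x1, x1, 9, 11, 12, 10)
y2 <- c(y1, y1, 10, 12, 11, 12)
res2 <- signed_rank_S(x2, y2)
print(res2)
```

Note that V from wilcox.exact(y, x, paired = TRUE) is the rank sum for the positive values of y - x, so here S = n_t(n_t + 1)/4 - V: 22.5 - 3 = 19.5 and 126.5 - 34 = 92.5. As for the p-values, my understanding (please check it against the PROC UNIVARIATE documentation) is that SAS uses the exact distribution only up to 20 nonzero differences and switches to an approximation beyond that, while exact = FALSE in R requests a normal approximation, so the two need not agree for larger samples.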
I am trying to fit a curved line segment to a dataset. While I can create the line it is always connected back to the starting point. I can't figure out how to get rid of this. I would really appreciate any help. Here is the code
mscF25=c(-12.94382785, -11.0281518, -9.186403952, -7.691576905, -6.470229134, -5.43000796, -4.559074508, -12.87271022, -10.0646268, -6.796208225, -4.433351598, -2.928135666, -1.979265556, -1.38936463, -11.05819006, -7.785838826, -5.297330858, -3.674159165, -2.64702678, -1.980973252, -1.533714976, -11.83971039, -9.168353808, -6.89192172, -5.23424594, -4.033326594, -3.148798626, -2.480469911)
bscF25=c(4, 5, 6, 7, 8, 9, 10, 4, 5, 6, 7, 8, 9, 10, 4, 5, 6, 7, 8, 9, 10, 4, 5, 6, 7, 8, 9, 10)
df25 <- data.frame(bscF25,mscF25)
plot(mscF25 ~ bscF25, data = df25)
ls25 <- loess(mscF25 ~ bscF25, data = df25, span = 3)
lines(df25$bscF25, ls25$fitted)
You might try the scatter.smooth function: "Plot and add a smooth curve computed by loess to a scatter plot"
scatter.smooth(x = df25$bscF25, y = df25$mscF25, span = 3)
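If you would rather keep your own loess fit, the line most likely jumps back to the start because lines() joins the points in the order they appear in the data, and bscF25 restarts at 4 four times. A minimal sketch of that fix, ordering the points by x before drawing (ord is just a name I picked):

```r
mscF25 <- c(-12.94382785, -11.0281518, -9.186403952, -7.691576905,
            -6.470229134, -5.43000796, -4.559074508, -12.87271022,
            -10.0646268, -6.796208225, -4.433351598, -2.928135666,
            -1.979265556, -1.38936463, -11.05819006, -7.785838826,
            -5.297330858, -3.674159165, -2.64702678, -1.980973252,
            -1.533714976, -11.83971039, -9.168353808, -6.89192172,
            -5.23424594, -4.033326594, -3.148798626, -2.480469911)
bscF25 <- rep(4:10, times = 4)
df25 <- data.frame(bscF25, mscF25)

ls25 <- loess(mscF25 ~ bscF25, data = df25, span = 3)

# Draw the fitted values in increasing order of x so the line
# never has to travel back to x = 4
ord <- order(df25$bscF25)
plot(mscF25 ~ bscF25, data = df25)
lines(df25$bscF25[ord], ls25$fitted[ord])
```

Here rep(4:10, times = 4) is simply a shorter way of writing the same bscF25 vector as in the question.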