Problems with ks.test and ties - r

I have a distribution, for example:
d
#[1] 4 22 15 5 9 5 11 15 21 14 14 23 6 9 17 2 7 10 4
Or, the vector d in dput format.
d <- c(4, 22, 15, 5, 9, 5, 11, 15, 21, 14, 14, 23, 6, 9, 17, 2, 7, 10, 4)
When I apply ks.test:
gamma <- ks.test(d, "pgamma", shape = 3.178882, scale = 3.526563)
This gives the following warning:
Warning message:
In ks.test(d, "pgamma", shape = 3.178882, scale = 3.526563) :
ties should not be present for the Kolmogorov-Smirnov test
I tried using unique(d), but that obviously drops values from my data, which I want to avoid.
In other examples I found online the same warning appears, but there the test still shows results along with the warning, whereas I get only the warning and no ks.test values.
Can anyone help?

Your result is stored in gamma; the warning does not stop the test from running:
d <- c(4, 22, 15, 5, 9, 5, 11, 15, 21, 14, 14, 23, 6, 9, 17, 2, 7, 10, 4)
gamma <- ks.test(d, "pgamma", shape = 3.178882, scale = 3.526563)
Warning message:
In ks.test(d, "pgamma", shape = 3.178882, scale = 3.526563) :
ties should not be present for the Kolmogorov-Smirnov test
gamma
One-sample Kolmogorov-Smirnov test
data: d
D = 0.14549, p-value = 0.816
alternative hypothesis: two-sided
The warning is explained on the help page, ?ks.test:
The presence of ties always generates a warning, since continuous
distributions do not generate them. If the ties arose from rounding
the tests may be approximately valid, but even modest amounts of
rounding can have a significant effect on the calculated statistic.
Your data appear to be rounded to whole numbers, which is where the ties come from, so the test is only "approximately" valid.
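A minimal sketch of how to read the numbers out of the returned object, assuming you have already checked the ties yourself and only want to silence this one warning. The names has_ties and res and the use of suppressWarnings() are illustrative choices, not part of the original answer.

```r
d <- c(4, 22, 15, 5, 9, 5, 11, 15, 21, 14, 14, 23, 6, 9, 17, 2, 7, 10, 4)

# Confirm that the warning is caused by repeated values in d
has_ties <- anyDuplicated(d) > 0
has_ties

# ks.test() returns an "htest" list even when it warns;
# suppressWarnings() only hides the message, it does not change the result
res <- suppressWarnings(
  ks.test(d, "pgamma", shape = 3.178882, scale = 3.526563)
)

res$statistic  # the D statistic
res$p.value    # the p-value
```

Hiding the warning does not make the ties go away, so the caveat from the help page still applies to the p-value.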

Related

Output of adaptive.density function in spatstat

I'm reading the book "Spatial Point Patterns: Methodology and Applications with R", Chapter 6, and trying to replicate all the examples by following the code on the companion website. I cannot replicate Figure 6.15 (a): the output I get from the code below is very different from the figure in the book.
library(spatstat)
#> Loading required package: spatstat.data
#> Loading required package: nlme
#> Loading required package: rpart
#>
#> spatstat 1.60-1 (nickname: 'Swinging Sixties')
#> For an introduction to spatstat, type 'beginner'
swp <- rescale(swedishpines)
aden <- adaptive.density(swp, f=0.1, nrep=30)
#> Computing 30 intensity estimates...
#>
#> PLEASE NOTE: The components "delsgs" and "summary" of the
#> object returned by deldir() are now DATA FRAMES rather than
#> matrices (as they were prior to release 0.0-18).
#> See help("deldir").
#>
#> PLEASE NOTE: The process that deldir() uses for determining
#> duplicated points has changed from that used in version
#> 0.0-9 of this package (and previously). See help("deldir").
#> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30.
#> Done.
rainsat <- function(n) {
  grade <- sqrt(seq(0.1, 1, length=n))
  rainbow(n=n, start=1/2, s=grade)
}
par(mar = c(1, 0, 0, 2))
plot(aden, main="", ribscale=1000, col=rainsat)
plot(swp, add=TRUE, pch=3)
Created on 2019-09-06 by the reprex package (v0.3.0)
What's the problem here? What am I doing wrong? Even if I run all the code in the startup.R and figurelayout.R files (which should just change the colours of the plots, making them b/w), I still cannot get the same plot.
adaptive.density involves randomisation. You will not get the same result if you repeat the same command twice (unless you reset the random seed with set.seed() before each call).
A larger value of nrep will reduce the random variation.
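Both points can be illustrated in base R without spatstat. This is only a sketch: noisy_estimate() is a hypothetical stand-in for a randomised estimator that averages nrep random replicates, and it is not how adaptive.density is implemented. With spatstat, the analogous step would be to call set.seed() immediately before adaptive.density().

```r
# Hypothetical stand-in: the average of nrep noisy estimates of a true value 5
noisy_estimate <- function(nrep) mean(rnorm(nrep, mean = 5, sd = 1))

# Resetting the seed before each call reproduces the result exactly
set.seed(42)
a <- noisy_estimate(30)
set.seed(42)
b <- noisy_estimate(30)
identical(a, b)

# Run-to-run spread of the estimate for a small and a large nrep
set.seed(1)
sd_small <- sd(replicate(200, noisy_estimate(30)))
sd_large <- sd(replicate(200, noisy_estimate(300)))
c(nrep_30 = sd_small, nrep_300 = sd_large)
```

The second part shows why a larger nrep gives more stable output: averaging over more random replicates shrinks the variation between runs.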

Why does the 'digits' argument in R's print change a value?

Why does the function return a value t = 13.214, but print(..., digits = 3) returns t = 10?
vals <- data.frame(a = c(4, 2, 4, 7, 3, 4, 8, 8, 3, 0, 1, 5, 4, 6, 4, 8, 7, 9, 6, 6, 3, 6, 7, 4),
b = c(5, 7, 6, 13, 12, 6, 14, 16, 4, 2, 7, 7, 4, 8, 9, 9, 11, 13, 12, 8, 3, 8, 7, 7))
stats::t.test(x = vals)
# One Sample t-test
# data: vals
# t = 13.214, df = 47, p-value < 2.2e-16
# alternative hypothesis: true mean is not equal to 0
# 95 percent confidence interval:
# 5.598761 7.609572
# sample estimates:
# mean of x
# 6.604167
print(stats::t.test(x = vals), digits = 3)
From ?print:
digits: minimal number of significant digits, see print.default.
But that should not change 13 to 10?
package ‘stats’ version 3.5.1
R.version
platform x86_64-w64-mingw32
arch x86_64
os mingw32
system x86_64, mingw32
status
major 3
minor 5.1
year 2018
month 07
day 02
svn rev 74947
language R
version.string R version 3.5.1 (2018-07-02)
nickname Feather Spray
The first step in answering these questions is always to figure out which print method we're dealing with. The generic help in ?print won't necessarily be terribly relevant. t.test objects have class htest, so we want to look at print.htest.
Note that ?print.htest sends you to a slightly more specific documentation page. The documentation for digits doesn't say anything specific, but then in the Details section we see:
Both print methods traditionally have not obeyed the digits argument
properly. They now do, the htest method mostly in expressions like
max(1, digits - 2).
(This is in R 3.5.2)
For example, in the function code we see things like:
out <- c(out, paste(names(x$statistic), "=", format(signif(x$statistic,
max(1L, digits - 2L)))))
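The default value for digits will typically be 7. print.htest uses the full digits value for printing the sample estimates and confidence intervals, but fewer digits (digits - 2, with a minimum of 1) for other quantities such as the test statistic. With digits = 3 the statistic is therefore rounded to one significant digit, and 13.214 becomes 10.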
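The rounding can be reproduced outside print.htest. A small sketch, assuming the formatting expression quoted above; fmt_stat is a hypothetical helper name and the t value is taken from the question:

```r
t_stat <- 13.214

# Mirrors the expression used for the statistic in the quoted print.htest code
fmt_stat <- function(x, digits) format(signif(x, max(1L, digits - 2L)))

fmt_stat(t_stat, digits = 7)  # 5 significant digits: "13.214"
fmt_stat(t_stat, digits = 5)  # 3 significant digits: "13.2"
fmt_stat(t_stat, digits = 3)  # 1 significant digit:  "10"
```

So to see the statistic to three significant digits, digits = 5 is needed rather than digits = 3.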

Extract matrix from list in markovchainListFit

I'm trying to extract the matrices from the markovchainListFit but am unable to.
library(markovchain)
mat <- data.frame(A = c(rep(0, 10)),
B = c(40 ,37, 35 ,30, 27, 21, 15, 16, 21, 19),
C = c(10, 15, 20, 23, 44, 34, 47, 22, 37, 29),
D = c(1, 2, 3, 5, 9, 21, 8, 12, 17, 12))
mat$A <- apply(mat, 1, function(x) 100 - sum(x))
# Build sequence from mat
tseq <- apply(t(mat), 2, function(x) rep(row.names(t(mat)), x))
# Fit Markov Matrices to sequences
mcListFit <- markovchainListFit(data = tseq)
What I've tried:
> mcListFit$estimate[[1]]
Unnamed Markov chain
A 4 - dimensional discrete Markov Chain defined by the following states:
A, B, C, D
The transition matrix (by rows) is defined as follows:
A B C D
A 0.9387755 0.06122449 0.00 0.0
B 0.0000000 0.85000000 0.15 0.0
C 0.0000000 0.00000000 0.90 0.1
D 0.0000000 0.00000000 0.00 1.0
> as.matrix(mcListFit$estimate[[1]])
Error in as.vector(data) :
no method for coercing this S4 class to a vector
> as.matrix(unlist(mcListFit$estimate[[1]]))
Error in as.vector(data) :
no method for coercing this S4 class to a vector
But I'm still not able to extract any of the matrices. How would I go about doing this?
This code could help:
# allocate a generic list
matrixList <- list()
# sequentially fill the list with the transition matrices,
# using the dim method to get the number of estimated chains
for (i in seq_len(dim(mcListFit$estimate))) {
  # the matrix itself is stored in the S4 slot @transitionMatrix
  matrixList[[i]] <- mcListFit$estimate[[i]]@transitionMatrix
}
matrixList

Plot variables as slope of line between points

Due to the nature of my specification, the results of my regression coefficients provide the slope (change in yield) between two points; therefore, I would like to plot these coefficients using the slope of a line between these two points with the first point (0, -0.7620) as the intercept. Please note this is a programming question; not a statistics question.
I'm not entirely sure how to implement this in base graphics or ggplot and would appreciate any help. Here is some sample data.
Sample Data:
df <- data.frame(x = c(0, 5, 8, 10, 12, 15, 20, 25, 29), y = c(-0.762,-0.000434, 0.00158, 0.0000822, -0.00294, 0.00246, -0.000521, -0.00009287, -0.01035) )
Output:
x y
1 0 -7.620e-01
2 5 -4.340e-04
3 8 1.580e-03
4 10 8.220e-05
5 12 -2.940e-03
6 15 2.460e-03
7 20 -5.210e-04
8 25 -9.287e-05
9 29 -1.035e-02
Example:
You can use cumsum, the cumulative sum, to calculate the intermediate values:
df <- data.frame(x = c(0, 5, 8, 10, 12, 15, 20, 25, 29),
                 y = cumsum(c(-0.762, -0.000434, 0.00158, 0.0000822, -0.00294, 0.00246, -0.000521, -0.00009287, -0.01035)))
plot(df$x, df$y)
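A variant sketch under a different assumption: that each coefficient after the first is a slope per unit of x over the interval ending at that point, rather than a total increment. In that case each slope is multiplied by the width of its interval before accumulating. The names coefs, intercept, slopes and y_line are illustrative, and the last coefficient uses the value from the question (-0.01035).

```r
x <- c(0, 5, 8, 10, 12, 15, 20, 25, 29)
coefs <- c(-0.762, -0.000434, 0.00158, 0.0000822, -0.00294,
           0.00246, -0.000521, -0.00009287, -0.01035)

intercept <- coefs[1]   # y value at the first point, x = 0
slopes <- coefs[-1]     # assumed: change in y per unit of x on each interval

# Rise over each interval is slope * interval width; accumulate from the intercept
y_line <- intercept + c(0, cumsum(slopes * diff(x)))

# type = "b" draws both the points and the connecting line segments
plot(x, y_line, type = "b", xlab = "x", ylab = "y")
```

Which of the two readings is right depends on how the regression was specified, so check that before choosing between plain cumsum and the width-weighted version.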

How to calculate a mean value from multiple maximal values

I have a variable, e.g. c(0, 8, 7, 15, 85, 12, 46, 12, 10, 15, 15).
How can I calculate a mean value from random maximal values in R?
For example, I would like to calculate the mean of three maximal values.
First step: draw a sample of 3 from your data and store it in x.
Second step: calculate the mean of the sample.
Try:
dat <- c(0,8,7,15, 85, 12, 46, 12, 10, 15,15)
x <- sample(dat,3)
x
mean(x)
possible output:
> x <- sample(dat,3)
> x
[1] 85 15 0
> mean(x)
[1] 33.33333
If you mean the three highest values, just sort your vector and subset:
> mean(sort(c(0,8,7,15, 85, 12, 46, 12, 10, 15,15), decreasing=TRUE)[1:3])
[1] 48.66667
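Both readings can be wrapped up as follows. This is a sketch: top_n_mean is a hypothetical helper name, and the seed value is arbitrary.

```r
dat <- c(0, 8, 7, 15, 85, 12, 46, 12, 10, 15, 15)

# Hypothetical helper: mean of the n largest values
# (sort() drops NA values by default)
top_n_mean <- function(x, n = 3) {
  mean(head(sort(x, decreasing = TRUE), n))
}

top_n_mean(dat, 3)  # mean of 85, 46 and 15

# For the random-sample reading, fix the seed so the draw is reproducible
set.seed(123)
x <- sample(dat, 3)
mean(x)
```

Using head() instead of [1:3] avoids NA padding when the vector has fewer than n elements.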
