Extract statistics from Anderson-Darling test (list)

Extract statistics from Anderson-Darling test (list) - r

I would like to extract the p-values from the Anderson-Darling test (ad.test from package kSamples). The test result is a list of 12 containing a 2x3 matrix. The p value is part of the 2x3 matrix and is present in element 7.
When using the following code:
lapply(AD_result, "[[", 7)
I get the following subset of AD test results (first 2 of a total of 50 shown)
[[1]]
AD T.AD asympt. P-value
version 1: 1.72 0.94536 0.13169
version 2: 1.51 0.66740 0.17461
[[2]]
AD T.AD asympt. P-value
version 1: 12.299 14.624 6.9248e-07
version 2: 11.900 14.144 1.1146e-06
My question is how to extract only the p-value (e.g. from version 1) and put these 50 results into a vector
The output from str(AD_result) is:
List of 55
$ :List of 12
..$ test.name : chr "Anderson-Darling"
..$ k : int 2
..$ ns : int [1:2] 103 2905
..$ N : int 3008
..$ n.ties : int 2873
..$ sig : num 0.762
..$ ad : num [1:2, 1:3] 1.72 1.51 0.945 0.667 0.132 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:2] "version 1:" "version 2:"
.. .. ..$ : chr [1:3] "AD" "T.AD" " asympt. P-value"
..$ warning : logi FALSE
..$ null.dist1: NULL
..$ null.dist2: NULL
..$ method : chr "asymptotic"
..$ Nsim : num 1
..- attr(*, "class")= chr "kSamples"

You could try:
unlist(lapply(AD_result, function(x) x$ad[,3]))

Related

Interpreting the PCA axis Dim1 and Dim2 from CLARA plot results directly

I had a large dataset that contains more than 300,000 rows/observations and 22 variables. I used the CLARA method for the clustering and plotted the results using fviz_cluster. Using the silhouette method, I got 10 as my number of clusters and from there I applied it to my CLARA algorithm.
clara.res <- clara(df, 10, samples = 50,trace = 1,sampsize = 1000, pamLike = TRUE)
str(clara.res)
List of 10
$ sample : chr [1:1000] "100046" "100303" "10052" "100727" ...
$ medoids : num [1:10, 1:22] 0.925 0.125 0.701 0 0 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:10] "193751" "137853" "229261" "257462" ...
.. ..$ : chr [1:22] "COD" "DMW" "HER" "SPR" ...
$ i.med : int [1:10] 104171 42062 143627 174961 300065 13836 192832 207079 185241 228575
$ clustering: Named int [1:302251] 1 1 1 2 3 4 5 3 3 3 ...
..- attr(*, "names")= chr [1:302251] "1" "10" "100" "1000" ...
$ objective : num 0.37
$ clusinfo : num [1:10, 1:4] 71811 40181 46271 10155 31309 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:4] "size" "max_diss" "av_diss" "isolation"
$ diss : 'dissimilarity' num [1:499500] 1.392 2.192 0.937 2.157 1.643 ...
..- attr(*, "Size")= int 1000
..- attr(*, "Metric")= chr "euclidean"
..- attr(*, "Labels")= chr [1:1000] "100046" "100303" "10052" "100727" ...
$ call : language clara(x = df, k = 10, samples = 50, sampsize = 1000, trace = 1, pamLike = TRUE)
$ silinfo :List of 3
..$ widths : num [1:1000, 1:3] 1 1 1 1 1 1 1 1 1 1 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:1000] "83395" "181310" "34452" "42991" ...
.. .. ..$ : chr [1:3] "cluster" "neighbor" "sil_width"
..$ clus.avg.widths: num [1:10] 0.645 0.408 0.487 0.513 0.839 ...
..$ avg.width : num 0.612
$ data : num [1:302251, 1:22] 1 1 1 0.366 0.35 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:302251] "1" "10" "100" "1000" ...
.. ..$ : chr [1:22] "COD" "DMW" "HER" "SPR" ...
- attr(*, "class")= chr [1:2] "clara" "partition"
For the plot:
fviz_cluster(clara.res,
palette = c(
"#004c6d",
"#00a1c1",
"#ffc334",
"#78ab63",
"#00ffff",
"#00cfe3",
"#6efa75",
"#cc0089",
"#ff9509",
"#ffb6de"
), # color palette
ellipse.type = "t",geom = "point",show.clust.cent = TRUE,repel = TRUE,pointsize = 0.5,
ggtheme = theme_classic()
)+ xlim(-7, 3) + ylim (-5, 4) + labs(title = "Plot of clusters")
The result:
I reckoned that this cluster plot is based on PCA and have been trying to figure out which variables in my original data were chosen as Dim1 and Dim2 or what these x and y-axis represent. Can somebody help me how to find out these Dim1 and Dim2 and eigenvalues/variance of the whole Dim that exist without running PCA separately?
I saw there are some other functions/packages for PCA such as get_eigenvalue in factoextra and FactomineR, but it seemed that will require me to use the PCA algorithm from the beginning? How can I integrate it directly with my CLARA results?
Also, my Dim1 only consists of 12.3% and Dim2 8.8%, does it mean that these variables are not representative enough or? considering that I would have 22 dimensions in total (from my 22 variables), I think it's alright, no? I am not sure how these percentages of Dim1 and Dim2 affect my cluster results. I was thinking to do the screeplot from my CLARA results but I also can't figure it out.
I'd appreciate any insights.

R kohonen - Is the input data scaled and centred automatically?

I have been following an online example for R Kohonen self-organising maps (SOM) which suggested that the data should be centred and scaled before computing the SOM.
However, I've noticed the object created seems to have attributes for centre and scale, in which case am I really applying a redundant step by centring and scaling first? Example script below
# Load package
require(kohonen)
# Set data
data(iris)
# Scale and centre
dt <- scale(iris[, 1:4],center=TRUE)
# Prepare SOM
set.seed(590507)
som1 <- som(dt,
somgrid(6,6, "hexagonal"),
rlen=500,
keep.data=TRUE)
str(som1)
The output from the last line of the script is:
List of 13
$ data :List of 1
..$ : num [1:150, 1:4] -0.898 -1.139 -1.381 -1.501 -1.018 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length"
"Petal.Width"
.. ..- attr(*, "scaled:center")= Named num [1:4] 5.84 3.06 3.76 1.2
.. .. ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width"
"Petal.Length" "Petal.Width"
.. ..- attr(*, "scaled:scale")= Named num [1:4] 0.828 0.436 1.765 0.762
.. .. ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width"
"Petal.Length" "Petal.Width"
$ unit.classif : num [1:150] 3 5 5 5 4 2 4 4 6 5 ...
$ distances : num [1:150] 0.0426 0.0663 0.0768 0.0744 0.1346 ...
$ grid :List of 6
..$ pts : num [1:36, 1:2] 1.5 2.5 3.5 4.5 5.5 6.5 1 2 3 4 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:2] "x" "y"
..$ xdim : num 6
..$ ydim : num 6
..$ topo : chr "hexagonal"
..$ neighbourhood.fct: Factor w/ 2 levels "bubble","gaussian": 1
..$ toroidal : logi FALSE
..- attr(*, "class")= chr "somgrid"
$ codes :List of 1
..$ : num [1:36, 1:4] -0.376 -0.683 -0.734 -1.158 -1.231 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:36] "V1" "V2" "V3" "V4" ...
.. .. ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length"
"Petal.Width"
$ changes : num [1:500, 1] 0.0445 0.0413 0.0347 0.0373 0.0337 ...
$ alpha : num [1:2] 0.05 0.01
$ radius : Named num [1:2] 3.61 0
..- attr(*, "names")= chr [1:2] "66.66667%" ""
$ user.weights : num 1
$ distance.weights: num 1
$ whatmap : int 1
$ maxNA.fraction : int 0
$ dist.fcts : chr "sumofsquares"
- attr(*, "class")= chr "kohonen"
Note notice that in lines 7 and 10 of the output there are references to centre and scale. I would appreciate an explanation as to the process here.

Your step with scaling is not redundant because in source code there are no scaling, and attributes, that you see in 7 and 10 are attributes from train dataset.
To check this, just run and compare results of this chunk of code:
# Load package
require(kohonen)
# Set data
data(iris)
# Scale and centre
dt <- scale(iris[, 1:4],center=TRUE)
#compare train datasets
str(dt)
str(as.matrix(iris[, 1:4]))
# Prepare SOM
set.seed(590507)
som1 <- kohonen::som(dt,
kohonen::somgrid(6,6, "hexagonal"),
rlen=500,
keep.data=TRUE)
#without scaling
som2 <- kohonen::som(as.matrix(iris[, 1:4]),
kohonen::somgrid(6,6, "hexagonal"),
rlen=500,
keep.data=TRUE)
#compare results of som function
str(som1)
str(som2)

Produce a graphic tree diagram showing the structure of an R object

In R, str() is handy for showing the structure of an object, such as the list of lists returned by lm() and other modelling functions, but it gives way too much output. I'm looking for some tool to create a simple tree diagram showing only the names of the list elements and their structure.
e.g., for this example,
data(Prestige, package="car")
out <- lm(prestige ~ income+education+women, data=Prestige)
str(out, max.level=2)
#> List of 12
#> $ coefficients : Named num [1:4] -6.79433 0.00131 4.18664 -0.00891
#> ..- attr(*, "names")= chr [1:4] "(Intercept)" "income" "education" "women"
#> $ residuals : Named num [1:102] 4.58 -9.39 4.69 4.22 8.15 ...
#> ..- attr(*, "names")= chr [1:102] "gov.administrators" "general.managers" "accountants" "purchasing.officers" ...
#> $ effects : Named num [1:102] -472.99 -123.61 -92.61 -2.3 6.83 ...
#> ..- attr(*, "names")= chr [1:102] "(Intercept)" "income" "education" "women" ...
#> $ rank : int 4
#> $ fitted.values: Named num [1:102] 64.2 78.5 58.7 52.6 65.3 ...
#> ..- attr(*, "names")= chr [1:102] "gov.administrators" "general.managers" "accountants" "purchasing.officers" ...
#> $ assign : int [1:4] 0 1 2 3
#> $ qr :List of 5
#> ..$ qr : num [1:102, 1:4] -10.1 0.099 0.099 0.099 0.099 ...
#> .. ..- attr(*, "dimnames")=List of 2
#> .. ..- attr(*, "assign")= int [1:4] 0 1 2 3
#> ..$ qraux: num [1:4] 1.1 1.44 1.06 1.06
#> ..$ pivot: int [1:4] 1 2 3 4
#> ..$ tol : num 1e-07
#> ..$ rank : int 4
#> ..- attr(*, "class")= chr "qr"
#> $ df.residual : int 98
...
I would like to get something like this:
This is similar to what I get from tree for file folders in my file system:
C:\Dropbox\Documents\images>tree
Folder PATH listing
Volume serial number is 2250-8E6F
C:.
+---cartoons
+---chevaliers
+---icons
+---milestones
+---minard
+---minard-besancon
The result could be either in graphic characters, as in tree or an actual graphic as shown above. Is anything like this available?

A simple approach to getting this from the str output would be something like...
a <- capture.output(str(out, max.level=2))
a <- trimws(gsub("\\:.*", "", a[grepl("\\$", a)]))
cat(a, sep="\n")
$ coefficients
$ residuals
$ effects
$ rank
$ fitted.values
$ assign
$ qr
..$ qr
..$ qraux
..$ pivot
..$ tol
..$ rank
$ df.residual
$ xlevels
$ call
$ terms
$ model
..$ prestige
..$ income
..$ education
..$ women

Extract nested list elements using bracketed numbers and names

After running a repeated measures ANOVA and naming the output
RM_test <- ezANOVA(data=test_data, dv=var_test, wid = .(subject),
within = .(water_year), type = 3)
I looked at the internal structure of the named object using str(RM_test) and received the following:
List of 3
$ ANOVA :List of 3
..$ ANOVA :'data.frame': 1 obs. of 7 variables:
.. ..$ Effect: chr "water_year"
.. ..$ DFn : num 2
.. ..$ DFd : num 22
.. ..$ F : num 26.8
.. ..$ p : num 1.26e-06
.. ..$ p<.05 : chr "*"
.. ..$ ges : num 0.531
..$ Mauchly's Test for Sphericity:'data.frame': 1 obs. of 4 variables:
.. ..$ Effect: chr "water_year"
.. ..$ W : num 0.875
.. ..$ p : num 0.512
.. ..$ p<.05 : chr ""
..$ Sphericity Corrections :'data.frame': 1 obs. of 7 variables:
.. ..$ Effect : chr "water_year"
.. ..$ GGe : num 0.889
.. ..$ p[GG] : num 4.26e-06
.. ..$ p[GG]<.05: chr "*"
.. ..$ HFe : num 1.05
.. ..$ p[HF] : num 1.26e-06
.. ..$ p[HF]<.05: chr "*"
$ Mauchly's Test for Sphericity:'data.frame': 1 obs. of 4 variables:
..$ Effect: chr "wtr_yr"
..$ W : num 0.875
..$ p : num 0.512
..$ p<.05 : chr ""
$ Sphericity Corrections :'data.frame': 1 obs. of 7 variables:
..$ Effect : chr "wtr_yr"
..$ GGe : num 0.889
..$ p[GG] : num 4.26e-06
..$ p[GG]<.05: chr "*"
..$ HFe : num 1.05
..$ p[HF] : num 1.26e-06
..$ p[HF]<.05: chr "*"
I was able to extract the fourth variable F from the first data frame using RM_test[[1]][[4]][1] but cannot figure out how to extract the third variable p[GG] from the data frame Sphericity Corrections. This data frame appears twice so extracting either one would be fine.
Suggestions on how to do this using bracketed numbers and names would be appreciated.

The problem seems to be you not knowing how to extract list elements. As you said, there are two Sphericity Corrections data frames, so I will how to get the p[GG] value for both.
using bracketed number
For the first one, we do RM_test[[1]][[3]][[3]]. You can do it step by step to understand it:
x1 <- RM_test[[1]]; str(x1)
x2 <- x1[[3]]; str(x2)
x3 <- x2[[3]]; str(x3)
For the second one, do RM_test[[3]][[3]].
using bracketed name
Instead of using numbers for indexing, we can use names. For the first, do
RM_test[["ANOVA"]][["Sphericity Corrections"]][["p[GG]"]]
For the second, do
RM_test[["Sphericity Corrections"]][["p[GG]"]]
using $
For the first one, do
RM_test$ANOVA$"Sphericity Corrections"$"p[GG]"
For the second one, do
RM_test$"Sphericity Corrections"$"p[GG]"
Note the use of quote "" when necessary.

extract the correlation matrix for the factors in the psych package's fa.poly function

I'm working from caracal's great example conducting a factor analysis on dichotomous data and I'm now struggling to extract the factors from the object produced by the psych package's fa.poly function.
Can anyone help me extract the factors from the fa.poly object (and look at the correlation)?
Please see caracal's example for the working example.

In this example you create an object with:
faPCdirect <- fa.poly(XdiNum, nfactors=2, rotate="varimax") # polychoric FA
so somewhere in faPCdirect there is what you want. I recommend using str() to inspect the structure of faPCdirect
> str(faPCdirect)
List of 5
$ fa :List of 34
..$ residual : num [1:6, 1:6] 4.79e-01 7.78e-02 -2.97e-0...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:6] "X1" "X2" "X3" "X4" ...
.. .. ..$ : chr [1:6] "X1" "X2" "X3" "X4" ...
..$ dof : num 4
..$ fit
...skip stuff....
..$ BIC : num 4.11
..$ r.scores : num [1:2, 1:2] 1 0.0508 0.0508 1
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:2] "MR2" "MR1"
.. .. ..$ : chr [1:2] "MR2" "MR1"
..$ R2 : Named num [1:2] 0.709 0.989
.. ..- attr(*, "names")= chr [1:2] "MR2" "MR1"
..$ valid : num [1:2] 0.819 0.987
..$ score.cor : num [1:2, 1:2] 1 0.212 0.212 1
So this says that this object is a list of five, with the first element called fa and that contains an element called score.cor that is a 2x2 matrix. I think what you want is the off diagonal.
> faPCdirect$fa$score.cor
[,1] [,2]
[1,] 1.0000000 0.2117457
[2,] 0.2117457 1.0000000

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Extract statistics from Anderson-Darling test (list) - r

You could try: unlist(lapply(AD_result, function(x) x$ad[,3]))

Related

Interpreting the PCA axis Dim1 and Dim2 from CLARA plot results directly

R kohonen - Is the input data scaled and centred automatically?

Produce a graphic tree diagram showing the structure of an R object

Extract nested list elements using bracketed numbers and names

extract the correlation matrix for the factors in the psych package's fa.poly function

Categories

Resources