Extract nested list elements using bracketed numbers and names - r

After running a repeated measures ANOVA and naming the output
RM_test <- ezANOVA(data=test_data, dv=var_test, wid = .(subject),
within = .(water_year), type = 3)
I looked at the internal structure of the named object using str(RM_test) and received the following:
List of 3
$ ANOVA :List of 3
..$ ANOVA :'data.frame': 1 obs. of 7 variables:
.. ..$ Effect: chr "water_year"
.. ..$ DFn : num 2
.. ..$ DFd : num 22
.. ..$ F : num 26.8
.. ..$ p : num 1.26e-06
.. ..$ p<.05 : chr "*"
.. ..$ ges : num 0.531
..$ Mauchly's Test for Sphericity:'data.frame': 1 obs. of 4 variables:
.. ..$ Effect: chr "water_year"
.. ..$ W : num 0.875
.. ..$ p : num 0.512
.. ..$ p<.05 : chr ""
..$ Sphericity Corrections :'data.frame': 1 obs. of 7 variables:
.. ..$ Effect : chr "water_year"
.. ..$ GGe : num 0.889
.. ..$ p[GG] : num 4.26e-06
.. ..$ p[GG]<.05: chr "*"
.. ..$ HFe : num 1.05
.. ..$ p[HF] : num 1.26e-06
.. ..$ p[HF]<.05: chr "*"
$ Mauchly's Test for Sphericity:'data.frame': 1 obs. of 4 variables:
..$ Effect: chr "wtr_yr"
..$ W : num 0.875
..$ p : num 0.512
..$ p<.05 : chr ""
$ Sphericity Corrections :'data.frame': 1 obs. of 7 variables:
..$ Effect : chr "wtr_yr"
..$ GGe : num 0.889
..$ p[GG] : num 4.26e-06
..$ p[GG]<.05: chr "*"
..$ HFe : num 1.05
..$ p[HF] : num 1.26e-06
..$ p[HF]<.05: chr "*"
I was able to extract the fourth variable F from the first data frame using RM_test[[1]][[4]][1] but cannot figure out how to extract the third variable p[GG] from the data frame Sphericity Corrections. This data frame appears twice so extracting either one would be fine.
Suggestions on how to do this using bracketed numbers and names would be appreciated.

The problem seems to be how to extract nested list elements. As you said, there are two Sphericity Corrections data frames, so I will show how to get the p[GG] value from both.
using bracketed number
For the first one, we do RM_test[[1]][[3]][[3]]. You can do it step by step to understand it:
x1 <- RM_test[[1]]; str(x1)
x2 <- x1[[3]]; str(x2)
x3 <- x2[[3]]; str(x3)
For the second one, do RM_test[[3]][[3]].
using bracketed name
Instead of using numbers for indexing, we can use names. For the first, do
RM_test[["ANOVA"]][["Sphericity Corrections"]][["p[GG]"]]
For the second, do
RM_test[["Sphericity Corrections"]][["p[GG]"]]
using $
For the first one, do
RM_test$ANOVA$"Sphericity Corrections"$"p[GG]"
For the second one, do
RM_test$"Sphericity Corrections"$"p[GG]"
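Note the quotes around names that contain spaces or special characters; backticks (e.g. RM_test$ANOVA$`Sphericity Corrections`$`p[GG]`) work as well.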
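The three styles can be checked on a small hand-built copy of RM_test. This is a minimal sketch, assuming the structure shown by str() above; the data frames are rebuilt with check.names = FALSE so that names such as p[GG] survive, and the names anova_tab, mauchly and sph are illustrative only. The last two lines show why [[ ]] is needed: [ ] returns a one-column data frame, while [[ ]] returns the number itself.

```r
# Hand-built stand-in for RM_test, following the str() output above
mauchly <- data.frame(Effect = "water_year", W = 0.875, p = 0.512,
                      `p<.05` = "", check.names = FALSE)
sph <- data.frame(Effect = "water_year", GGe = 0.889, `p[GG]` = 4.26e-06,
                  `p[GG]<.05` = "*", HFe = 1.05, `p[HF]` = 1.26e-06,
                  `p[HF]<.05` = "*", check.names = FALSE)
anova_tab <- data.frame(Effect = "water_year", DFn = 2, DFd = 22, F = 26.8,
                        p = 1.26e-06, `p<.05` = "*", ges = 0.531,
                        check.names = FALSE)

RM_test <- list(
  ANOVA = list(ANOVA = anova_tab,
               `Mauchly's Test for Sphericity` = mauchly,
               `Sphericity Corrections` = sph),
  `Mauchly's Test for Sphericity` = mauchly,
  `Sphericity Corrections` = sph
)

# Three equivalent ways to reach p[GG] in the nested copy
by_number <- RM_test[[1]][[3]][[3]]
by_name   <- RM_test[["ANOVA"]][["Sphericity Corrections"]][["p[GG]"]]
by_dollar <- RM_test$ANOVA$`Sphericity Corrections`$`p[GG]`

# The top-level copy
top_level <- RM_test[["Sphericity Corrections"]][["p[GG]"]]

# [ ] keeps the container (a one-column data frame); [[ ]] extracts the contents
str(RM_test[[3]][3])
str(RM_test[[3]][[3]])
```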

Related

Extract statistics from Anderson-Darling test (list)

I would like to extract the p-values from the Anderson-Darling test (ad.test from package kSamples). Each test result is a list of 12 elements; element 7 is a 2x3 matrix that contains the p-value.
When using the following code:
lapply(AD_result, "[[", 7)
I get the following subset of AD test results (first 2 of a total of 50 shown)
[[1]]
AD T.AD asympt. P-value
version 1: 1.72 0.94536 0.13169
version 2: 1.51 0.66740 0.17461
[[2]]
AD T.AD asympt. P-value
version 1: 12.299 14.624 6.9248e-07
version 2: 11.900 14.144 1.1146e-06
My question is how to extract only the p-value (e.g. from version 1) and put these 50 results into a vector.
The output from str(AD_result) is:
List of 55
$ :List of 12
..$ test.name : chr "Anderson-Darling"
..$ k : int 2
..$ ns : int [1:2] 103 2905
..$ N : int 3008
..$ n.ties : int 2873
..$ sig : num 0.762
..$ ad : num [1:2, 1:3] 1.72 1.51 0.945 0.667 0.132 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:2] "version 1:" "version 2:"
.. .. ..$ : chr [1:3] "AD" "T.AD" " asympt. P-value"
..$ warning : logi FALSE
..$ null.dist1: NULL
..$ null.dist2: NULL
..$ method : chr "asymptotic"
..$ Nsim : num 1
..- attr(*, "class")= chr "kSamples"
You could try:
unlist(lapply(AD_result, function(x) x$ad[,3]))
This returns the p-values of both versions, i.e. two values per test.
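To keep only the version 1 p-value, as the question asks, index the row as well and let vapply() enforce exactly one number per test. This is a minimal sketch: AD_result here is a hand-built stand-in holding only the `ad` matrices from the two results printed above (real kSamples objects carry 12 elements), and make_ad() is a hypothetical helper. Indexing column 3 by position also avoids the column name, which starts with a space (" asympt. P-value").

```r
# Hypothetical helper: mimic the `ad` element of a kSamples result
make_ad <- function(ad, tad, p) {
  m <- matrix(c(ad, tad, p), nrow = 2,
              dimnames = list(c("version 1:", "version 2:"),
                              c("AD", "T.AD", " asympt. P-value")))
  list(test.name = "Anderson-Darling", ad = m)
}

AD_result <- list(
  make_ad(c(1.72, 1.51), c(0.94536, 0.66740), c(0.13169, 0.17461)),
  make_ad(c(12.299, 11.900), c(14.624, 14.144), c(6.9248e-07, 1.1146e-06))
)

# Row 1 = "version 1:", column 3 = asymptotic p-value; one number per test
p_v1 <- vapply(AD_result, function(x) x$ad[1, 3], numeric(1))

# The one-liner from the answer, for comparison: both versions per test
p_both <- unlist(lapply(AD_result, function(x) x$ad[, 3]))
```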

R: Extract values from a summary() in a clValid object

I'm working on validating the goodness of hierarchical clustering using clValid. Below is my code. The clustering always results in one noisy cluster that contains 70% of the elements, so I recursively cluster the elements in that noisy cluster.
intern <- clValid(primaryDataSource, 2:10,clMethods = c("Hierarchical"),
validation="internal", maxitems = 2200)
summary(intern)
Output of summary(intern):
Clustering Methods:
hierarchical
Cluster sizes:
2 3 4 5 6 7 8 9 10
Validation Measures:
2 3 4 5 6 7 8 9 10
hierarchical Connectivity 3.8738 3.8738 8.2563 10.9452 16.0286 18.6452 20.6452 22.6452 24.6452
Dunn 4.0949 0.8810 0.6569 0.8694 0.8808 1.0416 1.0230 1.0262 1.3724
Silhouette 0.9592 0.9879 0.9785 0.9751 0.9727 0.9729 0.9727 0.9726 0.9725
Optimal Scores:
Score Method Clusters
Connectivity 3.8738 hierarchical 2
Dunn 4.0949 hierarchical 2
Silhouette 0.9879 hierarchical 3
At each iteration I have to run clValid() and select the number of clusters that gives the highest Silhouette value (3 in the example above). I'm trying to automate this recursive clustering, so I need to pick the number of clusters with the highest Silhouette value programmatically. Can you please help me extract that piece of information? Thank you.
P.S.: I tried converting the results into a data frame or a table, but it didn't work.
Update: After using str()
> str(intern)
Formal class 'clValid' [package "clValid"] with 14 slots
..@ clusterObjs:List of 1
.. ..$ hierarchical:List of 7
.. .. ..$ merge : int [1:2173, 1:2] -1673 -714 -1121 -1688 -1876 -1123 -1689 -1228 -429 -535 ...
.. .. ..$ height : num [1:2173] 0 0.001 0.001 0.001 0.001 ...
.. .. ..$ order : int [1:2174] 2165 2166 1950 1951 1954 1955 1577 1565 1564 1576 ...
.. .. ..$ labels : chr [1:2174] "out_M_aacald_c_boundary" "out_M_12ppd_DASH_R_e_boundary" "out_M_12ppd_DASH_S_e_boundary" "in_M_14glucan_e_boundary" ...
.. .. ..$ method : chr "average"
.. .. ..$ call : language hclust(d = Dist, method = method)
.. .. ..$ dist.method: chr "euclidean"
.. .. ..- attr(*, "class")= chr "hclust"
..@ measures : num [1:3, 1:9, 1] 3.874 4.095 0.959 3.874 0.881 ...
.. ..- attr(*, "dimnames")=List of 3
.. .. ..$ : chr [1:3] "Connectivity" "Dunn" "Silhouette"
.. .. ..$ : chr [1:9] "2" "3" "4" "5" ...
.. .. ..$ : chr "hierarchical"
..@ measNames : chr [1:3] "Connectivity" "Dunn" "Silhouette"
..@ clMethods : chr "hierarchical"
..@ labels : chr [1:2174] "out_M_aacald_c_boundary" "out_M_12ppd_DASH_R_e_boundary" "out_M_12ppd_DASH_S_e_boundary" "in_M_14glucan_e_boundary" ...
..@ nClust : num [1:9] 2 3 4 5 6 7 8 9 10
..@ validation : chr "internal"
..@ metric : chr "euclidean"
..@ method : chr "average"
..@ neighbSize : num 10
..@ annotation : NULL
..@ GOcategory : chr "all"
..@ goTermFreq : num 0.05
..@ call : language clValid(obj = primaryDataSource, nClust = 2:10, clMethods = c("Hierarchical"), validation = "internal", maxitems = 2200)
I guess the important section is
..@ measures : num [1:3, 1:9, 1] 3.874 4.095 0.959 3.874 0.881 ...
.. ..- attr(*, "dimnames")=List of 3
.. .. ..$ : chr [1:3] "Connectivity" "Dunn" "Silhouette"
.. .. ..$ : chr [1:9] "2" "3" "4" "5" ...
.. .. ..$ : chr "hierarchical"
When I executed intern@measures, I got the result below:
2 3 4 5 6 7 8 9
Connectivity 3.8738095 3.8738095 8.2563492 10.9452381 16.0285714 18.6452381 20.6452381 22.645238
Dunn 4.0948837 0.8810494 0.6568857 0.8694067 0.8808228 1.0415614 1.0230197 1.026192
Silhouette 0.9591803 0.9879153 0.9784684 0.9751393 0.9727454 0.9728736 0.9727153 0.972622
10
Connectivity 24.6452381
Dunn 1.3724494
Silhouette 0.9725379
I'm able to get the max and access individual items based on the index. I want to get the maximum value for Silhouette.
intern@measures[1]
max(intern@measures)
Some additional explanation: when str() shows @ signs, the object you are inspecting is an S4 object, and the @ entries are its slots. I am not familiar with clValid, but a quick look at the source code shows that clValid is defined as an S4 class.
You can access the slots using object@slot. A slot can hold any type of object declared in the class definition.
Looking at the print function for clValid, it seems that you can also access the measures with the convenience function measures(object). The rest of the clValid source contains other utility functions that may be useful to you; check optimalScores().
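Once the measures array is in hand, the cluster count with the highest Silhouette value can be looked up by name rather than by position. This is a minimal sketch, assuming the slot has the 3 x 9 x 1 layout shown by str(); the array below is a hand-built stand-in filled with the rounded values from summary(intern), and the names vals, meas, sil and best_k are illustrative.

```r
# Hand-built stand-in for intern@measures (rounded values from summary(intern))
vals <- rbind(
  Connectivity = c(3.8738, 3.8738, 8.2563, 10.9452, 16.0286, 18.6452, 20.6452, 22.6452, 24.6452),
  Dunn         = c(4.0949, 0.8810, 0.6569, 0.8694, 0.8808, 1.0416, 1.0230, 1.0262, 1.3724),
  Silhouette   = c(0.9592, 0.9879, 0.9785, 0.9751, 0.9727, 0.9729, 0.9727, 0.9726, 0.9725)
)
meas <- array(vals, dim = c(3, 9, 1),
              dimnames = list(rownames(vals), as.character(2:10), "hierarchical"))

# With the real object: sil <- intern@measures["Silhouette", , "hierarchical"]
sil <- meas["Silhouette", , "hierarchical"]

# which.max() gives the position of the largest value; its name is the cluster count
best_k <- as.integer(names(which.max(sil)))
best_k  # 3
```

Inside the recursive loop, best_k could then be passed to cutree() on the hclust object stored in intern@clusterObjs$hierarchical.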

Extract knot values out of gam with spline [duplicate]

This question already has an answer here:
mgcv: How to set number and / or locations of knots for splines
(1 answer)
Closed 5 years ago.
I am running a GAM across many samples and am extracting coefficients/t-values/r-squared from the results in the way shown below. For background, I am using a natural spline, so the regular lm() works fine here and perhaps that is why this method works fine.
tvalsm93exf=ldply(fitsm93exf, function(x) as.data.frame(t(coef(summary(x))[,'t value', drop=FALSE])))
r2m93exf=ldply(fitsm93exf, function(x) as.data.frame(t(summary(x))[,'r.squared', drop=FALSE]))
I would also like to extract the knot locations for each sample set (df = 4 and no intercept, so three interior knots plus the two boundary knots). I have tried several variations of the commands above, but haven't been able to index into this. The usual way to do this for a single fit is below, so I was attempting to put it into the form above. But I am not certain whether the summary contains these values, or whether there is another result I should be using instead.
attr(terms(fits),"predvars")
http://www.inside-r.org/r-doc/splines/ns
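The predvars route can be applied to every fit in a list. This is a minimal sketch, assuming each model was fitted as lm(y ~ ns(x, df = 4)) (the real formula is not shown) on hypothetical simulated samples; get_knots() is an illustrative helper. The third element of predvars is the ns() call, into which R has written the knots that were actually used, so no refitting or summary() is needed.

```r
library(splines)  # ns() ships with R's splines package

set.seed(1)
# Hypothetical samples standing in for the real data sets
samples <- lapply(1:3, function(i) {
  x <- runif(100, 0, 10)
  data.frame(x = x, y = sin(x) + rnorm(100, sd = 0.2))
})
fits <- lapply(samples, function(d) lm(y ~ ns(x, df = 4), data = d))

# predvars looks like list(y, ns(x, knots = ..., Boundary.knots = ..., intercept = FALSE));
# element [[3]] is the ns() call when it is the first term on the right-hand side
get_knots <- function(fit) {
  ns_call <- attr(terms(fit), "predvars")[[3]]
  c(unname(ns_call$knots), unname(ns_call$Boundary.knots))
}

# One row per fit: three interior knots, then the two boundary knots
knot_table <- t(vapply(fits, get_knots, numeric(5)))
colnames(knot_table) <- c("knot1", "knot2", "knot3", "lower", "upper")
knot_table
```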
Note: This question is related to the question below, if that helps, though its solution did not help me solve my problem:
Extract estimates of GAM
The knots are fixed at the time the ns function is called in the examples on the help page you linked to, so you could have extracted the knots without going into the model object. But ... you have not provided the code for the GAM model creation, so we can only speculate about what you might have done. Just because the word "spline" is used in both the ?ns help page and the mgcv documentation does not mean they refer to the same thing. The model in the other page you linked to had two "smooth" terms constructed with the s function.
.... + s(time,bs="cr",k=200) + s(tmpd,bs="cr")
The result of that gam call had a list node named "smooth" and the first one looked like this when viewed with str():
str(ap1$smooth)
List of 2
$ :List of 22
..$ term : chr "time"
..$ bs.dim : num 200
..$ fixed : logi FALSE
..$ dim : int 1
..$ p.order : logi NA
..$ by : chr "NA"
..$ label : chr "s(time)"
..$ xt : NULL
..$ id : NULL
..$ sp : Named num -1
.. ..- attr(*, "names")= chr "s(time)"
..$ S :List of 1
.. ..$ : num [1:199, 1:199] 5.6 -5.475 2.609 -0.577 0.275 ...
..$ rank : num 198
..$ null.space.dim: num 1
..$ df : num 199
..$ xp : Named num [1:200] -2556 -2527 -2502 -2476 -2451 ...
.. ..- attr(*, "names")= chr [1:200] "0.0000000%" "0.5025126%" "1.0050251%" "1.5075377%" ...
..$ F : num [1:40000] 0 0 0 0 0 0 0 0 0 0 ...
..$ plot.me : logi TRUE
..$ side.constrain: logi TRUE
..$ S.scale : num 9.56e-05
..$ vn : chr "time"
..$ first.para : num 5
..$ last.para : num 203
..- attr(*, "class")= chr [1:2] "cr.smooth" "mgcv.smooth"
..- attr(*, "qrc")=List of 4
.. ..$ qr : num [1:200, 1] -0.0709 0.0817 0.0709 0.0688 0.0724 ...
.. ..$ rank : int 1
.. ..$ qraux: num 1.03
.. ..$ pivot: int 1
.. ..- attr(*, "class")= chr "qr"
..- attr(*, "nCons")= int 1
So the first smooth is a piecewise cubic (cubic regression spline) with 200 knots, and the xp element holds their locations. If you forced the knots to be at three interior locations, they would be placed at the extremes of the covariate, with the remaining knots spread evenly through the covariate values between the extremes.
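If the model was fitted with mgcv smooths, the knots of a cubic regression spline ("cr") smooth can be read from the xp element of the corresponding entry in the smooth list. This is a minimal sketch on hypothetical simulated data; k = 5 (three interior knots plus the two ends) is an arbitrary choice made to match the question's setup.

```r
library(mgcv)  # recommended package shipped with R

set.seed(2)
# Hypothetical data; the real model code is not shown
d <- data.frame(x = runif(200, 0, 10))
d$y <- sin(d$x) + rnorm(200, sd = 0.3)

fit <- gam(y ~ s(x, bs = "cr", k = 5), data = d)

# For a "cr" smooth, xp holds the knot locations (one per basis dimension)
knots_cr <- fit$smooth[[1]]$xp
knots_cr
```

For a list of gam fits, the same lookup can be wrapped in lapply(fits, function(f) f$smooth[[1]]$xp).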

R: calculate the area within non-closed contour lines

I want to calculate the area within the 0.975 contour lines, some of which aren't closed. This is the plot:
contour(zpropMAteo, levels = c(0.975),lty = 1,drawlabels = F, col=2)
plot(contorno, add=T )
where contorno is a window (a polygonal boundary) giving the continuous border of the plot:
str(contorno)
#List of 5
# $ type : chr "polygonal"
# $ xrange: num [1:2] 704787 727062
# $ yrange: num [1:2] 4239419 4261570
# $ bdry :List of 1
# ..$ :List of 4
# .. ..$ x : num [1:9188] 704787 705760 705892 706135 706311 ...
# .. ..$ y : num [1:9188] 4251037 4248333 4247517 4247191 4246915 ...
# .. ..$ area: num 1.76e+08
# .. ..$ hole: logi FALSE
# $ units :List of 3
# ..$ singular : chr "unit"
# ..$ plural : chr "units"
# ..$ multiplier: num 1
# ..- attr(*, "class")= chr "units"
# - attr(*, "class")= chr "owin"
and zpropMAteo is a pixel image:
str(zpropMAteo)
#List of 10
# $ v : num [1:99, 1:100] NA NA NA NA NA NA NA NA NA NA ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : chr [1:99] "4239530.43796768" "4239753.18885493" "4239975.93974217" "4240198.69062942" ...
# .. ..$ : chr [1:100] "704898.406701755" "705121.157589002" "705343.908476249" "705566.659363497" ...
# $ dim : int [1:2] 99 100
# $ xrange: num [1:2] 704787 727062
# $ yrange: num [1:2] 4239419 4261570
# $ xstep : num 223
# $ ystep : num 224
# $ xcol : num [1:100] 704898 705121 705344 705567 705789 ...
# $ yrow : num [1:99] 4239531 4239755 4239978 4240202 4240426 ...
# $ type : chr "real"
# $ units :List of 3
# ..$ singular : chr "unit"
# ..$ plural : chr "units"
# ..$ multiplier: num 1
# ..- attr(*, "class")= chr "units"
# - attr(*, "class")= chr "im"
The problem, as you can see in the plot (right), is that some contour lines are open. One solution might be to calculate the intersections of the contour lines with the border and then the area inside, or maybe first to calculate a contour line that coincides exactly with the border (something like contourline(100%), I don't know) and then obtain the intersection and the area inside.
I tried to do
clinessMAteo<-contourLines(zpropMAteo$xcol,zpropMAteo$yrow,zpropMAteo$v,levels = c(0.975))
with this result:
str(clinessMAteo)
#List of 7
# $ :List of 3
# ..$ level: num 0.975
# ..$ x : num [1:5] 710690 710584 710690 710716 710690
# ..$ y : num [1:5] 4246190 4246243 4246296 4246243 4246190
# $ :List of 3
# ..$ level: num 0.975
# ..$ x : num [1:19] 714031 713978 713978 714031 714066 ...
# ..$ y : num [1:19] 4245519 4245572 4245796 4245814 4245796 ...
# $ :List of 3
# ..$ level: num 0.975
# ..$ x : num [1:6] 715258 715258 715145 715136 715145 ...
# ..$ y : num [1:6] 4260226 4260339 4260510 4260563 4260581 ...
# $ :List of 3
# ..$ level: num 0.975
# ..$ x : num [1:38] 718932 719154 719377 719574 719600 ...
# ..$ y : num [1:38] 4256978 4256942 4256891 4256759 4256742 ...
# $ :List of 3
# ..$ level: num 0.975
# ..$ x : num [1:45] 719377 719600 719823 720045 720268 ...
# ..$ y : num [1:45] 4255696 4255691 4255710 4255710 4255687 ...
# $ :List of 3
# ..$ level: num 0.975
# ..$ x : num [1:42] 724959 724946 724723 724562 724500 ...
# ..$ y : num [1:42] 4253166 4253162 4253138 4252956 4252900 ...
# $ :List of 3
# ..$ level: num 0.975
# ..$ x : num [1:15] 722273 722238 722273 722496 722718 ...
# ..$ y : num [1:15] 4251802 4251837 4251858 4251920 4251848 ...
and the area:
areaMASteo <- sum(sapply(clinessMAteo, function(ring) areapl(cbind(ring$x, ring$y))))
but I know it isn't correct, because I think I should first obtain closed contour lines.
Any idea?? :-)
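One way around open contour lines is to skip the lines altogether and measure the region directly on the pixel grid: count the pixels whose value is at least 0.975 and multiply by the area of one pixel. Pixels outside the window are NA in zpropMAteo$v, so the region is clipped to the border automatically. This is a minimal sketch of that swapped-in pixel-counting technique, not a contour-based method: it assumes the region of interest is where the value is at or above the level, its accuracy is limited to the pixel size (xstep * ystep), and pixel_area_above() and the test image are hypothetical. With spatstat loaded, levelset() followed by area() should give a comparable window-based answer.

```r
# Area where a pixel image is at or above `level`, approximated by counting pixels.
# NA pixels (outside the window) never count, because FALSE & NA is FALSE.
pixel_area_above <- function(v, xstep, ystep, level) {
  inside <- !is.na(v) & v >= level
  sum(inside) * xstep * ystep
}

# Hypothetical 99 x 100 test image with roughly the same pixel size as zpropMAteo
v <- matrix(0.5, nrow = 99, ncol = 100)
v[10:14, 20:23] <- 0.99   # 5 x 4 = 20 pixels above the level
v[1:3, ] <- NA            # pixels outside the window

area_test <- pixel_area_above(v, xstep = 223, ystep = 224, level = 0.975)
area_test  # 20 * 223 * 224 = 999040

# With the real image:
# pixel_area_above(zpropMAteo$v, zpropMAteo$xstep, zpropMAteo$ystep, 0.975)
```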
