I want to calculate the area enclosed by the 0.975 contour lines, some of which are not closed. This is the plot:
contour(zpropMAteo, levels = c(0.975), lty = 1, drawlabels = FALSE, col = 2)
plot(contorno, add = TRUE)
where contorno is a window (a polygonal boundary) that follows the continuous border of the plot:
str(contorno)
#List of 5
# $ type : chr "polygonal"
# $ xrange: num [1:2] 704787 727062
# $ yrange: num [1:2] 4239419 4261570
# $ bdry :List of 1
# ..$ :List of 4
# .. ..$ x : num [1:9188] 704787 705760 705892 706135 706311 ...
# .. ..$ y : num [1:9188] 4251037 4248333 4247517 4247191 4246915 ...
# .. ..$ area: num 1.76e+08
# .. ..$ hole: logi FALSE
# $ units :List of 3
# ..$ singular : chr "unit"
# ..$ plural : chr "units"
# ..$ multiplier: num 1
# ..- attr(*, "class")= chr "units"
# - attr(*, "class")= chr "owin"
and zpropMAteo is a pixel image:
str(zpropMAteo)
#List of 10
# $ v : num [1:99, 1:100] NA NA NA NA NA NA NA NA NA NA ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : chr [1:99] "4239530.43796768" "4239753.18885493" "4239975.93974217" "4240198.69062942" ...
# .. ..$ : chr [1:100] "704898.406701755" "705121.157589002" "705343.908476249" "705566.659363497" ...
# $ dim : int [1:2] 99 100
# $ xrange: num [1:2] 704787 727062
# $ yrange: num [1:2] 4239419 4261570
# $ xstep : num 223
# $ ystep : num 224
# $ xcol : num [1:100] 704898 705121 705344 705567 705789 ...
# $ yrow : num [1:99] 4239531 4239755 4239978 4240202 4240426 ...
# $ type : chr "real"
# $ units :List of 3
# ..$ singular : chr "unit"
# ..$ plural : chr "units"
# ..$ multiplier: num 1
# ..- attr(*, "class")= chr "units"
# - attr(*, "class")= chr "im"
The problem, as you can see in the plot (right), is that some contour lines are open. One solution might be to compute the intersection of the contour lines with the border and then the area inside. Another might be to first compute a contour line that coincides exactly with the border (a 100% contour line or something like that), and then obtain the intersection and the area inside.
I tried
clinessMAteo<-contourLines(zpropMAteo$xcol,zpropMAteo$yrow,zpropMAteo$v,levels = c(0.975))
with this result:
str(clinessMAteo)
#List of 7
# $ :List of 3
# ..$ level: num 0.975
# ..$ x : num [1:5] 710690 710584 710690 710716 710690
# ..$ y : num [1:5] 4246190 4246243 4246296 4246243 4246190
# $ :List of 3
# ..$ level: num 0.975
# ..$ x : num [1:19] 714031 713978 713978 714031 714066 ...
# ..$ y : num [1:19] 4245519 4245572 4245796 4245814 4245796 ...
# $ :List of 3
# ..$ level: num 0.975
# ..$ x : num [1:6] 715258 715258 715145 715136 715145 ...
# ..$ y : num [1:6] 4260226 4260339 4260510 4260563 4260581 ...
# $ :List of 3
# ..$ level: num 0.975
# ..$ x : num [1:38] 718932 719154 719377 719574 719600 ...
# ..$ y : num [1:38] 4256978 4256942 4256891 4256759 4256742 ...
# $ :List of 3
# ..$ level: num 0.975
# ..$ x : num [1:45] 719377 719600 719823 720045 720268 ...
# ..$ y : num [1:45] 4255696 4255691 4255710 4255710 4255687 ...
# $ :List of 3
# ..$ level: num 0.975
# ..$ x : num [1:42] 724959 724946 724723 724562 724500 ...
# ..$ y : num [1:42] 4253166 4253162 4253138 4252956 4252900 ...
# $ :List of 3
# ..$ level: num 0.975
# ..$ x : num [1:15] 722273 722238 722273 722496 722718 ...
# ..$ y : num [1:15] 4251802 4251837 4251858 4251920 4251848 ...
and the area:
areaMASteo <- sum(sapply(clinessMAteo, function(ring) areapl(cbind(ring$x, ring$y))))
but I know this isn't correct, because I think I need closed contour lines first.
Any ideas? :-)
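One way to sidestep the open contours is to measure the region directly from the pixel image: count the pixels whose value reaches the level and multiply by the pixel area. The following is a minimal sketch in base R on a synthetic matrix, not the original data. Here v, xstep and ystep are stand-ins for zpropMAteo$v, zpropMAteo$xstep and zpropMAteo$ystep, and it assumes that "inside" means values >= 0.975 and that NA pixels lie outside the window.

```r
# Synthetic stand-in for the pixel image: a smooth bump on a 99 x 100 grid
xstep <- 223   # pixel width  (same role as zpropMAteo$xstep)
ystep <- 224   # pixel height (same role as zpropMAteo$ystep)
rows <- seq(-1, 1, length.out = 99)
cols <- seq(-1, 1, length.out = 100)
v <- outer(rows, cols, function(y, x) exp(-(x^2 + y^2)))
v[1:5, ] <- NA          # pixels outside the window are NA, as in the real image

level <- 0.975

# TRUE for pixels at or above the level; NA pixels count as outside
inside <- !is.na(v) & v >= level

# Area estimate = number of qualifying pixels * area of one pixel
n_pixels <- sum(inside)
area_est <- n_pixels * xstep * ystep
area_est
```

The estimate is only as fine as the pixel grid, so it is an approximation of the area bounded by the contour rather than an exact polygon area. In spatstat, levelset() combined with area() appears to perform the same computation on an im object directly, though that has not been checked against the data here.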
I have a large dataset with more than 300,000 rows/observations and 22 variables. I used the CLARA method for clustering and plotted the results using fviz_cluster. The silhouette method gave 10 as the number of clusters, which I then passed to the CLARA algorithm.
clara.res <- clara(df, 10, samples = 50,trace = 1,sampsize = 1000, pamLike = TRUE)
str(clara.res)
List of 10
$ sample : chr [1:1000] "100046" "100303" "10052" "100727" ...
$ medoids : num [1:10, 1:22] 0.925 0.125 0.701 0 0 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:10] "193751" "137853" "229261" "257462" ...
.. ..$ : chr [1:22] "COD" "DMW" "HER" "SPR" ...
$ i.med : int [1:10] 104171 42062 143627 174961 300065 13836 192832 207079 185241 228575
$ clustering: Named int [1:302251] 1 1 1 2 3 4 5 3 3 3 ...
..- attr(*, "names")= chr [1:302251] "1" "10" "100" "1000" ...
$ objective : num 0.37
$ clusinfo : num [1:10, 1:4] 71811 40181 46271 10155 31309 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:4] "size" "max_diss" "av_diss" "isolation"
$ diss : 'dissimilarity' num [1:499500] 1.392 2.192 0.937 2.157 1.643 ...
..- attr(*, "Size")= int 1000
..- attr(*, "Metric")= chr "euclidean"
..- attr(*, "Labels")= chr [1:1000] "100046" "100303" "10052" "100727" ...
$ call : language clara(x = df, k = 10, samples = 50, sampsize = 1000, trace = 1, pamLike = TRUE)
$ silinfo :List of 3
..$ widths : num [1:1000, 1:3] 1 1 1 1 1 1 1 1 1 1 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:1000] "83395" "181310" "34452" "42991" ...
.. .. ..$ : chr [1:3] "cluster" "neighbor" "sil_width"
..$ clus.avg.widths: num [1:10] 0.645 0.408 0.487 0.513 0.839 ...
..$ avg.width : num 0.612
$ data : num [1:302251, 1:22] 1 1 1 0.366 0.35 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:302251] "1" "10" "100" "1000" ...
.. ..$ : chr [1:22] "COD" "DMW" "HER" "SPR" ...
- attr(*, "class")= chr [1:2] "clara" "partition"
For the plot:
fviz_cluster(clara.res,
palette = c(
"#004c6d",
"#00a1c1",
"#ffc334",
"#78ab63",
"#00ffff",
"#00cfe3",
"#6efa75",
"#cc0089",
"#ff9509",
"#ffb6de"
), # color palette
ellipse.type = "t",geom = "point",show.clust.cent = TRUE,repel = TRUE,pointsize = 0.5,
ggtheme = theme_classic()
)+ xlim(-7, 3) + ylim (-5, 4) + labs(title = "Plot of clusters")
The result:
I gather that this cluster plot is based on PCA, and I have been trying to work out which variables in my original data contribute to Dim1 and Dim2, or what the x- and y-axes represent. Can somebody help me find Dim1 and Dim2, and the eigenvalues/variance of all the dimensions, without running PCA separately?
I saw that there are other functions/packages for PCA, such as get_eigenvalue in factoextra and FactoMineR, but it seems they would require me to run a PCA from the beginning. How can I integrate this directly with my CLARA results?
Also, Dim1 explains only 12.3% of the variance and Dim2 only 8.8%. Does that mean these dimensions are not representative enough? Considering that I have 22 dimensions in total (from my 22 variables), I think that's all right, no? I am not sure how the percentages for Dim1 and Dim2 affect my cluster results. I was thinking of making a scree plot from my CLARA results, but I can't figure that out either.
I'd appreciate any insights.
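For what it is worth, the fviz_cluster documentation says that observations are drawn using principal components when the data have more than two columns, so Dim1 and Dim2 are linear combinations of all the variables rather than two chosen ones. The sketch below shows how the loadings and the variance percentages could be recovered with base R's prcomp(). It runs on a synthetic matrix X that stands in for clara.res$data, and it assumes the plot was built from standardised data (the stand = TRUE default), which is worth checking against your own call.

```r
# Synthetic stand-in for clara.res$data (the real object has 22 columns)
set.seed(1)
X <- matrix(rnorm(200 * 5), ncol = 5,
            dimnames = list(NULL, paste0("V", 1:5)))

# PCA on the standardised data; scale() already centres and scales
pca <- prcomp(scale(X), center = FALSE, scale. = FALSE)

# Eigenvalues and percentage of variance explained by each dimension
eig <- pca$sdev^2
var_pct <- 100 * eig / sum(eig)

# Loadings: contribution of each original variable to Dim1 and Dim2
loadings <- pca$rotation[, 1:2]

round(var_pct, 1)
loadings
```

The vector var_pct is what a scree plot would display, for example with barplot(var_pct). Low percentages for Dim1 and Dim2 mean the 2-D picture shows only a small share of the total variance. The clustering itself was computed on all the variables, so the percentages describe the plot, not the quality of the clusters.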
I am new to machine learning and not very familiar with lists. I need the layers of a raster stack (Ras) to be in the same order as the training data used in the model (xgb_train_1). The names are stored in the feature_names element of a component called finalModel. I have tried to reference this nested list element in several ways:
so: xgb_train_1 -> finalModel -> feature_names
Ras1 <- Ras[xgb_train_1$finalModel['feature_names']]
Ras1 <- Ras[xgb_train_1$finalModel[feature_names]]
Ras1 <- Ras[xgb_train_1[finalModel[feature_names]]]
None of these work, and I get NULL when I call Ras1.
str(xgb_train_1$finalModel)
List of 13
$ handle :Class 'xgb.Booster.handle' <externalptr>
$ raw : raw [1:300765128] 7b 22 43 6f ...
$ niter : num 10000
$ call : language xgboost::xgb.train(params = list(eta = param$eta, max_depth = param$max_depth, gamma = param$gamma, colsampl| __truncated__ ...
$ params :List of 8
..$ eta : num 0.01
..$ max_depth : num 20
..$ gamma : num 1
..$ colsample_bytree : num 1
..$ min_child_weight : num 3
..$ subsample : num 1
..$ objective : chr "reg:squarederror"
..$ validate_parameters: logi TRUE
$ callbacks :List of 1
..$ cb.print.evaluation:function (env = parent.frame())
.. ..- attr(*, "call")= language cb.print.evaluation(period = print_every_n)
.. ..- attr(*, "name")= chr "cb.print.evaluation"
$ feature_names: chr [1:20] "Age" "Drainage" "HSG" "LU2005" ...
$ nfeatures : int 20
$ xNames : chr [1:20] "Age" "Drainage" "HSG" "LU2005" ...
$ problemType : chr "Regression"
$ tuneValue :'data.frame': 1 obs. of 7 variables:
..$ nrounds : num 10000
..$ max_depth : num 20
..$ eta : num 0.01
..$ gamma : num 1
..$ colsample_bytree: num 1
..$ min_child_weight: num 3
..$ subsample : num 1
$ obsLevels : logi NA
$ param : list()
- attr(*, "class")= chr "xgb.Booster"
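The attempts above seem to fail for two reasons: single brackets with a quoted name return a one-element list rather than the character vector itself, and an unquoted feature_names is looked up as a variable that does not exist. The following is a small sketch on mock objects, not the real model or raster. xgb_mock and Ras_mock are hypothetical stand-ins, and the mock "stack" is just a named list.

```r
# Mock of the relevant part of the model object (names taken from the str() output)
xgb_mock <- list(
  finalModel = list(feature_names = c("Age", "Drainage", "HSG", "LU2005"))
)

# Mock "raster stack": a named list whose layers are in a different order
Ras_mock <- list(HSG = 3, Age = 1, LU2005 = 4, Drainage = 2)

# $ (or [[ ]]) returns the character vector itself
feat <- xgb_mock$finalModel$feature_names

# Single brackets return a list of length 1, which cannot be used as an index
wrong <- xgb_mock$finalModel["feature_names"]

# Reorder the layers by name so they match the training order
Ras1 <- Ras_mock[feat]
names(Ras1)
```

For a real RasterStack from the raster package, the equivalent reordering would presumably be Ras[[feat]] or raster::subset(Ras, feat), since layers of a stack are selected with double brackets. That part is an assumption about the type of Ras and is not tested here.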
I have been following an online example for R Kohonen self-organising maps (SOM) which suggested that the data should be centred and scaled before computing the SOM.
However, I've noticed that the object created seems to have attributes for centre and scale. In that case, am I applying a redundant step by centring and scaling first? Example script below:
# Load package
require(kohonen)
# Set data
data(iris)
# Scale and centre
dt <- scale(iris[, 1:4],center=TRUE)
# Prepare SOM
set.seed(590507)
som1 <- som(dt,
somgrid(6,6, "hexagonal"),
rlen=500,
keep.data=TRUE)
str(som1)
The output from the last line of the script is:
List of 13
$ data :List of 1
..$ : num [1:150, 1:4] -0.898 -1.139 -1.381 -1.501 -1.018 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
.. ..- attr(*, "scaled:center")= Named num [1:4] 5.84 3.06 3.76 1.2
.. .. ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
.. ..- attr(*, "scaled:scale")= Named num [1:4] 0.828 0.436 1.765 0.762
.. .. ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
$ unit.classif : num [1:150] 3 5 5 5 4 2 4 4 6 5 ...
$ distances : num [1:150] 0.0426 0.0663 0.0768 0.0744 0.1346 ...
$ grid :List of 6
..$ pts : num [1:36, 1:2] 1.5 2.5 3.5 4.5 5.5 6.5 1 2 3 4 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:2] "x" "y"
..$ xdim : num 6
..$ ydim : num 6
..$ topo : chr "hexagonal"
..$ neighbourhood.fct: Factor w/ 2 levels "bubble","gaussian": 1
..$ toroidal : logi FALSE
..- attr(*, "class")= chr "somgrid"
$ codes :List of 1
..$ : num [1:36, 1:4] -0.376 -0.683 -0.734 -1.158 -1.231 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:36] "V1" "V2" "V3" "V4" ...
.. .. ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
$ changes : num [1:500, 1] 0.0445 0.0413 0.0347 0.0373 0.0337 ...
$ alpha : num [1:2] 0.05 0.01
$ radius : Named num [1:2] 3.61 0
..- attr(*, "names")= chr [1:2] "66.66667%" ""
$ user.weights : num 1
$ distance.weights: num 1
$ whatmap : int 1
$ maxNA.fraction : int 0
$ dist.fcts : chr "sumofsquares"
- attr(*, "class")= chr "kohonen"
Notice that lines 7 and 10 of the output refer to centre and scale (the scaled:center and scaled:scale attributes). I would appreciate an explanation of the process here.
Your scaling step is not redundant: the som() source code does no scaling, and the scaled:center and scaled:scale attributes you see (lines 7 and 10) are simply attributes of the training dataset, attached by scale().
To check this, run the following chunk of code and compare the results:
# Load package
require(kohonen)
# Set data
data(iris)
# Scale and centre
dt <- scale(iris[, 1:4],center=TRUE)
#compare train datasets
str(dt)
str(as.matrix(iris[, 1:4]))
# Prepare SOM
set.seed(590507)
som1 <- kohonen::som(dt,
kohonen::somgrid(6,6, "hexagonal"),
rlen=500,
keep.data=TRUE)
#without scaling
som2 <- kohonen::som(as.matrix(iris[, 1:4]),
kohonen::somgrid(6,6, "hexagonal"),
rlen=500,
keep.data=TRUE)
#compare results of som function
str(som1)
str(som2)
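The origin of those attributes can also be checked without kohonen at all. The short sketch below uses only base R and the built-in iris data. It shows that scale() is what attaches scaled:center and scaled:scale, that the unscaled matrix has neither, and that the attributes are enough to undo the transformation.

```r
data(iris)

dt  <- scale(iris[, 1:4], center = TRUE)   # scaled and centred matrix
raw <- as.matrix(iris[, 1:4])              # unscaled matrix

# scale() stores the column means and standard deviations as attributes
centre <- attr(dt, "scaled:center")
sds    <- attr(dt, "scaled:scale")

# The unscaled matrix carries no such attribute
has_attr_raw <- !is.null(attr(raw, "scaled:center"))

# The attributes allow mapping scaled values back to the original units
back <- sweep(sweep(dt, 2, sds, "*"), 2, centre, "+")

centre
sds
has_attr_raw
```

Mapping back like this is handy for reporting codebook vectors in the original units, assuming the SOM was trained on dt.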
After running a repeated measures ANOVA and naming the output
RM_test <- ezANOVA(data=test_data, dv=var_test, wid = .(subject),
within = .(water_year), type = 3)
I looked at the internal structure of the named object using str(RM_test) and received the following:
List of 3
$ ANOVA :List of 3
..$ ANOVA :'data.frame': 1 obs. of 7 variables:
.. ..$ Effect: chr "water_year"
.. ..$ DFn : num 2
.. ..$ DFd : num 22
.. ..$ F : num 26.8
.. ..$ p : num 1.26e-06
.. ..$ p<.05 : chr "*"
.. ..$ ges : num 0.531
..$ Mauchly's Test for Sphericity:'data.frame': 1 obs. of 4 variables:
.. ..$ Effect: chr "water_year"
.. ..$ W : num 0.875
.. ..$ p : num 0.512
.. ..$ p<.05 : chr ""
..$ Sphericity Corrections :'data.frame': 1 obs. of 7 variables:
.. ..$ Effect : chr "water_year"
.. ..$ GGe : num 0.889
.. ..$ p[GG] : num 4.26e-06
.. ..$ p[GG]<.05: chr "*"
.. ..$ HFe : num 1.05
.. ..$ p[HF] : num 1.26e-06
.. ..$ p[HF]<.05: chr "*"
$ Mauchly's Test for Sphericity:'data.frame': 1 obs. of 4 variables:
..$ Effect: chr "wtr_yr"
..$ W : num 0.875
..$ p : num 0.512
..$ p<.05 : chr ""
$ Sphericity Corrections :'data.frame': 1 obs. of 7 variables:
..$ Effect : chr "wtr_yr"
..$ GGe : num 0.889
..$ p[GG] : num 4.26e-06
..$ p[GG]<.05: chr "*"
..$ HFe : num 1.05
..$ p[HF] : num 1.26e-06
..$ p[HF]<.05: chr "*"
I was able to extract the fourth variable F from the first data frame using RM_test[[1]][[4]][1] but cannot figure out how to extract the third variable p[GG] from the data frame Sphericity Corrections. This data frame appears twice so extracting either one would be fine.
Suggestions on how to do this using bracketed numbers and names would be appreciated.
The problem seems to be that you don't know how to extract list elements. As you said, there are two Sphericity Corrections data frames, so I will show how to get the p[GG] value from both.
using bracketed number
For the first one, we do RM_test[[1]][[3]][[3]]. You can do it step by step to understand it:
x1 <- RM_test[[1]]; str(x1)
x2 <- x1[[3]]; str(x2)
x3 <- x2[[3]]; str(x3)
For the second one, do RM_test[[3]][[3]].
using bracketed name
Instead of using numbers for indexing, we can use names. For the first, do
RM_test[["ANOVA"]][["Sphericity Corrections"]][["p[GG]"]]
For the second, do
RM_test[["Sphericity Corrections"]][["p[GG]"]]
using $
For the first one, do
RM_test$ANOVA$"Sphericity Corrections"$"p[GG]"
For the second one, do
RM_test$"Sphericity Corrections"$"p[GG]"
Note the use of quotes "" where necessary: names that contain spaces or brackets must be quoted after $.
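To try the three styles without the original data, here is a small sketch on a mock list. RM_mock is a hypothetical, flattened stand-in for RM_test that has only the top-level layout (ANOVA, Mauchly's test, Sphericity Corrections), with values copied from the str() output above. Backticks are used after $ as an alternative to double quotes.

```r
# Data frame with a non-syntactic column name; check.names = FALSE keeps "p[GG]" as is
sph <- data.frame(Effect = "water_year", GGe = 0.889, `p[GG]` = 4.26e-06,
                  check.names = FALSE)

# Mock result list with the same top-level names as the ezANOVA output
RM_mock <- list(
  ANOVA = data.frame(Effect = "water_year", F = 26.8),
  `Mauchly's Test for Sphericity` = data.frame(Effect = "water_year", W = 0.875),
  `Sphericity Corrections` = sph
)

by_number <- RM_mock[[3]][[3]]                               # position in list, then column
by_name   <- RM_mock[["Sphericity Corrections"]][["p[GG]"]]  # names inside [[ ]]
by_dollar <- RM_mock$`Sphericity Corrections`$`p[GG]`        # backticks after $

c(by_number, by_name, by_dollar)
```

Indexing by name is usually the safer choice, because it keeps working if the order of the elements changes between package versions.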