R k-means clustering data

In R, I have computed a k-means clustering as follows:
km = kmeans(mat2, centers=3)
where mat2 is a matrix of column vectors obtained by combining elements of a set of time series. There are 31 rows.
Now that I have my k-means object, how can I look at the data associated with a particular point? For example, suppose I clicked on a dot that belongs to one of the partitions. How can I view this data? Of course, what I mean is how to obtain this data programmatically.

I assume that you call kmeans like this:
set.seed(42)
df <- data.frame( row.names = paste0( "obs", 1:100 ),
V1 = rnorm(100),
V2 = rnorm(100),
V3 = rnorm(100) )
km <- kmeans( df, centers = 3 )
If you are unfamiliar with a new function, it's always a good idea to inspect the resulting object using str():
> str(km)
List of 7
$ cluster : Named int [1:100] 1 2 3 3 1 1 1 1 1 1 ...
..- attr(*, "names")= chr [1:100] "obs1" "obs2" "obs3" "obs4" ...
$ centers : num [1:3, 1:3] 0.65604 -1.09689 0.56428 0.11162 0.00549 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:3] "1" "2" "3"
.. ..$ : chr [1:3] "V1" "V2" "V3"
$ totss : num 291
$ withinss : num [1:3] 43.7 65.7 51.3
$ tot.withinss: num 161
$ betweenss : num 130
$ size : int [1:3] 36 34 30
- attr(*, "class")= chr "kmeans"
As I understand your question, you are looking for km$cluster, which tells you which cluster each observation of your data has been assigned to. The cluster centers can be inspected in the same way via km$centers.
If you now want to know which observations have been assigned to the third cluster, whose center is km$centers[3,], you can subset your data.frame (or matrix) by
> rownames(df[ km$cluster == 3, ])
[1] "obs3" "obs4" "obs12" "obs15" "obs16" "obs21" "obs25" "obs27" "obs32" "obs42" "obs43" "obs46" "obs48" "obs54" "obs55" "obs58" "obs61" "obs62" "obs63" "obs66" "obs67" "obs73" "obs76"
[24] "obs77" "obs81" "obs84" "obs86" "obs87" "obs90" "obs94"
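To get from a single point back to its data, the same indexing works in the other direction. The following is a minimal sketch that reuses the simulated df from above; the observation name "obs3" stands in for whichever point was selected, and the variable names are only illustrative.

```r
# Recreate the example data and clustering from above
set.seed(42)
df <- data.frame(row.names = paste0("obs", 1:100),
                 V1 = rnorm(100),
                 V2 = rnorm(100),
                 V3 = rnorm(100))
km <- kmeans(df, centers = 3)

# Cluster number assigned to one observation ("obs3" is just an example)
obs_cluster <- km$cluster[["obs3"]]

# The original data row for that observation
obs_data <- df["obs3", ]

# All rows of the data, split into one data.frame per cluster
by_cluster <- split(df, km$cluster)

# Every observation that shares a cluster with "obs3"
cluster_members <- by_cluster[[as.character(obs_cluster)]]
head(cluster_members)
```

split() names the list elements after the cluster numbers, which is why the lookup uses as.character(obs_cluster).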

Related

Rphylopars: "Error in class(tree) <- "phylo" : attempt to set an attribute on NULL"

I'm trying to compute a phenotypic covariance matrix between a fatty acid dataset and a phylogenetic tree using the Rphylopars package.
I'm able to load the data set and phylogeny; however, when I attempt to run the test I get the error message
Error in class(tree) <- "phylo" : attempt to set an attribute on NULL
This is the code for the test
phy <- read.tree("combined_trees.txt")
plot(phy)
phy$tip.label
FA_data <- read.csv("fatty_acid_example_data.csv", header = TRUE, na.strings = ".")
head(FA_data)
str(FA_data)
PPE <- phylopars(trait_data = FA_data$fatty1_continuous, tree = FA_data$phy)
Not sure what other info will help figure out the issue. The data set and phylogeny loaded without an error.
In the tutorial, the tree and trait data are jointly simulated by the simtraits() function, so both end up as elements of a single list. In your case (which will be typical of real-data cases), the tree and the trait data come from different sources, so most likely you want
PPE <- phylopars(trait_data = FA_data, tree = phy)
provided that FA_data contains a first column named species that matches the tip names in phy, and otherwise only the numeric data you want to use (potentially only the single fatty1_continuous column).
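The error itself most likely comes from FA_data$phy: a data frame returns NULL for a column that does not exist, so the tree argument ends up being NULL. The sketch below shows this in base R, together with a quick check that species names and tip labels agree. The small data frame and the tip_labels vector are made-up stand-ins for the real FA_data and phy$tip.label.

```r
# Made-up stand-in for the real trait data (species first, then numeric traits)
FA_data <- data.frame(species = c("t1", "t2", "t3"),
                      fatty1_continuous = c(0.12, 0.45, 0.30),
                      stringsAsFactors = FALSE)

# A data frame has no column called "phy", so this is NULL
is.null(FA_data$phy)

# Made-up stand-in for phy$tip.label
tip_labels <- c("t1", "t2", "t4")

# Species in the data that are missing from the tree, and vice versa
missing_from_tree <- setdiff(FA_data$species, tip_labels)
missing_from_data <- setdiff(tip_labels, FA_data$species)
missing_from_tree
missing_from_data
```

With real data, both setdiff() results should be empty (or at least understood) before the model is fitted.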
For comparison, the data structure returned by simtraits() looks like this (using str()):
List of 4
$ trait_data:'data.frame': 45 obs. of 5 variables:
..$ species: chr [1:45] "t7" "t8" "t2" "t3" ...
..$ V1 : num [1:45] 1.338 0.308 1.739 2.009 2.903 ...
..$ V2 : num [1:45] -2.002 -0.115 -0.349 -4.452 NA ...
..$ V3 : num [1:45] -1.74 NA 1.09 -2.54 -1.19 ...
..$ V4 : num [1:45] 2.496 2.712 1.198 1.675 -0.117 ...
$ tree :List of 4
..$ edge : int [1:28, 1:2] 29 29 28 28 27 27 26 26 25 25 ...
..$ edge.length: num [1:28] 0.0941 0.0941 0.6233 0.7174 0.0527 ...
..$ Nnode : int 14
..$ tip.label : chr [1:15] "t7" "t8" "t2" "t3" ...
..- attr(*, "class")= chr "phylo"
..- attr(*, "order")= chr "postorder"
...
You can see that simtraits() returns a list containing (among other things) (1) a data frame with species as the first column and the other columns numeric and (2) a phylogenetic tree.

Adding a suffix to names when storing results in a loop

I am making some plots in R in a for-loop and would like to store them using a name to describe the function being plotted, but also which data it came from.
So when I have a list of 2 data sets "x" and "y" and the loop has a structure like this:
x = matrix(
c(1,2,4,5,6,7),
nrow=3,
ncol=2)
y = matrix(
c(20,40,60,80,100,120),
nrow=3,
ncol=2)
data <- list(x,y)
for (i in data){
??? <- boxplot(i)
}
I would like the ??? to be "name" + (i) + "_" separator. In this case the 2 plots would be called "plot_x" and "plot_y".
I tried some stuff with paste("plot", names(i), sep = "_") but I'm not sure if this is what to use, and where and how to use it in this scenario.
We can create an empty list with the same length as 'data' and then store the output of each iteration by looping over the sequence of 'data':
out <- vector('list', length(data))
for(i in seq_along(data)) {
out[[i]] <- boxplot(data[[i]])
}
str(out)
#List of 2
# $ :List of 6
# ..$ stats: num [1:5, 1:2] 1 1.5 2 3 4 5 5.5 6 6.5 7
# ..$ n : num [1:2] 3 3
# ..$ conf : num [1:2, 1:2] 0.632 3.368 5.088 6.912
# ..$ out : num(0)
# ..$ group: num(0)
# ..$ names: chr [1:2] "1" "2"
# $ :List of 6
# ..$ stats: num [1:5, 1:2] 20 30 40 50 60 80 90 100 110 120
# ..$ n : num [1:2] 3 3
# ..$ conf : num [1:2, 1:2] 21.8 58.2 81.8 118.2
# ..$ out : num(0)
# ..$ group: num(0)
# ..$ names: chr [1:2] "1" "2"
If required, set the names of the list elements with the object names
names(out) <- paste0("plot_", c("x", "y"))
It is better not to create multiple objects in the global environment. Instead, as shown above, place the objects in a list.
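If the input list is named to begin with, the result names can be derived from it instead of being typed out by hand. This is a minimal sketch of that variant, using 3 x 2 versions of the example matrices; plot = FALSE is an addition here that returns the boxplot statistics without drawing, and it can be dropped if the plots themselves are wanted.

```r
# 3 x 2 example matrices matching the statistics shown above
x <- matrix(c(1, 2, 4, 5, 6, 7), nrow = 3, ncol = 2)
y <- matrix(c(20, 40, 60, 80, 100, 120), nrow = 3, ncol = 2)

# Name the list elements so the names travel with the data
data <- list(x = x, y = y)

# One boxplot result per data set; plot = FALSE only computes the statistics
out <- lapply(data, boxplot, plot = FALSE)

# Prefix the inherited names: "plot_x", "plot_y"
names(out) <- paste0("plot_", names(data))
names(out)
```

Individual results are then available as out$plot_x and out$plot_y.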
akrun is right, you should try to avoid setting names in the global environment. But if you really have to, you can try this,
> y = matrix(c(20,40,60,80,100,120,140,160,180),ncol=1)
> .GlobalEnv[[paste0("plot_","y")]] <- boxplot(y)
> str(plot_y)
List of 6
$ stats: num [1:5, 1] 20 60 100 140 180
$ n : num 9
$ conf : num [1:2, 1] 57.9 142.1
$ out : num(0)
$ group: num(0)
$ names: chr "1"
You can read up on .GlobalEnv by typing ?.GlobalEnv at the R command prompt.

How to subset point patterns of a hyperframe using marks from a dataframe

I have a hyperframe with 93 rows. Each row contains a stem map of trees of class ppp along with plot-level grouping factors. A dataframe of marks provides point-specific data, such as the diameter, species, and height for each point.
I need to subset the point pattern based on the dataframe of marks and then run the Lest function, which requires the data to be pooled. I have found examples of subsetting marks of single point patterns and examples of subsetting hyperframes based on columns of a hyperframe, but I have not seen examples of subsetting the point patterns of a hyperframe by the levels of a factor in a dataframe with multiple marks. Any guidance would be much appreciated.
I can subset the hyperframe by plot-level factors, let's say a, b, and c vegetation types, then run Lest for each plot, pool the outputs based on the vegetation type, and graph the pooled Lest (pg. 684 of Baddeley et al. 2015 provides a helpful example).
I fail, however, to subset the point patterns of the hyperframe based on specific marks of the dataframe. I'm not sure if the structure of my data causes problems, so I've included it below, or if I'm just confused by the code that subsets marks of a dataframe associated with multiple point patterns of a hyperframe (R novice here; the lists within lists get confusing).
Data construction:
z.list <- mapply(as.ppp, X = df.list, W = window.list, SIMPLIFY=FALSE)
# df.list contains x,y coordinates, followed by columns of point specific data.
h <- hyperframe(X=z.list)
# combine the point pattern and marks with plot level data
H <- cbind.hyperframe(h, plot.df1)
Data Structure:
str(H)
'hyperframe': 93 rows and 14 columns
$ X : objects of class ppp
$ PLOTID : factor 0102U001 0104U001 0104U002 ...
$ Group1 : integer 1 2 1 ...
$ Group2 : numeric 2.0 2.5 2.0 ...
str(H[1,]$X) # str of the ppp in the hyperframe's first row
List of 1
$ X:List of 6
..$ window :List of 5
.. ..$ type : chr "polygonal"
.. ..$ xrange: num [1:2] 516441 516503
.. ..$ yrange: num [1:2] 3382698 3382804
.. ..$ bdry :List of 1
.. .. ..$ :List of 2
.. .. .. ..$ x: num [1:4] 516503 516502 516441 516442
.. .. .. ..$ y: num [1:4] 3382698 3382804 3382804 3382698
.. ..$ units :List of 3
.. .. ..$ singular : chr "metre"
.. .. ..$ plural : chr "metres"
.. .. ..$ multiplier: num 1
.. .. ..- attr(*, "class")= chr "unitname"
.. ..- attr(*, "class")= chr "owin"
..$ n : int 107
..$ x : num [1:107] 516501 516473 516470 516474 516474 ...
..$ y : num [1:107] 3382801 3382723 3382726 3382734 3382732 ...
..$ markformat: chr "dataframe"
..$ marks :'data.frame': 107 obs. of 3 variables:
.. ..$ DBH_Class: Factor w/ 16 levels "11.25","13.75",..: 7 5 13 12 8 4 9
.. ..$ Ingrowth : Factor w/ 7 levels "DD","I_DD","I_LD_MY",..: 7 6 6 7
.. ..$ PlotID : Factor w/ 93 levels "0102U001","0104U001",..: 1 1 1 1
..- attr(*, "class")= chr "ppp"
- attr(*, "class")= chr [1:5] "ppplist" "solist" "anylist" "listof" ...
The above seems correct to me, but I notice that although the marks print with the str function, is.multipoint outputs as FALSE. Not sure if this is part of my problem.
The following works great for plot level factors located in rows of the hyperframe.
H$L <- with(H, Lest((X),rmax=40))
L.VT.split <- split(H$L, H$VEG_TYPE) #plot level factor
L.VT.pool <- anylapply(L.VT.split, pool)
plot(L.VT.pool,cbind(pooliso, pooltheo, hiiso,loiso)-r~r,
shade=c("hiiso","loiso"),equal.scales=TRUE, main='')
But how does one perform the same operation using marks from a dataframe?
I'm not sure I quite understand the question, but I will try to provide
some useful hints anyway...
For each row in H you have a point pattern which contains mark
information in a data.frame (three columns called DBH_Class,
Ingrowth and PlotID). Here are some fake data with that structure:
library(spatstat)
set.seed(42)
df1 <- data.frame(x = runif(3), y = runif(3), DBH_Class = factor(1:3),
Ingrowth = LETTERS[1:3], PlotID = letters[1:3])
X1 <- as.ppp(df1, W = square(1))
df2 <- data.frame(x = runif(3), y = runif(3), DBH_Class = factor(1:3),
Ingrowth = LETTERS[1:3], PlotID = letters[1:3])
X2 <- as.ppp(df2, W = square(1))
H <- hyperframe(X = list(X1 = X1, X2 = X2))
H
#> Hyperframe:
#> X
#> 1 (ppp)
#> 2 (ppp)
plot(H$X, which.marks = "Ingrowth")
To subset an individual point pattern by a specific mark (Ingrowth in
this example) use subset:
X1B <- subset(X1, Ingrowth == "B")
Same thing for each pattern in the column X within H:
H$XB <- with(H, subset(X, Ingrowth == "B"))
H
#> Hyperframe:
#> X XB
#> 1 (ppp) (ppp)
#> 2 (ppp) (ppp)
plot(H$XB, which.marks = "Ingrowth")

How to get fitted values from ar() method model in R

I want to retrieve the fitted values from a model returned by the ar() function in R. When using Arima(), I get them with fitted(model.object), but I cannot find the equivalent for ar().
An ar object does not store a fitted vector, but it does store the residuals. Here is an example that uses the residuals from the ar object to reconstruct the fitted values from the original data:
data(WWWusage)
arf <- ar(WWWusage)
str(arf)
#====================
List of 14
$ order : int 3
$ ar : num [1:3] 1.175 -0.0788 -0.1544
$ var.pred : num 117
$ x.mean : num 137
$ aic : Named num [1:21] 258.822 5.787 0.413 0 0.545 ...
..- attr(*, "names")= chr [1:21] "0" "1" "2" "3" ...
$ n.used : int 100
$ order.max : num 20
$ partialacf : num [1:20, 1, 1] 0.9602 -0.2666 -0.1544 -0.1202 -0.0715 ...
$ resid : Time-Series [1:100] from 1 to 100: NA NA NA -2.65 -4.19 ...
$ method : chr "Yule-Walker"
$ series : chr "WWWusage"
$ frequency : num 1
$ call : language ar(x = WWWusage)
$ asy.var.coef: num [1:3, 1:3] 0.01017 -0.01237 0.00271 -0.01237 0.02449 ...
- attr(*, "class")= chr "ar"
#===================
str(WWWusage)
# Time-Series [1:100] from 1 to 100: 88 84 85 85 84 85 83 85 88 89 ...
png(); plot(WWWusage)
lines(seq(WWWusage),WWWusage - arf$resid, col="red"); dev.off()
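To keep the fitted values as an object rather than only drawing them, the same subtraction can be stored and checked against the AR equation by hand. This is a sketch that assumes the default Yule-Walker fit shown above, which selects a non-zero order for WWWusage; fitted_ar, p, t0, and manual are illustrative names.

```r
# Fit the AR model as above (default method: Yule-Walker)
arf <- ar(WWWusage)

# Fitted values = observed series minus residuals; the first `order` values are NA
fitted_ar <- WWWusage - arf$resid

# Check one value by hand: mean plus AR coefficients times the demeaned lags
p  <- arf$order
t0 <- p + 1                              # first time point with a fitted value
lags   <- WWWusage[(t0 - 1):(t0 - p)]    # x[t0-1], ..., x[t0-p]
manual <- arf$x.mean + sum(arf$ar * (lags - arf$x.mean))

all.equal(as.numeric(fitted_ar[t0]), manual)
```

The leading NA values are expected, because an AR(p) model needs p earlier observations before it can produce a one-step prediction.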
The simplest way to get the fits from an AR(p) model would be to use auto.arima() from the forecast package, which does have a fitted() method. If you really want a pure AR model, you can constrain the differencing via the d parameter and the MA order via the max.q parameter.
> library(forecast)
> fitted(auto.arima(WWWusage,d=0,max.q=0))
Time Series:
Start = 1
End = 100
Frequency = 1
[1] 91.68778 86.20842 82.13922 87.60576 ...

Accessing control chart results in R?

I have a short R script that loads a bunch of data and plots it in an XBar chart. Using the following code, I can plot the data and view the various statistical information.
library(qcc)
tir<-read.table("data.dat", header=TRUE, sep="\t")
names(tir)
attach(tir)
rand <- sample(tir)
xbarchart <- qcc(rand[1:100,],type="R")
summary(xbarchart)
I want to be able to do some process capability analysis (described here (PDF) on page 5) immediately after the XBar chart is created. To create the analysis chart, I need to store the LCL and UCL values from the XBar chart created above as variables. Is there any way I can do this?
I shall answer your question using the example in the ?qcc help file.
x <- c(33.75, 33.05, 34, 33.81, 33.46, 34.02, 33.68, 33.27, 33.49, 33.20,
33.62, 33.00, 33.54, 33.12, 33.84)
xbarchart <- qcc(x, type="xbar.one", std.dev = "SD")
A useful function to inspect the structure of variables and function results is str(), short for structure.
str(xbarchart)
List of 11
$ call : language qcc(data = x, type = "xbar.one", std.dev = "SD")
$ type : chr "xbar.one"
$ data.name : chr "x"
$ data : num [1:15, 1] 33.8 33 34 33.8 33.5 ...
..- attr(*, "dimnames")=List of 2
.. ..$ Group : chr [1:15] "1" "2" "3" "4" ...
.. ..$ Samples: NULL
$ statistics: Named num [1:15] 33.8 33 34 33.8 33.5 ...
..- attr(*, "names")= chr [1:15] "1" "2" "3" "4" ...
$ sizes : int [1:15] 1 1 1 1 1 1 1 1 1 1 ...
$ center : num 33.5
$ std.dev : num 0.342
$ nsigmas : num 3
$ limits : num [1, 1:2] 32.5 34.5
..- attr(*, "dimnames")=List of 2
.. ..$ : chr ""
.. ..$ : chr [1:2] "LCL" "UCL"
$ violations:List of 2
..$ beyond.limits : int(0)
..$ violating.runs: num(0)
- attr(*, "class")= chr "qcc"
You will notice the second to last element in this list is called $limits and contains the two values for LCL and UCL.
It is simple to extract this element:
limits <- xbarchart$limits
limits
      LCL      UCL
 32.49855 34.54811
Thus LCL <- limits[1] and UCL <- limits[2]
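Because the columns of limits are named, the values can also be pulled out by name, which is a little more robust than relying on position. The sketch below does not call qcc; it rebuilds a matrix with the same shape and dimnames as the str() output above, so that it runs without the package.

```r
# Stand-in with the same structure as xbarchart$limits in the str() output
limits <- matrix(c(32.49855, 34.54811), nrow = 1,
                 dimnames = list("", c("LCL", "UCL")))

# Extract each control limit by column name
LCL <- limits[, "LCL"]
UCL <- limits[, "UCL"]

# Width of the control band, as an example of using the stored values
UCL - LCL
```

With a real chart, limits <- xbarchart$limits replaces the stand-in matrix.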
