How to create a graph file for INLA using region names - r

i.e. use the region.id of class nb from the spdep package rather than ignoring it as spdep::nb2INLA does?
I've been trying to link a column in my data containing these regions as a factor, to an INLA model with a graph describing their spatial arrangement.
#something like this
f(rgn16cd,
model = "bym2",
graph = inla_graphs$gb_regions)
It works if I coerce rgn16cd from factor to numeric. Is there a way to get the region names into the graph file?

Here nbs is a list of class nb, made from a spatial polygons object whose row.names were set from a column of the @data slot of that object.
This code should return a graph with named elements as shown.
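inla_graphs <- purrr::imap(nbs, ~ {
  # write the neighbour list to a temporary INLA graph file and read it back
  spdep::nb2INLA(file = glue::glue("{.y}.graph"), nb = .x$nb)
  x <- INLA::inla.read.graph(glue::glue("{.y}.graph"))
  # replace the integer neighbour indices with region names
  x$nbs <- lapply(x$nbs, FUN = function(X) {
    row.names(.x$mat)[X]
  })
  names(x$nbs) <- row.names(.x$mat)
  # remove the temporary file
  unlink(glue::glue("{.y}.graph"))
  x
})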
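Since the model runs once rgn16cd is numeric, another option is to leave the graph file untouched and build the integer index from the region names explicitly. The following is only a sketch in base R, assuming the model indexes regions by their position in the graph; region_ids and dat are hypothetical stand-ins (in practice region_ids would come from attr(nb, "region.id")). Note that as.integer() on a factor follows the alphabetical level order, which need not match the graph order, so match() is used instead.

```r
# Hypothetical stand-ins: region_ids plays the role of attr(nb, "region.id"),
# i.e. the region names in the order used to build the graph.
region_ids <- c("E12000001", "E12000002", "W92000004", "S92000003")
dat <- data.frame(
  rgn16cd = factor(c("W92000004", "E12000001", "S92000003", "E12000001")),
  y = c(3, 5, 2, 4)
)

# Position of each observation's region in the graph's node order.
dat$rgn_idx <- match(as.character(dat$rgn16cd), region_ids)

# Fail early if a region in the data is missing from the graph.
stopifnot(!anyNA(dat$rgn_idx))

# Lookup table for translating fitted effects back to region names.
lookup <- data.frame(rgn_idx = seq_along(region_ids), rgn16cd = region_ids)
```

The column rgn_idx would then take the place of rgn16cd inside f(), and lookup maps the results back to names afterwards.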

Related

R: How can I assign points on a map a color based on a set of values?

I have run a factor analysis on a spatial dataset, and I would like to plot the results on a map so that the color of each individual point (location) is a combination in a RGB/HSV space of the scores at that location of the three factors extracted.
I am using base R to plot the locations, which are in a SpatialPointsDataFrame created with the sp package:
Libraries
library(sp)
library(classInt)
Sample Dataset
fas <- structure(list(MR1 = c(-0.604222013102789, -0.589631093835467,
-0.612647301042234, 2.23360319770647, -0.866779007222414), MR2 = c(-0.492209397489792,
-0.216810726717787, -0.294487678489753, -0.60466348557844, 0.34752411748663
), MR3 = c(-0.510065798219453, -0.61303212834454, 0.194263734935779,
0.347461766159926, -0.756375966467285), x = c(1457543.717, 1491550.224,
1423185.998, 1508232.145, 1521316.942), y = c(4947666.766, 5001394.895,
4948766.5, 4950547.862, 5003955.997)), row.names = c("Acqui Terme",
"Alagna", "Alba", "Albera Ligure", "Albuzzano"), class = "data.frame")
Create spatial object
fas <- SpatialPointsDataFrame(fas[,4:5], fas,
proj4string = CRS("+init=EPSG:3003"))
Plotting function
map <- function(f) {
  pal <- colorRampPalette(c("steelblue","white","tomato2"), bias = 1)
  collist <- pal(10)
  class <- classIntervals(f, 8, style = "jenks")
  color <- findColours(class, collist)
  plot(fas, pch=21,cex=.8, col="black",bg=color)
}
#example usage
#map(fas$MR1)
The above code works well for producing a separate plot for each factor. What I would like is a way to produce a composite map of the three factors together.
Many thanks in advance for any suggestion.
I found a solution through this post! With the data shown above, it goes like this:
# choose the columns to map to colour
colors <- fas@data[, 1:3]
# rescale to the range 0 to 1
range_col <- function(x){(x-min(x))/(max(x)-min(x))}
colors_norm <- range_col(colors)
print(colors_norm)
# convert to RGB hex codes
colors_rgb <- rgb(colors_norm)
print(colors_rgb)
# plot, filling each point with its composite colour
plot(fas, main="Color Scatterplot", bg=colors_rgb,
col="black",pch=21)
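A variant worth considering: when range_col is applied to the whole data frame it appears to use one overall minimum and maximum, so a factor with a narrow range contributes little to the colour. The sketch below rescales each factor separately before calling rgb(). It uses base R only; scores holds the sample values from the question rounded to two decimals, and rescale01 and point_cols are names introduced here for illustration.

```r
# Factor scores from the sample data, rounded for brevity.
scores <- data.frame(
  MR1 = c(-0.60, -0.59, -0.61, 2.23, -0.87),
  MR2 = c(-0.49, -0.22, -0.29, -0.60, 0.35),
  MR3 = c(-0.51, -0.61, 0.19, 0.35, -0.76)
)

# Rescale one vector to the range 0 to 1.
rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))

# Apply the rescaling column by column.
scores_norm <- as.data.frame(lapply(scores, rescale01))

# One hex colour per location: MR1 -> red, MR2 -> green, MR3 -> blue.
point_cols <- rgb(scores_norm$MR1, scores_norm$MR2, scores_norm$MR3)
```

The vector point_cols can be passed as bg in the plot call in the same way as the colours computed above.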

how to draw ellipses without scatterplot in ggplot

I am trying to represent niche of species by drawing inertia ellipses. The function to do this in ade4 is niche. Here is an example:
data(trichometeo)
pca1 <- dudi.pca(trichometeo$meteo, scan = FALSE)
nic1 <- niche(pca1, log(trichometeo$fau + 1), scan = FALSE)
s.distri(dfxy = nic1$ls, dfdistri = eval.parent(as.list(nic1$call)[[3]]))
This graph is not really clear.
PCA is done on environmental variables.
Each point of the PCA is a study site. In each study site, several species have been observed. The ellipses are the niches of each species.
When building the ellipse of one species, a weight is given to each of the study sites (the points) according to the relative abundance of the species. The center of gravity of these weighed points is the center of the ellipsoid. The width of the ellipse is linked to the variance of the weighed points.
So there is no scatterplot with a factor that I could pass to stat_ellipse.
Any suggestions on how to do this with ggplot graphics?
Thank you.
So, I finally found how to plot ellipses in ggplot. It is explained in the first part of this answer. The second part describes how to extract the ellipse coordinates from a niche analysis in ade4.
Draw a simple ellipse in ggplot: to do that, you have to build a data frame with two columns, x and y, holding the coordinates of some of the points that make up the ellipse, and use geom_polygon as follows:
> dput(test)
structure(list(x = c(-0.74970124137657, -0.776450364352299, -0.804256933708176,
-0.833011209618567, -0.862599712093033, -0.892905668830007, -0.923809476063724,
-0.955189170585639, -0.986920911077492, -1.01887946685642, -1.05093871210323,
-1.08297212362341, -1.11485328017637, -1.14645636140231, -1.1776566443777,
-1.20833099583969, -1.23835835813684, -1.26762022698836, -1.29600111916637,
-1.32338902825544, -1.34967586669074, -1.37475789233028, -1.39853611787775,
-1.42091670154015, -1.44181131737848, -1.46113750388986, -1.47881898944545,
-1.49478599329964, -1.50897550098285, -1.52133151299083, -1.53180526578912,
-1.5403554242604, -1.54694824483541, -1.55155770866329, -1.55416562429619,
-1.55476169948255, -1.55334358178592, -1.54991686786897, -1.54449508140603,
-1.53709961971128, -1.52775966929336, -1.51651209066957, -1.50340127289419,
-1.48847895837522, -1.47180403867071, -1.45344232207069, -1.43346627388192,
-1.41195473044039, -1.38899258798039, -1.3646704675878, -1.3390843575601,
-1.31233523458437, -1.28452866522849, -1.2557743893181, -1.22618588684363,
-1.19587993010666, -1.16497612287294, -1.13359642835103, -1.10186468785917,
-1.06990613208025, -1.03784688683344, -1.00581347531326, -0.973932318760297,
-0.94232923753436, -0.911128954558971, -0.880454603096979, -0.850427240799828,
-0.821165371948307, -0.792784479770296, -0.765396570681226, -0.739109732245926,
-0.714027706606386, -0.690249481058919, -0.66786889739652, -0.646974281558191,
-0.627648095046803, -0.609966609491222, -0.593999605637033, -0.579810097953819,
-0.567454085945835, -0.556980333147553, -0.548430174676264, -0.541837354101259,
-0.537227890273375, -0.534619974640476, -0.534023899454122, -0.535442017150752,
-0.538868731067695, -0.544290517530637, -0.551685979225388, -0.561025929643302,
-0.572273508267099, -0.585384326042479, -0.600306640561448, -0.616981560265954,
-0.63534327686597, -0.655319325054748, -0.676830868496273, -0.699793010956271,
-0.724115131348859), y = c(0.325013216091984, 0.336960163623126,
0.346538198705152, 0.353709521209382, 0.358445829202159, 0.360728430639646,
0.360548317136793, 0.357906199519309, 0.352812505018361, 0.345287336119057,
0.335360391225119, 0.323070847452856, 0.308467206016976, 0.291607100818459,
0.272557070989873, 0.251392298295821, 0.228196310424862, 0.203060651343884,
0.176084520015889, 0.147374378907005, 0.117043533827776, 0.085211686766882,
0.0520044634820859, 0.0175529177127555, -0.0180069860293719,
-0.0545349090500008, -0.0918866923249894, -0.129914925430158,
-0.168469528302887, -0.207398343539562, -0.246547736891326, -0.285763203588276,
-0.324889978099203, -0.363773644920435, -0.402260747983286, -0.440199396275052,
-0.477439863283444, -0.513835177898732, -0.549241704441568, -0.583519709527388,
-0.616533913530252, -0.648154024469714, -0.678255252213751, -0.7067188009684,
-0.733432338110486, -0.758290437513162, -0.781194995614671, -0.802055618588285,
-0.820789979085438, -0.837324141144163, -0.851592851980555, -0.863539799511696,
-0.873117834593722, -0.880289157097953, -0.885025465090729, -0.887308066528217,
-0.887127953025363, -0.884485835407879, -0.879392140906931, -0.871866972007626,
-0.861940027113689, -0.849650483341425, -0.835046841905545, -0.818186736707027,
-0.799136706878442, -0.77797193418439, -0.754775946313431, -0.729640287232453,
-0.702664155904458, -0.673954014795575, -0.643623169716345, -0.611791322655452,
-0.578584099370656, -0.544132553601326, -0.508572649859198, -0.47204472683857,
-0.434692943563581, -0.396664710458413, -0.358110107585684, -0.31918129234901,
-0.280031898997246, -0.240816432300296, -0.201689657789369, -0.162805990968137,
-0.124318887905287, -0.0863802396135213, -0.0491397726051291,
-0.012744457989841, 0.0226620685529939, 0.0569400736388145, 0.0899542776416779,
0.12157438858114, 0.151675616325177, 0.180139165079826, 0.206852702221912,
0.231710801624588, 0.254615359726098, 0.275475982699712, 0.294210343196865,
0.31074450525559)), .Names = c("x", "y"), row.names = c(NA, -100L
), class = "data.frame")
then just plot the polygon:
ggplot()+geom_polygon(data=test, aes(x=x, y=y))
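If the coordinates are not already available, a data frame of this shape can also be generated from the ellipse parameters. The following is a minimal base R sketch; ellipse_points and its arguments (centre, semi-axes, rotation angle) are hypothetical names introduced here and are not part of ade4 or ggplot2.

```r
# Points on an ellipse with centre (cx, cy), semi-axes a and b,
# rotated counter-clockwise by `angle` (in radians).
ellipse_points <- function(cx, cy, a, b, angle = 0, n = 100) {
  t <- seq(0, 2 * pi, length.out = n)
  # Axis-aligned ellipse centred at the origin.
  x <- a * cos(t)
  y <- b * sin(t)
  # Rotate, then shift to the centre.
  data.frame(
    x = cx + x * cos(angle) - y * sin(angle),
    y = cy + x * sin(angle) + y * cos(angle)
  )
}

test_ellipse <- ellipse_points(cx = -1, cy = -0.25, a = 0.7, b = 0.4,
                               angle = pi / 4)

# The result has the same x/y layout as `test`, so it could be drawn with:
# ggplot2::ggplot() + ggplot2::geom_polygon(data = test_ellipse,
#                                           ggplot2::aes(x = x, y = y))
```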
For this specific issue, how to extract ellipse coordinates from a niche analysis with ade4:
Plots from ade4 can be stored in an object:
data(trichometeo)
pca1 <- dudi.pca(trichometeo$meteo, scan = FALSE)
nic1 <- niche(pca1, log(trichometeo$fau + 1), scan = FALSE)
p1<-s.distri(dfxy = nic1$ls, dfdistri = eval.parent(as.list(nic1$call)[[3]]))
p1 is an S4 object, and its slots can be accessed with @ as follows:
p1@s.misc$ellipse
This command displays a list containing, for each species:
one vector of x coordinates of the ellipse
one vector of y coordinates
one vector with the coordinates of the axes of the ellipse
To extract these coordinates, use sapply:
listx=sapply(p1@s.misc$ellipse, "[", "x")
listy=sapply(p1@s.misc$ellipse, "[", "y")
then transform them into a data frame:
tabx=do.call(data.frame, listx)
taby=do.call(data.frame, listy)
and combine them into one data frame (I use melt from the reshape package to get a long data frame for ggplot):
tabx.long=melt(tabx)
taby.long=melt(taby)
tab.fin=cbind.data.frame(tabx.long,taby.long)
You can then use this data frame with the method explained above.
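The same long table can be built in base R with an explicit species column, which avoids duplicated column names after the cbind and also works if the ellipses have different numbers of points. This is only a sketch: ell is a hypothetical stand-in for p1@s.misc$ellipse, assumed here to be a named list whose elements each hold numeric x and y vectors.

```r
# Hypothetical stand-in for p1@s.misc$ellipse: one element per species,
# each with x and y coordinate vectors (possibly of different lengths).
ell <- list(
  sp_a = list(x = c(0, 1, 1, 0), y = c(0, 0, 1, 1)),
  sp_b = list(x = c(2, 3, 2.5), y = c(2, 2, 3))
)

# One small data frame per species, stacked into a single long table.
tab.fin <- do.call(rbind, Map(function(e, nm) {
  data.frame(species = nm, x = e$x, y = e$y)
}, ell, names(ell)))
rownames(tab.fin) <- NULL

# With ggplot2, one polygon per species could then be drawn with:
# ggplot2::ggplot(tab.fin, ggplot2::aes(x = x, y = y, group = species)) +
#   ggplot2::geom_polygon(fill = NA, colour = "black")
```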

Dendrogram and HistDAWass package

I am using the HistDAWass package (https://cran.r-project.org/web/packages/HistDAWass/index.html) to perform clustering using a script partially provided by the package author.
As the Data1.csv file does not include a column with the sample names (labels), I get a dendrogram whose leaves are labelled I1...I6.
Therefore, I tried to work with a new file (Data2.csv) whose first column contains the labels, but I get an error.
I would appreciate it if someone could explain how to generate the dendrogram with the new labels.
Script:
library(HistDAWass)
data=read.csv('D:/Data1.csv', header = FALSE)
data=t(data)
Hdata=MatH(nrows=6,ncols = 1)
for (i in 1:get.MatH.nrows(Hdata)){
  tmp=data2hist(as.vector(data[,i]))
  Hdata@M[i,1][[1]]=tmp
}
results=WH_hclust(x = Hdata,simplify = TRUE, method="complete")
plot(results) # it plots the dendrogram
Data files (in zip):
http://ge.tt/8yVsiQS2/v/0
The script shows a way to generate a matrix in which each cell holds a distributionH object. In the for loop, a distributionH is created from the raw data for each row of the csv file, and in this way a new MatH (a matrix of distributions) is built.
To build the same from the Data2.csv file, you should run the following script:
library(HistDAWass)
# read the data
data=read.csv('Data2.csv', header = FALSE)
# initialize an empty MatH matrix using the names from the first column of data
Hdata=MatH(nrows=nrow(data),rownames=as.list(as.character(data[,1])),ncols = 1)
# fill the matrix, skipping the label column
for (i in 1:get.MatH.nrows(Hdata)){
  tmp=data2hist(as.vector(t(data[i,2:ncol(data)])))
  Hdata@M[i,1][[1]]=tmp
}
#Do hierarchical clustering
results=WH_hclust(x = Hdata,simplify = TRUE, method="complete")
plot(results) # it plots the dendrogram

how to create a cclust object given a dataframe of indices

I need to use the function clustIndex from the cclust package in R.
The prototype of the function is as follows:
clustIndex ( y, x, index = "all" )
y Object of class "cclust" returned by a clustering algorithm such as kmeans
x Data matrix where columns correspond to variables and rows to observations
index The indexes that are calculated "calinski", "cindex", "db", "hartigan",
"ratkowsky", "scott", "marriot", "ball", "trcovw", "tracew", "friedman",
"rubin", "ssi", "likelihood", and "all" for all the indexes. Abbreviations
of these names are also accepted.
y is the object produced by the function cclust in the same package, but I have a clustering algorithm coded in Matlab and want to use clustIndex to calculate the indices for the solution produced by the Matlab algorithm.
One way I can think of is to create a cclust object, fill its components with the values from my solution, and then use it. Would this be correct and work?
Documentation of the package is available here
Any other ideas to use?
There is no need to create a cclust object; you can just create a list like this:
y = list(cluster = matlabObj$cluster ,
centers = matlabObj$centers ,
withins = matlabObj$withins,
size = matlabObj$size)
Here is an example that uses cclust (you should use your Matlab clustering here) to show that these 4 components are enough for the clustIndex function:
x<- rbind(matrix(rnorm(100,sd=0.3),ncol=2),
matrix(rnorm(100,mean=1,sd=0.3),ncol=2))
matlabObj <- cclust(x,2,20,verbose=TRUE,method="kmeans")
clustIndex(matlabObj,x, index="all")
y = list(cluster = matlabObj$cluster ,
centers = matlabObj$centers ,
withins = matlabObj$withins,
size = matlabObj$size)
identical(clustIndex(y,x, index="all"),
clustIndex(matlabObj,x, index="all"))
[1] TRUE
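If the Matlab code only exports the cluster labels, the other three components can be recomputed in R from the data matrix. The sketch below uses base R only and mirrors the component names of the list above; x and cluster are hypothetical stand-ins for the data and the labels imported from Matlab, and whether clustIndex accepts the result for every index was not checked here.

```r
set.seed(1)
# Stand-in data: 20 observations, 2 variables.
x <- rbind(matrix(rnorm(20, sd = 0.3), ncol = 2),
           matrix(rnorm(20, mean = 1, sd = 0.3), ncol = 2))
# Stand-in for the cluster labels exported from Matlab.
cluster <- rep(1:2, each = 10)

# Cluster centres: mean of every variable within each cluster
# (rows = clusters, columns = variables).
centers <- apply(x, 2, function(col) tapply(col, cluster, mean))

# Number of observations per cluster.
size <- as.vector(table(cluster))

# Within-cluster sum of squared deviations from the cluster centre.
withins <- sapply(sort(unique(cluster)), function(k) {
  rows <- x[cluster == k, , drop = FALSE]
  sum(sweep(rows, 2, centers[as.character(k), ])^2)
})

y <- list(cluster = cluster, centers = centers,
          withins = withins, size = size)
```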

Displaying TraMineR (R) dendrograms in text/table format

I use the following R code to generate a dendrogram (see attached picture) with labels based on TraMineR sequences:
library(TraMineR)
library(cluster)
clusterward <- agnes(twitter.om, diss = TRUE, method = "ward")
plot(clusterward, which.plots = 2, labels=colnames(twitter_sequences))
The full code (including dataset) can be found here.
As informative as the dendrogram is graphically, it would be handy to get the same information in text and/or table format. If I call any of the aspects of the object clusterward (created by agnes), such as "order" or "merge" I get everything labeled using numbers rather than the names I get from colnames(twitter_sequences). Also, I don't see how I can output the groupings represented graphically in the dendrogram.
To summarize: How can I get the cluster output in text/table format with the labels properly displayed using R and ideally the traminer/cluster libraries?
The question concerns the cluster package. The help page for the agnes.object returned by agnes
(See http://stat.ethz.ch/R-manual/R-devel/library/cluster/html/agnes.object.html ) states that this object contains an order.lab component "similar to order, but containing observation labels instead of observation numbers. This component is only available if the original observations were labelled."
The dissimilarity matrix (twitter.om in your case) produced by TraMineR does currently not retain the sequence labels as row and column names. To get the order.lab component you have to manually assign sequence labels as both the rownames and colnames of your twitter.om matrix. I illustrate here with the mvad data provided by the TraMineR package.
library(TraMineR)
data(mvad)
## attaching row labels
rownames(mvad) <- paste("seq",rownames(mvad),sep="")
mvad.seq <- seqdef(mvad[17:86])
## computing the dissimilarity matrix
dist.om <- seqdist(mvad.seq, method = "OM", indel = 1, sm = "TRATE")
## assigning row and column labels
rownames(dist.om) <- rownames(mvad)
colnames(dist.om) <- rownames(mvad)
dist.om[1:6,1:6]
## Hierarchical cluster with agnes
library(cluster)
cward <- agnes(dist.om, diss = TRUE, method = "ward")
## here we can see that cward has an order.lab component
attributes(cward)
That gives you the order with sequence labels rather than numbers. It is not clear to me, however, which cluster outcome you want in text/table form. From the dendrogram you decide where to cut it, i.e., how many groups you want, and then cut the dendrogram with cutree, e.g. cl.4 <- cutree(cward, k = 4). The result cl.4 is a vector with the cluster membership of each sequence, and you get the list of the members of group 1, for example, with rownames(mvad.seq)[cl.4==1].
Alternatively, you can use the identify method (see ?identify.hclust) to select the groups interactively from the plot, but you need to pass the object as as.hclust(cward). Here is the code for the example:
## plot the dendrogram
plot(cward, which.plots = 2, labels=FALSE)
## and select the groups manually from the plot
x <- identify(as.hclust(cward)) ## Terminate with second mouse button
## number of groups selected
length(x)
## list of members of the first group
x[[1]]
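For output in table form, the cutree result can be turned into a data frame and split by group. The sketch below is self-contained, so it swaps in stats::hclust on a small made-up labelled distance matrix instead of agnes on dist.om; pts, hc and membership are names introduced here for illustration.

```r
# Five made-up labelled points forming three obvious groups.
pts <- matrix(c(0, 0,  0.1, 0.2,  5, 5,  5.2, 4.9,  10, 0),
              ncol = 2, byrow = TRUE,
              dimnames = list(paste0("seq", 1:5), NULL))

# Ward clustering with hclust (swapped in here for agnes).
hc <- hclust(dist(pts), method = "ward.D2")

# Cut the tree into 3 groups; the result is a named membership vector.
cl <- cutree(hc, k = 3)

# Table with one row per sequence label.
membership <- data.frame(label = names(cl), cluster = unname(cl))

# Labels listed per group.
groups <- split(membership$label, membership$cluster)

# Labels in dendrogram order, comparable to order.lab in an agnes object.
ordered_labels <- hc$labels[hc$order]
```

Printing membership or groups then gives the groupings shown in the dendrogram as text.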
Hope this helps.
