How to combine state distribution plot and separate legend in TraMineR? - r

Plotting several clusters with seqdplot in TraMineR can make the legend messy, especially when there are many states. The seqlegend function offers additional options for modifying the legend. However, I have a hard time combining a state distribution plot (seqdplot) with a separate, modified legend (seqlegend). Ideally I would plot the clusters (e.g. 9) without a legend and then add the separate legend in the free bottom-right cell, but instead the separate legend opens a new plot window. Can anyone help?
Here's an example using the biofam data. With the data I use in my own research the legend becomes much more messy since I have 11 states.
#Data
library(TraMineR)
library(WeightedCluster)
data(biofam)
biofam.seq <- seqdef(biofam[501:600, 10:25])
#OM distances
biofam.om <- seqdist(biofam.seq, method = "OM", indel = 3, sm = "TRATE")
#9 clusters
wardCluster <- hclust(as.dist(biofam.om), method = "ward.D2")
cluster9 <- cutree(wardCluster, k = 9)
#State distribution plot
seqdplot(biofam.seq, group = cluster9, with.legend = FALSE)
#Separate legend
seqlegend(biofam.seq, title = "States", ncol = 2)
#Combine state distribution plot and separate legend
#??
Thank you.

The seqplot function does not let you control the number of columns of the legend, nor does it let you add a legend title. So you have to compose the plot yourself: generate a separate plot for each group with the legend disabled, then add the legend afterwards. Here is how you can do that:
cluster9 <- factor(cluster9)
levc <- levels(cluster9)
lev <- length(levc)
par(mfrow=c(5,2))
for (i in 1:lev) {
  seqdplot(biofam.seq[cluster9 == levc[i],], border=NA, main=levc[i], with.legend=FALSE)
}
seqlegend(biofam.seq, ncol=4, cex = 1.2, title='States')
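The layout idea behind that code (a grid with one more cell than there are groups, with the legend drawn into the spare cell) can be sketched in base R alone. This is only an illustration of the pattern, not TraMineR code: the toy data, state labels and colours below are all made up, and barplot() stands in for seqdplot().

```r
# Base-R sketch of the layout pattern only: toy data, no TraMineR calls.
pdf(NULL)                                    # off-screen device so the sketch runs anywhere
set.seed(1)
states <- c("A", "B", "C")                   # hypothetical state labels
cols   <- c("tomato", "steelblue", "gold")   # hypothetical state colours
groups <- factor(rep(1:9, each = 10))        # 9 hypothetical clusters
obs    <- factor(sample(states, 90, replace = TRUE), levels = states)

op <- par(mfrow = c(5, 2), mar = c(2, 2, 2, 1))  # 10 cells: 9 groups + 1 for the legend
for (g in levels(groups)) {
  barplot(table(obs[groups == g]), col = cols, main = g)  # one panel per group, no legend
}
plot.new()                                   # advance to the empty tenth cell
legend("center", legend = states, fill = cols, ncol = 3,
       title = "States", bty = "n")          # legend alone in the spare cell
n_cells <- prod(par("mfrow"))                # number of cells in the grid
par(op)
dev.off()
```

Because the grid has exactly one more cell than there are groups, the legend ends up in the last cell instead of starting a new page.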
Update, Oct 1, 2018
Since TraMineR V 2.0-9, the seqplot family of functions supports (when applicable) the argument ncol to control the number of columns in the legend. To add a title to the legend, you still have to proceed as shown above.

AFAIK seqlegend() doesn't work when the other plots you are drawing use the group argument. In your case the only thing seqlegend() adds is the title "States". If you want a legend whose contents you can customize, you can get that by providing the alphabet and state labels used in your analysis when you define the sequence object.
The package's website has several walkthroughs and guides that cover the available options.
#Data
library(TraMineR)
library(WeightedCluster)
data(biofam)
## Generate alphabet and states
alphabet <- 0:7
states <- letters[seq_along(alphabet)]
biofam.seq <- seqdef(biofam[501:600, 10:25], states = states, alphabet = alphabet)
#OM distances
biofam.om <- seqdist(biofam.seq, method = "OM", indel = 3, sm = "TRATE")
#9 clusters
wardCluster <- hclust(as.dist(biofam.om), method = "ward.D2")
cluster9 <- cutree(wardCluster, k = 9)
#State distribution plot
seqdplot(biofam.seq, group = cluster9, with.legend = TRUE)

Related

Circular heat maps in R?

Similar questions have been asked before, but none of the answers solve my problem.
I'm trying to join two (or more) separate heat maps and turn them into a circle. I'm trying to achieve something like the circular heatmaps shown in the circlize package tutorial.
In my data, I have multiple matrices, where each matrix represents a different year. I want to create a circular heat map where each sector of the circle is a single year.
In my example below, I am using just 2 years (so 2 heat maps), but I can't seem to get it to work:
library(circlize)
# create matrix
mat1 <- matrix(runif(80), 10, 8)
mat2 <- matrix(runif(80), 10, 8)
rownames(mat1) <- rownames(mat2) <- paste0('a', 1:10)
colnames(mat1) <- colnames(mat2) <- paste0('b', 1:8)
# join together
matX <- cbind(mat1, mat2)
# set splits
split <- c(rep('a', 8), rep('b', 8))
split = factor(split, levels = unique(split))
# create circular heatmap
col_fun1 = colorRamp2(c(0, 0.5, 1), c("blue", "white", "red"))
circos.heatmap(matX, split = split, col = col_fun1, rownames.side = "inside")
circos.clear()
The above code makes:
I'm not sure where I am going wrong. When I use the ComplexHeatmap package, the matrices are split correctly, as shown below:
# using ComplexHeatmap package
library(ComplexHeatmap)
Heatmap(matX, column_split = split, show_row_dend = F, show_column_dend = F)
Any suggestions as to how I could achieve this?

Base R Choropleth: colors aren't being applied to the map according to the order of the interval/breaks which makes the map hard to read

I created a choropleth with base R but I'm struggling with the colors. First, the colors don't follow the same order as the intervals, and second, two of the intervals use the same color, all of which makes the map hard to read. This happens regardless of how many colors I use. It also doesn't matter whether I'm using brewer.pal or base colors. Here is a map with its respective legend illustrating the issue.
Below are the statements that I use to create the graph once data has been downloaded:
#Relevant packages:
library(dplyr)
library(RColorBrewer)
library(rgdal)
#create colors vector
pop_colors <- brewer.pal(8,"Purples")
#create breaks/intervals
pop_breaks <- c(0,20000,40000,60000,80000,100000,120000)
#apply breaks to population
cuts <- cut(cal_pop$Pop2016, pop_breaks, dig.lab = 6)
#create a vector with colors by population according to the interval they belong to:
color_breaks <- pop_colors[findInterval(cal_pop$Pop2016,vec = pop_breaks)]
#Create choropleth
plot(cal_pop,col = color_breaks, main = "Calgary Population (2016)")
#create legend
legend("topleft", fill = color_breaks, legend = levels(cuts), title = "Population")
I used readOGR() command to read the shape file, which I'm linking here in case anybody is interested in taking a look at the data.
I'd appreciate any advice you could give me.
Thanks!
Your error is in this line:
color_breaks <- pop_colors[findInterval(cal_pop$Pop2016,vec = pop_breaks)]
I can't read your data file, so I'll use a built-in shapefile that ships with the maptools package.
library(rgdal)
library(RColorBrewer)
nc <- readOGR(system.file("shapes/", package="maptools"), "sids")
str(nc@data)
colors <- brewer.pal(8,"Purples")
#create breaks/intervals
sid_breaks <- c(0,2,4,6,8,10,12,20,60)
#apply breaks to population
sid_cuts <- cut(nc$SID79, sid_breaks, dig.lab = 6, include.lowest = TRUE)
#create a vector with colors by population according to the interval they belong to:
sid_colors <- colors[sid_cuts]
#Create choropleth
par(mar=c(0,0,0,0))
plot(nc, col = sid_colors)
legend("bottomleft", fill = colors, legend = levels(sid_cuts), ncol = 2, title = "SID (1979)", bty="n")
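The point of the code above can be checked without any map data. The following is a minimal base-R sketch with made-up population values and a grey palette standing in for brewer.pal: the factor returned by cut() indexes the palette to give one colour per polygon, while the legend needs the palette itself (one colour per interval, in level order). In the question's legend() call, fill was given the per-polygon vector instead, which appears to be why the legend colours did not line up with the intervals.

```r
# Toy values standing in for a population column (hypothetical data).
pop    <- c(5000, 25000, 45000, 65000, 85000, 119000, 15000)
breaks <- c(0, 20000, 40000, 60000, 80000, 100000, 120000)
pal    <- gray(seq(0.9, 0.2, length.out = length(breaks) - 1))  # one colour per interval

cuts      <- cut(pop, breaks, dig.lab = 6, include.lowest = TRUE)
poly_cols <- pal[cuts]   # the factor's integer codes index the palette: one colour per polygon

# For the map:    plot(<polygons>, col = poly_cols)
# For the legend: legend("topleft", fill = pal, legend = levels(cuts))
```

The palette and levels(cuts) have the same length and the same order, so legend entry i always shows the colour used for interval i.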

How to plot an nmds with coloured/symbol points based on SIMPROF

Hi, I am trying to plot an NMDS of assemblage data, held as a Bray-Curtis dissimilarity matrix, in R. I have been able to apply ordiellipse() and ordihull(), and even change the colours based on group factors created by cutree() on an hclust() result,
e.g. using the dune data from the vegan package:
library(vegan)
data(dune)
Dune.dis <- vegdist(dune, method = "bray")
Dune.mds <- metaMDS(dune, distance = "bray", k=2)
#hierarchical cluster
clua <- hclust(Dune.dis, "average")
plot(clua, hang = -1)
# set groupings
rect.hclust(clua, 4)
grp <- cutree(clua, 4)
#plot mds
plot(Dune.mds, display = "sites", type = "text", cex = 1.5)
#show groupings
ordiellipse(Dune.mds, groups = grp, border = 1, col = "red", lwd = 3)
or even colour the points just by the cutree groups:
colvec <- c("red2", "cyan", "deeppink3", "green3")
colvec[grp]
plot(Dune.mds, display = "sites", type = "text", cex = 1.5) #or use type = "points"
points(Dune.mds, col = colvec[grp], bg = colvec[grp], pch = 21)
However, what I really want to do is use the simprof function from the clustsig package and then colour the points based on the significant groupings. This is more of a coding question: I could build the vector of group factors by hand, but I am sure there is a more efficient way to do it.
Here's my code so far for that:
simp <- simprof(Dune.dis, num.expected = 1000, num.simulated = 999, method.cluster = "average", method.distance = "braycurtis", alpha = 0.05, sample.orientation = "row")
#plot dendrogram
simprof.plot(simp, plot = TRUE)
Now I am just not sure how to do the next step and plot the NMDS using the groupings defined by SIMPROF. How do I turn the SIMPROF results into a grouping factor without literally typing it out myself?
Thanks in advance.
You wrote that you know how to get colours from an hclust object with cutree. Then read the documentation of clustsig::simprof. It says that simprof returns an hclust object within its result object. It also returns numgroups, which is the suggested number of clusters. Now you have all the information you need to use cutree on an hclust object, as you already know how to do. If your simprof result is called simp, use cutree(simp$hclust, simp$numgroups) to extract the integer vector of group memberships corresponding to the clustsig::simprof result, and use this vector to pick the colours.
I have never used simprof or clustsig, but I gathered all this information from its documentation.
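A small sketch of that suggestion, using only base R so it runs without clustsig. The simp object here is a hand-built stand-in that mimics just the two components named above (hclust and numgroups); the value 4 and the USArrests data are arbitrary assumptions for illustration, and a real simprof() run would supply its own values.

```r
# Stand-in for a simprof() result: only the two components named above are mimicked.
hc   <- hclust(dist(scale(USArrests)), method = "average")
simp <- list(hclust = hc, numgroups = 4)   # hypothetical; a real run sets numgroups itself

grp    <- cutree(simp$hclust, k = simp$numgroups)  # integer group id per observation
colvec <- c("red2", "cyan", "deeppink3", "green3") # needs at least numgroups colours
pt_col <- colvec[grp]                              # one colour per observation
```

With the real objects, pt_col could then be passed to the plotting call in the same way the question already does with cutree groups, for example points(Dune.mds, col = pt_col, bg = pt_col, pch = 21).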

How to cut a dendrogram in r

Okay so I'm sure this has been asked before but I can't find a nice answer anywhere after many hours of searching.
I have some data, I run a classification then I make a dendrogram.
The problem has to do with aesthetics, specifically: (1) how to cut according to the number of groups (in this example I want 3), (2) how to align the group labels with the branches of the tree, and (3) how to re-scale so that there aren't any huge gaps between the groups.
More on (3): I have a dataset which is very species rich, and there would be ~1000 groups without cutting. If I cut at, say, 3, the tree has some branches on the right and one 'miles' off to the right, which I would want to re-scale so that it's closer. All of this is possible via external programs, but I want to do it all in R!
Bonus points if you can put an average silhouette width plot nested into the top right of this plot
Here is example using iris data
library(ggplot2)
library(vegan) # provides vegdist()
data(iris)
df = data.frame(iris)
df$Species = NULL
ED10 = vegdist(df,method="euclidean")
EucWard_10 = hclust(ED10,method="ward.D2")
hcd_ward10 = as.dendrogram(EucWard_10)
plot(hcd_ward10)
plot(cut(hcd_ward10, h = 10)$upper, main = "Upper tree of cut at h=10")
I suspect what you want to look at is the dendextend R package (it also has a paper in the journal Bioinformatics).
I am not fully sure about your question (3), since I am not sure I understand what rescaling means here. What I can tell you is that you can do quite a lot with dendextend. Here is a quick example that colors the branches and labels for 3 groups.
library(ggplot2)
library(vegan)
data(iris)
df = data.frame(iris)
df$Species = NULL
ED10 = vegdist(df,method="euclidean")
EucWard_10 = hclust(ED10,method="ward.D2")
hcd_ward10 = as.dendrogram(EucWard_10)
plot(hcd_ward10)
# install.packages("dendextend") # run once if the package is not installed
library(dendextend)
dend <- hcd_ward10
dend <- color_branches(dend, k = 3)
dend <- color_labels(dend, k = 3)
plot(dend)
You can also get an interactive dendrogram by using plotly (ggplot method is available through dendextend):
library(plotly)
library(ggplot2)
p <- ggplot(dend)
ggplotly(p)
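For point (1) of the question, cutting into a fixed number of groups, a base-R sketch without dendextend may also be useful. It assumes that dist() is an acceptable replacement for vegdist(..., method="euclidean"), which should give the same Euclidean distances, and it uses cutree() for the memberships and rect.hclust() to mark the groups on the tree.

```r
# Base-R sketch: cut the iris tree into k = 3 groups and mark them on the plot.
df  <- iris[, 1:4]                           # numeric columns only
hc  <- hclust(dist(df), method = "ward.D2")  # dist() assumed equivalent to Euclidean vegdist()
grp <- cutree(hc, k = 3)                     # group membership for each observation
table(grp)                                   # group sizes

pdf(NULL)                                    # off-screen device so the sketch runs anywhere
plot(hc, labels = FALSE, hang = -1)
rect.hclust(hc, k = 3, border = 2:4)         # one rectangle per group
dev.off()
```

The grp vector can be reused to colour labels or points elsewhere, in the same way as the dendextend colouring above.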

Ordihull label with single occurrence

I would like to plot groups for my ordination using the function ordihull in vegan. However, I have some groups with only one occurrence. With ordihull these sites do not appear in the plot. See the example below, where only one site has BF as management. What I would like to have is a BF label where the one remaining BF management site is located in the ordination plot.
library(vegan)
data(dune)
data(dune.env)
#remove all but one row with BF as management
dune <- dune[-c(2,11),]
dune.env <- dune.env[-c(2,11),]
mod <- cca(dune ~ Management, dune.env)
attach(dune.env)
plot(mod, type="n", scaling = 3)
pl <- ordihull(mod, Management, scaling = 3, label = TRUE)
ordihull ignores groups with a single observation and thus doesn't populate the group centroids object with the centre of the convex hull. You could argue it should; I'll need to take this up with Jari and see if we can fix this.
To solve the problem, you have to add the location of the single observation in a secondary step using the text() method. [With the correct removal of all bar one of the BF observations, -c(2,11)] the following does what you want:
plot(mod, type="n", scaling = 3)
with(dune.env, ordihull(mod, Management, scaling = 3, label = TRUE))
with(dune.env,
     text(mod, labels = Management, select = Management == "BF",
          scaling = 3, display = "sites"))
This is made trivial because you can specify select to plot only the one observation with Management == "BF".
