How to plot an nmds with coloured/symbol points based on SIMPROF - r

Hi, I am trying to plot an NMDS of assemblage data held in a Bray-Curtis dissimilarity matrix in R. I have been able to apply ordiellipse() and ordihull(), and even change the colours based on group factors created by cutree() on an hclust() object,
e.g. using the dune data from the vegan package:
library(vegan)
data(dune)
Dune.dis <- vegdist(dune, method = "bray")
Dune.mds <- metaMDS(dune, distance = "bray", k = 2)
#hierarchical cluster
clua <- hclust(Dune.dis, "average")
plot(clua, hang = -1)
# set groupings
rect.hclust(clua, 4)
grp <- cutree(clua, 4)
#plot mds
plot(Dune.mds, display = "sites", type = "text", cex = 1.5)
#show groupings
ordiellipse(Dune.mds, groups = grp, border = 1, col = "red", lwd = 3)
Or even colour the points by the cutree() groups:
colvec <- c("red2", "cyan", "deeppink3", "green3")
colvec[grp]
plot(Dune.mds, display = "sites", type = "text", cex = 1.5) #or use type = "points"
points(Dune.mds, display = "sites", col = colvec[grp], bg = colvec[grp], pch = 21)
However, what I really want to do is use the simprof() function from the package "clustsig" and then colour the points based on the significant groupings. This is more of a coding question: I could build the vector of group factors by hand, but I am sure there is a more efficient way to do it.
Here's my code so far for that:
simp <- simprof(Dune.dis, num.expected = 1000, num.simulated = 999, method.cluster = "average", method.distance = "braycurtis", alpha = 0.05, sample.orientation = "row")
#plot dendrogram
simprof.plot(simp, plot = TRUE)
Now I am not sure how to do the next step, which is to plot the NMDS using the groupings defined by SIMPROF. How do I turn the SIMPROF results into a grouping factor without typing it out myself?
Thanks in advance.

You wrote that you know how to get colours from an hclust object with cutree. Then read the documentation of clustsig::simprof. It says that simprof returns an hclust object within its result object. It also returns numgroups, which is the suggested number of clusters. Now you have all the information you need to use cutree on an hclust object, as you already know how to do. If your simprof result is called simp, use cutree(simp$hclust, simp$numgroups) to extract the integer vector corresponding to the clustsig::simprof result, and use this vector to assign the colours.
I have never used simprof or clustsig; I gathered all this information from the documentation.
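An alternative to cutree() is to build the grouping vector directly from the list of significant clusters. The sketch below assumes, from memory of the clustsig documentation, that the result has an element simp$significantclusters holding one character vector of sample labels per group; the list sig_clusters and the labels in site_names are made-up stand-ins so the code runs without clustsig.

```r
# Hypothetical stand-in for simp$significantclusters (assumed structure:
# a list with one character vector of sample labels per significant group)
sig_clusters <- list(c("1", "2", "5"), c("3", "4"), c("6"))

# Sample labels in the order they appear in the ordination (assumed here)
site_names <- as.character(1:6)

# Start with an empty named vector and fill in the group number per sample
grp <- integer(length(site_names))
names(grp) <- site_names
for (k in seq_along(sig_clusters)) {
  grp[sig_clusters[[k]]] <- k
}

# Convert to a factor so it can be used for groups or colour lookup
grp <- factor(grp)
```

With a real result, replacing sig_clusters by simp$significantclusters and site_names by rownames(dune) should give a factor that can be used as colvec[grp] or passed to ordihull(); check the labels against your data first.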

Related

How to specify tm_fill() if I want it to be a variable from a new object?

I am trying to create an R function that would run a GWR on variables that the user specifies from a Spatial Polygons Data Frame. The end result of running the function are two mappings - one of the independent variable's values and one of the coefficient values from the GWR model. I'm having trouble with the second map.
I have managed to create the GWR model and a 'results' object for the coefficients that I would be visualizing.
gwr.model <- gwr(SpatialPolygonsDataFrame@data[, y] ~ SpatialPolygonsDataFrame@data[, x],
                 data = SpatialPolygonsDataFrame,
                 adapt = GWRbandwidth,
                 hatmatrix = TRUE,
                 se.fit = TRUE)
results <- as.data.frame(gwr.model$SDF)
gwr.map <- SpatialPolygonsDataFrame
gwr.map@data <- cbind(SpatialPolygonsDataFrame@data, as.matrix(results))
To create the visualization of the GWR coefficients, I have to set tm_fill() to a column from the 'results' object, but I do not know how to do it so that the function can be used with any Spatial Polygons Data Frame. So far, I have tried using the paste0() function, like so:
map2 <- tm_shape(gwr.map) + tm_fill(paste0("SpatialPolygonsDataFrame.", x), n = 5, style = "quantile", title = "Coefficient") +
tm_layout(frame = FALSE, legend.text.size = 0.5, legend.title.size = 0.6)
But I got an error saying that the fill argument is neither colors nor a valid variable name.
I'll be grateful for any tips that could help me resolve the issue.
Switching to the package sf - leaving sp behind - probably will solve your problem here.
In the absence of a reproducible example, let me try to suggest the following here:
Convert your object with gwr.map.sf <- sf::st_as_sf(gwr.map). Then add the results of your GWR simply as a new column: gwr.map.sf$results <- results (my understanding is that the dimensions should fit).
Finally you should be able to plot like this:
map2 <- tm_shape(gwr.map.sf) + tm_fill("results", n = 5, style = "quantile", title = "Coefficient") +
tm_layout(frame = FALSE, legend.text.size = 0.5, legend.title.size = 0.6)
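One possible reason the paste0() name is not found is that a formula written with `[, x]` indexing labels its coefficient after the whole expression rather than after the variable. The sketch below shows this with lm() on mtcars as a stand-in. That gwr() labels the columns of gwr.model$SDF in the same way is an assumption, so it is worth checking names(results) on your own model.

```r
# Hypothetical inputs: any data frame plus the two column names a user would pass
y <- "mpg"
x <- "wt"

# Indexing inside the formula: the coefficient is named after the expression
fit_indexed <- lm(mtcars[, y] ~ mtcars[, x])
names(coef(fit_indexed))

# Building the formula from the names: the coefficient is named after the variable
fit_named <- lm(reformulate(x, response = y), data = mtcars)
names(coef(fit_named))
```

If the assumption holds, calling gwr(reformulate(x, response = y), data = ...) would give a coefficient column named after x itself, which could then be passed to tm_fill() as the string x.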

How to combine state distribution plot and separate legend in traminer?

Plotting several clusters using seqdplot in TraMineR can make the legend messy, especially in combination with numerous states. This calls for additional options for modifying the legend which is available with the function seqlegend. However, I have a hard time combining a state distribution plot (seqdplot) with a separate modified legend (seqlegend). Ideally one wants to plot the clusters (e.g. 9) without a legend and then add the separate legend in the available bottom right row, but instead the separate legend is generating a new plot window. Can anyone help?
Here's an example using the biofam data. With the data I use in my own research the legend becomes much more messy since I have 11 states.
#Data
library(TraMineR)
library(WeightedCluster)
data(biofam)
biofam.seq <- seqdef(biofam[501:600, 10:25])
#OM distances
biofam.om <- seqdist(biofam.seq, method = "OM", indel = 3, sm = "TRATE")
#9 clusters
wardCluster <- hclust(as.dist(biofam.om), method = "ward.D2")
cluster9 <- cutree(wardCluster, k = 9)
#State distribution plot
seqdplot(biofam.seq, group = cluster9, with.legend = F)
#Separate legend
seqlegend(biofam.seq, title = "States", ncol = 2)
#Combine state distribution plot and separate legend
#??
Thank you.
The seqplot function does not let you control the number of columns of the legend, nor does it let you add a legend title. So you have to compose the plot yourself by generating a separate plot for each group with the legend disabled and adding the legend afterwards. Here is how you can do that:
cluster9 <- factor(cluster9)
levc <- levels(cluster9)
lev <- length(levc)
par(mfrow=c(5,2))
for (i in 1:lev) {
  seqdplot(biofam.seq[cluster9 == levc[i], ], border = NA, main = levc[i], with.legend = FALSE)
}
seqlegend(biofam.seq, ncol=4, cex = 1.2, title='States')
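The same layout idea can be sketched with base graphics only: fill a 5 x 2 grid with nine panels and draw the legend in the spare tenth cell. This is a rough analogue with made-up data; groups, states and cols are hypothetical names, and barplot() stands in for seqdplot().

```r
set.seed(1)

# Made-up grouping (9 clusters) and state alphabet, for illustration only
groups <- factor(sample(1:9, 100, replace = TRUE))
states <- c("a", "b", "c", "d")
cols   <- c("steelblue", "orange", "forestgreen", "firebrick")

# 5 x 2 grid: one cell per cluster, the last cell is left for the legend
op <- par(mfrow = c(5, 2), mar = c(2, 2, 2, 1))
for (g in levels(groups)) {
  # Random state counts for this cluster, standing in for a state distribution
  n_g <- sum(groups == g)
  counts <- table(factor(sample(states, n_g, replace = TRUE), levels = states))
  barplot(counts, col = cols, main = paste("Cluster", g))
}

# Open an empty panel in the next free cell and draw the legend inside it
plot.new()
legend("center", legend = states, fill = cols, ncol = 2, title = "States")
par(op)
```

The point is only that the legend is drawn as its own panel, so it lands in the next free cell of the grid instead of a new plot window.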
Update, Oct 1, 2018:
Since TraMineR V 2.0-9, the seqplot family of functions now support (when applicable) the argument ncol to control the number of columns in the legend. To add a title to the legend, you still have to proceed as shown above.
AFAIK seqlegend() doesn't work when the other plots you are drawing use the group argument. In your case the only thing seqlegend() adds is the title "States". If you want to customize what appears in the legend, you can do that by providing the alphabet and state labels used in your analysis.
The package's website has several walkthroughs and guides covering the various options.
#Data
library(TraMineR)
library(WeightedCluster)
data(biofam)
## Generate alphabet and states
alphabet <- 0:7
states <- letters[seq_along(alphabet)]
biofam.seq <- seqdef(biofam[501:600, 10:25], states = states, alphabet = alphabet)
#OM distances
biofam.om <- seqdist(biofam.seq, method = "OM", indel = 3, sm = "TRATE")
#9 clusters
wardCluster <- hclust(as.dist(biofam.om), method = "ward.D2")
cluster9 <- cutree(wardCluster, k = 9)
#State distribution plot
seqdplot(biofam.seq, group = cluster9, with.legend = TRUE)

Run points() after plot() on a dataframe

I'm new to R and want to plot specific points over an existing plot. I'm using the swiss data frame, which I visualize with plot(swiss).
After this, I want to add the outliers given by the Mahalanobis distance:
mu_hat <- apply(swiss, 2, mean); sigma_hat <- cov(swiss)
mahalanobis_distance <- mahalanobis(swiss, mu_hat, sigma_hat)
outliers <- swiss[names(mahalanobis_distance[mahalanobis_distance > 10]),]
points(outliers, pch = 'x', col = 'red')
but this last line has no effect, as the outlier points aren't added to the previous plot. I see that if I repeat this procedure on a pair of variables, say
plot(swiss[2:3])
points(outliers[2:3], pch = 'x', col = 'red')
the red points are added to the plot.
Question: is there any restriction on how the points() function can be used with a multivariate data frame?
Here's a solution using GGally::ggpairs. It's a little ugly as we need to modify the ggally_points function to specify the desired color scheme.
I've assumed that mu_hat = colMeans(swiss) and sigma_hat = cov(swiss).
library(dplyr)
library(GGally)
swiss %>%
  bind_cols(distance = mahalanobis(swiss, colMeans(swiss), cov(swiss))) %>%
  mutate(is_outlier = ifelse(distance > 10, "yes", "no")) %>%
  ggpairs(columns = 1:6,
          mapping = aes(color = is_outlier),
          upper = list(continuous = function(data, mapping, ...) {
            ggally_points(data = data, mapping = mapping) +
              scale_colour_manual(values = c("black", "red"))
          }),
          lower = list(continuous = function(data, mapping, ...) {
            ggally_points(data = data, mapping = mapping) +
              scale_colour_manual(values = c("black", "red"))
          }),
          axisLabels = "internal")
Unfortunately this isn't possible the way you're currently doing things. When plotting a data frame R produces many plots and aligns them. What you're actually seeing there is 6 by 6 = 36 individual plots which have all been aligned to look nice.
When you call points(), it places the points on the current plot. That doesn't really make sense when you have 36 plots, at least not in the way you want it to.
ggplot2 is a really powerful tool in R and provides far greater customizability. For example, you could set up the data frame to include your outliers, label them as "outlier", and show them in each plot that you have set up as facets. The more you explore it, the more likely you are to find other plots that suit your needs as well.
Plotting a data frame in base R is a good exploratory tool. You could set up those outliers as a separate data frame and plot it, so you can see each of the 6 by 6 plots side by side and compare. It all depends on your goal. If your goal is to produce exactly what you've described, the ggplot2 package will help you create something more professional. As @Gregor suggested in the comments, looking up the function ggpairs from the GGally package would be a good place to start.
A quick google image search shows some funky plots akin to what you're after and then some!
Find it here
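If you would rather stay in base R, one option is to let pairs() do the highlighting through its panel argument, which is called once per panel with the x and y columns of that panel. The following is a minimal sketch using the question's threshold of 10; d2 and is_out are names introduced here for illustration.

```r
# Squared Mahalanobis distance of each row from the column means
d2 <- mahalanobis(swiss, colMeans(swiss), cov(swiss))

# Logical flag per row, using the same cut-off as in the question
is_out <- d2 > 10

# The panel function draws every panel itself, so the outlier styling
# is applied in all 36 cells rather than on one "current" plot
pairs(swiss, panel = function(x, y, ...) {
  points(x, y,
         col = ifelse(is_out, "red", "black"),
         pch = ifelse(is_out, 4, 1))
})
```

This works because each panel receives all rows in their original order, so the is_out flags line up with the points being drawn.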

Plotting quantile regression by variables in a single page

I am running quantile regressions for several independent variables separately (same dependent). I want to plot only the slope estimates over several quantiles of each variable in a single plot.
Here's a toy data:
set.seed(1988)
y <- rnorm(50, 5, 3)
x1 <- rnorm(50, 3, 1)
x2 <- rnorm(50, 1, 0.5)
# Running Quantile Regression
require(quantreg)
fit1 <- summary(rq(y~x1, tau=1:9/10), se="boot")
fit2 <- summary(rq(y~x2, tau=1:9/10), se="boot")
I want to plot only the slope estimates over quantiles. Hence, I am giving parm=2 in plot.
plot(fit1, parm=2)
plot(fit2, parm=2)
Now, I want to combine both these plots in a single page.
What I have tried so far:
I tried setting par(mfrow=c(2,2)) and plotting them, but it produces a blank page.
I tried using gridExtra and gridGraphics without success, converting the base graphs into grob objects as stated here.
I tried using the layout() function as in this document.
I am trying to look into the source code of plot.rqs, but I am unable to understand how it plots the confidence bands (I'm able to plot only the coefficients over quantiles) or how to change the mfrow parameter there.
Can anybody point out where I am going wrong? Should I look into the source code of plot.rqs and change any parameters there?
While quantreg::plot.summary.rqs has an mfrow parameter, it uses it to override par('mfrow') so as to facet over parm values, which is not what you want to do.
One alternative is to parse the objects and plot manually. You can pull the tau values and coefficient matrix out of fit1 and fit2, which are just lists of values for each tau, so in tidyverse grammar,
library(tidyverse)
c(fit1, fit2) %>%    # concatenate lists, flattening to one level
  # iterate over list and rbind to data.frame
  map_dfr(~cbind(tau = .x[['tau']],    # from each list element, cbind the tau...
                 coef(.x) %>%    # ...and the coefficient matrix,
                   data.frame(check.names = TRUE) %>%    # cleaned a little
                   rownames_to_column('term'))) %>%
  filter(term != '(Intercept)') %>%    # drop intercept rows
  # initialize plot and map variables to aesthetics (positions)
  ggplot(aes(x = tau, y = Value,
             ymin = Value - Std..Error,
             ymax = Value + Std..Error)) +
  geom_ribbon(alpha = 0.5) +
  geom_line(color = 'blue') +
  facet_wrap(~term, nrow = 2)    # make a plot for each value of `term`
Pull more out of the objects if you like, add the horizontal lines of the original, and otherwise go wild.
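The same manual approach also works with base graphics, where par(mfrow) is respected because plot.summary.rqs is never called. This is only a sketch under the question's toy setup: it assumes the quantreg package is installed, and the band drawn is the estimate plus or minus one bootstrap standard error, not the confidence band of the original plots.

```r
library(quantreg)

# Toy data from the question
set.seed(1988)
y  <- rnorm(50, 5, 3)
x1 <- rnorm(50, 3, 1)
x2 <- rnorm(50, 1, 0.5)

# One summary object per predictor, kept in a named list
fits <- list(x1 = summary(rq(y ~ x1, tau = 1:9 / 10), se = "boot"),
             x2 = summary(rq(y ~ x2, tau = 1:9 / 10), se = "boot"))

op <- par(mfrow = c(2, 1), mar = c(4, 4, 2, 1))
for (nm in names(fits)) {
  # Each element of a summary object corresponds to one tau
  tau <- sapply(fits[[nm]], function(s) s$tau)
  est <- sapply(fits[[nm]], function(s) coef(s)[2, 1])  # slope estimate
  se  <- sapply(fits[[nm]], function(s) coef(s)[2, 2])  # bootstrap std. error

  plot(tau, est, type = "n", ylim = range(est - se, est + se),
       xlab = "tau", ylab = "slope", main = nm)
  # Shaded band of +/- 1 standard error, then the estimates on top
  polygon(c(tau, rev(tau)), c(est - se, rev(est + se)),
          col = "grey80", border = NA)
  lines(tau, est, type = "b", col = "blue")
}
par(op)
```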
Another option is to use magick to capture the original images (or save them with any device and reread them) and manually combine them:
library(magick)
plots <- image_graph(height = 300) # graphics device to capture plots in image stack
plot(fit1, parm = 2)
plot(fit2, parm = 2)
dev.off()
im1 <- image_append(plots, stack = TRUE) # attach images in stack top to bottom
image_write(im1, 'rq.png')
The plot method used by the quantreg package has its own mfrow parameter. If you do not specify it, it enforces a layout which it chooses on its own (and thus overrides your par(mfrow = c(2,2))).
Using the mfrow parameter within plot.rqs:
# make one plot, change the layout
plot(fit1, parm = 2, mfrow = c(2,1))
# add a new plot
par(new = TRUE)
# create a second plot
plot(fit2, parm = 2, mfrow = c(2,1))

How to draw line around significant values in R's corrplot package

I have been asked to produce a correlation plot for a collaborator.
My choice is to use R for the task, specifically the corrplot package.
I have been researching on the internet and I found multiple ways to obtain such graphics, but not the specific graphic I was asked for (as you can see in the picture the significant values are highlighted by drawing a square around the significant tile), which is puzzling me.
Example of the correlation plot required
The closest result I have achieved uses the code below, but I cannot find an option to draw a line around the significant tiles (if one exists).
#Insignificant correlations are left blank
corrplot(res3$r, type="upper", order="hclust",
p.mat = res3$P, sig.level = 0.01, insig = "blank")
I tried adding the "addrect" parameter but it didn't work.
#Insignificant correlation are crossed
corrplot(res3$r, type="upper", order="hclust", p.mat = res3$P,
addrect=2, sig.level = 0.01, insig = "blank")
Any help will be appreciated.
corrplot allows you to add new plots to an already existing one. Therefore, once you've created the plot of the initial correlation matrix, you can simply add those cells that you want to highlight in an iterative manner using corrplot(..., add = TRUE).
The only thing required to achieve your goal is an index vector (which I called 'ids') to tell R which cells to highlight. Note that, for simplicity, I took a random sample of the initial correlation matrix, but something like ids <- which(p.value < 0.01) (assuming that you've stored your significance levels in a separate vector) would work similarly.
library(corrplot)
## create and visualize correlation matrix
data(mtcars)
M <- cor(mtcars)
corrplot(M, cl.pos = "n", na.label = " ")
## select cells to highlight (e.g., statistically significant values)
set.seed(10)
ids <- sample(1:length(M), 15L)
## duplicate correlation matrix and reject all irrelevant values
N <- M
N[-ids] <- NA
## add significant cells to the initial corrplot iteratively
for (i in ids) {
  O <- N
  O[-i] <- NA
  corrplot(O, cl.pos = "n", na.label = " ", addgrid.col = "black", add = TRUE,
           bg = "transparent", tl.col = "transparent")
}
Note that you could also add all values to highlight in one go (i.e., without requiring a for loop) using corrplot(N, ...), but in that case, an undesirable black margin is drawn all around the plotting area.
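To replace the random sample with actual significance results, the index vector can be derived from a matrix of p-values. Below is a minimal base R sketch that computes pairwise p-values with cor.test(); p_mat is a name introduced here for illustration, and the 0.01 threshold is taken from the question.

```r
data(mtcars)
vars <- names(mtcars)
n <- length(vars)

# Matrix of pairwise p-values; the diagonal stays NA
p_mat <- matrix(NA_real_, n, n, dimnames = list(vars, vars))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    if (i != j) {
      p_mat[i, j] <- cor.test(mtcars[[i]], mtcars[[j]])$p.value
    }
  }
}

# Linear (column-major) indices of the significant cells; which() skips NA,
# so the diagonal is never selected
ids <- which(p_mat < 0.01)
```

Because p_mat has the same dimensions and variable order as cor(mtcars), these linear indices address the same cells in M and can be used in place of the sampled ids, as long as the plot is not reordered (e.g. by order = "hclust").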
