Edit betadisper permutest plot - r

I have used the script below to generate this betadisper plot between 2 communities.
In my "df", the first column is station names (x13)
I have 2 questions:
1) There is a point behind the "ABC" label, so how do I make the label transparent? Preferably with a different colour for each community.
2) How do I add the station names next to each point so I can visually compare which stations are most similar?
library(vegan) # vegdist(), betadisper(), permutest()
df <- read.csv("NMDS matrix_csv_NEW.csv", header=T, row.names=1, sep= ",")
dis <- vegdist(df)
groups <- factor(c(rep(1,8), rep(2,5)), labels = c("ABC","DEF"))
mod <- betadisper(dis, groups)
permutest(mod, pairwise = TRUE)
plot(mod, ellipse = TRUE, hull = FALSE, main= "MultiVariate Permutation")

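To answer 2), here's how to plot the station names on top of the points. The station names are the row names of df, which are in the same order as the rows of mod$vectors.
text(mod$vectors[,1:2], labels=rownames(df))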
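Both questions can be sketched together with vegan alone. This is a minimal sketch, not the asker's data: it assumes the varespec dataset shipped with vegan and an arbitrary 16/8 split into two groups, and it assumes a recent vegan version in which plot.betadisper accepts the label and col arguments. The built-in centroid labels are switched off and redrawn with text(), so no opaque box hides a point.

```r
library(vegan)

# Assumption: built-in example data and an arbitrary 16/8 grouping
data(varespec)
dis <- vegdist(varespec)
groups <- factor(c(rep(1, 16), rep(2, 8)), labels = c("grazed", "ungrazed"))
mod <- betadisper(dis, groups)

grp.col <- c("blue", "darkgreen")   # one colour per community

# label = FALSE suppresses the boxed centroid labels
plot(mod, ellipse = TRUE, hull = FALSE, label = FALSE, col = grp.col,
     main = "Multivariate dispersion")

# Site names next to each point, coloured by group
scores.xy <- mod$vectors[, 1:2]
text(scores.xy, labels = rownames(varespec), pos = 3, cex = 0.6,
     col = grp.col[as.integer(groups)])

# Group names at the centroids, without a background box
text(mod$centroids[, 1:2], labels = levels(groups), col = grp.col, font = 2)
```

With the data from the question, rownames(df) would take the place of rownames(varespec).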

Here is a possible solution to your problem.
Download the myplotbetadisp.r file from this link and place it in the working directory (warning: do not save the file as myplotbetadisp.r.txt!).
Some additional options are available in the myplotbetadisper function:
fillrect, fill color of the box where centroid labels are printed;
coltextrect, vector of colors for centroid labels;
alphaPoints, alpha transparency for centroid points;
labPoints, vector of labels plotted close to points;
poslabPoints, position specifier for the text in labPoints.
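library(vegan)
source("myplotbetadisp.r") # the downloaded file; assumed to define myplotbetadisper() and addAlpha()
# A dummy data generation process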
n <- 100
df <- matrix(runif(13*n),nrow=13)
# Compute dissimilarity indices
dis <- vegdist(df)
groups <- factor(c(rep(1,8), rep(2,5)), labels = c("ABC","DEF"))
# Analysis of multivariate homogeneity of group dispersions
mod <- betadisper(dis, groups)
labPts <- LETTERS[1:13]
col.fill.rect <- addAlpha(col2rgb("gray65"), alpha=0.5)
col.text.rect <- apply(col2rgb(c("blue","darkgreen")), 2, addAlpha, alpha=0.5)
transp.centroids <- 0.7
myplotbetadisper(mod, ellipse = TRUE, hull = FALSE,
fillrect=col.fill.rect, coltextrect=col.text.rect,
alphaPoints=transp.centroids, labPoints=labPts,
main= "MultiVariate Permutation")
Here is the plot
Hope it can help you.


group variables and color by type R igraph

I have a graph where I need the vertices to be drawn as names only, each in a different color according to the type of data (stock, forex or commodities), and I don't understand how to do it.
Something similar is done in the post "igraph group vertices based on community", but I don't need circles, only labels, colored according to the type of data.
library(igraph) # For the network functions used below
library(Hmisc) # For correlation matrix
library(corrplot) # For correlation matrix
library("zoo")
library("reshape") # For "melt" function/cluster network
# We compute the dynamic interdependence. Direct edges denoted as Granger causality linkages
# We compute the contemporaneous interdependence. Indirect edges denoted as Partial correlation linkages
stock <- table[1:55, 1:55]
forex <- table[56:95, 56:95]
commodities <- table[96:116, 96:116]
dim(stock); dim(forex); dim(commodities)
# Full network
network.spill <- graph.adjacency(table, mode='directed')
degree <- degree(network.spill) # number of adjacent edges
between <- betweenness(network.spill)
close <- closeness(network.spill, mode = "all")
autorsco <- authority.score(network.spill)$vector
eccentry <- eccentricity(network.spill, mode = "all")
measures <- cbind(as.matrix(degree), as.matrix(between), as.matrix(close), as.matrix(autorsco), as.matrix(eccentry))
V(network.spill)$size <- round((degree-min(degree))/(max(degree)-min(degree))) # To create vertex
V(network.spill)$shape <- "sphere"
# For stock returns layout_nicely(network.spill)
par(mfcol = c(1, 1))
plot( network.spill, layout = layout_nicely(network.spill), vertex.color = c("gold"), vertex.label.cex=0.6,
vertex.size = autorsco*2,edge.curved = 0.2, edge.arrow.mode=0.5,edge.arrow.size=1.5) # To make the chart
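One way to get labels only, coloured by asset class, is to hide the vertex shape and pass a colour vector for the labels. The sketch below is a toy stand-in, not the asker's network: the 10 x 10 adjacency matrix, the node names and the colour choices are all made up for illustration.

```r
library(igraph)

# Hypothetical toy adjacency matrix standing in for the 116 x 116 table
set.seed(1)
node.names <- c(paste0("S", 1:4), paste0("F", 1:3), paste0("C", 1:3))
adj <- matrix(rbinom(100, 1, 0.2), 10, 10,
              dimnames = list(node.names, node.names))
diag(adj) <- 0

g <- graph_from_adjacency_matrix(adj, mode = "directed")

# Asset class of each vertex, in the same order as the rows of the matrix
type <- rep(c("stock", "forex", "commodities"), times = c(4, 3, 3))
type.col <- c(stock = "steelblue", forex = "darkorange", commodities = "darkgreen")
lab.col <- unname(type.col[type])

plot(g, layout = layout_nicely(g),
     vertex.shape = "none",          # no circles, labels only
     vertex.label.color = lab.col,   # label colour by asset class
     vertex.label.cex = 0.8,
     edge.arrow.size = 0.3)
legend("topleft", legend = names(type.col), text.col = type.col, bty = "n")
```

For the network in the question, where rows 1:55 are stocks, 56:95 forex and 96:116 commodities, the type vector would be built with times = c(55, 40, 21).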

How to add labels to original data given clustering result using hclust

Say I have some unlabeled data which I know should be clustered into six categories, like for example this dataset:
library(readr) # for read_table()
ts <- read_table(url("http://kdd.ics.uci.edu/databases/synthetic_control/synthetic_control.data"), col_names = FALSE)
If I create an hclust object with a sample of 60 from the original dataset like so:
n <- 10
s <- sample(1:100, n)
idx <- c(s, 100+s, 200+s, 300+s, 400+s, 500+s)
ts.samp <- ts[idx,]
observedLabels <- c(rep(1,n), rep(2,n), rep(3,n), rep(4,n), rep(5,n), rep(6,n))
# compute DTW distances
library(dtw)#Dynamic Time Warping (DTW)
distMatrix <- dist(ts.samp, method= 'DTW')
# hierarchical clustering
hc <- hclust(distMatrix, method='average')
I know that I can then add the labels to the dendrogram for viewing like this:
observedLabels <- c(rep(1,n), rep(2,n), rep(3,n), rep(4,n), rep(5,n), rep(6,n))
plot(hc, labels=observedLabels, main="")
However, I would like to add the cluster labels to the data frame that was clustered. So for ts.samp I would like to add an extra column with the cluster that each observation has been assigned to.
It would seem that ts.samp$cluster <- hc$label should add the cluster to the data frame, however hc$label returns NULL.
Can anyone help with extracting this information?
You need to define a level where you cut your dendrogram, this will form the groups.
labels <- cutree(hc, k = 3) # you set the number of k that's more appropriate, see how to read a dendrogram
ts.samp$grouping <- labels
Let's look at the dendrogram in order to find the best number for k:
plot(hc, main="")
abline(h=500, col = "red") # cut at height 500 forms 2 groups
abline(h=300, col = "blue") # cut at height 300 forms 3/4 groups
It looks like either 2 or 3 might be good. Look for the largest jump along the vertical lines (Height), draw a horizontal line at that height
and count the clusters it forms (one per vertical line it crosses).
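The same idea on a small built-in dataset, as a minimal sketch: it assumes USArrests, Euclidean distances and k = 4 purely for illustration. cutree() returns one cluster id per observation in the original row order (not in dendrogram order), so it can be assigned directly as a new column.

```r
# Assumption: USArrests, Euclidean distance and k = 4 are illustrative only
hc <- hclust(dist(scale(USArrests)), method = "average")

# One cluster id per row, in the same order as the rows of the data
clusters <- cutree(hc, k = 4)

dat <- USArrests
dat$cluster <- clusters

# Cluster sizes
table(dat$cluster)
```

With the data from the question, table(observedLabels, ts.samp$grouping) would then cross-tabulate the known classes against the clusters found.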

Plot a step function (cadlag) in R (two dots: continuity and discontinuity point)

Essentially I want to plot a compound Poisson process. Everything works fine except that I don't know how to set the plot parameters correctly.
I want a filled dot at each continuity point and an empty dot at each discontinuity point. Right now I can only get the filled dots.
Minimal working example (plots a compound Poisson path with 10 jumps)
n <- 10
n.t <- cumsum(rexp(n))
x <- c(0,cumsum(rnorm(n)))
plot(stepfun(n.t, x), xlab="t", ylab="X",do.points = TRUE,pch = 16,col.points = "blue",verticals = FALSE)
So how can I add the discontinuity points to the right? Any idea?
You can use points to add the points after the original plot.
set.seed(2017) ## For reproducibility
## Your code
n <- 10
n.t <- cumsum(rexp(n))
x <- c(0,cumsum(rnorm(n)))
plot(stepfun(n.t, x), xlab="t", ylab="X",
do.points = TRUE,pch = 16,col.points = "blue",verticals = FALSE)
## Add the endpoints
points(n.t, x[-length(x)], pch = 1)

New outliers appear after I remove existing ones using QQ Plot Results

I'm working on the PCA section of Julian Faraway's Linear Models with R (chapter 11, page 164).
PCA is sensitive to outliers, and the Mahalanobis distance helps identify them.
The author checks for outliers by plotting the sorted Mahalanobis distances against the quantiles of a chi-squared distribution.
if (!require(faraway)) install.packages("faraway"); library(faraway)
library(MASS) # for cov.rob()
data(fat, package='faraway')
cfat <- fat[,9:18]
robfat <- cov.rob(cfat)
md <- mahalanobis(cfat, center=robfat$center, cov=robfat$cov)
n <- nrow(cfat); p <- ncol(cfat)
plot(qchisq(1:n/(n+1),p), sort(md), xlab=expression(paste(chi^2, " quantiles")),
ylab = "Sorted Mahalanobis distances")
I identify the points:
identify(qchisq(1:n/(n+1),p), sort(md))
It appears that the outliers are in rows 242:252. I remove these outliers and re-create the QQ Plot:
cfat.mod <- cfat[-c(242:252),] #remove outliers
robfat <- cov.rob(cfat.mod)
md <- mahalanobis(cfat.mod, center=robfat$center, cov=robfat$cov)
n <- nrow(cfat.mod); p <- ncol(cfat.mod)
plot(qchisq(1:n/(n+1),p), sort(md), xlab=expression(paste(chi^2, " quantiles")),
ylab = "Sorted Mahalanobis distances")
identify(qchisq(1:n/(n+1),p), sort(md))
Alas, a new set of points (rows 234:241) now appears to be outliers. This keeps happening every time I remove additional outliers.
Look forward to understanding what I'm doing wrong.
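By default, identify() labels each point with its position in the plotted vectors. Because you plot sort(md), those positions are ranks in the sorted vector, not row numbers of cfat, so the largest distances always get the last indices (242:252 at first, then 234:241 once 11 rows are gone). To identify the points correctly, make sure the labels correspond to the positions of the points in the data. The function order, or sort with index.return=TRUE, will give the sorted indices. Here is an example, arbitrarily removing the points with md greater than a threshold.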
## Your data
data(fat, package='faraway')
cfat <- fat[, 9:18]
n <- nrow(cfat)
p <- ncol(cfat)
md <- sort(mahalanobis(cfat, colMeans(cfat), cov(cfat)), index.return=TRUE)
xs <- qchisq(1:n/(n+1), p)
plot(xs, md$x, xlab=expression(paste(chi^2, 'quantiles')))
## Use indices in data as labels for interactive identify
identify(xs, md$x, labels=md$ix)
## remove those with md>25, for example
inds <- md$x > 25
cfat.mod <- cfat[-md$ix[inds], ]
nn <- nrow(cfat.mod)
md1 <- mahalanobis(cfat.mod, colMeans(cfat.mod), cov(cfat.mod))
## Plot the new data
par(mfrow=c(1, 2))
plot(qchisq(1:nn/(nn+1), p), sort(md1), xlab='chisq quantiles', ylab='')
abline(0, 1, col='red')
car::qqPlot(md1, distribution='chisq', df=p, line='robust', main='With car::qqPlot')
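The difference between a position in the sorted vector and a row in the data can be checked on synthetic data. This is a minimal sketch with made-up numbers: a 50 x 4 Gaussian matrix with one outlier planted in row 5.

```r
set.seed(42)

# Hypothetical data: 50 observations, 4 variables, one planted outlier
x <- matrix(rnorm(200), nrow = 50, ncol = 4)
x[5, ] <- 10

md <- mahalanobis(x, colMeans(x), cov(x))
srt <- sort(md, index.return = TRUE)   # srt$x: sorted values, srt$ix: original rows

# Position of the largest distance in the sorted vector: always the last one
pos.in.sorted <- which.max(srt$x)

# Row of the data that this distance actually belongs to
row.in.data <- srt$ix[pos.in.sorted]

c(position = pos.in.sorted, row = row.in.data)
```

Removing rows by pos.in.sorted would drop the last row of the data rather than the outlier, which is the effect described in the question; row.in.data is the index to remove.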

Easiest way to plot inequalities with hatched fill?

Refer to the above plot. I drew the equations in Excel and then shaded the regions by hand, so it is not very neat. There are six zones, each bounded by two or more equations. What is the easiest way to draw inequalities and shade the regions using hatched patterns?
To build on @agstudy's answer, here's a quick-and-dirty way to represent inequalities in R:
plot(NA,xlim=c(0,1),ylim=c(0,1), xaxs="i",yaxs="i") # Empty plot
a <- curve(x^2, add = TRUE) # First curve
b <- curve(2*x^2-0.2, add = TRUE) # Second curve
names(a) <- c('xA','yA')
names(b) <- c('xB','yB')
# Evaluate with the renamed list elements (xA, yA, xB, yB) in scope
with(c(a, b), {
  id <- yB<=yA
  # b<a area
  polygon(x = c(xB[id], rev(xA[id])),
          y = c(yB[id], rev(yA[id])),
          density=10, angle=0, border=NULL)
  # b>a area
  polygon(x = c(xB[!id], rev(xA[!id])),
          y = c(yB[!id], rev(yA[!id])),
          density=10, angle=90, border=NULL)
})
If the area in question is bounded by more than 2 equations, just add more conditions:
plot(NA,xlim=c(0,1),ylim=c(0,1), xaxs="i",yaxs="i") # Empty plot
a <- curve(x^2, add = TRUE) # First curve
b <- curve(2*x^2-0.2, add = TRUE) # Second curve
d <- curve(0.5*x^2+0.2, add = TRUE) # Third curve
names(a) <- c('xA','yA')
names(b) <- c('xB','yB')
names(d) <- c('xD','yD')
# Basically you have three conditions:
# curve a is below curve b, curve b is below curve d and curve d is above curve a
# assign to each curve's coordinates the two conditions that concern it.
with(c(a, b, d), {
  idA <- yA<=yD & yA<=yB
  idB <- yB>=yA & yB<=yD
  idD <- yD<=yB & yD>=yA
  polygon(x = c(xB[idB], xD[idD], rev(xA[idA])),
          y = c(yB[idB], yD[idD], rev(yA[idA])),
          density=10, angle=0, border=NULL)
})
In R, there is only limited support for fill patterns, and they can only be
applied to rectangles and polygons. This holds only within traditional (base) graphics, not ggplot2 or lattice.
It is possible to fill a rectangle or polygon with a set of lines drawn
at a certain angle, with a specific separation between the lines. A density
argument controls the separation between the lines (in terms of lines per inch)
and an angle argument controls the angle of the lines.
Here is an example from the help:
plot(c(1, 9), 1:2, type = "n")
polygon(1:9, c(2,1,2,1,NA,2,1,2,1),
density = c(10, 20), angle = c(-45, 45))
Another option is to use alpha blending to differentiate between regions. Here, using @plannapus's example and the gridBase package to superpose polygons, you can do something like this:
library(grid); library(gridBase)
vps <- baseViewports()
pushViewport(vps$inner, vps$figure, vps$plot) # align grid with the base plot region
with(c(a, b, d), {
  grid.polygon(x = xA, y = yA,gp =gpar(fill='red',lty=1,alpha=0.2))
  grid.polygon(x = xB, y = yB,gp =gpar(fill='green',lty=2,alpha=0.2))
  grid.polygon(x = xD, y = yD,gp =gpar(fill='blue',lty=3,alpha=0.2))
})
There are several submissions on the MATLAB Central File Exchange that will produce hatched plots in various ways for you.
I think a tool that will come in handy for you here is gnuplot.
Take a look at the following demos:
some tricks
