How to plot ellipses with bray-curtis similarity values? - r

I want to run an nMDS analysis and add ellipses to the plot that represent percentage Bray-Curtis similarity, but I do not know how to do this in R. PRIMER can produce this type of graphic, and I suppose R can as well.
How can I do it?
I want my graphic to look like this:
graph

I have never used PRIMER, and I do not know how these graphs are supposed to be drawn, but I have to guess from the graph you posted. It may be that you first get a cluster dendrogram (function hclust), and then cut it by tree height (function cutree) and then draw enclosing ellipses for these (vegan::ordiellipse(..., kind = "ehull")). If so, this should do the trick:
library(vegan)
data(BCI)
d <- vegdist(BCI)
ord <- metaMDS(d, trace=FALSE)
cl <- hclust(d, "average") # using average linkage
plot(cl, hang=-1)
rect.hclust(cl, h=0.4, border="red") # to see the clusters
rect.hclust(cl, h=0.5, border="cyan")
rect.hclust(cl, h=0.6, border="blue")
plot(ord) # then ordination & enclosing ellipses
ordiellipse(ord, cutree(cl, h=0.4), kind="ehull", col="red", lwd=2)
ordiellipse(ord, cutree(cl, h=0.5), kind="ehull", col="cyan", lwd=2)
ordiellipse(ord, cutree(cl, h=0.6), kind="ehull", col="blue", lwd=2)
I adjusted the cut heights for the current dendrogram. ordiellipse will give you an error message for every one-item class, but these are harmless (I still have to clean the function up to get rid of them).
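Since the question asks for ellipses at given percentages of similarity, it may help to see how a similarity level maps onto a cut height. The following is a minimal sketch, assuming the dendrogram was built from Bray-Curtis dissimilarities on a 0 to 1 scale, so that a similarity of s% corresponds to a height of 1 - s/100. The helper name groups_at_similarity and the toy abundance matrix are made up for illustration; the dissimilarity is computed by hand only to keep the example free of package dependencies.

```r
# Hypothetical helper: cluster membership at a given Bray-Curtis similarity (%).
# Assumes `cl` comes from hclust() on dissimilarities scaled to [0, 1].
groups_at_similarity <- function(cl, sim_pct) {
  cutree(cl, h = 1 - sim_pct / 100)
}

# Toy abundance matrix (made-up numbers): 5 sites x 3 species
comm <- matrix(c(10, 0, 5,
                  9, 1, 4,
                  0, 8, 2,
                  1, 7, 3,
                  5, 5, 5), ncol = 3, byrow = TRUE)

# Bray-Curtis dissimilarity by hand: sum|a - b| / sum(a + b)
n <- nrow(comm)
bc <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    bc[i, j] <- sum(abs(comm[i, ] - comm[j, ])) / sum(comm[i, ] + comm[j, ])
  }
}
cl <- hclust(as.dist(bc), "average")

g80 <- groups_at_similarity(cl, 80)  # membership at 80% similarity
g50 <- groups_at_similarity(cl, 50)  # membership at 50% similarity
```

With vegan, the returned membership vector could be passed as the groups argument, e.g. ordiellipse(ord, groups_at_similarity(cl, 60), kind="ehull"), in place of the explicit cutree(cl, h=0.4) call.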

Related

R: Holt Model. Unable to plot timeseries prediction (predict)

I have been able to use a lm poly-model to model and predict some timeseries data. However when I change to using a holt model, I obtain an error in the R console.
Here is what I am trying to do:
library(ggplot2)
library(matrixStats)
library(forecast)
df_input <- read.csv("postprocessed.csv")
x <- df_input$time
y <- df_input$value
df <- data.frame(x, y)
#poly4model <- lm(y~poly(x, degree=4), data=df)
holtmodel <- holt(df$y) # might need df$value here ?
v <- seq(1, 44)
v2 <- seq(44, 55)
pdf("postprocessed_holts.pdf")
plot(df, xlim=c(0, 55))
##lines(v, predict(poly4model, data.frame(x=v)), col="blue", pch=20, lwd=3)
##lines(v2, predict(poly4model, data.frame(x=v2)), col="red", pch=20, lwd=3)
lines(v, predict(holtmodel, data.frame(x=v)), col="blue", pch=20, lwd=3)
lines(v2, predict(holtmodel, data.frame(x=v2)), col="red", pch=20, lwd=3)
dev.off()
This is the error which shows up
Error in xy.coords(x, y) : 'x' and 'y' lengths differ
I am a bit confused as to what x and y refer to here. The objects x and y which are in the Environment (R Studio Environment) both have length 44.
The code appears to error on both lines starting with lines.
Here's a copy of the input data...
"","time","value"
"1",1,2.61066016308988
"2",2,3.41246054742996
"3",3,3.8608767964033
"4",4,4.28686048552237
"5",5,4.4923132964825
"6",6,4.50557049744317
"7",7,4.50944447661246
"8",8,4.51097373134893
"9",9,4.48788748823809
"10",10,4.34603985656981
"11",11,4.28677073671406
"12",12,4.20065901625172
"13",13,4.02514194962519
"14",14,3.91360194972916
"15",15,3.85865748409081
"16",16,3.81318053258601
"17",17,3.70380706527433
"18",18,3.61552922363713
"19",19,3.61405310598722
"20",20,3.64591327503384
"21",21,3.70234435835577
"22",22,3.73503970503372
"23",23,3.81003078640584
"24",24,3.88201196162666
"25",25,3.89872518158949
"26",26,3.97432743542362
"27",27,4.2523675144599
"28",28,4.34654855854847
"29",29,4.49276038902684
"30",30,4.67830892029687
"31",31,4.91896819673664
"32",32,5.04350767355202
"33",33,5.09073406942046
"34",34,5.18510849382162
"35",35,5.18353176529036
"36",36,5.2210776270173
"37",37,5.22643491929207
"38",38,5.11137006553725
"39",39,5.01052467981257
"40",40,5.0361056705898
"41",41,5.18149486951409
"42",42,5.36334869132276
"43",43,5.43053620818444
"44",44,5.60001072279525
Edit
I tried an alternative method as well. I noticed that the object holtmodel contains two objects which might be useful. They are fitted and mean. As far as I can tell this is the fitted timeseries and the mean timeseries for the next 10 steps/predictions.
I tried plotting these objects with
lines(holtmodel$fitted, col="orange", lwd=2)
lines(holtmodel$mean, col="blue", lwd=2)
however the second of these fails to plot anything, despite no error being produced in the console. The first line plots an orange timeseries as expected.
Your issue
The objects you are trying to add as lines don't have the same length:
length(predict(holtmodel, data.frame(x=v)))
# 10
length(v)
# 44
length(predict(holtmodel, data.frame(x=v2)))
# 10
length(v2)
# 12
This means you can't add them as new lines.
Also, you can't really predict the same way you would with a linear regression, where the fitted model can be evaluated at any x, including past ones. Exponential smoothing methods use historical data points to forecast future data points; the forecast itself does not cover past periods.
Also, you are not specifying the number of periods you are trying to predict (the h parameter, which defaults to 10, hence the length of 10 above); I'll let you refer to the documentation of the holt function. The output of holt() is already a forecast of future values, so calling predict() on it doesn't change the result:
holt_predict <- predict(holtmodel)
length(setdiff(holt_predict, holtmodel))
# 0 which means they are the same objects
Solution
What you could do is use mean and fitted directly and plot them with lines, expanding the plot area with xlim and ylim so that the predicted values are visible. You can plot holtmodel$fitted and holtmodel$mean directly on your chart, since they are time series objects:
plot(df, xlim=c(0, 60), ylim=c(2.5, 10))
lines(holtmodel$fitted, col="blue", pch=20, lwd=3)
lines(holtmodel$mean, col="red", pch=20, lwd=3)
And the result:
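The reason the forecast only shows up once xlim is widened is that it is indexed after the end of the observed series. Here is a minimal sketch of that point using base R's HoltWinters() with gamma = FALSE as a stand-in for forecast::holt() (it is a different function fitting the same kind of trend-only smoother). The smoothing parameters are fixed at arbitrary values and the series is the first ten values of the question's data, rounded.

```r
# Stand-in for forecast::holt(): trend-only exponential smoothing in base R.
# alpha and beta are arbitrary illustration values, not estimates.
y <- ts(c(2.61, 3.41, 3.86, 4.29, 4.49, 4.51, 4.51, 4.51, 4.49, 4.35))
fit <- HoltWinters(y, alpha = 0.8, beta = 0.2, gamma = FALSE)

# Forecast 5 steps ahead; the result is a time series starting at t = 11
fc <- predict(fit, n.ahead = 5)
fc_times <- as.numeric(time(fc))

# The x-axis must extend past the data for the forecast to be visible
plot(y, xlim = c(1, max(fc_times)), ylim = range(y, fc))
lines(fc, col = "red", lwd = 2)
```

With the original xlim covering only the observed range, the red line would be drawn entirely outside the plot region, which matches the "no error but nothing plotted" symptom.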
Easy alternative
To save you the hassle of going through this kind of solution, there are easier methods. Have you tried the autoplot function included in the forecast package? It builds on ggplot2 and will give you what you want directly (unless you don't want the prediction intervals). It is very straightforward and will probably yield results close to what you want:
autoplot(holtmodel)

How to plot, in R, a correlogram on top of a correlation matrix?

I've followed the instructions on this website from STHDA to plot correlation matrices and correlograms in R. The website and examples are really good. However, I'd like to plot the upper part of the correlogram over the upper part of the correlation matrix.
Here's the code:
library(PerformanceAnalytics)
chart.Correlation(mtcars, histogram=TRUE, pch=19)
This should give me the correlation matrix using scatter plots, together with the histogram, which I'd like to maintain. But for the upper part of the plot, I'd like to have the correlogram obtained from this code:
library(corrplot)
corrplot(cor(mtcars), type="upper", order="hclust", tl.col="black", tl.srt=45)
The obvious way of doing it is exporting all graphs in pdf and then work with Inkscape, but it would be nicer if I could get this directly from R. Is there any possible way for doing this?
Thanks.
The trick to using the panel functions within pairs is found in help(pairs):
A panel function should not attempt to start a new plot, but just plot within a given coordinate system: thus 'plot' and 'boxplot' are not panel functions.
So, you should use graphic-adding functions, such as points, lines, polygon, or perhaps (when available) plot(..., add=TRUE), but not a straight-up plot. What you were suggesting in your comment (with SpatialPolygons) might have worked with some prodding if you actually tried to plot it on a device rather than just returning it from your plotting function.
In my example below, I actually do "create a new plot", but I cheat (based on this SO post) by adding a second plot on top of the one already there. I do this to shortcut an otherwise necessary scale/shift, which would still not be perfect since you appear to want a "perfect circle", something that can really only be guaranteed with asp=1 (aspect ratio fixed at 1:1).
colorRange <- c('#69091e', '#e37f65', 'white', '#aed2e6', '#042f60')
## colorRamp() returns a function which takes as an argument a number
## on [0,1] and returns a color in the gradient in colorRange
myColorRampFunc <- colorRamp(colorRange)
panel.cor <- function(w, z, ...) {
    correlation <- cor(w, z)
    ## because the func needs [0,1] and cor gives [-1,1], we need to
    ## shift and scale it
    col <- rgb( myColorRampFunc( (1+correlation)/2 )/255 )
    ## take the square root so that the circle's area, not its diameter,
    ## scales with |correlation| (avoids "area vs diameter" visual bias)
    radius <- sqrt(abs(correlation))
    radians <- seq(0, 2*pi, len=50) # 50 is arbitrary
    x <- radius * cos(radians)
    y <- radius * sin(radians)
    ## make them full loops
    x <- c(x, tail(x,n=1))
    y <- c(y, tail(y,n=1))
    ## I trick the "don't create a new plot" thing by following the
    ## advice here: http://www.r-bloggers.com/multiple-y-axis-in-a-r-plot/
    ## This allows
    par(new=TRUE)
    plot(0, type='n', xlim=c(-1,1), ylim=c(-1,1), axes=FALSE, asp=1)
    polygon(x, y, border=col, col=col)
}
pairs(mtcars, upper.panel=panel.cor)
You can manipulate the size of the circles -- at the expense of unbiased visualization -- by playing with the radius. The colors I took directly from the page you linked to originally.
Similar functions can be used for your lower and diagonal panels.
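As an illustration of a diagonal panel that only adds to the existing coordinate system, here is a sketch of a histogram panel, closely following the panel.hist example in help(pairs). It rescales the panel's y-range temporarily and draws the bars with rect(); the subset of mtcars columns is arbitrary and only keeps the plot small.

```r
# Diagonal panel: histogram drawn with rect(), without starting a new plot
panel.hist <- function(x, ...) {
    usr <- par("usr")
    on.exit(par(usr = usr))            # restore the panel's coordinates
    par(usr = c(usr[1:2], 0, 1.5))     # keep x-range, set y-range to [0, 1.5]
    h <- hist(x, plot = FALSE)
    breaks <- h$breaks
    nB <- length(breaks)
    y <- h$counts / max(h$counts)      # scale bar heights to [0, 1]
    rect(breaks[-nB], 0, breaks[-1], y, col = "grey80")
}

pairs(mtcars[, 1:4], diag.panel = panel.hist)
```

The same function can be passed alongside a custom upper panel, for example pairs(mtcars, upper.panel = panel.cor, diag.panel = panel.hist) once panel.cor is defined.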

Best fit quadratic regression

I'm running into an odd problem; get my dataset here:dataset
All I need is a simple graph showing the best-fit regression (quadratic regression) between rao and obs_richness; but instead I am getting very different polynomial models. Any suggestions on how to fix this?
#read in data
F_Div<-read.csv('F_Div.csv', header=T)
str(F_Div)
pairs(F_Div[2:12], pch=16)
#richness vs functional diversity
par(mfrow=c(1,1))
lm1<-lm ( rao~Obs_Richness, data=F_Div)
summary (lm1)
plot (rao~Obs_Richness, data=F_Div, pch=16, xlab="Species Richness", ylab="Rao's Q")
abline(lm1, lty=3)
lines (lowess (F_Div$rao~F_Div$Obs_Richness))
poly.mod<- lm (F_Div$rao ~ poly (F_Div$Obs_Richness, 2, raw=T))
summary (poly.mod)
lines (F_Div$Obs_Richness, predict(poly.mod))
I need the line that best approximates the lowess line (a simple curve), not this squiggly mess.
I also tried this but not what need:
xx <- seq(0,30, length=67)
plot (rao~Obs_Richness, data=F_Div, pch=16, xlab="Species Richness", ylab="Rao's Q")
lines(xx, predict(poly.mod, data.frame(x=xx)), col="blue")
The squiggly mess happens because lines(...) draws line segments between successive points in the order they appear in the data, and the data are not sorted by x. Try this at the end.
p <- data.frame(x=F_Div$Obs_Richness,y=predict(poly.mod))
p <- p[order(p$x),]
lines(p)
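An alternative to sorting is to predict on an ordered grid of x values. The second attempt in the question did not work because the model was fitted with F_Div$Obs_Richness inside the formula, so predict() cannot match the x column in newdata and falls back to the original data. The sketch below uses simulated data (the column names x and y and all numbers are made up) and fits the model with a data argument so that newdata is honoured.

```r
# Simulated stand-in for the question's data (made-up values)
set.seed(1)
dat <- data.frame(x = runif(40, 0, 30))
dat$y <- 2 + 0.5 * dat$x - 0.012 * dat$x^2 + rnorm(40, sd = 0.3)

# Fit with bare variable names and data=, so predict() can use newdata
fit <- lm(y ~ poly(x, 2, raw = TRUE), data = dat)

# Predict on an evenly spaced, already sorted grid
grid <- data.frame(x = seq(min(dat$x), max(dat$x), length.out = 100))
grid$y <- predict(fit, newdata = grid)

plot(y ~ x, data = dat, pch = 16)
lines(grid$x, grid$y, col = "blue")
```

For the question's data this would amount to lm(rao ~ poly(Obs_Richness, 2, raw=TRUE), data=F_Div) with a grid column named Obs_Richness.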

How do I plot the 'inverse' of a survival function?

I am trying to plot the inverse of a survival function, as the data I have is actually an increase in the proportion of an event over time. I can produce Kaplan-Meier survival plots, but I want to produce the 'opposite' of these. I can kind of get what I want using fun="cloglog":
plot(survfit(Surv(Days_until_workers,Workers)~Queen_Number+Treatment,data=xdata),
fun="cloglog", lty=c(1:4), lwd=2, ylab="Colonies with Workers",
xlab="Days", las=1, font.lab=2, bty="n")
But I don't understand quite what this has done to the time (i.e. doesn't start at 0 and distance decreases?), and why the survival lines extend above the y axis.
Would really appreciate some help with this!
Cheers
Use fun="event" to get the desired output
library(survival) # provides survfit(), Surv() and the aml example data
fit <- survfit(Surv(time, status) ~ x, data = aml)
par(mfrow=1:2, las=1)
plot(fit, col=2:3)
plot(fit, col=2:3, fun="event")
The reason for fun="cloglog" screwing up the axes is that it does not plot a fraction at all. It is instead plotting this according to ?plot.survfit:
"cloglog" creates a complimentary log-log survival plot (f(y) = log(-log(y)) along with log scale for the x-axis)
Moreover, the fun argument is not limited to predefined functions like "event" or "cloglog", so you can easily give it your own custom function.
plot(fit, col=2:3, fun=function(y) 3*sqrt(1-y))
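To make the relationship explicit: fun="event" plots one minus the survival estimate, so the same curve can be obtained with a custom function or computed directly from the fitted object. This is a small sketch using the aml data from the survival package; the axis label is just an example.

```r
library(survival)

fit <- survfit(Surv(time, status) ~ x, data = aml)

# Cumulative proportion with the event at each time point: 1 - S(t)
event_prob <- 1 - fit$surv

# Same picture as fun = "event", written as a custom function
plot(fit, col = 2:3, fun = function(y) 1 - y,
     xlab = "Time", ylab = "Cumulative proportion with event")
```

The values in event_prob line up with fit$time, with the strata stacked one after another in the order given by fit$strata.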

How can I recreate this 2d surface + contour + glyph plot in R?

I've run a 2d simulation in some modelling software, from which I've got an export of x,y point locations with a set of 6 attributes. I wish to recreate a figure that combines the data, like this:
The ellipses and the background are shaded according to attribute 1 (and the borders of these are of course representing the model geometry, but I don't think I can replicate that), the isolines are contours of attribute 2, and the arrow glyphs are from attributes 3 (x magnitude) and 4 (y magnitude).
The x,y points are centres of the triangulated mesh I think, and look like this:
I want to know how I can recreate a plot like this with R. To start with, I have irregularly spaced data because it was exported from an irregular mesh. That's immediately where I get stuck with R, having only ever used it for producing box-and-whisker plots and the like.
Here's the data:
https://dl.dropbox.com/u/22417033/Ellipses_noheader.txt
Edit: fields: x, y, heat flux (x), heat flux (y), thermal conductivity, Temperature, gradT (x), gradT (y).
names(Ellipses) <- c('x','y','dfluxx','dfluxy','kxx','Temps','gradTx','gradTy')
It's quite easy to make the lower plot, assuming there is a data frame named 'edat' read in with:
edat <- read.table(file=file.choose())
with(edat, plot(V1, V2, cex=0.2))
Things get a bit more beautiful with:
with(edat, plot(V1,V2, cex=0.2, col=V5))
So I do not think your original is being faithfully represented by the data. The contour lines are NOT straight across the "conductors". I call them "conductors" because this looks somewhat like iso-potential lines in electrostatics. I'm adding some text here to serve as a search handle for others who might be searching for plotting problems in real world physics: vector-field (the arrows) , heat equations, gradient, potential lines.
You can then overlay the vector field with:
with(edat, arrows(V1,V2, V1-20*V6*V7, V2-20*V6*V8, length=0.04, col="orange") )
You could "zoom in" with xlim and ylim:
with(edat, plot(V1,V2, cex=0.3, col=V5, xlim=c(0, 10000), ylim=c(-8000, -2000) ))
with(edat, arrows(V1,V2, V1-20*V6*V7, V2-20*V6*V8, length=0.04, col="orange") )
Guessing that the requested contour is for the Temps variable (column V6 of edat). Take your pick of contour plots.
require(akima)
intflow <- with(edat, interp(x=V1, y=V2, z=V6, xo=seq(min(V1), max(V1), length = 410),
    yo=seq(min(V2), max(V2), length = 410), duplicate="mean", linear=FALSE) )
require(lattice)
contourplot(intflow$z)
filled.contour(intflow)
with( intflow, contour(x=x, y=y, z=z) )
The last one will mix with the other plotting examples since those were using base plotting functions. You may need to switch to points instead of plot.
There are several parts to your plot so you will probably need several tools to make the different parts.
The background and ellipses can be created with polygon (once you figure where they should be).
The contourLines function can calculate the contour lines for you, which you can then add with the lines function (or contour has an add argument and could probably be used to add the lines directly).
The akima package has a function interp which can estimate values on a regular grid from ungridded values.
The my.symbols function along with ms.arrows, both from the TeachingDemos package, can be used to draw the vector field.
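As a small illustration of the contourLines plus lines route mentioned above, here is a sketch on R's built-in volcano matrix, which is already on a regular grid (the irregular data from the question would first need gridding, for example with interp). The number of levels is arbitrary.

```r
# Regular grid coordinates for the built-in volcano elevation matrix
xs <- seq_len(nrow(volcano))
ys <- seq_len(ncol(volcano))

# Compute contour lines without drawing anything
cl <- contourLines(x = xs, y = ys, z = volcano, nlevels = 8)

# Empty plot region, then add each contour as an ordinary line
plot(NA, xlim = range(xs), ylim = range(ys), xlab = "x", ylab = "y")
for (one_line in cl) {
    lines(one_line$x, one_line$y, col = "steelblue")
}
```

Each element of cl is a list with components level, x and y, so individual contours can be coloured or filtered by level before being drawn.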
@DWin is right to say that your graph doesn't faithfully represent your data, so I would advise following his answer. However, here is how to reproduce your graph (as closely as I could):
Ellipses <- read.table(file.choose())
names(Ellipses) <- c('x','y','dfluxx','dfluxy','kxx','Temps','gradTx','gradTy')
require(splancs)
require(akima)
First preparing the data:
# First the background layer (the 'kxx' layer):
# Here the regular grid on which we're gonna do the interpolation
E.grid <- with(Ellipses,
               expand.grid(seq(min(x),max(x),length=200),
                           seq(min(y),max(y),length=200)))
names(E.grid) <- c("x","y") # Without this step, function inout throws an error
E.grid$Value <- rep(0,nrow(E.grid))
# Split the dataset according to unique values of kxx
E.k <- split(Ellipses,Ellipses$kxx)
# Find the convex hull delimiting each of those values domain
E.k.ch <- lapply(E.k,function(X){X[chull(X$x,X$y),]})
for(i in unique(Ellipses$kxx)){ # Pick the value for each coordinate in our regular grid
    E.grid$Value[inout(E.grid[,1:2],E.k.ch[names(E.k.ch)==i][[1]],bound=TRUE)]<-i
}
# Then the regular grid for the second layer (Temp)
T.grid <- with(Ellipses,
               interp(x,y,Temps, xo=seq(min(x),max(x),length=200),
                      yo=seq(min(y),max(y),length=200),
                      duplicate="mean", linear=FALSE))
# The regular grids for the arrow layer (gradT)
dx <- with(Ellipses,
           interp(x,y,gradTx,xo=seq(min(x),max(x),length=15),
                  yo=seq(min(y),max(y),length=10),
                  duplicate="mean", linear=FALSE))
dy <- with(Ellipses,
           interp(x,y,gradTy,xo=seq(min(x),max(x),length=15),
                  yo=seq(min(y),max(y),length=10),
                  duplicate="mean", linear=FALSE))
T.grid2 <- with(Ellipses,
                interp(x,y,Temps, xo=seq(min(x),max(x),length=15),
                       yo=seq(min(y),max(y),length=10),
                       duplicate="mean", linear=FALSE))
gradTgrid<-expand.grid(dx$x,dx$y)
And then the plotting:
palette(grey(seq(0.5,0.9,length=5)))
par(mar=rep(0,4))
plot(E.grid$x, E.grid$y, col=E.grid$Value,
     axes=F, xaxs="i", yaxs="i", pch=19)
contour(T.grid, add=TRUE, col=colorRampPalette(c("blue","red"))(15), drawlabels=FALSE)
arrows(gradTgrid[,1], gradTgrid[,2], # Here I multiply the values so you can see them
       gradTgrid[,1]-dx$z*40*T.grid2$z, gradTgrid[,2]-dy$z*40*T.grid2$z,
       col="yellow", length=0.05)
To understand in detail how this code works, I advise you to read the following help pages: ?inout, ?chull, ?interp, ?expand.grid and ?contour.
