Plotting a subset of envfit results onto an ordination - r

I'm working on a figure for a publication where we are looking at a combination of plant coverage and other environmental data on differing communities. I am trying to make a multi-panel figure: one panel that displays all the envfit results, one that displays only the plants, and one that displays only the other environmental variables. Because of the complexity of the figure, it's actually a little easier to construct with the base plot function than with ggvegan.
My challenge is figuring out how to subset the results of the envfit analysis object for the different panels. A simplified example would be:
library(vegan)
data("mite")
data("mite.env")
set.seed(55)
nmds<-metaMDS(mite)
set.seed(55)
ef<-envfit(nmds, mite.env, permu=999)
plot(ef, p.max = .05)
which produces this figure
For sake of the example, does anyone have suggestions on a way I could create two separate figures, one with only the WatrCont vector and one with only the SubsDens vector? I'm sure there is a way to pull specific results out of the ef object, but my coding is not savvy enough to understand how.
Additionally, is there a way to have the jumble of text at the center not overlap, similar to jitter in ggplot?
Thank y'all for all of your help!

I would suggest extracting the data from nmds and ef and using ggplot to add the required elements to your plots.
Here is an example:
library(vegan)
library(ggplot2)
data("mite")
data("mite.env")
set.seed(55)
nmds<-metaMDS(mite)
set.seed(55)
ef<-envfit(nmds, mite.env, permu=999)
# Get the NMDS scores
nmds_values <- as.data.frame(scores(nmds))
# Get the coordinates of the vectors produced for continuous predictors in your envfit
vector_coordinates <- as.data.frame(scores(ef, "vectors")) * ordiArrowMul(ef)
# Plot the vectors separately
ggplot(nmds_values,
aes(x=NMDS1, y = NMDS2)) +
geom_point() +
geom_segment(aes(x=0, y=0, xend=NMDS1, yend=NMDS2),
vector_coordinates[1,]) +
geom_text(aes(x=NMDS1,y=NMDS2),
vector_coordinates[1,],
label=row.names(vector_coordinates[1,]))
ggplot(nmds_values,
aes(x=NMDS1, y = NMDS2)) +
geom_point() +
geom_segment(aes(x=0, y=0, xend=NMDS1, yend=NMDS2),
vector_coordinates[2,]) +
geom_text(aes(x=NMDS1,y=NMDS2),
vector_coordinates[2,],
label=row.names(vector_coordinates[2,]))
You can adjust the colours and sizes of the different elements as you see fit. Coordinates for categorical predictors can be extracted in a similar manner with scores(ef, "factors").
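If you would rather stay in base graphics, as the question prefers, one option is to refit envfit with its formula interface so that each object holds a single variable. This is only a sketch: the object names ef_water and ef_dens are made up for illustration, and each variable gets its own permutation test rather than being pulled out of the original ef object.

```r
library(vegan)
data(mite)
data(mite.env)

set.seed(55)
nmds <- metaMDS(mite, trace = FALSE)

# Refit with a formula so each envfit object contains exactly one vector
# (ef_water and ef_dens are illustrative names, not from the question)
set.seed(55)
ef_water <- envfit(nmds ~ WatrCont, data = mite.env, permutations = 999)
set.seed(55)
ef_dens <- envfit(nmds ~ SubsDens, data = mite.env, permutations = 999)

# Two panels, one vector each
op <- par(mfrow = c(1, 2))
plot(nmds, display = "sites")
plot(ef_water)
title("WatrCont only")
plot(nmds, display = "sites")
plot(ef_dens)
title("SubsDens only")
par(op)
```

The same pattern extends to groups of variables, e.g. envfit(nmds ~ WatrCont + SubsDens, data = mite.env), for a panel that shows several vectors at once.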

Related

Sorting data vector for a histogram using ggplot and R

So I have 10,000 values in a vector from a Monte Carlo simulation. I want to plot this data as a histogram and a density plot. Doing this with the hist() function is easy, and it will calculate the frequency of the different values automatically. My ambition, however, is to do this in ggplot.
My biggest problem right now is how to transform the data so ggplot can handle it. I would like my x-axis to show the "price" while the y-axis shows the frequency or density. My data has a lot of decimals, as shown in the example data below.
myData <- c(266.8997, 271.5137, 225.4786, 223.3533, 258.1245, 199.5601, 234.2341, 231.7850, 260.2091, 184.5102, 272.8287, 203.7482, 212.5140, 220.9094, 221.2627, 236.3224)
My current code using the hist()-function, and the plot is shown below.
hist(myData,
xlab ="Price",
prob=TRUE)
lines(density(myData))
Histogram for the data vector containing 10000 values
How would you sort the data, and how would you do this with ggplot? I am thinking if I should round the numbers as well?
You don't need to sort or round anything, but ggplot() expects a data frame rather than a bare vector, so wrap myData first:
ggplot(data.frame(Price = myData), aes(Price)) + geom_histogram()
or:
ggplot(data.frame(Price = myData), aes(Price)) + geom_density()
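Or, to overlay the density curve on the histogram, put both layers on the density scale:
ggplot() +
geom_histogram(aes(x = myData, y = after_stat(density)), bins = 10) +
geom_density(aes(x = myData))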
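To get the same bins that hist() picks, one possibility is to compute the breaks with hist(plot = FALSE) and hand them to geom_histogram(). The sketch below uses the sample vector from the question; the fill and outline colours are arbitrary choices, and after_stat() assumes a reasonably recent ggplot2.

```r
library(ggplot2)

myData <- c(266.8997, 271.5137, 225.4786, 223.3533, 258.1245, 199.5601,
            234.2341, 231.7850, 260.2091, 184.5102, 272.8287, 203.7482,
            212.5140, 220.9094, 221.2627, 236.3224)

# Let base R choose the breaks without drawing anything
h <- hist(myData, plot = FALSE)

# Histogram on the density scale (like prob = TRUE) plus a density curve
p <- ggplot(data.frame(Price = myData), aes(x = Price)) +
  geom_histogram(aes(y = after_stat(density)), breaks = h$breaks,
                 fill = "grey80", colour = "black") +
  geom_density() +
  labs(x = "Price", y = "Density")
print(p)
```

Because the binning happens inside the stat, the raw values can stay unsorted and unrounded.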

Combining output from smatr with ggplot2

I have a dataset of leaf trait measurements made at multiple sites at two contrasting seasons. I am interested to explore the association/line fit between a pair of traits and to differentiate the seasons at each site.
Rather than a linear regression, I would prefer to use the Standardised Major Axis approach within the smatr package:
e.g. sma.site1 <- sma(TraitA ~ TraitB * Visit, data=subset(myfile, Site=="Site1")) # testing the null hypothesis of common slopes for the two Visits (Seasons) at a given Site.
I can produce a handy lattice plot in ggplot2 with a separate panel for each Site and the points differentiated by Visit:
e.g. qplot(TraitB, TraitA, data=myfile, colour=Visit) + facet_wrap(~Site, ncol=2)
However, if I add trend lines fitted with the additional argument in ggplot2:
+ geom_smooth(aes(group=Visit), method="lm", se=F)
those lines do not match the sma coefficients well.
What I would like to do is fit the lines suggested by the sma test onto the ggplot lattice. Is there an easy, or efficient, way to do that?
I know that I can subset the data, produce a plot for each site, add the relevant lines with + geom_abline() and then stitch the separate plots up together with grid.arrange(). But that feels very long-winded.
I would be grateful for any pointers.
I don't know anything about the smatr package but you should be able to tweak this to get the right values. Since you provided no data I used the leaf data from the example in the pkg. The basic idea is to pull out the slope & intercept from the returned sma object and then facet the geom_abline. I may be misinterpreting the object, though.
library(smatr)
library(ggplot2)
data(leaflife)
do.call(rbind, lapply(unique(leaflife$site), function(x) {
obj <- sma(longev~lma*rain, data=subset(leaflife, site=x))
data.frame(site=x,
intercept=obj$coef[[1]][1, 1],
slope=obj$coef[[1]][2, 1])
})) -> fits
gg <- ggplot(leaflife)
gg <- gg + geom_point(aes(x=lma, y=longev, color=soilp))
gg <- gg + geom_abline(data=fits, aes(slope=slope, intercept=intercept))
gg <- gg + facet_wrap(~site, ncol=2)
gg
I just saw this question and am not sure whether you are still interested. I ran the code by hrbrmstr and found that the only thing you need to change is the subset condition, which must use == (comparison). With site=x the argument is not treated as a condition, so no rows are filtered out:
obj <- sma(longev~lma*rain, data=subset(leaflife, site == x))
then you can get the plot with four lines for each group.
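The "one row of coefficients per panel" idea can also be tried without smatr. The sketch below is an illustration only: it uses the built-in iris data, with Species standing in for Site, and the closed-form SMA estimate slope = sign(r) * sd(y) / sd(x) rather than a call to sma(). The helper sma_line is a made-up name.

```r
library(ggplot2)

# Closed-form SMA line for one group (hypothetical helper, not from smatr)
sma_line <- function(x, y) {
  slope <- sign(cor(x, y)) * sd(y) / sd(x)
  c(intercept = mean(y) - slope * mean(x), slope = slope)
}

# One row of coefficients per facet level; the Species column is what
# lets facet_wrap() send each line to the right panel
fits <- do.call(rbind, lapply(split(iris, iris$Species), function(d) {
  est <- sma_line(d$Sepal.Length, d$Petal.Length)
  data.frame(Species = d$Species[1],
             intercept = est[["intercept"]],
             slope = est[["slope"]])
}))

p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point() +
  geom_abline(data = fits, aes(intercept = intercept, slope = slope)) +
  facet_wrap(~Species, ncol = 2)
print(p)
```

To draw separate lines for two seasons within each panel, the coefficient table would need one row per site and season, with the season mapped to colour in geom_abline().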

Combine logistic regression with bar graph for maturity results

I am trying to present the results of a logistic regression analysis for the maturity schedule of a fish species. Below is my reproducible code.
#coded with R version 3.0.2 (2013-09-25)
#Frisbee Sailing
rm(list=ls())
library(ggplot2)
library(FSA)
#generate sample data 1 mature, 0 non mature
m<-rep(c(0,1),each=25)
tl<-seq(31,80, 1)
dat<-data.frame(m,tl)
# add some non mature individuals at random in the middle of df to
#prevent glm.fit: fitted probabilities numerically 0 or 1 occurred error
tl<-sample(50:65, 15)
m<-rep(c(0),each=15)
dat2<-data.frame(tl,m)
#final dataset
data3<-rbind(dat,dat2)
ggplot can produce a logistic regression graph showing each of the data points employed, with the following code:
#plot logistic model
ggplot(data3, aes(x=tl, y=m)) +
stat_smooth(method="glm", method.args=list(family="binomial"), se=FALSE)+
geom_point()
I want to combine the probability of being mature at a given size, which is obtained, and plotted with the following code:
#plot proportion of mature
#clump data in 5 cm size classes
l50<-lencat(~tl,data=data3,startcat=30,w=5)
#table of frequency of mature individuals by size
mat<-with(l50, table(LCat, m))
#proportion of mature
mat_prop<-as.data.frame.matrix(prop.table(mat, margin=1))
colnames(mat_prop)<-c("nm", "m")
mat_prop$tl<-as.factor(seq(30,80, 5))
# Bar plot probability mature
ggplot(mat_prop, aes(x=tl,y=m)) +
geom_bar(stat="bin")
What I've been trying to do, with no success, is to make a graph that combines both. Since the axes are the same it should be straightforward, but I can't seem to make it work. I have tried:
ggplot(mat_prop, aes(x=tl,y=m)) +
geom_bar(stat="bin")+
stat_smooth(method="glm", family="binomial", se=FALSE)
but it does not work. Any help would be greatly appreciated. I am new, so I am not able to add the resulting graphs to this post.
I see three problems with your code:
Using stat="bin" in your geom_bar() is inconsistent with giving values for the y-axis (y=m). If you bin, then you count the number of x-values in an interval and use that count as the y-value, so there is no need to map your data to the y-axis.
The data for the glm-plot is in data3, but your combined plot only uses mat_prop.
The x-axes of the two plots are actually not quite the same. In the bar plot, you use a factor variable on the x-axis, making the axis discrete, while in the glm-plot, you use a numeric variable, which leads to a continuous x-axis.
The following code gave a graph combining your two plots:
mat_prop$tl<-seq(30,80, 5)
ggplot(mat_prop, aes(x=tl,y=m)) +
geom_bar(stat="identity") +
geom_point(data=data3) +
geom_smooth(data=data3,aes(x=tl,y=m),method="glm", method.args=list(family="binomial"), se=FALSE)
I could run it after first sourcing your script to define all the variables. The three problems mentioned above are addressed as follows:
I used geom_bar(stat="identity") in order not to use binning in the bar plot.
I use the data-argument in geom_point and geom_smooth in order to use the correct data (data3) for these parts of the plot.
I redefine mat_prop$tl to make it numeric. It is then consistent with the column tl in data3, which is numeric as well.
(I also added the points. If you don't want them, just remove geom_point(data=data3).)
The plot looks as follows:
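For reference, here is a self-contained sketch of the same combined plot that does not need the FSA package. It is an illustration under a few assumptions: the data frame fish mirrors the simulated data in the question, the 5 cm classes are built with floor() instead of lencat(), and the bars are centred on each class midpoint.

```r
library(ggplot2)

set.seed(1)
# Simulated maturity data: m = 1 mature, 0 immature; tl = total length
fish <- data.frame(tl = c(31:80, sample(50:65, 15)),
                   m  = c(rep(c(0, 1), each = 25), rep(0, 15)))

# Proportion mature per 5 cm size class, keyed by the class lower bound
fish$lower <- floor(fish$tl / 5) * 5
props <- aggregate(m ~ lower, data = fish, FUN = mean)

# Bars use the summarised proportions; points and the logistic curve
# use the raw 0/1 data, all on one continuous x-axis
p <- ggplot(fish, aes(x = tl, y = m)) +
  geom_col(data = props, aes(x = lower + 2.5, y = m),
           width = 5, fill = "grey80") +
  geom_point() +
  geom_smooth(method = "glm", formula = y ~ x,
              method.args = list(family = "binomial"), se = FALSE)
print(p)
```

geom_col() is shorthand for geom_bar(stat = "identity"), so the bar heights are taken directly from the proportions.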

R - How to histogram multiple matrixes using qplot/ggplot2

I'm using R to read and plot data from NetCDF files (ncdf4). I started using R only recently, so please excuse any confusion.
Let's say that from the files I obtain N 2-D matrices of numerical values, each with different dimensions and many NA values.
I have to histogram these values in the same plot, with bins of a given width and within given limits, the same for every matrix.
For just one matrix, I can do this:
library(ncdf4)
library(ggplot2)
file0 <- nc_open("test.nc")
#Read a variable
prec0 <- ncvar_get(file0,"pr")
#Some settings
min_plot=0
max_plot=30
bin_width=2
xlabel="mm/day"
ylabel="PDF"
title="Precipitation"
#Get maximum of array, exclude NAs
maximum_prec0=max(prec0, na.rm=TRUE)
#Store the histogram
histo_prec0 <- hist(prec0, xlim=c(min_plot,max_plot), right=FALSE, breaks=seq(0,ceiling(maximum_prec0),by=bin_width))
#Plot the histogram densities using points instead of bars, which is what we want
qplot(histo_prec0$mids, histo_prec0$density, xlim=c(min_plot,max_plot), color=I("yellow"), xlab=xlabel, ylab=ylabel, main=title, log="y")
#If necessary, can transform matrix to vector using
#vector_prec0 <- c(prec0)
However, it occurs to me that it would be best to use a data frame for plotting multiple matrices, although I'm not certain of that or of how to do it. This would also allow for automatic legends and all the advantages that come from using data frames with ggplot2.
What I want to achieve is something akin to this:
https://copy.com/thumbs_public/j86WLyOWRs4N1VTi/scatter_histo.jpg?size=1024
Where on Y we have the Density and on X the bins.
Thanks in advance.
To be honest, it is unclear what you are after (scatter plot or histogram of data with values as points?).
Here are a couple of examples using ggplot which might fit your goals (based on your last sentence: "Where on Y we have the Density and on X the bins"):
# some data
nsample<- 200
d1<- rnorm(nsample,1,0.5)
d2<- rnorm(nsample,2,0.6)
#transformed into histogram bins and collected in a data frame
hist.d1<- hist(d1)
hist.d2<- hist(d2)
data.d1<- data.frame(hist.d1$mids, hist.d1$density, rep(1,length(hist.d1$density)))
data.d2<- data.frame(hist.d2$mids, hist.d2$density, rep(2,length(hist.d2$density)))
colnames(data.d1)<- c("bin","den","group")
colnames(data.d2)<- c("bin","den","group")
ddata<- rbind(data.d1,data.d2)
ddata$group<- factor(ddata$group)
# plot
plots<- ggplot(data=ddata, aes(x=bin, y=den, group=group)) +
geom_point(aes(color=group)) +
geom_line(aes(color=group)) #optional
print(plots)
However, you could also produce smooth density plots (or histograms) directly in ggplot:
ddata2<- cbind(c(rep(1,nsample),rep(2,nsample)),c(d1,d2))
ddata2<- as.data.frame(ddata2)
colnames(ddata2)<- c("group","value")
ddata2$group<- factor(ddata2$group)
plots2<- ggplot(data=ddata2, aes(x=value, group=group)) +
geom_density(aes(color=group))
# geom_histogram(aes(color=group, fill=group)) # for histogram instead
dev.new() # portable replacement for windows(), which only exists on Windows
print(plots2)
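For matrices of different sizes with NAs, one way to get them into a single data frame is to flatten each matrix, drop the NAs, and stack the results in long format. The sketch below uses made-up matrices (run_a and run_b are hypothetical stand-ins for variables read with ncvar_get) and lets stat_bin() do the binning, with the bin width and limits from the question.

```r
library(ggplot2)

set.seed(42)
# Hypothetical stand-ins for the NetCDF variables: different dimensions, some NAs
mats <- list(run_a = matrix(rexp(200, rate = 1 / 5), nrow = 20, ncol = 10),
             run_b = matrix(rexp(150, rate = 1 / 8), nrow = 10, ncol = 15))
mats$run_a[sample(200, 30)] <- NA
mats$run_b[sample(150, 20)] <- NA

# Flatten each matrix, drop the NAs, and stack everything in long format
long <- do.call(rbind, lapply(names(mats), function(nm) {
  v <- as.vector(mats[[nm]])
  data.frame(source = nm, value = v[!is.na(v)])
}))

# Same bin width and origin for every matrix; densities drawn as points
p <- ggplot(long, aes(x = value, colour = source)) +
  stat_bin(aes(y = after_stat(density)), geom = "point",
           binwidth = 2, boundary = 0, position = "identity") +
  coord_cartesian(xlim = c(0, 30)) +
  labs(x = "mm/day", y = "PDF", title = "Precipitation")
print(p)
```

position = "identity" matters here: the default for stat_bin() is to stack the groups, which would shift the points of the second matrix upwards.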

Creating a facet_wrap plot with ggplot2 with different annotations in each plot

I am using ggplot2 to explore the result of some testing on an agent-based model. The model can end in one of three rounds per realization, and as such I am interested in how player utilities differ in terms of what round the game ends and their relative position in 2D space.
All this is to say that I have generated a facet_wrap plot to show this for each round, but I would also like to annotate each plot with the cor(x,y) for the subset of data represented in each facet. Is there a way to tell ggplot2 that I would like the annotation to use the subset of data generated by facet_wrap? Here is the code I have so far, and what it is producing
library(ggplot2)
# Load data
abm.data<-read.csv("ABM_results.csv")
# Create new column for area of Pareto set
attach(abm.data)
area<-abs(((x3*(y2-y1))+(x2*(y1-y3))+(x1*(y3-y2)))/2)
abm.data<-transform(abm.data,area=area)
detach(abm.data)
# Compare area of Pareto set with player utility
png("area_p1.png",res=100,pointsize=20,height=500,width=1600)
area.p1<-ggplot(abm.data,aes(x=area))+geom_point(aes(y=U1_2,colour="Player 1",alpha=0.4))+facet_wrap(~round,ncol=3)+
annotate("text",0.375,-1.25,label=paste("rho=",round(cor(abm.data$area,abm.data$U1_2),2)), parse=TRUE)+
scale_colour_manual(values=c("Player 1"="red"))
area.p1+xlab("Area of Pareto Set")+ylab("Player Utility at Game End")+
labs(title="Final Player 1 Utility by Pareto Set Size and Round Game Ends")+theme(legend.position="none")
dev.off()
(source: drewconway.com)
As you can see, there are two problems:
The \rho value is of the full dataset, rather than the subsets by 'round'. Is there a way to get the cor(x,y) to print based on only the data shown in each plot?
The annotation should read "\rho=some_value" but instead I get "=(\rho,value);" is there a way to fix this?
To fix the second problem use
annotate("text", 0.375, -1.25,
label=paste("rho==", round(cor(abm.data$area, abm.data$U1_2), 2)),
parse=TRUE)
i.e. "rho==".
Edit: Here is a solution to solve the first problem
library("plyr")
library("ggplot2")
set.seed(1)
df <- data.frame(x=rnorm(300), y=rnorm(300), cl=gl(3,100)) # create test data
df.cor <- ddply(df, .(cl), function(val) sprintf("rho==%.2f", cor(val$x, val$y)))
p1 <- ggplot(data=df, aes(x=x)) +
geom_point(aes(y=y, colour="col1", alpha=0.4)) +
facet_wrap(~ cl, ncol=3) +
geom_text(data=df.cor, aes(x=0, y=3, label=V1), parse=TRUE) +
scale_colour_manual(values=c("col1"="red")) +
theme(legend.position="none")
print(p1)
The same question comes up for adding segments to each facet. The general solution is to use a geom instead of annotate(), for example geom_segment instead of annotate("segment",...), and to give that geom its own data.frame that contains the faceting variable, so each row is drawn only in its matching panel.
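The per-facet label table can also be built without plyr. The following sketch uses only base R and the same kind of simulated data as above; the name labels and the label position (x = 0, y = 3) are arbitrary choices for illustration.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(x = rnorm(300), y = rnorm(300), cl = gl(3, 100))

# One label per facet level, computed from that level's rows only
labels <- do.call(rbind, lapply(split(df, df$cl), function(d) {
  data.frame(cl = d$cl[1],
             label = sprintf("rho == %.2f", cor(d$x, d$y)))
}))

# The cl column in labels tells facet_wrap() which panel each label belongs to;
# parse = TRUE renders "rho == ..." as a plotmath expression
p <- ggplot(df, aes(x = x, y = y)) +
  geom_point(colour = "red", alpha = 0.4) +
  geom_text(data = labels, aes(x = 0, y = 3, label = label), parse = TRUE) +
  facet_wrap(~cl, ncol = 3)
print(p)
```

Setting colour and alpha outside aes() also removes the need for scale_colour_manual() and for hiding the legend.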
