I am relatively new to R and I am trying to get my head around how to do ordination techniques in R, so that I don't need to use other software.
I am trying to get a PCA with environmental factors in the place of species.
My sites differ qualitatively (in land use), and I wanted to show that difference in the final plot with different colours, so I used the method à la Gavin Simpson with the vegan package. So far so good. Here is the code I used:
with(fish, status)
scl <- -1 ## scaling = -1
colvec <- c("red2", "mediumblue")
plot(pond.pca, type = "n", scaling = scl)
with(fish, points(pond.pca, display = "sites", col = colvec[status], scaling = scl, pch = 21, bg = colvec[status]))
head(with(fish, colvec[status]))
text(pond.pca, display = "species", scaling = scl, cex = 0.8, col = "darkcyan")
with(fish, legend("topright", legend = levels(status), bty = "n", col = colvec, pch = 21, pt.bg = colvec))
The problem arises when I try to add arrows for my environmental variables to the ordination plot. If I use biplot() or other functions such as ordiplot(), I will not be able to keep the different colours for my two types of sites, so I don't want to use those. If I use this command:
plot(envfit(pond.pca, PondEnv38, scaling=-1), add=TRUE, col="black")
I get nice arrows, only they are not aligned (and in some cases point in completely the opposite direction) with the environmental variables plotted by the earlier code (line 5). I tried changing the scaling, but they just will not align.
Does anyone know how to deal with that problem?
Any tips would be useful.
It is not clear what you are doing wrong, as you don't provide a reproducible example of the problem and I am having difficulty following your description of it. Here is a fully worked example for you to follow that does what you seem to be trying to do.
library("vegan")
data(varespec)
data(varechem)
ord <- rda(varespec)
set.seed(1)
(fit <- envfit(ord, varechem, perm = 999))
## make up a fake `status`
status <- factor(rep(c("Class1","Class2"), times = nrow(varespec) / 2))
> head(status)
[1] Class1 Class2 Class1 Class2 Class1 Class2
Now plot
layout(matrix(1:2, ncol = 2))
## auto version
plot(fit, add = FALSE)
## manual version with extra things
colvec <- c("red","green")
scl <- -1
plot(ord, type = "n", scaling = scl)
points(ord, display = "sites", col = colvec[status], pch = (1:2)[status])
points(ord, display = "species", pch = "+")
plot(fit, add = TRUE, col = "black")
layout(1)
Which gives
And all the arrows seem to be pointing as they would if you plotted the envfit object directly.
Related
I don't understand how scaling works in vegan when plotting ordinations.
I found this question which will help clarify my point. From what I can read in the book "Numerical Ecology with R", there are differences between scaling = 1 and scaling = 2. In particular, with scaling 1 "The angles among descriptor vectors do not reflect their correlations", while with scaling 2 "The angles between descriptors in the biplot reflect their correlations".
So I ran this code (partially copy-pasted from the cited question) and I get two different plots (the axis ranges differ, so the scaling parameter is presumably doing something), but I don't see much difference in the angles between the descriptor vectors, so I am trying to understand what, if anything, is wrong.
What am I missing here?
library("vegan")
data(varespec)
data(varechem)
ord <- rda(varespec)
set.seed(1)
(fit <- envfit(ord, varechem, perm = 999))
## make up a fake `status`
status <- factor(rep(c("Class1","Class2"), times = nrow(varespec) / 2))
## manual version with extra things
colvec <- c("red","green")
scl <- 1
plot(ord, type = "n", scaling = scl, main="Scaling 1")
points(ord, display = "sites", col = colvec[status], pch = (1:2)[status])
points(ord, display = "species", pch = "+")
plot(fit, add = TRUE, col = "black")
dev.new()
scl <- 2
plot(ord, type = "n", scaling = scl, main="Scaling 2")
points(ord, display = "sites", col = colvec[status], pch = (1:2)[status])
points(ord, display = "species", pch = "+")
plot(fit, add = TRUE, col = "black")
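One thing that may explain the near-identical arrows: in the code above, fit is computed once and reused for both plots, and points() is called without a scaling argument, so neither the points nor the envfit arrows follow scl. Here is a minimal sketch, assuming that envfit() forwards extra arguments such as scaling to scores() (its help page describes ... as parameters passed to scores). It refits the vectors for each scaling. The names fit1, fit2 and draw are made up for this illustration.

```r
library("vegan")
data(varespec)
data(varechem)
ord <- rda(varespec)

## refit the environmental vectors on site scores in each scaling
## (permutations = 0 skips the permutation test to keep this quick)
fit1 <- envfit(ord, varechem, permutations = 0, scaling = 1)
fit2 <- envfit(ord, varechem, permutations = 0, scaling = 2)

## hypothetical helper: draw sites and fitted arrows in one scaling
draw <- function(fit, scl) {
  plot(ord, type = "n", scaling = scl, main = paste("Scaling", scl))
  points(ord, display = "sites", scaling = scl, pch = 16)
  plot(fit, add = TRUE, col = "black")
}

layout(matrix(1:2, ncol = 2))
draw(fit1, 1)
draw(fit2, 2)
layout(1)
```

If the two panels still look alike, comparing fit1$vectors$arrows with fit2$vectors$arrows shows the fitted directions numerically.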
I have a large dataset with more than 20 million values. For privacy reasons, and to make the code reproducible, I use mydata in its place.
set.seed(1234)
mydata <- rlnorm(28000000,3.14,1.3)
I want to find which known distribution fits mydata best, so I chose the fitdist function from the fitdistrplus package.
library(fitdistrplus)
fit.lnorm <- fitdist(mydata,"lnorm")
fit.weibull <- fitdist(mydata, "weibull")
fit.gamma <- fitdist(mydata, "gamma", lower = c(0, 0))
fit.exp <- fitdist(mydata,"exp")
Then I use the ppcomp function to draw a P-P plot to help me choose the best-fitting distribution.
library(RColorBrewer)
tiff("./pplot.tiff",res = 300,compression = "lzw",height = 6,width = 10,units = "in",pointsize = 12)
ppcomp(list(fit.lnorm,fit.weibull, fit.gamma,fit.exp), fitcol = brewer.pal(9,"Set1")[1:4],legendtext = c("lnorm","weibull", "gamma","exp"))
dev.off()
Clearly, the lognormal fits mydata best. But look at the legend of the plot: the coloured line symbols are missing and only the text labels show. What should I do?
With small datasets the legend is drawn correctly, so the problem seems to come from the size of the data. How can I get a complete legend?
Many questions about a function can be answered with fix(function), which shows how the function works.
fix(ppcomp)
There I found the code that draws the legend:
if (addlegend) {
    if (missing(legendtext))
        legendtext <- paste("fit", 1:nft)
    if (!largedata)
        legend(x = xlegend, y = ylegend, bty = "n", legend = legendtext,
               pch = fitpch, col = fitcol, ...)
    else legend(x = xlegend, y = ylegend, bty = "n", legend = legendtext,
                col = fitcol, ...)
}
I then added lty = 1 to the second legend() call (the one used for large data), and the coloured lines appeared in the legend.
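An alternative that avoids editing the function with fix() is sketched below. It assumes that ppcomp() accepts addlegend = FALSE (the addlegend flag appears in the source shown above) and that it draws lines rather than points for large samples, so the legend is drawn by hand with lty = 1. The sample is cut to 20,000 values purely to keep the example quick, and the names fits, cols and labs are made up.

```r
library(fitdistrplus)

set.seed(1234)
mydata <- rlnorm(20000, 3.14, 1.3)  # smaller stand-in sample

fits <- list(fitdist(mydata, "lnorm"),
             fitdist(mydata, "weibull"))
cols <- c("red", "blue")
labs <- c("lnorm", "weibull")

## suppress the built-in legend, then add one with line symbols
ppcomp(fits, fitcol = cols, addlegend = FALSE)
legend("bottomright", legend = labs, col = cols, lty = 1, bty = "n")
```

This leaves the installed function untouched, so the fix survives package updates and new R sessions.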
Starting with the following code:
library(vegan)
data(dune)
data(dune.env)
Ordination.model1 <- cca(dune ~ Management,dune.env)
plot1 <- plot(Ordination.model1, choices=c(1,2), scaling=1)
I get a plot with sites, species, centroids, and biplot arrows. I want to build up a plot with just the sites depicted by points, and the arrows with customized labels.
So far, I have:
colvec <- c("red", "green", "blue")
plot(Ordination.model1, type="n", scaling=1)
with(dune.env, points(Ordination.model1, display ="sites", col=colvec[Use], scaling=1, pch =16, bg = colvec[Use]))
I am stuck as far as how to put the arrows in. Thanks in advance!
You can add arrows using text. I was not able to use your code as I kept getting errors, however here is a basic example that does what you want. I took it from R Help: CCA Plot
Once you add text the arrows should show.
require(vegan)
data(varespec)
data(varechem)
vare.cca <- cca(varespec ~ ., data = varechem)
plot(vare.cca, display = c("sites","species"), scaling = 3)
text(vare.cca, scaling = 3, display = "bp")
Here is the usage from the help page, which includes the labels argument:
## S3 method for class 'cca'
text(x, display = "sites", labels, choices = c(1, 2),
     scaling = "species", arrow.mul, head.arrow = 0.05, select, const,
     axis.bp = TRUE, correlation = FALSE, hill = FALSE, ...)
labels: Optional text to be used instead of row names.
(From the help page "Plot or Extract Results of Constrained Correspondence Analysis or Redundancy Analysis".)
I was able to rename the arrows: below is the full code.
library(vegan)
data(dune)
data(dune.env)
Ordination.model1 <- cca(dune ~ Management,dune.env)
summary(Ordination.model1) # Lets you see the current biplot labels in the output.
colvec <- c("red", "green", "blue", "orange")
plot(Ordination.model1, type="n", scaling=1)
with(dune.env, points(Ordination.model1, display ="sites", col=colvec[Management],scaling=1, pch =16, bg = colvec[Management]))
labl <- c("HF", "NM", "SF") # new labels. Need to be in the same order as the old biplot labels.
text(Ordination.model1, display="bp", scaling=1, labels=labl)
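To check the order of the existing arrow labels without scanning the summary() output, here is a small sketch. It assumes that scores() with display = "bp" returns the biplot-arrow matrix with the current labels as row names; old_labels is a made-up name.

```r
library(vegan)
data(dune)
data(dune.env)
Ordination.model1 <- cca(dune ~ Management, dune.env)

## current biplot labels, in the order text() expects the replacements
old_labels <- rownames(scores(Ordination.model1, display = "bp", scaling = 1))
print(old_labels)

## pair old and new labels to confirm the mapping before plotting
labl <- c("HF", "NM", "SF")
print(data.frame(old = old_labels, new = labl))
```

There are three arrows for four management classes because the first factor level serves as the reference level in the model.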
I am quite new to R programming and have been given the task of representing some data in a boxplot. We were only provided with the five-number summary of the data, i.e. the lowest value, lower quartile, median, upper quartile and highest value. We were also told the number of samples (n).
I read that bxp is a function similar to boxplot that draws the plot from this five-number summary.
However, I know varwidth can be used to make the width of the boxes proportional to n, yet it does not seem to work here, as all boxes have the same width. This is what I need help with.
MORSEYear1 <- c(18.2,58.5,64.4,73.4,91.1)
MORSEYear2 <- c(22.3,56.4,64.3,75.7,97.4)
MORSEYear3 <- c(29.1,57.9,66.6,73.4,86.0)
MathStatYear1 <- c(46.8,54.8,66.1,71.4,84.1)
MathStatYear2 <- c(35.1,47.8,57.8,65.7,82.8)
MathStatYear3 <- c(32.6,56.3,61.1,75.6,89.4)
## stats must be a 5 x 1 matrix: five summary values, one column per box
MORSE1 <- list(stats = matrix(MORSEYear1, nrow = 5, ncol = 1), n = 139)
MORSE2 <- list(stats = matrix(MORSEYear2, nrow = 5, ncol = 1), n = 132)
MORSE3 <- list(stats = matrix(MORSEYear3, nrow = 5, ncol = 1), n = 131)
MS1 <- list(stats = matrix(MathStatYear1, nrow = 5, ncol = 1), n = 21)
MS2 <- list(stats = matrix(MathStatYear2, nrow = 5, ncol = 1), n = 20)
MS3 <- list(stats = matrix(MathStatYear3, nrow = 5, ncol = 1), n = 14)
bxp(MORSE1, xlim = c(0.5,6.5),ylim = c(0,100),varwidth= TRUE, main = "Graph comparing distribution of marks across different years of MORSE and MathStat",ylab = "Marks", xlab = "Course and year of study (Course,Year)", axes = FALSE)
par(new=T)
bxp(MORSE2, xlim = c(-0.5,5.5), ylim = c(0,100),axes= TRUE, varwidth=TRUE)
par(new=T)
bxp(MORSE3, xlim = c(-1.5,4.5), ylim = c(0,100), varwidth=TRUE, axes = FALSE)
par(new=T)
bxp(MS1, xlim = c(-2.5,3.5), ylim = c(0,100), varwidth=TRUE, axes = FALSE)
par(new=T)
bxp(MS2, xlim = c(-3.5,2.5), ylim = c(0,100), varwidth=TRUE, axes = FALSE)
par(new=T)
bxp(MS3, xlim = c(-4.5,1.5), ylim = c(0,100), varwidth=TRUE, axes = FALSE)
NOTE: My supervisor said to use par(new=T) and change the xlim to plot multiple graphs using bxp(), if someone could verify if this is the best method or not that would be great!
Thanks
Stumbled upon the same problem, without much experience with R.
The varwidth argument of the bxp() function requires multiple boxplots being plotted at once. Adding to an initial plot does not count, as no readjustment is possible after the fact.
The question is how to construct a multidimensional z argument for bxp(). To answer this, a look at the result of something like boxplot(c(c(1,1),c(2,2))~c(c(11,11),c(22,22))) helps.
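The structure that bxp() expects can be inspected as follows. This is a small sketch that uses plot = FALSE so nothing is drawn; b is just a made-up name.

```r
## inspect what boxplot() would hand to bxp(), without drawing anything
b <- boxplot(c(1, 1, 2, 2) ~ c(11, 11, 22, 22), plot = FALSE)
str(b)
b$stats  # 5 x 2 matrix: one column of five summary values per group
b$n      # one sample size per group
```

The two components that matter for varwidth are stats (one column per box) and n (one count per box).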
First, a generic example with made-up data to aid anyone that lands here:
# data
d1 <- c(1,2,3,4,5)
d2 <- c(1,2,3,5,8,13,21,34)
# summaries (generated with quantile and structured accordingly)
z1 <- list(
  stats = matrix(quantile(d1, c(0.05,0.25,0.5,0.75,0.85))),
  n = length(d1)
)
z2 <- list(
  stats = matrix(quantile(d2, c(0.05,0.25,0.5,0.75,0.85))),
  n = length(d2)
)
# merging the summaries appropriately
z <- list(
  stats = cbind(z1$stats, z2$stats),
  n = c(z1$n, z2$n)
)
# check result
print(z)
# call bxp with needed parameters ("at" can/should also be used here)
bxp(z=z,varwidth=TRUE)
In the case of the original question, one should merge MORSE# and MS#. The code is far from optimal - there might be a better way to merge and a function for this can be written, but the aim is ugly clarity and simplicity:
z <- list(
  stats = cbind(MORSE1$stats, MORSE2$stats, MORSE3$stats, MS1$stats, MS2$stats, MS3$stats),
  n = c(MORSE1$n, MORSE2$n, MORSE3$n, MS1$n, MS2$n, MS3$n)
)
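Putting it together for the data in the question, here is a self-contained sketch. It assumes each stats column must hold the five summary values in increasing order. The names component and its labels are additions for the axis, not part of the original code.

```r
## five-number summaries from the question, one column per box
stats <- cbind(
  c(18.2, 58.5, 64.4, 73.4, 91.1),  # MORSE, year 1
  c(22.3, 56.4, 64.3, 75.7, 97.4),  # MORSE, year 2
  c(29.1, 57.9, 66.6, 73.4, 86.0),  # MORSE, year 3
  c(46.8, 54.8, 66.1, 71.4, 84.1),  # MathStat, year 1
  c(35.1, 47.8, 57.8, 65.7, 82.8),  # MathStat, year 2
  c(32.6, 56.3, 61.1, 75.6, 89.4)   # MathStat, year 3
)

z <- list(
  stats = stats,
  n     = c(139, 132, 131, 21, 20, 14),
  names = c("MORSE,1", "MORSE,2", "MORSE,3", "MS,1", "MS,2", "MS,3")
)

## a single call, so varwidth can compare the sample sizes across boxes
pos <- bxp(z, varwidth = TRUE, ylim = c(0, 100),
           ylab = "Marks",
           xlab = "Course and year of study (Course,Year)")
```

With everything drawn in one call there is no need for par(new = TRUE) or shifted xlim values.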
I have a dataset with 100 species, which makes the plot far too crowded to read. So I want to pick out a subset of these species and show only those in an RDA plot. I have been following this guideline.
The code looks like this:
## load vegan
require("vegan")
## load the Dune data
data(dune, dune.env)
## PCA of the Dune data
mod <- rda(dune, scale = TRUE)
## plot the PCA
plot(mod, scaling = 3)
## build the plot up via vegan methods
scl <- 3 ## scaling == 3
colvec <- c("red2", "green4", "mediumblue")
plot(mod, type = "n", scaling = scl)
with(dune.env, points(mod, display = "sites", col = colvec[Use],
                      scaling = scl, pch = 21, bg = colvec[Use]))
text(mod, display = "species", scaling = scl, cex = 0.8, col = "darkcyan")
with(dune.env, legend("topright", legend = levels(Use), bty = "n",
                      col = colvec, pch = 21, pt.bg = colvec))
This is the plot you end up with. Now I would really like to remove some of the species from the plot, but not from the analysis, so that the plot only shows, for example, Salrep, Viclat, Aloge and Poatri.
Help is appreciated.
The functions you are doing the actual plotting with have an argument select (at least text.cca() and points.cca() do). select takes either a logical vector with one element per item, where the ith element indicates whether the ith item should be plotted, or the (numeric) indices of the items to plot. The example would then become:
## Load vegan
library("vegan")
## load the Dune data
data(dune, dune.env)
## PCA of the Dune data
mod <- rda(dune, scale = TRUE)
## plot the PCA
plot(mod, scaling = 3)
## build the plot up via vegan methods
scl <- 3 ## scaling == 3
colvec <- c("red2", "green4", "mediumblue")
## Show only these spp
sppwant <- c("Salirepe", "Vicilath", "Alopgeni", "Poatriv")
sel <- names(dune) %in% sppwant
## continue plotting
plot(mod, type = "n", scaling = scl)
with(dune.env, points(mod, display = "sites", col = colvec[Use],
                      scaling = scl, pch = 21, bg = colvec[Use]))
text(mod, display = "species", scaling = scl, cex = 0.8, col = "darkcyan",
     select = sel)
with(dune.env, legend("topright", legend = levels(Use), bty = "n",
                      col = colvec, pch = 21, pt.bg = colvec))
Which gives you:
You may also use the ordiselect() function from the goeveg package:
https://CRAN.R-project.org/package=goeveg
It offers selection of species for ordination plots based on abundances and/or species fit to axes.
## Select spp. with filter: 50% most abundant and 50% best fitting
library(goeveg)
sel <- ordiselect(dune, mod, ablim = 0.5, fitlim = 0.5)
sel # 12 species selected
The result object of the function (containing the names of selected species) can be put into the select argument (as described above).
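For readers without goeveg installed, here is a rough stand-in that uses only vegan and base R. It keeps the species whose total abundance is at or above the median, which is a much simpler rule than what ordiselect() applies and is not a reimplementation of it. The names tot and keep are made up. The character vector of names is turned into a logical vector with %in%, as in the answer above. The same conversion should work for a vector of species names such as sel.

```r
library("vegan")
data(dune)
mod <- rda(dune, scale = TRUE)

## keep the species whose total abundance is at or above the median
tot  <- colSums(dune)
keep <- names(tot)[tot >= median(tot)]

plot(mod, type = "n", scaling = 3)
points(mod, display = "sites", scaling = 3, pch = 21)
## only the selected species are labelled; all species stay in the analysis
text(mod, display = "species", scaling = 3, cex = 0.8, col = "darkcyan",
     select = names(dune) %in% keep)
```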