R: PCA plot with different colors for Sites - r

I´m recently trying to analyse my data and want to make the graphs a little nicer but I´m failing at this.
So I have a data set with 144 sites and 5 environmental variables. It´s basically about the substrate composition around an island and the fish abundance. On this island there is supposed to be a difference in the substrate composition between the north and the southside. Right now I am doing a pca and with the biplot function it works quite fine, but I would like to change the plot a bit.
I need one where the sites are just points and not numbered, arrows point to the different variable and the sites are colored according to their location (north or southside). So I tried everything i could find.
Most examples where with the dune data and suggested something like this:
library(vegan)
library(biplot)
data(dune)
mod <- rda(dune, scale = TRUE)
biplot(mod, scaling = 3, type = c("text", "points"))
So according to this I would just need to say text and points and R would label the variables and just make points for the sites. When i do this, however I get the Error:
Error in plot.default(x, type = "n", xlim = xlim, ylim = ylim, col = col[1L], :
formal argument "type" matched by multiple actual arguments
No idea how to get around this.
So next strategy I found, is to make a plot manually like this:
require("vegan")
data(dune, dune.env)
mod <- rda(dune, scale = TRUE)
scl <- 3 ## scaling == 3
colvec <- c("red2", "green4", "mediumblue")
plot(mod, type = "n", scaling = scl)
with(dune.env, points(mod, display = "sites", col = colvec[Use],
scaling = scl, pch = 21, bg = colvec[Use]))
text(mod,display="species", scaling = scl, cex = 0.8, col = "darkcyan")
with(dune.env, legend("bottomright", legend = levels(Use), bty = "n",
col = colvec, pch = 21, pt.bg = colvec))
This works fine so far as well, I get different colors and points, but now the arrows are missing. So I found that this should be corrected easy, if i just put "display="bp"" in the text line. But this doesn´t work either. Everytime I put "bp" R says:
Error in match.arg(display) :
argument "display" is missing, with no default
So I´m kind of desperate now. I looked through all the answers here and I don´t understand why display="bp" and type=c("text","points") is not working for me.
If anyone has an idea i would be super grateful.
https://www.dropbox.com/sh/y8xzq0bs6mus727/AADmasrXxUp6JTTHN5Gr9eufa?dl=0
This is the link to my dropbox folder. It contains my R-script and the csv files. The one named environmentalvariables_Kon1 also contains the data about north and southside.
So yeah...if anyone could help me. That would be awesome. I really don´t know what to do anymore.
Best regards,
Nancy

You can add arrows with arrows(). See the code for vegan:::biplot.rda to see how it works in the original function.
With your plot, add
g <- scores(mod, display = "species")
len <- 1
arrows(0, 0, len * g[, 1], len * g[, 2], length = 0.05, col = "darkcyan")
You might want to adjust the value of len to make the arrows longer

Related

non-linear 2d object transformation by horizontal axis

How can such a non-linear transformation be done?
here is the code to draw it
my.sin <- function(ve,a,f,p) a*sin(f*ve+p)
s1 <- my.sin(1:100, 15, 0.1, 0.5)
s2 <- my.sin(1:100, 21, 0.2, 1)
s <- s1+s2+10+1:100
par(mfrow=c(1,2),mar=rep(2,4))
plot(s,t="l",main = "input") ; abline(h=seq(10,120,by = 5),col=8)
plot(s*7,t="l",main = "output")
abline(h=cumsum(s)/10*2,col=8)
don't look at the vector, don't look at the values, only look at the horizontal grid, only the grid matters
####UPDATE####
I see that my question is not clear to many people, I apologize for that...
Here are examples of transformations only along the vertical axis, maybe now it will be more clear to you what I want
link Source
#### UPDATE 2 ####
Thanks for your answer, this looks like what I need, but I have a few more questions if I may.
To clarify, I want to explain why I need this, I want to compare vectors with each other that are non-linearly distorted along the horizontal axis .. Maybe there are already ready-made tools for this?
You mentioned that there are many ways to do such non-linear transformations, can you name a few of the best ones in my case?
how to make the function f() more non-linear, so that it consists, for example, not of one sinusoid, but of 10 or more. Тhe figure shows that the distortion is quite simple, it corresponds to one sinusoid
and how to make the function f can be changed with different combinations of sinusoids.
set.seed(126)
par(mar = rep(2, 4),mfrow=c(1,3))
s <- cumsum(rnorm(100))
r <- range(s)
gridlines <- seq(r[1]*2, r[2]*2, by = 0.2)
plot(s, t = "l", main = "input")
abline(h = gridlines, col = 8)
f <- function(x) 2 * sin(x)/2 + x
plot(s, t = "l", main = "input+new greed")
abline(h = f(gridlines), col = 8)
plot(f(s), t = "l", main = "output")
abline(h = f(gridlines), col = 8)
If I understand you correctly, you wish to map the vector s from the regular spacing defined in the first image to the irregular spacing implied by the second plot.
Unfortunately, your mapping is not well-defined, since there is no clear correspondence between the horizontal lines in the first image and the second image. There are in fact an infinite number of ways to map the first space to the second.
We can alter your example a bit to make it a bit more rigorous.
If we start with your function and your data:
my.sin <- function(ve, a, f, p) a * sin(f * ve + p)
s1 <- my.sin(1:100, 15, 0.1, 0.5)
s2 <- my.sin(1:100, 21, 0.2, 1)
s <- s1 + s2 + 10 + 1:100
Let us also create a vector of gridlines that we will draw on the first plot:
gridlines <- seq(10, 120, by = 2.5)
Now we can recreate your first plot:
par(mar = rep(2, 4))
plot(s, t = "l", main = "input")
abline(h = gridlines, col = 8)
Now, suppose we have a function that maps our y axis values to a different value:
f <- function(x) 2 * sin(x/5) + x
If we apply this to our gridlines, we have something similar to your second image:
plot(s, t = "l", main = "input")
abline(h = f(gridlines), col = 8)
Now, what we want to do here is effectively transform our curve so that it is stretched or compressed in such a way that it crosses the gridlines at the same points as the gridlines in the original image. To do this, we simply apply our mapping function to s. We can check the correspondence to the original gridlines by plotting our new curves with a transformed axis :
plot(f(s), t = "l", main = "output", yaxt = "n")
axis(2, at = f(20 * 1:6), labels = 20 * 1:6)
abline(h = f(gridlines), col = 8)
It may be possible to create a mapping function using the cumsum(s)/10 * 2 that you have in your original example, but it is not clear how you want this to correspond to the original y axis values.
Response to edits
It's not clear what you mean by comparing two vectors. If one is a non-linear deformation of the other, then presumably you want to find the underlying function that produces the deformation. It is possible to create a function that applies the deformation empirically simply by doing f <- approxfun(untransformed_vector, transformed_vector).
I didn't say there were many ways of doing non-linear transformations. What I meant is that in your original example, there is no correspondence between the grid lines in the original picture and the second picture, so there is an infinite choice for which gridines in the first picture correspond to which gridlines in the second picture. There is therefore an infinite choice of mapping functions that could be specified.
The function f can be as complicated as you like, but in this scenario it should at least be everywhere non-decreasing, such that any value of the function's output can be mapped back to a single value of its input. For example, function(x) x + sin(x)/4 + cos(3*(x + 2))/5 would be a complex but ever-increasing sinusoidal function.

Select argument doesn't work on cca objects

I created an object of class cca in vegan and now I am trying to tidy up the triplot. However, I seemingly can't use the select argument to only show specified items.
My code looks like this:
data("varechem")
data("varespec")
ord <- cca(varespec ~ Al+S, varechem)
plot(ord, type = "n")
text(ord, display = "sites", select = c("18", "21"))
I want only the two specified sites (18 and 21) to appear in the plot, but when I run the code nothing happens. I do not even get an error meassage.
I'm really stuck, but I am fairly certain that this bit of code is correct. Can someone help me?
I can't recall now, but I don't think the intention was to allow "names" to select which rows of the scores should be selected. The documentation speaks of select being a logical vector, or indices of the scores to be selected. By indices it was meant numeric indices, not rownames.
The example fails because select is also used to subset the labels character vector of values to be plotted in text(), and this labels character vector is not named. Using a character vector to subset another vector requires that the other vector be named.
Your example works if you do:
data("varechem")
data("varespec")
ord <- cca(varespec ~ Al + S, varechem)
plot(ord, type = "n")
take <- which(rownames(varechem) %in% c("18", "21"))
# or
# take <- rownames(varechem) %in% c("18", "21")
text(ord, display = "sites", select = take)
I'll have a think about whether it will be simple to support the use case of your example.
The following code probably gives the result you want to achieve:
First, create an object to store the blank CCA1-CCA2 plot
p1 = plot(ord, type = "n")
Find and then save the coordinates of the sites 18 and 21
p1$p1$sites[c("18", "21"),]
# CCA1 CCA2
#18 0.3496725 -1.334061
#21 -0.8617759 -1.588855
site18 = p1$sites["18",]
site21 = p1$sites["21",]
Overlay the blank CCA1-CCA2 plot with the points of site 18 and 21. Setting different colors to different points might be a good idea.
points(p1$sites[c("18", "21"),], pch = 19, col = c("blue", "red"))
Showing labels might be informative.
text(x = site18[1], y = site18[2] + 0.3, labels = "site 18")
text(x = site21[1], y = site21[2] + 0.3, labels = "site 21")
Here is the resulted plot.

How do I exclude parameters from an RDA plot

I'm still relatively inexperienced manipulating plots in R, and am in need of assistance. I ran a redundancy analysis in R using the rda() function, but now I need to simplify the figure to exclude unnecessary information. The code I'm currently using is:
abio1516<-read.csv("1516 descriptors.csv")
attach(abio1516)
bio1516<-read.csv("1516habund.csv")
attach(bio1516)
rda1516<-rda(bio1516[,2:18],abio1516[,2:6])
anova(rda1516)
RsquareAdj(rda1516)
summary(rda1516)
varpart(bio1516[,2:18],~Distance_to_source,~Depth, ~Veg._cover, ~Surface_area,data=abio1516)
plot(rda1516,bty="n",xaxt="n",yaxt="n",main="1516; P=, R^2=",
ylab="Driven by , Var explained=",xlab="Driven by , Var explained=")
The produced plot looks like this:
Please help me modify my code to: exclude the sites (sit#), all axes, and the internal dashed lines.
I'd also like to either expand the size of the field, or move the vector labels to all fit in the plotting field.
updated as per responses, working code below this point
plot(rda,bty="n",xaxt="n",yaxt="n",type="n",main="xxx",ylab="xxx",xlab="xxx
Overall best:xxx")
abline(h=0,v=0,col="white",lwd=3)
points(rda,display="species",col="blue")
points(rda,display="cn",col="black")
text(rda,display="cn",col="black")
Start by plotting the rda with type = "n" which generates an empty plot to which you can add the things you want. The dotted lines are hard coded into the plot.cca function, so you need either make your own version, or use abline to hide them (then use box to cover up the holes in the axes).
require(vegan)
data(dune, dune.env)
rda1516 <- rda(dune~., data = dune.env)
plot(rda1516, type = "n")
abline(h = 0, v = 0, col = "white", lwd = 3)
box()
points(rda1516, display = "species")
points(rda1516, display = "cn", col = "blue")
text(rda1516, display = "cn", col = "blue")
If the text labels are not in the correct position, you can use the argument pos to move them (make a vector as long as the number of arrows you have with the integers 1 - 4 to move the label down, left, up, or right. (there might be better solutions to this)

(R) Axis widths in gbm.plot

Hoping for some pointers or some experiences insight as i'm literally losing my mind over this, been trying for 2 full days to set up the right values to have a function spit out clean simple line plots from the gbm.plot function (packages dismo & gbm).
Here's where I start. bty=n in par to turn off the box & leave me with only left & bottom axes. Gbm.plot typically spits out one plot per explanatory variable, so usually 6 plots etc, but I'm tweaking it to do one per variable & looping it. I've removed the loop & lots of other code so it's easy to see what's going on.
png(filename = "whatever.png",width=4*480, height=4*480, units="px", pointsize=80, bg="white", res = NA, family="", type="cairo-png")
par(mar=c(2.6,2,0.4,0.5), fig=c(0,1,0.1,1), las=1, bty="n", mgp=c(1.6,0.5,0))
gbm.plot(my_gbm_model,
n.plots=1,
plot.layout = c(1,1),
y.label = "",
write.title=F,
variable.no = 1, #this is part of the multiple plots thing, calls the explanatory variable
lwd=8, #this controls the width of the main result line ONLY
rug=F)
dev.off()
So this is what the starting condition looks like. Aim: make the axes & ticks thicker. That's it.
Putting "lwd=20" in par does nothing.
Adding axes=F into gbm.plot() turns the axes and their numbers off. So I conclude that the control of these axes is handled by gbm.plot, not par. Here's where it get's frustrating and crap. Accepted wisdom from searches says that lwd should control this but it only controls the wiggly centre line as per my note above. So maybe I could add axis(side=1, lwd=8) into gbm.plot() ?
It runs but inexplicably adds a smoother! (which is very thin & hard to see on the web but it's there, I promise). It adds these warnings:
In if (smooth & is.vector(predictors[[j]])) { ... :
the condition has length > 1 and only the first element will be used
Fine, R's going to be a dick for seemingly no reason, I'll keep plugging the leaks as they come up. New code with axis as before and now smoother turned off:
png(filename = "whatever.png",width=4*480, height=4*480, units="px", pointsize=80, bg="white", res = NA, family="", type="cairo-png")
par(mar=c(2.6,2,0.4,0.5), fig=c(0,1,0.1,1), las=1, bty="n", mgp=c(1.6,0.5,0))
gbm.plot(my_gbm_model,
n.plots=1,
plot.layout = c(1,1),
y.label = "",
write.title=F,
variable.no = 1,
lwd=8,
rug=F,
smooth=F,
axis(side=1,lwd=8))
dev.off()
Gives error:
Error in axis(side = 1, lwd = 8) : plot.new has not been called yet
So it's CLEARLY drawing axes within plot since I can't affect the axes from par and I can turn them off in plot. I can do what I want and make one axis bold, but that results in a smoother and warnings. I can turn the smoother off, but then it fails because it says plot.new hadn't been called. And this doesn't even account for the other axis I have to deal with, which also causes the plot.new failure if I call 2 axis sequentially and allow the smoother.
Am I the butt of a big joke here, or am I missing something obvious? It took me long enough to work out that par is supposed to be before all plots unless you're outputting them with png etc in which case it has to be between png & plot - unbelievably this info isn't in ?par. I know I'm going off topic by ranting, sorry, but yeah, 2 full days. Has this been everyone's experience of plotting in R?
I'm going to open the vodka in the freezer. I appreciate I've not put the full reproducible code here, apologies, I can do if absolutely necessary, but it's such a huge timesuck to get to reproducible stage and I'm hoping someone can see a basic logical/coding failure screaming out at them from what I've given.
Thanks guys.
EDIT: reproducibility
core data csv: https://drive.google.com/file/d/0B6LsdZetdypkWnBJVDJ5U3l4UFU
(I've tried to make these data reproducible before and I can't work out how to do so)
samples<-read.csv("data.csv", header = TRUE, row.names=NULL)
my_gbm_model<-gbm.step(data=samples, gbm.x=1:6, gbm.y=7, family = "bernoulli", tree.complexity = 2, learning.rate = 0.01, bag.fraction = 0.5))
Here's what will widen your axis ticks:
..... , lwd.ticks=4 , ...
I predict on the basis of no testing because I keep getting errors with what limited code you have provided) that it will get handled correctly in either gbm.plot or in a subsequent axis call. There will need to be a subsequent axis call, two of them in fact (because as you noted 'lwd' gets passed around indiscriminately):
png(filename = "whatever.png",width=4*480, height=4*480, units="px", pointsize=80, bg="white", res = NA, family="", type="cairo-png")
par(mar=c(2.6,2,0.4,0.5), fig=c(0,1,0.1,1), las=1, bty="n", mgp=c(1.6,0.5,0))
gbm.plot(my_gbm_model,
n.plots=1,
plot.layout = c(1,1),
y.label = "",
write.title=F,
variable.no = 1,
lwd=8,
rug=F,
smooth=F, axes="F",
axis(side=1,lwd=8))
axis(1, lwd.ticks=4, lwd=4)
# the only way to prevent `lwd` from also affecting plot line
axis(2, lwd.ticks=4, lwd=4)
dev.off()
This is what I see with a simple example:
png(); Speed <- cars$speed
Distance <- cars$dist
plot(Speed, Distance,
panel.first = lines(stats::lowess(Speed, Distance), lty = "dashed"),
pch = 0, cex = 1.2, col = "blue", axes=FALSE)
axis(1, lwd.ticks=4, lwd=4)
axis(2, lwd.ticks=4, lwd=4)
dev.off()

R) Create double-labeled MDS plot

I am totally new to R.
I have expression profile data which is preprocessed and combined. Looks like this ("exp.txt")
STUDY_1_CANCER_1 STUDY_1_CON_1 STUDY_2_CANCER_1 STUDY_2_CANCER_2
P53 1.111 1.22 1.3 1.4
.....
Also, I created phenotype data. Looks lite this ("pheno.txt")
Sample Disease Study
STUDY_1_CANCER_1 Cancer GSE1
STUDY_1_CON_1 Normal GSE1
STUDY_2_CANCER_1 Cancer GSE2
STUDY_2_CON_1 Normal GSE2
Here, I tried to make MDS plot using classical cmdscale command like this.
data=read.table("exp.txt", row.names=1, header=T)
DATA=as.matrix(data)
pc=cor(DATA, method="p")
mds=cmdscale(as.dist(1-pc),2)
plot(mds)
I'd like to create plot like this figure with color double-labeling (Study and Disease). How should I do?
First create an empty plot, then add the points with specified colors/shapes.
Here's an example:
require(vegan)
data(dune)
data(dune.env)
mds <- cmdscale(vegdist(dune, method='bray'))
# set colors and shapes
cols = c('red', 'blue', 'black', 'steelblue')
shps = c(15, 16, 17)
# empty plot
plot(mds, type = 'n')
# add points
points(mds, col = cols[dune.env$Management], pch = shps[dune.env$Use])
# add legend
legend('topright', col=cols, legend=levels(dune.env$Management), pch = 16, cex = 0.7)
legend('bottomright', legend=levels(dune.env$Use), pch = shps, cex = 0.7)
Note that factors are internally coded as integers, which is helpful here.
> levels(dune.env$Management)
[1] "BF" "HF" "NM" "SF"
so
cols[dune.env$Management]
will take the first entry of cols for the first factor levels. Similariy for the different shapes.
Finally add the legend. Of course this plot still needs some polishing, but thats the way to go...
BTW: Gavin Simpson has a nice blogpost about customizing ordination plots.
Actually, you can do this directly in default plot command which can take pch and col arguments as vectors. Use:
with(data, plot(mds, col = as.numeric(Study), pch = as.numeric(Disease), asp = 1)
You must use asp = 1 when you plot cmdscale results: both axes must be scaled similarly. You can also add xlab and ylab arguments for nicer axis labels. For adding legend and selecting plotting characters and colours, see other responses.

Resources