How to limit variables of dotplot? - r

I want to create a dotplot which comprises only the top 10 values of the features in the text file. The following code works, but the output is a dotplot containing all 160 variables.
library(lattice)
infile <- "imp_s2.txt" # avoid naming a variable "table", which masks base::table()
DT <- read.table(infile, header = TRUE)
# output graph to pdf file
pdf("dotplot_s2.pdf")
colnames(DT)
DT$feature <- reorder(DT$feature, DT$IncMSE)
dotplot(feature ~ IncMSE, data = DT,
aspect = 1.5,
xlab = "Variable Importance, Scale 2",
scales = list(cex = .6),
panel = function (x, y) {
panel.abline(h = as.numeric(y), col = "gray", lty = 2)
panel.xyplot(x, as.numeric(y), col = "black", pch = 16)})
dev.off()

It would help if you included a reproducible example. My guess is that you can do this by subsetting the data frame to only the rows with the top 10 values, so that dotplot() never sees the other rows. Something like this might work (although I can't test it):
# get threshold value
cutoff <- sort(DT$IncMSE, decreasing=TRUE)[10]
dotplot(feature ~ IncMSE,
data = DT[which(DT$IncMSE>=cutoff),], # this only includes top values
aspect = 1.5,
xlab = "Variable Importance, Scale 2",
scales = list(cex = .6),
panel = function (x, y) {
panel.abline(h = as.numeric(y), col = "gray", lty = 2)
panel.xyplot(x, as.numeric(y), col = "black", pch = 16)})
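If several features tie at the cutoff, a threshold comparison can return more than ten rows. A minimal sketch of an alternative sorts the rows and keeps exactly the first ten. The data here are made up to stand in for imp_s2.txt. Only the column names feature and IncMSE come from the question; everything else is an assumption.

```r
library(lattice)

# Hypothetical stand-in for imp_s2.txt: 160 features with random importances
set.seed(1)
DT <- data.frame(feature = paste0("f", 1:160), IncMSE = runif(160))

# Sort by importance (largest first) and keep exactly 10 rows
top10 <- head(DT[order(DT$IncMSE, decreasing = TRUE), ], 10)

# Rebuild the factor so its levels are only these 10, ordered by importance
top10$feature <- reorder(factor(top10$feature), top10$IncMSE)

p <- dotplot(feature ~ IncMSE, data = top10,
             xlab = "Variable Importance, Scale 2")
print(p)  # lattice objects need print() inside scripts and functions
```

Rebuilding the factor after subsetting keeps the y-axis limited to the ten plotted features, whatever the level-dropping settings are.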

Related

RDA triplot in R- plot only numeric explanatory variables as arrows; factors as centroids

I ran a distance-based RDA using capscale() in the vegan library in R and I am trying to plot my results as a custom triplot. I only want numeric or continuous explanatory variables to be plotted as arrows/vectors. Currently, both factors and numeric explanatory variables are being plotted with arrows, and I want to remove arrows for factors (site and year) and plot centroids for these instead.
dbRDA=capscale(species ~ canopy+gmpatch+site+year+Condition(pair), data=env, dist="bray")
spe.rda.signif <- dbRDA # the plotting code below refers to the same model as spe.rda.signif
To plot I extracted % explained by the first 2 axes as well as scores (coordinates in RDA space)
perc <- round(100*(summary(spe.rda.signif)$cont$importance[2, 1:2]), 2)
sc_si <- scores(spe.rda.signif, display="sites", choices=c(1,2), scaling=1)
sc_sp <- scores(spe.rda.signif, display="species", choices=c(1,2), scaling=1)
sc_bp <- scores(spe.rda.signif, display="bp", choices=c(1, 2), scaling=1)
I then set up a blank plot with scaling, axes, and labels
dbRDAplot<-plot(spe.rda.signif,
scaling = 1, # set scaling type
type = "none", # this excludes the plotting of any points from the results
frame = FALSE,
# set axis limits
xlim = c(-1,1),
ylim = c(-1,1),
# label the plot (title, and axes)
main = "Triplot db-RDA - scaling 1",
xlab = paste0("db-RDA1 (", perc[1], "%)"),
ylab = paste0("db-RDA2 (", perc[2], "%)"))
Created a legend and added points for site scores and text for species
pchh <- c(2, 17, 1, 19)
ccols <- c("black", "red", "black", "red")
legend("topleft", c("2016 MC", "2016 SP", "2018 MC", "2018 SP"), pch = pchh[unique(as.numeric(as.factor(env$siteyr)))], pt.bg = ccols[unique(as.factor(env$siteyr))], bty = "n")
points(sc_si,
pch = pchh[as.numeric(as.factor(env$siteyr))], # set shape
col = ccols[as.factor(env$siteyr)], # outline colour
bg = ccols[as.factor(env$siteyr)], # fill colour
cex = 1.2) # size
text(sc_sp, # text(sc_sp + c(0.02, 0.08) to adjust text coordinates to avoid overlap with points
labels = rownames(sc_sp),
col = "black",
font = 1, # plain text (font = 2 would be bold)
cex = 0.7)
Here is where I add arrows for explanatory variables, but I want to be selective and do so for the numeric variables only (canopy and gmpatch). I want to plot the variables site and year as centroids, but I am unsure how to do this. Note that both of these are already stored as factors.
arrows(0,0, # start them from (0,0)
sc_bp[,1], sc_bp[,2], # end them at the score value
col = "red",
lwd = 2)
text(x = sc_bp[,1] -0.1, # adjust text coordinate to avoid overlap with arrow tip
y = sc_bp[,2] - 0.03,
labels = rownames(sc_bp),
col = "red",
cex = 1,
font = 1)
@JariOksanen, thank you for your answer. I was able to fix the problem with the following:
text(dbRDA, choices = c(1, 2),"cn", arrow=FALSE, length=0.05, col="red", cex=0.8, xpd=TRUE)
text(dbRDA, display = "bp", labels = c("canopy", "gmpatch"), choices = c(1, 2),scaling = "species", arrow=TRUE, select = c("canopy", "gmpatch"), col="red", cex=0.8, xpd = TRUE)

How can I change the colour of my points on my db-RDA triplot in R?

QUESTION: I am building a triplot for the results of my distance-based RDA in R, library(vegan). I can get a triplot to build, but can't figure out how to make the colours of my sites different based on their location. Code below.
#running the db-RDA
spe.rda.signif=capscale(species~canopy+gmpatch+site+year+Condition(pair), data=env, dist="bray")
#extract % explained by first 2 axes
perc <- round(100*(summary(spe.rda.signif)$cont$importance[2, 1:2]), 2)
#extract scores (coordinates in RDA space)
sc_si <- scores(spe.rda.signif, display="sites", choices=c(1,2), scaling=1)
sc_sp <- scores(spe.rda.signif, display="species", choices=c(1,2), scaling=1)
sc_bp <- scores(spe.rda.signif, display="bp", choices=c(1, 2), scaling=1)
#These are my location or site names that I want to use to define the colours of my points
site_names <-env$site
site_names
#set up blank plot with scaling, axes, and labels
plot(spe.rda.signif,
scaling = 1,
type = "none",
frame = FALSE,
xlim = c(-1,1),
ylim = c(-1,1),
main = "Triplot db-RDA - scaling 1",
xlab = paste0("db-RDA1 (", perc[1], "%)"),
ylab = paste0("db-RDA2 (", perc[2], "%)")
)
#add points for site scores - these are the ones that I want to be two different colours based on the labels in the original data, i.e., env$site or site_names defined above. I have copied the current state of the graph
points(sc_si,
pch = 21, # set shape (here, circle with a fill colour)
col = "black", # outline colour
bg = "steelblue", # fill colour
cex = 1.2) # size
Current graph
I am able to add species names and arrows for environmental predictors, but am just stuck on how to change the colour of the site points to reflect their location (I have two locations defined in my original data). I can get them labelled with text, but that is messy.
Any help appreciated!
I have tried separating shape or colour of point by site_name, but no luck.
If you only have a few groups (in your case, two), you can convert the group to a factor inside the plot call. R stores a factor as integer codes behind the scenes, and base graphics reads an integer colour as an index into palette(), which holds 8 colours by default:
set.seed(123)
df <- data.frame(xvals = runif(100),
yvals = runif(100),
group = sample(c("A", "B"), 100, replace = TRUE))
plot(df[1:2], pch = 21, bg = as.factor(df$group),
bty = "n", xlim = c(-1, 2), ylim = c(-1, 2))
legend("topright", unique(df$group), pch = 21,
pt.bg = unique(as.factor(df$group)), bty = "n")
If you have more than 8 groups, or if you would like to define your own colors, you can simply create a vector of colors the length of your groups and still use the same factor method, though with a few slight tweaks:
# data with 10 groups
set.seed(123)
df <- data.frame(xvals = runif(100),
yvals = runif(100),
group = sample(LETTERS[1:10], 100, replace = TRUE))
# 10 group colors
ccols <- c("red", "orange", "blue", "steelblue", "maroon",
"purple", "green", "lightgreen", "salmon", "yellow")
plot(df[1:2], pch = 21, bg = ccols[as.factor(df$group)],
bty = "n", xlim = c(-1, 2), ylim = c(-1, 2))
legend("topright", unique(df$group), pch = 21,
pt.bg = ccols[unique(as.factor(df$group))], bty = "n")
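For pch, wrap the factor in as.numeric before indexing. This example assumes df again holds only the two groups from the first example, since pchh and ccols have two entries each: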
pchh <- c(21, 22)
ccols <- c("slateblue", "maroon")
plot(df[1:2], pch = pchh[as.numeric(as.factor(df$group))], bg = ccols[as.factor(df$group)],
bty = "n", xlim = c(-1, 2), ylim = c(-1, 2))
legend("topright", unique(df$group),
pch = pchh[unique(as.numeric(as.factor(df$group)))],
pt.bg = ccols[unique(as.factor(df$group))], bty = "n")
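Indexing a colour vector by factor codes depends on the order of the factor levels. As a sketch of a variant that makes the mapping explicit, a named vector can be looked up by group label instead. The group names and colours below are made up for illustration.

```r
set.seed(123)
df <- data.frame(xvals = runif(100),
                 yvals = runif(100),
                 group = sample(c("A", "B"), 100, replace = TRUE))

# Named lookup tables: the names must match the group labels exactly
ccols <- c(A = "slateblue", B = "maroon")
pchh  <- c(A = 21, B = 22)

grp <- as.character(df$group)  # index by name, not by factor code
plot(df$xvals, df$yvals, pch = pchh[grp], bg = ccols[grp],
     bty = "n", xlim = c(-1, 2), ylim = c(-1, 2))
legend("topright", legend = names(ccols), pch = pchh,
       pt.bg = ccols, bty = "n")
```

Because the legend is built from the same named vectors, its labels, symbols, and colours cannot drift out of step with the points.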

Add calculated mean value to vertical line in plot in R

I have created a density plot with a vertical line reflecting the mean - I would like to include the calculated mean number in the graph but don't know how
(for example the mean 1.2 should appear in the graph).
beta_budget[,2] is the column which includes the different numbers of the price.
windows()
plot(density(beta_budget[,2]), xlim= c(-0.1,15), type ="l", xlab = "Beta Coefficients", main = "Preis", col = "black")
abline(v=mean(beta_budget[,2]), col="blue")
legend("topright", legend = c("Price", "Mean"), col = c("black", "blue"), lty=1, cex=0.8)
I tried it with the text command, but it didn't work.
Thank you for your advice!
Something along these lines:
Data:
set.seed(123)
df <- data.frame(
v1 = rnorm(1000)
)
Draw histogram with density line:
hist(df$v1, freq = F, main = "")
lines(density(df$v1, kernel = "cosine", bw = 0.5))
abline(v = mean(df$v1), col = "blue", lty = 3, lwd = 2)
Include the mean as a text element:
text(mean(df$v1), # position of text on x-axis
max(density(df$v1)[[2]]), # position of text on y-axis
mean(df$v1), # text to be plotted
pos = 4, srt = 270, cex = 0.8, col = "blue") # some graphical parameters
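The same idea can be applied to a plain density plot like the one in the question. The sketch below uses simulated values in place of beta_budget[,2] (an assumption, since the real data are not shown) and rounds the mean so the label stays short.

```r
set.seed(123)
x <- rexp(1000, rate = 1 / 1.2)   # hypothetical stand-in for beta_budget[, 2]

d <- density(x)
m <- mean(x)

plot(d, xlab = "Beta Coefficients", main = "Preis", col = "black")
abline(v = m, col = "blue")

# Put a rounded label just right of the line, near the top of the curve
lab <- paste("Mean =", round(m, 2))
text(x = m, y = max(d$y), labels = lab, pos = 4, col = "blue", cex = 0.8)

legend("topright", legend = c("Price", lab), col = c("black", "blue"),
       lty = 1, cex = 0.8)
```

Reusing the label in the legend is optional; it shows the value even if the text next to the line ends up overlapping the curve.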

R: center red_to_blue color palette at 0 in levelplot

I am making a levelplot in which one variable of my data frame is used to color the cells (fold.change) and another (map.signif) is written on top. In this case, I write * and ** for significant cells.
This is my MWE:
set.seed(150)
pv.df <- data.frame(compound=rep(LETTERS[1:8], each=3), comparison=rep(c("a/b","b/c","a/c"), 8), p.value=runif(24, 0, 0.2), fold.change=runif(24, -0.3, 0.9))
pv.df$map.signif <- ifelse(pv.df$p.value > 0.05, "", ifelse(pv.df$p.value > 0.01,"*", "**"))
pv.df
myPanel <- function(x, y, z, ...) {
panel.levelplot(x, y, z, ...)
panel.text(x, y, pv.df$map.signif, cex=3)
}
#install.packages("latticeExtra")
library(latticeExtra)
library(RColorBrewer)
cols <- colorRampPalette(brewer.pal(11, "RdBu"))(11)
png(filename="test.png", height=800, width=400)
print(
levelplot(fold.change ~ comparison*compound, #p.value instead of p.adjust depending on map.signif
pv.df,
panel = myPanel,
col.regions = cols,
at = do.breaks(range(pv.df$fold.change), 11),
colorkey = list(col = cols,
at = do.breaks(range(pv.df$fold.change), 11)),
xlab = "", ylab = "", # remove axis titles
scales = list(x = list(rot = 45), # change rotation for x-axis text
cex = 0.8), # change font size for x- & y-axis text
main = list(label = "Test\nfold change color\n*pv<0.05\t**pv<0.01",
cex = 1.5))
)
dev.off()
Which produces:
My question here is: since fold.change includes negative and positive values, how do I make 0 coincide with white in my color palette, so that negative values are in red and positive ones in blue?
As a bonus, is it possible to write the * in black when the cell background is light, and in white when the background is dark? Many thanks!
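Your range is not symmetric around zero, so the middle colour of the palette does not land on 0. One alternative is to build the breaks from the largest absolute value: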
max_abs <- max(abs(pv.df$fold.change))
brk <- do.breaks(c(-max_abs, max_abs), 11)
levelplot(fold.change ~ comparison*compound, #p.value instead of p.adjust depending on map.signif
pv.df,
panel = myPanel,
col.regions = cols,
at = brk,
colorkey = list(col = cols,
at = brk),
xlab = "", ylab = "", # remove axis titles
scales = list(x = list(rot = 45), # change rotation for x-axis text
cex = 0.8), # change font size for x- & y-axis text
main = list(label = "Test\nfold change color\n*pv<0.05\t**pv<0.01",
cex = 1.5))
Edit
If you don't want the extra breaks below the smallest fold change, trim the breaks and colours together:
max_abs <- max(abs(pv.df$fold.change))
brk <- do.breaks(c(-max_abs, max_abs), 11)
first_true <- which.max(brk > min(pv.df$fold.change))
brk <- brk[(first_true -1):length(brk)]
cols <- cols[(first_true -1):length(cols)]
levelplot(fold.change ~ comparison*compound, #p.value instead of p.adjust depending on map.signif
pv.df,
panel = myPanel,
col.regions = cols,
at = brk,
colorkey = list(col = cols,
at = brk),
xlab = "", ylab = "", # remove axis titles
scales = list(x = list(rot = 45), # change rotation for x-axis text
cex = 0.8), # change font size for x- & y-axis text
main = list(label = "Test\nfold change color\n*pv<0.05\t**pv<0.01",
cex = 1.5))
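For the second part of the question (black labels on light cells, white labels on dark cells), one possible approach is to work out the fill colour of each cell from the breaks and choose the label colour from its brightness. The sketch below is an assumption-laden illustration, not a tested drop-in: it uses a base R red-white-blue ramp instead of the RdBu palette, and the brightness weights and the cutoff of 128 are arbitrary choices.

```r
library(lattice)

set.seed(150)
pv.df <- data.frame(compound = rep(LETTERS[1:8], each = 3),
                    comparison = rep(c("a/b", "b/c", "a/c"), 8),
                    p.value = runif(24, 0, 0.2),
                    fold.change = runif(24, -0.3, 0.9))
pv.df$map.signif <- ifelse(pv.df$p.value > 0.05, "",
                           ifelse(pv.df$p.value > 0.01, "*", "**"))

# Symmetric breaks and a red-white-blue ramp (base R stand-in for RdBu)
max_abs <- max(abs(pv.df$fold.change))
brk  <- do.breaks(c(-max_abs, max_abs), 11)
cols <- colorRampPalette(c("firebrick", "white", "steelblue4"))(11)

# Fill colour of each cell, then its brightness (0 = black, 255 = white)
cell_col <- cols[findInterval(pv.df$fold.change, brk, all.inside = TRUE)]
rgb_mat  <- col2rgb(cell_col)
lum      <- 0.299 * rgb_mat[1, ] + 0.587 * rgb_mat[2, ] + 0.114 * rgb_mat[3, ]
pv.df$txt.col <- ifelse(lum < 128, "white", "black")  # 128 is an arbitrary cutoff

myPanel <- function(x, y, z, subscripts, ...) {
  panel.levelplot(x, y, z, subscripts = subscripts, ...)
  panel.text(x[subscripts], y[subscripts], pv.df$map.signif[subscripts],
             col = pv.df$txt.col[subscripts], cex = 3)
}

print(levelplot(fold.change ~ comparison * compound, pv.df,
                panel = myPanel, col.regions = cols, at = brk,
                xlab = "", ylab = ""))
```

Indexing the labels and colours by subscripts keeps them aligned with the cells that lattice actually draws in the panel.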

How to avoid overlapping of labels in a NMDS plot?

I tried to avoid overlapping of labels in a NMDS plot by using the ggrepel package. At first my code was like this:
result <- adonis(spiders~Wald, data = env, permutations=1000)
result1 <- metaMDS(spiders, distance = "bray", k = 2)
fit <- envfit(result1, env, perm = 1000)
fig<-plot(result1, type = "none")
points(fig, "sites", pch = as.numeric(env$Wald))
text(fig, "species", font=c(2), cex=c(0.75))
plot(fit, p.max = 0.05, col = "darkgrey", font=c(2), cex=c(0.75))
legend("topright", legend = c("Bestand A", "Bestand B", "Bestand C"), cex = 0.75,
pch = sort(unique(as.numeric(env$Wald)))) # one symbol per group rather than one per site
And I received this plot
so I changed my code slightly
fig<-plot(result1, type = "none")
points(fig, "sites", pch = as.numeric(env$Wald))
geom_text_repel(fig, "species", font=c(2), cex=c(0.75))
plot(fit, p.max = 0.05, col = "darkgrey", font=c(2), cex=c(0.75))
but then I got this error:
Error: ggplot2 doesn't know how to deal with data of class character.
I would love to provide my data to make it easier to answer my question, but I don't know how.
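The error most likely arises because geom_text_repel() is a ggplot2 layer and cannot be added to a base graphics plot; it reads its second argument, "species", as a data set. A minimal sketch of the ggplot2 route follows, using made-up coordinates in place of the real species scores. With vegan, the assumption is that something like as.data.frame(scores(result1, display = "species")) would supply the NMDS1 and NMDS2 columns.

```r
# Hypothetical stand-in for the species scores of the NMDS
set.seed(1)
sp <- data.frame(NMDS1 = rnorm(15, sd = 0.2),
                 NMDS2 = rnorm(15, sd = 0.2),
                 label = paste0("sp", 1:15))

# Only draw the plot if both packages are installed
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("ggrepel", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(sp, aes(NMDS1, NMDS2, label = label)) +
    geom_point() +
    ggrepel::geom_text_repel(size = 3, fontface = "bold") +  # pushes labels apart
    coord_equal()
  print(p)
}
```

Site points and fitted environmental vectors would need to be added as further ggplot2 layers from their own data frames, since the base graphics calls from the original code do not carry over.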
