Add calculated mean value to vertical line in plot in R - r

I have created a density plot with a vertical line reflecting the mean - I would like to include the calculated mean number in the graph but don't know how
(for example the mean 1.2 should appear in the graph).
beta_budget[,2] is the column which includes the different numbers of the price.
windows()
plot(density(beta_budget[,2]), xlim= c(-0.1,15), type ="l", xlab = "Beta Coefficients", main = "Preis", col = "black")
abline(v=mean(beta_budget[,2]), col="blue")
legend("topright", legend = c("Price", "Mean"), col = c("black", "blue"), lty=1, cex=0.8)
I tried it with the text command but it didn't work...
Thank you for your advise!

Something along these lines:
Data:
set.seed(123)
df <- data.frame(
v1 = rnorm(1000)
)
Draw histogram with density line:
hist(df$v1, freq = F, main = "")
lines(density(df$v1, kernel = "cosine", bw = 0.5))
abline(v = mean(df$v1), col = "blue", lty = 3, lwd = 2)
Include the mean as a text element:
text(mean(df$v1), # position of text on x-axis
max(density(df$v1)[[2]]), # position of text on y-axis
mean(df$v1), # text to be plotted
pos = 4, srt = 270, cex = 0.8, col = "blue") # some graphical parameters

Related

RDA triplot in R- plot only numeric explanatory variables as arrows; factors as centroids

I ran a distance-based RDA using capscale() in the vegan library in R and I am trying to plot my results as a custom triplot. I only want numeric or continuous explanatory variables to be plotted as arrows/vectors. Currently, both factors and numeric explanatory variables are being plotted with arrows, and I want to remove arrows for factors (site and year) and plot centroids for these instead.
dbRDA=capscale(species ~ canopy+gmpatch+site+year+Condition(pair), data=env, dist="bray")
To plot I extracted % explained by the first 2 axes as well as scores (coordinates in RDA space)
perc <- round(100*(summary(spe.rda.signif)$cont$importance[2, 1:2]), 2)
sc_si <- scores(spe.rda.signif, display="sites", choices=c(1,2), scaling=1)
sc_sp <- scores(spe.rda.signif, display="species", choices=c(1,2), scaling=1)
sc_bp <- scores(spe.rda.signif, display="bp", choices=c(1, 2), scaling=1)
I then set up a blank plot with scaling, axes, and labels
dbRDAplot<-plot(spe.rda.signif,
scaling = 1, # set scaling type
type = "none", # this excludes the plotting of any points from the results
frame = FALSE,
# set axis limits
xlim = c(-1,1),
ylim = c(-1,1),
# label the plot (title, and axes)
main = "Triplot db-RDA - scaling 1",
xlab = paste0("db-RDA1 (", perc[1], "%)"),
ylab = paste0("db-RDA2 (", perc[2], "%)"))
Created a legend and added points for site scores and text for species
pchh <- c(2, 17, 1, 19)
ccols <- c("black", "red", "black", "red")
legend("topleft", c("2016 MC", "2016 SP", "2018 MC", "2018 SP"), pch = pchh[unique(as.numeric(as.factor(env$siteyr)))], pt.bg = ccols[unique(as.factor(env$siteyr))], bty = "n")
points(sc_si,
pch = pchh[as.numeric(as.factor(env$siteyr))], # set shape
col = ccols[as.factor(env$siteyr)], # outline colour
bg = ccols[as.factor(env$siteyr)], # fill colour
cex = 1.2) # size
text(sc_sp , # text(sc_sp + c(0.02, 0.08) tp adjust text coordinates to avoid overlap with points
labels = rownames(sc_sp),
col = "black",
font = 1, # bold
cex = 0.7)
Here is where I add arrows for explanatory variables, but I want to be selective and do so for numeric variables only (canopy and gmpatch). The variables site and year I want to plot as centroids, but unsure how to do this. Note that the data structure for these are definitely specified as factors already.
arrows(0,0, # start them from (0,0)
sc_bp[,1], sc_bp[,2], # end them at the score value
col = "red",
lwd = 2)
text(x = sc_bp[,1] -0.1, # adjust text coordinate to avoid overlap with arrow tip
y = sc_bp[,2] - 0.03,
labels = rownames(sc_bp),
col = "red",
cex = 1,
font = 1)
#JariOksanen thank you for your answer. I was able to use the following to fix the problem
text(dbRDA, choices = c(1, 2),"cn", arrow=FALSE, length=0.05, col="red", cex=0.8, xpd=TRUE)
text(dbRDA, display = "bp", labels = c("canopy", "gmpatch"), choices = c(1, 2),scaling = "species", arrow=TRUE, select = c("canopy", "gmpatch"), col="red", cex=0.8, xpd = TRUE)
#JariOksanen thank you for your answer. I was able to use the following to fix the problem
text(dbRDA, choices = c(1, 2),"cn", arrow=FALSE, length=0.05, col="red", cex=0.8, xpd=TRUE)
text(dbRDA, display = "bp", labels = c("canopy", "gmpatch"), choices = c(1, 2),scaling = "species", arrow=TRUE, select = c("canopy", "gmpatch"), col="red", cex=0.8, xpd = TRUE)

"col" argument in plot function not working when a factor value is used for x - axis

I am doing quarterly analysis, for which I want to plot a graph. To maintain continuity on x axis I have turned quarters into factors. But then when I am using plot function and trying to color it red, the col argument is not working.
An example:
quarterly_analysis <- data.frame(Quarter = as.factor(c(2020.1,2020.2,2020.3,2020.4,2021.1,2021.2,2021.3,2021.4)),
AvgDefault = as.numeric(c(0.24,0.27,0.17,0.35,0.32,0.42,0.38,0.40)))
plot(quarterly_analysis, col="red")
But I am getting the graph in black color as shown below:
Converting it to a factor is not ideal to plot unless you have multiple values for each factor - it tries to plot a box plot-style plot. For example, with 10 observations in the same factor, the col = "red" color shows up as the fill:
set.seed(123)
fact_example <- data.frame(factvar = as.factor(rep(LETTERS[1:3], 10)),
numvar = runif(30))
plot(fact_example$factvar, fact_example$numvar,
col = "red")
With only one observation for each factor, this is not ideal because it is just showing you the line that the box plot would make.
You could use border = "red:
plot(quarterly_analysis$Quarter,
quarterly_analysis$AvgDefault, border="red")
Or if you want more flexibility, you can plot it numerically and do a little tweaking for more control (i.e., can change the pch, or make it a line graph):
# make numeric x values to plot
x_vals <- as.numeric(substr(quarterly_analysis$Quarter,1,4)) + rep(seq(0, 1, length.out = 4))
par(mfrow=c(1,3))
plot(x_vals,
quarterly_analysis$AvgDefault, col="red",
pch = 7, main = "Square Symbol", axes = FALSE)
axis(1, at = x_vals,
labels = quarterly_analysis$Quarter)
axis(2)
plot(x_vals,
quarterly_analysis$AvgDefault, col="red",
type = "l", main = "Line graph", axes = FALSE)
axis(1, at = x_vals,
labels = quarterly_analysis$Quarter)
axis(2)
plot(x_vals,
quarterly_analysis$AvgDefault, col="red",
type = "b", pch = 7, main = "Both", axes = FALSE)
axis(1, at = x_vals,
labels = quarterly_analysis$Quarter)
axis(2)
Data
set.seed(123)
quarterly_analysis <- data.frame(Quarter = as.factor(paste0(2019:2022,
rep(c(".1", ".2", ".3", ".4"),
each = 4))),
AvgDefault = runif(16))
quarterly_analysis <- quarterly_analysis[order(quarterly_analysis$Quarter),]

How can I change the colour of my points on my db-RDA triplot in R?

QUESTION: I am building a triplot for the results of my distance-based RDA in R, library(vegan). I can get a triplot to build, but can't figure out how to make the colours of my sites different based on their location. Code below.
#running the db-RDA
spe.rda.signif=capscale(species~canopy+gmpatch+site+year+Condition(pair), data=env, dist="bray")
#extract % explained by first 2 axes
perc <- round(100*(summary(spe.rda.signif)$cont$importance[2, 1:2]), 2)
#extract scores (coordinates in RDA space)
sc_si <- scores(spe.rda.signif, display="sites", choices=c(1,2), scaling=1)
sc_sp <- scores(spe.rda.signif, display="species", choices=c(1,2), scaling=1)
sc_bp <- scores(spe.rda.signif, display="bp", choices=c(1, 2), scaling=1)
#These are my location or site names that I want to use to define the colours of my points
site_names <-env$site
site_names
#set up blank plot with scaling, axes, and labels
plot(spe.rda.signif,
scaling = 1,
type = "none",
frame = FALSE,
xlim = c(-1,1),
ylim = c(-1,1),
main = "Triplot db-RDA - scaling 1",
xlab = paste0("db-RDA1 (", perc[1], "%)"),
ylab = paste0("db-RDA2 (", perc[2], "%)")
)
#add points for site scores - these are the ones that I want to be two different colours based on the labels in the original data, i.e., env$site or site_names defined above. I have copied the current state of the graph
points(sc_si,
pch = 21, # set shape (here, circle with a fill colour)
col = "black", # outline colour
bg = "steelblue", # fill colour
cex = 1.2) # size
Current graph
I am able to add species names and arrows for environmental predictors, but am just stuck on how to change the colour of the site points to reflect their location (I have two locations defined in my original data). I can get them labelled with text, but that is messy.
Any help appreciated!
I have tried separating shape or colour of point by site_name, but no luck.
If you only have a few groups (in your case, two), you could make the group a factor (within the plot call). In R, factors are represented as an integer "behind the scenes" - you can represent up to 8 colors in base R using a simple integer:
set.seed(123)
df <- data.frame(xvals = runif(100),
yvals = runif(100),
group = sample(c("A", "B"), 100, replace = TRUE))
plot(df[1:2], pch = 21, bg = as.factor(df$group),
bty = "n", xlim = c(-1, 2), ylim = c(-1, 2))
legend("topright", unique(df$group), pch = 21,
pt.bg = unique(as.factor(df$group)), bty = "n")
If you have more than 8 groups, or if you would like to define your own colors, you can simply create a vector of colors the length of your groups and still use the same factor method, though with a few slight tweaks:
# data with 10 groups
set.seed(123)
df <- data.frame(xvals = runif(100),
yvals = runif(100),
group = sample(LETTERS[1:10], 100, replace = TRUE))
# 10 group colors
ccols <- c("red", "orange", "blue", "steelblue", "maroon",
"purple", "green", "lightgreen", "salmon", "yellow")
plot(df[1:2], pch = 21, bg = ccols[as.factor(df$group)],
bty = "n", xlim = c(-1, 2), ylim = c(-1, 2))
legend("topright", unique(df$group), pch = 21,
pt.bg = ccols[unique(as.factor(df$group))], bty = "n")
For pch just a slight tweak to wrap it in as.numeric:
pchh <- c(21, 22)
ccols <- c("slateblue", "maroon")
plot(df[1:2], pch = pchh[as.numeric(as.factor(df$group))], bg = ccols[as.factor(df$group)],
bty = "n", xlim = c(-1, 2), ylim = c(-1, 2))
legend("topright", unique(df$group),
pch = pchh[unique(as.numeric(as.factor(df$group)))],
pt.bg = ccols[unique(as.factor(df$group))], bty = "n")

How to add centroids to an RDA plot

I'd like to replace the arrows on this RDA plot with centroids, something like what's pictured here.
This is the code I currently have which provides me arrows (I guess by default). I have shared our RDA code and I think this is where we might be able to change it from arrows to centroid:
# add arrows for effects of the expanatory variables
arrows(0,0, # start them from (0,0)
sc_bp[,1], sc_bp[,2], # end them at the score value
col = "red",
lwd = 1,
length = .1)
(but I share the entire code chunk (below), just in case.
Please note that my data is on fish community (species) and substrate types at 36 sites, I'd like to replace the arrows for substrates with centroids within my RDA.
##Now, the RDA
Y.mat<-Belt_2021_fish_transformed_forPCA #fish community
str(Y.mat)
X.mat<-Reefcheck_2021_forPCA #substrate
str(X.mat)
###Community data has already been transformed with hellinger
##Now, try the RDA
fish_substrate_rda<-rda(Y.mat,X.mat)
```
##Plot
## extract % explained by the first 2 axes
perc_b <- round(100*(summary(fish_substrate_rda)$cont$importance[2, 1:2]), 2)
## extract scores - these are coordinates in the RDA space
sc_si <- scores(fish_substrate_rda, display="sites", choices=c(1,2), scaling=1)
sc_sp <- scores(fish_substrate_rda, display="species", choices=c(1,2), scaling=1)
sc_sp <- sc_sp[c(2,7,8),]
sc_bp <- scores(fish_substrate_rda, display="bp", choices=c(1,2), scaling=1)
sc_bp <- sc_bp[c(2,5,6),]
# Set up a blank plot with scaling, axes, and labels
plot(fish_substrate_rda,
scaling = 1, # set scaling type
type = "none", # this excludes the plotting of any points from the results
frame = TRUE,
# set axis limits
ylim = c(-1.5,0.7),
xlim = c(-1.5,1.2),
# label the plot (title, and axes)
main = "Triplot RDA - scaling 1",
xlab = paste0("RDA1 (", perc_b[1], "%)"),
ylab = paste0("RDA2 (", perc_b[2], "%)")
)
# add points for site scores
points(sc_si,
pch = 21, # set shape (here, circle with a fill colour)
col = "black", # outline colour
bg = "steelblue", # fill colour
cex = 0.7) # size
# add points for species scores
points(sc_sp,
pch = 22, # set shape (here, square with a fill colour)
col = "black",
bg = "#f2bd33",
cex = 0.7)
# add text labels for species abbreviations
text(sc_sp + c(-0.09, -0.09), # adjust text coordinates to avoid overlap with points
labels = rownames(sc_sp),
col = "grey40",
font = 2, # bold
cex = 0.6)
# add arrows for effects of the expanatory variables
arrows(0,0, # start them from (0,0)
sc_bp[,1], sc_bp[,2], # end them at the score value
col = "red",
lwd = 1,
length = .1)
# add text labels for arrows
text(x = sc_bp[,1] -0.01, # adjust text coordinate to avoid overlap with arrow tip
y = sc_bp[,2] - 0.09,
labels = rownames(sc_bp),
col = "red",
cex = .7,
font = 1)
```
I have not found anything online that might help me to accomplish this.

Padding Around Legend when using Pch in Base R

Just a minor question. I am trying to make a legend for the following plot.
# fitting the linear model
iris_lm = lm(Petal.Length ~ Sepal.Length, data = iris)
summary(iris_lm)
# calculating the confidence interval for the fitted line
preds = predict(iris_lm, newdata = data.frame(Sepal.Length = seq(4,8,0.1)),
interval = "confidence")
# making the initial plot
par(family = "serif")
plot(Petal.Length ~ Sepal.Length, data = iris, col = "darkgrey",
family = "serif", las = 1, xlab = "Sepal Length", ylab = "Pedal Length")
# shading in the confidence interval
polygon(
c(seq(8,4,-0.1), seq(4,8,0.1)), # all of the necessary x values
c(rev(preds[,3]), preds[,2]), # all of the necessary y values
col = rgb(0.2745098, 0.5098039, 0.7058824, 0.4), # the color of the interval
border = NA # turning off the border
)
# adding the regression line
abline(iris_lm, col = "SteelBlue")
# adding a legend
legend("bottomright", legend = c("Fitted Values", "Confidence Interval"),
lty = c(1,0))
Here's the output so far:
My goal is to put a box in the legend next to the "Confidence Interval" tab, and color it in the same shade that it is in the picture. Naturally, I thought to use the pch parameter. However, when I re-run my code with the additional legend option pch = c(NA, 25), I get the following:
It is not super noticeable, but if you look closely at the padding on the left margin of the legend, it actually has decreased, and the edge of the border is now closer to the line than I would like. Is there any way to work around this?
That's a curious behavior in legend(). I'm sure someone will suggest a ggplot2 alternative. However, legend() does offer a solution. This solution calls the function without plotting anything to capture the dimensions of the desired rectangle. The legend is then plotted with the elements you really want but no enclosing box (bty = "n"). The desired rectangle is added explicitly. I assume you mean pch = 22 to get the filled box symbol. I added pt.cex = 2 to make it a bit larger.
# Capture the confidence interval color, reusable variables
myCol <- rgb(0.2745098, 0.5098039, 0.7058824, 0.4)
legText <- c("Fitted Values", "Confidence Interval")
# Picking it up from 'adding a legend'
ans <- legend("bottomright", lty = c(1,0), legend = legText, plot = F)
r <- ans$rect
legend("bottomright", lty = c(1,0), legend = legText, pch = c(NA,22),
pt.bg = myCol, col = c(1, 0), pt.cex = 2, bty = "n")
# Draw the desired box
rect(r$left, r$top - r$h, r$left + r$w, r$top)
By the way, I don't think this will work without further tweaking if you place the legend on the left side.

Resources