When using glmnet and making a plot of the coefficient path, I would like to remove the axis above the plot.
How do you remove this axis? I found this toy example on the web:
library(glmnet)
age <- c(4,8,7,12,6,9,10,14,7)
gender <- c(1,0,1,1,1,0,1,0,0) ; gender<-as.factor(gender)
bmi_p <- c(0.86,0.45,0.99,0.84,0.85,0.67,0.91,0.29,0.88)
m_edu <- c(0,1,1,2,2,3,2,0,1); m_edu<-as.factor(m_edu)
p_edu <- c(0,2,2,2,2,3,2,0,0); p_edu<-as.factor(p_edu)
f_color <- c("blue", "blue", "yellow", "red", "red", "yellow", "yellow", "red", "yellow")
asthma <- c(1,1,0,1,0,0,0,1,1)
f_color <- as.factor(f_color)
xfactors <- model.matrix(asthma ~ gender + m_edu + p_edu + f_color)[,-1]
x <- as.matrix(data.frame(age, bmi_p, xfactors))
glmmod<-glmnet(x,y=as.factor(asthma),alpha=1,family='binomial')
plot(glmmod,xvar="lambda", axes=FALSE)
This unwanted axis is made with axis function, I guess. We can use this function one more time to cover it. The trick is as follows:
1) Create a sequence with many points at which tick-marks are to be drawn.
2) Make them bigger (lengthen upwards)
3) Enlarge the width of the axis
4) Change the white colour and switch off labels
plot(glmmod, xvar = "lambda", axes = F, xlab = "", ylab = "")
axis(side = 3,
at = seq(par("usr")[1], par("usr")[2], len = 1000),
tck = -0.5,
lwd = 2,
col = "white",
labels = F)
Related
I ran a distance-based RDA using capscale() in the vegan library in R and I am trying to plot my results as a custom triplot. I only want numeric or continuous explanatory variables to be plotted as arrows/vectors. Currently, both factors and numeric explanatory variables are being plotted with arrows, and I want to remove arrows for factors (site and year) and plot centroids for these instead.
dbRDA=capscale(species ~ canopy+gmpatch+site+year+Condition(pair), data=env, dist="bray")
To plot I extracted % explained by the first 2 axes as well as scores (coordinates in RDA space)
perc <- round(100*(summary(spe.rda.signif)$cont$importance[2, 1:2]), 2)
sc_si <- scores(spe.rda.signif, display="sites", choices=c(1,2), scaling=1)
sc_sp <- scores(spe.rda.signif, display="species", choices=c(1,2), scaling=1)
sc_bp <- scores(spe.rda.signif, display="bp", choices=c(1, 2), scaling=1)
I then set up a blank plot with scaling, axes, and labels
dbRDAplot<-plot(spe.rda.signif,
scaling = 1, # set scaling type
type = "none", # this excludes the plotting of any points from the results
frame = FALSE,
# set axis limits
xlim = c(-1,1),
ylim = c(-1,1),
# label the plot (title, and axes)
main = "Triplot db-RDA - scaling 1",
xlab = paste0("db-RDA1 (", perc[1], "%)"),
ylab = paste0("db-RDA2 (", perc[2], "%)"))
Created a legend and added points for site scores and text for species
pchh <- c(2, 17, 1, 19)
ccols <- c("black", "red", "black", "red")
legend("topleft", c("2016 MC", "2016 SP", "2018 MC", "2018 SP"), pch = pchh[unique(as.numeric(as.factor(env$siteyr)))], pt.bg = ccols[unique(as.factor(env$siteyr))], bty = "n")
points(sc_si,
pch = pchh[as.numeric(as.factor(env$siteyr))], # set shape
col = ccols[as.factor(env$siteyr)], # outline colour
bg = ccols[as.factor(env$siteyr)], # fill colour
cex = 1.2) # size
text(sc_sp , # text(sc_sp + c(0.02, 0.08) tp adjust text coordinates to avoid overlap with points
labels = rownames(sc_sp),
col = "black",
font = 1, # bold
cex = 0.7)
Here is where I add arrows for explanatory variables, but I want to be selective and do so for numeric variables only (canopy and gmpatch). The variables site and year I want to plot as centroids, but unsure how to do this. Note that the data structure for these are definitely specified as factors already.
arrows(0,0, # start them from (0,0)
sc_bp[,1], sc_bp[,2], # end them at the score value
col = "red",
lwd = 2)
text(x = sc_bp[,1] -0.1, # adjust text coordinate to avoid overlap with arrow tip
y = sc_bp[,2] - 0.03,
labels = rownames(sc_bp),
col = "red",
cex = 1,
font = 1)
#JariOksanen thank you for your answer. I was able to use the following to fix the problem
text(dbRDA, choices = c(1, 2),"cn", arrow=FALSE, length=0.05, col="red", cex=0.8, xpd=TRUE)
text(dbRDA, display = "bp", labels = c("canopy", "gmpatch"), choices = c(1, 2),scaling = "species", arrow=TRUE, select = c("canopy", "gmpatch"), col="red", cex=0.8, xpd = TRUE)
#JariOksanen thank you for your answer. I was able to use the following to fix the problem
text(dbRDA, choices = c(1, 2),"cn", arrow=FALSE, length=0.05, col="red", cex=0.8, xpd=TRUE)
text(dbRDA, display = "bp", labels = c("canopy", "gmpatch"), choices = c(1, 2),scaling = "species", arrow=TRUE, select = c("canopy", "gmpatch"), col="red", cex=0.8, xpd = TRUE)
QUESTION: I am building a triplot for the results of my distance-based RDA in R, library(vegan). I can get a triplot to build, but can't figure out how to make the colours of my sites different based on their location. Code below.
#running the db-RDA
spe.rda.signif=capscale(species~canopy+gmpatch+site+year+Condition(pair), data=env, dist="bray")
#extract % explained by first 2 axes
perc <- round(100*(summary(spe.rda.signif)$cont$importance[2, 1:2]), 2)
#extract scores (coordinates in RDA space)
sc_si <- scores(spe.rda.signif, display="sites", choices=c(1,2), scaling=1)
sc_sp <- scores(spe.rda.signif, display="species", choices=c(1,2), scaling=1)
sc_bp <- scores(spe.rda.signif, display="bp", choices=c(1, 2), scaling=1)
#These are my location or site names that I want to use to define the colours of my points
site_names <-env$site
site_names
#set up blank plot with scaling, axes, and labels
plot(spe.rda.signif,
scaling = 1,
type = "none",
frame = FALSE,
xlim = c(-1,1),
ylim = c(-1,1),
main = "Triplot db-RDA - scaling 1",
xlab = paste0("db-RDA1 (", perc[1], "%)"),
ylab = paste0("db-RDA2 (", perc[2], "%)")
)
#add points for site scores - these are the ones that I want to be two different colours based on the labels in the original data, i.e., env$site or site_names defined above. I have copied the current state of the graph
points(sc_si,
pch = 21, # set shape (here, circle with a fill colour)
col = "black", # outline colour
bg = "steelblue", # fill colour
cex = 1.2) # size
Current graph
I am able to add species names and arrows for environmental predictors, but am just stuck on how to change the colour of the site points to reflect their location (I have two locations defined in my original data). I can get them labelled with text, but that is messy.
Any help appreciated!
I have tried separating shape or colour of point by site_name, but no luck.
If you only have a few groups (in your case, two), you could make the group a factor (within the plot call). In R, factors are represented as an integer "behind the scenes" - you can represent up to 8 colors in base R using a simple integer:
set.seed(123)
df <- data.frame(xvals = runif(100),
yvals = runif(100),
group = sample(c("A", "B"), 100, replace = TRUE))
plot(df[1:2], pch = 21, bg = as.factor(df$group),
bty = "n", xlim = c(-1, 2), ylim = c(-1, 2))
legend("topright", unique(df$group), pch = 21,
pt.bg = unique(as.factor(df$group)), bty = "n")
If you have more than 8 groups, or if you would like to define your own colors, you can simply create a vector of colors the length of your groups and still use the same factor method, though with a few slight tweaks:
# data with 10 groups
set.seed(123)
df <- data.frame(xvals = runif(100),
yvals = runif(100),
group = sample(LETTERS[1:10], 100, replace = TRUE))
# 10 group colors
ccols <- c("red", "orange", "blue", "steelblue", "maroon",
"purple", "green", "lightgreen", "salmon", "yellow")
plot(df[1:2], pch = 21, bg = ccols[as.factor(df$group)],
bty = "n", xlim = c(-1, 2), ylim = c(-1, 2))
legend("topright", unique(df$group), pch = 21,
pt.bg = ccols[unique(as.factor(df$group))], bty = "n")
For pch just a slight tweak to wrap it in as.numeric:
pchh <- c(21, 22)
ccols <- c("slateblue", "maroon")
plot(df[1:2], pch = pchh[as.numeric(as.factor(df$group))], bg = ccols[as.factor(df$group)],
bty = "n", xlim = c(-1, 2), ylim = c(-1, 2))
legend("topright", unique(df$group),
pch = pchh[unique(as.numeric(as.factor(df$group)))],
pt.bg = ccols[unique(as.factor(df$group))], bty = "n")
I have created a density plot with a vertical line reflecting the mean - I would like to include the calculated mean number in the graph but don't know how
(for example the mean 1.2 should appear in the graph).
beta_budget[,2] is the column which includes the different numbers of the price.
windows()
plot(density(beta_budget[,2]), xlim= c(-0.1,15), type ="l", xlab = "Beta Coefficients", main = "Preis", col = "black")
abline(v=mean(beta_budget[,2]), col="blue")
legend("topright", legend = c("Price", "Mean"), col = c("black", "blue"), lty=1, cex=0.8)
I tried it with the text command but it didn't work...
Thank you for your advise!
Something along these lines:
Data:
set.seed(123)
df <- data.frame(
v1 = rnorm(1000)
)
Draw histogram with density line:
hist(df$v1, freq = F, main = "")
lines(density(df$v1, kernel = "cosine", bw = 0.5))
abline(v = mean(df$v1), col = "blue", lty = 3, lwd = 2)
Include the mean as a text element:
text(mean(df$v1), # position of text on x-axis
max(density(df$v1)[[2]]), # position of text on y-axis
mean(df$v1), # text to be plotted
pos = 4, srt = 270, cex = 0.8, col = "blue") # some graphical parameters
I am looking to plot the following:
L<-((2*pi*h*c^2)/l^5)*((1/(exp((h*c)/(l*k*T)-1))))
all variables except l are constant:
T<-6000
h<-6.626070040*10^-34
c<-2.99792458*10^8
k<-1.38064852*10^-23
l has a range of 20*10^-9 to 2000*10^-9.
I have tried l<-seq(20*10^-9,2000*10^-9,by=1*10^-9), however this does not give me the results I expect.
Is there a simple solution for this in R, or do I have to try in another language?
Thank you.
Looking at the spectral radiance equation wikipedia page, it seems that your formula is a bit off. Your formula multiplies an additional pi (not sure if intended) and the -1 is inside the exp instead of outside:
L <- ((2*pi*h*c^2)/l^5)*((1/(exp((h*c)/(l*k*T)-1))))
Below is the corrected formula. Also notice I have converted it into a function with parameter l since this is a variable:
T <- 6000 # Absolute temperature
h <- 6.626070040*10^-34 # Plank's constant
c <- 2.99792458*10^8 # Speed of light in the medium
k <- 1.38064852*10^-23 # Boltzmann constant
L <- function(l){((2*h*c^2)/l^5)*((1/(exp((h*c)/(l*k*T))-1)))}
# Plotting
plot(L, xlim = c(20*10^-9,2000*10^-9),
xlab = "Wavelength (nm)",
ylab = bquote("Spectral Radiance" ~(KW*sr^-1*m^-2*nm^-1)),
main = "Plank's Law",
xaxt = "n", yaxt = "n")
xtick <- seq(20*10^-9, 2000*10^-9,by=220*10^-9)
ytick <- seq(0, 4*10^13,by=5*10^12)
axis(side=1, at=xtick, labels = (1*10^9)*seq(20*10^-9,2000*10^-9,by=220*10^-9))
axis(side=2, at=ytick, labels = (1*10^-12)*seq(0, 4*10^13,by=5*10^12))
The plot above is not bad, but I think we can do better with ggplot2:
h <- 6.626070040*10^-34 # Plank's constant
c <- 2.99792458*10^8 # Speed of light in the medium
k <- 1.38064852*10^-23 # Boltzmann constant
L2 <- function(l, T){((2*h*c^2)/l^5)*((1/(exp((h*c)/(l*k*T))-1)))} # Plank's Law
classical_L <- function(l, T){(2*c*k*T)/l^4} # Rayleigh-Jeans Law
library(ggplot2)
ggplot(data.frame(l = c(20*10^-9,2000*10^-9)), aes(l)) +
geom_rect(aes(xmin=390*10^-9, xmax=700*10^-9, ymin=0, ymax=Inf),
alpha = 0.3, fill = "lightblue") +
stat_function(fun=L2, color = "red", size = 1, args = list(T = 3000)) +
stat_function(fun=L2, color = "green", size = 1, args = list(T = 4000)) +
stat_function(fun=L2, color = "blue", size = 1, args = list(T = 5000)) +
stat_function(fun=L2, color = "purple", size = 1, args = list(T = 6000)) +
stat_function(fun=classical_L, color = "black", size = 1, args = list(T = 5000)) +
theme_bw() +
scale_x_continuous(breaks = seq(20*10^-9, 2000*10^-9,by=220*10^-9),
labels = (1*10^9)*seq(20*10^-9,2000*10^-9,by=220*10^-9),
sec.axis = dup_axis(labels = (1*10^6)*seq(20*10^-9,2000*10^-9,by=220*10^-9),
name = "Wavelength (\U003BCm)")) +
scale_y_continuous(breaks = seq(0, 4*10^13,by=5*10^12),
labels = (1*10^-12)*seq(0, 4*10^13,by=5*10^12),
limits = c(0, 3.5*10^13)) +
labs(title = "Black Body Radiation described by Plank's Law",
x = "Wavelength (nm)",
y = expression("Spectral Radiance" ~(kWsr^-1*m^-2*nm^-1)),
caption = expression(''^'\U02020' ~'Spectral Radiance described by Rayleigh-Jeans Law, which demonstrates the ultraviolet catastrophe.')) +
annotate("text",
x = c(640*10^-9, 640*10^-9, 640*10^-9, 640*10^-9,
150*10^-9, (((700-390)/2)+390)*10^-9, 1340*10^-9),
y = c(2*10^12, 5*10^12, 14*10^12, 31*10^12,
35*10^12, 35*10^12, 35*10^12),
label = c("3000 K", "4000 K", "5000 K", "6000 K",
"UV", "VISIBLE", "INFRARED"),
color = c(rep("black", 4), "purple", "blue", "red"),
alpha = c(rep(1, 4), rep(0.6, 3)),
size = 4.5) +
annotate("text", x = 1350*10^-9, y = 23*10^12,
label = deparse(bquote("Classical theory (5000 K)"^"\U02020")),
color = "black", parse = TRUE)
Notes:
I created L2 by also making absolute temperature T a variable
For each T, I plot the function L2 using different colors for representation. I've also added a classical_L function to demonstrate classical theory of spectral radiance
geom_rect creates the light blue shaded area for "VISIBLE" light wavelength range
scale_x_continuous sets the breaks of the x axis, while labels sets the axis tick labels. Notice I have multiplied the seq by (1*10^9) to convert the units to nanometer (nm). A second x-axis is added to display the micrometer scale
Analogously, scale_y_continuous sets the breaks and tick labels for y axis. Here I multiplied by (1*10^-12) or (1*10^(-3-9)) to convert from watts (W) to kilowatts (kW), and from inverse meter (m^-1) to inverse nanometer (nm^-1)
bquote displays superscripts correctly in the y axis label
annotate sets the coordinates and text for curve labels. I've also added the labels for "UV", "VISIBLE" and "INFRARED" light wavelengths
ggplot2
Plot from wikipedia:
Image source: https://upload.wikimedia.org/wikipedia/commons/thumb/1/19/Black_body.svg/600px-Black_body.svg.png
I have a 3 column matrix; plots are made by points based on column 1 and column 2 values, but colored based on column 2 (6 different groups). I can successfully plot all points, however, the last plot group (group 6) which was assigned the color purple, masks the plots of the other groups. Is there a way to make the plot points more transparent?
s <- read.table("/.../parse-output.txt", sep="\t")
dim(s)
[1] 67124 3
x <- s[,1]
y <- s[,2]
z <- s[,3]
cols <- cut(z, 6, labels = c("pink", "red", "yellow", "blue", "green", "purple"))
plot(x, y, main= "Fragment recruitment plot - FR-HIT", ylab = "Percent identity", xlab = "Base pair position", col = as.character(cols), pch=16)
Otherwise, you have function alpha in package scales in which you can directly input your vector of colors (even if they are factors as in your example):
library(scales)
cols <- cut(z, 6, labels = c("pink", "red", "yellow", "blue", "green", "purple"))
plot(x, y, main= "Fragment recruitment plot - FR-HIT",
ylab = "Percent identity", xlab = "Base pair position",
col = alpha(cols, 0.4), pch=16)
# For an alpha of 0.4, i. e. an opacity of 40%.
When creating the colors, you may use rgb and set its alpha argument:
plot(1:10, col = rgb(red = 1, green = 0, blue = 0, alpha = 0.5),
pch = 16, cex = 4)
points((1:10) + 0.4, col = rgb(red = 0, green = 0, blue = 1, alpha = 0.5),
pch = 16, cex = 4)
Please see ?rgb for details.
Transparency can be coded in the color argument as well. It is just two more hex numbers coding a transparency between 0 (fully transparent) and 255 (fully visible). I once wrote this function to add transparency to a color vector, maybe it is usefull here?
addTrans <- function(color,trans)
{
# This function adds transparancy to a color.
# Define transparancy with an integer between 0 and 255
# 0 being fully transparant and 255 being fully visable
# Works with either color and trans a vector of equal length,
# or one of the two of length 1.
if (length(color)!=length(trans)&!any(c(length(color),length(trans))==1)) stop("Vector lengths not correct")
if (length(color)==1 & length(trans)>1) color <- rep(color,length(trans))
if (length(trans)==1 & length(color)>1) trans <- rep(trans,length(color))
num2hex <- function(x)
{
hex <- unlist(strsplit("0123456789ABCDEF",split=""))
return(paste(hex[(x-x%%16)/16+1],hex[x%%16+1],sep=""))
}
rgb <- rbind(col2rgb(color),trans)
res <- paste("#",apply(apply(rgb,2,num2hex),2,paste,collapse=""),sep="")
return(res)
}
Some examples:
cols <- sample(c("red","green","pink"),100,TRUE)
# Fully visable:
plot(rnorm(100),rnorm(100),col=cols,pch=16,cex=4)
# Somewhat transparant:
plot(rnorm(100),rnorm(100),col=addTrans(cols,200),pch=16,cex=4)
# Very transparant:
plot(rnorm(100),rnorm(100),col=addTrans(cols,100),pch=16,cex=4)
If you are using the hex codes, you can add two more digits at the end of the code to represent the alpha channel:
E.g. half-transparency red:
plot(1:100, main="Example of Plot With Transparency")
lines(1:100 + sin(1:100*2*pi/(20)), col='#FF000088', lwd=4)
mtext("use `col='#FF000088'` for the lines() function")
If you decide to use ggplot2, you can set transparency of overlapping points using the alpha argument.
e.g.
library(ggplot2)
ggplot(diamonds, aes(carat, price)) + geom_point(alpha = 1/40)