How can I use geom_text() to add the "number" field next to each upper error bar, i.e. to the right of the upper error bar?
group= 1:10
count = c(41,640,1000,65,30,4010,222,277,1853,800 )
mu = c(.7143,.66,.6441,.58,.7488,.5616,.5507,.5337,.5513,.5118)
sd = c(.2443,.20,.2843,.2285,.2616,.2365,.2408,.2101,.2295,.1966)
u = mu + 1.96*sd/sqrt(count)
l= mu - 1.96*sd/sqrt(count)
number = c(23,12,35,32,23,63,65,66,66,66)
dat = data.frame(group = group, count = count, mu = mu, sd = sd, u = u, l = l, number = number)
dat[order(dat$count),]
ggplot(dat, aes(y=factor(group), x= mu)) +
geom_point()+
geom_errorbarh(aes(xmax = as.numeric(u),xmin = as.numeric(l)))
Use aes(label = number, x = as.numeric(u)) in geom_text() so that the numbers become the labels and the upper end of each error bar supplies the x coordinate. The y coordinates are inherited from the aes() you specified in ggplot().
hjust = -1 left-justifies the text labels and shifts them to the right of that x position.
Use xlim() to leave room for labels that would otherwise run past the right edge of the plot.
Example:
ggplot(dat, aes(y=factor(group), x= mu)) +
geom_point()+
geom_errorbarh(aes(xmax = as.numeric(u),xmin = as.numeric(l))) +
geom_text(aes(label = number, x = as.numeric(u)), hjust = -1) +
xlim(.49, .85)
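Because hjust is measured in units of each label's own width, hjust = -1 moves wide labels further than narrow ones. A possible variation, offered as a sketch rather than as the answer's method, is to left-align the labels with hjust = 0 and add a fixed gap with nudge_x. The gap of 0.005 and the expansion of 0.15 are arbitrary assumptions to tune by eye.

```r
library(ggplot2)

# Same summary data as in the question, rebuilt so the block runs on its own
dat <- data.frame(
  group  = 1:10,
  count  = c(41, 640, 1000, 65, 30, 4010, 222, 277, 1853, 800),
  mu     = c(.7143, .66, .6441, .58, .7488, .5616, .5507, .5337, .5513, .5118),
  sd     = c(.2443, .20, .2843, .2285, .2616, .2365, .2408, .2101, .2295, .1966),
  number = c(23, 12, 35, 32, 23, 63, 65, 66, 66, 66)
)
dat$u <- dat$mu + 1.96 * dat$sd / sqrt(dat$count)
dat$l <- dat$mu - 1.96 * dat$sd / sqrt(dat$count)

p <- ggplot(dat, aes(y = factor(group), x = mu)) +
  geom_point() +
  geom_errorbarh(aes(xmax = u, xmin = l)) +
  # hjust = 0 left-aligns each label at u; nudge_x adds a fixed gap in data units
  geom_text(aes(x = u, label = number), hjust = 0, nudge_x = 0.005) +
  # Extra room on the right so labels are not clipped (0.15 is an arbitrary choice)
  scale_x_continuous(expand = expansion(mult = c(0.05, 0.15)))
p
```

Expanding the scale instead of hard-coding xlim() keeps the plot usable if the data change.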
I want to split the y-axis of the attached figure so that the part with scores < 25 occupies most of the figure, while the remaining scores take up a small upper part.
I searched and found that I should use scale_y_discrete(limits. I tried p <- p + scale_y_continuous(breaks = 1:20, labels = c(1:20,"//",40:100)), but it doesn't work.
I used the attached data and this is my code
p<-ggscatter(data, x = "Year" , y = "Score" ,
color = "grey", shape = 21, size = 3, # Points color, shape and size
add.params = list(color = "blue", fill = "lightgray"), # Customize reg. line
add = "loess", #reg.line
conf.int = T,
cor.coef = F, cor.method = "pearson",
xlab = "Year" , ylab= "Score")
p<-p+ coord_cartesian(xlim = c(1980, 2020));p
Here is as close as I could get to a fake axis break with a resized upper area of the plot. I still think it's a bad idea, and if this were my plot I'd much prefer a more straightforward axis transform.
First, we'd need a function that generates a transform that squeezes all values above some threshold:
library(ggplot2)
library(scales)
# Define new transform
my_transform <- function(threshold = 25, squeeze_factor = 10) {
force(threshold)
force(squeeze_factor)
my_transform <- trans_new(
name = "trans_squeeze",
transform = function(x) {
ifelse(x > threshold,
((x - threshold) * (1 / squeeze_factor)) + threshold,
x)
},
inverse = function(x) {
ifelse(x > threshold,
((x - threshold) * squeeze_factor) + threshold,
x)
}
)
return(my_transform)
}
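To see what the transform does before attaching it to a scale, it can be applied to a few values directly. This is a minimal sketch that repeats the definition in compact form so it runs on its own; the test values are arbitrary.

```r
library(scales)

# Compact copy of the transform generator defined above
my_transform <- function(threshold = 25, squeeze_factor = 10) {
  force(threshold)
  force(squeeze_factor)
  trans_new(
    name = "trans_squeeze",
    transform = function(x) {
      ifelse(x > threshold, (x - threshold) / squeeze_factor + threshold, x)
    },
    inverse = function(x) {
      ifelse(x > threshold, (x - threshold) * squeeze_factor + threshold, x)
    }
  )
}

squeeze <- my_transform(threshold = 25, squeeze_factor = 10)

scores <- c(10, 25, 35, 75)
squeezed <- squeeze$transform(scores)
squeezed                    # values up to 25 unchanged; 35 -> 26, 75 -> 30
squeeze$inverse(squeezed)   # recovers the original scores
```

Values above the threshold stay above it after squeezing, which is why the inverse can use the same threshold test.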
Next we apply that transformation to the y-axis and add a fake axis break. I've used vanilla ggplot2 code as I find the ggscatter() approach confusing.
ggplot(data, aes(Year, Score)) +
geom_point(color = "grey", shape = 21, size = 3) +
geom_smooth(method = "loess", fill = "lightgray") +
# Add fake axis lines
annotate("segment", x = -Inf, xend = -Inf,
y = c(-Inf, Inf), yend = c(24.5, 25.5)) +
# Apply transform to y-axis
scale_y_continuous(trans = my_transform(25, 10),
breaks = seq(0, 80, by = 10)) +
scale_x_continuous(limits = c(1980, 2020), oob = oob_keep) +
theme_classic() +
# Turn real y-axis line off
theme(axis.line.y = element_blank())
You might find it informative to read Hadley Wickham's view on discontinuous axes. People sometimes mock weird y-axes.
I'd like to plot histogram and density on the same plot. What I would like to add to the following is custom y-axis label which would be something like sprintf("[%s] %s", ..density.., ..count..) - two numbers at one tick value. Is it possible to obtain this with scale_y_continuous or do I need to work this around somehow?
Below is my current progress using scales::trans_new and sec_axis. The sec_axis version is acceptable, but the most desirable output is the one shown in the image below.
set.seed(1)
var <- rnorm(4000)
binwidth <- 2 * IQR(var) / length(var) ^ (1 / 3)
count_and_proportion_label <- function(x) {
sprintf("%s [%.2f%%]", x, x/sum(x) * 100)
}
ggplot(data = data.frame(var = var), aes(x = var, y = ..count..)) +
geom_histogram(binwidth = binwidth) +
geom_density(aes(y = ..count.. * binwidth)) +
scale_y_continuous(
# this way
trans = trans_new(name = "count_and_proportion",
format = count_and_proportion_label,
transform = function(x) x,
inverse = function(x) x),
# or this way
sec.axis = sec_axis(trans = ~./sum(.),
labels = percent,
name = "proportion (in %)")
)
I've also tried to build an object with the breaks beforehand, based on the graphics::hist output, but the two histograms differ.
bins <- (max(var) - min(var))/binwidth
hdata <- hist(var, breaks = bins, right = FALSE)
# hist generates different bins than `ggplot2`
At the end I would like to get something like this:
Would it be acceptable to add percentage as a secondary axis? E.g.
your_plot + scale_y_continuous(sec.axis = sec_axis(~.*2, name = "[%]"))
Perhaps it would be possible to overlay the secondary axis on the primary one, but I'm not sure how you would go about doing that.
You can achieve your desired output by creating a custom set of labels, and adding it to the plot:
library(tidyverse)
library(ggplot2)
set.seed(1)
var <- rnorm(400)
bins <- .1
df <- data.frame(yvals = seq(0, 20, 5), labels = c("[0%]", "[10%]", "[20%]", "[30%]", "[40%]"))
df <- df %>% tidyr::unite("custom_labels", labels, yvals, sep = " ", remove = TRUE)
ggplot(data = data.frame(var = var), aes(x = var, y = ..count..)) +
geom_histogram(aes(y = ..count..), binwidth = bins) +
geom_density(aes(y = ..count.. * bins), color = "black", alpha = 0.7) +
ylab("[density] count") +
scale_y_continuous(breaks = seq(0, 20, 5), labels = df$custom_labels)
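The hard-coded labels above can also be generated from the break values with a labelling function. The sketch below assumes the bracketed number should be each count's share of all observations, which is one reading of the question rather than something it states. count_pct_label is a hypothetical helper name.

```r
library(ggplot2)

set.seed(1)
var <- rnorm(400)
bins <- 0.1

# Hypothetical helper: returns a labelling function for a known sample size n
count_pct_label <- function(n) {
  force(n)
  function(x) sprintf("[%.1f%%] %s", x / n * 100, x)
}

p <- ggplot(data.frame(var = var), aes(x = var)) +
  geom_histogram(aes(y = after_stat(count)), binwidth = bins) +
  geom_density(aes(y = after_stat(count) * bins)) +
  ylab("[share of observations] count") +
  # ggplot2 passes the break positions to the function and uses what it returns
  scale_y_continuous(labels = count_pct_label(length(var)))
p
```

Dividing by the sample size rather than by sum(x) matters here, because the function only ever sees the axis breaks, not the data.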
Is there a simple way to extend a dotted line from the end of a solid regression line to a predicted value?
Below is my basic attempt at it:
x = rnorm(10)
y = 5 + x + rnorm(10,0,0.4)
my_lm <- lm(y~x)
summary(my_lm)
my_intercept <- my_lm$coef[1]
my_slope <- my_lm$coef[2]
my_pred = predict(my_lm,data.frame(x = (max(x)+1)))
ggdf <- data.frame( x = c(x,max(x)+1), y = c(y,my_pred), obs_Or_Pred = c(rep("Obs",10),"Pred") )
ggplot(ggdf, aes(x = x, y = y, group = obs_Or_Pred ) ) +
geom_point( size = 3, aes(colour = obs_Or_Pred) ) +
geom_abline( intercept = my_intercept, slope = my_slope, aes( linetype = obs_Or_Pred ) )
This doesn't give the output I'd hoped to see. I've looked at some other answers on SO and haven't seen anything simple. The best I've come up with is:
ggdf2 <- data.frame( x = c(x,max(x),max(x)+1), y = c(y,my_intercept+max(x)*my_slope,my_pred), obs_Or_Pred = c(rep("Obs",10),"Pred","Pred"), show_Data_Point = c(rep(TRUE,10),FALSE,TRUE) )
ggplot(ggdf2, aes(x = x, y = y, group = obs_Or_Pred ) ) +
geom_point( data = ggdf2[ggdf2[,"show_Data_Point"],] ,size = 3, aes(colour = obs_Or_Pred) ) +
geom_smooth( method = "lm", se=F, aes(colour = obs_Or_Pred, linetype=obs_Or_Pred) )
This gives output which is correct, but I have had to include an extra column specifying whether or not I want to show the data points. If I don't, I end up with the second of these two plots, which has an extra point at the end of the fitted regression line:
Is there a simpler way to tell ggplot to predict a single point out from the linear model and draw a dashed line to it?
You can plot the points using only your actual data and build a prediction data frame to add the lines. Note that max(x) appears twice so that it can be an endpoint of both the Obs line and the Pred line. We also use a shape aesthetic so that we can remove the point marker that would otherwise appear in the legend key for Pred.
# Build prediction data frame
pred_x = c(min(x),rep(max(x),2),max(x)+1)
pred_lines = data.frame(x=pred_x,
y=predict(my_lm, data.frame(x=pred_x)),
obs_Or_Pred=rep(c("Obs","Pred"), each=2))
ggplot(pred_lines, aes(x, y, colour=obs_Or_Pred, shape=obs_Or_Pred, linetype=obs_Or_Pred)) +
geom_point(data=data.frame(x,y, obs_Or_Pred="Obs"), size=3) +
geom_line(size=1) +
scale_shape_manual(values=c(16,NA)) +
theme_bw()
Semi-ugly: You can use scale_x_continuous(limits = to set the range of x values used for prediction. Plot the predicted line first with fullrange = TRUE, then add the 'observed' line on top. Note that the overplotting isn't rendered perfectly, and you may want to increase the size of the observed line slightly.
ggplot(d, aes(x, y)) +
geom_point(aes(color = "obs")) +
geom_smooth(aes(color = "pred", linetype = "pred"), se = FALSE, method = "lm",
fullrange = TRUE) +
geom_smooth(aes(color = "obs", linetype = "obs"), size = 1.05, se = FALSE, method = "lm") +
scale_linetype_discrete(name = "obs_or_pred") +
scale_color_discrete(name = "obs_or_pred") +
scale_x_continuous(limits = c(NA, max(x) + 1))
However, I tend to agree with Gregor: "ggplot is a plotting package, not a modeling package".
The line width (size) aesthetic in ggplot2 seems to produce lines that are approximately 2.13 times wider in pt than the value given when printing to a pdf (the experiment was done in Adobe Illustrator on a Mac):
library(ggplot2)
dt <- data.frame(id = rep(letters[1:5], each = 3), x = rep(seq(1:3), 5), y = rep(seq(1:5), each = 3), s = rep(c(0.05, 0.1, 0.5, 1, 72.27/96*0.5), each = 3))
lns <- split(dt, dt$id)
ggplot() + geom_line(data = lns[[1]], aes(x = x, y = y), size = unique(lns[[1]]$s)) +
geom_text(data = lns[[1]], y = unique(lns[[1]]$y), x = 3.5, label = paste("Width in ggplot =", unique(lns[[1]]$s))) +
geom_line(data = lns[[2]], aes(x = x, y = y), size = unique(lns[[2]]$s)) +
geom_text(data = lns[[2]], y = unique(lns[[2]]$y), x = 3.5, label = paste("Width in ggplot =", unique(lns[[2]]$s))) +
geom_line(data = lns[[3]], aes(x = x, y = y), size = unique(lns[[3]]$s)) +
geom_text(data = lns[[3]], y = unique(lns[[3]]$y), x = 3.5, label = paste("Width in ggplot =", unique(lns[[3]]$s))) +
geom_line(data = lns[[4]], aes(x = x, y = y), size = unique(lns[[4]]$s)) +
geom_text(data = lns[[4]], y = unique(lns[[4]]$y), x = 3.5, label = paste("Width in ggplot =", unique(lns[[4]]$s))) +
geom_line(data = lns[[5]], aes(x = x, y = y), size = unique(lns[[5]]$s)) +
geom_text(data = lns[[5]], y = unique(lns[[5]]$y), x = 3.5, label = paste("Width in ggplot =", unique(lns[[5]]$s))) +
xlim(1,4) + theme_void()
ggsave("linetest.pdf", width = 8, height = 2)
# Device size does not affect line width:
ggsave("linetest2.pdf", width = 10, height = 6)
I read that one should multiply the line width by 72.27/96 to get a line width in pt, but the experiment above gives me a line width of 0.8 pt, when I try to get 0.5 pt.
As @Pascal points out, the line width does not seem to follow the pt to mm conversion that works for fonts and was defined by @hadley in one of the comments. I.e. the line width does not appear to be defined by "the magic number" 1/0.352777778.
What is the equation behind line width for ggplot2?
You had all the pieces in your post already. First, ggplot2 multiplies the size setting by ggplot2::.pt, which is defined as 72.27/25.4 = 2.845276 (line 165 in geom-.r):
> ggplot2::.pt
[1] 2.845276
Then, as you state, you need to multiply the resulting value by 72.27/96 to convert from R pixels to points. Thus the conversion factor is:
> ggplot2::.pt*72.27/96
[1] 2.141959
As you can see, ggplot2 size = 1 corresponds to approximately 2.14pt, and similarly 0.8 pt corresponds to 0.8/2.141959 = 0.3734899 in ggplot2 size units.
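The arithmetic above can be wrapped in two small helpers. This is a sketch that takes the conversion factor stated in this answer at face value; size_to_pt and pt_to_size are hypothetical names, not ggplot2 functions.

```r
# Factor from the answer: ggplot2::.pt (72.27 / 25.4) times 72.27 / 96
pt_per_size <- (72.27 / 25.4) * (72.27 / 96)

# Hypothetical helpers for converting in either direction
size_to_pt <- function(size) size * pt_per_size
pt_to_size <- function(pt) pt / pt_per_size

size_to_pt(1)     # about 2.14 pt
pt_to_size(0.8)   # about 0.373 in ggplot2 size units
pt_to_size(0.5)   # about 0.233, the size to request for a 0.5 pt line
```

The constant is written out numerically so the block runs without loading ggplot2; with the package attached, ggplot2::.pt * 72.27 / 96 gives the same factor.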
I tried to run the following function in R:
GainChart <- function(score, good.label, n.breaks = 50, ...){
df <- data.frame(percentiles = seq(0, 1, length = n.breaks),
gain = Gain(score, good.label, seq(0, 1, length = n.breaks)))
p <- ggplot(df, aes(percentiles, gain)) + geom_line(size = 1.2, colour = "darkred")
p <- p + geom_line(aes(x = c(0,1), y = c(0,1)), colour = "gray", size = 0.7)
p <- p + scale_x_continuous("Sample Percentiles", labels = percent_format(), limits = c(0, 1))
p <- p + scale_y_continuous("Cumulative Percents of Bads", labels = percent_format(), limits = c(0, 1))
p
}
And I received the following error message: "Error: Aesthetics must be either length 1 or the same as the data (50): x, y"
The command to call the function is:
GainChart(data_sampni$score,data_sampni$TOPUP_60d)
Your second geom_line() doesn't make sense. It seems that you are trying to draw a line between [0,0] and [1,1]. In that case, use geom_abline():
+ geom_abline(aes(intercept = 0, slope = 1))
The help for geom_line says that if data=NULL (the default) then the data is inherited from the parent graph. That is where your mismatch is coming from.
If you change the second line to something like:
p <- p + geom_line(aes(x,y), data=data.frame(x=c(0,1), y=c(0,1)))
Then it should work.
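Both suggested fixes can be checked on made-up data. In the sketch below the gain column is a hypothetical stand-in for the output of Gain(), which is not shown in the question; only the shape of the data frame (50 rows) matters for reproducing the fix.

```r
library(ggplot2)
library(scales)

# Hypothetical gain curve standing in for Gain(score, good.label, ...)
df <- data.frame(percentiles = seq(0, 1, length.out = 50))
df$gain <- sqrt(df$percentiles)

base <- ggplot(df, aes(percentiles, gain)) +
  geom_line(colour = "darkred") +
  scale_x_continuous("Sample Percentiles", labels = percent_format(), limits = c(0, 1)) +
  scale_y_continuous("Cumulative Percents of Bads", labels = percent_format(), limits = c(0, 1))

# Fix 1: draw the diagonal as a reference line, independent of the data
p1 <- base + geom_abline(intercept = 0, slope = 1, colour = "gray")

# Fix 2: give the second geom_line() its own two-row data frame
diag_df <- data.frame(x = c(0, 1), y = c(0, 1))
p2 <- base + geom_line(aes(x, y), data = diag_df, colour = "gray")

p1
p2
```

In both versions the diagonal no longer inherits the 50-row data frame, so the aesthetic lengths no longer conflict.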