ggplot2: drawing a line between two values - r

I have the following dataframe:
library(tibble)
library(ggplot2)
df <- tribble(
~group, ~value, ~ci_low, ~ci_upper,
"group1", 0.0577434, 0.0567665, 0.0587203,
"group2", 0.0941233, 0.0801769, 0.1080698)
I want to plot the column "value" as a point, with a dashed line drawn underneath it that runs from ci_low (its lowest point) to ci_upper (its highest point).
So far I have this:
ggplot(df, aes(group, value)) +
geom_point(size = 4)
And I want something like this:

To have control over the line ends, use geom_segment:
ggplot(df, aes(group, value)) +
geom_segment(aes(xend = group, y = ci_low, yend = ci_upper), color = "red", size = 2, lineend = "round") +
geom_point(size = 4) +
theme_bw()
If square line ends are OK, use geom_linerange:
ggplot(df, aes(group, value)) +
geom_linerange(aes(ymin = ci_low, ymax = ci_upper), color = "red", size = 2) +
geom_point(size = 4) +
theme_bw()
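The question asked for a dashed line, which neither snippet above sets. A minimal sketch, assuming ggplot2 3.4.0 or later (where `linewidth` controls line thickness) and the `df` from the question: add `linetype = "dashed"` to the range layer. The name `p_dashed` is chosen here for illustration. Drawing the range before the points keeps the line underneath them.

```r
library(ggplot2)
library(tibble)

df <- tribble(
  ~group,   ~value,    ~ci_low,   ~ci_upper,
  "group1", 0.0577434, 0.0567665, 0.0587203,
  "group2", 0.0941233, 0.0801769, 0.1080698
)

p_dashed <- ggplot(df, aes(group, value)) +
  # range layer first, so the points are drawn on top of it
  geom_linerange(aes(ymin = ci_low, ymax = ci_upper),
                 color = "red", linewidth = 1, linetype = "dashed") +
  geom_point(size = 4) +
  theme_bw()

p_dashed
```

The same `linetype` argument should also work in the `geom_segment()` version if rounded line ends are needed.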

This should do the trick.
ggplot(df, aes(x=group, y=value)) +
geom_pointrange(aes(ymin=ci_low, ymax=ci_upper))

Related

mean line in every facet_wrap

ggplot(data = results, aes(x = inst, y = value, group = inst)) +
geom_boxplot() +
facet_wrap(~color) +
#geom_line(data = mean,
#mapping = aes(x = inst, y = average, group = 1))
theme_bw()
When I run the code above with the geom_line() call commented out, it runs and gives the graph below. What I want is a line joining the means of the boxplots within each facet, colored by that facet's color category. Any ideas how I can do that?
Your code is generally correct (though you'll want to add color = color to the aes() specification in geom_line()), so I suspect your mean dataset isn't set up correctly. Do you have means grouped by both your x axis and faceting variable? Using ggplot2::mpg as an example:
library(dplyr) # >= v1.1.0
library(ggplot2)
mean_dat <- summarize(mpg, average = mean(hwy), .by = c(cyl, drv))
ggplot(mpg, aes(factor(cyl), hwy)) +
geom_boxplot() +
geom_line(
data = mean_dat,
aes(y = average, group = 1, color = drv),
linewidth = 1.5,
show.legend = FALSE
) +
facet_wrap(~drv) +
theme_bw()
Alternatively, you could use stat = "summary" and not have to create a means dataframe at all:
ggplot(mpg, aes(factor(cyl), hwy)) +
geom_boxplot() +
geom_line(
aes(group = 1, color = drv),
stat = "summary",
linewidth = 1.5,
show.legend = FALSE
) +
facet_wrap(~drv) +
theme_bw()
# same result as above
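When `stat = "summary"` is given no summary function, ggplot2 falls back to `mean_se()` and prints a message saying so. A sketch that passes `fun = mean` explicitly and builds a hand-computed summary for comparison (`p_summary` and `by_hand` are names chosen for this example):

```r
library(ggplot2)

p_summary <- ggplot(mpg, aes(factor(cyl), hwy)) +
  geom_boxplot() +
  geom_line(
    aes(group = 1, color = drv),
    stat = "summary",
    fun = mean,          # explicit summary function, so no fallback to mean_se()
    linewidth = 1.5,
    show.legend = FALSE
  ) +
  facet_wrap(~drv) +
  theme_bw()

# Means per x position (cyl) and facet (drv), computed with base R for comparison
by_hand <- aggregate(hwy ~ cyl + drv, data = mpg, FUN = mean)

p_summary
```

Comparing `layer_data(p_summary, 2)$y` with `by_hand$hwy` is one way to confirm the line passes through the group means.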

ggplot: color points by density as they approach a specific value?

I have a dataset containing 1,000 values for a model, these values are all within the same range (y=40-70), so the points overlap a ton. I'm interested in using color to show the density of the points converging on a single value (y=56.72) which I have indicated with a horizontal dashed line on the plot below. How can I color these points to show this?
ggplot(data, aes(x=model, y=value))+
geom_point(size=1) +
geom_hline(yintercept=56.72,
linetype="dashed",
color = "black")
I think that you should opt for a histogram or density plot:
n <- 500
data <- data.frame(model= rep("model",n),value = rnorm(n,56.72,10))
ggplot(data, aes(x = value, y = after_stat(count))) +
geom_histogram(binwidth = 1)+
geom_density(size = 1)+
geom_vline(xintercept = 56.72, linetype = "dashed", color = "black")+
theme_bw()
Here is your plot with the same data:
ggplot(data, aes(x = model, y = value))+
geom_point(size = 1) +
geom_hline(yintercept = 56.72, linetype = "dashed", color = "black")
If your model is iterative and does converge to the value, I suggest plotting the value as a function of the iteration to show the convergence. Another option, which keeps a plot similar to yours, is to dodge the positions of the points:
ggplot(data, aes(x = model, y = value))+
geom_point(position = position_dodge2(width = 0.2),
shape = 1,
size = 2,
stroke = 1,
alpha = 0.5) +
geom_hline(yintercept = 56.72, linetype = "dashed", color = "black")
Here is a color density plot as you asked:
library(dplyr)
library(ggplot2)
data %>%
mutate(bin = cut(value, breaks = 10:120)) %>%
dplyr::group_by(bin) %>%
mutate(density = dplyr::n()) %>%
ggplot(aes(x = model, y = value, color = density))+
geom_point(size = 1) +
geom_hline(yintercept = 56.72, linetype = "dashed", color = "black")+
scale_colour_viridis_c(option = "A")
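The binned count above changes abruptly at bin edges. A smoother variant, sketched here with base R's `density()` and `approx()` on simulated data, estimates the density at each point's own value. The column name `dens_at_point`, the simulated spread, and the jitter width are assumptions made for this example, not part of the original answer.

```r
library(ggplot2)

set.seed(1)
n <- 1000
data <- data.frame(model = rep("model", n), value = rnorm(n, 56.72, 5))

# Kernel density estimate of `value`, interpolated at each observation
kde <- density(data$value)
data$dens_at_point <- approx(kde$x, kde$y, xout = data$value)$y

p_density <- ggplot(data, aes(x = model, y = value, color = dens_at_point)) +
  # a little horizontal jitter so overlapping points stay visible
  geom_point(size = 1,
             position = position_jitter(width = 0.1, height = 0, seed = 1)) +
  geom_hline(yintercept = 56.72, linetype = "dashed", color = "black") +
  scale_colour_viridis_c(option = "A")

p_density
```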
I would suggest using the alpha parameter within geom_point, with a value close to 0 so that overlapping points build up into darker regions.
ggplot(data, aes(x=model, y=value)) +
geom_point(size=1, alpha = .1) +
geom_hline(yintercept=56.72, linetype="dashed", color = "black")

R graph: label by group

The data I am working with is clustered, with multiple observations within each group. I generated a caterpillar plot and want one label per group (zipid), not one per line. My current graph and code look like this:
text = hosp_new[,c("zipid")]
ggplot(hosp_new, aes(x = id, y = oe, colour = zipid, shape = group)) +
# theme(panel.grid.major = element_blank()) +
geom_point(size=1) +
scale_shape_manual(values = c(1, 2, 4)) +
geom_errorbar(aes(ymin = low_ci, ymax = high_ci)) +
geom_smooth(method = lm, se = FALSE) +
scale_linetype_manual(values = linetype) +
geom_segment(aes(x = start_id, xend = end_id, y = region_oe, yend = region_oe, linetype = "4", size = 1.2)) +
geom_ribbon(aes(ymin = region_low_ci, ymax = region_high_ci), alpha=0.2, linetype = "blank") +
geom_hline(aes(yintercept = 1, alpha = 0.2, colour = "red", size = 1), show.legend = "FALSE") +
scale_size_identity() +
scale_x_continuous(name = "hospital id", breaks = seq(0,210, by = 10)) +
scale_y_continuous(name = "O:E ratio", breaks = seq(0,7, by = 1)) +
geom_text(aes(label = text), position = position_stack(vjust = 10.0), size = 2)
Caterpillar plot:
Each color represents a region. I just want one label per region, but I don't know how to remove the duplicated labels in this graph.
Any idea?
The key is to have geom_text return only one value for each zipid, rather than multiple values. If we want each zipid label located in the middle of its group, then we can use the average value of id as the x-coordinate for each label. In the code below, we use stat_summaryh (from the ggstance package) to calculate that average id value for the x-coordinate of the label and return a single label for each zipid.
library(ggplot2)
theme_set(theme_bw())
library(ggstance)
# Fake data
set.seed(300)
dat = data.frame(id=1:100, y=cumsum(rnorm(100)),
zipid=rep(LETTERS[1:10], c(10, 5, 20, 8, 7, 12, 7, 10, 13,8)))
ggplot(dat, aes(id, y, colour=zipid)) +
geom_segment(aes(xend=id, yend=0)) +
stat_summaryh(fun.x=mean, aes(label=zipid, y=1.02*max(y)), geom="text") +
guides(colour=FALSE)
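The same one-label-per-group idea can be written without ggstance by summarising the label positions first and passing them to `geom_text()` as a separate data frame. A sketch using the fake data above, assuming dplyr 1.0 or later and ggplot2 3.3.4 or later (for `guides(colour = "none")`); `label_pos` and `p_labels` are names chosen here:

```r
library(ggplot2)
library(dplyr)

set.seed(300)
dat <- data.frame(id = 1:100, y = cumsum(rnorm(100)),
                  zipid = rep(LETTERS[1:10], c(10, 5, 20, 8, 7, 12, 7, 10, 13, 8)))

# One row per zipid: x is the mean id of the group,
# y is 1.02 * max(y), the same height used in the answer above
label_pos <- dat %>%
  group_by(zipid) %>%
  summarise(id = mean(id), .groups = "drop") %>%
  mutate(y = 1.02 * max(dat$y))

p_labels <- ggplot(dat, aes(id, y, colour = zipid)) +
  geom_segment(aes(xend = id, yend = 0)) +
  geom_text(data = label_pos, aes(label = zipid)) +
  guides(colour = "none")

p_labels
```

Because `label_pos` has exactly one row per zipid, `geom_text()` has nothing to duplicate.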
You could also use faceting, as mentioned by @user20650. In the code below, panel.spacing.x=unit(0,'pt') removes the space between facet panels, while expand=c(0,0.5) adds 0.5 units of padding on the sides of each panel. Together, these ensure constant spacing between tick marks, even across facets.
ggplot(dat, aes(id, y, colour=zipid)) +
geom_segment(aes(xend=id, yend=0)) +
facet_grid(. ~ zipid, scales="free_x", space="free_x") +
guides(colour=FALSE) +
theme_classic() +
scale_x_continuous(breaks=0:nrow(dat),
labels=c(rbind(seq(0,100,5),'','','',''))[1:(nrow(dat)+1)],
expand=c(0,0.5)) +
theme(panel.spacing.x = unit(0,"pt"))

Colour average lines in ggplot

I would like to colour the dashed lines, which are the average values of the two respective categories, with the same colour of the default palette used by ggplot to fill the distributions.
This is the code used:
library(ggplot2)
print(ggplot(dati, aes(x=ECU_fuel_consumption_L_100Km_CF, fill=Model))
+ ggtitle("Fuel Consumption density histogram, by Model")
+ ylab("Density")
+ geom_density(alpha=.3)
+ scale_x_continuous(breaks=pretty(dati$ECU_fuel_consumption_L_100Km_CF, n=10))
+ geom_vline(aes(xintercept = mean(ECU_fuel_consumption_L_100Km_CF[dati$Model == "500X"])), linetype="dashed", size=1)
+ geom_vline(aes(xintercept = mean(ECU_fuel_consumption_L_100Km_CF[dati$Model == "Renegade"])), linetype="dashed", size=1)
)
Thank you all in advance!
No reproducible example, but you probably want to do something like this:
library(dplyr)
# make up some data
d <- data.frame(x = c(mtcars$mpg, mtcars$hp),
var = rep(c('mpg', 'hp'), each = nrow(mtcars)))
means <- d %>% group_by(var) %>% summarize(m = mean(x))
ggplot(d, aes(x, fill = var)) +
geom_density(alpha = 0.3) +
geom_vline(data = means, aes(xintercept = m, col = var),
linetype = "dashed", size = 1)
This approach is extendable to any number of groups.
An option that doesn't require pre-calculation, but is also a bit more hacky, is:
ggplot(d, aes(x, fill = var)) +
geom_density(alpha = 0.3) +
geom_vline(aes(col = 'hp', xintercept = x), linetype = "dashed", size = 1,
data = data.frame(x = mean(d$x[d$var == 'hp']))) +
geom_vline(aes(col = 'mpg', xintercept = x), linetype = "dashed", size = 1,
data = data.frame(x = mean(d$x[d$var == 'mpg'])))

r annotate values above geometric bars

Consider this sample data.
df <- data.frame(
x = factor(c(1, 1, 2, 2)),
y = c(.1, .3, .2, .1),
grp = c("a", "b", "a", "b")
)
Now I create the graph using ggplot, and annotate it using geom_text()
ggplot(data = df, aes(x, y, fill = grp, label = y)) +
geom_bar(stat = "identity", position = "dodge") +
scale_y_continuous(limits=c(0,1)) +
geom_text(position = position_dodge(0.9))
How do I make all the text labels line up horizontally at the top of the graph window?
You can specify the aes(y=...) in geom_text. So, for the numbers at the top of the graph window you'll have
ggplot(data = df, aes(x, y, fill = grp, label = y)) +
geom_bar(stat = "identity", position = "dodge") +
geom_text(aes(y=Inf), position = position_dodge(0.9))
And you may want to chuck in a + ylim(0, 4) to expand the plot area.
To match the edited question:
ggplot(data = df, aes(x, y, fill = grp, label = y)) +
geom_bar(stat = "identity", position = "dodge") +
scale_y_continuous(limits=c(0,1)) +
geom_text(aes(y=0.9), position = position_dodge(0.9)) ## can specify any y=.. value
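In the `y = Inf` version above, the labels are centred on the top edge of the panel, so they can end up clipped. One way around that, sketched below, is to push them down with `vjust`. The value 1.5 is an assumption chosen by eye, not something from the original answer, and `p_top` is an illustrative name.

```r
library(ggplot2)

df <- data.frame(
  x = factor(c(1, 1, 2, 2)),
  y = c(.1, .3, .2, .1),
  grp = c("a", "b", "a", "b")
)

p_top <- ggplot(df, aes(x, y, fill = grp, label = y)) +
  geom_col(position = "dodge") +   # equivalent to geom_bar(stat = "identity")
  # y = Inf pins labels to the top of the panel; vjust > 1 moves them down inside it
  geom_text(aes(y = Inf), vjust = 1.5, position = position_dodge(0.9))

p_top
```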
