Adding multiple text annotations to a faceted ggplot geom_histogram - r

I have the following data.frame:
hist.df <- data.frame(y = c(rnorm(30,1,1), rnorm(15), rnorm(30,0,1)),
gt = c(rep("ht", 30), rep("hm", 15), rep("hm", 30)),
group = c(rep("sc", 30), rep("am", 15), rep("sc",30)))
from which I produce the following faceted histogram ggplot:
main.plot <- ggplot(data = hist.df, aes(x = y)) +
geom_histogram(alpha=0.5, position="identity", binwidth = 2.5,
aes(fill = factor(gt))) +
facet_wrap(~group) +
scale_fill_manual(values = c("darkgreen","darkmagenta"),
labels = c("ht","hm"),
name = "gt",
limits=c(0, 30))
In addition, I have this data.frame:
text.df = data.frame(ci.lo = c(0.001,0.005,-10.1),
ci.hi = c(1.85,2.25,9.1),
group = c("am","sc","sc"),
factor = c("nu","nu","alpha"))
Which defines the text annotations I want to add to the faceted histograms, so that the final figure will be:
So text.df$ci.lo and text.df$ci.hi are confidence intervals on the corresponding text.df$factor and they correspond to the faceted histograms through text.df$group
Note that not every histogram has all text.df$factor's.
Ideally, the ylim's of the faceted histograms will leave enough space for the text to be added above the histograms so that they appear only on the background.
Any idea how to achieve this?

Wrapping my comment into an answer:
text.df$ci <- paste0(text.df$factor, ' = [', text.df$ci.lo, ', ', text.df$ci.hi, ']')
new_labels <- aggregate(text.df$ci, by = list(text.df$group),
FUN = function(x) paste(x, collapse = '\n'))$x
hist.df$group <- factor(hist.df$group)
hist.df$group <- factor(hist.df$group,
labels = paste0(levels(hist.df$group), '\n', new_labels))
main.plot <- ggplot(data = hist.df, aes(x = y)) +
geom_histogram(alpha=0.5, position="identity", binwidth = 2.5,
aes(fill = factor(gt))) +
facet_wrap(~group) +
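# Name the values so that colors and legend entries cannot be mismatched:
# factor(gt) sorts its levels as "hm", "ht", so unnamed labels c("ht","hm") would be swapped.
scale_fill_manual(values = c(ht = "darkgreen", hm = "darkmagenta"),
breaks = c("ht", "hm"),
name = "gt")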
main.plot + theme(strip.text = element_text(size=20))
If you wish to stick to the original idea, this question has an answer that will help.
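For the original idea (text drawn inside each panel), one common approach is to give geom_text its own data frame that contains the faceting variable, so each label lands only in its matching facet. The following is a minimal sketch, not the linked answer: label.df, annotated.plot, the corner placement via -Inf/Inf and the 40% headroom on the y axis are all assumptions chosen for illustration.

```r
library(ggplot2)

set.seed(1)  # toy data mirroring the question's layout
hist.df <- data.frame(y = c(rnorm(30, 1, 1), rnorm(15), rnorm(30, 0, 1)),
                      gt = c(rep("ht", 30), rep("hm", 15), rep("hm", 30)),
                      group = c(rep("sc", 30), rep("am", 15), rep("sc", 30)))
text.df <- data.frame(ci.lo = c(0.001, 0.005, -10.1),
                      ci.hi = c(1.85, 2.25, 9.1),
                      group = c("am", "sc", "sc"),
                      factor = c("nu", "nu", "alpha"))

# One label per facet: collapse all intervals that belong to the same group
text.df$ci <- paste0(text.df$factor, " = [", text.df$ci.lo, ", ", text.df$ci.hi, "]")
label.df <- aggregate(ci ~ group, data = text.df, FUN = paste, collapse = "\n")

annotated.plot <- ggplot(hist.df, aes(x = y)) +
  geom_histogram(aes(fill = gt), alpha = 0.5, position = "identity", binwidth = 2.5) +
  # label.df carries `group`, so facet_wrap routes each label to its own panel
  geom_text(data = label.df, aes(x = -Inf, y = Inf, label = ci),
            hjust = -0.1, vjust = 1.2, inherit.aes = FALSE) +
  # extra space above the bars (assumed 40%) so the text does not overlap them
  scale_y_continuous(expand = expansion(mult = c(0, 0.4))) +
  facet_wrap(~group)
annotated.plot
```

Groups that lack a given factor simply get a shorter label, because the collapse step only sees the rows present in text.df.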

Related

adding a label in geom_line in R

I have two very similar plots, which have two y-axis - a bar plot and a line plot:
code:
sec_plot <- ggplot(data, aes_string (x = year, group = 1)) +
geom_col(aes_string(y = frequency), fill = "orange", alpha = 0.5) +
geom_line(aes(y = severity))
However, there are no labels. I want to get a label for the barplot as well as a label for the line plot, something like:
How can I add the labels to the plot if there is only one single group? Is there a way to specify this manually? Until now I have only found options where the labels are added by specifying them in the aes.
EXTENSION (added a posteriori):
getSecPlot <- function(data, xvar, yvar, yvarsec, groupvar){
if ("agegroup" %in% xvar) xvar <- get("agegroup")
# data <- data[, startYear:= as.numeric(startYear)]
data <- data[!claims == 0][, ':=' (scaled = get(yvarsec) * max(get(yvar))/max(get(yvarsec)),
param = max(get(yvar))/max(get(yvarsec)))]
param <- data[1, param] # important, otherwise not found in ggplot
sec_plot <- ggplot(data, aes_string (x = xvar, group = groupvar)) +
geom_col(aes_string(y = yvar, fill = groupvar, alpha = 0.5), position = "dodge") +
geom_line(aes(y = scaled, color = gender)) +
scale_y_continuous(sec.axis = sec_axis(~./(param), name = paste0("average ", yvarsec),labels = function(x) format(x, big.mark = " ", scientific = FALSE))) +
labs(y = paste0("total ", yvar)) +
scale_alpha(guide = 'none') +
theme_pubclean() +
theme(legend.title=element_blank(), legend.background = element_rect(fill = "white"))
}
plot.ExposureYearly <- getSecPlot(freqSevDataAge, xvar = "agegroup", yvar = "exposure", yvarsec = "frequency", groupvar = "gender")
plot.ExposureYearly
How can the same be done on a plot where both the line plot as well as the bar plot are separated by gender?
Here is a possible solution. The method I used was to move the color and fill inside the aes and then use scale_*_identity to create and format the legends.
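Also, I needed to add a scaling factor for the severity axis, because a secondary axis in ggplot2 can only be a one-to-one transformation of the primary axis; the data itself has to be rescaled onto the primary scale.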
data<-data.frame(year= 2000:2005, frequency=3:8, severity=as.integer(runif(6, 4000, 8000)))
library(ggplot2)
library(scales)
sec_plot <- ggplot(data, aes(x = year)) +
geom_col(aes(y = frequency, fill = "orange"), alpha = 0.6) +
geom_line(aes(y = severity/1000, color = "black")) +
scale_fill_identity(guide = "legend", label="Claim frequency (Number of paid claims per 100 Insured exposure)", name=NULL) +
scale_color_identity(guide = "legend", label="Claim Severity (Average insurance payment per claim)", name=NULL) +
theme(legend.position = "bottom") +
scale_y_continuous(sec.axis =sec_axis( ~ . *1, labels = label_dollar(scale=1000), name="Severity") ) + #formats the 2nd axis
guides(fill = guide_legend(order = 1), color = guide_legend(order = 2)) #control which scale plots first
sec_plot
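The fixed divisor of 1000 above can also be computed from the data, which is what the question's getSecPlot function does with its param column. Here is a minimal sketch of that idea, assuming a hypothetical toy data frame and a factor k (both names are illustrative only): the line is drawn on the primary scale and the secondary axis undoes the scaling.

```r
library(ggplot2)

# Hypothetical toy data with the same columns as the answer's example
toy <- data.frame(year = 2000:2005,
                  frequency = 3:8,
                  severity = c(4200, 5100, 6300, 7000, 7600, 8000))

# Factor that maps the severity range onto the frequency range
k <- max(toy$frequency) / max(toy$severity)

scaled_plot <- ggplot(toy, aes(x = year)) +
  geom_col(aes(y = frequency), fill = "orange", alpha = 0.6) +
  geom_line(aes(y = severity * k)) +                     # drawn on the primary scale
  scale_y_continuous(name = "Frequency",
                     sec.axis = sec_axis(~ . / k, name = "Severity"))  # inverse of the scaling
scaled_plot
```

Because the secondary axis applies the inverse of the factor used in geom_line, its tick labels read in the original severity units.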

Highlight single points in scatter plot with ggplot2 and ggrepel

I want to highlight 4 single points in a scatter plot, each with a box surrounding the name associated with that point. I am using ggrepel to create the boxes around the labels and to repel them.
This is the code I have:
library(ggplot2)
gg <- ggplot(X, aes(x = XX, y = XY)) +
geom_point(col = "steelblue", size = 3) +
geom_smooth(method = "lm", col = "firebrick", se = FALSE) +
labs(title = "XX vs XY", subtitle = "X", y = "XX", x = "XY") +
scale_x_continuous(breaks = seq(76, 82, 1)) +
scale_y_continuous(breaks = seq(15, 19, 1))
library(ggrepel)
gg + geom_text_repel(aes(label = Female), size = 3, data = X)
gg + geom_label_repel(aes(label = Female), size = 2, data = X)
With that code, I obtain boxes for all the points. However, I only want boxes for 4 specific points and no boxes for the others. How can I do that?
Thanks in advance! Regards,
TD
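One possible approach, sketched here under assumptions, is to pass only the rows to be highlighted as the data of the label layer, so the remaining points are drawn without labels. The data frame below is hypothetical (the question does not show X), and highlight and X_high are illustrative names.

```r
library(ggplot2)
library(ggrepel)

# Hypothetical stand-in for the question's data frame X
X <- data.frame(Female = paste0("country_", 1:10),
                XX = seq(76, 82, length.out = 10),
                XY = c(15.2, 15.9, 16.1, 16.8, 17.0, 17.4, 17.9, 18.1, 18.6, 19.0))

# Names of the four points to highlight (assumed for illustration)
highlight <- c("country_2", "country_5", "country_7", "country_10")
X_high <- subset(X, Female %in% highlight)

gg_high <- ggplot(X, aes(x = XX, y = XY)) +
  geom_point(col = "steelblue", size = 3) +
  # only the subset is handed to the label layer, so only 4 boxes are drawn
  geom_label_repel(data = X_high, aes(label = Female), size = 2)
gg_high
```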

Center x-axis labels in grouped violin/boxplot

I am creating violin plots by groups. I would like to label each group using a Greek letter centered at the middle point of each group of violin plots. How can I do this?
So far, I am using scale_x_discrete, but I cannot indicate any sort of centering.
library(ggplot2)
dat <- matrix(rnorm(100*12),ncol=12)
# Violin plots for columns
mat <- reshape2::melt(data.frame(dat), id.vars = NULL)
mat$variable_grouping <- ifelse(mat$variable %in% c('X1', 'X2', 'X3','X4'), 'g1',
ifelse(mat$variable %in% c('X5','X6','X7','X8'),
'g2', 'g3'))
pp <- ggplot(mat, aes(x = variable, y = value, fill = variable_grouping)) +
geom_violin(scale="width",adjust = 1,width = 0.5) +
scale_x_discrete(labels = c(expression(theta[1]),"","","",expression(theta[2]),"","","",expression(theta[3])))
pp
In this example, the labels should be at 2.5, 6.5 and 10.5.
A different solution:
library(ggplot2)
pp <- ggplot(mat, aes(x = as.numeric(variable), y = value,
group = variable, fill = variable_grouping)) +
geom_violin(scale="width", adjust = 1, width = 0.5) + xlab("variable") +
scale_x_continuous(breaks = c(2.5, 6.5, 10.5),
labels = c(expression(theta[1]),expression(theta[2]),expression(theta[3])))
pp
You can manually change horizontal position of x-axis labels in theme.
library(ggplot2)
ggplot(mat, aes(variable, value, fill = variable_grouping)) +
geom_violin(scale = "width", adjust = 1, width = 0.5) +
# One label per x level (12 in total); blank strings fill the unlabeled positions
scale_x_discrete(labels = c(expression(theta[1]), "", "", "",
expression(theta[2]), "", "", "",
expression(theta[3]), "", "", "")) +
theme(axis.text.x = element_text(hjust = -8),
axis.ticks.x = element_blank())
PS: I also removed x-axis ticks.
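The break positions 2.5, 6.5 and 10.5 do not have to be typed by hand; they are the mean positions of the variables within each group. A minimal base R sketch, assuming the same X1 to X12 columns and g1/g2/g3 grouping as in the question (vars, grouping and centers are illustrative names):

```r
# Variables in plotting order, as produced by reshape2::melt in the question
vars <- factor(paste0("X", 1:12), levels = paste0("X", 1:12))
grouping <- ifelse(vars %in% c("X1", "X2", "X3", "X4"), "g1",
            ifelse(vars %in% c("X5", "X6", "X7", "X8"), "g2", "g3"))

# Mean numeric position of the variables in each group
centers <- tapply(as.numeric(vars), grouping, mean)
centers
```

The resulting vector could then be supplied as breaks = centers to scale_x_continuous, so the labels stay centered if the number of variables per group changes.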

Add text to geom_line in ggplot

I am trying to create a line plot for 2 stocks AAPL and FB. Instead of adding a separate legend, I would like to print the stock symbols along with the lines. How can I add geom_text to the following code? I appreciate any help you could provide.
library (ggplot2)
library(quantmod)
getSymbols('AAPL')
getSymbols('FB')
AAPL = data.frame(AAPL)
FB = data.frame(FB)
p1 = ggplot(AAPL)+geom_line(data=AAPL,aes(as.Date(rownames(AAPL)),AAPL.Adjusted,color="AAPL"))
p2 = p1+geom_line(data=FB,aes(as.Date(rownames(FB)),FB.Adjusted,color="FB"))
p2 + xlab("Year")+ylab("Price")+theme_bw()+theme(legend.position="none")
This is the sort of plot that is perfect for the directlabels package. And it is easier to plot if the data is available in one dataframe.
# Data
library(quantmod)
getSymbols('AAPL')
getSymbols('FB')
AAPL = data.frame(AAPL)
FB = data.frame(FB)
# rbind into one dataframe
AAPL$label = "AAPL"
FB$label = "FB"
names = gsub("^FB\\.(.*$)", "\\1", names(FB))
names(AAPL) = names
names(FB) = names
df = rbind(AAPL, FB)
# Packages
library(ggplot2)
library(directlabels)
# The plot - labels at the beginning and the ends of the lines.
ggplot(df, aes(as.Date(rownames(df)), Adjusted, group = label, colour = label)) +
geom_line() +
scale_colour_discrete(guide = 'none') +
geom_dl(aes(label = label), method = list(dl.combine("first.points", "last.points")))
A better plot: Increase the space between the end points of the lines and the labels. See here for other options.
ggplot(df, aes(as.Date(rownames(df)), Adjusted, group = label, colour = label)) +
geom_line() +
scale_colour_discrete(guide = 'none') +
scale_x_date(expand=c(0.1, 0)) +
geom_dl(aes(label = label), method = list(dl.trans(x = x + .2), "last.points")) +
geom_dl(aes(label = label), method = list(dl.trans(x = x - .2), "first.points"))
Question is possibly a duplicate of this one.
You simply have to add geom_text, as you said:
Define the x, y positions, the label you want to appear (and the color):
library(quantmod)
getSymbols('AAPL')
getSymbols('FB')
AAPL = data.frame(AAPL)
FB = data.frame(FB)
p1 = ggplot(AAPL)+geom_line(data=AAPL,aes(as.Date(rownames(AAPL)),AAPL.Adjusted,color="AAPL"))
p2 = p1+geom_line(data=FB,aes(as.Date(rownames(FB)),FB.Adjusted,color="FB"))
p2 + xlab("Year") + ylab("Price")+theme_bw()+theme(legend.position="none") +
geom_text(aes(x = as.Date("2011-06-07"), y = 60, label = "AAPL", color = "AAPL")) +
geom_text(aes(x = as.Date("2014-10-01"), y = 45, label = "FB", color = "FB"))
EDIT
If you want to automatically find positions for x and y in geom_text, you will face new problems with overlapping labels if you increase the number of variables.
Here is the beginning of a solution; you might adapt the method used to define x and y:
AAPL$date = rownames(AAPL)
AAPL$var1 = "AAPL"
names(AAPL)[grep("AAPL", names(AAPL))] = gsub("AAPL.", "", names(AAPL)[grep("AAPL", names(AAPL))])
FB$date = rownames(FB)
FB$var1 = "FB"
names(FB)[grep("FB", names(FB))] = gsub("FB.", "", names(FB)[grep("FB", names(FB))])
# bind the 2 data frames
df = rbind(AAPL, FB)
# where do you want the legend to appear
legend = data.frame(matrix(ncol = 3, nrow = length(unique(df$var1))))
colnames(legend) = c("x_pos" , "y_pos" , "label")
legend$label = unique(df$var1)
legend$x_pos = as.POSIXct(legend$x_pos)
df$date = as.POSIXct(df$date)
for (i in legend$label)
{
legend$x_pos[legend$label == i] <- as.POSIXct(min(df$date[df$var1 == i]) +
as.numeric(difftime(max(df$date[df$var1 == i]), min(df$date[df$var1 == i]), units = "sec"))/2)
legend$y_pos[legend$label == i] <- df$Adjusted[df$date > legend$x_pos[legend$label == i] & df$var1 == i][1]
}
# Plot
ggplot(df, aes(x = as.POSIXct(date), y = Adjusted, color = var1)) +
geom_line() + xlab("Year") + ylab("Price") +
geom_text(data = legend, aes(x = x_pos, y = y_pos, label = label, color = label), hjust = -1, vjust = 1) +
guides(color = "none") # the `+` must end the previous line, otherwise this layer is never added
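A simpler variant of automatic placement is to label each series at its last observation, which avoids searching for a midpoint. This is a minimal sketch on hypothetical toy data (toy, ends and end_plot are illustrative names, not the quantmod data above), and it does not solve overlapping labels when series end close together.

```r
library(ggplot2)

# Hypothetical toy data in the same long layout as df above
toy <- data.frame(date = rep(as.Date("2020-01-01") + 0:4, times = 2),
                  Adjusted = c(10, 11, 12, 13, 14, 30, 29, 31, 33, 32),
                  var1 = rep(c("AAA", "BBB"), each = 5))

# Last observation of every series: one row per label
ends <- do.call(rbind, lapply(split(toy, toy$var1),
                              function(d) d[which.max(as.numeric(d$date)), ]))

end_plot <- ggplot(toy, aes(x = date, y = Adjusted, colour = var1)) +
  geom_line() +
  geom_text(data = ends, aes(label = var1), hjust = -0.2) +
  scale_x_date(expand = expansion(mult = c(0.05, 0.15))) +  # room for the labels on the right
  guides(colour = "none")
end_plot
```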

Condition a ..count.. summation on the faceting variable

I'm trying to annotate a bar chart with the percentage of observations falling into that bucket, within a facet. This question is very closely related to this question:
Show % instead of counts in charts of categorical variables but the introduction of faceting introduces a wrinkle. The answer to the related question is to use stat_bin w/ the text geom and then have the label be constructed as so:
stat_bin(geom="text", aes(x = bins,
y = ..count..,
label = paste(round(100*(..count../sum(..count..)),1), "%", sep="")
))
This works fine for an un-faceted plot. However, with facets, this sum(..count..) is summing over the entire collection of observations without regard for the facets. The plot below illustrates the issue---note that the percentages do not sum to 100% within a panel.
Here is the actual code for the figure above:
g.invite.distro <- ggplot(data = df.exp) +
geom_bar(aes(x = invite_bins)) +
facet_wrap(~cat1, ncol=3) +
stat_bin(geom="text", aes(x = invite_bins,
y = ..count..,
label = paste(round(100*(..count../sum(..count..)),1), "%", sep="")
),
vjust = -1, size = 3) +
theme_bw() +
scale_y_continuous(limits = c(0, 3000))
UPDATE: As per request, here's a small example re-producing the issue:
df <- data.frame(x = c('a', 'a', 'b','b'), f = c('c', 'd','d','d'))
ggplot(data = df) + geom_bar(aes(x = x)) +
stat_bin(geom = "text", aes(
x = x,
y = ..count.., label = ..count../sum(..count..)), vjust = -1) +
facet_wrap(~f)
Update: geom_bar requires stat = "identity" when the heights are supplied as precomputed values.
Sometimes it's easier to obtain summaries outside the call to ggplot.
df <- data.frame(x = c('a', 'a', 'b','b'), f = c('c', 'd','d','d'))
# Load packages
library(ggplot2)
library(plyr)
# Obtain summary. 'Freq' is the count, 'pct' is the percent within each 'f'
m = ddply(data.frame(table(df)), .(f), mutate, pct = round(Freq/sum(Freq) * 100, 1))
# Plot the data using the summary data frame
ggplot(data = m, aes(x = x, y = Freq)) +
geom_bar(stat = "identity", width = .7) +
geom_text(aes(label = paste(pct, "%", sep = "")), vjust = -1, size = 3) + # refer to the column directly; m$pct inside aes() can misalign with facets
facet_wrap(~ f, ncol = 2) + theme_bw() +
scale_y_continuous(limits = c(0, 1.2*max(m$Freq)))
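If adding plyr as a dependency is undesirable, the same per-facet percentages can be computed in base R with ave(), which returns the group total aligned to every row. This is a sketch of an alternative, not the method used in the answer above; it reuses the small df from the question.

```r
df <- data.frame(x = c("a", "a", "b", "b"), f = c("c", "d", "d", "d"))

# Counts for every x/f combination; the count column is named 'Freq'
m <- as.data.frame(table(df))

# Divide each count by the total of its own facet 'f'
m$pct <- round(100 * m$Freq / ave(m$Freq, m$f, FUN = sum), 1)
m
```

The resulting m has the same x, f, Freq and pct columns, so it can be passed to the ggplot call above unchanged.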
