ggplot 2: how to show breaks - r

I can't get ggplot2 to show breaks in horizontal bars. The code is as follows:
dat <- data.frame(
result = c(replicate(50, 'ok'), replicate(17, 'error'), replicate(35, 'notrun')),
test = 'test',
count = 'count'
)
ggplot(data=dat, aes(x=test, y=count, fill=result)) +
geom_bar(stat="identity") +
scale_fill_manual(values = c(ok = '#00BA38', error='#F8766D', notrun='gray')) +
xlab("") + ylab("") +
scale_x_discrete(label="") +
scale_y_discrete(breaks = c(1, 7, 9)) +
coord_flip()

ggplot counts automatically if you leave stat=identity out. In this case you can leave your count variable out as well:
dat <- data.frame(
result = c( replicate(50, 'ok'), replicate(17, 'error'),
replicate(35, 'notrun') ),
test = 'test' )
ggplot(data=dat, aes(x=test, fill=result)) +
geom_bar() +
coord_flip() +
scale_fill_manual(values = c(ok = '#00BA38', error='#F8766D', notrun='gray'))
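To clarify, here is what the documentation for geom_bar() said at the time (in ggplot2 2.0.0 and later the default stat is "count" rather than "bin"):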
"By default, geom_bar uses stat="bin". This makes the height of each bar equal to the number of cases in each group, and it is incompatible with mapping values to the y aesthetic. If you want the heights of the bars to represent values in the data, use stat="identity" and map a value to the y aesthetic."
So what you were doing was putting variable count on the y-axis, which had the same string value "count" for each row, and hence could not be represented on a numerical scale.
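If you do want to keep stat="identity", the y aesthetic needs actual numbers. The following is a minimal sketch, assuming the counts from the question (50, 17, 35) are pre-computed into a hypothetical data frame called counts; geom_col() is used as the shorthand for geom_bar(stat = "identity"), and the break positions are arbitrary illustration values:

```r
library(ggplot2)

# Hypothetical pre-computed counts: one row per result category,
# using the numbers from the question
counts <- data.frame(
  test   = "test",
  result = c("ok", "error", "notrun"),
  count  = c(50, 17, 35)
)

p <- ggplot(counts, aes(x = test, y = count, fill = result)) +
  geom_col() +  # same as geom_bar(stat = "identity")
  scale_fill_manual(values = c(ok = "#00BA38", error = "#F8766D", notrun = "gray")) +
  # count is numeric, so a continuous scale with explicit breaks applies
  scale_y_continuous(breaks = c(0, 25, 50, 75, 100)) +
  xlab("") + ylab("") +
  coord_flip()

p
```

Because count is now numeric, the breaks are set with scale_y_continuous() instead of scale_y_discrete().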

Related

How to add percentages on top of a histogram when data is grouped

This is not my data (for confidentiality reasons), but I have tried to create a reproducible example using a dataset included in the ggplot2 library. I have a histogram summarizing the value of some variable by group (a factor with 2 levels). First, I did not want the counts but proportions of the total, so I used this code:
library(ggplot2)
library(dplyr)
df_example <- diamonds %>% as.data.frame() %>% filter(cut=="Premium" | cut=="Ideal")
ggplot(df_example,aes(x=z,fill=cut)) +
geom_histogram(aes(y=after_stat(width*density)),binwidth=1,center=0.5,col="black") +
facet_wrap(~cut) +
scale_x_continuous(breaks=seq(0,9,by=1)) +
scale_y_continuous(labels=scales::percent_format(accuracy=2,suffix="")) +
scale_fill_manual(values=c("#CC79A7","#009E73")) +
labs(x="Depth (mm)",y="Count") +
theme_bw() + theme(legend.position="none")
It gave me this as a result.
The issue is that I would like to print the numeric percentages on top of the bins and haven't found a way to do so.
As I saw it done for printing counts elsewhere, I attempted to print them using stat_bin(), including the same y and label values as the y in geom_histogram, thinking it would print the right numbers:
ggplot(df_example,aes(x=z,fill=cut)) +
geom_histogram(aes(y=after_stat(width*density)),binwidth=1,center=0.5,col="black") +
stat_bin(aes(y=after_stat(width*density),label=after_stat(width*density*100)),geom="text",vjust=-.5) +
facet_wrap(~cut) +
scale_x_continuous(breaks=seq(0,9,by=1)) +
scale_y_continuous(labels=scales::percent_format(accuracy=2,suffix="")) +
scale_fill_manual(values=c("#CC79A7","#009E73")) +
labs(x="Depth (mm)",y="%") +
theme_bw() + theme(legend.position="none")
However, it prints far more values than there are bins, the values are not consistent with the bar heights, and they do not respect vjust=-.5, which should make them appear slightly above the bars.
What am I missing here? I know that if there was no grouping variable/facet_wrap, I could use after_stat(count/sum(count)) instead of after_stat(width*density) and it seems that it would have fixed my issue. But I need the histograms for both groups to appear next to each other. Thanks in advance!
You have to pass the same arguments to stat_bin as to geom_histogram when adding your labels, so that both layers use the same binning and the labels align with the bars:
library(ggplot2)
library(dplyr)
df_example <- diamonds %>%
as.data.frame() %>%
filter(cut == "Premium" | cut == "Ideal")
ggplot(df_example, aes(x = z, fill = cut)) +
geom_histogram(aes(y = after_stat(width * density)),
binwidth = 1, center = 0.5, col = "black"
) +
stat_bin(
aes(
y = after_stat(width * density),
label = scales::number(after_stat(width * density), scale = 100, accuracy = 1)
),
geom = "text", binwidth = 1, center = 0.5, vjust = -.25
) +
facet_wrap(~cut) +
scale_x_continuous(breaks = seq(0, 9, by = 1)) +
scale_y_continuous(labels = scales::number_format(scale = 100)) +
scale_fill_manual(values = c("#CC79A7", "#009E73")) +
labs(x = "Depth (mm)", y = "%") +
theme_bw() +
theme(legend.position = "none")
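One way to sanity-check the printed labels is to compute the per-group bin percentages outside ggplot2. This is a rough sketch, assuming bins of width 1 with boundaries at whole numbers (what binwidth = 1, center = 0.5 implies) and right-closed intervals, which I believe is the stat_bin default; the names breaks, bins and pct are made up for the illustration:

```r
library(ggplot2)  # only needed here for the diamonds dataset

df_example <- subset(as.data.frame(diamonds), cut %in% c("Premium", "Ideal"))
df_example$cut <- droplevels(df_example$cut)

# Bin boundaries at 0, 1, 2, ... up to the largest z value
breaks <- seq(0, ceiling(max(df_example$z)), by = 1)
bins <- cut(df_example$z, breaks = breaks, include.lowest = TRUE, right = TRUE)

# Share of each group's observations in each bin, in percent
pct <- prop.table(table(df_example$cut, bins), margin = 1) * 100
round(pct, 1)
```

Each row of pct sums to 100, so the values should correspond to the width * density labels within each facet.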

labels on a stacked density plot

I'm generating a stacked density plot:
ggplot(data=tydy_rawdata, aes(x=timepoint, y=tpm, group=fct_inorder(names),
fill=fct_inorder(names))) +
geom_density(position="fill",
stat="identity") +
scale_fill_manual(values = rev(mycolors))
I would like to add a label on each curve (or at least the top 3 or 4) based on the "names" displayed on the right.
I tried adding geom_text, but the labels do not line up with the curves:
ggplot(data=tydy_rawdata, aes(x=timepoint, y=tpm, group=fct_inorder(names),
fill=fct_inorder(names))) +
geom_density(position="fill",
stat="identity") +
geom_text(aes(label=names)) +
scale_fill_manual(values = rev(mycolors))
Is there a way to do this?
First, your chart is a stacked area chart, i.e. geom_density with stat="identity" is equivalent to geom_area. Second, when adding labels via geom_text you have to take the position argument into account. As you use position="fill" for your density/area chart, you have to do the same for geom_text.
As you provided no example data I created my own to make your issue reproducible:
library(ggplot2)
library(forcats)
set.seed(123)
tydy_rawdata <- data.frame(
names = rep(LETTERS[1:10], each = 6),
timepoint = factor(seq(6)),
tpm = runif(6 * 10, 0, 80)
)
ggplot(data = tydy_rawdata, aes(
x = timepoint, y = tpm,
group = fct_inorder(names), fill = fct_inorder(names)
)) +
geom_area(
position = "fill",
color = "black"
) +
geom_text(aes(label = names), position = "fill")
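If one label per series is enough, a possible variation is to label only a single time point and centre each label inside its band with position_fill(vjust = 0.5). This is a sketch on made-up data, assuming a numeric timepoint; label_data and the choice of timepoint 3 are illustrative only:

```r
library(ggplot2)

set.seed(123)
# Made-up data in the same shape as the example above
tydy_rawdata <- data.frame(
  names     = rep(LETTERS[1:10], each = 6),
  timepoint = rep(1:6, times = 10),
  tpm       = runif(60, 0, 80)
)

# Label each series once, at an arbitrary time point
label_data <- subset(tydy_rawdata, timepoint == 3)

p <- ggplot(tydy_rawdata, aes(x = timepoint, y = tpm, group = names, fill = names)) +
  geom_area(position = "fill", color = "black") +
  # vjust = 0.5 places each label in the middle of its band instead of on the top edge
  geom_text(data = label_data, aes(label = names),
            position = position_fill(vjust = 0.5))

p
```

The stacking is computed separately for every x value, so the labels at the chosen time point use the same proportions as the areas there.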

How to show every second R ggplot2 x-axis label value?

I want to show only every second x-axis label in the plot.
A simplified code example follows; its output is shown in Fig. 1, where four dates are shown but the second and fourth should be skipped.
# https://stackoverflow.com/a/6638722/54964
require(ggplot2)
my.dates = as.Date(c("2011-07-22","2011-07-23",
"2011-07-24","2011-07-28","2011-07-29"))
my.vals = c(5,6,8,7,3)
my.data <- data.frame(date =my.dates, vals = my.vals)
plot(my.dates, my.vals)
p <- ggplot(data = my.data, aes(date,vals))+ geom_line(size = 1.5)
Expected output: skip dates second and fourth.
Actual code
In my actual code, the rev(Vars) logic prevents me from applying as.Date to the values in each category; the variable molten has a column Dates.
p <- ggplot(molten, aes(x = rev(Vars), y = value)) +
geom_bar(aes(fill=variable), stat = "identity", position="dodge") +
facet_wrap( ~ variable, scales="free") +
scale_x_discrete("Column name dates", labels = rev(Dates))
Expected output: skip #2,#4, ... values in each category.
I thought about changing scale_x_discrete to scale_x_continuous and supplying a break sequence, breaks = seq(1, length(Dates), 2), in scale_x_continuous, but it fails with the following error.
Error: `breaks` and `labels` must have the same length
Proposal based on Juan's comments
Code
ggplot(data = my.data, aes(as.numeric(date), vals)) +
geom_line(size = 1.5) +
scale_x_continuous(breaks = pretty(as.numeric(rev(my.data$date)), n = 5))
Output
Error: Discrete value supplied to continuous scale
Testing EricWatt's proposal application into Actual code
Code proposal
p <- ggplot(molten, aes(x = rev(Vars), y = value)) +
geom_bar(aes(fill=variable), stat = "identity", position="dodge") +
facet_wrap( ~ variable, scales="free") +
scale_x_discrete("My dates", breaks = Dates[seq(1, length(Dates), by = 2)], labels = rev(Dates))
Output
Error: `breaks` and `labels` must have the same length
If you use scale_x_discrete("My dates", breaks = Dates[seq(1, length(Dates), by = 2)]) instead, you get an x-axis without any labels.
Fig. 1 Output of the simplified code example,
Fig. 2 Output of EricWatt's first proposal
OS: Debian 9
R: 3.4.0
This works with your simplified example. Without your molten data.frame it's hard to check it against your more complicated plot.
ggplot(data = my.data, aes(date, vals)) +
geom_line(size = 1.5) +
scale_x_date(breaks = my.data$date[seq(1, length(my.data$date), by = 2)])
Basically, use scale_x_date which will likely handle any strange date to numeric conversions for you.
My eventual solution for the actual code, motivated by the other linked thread and EricWatt's answer:
# Test data of actual data here # https://stackoverflow.com/q/45130082/54964
ggplot(data = molten, aes(x = as.Date(Time.data, format = "%d.%m.%Y"), y = value)) +
geom_bar(aes(fill = variable), stat = "identity", position = "dodge") +
facet_wrap( ~ variable, scales="free") +
theme_bw() + # must come before the axis text adjustments, otherwise it overrides them
theme(axis.text.x = element_text(angle = 90, hjust=1),
text = element_text(size=10)) +
scale_x_date(date_breaks = "2 days", date_labels = "%d.%m.%Y")
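If the x variable has to stay discrete (for example, dates stored as strings), another option is to pass a function to breaks, which avoids the breaks/labels length mismatch. This is a minimal sketch on hypothetical data; df, day and every_second are names invented for the illustration:

```r
library(ggplot2)

# Hypothetical data: dates kept as character labels on a discrete axis
df <- data.frame(
  day   = c("22.07.2011", "23.07.2011", "24.07.2011", "28.07.2011", "29.07.2011"),
  value = c(5, 6, 8, 7, 3)
)

# Helper that keeps every second element of the axis limits
every_second <- function(x) x[seq(1, length(x), by = 2)]

p <- ggplot(df, aes(x = day, y = value)) +
  geom_col() +
  # breaks may be a function of the scale limits; labels are left at their default
  scale_x_discrete("Column name dates", breaks = every_second)

p
```

Since no labels vector is supplied, the remaining breaks are labelled with their own values.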

Annotate x-axis with N in faceted plot, but preserve empty facets

I asked a question yesterday about annotating the x-axis with N in a faceted plot using a minimal example that turns out to be too simple, relative to my real problem. The answer given there works in the case of complete data, but if you have missing facets you would like to preserve, the combination of facet_wrap options drop=FALSE and scales="free_x" triggers an error: "Error in if (zero_range(from) || zero_range(to)) { : missing value where TRUE/FALSE needed"
Here is a new, less-minimal example. The goal here is to produce a large graph with two panels using grid.arrange; the first showing absolute values over time by treatment group; the second showing the change from baseline over time by treatment group. In the second panel, we need a blank facet when vis=1.
# setup
library(ggplot2)
library(plyr)
library(gridExtra)
trt <- factor(rep(LETTERS[1:2],150),ordered=TRUE)
vis <- factor(c(rep(1,150),rep(2,100),rep(3,50)),ordered=TRUE)
id <- c(c(1:150),c(1:100),c(1:50))
val <- rnorm(300)
data <- data.frame(id,trt,vis,val)
base <- with(subset(data,vis==1),data.frame(id,trt,baseval=val))
data <- merge(data,base,by="id")
data <- transform(data,chg=ifelse(vis==1,NA,val-baseval))
data.sum <- ddply(data, .(vis, trt), summarise, N=length(na.omit(val)))
data <- merge(data,data.sum)
data <- transform(data, trtN=paste(trt,N,sep="\n"))
mytheme <- theme_bw() + theme(panel.margin = unit(0, "lines"), strip.background = element_blank())
# no missing facets
plot.a <- ggplot(data) + geom_boxplot(aes(x=trtN,y=val,group=trt,colour=trt), show.legend=FALSE) +
facet_wrap(~ vis, drop=FALSE, switch="x", nrow=1, scales="free_x") +
labs(x="Visit") + mytheme
# first facet should be blank
plot.b <- ggplot(data) + geom_boxplot(aes(x=trtN,y=chg,group=trt,colour=trt), show.legend=FALSE) +
facet_wrap(~ vis, drop=FALSE, switch="x", nrow=1, scales="free_x") +
labs(x="Visit") + mytheme
grid.arrange(plot.a,plot.b,nrow=2)
You can add a blank layer to draw all the facets in your second plot. The key is that you need a variable that exists for every level of vis to use as your y variable. In your case you can simply use the variable you used in your first plot.
ggplot(data) +
geom_boxplot(aes(x = trtN, y = chg, group = trt, colour = trt), show.legend = FALSE) +
geom_blank(aes(x = trtN, y = val)) +
facet_wrap(~ vis, switch = "x", nrow = 1, scales = "free_x") +
labs(x="Visit") + mytheme
If your variables have different ranges, you can set the y limits using the overall min and max of your boxplot y variable.
+ scale_y_continuous(limits = c(min(data$chg, na.rm = TRUE), max(data$chg, na.rm = TRUE)))
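The geom_blank() idea can be tried in isolation on a smaller data set. This is a reduced sketch with made-up data (d, with chg simply set to NA for the first visit), not the full example from the question:

```r
library(ggplot2)

set.seed(1)
# Made-up data: three visits, two treatments
d <- data.frame(
  vis = factor(rep(1:3, each = 20)),
  trt = rep(c("A", "B"), times = 30),
  val = rnorm(60)
)
# Nothing to draw for visit 1, mimicking the change-from-baseline variable
d$chg <- ifelse(d$vis == 1, NA, d$val)

p <- ggplot(d) +
  geom_boxplot(aes(x = trt, y = chg, colour = trt),
               na.rm = TRUE, show.legend = FALSE) +
  # Invisible layer with data in every facet, so the visit 1 panel is still set up
  geom_blank(aes(x = trt, y = val)) +
  facet_wrap(~ vis, nrow = 1, scales = "free_x") +
  labs(x = "Visit")

p
```

The blank layer trains the x and y scales in the otherwise empty panel without drawing anything.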

How to order bars in faceted ggplot2 bar chart

If I want to order the bars in a ggplot2 barchart from largest to smallest, then I'd usually update the factor levels of the bar category, like so
one_group <- data.frame(
height = runif(5),
category = gl(5, 1)
)
o <- order(one_group$height, decreasing = TRUE)
one_group$category <- factor(one_group$category, levels = one_group$category[o])
p_one_group <- ggplot(one_group, aes(category, height)) +
geom_bar(stat = "identity")
p_one_group
If I have several groups of barcharts that I'd like in different facets, with each facet having bars ordered from largest to smallest (and different x-axes), then the technique breaks down.
Given some sample data
two_groups <- data.frame(
height = runif(10),
category = gl(5, 2),
group = gl(2, 1, 10, labels = letters[1:2])
)
and the plotting code
p_two_groups <- ggplot(two_groups, aes(category, height)) +
geom_bar(stat = "identity") +
facet_grid(. ~ group, scales = "free_x")
p_two_groups
what do I need to do to get the bar ordering right?
If it helps, an equivalent problem to solve is: how do I update factor levels after I've done the faceting?
Here is a hack:
two_groups <- transform(two_groups, category2 = factor(paste(group, category)))
two_groups <- transform(two_groups, category2 = reorder(category2, rank(height)))
ggplot(two_groups, aes(category2, height)) +
geom_bar(stat = "identity") +
facet_grid(. ~ group, scales = "free_x") +
scale_x_discrete(labels=two_groups$category, breaks=two_groups$category2)
1. Make a UNIQUE factor variable for all entries (category2).
2. Reorder the variable based on the height.
3. Plot on the variable: aes(x=category2).
4. Re-label the axis using the original value (category) for the variable (category2) in scale_x_discrete.
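The same steps can also be sketched with the labels produced by a function rather than by matching vectors. This is only an illustration under assumptions: the column key and the "___" separator are made up, and the bars are ordered from largest to smallest by negating the height:

```r
library(ggplot2)

set.seed(42)
two_groups <- data.frame(
  height   = runif(10),
  category = gl(5, 2),
  group    = gl(2, 1, 10, labels = letters[1:2])
)

# Step 1: a key that is unique per bar (group plus category)
two_groups$key <- factor(paste(two_groups$group, two_groups$category, sep = "___"))
# Step 2: order the key levels by decreasing height
two_groups$key <- reorder(two_groups$key, -two_groups$height)

# Steps 3 and 4: plot on the key, then strip the group prefix from the axis labels
p <- ggplot(two_groups, aes(key, height)) +
  geom_col() +
  facet_grid(. ~ group, scales = "free_x") +
  scale_x_discrete(labels = function(x) sub("^.*___", "", x))

p
```

With scales = "free_x" each facet only shows its own keys, so the bars within a facet appear in decreasing order.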
Here is a hack to achieve what you want. I was unable to figure out how to get the category values below the tick marks. So if someone can help fix that, it would be wonderful. Let me know if this works
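library(plyr) # provides ddply()
# add a height rank variable to the data frame, ranked within each group
two_groups <- ddply(two_groups, .(group), transform, hrank = rank(height))
# plot the graph; -hrank puts the tallest bar first
p_two_groups <- ggplot(two_groups, aes(-hrank, height)) +
geom_bar(stat = "identity") +
facet_grid(. ~ group, scales = "free_x") +
# opts() and theme_blank() have been removed from ggplot2; theme() and element_blank() replace them
theme(axis.text.x = element_blank()) +
geom_text(aes(y = 0, label = category, vjust = 1.5))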
