Trying to separate bars in two overlayed bar charts in ggplot - r

I am trying to create a single chart from two bar charts to show the differences in their distributions. I have both charts merging together, and the axis labels are correct. However, I cannot figure out how to get the bars in each section to sit next to each other for comparison instead of overlaying. The data for this chart are two variables in the same data frame. I am relatively new to R and new to ggplot, so even plotting what I have was a challenge. Please be kind, and I apologize if this question has been answered before.
Here is the code I am using:
Labeled <- ggplot(NULL, aes(lab),position_dodge(.5)) + ggtitle("Figure 1. Comparison of Distribution of Age of Diagnosis and Age of Feeding Challenges")+
geom_bar(aes(x=AgeFactor,fill = "Age of Autism Diagnosis"), data = Graph, alpha = 0.5,width = 0.6) +
geom_bar(aes(x=FdgFactor,fill = "Feeding Challenge Onset"), data = Graph, alpha = 0.5,width=.6)+
scale_x_discrete(limits=c("0-6months","7-12months","1-1.99","2-2.99","3-3.99","4-4.99","5-5.99","6-6.99","7-7.99","8-8.99","9-9.99","10-10.99"))+
xlab("Age")+
ylab("")+
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+
scale_fill_discrete(name = "")
and this is the graph it is creating for me:
I really appreciate any insight. This is my first time asking a question on stack too - so I am happy to edit/adjust info as needed.

The issue is that you plot from different columns of your dataset. To dodge your bars position_dodge requires a column to group the data by. To this end you could reshape your data to long format using e.g. tidyr::pivot_longer so that your two columns are stacked on top of each other and you get a new column containing the column or group names as categories.
Using some fake random example data, I first replicate your issue with your code:
set.seed(123)
levels <- c("0-6months", "7-12months", "1-1.99", "2-2.99", "3-3.99", "4-4.99", "5-5.99", "6-6.99", "7-7.99", "8-8.99", "9-9.99", "10-10.99")
Graph <- data.frame(
AgeFactor = sample(levels, 100, replace = TRUE),
FdgFactor = sample(levels, 100, replace = TRUE),
lab = 1:100
)
library(ggplot2)
ggplot(NULL, aes(lab), position_dodge(.5)) +
ggtitle("Figure 1. Comparison of Distribution of Age of Diagnosis and Age of Feeding Challenges") +
geom_bar(aes(x = AgeFactor, fill = "Age of Autism Diagnosis"), data = Graph, alpha = 0.5, width = 0.6) +
geom_bar(aes(x = FdgFactor, fill = "Feeding Challenge Onset"), data = Graph, alpha = 0.5, width = .6) +
scale_x_discrete(limits = c("0-6months", "7-12months", "1-1.99", "2-2.99", "3-3.99", "4-4.99", "5-5.99", "6-6.99", "7-7.99", "8-8.99", "9-9.99", "10-10.99")) +
xlab("Age") +
ylab("") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
scale_fill_discrete(name = "")
And now the fix using reshaping. Additionally I simplified your code a bit:
library(tidyr)
library(dplyr)
Graph_long <- Graph %>%
select(AgeFactor, FdgFactor) %>%
pivot_longer(c(AgeFactor, FdgFactor))
ggplot(Graph_long, aes(x = value, fill = name)) +
geom_bar(alpha = 0.5, width = 0.6, position = position_dodge()) +
scale_fill_discrete(labels = c("Age of Autism Diagnosis", "Feeding Challenge Onset")) +
scale_x_discrete(limits = levels) +
labs(x = "Age", y = NULL, fill = NULL, title = "Figure 1. Comparison of Distribution of Age of Diagnosis and Age of Feeding Challenges") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
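To see what the reshaped data looks like without any extra packages, here is a minimal base R sketch. It rebuilds the same fake data and stacks the two columns by hand; the name Graph_long_base is made up for this illustration and is not part of the answer above.

```r
set.seed(123)
levels <- c("0-6months", "7-12months", "1-1.99", "2-2.99", "3-3.99", "4-4.99",
            "5-5.99", "6-6.99", "7-7.99", "8-8.99", "9-9.99", "10-10.99")
Graph <- data.frame(
  AgeFactor = sample(levels, 100, replace = TRUE),
  FdgFactor = sample(levels, 100, replace = TRUE)
)

# Stack the two columns: one row per observation, plus a column naming the source column
Graph_long_base <- data.frame(
  name  = rep(c("AgeFactor", "FdgFactor"), each = nrow(Graph)),
  value = c(as.character(Graph$AgeFactor), as.character(Graph$FdgFactor))
)

# Counts per group and age bin; these are the bar heights that get drawn side by side
counts <- table(Graph_long_base$name, factor(Graph_long_base$value, levels = levels))
print(counts)
```

Each row of the long data carries its own group label, which is what position_dodge() needs in order to place the bars next to each other.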

Related

Unable to add legend to economic time series chart

I'm attempting to add a legend to a time series chart and I've so far been unable to get any traction. I've provided the working code below, which pulls three economic data series into one chart and applies several changes to get in a format/overall aesthetic that I'd like. I should also add that the chart is graphing the y/y change of quarterly data sets.
I've only been able to find examples of individuals using scale_colour_manual to add a legend - I've provided code that I put together below.
Ideally, the legend just needs to appear to the right of the graph with the color and line chart.
Any help would be greatly appreciated!
library(quantmod)
library(TTR)
library(ggthemes)
library(tidyverse)
Nondurable <- getSymbols("PCND", src = "FRED", auto.assign = F)
Nondurable$chng <- ROC(Nondurable$PCND,4)
Durable <- getSymbols("PCDG", src = "FRED", auto.assign = F)
Durable$chng <- ROC(Durable$PCDG,4)
Services <- getSymbols("PCESV", src = "FRED", auto.assign = F)
Services$chng <- ROC(Services$PCESV, 4)
ggplot() +
geom_line(data = Nondurable, aes(x = Index, y = chng), color = "#5b9bd5", size = 1, linetype = "solid") +
geom_line(data = Durable, aes(x = Index, y = chng), color = "#00b050", size = 1, linetype = "longdash") +
geom_line(data = Services, aes(x = Index, y = chng), color = "#ed7d31", size = 1, linetype = "twodash") +
theme_tufte() +
scale_y_continuous(labels = percent, limits = c(-0.01,.09)) +
xlim(as.Date(c('1/1/2010', '6/30/2019'), format="%d/%m/%Y")) +
labs(y = "Percent Change", x = "", caption = "Seasonally Adjusted Annual Rate. Retrieved from FRED & U.S. Bureau of Economic Analysis") +
ggtitle("Year-over-Year Spending Trend Changes of the US Consumer") +
scale_colour_manual(name = 'Legend',
guide = 'legend',
values = c('Nondurable' = '#5b9bd5',
'Durable' = '#00b050',
'Services' = '#ed7d31'),
labels = c('Nondurable',
'Durable',
'Services'))
I receive the following warning messages when I run the program (the chart still plots though).
Warning messages:
1: Removed 252 rows containing missing values (geom_path).
2: Removed 252 rows containing missing values (geom_path).
3: Removed 252 rows containing missing values (geom_path).
There are two reasons you are receiving this warning:
The bulk of the rows are being removed because of your limits. When you use xlim() or scale_y_continuous(..., limits = ...), ggplot removes the values beyond these limits from your data before plotting and displays that warning as an FYI. After commenting out both of those lines, you will still see a message about removed values, but with a much smaller number.
That smaller number is because you have NA values in the first 4 rows of column chng. This is true in all 3 datasets.
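The leading NA values can be reproduced without downloading anything. The sketch below is a base R stand-in for ROC(x, 4), assuming ROC's default continuous (log) change with NA padding; the vector x is made-up data, not the FRED series.

```r
# Hypothetical quarterly series standing in for the FRED data
x <- c(100, 101, 103, 104, 106, 108, 109, 111)

# Four-period log change, padded with NA where no value four periods back exists
chng <- c(rep(NA, 4), diff(log(x), lag = 4))

print(chng)
print(sum(is.na(chng)))  # number of rows ggplot would report as removed
```

If the goal is only to zoom in, coord_cartesian(xlim = ..., ylim = ...) changes the visible range without dropping rows, so only the NA rows would be reported.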
For the legend to show, you need to map something that differentiates the lines inside aes(), as in aes(..., color = "Nondurable"). See if this solution works for you:
ggplot() +
geom_line(data = Nondurable, aes(x = Index, y = chng, color = "Nondurable"), size = 1, linetype = "solid") +
geom_line(data = Durable, aes(x = Index, y = chng, color = "Durable"), size = 1, linetype = "longdash") +
geom_line(data = Services, aes(x = Index, y = chng, color = "Services"), size = 1, linetype = "twodash") +
theme_tufte() +
labs(
y = "Percent Change",
x = "",
caption = "Seasonally Adjusted Annual Rate. Retrieved from FRED & U.S. Bureau of Economic Analysis"
) +
ggtitle("Year-over-Year Spending Trend Changes of the US Consumer") +
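scale_colour_manual(
name = "Legend",
# named values, so each colour stays attached to its series regardless of legend order
values = c("Nondurable" = "#5b9bd5", "Durable" = "#00b050", "Services" = "#ed7d31"),
breaks = c("Nondurable", "Durable", "Services")
) +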
scale_x_date(limits = as.Date(c("2010-01-01", "2019-02-01")))
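The pattern of putting a constant label inside aes() can be tried on toy data. This is a minimal sketch with made-up values and hypothetical data frame names (nondurable_toy, durable_toy). It names the colour values because unnamed values are handed out to the labels in alphabetical order, which can silently swap colours.

```r
library(ggplot2)

# Toy stand-ins for two of the series (values are made up)
dates <- seq(as.Date("2018-01-01"), by = "quarter", length.out = 4)
nondurable_toy <- data.frame(date = dates, chng = c(0.020, 0.030, 0.025, 0.040))
durable_toy    <- data.frame(date = dates, chng = c(0.050, 0.045, 0.060, 0.055))

p <- ggplot() +
  geom_line(data = nondurable_toy, aes(x = date, y = chng, colour = "Nondurable")) +
  geom_line(data = durable_toy, aes(x = date, y = chng, colour = "Durable")) +
  scale_colour_manual(
    name = "Legend",
    values = c(Nondurable = "#5b9bd5", Durable = "#00b050"),
    breaks = c("Nondurable", "Durable")  # legend order
  )

# Inspect the colours that each layer actually received
nondurable_colour <- unique(layer_data(p, 1)$colour)
durable_colour <- unique(layer_data(p, 2)$colour)
print(c(nondurable_colour, durable_colour))
```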

How can I plot 2 related variables on the same axis using ggplot? [duplicate]

Edit: This question has been marked as duplicated, but the responses here have been tried and did not work because the case in question is a line chart, not a bar chart. Applying those methods produces a chart with 5 lines, 1 for each year - not useful. Did anyone who voted to mark as duplicate actually try those approaches on the sample dataset supplied with this question? If so please post as an answer.
Original Question:
There's a feature in Excel pivot charts which allows multilevel categorical axes. I'm trying to find a way to do the same thing with ggplot (or any other plotting package in R).
Consider the following dataset:
set.seed(1)
df=data.frame(year=rep(2009:2013,each=4),
quarter=rep(c("Q1","Q2","Q3","Q4"),5),
sales=40:59+rnorm(20,sd=5))
If this is imported to an Excel pivot table, it is straightforward to create the following chart:
Note how the x-axis has two levels, one for quarter and one for the grouping variable, year. Are multilevel axes possible with ggplot?
NB: There is a hack with facets that produces something similar, but this is not what I'm looking for.
library(ggplot2)
ggplot(df) +
geom_line(aes(x=quarter,y=sales,group=year))+
facet_grid(.~year,scales="free")
New labels are added using annotate(geom = "text", ...). Turn off clipping of the x axis labels with clip = "off" in coord_cartesian.
Use theme to add extra margin (plot.margin) and to remove (element_blank()) the x axis title and text (axis.title.x, axis.text.x) and the vertical grid lines (panel.grid.major.x, panel.grid.minor.x).
library(ggplot2)
ggplot(data = df, aes(x = interaction(year, quarter, lex.order = TRUE),
y = sales, group = 1)) +
geom_line(colour = "blue") +
annotate(geom = "text", x = seq_len(nrow(df)), y = 34, label = df$quarter, size = 4) +
annotate(geom = "text", x = 2.5 + 4 * (0:4), y = 32, label = unique(df$year), size = 6) +
coord_cartesian(ylim = c(35, 65), expand = FALSE, clip = "off") +
theme_bw() +
theme(plot.margin = unit(c(1, 1, 4, 1), "lines"),
axis.title.x = element_blank(),
axis.text.x = element_blank(),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank())
See also the nice answer by @eipi10 here: Axis labels on two lines with nested x variables (year below months)
The code suggested by Henrik does work and helped me a lot! I think the solution is very valuable. But please be aware that there is a small mistake in the first line of the code, which results in the data being in the wrong order.
Instead of
... aes(x = interaction(year,quarter), ...
it should be
... aes(x = interaction(quarter,year), ...
The resulting graphic has the data in the right order.
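The effect of argument order in interaction() can be checked directly on the sample data from the question. This sketch only inspects factor levels (no plotting); the variable names default_order, lex_order and swapped_order are made up for the illustration.

```r
set.seed(1)
df <- data.frame(year = rep(2009:2013, each = 4),
                 quarter = rep(c("Q1", "Q2", "Q3", "Q4"), 5),
                 sales = 40:59 + rnorm(20, sd = 5))

# By default the first factor varies fastest, so years are interleaved
default_order <- levels(interaction(df$year, df$quarter))
# lex.order = TRUE keeps the quarters of each year together
lex_order <- levels(interaction(df$year, df$quarter, lex.order = TRUE))
# Swapping the arguments also gives chronological order, with labels reversed
swapped_order <- levels(interaction(df$quarter, df$year))

head(default_order, 5)  # "2009.Q1" "2010.Q1" "2011.Q1" "2012.Q1" "2013.Q1"
head(lex_order, 5)      # "2009.Q1" "2009.Q2" "2009.Q3" "2009.Q4" "2010.Q1"
head(swapped_order, 5)  # "Q1.2009" "Q2.2009" "Q3.2009" "Q4.2009" "Q1.2010"
```

Both interaction(year, quarter, lex.order = TRUE) and interaction(quarter, year) put the x axis in chronological order; only the plain interaction(year, quarter) does not.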
P.S. I suggested an edit (which has been rejected so far). Because I lack the reputation to comment, I could not leave a comment, which I would rather have done.
User Tung had a great answer on this thread
library(tidyverse)
library(lubridate)
library(scales)
set.seed(123)
df <- tibble(
date = as.Date(41000:42000, origin = "1899-12-30"),
value = c(rnorm(500, 5), rnorm(501, 10))
)
# create year column for facet
df <- df %>%
mutate(year = as.factor(year(date)))
p <- ggplot(df, aes(date, value)) +
geom_line() +
geom_vline(xintercept = as.numeric(df$date[yday(df$date) == 1]), color = "grey60") +
scale_x_date(date_labels = "%b",
breaks = pretty_breaks(),
expand = c(0, 0)) +
# switch the facet strip label to the bottom
facet_grid(.~ year, space = 'free_x', scales = 'free_x', switch = 'x') +
labs(x = "") +
theme_classic(base_size = 14, base_family = 'mono') +
theme(panel.grid.minor.x = element_blank()) +
# remove facet spacing on x-direction
theme(panel.spacing.x = unit(0,"line")) +
# switch the facet strip label to outside
# remove background color
theme(strip.placement = 'outside',
strip.background.x = element_blank())
p

Add legend to geom_density R [duplicate]

This question already has answers here:
Add legend to ggplot2 line plot
I'm working with the Prosper Loan dataset and I'm trying to show two variables in the same plot using geom_density.
The problem is that when I try to include a legend to show the variable name for the pink area and the variable name for the dark area, it doesn't work.
library(ggplot2)
EstimatedLoss <- c(0.5, 0.2,0.3,0.4,0.8,0.5, 0.2,0.3,0.4,0.8)
EstimatedEffectiveYield <- c(0.10, 0.15,0.18,0.20,0.8,0.15, 0.13,0.22,0.22,0.25)
prosper_loan <- data.frame(EstimatedLoss,EstimatedEffectiveYield)
ggplot(data = prosper_loan) +
geom_density(aes(EstimatedLoss * 100), color = '#e1b582', fill = '#e1b582', alpha = 0.5, show.legend = TRUE ) +
geom_density(aes(EstimatedEffectiveYield * 100), color = '#a2b285',fill = '#a2b285', alpha = 0.7, linetype = 3, size = 1, show.legend = TRUE) +
scale_y_continuous(name = "Density")+
scale_x_continuous(name = "Estimate loss and effective yield in percentage") +
ggtitle('Density from the Estimated loss and effective yield in percentage')
Am I doing anything wrong?
Ideally, your data should be one observation per row (aka "long" data) to properly take advantage of ggplot2. Here's an example of first transforming the data using tidyr::gather. A legend will automatically be added with a fill or color aesthetic.
library(ggplot2)
library(tidyr)
library(magrittr)
EstimatedLoss <- c(0.5, 0.2,0.3,0.4,0.8,0.5, 0.2,0.3,0.4,0.8)
EstimatedEffectiveYield <- c(0.10, 0.15,0.18,0.20,0.8,0.15, 0.13,0.22,0.22,0.25)
prosper_loan <- data.frame(EstimatedLoss, EstimatedEffectiveYield) %>%
gather(key, value, EstimatedLoss:EstimatedEffectiveYield)
ggplot(data = prosper_loan) +
geom_density(aes(value * 100, fill = key, color = key), alpha = 0.5) +
scale_fill_manual(values = c(EstimatedLoss = '#e1b582', EstimatedEffectiveYield = '#a2b285')) +
scale_color_manual(values = c(EstimatedLoss = '#e1b582', EstimatedEffectiveYield = '#a2b285')) +
scale_y_continuous(name = "Density")+
scale_x_continuous(name = "Estimate loss and effective yield in percentage") +
ggtitle('Density from the Estimated loss and effective yield in percentage')
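In current tidyr, gather() is superseded by pivot_longer(). A minimal sketch of the same reshape follows, assuming tidyr 1.0.0 or later is installed; the name prosper_long is made up here, and the column names key and value are chosen to match the plotting code above.

```r
library(tidyr)

EstimatedLoss <- c(0.5, 0.2, 0.3, 0.4, 0.8, 0.5, 0.2, 0.3, 0.4, 0.8)
EstimatedEffectiveYield <- c(0.10, 0.15, 0.18, 0.20, 0.8, 0.15, 0.13, 0.22, 0.22, 0.25)
prosper_loan <- data.frame(EstimatedLoss, EstimatedEffectiveYield)

# One row per observation: 'key' holds the old column name, 'value' the number
prosper_long <- pivot_longer(
  prosper_loan,
  cols = c(EstimatedLoss, EstimatedEffectiveYield),
  names_to = "key",
  values_to = "value"
)
print(head(prosper_long))
```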

apply ggplot color scale gradient to portion of data

I have a question about applying ggplot's color scale gradient. I have a dataset where the response variable is a continuous variable including both positive and negative numbers, and the independent variable is a number of independent sites. I'm trying to plot the data in such a way that I can plot all of the data in the background and then apply a color scale gradient to the response data that covers the negative range of the data. This is what I have so far with an example dataset that mimics the structure of my actual dataset.
tr_sim <- data.frame(site_id = seq(1,100,1), estimated_impact = rnorm(100,18,200), impact_group = rep(c(1,2),each = 50))
rng_full <- range(tr_sim$estimated_impact)
#produce plot showing the full range of impacts across all sites and then over the subsetted sites
impact_plot_full <- ggplot(data = tr_sim, aes(x = factor(site_id, levels = site_id[order(estimated_impact)]), y = estimated_impact)) +
geom_bar(stat = "identity",width = 1, fill = "grey80")
impact_plot_full
impact_plot_full +
geom_bar(stat = "identity", width = 1, position = "stack", aes(y = estimated_impact[impact_group == 1])) +
scale_fill_gradient2(low="firebrick", mid="yellow", high = "green4") +
labs(y = "Estimated Impact ($/week)", x = "Total number of sites with estimate is 100", title = "Sites with the greatest impact after coverage loss") +
theme(axis.text.x = element_blank()) +
scale_y_continuous(breaks = round(seq(rng_full[1],rng_full[2],by=100),digits=0))
I can plot all of the data in the background in gray and I'm attempting to plot the negative range of the response data on top of this. I get the error that 'aesthetics must be either length 1 or the same as the data(100), y,x'. I know this is occurring because the negative data is not the same length as the entire dataset, but I can not figure out a way to do this. Any suggestions would be greatly appreciated.
Thank you,
Curtis
You need to subset the data and use fill in the aes() for geom_bar.
impact_plot_full +
geom_bar(data = subset(tr_sim, estimated_impact < 0),
stat = "identity",
aes(y = estimated_impact, fill = estimated_impact)) +
scale_fill_gradient2(low = "firebrick", mid = "yellow", high = "green4") +
theme(axis.text.x = element_blank()) +
xlab("site_id")
Hope this is what You are looking for.
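A possible alternative, not the approach used in the answer above, is to draw a single layer and let the fill scale's na.value argument paint the non-negative bars grey. The sketch below assumes made-up data; the column neg_impact and the seed are hypothetical additions for this illustration.

```r
library(ggplot2)

set.seed(42)  # seed added only so the sketch is reproducible
tr_sim <- data.frame(site_id = 1:100, estimated_impact = rnorm(100, 18, 200))

# Hypothetical helper column: keep negative impacts, blank out everything else
tr_sim$neg_impact <- ifelse(tr_sim$estimated_impact < 0, tr_sim$estimated_impact, NA)

p <- ggplot(tr_sim, aes(x = reorder(site_id, estimated_impact),
                        y = estimated_impact, fill = neg_impact)) +
  geom_col(width = 1) +
  # bars with NA fill (impact >= 0) are drawn in grey instead of being coloured
  scale_fill_gradient2(low = "firebrick", mid = "yellow", high = "green4",
                       na.value = "grey80") +
  theme(axis.text.x = element_blank()) +
  xlab("site_id")
p
```

This avoids the length mismatch entirely, since every aesthetic has one value per row of the full data.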

ggplot, facet, piechart: placing text in the middle of pie chart slices

I'm trying to produce a facetted pie-chart with ggplot and facing problems with placing text in the middle of each slice:
dat = read.table(text = "Channel Volume Cnt
AGENT high 8344
AGENT medium 5448
AGENT low 23823
KIOSK high 19275
KIOSK medium 13554
KIOSK low 38293", header=TRUE)
vis = ggplot(data=dat, aes(x=factor(1), y=Cnt, fill=Volume)) +
geom_bar(stat="identity", position="fill") +
coord_polar(theta="y") +
facet_grid(Channel~.) +
geom_text(aes(x=factor(1), y=Cnt, label=Cnt, ymax=Cnt),
position=position_fill(width=1))
The output:
What parameters of geom_text should be adjusted in order to place numerical labels in the middle of piechart slices?
A related question is Pie plot getting its text on top of each other, but it doesn't handle the faceted case.
UPDATE: following Paul Hiemstra advice and approach in the question above I changed code as follows:
---> pie_text = dat$Cnt/2 + c(0,cumsum(dat$Cnt)[-length(dat$Cnt)])
vis = ggplot(data=dat, aes(x=factor(1), y=Cnt, fill=Volume)) +
geom_bar(stat="identity", position="fill") +
coord_polar(theta="y") +
facet_grid(Channel~.) +
geom_text(aes(x=factor(1),
---> y=pie_text,
label=Cnt, ymax=Cnt), position=position_fill(width=1))
As I expected, the tweaked text coordinates are absolute, but they need to be computed within each facet's data:
NEW ANSWER: With the introduction of ggplot2 v2.2.0, position_stack() can be used to position the labels without the need to calculate a position variable first. The following code will give you the same result as the old answer:
ggplot(data = dat, aes(x = "", y = Cnt, fill = Volume)) +
geom_bar(stat = "identity") +
geom_text(aes(label = Cnt), position = position_stack(vjust = 0.5)) +
coord_polar(theta = "y") +
facet_grid(Channel ~ ., scales = "free")
To remove "hollow" center, adapt the code to:
ggplot(data = dat, aes(x = 0, y = Cnt, fill = Volume)) +
geom_bar(stat = "identity") +
geom_text(aes(label = Cnt), position = position_stack(vjust = 0.5)) +
scale_x_continuous(expand = c(0,0)) +
coord_polar(theta = "y") +
facet_grid(Channel ~ ., scales = "free")
OLD ANSWER: The solution to this problem is creating a position variable, which can be done quite easily with base R or with the data.table, plyr or dplyr packages:
Step 1: Creating the position variable for each Channel
# with base R
dat$pos <- with(dat, ave(Cnt, Channel, FUN = function(x) cumsum(x) - 0.5*x))
# with the data.table package
library(data.table)
setDT(dat)
dat <- dat[, pos:=cumsum(Cnt)-0.5*Cnt, by="Channel"]
# with the plyr package
library(plyr)
dat <- ddply(dat, .(Channel), transform, pos=cumsum(Cnt)-0.5*Cnt)
# with the dplyr package
library(dplyr)
dat <- dat %>% group_by(Channel) %>% mutate(pos=cumsum(Cnt)-0.5*Cnt)
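As a quick check of Step 1, the base R variant can be run on its own to see the midpoints it produces for the question's data. This is only a sketch of the arithmetic.

```r
dat <- read.table(text = "Channel Volume Cnt
AGENT high 8344
AGENT medium 5448
AGENT low 23823
KIOSK high 19275
KIOSK medium 13554
KIOSK low 38293", header = TRUE)

# Midpoint of each slice: running total within the channel minus half the slice
dat$pos <- with(dat, ave(Cnt, Channel, FUN = function(x) cumsum(x) - 0.5 * x))
print(dat)
# AGENT midpoints: 4172.0, 11068.0, 25703.5
# KIOSK midpoints: 9637.5, 26052.0, 51975.5
```

Whether these midpoints line up with the drawn slices depends on the order in which your ggplot2 version stacks the bars, which is one reason the position_stack(vjust = 0.5) route tends to be less fragile.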
Step 2: Creating the facetted plot
library(ggplot2)
ggplot(data = dat) +
geom_bar(aes(x = "", y = Cnt, fill = Volume), stat = "identity") +
geom_text(aes(x = "", y = pos, label = Cnt)) +
coord_polar(theta = "y") +
facet_grid(Channel ~ ., scales = "free")
The result:
I would like to speak out against the conventional way of making pies in ggplot2, which is to draw a stacked barplot in polar coordinates. While I appreciate the mathematical elegance of that approach, it does cause all sorts of headaches when the plot doesn't look quite the way it's supposed to. In particular, precisely adjusting the size of the pie can be difficult. (If you don't know what I mean, try to make a pie chart that extends all the way to the edge of the plot panel.)
I prefer drawing pies in a normal cartesian coordinate system, using geom_arc_bar() from ggforce. It requires a little bit of extra work on the front end, because we have to calculate angles ourselves, but that's easy and the level of control we get as a result is more than worth it.
I've used this approach in previous answers here and here.
The data (from the question):
dat = read.table(text = "Channel Volume Cnt
AGENT high 8344
AGENT medium 5448
AGENT low 23823
KIOSK high 19275
KIOSK medium 13554
KIOSK low 38293", header=TRUE)
The pie-drawing code:
library(ggplot2)
library(ggforce)
library(dplyr)
# calculate the start and end angles for each pie
dat_pies <- left_join(dat,
dat %>%
group_by(Channel) %>%
summarize(Cnt_total = sum(Cnt))) %>%
group_by(Channel) %>%
mutate(end_angle = 2*pi*cumsum(Cnt)/Cnt_total, # ending angle for each pie slice
start_angle = lag(end_angle, default = 0), # starting angle for each pie slice
mid_angle = 0.5*(start_angle + end_angle)) # middle of each pie slice, for the text label
rpie = 1 # pie radius
rlabel = 0.6 * rpie # radius of the labels; a number slightly larger than 0.5 seems to work better,
# but 0.5 would place it exactly in the middle as the question asks for.
# draw the pies
ggplot(dat_pies) +
geom_arc_bar(aes(x0 = 0, y0 = 0, r0 = 0, r = rpie,
start = start_angle, end = end_angle, fill = Volume)) +
geom_text(aes(x = rlabel*sin(mid_angle), y = rlabel*cos(mid_angle), label = Cnt),
hjust = 0.5, vjust = 0.5) +
coord_fixed() +
scale_x_continuous(limits = c(-1, 1), name = "", breaks = NULL, labels = NULL) +
scale_y_continuous(limits = c(-1, 1), name = "", breaks = NULL, labels = NULL) +
facet_grid(Channel~.)
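The angle arithmetic itself does not need dplyr or ggforce. Here is a minimal base R sketch of the same calculation, useful for checking the numbers before plotting; the vector names are made up, and the sin/cos convention (angle 0 at the top) simply follows the code above.

```r
dat <- read.table(text = "Channel Volume Cnt
AGENT high 8344
AGENT medium 5448
AGENT low 23823
KIOSK high 19275
KIOSK medium 13554
KIOSK low 38293", header = TRUE)

# Per-channel totals and running totals, aligned with the rows of dat
total <- ave(dat$Cnt, dat$Channel, FUN = sum)
running <- ave(dat$Cnt, dat$Channel, FUN = cumsum)

end_angle <- 2 * pi * running / total
# Each slice starts where the previous slice of the same channel ended
start_angle <- ave(end_angle, dat$Channel, FUN = function(a) c(0, head(a, -1)))
mid_angle <- 0.5 * (start_angle + end_angle)

# Label coordinates at 60% of a unit radius
rlabel <- 0.6
label_x <- rlabel * sin(mid_angle)
label_y <- rlabel * cos(mid_angle)
print(data.frame(dat, start_angle, end_angle, label_x, label_y))
```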
To show why I think this approach is so much more powerful than the conventional (coord_polar()) approach, let's say we want the labels on the outside of the pie rather than inside. This creates a couple of problems: we will have to adjust hjust and vjust depending on the side of the pie a label falls on, and we will also have to make the plot panel wider than it is high to make space for the labels on the side without generating excessive space above and below. Solving these problems in the polar coordinate approach is not fun, but it's trivial in cartesian coordinates:
# generate hjust and vjust settings depending on the quadrant into which each
# label falls
dat_pies <- mutate(dat_pies,
hjust = ifelse(mid_angle>pi, 1, 0),
vjust = ifelse(mid_angle<pi/2 | mid_angle>3*pi/2, 0, 1))
rlabel = 1.05 * rpie # now we place labels outside of the pies
ggplot(dat_pies) +
geom_arc_bar(aes(x0 = 0, y0 = 0, r0 = 0, r = rpie,
start = start_angle, end = end_angle, fill = Volume)) +
geom_text(aes(x = rlabel*sin(mid_angle), y = rlabel*cos(mid_angle), label = Cnt,
hjust = hjust, vjust = vjust)) +
coord_fixed() +
scale_x_continuous(limits = c(-1.5, 1.4), name = "", breaks = NULL, labels = NULL) +
scale_y_continuous(limits = c(-1, 1), name = "", breaks = NULL, labels = NULL) +
facet_grid(Channel~.)
To tweak the position of the label text relative to the coordinate, you can use the vjust and hjust arguments of geom_text. This will determine the position of all labels simultaneously, so this might not be what you need.
Alternatively, you could tweak the coordinate of the label. Define a new data.frame where you average the Cnt coordinates (label_x[i] = (Cnt[i+1] + Cnt[i]) / 2) to position the label in the center of that particular slice. Then pass this new data.frame to geom_text in place of the original data.frame.
In addition, piecharts have some visual interpretation flaws. In general I would not use them, especially where good alternatives exist, e.g. a dotplot:
ggplot(dat, aes(x = Cnt, y = Volume)) +
geom_point() +
facet_wrap(~ Channel, ncol = 1)
For example, from this plot it is obvious that Cnt is higher for Kiosk than for Agent; this information is lost in the piechart.
The following answer is partial and clunky, and I won't accept it.
The hope is that it will solicit a better solution.
text_KIOSK = dat$Cnt
text_AGENT = dat$Cnt
text_KIOSK[dat$Channel=='AGENT'] = 0
text_AGENT[dat$Channel=='KIOSK'] = 0
text_KIOSK = text_KIOSK/1.7 + c(0,cumsum(text_KIOSK)[-length(dat$Cnt)])
text_AGENT = text_AGENT/1.7 + c(0,cumsum(text_AGENT)[-length(dat$Cnt)])
text_KIOSK[dat$Channel=='AGENT'] = 0
text_AGENT[dat$Channel=='KIOSK'] = 0
pie_text = text_KIOSK + text_AGENT
vis = ggplot(data=dat, aes(x=factor(1), y=Cnt, fill=Volume)) +
geom_bar(stat="identity", position=position_fill(width=1)) +
coord_polar(theta="y") +
facet_grid(Channel~.) +
geom_text(aes(y=pie_text, label=format(Cnt,format="d",big.mark=','), ymax=Inf), position=position_fill(width=1))
It produces following chart:
As you may notice, I can't move the labels for green (low).

Resources