I'm trying to create a barplot by month that includes two columns, with each column stacked. For each month, the first column would be the total number of video visits, split by vid_new and vid_return. The second column would be the total number of phone visits, split by phone_charge and phone_nocharge.
I still haven't been able to get the side-by-side bars right. The code below uses the data frame in the second picture, but it counts the occurrences of the words "video" and "phone" instead of using the Count column, which produces the third picture.
plot <- ggplot(data=new_df, aes(x=Month, y = count, fill = gen_type)) +
geom_bar(stat = "identity", position = "dodge")
Below is a picture of the data I'm working with. I've converted it into a few different forms to try new methods, but have not been able to produce this graph.
How can I make a bar plot that is both grouped and stacked in ggplot? What data structure do I need to make it?
Thanks in advance for your advice!
You can try any of these options, which reshape your data to long format and create an additional variable that identifies the visit type. Here is the code using tidyverse functions:
library(ggplot2)
library(dplyr)
library(tidyr)
#Data
df <- data.frame(Month=c(rep('Mar',4),rep('Apr',4),rep('May',2)),
spec_type=c('vid_new','vid_return','phone_charge','phone_nocharge',
'vid_new','vid_return','phone_charge','phone_nocharge',
'vid_new','vid_return'),
Count=c(7,85,595,56,237,848,2958,274,205,1079))
#Plot 1
df %>% mutate(Month=factor(Month,levels = unique(Month),ordered = T)) %>%
mutate(Dup=spec_type) %>%
separate(Dup,c('Type','Class'),sep='_') %>% select(-Class) %>%
ggplot(aes(x=Type,y=Count,fill=spec_type))+
geom_bar(stat = 'identity',position = 'stack')+
facet_wrap(.~Month,strip.position = 'bottom')+
theme(strip.placement = 'outside',
strip.background = element_blank())
Output:
Or this:
#Plot 2
df %>% mutate(Month=factor(Month,levels = unique(Month),ordered = T)) %>%
mutate(Dup=spec_type) %>%
separate(Dup,c('Type','Class'),sep='_') %>% select(-Class) %>%
ggplot(aes(x=Type,y=Count,fill=spec_type))+
geom_bar(stat = 'identity',position = 'fill')+
facet_wrap(.~Month,strip.position = 'bottom',scales = 'free')+
theme(strip.placement = 'outside',
strip.background = element_blank())
Output:
Or this:
#Plot 3
df %>% mutate(Month=factor(Month,levels = unique(Month),ordered = T)) %>%
mutate(Dup=spec_type) %>%
separate(Dup,c('Type','Class'),sep='_') %>% select(-Class) %>%
ggplot(aes(x=Type,y=Count,fill=spec_type))+
geom_bar(stat = 'identity',position = position_dodge2(preserve = 'single'))+
facet_wrap(.~Month,strip.position = 'bottom',scales = 'free')+
theme(strip.placement = 'outside',
strip.background = element_blank())
Output:
To see the data by month, you can use facet_wrap() and move the strip labels below the panels (strip.position = 'bottom' together with strip.placement = 'outside').
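As a quick check of the reshaping step shared by the three pipelines above, the sketch below derives the Type column and sums Count per month and type. It reuses the df from this answer; the names df_long and totals are only illustrative, and remove = FALSE is used here in place of the Dup copy, which should give the same Type column.

```r
library(dplyr)
library(tidyr)

# Same data as in the answer above
df <- data.frame(
  Month = c(rep("Mar", 4), rep("Apr", 4), rep("May", 2)),
  spec_type = c("vid_new", "vid_return", "phone_charge", "phone_nocharge",
                "vid_new", "vid_return", "phone_charge", "phone_nocharge",
                "vid_new", "vid_return"),
  Count = c(7, 85, 595, 56, 237, 848, 2958, 274, 205, 1079)
)

# Split spec_type into Type ("vid"/"phone") and Class, keeping spec_type for the fill
df_long <- df %>%
  mutate(Month = factor(Month, levels = unique(Month))) %>%
  separate(spec_type, c("Type", "Class"), sep = "_", remove = FALSE) %>%
  select(-Class)

# Height of each stacked bar: total Count per Month and Type
totals <- df_long %>%
  group_by(Month, Type) %>%
  summarise(Total = sum(Count), .groups = "drop")

print(totals)
```

Each row of totals corresponds to one stacked bar in the faceted plots, so it is a convenient way to confirm that the bars show Count values rather than row counts.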
Edit: This question has been marked as duplicated, but the responses here have been tried and did not work because the case in question is a line chart, not a bar chart. Applying those methods produces a chart with 5 lines, 1 for each year - not useful. Did anyone who voted to mark as duplicate actually try those approaches on the sample dataset supplied with this question? If so please post as an answer.
Original Question:
There's a feature in Excel pivot charts which allows multilevel categorical axes. I'm trying to find a way to do the same thing with ggplot (or any other plotting package in R).
Consider the following dataset:
set.seed(1)
df=data.frame(year=rep(2009:2013,each=4),
quarter=rep(c("Q1","Q2","Q3","Q4"),5),
sales=40:59+rnorm(20,sd=5))
If this is imported to an Excel pivot table, it is straightforward to create the following chart:
Note how the x-axis has two levels, one for quarter and one for the grouping variable, year. Are multilevel axes possible with ggplot?
NB: There is a hack with facets that produces something similar, but this is not what I'm looking for.
library(ggplot2)
ggplot(df) +
geom_line(aes(x=quarter,y=sales,group=year))+
facet_grid(.~year,scales="free")
New labels are added using annotate(geom = "text", ...). Turn off clipping of the x axis labels with clip = "off" in coord_cartesian.
Use theme to add extra margins (plot.margin) and to remove (element_blank()) the x axis title and text (axis.title.x, axis.text.x) and the vertical grid lines (panel.grid.major.x, panel.grid.minor.x).
library(ggplot2)
ggplot(data = df, aes(x = interaction(year, quarter, lex.order = TRUE),
y = sales, group = 1)) +
geom_line(colour = "blue") +
annotate(geom = "text", x = seq_len(nrow(df)), y = 34, label = df$quarter, size = 4) +
annotate(geom = "text", x = 2.5 + 4 * (0:4), y = 32, label = unique(df$year), size = 6) +
coord_cartesian(ylim = c(35, 65), expand = FALSE, clip = "off") +
theme_bw() +
theme(plot.margin = unit(c(1, 1, 4, 1), "lines"),
axis.title.x = element_blank(),
axis.text.x = element_blank(),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank())
See also the nice answer by @eipi10 here: Axis labels on two lines with nested x variables (year below months)
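The x positions of the year labels in the code above (2.5 + 4 * (0:4)) are the midpoints of each block of four quarters. A small sketch of how they could be computed for other group sizes follows; n_quarters, n_years, quarter_x and year_x are illustrative names, not part of the original answer.

```r
# Number of inner categories (quarters) and outer groups (years)
n_quarters <- 4
n_years <- 5

# Discrete x positions 1..20, one per year/quarter combination
quarter_x <- seq_len(n_quarters * n_years)

# Centre of each year's block of quarters: the mean of its x positions
year_id <- rep(seq_len(n_years), each = n_quarters)
year_x <- unname(sapply(split(quarter_x, year_id), mean))

print(year_x)
```

Passing year_x as x in the second annotate() call should place the year labels the same way as the hard-coded expression, while adapting if the number of years or quarters changes.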
The code suggested by Henrik works and helped me a lot, so I think the solution is very valuable. Please be aware, though, that there is a small mistake in the first line of the code, which results in the data being plotted in the wrong order.
Instead of
... aes(x = interaction(year,quarter), ...
it should be
... aes(x = interaction(quarter,year), ...
The resulting graphic has the data in the right order.
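The ordering difference can be checked directly on the factor levels, since ggplot places a discrete x axis in level order. This is a small base R sketch with two years only; the variable names are illustrative.

```r
year <- rep(2009:2010, each = 4)
quarter <- rep(c("Q1", "Q2", "Q3", "Q4"), 2)

# Default: the levels of the FIRST factor vary fastest
lv_yq <- levels(interaction(year, quarter))
# Swapping the arguments makes quarter vary fastest within each year
lv_qy <- levels(interaction(quarter, year))
# lex.order = TRUE makes the LAST factor vary fastest instead
lv_lex <- levels(interaction(year, quarter, lex.order = TRUE))

print(lv_yq)
print(lv_qy)
print(lv_lex)
```

With the default ordering, interaction(year, quarter) interleaves the years, whereas both interaction(quarter, year) and interaction(year, quarter, lex.order = TRUE) keep the quarters of each year together.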
P.S. I suggested an edit (rejected so far). I would rather have left a comment, but I do not have enough reputation to do so.
User Tung had a great answer on this thread
library(tidyverse)
library(lubridate)
library(scales)
set.seed(123)
df <- tibble(
date = as.Date(41000:42000, origin = "1899-12-30"),
value = c(rnorm(500, 5), rnorm(501, 10))
)
# create year column for facet
df <- df %>%
mutate(year = as.factor(year(date)))
p <- ggplot(df, aes(date, value)) +
geom_line() +
geom_vline(xintercept = as.numeric(df$date[yday(df$date) == 1]), color = "grey60") +
scale_x_date(date_labels = "%b",
breaks = pretty_breaks(),
expand = c(0, 0)) +
# switch the facet strip label to the bottom
facet_grid(.~ year, space = 'free_x', scales = 'free_x', switch = 'x') +
labs(x = "") +
theme_classic(base_size = 14, base_family = 'mono') +
theme(panel.grid.minor.x = element_blank()) +
# remove facet spacing on x-direction
theme(panel.spacing.x = unit(0,"line")) +
# switch the facet strip label to outside
# remove background color
theme(strip.placement = 'outside',
strip.background.x = element_blank())
p
I have what I think is a version of "remove data points when using stat_summary to generate mean and confidence band" or "How to set multiple colours in a ggplot2 stat_summary plot?", and it may also relate to this bug report about the SE parameter, https://github.com/tidyverse/ggplot2/issues/1546, but I can't figure out what I am doing wrong.
I have weekly data and I am trying to plot current year, previous year, 5 year average, and 5 year range. I can get the plot and all the elements that I want, but I can't get the fill in the range to relate to my scale_fill command.
Here is the code I am using:
library(plyr)
require(dplyr)
require(tidyr)
library(ggplot2)
library(lubridate)
library(zoo)
library(viridis)
ggplot(df1,aes(week,value)) +
geom_point(data=subset(df1,year(date)==year(Sys.Date()) ),size=1.7,aes(colour="1"))+
geom_line(data=subset(df1,year(date)==year(Sys.Date()) ),size=1.7,aes(colour="1"))+
geom_line(data=subset(df1,year(date)==year(Sys.Date())-1 ),size=1.7,aes(colour="2"))+
geom_point(data=subset(df1,year(date)==year(Sys.Date())-1 ),size=1.7,aes(colour="2"))+
#stat_summary(data=subset(df1,year(date)<year(Sys.Date()) &year(date)>year(Sys.Date())-6),geom = 'smooth', alpha = 0.2,size=1.7,
# fun.data = median_hilow,aes(colour=c("1","2","3"),fill="range"))+
stat_summary(data=subset(df1,year(date)<year(Sys.Date()) &year(date)>year(Sys.Date())-6),geom="smooth",fun.y = mean, fun.ymin = min, fun.ymax = max,size=1.7,aes(colour="c",fill="b"))+
#stat_summary(fun.data=mean_cl_normal, geom='smooth', color='black')+
scale_color_viridis("",discrete=TRUE,option="C",labels=c(year(Sys.Date()), year(Sys.Date())-1,paste(year(Sys.Date())-6,"-",year(Sys.Date())-1,"\naverage",sep ="")))+
scale_fill_viridis("",discrete=TRUE,option="C",labels=paste(year(Sys.Date())-6,"-",year(Sys.Date())-1,"\nrange",sep =""))+
#scale_fill_continuous()+
scale_x_continuous(limits=c(min(df1$week),max(df1$week)),expand=c(0,0))+
theme_minimal()+theme(
legend.position = "bottom",
legend.margin=margin(c(0,0,0,0),unit="cm"),
legend.text = element_text(colour="black", size = 12),
plot.caption = element_text(size = 14, face = "italic"),
plot.title = element_text(face = "bold"),
plot.subtitle = element_text(size = 14, face = "italic"),
#panel.grid.minor = element_blank(),
text = element_text(size = 14,face = "bold"),
axis.text.y =element_text(size = 14,face = "bold", colour="black"),
axis.text.x=element_text(size = 14,face = "bold", colour="black",angle=90, hjust=1)
)+
labs(y="Crude Oil Imports \n(Weekly, Thousands of Barrels per Day)",x="Week",
title=paste("US Imports of Crude Oil",sep=""),
caption="Source: EIA API, graph by Andrew Leach.")
I have placed a test.Rdata file here with the df1 data frame: https://drive.google.com/file/d/1aMt4WQaOi1vFJcMlgXFY7dzF_kjbgBiU/view?usp=sharing
Ideally, I'd like to have a fill legend item that looks like this, only with the text as I have it in my graph:
Any help would be much appreciated.
The short answer is that you seem to be misunderstanding how ggplot's scale_xx_xx commands are meant to be used (this trips up a lot of people). Whenever possible, the intention is for the aesthetics (the aes() bit inside most geoms) to be mapped to the scale functions. For example, the following code maps year to line color:
plot.simple <- ggplot(data = df1, aes(x = week, y = value, color = as.factor(year(date)))) +
geom_line()
print(plot.simple)
Since we specified that year (converted to a factor) should be used to define line color, ggplot defaults to using scale_color_hue. We could use a different scale:
plot.gray <- ggplot(data = df1, aes(x = week, y = value, color = as.factor(year(date)))) +
geom_line() +
scale_color_grey()
print(plot.gray)
If we don't want to tie aesthetics such as color or fill to values in the data, we can just specify them outside of the call to aes(). Typically you only do this if you don't have multiple values for an aesthetic:
plot.simple <- ggplot(data = df1, aes(x = week, y = value, color = as.factor(year(date)))) +
geom_line(alpha = 0.2)
print(plot.simple)
But you're in the unenviable position of wanting both of these things at once. For your 2017 and 2018 lines, color is meaningful. For the summary ribbon and its associated line, color is just decorative. In such cases, I usually avoid ggplot's built-in summary functions, since they can often "help" in ways that end up confusing or cumbersome.
I would suggest creating two data sets, one containing the 2017 and 2018 years, and the other containing the summary statistics for the ribbon:
df.years <- df1 %>%
mutate(year = year(date)) %>%
filter(year >= year(Sys.Date()) - 1)
df.year.range <- df1 %>%
mutate(year = year(date)) %>%
filter(year >= year(Sys.Date()) - 5 & year <= year(Sys.Date()) - 1) %>%
group_by(week) %>%
summarize(mean = mean(value), min = min(value), max = max(value))
We can then trick ggplot into printing a nice title for the fill on the legend, by setting fill inside aes to the intended string. Because fill is set in aes(), we control its color with scale_fill_manual.
the.plot <- ggplot() +
geom_ribbon(data = df.year.range, aes(x = week, ymin = min, ymax = max, fill = 'Previous 5 Year Range\nof Weekly Imports')) +
geom_line(data = df.year.range, aes(x = week, y = mean), color = 'purple') +
geom_line(data = df.years, aes(x = week, y = value, color = as.factor(year))) +
geom_point(data = filter(df.years, year == year(Sys.Date())), aes(x = week, y = value, color = as.factor(year))) +
scale_fill_manual(values = '#ffccff')
print(the.plot)
This is still rather cumbersome, because you have quite a few different elements tied to various different sources of data (lines for some years, points for others, a ribbon for a summary, etc). But it gets the job done!
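The fill trick described above can be tried in isolation. The following is a minimal sketch on made-up data (toy, lo and hi are hypothetical and stand in for df.year.range and its min and max columns); it only illustrates how a constant string inside aes() becomes a legend entry whose colour is then set by scale_fill_manual.

```r
library(ggplot2)

# Hypothetical stand-in for the weekly range summary
toy <- data.frame(week = 1:10, lo = 1:10 - 1, hi = 1:10 + 1)

p <- ggplot(toy, aes(x = week)) +
  # The string inside aes() is treated as a one-level fill variable,
  # so it appears in the legend with this text as its label
  geom_ribbon(aes(ymin = lo, ymax = hi, fill = "5-year range")) +
  # Map that single level to the desired colour and drop the legend title
  scale_fill_manual(name = NULL, values = c("5-year range" = "#ffccff"))

# Fill colour actually used by the ribbon layer
ribbon_fill <- unique(layer_data(p, 1)$fill)
print(ribbon_fill)
```

Naming the value in values ties the colour to the label explicitly, which may be easier to maintain than relying on position if more fill entries are added later.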
I have the following dataset:
HIU,0.0833333333,0,0.35,0.0208333333,0.40625,0,0.21875,0.125,0.078125,0.0104166667,1,0.53125,0.4375
TTHY,0,0,0.8,0,0.5,0,0.7083333333,0.2708333333,0,0.6597222222,0,0.1435185185,0
Full,0.0554986339,0.1034836066,0.4620901639,0.0683060109,0.4961577869,0.0696721311,0.222079918,0.1465163934,0.2085040984,0.0476007514,0.893613388,0.396943306,0.4223872951
I made a grouped bar plot from the HIU and TTHY rows (figure 1), but I want to add a line based on the "Full" row, as in the second image.
Figure 1:
Figure 2:
How can I do it with R? This is my current code:
df = read.csv('TTR-HIU/resultados.csv',header=FALSE,colClasses=c("NULL",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA))
df.bar <- barplot(as.matrix(df[-nrow(df),]),beside=TRUE,col=c("darkblue","red"))
Using ggplot2, you could try something like this:
# put data in data frame:
df <- data.frame(HIU = c(0.0833333333,0,0.35,0.0208333333,0.40625,0,0.21875,0.125,0.078125,0.0104166667,1,0.53125,0.4375),
TTHY = c(0,0,0.8,0,0.5,0,0.7083333333,0.2708333333,0,0.6597222222,0,0.1435185185,0),
Full= c(0.0554986339,0.1034836066,0.4620901639,0.0683060109,0.4961577869,0.0696721311,0.222079918,0.1465163934,0.2085040984,0.0476007514,0.893613388,0.396943306,0.4223872951))
library(ggplot2)
library(tidyr) # to make data long (gather)
# create x-values:
df$x <- as.factor(seq_len(nrow(df)))
# make data long for ggplot2:
df_long <- df %>% gather(key, value, -x)
ggplot() +
# plot bars:
geom_col(data = subset(df_long, key %in% c("HIU", "TTHY")),
mapping = aes(x = x, y = value, fill = key),
position = position_dodge()) +
# plot lines:
geom_line(data = subset(df_long, key == "Full"),
mapping = aes(x = x, y = value, group = key, color = key),
size = 2) +
# make plot look a little like your desired output:
scale_color_manual(values = c("Full" = "yellow")) +
scale_fill_manual(values = c("HIU" = "blue", "TTHY" = "red")) +
theme_minimal() +
theme(axis.title = element_blank(),
legend.title = element_blank(),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank())
However, you might have to put your data into a data frame shaped like the one in this example. Use dput to show exactly what your data looks like if you need further help.
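One possible way to get from the row-wise CSV in the question to the column-wise data frame used above is to read the first field as row names and transpose. This is only a sketch: csv_text is an abbreviated, rounded stand-in for resultados.csv (three values per row instead of thirteen), and it assumes the file has no header line, as in the question.

```r
# Abbreviated stand-in for the file contents; the real file has 13 values per row
csv_text <- "HIU,0.08,0,0.35
TTHY,0,0,0.8
Full,0.06,0.10,0.46"

# Use the first field (HIU, TTHY, Full) as row names
raw <- read.csv(text = csv_text, header = FALSE, row.names = 1)

# Transpose so that each series becomes a column
df <- as.data.frame(t(raw))
rownames(df) <- NULL

# x positions as a factor, as in the answer above
df$x <- as.factor(seq_len(nrow(df)))

print(df)
```

For the actual file, read.csv('TTR-HIU/resultados.csv', header = FALSE, row.names = 1) would replace the text = argument; the rest of the answer's code can then be applied to df unchanged.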
I'm trying to use ggplot to create sequence plots, for the sake of keeping the same visual style within my paper using sequence analysis. I do:
library(ggplot2)
library(TraMineR)
library(dplyr)
library(tidyr)
data(mvad)
mvad_seq<-seqdef(mvad,15:length(mvad))
mvad_trate<-seqsubm(mvad_seq,method="TRATE")
mvad_dist<-seqdist(mvad_seq,method="OM",sm=mvad_trate)
cluster<-cutree(hclust(d=as.dist(mvad_dist),method="ward.D2"),k=6)
mvad$cluster<-cluster
mvad_long<-gather(select(mvad,id,contains("."),-matches("N.Eastern"),-matches("S.Eastern")),
key="Month",value="state",
Jul.93, Aug.93, Sep.93, Oct.93, Nov.93, Dec.93, Jan.94, Feb.94, Mar.94,
Apr.94, May.94, Jun.94, Jul.94, Aug.94, Sep.94, Oct.94, Nov.94, Dec.94, Jan.95,
Feb.95, Mar.95, Apr.95, May.95, Jun.95, Jul.95, Aug.95, Sep.95, Oct.95, Nov.95,
Dec.95, Jan.96, Feb.96, Mar.96, Apr.96, May.96, Jun.96, Jul.96, Aug.96, Sep.96,
Oct.96, Nov.96, Dec.96, Jan.97, Feb.97, Mar.97, Apr.97, May.97, Jun.97, Jul.97,
Aug.97, Sep.97, Oct.97, Nov.97, Dec.97, Jan.98, Feb.98, Mar.98, Apr.98, May.98,
Jun.98, Jul.98, Aug.98, Sep.98, Oct.98, Nov.98, Dec.98, Jan.99, Feb.99, Mar.99,
Apr.99, May.99, Jun.99)
mvad_long<-left_join(mvad_long,select(mvad,id,cluster))
ggplot(data=mvad_long,aes(x=Month,y=id,fill=state))+geom_tile()+facet_wrap(~cluster)
I try to plot the sequences by cluster, and this gives me the following plot:
As you can see, there are gaps for the ids that don't belong to the cluster represented by each facet. I would like to get rid of these gaps, so that the sequences show up stacked just as with the seqIplot() function of TraMineR as in the next figure:
Any suggestions of how to proceed?
Two small changes:
mvad_long$id <- as.factor(mvad_long$id)
ggplot(data=mvad_long,aes(x=Month,y=id,fill=state))+
geom_tile()+facet_wrap(~cluster,scales = "free_y")
ggplot was treating id as a numeric variable rather than a factor, and the facet scales were fixed, so every facet reserved space for all ids.
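The effect of the two changes can be seen on a tiny made-up example (toy is hypothetical data, not the mvad set): once id is a factor and the y scale is free, each facet should only show the ids that actually occur in its cluster.

```r
library(ggplot2)

# Hypothetical data: two clusters with two ids each, one month only
toy <- data.frame(
  id = c(1, 2, 101, 102),
  month = "Jul.93",
  state = c("school", "FE", "employment", "employment"),
  cluster = c(1, 1, 2, 2)
)

# Change 1: make id discrete, so the y axis has one slot per id
toy$id <- as.factor(toy$id)

# Change 2: free y scales, so each facet keeps only its own ids
p <- ggplot(toy, aes(x = month, y = id, fill = state)) +
  geom_tile() +
  facet_wrap(~cluster, scales = "free_y")

# Number of distinct ids per cluster, i.e. rows expected in each facet
ids_per_cluster <- tapply(toy$id, toy$cluster, function(i) nlevels(droplevels(i)))
print(ids_per_cluster)
```

With a numeric id, the gap between 2 and 101 would be drawn as empty space even with free scales, which is why both changes are needed together.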
An update: I needed to convert the month into a date for it to work. Full solution follows:
library(ggplot2)
library(TraMineR)
library(dplyr)
library(tidyr)
library(lubridate)
data(mvad)
mvad_seq <- seqdef(mvad, 15:length(mvad))
mvad_trate <- seqsubm(mvad_seq, method = "TRATE")
mvad_dist <- seqdist(mvad_seq, method = "OM", sm = mvad_trate)
cluster <- cutree(hclust(d = as.dist(mvad_dist), method = "ward.D2"), k = 6)
mvad$cluster <- cluster
mvad_long <- mvad %>%
select(id, matches("\\.\\d\\d")) %>%
gather(key = "month", value = "state", -id) %>%
inner_join(
mvad %>%
select(id, cluster),
by = "id"
) %>%
mutate(
id = factor(id),
date = myd(paste0(month, "01"))
)
mvad_long %>%
ggplot(aes(x = date, y = id, fill = state, color = state)) +
geom_tile() +
facet_wrap(~cluster, scales = "free_y", ncol = 2) +
theme_bw() +
theme(
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
panel.grid = element_blank()
) +
scale_fill_brewer(palette = "Accent") +
scale_colour_brewer(palette = "Accent") +
labs(x = "", y = "")