ggplot2 multiple time-series plots - r

I'm just learning ggplot, so my apologies if this is a really basic question. I have data that has been aggregated by year, with a few different qualities to slice on (the code below generates sample data). I'm trying to show a few different charts: one that shows the overall total for a given metric, then a couple that show the same metric split across the qualities, but it's not going right. Ideally, I want to build the base plot once, then add the geom layer for each of the individual charts. The code also includes examples of how I want the charts to look.
I'm starting to think this is a data structure issue, but really can't figure it out.
Secondary question: my years are stored as integers. Is that the best approach here, or should I convert them to dates?
library(data.table)
library(ggplot2)
#Generate Sample Data - Yearly summarized data
BaseData <- data.table(expand.grid(dataYear = 2010:2017,
                                   Program = c("A","B","C"),
                                   Indicator = c("0","1")))
set.seed(123)
BaseData$Metric1 <- runif(nrow(BaseData),min = 10000,100000)
BaseData$Metric2 <- runif(nrow(BaseData),min = 10000,100000)
BaseData$Metric3 <- runif(nrow(BaseData),min = 10000,100000)
BP <- ggplot(BaseData, aes(dataYear,Metric1))
BP + geom_area() #overall Aggregate
BP + geom_area(position = "stack", aes(fill = Program)) #Stacked by Program
BP + geom_area(position = "stack", aes(fill = Indicator)) #stacked by Indicator
#How I want them to look
##overall Aggregate
BP.Agg <- BaseData[,.(Metric1 = sum(Metric1)),
by = dataYear]
ggplot(BP.Agg,aes(dataYear, Metric1))+geom_area()
##Stacked by Program
BP.Pro <- BaseData[,.(Metric1 = sum(Metric1)),
by = .(dataYear,
Program)]
ggplot(BP.Pro,aes(dataYear, Metric1, fill = Program))+geom_area(position = "stack")
##stacked by Indicator
BP.Ind <- BaseData[,.(Metric1 = sum(Metric1)),
by = .(dataYear,
Indicator)]
ggplot(BP.Ind,aes(dataYear, Metric1, fill = Indicator))+geom_area(position = "stack")

I was right, it was an easy fix. I should have used stat_summary instead of geom_area, so that Metric1 is summed within each year (and each fill group) before the area is drawn. Here are the correct layers to add (the argument is fun in ggplot2 3.3.0 and later; on older versions use fun.y):
BP + stat_summary(fun = sum, geom = "area")
BP + stat_summary(fun = sum, geom = "area", position = "stack", aes(fill = Program, group = Program))
BP + stat_summary(fun = sum, geom = "area", position = "stack", aes(fill = Indicator, group = Indicator))
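As a quick sanity check, the sketch below compares the values that stat_summary draws with manually aggregated yearly totals. It assumes ggplot2 3.3.0 or later (for the fun argument) and uses layer_data() to pull the computed values out of the plot. The names base_data, plotted and expected are illustrative, and base aggregate() stands in for the data.table aggregation from the question.

```r
library(ggplot2)

# Sample data shaped like the question's BaseData (plain data.frame here)
set.seed(123)
base_data <- expand.grid(dataYear = 2010:2017,
                         Program = c("A", "B", "C"),
                         Indicator = c("0", "1"))
base_data$Metric1 <- runif(nrow(base_data), min = 10000, max = 100000)

# Overall aggregate drawn by summing Metric1 within each year
p <- ggplot(base_data, aes(dataYear, Metric1)) +
  stat_summary(fun = sum, geom = "area")

# Values actually computed for the first (and only) layer, ordered by year
plotted <- layer_data(p, 1)
plotted <- plotted[order(plotted$x), ]

# Manual aggregation for comparison; aggregate() returns rows ordered by year
expected <- aggregate(Metric1 ~ dataYear, data = base_data, FUN = sum)

# Should be TRUE when the layer plots the same totals as the manual version
same_totals <- isTRUE(all.equal(plotted$y, expected$Metric1))
```

If same_totals is TRUE, the single base plot plus a stat_summary layer is drawing the same numbers as the pre-aggregated BP.Agg approach.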

Related

How do you create a plot from two different data frames (or how do you combine data frames with identical column names)

I have two dataframes and I want to plot a comparison between them. The plot and dataframes look like so
df2019 <- data.frame(Institute = c("A","B","C"),Women = c(65,50,70),Men = c(35,50,30))
df2016 <- data.frame(Institute = c("A","B","C"),Women = c(70,45,50),Men = c(30,55,50))
library(reshape2)
library(ggplot2)
df2019_melted <- melt(df2019, id.vars = "Institute")
ggplot(data = df2019_melted, aes(x = Institute, y = value, fill = variable))+
geom_bar(stat = "identity", position = "dodge")+
labs(fill = "Gender")+
xlab("Institute")+
ylab("Percent")+
scale_fill_discrete(labels = c("Women","Men"))+
ggtitle("Overall Gender Composition 2019")
But I want the plot to also show 2016 as faded bars, grouped the same way as 2019, so there are 4 bars for each Institute.
Since the column names are the same in all of my data frames, I can't use rbind() or similar, because the combined result doesn't record which data frame each row came from.
Add a year column to each data frame, then combine and melt. ggplot prefers everything to be in one data.frame.
all_melted <- reshape2::melt(
rbind(cbind(df2019, year=2019), cbind(df2016, year=2016)),
id=c("year", "Institute"))
Then you can plot with something like this, mapping year to alpha to make "faded" bars
ggplot(all_melted, aes(x = Institute, y = value, fill = variable, alpha=factor(year)))+
geom_col(position = "dodge")+
labs(fill = "Gender")+
xlab("Institute")+
ylab("Percent")+
scale_alpha_discrete(range=c(.4, 1), name="Year") +
ggtitle("Overall Gender Composition")
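The same "tag each data frame, then stack and reshape" step can also be sketched with dplyr and tidyr instead of reshape2. This is only an alternative illustration, not the answer's method: bind_rows() with a named list and .id records the source of each row, and pivot_longer() does the melt. The names all_long, Gender and Percent are chosen here for illustration.

```r
library(dplyr)
library(tidyr)

df2019 <- data.frame(Institute = c("A", "B", "C"),
                     Women = c(65, 50, 70), Men = c(35, 50, 30))
df2016 <- data.frame(Institute = c("A", "B", "C"),
                     Women = c(70, 45, 50), Men = c(30, 55, 50))

# The list names become the values of the new "year" column
all_long <- bind_rows(list(`2019` = df2019, `2016` = df2016), .id = "year") %>%
  # One row per Institute / year / gender combination
  pivot_longer(cols = c(Women, Men), names_to = "Gender", values_to = "Percent")
```

The result has one row per Institute, year and gender, so year can be mapped to alpha and Gender to fill in a single ggplot call.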

Barplot side by side and line charts in the same plot

I want to create in R a plot which contains side by side bars and line charts as follows:
I tried:
Total <- c(584,605,664,711,759,795,863,954,1008,1061,1117,1150)
Infected <- c(366,359,388,402,427,422,462,524,570,560,578,577)
Recovered <- c(212,240,269,301,320,359,385,413,421,483,516,548)
Death <- c(6,6,7,8,12,14,16,17,17,18,23,25)
day <- itemizeDates(startDate="01.04.20", endDate="12.04.20")
df <- data.frame(Day=day, Infected=Infected, Recovered=Recovered, Death=Death, Total=Total)
value_matrix = matrix(, nrow = 2, ncol = 12)
value_matrix[1,] = df$Recovered
value_matrix[2,] = df$Death
plot(c(1:12), df$Total, ylim=c(0,1200), xlim=c(1,12), type = "b", col="peachpuff", xaxt="n", xlab = "", ylab = "")
points(c(1:12), df$Infected, type = "b", col="red")
barplot(value_matrix, beside = TRUE, col = c("green", "black"), width = 0.35, add = TRUE)
But the bar chart does not fit the line chart. I guess it would be easier to use ggplot2, but don't know how. Could anyone help me? Thanks a lot in advance!
With ggplot2, the margins are handled nicely for you, but you'll need the data in two separate long forms. Reshape from wide to long with tidyr::gather, tidyr::pivot_longer, reshape2::melt, reshape, or whatever you prefer.
library(tidyr)
library(ggplot2)
df <- data.frame(
Total = c(584,605,664,711,759,795,863,954,1008,1061,1117,1150),
Infected = c(366,359,388,402,427,422,462,524,570,560,578,577),
Recovered = c(212,240,269,301,320,359,385,413,421,483,516,548),
Death = c(6,6,7,8,12,14,16,17,17,18,23,25),
day = seq(as.Date("2020-04-01"), as.Date("2020-04-12"), by = 'day')
)
ggplot(
tidyr::gather(df, Population, count, Total:Infected),
aes(day, count, color = Population, fill = Population)
) +
geom_line() +
geom_point() +
geom_col(
data = tidyr::gather(df, Population, count, Recovered:Death),
position = 'dodge', show.legend = FALSE
)
Another way to do it is to gather twice before plotting. Not sure if this is easier or harder to understand, but you get the same thing.
df %>%
tidyr::gather(Population, count, Total:Infected) %>%
tidyr::gather(Resolution, count2, Recovered:Death) %>%
ggplot(aes(x = day, y = count, color = Population)) +
geom_line() +
geom_point() +
geom_col(
aes(y = count2, color = Resolution, fill = Resolution),
position = 'dodge', show.legend = FALSE
)
You can actually plot the lines and points without reshaping by making separate calls for each, but to dodge bars (or get legends), you'll definitely need to reshape.
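A minimal sketch of that last point, drawing the two line series straight from the wide data frame with one geom call per column. The fixed colours are arbitrary choices for illustration, and because they are set outside aes() no legend is produced.

```r
library(ggplot2)

df <- data.frame(
  Total    = c(584, 605, 664, 711, 759, 795, 863, 954, 1008, 1061, 1117, 1150),
  Infected = c(366, 359, 388, 402, 427, 422, 462, 524, 570, 560, 578, 577),
  day      = seq(as.Date("2020-04-01"), as.Date("2020-04-12"), by = "day")
)

# One line layer and one point layer per wide column; no reshaping needed
p <- ggplot(df, aes(x = day)) +
  geom_line(aes(y = Total), color = "steelblue") +
  geom_point(aes(y = Total), color = "steelblue") +
  geom_line(aes(y = Infected), color = "firebrick") +
  geom_point(aes(y = Infected), color = "firebrick") +
  ylab("count")
```

This gets verbose quickly as the number of series grows, which is one reason the long format is usually preferred.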

Geom_bar with R (Beginner)

Good morning all,
I am working on data that I would like to show as a bar graph, with the bars split by my two departments. I generated a data frame that looks like this:
test <- data.frame(
  type_transport = sample(c("ON FOOT", "CAR", "TRANSPORT COMMON"), 5000, replace = TRUE),
  type_route = sample(c("N", "D", "A", "VC"), 5000, replace = TRUE),
  department = sample(c("department1", "department2"), 5000, replace = TRUE),
  troncon_km = sample(x = 0:17, 5000, replace = TRUE))
With this call, I get a bar graph:
ggplot(test, aes(x = type_route, y = troncon_km, fill = department)) + geom_bar(stat = "identity")
https://zupimages.net/viewer.php?id=20/19/vt1s.png
Now I would like to split each bar in two, to display the data separately for my two departments. For this, I use position = "dodge":
ggplot(test, aes(x = type_route, y = troncon_km, fill = department)) + geom_bar(stat = "identity", position = "dodge")
But there is a problem: the y scale is far too small compared to reality (we go from several thousand on the first graph to about 15 on the second). I have obviously missed something...
https://zupimages.net/viewer.php?id=20/19/sbh5.png
I do not understand.
Thank you.
The bars are all the same height because geom_bar(stat = "identity") draws one bar per observation, with a height equal to that observation's value. With the default stacking those bars pile up into the total, but with position = "dodge" the bars within a group are drawn on top of each other, so only the tallest one is visible. Since every category in both departments has at least one observation of 17, every bar shows that value.
There are several ways to move forward:
1. Let ggplot2 sum the values with stat_summary (the argument is fun in ggplot2 3.3.0 and later; on older versions use fun.y):
ggplot(test, aes(type_route, troncon_km, fill = department)) +
  stat_summary(geom = "bar", position = "dodge", fun = sum)
The fun argument can be any other summary function (e.g. mean or median).
2. Summarise the data first, then plot the totals:
library("tidyverse")
total_km <- test %>%
  group_by(department, type_route) %>%
  summarise(total_km = sum(troncon_km))
ggplot(total_km, aes(type_route, total_km, fill = department)) +
  geom_bar(stat = "identity", position = "dodge")
Again, you can change the sum() function within summarise() to your liking.
3. Use the same total_km data frame, only a little bit shorter, with geom_col:
ggplot(total_km, aes(type_route, total_km, fill = department)) +
  geom_col(position = "dodge")
Hope this helps.
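To make the difference between the two plots concrete, here is a small base R sketch (no plotting) that computes, per route type and department, the height a stacked bar reaches (the sum) and the height visible when the per-observation bars are dodged and overplotted (the maximum). The data mimics the question's test data frame; the seed and the names stacked_height and dodged_height are illustrative assumptions.

```r
set.seed(1)
test <- data.frame(
  type_route = sample(c("N", "D", "A", "VC"), 5000, replace = TRUE),
  department = sample(c("department1", "department2"), 5000, replace = TRUE),
  troncon_km = sample(0:17, 5000, replace = TRUE)
)

# Height of each stacked bar: all observations in the group piled up
stacked_height <- aggregate(troncon_km ~ type_route + department,
                            data = test, FUN = sum)

# Height visible with position = "dodge": the tallest single observation
dodged_height <- aggregate(troncon_km ~ type_route + department,
                           data = test, FUN = max)

# Side-by-side comparison, one row per type_route / department pair
comparison <- merge(stacked_height, dodged_height,
                    by = c("type_route", "department"),
                    suffixes = c("_sum", "_max"))
```

The _max column can never exceed 17, which matches the small y scale in the second plot, while the _sum column holds the totals the asker expected to see.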

position_dodge() does not seem to work with stat_summary() and x variables with different fill groups

Using stat_summary(geom = "bar") + stat_summary(geom = "errorbar") does not seem to work with position_dodge() when the x values have varying numbers of condition groups.
I am trying to make what should be a straightforward bar plot with ggplot2. My data has a number of different samples (x variable), and some of these samples also have a fill (condition) variable ("Scr" or "shRNA") while others don't (condition = NA). When I attempt to plot these data using the stat_summary wrappers to make bar plots with error bars, the position_dodge function for the error bars only works on samples that do not have different fill groups. The stat_summary(geom = "bar") layer seems to be functional, because the separate bars do show up, but their error bars are not aligned.
test <- data.frame(Sample = c(rep("A",6),rep("B",3)),
Target = c(rep("GENE1",9)),
val = c(1.1,1.2,1.15,.5,.6,.7,.95,1,1.05),
condition = c(rep("Scr",3),rep("shRNA",3),rep(NA,3)))
g <- ggplot(data=test,aes(x=Sample,y=val,fill=condition)) +
stat_summary(geom = "bar", fun.y = mean,position = position_dodge2(width=.5,preserve = "single"),color="black",width=.8) +
stat_summary(geom = "errorbar", fun.data = mean_se, position = position_dodge2(width=.2,preserve = "single"),width=.2) +
scale_y_continuous(expand = expand_scale(mult = c(0,.2)))
# + scale_fill_discrete(guide=guide_legend(title="",nrow=2))
I expect the position_dodge() argument in both stat_summary()'s to align error bars to the correct x position, regardless of whether or not that particular sample has one or two fill groups.
I'm a bit confused about what you're trying to do. Why not use geom_col/geom_bar instead of stat_summary? I always prefer keeping data manipulation/summarisation and plotting separate.
This is what I'd do
library(tidyverse)
test %>%
group_by(Sample, condition) %>%
summarise(val.mean = mean(val), val.sd = sd(val)) %>%
ggplot(aes(Sample, val.mean, fill = condition)) +
geom_col(position = position_dodge(width = 0.8)) +
geom_errorbar(
aes(ymin = val.mean - val.sd, ymax = val.mean + val.sd),
position = position_dodge(width = 0.8),
width = 0.2)
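The question's error bars used mean_se, while the code above uses the standard deviation. If standard-error bars are wanted, the same summarise-then-plot pattern could be sketched as follows. This assumes dplyr 1.0 or later (for the .groups argument); summ and val.se are illustrative names, and the standard error is computed as sd / sqrt(n).

```r
library(dplyr)
library(ggplot2)

test <- data.frame(Sample = c(rep("A", 6), rep("B", 3)),
                   Target = rep("GENE1", 9),
                   val = c(1.1, 1.2, 1.15, .5, .6, .7, .95, 1, 1.05),
                   condition = c(rep("Scr", 3), rep("shRNA", 3), rep(NA, 3)))

# One row per Sample / condition pair, with the mean and its standard error
summ <- test %>%
  group_by(Sample, condition) %>%
  summarise(val.mean = mean(val),
            val.se = sd(val) / sqrt(n()),
            .groups = "drop")

# Bars and error bars share the same dodge width so they line up
dodge <- position_dodge(width = 0.8)
p <- ggplot(summ, aes(Sample, val.mean, fill = condition)) +
  geom_col(position = dodge) +
  geom_errorbar(aes(ymin = val.mean - val.se, ymax = val.mean + val.se),
                position = dodge, width = 0.2)
```

Reusing one position object for both layers keeps the dodge widths from drifting apart, which is the mismatch in the original code (width = .5 for the bars, .2 for the error bars).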

R : ggplot2 plot several data frames in one plot

I'm a little bit stuck with ggplot2, trying to plot several data frames in one plot.
I have several data frames; here I'll present just two examples.
The data frames have the same header but different contents. Let's say that I want to count the balls that I have in 2 boxes.
name=c('red','blue','green','purple','white','black')
value1=c(2,3,4,2,6,8)
value2=c(1,5,7,3,4,2)
test1=data.frame("Color"=name,"Count"=value1)
test2=data.frame("Color"=name,"Count"=value2)
What I'm trying to do is make a bar plot of my counts.
So far, this is what I have:
(plot_test=ggplot(NULL, aes(x= Color, y=Count)) +
geom_bar(data=test1,stat = "identity",color='green')+
geom_bar(data=test2,stat = "identity",color='blue')
)
I want x = Color and y = Count, with the bars from the test2 data frame next to those from test1. Right now they overlap each other. So each name will appear twice on the x axis, but I want the data frames plotted in different colors, with their names in the legend.
For example "Green bar" = test1
"Blue bar" = test2
Thank you for your time and your help.
Best regards
You have two options here:
Either tweak the size and position of the bars
ggplot(NULL, aes(x= Color, y=Count)) +
geom_bar(data=test1, aes(color='test1'), stat = "identity",
width=.4, position=position_nudge(x = -0.2)) +
geom_bar(data=test2, aes(color='test2'), stat = "identity",
width=.4, position=position_nudge(x = 0.2))
or, what I recommend, join the two data frames together and then plot:
library(dplyr)
library(ggplot2)
test1 %>%
  full_join(test2, by = 'Color') %>%
  # reshape2::melt accepts a plain data.frame
  reshape2::melt(id.vars = 'Color') %>%
  ggplot(aes(x = Color, y = value, fill = variable)) +
  geom_bar(stat = "identity", position = 'dodge')
Try this:
name=c('red','blue','green','purple','white','black')
value1=c(2,3,4,2,6,8)
value2=c(1,5,7,3,4,2)
test1=data.frame("Color"=name,"Count"=value1)
test2=data.frame("Color"=name,"Count"=value2)
test1$var <- 'test1'
test2$var <- 'test2'
test_all <- rbind(test1,test2)
(plot_test=ggplot(data=test_all) +
geom_bar(aes(x=Color,y=Count,color=var),
stat = "identity", position=position_dodge(1))+
scale_color_manual(values = c('green', 'blue'))
)
This will do what you were trying to do:
balls <- data.frame(
count = c(c(2,3,4,2,6,8),c(1,5,7,3,4,2)),
colour = c(c('red','blue','green','purple','white','black'),c('red','blue','green','purple','white','black')),
box = c(rep("1", times = 6), rep("2", times = 6))
)
ggplot(balls, aes(x = colour, y = count, fill = box)) +
geom_col() +
scale_fill_manual(values = c("green","blue"))
This is better because it facilitates comparisons between the box counts:
ggplot(balls, aes(x = colour, y = count)) +
geom_col() +
facet_wrap(~ box, ncol = 1, labeller = as_labeller(c("1" = "Box #1", "2" = "Box #2")))
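If the goal is the side-by-side layout the question asks for, one option is to keep the single long data frame and dodge the bars by box. A minimal sketch, with the balls data rebuilt using rep() and the legend labels chosen for illustration:

```r
library(ggplot2)

balls <- data.frame(
  count  = c(2, 3, 4, 2, 6, 8, 1, 5, 7, 3, 4, 2),
  colour = rep(c("red", "blue", "green", "purple", "white", "black"), times = 2),
  box    = rep(c("1", "2"), each = 6)
)

# position = "dodge" places the two boxes next to each other within each colour
p <- ggplot(balls, aes(x = colour, y = count, fill = box)) +
  geom_col(position = "dodge") +
  scale_fill_manual(values = c("green", "blue"),
                    labels = c("Box #1", "Box #2"))
```

Without position = "dodge", geom_col() stacks the two boxes on top of each other within each colour.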
