I'm just learning ggplot, so my apologies if this is a really basic question. I have data that has been aggregated by year, with a few different qualities to slice on (the code below generates sample data). I'm trying to show a few different charts: one that shows the overall total for a given metric, then a couple that show the same metric split across the qualities, but it's not going right. Ideally, I want to build the base plot once, then add the geom layer for each individual chart. The code also includes examples of how I want each chart to look.
I'm starting to think this is a data structure issue, but really can't figure it out.
Secondary question - My years are formatted as integers, is that the best way to do that here, or should I convert them to dates?
library(data.table)
library(ggplot2)
#Generate Sample Data - Yearly summarized data
BaseData <- data.table(expand.grid(dataYear = 2010:2017,
Program = c("A","B","C"),
Indicator = c("0","1")))
set.seed(123)
BaseData$Metric1 <- runif(nrow(BaseData),min = 10000,100000)
BaseData$Metric2 <- runif(nrow(BaseData),min = 10000,100000)
BaseData$Metric3 <- runif(nrow(BaseData),min = 10000,100000)
BP <- ggplot(BaseData, aes(dataYear,Metric1))
BP + geom_area() #overall Aggregate
BP + geom_area(position = "stack", aes(fill = Program)) #Stacked by Program
BP + geom_area(position = "stack", aes(fill = Indicator)) #stacked by Indicator
#How I want them to look
##overall Aggregate
BP.Agg <- BaseData[,.(Metric1 = sum(Metric1)),
by = dataYear]
ggplot(BP.Agg,aes(dataYear, Metric1))+geom_area()
##Stacked by Program
BP.Pro <- BaseData[,.(Metric1 = sum(Metric1)),
by = .(dataYear,
Program)]
ggplot(BP.Pro,aes(dataYear, Metric1, fill = Program))+geom_area(position = "stack")
##stacked by Indicator
BP.Ind <- BaseData[,.(Metric1 = sum(Metric1)),
by = .(dataYear,
Indicator)]
ggplot(BP.Ind,aes(dataYear, Metric1, fill = Indicator))+geom_area(position = "stack")
I was right, it was an easy fix. I should have used stat_summary instead of geom_area, so that ggplot sums the rows for each year before drawing. Here are the correct layers to add (in ggplot2 3.3.0 and later the argument is fun; older versions call it fun.y):
BP + stat_summary(fun = sum, geom = "area")
BP + stat_summary(fun = sum, geom = "area", position = "stack", aes(fill = Program, group = Program))
BP + stat_summary(fun = sum, geom = "area", position = "stack", aes(fill = Indicator, group = Indicator))
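As a quick check that a stat_summary layer with sum draws the same totals as aggregating first, here is a minimal sketch. It rebuilds the sample data with a plain data.frame (the names base_data, drawn and by_hand are mine, not from the question) and assumes ggplot2 3.3.0 or later for the fun argument.

```r
library(ggplot2)

# Same shape as the sample data in the question, without data.table
set.seed(123)
base_data <- expand.grid(dataYear = 2010:2017,
                         Program = c("A", "B", "C"),
                         Indicator = c("0", "1"))
base_data$Metric1 <- runif(nrow(base_data), min = 10000, max = 100000)

p <- ggplot(base_data, aes(dataYear, Metric1)) +
  stat_summary(fun = sum, geom = "area")

# layer_data() returns the values ggplot2 actually draws for a layer
drawn <- layer_data(p, 1)

# The same yearly totals computed by hand
by_hand <- aggregate(Metric1 ~ dataYear, data = base_data, FUN = sum)

# Compare the drawn heights with the hand-computed totals
all.equal(as.numeric(drawn$y), by_hand$Metric1)
```

If the two agree, the summarised layer can stand in for the pre-aggregated tables such as BP.Agg.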
Related
I have two dataframes and I want to plot a comparison between them. The plot and dataframes look like so
df2019 <- data.frame(Institute = c("A","B","C"),Women = c(65,50,70),Men = c(35,50,30))
df2016 <- data.frame(Institute = c("A","B","C"),Women = c(70,45,50),Men = c(30,55,50))
df2019_melted <- reshape2::melt(df2019, id.vars = "Institute")
ggplot(data = df2019_melted, aes(x = Institute, y = value, fill = variable))+
geom_bar(stat = "identity", position = "dodge")+
labs(fill = "Gender")+
xlab("Institute")+
ylab("Percent")+
scale_fill_discrete(labels = c("Women","Men"))+
ggtitle("Overall Gender Composition 2019")
But I want the plot to also show 2016 as faded bars, grouped the same way as 2019, so there are 4 bars for each Institute.
Since the column names are the same in all of my dataframes, I can't just use rbind() or similar, because the combined data no longer records which dataframe each row came from.
Add a year column to each data frame, then combine and melt. ggplot2 works best when everything is in one data.frame.
all_melted <- reshape2::melt(
rbind(cbind(df2019, year=2019), cbind(df2016, year=2016)),
id=c("year", "Institute"))
Then you can plot with something like this, mapping year to alpha to make "faded" bars
ggplot(all_melted, aes(x = Institute, y = value, fill = variable, alpha=factor(year)))+
geom_col(position = "dodge")+
labs(fill = "Gender")+
xlab("Institute")+
ylab("Percent")+
scale_alpha_discrete(range=c(.4, 1), name="Year") +
ggtitle("Overall Gender Composition")
I want to create in R a plot which contains side by side bars and line charts as follows:
I tried:
Total <- c(584,605,664,711,759,795,863,954,1008,1061,1117,1150)
Infected <- c(366,359,388,402,427,422,462,524,570,560,578,577)
Recovered <- c(212,240,269,301,320,359,385,413,421,483,516,548)
Death <- c(6,6,7,8,12,14,16,17,17,18,23,25)
day <- itemizeDates(startDate="01.04.20", endDate="12.04.20")
df <- data.frame(Day=day, Infected=Infected, Recovered=Recovered, Death=Death, Total=Total)
value_matrix = matrix(, nrow = 2, ncol = 12)
value_matrix[1,] = df$Recovered
value_matrix[2,] = df$Death
plot(c(1:12), df$Total, ylim=c(0,1200), xlim=c(1,12), type = "b", col="peachpuff", xaxt="n", xlab = "", ylab = "")
points(c(1:12), df$Infected, type = "b", col="red")
barplot(value_matrix, beside = TRUE, col = c("green", "black"), width = 0.35, add = TRUE)
But the bar chart does not fit the line chart. I guess it would be easier to use ggplot2, but don't know how. Could anyone help me? Thanks a lot in advance!
With ggplot2, the margins are handled nicely for you, but you'll need the data in two separate long forms. Reshape from wide to long with tidyr::gather, tidyr::pivot_longer, reshape2::melt, reshape, or whatever you prefer.
library(tidyr)
library(ggplot2)
df <- data.frame(
Total = c(584,605,664,711,759,795,863,954,1008,1061,1117,1150),
Infected = c(366,359,388,402,427,422,462,524,570,560,578,577),
Recovered = c(212,240,269,301,320,359,385,413,421,483,516,548),
Death = c(6,6,7,8,12,14,16,17,17,18,23,25),
day = seq(as.Date("2020-04-01"), as.Date("2020-04-12"), by = 'day')
)
ggplot(
tidyr::gather(df, Population, count, Total:Infected),
aes(day, count, color = Population, fill = Population)
) +
geom_line() +
geom_point() +
geom_col(
data = tidyr::gather(df, Population, count, Recovered:Death),
position = 'dodge', show.legend = FALSE
)
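tidyr documents gather() as superseded by pivot_longer(). A sketch of the same two reshapes and the same plot with pivot_longer(), assuming tidyr 1.0.0 or later; lines_long and bars_long are names introduced here for illustration.

```r
library(tidyr)
library(ggplot2)

df <- data.frame(
  Total = c(584, 605, 664, 711, 759, 795, 863, 954, 1008, 1061, 1117, 1150),
  Infected = c(366, 359, 388, 402, 427, 422, 462, 524, 570, 560, 578, 577),
  Recovered = c(212, 240, 269, 301, 320, 359, 385, 413, 421, 483, 516, 548),
  Death = c(6, 6, 7, 8, 12, 14, 16, 17, 17, 18, 23, 25),
  day = seq(as.Date("2020-04-01"), as.Date("2020-04-12"), by = "day")
)

# Long form for the two line series
lines_long <- pivot_longer(df, c(Total, Infected),
                           names_to = "Population", values_to = "count")

# Long form for the two bar series
bars_long <- pivot_longer(df, c(Recovered, Death),
                          names_to = "Population", values_to = "count")

p <- ggplot(lines_long, aes(day, count, color = Population, fill = Population)) +
  geom_line() +
  geom_point() +
  geom_col(data = bars_long, position = "dodge", show.legend = FALSE)
```

Printing p should give the same kind of chart as the gather() version; only the reshaping calls differ.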
Another way to do it is to gather twice before plotting. Not sure if this is easier or harder to understand, but you get the same thing.
df %>%
tidyr::gather(Population, count, Total:Infected) %>%
tidyr::gather(Resolution, count2, Recovered:Death) %>%
ggplot(aes(x = day, y = count, color = Population)) +
geom_line() +
geom_point() +
geom_col(
aes(y = count2, color = Resolution, fill = Resolution),
position = 'dodge', show.legend = FALSE
)
You can actually plot the lines and points without reshaping by making separate calls for each, but to dodge bars (or get legends), you'll definitely need to reshape.
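For the lines-and-points part, a minimal sketch of the "separate calls" approach on the wide data could look like this. The colours are borrowed from the base-graphics attempt in the question, and no legend is produced because nothing is mapped to colour inside aes().

```r
library(ggplot2)

df <- data.frame(
  Total = c(584, 605, 664, 711, 759, 795, 863, 954, 1008, 1061, 1117, 1150),
  Infected = c(366, 359, 388, 402, 427, 422, 462, 524, 570, 560, 578, 577),
  day = seq(as.Date("2020-04-01"), as.Date("2020-04-12"), by = "day")
)

# One geom call per wide column; each layer gets its own y aesthetic
p <- ggplot(df, aes(x = day)) +
  geom_line(aes(y = Total), color = "peachpuff") +
  geom_point(aes(y = Total), color = "peachpuff") +
  geom_line(aes(y = Infected), color = "red") +
  geom_point(aes(y = Infected), color = "red") +
  labs(y = "count")
```

This stays manageable for two series, but every extra series needs two more layers, which is one reason the long format scales better.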
Good morning all,
I am working on data that I would like to show as a bar graph, with each bar split in two for my two departments. I generated a dataframe that looks like this:
test <- data.frame(type_transport = sample(c("ON FOOT", "CAR", "TRANSPORT COMMON"), 5000, replace = TRUE), type_route = sample(c("N", "D", "A", "VC"), 5000, replace = TRUE), department = sample(c("department1", "department2"), 5000, replace = TRUE), troncon_km = sample(x = 0:17, 5000, replace = TRUE))
With the following call, I get a bar graph:
ggplot(test, aes(x = type_route, y = troncon_km, fill = department)) + geom_bar(stat = "identity")
https://zupimages.net/viewer.php?id=20/19/vt1s.png
Now I would like to split each bar in two, to display the data side by side for my two departments. For this, I use position = "dodge":
ggplot(test, aes(x = type_route, y = troncon_km, fill = department)) + geom_bar(stat = "identity", position = "dodge")
But there is a problem. The Y scale is far too small compared to reality (we go from several thousand on the first graph to 15 on the second). I obviously missed something ...
https://zupimages.net/viewer.php?id=20/19/sbh5.png
I do not understand.
Thank you.
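All bars have the same height because geom_bar(stat = "identity") draws one bar per observation, with a height equal to that observation's value. By default those bars are stacked, so each column shows the sum. With position = "dodge", the bars for a given route type and department are drawn on top of each other instead, so you only see the tallest one. Since every category in both departments has at least one observation of 17, every bar shows that value.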
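To see the difference in numbers rather than pictures, this sketch compares the per-group sum (what the stacked bars add up to) with the per-group maximum (what the overplotted dodged bars show). It assumes the column names type_route, department and troncon_km; the set.seed() call and the names totals, tallest and comparison are added here for illustration.

```r
set.seed(1)
test <- data.frame(
  type_route = sample(c("N", "D", "A", "VC"), 5000, replace = TRUE),
  department = sample(c("department1", "department2"), 5000, replace = TRUE),
  troncon_km = sample(0:17, 5000, replace = TRUE)
)

# Height of each stacked segment: the sum of all observations in the group
totals <- aggregate(troncon_km ~ type_route + department, data = test, FUN = sum)

# Visible height with position = "dodge": the largest single observation
tallest <- aggregate(troncon_km ~ type_route + department, data = test, FUN = max)

comparison <- merge(totals, tallest,
                    by = c("type_route", "department"),
                    suffixes = c("_sum", "_max"))
print(comparison)
```

The _sum column is in the thousands while the _max column can never exceed 17, which matches the jump in the Y scale between the two plots.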
There are several ways to move forward:
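1. Summarise inside the plot with stat_summary:
ggplot(test, aes(type_route, troncon_km, fill = department)) +
stat_summary(geom = "bar", position = "dodge", fun = sum)
The fun argument (called fun.y before ggplot2 3.3.0) can be any other summary function (e.g. mean or median).
2. Summarise the data first, then plot it: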
library("tidyverse")
total_km <- test %>%
group_by(department, type_route) %>%
summarise(total_km = sum(troncon_km))
ggplot(total_km, aes(type_route, total_km, fill = department)) +
geom_bar(stat = "identity", position = "dodge")
Again, you can change the sum() function within summarise() to your liking.
Or, using the same data frame total_km, a little bit shorter with geom_col:
ggplot(total_km, aes(type_route, total_km, fill = department)) +
geom_col(position = "dodge")
Hope this helps.
Using stat_summary(geom = "bar") + stat_summary(geom = "errorbar") does not seem to work with position_dodge() when the x values have varying numbers of condition groups.
I am trying to make what should be a straightforward barplot with ggplot2. My data has a number of different samples (x variable), and some of these samples also have a fill (condition) variable ("Scr" or "shRNA") while others don't (condition = NA). When I attempt to plot these data using the stat_summary wrappers to make bar plots with error bars, the dodging of the error bars only works on samples that do not have different fill groups. The stat_summary(geom = "bar") layer seems to be functional, because the separate bars do show up, but their error bars are not aligned.
test <- data.frame(Sample = c(rep("A",6),rep("B",3)),
Target = c(rep("GENE1",9)),
val = c(1.1,1.2,1.15,.5,.6,.7,.95,1,1.05),
condition = c(rep("Scr",3),rep("shRNA",3),rep(NA,3)))
g <- ggplot(data = test, aes(x = Sample, y = val, fill = condition)) +
stat_summary(geom = "bar", fun = mean, position = position_dodge2(width = .5, preserve = "single"), color = "black", width = .8) +
stat_summary(geom = "errorbar", fun.data = mean_se, position = position_dodge2(width = .2, preserve = "single"), width = .2) +
scale_y_continuous(expand = expansion(mult = c(0, .2)))
#scale_fill_discrete(guide=guide_legend(title="",nrow=2))
I expect the position_dodge() argument in both stat_summary()'s to align error bars to the correct x position, regardless of whether or not that particular sample has one or two fill groups.
I'm a bit confused about what you're trying to do. Why not use geom_col/geom_bar instead of stat_summary? I always prefer keeping data manipulation/summarisation and plotting separate.
This is what I'd do
library(tidyverse)
test %>%
group_by(Sample, condition) %>%
summarise(val.mean = mean(val), val.sd = sd(val)) %>%
ggplot(aes(Sample, val.mean, fill = condition)) +
geom_col(position = position_dodge(width = 0.8)) +
geom_errorbar(
aes(ymin = val.mean - val.sd, ymax = val.mean + val.sd),
position = position_dodge(width = 0.8),
width = 0.2)
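Note that the code above draws the mean plus or minus one standard deviation, while mean_se in the question draws the mean plus or minus one standard error. If standard-error bars are wanted, a sketch along the same lines follows. val.se and test_summary are names introduced here, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(ggplot2)

test <- data.frame(Sample = c(rep("A", 6), rep("B", 3)),
                   Target = rep("GENE1", 9),
                   val = c(1.1, 1.2, 1.15, .5, .6, .7, .95, 1, 1.05),
                   condition = c(rep("Scr", 3), rep("shRNA", 3), rep(NA, 3)))

# Standard error of the mean = sd / sqrt(n), computed per bar
test_summary <- test %>%
  group_by(Sample, condition) %>%
  summarise(val.mean = mean(val),
            val.se = sd(val) / sqrt(n()),
            .groups = "drop")

# Bars and error bars share the same dodge width so they line up
p <- ggplot(test_summary, aes(Sample, val.mean, fill = condition)) +
  geom_col(position = position_dodge(width = 0.8)) +
  geom_errorbar(
    aes(ymin = val.mean - val.se, ymax = val.mean + val.se),
    position = position_dodge(width = 0.8),
    width = 0.2)
```

Because the summary is an ordinary data frame, it is easy to inspect test_summary and confirm the bar heights and error limits before plotting.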
I'm a little bit stuck with ggplot2, trying to plot several data frames in one plot.
I have several data frames; here I'll present just two examples.
The data frames have the same header but different values. Let's say that I want to count the balls that I have in 2 boxes.
name=c('red','blue','green','purple','white','black')
value1=c(2,3,4,2,6,8)
value2=c(1,5,7,3,4,2)
test1=data.frame("Color"=name,"Count"=value1)
test2=data.frame("Color"=name,"Count"=value2)
What I'm trying to do is make a bar plot of my counts.
At the moment, what I have is:
(plot_test=ggplot(NULL, aes(x= Color, y=Count)) +
geom_bar(data=test1,stat = "identity",color='green')+
geom_bar(data=test2,stat = "identity",color='blue')
)
I want to have x = Color and y = Count, with the bars from test2 next to the bars from test1. Right now they overlap each other. So each name on the x axis should get two bars, drawn in a different colour per data frame, with the data frame names in the legend.
For example "Green bar" = test1
"Blue bar" = test2
Thank you for your time and your help.
Best regards
You have two options here:
Either tweak the size and position of the bars
ggplot(NULL, aes(x= Color, y=Count)) +
geom_bar(data=test1, aes(color='test1'), stat = "identity",
width=.4, position=position_nudge(x = -0.2)) +
geom_bar(data=test2, aes(color='test2'), stat = "identity",
width=.4, position=position_nudge(x = 0.2))
or, what I recommend, join the two data frames together and then plot
library(dplyr)
test1 %>%
full_join(test2, by = 'Color') %>%
reshape2::melt(id.vars = 'Color') %>%
ggplot(aes(x= Color, y=value, fill = variable)) +
geom_bar(stat = "identity", position = 'dodge')
Try this:
name=c('red','blue','green','purple','white','black')
value1=c(2,3,4,2,6,8)
value2=c(1,5,7,3,4,2)
test1=data.frame("Color"=name,"Count"=value1)
test2=data.frame("Color"=name,"Count"=value2)
test1$var <- 'test1'
test2$var <- 'test2'
test_all <- rbind(test1,test2)
(plot_test=ggplot(data=test_all) +
geom_bar(aes(x=Color,y=Count,color=var),
stat = "identity", position=position_dodge(1))+
scale_color_manual(values = c('green', 'blue'))
)
This will do what you were trying to do:
balls <- data.frame(
count = c(c(2,3,4,2,6,8),c(1,5,7,3,4,2)),
colour = c(c('red','blue','green','purple','white','black'),c('red','blue','green','purple','white','black')),
box = c(rep("1", times = 6), rep("2", times = 6))
)
ggplot(balls, aes(x = colour, y = count, fill = box)) +
geom_col(position = "dodge") +
scale_fill_manual(values = c("green","blue"))
This is better because it facilitates comparisons between the box counts:
ggplot(balls, aes(x = colour, y = count)) +
geom_col() +
facet_wrap(~ box, ncol = 1, labeller = as_labeller(c("1" = "Box #1", "2" = "Box #2")))