stacked barplot with aggregated data (ggplot2) - r

I have some major problems with ggplot2. Even though this might be a very easy question for you, I haven't managed to get it right yet (I have read a ggplot2 book and searched Stack Overflow as well).
Originally there was a dataset consisting of a factor variable (country) and a dichotomous variable. Unfortunately I don't have my data in this extended format: I have two variables, "var1" and "var2". var1 gives the number of cases in the original dataset where a certain condition is true, and var2 gives the number of cases where the same condition is false:
country | var1 | var2
----------------------
"A" | 20 | 30
"B" | 24 | 53
"C" | 21 | 24
"D" | 30 | 66
Now I'd like to produce a stacked barplot with a y-axis showing percentages within each country (all bars should have the same height) plus the absolute numbers displayed within the bars:
How it should look
I found out that if the data were in an extended format, I could use
ggplot(data=dataset)+geom_bar(aes(x=country, fill=variable), position='fill')
However, I only have aggregated data.
Could anyone please help me?
Thank you!

A quick solution: first reshape to long format, then plot (temp is the aggregated data frame from the question):
library(ggplot2)
library(tidyr)
# one row per country/variable pair
temp2 <- gather(temp, var, val, -country)
ggplot(temp2, aes(country, val, fill = var)) +
  geom_col(position = 'fill') +
  geom_text(aes(label = val), position = position_fill(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent_format())
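The snippet above relies on a data frame named temp that is never defined. As a minimal, self-contained sketch, the version below rebuilds the question's table under that name and reshapes it with tidyr::pivot_longer(), which the tidyr documentation recommends over gather() for new code. The object names temp2 and p are only illustrative.

```r
library(ggplot2)
library(tidyr)

# Aggregated counts from the question (the name `temp` is an assumption)
temp <- data.frame(
  country = c("A", "B", "C", "D"),
  var1 = c(20, 24, 21, 30),
  var2 = c(30, 53, 24, 66)
)

# Long format: one row per country/variable pair
temp2 <- pivot_longer(temp, cols = -country, names_to = "var", values_to = "val")

# position = "fill" rescales every bar to height 1; the labels keep the raw counts
p <- ggplot(temp2, aes(country, val, fill = var)) +
  geom_col(position = "fill") +
  geom_text(aes(label = val), position = position_fill(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent_format())
```

Printing p draws the chart.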

library('ggplot2')
library('data.table')
df1 <- fread('country var1 var2
"A" 20 30
"B" 24 53
"C" 21 24
"D" 30 66')
df1 <- melt(df1, id.vars = 'country')
# percentage of each variable within its country
df1[, percent := prop.table(value)*100, by = 'country']
ggplot(data = df1, aes( x = country, y = percent, fill = variable, label = value )) +
  geom_bar(stat = 'identity') +
  geom_text(size = 5, position = position_stack(vjust = 0.5))
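To make the percentage step concrete, here is a small base-R sketch of the same within-country computation that prop.table() performs in the data.table call above. The data frame long and the use of ave() are illustrative choices, not part of the original answer.

```r
# Aggregated counts from the question
df1 <- data.frame(
  country = c("A", "B", "C", "D"),
  var1 = c(20, 24, 21, 30),
  var2 = c(30, 53, 24, 66)
)

# Hand-built long format: one row per country/variable pair
long <- data.frame(
  country = rep(df1$country, times = 2),
  variable = rep(c("var1", "var2"), each = nrow(df1)),
  value = c(df1$var1, df1$var2)
)

# Share of each value within its country, in percent
long$percent <- ave(long$value, long$country, FUN = function(v) 100 * v / sum(v))
print(long)
```

For country A this gives 20 / (20 + 30) = 40% for var1 and 60% for var2, so every stacked bar reaches 100.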

Related

R ggplot geom_point overlay from 2 data frames, differentiated by color, subset by id

I have two data frames with identical rows and columns, DataMaster and IMPSAVG, for which I'm trying to create a series of combined overlaid 2d scatterplots (subset by country "ids" and variable columns) with observations from the two data sets differentiated by color in ggplot. The code below does not work, but gives a sense of what I'm aiming for (acctm is the variable and ARG is the country in this example).
ggplot() +
geom_point(data=DataMaster, aes(x="Year", y="acctm"), subset = .(Country %in% c("ARG")), shape=21, color= "red") +
geom_point(data=IMPSAVG, aes(x="Year", y="acctm"), subset = .(Country %in% c("ARG")), shape=21, color= "blue")
While just getting the above to work would be much appreciated, a loop to create separate plots of this variable for all unique country ids in the column Country found in both datasets (also specified by the vector CountryList$Country) would be amazing. Thanks!
Without a reproducible example of your dataset, it is hard to be sure what you are looking for.
However, using these fake datasets:
df1 <- data.frame(Country = c("A","A","A","B","B"),
Year = 2010:2014,
Value = sample(1:100,5))
df2 <- data.frame(Country = c("A","A","A","B","B"),
Year = 2010:2014,
Value = sample(1:100,5))
1) Plotting without joining datasets (not the most appropriate)
You don't strictly have to combine your dataframes to plot them, but keeping them separate makes things a little harder (especially if you want to customize several parameters).
Here you can do:
library(ggplot2)
ggplot()+
geom_point(data = df1, aes(x = Year, y = Value, color = "blue"), shape = 21)+
geom_point(data = df2, aes(x = Year, y = Value, color = "red"), shape = 21, show.legend = TRUE)+
scale_color_manual(values = c("blue","red"), labels = c("df1","df2"), name = "")
2) Assembling both dataframes (best way to do it)
However, it will be much easier if you combine both dataframes (ggplot2 is designed to work with dataframes in a longer format).
So, here, you can do:
df1$Dataset = "DF1"
df2$Dataset = "DF2"
DF <- rbind(df1,df2)
Country Year Value Dataset
1 A 2010 66 DF1
2 A 2011 64 DF1
3 A 2012 40 DF1
4 B 2013 58 DF1
5 B 2014 20 DF1
6 A 2010 78 DF2
7 A 2011 25 DF2
8 A 2012 71 DF2
9 B 2013 40 DF2
10 B 2014 61 DF2
Now, you can simply plot it like this which is much more concise:
library(ggplot2)
ggplot(DF, aes(x = Year, y = Value, color = Dataset))+
geom_point(shape = 21)
3) Subsetting dataframe
To plot only a subset of your dataframes, starting with the assembled dataframe DF, you can simply do:
library(ggplot2)
ggplot(subset(DF, Country =="A"), aes(x = Year, y = Value, color = Dataset))+
geom_point(shape = 21)
Does this answer your question?
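The question also asked for a loop producing one plot per country, which the answer above does not show. A minimal sketch, assuming the combined data frame DF from step 2 (the values are copied from the printed output; the names countries and plots are illustrative):

```r
library(ggplot2)

# Combined data frame as printed in step 2
df1 <- data.frame(Country = c("A", "A", "A", "B", "B"),
                  Year = 2010:2014,
                  Value = c(66, 64, 40, 58, 20))
df2 <- data.frame(Country = c("A", "A", "A", "B", "B"),
                  Year = 2010:2014,
                  Value = c(78, 25, 71, 40, 61))
df1$Dataset <- "DF1"
df2$Dataset <- "DF2"
DF <- rbind(df1, df2)

# One plot per country, collected in a named list
countries <- unique(DF$Country)
plots <- lapply(countries, function(cc) {
  ggplot(subset(DF, Country == cc), aes(x = Year, y = Value, color = Dataset)) +
    geom_point(shape = 21) +
    ggtitle(cc)
})
names(plots) <- countries
```

print(plots[["A"]]) then draws the plot for country A. If all countries may share one figure, adding facet_wrap(~ Country) to a single plot of DF avoids the loop altogether.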
I think you need to create a new dataframe, which combines those two dataframes and subsets the countries that you are interested in. You can use rbind for combining the two, and also you should add a column for samples indicating which dataframe they are coming from, so that you can use it later in aes(..., color = new_column).
Just to add to dc37's excellent write-up, here is a trick to have one dataset plotted on top of the other:
ggplot(subset(DF, Country == "A"), aes(x = Year, y = Value, color = Dataset)) +
  geom_point(shape = 21, na.rm = TRUE) +
  # a second layer is drawn after, and therefore over, the first one
  geom_point(data = subset(DF, Dataset == "DF1" & Country == "A"),
             aes(x = Year, y = Value, color = Dataset), shape = 21, na.rm = TRUE)
where "DF1" is the dataset you want plotted on top.

Stacked barchart, independent fill order for each stack

I'm facing a behaviour of ggplot2 involving ordering and stacked bar plots that I cannot understand. I've read some questions about it (here, here and so on), but unfortunately I cannot find a solution that suits me. Maybe the answer is easy and I just cannot see it. I hope it's not a dupe.
My main goal is to have each stack ordered independently, based on the ordering column (called here ordering).
Here I have some data:
library(dplyr)
library(ggplot2)
dats <- data.frame(id = c(1,1,1,2,2,3,3,3,3),
value = c(9,6,4,5,6,4,3,4,5),
ordering = c(1,2,3,2,3,1,3,2,4),
filling = c('a','b','c','b','a','a','c','d','b')) %>% arrange(id,ordering)
So there is an id, a value, a value to order by (ordering), and a filling; the rows are already sorted as they should appear in the plot, following the ordering column.
I tried to plot it as a stacked bar chart with id on the x axis and value on the y axis, filled by filling, where the fill order within each bar follows ordering, i.e. with the biggest value of ordering at the bottom of each column. The fill order should match the dataset, so each column has its own independent order.
As you can imagine those are fake data, so the number of id can vary.
id value ordering filling
1 1 9 1 a
2 1 6 2 b
3 1 4 3 c
4 2 5 2 b
5 2 6 3 a
6 3 4 1 a
7 3 4 2 d
8 3 3 3 c
9 3 5 4 b
When I plot them, there is something I do not understand:
library(dplyr)
dats$filling <- reorder(dats$filling, -dats$ordering)
ggplot(dats,aes(x = id,
y = value,
fill = filling)) +
geom_bar(stat = "identity",position = "stack") +
guides(fill=guide_legend("ordering"))
The second and the third id are not properly ordered, I should have the order of the original dataset.
If you use separate geom_bars, you can make the orders different.
dats %>%
ggplot(aes(x = id, y = value, fill = reorder(filling,-ordering))) +
geom_bar(stat = "identity", position = "stack", data = dats %>% filter(id == 1)) +
geom_bar(stat = "identity", position = "stack", data = dats %>% filter(id == 2)) +
geom_bar(stat = "identity", position = "stack", data = dats %>% filter(id == 3)) +
guides(fill=guide_legend("ordering"))
More generally:
library(purrr)  # provides map()
bars <- map(unique(dats$id),
            ~ geom_bar(stat = "identity", position = "stack",
                       data = dats %>% filter(id == .x)))
dats %>%
ggplot(aes(x = id, y = value, fill = reorder(filling,-ordering))) +
bars +
guides(fill=guide_legend("ordering"))
The problem is that, in your case, different bars should use the same values (levels) of filling in a different order. This conflicts with the way ggplot works: taking the factor levels (which already have a certain order) and applying them in the same way for each bar.
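A quick way to see this is to inspect the single level order that reorder() produces for the whole data set. The sketch below uses the question's dats; reorder() ranks each level by the mean of -ordering across all ids, so only one global order results. The names scores and global_levels are illustrative.

```r
dats <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  value = c(9, 6, 4, 5, 6, 4, 3, 4, 5),
  ordering = c(1, 2, 3, 2, 3, 1, 3, 2, 4),
  filling = c("a", "b", "c", "b", "a", "a", "c", "d", "b")
)

# Mean of -ordering per fill level, computed over all ids together
scores <- tapply(-dats$ordering, dats$filling, mean)

# reorder() sorts the levels by these scores in increasing order
global_levels <- levels(reorder(dats$filling, -dats$ordering))

print(round(scores, 2))
print(global_levels)
```

The levels come out as c, b, d, a, and that one order is shared by every bar, while the desired order differs between ids: level a has ordering 1 in ids 1 and 3 but ordering 3 in id 2.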
A workaround then is... To create many factor levels.
ggplot(dats, aes(x = id, y = value, fill = interaction(-ordering, id))) +
geom_bar(stat = "identity", position = "stack")
This one now is too "generous" by being too detailed. However, what we can do now is to deal with the legend and the different colors:
dats <- arrange(dats, id, -ordering)
aux <- with(dats, match(sort(unique(filling)), filling))
ggplot(dats, aes(x = id, y = value, fill = interaction(-ordering, id))) +
geom_bar(stat = "identity", position = "stack") +
scale_fill_manual("Ordering", values = scales::hue_pal()(4)[dats$filling],
labels = with(dats, filling[aux]),
breaks = with(dats, interaction(-ordering, id)[aux]))
Here I first rearrange the rows of dats as to avoid doing that later. Then aux is an auxiliary vector
aux
# [1] 3 2 1 8
giving arbitrary positions (one for each) where levels a, b, c, and d (in this order) appear in dats, which again is useful later. Then I simply set corresponding scale values, labels, and breaks... Lastly, I use scales::hue_pal to recover the original color palette.
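The aux step can be reproduced in isolation. The sketch below uses base R's order() in place of dplyr::arrange() (a substitution made here only to keep it dependency-free) and shows that match() returns the first row at which each fill level appears.

```r
dats <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  value = c(9, 6, 4, 5, 6, 4, 3, 4, 5),
  ordering = c(1, 2, 3, 2, 3, 1, 3, 2, 4),
  filling = c("a", "b", "c", "b", "a", "a", "c", "d", "b")
)

# Same row order as arrange(dats, id, -ordering)
dats <- dats[order(dats$id, -dats$ordering), ]
rownames(dats) <- NULL

# First position of each distinct fill level (a, b, c, d) in the sorted rows
aux <- match(sort(unique(dats$filling)), dats$filling)

print(aux)
print(dats$filling[aux])
```

Each entry of aux therefore picks one representative row per fill level, which is all the legend needs: one break and one label for each of a, b, c, and d.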
The problem here is that the element filling = d only appears in the third group with a low value. One solution, could be to fill non-present values with 0:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(ggplot2)
dats <- data.frame(id = c(1,1,1,1,2,2,2,2,3,3,3,3),
value = c(9,6,4,0,5,6,0,0,4,3,4,5),
ordering = c(1,2,3,5,2,3,5,5,1,3,2,4),
filling = c('a','b','c','d','b','a','c','d','a','c','d','b')) %>% arrange(id,ordering)
ggplot(dats,aes(x = id,
y = value,
fill = reorder(filling,-ordering))) +
geom_bar(stat = "identity",position = "stack") +
guides(fill=guide_legend("ordering"))
Created on 2018-12-03 by the reprex package (v0.2.1)

Order multiple geom_bar in ggplot2 bargraph

I am working on a bar graph that shows counts of cats and dogs that differ across countries. Cats and dogs are levels stored in different factors/ variables. I want to plot the bars for each animal count on top of the other (i.e. 2 layers), and then I want to order the bars from the tallest (i.e. highest count) to lowest according to animal frequency per country.
Here is what I did:
Order the data table according to animal counts per country
plot <- within(plot, country <- factor(country,
levels=names(sort(table(country), decreasing=TRUE))))
Plot the graph
gg <- ggplot(data = plot, aes(x=country))
Add bar for dogs
dogs <- gg +
geom_bar(data = plot[plot$animal1 == 'dog',], #select dogs from animal1 variable
stat="count")
If I do that, I get this (with one geom_bar):
So far, so good. Next, I add the second geom_bar for the cats:
dogs_cats <- gg +
geom_bar(data = plot[plot$animal1 == 'dog',], #select dogs from animal1 variable
stat="count") +
geom_bar(data = plot[plot$animal2 == 'cat',], #select cats from animal2 variable
stat="count")
Now the order is changed and off-key (after the second geom_bar):
How can I maintain the order of the bars to follow the initial geom_bar?
Many thanks!
I suggest summarizing the counts into a new data frame:
1. Sum up (ddply and melt)
require(plyr) #ddply
require(reshape2) # melt
df = ddply(plot, "country", summarize, dogs = sum(animal1 == "dog"),
cats = sum(animal2 == "cat"))
dogs_and_cats = melt(df, id = "country")
You should now have a new data frame with 3 columns:
country
variable: "dogs" or "cats"
value: number of dogs/cats (per country)
2. Plot
ggplot(dogs_and_cats , aes(x = reorder(country, -value), y = value, fill = variable)) +
geom_bar(stat = "identity", position = "dodge")
3. Example:
Since the question has no reproducible example, here is one with the diamonds dataset:
df = ddply(diamonds, "cut", summarize, J = sum(color == "J"),
D = sum(color == "D"))
plot = melt(df, id = "cut")
ggplot(plot, aes(x = reorder(cut, -value), y = value, fill = variable)) +
geom_bar(stat = "identity", position = "dodge")
Hmm, I ran code like yours, but the order of the bars didn't change.
Perhaps you made a simple mistake somewhere.
library(ggplot2)
# make sample data
set.seed(1); d <- data.frame(animal1 = sample(c("dog", "other"), replace=TRUE, 10000, prob=c(0.7,0.3)),
animal2 = sample(c("cat", "other"), replace=TRUE, 10000, prob=c(0.3,0.7)),
country = sample(LETTERS[1:15], replace=TRUE, 10000, prob=runif(15,0,1)),
stringsAsFactors = TRUE) # needed in R >= 4.0 so that country is a factor
levels(d$country) # [1] "A" "B" "C" "D" ...
plot <- within(d, country <- factor(country, levels=names(sort(table(country), decreasing=TRUE))))
levels(plot$country) # [1] "N" "O" "L" "F" ...
gg <- ggplot(data = plot, aes(x=country))
dogs <- gg + geom_bar(data = plot[plot$animal1 == "dog",], stat="count", fill="darkblue")
dogs_cats <- gg +
geom_bar(data = plot[plot$animal1 == "dog",], stat="count", fill="darkblue") +
geom_bar(data = plot[plot$animal2 == "cat",], stat="count", fill="blue")
print(dogs)
print(dogs_cats) # I made below img using library(grid) to form two graphs.
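The level-ordering idiom used in both the question and this answer can be checked on a tiny vector. The values below are made up for illustration, and the names counts and country_f are hypothetical.

```r
# Toy data: A appears once, B twice, C three times
country <- c("A", "B", "B", "C", "C", "C")

# Frequency table sorted from most to least frequent
counts <- sort(table(country), decreasing = TRUE)

# Factor whose levels follow that frequency order
country_f <- factor(country, levels = names(counts))

print(levels(country_f))
```

Because ggplot2 places discrete x values in level order, the most frequent country is drawn first.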

ggplot2: create ordered group bar plot - (use reorder)

I want to create a grouped bar plot while keeping the order. If it were a single column and not a grouped bar plot, using the reorder function would be obvious, but I'm not sure how to use it on a melted data.frame.
Here is a detailed explanation with a code example:
Let's say we have the following data.frame:
d.nfl <- data.frame(Team1=c("Vikings", "Chicago", "GreenBay", "Detroit"), Win=c(20, 13, 9, 12))
Plotting a simple bar plot while flipping it:
ggplot(d.nfl, aes(x = Team1, y=Win)) + geom_bar(aes(fill=Team1), stat="identity") + coord_flip()
The above plot is not ordered; if I want to order it by Win, I can do the following:
d.nfl$orderedTeam <- reorder(d.nfl$Team1, d.nfl$Win)
ggplot(d.nfl, aes(x = orderedTeam, y=Win)) + geom_bar(aes(fill=orderedTeam), stat="identity") + coord_flip()
Now let's say we add another column (to the original data frame):
d.nfl$points <- c(12, 3, 45, 5)
Team1 Win points
1 Vikings 20 12
2 Chicago 13 3
3 GreenBay 9 45
4 Detroit 12 5
To generate a grouped bar plot, first we need to melt it:
library(reshape2)
d.nfl.melt <- melt(d.nfl[,c('Team1','Win','points')],id.vars = 1)
ggplot(d.nfl.melt,aes(x = Team1,y = value)) + geom_bar(aes(fill = variable),position = "dodge", stat="identity") + coord_flip()
The above plot is unordered.
How do I make an ordered grouped bar plot (in ascending order)?
This is a non-issue.
The easiest way is to not discard your ordered team in the melt:
d.nfl.melt <- melt(d.nfl,id.vars = c("Team1", "orderedTeam"))
Alternatively, we can use reorder after melting and just only use the Win elements in computing the ordering:
d.nfl.melt$ordered_after_melting = reorder(
d.nfl.melt$Team1,
X = d.nfl.melt$value * (d.nfl.melt$variable == "Win")
)
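To see why multiplying by (variable == "Win") works: the points rows contribute 0, so reorder(), which by default averages X within each level, ranks every team by Win / 2. A small sketch, building the melted frame by hand rather than with melt() to stay dependency-free (win_only is an illustrative name):

```r
d.nfl <- data.frame(
  Team1 = c("Vikings", "Chicago", "GreenBay", "Detroit"),
  Win = c(20, 13, 9, 12),
  points = c(12, 3, 45, 5)
)

# Hand-built equivalent of the melted data (same columns as melt() would give)
d.nfl.melt <- data.frame(
  Team1 = rep(d.nfl$Team1, times = 2),
  variable = rep(c("Win", "points"), each = nrow(d.nfl)),
  value = c(d.nfl$Win, d.nfl$points)
)

# points rows are zeroed out, so only Win drives the ordering
win_only <- d.nfl.melt$value * (d.nfl.melt$variable == "Win")
ordered_after_melting <- reorder(d.nfl.melt$Team1, win_only)

print(levels(ordered_after_melting))
```

The per-team means are 4.5, 6, 6.5 and 10, so the levels run GreenBay, Detroit, Chicago, Vikings, the same ascending-by-wins order as the unmelted orderedTeam.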
Yet another idea is to take the levels from the original ordered column and apply them to a melted factor:
d.nfl.melt$copied_levels = factor(
d.nfl.melt$Team1,
levels = levels(d.nfl$orderedTeam)
)
All three methods give the same result. (I left out the coord_flips because they don't add anything to the question, but you can of course add them back in.)
gridExtra::grid.arrange(
ggplot(d.nfl.melt,aes(x = orderedTeam, y = value)) +
geom_bar(aes(fill = variable),position = "dodge", stat="identity"),
ggplot(d.nfl.melt,aes(x = ordered_after_melting, y = value)) +
geom_bar(aes(fill = variable),position = "dodge", stat="identity"),
ggplot(d.nfl.melt,aes(x = copied_levels, y = value)) +
geom_bar(aes(fill = variable),position = "dodge", stat="identity")
)
As for the easiest, I would recommend just keeping the orderedTeam variable around while melting. Your code seems to work hard to leave it out, and it's quite easy to keep it in.
The challenge your question presents is how to reorder a factor Team1 based on a subset of the values in a melted column.
The comments to your question from @alistaire and @joran link to great answers.
The tl;dr answer is to just apply the ordering from your original, unmelted data.frame to the new one using levels().
library(reshape2)
#Picking up from your example code:
d.nfl.melt <- melt(d.nfl[,c('Team1','Win','points')],id.vars = 1)
levels(d.nfl.melt$Team1)
#Current order is alphabetical
#[1] "Chicago" "Detroit" "GreenBay" "Vikings"
#Reorder based on Wins (using the same order from your earlier, unmelted data.frame)
d.nfl.melt$Team1 <- factor(d.nfl.melt$Team1, levels = levels(d.nfl$orderedTeam)) #SOLUTION
levels(d.nfl.melt$Team1)
#New order is ascending by wins
#[1] "GreenBay" "Detroit" "Chicago" "Vikings"
ggplot(d.nfl.melt,aes(x = Team1,y = value)) +
geom_bar(aes(fill = variable),position = "dodge", stat="identity") + coord_flip()

Sort stacked bar plot by cumulative value in R

I am pretty new to R and I'm trying to get a stacked bar plot. My data looks like this:
name value1 value2
1 A 1118 239
2 B 647 31
3 C 316 1275
4 D 2064 230
5 E 231 85
I need a horizontal bar graph with stacked values. This is as far as I can get with my limited R skills (and most of that is also copy-pasted):
melted <- melt(data, id.vars=c("name"))
melted$name <- factor(
melted$name,
levels=rev(sort(unique(melted$name))),
ordered=TRUE
)
melted2 <- melted[order(melted$value),]
ggplot(melted2, aes(x= name, y = value, fill = variable)) +
geom_bar(stat = "identity") +
coord_flip()
It took me several hours just to get to this point, with which I am pretty content as far as looks go. This is the produced output:
What I now want to do is to get the bars ordered by summed up value (D is first, followed by C, A, B, E). I googled and tried some reorder and order stuff, but I simply can't get it to behave like I want it to. I'm sure the solution has to be pretty simple, so I hope you guys can help me with this.
Thanks in advance!
Well, I am not keeping up with all the latest changes in ggplot, but here is one way you could remedy this.
I used your idea of setting up the factor levels of name, but based on the grouped sums. You might also find order = variable useful at some point, which will order the bar colors based on the variable, but it is not needed here.
data <- read.table(header = TRUE, text = "name value1 value2
1 A 1118 239
2 B 647 31
3 C 316 1275
4 D 2064 230
5 E 231 85")
library('reshape2')
library('ggplot2')
melted <- melt(data, id.vars=c("name"))
melted <- within(melted, {
name <- factor(name, levels = names(sort(tapply(value, name, sum))))
})
levels(melted$name)
# [1] "E" "B" "A" "C" "D"
ggplot(melted, aes(x= name, y = value, fill = variable, order = variable)) +
geom_bar(stat = "identity") +
coord_flip()
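The level computation can also be checked without plotting. The sketch below builds the long data by hand and uses reorder() with FUN = sum as a shorter base-R route to the same levels; that substitution, and the names long and totals, are choices made here rather than part of the answer. The order aesthetic is left out because it was deprecated in ggplot2 2.0.0.

```r
data <- data.frame(
  name = c("A", "B", "C", "D", "E"),
  value1 = c(1118, 647, 316, 2064, 231),
  value2 = c(239, 31, 1275, 230, 85)
)

# Hand-built long format, equivalent in content to melt(data, id.vars = "name")
long <- data.frame(
  name = rep(data$name, times = 2),
  variable = rep(c("value1", "value2"), each = nrow(data)),
  value = c(data$value1, data$value2)
)

# Total per name, which is what the bars should be sorted by
totals <- tapply(long$value, long$name, sum)

# Levels sorted by increasing total
long$name <- reorder(long$name, long$value, FUN = sum)

print(sort(totals))
print(levels(long$name))
```

The levels are E, B, A, C, D, matching the output above. With coord_flip() the last level is drawn at the top, so D ends up first as requested.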
Another option would be to use the dplyr package to set up a total column in your data frame and use that to sort.
The approach would look something like this.
library(dplyr)
m <- melted %>% group_by(name) %>%
mutate(total = sum(value) ) %>%
ungroup() %>%
arrange(total) %>%
mutate(name = factor(name, levels = unique(as.character(name))) )
ggplot(m, aes(x = name, y = value, fill = variable)) + geom_bar(stat = 'identity') + coord_flip()
Alternatively, try the code below, which uses the tidyr package instead of the reshape2 package:
library(ggplot2)
library(dplyr)
library(tidyr)
data <- read.table(text = "
class value1 value2
A 1118 239
B 647 31
C 316 1275
D 2064 230
E 231 85", header = TRUE)
pd <- gather(data, key, value, -class) %>%
mutate(class = factor(class, levels = tapply(value, class, sum) %>% sort %>% names))
pd %>% ggplot(aes(x = class, y = value, fill = key, order = class)) +
geom_bar(stat = "identity") +
coord_flip()
