Grouped bar plot R - r

I have this data frame:
TotalCost Vehicles Bikes
92 1 2
92 1 3
96 1 6
93 2 2
93 2 3
95 2 6
108 3 2
108 3 3
108 3 6
I would like to plot bars filled according to the 'Bikes' column, but this command:
ggplot(data, aes(Vehicles, TotalCost)) + geom_bar(aes(fill = Bikes), position = "dodge", stat="identity")
gives me this plot, without any color.
What am I doing wrong?

library("magrittr")
library("reshape2")
library("ggplot2")
rawdata = matrix(data = strsplit(split = ",", "92,1,2,92,1,3,96,1,6,93,2,2,93,2,3,95,2,6,108,3,2,108,3,3,108,3,6") %>% unlist %>% as.numeric,
ncol = 3, byrow = T)
colnames(rawdata) = c("TotalCost","Vehicles","Bikes")
df = as.data.frame(rawdata, stringsAsFactors = F)
If your "Bikes" data are continuous, then you could be looking for the following:
ggplot(df, aes(x = Vehicles, y = TotalCost)) + geom_bar(aes(fill = Bikes), stat="identity")
If the "Bikes" values are more like distinct categories, then the following might be it:
ggplot(df, aes(x = Vehicles, y = TotalCost)) + geom_bar(aes(fill = as.factor(Bikes)), stat="identity", position = "dodge")

This happens because you can't dodge on a numeric variable: it is continuous. If you specify fill=factor(Bikes) it will do the right thing; otherwise ggplot doesn't know how to "dodge" the bars for a continuous value.
Alternatively, you can specify the grouping explicitly, by adding group=Bikes to the aesthetics for the master plot or geom_bar:
ggplot(df, aes(x=Vehicles, y=TotalCost)) +
geom_bar(aes(fill=Bikes, group=Bikes), position="dodge", stat="identity")
The advantage of the factor approach is that each bar gets its own label, and you can use discrete color scales (like Brewer) to make the differentiation clear.
With the group approach, the coloration will reflect the relative values, which may be desirable, but may make the plot hard to read if there are more values for bikes, since comparing adjacent Vehicles columns will involve comparing subtle gradations. If we append another row with 108, 3, 7, then it will be hard to compare the 2 and 3 groupings.
ggplot(rbind(df, c(108, 3, 7)), aes(x=Vehicles, y=TotalCost)) +
geom_bar(aes(fill=Bikes, group=Bikes), position="dodge", stat="identity")
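As a minimal sketch of the factor approach with a discrete Brewer scale: the data frame is rebuilt directly from the values in the question, geom_col() is used as shorthand for geom_bar(stat = "identity"), and the "Set2" palette is just an arbitrary choice for illustration.

```r
library(ggplot2)

# Data from the question, built without the magrittr/strsplit detour
df <- data.frame(
  TotalCost = c(92, 92, 96, 93, 93, 95, 108, 108, 108),
  Vehicles  = rep(1:3, each = 3),
  Bikes     = rep(c(2, 3, 6), times = 3)
)

# Converting Bikes to a factor gives ggplot discrete groups to dodge on
p <- ggplot(df, aes(x = factor(Vehicles), y = TotalCost, fill = factor(Bikes))) +
  geom_col(position = "dodge") +
  scale_fill_brewer(palette = "Set2", name = "Bikes") +  # palette is an arbitrary choice
  labs(x = "Vehicles")
p
```

Each Vehicles value should then show three side-by-side bars, one per Bikes level, with clearly distinct colors.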

Related

Trouble graphing two columns on one graph in R

I just started learning R. I melted my dataframe and used ggplot to get this graph. There are supposed to be two lines on the same graph, but the points seem to be connected at random.
Correct points plotted, but wrong lines.
# Melted my data to create new dataframe
AvgSleep2_DF <- melt(AvgSleep_DF , id.vars = 'SleepDay_Date',
variable.name = 'series')
# Plotting
ggplot(AvgSleep2_DF, aes(SleepDay_Date, value, colour = series)) +
geom_point(aes(colour = series)) +
geom_line(aes(colour = series))
With or without the aes(colour = series) in the geom_line results in the same graph. What am I doing wrong here?
The following might explain what geom_line() does when you specify aesthetics in the ggplot() call.
I assign a deliberate colour column that differs from the series specification!
df <- data.frame(
x = c(1,2,3,4,5)
, y = c(2,2,3,4,2)
, colour = factor(c(rep(1,3), rep(2,2)))
, series = c(1,1,2,3,3)
)
df
x y colour series
1 1 2 1 1
2 2 2 1 1
3 3 3 1 2
4 4 4 2 3
5 5 2 2 3
Inheritance in ggplot will look for aesthetics defined in an upper layer.
ggplot(data = df, aes(x = x, y = y, colour = colour)) +
geom_point(size = 3) + # setting the size to stress point layer call
geom_line() # geom_line will "inherit" a "grouping" from the colour set above
This gives you
While we can control the "grouping" associated to each line(segment) as follows:
ggplot(data = df, aes(x = x, y = y, colour = colour)) +
geom_point(size = 3) +
geom_line(aes(group = series) # defining specific grouping
)
Note: As I defined a separate "group" in the series column for the 3rd point, it is depicted - in this case - as a single point "line".
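To try the explicit grouping on melted data shaped like the asker's, here is a minimal sketch. The values and the two measure column names (TotalMinutesAsleep, TotalTimeInBed) are made up for illustration, and storing SleepDay_Date as a Date is an assumption; whether grouping is the actual cause of the asker's problem cannot be confirmed from the question alone.

```r
library(reshape2)
library(ggplot2)

# Made-up wide data; every column name except SleepDay_Date is hypothetical
AvgSleep_DF <- data.frame(
  SleepDay_Date      = as.Date(c("2016-04-12", "2016-04-13", "2016-04-14")),
  TotalMinutesAsleep = c(420, 380, 450),
  TotalTimeInBed     = c(450, 410, 470)
)

# Same melt call as in the question: one row per date and series
AvgSleep2_DF <- melt(AvgSleep_DF, id.vars = "SleepDay_Date",
                     variable.name = "series")

# group = series states explicitly which points belong to the same line
p <- ggplot(AvgSleep2_DF, aes(SleepDay_Date, value, colour = series)) +
  geom_point() +
  geom_line(aes(group = series))
p
```

If the lines still look wrong with real data, checking str(AvgSleep2_DF) for the class of the date column would be a reasonable next step.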

Stack Barplot (ggplot): Is there a way to use 'repeatable' values of filling in different order?

I am creating a stacked barplot showing different types of treatment for ovarian cancer. Each 'bar' represents a different treatment. Some patients are treated with the same combination therapy, but not necessarily in consecutive lines.
I've looked at this answer # 2.
But it doesn't cut it.
I've attached a sample patient
record_id line treatment value
134 47 1 Carboplatin og Docetaxel 1
135 47 2 Carboplatin og Caelyx 1
136 47 3 Carboplatin og Caelyx 1
137 47 4 AVANOVA, arm 2 - Bevacizumab og NIraparib 1
138 47 5 Carboplatin og Caelyx 1
Using the following ggplot for the patients generates
library(tidyverse)
library(ggplot2)
stack %>%
ggplot(aes(x = record_id, y = value, fill = interaction(treatment,-line))) +
geom_bar(stat = "identity", position = "stack", data = stack %>% filter(record_id == 47)) +
guides(fill = guide_legend("ordering"))
I have also tried using the fill = reorder - same code as above. The result is
I was hoping to get a result looking like the first picture (with fill = interaction), but where the colors appear the same for the same treatment (in this example 'Carboplatin and Caelyx').
It sounds like you want the bars stacked in the first order, but with their fill solely based on treatment. I think this can be done by using group and fill together:
library(tidyverse)
stack %>%
ggplot(aes(x = record_id, y = value,
group = interaction(treatment,-line),
fill = treatment)) +
geom_bar(stat = "identity", position = "stack",
data = stack %>% filter(record_id == 47),
color = "white") +
guides(fill = guide_legend("ordering"))
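For a self-contained version of this idea, the sketch below rebuilds the sample patient from the table in the question. The data frame is typed in by hand from that table, record_id is converted to a factor for the x axis, and geom_col() stands in for geom_bar(stat = "identity"); these are choices made for the example, not part of the original answer.

```r
library(ggplot2)

# Sample patient 47, copied from the table in the question
stack <- data.frame(
  record_id = 47,
  line      = 1:5,
  treatment = c("Carboplatin og Docetaxel",
                "Carboplatin og Caelyx",
                "Carboplatin og Caelyx",
                "AVANOVA, arm 2 - Bevacizumab og NIraparib",
                "Carboplatin og Caelyx"),
  value     = 1
)

# group controls how the bar is split and stacked (one segment per line);
# fill only controls the color, so repeated treatments share a color
p <- ggplot(stack, aes(x = factor(record_id), y = value,
                       group = interaction(treatment, -line),
                       fill = treatment)) +
  geom_col(position = "stack", color = "white") +
  labs(x = "record_id")
p
```

The bar should show five segments but only three legend entries, since the three 'Carboplatin og Caelyx' lines share one fill.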

ggplot facet_wrap with equally spaced axes

Say I have the following dummy data frame:
df <- data.frame(let = LETTERS[1:13], value = sample(13),
group = rep(c("foo", "bar"), times = c(5,8)))
df
let value group
1 A 2 foo
2 B 1 foo
3 C 12 foo
4 D 8 foo
5 E 4 foo
6 F 13 bar
7 G 11 bar
8 H 3 bar
9 I 7 bar
10 J 5 bar
11 K 10 bar
12 L 9 bar
13 M 6 bar
Using ggplot with facet_wrap allows me to make a panel for each of the groups...
library(ggplot2)
ggplot(df, aes(x= let, y = value)) +
geom_point() +
coord_flip() +
facet_wrap(~group, scales = "free")
..but the vertical axes are not equally spaced, i.e. the left plot contains more vertical ticks than the right one. I would like to fill up the right vertical axis with (unlabeled) ticks (with no plotted values). In this case that would add 3 empty ticks, but it should be scalable to any df size.
What is the best way to accomplish this? Should I change the data frame, or is there a way to do this using ggplot?
I’m not sure why you want to arrange the categorical variable on your chart this way other than for aesthetics (it does seem to look better). At any rate, a simple workaround that seems to handle general cases relies on the fact that ggplot uses a numerical scale to position categorical variables. The workaround is to plot, in every panel, transparent points at a position on the categorical axis just above the largest number of categories in any group (max_rows + .5 in the code below). These points are plotted at every value on the other axis, which is a simple way to handle groups whose value ranges do not overlap. I've added another group to your data frame to make the example a bit more general.
library(ggplot2)
set.seed(123)
df <- data.frame(let = LETTERS[1:19], value = c(sample(13),20+sample(6)),
group = rep(c("foo", "bar", "bar2"), times = c(5,8,6)))
num_rows <- xtabs(~ group, df)
max_rows <- max(num_rows)
sp <- ggplot(df, aes(y= let, x = value)) +
geom_point() +
geom_point(aes(y = max_rows +.5), alpha=0 ) +
facet_wrap(~group, scales = "free", nrow=1 )
plot(sp)
This gives the following chart:
A kludgy solution that requires magrittr (for the compound assignment pipe %<>%):
df %<>%
rbind(data.frame(let = c(" ", "  ", "   "),
value = NA,
group = "foo"))
I just add three more entries for foo that are blank strings (i.e., just spaces) of different lengths. There must be a more elegant solution, though.
Use free_x instead of free, like this:
ggplot(df, aes(x= let, y = value)) +
geom_point() +
coord_flip() +
facet_wrap(~group, scales = "free_x")+
theme(axis.text.y=element_blank(),
axis.ticks.y=element_blank())
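A different technique, not taken from the answers above, is facet_grid() with space = "free_y": panels are stacked vertically and each panel's height is proportional to its number of categories, so the ticks end up equally spaced without padding the data. This is only a sketch under the assumption that a stacked layout is acceptable in place of side-by-side panels; the seed is added so the dummy data is reproducible.

```r
library(ggplot2)

set.seed(123)  # only so the dummy values are reproducible
df <- data.frame(let = LETTERS[1:13], value = sample(13),
                 group = rep(c("foo", "bar"), times = c(5, 8)))

# scales = "free_y" drops unused letters per panel;
# space = "free_y" sizes each panel by its number of letters
p <- ggplot(df, aes(x = value, y = let)) +
  geom_point() +
  facet_grid(group ~ ., scales = "free_y", space = "free_y")
p
```

Note that the space argument belongs to facet_grid(); facet_wrap() does not offer it, which is why the layout changes.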

ggplot2: create ordered group bar plot - (use reorder)

I want to create a grouped bar plot while keeping a given order. For a single column (not a grouped bar plot), using the reorder function is obvious, but I'm not sure how to use it on a melted data.frame.
Here is a detailed explanation with a code example.
Let's say we have the following data.frame:
d.nfl <- data.frame(Team1=c("Vikings", "Chicago", "GreenBay", "Detroit"), Win=c(20, 13, 9, 12))
Plotting a simple bar plot while flipping it:
ggplot(d.nfl, aes(x = Team1, y=Win)) + geom_bar(aes(fill=Team1), stat="identity") + coord_flip()
The above plot is not ordered. If I want to order it by Win, I can do the following:
d.nfl$orderedTeam <- reorder(d.nfl$Team1, d.nfl$Win)
ggplot(d.nfl, aes(x = orderedTeam, y=Win)) + geom_bar(aes(fill=orderedTeam), stat="identity") + coord_flip()
Now let's say we add another column (to the original data frame):
d.nfl$points <- c(12, 3, 45, 5)
Team1 Win points
1 Vikings 20 12
2 Chicago 13 3
3 GreenBay 9 45
4 Detroit 12 5
To generate a grouped bar plot, we first need to melt it:
library(reshape2)
d.nfl.melt <- melt(d.nfl[,c('Team1','Win','points')],id.vars = 1)
ggplot(d.nfl.melt,aes(x = Team1,y = value)) + geom_bar(aes(fill = variable),position = "dodge", stat="identity") + coord_flip()
The above ggplot is unordered.
But how do I make an ordered grouped bar plot (in ascending order)?
This is a non-issue.
The easiest way is to not discard your ordered team in the melt:
d.nfl.melt <- melt(d.nfl,id.vars = c("Team1", "orderedTeam"))
Alternatively, we can use reorder after melting and use only the Win elements in computing the ordering:
d.nfl.melt$ordered_after_melting = reorder(
d.nfl.melt$Team1,
X = d.nfl.melt$value * (d.nfl.melt$variable == "Win")
)
Yet another idea is to take the levels from the original ordered column and apply them to a melted factor:
d.nfl.melt$copied_levels = factor(
d.nfl.melt$Team1,
levels = levels(d.nfl$orderedTeam)
)
All three methods give the same result. (I left out the coord_flips because they don't add anything to the question, but you can of course add them back in.)
gridExtra::grid.arrange(
ggplot(d.nfl.melt,aes(x = orderedTeam, y = value)) +
geom_bar(aes(fill = variable),position = "dodge", stat="identity"),
ggplot(d.nfl.melt,aes(x = ordered_after_melting, y = value)) +
geom_bar(aes(fill = variable),position = "dodge", stat="identity"),
ggplot(d.nfl.melt,aes(x = copied_levels, y = value)) +
geom_bar(aes(fill = variable),position = "dodge", stat="identity")
)
As for which is easiest, I would recommend just keeping the orderedTeam variable around while melting. Your code seems to work hard to leave it out, when it's quite easy to keep it in.
The challenge your question presents is how to reorder a factor Team1 based on a subset of the values in a melted column.
The comments on your question from @alistaire and @joran link to great answers.
The tl;dr answer is to just apply the ordering from your original, unmelted data.frame to the new one using levels().
library(reshape2)
#Picking up from your example code:
d.nfl.melt <- melt(d.nfl[,c('Team1','Win','points')],id.vars = 1)
levels(d.nfl.melt$Team1)
#Current order is alphabetical
#[1] "Chicago" "Detroit" "GreenBay" "Vikings"
#Reorder based on Wins (using the same order from your earlier, unmelted data.frame)
d.nfl.melt$Team1 <- factor(d.nfl.melt$Team1, levels = levels(d.nfl$orderedTeam)) #SOLUTION
levels(d.nfl.melt$Team1)
#New order is ascending by wins
#[1] "GreenBay" "Detroit" "Chicago" "Vikings"
ggplot(d.nfl.melt,aes(x = Team1,y = value)) +
geom_bar(aes(fill = variable),position = "dodge", stat="identity") + coord_flip()
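The level-copying approach can be checked end to end with the sketch below, which rebuilds the data from the question. One caveat worth stating: since R 4.0.0, data.frame() no longer converts strings to factors by default, so on a current R the melted Team1 column is plain character and levels() returns NULL before the conversion. The factor() call works in both cases. The reorder() variant from the earlier answer is included for comparison.

```r
library(reshape2)

d.nfl <- data.frame(Team1  = c("Vikings", "Chicago", "GreenBay", "Detroit"),
                    Win    = c(20, 13, 9, 12),
                    points = c(12, 3, 45, 5))

# Order the teams by Win on the unmelted data
d.nfl$orderedTeam <- reorder(d.nfl$Team1, d.nfl$Win)

d.nfl.melt <- melt(d.nfl[, c("Team1", "Win", "points")], id.vars = "Team1")

# NULL on R >= 4.0.0 (character column), alphabetical levels on older versions
levels(d.nfl.melt$Team1)

# Copy the ordering from the unmelted data onto the melted column
d.nfl.melt$Team1 <- factor(d.nfl.melt$Team1,
                           levels = levels(d.nfl$orderedTeam))

# Alternative: reorder after melting, letting only the Win rows contribute
after_melt <- reorder(d.nfl.melt$Team1,
                      d.nfl.melt$value * (d.nfl.melt$variable == "Win"))

levels(d.nfl.melt$Team1)
levels(after_melt)
```

Both level vectors should come out in ascending order of Win, matching the order shown in the answer above.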

ggplot: showing % instead of counts in charts of categorical variables with multiple levels

I would like to create a barplot like this:
library(ggplot2)
# Dodged bar charts
ggplot(diamonds, aes(clarity, fill=cut)) + geom_bar(position="dodge")
However, instead of counts, I want to have the percentage of observations falling into each 'clarity' category by cutting category ('fair', 'good', 'very good' ...).
With this ...
# Dodged bar charts
ggplot(diamonds, aes(clarity, fill=cut)) +
geom_bar(aes(y = (..count..)/sum(..count..)), position="dodge")
I get percentages on the y-axis, but these percentages ignore the cut-factor.
I want all the red bars to sum to 1, all the yellow bars to sum to 1, etc.
Is there an easy way to make that work without having to prepare the data manually?
Thanks!
P.S.: This is a follow-up to this stackoverflow question
You could use sjp.xtab from the sjPlot-package for that:
sjp.xtab(diamonds$clarity,
diamonds$cut,
showValueLabels = F,
tableIndex = "row",
barPosition = "stack")
The data preparation for stacked group-percentages that sum up to 100% should be:
data.frame(prop.table(table(diamonds$clarity, diamonds$cut),1))
thus, you could write
mydf <- data.frame(prop.table(table(diamonds$clarity, diamonds$cut),1))
ggplot(mydf, aes(Var1, Freq, fill = Var2)) +
geom_bar(position = "stack", stat = "identity") +
scale_y_continuous(labels=scales::percent)
Edit: This one adds up each category (Fair, Good...) to 100%, using 2 in prop.table and position = "dodge":
mydf <- data.frame(prop.table(table(diamonds$clarity, diamonds$cut),2))
ggplot(mydf, aes(Var1, Freq, fill = Var2)) +
geom_bar(position = "dodge", stat = "identity") +
scale_y_continuous(labels=scales::percent)
or
sjp.xtab(diamonds$clarity,
diamonds$cut,
showValueLabels = F,
tableIndex = "col")
Verifying the last example with dplyr, summing up percentages within each group:
library(dplyr)
mydf %>% group_by(Var2) %>% summarise(percsum = sum(Freq))
> Var2 percsum
> 1 Fair 1
> 2 Good 1
> 3 Very Good 1
> 4 Premium 1
> 5 Ideal 1
(see this page for further plot-options and examples from sjp.xtab...)
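If you would rather keep everything in a dplyr pipeline, the same per-cut shares can be computed as in the sketch below. This is an alternative to the prop.table() call above rather than part of the original answer; the names pct and share are made up for the example, and geom_col() is shorthand for geom_bar(stat = "identity").

```r
library(dplyr)
library(ggplot2)

# Count each cut/clarity combination, then divide by the total per cut
pct <- diamonds %>%
  count(cut, clarity) %>%
  group_by(cut) %>%
  mutate(share = n / sum(n)) %>%
  ungroup()

# All bars of one cut (one fill color) now sum to 100%
p <- ggplot(pct, aes(clarity, share, fill = cut)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = scales::percent)
p
```

Grouping by clarity instead of cut in the group_by() step would give shares that sum to 100% within each clarity category.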
