How to plot a stacked proportional graph? - r

I have a data frame:
x <- data.frame(id=letters[1:3],val0=1:3,val1=4:6,val2=7:9)
id val0 val1 val2
1 a 1 4 7
2 b 2 5 8
3 c 3 6 9
I want to plot a stacked bar plot that shows the percentage of each column. Each bar represents one row, all bars have the same length, and each bar has three colors, one for the percentage of each of val0, val1 and val2.
I searched for this, but I only find ways to plot a stacked graph, not a stacked proportional graph.
Thanks.

Using ggplot2
For ggplot2 and geom_bar, work in long format and pre-calculate the percentages. For example:
library(ggplot2)
library(reshape2)
library(plyr)
# long format with a column of proportions within each id
xlong <- ddply(melt(x, id.vars = 'id'), .(id), mutate, prop = value / sum(value))
ggplot(xlong, aes(x = id, y = prop, fill = variable)) + geom_bar(stat = 'identity')
# note: position = 'fill' would work directly with the value column
ggplot(xlong, aes(x = id, y = value, fill = variable)) +
  geom_bar(stat = 'identity', position = 'fill')
# will return the same plot as above
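If you would rather not depend on reshape2 and plyr, the long table with a prop column can also be built in base R. This is a minimal sketch that assumes x has exactly the columns shown in the question; vars is a name introduced here. The resulting xlong has columns with the same names (id, variable, value, prop), so it can be passed to the same ggplot() calls.

```r
x <- data.frame(id = letters[1:3], val0 = 1:3, val1 = 4:6, val2 = 7:9)
vars <- c("val0", "val1", "val2")

# long format: one row per (id, variable) pair, built with base R only
xlong <- data.frame(
  id = rep(x$id, times = length(vars)),
  variable = rep(vars, each = nrow(x)),
  value = unlist(x[vars], use.names = FALSE)
)

# share of each value within its id (what ddply(..., mutate, prop = ...) computes)
xlong$prop <- xlong$value / ave(xlong$value, xlong$id, FUN = sum)
```

Each id's prop values sum to 1, which is what makes the bars equally long with y = prop.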
base R
A table object can be plotted as a mosaic plot using plot(). Your x is (almost) a table object:
# get the numeric columns as a matrix
xt <- as.matrix(x[,2:4])
# set the rownames to be the first column of x
rownames(xt) <- x[[1]]
# set the class to be a table so plot will call plot.table
class(xt) <- 'table'
plot(xt)
You could also use mosaicplot() directly:
mosaicplot(x[,2:4], main = 'Proportions')
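A mosaic plot splits the plot horizontally in proportion to the row totals, so its bars differ in width. If you want bars of equal length, as described in the question, a minimal base R sketch is to divide each row by its total with prop.table() and pass the transposed matrix to barplot(). The names m and props are introduced here.

```r
x <- data.frame(id = letters[1:3], val0 = 1:3, val1 = 4:6, val2 = 7:9)

# numeric columns as a matrix, one row per id
m <- as.matrix(x[, c("val0", "val1", "val2")])
rownames(m) <- x$id

# divide each row by its row total, so every row sums to 1
props <- prop.table(m, margin = 1)

# barplot() stacks the rows of its matrix argument within each column,
# so transpose: columns become ids (bars), rows become val0/val1/val2 (segments)
barplot(t(props), legend.text = colnames(props), ylab = "Proportion")
```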

Related

Trouble graphing two columns on one graph in R

I just started learning R. I melted my data frame and used ggplot to get this graph. There are supposed to be two lines on the same graph, but the lines connecting the points seem random.
Correct points plotted, but wrong lines.
library(reshape2)
library(ggplot2)
# Melted my data to create a new data frame
AvgSleep2_DF <- melt(AvgSleep_DF, id.vars = 'SleepDay_Date',
                     variable.name = 'series')
# Plotting
ggplot(AvgSleep2_DF, aes(SleepDay_Date, value, colour = series)) +
geom_point(aes(colour = series)) +
geom_line(aes(colour = series))
Including or omitting aes(colour = series) in geom_line() gives the same graph. What am I doing wrong here?
The following might explain what geom_line() does when you specify aesthetics in the ggplot() call.
I deliberately assign a colour column that differs from the series column!
df <- data.frame(
x = c(1,2,3,4,5)
, y = c(2,2,3,4,2)
, colour = factor(c(rep(1,3), rep(2,2)))
, series = c(1,1,2,3,3)
)
df
x y colour series
1 1 2 1 1
2 2 2 1 1
3 3 3 1 2
4 4 4 2 3
5 5 2 2 3
In ggplot, each layer inherits the aesthetics defined in an upper layer, i.e. in the ggplot() call.
library(ggplot2)
ggplot(data = df, aes(x = x, y = y, colour = colour)) +
  geom_point(size = 3) +  # setting the size to stress the point layer
  geom_line()  # geom_line "inherits" a grouping from the colour set above
This draws one line per colour group: points 1 to 3 are joined, and points 4 and 5 are joined.
We can, however, control the "grouping" of each line (segment) explicitly:
ggplot(data = df, aes(x = x, y = y, colour = colour)) +
  geom_point(size = 3) +
  geom_line(aes(group = series))  # defining a specific grouping
Note: because the 3rd point is the only member of its group in the series column, its "line" consists of a single point, so geom_line() draws no segment for it.
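To make the grouping rule concrete: geom_line() splits the rows by group and joins the points of each group in order of x. The sketch below imitates that rule in base R on the df above. line_segments() is a hypothetical helper written for illustration, not a ggplot2 function, and it ignores everything else a layer does (scales, drawing).

```r
df <- data.frame(
  x = c(1, 2, 3, 4, 5),
  y = c(2, 2, 3, 4, 2),
  colour = factor(c(rep(1, 3), rep(2, 2))),
  series = c(1, 1, 2, 3, 3)
)

# Hypothetical helper (not part of ggplot2): split the rows by `group_col`,
# sort each group by x and list the segments joining consecutive points.
line_segments <- function(data, group_col) {
  pieces <- lapply(split(data, data[[group_col]]), function(g) {
    g <- g[order(g$x), ]
    n <- nrow(g)
    if (n < 2) return(NULL)  # a lone point has nothing to connect to
    data.frame(group = g[[group_col]][-n], x = g$x[-n], xend = g$x[-1])
  })
  do.call(rbind, pieces)
}

line_segments(df, "colour")  # three segments: 1-2, 2-3 and 4-5
line_segments(df, "series")  # two segments: 1-2 and 4-5; point 3 is left alone
```

In the question, the same reasoning applies: whatever column ends up defining the groups decides which points get joined.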

Stacked barchart, independent fill order for each stack

I'm facing a behaviour of ggplot2 with ordering and stacked barplots that I cannot understand. I've read some questions about it (here, here and so on), but unfortunately I cannot find a solution that suits me. Maybe the answer is easy and I cannot see it. I hope it's not a duplicate.
My main goal is to have each stack ordered independently, based on the ordering column (called ordering here).
Here I have some data:
library(dplyr)
library(ggplot2)
dats <- data.frame(id = c(1,1,1,2,2,3,3,3,3),
value = c(9,6,4,5,6,4,3,4,5),
ordering = c(1,2,3,2,3,1,3,2,4),
filling = c('a','b','c','b','a','a','c','d','b')) %>% arrange(id,ordering)
So there is an id, a value, a column to order by, and a filling; the rows are already in the order they should appear in the plot, according to the ordering column.
I tried to plot it. The idea is a stacked bar chart with id on the x axis and value on the y axis, filled by filling, where the stacking order of the fill follows ordering in ascending order, i.e. the biggest value of ordering at the bottom of each column. The fill order should match the dataset, i.e. each column has an independent order.
As you can imagine, these are fake data, so the number of ids can vary.
id value ordering filling
1 1 9 1 a
2 1 6 2 b
3 1 4 3 c
4 2 5 2 b
5 2 6 3 a
6 3 4 1 a
7 3 4 2 d
8 3 3 3 c
9 3 5 4 b
When I plot them, there is something I do not understand:
library(dplyr)
dats$filling <- reorder(dats$filling, -dats$ordering)
ggplot(dats,aes(x = id,
y = value,
fill = filling)) +
geom_bar(stat = "identity",position = "stack") +
guides(fill=guide_legend("ordering"))
The second and third ids are not ordered properly; they should follow the order of the original dataset.
If you use a separate geom_bar() for each id, each bar can have its own order.
dats %>%
ggplot(aes(x = id, y = value, fill = reorder(filling,-ordering))) +
geom_bar(stat = "identity", position = "stack", data = dats %>% filter(id == 1)) +
geom_bar(stat = "identity", position = "stack", data = dats %>% filter(id == 2)) +
geom_bar(stat = "identity", position = "stack", data = dats %>% filter(id == 3)) +
guides(fill=guide_legend("ordering"))
More generally:
library(purrr)
# one geom_bar() layer per id, each drawn from that id's rows only
bars <- map(unique(dats$id),
            ~ geom_bar(stat = "identity", position = "stack",
                       data = dats %>% filter(id == .x)))
dats %>%
ggplot(aes(x = id, y = value, fill = reorder(filling,-ordering))) +
bars +
guides(fill=guide_legend("ordering"))
The problem is that, in your case, different bars should use the same values (levels) of filling in a different order. This conflicts with the way ggplot works: taking the factor levels (which already have a certain order) and applying them in the same way for each bar.
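The mismatch can be checked without plotting. The sketch below, which uses only base R and the question's dats, compares the order that one shared set of fill levels imposes on every bar with the order each bar asks for (descending ordering). The names lev, by_levels and wanted are introduced here for illustration; which end of a bar the first level lands on depends on the ggplot2 version, so the sketch compares sequences only.

```r
dats <- data.frame(id = c(1,1,1,2,2,3,3,3,3),
                   value = c(9,6,4,5,6,4,3,4,5),
                   ordering = c(1,2,3,2,3,1,3,2,4),
                   filling = c('a','b','c','b','a','a','c','d','b'),
                   stringsAsFactors = FALSE)

# one level order for the whole fill variable, as in the question
lev <- levels(reorder(dats$filling, -dats$ordering))
lev  # "c" "b" "d" "a"

# order within each bar implied by those shared levels
by_levels <- lapply(split(dats$filling, dats$id),
                    function(f) f[order(match(f, lev))])

# order each bar actually asks for: descending `ordering`
wanted <- lapply(split(dats, dats$id),
                 function(d) d$filling[order(-d$ordering)])

# bar 1 agrees; bars 2 and 3 do not, because a single set of levels
# cannot encode b-before-a for one bar and a-before-b for another
```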
A workaround, then, is to create many factor levels.
ggplot(dats, aes(x = id, y = value, fill = interaction(-ordering, id))) +
geom_bar(stat = "identity", position = "stack")
This is now too "generous": every bar segment gets its own colour and legend entry. However, we can now deal with the legend and the colours:
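dats <- arrange(dats, id, -ordering)
dats$filling <- factor(as.character(dats$filling))  # levels a, b, c, d (R >= 4.0 no longer turns strings into factors)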
aux <- with(dats, match(sort(unique(filling)), filling))
ggplot(dats, aes(x = id, y = value, fill = interaction(-ordering, id))) +
geom_bar(stat = "identity", position = "stack") +
scale_fill_manual("Ordering", values = scales::hue_pal()(4)[dats$filling],
labels = with(dats, filling[aux]),
breaks = with(dats, interaction(-ordering, id)[aux]))
Here I first rearrange the rows of dats so as to avoid doing that later. Then aux is an auxiliary vector
aux
# [1] 3 2 1 8
giving the first position at which each of the levels a, b, c and d (in this order) appears in dats, which again is useful later. Then I simply set the corresponding scale values, labels and breaks. Lastly, I use scales::hue_pal to recover the original color palette.
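Why an unnamed, per-row values vector lines up with the right segments can be checked in base R. A minimal sketch, assuming that the discrete fill scale uses as its limits only the levels present in the data, in level order (drop = TRUE, the default), and that scale_fill_manual() matches unnamed values to those limits by position. After sorting by id and decreasing ordering, the rows of dats appear in exactly the level order of interaction(-ordering, id). The names key and present are introduced here.

```r
dats <- data.frame(id = c(1,1,1,2,2,3,3,3,3),
                   value = c(9,6,4,5,6,4,3,4,5),
                   ordering = c(1,2,3,2,3,1,3,2,4),
                   filling = c('a','b','c','b','a','a','c','d','b'))

# base R equivalent of arrange(dats, id, -ordering)
dats <- dats[order(dats$id, -dats$ordering), ]
dats$filling <- factor(dats$filling)  # levels a, b, c, d

# one fill level per bar segment, as in the workaround
key <- interaction(-dats$ordering, dats$id)

# levels that actually occur, in level order (assumed scale limits)
present <- levels(droplevels(key))

# the sorted rows follow the same order as those levels, so a per-row
# vector of colours matches them position by position
identical(present, as.character(key))  # TRUE

# first row holding each filling level
aux <- match(sort(unique(dats$filling)), dats$filling)
aux  # 3 2 1 8
```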
The problem here is that the element filling = d appears only in the third group, with a low value. One solution could be to fill in the missing combinations with 0:
library(dplyr)
library(ggplot2)
dats <- data.frame(id = c(1,1,1,1,2,2,2,2,3,3,3,3),
value = c(9,6,4,0,5,6,0,0,4,3,4,5),
ordering = c(1,2,3,5,2,3,5,5,1,3,2,4),
filling = c('a','b','c','d','b','a','c','d','a','c','d','b')) %>% arrange(id,ordering)
ggplot(dats,aes(x = id,
y = value,
fill = reorder(filling,-ordering))) +
geom_bar(stat = "identity",position = "stack") +
guides(fill=guide_legend("ordering"))
Created on 2018-12-03 by the reprex package (v0.2.1)
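The same zero-filling can be done programmatically instead of by hand. A minimal base R sketch, assuming every filling should appear in every id and that placeholder rows go after the largest real ordering (5 here, as in the data above); full and gap are names introduced for illustration. If you already use the tidyverse, tidyr::complete(dats, id, filling, fill = list(value = 0, ordering = 5)) should produce the same rows.

```r
dats <- data.frame(id = c(1,1,1,2,2,3,3,3,3),
                   value = c(9,6,4,5,6,4,3,4,5),
                   ordering = c(1,2,3,2,3,1,3,2,4),
                   filling = c('a','b','c','b','a','a','c','d','b'),
                   stringsAsFactors = FALSE)

# every id x filling combination
full <- expand.grid(id = unique(dats$id), filling = sort(unique(dats$filling)),
                    stringsAsFactors = FALSE)

# keep the observed rows; combinations that did not occur get NA
full <- merge(full, dats, all.x = TRUE)

# missing combinations get value 0 and an ordering past the largest one
gap <- is.na(full$value)
full$value[gap] <- 0
full$ordering[gap] <- max(dats$ordering) + 1

full <- full[order(full$id, full$ordering), ]
```

full can then take the place of the hand-written dats in the ggplot() call above.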

ggplot facet_wrap with equally spaced axes

Say I have the following dummy data frame:
df <- data.frame(let = LETTERS[1:13], value = sample(13),
group = rep(c("foo", "bar"), times = c(5,8)))
df
let value group
1 A 2 foo
2 B 1 foo
3 C 12 foo
4 D 8 foo
5 E 4 foo
6 F 13 bar
7 G 11 bar
8 H 3 bar
9 I 7 bar
10 J 5 bar
11 K 10 bar
12 L 9 bar
13 M 6 bar
Using ggplot with facet_wrap allows me to make a panel for each of the groups...
library(ggplot2)
ggplot(df, aes(x= let, y = value)) +
geom_point() +
coord_flip() +
facet_wrap(~group, scales = "free")
...but the vertical axes are not equally spaced: the left panel has more ticks than the right one. I would like to fill the right vertical axis with unlabeled ticks (with no plotted values). In this case that would add 3 empty ticks, but it should scale to any df size.
What is the best way to accomplish this? Should I change the data frame, or is there a way to do this using ggplot?
I’m not sure why you want to arrange the categorical variable on your chart as you do, other than for aesthetics (it does seem to look better). At any rate, a simple workaround that seems to handle general cases relies on the fact that ggplot places categorical values at numeric positions 1, 2, 3, and so on. For your chart, the workaround is then to plot, for each x value, a transparent point at a y value just above the largest number of categories in any group (max_rows + 0.5). Plotting this point for every x value keeps the solution simple even when the groups have non-overlapping ranges of x values. I've added another group to your data frame to make the example a bit more general.
library(ggplot2)
set.seed(123)
df <- data.frame(let = LETTERS[1:19], value = c(sample(13),20+sample(6)),
group = rep(c("foo", "bar", "bar2"), times = c(5,8,6)))
num_rows <- xtabs(~ group, df)
max_rows <- max(num_rows)
sp <- ggplot(df, aes(y= let, x = value)) +
geom_point() +
geom_point(aes(y = max_rows +.5), alpha=0 ) +
facet_wrap(~group, scales = "free", nrow=1 )
plot(sp)
This gives the following chart:
A kludgy solution that requires magrittr (for the compound assignment pipe %<>%):
library(magrittr)
df %<>%
  rbind(data.frame(let = c(" ", "  ", "   "),
                   value = NA,
                   group = "foo"))
I just add three more entries for foo that are blank strings (i.e., just spaces) of different lengths. There must be a more elegant solution, though.
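The kludge can be generalised so that it scales to any df size, as the question asks. This sketch pads every group up to the size of the largest one with rows whose labels are 1, 2, 3, ... spaces, so the labels stay distinct and each gets its own blank-looking tick. pad_groups() is a hypothetical helper, and the seed only makes the example data reproducible.

```r
set.seed(1)
df <- data.frame(let = LETTERS[1:13], value = sample(13),
                 group = rep(c("foo", "bar"), times = c(5, 8)),
                 stringsAsFactors = FALSE)

# Hypothetical helper: add placeholder rows so every group has as many
# rows as the largest group; labels are runs of 1, 2, 3, ... spaces.
pad_groups <- function(d) {
  n_max <- max(table(d$group))
  pads <- lapply(split(d, d$group), function(g) {
    k <- n_max - nrow(g)
    if (k == 0) return(NULL)  # already the largest group
    data.frame(let = strrep(" ", seq_len(k)), value = NA, group = g$group[1],
               stringsAsFactors = FALSE)
  })
  rbind(d, do.call(rbind, pads))
}

padded <- pad_groups(df)
```

Then plot with ggplot(pad_groups(df), aes(x = let, y = value)) + geom_point() + coord_flip() + facet_wrap(~group, scales = "free"); ggplot2 will warn that rows with missing values were removed, and those are exactly the placeholder rows.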
Use free_x instead of free, like this:
ggplot(df, aes(x= let, y = value)) +
geom_point() +
coord_flip() +
facet_wrap(~group, scales = "free_x")+
theme(axis.text.y=element_blank(),
axis.ticks.y=element_blank())

faceted piechart with ggplot

I have the following data.frame:
x = data.frame(category=c(1,1,1,1,2,2,2,2), value=c(1,2,1,1,2,2,2,1));
x$category = as.factor(x$category);
x$value = as.factor(x$value);
and I have created a faceted bar chart with ggplot2.
ggplot(x, aes(value, fill=category)) + geom_bar() + facet_wrap(~category);
However, I would like to have a pie chart that shows the fraction values (based on the totals for each category). The diagram should then show one pie chart for each category and two fractions inside each pie chart, one for each value factor. (The real data has up to 6 categories, and I have a few thousand data sets.) Is there a generic way to do that?
One way is to calculate the percentage/ratio beforehand and then use it to get the position of the text label. See also how to put percentage label in ggplot when geom_text is not suitable?
library(ggplot2)
library(reshape2)
library(plyr)
# Your data
y = data.frame(category=c(1,1,1,1,2,2,2,2), value=c(2,2,1,1,2,2,2,1))
# get counts and melt it
data.m = melt(table(y))
names(data.m)[3] = "count"
# calculate percentage:
m1 = ddply(data.m, .(category), summarize, ratio=count/sum(count))
#order data frame (needed to comply with percentage column):
m2 = data.m[order(data.m$category),]
# combine them:
mydf = data.frame(m2,ratio=m1$ratio)
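# get positions of percentage labels (slice midpoints measured from the bottom;
# ggplot2 >= 2.2.0 stacks the first level on top, which needs
# sum(count) - cumsum(count) + 0.5*count instead):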
mydf = ddply(mydf, .(category), transform, position = cumsum(count) - 0.5*count)
# create bar plot
pie = ggplot(mydf, aes(x = factor(1), y = count, fill = as.factor(value))) +
geom_bar(stat = "identity", width = 1) +
facet_wrap(~category)
# make a pie
pie = pie + coord_polar(theta = "y")
# add labels
pie + geom_text(aes(label = sprintf("%1.2f%%", 100*ratio), y = position))
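The counts, ratios and label positions can also be computed in base R, without melt() and ddply(). A minimal sketch using the y data above; pie_df, mid_bottom and mid_top are names introduced here. It computes slice midpoints both from the bottom of each stack (the answer's cumsum(count) - 0.5*count) and from the top, because ggplot2 2.2.0 and later stack the first level on top; use whichever matches your version.

```r
y <- data.frame(category = c(1,1,1,1,2,2,2,2), value = c(2,2,1,1,2,2,2,1))

# one row per (category, value) cell with its count
pie_df <- as.data.frame(table(y), responseName = "count")
pie_df <- pie_df[order(pie_df$category, pie_df$value), ]

# share of each value within its category
pie_df$ratio <- pie_df$count / ave(pie_df$count, pie_df$category, FUN = sum)

# slice midpoints measured from the bottom of each stack
pie_df$mid_bottom <- ave(pie_df$count, pie_df$category,
                         FUN = function(n) cumsum(n) - n / 2)
# the same midpoints measured from the top of each stack
pie_df$mid_top <- ave(pie_df$count, pie_df$category,
                      FUN = function(n) sum(n) - cumsum(n) + n / 2)
```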

How can I highlight subset of values in ggplot2 plots?

For example, I have a basic stacked plot:
ggplot(diamonds, aes(x=factor(color),fill=factor(cut)))+geom_bar(position="fill")
and I have a small subset of diamonds with a "carat" value higher than 3:
subset(diamonds,carat>3)
I want to highlight these particular values on the plot (as points, or as labels if the diamonds had IDs) to see in which part of the distribution they lie. Is there any way to do something like that?
PS: unfortunately I'm not allowed to post figures.
The following inserts the count of "carat greater than 3" into the bar segments. I've broken the problem down into a number of steps.
Step 1: Create a new variable identifying "carat greater than 3".
Step 2: Get a summary table of the counts: of diamonds for each color and cut, and of "carat greater than 3" for each color and cut. I used the ddply() function from the plyr package.
Step 3: Draw the bar plot without the labels.
Step 4: Add to the summary table a variable giving the y positions of the labels.
Step 5: Add the geom_text layer to the plot. The data frame for geom_text is the summary table. geom_text() needs aesthetics for label (in this case, the count for "carat greater than 3"), y position (calculated in the previous step), and x position (color).
library(ggplot2)
library(plyr)
# Step 1
diamonds$caratGT3 = ifelse(diamonds$carat > 3, 1, 0)
# Step 2
diamonds2 = ddply(diamonds, .(color, cut), summarize, CountGT3 = sum(caratGT3))
diamonds2$Count = count(diamonds, .(color, cut))[,3]
diamonds2
# Step 3
p = ggplot() + geom_bar(data = diamonds, aes(x=factor(color),fill=factor(cut)))
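# Step 4 (midpoints measured from the bottom of each bar; ggplot2 >= 2.2.0
# stacks the first level on top, which needs sum(x$Count) - x$pos instead)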
diamonds2 <- ddply(diamonds2,.(color),
function(x) {
x$cfreq <- cumsum(x$Count)
x$pos <- (c(0,x$cfreq[-nrow(x)]) + x$cfreq) / 2
x
})
# Step 5
(p <- p + geom_text(data = diamonds2,
aes(x = factor(color), y = pos, label = CountGT3),
size = 3, colour = "black", fontface = "bold"))
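Steps 1, 2 and 4 can also be sketched in base R. To keep the example small and independent of ggplot2, it runs on a made-up stand-in for diamonds (d and its values are hypothetical); with the real data you would use diamonds instead. The answer's Step 4 formula (c(0, cfreq[-n]) + cfreq) / 2 simplifies to cumsum(Count) - Count / 2, which is what ave() computes here; pos_top gives the same midpoints measured from the top of each bar, for ggplot2 versions that stack the first level on top.

```r
# small made-up stand-in for ggplot2::diamonds (hypothetical values)
d <- data.frame(color = c("D", "D", "D", "E", "E"),
                cut = c("Fair", "Good", "Good", "Fair", "Good"),
                carat = c(3.2, 0.5, 3.1, 1.0, 0.7))

# Step 1: flag diamonds heavier than 3 carats
d$caratGT3 <- as.integer(d$carat > 3)

# Step 2: per color and cut, the number of diamonds and of flagged diamonds
counts <- as.data.frame(table(color = d$color, cut = d$cut), responseName = "Count")
gt3 <- as.data.frame(xtabs(caratGT3 ~ color + cut, data = d), responseName = "CountGT3")
counts$CountGT3 <- gt3$CountGT3  # both tables list the cells in the same order

# Step 4: midpoint of each segment within its bar, from the bottom
counts <- counts[order(counts$color, counts$cut), ]
counts$pos <- ave(counts$Count, counts$color, FUN = function(n) cumsum(n) - n / 2)
# the same midpoints measured from the top of each bar
counts$pos_top <- ave(counts$Count, counts$color, FUN = sum) - counts$pos
```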
