Stacked barchart, independent fill order for each stack - r

I'm facing a behaviour of ggplot2 ordering in stacked bar plots that I cannot understand. I've read some questions about it (here, here and so on), but unfortunately I cannot find a solution that suits me. Maybe the answer is easy and I just cannot see it. I hope it's not a dupe.
My main goal is to have each stack ordered independently, based on the ordering column (called here ordering).
Here I have some data:
library(dplyr)
library(ggplot2)
dats <- data.frame(id = c(1,1,1,2,2,3,3,3,3),
value = c(9,6,4,5,6,4,3,4,5),
ordering = c(1,2,3,2,3,1,3,2,4),
filling = c('a','b','c','b','a','a','c','d','b')) %>% arrange(id,ordering)
So there is an id, a value, a column used for ordering, and a filling. The rows are already arranged the way they should appear in the plot, following the ordering column.
I tried to plot it: the idea is a stacked bar chart with id on the x axis and value on the y axis, filled by filling, where the segments of each bar are stacked according to ordering, with the biggest value of ordering at the bottom of each column. In other words, the fill order should follow the dataset, so each column has its own independent order.
As you can imagine those are fake data, so the number of id can vary.
id value ordering filling
1 1 9 1 a
2 1 6 2 b
3 1 4 3 c
4 2 5 2 b
5 2 6 3 a
6 3 4 1 a
7 3 4 2 d
8 3 3 3 c
9 3 5 4 b
When I plot them, there is something I do not understand:
library(dplyr)
dats$filling <- reorder(dats$filling, -dats$ordering)
ggplot(dats,aes(x = id,
y = value,
fill = filling)) +
geom_bar(stat = "identity",position = "stack") +
guides(fill=guide_legend("ordering"))
The second and third ids are not ordered properly; I expected the order of the original dataset.

If you use separate geom_bars, you can make the orders different.
dats %>%
ggplot(aes(x = id, y = value, fill = reorder(filling,-ordering))) +
geom_bar(stat = "identity", position = "stack", data = dats %>% filter(id == 1)) +
geom_bar(stat = "identity", position = "stack", data = dats %>% filter(id == 2)) +
geom_bar(stat = "identity", position = "stack", data = dats %>% filter(id == 3)) +
guides(fill=guide_legend("ordering"))
More generally:
library(purrr) # provides map()
bars <- map(unique(dats$id)
, ~geom_bar(stat = "identity", position = "stack"
, data = dats %>% filter(id == .x)))
dats %>%
ggplot(aes(x = id, y = value, fill = reorder(filling,-ordering))) +
bars +
guides(fill=guide_legend("ordering"))

The problem is that, in your case, different bars should use the same values (levels) of filling in a different order. This conflicts with the way ggplot works: taking the factor levels (which already have a certain order) and applying them in the same way for each bar.
A workaround, then, is to create many factor levels:
ggplot(dats, aes(x = id, y = value, fill = interaction(-ordering, id))) +
geom_bar(stat = "identity", position = "stack")
This one now is too "generous" by being too detailed. However, what we can do now is to deal with the legend and the different colors:
dats <- arrange(dats, id, -ordering)
aux <- with(dats, match(sort(unique(filling)), filling))
ggplot(dats, aes(x = id, y = value, fill = interaction(-ordering, id))) +
geom_bar(stat = "identity", position = "stack") +
scale_fill_manual("Ordering", values = scales::hue_pal()(4)[factor(dats$filling)], # factor() so indexing also works when filling is character
labels = with(dats, filling[aux]),
breaks = with(dats, interaction(-ordering, id)[aux]))
Here I first rearrange the rows of dats as to avoid doing that later. Then aux is an auxiliary vector
aux
# [1] 3 2 1 8
giving arbitrary positions (one for each) where levels a, b, c, and d (in this order) appear in dats, which again is useful later. Then I simply set corresponding scale values, labels, and breaks... Lastly, I use scales::hue_pal to recover the original color palette.
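To make the underlying conflict concrete, here is a minimal base-R sketch using the data from the question (the names global_levels and pair_levels are only for illustration). It shows that reorder() yields one global level order shared by every bar, whereas interaction() gives each (ordering, id) pair its own level, which is what lets every bar stack independently.

```r
# Same data as in the question
dats <- data.frame(
  id       = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  value    = c(9, 6, 4, 5, 6, 4, 3, 4, 5),
  ordering = c(1, 2, 3, 2, 3, 1, 3, 2, 4),
  filling  = c("a", "b", "c", "b", "a", "a", "c", "d", "b")
)

# reorder() sorts the levels by the mean of -ordering over *all* rows,
# so there is a single order that every bar has to share
global_levels <- levels(reorder(dats$filling, -dats$ordering))
global_levels

# interaction() creates one level per (ordering, id) pair instead
pair_levels <- levels(droplevels(interaction(-dats$ordering, dats$id)))
length(pair_levels)
```

With one level per pair, the legend would show nine entries, which is why the scale_fill_manual() call above maps them back to the four filling values.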

The problem here is that the element filling = d only appears in the third group, with a low value. One solution could be to fill the non-present values with 0:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(ggplot2)
dats <- data.frame(id = c(1,1,1,1,2,2,2,2,3,3,3,3),
value = c(9,6,4,0,5,6,0,0,4,3,4,5),
ordering = c(1,2,3,5,2,3,5,5,1,3,2,4),
filling = c('a','b','c','d','b','a','c','d','a','c','d','b')) %>% arrange(id,ordering)
ggplot(dats,aes(x = id,
y = value,
fill = reorder(filling,-ordering))) +
geom_bar(stat = "identity",position = "stack") +
guides(fill=guide_legend("ordering"))
Created on 2018-12-03 by the reprex package (v0.2.1)

Related

R ggplot: Modifying aesthetics of individual lines without recreating entire color palette

In ggplot, is there any simple way of overriding the line attributes of a single group (or a few groups) without having to specify the entire color/line palette via scale_*_manual()?
In the example below, I basically want to make all the boot_* lines gray and skinny, while I want all other lines to retain the default colors/widths otherwise being used.
I know there are a lot of brute-force ways of doing this with some combination of a) creating auxiliary variables in the data frame, based on the string pattern, that will serve as my color/size group, then b) generating the plot below, extracting all the color-layer info, and filling out an entire scale_color_manual() and scale_size_manual() map, and c) replacing the 'boot_*' values with "grey".
Are there any versatile shortcuts here?
library(dplyr)
library(ggplot2)
library(tidyr) # for pivot_longer()
set.seed(231)
df=tibble(time=c(1:5), actual=2*time+3, estimate = actual+rnorm(length(actual)))
for(i in 1:8){
df[paste('boot_', i, sep='')] = df$estimate + rnorm(nrow(df))
}
> head(df) %>% data.frame
# time actual estimate boot_1 boot_2 boot_3 boot_4 boot_5 boot_6 boot_7
# 1 1 5 4.466898 4.684295 4.240585 4.786520 5.904332 4.862498 2.092772 4.595850
# 2 2 7 4.688336 4.751258 6.074914 5.694181 3.445036 4.639329 4.548511 5.453597
# 3 3 9 8.045802 7.167972 6.858666 7.519752 7.721405 7.801243 10.156436 9.521482
# 4 4 11 11.262516 11.826206 10.682760 11.137814 11.252465 11.452442 11.925339 11.754248
# 5 5 13 12.526643 12.492315 13.927974 14.176896 11.924183 12.950479 11.257865 13.430229
# boot_8
# 1 3.987001
# 2 3.813539
# 3 7.549984
# 4 11.482360
# 5 11.645106
# Melt for ggplot compatibility
df_long = df %>%
pivot_longer(cols=(-time))
head(df_long) %>% data.frame
# time name value
# 1 1 actual 5.000000
# 2 1 estimate 4.466898
# 3 1 boot_1 4.684295
# 4 1 boot_2 4.240585
# 5 1 boot_3 4.786520
# 6 1 boot_4 5.904332
## The basic ggplot
df_long %>%
ggplot(aes(x=time, y=value, color=name)) + geom_line()
You could just use the first four characters of name for the colour aesthetic (using substr), and the full name as a group aesthetic. It's a bit hacky but it's short, effective, and all gets done in the plotting code without extra data wrangling, post-hoc changes or a long vector of colour mappings.
df_long %>%
ggplot(aes(x = time, y = value, color = substr(name, 1, 4), group = name)) +
geom_line() +
scale_color_manual(labels = c("actual", "boot", "estimate"),
values = c("orange", "gray", "blue3"), name = "name")
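As a small illustration of why three labels and three colours are enough here, this base-R sketch rebuilds the series names from the question and shows the keys that substr() produces (the variable names series and colour_key are hypothetical):

```r
# Series names as created in the question
series <- c("actual", "estimate", paste0("boot_", 1:8))

# The first four characters collapse all boot_* series into one key
colour_key <- substr(series, 1, 4)
keys <- sort(unique(colour_key))
keys
```

The labels and values passed to scale_color_manual() are matched to these keys in sorted order, so "gray" lands on the shared "boot" key.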
An alternative is using filtering to have two sets of lines: one coloured, and one merely grouped. This has the benefit that you don't need to add any scale calls at all:
df_long %>%
filter(!grepl("boot", name)) %>%
ggplot(aes(x = time, y = value, color = name)) +
geom_line(data = filter(df_long, grepl("boot", name)),
aes(group = name), color = "gray", size = 0.3) +
geom_line()
EDIT
It's pretty difficult to specify an aesthetic mapping for only a single group (or a few groups) while leaving the others at their default values. However, it is possible using ggnewscale. Here we only have to specify the color of the boot group:
library(ggnewscale)
df_long %>%
filter(!grepl("boot", name)) %>%
ggplot(aes(x = time, y = value)) +
new_scale_color() +
geom_line(aes(color = name)) +
scale_color_discrete(name = "Variable") +
new_scale_color() +
geom_line(data = filter(df_long, grepl("boot", name)),
aes(group = name, color = "boot"), size = 0.3) +
scale_color_manual(values = "gray", name = "") +
theme(legend.margin = margin(-28, 10, 0, 0))
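Another option, sketched here under the assumption that the default discrete colours come from scales::hue_pal() and are assigned to the names in alphabetical order, is to build the full palette programmatically and overwrite only the boot_* entries. The names series, lv and pal are hypothetical.

```r
library(scales)  # hue_pal() generates evenly spaced hue colours

series <- c("actual", "estimate", paste0("boot_", 1:8))
lv <- sort(unique(series))

# One colour per series, named so the scale can match by name
pal <- setNames(hue_pal()(length(lv)), lv)

# Override only the bootstrap series
pal[grepl("^boot_", names(pal))] <- "gray"
pal
```

The named vector could then be passed as scale_color_manual(values = pal). Note that the colours kept for actual and estimate are two of ten hues, so they will differ from what a two-series plot would show.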

Reformat label / preserve order of Multi-factor facets in ggplot2::facet_wrap() based on factor level

I wish to facet a graph based on two factors, rename the facets using a combination of the two facet factor values, but preserve the order of the facets based on the levels in the original factors.
The data looks something like this:
library(tidyverse)
set.seed(100)
tmp.d <- data.frame(
sector = factor(rep(c("B","A"),c(6,3)), levels = c("B","A")),
subsector = factor(rep(c("a","b","c"), each = 3), levels = c("c","b","a")),
year = factor(rep(2020:2022,3)),
value = sample(8:15,9, replace = TRUE)
)
#> tmp.d
# sector subsector year value
#1 B a 2020 9
#2 B a 2021 14
#3 B a 2022 13
#4 B b 2020 15
#5 B b 2021 10
#6 B b 2022 8
#7 A c 2020 9
#8 A c 2021 13
#9 A c 2022 11
Which is plotted and faceted by sector and subsector...
ggplot(tmp.d, aes(x = year, y = value, group = 1)) +
geom_path()+
facet_wrap(facets = list("sector","subsector"))
...and looks like this:
Notice that the facets keep the order set by the factor levels of "sector" and "subsector." This is desirable.
However, instead of listing the sector and sub sector on separate lines, I want the facet labels to read "[sector]: [subsector]" as in "B: b".
Attempt 1:
Adding a helper column to tmp.d, containing the facet labels.
tmp.d <- tmp.d %>% mutate(label = factor(paste0(sector, ": ", subsector)))
ggplot(tmp.d, aes(x = year, y = value, group = 1)) +
geom_path()+
facet_wrap(facets = list("label"))
Which yields:
Here, the facet labels are correct, but I've lost the order from the sector/subsector factor levels.
Attempt 2:
I think the answer may lie in a custom as_labeller function, or perhaps in changing the settings of an existing labeller such as label_value, which has a multi_line = [bool] argument that controls whether the facet values appear on a single line or on multiple lines. Other members of the label_ family have another argument, sep =, which I believe controls how the values are separated when they are on the same line. Presumably, the combination of ...multi_line = FALSE, sep = ": "... might format the label and preserve the desired order.
The labeller is applied in the call to facet_wrap().
ggplot(tmp.d, aes(x = year, y = value, group = 1)) +
geom_path()+
facet_wrap(facets = list("sector","subsector"), labeller = [the labeller function])
Setting the labeller to an existing labeller function without changing default settings (see below) yields the same output as my original attempt above.
...
facet_wrap(facets = list("sector","subsector"), labeller = label_value)
...
Attempting to change the attribute values for label_value like so...
...
facet_wrap(facets = list("sector","subsector"), labeller = label_value(multi_line = FALSE))
...
... does not work because the label_value function requires a labels argument that I do not know how to provide. Passing the facet factors as names or as character strings (either as a list or vector) does not appear to work. The examples I found in the documentation or elsewhere use facet_grid instead of facet_wrap, and the labels are provided as a formula like ~sector+subsector, which I assume is treated like a grid/matrix where sectors are columns and subsectors are rows. In my case, most (but not necessarily all) combinations of sector/subsector will be unique (i.e., Sectors A and B do not share subsectors).
Question Summary
Is there a simple way to achieve my objectives (restated for convenience):
facet on two factor variables (facet_wrap, not facet_grid)
preserve facet order based on factor levels
reformat the facet label to a single line with sector and subsector separated by a colon
Thanks,
Wow, that was a lot trickier than I expected... One solution would be to combine them into a different field:
tmp.d |>
arrange(sector, subsector) |> # arrange by factor levels
mutate(
facet =
paste0(sector, ": ", subsector) |>
fct_inorder(ordered = TRUE) # use that order for the new field
) |>
ggplot(aes(x = year, y = value, group = 1)) +
geom_path()+
facet_wrap(facets = ~facet) # here
This also works if a ", " is acceptable:
ggplot(tmp.d, aes(x = year, y = value, group = 1)) +
geom_path()+
facet_wrap(
facets = sector~subsector,
labeller =
labeller( # here
sector = label_value, #
subsector = label_value, #
.multi_line = FALSE #
)
)
A similar thing can be done with purrr::partial(), which substitutes out defaults, but again you get a comma. I think it would be worth creating an issue on their GitHub page to add a sep argument to the label_*() functions.
... +
facet_wrap(
facets = sector~subsector,
labeller = purrr::partial(label_value, multi_line = FALSE)
)
Update: Meanwhile yake84 already finished the answer:
To automate just add fct_inorder.. after using arrange:
tmp.d %>%
arrange(sector, subsector) %>%
mutate(my_label = paste(sector,subsector, sep=":") %>%
fct_inorder(ordered = TRUE)) %>%
ggplot(aes(x = year, y = value, group = 1)) +
geom_path()+
facet_wrap( ~ my_label)
First answer:
Just transform your label in attempt 1 to factor and define the levels:
library(tidyverse)
tmp.d %>%
mutate(my_label = paste(sector,subsector, sep=":")) %>%
mutate(my_label = factor(my_label, levels = c("B:b", "B:a", "A:c"))) %>%
ggplot(aes(x = year, y = value, group = 1)) +
geom_path()+
facet_wrap( ~ my_label)
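For reference, the level order can also be derived without arrange() or fct_inorder(). This is only a base-R sketch of the same idea (the names lab, ord and label are hypothetical): order() on factors sorts by their level codes, so the combined labels inherit the order of the original factor levels.

```r
tmp.d <- data.frame(
  sector    = factor(rep(c("B", "A"), c(6, 3)), levels = c("B", "A")),
  subsector = factor(rep(c("a", "b", "c"), each = 3), levels = c("c", "b", "a"))
)

lab <- paste0(tmp.d$sector, ": ", tmp.d$subsector)

# A plain factor() sorts the labels alphabetically, losing the intended order
alphabetical <- levels(factor(lab))

# order() on factors uses the level codes, so this follows sector, then subsector
ord   <- order(tmp.d$sector, tmp.d$subsector)
label <- factor(lab, levels = unique(lab[ord]))
levels(label)
```

The resulting label column can be used in facet_wrap(~ label) in the same way as my_label above.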

geom_bar overlapping labels

For simplicity, let's suppose we have a dataset like
# A
1 1
2 2
3 2
4 2
5 3
We have a categorical variable "A" with 3 possible values (1, 2, 3), and I'm trying this code:
ggplot(df, aes(x="", y=df$A, fill=A))+
geom_bar(width = 1, stat = "identity")
The problem is that the labels are overlapping. I also want to change the labels for 1, 2, 3 to x, y, z.
Here is a picture of what is happening.
And here is a link to the actual data that I'm using.
https://a.uguu.se/anKhhyEv5b7W_Data.csv
Your graph does not correspond to the sample of data you are showing, so it is hard to be sure that the structure of your real data is actually the same.
Using a random example, I get the following plot:
df <- data.frame(A = sample(1:3,20, replace = TRUE))
library(ggplot2)
ggplot(df, aes(x="A", y=A, fill=as.factor(A)))+
geom_bar(width = 1, stat = "identity") +
scale_fill_discrete(labels = c("x","y","z"))
EDIT: Using data provided by the OP
Here using your data, you should get the following plot:
ggplot(df, aes(x = "A",y = A, fill = as.factor(A)))+
geom_col()
Or if you want the count of each individual values of A, you can do:
library(dplyr)
library(ggplot2)
df %>% group_by(A) %>% count() %>%
ggplot(aes(x = "A", y = n, fill = as.factor(A)))+
geom_col()
Is this what you are looking for?
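If it helps, the counting and relabelling steps can also be sketched in base R on the five-row sample from the question. The column names n and label are illustrative assumptions, not something from the original data.

```r
# The small sample shown in the question
df <- data.frame(A = c(1, 2, 2, 2, 3))

# Count how often each value of A occurs
counts <- as.data.frame(table(A = df$A), responseName = "n")

# Map the values 1, 2, 3 to the labels x, y, z
counts$label <- c("x", "y", "z")[as.integer(counts$A)]
counts
```

A table like counts has one row per category, so it can go straight into geom_col() with y = n and fill = label.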

R ggplot facet_grid multi boxplot

Using ggplot and facet_grid, I'd like to visualize two parallel vectors of values through box plots. My available data:
DF <- data.frame("value" = runif(50, 0, 1),
"value2" = runif(50,0,1),
"type1" = c(rep("AAAAAAAAAAAAAAAAAAAAAA", 25),
rep("BBBBBBBBBBBBBBBBB", 25)),
"type2" = rep(c("c", "d"), 25),
"number" = rep(2:6, 10))
At the moment the code only visualizes one vector of values:
ggplot(DF, aes(y=value, x=type1)) +
geom_boxplot(alpha=.3, aes(fill = type1)) +
ggtitle("TITLE") +
facet_grid(type2 ~ number) +
scale_x_discrete(name = NULL, breaks = NULL) + # these lines are optional
theme(legend.position = "bottom")
This is my plot at the moment.
I'd like to show parallel box plots, one for each vector (value and value2 in the data frame). That is, for each colored box plot I'd like to have two box plots: one for value and another for value2.
I think there's likely a post that already addresses it, in addition to the one I linked to above. But this is a problem of two things: 1) getting data into the format that ggplot expects, i.e. long-shaped so there are values to map onto aesthetics, and 2) separation of concerns, in that you can use reshape2 or (more up-to-date) tidyr functions to get data into the proper shape, and ggplot2 functions to plot it.
You can use tidyr::gather for getting long data, and conveniently pipe it directly into ggplot.
library(tidyverse)
...
To illustrate, though with very generic column names:
DF %>%
gather(key, value = val, value, value2) %>%
head()
#> type1 type2 number key val
#> 1 AAAAAAAAAAAAAAAAAAAAAA c 2 value 0.5075600
#> 2 AAAAAAAAAAAAAAAAAAAAAA d 3 value 0.6472347
#> 3 AAAAAAAAAAAAAAAAAAAAAA c 4 value 0.7543778
#> 4 AAAAAAAAAAAAAAAAAAAAAA d 5 value 0.7215786
#> 5 AAAAAAAAAAAAAAAAAAAAAA c 6 value 0.1529630
#> 6 AAAAAAAAAAAAAAAAAAAAAA d 2 value 0.8779413
Pipe that directly into ggplot:
DF %>%
gather(key, value = val, value, value2) %>%
ggplot(aes(x = key, y = val, fill = type1)) +
geom_boxplot() +
facet_grid(type2 ~ number) +
theme(legend.position = "bottom")
Again, because of some of the generic column names, I'm not entirely sure this is the setup you want—like I don't know the difference in value / value2 vs AAAAAAA / BBBBBBB. You might need to swap aes assignments around accordingly.
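In current tidyr the same reshape is usually written with pivot_longer(). Here is a minimal sketch of that variant, with a fixed seed and shortened type1 labels that are my own assumptions for brevity:

```r
library(tidyr)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
DF <- data.frame(value  = runif(50, 0, 1),
                 value2 = runif(50, 0, 1),
                 type1  = rep(c("AAAA", "BBBB"), each = 25),  # shortened labels
                 type2  = rep(c("c", "d"), 25),
                 number = rep(2:6, 10))

# Stack value and value2 into one column, with key recording the source column
long <- pivot_longer(DF,
                     cols      = c(value, value2),
                     names_to  = "key",
                     values_to = "val")
dim(long)
```

The result has the same columns as the gather() output above, although the row order differs, and it can be piped into the same ggplot() call.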
You have to reshape your data frame. Add an indicator column that defines the type of value (for example "value_type") and keep only one value column. The indicator then matches each value to its value type. The following code builds your example in that shape:
DF <- data.frame("value" = c(runif(50, 0, 1), runif(50,0,1)),
"value_type" = rep(c("value1","value2"), each=50),
"type1" = rep(c(rep("AAAAAAAAAAAAAAAAAAAAAA", 25),
rep("BBBBBBBBBBBBBBBBB", 25)), 2),
"type2" = rep(rep(c("c", "d"), 25), 2),
"number" = rep(rep(2:6, 10),2))
Then use ggplot with an additional color aesthetic:
ggplot(DF, aes(y=value, x=type1, col=value_type)) +
geom_boxplot(alpha=.3, aes(fill = type1)) +
ggtitle("TITLE") +
facet_grid(type2 ~ number) +
scale_color_manual(values=c("green", "steelblue")) + # set the colors of the value types manually
scale_x_discrete(name = NULL, breaks = NULL) +# these lines are optional
theme(legend.position = "bottom")

ordering of labels with bar and point plot

I have a bar plot made with geom_bar() that I'd like to overlay with points using geom_point(). The issue is the ordering of the axis labels. I have 2 groups: group A, which I want to show with geom_bar() ordered from high to low, and group B, which I want to show as points using geom_point(). Groups A and B will not always have the same categories, but I always want group A shown with bars and ordered from high to low.
If you run this code you will see just the bar plot, correctly ordered. I need the pet supercategory shown first and then the car supercategory. I have defined supercategory as an ordered factor and it is working.
Then, within each supercategory, the bars are sorted by group A's value from high to low. You can see that dog is higher than the others in the pet category and kia is higher than the others in the car category.
library(dplyr)
group = c("A","A","A","B","B","B","A","A","A","B","B","B")
supercategory = c("pet", "pet","pet","pet","pet","pet","car","car","car","car","car","car")
category = c("bird","cat","dog","bird","cat","lizard","ford","chevy","kia","kia","toyota","ford")
supercategory = factor(supercategory, levels= c("pet", "car"), ordered = TRUE)
value=c(3,4,5,4,5,6,1,3,10,8,3,5)
dat = data.frame(group = group,supercategory = supercategory, category = category, value = value )
dat = dat %>% mutate(LABEL = paste0(supercategory, "-",category), HIGH_VALUE = ifelse(group =="A",value,0)) %>%
arrange(supercategory, -HIGH_VALUE)
# after the lines above the data is ordered correctly: first by supercategory, then by group A's value from highest to lowest using the HIGH_VALUE field
dat$ROW_NUMBER = 1:nrow(dat)
dat = dat %>% group_by(supercategory,category) %>% mutate(ROW_NUMBER2= min(ROW_NUMBER)) %>% arrange( supercategory ,ROW_NUMBER2)
# after the 2 lines above the data is sorted by ROW_NUMBER2, which orders the category within supercategory.
# Group A will be shown as bars using geom_bar
# Group B will be displayed with points using geom_point
# The bars and points should be in the order of ROW_NUMBER2
library(ggplot2)
dat$LABEL = factor(dat$LABEL, levels = unique(dat$LABEL), ordered = TRUE)
ggplot(dat[dat$group=="A",] , aes(x = LABEL, y = value))+
geom_bar(stat="identity")
I'd like to keep the ordering of the plot above and just add the points above the bars. And if Group B has a category that is not one of Group A's the point should be to the right of Group A's last bar within whatever supercategory it is in.
But when I try to add the points the ordering gets messed up. Run this code which just adds group B's data as points and you will see the order of the labels gets messed up.
library(ggplot2)
dat$LABEL = factor(dat$LABEL, levels = unique(dat$LABEL), ordered = TRUE)
ggplot(dat[dat$group=="A",] , aes(x = LABEL, y = value))+
geom_bar(stat="identity") +
geom_point(data = dat[dat$group=="B",], aes(x = LABEL, y = value), shape=15, size = 3, color = "blue" )
How can I add this line to the plot:
geom_point(data = dat[dat$group=="B",], aes(x = LABEL, y = value), shape=15, size = 3, color = "blue" )
while keeping group A's ordering?
The groups do not share the same set of values, so you have to force the x-axis order by adding:
+ scale_x_discrete(limits=dat$LABEL)
Then:
ggplot(data = dat , aes(x = LABEL, y = value) ) +
geom_bar(data = dat[dat$group=="A",], stat="identity") +
geom_point(data = dat[dat$group=="B",], shape=15, size = 3, color = "blue") +
scale_x_discrete(limits=dat$LABEL)
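The limits can also be computed explicitly as a plain character vector. This base-R sketch rebuilds the data from the question and derives the order "group A from high to low, then labels that only occur in group B" within each supercategory; the names a_value, labs, score and axis_limits are hypothetical.

```r
dat <- data.frame(
  group = rep(rep(c("A", "B"), each = 3), 2),
  supercategory = factor(rep(c("pet", "car"), each = 6), levels = c("pet", "car")),
  category = c("bird", "cat", "dog", "bird", "cat", "lizard",
               "ford", "chevy", "kia", "kia", "toyota", "ford"),
  value = c(3, 4, 5, 4, 5, 6, 1, 3, 10, 8, 3, 5)
)
dat$LABEL <- paste0(dat$supercategory, "-", dat$category)

# Group A's value for each label; labels that only occur in group B get -Inf
a_value <- with(dat[dat$group == "A", ], setNames(value, LABEL))
labs <- unique(dat[, c("supercategory", "LABEL")])
labs$score <- unname(a_value[labs$LABEL])
labs$score[is.na(labs$score)] <- -Inf

# Sort by supercategory, then by group A's value from high to low
labs <- labs[order(labs$supercategory, -labs$score), ]
axis_limits <- labs$LABEL
axis_limits
```

Passing this vector as scale_x_discrete(limits = axis_limits) should give the same axis order as above, without relying on the row order of dat.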
I agree with @Cédric Miachon.
The problem comes from the layers using different x values.
A possible way to change the behaviour is to introduce NAs for the x values that are not present:
require(reshape2)
require(dplyr)
require(tidyr)
vector_f <- unique(dat$LABEL)
dat1 <- dat %>%
dcast(group+supercategory~LABEL, value.var = 'value') %>% #casting and gathering
gather(label, value , 3:10)
ggplot() +
geom_bar(data = dat1[dat1$group=="A",],aes(x = factor(label, levels = vector_f), y = value), stat="identity") +
geom_point(data = dat1[dat1$group=="B",], aes(x = factor(label, levels = vector_f), y = value))
##I removed some of the geom_point layout specs
