Using ggplot and facet_grid, I'd like to visualize two parallel vectors of values as box plots. My available data:
DF <- data.frame("value" = runif(50, 0, 1),
"value2" = runif(50,0,1),
"type1" = c(rep("AAAAAAAAAAAAAAAAAAAAAA", 25),
rep("BBBBBBBBBBBBBBBBB", 25)),
"type2" = rep(c("c", "d"), 25),
"number" = rep(2:6, 10))
At the moment, my code can only visualize one vector of values:
ggplot(DF, aes(y=value, x=type1)) +
geom_boxplot(alpha=.3, aes(fill = type1)) +
ggtitle("TITLE") +
facet_grid(type2 ~ number) +
scale_x_discrete(name = NULL, breaks = NULL) + # these lines are optional
theme(legend.position = "bottom")
This is my plot at the moment.
I'd like to visualize parallel box plots, one for each vector (value and value2 in the data frame). That is, for each colored box plot, I'd like to have two box plots: one for value and another one for value2.
I think there's likely a post that already addresses it, in addition to the one I linked to above. But this is a problem of two things: 1) getting data into the format that ggplot expects, i.e. long-shaped so there are values to map onto aesthetics, and 2) separation of concerns, in that you can use reshape2 or (more up-to-date) tidyr functions to get data into the proper shape, and ggplot2 functions to plot it.
You can use tidyr::gather for getting long data, and conveniently pipe it directly into ggplot.
library(tidyverse)
...
To illustrate, though with very generic column names:
DF %>%
gather(key, value = val, value, value2) %>%
head()
#> type1 type2 number key val
#> 1 AAAAAAAAAAAAAAAAAAAAAA c 2 value 0.5075600
#> 2 AAAAAAAAAAAAAAAAAAAAAA d 3 value 0.6472347
#> 3 AAAAAAAAAAAAAAAAAAAAAA c 4 value 0.7543778
#> 4 AAAAAAAAAAAAAAAAAAAAAA d 5 value 0.7215786
#> 5 AAAAAAAAAAAAAAAAAAAAAA c 6 value 0.1529630
#> 6 AAAAAAAAAAAAAAAAAAAAAA d 2 value 0.8779413
Pipe that directly into ggplot:
DF %>%
gather(key, value = val, value, value2) %>%
ggplot(aes(x = key, y = val, fill = type1)) +
geom_boxplot() +
facet_grid(type2 ~ number) +
theme(legend.position = "bottom")
Again, because of the generic column names, I'm not entirely sure this is the setup you want; for instance, I don't know how value / value2 relate to AAAAAAA / BBBBBBB. You might need to swap the aes assignments around accordingly.
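Since gather() is superseded in current tidyr, the same reshaping can be sketched with pivot_longer(). This is only an illustration: the shortened type1 labels, the seed, and the choice to put type1 on the x-axis with the value/value2 key as fill (so that each type1 gets two side-by-side boxes) are my assumptions about the layout being asked for.

```r
library(tidyr)
library(ggplot2)

set.seed(1)  # assumed seed, only to make the sketch reproducible
DF <- data.frame(
  value  = runif(50, 0, 1),
  value2 = runif(50, 0, 1),
  type1  = rep(c("AAAA", "BBBB"), each = 25),  # shortened labels (assumption)
  type2  = rep(c("c", "d"), 25),
  number = rep(2:6, 10)
)

# Long format: one row per (observation, value column) pair
DF_long <- pivot_longer(
  DF,
  cols      = c(value, value2),
  names_to  = "key",
  values_to = "val"
)

# type1 on the x-axis, key as fill: boxes for value and value2 are dodged
p <- ggplot(DF_long, aes(x = type1, y = val, fill = key)) +
  geom_boxplot(alpha = 0.3) +
  facet_grid(type2 ~ number) +
  theme(legend.position = "bottom")
p
```

Swapping x and fill gives the layout from the gather() version above, so either mapping can be tried against the real column meanings.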
You have to reshape your data frame. Add an indicator column that defines the type of value (for example "value_type") and keep only one value column. The indicator then matches each value to its value type. The following code builds your example data in that shape:
DF <- data.frame("value" = c(runif(50, 0, 1), runif(50,0,1)),
"value_type" = rep(c("value1","value2"), each=50),
"type1" = rep(c(rep("AAAAAAAAAAAAAAAAAAAAAA", 25),
rep("BBBBBBBBBBBBBBBBB", 25)), 2),
"type2" = rep(rep(c("c", "d"), 25), 2),
"number" = rep(rep(2:6, 10),2))
Then use ggplot with an additional color aesthetic:
ggplot(DF, aes(y=value, x=type1, col=value_type)) +
geom_boxplot(alpha=.3, aes(fill = type1)) +
ggtitle("TITLE") +
facet_grid(type2 ~ number) +
scale_color_manual(values=c("green", "steelblue")) + # set the colors of the value types manually
scale_x_discrete(name = NULL, breaks = NULL) +# these lines are optional
theme(legend.position = "bottom")
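The data frame above is generated from scratch with new random numbers. If the values already sit in two columns, a base R sketch along the following lines keeps the existing numbers and only changes the shape. The names id_cols and DF_long, the seed, and the shortened type1 labels are illustrative assumptions.

```r
set.seed(1)  # assumed seed, only to make the sketch reproducible
DF <- data.frame(
  value  = runif(50, 0, 1),
  value2 = runif(50, 0, 1),
  type1  = rep(c("AAAA", "BBBB"), each = 25),  # shortened labels (assumption)
  type2  = rep(c("c", "d"), 25),
  number = rep(2:6, 10)
)

# Columns that identify an observation and are repeated for each value type
id_cols <- c("type1", "type2", "number")

# Stack the two value columns and tag each block with its origin
DF_long <- rbind(
  data.frame(DF[id_cols], value_type = "value1", value = DF$value),
  data.frame(DF[id_cols], value_type = "value2", value = DF$value2)
)

str(DF_long)
```

DF_long can then be passed to the ggplot call above in place of DF.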
In ggplot, is there a simple way to override the line attributes of a single group (or a few groups) without having to specify the entire color/line palette via scale_*_manual()?
In the example below, I basically want to make all the boot_* lines gray and skinny, while I want all other lines to retain the default colors/widths otherwise being used.
I know there are plenty of brute-force ways of doing this through some combination of a) creating auxiliary variables in the data frame, based on the string pattern, that serve as my color/size group, then b) generating the plot below, extracting all the color-layer info, and filling out an entire scale_color_manual() and scale_size_manual() map, and c) replacing the 'boot_*' values with "grey".
Are there any versatile shortcuts here?
library(dplyr)
library(tidyr)   # for pivot_longer()
library(ggplot2)
set.seed(231)
df=tibble(time=c(1:5), actual=2*time+3, estimate = actual+rnorm(length(actual)))
for(i in 1:8){
df[paste('boot_', i, sep='')] = df$estimate + rnorm(nrow(df))
}
> head(df) %>% data.frame
# time actual estimate boot_1 boot_2 boot_3 boot_4 boot_5 boot_6 boot_7
# 1 1 5 4.466898 4.684295 4.240585 4.786520 5.904332 4.862498 2.092772 4.595850
# 2 2 7 4.688336 4.751258 6.074914 5.694181 3.445036 4.639329 4.548511 5.453597
# 3 3 9 8.045802 7.167972 6.858666 7.519752 7.721405 7.801243 10.156436 9.521482
# 4 4 11 11.262516 11.826206 10.682760 11.137814 11.252465 11.452442 11.925339 11.754248
# 5 5 13 12.526643 12.492315 13.927974 14.176896 11.924183 12.950479 11.257865 13.430229
# boot_8
# 1 3.987001
# 2 3.813539
# 3 7.549984
# 4 11.482360
# 5 11.645106
# Melt for ggplot compatibility
df_long = df %>%
pivot_longer(cols=(-time))
head(df_long) %>% data.frame
# time name value
# 1 1 actual 5.000000
# 2 1 estimate 4.466898
# 3 1 boot_1 4.684295
# 4 1 boot_2 4.240585
# 5 1 boot_3 4.786520
# 6 1 boot_4 5.904332
## The basic ggplot
df_long %>%
ggplot(aes(x=time, y=value, color=name)) + geom_line()
You could just use the first four characters of name for the colour aesthetic (using substr), and the full name as a group aesthetic. It's a bit hacky but it's short, effective, and all gets done in the plotting code without extra data wrangling, post-hoc changes or a long vector of colour mappings.
df_long %>%
ggplot(aes(x = time, y = value, color = substr(name, 1, 4), group = name)) +
geom_line() +
scale_color_manual(labels = c("actual", "boot", "estimate"),
values = c("orange", "gray", "blue3"), name = "name")
An alternative is using filtering to have two sets of lines: one coloured, and one merely grouped. This has the benefit that you don't need to add any scale calls at all:
df_long %>%
filter(!grepl("boot", name)) %>%
ggplot(aes(x = time, y = value, color = name)) +
geom_line(data = filter(df_long, grepl("boot", name)),
aes(group = name), color = "gray", size = 0.3) +
geom_line()
EDIT
It's pretty difficult to specify an aesthetic mapping for only one group (or a few groups) while leaving the others at their default values. However, it is possible using ggnewscale. Here we only have to specify the color of the boot group:
library(ggnewscale)
df_long %>%
filter(!grepl("boot", name)) %>%
ggplot(aes(x = time, y = value)) +
new_scale_color() +
geom_line(aes(color = name)) +
scale_color_discrete(name = "Variable") +
new_scale_color() +
geom_line(data = filter(df_long, grepl("boot", name)),
aes(group = name, color = "boot"), size = 0.3) +
scale_color_manual(values = "gray", name = "") +
theme(legend.margin = margin(-28, 10, 0, 0))
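Another option, if a manual scale is acceptable as long as it does not have to be typed out by hand, is to build the named colour and width vectors programmatically. The sketch below is an assumption-laden illustration: it uses scales::hue_pal() to generate default-looking hues for the non-boot series only, the linewidth aesthetic (which assumes ggplot2 3.4.0 or later), and helper names (series, is_boot, cols, widths) that are mine, not from the question.

```r
library(tidyr)
library(ggplot2)
library(scales)

# Rebuild data shaped like the question's df_long
set.seed(231)
df <- data.frame(time = 1:5)
df$actual   <- 2 * df$time + 3
df$estimate <- df$actual + rnorm(5)
for (i in 1:8) df[[paste0("boot_", i)]] <- df$estimate + rnorm(5)
df_long <- pivot_longer(df, cols = -time)

# Named vectors: grey and thin for boot_*, generated hues and width 1 otherwise
series  <- unique(df_long$name)
is_boot <- grepl("^boot_", series)
cols    <- setNames(rep("grey70", length(series)), series)
cols[!is_boot] <- hue_pal()(sum(!is_boot))
widths  <- setNames(ifelse(is_boot, 0.3, 1), series)

p <- ggplot(df_long, aes(x = time, y = value, group = name,
                         color = name, linewidth = name)) +
  geom_line() +
  scale_color_manual(values = cols, breaks = series[!is_boot]) +
  scale_linewidth_manual(values = widths, guide = "none")
p
```

Because the vectors are derived from the names in the data, the same code should keep working when the number of boot_* columns changes; the breaks argument keeps the boot lines out of the legend.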
I wish to facet a graph based on two factors, rename the facets using a combination of the two facet factor values, but preserve the order of the facets based on the levels in the original factors.
The data looks something like this:
library(tidyverse)
set.seed(100)
tmp.d <- data.frame(
sector = factor(rep(c("B","A"),c(6,3)), levels = c("B","A")),
subsector = factor(rep(c("a","b","c"), each = 3), levels = c("c","b","a")),
year = factor(rep(2020:2022,3)),
value = sample(8:15,9, replace = TRUE)
)
#> tmp.d
# sector subsector year value
#1 B a 2020 9
#2 B a 2021 14
#3 B a 2022 13
#4 B b 2020 15
#5 B b 2021 10
#6 B b 2022 8
#7 A c 2020 9
#8 A c 2021 13
#9 A c 2022 11
Which is plotted and faceted by sector and subsector...
ggplot(tmp.d, aes(x = year, y = value, group = 1)) +
geom_path()+
facet_wrap(facets = list("sector","subsector"))
...and looks like this:
Notice that the facets keep the order set by the factor levels of "sector" and "subsector." This is desirable.
However, instead of listing the sector and subsector on separate lines, I want the facet labels to read "[sector]: [subsector]" as in "B: b".
Attempt 1:
Adding a helper column to tmp.d, containing the facet labels.
tmp.d <- tmp.d %>% mutate(label = factor(paste0(sector, ": ", subsector)))
ggplot(tmp.d, aes(x = year, y = value, group = 1)) +
geom_path()+
facet_wrap(facets = list("label"))
Which yields:
Here, the facet labels are correct, but I've lost the order from the sector/subsector factor levels.
Attempt 2:
I think the answer may lie in a custom as_labeller function, or perhaps in changing the settings of an existing labeller such as label_value, which has a multi_line = [bool] argument that controls whether the facet values appear on a single line or on multiple lines. Other members of the label_ family have another argument, sep =, which I believe controls how the values are separated on the same line. Presumably, the combination ...multi_line = FALSE, sep = ": "... might format the label and preserve the desired order.
The labeller is applied in the call to facet_wrap().
ggplot(tmp.d, aes(x = year, y = value, group = 1)) +
geom_path()+
facet_wrap(facets = list("sector","subsector"), labeller = [the labeller function])
Setting the labeller to an existing labeller function without changing default settings (see below) yields the same output as my original attempt above.
...
facet_wrap(facets = list("sector","subsector"), labeller = label_value)
...
Attempting to change the attribute values for label_value like so...
...
facet_wrap(facets = list("sector","subsector"), labeller = label_value(multi_line = FALSE))
...
... does not work because the label_value function requires a labels argument that I do not know how to provide. Passing the facet factors as names or as character strings (either as a list or as a vector) does not appear to work. The examples I found in the documentation and elsewhere use facet_grid instead of facet_wrap, and the facets are provided as a formula like ~sector+subsector, which I assume is treated like a grid/matrix where sectors are columns and subsectors are rows. In my case, most (but not necessarily all) combinations of sector/subsector will be unique (i.e., Sectors A and B do not share subsectors).
Question Summary
Is there a simple way to achieve my objectives (restated for convenience):
facet on two factor variables (facet_wrap, not facet_grid)
preserve facet order based on factor levels
reformat the facet label to a single line with sector and subsector separated by a colon
Thanks,
Wow, that was a lot trickier than I expected... One solution would be to combine them into a different field:
tmp.d |>
arrange(sector, subsector) |> # arrange by factor levels
mutate(
facet =
paste0(sector, ": ", subsector) |>
fct_inorder(ordered = TRUE) # use that order for the new field
) |>
ggplot(aes(x = year, y = value, group = 1)) +
geom_path()+
facet_wrap(facets = ~facet) # here
This also works if a ", " is acceptable:
ggplot(tmp.d, aes(x = year, y = value, group = 1)) +
geom_path()+
facet_wrap(
facets = sector~subsector,
labeller =
labeller( # here
sector = label_value, #
subsector = label_value, #
.multi_line = FALSE #
)
)
A similar thing can be done with purrr::partial(), which overrides the defaults, but again you get a comma. I think it would be worth creating an issue on their GitHub page to add a sep argument to the label_*() functions.
... +
facet_wrap(
facets = sector~subsector,
labeller = purrr::partial(label_value, multi_line = FALSE)
)
Update: Meanwhile yake84 already finished the answer:
To automate just add fct_inorder.. after using arrange:
tmp.d %>%
arrange(sector, subsector) %>%
mutate(my_label = paste(sector,subsector, sep=":") %>%
fct_inorder(ordered = TRUE)) %>%
ggplot(aes(x = year, y = value, group = 1)) +
geom_path()+
facet_wrap( ~ my_label)
First answer:
Just transform your label in attempt 1 to factor and define the levels:
library(tidyverse)
tmp.d %>%
mutate(my_label = paste(sector,subsector, sep=":")) %>%
mutate(my_label = factor(my_label, levels = c("B:b", "B:a", "A:c"))) %>%
ggplot(aes(x = year, y = value, group = 1)) +
geom_path()+
facet_wrap( ~ my_label)
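For comparison, the combined label can also be sketched in base R with interaction(), which builds the combined factor directly from the two existing factors. With lex.order = TRUE the first factor varies slowest, and drop = TRUE removes combinations that do not occur. The value column below copies the numbers printed in the question instead of calling sample(), and the column name label is an assumption.

```r
library(ggplot2)

tmp.d <- data.frame(
  sector    = factor(rep(c("B", "A"), c(6, 3)), levels = c("B", "A")),
  subsector = factor(rep(c("a", "b", "c"), each = 3), levels = c("c", "b", "a")),
  year      = factor(rep(2020:2022, 3)),
  value     = c(9, 14, 13, 15, 10, 8, 9, 13, 11)  # values as printed in the question
)

# Combined factor whose level order follows sector first, then subsector
tmp.d$label <- interaction(tmp.d$sector, tmp.d$subsector,
                           sep = ": ", lex.order = TRUE, drop = TRUE)
levels(tmp.d$label)
# "B: b" "B: a" "A: c"

p <- ggplot(tmp.d, aes(x = year, y = value, group = 1)) +
  geom_path() +
  facet_wrap(~ label)
p
```

This avoids the arrange() step, since the level order comes from the factor definitions, not from the row order.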
I'm trying to create some variation of a pareto-chart.
While working on the code I ran into a problem that I haven't been able to solve on my own for several hours. It concerns (1) the order in which ggplot2 draws the data and (2) renaming the axis labels accordingly.
(1) Since I want to create an ordered bar plot with a saturation curve, I created a dummy index variable, so my bars are sorted from high to low, as you can see in the output (1).
By working around this problem I created a second problem that I can't fix.
(2) I have a column in my df containing all the species I want to see on the x-axis. However, ggplot won't print them there. In fact, since I added the labeling command I don't get any labels on the x-axis at all, and I don't get an error either.
So my question is:
Is there a way to use my species list as the x-axis? (Remember that my data has to stay sorted from high to low.)
Or can someone easily spot a way to solve the labeling problem?
cheers
dfb
Beech id proc kommu Order
1 Va fla 1 8.749851 8.749851 Psocopt
2 Er 2 7.793812 16.543663 Acari
3 Faga dou 3 7.659406 24.203069 Dipt
4 Tro 4 6.675941 30.879010 Acari
5 Hal ann 5 6.289307 37.168317 Dipt
6 Stigm 6 3.724406 40.892723 Acari
7 Di fag 7 3.642574 44.535297 Lepidopt
8 Phyfa 8 3.390545 47.925842 Neoptera
9 Phylma 9 2.766040 50.691881 Lepidopt
data example:
structure(list(Beech = c("Va fla", "Er", "Faga dou", "Tro", "Hal ann",
"Stigm", "Di fag", "Phyfa", "Phylma"), id = c(1, 2, 3, 4, 5,
6, 7, 8, 9), proc = c(8.749851, 7.793812, 7.659406, 6.675941,
6.289307, 3.724406, 3.642574, 3.390545, 2.76604), kommu = c(8.749851,
16.543663, 24.203069, 30.87901, 37.168317, 40.892723, 44.535297,
47.925842, 50.691881), Order = c("Psocopt", "Acari", "Dipt",
"Acari", "Dipt", "Acari", "Lepidopt", "Neoptera", "Lepidopt")), row.names = c(NA,
-9L), class = c("tbl_df", "tbl", "data.frame"))
library(openxlsx)
library(ggplot2)
dfb <- data.xlsx ###(df containing different % values per species)
labelb <- dfb$Beech ###(list of 22 items; same number as x-values)
p <-ggplot(dfb, aes(x=id))
p <- p + geom_bar(aes(y = proc), stat = "identity", fill = "lightgreen")
p <- p + geom_line(aes(y = kommu/10), color = "orange", size = 2) + geom_point(aes(y = kommu/10),size = 2)
p <- p + scale_y_continuous(sec.axis = sec_axis(~.*10, name ="Total biocoenosis[%]"))
p <- p + labs(y = "Species [%]",
x = "Species")
p <- p + scale_x_discrete(labels = labelb)
p <- p + theme(legend.position = c(0.8, 0.9))
--> Answer to other comments:
So basically my problem is the bars are not labeled with a species name.
I know that this is a result due to my dummyvar, which is basically 1 to 22.
So I try to force ggplot to name the x-axis with my wanted values.
But this input doesn't work
p <- p + scale_x_discrete(labels = labelb)
But back to your suggestions:
Yeah, I tried the tidyverse right after creating this post but couldn't get it to work well enough. Your suggestion doesn't change anything for me; the result is the same as with my plain ggplot command.
dfb %>%
arrange(Beech) %>%
mutate(Beech = factor(Beech, levels = unique(.$Beech))) %>%
ggplot(aes(Beech, proc)) +
geom_col()
I can't quite tell from the picture what's going wrong, but one way to make sure your bar plots are in ascending/descending order is to arrange the column and then convert it to a factor using the existing order of the categories:
So, without ordering:
library(tidyverse)
diamonds %>%
group_by(cut) %>%
summarize(price = mean(price)) %>%
ggplot(aes(cut, price)) +
geom_bar(stat = "identity")
And with ordering:
diamonds %>%
group_by(cut) %>%
summarize(price = mean(price)) %>%
arrange(price) %>%
mutate(cut = factor(cut, levels = unique(.$cut))) %>%
ggplot(aes(cut, price)) +
geom_bar(stat = "identity")
I edited your code with the database sample you provided and I think I was able to do what you wanted.
Basically, I sorted Beech by descending proc and then converted it to a factor. Here is the modified code and the result:
p <-
dfb %>%
arrange(desc(proc)) %>%
mutate(Beech = factor(Beech, levels = unique(.$Beech))) %>%
ggplot(aes(Beech)) +
geom_bar(aes(y = proc), stat = "identity", fill = "lightgreen") +
geom_line(aes(y = kommu/10, x=as.integer(Beech)), color = "orange", size = 2) +
geom_point(aes(y = kommu/10),size = 2) +
labs(y = "Species [%]", x = "Species") +
scale_x_discrete("Species") +
scale_y_continuous(sec.axis = sec_axis(~.*10, name ="Total biocoenosis[%]")) +
theme(legend.position = c(0.8, 0.9))
p
Note: I had to tweak geom_line a bit by adding x=as.integer(Beech), because it works with numbers and not factors.
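A shorter variant of the same idea can be sketched with base R's reorder(), which sets the factor levels from a second variable, and with group = 1 on the line layer so that the discrete x-axis does not split the line into one group per species. This is an illustration only: it uses the nine sample rows from the question, and the linewidth argument assumes ggplot2 3.4.0 or later.

```r
library(ggplot2)

dfb <- data.frame(
  Beech = c("Va fla", "Er", "Faga dou", "Tro", "Hal ann",
            "Stigm", "Di fag", "Phyfa", "Phylma"),
  proc  = c(8.749851, 7.793812, 7.659406, 6.675941, 6.289307,
            3.724406, 3.642574, 3.390545, 2.76604),
  kommu = c(8.749851, 16.543663, 24.203069, 30.87901, 37.168317,
            40.892723, 44.535297, 47.925842, 50.691881)
)

# Order the species by descending proc; the names become the axis labels
dfb$Beech <- reorder(dfb$Beech, -dfb$proc)

p <- ggplot(dfb, aes(x = Beech)) +
  geom_col(aes(y = proc), fill = "lightgreen") +
  # group = 1 joins all points into a single line on the discrete axis
  geom_line(aes(y = kommu / 10, group = 1), color = "orange", linewidth = 2) +
  geom_point(aes(y = kommu / 10), size = 2) +
  scale_y_continuous(sec.axis = sec_axis(~ . * 10, name = "Total biocoenosis [%]")) +
  labs(x = "Species", y = "Species [%]")
p
```

With the species as a factor on the x-axis, no separate scale_x_discrete(labels = ...) call is needed.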
For simplicity, let's suppose we have a data set like:
# A
1 1
2 2
3 2
4 2
5 3
We have a categorical variable "A" with 3 possible values (1, 2, 3), and I'm trying this code:
ggplot(df, aes(x="", y=df$A, fill=A))+
geom_bar(width = 1, stat = "identity")
The problem is that the labels are overlapping. Also, I want to change the labels for 1, 2, 3 to x, y, z.
Here is a picture of what is happening.
And here is a link to the actual data that I'm using:
https://a.uguu.se/anKhhyEv5b7W_Data.csv
Your graph does not correspond to the sample of data you are showing, so it is hard to be sure that the structure of your real data is actually the same.
Using a random example, I get the following plot:
df <- data.frame(A = sample(1:3,20, replace = TRUE))
library(ggplot2)
ggplot(df, aes(x="A", y=A, fill=as.factor(A)))+
geom_bar(width = 1, stat = "identity") +
scale_fill_discrete(labels = c("x","y","z"))
EDIT: Using data provided by the OP
Here using your data, you should get the following plot:
ggplot(df, aes(x = "A",y = A, fill = as.factor(A)))+
geom_col()
Or, if you want the count of each individual value of A, you can do:
library(dplyr)
library(ggplot2)
df %>% group_by(A) %>% count() %>%
ggplot(aes(x = "A", y = n, fill = as.factor(A)))+
geom_col()
Is this what you are looking for?
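The relabelling and the counting can also be combined by recoding A as a labelled factor and letting geom_bar() do the counting. This is a sketch on the five-row sample from the question; the column name A_label is an assumption.

```r
library(ggplot2)

df <- data.frame(A = c(1, 2, 2, 2, 3))

# Recode 1/2/3 as x/y/z; the factor labels become the legend entries
df$A_label <- factor(df$A, levels = 1:3, labels = c("x", "y", "z"))

# geom_bar() counts the rows per category, so no y aesthetic is needed
p <- ggplot(df, aes(x = "A", fill = A_label)) +
  geom_bar(width = 1) +
  labs(x = NULL, fill = "A")
p

counts <- table(df$A_label)
counts
```

Recoding in the data keeps the labels tied to the values, whereas scale_fill_discrete(labels = ...) relies on the order of the levels.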
I need help on setting the individual x-axis limits on different facets as described below.
A programmatical approach is preferred since I will apply the same template to different data sets.
first two facets will have the same x-axis limits (to have comparable bars)
the last facet's (performance) limits will be between 0 and 1, since it is calculated as a percentage
I have seen this and some other related questions but couldn't apply it to my data.
Thanks in advance.
df <-
data.frame(
call_reason = c("a","b","c","d"),
all_records = c(100,200,300,400),
problematic_records = c(80,60,100,80))
df <- df %>% mutate(performance = round(problematic_records/all_records, 2))
df
call_reason all_records problematic_records performance
a 100 80 0.80
b 200 60 0.30
c 300 100 0.33
d 400 80 0.20
df %>%
gather(key = facet_group, value = value, -call_reason) %>%
mutate(facet_group = factor(facet_group,
levels=c('all_records','problematic_records','performance'))) %>%
ggplot(aes(x=call_reason, y=value)) +
geom_bar(stat="identity") +
coord_flip() +
facet_grid(. ~ facet_group)
Here is one way to go about it: facet_grid(scales = "free_x") in combination with geom_blank(). Below, df refers to your data as it is just before being piped into ggplot, i.e. after the gather() and mutate() steps.
ggplot(df, aes(x=call_reason, y=value)) +
# geom_col is equivalent to geom_bar(stat = "identity")
geom_col() +
# geom_blank includes data for position scale training, but is not rendered
geom_blank(data = data.frame(
# value for first two facets is max, last facet is 1
value = c(rep(max(df$value), 2), 1),
# dummy category (works whether call_reason is a factor or a character vector)
call_reason = df$call_reason[1],
# distribute over facets
facet_group = levels(df$facet_group)
)) +
coord_flip() +
# scales are set to "free_x" to have them vary independently
# it doesn't really, since we've set a geom_blank
facet_grid(. ~ facet_group, scales = "free_x")
As long as your column names remain the same, this should work.
EDIT:
To reorder the call_reason variable, you could add the following in your pipe that goes into ggplot:
df %>%
gather(key = facet_group, value = value, -call_reason) %>%
mutate(facet_group = factor(facet_group,
levels=c('all_records','problematic_records','performance')),
# In particular the following bit:
call_reason = factor(call_reason, levels(call_reason)[order(value[facet_group == "performance"])]))
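Putting the pieces together, a self-contained sketch of the geom_blank() idea could look like the following. It departs from the code above in a few assumed ways: it uses pivot_longer() where the original uses gather(), it maps value to x directly and drops coord_flip() (horizontal bars without flipping assume ggplot2 3.3.0 or later), and the helper names groups, count_max and limits are illustrative.

```r
library(tidyr)
library(ggplot2)

df <- data.frame(
  call_reason         = c("a", "b", "c", "d"),
  all_records         = c(100, 200, 300, 400),
  problematic_records = c(80, 60, 100, 80)
)
df$performance <- round(df$problematic_records / df$all_records, 2)

groups  <- c("all_records", "problematic_records", "performance")
df_long <- pivot_longer(df, cols = all_of(groups),
                        names_to = "facet_group", values_to = "value")
df_long$facet_group <- factor(df_long$facet_group, levels = groups)

# Shared upper limit for the two count facets, fixed limit of 1 for performance
count_max <- max(df_long$value[df_long$facet_group != "performance"])
limits <- data.frame(
  call_reason = df_long$call_reason[1],          # dummy category
  facet_group = factor(groups, levels = groups),
  value       = c(count_max, count_max, 1)
)

p <- ggplot(df_long, aes(x = value, y = call_reason)) +
  geom_col() +
  geom_blank(data = limits) +   # invisible points that stretch each x scale
  facet_grid(. ~ facet_group, scales = "free_x")
p
```

Because count_max is computed from the data, the same template should carry over to other data sets with the same column names.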