Some bars don't reorder in ggplot - r

My dataframe:
data <- data.frame(commodity = c("A", "A", "B", "C", "C", "D"),
cost = c(1809065, 348456, 203686, 5966690, 172805, 3176424))
data
commodity cost
1 A 1809065
2 A 348456
3 B 203686
4 C 5966690
5 C 172805
6 D 3176424
Next I plot a barplot with reorder:
library(tidyverse)
data %>%
ggplot(aes(x = reorder(factor(commodity), cost), y = cost)) +
geom_bar(stat = "identity", fill = "steelblue3")
What happens next is that most bars are ordered just like I want, but a few aren't. Here's an image of my problematic plot:

you can try
library(tidyverse)
data %>%
ggplot(aes(x = reorder(commodity, cost, sum), y = cost)) +
geom_col(fill = "steelblue3")
Change the summary function that reorder uses from its default, mean, to sum. The order then matches the stacked bar heights that ggplot draws. Note that geom_col is preferred over geom_bar with stat = "identity". If you need decreasing order, try reorder(commodity, -cost, sum) or write your own function, such as function(x) -sum(x).
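To see why the summary function matters, the sketch below compares the level order that reorder produces with its default (mean) and with sum, using the data from the question. It uses base R only; the variable names by_mean, by_sum and by_sum_desc are made up for this illustration.

```r
# Data from the question
data <- data.frame(
  commodity = c("A", "A", "B", "C", "C", "D"),
  cost = c(1809065, 348456, 203686, 5966690, 172805, 3176424)
)

# Default: levels ordered by the mean cost per commodity
by_mean <- levels(reorder(data$commodity, data$cost))

# Levels ordered by the total cost per commodity (the stacked bar height)
by_sum <- levels(reorder(data$commodity, data$cost, sum))

# Decreasing order of the totals, obtained by negating the cost
by_sum_desc <- levels(reorder(data$commodity, -data$cost, sum))

print(by_mean)
print(by_sum)
print(by_sum_desc)
```

C has the largest total but a smaller mean than D, which is why C and D swap places between the two orderings.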

By default, reorder orders the levels by the mean value of each group, as explained on its help page. Jimbou's solution is better, but you could also aggregate the data before plotting and use geom_col:
data %>%
group_by(commodity) %>%
summarise(cost = sum(cost)) %>%
ggplot(aes(x = reorder(factor(commodity), cost), y = cost)) +
geom_col(fill = "steelblue3")

Related

Iteratively plotting all columns in ggplot

I have a dataframe of temperatures where each column represents a year from 1996 to 2015 and the rows are daily data from 1 July to 31 October:
head(df)
[![Dataframe head][1]][1]
I am trying to create a line plot with x = DAYS and y = temp per year. When I use DAYS in the loop, either with aes() or aes_string(), it doesn't produce anything:
iterator <- c(colnames(df))[-1]
g <- ggplot(df, aes_string(x = 'DAY'))
for (i in iterator){
g <- g+ geom_line(aes_string(y=i))
}
print(g)
So I added an index column, which is just the integers from 1 to 123. Now the same code plots a bunch of lines, but they look very strange:
df$index <- c(1:123)
iterator <- c(colnames(df))[-1]
iterator <- iterator[-21]
g <- ggplot(df, aes_string(x = 'index'))
for (i in iterator){
g <- g+ geom_line(aes_string(y=i))
}
print(g)
[![Final plot][2]][2]
As you can see, I have one line per column name, and all the column names are stacked on top of each other. This has compressed the vertical axis so much that the variations in temperature are not visible. I want my y-axis to go from 50 to 100, with one line per column drawn on the same scale as the other columns. How do I do that?
[1]: https://i.stack.imgur.com/ruF11.png
[2]: https://i.stack.imgur.com/gAvMe.png
I agree with Andrew's solution, with one minor change: you have to remove the "df" on the third line, because you already declared it at the beginning.
df %>%
tidyr::pivot_longer(!DAYS, names_to = "column", values_to = "temp") %>%
ggplot(aes(x = DAYS, y = temp, group = column)) +
geom_line()
I think you could rearrange your data frame, e.g. using the tidyr package, so that you have a data frame with "year", "day" and "temp" columns
library(ggplot2)
library(tidyr)
year1 = c(5,6,4,5)
year2 = c(6,5,5,6)
year3 = c(3,4,3,4)
date = c("a", "b", "c", "d")
data = data.frame(date, year1, year2, year3)
data2 = gather(data , "year", "temp", -date)
Then, you can easily plot the temperature per year.
ggplot(data2, aes(x = date, y = temp, group = year, color = year))+
geom_path()
If you're doing something with loops in R, especially with ggplot2, you are probably doing something wrong. I'm not 100% sure why you're looping at all, when you probably want to do something more like,
df %>%
tidyr::pivot_longer(!DAYS, names_to = "column", values_to = "temp") %>%
ggplot(df, aes(x = day, y = temp, group = column)) +
geom_line()
but without a reprex / data set I can't be sure if that's what you want.
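Since the original data is not available, here is a minimal sketch of the reshaping idea on a made-up data frame. The column names y1996 and y1997 and all temperature values are invented for illustration; only the DAYS column name comes from the question.

```r
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for the asker's data: one DAYS column, one column per year
df <- data.frame(
  DAYS  = 1:4,
  y1996 = c(71, 73, 70, 75),
  y1997 = c(68, 72, 74, 71)
)

# Reshape from wide (one column per year) to long (one row per day and year)
long <- pivot_longer(df, !DAYS, names_to = "year", values_to = "temp")

# One geom_line() call now draws one line per year on a shared y-axis
p <- ggplot(long, aes(x = DAYS, y = temp, group = year, colour = year)) +
  geom_line()
```

Printing p draws the plot. Because every temperature ends up in the single temp column, all lines share the same scale without any loop.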

boxplot data ggplot2 package

I'm new to R and hope you can help me. I want to make multiple boxplots in one graph, but I can't get output like this:
Here is my own data:
I used this command:
library(tidyverse)
library(readxl)
library(ggplot2)
marte <- read_xlsx("marterstudio.xlsx")
head(marte)
marte <- gather(marte, "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "O", "P", key="ID", value="value")
marte$Group <- as.factor(marte$Group)
marte$ID <- as.factor(marte$ID)
ggplot(marte, aes(x = value, y = ID, color = ID)) +
geom_boxplot()
This is the result:
Can you help me?
What you need is the coord_flip function, rather than swapping the x and y aesthetics as you did.
ggplot(data = marte) +
# The data argument can go in the ggplot call, but I prefer to define the aes
# inside each geom_ call; this lets you specify different aesthetics for
# different geoms if you happen to use several geoms in one plot
geom_boxplot(aes(x = ID, y = value, color = ID)) +
coord_flip()
Making a boxplot with only 1 or 2 values per group is probably misleading about the true variance in each population. But just for the sake of demonstrating the code, you could do something like:
# load necessary packages
library(tidyverse)
# to reproduce sampling rows
set.seed(1)
# produce boxplot (not recommended for small samples)
iris %>%
select(Species, Sepal.Length, Sepal.Width) %>%
pivot_longer(-Species) %>%
group_by(Species, name) %>%
sample_n(size = 2, replace = FALSE) %>%
ggplot(aes(x = name, y = value, fill = Species)) +
geom_boxplot() +
coord_flip()
Which produces this plot:
In practice, when sample size is fairly small (e.g. n < 10), it is more informative to show the individual data points, perhaps with some summary statistic such as the mean or median. Here's how I would be more inclined to represent data with a sample size = 2:
# to reproduce sampling rows
set.seed(1)
# produce bar plot with overlaid points (recommended for small samples)
iris %>%
select(Species, Sepal.Length, Sepal.Width) %>%
pivot_longer(-Species) %>%
group_by(Species, name) %>%
sample_n(size = 2, replace = FALSE) %>%
ggplot(aes(x = name, y = value, fill = Species)) +
stat_summary(fun = mean, geom = "bar", position = "dodge") +
geom_point(shape = 21, size = 3, position = position_dodge(width = 0.9)) +
coord_flip()
Which gives this plot:

geom_bar overlapping labels

For simplicity, let's suppose we have a data set like this:
# A
1 1
2 2
3 2
4 2
5 3
We have a categorical variable "A" with 3 possible values (1, 2, 3), and I'm trying this code:
ggplot(df, aes(x="", y=df$A, fill=A))+
geom_bar(width = 1, stat = "identity")
The problem is that the labels are overlapping. I also want to change the labels for 1, 2, 3 to x, y, z.
Here is a picture of what is happening
And here is a link to the actual data that I'm using.
https://a.uguu.se/anKhhyEv5b7W_Data.csv
Your graph does not correspond to the sample of data you are showing, so it is hard to be sure that the structure of your real data is actually the same.
Using a random example, I get the following plot:
df <- data.frame(A = sample(1:3,20, replace = TRUE))
library(ggplot2)
ggplot(df, aes(x="A", y=A, fill=as.factor(A)))+
geom_bar(width = 1, stat = "identity") +
scale_fill_discrete(labels = c("x","y","z"))
EDIT: Using data provided by the OP
Here using your data, you should get the following plot:
ggplot(df, aes(x = "A",y = A, fill = as.factor(A)))+
geom_col()
Or if you want the count of each individual value of A, you can do:
library(dplyr)
library(ggplot2)
df %>% group_by(A) %>% count() %>%
ggplot(aes(x = "A", y = n, fill = as.factor(A)))+
geom_col()
Is this what you are looking for?
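As an alternative to relabelling in the scale, the values can be recoded before plotting. The sketch below uses the five-row sample from the question and assumes that A only ever takes the values 1, 2 and 3; the column name A_label is made up for this illustration.

```r
# Sample data from the question
df <- data.frame(A = c(1, 2, 2, 2, 3))

# Recode 1, 2, 3 as x, y, z; the new factor can then be mapped to fill
df$A_label <- factor(df$A, levels = 1:3, labels = c("x", "y", "z"))

# Count how often each label occurs (what the count() step computes)
counts <- table(df$A_label)
print(counts)
```

Mapping fill = A_label in aes would then show x, y and z in the legend without a separate scale_fill_discrete call.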

is it possible to ggplot grouped partial boxplots w/o facets w/ a single `geom_boxplot()`?

I needed to add some partial boxplots to the following plot:
library(tidyverse)
foo <- tibble(
time = 1:100,
group = sample(c("a", "b"), 100, replace = TRUE) %>% as.factor()
) %>%
group_by(group) %>%
mutate(value = rnorm(n()) + 10 * as.integer(group)) %>%
ungroup()
foo %>%
ggplot(aes(x = time, y = value, color = group)) +
geom_point() +
geom_smooth(se = FALSE)
I would like to add a grid of (2 x 4 = 8) boxplots (4 per group) to the plot above. Each boxplot should cover a consecutive selection of 25 (or n) points in each group. That is, the first two boxplots represent the points between the 1st and the 25th (one boxplot below for group a, and one boxplot above for group b). Next to them come two more boxplots for the points between the 26th and the 50th, and so on. It would be even better if they were not in a perfect grid (which I suppose would be both harder to obtain and uglier): I would prefer them to "follow" their corresponding smooth line!
That all without using facets (because I have to insert them in a plot which is already facetted :-))
I tried
bar <- foo %>%
group_by(group) %>%
mutate(cut = 12.5 * (time %/% 25)) %>%
ungroup()
bar %>%
ggplot(aes(x = time, y = value, color = group)) +
geom_point() +
geom_smooth(se = FALSE) +
geom_boxplot(aes(x = cut))
but it doesn't work.
I tried to call geom_boxplot() using group instead of x
bar %>%
ggplot(aes(x = time, y = value, color = group)) +
geom_point() +
geom_smooth(se = FALSE) +
geom_boxplot(aes(group = cut))
But it draws the boxplots without considering the groups, and even loses the colors (and adding a redundant call including color = group doesn't help).
Finally, I decided to try it roughly:
bar %>%
ggplot(aes(x = time, y = value, color = group)) +
geom_point() +
geom_smooth(se = FALSE) +
geom_boxplot(data = filter(bar, group == "a"), aes(group = cut)) +
geom_boxplot(data = filter(bar, group == "b"), aes(group = cut))
And it works (maintaining even the correct colors from the main aes)!
Does someone know if it is possible to obtain it using a single call to geom_boxplot()?
Thanks!
This was interesting! I haven't tried to use geom_boxplot with a continuous x before and didn't know how it behaved. I think what is happening is that setting group overrides colour in geom_boxplot, so it doesn't respect either the inherited or the repeated colour aesthetic. I think this workaround does the trick: we combine the group and cut variables into group_cut, which takes 8 different values (one for each desired boxplot). Now we can map aes(group = group_cut) and get the desired output. I don't think this is particularly intuitive, and it might be worth raising on GitHub, since usually we expect aesthetics to combine nicely (e.g. combining colour and linetype works fine).
library(tidyverse)
bar <- tibble(
time = 1:100,
group = sample(c("a", "b"), 100, replace = TRUE) %>% as.factor()
) %>%
group_by(group) %>%
mutate(
value = rnorm(n()) + 10 * as.integer(group),
cut = 12.5 * ((time - 1) %/% 25), # modified this to prevent an extra boxplot
group_cut = str_c(group, cut)
) %>%
ungroup()
bar %>%
ggplot(aes(x = time, y = value, colour = group)) +
geom_point() +
geom_smooth(se = FALSE) +
geom_boxplot(aes(group = group_cut), position = "identity")
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Created on 2019-08-13 by the reprex package (v0.3.0)
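The comment about preventing an extra boxplot can be checked with a little arithmetic. The sketch below, in base R only, compares the binning from the question with the shifted version from the answer; the names cut_original and cut_shifted are made up for this illustration.

```r
time <- 1:100

# Binning from the question: time = 100 gives 100 %/% 25 = 4, a fifth bin
cut_original <- 12.5 * (time %/% 25)

# Binning from the answer: subtracting 1 first keeps 1..100 in four bins
cut_shifted <- 12.5 * ((time - 1) %/% 25)

print(table(cut_original))
print(table(cut_shifted))
```

With the original formula the first bin holds only 24 points and the point at time 100 sits alone in a fifth bin, which is where the extra boxplot comes from. The shifted formula gives four bins of 25 points each.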

Order x axis in stacked bar by subset of fill

There are multiple questions (here for instance) on how to arrange the x axis by frequency in a bar chart with ggplot2. However, my aim is to arrange the categories on the X-axis in a stacked bar chart by the relative frequency of a subset of the fill. For instance, I would like to sort the x-axis by the percentage of category B in variable z.
This was my first try using only ggplot2
library(ggplot2)
library(tibble)
library(scales)
factor1 <- as.factor(c("ABC", "CDA", "XYZ", "YRO"))
factor2 <- as.factor(c("A", "B"))
set.seed(43)
data <- tibble(x = sample(factor1, 1000, replace = TRUE),
z = sample(factor2, 1000, replace = TRUE))
ggplot(data = data, aes(x = x, fill = z, order = z)) +
geom_bar(position = "fill") +
scale_y_continuous(labels = percent)
When that didn't work, I created a summarised data frame using dplyr, spread the data, sorted it by B, and then gathered it again. But plotting that didn't work either.
library(dplyr)
library(tidyr)
data %>%
group_by(x, z) %>%
count() %>%
spread(z, n) %>%
arrange(-B) %>%
gather(z, n, -x) %>%
ggplot(aes(x = reorder(x, n), y = n, fill = z)) +
geom_bar(stat = "identity", position = "fill") +
scale_y_continuous(labels = percent)
I would prefer a solution with ggplot only, so as not to depend on the row order of the data frame created by dplyr/tidyr. However, I'm open to anything.
If you want to sort by absolute frequency:
lvls <- names(sort(table(data[data$z == "B", "x"])))
If you want to sort by relative frequency:
lvls <- names(sort(tapply(data$z == "B", data$x, mean)))
Then you can create the factor on the fly inside ggplot:
ggplot(data = data, aes(factor(x, levels = lvls), fill = z)) +
geom_bar(position = "fill") +
scale_y_continuous(labels = percent)
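To make the relative-frequency ordering concrete, here is a small deterministic sketch in base R. The data frame toy and its values are invented for illustration; the tapply call mirrors the one above.

```r
# Made-up data: category x and fill variable z
toy <- data.frame(
  x = c("P", "P", "Q", "Q", "Q", "R"),
  z = c("B", "A", "B", "B", "A", "A")
)

# The mean of a logical vector is the share of TRUE values,
# so this gives the share of "B" within each level of x
share_b <- tapply(toy$z == "B", toy$x, mean)

# Level names sorted by increasing share of "B"
lvls <- names(sort(share_b))

print(share_b)
print(lvls)
```

Here R has no B at all, half of P is B, and two thirds of Q are B, so the levels come out as R, P, Q.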
A solution using tidyverse would be:
data %>%
mutate(x = forcats::fct_reorder(x, as.numeric(z), .fun = mean)) %>%
ggplot(aes(x, fill = z)) +
geom_bar(position = "fill") +
scale_y_continuous(labels = percent)
