Plot a barplot with repeated labels - r

I would like to plot data with repeated x-axis labels as a bar plot, without merging the values that share a label.
In the example I have a table de:
de <- data.frame(mean=c(10, 2, 3, 1, 4, 5, 3, 9),
base=c('A','A','C','G','T','T','T','A'))
And I would like to have a plot like this:
But when I run this in R:
ggplot(de, aes( y = mean, x =base))+
geom_bar(stat = 'identity')
This is what I get:
It merges the identical bases into one column, whereas I want a separate column for each value of base, even the repeated ones, as shown in the table above.

The easy way is to:
Set unique labels for the As and Ts in your "base" column, for example Ax, Ay, Tx, Ty, etc.:
de <- data.frame(mean=c(10, 2, 3, 1, 4, 5, 3, 9),
base=c("Ax", "Ay", "C", "G", "Tx","Ty", "Tz", "A"))
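And then change the x-axis labels. Without limits, the labels would be applied to the alphabetically sorted values (A, Ax, Ay, C, ...) and land on the wrong bars; setting limits keeps the bars in row order so the labels line up:
ggplot(de, aes( y = mean, x =base))+
geom_bar(stat = 'identity') +
scale_x_discrete(limits = as.character(de$base),
labels=c("A", "A", "C", "G", "T","T", "T", "A"))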

Building off @sargg's excellent answer, we can prevent the possibility of human error by generating the unique base names and the ggplot labels automatically with dplyr:
library(dplyr)
de2 <- de %>%
group_by(base) %>%
mutate(unique_base = paste0(base, row_number()))
# # A tibble: 8 x 3
# # Groups: base [4]
#    mean base  unique_base
#   <dbl> <fct> <chr>
# 1    10 A     A1
# 2     2 A     A2
# 3     3 C     C1
# 4     1 G     G1
# 5     4 T     T1
# 6     5 T     T2
# 7     3 T     T3
# 8     9 A     A3
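ggplot(de2, aes(y = mean, x =unique_base))+
geom_bar(stat = 'identity') +
# limits keeps the bars in row order, so the positional labels match the bars
scale_x_discrete(limits = de2$unique_base, labels = as.character(de2$base))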
For an even DRY-er answer, we can pass the data in like this (note the curly braces: see this answer for more information):
de2 %>% {
ggplot(., aes( y = mean, x =unique_base))+
geom_bar(stat = 'identity') +
# limits keeps the bars in row order, so the positional labels match the bars
scale_x_discrete(limits = .$unique_base, labels = as.character(.$base))
}
This lets us access the de2 data frame from within the ggplot call with ., so the axis order and labels can be taken from .$unique_base and .$base rather than having to name the data frame de2 a second time.
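As a variation on the same idea that needs no dplyr, here is a minimal sketch that builds the unique keys with base R's make.unique(). The column name key and the object axis_labels are illustrative names, not part of the answers above. The label vector is named, so ggplot2 can match each label to its key by name rather than by position.

```r
library(ggplot2)

de <- data.frame(mean = c(10, 2, 3, 1, 4, 5, 3, 9),
                 base = c("A", "A", "C", "G", "T", "T", "T", "A"))

# One unique key per row: "A", "A.1", "C", "G", "T", "T.1", "T.2", "A.2"
de$key <- make.unique(as.character(de$base))

# Named vector mapping each unique key back to the label to display
axis_labels <- setNames(as.character(de$base), de$key)

p <- ggplot(de, aes(x = key, y = mean)) +
  geom_col() +                               # same as geom_bar(stat = "identity")
  scale_x_discrete(limits = de$key,          # keep the bars in row order
                   labels = axis_labels) +   # show the original base on the axis
  xlab("base")
p
```

Because the labels are looked up by name, they stay attached to the right bars even if the limits are later reordered.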

Though there already is an accepted solution, I will post another one, creating the desired labels from the original dataset.
First, code to create an example dataset.
set.seed(1234)
values <- sample(20, 8)
base <- c('A', 'A', 'C', 'G', 'T', 'T', 'T', 'A')
de <- data.frame(base, values)
Now the code to plot the graph.
library(tidyverse)
de %>%
mutate(base1 = paste0(seq_along(base), base)) %>%
ggplot(aes(x = base1, y = values)) +
geom_bar(stat = 'identity') +
geom_text(aes(x = base1, y = -1,
label = base)) +
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank())
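One caveat with paste0(seq_along(base), base): the keys are sorted as text, so with ten or more rows "10A" would sort before "1A". The sketch below, which uses a made-up 12-row base vector, shows the effect and one possible fix with zero-padded indices from sprintf(). The two-digit width is an assumption that holds for up to 99 rows.

```r
# Hypothetical example with 12 rows, enough to reach two-digit indices
base <- rep(c("A", "C", "G", "T"), times = 3)

plain  <- paste0(seq_along(base), base)             # "1A", "2C", ..., "12T"
padded <- sprintf("%02d%s", seq_along(base), base)  # "01A", "02C", ..., "12T"

# Text sorting breaks the row order for the plain keys but not the padded ones
plain_keeps_order  <- identical(sort(plain), plain)
padded_keeps_order <- identical(sort(padded), padded)

print(plain_keeps_order)   # FALSE
print(padded_keeps_order)  # TRUE
```

Using base1 = sprintf("%02d%s", seq_along(base), base) in the mutate() call above would therefore keep the bars in row order for longer tables.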

Related

Reorder labels in ggplot based on values in 2 columns

I would like to build a bar plot in R in which the labels on the x-axis are ordered by the value of Count where Name == A. When I build it, it simply orders the x-axis as "Tool 1, Tool 2, Tool 3". What I would like is an x-axis order like "Tool 3, Tool 1, Tool 2" (ordering by the values where Name == A).
Here is a tiny dataframe as example:
df <- rbind(tibble(Name = c('A', 'B', 'C'), Count = c(6, 3, 1), Method = rep('Tool 1', 3)),
tibble(Name = c('A', 'B', 'C'), Count = c(4, 2, 4), Method = rep('Tool 2', 3)),
tibble(Name = c('A', 'B', 'C'), Count = c(7, 3, 0), Method = rep('Tool 3', 3))
)
df
# A tibble: 9 x 3
Name Count Method
<fct> <dbl> <chr>
1 A 6 Tool 1
2 B 3 Tool 1
3 C 1 Tool 1
4 A 4 Tool 2
5 B 2 Tool 2
6 C 4 Tool 2
7 A 7 Tool 3
8 B 3 Tool 3
9 C 0 Tool 3
To build the plot:
p <- ggplot(df, aes(x = Method,
y = Count,
fill = Name)) +
geom_bar(stat="identity")
p
I tried to rearrange it with df %>% mutate(Method = fct_reorder(Method, desc(Count))) but I don't know how to select only by A values. I also could reorder it manually but I am interested in knowing how to do it automatically. Thanks for the help !
We can add a column to df and use the reorder function
df$Count_A <- ifelse(df$Name == "A", df$Count, NA)
ggplot(df, aes(x = reorder(Method, -Count_A, mean, na.rm = TRUE),
y = Count,
fill = Name))+
geom_bar(stat="identity")
Alternatively, you could add the reordered factor to df:
df$Method <- reorder(df$Method, -df$Count_A, mean, na.rm = TRUE)
ggplot(df, aes(x = Method, y = Count, fill = Name)) +
geom_bar(stat="identity")
Using relevel. Note that this moves "Tool 3" to the front by hand rather than deriving the order from the counts:
p <- ggplot(transform(df, Method=relevel(as.factor(Method), ref="Tool 3")),
aes(x = Method,y = Count, fill = Name)) +
geom_bar(stat="identity")
p
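For comparison, the level order can also be derived explicitly from the Name == "A" rows with base R. This is a minimal sketch that rebuilds the example data as a plain data.frame; a_rows and method_order are illustrative names.

```r
library(ggplot2)

df <- data.frame(
  Name   = rep(c("A", "B", "C"), times = 3),
  Count  = c(6, 3, 1, 4, 2, 4, 7, 3, 0),
  Method = rep(c("Tool 1", "Tool 2", "Tool 3"), each = 3)
)

# Keep only the rows for Name == "A" and sort the methods by descending Count
a_rows <- df[df$Name == "A", ]
method_order <- as.character(a_rows$Method[order(-a_rows$Count)])

# Use that order as the factor levels, which ggplot2 follows on the x-axis
df$Method <- factor(df$Method, levels = method_order)
print(levels(df$Method))  # "Tool 3" "Tool 1" "Tool 2"

p <- ggplot(df, aes(x = Method, y = Count, fill = Name)) +
  geom_bar(stat = "identity")
p
```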

a graph with two lines, separating into different graphs based on one column values

I have a dataframe in this format, but with several hundred more rows:
dfex = data.frame(dot = c('A', 'B', 'C', 'D', 'E', 'F'),
group = c('A1', 'A1', 'A1', 'A2', 'A2', 'A2'),
x1 = c(1, 2, 3, 4, 5, 6),
x2 = c(4, 5, 6, 1, 2, 3),
y = c(1, 2, 3, 4, 5, 6))
I want to create different graphs based on the value in group, so one graph will only have group A1 rows and the other graph only has group A2 rows.
On each graph, there should be two different lines for the x1-y pair and the x2-y pair. Preferably I could have the correlation for each of these lines listed as well.
I'm familiar with ggplot2, so using that would be great.
Here is an amazing paint drawing for a better idea of what I mean:
I agree with @camille, it is better to reshape the data to long format before plotting.
library(tidyverse)
dfex %>%
gather(key, value, -c(dot, group, y)) %>%
ggplot() +
aes(value, y, color = key) +
geom_line() +
facet_wrap(.~group)
The code below splits the plot into two panels: facet_wrap divides the graph by group. Because the two x variables are stored in separate columns, each one needs its own geom_line call. A string inside aes() is used as a legend label, not as a colour name, so the column names are used here:
library(tidyverse)
ggplot(dfex) +
geom_line(mapping = aes(x = x1, y = y, color = "x1")) +
geom_line(mapping = aes(x = x2, y = y, color = "x2")) +
facet_wrap(. ~group)
Alternatively, gather the data into a tidier format first:
gather(dfex, "xVar", "x", 3:4) %>%
ggplot() +
geom_line(mapping = aes(x = x, y = y, color = xVar)) +
facet_wrap(. ~group)
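The question also asks for the correlation of each line, which the plots above do not show. A minimal base R sketch, assuming Pearson correlation via cor() is what is wanted, computes it per group; cors and its column names are illustrative.

```r
dfex <- data.frame(dot   = c("A", "B", "C", "D", "E", "F"),
                   group = c("A1", "A1", "A1", "A2", "A2", "A2"),
                   x1    = c(1, 2, 3, 4, 5, 6),
                   x2    = c(4, 5, 6, 1, 2, 3),
                   y     = c(1, 2, 3, 4, 5, 6))

# Split by group, then compute one correlation per x column against y
cors <- do.call(rbind, lapply(split(dfex, dfex$group), function(d) {
  data.frame(group    = as.character(d$group[1]),
             cor_x1_y = cor(d$x1, d$y),
             cor_x2_y = cor(d$x2, d$y))
}))
print(cors)
```

In this toy data every line is perfectly linear, so all four correlations are 1. The resulting table could be shown next to the plot or added to each facet as text.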

gganimate: include additional variable other than states level variable or frame in title expression

I'd like to insert another column value of my data into a gganimate animation title.
Example, here the states level variable is x and I'd like to add to title variable y:
df <- tibble(x = 1:10, y = c('a', 'a', 'b', 'd', 'c', letters[1:5]))
df
# A tibble: 10 x 2
x y
<int> <chr>
1 1 a
2 2 a
3 3 b
4 4 d
5 5 c
6 6 a
7 7 b
8 8 c
9 9 d
10 10 e
This works as expected:
ggplot(df, aes(x, x)) +
geom_point() +
labs(title = '{closest_state}') +
transition_states(x,
transition_length = 0.1,
state_length = 0.1)
This fails:
ggplot(df, aes(x, x)) +
geom_point() +
labs(title = '{closest_state}, another_var: {y}') +
transition_states(x,
transition_length = 0.1,
state_length = 0.1)
Error in eval(parse(text = text, keep.source = FALSE), envir) :
object 'y' not found
Also tried this, but y will not change:
ggplot(df, aes(x, x)) +
geom_point() +
labs(title = str_c('{closest_state}, another_var: ', df$y)) +
transition_states(x,
transition_length = 0.1,
state_length = 0.1)
Another option is to map y as the states-level variable and use the frame variable in place of x. In my application, however, y is either a character variable that is not necessarily unique, as above, or a numeric variable that is not necessarily unique or ordered. In that case gganimate (or ggplot?) orders the states as it sees fit, so the final animation is no longer ordered by x:
ggplot(df, aes(x, x)) +
geom_point() +
labs(title = '{frame}, another_var: {closest_state}') +
transition_states(y,
transition_length = 0.1,
state_length = 0.1)
So how to simply add the changing value of the un-ordered, not numeric, y variable?
Finally: This question was asked here but without a reproducible example so it was not answered, hoping this one is better.
One dirty solution would be to paste together the variables and make a new one to use in the transition_states:
df <- mutate(df, title_var = factor(paste(x, y, sep="-"), levels = paste(x, y, sep="-")))
# # A tibble: 6 x 3
# x y title_var
# <int> <chr> <fct>
# 1 1 a 1-a
# 2 2 a 2-a
# 3 3 b 3-b
# 4 4 d 4-d
# 5 5 c 5-c
# 6 6 a 6-a
Then we could use gsub() to strip the unwanted part from closest_state, like this:
gsub(pattern = "\\d+-", replacement = "", "1-a")
"a"
So:
ggplot(df, aes(x, x)) +
geom_point() +
labs(title = '{gsub(pattern = "\\d+-", replacement = "", closest_state)}') +
transition_states(title_var, transition_length = 0.1, state_length = 0.1)
Another possibility, slightly more compact, from the author of gganimate himself, following the issue I opened:
https://github.com/thomasp85/gganimate/issues/252#issuecomment-450846868
According to Thomas:
There are multiple reasons why random columns from the input data
cannot be accessed so it is unlikely to get any better than this...
Here's a solution using dplyr, based on the gganimate developer Thomas's solution, provided by Giora.
library(tidyverse)
library(gganimate)
df <- tibble::tibble(x = 1:10, y = c('a', 'a', 'b', 'd', 'c', letters[1:5]))
a <- ggplot(df, aes(x, x)) +
geom_point() +
labs(title = "{closest_state}, another_var: {df %>% filter(x == closest_state) %>% pull(y)}") +
transition_states(x,
transition_length = 0.1,
state_length = 0.1)
animate(a)
The gganimate titles use glue syntax for the animated title elements, and you can include entire dplyr data manipulation pipelines within them.
You can refer to the closest_state variable provided by gganimate::transition_states() within your dplyr calls. Here, since the animation's frames are indexed by successive levels of x, I use filter() to subset df for a given frame based on the value of x and then refer to corresponding rows of column y, which contain additional information I'd like to display in the title. Using pull, you can grab the individual value of y corresponding to x and display it within the animation's title.
This is a clean and straightforward way to do it with the advantage that you can, e.g., compute summary values to display on-the-fly by adding summarize() and other calls in your magrittr pipeline.
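If the title only needs a single value per state, a named lookup vector is a lighter alternative to a pipeline. The sketch below assumes the title expression can see objects in the calling environment, as the reference to df in the answer above does; y_by_x is an illustrative name, and the animate() call is left commented out because rendering needs a renderer to be installed.

```r
library(ggplot2)
library(gganimate)

df <- data.frame(x = 1:10,
                 y = c("a", "a", "b", "d", "c", letters[1:5]),
                 stringsAsFactors = FALSE)

# Named lookup: names are the states as text, values are the extra variable
y_by_x <- setNames(df$y, df$x)

a <- ggplot(df, aes(x, x)) +
  geom_point() +
  # closest_state arrives as text, so index the lookup by name
  labs(title = "{closest_state}, another_var: {y_by_x[as.character(closest_state)]}") +
  transition_states(x,
                    transition_length = 0.1,
                    state_length = 0.1)
# animate(a)
```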

Filter in ggplot2's geoms using common aesthetics and data frames across geoms

Say I have the following data frame:
# Dummy data frame
df <- data.frame(x = rep(1:5, 2), y = runif(10), z = rep(c("A", "B"), each = 5))
# x y z
# 1 1 0.92024937 A
# 2 2 0.37246007 A
# 3 3 0.76632809 A
# 4 4 0.03418754 A
# 5 5 0.33770400 A
# 6 1 0.15367174 B
# 7 2 0.78498276 B
# 8 3 0.03341913 B
# 9 4 0.77484244 B
# 10 5 0.13309999 B
I'd like to plot cases where z == "A" as points and cases where z == "B" as lines. Simple enough.
library(ggplot2)
library(dplyr) # provides %>% and filter()
# Plot data
g <- ggplot()
g <- g + geom_point(data = df %>% filter(z == "A"), aes(x = x, y = y))
g <- g + geom_line(data = df %>% filter(z == "B"), aes(x = x, y = y))
g
My data frame and aesthetic for the points and lines are identical, so this seems a bit verbose – especially if I want to do this lots of times (e.g., z == "A" through z == "Z"). Is there a way that I could state ggplot(df, aes(x = x, y = y)) and then subsequently state my filtering or subsetting criteria within the appropriate geoms?
I find the example in the question itself the most readable, although verbose. The second part of the question, about dealing with more cases, just requires a more sophisticated test in filter, for example using %in% (or grep, grepl, etc.) to match multiple cases. Taking advantage of the possibility of accessing the default plot data within a layer and, as mentioned by @MrFlick, moving the mapping of aesthetics out of the individual layers results in more concise code. All earlier answers get the plot done, so in this respect my answer is not better than any of them...
library(ggplot2)
library(dplyr)
df <- data.frame(x = rep(1:5, 4),
y = runif(20),
z = rep(c("A", "B", "C", "Z"), each = 5))
g <- ggplot(data = df, aes(x = x, y = y)) +
geom_point(data = . %>% filter(z %in% c("A", "B", "C"))) +
geom_line(data = . %>% filter(z == "Z"))
g
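The same layer-level filtering works without dplyr, because the data argument of a ggplot2 layer also accepts a plain function that receives the plot's default data. The sketch below uses a hypothetical helper, keep_z(), that is not part of ggplot2 and simply builds such a function.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(x = rep(1:5, 2),
                 y = runif(10),
                 z = rep(c("A", "B"), each = 5))

# Hypothetical helper: returns a function that keeps rows with the given z values
keep_z <- function(values) {
  function(d) d[d$z %in% values, ]
}

g <- ggplot(df, aes(x = x, y = y)) +
  geom_point(data = keep_z("A")) +   # points for z == "A"
  geom_line(data = keep_z("B"))      # line for z == "B"
g
```

With many cases, each layer then only needs a different argument to keep_z(), for example keep_z(c("A", "B", "C")).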
Another option would be to spread the data and then just supply the y aesthetic.
library(tidyverse)
df %>% spread(z,y) %>%
ggplot(aes(x = x))+
geom_point(aes(y = A))+
geom_line(aes(y = B))
You can plot lines and points for all z records, but remove the unwanted lines and points by passing NA to scale_linetype_manual and scale_shape_manual:
library(ggplot2)
ggplot(df, aes(x, y, linetype = z, shape = z)) +
geom_line() +
geom_point() +
# A: points only (no line), B: line only (no points)
scale_linetype_manual(values = c(A = NA, B = 1)) +
scale_shape_manual(values = c(A = 16, B = NA))

ordering and plotting by one variable conditional on a second

Task: I would like to reorder a factor variable by the difference between the factor variable when a second variable equals 1 and the factor variable when the second variable equals 0. Here is a reproducible example to clarify:
# Package
library(tidyverse)
# Create fake data
df1 <- data.frame(place = c("A", "B", "C"),
avg = c(3.4, 4.5, 1.8))
# Plot, but it's not in order of value
ggplot(df1, aes(x = place, y = avg)) +
geom_point(size = 4)
# Now put it in order
df1$place <- factor(df1$place, levels = df1$place[order(df1$avg)])
# Plots in order now
ggplot(df1, aes(x = place, y = avg)) +
geom_point(size = 4)
# Adding second, conditional variable (called: new)
df2 <- data.frame(place = c("A", "A", "B", "B", "C", "C"),
new = rep(0:1, 3),
avg = c(3.4, 2.3, 4.5, 4.2, 2.1, 1.8))
ggplot(df2, aes(x = place, y = avg, col = factor(new))) +
geom_point(size = 3)
Goal: I would like to order and plot the factor variable place by the difference of avg between place when new is 1 and place when new is 0
You can create the levels for the place column by:
library(tidyr)
df2$place <- factor(df2$place, levels=with(spread(df2, new, avg), place[order(`1` - `0`)]))
ggplot(df2, aes(x = place, y = avg, col = factor(new))) +
geom_point(size = 3) + labs(color = 'new')
gives:
If I understand the goal correctly, then factor A has the biggest difference:
avg(new = 0) - avg(new = 1) = 1.1
So you can spread the data frame to calculate the difference, then gather, then plot avg versus place, reordered by diff. Or if you want A first, by -diff.
But let me know if I didn't understand correctly :)
df2 %>%
spread(new, avg) %>%
mutate(diff = `0` - `1`) %>%
gather(new, avg, -diff, -place) %>%
ggplot(aes(reorder(place, diff), avg)) +
geom_point(aes(color =factor(new)), size = 3)
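Calculate the difference column first using dplyr, store the result, and plot from it:
# diff(avg) is avg at new = 1 minus avg at new = 0, given the row order in df2
df3 <- df2 %>% group_by(place) %>% mutate(diff=diff(avg)) %>% ungroup()
ggplot(df3, aes(x=place, y=diff, color=diff))+
geom_point(size=3)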
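The ordering step on its own can also be sketched in base R, which makes the computed differences easy to inspect before plotting. The names avg0, avg1 and diffs are illustrative, and the difference is taken as avg at new = 1 minus avg at new = 0, as stated in the goal.

```r
df2 <- data.frame(place = c("A", "A", "B", "B", "C", "C"),
                  new   = rep(0:1, 3),
                  avg   = c(3.4, 2.3, 4.5, 4.2, 2.1, 1.8))

# Per-place averages for each value of new, named by place
avg0 <- with(df2[df2$new == 0, ], setNames(avg, as.character(place)))
avg1 <- with(df2[df2$new == 1, ], setNames(avg, as.character(place)))

# Difference per place, matched by name rather than by position
diffs <- avg1[names(avg0)] - avg0
print(diffs)

# Order the factor levels by increasing difference; A (about -1.1) comes first
df2$place <- factor(df2$place, levels = names(sort(diffs)))
print(levels(df2$place))
```

B and C both differ by roughly -0.3, so their relative order depends on floating-point rounding; use sort(-diffs) instead to put the largest difference first.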
