Reorder labels in ggplot based on values in 2 columns - r

I would like to build a bar plot in R in which the x-axis labels are ordered by the value of Count where Name == A. When I build it, the x axis is simply ordered "Tool 1, Tool 2, Tool 3". What I would like is an x-axis order like "Tool 3, Tool 1, Tool 2" (ordered by the values where Name == A).
Here is a tiny dataframe as example:
df <- rbind(tibble(Name = c('A', 'B', 'C'), Count = c(6, 3, 1), Method = rep('Tool 1', 3)),
tibble(Name = c('A', 'B', 'C'), Count = c(4, 2, 4), Method = rep('Tool 2', 3)),
tibble(Name = c('A', 'B', 'C'), Count = c(7, 3, 0), Method = rep('Tool 3', 3))
)
df
# A tibble: 9 x 3
Name Count Method
<fct> <dbl> <chr>
1 A 6 Tool 1
2 B 3 Tool 1
3 C 1 Tool 1
4 A 4 Tool 2
5 B 2 Tool 2
6 C 4 Tool 2
7 A 7 Tool 3
8 B 3 Tool 3
9 C 0 Tool 3
To build the plot:
p <- ggplot(df, aes(x = Method,
y = Count,
fill = Name)) +
geom_bar(stat="identity")
p
I tried to rearrange it with df %>% mutate(Method = fct_reorder(Method, desc(Count))) but I don't know how to select only by A values. I also could reorder it manually but I am interested in knowing how to do it automatically. Thanks for the help !

We can add a column to df and use the reorder function
df$Count_A <- ifelse(df$Name == "A", df$Count, NA)
ggplot(df, aes(x = reorder(Method, -Count_A, mean, na.rm = TRUE),
y = Count,
fill = Name))+
geom_bar(stat="identity")
Alternatively, you could add the reordered factor to df:
df$Method <- reorder(df$Method, -df$Count_A, mean, na.rm = TRUE)
ggplot(df, aes(x = Method, y = Count, fill = Name)) +
geom_bar(stat="identity")
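The same ordering can also be computed once, outside the plotting call, by reading the level order off the Name == "A" rows. This is a minimal base R sketch rather than part of the answer above; the names a_rows and method_order are made up for illustration, and it assumes each Method has exactly one row with Name == "A".

```r
# Example data from the question, rebuilt as a plain data.frame
df <- data.frame(
  Name   = rep(c("A", "B", "C"), times = 3),
  Count  = c(6, 3, 1, 4, 2, 4, 7, 3, 0),
  Method = rep(c("Tool 1", "Tool 2", "Tool 3"), each = 3),
  stringsAsFactors = FALSE
)

# Keep only the rows that should drive the ordering
a_rows <- df[df$Name == "A", ]

# Methods sorted by their Count for Name == "A", largest first
method_order <- a_rows$Method[order(a_rows$Count, decreasing = TRUE)]

# Store the order in the factor levels; discrete axes are drawn in level order
df$Method <- factor(df$Method, levels = method_order)

levels(df$Method)
```

With the example data the levels come out as Tool 3, Tool 1, Tool 2, and the ggplot call from the question can then be used unchanged.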

Using relevel.
p <- ggplot(transform(df, Method=relevel(as.factor(Method), ref="Tool 3")),
aes(x = Method,y = Count, fill = Name)) +
geom_bar(stat="identity")
p

Related

A simple plot for many curves with different colors

I have the following data frame which contains 4 columns of data in addition to the vector of labels c.
Time <-c(1:4)
d<-data.frame(Time,
x1= rpois(n = 4, lambda = 10),
x2= runif(n = 4, min = 1, max = 10),
x3= rpois(n = 4, lambda = 5),
x4= runif(n = 4, min = 1, max = 5),
c=c(1,1,2,3))
I would like to use ggplot to plot the 4 curves x1, ..., x4 on top of each other, with each curve colored according to its label. So curves x1 and x2 get the same color since they have the same label, whereas curves x3 and x4 get different colors.
I did the following
d %>% pivot_longer(-c(Time,x1,x2,x3,x4))%>%
rename(class=value) %>% select(-name) %>%
pivot_longer(-c(Time,class)) %>%
mutate(Label=ifelse(Time==max(Time,na.rm = T),name,NA),
Label=ifelse(duplicated(Label),NA,Label)) %>%
ggplot(aes(x=Time,y=value,color=factor(class),group=name))+
geom_line()+
labs(color='class')+
scale_color_manual(values=c('red','blue','green'))+
geom_label_repel(aes(label = Label),
nudge_x = 1.5,
na.rm = TRUE,show.legend = F,color='black')
but I don't get the plot I need: the resulting curves are not colored according to the label. I want x1 and x2 in red, x3 in blue and x4 in green.
To add: I would like to obtain the same plot above in the following general case, where I can't add the vector c to the data frame as length(c) is not equal to length(x1)=...=length(x4)
Time <-c(1:5)
d<-data.frame(Time,
x1= rpois(n = 5, lambda = 10),
x2= runif(n = 5, min = 1, max = 10),
x3= rpois(n = 5, lambda = 5),
x4= runif(n = 5, min = 1, max = 5))
and c=c(1,1,2,3)
As you point out in your comments, it is only possible to put the vector of colors as a column in the original data.frame because it happens to be square, but this is a dangerous way to store the information because the colors really belong to the columns rather than the rows. It's better to assign the colors separately and then join into the long format data by variable name prior to plotting.
Below is an example of how I'd do this with your data.
First, prepare the data without the color mapping for each variable, we'll do that next:
# load necessary packages
library(tidyverse)
library(ggrepel)
# set seed to make simulated data reproducible
set.seed(1)
# simulate data
Time <-c(1:4)
d <- data.frame(Time,
x1 = rpois(n = 4, lambda = 10),
x2 = runif(n = 4, min = 1, max = 10),
x3 = rpois(n = 4, lambda = 5),
x4 = runif(n = 4, min = 1, max = 5))
Next, make a separate data.frame that maps the color grouping to the variable names. At some point you'll want to make this a factor (i.e. discrete rather than continuous) to map it to color so I just do it here but it can be done later in the ggplot call if you prefer. Per your request, this solution easily scales with your dataset without needing to manually set each level, but it requires that your vector of color mappings is in the same order and the same length as the variable names in d unless you have some other way to establish that relationship.
# create separate df with color groupings for variable in d
color_grouping <- data.frame(var = names(d)[-1],
color_group = factor(c(1, 1, 2, 3)))
Then you pivot_longer and do a join to merge the color mapping with the data for plotting.
# pivot d to long and merge in color codes
d_long <- d %>%
pivot_longer(cols = -Time, names_to = "var", values_to = "value") %>%
left_join(color_grouping, by = "var") # join explicitly on the variable name
# inspect final table prior to plotting to confirm color mappings
head(d_long, 4)
# # A tibble: 4 x 4
# Time var value color_group
# <int> <chr> <dbl> <fct>
# 1 1 x1 8 1
# 2 1 x2 1.56 1
# 3 1 x3 4 2
# 4 1 x4 4.97 3
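Because the mapping has one row per variable rather than one per time point, the same join should carry over to the general case from the question, where d has five rows but only four class codes. A minimal sketch under that assumption; class_codes is an illustrative name, and the codes are assumed to be in the same order as the columns of d:

```r
library(dplyr)
library(tidyr)

set.seed(1)

# Five time points, four variables: the class vector no longer fits as a column
d <- data.frame(Time = 1:5,
                x1 = rpois(n = 5, lambda = 10),
                x2 = runif(n = 5, min = 1, max = 10),
                x3 = rpois(n = 5, lambda = 5),
                x4 = runif(n = 5, min = 1, max = 5))
class_codes <- c(1, 1, 2, 3)

# One row per variable; assumes class_codes follows the column order of d
color_grouping <- data.frame(var = setdiff(names(d), "Time"),
                             color_group = factor(class_codes))

# Long format, then attach the color group by variable name
d_long <- d %>%
  pivot_longer(cols = -Time, names_to = "var", values_to = "value") %>%
  left_join(color_grouping, by = "var")

# Each variable should map to exactly one color group
distinct(d_long, var, color_group)
```

The resulting d_long has the same columns as in the four-row case, so the plotting code does not need to change.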
Finally, generate line plot where color is mapped to the color_group variable. To ensure you get one line per original variable you also need to set group = var. For more info on this check the documentation on grouping.
# plot data adding labels for each line
p <- d_long %>%
ggplot(aes(x = Time, y = value, group = var, color = color_group)) +
geom_line() +
labs(color='class') +
scale_color_manual(values=c('red','blue','green')) +
geom_label_repel(aes(label = var),
data = d_long %>% slice_max(order_by = Time, n = 1),
nudge_x = 1.5,
na.rm = TRUE,
show.legend = F,
color='black')
p
This produces the following plot:
In your comment you suggested wanting to separate out and stacking the plots. I'm not sure I fully understood, but one way to accomplish this is with faceting.
For example if you wanted to facet out separate panels by color_group, you could add this line to the plot above:
p + facet_grid(rows = "color_group")
Which gives this plot:
Note that in this form the faceting variable is given as a quoted string; facet_grid(rows = vars(color_group)) is the unquoted equivalent.
You were on the right path, but you need a little bit of a different structure to use ggplot:
# delete old color column
d$c <- NULL
# reshape df
plot.d <- reshape2::melt(d, id.vars = c("Time"))
# create new, correct color column
plot.d$c <- NA
plot.d$c[plot.d$variable == "x1"] <- 1
plot.d$c[plot.d$variable == "x2"] <- 1
plot.d$c[plot.d$variable == "x3"] <- 2
plot.d$c[plot.d$variable == "x4"] <- 3
# plot
ggplot(plot.d, aes(x=Time, y=value, color=as.factor(c), group = variable))+
geom_line() +
labs(color='class')+
scale_color_manual(values=c('red','blue','green'))
Note that I omitted the labels for brevity, but you can add them back in using the same logic. The code above gives the following result:
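The four separate assignments can be collapsed into one lookup by indexing a named vector with the variable names. The sketch below is base R only: the long table is built by hand as a stand-in for reshape2::melt, and class_lookup and value_cols are illustrative names, not part of the answer above.

```r
set.seed(1)

Time <- 1:4
d <- data.frame(Time,
                x1 = rpois(n = 4, lambda = 10),
                x2 = runif(n = 4, min = 1, max = 10),
                x3 = rpois(n = 4, lambda = 5),
                x4 = runif(n = 4, min = 1, max = 5))

# Long format built by hand (stand-in for reshape2::melt)
value_cols <- setdiff(names(d), "Time")
plot.d <- data.frame(
  Time     = rep(d$Time, times = length(value_cols)),
  variable = rep(value_cols, each = nrow(d)),
  value    = unlist(d[value_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Named vector: names are the variables, values are the class codes
class_lookup <- c(x1 = 1, x2 = 1, x3 = 2, x4 = 3)

# Index the lookup by variable name to fill the whole column at once
plot.d$c <- unname(class_lookup[as.character(plot.d$variable)])

table(plot.d$variable, plot.d$c)
```

Adding a fifth curve then only means adding one entry to class_lookup.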
Here is a solution for how I understood your question.
The data frame is brought into long format, and the variable c is recreated with mutate / case_when using the number codes you used.
I have set a seed for better reproducibility.
library(tidyverse)
library(ggrepel)
set.seed(1)
# YOUR DATA
Time <- c(1:4)
d <- data.frame(Time,
x1 = rpois(n = 4, lambda = 10),
x2 = runif(n = 4, min = 1, max = 10),
x3 = rpois(n = 4, lambda = 5),
x4 = runif(n = 4, min = 1, max = 5),
c = c(1, 1, 2, 3)
)
d %>%
pivot_longer(cols = x1:x4) %>% # make it long
mutate(c = as.factor(case_when( # replace consistently
name == "x1" | name == "x2" ~ 1, # according to YOUR DATA
name == "x3" ~ 2,
name == "x4" ~ 3
))) %>%
mutate(
Label = ifelse(Time == max(Time, na.rm = T), name, NA),
Label = ifelse(duplicated(Label), NA, Label)
) %>%
ggplot(aes(x = Time, y = value, color = c, group = name)) +
geom_line() +
labs(color = "class") +
scale_color_manual(values = c("red", "blue", "green")) + # YOUR CHOICE
geom_label_repel(aes(label = Label),
nudge_x = 1.5,
na.rm = TRUE, show.legend = F, color = "black"
)
ADDED
You could leave the c out and color according to name.
The color code was necessary because you wanted 2 names with the same color. If that is not needed, the following code can do it.
d %>%
pivot_longer(cols = x1:x4) %>% # make it long
mutate(
Label = ifelse(Time == max(Time, na.rm = T), name, NA),
Label = ifelse(duplicated(Label), NA, Label)
) %>%
ggplot(aes(x = Time, y = value, color = name, group = name)) +
geom_line() +
geom_label_repel(aes(label = Label),
nudge_x = 1.5,
na.rm = TRUE, show.legend = F, color = "black"
)

Plot a barplot with repeated labels

I would like to plot data with repeated x-axis labels in the form of bar-plot without merging the values with repeated labels.
In the example I have a table de:
de <- data.frame(mean=c(10, 2, 3, 1, 4, 5, 3, 9),
base=c('A','A','C','G','T','T','T','A'))
And I would like to have a plot like this:
But when I run this in R:
ggplot(de, aes( y = mean, x =base))+
geom_bar(stat = 'identity')
This is what I get:
It merges the identical bases into one column, whereas I want a separate column for each value of base, even the repeated ones, as shown in the table above.
The easy way is to:
Set unique labels for the As and Ts in your "base" column; for example Ax, Ay, Tx, Ty etc:
de <- data.frame(mean=c(10, 2, 3, 1, 4, 5, 3, 9),
base=c("Ax", "Ay", "C", "G", "Tx","Ty", "Tz", "A"))
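And then change the x-axis labels. The labels are named by base value so that each one stays with its own bar, because the axis sorts character values alphabetically rather than in row order:
ggplot(de, aes( y = mean, x =base))+
geom_bar(stat = 'identity') +
scale_x_discrete(labels=c(Ax = "A", Ay = "A", C = "C", G = "G", Tx = "T", Ty = "T", Tz = "T", A = "A"))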
Building off @sargg's excellent answer, we can prevent the possibility of human error by generating the unique base names and the ggplot labels automatically with dplyr:
library(dplyr)
de2 <- de %>%
group_by(base) %>%
mutate(unique_base = paste0(base, row_number()))
# A tibble: 8 x 3
# Groups: base [4]
mean base unique_base
<dbl> <fct> <chr>
1 10 A A1
2 2 A A2
3 3 C C1
4 1 G G1
5 4 T T1
6 5 T T2
7 3 T T3
8 9 A A3
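ggplot(de2, aes(y = mean, x = unique_base)) +
geom_bar(stat = 'identity') +
# name the labels by unique_base so each label is matched to its own bar
scale_x_discrete(labels = setNames(as.character(de2$base), de2$unique_base))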
For an even DRY-er answer, we can pass the data in like this (note the curly braces: see this answer for more information):
de2 %>% {
ggplot(., aes( y = mean, x =unique_base))+
geom_bar(stat = 'identity') +
scale_x_discrete(labels = setNames(as.character(.$base), .$unique_base))
}
This lets us access the de2 data frame from within the ggplot call with ., so the labels can be built from .$base and .$unique_base rather than having to name the data frame de2 again.
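One caveat with character x values: ggplot2 sorts them alphabetically, so the bars do not follow the row order of the table, and labels passed as a plain unnamed vector are assigned by axis position. The following is a minimal sketch, not taken from the answers above, that keeps the original row order through factor levels and ties each label to its bar by name. The names idx and axis_labels are illustrative.

```r
library(ggplot2)

de <- data.frame(mean = c(10, 2, 3, 1, 4, 5, 3, 9),
                 base = c("A", "A", "C", "G", "T", "T", "T", "A"),
                 stringsAsFactors = FALSE)

# Running index within each base, e.g. the third "A" gets 3
idx <- ave(seq_along(de$base), de$base, FUN = seq_along)

# Unique key per row; the levels follow row order, so the bars do too
de$unique_base <- paste0(de$base, idx)
de$unique_base <- factor(de$unique_base, levels = de$unique_base)

# Named vector: names are the unique keys, values are the labels to print
axis_labels <- setNames(de$base, as.character(de$unique_base))

p <- ggplot(de, aes(x = unique_base, y = mean)) +
  geom_col() +
  scale_x_discrete(labels = axis_labels)
p
```

Since the labels are matched by name, they should stay correct even if the level order is changed later.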
Though there already is an accepted solution, I will post another one, creating the desired labels from the original dataset.
First, an example dataset creation code.
set.seed(1234)
values <- sample(20, 8)
base <- c('A', 'A', 'C', 'G', 'T', 'T', 'T', 'A')
de <- data.frame(base, values)
Now the code to plot the graph.
library(tidyverse)
de %>%
mutate(base1 = paste0(seq_along(base), base)) %>%
ggplot(aes(x = base1, y = values)) +
geom_bar(stat = 'identity') +
geom_text(aes(x = base1, y = -1,
label = base)) +
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank())

gganimate: include additional variable other than states level variable or frame in title expression

I'd like to insert another column value of my data into a gganimate animation title.
Example, here the states level variable is x and I'd like to add to title variable y:
df <- tibble(x = 1:10, y = c('a', 'a', 'b', 'd', 'c', letters[1:5]))
df
A tibble: 10 x 2
x y
<int> <chr>
1 1 a
2 2 a
3 3 b
4 4 d
5 5 c
6 6 a
7 7 b
8 8 c
9 9 d
10 10 e
This works as expected:
ggplot(df, aes(x, x)) +
geom_point() +
labs(title = '{closest_state}') +
transition_states(x,
transition_length = 0.1,
state_length = 0.1)
This fails:
ggplot(df, aes(x, x)) +
geom_point() +
labs(title = '{closest_state}, another_var: {y}') +
transition_states(x,
transition_length = 0.1,
state_length = 0.1)
Error in eval(parse(text = text, keep.source = FALSE), envir) :
object 'y' not found
Also tried this, but y will not change:
ggplot(df, aes(x, x)) +
geom_point() +
labs(title = str_c('{closest_state}, another_var: ', df$y)) +
transition_states(x,
transition_length = 0.1,
state_length = 0.1)
Another option is to map y as the states-level variable and use the frame variable instead of x. But in my application y is either a not-necessarily-unique character variable like above, or a numeric variable that is again not necessarily unique and not necessarily ordered. In that case gganimate (or ggplot?) orders it as it sees fit, so the final result is no longer ordered by x:
ggplot(df, aes(x, x)) +
geom_point() +
labs(title = '{frame}, another_var: {closest_state}') +
transition_states(y,
transition_length = 0.1,
state_length = 0.1)
So how to simply add the changing value of the un-ordered, not numeric, y variable?
Finally: This question was asked here but without a reproducible example so it was not answered, hoping this one is better.
One dirty solution would be to paste together the variables and make a new one to use in the transition_states:
df <- mutate(df, title_var = factor(paste(x, y, sep="-"), levels = paste(x, y, sep="-")))
# # A tibble: 6 x 3
# x y title_var
# <int> <chr> <fct>
# 1 1 a 1-a
# 2 2 a 2-a
# 3 3 b 3-b
# 4 4 d 4-d
# 5 5 c 5-c
# 6 6 a 6-a
Then we could use gsub() in order to strip the unwanted part from closest_state, like this:
gsub(pattern = "\\d+-", replacement = "", "1-a")
"a"
So:
ggplot(df, aes(x, x)) +
geom_point() +
labs(title = '{gsub(pattern = "\\d+-", replacement = "", closest_state)}') +
transition_states(title_var, transition_length = 0.1, state_length = 0.1)
Another possibility, slightly more compact, from the author of gganimate himself, following the issue I opened:
https://github.com/thomasp85/gganimate/issues/252#issuecomment-450846868
According to Thomas:
There are multiple reasons why random columns from the input data
cannot be accessed so it is unlikely to get any better than this...
Here's a solution using dplyr, based on the gganimate developer Thomas's solution, provided by Giora.
library(tidyverse)
library(gganimate)
df <- tibble::tibble(x = 1:10, y = c('a', 'a', 'b', 'd', 'c', letters[1:5]))
a <- ggplot(df, aes(x, x)) +
geom_point() +
labs(title = "{closest_state}, another_var: {df %>% filter(x == closest_state) %>% pull(y)}") +
transition_states(x,
transition_length = 0.1,
state_length = 0.1)
animate(a)
The gganimate titles use glue syntax for the animated title elements, and you can include entire dplyr data manipulation pipelines within them.
You can refer to the closest_state variable provided by gganimate::transition_states() within your dplyr calls. Here, since the animation's frames are indexed by successive levels of x, I use filter() to subset df for a given frame based on the value of x and then refer to corresponding rows of column y, which contain additional information I'd like to display in the title. Using pull, you can grab the individual value of y corresponding to x and display it within the animation's title.
This is a clean and straightforward way to do it with the advantage that you can, e.g., compute summary values to display on-the-fly by adding summarize() and other calls in your magrittr pipeline.
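When only a single value per state is needed, the filter/pull pipeline can be replaced by a named-vector lookup. The sketch below shows just the lookup in base R so it can be checked on its own; y_by_x and title_for are hypothetical names, not part of gganimate.

```r
df <- data.frame(x = 1:10,
                 y = c("a", "a", "b", "d", "c", letters[1:5]),
                 stringsAsFactors = FALSE)

# Named vector: names are the states as text, values are the extra column
y_by_x <- setNames(df$y, as.character(df$x))

# Hypothetical helper: build the title text for one state
title_for <- function(state) {
  paste0(state, ", another_var: ", y_by_x[[as.character(state)]])
}

title_for("4")
title_for(10)
```

In the plot this would be used as labs(title = "{title_for(closest_state)}"), assuming the title expression can see title_for in the same way it sees df in the answer above.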

How to use the first value of a column as a subtitle in ggplot?

Consider this simple chart
library(ggplot2)
library(dplyr)
# data_frame() is deprecated; tibble() is its replacement
tibble(group = c('a', 'a', 'b', 'b'),
x = c(1,2,3,4),
y = c(10,11,12,13),
title = c('one', 'one', 'two', 'two')) %>%
ggplot(aes(x = x, y = y, group = group)) + geom_point(size = 12)+
facet_wrap(~group)
# The data being plotted:
# # A tibble: 4 x 4
# group x y title
# <chr> <dbl> <dbl> <chr>
# 1 a 1 10 one
# 2 a 2 11 one
# 3 b 3 12 two
# 4 b 4 13 two
Here, I would like to show the string shown in the title column (as you can see, it is always the same for each group) on a subtitle for each chart.
I tried to play with labs(subtitle = .$title[[1]]) but that says Error in labs(subtitle = .$title[[1]]) : object '.' not found
Any ideas?
Thanks!
You can paste the titles to combine them with the group labels, and use that as the facet label:
tibble(group = c('a', 'a', 'b', 'b'),
x = c(1,2,3,4),
y = c(10,11,12,13),
title = c('one', 'one', 'two', 'two')) %>%
mutate(group_title = paste0(group, "\n", title)) %>%
ggplot(aes(x = x, y = y, group = group)) + geom_point(size = 12)+
facet_wrap(~group_title)
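If the group column should stay untouched, the combined text can instead be supplied through the facet labeller. This is a sketch under the assumption, stated in the question, that each group has exactly one title; group_titles and facet_labels are illustrative names.

```r
library(ggplot2)

d <- data.frame(group = c("a", "a", "b", "b"),
                x = c(1, 2, 3, 4),
                y = c(10, 11, 12, 13),
                title = c("one", "one", "two", "two"),
                stringsAsFactors = FALSE)

# One (group, title) pair per facet
group_titles <- unique(d[c("group", "title")])

# Named vector: names are the facet values, values are the strip text
facet_labels <- setNames(paste0(group_titles$group, "\n", group_titles$title),
                         group_titles$group)

p <- ggplot(d, aes(x = x, y = y, group = group)) +
  geom_point(size = 12) +
  facet_wrap(~group, labeller = as_labeller(facet_labels))
p
```

The faceting variable is still group here, so other layers and scales that refer to group are unaffected.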

Using aes_string to change color of geom_point from dynamic column

Here's my initial dataframe.
data.df
x y z label
2 3 4 1
1 2 3 2
2 4 3 3
To make ggplot, this works when there is only 1 column (label) :
g <- ggplot(data.df) +
geom_point(data = data.df, aes(x= x, y= y,
color = ifelse(label == 2, "a", "b"))) +
scale_colour_manual(values= c("a" = "blue", "b" = "green"))
g
On clicking a button called "merge", new column gets added dynamically:
x y z label label2
2 3 4 1 1
1 2 3 2 2
2 4 3 3 2
Now in ggplot I need to access LAST column instead of label column (it could be label2, label3...) and update ggplot.
I tried two ways.
g <- ggplot(data.df) +
geom_point(data = data.df, aes(x= x, y= y,
color = ifelse(( data.df[, ncol(data.df)] == 2, "a", "b")+
scale_colour_manual(values= c("a" = "blue", "b" = "green"))
return g
As shown while using data.df[, ncol(data.df)] , I'm getting the error:
Error: Aesthetics must be either length 1 or the same as the data (40): x, y, colour
I have a feeling aes_string can be used instead of aes:
label <- paste("label", counter , sep="")
g <- ggplot(data.df) +
geom_point(data = data.df, aes_string(x= "x", y= "y",
color = ifelse((label == 2), a, b))) +
scale_colour_manual(values= c("a" = "blue", "b" = "green"))
I'm getting this error:
Error in ifelse((label == 2), a, b))), : object a not found
My opinion is to do your standard evaluation that allows for the dynamic functionality before you go into ggplot2 functions.
The below takes advantage of Standard evaluation versions of dplyr functions. It creates a column in the data frame dynamically called formatCol and bases the colour scale on it.
data.df <- data.frame(x = c(2, 1, 2),
y = c(3, 2, 4),
z = c(4, 3, 3),
label = c(1, 2, 3),
label2 = c(1, 2, 2))
library(ggplot2)
library(dplyr)
library(lazyeval)
formatCol <- names(data.df)[ncol(data.df)]
formula <- interp(~ifelse((label == 2), "a", "b"), label = as.name(formatCol))
plot.df <- data.df %>% mutate_(formatCol = formula)
g <- ggplot(plot.df, aes(x= x, y= y)) +
geom_point( aes(color = formatCol))+
scale_colour_manual(values= c("a" = "blue", "b" = "green"))
g
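The underscored dplyr verbs such as mutate_() have since been deprecated, so the same idea (resolve the dynamic column name before calling ggplot2) can be sketched without lazyeval by using base R `[[` indexing. The name last_col is illustrative, and the sketch assumes the newest label column is always the last column of data.df.

```r
library(ggplot2)

data.df <- data.frame(x = c(2, 1, 2),
                      y = c(3, 2, 4),
                      z = c(4, 3, 3),
                      label = c(1, 2, 3),
                      label2 = c(1, 2, 2))

# Name of the most recently added column
last_col <- names(data.df)[ncol(data.df)]

# Work on a copy so data.df keeps the label column as its last column
plot.df <- data.df
plot.df$formatCol <- ifelse(plot.df[[last_col]] == 2, "a", "b")

g <- ggplot(plot.df, aes(x = x, y = y)) +
  geom_point(aes(color = formatCol)) +
  scale_colour_manual(values = c("a" = "blue", "b" = "green"))
g
```

Because the helper column is added to a copy, running the code again after another merge still picks up the newest label column.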
