Ordering alphanumeric variables for plotting - r

How do I order a set of variable names along the x-axis when they contain both letters and numbers? These come from a survey where the variables are formatted like var1, below. When plotted, they appear as out_1, out_10, out_11...
But what I would like is for them to be plotted as out_1, out_2...
library(tidyverse)
var1<-rep(paste0('out','_', seq(1,12,1)), 100)
var2<-rnorm(n=length(var1) ,mean=2)
df<-data.frame(var1, var2)
ggplot(df, aes(x=var1, y=var2))+geom_boxplot()
I tried this:
df %>%
  separate(var1, into=c('A', 'B'), sep='_') %>%
  arrange(B) %>%
  ggplot(., aes(x=B, y=var2))+geom_boxplot()

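You can order the levels of var1 before plotting by converting it to a factor whose levels follow the order of first appearance. Note that assigning to levels(df$var1) directly would rename the levels rather than reorder them.
df$var1 <- factor(df$var1, levels = unique(df$var1))
ggplot(df, aes(var1, var2)) + geom_boxplot()
Or you can specify the order in the ggplot scale options with limits (labels on its own only renames the ticks):
ggplot(df, aes(var1, var2)) +
  geom_boxplot() +
  scale_x_discrete(limits = as.character(unique(df$var1)))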
Both cases will give the same result:
You can also use the scale to give personalized labels; there's no need to create a new variable. Set limits as well, so that each label is matched to the right category:
ggplot(df, aes(var1, var2)) +
  geom_boxplot() +
  scale_x_discrete('output',
                   limits = as.character(unique(df$var1)),
                   labels = gsub('out_', '', unique(df$var1)))
Check ?scale_x_discrete for details. You can use limits, breaks and labels in different combinations, including labels that come from outside your data.frame:
pers.labels <- paste('Output', 1:12)
ggplot(df, aes(var1, var2)) +
  geom_boxplot() +
  scale_x_discrete(NULL,
                   limits = as.character(unique(df$var1)),
                   labels = pers.labels)
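The answers above rely on unique(df$var1) already being in the desired order, which holds for the example data because it was generated in sequence. If the rows arrive in arbitrary order, one option is to sort the labels by their numeric suffix first. The following is a minimal sketch in base R, assuming every label ends in an underscore followed by digits; the shuffled data and the names suffix and ordered_levels are illustrative only.

```r
# Hypothetical data: the same labels as in the question, deliberately shuffled
set.seed(1)
var1 <- sample(rep(paste0("out_", 1:12), 100))
var2 <- rnorm(length(var1), mean = 2)
df <- data.frame(var1, var2)

# Strip everything up to the last underscore and convert the rest to a number
suffix <- as.numeric(sub("^.*_", "", df$var1))

# Order the labels by that number and keep each one once
ordered_levels <- unique(df$var1[order(suffix)])

# Use the numerically sorted labels as factor levels
df$var1 <- factor(df$var1, levels = ordered_levels)
levels(df$var1)
```

The resulting factor can be passed to ggplot as in the first snippet of the answer, and the x-axis should then follow the numeric order.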

Related

Can you add a count to the legend for each level of a factor in ggplot?

I am producing a scatterplot using ggplot, and will be colouring the data points by a given factor. The legend that is produced shows the colour assigned to each level of the factor, but is it possible for it to also show the number of points in each level?
For example, I have included the code for the cars data set:
p <- ggplot(mtcars, aes(wt, mpg))
p + geom_point(aes(colour = factor(cyl)))
In this plot, I would be looking to have the count for each number of cylinders. So 4(Count 1), 6(Count 2) and 8(Count 3).
Thanks in advance.
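You can try something like this:
library(dplyr)
library(ggplot2)
mtcars %>%
  group_by(cyl) %>%
  # build a label such as "4 (Count 11)" for every row of each group
  mutate(label = paste0(cyl, ' (Count ', n(), ')')) %>%
  ggplot(aes(wt, mpg)) +
  geom_point(aes(colour = factor(label)))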
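An alternative is to leave the data untouched and change only the legend labels. The sketch below assumes the counts can be taken from table() and that its order matches the levels of factor(cyl), which is the case here because both sort the values 4, 6, 8; the names n_by_cyl and legend_labels are illustrative.

```r
library(ggplot2)

# Count the rows for each number of cylinders
n_by_cyl <- table(mtcars$cyl)

# One legend label per level, in the same order as the levels of factor(cyl)
legend_labels <- paste0(names(n_by_cyl), " (Count ", as.vector(n_by_cyl), ")")

# Keep cyl as the colour variable and relabel the legend entries
p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point(aes(colour = factor(cyl))) +
  scale_colour_discrete(name = "cyl", labels = legend_labels)
```

Printing p draws the plot. The colour mapping still uses cyl itself, so other layers that refer to cyl are unaffected.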

Maintain order of dataframe for a stacked barplot using ggplot2

Using the following dataframe and ggplot...
sample ="BC04"
df<- data.frame(Name=c("Pseudomonas veronii", "Pseudomonas stutzeri", "Janthinobacterium lividum", "Pseudomonas viridiflava"),
                Abundance=c(7.17, 4.72, 3.44, 3.33))
ggplot(data=df, aes(x=sample, y=Abundance, fill=Name)) +
  geom_bar(stat="identity")
... creates the following bar plot.
Although geom_bar is set to stat="identity", it still ignores the order of the rows in the dataframe. I would like the stack order to follow the Abundance percentage (highest percentage at the top, then the rest in order).
Strings passed to ggplot used to be evaluated with aes_string (which is now deprecated). Now, we convert the string to a symbol and evaluate it with !!. This applies when sample holds the name of a column in df:
library(ggplot2)
ggplot(data=df, aes(x= !! rlang::sym(sample), y=Abundance, fill=Name)) +
  geom_bar(stat="identity")
Or another option is the .data pronoun:
ggplot(data=df, aes(x= .data[[sample]], y=Abundance, fill=Name)) +
  geom_bar(stat="identity")
Update
Judging by the plot, the OP may have created a column named 'sample'. In that case, we can reorder 'Name' by descending 'Abundance':
df$sample <- "BC04"
ggplot(data = df, aes(x = sample, y = Abundance,
                      fill = reorder(Name, -Abundance))) +
  geom_bar(stat = 'identity') +
  guides(fill = guide_legend(title = "Name"))
Or another option is to convert 'Name' to a factor whose levels are the unique elements of 'Name' in their current order (the data is already arranged in descending order of 'Abundance'):
library(dplyr)
df %>%
  mutate(Name = factor(Name, levels = unique(Name))) %>%
  ggplot(aes(x = sample, y = Abundance, fill = Name)) +
  geom_bar(stat = 'identity')
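The factor(Name, levels = unique(Name)) approach depends on the rows already being sorted by 'Abundance'. As a small check of what reorder does when they are not, here is a sketch in base R; the row order below is shuffled on purpose and is not the OP's data frame.

```r
# Same names and values as the question, with the rows deliberately shuffled
df <- data.frame(
  Name = c("Janthinobacterium lividum", "Pseudomonas veronii",
           "Pseudomonas viridiflava", "Pseudomonas stutzeri"),
  Abundance = c(3.44, 7.17, 3.33, 4.72)
)

# reorder() sorts the levels by ascending score, so negating Abundance
# puts the most abundant name first regardless of row order
df$Name <- reorder(df$Name, -df$Abundance)

levels(df$Name)
```

The levels come out from most to least abundant, so the same fill = Name mapping can be used without sorting the data frame first.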

geom_text not properly positioned when using position_dodge(preserve="single") in bar plot

With the code below the labelling in facet bB is not correctly positioned.
The problem seems to originate from the fact that there is no position_dodge(preserve="single") for geom_text (correct?). I am aware that I could 'manually' add an empty dummy cell (filling y=0 in facet bB), but I was wondering whether there is any way to correct for it in ggplot directly?
library(tidyverse)
v1 <- LETTERS[1:2]
v2 <- letters[1:2]
v3 <- c("x","y")
g <- expand.grid(v1,v2,v3)
val=c(sample(10,8))
df<- data.frame(g,val)
df<- df[-8,]
df %>% ggplot() +
  geom_bar(aes(x=Var2, y=val, fill=Var3, group=Var3),
           stat="identity",
           position=position_dodge(preserve="single"))+
  geom_text(aes(x=Var2, y=val+1, label=val, group=Var3),
            position=position_dodge(width=1))+
  facet_grid(Var1~Var2, scales="free_x")
Update/answer: using position_dodge2 aligns the bar with the label (however, the bar is then centered and not aligned with the bars in the other facets).
df %>% ggplot() +
  geom_bar(aes(x=Var2, y=val, fill=Var3, group=Var3), stat="identity",
           position=position_dodge2(preserve="single"))+
  geom_text(aes(x=Var2, y=val+1, label=val, group=Var3),
            position=position_dodge2(width=1))+
  facet_grid(Var1~Var2, scales="free_x")
If the example is similar to your actual use case, I suggest avoiding all this trouble with position_dodge and simply assigning different variables to the x-axis and the columns of facet_grid instead. You are currently using "Var2" for both.
ggplot(df,
       aes(x = Var3, y = val)) + # put Var3 here instead of Var2
  geom_col(aes(fill = Var3)) + # dodge becomes unnecessary here
  geom_text(aes(y = val + 1, label = val)) + # as above
  facet_grid(Var1 ~ Var2) +
  # optional, simulates the same appearance as original
  labs(x = "Var2") +
  theme(axis.text.x = element_blank(),
        axis.ticks.x = element_blank())
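The question also mentions adding an empty dummy cell by hand. For completeness, a sketch of that route follows: it fills in the missing combination with merge() so that every facet has the same number of groups, in which case a plain position_dodge can be used for both layers. The seed, the dodge width of 0.9 and the names grid and df_full are assumptions for illustration; the missing cell is stored as NA here rather than 0, and ggplot2 is expected to drop it with a warning when drawing.

```r
library(ggplot2)

set.seed(42)  # hypothetical seed, only to make the sketch reproducible
grid <- expand.grid(Var1 = LETTERS[1:2], Var2 = letters[1:2], Var3 = c("x", "y"))
df <- data.frame(grid, val = sample(10, 8))
df <- df[-8, ]  # drop one combination, as in the question

# Re-create the full grid; the missing combination gets val = NA
df_full <- merge(grid, df, all.x = TRUE)

# Both layers share the same dodge, so bars and labels line up
dodge <- position_dodge(width = 0.9)
p <- ggplot(df_full, aes(x = Var2, y = val, fill = Var3, group = Var3)) +
  geom_col(position = dodge) +
  geom_text(aes(y = val + 1, label = val), position = dodge) +
  facet_grid(Var1 ~ Var2, scales = "free_x")
```

Printing p draws the plot; the slot for the missing combination stays empty instead of the remaining bar being widened or re-centered.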

Display the total number of bin elements in a stacked histogram with ggplot2

I'd like to show data values on a stacked bar chart in ggplot2. After many attempts, the only way I found to show the total amount (for each bin) is using the following code
library(ggplot2)
set.seed(1234)
df <- data.frame(
  sex=factor(rep(c("F", "M"), each=200)),
  weight=round(c(rnorm(200, mean=55, sd=5), rnorm(200, mean=65, sd=5)))
)
p<-ggplot(df, aes(x=weight, fill=sex, color=sex))
p<-p + geom_histogram(position="stack", alpha=0.5, binwidth=5)
# extract the bin positions and counts computed by the histogram layer
tbl <- (ggplot_build(p)$data[[1]])[, c("x", "count")]
agg <- aggregate(tbl["count"], by=tbl["x"], FUN=sum)
# add one text layer per non-empty bin
for(i in 1:length(agg$x))
  if(agg$count[i])
    p <- p + geom_text(x=agg$x[i], y=agg$count[i] + 1.5, label=agg$count[i], colour="black" )
which generates the following plot:
Is there a better (and more efficient) way to get the same result using ggplot2?
Thanks a lot in advance
You can use stat_bin to count up the values and add text labels.
p <- ggplot(df, aes(x=weight)) +
  geom_histogram(aes(fill=sex, color=sex),
                 position="stack", alpha=0.5, binwidth=5) +
  stat_bin(aes(y=after_stat(count) + 2, label=after_stat(count)), geom="text", binwidth=5)
I moved the fill and color aesthetics to geom_histogram so that they would apply only to that layer and not globally to the whole plot, because we want stat_bin to generate an overall count for each bin, rather than separate counts for each level of sex. count is a variable computed by stat_bin that stores the counts; after_stat() gives access to it (before ggplot2 3.3.0 this was written ..count..).
In this case, it was straightforward to add the counts directly. However, in more complicated situations, you might sometimes want to summarise the data outside of ggplot and then feed the summary data to ggplot. Here's how you would do that in this case:
library(dplyr)
counts = df %>% group_by(weight = cut(weight, seq(30,100,5), right=FALSE)) %>%
  summarise(n = n())
countsByGroup = df %>% group_by(sex, weight = cut(weight, seq(30,100,5), right=FALSE)) %>%
  summarise(n = n())
# fill and color are mapped inside geom_bar only, because counts has no sex column
ggplot(countsByGroup, aes(x=weight, y=n)) +
  geom_bar(aes(fill=sex, color=sex), stat="identity", alpha=0.5, width=1) +
  geom_text(data=counts, aes(label=n, y=n+2), colour="black")
Or, you can just create countsByGroup and then create the equivalent of counts on the fly inside ggplot:
ggplot(countsByGroup, aes(x=weight, y=n, fill=sex, color=sex)) +
  geom_bar(stat="identity", alpha=0.5, width=1) +
  geom_text(data=countsByGroup %>% group_by(weight) %>% mutate(n=sum(n)),
            aes(label=n, y=n+2), colour="black")
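To check the summary step outside of ggplot, the per-bin totals can also be computed in base R. This is only a sketch that reuses the breaks from the answer, seq(30, 100, 5) with right = FALSE; the names bins, by_sex and totals are illustrative.

```r
set.seed(1234)
df <- data.frame(
  sex = factor(rep(c("F", "M"), each = 200)),
  weight = round(c(rnorm(200, mean = 55, sd = 5), rnorm(200, mean = 65, sd = 5)))
)

# Assign every weight to a half-open 5-unit bin, e.g. [50,55)
bins <- cut(df$weight, seq(30, 100, 5), right = FALSE)

# Counts per bin and sex (rows are bins, columns are F and M)
by_sex <- table(bins, df$sex)

# Total per bin: the number to print above each stacked bar
totals <- rowSums(by_sex)
totals[totals > 0]
```

Each total is the sum of the F and M counts in that bin, which is the value the text labels in the plots above are meant to show.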

ggplot: order of factors with duplicate levels

ggplot changes the order of an axis variable, which I do not want. I know I can change the variable to a factor and specify the levels to get around this, but what if the levels contain duplicate values?
An example is below. The only alternative I can think of is to use reorder(), but I can't get that to preserve the original order of the variable.
require(ggplot2)
season <- c('Sp1', 'Su1', 'Au1', 'Wi1', 'Sp2', 'Su2', 'Au2', 'Wi2', 'Sp3', 'Su3', 'Au3', 'Wi3') # this is the order I want the seasons to appear in
tempa <- rnorm(12, 15)
tempb <- rnorm(12, 20)
df <- data.frame(season=rep(season, 2), temp=c(tempa, tempb), type=c(rep('A',12), rep('B',12)))
# X-axis order wrong:
ggplot(df, aes(x=season, y=temp, colour=type, group=type)) + geom_point() + geom_line()
# X-axis order correct, but warning of duplicate levels in factor
df$season2 <- factor(df$season, levels=df$season)
ggplot(df, aes(x=season2, y=temp, colour=type, group=type)) + geom_point() + geom_line()
Just so this has an answer, this works just fine:
df$season2 <- factor(df$season, levels=unique(df$season))
ggplot(df, aes(x=season2, y=temp, colour=type, group=type)) +
  geom_point() +
  geom_line()
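The reason this works is that unique() drops the repeated values but keeps each one at the position where it first appears, so the levels end up in the original season order. A short base R sketch to confirm this, using the season vector from the question (x and f are illustrative names):

```r
season <- c('Sp1', 'Su1', 'Au1', 'Wi1', 'Sp2', 'Su2', 'Au2', 'Wi2',
            'Sp3', 'Su3', 'Au3', 'Wi3')

# The column holds every season twice, once per type
x <- rep(season, 2)

# unique() keeps the first occurrence of each value, in order of appearance
f <- factor(x, levels = unique(x))

levels(f)
```

The levels match season exactly, with no duplicates and therefore no warning or error about duplicated levels.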
