How to change order of factor levels in ggplot facet wrap (R)

I have a factor with levels Xa, aXa and aX. Because R sorts factor levels alphabetically by default, the facets appear in the order aX, aXa, Xa. I want Xa to be the first panel. I tried the following code:
data_small<- read.csv("agg_cond_subj_123.csv")
data_small<-fct_relevel(data_small$pos, "Xa", "aXa", "aX")
And then fed it to ggplot:
data_small %>%
ggplot(aes(lg, prop))+
geom_boxplot()+
facet_wrap(~pos)+
labs(x="Language group",
y="Accuracy (%)")
Xa was still placed last. I then tried piping the data directly through fct_reorder():
data_small %>%
  fct_reorder(pos, "Xa", "aXa", "aX")
ggplot(aes(lg, prop)) +
  geom_boxplot() +
  facet_wrap(~pos) +
  labs(x = "Language group",
       y = "Accuracy (%)")
but it gave an error:
Error: f must be a factor (or character vector).
I have already looked at related solutions on this site, but none of them did what I need.

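Both attempts fail for the same reason: forcats functions such as fct_relevel() take a factor (or character vector), not a data frame. In the first attempt, assigning the result back to data_small replaces the whole data frame with a bare factor; in the second, the pipe passes the data frame itself as the first argument f, which is what triggers the error. (fct_reorder() is also the wrong tool here: it orders levels by a summary of another variable, whereas you want to set them by hand.) Relevel the column inside mutate() instead, so the reordered factor stays in the data frame that is piped into ggplot():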
data_small %>%
  mutate(pos = factor(pos, levels = c("Xa", "aXa", "aX"))) %>%
  ggplot(aes(lg, prop)) +
  geom_boxplot() +
  facet_wrap(~pos) +
  labs(x = "Language group",
       y = "Accuracy (%)")
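A self-contained sketch of the same fix, for readers without the question's CSV. The data frame below is hypothetical (invented values, reusing the question's column names lg, pos and prop). It uses forcats::fct_relevel() inside mutate() instead of factor(..., levels = ): fct_relevel() moves the listed levels to the front and keeps any unlisted ones after them, whereas factor() turns values missing from levels into NA.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Hypothetical stand-in for agg_cond_subj_123.csv (invented values)
data_small <- data.frame(
  lg   = rep(c("G1", "G2"), times = 6),
  pos  = rep(c("aX", "aXa", "Xa"), each = 4),
  prop = c(80, 75, 82, 70, 60, 65, 58, 62, 90, 88, 92, 85)
)

data_small <- data_small %>%
  # Move the named levels to the front, in exactly this order
  mutate(pos = fct_relevel(pos, "Xa", "aXa", "aX"))

levels(data_small$pos)

# facet_wrap() lays out panels in level order, so Xa comes first
p <- ggplot(data_small, aes(lg, prop)) +
  geom_boxplot() +
  facet_wrap(~pos) +
  labs(x = "Language group", y = "Accuracy (%)")
```

The key point is that the column is modified in place within the data frame, so ggplot() still receives a data frame whose pos column carries the new level order.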

Related

Excluding levels/groups within categorical variable (ggplot graph)

I am relatively new to ggplot and want to visualize a categorical variable with 11 groups (levels). I ran the code below to produce a bar graph showing the frequency of each group. However, some groups of the categorical variable "active" occur only once or not at all, and they clutter the graph. Is it possible to exclude, directly in ggplot, the groups that have fewer than 2 observations?
I am also open to recommendations on how to visualize a categorical variable with multiple groups/levels if a bar graph isn't suitable here.
Data types:
sapply(df, class)
   username      active
"character" "character"
ggplot(data = df, aes(x = active)) +
geom_bar()
You can count() the categories first and filter() the counts before passing the result to ggplot. Because the data are then already summarized, use geom_col() instead of geom_bar():
df %>% count(active) %>% filter(n >= 2) %>%
ggplot(aes(x=active,y=n)) +
geom_col()
Alternatively, you can group_by() and filter() directly within your ggplot() call, like this:
ggplot(df %>% group_by(active) %>% filter(n() >= 2), aes(x=active)) +
geom_bar()
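A minimal runnable sketch of the count()/filter() approach. The df below is hypothetical (invented usernames and groups); group "c" occurs only once, so it is dropped by the n >= 2 filter.

```r
library(dplyr)
library(ggplot2)

# Hypothetical data: "c" has a single observation
df <- data.frame(
  username = paste0("u", 1:8),
  active   = c("a", "a", "a", "b", "b", "c", "d", "d")
)

counts <- df %>%
  count(active) %>%   # one row per group, with its size in column n
  filter(n >= 2)      # keep only groups with at least 2 observations

counts$active

# The data are already summarized, so geom_col() plots n as-is
p <- ggplot(counts, aes(x = active, y = n)) +
  geom_col()
```

Filtering the summary rather than the raw rows also makes it easy to inspect exactly which groups were removed before plotting.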

Reorder ggplot axis by one value, display labels from another

Situation is as follows:
I have many names, each with a corresponding code.
Every code is unique, but several codes share the same name.
This creates a problem when plotting: I need to group_by(code) and reorder(name, code), but the codes are meaningless and I want to display the names. Because some codes share a name, plotting by name merges their bars.
Example to illustrate:
library(tidyverse)
set.seed(1)
# example df
df <- tibble("name" = c('apple','apple','pear','pear','pear',
'orange','banana','peach','pie','soda',
'pie','tie','beer','picnic','cigar'),
"code" = seq(1,15),
"value" = round(runif(15,0,100)))
df %>%
ggplot(aes(x=reorder(name,value)))+
geom_bar(aes(y=value),
stat='identity')+
coord_flip()+
ggtitle("The axis labels I want, but the order I don't")
df %>%
ggplot(aes(x=reorder(code,value)))+
geom_bar(aes(y=value),
stat='identity')+
coord_flip()+
ggtitle("The order I want, but the axis labels I don't")
I'm not sure how to get ggplot to keep the bars and order of the second plot while replacing the axis labels with the names from the first plot.
What about using interaction() to combine name and code, and then replacing the labels with the appropriate ones in scale_x_discrete(), as follows:
df %>%
ggplot(aes(x=interaction(reorder(name, value),reorder(code,value))))+
geom_bar(aes(y=value),
stat='identity')+
scale_x_discrete(labels = function(x) sub("\\..*$","",x), name = "name")+
coord_flip()
Is this what you are looking for?
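A different approach (a swap of my own, not the answer above): keep reorder(code, value) on the axis so the order is correct, and translate each code back to its name with a labeling function passed to scale_x_discrete(labels = ). The helper code_to_name() is hypothetical; it relies on every code being unique, as stated in the question.

```r
library(ggplot2)

set.seed(1)
# Same example data as the question
df <- data.frame(
  name  = c("apple", "apple", "pear", "pear", "pear",
            "orange", "banana", "peach", "pie", "soda",
            "pie", "tie", "beer", "picnic", "cigar"),
  code  = seq(1, 15),
  value = round(runif(15, 0, 100)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the axis breaks arrive as code strings,
# so look up the name belonging to each code
code_to_name <- function(codes) df$name[match(codes, as.character(df$code))]

p <- ggplot(df, aes(x = reorder(factor(code), value), y = value)) +
  geom_col() +
  scale_x_discrete(labels = code_to_name, name = "name") +
  coord_flip()
```

Because the axis is still code, bars that share a name (the two "pie" codes, for instance) stay separate and keep their own position; only the printed labels change.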

Reorder factored count data in ggplot2 geom_bar

I can find countless examples of reordering x by the corresponding y values when geom_bar() is used with stat="identity".
I have yet to find an example for stat="count": the reorder function fails because there is no corresponding y.
I have a data frame with a single factor column, "count" (see below for a poor example), in which values occur many times, as you would expect. I expected that plotting the factor with
ggplot(df, aes(x=df$count)) + geom_bar()
would order the bars by the frequency of each level, unlike unfactored (character) data, which is displayed alphabetically.
Any idea how to reorder them?
This is my current awful effort; sadly, I figured this out last night and then lost my R command history:
If you load the tidyverse at the start of your project, you can use the forcats function fct_infreq(), which orders factor levels by frequency (most frequent first):
ggplot(df, aes(x=fct_infreq(count))) + geom_bar()
As your categories are words, consider adding coord_flip() so that your bars run horizontally.
ggplot(df, aes(x=fct_infreq(count))) + geom_bar() + coord_flip()
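A small sketch with hypothetical fish names (invented counts), showing that fct_infreq() puts the most frequent level first while a plain factor keeps alphabetical levels. With coord_flip() the first level is drawn at the bottom; wrapping the call in fct_rev() puts the most frequent bar at the top instead.

```r
library(forcats)
library(ggplot2)

# Hypothetical single-column data: trout x3, perch x2, pike x1
df <- data.frame(count = factor(c("trout", "perch", "trout",
                                  "pike", "trout", "perch")))

levels(df$count)              # alphabetical: default factor order
levels(fct_infreq(df$count))  # by frequency, most common first

# Most frequent bar at the top of the flipped chart
p <- ggplot(df, aes(x = fct_rev(fct_infreq(count)))) +
  geom_bar() +
  coord_flip()
```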
This is what it looks like with some fish species counts: [Figure: horizontal bar chart with species on the vertical axis (the flipped x-axis) and counts on the horizontal axis (the flipped y-axis), sorted from least to greatest.]
Converting the counts to a factor and then modifying that factor might help accomplish what you need. Below, I reverse the order of the counts using fct_rev() from the forcats package (part of the tidyverse):
library(tidyverse)
iris %>%
count(Sepal.Length) %>%
mutate(n=n %>% as.factor %>% fct_rev) %>%
ggplot(aes(n)) + geom_bar()
Alternatively, if you'd like the bars to be arranged large to small, you can use fct_infreq.
iris %>%
count(Sepal.Length) %>%
mutate(n=n %>% as.factor %>% fct_infreq) %>%
ggplot(aes(n)) + geom_bar()

Ordering bars in ggplot2 stacked barplot via levels() but output looks different

I'm struggling with my ggplot2 stacked barplot. I want to define the order of the bars manually, which I normally do by converting the variable into a factor with the levels in my desired order:
data <- transform(data, variable = factor(variable, levels = c("A4 Da/De/Du", "A2 London", "A3 Berlin", "A1 Paris", "A5 Rome")))
When I check the levels, they are in the order I want to plot:
head(data$variable)
When I plot the data, everything looks as intended, except that one bar (for example "A4 Da/De/Du") is not in my defined order, and I have no idea why.
Does anyone have an idea what the problem could be? Things that might matter:
- It's the only variable with special characters (e.g. "/") in it.
- It's the only variable that contains zeros (e.g. c(20,40,0,0,40)).
- My ggplot code is quite complex: I use the reorder() function and the forcats package in it. Could that be a problem?
Thanks very much for any help or ideas!
EDIT (some example data)
library(reshape2)
library(ggplot2)
df <- data.frame(cbind(a=c(20,40,20,10,10),b=c(10,30,50,5,5), c=c(60,10,10,15,5), d=c(80,20,0,0,0), e=c(50,10,10,15,15)))
colnames(df) <- c("D1 Paris", "D2 London", "D3 Berlin", "D4 Da/De/Du", "D5 Rome")
df$category <- c("C1", "C2", "C3", "C4", "C5")
# reshape to long format: one row per category/column pair (columns variable, value)
data <- melt(df, id.vars = "category")
data$percent <- data$value/100
data <- transform(data, variable = factor(variable,
levels = c("D4 Da/De/Du", "D2 London", "D3 Berlin", "D1 Paris", "D5 Rome")))
And the short version of the ggplot2 code:
ggplot(data, aes(x=reorder((variable), percent), y=percent, fill=category)) +
coord_flip()+
geom_bar(stat="identity", width = .4, colour="black", lwd=0.1)
SOLUTION
I finally solved my problem :)
Gregor was right: once the levels of the variable are set in the desired order, the reorder() call in the ggplot2 code is no longer necessary. Worse, it overrides the levels defined earlier, which is what ultimately caused my error.
Thanks Gregor!
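A minimal sketch of why this happens, with hypothetical toy data in the question's long format (invented percent values): reorder() re-sorts the levels by the mean of percent within each bar, discarding the manual order, whereas mapping the factor directly keeps it.

```r
library(ggplot2)

# Hypothetical toy data in the same long format as the question
data <- data.frame(
  variable = rep(c("D1 Paris", "D2 London", "D4 Da/De/Du"), each = 2),
  category = rep(c("C1", "C2"), times = 3),
  percent  = c(0.05, 0.05, 0.1, 0.3, 0.6, 0.2)
)
data$variable <- factor(data$variable,
                        levels = c("D4 Da/De/Du", "D2 London", "D1 Paris"))

# reorder() sorts levels by mean percent (0.05, 0.2, 0.4),
# so the manual order above is lost
levels(reorder(data$variable, data$percent))

# Fix: map the factor directly so its manual level order is used
p <- ggplot(data, aes(x = variable, y = percent, fill = category)) +
  geom_col(width = 0.4, colour = "black") +
  coord_flip()
```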

Ordering bar plots with ggplot2 according to their size, i.e. numerical value

This question asks about ordering a bar graph according to an unsummarized table. I have a slightly different situation. Here's part of my original data:
experiment,pvs_id,src,hrc,mqs,mcs,dmqs,imcs
dna-wm,0,7,9,4.454545454545454,1.4545454545454546,1.4545454545454541,4.3939393939393945
dna-wm,1,7,4,2.909090909090909,1.8181818181818181,0.09090909090909083,3.9090909090909087
dna-wm,2,7,1,4.818181818181818,1.4545454545454546,1.8181818181818183,4.3939393939393945
dna-wm,3,7,8,3.4545454545454546,1.5454545454545454,0.4545454545454546,4.272727272727273
dna-wm,4,7,10,3.8181818181818183,1.9090909090909092,0.8181818181818183,3.7878787878787876
dna-wm,5,7,7,3.909090909090909,1.9090909090909092,0.9090909090909092,3.7878787878787876
dna-wm,6,7,0,4.909090909090909,1.3636363636363635,1.9090909090909092,4.515151515151516
dna-wm,7,7,3,3.909090909090909,1.7272727272727273,0.9090909090909092,4.030303030303029
dna-wm,8,7,11,3.6363636363636362,1.5454545454545454,0.6363636363636362,4.272727272727273
I only need a few variables from this, namely mqs and imcs, grouped by their pvs_id, so I create a new table:
m = melt(t, id.var="pvs_id", measure.var=c("mqs","imcs"))
I can plot this as a bar graph where one can see the correlation between MQS and IMCS.
ggplot(m, aes(x=pvs_id, y=value)) +
geom_bar(aes(fill=variable), position="dodge", stat="identity")
However, I'd like the resulting bars to be ordered by the MQS value, from left to right, in decreasing order. The IMCS values should be ordered with those, of course.
How can I accomplish that? More generally, given any molten data frame (which seems useful for graphing in ggplot2; today is the first time I've come across it), how do I specify the order for one variable?
It's all in making pvs_id a factor and supplying the appropriate levels to it:
dat$pvs_id <- factor(dat$pvs_id, levels = dat[order(-dat$mqs), 2])
m = melt(dat, id.var="pvs_id", measure.var=c("mqs","imcs"))
ggplot(m, aes(x=pvs_id, y=value))+
geom_bar(aes(fill=variable), position="dodge", stat="identity")
This produces the following plot:
EDIT:
When pvs_id is numeric, its values are placed in numeric order. A factor, by contrast, is nominal: no inherent order is assumed, and ggplot draws its bars in whatever order the levels specify. So even though its labels are numbers, pvs_id is now a factor (nominal). As for dat[order(-dat$mqs), 2]: the minus sign makes order() sort the rows from largest to smallest mqs. You want that order for the pvs_id variable, so you index that column, which is the second column. Taken on its own, it gives:
> dat[order(-dat$mqs), 2]
[1] 6 2 0 5 7 4 8 3 1
Now you supply that to the levels argument of factor and this orders the factor as you want it.
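A self-contained sketch of the same level-setting step, using the pvs_id, mqs and imcs values from the sample rows in the question. Two substitutions are my own, not the answer's: dat$pvs_id[order(-dat$mqs)] replaces the positional dat[order(-dat$mqs), 2], and tidyr::pivot_longer() replaces reshape2::melt().

```r
library(ggplot2)
library(tidyr)

# pvs_id, mqs and imcs from the question's sample rows
dat <- data.frame(
  pvs_id = 0:8,
  mqs  = c(4.454545454545454, 2.909090909090909, 4.818181818181818,
           3.4545454545454546, 3.8181818181818183, 3.909090909090909,
           4.909090909090909, 3.909090909090909, 3.6363636363636362),
  imcs = c(4.3939393939393945, 3.9090909090909087, 4.3939393939393945,
           4.272727272727273, 3.7878787878787876, 3.7878787878787876,
           4.515151515151516, 4.030303030303029, 4.272727272727273)
)

# The ids sorted by decreasing mqs become the factor levels
dat$pvs_id <- factor(dat$pvs_id, levels = dat$pvs_id[order(-dat$mqs)])
levels(dat$pvs_id)

# Long format: one row per pvs_id and measure (same role as melt)
m <- pivot_longer(dat, c(mqs, imcs), names_to = "variable", values_to = "value")

p <- ggplot(m, aes(x = pvs_id, y = value, fill = variable)) +
  geom_col(position = "dodge")
```

Ids 5 and 7 have identical mqs values; order() keeps tied rows in their original order, so 5 comes before 7, matching the output shown above.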
With newer tidyverse functions, this becomes much more straightforward (or at least easier to read, for me):
library(tidyverse)
d %>%
mutate_at("pvs_id", as.factor) %>%
mutate(pvs_id = fct_reorder(pvs_id, mqs)) %>%
gather(variable, value, c(mqs, imcs)) %>%
ggplot(aes(x = pvs_id, y = value)) +
geom_col(aes(fill = variable), position = position_dodge())
What it does:
- converts pvs_id to a factor, if it isn't one already
- reorders the levels by mqs (use desc(mqs), or fct_reorder()'s .desc = TRUE argument, to sort in decreasing order)
- gathers mqs and imcs into individual rows (the same as melt)
- plots with geom_col() (the same as geom_bar() with stat="identity")
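A sketch of the same pipeline using fct_reorder()'s .desc = TRUE argument, which sorts in decreasing order of mqs as the question asks, and pivot_longer(), the current replacement for gather(). To keep it short, only the first four sample rows from the question are used.

```r
library(dplyr)
library(forcats)
library(tidyr)
library(ggplot2)

# First four sample rows of the question's data (pvs_id, mqs, imcs only)
d <- data.frame(
  pvs_id = 0:3,
  mqs  = c(4.454545454545454, 2.909090909090909,
           4.818181818181818, 3.4545454545454546),
  imcs = c(4.3939393939393945, 3.9090909090909087,
           4.3939393939393945, 4.272727272727273)
)

long <- d %>%
  # .desc = TRUE: largest mqs on the left
  mutate(pvs_id = fct_reorder(factor(pvs_id), mqs, .desc = TRUE)) %>%
  # pivot_longer() keeps pvs_id (and its levels) as an id column
  pivot_longer(c(mqs, imcs), names_to = "variable", values_to = "value")

levels(long$pvs_id)

p <- ggplot(long, aes(x = pvs_id, y = value, fill = variable)) +
  geom_col(position = position_dodge())
```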
