How to Plot Every Column in Descending Order in R

I want to plot every categorical column in a data frame, with the bars in descending order of how often each level occurs in that variable.
I have already found out how to plot every column and how to reorder the levels, but I cannot figure out how to combine the two. Could you please give me some suggestions?
Code for plotting every column:
require(purrr)
library(tidyr)
library(ggplot2)
diamonds %>%
keep(is.factor) %>%
gather() %>%
ggplot(aes(value)) +
facet_wrap(~ key, scales = "free") +
geom_bar()
Code for reordering the levels of one variable:
tb <- table(x)
factor(x, levels = names(tb[order(tb, decreasing = TRUE)]))
BTW, if you think there is a better way to write this code, please let me know.
Thanks.

Alternative 1
There is no need to use gridExtra to emulate facet_wrap; just call the function reorder_size inside aes:
reorder_size <- function(x) {
factor(x, levels = names(sort(table(x), decreasing = TRUE)))
}
diamonds %>%
keep(is.factor) %>%
gather() %>%
ggplot(aes(x = reorder_size(value))) +
facet_wrap(~ key, scales = "free") +
geom_bar()
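As a quick check of what reorder_size does, here is a minimal sketch on a made-up vector (the vector x is hypothetical, not from the question). One caveat worth keeping in mind: inside aes() the helper sees the whole gathered value column, so the resulting order matches each panel only when no level name is shared between columns, which appears to hold for diamonds.

```r
# Helper from the answer above: order factor levels by descending frequency.
reorder_size <- function(x) {
  factor(x, levels = names(sort(table(x), decreasing = TRUE)))
}

# Hypothetical toy data: "c" occurs three times, "b" twice, "a" once.
x <- c("a", "b", "b", "c", "c", "c")
f <- reorder_size(x)

# The most frequent level comes first.
levels(f)
```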
Alternative 2
Use dplyr to calculate the count, grouping by key and value. Then reorder value in descending order of count inside aes.
library(dplyr)
diamonds %>%
keep(is.factor) %>%
gather() %>%
group_by(key,value) %>%
summarise(n = n()) %>%
ggplot(aes(x = reorder(value, -n), y = n)) +
facet_wrap(~ key, scales = "free") +
geom_bar(stat='identity')

The problem with your approach is that the long form of your data frame introduces many factor levels that geom_bar() would plot with a count of 0.
Instead of relying on facet_wrap and dealing with the long form of the data, here's an alternative.
A function that reorders levels by size:
reorder_size <- function(x) {
factor(x, levels = names(sort(table(x), decreasing = TRUE)))
}
Use gridExtra::grid.arrange to produce a figure similar in style to facet_wrap:
library(gridExtra)
a <- ggplot(diamonds, aes(x=reorder_size(cut))) + geom_bar()
b <- ggplot(diamonds, aes(x=reorder_size(color))) + geom_bar()
c <- ggplot(diamonds, aes(x=reorder_size(clarity))) + geom_bar()
grid.arrange(a,b,c, nrow=1)
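If there are many factor columns, writing one ggplot call per column gets repetitive. A possible sketch, assuming ggplot2 and gridExtra are installed, builds the list of plots with lapply; the names factor_cols and plots are introduced here only for illustration.

```r
library(ggplot2)
library(gridExtra)

# Same helper as above: order levels by descending frequency.
reorder_size <- function(x) {
  factor(x, levels = names(sort(table(x), decreasing = TRUE)))
}

# Names of all factor columns (cut, color and clarity in diamonds).
factor_cols <- names(Filter(is.factor, diamonds))

# One bar chart per factor column; .data[[nm]] looks the column up by name.
plots <- lapply(factor_cols, function(nm) {
  ggplot(diamonds, aes(x = reorder_size(.data[[nm]]))) +
    geom_bar() +
    xlab(nm)
})

# Arrange the plots side by side, as in the hand-written version.
grid.arrange(grobs = plots, nrow = 1)
```

Because each panel is its own plot, the ordering is computed per column and does not depend on level names being unique across columns.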

Related

Cleaner way to plot multiple bar charts of different outcome variables (R)

I am wondering if there is a better way to produce 4 barcharts of different outcome variables arranged in a grid:
This is the code I used:
library(cowplot)
bar1 <- ggplot(data = subset(data, !is.na(MHQ_Heading_Male_Quartile))) +
geom_bar(mapping = aes(x = MHQ_Heading_Male_Quartile))
bar2 <- ggplot(data = subset(data, !is.na(AHQ_Heading_Male_Quartile))) +
geom_bar(mapping = aes(x = AHQ_Heading_Male_Quartile))
bar3 <- ggplot(data = subset(data, !is.na(MHQ_Heading_Female_Quartile))) +
geom_bar(mapping = aes(x = MHQ_Heading_Female_Quartile))
bar4 <- ggplot(data = subset(data, !is.na(AHQ_Heading_Female_Quartile))) +
geom_bar(mapping = aes(x = AHQ_Heading_Female_Quartile))
plot_grid(bar1, bar2, bar3, bar4, ncol = 2)
However, there is a lot of repeated code. Is there some function or way to create the same plot with ggplot2 in fewer lines?
I would convert relevant columns from wide to long (the ones ending in "_Quartile") and then use facet_wrap to show the 4 plots in a 2x2 grid with scales = "free".
Something like this:
data %>%
gather(key, value, ends_with("Quartile")) %>%
filter(!is.na(value)) %>%
ggplot(aes(value)) +
geom_bar() +
facet_wrap(~ key, scales = "free", ncol = 2, nrow = 2)
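In current tidyr, gather() is superseded by pivot_longer(). A minimal sketch of the same reshaping step, run on a small made-up data frame (toy and its column names are hypothetical stand-ins for the asker's data):

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for `data`; the column names are invented.
toy <- data.frame(
  id = 1:4,
  MHQ_Quartile = c(1, 2, NA, 4),
  AHQ_Quartile = c(2, NA, 3, 1)
)

# Wide to long: one row per (id, key) pair, then drop missing values.
long <- toy %>%
  pivot_longer(ends_with("Quartile"), names_to = "key", values_to = "value") %>%
  filter(!is.na(value))

long
```

The resulting long data frame can be passed to ggplot(aes(value)) + geom_bar() + facet_wrap(~ key, scales = "free") exactly as above.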
As mentioned, you need to convert the data to long format using tidyr's gather (or the reshape package) and then facet over the result.
data %>%
select(MHQ_Heading_Male_Quartile, AHQ_Heading_Male_Quartile, MHQ_Heading_Female_Quartile, AHQ_Heading_Female_Quartile) %>%
gather("Type", "Range", MHQ_Heading_Male_Quartile:AHQ_Heading_Female_Quartile) %>%
filter(!is.na(Range)) %>%
ggplot(aes(x=Range)) +
geom_bar() +
facet_wrap(~Type, scales="free")
I'll leave it to you to clean the graphs up but that's the basic premise.
Extract the column names to be shown into nms and then for each one use qplot to create a ggplot object so that bars is a list of such objects. Then run plot_grid on that.
nms <- grep("Quartile", names(data), value = TRUE)
bars <- lapply(nms, function(nm) qplot(na.omit(data[[nm]]), xlab = nm))
do.call("plot_grid", bars)

R: Convert data.frame to color matrix using ggplot2?

Does anyone know how to use ggplot2 to convert a data frame in R with continuous values into a pretty figure? This would be similar to the answer from this post but with ggplot2.
Is this possible?
New to R and ggplot2 so thanks in advance for any advice.
Here's an example using the mtcars data (scaled to give comparable values, so the numbers don't mean much).
The key things are the use of gather to tidy the data, geom_tile filled by value, and geom_text for the labels. Everything else is just manipulation of that particular data frame.
You could also just use one of the scale_fill_gradient scales.
library(tidyverse)
library(viridis)
mtcars %>%
scale() %>%
as.data.frame() %>%
rownames_to_column(var = "make") %>%
gather(var, val, -make) %>%
ggplot(aes(var, make)) +
geom_tile(aes(fill = val)) +
geom_text(aes(label = round(val, 2)),
size = 3) +
coord_fixed() +
scale_fill_viridis() +
guides(fill = "none")
Or using:
+ scale_fill_gradient2(midpoint = 1.5)
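To see what "scaled to give comparable values" means here, the following minimal sketch checks the effect of scale() on mtcars using base R only; col_means and col_sds are names chosen for illustration.

```r
# scale() centers each column to mean 0 and rescales it to standard deviation 1.
scaled <- scale(mtcars)

col_means <- colMeans(scaled)
col_sds <- apply(scaled, 2, sd)

# Both should be 0 and 1 respectively, up to floating-point error.
round(col_means, 10)
round(col_sds, 10)
```

Since the scaled values are centered on 0, it may be worth experimenting with a different midpoint in scale_fill_gradient2, depending on which values should sit at the neutral color.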

Set ggplot title to reflect dplyr grouping

I've got a grouped dataframe generated in dplyr where each group reflects a unique combination of factor variable levels. I'd like to plot the different groups using code similar to this post. However, I can't figure out how to include two (or more) variables in the title of my plots, which is a hassle since I've got a bunch of different combinations.
Fake data and plotting code:
library(dplyr)
library(ggplot2)
spiris<-iris
spiris$site<-as.factor(rep(c("A","B","C")))
spiris$year<-as.factor(rep(2012:2016))
spiris$treatment<-as.factor(rep(1:2))
g<-spiris %>%
group_by(site, Species) %>%
do(plots=ggplot(data=.) +
aes(x=Petal.Width)+geom_histogram()+
facet_grid(treatment~year))
##Need code for title here
g[[3]] ##view plots
I need the title of each plot to reflect both "site" and "Species". Any ideas?
Use split() %>% purrr::map2() instead of group_by() %>% do() like this:
spiris %>%
split(list(.$site, .$Species)) %>%
purrr::map2(.y = names(.),
~ ggplot(data=., aes(x=Petal.Width)) +
geom_histogram()+
facet_grid(treatment~year) +
labs(title = .y) )
You just need to set the title with ggtitle(). Each group has a single Species and site, so the first value of each gives a one-string title:
g <- spiris %>%
group_by(site, Species) %>%
do(plots = ggplot(data = .) +
aes(x = Petal.Width) + geom_histogram() +
facet_grid(treatment ~ year) +
ggtitle(paste(.$Species[1], .$site[1], sep = " - ")))
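For the split() approach, it helps to see what names split() produces, since those names are what end up in the title. A minimal base R sketch, rebuilding only the site column of the fake data (groups and titles are illustrative names):

```r
# Rebuild the relevant part of the fake data from the question.
spiris <- iris
spiris$site <- as.factor(rep(c("A", "B", "C"), length.out = nrow(iris)))

# split() names each piece "<site>.<Species>"; these names can serve as titles.
groups <- split(spiris, list(spiris$site, spiris$Species))
titles <- names(groups)

titles
```

Each element of groups is the data frame for one site and Species combination, so iterating over groups and titles together gives one plot and one title per combination.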

rearrange facet_wrap plots based on the points in the subplot

I would like to rearrange the facet_wrap plots in a better way.
library(ggplot2)
set.seed(123)
freq <- sample(1:10, 20, replace = TRUE)
labels <- sample(LETTERS, 20)
value <- paste("i",1:13,sep='')
lab <- rep(unlist(lapply(1:length(freq), function(x) rep(labels[x],freq[x]))),2)
ival <- rep(unlist(lapply(1:length(freq), function(x) value[1:freq[x]])),2)
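# use sum(freq) rather than a hard-coded row count, which depends on the R version's sampler
df <- data.frame(lab, ival, type=rep(c('Type1','Type2'), each=sum(freq)), val=runif(2*sum(freq),0,1))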
ggplot(df, aes(x=ival, y=val, col = type, group = type)) +
geom_line() +
geom_point(aes(x=ival, y=val)) +
facet_wrap( ~lab, ncol=3) +
theme(axis.text.x=element_text(angle=45, vjust=0.3)) +
scale_x_discrete(limits=paste('i',1:13,sep=''))
It results in the below plot:
Is there any way to rearrange the plots based on their frequency? Some of the lab frequencies (the number of points per type) are very low (1-3). I would like to arrange the facet_wrap panels by their frequencies instead of by label order. One advantage is that this reduces the plotting area and makes the plots easier to read.
Can it be done by computing the frequency values on the fly and passing them to facet_wrap? Or should it be done separately, using dplyr to divide the data into low/medium/high frequency sets of plots?
Here is one idea. We can use dplyr to count the rows in each lab group and use fct_reorder from forcats to reorder the factor levels.
library(dplyr)
library(forcats)
df2 <- df %>%
group_by(lab) %>%
mutate(N = n()) %>%
ungroup() %>%
mutate(lab = fct_reorder(lab, N))
ggplot(df2, aes(x=ival, y=val, col = type, group = type)) +
geom_line() +
geom_point(aes(x=ival, y=val)) +
facet_wrap( ~lab, ncol=3) +
theme(axis.text.x=element_text(angle=45, vjust=0.3)) +
scale_x_discrete(limits=paste('i',1:13,sep=''))
Set .desc = TRUE when using fct_reorder if you want to reverse the factor levels.
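A minimal sketch of how the level order changes, assuming forcats is installed; the vectors lab and N below are made up for illustration. It also shows fct_infreq(), which orders levels by frequency directly without a separate count column.

```r
library(forcats)

# Hypothetical labels: "z" occurs three times, "y" twice, "x" once.
lab <- factor(c("x", "y", "y", "z", "z", "z"))
N <- as.integer(ave(seq_along(lab), lab, FUN = length))  # rows per label

asc <- levels(fct_reorder(lab, N))                  # least frequent first
desc <- levels(fct_reorder(lab, N, .desc = TRUE))   # most frequent first
infreq <- levels(fct_infreq(lab))                   # most frequent first

asc
desc
infreq
```

facet_wrap lays out panels in level order, so whichever factor is assigned back to lab determines the panel order.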

Rank Stacked Bar Chart by Sum of Subset of Fill Variable

Sample data:
set.seed(145)
df <- data.frame(Age=sample(c(1:10),20,replace=TRUE),
Rank=sample(c("Extremely","Very","Slightly","Not At All"),
20,replace=TRUE),
Percent=(runif(10,0,.01)))
df.plot <- ggplot(df,aes(x=Age,y=Percent,fill=Rank))+
geom_bar(stat="identity")+
coord_flip()
df.plot
Within the ggplot, how can I reorder x=Age, by the sum of Ranks "Extremely" and "Very" only?
I tried using the below, without success.
df.plot <- ggplot(df,aes(x=reorder(Age,Rank=="Extremely",sum),y=Percent,fill=Rank))+
geom_bar(stat="identity")+
coord_flip()
df.plot
A couple of notes:
The way you simulate your data allows some ages to be missing some categories (which is fine), but it also allows some ages to have duplicated categories. I am assuming that this is not true for your real data, so I have left it as is. Note also that your simulation does not produce percentages that add up, although the category names suggest that they should.
The way I would do this is to compute the ordering of age based on your desired logic, and then pass that order to the factor call. This decouples the ordering logic from the plot and allows arbitrary ordering logic.
Here is what I think you are looking for:
library(ggplot2)
library(dplyr)
library(scales)
set.seed(145)
# simulate the data
df_foo = data.frame(Age=sample(c(1:10),20,replace=TRUE),
Rank=sample(c("Extremely","Very","Slightly","Not At All"),
20,replace=TRUE),
Percent=(runif(10,0,.01)))
# get the ordering that you are interested in
age_order = df_foo %>%
filter(Rank %in% c("Extremely", "Very")) %>%
group_by(Age) %>%
summarize(SumRank = sum(Percent)) %>%
arrange(desc(SumRank)) %>%
`[[`("Age")
# in some cases ages do not appear in the order because the
# ordering logic does not span all categories
age_order = c(age_order, setdiff(unique(df_foo$Age), age_order))
# make age a factor sorted by the ordering above
ggplot(df_foo, aes(x = factor(Age, levels = age_order), y = Percent, fill = Rank))+
geom_bar(stat = "identity") +
coord_flip() +
theme_bw() +
scale_y_continuous(labels = percent)
This produces the stacked bar chart with Age in the requested order.
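The ordering step can be checked on a tiny hand-made data frame where the answer is easy to verify by eye. This is a sketch with hypothetical values (toy is not the asker's data), using pull() in place of `[[`:

```r
library(dplyr)

# Hypothetical data: age 3 has no "Extremely" or "Very" rows at all.
toy <- data.frame(
  Age = c(1, 1, 2, 2, 3),
  Rank = c("Extremely", "Slightly", "Very", "Extremely", "Slightly"),
  Percent = c(0.1, 0.5, 0.2, 0.3, 0.9)
)

# Sum Percent over the two ranks of interest and sort ages by that sum.
age_order <- toy %>%
  filter(Rank %in% c("Extremely", "Very")) %>%
  group_by(Age) %>%
  summarize(SumRank = sum(Percent)) %>%
  arrange(desc(SumRank)) %>%
  pull(Age)

# Append ages that were filtered out so every age keeps a factor level.
age_order <- c(age_order, setdiff(unique(toy$Age), age_order))
age_order
```

Age 2 comes first (0.2 + 0.3), then age 1 (0.1), and age 3 is appended at the end because it has no rows in the two ranks of interest.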

Resources