Rank Stacked Bar Chart by Sum of Subset of Fill Variable - r

Sample data:
set.seed(145)
df <- data.frame(Age=sample(c(1:10),20,replace=TRUE),
Rank=sample(c("Extremely","Very","Slightly","Not At All"),
20,replace=TRUE),
Percent=(runif(10,0,.01)))
df.plot <- ggplot(df,aes(x=Age,y=Percent,fill=Rank))+
geom_bar(stat="identity")+
coord_flip()
df.plot
Within the ggplot, how can I reorder x=Age, by the sum of Ranks "Extremely" and "Very" only?
I tried using the below, without success.
df.plot <- ggplot(df,aes(x=reorder(Age,Rank=="Extremely",sum),y=Percent,fill=Rank))+
geom_bar(stat="identity")+
coord_flip()
df.plot

Couple of notes:
The way you simulate your data allows some ages to lack some categories (which is fine), but it also allows a category to be duplicated within an age. I am assuming that this is not true of your real data, so I have left it as is. Note also that your simulation logic does not produce percentages that add up, although the category names suggest that they should.
The way I would do this is to create the ordering of age based on your desired logic, and then pass that order to the factor call. This decouples the ordering logic and allows arbitrary ordering logic.
Here is then what I think you are looking for:
library(ggplot2)
library(dplyr)
library(scales)
set.seed(145)
# simulate the data
df_foo = data.frame(Age=sample(c(1:10),20,replace=TRUE),
Rank=sample(c("Extremely","Very","Slightly","Not At All"),
20,replace=TRUE),
Percent=(runif(10,0,.01)))
# get the ordering that you are interested in
age_order = df_foo %>%
filter(Rank %in% c("Extremely", "Very")) %>%
group_by(Age) %>%
summarize(SumRank = sum(Percent)) %>%
arrange(desc(SumRank)) %>%
`[[`("Age")
# in some cases ages do not appear in the order because the
# ordering logic does not span all categories
age_order = c(age_order, setdiff(unique(df_foo$Age), age_order))
# make age a factor sorted by the ordering above
ggplot(df_foo, aes(x = factor(Age, levels = age_order), y = Percent, fill = Rank))+
geom_bar(stat = "identity") +
coord_flip() +
theme_bw() +
scale_y_continuous(labels = percent)
Which code produces:
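As an alternative that stays closer to the reorder() attempt in the question, here is a minimal sketch. It assumes the ordering can be expressed as a per-row weight: Percent for the ranks "Extremely" and "Very", and 0 for everything else. The column names order_weight and AgeOrdered are made up for illustration. reorder() then sums the weights within each Age and sorts the factor levels by that sum.

```r
library(ggplot2)

set.seed(145)
df <- data.frame(Age = sample(c(1:10), 20, replace = TRUE),
                 Rank = sample(c("Extremely", "Very", "Slightly", "Not At All"),
                               20, replace = TRUE),
                 Percent = (runif(10, 0, .01)))

# Hypothetical helper column: Percent for the ranks that drive the ordering,
# 0 for all other ranks, so they do not contribute to the sum.
df$order_weight <- ifelse(df$Rank %in% c("Extremely", "Very"), df$Percent, 0)

# reorder() applies FUN to the weights within each Age and sorts the
# levels in ascending order of the result.
df$AgeOrdered <- reorder(factor(df$Age), df$order_weight, FUN = sum)

df.plot <- ggplot(df, aes(x = AgeOrdered, y = Percent, fill = Rank)) +
  geom_bar(stat = "identity") +
  coord_flip()
df.plot
```

Ages with no "Extremely" or "Very" rows get a sum of 0 and therefore sort first. Because of coord_flip(), ascending order puts the largest sum at the top of the chart.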

Related

ggplot geom_bar leave blank spaces for 0 values by group

Below is a simple ggplot bar plot:
x<-c(1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3)
y<-c(1,2,3,4,5,3,3,3,3,4,5,5,6,7,6,5,4,3,2,3,4,5,3,2,1,1,1,1,1)
d<-data.frame(x,y)
ggplot(data=d,aes(x=x,fill=as.factor(y)))+
geom_bar(position = position_dodge())
The issue I'm having is that each value of y is not present in each grouping x. For example, group 1 along the x-axis only contains groups 1-5 of the y variable, and doesn't have any values for 6 or 7. What I would like is for the plot to leave blank spaces when there are no values for a y in the given x-grouping, so that it is easier to compare the x-groups.
A solution is to compute the frequencies manually and plot the graph based on that frequencies table.
library(ggplot2)
d1 <- data.frame(table(d))
d1$x <- factor(d1$x)
ggplot(d1, aes(x, Freq, fill = factor(y))) +
geom_bar(stat = "identity", position = position_dodge())
library(tidyverse)
# set factor levels
d2 <- d %>% data.frame() %>% mutate(x=factor(x, levels=c(1:3)),
y=factor(y, levels=c(1:7)))
# count frequencies and send to ggplot2
d2 %>% group_by(x, y, .drop=F) %>% tally() %>%
ggplot(aes(x=x, y=n, fill=y, color=y)) +
geom_bar(position = position_dodge2(),
stat="identity")
Another way to do this using dplyr is to use tally() to count the frequencies, but you need to make sure that you have your variables set as factors first.
Using color=y & fill=y in the aes statement helps to show exactly where on the plot the zero values are. So, now you can see that it is y=6 & y=7 missing from x=1 & x=3, and y=1 missing from x=2
And I chose position_dodge2 for my own personal preferences.
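A further sketch of the same idea, assuming dplyr and tidyr are available: count the observed combinations, then let tidyr::complete() add the missing x/y combinations with a count of zero. The object name counts is made up for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

x <- c(1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3)
y <- c(1,2,3,4,5,3,3,3,3,4,5,5,6,7,6,5,4,3,2,3,4,5,3,2,1,1,1,1,1)
d <- data.frame(x, y)

counts <- d %>%
  count(x, y) %>%                          # one row per observed x/y pair
  complete(x, y, fill = list(n = 0L)) %>%  # add unobserved pairs with n = 0
  mutate(x = factor(x), y = factor(y))

# Zero-height bars still reserve their slot, so the dodge positions line up
# across the x groups.
ggplot(counts, aes(x, n, fill = y)) +
  geom_col(position = position_dodge())
```

complete() only expands over values that occur somewhere in the data, which is enough here because every y from 1 to 7 appears in at least one x group.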

Assigning specific colors to specific cases in ridgeline plots in R

Recently this community helped me tremendously with getting Ridgeline plots to work with my data.
Now I am struggling with coloring them according to my needs.
Basically what I want is plotting my cases in different orders but they should keep a specific color so observations remain recognizable even when plotted in a different order. So far I failed with applying the available solutions to my requirements.
Let us take for example this data, where we have a name, a mean and an SD:
caseName caseMean caseSD
Svansdottir 2006 -0.0646 0.4032398
Guétin 2009 -0.4649 0.3995663
Raglio 2010a -0.2145 0.2814031
Let's first sort them by caseMean:
df$caseName <- factor(df$caseName, levels = df$caseName[order(df$caseMean)])
and plot it with the following code:
library(tidyverse); library(ggridges)
n = 100
df3 <- df %>%
mutate(low = caseMean - 3 * caseSD, high = caseMean + 3 * caseSD) %>%
uncount(n, .id = "row") %>%
mutate(x = (1 - row/n) * low + row/n * high,
norm = dnorm(x, caseMean, caseSD))
ggplot(df3, aes(x, caseName, height = norm, fill=caseName)) +
geom_ridgeline(scale = 2,alpha=0.75) +
scale_fill_viridis_d()
we get this:
Now we reverse the order
df$caseName <- factor(df$caseName, levels = df$caseName[order(-df$caseMean)])
and plot again with the code above we see that the plots have switched color:
How can I make sure that the same cases have always the same colors no matter the order I put them in?
I would like to have code that doesn't require me to "hard-wire" colors to a specific case name. I want to be able to do this for ridgeline plots with 20, 30, or more observations. The fact that I picked the viridis color palette doesn't matter. I am happy with any solution (like with heat.colors or something similar).
If your new factor is just reversing the order of the previous one, you could use the argument direction in scale_fill_viridis_d().
For more complicated cases (i.e. re-leveling a factor), one possibility is to add the colour manually, possibly in your original data frame, and to pass it to scale_fill_manual().
simple case: reversing order of factor
library(tidyverse)
df <- data.frame(name = letters[3:1], value = c(3,1,2))
pl_1 <- ggplot(aes(x=name, y=value, fill=name), data=df)+
geom_col() +
scale_fill_viridis_d()
pl_1
pl_1 %+% mutate(df, name = factor(name, levels = c("c", "b", "a"))) +
scale_fill_viridis_d(direction=-1)
#> Scale for 'fill' is already present. Adding another scale for 'fill',
#> which will replace the existing scale.
More complicated case
library(tidyverse)
library(viridis)
df_new <- tibble(name = letters[3:1], value = c(3,1,2),
col = rev(viridis(3))) %>%
mutate(name = factor(name, levels = c("c", "b", "a"))) %>%
arrange(name)
df_new %>%
ggplot(aes(x=name, y=value, fill=name)) +
geom_col() +
scale_fill_manual(values=df_new$col)
Created on 2019-06-06 by the reprex package (v0.3.0)
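One way to avoid hard-wiring colours by hand is to build a named colour vector once and pass it to scale_fill_manual(), which matches named values to factor levels by name rather than by position. The sketch below assumes that base R's hcl.colors() (R 3.6 or later) is an acceptable palette source and uses geom_col() instead of ridgelines to stay short. The names case_cols and plot_cases are made up for illustration.

```r
library(ggplot2)

df <- data.frame(
  caseName = c("Svansdottir 2006", "Guétin 2009", "Raglio 2010a"),
  caseMean = c(-0.0646, -0.4649, -0.2145),
  caseSD   = c(0.4032398, 0.3995663, 0.2814031)
)

# Fix one colour per case up front, independent of any later factor ordering.
case_names <- sort(unique(as.character(df$caseName)))
case_cols  <- setNames(hcl.colors(length(case_names), "viridis"), case_names)

# Hypothetical helper: the named vector makes ggplot2 match colours by case name.
plot_cases <- function(data) {
  ggplot(data, aes(x = caseName, y = caseMean, fill = caseName)) +
    geom_col() +
    scale_fill_manual(values = case_cols)
}

# Ascending order of caseMean
df$caseName <- factor(df$caseName,
                      levels = as.character(df$caseName)[order(df$caseMean)])
p_up <- plot_cases(df)

# Reversed order: same colours per case, different positions
df$caseName <- factor(df$caseName,
                      levels = as.character(df$caseName)[order(-df$caseMean)])
p_down <- plot_cases(df)
```

The same scale_fill_manual(values = case_cols) line should carry over to the geom_ridgeline() plot from the question, since only the fill scale changes.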

multiple line and facet_grid in Bar plot

I have a data frame with 53 states and a sex variable. For example, the data frame below has 26 states.
set.seed(25)
test <- data.frame(
state = sample(letters[1:26], 10000, replace = TRUE),
sex = sample(c("M","F"), 10000, replace = TRUE)
)
Now I want to see which state has more female members, so I created a bar plot in a grid with one panel per state, each containing two bars (M, F).
test.pct = test %>% group_by(state, sex) %>%
summarise(count=n()) %>%
mutate(pct=count/sum(count))
ggplot(test.pct, aes(x=sex, y=pct, fill=sex)) +
geom_bar(stat="identity") +
facet_grid(. ~ state)
The problem is that all 26 panels appear in a single row, which makes them hard to read. I want to lay the plot out in multiple rows, e.g. 3x9 instead of 1x26.
Also, the states should be ordered by female percentage.
Thanks for your help.
Problem #1: Use facet_wrap. Problem #2: Reorder the state levels beforehand.
It could look like this:
ggplot(transform(test.pct, state=factor(state,
levels=with(subset(test.pct, sex=="F"),
state[order(pct)]))),
aes(x=sex, y=pct, fill=sex)) +
geom_bar(stat="identity") +
facet_wrap(~ state, nrow = 3)
The first part is straightforward: just use facet_wrap instead of facet_grid. The ordering is a bit trickier; you have to reorder the levels of the factor. Just to make it a bit clearer, I've split the operation up into a few steps. First, extract only female percentages, then find the order of those percentages, and finally use that order to rearrange the order of the levels of state. That's a long-winded way of doing it, but I hope it makes the principle clear.
wom.pct <- test.pct %>% filter(sex == 'F')
ix <- order(wom.pct$pct)
test.pct$state <- factor(test.pct$state, levels = as.character(wom.pct$state)[ix])
ggplot(test.pct, aes(x=sex, y=pct, fill=sex)) +
geom_bar(stat="identity") +
facet_wrap( ~ state)
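For comparison, a base-R sketch of the same ordering idea, assuming only ggplot2 for the plot itself: prop.table() on a state-by-sex table gives the share of each sex within each state, and sorting the "F" column gives the level order. The names shares, state_order and plot_df are made up for illustration.

```r
library(ggplot2)

set.seed(25)
test <- data.frame(
  state = sample(letters[1:26], 10000, replace = TRUE),
  sex = sample(c("M", "F"), 10000, replace = TRUE)
)

# Row-wise proportions: share of each sex within each state.
shares <- prop.table(table(state = test$state, sex = test$sex), margin = 1)

# State names sorted by ascending female share.
state_order <- names(sort(shares[, "F"]))

# Long format with one row per state/sex pair, then apply the level order.
plot_df <- as.data.frame(shares, responseName = "pct")
plot_df$state <- factor(plot_df$state, levels = state_order)

ggplot(plot_df, aes(x = sex, y = pct, fill = sex)) +
  geom_col() +
  facet_wrap(~ state, nrow = 3)
```

Taking the level names from the sorted vector avoids relying on the states already being in alphabetical order.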

R- stacked charts

Hi, I'm having issues with a stacked bar chart.
The goal is to print a bar chart that shows the sum of products sold, stacked on top of each other. I have done that, but the products are not grouped together: instead of one big block per product, each product is split into many pieces. I need some way to aggregate the count so that it is summed, and then I can put the chart in some sort of order.
library(ggplot2)
library(plyr) #Is this automatically loaded with ggplot2?
library(dplyr)
salesMixData <- read.csv("SalesMix.csv", stringsAsFactors = FALSE, header = TRUE)
productMix <- salesMixData[,c(1,6,7)]
ggplot(productMix, aes(x=JoinMonthYear, y=Count,fill=Prod)) +
geom_bar(stat='identity') +
theme(axis.text.x = element_text(angle=60, hjust = 1),legend.position="bottom")
The output looks like the following:
You probably want to summarise the data first, calculating an aggregate sum for each combination of JoinMonthYear and Prod.
Here's an example with a dummy data set:
library(ggplot2)
library(dplyr)
d <- data.frame(x=sample(20, 1000, replace=T),
count=rpois(1000, 10),
grp=sample(LETTERS[1:10], 1000, replace=TRUE))
This is equivalent to what you're seeing:
ggplot(d, aes(x=x, y=count, fill=grp)) +
geom_bar(stat='identity')
Grouping the observations (in your case by JoinMonthYear and Prod), and then summarising to the groups' sums, should get you what you're after:
d %>%
group_by(x, grp) %>%
summarise(sum_count=sum(count, na.rm=TRUE)) %>%
ggplot(aes(x=x, y=sum_count, fill=grp)) +
geom_bar(stat='identity')
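Applied to the column names from the question, the same aggregation could look like the sketch below. The data here are invented as a stand-in for SalesMix.csv, which is not available; only the names JoinMonthYear, Count and Prod come from the question. Base R's aggregate() is used for the sums, and productSums is a made-up name.

```r
library(ggplot2)

# Invented stand-in for SalesMix.csv; only the column names are from the question.
set.seed(1)
productMix <- data.frame(
  JoinMonthYear = sample(c("2015-01", "2015-02", "2015-03"), 200, replace = TRUE),
  Count = rpois(200, 10),
  Prod = sample(c("A", "B", "C"), 200, replace = TRUE)
)

# One row per month/product combination, holding the summed Count.
productSums <- aggregate(Count ~ JoinMonthYear + Prod, data = productMix, FUN = sum)

ggplot(productSums, aes(x = JoinMonthYear, y = Count, fill = Prod)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 60, hjust = 1), legend.position = "bottom")
```

geom_col() is shorthand for geom_bar(stat = "identity"), so the plot itself is unchanged apart from the pre-summed input.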

Plotting a bar graph in R

Here is a snapshot of data:
restaurant_change_sales = c(3330.443, 3122.534)
restaurant_change_labor = c(696.592, 624.841)
restaurant_change_POS = c(155.48, 139.27)
rest_change = data.frame(restaurant_change_sales, restaurant_change_labor, restaurant_change_POS)
I want two bars for each of the columns indicating the change. One graph for each of the columns.
I tried:
ggplot(aes(x = rest_change$restaurant_change_sales), data = rest_change) + geom_bar()
This is not giving the result the way I want. Please help!!
So ... something like:
library(ggplot2)
library(dplyr)
library(tidyr)
restaurant_change_sales = c(3330.443, 3122.534)
restaurant_change_labor = c(696.592, 624.841)
restaurant_change_POS = c(155.48, 139.27)
rest_change = data.frame(restaurant_change_sales,
restaurant_change_labor,
restaurant_change_POS)
cbind(rest_change,
change = c("Before", "After")) %>%
gather(key,value,-change) %>%
ggplot(aes(x = change,
y = value)) +
geom_bar(stat="identity") +
facet_grid(~key)
Which will produce:
Edit:
To be extra fancy, e.g. to make the x-axis labels go from "Before" to "After", add this line to the end of the ggplot call: scale_x_discrete(limits = c("Before", "After"))
Your data are not formatted properly to work well with ggplot2, or really any of the plotting packages in R. So we'll fix your data up first, and then use ggplot2 to plot it.
library(tidyr)
library(dplyr)
library(ggplot2)
# We need to differentiate between the values in the rows for them to make sense.
rest_change$category <- c('first val', 'second val')
# Now we use tidyr to reshape the data to the format that ggplot2 expects.
rc2 <- rest_change %>% gather(variable, value, -category)
rc2
# Now we can plot it.
# The category that we added goes along the x-axis, the values go along the y-axis.
# We want a bar chart and the value column contains absolute values, so no summation
# necessary, hence we use 'identity'.
# facet_grid() gives three miniplots within the image for each of the variables.
ggplot(rc2, aes(x=category, y=value)) +
geom_bar(stat='identity') +
facet_grid(~variable)
You have to melt your data:
library(reshape2) # or library(data.table)
rest_change$rowN <- 1:nrow(rest_change)
rest_change <- melt(rest_change, id.vars = "rowN")
ggplot(rest_change,aes(x = rowN, y = value)) + geom_bar(stat = "identity") + facet_wrap(~ variable)
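The same reshaping can also be sketched with tidyr::pivot_longer(), which the tidyr documentation recommends over gather() for new code. The "Before"/"After" labels are borrowed from the first answer and are an assumption about what the two rows mean. The names rest_long and measure are made up for illustration.

```r
library(tidyr)
library(ggplot2)

rest_change <- data.frame(
  restaurant_change_sales = c(3330.443, 3122.534),
  restaurant_change_labor = c(696.592, 624.841),
  restaurant_change_POS = c(155.48, 139.27)
)

# Assumed row labels; the factor levels fix the x-axis order.
rest_change$change <- factor(c("Before", "After"), levels = c("Before", "After"))

# One row per change/measure pair.
rest_long <- pivot_longer(rest_change, cols = -change,
                          names_to = "measure", values_to = "value")

ggplot(rest_long, aes(x = change, y = value)) +
  geom_col() +
  facet_wrap(~ measure, scales = "free_y")
```

scales = "free_y" gives each panel its own y-axis, which helps here because sales are roughly twenty times larger than POS.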
