R Stacked Bar Plot - r

data1=data.frame("Grade"=c(1,1,1,2,2,2,3,3,3),
"Class"=c(1,2,3,1,2,3,1,2,3),
"Score"=c(6,9,9,7,7,4,9,6,6))
I apologize if this has already been asked, but I could not find it. I want to make a stacked bar plot where the x-axis is 'Grade', with one bar per grade. Each bar contains three color shades, one for each class ('Class'). The height of each segment is 'Score', and the segments are always stacked from the lowest class to the highest. It should look something like this (not drawn to scale).

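We can use xtabs to convert the data to wide format and then apply barplot. Because barplot draws one bar per column of the matrix and stacks the rows within it, 'Class' goes first in the formula (rows) and 'Grade' second (columns)
barplot(xtabs(Score ~ Class + Grade, data1), legend = TRUE,
col = c('yellow', 'red', 'orange'))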
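barplot() draws one bar per column of a matrix and stacks the rows within it, so it is worth checking the orientation of the xtabs result before plotting. A minimal sketch using the question's data1; the names tab and bar_heights are only illustrative:

```r
data1 <- data.frame(Grade = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                    Class = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
                    Score = c(6, 9, 9, 7, 7, 4, 9, 6, 6))

# Rows = Class (stacked segments), columns = Grade (one bar each)
tab <- xtabs(Score ~ Class + Grade, data = data1)
print(tab)

# Column sums are the total heights of the bars barplot() would draw
bar_heights <- colSums(tab)
print(bar_heights)
```

If the columns turn out to be 'Class' instead of 'Grade', swapping the two terms in the formula (or wrapping the table in t()) changes which variable ends up on the x-axis.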
Or using ggplot
library(dplyr)
library(ggplot2)
data1 %>%
mutate_at(vars(Grade, Class), factor) %>%
ggplot(aes(x = Grade, y = Score, fill = Class)) +
geom_col()
To control the stacking order of 'Class', convert it to a factor whose levels are listed in the desired order, here the order of the 'Score' values
data1 %>%
mutate(Class = factor(Class, levels = unique(Class[order(Score)])),
Grade = factor(Grade)) %>%
ggplot(aes(x = Grade, y = Score, fill = Class)) +
geom_col()
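By default ggplot2 stacks the first factor level at the top of each bar. If 'Class' should run from low at the bottom to high at the top, one option is to reverse the stacking order rather than the factor levels. A sketch, assuming the question's data1; p is an illustrative name:

```r
library(ggplot2)

data1 <- data.frame(Grade = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                    Class = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
                    Score = c(6, 9, 9, 7, 7, 4, 9, 6, 6))
data1$Grade <- factor(data1$Grade)
data1$Class <- factor(data1$Class)

# reverse = TRUE flips the default stacking order, so Class 1 sits at the bottom
p <- ggplot(data1, aes(x = Grade, y = Score, fill = Class)) +
  geom_col(position = position_stack(reverse = TRUE))
print(p)
```

The legend order is unchanged by this; only the order of the segments inside each bar is flipped.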

Related

Add a gradient fill to geom_col

Here is some basic code for a column plot:
library(tidyverse)
diamonds %>%
group_by(cut) %>%
summarise(
count = n()
) %>%
ggplot(
aes(
x = cut,
y = count,
fill = count
)
) +
geom_col() +
scale_fill_viridis_c(
option = "plasma"
)
I could not find any examples of what I would like to do, so I will try to explain it as best I can. I have mapped a colour gradient to the fill aesthetic, which colours each whole column a single colour. Is it possible to have each column contain the full colour spectrum up to its respective value?
By which I mean the "Ideal" column of my plot would look exactly like the key in the legend. Then the "Premium" column would look like the key in the legend but cut off ~2/3 of the way up.
Thanks
You can do this fairly easily with a bit of data manipulation. Give each row within a group of your original data frame a sequential number that you can map to the fill scale, and add another column with the value 1 to use as the height. Then plot using position_stack
library(ggplot2)
library(dplyr)
diamonds %>%
group_by(cut) %>%
mutate(fill_col = seq_along(cut), height = 1) %>%
ggplot(aes(x = cut, y = height, fill = fill_col)) +
geom_col(position = position_stack()) +
scale_fill_viridis_c(option = "plasma")
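The idea is that every row becomes a unit-height segment whose fill value is its running index within the group, so a bar with n rows is built from segments coloured 1 through n. A small sketch on a made-up data frame (toy and its contents are assumptions for illustration, not the diamonds data):

```r
library(dplyr)

# Hypothetical toy data: three "Fair" rows and two "Ideal" rows
toy <- data.frame(cut = c("Fair", "Fair", "Fair", "Ideal", "Ideal"))

# One unit-height segment per row, numbered within its group
stacked <- toy %>%
  group_by(cut) %>%
  mutate(fill_col = seq_along(cut), height = 1) %>%
  ungroup()

# The stacked height of each bar equals the largest fill index in it
check <- stacked %>%
  group_by(cut) %>%
  summarise(bar_height = sum(height), top_fill = max(fill_col))
print(check)
```

Because one rectangle is drawn per row, this approach can be slow on large data frames.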

How to set ggplot2 fill color to aggregate statistic?

I want to set the fill colors of bars in a ggplot2 bar chart of one variable according to an aggregate stat of another variable.
For instance, using the house price regression data from here I might want to use color to visualize the mean sale price of homes in each category in a bar chart.
ggplot(
data = df,
mapping = aes(x = OverallCond, fill = mean(SalePrice))
) +
geom_bar()
This is not the graph I'm looking for. Each bar should be a color that represents the average sale price within that category.
You can calculate the mean values by category first and then plot them using geom_col(). Here is an example using mtcars dataset.
library(tidyverse)
d <- mtcars %>%
group_by(carb = factor(carb)) %>%
summarise(m = mean(disp)) %>%
ungroup()
ggplot(
data = d,
mapping = aes(x = carb, y = m, fill = carb)
) +
geom_col()
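If the bar height should stay a count (as with geom_bar() in the question) while the colour encodes the group mean, both statistics can be computed in one summarise() call and the mean mapped to fill. A sketch on mtcars; the column names n and mean_disp are illustrative:

```r
library(dplyr)
library(ggplot2)

# One row per carb level: number of cars and mean displacement
d2 <- mtcars %>%
  group_by(carb = factor(carb)) %>%
  summarise(n = n(), mean_disp = mean(disp))

# Bar height = count, bar colour = mean of the other variable
p <- ggplot(d2, aes(x = carb, y = n, fill = mean_disp)) +
  geom_col()
print(p)
```

Since mean_disp is numeric, ggplot2 uses a continuous colour scale for the fill.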

plotly and ggplot legend order interaction

I have multiple graphs that I am plotting with ggplot and then sending to plotly. I set the legend order based on the most recent date, so that one can easily interpret the graphs. Everything works great in generating the ggplot, but once I send it through ggplotly() the legend order reverts to the original factor levels. I tried resetting the factor levels, but this creates a new problem - the colors are different in each graph.
Here's the code:
Data:
Country <- c("CHN","IND","INS","PAK","USA")
a <- data.frame("Country" = Country,"Pop" = c(1400,1300,267,233,330),Year=rep(2020,5))
b <- data.frame("Country" = Country,"Pop" = c(1270,1000,215,152,280),Year=rep(2000,5))
c <- data.frame("Country" = Country,"Pop" = c(1100,815,175,107,250),Year=rep(1990,5))
Data <- bind_rows(a,b,c)
Legend Ordering Vector - This uses 2020 as the year to determine order.
Legend_Order <- Data %>%
filter(Year==max(Year)) %>%
arrange(desc(Pop)) %>%
select(Country) %>%
unlist() %>%
as.vector()
Then I create my plot and use Legend Order as breaks
Graph <- Data %>%
ggplot() +
geom_line(aes(x = Year, y = Pop, group = Country, color = Country), size = 1.2) +
scale_color_discrete(name = 'Country', breaks = Legend_Order)
Graph
But then when I pass this on to:
ggplotly(Graph)
For some reason plotly ignores the breaks argument and uses the original factor levels.
If I set the factor levels beforehand, the color schemes changes (since the factors are in a different order).
How can I keep the color scheme from graph to graph, but change the legend order when using plotly?
Simply recode your Country variable as a factor with the levels set according to Legend_Order. Try this:
library(plotly)
library(dplyr)
Country <- c("CHN","IND","INS","PAK","USA")
a <- data.frame("Country" = Country,"Pop" = c(1400,1300,267,233,330),Year=rep(2020,5))
b <- data.frame("Country" = Country,"Pop" = c(1270,1000,215,152,280),Year=rep(2000,5))
c <- data.frame("Country" = Country,"Pop" = c(1100,815,175,107,250),Year=rep(1990,5))
Data <- bind_rows(a,b,c)
Legend_Order <- Data %>%
filter(Year==max(Year)) %>%
arrange(desc(Pop)) %>%
select(Country) %>%
unlist() %>%
as.vector()
Data$Country <- factor(Data$Country, levels = Legend_Order)
Graph <- Data %>%
ggplot() +
geom_line(aes(x = Year, y = Pop, group = Country, color = Country), size = 1.2)
ggplotly(Graph)
To "lock in" the color assignment you can use a named color vector like so (for brevity I only show the ggplots):
# Fix the color assignments using a named color vector which can be assigned via scale_color_manual
cols <- scales::hue_pal()(5) # Default ggplot2 colors
cols <- setNames(cols, Legend_Order) # Set names according to legend order
# Plot with unordered Countries but "ordered" color assignment
Data %>%
ggplot() +
geom_line(aes(x = Year, y = Pop, color = Country), size = 1.2) +
scale_color_manual(values = cols)
# Plot with ordered factor
Data$Country <- factor(Data$Country, levels = Legend_Order)
Data %>%
ggplot() +
geom_line(aes(x = Year, y = Pop, color = Country), size = 1.2) +
scale_color_manual(values = cols)
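As a side note, the select() %>% unlist() %>% as.vector() chain used to build Legend_Order can be written more compactly with dplyr::pull(). A sketch that rebuilds the same data in a single data.frame() call (that construction is a shortcut for illustration, not the original code):

```r
library(dplyr)

Country <- c("CHN", "IND", "INS", "PAK", "USA")
Data <- data.frame(
  Country = rep(Country, 3),
  Pop = c(1400, 1300, 267, 233, 330,
          1270, 1000, 215, 152, 280,
          1100, 815, 175, 107, 250),
  Year = rep(c(2020, 2000, 1990), each = 5),
  stringsAsFactors = FALSE
)

# Countries ordered by population in the most recent year
Legend_Order <- Data %>%
  filter(Year == max(Year)) %>%
  arrange(desc(Pop)) %>%
  pull(Country)
print(Legend_Order)
```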

ggplot add segments to scatter plot according to factors

I have the following 'code'
set.seed(100)
values<-c(rnorm(200,10,1),rnorm(200,2.1,1),rnorm(250,6,1),rnorm(75,2.1,1),rnorm(50,9,1),rnorm(210,2.05,1))
rep1<-rep(3,200)
rep2<-rep(0,200)
rep3<-rep(1,250)
rep4<-rep(0,75)
rep5<-rep(2,50)
rep6<- rep(0,210)
group<-c(rep1,rep2,rep3,rep4,rep5,rep6)
df<-data.frame(values,group)
I would like to plot these data as a scatter plot (like the attached plot) and add segments. These segments (y values) shall represent the mean value of the data for a given group. In addition, the segments should have a different color depending on the factor (group). Is there an efficient way to do it with ggplot ?
Many thanks
We can do this by augmenting your data a little. We'll use dplyr to get the mean by group, and we'll create one variable that gives the observation index and another that increments by one each time the group changes (which will be helpful to get the segments you want) [1]:
library(dplyr)
df <- df %>%
mutate(idx = seq_along(values), group = as.integer(group)) %>%
group_by(group) %>%
mutate(m = mean(values)) %>%
ungroup() %>%
mutate(group2 = cumsum(group != lag(group, default = -1)))
Now we can make the plot; using geom_line() with grouping by group2, which changes every time the group changes, makes the segments you want. Then we just color by (a discretized version of) group:
library(ggplot2)
ggplot(data = df, mapping = aes(x = idx, y = values)) +
geom_point(shape = 1, color = "blue") +
geom_line(aes(x = idx, y = m, group = group2, color = as.factor(group)),
size = 2) +
scale_color_manual(values = c("red", "black", "green", "blue"),
name = "group") +
theme_bw()
1 See https://stackoverflow.com/a/42705593/8386140
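An alternative is to collapse each run of identical group values into a single row and draw it with geom_segment(). This is only a sketch on a smaller made-up data set; toy, segs, run and the sample sizes are assumptions, and linewidth assumes ggplot2 3.4.0 or later (older versions use size):

```r
library(dplyr)
library(ggplot2)

set.seed(100)
# Hypothetical toy data: four runs of 20 points; group 0 appears twice
toy <- data.frame(
  values = c(rnorm(20, 10), rnorm(20, 2), rnorm(20, 6), rnorm(20, 2)),
  group = rep(c(3, 0, 1, 0), each = 20)
) %>%
  mutate(idx = row_number())

# One row per run: start index, end index, and the mean of its group
segs <- toy %>%
  group_by(group) %>%
  mutate(m = mean(values)) %>%
  ungroup() %>%
  mutate(run = cumsum(group != lag(group, default = -1))) %>%
  group_by(run, group) %>%
  summarise(x = min(idx), xend = max(idx), m = first(m), .groups = "drop")

p <- ggplot() +
  geom_point(data = toy, aes(x = idx, y = values), shape = 1, color = "blue") +
  geom_segment(data = segs,
               aes(x = x, xend = xend, y = m, yend = m, color = factor(group)),
               linewidth = 2) +
  theme_bw()
print(p)
```

As in the answer above, the mean is taken over the whole group, so both runs of group 0 are drawn at the same height.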

Convert a geom_tile in dotplot in ggplot2

I am doing several heatmaps in ggplot2 using geom_tile. They work great but what if instead of tiles (little rectangles) I want to have dots. My input is a binary matrix (converted in a table using melt function).
My x and y are discrete factors. How do I produce circles or dots instead of tiles.....any idea?
Thanks!
example:
dat=data.frame(sample = c("a","a","a","b","b","b","c","c","c"), cond=c("x","y","z","x","y","z","x","y","z"),value=c("1","4","6","2","3","7","4","6","7"),score=c(0,1,1,0,0,0,1,1,1))
if I use the following plot:
ggplot(dat, aes(x = sample, y = cond, color = value)) +
geom_point()
I get the wrong plot. Instead, I would like a dot to be shown only where score is 1 (and no dot where it is 0), with the dots colored by the value factor.
I assume you mean to map score to your color aesthetic and not value, as written in your shared code.
Simply convert score to a factor in your initial aesthetics call:
ggplot(dat, aes(x = sample, y = cond, color = as.factor(score))) +
geom_point()
EDIT:
The asker indicated that they would like to keep only the observations where score equals 1 and then color the points by value. You can do so by adding the following pipe operation:
library(dplyr)
dat %>%
filter(score == 1) %>%
ggplot(aes(x = sample, y = cond, color = as.factor(value))) +
geom_point()
Note that only 3 levels of value remain and that level b of sample is missing from the x-axis. To keep all levels, make sample a factor before filtering (since R 4.0, data.frame() no longer converts strings to factors by default) and specify drop = FALSE in scale_x_discrete():
dat %>%
mutate(sample = factor(sample)) %>%
filter(score == 1) %>%
ggplot(aes(x = sample, y = cond, color = as.factor(value))) +
geom_point() +
scale_x_discrete(drop = FALSE)
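scale_x_discrete(drop = FALSE) can only keep levels that the data still records, which is the case for a factor but not for a plain character column. A quick check on the question's dat (kept is an illustrative name):

```r
library(dplyr)

dat <- data.frame(
  sample = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  cond = c("x", "y", "z", "x", "y", "z", "x", "y", "z"),
  value = c("1", "4", "6", "2", "3", "7", "4", "6", "7"),
  score = c(0, 1, 1, 0, 0, 0, 1, 1, 1)
)

# Convert to factor first, then filter: the unused level "b" is remembered
kept <- dat %>%
  mutate(sample = factor(sample)) %>%
  filter(score == 1)

print(levels(kept$sample))                # levels stored on the factor
print(unique(as.character(kept$sample)))  # values actually present
```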
