Mutating y axis in ggplot R

How can I change the y axis in this example? I do not want to apply any sort of transformation; I simply want the breaks I define to be evenly spaced. (In my real data most bars fall in the range 0 to 300 and one category has a single tall bar, so I would like to give the lower bars more room.)
library(dplyr)
library(ggplot2)
set.seed(123)
df <- data.frame(c(rnorm(20,10,2),rnorm(30,50,5),rnorm(25,5,0.5)),
c(rnorm(75,60,20)),
c(rep("A",20),rep("B",30),rep("C",25)))
colnames(df) <- c("bags","dist","source")
df %>%
mutate(bin = cut(dist, breaks = c(min(dist),18, 30,40, 50, max(dist)))) %>% # specify ranges
group_by(source, bin) %>%
summarise(sum_number = sum(bags)) %>%
ungroup() %>%
ggplot(aes(bin, sum_number, fill=source))+
geom_col()+
xlab("Km")+
ylab("Number of bags")+
scale_fill_manual(values = c("#a6611a","#dfc27d","#bababa"),
labels = unique(df$source),
name = "")+
scale_y_continuous(limits = c(0,1200),
breaks = c(0,50,100,150,200,250,300,500,1000),
labels = c("0","50","100","150","200","250","300","500","1,000")) +
theme_minimal()
The code above is only applying chosen ticks but is not rescaling the y-axis which is what I need.
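Setting `breaks` only moves the tick marks; the axis itself stays linear. One possible way to get evenly spaced custom breaks is a piecewise-linear mapping that sends each break to an equally spaced position. The sketch below is an assumption about what the question is after, not a confirmed answer: the name `even_breaks_trans` is hypothetical, and values outside the range of the breaks are simply clamped to the nearest end.

```r
library(scales)

# Breaks that should end up evenly spaced on the axis (taken from the question).
brks <- c(0, 50, 100, 150, 200, 250, 300, 500, 1000)
pos  <- seq_along(brks) - 1   # target positions 0, 1, ..., 8

# Piecewise-linear interpolation between the breaks (hypothetical helper name).
# rule = 2 clamps values outside [0, 1000] to the nearest end point.
even_breaks_trans <- trans_new(
  name      = "even_breaks",
  transform = function(y) approx(brks, pos, xout = y, rule = 2)$y,
  inverse   = function(z) approx(pos, brks, xout = z, rule = 2)$y
)

# 25 lies halfway between the first two breaks, 750 halfway between the last two.
mapped <- even_breaks_trans$transform(c(25, 300, 750))
mapped
```

Because the bars in the question are stacked, it is probably safer to pass such an object to `coord_trans(y = even_breaks_trans)` together with `scale_y_continuous(breaks = brks)` than to use it as a scale transformation: a coordinate transformation is applied after stacking, so the bar heights are still added up in the original units.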

Related

How to order time in y axis

I have a data frame (a tibble) like this:
library(tidyverse)
library(lubridate)
x = tibble(date=c("2022-04-25 07:04:07", "2022-04-25 07:09:07", "2022-04-25 07:14:07", "2022-04-26 07:04:07"),
value=c("on", "off", "on", "off"))
x$day<- as.factor(day(x$date))
x$time <- paste0(str_pad(hour(x$date),2,pad="0"),":",str_pad(minute(x$date),2,pad="0"))
When I plot the data:
x %>% ggplot() + geom_col(aes(x=day,y=time, fill=value))
the times in the y axis do not follow the bars. Each time data is supposed to be side by side with each bar segment.
I tried using as.factor(time) but that didn't solve it.
I also tried to add a numeric scale:
x = tibble(date=c("2022-04-25 07:04:07", "2022-04-25 07:09:07", "2022-04-25 07:14:07", "2022-04-26 07:04:07"),
fake_y=c(1,1,1,1),
value=c("on", "off", "on", "off"))
x %>% ggplot() + geom_col(aes(x=day,y=fake_y, fill=value))
but then the order of the on/off bars is lost.
How can I fix this?
Since you are looking for a time line, you would probably be best with geom_segment rather than geom_col. The reason is that since you might have multiple 'on' or 'off' values in a single day, it would be difficult to get these to stack correctly. You would also need to diff the on-off times to get them to stack. Furthermore, your labels would be wrong using columns if "off" represents the time of going from an on state to an off state.
When working with times in R, it is often best to keep them in a time format for plotting. If you convert times to character strings before plotting, they are treated as discrete values (like factor levels), so they will not be spaced in proportion to the time that has elapsed between them.
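A small base-R illustration of that point, using made-up timestamps (the third one is deliberately later than in the question so that the gaps are unequal): as character labels the three times sit one step apart, while as seconds since midnight the real gaps are preserved.

```r
# Hypothetical timestamps; tz is fixed so the result does not depend on the locale.
stamps <- as.POSIXct(c("2022-04-25 07:04:07",
                       "2022-04-25 07:09:07",
                       "2022-04-25 07:44:07"), tz = "UTC")

# As character labels the values become discrete levels at positions 1, 2, 3.
labels_chr   <- format(stamps, "%H:%M")
pos_discrete <- as.integer(factor(labels_chr))

# Kept numeric (seconds since midnight) the spacing reflects elapsed time.
midnight <- trunc(stamps, units = "days")
secs     <- as.numeric(difftime(stamps, midnight, units = "secs"))

diff(pos_discrete)  # equal steps
diff(secs)          # 5 minutes, then 35 minutes
```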
Since you want to have the day along one axis, you will need quite a bit of data manipulation to ensure that you record the state at the start of each day and the end of each day, but it can be achieved by doing:
p <- x %>%
mutate(date = as.POSIXct(date)) %>%
mutate(day = as.factor(day(date))) %>%
group_by(day) %>%
group_modify(~ add_row(.x,
date = floor_date(as.POSIXct(first(.x$date)), 'day'),
value = ifelse(first(.x$value) == 'on', 'off', 'on'),
.before = 1)) %>%
group_modify(~ add_row(.x,
date = ceiling_date(as.POSIXct(last(.x$date)), 'day') - 1,
value = last(.x$value))) %>%
mutate(ends = lead(date)) %>%
filter(!is.na(ends)) %>%
mutate(date = hms::as_hms(date), ends = hms::as_hms(ends)) %>%
ggplot(aes(x = day, y = date)) +
geom_segment(aes(xend = day, yend = ends, color = value),
size = 20) +
coord_cartesian(ylim = c(25120, 26500)) +
labs(y = 'time') +
guides(color = guide_legend(override.aes = list(size = 8)))
p
And of course, you can easily flip the co-ordinates if you wish, and apply theme elements to make the plot more appealing:
p + coord_flip(ylim = c(25120, 26500)) +
scale_color_manual(values = c('deepskyblue4', 'orange')) +
theme_light(base_size = 16)

Control Discrete Tick Labels in ggplot2 (scale_x_discrete) [duplicate]

This question already has answers here:
ggplot2: display every nth value on discrete axis
On a continuous scale, I can reduce the density of the tick labels using breaks and get nice control over their density in a flexible fashion using scales::pretty_breaks(). However, I can't figure out how to achieve something similar with a discrete scale. Specifically, if my discrete labels are letters, then let's say that I want to show every other one to clean up the graph. Is there an easy, systematic way to do this?
I have a hack that works (see below), but I am looking for something more automatic and elegant.
library(tidyverse)
# make some dummy data
dat <-
matrix(sample(100),
nrow = 10,
dimnames = list(letters[1:10], LETTERS[1:10])) %>%
as.data.frame() %>%
rownames_to_column("row") %>%
pivot_longer(-row, names_to = "column", values_to = "value")
# default plot has all labels on discrete axes
dat %>%
ggplot(aes(row, column)) +
geom_tile(aes(fill = value))
# desired plot would look like following:
ylabs <- LETTERS[1:10][c(T, NA)] %>% replace_na("")
xlabs <- letters[1:10][c(T, NA)] %>% replace_na("")
# can force desired axis text density but it's an ugly hack
dat %>%
ggplot(aes(row, column)) +
geom_tile(aes(fill = value)) +
scale_y_discrete(labels = ylabs) +
scale_x_discrete(labels = xlabs)
Created on 2021-12-21 by the reprex package (v2.0.1)
One option for dealing with overly-dense axis labels is to use n.dodge:
ggplot(dat, aes(row, column)) +
geom_tile(aes(fill = value)) +
scale_x_discrete(guide = guide_axis(n.dodge = 2)) +
scale_y_discrete(guide = guide_axis(n.dodge = 2))
Alternatively, if you are looking for a way to reduce your use of xlabs and do it more programmatically, then we can pass a function to scale_x_discrete(breaks=):
everyother <- function(x) x[seq_along(x) %% 2 == 0]
ggplot(dat, aes(row, column)) +
geom_tile(aes(fill = value)) +
scale_x_discrete(breaks = everyother) +
scale_y_discrete(breaks = everyother)
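Note that `everyother` keeps the even positions (b, d, ...), whereas the hand-made labels in the question keep the odd ones (a, c, ...). A slightly more general sketch is a small function factory; the name `every_nth` and its `offset` argument are made up for this illustration.

```r
# Returns a breaks function that keeps every n-th level, starting at `offset`.
every_nth <- function(n, offset = 1) {
  function(x) x[(seq_along(x) - offset) %% n == 0]
}

odd_letters   <- every_nth(2)(letters[1:10])     # positions 1, 3, 5, 7, 9
third_letters <- every_nth(3)(letters[1:10])     # positions 1, 4, 7, 10
even_letters  <- every_nth(2, 2)(letters[1:10])  # same result as everyother
```

The returned function can be passed in the same way as above, for example `scale_x_discrete(breaks = every_nth(2))`.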

plotly and ggplot legend order interaction

I have multiple graphs that I am plotting with ggplot and then sending to plotly. I set the legend order based on the most recent date, so that one can easily interpret the graphs. Everything works great in generating the ggplot, but once I send it through ggplotly() the legend order reverts to the original factor levels. I tried resetting the factors, but this creates a new problem: the colors are different in each graph.
Here's the code:
Data:
Country <- c("CHN","IND","INS","PAK","USA")
a <- data.frame("Country" = Country,"Pop" = c(1400,1300,267,233,330),Year=rep(2020,5))
b <- data.frame("Country" = Country,"Pop" = c(1270,1000,215,152,280),Year=rep(2000,5))
c <- data.frame("Country" = Country,"Pop" = c(1100,815,175,107,250),Year=rep(1990,5))
Data <- bind_rows(a,b,c)
Legend Ordering Vector - This uses 2020 as the year to determine order.
Legend_Order <- Data %>%
filter(Year==max(Year)) %>%
arrange(desc(Pop)) %>%
select(Country) %>%
unlist() %>%
as.vector()
Then I create my plot and use Legend Order as breaks
Graph <- Data %>%
ggplot() +
geom_line(aes(x = Year, y = Pop, group = Country, color = Country), size = 1.2) +
scale_color_discrete(name = 'Country', breaks = Legend_Order)
Graph
But then when I pass this on to:
ggplotly(Graph)
For some reason plotly ignores the breaks argument and uses the original factor levels.
If I set the factor levels beforehand, the color scheme changes (since the factors are in a different order).
How can I keep the color scheme from graph to graph, but change the legend order when using plotly?
Simply recode your Country variable as a factor with the levels set according to Legend_Order. Try this:
library(plotly)
library(dplyr)
Country <- c("CHN","IND","INS","PAK","USA")
a <- data.frame("Country" = Country,"Pop" = c(1400,1300,267,233,330),Year=rep(2020,5))
b <- data.frame("Country" = Country,"Pop" = c(1270,1000,215,152,280),Year=rep(2000,5))
c <- data.frame("Country" = Country,"Pop" = c(1100,815,175,107,250),Year=rep(1990,5))
Data <- bind_rows(a,b,c)
Legend_Order <- Data %>%
filter(Year==max(Year)) %>%
arrange(desc(Pop)) %>%
select(Country) %>%
unlist() %>%
as.vector()
Data$Country <- factor(Data$Country, levels = Legend_Order)
Graph <- Data %>%
ggplot() +
geom_line(aes(x = Year, y = Pop, group = Country, color = Country), size = 1.2)
ggplotly(Graph)
To "lock in" the color assignment you can make use of a named color vector like so (for short I only show the ggplots):
# Fix the color assignments using a named color vector which can be assigned via scale_color_manual
cols <- scales::hue_pal()(5) # Default ggplot2 colors
cols <- setNames(cols, Legend_Order) # Set names according to legend order
# Plot with unordered Countries but "ordered" color assignment
Data %>%
ggplot() +
geom_line(aes(x = Year, y = Pop, color = Country), size = 1.2) +
scale_color_manual(values = cols)
# Plot with ordered factor
Data$Country <- factor(Data$Country, levels = Legend_Order)
Data %>%
ggplot() +
geom_line(aes(x = Year, y = Pop, color = Country), size = 1.2) +
scale_color_manual(values = cols)
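Why the named vector helps can be seen without plotting anything. The sketch below uses five arbitrary hex colours (illustrative values, not necessarily the ggplot2 defaults) and compares matching by position with matching by name for two different level orders.

```r
# Legend order implied by the 2020 populations in the question (largest first).
legend_order <- c("CHN", "IND", "USA", "INS", "PAK")

# Five arbitrary hex colours, named by country (illustrative values).
cols <- setNames(c("#F8766D", "#A3A500", "#00BF7D", "#00B0F6", "#E76BF3"),
                 legend_order)

alphabetical <- factor(c("CHN", "IND", "INS", "PAK", "USA"))
reordered    <- factor(as.character(alphabetical), levels = legend_order)

# Matching by position depends on the order of the factor levels ...
by_position_alpha <- setNames(unname(cols)[as.integer(alphabetical)],
                              as.character(alphabetical))
by_position_reord <- setNames(unname(cols)[as.integer(reordered)],
                              as.character(reordered))

# ... whereas matching by name gives each country the same colour either way.
by_name_alpha <- cols[as.character(alphabetical)]
by_name_reord <- cols[as.character(reordered)]
```

Here `by_position_alpha` and `by_position_reord` disagree for USA, INS and PAK, while the two name-based lookups are identical. A named `values` vector in `scale_color_manual()` is matched to the data by name in the same spirit.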

How to annotate inside the plot when using datetime on the X axis with ggplot2?

I have successfully created a line graph in R using ggplot2 with percentage on the Y axis and date/time on the X axis, but I am unsure how to annotate inside the graph at specific date/time points where there is a high or low peak.
The examples I identified (on R-bloggers & RPubs) are annotated without using date/time, and I have made attempts to annotate it (with ggtext and annotate functions, etc), but got nowhere. Please can you show me an example of how to do this using ggplot2 in R?
The current R code below creates the line graph, but can you help me extend the code to annotate inside of the graph?
sentimentdata <- read.csv("sentimentData-problem.csv", header = TRUE, sep = ",", stringsAsFactors = FALSE)
sentimentTime <- sentimentdata %>%
filter(between(Hour, 11, 23))
sentimentTime$Datetime <- ymd_hm(sentimentTime$Datetime)
library(zoo)
sentimentTime %>%
filter(Cat %in% c("Negative", "Neutral", "Positive")) %>%
ggplot(aes(x = Datetime, y = Percent, group = Cat, colour = Cat)) +
geom_line() +
scale_x_datetime(breaks = date_breaks("1 hours"), labels = date_format("%H:00")) +
labs(title="Peak time on day of event", colour = "Sentiment Category") +
xlab("By Hour") +
ylab("Percentage of messages")
Data source available via GitHub:
Since you have multiple lines and you want two labels on each line according to the maxima and minima, you could create two small dataframes to pass to geom_text calls.
First we ensure the necessary packages and the data are loaded:
library(lubridate)
library(ggplot2)
library(scales)
library(dplyr)
url <- paste0("https://raw.githubusercontent.com/jcool12/",
"datasets/master/sentimentData-problem.csv")
sentimentdata <- read.csv(url, stringsAsFactors = FALSE)
sentimentdata$Datetime <- dmy_hm(sentimentdata$Datetime)
sentimentTime <- filter(sentimentdata, between(Hour, 11, 23))
Now we can create a max_table and min_table that hold the x and y co-ordinates and the labels for our maxima and minima:
max_table <- sentimentTime %>%
group_by(Cat) %>%
summarise(Datetime = Datetime[which.max(Percent)],
label = paste(trunc(max(Percent)), "%"), # label shows the true maximum
Percent = max(Percent) + 3) # label position, nudged above the peak
min_table <- sentimentTime %>%
group_by(Cat) %>%
summarise(Datetime = Datetime[which.min(Percent)],
label = paste(trunc(min(Percent)), "%"), # label shows the true minimum
Percent = min(Percent) - 3) # label position, nudged below the trough
Which allows us to create our plot without much trouble:
sentimentTime %>%
filter(Cat %in% c("Negative", "Neutral", "Positive")) %>%
ggplot(aes(x = Datetime, y = Percent, group = Cat, colour = Cat)) +
geom_line() +
geom_text(data = min_table, aes(label = label)) + # minimum labels
geom_text(data = max_table, aes(label = label)) + # maximum labels
scale_x_datetime(breaks = date_breaks("1 hours"),
labels = date_format("%H:00")) +
labs(title="Peak time on day of event", colour = "Sentiment Category") +
xlab("By Hour") +
ylab("Percentage of messages")
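The same idea can be checked on a small made-up data set. This is only a sketch: the values in `toy` are invented, and `label_y` is a hypothetical column that keeps the nudged label position separate from the real percentage, so the label text always shows the true peak.

```r
library(dplyr)

# Invented data: two categories observed hourly from 11:00 to 14:00.
toy <- data.frame(
  Datetime = as.POSIXct("2020-01-01 11:00", tz = "UTC") + 3600 * rep(0:3, times = 2),
  Cat      = rep(c("Negative", "Positive"), each = 4),
  Percent  = c(10.4, 35.7, 20.1, 15.0, 60.2, 40.3, 55.5, 72.9)
)

# One row per category holding the peak, its label and a nudged y position.
peaks <- toy %>%
  group_by(Cat) %>%
  slice_max(Percent, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  mutate(label   = paste0(round(Percent, 1), "%"),
         label_y = Percent + 3)

peaks
```

Such a table could then be used as `geom_text(data = peaks, aes(y = label_y, label = label))`, with `x = Datetime` and `colour = Cat` inherited from the main `aes()`. The label's x position is just the POSIXct value of the peak, so no special handling of the datetime axis is needed.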

ggplot faceted cumulative histogram

I have the following data
set.seed(123)
x = c(rnorm(100, 4, 1), rnorm(100, 6, 1))
gender = rep(c("Male", "Female"), each=100)
mydata = data.frame(x=x, gender=gender)
and I want to plot two cumulative histograms (one for males and the other for females) with ggplot.
I have tried the code below
ggplot(data=mydata, aes(x=x, fill=gender)) + stat_bin(aes(y=cumsum(..count..)), geom="bar", breaks=1:10, colour=I("white")) + facet_grid(gender~.)
but I get this chart
that, obviously, is not correct.
How can I get the correct one, like this:
Thanks!
I would pre-compute the cumsum values per bin per group, and then use geom_histogram to plot.
mydata %>%
mutate(x = cut(x, breaks = 1:10, labels = F)) %>% # Bin x
count(gender, x) %>% # Counts per bin per gender
mutate(x = factor(x, levels = 1:10)) %>% # x as factor
complete(x, gender, fill = list(n = 0)) %>% # Fill missing bins with 0
group_by(gender) %>% # Group by gender ...
mutate(y = cumsum(n)) %>% # ... and calculate cumsum
ggplot(aes(x, y, fill = gender)) + # The rest is (gg)plotting
geom_histogram(stat = "identity", colour = "white") +
facet_grid(gender ~ .)
Like @Edo, I also came here looking for exactly this. @Edo's solution was the key for me. It's great. But I post here a few additions that increase the information density and allow comparisons across different situations.
library(ggplot2)
set.seed(123)
x = c(rnorm(100, 4, 1), rnorm(50, 6, 1))
gender = c(rep("Male", 100), rep("Female", 50))
grade = rep(1:3, 50)
mydata = data.frame(x=x, gender=gender, grade = grade)
ggplot(mydata, aes(x,
y = ave(after_stat(density), group, FUN = cumsum)*after_stat(width),
group = interaction(gender, grade),
color = gender)) +
geom_line(stat = "bin") +
scale_y_continuous(labels = scales::percent_format()) +
facet_wrap(~grade)
I rescale the y so that the cumulative plot always ends at 100%. Otherwise, if the groups are not the same size (like they are in the original example data) then the cumulative plots have different final heights. This obscures their relative distribution.
Secondly, I use geom_line(stat="bin") instead of geom_histogram() so that I can put more than one line on a panel. This way I can compare them easily.
Finally, because I also want to compare across facets, I need to make sure the ggplot group variable uses more than just color=gender. We set it manually with group = interaction(gender, grade).
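The rescaling works because a density is the count divided by the group total and the bin width, so the running sum of density times width is the cumulative share of the group. A quick check with made-up bin counts:

```r
# Invented bin counts for one group, with a constant bin width.
counts <- c(5, 15, 20, 10)
width  <- 0.5

# Density as stat_bin defines it: count / total / width.
density <- counts / sum(counts) / width

# Cumulative share; the last value is 1 (100%) whatever the group size.
cum_share <- cumsum(density) * width
cum_share
```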
Answering a million years later....
I was looking for a solution for the same problem and I got here..
Eventually I figured it out by myself, so I'll drop it here in case other people will ever need it.
As required: no pre-work is necessary!
ggplot(mydata) +
geom_histogram(aes(x = x, y = ave(..count.., group, FUN = cumsum),
fill = gender, group = gender),
colour = "gray70", breaks = 1:10) +
facet_grid(rows = "gender")
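The difference between a plain `cumsum()` and `ave(..., FUN = cumsum)` is easy to see on a toy vector. The numbers below are invented and stand in for the per-bin counts of two groups stored one after the other, which is roughly how the computed data looks when the aesthetic is evaluated.

```r
# Invented bin counts for two groups, stored back to back.
count <- c(1, 2, 3, 4, 5, 6)
group <- c(1, 1, 1, 2, 2, 2)

# A plain cumsum carries the first group's total into the second group.
running_all <- cumsum(count)

# ave() applies cumsum within each group, restarting at the group boundary.
running_by_group <- ave(count, group, FUN = cumsum)

running_all
running_by_group
```

In the first result the second group starts from 6 instead of 0, which matches the symptom in the question where one facet appears stacked on top of the other's total.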
