How to make a dual axis in ggplot R

I have made a time series plot for total count data of 4 different species. As you can see, the results for sharksucker have a much higher count than the other 3 species. To see the trends of the other 3 species, they need to be plotted separately (or on a smaller y axis). However, I have a figure limit in my master's paper, so I was trying to create a dual axis plot or have the y axis split into two. Does anyone know of a way I could do this?
library(tidyverse)
library(readxl) # read_xlsx() comes from readxl, which library(tidyverse) does not attach
library(reshape2)
dat <- read_xlsx("ReefPA.xlsx")
dat1 <- dat
dat1$Date <- format(dat1$Date, "%Y/%m")
plot_dat <- dat1 %>%
group_by(Date) %>%
summarise(Sharksucker_Remora = sum(Sharksucker_Remora)) %>%
melt("Date") %>%
filter(Date > '2018-01-01') %>%
arrange(Date)
names(plot_dat) <- c("Date", "Species", "Count")
ggplot(data = plot_dat) +
geom_line(mapping = aes(x = Date, y = Count, group = Species, colour = Species)) +
stat_smooth(method=lm, aes(x = Date, y = Count, group = Species, colour = Species)) +
scale_colour_manual(values=c(Golden_Trevally="goldenrod2", Red_Snapper="firebrick2", Sharksucker_Remora="darkolivegreen3", Juvenile_Remora="aquamarine2")) +
xlab("Date") +
ylab("Total Presence Per Month") +
theme(legend.title = element_blank()) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

The problem you're trying to solve doesn't look like a second y-axis issue. It is a problem of the relative scale of the species. You might want to think about standardizing each species' initial presence to 100 and showing growth or decline from there.
Another option would be faceting by species.
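A minimal sketch of both suggestions, assuming made-up monthly counts: the data frame `counts` and its values are invented for illustration and are not taken from the question's spreadsheet. The first plot indexes each species so that its first month equals 100; the second gives each species its own panel and y-axis.

```r
library(dplyr)
library(ggplot2)

# Hypothetical monthly totals (assumption); the real data would come from ReefPA.xlsx
counts <- data.frame(
  Date    = rep(as.Date(c("2018-02-01", "2018-03-01", "2018-04-01")), times = 2),
  Species = rep(c("Sharksucker_Remora", "Red_Snapper"), each = 3),
  Count   = c(200, 260, 240, 4, 6, 5)
)

# Option 1: index every species to 100 at its first observation
indexed <- counts %>%
  group_by(Species) %>%
  arrange(Date, .by_group = TRUE) %>%
  mutate(Index = 100 * Count / first(Count)) %>%
  ungroup()

p_index <- ggplot(indexed, aes(x = Date, y = Index, colour = Species)) +
  geom_line() +
  ylab("Presence (first month = 100)")

# Option 2: one panel per species, each with its own y-axis
p_facet <- ggplot(counts, aes(x = Date, y = Count)) +
  geom_line() +
  facet_wrap(vars(Species), ncol = 1, scales = "free_y")
```

The index is undefined for a species whose first count is zero, so in that case a different baseline (for example the first non-zero month) would be needed.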

Related

ggplot geom_col: making certain axis count integers rather than summing

I am currently making a hate crime case study. For my plot I am using one zip code as my y-axis and plotting how many crimes occurred and which group is being targeted on the x-axis using geom_col. The problem is that my y-axis is adding the zip codes together rather than counting how many times the zip code shows up. Here is what my dataset looks like:
structure(list(ID = 1:5, CRIME_TYPE = c("VANDALISM", "ASSAULT", "VANDALISM", "ASSAULT",
"OTHER"), BIAS_MOTIVATION_GROUP = c("ANTI-BLACK ",
"ANTI-BLACK ", "ANTI-FEMALE HOMOSEXUAL (LESBIAN) ",
"ANTI-MENTAL DISABILITY ", "ANTI-JEWISH "),
ZIP_CODE = c(40291L, 40219L, 40243L, 40212L, 40222L
)), row.names = c(NA, 5L), class = "data.frame")
Here is my code:
library(ggplot2)
df <- read.csv(file = "LMPD_OP_BIAS.csv", header = T)
library(tidyverse)
hate_crime <- df %>%
filter(ZIP_CODE == "40245")
hate_crime_plot <- hate_crime %>%
ggplot(., aes(x = BIAS_MOTIVATION_GROUP, y = ZIP_CODE, fill =
BIAS_MOTIVATION_GROUP)) +
geom_col() + labs(x = "BIAS_MOTIVATION_GROUP", fill = "BIAS_MOTIVATION_GROUP") +
theme_minimal() +
theme(axis.text.x=element_text (angle =45, hjust =1))
print(hate_crime_plot)
hate_crime_ploter <- hate_crime %>%
ggplot(., aes(x = UOR_DESC, y = ZIP_CODE, fill =
UOR_DESC)) +
geom_col() + labs(x = "UOR_DESC", fill = "UOR_DESC") +
theme_minimal() +
theme(axis.text.x=element_text (angle =45, hjust =1))
print(hate_crime_ploter)
For full data visit here: visit site to download data set
Alright, I think you've got a couple of issues here. In your code you're asking ggplot to make a bar plot with a categorical variable (BIAS_MOTIVATION_GROUP or UOR_DESC) on the x-axis and a continuous variable (ZIP_CODE) on the y-axis. Since there is more than one row per x value, geom_col() stacks the y values, so each bar's height is the sum of the ZIP codes in that category, which is what you'd expect from a bar plot of y values. Long story short, I wonder if what you actually want is a bar chart of counts (the categorical equivalent of a histogram). Your dataset (hate_crime) only has one value of ZIP_CODE, so I'm not sure what plotting ZIP on the y-axis is supposed to visualize. A count plot would look like this:
hate_crime %>%
ggplot(aes(x = UOR_DESC, fill = UOR_DESC)) +
geom_bar() + # stat = "count" is the default: one unit of height per row
labs(x = "UOR_DESC", fill = "UOR_DESC") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
If, instead, you're trying to visualize how often each ZIP code shows up in each category, you'd have to approach things differently. Perhaps you're looking for something like this?
df %>%
ggplot(aes(x = UOR_DESC, fill = factor(ZIP_CODE))) +
geom_bar() + # counts rows per UOR_DESC, stacked by ZIP code
theme(axis.text.x = element_text(angle = 45, hjust = 1))
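An equivalent way to make the counting explicit is to tabulate first and plot the table. The sketch below reuses the five sample rows from the question (with the trailing spaces in the labels dropped); the column name `n_cases` is an arbitrary choice for illustration.

```r
library(dplyr)
library(ggplot2)

# Sample rows adapted from the question's dput() output
hate_crime <- data.frame(
  ID = 1:5,
  CRIME_TYPE = c("VANDALISM", "ASSAULT", "VANDALISM", "ASSAULT", "OTHER"),
  BIAS_MOTIVATION_GROUP = c("ANTI-BLACK", "ANTI-BLACK",
                            "ANTI-FEMALE HOMOSEXUAL (LESBIAN)",
                            "ANTI-MENTAL DISABILITY", "ANTI-JEWISH"),
  ZIP_CODE = c(40291L, 40219L, 40243L, 40212L, 40222L)
)

# Count rows per group first, so the y-axis is a frequency, not a sum of ZIP codes
group_counts <- count(hate_crime, BIAS_MOTIVATION_GROUP, name = "n_cases")

p <- ggplot(group_counts,
            aes(x = BIAS_MOTIVATION_GROUP, y = n_cases, fill = BIAS_MOTIVATION_GROUP)) +
  geom_col() +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```

With the counts in their own column, geom_col() is the right geom again, because the y values are now the heights you actually want to draw.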

Plotting in r by date range

I have a dataset with 4000 categoric variables which are city names arranged by date. I can do a plot of the entire dataset with an overall count.
What I need to do is be able to plot aggregates of specific cities by specific date ranges. I cannot use by quarter or anything like that because the required date ranges every year are different. I need to be able to, say, select 2016/4/1 to 2016/6/23 to get a count of how many are Denver.
How can I do this?
library(dplyr) # for group_by() and summarise()
library(ggplot2)
library(ggpubr)
theme_set(theme_classic())
df <- log %>%
group_by(Location) %>%
summarise(counts = n())
df
ggplot(df, aes(x = Location, y = counts)) +
geom_bar(fill = "#0073C2FF", stat = "identity",width = .65) +
geom_text(aes(label = counts), vjust = -0.3) +
theme(axis.text.x = element_text(angle=65, vjust=0.6)) +
labs(title="Locations of Library Instruction",
subtitle="2016-2020")
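One possible approach, sketched under assumptions: the data has a date column (called `Date` here, which is a guess, since the question does not name it) stored as class Date, alongside the `Location` column used above. The helper `count_in_range()` and the sample rows are hypothetical.

```r
library(dplyr)

# Hypothetical stand-in for the question's data; the column name Date is assumed
visits <- data.frame(
  Date = as.Date(c("2016-03-15", "2016-04-01", "2016-05-10",
                   "2016-06-23", "2016-07-02", "2016-05-20")),
  Location = c("Denver", "Denver", "Boulder", "Denver", "Denver", "Denver")
)

# Keep rows inside an arbitrary (inclusive) date range, then count per city
count_in_range <- function(data, start, end) {
  data %>%
    filter(Date >= as.Date(start), Date <= as.Date(end)) %>%
    count(Location, name = "counts")
}

spring <- count_in_range(visits, "2016-04-01", "2016-06-23")
spring
```

The result has the same `Location` and `counts` columns as `df` in the question, so it can be passed to the existing ggplot() call unchanged, once per date range of interest.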

Adding labels to individual % inside geom_bar() using R / ggplot2 [duplicate]

bgraph <- ggplot(data = data, aes(x = location)) +
geom_bar(aes(fill = success))
success is a factor with 4 categories, one for each of the 4 outcomes in the data set. I could easily calculate the percentages separately, but as the ggplot is currently constituted, they are generated by geom_bar(aes(fill = success)).
data <- as.data.frame(c(1,1,1,1,1,1,2,2,3,3,3,3,4,4,4,4,4,4,
4,4,5,5,5,5,6,6,6,6,6,6,7,7,7,7,7))
data[["success"]] <- c("a","b","c","c","d","d","a","b","b","b","c","d",
"a","b","b","b","c","c","c","d","a","b","c","d",
"a","b","c","c","d","d","a","b","b","c","d")
names(data) <- c("location","success")
bgraph <- ggplot(data = data, aes(x = location)) +
geom_bar(aes(fill = success))
bgraph
How do I get labels over the individual percentages? More specifically, I wanted 4 individual percentages for each bar. One for yellow, light orange, orange, and red, respectively. %'s all add up to 1.
Maybe there is a way to do this in ggplot directly but with some pre-processing in dplyr, you'll be able to achieve your desired output.
library(dplyr)
library(ggplot2)
data %>%
count(location, success) %>%
group_by(location) %>%
mutate(n = n/sum(n) * 100) %>%
ggplot() + aes(x = location, n, fill = success,label = paste0(round(n, 2), "%")) +
geom_bar(stat = "identity") +
geom_text(position=position_stack(vjust=0.5))
How about creating a summary frame with the relative frequencies within location and then using that with geom_col() and geom_text()?
library(dplyr)
library(ggplot2)
library(forcats) # for fct_rev()
# Create summary stats
tots <-
data %>%
group_by(location,success) %>%
summarise(
n = n()
) %>%
mutate(
rel = round(100*n/sum(n)),
)
# Plot
ggplot(data = tots, aes(x = location, y = n)) +
geom_col(aes(fill = fct_rev(success))) + # could only get it with this reversed
geom_text(aes(label = rel), position = position_stack(vjust = 0.5))

How to approximate lines (change the y-axis) to improve comparison

I'm having a problem plotting two lines in ggplot2: I need them to be closer together to make them easier to compare. I tried changing the y-scale using 'log' and 'sqrt', but the lines are still far apart.
My data is too big to upload here, but here is my code:
ggplot(data_sex, aes(x = year, y = sqrt(log(ratemort)), color = sex)) +
geom_line(aes(group = sex)) +
coord_cartesian( ylim = c(3.25,3.67))+
geom_point()
where 'sex' is a factor and 'ratemort' is a number.
I want to bring the lines closer together to improve the visualization.
What about changing only the way the data is visualized? You can use facet_wrap() with free scales:
# Some fake data: you don't need to post your real data,
# but a small example that resembles it is always welcome
data_sex <- data.frame(year = c(2000,2001,2002,2003,2004,2005,2000,2001,2002,2003,2004,2005),
ratemort = c(1,2,1,2,1,3,100,200,200,300,200,500),
sex = c('0','0','0','0','0','0','1','1','1','1','1','1'))
library(ggplot2)
ggplot(data_sex, aes(x = year, y =(ratemort))) +
geom_line(aes(group = sex)) +
geom_point() +facet_wrap(vars(sex), scales = 'free', ncol = 1)
Or, if you want to bring the lines closer together, you can plot each value as a share of its group's total. Note that these are no longer the real values, so you have to state the difference in magnitude separately, since it is lost here:
library(dplyr)
data_sex %>%
group_by( sex, year) %>%
summarise(n = sum(ratemort)) %>%
mutate(perc = n / sum(n)) %>%
ggplot(aes(x = year, y =perc, color = sex)) +
geom_line(aes(group = sex)) +
geom_point()

ggplot faceted cumulative histogram

I have the following data
set.seed(123)
x = c(rnorm(100, 4, 1), rnorm(100, 6, 1))
gender = rep(c("Male", "Female"), each=100)
mydata = data.frame(x=x, gender=gender)
and I want to plot two cumulative histograms (one for males and the other for females) with ggplot.
I have tried the code below
ggplot(data=mydata, aes(x=x, fill=gender)) + stat_bin(aes(y=cumsum(..count..)), geom="bar", breaks=1:10, colour=I("white")) + facet_grid(gender~.)
but I get this chart
that, obviously, is not correct.
How can I get the correct one, like this:
Thanks!
I would pre-compute the cumsum values per bin per group, and then plot them with geom_col() (equivalent to geom_histogram(stat = "identity")).
library(dplyr)
library(tidyr) # for complete()
library(ggplot2)
mydata %>%
mutate(x = cut(x, breaks = 1:10, labels = FALSE)) %>% # Bin x
count(gender, x) %>% # Counts per bin per gender
mutate(x = factor(x, levels = 1:10)) %>% # x as factor
complete(x, gender, fill = list(n = 0)) %>% # Fill missing bins with 0
group_by(gender) %>% # Group by gender ...
mutate(y = cumsum(n)) %>% # ... and calculate cumsum
ggplot(aes(x, y, fill = gender)) + # The rest is (gg)plotting
geom_col(colour = "white") +
facet_grid(gender ~ .)
Like @Edo, I also came here looking for exactly this. @Edo's solution was the key for me. It's great. But I post here a few additions that increase the information density and allow comparisons across different situations.
library(ggplot2)
set.seed(123)
x = c(rnorm(100, 4, 1), rnorm(50, 6, 1))
gender = c(rep("Male", 100), rep("Female", 50))
grade = rep(1:3, 50)
mydata = data.frame(x=x, gender=gender, grade = grade)
ggplot(mydata, aes(x,
y = ave(after_stat(density), group, FUN = cumsum)*after_stat(width),
group = interaction(gender, grade),
color = gender)) +
geom_line(stat = "bin") +
scale_y_continuous(labels = scales::percent_format()) +
facet_wrap(~grade)
I rescale the y so that the cumulative plot always ends at 100%. Otherwise, if the groups are not the same size (like they are in the original example data) then the cumulative plots have different final heights. This obscures their relative distribution.
Secondly, I use geom_line(stat="bin") instead of geom_histogram() so that I can put more than one line on a panel. This way I can compare them easily.
Finally, because I also want to compare across facets, I need to make sure the ggplot group variable uses more than just color=gender. We set it manually with group = interaction(gender, grade).
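To see why the rescaled curve always ends at 100%, here is a small base R check on the same simulated data. It is only an illustration of the arithmetic, not part of the plotting code: the helper `cumulative_share()` and the 0.5-wide bins are assumptions made for this sketch.

```r
set.seed(123)
x <- c(rnorm(100, 4, 1), rnorm(50, 6, 1))
gender <- c(rep("Male", 100), rep("Female", 50))

# Cumulative share per bin for one group: density * bin width is the
# fraction of that group's observations falling in each bin
cumulative_share <- function(values, breaks) {
  h <- hist(values, breaks = breaks, plot = FALSE)
  cumsum(h$density * diff(h$breaks))
}

# Use the same bins for every group so the curves are comparable
breaks <- seq(floor(min(x)), ceiling(max(x)), by = 0.5)
shares <- lapply(split(x, gender), cumulative_share, breaks = breaks)

# Final value of each curve: 1 for both groups, despite 100 vs 50 observations
sapply(shares, function(s) s[length(s)])
```

This is the same quantity that ave(after_stat(density), group, FUN = cumsum) * after_stat(width) computes per group inside ggplot.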
Answering a million years later...
I was looking for a solution to the same problem and ended up here.
Eventually I figured it out myself, so I'll drop it here in case other people ever need it.
As required: no pre-work is necessary!
ggplot(mydata) +
# after_stat(count) replaces the deprecated ..count.. notation (ggplot2 >= 3.3.0)
geom_histogram(aes(x = x, y = ave(after_stat(count), group, FUN = cumsum),
fill = gender, group = gender),
colour = "gray70", breaks = 1:10) +
facet_grid(rows = vars(gender))

Resources