Changing the variable displayed on the x-axis in ggplot - r

I have a dataframe which contains a variable for week-since-2017. So, it counts up from 1 to 313 in that column. I mutated another variable into the dataframe to indicate the year. So, in my scatterplot, I have each week as a point, but the x-axis is horrid, counting up from 1 to 313. Is there a way I can change the scale at the bottom to instead display the variable year, possibly even adding vertical lines in between to show when the year changes?
Currently, I have this:
ggplot(HS, aes(as.integer(Obs), Total)) +
  geom_point(aes(color = YEAR)) +
  geom_smooth() +
  labs(title = "Weekly Sales since 2017", x = "Week", y = "Written Sales") +
  theme(axis.line = element_line(colour = "orange", size = 1, linetype = "solid"))

You can convert the number of weeks to a number of days with 7 * Obs and add that to the start date (as.Date('2017-01-01')). This gives you a date-based x axis, which you can format however you like.
Here, the breaks are set at the start of each year, so the vertical grid lines mark the year changes:
library(ggplot2)

ggplot(HS, aes(as.Date('2017-01-01') + 7 * Obs, Total)) +
  geom_point(aes(color = YEAR)) +
  geom_smooth() +
  labs(title = "Weekly Sales since 2017", y = "Written Sales") +
  # linewidth replaces size for line elements in ggplot2 >= 3.4.0
  theme(axis.line = element_line(colour = "orange", linewidth = 1)) +
  # the scale name ('Year') sets the x-axis title
  scale_x_date('Year', date_breaks = 'year', date_labels = '%Y')
Data used
Obviously, we don't have your data, so I had to create a reproducible set with the same names and similar values to yours for the above example:
set.seed(1)
HS <- data.frame(Obs = 1:312,
                 Total = rnorm(312, seq(1200, 1500, length = 312), 200)^2,
                 YEAR = rep(2017:2022, each = 52))
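The question also asked for vertical lines where the year changes. Besides the grid lines produced by the yearly breaks, one option is to draw geom_vline() at 1 January of each year. A minimal sketch, assuming the simulated HS data above; week_start and year_starts are hypothetical names introduced here:

```r
library(ggplot2)

# Simulated data, as in "Data used"
set.seed(1)
HS <- data.frame(Obs = 1:312,
                 Total = rnorm(312, seq(1200, 1500, length = 312), 200)^2,
                 YEAR = rep(2017:2022, each = 52))

# Hypothetical helper column: a date for each week, same mapping as the answer
HS$week_start <- as.Date("2017-01-01") + 7 * HS$Obs

# 1 January of each year after the first one: where the year changes
year_starts <- as.Date(paste0(2018:2022, "-01-01"))

p <- ggplot(HS, aes(week_start, Total)) +
  geom_point(aes(color = YEAR)) +
  geom_smooth() +
  # Dashed vertical line at every year change
  geom_vline(xintercept = year_starts, linetype = "dashed", colour = "grey40") +
  scale_x_date("Year", date_breaks = "1 year", date_labels = "%Y") +
  labs(title = "Weekly Sales since 2017", y = "Written Sales")
# print(p) to draw the plot
```

Because 52 weeks are only 364 days, mapping week numbers onto dates this way drifts by a day or two per year, so a point near a year boundary can land on the other side of the line from its YEAR value.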

Related

How to change ggplot2 lineplot color based on y value

I'm using ggplot2 in R to create a line plot. The y values are rates and the x values are dates. I want the color of the line to change depending on the rate, so I wrote a for loop that assigns a color variable based on the rate (i.e. >90 = Blue, <70 = Red).
The dataset looks like this:
dates    rates  color
1/1/21   91     Blue
1/2/21   42     Red
1/3/21   NA     NA
etc.
The code looks like this:
ggplot(data, aes(x = dates, y = rates)) +
  geom_line(aes(color = color)) +
  scale_x_date(date_labels = "%b %Y", date_breaks = "1 week") +
  labs(title = "Title", x = "Date", y = "Rates (%)")
For some reason, it keeps plotting like this:
I want it to look like this, but with colors:
Does anyone have any ideas on how to fix it? Thanks.
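Try converting dates to the Date class, setting group = 1 in aes(), and using scale_color_identity() so that the names in the color column are used directly as colors. The group = 1 is needed because a discrete color mapping otherwise splits the data into one group per color, and geom_line() only connects points within the same group.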
library(tidyverse)

data %>%
  mutate(dates = lubridate::mdy(dates)) %>%
  ggplot(aes(x = dates, y = rates, color = color, group = 1)) +
  geom_line() +
  scale_color_identity() +
  scale_x_date(date_labels = "%b %Y", date_breaks = "1 week") +
  labs(title = "Title", x = "Date", y = "Rates (%)")
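The question builds the color column with a for loop; in R this is usually done in one vectorised step. A minimal sketch with base R only, on a small made-up data frame shaped like the question's. The question does not say which color rates between 70 and 90 should get, so "Grey" is an assumption. Base as.Date() with the %m/%d/%y format is shown as an alternative to lubridate::mdy():

```r
# Made-up data shaped like the question's (the fourth row is invented)
data <- data.frame(
  dates = c("1/1/21", "1/2/21", "1/3/21", "1/4/21"),
  rates = c(91, 42, NA, 80)
)

# Character dates like "1/1/21" -> Date class (month/day/two-digit year)
data$dates <- as.Date(data$dates, format = "%m/%d/%y")

# Vectorised replacement for the for loop:
# > 90 -> "Blue", < 70 -> "Red", anything in between -> "Grey" (assumed);
# a missing rate gives a missing color
data$color <- ifelse(data$rates > 90, "Blue",
                     ifelse(data$rates < 70, "Red", "Grey"))
```

The resulting data frame can then be plotted with group = 1 and scale_color_identity() exactly as in the answer.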

How to change x-axis ticks to reflect another variable?

I'm working with a data frame that includes both the day and month of the observation:
Day Month
1 January
2 January
3 January
...
32 February
33 February
34 February
...
60 March
61 March
And so on. I am creating a line graph in ggplot that reflects the day by day value of column wp (which can just be assumed to be random values between 0 and 1).
Because there are so many days in the data set, I don't want them to be reflected in the x-axis tick marks. I need to refer to the day column in creating the plot so I can see the day-by-day change in wp, but I just want month to be shown on the x-axis labels. I can easily rename the x-axis title with xlab("Month"), but I can't figure out how to change the tick marks to only show the month. So ideally, I want to see "January", "February", and "March" as the labels along the x-axis.
library(ggplot2)
library(ggthemes)  # for theme_fivethirtyeight()

test <- ggplot(data = df, aes(x = day, y = wp)) +
  geom_line(color = "#D4BF91", linewidth = 1) +
  geom_point(color = "#D4BF91", size = .5) +
  ggtitle("testex") +
  xlab("Day") +
  ylab("Value") +
  theme_fivethirtyeight() + theme(axis.title = element_text())
When I treat the day as an actual date, I get the desired result:
library(ggplot2)
library(tibble)
library(lubridate)

set.seed(21540)
dat <- tibble(
  date = seq(mdy("01-01-2020"), mdy("01-01-2020") + 75, by = 1),
  day = seq_along(date),
  wp = runif(length(day), 0, 1)
)
ggplot(data = dat, aes(x = date, y = wp)) +
  geom_line(color = "#D4BF91", linewidth = 1) +
  geom_point(color = "#D4BF91", size = .5) +
  ggtitle("testex") +
  xlab("Day") +
  ylab("Value") +
  theme(axis.title = element_text())
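To get the full month names the question asks for ("January", "February", "March"), the date scale can be given explicit monthly breaks and a %B label format. A minimal sketch that rebuilds similar data with base R (start is a hypothetical name); %B follows the current locale, and linewidth needs ggplot2 3.4.0 or later:

```r
library(ggplot2)

set.seed(21540)
start <- as.Date("2020-01-01")
dat <- data.frame(date = seq(start, start + 75, by = "day"))
dat$wp <- runif(nrow(dat), 0, 1)

p <- ggplot(dat, aes(x = date, y = wp)) +
  geom_line(color = "#D4BF91", linewidth = 1) +
  geom_point(color = "#D4BF91", size = 0.5) +
  # One break per month, labelled with the full month name
  scale_x_date(date_breaks = "1 month", date_labels = "%B") +
  labs(title = "testex", x = "Month", y = "Value")
```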
More generally, though, you can keep day on the x axis and use scale_x_continuous() with breaks at the first day of each month, labelled with the month name:
library(ggplot2)
library(dplyr)
library(tibble)
library(lubridate)

set.seed(21540)
dat <- tibble(
  date = seq(mdy("01-01-2020"), mdy("01-01-2020") + 75, by = 1),
  day = seq_along(date),
  wp = runif(length(day), 0, 1),
  month = as.character(month(date, label = TRUE))
)
# First row of each month: its day gives the break, its month the label
firsts <- dat %>%
  group_by(month) %>%
  slice(1)
ggplot(data = dat, aes(x = day, y = wp)) +
  geom_line(color = "#D4BF91", linewidth = 1) +
  geom_point(color = "#D4BF91", size = .5) +
  ggtitle("testex") +
  xlab("Day") +
  ylab("Value") +
  scale_x_continuous(breaks = firsts$day, labels = firsts$month) +
  theme(axis.title = element_text())
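If you would rather not load dplyr, the same breaks and labels can be computed in base R, where !duplicated() flags the first row of each month. A sketch; first_rows, month_breaks and month_labels are hypothetical names, and %B gives full month names ("January", ...) in an English locale:

```r
# Base-R version of the "firsts" table
set.seed(21540)
dates <- seq(as.Date("2020-01-01"), by = "day", length.out = 76)
dat <- data.frame(
  day   = seq_along(dates),
  wp    = runif(length(dates), 0, 1),
  month = format(dates, "%B")  # full month name, in the current locale
)

# TRUE on the first row of each month; this assumes the data span less than
# a year, so a month name never reappears later
first_rows <- !duplicated(dat$month)
month_breaks <- dat$day[first_rows]
month_labels <- dat$month[first_rows]

# Then, in the plot:
# scale_x_continuous(breaks = month_breaks, labels = month_labels)
```

For data spanning several years, duplicated() would need a year-month key such as format(dates, "%Y-%m") instead of the month name alone.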

R - Formatting data per month and facet wrapping per year

I am practicing with R and have hit a speed bump while trying to create a graph of airline passengers per month.
I want to show a separate monthly line graph for each year from 1949 to 1960 for which data were recorded. To do this, I used ggplot to create a line graph of the values per month. This works fine, but when I try to split it by year with facet_wrap(), formatting the current month field as facet_wrap(format(air$month[seq(1, length(air$month), 12)], "%Y")), it returns this:
[Image: graph returned]
I have also tried to facet by passing my own sequence of years: rep(c(1949:1960), each = 12). This returns a different result, which is better but still wrong:
[Image: second graph]
Here is my code:
library(ggplot2)
library(scales)  # for date_format()

air = data.frame(
  month = seq(as.Date("1949-01-01"), as.Date("1960-12-01"), by = "months"),
  air = as.vector(AirPassengers)
)
ggplot(air, aes(x = month, y = air)) +
  geom_point() +
  labs(x = "Month", y = "Passengers (in thousands)", title = "Total passengers per month, 1949 - 1960") +
  geom_smooth(method = lm, se = F) +
  geom_line() +
  scale_x_date(labels = date_format("%b"), breaks = "12 month") +
  facet_wrap(format(air$month[seq(1, length(air$month), 12)], "%Y"))
  # OR
  # facet_wrap(rep(c(1949:1960), each = 12))
So how do I make an individual graph per year?
Thanks!
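In your second try you were really close. The main problem is that facet_wrap() uses a common x scale for all panels by default, and your x values are full dates that include the year, so every panel spans the whole 1949-1960 range and each year's points occupy only a small slice of it. Also, facet_wrap() expects the faceting variables (for example ~year or vars(year)), not a vector of values, so the year must be a column in the data. An easy fix is to move all dates onto a common x axis by giving them the same dummy year, and then facet by the real year. The code below should produce the desired plot.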
library(tidyverse)
library(lubridate)

air %>%
  # Get the year value to use for the facets
  mutate(year = year(month),
         # Move every date into a dummy year (2021 here) so that all facets
         # share a common January-December x axis
         month_day = make_date(2021, month(month), day(month))) %>%
  # Same plot as before, with month_day as the x variable
  ggplot(aes(x = month_day, y = air)) +
  geom_point() +
  labs(x = "Month",
       y = "Passengers (in thousands)",
       title = "Total passengers per month, 1949 - 1960") +
  geom_smooth(method = lm, formula = y ~ x, se = FALSE) +
  geom_line() +
  # One break per month, labelled with the abbreviated month name
  scale_x_date(date_labels = "%b", date_breaks = "1 month") +
  # Use the year variable for the facets
  facet_wrap(~year) +
  # Rotating the x-axis labels by 90 degrees gives a cleaner plot
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
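Alternatively, the real dates can stay on the x axis if each panel gets its own x range through scales = "free_x" in facet_wrap(). A sketch under that assumption, rebuilding the question's air data frame; the year column is added here:

```r
library(ggplot2)

# Same data frame as in the question
air <- data.frame(
  month = seq(as.Date("1949-01-01"), as.Date("1960-12-01"), by = "months"),
  air   = as.vector(AirPassengers)
)
# Year as a column, so facet_wrap() can refer to it
air$year <- as.integer(format(air$month, "%Y"))

p <- ggplot(air, aes(x = month, y = air)) +
  geom_point() +
  geom_line() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  scale_x_date(date_labels = "%b", date_breaks = "1 month") +
  # Each panel gets its own x range, so the real dates can stay on the axis
  facet_wrap(~year, scales = "free_x") +
  labs(x = "Month", y = "Passengers (in thousands)",
       title = "Total passengers per month, 1949 - 1960") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
```

The dummy-year approach keeps one shared axis, which makes the months easy to compare across panels; free_x repeats the month labels under every panel but keeps the true dates.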

ggplot: How to add a certain percentage to the top of the pillars of a histogram

I need to replicate a certain format of histogram/bar chart. I have already made some good modifications with ggplot to group the categorical x variable and specify the colors as HEX codes.
Here is what I am trying to replicate:
Here is an MWE of my data structure:
sex <- sample(0:1, 100, replace=TRUE)
group <- sample(2:5, 100, replace=TRUE)
data <- data.frame(sex, group)
library(ggplot2)
ggplot(data, aes(x = group, group=sex, fill=factor(sex) )) +
geom_histogram(position="dodge", binwidth=0.45) +
theme(axis.title.x=element_blank(), axis.title.y=element_blank()) +
guides(fill=guide_legend(title="sex")) +
scale_y_continuous(labels = scales::percent_format()) +
scale_fill_manual(values=c("#b6181f", "#f6b8bb"))
I get:
Small things I can't handle are:
- replacing the factor labels on the x axis (there might be a problem with my histogram approach, but I also found no practical way with a bar chart)
- rounding the percentages to no decimal places
But most importantly, I don't know how to add a single percentage value for each group and sex to the top of each bar.
I look forward to any advice :)
First of all, I would treat your x-axis data as a factor and plot it with bars. For putting percentage labels on top of the bars, see this question: Show % instead of counts in charts of categorical variables.
Furthermore, the y-axis labels are not a rounding problem: the values are counts, not proportions, and percent_format() simply multiplies them by 100. Mapping y to the computed proportion (after_stat(prop), written ..prop.. in older ggplot2 versions) solves that, and accuracy = 1 removes the decimals.
Is this what you are looking for (everything combined)?
library(ggplot2)

set.seed(1)
sex <- sample(0:1, 100, replace = TRUE)
# Five groups, one per label in group_labels (the MWE sampled only 2:5)
group <- sample(1:5, 100, replace = TRUE)
data <- data.frame(sex, group)
group_labels <- c("Score < 7", "Score\n7 to < 12", "Score\n12 to < 15",
                  "Score\n15 to < 20", "Score >= 20")
ggplot(data, aes(x = as.factor(group), y = after_stat(prop),
                 group = sex, fill = factor(sex))) +
  geom_bar(position = "dodge") +
  # One percentage per group and sex, placed at the top of each bar
  geom_text(aes(label = scales::percent(after_stat(prop), accuracy = 1)),
            position = position_dodge(width = 0.9), stat = "count", vjust = 2) +
  labs(x = NULL, y = NULL) +
  guides(fill = guide_legend(title = "sex")) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  scale_fill_manual(values = c("#b6181f", "#f6b8bb")) +
  scale_x_discrete(labels = group_labels)
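To check what the bar labels mean, the same proportions can be computed directly. With group = sex, stat_count computes prop within each sex, so each sex's bars add up to 100%. A sketch on simulated data like the answer's (five groups assumed); shares is a hypothetical name:

```r
# Simulated data: two sexes, five groups
set.seed(1)
sex   <- sample(0:1, 100, replace = TRUE)
group <- sample(1:5, 100, replace = TRUE)

# Share of each group within each sex (column-wise proportions);
# these are the numbers the bar labels show
shares <- prop.table(table(group, sex), margin = 2)
round(100 * shares)
```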

ggplot: assign manual break points to date scale

I have some time series data and I would like to customize the x-axis (dates) to show the date labels where I obtain measurements, as opposed to having regular breaks per week/month/year.
Sample data:
dates <- as.Date("2011/01/01") + sample(0:365, 5, replace = FALSE)
number <- sample(1:100, 5)
df <- data.frame(
  dates = dates,
  number = number
)
This way I can plot my df with regular breaks every month...
library(ggplot2)
library(scales)  # for date_breaks() and date_format()

ggplot(df, aes(as.Date(dates), number)) +
  geom_point(size = 6) +
  geom_segment(aes(x = dates, y = 0, xend = dates, yend = number),
               linewidth = 0.5, linetype = 2) +
  scale_x_date(breaks = date_breaks("1 month"), labels = date_format("%d-%b-%Y")) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
... but I would like to set the major breaks to the actual 5 dates in df$dates. It works with a normal continuous scale (scale_x_continuous(breaks = c(1, 3, 7, 9))) but I can't figure out how to do it for a continuous date scale.
I am looking to do something like...
scale_dates_continuous(breaks = df$dates)
...but that doesn't exist, unfortunately. Thanks a lot for your help!
As ?scale_x_date explains, the breaks argument also accepts a vector of breaks, so you can pass the measurement dates directly:
scale_x_date(breaks = df$dates, labels = date_format("%d-%b-%Y"))
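A self-contained version of the question's plot with that fix applied, as a sketch: the seed is arbitrary, minor_breaks = NULL removes the in-between grid lines, and date_labels is used instead of scales::date_format() so the scales package is not needed:

```r
library(ggplot2)

set.seed(42)
df <- data.frame(
  dates  = as.Date("2011-01-01") + sample(0:365, 5, replace = FALSE),
  number = sample(1:100, 5)
)

p <- ggplot(df, aes(dates, number)) +
  geom_point(size = 6) +
  geom_segment(aes(xend = dates, y = 0, yend = number), linetype = 2) +
  # Major breaks exactly at the measurement dates; no minor grid lines
  scale_x_date(breaks = df$dates, minor_breaks = NULL, date_labels = "%d-%b-%Y") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
```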
