I'm trying to plot monthly sales data in RStudio, but the dates on the x-axis are not displayed correctly.
My code:
uc_ts_plot <- ggplot(monthly_sales, aes(DATE,DAUTONSA)) + geom_line(na.rm=TRUE) +
xlab("Month") + ylab("Auto Sales in Thousands") +
scale_x_date(labels = date_format(format= "%b-%Y"),
breaks = date_breaks("1 year")) +
stat_smooth(colour = "green")
uc_ts_plot
I expect the dates on the x-axis to be displayed as Jan-2011, Jan-2012, as shown here.
All I'm getting is a 0001-01 at the left end and a 0002-01 at the right end of the x-axis.
The plot you linked is filtered to the years 2011 through 2017, whereas your data starts in 1967.
The code below converts DATE to a Date, applies that filter, and reproduces the plot:
library(tidyverse)
library(scales)
library(lubridate)
monthly_sales %>%
mutate(DATE = as.Date(DATE)) %>%
filter(year(DATE) >= 2011 & year(DATE) < 2018) %>%
ggplot() + aes(DATE,DAUTONSA) +
geom_line(na.rm=TRUE) +
xlab("Month") + ylab("Auto Sales in Thousands") +
scale_x_date(labels = date_format(format= "%b-%Y"),
breaks = date_breaks("1 year")) +
stat_smooth(colour = "green")
You can remove the filter step to plot the data for all years, but a break every year then clutters the x-axis with a lot of labels.
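The as.Date() step matters because scale_x_date() expects a column of class Date. Here is a minimal sketch of that conversion, assuming DATE arrives as character (as it often does from read.csv()). The data frame below is made up and only stands in for monthly_sales; the 5-year break width is just one way to keep the full range readable.

```r
library(ggplot2)

set.seed(1)
# Made-up stand-in for monthly_sales: DATE is stored as character on purpose
dates <- seq(as.Date("1967-01-01"), as.Date("2018-12-01"), by = "month")
monthly_sales <- data.frame(
  DATE = format(dates),
  DAUTONSA = runif(length(dates), 400, 900),
  stringsAsFactors = FALSE
)

class(monthly_sales$DATE)                          # character, not Date
monthly_sales$DATE <- as.Date(monthly_sales$DATE)  # convert before plotting

# Unfiltered data with wider breaks, so the labels do not pile up
p <- ggplot(monthly_sales, aes(DATE, DAUTONSA)) +
  geom_line(na.rm = TRUE) +
  xlab("Month") + ylab("Auto Sales in Thousands") +
  scale_x_date(date_breaks = "5 years", date_labels = "%b-%Y")
p
```

Checking class(monthly_sales$DATE) on your own data first tells you whether the conversion is needed at all.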
Similar to this question: Split up time series per year for plotting, which was done in Python, I want to display a daily time series as multiple lines, one per year. How can I achieve this in R?
library(ggplot2)
library(dplyr)
# Dummy data
df <- data.frame(
day = as.Date("2017-06-14") - 0:364,
value = runif(365) + seq(-140, 224)^2 / 10000
)
# Basic line plot
p <- ggplot(df, aes(x=day, y=value)) +
geom_line() +
xlab("")
p
One solution uses ggplot2, but the date labels are displayed incorrectly:
library(tidyverse)
library(lubridate)
p <- df %>%
# mutate(date = ymd(day)) %>%
mutate(date = as.Date(day)) %>%  # the dummy data stores its dates in `day`
mutate(
year = factor(year(date)), # use year to define separate curves
date = update(date, year = 1) # use a constant year for the x-axis
) %>%
ggplot(aes(date, value, color = year)) +
scale_x_date(date_breaks = "1 month", date_labels = "%b")
# Raw daily data
p + geom_line()
An alternative solution is to use gg_season() from the feasts package:
library(feasts)
library(tsibble)
library(dplyr)
tsibbledata::aus_retail %>%
filter(
State == "Victoria",
Industry == "Cafes, restaurants and catering services"
) %>%
gg_season(Turnover)
References:
Split up time series per year for plotting
R - How to create a seasonal plot - Different lines for years
If you want your x axis to represent the months from January to December, then perhaps getting the yday of the date and adding it to the first of January of an arbitrary year would be simplest:
library(tidyverse)
library(lubridate)
df <- data.frame(
day = as.Date("2017-06-14") - 0:364,
value = runif(365) + seq(-140, 224)^2 / 10000
)
df %>%
# yday() is 1-based, so subtract 1 to map January 1 onto 2017-01-01
mutate(year = factor(year(day)), date = as.Date('2017-01-01') + yday(day) - 1) %>%
ggplot(aes(date, value, color = year)) +
geom_line() +
scale_x_date(breaks = seq(as.Date('2017-01-01'), by = 'month', length = 12),
date_labels = '%b')
Created on 2023-02-07 with reprex v2.0.2
I tend to think simple is better:
transform(df, year = format(day, "%Y")) |>
ggplot(aes(x=day, y=value, group=year, color=year)) +
geom_line() +
xlab(NULL)
optionally removing the year legend with + guides(colour = "none").
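Note that with x = day the two years are drawn one after the other along the axis. If the lines should sit on top of each other instead, a base-R variant of the same idea could look like this. It is only a sketch: the dummy year 2000 is my own choice (a leap year, so February 29 is not lost), and `date` is a helper column added here.

```r
library(ggplot2)

set.seed(1)
# Same dummy data as in the question
df <- data.frame(
  day = as.Date("2017-06-14") - 0:364,
  value = runif(365) + seq(-140, 224)^2 / 10000
)

# Keep the real year for colouring, move every date into the dummy year 2000
df2 <- transform(
  df,
  year = format(day, "%Y"),
  date = as.Date(format(day, "2000-%m-%d"))
)

p <- ggplot(df2, aes(x = date, y = value, group = year, color = year)) +
  geom_line() +
  scale_x_date(date_breaks = "1 month", date_labels = "%b") +
  xlab(NULL)
p
```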
I am trying to do a faceted plot of a grouped dataframe with ggplot2, using geom_line(). My dataframe has a Date column and I would like to have dates on the horizontal axis. If I just use Date in aes(x=Date, ...) I get nice labels on the horizontal axis. However, the line has an almost horizontal section where the date jumps from the end of one group to the beginning of the next group. This code and chart show that:
library(tidyverse)
library(lubridate)
dts <- seq.Date(as.Date("2020-01-01"), as.Date("2021-12-31"), by="day")
mos <- sapply(dts, month)
df <- data.frame(Date=dts, Month=mos)
nr <- nrow(df)
df$X <- rep(1, nr)
df %>%
group_by(Month) -> dfgrp
dfgrp %>%
group_by(Month) %>%
mutate(Time = Date[1:n()],
Z = cumsum(X)) %>%
ggplot(aes(x=Date, y=Z)) +
geom_line(color="darkgreen", size=0.5) +
facet_grid(. ~ Month, scale="free_x") +
theme(axis.text.x = element_text(angle=45, size=7))
I do not want my chart to have those almost-horizontal lines where the date changes by a large amount. I was able to generate a chart without those lines by using integers in aes() as follows:
dfgrp %>%
mutate(Time = 1:n() %>% as.integer(),
Z = cumsum(X)) %>%
ggplot(aes(x=Time, y=Z)) +
geom_line(color="darkgreen", size=0.5) +
facet_grid(. ~ Month, scale="free_x") +
scale_x_continuous(breaks = seq(from=1, to=nr, by=10) %>% as.integer(),
labels = function(x) as.character(dfgrp$Date[x])) +
theme(axis.text.x = element_text(angle=45, size=7))
The line on the chart looks like I want it but the dates on the horizontal axis are not correct: they end in February 2020 in every facet while the dates in the dataframe end in December 2021 and the dates in the first chart begin and end on different months in different facets.
I tried many things but nothing worked. Any suggestions on how to have a chart with dates like in the first chart above and lines like in the second chart above?
Help will be much appreciated.
You may want to shift all the dates into the same year, while keeping the original year as a separate variable:
library(lubridate)
dfgrp %>%
group_by(Month) %>%
mutate(year = year(Date),
adj_date = ymd(paste(2020, month(Date), day(Date)))) %>%
# 2020 was leap year so 2/29 won't be lost
mutate(Time = Date[1:n()],
Z = cumsum(X)) %>%
ggplot(aes(x=adj_date, y=Z, color = year, group = year)) +
geom_line(size=0.5) +
facet_grid(. ~ Month, scale="free_x") +
theme(axis.text.x = element_text(angle=45, size=7))
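To see what the date adjustment does on its own, here is a small sketch on three hand-picked dates. It uses lubridate's make_date() instead of ymd(paste(...)), which is my substitution rather than part of the answer above; both should give the same dates.

```r
library(lubridate)

# Three example dates, including a leap day
dates <- as.Date(c("2020-02-29", "2021-02-28", "2021-12-31"))

orig_year <- year(dates)                               # keep the real year for colouring
adj_date <- make_date(2020, month(dates), day(dates))  # same month and day, year 2020

data.frame(dates, orig_year, adj_date)
```

Picking a leap year as the common year is what keeps February 29 valid.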
I'm working with a data frame that includes both the day and month of the observation:
Day Month
1 January
2 January
3 January
...
32 February
33 February
34 February
...
60 March
61 March
And so on. I am creating a line graph in ggplot that reflects the day by day value of column wp (which can just be assumed to be random values between 0 and 1).
Because there are so many days in the data set, I don't want them to be reflected in the x-axis tick marks. I need to refer to the day column in creating the plot so I can see the day-by-day change in wp, but I just want month to be shown on the x-axis labels. I can easily rename the x-axis title with xlab("Month"), but I can't figure out how to change the tick marks to only show the month. So ideally, I want to see "January", "February", and "March" as the labels along the x-axis.
test <- ggplot(data = df, aes(x=day, y=wp)) +
geom_line(color="#D4BF91", size = 1) +
geom_point(color="#D4BF91", size = .5) +
ggtitle("testex") +
xlab("Day") +
ylab("Value") +
theme_fivethirtyeight() + theme(axis.title = element_text())
When I treat the day as an actual date, I get the desired result:
library(tidyverse)
library(lubridate)
set.seed(21540)
dat <- tibble(
date = seq(mdy("01-01-2020"), mdy("01-01-2020")+75, by=1),
day = seq_along(date),
wp = runif(length(day), 0, 1)
)
ggplot(data = dat, aes(x=date, y=wp)) +
geom_line(color="#D4BF91", size = 1) +
geom_point(color="#D4BF91", size = .5) +
ggtitle("testex") +
xlab("Day") +
ylab("Value") +
theme(axis.title = element_text())
More generally, though, you could use scale_x_continuous() to do this:
library(tidyverse)
library(lubridate)
set.seed(21540)
dat <- tibble(
date = seq(mdy("01-01-2020"), mdy("01-01-2020")+75, by=1),
day = seq_along(date),
wp = runif(length(day), 0, 1),
month = as.character(month(date, label=TRUE))
)
firsts <- dat %>%
group_by(month) %>%
slice(1)
ggplot(data = dat, aes(x=day, y=wp)) +
geom_line(color="#D4BF91", size = 1) +
geom_point(color="#D4BF91", size = .5) +
ggtitle("testex") +
xlab("Day") +
ylab("Value") +
scale_x_continuous(breaks=firsts$day, label=firsts$month) +
theme(axis.title = element_text())
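If your data only has the Day and Month columns from the question and no real dates, the same break positions can be found without lubridate. This is a sketch under that assumption: the data frame is made up to match the table in the question (January on days 1 to 31, February on 32 to 59, March from 60), and it relies on the rows being sorted by day.

```r
library(ggplot2)

set.seed(21540)
# Made-up data shaped like the question's table
df <- data.frame(
  day = 1:90,
  month = rep(c("January", "February", "March"), times = c(31, 28, 31)),
  wp = runif(90)
)

# First row of each month, in order of appearance
firsts <- df[!duplicated(df$month), ]

p <- ggplot(df, aes(x = day, y = wp)) +
  geom_line(color = "#D4BF91") +
  scale_x_continuous(breaks = firsts$day, labels = firsts$month) +
  xlab("Month") +
  ylab("Value")
p
```

Unlike group_by() followed by slice(1), !duplicated() keeps the months in the order they occur rather than sorting them alphabetically.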
I am practicing with R and have hit a speedbump while trying to create a graph of airline passengers per month.
I want to show a separate monthly line graph for each year from 1949 to 1960, the period for which data has been recorded. To do this I have used ggplot to create a line graph with the values per month. This works fine; however, when I try to separate it by year using facet_wrap() and formatting the current month field: facet_wrap(format(air$month[seq(1, length(air$month), 12)], "%Y")); it returns this:
Graph returned
I have also tried to format the facet by inputting my own sequence for the years: rep(c(1949:1960), each = 12). This returns a different result which is better but still wrong:
Second graph
Here is my code:
air = data.frame(
month = seq(as.Date("1949-01-01"), as.Date("1960-12-01"), by="months"),
air = as.vector(AirPassengers)
)
ggplot(air, aes(x = month, y = air)) +
geom_point() +
labs(x = "Month", y = "Passengers (in thousands)", title = "Total passengers per month, 1949 - 1960") +
geom_smooth(method = lm, se = F) +
geom_line() +
scale_x_date(labels = date_format("%b"), breaks = "12 month") +
facet_wrap(format(air$month[seq(1, length(air$month), 12)], "%Y"))
#OR
facet_wrap(rep(c(1949:1960), each = 12))
So how do I make an individual graph per year?
Thanks!
Your second attempt was really close. The main problem is that you are trying to make a faceted plot in which every facet has different x-axis values (full dates, including the year). An easy fix is to transform the dates onto a common x-axis scale and then facet by year. Here is code that should produce the desired plot.
library(tidyverse)
library(lubridate)
air %>%
# Get the year value to use it for the facetted plot
mutate(year = year(month),
# Get the month-day dates and set all dates with a dummy year (2021 in this case)
# This will get all your dates in a common x axis scale
month_day = as_date(paste(2021,month(month),day(month), sep = "-"))) %>%
# Do the same plot, just change the x variable to month_day
ggplot(aes(x = month_day,
y = air)) +
geom_point() +
labs(x = "Month",
y = "Passengers (in thousands)",
title = "Total passengers per month, 1949 - 1960") +
geom_smooth(method = lm,
se = F) +
geom_line() +
# Set the breaks to 1 month
scale_x_date(labels = scales::date_format("%b"),
breaks = "1 month") +
# Use the year variable to do the faceted plot
facet_wrap(~year) +
# You could rotate the x-axis labels by 90° to get a cleaner plot
theme(axis.text.x = element_text(angle = 90,
vjust = 0.5,
hjust = 1))
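A variant that avoids the dummy year altogether is to put the month number on a continuous x axis and label it with R's built-in month.abb. This is only a sketch of that alternative; `year` and `month_num` are helper columns I introduce here, not part of the original data.

```r
library(ggplot2)

# Same data as in the question (AirPassengers ships with base R)
air <- data.frame(
  month = seq(as.Date("1949-01-01"), as.Date("1960-12-01"), by = "month"),
  air = as.vector(AirPassengers)
)

# Helper columns: calendar year for faceting, month number (1-12) for the x axis
air$year <- as.integer(format(air$month, "%Y"))
air$month_num <- as.integer(format(air$month, "%m"))

p <- ggplot(air, aes(x = month_num, y = air)) +
  geom_point() +
  geom_line() +
  scale_x_continuous(breaks = 1:12, labels = month.abb) +
  labs(x = "Month",
       y = "Passengers (in thousands)",
       title = "Total passengers per month, 1949 - 1960") +
  facet_wrap(~year) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
p
```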
I have a box-plot of certain variables. I see the box plots of a number of days on x-axis (Sunday, Monday, Wednesday, Thursday and Friday). I would like to know how to remove two of the box plots from the graph. To be specific, I don't want there to be a box plot of Monday and Wednesday. The code I used was:
ggplot(contents2019, aes(x = days, y = time)) +
geom_boxplot() +
xlab("Day") +
ylab("Time") +
theme_bw()
You can filter your dataframe beforehand:
library(tidyverse)
contents2019 %>%
filter(!days %in% c("Monday", "Wednesday")) %>%
ggplot(aes(x = days, y = time)) +
geom_boxplot() +
xlab("Day") +
ylab("Time") +
theme_bw()
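If days is stored as character, the remaining boxes are drawn in alphabetical order. A sketch that filters and also fixes the order, using made-up data in place of contents2019 (the values and the `keep` vector are assumptions for illustration):

```r
library(ggplot2)

set.seed(42)
# Made-up stand-in for contents2019
contents2019 <- data.frame(
  days = rep(c("Sunday", "Monday", "Wednesday", "Thursday", "Friday"), each = 20),
  time = rnorm(100, mean = 30, sd = 5)
)

# Days to keep, in the order they should appear on the x axis
keep <- c("Sunday", "Thursday", "Friday")

kept <- subset(contents2019, days %in% keep)
kept$days <- factor(kept$days, levels = keep)

p <- ggplot(kept, aes(x = days, y = time)) +
  geom_boxplot() +
  xlab("Day") +
  ylab("Time") +
  theme_bw()
p
```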