ggplot: Multiple years on same plot by month - r

So, I've hit something I don't think I have ever come across. I scoured Google looking for the answer, but have not found anything (yet)...
I have two data sets - one for 2015 and one for 2016. They represent the availability of an IT system. The data frames read as such:
2015 Data Set:
variable value
Jan 2015 100
Feb 2015 99.95
... ...
2016 Data Set:
variable value
Jan 2016 99.99
Feb 2016 99.90
... ...
They just go from Jan - Dec listing the availability of the system. The "variable" column is an as.yearmon data type and the value is a simple numeric.
I want to create a geom_line() chart with ggplot2 that will basically have the percentages as the y-axis and the months as the x-axis. I have been able to do this where there are two lines, but the x-axis runs from Jan 2015 - Dec 2016. What I'd like is to have them only be plotted by month, so they overlap. I have tried some various things with the scales and so forth, but I have yet to figure out how to do this.
Basically, I need the x-axis to read January - December in chronological order, but I want to plot both 2015 and 2016 on the same chart. Here is my ggplot code (non-working) as I have it now:
ggplot(data2015,aes(variable,value)) +
geom_line(aes(color="2015")) +
geom_line(data=data2016,aes(color="2016")) +
scale_x_yearmon() +
theme_classic()
This plots in a continuous stream as I am dealing with a yearmon() data type. I have tried something like this:
ggplot(data2015,aes(months(variable),value)) +
geom_line(aes(color="2015")) +
geom_line(data=data2016,aes(color="2016")) +
theme_classic()
Obviously that won't work. I figure the months() is probably still carrying the year somehow. If I plot them as factors() they are not in order. Any help would be very much appreciated. Thank you in advance!

To get a separate line for each year, you need to extract the year from each date and map it to colour. To get months (without year) on the x-axis, you need to extract the month from each date and map to the x-axis.
library(zoo)
library(lubridate)
library(ggplot2)
Let's create some fake data with the dates in as.yearmon format. I'll create two separate data frames so as to match what you describe in your question:
# Fake data
set.seed(49)
dat1 = data.frame(date = seq(as.Date("2015-01-15"), as.Date("2015-12-15"), "1 month"),
value = cumsum(rnorm(12)))
dat1$date = as.yearmon(dat1$date)
dat2 = data.frame(date = seq(as.Date("2016-01-15"), as.Date("2016-12-15"), "1 month"),
value = cumsum(rnorm(12)))
dat2$date = as.yearmon(dat2$date)
Now for the plot. We'll extract the year and month from date with the year and month functions, respectively, from the lubridate package. We'll also turn the year into a factor, so that ggplot will use a categorical color palette for year, rather than a continuous color gradient:
ggplot(rbind(dat1,dat2), aes(month(date, label=TRUE, abbr=TRUE),
value, group=factor(year(date)), colour=factor(year(date)))) +
geom_line() +
geom_point() +
labs(x="Month", colour="Year") +
theme_classic()

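month value year
Jan 100 2015
Feb 99.95 2015
Jan 99.99 2016
Feb 99.90 2016
You need one long-form dataset that has a year column, like the one above. Then you can plot both lines with ggplot. Map year as a factor so that each year gets its own line and a discrete colour, and make month a factor ordered Jan to Dec so the x-axis stays in calendar order:
ggplot(dataset, aes(x = month, y = value, color = factor(year), group = year)) + geom_line()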
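One way to build that long-form data set from the two yearmon data frames in the question is sketched below. The names data2015, data2016, variable and value come from the question, and the values are just the two sample rows shown there. The year and month column names are my own choice, so treat this as a sketch rather than the only way to do it.

```r
library(zoo)
library(ggplot2)

# Stand-ins for the question's data frames (only the two sample rows shown there)
data2015 <- data.frame(variable = as.yearmon(c("2015-01", "2015-02")),
                       value = c(100, 99.95))
data2016 <- data.frame(variable = as.yearmon(c("2016-01", "2016-02")),
                       value = c(99.99, 99.90))

dataset <- rbind(data2015, data2016)

# "%Y" and "%m" are numeric format codes, so they do not depend on the locale
dataset$year  <- factor(format(dataset$variable, "%Y"))
dataset$month <- factor(month.abb[as.integer(format(dataset$variable, "%m"))],
                        levels = month.abb)  # keeps Jan..Dec in calendar order

p <- ggplot(dataset, aes(x = month, y = value, colour = year, group = year)) +
  geom_line() +
  geom_point()
print(p)
```

Fixing the factor levels to month.abb is what stops the months from being sorted alphabetically.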

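ggseasonplot from the forecast package can do that for you. Example code with a ts object (a10 is a sample data set that ships with the fpp2 package):
library(forecast)
library(fpp2) # provides the a10 data set
ggseasonplot(a10, year.labels=TRUE, year.labels.left=TRUE) +
ylab("$ million") +
ggtitle("Seasonal plot: antidiabetic drug sales")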
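Applied to the availability data from the question, the same idea might look like the sketch below. The numbers are randomly generated placeholders and the object name availability is my own; the only real requirement is a monthly ts object with frequency = 12.

```r
library(forecast)
library(ggplot2)

# Hypothetical availability figures: 12 months of 2015 followed by 12 months of 2016
set.seed(1)
availability <- ts(round(runif(24, 99.5, 100), 2),
                   start = c(2015, 1), frequency = 12)

# One line per year, months on the x-axis
p <- ggseasonplot(availability, year.labels = TRUE) +
  ylab("Availability (%)") +
  ggtitle("System availability by month")
print(p)
```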

Related

geom bar compare years per month

I have two data sets, one for 2020 and one for 2019. Each is divided into 5 groups, and each month has its own value.
I want to create a graph that compares, for each group, each month's figure in 2020 with the figure in 2019.
The 2020 data looked like this:
[image of the 2020 table]
The 2019 data had the same layout.
I combined the two data sets into this:
[image of the combined table]
The problem is that all the graphs I looked at on the internet have either one column of values or no division into months.
How can I create one graph that compares each month between 2019 and 2020?
library(tidyverse)
library(ggplot2)
# bring table in long format
longerTable <- tibble(month = 1:12, value_2020 = rnorm(12), value_2019=rnorm(12)) %>%
pivot_longer(cols=starts_with("value"), names_to="year", values_to="value")
# plot with ggplot.
ggplot(longerTable, aes(x=month, y=value, fill=year)) +
# stat = identity -> plot numbers as they are
# position = dodge -> show bars next to each other
geom_bar(stat="identity", position = "dodge")
Created on 2020-10-01 by the reprex package (v0.3.0)
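The question also mentions five groups, which the example above leaves out. A possible extension, assuming the combined table has (hypothetical) columns month, group, year and value, is to keep the dodged bars and add one facet per group:

```r
library(ggplot2)

set.seed(42)
# Hypothetical stand-in for the combined table: 12 months x 5 groups x 2 years
combined <- expand.grid(month = 1:12,
                        group = paste("Group", 1:5),
                        year = c(2019, 2020))
combined$value <- runif(nrow(combined), 0, 100)

# Month names in calendar order, and year as a discrete variable for the fill
combined$month <- factor(month.abb[combined$month], levels = month.abb)
combined$year  <- factor(combined$year)

p <- ggplot(combined, aes(x = month, y = value, fill = year)) +
  geom_col(position = "dodge") +   # geom_col() is geom_bar(stat = "identity")
  facet_wrap(~ group, ncol = 1)    # one panel per group
print(p)
```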

ggplot: Issue when converting axis values from number of days to months in a boxplot

When converting a numeric variable "number of days from 1st of January of 2015" to date, the boxplot only shows part of the range of y-values but not all.
In this example, I plotted "gender" vs "months". Months were obtained by transforming the original "days" variable (i.e. days starting from 2015/1/1). With days running from 117 to 492, the dates should extend from late April 2015 to early May of the following year, but ggplot() is only plotting values between Aug and Jan and showing only month labels within that range in the y-axis.
Any help to solve this issue is very welcome!
Here is the code and the corresponding plot:
gender <- c(rep("female",144), rep("male",144))
days <- c(274,285,302,330,117,230,271,207,235,249,268,NA,NA,NA,NA,210,255,290,267,252,257,268,288,220,264,270,277,303,222,252,296,323,369,NA,258,NA,240,245,310,271,272,282,314,345,214,211,258,268,145,176,244,273,249,257,277,284,272,273,272,282,290,297,260,266,277,213,247,244,269,349,268,NA,220,235,269,299,266,273,274,307,285,299,300,224,257,284,291,305,278,294,455,280,262,272,276,295,338,264,339,232,277,230,270,312,276,285,308,241,273,340,249,260,270,352,297,217,247,287,320,191,249,265,287,320,432,262,265,324,309,234,441,409,264,381,262,276,316,330,252,264,298,315,287,330,274,287,371,237,259,266,349,247,249,241,333,379,486,198,249,270,275,279,314,182,234,252,289,319,216,262,293,234,272,284,311,258,NA,299,314,290,292,296,300,274,289,359,267,319,NA,492,294,319,293,265,273,315,307,315,287,378,238,239,315,325,361,249,NA,192,224,226,204,208,234,263,283,294,430,267,273,307,327,460,240,307,319,492,300,311,485,348,297,348,317,317,318,338,316,316,336,255,284,316,249,302,307,308,301,265,273,316,281,326,272,283,NA,NA,243,254,271,191,259,324,287,265,310,337,287,326,304,399,337,295,313,228,288,307,270,347,290,245,NA,283,423,223,NA,264,314,283)
mytable <- data.frame(gender,days)
range(mytable$days, na.rm=T) # 117 to 492
mytable$months <- (as.Date(days,origin = "2015/1/1"))
ggplot(mytable, aes(x=gender, y=months,fill=gender)) +
geom_boxplot()
I am not sure about the intuition behind this plot. But, this would give you what you desire:
ggplot(mytable, aes(x=gender, y=months, fill=gender)) +
geom_boxplot() +
scale_y_date(date_labels = "%b", date_breaks = "1 month",
limits = c(as.Date("2015-04-01"), as.Date("2016-06-01"))) # wide enough for days 117 to 492, so no rows are dropped
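To check which limits the data actually needs, the smallest and largest day offsets can be converted to dates directly. The sketch below uses a small made-up sample in place of the full mytable and leaves the limits to ggplot, only asking for monthly breaks.

```r
library(ggplot2)

origin <- as.Date("2015-01-01")

# Smallest and largest day offsets reported by range() in the question
date_range <- origin + c(117, 492)
print(date_range)

# Small hypothetical sample standing in for mytable
mytable <- data.frame(gender = rep(c("female", "male"), each = 4),
                      days = c(117, 230, 271, 455, 191, 249, 320, 492))
mytable$months <- origin + mytable$days

# No explicit limits, so no observations are dropped before the boxplot is computed
p <- ggplot(mytable, aes(x = gender, y = months, fill = gender)) +
  geom_boxplot() +
  scale_y_date(date_breaks = "1 month", date_labels = "%b %Y")
print(p)
```

The printed range is 2015-04-28 to 2016-05-07. Any limits passed to scale_y_date() need to cover that span, otherwise points outside them are removed before the boxplot statistics are calculated.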

How to use conditional arguments while plotting in R?

I have a data frame with two columns, date and sales. The date column runs from 2012-10-22 to 2016-09-22. I want to plot the sales in January 2013 by day without creating any subset.
I have used this:
ggplot(subsales,aes(Date,spdby))+geom_line()
Is it possible by using ggplot()?
I have plotted the sales per day, and I want to zoom in on January 2013 and extract that part as a new plot.
Yes, in ggplot:
library(ggplot2)
subsales <- data.frame(
date = seq(as.Date("2013-1-1"), as.Date("2017-1-1"), by = "day"),
spdby = runif(1462, 2000, 6000)
)
ggplot(subsales, aes(date, spdby)) +
geom_line()
ggplot(subsales, aes(date, spdby)) + geom_line() +
scale_x_date(limits = c(as.Date("2013-1-1"), as.Date("2013-1-31")))
#> Warning: Removed 1431 rows containing missing values (geom_path).
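The warning appears because limits set on the scale discard the rows outside them. If the goal is only to zoom in, a possible alternative is coord_cartesian(), which changes the visible window and keeps all rows. This is a sketch on the same kind of fake data; the jan2013 name is my own.

```r
library(ggplot2)

set.seed(1)
dates <- seq(as.Date("2013-01-01"), as.Date("2017-01-01"), by = "day")
subsales <- data.frame(date = dates, spdby = runif(length(dates), 2000, 6000))

jan2013 <- as.Date(c("2013-01-01", "2013-01-31"))

# coord_cartesian() zooms the view without discarding rows outside the limits
p <- ggplot(subsales, aes(date, spdby)) +
  geom_line() +
  coord_cartesian(xlim = jan2013)
print(p)
```

For a plain line chart the picture is nearly the same either way. The difference matters more when a statistic such as a smoother or a histogram is computed from the data, because scale limits remove rows before the statistic is calculated.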

Trying to manually set the x-axis range for a date variable

I want to have the x-axis run from March until November although the dates only run between April and August.
Generate random date data and convert to data frame
date.data <- sample(seq(as.Date("2016-04-01"),as.Date("2016-08-31"),by = "day"),50)
date.df <- as.data.frame(date.data)
generate ggplot histogram with default values for x-axis
ggplot(date.df,aes(date.data)) + geom_histogram()
Attempt to override the default values by writing a datebreaks vector and using a scale command. The resulting histogram still only spans April through August.
datebreaks <- seq(as.Date("2016-03-01"),as.Date("2016-11-15"),by = "month")
ggplot(date.df,aes(date.data)) + geom_histogram() +
scale_x_date(breaks = datebreaks,labels=date_format("%b"))
How can I have the x-axis run from March until November?
aosmith made the useful suggestion to use xlim(). That works, except the histogram has only alternate months labeled on the axis. If I use scale_x_date to have each month labeled, that scale overrides the xlim() command.
mar <- as.Date("2016-03-01")
nov <- as.Date("2016-11-01")
ggplot(date.df,aes(date.data)) + geom_histogram() +
xlim(c(mar,nov))
Just what I am looking for except only alternate months appear as labels.
The addition of the scale command gets all months labeled but I lose the March to November span of the x-axis.
mar <- as.Date("2016-03-01")
nov <- as.Date("2016-11-01")
ggplot(date.df,aes(date.data)) + geom_histogram() +
xlim(c(mar,nov)) + scale_x_date(date_breaks = "1 month", date_labels = "%b")
Anyone know a way to use xlim() and set the label formats (each month appearing as a label) at the same time?
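One approach that should do both at once is to drop xlim() and pass the limits to scale_x_date() itself, since xlim() is a shortcut for adding an x scale and a plot keeps only one x scale. A minimal sketch, reusing the names from the question (the bins = 30 argument is my addition to silence the default bin message):

```r
library(ggplot2)

set.seed(1)
date.data <- sample(seq(as.Date("2016-04-01"), as.Date("2016-08-31"), by = "day"), 50)
date.df <- data.frame(date.data)

mar <- as.Date("2016-03-01")
nov <- as.Date("2016-11-01")

# Limits, breaks and labels all live in the same scale, so nothing gets overridden
p <- ggplot(date.df, aes(date.data)) +
  geom_histogram(bins = 30) +
  scale_x_date(limits = c(mar, nov), date_breaks = "1 month", date_labels = "%b")
print(p)
```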

trouble getting Date field on X axis using ggplot2

head(bktst.plotdata)
date method product type actuals forecast residual Percent_error month
1 2012-12-31 bauwd CUSTM NET 194727.51 -8192.00 -202919.51 -104.21 Dec12
2 2013-01-31 bauwd CUSTM NET 470416.27 1272.01 -469144.26 -99.73 Jan13
3 2013-02-28 bauwd CUSTM NET 190943.57 -1892.45 -192836.02 -100.99 Feb13
4 2013-03-31 bauwd CUSTM NET -42908.91 2560.05 45468.96 -105.97 Mar13
5 2013-04-30 bauwd CUSTM NET -102401.68 358807.48 461209.16 -450.39 Apr13
6 2013-05-31 bauwd CUSTM NET -134869.73 337325.33 472195.06 -350.11 May13
I have been trying to plot my back-test results using ggplot2. A sample of the dataset is shown above. The dates range from Dec 2012 to Jul 2013, and there are 3 levels in 'method', 5 levels in 'product' and 2 levels in 'type'.
I tried the code below. The trouble is that R does not draw the x-axis the way I expect: the axis is labelled 'Jan, Feb, Mar, Apr, May, Jun, Jul, Aug', whereas I expect it to run from Dec to Jul.
month.plot1 <- ggplot(data=bktst.plotdata, aes(x= date, y=Percent_error, colour=method))
facet4 <- facet_grid(product~type,scales="free_y")
title3 <- ggtitle("Percent Error - Month-over-Month")
xaxis2 <- xlab("Date")
yaxis3 <- ylab("Error (%)")
month.plot1+geom_line(stat="identity", size=1, position="identity")+facet4+title3+xaxis2+yaxis3
# Tried changing the code to this still not getting the X-axis right
month.plot1 <- ggplot(data=bktst.plotdata, aes(x= format(date,'%b%y'), y=Percent_error, colour=method))
month.plot1+geom_line(stat="identity", size=1, position="identity")+facet4+title3+xaxis2+yaxis3
Well, it looks like you are plotting the last day of each month, so it actually makes sense to me that December 31 is plotted very very close to January. If you look at the plotted points (with geom_point) you can see that each point is just to the left of the closest month axis.
It sounds like you want to plot years and months instead of actual dates. There are a variety of ways you might do this, but one thing you could do is change the day part of the date to the first of the month instead of the last of the month. Here I show how you could do this using some functions from package lubridate along with paste (I have assumed your variable date is already a Date object).
require(lubridate)
bktst.plotdata$date2 = as.Date(with(bktst.plotdata,
paste(year(date), month(date), "01", sep = "-")))
Then the plot axes start at December. You can change the format of the x axis if you load the scales package.
require(scales)
ggplot(data=bktst.plotdata, aes(x = date2, y=Percent_error, colour=method)) +
facet_grid(product~type,scales="free_y") +
ggtitle("Percent Error - Month-over-Month") +
xlab("Date") + ylab("Error (%)") +
geom_line() +
scale_x_date(labels=date_format(format = "%m-%Y"))
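As a shorter alternative to the paste() step, lubridate's floor_date() can move each date to the first of its month. The sketch below uses three made-up month-end dates like the ones in bktst.plotdata$date.

```r
library(lubridate)

# Hypothetical month-end dates like those in bktst.plotdata$date
date <- as.Date(c("2012-12-31", "2013-01-31", "2013-02-28"))

# floor_date() rounds each date down to the first day of its month
date2 <- floor_date(date, unit = "month")
print(date2)
```

This prints 2012-12-01, 2013-01-01 and 2013-02-01, so December lands on its own tick instead of sitting right next to January.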
