Trouble getting Date field on x-axis using ggplot2 (R)

head(bktst.plotdata)
        date method product type    actuals  forecast   residual Percent_error month
1 2012-12-31  bauwd   CUSTM  NET  194727.51  -8192.00 -202919.51       -104.21 Dec12
2 2013-01-31  bauwd   CUSTM  NET  470416.27   1272.01 -469144.26        -99.73 Jan13
3 2013-02-28  bauwd   CUSTM  NET  190943.57  -1892.45 -192836.02       -100.99 Feb13
4 2013-03-31  bauwd   CUSTM  NET  -42908.91   2560.05   45468.96       -105.97 Mar13
5 2013-04-30  bauwd   CUSTM  NET -102401.68 358807.48  461209.16       -450.39 Apr13
6 2013-05-31  bauwd   CUSTM  NET -134869.73 337325.33  472195.06       -350.11 May13
I have been trying to plot my back-test results using ggplot2. A sample of the dataset is shown above. The dates range from Dec 2012 to Jul 2013, and there are 3 levels in 'method', 5 levels in 'product' and 2 levels in 'type'.
I tried the code below. The trouble is that R does not label the x-axis the way I expect: I get 'Jan, Feb, Mar, Apr, May, Jun, Jul, Aug', whereas I expect the axis to run from Dec to Jul.
month.plot1 <- ggplot(data=bktst.plotdata, aes(x= date, y=Percent_error, colour=method))
facet4 <- facet_grid(product~type,scales="free_y")
title3 <- ggtitle("Percent Error - Month-over-Month")
xaxis2 <- xlab("Date")
yaxis3 <- ylab("Error (%)")
month.plot1+geom_line(stat="identity", size=1, position="identity")+facet4+title3+xaxis2+yaxis3
# Tried changing the code to this still not getting the X-axis right
month.plot1 <- ggplot(data=bktst.plotdata, aes(x= format(date,'%b%y'), y=Percent_error, colour=method))
month.plot1+geom_line(stat="identity", size=1, position="identity")+facet4+title3+xaxis2+yaxis3

Well, it looks like you are plotting the last day of each month, so it actually makes sense that December 31 is plotted very close to January. If you look at the plotted points (with geom_point) you can see that each point sits just to the left of the nearest month label on the axis.
It sounds like you want to plot years and months instead of actual dates. There are a variety of ways you might do this, but one thing you could do is change the day part of each date to the first of the month instead of the last. Here I show how you could do this using some functions from the lubridate package along with paste (I have assumed your variable date is already a Date object).
require(lubridate)
bktst.plotdata$date2 = as.Date(with(bktst.plotdata,
paste(year(date), month(date), "01", sep = "-")))
Then the plot axes start at December. You can change the format of the x axis if you load the scales package.
require(scales)
ggplot(data=bktst.plotdata, aes(x = date2, y=Percent_error, colour=method)) +
facet_grid(product~type,scales="free_y") +
ggtitle("Percent Error - Month-over-Month") +
xlab("Date") + ylab("Error (%)") +
geom_line() +
scale_x_date(labels=date_format(format = "%m-%Y"))
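As a base-R alternative to the lubridate/paste step above, the same first-of-month shift can be sketched with format() alone. This is only an illustration: the three dates are copied from the sample data, and date2 simply mirrors the column name used above.

```r
# Month-end dates like those in bktst.plotdata$date (copied from the sample)
date <- as.Date(c("2012-12-31", "2013-01-31", "2013-02-28"))

# Rewrite the day component as "01" and parse the result back into a Date
date2 <- as.Date(format(date, "%Y-%m-01"))

# Labels in the same "%m-%Y" style used for the x-axis above
labels <- format(date2, "%m-%Y")
print(data.frame(date, date2, labels))
```

Each date2 value now sits exactly on a month boundary, so the December point lines up with a December label rather than crowding the January one.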

Related

How to add total count of values in bar chart y-axis in time-series analysis graph in r

I have created a graph from two variables (Customer.Complaint and Date_by_month) to show the trend in complaints at monthly granularity, but the y-axis shows the text of the complaints. Instead, I want it to show the total number of complaints in each month.
This is the graph I am getting
As you can see, the y-axis labels are unreadable. Instead of the complaint text, I want the axis to show the count of complaints.
Here is my code snippet
library(ggplot2)
library(scales)
### Provide the trend chart for the number of complaints at monthly and daily granularity levels.
# converting date var to date type
comcast$Date <- gsub('-', '/', comcast$Date)
comcast$Date <- as.Date(comcast$Date, '%d/%m/%Y')
# plotting graph for monthly granularity levels
comcast$Date_by_month <- as.Date(cut(comcast$Date, breaks='month'))
ggplot(comcast, aes(Date_by_month, Customer.Complaint)) + stat_summary(fun.y=sum, geom='bar') + scale_x_date(labels=date_format("%Y-%m"), breaks='1 month') + scale_y_continuous(labels = fun.y= length)
There are a few issues here. The code you posted has a syntax error in the last line, scale_y_continuous(labels = fun.y= length), so it doesn't run at all. Whatever code produced your plot, it wasn't the code you posted.
In the line stat_summary(fun.y=sum, geom='bar') you are asking for the sum of a text variable, which doesn't make sense (maybe you meant a count or length?).
And, of course, your problem isn't reproducible because you haven't given us any data to try it out on.
That said, let's recreate a similar data frame:
library(ggplot2)
library(lubridate)
library(scales)
random_words <- function(x) paste0(sample(c(" ", " ", letters), 50, TRUE), collapse = "")
Date_by_month <- as.Date(as.POSIXct("2015-01-01") + months(sample(12, 100, TRUE)))
complaints <- sapply(1:50, random_words)
comcast <- data.frame(Date_by_month = as_date(Date_by_month), Customer.Complaint = complaints)
head(comcast)
#> Date_by_month Customer.Complaint
#> 1 2015-09-30 impzvcx esxfmknrpufewh fxqknamay qhob cvpzlgubpu
#> 2 2015-08-31 mwt aezkcolutpengovtggeqavkxnfr myrq famttzzurj ug
#> 3 2015-04-30 uusewv wjxdpywsssqxgclhmlksrxqnqdfsip u jrdsfbldey
#> 4 2015-08-31 sf jytjtwseahfaqtvzisozuhhtrzygysxndyjifxoaytxhncf
#> 5 2015-06-30 vabtbfijnkeflhgpsspxyasiistuqqqjxuqs bsucp lbdrgbn
#> 6 2015-03-01 eylmurltlfgcp rvfdx as hiehnqdrn lrqanrmf quvzbhgh
Now, we don't actually need to include the complaints themselves in the chart. We can just give the dates to ggplot and it will automatically perform counts if we select geom_bar:
ggplot(comcast, aes(Date_by_month)) +
geom_bar() +
scale_x_date(labels = date_format("%Y-%m"), breaks='1 month')
Created on 2020-02-28 by the reprex package (v0.3.0)
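If you would rather compute the monthly totals yourself before plotting (for example, to inspect them), one possible sketch uses table() on the month-start dates. The dates below are made up for illustration and stand in for comcast$Date; n_complaints is a hypothetical column name.

```r
# Hypothetical complaint dates standing in for comcast$Date
dates <- as.Date(c("2015-04-03", "2015-04-17", "2015-05-02",
                   "2015-06-11", "2015-06-25", "2015-06-30"))

# Snap each date to the first day of its month, as in the question
month_start <- as.Date(cut(dates, breaks = "month"))

# Count rows per month; table() labels each month by its date string
monthly <- as.data.frame(table(Date_by_month = month_start),
                         responseName = "n_complaints",
                         stringsAsFactors = FALSE)

# Convert the labels back to Date so a date scale can be used when plotting
monthly$Date_by_month <- as.Date(monthly$Date_by_month)
print(monthly)
```

A data frame like this could then be drawn with something along the lines of ggplot(monthly, aes(Date_by_month, n_complaints)) + geom_col(), which plots the precomputed counts instead of letting geom_bar() count rows.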

ggplot: Issue when converting axis values from number of days to months in a boxplot

When converting a numeric variable "number of days from 1st of January of 2015" to date, the boxplot only shows part of the range of y-values but not all.
In this example, I plotted "gender" vs "months". Months were obtained by transforming the original "days" variable (i.e. days starting from 2015/1/1). The range of numeric values should extend from the end of March to the beginning of April of the subsequent year, but ggplot() is only plotting values between Aug and Jan and showing only month labels within that range in the y-axis.
Any help to solve this issue is very welcome!
Here is the code and the corresponding plot:
gender <- c(rep("female",144), rep("male",144))
days <- c(274,285,302,330,117,230,271,207,235,249,268,NA,NA,NA,NA,210,255,290,267,252,257,268,288,220,264,270,277,303,222,252,296,323,369,NA,258,NA,240,245,310,271,272,282,314,345,214,211,258,268,145,176,244,273,249,257,277,284,272,273,272,282,290,297,260,266,277,213,247,244,269,349,268,NA,220,235,269,299,266,273,274,307,285,299,300,224,257,284,291,305,278,294,455,280,262,272,276,295,338,264,339,232,277,230,270,312,276,285,308,241,273,340,249,260,270,352,297,217,247,287,320,191,249,265,287,320,432,262,265,324,309,234,441,409,264,381,262,276,316,330,252,264,298,315,287,330,274,287,371,237,259,266,349,247,249,241,333,379,486,198,249,270,275,279,314,182,234,252,289,319,216,262,293,234,272,284,311,258,NA,299,314,290,292,296,300,274,289,359,267,319,NA,492,294,319,293,265,273,315,307,315,287,378,238,239,315,325,361,249,NA,192,224,226,204,208,234,263,283,294,430,267,273,307,327,460,240,307,319,492,300,311,485,348,297,348,317,317,318,338,316,316,336,255,284,316,249,302,307,308,301,265,273,316,281,326,272,283,NA,NA,243,254,271,191,259,324,287,265,310,337,287,326,304,399,337,295,313,228,288,307,270,347,290,245,NA,283,423,223,NA,264,314,283)
mytable <- data.frame(gender,days)
range(mytable$days, na.rm=T) # 117 to 492
mytable$months <- (as.Date(days,origin = "2015/1/1"))
ggplot(mytable, aes(x=gender, y=months,fill=gender)) +
geom_boxplot()
I am not sure about the intuition behind this plot. But, this would give you what you desire:
ggplot(mytable, aes(x=gender, y=months, fill=gender)) +
geom_boxplot() +
scale_y_date(date_labels="%b ", date_breaks ="1 month",
limits = c(as.Date("2015-3-1"), as.Date("2016-2-1")))
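Note that hard-coded limits only keep all the data if they span its full range. A quick check, sketched here with just the two endpoints reported by range(mytable$days, na.rm=T) in the question, converts the extreme day counts to dates so the limits can be taken from the data rather than guessed. The names day_range, date_range and fixed_limits are introduced only for this illustration.

```r
# Endpoints of the 'days' variable as reported in the question (117 to 492)
day_range <- c(117, 492)

# Same conversion as in the question: days counted from 1 January 2015
date_range <- as.Date(day_range, origin = "2015-01-01")
print(date_range)

# The fixed limits used in the scale_y_date() call above
fixed_limits <- as.Date(c("2015-03-01", "2016-02-01"))

# TRUE here means the latest observation falls after the upper limit
print(date_range[2] > fixed_limits[2])
```

With the full data, passing limits = range(mytable$months, na.rm = TRUE) to scale_y_date() is one way to make sure no observation falls outside the axis.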

ggplot: Multiple years on same plot by month

So, I've hit something I don't think I have ever come across. I scoured Google looking for the answer, but have not found anything (yet)...
I have two data sets - one for 2015 and one for 2016. They represent the availability of an IT system. The data frames read as such:
2015 Data Set:
variable value
Jan 2015 100
Feb 2015 99.95
... ...
2016 Data Set:
variable value
Jan 2016 99.99
Feb 2016 99.90
... ...
They just go from Jan to Dec, listing the availability of the system. The "variable" column is of class yearmon (created with as.yearmon) and the value column is a simple numeric.
I want to create a geom_line() chart with ggplot2 that will basically have the percentages as the y-axis and the months as the x-axis. I have been able to do this where there are two lines, but the x-axis runs from Jan 2015 - Dec 2016. What I'd like is to have them only be plotted by month, so they overlap. I have tried some various things with the scales and so forth, but I have yet to figure out how to do this.
Basically, I need the x-axis to read January - December in chronological order, but I want to plot both 2015 and 2016 on the same chart. Here is my ggplot code (non-working) as I have it now:
ggplot(data2015,aes(variable,value)) +
geom_line(aes(color="2015")) +
geom_line(data=data2016,aes(color="2016")) +
scale_x_yearmon() +
theme_classic()
This plots in a continuous stream as I am dealing with a yearmon() data type. I have tried something like this:
ggplot(data2015,aes(months(variable),value)) +
geom_line(aes(color="2015")) +
geom_line(data=data2016,aes(color="2016")) +
theme_classic()
Obviously that won't work. I figure the months() is probably still carrying the year somehow. If I plot them as factors() they are not in order. Any help would be very much appreciated. Thank you in advance!
To get a separate line for each year, you need to extract the year from each date and map it to colour. To get months (without year) on the x-axis, you need to extract the month from each date and map to the x-axis.
library(zoo)
library(lubridate)
library(ggplot2)
Let's create some fake data with the dates in as.yearmon format. I'll create two separate data frames so as to match what you describe in your question:
# Fake data
set.seed(49)
dat1 = data.frame(date = seq(as.Date("2015-01-15"), as.Date("2015-12-15"), "1 month"),
value = cumsum(rnorm(12)))
dat1$date = as.yearmon(dat1$date)
dat2 = data.frame(date = seq(as.Date("2016-01-15"), as.Date("2016-12-15"), "1 month"),
value = cumsum(rnorm(12)))
dat2$date = as.yearmon(dat2$date)
Now for the plot. We'll extract the year and month from date with the year and month functions, respectively, from the lubridate package. We'll also turn the year into a factor, so that ggplot will use a categorical color palette for year, rather than a continuous color gradient:
ggplot(rbind(dat1,dat2), aes(month(date, label=TRUE, abbr=TRUE),
value, group=factor(year(date)), colour=factor(year(date)))) +
geom_line() +
geom_point() +
labs(x="Month", colour="Year") +
theme_classic()
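month  value  year
Jan   100.00  2015
Feb    99.95  2015
Jan    99.99  2016
Feb    99.90  2016
You need one long-form dataset like the one above, with a year column. Then you can plot both lines with ggplot. Map year to both colour and group (as a factor, so it gets a discrete palette and one line per year), and store month as a factor ordered Jan to Dec so the x-axis is chronological:
ggplot(dataset, aes(x = month, y = value, color = factor(year), group = year)) + geom_line()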
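One way to build such a long-form dataset from the two yearly data frames is sketched below. It is only an illustration under a few assumptions: the "variable" column is stored as plain text like "Jan 2015" to keep the example self-contained (a real yearmon column would need converting first, e.g. with format()), and only the two months quoted in the question are included.

```r
# Stand-ins for data2015 / data2016, using the values quoted in the question.
# Assumption: 'variable' is plain text here rather than a yearmon object.
data2015 <- data.frame(variable = c("Jan 2015", "Feb 2015"),
                       value = c(100, 99.95))
data2016 <- data.frame(variable = c("Jan 2016", "Feb 2016"),
                       value = c(99.99, 99.90))

# Stack the two years into one long-form data frame
dataset <- rbind(data2015, data2016)

# Month as a factor with calendar-ordered levels, so the axis runs Jan..Dec
dataset$month <- factor(substr(dataset$variable, 1, 3), levels = month.abb)

# Year as its own column, to be mapped to colour/group
dataset$year <- as.integer(substr(dataset$variable, 5, 8))
print(dataset[, c("month", "value", "year")])
```

Using month.abb for the factor levels is what keeps the months in calendar order; a plain character column would be sorted alphabetically on the axis.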
ggseasonplot from the forecast package can do that for you. Example code with a ts object:
ggseasonplot(a10, year.labels=TRUE, year.labels.left=TRUE) +
ylab("$ million") +
ggtitle("Seasonal plot: antidiabetic drug sales")

Google Trends and Weeks, ggplot2

When I download data from Google Trends, the dataset looks like this:
Week nuclear atomic nuclear.weapons unemployment
2004-01-04 - 2004-01-10 11 11 1 15
2004-01-11 - 2004-01-17 11 13 1 13
2004-01-18 - 2004-01-24 10 11 1 13
How can I change the dates in "Week" from the format "Y-m-d - Y-m-d" to a format like "Year-Week"?
Furthermore, how can I tell ggplot to print only the years on the x-axis instead of every value of x?
@Mattrition: Thank you. I followed your advice:
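library(reshape2)  # assumed source of melt()
# reshape to long format, keeping the Week column as the id
trends <- melt(trends, id = "Week",
measure = c("nuclear", "atomic", "nuclear.weapons", "unemployment"))
# keep only the first date of each "start - end" range, then convert to Date
trends$Week <- gsub("^(\\d+-\\d+-\\d+).+", "\\1", trends$Week)
trends$Week <- as.Date(trends$Week)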
ggplot(trends, aes(Week, value, colour = variable, group=variable)) +
geom_line() +
ylab("Trends") +
theme(legend.position="top", legend.title=element_blank(),
panel.background = element_rect(fill = "#FFFFFF", colour="#000000"))+
scale_color_manual(values=c("#999999", "#E69F00", "#56B4E9", "#009E73"))+
stat_smooth(method="loess")
Now, every second year is labeled (2004, 2006, ...) in x-axis. How can I tell ggplot to label every year (2004, 2005, ...)?
ggplot will understand Date objects (see ?Date) and work out appropriate labelling if you can convert your dates to this format.
You can use something like gsub to extract starting day for each week. This uses regular expressions to match the first argument and return anything inside the set of brackets:
df$startingDay <- gsub("^(\\d+-\\d+-\\d+).+", "\\1", df$Week)
Then call as.Date() on the extracted day strings to convert to Date objects:
df$date <- as.Date(df$startingDay)
You can then use the date objects to plot whatever you wanted to plot:
g <- ggplot(df, aes(date, as.numeric(atomic))) + geom_line()
print(g)
EDIT:
To answer your additional question, add the following to your ggplot object:
library(scales)
g <- g + scale_x_date(breaks=date_breaks(width="1 year"),
labels=date_format("%Y"))
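For the "Year-Week" part of the question, a minimal sketch follows. It reuses the gsub/as.Date steps above on the three sample rows and then formats the dates with "%Y-%U" (week of the year, with Sunday as the first day of the week). The vector names are chosen for illustration only.

```r
# The three "Week" strings from the sample data in the question
week <- c("2004-01-04 - 2004-01-10",
          "2004-01-11 - 2004-01-17",
          "2004-01-18 - 2004-01-24")

# Keep only the first date of each range (the part matched inside the brackets)
startingDay <- gsub("^(\\d+-\\d+-\\d+).+", "\\1", week)

# Convert the extracted strings to Date objects
date <- as.Date(startingDay)

# Year plus week number; %U counts weeks starting on Sunday
year_week <- format(date, "%Y-%U")
print(data.frame(startingDay, year_week))
```

The result of format() is a character vector, so it is useful as a label but the Date column is still the one to give ggplot for the x-axis.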

Formatting dates in ggplot to highlight the start of financial years

I've got data referring to financial years, each starting on 1 April and ending on 31 March of the next calendar year.
df <- data.frame(date = seq(as.POSIXct("2008-04-01"), by="month", length.out=49),
var = rnorm(49))
head(df,3)
date var
1 2008-04-01 0.04265025
2 2008-05-01 -1.59671801
3 2008-06-01 0.4909673
Plotting df with library(ggplot2); ggplot(df) + geom_line(aes(date, var)) I get:
Now, what I'm interested in is having, say, the "2009" label positioned at "2009-04-01", as that is the actual start of FY 2009. I managed to get that with the following code:
ggplot(df) + geom_line(aes(date, var)) +
scale_x_datetime(breaks = df$date[months(df$date)=="April"],
labels = date_format("%Y"))
which correctly gives:
My question is (finally :-) ): does anyone have a better way of showing financial years, and ideally better code than the above?
You could use geom_rect to highlight the financial years. Assuming you save your original plot as p, try:
bgdf <- data.frame(xmin=as.POSIXct(paste0(2008:2011,"-04-01")),
xmax=as.POSIXct(paste0(2009:2012,"-04-01")),
ymin=min(df$var),ymax=max(df$var),alpha=((2008:2011)%%2)*0.1)
p + geom_rect(aes(xmin=xmin,xmax=xmax,ymin=ymin,ymax=ymax),
data=bgdf,alpha=bgdf$alpha,fill="blue")
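If it helps to label or group observations by financial year, a small helper can be sketched in base R. The names financial_year and april_breaks are hypothetical, and Date values are used instead of POSIXct to avoid time-zone effects. The April test compares the month number rather than the month name, so it does not depend on the locale.

```r
# Financial year of a date: April-December belong to the calendar year,
# January-March belong to the previous one (FY starts on 1 April).
financial_year <- function(d) {
  lt <- as.POSIXlt(d)
  (lt$year + 1900) - (lt$mon < 3)   # lt$mon is 0-based: 0..2 = Jan..Mar
}

# Monthly dates matching the span of df in the question (49 months from 2008-04)
dates <- seq(as.Date("2008-04-01"), by = "month", length.out = 49)
fy <- financial_year(dates)

# Locale-independent way to pick the April dates for use as axis breaks
april_breaks <- dates[format(dates, "%m") == "04"]
print(april_breaks)
```

The fy vector could be used as a grouping or fill variable, and april_breaks plays the same role as the breaks expression in the question.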
