I have data referring to financial years, which start on 1 April each year and end on 31 March of the following calendar year.
df <- data.frame(date = seq(as.POSIXct("2008-04-01"), by="month", length.out=49),
var = rnorm(49))
head(df,3)
date var
1 2008-04-01 0.04265025
2 2008-05-01 -1.59671801
3 2008-06-01 0.4909673
Plotting df with library(ggplot2); ggplot(df) + geom_line(aes(date, var)) gives the default time axis.
What I want is to have, say, the "2009" label positioned at "2009-04-01", since that is the actual start of FY 2009. I managed to get that with the following code:
library(ggplot2)
library(scales)  # for date_format()
ggplot(df) + geom_line(aes(date, var)) +
  scale_x_datetime(breaks = df$date[format(df$date, "%m") == "04"],  # every 1 April, locale-independent
                   labels = date_format("%Y"))
which correctly places each year label at 1 April.
My question (finally :-)) is: does anyone have a better way of showing financial years, and perhaps better code than the above?
You could use geom_rect to highlight the financial years. Assuming you save your original plot as p, try:
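p <- ggplot(df) + geom_line(aes(date, var))  # the original plot from the question
# one rectangle per financial year, from 1 April to the next 1 April
bgdf <- data.frame(xmin = as.POSIXct(paste0(2008:2011, "-04-01")),
                   xmax = as.POSIXct(paste0(2009:2012, "-04-01")),
                   ymin = min(df$var), ymax = max(df$var),
                   alpha = ((2008:2011) %% 2) * 0.1)  # shade every other year
p + geom_rect(aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax),
              data = bgdf, alpha = bgdf$alpha, fill = "blue",
              inherit.aes = FALSE)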
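A related option, not part of the original answer, is to compute the financial year of each date explicitly. The helper fiscal_year() below is hypothetical (written for this sketch, not taken from any package) and assumes that years start on 1 April, as described in the question; it returns the calendar year in which each financial year begins.

```r
# Hypothetical helper: financial year starting on 1 April.
# Dates from January to March belong to the FY that began the previous April.
fiscal_year <- function(x) {
  yr <- as.integer(format(x, "%Y"))
  mo <- as.integer(format(x, "%m"))
  ifelse(mo >= 4, yr, yr - 1L)
}

# Same monthly sequence as in the question, but as Date instead of POSIXct
dates <- seq(as.Date("2008-04-01"), by = "month", length.out = 49)
fy <- fiscal_year(dates)
print(table(fy))

# 1 April of each financial year: candidate break positions for the x-axis
fy_starts <- as.Date(paste0(unique(fy), "-04-01"))
print(fy_starts)
```

Because format() also works on POSIXct, fiscal_year(df$date) should give the same grouping for the question's data frame. With Date x values, fy_starts could be passed as breaks to scale_x_date(), and fy %% 2 could drive alternating shading like the geom_rect answer.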
Related
I am struggling (due to lack of knowledge and experience) to create a plot in R with time series from three different years (2009, 2013 and 2017). Failing to solve this problem by searching online has led me here.
I wish to create a plot that shows change in nitrate concentrations over the course of May to October for all years, but keep failing since the x-axis is defined by one specific year. I also receive errors because the x-axis lengths differ (due to different number of samples). To solve this I have tried making separate columns for month and year, with no success.
Data example:
date NO3.mg.l year month
2009-04-22 1.057495 2009 4
2013-05-08 1.936000 2013 5
2017-05-02 2.608000 2017 5
Code:
ggplot(nitrat.all, aes(x = date, y = NO3.mg.l, colour = year)) + geom_line()
This code produces a plot where the lines are positioned next to one another, whilst I want a plot where they overlay one another. Any help will be much appreciated.
Nitrate plot
This may help with the plotting:
library("lubridate")
library("ggplot2")
# example data with a few points for each year
nitrat.all <- data.frame(date = c(ymd("2009-03-21"), ymd("2009-04-22"), ymd("2009-05-27"),
                                  ymd("2010-03-15"), ymd("2010-04-17"), ymd("2010-05-10")),
                         NO3.mg.l = c(1.057495, 1.936000, 2.608000,
                                      3.157495, 2.336000, 3.908000))
nitrat.all$year <- format(nitrat.all$date, format = "%Y")
# month-day strings put every year on the same (discrete) x-axis
ggplot(data = nitrat.all) +
  geom_point(mapping = aes(x = format(date, format = "%m-%d"), y = NO3.mg.l, group = year, colour = year)) +
  geom_line(mapping = aes(x = format(date, format = "%m-%d"), y = NO3.mg.l, group = year, colour = year))
To select the dates that fall in certain months, you can subset your data frame by a condition using base R functions:
n_month1 <- 3 # index of the first month of the period to select
n_month2 <- 4 # index of the last month of the period to select
test_for_month <- (as.numeric(format(nitrat.all$date, format = "%m")) >= n_month1) &
(as.numeric(format(nitrat.all$date, format = "%m")) <= n_month2)
nitrat_to_plot <- nitrat.all[test_for_month, ]
Another fairly elegant approach is to use filter() from the dplyr package:
nitrat.all$month <- as.numeric(format(nitrat.all$date, format = "%m"))
library("dplyr")
nitrat_to_plot <- filter(nitrat.all, ((month >= n_month1) & (month <= n_month2)))
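Going back to the original goal of overlaying May to October for several years, one possible sketch (not from the thread) keeps a numeric x-axis by using the day of the year instead of the full date. It uses only the three sample rows shown in the question, so just two remain after filtering. In leap years every day after February shifts by one, which is usually negligible at this scale.

```r
# The three sample rows shown in the question
nitrat.all <- data.frame(
  date     = as.Date(c("2009-04-22", "2013-05-08", "2017-05-02")),
  NO3.mg.l = c(1.057495, 1.936000, 2.608000)
)
nitrat.all$year  <- format(nitrat.all$date, "%Y")
nitrat.all$month <- as.integer(format(nitrat.all$date, "%m"))

# Keep May (5) to October (10); %in% replaces the two >= / <= comparisons
season <- nitrat.all[nitrat.all$month %in% 5:10, ]

# Day of the year gives every year the same numeric x-axis, so the lines overlay
season$doy <- as.integer(format(season$date, "%j"))
print(season)
```

The filtered data could then be plotted with something like ggplot(season, aes(doy, NO3.mg.l, colour = year)) + geom_line(), which overlays the years instead of placing them side by side.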
I asked the same question yesterday but didn't get any suggestions, so I deleted the old one and am asking again with additional information.
So here again:
I have a dataframe like this:
Link to the original dataframe: https://megastore.uni-augsburg.de/get/JVu_V51GvQ/
Date DENI011
1 1993-01-01 9.946
2 1993-01-02 13.663
3 1993-01-03 6.502
4 1993-01-04 6.031
5 1993-01-05 15.241
6 1993-01-06 6.561
....
....
6569 2010-12-26 44.113
6570 2010-12-27 34.764
6571 2010-12-28 51.659
6572 2010-12-29 28.259
6573 2010-12-30 19.512
6574 2010-12-31 30.231
I want to create a plot that lets me compare the monthly values of DENI011 across the years. So I want something like this:
http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html#Seasonal%20Plot
Jan-Dec on the x-scale, values on the y-scale and the years displayed by different colored lines.
I found several similar questions here, but nothing worked for me. I tried to follow the instructions for the example on that website, but the problem is that I can't create a ts object.
Then I tried it this way:
Ref_Data$MonthN <- as.numeric(format(as.Date(Ref_Data$Date),"%m")) # Month's number
Ref_Data$YearN <- as.numeric(format(as.Date(Ref_Data$Date),"%Y"))
Ref_Data$Month <- months(as.Date(Ref_Data$Date), abbreviate=TRUE) # Month's abbr.
g <- ggplot(data = Ref_Data, aes(x = MonthN, y = DENI011, group = YearN, colour=YearN)) +
geom_line() +
scale_x_discrete(breaks = Ref_Data$MonthN, labels = Ref_Data$Month)
That didn't work either; the plot looks horrible. I don't need all the years from 1993 to 2010 in one plot; a few years, maybe 1998 to 2006, would be enough.
Any suggestions on how to solve this?
As others have noted, in order to create a plot such as the one you used as an example, you'll have to aggregate your data first. However, it's also possible to retain daily data in a similar plot.
reprex::reprex_info()
#> Created by the reprex package v0.1.1.9000 on 2018-02-11
library(tidyverse)
library(lubridate)
# Import the data
url <- "https://megastore.uni-augsburg.de/get/JVu_V51GvQ/"
raw <- read.table(url, stringsAsFactors = FALSE)
# Parse the dates, and use lower case names
df <- as_tibble(raw) %>%
rename_all(tolower) %>%
mutate(date = ymd(date))
One trick to achieve this would be to set the year component in your date variable to a constant, effectively collapsing the dates to a single year, and then controlling the axis labelling so that you don't include the constant year in the plot.
# Define the plot
p <- df %>%
mutate(
year = factor(year(date)), # use year to define separate curves
date = update(date, year = 1) # use a constant year for the x-axis
) %>%
ggplot(aes(date, deni011, color = year)) +
scale_x_date(date_breaks = "1 month", date_labels = "%b")
# Raw daily data
p + geom_line()
In this case, though, your daily data are quite variable, so the result is a bit of a mess. You could focus on a single year to see the daily variation better.
# Hone in on a single year
p + geom_line(aes(group = year), color = "black", alpha = 0.1) +
  geom_line(data = function(x) filter(x, year == 2010), linewidth = 1)  # use size = 1 before ggplot2 3.4.0
But ultimately, if you want to look at several years at a time, it's probably a good idea to present smoothed lines rather than raw daily values. Or, indeed, some monthly aggregate.
# Smoothed version
p + geom_smooth(se = FALSE)
#> `geom_smooth()` using method = 'loess'
#> Warning: Removed 117 rows containing non-finite values (stat_smooth).
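The same constant-year trick can be sketched in base R without lubridate. In this version the hypothetical helper collapse_year() rewrites every date into one constant year; 2000 is an arbitrary choice, picked here because it is a leap year, so 29 February dates remain valid rather than becoming NA.

```r
# Hypothetical helper: move every date into one constant year so that
# different years overlay on a shared x-axis.
# Assumption: 2000 is arbitrary, chosen because it is a leap year.
collapse_year <- function(dates, constant_year = 2000) {
  as.Date(paste0(constant_year, format(dates, "-%m-%d")), format = "%Y-%m-%d")
}

dates <- as.Date(c("1998-01-15", "2004-02-29", "2006-12-31"))
collapsed <- collapse_year(dates)
print(collapsed)            # all dates now fall in 2000
print(format(dates, "%Y"))  # the original years, for the colour grouping
```

The collapsed dates could go on the x-axis with scale_x_date(date_breaks = "1 month", date_labels = "%b"), while format(dates, "%Y") supplies the colour grouping.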
There are multiple values for each month, so when you plot the original data you get many points per month, which is why the line looks strange.
If you want to create something similar to the example you provided, you have to summarize your data by year and month. Below I calculate the mean for each year and month. In addition, you need to convert year and month to factors if you want to plot them as discrete variables.
library(dplyr)
Ref_Data2 <- Ref_Data %>%
group_by(MonthN, YearN, Month) %>%
summarize(DENI011 = mean(DENI011)) %>%
ungroup() %>%
# Convert the Month column to factor variable with levels from Jan to Dec
# Convert the YearN column to factor
mutate(Month = factor(Month, levels = unique(Month)),
YearN = as.factor(YearN))
g <- ggplot(data = Ref_Data2,
aes(x = Month, y = DENI011, group = YearN, colour = YearN)) +
geom_line()
g
If you don't want to load dplyr, here is the base R equivalent. It uses exactly the same strategy and gives the same results as www's answer.
library(ggplot2)
dat <- read.delim("~/Downloads/df1.dat", sep = " ")
dat$Date <- as.Date(dat$Date)
# index month.abb by month number; months() returns locale-dependent names
dat$month <- factor(month.abb[as.integer(format(dat$Date, "%m"))], levels = month.abb)
dat$year <- format(dat$Date, "%Y")
month_summary <- aggregate(DENI011 ~ month + year, data = dat, mean)
ggplot(month_summary, aes(month, DENI011, color = year, group = year)) +
  geom_path()
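Since the question mentions needing only a few years (for example 1998 to 2006), the base R approach can subset before aggregating. The sketch below uses a synthetic daily series as a stand-in for DENI011 (the values are made up, not the real data) so that it runs without the downloaded file.

```r
# Synthetic daily series standing in for DENI011 (illustrative values only)
dat <- data.frame(Date = seq(as.Date("1997-01-01"), as.Date("2007-12-31"), by = "day"))
dat$DENI011 <- seq_len(nrow(dat)) %% 50

dat$year  <- format(dat$Date, "%Y")
dat$month <- factor(month.abb[as.integer(format(dat$Date, "%m"))], levels = month.abb)

# Keep only the years of interest before aggregating to monthly means
keep <- dat[dat$year %in% as.character(1998:2006), ]
month_summary <- aggregate(DENI011 ~ month + year, data = keep, mean)
str(month_summary)
```

month_summary can then be passed to the same ggplot() call as above; nine years with one line each is far easier to read than all eighteen.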
I have the data below, which I am trying to plot on one chart so I can compare 2013 with 2014, with colour set by year.
I would like the output to look something like this:
My example CSV data look like this:
Date Data
1/01/2013 10
1/02/2013 20
1/03/2013 30
1/04/2013 20
1/01/2014 40
1/02/2014 70
1/03/2014 80
1/04/2014 90
I have the code below, but it doesn't extract the year from the Date column. I only know how to give each date a different colour, which is not really what I want.
p <- ggplot(d, aes(x=as.Date(Date, "%d/%m/%Y"), y=Data,
group=Date, color=Date)) +
geom_bar(stat="identity") +
scale_color_discrete(name="Year") +
labs(x="",y="Test Data") +
geom_smooth(aes(group=1))
p
Any help would be much appreciated.
Add an extra column Year to your data frame. Here is a simple example:
# create example data set
library("zoo")
library("strucchange")
d <- data.frame(Date=index(SP2001)+90, Data=SP2001$AAPL)
# add year column to data frame
d$Year <- format(d$Date, "%Y")
library("ggplot2")
p <- ggplot(d, aes(x=as.Date(Date, "%d/%m/%Y"), y=Data,
group=Year)) +
geom_bar(aes(fill=Year), stat="identity") +
labs(x="", y="Test Data") +
geom_smooth(aes(colour=Year))
p
Given a Date object, you can extract the year as follows:
format(date_series,'%Y')
%Y gives four digits and %y just the last two.
You can add more elements to the format string; for example, %Y%m gives values such as 201401 and 201402. I use this one frequently.
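Applied to the sample CSV dates from the question, which are written day/month/year, the format codes work as sketched below. The column names Yr and YearM are only illustrative.

```r
# Four of the sample rows from the question, dates written day/month/year
d <- data.frame(Date = c("1/01/2013", "1/02/2013", "1/01/2014", "1/02/2014"),
                Data = c(10, 20, 40, 70))
d$Date <- as.Date(d$Date, format = "%d/%m/%Y")

d$Year  <- format(d$Date, "%Y")    # four-digit year
d$Yr    <- format(d$Date, "%y")    # two-digit year
d$YearM <- format(d$Date, "%Y%m")  # year and month, e.g. "201301"
print(d)
```

With the Year column in place, mapping colour = Year (or fill = Year) in aes() groups the series by year rather than by individual date.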
When I download data from Google Trends, the dataset looks like this:
Week nuclear atomic nuclear.weapons unemployment
2004-01-04 - 2004-01-10 11 11 1 15
2004-01-11 - 2004-01-17 11 13 1 13
2004-01-18 - 2004-01-24 10 11 1 13
How can I change the dates in "Week" from this format "Y-m-d - Y-m-d" to a format like "Year-Week"?
Furthermore, how can I tell ggplot to print only the years on the x-axis instead of every x value?
@Mattrition: Thank you. I followed your advice:
library(reshape2)
library(ggplot2)
trends <- melt(trends, id.vars = "Week",
               measure.vars = c("nuclear", "atomic", "nuclear.weapons", "unemployment"))
# keep only the start date of each "Y-m-d - Y-m-d" range
trends$Week <- gsub("^(\\d+-\\d+-\\d+).+", "\\1", trends$Week)
trends$Week <- as.Date(trends$Week)
ggplot(trends, aes(Week, value, colour = variable, group = variable)) +
  geom_line() +
  ylab("Trends") +
  theme(legend.position = "top", legend.title = element_blank(),
        panel.background = element_rect(fill = "#FFFFFF", colour = "#000000")) +
  scale_color_manual(values = c("#999999", "#E69F00", "#56B4E9", "#009E73")) +
  stat_smooth(method = "loess")
Now every second year (2004, 2006, ...) is labeled on the x-axis. How can I tell ggplot to label every year (2004, 2005, ...)?
ggplot will understand Date objects (see ?Date) and work out appropriate labelling if you can convert your dates to this format.
You can use something like gsub to extract the starting day of each week. The regular expression in the first argument matches the whole string and captures the leading date in the set of brackets; the replacement "\\1" then returns only that captured part:
df$startingDay <- gsub("^(\\d+-\\d+-\\d+).+", "\\1", df$Week)
Then call as.Date() on the extracted day strings to convert to Date objects:
df$date <- as.Date(df$startingDay)
You can then use the date objects to plot whatever you wanted to plot:
g <- ggplot(df, aes(date, as.numeric(atomic))) + geom_line()
print(g)
EDIT:
To answer your additional question, add the following to your ggplot object:
library(scales)
g <- g + scale_x_date(breaks=date_breaks(width="1 year"),
labels=date_format("%Y"))
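The first part of the question, turning "Y-m-d - Y-m-d" into a Year-Week label, is not covered above. One possible sketch, assuming weeks that start on Sunday (as the Google Trends ranges shown here do), extracts the start date as before and formats it with %U. sub() is enough because there is only one match per string; %V would give ISO 8601 week numbers instead, which can differ around the turn of the year.

```r
# Week ranges as they appear in the Google Trends export
week <- c("2004-01-04 - 2004-01-10", "2004-01-11 - 2004-01-17", "2004-01-18 - 2004-01-24")

# Keep the first Y-m-d of each range; the regex captures it and drops the rest
start <- as.Date(sub("^(\\d+-\\d+-\\d+).+", "\\1", week))

# One possible "Year-Week" label: %U numbers weeks starting on Sunday
year_week <- format(start, "%Y-%U")
print(year_week)
```

Keeping start as a Date for plotting and using year_week only as a label lets ggplot still space the weeks correctly on the axis.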
head(bktst.plotdata)
date method product type actuals forecast residual Percent_error month
1 2012-12-31 bauwd CUSTM NET 194727.51 -8192.00 -202919.51 -104.21 Dec12
2 2013-01-31 bauwd CUSTM NET 470416.27 1272.01 -469144.26 -99.73 Jan13
3 2013-02-28 bauwd CUSTM NET 190943.57 -1892.45 -192836.02 -100.99 Feb13
4 2013-03-31 bauwd CUSTM NET -42908.91 2560.05 45468.96 -105.97 Mar13
5 2013-04-30 bauwd CUSTM NET -102401.68 358807.48 461209.16 -450.39 Apr13
6 2013-05-31 bauwd CUSTM NET -134869.73 337325.33 472195.06 -350.11 May13
I have been trying to plot my backtest results using ggplot2; a sample of the dataset is shown above. The dates range from Dec 2012 to Jul 2013, and there are 3 levels in 'method', 5 in 'product' and 2 in 'type'.
I tried the code below, but the x-axis is not what I expect: it shows 'Jan, Feb, Mar, Apr, May, Jun, Jul, Aug', whereas I expect it to run from Dec to Jul.
month.plot1 <- ggplot(data=bktst.plotdata, aes(x= date, y=Percent_error, colour=method))
facet4 <- facet_grid(product~type,scales="free_y")
title3 <- ggtitle("Percent Error - Month-over-Month")
xaxis2 <- xlab("Date")
yaxis3 <- ylab("Error (%)")
month.plot1+geom_line(stat="identity", size=1, position="identity")+facet4+title3+xaxis2+yaxis3
# Tried changing the code to this still not getting the X-axis right
month.plot1 <- ggplot(data=bktst.plotdata, aes(x= format(date,'%b%y'), y=Percent_error, colour=method))
month.plot1+geom_line(stat="identity", size=1, position="identity")+facet4+title3+xaxis2+yaxis3
Well, it looks like you are plotting the last day of each month, so it makes sense that December 31 is plotted very close to January. If you add the points (with geom_point), you can see that each one sits just to the left of the nearest month tick.
It sounds like you want to plot years and months instead of actual dates. There are various ways to do this; one is to change the day part of the date to the first of the month instead of the last. Here I show how to do this with functions from the lubridate package along with paste (I have assumed your variable date is already a Date object).
require(lubridate)
bktst.plotdata$date2 = as.Date(with(bktst.plotdata,
paste(year(date), month(date), "01", sep = "-")))
Then the plot axes start at December. You can change the format of the x axis if you load the scales package.
require(scales)
ggplot(data=bktst.plotdata, aes(x = date2, y=Percent_error, colour=method)) +
facet_grid(product~type,scales="free_y") +
ggtitle("Percent Error - Month-over-Month") +
xlab("Date") + ylab("Error (%)") +
geom_line() +
scale_x_date(labels=date_format(format = "%m-%Y"))
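For comparison, the same shift to the first of the month can be sketched in base R without lubridate, by formatting each date with a fixed day "01" and converting back; lubridate's floor_date(date, unit = "month") is another option. The sketch uses the first four dates from the sample data.

```r
# Month-end dates from the sample data
date <- as.Date(c("2012-12-31", "2013-01-31", "2013-02-28", "2013-03-31"))

# Keep year and month, force the day to 01, then convert back to Date
date2 <- as.Date(format(date, "%Y-%m-01"))
print(date2)
```

Because every point now sits exactly on a month boundary, each one lines up with its own month tick on a monthly date axis.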