Force ggplot scales to start on e.g. 1st of year, 1st of month etc - r

I'm looking for a way to force the date labels on a ggplot to start at a (seemingly) logical time. I've had the problem a number of times, but my current problem is that I want the breaks to fall on 01/01/yyyy.
My data is a large dataset with a POSIXct Date column, the data to plot in a Flow column, and a number of site names in a Site column.
library(ggplot2)
library(scales)
ggplot(AllFlowData, aes(x=Date, y = Flow, colour = Site))+geom_line()+
scale_x_datetime(date_breaks = "1 year", expand =c(0,0),labels=date_format("%Y"))
I can force the breaks to be every year, and without labels=date_format("%Y") they appear okay (starting on 01/01 each year). But if I include labels=date_format("%Y") (there are 10 years of data, so full dates get a bit messy), the date labels move to ~November, and 1989 is the first label even though my data starts on 01/01/1990.
I have had this problem numerous times in the past on different time steps, such as wanting to force it to the 1st of the month or daily times to be at midnight instead during the day. Is there a generic way to do this?
I have looked at create specific date range in ggplot2 ( scale_x_date), but I do not want to have to hard code my breaks as I have a fair few plots to do with different date ranges.
Thanks

If the dates come to you in a vector like:
dates <- seq.Date(as.Date("2001-03-04"), as.Date("2001-11-04"), by="day")
## "2001-03-04" "2001-03-05" "2001-03-06" ... "2001-11-03" "2001-11-04"
use pretty() (which dispatches to pretty.Date() for Date vectors) to make a best guess about the end points.
range(pretty(dates))
## "2001-01-01" "2002-01-01"
Then pass this range to ggplot.
However, I recommend coord_cartesian() instead of scale_x_date(). Typically I want to crop the graphic bounds, instead of flat-out exclude the values entirely (which can mess up things like a loess summary).
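If pretty() does not land exactly on the boundaries you want, the year starts can also be computed directly from the data range. The following is only a sketch in base R; the helper name first_of_year_breaks is made up for illustration and is not part of ggplot2 or scales.

```r
# Hypothetical helper: 1 January of every year touched by the data,
# plus 1 January of the following year so the range is fully covered.
first_of_year_breaks <- function(x) {
  rng <- range(as.Date(x), na.rm = TRUE)
  first_year <- as.integer(format(rng[1], "%Y"))
  last_year  <- as.integer(format(rng[2], "%Y")) + 1L
  as.Date(sprintf("%d-01-01", first_year:last_year))
}

dates <- seq.Date(as.Date("2001-03-04"), as.Date("2001-11-04"), by = "day")
year_breaks <- first_of_year_breaks(dates)
print(year_breaks)
```

Since ggplot2 scales accept a function of the axis limits for the breaks argument, a helper like this could probably be supplied as scale_x_date(breaks = first_of_year_breaks). For a POSIXct axis the result would need converting with as.POSIXct(), and the time zone may need care.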

Related

How to simplify date in graph axis to month and year?

Apologies for a question on something which is probably very straightforward. I am very new to R.
I have a dataframe which contains dates in a year,month,day format e.g. "2020-05-28". I wanted to stratify the data at month level, so I used the "floor_date" function. However, now the dates read as "2020-05-01" etc. This is absolutely fine for the data set itself, but I am creating epidemiological curves and want to change the dates to "2020-05" etc. on the legend. Could anyone provide some guidance on how to do this? I can't simply replace the pattern "01" with a blank as I need to keep 01 on month level (January) visible.
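One possible approach, sketched under the assumption that the floored values are stored as Date objects: format() with "%Y-%m" drops the day and keeps the year and month. The vector name month_start is made up for illustration.

```r
# Dates already floored to the first of the month (assumed input).
month_start <- as.Date(c("2020-05-01", "2020-06-01", "2020-01-01"))

# Keep only year and month; January stays visible as "01".
month_label <- format(month_start, "%Y-%m")
print(month_label)
```

The result is a character vector, so it is best used for labels only, while the Date column is kept for sorting and plotting.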

Changing x axis label in GGPlot2

I have an r data set which has money spendings spread across months, and also grouped by years.
Year|Mth_Year|Mth_Spend
2004|01-2004|42507163
2004|02-2004|3377592
2006|10-2006|3507636
2006|11-2006|4479139
2006|12-2006|2439603
I need to display the monthly information (grouped year wise), so that some quick comparisons can be done.
I am using the ggplot, geom_bar options to display the monthly spends. Below is the code, I use.
ggplot(data=yearly_spending,aes(x=Year,y=Mth_Spend,fill=Mth_Year))+
geom_bar(stat="identity",color="black",position=position_dodge())+
theme(axis.text.x=element_text(angle=90,hjust=1))
When I use this code, the bar chart is displayed, but the x axis shows only the years (2004, 2005 & 2006). Can I get the months displayed above the years as well? The years can appear horizontally, while the months can be placed vertically.
Thank you for all the suggestions. Using the scales package and lubridate to convert strings into date, I could solve the issue.
ggplot(data=monthly_spend_catwise,aes(x=Mth_Year,y=TotSpending,fill=Type))+
scale_x_date(labels=date_format("%m-%Y"),date_breaks = "1 month")+
theme(axis.text.x=element_text(angle=45,hjust=1,vjust=0.5))+
geom_bar(stat="identity",color="black",position=position_dodge())
I had reformatted the dates in Mth_Year (01-2004 became 01-01-2004) using the code below (%Y matches the four-digit year).
spending$Mth_Year <- as.Date(paste("01",spending$Mth_Year,sep="-"),"%d-%m-%Y")
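A small check of the parsing step, assuming the Mth_Year strings look like the sample data (two-digit month, four-digit year). The format code %Y reads a four-digit year, whereas %y reads only two digits. The vector names here are made up for illustration.

```r
# Sample month-year strings in the same shape as the question's data.
mth_year <- c("01-2004", "02-2004", "10-2006")

# Prepend a day so the string is a complete date, then parse it.
parsed <- as.Date(paste("01", mth_year, sep = "-"), format = "%d-%m-%Y")
print(parsed)
```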

Weekly time series plot in R

I am trying to create a plot of weekly data. Though this is not the exact problem I am having, it illustrates it well. Basically, imagine you want to make a plot of 1,2,....,7 for 7 weeks from Jan 1 2015. So my plot should just be a line that trends upward, but instead I get 7 different lines. I tried the code below (and some other code, to no avail). Help would be greatly appreciated.
startDate = "2015-01-01"
endDate = "2015-02-19"
y=c(1,2,3,4,5,6,7)
tsy=ts(y,start=as.Date(startDate),end=as.Date(endDate))
plot(tsy)
You are plotting both the time and y together as individual plots.
Instead use:
plot(y)
lines(y)
Also, create a date column based on the specifics you gave which will be a time series. From here you can add the date on the x-axis to easily see how your variable changes over time.
To make your life easier, I think your first step should be to create an xts time series object (install and load the xts package); then it is a piece of cake to plot, subset, or do whatever you like with the series.
Build your vector of dates as a sequence from a start date (by=1 steps one day at a time; use by="week" for weekly steps):
seq(as.Date("2011-07-01"), by=1, length.out=7)
and your data vector: 1:7
A one-liner builds and plots the above time series object:
plot(as.xts(1:7, order.by=seq(as.Date("2011-07-01"), by=1, length.out=7)))
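For the weekly case in the question, the same idea can be sketched in base R without xts. This swaps the time series object for a plain data frame, and the names week_dates and weekly are made up for illustration.

```r
# Seven weekly dates starting on 1 January 2015.
week_dates <- seq(as.Date("2015-01-01"), by = "week", length.out = 7)
y <- 1:7

weekly <- data.frame(date = week_dates, y = y)

# A single upward-trending line with dates on the x axis.
plot(weekly$date, weekly$y, type = "l", xlab = "Week starting", ylab = "y")
```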

How can I define custom quarter boundaries (not calendar) in R?

I see a ton of libraries like zoo, ts, timeSeries for working with quarters but I can't seem to figure out a way to change quarter boundaries.
The data I analyze needs to be broken into fiscal quarters.
Ex:
Fiscal Q1: 7/28/2013 - 10/26/2013
Fiscal Q2: 10/27/2013 - 1/25/2014
and so on...
Try using cut to define your own date ranges:
boundaries <- as.Date(c("7/28/2013","10/27/2013","1/26/2014"),"%m/%d/%Y")
quarterNames <- c("Fiscal Q1","Fiscal Q2")
cut(vectorOfDates ,
breaks = boundaries,
labels = quarterNames)
Note that you need one more boundary than label (since the labels are applied to the ranges between the breaks), and that the boundaries must span your date range, otherwise you'll introduce missing values.
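A self-contained run of the above, with a few made-up sample dates (sample_dates is not from the question), shows both the labelling and the missing value produced by a date outside the boundaries. For Date input, cut() closes each interval on the left by default, so a boundary date belongs to the quarter it starts.

```r
boundaries <- as.Date(c("7/28/2013", "10/27/2013", "1/26/2014"), format = "%m/%d/%Y")
quarter_names <- c("Fiscal Q1", "Fiscal Q2")

# Made-up dates: first and last day of each quarter, plus one out of range.
sample_dates <- as.Date(c("2013-07-28", "2013-10-26",
                          "2013-10-27", "2014-01-25",
                          "2014-02-01"))

fiscal_quarter <- cut(sample_dates, breaks = boundaries, labels = quarter_names)
print(data.frame(sample_dates, fiscal_quarter))
```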

Plotting truncated times from zoo time series

Let's say I have a data frame with lots of values under these headers:
df <- data.frame(c("Tid", "Value"))
#Tid.format = %Y-%m-%d %H:%M
Then I turn that data frame over to zoo, because I want to handle it as a time series:
library("zoo")
df <- zoo(df$Value, df$Tid)
Now I want to produce a smooth scatter plot over which time of day each measurement was taken (i.e. discard date information and only keep time) which supposedly should be done something like this: https://stat.ethz.ch/pipermail/r-help/2009-March/191302.html
But it seems the time() function doesn't produce any time at all; instead it just produces a number sequence. Whatever I do from that link, I can't get a scatter plot of values over an average day. The data.frame code that actually does work (without using zoo time series) looks like this (i.e. extracting the hour from the time and converting it to numeric):
smoothScatter(data.frame(as.numeric(format(df$Tid,"%H")),df$Value))
Another thing I want to do is produce a density plot of how many measurements I have per hour. I have plotted on hours using a regular data.frame with no problems, so the data I have is fine. But when I try to do it using zoo then I either get errors or I get the wrong results when trying what I have found through Google.
I did manage to get something plotted through this line:
plot(density(as.numeric(trunc(time(df),"01:00:00"))))
But it is not correct. It seems again that it is just producing a sequence from 1 to 217, where I wanted it to be truncating any date information and just keep the time rounded off to hours.
I am able to plot this:
plot(density(df))
Which produces a density plot of the Values. But I want a density plot over how many values were recorded per hour of the day.
So, if someone could please help me sort this out, that would be great. In short, what I want to do is:
1) smoothScatter(x-axis: time of day (0-24), y-axis: value)
2) plot(density(x-axis: time of day (0-24)))
EDIT:
library("zoo")
df <- data.frame(Tid=strptime(c("2011-01-14 12:00:00","2011-01-31 07:00:00","2011-02-05 09:36:00","2011-02-27 10:19:00"),"%Y-%m-%d %H:%M"),Values=c(50,52,51,52))
df <- zoo(df$Values,df$Tid)
summary(df)
df.hr <- aggregate(df, trunc(df, "hours"), mean)
summary(df.hr)
png("temp.png")
plot(df.hr)
dev.off()
This code is some actual values that I have. I would have expected the plot of "df.hr" to be an hourly average, but instead I get some weird new index that is not time at all...
There are three problems with the aggregate statement in the question:
1) We wish to truncate the times, not df.
2) trunc.POSIXt unfortunately returns a POSIXlt result, so it needs to be converted back to POSIXct.
3) It seems you did not intend to truncate to the hour in the first place but wanted to extract the hours.
To address the first two points the aggregate statement needs to be changed to:
tt <- as.POSIXct(trunc(time(df), "hours"))
aggregate(df, tt, mean)
but to address the last point it needs to be changed entirely to
tt <- as.POSIXlt(time(df))$hour
aggregate(df, tt, mean)
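The hour-extraction step can be checked without zoo. The sketch below uses base R's tapply() in place of zoo's aggregate(), with the four sample values from the question. The time zone is fixed to UTC here as an assumption, so that the extracted hours do not depend on the local setting.

```r
tid <- as.POSIXct(c("2011-01-14 12:00:00", "2011-01-31 07:00:00",
                    "2011-02-05 09:36:00", "2011-02-27 10:19:00"),
                  tz = "UTC")
values <- c(50, 52, 51, 52)

# Hour of day (0-23), discarding the date part.
hour_of_day <- as.POSIXlt(tid)$hour

# Mean value per hour of day, ordered by hour.
hourly_mean <- tapply(values, hour_of_day, mean)
print(hourly_mean)
```

The same hour_of_day vector could then be used for the two plots asked about, for example smoothScatter(hour_of_day, values) or plot(density(hour_of_day)).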
