How can I define custom quarter boundaries (not calendar) in R? - r

I see a ton of libraries like zoo, ts, timeSeries for working with quarters but I can't seem to figure out a way to change quarter boundaries.
The data I analyze needs to be broken into fiscal quarters.
Ex:
Fiscal Q1: 7/28/2013 - 10/26/2013
Fiscal Q2: 10/27/2013 - 1/25/2014
and so on...

Try using cut to define your own date ranges:
boundaries <- as.Date(c("7/28/2013","10/27/2013","1/26/2014"),"%m/%d/%Y")
quarterNames <- c("Fiscal Q1","Fiscal Q2")
cut(vectorOfDates ,
breaks = boundaries,
labels = quarterNames)
Note that you need one more boundary than label (since the labels are applied to the ranges between the breaks), and that the boundaries must span your date range, otherwise you'll introduce missing values.
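To see how the breaks behave at the edges, here is a minimal sketch using made-up dates (the sample vector is an assumption, not data from the question). By default, cut() on Date values treats each range as closed on the left and open on the right, so a date equal to the last boundary falls outside every range and comes back as NA.

```r
boundaries <- as.Date(c("7/28/2013", "10/27/2013", "1/26/2014"), "%m/%d/%Y")
quarterNames <- c("Fiscal Q1", "Fiscal Q2")

# Hypothetical sample dates, chosen to sit on the edges of each range
vectorOfDates <- as.Date(c("2013-07-28", "2013-10-26", "2013-10-27",
                           "2014-01-25", "2014-01-26"))

fiscalQuarter <- cut(vectorOfDates, breaks = boundaries, labels = quarterNames)

# Count dates per quarter; useNA reveals any date that fell outside the breaks
table(fiscalQuarter, useNA = "ifany")
```

The last sample date (2014-01-26) is the final boundary itself, so it is the one that ends up as a missing value. Adding the next quarter's boundary and label would pick it up.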

Related

Changing period of dates to standard date to do line graph

I'm trying to plot a line graph with R using the dataset that can be found here. I'm looking specifically at how to plot the number of cases in each region i.e. north east, north west etc against the period of time.
However, as the date is a period of a week rather than a standard date, how can I convert it to make the line graph actually possible? For example, right now it has the dates as 01/09/2020 - 07/09/2020. How can I use this for a line graph?
Sorry if my explanation isn't clear, here is a picture below.
I assume you're trying to plot a time series? You could just trim the dates to the beginning of the week and label the time axis as "Week beginning on date". You could do this with substr() in base R, keeping the first 10 characters.
substr(data$column,1,10)
You may also want to format it as a date, easiest with the lubridate package, something like dmy() (day month year).
Here is the full code you would want:
library(tidyverse)
library(lubridate) #for dmy(); older tidyverse versions do not attach it
#Read in data
data <- read.csv("/Users/sabrinaxie/Downloads/covid19casesbysociodemographiccharacteristicengland1sep2020to10dec20213.csv")
#Modify data and remove extraneous top rows
data <- data %>%
rename(Period=Table.9..Weekly.estimates.of.age.standardised.COVID.19.case.rates..per.100.000.person.weeks..by.region..England..1.September.2020.to.6.December.20211.2.3) %>%
slice(3:n())
#Keep first 10 characters of Period column and assign to old column to replace
data$Period <- substr(data$Period,1,10)
#Parse as date
data$Period <- dmy(data$Period)
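If you'd rather not depend on lubridate, the same two steps can be sketched in base R. The period strings below are made up to match the "start - end" layout described in the question; as.Date() with an explicit format stands in for dmy().

```r
# Hypothetical period strings in the "start - end" layout from the question
period <- c("01/09/2020 - 07/09/2020", "08/09/2020 - 14/09/2020")

# Keep the first 10 characters, i.e. the date the week begins on
week_start_chr <- substr(period, 1, 10)

# Parse the day/month/year text into Date objects
week_start <- as.Date(week_start_chr, format = "%d/%m/%Y")
```

This relies on the start date always being written as exactly 10 characters (dd/mm/yyyy); if the layout varies, splitting on the " - " separator would be safer.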

How to simplify date in graph axis to month and year?

Apologies for a question on something which is probably very straightforward. I am very new to R.
I have a dataframe which contains dates in a year,month,day format e.g. "2020-05-28". I wanted to stratify the data at month level, so I used the "floor_date" function. However, now the dates read as "2020-05-01" etc. This is absolutely fine for the data set itself, but I am creating epidemiological curves and want to change the dates to "2020-05" etc. on the legend. Could anyone provide some guidance on how to do this? I can't simply replace the pattern "01" with a blank as I need to keep 01 on month level (January) visible.
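One possible approach, as a minimal sketch: format() on a Date takes a strptime-style pattern, so the day can be dropped without touching the month. The dates below are made up to mimic what floor_date() would return.

```r
# Hypothetical month-floored dates, as floor_date(x, "month") would produce
month_start <- as.Date(c("2020-01-01", "2020-05-01", "2020-12-01"))

# Format as year-month text; only the day is dropped, so January stays "01"
month_label <- format(month_start, "%Y-%m")
```

Note that the result is character, not Date. If the dates sit on a ggplot2 date axis rather than in a legend, keeping them as Date and using scale_x_date(date_labels = "%Y-%m") may be the tidier option.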

Force ggplot scales to start on e.g. 1st of year, 1st of month etc

I'm looking for a way to force the date labels on a ggplot to start at a (seemingly) logical time. I've had the problem a number of times but my current problem is I want the breaks to be on the 01/01/yyyy
My data is a large dataset with POSIXct Date column, data to plot in Flow column and a number of site names in the Site column.
library(ggplot2)
library(scales)
ggplot(AllFlowData, aes(x=Date, y = Flow, colour = Site))+geom_line()+
scale_x_datetime(date_breaks = "1 year", expand =c(0,0),labels=date_format("%Y"))
I can force the breaks to be every year, and they appear okay without labels=date_format("%Y") (starting on 01/01 each year). But if I include labels=date_format("%Y") (there are 10 years of data, so the full date labels get a bit messy), the date labels move to ~November, and 1989 is the first label even though my data starts on 01/01/1990.
I have had this problem numerous times in the past on different time steps, such as wanting to force it to the 1st of the month or daily times to be at midnight instead during the day. Is there a generic way to do this?
I have looked at create specific date range in ggplot2 (scale_x_date), but I do not want to have to hard code my breaks as I have a fair few plots to do with different date ranges.
Thanks
If the dates come to you in a vector like:
dates <- seq.Date(as.Date("2001-03-04"), as.Date("2001-11-04"), by="day")
## "2001-03-04" "2001-03-05" "2001-03-06" ... "2001-11-03" "2001-11-04"
use pretty(), which dispatches to pretty.Date() for Date vectors, to make a best guess about the end points.
range(pretty(dates))
## "2001-01-01" "2002-01-01"
Then pass this range to ggplot.
However, I recommend coord_cartesian() instead of scale_x_date(). Typically I want to crop the graphic bounds, instead of flat-out exclude the values entirely (which can mess up things like a loess summary).
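Put together, that might look like the sketch below. The data frame and its Flow values are made up for illustration, and it uses a Date column rather than the POSIXct column from the question, so treat it as a starting point rather than a drop-in fix.

```r
library(ggplot2)

dates <- seq.Date(as.Date("2001-03-04"), as.Date("2001-11-04"), by = "day")

# Hypothetical data frame; the Flow values are made up for illustration
df <- data.frame(Date = dates, Flow = seq_along(dates))

# "Pretty" end points that enclose the data
lims <- range(pretty(dates))

# Crop the view to those end points without dropping any observations
p <- ggplot(df, aes(x = Date, y = Flow)) +
  geom_line() +
  coord_cartesian(xlim = lims)
```

Because the limits are computed from the data, nothing is hard coded, so the same lines can be reused across plots with different date ranges.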

Convert from dd/mm/yyyy to dd/mm in r

I have data spread over a period of two months. When I graph data points for each day, dates (dd/mm/yyyy) are overlapping and it is not possible to make sense of which date a certain point refers to. I tried to remove years from the date as they are not useful for the info I have and the dd/mm should leave enough space.
df$date<-as.Date(df$date, format="%d/%m")
However, it transforms the 01/09/2014 to 2015-09-01. I read that when the year is missing as.Date assumes current year and inputs it. Can I avoid this automatic insertion somehow?
Something like this?
date <- as.Date("01/09/2014", format = "%d/%m/%Y")
format(date, "%d/%m")
## "01/09"

R graphics plotting a linegraph with date/time horizontally along x-axis

I want to get a linegraph in R which has Time along x and temperature along y.
Originally I had the data in dd/mm/yyyy hh:mm format, with a time point every 30 minutes.
https://www.dropbox.com/s/q35y1rfila0va1h/Data_logger_S65a_Ania.csv
Since I couldn't find a way of reading this into R, I formatted the data to make it into dd/mm/yyyy and added a column 'time' with 1-48 for all the time points for each day
https://www.dropbox.com/s/65ogxzyvuzteqxv/temp.csv
This is what I have so far:
temp<-read.csv("temp.csv",as.is=T)
temp$date<-as.Date(temp$date, format="%d/%m/%Y")
#inputting date in correct format
plot(temperature ~ date, temp, type="n")
#drawing a blank plot with axes, but without data
lines(temp$date, temp$temperature,type="o")
#type o is a line overlaid on top of points.
This stacks the points up vertically, which is not what I want, and stacks all the time points (1-48) for each day all together on the same date.
Any advice would be much appreciated on how to get this horizontal, and ordered by time as well as date.
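One way to approach this, as a hedged sketch: the points stack up because a Date has no time of day, so all 48 readings share one x value. Parsing to POSIXct keeps the time. The sample rows below are made up, and the assumption that time slot 1 means 00:00 is mine, not something stated in the question.

```r
# Hypothetical rows in the original dd/mm/yyyy hh:mm layout
raw <- data.frame(
  datetime = c("01/09/2014 00:00", "01/09/2014 00:30", "01/09/2014 01:00"),
  temperature = c(12.1, 11.8, 11.5),
  stringsAsFactors = FALSE
)

# Parse date and time together; a fixed tz avoids daylight-saving surprises
raw$datetime <- as.POSIXct(raw$datetime, format = "%d/%m/%Y %H:%M", tz = "UTC")

# Alternative for the reworked file: rebuild a date-time from date + slot,
# assuming slot 1 is 00:00 and each slot is 30 minutes (1800 seconds)
temp <- data.frame(date = as.Date(c("2014-09-01", "2014-09-01")), time = c(1, 2))
temp$datetime <- as.POSIXct(format(temp$date), tz = "UTC") + (temp$time - 1) * 1800

# Line overlaid on points, ordered along a date-time axis
plot(temperature ~ datetime, data = raw, type = "o")
```

Either route gives one x value per reading, so the line runs horizontally through time instead of piling up on each date.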
