How to simplify date in graph axis to month and year? - r

Apologies for a question on something which is probably very straightforward. I am very new to R.
I have a dataframe which contains dates in a year,month,day format e.g. "2020-05-28". I wanted to stratify the data at month level, so I used the "floor_date" function. However, now the dates read as "2020-05-01" etc. This is absolutely fine for the data set itself, but I am creating epidemiological curves and want to change the dates to "2020-05" etc. on the legend. Could anyone provide some guidance on how to do this? I can't simply replace the pattern "01" with a blank as I need to keep 01 on month level (January) visible.
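One way this is often handled is to leave the floored dates as real Date values and only change how they are printed. The sketch below is a minimal example on made-up dates, not the asker's data; it uses base R's format() as a stand-in for floor_date(), and the ggplot2 part assumes the curve is drawn with a date x scale.

```r
# Hypothetical data: daily case dates, floored to the first of the month.
dates <- as.Date(c("2020-01-15", "2020-05-28", "2020-05-03"))
month_start <- as.Date(format(dates, "%Y-%m-01"))  # base-R stand-in for floor_date(dates, "month")

# Format for display only; the underlying values stay Dates.
month_label <- format(month_start, "%Y-%m")
month_label
# "2020-01" "2020-05" "2020-05"

# In ggplot2 the same format string can go straight into the date scale.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  counts <- as.data.frame(table(month = month_start), stringsAsFactors = FALSE)
  counts$month <- as.Date(counts$month)
  p <- ggplot(counts, aes(x = month, y = Freq)) +
    geom_col() +
    scale_x_date(date_breaks = "1 month", date_labels = "%Y-%m")
}
```

Because only the labels change, January still shows as "2020-01" and the months stay in chronological order.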

Related

My data does not convert to time series in R

My data contains several measurements per day. It is stored in a CSV file.
The V1 column is a factor, so I'm adding an extra date-time column: vd$Vdate <- as_datetime(vd$V1)
Then I'm trying to convert the vd data into a time series: vd.ts <- ts(vd, frequency = 365)
But then the dates are gone.
I just cannot work out what I am doing wrong! Could someone help me, please?
Your dates are gone because ts() does not keep a date column. Build the ts object from your measurement variables (V1, ... V7), leave the date field out, and let the start and frequency arguments tell R how to lay out the time index.
Also, it looks like you have hourly data, so you need a frequency that matches your sampling interval, not 365. From what you posted, your sampling interval seems a bit irregular, so I recommend working out the frequency first. For example, hourly data with a yearly cycle has a frequency of 365.25*24 (the 0.25 accounts for leap years).
The following is just an example; it still won't work properly with what I can see (the view of your dataset is limited, so I am not 100% sure).
# Build ts data (univariate)
vd.ts <- ts(vd$V1, frequency = 365, start = c(2019, 4))
# Check that it is structured correctly
print(vd.ts, calendar = TRUE)
Finally my time series is working properly. I used
ts <- zoo(measurements, date_times)
and I found out that date_times had to be converted with as_datetime(), as otherwise the values were character type. The measurements are converted into a data.frame.
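For anyone landing here, a minimal sketch of that approach is below. The data frame and its values are made up, the UTC time zone is an assumption, and base R's as.POSIXct() is used in place of as_datetime() so the example runs without lubridate.

```r
# Hypothetical data: several readings per day, timestamps stored as text.
vd <- data.frame(
  V1 = c("2019-04-01 06:00:00", "2019-04-01 18:00:00", "2019-04-02 06:00:00"),
  V2 = c(1.2, 1.5, 1.1),
  stringsAsFactors = FALSE
)

# Parse the text into POSIXct (lubridate::as_datetime() does the same job).
date_times <- as.POSIXct(vd$V1, tz = "UTC")

# zoo indexes the measurements by the parsed timestamps, so the dates are kept.
if (requireNamespace("zoo", quietly = TRUE)) {
  vd.zoo <- zoo::zoo(vd$V2, order.by = date_times)
  print(vd.zoo)
}
```

Unlike ts(), zoo does not need a fixed frequency, which suits readings taken at uneven times of day.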

How can I convert characters into dates in RStudio?

Still new to R. I wanted to create a simple bar chart of the number of burglaries per month in my city. I found that the column 'Occurence_Date' is a character column; I wanted it to be a date or time type, or something simpler, so I can create a visualization. I want the x-axis to show the months January to June 2019 and the y-axis to show the number of burglaries per month. Can anyone help me get started on this please? Thanks!
This is my data frame
The lubridate package is very helpful for working with dates and times in R.
# install.packages("lubridate") ## only run once
library(lubridate)
df$Occurence_Date <- ymd(df$Occurence_Date) # converts text in year-month-day format to Date
Generally it's better to put example data in your question so people can work with it and show an example.
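Since no example data was posted, here is a minimal sketch on made-up rows of how the monthly bar chart could follow from the converted dates. It assumes the text is in year-month-day form and uses base R's as.Date() and barplot() so it runs without extra packages.

```r
# Hypothetical data; the real column may use a different date format.
df <- data.frame(
  Occurence_Date = c("2019-01-03", "2019-01-20", "2019-02-11", "2019-06-30"),
  stringsAsFactors = FALSE
)
df$Occurence_Date <- as.Date(df$Occurence_Date)  # base-R equivalent of ymd() for this format

# Label each row with its month, keeping every month from January to June
# so that months with no burglaries still show up as zero.
months <- format(seq(as.Date("2019-01-01"), as.Date("2019-06-01"), by = "month"), "%Y-%m")
df$month <- factor(format(df$Occurence_Date, "%Y-%m"), levels = months)

per_month <- table(df$month)
barplot(per_month, xlab = "Month", ylab = "Burglaries")
```

Using a factor with fixed levels is a design choice here: without it, empty months would simply be missing from the chart.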

Force ggplot scales to start on e.g. 1st of year, 1st of month etc

I'm looking for a way to force the date labels on a ggplot to start at a (seemingly) logical time. I've had the problem a number of times, but my current problem is that I want the breaks to be on 01/01/yyyy.
My data is a large dataset with a POSIXct Date column, the data to plot in a Flow column, and a number of site names in a Site column.
library(ggplot2)
library(scales)
ggplot(AllFlowData, aes(x=Date, y = Flow, colour = Site))+geom_line()+
scale_x_datetime(date_breaks = "1 year", expand =c(0,0),labels=date_format("%Y"))
I can force the breaks to be every year, and they appear okay without labels=date_format("%Y") (starting on 01/01 each year). But if I include labels=date_format("%Y") (there are 10 years of data, so full dates get a bit messy), the date labels move to around November, and 1989 is the first label even though my data starts on 01/01/1990.
I have had this problem numerous times in the past with different time steps, such as wanting to force breaks to the 1st of the month, or daily breaks to fall at midnight instead of during the day. Is there a generic way to do this?
I have looked at create specific date range in ggplot2 ( scale_x_date), but I do not want to have to hard code my breaks as I have a fair few plots to do with different date ranges.
Thanks
If the dates come to you in a vector like:
dates <- seq.Date(as.Date("2001-03-04"), as.Date("2001-11-04"), by="day")
## "2001-03-04" "2001-03-05" "2001-03-06" ... "2001-11-03" "2001-11-04"
use pretty(), which has a method for Date objects, to make a best guess about the end points.
range(pretty(dates))
## "2001-01-01" "2002-01-01"
Then pass this range to ggplot.
However, I recommend coord_cartesian() instead of scale_x_date(). Typically I want to crop the graphic bounds, instead of flat-out exclude the values entirely (which can mess up things like a loess summary).
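A minimal sketch of passing that range to a plot is below. The Flow values are made up, and the plot is only built if ggplot2 is installed; the exact end points returned by pretty() depend on the data, so none are shown.

```r
dates <- seq.Date(as.Date("2001-03-04"), as.Date("2001-11-04"), by = "day")
flow  <- seq_along(dates)            # hypothetical values to plot

# pretty() dispatches to the Date method and returns rounded breakpoints.
lims <- range(pretty(dates))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(data.frame(Date = dates, Flow = flow), aes(x = Date, y = Flow)) +
    geom_line() +
    coord_cartesian(xlim = lims)     # zooms the view without dropping data
}
```

Because the limits are computed from the data, the same lines can be reused across plots with different date ranges without hard-coding breaks.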

Convert from dd/mm/yyyy to dd/mm in r

I have data spread over a period of two months. When I graph data points for each day, dates (dd/mm/yyyy) are overlapping and it is not possible to make sense of which date a certain point refers to. I tried to remove years from the date as they are not useful for the info I have and the dd/mm should leave enough space.
df$date<-as.Date(df$date, format="%d/%m")
However, it transforms the 01/09/2014 to 2015-09-01. I read that when the year is missing as.Date assumes current year and inputs it. Can I avoid this automatic insertion somehow?
Something like this?
date <- as.Date("01/09/2014", format = "%d/%m/%Y")
format(date, "%d/%m")
## "01/09"

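One caveat with the format() approach above: the result is a character string, so it no longer sorts or plots as a date. A minimal sketch on made-up dates, assuming the column is kept as a Date and formatted only when labels are needed:

```r
dates <- as.Date(c("30/09/2014", "01/10/2014", "01/09/2014"), format = "%d/%m/%Y")

labels <- format(dates, "%d/%m")    # character strings, no longer dates
sort(labels)                        # alphabetical: "01/09" "01/10" "30/09"
format(sort(dates), "%d/%m")        # chronological: "01/09" "30/09" "01/10"
```

In ggplot2 the same effect is usually achieved by leaving the column as a Date and adding scale_x_date(date_labels = "%d/%m").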
Creating a single timestamp from separate DAY OF YEAR, Year and Time columns in R

I have a time series dataset for several meteorological variables. The time data is logged in three separate columns:
Year (e.g. 2012)
Day of year (e.g. 261 representing 17-September in a Leap Year)
Hrs:Mins (e.g. 1610)
Is there a way I can merge the three columns to create a single timestamp in R? I'm not very familiar with how R deals with the Day of Year variable.
Thanks for any help with this!
It looks like the timeDate package can handle Gregorian time frames. I haven't used it personally, but it looks straightforward. Some methods have a shift argument that lets you set the offset from your data.
http://cran.r-project.org/web/packages/timeDate/timeDate.pdf
Because you mentioned it, I thought I'd show the actual code to merge together separate columns. When you have the values you need in separate columns you can use paste to bring them together and lubridate::mdy to parse them.
library(lubridate)
col.month <- "Jan"
col.year <- "2012"
col.day <- "23"
date <- mdy(paste(col.month, col.day, col.year, sep = "-"))
Lubridate is a great package, here's the official page: https://github.com/hadley/lubridate
And here is a nice set of examples: http://www.r-statistics.com/2012/03/do-more-with-dates-and-times-in-r-with-lubridate-1-1-0/
You should get quite far using ISOdatetime. This function takes vectors of year, month, day, hour, minute, and second as input and outputs a POSIXct object which represents time. You have to split the third column into separate hour and minute columns, and because the function expects a month and a day of month, the day of year has to be converted first.
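As an alternative that skips the splitting step, base R's strptime() understands day of year directly through the %j format code. The sketch below is a minimal example on made-up rows; the column names and the UTC time zone are assumptions, not taken from the question.

```r
# Hypothetical columns matching the layout described in the question.
met <- data.frame(Year = c(2012, 2012), DOY = c(261, 5), HrsMins = c(1610, 905))

# Zero-pad so that day 5 becomes "005" and 905 becomes "0905".
stamp <- sprintf("%d %03d %04d",
                 as.integer(met$Year), as.integer(met$DOY), as.integer(met$HrsMins))

# %j is day of year; %H%M reads the four-digit time.
met$timestamp <- as.POSIXct(strptime(stamp, "%Y %j %H%M", tz = "UTC"))
format(met$timestamp, "%Y-%m-%d %H:%M")
# "2012-09-17 16:10" "2012-01-05 09:05"
```

The zero-padding matters: without it, a time such as 905 would be read as 90 hours and fail to parse.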
