Rounding dates with round_date() in R - r

I am trying to convert dates in yyyymmdd format to the year only (yyyy) in R.
The question "How to convert numeric only year in Date in R?" has a very interesting answer: it gets R to treat an 8-digit entry (yyyymmdd) as a 4-digit year (yyyy) using the lubridate package, which is exactly what I need.
In my old code I used round_date() for this:
library(lubridate)
library(dplyr)
date2 <- c('01/01/2000','08/08/2000','16/03/2001','25/12/2000','29/02/2000')
name <- c('A','B','C','D','E')
df <- data.frame(date2, name)
df2 <- df %>%
  mutate(date2 = dmy(date2)) %>%
  mutate(year_date = round_date(date2, 'year'))
df2
str(df2)
date2<date> name<chr> year_date <date>
2000-01-01 A 2000-01-01
2000-08-08 B 2001-01-01
2001-03-16 C 2001-01-01
2000-12-25 D 2001-01-01
2000-02-29 E 2000-01-01
But I started having problems with my statistical analysis when I discovered, for example, that the date 2000-08-08 was rounded up to 2001-01-01 instead of 2000-01-01 as I expected.
This is a very big problem for me, since information that belongs to the year 2005 has been moved to the year 2006, and I have more than 1400 rows in my database.
I noticed that dates in the second half of the year (after June) are rounded up to the next year, which is not what I want.
How do I round a date such as 2000-08-08 to 2000 instead of 2001?

Doesn't this (simpler, also only base R) operation do what you want?
> date2 <- c('01/01/2000','08/08/2000','16/03/2001','25/12/2000','29/02/2000')
> dd <- as.Date(date2, "%d/%m/%Y")
> yd <- format(dd, "%Y-01-01")
> dt <- as.Date(yd)
> D <- data.frame(date2=date2, date=dd, y=yd, d=dt)
> D
date2 date y d
1 01/01/2000 2000-01-01 2000-01-01 2000-01-01
2 08/08/2000 2000-08-08 2000-01-01 2000-01-01
3 16/03/2001 2001-03-16 2001-01-01 2001-01-01
4 25/12/2000 2000-12-25 2000-01-01 2000-01-01
5 29/02/2000 2000-02-29 2000-01-01 2000-01-01
>
In essence we just extract the year component from the (parsed as date) Date object and append -01-01.
Edit: There are also trunc() operations for Date and Datetime objects. Oddly, truncation for years only works for Datetime (see the help page for trunc.Date for more) so this works too:
> as.Date(trunc(as.POSIXlt(dd), "years"))
[1] "2000-01-01" "2000-01-01" "2001-01-01" "2000-01-01" "2000-01-01"
>
Edit 2: We can use that last step in a cleaner, simpler solution: a data.frame with three columns holding the input data (as characters), the parsed data as a proper Date type, and the desired truncated year data, all using base R without further dependencies. Of course, if you want to, you could rewrite it with the pipe and lubridate for the same result via a slightly slower route (which only matters for "large" data).
> date2 <- c('01/01/2000','08/08/2000','16/03/2001','25/12/2000','29/02/2000')
> pd <- as.Date(date2, "%d/%m/%Y")
> td <- as.Date(trunc(as.POSIXlt(pd), "years"))
> D <- data.frame(input = date2, parsed = pd, output = td)
> D
input parsed output
1 01/01/2000 2000-01-01 2000-01-01
2 08/08/2000 2000-08-08 2000-01-01
3 16/03/2001 2001-03-16 2001-01-01
4 25/12/2000 2000-12-25 2000-01-01
5 29/02/2000 2000-02-29 2000-01-01
>
For a real "production" use you may not need the data.frame and do not need to keep the intermediate result leading to a one-liner:
> as.Date(trunc(as.POSIXlt( as.Date(date2, "%d/%m/%Y") ), "years"))
[1] "2000-01-01" "2000-01-01" "2001-01-01" "2000-01-01" "2000-01-01"
>
which is likely the most compact and efficient conversion you can get.
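As a rough check that the two base R routes above agree, here is a minimal sketch. The helper name year_start() is made up for illustration and is not part of base R; it wraps the format() approach, and the result is compared with the trunc() approach on the same sample dates.

```r
# Hypothetical helper (not a base R function): first day of each date's year
year_start <- function(d) {
  stopifnot(inherits(d, "Date"))
  as.Date(format(d, "%Y-01-01"))
}

dd <- as.Date(c("01/01/2000", "08/08/2000", "16/03/2001",
                "25/12/2000", "29/02/2000"), format = "%d/%m/%Y")

via_format <- year_start(dd)                            # format() route
via_trunc  <- as.Date(trunc(as.POSIXlt(dd), "years"))   # trunc() route

# Both routes should land on the same dates
all(via_format == via_trunc)
```

Either route floors to the start of the year, so a date such as 2000-08-08 stays in 2000.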

If you want just the year (and not the date corresponding to the first day of the year) you can use lubridate::year().
df %>% mutate(across(date2,dmy),
year_date=year(date2))
If you do want the first day of the year then floor_date() will do the trick.
df %>% mutate(across(date2,dmy),
year_date=floor_date(date2,"year"))
or if you only need the truncated date you could go directly to mutate(year_date=floor_date(dmy(date2),"year"))
In base R, the equivalent of year() is format(date2, "%Y") (which returns a character string rather than a number), as shown in @DirkEddelbuettel's answer.
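A small base R sketch of the year-only route, assuming the dates have already been parsed to Date. The variable names are illustrative; as.integer() is added here only because format() returns character strings.

```r
dd <- as.Date(c("01/01/2000", "08/08/2000", "16/03/2001",
                "25/12/2000", "29/02/2000"), format = "%d/%m/%Y")

# format() gives "2000", "2001", ... as text; convert to integer for grouping
yr <- as.integer(format(dd, "%Y"))
yr

# Count observations per calendar year
per_year <- table(yr)
per_year
```

Because nothing is rounded, 2000-08-08 and 2000-12-25 are both counted under 2000.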

If you consult the round_date help page, you will also see floor_date:
library("lubridate")
library("dplyr")
date2 <- c('01/01/2000','08/08/2000','16/03/2001','25/12/2000','29/02/2000')
name <- c('A','B','C','D','E')
df <- data.frame(date2,name)
df2 <- df %>%
mutate(date2 = dmy(date2)) %>%
mutate(year_date = floor_date(date2,'year'))
df2
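
To see the difference side by side, the following sketch puts round_date() and floor_date() in adjacent columns for the sample dates from the question. It assumes lubridate is installed; the data frame name cmp is just for illustration.

```r
library(lubridate)

d <- dmy(c("01/01/2000", "08/08/2000", "16/03/2001",
           "25/12/2000", "29/02/2000"))

cmp <- data.frame(
  date    = d,
  rounded = round_date(d, "year"),  # nearest year boundary, may move forward
  floored = floor_date(d, "year")   # always the start of the same year
)
cmp
```

Dates in the second half of a year are closer to the next 1 January, which is why round_date() moves them forward while floor_date() does not.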

Related

How to change the date format & remove rows from dataframe before certain date R Studio

I have a dataframe with over 8.8 million observations and I need to remove rows from the dataframe before a certain date. Currently the date format is in MM/DD/YYYY but I would like to convert it to R date format (I believe YYYY-MM-DD).
When I run the code that I have below, it puts them in the correct R format, but it does not keep the correct date. For some reason, it makes the dates 2020. None of the dates in my data frame have the year 2020
> dates <- nyc_call_data_sample$INCIDENT_DATETIME
> date <- as.Date(dates,
+ format = "%m/%d/%y")
> head(nyc_call_data_sample$INCIDENT_DATETIME)
[1] "07/01/2015" "04/24/2016" "04/01/2013" "02/07/2015" "06/27/2016" "05/04/2017"
> head(date)
[1] "2020-07-01" "2020-04-24" "2020-04-01" "2020-02-07" "2020-06-27" "2020-05-04"
> nyc_call_data_sample$INCIDENT_DATETIME <- strptime(as.character(nzd$date), "%d/%m/%y")
Also, I have data that goes back as far as 2013. How would I go about removing all rows from the dataframe that are before 01/01/2017
Thanks!
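as.Date and basic ?Extract are your friend here. The 2020 in your output comes from the lowercase %y, which reads only a two-digit year (the "20" of "2015"); a four-digit year needs the uppercase %Y.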
dat <- data.frame(
unformatted = c("07/01/2015", "04/24/2016", "04/01/2013", "02/07/2015", "06/27/2016", "05/04/2017")
)
dat$date <- as.Date(dat$unformatted, format = "%m/%d/%Y")
dat
# unformatted date
# 1 07/01/2015 2015-07-01
# 2 04/24/2016 2016-04-24
# 3 04/01/2013 2013-04-01
# 4 02/07/2015 2015-02-07
# 5 06/27/2016 2016-06-27
# 6 05/04/2017 2017-05-04
dat[ dat$date >= as.Date("2017-01-01"), ]
# unformatted date
# 6 05/04/2017 2017-05-04
(Feel free to remove the unformatted column with dat$unformatted <- NULL.)
With tidyverse:
library(dplyr)
dat %>%
mutate(date = as.Date(unformatted, format = "%m/%d/%Y")) %>%
select(-unformatted) %>%
filter(date >= as.Date("2017-01-01"))
# date
# 1 2017-05-04
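A minimal sketch of why the original attempt produced 2020, and of the cutoff filter. The sample values and the names calls, cutoff and recent are made up for illustration; the comparison uses >= so that rows dated exactly on the cutoff day are kept.

```r
raw <- c("07/01/2015", "04/24/2016", "05/04/2017", "01/01/2017")

# Lowercase %y consumes only two digits ("20"), so every year becomes 2020
wrong <- as.Date(raw, format = "%m/%d/%y")
# Uppercase %Y reads the full four-digit year
right <- as.Date(raw, format = "%m/%d/%Y")

calls  <- data.frame(raw = raw, date = right)
cutoff <- as.Date("2017-01-01")

# Keep rows on or after the cutoff, i.e. drop everything before it
recent <- calls[calls$date >= cutoff, ]
recent
```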

R convert yy-mm string to date format [duplicate]

I have a data frame (df) like the following:
Date Arrivals
2014-07 100
2014-08 150
2014-09 200
I know that I can convert the yearmon dates to the first date of each month as follows:
df$Date <- as.POSIXct(paste0(as.character(df[,1]),"-01"), format = "%Y-%m-%d")
However, given that my data is not available until the end of the month I want to index it to the end rather than the beginning, and I cannot figure it out. Any help appreciated.
If the Date variable is an actual yearmon class vector, from the zoo package, the as.Date.yearmon method can do what you want via its argument frac.
Using your data, and assuming that the Date was originally a character vector
library("zoo")
df <- data.frame(Date = c("2014-07", "2014-08", "2014-09"),
Arrivals = c(100, 150, 200))
I convert this to a yearmon vector:
df <- transform(df, Date2 = as.yearmon(Date))
Assuming this is what you have, then you can achieve what you want using as.Date() with frac = 1:
df <- transform(df, Date3 = as.Date(Date2, frac = 1))
which gives:
> df
Date Arrivals Date2 Date3
1 2014-07 100 Jul 2014 2014-07-31
2 2014-08 150 Aug 2014 2014-08-31
3 2014-09 200 Sep 2014 2014-09-30
That shows the individual steps. If you only want the final Date this is a one-liner
## assuming `Date` is a `yearmon` object
df <- transform(df, Date = as.Date(Date, frac = 1))
## or if not a `yearmon`
df <- transform(df, Date = as.Date(as.yearmon(Date), frac = 1))
The argument frac is the fraction of the month to assign to the resulting dates when converting from yearmon objects to Date objects. Hence, to get the first day of the month, rather than converting to a character and pasting on "-01" as your question showed, it's better to coerce to a Date object with frac = 0.
If the Date in your df is not a yearmon class object, then you can solve your problem by converting it to one and then using the as.Date() method as described above.
Here is a way to do it using the zoo package.
R code:
library(zoo)
df
# Date Arrivals
# 1 2014-07 100
# 2 2014-08 150
# 3 2014-09 200
df$Date <- as.Date(as.yearmon(df$Date), frac = 1)
# output
# Date Arrivals
# 1 2014-07-31 100
# 2 2014-08-31 150
# 3 2014-09-30 200
Using lubridate, you can add a month and subtract a day to get the last day of the month:
library(lubridate)
ymd(paste0(df$Date, '-01')) + months(1) - days(1)
# [1] "2014-07-31" "2014-08-31" "2014-09-30"
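If neither zoo nor lubridate is available, the same idea (step to the first day of the next month, then go back one day) can be sketched in base R. The helper name month_end() is hypothetical, and it assumes the input is a character vector in "YYYY-MM" form.

```r
# Hypothetical helper: last day of the month for "YYYY-MM" strings
month_end <- function(ym) {
  first <- as.Date(paste0(ym, "-01"))
  # seq() with by = "month" steps to the first day of the following month
  next_first <- do.call(c, lapply(first, function(d) {
    seq(d, by = "month", length.out = 2)[2]
  }))
  next_first - 1
}

ends <- month_end(c("2014-07", "2014-08", "2014-09", "2016-02"))
ends
```

The extra "2016-02" entry is there only to show that leap years fall out of the calendar arithmetic without special handling.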

Calculate difference between timestamps

I am trying to find the difference between two timestamps.
The code:
survey <- data.frame(date=c("07/2012","07/2012"),tx_start=c("01/2012","01/2012"))
survey$date_diff <- as.Date(as.character(survey$date), format="%m/%Y")-
as.Date(as.character(survey$tx_start), format="%m/%Y")
survey
I expected the new column to hold the difference, but I get NA instead.
The results:
> survey
date tx_start date_diff
1 07/2012 01/2012 NA days
2 07/2012 01/2012 NA days
What should I use in place of as.Date when I only have months and years?
Update based on comment of Gregor:
> survey <- data.frame(date=c("07/2012","07/2012"),tx_start=c("01/2012","01/2012"))
> survey$date <- as.Date(paste0("01/", as.character(survey$date)), "%d/%m/%Y")
> survey$tx_start <- as.Date(paste0("01/", as.character(survey$tx_start)), "%d/%m/%Y")
> survey$date_diff <- as.Date(survey$date, format="%d/%m/%Y")-
+ as.Date(survey$tx_start, format="%d/%m/%Y")
> survey
date tx_start date_diff
1 2012-07-01 2012-01-01 182 days
2 2012-07-01 2012-01-01 182 days
I usually convert my dates to POSIXct format. When you then subtract them directly, you get a difftime object whose units R picks automatically (seconds, minutes, hours, or days). There is also a difftime() function in base R that lets you choose the units explicitly:
survey <- data.frame(date=c("07/2012","07/2012"),tx_start=c("01/2012","01/2012"))
# Dates are finicky, add a day so that conversion will work
survey$date2 <- paste0("01/",survey$date)
survey$tx_start2 <- paste0("01/",survey$tx_start)
# conversion
survey$date2 <- as.POSIXct(x=survey$date2,format="%d/%m/%Y")
survey$tx_start2 <- as.POSIXct(x=survey$tx_start2,format="%d/%m/%Y")
# take the difference
survey$date_diff <- with(survey,difftime(time1=date2,time2=tx_start2,units="hours"))
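Since difftime() has no "months" unit, a difference in whole calendar months has to be computed from the year and month fields. The sketch below does that in base R; months_between() is a hypothetical helper, not a base function, and it ignores the day of the month entirely.

```r
# Hypothetical helper: whole calendar months from start to end
months_between <- function(start, end) {
  s <- as.POSIXlt(start)
  e <- as.POSIXlt(end)
  12 * (e$year - s$year) + (e$mon - s$mon)
}

survey <- data.frame(date     = c("07/2012", "07/2012"),
                     tx_start = c("01/2012", "01/2012"))

# A day must be pasted on before as.Date() can parse the values
end_date   <- as.Date(paste0("01/", survey$date), format = "%d/%m/%Y")
start_date <- as.Date(paste0("01/", survey$tx_start), format = "%d/%m/%Y")

survey$months_diff <- months_between(start_date, end_date)
survey$days_diff   <- as.numeric(difftime(end_date, start_date, units = "days"))
survey
```

Dividing months_diff by 12 gives a difference in years under the same assumption.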

how can i extract month and date and year from data column in R

I have a column of date data type. The dates in my column are in 4/1/2007 format. I want to extract the month value and the day value from that column into separate columns in R. My dates run from 01/01/2012 to 01/01/2015. Please help.
If your variable is date type (as you say in the post) simply use following to extract month:
month_var = format(df$datecolumn, "%m") # this will give output like "09"
month_var = format(df$datecolumn, "%b") # this will give output like "Sep"
month_var = format(df$datecolumn, "%B") # this will give output like "September"
If your date variable is not in date format, then you will have to convert it into date format first.
df$datecolumn <- as.Date(df$datecolumn, format = "%m/%d/%Y")
Assuming your initial data is character and not POSIX.
df <- data.frame(d = c("4/1/2007", "01/01/2012", "02/01/2015"),
stringsAsFactors = FALSE)
df
# d
# 1 4/1/2007
# 2 01/01/2012
# 3 02/01/2015
These are not yet "dates", just strings.
df$d2 = as.POSIXct(df$d, format = "%m/%d/%Y")
df
# d d2
# 1 4/1/2007 2007-04-01
# 2 01/01/2012 2012-01-01
# 3 02/01/2015 2015-02-01
Now they are proper dates (in the R fashion). These two lines extract just a single component from each "date"; see ?strptime for details on all available formats.
df$dY = format(df$d2, "%Y")
df$dm = format(df$d2, "%m")
df
# d d2 dY dm
# 1 4/1/2007 2007-04-01 2007 04
# 2 01/01/2012 2012-01-01 2012 01
# 3 02/01/2015 2015-02-01 2015 02
An alternative method would be to extract the substrings from each string, but now you're getting into regex-pain; for that, I'd suggest sticking with somebody else's regex lessons-learned, and translate through POSIXct (or even POSIXlt if you want).
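Putting the pieces together, here is a short sketch that parses the strings once and then pulls year, month and day into separate integer columns. It uses Date rather than POSIXct, and the column names are only illustrative.

```r
d <- as.Date(c("4/1/2007", "01/01/2012", "02/01/2015"), format = "%m/%d/%Y")

parts <- data.frame(
  date  = d,
  year  = as.integer(format(d, "%Y")),
  month = as.integer(format(d, "%m")),  # 4 rather than "04"
  day   = as.integer(format(d, "%d"))
)
parts
```

Dropping as.integer() keeps the zero-padded strings, which may be preferable for labels.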

R converting a factor YYYY-MM to a date

I have a dataframe with a date in the form YYYY-MM, class factor and I am trying to convert it to class date.
I tried:
Date <- c("2015-08","2015-09","2015-08")
Val <- c(1,2,3)
df <- data.frame(Date,Val)
df[,1] <- as.POSIXct(as.character(df[,1]), format = "%Y-%m") 
df
But this does not work. I would be grateful for your help.
1) Convert the dates to zoo's "yearmon" class and then to "Date" class:
> library(zoo)
> transform(df, Date = as.Date(as.yearmon(Date)))
Date Val
1 2015-08-01 1
2 2015-09-01 2
3 2015-08-01 3
The question did not specify which date to convert to so we used the first of the month. Had the last of the month been wanted we could have used this instead:
transform(df, Date = as.Date(as.yearmon(Date), frac = 1))
2) Another possibility that does not use zoo is to add the day of the month yourself and then convert to "Date" class.
> transform(df, Date = as.Date(paste(Date, 1, sep = "-")))
Date Val
1 2015-08-01 1
2 2015-09-01 2
3 2015-08-01 3
3) Alternatively, you might want to just use "yearmon" directly, since it models year and month with no day.
> library(zoo)
> transform(df, Date = as.yearmon(Date))
Date Val
1 Aug 2015 1
2 Sep 2015 2
3 Aug 2015 3
Note: Do not use "POSIXct" class as this gives a time zone dependent result that can cause subtle errors if you are not careful. A date in one time zone is not necessarily the same as in another time zone.
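One way the time zone issue can show up, as a minimal sketch: midnight in a zone ahead of UTC falls on the previous day in UTC, and as.Date() on a POSIXct value uses UTC by default. The zone name "Europe/Berlin" is only an example and assumes the system's time zone database includes it.

```r
# Midnight local time on 1 August 2015 in Berlin (UTC+2 in summer)
p <- as.POSIXct("2015-08-01 00:00:00", tz = "Europe/Berlin")

utc_date   <- as.Date(p)                        # "2015-07-31", the UTC date
local_date <- as.Date(p, tz = "Europe/Berlin")  # "2015-08-01"

utc_date
local_date
```

Working with "Date" or "yearmon" from the start avoids having to pass tz around.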
R does not support Dates in the format "%Y-%m"... A day is needed
You can do the following:
as.POSIXct(paste0(as.character(df[,1]),"-01"), format = "%Y-%m-%d")
Resulting in
"2015-08-01 CEST" "2015-09-01 CEST" "2015-08-01 CEST"

Resources