Add one day to every date in a days vector - r

I have a vector of dates called KeyDates containing two key dates. I would like to make a new vector of dates called KeyDatesPlus containing those two key dates and the day after each, in chronological order.
KeyDates <- structure(c(15159,15165), class = "Date")
#KeyDates Output:
[1] "2011-07-04" "2011-07-10"
#desired output for KeyDatesPlus:
[1] "2011-07-04" "2011-07-05" "2011-07-10" "2011-07-11"
How could I achieve that? Thank you very much.

sort(c(KeyDates, KeyDates + 1))
[1] "2011-07-04" "2011-07-05" "2011-07-10" "2011-07-11"

structure( sapply(KeyDates, "+", (0:1)), class = "Date")
[1] "2011-07-04" "2011-07-05" "2011-07-10" "2011-07-11"
Or:
as.Date( sapply(KeyDates, "+", (0:1)))
[1] "2011-07-04" "2011-07-05" "2011-07-10" "2011-07-11"
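The as.Date() wrapper is needed because sapply() simplifies its result to a plain numeric matrix, which drops the Date class. A minimal base R sketch of that step follows; the explicit origin argument is an addition here, on the assumption that older R versions need it when converting bare numbers back to dates.

```r
KeyDates <- as.Date(c("2011-07-04", "2011-07-10"))

# sapply() returns a numeric matrix (one column per key date); the Date class is gone
raw <- sapply(KeyDates, function(d) d + 0:1)
is.numeric(raw)

# as.vector() flattens column by column, so the dates stay in chronological order here
KeyDatesPlus <- as.Date(as.vector(raw), origin = "1970-01-01")
KeyDatesPlus
```

If the key dates were not already sorted, a final sort() would still be needed.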

KeyDates <- structure(c(15159,15165), class = "Date")
KeyDates.plus <- as.Date(sapply(KeyDates, function(x) c(x, x+1)))

An answer using the package lubridate:
library("lubridate")
your.vector <- c("2011-07-04", "2011-07-10")
your.vector <- parse_date_time(x = your.vector, orders = "ymd")
your.vector
# [1] "2011-07-04 UTC" "2011-07-10 UTC"
one.day <- days(x = 1)
one.day
# [1] "1d 0H 0M 0S"
your.vector + one.day
# [1] "2011-07-05 UTC" "2011-07-11 UTC"
# your exact desired output (non-UTC time zone can be specified in parse_date_time):
new.vector <- sort(x = c(your.vector, your.vector + one.day))
# [1] "2011-07-04 UTC" "2011-07-05 UTC" "2011-07-10 UTC" "2011-07-11 UTC"
lubridate distinguishes a "period" from a "duration".
A period is time on the clock (i.e., if a daylight saving time change happens, it's what the clock reads). That's what's specified here using days().
A duration is physical time (i.e., if a daylight saving time change happens, it's how long you've actually been sitting there). That could be specified instead using ddays().
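The same distinction can be seen without lubridate. The sketch below is a base R analogue, not lubridate's own implementation: adding 24 hours of seconds behaves like a duration, while stepping by "DSTday" keeps the clock time like a period. It assumes the "America/New_York" time zone is available on the system; US clocks moved forward on 2011-03-13.

```r
tz <- "America/New_York"
x <- as.POSIXct("2011-03-12 12:00:00", tz = tz)

# duration-like: exactly 24 elapsed hours
physical <- x + 24 * 60 * 60

# period-like: same clock time on the next calendar day
clock <- seq(x, by = "DSTday", length.out = 2)[2]

format(physical, "%Y-%m-%d %H:%M", tz = tz)
format(clock, "%Y-%m-%d %H:%M", tz = tz)
```

The two results differ by one hour because the day that contains the clock change is only 23 hours long.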

KeyDates <- structure(c(15159,15165), class = "Date")
KeyDatesPlus <- KeyDates+1
KeyDatesPlus <- sort(unique(c(KeyDates, KeyDatesPlus)))
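If more than one following day is ever needed, the same idea generalises. This is only a sketch; expand_dates and n_after are hypothetical names that do not appear in any of the answers.

```r
# Hypothetical helper: each date plus the n_after days that follow it
expand_dates <- function(dates, n_after = 1) {
  offsets <- 0:n_after
  # repeat each date once per offset, then add the recycled offsets
  out <- rep(dates, each = length(offsets)) + offsets
  sort(unique(out))
}

KeyDates <- as.Date(c("2011-07-04", "2011-07-10"))
expand_dates(KeyDates)
expand_dates(KeyDates, n_after = 2)
```

unique() only matters when the expanded ranges overlap, for example when two key dates are adjacent.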

Related

R reading in short date instead of long date [duplicate]

If a date vector has two-digit years, mdy() turns years between 00 and 68 into 21st Century years and years between 69 and 99 into 20th Century years. For example:
library(lubridate)
mdy(c("1/2/54","1/2/68","1/2/69","1/2/99","1/2/04"))
gives the following output:
Multiple format matches with 5 successes: %m/%d/%y, %m/%d/%Y.
Using date format %m/%d/%y.
[1] "2054-01-02 UTC" "2068-01-02 UTC" "1969-01-02 UTC" "1999-01-02 UTC" "2004-01-02 UTC"
I can fix this after the fact by subtracting 100 from the incorrect dates to turn 2054 and 2068 into 1954 and 1968. But is there a more elegant and less error-prone method of parsing two-digit dates so that they get handled correctly in the parsing process itself?
Update: After @JoshuaUlrich pointed me to strptime I found this question, which deals with an issue similar to mine, but using base R.
It seems like a nice addition to date handling in R would be some way to handle century selection cutoffs for two-digit dates within the date parsing functions.
Here is a function that allows you to do this:
library(lubridate)
x <- mdy(c("1/2/54","1/2/68","1/2/69","1/2/99","1/2/04"))
foo <- function(x, year=1968){
  m <- year(x) %% 100
  year(x) <- ifelse(m > year %% 100, 1900+m, 2000+m)
  x
}
Try it out:
x
[1] "2054-01-02 UTC" "2068-01-02 UTC" "1969-01-02 UTC" "1999-01-02 UTC"
[5] "2004-01-02 UTC"
foo(x)
[1] "2054-01-02 UTC" "2068-01-02 UTC" "1969-01-02 UTC" "1999-01-02 UTC"
[5] "2004-01-02 UTC"
foo(x, 1950)
[1] "1954-01-02 UTC" "1968-01-02 UTC" "1969-01-02 UTC" "1999-01-02 UTC"
[5] "2004-01-02 UTC"
The bit of magic here is to use the modulus operator %% to return the remainder of a division. So 1968 %% 100 yields 68.
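The cutoff logic can also be applied before the date is built, which avoids correcting it afterwards. Below is a base R sketch under the assumption that the input is always month/day/two-digit-year separated by slashes; parse_mdy_cutoff and its cutoff argument are hypothetical names, not part of lubridate.

```r
# Hypothetical helper: two-digit years above `cutoff` become 19xx, the rest 20xx
parse_mdy_cutoff <- function(x, cutoff = 68) {
  parts <- do.call(rbind, strsplit(x, "/", fixed = TRUE))
  mm <- as.integer(parts[, 1])
  dd <- as.integer(parts[, 2])
  yy <- as.integer(parts[, 3])
  yyyy <- ifelse(yy > cutoff, 1900L + yy, 2000L + yy)
  as.Date(sprintf("%04d-%02d-%02d", yyyy, mm, dd))
}

dates <- c("1/2/54", "1/2/68", "1/2/69", "1/2/99", "1/2/04")
parse_mdy_cutoff(dates)               # same split as the default parsing
parse_mdy_cutoff(dates, cutoff = 50)  # 54 and 68 now land in the 1900s
```

Unlike foo(), this returns Date objects rather than POSIXct, which is usually enough when there is no time of day.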
I just experienced this exact same bug / feature.
I ended up writing the following two quick functions to help convert from Excel-type dates (which is where I get this most) to something R can use.
There's nothing wrong with the accepted answer -- it's just that I prefer not to load up on packages too much.
First, a helper to split and replace the years ...
year1900 <- function(dd_y, yrFlip = 50)
{
  dd_y <- as.numeric(dd_y)
  # years at or above the cutoff become 19xx, the rest 20xx;
  # >= makes sure a year equal to yrFlip is converted too
  ifelse(dd_y >= yrFlip, dd_y + 1900, dd_y + 2000)
}
which is used by a function that 'fixes' your Excel dates, depending on type:
XLdate <- function(Xd, type = 'b-Y')
{
  switch(type,
    # month and four-digit year, e.g. "Jan-2012"
    'b-Y' = as.Date(paste0(substr(Xd, 5, 9), "-", substr(Xd, 1, 3), "-01"), format = "%Y-%b-%d"),
    # month and two-digit year, e.g. "Jan-12"
    'b-y' = as.Date(paste0(year1900(substr(Xd, 5, 6)), "-", substr(Xd, 1, 3), "-01"),
                    format = "%Y-%b-%d"),
    # four-digit year and month, assumed to look like "2012-Jan"
    'Y-b' = as.Date(paste0(substr(Xd, 1, 4), "-", substr(Xd, 6, 8), "-01"), format = "%Y-%b-%d")
  )
}
Hope this helps.
Another option would be:
xxx <- c("01-Jan-54", "01-Feb-68", "01-Aug-69", "01-May-99", "01-Jun-04",
         "31-Dec-68", "01-Jan-69", "31-Dec-99")
# tt holds the two-digit year; years above 20 are treated as 19xx
tt <- sub("\\d\\d-\\D\\D\\D-", "", xxx)
dmy(paste0(sub("\\d\\d$", "", xxx),
           ifelse(as.integer(tt) > 20, paste0("19", tt), paste0("20", tt))))
Though none of these solutions is elegant or short.
I think it would be better if lubridate just added an option to specify the cutoff year.

Plot a histogram of yearly counts

I have a csv file that consists of one column. The column presents the date of posting on a website. I want to plot a histogram to see how the number of posts varies over the years. The file contains the years (2012 to 2016) and consists of 11,000 rows.
sample of the file:
2 30/1/12 21:07
3 2/2/12 15:53
4 3/4/12 0:49
5 14/11/12 3:49
6 11/8/13 16:00
7 31/7/14 8:08
8 31/7/14 10:48
9 6/8/14 9:24
10 16/12/14 3:34
The data is stored in a data frame:
class(postsData)
[1] "data.frame"
I tried converting the text to dates using the strptime function as below:
formatDate <- strptime(as.character(postsData$Date),format="“%d/%m/%y")
then plot the histogram
hist(formatDate,breaks=10,xlab="year")
Any tip or suggestion would be useful. Thank you,
Use lubridate::dmy_hm().
In my opinion, strptime() is overly complicated compared to lubridate.
library(lubridate)
d <- c("30/1/12 21:07",
"2/2/12 15:53",
"3/4/12 0:49",
"14/11/12 3:49",
"11/8/13 16:00",
"31/7/14 8:08",
"31/7/14 10:48",
"6/8/14 9:24",
"16/12/14 3:34")
d2 <- dmy_hm(d)
d2
Returns:
[1] "2012-01-30 21:07:00 UTC"
[2] "2012-02-02 15:53:00 UTC"
[3] "2012-04-03 00:49:00 UTC"
[4] "2012-11-14 03:49:00 UTC"
[5] "2013-08-11 16:00:00 UTC"
[6] "2014-07-31 08:08:00 UTC"
[7] "2014-07-31 10:48:00 UTC"
[8] "2014-08-06 09:24:00 UTC"
[9] "2014-12-16 03:34:00 UTC"
As you can see, lubridate functions return POSIXct objects.
class(d2)
[1] "POSIXct" "POSIXt"
Next, you can use lubridate::year() to get the year of each POSIXct object returned by dmy_hm(), and plot that histogram.
hist(year(d2))
Here's one approach. I think your date conversion is fine, but you need to count the number of dates that occur in each year and then plot those counts as a column chart.
library(tidyverse)
# generate some data
date.seq <- tibble(xdate = seq(from = lubridate::ymd_hms('2000-01-01 00:00:00'),
                               to = lubridate::ymd_hms('2016-12-31 23:59:59'),
                               length.out = 100))
date.seq %>%
  mutate(xyear = lubridate::year(xdate)) %>% # add a column of years
  group_by(xyear) %>%
  summarise(date_count = length(xdate)) %>% # count the number of dates that occur in each year
  ggplot(aes(x = xyear, y = date_count)) +
  geom_col(colour = 'black', fill = 'blue') # plot as a column graph
There's no problem with strptime()*; however, the format option is intended to specify how the input is formatted.
df1$date <- strptime(df1$date, format="%d/%m/%y %H:%M")
# [1] "2012-01-30 21:07:00 CET" "2012-02-02 15:53:00 CET"
# [3] "2012-04-03 00:49:00 CEST" "2012-11-14 03:49:00 CET"
# [5] "2013-08-11 16:00:00 CEST" "2014-07-31 08:08:00 CEST"
# [7] "2014-07-31 10:48:00 CEST" "2014-08-06 09:24:00 CEST"
# [9] "2014-12-16 03:34:00 CET"
What you probably want then is to drop the time of day with as.Date(). (format(df1$date, format="%F") gives the same dates, but as a character vector, which hist() cannot bin.)
formatDate <- as.Date(df1$date)
and then
hist(formatDate, breaks=10, xlab="year")
* credits to @MikkoMarttila
Data
df1 <- structure(list(id = 2:10, date = c("30/1/12 21:07", "2/2/12 15:53",
"3/4/12 0:49", "14/11/12 3:49", "11/8/13 16:00", "31/7/14 8:08",
"31/7/14 10:48", "6/8/14 9:24", "16/12/14 3:34")), class = "data.frame", row.names = c(NA,
-9L))
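For completeness, the yearly counts can also be computed with base R alone. This is a sketch using the sample rows from the question; parsing in UTC is an assumption made here so that daylight saving changes cannot affect the result.

```r
d <- c("30/1/12 21:07", "2/2/12 15:53", "3/4/12 0:49", "14/11/12 3:49",
       "11/8/13 16:00", "31/7/14 8:08", "31/7/14 10:48", "6/8/14 9:24",
       "16/12/14 3:34")

# the format string describes the input: day/month/two-digit year hour:minute
parsed <- as.POSIXct(d, format = "%d/%m/%y %H:%M", tz = "UTC")

# one count per calendar year
year_counts <- table(format(parsed, "%Y"))
year_counts
```

Passing year_counts to barplot() then draws one bar per year, which avoids the arbitrary bin edges that hist() picks.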

Get date of timeseries object

I have a separately created time series object with daily frequency:
my.timeseries= ts(data= 1:10, start= c(2014,1,1), frequency = 365.25)
How can I get back the dates as POSIXct vector ("2014-01-01 UTC" ...) from this time series object?
Here's one potential method. I'm not really sure if it should be done this way, but it seems to work.
With your existing time series, try
p <- paste(attr(my.timeseries, "tsp")[1], my.timeseries)
as.POSIXct(as.Date(p, "%Y %j"))
# [1] "2014-01-01 UTC" "2014-01-02 UTC" "2014-01-03 UTC"
# [4] "2014-01-04 UTC" "2014-01-05 UTC" "2014-01-06 UTC"
# [7] "2014-01-07 UTC" "2014-01-08 UTC" "2014-01-09 UTC"
# [10] "2014-01-10 UTC"
As noted by G. Grothendieck in the comments, here is a more general solution
p <- paste(start(my.timeseries), seq_along(my.timeseries))
as.Date(p, "%Y %j")
# [1] "2014-01-01" "2014-01-02" "2014-01-03" "2014-01-04"
# [5] "2014-01-05" "2014-01-06" "2014-01-07" "2014-01-08"
# [9] "2014-01-09" "2014-01-10"
as.Date might be better to avoid any time-zone issues.
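The "%Y %j" approach relies on day-of-year numbers, so it stops working once the series runs past the end of its first year. A sketch that avoids this is below; it assumes, as in the question, that the series starts on 1 January and has one observation per calendar day.

```r
my.timeseries <- ts(data = 1:10, start = c(2014, 1), frequency = 365.25)

# first day of the start year (assumes the series begins on 1 January)
first_day <- as.Date(paste0(floor(start(my.timeseries)[1]), "-01-01"))

# one calendar day per observation
dates <- seq(first_day, by = "day", length.out = length(my.timeseries))
dates
```

If a POSIXct vector is really needed, as.POSIXct(dates) converts the result to midnight UTC.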
I strongly advise you to use an xts object instead of ts.
Here is code that replicates what you want:
library(xts)
my.index = seq(from = as.Date("2014-01-01"), by = "day", length.out = 10)
my.timeseries = xts(x = 1:10, order.by = my.index)
index(my.timeseries)
Let us know if that helps :)
Romain

R sequence of dates with lubridate

Hi I'm trying to get a sequence of dates with lubridate
This doesn't work
seq(ymd('2012-04-07'),ymd('2013-03-22'),by=week(1))
the base command
seq(as.Date('2012-04-7'),as.Date('2013-03-22'),'weeks')
does, but I'd like to know if there is an elegant way to do this with lubridate.
EDIT
Please ignore : solved myself so leaving up for posterity only. Happy to have this deleted if necessary.
seq(ymd('2012-04-07'),ymd('2013-03-22'),by='weeks')
Does the trick
ymd is a wrapper to parse date strings and returns a POSIXct object.
You simply need to use standard terminology described in ?seq.POSIXt (not lubridate) to define weeks
seq(ymd('2012-04-07'),ymd('2013-03-22'), by = '1 week')
seq(ymd('2012-04-07'),ymd('2013-03-22'), by = 'weeks')
will work
as will
seq(ymd('2012-04-07'),ymd('2013-03-22'), by = '2 week')
You could coerce the lubridate Period class object to a difftime, but that seems rather unnecessary
seq(ymd('2012-04-07'),ymd('2013-03-22'), by = as.difftime(weeks(1)))
This is a way to stick within the POSIXct universe of lubridate and not change date formats to base R's POSIXt. I avoid changing the date format in my scripts because I find it is a common place where bugs (for example time-zone changes or losing timestamps) are introduced. It follows the advice in the question "R: adding 1 month to a date" to use %m+%.
# example date is a leap day for a "worst case scenario"
library("lubridate")
posixct.in <- parse_date_time(x = "2016-02-29", orders = "ymd")
# [1] "2016-02-29 UTC"
posixct.seq <- posixct.in %m+% years(x = seq.int(from = 0, to = 3, by = 1))
# [1] "2016-02-29 UTC" "2017-02-28 UTC" "2018-02-28 UTC" "2019-02-28 UTC"
posixct.seq <- posixct.in %m+% months(x = seq.int(from = 0, to = 3, by = 1))
# [1] "2016-02-29 UTC" "2016-03-29 UTC" "2016-04-29 UTC" "2016-05-29 UTC"
posixct.seq <- posixct.in %m+% days(x = seq.int(from = 0, to = 3, by = 1))
# [1] "2016-02-29 UTC" "2016-03-01 UTC" "2016-03-02 UTC" "2016-03-03 UTC"
posixct.seq <- posixct.in %m+% weeks(x = seq.int(from = 0, to = 3, by = 1))
# [1] "2016-02-29 UTC" "2016-03-07 UTC" "2016-03-14 UTC" "2016-03-21 UTC"
A regular + also works sometimes, but the %m+% prevents errors like this:
posixct.seq <- posixct.in + years(x = seq.int(from = 0, to = 3, by = 1))
# [1] "2016-02-29 UTC" NA NA NA
At first I was confused because I thought %m+% was just a way to add months, and similar lubridate operators like %y+% do not exist. But it turns out the "m" doesn't stand for "month addition". My best guess is "magic" =)
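To make the rollback behaviour concrete, here is a base R sketch of the same idea for years. It is not lubridate's implementation; add_years_rollback is a hypothetical name, and the sketch assumes Date input. Base R on its own rolls an impossible date forward into the next month, so the helper steps back to the last day of the intended month.

```r
# Hypothetical helper: add n years, rolling back to the month end when the day does not exist
add_years_rollback <- function(date, n) {
  lt <- as.POSIXlt(date)
  target_mon <- lt$mon           # month we want to stay in (0-based)
  lt$year <- lt$year + n
  out <- as.Date(lt)             # 29 Feb in a non-leap year is normalised to 1 March
  spilled <- as.POSIXlt(out)$mon != target_mon
  # subtracting the day of month lands on the last day of the previous month
  out[spilled] <- out[spilled] - as.POSIXlt(out[spilled])$mday
  out
}

leap.day <- as.Date("2016-02-29")
seq(leap.day, by = "year", length.out = 4)  # base R rolls forward
add_years_rollback(leap.day, 0:3)           # rolls back instead
```

The first call shows the base R behaviour and the second the result of the rollback helper.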
