I have a separately created time series object with daily frequency:
my.timeseries= ts(data= 1:10, start= c(2014,1,1), frequency = 365.25)
How can I get back the dates as POSIXct vector ("2014-01-01 UTC" ...) from this time series object?
Here's one potential method. I'm not really sure if it should be done this way, but it seems to work.
With your existing time series, try
p <- paste(attr(my.timeseries, "tsp")[1], my.timeseries)
as.POSIXct(as.Date(p, "%Y %j"))
# [1] "2014-01-01 UTC" "2014-01-02 UTC" "2014-01-03 UTC"
# [4] "2014-01-04 UTC" "2014-01-05 UTC" "2014-01-06 UTC"
# [7] "2014-01-07 UTC" "2014-01-08 UTC" "2014-01-09 UTC"
# [10] "2014-01-10 UTC"
As noted by G. Grothendieck in the comments, here is a more general solution that uses each observation's position, rather than its value, as the day of the year:
p <- paste(start(my.timeseries), seq_along(my.timeseries))
as.Date(p, "%Y %j")
# [1] "2014-01-01" "2014-01-02" "2014-01-03" "2014-01-04"
# [5] "2014-01-05" "2014-01-06" "2014-01-07" "2014-01-08"
# [9] "2014-01-09" "2014-01-10"
as.Date might be better to avoid any time-zone issues.
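Note that "%j" only accepts day numbers up to 366, so pasting positions onto the start year stops working once a series runs past the end of its first year. A minimal sketch that avoids this by adding day offsets to a start date, assuming (as in the question) that the series begins on 1 January of its start year and has one observation per calendar day; the names first_day, dates and dates_ct are illustrative only:

```r
# Same series as in the question (start given as year and position)
my.timeseries <- ts(data = 1:10, start = c(2014, 1), frequency = 365.25)

# Assumption: the first observation falls on 1 January of the start year
start_year <- floor(tsp(my.timeseries)[1])
first_day <- as.Date(paste0(start_year, "-01-01"))

# One calendar day per observation, counted from the first day
dates <- first_day + (seq_along(my.timeseries) - 1)

# Convert to POSIXct at midnight UTC if a date-time class is required
dates_ct <- as.POSIXct(format(dates), tz = "UTC")
```

Because the offsets are plain day counts, this keeps working across year boundaries.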
I strongly advise you to use an xts object instead of ts.
Here is code that replicates what you want:
library(xts)
my.index = seq(from = as.Date("2014-01-01"), by = "day", length.out = 10)
my.timeseries = xts(x = 1:10, order.by = my.index)
index(my.timeseries)
Let us know if that helps :)
Romain
I have a csv file that consists of one column. The column presents the date of posting on a website. I want to plot a histogram to see how the number of posts varies over the years. The file contains the years (2012 to 2016) and consists of 11,000 rows.
sample of the file:
2 30/1/12 21:07
3 2/2/12 15:53
4 3/4/12 0:49
5 14/11/12 3:49
6 11/8/13 16:00
7 31/7/14 8:08
8 31/7/14 10:48
9 6/8/14 9:24
10 16/12/14 3:34
The data type is a data frame:
class(postsData)
[1] "data.frame"
I tried converting the data using the strptime function, as below:
formatDate <- strptime(as.character(postsData$Date),format="“%d/%m/%y")
then plot the histogram
hist(formatDate,breaks=10,xlab="year")
Any tip or suggestion would be useful. Thank you,
use lubridate::dmy_hm()
strptime() is overly complicated in my opinion compared to { lubridate }.
library(lubridate)
d <- c("30/1/12 21:07",
"2/2/12 15:53",
"3/4/12 0:49",
"14/11/12 3:49",
"11/8/13 16:00",
"31/7/14 8:08",
"31/7/14 10:48",
"6/8/14 9:24",
"16/12/14 3:34")
d2 <- dmy_hm(d)
d2
Returns:
[1] "2012-01-30 21:07:00 UTC"
[2] "2012-02-02 15:53:00 UTC"
[3] "2012-04-03 00:49:00 UTC"
[4] "2012-11-14 03:49:00 UTC"
[5] "2013-08-11 16:00:00 UTC"
[6] "2014-07-31 08:08:00 UTC"
[7] "2014-07-31 10:48:00 UTC"
[8] "2014-08-06 09:24:00 UTC"
[9] "2014-12-16 03:34:00 UTC"
As you can see, lubridate functions return POSIXct objects.
class(d2)
[1] "POSIXct" "POSIXt"
Next, you can use lubridate::year() to get the year of each POSIXct object returned by dmy_hm(), and plot that histogram.
hist(year(d2))
Here's one approach. I think your date conversion is fine, but you need to count the number of dates that occur in each year and then plot those counts.
library(tidyverse)
# generate some data
date.seq <- tibble(xdate = seq(from = lubridate::ymd_hms('2000-01-01 00:00:00'), to = lubridate::ymd_hms('2016-12-31 23:59:59'), length.out = 100))
date.seq %>%
mutate(xyear = lubridate::year(xdate)) %>% # add a column of years
group_by(xyear) %>%
summarise(date_count = length(xdate)) %>% # Count the number of dates that occur in each year
ggplot(aes(x = xyear, y = date_count)) +
geom_col(colour = 'black', fill = 'blue') # plot as a column graph
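The counting step can also be sketched in base R, without the tidyverse. This is a minimal illustration using the nine sample rows from the question; the names posted, parsed and posts_per_year are made up for the example, and the UTC time zone is an assumption:

```r
# Sample values copied from the question
posted <- c("30/1/12 21:07", "2/2/12 15:53", "3/4/12 0:49",
            "14/11/12 3:49", "11/8/13 16:00", "31/7/14 8:08",
            "31/7/14 10:48", "6/8/14 9:24", "16/12/14 3:34")

# Parse day/month/two-digit-year plus hours and minutes (time zone assumed)
parsed <- as.POSIXct(posted, format = "%d/%m/%y %H:%M", tz = "UTC")

# Count how many posts fall in each calendar year
posts_per_year <- table(format(parsed, "%Y"))
```

Calling barplot(posts_per_year) would then draw one bar per year.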
There's no problem with strptime()*; however, the format option is intended to specify how the input is formatted.
df1$date <- strptime(df1$date, format="%d/%m/%y %H:%M")
# [1] "2012-01-30 21:07:00 CET" "2012-02-02 15:53:00 CET"
# [3] "2012-04-03 00:49:00 CEST" "2012-11-14 03:49:00 CET"
# [5] "2013-08-11 16:00:00 CEST" "2014-07-31 08:08:00 CEST"
# [7] "2014-07-31 10:48:00 CEST" "2014-08-06 09:24:00 CEST"
# [9] "2014-12-16 03:34:00 CET"
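What you probably want then is to drop the time of day. The format() function can do that, but it returns character strings, which hist() cannot plot:
formatDate <- format(df1$date, format="%F")
so in this case it is simpler to convert with formatDate <- as.Date(df1$date)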
and then
hist(formatDate, breaks=10, xlab="year")
* credits to @MikkoMarttila
Data
df1 <- structure(list(id = 2:10, date = c("30/1/12 21:07", "2/2/12 15:53",
"3/4/12 0:49", "14/11/12 3:49", "11/8/13 16:00", "31/7/14 8:08",
"31/7/14 10:48", "6/8/14 9:24", "16/12/14 3:34")), class = "data.frame", row.names = c(NA,
-9L))
I have looked at this answer which lists all dates between time points. Ideally I would like to state start and end dates, and the number of elements I'd want in the vector, and get back random dates including replicates.
start_date <- as.Date('2015-01-01')
end_date <- as.Date('2017-01-01')
set.seed(1984)
as.Date(sample(as.numeric(start_date):as.numeric(end_date), 10,
               replace = TRUE),
        origin = '1970-01-01')
[1] "2016-04-27" "2015-11-16" "2015-10-01" "2015-08-31" "2016-06-23"
[6] "2016-09-23" "2015-01-24" "2015-11-24" "2016-08-30" "2015-06-07"
Using the sample and seq functions as in Generating Random Dates seems to me the more straightforward approach (add replace = TRUE to sample() if replicates should be allowed):
set.seed(1984)
sample(seq(as.Date('2015-01-01'), as.Date('2017-01-01'), by = "day"), 10)
Output:
[1] "2016-04-27" "2015-11-16" "2015-09-30" "2015-08-30" "2016-06-20"
[6] "2016-09-19" "2015-01-24" "2015-11-21" "2016-08-23" "2015-06-05"
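Since the question asks for a start date, an end date and a count, with replicates allowed, the idea can be wrapped in a small helper. This is only a sketch; random_dates is a hypothetical name, not a function from any package:

```r
# Hypothetical helper: n random dates between start and end (inclusive),
# sampled with replacement so that replicates can occur
random_dates <- function(start, end, n) {
  n_days <- as.integer(end) - as.integer(start)   # whole days in the range
  start + sample.int(n_days + 1L, n, replace = TRUE) - 1L
}

set.seed(1984)
d <- random_dates(as.Date("2015-01-01"), as.Date("2017-01-01"), 10)

# With a three-day range and ten draws, replicates are unavoidable
d_small <- random_dates(as.Date("2015-01-01"), as.Date("2015-01-03"), 10)
```

Sampling day offsets instead of the dates themselves avoids building the full sequence first.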
If we are working with the class POSIXct and we'd like randomized hours, minutes, and seconds, not just dates:
set.seed(1984)
sample(seq(as.POSIXct('2015-01-01'), as.POSIXct('2017-01-01'), by = "sec"), 10)
Output:
[1] "2016-04-26 15:04:13 CEST" "2015-11-16 10:17:23 CET"
[3] "2015-09-30 22:50:41 CEST" "2015-08-30 23:17:49 CEST"
[5] "2016-06-23 04:49:01 CEST" "2016-09-22 14:37:58 CEST"
[7] "2015-01-24 17:04:13 CET" "2015-11-24 07:13:42 CET"
[9] "2016-08-29 16:13:13 CEST" "2015-06-06 21:29:18 CEST"
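Building a by-second sequence over two years allocates a vector of tens of millions of elements before sampling from it. A lighter sketch draws the epoch seconds directly; random_times is a hypothetical helper name, and the UTC time zone is an assumption made for the example:

```r
# Hypothetical helper: n random date-times between start and end,
# drawn uniformly on the number of seconds since 1970-01-01
random_times <- function(start, end, n) {
  secs <- runif(n, min = as.numeric(start), max = as.numeric(end))
  as.POSIXct(floor(secs), origin = "1970-01-01", tz = "UTC")  # whole seconds
}

set.seed(1984)
start_time <- as.POSIXct("2015-01-01", tz = "UTC")
end_time <- as.POSIXct("2017-01-01", tz = "UTC")
times <- random_times(start_time, end_time, 10)
```

The drawn values will differ from the sample() output above even with the same seed, because a different random number routine is used.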
I have a vector of dates called KeyDates containing two key dates. I would like to make a new vector of dates called KeyDatesPlus containing those two key dates and the day after each of them, in chronological order.
KeyDates <- structure(c(15159,15165), class = "Date")
#KeyDates Output:
[1] "2011-07-04" "2011-07-10"
#desired output for KeyDatesPlus:
[1] "2011-07-04" "2011-07-05" "2011-07-10" "2011-07-11"
How could I achieve that? Thank you very much.
sort(c(KeyDates, KeyDates + 1))
[1] "2011-07-04" "2011-07-05" "2011-07-10" "2011-07-11"
structure( sapply(KeyDates, "+", (0:1)), class = "Date")
[1] "2011-07-04" "2011-07-05" "2011-07-10" "2011-07-11"
Or:
as.Date( sapply(KeyDates, "+", (0:1)))
[1] "2011-07-04" "2011-07-05" "2011-07-10" "2011-07-11"
KeyDates <- structure(c(15159,15165), class = "Date")
KeyDates.plus <- as.Date(sapply(KeyDates, function(x) c(x, x+1)))
An answer using the package lubridate:
library("lubridate")
your.vector <- c("2011-07-04", "2011-07-10")
your.vector <- parse_date_time(x = your.vector, orders = "ymd")
your.vector
# [1] "2011-07-04 UTC" "2011-07-10 UTC"
one.day <- days(x = 1)
one.day
# [1] "1d 0H 0M 0S"
your.vector + one.day
# [1] "2011-07-05 UTC" "2011-07-11 UTC"
# your exact desired output (non-UTC time zone can be specified in parse_date_time):
new.vector <- sort(x = c(your.vector, your.vector + one.day))
# [1] "2011-07-04 UTC" "2011-07-05 UTC" "2011-07-10 UTC" "2011-07-11 UTC"
Lubridate distinguishes a "period" from a "duration."
A period is the time on the clock (i.e. if daylight saving time happens, it's what the clock reads). That's what's specified here using days().
A duration is the physical time (i.e. if daylight saving time happens, it's how long you've actually been sitting there). That could be specified instead using ddays().
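The same distinction can be seen in base R, without lubridate. The sketch below assumes the "America/Chicago" zone is available in the local time zone database and uses the 2016 spring-forward weekend as the example; the variable names are illustrative:

```r
# Noon on the day before US daylight saving time began in 2016
x <- as.POSIXct("2016-03-12 12:00:00", tz = "America/Chicago")

# Duration-like: exactly 24 hours of physical time later
plus_duration <- x + 24 * 60 * 60

# Period-like: the same clock time on the next calendar day
plus_period <- seq(x, by = "DSTday", length.out = 2)[2]

format(plus_duration, "%Y-%m-%d %H:%M")  # clock reads one hour later
format(plus_period, "%Y-%m-%d %H:%M")    # clock reads noon again
```

In lubridate terms, the first result corresponds to x + ddays(1) and the second to x + days(1).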
KeyDates <- structure(c(15159,15165), class = "Date")
KeyDatesPlus <- KeyDates+1
KeyDatesPlus <- sort(unique(c(KeyDates, KeyDatesPlus)))
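If more than one following day is wanted, the same idea generalises. A minimal sketch, where k (the number of days to add after each key date) is a parameter introduced here for illustration:

```r
KeyDates <- structure(c(15159, 15165), class = "Date")

# k = number of following days to include after each key date (illustrative)
k <- 1

# Repeat each key date k + 1 times and add the offsets 0, 1, ..., k
offsets <- rep(0:k, times = length(KeyDates))
KeyDatesPlus <- sort(unique(rep(KeyDates, each = k + 1) + offsets))
```

With k <- 1 this reproduces the desired output from the question; unique() only matters when the windows around two key dates overlap.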
I am seeing an unexpected result when using the lubridate package in R. I am simply trying to combine two dates into a vector. When I do so, the time zone changes. What is happening here?
> x <- ymd("2016-02-08")
> y <- ymd("2016-03-29")
> x
[1] "2016-02-08 UTC"
> y
[1] "2016-03-29 UTC"
> c(x,y)
[1] "2016-02-07 18:00:00 CST" "2016-03-28 19:00:00 CDT"
In versions of R before 4.1.0, using c() removes the time zone attribute (newer versions keep it when all arguments share the same time zone). Hence you have to reassign it:
xy <- c(x,y)
attr(xy, "tzone") <- "UTC"
> xy
[1] "2016-02-08 UTC" "2016-03-29 UTC"
Source and more information: Peter Ehlers on R Help
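The reassignment can be wrapped in a small helper so that it is not forgotten. This is a sketch only; combine_keep_tz is a hypothetical name, and it assumes every argument is a POSIXct vector that should be displayed in the time zone of the first one:

```r
# Hypothetical helper: combine POSIXct vectors and restore the time zone
# of the first argument on the result
combine_keep_tz <- function(...) {
  parts <- list(...)
  tz <- attr(parts[[1]], "tzone")
  out <- do.call(c, parts)
  if (!is.null(tz)) attr(out, "tzone") <- tz
  out
}

x <- as.POSIXct("2016-02-08", tz = "UTC")
y <- as.POSIXct("2016-03-29", tz = "UTC")
xy <- combine_keep_tz(x, y)
```

Only the display time zone changes; the underlying instants are the same either way.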
Hi I'm trying to get a sequence of dates with lubridate
This doesn't work
seq(ymd('2012-04-07'),ymd('2013-03-22'),by=week(1))
the base command
seq(as.Date('2012-04-7'),as.Date('2013-03-22'),'weeks')
does, but I'd like to know if there is an elegant way to do this with lubridate.
EDIT
Please ignore : solved myself so leaving up for posterity only. Happy to have this deleted if necessary.
seq(ymd('2012-04-07'),ymd('2013-03-22'),by='weeks')
Does the trick
ymd is a wrapper to parse date strings and returns a POSIXct object.
You simply need to use standard terminology described in ?seq.POSIXt (not lubridate) to define weeks
seq(ymd('2012-04-07'),ymd('2013-03-22'), by = '1 week')
seq(ymd('2012-04-07'),ymd('2013-03-22'), by = 'weeks')
will work
as will
seq(ymd('2012-04-07'),ymd('2013-03-22'), by = '2 week')
You could coerce the lubridate Period class object to a difftime, but that seems rather unnecessary
seq(ymd('2012-04-07'),ymd('2013-03-22'), by = as.difftime(weeks(1)))
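For reference, the by strings accepted by seq() for date-times are a unit name, optionally preceded by an integer and a space, and optionally followed by "s". A short base R sketch (no lubridate needed; the UTC time zone and the variable names are assumptions for the example):

```r
start <- as.POSIXct("2012-04-07", tz = "UTC")
end <- as.POSIXct("2013-03-22", tz = "UTC")

# Weekly steps; the sequence stops at the last step that does not pass `end`
weekly <- seq(start, end, by = "week")

# An integer prefix changes the step size
fortnightly <- seq(start, end, by = "2 weeks")

# A negative prefix steps backwards; length.out replaces the end point
backwards <- seq(end, by = "-1 week", length.out = 3)
```

Note that the end date itself is only included when it falls exactly on a step.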
This is a way to stick within the POSIXct universe of lubridate and not change date formats to base R's POSIXt. I avoid changing the date format in my scripts because I find it is a common place where bugs (for example time-zone changes or losing timestamps) are introduced. It follows this advice to use %m+%: R: adding 1 month to a date
# example date is a leap day for a "worst case scenario"
library("lubridate")
posixct.in <- parse_date_time(x = "2016-02-29", orders = "ymd")
# [1] "2016-02-29 UTC"
posixct.seq <- posixct.in %m+% years(x = seq.int(from = 0, to = 3, by = 1))
# [1] "2016-02-29 UTC" "2017-02-28 UTC" "2018-02-28 UTC" "2019-02-28 UTC"
posixct.seq <- posixct.in %m+% months(x = seq.int(from = 0, to = 3, by = 1))
# [1] "2016-02-29 UTC" "2016-03-29 UTC" "2016-04-29 UTC" "2016-05-29 UTC"
posixct.seq <- posixct.in %m+% days(x = seq.int(from = 0, to = 3, by = 1))
# [1] "2016-02-29 UTC" "2016-03-01 UTC" "2016-03-02 UTC" "2016-03-03 UTC"
posixct.seq <- posixct.in %m+% weeks(x = seq.int(from = 0, to = 3, by = 1))
# [1] "2016-02-29 UTC" "2016-03-07 UTC" "2016-03-14 UTC" "2016-03-21 UTC"
A regular + also works sometimes, but the %m+% prevents errors like this:
posixct.seq <- posixct.in + years(x = seq.int(from = 0, to = 3, by = 1))
# [1] "2016-02-29 UTC" NA NA NA
At first I was confused because I thought %m+% was just a way to add months, and similar lubridate commands like %y+% etc. do not exist. But, turns out the "m" doesn't stand for "month addition". My best guess is "magic" =)
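What %m+% does differently from + is roll the result back to the last day of the month when the target day does not exist. A rough base R sketch of that rollback rule, for a single Date and a whole number of months; add_months_clamped is a hypothetical name and this is not lubridate's implementation:

```r
# Hypothetical helper: add n months to one Date, clamping the day of month
# to the last valid day of the target month
add_months_clamped <- function(date, n) {
  lt <- as.POSIXlt(date)
  day <- lt$mday                      # original day of month
  lt$mday <- 1L                       # move to the first of the month
  lt$mon <- lt$mon + as.integer(n)    # as.Date() normalises month overflow
  target_first <- as.Date(lt)         # first day of the target month
  lt$mon <- lt$mon + 1L
  next_first <- as.Date(lt)           # first day of the following month
  last_day <- as.integer(format(next_first - 1, "%d"))
  target_first + (min(day, last_day) - 1L)
}

leap_day <- as.Date("2016-02-29")
one_year_on <- add_months_clamped(leap_day, 12)
one_month_on <- add_months_clamped(leap_day, 1)
```

Here one_year_on lands on 28 February 2017 rather than becoming NA, which matches the behaviour shown for %m+% above.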