I have a matrix X where each column represents a time series. Each row doesn't represent a year, month or day, but rather a second. I'd like to use the xts package but have the dates just be 1,2,3,..., nrow(X), i.e. from 1 to the last second in the series, since each row is one second ahead of the previous. Is this possible? I can't seem to figure it out.
1) This can be done with zoo:
library(zoo)
X <- matrix(1:6, 3) + 100 # test data
zoo(X)
giving the following (the 1, 2, 3 column is the times):
1 101 104
2 102 105
3 103 106
2) xts does not support raw numbers as times (see ?xts) but you could use the fact that the "POSIXct" class is expressed in seconds internally. It will show up as POSIXct date times but internally it will be the seconds you asked for:
library(xts)
x <- xts(X, as.POSIXct(1:nrow(X), origin = "1970-01-01"))
giving:
> x
[,1] [,2]
1969-12-31 19:00:01 101 104
1969-12-31 19:00:02 102 105
1969-12-31 19:00:03 103 106
> unclass(time(x))
[1] 1 2 3
attr(,"tzone")
[1] ""
attr(,"tclass")
[1] "POSIXct" "POSIXt"
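The printout above starts at 19:00:01 on 1969-12-31 only because the times are displayed in the local time zone of the machine that produced it. Here is a minimal sketch in base R of the same idea, assuming a `tz = "UTC"` argument (not part of the original answer) so the display is the same everywhere. The resulting index could then be passed to xts() in place of the one used above.

```r
# Build a POSIXct index whose underlying values are 1, 2, ..., n seconds.
# tz = "UTC" is an assumption added here so that the printout does not
# depend on the local time zone.
n <- 3
idx <- as.POSIXct(1:n, origin = "1970-01-01", tz = "UTC")
shown <- format(idx, "%Y-%m-%d %H:%M:%S")
shown

# The second counts can be recovered at any time with as.numeric().
secs <- as.numeric(idx)
secs
```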
I have some dates that I am trying to convert to numbers and then back to the original dates.
Date
1990-12-31
1991-12-31
1992-12-31
1993-12-31
1994-06-30
1994-12-31
I tried,
as.numeric(DF[1:6])
[1] 1 2 3 5 7
as.Date(as.numeric(DF[1:6]), "1990-12-31")
[1] "1991-01-01" "1991-01-02" "1991-01-03" "1991-01-05" "1991-01-07" "1991-01-08"
I can see that the time intervals are wrong. What should I do to get the original dates back?
If what you have is a data frame with a column of class factor as shown reproducibly in the Note at the end then we do not want to apply as.numeric to that since that will just give the underlying codes for the factor levels which are not meaningful. Rather, this gives Date class:
d <- as.Date(DF$Date)
d
## [1] "1990-12-31" "1991-12-31" "1992-12-31" "1993-12-31" "1994-06-30"
## [6] "1994-12-31"
and this gives the number of days since the UNIX Epoch:
no <- as.numeric(d)
no
## [1] 7669 8034 8400 8765 8946 9130
and this turns that back to Date class:
as.Date(no, origin = "1970-01-01")
## [1] "1990-12-31" "1991-12-31" "1992-12-31" "1993-12-31" "1994-06-30"
## [6] "1994-12-31"
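To make the difference between factor codes and day counts concrete, here is a small sketch. The factor `f` is a hypothetical stand-in for `DF$Date`, and going through `as.character()` is one safe way (an assumption of this sketch, not the only route) to reach the labels rather than the codes.

```r
# Hypothetical factor column holding the dates from the question.
f <- factor(c("1990-12-31", "1991-12-31", "1992-12-31",
              "1993-12-31", "1994-06-30", "1994-12-31"))

codes <- as.integer(f)          # level codes, not dates
d <- as.Date(as.character(f))   # convert via the labels instead
no <- as.numeric(d)             # days since 1970-01-01
back <- as.Date(no, origin = "1970-01-01")

codes
all(back == d)                  # the round trip preserves every date
```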
Note
Lines <- "
Date
1990-12-31
1991-12-31
1992-12-31
1993-12-31
1994-06-30
1994-12-31 "
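# stringsAsFactors = TRUE makes Date a factor, as assumed above
# (this was the default before R 4.0.0)
DF <- read.table(text = Lines, header = TRUE, stringsAsFactors = TRUE)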
I have a df with 4 date columns as character vectors:
print(df)
Date1 Date2 Date3 Date4
1 2016-12-05 <NA> 2016-11-24 2017-12-05
2 2007-10-15 2009-09-18 2007-10-15 2017-10-15
3 2005-07-22 2009-06-20 2005-07-22 2017-07-22
4 2008-01-03 2017-07-25 2008-01-03 2018-01-03
If I apply:
df <- apply(df, 2, function(x) as.Date(x, origin = "1970-01-01"))
I get as a result:
print(df)
Date1 Date2 Date3 Date4
[1,] 17140 NA 17129 17505
[2,] 13801 14505 13801 17454
[3,] 12986 14415 12986 17369
[4,] 13881 17372 13881 17534
I've solved the problem using lapply instead of apply but I would like to know what is happening inside apply for returning dates as a number.
The OP would like to know what is happening inside apply for returning dates as a number.
The help page ?apply says
In all cases the result is coerced by as.vector to one of the basic vector types before the dimensions are set
The help page ?as.vector says
All attributes are removed from the result if it is of an atomic mode, but not in general for a list result.
As already mentioned by Akarsh Jain, Date objects have a class attribute. Removing the class attribute will leave just the plain number of days since 1970-01-01.
Here is a code snippet for demonstration:
x <- as.Date("2016-12-05")
x
#> [1] "2016-12-05"
attributes(x)
#> $class
#> [1] "Date"
y <- as.vector(x)
y
#> [1] 17140
attributes(y)
#> NULL
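For comparison, here is a sketch of the `lapply` route the question mentions. The data frame is a hypothetical cut-down version of the one in the question, and assigning into `df[]` is one common idiom for keeping the data frame structure, not necessarily what the OP wrote.

```r
# Hypothetical data frame with character date columns.
df <- data.frame(Date1 = c("2016-12-05", "2007-10-15"),
                 Date2 = c("2016-11-24", "2009-09-18"),
                 stringsAsFactors = FALSE)

# apply() collects the results into a matrix, which drops the class attribute.
m <- apply(df, 2, function(x) as.Date(x))
class(m[, "Date1"])   # plain numbers, no longer "Date"

# lapply() returns a list of Date vectors; assigning into df[] keeps
# the data frame structure and the Date class of each column.
df[] <- lapply(df, as.Date)
sapply(df, class)
```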
I have a dataframe in R, which has two variables that are dates and I need to calculate the difference in days between them. However, they are formatted as YYYYMMDD. How do I change it to a date format readable in R?
This should work
lubridate::ymd(given_date_format)
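If adding a package is not an option, base R's `as.Date()` with an explicit `format` should handle the same input. A minimal sketch, where the vectors `start_chr` and `end_int` are made-up stand-ins for the two columns in the question:

```r
# Hypothetical columns holding dates as YYYYMMDD,
# one stored as character and one as integer.
start_chr <- c("20170623", "20170701")
end_int   <- c(20170701L, 20170705L)

start_date <- as.Date(start_chr, format = "%Y%m%d")
end_date   <- as.Date(as.character(end_int), format = "%Y%m%d")

# Subtracting Dates gives a difftime; as.numeric() extracts the day count.
days_between <- as.numeric(end_date - start_date, units = "days")
days_between
```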
I like anydate() from the anytime package. Quick demo, with actual data:
R> set.seed(123) # be reproducible
R> data <- data.frame(inp=Sys.Date() + cumsum(runif(10)*10))
R> data$ymd <- format(data$inp, "%Y%m%d") ## as yyyymmdd
R> data$int <- as.integer(data$ymd) ## same as integer
R> library(anytime)
R> data$diff1 <- c(NA, diff(anydate(data$ymd))) # reads YMD
R> data$diff2 <- c(NA, diff(anydate(data$int))) # also reads int
R> data
inp ymd int diff1 diff2
1 2017-06-23 20170623 20170623 NA NA
2 2017-07-01 20170701 20170701 8 8
3 2017-07-05 20170705 20170705 4 4
4 2017-07-14 20170714 20170714 9 9
5 2017-07-24 20170724 20170724 10 10
6 2017-07-24 20170724 20170724 0 0
7 2017-07-29 20170729 20170729 5 5
8 2017-08-07 20170807 20170807 9 9
9 2017-08-13 20170813 20170813 6 6
10 2017-08-17 20170817 20170817 4 4
R>
Here the first column holds the actual dates we work from. Columns two and three are then generated to match the OP's requirement: YMD, either as character or as integer.
We then compute the differences on them, pad the first row with NA because it has no predecessor to difference against, and show that either date format works.
I am working on a data frame that contains 2 columns as follows:
time frequency
2014-01-06 13
2014-01-07 30
2014-01-09 56
My issue is that I am interested in counting the days on which frequency is 0. The data is pulled using RPostgreSQL/RSQLite, so a date only appears when there is a value (i.e. when frequency is at least 1). If I want to count the dates that don't actually exist in the data frame, is there an easy way to go about it? For example, for the date range 2014-01-01 to 2014-01-10, I would want it to count 7.
My only thought was to brute force create a separate dataframe with every date (note that this is 4+ years of dates which would be an immense undertaking) and then merging the two dataframes and counting the number of NA values. I'm sure there is a more elegant solution than what I've thought of.
Thanks!
Sort by date and then look for gaps.
start <- as.Date("2014-01-01")
time <- as.Date(c("2014-01-06", "2014-01-07","2014-01-09"))
end <- as.Date("2014-01-10")
time <- sort(unique(time))
# Include start and end dates, so the missing dates are 1/1-1/5, 1/8, 1/10
d <- c(time[1]- start,
diff(time) - 1,
end - time[length(time)] )
d # [1] 5 0 1 1
sum(d) # 7 missing days
And now for which days are missing...
(gaps <- data.frame(gap_starts = c(start,time+1)[d>0],
gap_length = d[d>0]))
# gap_starts gap_length
# 1 2014-01-01 5
# 2 2014-01-08 1
# 3 2014-01-10 1
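for (g in seq_len(nrow(gaps))) {
  # separate names, so the overall `start` date is not overwritten
  gap_start <- gaps$gap_starts[g]
  gap_length <- as.integer(gaps$gap_length[g])
  # step through each missing day in this gap
  for (i in seq_len(gap_length)) {
    print(gap_start + i - 1)
  }
}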
# [1] "2014-01-01"
# [1] "2014-01-02"
# [1] "2014-01-03"
# [1] "2014-01-04"
# [1] "2014-01-05"
# [1] "2014-01-08"
# [1] "2014-01-10"
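As a cross-check, the "brute force" idea from the question is cheap in practice, since `seq()` can generate every day in a range in one call. A sketch assuming the same `start`, `end` and `time` values as above:

```r
start <- as.Date("2014-01-01")
end   <- as.Date("2014-01-10")
time  <- as.Date(c("2014-01-06", "2014-01-07", "2014-01-09"))

# Every calendar day in the range, then drop the days that have data.
all_days <- seq(start, end, by = "day")
missing_days <- all_days[!all_days %in% time]

missing_days
length(missing_days)
```

Four years of daily dates is only about 1,460 elements, so this stays small even for the full data set.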
I have a three-day-long time series (data) sampled every minute (60*24*3 values):
require(zoo)
t<-seq(as.POSIXlt("2015/02/02 00:01:00"),as.POSIXlt("2015/02/04 24:00:00"), length.out=60*24*3)
d<-seq(1,2, length.out=60*24*3)
data<-zoo(d,t)
I would like to calculate:
1) Mean values (over the three-day span) for every minute of the hour, assuming that all hours are equal. In this case I should have 60 values in the output, with time stamps:
01:00, 02:00, ..., 60:00. Each mean must be calculated over 24x3=72 values, since there are 72 hours in a three-day time series.
2) Same as above, but additionally tracking hours:
00:01:00, 00:02:00, ..., 23:60:00. Each mean will be calculated over three values, since the time series is three days long.
Both of the following use aggregate.zoo to create a zoo series. The index of each resulting series will be of the chron "times" class.
library(chron) # "times" class
aggregate(data, times(format(time(data), "00:%M:00")), mean)
aggregate(data, times(format(time(data), "%H:%M:00")), mean)
If it's OK for the index to be of class "character", then times can be omitted, in which case chron is not needed.
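The same grouping idea can be sketched without zoo or chron, using `tapply()` on plain vectors with character group labels. This is a swapped-in base R technique rather than the aggregate.zoo call above. The `tz = "UTC"` setting and the use of `by = "min"` (which yields the same one-minute grid as the question's `length.out` call) are assumptions of this sketch.

```r
# Rebuild the question's data as plain vectors: one value per minute for 3 days.
n <- 60 * 24 * 3
t <- seq(as.POSIXct("2015-02-02 00:01:00", tz = "UTC"), by = "min", length.out = n)
d <- seq(1, 2, length.out = n)

# Group by minute of the hour: 60 groups of 72 values each.
by_minute <- tapply(d, format(t, "%M"), mean)

# Group by hour and minute: 1440 groups of 3 values each.
by_hour_minute <- tapply(d, format(t, "%H:%M"), mean)

length(by_minute)
length(by_hour_minute)
head(by_minute)
```

The results are named numeric vectors, so the labels play the role of the character index mentioned above.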
You can do this using data.table and lubridate:
library(data.table)
library(lubridate)
##
Dt <- data.table(
Data=as.numeric(data),
Index=index(data))
##
min_dt <- Dt[
,list(Mean=mean(Data)),
by=list(Minute=minute(Index))]
##
hmin_dt <- Dt[
,list(Mean=mean(Data)),
by=list(Hour=hour(Index),
Minute=minute(Index))]
##
R> head(min_dt)
Minute Mean
1: 1 1.493170
2: 2 1.493401
3: 3 1.493633
4: 4 1.493864
5: 5 1.494096
6: 6 1.494327
##
R> head(hmin_dt)
Hour Minute Mean
1: 0 1 1.333411
2: 0 2 1.333642
3: 0 3 1.333874
4: 0 4 1.334105
5: 0 5 1.334337
6: 0 6 1.334568
Data:
library(zoo)
t <- seq(
as.POSIXlt("2015/02/02 00:01:00"),
as.POSIXlt("2015/02/04 24:00:00"),
length.out=60*24*3)
d <- seq(1,2,length.out=60*24*3)
data <- zoo(d,t)