How to parse a complex date/time string into a zoo object in R?

I'm trying to convert the following date/time strings into a zoo object:
2004:071:15:23:41.87250
2004:103:15:24:15.35931
The format is year:doy:hour:minute:second, where doy is the day of the year.
The date/time strings are stored in a data frame without headers. What's the best way to go about this in R?
Cheers!
Edit based on answer by Gavin:
# read in the time series from a CSV file; each entry as described above
timeSeriesDates <- read.csv("timeseriesdates.csv", header = FALSE, sep = ",")
# convert to a date-time class usable as a zoo index (%OS keeps the fractional seconds)
timeSeriesDatesZ <- as.POSIXct(timeSeriesDates$V1, format = "%Y:%j:%H:%M:%OS")

Read the data into R in the usual way. You will have something like the following:
dats <- data.frame(times = c("2004:071:15:23:41.87250", "2004:103:15:24:15.35931"))
dats
These can be converted to one of the POSIXt classes via:
dats <- transform(dats, times = as.POSIXct(times, format = "%Y:%j:%H:%M:%OS"))
or
dats$times <- as.POSIXct(dats$times, format = "%Y:%j:%H:%M:%OS")
which can then be used in a zoo object. See ?strftime for details on the placeholders used in the format argument; essentially, %j is the day-of-year placeholder, and %OS reads the seconds including any fractional part (plain %S would drop the fraction).
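As a quick check of that format string, here is a minimal base-R sketch. It assumes the times are UTC (the tz = "UTC" argument is an assumption; drop it to use your local time zone) and shows both which calendar dates the day-of-year values map to and that %OS keeps the fractional seconds while %S silently drops them:

```r
times <- c("2004:071:15:23:41.87250", "2004:103:15:24:15.35931")

# %j = day of year; %OS = seconds including any fractional part
with_frac <- as.POSIXct(times, format = "%Y:%j:%H:%M:%OS", tz = "UTC")
# %S reads whole seconds only; the trailing ".87250" is ignored
whole <- as.POSIXct(times, format = "%Y:%j:%H:%M:%S", tz = "UTC")

# 2004 is a leap year, so day 071 is 11 March and day 103 is 12 April
format(with_frac, "%Y-%m-%d %H:%M:%S")

# Fraction of a second kept by %OS but lost with %S (about 0.8725 and 0.35931)
as.numeric(difftime(with_frac, whole, units = "secs"))
```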
To do the zoo bit, using some dummy data for the actual time series values:
vals <- rnorm(2) ## dummy data (named vals to avoid masking stats::ts)
library(zoo) ## load zoo
tsZoo <- zoo(vals, dats$times)
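Printing the result gives something like the following (the values are random, and how much of the fractional seconds is shown depends on the digits.secs option and your R version):
> tsZoo
2004-03-11 15:23:41 2004-04-12 15:24:15
0.3503648 -0.2336064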
Two things to note with fractional seconds: i) the exact fraction you have may not be representable in floating-point arithmetic, and ii) R may not show the full fractional seconds, depending on the value of the digits.secs option. See ?options for more on this option and how to change it.
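A small sketch of the digits.secs option in action, again assuming UTC. The digits shown may not match the input exactly, both because of the floating-point issue above and because R truncates rather than rounds when displaying fractional seconds:

```r
x <- as.POSIXct("2004:071:15:23:41.87250", format = "%Y:%j:%H:%M:%OS", tz = "UTC")

print(x)                        # with digits.secs unset: whole seconds only
old <- options(digits.secs = 5)
print(x)                        # now shows up to 5 decimal places
options(old)                    # restore the previous setting

# The same for a single call, using an explicit %OSn format
format(x, "%H:%M:%OS3")
```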

Here's a commented example for the second string:
R> s <- "2004:103:15:24:15.35931"
R> # split on the ":" and convert the result to a numeric vector
R> n <- as.numeric(strsplit(s, ":")[[1]])
R> # Use the year, hour, minute and second to create a POSIXct object for
R> # 1 January of that year (in UTC, so no daylight-saving change is crossed);
R> # then add the day of year minus one (day 1 is 1 January), as seconds
R> ISOdatetime(n[1], 1, 1, n[3], n[4], n[5], tz = "UTC") + (n[2] - 1)*60*60*24
[1] "2004-04-12 15:24:15 UTC"
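The same split-and-add idea can be vectorised over both strings. This is a sketch, not the answerer's code: it adds doy - 1 days because day 1 of the year is 1 January itself, and works in UTC (an assumption) so that adding whole days can never cross a daylight-saving change and shift the hour:

```r
s <- c("2004:071:15:23:41.87250", "2004:103:15:24:15.35931")

# One row per string; columns: year, day of year, hour, minute, second
parts <- do.call(rbind, lapply(strsplit(s, ":"), as.numeric))

# 1 January of each year at the given time of day, in UTC
jan1 <- ISOdatetime(parts[, 1], 1, 1, parts[, 3], parts[, 4], parts[, 5], tz = "UTC")

# Day 1 is 1 January, so add (doy - 1) days expressed in seconds
res <- jan1 + (parts[, 2] - 1) * 86400
format(res, "%Y-%m-%d %H:%M:%S")
# "2004-03-11 15:23:41" "2004-04-12 15:24:15"
```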

Related

How can I convert a number to a date?

I have a problem with the as.Date function.
I have a column of ordinary dates in Excel, but when I import it into R they become numbers like 33584. I understand that these count days from a specific starting day. I want to show my dates in the form "dd-mm-yy".
The original data is:
[screenshot: how the "date" variable looks in R]
I've tried:
as.date <- function(x, origin = getOption(date.origin)){
origin <- ifelse(is.null(origin), "1900-01-01", origin)
as.Date(date, origin)
}
and also simply
as.Date(43324, origin = "1900-01-01")
but neither works. It shows the error: do not know how to convert '.' to class “Date”
Thank you guys!
The janitor package has a pair of functions designed to deal with reading Excel dates in R. See the following links for usage examples:
https://www.rdocumentation.org/packages/janitor/versions/2.0.1/topics/excel_numeric_to_date
https://www.rdocumentation.org/packages/janitor/versions/2.0.1/topics/convert_to_date
janitor::excel_numeric_to_date(43324)
[1] "2018-08-12"
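If you would rather avoid the extra dependency, base R does the same conversion with an explicit origin. A minimal sketch, assuming the numbers are serials from Excel's default 1900 date system (workbooks using the 1904 date system would need origin = "1904-01-01" instead):

```r
serials <- c(33584, 43324)

# Excel's 1900 date system effectively counts from 1899-12-30
# (this absorbs Excel's fictitious 29 Feb 1900, so it holds from March 1900 on)
d <- as.Date(serials, origin = "1899-12-30")

# Display as dd-mm-yy; note that format() returns character strings, not Dates
format(d, "%d-%m-%y")
# "12-12-91" "12-08-18"
```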
I've come across Excel sheets read in with readxl::read_xls() that read date columns as strings like "43488" (especially when there is a cell somewhere else that has a non-date value). I use:
xldate <- function(x) {
# as.character() first, so a factor converts by its labels rather than its level codes
xn <- as.numeric(as.character(x))
as.Date(xn, origin = "1899-12-30")
}
d <- data.frame(date=c("43488"))
d$actual_date <- xldate(d$date)
print(d$actual_date)
# [1] "2019-01-23"
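When the column mixes serial numbers with text (the non-date cells mentioned above), as.numeric() gives NA for the text along with a warning. A variant that makes that behaviour explicit; xldate_safe is a hypothetical name, not a function from any package:

```r
xldate_safe <- function(x) {
  # as.character() first so factors convert by label, not by level code;
  # non-numeric cells become NA (the coercion warning is silenced on purpose)
  n <- suppressWarnings(as.numeric(as.character(x)))
  as.Date(n, origin = "1899-12-30")
}

res <- xldate_safe(c("43488", "n/a", "43489"))
res
```

Here res holds 2019-01-23, NA and 2019-01-24, so the text cell is kept as a missing date instead of stopping the conversion.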
Dates are notoriously annoying. I would highly recommend the lubridate package for dealing with them. https://lubridate.tidyverse.org/
Use as_date() from lubridate to read numeric dates if you need to.
You can use format() to put it in dd-mm-yy.
library(lubridate)
# Excel serial numbers count days from 1899-12-30, not from lubridate::origin (1970-01-01)
date_vector <- as_date(c(33584, 33585), origin = "1899-12-30")
formatted_date_vector <- format(date_vector, "%d-%m-%y")

How do you change numerical values in R into dates?

Hi, this question has been bugging me for some time.
I am trying to convert the so-called dates in my R project into actual dates. Right now the dates are arranged in a numerical manner, i.e. after 2/28/2020 it's not 3/1/2020 but 2/3/2020.
I've tried
as.Date(3/14/2020, origin = "14-03-2020")
and also
df <- data.frame(Date = c("10/9/2009 0:00:00", "10/15/2009 0:00:00"))
as.Date(df$Date, "%m/%d/%Y %H:%M:%S")
and
strDates <- c("01/28/2020", "05/03/2020")%>%
dates <- as.Date(strDates, "%m/%d/%Y")
I just plugged in two dates to test whether it works, because there are around 40 dates in total. However, my output is as follows. For the first one:
Error in as.Date.default(., 3/14/2020, origin = "14-03-2020") : do not know how to convert '.' to class “Date”
For the second one:
data frame not found
For the third one:
Error in as.Date(strDates, "%m/%d/%Y") : object 'strDates' not found
Issues with your code:
as.Date(3/14/2020, origin = "14-03-2020")
First, R will replace 3/14/2020 with 0.000106082, since that's what 3 divided by 14 divided by 2020 equals. You need to identify it as a string using single or double quotes, as in: as.Date("3/14/2020", origin = "14-03-2020").
But that is still broken. When converting to Date from a character (string) input, you may need to provide format=, since R needs to know which numbers in the string correspond to year, month, day, etc. If you provide a numeric (or integer) input, you need to provide origin= (recent R versions default to "1970-01-01", but older ones require it), so that R knows what "day 0" is. For Unix epoch days, use origin="1970-01-01". If you're using dates from Excel, you need origin="1899-12-30" (see https://stackoverflow.com/a/43230524).
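To make the two cases concrete, here is a sketch in which the same day, 14 March 2020, is produced from a string with format= and from two numbers with different origins (18335 days after the Unix epoch, and Excel serial 43904):

```r
# Character input: tell as.Date() how to read the string
from_string <- as.Date("3/14/2020", format = "%m/%d/%Y")

# Numeric input: days counted from an origin
from_unix  <- as.Date(18335, origin = "1970-01-01")  # Unix epoch days
from_excel <- as.Date(43904, origin = "1899-12-30")  # Excel serial number

c(from_string, from_unix, from_excel)
# all three are "2020-03-14"
```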
Your next error is because you are mixing magrittr ops with ... base R.
strDates <- c("01/28/2020", "05/03/2020")%>%
dates <- as.Date(strDates, "%m/%d/%Y")
The issue here has nothing to do with dates. The use of %>% on line 1 is taking the output of line 1 (in R, assignment to a variable invisibly returns the assigned value, which is why chained assignment works, a <- b <- 2) and injecting it as the first argument in the next function call. With this your code was eventually interpreted as
strDates <- c("01/28/2020", "05/03/2020")%>%
{ dates <- as.Date(., strDates, "%m/%d/%Y") }
which is obviously not what you intended or need. I suspect this is just an artifact of frustration: you were midway through converting from a %>% pipe to something else and forgot to clean up the %>%s. This could be written as:
library(magrittr) # provides %>%
dates <- c("01/28/2020", "05/03/2020") %>%
as.Date("%m/%d/%Y")
dates
# [1] "2020-01-28" "2020-05-03"
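If you are on R 4.1 or later (an assumption about your setup), the native pipe |> does the same without loading magrittr; the plain function call is shown alongside for comparison:

```r
# Native pipe: the left-hand side becomes the first argument of as.Date()
dates <- c("01/28/2020", "05/03/2020") |> as.Date(format = "%m/%d/%Y")

# Equivalent call with no pipe at all
dates2 <- as.Date(c("01/28/2020", "05/03/2020"), format = "%m/%d/%Y")
```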
Your data.frame code seems to work fine, though you do not assign the new Date values back to the frame. Try this slight adaptation:
df <- data.frame(Date = c("10/9/2009 0:00:00", "10/15/2009 0:00:00"))
df$Date <- as.Date(df$Date, "%m/%d/%Y %H:%M:%S")
df
# Date
# 1 2009-10-09
# 2 2009-10-15
str(df)
# 'data.frame': 2 obs. of 1 variable:
# $ Date: Date, format: "2009-10-09" "2009-10-15"
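On the ordering problem described in the question: if the dates are still character strings (an assumption, since the question doesn't show the data), sorting orders them alphabetically rather than chronologically, which matches "after 2/28/2020 comes 2/3/2020". Converting to Date first gives chronological order:

```r
x <- c("3/1/2020", "2/28/2020", "2/3/2020")

# Alphabetical (C-locale) order compares character by character,
# so "2/28/2020" ends up before "2/3/2020"
sort(x, method = "radix")

# Chronological order after conversion to Date
sort(as.Date(x, format = "%m/%d/%Y"))
```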

R: Convert list of character dates to "days since" a given origin

I need to generate a list to attach as a time dimension for a NetCDF file. This file was stacked from many files that were generated as monthly means from 6-hourly data. As a result, each time observation is simply the chronological index of the file in the stack.
First, I generated a date sequence, converted it to characters and then created a date object stating the origin to be the origin of interest. My objective was to receive an integer list of "days since 1850-01-01", however, converting to an integer, I see the origin used is still the R default "1970-01-01".
I am new to working with this "Julian" approach and I would appreciate any advice as to what methods I should use.
The code is as follows
dates <- seq(as.Date("1901-01-01"), by = "month", length.out = 12)
dates <- as.character(dates)
o <- as.Date("1850-01-01")
dates <- as.Date(dates, origin = o)
dates
dates <- as.numeric(dates)
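Starting from the character vector dates (i.e. before the final as.numeric() line), convert it to Date class, subtract o and convert the result to numeric. The origin argument is only used by as.Date() for numeric input, which is why it had no effect here, and Date values are always stored internally as days since 1970-01-01: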
as.numeric(as.Date(dates) - o)
## [1] 18627 18658 18686 18717 18747 18778 18808 18839 18870 18900 18931 18961
Alternately convert both to numeric:
as.numeric(as.Date(dates)) - as.numeric(o)
or use difftime
as.numeric(difftime(as.Date(dates), o, units = "days"))
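For the NetCDF use case, a sketch of the full round trip. The units string follows the common CF convention of "days since <date>" (an assumption about how your file is written), and converting back shows where as.Date()'s origin argument does take effect, namely on numeric input:

```r
o <- as.Date("1850-01-01")
dates <- seq(as.Date("1901-01-01"), by = "month", length.out = 12)

# Days since the origin, as plain numbers for the time variable
days_since <- as.numeric(dates - o)

# Units attribute in CF style (assumed convention)
time_units <- paste("days since", format(o))

# Going back: numeric input plus origin gives the original dates
back <- as.Date(days_since, origin = o)
```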

Force xts() object to ts()

Data: DistrBdaily1yrs.txt
Code:
data = read.table("DistrBdaily1yrs.txt", header = TRUE, sep = "", dec = ",")
data$DATE = as.Date(as.character(data$DATE),format="%Y%m%d")
dataXts = xts(data$QUANTITY,data$DATE, frequency = 6)
tseries = ts(dataXts, start = start(dataXts), end = end(dataXts), frequency = 6)
What I'm trying to do is convert the xts object dataXts to a ts object with the correct start and end dates, so that I can use the decompose function. In this case start = start(dataXts) and end = end(dataXts) give me the right start and end dates, but tseries doesn't recognize the data column in dataXts and treats everything as data.
How can I fix this?
I am not sure I was able to "FORCE" xts to ts, but I got the decompose part working:
library(data.table)
library(xts)
# I was unable to read the file with read.table() for some reason, so I used fread(), which is also much faster
data <- fread("DistrBdaily1yrs.txt", header = TRUE, sep = "\t")
# Set the column names to the ones I saw on Dropbox, as I was unable to read in the header for some reason
colnames(data) <- c("DATE", "QUANTITY")
# Keep as-is
data$DATE = as.Date(as.character(data$DATE), format = "%Y%m%d")
dataXts = xts(data$QUANTITY, data$DATE, frequency = 6)
# Not sure what the "QUANTITY" column means, but it must be turned into numeric
# (its values use a comma as the decimal mark). See this post if the following is unsatisfactory:
# http://stackoverflow.com/questions/3605807/how-to-convert-numbers-with-comma-inside-from-character-to-numeric-in-r
a <- as.numeric(gsub(",", ".", dataXts))
dataXts <- reclass(a, match.to = dataXts); colnames(dataXts) <- "QUANTITY"
# Now convert it to a ts object
timeseries <- ts(dataXts, frequency = 6)
# decompose
decompose(timeseries)
Also, when converting xts to ts, ts() simply uses all observations from first to last, with the time index starting at 1 rather than at the first date (the dates themselves are dropped); that is enough for decompose(), which is why I left out start = start(dataXts), end = end(dataXts) in the ts() call. Also see ?ts, since you cannot pass Dates as the start or end arguments; each must be:
Either a single number or a vector of two integers, which specify a natural time unit and a (1-based) number of samples into the time unit.
You can always convert back to xts using reclass:
# for example: Say you only want the trend
reclass(decompose(timeseries)$trend,match.to=dataXts)
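Since the original file isn't available here, the following base-R sketch uses synthetic data (a made-up 6-day cycle plus noise) to show the same steps: fixing comma decimal marks, building a ts with frequency = 6, decomposing it and re-attaching the dates to a component. It is an illustration under those assumptions, not a reproduction of the data above:

```r
set.seed(42)

# Synthetic stand-in for the file: 36 daily dates and a quantity with a
# 6-day cycle, stored with comma decimal marks as in the question
dates <- seq(as.Date("2015-01-05"), by = "day", length.out = 36)
values <- 10 + sin(2 * pi * (1:36) / 6) + rnorm(36, sd = 0.1)
qty_chr <- sub(".", ",", sprintf("%.2f", values), fixed = TRUE)

# Comma decimal mark -> numeric
qty <- as.numeric(sub(",", ".", qty_chr, fixed = TRUE))

# ts() ignores the dates: without start, the series begins at time 1
tseries <- ts(qty, frequency = 6)
dec <- decompose(tseries)

# Re-attach the dates to a component, e.g. the trend
# (the moving average leaves NAs at both ends)
trend_by_date <- data.frame(DATE = dates, trend = as.numeric(dec$trend))
head(trend_by_date)
```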

Read millisecond tick data without decimal point format to zoo series

I'm trying to read some CSV-format financial tick data (source: HISTDATA_COM_ASCII_EURUSD_T_201209.zip) into a zoo series. The data is indexed by a time column which contains timestamps formatted such as 20120902 170010767, almost like %Y%m%d %H%M%OS3 except that the milliseconds are not separated by a decimal point as required by %OS3.
I have attempted to force the required decimal point by dividing the latter (right) half of the timestamp by 1000 and pasting back together again:
FUN <- function(i, format) {
sapply(strsplit(i, " "), function(j) strptime(paste(j[1], as.numeric(j[2])/1000), format = format))
}
read.zoo(file, format = "%Y%m%d %H%M%OS3", FUN = FUN, sep = ",")
However, this has not worked - could someone please shed some light on how best to do this properly?
Many thanks
You could obviously make this shorter but this gives the idea well:
> tm <- "20120902 170010767"
> gsub("(^........\\s......)(.+$)", "\\1.\\2", tm)
[1] "20120902 170010.767"
> strptime( gsub("(^........\\s......)(.+$)", "\\1.\\2", tm), "%Y%m%d %H%M%OS")
[1] "2012-09-02 17:00:10.767"
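Building on that regex, a vectorised parser can be written as a sketch; the helper names add_ms_point and parse_tick_time are hypothetical, and tz = "UTC" is an assumption about the data source. Unlike the sapply() approach in the question, it returns a POSIXct vector directly:

```r
# Insert "." before the last three digits (the milliseconds),
# e.g. "20120902 170010767" -> "20120902 170010.767"
add_ms_point <- function(x) sub("^([0-9]{8} [0-9]{6})([0-9]{3})$", "\\1.\\2", x)

# Vectorised parser; %OS reads the seconds with their fractional part
parse_tick_time <- function(x, tz = "UTC") {
  as.POSIXct(add_ms_point(x), format = "%Y%m%d %H%M%OS", tz = tz)
}

tm <- c("20120902 170010767", "20120902 170011003")
p <- parse_tick_time(tm)
```

With zoo loaded, it could then be passed as the FUN argument of read.zoo, e.g. read.zoo(file, sep = ",", FUN = parse_tick_time); that call has not been tested against the actual HistData file.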
