Read millisecond tick data without decimal point format to zoo series - r

I'm trying to read some CSV-format financial tick data (source: HISTDATA_COM_ASCII_EURUSD_T_201209.zip) into a zoo series. The data is indexed by a time column whose timestamps look like 20120902 170010767. That is almost %Y%m%d %H%M%OS3, except that the milliseconds are not separated from the seconds by a decimal point, as %OS3 requires.
I have attempted to force the required decimal point by dividing the latter (right) half of the timestamp by 1000 and pasting back together again:
FUN <- function(i, format) {
sapply(strsplit(i, " "), function(j) strptime(paste(j[1], as.numeric(j[2])/1000), format = format))
}
read.zoo(file, format = "%Y%m%d %H%M%OS3", FUN = FUN, sep = ",")
However, this has not worked - could someone please shed some light on how best to do this properly?
Many thanks

You could obviously make this shorter but this gives the idea well:
> tm <- "20120902 170010767"
> gsub("(^........\\s......)(.+$)", "\\1.\\2", tm)
[1] "20120902 170010.767"
> strptime( gsub("(^........\\s......)(.+$)", "\\1.\\2", tm), "%Y%m%d %H%M%OS")
[1] "2012-09-02 17:00:10.767"
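The same idea can be wrapped in a small vectorised helper. This is only a sketch: the name parse_tick_time, the stricter digit-counting pattern, and the tz = "UTC" default are assumptions for illustration, not part of the original answer.

```r
# Hypothetical helper: insert the missing decimal point before the last
# three digits (the milliseconds), then parse the seconds with %OS.
parse_tick_time <- function(x, tz = "UTC") {
  with_dot <- sub("^([0-9]{8} [0-9]{6})([0-9]{3})$", "\\1.\\2", x)
  as.POSIXct(with_dot, format = "%Y%m%d %H%M%OS", tz = tz)
}

tm <- c("20120902 170010767", "20120902 170011220")
p <- parse_tick_time(tm)

# Show milliseconds explicitly; default printing may hide them
print(format(p, "%Y-%m-%d %H:%M:%OS3"))
```

A function like this could then be supplied as the FUN argument of read.zoo, as attempted in the question; that step is not shown here.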

Related

how can I convert number to date?

I have a problem with the as.date function.
I have a column of ordinary dates in Excel, but when I import it into R the dates become numbers, like 33584. I understand that these count the days since a specific date. I want my dates in the form "dd-mm-yy".
The original data is:
how the "date" variable looks like in r
I've tried:
as.date <- function(x, origin = getOption(date.origin)){
origin <- ifelse(is.null(origin), "1900-01-01", origin)
as.Date(date, origin)
}
and also simply
as.Date(43324, origin = "1900-01-01")
but neither of them works. It shows the error: do not know how to convert '.' to class “Date”
Thank you guys!
The janitor package has a pair of functions designed to deal with reading Excel dates in R. See the following links for usage examples:
https://www.rdocumentation.org/packages/janitor/versions/2.0.1/topics/excel_numeric_to_date
https://www.rdocumentation.org/packages/janitor/versions/2.0.1/topics/convert_to_date
janitor::excel_numeric_to_date(43324)
[1] "2018-08-12"
I've come across excel sheets read in with readxl::read_xls() that read date columns in as strings like "43488" (especially when there is a cell somewhere else that has a non-date value). I use
xldate <- function(x) {
  # Excel serial dates count days from 1899-12-30
  xn <- as.numeric(x)
  as.Date(xn, origin = "1899-12-30")
}
d <- data.frame(date=c("43488"))
d$actual_date <- xldate(d$date)
print(d$actual_date)
# [1] "2019-01-23"
Dates are notoriously annoying. I would highly recommend the lubridate package for dealing with them. https://lubridate.tidyverse.org/
Use as_date() from lubridate to read numeric dates if you need to.
You can use format() to put it in dd-mm-yy.
library(lubridate)
date_vector <- as_date(c(33584, 33585), origin = "1899-12-30")  # Excel serial dates count from 1899-12-30, not 1970-01-01
formatted_date_vector <- format(date_vector, "%d-%m-%y")

Trouble with as.Date

I currently have date data in the following format: DDMMYYYY
I want to convert the dates to a proper Date class (I suspect I will need to for visualizing the temporal data) using the following:
data$DATE<-as.Date(as.character(data$DATE), "%d%m%Y")
which would be fine, except that days with a single digit give me NA results because there is no leading 0.
Example:
17042018 = 2018-05-10
5022018 = NA
What is a work around? Should I just paste a 0 in instances where characters is less than 8?
I am quite new to R, but if you could send me in the right direction it would be much appreciated!
Regards,
G
Pad the string with zeros till length 8, then convert:
a <- '5102017'
a <- sprintf('%08d', as.numeric(a))
as.Date(a, "%d%m%Y")
Or, in your example:
data$DATE <- as.Date(sprintf('%08d', as.numeric(data$DATE)), "%d%m%Y")
You may have to transform the initial data$DATE, depending on what you are starting from.

Create a time vector from an excel import

I am working with data from csv files that will all look the same so I am hoping to come up with a code that can be easily applied to all of them.
However, sadly enough I am failing at step one :-(.
The csv files have the date and time saved in one column, so when I import them with read.csv that column gets read as a chr. How can I most easily convert this into a date that I then can use for plotting and analysis?
Here is what I tried:
load the data --> will save the date and time as chr under mydata$Date.Time (e.g. 1/1/15 0:00)
mydata<-read.csv(file.choose(), stringsAsFactors = FALSE,
strip.white = TRUE,
na.strings = c("NA",""), skip=16,
header=TRUE)
separate the Date.Time into Date and Time:
new <- do.call( rbind , strsplit( as.character( mydata$Date.Time ) , " " ) )
add these two back to the df mydata:
cbind( mydata , Date = new[,2] , Time = new[,1] )
convert Date into a date format via as.Date:
mydata$Date <- as.Date(new[,1], format="")
So this works fine for the date however I am stuck with the time, I tried this:
mydata$Time <- format(as.POSIXct(new[,2], format="%H:%M"))
this gives me the following error:
Error in as.POSIXlt.character(x, tz, ...) :
character string is not in a standard unambiguous format
I wonder if there is a smarter way of doing this? Reading in time and date seems to be one of the substantial tasks that I would like to understand. Is there a way of R directly recognizing the date and time from the csv? Or is it generally smarter to generate a time vector by its own, if so how would I do that?
Thanks so much for your help.
Sandra
If you want to use time only, consider using the chron package:
library(chron)
mytime <- times("21:19:37")
or in your case
times(new[,2])
assuming that that's a character vector.
I tried the chron approach but it wouldn't work for me :-(.
So what I ended up doing is just creating a time vector for the period that I am loading the data in for:
date <-seq(as.POSIXct("2015/1/1 00:00"), as.POSIXct("2015/1/31 23:00"), "hours")
and then adding it back to the df.
Not what I wanted but it will work until I find the ultimate solution :-)
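For strings like the 1/1/15 0:00 example in the question, one option is to parse the whole column in a single step by describing its layout to as.POSIXct. The sketch below uses made-up sample values and assumes month/day/two-digit-year order and a fixed UTC time zone; both assumptions would need checking against the real files.

```r
# Made-up values in the "1/1/15 0:00" layout described in the question
date_time <- c("1/1/15 0:00", "1/1/15 1:00", "1/31/15 23:00")

# Assumption: month/day/two-digit year, then hour:minute.
# tz is fixed so the result does not depend on the local time zone.
parsed <- as.POSIXct(date_time, format = "%m/%d/%y %H:%M", tz = "UTC")

# Date and time-of-day can be derived from the parsed values if needed
out <- data.frame(
  DateTime = parsed,
  Date     = as.Date(parsed, tz = "UTC"),
  Time     = format(parsed, "%H:%M")
)
print(out)
```

Keeping a single POSIXct column is usually enough for plotting; the separate Date and Time columns are only shown because the question split them.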

Load csv into R as xts, or comparable to enable time series analysis

I am still learning R, and get very confused when using various data types, classes, etc. I have run into this issue of "Dates" not being in the right format for xts countless times now, and find a solution each time after searching long and hard for (what I consider) complicated solutions.
I am looking for a way to load a CSV into R and convert the date upon loading it each time I want to load a csv into R. 99% of my files contain Date as the first column, in format 01-31-1900 (xts wants YYYY-mm-dd).
Right now I have the following:
FedYieldCurve <- read.csv("Yield Curve.csv", header = TRUE, sep = ",", stringsAsFactors = FALSE)
FedYieldCurve$Date <- format(as.Date(FedYieldCurve$Date), "%Y/%m/%d")
and i am getting: Error in charToDate(x) :
character string is not in a standard unambiguous format
The format argument must be inside as.Date. Try this (if the dates in the files are stored in the 01-31-1900 format):
as.Date(FedYieldCurve$Date,format="%m-%d-%Y")
When you coerce a string to a Date object, you have to describe the layout of the string through the format argument of the as.Date call. The error you reported appears when no format is given and the string is in a layout other than the standard YYYY-mm-dd (or YYYY/mm/dd).
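A minimal sketch of the distinction, using made-up dates in the 01-31-1900 layout: the format argument of as.Date describes the input text, while format() on the resulting Date controls the output text.

```r
# Made-up sample values in the month-day-year layout from the question
dates_chr <- c("01-31-1900", "12-25-2015")

# format = describes how the INPUT strings are laid out
d <- as.Date(dates_chr, format = "%m-%d-%Y")
print(d)

# format() turns the Date back into text in whatever OUTPUT layout is wanted
as_text <- format(d, "%Y/%m/%d")
print(as_text)
```

For xts or zoo, the Date vector d is what should be used as the index; the formatted character vector is only for display.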
Provide a few lines of the file when asking questions like this. In the absence of this we have supplied some data below in a self contained example.
Use read.zoo from the zoo package (which xts loads) specifying the format. (Replace the read.zoo line with the commented line to read from a file.)
Lines <- "Date,Value
01-31-1900,3"
library(xts)
# z <- read.zoo("myfile.csv", header = TRUE, sep = ",", format = "%m-%d-%Y")
z <- read.zoo(text = Lines, header = TRUE, sep = ",", format = "%m-%d-%Y")
x <- as.xts(z)
See ?read.zoo and Reading Data in zoo.

How to parse complex date/time string into zoo object?

I'm trying to convert the following date/time string into a zoo object:
2004:071:15:23:41.87250
2004:103:15:24:15.35931
year:doy:hour:minute:second
The date/time string is stored in a dataframe without headers. What's the best way to go about this in R?
Cheers!
Edit based on answer by Gavin:
# read in time series from CSV file; each entry as described above
timeSeriesDates <- read.csv("timeseriesdates.csv", header = FALSE, sep = ",")
# convert to format that can be used as a zoo object
timeSeriesDatesZ <- as.POSIXct(timeSeriesDates$V1, format = "%Y:%j:%H:%M:%S")
Read the data into R in the usual way. You will have something like the following:
dats <- data.frame(times = c("2004:071:15:23:41.87250", "2004:103:15:24:15.35931"))
dats
These can be converted to one of the POSIXt classes via:
dats <- transform(dats, times = as.POSIXct(times, format = "%Y:%j:%H:%M:%S"))
or
dats$times <- as.POSIXct(dats$times, format = "%Y:%j:%H:%M:%S")
which can then be used in a zoo object. See ?strftime for details on the placeholders used in the format argument; essentially, %j is the day-of-year placeholder.
To do the zoo bit, we would do the following, using some dummy data for the actual time series:
ts <- rnorm(2) ## dummy data
require(zoo) ## load zoo
tsZoo <- zoo(ts, dats$times)
The last line gives:
> tsZoo
2004-03-11 15:23:41 2004-04-12 15:24:15
          0.3503648          -0.2336064
Two things to note with fractional seconds: (i) the exact fraction you have may not be representable in floating point arithmetic, and (ii) R may not show the full fractional seconds, depending on the value of the digits.secs option. See ?options for more on this particular option and how to change it.
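As a small illustration of both points, the sketch below parses the example strings with %OS, which on input reads seconds including the fractional part (plain %S reads whole seconds only), and then formats them with %OS3. The variable names and tz = "UTC" are assumptions made for the example.

```r
s <- c("2004:071:15:23:41.87250", "2004:103:15:24:15.35931")

# %j = day of year; %OS reads seconds including the fractional part
p <- as.POSIXct(s, format = "%Y:%j:%H:%M:%OS", tz = "UTC")

# Default printing hides the fraction unless digits.secs is set;
# an explicit %OS3 shows three decimals (truncated, not rounded)
shown <- format(p, "%Y-%m-%d %H:%M:%OS3")
print(shown)

# The stored value is a double, so compare with a tolerance, not ==
frac <- as.numeric(p) %% 1
print(abs(frac - c(0.87250, 0.35931)) < 1e-4)
```

Setting options(digits.secs = 3) would have a similar effect on default printing without an explicit format string.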
Here's a commented example for the second string:
R> s <- "2004:103:15:24:15.35931"
R> # split on the ":" and convert the result to a numeric vector
R> n <- as.numeric(strsplit(s, ":")[[1]])
R> # Use the year, hour, minute, second to create a POSIXct object
R> # for the first of the year (in UTC, so there is no DST shift); then
R> # add the day of year minus one, as seconds (day 1 is January 1 itself)
R> ISOdatetime(n[1], 1, 1, n[3], n[4], n[5], tz = "UTC") + (n[2] - 1)*60*60*24
[1] "2004-04-12 15:24:15 UTC"
