Time Series date format issue in R - r

I am using the [dowjones][1] dataset but I think maybe my date format is incorrect because when I run the zoo function to make the data time series I get the warning:
some methods for “zoo” objects do not work if the index entries in
‘order.by’ are not unique
My code:
dow = read.table('dow_jones_index.data', header=T, sep=',')
dowts = zoo(dow$close, as.Date(as.character(dow$date), format = "%m/%d/%Y"))
The dates look like this: 5/6/2011
Does my error have to do with using an incorrect date format? Or something else?
Thank you.
EDIT:
hist(dowts, xlab='close change rate', prob=TRUE, main='Histogram',ylim=c(0,.07))
Error in hist.default(dowts, xlab = "close change rate", prob = TRUE,
: character(0)
In addition: Warning messages: 1: In zoo(rval[i],
index(x)[i]) : some methods for “zoo” objects do not work if the
index entries in ‘order.by’ are not unique 2: In
pretty.default(range(x), n = breaks, min.n = 1) : NAs introduced by
coercion [1]:
https://archive.ics.uci.edu/ml/datasets/Dow+Jones+Index

The problem as the warning message indicates is that your date values are not unique. This is because your data is in long format with multiple stocks. A timeseries has to be in a matrix like structure with each column representing a stock and each row a point in time. With dcast from the package reshape2 this straigthforward:
library(zoo)
library(reshape2)
dow <- read.table('dow_jones_index.data', header=T, sep=',', stringsAsFactors = FALSE)
# delete $ symbol and coerce to numeric
dow$close <- as.numeric(sub("\\$", "",dow$close))
tmp <- dcast(dow, date~stock, value.var = "close")
dowts <- as.zoo(x = tmp[,-1], order.by = as.Date(tmp$date, format = "%m/%d/%Y"))

Related

nanotime: how to deal with NAs in a tibble?

I am using the excellent nanotime package to store my timestamps, but I am unable to make the package work when my tibble contains a missing value.
Consider this:
library(nanotime)
library(tibble)
library(dplyr)
tibble(time = c('2020-01-01 10:10:10.123456',
NA,
'2020-01-01 10:10:10.123456')) %>%
mutate(enhance = nanotime(time,
tz = 'GMT',
format = '%Y-%m-%d %H:%M:%E9S'))
Error in RcppCCTZ::parseDouble(x, fmt = format, tz = tz) :
Parse error on NA
What am I missing here? Using na.rm = TRUE does not work unfortunately.
Thanks!
The issue is NA is of type logical, you need to have all the values in the column of same type. We can use as.integer64 to replace logical NA's with integer64 NA.
library(nanotime)
tbl <- tibble::tibble(time = c('2020-01-01 10:10:10.123456',
NA,
'2020-01-01 10:10:10.123456'))
tbl$enhance <- as.integer64(NA)
tbl$enhance[!is.na(tbl$time)] <- nanotime(na.omit(tbl$time), tz = 'GMT',
format = '%Y-%m-%d %H:%M:%E9S')
nanotime(tbl$enhance)

How to extract Month and Day from Date in R and convert it as a date type?

I am Rstudio for my R sessions and I have the following R codes:
d1 <- read.csv("mydata.csv", stringsAsFactors = FALSE, header = TRUE)
d2 <- d1 %>%
mutate(PickUpDate = ymd(PickUpDate))
str(d2$PickUpDate)
output of last line of code above is as follows:
Date[1:14258], format: "2016-10-21" "2016-07-15" "2016-07-01" "2016-07-01" "2016-07-01" "2016-07-01" ...
I need an additional column (let's call it MthDD) to the dataframe d2, which will be the Month and Day of the "PickUpDate" column. So, column MthDD need to be in the format mm-dd but most importantly, it should still be of the date type.
How can I achieve this?
UPDATE:
I have tried the following but it outputs the new column as a character type. I need the column to be of the date type so that I can use it as the x-axis component of a plot.
d2$MthDD <- format(as.Date(d2$PickUpDate), "%m-%d")
Date objects do not display as mm-dd. You can create a character string with that representation but it will no longer be of Date class -- it will be of character class.
If you want an object that displays as mm-dd and still acts like a Date object what you can do is create a new S3 subclass of Date that displays in the way you want and use that. Here we create a subclass of Date called mmdd with an as.mmdd generic, an as.mmdd.Date method, an as.Date.mmdd method and a format.mmdd method. The last one will be used when displaying it. mmdd will inherit methods from Date class but you may still need to define additional methods depending on what else you want to do -- you may need to experiment a bit.
as.mmdd <- function(x, ...) UseMethod("as.mmdd")
as.mmdd.Date <- function(x, ...) structure(x, class = c("mmdd", "Date"))
as.Date.mmdd <- function(x, ...) structure(x, class = "Date")
format.mmdd <- function(x, format = "%m-%d", ...) format(as.Date(x), format = format, ...)
DF <- data.frame(x = as.Date("2018-03-26") + 0:2) # test data
DF2 <- transform(DF, y = as.mmdd(x))
giving:
> DF2
x y
1 2018-03-26 03-26
2 2018-03-27 03-27
3 2018-03-28 03-28
> class(DF2$y)
[1] "mmdd" "Date"
> as.Date(DF2$y)
[1] "2018-03-26" "2018-03-27" "2018-03-28"
Try using this:
PickUpDate2 <- format(PickUpDate,"%m-%d")
PickUpDate2 <- as.Date(PickUpDate2, "%m-%d")
This should work, and you should be able to bind_cols afterwards, or just add it to the data frame right away, as you proposed in the code you provided. So the code should be substituted to be:
d2$PickUpDate2 <- format(d2$PickUpDate,"%m-%d")
d2$PickUpDate2 <- as.Date(d2$PickUpDate2, "%m-%d")

Plot function outside the candlestick pattern in R

I have two xts objects: stock and base. I calculate the relative strength (which is simply the ratio of closing price of stock and of the base index) and I want to plot the weekly relative strength outside the candlestick pattern. The links for the data are here and here.
library(quantmod)
library(xts)
read_stock = function(fichier){ #read and preprocess data
stock = read.csv(fichier, header = T)
stock$DATE = as.Date(stock$DATE, format = "%d/%m/%Y") #standardize time format
stock = stock[! duplicated(index(stock), fromLast = T),] # Remove rows with a duplicated timestamp,
# but keep the latest one
stock$CLOSE = as.numeric(stock$CLOSE) #current numeric columns are of type character
stock$OPEN = as.numeric(stock$OPEN) #so need to convert into double
stock$HIGH = as.numeric(stock$HIGH) #otherwise quantmod functions won't work
stock$LOW = as.numeric(stock$LOW)
stock$VOLUME = as.numeric(stock$VOLUME)
stock = xts(x = stock[,-1], order.by = stock[,1]) # convert to xts class
return(stock)
}
relative.strength = function(stock, base = read_stock("vni.csv")){
rs = Cl(stock) / Cl(base)
rs = apply.weekly(rs, FUN = mean)
}
stock = read_stock("aaa.csv")
candleChart(stock, theme='white')
addRS = newTA(FUN=relative.strength,col='red', legend='RS')
addRS()
However R returns me this error:
Error in `/.default`(Cl(stock), Cl(base)) : non-numeric argument to binary operator
How can I fix this?
One problem is that "vni.csv" contains a "Ticker" column. Since xts objects are a matrix at their core, you can't have columns of different types. So the first thing you need to do is ensure that you only keep the OHLC and volume columns of the "vni.csv" file. I've refactored your read_stock function to be:
read_stock = function(fichier) {
# read and preprocess data
stock <- read.csv(fichier, header = TRUE, as.is = TRUE)
stock$DATE = as.Date(stock$DATE, format = "%d/%m/%Y")
stock = stock[!duplicated(index(stock), fromLast = TRUE),]
# convert to xts class
stock = xts(OHLCV(stock), order.by = stock$DATE)
return(stock)
}
Next, it looks like the the first argument to relative.strength inside the addRS function is passed as a matrix, not an xts object. So you need to convert to xts, but take care that the index class of the stock object is the same as the index class of the base object.
Then you need to make sure your weekly rs object has an observation for each day in stock. You can do that by merging your weekly data with an empty xts object that has all the index values for the stock object.
So I refactored your relative.strength function to:
relative.strength = function(stock, base) {
# convert to xts
sxts <- as.xts(stock)
# ensure 'stock' index class is the same as 'base' index class
indexClass(sxts) <- indexClass(base)
index(sxts) <- index(sxts)
# calculate relative strength
rs = Cl(sxts) / Cl(base)
# weekly mean relative strength
rs = apply.weekly(rs, FUN = mean)
# merge 'rs' with empty xts object contain the same index values as 'stock'
merge(rs, xts(,index(sxts)), fill = na.locf)
}
Now, this code:
stock = read_stock("aaa.csv")
base = read_stock("vni.csv")
addRS = newTA(FUN=relative.strength, col='red', legend='RS')
candleChart(stock, theme='white')
addRS(base)
Produces this chart:
The following line in your read_stock function is causing the problem:
stock = xts(x = stock[,-1], order.by = stock[,1]) # convert to xts class
vni.csv has the actual symbol name in the third column of your data, so when you put stock[,-1] you're actually including a character column and xts forces all the other columns to be characters as well. Then R alerts you about dividing a number by a character at Cl(stock) / Cl(base). Here is a simple example of this error message with division:
> x <- c(1,2)
> y <- c("A", "B")
> x/y
Error in x/y : non-numeric argument to binary operator
I suggest you remove the character column in vni.csv that contains "VNIndex" in every row or modify your function called read_stock() to better protect against this type of issue.

Read a column of dates from Excel into zoo (or xts)

I have a column of dates, exported from Excel as CSV into dataframe, the default type in "import dataset..." "...from CSV" i.e. d<-read_csv(data.csv).
From a dataframe I like to create a zoo and/or xts object.
The data is:
30/04/2016
31/05/2016
30/06/2016
I get the following errors:
dates <- c('30/04/2016','31/05/2016','30/06/2016')
d <- dates
z <- read.zoo(d)
Error in read.zoo(d) : index has bad entry at data row 1
z <- read.zoo(d, FUN = as.Date())
Error in as.Date() : argument "x" is missing, with no default
z <- read.zoo(d, FUN = as.Date(format="%d/%m/%Y"))
Error in as.Date(format = "%d/%m/%Y") : argument "x" is missing,
with no default
Alternatively, if i read directly into zoo with format arguemnt i get a different error:
ts.z <- read.zoo(d,index=1,tz='',format="%d/%m/%Y")
Error in read.zoo(d, index = 1, tz = "", format = "%d/%m/%Y") :
index has bad entry at data row 1
What are the bad entry row 1 errors? What are the correct ways to specify FUN = ? What are the correct input classes and distinctions for read.zoo?
From ?read.zoo about the file-parameter:
character string or strings giving the name of the file(s) which the
data are to be read from/written to. See read.table and
write.table for more information. Alternatively, in read.zoo, file
can be a connection or a data.frame (e.g., resulting from a previous
read.table call) that is subsequently processed to a "zoo" series.
What is going wrong in your example is that d is neither a filename, a connection or a data.frame. You will have to wrap it in data.frame().
A working example:
z <- read.zoo(data.frame(dates), FUN = as.Date, format='%d/%m/%Y')
which gives:
> z
2016-04-30
2016-05-31
2016-06-30
> class(z)
[1] "zoo"
Used input data:
dates <- c('30/04/2016','31/05/2016','30/06/2016')

I want to run code on data frame up to a certain date (column 2)

I am trying to run code on a data frame up to a certain date. I have individual game statistics, the second column is Date in order. I thought this is how to do this however I get an error:
Error in `[.data.frame`(dfmess, dfmess$Date <= Standingdate) :
undefined columns selected
Here is my code:
read.csv("http://www.football-data.co.uk/mmz4281/1516/E0.csv")
dfmess <- read.csv("http://www.football-data.co.uk/mmz4281/1516/E0.csv", stringsAsFactors = FALSE)
Standingdate <- as.Date("09/14/15", format = "%m/%d/%y")
dfmess[dfmess$Date <= Standingdate] -> dfmess
You probably want to convert dfmess$Date to as.Date first prior to comparing. In addition, per #Roland's comment, you require an additional comma ,:
dfmess <- read.csv("http://www.football-data.co.uk/mmz4281/1516/E0.csv", stringsAsFactors = FALSE)
dfmess$Date <- as.Date(dfmess$Date, "%m/%d/%y")
Standingdate <- as.Date("09/14/15", format = "%m/%d/%y")
dfmess[dfmess$Date <= Standingdate, ]

Resources