R - FinancialInstrument Package Changing Symbol Names when using stock

I'm currently in the process of building a strategy using quantstrat/blotter. The price data I'm using has numbers as the security identifiers, so these numbers are the column names and also what I use as symbol names in functions such as stock() to import the financial instruments. However, as the reproducible code below shows (using a very small portion of my dataset), whenever stock() is used on these numerical identifiers, the FinancialInstrument package modifies them in a strange way, replacing the leading digit with an "X". Based on this, are there any restrictions on symbol names for use with the FinancialInstrument package?
library(FinancialInstrument)
x <- structure(c(9.17000007629395, 9.17000007629395, 9.17000007629395,
9.17000007629395, 9.17000007629395, 9.17000007629395, 41.0999984741211,
40.7599983215332, 40.4599990844727, 40.1500015258789, 40.5299987792969,
40.5299987792969, 41.9900016784668, 41.7449989318848, 42.0299987792969,
41.7200012207031, 42.25, 41.7000007629395, 29.3199996948242,
29.3199996948242, 29.3199996948242, 29.3199996948242, 29.3199996948242,
29.3199996948242), class = c("xts", "zoo"), .indexCLASS = "Date", tclass = "Date", .indexTZ = "UTC", tzone = "UTC", index = structure(c(1403481600,
1403568000, 1403654400, 1403740800, 1403827200, 1404086400), tzone = "UTC", tclass = "Date"), .Dim = c(6L,
4L), .Dimnames = list(NULL, c("10078", "10104", "10107", "10108"
)))
colnames(x)
# "10078" "10104" "10107" "10108"
for(i in colnames(x)){
stock(i,currency="USD",multiplier=1)
}
ls_stocks()
# "X0078" "X0104" "X0107" "X0108"

Instrument names need to begin with a letter or a dot. The instrument function uses make.names to ensure this. If it's important to be able to find your instruments by a number, then you can add it as an identifier:
stock("X1234", currency("USD"), identifiers=list(num=1234))
getInstrument("1234")
#primary_id :"X1234"
#currency :"USD"
#multiplier :1
#tick_size :0.01
#identifiers:List of 1
# ..$ num:1234
#type :"stock"
Another way to add an identifier:
add.identifier("X1234", id2=42)
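For reference, base R's make.names() shows what a syntactically valid name looks like for these identifiers. This is a minimal base-R sketch; note that plain make.names() prepends an "X" and keeps every digit, whereas the question's output shows stock() also dropping the leading digit, so FinancialInstrument evidently does a little more than call make.names(). The names ids, valid and id_map are illustrative only.

```r
# Numeric security identifiers from the question
ids <- c("10078", "10104", "10107", "10108")

# make.names() prepends "X" to anything that does not start with a letter or a dot
valid <- make.names(ids)
print(valid)  # "X10078" "X10104" "X10107" "X10108"

# Keep a lookup from the original numeric id to the valid instrument name
id_map <- setNames(valid, ids)
id_map[["10104"]]  # "X10104"
```

You could then pass each converted name to stock() and record the original number with identifiers=, as in the stock("X1234", ...) example, so that getInstrument() can still find the instrument by its number.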

Related

How to filter a part of dates and then change that part?

There is a group of dates in the data frame test_2 that I'm trying to change. For example, 2020-12-15 is in the started_at column and 2020-12-25 is in the ended_at column. I want to change the day part of the ended_at column.
I could write day(test_2$ended_at) <- 15 (thanks Ben for guiding me with this chunk).
But the problem is that there are other days as well, such as 2020-12-08.
How can I filter only the required dates and change them?
I really appreciate your help.
Here is the dput of the data structure.
> dput(test_2)
structure(list(started_at = structure(c(1608033433, 1608033092,
1608033242, 1608034138, 1608034548, 1608033904, 1608033525, 1608032413,
1608032432, 1607385918, 1608032241, 1608034867, 1609079592, 1608033139,
1608032406, 1608034912, 1608033844, 1608032114, 1608034239, 1608032677,
1608032219, 1608033975, 1609459101, 1608032929, 1608034558, 1608034138,
1608033654, 1608033875, 1606810523, 1608034878, 1608034232), tzone = "UTC", class = c("POSIXct",
"POSIXt")), ended_at = structure(c(1608914839, 1608908027, 1608909124,
1608924913, 1608905112, 1608920814, 1608915081, 1608891612, 1608896054,
1607385667, 1608891462, 1608922015, 1606985651, 1608907113, 1608896350,
1608923619, 1608923486, 1608887393, 1608934063, 1608899164, 1608886816,
1608924042, 1606781193, 1608907025, 1608914882, 1608923510, 1608921699,
1608922845, 1606810492, 1608913874, 1608943331), tzone = "UTC", class = c("POSIXct",
"POSIXt"))), row.names = c(NA, -31L), class = c("tbl_df", "tbl",
"data.frame"))
Based on the description, we can create a logical index to subset and update the 'ended_at' column:
library(lubridate)
i1 <- with(test_2, as.Date(started_at) == "2020-12-15" &
as.Date(ended_at ) == "2020-12-25")
day(test_2$ended_at[i1]) <- 15
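If you would rather avoid lubridate, the same logical-index approach works in base R by editing the mday (day of month) field of a POSIXlt value. This is a minimal sketch on two made-up rows rather than the full test_2 data, assuming all times are UTC as in the dput above.

```r
# Two illustrative rows (UTC); not the full data from the question
test_2 <- data.frame(
  started_at = as.POSIXct(c("2020-12-15 11:57:13", "2020-12-08 00:05:18"), tz = "UTC"),
  ended_at   = as.POSIXct(c("2020-12-25 16:47:19", "2020-12-08 00:40:00"), tz = "UTC")
)

# Logical index: rows that start on 2020-12-15 and end on 2020-12-25
i1 <- as.Date(test_2$started_at, tz = "UTC") == as.Date("2020-12-15") &
      as.Date(test_2$ended_at,   tz = "UTC") == as.Date("2020-12-25")

# Change only the day of the month; year, month and time of day are kept
lt <- as.POSIXlt(test_2$ended_at[i1])
lt$mday <- 15L
test_2$ended_at[i1] <- as.POSIXct(lt)

print(test_2)
```

Only the first row matches the index, so only its ended_at moves to 2020-12-15; the 2020-12-08 row is left alone.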

Imputing the date and time in R

I have a data set like the one below, and I am trying to impute the missing values.
ID  In                   Out
4   2019-09-20 21:57:22  NA
4   NA                   2019-09-21 5:07:03
When there are NAs in the lead and lag for each ID, I am trying to impute the time so that the previous day is cut off and a new time starts for the next day. I tried the following, but I am getting an error:
df1%>%
group_by(ID) %>%
mutate(In= ifelse(is.na(In) & is.na(lag(Out)),
as.POSIXct(as.character(paste(as.Date(In),"05:00:01"))),
In)) %>%
mutate(Out= ifelse(is.na(Out) & lead(In) == "05:00:01",
as.POSIXct(as.character(paste(as.Date(Out),"05:00:00"))),
Out))
The desired output is:
ID  In                   Out
4   2019-09-20 21:57:22  2019-09-21 05:00:00
4   2019-09-21 5:00:01   2019-09-21 5:07:03
dput() of the data:
structure(list(concat = c("176 - 2019-09-20", "176 - 2019-09-20",
"176 - 2019-09-20", "176 - 2019-09-20", "176 - 2019-09-21"),
ENTRY = structure(c(1568989081, 1569008386, 1569016635, 1569016646,
NA), class = c("POSIXct", "POSIXt"), tzone = "UTC"), EXIT = structure(c(1569005439,
1569014914, 1569016645, NA, 1569042433), class = c("POSIXct",
"POSIXt"), tzone = "UTC")), row.names = c(NA, -5L), class = c("data.table",
"data.frame"))
Finally, I got the desired output by separating the date and time and pasting them back together. This is definitely not an efficient way to achieve it. Maybe someone can suggest a more efficient way, which would at least be a learning opportunity.
library(dplyr)
df %>%
mutate(ENTRY_date = as.Date(ENTRY)) %>%
mutate(EXIT_date = as.Date(EXIT))%>%
mutate(ENTRY_time = format(ENTRY,"%H:%M:%S"))%>%
mutate(EXIT_time = format(EXIT,"%H:%M:%S"))%>%
mutate(Entry_date1 = if_else(is.na(ENTRY_date)&is.na(lag(EXIT_date)),EXIT_date,ENTRY_date))%>%
mutate(Exit_date1 = if_else(is.na(EXIT_date)& is.na(lead(ENTRY_date)),ENTRY_date,EXIT_date))%>%
mutate(Entry_time1 = if_else(is.na(ENTRY_time)&is.na(lag(EXIT_time)),"05:00:01",ENTRY_time))%>%
mutate(Exit_time1 = if_else(is.na(EXIT_time)& is.na(lead(ENTRY_time)),"04:59:59",EXIT_time))%>%
mutate(ENTRY1 = as.POSIXct(paste(Entry_date1, Entry_time1), format = "%Y-%m-%d %H:%M:%S"))%>%
mutate(EXIT1 = as.POSIXct(paste(Exit_date1, Exit_time1), format = "%Y-%m-%d %H:%M:%S"))
First, your dput() data did not work for me. Anyway, if I understand your question correctly, you can do it like this:
# load package
library(lubridate)
# replace missing In values with the corresponding Out values,
# setting 5:00:01 as time.
df$In[is.na(df$In)] <- ymd_hms(paste0(as.Date(df$Out[is.na(df$In)]), " 5:00:01"))
# same idea but first we save it as a vector...
Out <- ymd_hms(paste0(as.Date(df$In[is.na(df$Out)]), " 5:00:00"))
# ... then we add one day
day(Out) <- day(Out) + 1; df$Out[is.na(df$Out)] <- Out
This works for the data that you provided, but if the Out time is, for example, 2019-09-21 04:07:03, then the corresponding In time is later, namely 2019-09-21 05:00:01. I do not know if this is intended. If not, please clarify your question.
I used this data:
structure(list(In = structure(c(1569016642, NA), tzone = "UTC", class = c("POSIXct",
"POSIXt")), Out = structure(c(NA, 1569042423), tzone = "UTC", class = c("POSIXct",
"POSIXt"))), .Names = c("In", "Out"), row.names = c(NA, -2L), class = "data.frame")

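A base-R sketch of the same imputation, using the two-row data above rebuilt from readable strings. It assumes, as in the desired output, that a missing In becomes 05:00:01 on the Out date and a missing Out becomes 05:00:00 on the day after the In date. Adding 1 to a Date advances exactly one calendar day, so month ends roll over without special handling.

```r
# The answer's two-row example, written as readable UTC strings
df <- data.frame(
  In  = as.POSIXct(c("2019-09-20 21:57:22", NA), tz = "UTC"),
  Out = as.POSIXct(c(NA, "2019-09-21 05:07:03"), tz = "UTC")
)

miss_in  <- is.na(df$In)
miss_out <- is.na(df$Out)

# Missing In: same calendar day as Out, at 05:00:01
df$In[miss_in] <- as.POSIXct(
  paste(as.Date(df$Out[miss_in], tz = "UTC"), "05:00:01"), tz = "UTC")

# Missing Out: the day after In, at 05:00:00
df$Out[miss_out] <- as.POSIXct(
  paste(as.Date(df$In[miss_out], tz = "UTC") + 1, "05:00:00"), tz = "UTC")

print(df)
```

Rows where both In and Out are missing stay NA, since there is nothing to anchor the imputed date to.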
R Merging XTS time series causing duplicate time of day

I've never found an efficient way to solve a problem I encounter every time I try to combine time series data from different sources. By different sources, I mean combining, say, a data source from the internet (Yahoo stock prices) with a local CSV time series.
yahoo.xts # variable containing security prices from yahoo
local.xts # local time series data
cbind(yahoo.xts,local.xts) # combine them
The result is a combined xts object with different times of day for a given date. What I want is to ignore the time and align the rows by day. So far I have solved this by extracting the index of each source, converting it with as.Date(), and re-wrapping the data as xts objects. My question is whether there is a better, more efficient way that I missed.
Note: I am unsure how to provide a good example of a local data source so you can replicate the problem, but the following snippet shows how I get the data online.
require(quantmod)
data.etf = new.env()
getSymbols.av(c('XOM','AAPL'), src="av", api.key="your-own-key",from = '1970-01-01',adjusted=TRUE,
output.size="full",env = data.etf, set.symbolnames = T, auto.assign = T)
yahoo.xts = Cl(data.etf$XOM)
Here's some data:
Yahoo:
structure(c(112.68, 109.2, 107.86, 104.35, 104.68, 110.66), class = c("xts",
"zoo"), .indexCLASS = c("POSIXct", "POSIXt"), tclass = c("POSIXct",
"POSIXt"), .indexTZ = "America/Chicago", tzone = "America/Chicago", index = structure(c(1508457600,
1508716800, 1508803200, 1508889600, 1508976000, 1509062400), tzone = "America/Chicago", tclass = c("POSIXct",
"POSIXt")), .Dim = c(6L, 1L), .Dimnames = list(NULL, "XIV"))
Local structure:
structure(c(0.176601541324807, -0.914132074513824, -0.0608652702022332,
-0.196679777210441, -0.190397155984135, 0.915313388202916, -0.0530280808936784,
0.263895885521142, 0.10844973759151, 0.0547864992300319, 0.0435149080877898,
-0.202388932508539, 0.0382888645282672, -0.00800908217028123,
-0.0798424223984417, 0.00268898461896916, 0.00493307845560457,
0.132697099147406, 0.074267173330532, -0.336299384720176, -0.0859815663679892,
-0.0597168456705514, -0.0867777000321366, 0.283394650847026,
-0.0100414455118704, 0.106355723615723, -0.0640682814821423,
0.0481841070155836, -0.00321273561708742, -0.13182105331959), .indexCLASS = c("POSIXct",
"POSIXt"), tclass = c("POSIXct", "POSIXt"), .indexTZ = structure("America/Chicago", .Names = "TZ"), tzone = structure("America/Chicago", .Names = "TZ"), class = c("xts",
"zoo"), na.action = structure(1L, class = "omit", index = 1080540000), index = structure(c(1508475600,
1508734800, 1508821200, 1508907600, 1508994000, 1509080400), tzone = structure("America/Chicago", .Names = "TZ"), tclass = c("POSIXct",
"POSIXt")), .Dim = c(6L, 5L), .Dimnames = list(NULL, c("D.30",
"D.60", "D.90", "D.120", "D.150")))
If you understand the sources of your problem, perhaps you can avoid the problem in the first place.
Your problem is that the 19:00:00 timestamps in your printed results are UTC dates (midnight UTC) converted to "America/Chicago" POSIXct timestamps when the merge happens.
As you've pointed out, one solution is to make new xts time indexes that are all of Date class, but that gets annoying. It's best to avoid the situation in the first place if you can; otherwise you have to resort to converting the Date time series into a POSIXct time series with the appropriate time zone.
The key thing to understand when you have misaligned xts objects with date data (or, more precisely, what you think is date data) is that the time zones of the objects do not match. If the time zones of the xts time indexes match, you will get the correct merge without the undesirable behaviour. Of course, Date objects don't have time zones, and by default they are given the time zone "UTC" when merged with xts objects whose time index is of class POSIXct.
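The 19:00:00 offset can be reproduced in base R without any market data. This sketch, assuming your system's time zone database knows "America/Chicago", shows a date stored as midnight UTC, how that instant displays in Chicago, and how reinterpreting the calendar date in the target zone keeps it on the intended day.

```r
d <- as.Date("2017-10-27")

# A Date merged with a POSIXct index is treated as midnight UTC
as_utc <- as.POSIXct(format(d), tz = "UTC")

# Viewed in Chicago (UTC-5 in late October), that instant falls on the previous evening
format(as_utc, "%Y-%m-%d %H:%M:%S", tz = "America/Chicago")  # "2017-10-26 19:00:00"

# Reinterpreting the calendar date in the target zone keeps it on the intended day
as_chicago <- as.POSIXct(format(d), tz = "America/Chicago")
as.Date(as_chicago, tz = "America/Chicago") == d  # TRUE
```

This is the same conversion the answer applies to the whole index when it builds yahoo.xts2 with as.POSIXct(as.character(index(yahoo.xts)), tz = "America/Chicago").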
# reproduce your data (your code isn't fully reproducible for me):
require(quantmod)
data.etf = new.env()
getSymbols(c('XOM','AAPL'), src="yahoo", api.key="your-own-key",from = '1970-01-01',adjusted=TRUE,output.size="full",env = data.etf, set.symbolnames = T, auto.assign = T)
yahoo.xts = Cl(data.etf$XOM)
z <- structure(c(0.176601541324807, -0.914132074513824, -0.0608652702022332,
-0.196679777210441, -0.190397155984135, 0.915313388202916, -0.0530280808936784,
0.263895885521142, 0.10844973759151, 0.0547864992300319, 0.0435149080877898,
-0.202388932508539, 0.0382888645282672, -0.00800908217028123,
-0.0798424223984417, 0.00268898461896916, 0.00493307845560457,
0.132697099147406, 0.074267173330532, -0.336299384720176, -0.0859815663679892,
-0.0597168456705514, -0.0867777000321366, 0.283394650847026,
-0.0100414455118704, 0.106355723615723, -0.0640682814821423,
0.0481841070155836, -0.00321273561708742, -0.13182105331959), .indexCLASS = c("POSIXct",
"POSIXt"), tclass = c("POSIXct", "POSIXt"), .indexTZ = structure("America/Chicago", .Names = "TZ"), tzone = structure("America/Chicago", .Names = "TZ"), class = c("xts",
"zoo"), na.action = structure(1L, class = "omit", index = 1080540000), index = structure(c(1508475600,
1508734800, 1508821200, 1508907600, 1508994000, 1509080400), tzone = structure("America/Chicago", .Names = "TZ"), tclass = c("POSIXct",
"POSIXt")), .Dim = c(6L, 5L), .Dimnames = list(NULL, c("D.30",
"D.60", "D.90", "D.120", "D.150")))
#inspect the index timezones and classes:
class(index(z))
# [1] "POSIXct" "POSIXt"
class(index(yahoo.xts))
# [1] "Date"
indexTZ(z)
# TZ
# "America/Chicago"
indexTZ(yahoo.xts)
# [1] "UTC"
You can see that yahoo.xts has a Date index. When it is merged with a POSIXct-indexed object (i.e. z), its dates are converted to midnight "UTC" timestamps.
# Let's see what happens if the timezone of the yahoo.xts2 object is the same as z:
yahoo.xts2 <- xts(coredata(yahoo.xts), order.by = as.POSIXct(as.character(index(yahoo.xts)), tz = "America/Chicago"))
str(yahoo.xts2)
# An ‘xts’ object on 1970-01-02/2017-10-27 containing:
#   Data: num [1:12067, 1] 1.94 1.97 1.96 1.95 1.96 ...
#  - attr(*, "dimnames")=List of 2
#   ..$ : NULL
#   ..$ : chr "XOM.Close"
#   Indexed by objects of class: [POSIXct,POSIXt] TZ: America/Chicago
#   xts Attributes:
#  NULL
u2 <- merge(z,yahoo.xts2)
tail(u2)
class(index(u2))
# [1] "POSIXct" "POSIXt"
tail(u2, 3)
# D.30 D.60 D.90 D.120 D.150 XOM.Close
# 2017-10-25 -0.1966798 0.05478650 0.002688985 -0.05971685 0.048184107 83.17
# 2017-10-26 -0.1903972 0.04351491 0.004933078 -0.08677770 -0.003212736 83.47
# 2017-10-27 0.9153134 -0.20238893 0.132697099 0.28339465 -0.131821053 83.71
Everything is as expected now.
A shortcut that you might find useful is this:
z3 <- as.xts(as.data.frame(z), dateFormat="Date")
tail(merge(z3, yahoo.xts))
# D.30 D.60 D.90 D.120 D.150 XOM.Close
# 2017-10-20 0.17660154 -0.05302808 0.038288865 0.07426717 -0.010041446 83.11
# 2017-10-23 -0.91413207 0.26389589 -0.008009082 -0.33629938 0.106355724 83.24
# 2017-10-24 -0.06086527 0.10844974 -0.079842422 -0.08598157 -0.064068281 83.47
# 2017-10-25 -0.19667978 0.05478650 0.002688985 -0.05971685 0.048184107 83.17
# 2017-10-26 -0.19039716 0.04351491 0.004933078 -0.08677770 -0.003212736 83.47
# 2017-10-27 0.91531339 -0.20238893 0.132697099 0.28339465 -0.131821053 83.71
Convert to a data.frame, then convert back to an xts with the appropriate parameter setting: dateFormat="Date". Now you are working with an xts object whose time index is of class Date, with no time zone issues:
class(index(merge(z3, yahoo.xts)))
#[1] "Date"

How to initialize a time variable in R?

I am trying to set a time variable in order to use it for comparison against times stored in a vector and am writing the following:
> openingTime <- as.POSIXct('08:00:00 AM', format='%H:%M:S %p', tzone = "EET")
> openingTime
[1] NA
Also
> dput(openingTime)
structure(NA_real_, class = c("POSIXct", "POSIXt"), tzone = "")
What am I doing wrong there?
As commented and pointed out by @rhertel, the proper syntax to get the variable working is:
openingTime <- as.POSIXct('08:00:00 AM', format='%H:%M:%S %p', tz = "EET")
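One further detail: R's strptime documentation says %p is meant to be used with %I (12-hour clock), not %H, so '%I:%M:%S %p' is the safer format if PM times can occur. Also, when the format contains no date, the current date is filled in, so comparing against times stored on other days is easier on a time-of-day scale. A minimal sketch, assuming a C (English) time locale for parsing "AM"/"PM"; closing_time, event_times and secs_of_day() are hypothetical names for illustration.

```r
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))  # make %p parse "AM"/"PM" predictably

# No date in the format, so strptime() fills in today's date
opening_time <- as.POSIXct("08:00:00 AM", format = "%I:%M:%S %p", tz = "EET")
closing_time <- as.POSIXct("08:00:00 PM", format = "%I:%M:%S %p", tz = "EET")

# Hypothetical vector of event times on arbitrary dates
event_times <- as.POSIXct(c("2021-03-01 07:59:00", "2021-03-02 12:30:00"), tz = "EET")

# Compare time of day only: seconds since midnight in the given zone
secs_of_day <- function(x, tz = "EET") {
  lt <- as.POSIXlt(x, tz = tz)
  lt$hour * 3600 + lt$min * 60 + lt$sec
}

is_open <- secs_of_day(event_times) >= secs_of_day(opening_time) &
           secs_of_day(event_times) <  secs_of_day(closing_time)
print(is_open)

invisible(Sys.setlocale("LC_TIME", old_locale))  # restore the original locale
```

The first event (07:59) is before opening and the second (12:30) falls within opening hours, regardless of which day each was recorded on.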

Using lapply with variable argument lists

Oh man, I feel dumb. This is beginner central, but I'm totally lost trying to figure out how to subset arguments in lapply. At this point I've just been randomly trying different combinations of [[ and friends, cursing my clumsiness with the debugger in RStudio.
The issue: I have a dataset collected from SQL Server which includes several columns of date data. Some of these are legitimate datetime values, others are strings. Most have (valid) missing data and some have more than one format. Often the date 1900-01-01 is used as a substitute for NULL. I'm trying very, very hard to be idiomatic and concise in solving this instead of brute forcing it with copy/paste invocations.
My ParseDates() function seems to work well if called column-by-column, but I can't get it to work with lapply. I can see that I'm sending the whole list of orders and threshold values when I only want to pass the current observation, but I can't get my head around how lapply iterates or how to align multiple lists so that the right arguments go with the right call.
I need to finish with all values correctly held as dates (or POSIXct in this instance) with anything close to 1900-01-01 set to NA.
library(lubridate)
# build sample data extract
events <-
structure(
list(
ReservationDate = structure(
c(4L, 2L, 3L, NA,
1L), .Label = c(
"18/12/2006", "1/1/1900", "May 11 2004 12:00AM",
"May 17 2004 12:00AM"
), class = "factor"
), OrigEnquiryDate = structure(
c(1094565600,
937404000, 1089295200, NA, NA), class = c("POSIXct", "POSIXt"), tzone = ""
), UnconditionalDate = structure(
c(1092146400, 935676000,
1087740000, NA, 1168952400), class = c("POSIXct", "POSIXt"), tzone = ""
),
ContractsExchangedDate = structure(
c(NA, NA, NA, NA, 1171544400), class = c("POSIXct", "POSIXt"), tzone = ""
)
), .Names = c(
"ReservationDate",
"OrigEnquiryDate", "UnconditionalDate", "ContractsExchangedDate"
), row.names = c(54103L, 54090L, 54057L, 135861L, 73433L), class = "data.frame"
)
ParseDates <- function(x, orders=NULL, threshold=10) {
# converts to POSIXct if required and replaces 1900-01-01 or similar with na
if(!is.null(orders)) {
x <- parse_date_time(x, orders)
}
x[abs(difftime(x, as.POSIXct("1900-01-01"), units="days")) < threshold] <- NA
return(x)
}
# only consider these columns
date.cols <- names(events) %in% c(
"ReservationDate", "UnconditionalDate", "ContractsExchangedDate", "OrigEnquiryDate"
)
# columns other than these should use the default threshold of 10
date.thresholds <- list("UnconditionalDate"=90, "ContractsExchangedDate"=400)
# columns *other* than these should use the default order of NULL,
# they skip parsing and go straight to threshold testing
date.orders <- list(
"SettlementDate"=c("dmY", "bdY I:Mp"),
"ReservationDate"=c("dmY", "bdY I:Mp")
)
events[date.cols] <- lapply(events[date.cols],
ParseDates(events[date.cols],
orders = date.orders,
threshold = date.thresholds))
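lapply() walks over a single list, so per-column arguments have to be supplied as a second list of the same length and iterated in parallel; base R's Map() (a wrapper around mapply()) does exactly that. A minimal sketch, using a hypothetical, simplified blank_sentinel() in place of ParseDates() (no lubridate parsing) and a two-column toy data frame; thresholds are looked up by column name with a default of 10, mirroring date.thresholds.

```r
# Hypothetical, simplified stand-in for ParseDates(): no parsing, only the
# "close to 1900-01-01" check, with a per-call threshold in days
blank_sentinel <- function(x, threshold = 10) {
  gap <- abs(as.numeric(difftime(x, as.POSIXct("1900-01-01", tz = "UTC"), units = "days")))
  x[!is.na(gap) & gap < threshold] <- NA
  x
}

# Toy data: two POSIXct columns
toy <- data.frame(
  A = as.POSIXct(c("1900-01-01", "2004-05-11"), tz = "UTC"),
  B = as.POSIXct(c("1900-03-01", "2006-12-18"), tz = "UTC")
)
cols <- c("A", "B")
thresholds <- list(B = 90)  # columns not listed fall back to the default of 10

# One threshold per column, in the same order as cols
col_thresholds <- lapply(cols, function(nm) {
  if (is.null(thresholds[[nm]])) 10 else thresholds[[nm]]
})

# Map() calls blank_sentinel(toy[[cols[i]]], col_thresholds[[i]]) for each i
toy[cols] <- Map(blank_sentinel, toy[cols], col_thresholds)
print(toy)
```

A third list (for example, per-column parse orders, with NULL meaning "skip parsing") can be passed to Map() in the same way, since each argument list is indexed in parallel.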
