I am trying to run the code from a tutorial by rbresearch titled 'Low Volatility with R'; however, when I run the cbind call, the time series becomes totally misaligned.
Here's the data-preparation section that works perfectly:
require(quantmod)
symbols = c("XLY", "XLP", "XLE", "XLF", "XLV", "XLI", "XLK", "XLB", "XLU")
getSymbols(symbols, index.class=c("POSIXt","POSIXct"), from='2000-01-01')
for(symbol in symbols) {
  x <- get(symbol)
  x <- to.monthly(x, indexAt='lastof', drop.time=TRUE)
  indexFormat(x) <- '%Y-%m-%d'
  colnames(x) <- gsub("x", symbol, colnames(x))
  assign(symbol, x)
}
for(symbol in symbols) {
  x <- get(symbol)
  x1 <- ROC(Ad(x), n=1, type="continuous", na.pad=TRUE)
  colnames(x1) <- "ROC"
  colnames(x1) <- paste("x", colnames(x1), sep=".")
  # x2 is the 12-period standard deviation of the 1-month return
  x2 <- runSD(x1, n=12)
  colnames(x2) <- "RANK"
  colnames(x2) <- paste("x", colnames(x2), sep=".")
  x <- cbind(x, x2)
  colnames(x) <- gsub("x", symbol, colnames(x))
  assign(symbol, x)
}
rank.factors <- cbind(XLB$XLB.RANK, XLE$XLE.RANK, XLF$XLF.RANK, XLI$XLI.RANK,
XLK$XLK.RANK, XLP$XLP.RANK, XLU$XLU.RANK, XLV$XLV.RANK, XLY$XLY.RANK)
r <- as.xts(t(apply(rank.factors, 1, rank)))
for (symbol in symbols) {
  x <- get(symbol)
  x <- x[, 1:6]
  assign(symbol, x)
}
To illustrate that the XLE ETF data is aligned with the XLE RANK data:
> head(XLE)
XLE.Open XLE.High XLE.Low XLE.Close XLE.Volume XLE.Adjusted
2000-01-31 27.31 29.47 25.87 27.31 5903600 22.46
2000-02-29 27.31 27.61 24.62 26.16 4213000 21.51
2000-03-31 26.02 30.22 25.94 29.31 8607600 24.10
2000-04-30 29.50 30.16 27.52 28.87 5818900 23.74
2000-05-31 29.19 32.31 29.00 32.27 5148800 26.54
2000-06-30 32.16 32.50 30.09 30.34 4563100 25.07
> nrow(XLE)
[1] 163
> head(r$XLE.RANK)
XLE.RANK
2000-01-31 2
2000-02-29 2
2000-03-31 2
2000-04-30 2
2000-05-31 2
2000-06-30 2
> nrow(r$XLE.RANK)
[1] 163
However, after running the following cbind call, the xts object becomes totally misaligned:
> XLE <- cbind(XLE, r$XLE.RANK)
> head(XLE)
XLE.Open XLE.High XLE.Low XLE.Close XLE.Volume XLE.Adjusted XLE.RANK
2000-01-31 27.31 29.47 25.87 27.31 5903600 22.46 NA
2000-01-31 NA NA NA NA NA NA 2
2000-02-29 27.31 27.61 24.62 26.16 4213000 21.51 NA
2000-02-29 NA NA NA NA NA NA 2
2000-03-31 26.02 30.22 25.94 29.31 8607600 24.10 NA
2000-03-31 NA NA NA NA NA NA 2
> nrow(XLE)
[1] 326
Since pre-existing code seldom runs cleanly for me, I suspect there is something wrong with my R setup, so here is my session information:
> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_Canada.1252 LC_CTYPE=English_Canada.1252 LC_MONETARY=English_Canada.1252 LC_NUMERIC=C
[5] LC_TIME=English_Canada.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] timeSeries_3010.97 timeDate_3010.98 quantstrat_0.7.8 foreach_1.4.1
[5] blotter_0.8.14 PerformanceAnalytics_1.1.0 FinancialInstrument_1.1.9 quantmod_0.4-0
[9] Defaults_1.1-1 TTR_0.22-0 xts_0.9-5 zoo_1.7-10
loaded via a namespace (and not attached):
[1] codetools_0.2-8 grid_3.0.1 iterators_1.0.6 lattice_0.20-15 tools_3.0.1
I'm not sure how to properly align the xts objects without introducing the NA rows and would greatly appreciate any help.
I'm not sure why others could not replicate your issue, because I can. The problem is this line:
r <- as.xts(t(apply(rank.factors, 1, rank)))
The first for loop converts the data to monthly and drops the time component of the index, which converts the index to a Date. This means that rank.factors has a Date index. But as.xts creates a POSIXct index by default, so r will have a POSIXct index.
cbind(XLE, r$XLE.RANK) therefore merges an xts object that has a Date index with one that has a POSIXct index. The conversion from POSIXct to Date can be problematic if you're not careful with timezone settings.
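A quick way to see the mismatch before merging is to compare the two index classes (a minimal check using the XLE and r objects built above):
class(index(XLE))   # "Date"
class(index(r))     # "POSIXct" "POSIXt" -- the default index class created by as.xts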
If you don't need the time component, it's best to avoid POSIXct and just use Date. Therefore, everything should work if you set dateFormat="Date" in your as.xts call.
R> r <- as.xts(t(apply(rank.factors, 1, rank)), dateFormat="Date")
R> XLE <- cbind(XLE, r$XLE.RANK)
R> head(XLE)
XLE.Open XLE.High XLE.Low XLE.Close XLE.Volume XLE.Adjusted XLE.RANK
2000-01-31 27.31 29.47 25.87 27.31 5903600 22.46 2
2000-02-29 27.31 27.61 24.62 26.16 4213000 21.51 2
2000-03-31 26.02 30.22 25.94 29.31 8607600 24.10 2
2000-04-30 29.50 30.16 27.52 28.87 5818900 23.74 2
2000-05-31 29.19 32.31 29.00 32.27 5148800 26.54 2
2000-06-30 32.16 32.50 30.09 30.34 4563100 25.07 2
Assume I have three xts objects a, m, and s, indexed with the same time slots, and I want to compute abs((a*20)-m)/s. This works in the following simple case:
bla <- data.frame(c("2016-09-03 13:00", "2016-09-03 13:10", "2016-09-03 13:20"),c(1,2,3), c(4,5,6), c(7,8,9))
names(bla) <- c('ts','lin','qua','cub')
a <- as.xts(x = bla[,c('lin','qua','cub')], order.by=as.POSIXct(bla$ts))
... similar for m and s...
abs((a*20)-m)/s
gives the correct results.
When I go to my real data, I see different behaviour:
> class(a)
[1] "xts" "zoo"
> class(m)
[1] "xts" "zoo"
> class(s)
[1] "xts" "zoo"
> dim(a)
[1] 1 4650
> dim(m)
[1] 1 4650
> dim(s)
[1] 1 4650
Also the column names are the same:
> setdiff(names(a),names(m))
character(0)
> setdiff(names(m),names(s))
character(0)
Now when I do n <- abs((a*20)-m)/s I get
> n[1,feature]
feature
2016-09-08 14:00:00 12687075516
but if I do the computation by hand:
> aa <- coredata((a*20)[1,feature])[1,1]
> mm <- coredata(m[1,feature])[1,1]
> ss <- coredata(s[1,feature])[1,1]
> abs(aa-mm)/ss
feature
0.0005893713
Just to give the original values:
> a[1,feature]
feature
2016-09-08 14:00:00 27955015680
> m[1,feature]
feature
2016-09-08 14:00:00 559150430034
> s[1,feature]
feature
2016-09-08 14:00:00 85033719103
Can anyone explain this discrepancy?
Thanks a lot
Norbert
Self-answering: my error was believing that xts is smarter than it is; I assumed a/b would match up column names, which it does not.
> a
lin qua cub
2016-09-03 13:00:00 1 4 7
2016-09-03 13:10:00 2 5 8
2016-09-03 13:20:00 3 6 9
> b
qua lin cub
2016-09-03 13:00:00 2 3 4
2016-09-03 13:10:00 2 3 4
2016-09-03 13:20:00 2 3 4
> a/b
lin qua cub
2016-09-03 13:00:00 0.5 1.333333 1.75
2016-09-03 13:10:00 1.0 1.666667 2.00
2016-09-03 13:20:00 1.5 2.000000 2.25
Division is done on the underlying matrix without any regard for column names. That is why the results are wrong whenever the column order differs, even if the sets of column names coincide.
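If you need to operate on objects whose columns may be in a different order, one way to guard against this is to reorder the columns explicitly before the arithmetic (a small sketch using the toy a and b above):
b_matched <- b[, colnames(a)]   # reorder b's columns to match a's column order
a / b_matched                   # now lin/lin, qua/qua, cub/cub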
I want to get some data from a list of Chinese stocks using quantmod.
The list looks like this:
002705.SZ -- 002730.SZ (within this range some tickers do not correspond to any stock; for example, there is no stock called 002720.SZ)
300357.SZ -- 300402.SZ
603188.SS
603609.SS
603288.SS
603306.SS
603369.SS
I want to write a loop over all these tickers, fetch the data for each one, and save everything into one data frame.
This should get you started.
library(quantmod)
library(stringr) # for str_pad
stocks <- paste(str_pad(2705:2730,width=6,side="left",pad="0"),"SZ",sep=".")
get.stock <- function(s) {
  s <- try(Cl(getSymbols(s, auto.assign=FALSE)), silent=TRUE)
  if (inherits(s, "xts")) return(s)   # fetch succeeded
  return(NULL)                        # fetch failed (e.g. no such ticker); drop it
}
result <- do.call(cbind,lapply(stocks,get.stock))
head(result)
# X002705.SZ.Close X002706.SZ.Close X002707.SZ.Close X002708.SZ.Close X002709.SZ.Close X002711.SZ.Close X002712.SZ.Close X002713.SZ.Close
# 2014-01-21 15.25 27.79 NA 17.26 NA NA NA NA
# 2014-01-22 14.28 28.41 NA 16.56 NA NA NA NA
# 2014-01-23 13.65 27.78 33.62 15.95 19.83 NA 36.58 NA
# 2014-01-24 15.02 30.56 36.98 17.55 21.81 NA 40.24 NA
# 2014-01-27 14.43 31.26 40.68 18.70 23.99 26.34 44.26 NA
# 2014-01-28 14.18 30.01 44.75 17.66 25.57 28.97 48.69 NA
This takes advantage of the fact that getSymbols(...) returns an xts object on success and throws an error when the fetch fails; try(...) catches that error, so the function only has to check whether the result is an xts object.
Note that cbind(...) for xts objects aligns according to the index, so it acts like merge(...).
This produces an xts object, not a data frame. To convert this to a data.frame, use:
result.df <- data.frame(date=index(result),result)
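To cover the other ranges listed in the question, you could build the full ticker vector the same way and reuse get.stock (a sketch; the ranges are the ones from the question, and missing tickers are simply dropped by the NULL return):
sz1 <- paste(str_pad(2705:2730, width=6, side="left", pad="0"), "SZ", sep=".")
sz2 <- paste(300357:300402, "SZ", sep=".")   # already six digits, no padding needed
ss  <- paste(c("603188", "603609", "603288", "603306", "603369"), "SS", sep=".")
stocks <- c(sz1, sz2, ss)
result <- do.call(cbind, lapply(stocks, get.stock))
result.df <- data.frame(date=index(result), result)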
I am trying to replicate something like this with a custom function but I am getting errors. I have the following data frame
> dd
datetimeofdeath injurydatetime
1 2/10/05 17:30
2 2/13/05 19:15
3 2/15/05 1:10
4 2/24/05 21:00 2/16/05 20:36
5 3/11/05 0:45
6 3/19/05 23:05
7 3/19/05 23:13
8 3/23/05 20:51
9 3/31/05 11:30
10 4/9/05 3:07
The typeof of these columns is integer, but for some reason they have levels as if they were factors. This could be the root of my problem, but I am not sure.
> typeof(dd$datetimeofdeath)
[1] "integer"
> typeof(dd$injurydatetime)
[1] "integer"
> dd$injurydatetime
[1] 2/10/05 17:30 2/13/05 19:15 2/15/05 1:10 2/16/05 20:36 3/11/05 0:45 3/19/05 23:05 3/19/05 23:13 3/23/05 20:51 3/31/05 11:30
[10] 4/9/05 3:07
549 Levels: 1/1/07 18:52 1/1/07 20:51 1/1/08 17:55 1/1/11 15:25 1/1/12 0:22 1/1/12 22:58 1/11/06 23:50 1/11/07 6:26 ... 9/9/10 8:15
Now I would like to apply the following function rowwise()
library(lubridate)
library(dplyr)
get_time_alive = function(datetimeofdeath, injurydatetime)
{
  if (as.character(datetimeofdeath) == "" | as.character(injurydatetime) == "") return(NA)
  time_of_death = parse_date_time(as.character(datetimeofdeath), "%m/%d/%y %H:%M")
  time_of_injury = parse_date_time(as.character(injurydatetime), "%m/%d/%y %H:%M")
  time_alive = as.duration(new_interval(time_of_injury, time_of_death))
  time_alive_hours = as.numeric(time_alive) / (60*60)
  return(time_alive_hours)
}
This works on individual rows, but not when I do the operation rowwise.
> get_time_alive(dd$datetimeofdeath[1], dd$injurydatetime[1])
[1] NA
> get_time_alive(dd$datetimeofdeath[4], dd$injurydatetime[4])
[1] 192.4
> dd = dd %>% rowwise() %>% dplyr::mutate(time_alive_hours=get_time_alive(datetimeofdeath, injurydatetime))
There were 20 warnings (use warnings() to see them)
> dd
Source: local data frame [10 x 3]
Groups:
datetimeofdeath injurydatetime time_alive_hours
1 2/10/05 17:30 NA
2 2/13/05 19:15 NA
3 2/15/05 1:10 NA
4 2/24/05 21:00 2/16/05 20:36 NA
5 3/11/05 0:45 NA
6 3/19/05 23:05 NA
7 3/19/05 23:13 NA
8 3/23/05 20:51 NA
9 3/31/05 11:30 NA
10 4/9/05 3:07 NA
As you can see the fourth element is NA even though when I applied my custom function to it by itself I got 192.4. Why is my custom function failing here?
I think you can simplify your code a lot and just use something like this:
dd %>%
mutate_each(funs(as.POSIXct(as.character(.), format = "%m/%d/%y %H:%M"))) %>%
mutate(time_alive = datetimeofdeath - injurydatetime)
# datetimeofdeath injurydatetime time_alive
#1 <NA> 2005-02-15 01:10:00 NA days
#2 2005-02-24 21:00:00 2005-02-16 20:36:00 8.016667 days
#3 <NA> 2005-03-11 00:45:00 NA days
Side notes:
I shortened your input data, because it's not easy to copy (I only took those three rows that you also see in my answer)
If you want "time_alive" formatted in hours, just use mutate(time_alive = (datetimeofdeath - injurydatetime)*24) in the last mutate (an explicit-units variant is sketched after these notes).
If you use this code, there's no need for rowwise() - which should also make it faster, I guess
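As an alternative to multiplying by 24, you can be explicit about the units with difftime(), so the result does not depend on the default difftime unit (a sketch built on the same pipeline, with dplyr loaded as in the question):
dd %>%
  mutate_each(funs(as.POSIXct(as.character(.), format = "%m/%d/%y %H:%M"))) %>%
  mutate(time_alive_hours = as.numeric(difftime(datetimeofdeath, injurydatetime, units = "hours")))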
Is there a better way of reshaping dataframe data?
temp <- bdh(conn,c("AUDUSD Curncy","EURUSD Curncy"),"PX_LAST","20110101")
gives
head(temp)
ticker date PX_LAST
1 AUDUSD Curncy 2011-01-01 NA
2 AUDUSD Curncy 2011-01-02 NA
3 AUDUSD Curncy 2011-01-03 1.0205
4 AUDUSD Curncy 2011-01-04 1.0040
5 AUDUSD Curncy 2011-01-05 1.0014
6 AUDUSD Curncy 2011-01-06 0.9969
and
tail(temp)
ticker date PX_LAST
2127 EURUSD Curncy 2013-11-26 1.3557
2128 EURUSD Curncy 2013-11-27 1.3570
2129 EURUSD Curncy 2013-11-28 1.3596
2130 EURUSD Curncy 2013-11-29 1.3591
2131 EURUSD Curncy 2013-11-30 NA
2132 EURUSD Curncy 2013-12-01 NA
In other words, the data are just stacked vertically on top of each other, and further processing is necessary before they are usable. How can I regroup these data into one column per ticker, i.e.
head(temp)
AUDUSD.Curncy EURUSD.Curncy
2011-01-01 NA NA
2011-01-02 NA NA
2011-01-03 1.0205 1.3375
2011-01-04 1.0040 1.3315
2011-01-05 1.0014 1.3183
2011-01-06 0.9969 1.3028
None of the reshaping questions I googled covered the kind of reshaping I wanted. I have implemented my own piecemeal solution (given below), but for learning's sake I wanted to ask whether there is a more elegant solution.
You could try read.zoo. Use index.column to specify which column holds the index/time, and split to specify the column by which the data are reshaped into separate series. The result is a zoo time series:
library(zoo)
z <- read.zoo(text = "ticker date PX_LAST
1 AUDUSD 2011-01-01 NA
2 AUDUSD 2011-01-02 NA
3 AUDUSD 2011-01-03 1.0205
4 AUDUSD 2011-01-04 1.0040
5 AUDUSD 2011-01-05 1.0014
6 AUDUSD 2011-01-06 0.9969
2127 EURUSD 2013-11-26 1.3557
2128 EURUSD 2013-11-27 1.3570
2129 EURUSD 2013-11-28 1.3596
2130 EURUSD 2013-11-29 1.3591
2131 EURUSD 2013-11-30 NA
2132 EURUSD 2013-12-01 NA", index.column = "date", split = "ticker")
z
# AUDUSD EURUSD
# 2011-01-01 NA NA
# 2011-01-02 NA NA
# 2011-01-03 1.0205 NA
# 2011-01-04 1.0040 NA
# 2011-01-05 1.0014 NA
# 2011-01-06 0.9969 NA
# 2013-11-26 NA 1.3557
# 2013-11-27 NA 1.3570
# 2013-11-28 NA 1.3596
# 2013-11-29 NA 1.3591
# 2013-11-30 NA NA
# 2013-12-01 NA NA
str(z)
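If you would rather have an xts object, as in the other answers on this page, the zoo result converts directly (assuming the xts package is available):
library(xts)
x <- as.xts(z)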
This is exactly why we have created the RbbgExtension package. It is a wrapper around the Rbbg package that handles many issues when dealing with financial data - issues we have come across in our daily work with backtesting trading strategies etc. for a financial institution.
As you can see, the output is an xts object, but if the query spans multiple tickers and multiple fields, then the output will be an array; you can read about why that is in the documentation.
We have made the package open source and publicly available on GitHub. Just use Hadley's devtools' function install_github("pgarnry/RbbgExtension") to get the package. It has a few dependencies including "Rbbg".
> require(RbbgExtension)
Loading required package: RbbgExtension
>
> tickers <- c("AUDUSD", "EURUSD")
>
> prices <- HistData(tickers = tickers,
+ type = "Curncy",
+ fields = "PX_LAST",
+ startdate = "20110101")
R version 3.1.2 (2014-10-31)
rJava Version 0.9-6
Rbbg Version 0.5.3
Java environment initialized successfully.
Looking for most recent blpapi3.jar file...
Adding C:\blp\API\APIv3\JavaAPI\v3.7.1.1\lib\blpapi3.jar to Java classpath
Bloomberg API Version 3.7.1.1
> class(prices)
[1] "xts" "zoo"
> head(prices)
AUDUSD EURUSD
2011-01-03 1.0168 1.3361
2011-01-04 1.0051 1.3308
2011-01-05 0.9995 1.3149
2011-01-06 0.9944 1.3003
2011-01-07 0.9959 1.2907
2011-01-10 0.9956 1.2951
> tail(prices)
AUDUSD EURUSD
2015-01-26 0.7925 1.1238
2015-01-27 0.7937 1.1381
2015-01-28 0.7889 1.1287
2015-01-29 0.7762 1.1320
2015-01-30 0.7762 1.1291
2015-02-02 0.7806 1.1351
Rbbg's blh (now bdh) is dumb; the function below outputs the time series correctly:
bdhx <- function(conn, securities, start_date, end_date=NULL, fields="PX_LAST",
                 override_fields=NULL, overrides=NULL) {
  temp <- bdh(conn=conn, securities=securities, fields=fields,
              start_date=start_date, end_date=end_date,
              override_fields=override_fields)
  if (colnames(temp)[1] == "date") {
    # single security: first column is the date
    temp <- as.xts(temp)[, -1]
    colnames(temp) <- securities
    res <- temp
  } else {
    # multiple securities: first column is the ticker, so split by ticker and merge
    cn <- unique(temp[, 1])
    fil <- temp[, 1] == cn[1]
    res <- xts(temp[fil, 3], as.Date(temp[fil, 2]))
    colnames(res) <- securities[1]
    for (i in 4:(length(cn) + 2)) {
      fil <- temp[, 1] == cn[i - 2]
      temp2 <- xts(temp[fil, 3], as.Date(temp[fil, 2]))
      colnames(temp2) <- securities[i - 2]
      res <- merge.xts(res, temp2)
    }
  }
  res
}
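A hypothetical call mirroring the bdh call at the top of the question (conn is assumed to be an already-open Bloomberg connection):
prices <- bdhx(conn, c("AUDUSD Curncy", "EURUSD Curncy"), "20110101")
head(prices)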
This is related to the previous question on SO: roll data.table with rollends
Given the data...
library(data.table)
dt1 = data.table(Date=seq(from=as.Date("2013-01-03"),
to=as.Date("2013-06-27"), by="1 week"),
key="Date")[, ind:=.I]
dt2 = data.table(Date=seq(from=as.Date("2013-01-01"),
to=as.Date("2013-06-30"), by="1 day"),
key="Date")
I am trying to carry the weekly data points one day forward and backward ...
dt1[dt2, roll=1][dt2, roll=-1]
...but only the first roll join (forward) seems to work and roll=-1 is ignored:
Date ind
1: 2013-01-01 NA
2: 2013-01-02 NA
3: 2013-01-03 1
4: 2013-01-04 1
5: 2013-01-05 NA
---
177: 2013-06-26 NA
178: 2013-06-27 26
179: 2013-06-28 26
180: 2013-06-29 NA
181: 2013-06-30 NA
The same effect when I reverse the order:
dt1[dt2, roll=-1][dt2, roll=1]
Date ind
1: 2013-01-01 NA
2: 2013-01-02 1
3: 2013-01-03 1
4: 2013-01-04 NA
5: 2013-01-05 NA
---
177: 2013-06-26 26
178: 2013-06-27 26
179: 2013-06-28 NA
180: 2013-06-29 NA
181: 2013-06-30 NA
I would like to achieve:
Date ind
1: 2013-01-01 NA
2: 2013-01-02 1
3: 2013-01-03 1
4: 2013-01-04 1
5: 2013-01-05 NA
---
177: 2013-06-26 26
178: 2013-06-27 26
179: 2013-06-28 26
180: 2013-06-29 NA
181: 2013-06-30 NA
EDIT:
I am using the fresh data.table version 1.8.11; session details:
sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.8.11
loaded via a namespace (and not attached):
[1] plyr_1.8 reshape2_1.2.2 stringr_0.6.2 tools_3.0.0
Thanks
There is nothing left to roll after the first join: the result of dt1[dt2, roll = ...] already has one row for every row of dt2. So just do the two rolls separately and combine them, e.g.:
dt1[dt2, roll = 1][, ind := ifelse(is.na(ind), dt1[dt2, roll = -1]$ind, ind)]
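As a quick check with the dt1 and dt2 defined in the question, the first five rows of the combined result should now match the desired output:
res <- dt1[dt2, roll = 1][, ind := ifelse(is.na(ind), dt1[dt2, roll = -1]$ind, ind)]
res[1:5]
#          Date ind
# 1: 2013-01-01  NA
# 2: 2013-01-02   1
# 3: 2013-01-03   1
# 4: 2013-01-04   1
# 5: 2013-01-05  NA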