Merging price time series in xts when securities show no data - r

I am querying a security DB. Price series are in xts and for some there might be no data (for the chosen window). Actual time series can be simulated as follows:
require(xts)
## Simulated time series
price=function(){
x=floor(runif(1,1,4))
xts(round(rnorm(x,5),3), Sys.Date()+1:x)
}
## Sample tickers
(tick1=setNames(price(), "tick1"))
# tick1
# 2014-04-20 5.829
# 2014-04-21 6.061
# 2014-04-22 5.813
(tick2=setNames(price(), "tick2"))
# tick2
# 2014-04-20 6.458
# 2014-04-21 5.373
(tick3=xts(data.frame(tick3=numeric()), as.Date(numeric()))) # Security showing no data
# tick3
## ...
## tickn
Needless to say, I don't know in advance which securities will show no data.
If I merge the prices into a single xts object, merge.xts drops the empty securities from the output entirely:
(port=merge(tick1, tick2, tick3))
# tick1 tick2
# 2014-04-20 5.829 6.458
# 2014-04-21 6.061 5.373
# 2014-04-22 5.813 NA
Instead I would like to keep track of them, and so print an output similar to:
(cbind(port, tick3=NA))
# tick1 tick2 tick3
# 2014-04-20 5.829 6.458 NA
# 2014-04-21 6.061 5.373 NA
# 2014-04-22 5.813 NA NA
One possible solution is:
port=list(tick1, tick2, tick3) # ... tickn
port.m=lapply(port, function(sec){
if(nrow(sec)==0) sec= xts(matrix(NA, dimnames=dimnames(tick3)), Sys.Date())
sec
})
(port.m=do.call('merge', port.m))
# tick1 tick2 tick3
# 2014-04-19 NA NA NA
# 2014-04-20 5.829 6.458 NA
# 2014-04-21 6.061 5.373 NA
# 2014-04-22 5.813 NA NA
if(all(is.na(port.m[Sys.Date()])))
(port.m=port.m[time(port.m)!=Sys.Date()])
# tick1 tick2 tick3
# 2014-04-20 5.829 6.458 NA
# 2014-04-21 6.061 5.373 NA
# 2014-04-22 5.813 NA NA
Is it possible to find a smarter solution?

You are making two mistakes here:
First: You need to use a vector of non-zero length. See this:
length(integer())
length(NA)
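Second: merge.xts does an outer join by default, so the indices do not strictly have to match, but a placeholder dated outside the existing index adds an extra all-NA row (as with your Sys.Date() workaround). Give the placeholder a date that already occurs in another series.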
e.g. something like this will work:
require(xts)
x=xts(1:4, Sys.Date()+1:4)
v=xts(NA, Sys.Date()+1)
(m=merge.xts(x,v))
Here the placeholder's single date matches the first index of x, and the remaining rows of v are filled with NA.
If you want to be very particular, you could probably try something like this:
v=xts(rep(NA,4), Sys.Date()+1:4)
Hope this helps!!
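Applied to the list of tickers from the question, the idea could look like the sketch below. pad_empty is a hypothetical helper name, the fixed dates stand in for the simulated ones, and the code assumes at least one series has data; with these inputs the merge should give three rows and a tick3 column that is entirely NA.

```r
library(xts)

# Hypothetical helper: replace an empty series with a one-row NA placeholder
# stamped with `when`, a date that already occurs in another series.
pad_empty <- function(sec, when) {
  if (nrow(sec) > 0) return(sec)
  xts(matrix(NA_real_, nrow = 1, dimnames = list(NULL, colnames(sec))), when)
}

d0 <- as.Date("2014-04-20")
tick1 <- xts(matrix(c(5.829, 6.061, 5.813), dimnames = list(NULL, "tick1")), d0 + 0:2)
tick2 <- xts(matrix(c(6.458, 5.373), dimnames = list(NULL, "tick2")), d0 + 0:1)
tick3 <- xts(data.frame(tick3 = numeric()), as.Date(numeric()))  # security with no data

port <- list(tick1, tick2, tick3)

# Take the first date of the first non-empty series as the placeholder date
# (assumption: at least one series has data).
has_data <- vapply(port, function(s) nrow(s) > 0, logical(1))
when <- index(port[[which(has_data)[1]]])[1]

merged <- do.call(merge, lapply(port, pad_empty, when = when))
merged
```

Because the placeholder reuses an existing date, no extra row has to be removed afterwards.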

Related

Applying SetNames to a list of data frames

I'm running into an issue applying new names to a list of data frames. I'm using quantmod to pull stock data, and then calculating the 7-Day moving average in this example. I can create the new columns within the list of data frames, but when I go to rename them using lapply and setNames it is only returning the newly renamed column and not any of the old data in each data frame.
require(quantmod)
require(zoo)
# Select Symbols
symbols <- c('AAPL','GOOG')
# Set start Date
start_date <- '2017-01-01'
# Get data and put data xts' into a list. Create empty list and then loop through to add all symbol data
stocks <- list()
for (i in 1:length(symbols)) {
stocks[[i]] <- getSymbols(symbols[i], src = 'google', from = start_date, auto.assign = FALSE)
}
##### Create the 7 day moving average for each stock in the stocks list #####
stocks <- lapply(stocks, function(x) cbind(x, rollmean(x[,4], 7, align = "right")))
Sample Output:
[[1]]
AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume AAPL.Close.1
2017-01-03 115.80 116.33 114.76 116.15 28781865 NA
2017-01-04 115.85 116.51 115.75 116.02 21118116 NA
2017-01-05 115.92 116.86 115.81 116.61 22193587 NA
2017-01-06 116.78 118.16 116.47 117.91 31751900 NA
2017-01-09 117.95 119.43 117.94 118.99 33561948 NA
2017-01-10 118.77 119.38 118.30 119.11 24462051 NA
2017-01-11 118.74 119.93 118.60 119.75 27588593 117.7914
2017-01-12 118.90 119.30 118.21 119.25 27086220 118.2343
2017-01-13 119.11 119.62 118.81 119.04 26111948 118.6657
[[2]]
GOOG.Open GOOG.High GOOG.Low GOOG.Close GOOG.Volume GOOG.Close.1
2017-01-03 778.81 789.63 775.80 786.14 1657268 NA
2017-01-04 788.36 791.34 783.16 786.90 1072958 NA
2017-01-05 786.08 794.48 785.02 794.02 1335167 NA
2017-01-06 795.26 807.90 792.20 806.15 1640170 NA
2017-01-09 806.40 809.97 802.83 806.65 1274645 NA
2017-01-10 807.86 809.13 803.51 804.79 1176780 NA
2017-01-11 805.00 808.15 801.37 807.91 1065936 798.9371
2017-01-12 807.14 807.39 799.17 806.36 1353057 801.8257
2017-01-13 807.48 811.22 806.69 807.88 1099215 804.8229
I would like to change the "AAPL.Close.1" and "GOOG.Close.1" to say "AAPL.Close.7.Day.MA" and "GOOG.Close.7.Day.MA" respectively (for however many symbols that I choose at the top).
The closest that I've gotten is:
stocks <- lapply(stocks[], function(x) setNames(x[,6], paste0(names(x[,4]), ".7.Day.MA")))
This is correctly naming the new columns, but now my stocks list only contains that single column for each ticker:
[[1]]
AAPL.Close.7.Day.MA
2017-01-03 NA
2017-01-04 NA
2017-01-05 NA
2017-01-06 NA
2017-01-09 NA
2017-01-10 NA
2017-01-11 117.7914
2017-01-12 118.2343
2017-01-13 118.6657
[[2]]
GOOG.Close.7.Day.MA
2017-01-03 NA
2017-01-04 NA
2017-01-05 NA
2017-01-06 NA
2017-01-09 NA
2017-01-10 NA
2017-01-11 798.9371
2017-01-12 801.8257
2017-01-13 804.8229
Why is the setNames function removing the original columns?
Almost there:
N = 10 #number of pseudorandom numbers
df1 <- data.frame(a=runif(N),b=sample(N))#1st data frame
df2 <- data.frame(c=rnorm(N),google=df1$b^2,e=df1$a^3)#2nd data frame
stocks<-list(df1,df2)# create the list
lapply(stocks,names) # get the names of each list element (data.frame)
[[1]]
[1] "a" "b"
[[2]]
[1] "c" "google" "e"
Because the assignment happens inside a function, we need <<- to overwrite the original stocks object.
lapply(seq_along(stocks), function(x) names(stocks[[x]]) <<- gsub(pattern = "google", replacement = "google2", x = names(stocks[[x]]))) # replace the string google
[[1]]
[1] "a" "b"
[[2]]
[1] "c" "google2" "e"
Additionally (verification) stocks contains the new names:
> stocks
[[1]]
a b
1 0.73826897 3
2 0.35627664 8
3 0.89060134 7
4 0.72629312 10
5 0.97069742 4
6 0.12530931 2
7 0.65744257 9
8 0.06218019 1
9 0.67322891 6
10 0.66128204 5
[[2]]
c google2 e
1 -0.5272267 9 0.402386917
2 0.6993945 64 0.045223278
3 0.3707304 49 0.706398932
4 -0.2371541 100 0.383120861
5 1.5073834 16 0.914643019
6 0.4098821 4 0.001967660
7 -0.3014211 81 0.284166886
8 0.3248919 1 0.000240412
9 1.2757740 36 0.305132358
10 1.5938208 25 0.289174620
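As for why the original columns disappear: setNames(x[,6], ...) is handed only column 6, so only that column comes back. A minimal sketch of renaming in place and returning the whole object, without <<-. The small data frames are stand-ins for the xts objects (values taken from the sample output above), and putting the close in column 1 and the moving average in the last column is an assumption of this sketch:

```r
# Stand-ins for the per-symbol objects: a close column plus its unnamed moving average
stocks <- list(
  data.frame(AAPL.Close = c(119.75, 119.25), AAPL.Close.1 = c(117.7914, 118.2343)),
  data.frame(GOOG.Close = c(807.91, 806.36), GOOG.Close.1 = c(798.9371, 801.8257))
)

stocks <- lapply(stocks, function(x) {
  # Rename only the last column, then return the whole object so nothing is dropped
  colnames(x)[ncol(x)] <- paste0(colnames(x)[1], ".7.Day.MA")
  x
})

lapply(stocks, colnames)
```

With the quantmod data from the question, the close is column 4 and the moving average column 6, so the indices would change accordingly.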

How do I copy a date from one variable to another in R data.table without losing the date format?

I have a data.table containing two date variables. The data set was read into R from a .csv file (was originally an .xlsx file) as a data.frame and the two variables then converted to date format using as.Date() so that they display as below:
df
id specdate recdate
1 1 2014-08-12 2014-08-17
2 2 2014-08-15 2014-08-20
3 3 2014-08-21 2014-08-26
4 4 <NA> 2014-08-28
5 5 2014-08-25 2014-08-30
6 6 <NA> <NA>
I then converted the data.frame to a data.table:
df <- data.table(df)
I then wanted to create a third variable, that would include "specdate" if present, but replace it with "recdate" if "specdate" was missing (NA). This is where I'm having some difficulty, as it seems that no matter how I approach this, data.table displays dates in date format only if a complete variable that is already in date format is copied. Otherwise, individual values are displayed as a number (even when using as.IDate) and I gather that an origin date is needed to correct this. Is there any way to avoid supplying an origin date but display the dates as dates in data.table?
Below is my attempt to fill the NAs of specdate with the recdate dates:
# Function to fill NAs:
fillnas <- function(dataref, lookupref, nacol, replacecol, replacelist=NULL) {
nacol <- as.character(nacol)
if(!is.null(replacelist)) nacol <- factor(ifelse(dataref==lookupref & (is.na(nacol) | nacol %in% replacelist), replacecol, nacol))
else nacol <- factor(ifelse(dataref==lookupref & is.na(nacol), replacecol, nacol))
nacol
}
# Fill the NAs in specdate with the function:
df[, finaldate := fillnas(dataref=id, lookupref=id, nacol=specdate, replacecol=as.IDate(recdate, format="%Y-%m-%d"))]
Here is what happens:
> df
id specdate recdate finaldate
1: 1 2014-08-12 2014-08-17 2014-08-12
2: 2 2014-08-15 2014-08-20 2014-08-15
3: 3 2014-08-21 2014-08-26 2014-08-21
4: 4 <NA> 2014-08-28 16310
5: 5 2014-08-25 2014-08-30 2014-08-25
6: 6 <NA> <NA> NA
The display problem is compounded if I create the new variable from scratch by using ifelse:
df[, finaldate := ifelse(!is.na(specdate), specdate, recdate)]
This gives:
> df
id specdate recdate finaldate
1: 1 2014-08-12 2014-08-17 16294
2: 2 2014-08-15 2014-08-20 16297
3: 3 2014-08-21 2014-08-26 16303
4: 4 <NA> 2014-08-28 16310
5: 5 2014-08-25 2014-08-30 16307
6: 6 <NA> <NA> NA
Alternatively, if I try a find-and-replace type approach, I get a warning that the number of items to replace is not a multiple of the replacement length (I'm guessing this is because that approach is not vectorised?), and the values from recdate are recycled and end up in the wrong place:
> df$finaldate <- df$specdate
> df$finaldate[is.na(df$specdate)] <- df$recdate
Warning message:
In NextMethod(.Generic) :
number of items to replace is not a multiple of replacement length
> df
id specdate recdate finaldate
1: 1 2014-08-12 2014-08-17 2014-08-12
2: 2 2014-08-15 2014-08-20 2014-08-15
3: 3 2014-08-21 2014-08-26 2014-08-21
4: 4 <NA> 2014-08-28 2014-08-17
5: 5 2014-08-25 2014-08-30 2014-08-25
6: 6 <NA> <NA> 2014-08-20
So in conclusion - the function I applied gets me closest to what I want, except that where NAs have been replaced, the replacement value is displayed as a number and not in date format. Once displayed as a number, the origin is required to again display it as a date (and I would like to avoid supplying the origin since I usually don't know it and it seems unnecessarily repetitive to have to supply it when the date was originally in the correct format).
Any insights as to where I'm going wrong would be much appreciated.
I'd approach it like this, maybe :
DT <- data.table(df)
DT[, finaldate := specdate]
DT[is.na(specdate), finaldate := recdate]
It seems you want to add a new column so you can retain the original columns as well. I do that a lot too. Sometimes, I just update in place:
DT <- data.table(df)
DT[is.na(specdate), specdate := recdate]
setnames(DT, "specdate", "finaldate")
Using i like that avoids creating a whole new column's worth of data, which might be very large. It depends on how important retaining the original columns is to you, how many of them there are, and your data size. (Note that a whole column's worth of data is still created by the is.na() call, but at least there isn't another column's worth for the new finaldate. It would be great to optimize i=!is.na() in future (#1386), and if you use data.table this way now you won't need to change your code in future to benefit.)
It seems that you might have various "NA" strings that you're replacing. Note that fread in v1.9.6 on CRAN has a fix for that. From README :
correctly handles na.strings argument for all types of columns - it detect possible NA values without coercion to character, like in base read.table. fixes #504. Thanks to @dselivanov for the PR. Also closes #1314, which closes this issue completely, i.e., na.strings = c("-999", "FALSE") etc. also work.
Btw, you've made one of the top 3 mistakes mentioned here : https://github.com/Rdatatable/data.table/wiki/Support
Works for me. You may want to test to be sure that your NA values are not strings or factors "<NA>"; they will look like real NA values:
dt[, finaldate := ifelse(is.na(specdate), recdate, specdate)][
,finaldate := as.POSIXct(finaldate*86400, origin="1970-01-01", tz="UTC")]
# id specdate recdate finaldate
# 1: 1 2014-08-12 2014-08-17 2014-08-12
# 2: 2 2014-08-15 2014-08-20 2014-08-15
# 3: 3 2014-08-21 2014-08-26 2014-08-21
# 4: 4 NA 2014-08-28 2014-08-28
# 5: 5 2014-08-25 2014-08-30 2014-08-25
# 6: 6 NA NA NA
Data
df <- read.table(text=" id specdate recdate
1 1 2014-08-12 2014-08-17
2 2 2014-08-15 2014-08-20
3 3 2014-08-21 2014-08-26
4 4 NA 2014-08-28
5 5 2014-08-25 2014-08-30
6 6 NA NA", header=T, stringsAsFactors=F)
dt <- as.data.table(df)
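The number display comes from ifelse(), which drops attributes such as the Date class from its result. A base R sketch that keeps the class by assigning into a copy and indexing both sides with the same logical vector (which also avoids the recycling warning from the find-and-replace attempt). The vectors are cut down from the question's data:

```r
specdate <- as.Date(c("2014-08-12", NA, "2014-08-25", NA))
recdate  <- as.Date(c("2014-08-17", "2014-08-28", "2014-08-30", NA))

# ifelse() returns the underlying day counts, not Dates
class(ifelse(is.na(specdate), recdate, specdate))

# Copy the Date vector, then overwrite only the missing positions.
# Indexing recdate with the same logical vector keeps the lengths aligned.
finaldate <- specdate
miss <- is.na(specdate)
finaldate[miss] <- recdate[miss]

class(finaldate)
finaldate
```

No origin has to be supplied because the values never leave the Date class.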

Calling a list of tickers in quantmod using R

I want to get some data from a list of Chinese stocks using quantmod.
The list is like below:
002705.SZ -- 002730.SZ (in this sequence, there are some tickers matched with Null stock, for example, there is no stock called 002720.SZ)
300357.SZ -- 300402.SZ
603188.SS
603609.SS
603288.SS
603306.SS
603369.SS
I want to write a loop to run all these stocks to get the data from each of them and save them into one data frame.
This should get you started.
library(quantmod)
library(stringr) # for str_pad
stocks <- paste(str_pad(2705:2730,width=6,side="left",pad="0"),"SZ",sep=".")
get.stock <- function(s) {
s <- try(Cl(getSymbols(s,auto.assign=FALSE)),silent=TRUE)
if (inherits(s,"xts")) return(s) # class(s) has length 2 for xts objects, so test with inherits()
return (NULL)
}
result <- do.call(cbind,lapply(stocks,get.stock))
head(result)
# X002705.SZ.Close X002706.SZ.Close X002707.SZ.Close X002708.SZ.Close X002709.SZ.Close X002711.SZ.Close X002712.SZ.Close X002713.SZ.Close
# 2014-01-21 15.25 27.79 NA 17.26 NA NA NA NA
# 2014-01-22 14.28 28.41 NA 16.56 NA NA NA NA
# 2014-01-23 13.65 27.78 33.62 15.95 19.83 NA 36.58 NA
# 2014-01-24 15.02 30.56 36.98 17.55 21.81 NA 40.24 NA
# 2014-01-27 14.43 31.26 40.68 18.70 23.99 26.34 44.26 NA
# 2014-01-28 14.18 30.01 44.75 17.66 25.57 28.97 48.69 NA
This takes advantage of the fact that try(...) returns either the xts object from getSymbols(...) or, if the fetch fails, an object of class "try-error" containing the error message.
Note that cbind(...) for xts objects aligns according to the index, so it acts like merge(...).
This produces an xts object, not a data frame. To convert this to a data.frame, use:
result.df <- data.frame(date=index(result),result)
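The full ticker list from the question can also be built with base R's sprintf, which avoids the stringr dependency. A minimal sketch; treating every number in each range as a candidate ticker is an assumption, since gaps such as 002720.SZ only show up when the fetch fails and get.stock returns NULL:

```r
# Zero-pad each number to six digits and append the exchange suffix
sz_a <- sprintf("%06d.SZ", 2705:2730)
sz_b <- sprintf("%06d.SZ", 300357:300402)
ss   <- c("603188.SS", "603609.SS", "603288.SS", "603306.SS", "603369.SS")

stocks <- c(sz_a, sz_b, ss)
head(stocks, 3)
length(stocks)
```

The resulting vector can be passed to lapply(stocks, get.stock) as above.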

identify date format in R before converting

I have a simple data set which has a date column and a value column. I noticed that the date sometimes comes in mmddyy (%m/%d/%y) format and other times in mmddYYYY (%m/%d/%Y) format. What is the best way to standardize the dates so that I can do other calculations without this formatting causing issues?
I tried the answers provided here
Changing date format in R
and here
How to change multiple Date formats in same column
Neither of these were able to fix the problem.
Below is a sample of the data
Date, Market
12/17/09,1.703
12/18/09,1.700
12/21/09,1.700
12/22/09,1.590
12/23/2009,1.568
12/24/2009,1.520
12/28/2009,1.500
12/29/2009,1.450
12/30/2009,1.450
12/31/2009,1.450
1/4/2010,1.440
When I read it into a new vector using something like this
dt <- as.Date(inp$Date, format="%m/%d/%y")
I get the following output for the above segment
dt Market
2009-12-17 1.703
2009-12-18 1.700
2009-12-21 1.700
2009-12-22 1.590
2020-12-23 1.568
2020-12-24 1.520
2020-12-28 1.500
2020-12-29 1.450
2020-12-30 1.450
2020-12-31 1.450
2020-01-04 1.440
As you can see we skipped from 2009 to 2020 at 12/23 because of change in formatting. Any help is appreciated. Thanks.
> dat$Date <- gsub("[0-9]{2}([0-9]{2})$", "\\1", dat$Date)
> dat$Date <- as.Date(dat$Date, format = "%m/%d/%y")
> dat
Date Market
# 1 2009-12-17 1.703
# 2 2009-12-18 1.700
# 3 2009-12-21 1.700
# 4 2009-12-22 1.590
# 5 2009-12-23 1.568
# 6 2009-12-24 1.520
# 7 2009-12-28 1.500
# 8 2009-12-29 1.450
# 9 2009-12-30 1.450
# 10 2009-12-31 1.450
# 11 2010-01-04 1.440
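An alternative sketch that parses each format separately instead of rewriting the strings, assuming only the two formats from the question occur. The sample values are taken from the data above:

```r
raw <- c("12/22/09", "12/23/2009", "1/4/2010")

# TRUE where the string ends in a four-digit year
four_digit <- grepl("/[0-9]{4}$", raw)

# Parse everything as two-digit years first, then redo the four-digit rows
parsed <- as.Date(raw, format = "%m/%d/%y")
parsed[four_digit] <- as.Date(raw[four_digit], format = "%m/%d/%Y")

parsed
```

This keeps the century from the four-digit rows rather than discarding it.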

Add months to IDate column of data.table in R

I have been using data.table for practically everything I was using data.frames for, as it is much, much faster on big in-memory data (several million rows). However, I'm not quite sure how to add days or months to an IDate column without using apply (which is very slow).
A minimal example:
dates = c("2003-01-01", "2003-02-01", "2003-03-01", "2003-06-01", "2003-12-01",
"2003-04-01", "2003-05-01", "2003-07-01", "2003-09-01", "2003-08-01")
dt = data.table(idate1=as.IDate(dates))
Now, let's say I want to create a column with dates 6 months ahead. Normally, for a single IDate, I would do this:
seq(dt$idate1[1],by="6 months",length=2)[2]
But this won't work as from= must be of length 1:
dt[,idate2:=seq(idate1,by="6 months",length=2)[2]]
Is there an efficient way of doing it to create column idate2 in dt?
Thanks a lot,
RR
One way is to use the mondate package: add the months to a mondate object and then convert it back to an IDate class object.
require(mondate)
dt = data.table(idate1=as.IDate(dates))
dt[, idate2 := as.IDate(mondate(as.Date(idate1)) + 6)]
# idate1 idate2
# 1: 2003-01-01 2003-07-01
# 2: 2003-02-01 2003-08-02
# 3: 2003-03-01 2003-09-01
# 4: 2003-06-01 2003-12-02
# 5: 2003-12-01 2004-06-01
# 6: 2003-04-01 2003-10-02
# 7: 2003-05-01 2003-11-01
# 8: 2003-07-01 2004-01-01
# 9: 2003-09-01 2004-03-02
# 10: 2003-08-01 2004-02-01
Although, I suppose that there might be other better solutions.
You can use lubridate:
library(lubridate)
dt[, idate2 := as.IDate(idate1 %m+% months(6))]
idate1 idate2
1: 2003-01-01 2003-07-01
2: 2003-02-01 2003-08-01
3: 2003-03-01 2003-09-01
4: 2003-06-01 2003-12-01
5: 2003-12-01 2004-06-01
6: 2003-04-01 2003-10-01
7: 2003-05-01 2003-11-01
8: 2003-07-01 2004-01-01
9: 2003-09-01 2004-03-01
10: 2003-08-01 2004-02-01
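If adding packages is not an option, a base R sketch along the same lines shifts the month field of a POSIXlt and converts back. This is an alternative to the two answers above, not what they do internally. Note that day overflow (for example the 31st of a month) rolls into the following month rather than being clamped:

```r
dates <- as.Date(c("2003-01-01", "2003-12-01", "2003-08-01"))

lt <- as.POSIXlt(dates)
lt$mon <- lt$mon + 6      # months past December carry into the year on conversion
shifted <- as.Date(lt)    # vectorised, no per-row seq() needed

format(shifted)
```

With data.table loaded, wrapping the result in as.IDate() should give the column type used in the question.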
