Very new to R, so I hope I don't frustrate anyone.
Putting together pieces from online searches and using the quantmod and purrr packages, I have the following code to create an xts object called stocks:
symbols <- c("RYCVX","AJA","IEMG")
start <- as.Date("2006-06-22")
end <- as.Date("2020-07-30")
# collect adjusted column of all symbols in one matrix
stocks <- getSymbols(symbols,src = "yahoo", from = start, to = end,
auto.assign = TRUE,
warnings = FALSE) %>%
map(~Ad(get(.))) %>%
reduce(merge) %>%
`colnames<-`(symbols)
This is daily, but I want to have a monthly matrix, yet still keep the NA fields.
I tried this line of code:
mstocks <- to.monthly(stocks, indexAt = "last", OHLC = FALSE)
but my resulting data frame is shrunk down to the symbol with the least amount of data, since any row with a missing value is omitted, so I end up losing data on the symbols with a longer history.
Is there a way I could keep the missing values and have monthly data that, like my daily data, has rows where one symbol is NA?
So here is what I get:
RYCVX AJA IEMG
2018-12-30 29.3045 4.5523 33.2045 <- first date all symbols have data
...
2020-07-30 34.2344 5.6664 12.2234
What I get now with Walts's help:
V1
2006-06-30 NA
...
2020-07-29 52.66000
What I need:
RYCVX AJA IEMG
2006-06-30 29.3045 NA NA
....
2020-07-30 34.2344 5.6664 12.2234
All prices are made up
I'll assume that by monthly data, you mean the adjusted-close price on the last trading day of the month. To simplify the code a bit, I've used auto.assign = FALSE so that getSymbols returns an xts time-series object rather than placing it in the environment. I've also used the function setNames rather than `colnames<-`(symbols), which works but is somewhat opaque. To convert to monthly, use apply.monthly(last), which takes the last day of each month in the time series. Data for all months is returned, including those with NA in some of the time series.
library(tidyverse)
library(quantmod)
symbols <- c("RYCVX","AJA","IEMG")
start <- as.Date("2006-06-22")
end <- as.Date("2020-07-30")
stocks <- symbols %>% map( ~Ad(getSymbols(.x, src = "yahoo", from = start, to = end,
auto.assign = FALSE,
warnings = FALSE))) %>%
reduce(merge) %>%
setNames(symbols) %>%
apply.monthly(last)
which gives:
> stocks
RYCVX AJA IEMG
2006-06-30 21.901295 5.810 NA
2006-07-31 21.892862 6.260 NA
2006-08-31 22.643713 18.400 NA
2006-09-29 23.732025 19.250 NA
2006-10-31 25.284351 6.160 NA
2006-11-30 25.908657 6.960 NA
2006-12-29 26.817636 20.900 NA
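The NA-preserving behaviour can be checked offline. What follows is a minimal sketch with made-up prices: the series names A and B, the dates and the random values are all assumptions, not Yahoo data. Column B only starts in March, yet every month still gets a row.

```r
library(xts)

set.seed(42)
dates <- seq(as.Date("2020-01-01"), as.Date("2020-04-30"), by = "day")
n <- length(dates)

# Two synthetic price series; B has no history before March
a <- 100 + cumsum(rnorm(n))
b <- 50 + cumsum(rnorm(n))
b[dates < as.Date("2020-03-01")] <- NA

daily <- xts(cbind(A = a, B = b), order.by = dates)

# Keep the last observation of each month; NA entries are kept, not dropped
monthly <- apply.monthly(daily, last)
monthly
```

The result has four rows, one per month, with NA in the second column for January and February.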
Related
When I run the following code on my dataset, I get an output (partial one shown) like this:
all_countries_ts[,grepl("Muslims", colnames(all_countries_ts))]
Senegal Muslims Serbia Muslims Seychelles Muslims
1970-01-01 3693807 200000 170
2000-01-01 8936283 529322 730
2010-01-01 11713126 527598 821
2015-01-01 13621382 471414 844
However, when I try to use the function apply.yearly on it to sum across the years, I just get an NA result:
apply.yearly(all_countries_ts[,grepl("Muslims", colnames(all_countries_ts))], FUN = sum)
1970-01-01 NA
2000-01-01 NA
2010-01-01 NA
2015-01-01 NA
The funny thing is that it works with some inputs but not others. For example, if I use input "Agnostics" instead of "Muslims", I get a good result. There isn't an error, so I can't seem to figure out what exactly is happening here.
all_countries_ts is stored as an xts object. One thing to note is that apply.yearly() always works on a subset of this dataset. I have written a function and you can see it below:
sum_by_category <- function(religious_group, dataset) {
apply.yearly(dataset[,grepl(paste(religious_group), colnames(dataset))], FUN =
sum)
}
country_search <- function(country_name, z){
z <- foreach(i = 1:length(country_name), .combine = merge.xts) %do%{
all_countries_ts[,grepl(country_name[i], colnames(all_countries_ts))]
}
return(z)}
When I type in the following, it works perfectly:
sum_by_category("Muslims", country_search("Senegal"))
Senegal Muslims
1970-01-01 3693807
2000-01-01 8936283
2010-01-01 11713126
2015-01-01 13621382
I really can't figure out what's going on since it works with some inputs and not others. Thanks in advance for any help / insights!
xts::apply.yearly expects its x argument to be coercible to an xts object. Perhaps your data.frame is not an xts-compatible data frame.
The help for apply.yearly explains:
Arguments
x an time-series object coercible to xts
FUN an R function
I have created sample data based on the data shared by the OP and converted it to the xts class. apply.yearly works correctly on it.
library(xts)
# Convert data.frame to xts class
all_countries_ts <- xts(df[,-1], order.by = df$Date)
#Now one can use `apply.yearly`
apply.yearly(all_countries_ts[,grepl("Muslims", colnames(all_countries_ts))], FUN = sum)
# [,1]
# 1970-01-01 3893977
# 2000-01-01 9466335
# 2010-01-01 12241545
# 2015-01-01 14093640
Edited: A review of the OP's data suggests that it contains NA in many columns, which causes the total to be shown as NA. The fix is simple: pass na.rm = TRUE through to sum:
apply.yearly(all_countries_ts[,grepl("Muslims",colnames(all_countries_ts))],
FUN = sum, na.rm = TRUE)
# [,1]
# 1970-01-01 570772699
# 2000-01-01 1292170756
# 2010-01-01 1571250533
# 2015-01-01 1734531709
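The effect of a single NA on the yearly total can be reproduced on a tiny object. This is a sketch with invented numbers (the columns a and b are assumptions, not the OP's data):

```r
library(xts)

# Two yearly observations; column b is missing in the first year
x <- xts(cbind(a = c(1, 2), b = c(NA, 3)),
         order.by = as.Date(c("2000-01-01", "2010-01-01")))

# sum() returns NA as soon as one value in the period is NA
with_na <- apply.yearly(x, FUN = sum)

# Extra arguments are passed on to FUN, so na.rm = TRUE reaches sum()
without_na <- apply.yearly(x, FUN = sum, na.rm = TRUE)

with_na
without_na
```

This also explains why some groups work and others do not: a group whose columns contain no NA sums fine without na.rm.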
Data:
df <- read.table(text =
" Date 'Senegal Muslims' 'Serbia Muslims' 'Seychelles Muslims' Others
1970-01-01 3693807 200000 170 200
2000-01-01 8936283 529322 730 100
2010-01-01 11713126 527598 821 300
2015-01-01 13621382 471414 844 500",
header = TRUE, stringsAsFactors = FALSE)
#convert Date column to Date format
df$Date <- as.Date(df$Date)
I have a data.frame that contains quarterly observations. I now want to interpolate monthly values (preferred cubic, linear is fine). The intermediate goal should be to create a data.frame with DATE as the index and missing values for all the monthly observations.
Googling showed that I should create an empty data.frame for the whole time range and then merge it, but whatever I have tried so far gave me errors. Here's my procedure; since I'm new to R, I'm open to any suggestions for changes.
> str(ger)
'data.frame': 93 obs. of 2 variables:
$ DATE : Date, format: "1991-01-01" "1991-04-01" "1991-07-01" "1991-10-01" ...
$ VALUE: num 470780 468834 466332 472949 480359 ...
> head(ger)
DATE VALUE
1 1991-01-01 470780.3
2 1991-04-01 468834.0
3 1991-07-01 466331.6
4 1991-10-01 472949.0
5 1992-01-01 480359.2
6 1992-04-01 476744.5
emptyIndex <- seq(ger[1, 'DATE'], tail(ger[, 'DATE'], 1), by='1 month')
gerMonthly <- data.frame(DATE = emptyIndex, VALUE = NA)
merge(ger, gerMonthly, by='DATE', all.y = T)
This is the closest I got, but it gives me an undesired column format - there surely is a cleaner way to get what I want? Finally, given the format, what would be the cleanest way to get the interpolated time series?
DATE VALUE.x VALUE.y
1 1991-01-01 470780.3 NA
2 1991-02-01 NA NA
3 1991-03-01 NA NA
4 1991-04-01 468834.0 NA
5 1991-05-01 NA NA
6 1991-06-01 NA NA
I'm not quite clear on your comment about the undesired column format but if you're trying to get the interpolated values using a cubic interpolation, you might consider something like the code below
ger <- data.frame(DATE= as.Date(c("1991-01-01", "1991-04-01", "1991-07-01", "1991-10-01", "1992-01-01" )),
VALUE= c(470780, 468834, 466332, 472949, 480359))
DateSeq <- seq(ger$DATE[1],tail(ger$DATE,1),by="1 month")
gerMonthly <- data.frame(DATE=DateSeq, Interp.Value=spline(ger, method="natural", xout=DateSeq)$y)
merge(ger, gerMonthly, by='DATE', all.y = T)
The DATE column needs to be in Date format so the interpolation can work with numeric values.
I've usually used "natural" cubic splines but other options are available.
This format shows both the input values and the results so that you can check that the interpolation looks reasonable but you can use gerMonthly if you just want the interpolated results.
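The point about dates being handled as numbers can be made explicit. Here is a sketch of the same interpolation with the conversion written out; the names date_seq, fit and monthly are illustrative, and only the five quarterly values from the example above are used:

```r
ger <- data.frame(
  DATE  = as.Date(c("1991-01-01", "1991-04-01", "1991-07-01",
                    "1991-10-01", "1992-01-01")),
  VALUE = c(470780, 468834, 466332, 472949, 480359)
)

# Monthly grid spanning the quarterly observations
date_seq <- seq(min(ger$DATE), max(ger$DATE), by = "1 month")

# Dates are converted to days since 1970-01-01 before interpolating
fit <- spline(x = as.numeric(ger$DATE), y = ger$VALUE,
              xout = as.numeric(date_seq), method = "natural")

monthly <- data.frame(DATE = date_seq, VALUE = fit$y)
monthly
```

The interpolated series passes through the original quarterly values, which is a quick sanity check on the result.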
Reading your code has really helped me. To obtain the interpolated values, I did this:
library(tseries)
library(zoo)
# the last line of your code, named for convenience.
merged_data <- merge(ger, gerMonthly, by='DATE', all.y = T)
# declare your desired variable as a time series,
monthly_data <- ts(merged_data$VALUE.x, start = c(1991, 1), end = c(1998,10), frequency = 12)
# interpolate: na.approx linearly interpolates NA values in a time series, na.spline will use cubic spline interpolation.
na.approx(monthly_data) # or:
na.spline(monthly_data)
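A self-contained variant of the same idea keeps the Date index by using a zoo series instead of ts. This is a sketch that assumes only the first five quarterly values shown in the question; the object names are illustrative:

```r
library(zoo)

quarterly <- zoo(
  c(470780, 468834, 466332, 472949, 480359),
  order.by = as.Date(c("1991-01-01", "1991-04-01", "1991-07-01",
                       "1991-10-01", "1992-01-01"))
)

# Merging with an empty zoo on a monthly index inserts NA for the missing months
monthly_index <- seq(start(quarterly), end(quarterly), by = "month")
with_gaps <- merge(quarterly, zoo(, monthly_index))

linear <- na.approx(with_gaps)  # linear interpolation
cubic  <- na.spline(with_gaps)  # cubic spline interpolation

linear
cubic
```

Because the index holds real dates, na.approx interpolates by elapsed days rather than by position, so months of different lengths are weighted accordingly.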
I'm trying to create a Holt-Winters forecast from a weekly time series, then plot the original series and forecast using dygraphs. I have 144 weeks of Friday week-ending data. For my purpose, I'm ignoring that some years have 53 weeks. The structure of the data can be simulated by:
## create data similar to what I have
week_date <- seq.Date(from = as.Date("2012/05/11"),
by = "week",
length.out = 144)
set.seed(1)
var1 <- diffinv(rnorm(143))
df <- data.frame(cbind(week_date, var1))
## convert to ts object then
## create Holt Winters forecast
dfts <- ts(df[,2],freq=52, start=c(2012,19))
hw <- HoltWinters(dfts)
p <- predict(hw, 4)
all <- cbind(dfts, p)
## create plots
dygraph(all, "time series dygraph") %>%
dySeries("var1", label = "Actual") %>%
dySeries(c("p.lwr", "p.fit", "p.upr"), label = "Predicted")
This produces the following error:
Error in as.xts.ts(data) : could not convert index to appropriate type
I tried the solution proposed here, but am getting the same error:
> all <- cbind(dfts = as.xts(dfts), p = as.xts(p))
Error in as.xts.ts(dfts) : could not convert index to appropriate type
There are a few things going on here. The root of the issue is that the data parameter of dygraph requires "Time series data (must be an xts object or an object which is convertible to xts)" (see ?dygraph).
As you've discovered, converting dfts to an xts object fails:
> library(xts)
> dfts <- as.xts(dfts)
Error in as.xts.ts(dfts) : could not convert index to appropriate type
If you try to create the xts object directly:
> dfts <- xts(dfts)
Error in xts(dfts) : order.by requires an appropriate time-based object
This is because, by default, xts uses index(x) for the order.by parameter. From ?xts:
order.by a corresponding vector of unique times/dates -
must be of a known time-based class
...
Currently acceptable classes include: ‘Date’, ‘POSIXct’, ‘timeDate’,
as well as ‘yearmon’ and ‘yearqtr’ where the index values remain unique.
And if you look at the index on dfts:
> str(index(dfts))
num [1:148] 2012 2012 2012 2012 2012 ...
> head(index(dfts))
[1] 2012.346 2012.365 2012.385 2012.404 2012.423 2012.442
The index is numeric, while xts requires some type of date object, so you'll need to convert it.
First, I'd create the all object by converting each to a zoo object and then merging:
> library(zoo)
> # You'll need prediction.interval=TRUE to get the bounds:
> p <- predict(hw, 4, prediction.interval=TRUE)
> all <- merge(actual=as.zoo(dfts), predicted=as.zoo(p))
> head(all)
actual fit upr lwr
2012(19) 0.0000000 NA NA NA
2012(20) -0.6264538 NA NA NA
2012(21) -0.4428105 NA NA NA
2012(22) -1.2784391 NA NA NA
2012(23) 0.3168417 NA NA NA
2012(24) 0.6463495 NA NA NA
Then, you can cast it to an xts object by converting the decimal index to a date. There are a few ways to do this, but the easiest is probably to use the date_decimal function from the lubridate package:
> library(lubridate)
> all.xts <- xts(all, date_decimal(index(all)))
Now, tweaking the parameters in the dygraph function:
> dygraph(all.xts, "time series dygraph") %>%
dySeries("actual", label = "Actual") %>%
dySeries(c("lwr", "fit", "upr"), label = "Predicted")
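The index conversion is the step that makes the object acceptable to xts, so it may help to see it in isolation. The sketch below rebuilds the simulated weekly series and converts its decimal-year index; it uses time() from base R to read the index, and the names decimal_years, dates and dfts_xts are illustrative:

```r
library(xts)
library(lubridate)

set.seed(1)
dfts <- ts(diffinv(rnorm(143)), frequency = 52, start = c(2012, 19))

# The ts index is a plain numeric vector of decimal years (2012.346, ...)
decimal_years <- as.numeric(time(dfts))

# date_decimal() turns decimal years into POSIXct date-times
dates <- date_decimal(decimal_years)

# With a time-based index, xts() no longer complains about order.by
dfts_xts <- xts(as.numeric(dfts), order.by = dates)
head(dfts_xts)
```

The resulting timestamps are approximations derived from the decimal index, so they may not land exactly on the original week-ending dates.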
I want to calculate
"average of the closing prices for the 5, 10, 30 consecutive trading days immediately preceding and including the Announcement Day, but excluding trading halt days (days on which trading volume is 0 or NA)".
For example, suppose 2014/5/7 is the Announcement Day.
Then the average price for 5 consecutive days is:
average of (price of 2014/5/7,2014/5/5, 2014/5/2, 2014/4/30,2014/4/29),
price of 2014/5/6 and 2014/5/1 was excluded due to 0 trading volume on those days.
EDIT on 11/9/2014
One thing to Note: the announcement day for each stock is different, and it's not last valid date in the data, so usage of tail when calculating average was not appropriate.
Date Price Volume
2014/5/9 1.42 668000
2014/5/8 1.4 2972000
2014/5/7 1.5 1180000
2014/5/6 1.59 0
2014/5/5 1.59 752000
2014/5/2 1.6 138000
2014/5/1 1.6 NA
2014/4/30 1.6 656000
2014/4/29 1.61 364000
2014/4/28 1.61 1786000
2014/4/25 1.64 1734000
2014/4/24 1.68 1130000
2014/4/23 1.68 506000
2014/4/22 1.67 354000
2014/4/21 1.7 0
2014/4/18 1.7 0
2014/4/17 1.7 1954000
2014/4/16 1.65 1788000
2014/4/15 1.71 1294000
2014/4/14 1.68 1462000
Reproducible Code:
require(quantmod)
require(data.table)
tickers <- c("0007.hk","1036.hk")
date_begin <- as.Date("2010-01-01")
date_end <- as.Date("2014-09-09")
# retrieve data for all stocks
prices <- getSymbols(tickers, from = date_begin, to = date_end, auto.assign = TRUE)
dataset <- merge(Cl(get(prices[1])),Vo(get(prices[1])))
for (i in 2:length(prices)){
dataset <- merge(dataset, Cl(get(prices[i])),Vo(get(prices[i])))
}
# Write First
write.zoo(dataset, file = "prices.csv", sep = ",", qmethod = "double")
# Read zoo
test <- fread("prices.csv")
setnames(test, "Index", "Date")
Then I got a data.table. The first Column is Date, then the price and volume for each stock.
Actually, the original data contains information for about 40 stocks. Column names follow the same pattern: "X" + ticker.close, "X" + ticker.volume
Last trading days for different stock were different.
The desired output :
days 0007.HK 1036.HK
5 1.1 1.1
10 1.1 1.1
30 1.1 1.1
The major issues:
.SD and lapply and .SDCol can be used for looping different stocks. .N can be used when calculating last consecutive N days.
Due to the different announcement day, it becomes a little complicated.
Any suggestions on single stock using quantmod or multiple stocks using data.table are extremely welcomed!
Thanks GSee and pbible for the nice solutions, they were very useful. I'll update my code later to incorporate a different announcement day for each stock, and consult you later.
Indeed, it's more a xts question than a data.table one. Anything about data.table will be very helpful. Thanks a lot!
Because the different stocks have different announcement days, I tried to make a solution first following @pbible's logic. Any suggestions will be extremely welcome.
library(quantmod)
tickers <- c("0007.hk","1036.hk")
date_begin <- as.Date("2010-01-01")
# Instead of making one specific date_end, different date_end is used for convenience of the following work.
date_end <- c(as.Date("2014-07-08"),as.Date("2014-05-15"))
for ( i in 1: length(date_end)) {
stocks <- getSymbols(tickers[i], from = date_begin, to = date_end[i], auto.assign = TRUE)
dataset <- cbind(Cl(get(stocks)),Vo(get(stocks)))
usable <- subset(dataset,dataset[,2] > 0 & !is.na(dataset[,2]))
sma.5 <- SMA(usable[,1],5)
sma.10 <- SMA(usable[,1],10)
sma.30 <- SMA(usable[,1],30)
col <- as.matrix(rbind(tail(sma.5,1), tail(sma.10,1), tail(sma.30,1)))
colnames(col) <- colnames(usable[,1])
rownames(col) <- c("5","10","30")
if (i == 1) {
matrix <- as.matrix(col)
}
else {matrix <- cbind(matrix,col)}
}
I got what I want, but the code is ugly. Any suggestions to make it more elegant are extremely welcome!
Well, here's a way to do it. I don't know why you want to get rid of the loop, and this does not get rid of it (in fact it has a loop nested inside another). One thing that you were doing is growing objects in memory with each iteration of your loop (i.e. the matrix <- cbind(matrix,col) part is inefficient). This Answer avoids that.
library(quantmod)
tickers <- c("0007.hk","1036.hk")
date_begin <- as.Date("2010-01-01")
myEnv <- new.env()
date_end <- c(as.Date("2014-07-08"),as.Date("2014-05-15"))
lookback <- c(5, 10, 30) # different number of days to look back for calculating mean.
symbols <- getSymbols(tickers, from=date_begin,
to=tail(sort(date_end), 1), env=myEnv) # to=last date
end.dates <- setNames(date_end, symbols)
out <- do.call(cbind, lapply(end.dates, function(x) {
dat <- na.omit(get(names(x), pos=myEnv))[paste0("/", x)]
prc <- Cl(dat)[Vo(dat) > 0]
setNames(vapply(lookback, function(n) mean(tail(prc, n)), numeric(1)),
lookback)
}))
colnames(out) <- names(end.dates)
out
# 0007.HK 1036.HK
#5 1.080 8.344
#10 1.125 8.459
#30 1.186 8.805
Some commentary...
I created a new environment, myEnv, to hold your data so that it does not clutter your workspace.
I used the output of getSymbols (as you did in your attempt) because the input tickers are not uppercase.
I named the vector of end dates so that we can loop over that vector and know both the end date and the name of the stock.
The bulk of the code is an lapply loop (wrapped in do.call(cbind, ...)). I'm looping over the named end.dates vector.
The first line gets the data from myEnv, removes NAs, and subsets it to only include data up to the relevant end date.
The next line extracts the close column and subsets it to only include rows where volume is greater than zero.
The vapply loops over a vector of different lookbacks and calculates the mean. That is wrapped in setNames so that each result is named based on which lookback was used to calculate it.
The lapply call returns a list of named vectors. do.call(cbind, LIST) is the same as calling cbind(LIST[[1]], LIST[[2]], LIST[[3]]) except LIST can be a list of any length.
At this point we have a matrix with row names, but no column names. So, I named the columns based on which stock they represent.
Hope this helps.
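The core calculation can be checked offline against the table in the question. This is a sketch for a single stock: the helper mean_last_n is a hypothetical name, and the data are the ten most recent rows of the question's table typed in by hand.

```r
library(xts)

px <- xts(
  cbind(Price  = c(1.61, 1.61, 1.60, 1.60, 1.60, 1.59, 1.59, 1.50, 1.40, 1.42),
        Volume = c(1786000, 364000, 656000, NA, 138000,
                   752000, 0, 1180000, 2972000, 668000)),
  order.by = as.Date(c("2014-04-28", "2014-04-29", "2014-04-30", "2014-05-01",
                       "2014-05-02", "2014-05-05", "2014-05-06", "2014-05-07",
                       "2014-05-08", "2014-05-09"))
)

# Hypothetical helper: mean price over the last n valid trading days
# up to and including the announcement day
mean_last_n <- function(x, announcement, n) {
  upto <- x[paste0("/", announcement)]   # drop rows after the announcement
  vol  <- as.numeric(upto[, "Volume"])
  prc  <- as.numeric(upto[, "Price"])
  keep <- !is.na(vol) & vol > 0          # exclude trading-halt days
  mean(tail(prc[keep], n))
}

avg5 <- mean_last_n(px, as.Date("2014-05-07"), 5)
avg5
```

For the 2014/5/7 announcement day this averages the prices of 5/7, 5/5, 5/2, 4/30 and 4/29, giving 1.58, which matches the worked example in the question.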
How about something like this using the subset and moving average (SMA). Here is the solution I put together.
library(quantmod)
tickers <- c("0007.hk","1036.hk","cvx")
date_begin <- as.Date("2010-01-01")
date_end <- as.Date("2014-09-09")
stocks <- getSymbols(tickers, from = date_begin, to = date_end, auto.assign = TRUE)
stock3Summary <- function(stock){
dataset <- cbind(Cl(get(stock)),Vo(get(stock)))
usable <- subset(dataset,dataset[,2] > 0 & !is.na(dataset[,2]))
sma.5 <- SMA(usable[,1],5)
sma.10 <- SMA(usable[,1],10)
sma.30 <- SMA(usable[,1],30)
col <- as.matrix(rbind(tail(sma.5,1), tail(sma.10,1), tail(sma.30,1)))
colnames(col) <- colnames(usable[,1])
rownames(col) <- c("5","10","30")
col
}
matrix <- as.matrix(stock3Summary(stocks[1]))
for( i in 2:length(stocks)){
matrix <- cbind(matrix,stock3Summary(stocks[i]))
}
The output:
> matrix
X0007.HK.Close X1036.HK.Close CVX.Close
5 1.082000 8.476000 126.6900
10 1.100000 8.412000 127.6080
30 1.094333 8.426333 127.6767
This should work with multiple stocks. It will use only the most recent valid date.
Apologies in advance if this is answered elsewhere. I have searched for roughly 24 hrs and have come up empty at every turn.
This is the data set I am working with
Sys.setenv(TZ='GMT')
dat = read.csv("SPY_MINUTE_TRADE.csv", header = TRUE) #QuantQuote sample minute data
dat[,2] <- sprintf('%04d', dat[,2]) #add a zero to front of time IE 400 becomes 0400 aka 4AM
#Create a zoo object ordered by day and time from the dat dataframe
datzoo <- read.zoo(file=dat, sep=",", header=TRUE,
index.column=1:2, format="%Y%m%d %H%M", tz="",
colClasses = rep(c("character", "numeric"), c(2, 8)))
Spy <- as.xts(datzoo)
# Create regular series from 00:00 to 23:59 of 1 minute prints
y <- xts(seq(from = 1, to = 60*24, by = 1), as.POSIXlt((0),
origin="2013-03-30 00:00", tz='GMT')+seq(from = 0, to = 60*60*24-1, by = 60))
colnames(y) <- "TempIndex"
#Merge the regular ts (y) with Spy and remove the original Spy column
SpyReg <- merge(y,Spy, join='left')
SpyReg$TempIndex <- NULL
#Capture the index of Spy
ISpy <- index(Spy)
I have a few questions about the above code...
1) SpyReg["2012-03-30 04:00:00 GMT"] returns
OPEN HIGH LOW CLOSE VOLUME SPLITS EARNINGS DIVIDENDS
Spy["2012-03-30 04:00:00 GMT"] returns the correct values of Spy for the given index
OPEN HIGH LOW CLOSE VOLUME SPLITS EARNINGS DIVIDENDS
2012-03-30 04:00:00 140.66 140.66 140.66 140.66 2160 1 0 0
However,
SpyReg["T04:00:00/T04:01:00"]
OPEN HIGH LOW CLOSE VOLUME SPLITS EARNINGS DIVIDENDS
2013-03-30 04:00:00 NA NA NA NA NA NA NA NA
2013-03-30 04:01:00 NA NA NA NA NA NA NA NA
Why is this, when both are xts objects of the same index type, month, and time? Shouldn't SpyReg["2012-03-30 04:00:00 GMT"] return:
OPEN HIGH LOW CLOSE VOLUME SPLITS EARNINGS DIVIDENDS
2013-03-30 04:00:00 NA NA NA NA NA NA NA NA
2) Why did the merge not give SpyReg the Spy value for the same index (such as the 4AM print?) I tried all 4 "join" options, but none worked...
3) I assume there is a MUCH more elegant way to solve this problem than what I am trying to do. After creating Spy, it was not regular, minute by minute. I wanted to create a regular xts object that had no gaps and flowed continuously minute by minute from midnight to 23:59, add the entries from Spy into it, then do a na.locf to replace the rest of the NAs with the original data.
Setting the index of an xts object to POSIXlt can cause some strange behaviors. I'd simply recommend you use POSIXct instead.
URL <- "http://quantquote.com/sample/SPY_MINUTE_TRADE.csv"
Spy <- read.zoo(URL, sep=",", header=TRUE, index.column=1:2, FUN=function(x)
as.POSIXct(sprintf("%8d %04d",x[,1],x[,2]), format="%Y%m%d %H%M", tz=""))
Spy <- as.xts(Spy)
Now you can merge Spy with an 'empty' xts object that has the regular index values you want.
SpyReg <- merge(Spy, xts(, seq(start(Spy),end(Spy),by="1 min")), fill=na.locf)
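The merge-with-an-empty-object idiom can be tried without downloading the file. This sketch uses three invented minute prints (only 140.66 comes from the question, the other prices are assumptions) and applies na.locf as a separate step after the merge:

```r
library(xts)

# Irregular minute data with a gap between 04:00 and 04:03
times <- as.POSIXct(c("2012-03-30 04:00", "2012-03-30 04:03",
                      "2012-03-30 04:04"), tz = "GMT")
irregular <- xts(c(140.66, 140.70, 140.68), order.by = times)

# Regular one-minute grid covering the same span, built as POSIXct
grid <- seq(start(irregular), end(irregular), by = "1 min")

# Merging with a zero-column xts adds the missing timestamps as NA rows;
# na.locf then carries the last observation forward
regular <- na.locf(merge(irregular, xts(, grid)))
regular
```

The result has five rows, with 04:01 and 04:02 filled from the 04:00 print.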