Errors when using getSymbols() quantmod to predict stock price - r

I was trying to use quantmod to download historical stock price data. Here's my code:
Nasdaq100_Symbols <- c('GE','PG','MSFT','AAPL','PFE','AMD','DELL','GRPN','FB','CSCO','INTC',
'EZJ.L','BP','HSBC','MKS')
getSymbols(Nasdaq100_Symbols)
Warning messages:
1: DELL contains missing values. Some functions will not work if objects contain missing values in the middle of the series. Consider using na.omit(), na.approx(), na.fill(), etc to remove or replace them.
How can I remove these NA values? I'm trying to merge the series together and turn the result into a time series object:
nasdaq100 <- data.frame(as.xts(merge(GE,PG,MSFT,AAPL,PFE,AMD,DELL,GRPN,FB,CSCO,INTC,
EZJ.L,BP,HSBC,MKS)))
head(nasdaq100[,1:12],2)
GE.Open GE.High GE.Low GE.Close GE.Volume GE.Adjusted PG.Open PG.High PG.Low
2007-01-02 NA NA NA NA NA NA NA NA NA
2007-01-03 37.41 38.15 37.38 37.97 43222800 24.48669 63.72 64.66 63.7
PG.Close PG.Volume PG.Adjusted
2007-01-02 NA NA NA
2007-01-03 64.54 9717900 44.56958
class(nasdaq100)
[1] "data.frame"
# set outcome variable
outcomeSymbol <- 'FISV.Volume'
# shift outcome value to be on same line as predictors
library(xts)
nasdaq100 <- xts(nasdaq100,order.by=as.Date(rownames(nasdaq100)))
nasdaq100 <- as.data.frame(merge(nasdaq100,lm1=lag(nasdaq100[,outcomeSymbol],-1)))
Error in `[.xts`(nasdaq100, , outcomeSymbol) : subscript out of bounds
I'm stuck here. I found a tutorial on YouTube (https://www.youtube.com/watch?v=lDgvaJFpybU&t=32s) but can't move forward because of these warnings and errors. Can someone tell me how to fix them?

If you are going to reuse part of some example code, make sure you adjust everything accordingly. At the end you set outcomeSymbol to a column of the stock FISV, which you didn't download at the beginning of your script; that is why the subscript is out of bounds. I must also say that the code in the tutorial's script could be written better: there are far too many unnecessary switches between xts and data.frame. I'm not going to rewrite the whole thing, but this code fixes your errors.
First, instead of polluting your work environment with 100 stocks, I put everything in one list object and then merge it all together with Reduce and merge. The missing data in the DELL ticker will merge fine with everything else, but stays NA because there is no data. To deal with this, either don't download the DELL data or fill the gaps with 0 using the na.fill function. The latter might not be a good solution if you are going to use this data to train a model. I also show how to turn an xts object into a data.frame without having to use as.Date later on.
library(quantmod)
Nasdaq100_Symbols <- c('GE','PG','MSFT','AAPL','PFE','AMD','DELL')
# put all stocks in one list object
stocks <- lapply(Nasdaq100_Symbols, getSymbols, auto.assign = FALSE)
# following is not needed but if you want to use the list for other purposes
# it is a good practice to name all the different list objects.
# names(stocks) <- Nasdaq100_Symbols
# merge all stocks into 1 xts object
nasdaq100 <- Reduce(merge, stocks)
# fill NA's with 0
nasdaq100 <- na.fill(nasdaq100, 0)
outcomeSymbol <- "GE.Volume" # <-- used GE as that data is available in the downloaded data set
# merge outcome to data
nasdaq100 <- merge(nasdaq100, lm1 = lag(nasdaq100[, outcomeSymbol], -1))
# turn into data.frame
nasdaq100_df <- data.frame(date = index(nasdaq100), coredata(nasdaq100))
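The same pattern can be tried offline. This is a minimal sketch, assuming two made-up series (AAA and BBB are hypothetical tickers, not real downloads) in place of the getSymbols() output. It shows the NA padding that merge produces, a guard against the "subscript out of bounds" error, and the one-step shift created by a lag of -1:

```r
library(xts)

# Hypothetical stand-ins for downloaded data; BBB starts one day later than AAA.
dates <- as.Date("2020-01-01") + 0:3
AAA <- xts(matrix(c(10, 11, 12, 13, 100, 110, 120, 130), ncol = 2,
                  dimnames = list(NULL, c("AAA.Close", "AAA.Volume"))),
           order.by = dates)
BBB <- xts(matrix(c(21, 22, 23), ncol = 1,
                  dimnames = list(NULL, "BBB.Close")),
           order.by = dates[2:4])

# Merge every element of the list on the date index; gaps become NA.
stocks <- list(AAA, BBB)
merged <- Reduce(merge, stocks)

# Fail early with a readable message if the outcome column is missing.
outcomeSymbol <- "AAA.Volume"
stopifnot(outcomeSymbol %in% colnames(merged))

# A lag of -1 pulls the next day's value onto the current row.
# The column is named explicitly instead of relying on how merge labels it.
lead <- lag.xts(merged[, outcomeSymbol], k = -1)
colnames(lead) <- "lm1"
merged <- merge(merged, lead)
print(merged)
```

The last row of lm1 is NA because there is no following day to pull from, so that row usually has to be dropped before modelling.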

I am not exactly sure why you want to remove NAs before the merge.
I do it after the merge and it works perfectly for me, because xts objects are merged on their index (the dates). I only keep Adjusted Close, so my usual code looks like:
yahoo_symbols <- c(share1, share2, share3,...)
qts_env <- new.env()
getSymbols(yahoo_symbols,
env = qts_env,
from = start_date,
to = end_date,
periodicity = "daily"
)
shares_cl <- do.call(merge, eapply(qts_env, Ad))
shares_cl <- na.omit(shares_cl)
I hope that it helps.
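To see what removing NAs after the merge does, here is a small sketch with two invented price series of different length (the values and the names s1 and s2 are assumptions for illustration only):

```r
library(xts)

d <- as.Date("2021-03-01") + 0:4

# Hypothetical adjusted-close series; s2 only covers the last three days.
s1 <- xts(c(1.0, 1.1, 1.2, 1.3, 1.4), order.by = d)
s2 <- xts(c(5.0, 5.5, 6.0), order.by = d[3:5])

# Outer merge on the date index: days missing from s2 are padded with NA.
shares_cl <- do.call(merge, list(s1, s2))
colnames(shares_cl) <- c("s1", "s2")

# na.omit() drops every row that still contains an NA,
# leaving only the dates present in all series.
complete <- na.omit(shares_cl)
print(complete)
```

Note that this keeps only the common date range, so one short series shortens the whole data set.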

Related

quantmod difficulty loading data in correct format

I am very new to R. I watched a YouTube video on various time series analyses, but it downloaded data from Yahoo, whereas my data is in Excel. I wanted to follow the same analysis with data from a .csv file exported from Excel. I spent two days finding out that the date must be in USA style. Now I am stuck again on a basic step: loading the data so it can be analysed, which seems to be the biggest hurdle with R. Can someone please explain why the command shown below does not compute returns for the complete set of columns? I tried the zoo format, but it didn't work; then I tried xts and it worked partially. I suspect the original import from Excel is the major problem. Can I get some guidance, please?
> AllPrices <- as.zoo(AllPrices)
> head(AllPrices)
Index1 Index2 Index3 Index4 Index5 Index6 Index7 Index8 Index9 Index10
> AllRets <- dailyReturn(AllPrices)
Error in NextMethod("[<-") : incorrect number of subscripts on matrix
> AllPrices<- as.xts(AllPrices)
> AllRets <- dailyReturn(AllPrices)
> head(AllRets)
daily.returns
2012-11-06 0.000000e+00
2012-11-07 -2.220249e-02
2012-11-08 1.379504e-05
2012-11-09 2.781961e-04
2012-11-12 -2.411128e-03
2012-11-13 7.932869e-03
Try to load your data using the readr package.
library(readr)
Then, look at the documentation by running ?read_csv in the console.
I recommend reading in your data this way. Specify the column types. For instance, if your first column is the date, read it in as a character "c" and if your other columns are numeric use "n".
data <- read_csv('YOUR_DATA.csv', col_types = "cnnnnn") # date in left column, 5 numeric columns
data$Dates <- as.Date(data$Dates, format = "%Y-%m-%d") # make the dates column a Date class (change "Dates" to the name of your date column; you may also need to change the format)
data <- as.data.frame(data) # turn the result into a data.frame
library(xts)
data <- xts(data[,-1], order.by = data[,1]) # then make an xts: the data is everything but the date column, order.by is the date column
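In the output shown in the question, dailyReturn gave back a single daily.returns column. One way to get returns for every column is to compute them directly on the xts object. The sketch below uses a tiny inline CSV with invented values and assumed column names (Dates, Index1, Index2) instead of a real file:

```r
library(xts)

# Inline stand-in for the CSV file; the values are made up.
csv_text <- "Dates,Index1,Index2
2012-11-06,100,50
2012-11-07,102,49
2012-11-08,101,49.98"

raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Numeric columns become the data, the date column becomes the index.
prices <- xts(as.matrix(raw[, -1]),
              order.by = as.Date(raw$Dates, format = "%Y-%m-%d"))

# Simple returns for all columns at once: price today / price yesterday - 1.
all_rets <- prices / lag.xts(prices, k = 1) - 1
print(all_rets)
```

The first row is NA because there is no previous price to compare with.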

NULL values, regression / correlation switch

I have a dataset with, let's say, 2 variables. I want to do some regression testing, but quite a few of the numeric observations are "NULL". I would still like to use these observations, but I don't want to convert "NULL" to a specific number such as 99999.
I keep trying different approaches I found by googling, and none of them work.
Benny2 <- read_excel("C:/Users/EH9508/Desktop/Benny2.xlsx")
I have two variables, "Days" and "Amount"; both contain numeric values and "NULL".
Any help would be appreciated.
You can convert the file/sheet to csv from Excel (save as > csv) and then:
mydata <- read.csv("path/to/file.csv")
If you don't have access to Excel, then this is how it goes with the xlsx library:
library("xlsx")
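mydata <- read.xlsx("path/to/file.xlsx", sheetIndex = 1) # read.xlsx needs a sheet index or a sheet name
If you put the csv/xlsx file in the same folder as your R script (and that folder is your working directory), you can type the file name without the path, as in read.xlsx("file.xlsx", sheetIndex = 1).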
If you already have your data in R and are wondering how to get the NULL converted to a given value, try this:
mydata <- matrix(rnorm(10),5,2) # Your data
mydata[2,1] <- NA # Some NA
mydata[5,2] <- NA
mydata[is.na(mydata)] <- 99999 # Replace the NA entries with 99999
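If the goal is to keep the "NULL" entries as missing rather than replace them with a number, one option is to map them to NA at import time and let the modelling functions skip incomplete rows. A minimal sketch with invented data follows; read.csv's na.strings argument does the mapping here, and readxl's read_excel has an na argument that should serve the same purpose for the Excel file:

```r
# Inline stand-in for the file; the numbers are made up.
csv_text <- "Days,Amount
10,200
NULL,250
30,NULL
40,410
50,500"

# Treat the literal string "NULL" as a missing value so the columns stay numeric.
benny <- read.csv(text = csv_text, na.strings = "NULL")
str(benny)

# lm() drops rows containing NA under the default na.action (na.omit).
fit <- lm(Amount ~ Days, data = benny)
nobs(fit)

# For a correlation, ask explicitly for complete observations only.
cor(benny$Days, benny$Amount, use = "complete.obs")
```

Both the regression and the correlation then use only the rows where Days and Amount are both present.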

Create XTS with dates with different number of rows

I'm trying to create an xts object, but after formatting the data on load I have a different number of rows for the dates than for the data. My data has many columns with varying numbers of rows, anywhere from 20 to 200. I want to create a separate variable after loading, and that variable will depend on the composite I want to look at, so I want a full data.frame with NAs first, and only then create a variable on which I will call na.omit to reduce the dimensions.
Here is the code:
#load file with desired composite
allcomposites <- read.csv("Composites 2014.08.31.csv", header = T)
compositebench <- allcomposites[1, 2:ncol(allcomposites)]
dates1 <- as.Date(allcomposites$Name, format = "%m/%d/%Y")
allcomposites <- as.data.frame(lapply(allcomposites[2:nrow(allcomposites),2:ncol(allcomposites)], as.numeric))
allcomposites <- as.xts(allcomposites, order.by = dates1)
## Error in xts(x, order.by = order.by, frequency = frequency, ...) :
## NROW(x) must match length(order.by)
Edit to show what allcomposites looks like:
Name Composite1 Composite2 Composite3 Composite4 Composite5
Bmark 229 229 982 612 995
8/31/2014 0.9979 0.9404 4.3808 3.9296
7/31/2014 -0.4563 -0.3038 -1.7817 -1.7248
6/30/2014 0.205 0.2234 2.2184 2.7304
5/31/2014 1.311 1.5771 3.4824 1.7601
4/30/2014 0.9096 1.0187 -1.9195 1.2964
You need to be more careful when removing the first row from dates1 as well as allcomposites.
Here's another way to accomplish your goal:
Lines <- "Name Composite1 Composite2 Composite3 Composite4 Composite5
Bmark 229 229 982 612 995
8/31/2014 0.9979 0.9404 4.3808 3.9296
7/31/2014 -0.4563 -0.3038 -1.7817 -1.7248
6/30/2014 0.205 0.2234 2.2184 2.7304
5/31/2014 1.311 1.5771 3.4824 1.7601
4/30/2014 0.9096 1.0187 -1.9195 1.2964"
library(xts)
# use fill=TRUE because you only provided data for 4 composites
allcomp <- read.table(text=Lines, header=TRUE, fill=TRUE)
# remove the first row that contains "Bmark"
allcomp <- allcomp[-1,]
# create an xts object from the remaining data
allcomp_xts <- xts(allcomp[,-1], as.Date(allcomp[,1], "%m/%d/%Y"))
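The original approach can also be kept if the benchmark row is removed from the dates and the data in the same step. This is a sketch under the assumption that the file is comma-separated and shaped like the sample (shortened here to two composites):

```r
library(xts)

# Inline stand-in for "Composites 2014.08.31.csv", cut down to two columns.
csv_text <- "Name,Composite1,Composite2
Bmark,229,229
8/31/2014,0.9979,0.9404
7/31/2014,-0.4563,-0.3038
6/30/2014,0.205,0.2234"

allcomposites <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

# Keep the benchmark row separately, then drop it once from the whole table,
# so the dates and the values are guaranteed to stay the same length.
compositebench <- allcomposites[1, -1]
body_rows <- allcomposites[-1, ]

dates1 <- as.Date(body_rows$Name, format = "%m/%d/%Y")
values <- as.data.frame(lapply(body_rows[, -1], as.numeric))

# This is the condition that the xts error message complains about.
stopifnot(nrow(values) == length(dates1))

allcomposites_xts <- xts(as.matrix(values), order.by = dates1)
print(allcomposites_xts)
```

xts sorts the rows by date, so the result runs from the oldest to the newest observation even though the file is in reverse order.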
## Error in xts(x, order.by = order.by, frequency = frequency, ...) :
## NROW(x) must match length(order.by)
I wasted hours running into this error. Regardless of whether or not I had the exact same problem, I'll show how I solved this error message in case it saves you the pain I had.
I imported an Excel or CSV file (tried both) through several importing functions, then tried to convert my data (as either a data.frame or a zoo object) into an xts object and kept getting errors, this one included.
I tried creating a vector of dates separately to pass in as the order.by parameter. I tried making sure the date vector and the rows of the data.frame were the same length. Sometimes it worked and sometimes it didn't, for reasons I can't explain. Even when it did work, R had "coerced" all my numeric data into character data. (Causing me endless problems, later. Watch for coercion, I learned.)
These errors kept happening until:
For xts conversion I used the date column from the imported Excel sheet as the order.by parameter with an as.Date() modifier, AND I *dropped the date column during the conversion to xts.*
Here's the working code:
library(readxl) # provides read_excel
library(xts)
xl_sheet <- read_excel("../path/to/my_excel_file.xlsx")
sheet_xts <- xts(xl_sheet[-1], order.by = as.Date(xl_sheet$date))
Note my date column was the first column, so the xl_sheet[-1] removed the first column.
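The coercion mentioned above is easy to reproduce. xts stores its data as a matrix, and a matrix can hold only one type, so leaving a text date column in the data turns every value into character. A small sketch with an invented two-row sheet (the column names are assumptions):

```r
library(xts)

# Hypothetical stand-in for the imported sheet.
sheet <- data.frame(date = c("2020-01-01", "2020-01-02"),
                    price = c(1.5, 2.5),
                    volume = c(10, 20),
                    stringsAsFactors = FALSE)

# With the date column included, everything becomes character.
with_dates <- as.matrix(sheet)
is.character(with_dates)

# Without the date column, the matrix stays numeric.
without_dates <- as.matrix(sheet[-1])
is.numeric(without_dates)

# Dates go into the index, numbers into the data.
sheet_xts <- xts(without_dates, order.by = as.Date(sheet$date))
print(sheet_xts)
```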

getSymbols downloading data for multiple symbols and calculate returns

I'm currently downloading stock data using getSymbols from the quantmod package, calculating the daily stock returns, and then combining the data into a data frame. I would like to do this for a very large set of stock symbols. See the example below. Instead of doing this manually, I would like to use a for loop, if possible, or maybe one of the apply functions, but I cannot find the solution.
This is what I currently do:
Symbols<-c ("XOM","MSFT","JNJ","GE","CVX","WFC","PG","JPM","VZ","PFE","T","IBM","MRK","BAC","DIS","ORCL","PM","INTC","SLB")
length(Symbols)
#daily returns for selected stocks & SP500 Index
SP500<-as.xts(dailyReturn(na.omit(getSymbols("^GSPC",from=StartDate,auto.assign=FALSE))))
S1<-as.xts(dailyReturn(na.omit(getSymbols(Symbols[1],from=StartDate,auto.assign=FALSE))))
S2<-as.xts(dailyReturn(na.omit(getSymbols(Symbols[2],from=StartDate,auto.assign=FALSE))))
S3<-as.xts(dailyReturn(na.omit(getSymbols(Symbols[3],from=StartDate,auto.assign=FALSE))))
S4<-as.xts(dailyReturn(na.omit(getSymbols(Symbols[4],from=StartDate,auto.assign=FALSE))))
S5<-as.xts(dailyReturn(na.omit(getSymbols(Symbols[5],from=StartDate,auto.assign=FALSE))))
S6<-as.xts(dailyReturn(na.omit(getSymbols(Symbols[6],from=StartDate,auto.assign=FALSE))))
S7<-as.xts(dailyReturn(na.omit(getSymbols(Symbols[7],from=StartDate,auto.assign=FALSE))))
S8<-as.xts(dailyReturn(na.omit(getSymbols(Symbols[8],from=StartDate,auto.assign=FALSE))))
S9<-as.xts(dailyReturn(na.omit(getSymbols(Symbols[9],from=StartDate,auto.assign=FALSE))))
S10<-as.xts(dailyReturn(na.omit(getSymbols(Symbols[10],from=StartDate,auto.assign=FALSE))))
....
S20<-as.xts(dailyReturn(na.omit(getSymbols(Symbols[20],from=StartDate,auto.assign=FALSE))))
SPportD<-cbind(SP500,S1,S2,S3,S4,S5,S6,S7,S8,S9,S10,S11,S12,S13,S14,S15,S16,S17,S18,S19,S20)
names(SPportD)[1:(length(Symbols)+1)]<-c("SP500",Symbols)
SPportD.df<-data.frame(index(SPportD),coredata(SPportD),stringsAsFactors=FALSE)
names(SPportD.df)[1:(length(Symbols)+2)]<-c(class(StartDate),"SP500",Symbols)
Any suggestions?
Thanks!
dailyReturn uses close prices, so I would recommend you either use a different function (e.g. TTR::ROC on the Adjusted column), or adjust the close prices for dividends/splits (using adjustOHLC) before calling dailyReturn.
library(quantmod)
Symbols <- c("XOM","MSFT","JNJ","GE","CVX","WFC","PG","JPM","VZ","PFE",
"T","IBM","MRK","BAC","DIS","ORCL","PM","INTC","SLB")
# create environment to load data into
Data <- new.env()
getSymbols(c("^GSPC",Symbols), from="2007-01-01", env=Data)
# calculate returns, merge, and create data.frame (eapply loops over all
# objects in an environment, applies a function, and returns a list)
Returns <- eapply(Data, function(s) ROC(Ad(s), type="discrete"))
ReturnsDF <- as.data.frame(do.call(merge, Returns))
# adjust column names and re-order columns
colnames(ReturnsDF) <- gsub("\\.Adjusted$","",colnames(ReturnsDF))
ReturnsDF <- ReturnsDF[,c("GSPC",Symbols)]
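The environment-plus-eapply pattern can be tested without a download. This sketch fills an environment with two invented series (AAA and BBB are hypothetical names) and swaps ROC for plain arithmetic, so it needs only xts:

```r
library(xts)

d <- as.Date("2022-01-03") + 0:2

# Environment holding one price series per symbol, as getSymbols(env = ...) would.
Data <- new.env()
assign("AAA", xts(c(10, 11, 12.1), order.by = d), envir = Data)
assign("BBB", xts(c(20, 19, 19), order.by = d), envir = Data)

# eapply() applies the function to every object in the environment
# and returns a named list; here: discrete returns.
Returns <- eapply(Data, function(s) s / lag.xts(s, k = 1) - 1)

# The order of eapply() results is not guaranteed, so sort by name.
Returns <- Returns[sort(names(Returns))]

# Merge on dates, then label the columns explicitly.
ReturnsX <- do.call(merge, unname(Returns))
colnames(ReturnsX) <- names(Returns)
ReturnsDF <- as.data.frame(ReturnsX)
print(ReturnsDF)
```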
lapply is your friend:
Stocks = lapply(Symbols, function(sym) {
dailyReturn(na.omit(getSymbols(sym, from=StartDate, auto.assign=FALSE)))
})
Then to merge:
do.call(merge, Stocks)
The same approach applies to the other assignments.
The packages used are quantmod for data download and PerformanceAnalytics for analysis/plotting.
Care must be taken with time series date alignment.
Code
require(quantmod)
require(PerformanceAnalytics)
Symbols<-c ("XOM","MSFT","JNJ","GE","CVX","WFC","PG","JPM","VZ","PFE","T","IBM","MRK","BAC","DIS","ORCL","PM","INTC","SLB")
length(Symbols)
#Set start date
start_date=as.Date("2014-01-01")
#Create New environment to contain stock price data
dataEnv<-new.env()
#download data
getSymbols(Symbols,env=dataEnv,from=start_date)
#You have 19 symbols, the time series data for all the symbols might not be aligned
#Load Systematic investor toolbox for helpful functions
# setInternet2(TRUE) # Windows-only helper from older R versions; uncomment only if your R build still needs it
con = gzcon(url('https://github.com/systematicinvestor/SIT/raw/master/sit.gz', 'rb'))
source(con)
close(con)
#helper function for extracting Closing price of getsymbols output and for date alignment
bt.prep(dataEnv,align='remove.na')
#Now all your time series are correctly aligned
#prices data
stock_prices = dataEnv$prices
head(stock_prices[,1:3])
# head(stock_prices[,1:3])
# BAC CVX DIS
#2014-01-02 16.10 124.14 76.27
#2014-01-03 16.41 124.35 76.11
#2014-01-06 16.66 124.02 75.82
#2014-01-07 16.50 125.07 76.34
#2014-01-08 16.58 123.29 75.22
#2014-01-09 16.83 123.29 74.90
#calculate returns
stock_returns = Return.calculate(stock_prices, method = c("discrete"))
head(stock_returns[,1:3])
# head(stock_returns[,1:3])
# BAC CVX DIS
#2014-01-02 NA NA NA
#2014-01-03 0.019254658 0.001691638 -0.002097810
#2014-01-06 0.015234613 -0.002653800 -0.003810275
#2014-01-07 -0.009603842 0.008466376 0.006858349
#2014-01-08 0.004848485 -0.014232030 -0.014671208
#2014-01-09 0.015078408 0.000000000 -0.004254188
#Plot Performance for first three stocks
charts.PerformanceSummary(stock_returns[,1:3],main='Stock Absolute Performance',legend.loc="bottomright")
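If sourcing an external toolbox is not an option, dates can also be aligned with xts alone. The sketch below is not a reproduction of bt.prep; it is one simple alternative, an inner join that keeps only the dates present in both series, shown on invented prices:

```r
library(xts)

# Two hypothetical price series; b starts one day later than a.
a <- xts(c(100, 101, 102, 103), order.by = as.Date("2014-01-02") + 0:3)
b <- xts(c(50, 51, 52), order.by = as.Date("2014-01-03") + 0:2)

# join = "inner" keeps only the dates that appear in both objects.
prices <- merge(a, b, join = "inner")
colnames(prices) <- c("a", "b")

# Discrete returns on the aligned prices.
returns <- prices / lag.xts(prices, k = 1) - 1
print(returns)
```

After the join every row has a value for both series, so the only NA left in the returns is the first row.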

Get frequency for TS from an XTS for X12

I'm trying to automate some seasonal adjustment with the x12 package. To do this I need a ts object. However, I do not need a simple ts object, but one whose start date and frequency have been set. For any given series I could type that in, but I will be feeding in a mix of monthly and weekly data. I can get the data from quantmod as an xts object, but can't seem to figure out how to extract the frequency from the xts.
Here is some sample code that works the whole way through, but I would like to pull the frequency info from the xts rather than set it explicitly:
getSymbols("WILACR3URN",src="FRED", from="2000-01-01") # get data as an XTS
lax <- WILACR3URN #shorten name
laxts <- ts(lax$WILACR3URN, start=c(2000,1), frequency=12) # setting start and frequency explicitly works
plot.ts(laxts)
x12out <- x12(laxts,x12path="c:\\x12arima\\x12a.exe",transform="auto", automdl=TRUE)
laxadj <- as.ts(x12out$d11) # extract seasonally adjusted series
Any suggestions? Or is it not possible and I should determine/feed the frequency explicitly?
Thanks
This is untested for this specific case, but try using xts::periodicity for the frequency:
freq <- switch(periodicity(lax)$scale,
daily=365,
weekly=52,
monthly=12,
quarterly=4,
yearly=1)
And use the year and mon elements of POSIXlt objects to calculate the start year and month.
pltStart <- as.POSIXlt(start(lax))
Start <- c(pltStart$year+1900,pltStart$mon+1)
laxts <- ts(lax$WILACR3URN, start=Start, frequency=freq)
plot.ts(laxts)
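The periodicity approach can be checked on a synthetic monthly series without contacting FRED. In this sketch the data values are invented, and an explicit error is added for scales the switch does not cover (that guard is an addition, not part of the answer above):

```r
library(xts)

# Hypothetical monthly series standing in for the FRED download.
idx <- seq(as.Date("2000-01-01"), by = "month", length.out = 24)
lax <- xts(seq_len(24) / 10, order.by = idx)

# Map the detected scale to a ts frequency; stop on anything unexpected.
scale <- periodicity(lax)$scale
freq <- switch(scale,
               daily = 365,
               weekly = 52,
               monthly = 12,
               quarterly = 4,
               yearly = 1,
               stop("unsupported periodicity: ", scale))

# Derive the start year and month from the first index value.
pltStart <- as.POSIXlt(start(lax))
Start <- c(pltStart$year + 1900, pltStart$mon + 1)

laxts <- ts(as.numeric(coredata(lax)), start = Start, frequency = freq)
print(laxts)
```

The second element of start is the position within the year in units of the frequency, so the month number is only correct for monthly data; quarterly or weekly series would need the quarter or week number there instead.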
The xts::periodicity suggestion was helpful to me. I've also found the following approach using xts::convertIndex works well for monthly and quarterly data. It is untested for weekly data.
require("quantmod")
require("dplyr")
getSymbols("WILACR3URN",src="FRED", from="2000-01-01") # get data as an XTS
lax <- WILACR3URN #shorten name
laxts <- lax %>%
convertIndex("yearmon") %>% # change index of xts object
as.ts(start = start(.), end = end(.)) # convert to ts
plot.ts(laxts)
