Loop to transform xts into dataframe - r

I am using the package quantmod to get historical share prices.
I want to create a loop that pulls back the prices and, as part of the loop, creates a dataframe for each share. I have been unsuccessful so far with the code below: it gets the share prices as expected, but they are returned as xts objects, whereas I need the information as dataframes. The as.data.frame part of the code doesn't do anything...
library(quantmod)
shares<-c("BARC.L", "BP.L", "DLG.L")
for(i in 1:length(shares)){
#gets share prices
getSymbols((paste(shares[i])), from = "2018-01-01")
#put the data into a dataframe (doesn't work).
shares[i]<-as.data.frame(shares[i])
}
The end result that I want is 3 dataframes - 1 for each share.
Can anyone suggest modifications to the code to achieve this please?

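Personally I would do it like this, which returns a named list with one xts object per share (wrap the getSymbols() call in as.data.frame() if you need data frames with the dates as rownames):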
library(quantmod)
shares<-c("BARC.L", "BP.L", "DLG.L")
my_shares <- lapply(shares, function(x) getSymbols(x, from = "2018-01-01", auto.assign = FALSE))
names(my_shares) <- shares
Or if you need the dates as a column instead of rownames:
my_shares <- lapply(shares, function(x) {
out <- getSymbols(x, from = "2018-01-01", auto.assign = FALSE)
out <- data.frame(dates = index(out), coredata(out))
return(out)
})
names(my_shares) <- shares
Or if you need everything in a tidy dataset:
library(tidyquant)
my_shares <- tq_get(shares)
my_shares
# A tibble: 7,130 x 8
symbol date open high low close volume adjusted
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 BARC.L 2008-01-02 464. 483. 460. 466. 38104837 344.
2 BARC.L 2008-01-03 466. 472. 458. 470. 33215781 347.
3 BARC.L 2008-01-04 466. 476. 447. 449. 42710244 332.
4 BARC.L 2008-01-07 447. 452. 433. 436. 58213512 322.
5 BARC.L 2008-01-08 439. 447. 421. 437. 105370539 322.
6 BARC.L 2008-01-09 432. 434. 420. 424. 71059078 313.
7 BARC.L 2008-01-10 428. 431. 413. 418. 54763347 309.
8 BARC.L 2008-01-11 416. 437. 416. 430. 72467229 317.
9 BARC.L 2008-01-14 430. 448. 427. 444. 56916500 328.
10 BARC.L 2008-01-15 445. 452. 428. 429. 77094907 317.
# ... with 7,120 more rows

Firstly, I suggest you use the help() function that comes with R packages if you're not already doing so. I noticed in help(getSymbols) that you need to set env=NULL to actually return the data. With that, I've also made a list object so you can store the data as data.frames like you requested:
library(quantmod)
shares<-c("BARC.L", "BP.L", "DLG.L")
# initialize a list to store your data frames
df_list <- vector("list", length(shares))
for (i in seq_along(shares)) {
#gets share prices
df_list[[i]] <- as.data.frame(getSymbols(shares[i], from = "2018-01-01", env=NULL))
}
# so you can access by name, e.g. df_list$DLG.L
names(df_list) <- shares
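As for why the as.data.frame() call in the question appears to do nothing: shares[i] is only the ticker name (a character string), so that is what gets converted, not the price series that getSymbols() assigned to a variable of the same name. A minimal sketch of the difference, using a small made-up xts object instead of downloaded data (the prices and column names are assumptions for illustration only):

```r
library(xts)

# Made-up prices standing in for what getSymbols() would return
prices <- xts(cbind(Open = c(1, 2, 3), Close = c(1.5, 2.5, 3.5)),
              order.by = as.Date("2018-01-02") + 0:2)
shares <- c("BARC.L")

# Converting the ticker *name* gives a 1 x 1 data frame holding the string
wrong <- as.data.frame(shares[1])

# Converting the xts object itself gives the prices, with dates as a column
right <- data.frame(dates = index(prices), coredata(prices))
```

Here wrong holds only the text "BARC.L", while right has one row per date.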

Related

Why are daily returns all zeros using Quantmod?

I used the following code:
getSymbols(c("TSLA", "AAPL", "CSCO", "IBM"))
tsla<-TSLA['2022-01-03::2023-01-03']
aapl=AAPL['2022-01-03::2023-01-03']
csco=CSCO['2022-01-03::2023-01-03']
ibm=IBM['2022-01-03::2023-01-03']
tsla<-tsla$TSLA.Adjusted
aapl<-aapl$AAPL.Adjusted
csco<-csco$CSCO.Adjusted
ibm<-ibm$IBM.Adjusted
stkdata=cbind(tsla, aapl, csco, ibm)
n<-length(stkdata[,1])
rets<-log(stkdata[2:n,]/stkdata[1:(n-1),])
It produces all zeros.
After I assigned stkdata[2:n] to x and stkdata[1:n-1] to y, R shows
x[1,]
TSLA.Adjusted AAPL.Adjusted CSCO.Adjusted IBM.Adjusted
2022-01-04 383.1967 178.3907 59.26239 129.9028
y[1,]
TSLA.Adjusted AAPL.Adjusted CSCO.Adjusted IBM.Adjusted
2022-01-03 399.9267 180.6839 60.75242 128.0392
This is fine. But
x[1,]/y[1,]
Data:
numeric(0)
Index:
Date of length 0
What could be the problem? Thanks ahead!
This behavior is expected because arithmetic and logical operations on xts objects are done on observations that have the same date.
Use the lag() function to shift the series by one observation so that each price lines up with the previous one, e.g. log(stkdata / lag(stkdata)).
Note that you have to be very careful using lag() with dplyr loaded. It breaks how base R's lag() function is supposed to work, which breaks lag(my_xts). It also breaks lag() on all other types of objects that have their own lag() method (e.g. zoo).
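The alignment behaviour can be reproduced without downloading anything. The sketch below uses a made-up three-day price series (the values are assumptions, not market data) and assumes dplyr is not loaded, so lag() dispatches to the xts method:

```r
library(xts)

# Synthetic prices: each day is 10% above the previous one
stkdata <- xts(cbind(A = c(100, 110, 121)),
               order.by = as.Date("2022-01-03") + 0:2)
n <- nrow(stkdata)
x <- stkdata[2:n, ]
y <- stkdata[1:(n - 1), ]

# Only dates present in both x and y are kept, so each price is divided by itself
same_day <- log(x / y)

# x[1, ] and y[1, ] share no date, so the result is empty
no_overlap <- x[1, ] / y[1, ]

# lag() shifts the values, not the dates, so today lines up with yesterday
rets <- log(stkdata / lag(stkdata))
```

The first row of rets is NA because there is no previous observation to divide by.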
1) getSymbols can place the results into a local environment and then we can iterate over its elements using eapply. Then use diff with arithmetic=FALSE causing diff to perform division rather than subtraction.
If x is the ratio of the current price to the prior price, then log(x) approximately equals x - 1 when the return is small, but we don't need that approximation and can calculate the return exactly as x - 1.
Regarding the question, xts objects do not combine by position but by time. Removing the first or last element of an xts object does not change the times so the code in the question is dividing stkdata by itself except for the positions on the end which have been removed.
Try the code below.
library(quantmod)
tickers <- c("TSLA", "AAPL", "CSCO", "IBM")
getSymbols(tickers, env = e <- new.env(), from = "2022-01-03", to = "2023-01-03")
stks <- do.call("merge", eapply(e, Ad))
rets <- diff(stks, arithmetic = FALSE) - 1
2) A variation is to use getSymbols to load the data into the current R workspace, as in the question, and then use mget.
library(quantmod)
tickers <- c("TSLA", "AAPL", "CSCO", "IBM")
getSymbols(tickers, from = "2022-01-03", to = "2023-01-03")
stks <- do.call("merge", lapply(mget(tickers), Ad))
rets <- diff(stks, arithmetic = FALSE) - 1
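To see what diff() with arithmetic = FALSE returns, here is a small sketch on made-up prices (the series and column names are assumptions for illustration). It also compares the exact return x - 1 with the log approximation:

```r
library(xts)

# Synthetic prices for two hypothetical stocks
stks <- xts(cbind(A = c(100, 110, 99), B = c(50, 50, 55)),
            order.by = as.Date("2022-01-03") + 0:2)

# Ratio of each price to the previous one (first row is NA)
ratio <- diff(stks, arithmetic = FALSE)

# Exact simple returns and the corresponding log returns
rets <- ratio - 1
log_rets <- log(ratio)
```

For stock A the second-day simple return is 0.1, while the log return is log(1.1), which is slightly smaller.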
With tidyquant you can calculate daily log returns as such:
library(tidyquant)
library(tidyverse)
df = tq_get(c("TSLA", "AAPL", "CSCO", "IBM"),
from = "2022-01-03",
to = "2023-01-04")
log_return = df %>%
group_by(symbol) %>%
tq_mutate(select = adjusted,
mutate_fun = periodReturn,
period = "daily",
type = "log",
col_rename = "log_returns")
# A tibble: 1,008 × 9
# Groups: symbol [4]
symbol date open high low close volume adjusted log_returns
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 TSLA 2022-01-03 383. 400. 379. 400. 103931400 400. 0
2 TSLA 2022-01-04 397. 403. 374. 383. 100248300 383. -0.0427
3 TSLA 2022-01-05 382. 390. 360. 363. 80119800 363. -0.0550
4 TSLA 2022-01-06 359 363. 340. 355. 90336600 355. -0.0218
5 TSLA 2022-01-07 360. 360. 337. 342. 84164700 342. -0.0361
6 TSLA 2022-01-10 333. 353. 327. 353. 91815000 353. 0.0299
7 TSLA 2022-01-11 351. 359. 346. 355. 66063300 355. 0.00592
8 TSLA 2022-01-12 360. 372. 358. 369. 83739000 369. 0.0385
9 TSLA 2022-01-13 370. 372. 342. 344. 97209900 344. -0.0699
10 TSLA 2022-01-14 340. 351. 338. 350. 72924300 350. 0.0173
# … with 998 more rows
# ℹ Use `print(n = ...)` to see more rows
Plotting
log_return %>%
ggplot() +
aes(x = date, y = log_returns, col = symbol) +
geom_line() +
facet_wrap(~ symbol) +
theme_tq()

Generalize "$-notation"

I'm still getting used to working in R and thought constructing a "simple" MACD-screener would be a great way to get into some of the inner workings of R. However, I have encountered the following problem.
I've been able to calculate the MACD and signal line for a single stock. Now, to scan multiple stocks, I have to generalize the code. My question is: how can I use a variable (e.g. the name of the stock currently being looked at) in the "$-notation"?
After this I'm planning to do a "for loop" iterating over the names of stocks in a list-object. Is this a practical way of doing it?
Below I've inserted the code I have till now. In this code I'm looking to replace the "QQQ" with a variable.
library(quantmod)
tickers <- c('QQQ','SPY','AAPL','MMM')
ema.s = 12
ema.l = 26
ema.k = 9
ema.t = 200
getSymbols(tickers, from = '2021-01-6',
to = "2021-10-21",warnings = FALSE,
auto.assign = TRUE)
QQQ$QQQ.EMA.S <- EMA(QQQ[,6], n = ema.s)
QQQ$QQQ.EMA.L <- EMA(QQQ[,6], n = ema.l)
QQQ$QQQ.MACD <- QQQ$QQQ.EMA.S - QQQ$QQQ.EMA.L
QQQ$QQQ.SIG <- EMA(QQQ$QQQ.MACD, n = ema.k)
You can use tidyquant to do all of this in one go.
library(tidyquant)
ema.s = 12
ema.l = 26
tickers <- c('QQQ','SPY','AAPL','MMM')
# get all the data in a tibble
stock_data <- tq_get(tickers,
from = '2021-01-6',
to = "2021-10-21")
stock_data <- stock_data %>%
group_by(symbol) %>%
tq_mutate(select = adjusted,
mutate_fun = MACD,
nFast = ema.s, # TTR::MACD() names these arguments nFast and nSlow
nSlow = ema.l)
stock_data
# A tibble: 800 x 10
# Groups: symbol [4]
symbol date open high low close volume adjusted macd signal
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 QQQ 2021-01-06 307 312. 306. 308. 52809600 306. NA NA
2 QQQ 2021-01-07 310. 316. 310. 315. 30394800 314. NA NA
3 QQQ 2021-01-08 317. 319. 315. 319. 33955800 318. NA NA
4 QQQ 2021-01-11 316. 317. 314. 314. 32746400 313. NA NA
5 QQQ 2021-01-12 314. 316. 311. 314. 29266800 313. NA NA
6 QQQ 2021-01-13 314. 317. 314. 316. 22898400 315. NA NA
7 QQQ 2021-01-14 316. 318. 314. 314. 23500100 313. NA NA
8 QQQ 2021-01-15 314. 315. 311. 312. 35118700 311. NA NA
9 QQQ 2021-01-19 314. 317. 313. 316. 24537000 315. NA NA
10 QQQ 2021-01-20 320. 325. 317. 324. 30728100 323. NA NA
If you want to do this in base R functions combined with only quantmod functions, check the quantmod tag, there are a few posts that use lapply to do this. If you don't find what you need, let me know.
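On the "$-notation" itself: `$` does not evaluate a variable, but `[` and `[[` accept column or element names built with paste0(). The sketch below is one possible base R approach, not necessarily the one used in the posts mentioned above. It keeps the stocks in a named list and uses random-walk prices instead of downloaded data; make_prices() and add_macd() are hypothetical helper names introduced here.

```r
library(xts)
library(TTR)

set.seed(1)
dates <- as.Date("2021-01-06") + 0:59

# Hypothetical stand-in for getSymbols(): a random walk with one Adjusted column
make_prices <- function(sym) {
  x <- xts(100 + cumsum(rnorm(length(dates))), order.by = dates)
  colnames(x) <- paste0(sym, ".Adjusted")
  x
}

tickers <- c("QQQ", "SPY")
stocks <- lapply(setNames(tickers, tickers), make_prices)

# Build the column names from the ticker instead of hard-coding QQQ$QQQ.MACD
add_macd <- function(x, sym, n_short = 12, n_long = 26, n_signal = 9) {
  px <- x[, paste0(sym, ".Adjusted")]
  ema_s <- EMA(px, n = n_short)
  ema_l <- EMA(px, n = n_long)
  macd <- ema_s - ema_l
  sig <- EMA(macd, n = n_signal)
  out <- merge(x, ema_s, ema_l, macd, sig)
  colnames(out) <- c(colnames(x),
                     paste0(sym, c(".EMA.S", ".EMA.L", ".MACD", ".SIG")))
  out
}

# Apply to every stock; the list keeps its ticker names
stocks <- Map(add_macd, stocks, names(stocks))
```

After this, stocks[["QQQ"]][, "QQQ.MACD"] plays the role of QQQ$QQQ.MACD, and the same call works for any ticker held in a variable.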

How do I retain all the columns while using tq_transmute() function?

I am trying to replicate a trading strategy and backtest in R. However, I am having a slight problem with the tq_transmute() function. Any help would be appreciated.
So, I have the following code that I have written until now:
#Importing the etfs data
symbols<- c("SPY","XLF","XLE")
start<-as.Date("2000-01-01")
end<- as.Date("2018-12-31")
price_data<- lapply(symbols, function(symbol){
etfs<-as.data.frame(getSymbols(symbol,src="yahoo", from=start, to= end,
auto.assign = FALSE))
colnames(etfs)<- c("Open", "High","Low","Close","volume","Adjusted")
etfs$Symbol<- symbol
etfs$Date<- rownames(etfs)
etfs
})
# Next, I used do.call() with rbind() to combine the data into a single data frame
etfs_df<- do.call(rbind, price_data)
#This because of POSIXct error
daily_price<- etfs_df %>%
mutate(Date=as.Date(Date, frac=1))
# I have deleted some columns of the table as my work only concerned the "Adjusted" column.
#So, until now we have:
head(daily_price)
Adjusted Symbol Date
1 98.14607 SPY 2000-01-03
2 94.30798 SPY 2000-01-04
3 94.47669 SPY 2000-01-05
4 92.95834 SPY 2000-01-06
5 98.35699 SPY 2000-01-07
6 98.69440 SPY 2000-01-10
#Converting the daily adjusted price to monthly adjusted price
monthly_price<-
tq_transmute(daily_price,select = Adjusted, mutate_fun = to.monthly, indexAt = "lastof")
head(monthly_price)
# And now, I get the following table:
# A tibble: 6 x 2
Date Adjusted
<date> <dbl>
1 2000-01-31 16.6
2 2000-02-29 15.9
3 2000-03-31 17.9
4 2000-04-30 17.7
5 2000-05-31 19.7
6 2000-06-30 18.6
So, as you can see, the Date and Adjusted prices have been successfully converted to monthly figures but my Symbol column has disappeared. Could anyone please tell me why did that happen and how do I get it back?
Thank you.
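tq_transmute() returns only the columns created by mutate_fun (plus the date index), so Symbol is dropped. Group the data by Symbol before applying tq_transmute() and the grouping column is kept.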
library(dplyr)
library(quantmod)
library(tidyquant)
monthly_price <- daily_price %>%
group_by(Symbol) %>%
tq_transmute(select = Adjusted,
mutate_fun = to.monthly, indexAt = "lastof")
# Symbol Date Adjusted
# <chr> <date> <dbl>
# 1 SPY 2000-01-31 94.2
# 2 SPY 2000-02-29 92.7
# 3 SPY 2000-03-31 102.
# 4 SPY 2000-04-30 98.2
# 5 SPY 2000-05-31 96.6
# 6 SPY 2000-06-30 98.5
# 7 SPY 2000-07-31 97.0
# 8 SPY 2000-08-31 103.
# 9 SPY 2000-09-30 97.6
#10 SPY 2000-10-31 97.2
# … with 674 more rows
I would do it like this:
symbols <- c("SPY", "XLF", "XLE")
start <- as.Date("2000-01-01")
end <- as.Date("2018-12-31")
# Environment to hold data
my_data <- new.env()
# Tell getSymbols() to load the data into 'my_data'
getSymbols(symbols, from = start, to = end, env = my_data)
# Combine all the adjusted close prices into one xts object
price_data <- Reduce(merge, lapply(my_data, Ad))
# Remove "Adjusted" from column names
colnames(price_data) <- sub(".Adjusted", "", colnames(price_data), fixed = TRUE)
# Get the last price for each month
monthly_data <- apply.monthly(price_data, last)
# Convert to a long data.frame
long_data <- fortify.zoo(monthly_data,
names = c("Date", "Symbol", "Adjusted"), melt = TRUE)

How can I merge data from CSV files?

I want to perform an analysis of 2 stocks for a period from 2017-01-01 until 2020-04-14. Unfortunately, I struggle with importing data.
I was trying to import data from excel, limit data for the period from 2017-01-01 until 2020-04-14 and merge these data.
x <- read.csv("data/pkn_d.csv")
y <- read.csv("data/lts_d.csv")
head(x)
Date Open High Low Close Volume
1 1999-11-26 16.307 16.452 15.717 16.229 14845780
2 1999-11-29 16.154 16.229 15.863 15.940 5148506
3 1999-11-30 16.086 16.375 16.086 16.229 3077465
4 1999-12-01 16.375 16.742 16.229 16.742 2881475
5 1999-12-02 16.895 17.407 16.818 17.040 3093313
6 1999-12-03 17.040 17.330 16.895 17.260 2207547
head(y)
Date Open High Low Close Volume
1 2005-06-09 26.676 26.676 25.013 25.013 1795647
2 2005-06-10 25.097 25.433 24.594 24.594 679054
3 2005-06-13 25.013 25.097 24.594 24.762 213950
4 2005-06-14 24.929 24.929 24.762 24.762 181415
5 2005-06-15 24.762 24.845 24.594 24.762 160359
6 2005-06-16 24.762 24.762 24.350 24.350 171475
I'm only interested in data from 2017-01-01 until 2020-04-14 and 5th column (close price)
x <- x[4285:5100, 5]
y <- y[2899:3714, 5]
Next, I want to merge these data:
merge(x,y)
However, I don't obtain any meaningful output. How can I solve this issue?
Since the question doesn't include a reproducible example, here is a solution that merges a set of stock prices retrieved from the internet via the quantmod package.
library("quantmod")
#
symbolList <- c("PKN","LTS")
from.dat <- as.Date("2017-01-01",format="%Y-%m-%d")
to.dat <- as.Date("2020-04-14",format="%Y-%m-%d")
prices <- lapply(symbolList,function(x){
getSymbols(x,auto.assign = FALSE,from = from.dat,to = to.dat)[,4]
})
priceData <- do.call(merge,prices)
head(priceData)
...and the output:
> head(priceData)
PKN.Close LTS.Close
2017-01-03 49.370 2.54
2017-01-04 50.370 2.57
2017-01-05 89.340 2.43
2017-01-06 89.340 2.38
2017-01-09 49.855 2.36
2017-01-10 88.300 2.44
>
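For the CSV files themselves, the likely reason merge(x, y) gives nothing meaningful is that x and y have been reduced to plain numeric vectors, so there is no date left to match on. A minimal sketch that keeps the Date column and merges on it, using small made-up data frames in place of the files (the values, the in_range() helper and the column suffixes are assumptions for illustration):

```r
# Made-up stand-ins for read.csv("data/pkn_d.csv") and read.csv("data/lts_d.csv")
x <- data.frame(Date = c("2016-12-30", "2017-01-02", "2017-01-03", "2020-04-15"),
                Close = c(10, 11, 12, 13))
y <- data.frame(Date = c("2017-01-02", "2017-01-03", "2017-01-04"),
                Close = c(20, 21, 22))
x$Date <- as.Date(x$Date)
y$Date <- as.Date(y$Date)

# Filter by date instead of by hard-coded row numbers, keeping Date and Close
in_range <- function(df, from, to) {
  df[df$Date >= from & df$Date <= to, c("Date", "Close")]
}
from <- as.Date("2017-01-01")
to <- as.Date("2020-04-14")

# Inner join on Date; suffixes tell the two Close columns apart
merged <- merge(in_range(x, from, to), in_range(y, from, to),
                by = "Date", suffixes = c(".pkn", ".lts"))
```

Only dates present in both files survive the merge; pass all = TRUE to merge() to keep every date and fill the gaps with NA.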

Looping with quantmod

I'm new to R, loops and quantmod. I'm trying to convince quantmod to skip any ticker it's unable to process and continue on to the next ticker symbol, instead of stopping. I thought I'd found my answer in "How do I loop through all the stocks with quantmod and ttr?", but I'm not able to get Rime's solution to work:
If the loop breaks, say on the 50th iteration, then just re run the last block of code by changing the following
# Actual loop:
# IF IT BREAKS ON THE 50th ITERATION, it must be skipped, therefore change it to 51
for(i in 51:length(symbols)) {
symbols[i]-> symbol
...
Below is my original code, which only returns 8 of the many values (so I assume that the 9th is the trouble spot).
library(gdata)
d = read.xls("~/Documents/TEST.xlsx", sheet = 1, stringsAsFactors=F)
library(quantmod)
sym <- as.character(d[,1])
results <- NULL
for (ii in sym){
data1 <- getSymbols(Symbols = ii,
src = "yahoo",
from = Sys.Date() - 100,
auto.assign = FALSE)
de = head(data1,150)
colnames(de) <- c("open","high","low","close","volume","adj.")
overnightRtn <- (as.numeric(de[2:nrow(de),"open"])/as.numeric(de[1:(nrow(de)-1),"close"])) - 1
results <- rbind(results,cbind(
paste(round(min(overnightRtn,na.rm=T),5),"%",sep="")))
}
colnames(results) <- c("overnightRtn2")
rownames(results) <- sym
View(results)
When I change for(ii in sym) to for(ii in 9:length(sym)) I get an error:
could not find function "getSymbols.9"
Here is the start of d[,1] :
[1] "ABX" "ACC" "ACCO" "ACE" "ACG" "ACH" "ACI" "ACM" "ACMP" "ACN"
There are some workarounds for errors when looping in R. One way is to use the tryCatch function; juba showed how to do it in another answer. I also made sure that the for loop only continues when the data1 variable has been assigned a value.
Change your for loop for the following code and it should work for what you are asking.
for (ii in sym){
data1 <- NULL # NULL data1
data1 <- tryCatch(getSymbols(Symbols = ii,
src = "yahoo",
from = Sys.Date() - 100,
auto.assign = FALSE),
error=function(e){}) # empty function for error handling
if(is.null(data1)) next # if data1 is still NULL go to next ticker
de = head(data1,150)
colnames(de) <- c("open","high","low","close","volume","adj.")
overnightRtn <- (as.numeric(de[2:nrow(de),"open"])/as.numeric(de[1:(nrow(de)-1),"close"])) - 1
results <- rbind(results,cbind(
paste(round(min(overnightRtn,na.rm=T),5),"%",sep="")))
}
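One thing to watch with this pattern: once tickers are skipped, results has fewer rows than sym, so the later rownames(results) <- sym no longer fits. A minimal sketch of the same tryCatch idea that stores each result under its ticker name instead. fetch_prices() is a hypothetical stand-in for getSymbols(..., auto.assign = FALSE) that fails for one made-up ticker, so the example runs without a network connection:

```r
# Hypothetical stand-in for getSymbols(): errors for the ticker "BAD"
fetch_prices <- function(sym) {
  if (sym == "BAD") stop("unable to import ", sym)
  data.frame(open = c(10, 11, 12), close = c(10.5, 10.8, 12.3))
}

sym <- c("ABX", "BAD", "ACC")
results <- list()

for (ii in sym) {
  # Return NULL instead of stopping when the download fails
  de <- tryCatch(fetch_prices(ii), error = function(e) NULL)
  if (is.null(de)) next

  # Overnight return: today's open relative to yesterday's close
  n <- nrow(de)
  overnight <- de$open[-1] / de$close[-n] - 1

  # Index by ticker so skipped symbols leave no gap
  results[[ii]] <- min(overnight, na.rm = TRUE)
}

results <- unlist(results)
```

The names of results then show exactly which tickers were processed.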
You might try the tidyquant package which takes care of error handling internally. It also doesn't require for-loops so it will save you a significant amount of code. The tq_get() function is responsible for getting stock prices. You can use the complete_cases argument to adjust how errors are handled.
Example with complete_cases = TRUE: Automatically removes "bad apples"
library(tidyquant)
# get data with complete_cases = TRUE automatically removes bad apples
c("AAPL", "GOOG", "BAD APPLE", "NFLX") %>%
tq_get(get = "stock.prices", complete_cases = TRUE)
#> Warning in value[[3L]](cond): Error at BAD APPLE during call to get =
#> 'stock.prices'. Removing BAD APPLE.
#> # A tibble: 7,680 × 8
#> symbol date open high low close volume adjusted
#> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 AAPL 2007-01-03 86.29 86.58 81.90 83.80 309579900 10.85709
#> 2 AAPL 2007-01-04 84.05 85.95 83.82 85.66 211815100 11.09807
#> 3 AAPL 2007-01-05 85.77 86.20 84.40 85.05 208685400 11.01904
#> 4 AAPL 2007-01-08 85.96 86.53 85.28 85.47 199276700 11.07345
#> 5 AAPL 2007-01-09 86.45 92.98 85.15 92.57 837324600 11.99333
#> 6 AAPL 2007-01-10 94.75 97.80 93.45 97.00 738220000 12.56728
#> 7 AAPL 2007-01-11 95.94 96.78 95.10 95.80 360063200 12.41180
#> 8 AAPL 2007-01-12 94.59 95.06 93.23 94.62 328172600 12.25892
#> 9 AAPL 2007-01-16 95.68 97.25 95.45 97.10 311019100 12.58023
#> 10 AAPL 2007-01-17 97.56 97.60 94.82 94.95 411565000 12.30168
#> # ... with 7,670 more rows
Example with complete_cases = FALSE: Returns nested data frame.
library(tidyquant)
# get data with complete_cases = FALSE returns a nested data frame
c("AAPL", "GOOG", "BAD APPLE", "NFLX") %>%
tq_get(get = "stock.prices", complete_cases = FALSE)
#> Warning in value[[3L]](cond): Error at BAD APPLE during call to get =
#> 'stock.prices'.
#> Warning in value[[3L]](cond): Returning as nested data frame.
#> # A tibble: 4 × 2
#> symbol stock.prices
#> <chr> <list>
#> 1 AAPL <tibble [2,560 × 7]>
#> 2 GOOG <tibble [2,560 × 7]>
#> 3 BAD APPLE <lgl [1]>
#> 4 NFLX <tibble [2,560 × 7]>
In both cases the user gets a warning message. The prudent user will read it and try to determine what the issue is. Most importantly, the long-running script will not fail.
