I am trying to replicate a trading strategy and backtest it in R, but I am having a problem with the tq_transmute() function. Any help would be appreciated.
Here is the code I have written so far:
#Importing the etfs data
symbols<- c("SPY","XLF","XLE")
start<-as.Date("2000-01-01")
end<- as.Date("2018-12-31")
price_data<- lapply(symbols, function(symbol){
etfs<-as.data.frame(getSymbols(symbol,src="yahoo", from=start, to= end,
auto.assign = FALSE))
colnames(etfs)<- c("Open", "High","Low","Close","volume","Adjusted")
etfs$Symbol<- symbol
etfs$Date<- rownames(etfs)
etfs
})
# Next, I used do.call() with rbind() to combine the data into a single data frame
etfs_df<- do.call(rbind, price_data)
# Convert Date to class Date (this works around a POSIXct error)
daily_price<- etfs_df %>%
mutate(Date=as.Date(Date, frac=1))
# I have deleted some columns of the table as my work only concerned the "Adjusted" column.
#So, until now we have:
head(daily_price)
Adjusted Symbol Date
1 98.14607 SPY 2000-01-03
2 94.30798 SPY 2000-01-04
3 94.47669 SPY 2000-01-05
4 92.95834 SPY 2000-01-06
5 98.35699 SPY 2000-01-07
6 98.69440 SPY 2000-01-10
#Converting the daily adjusted price to monthly adjusted price
monthly_price<-
tq_transmute(daily_price,select = Adjusted, mutate_fun = to.monthly, indexAt = "lastof")
head(monthly_price)
# And now, I get the following table:
# A tibble: 6 x 2
Date Adjusted
<date> <dbl>
1 2000-01-31 16.6
2 2000-02-29 15.9
3 2000-03-31 17.9
4 2000-04-30 17.7
5 2000-05-31 19.7
6 2000-06-30 18.6
So, as you can see, the Date and Adjusted prices have been successfully converted to monthly figures, but my Symbol column has disappeared. Could anyone please tell me why that happened and how I can get it back?
Thank you.
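tq_transmute() returns only the date column and the newly computed columns, so Symbol is dropped; without grouping, the prices of all three ETFs are also treated as one series. Group the data by Symbol and then apply tq_transmute():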
library(dplyr)
library(quantmod)
library(tidyquant)
monthly_price <- daily_price %>%
group_by(Symbol) %>%
# the data already arrives through the pipe, so it is not passed again here
tq_transmute(select = Adjusted,
mutate_fun = to.monthly, indexAt = "lastof")
# Symbol Date Adjusted
# <chr> <date> <dbl>
# 1 SPY 2000-01-31 94.2
# 2 SPY 2000-02-29 92.7
# 3 SPY 2000-03-31 102.
# 4 SPY 2000-04-30 98.2
# 5 SPY 2000-05-31 96.6
# 6 SPY 2000-06-30 98.5
# 7 SPY 2000-07-31 97.0
# 8 SPY 2000-08-31 103.
# 9 SPY 2000-09-30 97.6
#10 SPY 2000-10-31 97.2
# … with 674 more rows
I would do it like this:
symbols <- c("SPY", "XLF", "XLE")
start <- as.Date("2000-01-01")
end <- as.Date("2018-12-31")
# Environment to hold data
my_data <- new.env()
# Tell getSymbols() to load the data into 'my_data'
getSymbols(symbols, from = start, to = end, env = my_data)
# Combine all the adjusted close prices into one xts object
price_data <- Reduce(merge, lapply(my_data, Ad))
# Remove "Adjusted" from column names
colnames(price_data) <- sub(".Adjusted", "", colnames(price_data), fixed = TRUE)
# Get the last price for each month
monthly_data <- apply.monthly(price_data, last)
# Convert to a long data.frame
long_data <- fortify.zoo(monthly_data,
names = c("Date", "Symbol", "Adjusted"), melt = TRUE)
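The same wide-to-long reshaping can be tried without downloading anything. This is a minimal sketch, assuming made-up prices and the hypothetical tickers AAA and BBB in place of the Yahoo data:

```r
library(xts)  # also attaches zoo, which provides fortify.zoo()

# Hypothetical daily prices for two made-up tickers (not real market data)
dates <- as.Date("2000-01-03") + 0:59  # 2000-01-03 through 2000-03-02
price_data <- xts(cbind(AAA = 100 + 0:59, BBB = 10 + 0:59), order.by = dates)

# Keep the last available row of each month
monthly_data <- apply.monthly(price_data, last)

# Reshape to one row per (Date, Symbol) pair
long_data <- fortify.zoo(monthly_data,
                         names = c("Date", "Symbol", "Adjusted"), melt = TRUE)
```

The resulting long_data has a Symbol column, so it can be grouped, filtered, or plotted per ticker.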
Related
I used the following code:
getSymbols(c("TSLA", "AAPL", "CSCO", "IBM"))
tsla<-TSLA['2022-01-03::2023-01-03']
aapl=AAPL['2022-01-03::2023-01-03']
csco=CSCO['2022-01-03::2023-01-03']
ibm=IBM['2022-01-03::2023-01-03']
tsla<-tsla$TSLA.Adjusted
aapl<-aapl$AAPL.Adjusted
csco<-csco$CSCO.Adjusted
ibm<-ibm$IBM.Adjusted
stkdata=cbind(tsla, aapl, csco, ibm)
n<-length(stkdata[,1])
rets<-log(stkdata[2:n,]/stkdata[1:(n-1),])
It produces all zeros.
After I assigned stkdata[2:n,] to x and stkdata[1:(n-1),] to y, R shows
x[1,]
TSLA.Adjusted AAPL.Adjusted CSCO.Adjusted IBM.Adjusted
2022-01-04 383.1967 178.3907 59.26239 129.9028
y[1,]
TSLA.Adjusted AAPL.Adjusted CSCO.Adjusted IBM.Adjusted
2022-01-03 399.9267 180.6839 60.75242 128.0392
This is fine. But
x[1,]/y[1,]
Data:
numeric(0)
Index:
Date of length 0
What could be the problem? Thanks ahead!
This behavior is expected because arithmetic and logical operations on xts objects are done on observations that have the same date.
You should use the lag() function to change the datetime index alignment. log(stkdata / lag(stkdata)).
Note that you have to be very careful using lag() with dplyr loaded. It breaks how base R's lag() function is supposed to work, which breaks lag(my_xts). It also breaks lag() on all other types of objects that have their own lag() method (e.g. zoo).
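To see the date alignment at work, here is a small sketch with made-up prices (the numbers and the column name Price are assumptions, not data from the question). It compares the positional slicing from the question with the lag() approach:

```r
library(xts)

# Hypothetical prices on five consecutive dates (made-up numbers)
dates  <- as.Date("2022-01-03") + 0:4
prices <- xts(c(100, 102, 101, 105, 104), order.by = dates)
colnames(prices) <- "Price"
n <- nrow(prices)

# Positional slices keep their original dates, so the division is aligned on
# the three dates the slices share and every price is divided by itself
by_position <- log(prices[2:n, ] / prices[1:(n - 1), ])

# lag() moves each value to the next date while leaving the index unchanged,
# so each price is divided by the previous one
by_lag <- log(prices / stats::lag(prices))
```

by_position contains only zeros, while by_lag holds the log returns with NA in the first row. stats::lag() is written out in full here so that the xts method is dispatched even when dplyr is attached.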
1) getSymbols can place the results into a local environment, and we can then iterate over its elements using eapply. Then use diff with arithmetic=FALSE, which makes diff perform division rather than subtraction.
If x is the ratio of the current price to the prior price, then while it is true that log(x) approximately equals x-1 when the return is small, we don't need that approximation and can calculate the return exactly using x-1.
Regarding the question, xts objects do not combine by position but by time. Removing the first or last element of an xts object does not change the times of the remaining elements, so the code in the question divides stkdata by itself, except for the positions at the ends, which have been removed.
Try the code below.
library(quantmod)
tickers <- c("TSLA", "AAPL", "CSCO", "IBM")
getSymbols(tickers, env = e <- new.env(), from = "2022-01-03", to = "2023-01-03")
stks <- do.call("merge", eapply(e, Ad))
rets <- diff(stks, arithmetic = FALSE) - 1
2) A variation is to use getSymbols to load the data into the current R workspace, as in the question, and then use mget.
library(quantmod)
tickers <- c("TSLA", "AAPL", "CSCO", "IBM")
getSymbols(tickers, from = "2022-01-03", to = "2023-01-03")
stks <- do.call("merge", lapply(mget(tickers), Ad))
rets <- diff(stks, arithmetic = FALSE) - 1
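As a quick check of the ratio-minus-one idea, the sketch below uses made-up prices (an assumption, not data from the question) and compares the simple return from diff(..., arithmetic = FALSE) - 1 with the log return from diff(..., log = TRUE):

```r
library(xts)

# Hypothetical prices (made-up numbers)
prices <- xts(c(100, 102, 101, 105),
              order.by = as.Date("2022-01-03") + 0:3)

# Ratio of each price to the previous one, minus 1: the simple return
simple_ret <- diff(prices, arithmetic = FALSE) - 1

# Log return for comparison; close to the simple return when moves are small
log_ret <- diff(prices, log = TRUE)
```

Both series start with NA because the first date has no prior price.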
With tidyquant you can calculate daily log returns as follows:
library(tidyquant)
library(tidyverse)
df = tq_get(c("TSLA", "AAPL", "CSCO", "IBM"),
from = "2022-01-03",
to = "2023-01-04")
log_return = df %>%
group_by(symbol) %>%
tq_mutate(select = adjusted,
mutate_fun = periodReturn,
period = "daily",
type = "log",
col_rename = "log_returns")
# A tibble: 1,008 × 9
# Groups: symbol [4]
symbol date open high low close volume adjusted log_returns
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 TSLA 2022-01-03 383. 400. 379. 400. 103931400 400. 0
2 TSLA 2022-01-04 397. 403. 374. 383. 100248300 383. -0.0427
3 TSLA 2022-01-05 382. 390. 360. 363. 80119800 363. -0.0550
4 TSLA 2022-01-06 359 363. 340. 355. 90336600 355. -0.0218
5 TSLA 2022-01-07 360. 360. 337. 342. 84164700 342. -0.0361
6 TSLA 2022-01-10 333. 353. 327. 353. 91815000 353. 0.0299
7 TSLA 2022-01-11 351. 359. 346. 355. 66063300 355. 0.00592
8 TSLA 2022-01-12 360. 372. 358. 369. 83739000 369. 0.0385
9 TSLA 2022-01-13 370. 372. 342. 344. 97209900 344. -0.0699
10 TSLA 2022-01-14 340. 351. 338. 350. 72924300 350. 0.0173
# … with 998 more rows
# ℹ Use `print(n = ...)` to see more rows
Plotting
log_return %>%
ggplot() +
aes(x = date, y = log_returns, col = symbol) +
geom_line() +
facet_wrap(~ symbol) +
theme_tq()
I would like to construct the annualized volatility of returns for a panel data set in R. I have monthly returns (%) per firm (entity) for a large dataset.
I would like to construct the five-year average of the annualized volatility of monthly returns, per year (t+5) and per firm.
Constructing this measure by itself is not difficult, but I would like to do it in R so that it is grouped by firm and year. I am thankful for any help.
The data looks like this:
library(xts)
library(PerformanceAnalytics)
library(quantmod)
library(lubridate)
library(data.table)
library(stringr)
# let's fetch some real-world panel data in a similar format to that cited by OP
symbols <- c('GOOG', 'AAPL', 'AMZN')
quantmod::getSymbols(symbols,
auto.assign = TRUE,
from = Sys.time() - years(20),
periodicity = 'monthly')
lapply(symbols, function(x) {
tmp <- get(x, envir = .GlobalEnv)
tmp$Return <- CalculateReturns(Ad(tmp), method = 'discrete')
tmp$LogReturn <- CalculateReturns(Ad(tmp), method = 'log')
assign(x, tmp, envir = .GlobalEnv)
}) |> invisible()
panel_data_df <- lapply(symbols, function(x) {
tmp <- get(x, envir = .GlobalEnv)
df <- data.frame(Symbol = x,
Date = index(tmp),
Return = round(tmp$Return * 1e2, 2) |>
sprintf(fmt = '%s%%') |>
str_replace_all('NA%', NA_character_),
LogReturn = tmp$LogReturn)
df
}) |>
rbindlist() |>
as.data.frame()
head(panel_data_df)
Symbol Date Return LogReturn
1 GOOG 2004-09-01 <NA> NA
2 GOOG 2004-10-01 47.1% 0.38593415
3 GOOG 2004-11-01 -4.54% -0.04649014
4 GOOG 2004-12-01 5.94% 0.05770476
5 GOOG 2005-01-01 1.47% 0.01457253
6 GOOG 2005-02-01 -3.9% -0.03978529
# now let's calculate the 5 year mean of annualized monthly volatility
metrics_df <- split(panel_data_df, panel_data_df$Symbol) |>
lapply(function(x) {
df_xts <- xts(x$LogReturn, order.by = as.POSIXct(x$Date))
stddev_1yr <- period.apply(df_xts,
endpoints(df_xts, 'years', 1),
StdDev.annualized)
stddev_1yr_5yr_mean <- period.apply(stddev_1yr,
endpoints(stddev_1yr, 'years', 5),
mean)
stddev_1yr_5yr_mean_df <- as.data.frame(stddev_1yr_5yr_mean)
colnames(stddev_1yr_5yr_mean_df) <- 'StDevAnn5YrMean'
stddev_1yr_5yr_mean_df$Date <- rownames(stddev_1yr_5yr_mean_df) |>
str_split('\\s') |>
sapply('[', 1)
rownames(stddev_1yr_5yr_mean_df) <- NULL
stddev_1yr_5yr_mean_df$Symbol <- x$Symbol[ 1 ]
stddev_1yr_5yr_mean_df
}) |> rbindlist() |> as.data.frame()
panel_data_df <- merge(panel_data_df,
metrics_df,
by = c('Symbol', 'Date'),
all = TRUE)
head(panel_data_df, 50)
Symbol Date Return LogReturn StDevAnn5YrMean
1 AAPL 2002-11-01 <NA> NA NA
2 AAPL 2002-12-01 -7.55% -0.078484655 NA
3 AAPL 2003-01-01 0.21% 0.002089444 NA
4 AAPL 2003-02-01 4.53% 0.044272032 NA
5 AAPL 2003-03-01 -5.8% -0.059709353 NA
6 AAPL 2003-04-01 0.57% 0.005642860 NA
7 AAPL 2003-05-01 26.23% 0.232938925 NA
8 AAPL 2003-06-01 6.18% 0.060001124 NA
9 AAPL 2003-07-01 10.6% 0.100732953 NA
[ ... ]
26 AAPL 2004-12-01 -3.95% -0.040325449 NA
27 AAPL 2004-12-31 <NA> NA 0.2947654
28 AAPL 2005-01-01 19.41% 0.177392802 NA
29 AAPL 2005-02-01 16.67% 0.154188206 NA
30 AAPL 2005-03-01 -7.11% -0.073765972 NA
[ ... ]
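For the grouping step alone, a base R sketch may be easier to follow. It assumes made-up monthly log returns for two hypothetical firms (AAA and BBB) and uses the common convention of scaling the monthly standard deviation by sqrt(12); it covers only the per-firm, per-year volatility, not the five-year averaging:

```r
# Hypothetical monthly log returns for two made-up firms (not real data)
panel <- data.frame(
  Symbol    = rep(c("AAA", "BBB"), each = 24),
  Date      = rep(seq(as.Date("2020-01-01"), by = "month", length.out = 24),
                  times = 2),
  LogReturn = c(rep(c(0.01, -0.01), 12), rep(c(0.02, -0.02), 12))
)
panel$Year <- format(panel$Date, "%Y")

# Annualized volatility per firm and calendar year:
# standard deviation of the monthly returns times sqrt(12)
ann_vol <- aggregate(LogReturn ~ Symbol + Year, data = panel,
                     FUN = function(r) sd(r) * sqrt(12))
names(ann_vol)[names(ann_vol) == "LogReturn"] <- "AnnVol"
```

ann_vol has one row per firm and year, which can then be averaged over whatever five-year window is needed.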
I'm trying to get the standard deviation of a stock price by year, but I'm getting the same value for every year.
I tried dplyr (group_by, summarise) and also a custom function, but had no luck with either: both return the same value of 67.0.
The whole data frame is probably being passed without subsetting. How can this be fixed?
library(quantmod)
library(tidyr)
library(dplyr)
#initial parameters
initialDate = as.Date('2010-01-01')
finalDate = Sys.Date()
ybeg = format(initialDate,"%Y")
yend = format(finalDate,"%Y")
ticker = "AAPL"
#getting stock prices
stock = getSymbols.yahoo(ticker, from=initialDate, auto.assign = FALSE)
stock = stock[,4] #working only with closing prices
With dplyr:
#Attempt 1 with dplyr - not working, all values by year return the same
stock = stock %>% zoo::fortify.zoo()
stock$Date = stock$Index
separate(stock, Date, c("year","month","day"), sep="-") %>%
group_by(year) %>%
summarise(stdev= sd(stock[,2]))
# A tibble: 11 x 2
# year stdev
# <chr> <dbl>
# 1 2010 67.0
# 2 2011 67.0
#....
#10 2019 67.0
#11 2020 67.0
And with function:
#Attempt 2 with function - not working - returns only one value instead of multiple
#getting stock prices
stock = getSymbols.yahoo(ticker, from=initialDate, auto.assign = FALSE)
stock = stock[,4] #working only with closing prices
#subsetting
years = as.character(seq(ybeg,yend,by=1))
years
calculate_stdev = function(series,years) {
series[years] #subsetting by years, to be equivalent as stock["2010"], stock["2011"] e.g.
sd(series[years][,1]) #calculate stdev on closing prices of the current subset
}
yearly.stdev = calculate_stdev(stock,years)
> yearly.stdev
[1] 67.04185
Use apply.yearly() (a convenience wrapper around the more general period.apply()) to call a function on yearly subsets of the xts object returned by getSymbols().
You can use the Cl() function to extract the close column from objects returned by getSymbols().
stock = getSymbols("AAPL", from = "2010-01-01", auto.assign = FALSE)
apply.yearly(Cl(stock), sd)
## AAPL.Close
## 2010-12-31 5.365208
## 2011-12-30 3.703407
## 2012-12-31 9.568127
## 2013-12-31 6.412542
## 2014-12-31 13.371293
## 2015-12-31 7.683550
## 2016-12-30 7.640743
## 2017-12-29 14.621191
## 2018-12-31 20.593861
## 2019-12-31 34.538978
## 2020-06-19 29.577157
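The function in the question returns a single number because series[years], given the whole vector of years, selects all of those years at once. The sketch below uses made-up closing prices (an assumption, not AAPL data) to show that apply.yearly() matches a loop that subsets one year at a time:

```r
library(xts)

# Hypothetical daily closes covering two calendar years (made-up numbers)
dates  <- seq(as.Date("2010-01-01"), as.Date("2011-12-31"), by = "day")
closes <- xts(100 + cumsum(rep(c(1, -0.5), length.out = length(dates))),
              order.by = dates)

# One standard deviation per calendar year
yearly_sd <- apply.yearly(closes, sd)

# Equivalent loop: subset one year at a time, then take the sd of that subset
years   <- c("2010", "2011")
loop_sd <- sapply(years, function(y) sd(as.numeric(closes[y])))
```

Both yearly_sd and loop_sd hold one value per year, and the values agree.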
I don't know dplyr, but here's how to do it with data.table:
library(data.table)
# convert data.frame to data.table
setDT(stock)
# convert your Date column with content like "2020-06-17" from character to Date type
stock[,Date:=as.Date(Date)]
# calculate sd(price) grouped by year, assuming here your price column is named "price"
stock[,sd(price),year(Date)]
Don't pass the name of the dataframe again in your summarise function. Use the variable name instead.
separate(stock, Date, c("year","month","day"), sep="-") %>%
group_by(year) %>%
summarise(stdev = sd(AAPL.Close)) # <-- here
# A tibble: 11 x 2
# year stdev
# <chr> <dbl>
# 1 2010 5.37
# 2 2011 3.70
# 3 2012 9.57
# 4 2013 6.41
# 5 2014 13.4
# 6 2015 7.68
# 7 2016 7.64
# 8 2017 14.6
# 9 2018 20.6
#10 2019 34.5
#11 2020 28.7
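The difference is easy to reproduce on a toy data frame. In this sketch the data frame prices and its columns are made up for illustration; referring to the data frame by name inside summarise() ignores the grouping, while the bare column name is evaluated per group:

```r
library(dplyr)

# Hypothetical prices for two years (made-up numbers)
prices <- data.frame(year  = rep(c("2010", "2011"), each = 3),
                     close = c(1, 2, 3, 10, 20, 30))

# Refers to the whole data frame, so every group gets the same overall sd
wrong <- prices %>%
  group_by(year) %>%
  summarise(stdev = sd(prices[, 2]))

# Refers to the column, which dplyr evaluates separately within each group
right <- prices %>%
  group_by(year) %>%
  summarise(stdev = sd(close))
```

wrong repeats the overall standard deviation in every row, whereas right gives one value per year.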
I want to perform an analysis of 2 stocks for a period from 2017-01-01 until 2020-04-14. Unfortunately, I struggle with importing data.
I was trying to import data from excel, limit data for the period from 2017-01-01 until 2020-04-14 and merge these data.
x <- read.csv("data/pkn_d.csv")
y <- read.csv("data/lts_d.csv")
head(x)
Date Open High Low Close Volume
1 1999-11-26 16.307 16.452 15.717 16.229 14845780
2 1999-11-29 16.154 16.229 15.863 15.940 5148506
3 1999-11-30 16.086 16.375 16.086 16.229 3077465
4 1999-12-01 16.375 16.742 16.229 16.742 2881475
5 1999-12-02 16.895 17.407 16.818 17.040 3093313
6 1999-12-03 17.040 17.330 16.895 17.260 2207547
head(y)
Date Open High Low Close Volume
1 2005-06-09 26.676 26.676 25.013 25.013 1795647
2 2005-06-10 25.097 25.433 24.594 24.594 679054
3 2005-06-13 25.013 25.097 24.594 24.762 213950
4 2005-06-14 24.929 24.929 24.762 24.762 181415
5 2005-06-15 24.762 24.845 24.594 24.762 160359
6 2005-06-16 24.762 24.762 24.350 24.350 171475
I'm only interested in data from 2017-01-01 until 2020-04-14 and 5th column (close price)
x <- x[4285:5100, 5]
y <- y[2899:3714, 5]
Next, I want to merge these data:
merge(x,y)
However, I don't obtain any meaningful output. How can I solve this issue?
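After the subsetting step, x and y are plain numeric vectors, so merge() has no Date column to match on. One possible approach is to keep the Date column, filter by date instead of by row number, and then merge on Date. This is a sketch with made-up stand-ins for the two CSV files; the helper in_window and the suffixes .pkn and .lts are hypothetical names:

```r
# Hypothetical stand-ins for the two CSV files (made-up prices)
x <- data.frame(Date  = c("2016-12-30", "2017-01-02", "2017-01-03", "2020-04-15"),
                Close = c(80.1, 81.0, 82.5, 60.2))
y <- data.frame(Date  = c("2017-01-02", "2017-01-03", "2017-01-04"),
                Close = c(45.0, 45.3, 44.9))

start <- as.Date("2017-01-01")
end   <- as.Date("2020-04-14")

# Keep Date and Close for rows inside the window, selecting by date
# rather than by row number
in_window <- function(df) {
  df$Date <- as.Date(df$Date)
  df[df$Date >= start & df$Date <= end, c("Date", "Close")]
}

# Join on Date; the suffixes tell the two Close columns apart
merged <- merge(in_window(x), in_window(y), by = "Date",
                suffixes = c(".pkn", ".lts"))
```

merged keeps only the dates present in both series; passing all = TRUE to merge() would keep every date and fill the gaps with NA.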
Since the question doesn't include a reproducible example, here is a solution that merges a set of stock prices retrieved from the internet via the quantmod package.
library("quantmod")
#
symbolList <- c("PKN","LTS")
from.dat <- as.Date("2017-01-01",format="%Y-%m-%d")
to.dat <- as.Date("2020-04-14",format="%Y-%m-%d")
prices <- lapply(symbolList,function(x){
getSymbols(x,auto.assign = FALSE,from = from.dat,to = to.dat)[,4]
})
priceData <- do.call(merge,prices)
head(priceData)
...and the output:
> head(priceData)
PKN.Close LTS.Close
2017-01-03 49.370 2.54
2017-01-04 50.370 2.57
2017-01-05 89.340 2.43
2017-01-06 89.340 2.38
2017-01-09 49.855 2.36
2017-01-10 88.300 2.44
>
I am using the package quantmod to get historical share prices.
I want to create a loop that pulls the prices and, as part of the loop, creates a data frame for each share. I have been unsuccessful so far with the code below: it gets the share prices as expected, but they are returned as an xts object, whereas I need a data frame. The as.data.frame part of the code doesn't do anything...
library(quantmod)
shares<-c("BARC.L", "BP.L", "DLG.L")
for(i in 1:length(shares)){
#gets share prices
getSymbols((paste(shares[i])), from = "2018-01-01")
#put the data into a dataframe (doesn't work).
shares[i]<-as.data.frame(shares[i])
}
The end result that I want is 3 dataframes - 1 for each share.
Can anyone suggest modifications to the code to achieve this please?
Personally I would do it like this:
library(quantmod)
shares<-c("BARC.L", "BP.L", "DLG.L")
my_shares <- lapply(shares, function(x) getSymbols(x, from = "2018-01-01", auto.assign = FALSE))
names(my_shares) <- shares
Or if you need the dates as a column instead of rownames:
my_shares <- lapply(shares, function(x) {
out <- getSymbols(x, from = "2018-01-01", auto.assign = FALSE)
out <- data.frame(dates = index(out), coredata(out))
return(out)
})
names(my_shares) <- shares
Or if you need everything in a tidy dataset:
library(tidyquant)
my_shares <- tq_get(shares)
my_shares
# A tibble: 7,130 x 8
symbol date open high low close volume adjusted
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 BARC.L 2008-01-02 464. 483. 460. 466. 38104837 344.
2 BARC.L 2008-01-03 466. 472. 458. 470. 33215781 347.
3 BARC.L 2008-01-04 466. 476. 447. 449. 42710244 332.
4 BARC.L 2008-01-07 447. 452. 433. 436. 58213512 322.
5 BARC.L 2008-01-08 439. 447. 421. 437. 105370539 322.
6 BARC.L 2008-01-09 432. 434. 420. 424. 71059078 313.
7 BARC.L 2008-01-10 428. 431. 413. 418. 54763347 309.
8 BARC.L 2008-01-11 416. 437. 416. 430. 72467229 317.
9 BARC.L 2008-01-14 430. 448. 427. 444. 56916500 328.
10 BARC.L 2008-01-15 445. 452. 428. 429. 77094907 317.
# ... with 7,120 more rows
Firstly, I suggest you use the help() function that comes with R if you're not already doing so. help(getSymbols) notes that you need to set env=NULL (or auto.assign=FALSE) for the data to be returned rather than assigned to an environment. With that, I've also made a list object so you can store the data as data.frames like you requested:
library(quantmod)
shares<-c("BARC.L", "BP.L", "DLG.L")
# initialize a list to store your data frames
df_list <- vector("list", length(shares))
for (i in seq_along(shares)) {
#gets share prices
df_list[[i]] <- as.data.frame(getSymbols(shares[i], from = "2018-01-01", env=NULL))
}
# so you can access by name, e.g. df_list$DLG.L
names(df_list) <- shares