I have looked far and wide for a solution to this issue, but I cannot seem to figure it out. I do not have much experience working with xts objects in R.
I have 40 xts objects (ETF data), and I want to run the quantmod function weeklyReturn on each of them individually.
I have tried to refer to them by using the ls() function:
lapply(ls(), weeklyReturn)
I have also tried the objects() function:
lapply(objects(), weeklyReturn)
I have also tried using as.xts() in my call to coerce the ls() results to xts, but to no avail.
How can I run this function on every xts object in the environment?
Thank you,
It would be better to load all of your xts objects into a list, or to create them in a way that returns a list to begin with. Then you could do results = lapply(xts.list, weeklyReturn).
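For example, a minimal sketch of building such a list from scratch (the tickers below are placeholders; getSymbols() with auto.assign = FALSE returns each xts object instead of assigning it to the environment):
library(quantmod)
symbols  <- c("SPY", "QQQ", "IWM")  # replace with your 40 ETF tickers
xts.list <- lapply(setNames(symbols, symbols),
                   function(s) getSymbols(s, auto.assign = FALSE))
results  <- lapply(xts.list, weeklyReturn)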
To work with objects already in the global environment, you could test whether each object is an xts object and run weeklyReturn on it if it is. Something like this:
results = lapply(setNames(ls(), ls()), function(i) {
  x = get(i)
  if (is.xts(x)) {
    weeklyReturn(x)
  }
})
results = results[!sapply(results, is.null)]
Or you could select only the xts objects to begin with:
results = sapply(ls()[sapply(ls(), function(i) is.xts(get(i)))],
                 function(i) weeklyReturn(get(i)), simplify = FALSE, USE.NAMES = TRUE)
lapply(ls(), weeklyReturn) doesn't work because ls() returns the object names as character strings, not the objects themselves. The get() function takes a string and returns the object with that name.
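A more compact variant, assuming every xts object sitting in the global environment should be processed: mget() fetches the objects named by ls(), and Filter() keeps only the xts ones.
xts.list <- Filter(xts::is.xts, mget(ls()))
results  <- lapply(xts.list, weeklyReturn)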
An alternate solution using the tidyquant package. Note that this approach is data-frame based, so I will not be working with xts objects. I use two core functions to scale the analysis: first, tq_get() to go from a vector of ETF symbols to their prices; second, tq_transmute() to apply the weeklyReturn function to the adjusted prices.
library(tidyquant)
etf_vec <- c("SPY", "QEFA", "TOTL", "GLD")
# Use tq_get to get prices
etf_prices <- tq_get(etf_vec, get = "stock.prices", from = "2017-01-01", to = "2017-05-31")
etf_prices
#> # A tibble: 408 x 8
#> symbol date open high low close volume adjusted
#> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 SPY 2017-01-03 227.121 227.919 225.951 225.24 91366500 223.1760
#> 2 SPY 2017-01-04 227.707 228.847 227.696 226.58 78744400 224.5037
#> 3 SPY 2017-01-05 228.363 228.675 227.565 226.40 78379000 224.3254
#> 4 SPY 2017-01-06 228.625 229.856 227.989 227.21 71559900 225.1280
#> 5 SPY 2017-01-09 229.009 229.170 228.514 226.46 46265300 224.3848
#> 6 SPY 2017-01-10 228.575 229.554 228.100 226.46 63771900 224.3848
#> 7 SPY 2017-01-11 228.453 229.200 227.676 227.10 74650000 225.0190
#> 8 SPY 2017-01-12 228.595 228.847 227.040 226.53 72113200 224.4542
#> 9 SPY 2017-01-13 228.827 229.503 228.786 227.05 62717900 224.9694
#> 10 SPY 2017-01-17 228.403 228.877 227.888 226.25 61240800 224.1767
#> # ... with 398 more rows
# Use tq_transmute to apply weeklyReturn to multiple groups
etf_returns_w <- etf_prices %>%
group_by(symbol) %>%
tq_transmute(select = adjusted, mutate_fun = weeklyReturn)
etf_returns_w
#> # A tibble: 88 x 3
#> # Groups: symbol [4]
#> symbol date weekly.returns
#> <chr> <date> <dbl>
#> 1 SPY 2017-01-06 0.0087462358
#> 2 SPY 2017-01-13 -0.0007042173
#> 3 SPY 2017-01-20 -0.0013653367
#> 4 SPY 2017-01-27 0.0098350474
#> 5 SPY 2017-02-03 0.0016159256
#> 6 SPY 2017-02-10 0.0094619381
#> 7 SPY 2017-02-17 0.0154636969
#> 8 SPY 2017-02-24 0.0070186222
#> 9 SPY 2017-03-03 0.0070964211
#> 10 SPY 2017-03-10 -0.0030618336
#> # ... with 78 more rows
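If downstream code still expects xts objects, one option (a small sketch; the symbol, date and weekly.returns column names match the output above) is to split by symbol and convert each piece:
etf_returns_xts <- lapply(
  split(etf_returns_w, etf_returns_w$symbol),
  function(df) xts::xts(df$weekly.returns, order.by = df$date)
)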
I am trying to replicate a trading strategy and backtest it in R. However, I am having a slight problem with the tq_transmute() function. Any help would be appreciated.
So, I have the following code that I have written until now:
#Importing the etfs data
symbols<- c("SPY","XLF","XLE")
start<-as.Date("2000-01-01")
end<- as.Date("2018-12-31")
price_data <- lapply(symbols, function(symbol) {
  etfs <- as.data.frame(getSymbols(symbol, src = "yahoo", from = start, to = end,
                                   auto.assign = FALSE))
  colnames(etfs) <- c("Open", "High", "Low", "Close", "volume", "Adjusted")
  etfs$Symbol <- symbol
  etfs$Date <- rownames(etfs)
  etfs
})
# Next, I used do.call() with rbind() to combine the data into a single data frame
etfs_df<- do.call(rbind, price_data)
# Convert the Date column from character to Date class (this avoids a POSIXct error)
daily_price <- etfs_df %>%
  mutate(Date = as.Date(Date))
# I have deleted some columns of the table as my work only concerned the "Adjusted" column.
#So, until now we have:
head(daily_price)
Adjusted Symbol Date
1 98.14607 SPY 2000-01-03
2 94.30798 SPY 2000-01-04
3 94.47669 SPY 2000-01-05
4 92.95834 SPY 2000-01-06
5 98.35699 SPY 2000-01-07
6 98.69440 SPY 2000-01-10
#Converting the daily adjusted price to monthly adjusted price
monthly_price <- tq_transmute(daily_price, select = Adjusted,
                              mutate_fun = to.monthly, indexAt = "lastof")
head(monthly_price)
# And now, I get the following table:
# A tibble: 6 x 2
Date Adjusted
<date> <dbl>
1 2000-01-31 16.6
2 2000-02-29 15.9
3 2000-03-31 17.9
4 2000-04-30 17.7
5 2000-05-31 19.7
6 2000-06-30 18.6
So, as you can see, the Date and Adjusted prices have been successfully converted to monthly figures, but my Symbol column has disappeared. Could anyone please tell me why that happened and how I can get it back?
Thank you.
Group the data by Symbol and then apply tq_transmute().
library(dplyr)
library(quantmod)
library(tidyquant)
monthly_price <- daily_price %>%
  group_by(Symbol) %>%
  tq_transmute(select = Adjusted,
               mutate_fun = to.monthly, indexAt = "lastof")
# Symbol Date Adjusted
# <chr> <date> <dbl>
# 1 SPY 2000-01-31 94.2
# 2 SPY 2000-02-29 92.7
# 3 SPY 2000-03-31 102.
# 4 SPY 2000-04-30 98.2
# 5 SPY 2000-05-31 96.6
# 6 SPY 2000-06-30 98.5
# 7 SPY 2000-07-31 97.0
# 8 SPY 2000-08-31 103.
# 9 SPY 2000-09-30 97.6
#10 SPY 2000-10-31 97.2
# … with 674 more rows
I would do it like this:
symbols <- c("SPY", "XLF", "XLE")
start <- as.Date("2000-01-01")
end <- as.Date("2018-12-31")
# Environment to hold data
my_data <- new.env()
# Tell getSymbols() to load the data into 'my_data'
getSymbols(symbols, from = start, to = end, env = my_data)
# Combine all the adjusted close prices into one xts object
price_data <- Reduce(merge, lapply(my_data, Ad))
# Remove "Adjusted" from column names
colnames(price_data) <- sub(".Adjusted", "", colnames(price_data), fixed = TRUE)
# Get the last price for each month
monthly_data <- apply.monthly(price_data, last)
# Convert to a long data.frame
long_data <- fortify.zoo(monthly_data,
names = c("Date", "Symbol", "Adjusted"), melt = TRUE)
I have an assignment in which I need to detect anomalies in a dataset. I'm using the 'anomalize' package in R and was wondering how to interpret the following output values of the 'anomalize' function:
Remainder_L1
Remainder_L2
I've checked the documentation but I'm unable to find the calculation method for these values. Can someone explain this calculation?
[Screenshot: anomalize output]
The anomalize documentation gives a great example of how to apply anomalize() to a time series.
This generates the remainder_l1 and remainder_l2 values for CRAN tidyverse downloads (that data ships with the anomalize package, so there is no need to import anything; just run the code below to see how the columns are generated):
# install.packages("anomalize")
library(tidyverse)
library(tibbletime)
library(anomalize)
tidyverse_cran_downloads %>%
time_decompose(count, merge = TRUE) %>%
anomalize(remainder)
# package date count observed season trend remainder remainder_l1 remainder_l2 anomaly
# <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
# 1 broom 2017-01-01 1053 1053. -1007. 1708. 352. -1725. 1704. No
# 2 broom 2017-01-02 1481 1481 340. 1731. -589. -1725. 1704. No
# 3 broom 2017-01-03 1851 1851 563. 1753. -465. -1725. 1704. No
# 4 broom 2017-01-04 1947 1947 526. 1775. -354. -1725. 1704. No
# 5 broom 2017-01-05 1927 1927 430. 1798. -301. -1725. 1704. No
What do these values mean? From the anomalize source code we see:
"remainder_l1" (lower limit for anomalies), "remainder_l2" (upper limit for anomalies)
In the example above, for the first row, anomalize() would flag the observation as an anomaly if its remainder (352) were less than -1725 (remainder_l1) or greater than 1704 (remainder_l2).
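For the curious, here is a minimal sketch of how the default "iqr" method appears to derive these limits, based on a reading of the anomalize source (the 0.15 / alpha multiplier is my reading of that code, so treat it as an approximation rather than documented behaviour):
library(dplyr)
library(anomalize)
# Decompose one series and pull out the remainder component
decomp <- tidyverse_cran_downloads %>%
  filter(package == "broom") %>%
  time_decompose(count)
r     <- decomp$remainder
q     <- quantile(r, probs = c(0.25, 0.75), na.rm = TRUE)
iqr   <- q[[2]] - q[[1]]
alpha <- 0.05  # anomalize()'s default alpha
c(remainder_l1 = q[[1]] - (0.15 / alpha) * iqr,  # lower limit for anomalies
  remainder_l2 = q[[2]] + (0.15 / alpha) * iqr)  # upper limit for anomalies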
I am trying to estimate ARIMA models for 100 different series, so I employed the fabletools::model() method and the fable::ARIMA() function to do that job. But I have not been able to use my exogenous variables in the model estimation.
My series has 3 columns: an ID tag identifying the outlet, then a Date.Time tag, and finally the Sales. In addition to these variables, I also have dummy variables representing hour of day and day of week.
Using the code given below, I transformed the data frame containing my endogenous and exogenous variables into a tsibble.
ts_forecast <- df11 %>%
  select(-Date) %>%
  mutate(ID = factor(ID)) %>%
  group_by(ID) %>%
  as_tsibble(index = Date.Time, key = ID) %>%
  tsibble::fill_gaps(Sales = 0) %>%
  fabletools::model(Arima = ARIMA(Sales, stepwise = TRUE, xreg = df12))
With this code I try to forecast values over the same Date.Time interval for multiple outlets identified by the ID factor. But the code returns the following error:
> Could not find an appropriate ARIMA model.
> This is likely because automatic selection does not select models with characteristic roots that may be numerically unstable.
> For more details, refer to https://otexts.com/fpp3/arima-r.html#plotting-the-characteristic-roots
Sales is my endogenous target variable, and df12 includes the dummy variables representing hour and day. Some of the stores make no sales in certain hours, so their dummy representing, say, 01:00 AM could be equal to zero for every observation. However, I don't think that should be a problem, since fable uses a stepwise method; I assume that when the code sees a variable that is all zeros it can exclude it.
I am not sure what the problem is. Am I adding xreg to the model in a problematic way (the ARIMA help page says xreg=, as in the older forecast package, is OK), or is the issue related to the second problem I mentioned, dummies that are "0" for all observations? If it is the latter, a solution could be to exclude all variables that are constantly 0.
I would be delighted if you could help me.
Thanks
Here is an example using hourly pedestrian count data.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tsibble)
library(fable)
#> Loading required package: fabletools
# tsibble with hourly data
df <- pedestrian %>%
mutate(dow = lubridate::wday(Date, label=TRUE))
# Training data
train <- df %>%
filter(Date <= "2015-01-31")
# Fit models
fit <- train %>%
model(arima = ARIMA(Count ~ season("day") + dow + pdq(2,0,0) + PDQ(0,0,0)))
# Forecast period
fcast_xregs <- df %>%
filter(Date > "2015-01-31", Date <= "2015-02-07")
# Forecasts
fit %>%
forecast(fcast_xregs)
#> # A fable: 504 x 8 [1h] <Australia/Melbourne>
#> # Key: Sensor, .model [3]
#> Sensor .model Date_Time Count .mean Date Time
#> <chr> <chr> <dttm> <dist> <dbl> <date> <int>
#> 1 Birra… arima 2015-02-01 00:00:00 N(-67, 174024) -67.1 2015-02-01 0
#> 2 Birra… arima 2015-02-01 01:00:00 N(-270, 250881) -270. 2015-02-01 1
#> 3 Birra… arima 2015-02-01 02:00:00 N(-286, 310672) -286. 2015-02-01 2
#> 4 Birra… arima 2015-02-01 03:00:00 N(-283, 351704) -283. 2015-02-01 3
#> 5 Birra… arima 2015-02-01 04:00:00 N(-264, 380588) -264. 2015-02-01 4
#> 6 Birra… arima 2015-02-01 05:00:00 N(-244, 4e+05) -244. 2015-02-01 5
#> 7 Birra… arima 2015-02-01 06:00:00 N(-137, 414993) -137. 2015-02-01 6
#> 8 Birra… arima 2015-02-01 07:00:00 N(93, 424929) 93.0 2015-02-01 7
#> 9 Birra… arima 2015-02-01 08:00:00 N(292, 431894) 292. 2015-02-01 8
#> 10 Birra… arima 2015-02-01 09:00:00 N(225, 436775) 225. 2015-02-01 9
#> # … with 494 more rows, and 1 more variable: dow <ord>
Created on 2020-10-09 by the reprex package (v0.3.0)
Notes:
You don't need to create dummy variables in R. The formula interface will handle categorical variables appropriately.
The season("day") special within ARIMA will generate the appropriate seasonal categorical variable, equivalent to 23 hourly dummy variables.
I've specified a specific ARIMA model to save computation time, but you can omit the pdq() special to automatically select the optimal model.
Keep the PDQ(0,0,0) special as you don't need the ARIMA model to handle the seasonality when you are doing that with the exogenous variables.
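For what it's worth, here is a rough sketch of how the same approach could be adapted to the hourly Sales data in the question (df11, ID, Date.Time and Sales are the names from the question; the day-of-week regressor and the season("day") term are assumptions about what the hour and day dummies were meant to capture):
library(dplyr)
library(tsibble)
library(fable)
fit <- df11 %>%
  select(-Date) %>%
  mutate(ID = factor(ID)) %>%
  as_tsibble(index = Date.Time, key = ID) %>%
  fill_gaps(Sales = 0) %>%
  mutate(dow = lubridate::wday(Date.Time, label = TRUE)) %>%  # day-of-week regressor
  model(arima = ARIMA(Sales ~ season("day") + dow + PDQ(0, 0, 0)))
# One model is fitted per outlet, because ID is the tsibble key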
I'm trying to get the standard deviation of a stock price by year, but I'm getting the same value for every year.
I tried with dplyr (group_by, summarise) and also with a function, but had no luck with either; both return the same value of 67.0.
It is probably passing the whole data frame without subsetting it. How can this be fixed?
library(quantmod)
library(tidyr)
library(dplyr)
#initial parameters
initialDate = as.Date('2010-01-01')
finalDate = Sys.Date()
ybeg = format(initialDate,"%Y")
yend = format(finalDate,"%Y")
ticker = "AAPL"
#getting stock prices
stock = getSymbols.yahoo(ticker, from=initialDate, auto.assign = FALSE)
stock = stock[,4] #working only with closing prices
With dplyr:
#Attempt 1 with dplyr - not working, all values by year return the same
stock = stock %>% zoo::fortify.zoo()
stock$Date = stock$Index
separate(stock, Date, c("year","month","day"), sep="-") %>%
group_by(year) %>%
summarise(stdev= sd(stock[,2]))
# A tibble: 11 x 2
# year stdev
# <chr> <dbl>
# 1 2010 67.0
# 2 2011 67.0
#....
#10 2019 67.0
#11 2020 67.0
And with function:
#Attempt 2 with function - not working - returns only one value instead of multiple
#getting stock prices
stock = getSymbols.yahoo(ticker, from=initialDate, auto.assign = FALSE)
stock = stock[,4] #working only with closing prices
#subsetting
years = as.character(seq(ybeg,yend,by=1))
years
calculate_stdev = function(series,years) {
series[years] #subsetting by years, to be equivalent as stock["2010"], stock["2011"] e.g.
sd(series[years][,1]) #calculate stdev on closing prices of the current subset
}
yearly.stdev = calculate_stdev(stock,years)
> yearly.stdev
[1] 67.04185
Use apply.yearly() (a convenience wrapper around the more general period.apply()) to call a function on yearly subsets of the xts object returned by getSymbols().
You can use the Cl() function to extract the close column from objects returned by getSymbols().
stock = getSymbols("AAPL", from = "2010-01-01", auto.assign = FALSE)
apply.yearly(Cl(stock), sd)
## AAPL.Close
## 2010-12-31 5.365208
## 2011-12-30 3.703407
## 2012-12-31 9.568127
## 2013-12-31 6.412542
## 2014-12-31 13.371293
## 2015-12-31 7.683550
## 2016-12-30 7.640743
## 2017-12-29 14.621191
## 2018-12-31 20.593861
## 2019-12-31 34.538978
## 2020-06-19 29.577157
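For reference, a sketch of the equivalent call using period.apply() directly:
# endpoints() returns the row indices of the last observation in each year
ep <- endpoints(Cl(stock), on = "years")
period.apply(Cl(stock), INDEX = ep, FUN = sd)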
I don't know dplyr, but here's how with data.table
library(data.table)
# convert data.frame to data.table
setDT(stock)
# convert your Date column with content like "2020-06-17" from character to Date type
stock[,Date:=as.Date(Date)]
# calculate sd(price) grouped by year, assuming here your price column is named "price"
stock[,sd(price),year(Date)]
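Applied to the data in the question (after fortify.zoo the columns are Index and AAPL.Close; those names are taken from the question's output, so adjust if yours differ):
library(data.table)
stock_dt <- as.data.table(stock)  # 'stock' is the fortified data.frame from the question
stock_dt[, .(stdev = sd(AAPL.Close)), by = year(Index)]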
Don't pass the name of the data frame again inside your summarise() call; use the column name instead.
separate(stock, Date, c("year","month","day"), sep="-") %>%
group_by(year) %>%
summarise(stdev = sd(AAPL.Close)) # <-- here
# A tibble: 11 x 2
# year stdev
# <chr> <dbl>
# 1 2010 5.37
# 2 2011 3.70
# 3 2012 9.57
# 4 2013 6.41
# 5 2014 13.4
# 6 2015 7.68
# 7 2016 7.64
# 8 2017 14.6
# 9 2018 20.6
#10 2019 34.5
#11 2020 28.7
I have spent a day searching for the answer to this question and still could not figure out how this works (I am relatively new to R).
The data:
I have the daily revenue of a store. The starting date is November 2017 and the end date is February 2020 (so it is not the typical January-to-December, full-year data). There are no missing values, and every day's sales are recorded. There are 2 columns: date (in a proper date format) and revenue (numeric).
I am trying to build a time series forecasting model for my sales data. One prerequisite is that I need to transform my data into a ts object. All the posts I have seen online deal with yearly or monthly data; I have not seen anyone mention daily data.
I tried to convert my data to a ts object this way (I named my data "d"):
d_ts <- ts(d$revenue, start=min(d$date), end = max(d$date), frequency = 365)
I then got really weird results, like this:
Start = c(17420, 1)
End = c(18311, 1)
Frequency = 365
[1] 1174.77 214.92 10.00 684.86 7020.04 11302.50 30613.55 29920.98 24546.49 22089.89 30291.65 32993.05 26517.11 39670.38 30361.32 17510.72
[17] 9888.76 3032.27 1229.74 2426.36 ....... [ reached getOption("max.print") -- omitted 324216 entries ]
There are 892 days in this dataset; how can the ts object have dimensions of 325,216 x 1?
I looked into this book called "Hands-On Time-Series with R" and found the following excerpt:
[Screenshot of the book excerpt]
This basically means the ts() object does NOT work for daily data. Is this why my ts() conversion is messed up?
My questions are ...
(1) How can I make my daily revenue data into a time series object before feeding it into a model, if ts() does not work for daily data? All the time-series models require my data to be in a time-series format.
(2) Does the fact that my data does not start in January 2017 and end in December 2019 (i.e. the perfect whole-year span shown in many online posts) cause any complications? If so, what should I adjust so that the time series forecasting is meaningful?
I have been stuck on this issue and cannot wrap my head around it. I really appreciate your help!
The ts function can work with any time interval; that is defined by the start and end points. As you're using dates, one unit corresponds to one day, since that is how Date values are stored internally. The help file at ?ts also shows examples of how to use annual or quarterly data.
To read in your daily data correctly, you need to set frequency = 1. Using some data similar in structure to what you've got:
#Compile a dataframe like yours
library(lubridate)
set.seed(0)
d <- data.frame(date = seq.Date(dmy("01/11/2017"), by = "day", length.out = 892))
d$revenue <- runif(892)
head(d)
#date revenue
# 1 2017-11-01 0.8966972
# 2 2017-11-02 0.2655087
# 3 2017-11-03 0.3721239
# 4 2017-11-04 0.5728534
# 5 2017-11-05 0.9082078
# 6 2017-11-06 0.2016819
#Convert to timeseries object
d_ts <- ts(d$revenue, start=min(d$date), end = max(d$date), frequency = 1)
d_ts
# Time Series:
# Start = 17471
# End = 18362
# Frequency = 1
# [1] 0.896697200 0.265508663 0.372123900 0.572853363 0.908207790 0.201681931 0.898389685 0.944675269 0.660797792
# [10] 0.629114044 0.061786270 0.205974575 0.176556753 0.687022847 0.384103718 0.769841420 0.497699242 0.717618508
With daily data, you are better off using a tsibble class rather than a ts class. There are modelling and forecasting tools available via the fable package.
library(tsibble)
library(fable)
set.seed(1)
d_tsibble <- data.frame(
date = seq(as.Date("2017-11-01"), by = "day", length.out = 892),
revenue = rnorm(892)
) %>%
as_tsibble(index = date)
d_tsibble
#> # A tsibble: 892 x 2 [1D]
#> date revenue
#> <date> <dbl>
#> 1 2017-11-01 -0.626
#> 2 2017-11-02 0.184
#> 3 2017-11-03 -0.836
#> 4 2017-11-04 1.60
#> 5 2017-11-05 0.330
#> 6 2017-11-06 -0.820
#> 7 2017-11-07 0.487
#> 8 2017-11-08 0.738
#> 9 2017-11-09 0.576
#> 10 2017-11-10 -0.305
#> # … with 882 more rows
d_tsibble %>%
model(
arima = ARIMA(revenue)
) %>%
forecast(h = "14 days")
#> # A fable: 14 x 4 [1D]
#> # Key: .model [1]
#> .model date revenue .distribution
#> <chr> <date> <dbl> <dist>
#> 1 arima 2020-04-11 -0.0178 N(-1.8e-02, 1.1)
#> 2 arima 2020-04-12 -0.0117 N(-1.2e-02, 1.1)
#> 3 arima 2020-04-13 -0.00765 N(-7.7e-03, 1.1)
#> 4 arima 2020-04-14 -0.00501 N(-5.0e-03, 1.1)
#> 5 arima 2020-04-15 -0.00329 N(-3.3e-03, 1.1)
#> 6 arima 2020-04-16 -0.00215 N(-2.2e-03, 1.1)
#> 7 arima 2020-04-17 -0.00141 N(-1.4e-03, 1.1)
#> 8 arima 2020-04-18 -0.000925 N(-9.2e-04, 1.1)
#> 9 arima 2020-04-19 -0.000606 N(-6.1e-04, 1.1)
#> 10 arima 2020-04-20 -0.000397 N(-4.0e-04, 1.1)
#> 11 arima 2020-04-21 -0.000260 N(-2.6e-04, 1.1)
#> 12 arima 2020-04-22 -0.000171 N(-1.7e-04, 1.1)
#> 13 arima 2020-04-23 -0.000112 N(-1.1e-04, 1.1)
#> 14 arima 2020-04-24 -0.0000732 N(-7.3e-05, 1.1)
Created on 2020-04-01 by the reprex package (v0.3.0)