Anomaly detection In R - r

I am used to using the qcc package in R to detect outliers in the data. I recently came across the AnomalyDetection package. Found here: https://github.com/twitter/AnomalyDetection
My dataset is below:
date_start<-as.Date(c('2017-10-17','2017-10-18',
'2017-10-19','2017-10-20',
'2017-10-21','2017-10-22',
'2017-10-23','2017-10-24',
'2017-10-25','2017-10-26',
'2017-10-27','2017-10-28',
'2017-10-29','2017-10-30',
'2017-10-31','2017-11-01',
'2017-11-02','2017-11-03',
'2017-11-04','2017-11-05',
'2017-11-06','2017-11-07',
'2017-11-08','2017-11-09',
'2017-11-10','2017-11-11',
'2017-11-12'))
count <- c(NA, 3828,
3532,3527,
3916,4303,
3867,3699,
3439,3099,
3148,3310,
3904,3525,
2962,3398,
2935,3013,
3005,3516,
3010,2848,
2689,2573,
2569,2946,
2713)
df<-data.frame(date_start,count)
head(df)
date_start count
1 2017-10-17 NA
2 2017-10-18 3828
3 2017-10-19 3532
4 2017-10-20 3527
5 2017-10-21 3916
6 2017-10-22 4303
When I test out this dataset with the AnomalyDetection package, the response is NULL and no plot appears. Any idea why this may be?
library(AnomalyDetection)
res = AnomalyDetectionTs(df, max_anoms=0.02, direction='both', plot=TRUE)
res$plot
NULL

This happens because no anomalies were detected in the data, so there is nothing to plot.
If you manually introduce an outlier:
count[13] <- 5671
the anomaly is detected.
Additionally, for the plot to work the timestamps need to be of class POSIXct:
df <- data.frame(date_start = as.POSIXct(date_start),
count)
res <- AnomalyDetectionTs(df,
max_anoms = 0.02,
direction = 'both',
plot = TRUE)
#output
$anoms
timestamp anoms
1 2017-10-29 02:00:00 5671
$plot
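
A minimal sketch of the timestamp conversion on its own, using only base R so it runs without the AnomalyDetection package. The three rows are a shortened copy of the question's data with the manually introduced outlier. Going through character with an explicit `tz = "UTC"` is a choice made here, not something the package requires; it probably also explains the `02:00:00` in the output above, which looks like midnight UTC printed in a local time zone.

```r
# Shortened copy of the question's data, including the manual outlier.
date_start <- as.Date(c("2017-10-28", "2017-10-29", "2017-10-30"))
count <- c(3310, 5671, 3525)

# Convert Date -> character -> POSIXct so every timestamp is midnight UTC.
df <- data.frame(
  date_start = as.POSIXct(format(date_start), tz = "UTC"),
  count = count
)

class(df$date_start)  # "POSIXct" "POSIXt"
```

Checking `class(df$date_start)` before calling `AnomalyDetectionTs()` is a cheap way to rule out the timestamp class as the cause of a missing plot.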

When using POSIXct I get the following error: "Error: Column x is a date/time and must be stored as POSIXct, not POSIXlt"
However, changing to POSIXlt solves the problem.

Related

Calculate changing date for a Donchian Channel technical indicator

I am trying to create an indicator that has a dynamic n that changes each day. Basically I am making a strategy that enters a trade when a stock's price reaches its all-time high.
The best way I can think to do this is by using a Donchian Channel and entering when the closing price is equal to or greater than all previous DC highs. To do this I need:
n = (Current date of algo - start date).
This way the indicator will start working from day 1 and it won't "forget" about previous highs as the strategy runs through years of data. The problem I am having is that I don't know how to write a code/function that will express the current date of strategy in a way that I can turn it into a simple calculation. The best code I can come up with is:
##Problem in line below##
dcn <- difftime(initdate, as.Date(datePos), units = c("days"))
### This part will work fine once dcn is working
BuySig <- function(price, DC, ...)
{ifelse(price >= DC, 1, 0)}
add.indicator(strategy=strategyname,name="DonchianChannel",
arguments=list(HL=quote(mktdata$Close),n=dcn),label="DC")
dcn of course is going to be my Donchian Channel n. The problem I am having is that no matter what I use in place of as.Date(datePos), it keeps telling me "object 'datePos' not found". I have tried using other things that I specify earlier in my code, such as Dates and timestamp.
Any advice would be really helpful.
You can't use DonchianChannel with an n that varies. n must be a fixed integer for that function. You need to create your own function that trades 'highest highs' since the start of your data set.
This achieves what you want; just make a function out of it and supply it as a function for add.indicator
library(quantmod)
getSymbols("SPY")
SPY_max <- runMax(Cl(SPY), n = 1, cumulative = TRUE)
SPY$all_time_high <- Cl(SPY) >= SPY_max
chart_Series(SPY["2018/", 1:4])
tail(SPY[SPY$all_time_high == 1,], 10)
# SPY.Open SPY.High SPY.Low SPY.Close SPY.Volume SPY.Adjusted all_time_high
# 2018-01-19 279.80 280.41 279.14 280.41 140920100 273.9762 1
# 2018-01-22 280.17 282.69 280.11 282.69 91322400 276.2038 1
# 2018-01-23 282.74 283.62 282.37 283.29 97084700 276.7901 1
# 2018-01-25 284.16 284.27 282.40 283.30 84587300 276.7998 1
# 2018-01-26 284.25 286.63 283.96 286.58 107743100 280.0046 1
# 2018-08-24 286.44 287.67 286.38 287.51 57487400 283.3048 1
# 2018-08-27 288.86 289.90 288.68 289.78 57072400 285.5416 1
# 2018-08-28 290.30 290.42 289.40 289.92 46943500 285.6796 1
# 2018-08-29 290.16 291.74 289.89 291.48 61485500 287.2167 1
# 2018-09-20 292.64 293.94 291.24 293.58 100360600 289.2860 1
When the column all_time_high returns 1, you're at an all time high for the time series in question.
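
The same idea can be sketched as a standalone function in base R, with `cummax()` standing in for `runMax(..., cumulative = TRUE)`. The function name `all_time_high` and the toy price vector are made up for illustration, and the function works on a plain numeric vector; to use it with add.indicator it would still need to return an xts object aligned with mktdata, which is not shown here.

```r
# Flag the observations where the price equals the running maximum so far,
# i.e. where the series is at an all-time high.
# Hypothetical helper: the name and interface are illustrative only.
all_time_high <- function(price) {
  price <- as.numeric(price)
  as.integer(price >= cummax(price))
}

close <- c(10, 12, 11, 12, 15, 14)
all_time_high(close)
# 1 1 0 1 1 0
```

Note that `cummax()` propagates NA, so missing prices should be dropped or filled before calling it.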

Error in to.period.contributions() from PerformanceAnalytics sandbox

I've been trying to calculate my portfolio returns and the individual stock contributions. I stumbled upon this post, which appears to be from the guy who helped write PerformanceAnalytics.
At the end of the article he posts a link to r-forge with a sandbox file for some functions.
So I'm trying to convert my daily returns to the summed monthly returns via the to.monthly.contributions() function but I'm running into an xts error!
Here's my code:
library(PerformanceAnalytics)
library(quantmod)
stock.weights <- c(.15, .20, .25, .225, .175)
symbols <- c("GOOG", "AMZN", "BA", "FB", "AAPL")
getSymbols(symbols, src = 'google', from = "2016")
#xts with daily closing of each stock
merged.closing <- merge(GOOG[,4], AMZN[,4], BA[,4], FB[,4], AAPL[,4])
#xts with returns
merged.return <- na.omit(Return.calculate(merged.closing))
# weighted returns rebalanced quarterly
portfolio.returns = Return.portfolio(merged.return, weights = stock.weights,
rebalance_on = "quarters", verbose = TRUE)
#to monthly contributions function
to.monthly.contributions(portfolio.returns$contributions)
However when I run the last line I get the following error message:
Error in inherits(x, "xts") :
argument "Contributions" is missing, with no default
5. inherits(x, "xts")
4. is.xts(x)
3. checkData(Contributions)
2. to.period.contributions(contributions = contributions, period = "months")
1. to.monthly.contributions(portfolio.returns$contributions)
I'm guessing that the error has something to do with the portfolio.returns$contributions not being an xts? But I'm not sure how to get around that.
On a side note, if anyone has any better ideas or sources for calculating portfolio returns by months/quarters/years I'm keen to hear, bearing in mind they need to account for weight changes, re-balances and contributions to changes!
Note that PerformanceAnalytics (and many other packages in that R-Forge repo) has moved to Brian Peterson's GitHub account. There you will see some changes to sandbox/to.period.contributions.R about a year ago. That might be causing you some issue(s).
Another issue is that the object returned by Return.portfolio() does not have a contributions element. The element name you want is contribution (singular).
After addressing those two issues, your to.monthly.contributions() call works.
R> to.monthly.contributions(portfolio.returns$contribution)
GOOG.Close AMZN.Close BA.Close FB.Close AAPL.Close Portfolio Return
2016-01-29 0.0002244419 -0.0156956938 -0.036245552 0.0219893367 -1.330565e-02 -0.043033115
2016-02-29 -0.0095461956 -0.0113127380 -0.003625779 -0.0121676134 -1.128288e-03 -0.037780614
2016-03-31 0.0103601952 0.0140210157 0.016927654 0.0171632715 2.218899e-02 0.080661130
2016-04-29 -0.0104584200 0.0222188532 0.015479754 0.0068624014 -2.448619e-02 0.009616397
2016-05-31 0.0085179936 0.0210895602 -0.016873347 0.0024024015 9.732993e-03 0.024869602
2016-06-30 -0.0084883795 -0.0023345382 0.007080427 -0.0086331655 -6.610526e-03 -0.018986182
2016-07-29 0.0166211530 0.0120706520 0.007295757 0.0190190760 1.576098e-02 0.070767622
2016-08-31 -0.0003521895 0.0027014233 -0.007568643 0.0040084230 3.231073e-03 0.002020086
2016-09-30 0.0020684771 0.0177517727 0.004108611 0.0039452914 1.185750e-02 0.039731657
2016-10-31 0.0013990917 -0.0113434690 0.020286170 0.0047711858 7.585139e-04 0.015871492
2016-11-30 -0.0050340240 -0.0092287866 0.015187074 -0.0217047070 -4.601884e-03 -0.025382327
2016-12-30 0.0026858660 -0.0001688763 0.009813394 -0.0059705490 8.286484e-03 0.014646319
2017-01-31 0.0048528154 0.0196327363 0.012429342 0.0298631030 8.355638e-03 0.075133635
2017-02-28 0.0047757941 0.0053484794 0.025108019 0.0094951963 2.198006e-02 0.066707545
2017-03-31 0.0010760715 0.0096512663 -0.004718775 0.0111011780 8.787645e-03 0.025897386
2017-04-28 0.0138145523 0.0086741715 0.011265973 0.0129883844 -1.218154e-05 0.046730900
2017-05-31 0.0101747490 0.0150069699 0.003781232 0.0018310137 1.060194e-02 0.041395909
2017-06-30 -0.0093108126 -0.0055092033 0.013123207 -0.0006974798 -9.767034e-03 -0.012161323
2017-07-31 0.0035934766 0.0040867769 0.056523388 0.0272271162 5.723163e-03 0.097153921
2017-08-31 0.0013284632 -0.0013521084 -0.003226369 0.0036945746 1.691168e-02 0.017356239
2017-09-19 -0.0025908954 -0.0019880088 0.014497492 0.0007343197 -5.737005e-03 0.004915902
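
The element-name issue is easy to miss because `$` on a list returns NULL for a name that does not exist instead of raising an error. A minimal base R sketch, using a made-up list `res` in place of the real `Return.portfolio()` result:

```r
# Stand-in for the list returned by Return.portfolio(..., verbose = TRUE).
# The values are invented; only the element names matter here.
res <- list(
  returns      = c(0.010, -0.020),
  contribution = c(0.004, -0.010)
)

names(res)                  # inspect the available element names first

wrong <- res$contributions  # misspelled (plural): silently NULL
right <- res$contribution   # correct (singular)

is.null(wrong)  # TRUE
```

Running `names()` on the returned object before indexing into it catches this kind of mismatch early.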

Error in if ((location <= 1) | (location >= length(x)) - R - Eventstudies

I am trying my best at a simple event study in R, with some data retrieved from the Wharton Research Data Service (WRDS). I am not completely new to R, but I would describe my expertise level as intermediate. So, here is the problem. I am using the eventstudies package and one of the steps is converting the physical dates to event time frame dates with the phys2eventtime(..) function. This function takes multiple arguments:
z : time series data for which event frame is to be generated. In the form of an xts object.
Events : a data frame with two columns: unit and when. unit holds the column name of the series whose response is to be measured on the event date, while when holds the event date.
Width : width corresponds to the number of days on each side of the event date. For a given width, if there is any NA in the event window then the last observation is carried forward.
The authors of the package have provided an example for the xts object (StockPriceReturns) and for Events (SplitDates). This looks like the following:
> data(StockPriceReturns)
> data(SplitDates)
> head(SplitDates)
unit when
5 BHEL 2011-10-03
6 Bharti.Airtel 2009-07-24
8 Cipla 2004-05-11
9 Coal.India 2010-02-16
10 Dr.Reddy 2001-10-10
11 HDFC.Bank 2011-07-14
> head(StockPriceReturns)
Mahindra.&.Mahindra
2000-04-03 -8.3381609
2000-04-04 0.5923550
2000-04-05 6.8097616
2000-04-06 -0.9448889
2000-04-07 7.6843828
2000-04-10 4.1220462
2000-04-11 -1.9078480
2000-04-12 -8.3286900
2000-04-13 -3.8876847
2000-04-17 -8.2886060
So I have constructed my data in the same way, an xts object (DS_xts) and a data.frame (cDS) with the columns "unit" and "when". This is how it looks:
> head(DS_xts)
61241
2011-01-03 0.024247
2011-01-04 0.039307
2011-01-05 0.010589
2011-01-06 -0.022172
2011-01-07 0.018057
2011-01-10 0.041488
> head(cDS)
unit when
1 11754 2012-01-05
2 10104 2012-01-24
3 61241 2012-01-31
4 13928 2012-02-07
5 14656 2012-02-08
6 60097 2012-02-14
These are similar in my opinion, but how it looks does not tell the whole story. I am quite certain that my problem is in how I have constructed these two objects. Below is my R code:
#install.packages("eventstudies")
library("eventstudies")
DS = read.csv("ReturnData.csv")
cDS = read.csv("EventData.csv")
#Calculate Abnormal Returns
DS$AR = DS$RET - DS$VWRETD
#Clean up and let only necessary columns remain
DS = DS[, c("PERMNO", "DATE", "AR")]
cDS = cDS[, c("PERMNO", "DATE")]
#Generate correct date format according to R's as.Date
for (i in 1:nrow(DS)) {
DS$DATE[i] = format(as.Date(toString(DS$DATE[i]), format = "%Y %m %d"), format = "%Y-%m-%d")
}
for (i in 1:nrow(cDS)) {
cDS$DATE[i] = format(as.Date(toString(cDS$DATE[i]), format = "%Y %m %d"), format = "%Y-%m-%d")
}
#Rename cDS columns according to phys2eventtime format
colnames(cDS)[1] = "unit"
colnames(cDS)[2] = "when"
#Create list of unique PERMNO's
PERMNO <- unique(DS$PERMNO)
for (i in 1:length(PERMNO)) {
#Subset based on PERMNO
DStmp <- DS[DS$PERMNO == PERMNO[i], ]
#Remove PERMNO column and rename AR to PERMNO
DStmp <- DStmp[, c("DATE", "AR")]
colnames(DStmp)[2] = as.character(PERMNO[i])
dates <- as.Date(DStmp$DATE)
DStmp <- DStmp[, -c(1)]
#Create a temporary XTS object
DStmp_xts <- xts(DStmp, order.by = dates)
#If first iteration, just create new variable, otherwise merge
if (i == 1) {
DS_xts <- DStmp_xts
} else {
DS_xts <- merge(DS_xts, DStmp_xts, all = TRUE)
}
}
#Renaming columns for matching
colnames(DS_xts) <- c(PERMNO)
#Making sure classes are the same
cDS$unit <- as.character(cDS$unit)
eventList <- phys2eventtime(z = DS_xts, events = cDS, width = 10)
So, if I run phys2eventtime(..) it returns:
> eventList <- phys2eventtime(z = DS_xts, events = cDS, width = 10)
Error in if ((location <= 1) | (location >= length(x))) { :
missing value where TRUE/FALSE needed
In addition: Warning message:
In findInterval(when, index(x)) : NAs introduced by coercion
I have looked at the original function (it is available at their GitHub, can't use more than two links yet) to figure out this error, but I ran out of ideas how to debug it. I hope someone can help me sort it out. As a final note, I have also looked at another (magnificent) answer related to this R package (question: "format a zoo object with “dimnames”=List of 2"), but it wasn't enough to help me solve it (or I couldn't yet comprehend it).
Here is the link for the two CSV files if you would like to reproduce my error (or solve it!).
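
One thing worth checking, offered as a guess from the posted code and not a confirmed diagnosis: the loop stores formatted strings back into cDS$DATE, so the when column ends up as character, and the "NAs introduced by coercion" warning from findInterval() would be consistent with that. A minimal base R sketch with three rows copied from head(cDS):

```r
# Small stand-in for the event table, with dates held as character strings.
cDS <- data.frame(
  unit = c("11754", "10104", "61241"),
  when = c("2012-01-05", "2012-01-24", "2012-01-31"),
  stringsAsFactors = FALSE
)

class(cDS$when)  # "character"

# Convert the whole column at once; no loop is needed.
cDS$when <- as.Date(cDS$when, format = "%Y-%m-%d")

class(cDS$when)  # "Date"
```

If the classes of `cDS$when` and `index(DS_xts)` then agree, the comparison inside phys2eventtime() has a better chance of working; whether that alone resolves the error is not verified here.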

quantmod : can't generate daily returns for stock using OHLC

I'm attempting to get daily returns by using one BDH pull, but I can't seem to get it to work. I considered using quantmod's periodReturn function, but to no avail. I'd like the PctChg column populated, and any help is greatly appreciated.
GetReturns <- function(ticker, calctype, voldays) {
check.numeric <- function(N){
!length(grep("[^[:digit:]]", as.character(N)))}
isnumber <- function(x) is.numeric(x) & !is.na(x)
startdate <- Sys.Date()-20
enddate <- Sys.Date()
###############
GetData <- BBGPull <- bdh(paste(ticker," US EQUITY"), c("Open","High","Low","PX_Last"), startdate, enddate,
include.non.trading.days = FALSE, options = NULL, overrides = NULL,
verbose = FALSE, identity = NULL, con = defaultConnection())
##Clean Up Columns and Remove Ticker
colnames(GetData) <- c("Date","Open","High","Low","Close")
GetData[,"PctChg"] <- "RETURN" ##Hoping to populate this column with returns
GetData
}
I'm not married to the idea of using quantmod, and would even use ln(P_t / P_{t-1}), but I'm just unsure how to add a column with this data. Thank you!
You missed the (important) fact that bdh() still returns a data.frame object you need to transform first:
R> library(Rblpapi)
Rblpapi version 0.3.5 using Blpapi headers 3.8.8.1 and run-time 3.8.8.1.
Please respect the Bloomberg licensing agreement and terms of service.
R> spy <- bdh("SPY US EQUITY", c("Open","High","Low","PX_Last"),
+ Sys.Date()-10, Sys.Date())
R> class(spy)
[1] "data.frame"
R> head(spy)
date Open High Low PX_Last
1 2016-12-05 220.65 221.400 220.420 221.00
2 2016-12-06 221.22 221.744 220.662 221.70
3 2016-12-07 221.52 224.670 221.380 224.60
4 2016-12-08 224.57 225.700 224.260 225.15
5 2016-12-09 225.41 226.530 225.370 226.51
6 2016-12-12 226.40 226.960 225.760 226.25
R> sx <- xts(spy[, -1], order.by=spy[,1])
R> colnames(sx)[4] <- "Close" ## important
R> sxret <- diff(log(Cl(sx)))
R> head(sxret)
Close
2016-12-05 NA
2016-12-06 0.00316242
2016-12-07 0.01299593
2016-12-08 0.00244580
2016-12-09 0.00602225
2016-12-12 -0.00114851
R> sxret <- ClCl(sx) ## quantmod shorthand for close-to-close returns (arithmetic, not log)
This also uses packages xts and quantmod without explicitly loading them.
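
If the goal is only to add a return column to the data frame, that can also be sketched in base R without converting to xts. The data frame below is a three-row stand-in built from the closing prices shown above; the column names `LogRet` and `PctChg` are illustrative.

```r
# Three closing prices taken from the bdh() output above.
prices <- data.frame(
  Date  = as.Date(c("2016-12-05", "2016-12-06", "2016-12-07")),
  Close = c(221.00, 221.70, 224.60)
)

# Log return ln(P_t / P_{t-1}); the first row has no previous price, so NA.
prices$LogRet <- c(NA, diff(log(prices$Close)))

# Simple percentage change (P_t - P_{t-1}) / P_{t-1}.
prices$PctChg <- c(NA, diff(prices$Close) / head(prices$Close, -1))

prices
```

Inside GetReturns() the same two assignments could replace the "RETURN" placeholder, assuming the rows are already sorted by date.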

Create a trading day calendar from scratch

I just spent a day debugging some R code only to find that the problem I was having was caused by a missing date in the data returned by Yahoo using getSymbols. At the time I write this Yahoo is returning this:
QQQ.Open QQQ.High QQQ.Low QQQ.Close QQQ.Volume QQQ.Adjusted
2014-01-03 87.27 87.35 86.62 86.64 35723700 86.64
2014-01-06 86.66 86.76 86.00 86.32 32073100 86.32
2014-01-07 86.72 87.25 86.56 87.12 25860600 87.12
2014-01-08 87.14 87.55 86.95 87.31 27197400 87.31
2014-01-09 87.63 87.64 86.72 87.02 23674700 87.02
2014-01-13 87.18 87.48 85.68 86.01 48842300 86.01
2014-01-14 86.30 87.72 86.30 87.65 37178900 87.65
2014-01-15 88.03 88.54 87.94 88.37 39835600 88.37
2014-01-16 88.30 88.51 88.16 88.38 31630100 88.38
2014-01-17 88.11 88.37 87.67 87.88 36895800 87.88
which is missing 2014-01-10. That date is returned for other ETFs. I expect that Yahoo will fix the data one of these days (the data is on Google) but for now it is wrong which caused my code some fits.
To address this issue I want to check my data to ensure that there is data for all dates the markets were open. If there's a canned way to do this in some package I'd appreciate info on that but to that end I started writing some code using the timeDate package. However I have ended up with xts index questions I don't understand. The code follows:
library(timeDate)
library(quantmod)
MyZone = "UTC"
Sys.setenv(TZ = MyZone)
YearStart = "1990"
YearEnd = "2014"
currentYear = getRmetricsOptions("currentYear")
dateStart = paste0(YearStart, "-01-01")
dateEnd = paste0(YearEnd, "-12-31")
DayCal = timeSequence(from = dateStart, to = dateEnd, by="day", zone = MyZone)
TradingCal = DayCal[isBizday(DayCal, holidayNYSE())]
testSym = "QQQ"
getSymbols(testSym, src="yahoo", from = dateStart, to = dateEnd)
testData = get(testSym)
head(testData)
tail(testData, n=10)
#Save date range of data being checked
firstIndex = index(testData)[1]
lastIndex = index(testData)[nrow(testData)]
#Create an xts series covering all dates
AllDates = xts(x=rep(1, length.out=length(TradingCal)),
order.by=TradingCal, tzone = MyZone)
head(AllDates)
tail(AllDates)
index(AllDates)[1:20]
index(testData)[1:20]
tzone(AllDates)
tzone(testData)
#Create an xts object that has all dates covered
#by testSym but using calendar I created
CheckData = subset(AllDates, ((index(AllDates)>=firstIndex) &&
(index(AllDates)<=lastIndex))
)
class(index(AllDates))
class(index(testData))
The goal here was to create a 'known good calendar' which I could use to create a simple xts object. With that object I would then check whether every index in that object had a corresponding index in the data being tested. However I'm not getting that far as it appears my indexes are not compatible. When I run the code I get this at the end:
> CheckData = subset(AllDates, ((index(AllDates)>=firstIndex) && (index(AllDates)<=lastIndex))
+ )
Error in `>=.default`(index(AllDates), firstIndex) :
comparison (5) is possible only for atomic and list types
> class(index(AllDates))
[1] "timeDate"
attr(,"package")
[1] "timeDate"
> class(index(testData))
[1] "Date"
>
Can someone show me the errors of my ways here so that I can move forward? Thanks!
You need to convert TradingCal to Date:
TradingDates <- as.Date(TradingCal)
And here's another way to find index values in TradingDates that aren't in your testData index.
AllDates <- xts(,TradingDates)
testSubset <- paste(start(testData), end(testData), sep="/")
CheckData <- merge(AllDates, testData)[testSubset]
BadDates <- CheckData[is.na(rowSums(CheckData))]
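
Once both calendars are of class Date, the missing-date check can also be sketched in base R with `%in%`. The two vectors below are toy stand-ins for TradingDates and index(testData), chosen to reproduce the 2014-01-10 gap from the question. The range filter uses the vectorised `&`, whereas the question's code used `&&`, which is meant for single values.

```r
# Toy stand-ins: the expected trading calendar and the dates actually present.
expected <- as.Date(c("2014-01-08", "2014-01-09", "2014-01-10", "2014-01-13"))
observed <- as.Date(c("2014-01-08", "2014-01-09", "2014-01-13"))

# Restrict the calendar to the range covered by the data (element-wise &).
in_range <- expected >= min(observed) & expected <= max(observed)
expected <- expected[in_range]

# Dates that should exist but are absent from the data.
missing_dates <- expected[!(expected %in% observed)]
missing_dates
# "2014-01-10"
```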
