R Quant Trading

R Quant Trading - r

I could use some help getting my code to work properly. I am trying to create a simple position signal based on the closing price being higher than the MACD, Bollinger Bands, and the Slow Stochastics. I am getting errors on line 17 onwards. I am not sure if this is because "Stock" is an xts object or not. I would like to graph the output in the end as well. Thanks!
#install.packages("quantmod")
library("quantmod")
#install.packages("FinancialInstrument")
library("FinancialInstrument")
#install.packages("PerformanceAnalytics")
library("PerformanceAnalytics")
#install.packages("TTR")
library("TTR")
#######################################
Stock <- get(getSymbols('CAT'))["2014::"]
# add the indicators
Stock$BBands <- BBands(HLC(Stock))
Stock$MACD <- MACD(HLC(Stock))
Stock$stochOSC <- stoch(Stock[,c("High","Low","Close")])
Stock$position <- ifelse(Cl(Stock) > Stock$BBands > Stock$MACD > Stock $stockOSC , 1 , -1)
Gains <- lag(Stock$position) * dailyReturn(Stock)
charts.PerformanceSummary(cbind(dailyReturn(Stock),Gains))

As Pascal mentioned in his above comment, MACD uses a univariate object. This object should be the closing price (unless you want something else) which is the third column in the HLC(Stock) named CAT.Close. The Stock$stochOSC didn't work because column names specified wrongly (CAT. should be added before High, Low and Close). Finally, & should separate multiple conditions of ifelse (note the typo in Stock$stochOSC in the question (ck instead of ch)).
Here is the code:
#install.packages("quantmod")
library("quantmod")
#install.packages("FinancialInstrument")
library("FinancialInstrument")
#install.packages("PerformanceAnalytics")
library("PerformanceAnalytics")
#install.packages("TTR")
library("TTR")
#######################################
Stock <- get(getSymbols('CAT'))["2014::"]
# add the indicators
Stock$BBands <- BBands(HLC(Stock))
Stock$MACD <- MACD(HLC(Stock)[,3])
Stock$stochOSC <- stoch(Stock[,c("CAT.High","CAT.Low","CAT.Close")])
Stock$position <- ifelse(Cl(Stock)>Stock$BBands & Stock$BBands >Stock$MACD & Stock$MACD > Stock$stochOSC , 1 , -1)
Gains <- lag(Stock$position) * dailyReturn(Stock)
charts.PerformanceSummary(cbind(dailyReturn(Stock),Gains))
You should get the following plot:

Related

Using a test sample file with MaxEnt in R

I worked a lot with MaxEnt in R recently (dismo-package), but only using a crossvalidation to validate my model of bird-habitats (only a single species). Now I want to use a self-created test sample file. I had to pick this points for validation by hand and can't use random test point.
So my R-script looks like this:
library(raster)
library(dismo)
setwd("H:/MaxEnt")
memory.limit(size = 400000)
punkteVG <- read.csv("Validierung_FL_XY_2016.csv", header=T, sep=";", dec=",")
punkteTG <- read.csv("Training_FL_XY_2016.csv", header=T, sep=";", dec=",")
punkteVG$X <- as.numeric(punkteVG$X)
punkteVG$Y <- as.numeric(punkteVG$Y)
punkteTG$X <- as.numeric(punkteTG$X)
punkteTG$Y <- as.numeric(punkteTG$Y)
##### mask NA ######
mask <- raster("final_merge_8class+le_bb_mask.img")
dataframe_VG <- extract(mask, punkteVG)
dataframe_VG[dataframe_VG == 0] <- NA
dataframe_TG <- extract(mask, punkteTG)
dataframe_TG[dataframe_TG == 0] <- NA
punkteVG <- punkteVG*dataframe_VG
punkteTG <- punkteTG*dataframe_TG
#### add the raster dataset ####
habitat_all <- stack("blockstats_stack_8class+le+area_8bit.img")
#### MODEL FITTING #####
library(rJava)
system.file(package = "dismo")
options(java.parameters = "-Xmx1g" )
setwd("H:/MaxEnt/results_8class_LE_AREA")
### backgroundpoints ###
set.seed(0)
backgrVMmax <- randomPoints(habitat_all, 100000, tryf=30)
backgrVM <- randomPoints(habitat_all, 1000, tryf=30)
### Renner (2015) PPM modelfitting Maxent ###
maxentVMmax_Renner<-maxent(habitat_all,punkteTG,backgrVMmax, path=paste('H:/MaxEnt/Ergebnisse_8class_LE_AREA/maxVMmax_Renner',sep=""),
args=c("-P",
"noautofeature",
"nothreshold",
"noproduct",
"maximumbackground=400000",
"noaddsamplestobackground",
"noremoveduplicates",
"replicates=10",
"replicatetype=subsample",
"randomtestpoints=20",
"randomseed=true",
"testsamplesfile=H:/MaxEnt/Validierung_FL_XY_2016_swd_NA"))
After the "maxent()"-command I ran into multiple errors. First I got an error stating that he needs more than 0 (which is the default) "randomtestpoints". So I added "randomtestpoints = 20" (which hopefully doesn't stop the program from using the file). Then I got:
Error: Test samples need to be in SWD format when background data is in SWD format
Error in file(file, "rt") : cannot open the connection
The thing is, when I ran the script with the default crossvalidation like this:
maxentVMmax_Renner<-maxent(habitat_all,punkteTG,backgrVMmax, path=paste('H:/MaxEnt/Ergebnisse_8class_LE_AREA/maxVMmax_Renner',sep=""),
args=c("-P",
"noautofeature",
"nothreshold",
"noproduct",
"maximumbackground=400000",
"noaddsamplestobackground",
"noremoveduplicates",
"replicates=10"))
...all works fine.
Also I tried multiple things to get my csv-validation-data in the correct format. Two rows (labled X and Y), Three rows (labled species, X and Y) and other stuff. I would rather use the "punkteVG"-vector (which is the validation data) I created with read.csv...but it seems MaxEnt wants his file.
I can't imagine my problem is so uncommon. Someone must have used the argument "testsamplesfile" before.

I found out, what the problem was. So here it is, for others to enjoy:
The correct maxent-command for a Subsample-file looks like this:
maxentVMmax_Renner<-maxent(habitat_all, punkteTG, backgrVMmax, path=paste('H:/MaxEnt',sep=""),
args=c("-P",
"noautofeature",
"nothreshold",
"noproduct",
"maximumbackground=400000",
"noaddsamplestobackground",
"noremoveduplicates",
"replicates=1",
"replicatetype=Subsample",
"testsamplesfile=H:/MaxEnt/swd.csv"))
Of course, there can not be multiple replicates, since you got only one subsample.
Most importantly the "swd.csv" Subsample-file has to include:
the X and Y coordinates
the Values at the respective points (e.g.: with "extract(habitat_all, PunkteVG)"
the first colum needs to consist of the word "species" with the header "Species" (since MaxEnt uses the default "species" if you don't define one in the Occurrence data)
So the last point was the issue here. Basically, if you don't define the species-colum in the Subsample-file, MaxEnt will not know how to assign the data.

Time series data arrangement

I have several time series data and I'm trying to make a arrangement before further analysis. The point is that, as you can see in the picture, 3 financial time series has a different dates-observed. I want to eliminate whole line if there's at least 1 blanked line. In order to make a arrangement, first I made whole dates line to the left side except saturdays and sundays from 1 Jan 2005 to 30 Jun 2015 for indexing.
example: at the 11th row, there exist unmatched dates. I want to put NA columns in the middle.
here's what I've tried
Day=data.frame(test[,1:2])
Rk=data.frame(test[,3:4])
Vix=data.frame(test[,5:6])
BA=data.frame(test[,7:8])
i=1
k=0
while(i<=2736){
if(Day[i,1]==Rk[i,1]){i=i+1}
else if(Day[i,1]!=Rk[i,1]){
k=k+1
Rk[i+1:k+2634,]=Rk[i:k+2633,]
Rk[i,]=c(Day[i,1],NA)
i=i+1}
}
but it shows error message: number of items to replace is not a multiple of replacement length
I will be very much appreciated. Any kind of helps will be more than welcomed.

This is fairly easy if you use a time-series class like xts (or zoo).
# create sample data
set.seed(21)
dayone <- as.Date("2005-01-03")
plusdays <-c(0:4, 7:11, 14:18, 21:24)
test <- data.frame(date=dayone + plusdays)
test$day <- weekdays(test$date, abbreviate=TRUE)
test$date.1 <- dayone + c(plusdays[-11L], 25)
test$Kernel <- rnorm(nrow(test), 3e-5, 1e-6)
test$date.2 <- dayone + c(plusdays[-11L], 25)
test$VIX.High <- round(rnorm(nrow(test), 14, 0.1), 2)
test$date.3 <- dayone + c(plusdays[-11L], 25)
test$Baa.Aaa <- round(rnorm(nrow(test), 0.66, 0.01), 2)
require(xts)
# create xts objects for each column
Rk <- xts(test['Kernel'], test$date.1)
Vix <- xts(test['VIX.High'], test$date.2)
Ba <- xts(test['Baa.Aaa'], test$date.3)
# use xts to merge data
testxts <- merge(xts(,test$date), Rk, Vix, Ba)
I've left out your day column because xts/zoo objects are just a matrix with an index attribute, and you cannot mix types in a matrix. But you can use the .indexwday function to extract the weekday of each row.

Variable lagging in R error

I am trying to lag some prices I downloaded from yahoo market but I don't want the lag to be fixed. I would like to have it where depending on another DF or Values the lag period changes.
This extracts and formats the data:
library("quantmod")
library("PerformanceAnalytics")
library(dplyr)
nameOfStrategy <- "GSPC MACD"
#Specify dates for downloading data, training models and running simulation
trainingStartDate = as.Date("2000-01-01")
trainingEndDate = as.Date("2010-01-01")
outofSampleStartDate = as.Date("2010-01-02")
#Download the data
symbolData <- new.env() #Make a new environment for quantmod to store data in
getSymbols("^GSPC", env = symbolData, src = "yahoo", from = trainingStartDate)
trainingData <- window(symbolData$GSPC, start = trainingStartDate, end = trainingEndDate)
testData <- window(symbolData$GSPC, start = outofSampleStartDate)
indexReturns <- Delt(Cl(window(symbolData$GSPC, start = outofSampleStartDate)))
colnames(indexReturns) <- "GSPC Buy&Hold"
And this is the code I'm using to find market signals and then organize data:
signalB <- ifelse(MACD12$macd > MACD12$signal & lag.xts(MACD12$macd) < lag.xts(MACD12$signal),1,NA)
#If fastMA > slowMA on change go long
signalS <- ifelse(MACD12$macd < MACD12$signal & lag.xts(MACD12$macd) > lag.xts(MACD12$signal),-1,NA)
#Combines Buy and sell signals
Tsignal <- merge(signalB,signalS)
#Gets number of days---dont know when period starts but doesnt matter since we just want difference in days
Tsignal$dates =indexTZ(Tsignal)
Tsignal$dates =index(Tsignal)
#Combines Buy and Sell signal into overall signal
Tsignal$Signal <- ifelse(is.na(Tsignal$Buy),ifelse(is.na(Tsignal$Sell),NA,-1),1)
Tsignal$Tdate <- 0
#Gets 'Date' only if signal either buy or sell
Tsignal$Tdate <- ifelse(!is.na(Tsignal$Signal),Tsignal$dates,NA)
#Finds difference between the Sell sig and the last signal **** In this case it will work but future buying twice before
# selling will result in not getting returns of one of the purchases
Tsignal$lag <- ifelse(Tsignal$Signal == -1,diff(na.locf(Tsignal$Tdate)),NA)
Tsignal$lag <- ifelse(is.na(Tsignal$lag),0,Tsignal$lag)
Here is the problem:
lag(Cl(trainingData),Tsignal$lag))
#Warning message:
#In if (n == 0) return(x) :
#the condition has length > 1 and only the first element will be used
It returns all of the prices but lagged back at a period of 0 (ie just returns the prices). While the Tsignal$lag has a lot of zero's in it there are values greater than 1 spread throughout.
I need it to return the same price during the dates where Tsignal$lag = 0 and return the price lagged back the number of periods Tsignal$lag specifies.
Currently I am using the dplyr lag function but I have tried other packages with the lag function and I get the same error. Writing this I am thinking I might have to do a 'for' function but I'm not sure, I am fairly new to R.
Thanks for your help ahead of time!

Both the dplyr and stats lag functions require integers for the number of lag periods (?dplyr::lag), but your Tsignal$lag is a vector. To remain in the xts domain, one suggestion is to use the lag values as direct index offsets to the close prices, as in
Cl(trainingData)[(1:nrow(trainingData))-coredata(Tsignal$lag),]

RTextTools Why does create_analytics throw "Error in order(TOPIC_CODE) create_analytics"

I'm having another problem with classifying a virgin dataset with RTextTools and hope someone can shed some light.
I have pre-trained a model (SVM,MAXENT,RF & BAGGING) and term matrix saved for a classifier "C11" with a pretty decent 91%+ ACC & AUC and look fine to me.
The source data for the models looked a bit like...
df.SourceData ( label=numeric(0), body=character(0) )
Where label was either "1" or "2" (meaning true / false), and body was a string.
The virgin dataset looks like...
df.SourceData (code=0, body=character(0) )
i.e. every instance in this set has a code (synonymous with label) set to 0 as these are virginal - we don't know the label. The body is again a text string.
So running the classifier with...
## Load pre-built matrix & model
load("original_matrix.RData") # called doc_matrix
load("original_model.RData")
## Load unknown data set for classifying
df.VirginData <- read.csv(file="VirginData.csv")
## Create NEW document matrix
new_doc_matrix <- create_matrix(df.VirginData$body,
language="english",
removeNumbers=v.matrixRemoveNumbers,
stemWords=v.matrixStemWords,
toLower=v.matrixToLower,
removeSparseTerms=v.matrixRemoveSparseTerms,
, originalMatrix=doc_matrix
)
## Add a column to df.VirginData as labels placeholder
df.VirginData$code <- 0
## Create Container
container <- create_container(new_doc_matrix, df.VirginData$code, testSize=1:3270, virgin=TRUE) # as we have no labels virgin=TRUE
## Classify Model
results <- classify_models(container,models)
## Create analytics from classified models
analytics <- create_analytics(container, results)
This seems to work right up until the point where create_analytics is called. This falls over with the error in the title (above). And if I debug step through that function I get right up to line 47 where the issue lays.
score_summary <- create_scoreSummary(container, classification_results)
document_summary <- create_documentSummary(container, score_summary)
document_summary <- document_summary[,c(2,3,5)]
raw_summary <- cbind(classification_results, document_summary)
topic_summary <- create_topicSummary(container, score_summary)
topic_summary <- as.data.frame(topic_summary[,c(1,3,4)])
topic_summary <- topic_summary[with(topic_summary, order(TOPIC_CODE)),]
Can anyone shed light on why this might be happening? - perhaps it is something to do with how I build the new container and in particular the labels bit (df.VirginData$code). Am I even doing this right for un-labelled docs?
Help appreciated with love.

To make create_analytics work, you need to have all of the prediction levels specified in the labels argument of create_container, in your case:
df.VirginData$code <- c(0,rep(1,c(nrow(df.VirginData)-1)))
These manual codes wont actually matter as long as you have virgin=TRUE in the container argument.

The problem is that label summary cannot be properly computed. Since it is virgin data, there is only one label for the new data. What you want, I assume, is the class assignment in document_summary. To fix this problem, I took the parts of the create_analytics function that applied to virgin data, removed a couple lines about topic_summary to force label_summary to be properly produced. You end up with a frequency table for all categories. Following the else (virgin=TRUE) in create_analytics:
score_summary <- create_scoreSummary(container, classification_results)
document_summary <- create_documentSummary(container,score_summary)
document_summary <- document_summary[, c(2, 3, 5)]
raw_summary <- cbind(classification_results, document_summary)
container <- new("analytics_virgin",
label_summary = data.frame(table(document_summary[,3])),
document_summary = as.data.frame(raw_summary))

Need to calculate Rate of Change of two data sets over time individually and Net rate of Change

What I have done so far:
read.csv("filename.csv", header=TRUE)
df$header1
df$header2
Now I want to calculate the Rate of Change: Header1 is Dates, Header2 is prices
Rate of Change by date for all values comparative to preceding date.
I want to generate two separate columns of Rate of Change performing the same operation on another file.
Once rate of change is available for both the data sets, I will look to find the net rate of change for each date.
Where I am now:
df<-read.csv("audusd.csv", head = TRUE)
df$Date
df$Close
rate<- 100*diff(df$Close/df[-nrow(df),]$Close
This executes and then I get this:
> rate<- 100*diff(df$Close/df[-nrow(df),]$Close
+
+
In the console.
Thanks for all the help till now, please help further. :)
Also, I am a bit confused about whether I can get the results into a file? Or Do I have to run the code each time?
This certainly would help with my work in a big way, if I can understand and start using it.

You can also use the diff(...) function, which allows you to avoid looping through rows:
rate <- 100*diff(df$price)/df[-nrow(df),]$price
diff(...) calculates the difference row(n) - row(n-1). df[-nrow(df),] returns all rows of df except the last.
Here's a complete example with working code.
# all this just to get sample data.
# daily close for AAPL from 2013.01.01 - today
library(tseries)
library(zoo)
ts <- get.hist.quote(instrument="AAPL",
start="2013-01-01", end="2014-01-01",
quote="AdjClose", provider="yahoo", origin="1970-01-01",
compression="d", retclass="zoo")
df <- data.frame(ts)
df <- data.frame(date=as.Date(rownames(df)),price=df$AdjClose)
df <- df[!is.na(df$price),]
# calculate daily rate of change...
rate <- 100*diff(df$price)/df[-nrow(df),]$price
plot(df[-nrow(df),]$date,rate,type="l",xlab="2013",ylab="Pct. Change",main="APPL")

Given what you said in the comments about how you would do it in excel here is the R equivalent.
dataset=(1:10)^3 #Data in your case one of your columns
ratedata=NULL # Something you have to do so R can put data in this object
for(i in 1:(length(dataset)-1)){ratedata[i]=(dataset[i+1]-dataset[i])/dataset[i]*100} # A simple for function doing what you specified.
ratedata # Returning the rate data so you can look at it
FYI:This only works if your time is at regular intervals.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

R Quant Trading - r

Related

Using a test sample file with MaxEnt in R

Time series data arrangement

Variable lagging in R error

RTextTools Why does create_analytics throw "Error in order(TOPIC_CODE) create_analytics"

Need to calculate Rate of Change of two data sets over time individually and Net rate of Change

Categories

Resources