Basic ntile function to create decile portfolios - r

https://imgur.com/a/O1O9G
I am trying to create decile portfolios based on momentum. I use the DAX (Germany) as my universe.
dax <- read.csv("DAXclean.csv", header = TRUE, sep = ";", dec = ",")
as.ts(dax)
mom.return <- matrix(NA,nrow(dax),ncol(dax))
mom.decile <- matrix(NA,nrow(dax),ncol(dax))
for (row in 13:nrow(dax)) {
  for (column in 2:ncol(dax)) {
    mom.return[row,column] <- (dax[row-1,column]-dax[row-12,column])/dax[row-12,column]
  }
  mom.decile[row,column] <- ntile(mom.return[row,2:ncol(dax)], 10)
}
When I run this code I get the following error message:
"Error in `[<-`(`*tmp*`, row, column, value = ntile(mom.return[row, 2:ncol(dax)], :
subscript out of bounds"
If I remove the following command, everything works fine.
mom.decile[row,column] <- ntile(mom.return[row,2:ncol(dax)], 10)
I can't see what the problem is.
Thank you in advance!
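One thing stands out, whatever the exact source of the message: the mom.decile assignment runs after the inner loop has finished, so `column` is fixed at `ncol(dax)` and a single cell is asked to hold the whole vector that `ntile()` returns (one bucket per stock). A minimal sketch that assigns the full row slice instead, assuming `ntile()` comes from dplyr and using a synthetic price matrix (`prices`, a hypothetical stand-in for the DAX file) so that it runs on its own:

```r
library(dplyr)  # assumed source of ntile()

# Synthetic data (assumption): column 1 is a date index,
# columns 2..31 are prices for 30 stocks over 20 months.
set.seed(42)
n_row <- 20
prices <- cbind(1:n_row, matrix(runif(n_row * 30, 50, 150), n_row, 30))

mom.return <- matrix(NA_real_, nrow(prices), ncol(prices))
mom.decile <- matrix(NA_integer_, nrow(prices), ncol(prices))
cols <- 2:ncol(prices)

for (row in 13:nrow(prices)) {
  # Return from month t-12 to month t-1, computed for all stocks at once
  mom.return[row, cols] <-
    (prices[row - 1, cols] - prices[row - 12, cols]) / prices[row - 12, cols]
  # ntile() returns one bucket per stock, so the target is the whole row slice
  mom.decile[row, cols] <- ntile(mom.return[row, cols], 10)
}
```

The first 12 rows stay NA because there is not enough history to compute the return.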

Related

Error in eval(expr, p): object 'X' not found; predict (BayesARIMAX)

I am trying to use BayesARIMAX to model and predict US GDP (the data is available here: https://fred.stlouisfed.org/series/GDP). I followed the example in the package manual (https://cran.r-project.org/web/packages/BayesARIMAX/BayesARIMAX.pdf) to build my model. I didn't have any major issue building the model (I used error handling to work around the problem described in "Getting chol.default error when using BayesARIMAX in R"). However, I could not get predictions from the model. I looked for a solution, but found no example of predicting from a model built with BayesARIMAX. Every time I run "predict" I get the following error:
"Error in eval(expr, p) : object 'X' not found"
Here is my code.
library(xts)
library(zoo)
library(tseries)
library(tidyverse)
library(fpp2)
gdp <- read.csv("GDP.csv", head = T)
date.q <- as.Date(gdp[, 1], "%Y-%m-%d")
gdp <- xts(gdp[,2],date.q)
train.row <- 248
number.row <- dim(merge.data)[1]
gdp.train <- gdp[1:train.row]
gdp.test <- gdp[(train.row+1):number.row]
date.test <- date.q[(train.row+1):number.row]
library(BayesARIMAX)
# wrote this function to handle randomly produced errors due to MCMC simulation
test_function <- function(a,b,P=1,Q=1,D=1,error_count = 0)
{
  tryCatch(
    {
      model = BayesARIMAX(Y=a,X = b,p=P,q=Q,d=D)
      return(model)
    },
    error = function(cond)
    {
      error_count=error_count+1
      if (error_count <40)
      {
        test_function(a,b,P,Q,D,error_count = error_count)
      }
      else
      {
        print(paste("Model does not converge for ARIMA(",P,D,Q,")"))
        print(cond)
      }
    }
  )
}
set.seed(1)
x = rnorm(length(gdp.train),4,1)
bayes_arima_model <- test_function(a = gdp.train,b=x,P = 3,D = 2,Q = 2)
bayes_arima_pred <- xts(predict(bayes_arima_model[[1]],newxreg = x[1:3])$pred,date.test)
And here is the error message:
Error in eval(expr, p) : object 'X' not found
Here is how I resolved the issue after reading through the BayesARIMAX source code (https://rdrr.io/cran/BayesARIMAX/src/R/BayesianARIMAX.R). I created a variable named "X" and passed it to the predict function to get the result. The length of X just has to equal the number of predictions.
Here is the solution code for prediction:
X <- c(1:3)
bayes_arima_pred <- xts(predict(bayes_arima_model[[1]],newxreg = X[1:3])$pred,date.test)
which gave me the following results.
bayes_arima_pred
[,1]
2009-01-01 14462.24
2009-04-01 14459.73
2009-07-01 14457.23
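A possible explanation for why this works, assuming the fitted object is a `stats::arima()` fit that BayesARIMAX created internally with `xreg = X` (an assumption based on the source file linked above, not verified here): `predict()` for such fits re-evaluates the `xreg` expression stored in the model call, in the environment where `predict()` was called, to count its columns. If no object with that name is visible there, the lookup fails. The sketch below reproduces this with plain `arima()`; `fit_inside` and the simulated data are hypothetical.

```r
# Hypothetical wrapper: the stored model call keeps the expression `X`,
# not the values that were passed in.
fit_inside <- function(y, X) arima(y, order = c(1, 0, 0), xreg = X)

set.seed(1)
x <- rnorm(60)
y <- 10 + 2 * x + rnorm(60)
fit <- fit_inside(y, x)

# Make sure no X is defined at the call site
if (exists("X", inherits = FALSE)) rm(X)
new_x <- rnorm(3)

# Expected to fail: predict() tries to evaluate `X` where it was called
msg <- tryCatch(
  {
    predict(fit, n.ahead = 3, newxreg = new_x)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Workaround from the answer above: define X so the lookup succeeds.
# As far as I can tell, only its number of columns is compared with newxreg.
X <- x
pred <- predict(fit, n.ahead = 3, newxreg = new_x)
print(pred$pred)
```

If this reading is right, the values stored in X do not affect the forecast; what matters is that the regressor values passed as `newxreg` are the ones intended for the forecast horizon.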

arulesViz subscript out of bounds paracoord

I want to perform basket analysis and draw a paracoord plot, but I receive an error.
The error is:
Error in m[j, i] : subscript out of bounds
In addition: Warning message:
In cbind(pl, pr) :
number of rows of result is not a multiple of vector length (arg 2)
I am using data from: Link.
First I transform the data to fit basket analysis; the original Excel file is named Online_Retail:
library(arules)
library(arulesViz)
library(plyr)
items <- ddply(Online_Retail, c("CustomerID", "InvoiceDate"), function(df1)paste(df1$Description, collapse = ","))
items1 <- items["V1"]
write.csv(items1, "groceries1.csv", quote=FALSE, row.names = FALSE, col.names = FALSE)
trans1 <- read.transactions("groceries1.csv", format = "basket", sep=",",skip=1)
To draw the paracoord plot I wrote this code:
rules.trans2<-apriori(data=trans1, parameter=list(supp=0.001,conf = 0.05),
appearance=list(default="rhs", lhs="ROSES REGENCY TEACUP AND SAUCER"), control=list(verbose=F))
sorted.plot <- sort(rules.trans2, by="support", decreasing = TRUE)
plot(sorted.plot, method="paracoord", control=list(reorder=TRUE, verbose = TRUE))
Why is my paracoord code not working? How can I fix it? What should I change?
This is, unfortunately, a bug in arulesViz. This will be fixed in the next release (arulesViz 1.3-3). The fix is already available in the development version on GitHub: https://github.com/mhahsler/arulesViz

Gini Index in R

I am trying to calculate the Gini index for each row of my database. Each row is a customer and each column is a monthly session. So what I need to do is add a column with the Gini index by row, for each customer across the 12 months.
See example attached.
I found some examples online and did this:
Gini_index <- apply(DT_file[,c('sessions_201607_pct','sessions_201608_pct', 'sessions_201609_pct','sessions_201610_pct','sessions_201611_pct','sessions_201612_pct','sessions_201701_pct','sessions_201702_pct','sessions_201703_pct','sessions_201704_pct','sessions_201705_pct','sessions_201706_pct')], 1, gini)
However, I get the following error:
Error in match.fun(FUN) : object 'gini' not found
I have installed both ineq and reldist (and loaded the libraries), so I don't know why this isn't working.
Try this to get the Gini coefficient by column:
library(ineq)
coeff= NULL
for (i in colnames(your_data[,-1])){
  coeff= c(coeff,round(ineq(your_data[,i],type = 'Gini'),4))
}
data_coeff = data.frame(cbind(coeff,colnames(your_data[,-1])))
colnames(data_coeff) = c("Coeff","Colnames")
If you want it for each row, try this:
your_new_data = as.data.frame(t(your_data[,-1]), row.names =T)
colnames(your_new_data) = your_data[,1]
ind = NULL
for (i in colnames(your_new_data)){
  ind = c(ind,round(ineq(your_new_data[,i],type = 'Gini'),4))
}
data_coeff= data.frame(cbind(ind,colnames(your_new_data)))
colnames(data_coeff) = c("Coeff","customer")
Finally, add the coefficients to your data frame, with a merge for instance:
your_data_final = merge(your_data,data_coeff, by = "customer" )
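A note on the original error: R is case-sensitive, and ineq exports `Gini()` (capital G) while reldist exports `gini()`, so the name `gini` is only found when reldist is attached. For the row-wise calculation itself, `apply()` over the month columns is enough. The sketch below uses a hand-written helper, `gini_coef` (hypothetical, intended to match the uncorrected Gini that `ineq::Gini()` computes by default), and a toy `sessions` table so that it runs without extra packages.

```r
# Hypothetical helper: uncorrected Gini coefficient of a numeric vector
gini_coef <- function(x) {
  x <- sort(x[!is.na(x)])
  n <- length(x)
  if (n == 0 || sum(x) == 0) return(NA_real_)
  sum((2 * seq_len(n) - n - 1) * x) / (n * sum(x))
}

# Toy data (assumption): one row per customer, one column per month
sessions <- data.frame(
  customer = c("A", "B", "C"),
  m1 = c(0.25, 0, 0.1),
  m2 = c(0.25, 0, 0.2),
  m3 = c(0.25, 0, 0.3),
  m4 = c(0.25, 1, 0.4)
)
month_cols <- c("m1", "m2", "m3", "m4")

# MARGIN = 1 applies the function to each row, giving one value per customer
sessions$gini <- apply(sessions[, month_cols], 1, gini_coef)
print(sessions)
```

With ineq attached, `gini_coef` could be swapped for `Gini` in the `apply()` call, which keeps the result in the original data frame and avoids the transpose and merge.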

Trapping error in R

A very basic question, but I am not able to apply this to my code, so I am seeking help here.
I am getting the error shown below while running this R code:
knn.pred <- knn(tdm.stack.nl_train, tdm.stack.nl_Test, tdm.cand_train, prob = TRUE)
> Error in knn(tdm.stack.nl_train, tdm.stack.nl_Test, tdm.cand_train, prob = TRUE) :
> dims of 'test' and 'train' differ.
I want to print the message given below instead, but I could not achieve this. I am not good at writing functions yet. Please help.
out <- tryCatch( when error = {print('New words seen in testing data')})
It's better and easier to use try:
knn.pred <- try(knn(tdm.stack.nl_train, tdm.stack.nl_Test, tdm.cand_train, prob = TRUE))
if (inherits(knn.pred, "try-error")) { # error management
  print('New words seen in testing data')
}
You could do:
tryCatch(knn.pred <- knn(tdm.stack.nl_train, tdm.stack.nl_Test, tdm.cand_train, prob = TRUE),
  error = function(e) {
    stop('New words seen in testing data')
  })
This shows up as:
tryCatch(knn.pred <- knn(tdm.stack.nl_train, tdm.stack.nl_Test, tdm.cand_train, prob = TRUE),
  error = function(e) {
    stop('New words seen in testing data')
  })
Error in value[[3L]](cond) : New words seen in testing data
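If the goal is to print the message and carry on rather than raise a new error, the handler can report it and return a fallback value. A minimal sketch of that variant, with `risky_knn` as a hypothetical stand-in for the `knn()` call so that it runs without the original data:

```r
# Hypothetical stand-in that fails the same way the knn() call does
risky_knn <- function(train, test) {
  if (ncol(train) != ncol(test)) stop("dims of 'test' and 'train' differ")
  "ok"
}

train <- matrix(0, nrow = 4, ncol = 3)
test  <- matrix(0, nrow = 2, ncol = 5)

# Assign the result of tryCatch() itself, so knn.pred is defined
# whether or not the call succeeds.
knn.pred <- tryCatch(
  risky_knn(train, test),
  error = function(e) {
    message("New words seen in testing data")
    NULL  # fallback value returned on error
  }
)
```

Downstream code can then test `is.null(knn.pred)` to decide whether a prediction is available.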

how to solve negative subscript error in R?

I am trying to normalize the data frame before prediction, but I get this error:
Error in seq_len(nrows)[i] :
only 0's may be mixed with negative subscripts
Called from: top level
Here is my code :
library('caret')
load(file = "some dataset path here")
DummyDataSet = data
attach(DummyDataSet)
foldCount = 10
classifyLabels = DummyDataSet$ClassLabel
folds = createFolds(classifyLabels,k=foldCount)
for (foldIndex in 1:foldCount){
  cat("----- Start Fold -----\n")
  #holding out samples of one fold in each iteration
  testFold = DummyDataSet[folds[[foldIndex]],]
  testLabels = classifyLabels[folds[[foldIndex]]]
  trainFolds = DummyDataSet[-folds[[foldIndex]],]
  trainLabels = classifyLabels[-folds[[foldIndex]]]
  #Zero mean unit variance normalization to ONLY numerical data
  for (k in 1:ncol(trainFolds)){
    if (!is.integer(trainFolds[,k])){
      params = meanStdCalculator(trainFolds[,k])
      trainFolds[,k] = sapply(trainFolds[,k], function(x) (x - params[1])/params[2])
      testFold[,k] = sapply(testFold[,k], function(x) (x - params[1])/params[2])
    }
  }
  meanStdCalculator = function(data){
    Avg = mean(data)
    stdDeviation = sqrt(var(data))
    return(c(Avg,stdDeviation))
  }
  cat("----- Start Fold -----\n")
}
where trainFolds is built from the folds created with the caret package, and its type is data.frame.
I have already read these links:
R Debugging
Subset
Negative Subscripts
but I couldn't find out what is wrong with the indexes.
Can anybody help me?
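Two things can be checked independently of the exact cause: the error text is what R reports when a single index vector mixes positive and negative values, and meanStdCalculator is defined after its first use inside the loop. The sketch below is not a diagnosis of the original data; it defines the helper first and checks the fold indices before negating them. `toy` and the hand-made `folds` list are hypothetical stand-ins for the dataset and for `createFolds()` output, and `is.double` replaces the original `!is.integer` test so that non-numeric columns are skipped.

```r
# Helper defined before the loop so it exists on the first iteration
meanStdCalculator <- function(v) c(mean(v), sd(v))

set.seed(1)
toy <- data.frame(a = rnorm(20), b = runif(20),
                  ClassLabel = rep(c("x", "y"), 10))

# Hand-made folds (assumption): a list of positive integer row indices
folds <- split(sample(nrow(toy)), rep(1:4, length.out = nrow(toy)))

for (foldIndex in seq_along(folds)) {
  idx <- folds[[foldIndex]]
  # -idx is only valid when every index is a positive row number
  stopifnot(is.numeric(idx), all(idx > 0))
  testFold   <- toy[idx, ]
  trainFolds <- toy[-idx, ]

  # Scale numeric columns with parameters estimated on the training part only
  for (k in which(sapply(trainFolds, is.double))) {
    params <- meanStdCalculator(trainFolds[[k]])
    trainFolds[[k]] <- (trainFolds[[k]] - params[1]) / params[2]
    testFold[[k]]   <- (testFold[[k]] - params[1]) / params[2]
  }
}
```

Printing `str(folds)` on the real data before the loop is a quick way to confirm that each element really is a vector of positive integers.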
