quantstrat logical error - missing value where TRUE/FALSE needed - r

I am getting this error when applying strategy in quantstrat:
Error in if (length(j) == 0 || (length(j) == 1 && j == 0)) { :
missing value where TRUE/FALSE needed
My code is as follows:
.blotter <- new.env()
.strategy <- new.env()
Sys.setenv(TZ="UTC")
STRATEGY<-'PFReplicate'
try(rm.strat(STRATEGY))
n<-130
#SMA signals and rules
LONG.ENTRY.SIGNAL.SMA<-"CLOSE_GT_SMA_SIG_LONG"
LONG.EXIT.SIGNAL.SMA<-"CLOSE_LT_SMA_SIG_LONG"
SHORT.ENTRY.SIGNAL.SMA<-"CLOSE_LT_SMA_SIG_SHORT"
SHORT.EXIT.SIGNAL.SMA<-"CLOSE_GT_SMA_SIG_SHORT"
LONG.ENTRY.RULE.SMA<-'L_ENTRY_SMA_RULE'
LONG.EXIT.RULE.SMA<-'L_EXIT_SMA_RULE'
SHORT.ENTRY.RULE.SMA<-'S_ENTRY_SMA_RULE'
SHORT.EXIT.RULE.SMA<-'S_EXIT_SMA_RULE'
LONG.ORDERSET.NAME<-'CLOSELONGSMA'
SHORT.ORDERSET.NAME<-'CLOSESHORTSMA'
strategy(STRATEGY,store=TRUE)
#Set up SMA indicator
add.indicator(strategy = STRATEGY,name='SMA',
arguments=list(x=quote(mktdata),n),
label='SMA')
#Set up signals
#SMA signals
add.signal(strategy = STRATEGY,name="sigCrossover",
arguments=list(columns=c('Close','SMA'),
relationship="gt"),
label=LONG.ENTRY.SIGNAL.SMA)
add.signal(strategy = STRATEGY,name="sigCrossover",
arguments=list(columns=c('Close','SMA'),
relationship="lt"),
label=LONG.EXIT.SIGNAL.SMA)
add.signal(strategy = STRATEGY,name="sigCrossover",
arguments=list(columns=c('Close','SMA'),
relationship="lt"),
label=SHORT.ENTRY.SIGNAL.SMA)
add.signal(strategy = STRATEGY,name="sigCrossover",
arguments=list(columns=c('Close','SMA'),
relationship="gt"),
label=SHORT.EXIT.SIGNAL.SMA)
#Add our SMA rules (enabled)
add.rule(strategy = STRATEGY,name="ruleSignal",
arguments=list(sigcol=LONG.ENTRY.SIGNAL.SMA,sigval=TRUE,
orderqty=100,ordertype="market",
TxnFees=0,orderside="long",
orderset=LONG.ORDERSET.NAME),
type="enter",label=LONG.ENTRY.RULE.SMA)
add.rule(strategy = STRATEGY,name="ruleSignal",
arguments=list(sigcol=LONG.EXIT.SIGNAL.SMA,sigval=TRUE,
orderqty='all',ordertype="market",
TxnFees=0,orderside="long",
orderset=LONG.ORDERSET.NAME),
type="exit",label=LONG.EXIT.RULE.SMA)
add.rule(strategy = STRATEGY,name="ruleSignal",
arguments=list(sigcol=SHORT.ENTRY.SIGNAL.SMA,sigval=TRUE,
orderqty=100,ordertype="market",
TxnFees=0,orderside="short",
orderset=SHORT.ORDERSET.NAME),
type="enter",label=SHORT.ENTRY.RULE.SMA)
add.rule(strategy = STRATEGY,name="ruleSignal",
arguments=list(sigcol=SHORT.EXIT.SIGNAL.SMA,sigval=TRUE,
orderqty='all',ordertype="market",
TxnFees=0,orderside="short",
orderset=SHORT.ORDERSET.NAME),
type="exit",label=SHORT.EXIT.RULE.SMA)
symbol <- mar.rep
port <- 'mar.rep'
currency("USD")
stock(primary_id = symbol,currency = "USD",multiplier = 1)
Sys.setenv(TZ="UTC")
initDate <- '1971-01-05'
startDate <- '1972-01-06'
endDate<- '2010-12-31'
initEq <- 1e6
initPortf(name = port,symbols = symbol,initDate=initDate)
initAcct(name = port,portfolios = port,initDate=initDate,initEq=initEq)
initOrders(portfolio = port,initDate=initDate)
applyStrategy(strategy =STRATEGY,portfolios = port,debug = TRUE)
I've tried to keep the code simple, so as to avoid dumb errors, but I still get this one. The applyStrategy runs and lists thousands of transactions, and after 30 minutes, I get this error. I am guessing the fix is simple, but I am not seeing it. Thanks for your help!

I figured out the problem after posting my question. For anyone else who runs across this error when running quantstrat: check your data for NAs. Also make sure all assets have explicitly defined columns that exactly match the columns referenced in add.signal.
This may sound obvious, but data management was the biggest obstacle to getting results in my case. My data came from various data providers, with varying column formats (csv files, primarily). After spending a few hours cleaning and setting up my data to run through the strategy, it is working (I'm at hour 7 so far of processing).
Quantstrat can be difficult to debug, as some error messages are not easy to interpret. This error message is telling you that one or more logical comparisons in an if statement evaluated to NA. If you see this error, check your data for NAs to see if they may be the problem:
nrow(na.omit(data)) == nrow(data)
If this is not true, you have NAs. You can remove them with
data_cleaned <- na.omit(data)
but it will depend on your data format.
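As a minimal sketch of the check described above, the following uses a small hypothetical data frame (the column names and values are made up for illustration, not taken from the original data). It reproduces the same kind of error from an NA inside an if condition, then counts and locates the NAs before dropping them.

```r
# Hypothetical market data with gaps (column names are illustrative only)
prices <- data.frame(
  Close = c(10, 11, NA, 12),
  SMA   = c(NA, 10.5, 10.8, 11)
)

# An NA inside an if() condition triggers the error; capture its message
msg <- tryCatch(
  if (prices$Close[3] > prices$SMA[3]) "buy" else "hold",
  error = function(e) conditionMessage(e)
)
print(msg)

# Count NAs per column and list the rows that contain at least one NA
na_per_column <- colSums(is.na(prices))
bad_rows <- which(!complete.cases(prices))
print(na_per_column)
print(bad_rows)

# Drop incomplete rows (whether this is appropriate depends on your data)
cleaned <- na.omit(prices)
print(nrow(cleaned))
```

Knowing which rows and columns hold the NAs makes it easier to decide whether to drop them or to fix them at the source.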
Sorry if this is a remedial error for everyone. I just wanted to post a detailed answer to this error, as it seems to come up a fair amount for people. If I had seen an explanation like this yesterday, I would have saved several hours of frustration!

Related

Consistent error message while running grouping analysis in 'plspm' package

I am looking for some help in resolving an error using the partial least squares path modeling package ('plspm').
I can get results running a basic PLS-PM analysis but run into issues when using the grouping function, receiving the error message:
Error in if (w_dif < specs$tol || iter == specs$maxiter) break : missing value where TRUE/FALSE needed
I have no missing values and all variables have the proper classification. Elsewhere I read that there is a problem with processing observations that have the exact same values across all variables; I have deleted those and still face this issue. I also seem to face the issue only when I run the groups using the "bootstrap" method.
farmwood = read.csv("farmwood_groups(distance).csv", header = TRUE) %>%
slice(-c(119:123))
Control = c(0,0,0,0,0,0)
Normative = c(0,0,0,0,0,0)
B_beliefs = c(0,0,0,0,0,0)
P_control = c(1,0,0,0,0,0)
S_norm = c(0,1,0,0,0,0)
Behavior = c(0,0,1,1,1,0)
farmwood_path = rbind(Control, Normative, B_beliefs, P_control, S_norm, Behavior)
colnames(farmwood_path) = rownames(farmwood_path)
farmwood_blocks = list(14:18,20:23,8:13,24:27,19,4:7)
farmwood_modes = rep("A", 6)
farmwood_pls = plspm(farmwood, farmwood_path, farmwood_blocks, modes = farmwood_modes)
names(farmwood)[names(farmwood) == "QB3"] <- "Distance"
farmwood$Distance <- as.factor(farmwood$Distance)
distance_boot = plspm.groups(farmwood_pls, farmwood$Distance, method = "bootstrap")
distance_perm = plspm.groups(farmwood_pls, farmwood$Distance, method = "permutation")
The data is contained here:
https://www.dropbox.com/s/8vewuupywpi1jkt/farmwood_groups%28distance%29.csv?dl=0
Any help would be appreciated. Thank you in advance

Transaction problem in RStudio for tweet apriori analysis

I want to use the apriori algorithm to find association rules between words in my tweet database, using RStudio. However, the code below gives an error on a million rows of data, while it works on a small amount of data. I need your help, as I can't work out what causes the error.
TweetTrans <- read.transactions("../input/tweets/output.csv",
rm.duplicates=FALSE,
format = "basket",
sep = ",",
encoding = "UTF-8")
The Error is:
Error in validObject(.Object): invalid class “ngCMatrix” object: row indices are not sorted within columns
Traceback:
1. read.transactions("../input/tweets/output.csv", rm.duplicates = FALSE,
. format = "basket", sep = ",", encoding = "UTF-8")
2. as(data, "transactions")
3. asMethod(object)
4. new("transactions", as(from, "itemMatrix"), itemsetInfo = data.frame(transactionID = names(from),
. stringsAsFactors = FALSE))
5. initialize(value, ...)
6. initialize(value, ...)
7. callNextMethod()
8. .nextMethod(.Object = .Object, ... = ...)
9. callNextMethod()
10. .nextMethod(.Object = .Object, ... = ...)
11. as(from, "itemMatrix")
12. asMethod(object)
13. new("ngCMatrix", p = c(0L, p), i = as.integer(i) - 1L, Dim = c(length(levels(i)),
. length(p)))
14. initialize(value, ...)
15. initialize(value, ...)
16. callNextMethod()
17. .nextMethod(.Object = .Object, ... = ...)
18. validObject(.Object)
19. stop(msg, ": ", errors, domain = NA)
Here are some ideas for how to find a rogue line in the data file. The input to read.transactions should be a text file that looks something like
A, B, C
B, C
C, D, E
D, A, B, F
where A, B, C, etc. are the names of the items (probably longer than one character each!)
So you could read in the file using readLines...
data <- readLines("../input/tweets/output.csv")
Each element of data (one per line of the file) should be a string of the form "A, B, C" etc, as above.
You could then use functions (e.g. from the stringr package) to check whether any lines contain unusual characters or have an odd format. Without seeing your file, it is hard to say how to do this, but you might, for example, look for quotes in odd places (str_detect(data, '\\"')) or characters that are not letters, digits, spaces or commas (str_detect(data, "[^\\w\\d\\s,]")).
Another thing you could try is to write a for loop to take each element of data (or perhaps larger chunks if that is too slow), save it as a file, try reading it with read.transactions, and see where it crashes.
for(i in seq_along(data)){
writeLines(data[i], "dummyfile.csv")
trans <- read.transactions("dummyfile.csv",
rm.duplicates=FALSE,
format = "basket",
sep = ",",
encoding = "UTF-8")
}
The value of i when it crashes will give you the problem row number. It might take a long time to run, though!
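The character checks suggested above can also be sketched in base R, without stringr. The lines below are hypothetical stand-ins for the contents of the file, and the character class is only one possible definition of "unusual"; adjust it to whatever your items are allowed to contain.

```r
# Hypothetical lines, standing in for readLines() on the real file
lines <- c("A, B, C", "B, C", "C, \"D, E", "D, A, B, F#")

# Lines containing a double quote
has_quote <- grepl('"', lines, fixed = TRUE)

# Lines containing anything other than letters, digits, whitespace or commas
has_odd <- grepl("[^[:alnum:][:space:],]", lines)

# Line numbers worth inspecting by hand
suspect <- which(has_quote | has_odd)
print(suspect)
print(lines[suspect])
```

This is much faster than writing each line to a file, so it may be worth running first to narrow down the candidates.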
I ran into a very similar problem: the same error got triggered when trying to cast a list to a transaction object.
I also couldn't easily figure out what lines in the data caused the issue, as it seems to be triggered by a combination of transactions and not necessarily by any individual one, but I managed to track down the source of the problem in this assignment (source):
p <- new("ngCMatrix", p = c(0L, p),
i = as.integer(i) - 1L,
Dim = c(length(levels(i)), length(p)))
My R got pretty rusty over time and I couldn't find an immediate way to patch the code, but I came up with an alternative solution for constructing the ngCMatrix object:
1. Assume you have the data in a data.frame following some sort of (user, item) format - in your case it would most likely be (tweet_id, term/word).
2. Create a unique incremental ID for every user and item and add it to your data.frame.
3. Use those IDs to create the sparse matrix and - optionally - enrich it with the labels for item and user to make it more interpretable.
4. Finally, cast the sparse matrix to a transaction object.
Example (I implemented mine with data.table, but a traditional dataframe implementation would be very similar):
library(Matrix)
library(data.table)
library(arules)
DT <- data.table(user = c('A','A','B','B','A','C','D'),
item = c('AAB','AAA','AAB','BBB','ABA','BBB','AAB'))
# Create user_ids
unique_users <- unique(DT$user)
users <- data.table(user=unique_users,
user_id=c(1:length(unique_users)))
# Repeat for items
unique_items <- unique(DT$item)
items <- data.table(item=unique_items,
item_id=c(1:length(unique_items)))
# Add indexes to original data table (setting keys helps with performance)
DT <- merge.data.table(x=DT, y=users, by='user')
DT <- merge.data.table(x=DT, y=items, by='item')
# Create the sparse matrix
mat <- sparseMatrix(
i = DT$item_id,
j = DT$user_id,
dims = c(nrow(items), nrow(users)),
dimnames = list(items$item, users$user)
)
# transform to arules 'transactions'
txn <- as(mat, "transactions")
Please note that this doesn't help to understand what caused the issue, but rather provides a workaround for it. In my data.table implementation the code is pretty performant, taking only a few seconds to process over 30M transactions on a laptop-sized machine (2 CPUs, 16 GB RAM).

Bollinger Bands indicator in R

I am currently trying to write a simple strategy using Bollinger Bands in R. The goal is to enter a long position when the closing price touches the lower band and exit when it touches the upper one. To do that I first wrote two simple functions to use as indicators:
BBandsDown <- function(HLC,n=20,maType,sd=2){
bbdown <- BBands(HLC,n,maType,sd)$dn
colnames(bbdown)<-"bbdown"
return(bbdown)
}
BBandsUp <- function(HLC,n=20,maType,sd=2){
bbup <- BBands(HLC,n,maType,sd)$up
colnames(bbup)<-"bbup"
return(bbup)
}
Then I added the indicators
add.indicator(strategy = strategy.st,
name = 'Cl',
arguments = list(x=quote(mktdata)),
label = 'close')
add.indicator(strategy = strategy.st,
name = 'BBandsDown',
arguments = list(HLC = quote(Cl(mktdata)), n=10,maType="SMA",sd=1.5),
label = 'bbandsdown1.5')
add.indicator(strategy = strategy.st,
name = 'BBandsUp',
arguments = list(HLC = quote(Cl(mktdata)), n=10,maType="SMA",sd=1.5),
label = 'bbandsup1.5')
Then I define signals and rules. My problem is that I cannot use the applyStrategy command, because it replies with:
Error in BBands(HLC, n, maType, sd) (from strategy_bbands.r!15334IYx#2) :
Price series must be either High-Low-Close, or Close/univariate.
I tried with both HLC = quote(Cl(mktdata)) and HLC = quote(HLC(mktdata)) but the error is the same. What am I doing wrong?
You haven't provided a reproducible example.
The error is pretty clear that you're passing in the wrong data.
I suspect that you have mangled the data prior to calling applyStrategy , but there isn't enough here to validate that.
See the bbands.R demo in the demos directory for a working example.
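One way to check what is actually being passed in, as a minimal sketch: the column names below are hypothetical, and the sketch assumes that the close-price selection works by matching "Close" in the column names. Under that assumption, an indicator column whose name also contains "close" would make the selection return two columns, which is neither univariate nor High-Low-Close.

```r
# Hypothetical column names of mktdata after an indicator labelled 'close'
# has been added (illustrative only, not taken from the original session)
cols <- c("SPY.Open", "SPY.High", "SPY.Low", "SPY.Close", "SPY.Close.close")

# Assumption: close-price selection is a case-insensitive match on "Close"
matches <- grep("Close", cols, ignore.case = TRUE)
print(cols[matches])

# The error message asks for Close/univariate (1 column) or HLC (3 columns)
n_selected <- length(matches)
ok_for_bbands <- n_selected %in% c(1L, 3L)
print(ok_for_bbands)
```

In a real session, printing colnames(mktdata) and NCOL() of the object passed as HLC would show whether something similar is happening.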

R - Trycatch is saving warning instead of returning function output

I am trying to download records from Twitter using rtweet. One issue with this is that the Twitter server requires a 15-minute wait after every 18000 records. So, after record number 18000, I receive a data frame with all the records and a nice warning telling me to wait for a bit. search_tweets has a function argument for downloading more than 18000 records, called retryonratelimit. However, this isn't working, so I am exploring other options.
I have produced a function incorporating tryCatch to address this. However, when the warning at 18000 records pops up, tryCatch saves the warning rather than the data frame that should be returned before the warning, something it would not do if 17999 records were downloaded.
library(rtweet)
library(RDCOMClient)
library(profvis)
TwitScrape = function(SearchTerm){
ReturnDF = tryCatch({
TempList=NULL
Temp = search_tweets(SearchTerm,n=18000)
TempList = list(as.data.frame(Temp), SearchTerm)
return(TempList)
},
warning = function(TempList){
Comb=NULL
MAXID = min(TempList[[1]]$status_id)
message("Delay for 15 minutes to accommodate server download limits")
pause(901)
TempWarn = search_tweets(TempList[[2]],n=18000, max_id=MAXID)
TempWarn = as.data.frame(TempWarn)
Comb = rbind(TempList[[1]], TempWarn)
CombList = list(Comb, TempList[[2]])
return(CombList)
}
)
}
Searches = c("#MUFC","#LFC", "#MCFC")
TestExpandList=NULL
TestExpand=NULL
TestExpand2=NULL
for (i in seq_along(Searches)){
TestExpandList = TwitScrape(SearchTerm = Searches[i])
TestExpand = TestExpandList[[1]]
TestExpand$Cat = Searches[i]
TestExpand$DownloadDate = Sys.Date()
TestExpand2 = rbind(TestExpand2, TestExpand)
}
I hope this makes sense. If I can offer any more information please let me know. In summary, why is tryCatch saving my warning rather than the data frame I want?
I am not 100% sure what you would like to achieve, but it seems you are using tryCatch based on a misunderstanding.
The argument of the warning handler warning = function(TempList) is the warning itself. You have named it TempList, but that does not make it your TempList variable; the handler still just receives the warning.
Your function TwitScrape returns ReturnDF only by convention, since you are not explicitly returning anything; I guess that is still what you want, and it is OK.
I would try to restructure your solution without tryCatch.
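To illustrate the point about the handler argument, here is a minimal base R sketch. The function fetch is a hypothetical stand-in for a call that warns and still returns data; it is not rtweet's API. With tryCatch, the handler receives the condition object and the function's value is lost. One possible alternative, withCallingHandlers, records the warning and lets the call finish.

```r
# Hypothetical stand-in for a call that warns and still returns a data frame
fetch <- function() {
  warning("Rate limit exceeded - 88")
  data.frame(id = 1:3)
}

# tryCatch: the handler argument is the warning condition, and the
# data frame that fetch() would have returned is never produced
caught <- tryCatch(fetch(), warning = function(w) w)
print(inherits(caught, "warning"))
print(conditionMessage(caught))

# withCallingHandlers: note the warning, then resume fetch() so that
# its return value is kept
msgs <- character(0)
result <- withCallingHandlers(
  fetch(),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(nrow(result))
print(msgs)
```

After the call, result holds the data frame and msgs holds the warning text, so the decision to pause and retry can be made from both.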
Thanks for your comments. RolandASc, you were right. I went back to the drawing board. See the working TwitScrape function below:
TwitScrape = function(SearchTerm){
DF=NULL
DF = search_tweets(SearchTerm,n=18001)
Warn = warnings()
if (names(Warn[1]) == "Rate limit exceeded - 88"){
message("paused")
pause(910)
DF2 = search_tweets(SearchTerm,n=18000, max_id = min(DF$status_id))
DF3 = rbind(DF, DF2)
return(DF3)
}
else {
return(DF)
}}

Handling internet connection R

I'm trying to download several stocks from Google, but every time the connection stops, R stops the loop. How can I handle this problem?
stocks <- c(
'MSFT',
'GOOG',
...
)
for (symbol in stocks)
{
stock_price <- getSymbols(symbol,src='google', from=startDate,to=endDate,auto.assign = FALSE)
prices[,j] <- stock_price[,1]
j <- j + 1
}
From the R manual "quantmod.pdf":
If auto.assign=FALSE or env=NULL (as of 0.4-0) the data will be returned from the call, and will require the user to assign the results himself. Note that only one symbol at a time may be requested when auto assignment is disabled.
You are trying to request more than one ticker symbol at a time with the auto.assign parameter set to false and this is not allowed. However, you should be able to obtain all your symbols at once by adapting the following code:
data <- new.env()
getSymbols.extra(stocks, src = 'google', from = startDate, to = endDate, env = data, auto.assign = T)
plot(data$MSFT)
Pay careful attention to the R manual for getSymbols:
"Data is fetched through one of the available getSymbols methods and saved in the env specified - the .GlobalEnv by default."

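For the original problem of the loop stopping when the connection drops, one possible approach is to wrap each download in tryCatch so that a failure skips that symbol instead of ending the loop. This is only a sketch: fetch_prices is a hypothetical stand-in for the real download call, and it fails on purpose for one symbol to simulate a dropped connection.

```r
# Hypothetical stand-in for the real download; fails for "BAD" to
# simulate a dropped connection
fetch_prices <- function(symbol) {
  if (symbol == "BAD") stop("connection failed")
  c(1, 2, 3)
}

stocks <- c("MSFT", "BAD", "GOOG")
prices <- list()
failed <- character(0)

for (symbol in stocks) {
  # On error, report the problem and return NULL instead of stopping
  result <- tryCatch(
    fetch_prices(symbol),
    error = function(e) {
      message("Skipping ", symbol, ": ", conditionMessage(e))
      NULL
    }
  )
  if (is.null(result)) {
    failed <- c(failed, symbol)
    next
  }
  prices[[symbol]] <- result
}

print(names(prices))
print(failed)
```

The symbols collected in failed can be retried in a second pass once the connection is back.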