Translate from lpSolve to lpSolveAPI Package in R

The goal: Use my current lpSolve code to create new code that uses the lpSolveAPI package.
The background: I have been using lpSolve to find an optimal solution for creating fantasy sports contest lineups: one that maximizes the projected points (DK) of the players on the team subject to the maximum allowed total salary (SALARY), with a handful of other constraints to fit the rules of the contest. I have discovered, however, that in a few instances lpSolve fails to find the optimal solution. For some unknown reason it seemingly overlooks the best points-per-dollar solution and finds only the nth-best solution instead. Unfortunately, I do not have an example of this, as I recently had issues with my archive drive and lost quite a bit of data.
My research/ask: I have read other threads here describing similar issues with lpSolve (like this one here). In those instances, lpSolveAPI was able to find the optimal solution when lpSolve could not. Not being familiar with lpSolveAPI, I am looking for assistance from someone familiar with both packages in converting my current code to use the lpSolveAPI package instead and eliminate these lpSolve oversights going forward. I have tried, but for some reason I keep getting lost in the translation.
My lpSolve code:
# count the number of unique teams and players
unique_teams = unique(slate_players$TEAM)
unique_players = unique(slate_players$PLAYERID)
# define the objective for the solver
obj = slate_players$DK
# create a constraint matrix for the solver
con = rbind(t(model.matrix(~ POS + 0, slate_players)),      #Positions
            t(model.matrix(~ PLAYERID + 0, slate_players)), #DupPlayers
            t(model.matrix(~ TEAM + 0, slate_players)),     #SameTeam
            rep(1, nrow(slate_players)),                    #TotPlayers
            slate_players$SALARY)                           #MaxSalary
# set the direction for each of the constraints
dir = c("==",                              #1B
        "==",                              #2B
        "==",                              #3B
        "==",                              #C
        "==",                              #OF
        "==",                              #SP
        "==",                              #SS
        rep('<=', length(unique_players)), #DupPlayers
        rep('<=', length(unique_teams)),   #SameTeam
        "==",                              #TotPlayers
        "<=")                              #MaxSalary
# set the limits for the right-hand side of the constraints
rhs = c(1,                              #1B
        1,                              #2B
        1,                              #3B
        1,                              #C
        3,                              #OF
        2,                              #SP
        1,                              #SS
        rep(1, length(unique_players)), #DupPlayers
        rep(5, length(unique_teams)),   #SameTeam
        10,                             #TotPlayers
        50000)                          #MaxSalary
# find the optimal solution using the solver
result = lp("max", obj, con, dir, rhs, all.bin = TRUE)
# create a data frame for the players in the optimal solution
solindex = which(result$solution==1)
optsolution = slate_players[solindex,]
Thank you for your help!

This should be straightforward:
library(lpSolveAPI)
ncons <- nrow(con)
nvars <- length(obj)
lprec <- make.lp(nrow=ncons, ncol=nvars)
set.objfn(lprec, obj)
set.type(lprec, 1:nvars, "binary") # all.bin=TRUE
for (i in 1:ncons) {
  set.row(lprec, row=i, xt=con[i,])
  set.constr.type(lprec, dir[i], constraints=i)
  set.rhs(lprec, b=rhs[i], constraints=i)
}
lp.control(lprec, sense="max") # maximize, as in lp("max", ...); make.lp defaults to minimization
status <- solve(lprec)
if (status != 0) stop("no solution found, error code=", status)
sol <- get.variables(lprec)
This code is untested, since your question is missing the data and an expected solution.
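As a rough follow-up (a sketch only, assuming the solve succeeds and the binary variables map one-to-one onto the rows of slate_players, as in your lpSolve code), you can rebuild the optimal lineup from sol the same way you did with result$solution:
# map the binary solution back onto player rows, mirroring the lpSolve post-processing
solindex <- which(sol == 1)
optsolution <- slate_players[solindex, ]
optsolution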

Related

Adding priority to rows in lpSolveAPI in R

I have an optimized time-allocation LP coded in R using lpSolveAPI. The code works fine with the given constraints, which are:
The time allotted per job must be met
The time a worker works cannot exceed the paid work time
But I need to add a "priority" while time is allocated to each job, meaning there are jobs that need to be completed first / must be given first priority when allocating workers.
The code I have is below.
The time must be allotted to job4, job2, job3 and finally job1 (as per priority_matrix).
If no workers are free, then lower-priority jobs can remain unallocated.
With the below code (without priority), no time is allocated to job 4.
library(lpSolveAPI)
jobs <- c(1,1,1,1)
workermax <- c(8,6,9)
jobmax <- c(6,6,5,6)
priority_matrix <- matrix(c(4,2,3,1), ncol = 1) ##priority of each job
rownames(priority_matrix) <- c("Job1","Job2","Job3","Job4")
scheduler_input2 <- matrix(c(8, 0, 0, 0,6,0,8,0,9,0,6,0), nrow=4 , byrow =T)
obj.vals <- scheduler_input2
colnames(obj.vals) <- c("worker1","worker2","worker3")
rownames(obj.vals) <- c("Job1","Job2","Job3","Job4")
nworkers <- ncol(scheduler_input2)
njobs <- nrow(scheduler_input2) #4
ncol = nworkers*njobs
distribution<- make.lp(ncol=ncol)
set.type(distribution, columns=1:ncol, type = c("integer"))
set.objfn(distribution, obj.vals)
lp.control(distribution,sense='max')
# constraint 1: total time allotted to each job cannot exceed its maximum
time_per_job <- function (job_index) {
  skill_cols <- (0:(nworkers-1))*njobs + job_index
  add.constraint(distribution, rep(1,nworkers), indices=skill_cols, type="<=", rhs=jobmax[job_index])
}
lapply(1:njobs, time_per_job)
# constraint 2: a worker's allocated time cannot exceed their paid work time
max_hrs_by_worker <- function (DA_index) {
  DA_cols <- (DA_index-1)*njobs + (1:njobs) # relevant columns for a given worker
  add.constraint(distribution, xt=jobs, indices=DA_cols, type="<=", rhs=workermax[DA_index])
}
lapply(1:nworkers, max_hrs_by_worker)
solve(distribution)
get.objective(distribution)
distribution<-matrix(get.variables(distribution), njobs,nworkers)
Thanks in advance.
The description is not precise enough to give a definite answer. Also, write the problem down in mathematical notation before starting to code. The code is not very easy to read and not at all structure-revealing. This is partly a problem with lpSolveAPI, which has a somewhat medieval way of representing optimization models.
If you want to enforce that job2 can only be executed if job4 is executed then introduce binary variables:
y[j] = 1 if job j is executed, 0 otherwise
and the precedence constraint:
y[job4] >= y[job2]
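A minimal sketch of how that precedence constraint could be added with lpSolveAPI, assuming you have already appended one indicator column per job after the 12 hour columns (e.g. with add.column), so that y[job1]..y[job4] sit in hypothetical columns 13..16:
# hypothetical column positions of the indicator variables
y2_col <- 14 # y[job2]
y4_col <- 16 # y[job4]
# make the indicator columns binary: y[j] = 1 if job j is executed
set.type(distribution, columns = c(y2_col, y4_col), type = "binary")
# precedence: y[job4] >= y[job2], i.e. y[job4] - y[job2] >= 0
add.constraint(distribution, xt = c(1, -1), indices = c(y4_col, y2_col), type = ">=", rhs = 0)
You would also need constraints linking each y[j] to the hours actually allocated to job j (for example, hours allocated to job j <= jobmax[j] * y[j]); this is where writing the model down mathematically first pays off.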

Using cpquery function for several pairs from dataset

I am a relative beginner in R and trying to figure out how to use the cpquery function from the bnlearn package for all edges of a DAG.
First of all, I created a bn object, fitted the network, and built a table with all the arc strengths.
library(bnlearn)
data(learning.test)
baynet = hc(learning.test)
fit = bn.fit(baynet, learning.test)
sttbl = arc.strength(x = baynet, data = learning.test)
Then I tried to create a new variable in sttbl dataset, which was the result of cpquery function.
sttbl = sttbl %>% mutate(prob = NA) %>% arrange(strength)
sttbl[1,4] = cpquery(fit, `A` == 1, `D` == 1)
It looks pretty good (especially on bigger data), but when I am trying to automate this process somehow, I am struggling with errors, such as:
Error in sampling(fitted = fitted, event = event, evidence = evidence, :
logical vector for evidence is of length 1 instead of 10000.
In a perfect situation, I need to create a function that fills the generated prob variable of the sttbl dataset regardless of its size. I tried to do it with a for loop too, but stumbled over the error above again and again. Unfortunately, I deleted the failed attempts, but they were something like this:
for (i in 1:nrow(sttbl)) {
  j = sttbl[i,1]
  k = sttbl[i,2]
  sttbl[i,4] = cpquery(fit, fit$j %in% sttbl[i,1]==1, fit$k %in% sttbl[i,2]==1)
}
or this:
for (i in 1:nrow(sttbl)) {
  sttbl[i,4] = cpquery(fit, sttbl[i,1] == 1, sttbl[i,2] == 1)
}
Now I think I have misunderstood something in R or the bnlearn package.
Could you please tell me how to accomplish this task of filling the column with multiple cpquery calls? That would help me a lot with my research!
cpquery is quite difficult to work with programmatically. If you look at the examples in the help page you can see the author uses eval(parse(...)) to build the queries. I have added two approaches below, one using the methods from the help page and one using cpdist to draw samples and reweighting to get the probabilities.
Your example
library(bnlearn); library(dplyr)
data(learning.test)
baynet = hc(learning.test)
fit = bn.fit(baynet, learning.test)
sttbl = arc.strength(x = baynet, data = learning.test)
sttbl = sttbl %>% mutate(prob = NA) %>% arrange(strength)
This uses cpquery and the much maligned eval(parse(...)) -- this is the
approach the bnlearn author takes to do this programmatically in the ?cpquery examples. Anyway,
# You want the evidence and event to be the same; in your question it is `1`
# but for example using learning.test data we use 'a'
state = "\'a\'" # note if the states are character then these need to be quoted
event = paste(sttbl$from, "==", state)
evidence = paste(sttbl$to, "==", state)
# loop through using code similar to that found in `cpquery`
set.seed(1) # to make sampling reproducible
for(i in 1:nrow(sttbl)) {
qtxt = paste("cpquery(fit, ", event[i], ", ", evidence[i], ",n=1e6", ")")
sttbl$prob[i] = eval(parse(text=qtxt))
}
I find it preferable to work with cpdist, which is used to generate random samples conditional on some evidence. You can then use these samples to build up queries. If you use likelihood weighting (method="lw") it is slightly easier to do this programmatically (and without evil(parse(...))).
The evidence is added in a named list i.e. list(A='a').
# The following just gives a quick way to assign the same
# evidence state to all the evidence nodes.
evidence = setNames(replicate(nrow(sttbl), "a", simplify = FALSE), sttbl$to)
# Now loop though the queries
# As we are using likelihood weighting we need to reweight to get the probabilities
# (cpquery does this under the hood)
# Also note with this method that you could simulate from more than
# one variable (event) at a time if the evidence was the same.
for(i in 1:nrow(sttbl)) {
  temp = cpdist(fit, sttbl$from[i], evidence[i], method="lw")
  w = attr(temp, "weights")
  sttbl$prob2[i] = sum(w[temp == 'a']) / sum(w)
}
sttbl
# from to strength prob prob2
# 1 A D -1938.9499 0.6186238 0.6233387
# 2 A B -1153.8796 0.6050552 0.6133448
# 3 C D -823.7605 0.7027782 0.7067417
# 4 B E -720.8266 0.7332107 0.7328657
# 5 F E -549.2300 0.5850828 0.5895373

Using ifelse to create a running tally in R

I am trying to do some quantitative modeling in R. I'm not getting an error message, but the results are not what I actually need.
I am a newbie, but here is my complete code sample.
library(quantmod)
#Building the data frame and xts to show dividends, splits and technical indicators
getSymbols(c("AMZN"))
Playground <- data.frame(AMZN)
Playground$date <- as.Date(row.names(Playground))
Playground$wday <- as.POSIXlt(Playground$date)$wday #day of the week
Playground$yday <- as.POSIXlt(Playground$date)$mday #day of the month
Playground$mon <- as.POSIXlt(Playground$date)$mon #month of the year
Playground$RSI <- RSI(Playground$AMZN.Adjusted, n = 5, maType="EMA") #can add Moving Average Type with maType =
Playground$MACD <- MACD(AMZN, nFast = 12, nSlow = 26, nSig = 9)
Playground$Div <- getDividends('AMZN', from = "2007-01-01", to = Sys.Date(), src = "google", auto.assign = FALSE)
Playground$Split <- getSplits('AMZN', from = "2007-01-01", to = Sys.Date(), src = "google", auto.assign = FALSE)
Playground$BuySignal <- ifelse(Playground$RSI < 30 & Playground$MACD < 0, "Buy", "Hold")
All is well up until this point when I start using some logical conditions to come up with decision points.
Playground$boughts <- ifelse(Playground$BuySignal == "Buy", lag(Playground$boughts) + 1000, lag(Playground$boughts))
It will execute but the result will be nothing but NA. I suppose this is because you are trying to add NA to a number, but I'm not 100% sure. How do you tell the computer I want you to keep a running tally of how much you have bought?
Thanks so much for the help.
So we want to buy 1000 shares every time a buy signal is generated?
Your problem stems from the MACD indicator. It actually generates two columns, macd and signal. You have to decide which one you want to keep.
Playground$MACD <- MACD(AMZN, nFast = 12, nSlow = 26, nSig = 9)$signal
This should solve the problem at hand.
Also, please check the reference for ifelse. The class of return value can be tricky at times, and so the approach suggested by Floo0 is preferable.
Also, I'd advocate using 1 and 0 instead of buy and sell to show whether you are holding. It makes the math much easier.
And I'd strongly suggest reading some beginner tutorial on backtesting with PerformanceAnalytics. They make the going much much easier.
BTW, you missed this line in the code:
Playground$boughts<- 0
Hope it helps.
EDIT: And I forgot to mention the obvious: discard the first few rows where MACD will be NA.
Something like:
Playground<- Playground[-c(1:26),]
Whenever you want to do an ifelse like "if ..., do something, else stay the same": do not use ifelse.
Try this instead:
ind <- which(Playground$BuySignal == "Buy")
Playground$boughts[ind] <- lag(Playground$boughts)[ind] + 1000
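If the goal is a genuine running total of shares bought, a cumulative sum avoids the lag bookkeeping entirely. A minimal sketch, assuming the BuySignal column built above (the 1000-share lot size is just the figure from the question):
# 1 on rows where a buy was signalled, 0 elsewhere
buys <- ifelse(Playground$BuySignal == "Buy", 1, 0)
buys[is.na(buys)] <- 0 # treat rows with NA indicators as "no buy"
# running tally of shares bought: 1000 shares per buy signal
Playground$boughts <- cumsum(buys * 1000)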

Genetic Algorithm Optimization

I asked a question a few weeks back regarding how one would do optimization in R (Optimizing for Vector Using Optimize R). Now that I have a proper grip on basic optimization in R, I would like to start employing GAs to solve for solutions.
Given the objective function:
div.ratio <- function(weight, vol, cov.mat){
  weight <- weight / sum(weight)
  dr <- (t(weight) %*% vol) / (sqrt(t(weight) %*% cov.mat %*% (weight)))
  return(-dr)
}
I am using the genalg package for optimizing, specifically the rbga.bin function. But the thing is, one cannot seem to pass in more than one parameter, i.e. I can't pass in vol and cov.mat. Am I missing something or understanding this incorrectly?
Edit:
In the genalg package, there is a function called rbga.bin which is the one I am using.
Here is the simple code from previous question that can get you started:
rm(list=ls())
require(RCurl)
sit = getURLContent('https://github.com/systematicinvestor/SIT/raw/master/sit.gz', binary=TRUE, followlocation = TRUE, ssl.verifypeer = FALSE)
con = gzcon(rawConnection(sit, 'rb'))
source(con)
close(con)
load.packages('quantmod')
data <- new.env()
tickers<-spl("VTI,VGK,VWO,GLD,VNQ,TIP,TLT,AGG,LQD")
getSymbols(tickers, src = 'yahoo', from = '1980-01-01', env = data, auto.assign = T)
for(i in ls(data)) data[[i]] = adjustOHLC(data[[i]], use.Adjusted=T)
bt.prep(data, align='remove.na', dates='1990::2013')
prices<-data$prices[,-10]
ret<-na.omit(prices/mlag(prices) - 1)
vol<-apply(ret,2,sd)
cov.mat<-cov(ret)
out <- optim(par = rep(1 / length(vol), length(vol)), # initial guess
             fn = div.ratio,
             vol = vol,
             cov.mat = cov.mat,
             method = "L-BFGS-B",
             lower = 0,
             upper = 1)
opt.weights <- out$par / sum(out$par) #optimal weights
While the above optim function works just fine, I was thinking it might be possible to reproduce this using a GA algorithm, so that in the future, if I am searching for multiple objectives, I will be able to do this faster using a GA. (I am not sure if it is faster, but this is the step to take to find out.)
GAmodel <- rbga.bin(size = 7, #genes
                    popSize = 200, #initial number of chromosomes
                    iters = 100, #number of iterations
                    mutationChance = 0.01, #chance of mutation
                    evalFunc = div.ratio) #objective function
Doing the above produces an error, as div.ratio needs extra parameters, so I am looking for some help in structuring my problem so that it will produce the optimal answer. I hope the above edit clarifies things.
Thanks
This is what you need:
GAmodel <- rbga(stringMin=rep(0, length(vol)), stringMax=rep(1, length(vol)),
                popSize = 200,
                iters = 100,
                mutationChance = 0.01,
                evalFunc = function(weight) div.ratio(weight, vol=vol, cov.mat=cov.mat))
(see first and last lines above).
The problems were:
The vectors weight and vol must match in length.
The function evalFunc is called with a single parameter, causing the others to be missing. As I understand it, you want to optimize over the weight vector only, keeping vol and cov.mat fixed.
If you want weight to be treated as a continuous variable, use rbga instead of rbga.bin.
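As a rough usage sketch (the population and evaluations fields below are how genalg's rbga result exposes the final generation; double-check them against your installed version), the best chromosome can be pulled out and renormalized into portfolio weights:
# rbga minimizes evalFunc, and div.ratio already returns the negated ratio,
# so the best chromosome is the one with the smallest evaluation value
best_idx <- which.min(GAmodel$evaluations)
best_weight <- GAmodel$population[best_idx, ]
# renormalize so the weights sum to 1, as div.ratio does internally
ga.weights <- best_weight / sum(best_weight)
ga.weights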

R: use of ROI package

I am trying to use the R package ROI for a simple portfolio optimization problem.
I can get the results using the quadprog solver "manually", but I'd really like to understand how the ROI package works.
Unfortunately I run into an error, even though I am sticking to the example provided by Stefan Theussl at http://statmath.wu.ac.at/courses/optimization/Presentations/ROI-2011.pdf (slides 26 and 27).
Here is the code:
library(fPortfolio)
library(ROI)
data(LPP2005.RET)
lppData <- 100 * LPP2005.RET[, 1:6]
r <- mean(lppData)
foo <- Q_objective(Q = cov(lppData), L = rep(0, ncol(lppData)))
full_invest <- L_constraint(rep(1, ncol(lppData)), "==", 1)
target_return <- L_constraint(apply(lppData, 2, mean), "==",r)
op <- OP(objective = foo, constraints = rbind(full_invest, target_return))
sol <- ROI_solve(op, solver = "quadprog")
The error message I get is:
Error in (dir == "<=") | (dir == "<") :
  operations are possible only for numeric, logical or complex types
Thanks for your help!
It turns out that there was a bug in the ROI quadprog plugin which has been fixed by the developer.
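For completeness, once the fixed plugin is installed the solution can be read off the solver result. A minimal sketch (assuming the updated ROI.plugin.quadprog is installed and using the model op from above):
# re-solve with the fixed quadprog plugin and extract the optimal weights
library(ROI.plugin.quadprog)
sol <- ROI_solve(op, solver = "quadprog")
w <- solution(sol) # optimal portfolio weights, in the column order of lppData
names(w) <- colnames(lppData)
w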
