I am trying to pick the best possible fantasy football team given different constraints. My goal is to pick the players that maximize the sum of their projected points.
The constraints are:
1) The team must include:
- 1 QB
- 2 RBs
- 2 WRs
- 1 TE
2) A player's risk must not exceed 6
3) The sum of the players' costs must not exceed 300.
How can I do this? What is the best package/function in R to optimize these constraints? What would the function call look like to maximize the projected points given these constraints? FYI, I'll be searching through 100-300 players.
Thanks in advance! Here is a small example data set:
name <- c("Aaron Rodgers","Tom Brady","Arian Foster","Ray Rice","LeSean McCoy","Calvin Johnson","Larry Fitzgerald","Wes Welker","Rob Gronkowski","Jimmy Graham")
pos <- c("QB","QB","RB","RB","RB","WR","WR","WR","TE","TE")
pts <- c(167, 136, 195, 174, 144, 135, 89, 81, 114, 111)
risk <- c(2.9, 3.4, 0.7, 1.1, 3.5, 5.0, 6.7, 4.7, 3.7, 8.8)
cost <- c(60, 47, 63, 62, 40, 60, 50, 35, 40, 40)
mydata <- data.frame(name, pos, pts, risk, cost)
Your constraints and objective are linear, but your variables are binary: each one indicates whether a player is picked or not. So your problem is a little more general than a Linear Program (LP): it is a Mixed-Integer Program (MIP). On CRAN's Optimization Task View, look for the MIP section.
CPLEX is a commercial solver you probably do not have access to, but GLPK is free. If I were you, I would probably go with its high-level interface, Rglpk.
It requires you to put your problem in matrix form; I suggest you look at the documentation and examples.
Edit: Here is an implementation:
# We are going to solve:
# maximize f'x subject to A*x <dir> b
# where:
# x is the variable to solve for: a vector of 0 or 1:
# 1 when the player is selected, 0 otherwise,
# f is your objective vector,
# A is a matrix, b a vector, and <dir> a vector of "<=", "==", or ">=",
# defining your linear constraints.
# number of variables
num.players <- length(name)
# objective:
f <- pts
# the variables are binary
var.types <- rep("B", num.players)
# the constraints
A <- rbind(as.numeric(pos == "QB"), # num QB
as.numeric(pos == "RB"), # num RB
as.numeric(pos == "WR"), # num WR
as.numeric(pos == "TE"), # num TE
diag(risk), # player's risk
cost) # total cost
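dir <- c("==", "==", "==", "==",   # exact counts for QB, RB, WR, TE
         rep("<=", num.players),  # each selected player's risk
         "<=")                    # total cost
b <- c(1, 2, 2, 1,                # 1 QB, 2 RBs, 2 WRs, 1 TE
       rep(6, num.players),       # risk must not exceed 6
       300)                       # cost must not exceed 300
library(Rglpk)
sol <- Rglpk_solve_LP(obj = f, mat = A, dir = dir, rhs = b,
                      types = var.types, max = TRUE)
sol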
# $optimum
# [1] 836 ### <- the optimal total points
# $solution
# [1] 1 0 1 0 1 1 0 1 1 0 ### <- a `1` for the selected players
# $status
# [1] 0 ### <- an optimal solution has been found
# your dream team
name[sol$solution == 1]
# [1] "Aaron Rodgers" "Arian Foster" "LeSean McCoy"
# [4] "Calvin Johnson" "Wes Welker" "Rob Gronkowski"
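For a data set this small, the solver's answer can be cross-checked by brute force in base R. The sketch below is not part of the Rglpk approach: it simply enumerates all 2^10 possible selections from the example data and keeps the best feasible one. The names X, count, feasible and best are illustrative only. This would not scale to 100-300 players, which is why a MIP solver is the better tool there.

```r
# Example data from the question
name <- c("Aaron Rodgers", "Tom Brady", "Arian Foster", "Ray Rice", "LeSean McCoy",
          "Calvin Johnson", "Larry Fitzgerald", "Wes Welker", "Rob Gronkowski",
          "Jimmy Graham")
pos  <- c("QB", "QB", "RB", "RB", "RB", "WR", "WR", "WR", "TE", "TE")
pts  <- c(167, 136, 195, 174, 144, 135, 89, 81, 114, 111)
risk <- c(2.9, 3.4, 0.7, 1.1, 3.5, 5.0, 6.7, 4.7, 3.7, 8.8)
cost <- c(60, 47, 63, 62, 40, 60, 50, 35, 40, 40)
n <- length(name)

# One row per candidate team: every 0/1 vector of length n (2^10 = 1024 rows)
X <- as.matrix(expand.grid(rep(list(0:1), n)))

# Number of selected players at position p, for every candidate team
count <- function(p) as.vector(X %*% as.numeric(pos == p))

feasible <- count("QB") == 1 & count("RB") == 2 &
            count("WR") == 2 & count("TE") == 1 &
            as.vector(X %*% as.numeric(risk > 6)) == 0 &  # no player above risk 6
            as.vector(X %*% cost) <= 300                  # total cost cap

total <- as.vector(X %*% pts)
best  <- which(feasible)[which.max(total[feasible])]

best_points <- total[best]
best_team   <- name[X[best, ] == 1]
best_points
best_team
```

If the enumeration and the solver agree, that is a reasonable sign the constraint matrix was set up as intended.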
I have a function with five variables that I want to maximize using only a specific set of parameters for each variable.
Are there any methods in R that can do this other than brute force (e.g. Particle Swarm Optimization, Genetic Algorithm, Greedy, etc.)? I have looked at a few packages, but they seem to generate their own parameter values from within a given range. I am only interested in optimizing over the set of options provided.
Here is a simplified version of the problem:
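#Example of 5 variable function to optimize
Fn <- function(x){
  SUM <- sum(unlist(x))   # a + b + c + d + e
  return(SUM)
}
#Parameters for variables to optimize
Vars <- list(
  As = c(seq(1.5, 3, by = 0.3)),        #float
  Bs = c(1, 2),                         #Binary
  Cs = c(seq(1, 60, by = 10)),          #Integer
  Ds = c(seq(60, -60, length.out = 5)), #Negative
  Es = c(1, 2, 3)
)
#Full combination
FullCombn <- expand.grid(Vars)
Results <- data.frame(I = as.numeric(), Sum = as.numeric())
for (i in 1:nrow(FullCombn)){
  Results <- rbind(Results, data.frame(I = i, Sum = Fn(FullCombn[i, ])))
}
#Best iteration (Largest result)
Best <- Results[Results[, 2] == max(Results[, 2]), ]
#Best parameters
FullCombn[Best$I, ]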
Two more possibilities. Both minimize by default, so I flip the sign in your objective function (i.e. return -SUM).
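#Example of 5 variable function to optimize
Fn <- function(x, ...){
    SUM <- x[1] + x[2] + x[3] + x[4] + x[5]
    return(-SUM)   # sign flipped: both methods below minimize
}
#Parameters for variables to optimize
Vars <- list(
    As = c(seq(1.5, 3, by = 0.3)),        #float
    Bs = c(1, 2),                         #Binary
    Cs = c(seq(1, 60, by = 10)),          #Integer
    Ds = c(seq(60, -60, length.out = 5)), #Negative
    Es = c(1, 2, 3)
)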
First, a grid search. It is exactly what you did, just more convenient, and the implementation allows you to distribute the evaluations of the objective function.
library("NMOF")
gridSearch(fun = Fn,
           levels = Vars)[c("minfun", "minlevels")]
## 5 variables with 6, 2, 6, 5, ... levels: 1080 function evaluations required.
## $minfun
## [1] -119
## $minlevels
## [1] 3 2 51 60 3
An alternative: a simple Local Search. You start with a valid initial guess, and then move randomly through possible feasible solutions. The key ingredient is the neighbourhood function. It picks one element randomly and then, again randomly, sets this element to one allowed value.
nb <- function(x, levels, ...) {
    i <- sample(length(levels), 1)   # pick one element at random
    x[i] <- sample(levels[[i]], 1)   # set it to a random allowed value
    x
}
(There would be better algorithms for neighbourhood functions; but this one is simple and so demonstrates the idea well.)
LSopt(Fn, list(x0 = c(1.8, 2, 11, 30, 2), ## a feasible initial solution
               neighbour = nb,
               nI = 200                   ## iterations
               ),
      levels = Vars)$xbest
## Local Search.
## ##...
## Best solution overall: -119
## [1] 3 2 51 60 3
(Disclosure: I am the maintainer of package NMOF, which provides functions gridSearch and LSopt.)
In response to the comment, a few remarks on Local Search and the neighbourhood function above (nb). Local Search, as implemented in LSopt, will start with an arbitrary solution, and then change that solution slightly. This new solution, called a neighbour, will be compared (by its objective-function value) to the old solution. If the new solution is better, it becomes the current solution; otherwise it is rejected and the old solution remains the current one. Then the algorithm repeats, for a number of iterations.
So, in short, Local Search is not random sampling, but a guided random walk through the search space. It is guided because only better solutions get accepted; worse ones get rejected. In this sense, LSopt will narrow down on good parameter values.
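The accept-or-reject loop described above can be sketched in a few lines of base R. This is only an illustration of the idea, not the code inside LSopt: the function local_search and its return value are made up for this example, and it accepts strictly better neighbours only.

```r
set.seed(1)  # only to make the sketch reproducible

Vars <- list(As = seq(1.5, 3, by = 0.3),
             Bs = c(1, 2),
             Cs = seq(1, 60, by = 10),
             Ds = seq(60, -60, length.out = 5),
             Es = c(1, 2, 3))

Fn <- function(x, ...) -sum(x)          # minimize the negative sum

nb <- function(x, levels, ...) {        # same idea as nb above
    i <- sample(length(levels), 1)
    x[i] <- sample(levels[[i]], 1)
    x
}

# Hypothetical helper: a bare-bones local search
local_search <- function(fun, x0, neighbour, nI, ...) {
    xc  <- x0                       # current solution
    xcF <- fun(xc, ...)             # its objective-function value
    for (iter in seq_len(nI)) {
        xn  <- neighbour(xc, ...)   # propose a neighbour
        xnF <- fun(xn, ...)
        if (xnF < xcF) {            # accept only improvements
            xc  <- xn
            xcF <- xnF
        }
    }
    list(xbest = xc, value = xcF)
}

x0  <- c(1.8, 2, 11, 30, 2)
res <- local_search(Fn, x0, nb, nI = 200, levels = Vars)
res
```

Because a worse neighbour is never accepted, the objective value of the current solution can only stay the same or improve from one iteration to the next.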
The implementation of the neighbourhood is not ideal, for two reasons. The first is that a solution may not be changed at all, since I sample from all feasible values: for a small set of possible values, as here, it will often happen that the current value is selected again. However, for larger search spaces, this inefficiency is typically negligible, since the probability of sampling the same value becomes smaller. Often it is so small that the additional code for testing whether the solution has changed becomes more expensive than the occasionally wasted iteration.
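For completeness, here is a sketch of a neighbourhood that always changes the solution, by excluding the current value before sampling. The name nb_change is made up for this example, and the sketch assumes that x contains values taken from levels, so that the comparison with != is exact.

```r
Vars <- list(As = seq(1.5, 3, by = 0.3),
             Bs = c(1, 2),
             Cs = seq(1, 60, by = 10),
             Ds = seq(60, -60, length.out = 5),
             Es = c(1, 2, 3))

nb_change <- function(x, levels, ...) {
    i <- sample(length(levels), 1)
    # candidates: all allowed values of element i except the current one
    cand <- levels[[i]][levels[[i]] != x[i]]
    # index explicitly: sample(cand, 1) would sample from 1:cand
    # when cand is a single number (as happens for Bs)
    x[i] <- cand[sample.int(length(cand), 1)]
    x
}

set.seed(42)
x0 <- c(Vars$As[2], 2, 11, 30, 2)   # a feasible solution built from Vars
xn <- nb_change(x0, levels = Vars)
xn
```

Whether the extra subsetting pays off depends on the problem; for large sets of levels, the plain version is usually good enough.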
A second thing could be improved, albeit through a more complicated function. And again, for this small problem it does not matter. In the current neighbourhood, an element is picked and then set to any feasible value. But that means that changes from one solution to the next might be large. Instead of picking any feasible value of the As, in realistic problems it will often be better to pick a value close to the current value. For example, when you are at 2.1, either move to 1.8 or 2.4, but not to 3.0. (This reasoning is only relevant, of course, if the variable in question is on a numeric or at least ordinal scale.)
Ultimately, which implementation works well can be tested only empirically. Many more details are in this tutorial.
Here is one alternative implementation. A solution is now a vector of positions for the original values, e.g. if x[1] is 2, it "points" to 1.8 (the second element of As), if x[2] is 2, it points to 2 (the second element of Bs), and so on.
## precompute lengths of vectors in Vars
lens <- lengths(Vars)
nb2 <- function(x, lens, ...) {
    i <- sample(length(lens), 1)
    if (x[i] == 1L) {
        x[i] <- 2                 ## at the first level: can only move up
    } else if (x[i] == lens[i]) {
        x[i] <- lens[i] - 1       ## at the last level: can only move down
    } else
        x[i] <- x[i] + sample(c(1, -1), 1)
    x
}
## the objective function now needs to map the
## indices in x back to the levels in Vars
Fn2 <- function(x, levels, ...){
y <- mapply(`[`, levels, x)
## => same as
## y <- numeric(length(x))
## y[1] <- Vars[[1]][x[1]]
## y[2] <- Vars[[2]][x[2]]
## ....
    SUM <- sum(y)
    return(-SUM)   ## sign flipped because LSopt minimizes
}
xbest <- LSopt(Fn2,
               list(x0 = c(1, 1, 1, 1, 1), ## an initial solution
                    neighbour = nb2,
                    nI = 200               ## iterations
                    ),
               levels = Vars,
               lens = lens)$xbest
## Local Search.
## ....
## Best solution overall: -119
## map the solution back to the values
mapply(`[`, Vars, xbest)
## As Bs Cs Ds Es
## 3 2 51 60 3
Here is a genetic algorithm solution with package GA.
The key is to write a function decode enforcing the constraints, see the package vignette.
library(GA)
#> Loading required package: foreach
#> Loading required package: iterators
#> Package 'GA' version 3.2.2
#> Type 'citation("GA")' for citing this R package in publications.
#> Attaching package: 'GA'
#> The following object is masked from 'package:utils':
#> de
decode <- function(x) {
As <- Vars$As
Bs <- Vars$Bs
Cs <- Vars$Cs
Ds <- rev(Vars$Ds)
# fix real variable As
i <- findInterval(x[1], As)
if(x[1L] - As[i] < As[i + 1L] - x[1L])
x[1L] <- As[i]
else x[1L] <- As[i + 1L]
# fix binary variable Bs
if(x[2L] - Bs[1L] < Bs[2L] - x[2L])
x[2L] <- Bs[1L]
else x[2L] <- Bs[2L]
# fix integer variable Cs
i <- findInterval(x[3L], Cs)
if(x[3L] - Cs[i] < Cs[i + 1L] - x[3L])
x[3L] <- Cs[i]
else x[3L] <- Cs[i + 1L]
# fix integer variable Ds
i <- findInterval(x[4L], Ds)
if(x[4L] - Ds[i] < Ds[i + 1L] - x[4L])
x[4L] <- Ds[i]
else x[4L] <- Ds[i + 1L]
# fix the other, integer variable
x[5L] <- round(x[5L])
  setNames(x, c("As", "Bs", "Cs", "Ds", "Es"))
}
Fn <- function(x){
x <- decode(x)
# a <- x[1]
# b <- x[2]
# c <- x[3]
# d <- x[4]
# e <- x[5]
# SUM <- a + b + c + d + e
  SUM <- sum(x, na.rm = TRUE)
  return(SUM)
}
#Parameters for variables to optimize
Vars <- list(
As = seq(1.5, 3, by = 0.3), # Float
Bs = c(1, 2), # Binary
Cs = seq(1, 60, by = 10), # Integer
Ds = seq(60, -60, length.out = 5), # Negative
  Es = c(1, 2, 3)
)
res <- ga(type = "real-valued",
fitness = Fn,
lower = c(1.5, 1, 1, -60, 1),
upper = c(3, 2, 51, 60, 3),
popSize = 1000,
seed = 123)
#> ── Genetic Algorithm ───────────────────
#> GA settings:
#> Type = real-valued
#> Population size = 1000
#> Number of generations = 100
#> Elitism = 50
#> Crossover probability = 0.8
#> Mutation probability = 0.1
#> Search domain =
#> x1 x2 x3 x4 x5
#> lower 1.5 1 1 -60 1
#> upper 3.0 2 51 60 3
#> GA results:
#> Iterations = 100
#> Fitness function value = 119
#> Solutions =
#> x1 x2 x3 x4 x5
#> [1,] 2.854089 1.556080 46.11389 49.31045 2.532682
#> [2,] 2.869408 1.638266 46.12966 48.71106 2.559620
#> [3,] 2.865254 1.665405 46.21684 49.04667 2.528606
#> [4,] 2.866494 1.630416 46.12736 48.78017 2.530454
#> [5,] 2.860940 1.650015 46.31773 48.92642 2.521276
#> [6,] 2.851644 1.660358 46.09504 48.81425 2.525504
#> [7,] 2.855078 1.611837 46.13855 48.62022 2.575492
#> [8,] 2.857066 1.588893 46.15918 48.60505 2.588992
#> [9,] 2.862644 1.637806 46.20663 48.92781 2.579260
#> [10,] 2.861573 1.630762 46.23494 48.90927 2.555612
#> ...
#> [59,] 2.853788 1.640810 46.35649 48.87381 2.536682
#> [60,] 2.859090 1.658127 46.15508 48.85404 2.590679
apply(res@solution, 1, decode) |> t() |> unique()
#> As Bs Cs Ds Es
#> [1,] 3 2 51 60 3
Created on 2022-10-24 with reprex v2.0.2
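As a possible simplification of decode, each coordinate can be snapped to the nearest allowed level with which.min. The sketch below is an alternative written for illustration, not the function used in the answer above; the name snap is made up. It does not need the levels to be sorted, and it also handles a value that lies exactly on the upper bound.

```r
Vars <- list(
  As = seq(1.5, 3, by = 0.3),
  Bs = c(1, 2),
  Cs = seq(1, 60, by = 10),
  Ds = seq(60, -60, length.out = 5),
  Es = c(1, 2, 3)
)

# Hypothetical helper: map each real-valued coordinate to its closest level
snap <- function(x, levels) {
  out <- mapply(function(v, lv) lv[which.min(abs(lv - v))], x, levels)
  setNames(out, names(levels))
}

# First row of the GA solutions printed above
snapped <- snap(c(2.854089, 1.556080, 46.11389, 49.31045, 2.532682), Vars)
snapped
```

Ties are resolved in favour of the level that comes first in the list, which may or may not matter for a given problem.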
Morning folks,
I'm trying to categorize a set of numerical values (Days Left divided by 365.2, which gives approximately the number of years left until maturity).
The results of this first calculation give me a vector of 3560 values (example: 0.81, 1.65, 3.26 [...], 0.2).
I'd like to categorise these results into intervals, [Between 0 and 1 Year, 0 and 2 Years, 0 and 3 years, 0 and 4 years, Over 4 years].
#Set the Data Frame
dfMaturity <- data.frame(Maturity = DATA$Maturity)
#Call the library and Run the function
library(plyr)
MaturityX = ddply(df, .(Maturity), nrow)
#Set the Data Frame
dfMaturityID <- data.frame(testttto = DATA$Security.Name)
#Calculation of the remaining days
MaturityID = ddply(df, .(dfMaturityID$testttto), nrow)
survey <- data.frame(date=c(DATA$Maturity),tx_start=c("1/1/2022"))
survey$date_diff <- as.Date(as.character(survey$date), format="%m/%d/%Y")-
as.Date(as.character(survey$tx_start), format="%m/%d/%Y")
# Data for the table
MaturityName <- MaturityID$`dfMaturityID$testttto`
MaturityZ <- survey$date
TimeToMaturity <- as.numeric(survey$date_diff)
Multiplier <- TimeToMaturity /365.2
cx <- cut(Multiplier, breaks=0:5)
The original data source is an Excel file (DATA$Maturity).
In case it helps:
gives us
[1] 0.4956188 1.4950712 1.9989047 0.2464403 0.9994524 3.0010953 5.0000000 7.0016429 9.0005476
[10] 21.0021906 4.1621030 13.1626506 1.1610077 8.6664841 28.5377875 3.1626506 6.7497262 2.0920044
[19] 2.5602410 4.6495071 0.3368018 6.3225630 8.7130340 10.4956188 3.9019715 12.7957284 5.8378970
I copied only the first three lines of output, but there are 3560 values in total.
I'm open to any kind of help, I just want it to work :) Thank you!
The cut function does that:
example <- c(0.81, 1.65, 3.26, 0.2)
cut(example, breaks = c(0, 1, 2, 3, 4),
labels = c("newborn", "one year old", "two", "three"))
From the comment
I'd like then to create a table with for example: 30% of the objects has a maturity between 0 and 1 year
You could compute that using the function below:
example <- c(0.81, 1.65, 3.26, 0.2)
share <- function(x, lower = 0, higher = 1){
  x <- na.omit(x)
  sum((lower <= x) & (x < higher)) / length(x)
}
share(1:10, lower = 0, higher = 3.5) # true for 1:3 out of 1:10, so 30%
share(1:10, lower = 4.5, higher = 5.5) # true for 5 only, so 10%
share(example, 0, 3)
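To get the shares for all intervals at once, cut can be combined with table and prop.table. This is a sketch under a few assumptions: the labels are made up, an extra value of 5.0 is added to the example so that the open-ended "over 4 years" bin is not empty, and right = FALSE is used so that the intervals are closed on the left, as in share.

```r
example <- c(0.81, 1.65, 3.26, 0.2, 5.0)   # 5.0 added for illustration

bins <- cut(example,
            breaks = c(0, 1, 2, 3, 4, Inf),
            labels = c("0-1 year", "1-2 years", "2-3 years",
                       "3-4 years", "over 4 years"),
            right = FALSE)                 # intervals are [lower, higher)

shares <- prop.table(table(bins))          # fraction of values per bin
round(100 * shares, 1)                     # as percentages
```

By default table drops missing values, so the shares are relative to the non-missing values, which matches the na.omit in share.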
I have a large dataset with multiple categorical values that have different integer values (counts) in two different groups.
As an example
Element <- c("zinc", "calcium", "magnesium", "sodium", "carbon", "nitrogen")
no_A <- c(45, 143, 10, 35, 70, 40)
no_B <- c(10, 11, 1, 4, 40, 30)
elements_df <- data.frame(Element, no_A, no_B)
Previously I’ve just been using the code below and changing x manually to get the output values:
library(dplyr) # provides %>% and filter()
x = "calcium"
n1 = (elements_df %>% filter(Element== x))$no_A
n2 = sum(elements_df$no_A) - n1
n3 = (elements_df %>% filter(Element== x))$no_B
n4 = sum(elements_df$no_B) - n3
fisher.test(matrix(c(n1, n2, n3, n4), nrow = 2, ncol = 2, byrow = TRUE))
But I have a very large dataset with 4000 rows and I’d like the most efficient way to iterate through all of them and see which have significant p values.
I imagined I’d need a for loop and function, although I’ve looked through a few previous similar questions (none that I felt I could use) and it seems using apply might be the way to go.
So, in short, can anyone help me with writing code that iterates over x in each row and prints out the corresponding p values and odds ratio for each element?
You could get them all in a nice data frame like this:
`row.names<-`(do.call(rbind, lapply(seq(nrow(elements_df)), function(i) {
f <- fisher.test(matrix(c(elements_df$no_A[i], sum(elements_df$no_A[-i]),
elements_df$no_B[i], sum(elements_df$no_B[-i])), nrow = 2));
data.frame(Element = elements_df$Element[i],
"odds ratio" = f$estimate, "p value" = scales::pvalue(f$p.value),
"Lower CI" = f$conf.int[1], "Upper CI" = f$conf.int[2],
check.names = FALSE)
})), NULL)
#> Element odds ratio p value Lower CI Upper CI
#> 1 zinc 1.2978966 0.601 0.6122734 3.0112485
#> 2 calcium 5.5065701 <0.001 2.7976646 11.8679909
#> 3 magnesium 2.8479528 0.469 0.3961312 125.0342574
#> 4 sodium 2.6090482 0.070 0.8983185 10.3719176
#> 5 carbon 0.3599468 <0.001 0.2158107 0.6016808
#> 6 nitrogen 0.2914476 <0.001 0.1634988 0.5218564
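Since scales::pvalue returns formatted text, it can be handy to keep the raw numbers when the goal is to filter on significance. Below is a variant sketched with that in mind; the column names odds_ratio and p_value and the 0.05 threshold are choices made for this example.

```r
Element <- c("zinc", "calcium", "magnesium", "sodium", "carbon", "nitrogen")
no_A <- c(45, 143, 10, 35, 70, 40)
no_B <- c(10, 11, 1, 4, 40, 30)
elements_df <- data.frame(Element, no_A, no_B)

res <- do.call(rbind, lapply(seq_len(nrow(elements_df)), function(i) {
  # 2 x 2 table: this element versus all other elements, in groups A and B
  f <- fisher.test(matrix(c(elements_df$no_A[i], sum(elements_df$no_A[-i]),
                            elements_df$no_B[i], sum(elements_df$no_B[-i])),
                          nrow = 2))
  data.frame(Element    = elements_df$Element[i],
             odds_ratio = unname(f$estimate),
             p_value    = f$p.value)        # numeric, not formatted
}))

# Keep only the rows below the chosen threshold
significant <- res[res$p_value < 0.05, ]
significant
```

With numeric p-values the result can be sorted or filtered like any other data frame column.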
I took a class a few years ago about power market optimization and tried building a small example in R that I have worked up the courage to tackle again. However I need some help.
I would like to take the constraints of ~4 power plants and try to satisfy a single location's demand for power, as well as demand for 3 other types of ancillary services needed in the power market (called Reserves). I am looking to minimize the total cost of the generators of electricity.
I've laid out a lot of information but can't seem to figure out how to start using any optimization packages (I have used optim before but couldn't quite work with these constraints). This will be a little lengthy but I'm using comments in R code for an easy copy-paste-run for those interested in helping or viewing what I have.
Warning: You're going to learn more about power plants and electricity markets than you want to from this post.
# For each generator we have the following data:
## Maximum Capacity: the most power it can be producing
## Technical Minimum Capacity: the smallest amount (other than being off)
## Cost per Megawatt: The cost of generating power ($/MW)
## Ramp Rate: The speed a plant can change to a higher or lower output (MW/min)
G1 <- c(200, 100, 50, 2)
G2 <- c(150, 10, 80, 10)
G3 <- c(200, 100, 55, 2)
G4 <- c(150, 10, 85, 10)
Gdat.1 <- rbind(G1,G2,G3,G4)
colnames(Gdat.1) = c("MWMax","MWMin","Cost","RampRate")
n <- nrow(Gdat.1) # number of generators
# System Requirements: Demand
## Supply (MW) must equal demand.
Demand <- 415
# System Requirements: Reserves
## Total Reserves of the system must be met.
### R1: Primary Reserves
##### 0.5 minute response time, bi-directional (Ramp UP/DOWN)
### R2: Secondary Reserves
##### 5 minute response time, bi-directional (Ramp UP/DOWN)
### R3: Tertiary Reserves
##### 15 minute response time, uni-directional (Ramp UP only)
# R for Reserve Type.
# For each Reserve Type we have the following data:
# Total: MW Needed
# minutes: within how much time the MW is needed by
# bid: amount the operator will pay for MW reserves ($/MW)
R1 <- c(2, 0.5, 60) # Primary
R2 <- c(8, 5, 40) # Secondary
R3 <- c(20, 15, 0) # Tertiary
Reserves <- rbind(R1,R2,R3)
colnames(Reserves) = c("Total","Minutes","Bid")
## Ramp Rate constraint of generators
### For R1 (Primary Reserves) the system needs 2 MW that can be supplied within 30 seconds,
### a Generator with a ramprate of 2 MW/min will only be able to supply
### 1 MW for primary reserves, while a Generator with a ramprate of 10 MW/min
### will be able to supply 5 MW.
# How much each Generator can supply in the allotted time
R1max <- Gdat.1[,"RampRate"] * Reserves["R1","Minutes"]
R2max <- Gdat.1[,"RampRate"] * Reserves["R2","Minutes"]
R3max <- Gdat.1[,"RampRate"] * Reserves["R3","Minutes"]
R1min <- -R1max # recall, bi-directional
R2min <- -R2max # bi-directional
R3min <- 0/R3max # uni-direction up
# we no longer need RampRate since we used it to calculate the reserve limits above
Gdat <- cbind(Gdat.1[,-4], cbind(R1max,R2max,R3max,R1min,R2min,R3min))
# Now we initialize each generator's commitments that can change during optimization
MW.Demand = rep(0,n) # general MW to satisfy demand
MW.R1 = rep(0,n) # MW to satisfy Primary Reserves
MW.R2 = rep(0,n) # MW to satisfy Secondary Reserves
MW.R3 = rep(0,n) # MW to satisfy Tertiary Reserves
Commit.orig <- cbind(MW.Demand,MW.R1,MW.R2,MW.R3)
rownames(Commit.orig) <- paste0("G",seq(1,n))
Commit <- Commit.orig
# Some initial guess (may be exactly the right answer...)
Commit <- matrix(c(200,0,0,0,
0,0,0,0), 4, 4, byrow = T, dimnames(Commit.orig))
# Objective Function, cost per MW of each generator times their total MW output
# minimize the total cost, not sure which way to list it, or if this way even works
sum(Commit * Gdat[,"Cost"])
sum(Gdat[,"Cost"] %*% Commit)
sum(rowSums(Commit * Gdat[,"Cost"]))
# Constraints
sum(Commit[,"MW.Demand"]) == Demand & # All generators together must sum to meet system demand requirements
sum(Commit[,"MW.R1"]) == Reserves["R1","Total"] & # Total Primary Reserves are met
sum(Commit[,"MW.R2"]) == Reserves["R2","Total"] & # Total Secondary
sum(Commit[,"MW.R3"]) == Reserves["R3","Total"] & # Total Tertiary
(rowSums(Commit) <= Gdat[,"MWMax"] | rowSums(Commit) == 0) & # Generators must be less than or equal to its max, or off
(rowSums(Commit) >= Gdat[,"MWMin"] | rowSums(Commit) == 0) & # Generators must be more than or equal to its min, or off
Commit[,"MW.R1"] <= Gdat[,"R1max"] & Commit[,"MW.R1"] >= Gdat[,"R1min"] & # Generators cannot exceed ramprate limitations
Commit[,"MW.R2"] <= Gdat[,"R2max"] & Commit[,"MW.R2"] >= Gdat[,"R2min"] & # - for the bi-directional
Commit[,"MW.R3"] <= Gdat[,"R3max"] & Commit[,"MW.R3"] >= Gdat[,"R3min"] # - or unidirectional reserves
Thank you anyone willing to take a look at this.
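As a possible first step before handing the problem to a solver, the constraints listed above can be collected into a single check. The sketch below only tests whether a candidate commitment is feasible and what it costs; it does not optimize anything. The helper names is_feasible and total_cost, the matrices Rmax and Rmin, and the candidate commitment at the end are all made up for illustration.

```r
# Generator and reserve data from the question
Gdat.1 <- rbind(G1 = c(200, 100, 50, 2), G2 = c(150, 10, 80, 10),
                G3 = c(200, 100, 55, 2), G4 = c(150, 10, 85, 10))
colnames(Gdat.1) <- c("MWMax", "MWMin", "Cost", "RampRate")
Reserves <- rbind(R1 = c(2, 0.5, 60), R2 = c(8, 5, 40), R3 = c(20, 15, 0))
colnames(Reserves) <- c("Total", "Minutes", "Bid")
Demand <- 415

# Per-generator reserve limits: ramp rate (MW/min) times response time (min)
Rmax <- outer(Gdat.1[, "RampRate"], Reserves[, "Minutes"])  # 4 generators x 3 reserves
Rmin <- cbind(-Rmax[, 1:2], 0)   # R1 and R2 are bi-directional, R3 is up only

# Cost per MW of each generator times its total committed MW
total_cost <- function(Commit) sum(Commit * Gdat.1[, "Cost"])

# Commit: 4 x 4 matrix with columns MW.Demand, MW.R1, MW.R2, MW.R3
is_feasible <- function(Commit) {
  out <- rowSums(Commit)         # total MW committed per generator
  on  <- out != 0
  isTRUE(all.equal(sum(Commit[, 1]), Demand)) &&
    isTRUE(all.equal(unname(colSums(Commit[, 2:4])),
                     unname(Reserves[, "Total"]))) &&
    all(!on | (out <= Gdat.1[, "MWMax"] & out >= Gdat.1[, "MWMin"])) &&
    all(Commit[, 2:4] <= Rmax) && all(Commit[, 2:4] >= Rmin)
}

# A hand-made candidate: G1 and G3 at full output, G2 covers the rest, G4 off
Commit <- matrix(c(200, 0, 0,  0,
                    15, 2, 8, 20,
                   200, 0, 0,  0,
                     0, 0, 0,  0), 4, 4, byrow = TRUE)
is_feasible(Commit)
total_cost(Commit)
```

The "either off or between minimum and maximum" condition is what makes this a mixed-integer problem rather than a plain linear program, so a solver that supports binary variables would probably be needed for the actual optimization.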
It's my understanding that when calculating quantiles in R, the entire dataset is scanned and the value for each quantile is determined.
If you ask for .8, for example it will give you a value that would occur at that quantile. Even if no such value exists, R will nonetheless give you the value that would have occurred at that quantile. It does this through linear interpolation.
However, what if one wishes to calculate quantiles and then proceed to round up/down to the nearest actual value?
For example, if the quantile at .80 gives a value of 53, when the real dataset only has a 50 and a 54, then how could one get R to list either of these values?
Try this:
#dummy data
x <- c(1,1,1,1,10,20,30,30,40,50,55,70,80)
#get quantile at 0.8
q <- quantile(x, 0.8)
# 80%
# 53
#closest match - "round up"
min(x[ x >= q ])
#[1] 55
#closest match - "round down"
max(x[ x <= q ])
#[1] 50
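If a single "closest actual value" is wanted rather than separate round-up and round-down results, one option is to pick the observation with the smallest distance to the interpolated quantile. A minimal sketch, using the same dummy data (the variable name nearest is arbitrary):

```r
x <- c(1, 1, 1, 1, 10, 20, 30, 30, 40, 50, 55, 70, 80)

# interpolated quantile, without the "80%" name
q <- quantile(x, 0.8, names = FALSE)

# observation closest to q; ties go to the first such element of x
nearest <- x[which.min(abs(x - q))]
nearest
```

Here 55 is closer to the interpolated value than 50, so it is the one returned.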
There are many estimation methods implemented in R's quantile function. You can choose which type to use with the type argument as documented in https://stat.ethz.ch/R-manual/R-devel/library/stats/html/quantile.html.
x <- c(1, 1, 1, 1, 10, 20, 30, 30, 40, 50, 55, 70, 80)
quantile(x, c(.8)) # default, type = 7
# 80%
# 53
quantile(x, c(.8), FALSE, TRUE, 7) # equivalent to the previous invocation
# 80%
# 53
quantile(x, c(.8), FALSE, TRUE, 3) # type = 3, nearest sample
# 80%
# 50
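Named arguments make the choice of type easier to read than the positional form. As a small sketch on the same data: types 1 and 3 are both discontinuous definitions that return an observed value, but they need not agree with each other (the variable names are arbitrary).

```r
x <- c(1, 1, 1, 1, 10, 20, 30, 30, 40, 50, 55, 70, 80)

# type 1: inverse of the empirical distribution function
q_type1 <- quantile(x, probs = 0.8, type = 1, names = FALSE)

# type 3: nearest even order statistic
q_type3 <- quantile(x, probs = 0.8, type = 3, names = FALSE)

c(type1 = q_type1, type3 = q_type3)
c(q_type1, q_type3) %in% x   # both are actual data values
```

On this data type 1 picks the larger of the two neighbouring observations and type 3 the smaller, so it is worth checking which definition matches the intended rounding rule.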