How to solve linear programming model in R
I need to solve the following microeconomic problem:
I have six assets I can produce (asset 1 - 6) across five years (2011 - 2015).
Each asset can only be produced during one year.
Each asset must be produced in my five year period.
Production is not mutually exclusive; I can produce more than one good in a year without affecting the production of either.
Each asset has a fixed cost of production equal to 30.
I must have non-negative profit in each year; revenues must be at least 30.
Below is a matrix representing my potential revenue for producing each asset (i) in a given year (j).
        2011  2012  2013  2014  2015
Asset1   35*   37    39    42    45
Asset2   16    17    18    19    20*
Asset3  125   130   136*  139   144
Asset4   15    27    29    30*   33
Asset5   14    43*   46    50    52
Asset6    5     7     8    10    11*
The asterisks (*) represent what should be the optimal solution set.
How can I use R to solve for the production plan that maximizes my revenue (and therefore profit) subject to the constraints outlined above? My output should be a 6x5 matrix of 0's and 1's like the one above, where a 1 means producing that asset in that year.
This is a classic problem, and one that needs to be reformulated.
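Start by reformulating your problem, where pi_[i,t] is the revenue from producing asset i in year t, C_[i,t] its cost, and x_[i,t] = 1 if asset i is produced in year t:
Max sum_[i,t] (pi_[i,t] - C_[i,t]) * x_[i,t]
s.t.
sum_t x_[i,t] = 1 [for all i]
sum_i pi_[i,t] * x_[i,t] >= 30 [for all t]
x_[i,t] in {0, 1} [for all i, t]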
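In the lpSolve package the objective is passed as a plain vector of coefficients (i.e. in non-matrix format), one per decision variable. Let's start by making this vector of revenues, one entry for each x_[i,t] (costs are left out for now; see the side note at the end). For ease, let's name its elements (the names are not used by the solver), just so we can keep track.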
n <- 6
t <- 5
#x ordered by column.
x <- c(35, 16, 125, 15, 14, 5, 37, 17, 130, 27, 43, 7, 39, 18, 136, 29, 46, 8, 42, 19, 139, 30, 50, 10, 45, 20, 144, 33, 52, 11)
# if x is matrix use:
# x <- as.vector(x)
names(x) <- paste0('x_[', seq(n), ',', rep(seq(t), each = n), ']')
head(x, n * 2)
x_[1,1] x_[2,1] x_[3,1] x_[4,1] x_[5,1] x_[6,1] x_[1,2] x_[2,2] x_[3,2] x_[4,2] x_[5,2] x_[6,2]
35 16 125 15 14 5 37 17 130 27 43 7
length(x)
[1] 30
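If the revenue table is already stored as a 6x5 matrix, as.vector() flattens it column by column, which is exactly the year-by-year order used for x above. A small sketch to confirm this (the names `revenue`, `x_manual` and `x_from_matrix` are just illustrative):

```r
n <- 6
t <- 5
# Revenue table from the question: rows = assets, columns = years
revenue <- matrix(c(35, 37, 39, 42, 45,
                    16, 17, 18, 19, 20,
                    125, 130, 136, 139, 144,
                    15, 27, 29, 30, 33,
                    14, 43, 46, 50, 52,
                    5, 7, 8, 10, 11),
                  nrow = n, byrow = TRUE,
                  dimnames = list(paste0("Asset", 1:n), 2011:2015))
# The hand-typed vector from above, ordered by column (year)
x_manual <- c(35, 16, 125, 15, 14, 5, 37, 17, 130, 27, 43, 7, 39, 18, 136,
              29, 46, 8, 42, 19, 139, 30, 50, 10, 45, 20, 144, 33, 52, 11)
# as.vector() reads a matrix in column-major order and drops the dimnames
x_from_matrix <- as.vector(revenue)
identical(x_from_matrix, x_manual)
## [1] TRUE
```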
Now we need to create our constraints, starting with the first one:
sum_t x_[i,t] = 1 [for all i]
We can create this rather simply. The thing to watch out for here is that the dimensions have to be right. We have a vector of length 30, so our constraint matrix needs 30 columns. In addition we have 6 assets, so we need 6 rows for this constraint. Again, let's name the rows and columns to keep track ourselves.
cond1 <- matrix(0, ncol = t * n,
nrow = n,
dimnames = list(paste0('x_[', seq(n), ',t]'),
names(x)))
cond1[, seq(n + 1)]
x_[1,1] x_[2,1] x_[3,1] x_[4,1] x_[5,1] x_[6,1] x_[1,2]
x_[1,t] 0 0 0 0 0 0 0
x_[2,t] 0 0 0 0 0 0 0
x_[3,t] 0 0 0 0 0 0 0
x_[4,t] 0 0 0 0 0 0 0
x_[5,t] 0 0 0 0 0 0 0
x_[6,t] 0 0 0 0 0 0 0
Next we fill in the correct fields: x_[1,1] + x_[1,2] + ... = 1, x_[2,1] + x_[2,2] + ... = 1, and so forth. A for loop is the simplest way to do this:
for(i in seq(n)){
cond1[i, seq(i, n * t, n)] <- 1
}
cond1[, seq(n + 1)]
x_[1,1] x_[2,1] x_[3,1] x_[4,1] x_[5,1] x_[6,1] x_[1,2]
x_[1,t] 1 0 0 0 0 0 1
x_[2,t] 0 1 0 0 0 0 0
x_[3,t] 0 0 1 0 0 0 0
x_[4,t] 0 0 0 1 0 0 0
x_[5,t] 0 0 0 0 1 0 0
x_[6,t] 0 0 0 0 0 1 0
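The same pattern (a 1 for asset i in every year) can also be built without a loop, as a Kronecker product of a row of t ones with the n x n identity matrix. This is an alternative sketch, not what the answer itself uses; the names `cond1_loop` and `cond1_kron` are illustrative:

```r
n <- 6
t <- 5
# Loop version, as in the answer
cond1_loop <- matrix(0, nrow = n, ncol = n * t)
for (i in seq(n)) {
  cond1_loop[i, seq(i, n * t, n)] <- 1
}
# Kronecker version: [I_n, I_n, ..., I_n], t identity blocks side by side
cond1_kron <- kronecker(matrix(1, nrow = 1, ncol = t), diag(n))
all.equal(cond1_loop, cond1_kron)
## [1] TRUE
```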
We still have to create the RHS and specify the directions, but I'll hold off on that for now.
Next, let's create the matrix for the second condition:
sum_i pi_[i,t] * x_[i,t] >= 30 [for all t]
The process is very similar, but now we need a row for each period, so the matrix is 5x30. The main difference is that instead of 1's we insert the revenues pi_[i,t], i.e. the values stored in x:
cond2 <- matrix(0, ncol = t * n,
nrow = t,
dimnames = list(paste0('t=', seq(t)),
names(x)))
for(i in seq(t)){
cond2[i, seq(n) + n * (i - 1)] <- x[seq(n) + n * (i - 1)]
}
cond2[, seq(1, n * t, n)]
x_[1,1] x_[1,2] x_[1,3] x_[1,4] x_[1,5]
t=1 35 0 0 0 0
t=2 0 37 0 0 0
t=3 0 0 39 0 0
t=4 0 0 0 42 0
t=5 0 0 0 0 45
Note that I'm printing only the x_[1,t] columns to illustrate that we've got it right.
Last we have the final condition, that every x_[i,t] is 0 or 1. For this, note that lpSolve::lp (see ?lpSolve::lp) has an argument all.bin, whose documentation states:
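Likewise, cond2 can be built without the loop: kronecker(diag(t), matrix(1, 1, n)) marks which columns belong to each year, and sweep() then scales every column by its revenue in x. Again only a sketch of an alternative, using the same column-ordered x; `year_block`, `cond2_loop` and `cond2_vec` are illustrative names:

```r
n <- 6
t <- 5
x <- c(35, 16, 125, 15, 14, 5, 37, 17, 130, 27, 43, 7, 39, 18, 136,
       29, 46, 8, 42, 19, 139, 30, 50, 10, 45, 20, 144, 33, 52, 11)
# Loop version, as in the answer
cond2_loop <- matrix(0, nrow = t, ncol = n * t)
for (i in seq(t)) {
  cond2_loop[i, seq(n) + n * (i - 1)] <- x[seq(n) + n * (i - 1)]
}
# Row k has 1's in the n columns that belong to year k ...
year_block <- kronecker(diag(t), matrix(1, nrow = 1, ncol = n))
# ... and each column j is then multiplied by its revenue x[j]
cond2_vec <- sweep(year_block, 2, x, `*`)
all.equal(cond2_loop, cond2_vec)
## [1] TRUE
```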
Logical: should all variables be binary? Default: FALSE.
Since all variables are either 1 or 0, we simply set this value to TRUE. Before continuing, let's combine our conditions into one matrix:
cond <- rbind(cond1, cond2)
Now both the RHS and the directions follow directly from the two conditions. From the documentation of the const.dir argument:
Vector of character strings giving the direction of the constraint: each value should be one of "<," "<=," "=," "==," ">," or ">=". (In each pair the two values are identical.)
In our combined matrix the first 6 rows represent condition 1 and the next 5 rows represent condition 2. Thus we need '==' n (6) times and '>=' t (5) times.
cond_dir <- c(rep('==', n), rep('>=', t))
The RHS is created in a similar fashion
RHS <- c(rep(1, n), rep(30, t))
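Before calling the solver, it can help to check that the pieces line up, and that the formulation accepts the plan the question marks with asterisks. A minimal base-R check, rebuilding the same objects as above; the helper `satisfies` and the vector `starred_year` are made up for this sketch:

```r
n <- 6
t <- 5
x <- c(35, 16, 125, 15, 14, 5, 37, 17, 130, 27, 43, 7, 39, 18, 136,
       29, 46, 8, 42, 19, 139, 30, 50, 10, 45, 20, 144, 33, 52, 11)
cond1 <- matrix(0, nrow = n, ncol = n * t)
for (i in seq(n)) cond1[i, seq(i, n * t, n)] <- 1
cond2 <- matrix(0, nrow = t, ncol = n * t)
for (i in seq(t)) cond2[i, seq(n) + n * (i - 1)] <- x[seq(n) + n * (i - 1)]
cond <- rbind(cond1, cond2)
cond_dir <- c(rep("==", n), rep(">=", t))
RHS <- c(rep(1, n), rep(30, t))
# Dimensions must agree: one column per variable, one entry per row
stopifnot(ncol(cond) == length(x), nrow(cond) == length(cond_dir),
          nrow(cond) == length(RHS))

# Starred plan from the question: chosen year (1 = 2011) for assets 1..6
starred_year <- c(1, 5, 3, 4, 2, 5)
plan <- matrix(0, nrow = n, ncol = t)
plan[cbind(seq(n), starred_year)] <- 1
plan_vec <- as.vector(plan)   # same column-major order as x

# TRUE if v satisfies every row of A v (dir) rhs; only '==' and '>=' occur here
satisfies <- function(A, dir, rhs, v) {
  lhs <- drop(A %*% v)
  all(ifelse(dir == "==", lhs == rhs, lhs >= rhs))
}
satisfies(cond, cond_dir, RHS, plan_vec)
## [1] TRUE
sum(x * plan_vec)   # objective value of the starred plan
## [1] 275
```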
And that's it! Now we're ready to solve our problem using the lpSolve::lp function.
sol = lpSolve::lp(direction = 'max',
objective.in = x,
const.mat = cond,
const.dir = cond_dir,
const.rhs = RHS,
all.bin = TRUE)
sol$objval
[1] 275
The weights for the solution are stored in sol$solution
names(sol$solution) <- names(x)
sol$solution
x_[1,1] x_[2,1] x_[3,1] x_[4,1] x_[5,1] x_[6,1] x_[1,2] x_[2,2] x_[3,2] x_[4,2] x_[5,2] x_[6,2] x_[1,3] x_[2,3] x_[3,3]
1 0 0 0 0 0 0 0 0 0 1 0 0 0 1
x_[4,3] x_[5,3] x_[6,3] x_[1,4] x_[2,4] x_[3,4] x_[4,4] x_[5,4] x_[6,4] x_[1,5] x_[2,5] x_[3,5] x_[4,5] x_[5,5] x_[6,5]
0 0 0 0 0 0 1 0 0 0 1 0 0 0 1
matrix(sol$solution,
ncol = t,
dimnames = list(rownames(cond1),
rownames(cond2)))
t=1 t=2 t=3 t=4 t=5
x_[1,t] 1 0 0 0 0
x_[2,t] 0 0 0 0 1
x_[3,t] 0 0 1 0 0
x_[4,t] 0 0 0 1 0
x_[5,t] 0 1 0 0 0
x_[6,t] 0 0 0 0 1
Which we quickly see is the correct solution. :-)
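With only 6 assets and 5 years there are 5^6 = 15,625 possible plans, so the solver's answer can also be cross-checked by plain enumeration. This brute-force sketch does not scale to bigger problems (that is what lpSolve is for), but it confirms the result here; `evaluate`, `plans` and `best_years` are illustrative names:

```r
n <- 6
t <- 5
revenue <- matrix(c(35, 37, 39, 42, 45,
                    16, 17, 18, 19, 20,
                    125, 130, 136, 139, 144,
                    15, 27, 29, 30, 33,
                    14, 43, 46, 50, 52,
                    5, 7, 8, 10, 11),
                  nrow = n, byrow = TRUE)
# Every way to assign one year (1..t) to each asset: a 15625 x 6 matrix
plans <- as.matrix(expand.grid(rep(list(seq(t)), n)))

evaluate <- function(years) {
  r <- revenue[cbind(seq(n), years)]   # revenue of each asset in its chosen year
  per_year <- vapply(seq(t), function(k) sum(r[years == k]), numeric(1))
  if (all(per_year >= 30)) sum(r) else -Inf   # infeasible plans get -Inf
}
values <- apply(plans, 1, evaluate)
best <- which.max(values)
best_value <- values[best]
best_years <- unname(plans[best, ])
best_value
## [1] 275
best_years
## [1] 1 5 3 4 2 5

# Same 6x5 layout of 0's and 1's as the lp solution
best_plan <- matrix(0, nrow = n, ncol = t)
best_plan[cbind(seq(n), best_years)] <- 1
best_plan
```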
Side note on costs
One may have noticed: "Where the hell did the costs go?" In this specific case the costs are fixed and not very interesting. This means we can ignore them during the calculations, because we know the total cost is going to be 30 * 6 = 180 (which has to be subtracted from the objective value). However, it is not uncommon for costs to depend on various factors, in which case they might affect the optimal solution. For illustration, here is how we could incorporate costs in this example.
First we'll have to extend our objective vector to incorporate the costs for each product at each period
Fixed_C <- -30
x <- c(x, rep(Fixed_C, n * t))
Next we'll add a pseudo-constraint
x_[i,t] - C_[i,t] = 0 [for all i, t]
This constraint ensures that if x_[i,t] = 1, then the relevant cost is added to the problem. There are 2 ways to create this constraint. The first is to have a matrix with n * t rows, one for each asset and period. Alternatively, we can lean on our first constraint and live with only a single constraint
sum_[i,t] (x_[i,t] - C_[i,t]) = 0
because our first constraint already forces exactly n (6) of the x_[i,t] to be 1, so this single constraint switches on exactly n cost variables. So our third constraint becomes
cond3 <- c(rep(1, n * t), rep(-1, n * t))
Lastly we have to extend our RHS and condition 1 and 2 matrices. Simply add 0's to the condition matrices to make the dimensions fit.
cond1 <- cbind(cond1, matrix(0, nrow = n, ncol = n * t))
cond2 <- cbind(cond2, matrix(0, nrow = t, ncol = n * t))
cond <- rbind(cond1, cond2, cond3)
cond_dir <- c(cond_dir, '==')
RHS <- c(RHS, 0)
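Because cond2 has t rows (one per year), its padding block also needs t rows; with n rows cbind() stops with a row-count mismatch. A quick sketch that rebuilds the extended system from scratch and checks that all dimensions agree (the `_ext` names are illustrative):

```r
n <- 6
t <- 5
x <- c(35, 16, 125, 15, 14, 5, 37, 17, 130, 27, 43, 7, 39, 18, 136,
       29, 46, 8, 42, 19, 139, 30, 50, 10, 45, 20, 144, 33, 52, 11)
Fixed_C <- -30
# Original constraint blocks, as in the answer
cond1 <- matrix(0, nrow = n, ncol = n * t)
for (i in seq(n)) cond1[i, seq(i, n * t, n)] <- 1
cond2 <- matrix(0, nrow = t, ncol = n * t)
for (i in seq(t)) cond2[i, seq(n) + n * (i - 1)] <- x[seq(n) + n * (i - 1)]
# Extend by n * t cost columns; each padding block matches its block's row count
x_ext <- c(x, rep(Fixed_C, n * t))
cond1_ext <- cbind(cond1, matrix(0, nrow = n, ncol = n * t))
cond2_ext <- cbind(cond2, matrix(0, nrow = t, ncol = n * t))
cond3 <- c(rep(1, n * t), rep(-1, n * t))
cond <- rbind(cond1_ext, cond2_ext, cond3)
cond_dir <- c(rep("==", n), rep(">=", t), "==")
RHS <- c(rep(1, n), rep(30, t), 0)
dim(cond)
## [1] 12 60
length(x_ext) == ncol(cond) && length(cond_dir) == nrow(cond) && length(RHS) == nrow(cond)
## [1] TRUE
```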
And now we can once again find the optimal solution using lpSolve::lp
solC = lpSolve::lp(direction = 'max',
objective.in = x,
const.mat = cond,
const.dir = cond_dir,
const.rhs = RHS,
all.bin = TRUE)
solC$objval
[1] 95
which equals our previous value of 275 minus the total fixed cost of 30 * n = 180.
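As a check on that arithmetic: the pseudo-constraint switches on exactly as many cost variables as there are produced assets (6), so the starred plan scores 275 + 6 * (-30) = 95 in the extended problem. A small sketch evaluating the extended objective at such a point; which 6 cost columns are switched on does not matter, because the single constraint only counts them (here they simply mirror the plan, an arbitrary choice):

```r
n <- 6
t <- 5
x <- c(35, 16, 125, 15, 14, 5, 37, 17, 130, 27, 43, 7, 39, 18, 136,
       29, 46, 8, 42, 19, 139, 30, 50, 10, 45, 20, 144, 33, 52, 11)
Fixed_C <- -30
x_ext <- c(x, rep(Fixed_C, n * t))
plan <- matrix(0, nrow = n, ncol = t)
plan[cbind(seq(n), c(1, 5, 3, 4, 2, 5))] <- 1   # starred plan
sol_ext <- c(as.vector(plan), as.vector(plan))  # production part, cost part
sum(c(rep(1, n * t), rep(-1, n * t)) * sol_ext) # left-hand side of cond3
## [1] 0
sum(x_ext * sol_ext)
## [1] 95
```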
Related
How to create binary constraints for optimization in R?
I have a function f(x) which I intend to minimize. "x" is a vector containing 50 parameters. This function has several constraints: first, all parameters in x should be binary, so that x = (1,1,0,1,...); second, the sum of "x" should be exactly 25, so that sum(x) = 25. The problem can be stated as:
min f(x) s.t. sum(x) = 25, x = 0 or 1
However, when I try to solve this problem in R, I run into some problems. Prevalent packages such as "optim" and "constrOptim" from "stats" can only take coefficients of the target function (in my case, the function is a bit complex and cannot simply be expressed with a coefficient matrix), and "donlp2" from "Rdonlp" does not support setting parameters to be binary. Does anyone have an idea of how to set binary constraints in this case?
Expanding my comment, here is an example of a Local Search, as implemented in package NMOF. (I borrow Stéphane's objective function.)
library("NMOF")
library("neighbours")
## Stéphane's objective function
f <- function(x) sum(1:20 * x)
nb <- neighbourfun(type = "logical", kmin = 10, kmax = 10)
x0 <- c(rep(FALSE, 10), rep(TRUE, 10))
sol <- LSopt(f, list(x0 = x0, neighbour = nb, nI = 1000))
## initial solution
as.numeric(x0)
## [1] 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
## final solution
as.numeric(sol$xbest)
## [1] 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
(Disclosure: I am the maintainer of packages NMOF and neighbours.)
You can try the amazing package rgenoud. Below is an example. I take 20 binary variables instead of your 50 for easier reading. I take f(x) = sum(1:20 * x); this is a weighted sum with increasing weights, so clearly the best solution (restricted to sum(x) = 10) is 1, 1, ..., 1, 0, 0, ..., 0. And rgenoud brilliantly finds it.
library(rgenoud)
f <- function(x) { # the function to be minimized
  sum(1:20 * x)
}
g <- function(x){
  c(
    ifelse(sum(x) == 10, 0, 1), # set the constraint (here sum(x)=10) in this way
    f(x) # the objective function (to minimize/maximize)
  )
}
solution <- genoud(
  g,
  pop.size = 3000,
  lexical = 2, # see ?genoud for explanations
  nvars = 20, # number of x_i's
  starting.values = c(rep(0, 10), rep(1, 10)),
  Domains = cbind(rep(0, 20), rep(1, 20)), # lower and upper bounds
  data.type.int = TRUE # x_i's are integer
)
solution$par # the values of x
## [1] 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
solution$value
## [1] 0 55
Here 0 is the value of ifelse(sum(x) == 10, 0, 1) and 55 is the value of f(x).
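For readers without NMOF or rgenoud installed, the core idea of a local search under a cardinality constraint can be sketched in base R: start from a feasible x with exactly 10 ones and repeatedly swap a 1 with a 0 while that improves f. Every swap keeps sum(x) fixed, so the constraint never breaks. This is a simplified first-improvement variant; `swap_search` is a hypothetical helper, not part of either package, and for a general, non-linear f it may stop in a local optimum:

```r
# Objective borrowed from the answers above: weighted sum with increasing weights
f <- function(x) sum(1:20 * x)

# Hypothetical helper: first-improvement swap search.
# Each move turns one 1 into 0 and one 0 into 1, so sum(x) never changes.
swap_search <- function(f, x0, max_iter = 1000) {
  x <- x0
  fx <- f(x)
  for (iter in seq_len(max_iter)) {
    improved <- FALSE
    for (i in which(x == 1)) {
      for (j in which(x == 0)) {
        y <- x
        y[i] <- 0
        y[j] <- 1
        fy <- f(y)
        if (fy < fx) {   # minimising: accept the first improving swap
          x <- y
          fx <- fy
          improved <- TRUE
          break
        }
      }
      if (improved) break
    }
    if (!improved) break # no improving swap left: local optimum
  }
  list(x = x, value = fx)
}

x0 <- c(rep(0, 10), rep(1, 10))  # feasible start with exactly 10 ones
res <- swap_search(f, x0)
res$x
## [1] 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
res$value
## [1] 55
```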
Make a new vector according to a given vector in R
Imagine I give you a vector like a = (8 - 2) - (7 - 1), which can be simplified as z = (8 - 2 - 7 + 1). Now imagine I give you a vector consisting of nine 0s, b = c(0,0,0,0,0,0,0,0,0). Can R turn a into the following vector: desired_output = c(1,-1,0,0,0,0,-1,1,0)? The logic: the numbers in a are locations of elements in b (e.g. 8 in a means the 8th element in b). The idea is to assign either 1 or -1 to the elements indicated in a, based on their sign, and to assign 0 to all other elements in b, so as to get desired_output.
I don't entirely understand your problem setup (in R terms, a = (8 - 2) - (7 - 1) is an expression rather than a vector), but here's a start:
b <- rep(0,9)
a <- c(8, -2, -7, 1)
b[abs(a)] <- sign(a)
b
## [1] 1 -1 0 0 0 0 -1 1 0
We can use a for loop.
Data
a <- c(8, -2, -7, 1)
b <- c(0,0,0,0,0,0,0,0,0)
for(i in a){
  if(i > 0) b[i] <- 1 else b[abs(i)] <- -1
}
b
Output
[1] 1 -1 0 0 0 0 -1 1 0
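If the same position can appear more than once in a (say c(8, -2, 8)), both answers keep only the last assignment for it. A variant that instead adds up the signs, using tabulate(), might look like this; `signed_counts` is an illustrative name, and whether summing is the wanted behaviour is an assumption:

```r
signed_counts <- function(a, len) {
  # count positive and negative occurrences of each position separately
  tabulate(a[a > 0], nbins = len) - tabulate(-a[a < 0], nbins = len)
}
a <- c(8, -2, -7, 1)
signed_counts(a, 9)
## [1]  1 -1  0  0  0  0 -1  1  0
signed_counts(c(8, -2, 8), 9)
## [1]  0 -1  0  0  0  0  0  2  0
```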
Better way to adding elements in data frame without looping in R
I want to create a data frame that contains the odds ratio with its standard error and confidence interval, computed in R. I have a dataset similar to this one:
dat <- read.table(header = TRUE, text = "
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 target
0 0 1 0 0 0 0 0 0 0 0 0
1 1 1 0 0 0 0 0 1 0 0 1
0 0 0 0 0 0 0 0 0 0 0 1
1 0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 1 0 1
1 1 1 1 1 0 0 0 0 0 0 0")
And I created a function that calculates everything I need for a particular feature in the dataset:
get_ci <- function(df, feature) {
  tab <- table(df[[feature]], df$target)
  a <- tab[1,1]
  b <- tab[1,2]
  c <- tab[2,1]
  d <- tab[2,2]
  odds_ratio <- (a/b)/(c/d)
  standard_error <- sqrt(1/a + 1/b + 1/c + 1/d)
  log_ci_lower <- log(odds_ratio) - 1.96 * standard_error
  log_ci_upper <- log(odds_ratio) + 1.96 * standard_error
  ci_lower <- exp(log_ci_lower)
  ci_upper <- exp(log_ci_upper)
  df <- data.frame(Feature = feature,
                   `Odds Ratio` = odds_ratio,
                   `Standard Error` = standard_error,
                   `Lower Bound CI` = ci_lower,
                   `Upper Bound CI` = ci_upper
  )
}
I want to create a data frame that contains the odds ratio, standard error, and confidence interval for each feature (f1-f11). What is the most efficient way to do this? I am currently creating an empty data frame and looping through the features to populate it, but I feel like this is not the right way to do it. I was looking at the apply functions, but I'm not sure how to use them with the function I created.
I think the first table line in the function should be:
tab <- table(factor(df[[feature]], levels = 0:1), df$target)
otherwise, if a particular column is all 1's or all 0's, the next lines would break. With that change, you can use lapply, passing the column names:
result <- do.call(rbind, lapply(paste0('f', 1:11), get_ci, df = dat))
Or using purrr's map_df:
library(purrr)
result <- map_df(paste0('f', 1:11), get_ci, df = dat)
Here's another solution:
get_ci <- function(x, target) {
  tab <- table(factor(x, levels=0:1), target) #changed
  ...
  ci_upper <- exp(log_ci_upper)
  c(`Odds Ratio` = odds_ratio, # changed
    `Standard Error` = standard_error,
    `Lower Bound CI` = ci_lower,
    `Upper Bound CI` = ci_upper
  )
}
as.data.frame(apply(dat[,1:11], 2, function(x) {
  get_ci(x, dat$target)
})) #changed
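Putting both suggestions together, a complete base-R sketch might look as follows. It uses the 95% interval exp(log(OR) -/+ 1.96 * SE) (written here as qnorm(0.975)), returns a named vector per feature and lets vapply() build one row per feature. The names `ci_mat` and the n00/n01/n10/n11 cell counts are illustrative; features that are constant in this small sample give NaN/Inf rather than an error:

```r
dat <- read.table(header = TRUE, text = "
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 target
0 0 1 0 0 0 0 0 0 0 0 0
1 1 1 0 0 0 0 0 1 0 0 1
0 0 0 0 0 0 0 0 0 0 0 1
1 0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 1 0 1
1 1 1 1 1 0 0 0 0 0 0 0")

get_ci <- function(x, target, z = qnorm(0.975)) {
  # levels = 0:1 keeps the 2x2 shape even when a column is constant
  tab <- table(factor(x, levels = 0:1), factor(target, levels = 0:1))
  cnt <- as.vector(tab)   # column-major: (x0,t0), (x1,t0), (x0,t1), (x1,t1)
  n00 <- cnt[1]; n10 <- cnt[2]; n01 <- cnt[3]; n11 <- cnt[4]
  odds_ratio <- (n00 / n01) / (n10 / n11)       # same as (a/b)/(c/d) in the question
  se <- sqrt(1 / n00 + 1 / n01 + 1 / n10 + 1 / n11)   # SE of log(odds ratio)
  c(odds_ratio = odds_ratio,
    se = se,
    ci_lower = exp(log(odds_ratio) - z * se),
    ci_upper = exp(log(odds_ratio) + z * se))
}

features <- paste0("f", 1:11)
# vapply() over the feature columns gives a 4 x 11 matrix; t() makes one row per feature
ci_mat <- t(vapply(dat[features], get_ci, numeric(4), target = dat$target))
result <- data.frame(Feature = features, ci_mat, row.names = NULL)
result
```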
error in design matrix: "Design matrix not of full rank. The following coefficients not estimable"
I have a design matrix for my data, shown below. When I run the commands to analyze and compare the different groups, I get an error. I would like to make these comparisons: L4vsL6.L8, Q3vsQ5.Q7, QvsL.
design matrix:
design
                organoids_biological_samples method
L4_D49_rep_1    L4                           L
L4_D49_rep_2    L4                           L
L6_L8_D49_rep_1 L6_L8                        L
L6_L8_D49_rep_2 L6_L8                        L
Q3_D49_rep_1    Q3                           Q
Q3_D49_rep_2    Q3                           Q
Q5_Q7_D49_rep_1 Q5_Q7                        Q
Q5_Q7_D49_rep_2 Q5_Q7                        Q
design$organoids_biological_samples <- factor(design$organoids_biological_samples, levels = c("L4","L6_L8", "Q3", "Q5_Q7"))
design$method <- factor(design$method, levels = c("L", "Q"))
all(rownames(design) %in% colnames(data))
all(rownames(design) == colnames(data))
Group <- factor(paste(design$organoids_biological_samples,design$method,sep="."))
design<- cbind(design,Group)
design.matrix <- model.matrix(~0+Group+method,design)
colnames(design.matrix) <- c("L4.L", "L6_L8.L", "Q3.Q", "Q5_Q7.Q", "method")
design.matrix
                L4.L L6_L8.L Q3.Q Q5_Q7.Q method
L4_D49_rep_1       1       0    0       0      0
L4_D49_rep_2       1       0    0       0      0
L6_L8_D49_rep_1    0       1    0       0      0
L6_L8_D49_rep_2    0       1    0       0      0
Q3_D49_rep_1       0       0    1       0      1
Q3_D49_rep_2       0       0    1       0      1
Q5_Q7_D49_rep_1    0       0    0       1      1
Q5_Q7_D49_rep_2    0       0    0       1      1
attr(,"assign")
[1] 1 1 1 1 2
attr(,"contrasts")
attr(,"contrasts")$Group
[1] "contr.treatment"
attr(,"contrasts")$method
[1] "contr.treatment"
edgeR.dgelist = DGEList(counts = data,group = Group)
edgeR.dgelist = calcNormFactors(edgeR.dgelist,method = "TMM")
CommonDisp <- estimateGLMCommonDisp(edgeR.dgelist, design.matrix)
Error in glmFit.default(y, design = design, dispersion = dispersion, offset = offset, :
  Design matrix not of full rank. The following coefficients not estimable:
  method
There is an exact linear dependency in your design: Q3.Q + Q5_Q7.Q = method. Therefore your model cannot find a unique set of coefficients for these columns: if B1, B2, B3 are optimal coefficients for Q3.Q, Q5_Q7.Q and method, then so are B1 + x, B2 + x, B3 - x for any real number x, because the fitted values do not change. Numerically, your program cannot invert the matrix to find the optimal coefficients. Consider testing whether it is better to include both Q3.Q and Q5_Q7.Q or just method in your model.
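To see the dependency directly, one can rebuild a design of the same shape in base R and compare the number of columns with the numerical rank reported by qr(). The column names below are the ones model.matrix() produces by default (before the renaming in the question). Dropping method gives a full-rank design; the three wanted comparisons can then be written as contrasts of the four group coefficients. The contrast weights are an assumption about what each comparison should mean:

```r
design <- data.frame(
  samples = factor(rep(c("L4", "L6_L8", "Q3", "Q5_Q7"), each = 2),
                   levels = c("L4", "L6_L8", "Q3", "Q5_Q7")),
  method = factor(rep(c("L", "Q"), each = 4), levels = c("L", "Q"))
)
design$Group <- factor(paste(design$samples, design$method, sep = "."))

mm <- model.matrix(~ 0 + Group + method, design)
ncol(mm)       # 5 columns ...
qr(mm)$rank    # ... but only rank 4
all(mm[, "methodQ"] == mm[, "GroupQ3.Q"] + mm[, "GroupQ5_Q7.Q"])

# Full-rank alternative: one coefficient per group
mm_full <- model.matrix(~ 0 + Group, design)
qr(mm_full)$rank == ncol(mm_full)

# Comparisons as contrasts of the four group coefficients
contr <- cbind(L4vsL6.L8 = c(1, -1, 0, 0),
               Q3vsQ5.Q7 = c(0, 0, 1, -1),
               QvsL      = c(-0.5, -0.5, 0.5, 0.5))
rownames(contr) <- colnames(mm_full)
contr
```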
Combine each element of a vector with another vector in R
I have two vectors
x <- c(2, 3, 4)
y <- rep(0, 5)
I want to get the following output:
> z
2, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0
How can I create z? I have tried to use paste and c, but nothing seems to work. The only thing I can think of is using a for() loop, and it is terribly slow. I have googled this and I am sure the solution is out there; I am just not hitting the right keywords.
UPDATE: For benchmarking purposes:
Using Nicola's solution:
> system.time(
+ precipitation <- `[<-`(numeric(length(x)*(length(y)+1)),seq(1,by=length(y)+1,length.out=length(x)),x)
+ )
   user  system elapsed
  0.419   0.407   0.827
This is ridiculously fast, I must say! Can someone please explain this to me? My for() loop, which I know is always wrong in R, would have taken at least a day, if it even finished.
The other suggestions:
> length(prate)
[1] 4914594
> length(empty)
[1] 207
> system.time(
+ precipitation <- unlist(sapply(prate, FUN = function(prate) c(prate,empty), simplify=FALSE))
+ )
   user  system elapsed
 16.470   3.859  28.904
I had to kill
len <- length(prate)
precip2 <- c(rbind(prate, matrix(rep(empty, len), ncol = len)))
after 15 minutes.
You can try this:
unlist(sapply(x, FUN = function(x) c(x,y), simplify=FALSE))
[1] 2 0 0 0 0 0 3 0 0 0 0 0 4 0 0 0 0 0
or, simpler (from @docendodiscimus):
unlist(lapply(x, FUN = function(x) c(x,y)))
This seems faster for some reason:
unlist(t(matrix(c(as.list(x),rep(list(y),length(x))),ncol=2)))
The above solution is general, in the sense that both x and y can have any value. In the OP's case, where y is made up of just 0's, this is as fast as it can be:
`[<-`(numeric(length(x)*(length(y)+1)),seq(1,by=length(y)+1,length.out=length(x)),x)
#[1] 2 0 0 0 0 0 3 0 0 0 0 0 4 0 0 0 0 0
Edit
I realise I've been very cryptic and the code I produced is not easy to follow, despite being just one line. I'm going to explain in detail what the second solution does.
First of all, notice that the resulting vector will contain the values in x plus the zeroes in y repeated length(x) times. So in total, it will be length(x) + length(x)*length(y), or length(x)*(length(y)+1), elements long. So we create a vector of zeroes that is as long as needed:
res<-numeric(length(x)*(length(y)+1))
Now we have to place the x values in res. Notice that the first value of x occupies the first position in res; the second comes length(y)+1 positions after the first, and so on, until all length(x) values are placed. We can create a vector of the indices at which to put the x values:
indices<-seq.int(1,by=length(y)+1,length.out=length(x))
And then we make the replacement:
res[indices]<-x
My line was just a shortcut for the three lines above. Hope this clarifies things a little.
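The two ideas can be wrapped as functions and compared: the index-assignment trick above (which assumes y is all zeros) and a general interleave via rbind() that works for any y. Both function names are illustrative:

```r
x <- c(2, 3, 4)
y <- rep(0, 5)

# Index trick: start from zeros and drop x into every (n_zeros + 1)-th slot
pad_with_zeros <- function(x, n_zeros) {
  res <- numeric(length(x) * (n_zeros + 1))
  indices <- seq.int(1, by = n_zeros + 1, length.out = length(x))
  res[indices] <- x
  res
}
# General version: column k is c(x[k], y); c() reads the matrix column by column
interleave <- function(x, y) {
  c(rbind(x, matrix(y, nrow = length(y), ncol = length(x))))
}

pad_with_zeros(x, length(y))
## [1] 2 0 0 0 0 0 3 0 0 0 0 0 4 0 0 0 0 0
identical(pad_with_zeros(x, length(y)), interleave(x, y))
## [1] TRUE
interleave(x, c(9, 8))
## [1] 2 9 8 3 9 8 4 9 8
```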
You could also try to vectorize as follows:
len <- length(x)
c(rbind(x, matrix(rep(y, len), ncol = len)))
## [1] 2 0 0 0 0 0 3 0 0 0 0 0 4 0 0 0 0 0
A more compact, but potentially slower option (contributed by @akrun) would be
c(rbind(x, replicate(len, y)))
## [1] 2 0 0 0 0 0 3 0 0 0 0 0 4 0 0 0 0 0
You can try:
c(sapply(x, 'c', y))
#[1] 2 0 0 0 0 0 3 0 0 0 0 0 4 0 0 0 0 0
Or a crazy solution with gsub and paste:
library(functional)
p = Curry(paste0, collapse='')
as.numeric(strsplit(p(gsub('(.*)$', paste0('\\1',p(y)),x)),'')[[1]])
#[1] 2 0 0 0 0 0 3 0 0 0 0 0 4 0 0 0 0 0
Here's another way (it relies on every value of x being a single digit and y being exactly five zeros):
options(scipen=100)
as.numeric(unlist(strsplit(as.character(x * 10^5), "")))
And some benchmarks:
library(microbenchmark)
microbenchmark({as.numeric(unlist(strsplit(as.character(x*10^5), "")))},
               {unlist(t(matrix(c(as.list(x),rep(list(y),length(x))),ncol=2)))},
               {unlist(sapply(x, FUN = function(x) c(x,y), simplify=FALSE))},
               times=100000)
Unit: microseconds
expr                                                                      min     lq     mean median     uq       max  neval
{ as.numeric(unlist(strsplit(as.character(x * 10^5), ""))) }            9.286 10.644 12.15242 11.678 12.286  1650.133 100000
{ unlist(t(matrix(c(as.list(x), rep(list(y), length(x))), ncol = 2))) } 9.485 11.164 13.25424 12.288 13.067  1887.761 100000
{ unlist(sapply(x, FUN = function(x) c(x, y), simplify = FALSE)) }      5.607  7.429  9.21015  8.147  8.784 30457.994 100000
And here's another idea (but it seems slow):
r = rle(1)
r$lengths = rep(c(1,5), length(x))
r$values = as.vector(rbind(x, 0))
inverse.rle(r)