Constrained Profit Maximization with Volume-dependent Production Cost - r

I am trying to maximize hourly profits from a power generation asset.
As far as I understood from my research, I might use quadprog::solve.QP.
I have already done most of the required data preparation, giving me a 96 x 5 data frame.
The columns include the following information:
Quarter Hour of a day
Power Price
Production Volume
Generation Cost
Profit
The first two columns are complete, which brings me to my quadratic optimization problem. The target function is as follows:
max Profit[i] = Volume[i] * (Price[i] - Cost[i])
The main issue is that the Generation Cost is a function of the Production Volume (which I have predetermined and which moreover depends on various static values).
In addition, the Production Volume in a given quarter hour must not differ from the preceding quarter hour's volume by more than, say, 20 megawatts (MW).
Finally, the Production Volume must not exceed a maximum production volume and must not fall below a minimum production volume.
I tried to implement the optimization problem as follows:
Volume = x1
Price = x2
Cost = x3
Profit = x1 * (x2 - x3) --> max
Profit = x1*x2 - x1*x3 --> max
with
x3 = f(x1)
subject to
x1(t) >= x1(t-1) - 20
x1(t) <= x1(t-1) + 20
x1(t) <= x1_max
x1(t) >= x1_min
From the quadprog manuals I read that I need to use
solve.QP(Dmat, dvec, Amat, bvec)
But I honestly don't know how to fill the two matrices and the two vectors.
Can anyone help?
I hope the information given is sufficient.
Cheers,
Tilman
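One possible way to fill Dmat, dvec, Amat and bvec (a minimal sketch, not Tilman's actual cost model: it assumes the marginal cost is linear in volume, Cost = a + b * Volume with b > 0, so the profit V * (Price - a - b * V) is quadratic in V; the prices, a, b, Vmin and Vmax below are placeholder values):
library(quadprog)

n    <- 96                   # quarter hours in a day
P    <- runif(n, 30, 80)     # hypothetical power prices (assumption)
a    <- 25                   # hypothetical cost intercept (assumption)
b    <- 0.05                 # hypothetical cost slope; b > 0 keeps Dmat positive definite
Vmin <- 0; Vmax <- 400       # hypothetical volume bounds in MW (assumption)
ramp <- 20                   # max change between quarter hours (MW)

# solve.QP minimizes 1/2 x'Dx - dvec'x, so maximizing
# sum( (P - a)*V - b*V^2 ) corresponds to Dmat = 2*b*I and dvec = P - a
Dmat <- 2 * b * diag(n)
dvec <- P - a

# Constraints are passed as t(Amat) %*% x >= bvec
D1   <- diff(diag(n))                 # row t computes V[t+1] - V[t]
Amat <- t(rbind( diag(n),             #  V >= Vmin
                -diag(n),             # -V >= -Vmax
                 D1,                  #  V[t+1] - V[t] >= -ramp
                -D1))                 #  V[t] - V[t+1] >= -ramp
bvec <- c(rep(Vmin, n), rep(-Vmax, n), rep(-ramp, n - 1), rep(-ramp, n - 1))

sol <- solve.QP(Dmat, dvec, Amat, bvec)
head(sol$solution)                    # optimal volumes per quarter hour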

Related

How to calculate Received Signal Strength (RSS) and path loss Error in dB?

Suppose the measured RSS is -70 dBm, the predicted RSS is -68 dBm, and the transmission power of the antenna is -12 dBm.
Is the following equation right? If not, how should the error be calculated?
Error = |10 * log10 (70/12) - 10 * log10 (68/12)| = 10 * log10 (70/68)
Also, my measurement is the RSS in dBm; how do I convert it into dB?
This often confuses folks in my experience, and as such warrants a thorough explanation.
The "m" in "dBm" means relative to 1 milliwatt. It is typically used for absolute measurements, whereas "regular" dBs are typically used for power gains/losses/diffs.
Example: 1 W in/tx, 0.5 W out/rx (1 milliwatt = 0.001 W).
In dBm (absolute, relative to 1 mW):
10log(1/0.001) = 30dBm
10log(0.5/0.001) = 27dBm
loss = 3dB
In dB (relative):
10log(1) = 0dB
10log(0.5) = -3dB
loss is still 3dB
(note there is an implied /1W in the second pair, since the argument to log must be unit-less, e.g. 0.5W/1W = 0.5 "flat" (aka no units))
So in the context of power differences, the m does not matter.
Things to note:
1/2 of power lost == -3dB gain (or +3dB loss)
power gains/losses in series are added/subtracted when in dBs -vs- multiplied/divided when in watts
0 watts == -infinity dB
0 dBm == 1 milliwatt
log here is base 10 (not 2 nor e)
relativeGainOrLoss = 10^(valueOfGainOrLossInDb/10)
valueOfPowerInMilliwatts = 10^(valueOfPowerInDbm/10)
In your example, I'll presume that by error you mean the error of the predicted loss relative to the measured loss:
predicted loss =
known transmission power - predicted RSS =
-12dBm - -68dBm =
56dB
measured loss =
known transmission power - measured RSS =
-12dBm - -70dBm =
58dB
error of predicted loss relative to measured loss =
|predicted loss - measured loss| =
|56dB - 58dB| =
2dB (== 2dBm, but for diffs we should drop the m)
Or more directly: 70 - 68 = 2 (so easy with dBs!)
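The same arithmetic in R, as a quick sketch (the values are from the example above):
tx       <- -12    # transmission power, dBm
rss_meas <- -70    # measured RSS, dBm
rss_pred <- -68    # predicted RSS, dBm

loss_pred <- tx - rss_pred          # 56 dB
loss_meas <- tx - rss_meas          # 58 dB
abs(loss_pred - loss_meas)          # 2 dB error

dbm_to_mw <- function(dbm) 10^(dbm / 10)   # 0 dBm == 1 milliwatt
dbm_to_mw(rss_meas)                        # 1e-07 mW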
This equates to a 63% error (or 58%, depending on how it is done):
10^(-2dB/10) = 0.63
10^(+2dB/10) = 1.584893192461113
In milliwatts:
10^(valueInDbm/10) =
10^(-70/10) = 0.0000001 milliwatts
10^(-68/10) = 0.000000158489319 milliwatts
10^(-12/10) = 0.063095734448019 milliwatts
As a sanity check:
(0.0000001/0.063095734448019 - 0.000000158489319/0.063095734448019) / (0.0000001/0.063095734448019) =
(0.000001584893192 - 0.000002511886428) / 0.000001584893192 =
-0.000000926993236 / 0.000001584893192 =
-0.584893190707832
(note that doing it in watts is much more laborious! (not to even mention float errors))
To answer your other question regarding:
Error = |10 * log10 (70/12) - 10 * log10 (68/12)| = 10 * log10 (70/68)
The first equation is nonsensical; as discussed above, for dBs we add/subtract -vs- multiply/divide. The second equality is nonetheless true, based on one of the rules of logs:
log a - log b = log (a/b)
When the RSS is in dBm, the path loss is the difference between the transmission power and the received RSS; the unit of the path loss in this case is dB.

How to simulate a dataset with a binary target in proportions determined 'a-priori'?

Can someone tell me the best way to simulate a dataset with a binary target?
I understand how a dataset can be simulated, but what I'm looking for is to determine 'a priori' the proportion of each class. I thought of changing the intercept to achieve it, but I couldn't get it to work and I don't know why. I guess the average is playing a trick on me.
set.seed(666)
x1 <- rnorm(1000)
x2 <- rnorm(1000)
p  <- 0.25                          # <<< I'm looking for 25%/75%
mean_z <- log(p / (1 - p))
b0 <- mean(mean_z - (4*x1 + 3*x2))  # = mean_z - mean(4*x1 + 3*x2)
z  <- b0 + 4*x1 + 3*x2              # only mean(z) equals mean_z, not each z[i]
mean(b0 + 4*x1 + 3*x2) == mean_z    # TRUE!!
pr <- 1 / (1 + exp(-z))
y  <- rbinom(1000, 1, pr)
mean(pr)                            # ~ 40% << not achieved
table(y) / 1000
What I'm looking for is to simulate the typical "logistic" problem, in which the binary target can be modeled as a linear combination of features.
These 'logistic' models assume that the log-odds of the binary variable behave linearly. That means:
log(p / (1 - p)) = z = b0 + b1*x1 + b2*x2, where p = Prob(y = 1)
Going back to my sample code, we could take, for example, z = 1.3 + 4*x1 + 2*x2, but then the class probability would be a result. Or instead we could choose the coefficient b0 such that the probability is (statistically) similar to the one sought:
log(0.25 / 0.75) = b0 + 4*x1 + 2*x2
This is my approach, but there may be better ones.
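One direct way to hit the target proportion (a sketch of my own, not part of the answers below): solve numerically for the b0 that makes the average implied probability equal to p, e.g. with uniroot:
set.seed(666)
x1 <- rnorm(1000); x2 <- rnorm(1000)
p  <- 0.25
# find b0 such that mean( plogis(b0 + 4*x1 + 3*x2) ) == p
f  <- function(b0) mean(plogis(b0 + 4*x1 + 3*x2)) - p
b0 <- uniroot(f, c(-50, 50))$root
pr <- plogis(b0 + 4*x1 + 3*x2)
y  <- rbinom(1000, 1, pr)
mean(pr)        # ~ 0.25
table(y) / 1000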
I gather that you are considering a logistic regression model, right? If so, one way to generate a data set is to create two Gaussian bumps and say that one is class 1 and the other is class 0. Then generate 25 items from class 1 and 75 items from class 0. Then each generated item plus its label is a datum or record or whatever you want to call it.
Obviously you can choose any proportions of 1's and 0's. It is also interesting to make the problem "easy" by making the Gaussian bumps farther apart (i.e. variances smaller in comparison to difference of means) or "hard" by making the bumps overlapping (i.e. variances larger compared to difference of means).
EDIT: In order to make sample data which correspond exactly to a logistic regression model, just make the variances of the two Gaussian bumps the same. When the variances (by this I mean specifically the covariance matrix) are the same, the surfaces of equal posterior class probability are planes; when the covariances are different, the surfaces of equal probability are quadratics. This is a standard result which will appear in many textbooks. I also have some notes online about this, which I can locate if it will help.
Aside from generating the two classes separately and then merging the results into one set, you can also sample from a single distribution over x, plug x into a logistic regression model with some weights (which you choose by any means you wish), and then use the resulting output as a probability for a coin toss. This method isn't guaranteed to output proportions that correspond exactly to prior class probabilities.
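A minimal sketch of the two-bumps idea described above (my own illustration; the means and the common variance are arbitrary choices):
set.seed(42)
n  <- 1000
p1 <- 0.25                     # desired proportion of class 1
n1 <- round(n * p1); n0 <- n - n1

# two Gaussian bumps with equal variance, so the true posterior is logistic
class1 <- cbind(rnorm(n1, 2, 1), rnorm(n1, 2, 1))
class0 <- cbind(rnorm(n0, 0, 1), rnorm(n0, 0, 1))

dat <- data.frame(rbind(class1, class0), y = c(rep(1, n1), rep(0, n0)))
names(dat) <- c("x1", "x2", "y")
mean(dat$y)                    # exactly 0.25 by construction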

How to solve a portfolio optimization with a generalised objective function?

I have a portfolio of 5 stocks for which I want to find an optimal mix of minimizing portfolio variance and maximizing expected future dividends. The latter is from analysts' forecasts. My problem is that I know how to solve a minimum-variance problem, but I am not sure how to put the quadratic form into the right matrix form for the objective function of quadprog.
The standard minimum-variance problem reads
Min! ( portfolio volatility )
where r holds the 252 daily returns of the five stocks and d the expected yearly dividend yields (firm_A pays 1%, firm_B pays 2%, etc.),
and I have programmed it as follows:
dat = rnorm( 252*5, mean = 0, sd = 1 )  # 252 daily returns for 5 stocks
r = matrix( dat, nr = 252, nc = 5 )
d = matrix( c( 1, 2, 1, 2, 2 ) )
library(quadprog)
# Dmat (covariance) and dvec (penalized returns) are generated easily
risk.param = 0.5
Dmat = cov(r)
Dmat[is.na(Dmat)]=0
dvec = matrix(colMeans(r) * risk.param)
dvec[is.na(dvec)]=1e-5
# The weights sum up to 1
n = 5
A = matrix( rep( 1, n ), nr = n )
b = 1
meq = 1
res = solve.QP( Dmat, dvec, A, b, meq = 1 )
Obviously, the returns in r are standard normal, hence each stock gets about a 20% weight.
Q1: How can I account for the fact that firm_A pays a dividend of 1, firm_B a dividend of 2, etc?
The new objective function reads:
Max! ( 0.5 * Portfolio_div - 0.5 * Portfolio_variance )
but I don't know how to hard-code it. The portfolio variance was easy to put into Dmat, but the new objective function has the element Portfolio_div, defined as Portfolio_div = w * d, where w holds the five weights.
Thanks a lot.
EDIT: Maybe it makes sense to add a higher-level description of the problem:
I am able to use a minimum-variance optimization with the code above. Minimizing the portfolio variance means optimizing the weights on the variance-covariance matrix Dmat (of dimension 5x5). However, I want to add an additional part to the optimization: the dividends in d multiplied by the weights (hence of dimension 5x1). The same weights are also used for Dmat.
Q2: How can I add the vector d to the code?
EDIT2: I guess the answer is to simply use
dvec = -1/d
as I would maximize expected dividends by minimizing their negative reciprocal.
Q3: Could someone please tell me if that's right?
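For what it's worth, here is one reading of how the dividend vector could enter dvec directly (a sketch, not a confirmed answer): solve.QP minimizes 1/2 w'Dw - dvec'w, so maximizing 0.5 * w'd - 0.5 * w'Sigma w is the same as minimizing 0.5 * w'Sigma w - 0.5 * d'w, i.e. Dmat = Sigma and dvec = 0.5 * d (rather than -1/d):
library(quadprog)
set.seed(1)
r <- matrix(rnorm(252 * 5), nrow = 252, ncol = 5)  # toy standard-normal returns
d <- c(1, 2, 1, 2, 2)                              # expected dividend yields

Dmat <- cov(r)       # 1/2 w'Dw matches 0.5 * portfolio variance
dvec <- 0.5 * d      # linear dividend term

# weights sum to 1 (equality); weights >= 0 (no short selling, an extra assumption)
Amat <- cbind(rep(1, 5), diag(5))
bvec <- c(1, rep(0, 5))
res  <- solve.QP(Dmat, dvec, Amat, bvec, meq = 1)
round(res$solution, 4)   # tilts toward the higher-dividend stocks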
Opening a can of worms:
TLDR: While I respect the great work Harry MARKOWITZ (1990 Nobel prize) has performed, I appreciate his wonderful CACI Simulations spin-off deterministic simulation framework COMET III much more than the portfolio-theory assumption that variance per se is the ruling minimiser driver for the portfolio optimisation process.
Driving this principal point of view further (which may still meet the somewhat ill-formed motivation of big funds that live happily from their 2-by-20 fees, due to the nature and scale of "their" skewed perception of what direct losses are -- which they recognise as non-acquired hefty & risk-free management fees associated with crowd-panic churn and the resulting AUM erosion, rather than the real profits & losses gained from their (in)ability to deliver any above-average AUM returns), and closer to your idea: the problem is in the proper formulation of the { penalty | utility } function.
While variance is taken in classical efficient-frontier theory as a penalty factor, operated in a min! global search, it has not much to do with real profit generation. You get penalised even for positive-side variance components, which is nonsense per se.
On the contrary, the dividend is a direct benefit, an absolute utility, entering the max! optimisation process.
So the first step in Q3 & Q1 ought to be the design of a consistent utility function, isolated from relative, revenue-unrelated factors, but containing all other absolute factors -- cost of entry, transaction costs, rebalancing costs -- as otherwise your utility model would be misleading your portfolio wealth-management strategy.
A2: Without this a-priori designed property, no one may claim a model is worth a single CPU-hour to even start the model's global optimisation efforts.

Optimization with minimum order quantities using R

I'm new to optimization and I need to implement it in a simple scenario:
There exists a car manufacturer that can produce 5 models of cars/vans. Associated with each model that can be produced is a number of labor hours required and a number of tons of steel required, as well as a profit that is earned from selling one such car/van. The manufacturer currently has a fixed amount of steel and labor available, which should be used in such a way that it optimizes total profit.
Here's the part I'm hung up on: each car also has a minimum order quantity. The company must manufacture a certain number of each model before it becomes economically viable to produce/sell that model. This would be easy to send to optim() if it were not for that final condition, because the `lower = ...` argument can be given a vector with the minimum order quantities, but then it does not consider 0 as an option. Could someone help me solve this, taking into account the minimum order but still allowing for an order of 0? Here's how I've organized the relevant information/constraints:
Dorian <- data.frame(Model    = c('SmCar', 'MdCar', 'LgCar', 'MdVan', 'LgVan'),
                     SteelReq = c(1.5, 3, 5, 6, 8),
                     LabReq   = c(30, 25, 40, 45, 55),
                     MinProd  = c(1000, 1000, 1000, 200, 200),
                     Profit   = c(2000, 2500, 3000, 5500, 7000))
Materials <- data.frame(Steel = 6500, Labor = 65000)

NetProfit <- function(x) {
  sum(Dorian$Profit * x)
}

LowerVec <- Dorian$MinProd  # Or 0 -- how would I add this option?
UpperVec <- apply(rbind(Materials$Labor / Dorian$LabReq,
                        Materials$Steel / Dorian$SteelReq), 2, min)

# Attempt at using optim()
optim(c(0, 0, 0, 0, 0), NetProfit, lower = LowerVec, upper = UpperVec)
Eventually I would like to substitute random variables with known distributions for parameters such as Profit and LabReq (labor required) and wrap this into a function that will take Steel and Labor available as inputs as well as parameters for the random variables. I will want to simulate many times and then find the average solution given specific parameters for the Profit and Labor Required, so ideally this optimization would also be fast so that I could perform the simulations. Thanks in advance for any help!
If you are not familiar with Linear Programming, start here: http://en.wikipedia.org/wiki/Linear_programming
Also have a look at the part about Mixed-Integer Programming: http://en.wikipedia.org/wiki/Mixed_integer_programming#Integer_unknowns. That's when the variables you are solving for are not all continuous, but also include booleans or integers.
In all respects, your problem is a mixed-integer program (to be exact, an integer program), as you are solving for integers: the number of vehicles to produce for each model.
There are known algorithms for solving these and thankfully, they are already wrapped into R packages for you. Rglpk is one of them, and I'll show you how to formulate your problem so you can use its Rglpk_solve_LP function.
Let x1, x2, x3, x4, x5 be the variables you are solving for: the number of vehicles to produce for each model.
Your objective is:
Profit = 2000 x1 + 2500 x2 + 3000 x3 + 5500 x4 + 7000 x5.
Your steel constraint is:
1.5 x1 + 3 x2 + 5 x3 + 6 x4 + 8 x5 <= 6500
Your labor constraint is:
30 x1 + 25 x2 + 40 x3 + 45 x4 + 55 x5 <= 65000
Now comes the hard part: modeling the minimum production requirements. Let's take the first one as an example: the minimum production requirement on x1 requires that at least 1000 vehicles be produced (x1 >= 1000) or that no vehicle be produced at all (x1 = 0). To model that requirement, we are going to introduce a boolean variable z1. By boolean, I mean z1 can only take two values: 0 or 1. The requirement can be modeled as follows:
1000 z1 <= x1 <= 9999999 z1
Why does this work? Consider the two possible values for z1:
if z1 = 0, then x1 is forced to 0
if z1 = 1, then x1 is forced to be at least 1000 (the minimum production requirement) and smaller than 9999999, which I picked as an arbitrarily big number.
Repeating this for each model, you will have to introduce similar boolean variables (z2, z3, z4, z5). In the end, the solver will not only be solving for x1, x2, x3, x4, x5 but also for z1, z2, z3, z4, z5.
Putting all this into practice, here is the code for solving your problem. We are going to solve for the vector x = (x1, x2, x3, x4, x5, z1, z2, z3, z4, z5)
library(Rglpk)
num.models <- nrow(Dorian)

# only x1, x2, x3, x4, x5 contribute to the total profit
objective <- c(Dorian$Profit, rep(0, num.models))

constraints.mat <- rbind(
  c(Dorian$SteelReq, rep(0, num.models)),           # total steel used
  c(Dorian$LabReq,   rep(0, num.models)),           # total labor used
  cbind(-diag(num.models), +diag(Dorian$MinProd)),  # MinProd_i * z_i - x_i <= 0
  cbind(+diag(num.models), -diag(rep(9999999, num.models))))  # x_i - 9999999 * z_i <= 0

constraints.dir <- c("<=",
                     "<=",
                     rep("<=", num.models),
                     rep("<=", num.models))

constraints.rhs <- c(Materials$Steel,
                     Materials$Labor,
                     rep(0, num.models),
                     rep(0, num.models))

var.types <- c(rep("I", num.models),  # x1, x2, x3, x4, x5 are integers
               rep("B", num.models))  # z1, z2, z3, z4, z5 are booleans

Rglpk_solve_LP(obj   = objective,
               mat   = constraints.mat,
               dir   = constraints.dir,
               rhs   = constraints.rhs,
               types = var.types,
               max   = TRUE)
# $optimum
# [1] 6408000
#
# $solution
# [1] 1000    0    0  202  471    1    0    0    1    1
#
# $status
# [1] 0
So the optimal solution is to create (1000, 0, 0, 202, 471) vehicles of each respective model, for a total profit of 6,408,000.
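As a quick sanity check of that solution against the constraints (using the data frames from the question):
x <- c(1000, 0, 0, 202, 471)
sum(Dorian$SteelReq * x)   # 6480    <= 6500
sum(Dorian$LabReq * x)     # 64995   <= 65000
sum(Dorian$Profit * x)     # 6408000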

Constrained Optimisation Problems in R

I am trying to set up an optimisation script that will look at a set of models, fit curves to the models and then optimise across them, subject to a few parameters.
Essentially, I have revenue as a diminishing-returns function of cost, and I have this for multiple portfolios, say 4 or 5. As an input, I have cost and revenue figures at set increments. What I want to do is fit a curve of the form Revenue = A * Cost^B to each portfolio, and then optimise across the different portfolios to find the optimal cost split between them for a set budget.
The code below (I apologise for its inelegance; I'm sure there are MANY improvements to be made!) essentially reads in my data (in this case, a simulation), creates the necessary data frames (this is likely where my inelegance comes in), calculates the curve coefficients for each simulation, and produces graphics to check the fitted curve against the data.
My problem is that now I have 5 curves of the form:
revenue = A * Cost ^ B (different A, B and cost for each function)
And I want to know, given the 5 variables, how I should split my cost between them; so I want to optimise the sum of the 5 curves subject to
Cost <= Budget
I know that I need to use constrOptim, but I have spent literally hours banging my head against my desk (literally hours, not literally banging my head...) and I still can't figure out how to set up the function so that it maximises revenue, subject to the cost constraint...
Any help here would be greatly appreciated, this has been bugging me for weeks.
Thanks!
Rich
## clear all previous data
rm(list = ls())

## read in data
sim  <- read.table("input19072011.txt", header = TRUE)
sim2 <- data.frame(sim$Wrevenue, sim$Cost)

## identify how many simulations there are -- change the 20 to the number of
## steps, but all simulations must have the same number of steps
portfolios <- length(sim2$sim.Cost) / 20

## create a data frame to hold the fitted coefficients
a <- rep(1, portfolios)
b <- rep(2, portfolios)
matrix <- data.frame(a, b)

## fit a curve to each simulation's block of 20 rows
k <- 1
j <- 20
for (i in 1:portfolios) {
  test  <- sim2[k:j, ]
  rev9  <- test[, 1]
  cost9 <- test[, 2]
  ds    <- data.frame(rev9, cost9)
  rhs   <- function(cost, b0, b1) {
    b0 * cost^b1
  }
  m <- nls(rev9 ~ rhs(cost9, intercept, power), data = ds,
           start = list(intercept = 5, power = 1))
  matrix[i, 1] <- summary(m)$coefficients[1]
  matrix[i, 2] <- summary(m)$coefficients[2]
  k <- k + 20
  j <- j + 20
}
## now there exists a matrix of all of the variables for the curves to optimise
matrix
multiples  <- matrix[, 1]
powers     <- matrix[, 2]
coststarts <- rep(0, portfolios)

## check accuracy of curves
k <- 1
j <- 20
for (i in 1:portfolios) {
  dev.new()
  plot(sim$Wrevenue[k:j])
  lines(multiples[i] * (sim$Cost[k:j]^powers[i]))
  k <- k + 20
  j <- j + 20
}
If you want to find the values cost[1], ..., cost[5] that maximize revenue[1] + ... + revenue[5] subject to the constraints cost[1] + ... + cost[5] <= budget (and 0 <= cost[i] <= budget), you can parametrize the set of feasible solutions as follows:
cost[1] = s(x[1]) * budget
cost[2] = s(x[2]) * ( budget - cost[1] )
cost[3] = s(x[3]) * ( budget - cost[1] - cost[2] )
cost[4] = s(x[4]) * ( budget - cost[1] - cost[2] - cost[3] )
cost[5] = budget - cost[1] - cost[2] - cost[3] - cost[4]
where x[1], ..., x[4] are the parameters to find (with no constraints on them) and s is any bijection between the real line R and the interval (0,1).
# Sample data
a <- rlnorm(5)
b <- rlnorm(5)
budget <- rlnorm(1)

# Reparametrization: s maps R onto (0,1)
s <- function(x) exp(x) / (1 + exp(x))
cost <- function(x) {
  cost <- rep(NA, 5)
  cost[1] <- s(x[1]) * budget
  cost[2] <- s(x[2]) * (budget - cost[1])
  cost[3] <- s(x[3]) * (budget - cost[1] - cost[2])
  cost[4] <- s(x[4]) * (budget - cost[1] - cost[2] - cost[3])
  cost[5] <- budget - cost[1] - cost[2] - cost[3] - cost[4]
  cost
}

# Function to maximize
f <- function(x) {
  result <- sum(a * cost(x)^b)
  cat(result, "\n")
  result
}

# Optimization (fnscale = -1 turns optim's minimization into maximization)
r <- optim(c(0, 0, 0, 0), f, control = list(fnscale = -1))
cost(r$par)
