lpSolve - find maximum value - formula with multiplications - r

I'm looking for a way to use lpSolve in R in a similar way to how I use it successfully in Excel. I've calculated the elasticity for various products. Based on whether a product is elastic or not, I want to give advice on which price to ask.
I have the following values:
current.price = 15
sales.lastmonth = 50
elasticity = -1.5
And I want to optimize the sales.prediction by changing the suggested.price
Formula:
sales.prediction = (sales.lastmonth - ((abs(elasticity)*(suggested.price-current.price))*(sales.lastmonth/current.price)))*suggested.price
I've tried the following:
# install.packages("lpSolve")
library(lpSolve)
objective.fn <- c(sales.prediction) # determine what the objective is
# constrain sales.lastmonth and current.price
const.mat <- matrix(c(1,1),ncol=1,byrow=T)
const.dir <- c("=","=")
const.rhs <- c(sales.lastmonth, current.price)
lp("max",objective.fn,const.mat,const.dir,const.rhs,compute.sens=TRUE)
Any suggestions?
-----------
EDIT - V2: Based on the comment below, I would like to add the constraint that the suggested.price should be at most 25% higher or 25% lower than the price I already have in mind without the optimization. For example, let's say I am already thinking about lowering the price to 12.5. Is that possible as well?

If I understand correctly, you are simply trying to optimize (maximize) this function; there are no constraints, since your variables are constant. If this is true, then you can do
optimize(
  f = function(x) {
    sales.lastmonth - ((abs(elasticity) * (x - current.price)) * (sales.lastmonth / current.price)) * x
  },
  interval = c(-100, 100),
  maximum = TRUE
)
$maximum
[1] 7.5
$objective
[1] 331.25
maximum at 7.5.
Edit: a simple hack to limit the optimization is to use the interval argument. For your edit, center the interval on the price you already have in mind rather than on current.price, e.g. interval=c(12.5*0.75, 12.5*1.25).
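The function passed to optimize() above drops the outer parentheses of the question's formula, so it maximises sales.lastmonth - (...) * x rather than (sales.lastmonth - ...) * x. A minimal sketch that keeps the parentheses exactly as written in the question and limits the search to 25% either side of a price already in mind; planned.price = 12.5 is taken from the edit, and the variable name is hypothetical:

```r
current.price   <- 15
sales.lastmonth <- 50
elasticity      <- -1.5
planned.price   <- 12.5  # hypothetical name: the price you already have in mind

# The question's formula, with its outer parentheses kept
sales.prediction <- function(suggested.price) {
  (sales.lastmonth -
     (abs(elasticity) * (suggested.price - current.price)) *
     (sales.lastmonth / current.price)) * suggested.price
}

# Search only between 25% below and 25% above the planned price
res <- optimize(sales.prediction,
                interval = c(planned.price * 0.75, planned.price * 1.25),
                maximum = TRUE)
res$maximum
res$objective
```

With these numbers the formula simplifies to (125 - 5x) * x, whose maximum is at x = 12.5 with value 781.25, so optimize() should return a value very close to 12.5.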


Data perturbation - How to perform it?

I am doing some projects related to statistical simulation using R, based on "Introduction to Scientific Programming and Simulation Using R". In the Student projects section (chapter 24) I am doing "The pipe spiders of Brunswick" problem, but I am stuck on one part of an evolutionary algorithm, where you need to perform some data perturbation according to the sentence below:
"With probability 0.5 each element of the vector is perturbed, independently
of the others, by an amount normally distributed with mean 0 and standard
deviation 0.1"
What does being "perturbed" really mean here? I don't really know which operation I should perform on my vector to make this perturbation happen, and I'm not finding any answers to this problem.
Thanks in advance!
# using the most important features, we create a ML model:
m1 <- lm(PREDICTED_VALUE ~ PREDICTER_1 + PREDICTER_2 + PREDICTER_N )
#summary(m1)
#anova(m1)
# after creating the model, we perturb as follows:
#install.packages("perturb") #install the package
library(perturb)
set.seed(1234) # for the same results each time you run the code
p1_new <- perturb(m1, pvars = c("PREDICTER_1", "PREDICTER_N"), prange = c(1, 1), niter = 200) # you can change the number of iterations to any value n. Total number of iterations would come to be n+1
p1_new # check the values of p1_new
summary(p1_new)
Perturbing just means adding a small, noisy shift to a number. Your code might look something like this.
x = sample(10, 10)
ind = rbinom(length(x), 1, 0.5) == 1
x[ind] = x[ind] + rnorm(sum(ind), 0, 0.1)
rbinom selects each element for modification with probability 0.5, and rnorm adds the perturbation.
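The three lines above can be wrapped into a small reusable function, which is convenient as the mutation step of an evolutionary algorithm. A minimal sketch; the name perturb_vector is hypothetical, and its defaults (prob = 0.5, sd = 0.1) are taken from the quoted exercise:

```r
# Hypothetical helper: perturb each element independently with probability
# `prob` by adding N(0, sd^2) noise; untouched elements are returned as-is.
perturb_vector <- function(x, prob = 0.5, sd = 0.1) {
  ind <- rbinom(length(x), 1, prob) == 1   # which elements get perturbed
  x[ind] <- x[ind] + rnorm(sum(ind), mean = 0, sd = sd)
  x
}

set.seed(1)
x <- as.numeric(sample(10, 10))
y <- perturb_vector(x)
cbind(original = x, perturbed = y)
```

Roughly half of the entries of y should differ from x by a small amount, and the rest should be identical.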

Extracting coefficients from mcmc object

Extracting stuff from objects has always been one of the most confusing aspects of R to me. I've fitted a Bayesian linear regression model using rjags and have the following mcmc object:
summary(m_csim)
Iterations = 1:150000
Thinning interval = 1
Number of chains = 1
Sample size per chain = 150000
1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:
            Mean        SD  Naive SE Time-series SE
BR2     0.995805 0.0007474 1.930e-06      3.527e-06
BR2adj  0.995680 0.0007697 1.987e-06      3.633e-06
b[1]   -5.890842 0.1654755 4.273e-04      1.289e-02
b[2]    1.941420 0.0390239 1.008e-04      1.991e-03
b[3]    1.056599 0.0555885 1.435e-04      5.599e-03
sig2    0.004678 0.0008333 2.152e-06      3.933e-06
2. Quantiles for each variable:
            2.5%       25%       50%       75%    97.5%
BR2     0.994108  0.995365  0.995888  0.996339  0.99702
BR2adj  0.993932  0.995227  0.995765  0.996229  0.99693
b[1]   -6.210425 -6.000299 -5.894810 -5.784082 -5.55138
b[2]    1.867453  1.914485  1.940372  1.967466  2.02041
b[3]    0.942107  1.020846  1.057720  1.094442  1.16385
sig2    0.003321  0.004082  0.004585  0.005168  0.00657
In order to extract the coefficients' means I did b = colMeans(mod_csim)[3:5]. I want to calculate credible intervals, so I need to extract the 0.025 and 0.975 quantiles too. How do I do that programmatically?
You can probably extract the quantiles directly. As others have pointed out, you can call str(m_csim), and you can limit the output of the str call with str(m_csim, max.level=1), and keep adding one to the max.level= argument until you see something that looks like quantiles.
What I like to do is to convert the MCMC output to a data.frame so it's easier to work with. I use jagsUI rather than rjags, but I often do something like
mcmc_df <- as.data.frame(as.matrix(MY_MCMC_OBJECT$samples))
Note: it might be a little different with rjags, but I'm sure you can find it with a little digging.
The benefit: I can then access a single vector for each mcmc_df$PARAMETER, and create a matrix of quantiles with
mcmc_quants <- apply(mcmc_df, 2, quantile, probs = c(0.025, 0.25, 0.5, 0.75, 0.975))
or whatever quantiles you want.
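A self-contained sketch of the apply()/quantile() approach above, using base R only. The simulated draws matrix is a hypothetical stand-in for the posterior samples (for a single-chain mcmc object such as m_csim, as.matrix(m_csim) should give the same iterations-by-parameters layout):

```r
# Hypothetical stand-in for posterior draws: one column per parameter,
# one row per iteration (values loosely mimic the summary above).
set.seed(1)
draws <- cbind("b[1]" = rnorm(5000, -5.89, 0.165),
               "b[2]" = rnorm(5000,  1.94, 0.039),
               "b[3]" = rnorm(5000,  1.06, 0.056))

post_mean <- colMeans(draws)
# 95% equal-tailed credible intervals: one row per parameter
ci95 <- t(apply(draws, 2, quantile, probs = c(0.025, 0.975)))
cbind(mean = post_mean, ci95)
```

The columns of ci95 are named "2.5%" and "97.5%", so they can also be picked by name, in the same way as the summary(...)$quantiles approach.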
You probably are looking for
model_summary_object <- summary(m_csim)
model_summary_object$quantiles[,c('2.5%','97.5%')]
I hope I'm not overstepping my knowledge bounds, but I want to answer from the point of view of 'in general' rather than specifically for rjags. m_csim is an object, and many methods can possibly be used on it. You've used the summary method to see something. As people have commented, there is probably a coef method. However, as someone else has commented (while I was replying!), using str() to see what an object contains is the best way to see what information is in an object and how to address it. I'd be very surprised if using str() doesn't show how to find not only the coefficients but also enough information on the confidence intervals to allow you to find the desired CI.

Struggling with simple constraints in constrOptim

I have a function in R that I wish to maximise subject to some simple constraints in optim or constrOptim, but I'm struggling to get my head around ui and ci to fit my constraints.
My function is:
negexpKPI <- function(alpha,beta,spend){
-sum(alpha*(1-exp(-spend/beta)))
}
where alpha and beta are fixed vectors, and spend is a vector of spends c(sp1,sp2,...,sp6) which I want to vary in order to maximise the output of negexpKPI. I want to constrain spend in three different ways:
1) Min and max for each of sp1,sp2,...,sp6, i.e.
0 < sp1 < 10000000
5000 < sp2 < 10000000
...
2) A total sum:
sum(spend)=90000000
3) A sum for some individual components:
sum(sp1,sp2)=5000000
Any help please? Open to any other methods that would work but would prefer base R if possible.
According to ?constrOptim:
The feasible region is defined by ‘ui %*% theta - ci >= 0’. The
starting value must be in the interior of the feasible region, but
the minimum may be on the boundary.
So it is just a matter of rewriting your constraints in matrix format. Note that an equality constraint is just two inequality constraints.
Now we can define in R:
## define by column
ui = matrix(c(1,-1,0,0,1,-1,1,-1,
0,0,1,-1,1,-1,1,-1,
0,0,0,0,0,0,1,-1,
0,0,0,0,0,0,1,-1,
0,0,0,0,0,0,1,-1,
0,0,0,0,0,0,1,-1), ncol = 6)
ci = c(0, -1000000, 5000, -1000000, 5000000, -5000000, 90000000, -90000000)
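Building ui and ci row by row makes it easier to keep nrow(ui) equal to length(ci). A minimal sketch with hypothetical helpers build_constraints() and slack() and hypothetical numbers. It also shows a catch worth checking: per ?constrOptim the starting value must be in the interior of the feasible region, and a pair of inequalities standing in for an equality leaves no interior, because the slack is exactly zero for any point that satisfies the equality. One common workaround is to eliminate each equality by substitution (for example sp2 = 5000000 - sp1) before calling constrOptim.

```r
# Hypothetical helper: build constrOptim()'s ui/ci (feasible region
# ui %*% theta - ci >= 0) from box bounds plus linear equalities.
build_constraints <- function(lower, upper, eq_rows = NULL, eq_rhs = NULL) {
  n  <- length(lower)
  I  <- diag(n)
  ui <- rbind(I, -I)          # theta >= lower  and  -theta >= -upper
  ci <- c(lower, -upper)
  if (!is.null(eq_rows)) {
    eq_rows <- matrix(eq_rows, ncol = n)
    # a %*% theta == b  becomes  a %*% theta >= b  and  -a %*% theta >= -b
    ui <- rbind(ui, eq_rows, -eq_rows)
    ci <- c(ci, eq_rhs, -eq_rhs)
  }
  list(ui = ui, ci = ci)
}

# Hypothetical helper: slack of every constraint at theta
# (a valid constrOptim() starting value needs all of these > 0)
slack <- function(cons, theta) drop(cons$ui %*% theta - cons$ci)

# Hypothetical 3-variable example: bounds plus sp1 + sp2 == 5000000
cons_eq  <- build_constraints(lower = c(0, 5000, 0), upper = rep(1e7, 3),
                              eq_rows = c(1, 1, 0), eq_rhs = 5e6)
cons_box <- build_constraints(lower = c(0, 5000, 0), upper = rep(1e7, 3))
theta0   <- c(2e6, 3e6, 1e6)   # satisfies every constraint

slack(cons_box, theta0)  # all strictly positive: usable as a start value
slack(cons_eq,  theta0)  # two zeros: theta0 lies on the boundary
```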
Additional Note
I think there is something wrong here: sp1 + sp2 = 5000000, but neither sp1 nor sp2 can be greater than 1000000, so there is no feasible region! Please fix your question first.
Sorry, I was using sample data that I hadn't fully checked. The true optimisation is for 40 sp values with 92 constraints; replicating that here in full would have made the problem more difficult to explain. I've added a few extra zeroes to make it feasible now.

How to solve a portfolio optimization with a generalised objective function?

I have a portfolio of 5 stocks for which I want to find an optimal mix of minimizing portfolio variance and maximizing expected future dividends. The latter come from analysts' forecasts. My problem is that I know how to solve a minimum-variance problem, but I am not sure how to put the quadratic form into the right matrix form for the objective function of quadprog.
The standard minimum variance problem reads
Min! ( portfolio volatility )
where r has the 252 daily returns of the five stocks, and d has the expected yearly dividend yields (firm_A pays 1%, firm_B pays 2%, etc.),
and I have programmed it as follows
dat = rep( rnorm( 10, mean = 0, sd = 1 ), 252*5 )
r = matrix( dat, nr = 252, nc = 5 )
d = matrix( c( 1, 2, 1, 2, 2 ) )
library(quadprog)
# Dmat (covariance) and dvec (penalized returns) are generated easily
risk.param = 0.5
Dmat = cov(r)
Dmat[is.na(Dmat)]=0
dvec = matrix(colMeans(r) * risk.param)
dvec[is.na(dvec)]=1e-5
# The weights sum up to 1
n = 5
A = matrix( rep( 1, n ), nr = n )
b = 1
meq = 1
res = solve.QP( Dmat, dvec, A, b, meq = 1 )
Obviously, the returns in r are standard normal, hence each stock gets about 20% weight.
Q1: How can I account for the fact that firm_A pays a dividend of 1, firm_B a dividend of 2, etc?
The new objective function reads:
Max! ( 0.5 * Portfolio_div - 0.5 * Portfolio_variance )
but I don't know how to hard-code it. The portfolio variance was easy to put into Dmat but the new objective function has the Portfolio_div element defined as Portfolio_div = w * d where w has the five weights.
Thanks a lot.
EDIT: Maybe it makes sense to add a higher-level description of the problem:
I am able to run a minimum-variance optimization with the code above. Minimizing the portfolio variance means optimizing the weights on the variance-covariance matrix Dmat (of dimension 5x5). However, I want to add an additional part to the optimization: the dividends in d multiplied by the weights (hence of dimension 5x1). The same weights are also used for Dmat.
Q2: How can I add the vector d to the code?
EDIT2: I guess the answer is to simply use
dvec = -1/d
as I maximize expected dividends by minimizing the inverse of the negative.
Q3: Could someone please tell me if that's right?
Opening a can of worms:
TL;DR: While I respect the great work Harry MARKOWITZ (1990 Nobel Prize) has done, I appreciate his wonderful CACI Simulations spin-off, the deterministic simulation framework COMET III, much more than the Portfolio theory assumption that variance per se is the ruling driver to minimise in the portfolio optimisation process.
Pushing this principal point of view further, and closer to your idea, the problem lies in the proper formulation of the { penalty | utility } function. (This view may still clash with the somewhat ill-formed motivation of big funds that live happily off their 2-by-20 fees: due to the nature and scale of "their" skewed perception, what they recognise as direct losses are the hefty, risk-free management fees they fail to collect when crowd-panic churn erodes AUM, rather than the real profits and losses resulting from their (in)ability to deliver any above-average AUM returns.)
While variance is treated in classical efficient-frontier theory as a penalty factor, operated in a min! global search, it has little to do with real profit generation. You get penalised even for positive-side variance components, which is nonsense per se.
On the contrary, the dividend is a direct benefit, an absolute utility, entering the max! optimisation process.
So the first step for Q3 and Q1 ought to be the design of a consistent utility function, isolated from relative, revenue-unrelated factors but containing all other absolute factors (cost of entry, transaction costs, rebalancing costs); otherwise your utility model would mislead your portfolio wealth-management strategy.
A2: Without this a priori designed property, no one may claim a model is worth even a single CPU-hour spent starting its global optimisation.
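For Q1 and Q2, it may help to line the objective up with the convention documented for quadprog's solve.QP, which minimises -t(dvec) %*% b + 1/2 t(b) %*% Dmat %*% b. Maximising 0.5 * w'd - 0.5 * w' Sigma w is the same as minimising 0.5 * w' Sigma w - 0.5 * d'w, so under that objective Dmat = cov(r) and dvec = 0.5 * d (rather than -1/d). A minimal sketch with simulated returns; div.weight and var.weight are hypothetical names for the two 0.5 factors, and the closed-form Lagrange solution is included only as a cross-check for this equality-only case:

```r
# Simulated daily returns (hypothetical data) and the question's dividends
set.seed(1)
n <- 5
r <- matrix(rnorm(252 * n), nrow = 252, ncol = n)
d <- c(1, 2, 1, 2, 2)

div.weight <- 0.5   # hypothetical name: weight on expected dividends
var.weight <- 0.5   # hypothetical name: weight on portfolio variance

# Maximising  div.weight * w'd - var.weight * w' Sigma w  is the same as
# minimising  -div.weight * w'd + 0.5 * w' (2 * var.weight * Sigma) w
Dmat <- 2 * var.weight * cov(r)
dvec <- div.weight * d

# Budget constraint sum(w) == 1 (meq = 1 makes it an equality)
A <- matrix(1, nrow = n, ncol = 1)
b <- 1

# Closed-form solution of the equality-constrained problem (Lagrange):
#   w = Dmat^{-1} (dvec + lambda * 1), with lambda chosen so sum(w) == 1
Dinv_d   <- solve(Dmat, dvec)
Dinv_one <- solve(Dmat, rep(1, n))
lambda   <- (1 - sum(Dinv_d)) / sum(Dinv_one)
w_closed <- Dinv_d + lambda * Dinv_one
print(round(w_closed, 4))

if (requireNamespace("quadprog", quietly = TRUE)) {
  res <- quadprog::solve.QP(Dmat, dvec, A, b, meq = 1)
  print(round(res$solution, 4))   # should match w_closed
}
```

As in the question's set-up, no non-negativity constraints are imposed, so individual weights may come out negative; adding columns to A (with meq unchanged) would be the place to restrict that.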

Chi squared goodness of fit for a geometric distribution

As an assignment, I had to develop an algorithm and generate samples from a given geometric distribution with PMF P(X = k) = p(1 - p)^(k - 1), k = 1, 2, ...
Using the inverse transform method, I came up with the following expression for generating the values: X = ceiling(log(U) / log(1 - p)),
where U represents a value, or n values depending on the size of the sample, drawn from a Unif(0,1) distribution, and p is 0.3 as stated in the PMF above.
I have the algorithm and its implementation in R, and I have already generated QQ plots to visually assess the fit of the empirical values to the theoretical ones (generated with R), i.e., whether the generated sample indeed follows the geometric distribution.
Now I wanted to submit the generated sample to a goodness of fit test, namely the Chi-square, yet I'm having trouble doing this in R.
[I think this was moved a little hastily, in spite of your response to whuber's question, since I think that before solving the 'how do I write this algorithm in R' problem, it's probably more important to deal with the 'what you're doing is not the best approach to your problem' issue (which certainly belongs where you posted it). Since it's here, I will deal with the 'doing it in R' aspect, but I would urge you to go back and ask about the second question (as a new post).]
Firstly the chi-square test is a little different depending on whether you test
H0: the data come from a geometric distribution with parameter p
or
H0: the data come from a geometric distribution with parameter 0.3
If you want the second, it's quite straightforward. First, with the geometric, if you want to use the chi-square approximation to the distribution of the test statistic, you will need to group adjacent cells in the tail. The 'usual' rule (much too conservative) suggests that you need an expected count of at least 5 in every bin.
I'll assume you have a nice large sample size. In that case, you'll have many bins with substantial expected counts and you don't need to worry so much about keeping it so high, but you will still need to choose how you will bin the tail (whether you just choose a single cut-off above which all values are grouped, for example).
I'll proceed as if n were say 1000 (though if you're testing your geometric random number generation, that's pretty low).
First, compute your expected counts:
dgeom(0:20,.3)*1000
[1] 300.0000000 210.0000000 147.0000000 102.9000000 72.0300000 50.4210000
[7] 35.2947000 24.7062900 17.2944030 12.1060821 8.4742575 5.9319802
[13] 4.1523862 2.9066703 2.0346692 1.4242685 0.9969879 0.6978915
[19] 0.4885241 0.3419669 0.2393768
Warning: dgeom and friends go from x=0, not x=1. While you can shift the inputs and outputs to the R functions, it's much easier to subtract 1 from all your geometric values and test that. I will proceed as if your sample has had 1 subtracted so that it starts at 0.
I'll cut that off at the 15th term (x=14), and group 15+ into its own group (a single group in this case). If you wanted to follow the 'greater than five' rule of thumb, you'd cut it off after the 12th term (x=11). In some cases (such as smaller p), you might want to split the tail across several bins rather than one.
> expec <- dgeom(0:14,.3)*1000
> expec <- c(expec, 1000-sum(expec))
> expec
[1] 300.000000 210.000000 147.000000 102.900000 72.030000 50.421000
[7] 35.294700 24.706290 17.294403 12.106082 8.474257 5.931980
[13] 4.152386 2.906670 2.034669 4.747562
The last cell is the "15+" category. We also need the probabilities.
Now we don't yet have a sample; I'll just generate one:
y <- rgeom(1000,0.3)
but now we want a table of observed counts:
(x <- table(factor(y,levels=0:14),exclude=NULL))
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 <NA>
292 203 150 96 79 59 47 25 16 10 6 7 0 2 5 3
Now you could compute the chi-square directly and then calculate the p-value:
> (chisqstat <- sum((x-expec)^2/expec))
[1] 17.76835
(pval <- pchisq(chisqstat,15,lower.tail=FALSE))
[1] 0.2750401
but you can also get R to do it:
> chisq.test(x,p=expec/1000)
Chi-squared test for given probabilities
data: x
X-squared = 17.7683, df = 15, p-value = 0.275
Warning message:
In chisq.test(x, p = expec/1000) :
Chi-squared approximation may be incorrect
Now the case for unspecified p is similar, but (to my knowledge) you can no longer get chisq.test to do it directly, you have to do it the first way, but you have to estimate the parameter from the data (by maximum likelihood or minimum chi-square), and then test as above but you have one fewer degree of freedom for estimating the parameter.
See the example of doing a chi-square for a Poisson with estimated parameter here; the geometric follows much the same approach as above, with the adjustments as at the link (dealing with the unknown parameter, including the loss of 1 degree of freedom).
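A minimal sketch of the estimated-parameter case, reusing the binning above (cells 0 to 14 plus a pooled 15+ cell) on a simulated, already-shifted sample. The maximum-likelihood estimate for the 0-based geometric is 1/(1 + mean(y)). Because it is computed from the ungrouped data rather than from the binned counts, using k - 2 degrees of freedom is an approximation here; the k - 2 result holds when the parameter is estimated from the grouped counts (e.g. by minimum chi-square).

```r
set.seed(42)
n <- 1000
y <- rgeom(n, 0.3)     # simulated sample, already on 0, 1, 2, ...

# Maximum-likelihood estimate of p for the 0-based geometric
p_hat <- 1 / (1 + mean(y))

# Expected counts for cells 0..14 plus a pooled "15+" tail cell
top   <- 14
expec <- dgeom(0:top, p_hat) * n
expec <- c(expec, n - sum(expec))

# Observed counts in the same cells (values above 14 pooled into "15")
obs <- as.vector(table(factor(pmin(y, top + 1), levels = 0:(top + 1))))

chisqstat <- sum((obs - expec)^2 / expec)
df        <- length(expec) - 1 - 1   # one extra df lost for estimating p
pval      <- pchisq(chisqstat, df, lower.tail = FALSE)
c(p_hat = p_hat, statistic = chisqstat, df = df, p.value = pval)
```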
Let us assume you've got your randomly-generated variates in a vector x. You can do the following:
x <- rgeom(1000,0.2)
x_tbl <- table(x)
x_val <- as.numeric(names(x_tbl))
x_df <- data.frame(count=as.numeric(x_tbl), value=x_val)
# Expand to fill in "gaps" in the values caused by 0 counts
all_x_val <- data.frame(value = 0:max(x_val))
x_df <- merge(all_x_val, x_df, by="value", all.x=TRUE)
x_df$count[is.na(x_df$count)] <- 0
# Get theoretical probabilities
x_df$eprob <- dgeom(x_df$value, 0.2)
# Chi-square test: once with asymptotic dist'n,
# once with bootstrap evaluation of chi-sq test statistic
chisq.test(x=x_df$count, p=x_df$eprob, rescale.p=TRUE)
chisq.test(x=x_df$count, p=x_df$eprob, rescale.p=TRUE,
simulate.p.value=TRUE, B=10000)
There's a "goodfit" function described as "Goodness-of-fit Tests for Discrete Data" in package "vcd".
library(vcd)
# a geometric distribution is a negative binomial with size = 1
G.fit <- goodfit(x, type = "nbinomial", par = list(size = 1))
I was going to use the code you had posted in an earlier question, but it now appears that you have deleted that code. I find that offensive. Are you using this forum to gather homework answers and then defacing it to remove the evidence? (Deleted questions can still be seen by those of us with sufficient rep, and the interface prevents deletion of question with upvoted answers so you should not be able to delete this one.)
Generate a QQ Plot for testing a geometrically distributed sample
--- question---
I have a sample of n elements generated in R with
sim.geometric <- function(nvals)
{
p <- 0.3
u <- runif(nvals)
ceiling(log(u)/log(1-p))
}
for which I want to test the distribution, specifically whether it indeed follows a geometric distribution. I want to generate a QQ plot but have no idea how to.
--------reposted answer----------
A QQ-plot should be a straight line when the sample is compared to a "true" sample drawn from a geometric distribution with the same probability parameter. One gives two vectors to the function, which essentially compares their inverse ECDFs at each quantile. (Your attempt is not particularly successful:)
sim.res <- sim.geometric(100)
sim.rgeom <- rgeom(100, 0.3)
qqplot(sim.res, sim.rgeom)
Here I follow the lead of the authors of qqplot's help page (which results in flipping that upper curve around the line of identity):
png("QQ.png")
qqplot(qgeom(ppoints(100),prob=0.3), sim.res,
main = expression("Q-Q plot for" ~~ {G}[n == 100]))
dev.off()
---image not included---
You can add a "line of good fit" by plotting a line through the 25th and 75th percentile points of each distribution. (I added jittering to get a better idea of where the "probability mass" was located:)
sim.res <- sim.geometric(500)
qqplot(jitter(qgeom(ppoints(500), prob = 0.3)), jitter(sim.res),
       main = expression("Q-Q plot for" ~~ {G}[n == 500]),
       ylim = c(0, max(qgeom(ppoints(500), prob = 0.3), sim.res)),
       xlim = c(0, max(qgeom(ppoints(500), prob = 0.3), sim.res)))
qqline(sim.res, distribution = function(p) qgeom(p, 0.3),
       prob = c(0.25, 0.75), col = "red")
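sim.geometric() returns values on 1, 2, ..., whereas qgeom() and rgeom() count from 0, so the plots above compare distributions that are offset by one. A minimal sketch that subtracts 1 first and compares against the identity line; the seed, n = 500 and the axis labels are arbitrary choices:

```r
# Inverse-transform generator from the question (support 1, 2, ...)
sim.geometric <- function(nvals) {
  p <- 0.3
  u <- runif(nvals)
  ceiling(log(u) / log(1 - p))
}

set.seed(123)
n <- 500
sim.res <- sim.geometric(n)
shifted <- sim.res - 1                      # move support to 0, 1, 2, ...
theo    <- qgeom(ppoints(n), prob = 0.3)    # theoretical quantiles

qqplot(theo, shifted,
       xlab = "qgeom(ppoints(n), 0.3)", ylab = "sim.res - 1",
       main = "Q-Q plot, shifted simulated sample")
abline(0, 1, col = "red")                   # points should fall near this line
```

Because both axes take only integer values, the points form a staircase; jitter() can be added as above if the overlap hides too much.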
