get.basis() in lpSolveAPI - r

I am confused by the return value of the function get.basis(). For example:
lprec <- make.lp(0, 4)
set.objfn(lprec, c(1, 3, 6.24, 0.1))
add.constraint(lprec, c(0, 78.26, 0, 2.9), ">=", 92.3)
add.constraint(lprec, c(0.24, 0, 11.31, 0), "<=", 14.8)
add.constraint(lprec, c(12.68, 0, 0.08, 0.9), ">=", 4)
set.bounds(lprec, lower = c(28.6, 18), columns = c(1, 4))
set.bounds(lprec, upper = 48.98, columns = 4)
RowNames <- c("THISROW", "THATROW", "LASTROW")
ColNames <- c("COLONE", "COLTWO", "COLTHREE", "COLFOUR")
dimnames(lprec) <- list(RowNames, ColNames)
solve(lprec)
Then the basic variables are
> get.basis(lprec)
[1] -7 -2 -3
However, the solution is
> get.variables(lprec)
[1] 28.60000 0.00000 0.00000 31.82759
From the solution, it seems variables 1 and 4 are basic. So where does the vector (-7, -2, -3) come from?
I am guessing it has something to do with the 3 constraints and 4 decision variables.

After reviewing the simplex method for bounded variables, I finally understood how this happens. These two links were helpful: Example; Video.
Coming back to this problem: lpSolveAPI (the R interface for lp_solve) rewrites the constraints in the following format after adding appropriate slack variables, with the first three columns belonging to the slack variables. Hence the return value of get.basis(), which is -7, -2, -3, refers to columns 7, 2 and 3, which represent decision variable 4, slack variable 2 and slack variable 3.
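A sketch of the implied column ordering (my reading of the description above, not literal lp_solve output):
column:  1   2   3   4   5   6   7
holds:   s1  s2  s3  x1  x2  x3  x4   (s = slack, x = decision variable)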
With respect to this kind of LP with bounded variables, a variable can be nonbasic at either its lower bound or its upper bound. The return value of get.basis(lprec, nonbasic = TRUE) is -1, -4, -5, -6; the minus signs mean these variables are at their lower bounds. That is, slack variable 1 = 0, and columns 4, 5 and 6 (decision variables 1, 2 and 3) are at their lower bounds: variable 1 = 28.6, variable 2 = 0 and variable 3 = 0.
Thus, the optimal solution is 28.6 (nonbasic), 0 (nonbasic), 0 (nonbasic), 31.83 (basic).

Principal Component Analysis in R by hand

The question is about Principal Component Analysis, partly done by hand.
Disclaimer: my background is not in maths and I am using R for the first time.
Given are the following five data points in R^3, where xi1 to xi3 are the variables and x1 to x5 are the observations.
      x1   x2   x3   x4   x5
xi1   -2   -2    0    2    2
xi2   -2    2    0   -2    2
xi3   -4    0    0    0    4
The three principal component vectors obtained after performing the principal component analysis are given, and they look like this:
Phi1 = (0.41, 0.41, 0.82)^T
Phi2 = (-0.71, 0.71, 0.00)^T
Phi3 = (0.58, 0.58, -0.58)^T
The questions are as follows:
1) Calculate the principal component scores zi1, zi2 and zi3 for each of the 5 data points.
2) Calculate the proportion of the variance explained by each principal component.
So far I have answered question 1 with the following code, where Z represents the scores:
# Data matrix: rows are the variables xi1..xi3, columns the observations x1..x5
A <- matrix(
  c(-2, -2, 0, 2, 2,
    -2, 2, 0, -2, 2,
    -4, 0, 0, 0, 4),
  nrow = 3, ncol = 5, byrow = TRUE
)
# Rows of Phi are the principal component vectors Phi1, Phi2, Phi3
Phi <- matrix(
  c(0.41, -0.71, 0.58,
    0.41, 0.71, 0.58,
    0.82, 0.00, -0.58),
  nrow = 3, ncol = 3, byrow = FALSE
)
# Z[m, i] is the score z_im of observation i on component m
Z <- Phi %*% A
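For reference, this yields the following score matrix (rows are the components, columns the observations; plain arithmetic from the matrices above):
Z
      [,1] [,2] [,3]  [,4] [,5]
[1,] -4.92 0.00    0  0.00 4.92
[2,]  0.00 2.84    0 -2.84 0.00
[3,]  0.00 0.00    0  0.00 0.00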
Now I am stuck on question 2, where I am given the formula for the proportion of variance explained,
$PVE_m = \sum_{i=1}^{n} z_{im}^2 \,/\, \sum_{j=1}^{p} \sum_{i=1}^{n} x_{ij}^2,$
but I am not sure how I can recreate the formula with an R command. Can anyone help me?
library(magrittr) # provides the %>% pipe used below
#Here is the numerator:
(Phi %*% A)^2 %>% rowSums()
[1] 48.4128 16.1312  0.0000
#Here is the denominator:
sum(A^2)
[1] 64
#So the answer is:
(Phi %*% A)^2 %>% rowSums() / sum(A^2)
[1] 0.75645 0.25205 0.00000
We can verify with prcomp + summary:
summary(prcomp(t(A)))
Importance of components:
                         PC1  PC2 PC3
Standard deviation     3.464 2.00   0
Proportion of Variance 0.750 0.25   0
Cumulative Proportion  0.750 1.00   1
This is roughly the same, since your $\Phi$ is rounded to two decimals.
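As one more cross-check (a sketch: prcomp stores the observation scores in its $x component, which should match t(Z) up to sign flips and the two-decimal rounding of Phi):
scores <- prcomp(t(A))$x  # one row per observation, one column per component
round(scores, 2)          # compare with t(Z); a column may differ by a sign flip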

Quadratic optimization - portfolio maximization problems

In portfolio analysis, given the expected returns, we aim to find the weight of each asset that minimizes the portfolio variance.
Here is the code:
install.packages("quadprog")
library(quadprog)
#Denoting annualized risk as a vector sigma
sigma <- c(0.56, 7.77, 13.48, 16.64)
#Formulating the correlation matrix proposed by the question
m <- diag(0.5, nrow = 4, ncol = 4)
m[upper.tri(m)] <- c(-0.07, -0.095, 0.959, -0.095, 0.936, 0.997)
corr <- m + t(m)
sig <- corr * outer(sigma, sigma)
#Defining the mean
mu = matrix(c(1.73, 6.65, 9.11, 10.30), nrow = 4)
m0 = 8
Amat <- t(matrix(c(1, 1, 1, 1,
                   c(mu),
                   1, 0, 0, 0,
                   0, 1, 0, 0,
                   0, 0, 1, 0,
                   0, 0, 0, 1), 6, 4, byrow = TRUE))
bvec <- c(1, m0, 0, 0, 0, 0)
qp <- solve.QP(sig, rep(0, nrow(sig)), Amat, bvec, meq = 2)
qp
x = matrix(qp$solution)
x
(t(x) %*% sig %*% x)^0.5
I understand the formulation of mu and the covariance matrix, and I know the usage of the quadprog package.
However, I don't understand why Amat and bvec are defined this way, i.e. why Amat is built from a 6 by 4 matrix.
$\mu_0$ is the expected return we aim to have for the portfolio, and it is fixed at 8%.
Attached is the question.
As you are probably aware, the reason that Amat has four rows is that there are four assets that you are allocating over. It has six columns because there are six constraints in your problem:
The allocations add up to 1 (100%)
Expected return = 8%
'Money market' allocation >= 0
'Capital stable' allocation >= 0
'Balance' allocation >= 0
'Growth' allocation >= 0
Look at the numbers that define each constraint. They are why bvec is [1, 8, 0, 0, 0, 0]. Of these six, the first two are equality constraints, which is why meq is set to 2 (the other four are greater than or equal constraints).
Edited to add:
The way the constraints work is this: each column of Amat defines a constraint, which is then multiplied by the asset allocations, with the result equal to (or greater-than-or-equal-to) some target that is set in bvec. For example:
The first column of Amat is [1, 1, 1, 1], and the first entry of bvec is 1. So the first constraint is:
1 * money_market + 1 * capital_stable + 1 * balance + 1 * growth = 1
This is a way of saying that the asset allocations add up to 1.
The second constraint says that the expected portfolio return equals 8:
1.73 * money_market + 6.65 * capital_stable + 9.11 * balance + 10.30 * growth = 8
Now consider the third constraint, which says that the 'Money market' allocation is greater than or equal to zero. That's because the 3rd column of Amat is [1, 0, 0, 0] and the third entry of bvec is 0. So this constraint looks like:
1 * money_market + 0 * capital_stable + 0 * balance + 0 * growth >= 0
Simplifying, that's the same as:
money_market >= 0
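If it helps to see all six constraints at once, they can be checked numerically against the solution (a quick sketch using the objects defined above; solve.QP enforces t(Amat) %*% x >= bvec, with the first meq rows as equalities):
x <- qp$solution
#Rows 1 and 2 must hold with equality (meq = 2); rows 3 to 6 as >=
cbind(achieved = drop(t(Amat) %*% x), target = bvec)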

How to change the per-step weighting coefficient in the R package DTW

I would like to change the default step-pattern weight of the cost function, because I need to standardize my results against others in a paper that doesn't use the weight 2 for the diagonal distance. I've read the JSS paper, but I only found other step patterns that are not what I'm really looking for, I guess. For example, imagine we have two time series Q and C:
Q = array(c(0,1,0,0,0,0,0,0,0,0,0,0,0,1,1,0),dim=c(8,2))
C = array(c(0,1,0,0,0,0,0,0,0,1,1,0,0,0,0,0),dim=c(8,2))
When I calculate the DTW distance, I obtain
alignment = dtw(Q, C, keep = TRUE)
with an alignment$distance of 2.41, and a cost matrix where, for example, the [2,2] element is 2 instead of 1 because of the weight (penalization) of 2*d[i,j] on the diagonal when selecting the minimum between:
g[i,j] = min( g[i-1,j-1] + 2 * d[i,j],
              g[i,  j-1] +     d[i,j],
              g[i-1,j  ] +     d[i,j] )
plot(asymmetricP1)
edit(asymmetricP1)
structure(c(1, 1, 1, 2, 2, 3, 3, 3, 1, 0, 0, 1, 0, 2, 1, 0, 2,
1, 0, 1, 0, 1, 0, 0, -1, 0.5, 0.5, -1, 1, -1, 1, 1), .Dim = c(8L, 4L), class = "stepPattern", npat = 3, norm = "N")
Look at the plot, and consider the branches as ordered from right to left (i.e. branch 1 is the one with the 0.5 weights).
Everything in the annotated script below is in the context of plot(asymmetricP1) and edit(asymmetricP1).
#first 8-digit sequence: 1, 1, 1, 2, 2, 3, 3, 3, ...
#branch1: "1,1,1" <- number of intervals assigned specifically to branch1 (end, joint, origin)
#branch2: "2,2" <- only 2 intervals; this is the middle diagonal line.
#branch3: "3,3,3" <- number of intervals
#note: don't be confused by the numbers themselves, i.e. "4,4,4" <- 3 intervals; "2,2,2" <- 3 intervals
#for the next sequences consider:
#the sequence of each branch is to be read as farthest from origin -> 0,0
#each interval assignment is accounted for in this order
#next 8 digit sequence: 1, 0, 0, 1, 0, 2, 1, 0,
#branch1: 1,0,0 <- interval position in relation to the query index
#branch2: 1,0 <- interval position in relation to the query index
#branch3: 2,1,0 <- interval position in relation to the query index (again see in plot)
#next 8 digit sequence: 2, 1, 0, 1, 0, 1, 0, 0
#branch1: 2,1,0 <- interval position in relation to the REFERENCE index
#branch2: 1,0 <- interval position in relation to the reference index
#branch3: 1,0,0 <- interval position in relation to the reference index (again see in plot)
#next 8 digit sequence: -1, 0.5, 0.5, -1, 1, -1, 1, 1
#note: "-1" is a signal that indicates weighting values follow
#note: notice that for each -1 that occurs, there is one value less, for example branch 1
# .....which has 3 intervals can only contain 2 weights (0.5 and 0.5)
#branch1: -1,0.5,0.5 <- changing the first 0.5 changes weight of [-1:0] segment (query index)
#branch2: -1,1 <- weight of middle branch
#branch3: -1,1,1 <- changing the second 1 changes weight of[-1,0] segment (query index)
#.Dim=c(8L, 4L):
#8 represents the number of intervals (1,1,1,2,2,3,3,3)
#4 (from what I understand) is the (length of all the branch sequences mentioned previously)/8
#npat = 3
#3 is the number of patterns described in the structure, i.e. in (1,1,1,2,2,3,3,3)
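As a closing note on the original goal of dropping the factor 2 on the diagonal: that recursion is exactly the built-in symmetric1 step pattern, so a minimal sketch is:
library(dtw)
#symmetric1 weighs all three moves (diagonal, horizontal, vertical) by 1
alignment1 <- dtw(Q, C, keep = TRUE, step.pattern = symmetric1)
alignment1$distance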
Hope this helps, good luck!

R, use binomial distribution with more than two possibilities

I know this is probably elementary, but I seem to have a mental block. Let's say you want to calculate the probability of tossing a 4, 5, or 6 on a roll of one die. In R, it's easy enough:
sum(1/6, 1/6, 1/6)
This gives 1/2, which is the correct answer. However, I have in the back of my mind (where it possibly should remain) that I should be able to use the binomial distribution for this. I've tried various combinations of arguments for pbinom and dbinom, but I can't get the right answer.
With coin tosses, it works fine. Is it entirely inappropriate for situations where there are more than two possible outcomes? (I'm a programmer, not a statistician, so I'm expecting to get killed by the stat guys here.)
Question: How can I use pbinom() or dbinom() to calculate the probability of throwing a 4, 5, or 6 with one roll of a die? I'm familiar with the prob and dice packages, but I really want to use one of the built-in distributions.
Thanks.
As @Alex mentioned above, dice throwing can be represented in terms of multinomial probabilities. The probability of rolling a 4, for example, is
dmultinom(c(0, 0, 0, 1, 0, 0), size = 1, prob = rep(1/6, 6))
# [1] 0.1666667
and the probability of rolling a 4, 5, or 6 is
X <- cbind(matrix(rep(0, 9), ncol = 3), diag(1, 3))
X
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 0 0 0 1 0 0
# [2,] 0 0 0 0 1 0
# [3,] 0 0 0 0 0 1
sum(apply(X, MARGIN = 1, dmultinom, size = 1, prob = rep(1/6, 6)))
# [1] 0.5
Though it's not quite obvious, this can be done with pmultinom, implemented either in my pmultinom package on CRAN or in this other pmultinom package on GitHub.
You can conceptualize it as the event that the roll is not a 1, 2, or 3. Then you write this probability as
P(X_1 ≤ 0, X_2 ≤ 0, X_3 ≤ 0, X_4 ≤ ∞, X_5 ≤ ∞, X_6 ≤ ∞)
where X_i is the number of occurrences of side i. All the X's together have a multinomial distribution, with a size parameter of 1, and all probabilities equal to 1/6. This probability above can be calculated (using my package) as
pmultinom(upper = c(0, 0, 0, Inf, Inf, Inf), size = 1,
          probs = c(1/6, 1/6, 1/6, 1/6, 1/6, 1/6), method = "exact")
# [1] 0.5
Though it's a bit of an awkward reformulation, I like it because I prefer to use a "p" function rather than take a sum of "d" functions.
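And to answer the literal question about the built-in binomial functions: collapsing 4, 5 and 6 into a single "success" category turns one roll into a Bernoulli trial, so dbinom applies directly:
# One roll, success = "rolled a 4, 5, or 6", with probability 3/6
dbinom(1, size = 1, prob = 3/6)
# [1] 0.5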

Setting variables to a value using linear programming in R

I have developed a linear programming model in R, and I would like to know the command to fix a variable to a given value. Here are my code and the results:
install.packages("lpSolveAPI")
library(lpSolveAPI)
#want to solve for 6 variables, these correspond to the number of bins
lprec <- make.lp(0, 6)
lp.control(lprec, sense="max")
#MODEL 1
set.objfn(lprec, c(13.8, 70.52,122.31,174.73,223.49,260.65))
add.constraint(lprec, c(13.8, 70.52, 122.31, 174.73, 223.49, 260.65), "=", 204600)
add.constraint(lprec, c(1,1,1,1,1,1), "=", 5000)
Here are the results:
> solve(lprec)
[1] 0
> get.objective(lprec)
[1] 204600
> get.variables(lprec)
[1] 2609.309 2390.691 0.000 0.000 0.000 0.000
I would like to set the first variable (currently 2609) to 3200 and the last variable to 48, and then optimize over the remaining variables. Any help would be much appreciated.
What you want is constrained optimization, for which you should add more constraints matching your requirements. I am not familiar with lpSolveAPI, so I cannot guarantee the exact code, but you need something like:
add.constraint(lprec, c(1, 0, 0, 0, 0, 0), "=", 3200)
add.constraint(lprec, c(0, 0, 0, 0, 0, 1), "=", 48)
Along with your existing constraints.
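Alternatively (a sketch against the model above): lpSolveAPI can fix a variable by pinning its lower and upper bounds with set.bounds, instead of adding equality rows:
#Fix variable 1 at 3200 and variable 6 at 48 via their bounds
set.bounds(lprec, lower = c(3200, 48), upper = c(3200, 48), columns = c(1, 6))
solve(lprec)
get.variables(lprec)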
