I'm working with a PCA problem where I have 3 variables and I reduce them to 2 by doing PCA. I've already plot all the points in 3D using scatter3D. My question is, how can I plot the plane determined by two vectors (the first two eigenvectors of the sampled covariance matrix) in R?
This is what I have so far
library(plot3D)
X <- matrix(c(55, 75, 110,
47, 69, 108,
42, 71, 110,
48, 74, 114,
47, 75, 114,
52, 73, 104,
49, 72, 106,
44, 67, 107,
52, 73, 108,
45, 73, 111,
50, 80, 117,
50, 71, 110,
48, 75, 114,
51, 73, 106,
44, 66, 102,
42, 71, 112,
50, 68, 107,
48, 70, 108,
51, 72, 108,
52, 73, 109,
49, 72, 112,
49, 73, 108,
46, 70, 105,
39, 66, 100,
50, 76, 108,
52, 71, 108,
56, 75, 108,
53, 70, 112,
53, 72, 110,
49, 74, 113,
51, 72, 109,
55, 74, 110,
56, 75, 110,
62, 79, 118,
58, 77, 115,
50, 71, 105,
52, 67, 104,
52, 73, 107,
56, 73, 106,
55, 78, 118,
53, 68, 103), ncol = 3,nrow = 41,byrow = TRUE)
S <- cov(X)
Gamma <- eigen(S)$vectors
scatter3D(X[,1], X[,2], X[,3], pch = 18, bty = "u", colkey = FALSE,
main ="bty= 'u'", col.panel ="gray", expand =0.4,
col.grid = "white",ticktype = "detailed",
phi = 25,theta = 45)
pc <- scale(X,center=TRUE,scale=FALSE) %*% Gamma[,c(1,2)]
Now I would like to plot the plane using scatter3D
Perhaps this will do. Using the iris data. It uses scatter3d in package car which can add a regression surface to a 3d plot:
library(car)
data(iris)
iris.pr <- prcomp(iris[, 1:3], scale.=TRUE)
# Draw 3d plot with surface and color points by species
scatter3d(PC3~PC1+PC2, iris.pr$x, point.col=c(rep(2, 50), rep(3, 50), rep(4, 50)))
This plots a regression surface predicting PC3 from PC1 and PC2. By definition the correlation between any two principal components is zero so the surface should be PC3=0 for any values of PC1 and PC2, but I don't see a way to produce exactly that surface. It is pretty close though.
I have a data such that produced from special function:
where t0=1, alpha, q, gamma, C and beta are unknown parameters.
The question is how to fit the above function to following data, in R?
mydata<-structure(list(x = 1:100, y = c(0, 0, 2, 1, 3, 4, 4, 3, 7, 8,
9, 11, 12, 11, 15, 15, 17, 21, 49, 43, 117, 75, 85, 97, 113,
129, 135, 147, 149, 149, 123, 129, 127, 122, 143, 157, 144, 139,
123, 117, 141, 138, 124, 134, 158, 151, 136, 133, 121, 117, 122,
125, 117, 111, 98, 94, 92, 89, 73, 87, 91, 88, 94, 90, 93, 76,
60, 96, 71, 80, 71, 63, 65, 47, 74, 63, 78, 68, 55, 48, 51, 45,
48, 50, 71, 48, 35, 51, 69, 62, 64, 66, 51, 59, 58, 34, 57, 56,
63, 50)), class = "data.frame", row.names = c(NA, -100L))
I defined the function as follows:
t0<<-1
fyy<-function(t,cc0,alpha0,qq0,beta0,gamma0){
ret<-cc0*((t-t0)^alpha0)/(((1+(qq0-1)*beta0*(t-t0)^gamma0))^(1/(qq0-1)))
return(ret)
}
but I don't know how to continue?
as #mhovd mentioned I used "nls" function but I got an error as follows:
> fit <- nls(y~fyy(x,cc0 ,alpha0 ,beta0 ,gamma0 ,qq0 ),
data=data.frame(mydata), start=list(cc0 = .01,alpha0 =1,beta0 =.3,gamma0
= 2,qq0 = 1))
Error in numericDeriv(form[[3L]], names(ind), env) :
Missing value or an infinity produced when evaluating the model
In the comments #masoud references a paper about the specific function in the question. It suggests fixing gamma0 and qq0 and if we do that we do get a solution -- fm shown in red in the plot. We have also shown an alternate parametric curve as fm2 in blue. It also has 3 optimized parameters but has lower residual sum of squares (lower is better).
fyy <- function(t,cc0,alpha0,qq0,beta0,gamma0){
cc0 * ((t-t0)^alpha0) / (((1+(qq0-1)*beta0*(t-t0)^gamma0))^(1/(qq0-1)))
}
mydata0 <- subset(mydata, y > 0)
# fixed values
t0 <- 1
gamma0 <- 3
qq0 <- 1.2
st <- list(cc0 = 1, alpha0 = 1, beta0 = 1) # starting values
fm <- nls(y ~ fyy(x, cc0, alpha0, qq0, beta0, gamma0), mydata0,
lower = list(cc0 = 0.1, alpha0 = 0.1, beta0 = 0.00001),
start = st, algorithm = "port")
deviance(fm) # residual sum of squares
## [1] 61458.5
st2 <- list(a = 1, b = 1, c = 1)
fm2 <- nls(y ~ exp(a + b/x + c*log(x)), mydata0, start = st2)
deviance(fm2) # residual sum of squares
## [1] 16669.24
plot(mydata0, ylab = "y", xlab = "t")
lines(fitted(fm) ~ x, mydata0, col = "red")
lines(fitted(fm2) ~ x, mydata0, col = "blue")
legend("topright", legend = c("fm", "fm2"), lty = 1, col = c("red", "blue"))
I am trying to get Rsolnp to constrain my parameters to binary integers or to decimals that are nearly the same (.999 is close enough to 1 for example).
I have three vectors of equal length (52), each of which will get multiplied by my binary parameter vector in my objective function.
library(Rsolnp)
a <- c(251, 179, 215, 251, 63, 45, 54, 63, 47, 34, 40, 47, 141, 101, 121, 141, 47, 34, 40, 47, 94, 67, 81, 94, 47, 34, 40, 47, 157, 108, 133, 157, 126, 85, 106, 126, 126, 85, 106, 126, 110, 74, 92, 110, 110, 74, 92, 110, 63, 40, 52, 63)
b <- c(179, 251, 215, 0, 45, 63, 54, 0, 34, 47, 40, 0, 101, 141, 121, 0, 34, 47, 40, 0, 67, 94, 81, 0, 34, 47, 40, 0, 108, 157, 133, 0, 85, 126, 106, 0, 85, 126, 106, 0, 74, 110, 92, 0, 74, 110, 92, 0, 40, 63, 52, 0)
c <- c(179, 179, 118, 179, 45, 45, 30, 45, 34, 34, 22, 34, 101, 101, 67, 101, 34, 34, 22, 34, 67, 67, 44, 67, 34, 34, 22, 34, 108, 108, 71, 108, 85, 85, 56, 85, 85, 85, 56, 85, 74, 74, 49, 74, 74, 74, 49, 74, 40, 40, 27, 40)
x is my parameter vector and below if my objective function.
objective_function = function(x){
-(1166 * sum(x[1:52] * a) / 2000) *
(((sum(x[1:52] * b)) / 2100) + .05) *
(((sum(x[1:52] * c))/1500) + 1.5)
}
I essentially want 1 paramater in each group of 4 equal to 1 and the rest 0 and I'm not sure how to create the correct constraints for this but I believe I need to use these sum constraints in combination with another type of constraint as well. Here are my constraints:
eqn1=function(x){
z1=sum(x[1:4])
z2=sum(x[5:8])
z3=sum(x[9:12])
z4=sum(x[13:16])
z5=sum(x[17:20])
z6=sum(x[21:24])
z7=sum(x[25:28])
z8=sum(x[29:32])
z9=sum(x[33:36])
z10=sum(x[37:40])
z11=sum(x[41:44])
z12=sum(x[45:48])
z13=sum(x[49:52])
return(c(z1,z2,z3,z4,z5,z6,z7,z8,z9,z10,z11,z12,z13))
}
And finally, here is my function call:
opti<-solnp(pars=rep(1,52), fun = objective_function, eqfun = eqn1, eqB = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), LB=rep(0,52))
Calling opti$pars returns my solution vector:
[1] 7.199319e-01 2.800680e-01 6.015388e-08 4.886578e-10 5.540961e-01 4.459036e-01 2.906853e-07 4.635970e-08 5.389325e-01
[10] 4.610672e-01 2.979195e-07 3.651954e-08 6.228346e-01 3.771652e-01 1.980380e-07 3.348488e-09 5.389318e-01 4.610679e-01
[19] 2.979195e-07 3.651954e-08 5.820231e-01 4.179766e-01 2.099869e-07 2.624076e-08 5.389317e-01 4.610680e-01 2.979195e-07
[28] 3.651954e-08 6.499878e-01 3.500120e-01 1.959133e-07 1.059012e-08 6.249098e-01 3.750900e-01 2.588037e-07 1.752927e-08
[37] 6.249106e-01 3.750892e-01 2.588037e-07 1.752927e-08 6.095743e-01 3.904254e-01 2.741968e-07 2.233806e-08 6.095743e-01
[46] 3.904254e-01 2.741968e-07 2.233806e-08 5.679608e-01 4.320385e-01 6.821224e-07 3.997882e-08
As one can see the weight is getting split between multiple variables in each group of 4 instead of being forced onto just 1 with the rest being 0.
If this is not possible with this package could someone show me how to convert my objective function to work with other optimization packages? From what I have seen, they require the objective function to be converted to a vector of coefficients. Any help is appreciated. Thanks!
I tried with a few solvers. With MINLP solvers Couenne and Baron we can solve this directly. With Gurobi we need to decompose the objective into two quadratic parts. All these solvers give:
---- 119 VARIABLE x.L
i1 1.000, i5 1.000, i9 1.000, i14 1.000, i17 1.000, i21 1.000, i25 1.000, i29 1.000
i34 1.000, i38 1.000, i41 1.000, i46 1.000, i49 1.000
---- 119 VARIABLE z.L = -889.346 obj
Zeroes are not printed here.
I used GAMS (commercial) but if you want to use free tools you can use Pyomo(Python) + Couenne. I am not sure about MINLP solvers for R, but Gurobi can be used from R.
Note that the group constraint is simply:
groups(g).. sum(group(g,i),x(i)) =e= 1;
where g are the groups and group(g,i) is a 2d set with the mapping between groups and items.
For Gurobi you need to do something like (in pseudo code):
z1 = 1166 * sum(i,x(i)*a(i)) / 2000 (linear)
z2 = ((sum(i, x(i)*b(i))) / 2100) + .05 (linear)
z3 = ((sum(i, x(i)*c(i)))/1500) + 1.5 (linear)
z23 = z2*z3 (non-convex quadratic)
obj = -z1*z23 (non-convex quadratic)
and tell Gurobi to use the nonconvex MIQCP solver.
Sorry, no R code for this. But it may give you something to think about.
within CPLEX you may try mathematical programming as Paul wrote, but you may also use Constraint Programming.
In OPL (CPLEX modeling language)
using CP;
execute
{
cp.param.timelimit=5; // time limit 5 seconds
}
int n=52;
range r=1..n;
int a[r]=[251, 179, 215, 251, 63, 45, 54, 63, 47, 34, 40, 47, 141, 101, 121, 141, 47, 34, 40, 47, 94, 67,
81, 94, 47, 34, 40, 47, 157, 108, 133, 157, 126, 85, 106, 126, 126,
85, 106, 126, 110, 74, 92, 110, 110, 74, 92, 110, 63, 40, 52, 63];
int b[r]=[179, 251, 215, 0, 45, 63, 54, 0, 34, 47, 40, 0, 101, 141, 121, 0,
34, 47, 40, 0, 67, 94, 81, 0, 34, 47, 40, 0, 108, 157, 133, 0, 85, 126, 106, 0, 85,
126, 106, 0, 74, 110, 92, 0, 74, 110, 92, 0, 40, 63, 52, 0];
int c[r]=[179, 179, 118, 179, 45, 45, 30, 45, 34, 34, 22, 34, 101, 101, 67, 101,
34, 34, 22, 34, 67, 67, 44, 67, 34, 34, 22, 34, 108, 108, 71, 108, 85, 85, 56, 85,
85, 85, 56, 85, 74, 74, 49, 74, 74, 74, 49, 74, 40, 40, 27, 40];
// decision variable
dvar boolean x[r];
// objective
dexpr float obj=
-(1166 * sum(i in r) (x[i]*a[i]) / 2000) *
(((sum(i in r) (x[i]* b[i])) / 2100) + .05) *
(((sum(i in r) (x[i]*c[i]))/1500) + 1.5);
minimize obj;
subject to
{
// one and only one out of 4 is true
forall(i in 1..n div 4) count(all(j in 1+(i-1)*4..4+(i-1)*4)x[j],1)==1;
}
gives
// solution with objective -889.3463
x = [1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1
0 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 1 0 0 0 1 0 0 0 1 0
0 0];
within 5 seconds
NB: You could call OPL CPLEX from R or rely on any other CPLEX API
And in python you can write the same
from docplex.cp.model import CpoModel
n=52
r=range(0,n)
a =[251, 179, 215, 251, 63, 45, 54, 63, 47, 34, 40, 47, 141, 101, 121, 141, 47, 34, 40, 47, 94, 67, 81, 94, 47, 34, 40, 47, 157, 108, 133, 157, 126, 85, 106, 126, 126, 85, 106, 126, 110, 74, 92, 110, 110, 74, 92, 110, 63, 40, 52, 63]
b =[179, 251, 215, 0, 45, 63, 54, 0, 34, 47, 40, 0, 101, 141, 121, 0, 34, 47, 40, 0, 67, 94, 81, 0, 34, 47, 40, 0, 108, 157, 133, 0, 85, 126, 106, 0, 85, 126, 106, 0, 74, 110, 92, 0, 74, 110, 92, 0, 40, 63, 52, 0]
c =[179, 179, 118, 179, 45, 45, 30, 45, 34, 34, 22, 34, 101, 101, 67, 101, 34, 34, 22, 34, 67, 67, 44, 67, 34, 34, 22, 34, 108, 108, 71, 108, 85, 85, 56, 85, 85, 85, 56, 85, 74, 74, 49, 74, 74, 74, 49, 74, 40, 40, 27, 40]
mdl = CpoModel(name='x')
#decision variables
mdl.x = {i: mdl.integer_var(0,n,name="x"+str(i+1)) for i in r}
mdl.minimize(-1166 * sum(mdl.x[i]*a[i] / 2000 for i in r) \
*((sum(mdl.x[i]* b[i] / 2100 for i in r) +0.05)) \
*((sum(mdl.x[i]*c[i]/1500 for i in r) +1.5)) )
for i in range(0,n // 4):
mdl.add(1==sum( mdl.x[j] for j in range(i*4+0,i*4+4)))
msol=mdl.solve(TimeLimit=5)
# Dislay solution
for i in r:
if (msol[mdl.x[i]]==1):
print(i+1," ")
and that gives
! Best objective : -889.3464
1
5
9
13
17
22
25
30
34
38
41
45
49
I set up an R notebook to solve (or try to solve) the problem as a mixed integer linear program, using CPLEX as the MIP solver and the Rcplex package as the interface to it. The results were unspectacular. After five minutes of grinding, CPLEX had a solution somewhat inferior to what Erwin got (-886.8748 v. his -889.346) with a gap over 146% (which, given Erwin's result, is mostly just the upper bound converging very slowly). I'm happy to share the notebook, which shows the linearization, but to use it you would need to have CPLEX installed.
Update: I have a second notebook, using the GA genetic algorithm package, that consistently gets close to Erwin's solution (and occasionally hits it) in under five seconds. The results are random, so rerunning may do better (or worse), and there is no proof of optimality.