I am trying to get Rsolnp to constrain my parameters to binary integers, or to decimals that are nearly so (.999 is close enough to 1, for example).
I have three vectors of equal length (52), each of which will get multiplied by my binary parameter vector in my objective function.
library(Rsolnp)
a <- c(251, 179, 215, 251, 63, 45, 54, 63, 47, 34, 40, 47, 141, 101, 121, 141, 47, 34, 40, 47, 94, 67, 81, 94, 47, 34, 40, 47, 157, 108, 133, 157, 126, 85, 106, 126, 126, 85, 106, 126, 110, 74, 92, 110, 110, 74, 92, 110, 63, 40, 52, 63)
b <- c(179, 251, 215, 0, 45, 63, 54, 0, 34, 47, 40, 0, 101, 141, 121, 0, 34, 47, 40, 0, 67, 94, 81, 0, 34, 47, 40, 0, 108, 157, 133, 0, 85, 126, 106, 0, 85, 126, 106, 0, 74, 110, 92, 0, 74, 110, 92, 0, 40, 63, 52, 0)
c <- c(179, 179, 118, 179, 45, 45, 30, 45, 34, 34, 22, 34, 101, 101, 67, 101, 34, 34, 22, 34, 67, 67, 44, 67, 34, 34, 22, 34, 108, 108, 71, 108, 85, 85, 56, 85, 85, 85, 56, 85, 74, 74, 49, 74, 74, 74, 49, 74, 40, 40, 27, 40)
x is my parameter vector and below is my objective function.
objective_function = function(x){
  -(1166 * sum(x[1:52] * a) / 2000) *
    (((sum(x[1:52] * b)) / 2100) + .05) *
    (((sum(x[1:52] * c)) / 1500) + 1.5)
}
I essentially want one parameter in each group of 4 equal to 1 and the rest equal to 0. I'm not sure how to create the correct constraints for this, but I believe I need to combine these sum constraints with another type of constraint. Here are my constraints:
eqn1 = function(x){
  z1 = sum(x[1:4])
  z2 = sum(x[5:8])
  z3 = sum(x[9:12])
  z4 = sum(x[13:16])
  z5 = sum(x[17:20])
  z6 = sum(x[21:24])
  z7 = sum(x[25:28])
  z8 = sum(x[29:32])
  z9 = sum(x[33:36])
  z10 = sum(x[37:40])
  z11 = sum(x[41:44])
  z12 = sum(x[45:48])
  return(c(z1, z2, z3, z4, z5, z6, z7, z8, z9, z10, z11, z12, z13))
}
And finally, here is my function call:
opti<-solnp(pars=rep(1,52), fun = objective_function, eqfun = eqn1, eqB = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), LB=rep(0,52))
Calling opti$pars returns my solution vector:
[1] 7.199319e-01 2.800680e-01 6.015388e-08 4.886578e-10 5.540961e-01 4.459036e-01 2.906853e-07 4.635970e-08 5.389325e-01
[10] 4.610672e-01 2.979195e-07 3.651954e-08 6.228346e-01 3.771652e-01 1.980380e-07 3.348488e-09 5.389318e-01 4.610679e-01
[19] 2.979195e-07 3.651954e-08 5.820231e-01 4.179766e-01 2.099869e-07 2.624076e-08 5.389317e-01 4.610680e-01 2.979195e-07
[28] 3.651954e-08 6.499878e-01 3.500120e-01 1.959133e-07 1.059012e-08 6.249098e-01 3.750900e-01 2.588037e-07 1.752927e-08
[37] 6.249106e-01 3.750892e-01 2.588037e-07 1.752927e-08 6.095743e-01 3.904254e-01 2.741968e-07 2.233806e-08 6.095743e-01
[46] 3.904254e-01 2.741968e-07 2.233806e-08 5.679608e-01 4.320385e-01 6.821224e-07 3.997882e-08
As one can see, the weight is being split between multiple variables in each group of 4 instead of being forced onto just one with the rest at 0.
If this is not possible with this package, could someone show me how to convert my objective function to work with other optimization packages? From what I have seen, they require the objective function to be converted to a vector of coefficients. Any help is appreciated. Thanks!
I tried with a few solvers. With MINLP solvers Couenne and Baron we can solve this directly. With Gurobi we need to decompose the objective into two quadratic parts. All these solvers give:
---- 119 VARIABLE x.L
i1 1.000, i5 1.000, i9 1.000, i14 1.000, i17 1.000, i21 1.000, i25 1.000, i29 1.000
i34 1.000, i38 1.000, i41 1.000, i46 1.000, i49 1.000
---- 119 VARIABLE z.L = -889.346 obj
Zeroes are not printed here.
I used GAMS (commercial), but if you want to use free tools you can use Pyomo (Python) + Couenne. I am not sure about MINLP solvers for R, but Gurobi can be used from R.
Note that the group constraint is simply:
groups(g).. sum(group(g,i),x(i)) =e= 1;
where g are the groups and group(g,i) is a 2d set with the mapping between groups and items.
For Gurobi you need to do something like (in pseudo code):
z1 = 1166 * sum(i,x(i)*a(i)) / 2000 (linear)
z2 = ((sum(i, x(i)*b(i))) / 2100) + .05 (linear)
z3 = ((sum(i, x(i)*c(i)))/1500) + 1.5 (linear)
z23 = z2*z3 (non-convex quadratic)
obj = -z1*z23 (non-convex quadratic)
and tell Gurobi to use the nonconvex MIQCP solver.
Sorry, no R code for this. But it may give you something to think about.
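As a quick sanity check (a sketch in Python rather than R; the picked indices are copied from the solver output above), plugging the reported solution back into the question's objective reproduces the -889.346 value:

```python
# a, b, c copied from the question; "picked" is the 1-based index list reported above
a = [251, 179, 215, 251, 63, 45, 54, 63, 47, 34, 40, 47, 141, 101, 121, 141,
     47, 34, 40, 47, 94, 67, 81, 94, 47, 34, 40, 47, 157, 108, 133, 157,
     126, 85, 106, 126, 126, 85, 106, 126, 110, 74, 92, 110, 110, 74, 92, 110,
     63, 40, 52, 63]
b = [179, 251, 215, 0, 45, 63, 54, 0, 34, 47, 40, 0, 101, 141, 121, 0,
     34, 47, 40, 0, 67, 94, 81, 0, 34, 47, 40, 0, 108, 157, 133, 0,
     85, 126, 106, 0, 85, 126, 106, 0, 74, 110, 92, 0, 74, 110, 92, 0,
     40, 63, 52, 0]
c = [179, 179, 118, 179, 45, 45, 30, 45, 34, 34, 22, 34, 101, 101, 67, 101,
     34, 34, 22, 34, 67, 67, 44, 67, 34, 34, 22, 34, 108, 108, 71, 108,
     85, 85, 56, 85, 85, 85, 56, 85, 74, 74, 49, 74, 74, 74, 49, 74,
     40, 40, 27, 40]

picked = [1, 5, 9, 14, 17, 21, 25, 29, 34, 38, 41, 46, 49]  # 1-based
x = [1 if i + 1 in picked else 0 for i in range(52)]

sa = sum(xi * ai for xi, ai in zip(x, a))
sb = sum(xi * bi for xi, bi in zip(x, b))
sc = sum(xi * ci for xi, ci in zip(x, c))
obj = -(1166 * sa / 2000) * (sb / 2100 + 0.05) * (sc / 1500 + 1.5)
print(obj)  # ≈ -889.346
```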
Within CPLEX you may try mathematical programming, as Paul wrote, but you may also use Constraint Programming.
In OPL (CPLEX modeling language):
using CP;
execute
{
cp.param.timelimit=5; // time limit 5 seconds
}
int n=52;
range r=1..n;
int a[r]=[251, 179, 215, 251, 63, 45, 54, 63, 47, 34, 40, 47, 141, 101, 121, 141, 47, 34, 40, 47, 94, 67,
81, 94, 47, 34, 40, 47, 157, 108, 133, 157, 126, 85, 106, 126, 126,
85, 106, 126, 110, 74, 92, 110, 110, 74, 92, 110, 63, 40, 52, 63];
int b[r]=[179, 251, 215, 0, 45, 63, 54, 0, 34, 47, 40, 0, 101, 141, 121, 0,
34, 47, 40, 0, 67, 94, 81, 0, 34, 47, 40, 0, 108, 157, 133, 0, 85, 126, 106, 0, 85,
126, 106, 0, 74, 110, 92, 0, 74, 110, 92, 0, 40, 63, 52, 0];
int c[r]=[179, 179, 118, 179, 45, 45, 30, 45, 34, 34, 22, 34, 101, 101, 67, 101,
34, 34, 22, 34, 67, 67, 44, 67, 34, 34, 22, 34, 108, 108, 71, 108, 85, 85, 56, 85,
85, 85, 56, 85, 74, 74, 49, 74, 74, 74, 49, 74, 40, 40, 27, 40];
// decision variable
dvar boolean x[r];
// objective
dexpr float obj=
-(1166 * sum(i in r) (x[i]*a[i]) / 2000) *
(((sum(i in r) (x[i]* b[i])) / 2100) + .05) *
(((sum(i in r) (x[i]*c[i]))/1500) + 1.5);
minimize obj;
subject to
{
// one and only one out of 4 is true
forall(i in 1..n div 4) count(all(j in 1+(i-1)*4..4+(i-1)*4)x[j],1)==1;
}
gives
// solution with objective -889.3463
x = [1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1
0 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 1 0 0 0 1 0 0 0 1 0
0 0];
within 5 seconds
NB: You could call OPL CPLEX from R or rely on any other CPLEX API
And in Python you can write the same:
from docplex.cp.model import CpoModel

n = 52
r = range(0, n)
a = [251, 179, 215, 251, 63, 45, 54, 63, 47, 34, 40, 47, 141, 101, 121, 141, 47, 34, 40, 47, 94, 67, 81, 94, 47, 34, 40, 47, 157, 108, 133, 157, 126, 85, 106, 126, 126, 85, 106, 126, 110, 74, 92, 110, 110, 74, 92, 110, 63, 40, 52, 63]
b = [179, 251, 215, 0, 45, 63, 54, 0, 34, 47, 40, 0, 101, 141, 121, 0, 34, 47, 40, 0, 67, 94, 81, 0, 34, 47, 40, 0, 108, 157, 133, 0, 85, 126, 106, 0, 85, 126, 106, 0, 74, 110, 92, 0, 74, 110, 92, 0, 40, 63, 52, 0]
c = [179, 179, 118, 179, 45, 45, 30, 45, 34, 34, 22, 34, 101, 101, 67, 101, 34, 34, 22, 34, 67, 67, 44, 67, 34, 34, 22, 34, 108, 108, 71, 108, 85, 85, 56, 85, 85, 85, 56, 85, 74, 74, 49, 74, 74, 74, 49, 74, 40, 40, 27, 40]

mdl = CpoModel(name='x')
# decision variables (0/1, matching the OPL boolean variables)
mdl.x = {i: mdl.integer_var(0, 1, name="x" + str(i + 1)) for i in r}
mdl.minimize(-1166 * sum(mdl.x[i] * a[i] / 2000 for i in r)
             * (sum(mdl.x[i] * b[i] / 2100 for i in r) + 0.05)
             * (sum(mdl.x[i] * c[i] / 1500 for i in r) + 1.5))
# one and only one out of each group of 4 is set
for i in range(0, n // 4):
    mdl.add(1 == sum(mdl.x[j] for j in range(i * 4, i * 4 + 4)))
msol = mdl.solve(TimeLimit=5)
# Display solution
for i in r:
    if msol[mdl.x[i]] == 1:
        print(i + 1, " ")
and that gives
! Best objective : -889.3464
1
5
9
13
17
22
25
30
34
38
41
45
49
I set up an R notebook to solve (or try to solve) the problem as a mixed integer linear program, using CPLEX as the MIP solver and the Rcplex package as the interface to it. The results were unspectacular. After five minutes of grinding, CPLEX had a solution somewhat inferior to what Erwin got (-886.8748 v. his -889.346) with a gap over 146% (which, given Erwin's result, is mostly just the upper bound converging very slowly). I'm happy to share the notebook, which shows the linearization, but to use it you would need to have CPLEX installed.
Update: I have a second notebook, using the GA genetic algorithm package, that consistently gets close to Erwin's solution (and occasionally hits it) in under five seconds. The results are random, so rerunning may do better (or worse), and there is no proof of optimality.
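If no MINLP or CP solver is at hand, the pick-one-of-four structure also suits a simple heuristic. Below is a hedged Python sketch of coordinate descent over the 13 group choices; this is a stand-in for the GA notebook described above, not its actual code, and like the GA it comes with no proof of optimality:

```python
# a, b, c copied from the question
a = [251, 179, 215, 251, 63, 45, 54, 63, 47, 34, 40, 47, 141, 101, 121, 141,
     47, 34, 40, 47, 94, 67, 81, 94, 47, 34, 40, 47, 157, 108, 133, 157,
     126, 85, 106, 126, 126, 85, 106, 126, 110, 74, 92, 110, 110, 74, 92, 110,
     63, 40, 52, 63]
b = [179, 251, 215, 0, 45, 63, 54, 0, 34, 47, 40, 0, 101, 141, 121, 0,
     34, 47, 40, 0, 67, 94, 81, 0, 34, 47, 40, 0, 108, 157, 133, 0,
     85, 126, 106, 0, 85, 126, 106, 0, 74, 110, 92, 0, 74, 110, 92, 0,
     40, 63, 52, 0]
c = [179, 179, 118, 179, 45, 45, 30, 45, 34, 34, 22, 34, 101, 101, 67, 101,
     34, 34, 22, 34, 67, 67, 44, 67, 34, 34, 22, 34, 108, 108, 71, 108,
     85, 85, 56, 85, 85, 85, 56, 85, 74, 74, 49, 74, 74, 74, 49, 74,
     40, 40, 27, 40]

def objective(picks, a, b, c):
    # picks[g] is the chosen position (0-3) within group g; 13 groups of 4
    idx = [4 * g + p for g, p in enumerate(picks)]
    sa = sum(a[i] for i in idx)
    sb = sum(b[i] for i in idx)
    sc = sum(c[i] for i in idx)
    return -(1166 * sa / 2000) * (sb / 2100 + 0.05) * (sc / 1500 + 1.5)

def coordinate_descent(a, b, c, n_groups=13):
    picks = [0] * n_groups  # start with the first item of every group
    improved = True
    while improved:
        improved = False
        for g in range(n_groups):
            # try all 4 choices for group g, holding the others fixed
            best = min(range(4),
                       key=lambda p: objective(picks[:g] + [p] + picks[g + 1:], a, b, c))
            if best != picks[g]:
                picks[g] = best
                improved = True
    return picks

picks = coordinate_descent(a, b, c)
print(picks, objective(picks, a, b, c))
```

Each pass can only improve the objective, so the result is at least as good as the all-first-of-group start, but it may stop at a local optimum rather than the -889.346 solution.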
Part 2 Boston
plot(boston, ylab=" Boston crime data", xlab= "Time")
#Time series seem to have homogeneous variance upon visual inspection
#Q2
#Trend looks linear in the plot, so for trend differencing operator take d=1
newboston= as.numeric(unlist(boston))
xdiff = diff(newboston)
plot(xdiff)
#Q3
#ADF
library(tseries)
adf.test(xdiff)
#From the result, alternative hypothesis is stationary so null hypothesis is rejected
#KPSS test
install.packages('fpp3', dependencies = TRUE)
library(fpp3)
unitroot_kpss(xdiff)
#the p-value is >0.05, so fail to reject null hypothesis for KPSS
#Q4
library(astsa)
acf2(xdiff, max.lag = 50)
model1 = sarima(xdiff, p, 1, q)
So this is what I have tried so far. I am quite new to R, so do be kind if my workings make little sense. For context, boston is the data I imported from an Excel file; it is simply a single column of values.
Firstly, I am trying to do Q4, but I am not sure how I would go about finding p and q.
Second, I am unsure whether what I did in Q2 to detrend my data is correct in the first place.
Here is the output of dput(boston)
dput(boston)
structure(list(x = c(41, 39, 50, 40, 43, 38, 44, 35, 39, 35,
29, 49, 50, 59, 63, 32, 39, 47, 53, 60, 57, 52, 70, 90, 74, 62,
55, 84, 94, 70, 108, 139, 120, 97, 126, 149, 158, 124, 140, 109,
114, 77, 120, 133, 110, 92, 97, 78, 99, 107, 112, 90, 98, 125,
155, 190, 236, 189, 174, 178, 136, 161, 171, 149, 184, 155, 276,
224, 213, 279, 268, 287, 238, 213, 257, 293, 212, 246, 353, 339,
308, 247, 257, 322, 298, 273, 312, 249, 286, 279, 309, 401, 309,
328, 353, 354, 327, 324, 285, 243, 241, 287, 355, 460, 364, 487,
452, 391, 500, 451, 375, 372, 302, 316, 398, 394, 431, 431),
y = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45,
46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75,
76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90,
91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104,
105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116,
117, 118)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-118L))
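On Q2: first differencing is indeed the standard way to remove a linear trend (d = 1). A minimal Python sketch of what R's diff() computes, using the first values from the dput output above, shows why:

```python
def diff(series):
    # first difference: x[t] - x[t-1], same as R's diff()
    return [later - earlier for earlier, later in zip(series, series[1:])]

boston_head = [41, 39, 50, 40, 43, 38]  # first values from dput(boston)
print(diff(boston_head))  # [-2, 11, -10, 3, -5]

# a pure linear trend differences to a constant, which is why d = 1 removes it
trend = [5 + 3 * t for t in range(6)]
print(diff(trend))  # [3, 3, 3, 3, 3]
```

For p and q, the usual next step is to read the ACF/PACF plots from acf2(xdiff) for cut-off lags, or to compare a small grid of candidate (p, q) values by AIC.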
I'm trying to use a range (160:280) instead of '160', '161', and so on. How would I do that?
group_by(disp = fct_collapse(as.character(disp), Group1 = c(160:280), Group2 = c(281:400)) %>%
summarise(meanHP = mean(hp)))
Error: Problem adding computed columns in `group_by()`.
x Problem with `mutate()` column `disp`.
i `disp = `%>%`(...)`.
x Each input to fct_recode must be a single named string. Problems at positions: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 17
For ranges of values it is better to use cut, where you can define breaks and labels.
library(dplyr)
library(forcats)
mtcars %>%
  group_by(disp = cut(disp, c(0, 160, 280, 400, Inf), paste0('Group', 1:4))) %>%
  summarise(meanHP = mean(hp))
# disp meanHP
# <fct> <dbl>
#1 Group1 93.1
#2 Group2 143
#3 Group3 217.
#4 Group4 217.
So here 0-160 becomes 'Group1', 160-280 becomes 'Group2', and so on.
With fct_collapse you can do -
mtcars %>%
  group_by(disp = fct_collapse(as.character(disp), Group1 = as.character(160:280), Group2 = as.character(281:400))) %>%
  summarise(meanHP = mean(hp)) %>%
  suppressWarnings()
However, this works only for the exact values that are present, so 160 would be in Group1 but 160.1 would not be grouped at all.
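The interval logic behind cut can be sketched in plain Python with the standard library's bisect module, which makes it clear why 160.1 gets a group even though it never occurs as an exact value (break points and labels mirror the cut call above; bucket is a hypothetical helper name):

```python
from bisect import bisect_left

def bucket(value, uppers=(160, 280, 400), labels=('Group1', 'Group2', 'Group3', 'Group4')):
    # right-closed intervals (0,160], (160,280], (280,400], (400,Inf),
    # matching cut()'s default right = TRUE
    return labels[bisect_left(uppers, value)]

print(bucket(160))    # boundary value stays in the lower interval
print(bucket(160.1))  # any value inside the interval is grouped, not just exact matches
```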
We could also do
library(dplyr)
library(stringr)
mtcars %>%
  group_by(disp = cut(disp, c(0, 160, 280, 400, Inf), str_c('Group', 1:4))) %>%
  summarise(meanHP = mean(hp))
I am trying to remove certain rows from lexicon::hash_valence_shifters in the sentimentr package. Specifically, I want to keep only these rows:
c( 1 , 2 , 3 , 6 , 7 , 13, 14 , 16 , 19, 24, 25 , 26, 27 , 28, 36, 38, 39, 41, 42, 43, 45, 46, 53, 54, 55, 56, 57, 59, 60, 65, 70, 71, 73, 74, 79, 84, 85, 87, 88, 89, 94, 95, 96, 97, 98, 99, 100, 102, 103, 104, 105, 106, 107, 114, 115, 119, 120, 123, 124, 125, 126, 127, 128, 129, 135, 136, 138)
I have tried the below approach:
vsm = lexicon::hash_valence_shifters[c, ]
vsm[ , y := as.numeric(y)]
vsm = sentimentr::as_key(vsm, comparison = NULL, sentiment = FALSE)
sentimentr::is_key(vsm)
vsn = sentimentr::update_valence_shifter_table(vsm, drop = c(dropvalue$x), x= lexicon::hash_valence_shifters, sentiment = FALSE, comparison = TRUE )
However, when I calculate the sentiment using the updated valence shifter table "vsn", it gives a sentiment of 0.
Can someone please let me know how to just keep specific rows of the valence shifter table ?
Thanks!
I built this LED lamp project where I put 126 ping pong balls, each with a WS2812B LED inside, into a glass vase.
I had no previous knowledge of FastLED prior to this.
The balls are all jumbled up, and the LEDs are no longer in any apparent sequence.
In the code I created a 3D array where I assigned each LED a position in the matrix by taping a grid on the outside of the vase, selecting each LED with a built-in potentiometer, and reading the LED number from the serial monitor.
So I now have this 3D array:
int led_zyx[3][7][13] =
{
{ {92, 92, 92, 92, 92, 92, 92, 92, 92, 92, 92, 92, 92},
{92, 92, 92, 92, 92, 92, 92, 92, 92, 92, 92, 92, 92},
{78, 78, 78, 78, 78, 78, 78, 78, 78, 78, 78, 78, 78},
{65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65},
{40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40},
{24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24},
{1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}
},
{
{124, 124, 124, 124, 110, 119, 119, 116, 116, 108, 108, 108, 117},
{92, 89, 89, 107, 104, 104, 110, 116, 105, 108, 108, 106, 106},
{85, 84, 84, 77, 98, 82, 97, 97, 75, 92, 91, 85, 85},
{34, 73, 73, 60, 60, 82, 65, 65, 45, 62, 72, 34, 34},
{38, 32, 50, 46, 47, 40, 51, 40, 29, 29, 38, 38, 38},
{0, 17, 17, 46, 25, 25, 31, 31, 10, 29, 23, 23, 23},
{2, 2, 0, 0, 0, 1, 1, 0, 0, 10, 2, 2, 2}
},
{
{113, 118, 114, 109, 125, 115, 121, 120, 120, 123, 112, 112, 117},
{95, 89, 103, 83, 83, 99, 100, 122, 111, 101, 96, 106, 102},
{84, 94, 81, 80, 83, 88, 79, 70, 87, 86, 96, 90, 93},
{74, 67, 55, 68, 76, 56, 69, 57, 61, 71, 58, 63, 74},
{54, 49, 26, 37, 42, 41, 36, 43, 52, 44, 33, 59, 66},
{18, 19, 13, 20, 25, 25, 21, 30, 22, 35, 39, 33, 27},
{14, 8, 3, 9, 4, 4, 12, 16, 5, 11, 15, 28, 7}
},
};
So I already managed to create all kinds of cool modes like that, except for using the outside of the lamp/vase as an XY matrix to draw animations on. It would be a comfortable 13x7 pixels to play around with, but I can't quite figure out how to address it properly to display graphics, text, or pixel animations.
Could anyone give me a pointer on how to approach this?
I looked into XYMatrix and SmartMatrix, but they require the LEDs to be laid out in a specific order, not in the random way mine are set up.
Thank you for your help
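One way to look at it: since the physical wiring is scrambled, the usual XY(x, y) helper used with FastLED matrices reduces to a lookup into the table already built. Here is a Python sketch of the idea (led_front is the third layer of led_zyx copied from the question; set_pixel is a hypothetical helper); in the actual FastLED code the same lookup would be leds[led_zyx[2][y][x]] = color:

```python
# front layer of the mapping table from the question: led_zyx[2][y][x],
# y = 0..6 (rows), x = 0..12 (columns)
led_front = [
    [113, 118, 114, 109, 125, 115, 121, 120, 120, 123, 112, 112, 117],
    [95, 89, 103, 83, 83, 99, 100, 122, 111, 101, 96, 106, 102],
    [84, 94, 81, 80, 83, 88, 79, 70, 87, 86, 96, 90, 93],
    [74, 67, 55, 68, 76, 56, 69, 57, 61, 71, 58, 63, 74],
    [54, 49, 26, 37, 42, 41, 36, 43, 52, 44, 33, 59, 66],
    [18, 19, 13, 20, 25, 25, 21, 30, 22, 35, 39, 33, 27],
    [14, 8, 3, 9, 4, 4, 12, 16, 5, 11, 15, 28, 7],
]

NUM_LEDS = 126
leds = [(0, 0, 0)] * NUM_LEDS  # stand-in for the FastLED CRGB array

def set_pixel(x, y, color):
    # translate a logical (x, y) pixel into the physical LED index via the table
    leds[led_front[y][x]] = color

# example: draw a horizontal red line across row 3
for x in range(13):
    set_pixel(x, 3, (255, 0, 0))
```

Any XY animation, scrolling text, or graphics code can then target the logical 13x7 grid and call set_pixel, never touching the physical indices directly.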
I have a csv of tweets. I got it using this ruby library:
https://github.com/sferik/twitter .
The csv has two columns and 150 rows; the second column is the text message:
Text
1 RT #AlstomTransport: #Alstom and OHL to supply a #metro system to #Guadalajara #rail #Mexico http://t.co/H88paFoYc3 http://t.co/fuBPPqNts4
I have to do a sentiment analysis, so I need to clean the text messages, removing links, RT, via, and everything else useless for the analysis.
I tried with R, using code found in several tutorials:
> data1 = gsub("(RT|via)((?:\\b\\W*#\\w+)+)", "", data1)
But the output makes no sense:
[1] "1:150"
[2] "c(113, 46, 38, 11, 108, 100, 45, 44, 9, 89, 99, 93, 102, 101, 110, 93, 61, 57, 104, 66, 86, 53, 42, 43, 37, 7, 88, 32, 122, 131, 14, 102, 105, 12, 54, 13, 72, 87, 55, 132, 29, 28, 10, 15, 81, 81, 107, 87, 106, 81, 98, 73, 65, 52, 94, 97, 65, 59, 60, 50, 48, 121, 117, 75, 79, 111, 115, 119, 118, 91, 79, 31, 76, 111, 85, 62, 91, 103, 79, 120, 78, 47, 49, 8, 129, 123, 124, 58, 71, 25, 36, 80, 127, 112, 23, 22, 35, 21, 30, 74, 82, 51, 63, 130, 135, 134, 90, 83, 63, 128, 16, 20, 19, 34, 27, 26, 33, 77, \n114, 126, 64, 69, 4, 135, 41, 40, 17, 67, 92, 96, 84, 92, 56, 18, 125, 5, 6, 133, 24, 39, 70, 95, 116, 68, 84, 109, 92, 3, 1, 2)"
Can anyone help me? Thank you.
Looks like you tried to pass the entire data.frame to gsub rather than just the text column; gsub works on character vectors. Instead you should do
data1[,2] = gsub("(RT|via)((?:\\b\\W*#\\w+)+)", "", data1[,2])
to just transform the second column.
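For comparison, the same cleanup sketched with Python's re module (pattern copied from the question; note that it also strips the hashtag words immediately following RT/via):

```python
import re

pattern = r"(RT|via)((?:\b\W*#\w+)+)"
tweet = ("RT #AlstomTransport: #Alstom and OHL to supply a #metro system "
         "to #Guadalajara #rail #Mexico http://t.co/H88paFoYc3")

# remove the retweet marker and the tag words attached to it
cleaned = re.sub(pattern, "", tweet)
print(cleaned)
```

The remaining links and hashtags would need their own patterns (e.g. one for http\S+), applied the same way, column by column.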