Why is the limit on the number of solutions in CPLEX not taken into consideration? - julia

I'm solving a MILP problem with CPLEX, called from Julia via the JuMP package.
In the CPLEX log the number of solutions displayed is more than 3000, even though the parameter CPXPARAM_MIP_Limits_Solutions is set to 55, so the solver should return once more than 55 solutions have been found.
The explosion in the number of solutions causes an out-of-memory error, so the Linux kernel kills the process.
This is the log:
CPXPARAM_Emphasis_Memory 1
CPXPARAM_Emphasis_MIP 2
CPXPARAM_MIP_Limits_Solutions 55
CPXPARAM_TimeLimit 60
Warning: Non-integral bounds for integer variables rounded.
2 of 3 MIP starts provided solutions.
MIP start 'm1' defined initial solution with objective 0.0000.
Warning: Non-integral bounds for integer variables rounded.
Tried aggregator 2 times.
MIP Presolve eliminated 12094 rows and 182224 columns.
MIP Presolve added 26 rows and 0 columns.
MIP Presolve modified 17428 coefficients.
Aggregator did 1 substitutions.
Reduced MIP has 5863 rows, 4313 columns, and 28322 nonzeros.
Reduced MIP has 4132 binaries, 175 generals, 0 SOSs, and 0 indicators.
Presolve time = 0.35 sec. (311.81 ticks)
Probing fixed 3059 vars, tightened 200 bounds.
Probing changed sense of 57 constraints.
Probing time = 0.45 sec. (324.14 ticks)
Cover probing fixed 0 vars, tightened 286 bounds.
Tried aggregator 2 times.
MIP Presolve eliminated 4435 rows and 3257 columns.
MIP Presolve modified 923 coefficients.
Aggregator did 2 substitutions.
Reduced MIP has 1426 rows, 1054 columns, and 7403 nonzeros.
Reduced MIP has 929 binaries, 122 generals, 0 SOSs, and 0 indicators.
Presolve time = 0.02 sec. (19.58 ticks)
Probing time = 0.03 sec. (18.90 ticks)
Tried aggregator 1 time.
MIP Presolve eliminated 5 rows and 3 columns.
MIP Presolve modified 1 coefficients.
Reduced MIP has 1421 rows, 1051 columns, and 7378 nonzeros.
Reduced MIP has 927 binaries, 121 generals, 0 SOSs, and 0 indicators.
Presolve time = 0.01 sec. (7.48 ticks)
Probing time = 0.08 sec. (52.47 ticks)
Clique table members: 6451.
MIP emphasis: optimality.
MIP search method: dynamic search.
Parallel mode: deterministic, using up to 32 threads.
Root relaxation solution time = 0.02 sec. (14.70 ticks)
Nodes Cuts/
Node Left Objective IInf Best Integer Best Bound ItCnt Gap
* 0+ 0 0.0000 -106269.5431 ---
0 0 -463.9717 61 0.0000 -463.9717 480 ---
0 0 -454.9015 109 0.0000 Cuts: 86 537 ---
0 0 -434.5372 112 0.0000 Cuts: 87 592 ---
0 0 -426.6747 113 0.0000 Cuts: 97 622 ---
0 0 -418.6204 136 0.0000 Cuts: 62 660 ---
0 0 -413.7867 132 0.0000 Cuts: 55 698 ---
0 0 -409.6387 140 0.0000 Cuts: 16 721 ---
0 0 -407.9923 135 0.0000 Cuts: 39 739 ---
0 0 -407.0012 148 0.0000 Cuts: 34 760 ---
0 0 -406.3034 149 0.0000 Cuts: 11 775 ---
0 0 -405.7757 134 0.0000 Cuts: 17 784 ---
0 0 -405.4831 148 0.0000 Cuts: 59 804 ---
0 2 -405.4831 145 0.0000 -118.6877 804 ---
Elapsed time = 2.12 sec. (1148.70 ticks, tree = 0.02 MB, solutions = 2)
* 282 17 integral 0 0.0000 -118.6877 3889 ---
6415 1365 -0.0974 1 0.0000 -0.1947 20809 ---
11118 1933 -348.8038 138 0.0000 -0.1947 37285 ---
* 11185 11 integral 0 0.0000 -0.1947 37522 ---
* 11206 16 integral 0 -0.0000 -0.1947 37594 ---
12049 384 -0.0974 1 -0.0000 -0.1947 39994 ---
13976 1504 -0.0974 1 -0.0000 -0.1947 44560 ---
15081 1894 -0.0974 1 -0.0000 -0.1947 47408 ---
16098 2205 -0.0000 0 -0.0000 -0.1947 49781 ---
17468 2844 -0.0974 1 -0.0000 -0.1947 52969 ---
18578 3322 -0.0000 0 -0.0000 -0.1947 56013 ---
19990 1939 -0.0000 0 -0.0000 -0.1947 61970 ---
Elapsed time = 14.88 sec. (4728.96 ticks, tree = 2.14 MB, solutions = 3127)
21354 555 cutoff -0.0000 -0.1947 67537 ---
26500 824 -0.0974 1 -0.0000 -0.1431 79682 ---
Killed
Version:
CPLEX 12.9.0
Julia 1.2.0
JuMP 0.20.1
Edit:
The CPXPARAM_MIP_Limits_Solutions parameter controls the maximum number of incumbent solutions to be found before stopping the optimization. However, this number is not enough to control the number of solutions the solver keeps in memory, because there may be multiple solutions that are equivalent in terms of objective value, and those count as only one solution for CPXPARAM_MIP_Limits_Solutions purposes. Therefore, to limit the memory consumed by solutions stored by the solver, the right parameter is CPXPARAM_MIP_Pool_Capacity (which controls how many solutions the solver keeps in the solution pool). I set it to 0, because I was not interested in getting back all the solutions explored by CPLEX, just the best one.
After this configuration the program terminated its run without being killed by the kernel.

The number of solutions is very likely not the reason for the out-of-memory error. It's the size of the branch-and-bound tree and the number of nodes that need to be stored and processed. You should try limiting the number of threads that are used to reduce the memory footprint.
Furthermore, there aren't that many proper solutions found. For every new incumbent you see a marker (* or H) at the beginning of the respective line, e.g.,
* 282 17 integral 0 0.0000 -118.6877 3889 ---
6415 1365 -0.0974 1 0.0000 -0.1947 20809 ---
11118 1933 -348.8038 138 0.0000 -0.1947 37285 ---
* 11185 11 integral 0 0.0000 -0.1947 37522 ---
* 11206 16 integral 0 -0.0000 -0.1947 37594 ---
12049 384 -0.0974 1 -0.0000 -0.1947 39994 ---
I don't know what the number of solutions reported in the log refers to. Probably these are additional solutions that do not improve the objective.
Please note the description of the parameter CPXPARAM_MIP_Limits_Solutions in the CPLEX documentation:
Sets the number of MIP solutions to be found before stopping.
This integer solution limit does not apply to the populate procedure, which generates solutions to store in the solution pool. For a limit on the number of solutions generated by populate, see the populate limit parameter: maximum number of solutions generated for solution pool by populate.
You may want to check the CPXPARAM_MIP_Limits_Populate parameter as well.

Related

issue with normalizing variable

I am trying to normalize a variable using Box-Cox. However, I am receiving an error message:
boxcox_obj <- boxcox(alive_data_4$mosslpadeq)
Error in estimate_boxcox_lambda(x, ...) : x must be positive
I read online that you can get this message when the variable has negative values. However, that is not the case with this variable (see the frequency table below).
table(alive_data_4$mosslpadeq)
0 10 20 30 40 50 60 70 80 90 100
766 635 2141 1756 3355 1913 2095 1400 4498 1361 2228
Can someone advise?
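Note that the frequency table itself points at the likely cause: Box-Cox requires strictly positive values, and the 766 observations equal to 0 are non-positive even though they are not negative. A minimal sketch of a common workaround, shifting the variable by a constant before transforming (hedged: this assumes boxcox() here is the one from the bestNormalize package, which is where estimate_boxcox_lambda() appears to live):
library(bestNormalize)
x <- alive_data_4$mosslpadeq
sum(x <= 0)                  # counts the zeros; zero is not strictly positive
# Shift by 1 so every value is > 0 before applying Box-Cox;
# remember to undo the shift when back-transforming.
boxcox_obj <- boxcox(x + 1)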

Problem with nor.test function in R. Sample size must be between 3 and 5000

I have a problem using the nor.test function (from the onewaytests package) in R.
My data contain yield values (Rdt_pied) grouped by treatment (Traitement). In each treatment I have between 60 and 90 values.
> describe(Rdt_pied ~ Traitement, data = dataMax)
n Mean Std.Dev Median Min Max 25th 75th Skewness Kurtosis NA
G(1) 90 565.0222 282.1874 535.0 91 1440 379.00 751.25 0.7364071 3.727566 0
G(2) 90 703.1444 366.1114 632.5 126 1628 431.50 1007.75 0.4606251 2.392356 0
G(3) 90 723.9667 523.5872 650.5 64 2882 293.50 1028.50 1.2606231 5.365014 0
G(4) 90 954.1000 537.0138 834.5 83 2792 565.25 1143.75 1.1695460 4.672321 0
G(A) 60 368.0667 218.1940 326.0 99 1240 243.00 420.00 2.2207612 9.234473 0
G(H) 60 265.4667 148.0383 223.5 107 866 148.00 357.25 1.3759925 5.685456 0
G(S) 60 498.8000 280.1277 401.0 170 1700 292.75 617.50 1.6792061 7.125804 0
G(T) 60 521.7167 374.7822 448.5 74 1560 214.00 733.25 1.1367209 3.737134 0
>
Why does nor.test return this error?
> nor.test(Rdt_pied ~ Traitement, data = dataMax)
Error in shapiro.test(y[which(group == (levels(group)[i]))]) :
sample size must be between 3 and 5000
Thank you for your help!
Haven't used that package, but per the documentation (and your error), nor.test performs a Shapiro-Wilk normality test by default, which needs a numeric vector of between 3 and 5000 values per group as input. My guess is that there is a group, based on Traitement, with fewer than 3 values or more than 5000.
Try to check it with something like
table(dataMax$Traitement)
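Since your describe() output shows 60-90 values in every listed group, another possibility (a hedged guess, not confirmed by your post) is an unused factor level: the error message shows that nor.test loops over levels(group), and a level with zero observations hands shapiro.test() an empty vector. Dropping unused levels would then fix it:
# Hypothetical fix: drop factor levels with zero observations
dataMax$Traitement <- droplevels(dataMax$Traitement)
nor.test(Rdt_pied ~ Traitement, data = dataMax)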

adding and subtracting values in multiple data frames of different lengths - flow analysis

Thank you jakub and Hack-R!
Yes, these are my actual data. The data I am starting from are the following:
[A] #first, longer dataset
CODE_t2 VALUE_t2
111 3641
112 1691
121 1271
122 185
123 522
124 0
131 0
132 0
133 0
141 626
142 170
211 0
212 0
213 0
221 0
222 0
223 0
231 95
241 0
242 0
243 0
244 0
311 129
312 1214
313 0
321 0
322 0
323 565
324 0
331 0
332 0
333 0
334 0
335 0
411 0
412 0
421 0
422 0
423 0
511 6
512 0
521 0
522 0
523 87
In the above table, we can see the 44 land use CODEs (which I inappropriately named "class" in my first entry) for a certain city. Some values are just 0, meaning there are no land uses of that type in that city.
Starting from this table, which displays all the land use types for t2 and their corresponding values ("VALUE_t2"), I have to reconstruct the previous amount of land use ("VALUE_t1") for each type.
To do so, I have to add and subtract the values for each land use (if not 0) using the "change land use table" from t2 to t1, which is the following:
[B] #second, shorter dataset
CODE_t2 CODE_t1 VALUE_CHANGE1
121 112 2
121 133 12
121 323 0
121 511 3
121 523 2
123 523 4
133 123 3
133 523 4
141 231 12
141 511 37
So, in order to get VALUE_t1 from VALUE_t2, I have, for instance, to subtract 2 + 12 + 0 + 3 + 2 hectares (the first 5 values of the second, shorter table) from the value of land use type/code 121 in the first, longer table (1271 ha), and add 2 hectares to land type 112, 12 hectares to land type 133, 3 hectares to land type 511, and 2 hectares to land type 523. I have to do that for all land use types different from 0, and later also from t1 to t0.
What I need is a sort of loop that would both add and subtract, for each land use type/code, the values from VALUE_t2 to get VALUE_t1, and then from VALUE_t1 to get VALUE_t0.
Once I estimated VALUE_t1 and VALUE_t0, I will put the values in a simple table showing the relative variation (here the values are not real):
CODE VALUE_t0 VALUE_t2 % VAR t2-t0
code1 50 100 ((100-50)/50)*100
code2 70 80 ((80-70)/70)*100
code3 45 34 ((34-45)/45)*100
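(That last variation column is plain vectorized arithmetic once the value columns exist; a hedged one-liner, assuming the summary table is a data frame named res:)
# Hypothetical: res has columns CODE, VALUE_t0, VALUE_t2
res$VAR_t2_t0 <- (res$VALUE_t2 - res$VALUE_t0) / res$VALUE_t0 * 100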
What I could do so far is:
land_code <- names(A)[-1]
land_code
A$VALUE_t1 <- for(code in land_code{
cbind(A[1], A[land_code] - B[match(A$CODE_t2, B$CODE_t2), land_code])
}
If I use the loop I get an error, while if I take it away:
A$VALUE_t1 <- cbind(A[1], A[land_code] - B[match(A$CODE_t2, B$CODE_t2), land_code])
it works, but I don't really get what I want to get... So far I have been trying to create a new column containing the new "add & subtract" values, but I haven't succeeded yet; as a first step, I tried to create a new column that at least matches the land use types, so I could then include the "add and subtract" formula.
Another problem is that, by using "match", I get a shorter A$VALUE_t1 table (13 rows instead of 44), while I would like to keep all the land use types in dataset A, because I will then have to match it with the table including VALUE_t0 (which I haven't shown here).
Sorry that I cannot do better than this at the moment... I hope to have explained more clearly what I need to do. I am extremely grateful for any help you can provide.
thanks a lot
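A minimal base-R sketch of the add-and-subtract step, assuming the two tables are data frames named A and B with the columns shown above (hedged: this reproduces the worked code-121 example, not a tested solution for the full t2-to-t0 chain):
# Hectares leaving each t2 code (to subtract) and arriving at each
# t1 code (to add), summed over the change table B.
out_flow <- aggregate(VALUE_CHANGE1 ~ CODE_t2, data = B, FUN = sum)
in_flow  <- aggregate(VALUE_CHANGE1 ~ CODE_t1, data = B, FUN = sum)
# match() returns NA for codes that do not appear in B; treat those as 0
# so that all 44 rows of A are kept.
minus <- out_flow$VALUE_CHANGE1[match(A$CODE_t2, out_flow$CODE_t2)]
plus  <- in_flow$VALUE_CHANGE1[match(A$CODE_t2, in_flow$CODE_t1)]
minus[is.na(minus)] <- 0
plus[is.na(plus)]   <- 0
A$VALUE_t1 <- A$VALUE_t2 - minus + plus
# e.g. code 121: 1271 - (2+12+0+3+2) = 1252; code 133: 0 - (3+4) + 12 = 5
The same step, applied again with the t1-to-t0 change table, would give VALUE_t0.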

Fitting logistic growth curves to data

I've been attempting to fit logistic growth equations to data sets I have, with mixed results. I typically use a setup like this:
# Post PT
time <- 1:48
Diversity <- new8
plot(time, Diversity,log="y",las=1, pch=16, type="l")
logisticModel <- nls(Diversity~K/(1+exp(Po+r*time)), start=list(Po=25, r=-1.6, K=200),control=list(maxiter=1000,minFactor=.00000000001))
The goal here is to model Diversity over time logistically; this is a species diversity curve that asymptotes. However, for particular datasets, I cannot get the model to work and can't for the life of me figure out why. As an example, in one iteration, the Diversity (and therefore, new8) values being pulled are
[1] 25 22 68 72 126 141 82 61 97 126 101 110 173 164 160 137 122 113 104 104 109 102 107 122 149 127 137 146 185 188 114 91 102 132 147
[36] 148 151 154 165 215 216 206 205 207 207 220 200 204
# plot via this, and it is a nice species diversity curve beginning to level off
plot(Diversity,type="l")
This data is beginning to reach its limit, yet I cannot fit a logistic curve to it. If I try, I get an exceeded max iterations error, no matter how high I crank up the iterations. I've played with the starting parameters over and over with no luck. Currently, for example the code looks like this:
# Post PT
time <- 1:48
Diversity <- new8
plot(time, Diversity,log="y",las=1, pch=16, type="l")
logisticModel <- nls(Diversity~K/(1+exp(Po+r*time)), start=list(Po=25, r=-1.6, K=200),control=list(maxiter=1000,minFactor=.00000000001))
Any help is more than appreciated. Spent all day sitting on my couch stuck on this. If someone has a better way to coerce a logistic growth curve out of data, I'd love to hear it! As a side note, I've used SSlogis for these datasets with no luck, either.
Numerical instability is often a problem with models involving exponential terms. Try evaluating your model at your starting parameters:
> 200/(1+exp(25-1.6*df$norm_time))
[1] 2.871735e-09 2.969073e-09 3.069710e-09 3.173759e-09 3.281333e-09 3.392555e-09 3.507546e-09 3.626434e-09 3.749353e-09
[10] 3.876437e-09 4.007830e-09 4.143676e-09 4.284126e-09 4.429337e-09 4.579470e-09 4.734691e-09 4.895174e-09 5.061097e-09
[19] 5.232643e-09 5.410004e-09 5.593377e-09 5.782965e-09 5.978979e-09 6.181637e-09 6.391165e-09 6.607794e-09 6.831766e-09
[28] 7.063329e-09 7.302742e-09 7.550269e-09 7.806186e-09 8.070778e-09 8.344338e-09 8.627170e-09 8.919589e-09 9.221919e-09
[37] 9.534497e-09 9.857670e-09 1.019180e-08 1.053725e-08 1.089441e-08 1.126368e-08 1.164546e-08 1.204019e-08 1.244829e-08
[46] 1.287023e-08 1.330646e-08 1.375749e-08
With predicted data having such small values, it's likely that any moderate change in the parameters, as required by nls() to estimate gradients, will produce changes in the fitted values that are tiny, barely above or even below minFactor.
It's better to normalize your data so that its numerical range is within a nice friendly range, like 0 to 1.
require(stringr)
require(ggplot2)
new8 <- '25 22 68 72 126 141 82 61 97 126 101 110 173 164 160 137 122 113 104 104 109 102 107 122 149 127 137 146 185 188 114 91 102 132 147 148 151 154 165 215 216 206 205 207 207 220 200 204'
Diversity = as.numeric(str_split(new8, '[ ]+')[[1]])
time <- 1:48
df = data.frame(time=time, div=Diversity)
# normalize time
df$norm_time <- df$time / max(df$time)
# normalize diversity
df$norm_div <- (df$div - min(df$div)) / max(df$div)
With this way of normalizing diversity, your Po parameter can always be assumed to be 0. That means we can eliminate it from the model. The model now only has two degrees of freedom instead of three, which also makes fitting easier.
That leads us to the following model:
logisticModel <- nls(norm_div~K/(1+exp(r*norm_time)), data=df,
start=list(K=1, r=-1.6),
control=list(maxiter=1000, minFactor=.00000000001))
Your data doesn't look like that great a fit to the model to me, but I'm not an expert in your field:
ggplot(data=df, aes(x=norm_time, y=norm_div)) +
  geom_point() +   # note: log scaling would go in scale_y_log10(), not geom_point()
  geom_line(aes(x=norm_time, y=predict(logisticModel)), color='red') +
  theme_bw()
quartz.save('~/Desktop/SO_31236153.png', type='png')
summary(logisticModel)
Formula: norm_div ~ K/(1 + exp(r * norm_time))
Parameters:
Estimate Std. Error t value Pr(>|t|)
K 0.6940 0.1454 4.772 1.88e-05 ***
r -2.6742 2.4222 -1.104 0.275
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.1693 on 46 degrees of freedom
Number of iterations to convergence: 20
Achieved convergence tolerance: 5.895e-06
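To read the fit back on the original diversity scale, the normalization can be inverted (a hedged sketch, simply reversing the norm_div definition above):
# Invert norm_div = (div - min(div)) / max(div)
df$fit_div <- predict(logisticModel) * max(df$div) + min(df$div)
plot(df$time, df$div, pch = 16, las = 1)
lines(df$time, df$fit_div, col = 'red')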

linear interpolate 15 Hz time series to match with 25 Hz time series in R

Hi, I have the following data recorded at 15 Hz and I want to resample it to 25 Hz using linear interpolation. What is the best way to achieve this?
Here is the first second of my data set:
RecordFile YTSIMTMD RBDDLO_0 RBDDGS_0 IDLWMWC1 time timeNF
864 2C01MUC.txx 85535.10 -0.31 -0.348873 1 0.00000 0
865 2C01MUC.txx 85535.17 -0.31 -0.348873 1 0.06667 6667
866 2C01MUC.txx 85535.23 -0.31 -0.348873 0 0.13334 13334
867 2C01MUC.txx 85535.30 -0.31 -0.348832 0 0.20000 20000
868 2C01MUC.txx 85535.37 -0.31 -0.348832 0 0.26667 26667
869 2C01MUC.txx 85535.43 -0.31 -0.348832 0 0.33334 33334
870 2C01MUC.txx 85535.50 -0.31 -0.348832 1 0.40000 40000
871 2C01MUC.txx 85535.57 -0.31 -0.348796 1 0.46667 46667
872 2C01MUC.txx 85535.63 -0.31 -0.348796 1 0.53334 53334
873 2C01MUC.txx 85535.70 -0.31 -0.348796 1 0.60000 60000
874 2C01MUC.txx 85535.77 -0.31 -0.348796 0 0.66667 66667
875 2C01MUC.txx 85535.83 -0.31 -0.348767 0 0.73334 73334
876 2C01MUC.txx 85535.90 -0.31 -0.348767 0 0.80000 80000
877 2C01MUC.txx 85535.97 -0.31 -0.348767 0 0.86667 86667
878 2C01MUC.txx 85536.03 -0.31 -0.348767 1 0.93334 93334
879 2C01MUC.txx 85536.10 -0.31 -0.348735 1 1.00000 100000
After that I want to match it with this data set, recorded at 25 Hz:
vpName vpID origIndex areaNum areaName startMS endMS durationMS startF endF durationF accumIndex
1 2C01 1 1 2 ATT 0 560 560 0 14 14 1
2 2C01 1 1 2 ATT 0 560 560 0 14 14 1
3 2C01 1 1 2 ATT 0 560 560 0 14 14 1
4 2C01 1 1 2 ATT 0 560 560 0 14 14 1
5 2C01 1 1 2 ATT 0 560 560 0 14 14 1
6 2C01 1 1 2 ATT 0 560 560 0 14 14 1
I found that approx seems to be the function for linear interpolation in R; however, I am not sure which parameters to use to upsample my data from 15 Hz to 25 Hz.
There seem to be dedicated packages for handling time series in R, like zoo and xts, but I am not sure whether I need them.
Both data sets start at the same time, so after upsampling I could simply match by row number.
Thank you for your help!
I'll make some assumptions: first, that the data columns "YTSIMTMD", "RBDDLO_0", and "RBDDGS_0" contain continuous data, so linear interpolation can be used; second, that column IDLWMWC1 contains binary data, so we interpolate it with method="constant", which selects the data value at the last data time prior to the interpolation time. Given this, the following uses approx to do the interpolations and combines the results into a data frame. The interpolation times are generated at a time interval of 1/25 second. I put your data into a data frame called xx.
t_seq <- seq(min(xx$time), max(xx$time), 1/25)
ap <- cbind(t_seq, sapply(xx[, c("YTSIMTMD", "RBDDLO_0", "RBDDGS_0")],
                          function(y, x, nout) approx(x, y, nout, method = "linear")$y,
                          x = xx$time, nout = t_seq))
ap <- cbind(ap, IDLWMWC1 = approx(xx$time, xx$IDLWMWC1, t_seq, method = "constant")$y)
I don't quite understand how your second set of data relates to the first, but if it's just additional information at intervals of 1/25 second starting at the same time, you could just combine the two data frames using cbind.
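(A hedged illustration of that combine step, with second_df as a stand-in name for your 25 Hz data frame, trimmed to a common row count:)
# Hypothetical names: ap from above, second_df = the 25 Hz data
n <- min(nrow(ap), nrow(second_df))
combined <- cbind(as.data.frame(ap)[1:n, ], second_df[1:n, ])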
Here's an example, using approxfun to create a function with the linear fit to the input data:
xin <- seq(1, 26, by = 5)
yin <- 2.5 + 3 * xin
myfun <- approxfun(xin, yin)
plot(xin, yin)
newy <- myfun(seq(3, 18, by = 5))
points(seq(3, 18, by = 5), newy, col = 'red')
In your case, the inputs are time for the x-values and whatever you are working with for the y-values. Then just feed a sequence of "new" x values at 25 Hz intervals (0.04 seconds) to get the fitted values you want.
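(A minimal sketch of that last step, assuming the 15 Hz data sits in the xx data frame used above:)
f <- approxfun(xx$time, xx$YTSIMTMD)                    # linear fit to one column
new_time <- seq(min(xx$time), max(xx$time), by = 0.04)  # 25 Hz grid
upsampled <- f(new_time)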
