Combinatorial optimization: assignment (matching) with "consecutive" assignments - constraints

I am trying to model an optimization problem in IBM ILOG CPLEX. Basically it is the classic assignment problem.
I have two sets A = {a_1,...,a_n} and B = {b_1,...,b_m} with n < m.
Every element of A has to be assigned to one element of B. To each element of B, at maximum one element of A can be assigned. From n < m, it follows that some elements of B remain free (nothing is assigned to them).
Modelling this is extremely easy. However, I have another constraint and I can't find a way to model that.
The constraint is: all elements from B to which there is an assignment, have to be connected. The assignment has to be consecutive / sequential / however you want to call it.
Example: A = {a_1,a_2,a_3}, B = {b_1,b_2,b_3,b_4,b_5}
If some element from A is assigned to b_1, then b_2 and b_3 have assignments too.
If some element from A is assigned to b_3, then either b_4 and b_5 have assignments, OR b_2 and b_4 have assignments, OR b_1 and b_2 have assignments.
In other words: If x means something is assigned to an element from B, these configurations are allowed: (xxx - -), (- xxx -), (- - xxx).
I use a decision variable x_ij with i in A and j in B. x_ij = 1 iff i is assigned to j.
Anyone got an idea how to model this restriction?

Let y(j) be 1 if there is an assignment to j and 0 otherwise:
y(j) = sum(i,x(i,j))
(This replaces the original assignment constraint sum(i,x(i,j)) ≤ 1)
Now, limit the number of times the pattern [0,1] occurs in the y(j)'s as follows:
z(j) ≥ y(j)-y(j-1)
sum(j, z(j)) ≤ 1
where y(j-1) is taken as 0 for j=1 (i.e., z(1) ≥ y(1)), so a block starting at the first position also counts as one transition. This allows only one 0→1 transition in the y's, which is exactly the contiguity requirement. All variables y and z should be binary (or continuous between 0 and 1).
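Whether this transition trick really enforces contiguity is easy to verify by brute force outside the solver: for every 0/1 vector, "at most one 0→1 transition (treating the position before the first as 0)" is equivalent to "the ones form one contiguous block". A quick Python check (illustration only, not part of the CPLEX model):

```python
from itertools import product

def transitions(y):
    # Count 0 -> 1 transitions, treating the position before
    # the first as 0, so a block starting at position 1 counts too.
    prev, count = 0, 0
    for v in y:
        if v == 1 and prev == 0:
            count += 1
        prev = v
    return count

def contiguous(y):
    ones = [j for j, v in enumerate(y) if v == 1]
    return not ones or ones == list(range(ones[0], ones[-1] + 1))

# For every 0/1 vector of length 7:
# "at most one transition"  <=>  "assigned positions are contiguous"
for y in product([0, 1], repeat=7):
    assert (transitions(y) <= 1) == contiguous(y)
```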
Output for a small data set. First the pure assignment problem:
---- 30 PARAMETER c cost coefficients
j1 j2 j3 j4 j5 j6 j7 j8 j9 j10
i1 0.661 0.756 0.627 0.284 0.086 0.103 0.641 0.545 0.032 0.792
i2 0.073 0.176 0.526 0.750 0.178 0.034 0.585 0.621 0.389 0.359
i3 0.243 0.246 0.131 0.933 0.380 0.783 0.300 0.125 0.749 0.069
i4 0.202 0.005 0.270 0.500 0.151 0.174 0.331 0.317 0.322 0.964
i5 0.994 0.370 0.373 0.772 0.397 0.913 0.120 0.735 0.055 0.576
---- 30 VARIABLE x.L assignment variables
j2 j5 j6 j9 j10
i1 1
i2 1
i3 1
i4 1
i5 1
(zero values are not shown).
After adding these y and z variables and constraints, we see:
---- 54 VARIABLE x.L assignment variables
j5 j6 j7 j8 j9
i1 1
i2 1
i3 1
i4 1
i5 1
---- 54 VARIABLE y.L destination is used
j5 1, j6 1, j7 1, j8 1, j9 1
---- 54 VARIABLE z.L 0-1 transition
j5 1
The complete model for this output was:


Error: longer object length is not a multiple of shorter object length while creating multiple variables in data.table

I'm trying to create multiple columns in a data.table in one command since the logic is simple. I have a column of starting values a0 and need to create a time evolution by simply adding a constant to each next column.
Here is reproducible example
dt <- data.table(a0 = c(0.3, 0.34, 0.45, 0.6, 0.37, 0.444))
dt[, paste0('a', 1:5) := a0 + 1:5 / 4]
I would expect this to produce columns a1, a2, a3, a4, a5 by simply adding 1/4 to each next column; instead I get a warning and an incorrect result:
longer object length is not a multiple of shorter object length
dt
a0 a1 a2 a3 a4 a5
1: 0.300 0.550 0.550 0.550 0.550 0.550
2: 0.340 0.840 0.840 0.840 0.840 0.840
3: 0.450 1.200 1.200 1.200 1.200 1.200
4: 0.600 1.600 1.600 1.600 1.600 1.600
5: 0.370 1.620 1.620 1.620 1.620 1.620
6: 0.444 0.694 0.694 0.694 0.694 0.694
It looks like R is calculating along the wrong dimension. I tried wrapping it in a list, dt[, paste0('a', 1:5) := list(a0 + 1:5 / 4)], but without luck.
You get the warning because length(dt$a0) is 6 whereas length(1:5) is 5.
dt$a0 + 1:5
#[1] 1.300 2.340 3.450 4.600 5.370 1.444
Warning message:
In dt$a0 + 1:5 :
longer object length is not a multiple of shorter object length
Here the first value of 1:5 is recycled and added to dt$a0[6].
You cannot reference the previous column directly like that. If you want to add new columns based on the previous column's value, in this case you can do something like
library(data.table)
n <- 5
dt[, paste0('a', seq_len(n)) := lapply(seq_len(n)/4, function(x) x + a0)]
dt
# a0 a1 a2 a3 a4 a5
#1: 0.300 0.550 0.800 1.050 1.300 1.550
#2: 0.340 0.590 0.840 1.090 1.340 1.590
#3: 0.450 0.700 0.950 1.200 1.450 1.700
#4: 0.600 0.850 1.100 1.350 1.600 1.850
#5: 0.370 0.620 0.870 1.120 1.370 1.620
#6: 0.444 0.694 0.944 1.194 1.444 1.694
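For comparison, the shape of the computation the lapply call performs (one new column per increment, each the full length of a0, so nothing gets recycled) can be sketched in plain Python; this is an illustrative port, not data.table code:

```python
a0 = [0.3, 0.34, 0.45, 0.6, 0.37, 0.444]
increments = [i / 4 for i in range(1, 6)]   # 0.25, 0.50, ..., 1.25

# One new column per increment: every column has length 6, matching a0,
# so there is no length mismatch to warn about.
cols = {f"a{i + 1}": [x + inc for x in a0]
        for i, inc in enumerate(increments)}
```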

Non linear regression for exponential decay model in R

I have the following problem:
I asked 5 people (i=1, ..., 5) to forecast next period's return of 3 different stocks. This gives me the following data:
S_11_i_c <-read.table(text = "
i c_1 c_2 c_3
1 0.150 0.70 0.190
2 0.155 0.70 0.200
3 0.150 0.75 0.195
4 0.160 0.80 0.190
5 0.150 0.75 0.180
",header = T)
In words, in period t=10 participant i=1 expects the return of stock c_1 to be 0.15 in period t=11.
The forecasts are based on past returns of the stocks. These are the following:
S_t_c <-read.table(text = "
time S_c_1 S_c_2 S_c_3
1 0.020 0.015 0.040
2 0.045 0.030 0.050
3 0.060 0.045 0.060
4 0.075 0.060 0.060
5 0.090 0.070 0.060
6 0.105 0.070 0.090
7 0.120 0.070 0.120
8 0.125 0.070 0.140
9 0.130 0.070 0.160
10 0.145 0.070 0.180
",header = T)
In words, stock c=1 had a return of 0.145 in period 10.
So, the variables in table S_11_i_c are the dependent variables.
The variables in table S_t_c are the independent variables.
The model I want to estimate is the following:
My problem with coding this is as follows:
I only know how to express
with the help of a loop. As in:
Sum_S_t_c <- data.frame(
  s = 1:9,
  c_1 = rnorm(9),
  c_2 = rnorm(9),
  c_3 = rnorm(9)
)
for (c in 2:4) {
  running <- 0
  for (s in 1:9) {
    running <- running + S_t_c[10 - s, c]  # add the return from s periods back
    Sum_S_t_c[s, c] <- running
  }
}
However, loops within a regression are not possible. So, my other solution would be to rewrite the sum to
However, as my actual problem has a much larger n, this isn't really working for me.
Any ideas?
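The sum the loop builds can be vectorized. Assuming the elided model discounts past returns geometrically with a decay parameter λ (a guess based on the "exponential decay" title; λ and the exact weighting below are hypothetical, not taken from the question), the nested loop collapses to one weighted sum per stock:

```python
# Past returns of stock c_1 from the S_t_c table, t = 1..10
S = [0.020, 0.045, 0.060, 0.075, 0.090, 0.105, 0.120, 0.125, 0.130, 0.145]

lam = 0.5   # hypothetical decay parameter; in the real model it is estimated

# Sum over lags s = 0..9 of lam^s * S[10 - s]:
# the most recent return gets the highest weight.
weighted_sum = sum(lam ** s * S[9 - s] for s in range(10))
```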

Conditional sorting / reordering of column values in R

I have a data set similar to the following with 1 column and 60 rows:
value
1 0.0423
2 0.0388
3 0.0386
4 0.0342
5 0.0296
6 0.0276
7 0.0246
8 0.0239
9 0.0234
10 0.0214
.
40 0.1424
.
60 -0.0312
I want to reorder the rows so that certain conditions are met. For example one condition could be: sum(df$value[4:7]) > 0.1000 & sum(df$value[4:7]) <0.1100
With the data set looking like this for example.
value
1 0.0423
2 0.0388
3 0.0386
4 0.1312
5 -0.0312
6 0.0276
7 0.0246
8 0.0239
9 0.0234
10 0.0214
.
.
.
60 0.0342
What I tried was using repeat and sample as in the following:
repeat{
df1 <- as_tibble(sample(df$value, replace = TRUE))
if (sum(df1$value[4:7]) > 0.1000 & sum(df1$value[4:7]) < 0.1100) break
}
Unfortunately, this method takes quite some time, and I was wondering if there is a faster way to reorder rows based on mathematical conditions such as sum or prod.
Here's a quick implementation of the hill-climbing method I outlined in my comment. I've had to slightly reframe the desired condition as "distance of sum(x[4:7]) from 0.105" to make it continuous, although you can still use the exact condition when doing the check that all requirements are satisfied. The benefit is that you can add extra conditions to the distance function easily.
# Using same example data as Jon Spring
set.seed(42)
vs = rnorm(60, 0.05, 0.08)
get_distance = function(x) {
distance = abs(sum(x[4:7]) - 0.105)
# Add to the distance with further conditions if needed
distance
}
max_attempts = 10000
best_distance = Inf
swaps_made = 0
for (step in 1:max_attempts) {
# Copy the vector and swap two random values
new_vs = vs
swap_inds = sample.int(length(vs), 2, replace = FALSE)
new_vs[swap_inds] = rev(new_vs[swap_inds])
# Keep the new vector if the distance has improved
new_distance = get_distance(new_vs)
if (new_distance < best_distance) {
vs = new_vs
best_distance = new_distance
swaps_made = swaps_made + 1
}
complete = (sum(vs[4:7]) < 0.11) & (sum(vs[4:7]) > 0.1)
if (complete) {
print(paste0("Solution found in ", step, " steps"))
break
}
}
sum(vs[4:7])
There's no real guarantee that this method will reach a solution, but I often try this kind of basic hill-climbing when I'm not sure if there's a "smart" way to approach a problem.
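For readers more comfortable outside R, the same hill-climbing loop ports directly; a Python sketch of the idea (same logic as the R code above, illustrative only):

```python
import random

random.seed(42)
vs = [random.gauss(0.05, 0.08) for _ in range(60)]

def get_distance(x):
    # How far the sum of positions 4-7 (0-based slice 3:7) is from the target 0.105
    return abs(sum(x[3:7]) - 0.105)

init_distance = get_distance(vs)
best_distance = init_distance
for step in range(10000):
    new_vs = vs[:]                                 # copy and swap two random values
    i, j = random.sample(range(len(vs)), 2)
    new_vs[i], new_vs[j] = new_vs[j], new_vs[i]
    new_distance = get_distance(new_vs)
    if new_distance < best_distance:               # keep only improving swaps
        vs, best_distance = new_vs, new_distance
    if 0.100 < sum(vs[3:7]) < 0.110:               # exact condition satisfied
        break
```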
Here's an approach using combn from base R, and then filtering using dplyr. (I'm sure there's a way w/o it but my base-fu isn't there yet.)
With only 4 numbers from a pool of 60, there are "only" 488k different combinations (ignoring order; =60*59*58*57/4/3/2), so it's quick to brute force in about a second.
# Make a vector of 60 numbers like your example
set.seed(42)
my_nums <- rnorm(60, 0.05, 0.08);
all_combos <- combn(my_nums, 4) # Get all unique combos of 4 numbers
library(tidyverse)
combos_table <- all_combos %>%
t() %>%
as_tibble() %>%
mutate(sum = V1 + V2 + V3 + V4) %>%
filter(sum > 0.1, sum < 0.11)
> combos_table
# A tibble: 8,989 x 5
V1 V2 V3 V4 sum
<dbl> <dbl> <dbl> <dbl> <dbl>
1 0.160 0.00482 0.0791 -0.143 0.100
2 0.160 0.00482 0.101 -0.163 0.103
3 0.160 0.00482 0.0823 -0.145 0.102
4 0.160 0.00482 0.0823 -0.143 0.104
5 0.160 0.00482 -0.0611 -0.00120 0.102
6 0.160 0.00482 -0.0611 0.00129 0.105
7 0.160 0.00482 0.0277 -0.0911 0.101
8 0.160 0.00482 0.0277 -0.0874 0.105
9 0.160 0.00482 0.101 -0.163 0.103
10 0.160 0.00482 0.0273 -0.0911 0.101
# … with 8,979 more rows
This says that in this example, there are about 9000 different sets of 4 numbers from my sequence which meet the criteria. We could pick any of these and put them in positions 4-7 to meet your requirement.
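The "488k combinations" figure quoted above is straightforward to confirm (a quick check in Python; the answer itself uses R's combn):

```python
from itertools import combinations
from math import comb

# 60 choose 4, ignoring order: 60*59*58*57 / (4*3*2*1)
n_combos = comb(60, 4)
print(n_combos)   # 487635

# Enumerating them all agrees with the closed form
assert n_combos == sum(1 for _ in combinations(range(60), 4))
```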

How do I increment by the fraction required and return the result as a vector

I did a for loop and want to return the results as a vector. I only seem to succeed with print, but that's not what I am after.
n<-20
for (i in 1:n) {
start_point <- 0.50
frac <- (start_point / n) * (i-1+1)
increment <- start_point + frac
print(increment)
}
You are overwriting the increment value in every iteration; you need to declare it as a numeric vector and store the value of each iteration using an index.
Some improvements to your current code:
1) no need to initialise start_point in every iteration; it can sit outside the loop
2) (i - 1 + 1) is just i
n <- 20
increment <- numeric(length = n)
start_point <- 0.50
for (i in 1:n) {
frac <- (start_point / n) * i
increment[i] <- start_point + frac
}
increment
# [1] 0.525 0.550 0.575 0.600 0.625 0.650 0.675 0.700 0.725 0.750 0.775
# 0.800 0.825 0.850 0.875 0.900 0.925 0.950 0.975 1.000
However, you could avoid the loop by using seq
seq(start_point + (start_point/n), by = start_point/n, length.out = n)
#[1] 0.525 0.550 0.575 0.600 0.625 0.650 0.675 0.700 0.725 0.750 0.775
# 0.800 0.825 0.850 0.875 0.900 0.925 0.950 0.975 1.000
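The same arithmetic sequence in Python, for comparison (an illustrative port of the seq call, not R code):

```python
n = 20
start_point = 0.50

# The i-th value is start_point + i * (start_point / n), for i = 1..n
increment = [start_point + start_point / n * i for i in range(1, n + 1)]
```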

Manipulating Data.Frames

I have different data.frame objects with two columns. These data.frame objects are called Experiment1, Experiment2, Experiment3 ... Experiment{n}
> Experiment1
Name Statistic
1 a -1.050
2 b 0.058
3 c 0.489
4 d 1.153
5 e 0.736
6 f -1.155
7 g 0.186
> Experiment2
Name Statistic
1 a 0.266
2 b 0.067
3 c -0.385
4 d 0.068
5 e 1.563
6 f 0.745
7 g 1.671
> Experiment3
Name Statistic
1 a 0.004
2 b -2.074
3 c 0.746
4 d 0.207
5 e 0.700
6 f 0.158
7 g 0.067
> Experiment4
Name Statistic
1 a 0.255
2 b -0.542
3 c 0.477
4 d 1.552
5 e 0.025
6 f 1.027
7 g 0.326
> Experiment5
Name Statistic
1 a 1.817
2 b 0.147
3 c 0.052
4 d 0.194
5 e -0.137
6 f 2.321
7 g -0.939
> Experiment6
Name Statistic
1 a 1.817
2 b 0.147
3 c 0.052
4 d 0.194
5 e -0.137
6 f 2.321
7 g -0.939
> ExperimentalDesign$metabolite
[1] "butyrate" "h2s" "hippurate" "acetate" "propionate" "butyrate_2"
[7] "h2s_2" "hippurate_2" "acetate_2" "propionate_2"
I have different data.frame objects with three columns. These data.frame objects are called Experiment1, Experiment2, Experiment3 ... Experiment{n} (where n is NumberTubes divided by NumberParameters).
Now I want to merge the .$Statistic column from each data.frame object into one table (3 statistic columns per output):
tab_1 <- cbind(Experiment1, Experiment2$Statistic, Experiment3$Statistic). Also, take the metabolite from ExperimentalDesign$metabolite in order; e.g. Table_3 would get hippurate.
NumberRepeats <- 3 (Table_1 = merge Experiment_1, Experiment_2$Statistic, Experiment_3$Statistic; Table_2 = merge Experiment_4, Experiment_5$Statistic, Experiment_6$Statistic; etc.)
Experiment_n <- 17 (e.g. Experiment_1, Experiment_2, etc.)
skipTube <- c(11) (skip Experiment_11)
Desired outputs:
Table_1:
Experiment1 Experiment2 Experiment3 metabolite
a -1.050 0.266 0.004 butyrate
b 0.058 0.067 -2.074 butyrate
c 0.489 -0.385 0.746 butyrate
d 1.153 0.068 0.207 butyrate
e 0.736 1.563 0.700 butyrate
f -1.155 0.745 0.158 butyrate
g 0.186 1.671 0.067 butyrate
Table_2
Experiment4 Experiment5 Experiment6 metabolite
a 0.255 1.817 -0.827 h2s
b -0.542 0.147 0.219 h2s
c 0.477 0.052 1.561 h2s
d 1.552 0.194 1.493 h2s
e 0.025 -0.137 0.063 h2s
f 1.027 2.321 0.844 h2s
g 0.326 -0.939 -0.373 h2s
TRIED SO FAR:
With this you merge one column from each data.frame object into a table. You can control the number of columns with the NumberRepeats variable. All tables, which are stored in a list, have the same number of data columns as NumberRepeats, except possibly the last table...
# created test data
for(i in 1:17){
Name <- letters[1:7]
Statistic <- round(rnorm(7), 3)
assign(paste0("Experiment",i), data.frame(Name, Statistic))
}
# set some parameters
NumberRepeats <- 3
Experiment_n <- 17
skipTube <- c(11)
# lets go
out <- list()
list_index <- 1
counter <- 1
while(counter <= Experiment_n) {  # <= so a trailing partial group (e.g. Experiment17) is kept
tab <- NULL
nam <- NULL
while((is.null(tab) || ncol(tab) < NumberRepeats) & Experiment_n >= counter){
if(!any(counter == skipTube)){
tab <- cbind(tab, get(paste0("Experiment", counter))$Statistic)
# tab <- as.data.frame(tab)
nam <- c(nam,paste0("Experiment", counter))
}
counter <- counter + 1
}
colnames(tab) <- nam
rownames(tab) <- as.matrix(Experiment1$Name)
out[[list_index]] <- tab
assign(paste0('table_', list_index), tab)
list_index <- list_index + 1
}
out
Output from above code:
Experiment1 Experiment2 Experiment3
a 0.136 0.260 -1.089
b 0.946 -1.165 -0.599
c -0.462 -1.445 0.044
d -1.936 -0.391 0.622
e 0.537 -0.502 1.192
f 0.259 0.096 -1.873
g 1.352 0.049 -0.644
Desired output from the above code:
Experiment1 Experiment2 Experiment3 metabolite
a -1.050 0.266 0.004 butyrate
b 0.058 0.067 -2.074 butyrate
c 0.489 -0.385 0.746 butyrate
d 1.153 0.068 0.207 butyrate
e 0.736 1.563 0.700 butyrate
f -1.155 0.745 0.158 butyrate
g 0.186 1.671 0.067 butyrate
Something like this should work, but it is also quite manual:
table1 = Reduce(function(x,y){cbind(x,y)},
list(Experiment1$Statistic,Experiment2$Statistic,
Experiment3$Statistic,ExperimentalDesign$metabolite[1]))
table2 = Reduce(function(x,y){cbind(x,y)},
list(Experiment4$Statistic,Experiment5$Statistic,
Experiment6$Statistic,ExperimentalDesign$metabolite[2]))
EDIT: A more robust solution:
First create a list of all the experiment data.frames named ldf:
ldf = list(Experiment1,Experiment2,Experiment3,...,Experimentn)
And then:
lapply(1:ceiling(length(ldf)/3),
function(t,l,df){
if(t==ceiling(length(l)/3)){
ind = ((3*t)-2):(3*t-(length(l)%%3))
}else{
ind = ((3*t)-2):(3*t)
};
cbind(Reduce(function(x,y){cbind(x,y)},lapply(l[ind],'[[','Statistic')),
df$metabolite[t])
},
ldf,ExperimentalDesign)
This solution should do what you want in case you want to aggregate every 3 tables.
library(reshape2)  # dcast() with value.var comes from reshape2
for(i in 1:17){
Name <- letters[1:7]
Statistic <- round(rnorm(7), 3)
ExperimentName <- rep(paste0("Experiment",i), 7)
assign(paste0("Experiment",i), data.frame(ExperimentName, Name, Statistic, stringsAsFactors = FALSE) )
}
# set some parameters
NumberRepeats <- 5
Experiment_n <- 17
skipTube <- c(3,7,11)
# Create dummy list for the metabolites
metabolites <- c("met1", "met2", "met3", "met4", "met5")
for (iteration in c(1:Experiment_n)){
if (iteration %% 3 == 0){
temp_df <- rbind(get(paste0("Experiment", iteration - 2)), get(paste0("Experiment", iteration - 1)), get(paste0("Experiment", iteration)))
print(temp_df)
temp_df <- melt(data = temp_df)
aggregates <- dcast(data = temp_df, formula = Name ~ ExperimentName, value.var = "value")
aggregates$metabolite <- metabolites[iteration/3]
print(aggregates)
}
}
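The core grouping logic in both answers (walk the experiment indices in order, leave out the tubes in skipTube, and cut the rest into chunks of NumberRepeats) can be sanity-checked independently of the data.frames; a short Python illustration:

```python
def group_experiments(n_experiments, repeats, skip):
    """Chunk experiment indices 1..n_experiments into groups of `repeats`,
    leaving out any index listed in `skip` (the skipTube logic)."""
    kept = [i for i in range(1, n_experiments + 1) if i not in skip]
    return [kept[i:i + repeats] for i in range(0, len(kept), repeats)]

groups = group_experiments(17, 3, {11})
# [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 12, 13], [14, 15, 16], [17]]
```

Note how Experiment_11 is skipped and the last table ends up with fewer columns, matching the behaviour described above.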
