GAMS Error in MIP Transportation Problem - Uncontrolled set entered as constant - mixed-integer-programming

I am trying to formulate a MIP model in which transportation can be carried out either by the available trains or by investing in new ships. My current code includes three tables: monthly costs for trains, monthly costs for ships, and initial investment costs for ships.
It raises error 149 at the "cost.. z =e=" line: Uncontrolled set entered as constant. Errors 257 and 141 are also raised at lines 56 and 57, respectively.
i supply nodes /Plant1, Plant2, Plant3, Plant4/
j demand nodes /City1, City2, City3, City4, City5, Dummy/;
a(i) supply capacities
/Plant1 290
Plant2 220
Plant3 180
Plant4 280/
b(j) demands
/City1 180
City2 200
City3 160
City4 140
City5 250
Dummy 40/;
Table c1(i,j) transport costs for trains
         City1   City2   City3   City4   City5   Dummy
Plant1     8.5       7       8     6.5       9       0
Plant2     7.5       8       7      10     8.5       0
Plant3      11       6     6.5       8       7       0
Plant4       9       7      12       6     7.5       0 ;
Table c2(i,j) transport costs for ships
         City1   City2   City3   City4   City5   Dummy
Plant1     5.5       6   99999     3.5       4       0
Plant2       3     4.5       4     6.5       6       0
Plant3   99999   99999       3       4     4.5       0
Plant4       5     4.5       7       3   99999       0 ;
Table in(i,j) investment costs for ships
         City1   City2   City3   City4   City5   Dummy
Plant1      40      90   99999      40      80       0
Plant2      60      40      80      20      40       0
Plant3   99999   99999      80      60     100       0
Plant4     100      60      60      80   99999       0 ;
Positive Variables
x(i,j) flow between supply node i and demand node j;
y(i,j) whether a ship is bought for the transfer from i to j
z total cost;
Binary Variables y;
cost objective function
supply(i) supply constraint
demand(j) demand constraint;
cost.. z =e= sum((i,j), c1(i,j)*x(i,j)*12*10*(1-y(i,j)) + c2(i,j)*x(i,j)*12*10*y(i,j)) + in(i,j)*y(i,j);
supply(i).. sum(j, x(i,j)) =l= a(i);
demand(j).. sum(i, x(i,j)) =g= b(j);
Model homework1c /all/;
Solve homework1c using MIP minimizing z;
Display x.l, x.M, y.l;
I would appreciate any suggestions to fix them, thanks in advance.

I see two issues:
1. The indices i and j in the term + in(i,j)*y(i,j) at the end of your cost equation are not controlled, because that term sits outside the closing parenthesis of the sum. Was it supposed to be part of the sum over i and j?
2. You are trying to solve a MIP (which must be linear), but the cost equation multiplies the variable x by the variable y. So you either need to solve the model as a MINLP or reformulate the cost equation.
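One possible reformulation, offered as a sketch rather than as the intended fix: split the flow on each route into a train part and a ship part, so that x(i,j)*(1-y(i,j)) becomes xt(i,j) and x(i,j)*y(i,j) becomes xs(i,j), and tie both to y with the bounds xt(i,j) <= M(i,j)*(1-y(i,j)) and xs(i,j) <= M(i,j)*y(i,j). The names xt, xs, inv and big_m are assumptions introduced here (they are not in the original model), and M(i,j) = min(a(i), b(j)) is just one valid choice of bound. The short Python check below evaluates both cost expressions on two routes taken from the tables to show that they agree:

```python
# Sketch only: xt, xs, inv and big_m are hypothetical names, not part of the original model.
MONTHS = 12 * 10  # the 12*10 factor from the original cost equation


def bilinear_cost(c1, c2, inv, x, y):
    """Original cost, with the investment term moved inside the sum."""
    return sum(
        MONTHS * x[r] * (c1[r] * (1 - y[r]) + c2[r] * y[r]) + inv[r] * y[r]
        for r in x
    )


def linear_cost(c1, c2, inv, xt, xs, y):
    """Cost after splitting the flow: no variable is multiplied by another."""
    return sum(
        MONTHS * (c1[r] * xt[r] + c2[r] * xs[r]) + inv[r] * y[r] for r in y
    )


def split_is_feasible(xt, xs, y, big_m):
    """Linking bounds: train flow only if y = 0, ship flow only if y = 1."""
    return all(
        0 <= xt[r] <= big_m[r] * (1 - y[r]) and 0 <= xs[r] <= big_m[r] * y[r]
        for r in y
    )


# Two routes copied from the tables in the question.
r1, r2 = ("Plant1", "City1"), ("Plant2", "City4")
c1 = {r1: 8.5, r2: 10}
c2 = {r1: 5.5, r2: 6.5}
inv = {r1: 40, r2: 20}
big_m = {r1: min(290, 180), r2: min(220, 140)}  # assumed bound min(a(i), b(j))

x = {r1: 180, r2: 100}  # example flows
y = {r1: 1, r2: 0}      # ship bought on the first route only

xt = {r: x[r] * (1 - y[r]) for r in x}
xs = {r: x[r] * y[r] for r in x}

orig = bilinear_cost(c1, c2, inv, x, y)
lin = linear_cost(c1, c2, inv, xt, xs, y)
print(orig, lin, split_is_feasible(xt, xs, y, big_m))
```

In GAMS this would roughly amount to declaring xt and xs as positive variables, adding the two bound equations, and writing the supply and demand constraints over xt(i,j) + xs(i,j).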


Nested logit model using panel data in R

I am new to R and would be grateful for help with this, because I am having serious difficulties.
I have unbalanced panel data showing each company's monthly performance relative to the rest of the market, in dollars (e.g., this month company 1 made $1000 more than the market average). Each company chose a strategy (1 through 8) when it entered the market. The strategies are nested into two groups (a, b): strategies 1, 2, and 3 belong to group a, while strategies 4 through 8 belong to group b. I need a ranking of the strategies from best to worst.
I have discretized my DV so that it now only shows whether a company performed higher or lower than the market in a given month. However, I am not sure this is the right approach, because I then lose the information on how much better or worse each company performed relative to the market each month.
My data looks like this:
ID  Main  Strategy  YearMonth  DiffPerformance  Control1  Control2  DiffPerformanceHL
1   a     2         201706     9.037            2         57        H
1   a     2         201707     4.371            2         57        H
1   a     2         201708     1.633            2         57        H
1   a     2         201709     -3.521           2         59        L
1   a     2         201710     13.096           2         59        H
1   a     2         201711     5.070            2         60        H
1   a     2         201712     4.25             2         60        H
2   b     5         201904     6.78             4         171       H
2   b     5         201905     -15.26           4         169       L
2   b     5         201906     7.985            4         169       H
Here ID is the company, Main is the group (a or b), Strategy runs from 1 through 8 and is nested as described above, YearMonth is the specific month, DiffPerformance is the DV as a continuous variable, Control1 is a categorical variable (1 through 6) that is static over time, Control2 is a count variable that changes over time, and DiffPerformanceHL is the discretized DV.
Can you please help me figure out how to fit a nested logit model in R? I would be very appreciative.

How to estimate additional variables in a df by group in R?

I have a data frame consisting of stand ID, treatment type, revision, and tree diameter. I want to compute an additional variable, the quadratic mean diameter (and other variables), separately for every stand, revision, and treatment, using the formula sqrt(sum(dia^2)/n).
An example of my dataset:
ID Rev Treat dia
1523 1 A 7.549834
1523 1 A 4.500000
1523 1 B 1.500000
1523 1 B 2.949576
1523 2 A 6.348228
1523 2 A 2.900000
1523 2 B 3.400000
1523 2 B 6.449806
1545 1 A 2.349468
1545 1 A 5.249762
1545 1 B 6.249800
1545 1 B 8.748714
1545 2 A 0.100000
1523 2 A 0.100000
1523 2 B 3.200000
1523 2 B 3.200000
So, basically, what I want is an estimate of Dq for 1) Stand 1523, Rev 1, Treat A; 2) Stand 1523, Rev 1, Treat B; 3) Stand 1523, Rev 2, Treat A; and so on.
My dataset is much larger, consisting of 4 treatments, 6 revisions, and 8 stands. Writing a loop would be one option, I guess, but there must be an easier way to do this?

Here is one way using dplyr:
library(dplyr)
data.df %>%
  group_by(ID, Rev, Treat) %>%
  summarise(quadratic_mean_diameter = sqrt(sum(dia^2) / n()))
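
As a cross-check of the arithmetic outside R, the same grouped calculation can be sketched in plain Python on the first four sample rows. This is only an illustration of the formula sqrt(sum(dia^2)/n) per (ID, Rev, Treat) group; the function name quadratic_mean_diameter and the variable dq are assumptions made for this example:

```python
from collections import defaultdict
from math import sqrt

# First four rows of the sample data: (ID, Rev, Treat, dia).
rows = [
    (1523, 1, "A", 7.549834),
    (1523, 1, "A", 4.500000),
    (1523, 1, "B", 1.500000),
    (1523, 1, "B", 2.949576),
]


def quadratic_mean_diameter(rows):
    """Return {(ID, Rev, Treat): sqrt(sum(dia^2) / n)} for each group."""
    groups = defaultdict(list)
    for stand, rev, treat, dia in rows:
        groups[(stand, rev, treat)].append(dia)
    return {
        key: sqrt(sum(d * d for d in dias) / len(dias))
        for key, dias in groups.items()
    }


dq = quadratic_mean_diameter(rows)
for key, value in sorted(dq.items()):
    print(key, round(value, 4))
```

For Stand 1523, Rev 1 this gives roughly 6.215 for Treat A and 2.340 for Treat B, which can be compared against the summarise() result for the same groups.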

Mutation of non-conformable arrays

My code fails with the error "target - non-conformable arrays".
I know that the problem is with melanoma$status, but I have no idea how to alter the data accordingly. Any ideas? Here are a few rows of the data (in case you don't have the boot package):
time status sex age year thickness ulcer
1 10 3 1 76 1972 6.76 1
2 30 3 1 56 1968 0.65 0
3 35 2 1 41 1977 1.34 0
4 99 3 0 71 1968 2.90 0
5 185 1 1 52 1965 12.08 1
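
Your target variable should first be restricted to the training indices. Moreover, with one-hot encoding the target needs one column per class. Something like this:
# one row per training sample, one column per class (status takes the values 1, 2, 3)
Target = matrix(data=0, nrow=length(idxTrain), ncol=3)
# (row, column) index pairs: row k is paired with the column given by its status
status_mat = matrix(nrow=length(idxTrain), ncol=2)
status_mat[,1] = c(1:length(idxTrain))
status_mat[,2] = melanoma$status[idxTrain]
# presumably the remaining step: set the indexed cells to 1
Target[status_mat] = 1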

How to prepare my data for a factorial repeated measures analysis?

Currently, my data frame is in wide format and I want to do a factorial repeated measures analysis with two between-subject factors (sex & org) and one within-subject factor (task type). Below I've illustrated what my data looks like with a sample (the actual dataset has many more variables). The variables starting with '1_' and '2_' belong to measurements during task 1 and task 2, respectively. This means that 1_FD_H_org and 2_FD_H_org are the same measurement, taken during tasks 1 and 2 respectively.
id sex org task1 task2 1_FD_H_org 1_FD_H_text 2_FD_H_org 2_FD_H_text 1_apv 2_apv
2 F T Correct 2 69.97 68.9 116.12 296.02 10 27
6 M T Correct 2 53.08 107.91 73.73 333.15 16 21
7 M T Correct 2 13.82 30.9 31.8 78.07 4 9
8 M T Correct 2 42.96 50.01 88.81 302.07 4 24
9 F H Correct 3 60.35 102.9 39.81 96.6 15 10
10 F T Incorrect 3 78.61 80.42 55.16 117.57 20 17
I want to analyze whether there is a difference between the two tasks on e.g. FD_H_org for the different groups/conditions (sex & org).
How do I reshape my data so I can analyze it with a model like this?
ezANOVA(data=df, dv=.(FD_H_org), wid=.(id), between=.(sex, org), within=.(task))
I think that the correct format of my data should look like this:
id sex org task outcome FD_H_org FD_H_text apv
2 F T 1 Correct 69.97 68.9 10
2 F T 2 2 116.12 296.02 27
6 M T 1 Correct 53.08 107.91 16
6 M T 2 2 73.73 333.15 21
But I'm not sure. I tried to achieve this with the reshape2 package but couldn't figure out how to do it. Can anybody help?

I think you probably need to rebuild it by binding the two subsets of columns together with rbind(). The only issue here was that your outcome columns have different data types, so I forced them both to text:
library(plyr)
dt <- read.table(file="dt.txt", header=TRUE, sep=" ") # this was to bring in your data
long <- rbind(ddply(dt,.(id,sex,org),summarize, task=1, outcome=as.character(task1), FD_H_org=X1_FD_H_org, FD_H_text=X1_FD_H_text, apv=X1_apv),
              ddply(dt,.(id,sex,org),summarize, task=2, outcome=as.character(task2), FD_H_org=X2_FD_H_org, FD_H_text=X2_FD_H_text, apv=X2_apv))
long[order(long$id), ] # put the two task rows of each subject next to each other
id sex org task outcome FD_H_org FD_H_text apv
1 2 F T 1 Correct 69.97 68.90 10
7 2 F T 2 2 116.12 296.02 27
2 6 M T 1 Correct 53.08 107.91 16
8 6 M T 2 2 73.73 333.15 21
3 7 M T 1 Correct 13.82 30.90 4
9 7 M T 2 2 31.80 78.07 9
4 8 M T 1 Correct 42.96 50.01 4
10 8 M T 2 2 88.81 302.07 24
5 9 F H 1 Correct 60.35 102.90 15
11 9 F H 2 3 39.81 96.60 10
6 10 F T 1 Incorrect 78.61 80.42 20
12 10 F T 2 3 55.16 117.57 17
EDIT - obviously you don't need plyr for this (and it may slow it down) unless you're doing further transformations. This is the code with no non-standard dependencies:

How to choose the best splitting attribute when the information gain is the same

I am working out step by step how CART (Classification and Regression Trees) chooses the best attribute to split on, using this training data set:
Car Age Children Location
1 sedan 23 0 yes
2 sports 31 1 no
3 sedan 36 1 no
4 truck 25 2 no
5 sports 30 0 no
6 sedan 36 0 no
7 sedan 25 0 yes
8 truck 36 1 no
9 sedan 30 2 yes
10 sedan 31 1 yes
11 sports 25 0 no
12 truck 45 0 yes
Results given by R:
n= 12

node), split, n, loss, yval, (yprob)
      * denotes terminal node

 1) root 12 5 no (0.5833333 0.4166667)
   2) Car=sports,truck 6 1 no (0.8333333 0.1666667)
     4) Age< 40.5 5 0 no (1.0000000 0.0000000) *
     5) Age>=40.5 1 0 yes (0.0000000 1.0000000) *
   3) Car=sedan 6 2 yes (0.3333333 0.6666667)
     6) Age>=33.5 2 0 no (1.0000000 0.0000000) *
     7) Age< 33.5 4 0 yes (0.0000000 1.0000000) *
For the root node, Gini(root)=0.486:
- with the Car attribute, GainGini(Car)=0.1255;
- with the Age attribute, I got the same gain with thresholds 27.5 and 33.5. So which one should be chosen if GainGini(Age) is to be maximized?
- with the Children attribute, the 2 child nodes are very pure, so GainGini(Children)=0.486.
My first question is: why did I get the Car attribute for the split in this tree?
For the first right child node, Gini(node2)=0.444:
- with the Age attribute, threshold 33.5 gives GainGini(Age)=0.444;
- with the Children attribute, same as the root node (all instances are pure), GainGini(Children)=0.444.
This is my second question: how does CART choose the split attribute between those 2 values?
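
One way to check these hand calculations is to recompute the Gini gains directly from the table. The sketch below does this in plain Python rather than R; the helper names gini and gain are assumptions made for this example, Location is taken to be the class label, and only binary splits at midpoints between observed values are tried (which is how the thresholds in the printed tree look, but is not a claim about rpart's internals):

```python
from collections import Counter

# (Car, Age, Children, Location) rows copied from the question; Location is the class.
DATA = [
    ("sedan", 23, 0, "yes"), ("sports", 31, 1, "no"), ("sedan", 36, 1, "no"),
    ("truck", 25, 2, "no"), ("sports", 30, 0, "no"), ("sedan", 36, 0, "no"),
    ("sedan", 25, 0, "yes"), ("truck", 36, 1, "no"), ("sedan", 30, 2, "yes"),
    ("sedan", 31, 1, "yes"), ("sports", 25, 0, "no"), ("truck", 45, 0, "yes"),
]


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gain(rows, goes_left):
    """Impurity decrease of the binary split defined by the predicate goes_left."""
    left = [r[3] for r in rows if goes_left(r)]
    right = [r[3] for r in rows if not goes_left(r)]
    n = len(rows)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini([r[3] for r in rows]) - weighted


root_gini = gini([r[3] for r in DATA])
gain_car = gain(DATA, lambda r: r[0] == "sedan")
gain_age = {t: gain(DATA, lambda r, t=t: r[1] < t) for t in (24, 27.5, 30.5, 33.5, 40.5)}
gain_children = {t: gain(DATA, lambda r, t=t: r[2] < t) for t in (0.5, 1.5)}

# Second level: the Car=sedan node, where the tree splits on Age at 33.5.
sedan = [r for r in DATA if r[0] == "sedan"]
gain_sedan_age = gain(sedan, lambda r: r[1] < 33.5)
gain_sedan_children = {t: gain(sedan, lambda r, t=t: r[2] < t) for t in (0.5, 1.5)}

print(round(root_gini, 4), round(gain_car, 4))
print({t: round(g, 4) for t, g in gain_age.items()})
print({t: round(g, 4) for t, g in gain_children.items()})
print(round(gain_sedan_age, 4), {t: round(g, 4) for t, g in gain_sedan_children.items()})
```

Under these assumptions the root impurity is about 0.486 and the sedan-versus-rest split on Car gains about 0.125, which is larger than the gain of any Age or Children threshold tried here. That is consistent with the tree splitting on Car first. In this recomputation the Children splits do not produce pure child nodes at either level, so the 0.486 and 0.444 figures for Children may be worth rechecking, while Age at 33.5 does separate the Car=sedan node perfectly (gain of about 0.444).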
