CVXPY: Give a name or ID to constraints

In CVXPY, I have found that I am able to give a name or ID to the variables I define.
Now I would also like to give a string name to the constraints I define, so that I can find them easily later.
The reason is that I have many agents, which are defined in similar ways, and I would like to know which constraints belong to which agent (for example, by giving each constraint the number of its agent).
I have looked in the documentation, where the Constraint class is defined as:
cvxpy.constraints.constraint.Constraint(args, constr_id=None)
Where the parameters are defined as
args (list) – A list of expression trees.
constr_id (int) – A unique id for the constraint.
So it seems that I can give an integer ID, but not a name.
Does anyone know whether it is possible to give a string name to a constraint?

I don't know if that's possible, but I would not try to dive that deep into its internal code.
Why not just wrap it away? This should be safer and more customizable.
Prototype (which does not check for duplicates, for example):
import cvxpy as cp

class ConstraintBuilder:
    def __init__(self):
        self.constraintList = []
        self.str2constr = {}

    def addConstr(self, expr, str_):
        self.constraintList.append(expr)
        self.str2constr[str_] = len(self.constraintList)-1

    def get(self):
        return self.constraintList

    def getConstr(self, str_):
        return self.constraintList[self.str2constr[str_]]

####
cb = ConstraintBuilder()
x = cp.Variable(5)
cb.addConstr(x >= 0, 'nonnegative')
cb.addConstr(x[0] + x[1] >= 0.3, 'pair0')
cb.addConstr(x[0] + x[1] >= 0.6, 'pair1')
objective = cp.Minimize(cp.sum(x))
problem = cp.Problem(objective, cb.get())
problem.solve(verbose=True, solver=cp.ECOS)
print(problem.status)
print(cb.getConstr('pair1'))
which outputs:
ECOS 2.0.7 - (C) embotech GmbH, Zurich Switzerland, 2012-15. Web: www.embotech.com/ECOS
It pcost dcost gap pres dres k/t mu step sigma IR | BT
0 +3.600e-01 +3.690e-01 +5e+00 7e-01 1e-02 1e+00 8e-01 --- --- 1 1 - | - -
1 +4.151e-01 +4.114e-01 +8e-01 2e-01 2e-03 1e-01 1e-01 0.8590 3e-03 1 1 1 | 0 0
2 +5.863e-01 +5.831e-01 +9e-02 3e-02 2e-04 1e-02 1e-02 0.9795 1e-01 1 1 1 | 0 0
3 +5.998e-01 +5.998e-01 +1e-03 3e-04 2e-06 1e-04 2e-04 0.9887 1e-04 1 1 1 | 0 0
4 +6.000e-01 +6.000e-01 +1e-05 3e-06 2e-08 2e-06 2e-06 0.9890 1e-04 1 0 0 | 0 0
5 +6.000e-01 +6.000e-01 +1e-07 4e-08 2e-10 2e-08 2e-08 0.9890 1e-04 1 0 0 | 0 0
6 +6.000e-01 +6.000e-01 +1e-09 4e-10 3e-12 2e-10 2e-10 0.9890 1e-04 1 0 0 | 0 0
OPTIMAL (within feastol=4.2e-10, reltol=2.5e-09, abstol=1.5e-09).
Runtime: 0.000236 seconds.
optimal
0.6 <= var0[0] + var0[1]
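The prototype above does not check for duplicate names and has no notion of agents. A minimal sketch of one way to add both is shown here. The class name NamedConstraints, its methods, and the "agentN/label" naming scheme are all hypothetical and not part of CVXPY. Plain strings stand in for constraint objects so the sketch runs without CVXPY installed; in practice you would store expressions such as x >= 0.

```python
class NamedConstraints:
    """Hypothetical registry mapping unique string names to constraint objects."""

    def __init__(self):
        # Dicts keep insertion order, so as_list() returns constraints as added.
        self._by_name = {}

    def add(self, name, constraint):
        # Reject duplicates instead of silently overwriting an earlier entry.
        if name in self._by_name:
            raise KeyError(f"duplicate constraint name: {name!r}")
        self._by_name[name] = constraint
        return constraint

    def get(self, name):
        return self._by_name[name]

    def with_prefix(self, prefix):
        # Return every (name, constraint) pair whose name starts with prefix.
        return {n: c for n, c in self._by_name.items() if n.startswith(prefix)}

    def as_list(self):
        # This list is what would be passed to the problem constructor.
        return list(self._by_name.values())


# Strings are placeholders for real constraint objects (an assumption of this sketch).
registry = NamedConstraints()
registry.add("agent1/nonnegative", "x1 >= 0")
registry.add("agent1/budget", "sum(x1) <= 1")
registry.add("agent2/nonnegative", "x2 >= 0")

agent1 = registry.with_prefix("agent1/")
print(sorted(agent1))
```

Including the trailing slash in the prefix keeps "agent1/" from also matching names that belong to "agent10/".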

Related

How can I edit this apriori algorithm code in market basket analysis using rhs and lhs?

I'm trying Market Basket Analysis using the 'apriori' algorithm.
The 'Catalog' data set consists of 8 columns and 200 rows:
Automotive Computers Personal.Electronics Garden Clothing Health Jewelry Housewares
0 1 0 1 1 0 0 0 1
0 0 0 1 0 1 0 0 1
0 1 1 0 0 0 1 0 0
1 0 0 0 0 1 0 0 0
0 0 0 1 1 0 0 1 0
First, I tried the apriori algorithm without any rhs or lhs restriction, and here's the result:
Catalogrules <- apriori(Catalog, parameter= list(support =
0.1, confidence = 0.3, minlen = 2))
inspect(sort(Catalogrules, by = "lift")[1:5])
lhs rhs support confidence coverage lift count
[1] {Automotive=0,
Computers=0,
Personal.Electronics=0,
Clothing=0} => {Garden=1} 0.110 0.9565217 0.115 2.943144 22
[2] {Personal.Electronics=0,
Jewelry=1} => {Clothing=1} 0.125 0.7142857 0.175 2.857143 25
[3] {Clothing=1,
Health=0} => {Jewelry=1} 0.120 0.8888889 0.135 2.821869 24
[4] {Automotive=0,
Personal.Electronics=0,
Clothing=0} => {Garden=1} 0.110 0.9166667 0.120 2.820513 22
[5] {Computers=0,
Personal.Electronics=0,
Jewelry=1} => {Clothing=1} 0.105 0.6774194 0.155 2.709677 21
I wanted to see rules with "=1" because '0' cannot show any meaningful relationships.
ex) {Jewelry=1, Computers=1} -> {Clothing=1}
So, I tried to make new code using rhs and lhs.
Catalogrules <- apriori(Catalog, parameter= list(support =
0.1, confidence = 0.3, minlen = 2), appearance=list(rhs = c("Automotive=1", "Computers=1", "Personal.Electronics=1", "Garden=1", "Clothing=1", "Health=1", "Jewelry=1", "Housewares=1"), lhs = c("Automotive=1", "Computers=1", "Personal.Electronics=1", "Garden=1", "Clothing=1", "Health=1", "Jewelry=1", "Housewares=1"), default="rhs"))
But this is the error message:
Error in asMethod(object) :
The following items cannot be specified in multiple appearance locations: Automotive=1, Computers=1, Personal.Electronics=1, Garden=1, Clothing=1, Health=1, Jewelry=1, Housewares=1
I want to print top 5 rules using apriori algorithm.
ex) inspect(sort(Catalogrules, by = "lift")[1:5])
1. {Automotive=1} -> {Garden=1}
2. {Personal.Electronics=1} -> {Computers=1}
3. {Jewelry=1} -> {Clothing=1}
4. {Garden=1} -> {Automotive=1}
5. {Clothing=1} -> {Jewelry=1}
I need your help.

Calculating mean grade of students' peers

I have one dataset which includes all the points of students and other variables.
I further have a diagonal matrix which includes information on which student is a peer of another student.
Now I would like to use the second matrix (network) to calculate the mean-peer-points for each student. Everyone can have different (number of) peers.
To calculate the mean, I recalculated the simple 0,1 matrix into percentages, whereby the denominator is the sum of the number of peers one student has.
The second matrix then would look something like this:
      ID1   ID2   ID3   ID4   ID5
ID1   0     0     0     0     1
ID2   0     0     0.5   0.5   0
ID3   0     0.5   0     0     0.5
ID4   0     0.5   0     0     0.5
ID5   0.33  0     0.33  0.33  0
The points of each student are a simple variable in another dataset, and I would like to add the peers' average points as a second variable:
ID   Points  Peers
ID1  45      11
ID2  42      33.5
ID3  25      26.5
ID4  60      26.5
ID5  11      43.33
Are there any commands in Stata for that problem? I am currently looking into the Stata commands nwcommands, but I am unsure whether it can help. I could use solutions for Stata and R.
Without getting too creative, you can accomplish what you are trying to do with reshape, collapse and a couple of merges in Stata. Generally speaking, data in long format is easier to work with for this type of exercise.
Below is an example which produces the desired result.
/* Set-up data for example */
clear
input int(id points)
1 45
2 42
3 25
4 60
5 11
end
tempfile points
save `points'
clear
input int(StudentId id1 id2 id3 id4 id5)
1 0 0 0 0 1
2 0 0 1 1 0
3 0 1 0 0 1
4 0 1 0 0 1
5 1 0 1 1 0
end
/* End data set-up */
* Reshape peers data to long form
reshape long id, i(Student) j(PeerId)
drop if id == 0 // drop if student is not a peer of `StudentId`
* create id variable to use in merge
replace id = PeerId
* Merge to points data to get peer points
merge m:1 id using `points', nogen
* collapse data to the student level, sum peer points
collapse (sum) PeerPoints = points (count) CountPeers = PeerId, by(StudentId)
* merge back to points data to get student points
rename StudentId id
merge 1:1 id using `points', nogen
gen peers = PeerPoints / CountPeers
li id points peers
     +------------------------+
     | id   points      peers |
     |------------------------|
  1. |  1       45         11 |
  2. |  2       42       42.5 |
  3. |  3       25       26.5 |
  4. |  4       60       26.5 |
  5. |  5       11   43.33333 |
     +------------------------+
In the above code, I reshape your peer data into long form and keep only student-peer pairs. I then merge this data to the points data to get the points of each student's peers. From here, I collapse the data back to the student level, totaling peer points and peer count in the process. At this point, you have the total points for the peers of each student and the number of peers each student has. Now you simply have to merge back to the points data to get the subject student's points and divide total peer points (PeerPoints) by the number of peers the student has (CountPeers) to get average peer points.
nwcommands is an outstanding package I have never used or studied, so I will just try the problem from first principles. This is all matrix algebra, but given a matrix and a variable, I would approach it like this in Stata.
clear
scalar third = 1/3
mat M = (0,0,0,0,1\0,0,0.5,0.5,0\0,0.5,0,0,0.5\0,0.5,0,0,0.5\third,0,third,third,0)
input ID Points Peers
1 45 11
2 42 33.5
3 25 26.5
4 60 26.5
5 11 43.33
end
gen Wanted = 0
quietly forval i = 1/5 {
    forval j = 1/5 {
        replace Wanted = Wanted + M[`i', `j'] * Points[`j'] in `i'
    }
}
list
+--------------------------------+
| ID Points Peers Wanted |
|--------------------------------|
1. | 1 45 11 11 |
2. | 2 42 33.5 42.5 |
3. | 3 25 26.5 26.5 |
4. | 4 60 26.5 26.5 |
5. | 5 11 43.33 43.33334 |
+--------------------------------+
Small points: Using 0.33 for 1/3 doesn't give enough precision. You'll have similar problems for 1/6 and 1/7, for example.
Also, I get that the peers of 2 are 3 and 4 so their average is (25 + 60)/2 = 42.5, not 33.5.
EDIT: A similar approach starts with a data structure very like that imagined by @ander2ed:
clear
input int(id points id1 id2 id3 id4 id5)
1 45 0 0 0 0 1
2 42 0 0 1 1 0
3 25 0 1 0 0 1
4 60 0 1 0 0 1
5 11 1 0 1 1 0
end
gen wanted = 0
quietly forval i = 1/5 {
    forval j = 1/5 {
        replace wanted = wanted + id`j'[`i'] * points[`j'] in `i'
    }
}
egen count = rowtotal(id1-id5)
replace wanted = wanted/count
list
+--------------------------------------------------------------+
| id points id1 id2 id3 id4 id5 wanted count |
|--------------------------------------------------------------|
1. | 1 45 0 0 0 0 1 11 1 |
2. | 2 42 0 0 1 1 0 42.5 2 |
3. | 3 25 0 1 0 0 1 26.5 2 |
4. | 4 60 0 1 0 0 1 26.5 2 |
5. | 5 11 1 0 1 1 0 43.33333 3 |
+--------------------------------------------------------------+
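Since the question is, at heart, a row-wise weighted average over a 0/1 peer matrix, the same calculation can be sketched outside Stata as well. The following is a minimal illustration in plain Python using the example data from the question; the function name peer_means and the choice to return None for a student with no peers are assumptions of this sketch.

```python
# Example data from the question: points per student and a 0/1 peer matrix.
points = {1: 45, 2: 42, 3: 25, 4: 60, 5: 11}
ids = [1, 2, 3, 4, 5]
adjacency = [
    [0, 0, 0, 0, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [1, 0, 1, 1, 0],
]


def peer_means(adjacency, ids, points):
    """Average the points of each student's peers (rows index the student)."""
    means = {}
    for student, row in zip(ids, adjacency):
        # Collect the points of every peer flagged with a 1 in this row.
        peer_points = [points[peer] for peer, flag in zip(ids, row) if flag]
        # Dividing by the exact peer count avoids rounded weights such as 0.33.
        means[student] = sum(peer_points) / len(peer_points) if peer_points else None
    return means


means = peer_means(adjacency, ids, points)
for student in ids:
    print(student, points[student], means[student])
```

Working from the unnormalized 0/1 matrix and dividing by the peer count sidesteps the precision issue noted above for weights like 1/3.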

Predict() with nested multinomial logit models

I'm using the mlogit package in R to create a nested multinomial logit model of healthcare provider choice given choice data I have. The data look like this:
ID RES weight age wealth educ married urban partnerAge totalChildren survivingChildren anyANC
1.0 2468158 FALSE 0.2609153 29 Poor Primary 1 0 31 4 4 1
1.1 2468158 TRUE 0.2609153 29 Poor Primary 1 0 31 4 4 1
1.2 2468158 FALSE 0.2609153 29 Poor Primary 1 0 31 4 4 1
1.3 2468158 FALSE 0.2609153 29 Poor Primary 1 0 31 4 4 1
2.0 14233860 FALSE 0.2754970 19 Poorest Primary 1 0 30 1 1 1
2.1 14233860 TRUE 0.2754970 19 Poorest Primary 1 0 30 1 1 1
2.2 14233860 FALSE 0.2754970 19 Poorest Primary 1 0 30 1 1 1
2.3 14233860 FALSE 0.2754970 19 Poorest Primary 1 0 30 1 1 1
outlier50Km optout alt spa mes dist bobs cobs Q fees chid educSec
1.0 0 -1 0 Home Home 0.000 0.0000000 0.000 0.00 0 1 0
1.1 0 -1 1 Health center Public 13.167 0.4898990 NA 0.64 0 1 0
1.2 0 -1 2 Health center Public 30.596 0.5202020 NA 0.56 0 1 0
1.3 0 -1 3 District hospital Public 41.164 0.7171717 0.825 0.88 0 1 0
2.0 0 -1 0 Home Home 0.000 0.0000000 0.000 0.00 0 2 0
2.1 0 -1 1 Health center Mission 14.756 0.7676768 NA 0.64 1 2 0
2.2 0 -1 2 Health center Public 41.817 0.3787879 NA 0.56 0 2 0
2.3 0 -1 3 District hospital Public 50.419 0.7171717 0.825 0.88 0 2 0
where spa, mes, dist, bobs, cobs, Q, and fees are characteristics of the provider and the remaining variables specific to the individual. These data are in long format, meaning each individual has four rows, reflecting her four choices (alt = 0:3), with RES being the response variable.
An un-nested model behaves appropriately
f.full <- RES ~ 0 + dist + Q + bobs + fees + spa | 0 + age + wealth + educSec + married + urban + totalChildren + survivingChildren
choice.ml.full <- mlogit(formula = f.full, data = data, weights = weight)
predict(choice.ml.full, data[1:8,])
0 1 2 3
[1,] 0.1124429 0.7739403 0.06893341 0.04468343
[2,] 0.4465272 0.3107375 0.11490317 0.12783210
By all measures of model fit, however, a nested model is better than an un-nested one. The nested model gives me coefficients appropriately:
ns2 <- mlogit(formula = f.full, nests = list(home = "0", useCare = c("1", "2", "3")), data = data, weight = weight, un.nest.el = TRUE)
summary(ns2)
Call:
mlogit(formula = f.full, data = data, weights = weight, nests = list(home = "0",
useCare = c("1", "2", "3")), un.nest.el = TRUE)
Frequencies of alternatives:
0 1 2 3
0.094378 0.614216 0.194327 0.097079
bfgs method
23 iterations, 0h:0m:13s
g'(-H)^-1g = 9.51E-07
gradient close to zero
Coefficients :
Estimate Std. Error t-value Pr(>|t|)
dist -0.0336233 0.0040136 -8.3773 < 2.2e-16 ***
Q 0.1780058 0.0768181 2.3172 0.0204907 *
bobs -0.0695695 0.0505795 -1.3754 0.1689925
fees -0.8488132 0.1001928 -8.4718 < 2.2e-16 ***
etc...
But, I get the following error if I try to predict on a single individual:
predict(ns2, data[1:4,])
Error in apply(Gl, 1, sum) : dim(X) must have a positive length
and a different error if I try to predict on more than one individual:
predict(ns2, data[1:8,])
Error in solve.default(crossprod(attr(x, "gradi")[, !fixed])) :
Lapack routine dgesv: system is exactly singular: U[5,5] = 0
Any help would be vastly appreciated.

Create a dataframe with % of Factor

I have a dataframe which shows a number of shops that have had a health and safety test. Within this dataframe I have the name of the shop and a factor that shows the outcome of the test on a certain day.
head(facttab)
new_table.dba_name new_table.results
1 QUICK SUB Out of Business
2 BAR BARI Pass
3 FOOD FIRST CHICAGO Pass
4 TRATTORIA ISABELLA Pass
5 DELI-TIME, L.L.C. Pass
6 GREAT AMERICAN BAGEL Fail
facttab <- data.frame(new_table$dba_name, new_table$results)
head(table(facttab))
new_table.dba_name Fail No Entry Not Ready Out of Business Pass Pass w/ Conditions
1 2 3 EXPRESS 1 0 0 0 0 0
1155 CAFETERIA 0 0 0 0 1 0
16TH ST FOOD MART 0 0 0 1 0 0
194 RIB JOYNT 0 1 0 0 0 0
24HR MINI MART & CELLAR FOR YOU 1 0 0 0 0 0
7-ELEVEN 0 0 0 0 4 2
I would like to build another table or data frame that shows, for each shop, the percentage of its test outcomes that fall in each category over the whole data frame, so I can see who has the largest % of fails and the largest % of passes.
The resulting table would be similar to the one above; for example, 7-ELEVEN would be Fail - 0%, No Entry - 0%, Not Ready - 0%, Out of Business - 0%, Pass - 66% and Pass w/ Conditions - 33%.
I thought I would whip up an answer. This is how to convert the prop.table into a data.frame. I'm sure there's probably a quicker way of doing this. Note that I'm using a dataset I created myself. It would probably be helpful to look at ?reshape
set.seed(123)
#create some dummy data
df <- data.frame(store = sample(c('a','b','c'), 100, replace = T),
status = sample(c('foo','bar','haz'), 100, replace = T))
#convert to prop.table
(prop.t <- prop.table(table(df$store, df$status), 1))
bar foo haz
a 0.4242424 0.2121212 0.3636364
b 0.4117647 0.4117647 0.1764706
c 0.3636364 0.3030303 0.3333333
#coerce to data.frame
(prop.t.df <- data.frame(prop.t))
Var1 Var2 Freq
1 a bar 0.4242424
2 b bar 0.4117647
3 c bar 0.3636364
4 a foo 0.2121212
5 b foo 0.4117647
6 c foo 0.3030303
7 a haz 0.3636364
8 b haz 0.1764706
9 c haz 0.3333333
#use reshape()
(reshape(prop.t.df, direction = 'wide', idvar = 'Var1', v.names = 'Freq', timevar = 'Var2'))
Var1 Freq.bar Freq.foo Freq.haz
1 a 0.4242424 0.2121212 0.3636364
2 b 0.4117647 0.4117647 0.1764706
3 c 0.3636364 0.3030303 0.3333333
Obviously, you'd probably want to play around with the names a bit, but this is one way of getting at what you want.
PS Another way of getting at it is:
prop.t.df2 = as.data.frame.matrix(prop.t)
Note: you'd probably need to create a new column called Store by accessing the row.names of prop.t.df2.
prop.t.df2$Store = row.names(prop.t.df2)
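For readers who want to see the row-proportion logic spelled out, here is a minimal sketch in plain Python rather than R. The 7-ELEVEN counts (4 Pass, 2 Pass w/ Conditions) and the two single-result shops are taken from the tables in the question; the function name outcome_shares is hypothetical.

```python
from collections import Counter, defaultdict

# (shop, outcome) pairs; counts mirror a few rows shown in the question.
records = (
    [("7-ELEVEN", "Pass")] * 4
    + [("7-ELEVEN", "Pass w/ Conditions")] * 2
    + [("QUICK SUB", "Out of Business"), ("GREAT AMERICAN BAGEL", "Fail")]
)


def outcome_shares(records):
    """Return, per shop, the share of its tests that fall in each outcome."""
    counts = defaultdict(Counter)
    for shop, outcome in records:
        counts[shop][outcome] += 1
    # Use the same outcome columns for every shop, including zero shares.
    outcomes = sorted({outcome for _, outcome in records})
    shares = {}
    for shop, counter in counts.items():
        total = sum(counter.values())
        shares[shop] = {o: counter[o] / total for o in outcomes}
    return shares


shares = outcome_shares(records)
for outcome, share in shares["7-ELEVEN"].items():
    print(f"{outcome}: {share:.1%}")
```

Each shop's shares sum to 1, which corresponds to normalizing over rows, as prop.table does with margin 1.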

R Team Roster Optimization w/ lpSolve

I am new to R and have a particular fantasy sports team optimization problem I would like to solve. I have seen other posts use lpSolve for similar problems, but I cannot seem to wrap my head around the code. An example data table is below. Every player is on a team, plays a particular role, has a salary, and has an average number of points produced per game. The constraints are: I need exactly 8 players; no more than 3 players may come from any one team; there must be at least one player for each of the 5 roles; and cumulative salary must not exceed $10,000.
Team Player Role Avgpts Salary
Bears A T 22 930
Bears B M 19 900
Bears C B 30 1300
Bears D J 25 970
Bears E S 20 910
Jets F T 21 920
Jets G M 26 980
[...]
In R, I write in the following
> obj = DF$AVGPTS
> con = rbind(t(model.matrix(~ Role + 0, DF)), rep(1,nrow(DF)), DF$Salary)
> dir = c(">=",">=",">=",">=",">=","==","<=")
> rhs = c(1,1,1,1,1,8,10000)
> result = lp("max", obj, con, dir, rhs, all.bin = TRUE)
This code works fine in producing the optimal fantasy team without the limit of at most 3 players from any one team. This is where I am stuck, and I suspect it relates to the con argument. Any help is appreciated.
What if you added something similar to the way you did the roles to con?
If you add t(model.matrix(~ Team + 0, DF)) you'll have indicators for each team in your constraint. For the example you gave:
> con <- rbind(t(model.matrix(~ Role + 0,DF)), t(model.matrix(~ Team + 0, DF)), rep(1,nrow(DF)), DF$Salary)
> con
            1   2    3   4   5   6   7
RoleB       0   0    1   0   0   0   0
RoleJ       0   0    0   1   0   0   0
RoleM       0   1    0   0   0   0   1
RoleS       0   0    0   0   1   0   0
RoleT       1   0    0   0   0   1   0
TeamBears   1   1    1   1   1   0   0
TeamJets    0   0    0   0   0   1   1
            1   1    1   1   1   1   1
          930 900 1300 970 910 920 980
We now need to update dir and rhs to account for this:
dir <- c(">=",">=",">=",">=",">=",rep('<=',n_teams),"==","<=")
rhs <- c(1,1,1,1,1,rep(3,n_teams),8,10000)
With n_teams set to the number of distinct teams, for example length(unique(DF$Team)).
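To make the layout of the constraint system easier to follow, here is a minimal sketch in plain Python that builds the same rows, directions and right-hand sides from the seven example players in the question, plus a small checker for a candidate roster. It does not solve the optimization; the function names build_constraints and is_feasible are hypothetical, and the roster size of 5 in the usage lines is chosen only because the excerpt lists seven players.

```python
# (team, player, role, avg_pts, salary) rows from the question's example table.
players = [
    ("Bears", "A", "T", 22, 930),
    ("Bears", "B", "M", 19, 900),
    ("Bears", "C", "B", 30, 1300),
    ("Bears", "D", "J", 25, 970),
    ("Bears", "E", "S", 20, 910),
    ("Jets", "F", "T", 21, 920),
    ("Jets", "G", "M", 26, 980),
]


def build_constraints(players, max_per_team=3, roster_size=8, salary_cap=10000):
    """Build constraint rows in the order: roles, teams, roster size, salary."""
    roles = sorted({p[2] for p in players})
    teams = sorted({p[0] for p in players})
    rows, dirs, rhs = [], [], []
    for role in roles:  # at least one player per role
        rows.append([1 if p[2] == role else 0 for p in players])
        dirs.append(">=")
        rhs.append(1)
    for team in teams:  # at most max_per_team players per team
        rows.append([1 if p[0] == team else 0 for p in players])
        dirs.append("<=")
        rhs.append(max_per_team)
    rows.append([1] * len(players))  # exact roster size
    dirs.append("==")
    rhs.append(roster_size)
    rows.append([p[4] for p in players])  # salary cap
    dirs.append("<=")
    rhs.append(salary_cap)
    return rows, dirs, rhs


def is_feasible(selection, rows, dirs, rhs):
    """Check a 0/1 selection vector against every constraint row."""
    for row, direction, bound in zip(rows, dirs, rhs):
        value = sum(a * s for a, s in zip(row, selection))
        if direction == ">=" and value < bound:
            return False
        if direction == "<=" and value > bound:
            return False
        if direction == "==" and value != bound:
            return False
    return True


rows, dirs, rhs = build_constraints(players, roster_size=5)
print(is_feasible([0, 0, 1, 1, 1, 1, 1], rows, dirs, rhs))  # C, D, E, F, G
print(is_feasible([1, 1, 1, 1, 1, 0, 0], rows, dirs, rhs))  # all five Bears
```

The roster-size row uses "==" because the question asks for exactly 8 players, while the team rows use "<=" for the per-team cap.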
