How to specify variables in this simplex method case? - r

I'm learning the simplex method in R, but I'm stuck on this case:
We run a 24-hour shop and need to know how many employees are required if there are six shifts during the day (8-12, 12-16, etc.) and one employee can work a maximum of 8 hours. The limits on the number of employees per shift are:
0:00-4:00 < 5
4:00-8:00 < 7
8:00-12:00 < 15
12:00-16:00 < 10
16:00-20:00 < 15
20:00-24:00 < 9
I tried this:
library(boot)
a=c(1,1,1,1,1,1)
w1=c(1,1)
w2=c(1,1)
w3=c(1,1)
w4=c(1,1)
w5=c(1,1)
w6=c(1,1)
A1=rbind(w1,w2,w3,w4,w5,w6)
b1=c(5,7,15,10,15,9)
simplex(a=a,A1=A1,b1=b1,maxi=TRUE)
Error in `[<-`(`*tmp*`, , basic, value = iden(M)) :
  subscript out of bounds
But it doesn't work.

The error occurs because the dimensions of the input matrices and vectors are not correct.
Since the coefficient vector a in your example has length 6, the vector x in the objective function must also have length 6. And since the b1 supplied to simplex() has length 6 as well, A1 in the constraint
A1 * x <= b1
must be a 6 x 6 matrix. However, A1 in your example is only a 6 x 2 matrix, which triggers the error. (It would have been nicer if simplex() checked its inputs first and issued a more user-friendly message...)
Here is an example with the right dimensions, which does work:
library(boot)
a = rep(1, 6) # vector with 6 ones
A1 = matrix(1, nrow=6, ncol=6) # 6x6 matrix with all ones
b1 = c(5, 7, 15, 10, 15, 9)
simplex(a=a, A1=A1, b1=b1, maxi=TRUE)
Note that this corrected example does not try to actually solve your specific simplex problem, it only illustrates correct usage of simplex().
In general it is worth carefully checking the dimensions of the inputs to simplex(). They are explained nicely in the help page:
?simplex

OK, I got it after 4 days :)) I'm posting the results here in case anybody else has the same problem. The main difficulty is to express "people" as "number of hours in one shift":
a = c(1, 1, 1, 1, 1, 1)
w1 = c(4, 0, 0, 0, 0, 4)
w2 = c(4, 4, 0, 0, 0, 0)
w3 = c(0, 4, 4, 0, 0, 0)
w4 = c(0, 0, 4, 4, 0, 0)
w5 = c(0, 0, 0, 4, 4, 0)
w6 = c(0, 0, 0, 0, 4, 4)
b1 = c(20, 28, 60, 40, 60, 36)
library(boot)
simplex(a=a, A1=rbind(w1, w2, w3, w4, w5, w6), b1=b1, maxi=T)
Linear Programming Results
Call : simplex(a = a, A1 = rbind(w1, w2, w3, w4, w5, w6), b1 = b1, maxi = T)
Maximization Problem with Objective Function Coefficients
x1 x2 x3 x4 x5 x6
1 1 1 1 1 1
Optimal solution has the following values
x1 x2 x3 x4 x5 x6
5 2 10 0 9 0
The optimal value of the objective function is 26.
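As a quick sanity check (a minimal sketch reusing w1..w6 and b1 from above; the solution vector is copied from the output), the shift constraints can be verified directly:
x <- c(5, 2, 10, 0, 9, 0)                       # reported optimum
A1 <- rbind(w1, w2, w3, w4, w5, w6)
cbind(hours = as.vector(A1 %*% x), limit = b1)  # hours covered per shift vs. its limit
all(A1 %*% x <= b1)                             # TRUE, so the solution is feasible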
CLOSED. Delete the subject or leave it for others. Thank you to the admins for the edit, and to #WhiteViking!

Related

Why do results of matching depend on order of data (MatchIt package)?

When using the matchit() function for full matching, the results depend on the order of the input data frame. That is, if the order of the data is changed, the results change too. This is surprising, because in my understanding the optimal full matching algorithm should yield one single best solution.
Am I missing something, or is this an error?
Similar differences occur with the optimal matching algorithm.
Below you find a reproducible example. The subclasses should be identical for the two data sets, but they are not.
Thank you for your help!
# create data
nr <- c(1:100)
x1 <- rnorm(100, mean=50, sd=20)
x2 <- c(rep("a", 20),rep("b", 60), rep("c", 20))
x3 <- rnorm(100, mean=230, sd=2)
outcome <- rnorm(100, mean=500, sd=20)
group <- c(rep(0, 50),rep(1, 50))
df <- data.frame(x1=x1, x2=x2, outcome=outcome, group=group, row.names=nr, nr=nr)
df_neworder <- df[order(outcome),] # re-order data.frame
# perform matching
library(MatchIt)
model_oldorder <- matchit(group ~ x1, data=df, method="full", distance="logit")
model_neworder <- matchit(group ~ x1, data=df_neworder, method="full", distance="logit")
# store matching results
matcheddata_oldorder <- match.data(model_oldorder, distance="pscore")
matcheddata_neworder <- match.data(model_neworder, distance="pscore")
# Results based on original data.frame
head(matcheddata_oldorder[order(nr),], 10)
x1 x2 outcome group nr pscore weights subclass
1 69.773776 a 489.1769 0 1 0.5409943 1.0 27
2 63.949637 a 529.2733 0 2 0.5283582 1.0 32
3 52.217666 a 526.7928 0 3 0.5028106 0.5 17
4 48.936397 a 492.9255 0 4 0.4956569 1.0 9
5 36.501507 a 512.9301 0 5 0.4685876 1.0 16
# Results based on re-ordered data.frame
head(matcheddata_neworder[order(matcheddata_neworder$nr),], 10)
x1 x2 outcome group nr pscore weights subclass
1 69.773776 a 489.1769 0 1 0.5409943 1.0 25
2 63.949637 a 529.2733 0 2 0.5283582 1.0 31
3 52.217666 a 526.7928 0 3 0.5028106 0.5 15
4 48.936397 a 492.9255 0 4 0.4956569 1.0 7
5 36.501507 a 512.9301 0 5 0.4685876 2.0 14
Apparently, the assignment of objects to subclasses differs. In my understanding, this should not be the case.
The developers of the optmatch package (which the matchit function calls) provided useful help:
I think what we're seeing here is the result of the tolerance argument
that fullmatch has. The matching algorithm requires integer distances,
so we have to scale then truncate floating point distances. For a
given set of integer distances, there may be multiple matchings that
achieve the minimum, so the solver is free to pick among these
non-unique solutions.
Developing your example a little more:
library(optmatch)
nr <- c(1:100)
x1 <- rnorm(100, mean=50, sd=20)
outcome <- rnorm(100, mean=500, sd=20)
group <- c(rep(0, 50), rep(1, 50))
df_oldorder <- data.frame(x1=x1, outcome=outcome, group=group, row.names=nr, nr=nr)
df_neworder <- df_oldorder[order(outcome),] # re-order data.frame
glm_oldorder <- match_on(glm(group ~ x1, data=df_oldorder), data=df_oldorder)
glm_neworder <- match_on(glm(group ~ x1, data=df_neworder), data=df_neworder)
fm_old <- fullmatch(glm_oldorder, data=df_oldorder)
fm_new <- fullmatch(glm_neworder, data=df_neworder)
mean(sapply(matched.distances(fm_old, glm_oldorder), mean))
## 0.06216174
mean(sapply(matched.distances(fm_new, glm_neworder), mean))
## 0.062058
mean(sapply(matched.distances(fm_old, glm_oldorder), mean)) -
  mean(sapply(matched.distances(fm_new, glm_neworder), mean))
## 0.00010373
which we can see is smaller than the default tolerance of 0.001. You can always decrease the tolerance level, which may
require increased run time, in order to get closer to the true
floating point minimum. We found 0.001 seemed to work well in practice,
but there is nothing special about this value.
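If the order dependence matters in practice, one option suggested by the quoted advice (a sketch; the value tol = 1e-5 is arbitrary) is to tighten the tolerance in fullmatch(), at the cost of longer run time:
# Tighter tolerance should make the solutions from differently ordered data agree more closely
fm_old_tight <- fullmatch(glm_oldorder, data = df_oldorder, tol = 1e-5)
fm_new_tight <- fullmatch(glm_neworder, data = df_neworder, tol = 1e-5)
mean(sapply(matched.distances(fm_old_tight, glm_oldorder), mean)) -
  mean(sapply(matched.distances(fm_new_tight, glm_neworder), mean))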

merge/cbind model matrices

This is a simplified version of my current problem. I need to create a model.matrix from two model matrices, without losing the info in "assign". For example, consider the data and formula
y<-rnorm(100); x1<-rnorm(100); x2<-rnorm(100); x3<-rnorm(100)
f1 <- y ~ x1 + x2 + x3
and 2 model matrices X1 and X2 created using
trms<-terms.formula(f1)
trms2<-drop.terms(trms, dropx = 2)
trms3<-drop.terms(trms, dropx = -2)
X1<-model.matrix(trms2)
X2<-model.matrix(trms3)
Is there an easy way to create from X1 and X2 a matrix X with 1 intercept column and with attr(,"assign") that would have been obtained from f1?
I'm not completely sure whether this is what you are trying to do, but cbind() seems to work fine in this case:
X <- cbind(X1, X2)
X <- X[, !duplicated(colnames(X))]
You can then concatenate the attributes from X1 and X2. To avoid duplicates, take from X2 only the assign entries that are not already present in X1:
attributes(X)$assign <- c(attr(X1,"assign"), attr(X2,"assign")[!attr(X2,"assign") %in% attr(X1,"assign")])
If this is not what you were trying to do, let us know.
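As a rough check (a sketch, assuming the goal is to reproduce what model.matrix() would return for f1 directly), the merged matrix can be compared against the full model matrix:
# Hypothetical check: build the full model matrix from f1 and compare values and "assign"
X_full <- model.matrix(f1)
all.equal(X[, colnames(X_full)], X_full, check.attributes = FALSE)  # values should match
attr(X_full, "assign")  # the assign vector one would want to reproduce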
If I understand the question correctly, how about something simple and direct like:
X3 <- cbind(X1[,1:2], X2[,2], X1[,3])
attr(X3,"assign") <- c(0,1,2,3)
colnames(X3) <- c("Intercept",attr(trms, "term.labels"))
head(X3)
Intercept x1 x2 x3
1 1 -1.28372461 -0.2598796 0.3028496
2 1 0.56880875 0.2803302 0.7593734
3 1 -0.32480770 -1.6705911 -1.1750247
4 1 -1.02761734 -0.1405454 -0.6805033
5 1 0.84218452 -0.1224962 -1.3882420
6 1 0.07221231 0.5587801 -0.9042751

Variable length formula construction

I am trying to apply Simpson's Diversity Index across a number of different datasets with a variable number of species ('nuse') captured. As such, I am trying to write code that can cope with this automatically, without needing to construct a formula manually each time. An example dataset with the manual formula is below:
diverse <- data.frame(nuse1=c(0,20,40,20), nuse2=c(5,5,3,20), nuse3=c(0,2,8,20), nuse4=c(5,8,2,20), total=c(10,35,53,80))
simp <- function(x) {
  total <- x[, "total"]
  nuse1 <- x[, "nuse1"]
  nuse2 <- x[, "nuse2"]
  nuse3 <- x[, "nuse3"]
  nuse4 <- x[, "nuse4"]
  div <- round(1 - ((nuse1*(nuse1 - 1)) + (nuse2*(nuse2 - 1)) + (nuse3*(nuse3 - 1)) + (nuse4*(nuse4 - 1))) / (total*(total - 1)), digits = 4)
  return(div)
}
diverse$Simpson <- simp(diverse)
diverse
As you can see, this works fine. However, how could I create a function that automatically adjusts to, for example, 9 species (so up to nuse9)?
I have experimented with the paste function + as.formula as indicated here: Formula with dynamic number of variables; however, it is the expanded form of (nuse1 * (nuse1 - 1)) that I'm struggling with. Does anyone have any suggestions? Thanks.
How about something like:
diverse <- data.frame(nuse1=c(0,20,40,20), nuse2=c(5,5,3,20), nuse3=c(0,2,8,20), nuse4=c(5,8,2,20), total=c(10,35,53,80))
simp <- function(x, species) {
  spcs <- grep(species, colnames(x))  # which column names contain "nuse"
  total <- rowSums(x[, spcs])         # sum by row
  div <- round(1 - rowSums(apply(x[, spcs], 2, function(s) s*(s-1))) / (total*(total - 1)), digits = 4)
  return(div)
}
diverse$Simpson2 <- simp(diverse, species = "nuse")
diverse
# nuse1 nuse2 nuse3 nuse4 total Simpson2
# 1 0 5 0 5 10 0.5556
# 2 20 5 2 8 35 0.6151
# 3 40 3 8 2 53 0.4107
# 4 20 20 20 20 80 0.7595
All it does is find the columns whose names contain "nuse" (or whatever species string you have in your dataset). It computes the total within the function, so a total column in the dataset is not required.
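One small refinement (a sketch; the ^ anchor assumes the species columns all start with the given prefix): anchoring the pattern ensures that only columns beginning with "nuse" are picked up, not columns that merely contain that string elsewhere in their names:
spcs <- grep(paste0("^", species), colnames(x))  # match only names starting with the prefix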

Adjust implausible imputed values in an optimized way

I have a dataset with some imputed values. According to a predefined edit rule, some of these imputed values are implausible. For that reason, I want to adjust these implausible imputed values, but the adjustment should be as small as possible.
Here is a simplified example:
# Seed
set.seed(111)
# Example data
data <- data.frame(x1 = round(rnorm(200, 5, 5), 0),
x2 = factor(round(runif(200, 1, 3), 0)),
x3 = round(rnorm(200, 2, 10), 0),
x4 = factor(round(runif(200, 0, 5), 0)))
data[data$x1 > 5 & data$x2 == 1, ]$x3 <- 4
data[data$x1 > 5 & data$x2 == 1, ]$x4 <- 5
# Missings
data$x1[sample(1:nrow(data), 25)] <- NA
data$x2[sample(1:nrow(data), 50)] <- NA
data$x3[sample(1:nrow(data), 40)] <- NA
data$x4[sample(1:nrow(data), 35)] <- NA
# Imputation
library("mice")
imp <- mice(data, m = 1)
# Imputed data
data_imp <- complete(imp, "repeated")
# So far everything works well.
# However, there is a predefined edit rule, which should not be violated.
# Edit Rule:
# If x1 > 5 and x2 == 1
# Then x3 > 3 and x4 > 4
# Because of the imputation, some of the observations have implausible values.
implausible <- data_imp[data_imp$x1 > 5 & data_imp$x2 == 1 &
(data_imp$x3 <= 3 | (data_imp$x4 != 4 & data_imp$x4 != 5)), ]
implausible
# Example 1)
# In row 26 x1 has a value > 5 and x2 equals 1.
# For that reason, x3 would have to be larger than 3 (here x3 is -17).
# As you can see in the original data, x2 has been imputed in row 26.
data[rownames(implausible), ]
# Hence, x2 would have to be adjusted, so that it randomly gets a different category.
# Example 2)
# In row 182 are also implausible values.
# Three of the variables have been imputed in this row.
# Therefore, all/some of the imputed cells would have to be adjusted,
# but the adjustment should be as small as possible.
I have already done some research and found some relevant papers/books in which suitable optimization algorithms are described:
Pannekoek & Zhang (2011): https://www.researchgate.net/publication/269410841_Partial_donor_Imputation_with_Adjustments
de Waal, Pannekoek & Scholtus (2011): Handbook of Statistical Data Editing and Imputation
However, I am struggling with the implementation of these algorithms in R. Is there a package available that helps with these kinds of calculations? I'd really appreciate some help with my code or some hints about the topic!
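As a starting point (a sketch only, not an implementation of the minimal-adjustment algorithms in the cited references), the edit rule itself can be expressed and checked with the validate package, which at least locates the records that need adjusting:
# Sketch: express the edit rule and flag violating records; the adjustment step is not done here
library(validate)
rules <- validator(
  if (x1 > 5 & x2 == "1") x3 > 3,
  if (x1 > 5 & x2 == "1") x4 %in% c("4", "5")
)
cf <- confront(data_imp, rules)
summary(cf)
violating(data_imp, cf)  # records that break the edit rule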

How to use predict on a test set?

I am eventually going to do multivariate regression with a very large set of predictors. To make sure I am setting up the data correctly, I want to get the expected results with a toy model first. However, when I try to use predict() it does not predict on the new data, and since the size of the new data differs from the training set it also gives me an error.
I have looked at and tried various things on the Internet and none have worked. I am almost ready to give up and write my own functions, but I am also building models with the pls package, which I am guessing probably calls predict() internally, so I want to be consistent. Here is the short script I wrote:
x1<-c(1.1,3.4,5.6,1.2,5,6.4,0.9,7.2,5.4,3.1) # Original variables
x2<-c(10,21,25,15.2,18.9,19,16.2,22.1,18.6,22)
y<-2.0*x1+1.12*x2+rnorm(10,mean=0,sd=0.2) # Define output variable
X<-data.frame(x1,x2)
lfit<-lm(y~.,X) # fit model
n_fit<-lfit$coefficients
xg1<-runif(15,1,10) # define new data
xg2<-runif(15,10,30)
X<-data.frame(xg1,xg2)# put into data frame
y_guess<-predict(lfit,newdata=X) #Predict based on fit
y_actual<-2.0*xg1+1.12*xg2 # actual values because I know the coefficients
y_pred=n_fit[1]+n_fit[2]*xg1+n_fit[3]*xg2 # What predict should give me based on fit
print(y_guess-y_actual) #difference check
print(y_guess-y_pred)
These are the values I am getting and the error message:
[1] -4.7171499 -16.9936498 6.9181074 -6.1964788 -11.1852816 0.9257043 -13.7968731 -6.6624086 15.5365141 -8.5009428
[11] -22.8866505 2.0804016 -1.8728602 -18.7670797 1.2251849
[1] -4.582645 -16.903164 7.038968 -5.878723 -11.149987 1.162815 -13.473351 -6.483111 15.731694 -8.456738
[11] -22.732886 2.390507 -1.662446 -18.627342 1.431469
Warning messages:
1: 'newdata' had 15 rows but variables found have 10 rows
2: In y_guess - y_actual :
longer object length is not a multiple of shorter object length
3: In y_guess - y_pred :
longer object length is not a multiple of shorter object length
The fitted coefficients are 1.97 and 1.13 with intercept -0.25 (the intercept should be 0, but I added noise, so that would not cause a big discrepancy). How do I get predict() to work on an independent test set?
From the help documentation, ?predict.lm:
"Variables are first looked for in newdata and then searched for in the usual way (which will include the environment of the formula used in the fit)."
The data frame created in X <- data.frame(xg1, xg2) has different names (xg1, xg2). predict() cannot find the original names (x1, x2) and therefore falls back to the variables in the environment of the formula. The result is that you obtain the fitted values from your original data.
Solve this by making the names in newdata consistent with the original, X <- data.frame(x1=xg1, x2=xg2):
x1 <- c(1.1, 3.4, 5.6, 1.2, 5, 6.4, 0.9, 7.2, 5.4, 3.1) # Original variables
x2 <- c(10, 21, 25, 15.2, 18.9, 19, 16.2, 22.1, 18.6, 22)
y <- 2.0*x1 + 1.12*x2 + rnorm(10, mean=0, sd=0.2) # Define output variable
X <- data.frame(x1, x2)
lfit <- lm(y~., X) # fit model
n_fit <- lfit$coefficients
xg1 <- runif(15, 1, 10) # define new data
xg2 <- runif(15, 10, 30)
X <- data.frame(x1=xg1, x2=xg2) # put into data frame
y_guess <- predict(lfit, newdata=X) #Predict based on fit
y_actual <- 2.0*xg1 + 1.12*xg2 # actual values because I know the coefficients
y_pred = n_fit[1] + n_fit[2]*xg1 + n_fit[3]*xg2 # What predict should give me based on fit
> print(y_guess - y_actual) #difference check
1 2 3 4 5 6 7 8 9 10 11 12 13
-0.060223916 -0.047790535 -0.018274280 -0.096190467 -0.079490487 -0.063736231 -0.047506981 -0.009523583 -0.047774006 -0.084276807 -0.106322290 -0.030876942 -0.067232989
14 15
-0.023060651 -0.041264431
> print(y_guess - y_pred)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
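A small additional guard (a sketch; required is just an illustrative name) is to check that newdata supplies every predictor the fitted model expects before predicting:
# Hypothetical pre-check: newdata must contain all predictors used in the fit
required <- attr(terms(lfit), "term.labels")
stopifnot(all(required %in% names(X)))
y_guess <- predict(lfit, newdata = X)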
