R Optimisation - Integer Programming

I have tried to use the R package lpSolve, and in particular the lp.transport function, to solve an optimisation problem. In my fictitious example below I have 5 office sites that I need to staff with a minimum number of employees, and I have set up a cost matrix that gives the distance from each employee's home to each office. I want to minimize the total distance traveled to work whilst meeting the minimum number of employees per office.
Initially this worked because I was treating all employees as equal (efficiency 1). However, problems started when I rated each employee by how efficient they are. For example, I now want to say that OfficeX needs the equivalent of 2 engineers, which might be made up of 4 engineers who are 50% efficient or 1 who is 200% efficient. When I do this, however, the solution splits an employee across several offices. What I need is an additional constraint that an employee can only be at 1 office.
Hopefully that is enough background; here is my example:
library(lpSolve)
Employee <- c("Jim","John","Jonah","James","Jeremy","Jorge")
Office1 <- c(2.58321505105556, 5.13811249390279, 2.75943834864996,
6.73543614029559, 6.23080251653027, 9.00620341764497)
Office2 <- c(24.1757667923894, 19.9990724784926, 24.3538456922105,
27.9532073293925, 26.3310994833106, 14.6856664813007)
Office3 <- c(38.6957155251069, 37.9074293509861, 38.8271000719858,
40.3882569566947, 42.6658938732098, 34.2011184027657)
Office4 <- c(28.8754359274453, 30.396841941228, 28.9595182970988,
29.2042274337124, 33.3933900645023, 28.6340025144932)
Office5 <- c(49.8854888720157, 51.9164328512659, 49.948290261029,
49.4793138594302, 54.4908258333456, 50.1487397648236)
#create CostMatrix
costMat<-data.frame(Employee,Office1, Office2, Office3, Office4, Office5)
#efficiency is the worth of employees, eg if 1 they are working at 100%
#so if for example I wanted 5 Employees
#working in an office then I could choose 5 at 100% or 10 working at 50% etc...
efficiency<-c(0.8416298, 0.8207991, 0.7129663, 1.1406839, 1.3868177, 1.1989748)
#Uncomment next line to see the working version based on headcount
#efficiency<-c(1,1,1,1,1,1)
#Minimum is the minimum number of Employees we want in each office
minimum<-c(1, 1, 2, 1, 1)
#solve problem
opSol <-lp.transport(cost.mat = as.matrix(costMat[,-1]),
direction = "min",
col.signs = rep(">=",length(minimum)),
col.rhs = minimum,
row.signs = rep("==", length(efficiency)),
row.rhs = efficiency,
integers=NULL)
#view solution
opSol$solution
# My issue is that one employee is being spread across multiple offices;
# what I really want is an extra constraint that says that in a row there
# can only be 1 non-zero value.

I think this is no longer a transportation problem. However, you can still solve it as a MIP (mixed-integer programming) model.
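A minimal sketch of one such model, assuming lpSolve's general lp() function with all.bin = TRUE (this is not necessarily the exact model the answer had in mind): one binary variable x[i, j] per employee i and office j, a row constraint forcing each employee into exactly one office, and a column constraint requiring the efficiency-weighted headcount of each office to reach its minimum. The helper name assign_employees is hypothetical.

```r
library(lpSolve)

# Data from the question: rows = employees, columns = offices
cost <- cbind(
  Office1 = c(2.58321505105556, 5.13811249390279, 2.75943834864996,
              6.73543614029559, 6.23080251653027, 9.00620341764497),
  Office2 = c(24.1757667923894, 19.9990724784926, 24.3538456922105,
              27.9532073293925, 26.3310994833106, 14.6856664813007),
  Office3 = c(38.6957155251069, 37.9074293509861, 38.8271000719858,
              40.3882569566947, 42.6658938732098, 34.2011184027657),
  Office4 = c(28.8754359274453, 30.396841941228, 28.9595182970988,
              29.2042274337124, 33.3933900645023, 28.6340025144932),
  Office5 = c(49.8854888720157, 51.9164328512659, 49.948290261029,
              49.4793138594302, 54.4908258333456, 50.1487397648236))
rownames(cost) <- c("Jim", "John", "Jonah", "James", "Jeremy", "Jorge")
efficiency <- c(0.8416298, 0.8207991, 0.7129663, 1.1406839, 1.3868177, 1.1989748)
minimum <- c(1, 1, 2, 1, 1)

# Hypothetical helper (not part of lpSolve): x[i, j] = 1 if employee i works at office j
assign_employees <- function(cost, efficiency, minimum) {
  nE <- nrow(cost)
  nO <- ncol(cost)
  # Each employee at exactly one office: sum_j x[i, j] == 1
  A_emp <- kronecker(t(rep(1, nO)), diag(nE))
  # Each office gets enough effective staff: sum_i efficiency[i] * x[i, j] >= minimum[j]
  A_off <- kronecker(diag(nO), t(efficiency))
  res <- lp(direction    = "min",
            objective.in = as.vector(cost),  # column-major order, same as x[i, j]
            const.mat    = rbind(A_emp, A_off),
            const.dir    = c(rep("==", nE), rep(">=", nO)),
            const.rhs    = c(rep(1, nE), minimum),
            all.bin      = TRUE)
  list(status     = res$status,  # 0 = optimal, 2 = no feasible solution
       assignment = matrix(res$solution, nrow = nE, dimnames = dimnames(cost)))
}

headcount <- assign_employees(cost, rep(1, 6), minimum)  # the "working" headcount case
headcount$assignment
weighted <- assign_employees(cost, efficiency, minimum)
weighted$status
```

With the efficiencies from the question, no whole-person assignment exists: nobody reaches 2 on their own, so Office3 needs at least two people, which leaves at most four people for the four other offices, each of whom would need an efficiency of at least 1, and only James, Jeremy and Jorge qualify. So weighted$status should come back as 2, and the minimums or the staff list would have to change before the model has a solution.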


Making a for loop in r

I am just getting started with R, so I am sorry if I say things that don't make sense.
I am trying to make a for loop that does the following:
l_dtest[[1]]<-vector()
l_dtest[[2]]<-vector()
l_dtest[[3]]<-vector()
l_dtest[[4]]<-vector()
l_dtest[[5]]<-vector()
and so on, up to some number n. For example, if n were 100, it would repeat this all the way to l_dtest[[100]]<-vector().
I have tried multiple different attempts at doing this and here is one of them.
n<-4
p<-(1:n)
l_dtest<-list()
for(i in p){
print((l_dtest[i]<-vector())<-i)
}
Again I am VERY new to R so I don't know what I am doing or what is wrong with this loop.
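A minimal sketch of the loop, assuming the goal is simply an empty vector in each of the n list slots. The attempt above assigns to the result of another assignment, which R does not allow; the usual form is a single assignment per iteration, using [[i]] to address one list element.

```r
n <- 4
l_dtest <- list()
for (i in seq_len(n)) {
  l_dtest[[i]] <- vector()   # slot i holds an empty (logical) vector
}
length(l_dtest)

# The same list built without an explicit loop
l_dtest2 <- replicate(n, vector(), simplify = FALSE)
identical(l_dtest, l_dtest2)
```

seq_len(n) is used instead of 1:n so that the loop simply does nothing when n is 0.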
The detailed background for why I need to do this is that I need to write an R function that receives as input the size of the population "n", runs a simulation of the model below with that population size, and returns the number of generations it took to reach a MRCA (most recent common ancestor).
Here is the model:
We assume the population size is constant at n. Generations are discrete and non-overlapping. The genealogy is formed by this random process: in each generation, each individual chooses two parents at random from the previous generation. The choices are made randomly and equally likely over the n possibilities and each individual chooses twice. All choices are made independently. Thus, for example, it is possible that, when an individual chooses his two parents, he chooses the same individual twice, so that in fact he ends up with just one parent; this happens with probability 1/n.
I don't understand the specific step at the beginning of this post or why I need to do it, but my teacher said I do. I don't know if this helps, but the next step is choosing parents for the first person and then combining the lists from the step I posted with a previous step. It looks like this:
sample(1:5, 2, replace=TRUE)
#[1] 1 2
l_dtemp[[1]]<-union(l_dtemp[[1]], l_d[[1]]) # As I understand it, l_dtemp[[1]] now receives the descendant list of l_d[[1]], because l_d[[1]] chose individual 1 as its first parent
l_dtemp[[2]]<-union(l_dtemp[[2]], l_d[[1]]) # Same as above, but for l_d[[1]]'s second choice, individual 2
sample(1:5, 2, replace=TRUE)
#[1] 1 3
l_dtemp[[1]]<-union(l_dtemp[[1]], l_d[[2]])
l_dtemp[[3]]<-union(l_dtemp[[3]], l_d[[2]])
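Putting those steps together, a minimal sketch of the whole simulation might look like this. It follows the procedure above: l_d[[k]] holds the present-day individuals descended from individual k of the current ancestral generation, every individual picks two parents with replacement, and each parent's list is merged with union(). The function name mrca_generations is hypothetical. Starting each generation from empty vectors rather than NULL matters: assigning NULL to a list element with [[ ]] deletes that element, which is likely why the empty-vector step is needed.

```r
# Hypothetical helper: generations back until some ancestor is shared by all n
mrca_generations <- function(n) {
  l_d <- as.list(seq_len(n))  # generation 0: everyone is their own descendant
  generations <- 0
  while (!any(lengths(l_d) == n)) {
    # Previous generation: start from empty vectors, not NULL
    l_dtemp <- replicate(n, integer(0), simplify = FALSE)
    for (k in seq_len(n)) {
      parents <- sample.int(n, 2, replace = TRUE)  # same parent twice has prob. 1/n
      for (p in unique(parents)) {
        l_dtemp[[p]] <- union(l_dtemp[[p]], l_d[[k]])
      }
    }
    l_d <- l_dtemp
    generations <- generations + 1
  }
  generations
}

set.seed(123)
mrca_generations(50)
```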

Is there some way to detect 'wrong' measures in a dataframe?

I'm struggling with how to remove 'wrong' measures from my dataset. I'm dealing with a rather large table, where I have a date and the size of a piece of equipment. The equipment can't get bigger with use; at most it can stay the same size, so any increase must be a measurement error.
My database is extensive and has many special cases, which (among other business reasons) makes it impossible for me to post it here. Therefore, I use an image and a part of the data as an example, but the problem is what I described above.
simplest_example = test = data.frame(data1 = c("20-09-2020", "15-10-2020", "13-05-2021", "20-10-2021","20-11-2021"), measure = c(5,4,3,5,2))
# desired result:
# data1 measure
#1 20-09-2020 5
#2 15-10-2020 4
#3 13-05-2021 3
#4 20-11-2021 2
The goal is to select the longest non-increasing sequence possible, excluding the values that prevent this.
So I would like to ask for suggestions: if anyone here has come across something similar, please let me know what you would recommend.
If I understand correctly, you want to detect any time the variable measure is greater than the value at the previous time point? I'd create a lag column, which is just the measure column lagged by one time point. Then identify where the previous measure is smaller than the current measure:
library(dplyr)
simplest_example %>%
mutate(previous_measure = lag(measure)) %>%
filter(previous_measure < measure)
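The lag approach flags each increase, but it does not by itself pick the longest non-increasing sequence the question asks for (dropping only the flagged row can still leave a later increase). A sketch of that selection, assuming the rows can be sorted chronologically and using the textbook O(n^2) longest-subsequence dynamic program adapted to "non-increasing"; longest_nonincreasing is a hypothetical helper name.

```r
# Hypothetical helper: row indices of a longest non-increasing subsequence of x
longest_nonincreasing <- function(x) {
  n <- length(x)
  if (n == 0) return(integer(0))
  len  <- rep(1L, n)  # len[i]: length of the best subsequence ending at i
  prev <- rep(0L, n)  # prev[i]: previous index on that subsequence (0 = none)
  for (i in seq_len(n)) {
    for (j in seq_len(i - 1)) {
      if (x[j] >= x[i] && len[j] + 1L > len[i]) {
        len[i]  <- len[j] + 1L
        prev[i] <- j
      }
    }
  }
  # Walk back from the end of the longest subsequence
  i <- which.max(len)
  idx <- integer(0)
  while (i > 0) {
    idx <- c(i, idx)
    i <- prev[i]
  }
  idx
}

simplest_example <- data.frame(
  data1   = c("20-09-2020", "15-10-2020", "13-05-2021", "20-10-2021", "20-11-2021"),
  measure = c(5, 4, 3, 5, 2))
# Sort chronologically first (dates are dd-mm-yyyy strings)
simplest_example <- simplest_example[order(as.Date(simplest_example$data1, format = "%d-%m-%Y")), ]
kept <- simplest_example[longest_nonincreasing(simplest_example$measure), ]
kept
```

This keeps the rows with measures 5, 4, 3 and 2, matching the desired result. For the full table, the same function could be applied to each piece of equipment separately, for example with split() and lapply().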

Calculate the number of trips in graph traversal

Hello Stack Overflow Community,
I'm attempting to solve this problem:
https://uva.onlinejudge.org/index.php?option=com_onlinejudge&Itemid=8&page=show_problem&problem=1040
The problem is to find the best path based on the capacities of the edges. I understand that this can be solved with dynamic programming, but I'm confused by the example they provide:
According to the problem description, if someone is trying to get 99 people from city 1 to city 7, the route should be 1-2-4-7, which I understand, since the weight of each edge represents the maximum number of passengers that can travel at once. What I don't understand is that the description says it takes at least 5 trips. Where does the 5 come from? 1-2-4-7 is 3 hops. If I take this route, I calculate 4 trips: since 25 is the most limited hop on the route, I would say you need 99/25 = 3.96, i.e. at least 4 trips. Is this a typo, or am I missing something?
Given the first line of the problem statement:
Mr. G. works as a tourist guide.
It is likely that Mr. G must always be present on the bus, thus the equation for the number of trips is:
x = (ceil(x) + number_of_passengers) / best_route
rather than simply:
x = number_of_passengers / best_route
or, for your numbers:
x = (ceil(x) + 99) / 25
Which is solved by:
x = 4.16, which rounds up to 5 trips (equivalently, each trip can carry at most 24 tourists, and ceil(99 / 24) = 5).
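A sketch of the whole calculation under the answer's assumption that the guide occupies one seat on every trip: find the route whose smallest edge capacity is as large as possible (a maximin or "widest path", computed here with a Floyd-Warshall-style update), then divide the number of tourists by that capacity minus one and round up. The helper names and the three-city graph are made up for illustration and are not the problem's sample input.

```r
# Hypothetical helper: maximin capacity between every pair of cities
widest_paths <- function(n, edges) {
  cap <- matrix(0, n, n)
  for (r in seq_len(nrow(edges))) {
    a <- edges[r, 1]; b <- edges[r, 2]; w <- edges[r, 3]
    cap[a, b] <- max(cap[a, b], w)  # roads are two-way
    cap[b, a] <- max(cap[b, a], w)
  }
  for (k in seq_len(n)) {
    # Best bottleneck via k: min(cap[i, k], cap[k, j]), for all (i, j) at once
    via_k <- pmin(matrix(cap[, k], n, n), matrix(cap[k, ], n, n, byrow = TRUE))
    cap <- pmax(cap, via_k)
  }
  cap
}

# The guide takes one seat per trip, so only capacity - 1 tourists fit
min_trips <- function(capacity, tourists) ceiling(tourists / (capacity - 1))

edges <- rbind(c(1, 2, 30), c(2, 3, 25), c(1, 3, 10))  # from, to, capacity
cap <- widest_paths(3, edges)
cap[1, 3]                   # bottleneck of the best route 1-2-3
min_trips(cap[1, 3], 99)
```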

R - efficiently organize tables on condition over time

I'd like to know how to organize a data.frame into tables on conditions over time. I have a politics data set where certain organizations take a position on a bill and whether the bill passed or failed, over the last few decades.
I know how to organize the data into tables individually, but I do it one by one, and it's really hard to see the trends. The Stack Overflow community always seems to have ingenious ways of grouping data. Here's some mock data:
Data <- data.frame(
year = sample(1998:2004, 200, replace = TRUE),
outcome = sample(0:1, 200, replace = TRUE),
biz1 = sample(-2:2, 200, replace = TRUE),
biz2 = sample(-2:2, 200, replace = TRUE),
biz3 = sample(-2:2, 200, replace = TRUE)
)
In biz, a negative number means they oppose the bill and a positive number means they support it. In outcome, a zero means the law did not pass, and a 1 means that it did.
I would like to use tables to see how each business has become more or less successful over time, by looking at how their positive numbers match 1s and their negative numbers match 0s, compared to every other organization (and vice versa).
A few notes
In the data set, I have about 100 businesses as columns, so I definitely need an efficient way to make the tables without naming every single column. I can select them in a range, like 125:300, since they are ordered together.
Of course I'm open to all ideas! Feel free to list any other ways of looking at this.
If I failed to ask this question well, please let me know how I could improve it.
The comments above about your question being too vague are right on target. Having said that, this interests me, and the vagueness leaves me free to interpret...
First, I'd recode the outcome as -1 if the bill fails. Then outcome times a business's support value is, in a sense, a success score for that business on that legislation: positive if either a bill that the business supported passed, or a bill that the business opposed failed. There are several ways to visualize the scores; here are just a few to get you started.
# re-code outcomes
Data$outcome <- ifelse(Data$outcome==0,-1,1)
library(reshape2) # for melt(...)
library(ggplot2)
gg <- melt(Data, id=c("year","outcome"),
variable.name="business", value.name="support")
gg$score <- with(gg,outcome*support) # score represents level of success
# mean success vs. year with +/- 1 sd (mean_sdl needs the Hmisc package; its default is +/- 2 sd)
ggplot(gg,aes(x=year,y=score, color=business))+
stat_summary(fun.data="mean_sdl", fun.args=list(mult=1))+
stat_summary(fun=mean,geom="line")+
facet_grid(business~.)
# boxplot of success scores
ggplot(gg,aes(x=factor(year),y=score))+
geom_boxplot(aes(fill=business))+
facet_grid(business~.)
# barplot of success/failure frequencies
# excludes cases where a business did not take a position pro or con
gg.bar <- aggregate(score~year+business,gg,
function(eff)c(success=sum(eff>0),failure=sum(eff<0)))
gg.bar <- data.frame(gg.bar[1:2],gg.bar$score)
ggplot(gg.bar,aes(x=factor(year)))+
geom_bar(aes(y=success,fill="success"),stat="identity")+
geom_bar(aes(y=-failure,fill="failure"),stat="identity")+
geom_hline(yintercept=0,linetype=2,color="blue")+
scale_fill_discrete(name="",breaks=c("success","failure"))+
labs(x="",y="frequency")+
facet_grid(business~.)
All of these represent rather simplistic ways of looking at the data. If this was a serious project I would probably run a principal components analysis on the businesses to identify groups of businesses that tend to support or oppose the same legislation. Then I'd run a cluster analysis on the principal components to identify groups of legislation that tend to attract the support or opposition of groups of businesses.
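A rough sketch of that PCA-then-clustering idea with base R's prcomp() and kmeans(), on mock data shaped like the question's. Rows are bills and columns are businesses, so the loadings group businesses and the component scores group bills. Three clusters is an arbitrary assumption, and selecting columns by name pattern is just one option (a numeric range such as 125:300 works the same way).

```r
set.seed(42)
# Mock data in the same shape as the question's Data
Data <- data.frame(
  year    = sample(1998:2004, 200, replace = TRUE),
  outcome = sample(0:1, 200, replace = TRUE),
  biz1    = sample(-2:2, 200, replace = TRUE),
  biz2    = sample(-2:2, 200, replace = TRUE),
  biz3    = sample(-2:2, 200, replace = TRUE))

biz_cols <- grep("^biz", names(Data))  # or a numeric range such as 125:300
pca <- prcomp(Data[, biz_cols], center = TRUE, scale. = TRUE)
summary(pca)              # variance explained by each component
round(pca$rotation, 2)    # loadings: businesses that load together tend to take the same side

# Cluster the bills on their first two component scores
cl <- kmeans(pca$x[, 1:2], centers = 3, nstart = 20)
table(cl$cluster)
```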
Another way to approach this would be to run a logistic regression on the outcomes using the support/opposition of the various businesses as predictors. This would tell you which businesses tend to be more influential.
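A minimal sketch of that logistic regression with glm(), on the same kind of mock data. It assumes outcome is still coded 0/1 (undo the -1/1 recoding first if you applied it) and treats each business's -2..2 position as a numeric predictor, which is a simplification.

```r
set.seed(42)
Data <- data.frame(
  year    = sample(1998:2004, 200, replace = TRUE),
  outcome = sample(0:1, 200, replace = TRUE),
  biz1    = sample(-2:2, 200, replace = TRUE),
  biz2    = sample(-2:2, 200, replace = TRUE),
  biz3    = sample(-2:2, 200, replace = TRUE))

biz_cols <- grep("^biz", names(Data), value = TRUE)
# outcome ~ biz1 + biz2 + biz3, built without typing every column name
fit <- glm(reformulate(biz_cols, response = "outcome"),
           data = Data, family = binomial)
summary(fit)$coefficients  # larger |estimate| suggests a more influential business
```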

Ideas for optimization algorithm for Fantasy Football

So, this is a bit different from standard fantasy football. What I have is a list of players, their average "points per game" (PPG) and their salary. I want to maximize points per game under the constraint that my team does not exceed a salary cap. A team consists of 1 QB, 1 TE, 3 WRs, and 2 RBs. So, if we have 15 players at each position, we have 15 x 15 x C(15,3) x C(15,2) = 10,749,375 possible teams.
That is computationally expensive. I can use a bit of branch and bound, i.e. once a team has exceeded the salary cap I can prune the tree, but even with that the algorithm is still pretty slow. I tried another option, a "genetic algorithm": I made 10 random teams, picked the best one and "mutated" it (randomly changing some of the players) into another 10 teams, then picked the best of those, and looped many times until the points per game of the "best team" stopped improving.
There must be a better way to do this. I'm not a computer scientist and I've only taken an intro course in algorithmics. Programmers - what are your thoughts? I have a feeling that some sort of application of dynamic programming could help.
Thanks
I think a genetic algorithm, intelligently implemented, will yield an acceptable result for you. You might want to use a metric like points per salary dollar rather than straight PPG to decide the best team; this way you are inherently measuring value added. You should also consider running the full algorithm/mutation to satisfactory completion numerous times so that you can identify which players consistently show up in the final outcomes. Those players should then be valued above others.
Of course, the problem with the genetic approach is that you need a good mutation algorithm, and how you implement it is very much up to you.
Let i be the number of players considered so far (out of n players) and j the remaining salary. Let m[i, j] be the best total PPG achievable from the first i players with salary j left.
Then m[i, 0] = 0, m[0, j] = 0
and
m[i, j] = m[i - 1, j] if salary for player i is greater than j
else
m[i, j] = max ( m[i - 1, j], m[i - 1, j - salary of player i] + PPG of player i)
Sorry, I don't know R, but I'm good with algorithms, so I hope this helps.
A further optimization you can make is that you really only need 2 rows of m[i, j] because the DP solution only uses the current row and the last row (you can save memory this way)
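The recurrence above is the classic 0/1 knapsack. A direct R translation might look like this, assuming salaries are whole numbers (for example, in thousands). Note that, as written, this recurrence only enforces the salary cap, not the 1 QB / 1 TE / 3 WR / 2 RB roster limits; best_ppg is a hypothetical helper name.

```r
# Hypothetical helper: 0/1 knapsack DP over players (roster limits NOT enforced)
best_ppg <- function(ppg, salary, cap) {
  n <- length(ppg)
  # m[i + 1, j + 1] corresponds to m[i, j] above (R indices start at 1)
  m <- matrix(0, nrow = n + 1, ncol = cap + 1)
  for (i in seq_len(n)) {
    for (j in 0:cap) {
      m[i + 1, j + 1] <- m[i, j + 1]  # leave player i out
      if (salary[i] <= j) {           # or take player i if affordable
        m[i + 1, j + 1] <- max(m[i + 1, j + 1], m[i, j - salary[i] + 1] + ppg[i])
      }
    }
  }
  m[n + 1, cap + 1]
}

best_ppg(ppg = c(10, 7, 8), salary = c(5, 3, 4), cap = 7)
```

Only rows i and i + 1 are read at each step, which is the two-row memory saving mentioned above.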
First of all, I don't think the number of variations you have calculated is right. The best way to build a team is to fill each position up to its limit; there is absolutely no sense in swapping 3 players of the same position among themselves.
Cristiano Ronaldo, Suarez and Messi will give you the same sum of fantasy points in any line-up, like:
Cristiano Ronaldo, Suarez and Messi
or
Suarez, Cristiano Ronaldo and Messi
or
Messi, Suarez, Ronaldo
First step: reduce the number of variations.
Next step: calculate the average price, and build the team one player at a time, adding players with a lower salary but more fantasy points. When you reach the salary limit, remove an expensive player and add a cheaper one with the same fantasy points, and so on. Don't enumerate the variations; value each player by a combination of salary and fantasy points.
Does this help? It sets up the constraints and maximises points.
You could adapt it to get the data out of Excel:
http://pena.lt/y/2014/07/24/mathematically-optimising-fantasy-football-teams
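The linked post is not reproduced here. As one possible way to "set up the constraints and maximise points", here is a sketch of a binary integer program using lpSolve (the package from the first question): one 0/1 variable per player, equality constraints for the number of players per position, and a salary-cap constraint. The player pool and cap are made-up numbers for illustration.

```r
library(lpSolve)

# Hypothetical player pool (made-up numbers)
players <- data.frame(
  name   = paste0("P", 1:10),
  pos    = c("QB", "QB", "TE", "TE", "WR", "WR", "WR", "WR", "RB", "RB"),
  ppg    = c(20, 18, 9, 11, 14, 12, 10, 9, 13, 12),
  salary = c(9, 7, 4, 5, 7, 6, 5, 4, 6, 5))
need <- c(QB = 1, TE = 1, WR = 3, RB = 2)  # roster size per position
cap  <- 40                                 # salary cap

# One row per position: 1 where the player plays that position
pos_rows <- t(sapply(names(need), function(p) as.numeric(players$pos == p)))

sol <- lp(direction    = "max",
          objective.in = players$ppg,
          const.mat    = rbind(pos_rows, players$salary),
          const.dir    = c(rep("==", length(need)), "<="),
          const.rhs    = c(need, cap),
          all.bin      = TRUE)  # each player is either picked (1) or not (0)
players[sol$solution > 0.5, ]
sol$objval
```

The solver handles the branching itself, so the full 10,749,375 teams never have to be enumerated.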
