Picking fair teams - and the math to prove it

Application: similar to picking playground teams.
I must divide a collection of n sequentially ranked elements into two teams of n/2. The teams must be as "even" as possible. Think of "even" in terms of playground teams, as described above. The rankings indicate relative "skill" or value levels. Element #1 is worth 1 "point", element #2 is worth 2, etc. No other constraints.
So if I had a collection [1,2,3,4], I would need two teams of two elements. The possibilities are
[1,2] & [3,4]
[1,3] & [2,4]
[1,4] & [2,3]
(Order is not important.)
Looks like the third option is the best in this case. But how can I best assess larger sets? Average/mean is one approach, but that would result in identical rankings for the following candidate pair which otherwise seem uneven:
[1,2,3,4,13,14,15,16] & [5,6,7,8,9,10,11,12]
I can use brute force to evaluate all candidate solutions for my problem domain.
Is there some mathematical/statistical approach I can use to verify the "evenness" of two teams?
Thanks!

Your second, longer example does not seem uneven (or unfair) to me. In fact, it accords with what you seem to think is the preferred answer for the first example.
Therein lies the non-programming-related nub of your problem. What you have are ordinal numbers and what you want are cardinal numbers. To turn the former into the latter you have to define your own mapping; there is no universal, off-the-shelf approach.
You might, for example, compare each element of the two sets in turn, e.g. a1 vs b1, a2 vs b2, ..., and regard the sets as even enough if the number of cases where a is better than b is about the same as the number of cases where b is better than a.
But for your application, I don't think you will do better than the playground algorithm: the team leaders take turns, each choosing the best unchosen player. Why do you need anything more complicated?
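The element-by-element comparison described above could be sketched roughly as follows. This is only an illustration: the function name `pairwise_wins` is made up here, and it assumes that a lower rank number means a better player and that both teams are compared in sorted order.

```python
def pairwise_wins(team_a, team_b):
    """Count position-by-position wins for each team.

    Assumption (not from the original post): a lower rank number is better.
    """
    a_sorted, b_sorted = sorted(team_a), sorted(team_b)
    a_wins = sum(a < b for a, b in zip(a_sorted, b_sorted))
    b_wins = sum(b < a for a, b in zip(a_sorted, b_sorted))
    return a_wins, b_wins


small = pairwise_wins([1, 4], [2, 3])
large = pairwise_wins([1, 2, 3, 4, 13, 14, 15, 16], [5, 6, 7, 8, 9, 10, 11, 12])
print(small)  # (1, 1)
print(large)  # (4, 4)
```

Under this particular measure both the [1,4] & [2,3] split and the longer example come out level, which is one way to read the remark that the second example does not look uneven.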

The numbers represent rankings? Then no, there is no algorithm to get fair teams, because there's not enough information. It could be that even the match-up
[1] & [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]
is stacked against the large-team. This would be the case, for example, for chess-teams, if the difference between [1] and [2] was large.
Even the matchup you mentioned as being "unfair":
[1,2,3,4,13,14,15,16] & [5,6,7,8,9,10,11,12]
Could be completely fair in a game like baseball. After all, players 13-16 still need to bat!
So, probably the most fair thing to do would be to just pick teams randomly. That would also avoid any form of "gaming" the system (like my friends and I did in gym class in high school :) )

I don't think there's enough information to determine an answer.
What does it really mean for someone to be #1 vs #2? Are they 50% better, or 10% better or 1% better? How much better is #1 vs #5? It's really the algorithm to assign a value that needs to be accurate, and the distribution algorithm needs to reflect this properly.
For example, like I said, if you have Kobe Bryant mixed in with a bunch of high school basketball kids, what would the relative values be? Because in basketball, Kobe Bryant could single-handedly beat all the high school kids. So would his rank be #1, and the rest of the kids be #1000+?
As well, you have to assume that the value determination takes into account the size of a team. Does the team only need 2 players? Or does it need 10? In the latter case, the 2nd team in your second example seems okay, because the top 4 players would be playing with 6 much worse players, which could affect the success.
If all you are doing is distributing values, and if the notion of "fairness" is built into the value system, then the mean values seem to be a fair way to distribute the players.

You need an iterative ranking approach, with automated picking to produce evenly ranked teams on each iteration. This works even when the mix of participants changes to some extent over time. I created a tool to do just this for my 5-a-side group and then opened it up to all comers; you can find it by searching for "Fair Team Picker".

The pattern from above.
Team A: 1, 4, 5, 8, 9, 12, 13, 16
Team B: 2, 3, 6, 7, 10, 11, 14, 15
Snakes through the list using the pattern
A-B-B-A; A-B-B-A; etc.
This selection is pretty easy to code. Split the ordered list of players into consecutive pairs, then reverse every odd-numbered pair (counting the first pair as pair 0).
However, there is a "better" way to make teams using the Thue-Morse sequence. For a more in-depth description of this sequence see: https://www.youtube.com/watch?v=prh72BLNjIk
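Both selections can be sketched in a few lines. This is a minimal illustration, not code from the answer or the video: the function names are invented here, and the Thue-Morse term for position i is taken as the parity of the number of 1 bits in i, which is one standard definition of that sequence.

```python
def snake_teams(players):
    """Split an ordered list using the repeating A-B-B-A pattern."""
    team_a, team_b = [], []
    for i, player in enumerate(players):
        # Positions 0 and 3 of every block of four go to team A.
        (team_a if i % 4 in (0, 3) else team_b).append(player)
    return team_a, team_b


def thue_morse_teams(players):
    """Split an ordered list using the Thue-Morse sequence."""
    team_a, team_b = [], []
    for i, player in enumerate(players):
        # Term i of the sequence is the parity of the count of 1 bits in i.
        (team_a if bin(i).count("1") % 2 == 0 else team_b).append(player)
    return team_a, team_b


players = list(range(1, 17))
snake_a, snake_b = snake_teams(players)
tm_a, tm_b = thue_morse_teams(players)
print(snake_a, snake_b)
print(tm_a, tm_b)
# Sum of squares is a stricter check than the plain sum.
print(sum(p * p for p in snake_a), sum(p * p for p in snake_b))  # 756 740
print(sum(p * p for p in tm_a), sum(p * p for p in tm_b))        # 748 748
```

For 16 players both splits give each team a total of 68, but only the Thue-Morse split also equalizes the sums of squares, which is one sense in which it can be called "better".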

Aren't the teams equal if each "round" of picking is simply done in reverse order of the preceding round? If there are 10 players whose talent is 1-10 and we are creating 5 teams (2 players each), the first round through, the first pick would obviously pick the best player (talent level 10). Then the next pick would be 9, and so on. The 5th pick would get the player with the talent level of 6. In the second round, the pick order is reversed, so the team that just got talent level 6 would pick talent level 5 (the highest left) and so on until the captain who picked first in the 1st round would get the last player (talent level 1). Thus each team has a talent level of 11, with one team having 10 and 1, the next having 9 and 2, and so on. This would work for as many players/teams as there are.
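The reversed-round picking described above might look like this in code. It is a sketch under the answer's own assumptions (higher talent is better, every captain takes the best remaining player); the function name `serpentine_draft` is made up for illustration.

```python
def serpentine_draft(talents, num_teams):
    """Assign players round by round, reversing the pick order each round."""
    ordered = sorted(talents, reverse=True)  # best player is picked first
    teams = [[] for _ in range(num_teams)]
    for pick, talent in enumerate(ordered):
        round_no, slot = divmod(pick, num_teams)
        # Even rounds run first-to-last, odd rounds run last-to-first.
        team = slot if round_no % 2 == 0 else num_teams - 1 - slot
        teams[team].append(talent)
    return teams


teams = serpentine_draft(range(1, 11), 5)
print(teams)                    # [[10, 1], [9, 2], [8, 3], [7, 4], [6, 5]]
print([sum(t) for t in teams])  # [11, 11, 11, 11, 11]
```

The totals come out exactly equal here because there are two rounds. With an odd number of rounds the totals can differ slightly: 6 players split into 2 teams gives 11 against 10.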

The first chooser selects 1. From then on they take turns choosing 2.
Assuming:
An even number of elements to choose
Every chooser gives the same value to each element
The value of the elements is very similar but different
Lower is better
[BEST] The first chooser selects 1. From then on they take turns choosing 2:
16 items average
Team 1: 1, 4, 5, 8, 9, 12, 13, 16 8.5
Team 2: 2, 3, 6, 7, 10, 11, 14, 15 8.5
14 items average
Team 1: 1, 4, 5, 8, 9, 12, 13 7.42857
Team 2: 2, 3, 6, 7, 10, 11, 14 7.57143
Choosing first 1, second 2 and then 1 each:
16 items average
Team 1: 1, 4, 6, 8, 10, 12, 14, 16 8.875
Team 2: 2, 3, 5, 7, 9, 11, 13, 15 8.125
14 items average
Team 1: 1, 4, 6, 8, 10, 12, 14 7.85714
Team 2: 2, 3, 5, 7, 9, 11, 13 7.14286
[WORST] Comparing with selecting 1 each:
16 items average
Team 1: 1, 3, 5, 7, 9, 11, 13, 15 8
Team 2: 2, 4, 6, 8, 10, 12, 14, 16 9
14 items average
Team 1: 1, 3, 5, 7, 9, 11, 13 7
Team 2: 2, 4, 6, 8, 10, 12, 14 8
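The three picking schemes compared above can be replayed with a small simulation. This is a rough sketch: the helper `draft` and its `turn_sizes` argument are names invented here, and it assumes, as the answer does, that each chooser always takes the lowest-numbered items still available.

```python
from itertools import chain, repeat
from statistics import mean


def draft(n, turn_sizes):
    """Two choosers alternate; turn_sizes says how many items each turn takes."""
    teams = ([], [])
    item, chooser = 1, 0
    for size in turn_sizes:
        for _ in range(size):
            if item > n:
                return teams
            teams[chooser].append(item)
            item += 1
        chooser = 1 - chooser
    return teams


best = draft(16, chain([1], repeat(2)))       # 1, then 2 each
middle = draft(16, chain([1, 2], repeat(1)))  # 1, then 2, then 1 each
worst = draft(16, repeat(1))                  # 1 each

for label, (team_1, team_2) in (("best", best), ("middle", middle), ("worst", worst)):
    print(label, team_1, mean(team_1), team_2, mean(team_2))
```

For 16 items this gives averages of 8.5 and 8.5, then 8.875 and 8.125, then 8 and 9, matching the averages quoted for the three schemes.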

Related

Setting up linear program for allocation/assignment problem

I have some trouble with a linear program that I already solved in Excel, but now I want to do it in R/Python because I have already reached the limits of Excel and its solver. Therefore I am asking for help on this specific topic.
I tried it with the lpSolve package, also altering the lp.assign function, but I cannot come up with a solution.
The problem is as follows:
Let's say I am a deliverer of a commodity good.
I have different depots which serve different areas. These areas MUST be served with their demands.
My depots, on the other hand, are constrained by the capacity they can handle and deliver.
One depot can serve several areas, but one area can only be served by one depot.
I have the distance/cost matrix for the connections between depots and areas as well as the demand for that areas.
The objective for this solution should be that the areas should be served with the minimal possible effort.
Let's say the cost/distance matrix looks something like this:
assign.costs <- matrix (c(2, 7, 7, 2, 7, 7, 3, 2, 7, 2, 8, 10, 1, 9, 8, 2,7,8,9,10), 4, 10)
So this creates my matrix, with the customers/areas in the first row (header) and the depots in the first column (row names).
Now the demand of the areas/customers is:
assign.demand <- matrix (c(1,2,3,4,5,6,7,8,9,10), 1, 10)
The capacity restrictions, i.e. the amount each depot is able to serve, are:
assign.capacity <- matrix (c(15,15,15,15), 4, 1)
So now I would like this problem to be solved by an LP to generate the allocation, i.e. which area should be served by which depot according to these restrictions.
The result should look something like this:
assign.solution <- matrix (c(1,0,0,0 ,0,1,0,0, 1,0,0,0, 1,0,0,0 ,0,0,0,1), 4, 10)
As for the restrictions, this means that every column must sum up to one.
I tried it with the lpsolve and lp.assign functions from lpSolve, but I don't know exactly how to implement the kind of restrictions I have, and I already tried to alter the lp.assign function with no success.
If it helps, I can also formulate the equations for the LP.
Thank you all for your help, I am really stuck right now :D
BR
Step 1. Develop a mathematical model
The mathematical model can look like:
The blue entries represent data and the red ones indicate a decision variable. i are the depots and j are the customers. Ship indicates if we ship from i to j (it is a binary variable). The first constraint says that total amount shipped from depot i should not exceed its capacity. The second constraint says that there must be exactly one supplier i for each customer j.
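The figure with the model is not reproduced here. Going by the description above and the code in Step 2, it can plausibly be reconstructed as follows. The symbol names are shorthand chosen for this reconstruction and are not necessarily those of the original figure: $c_{ij}$ is the cost, $d_j$ the demand and $K_i$ the capacity (all data), and $x_{ij}$ stands for the binary Ship variable.

```latex
\begin{align*}
\min \quad & \sum_{i}\sum_{j} c_{ij}\, x_{ij} \\
\text{s.t.} \quad & \sum_{j} d_{j}\, x_{ij} \le K_{i} && \forall i \\
& \sum_{i} x_{ij} = 1 && \forall j \\
& x_{ij} \in \{0,1\}
\end{align*}
```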
Step 2. Implementation
This is now just a question of being precise. I follow the model from the previous section as closely as I can.
library(dplyr)
library(tidyr)
library(ROI)
library(ROI.plugin.symphony)
library(ompr)
library(ompr.roi)
num_depots <- 4
num_cust <- 10
cost <- matrix(c(2, 7, 7, 2, 7, 7, 3, 2, 7, 2, 8, 10, 1, 9, 8, 2,7,8,9,10), num_depots, num_cust)
demand <- c(1,2,3,4,5,6,7,8,9,10)
capacity <- c(15,15,15,15)
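# Build and solve the model: one binary variable per depot/customer pair
m <- MIPModel() %>%
  add_variable(ship[i,j], i=1:num_depots, j=1:num_cust, type="binary") %>%
  # capacity: total demand shipped from depot i must not exceed its capacity
  add_constraint(sum_expr(demand[j]*ship[i,j], j=1:num_cust) <= capacity[i], i=1:num_depots) %>%
  # single sourcing: exactly one depot serves each customer
  add_constraint(sum_expr(ship[i,j], i=1:num_depots) == 1, j=1:num_cust) %>%
  # minimize total shipping cost
  set_objective(sum_expr(cost[i,j]*ship[i,j], i=1:num_depots, j=1:num_cust),"min") %>%
  solve_model(with_ROI(solver = "symphony", verbosity=1))
cat("Status:",solver_status(m),"\n")
cat("Objective:",objective_value(m),"\n")
# Show only the depot/customer pairs that are actually used
get_solution(m,ship[i, j]) %>%
  filter(value > 0)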
We see how important it is to first write down a mathematical model. It is much more compact and easier to reason about than a bunch of code. Going directly to code often leads to all kinds of problems, like building a house without a blueprint. Even for this small example, writing down the mathematical model is a useful exercise.
For the implementation I used OMPR instead of the lpSolve package because OMPR allows me to stay closer to the mathematical model. lpSolve has a matrix interface, which is very difficult to use except for very structured models.
Step 3: Solve it
Status: optimal
Objective: 32
variable i j value
1 ship 1 1 1
2 ship 4 2 1
3 ship 2 3 1
4 ship 1 4 1
5 ship 3 5 1
6 ship 4 6 1
7 ship 4 7 1
8 ship 2 8 1
9 ship 1 9 1
10 ship 3 10 1
I believe this is the correct solution.

How to find the averages of any consecutive numbers in a sequence?

This is a bit of a math question, but I post it here too because there's a direct practical purpose and it's related to creating a faster algorithm. I want to identify users that use my app on a weekly basis. For each user I can generate a sequence of times of their interactions, and from that I can generate a sequence of the length of time between each interaction.
So given this sequence of lengths of time, how can I find sections of consecutive numbers that have an average of 7 days or less?
As an example, if I had the following sequence: [1, 11, 1, 8, 12]
[1, 11, 1, 8, 12] would be a valid stretch of numbers with an average of 7 or less, but [11, 1, 8, 12] would not be valid. [1, 8, 12] would again be valid.
Ideally, my output for every valid section would be the starting position of the first item and the length of the section. So [1, 11, 1, 8, 12] would be described as [1, 5] and [1, 8, 12] would be described as [3, 3].
There is a brute force, computational approach where I take every item in the sequence as a start point, and calculate the averages of every possible length of following numbers up until the end of the sequence. The number of calculations grows quickly though at a rate of n(n+1)/2 (Imagine for each given sequence of length N finding consecutive sequences of length N, N-1, N-2 etc.)
I ask broadly if there's a more elegant approach that doesn't require a quadratically growing number of individual calculations for means.
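One common way to make each mean cheap is a prefix-sum table, sketched below. This is only an illustration and does not remove the quadratic growth in the number of candidate sections: listing every valid section is quadratic in the worst case because the output itself can contain that many entries. It does turn each average into a single subtraction and comparison. The function name `valid_stretches` and the `limit` parameter are invented for this sketch.

```python
from itertools import accumulate


def valid_stretches(gaps, limit=7):
    """Return (start, length) pairs, with 1-based start, whose mean is <= limit."""
    # prefix[k] is the sum of the first k gaps, so any window sum is one subtraction.
    prefix = [0] + list(accumulate(gaps))
    found = []
    for start in range(len(gaps)):
        for end in range(start + 1, len(gaps) + 1):
            length = end - start
            # mean <= limit is the same as window sum <= limit * length (no division).
            if prefix[end] - prefix[start] <= limit * length:
                found.append((start + 1, length))
    return found


stretches = valid_stretches([1, 11, 1, 8, 12])
print(stretches)
```

For the example sequence the result includes (1, 5) and (3, 3) and excludes (2, 4), in line with the sections discussed in the question.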

R seq function between item 1 and 2, then between 2 and 3 of a vector

I have a vector c(5, 10, 15) and would like to use something like the seq function to create a new vector: 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15. This is how I would do it now, but it seems inelegant at best. In the final (functional) form, I would need to increment by any given number, not necessarily units of 1.
original_vec <- c(5, 10, 15)
new_vec <- unique(c(seq(original_vec[1],original_vec[2],1),seq(original_vec[2],original_vec[3],1)))
> new_vec
[1] 5 6 7 8 9 10 11 12 13 14 15
Is there a way (I'm sure there is!) to use an apply or similar function to apply a sequence across multiple items in a vector, also without repeating the number in the middle (in the case above, 10 would be repeated, if not for the unique function call)?
Edit: Some other possible scenarios might include changing c(1,5,7,10,12) to 1,1.5,2,2.5 ... 10, 10.5, 11, 11.5, 12, or c(1,7,4) where the price increases and then decreases by an interval.
The answer may be totally obvious and I just can't quite figure it out. I have looked at manuals and searched for the answer already. Thank you!
While this isn't the answer to my original question, after discussing with my colleague, we don't have cases where seq(min(original_vec), max(original_vec), by=0.5), wouldn't work, so that's the simplest answer.
However, a more generalized answer might be:
interval = 1
seq(original_vec[1], original_vec[length(original_vec)], by = interval)
Edit: Just thought I'd go ahead and include the finished product, which includes the seq value in a larger context and work for increasing values AND for cases where values change direction. The use case is the linear interpolation of utilities, given original prices and utilities.
orig_price <- c(2,4,6)
orig_utils <- c(2,1,-3)
utility.expansion = function(x, y, by=1){
  #x = original price, y = original utilities
  require(zoo)
  new_price <- seq(x[1],x[length(x)],by)
  temp_ind <- new_price %in% x
  new_utils <- rep(NA,length(new_price))
  new_utils[temp_ind] <- y
  new_utils <- na.approx(new_utils)
  return(list("new price"=new_price,"new utilities"=new_utils))
}

Figuring out how to work out n-tuple questions?

Just as a side note, this isn't asking someone to help with homework; it's just an example question in my lecture notes that I'm not grasping, and I would greatly appreciate it if someone could explain to me how exactly to work it out.
It says simply this:
Let G = {3,5,7}. Write down some examples of 4-tuples.
Thank you to anyone who tries to help, this is mathematics to understand a systems unit :)
Your collection G is a set, which is unordered and non-repeating, containing three elements. You want some 4-tuples, which is an ordered collection of possibly-repeating elements, and there must be 4 elements.
We show tuples by using parentheses around the collection, while a set like G is written using braces (curly brackets). Some examples of 4-tuples using the elements of G are
(3, 3, 3, 3)
(3, 3, 3, 5)
(3, 3, 3, 7)
(3, 3, 5, 3)
...
(7, 7, 7, 5)
(7, 7, 7, 7)
That list of mine was in a particular order, called the lexicographic order. Since there are 4 elements and each element has 3 choices regardless of the other choices, the total number of 4-tuples is
3x3x3x3 = 81
As another answer implied, your question is somewhat ambiguous. I assumed that each 4-tuple was to have elements taken from your set G, but your question did not actually say that. It does seem to be implied, however.
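The list and the count can be checked with a short sketch, assuming as this answer does that every element of each 4-tuple is drawn from G. `itertools.product` yields tuples in lexicographic order when its input is sorted.

```python
from itertools import product

G = [3, 5, 7]  # kept as a sorted list so the tuples come out in lexicographic order

four_tuples = list(product(G, repeat=4))

print(four_tuples[:4])   # [(3, 3, 3, 3), (3, 3, 3, 5), (3, 3, 3, 7), (3, 3, 5, 3)]
print(four_tuples[-2:])  # [(7, 7, 7, 5), (7, 7, 7, 7)]
print(len(four_tuples))  # 81
```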
A 4-tuple, in math, is any vector populated with 4 objects. An example of a 4-tuple is (1, 5, 3, 7).
Without more context, I can't use the fact that G = {3, 5, 7}.

Absolute difference between a vector and a number in R

I'm new to R and I would be very grateful for an answer to my question:
I've got a vector: c(9, 11, 2, 6, 10) and the number 4 (or a vector c(4))
I want to generate a vector with the absolute difference between the first and the second one, which should look like this: c(5, 7, 2, 2, 6)
How do I do this? I can't get it to work with diff(), even after reading through the help (?diff()).
Any help is appreciated :)
x <- c(9, 11, 2, 6, 10)
abs(x - 4)
#[1] 5 7 2 2 6
abs finds the absolute value of each element of a vector. The 4 is recycled when subtracted from x. If you subtract a vector of several values, it is also recycled, with a warning if the length of x is not a multiple of its length.
You ran into problems with diff because it isn't designed for scalar subtraction (what you are attempting). It is better suited to finding the difference within a vector.
