R function for weighting teams by strength of opponent?

I'm analyzing some sports data, and I have a set of win/loss records for about 40 teams. I would like to come up with a ranking where each win is weighted by the strength of the opponent. This would have to be some iterative/recursive sort of thing where the weights and ranks are updated on each iteration until convergence. Does anyone know if there is an existing function or package for doing this sort of thing? My guess is that it wouldn't be a sports-specific package, as I imagine this sort of thing is common across a lot of fields.
EDIT:
Here's some example data. There are 5 teams, A, B, C, D, and E, and each played every other team once, resulting in 10 unique games. The data are doubled so that each team's four games are listed as their own rows, with the column "a.win" indicating whether "team.a" won the game (1 = Yes).
dat <- data.frame(
  team.a = c("A","A","A","A","B","B","B","B","C","C","C","C","D","D","D","D","E","E","E","E"),
  team.b = c("B","C","D","E","A","C","D","E","A","B","D","E","A","B","C","E","A","B","C","D"),
  a.win  = c(1,1,0,1,0,0,1,0,0,1,1,0,1,0,0,1,0,1,1,0))
From these data, team A won 3/4, B won 1/4, and C, D, and E each won 2/4. But team D beat A, whereas C and E both lost to A, so intuitively D should be ranked slightly higher than C and E since one of its wins came against the highest-rated opponent. Similarly, team D lost to team B (the only team with just one win), so that loss should count against it more than a loss to a stronger team would.
I'm trying to figure out how best to assign ranks (e.g., from -1 to 1, or based on probability of winning, or number of losses, etc.), and then how best to re-weight each team based not just on its number of wins/losses, but on the rank of the opponents it defeated.

Try the PlayerRatings package.
http://cran.r-project.org/web/packages/PlayerRatings/index.html
It implements the Elo and Glicko ratings used in Chess, but it can be extended to other sports as well. The package also contains functions for updating the ratings of players based on the previous rating and game outcomes. This is a basic starting point, which you will have to build on depending on your situation.
http://en.wikipedia.org/wiki/Elo_rating_system#Elo_ratings_beyond_chess
I don't think there will be a tailored solution for what you want to do, since how you go about ratings will depend on the specifics of your scenario.
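For instance, here is a minimal sketch on the example data above (assuming the PlayerRatings conventions: elo() takes a data frame of time period, player one, player two, and the result from player one's perspective, with each game listed once):
library(PlayerRatings)
# the 10 unique games from the example; result = 1 if team.a won
games <- data.frame(
  week   = 1:10,
  team.a = c("A","A","A","A","B","B","B","C","C","D"),
  team.b = c("B","C","D","E","C","D","E","D","E","E"),
  result = c(1, 1, 0, 1, 0, 1, 0, 1, 0, 1))
ratings <- elo(games)
ratings  # a rating object; printing lists the teams ranked by Elo rating
Since Elo updates sequentially, you can also feed the returned ratings back in through the status argument and re-run the schedule until the ranking stabilises, which matches the iterate-until-convergence idea in the question.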

Is there a way to generate data in R where the sum of the observations adds up to a specific value?

I'm looking for a way to generate different data frames where a variable is distributed randomly among a set number of observations, but where the sum of those values adds up to a predetermined total. More specifically, I'm looking for a way to distribute 20,000,000 votes randomly among 15 political parties. I've looked around the forums a bit but can't seem to find an answer, and while trying to generate the data on my own I've gotten nowhere; I don't even know where to begin. The distribution itself does not matter, though I'd love to be able to influence the way it distributes the votes.
Thank you :)
You could make a vector of 20,000,000 samples of the numbers 1 through 15 then make a table from them, but this seems rather computationally expensive, and will result in an unrealistically even split of votes. Instead, you could normalise the cumulative sum of 15 numbers drawn from a uniform distribution and multiply by 20 million. This will give a more realistic spread of votes, with some parties having significantly more votes than others.
my_sample <- cumsum(runif(15))
my_sample <- c(0, my_sample/max(my_sample))
# round the running totals before differencing so the parts sum to exactly 20,000,000
votes <- diff(round(my_sample * 20000000))
votes
#> [1] 725623 2052337 1753844 61946 1173750 1984897
#> [7] 554969 1280220 1381259 1311762 766969 2055094
#> [13] 1779572 2293662 824096
These will add up to 20,000,000:
sum(votes)
#> [1] 2e+07
And we can see quite a "natural looking" spread of votes.
barplot(setNames(votes, letters[1:15]), xlab = "party")
I'm guessing that if you substitute rexp for runif in the above solution, the result would more closely match actual voting numbers in real life, with a small number of high-vote parties and a large number of low-vote parties.
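For example (the same recipe, just swapping the distribution):
my_sample <- cumsum(rexp(15))
my_sample <- c(0, my_sample/max(my_sample))
votes <- diff(round(my_sample * 20000000))  # heavier tail: a few big parties, many small ones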

How does one approach this challenge asked in an Amazon Interview?

I am struggling to optimise my solution to this past Amazon interview question involving a DAG.
This is what I tried (the code is long, so I would rather explain it):
Basically, since the graph is a DAG and the relation is transitive, a simple traversal from every node should be enough.
So from every node I traverse, by transitivity, through all the reachable possibilities to get the end vertices, and then compare these end vertices to find the most noisy person.
In doing so I have actually found the (maybe unique) most noisy person for all the vertices visited during the traversal. So I memoize all of this in a mapping and mark those vertices as visited.
So I am basically maintaining an adjacency list for the graph, a visited/not-visited mapping, and a mapping for the output (the most noisy person for every vertex).
This way, by the time I get a query I do not have to recompute anything (in case of duplicate queries).
The above code works, but since I cannot test it against the official test cases it may or may not pass the time limit. Is there a faster solution (maybe using DP)? I feel I am not exploiting the transitive and anti-symmetric conditions enough.
Obviously I am not checking the cases where a person is less wealthy than the current person. But, for instance, if I have pairs like (1,2), (1,3), (1,4), etc., and maybe (2,6), (2,7), (7,8), etc., then to find a person more wealthy than 1 I have to traverse through every neighbour of 1, and then through every neighbour of every neighbour as well, I guess. This is done only once, as I store the results.
Edit (added question text):
Rounaq is graduating this year. And he is going to be rich. Very rich. So rich that he has decided to have a structured way to measure his richness. Hence he goes around town asking people about their wealth, and notes down that information.
Rounaq notes down the pair (Xi, Yi) if person Xi has more wealth than person Yi. He also notes down the degree of quietness, Ki, of each person. Rounaq believes that noisy persons are a nuisance. Hence, for each of his friends Ai, he wants to determine the most noisy (least quiet) person among those who have more wealth than Ai.
Note that "has more wealth than" is a transitive and anti-symmetric relation: if a has more wealth than b, and b has more wealth than c, then a has more wealth than c. Moreover, if a has more wealth than b, then b cannot have more wealth than a.
Your task in this problem is to help Rounaq determine the most noisy person among the people having more wealth for each of his friends Ai, given the information Rounaq has collected from the town.
Input
First line contains T: the number of test cases.
Each test case has the following format:
N
K1 K2 K3 ... KN
M
X1 Y1
X2 Y2
...
XM YM
Q
A1
A2
...
AQ
N: the number of people in town
M: the number of pairs for which Rounaq has been able to obtain the wealth information
Q: the number of Rounaq's friends
Ki: the degree of quietness of person i
Xi, Yi: the pairs Rounaq has noted down (pairs of distinct values)
Ai: Rounaq's ith friend
For each of Rounaq's friends, print a single integer: the degree of quietness of the most noisy person as required, or -1 if there is no wealthier person for that friend.
Perform a topological sort on the pairs (X, Y). Then iterate from the most wealthy down to the least wealthy, storing the most noisy person seen so far:
less wealthy -> most wealthy
<- person with lowest K so far <-
Then, for each query, binary search for the first person with greater wealth than the friend. The value stored there is the most noisy person with greater wealth than the friend.
UPDATE
It seems that we cannot rely on the data allowing for a complete topological sort. In that case, traverse the sections of the graph that lead from known greatest to least wealth, storing for each person visited the most noisy person seen so far. The example you provided might look something like:
  3 - 5
 /    |
1 - 2 |
   /  |
  4 ---
Traversals:
1 <- 3 <- 5
1 <- 2
4 <- 2
4 <- 5
(Input: the pairs Xi Yi, then the quietness values K1 ... K5)
2 1
2 4
3 1
5 3
5 4
8 2 16 26 16
(Queries and solution)
3 4 3 5 5
16 2 16 -1 -1
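For what it's worth, here is a minimal R sketch of that idea (my own illustration, not code from the thread): a memoized depth-first search, where adj[[y]] lists the people known to be directly wealthier than y, and memo[v] caches the minimum quietness among everyone wealthier than v.
most_noisy <- function(n, K, pairs, queries) {
  adj <- vector("list", n)
  for (r in seq_len(nrow(pairs))) {
    x <- pairs[r, 1]; y <- pairs[r, 2]  # x has more wealth than y
    adj[[y]] <- c(adj[[y]], x)
  }
  memo <- rep(NA_real_, n)  # min K among all wealthier people; Inf if none
  dfs <- function(v) {
    if (!is.na(memo[v])) return(memo[v])
    best <- Inf
    for (w in adj[[v]]) best <- min(best, K[w], dfs(w))
    memo[v] <<- best
    best
  }
  sapply(queries, function(a) { m <- dfs(a); if (is.finite(m)) m else -1 })
}
K <- c(8, 2, 16, 26, 16)
pairs <- rbind(c(2,1), c(2,4), c(3,1), c(5,3), c(5,4))
most_noisy(5, K, pairs, c(3, 4, 3, 5, 5))
#> [1] 16  2 16 -1 -1
Because every vertex's answer is cached after its first visit, the total work over all queries is linear in the size of the graph.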

R Optimisation - Integer Programming

I have tried to use the R package lpSolve, and in particular the lp.transport function, to solve an optimisation problem. In my fictitious example below I have 5 office sites that I need to resource with a minimum number of employees, and I have set up a cost matrix that gives the distance from each employee's home to each office. I want to minimise the total distance travelled to work whilst meeting the minimum number of employees per office.
Initially this was working, as I was treating all employees as equal (1). However, problems started to occur when I rate each employee by how efficient they are. For example, I now want to say that OfficeX needs the equivalent of 2 engineers, which might be made up of 4 engineers who are 50% efficient or 1 who is 200% efficient. When I do this, however, the solution found will split an employee across a number of offices; what I need is an additional constraint to impose that an employee can only be at one office.
Anyway, hopefully that is enough background; here is my example:
Employee <- c("Jim","John","Jonah","James","Jeremy","Jorge")
Office1 <- c(2.58321505105556, 5.13811249390279, 2.75943834864996,
6.73543614029559, 6.23080251653027, 9.00620341764497)
Office2 <- c(24.1757667923894, 19.9990724784926, 24.3538456922105,
27.9532073293925, 26.3310994833106, 14.6856664813007)
Office3 <- c(38.6957155251069, 37.9074293509861, 38.8271000719858,
40.3882569566947, 42.6658938732098, 34.2011184027657)
Office4 <- c(28.8754359274453, 30.396841941228, 28.9595182970988,
29.2042274337124, 33.3933900645023, 28.6340025144932)
Office5 <- c(49.8854888720157, 51.9164328512659, 49.948290261029,
49.4793138594302, 54.4908258333456, 50.1487397648236)
#create CostMatrix
costMat <- data.frame(Employee, Office1, Office2, Office3, Office4, Office5)
#efficiency is the worth of each employee, e.g. 1 means working at 100%.
#So if for example I wanted 5 employees working in an office
#then I could choose 5 at 100% or 10 working at 50%, etc.
efficiency<-c(0.8416298, 0.8207991, 0.7129663, 1.1406839, 1.3868177, 1.1989748)
#Uncomment next line to see the working version based on headcount
#efficiency<-c(1,1,1,1,1,1)
#Minimum is the minimum number of Employees we want in each office
minimum<-c(1, 1, 2, 1, 1)
#solve problem
opSol <- lp.transport(cost.mat = as.matrix(costMat[,-1]),
direction = "min",
col.signs = rep(">=",length(minimum)),
col.rhs = minimum,
row.signs = rep("==", length(efficiency)),
row.rhs = efficiency,
integers=NULL)
#view solution
opSol$solution
# My issue is that one employee is being spread across multiple offices;
# what I really want is an extra constraint saying that each row can
# contain only one non-zero value.
I think this is no longer a transportation problem. However, you can still solve it as a MIP model:
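The answer stops short of writing the model out, so here is one way such a MIP might look with the lpSolve package (a sketch of my own, not necessarily the formulation the answerer had in mind): binary variables x[i, j] say whether employee i sits at office j. Note that once employees can no longer be split, the example minimums may simply be unattainable, so check the return status.
library(lpSolve)
nE <- length(efficiency)  # employees
nO <- length(minimum)     # offices
# objective: total distance; variables ordered employee-major: x[1,1..5], x[2,1..5], ...
obj <- as.vector(t(as.matrix(costMat[, -1])))
# each employee sits at exactly one office
emp_con <- t(sapply(seq_len(nE), function(i) {
  z <- matrix(0, nE, nO); z[i, ] <- 1; as.vector(t(z)) }))
# each office must reach its minimum total efficiency
off_con <- t(sapply(seq_len(nO), function(j) {
  z <- matrix(0, nE, nO); z[, j] <- efficiency; as.vector(t(z)) }))
res <- lp(direction = "min", objective.in = obj,
          const.mat = rbind(emp_con, off_con),
          const.dir = c(rep("==", nE), rep(">=", nO)),
          const.rhs = c(rep(1, nE), minimum),
          all.bin = TRUE)
res$status  # 0 = optimum found; 2 = infeasible (whole employees cannot cover the minimums)
matrix(res$solution, nE, nO, byrow = TRUE)  # rows = employees, cols = offices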

Need a solution for designing my database that has some potential permutation complexity?

I am building a website where I need to make sure that the number of "coins" and the number of "users" won't kill the database if they increase too quickly. I first posted this on Mathematica (thinking it's a maths website, but found out it's not). If this is the wrong place, please let me know and I'll move it accordingly. However, it does boil down to solving a complex problem: will my database explode if the users increase too quickly?
Here's the problem:
I am trying to confirm whether the following equations would work for my problem. The problem is that I have USERS (U) and I have COINS (C).
There are millions of different coins.
One user may have the same coin another user has. (i.e. both users have coin A)
Users can trade coins with each other. (i.e. Trade coin A for coin B)
Each user can trade any coin with another coin, so long as:
they don't trade a coin for the same coin (i.e. can't trade coin A for another coin A)
they can't trade with themselves (i.e. I can't offer my own Coin A for my own Coin B)
So, effectively, the rows stored in the DB look like this:
trade_id | user_id | offer_id | want_id
1 | 1 | A | B
2 | 2 | B | C
So in the above data structure, user 1 offers coin A for coin B, and user 2 offers coin B for coin C. This is how I propose to store the data, and I need to know: if I get 1000 users, and each of them has 15 coins, how many relationships will get built in this table if each user offers each coin to another user? Will it explode exponentially? Will it be scalable?
In the case of 2 users with 2 coins, you'd have user 1 being able to trade his two coins with the other user's two coins, and vice versa. That makes 4 total possible trade relationships that can be set up. Keep in mind, however, that if user 1 offers A for B, user 2 can't offer B for A (because that relationship already exists).
What would the equation be to figure out how many TRADES can happen with U users and C coins?
Currently I have two candidate equations, but neither seems to be 100% right. The two I have so far:
U! x C!
C x C x (U-1) x U
(where C = coins, and U = users);
Any thoughts on getting a more exact equation? How can I know, without a shadow of a doubt, that if we scale to 1000 users with 10 coins each, this table won't explode into millions of records?
If we just think about how many users can trade with other users, we can make a table of the allowable combinations:
                user 1
         1 | 2 | 3 | 4 | 5 | 6 | ...
        ________________________________
       1 | N | Y | Y | Y | Y | Y | ...
user 2 2 | Y | N | Y | Y | Y | Y | ...
       3 | Y | Y | N | Y | Y | Y | ...
The total number of entries in the table is U * U, and there are U N's down the diagonal.
There are two possibilities depending on whether order matters: is trade(user_A, user_B) the same as trade(user_B, user_A) or not? If order matters, the number of possible trades is the number of Y's in the table, which is U * U - U, or (U-1) * U. If order is irrelevant, then it's half that number, (U-1) * U / 2, which gives the triangular numbers. Let's assume order is irrelevant.
Now, if we have two users, the situation with coins is similar. Order does matter here, so there are C * (C-1) possible trades between the two users.
Finally multiply the two together (U-1) * U * C * (C-1) / 2.
The good thing is that this is a polynomial, roughly U^2 * C^2, so it will not grow too quickly. The thing to watch out for is exponential growth, like calculating moves in chess. You're well clear of that.
One of the possibilities in your question had U!, which is the number of ways to arrange U distinct objects into a sequence. That would grow even faster than exponentially.
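As a quick sanity check in R, plugging in the numbers from the question (1000 users with 10 coins each):
U <- 1000; C <- 10
(U - 1) * U * C * (C - 1) / 2  # unordered user pairs, ordered coin pairs
#> [1] 44955000
So roughly 45 million rows at the absolute worst case, and far fewer in practice.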
There are U possible users and there are C possible coins.
Hence there are OWNS = CxU possible "coins owned by an individual".
Hence there are also OWNS "possible offerings for a trade".
But a trade is a pair of two such offerings, restricted by the rule that the two persons acting as offerer cannot be the same, and neither can the offered coin be the same. So the number of candidates for completing a "possible offering" to form a "complete trade" is (C-1)x(U-1).
The number of possible ordered pairs that form a "full-blown trade" is thus
CxUx(C-1)x(U-1)
And then this is still to be divided by two because of the permutation issue (trades are a set of two (person,coin) pairs, not an ordered pair).
But please note that this sort of question is actually an extremely silly one to worry about in the world of "real" database design!
I need to know that if I get 1,000 users, and each of them have 15 coins, how many relationships will get built in this table if each user offers each coin to another user.
The most that can happen is all 1,000 users each trade all of their 15 coins, for 7,500 trades. This is 15,000 coins up for trade (1,000 users x 15 coins). Since it takes at least 2 coins to trade, you divide 15,000 by 2 to get the maximum number of trades, 7,500.
Your trade table is basically a Cartesian product of the number of users times the number of coins, divided by 2.
(U x C) / 2
I'm assuming users aren't trading for the sake of trading. That they want particular coins and once they get the coins, won't trade again.
Also, most relational databases can handle millions and even billions of rows in a table.
Just make sure you have an index on trade_id, and one on user_id plus trade_id, in your Trade table.
The way I understand this is that you are designing an offer table, i.e. user A may offer coin a in exchange for coin b, but not to a specific user; any other user may take the offer. If this is the case, the maximum number of offers is proportional to the number of users U and the square of the number of coins C.
The maximum number of possible trades (disregarding direction) is
C(C-1)/2.
Every user can offer all the possible trades, as long as every user is offering the trades in the same direction, without any trade being matched. So the absolute maximum number of records in the offer table is
C(C-1)/2*U.
If trades are allowed between more than two users, though, the number decreases to just over half of that. E.g. if A offers a for b, B offers b for c, and C offers c for a, then a trade could be accomplished in a triangle by A getting b from B, B getting c from C, and C getting a from A.
The maximum number of rows in the table can then be calculated by splitting the C coins into two groups and offering any coin in the first group in exchange for any coin in the second. We get the maximum number of combinations if the groups are of the same size, C/2. The number of combinations is
C/2*C/2 = C^2/4.
Every user may offer all these trades without there being any possible trade. So the maximum number of rows is
C^2/4*U
which is just over half of
C(C-1)/2*U = 2*(C^2/4*U) - C/2*U.
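For concreteness, with U = 1000 users and C = 10 coins, the two ceilings work out to:
U <- 1000; C <- 10
C * (C - 1) / 2 * U  # maximum offers if only pairwise matches must be avoided
#> [1] 45000
C^2 / 4 * U          # maximum offers if circular (multi-user) trades must be avoided too
#> [1] 25000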

Ideas for optimization algorithm for Fantasy Football

So, this is a bit different from standard fantasy football. What I have is a list of players, their average "points per game" (PPG) and their salary. I want to maximize points per game under the constraint that my team does not exceed a salary cap. A team consists of 1 QB, 1 TE, 3 WRs, and 2 RBs. So, if we have 15 players at each position, we have 15 x 15 x C(15,3) x C(15,2) = 10,749,375 possible teams.
Pretty computationally complex. I can use a bit of branch and bound, i.e. once a team has surpassed the salary cap I can prune that branch of the tree, but even with that the algorithm is still pretty slow. I tried another option where I used a genetic algorithm, i.e. made 10 random teams, picked the best one and "mutated" it (randomly changing some of the players) into another 10 teams, then picked the best of those, and looped through a bunch of times until the points per game of the "best team" stopped improving.
There must be a better way to do this. I'm not a computer scientist and I've only taken an intro course in algorithmics. Programmers - what are your thoughts? I have a feeling that some sort of application of dynamic programming could help.
Thanks
I think a genetic algorithm, intelligently implemented, will yield an acceptable result for you. You might want to use a metric like points per salary dollar rather than straight PPG to decide the best team; this way you are inherently measuring value added. Also, you should consider running the full algorithm/mutation to satisfactory completion numerous times so that you can identify which players consistently show up in the final outcomes. Those players should then be valued above others.
Of course, the problem with the genetic approach is that you need a good mutation algorithm, and that is highly dependent on how you want to implement it.
Take i to be the number of players considered so far out of n players, and j to be the remaining salary. Take m[i, j] to be the best total PPG achievable using only the first i players with salary budget j.
Then m[i, 0] = 0 and m[0, j] = 0,
and
m[i, j] = m[i - 1, j], if the salary of player i is greater than j;
otherwise
m[i, j] = max( m[i - 1, j], m[i - 1, j - salary of player i] + PPG of player i )
Sorry that I don't know R but I'm good with algorithms so I hope this helps.
A further optimization you can make is that you really only need 2 rows of m[i, j], because the DP solution only uses the current row and the previous row; you can save memory this way, as in the sketch below.
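Here is that recurrence in R (my own sketch, not the answerer's code; it ignores the position quotas, as the answer does, and assumes integer salaries), collapsed to the single reusable row just mentioned:
best_ppg <- function(salary, ppg, cap) {
  m <- numeric(cap + 1)  # m[j + 1] = best PPG achievable with budget j
  for (i in seq_along(salary)) {
    if (salary[i] > cap) next
    for (j in cap:salary[i]) {  # budgets descending, so player i is used at most once
      cand <- m[j - salary[i] + 1] + ppg[i]
      if (cand > m[j + 1]) m[j + 1] <- cand
    }
  }
  m[cap + 1]
}
best_ppg(salary = c(4, 3, 5, 2), ppg = c(10, 7, 12, 4), cap = 7)
#> [1] 17
# (picks players 1 and 2: total salary 7, for 10 + 7 points)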
First of all, the number of variations you have provided is not right. You should limit positions to the required counts, and there is absolutely no sense in permuting 3 players of the same position among themselves.
Cristiano Ronaldo, Suarez and Messi will give you the same sum of fantasy points in any line-up, like:
Cristiano Ronaldo, Suarez and Messi
or
Suarez, Cristiano Ronaldo and Messi
or
Messi, Suarez, Ronaldo
First step: cut down the number of variations.
Next step: calculate the average price, and build the team one player at a time, adding players with a lower salary but higher points. When you reach the salary limit, remove an expensive player and add a cheaper one with similar fantasy points, and so on. Don't enumerate the variations; value the weight of each player by a combination of salary and fantasy points.
Does this help? It sets up the constraints and maximises points. You could adapt it to get data out of Excel:
http://pena.lt/y/2014/07/24/mathematically-optimising-fantasy-football-teams
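In the same spirit as that post, here is a small self-contained sketch with lpSolve (the players, points and salaries below are invented purely for illustration): binary variables pick exactly 1 QB, 1 TE, 3 WRs and 2 RBs while keeping total salary under the cap.
library(lpSolve)
# hypothetical player pool: positions, points per game, salaries (in $1000s)
pos    <- c("QB","QB","TE","TE","WR","WR","WR","WR","RB","RB","RB")
ppg    <- c(18, 16,  9,  8, 12, 11, 10,  9, 14, 13, 11)
salary <- c(30, 25, 12, 10, 20, 18, 15, 12, 24, 20, 16)
cap    <- 120
need   <- c(QB = 1, TE = 1, WR = 3, RB = 2)
# one constraint row per position (exact counts) plus one row for the salary cap
A <- rbind(t(sapply(names(need), function(p) as.numeric(pos == p))), salary)
sol <- lp(direction = "max", objective.in = ppg,
          const.mat = A,
          const.dir = c(rep("==", length(need)), "<="),
          const.rhs = c(need, cap),
          all.bin = TRUE)
which(sol$solution == 1)  # indices of the chosen players
Unlike the genetic approach, this finds the provably optimal team, and at this scale the solver returns instantly.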
