Calculate how much a point is worth based on played games - math

I need to work out how much a point is worth based on the number of games played.
A team gets 3 points for a win, 1 point for a tie and 0 points for a loss.
The problem is the following:
Team 1
Wins:8 Tie:2 Loss:3 Points:26 Played Games: 13
Team 2
Wins:8 Tie:3 Loss:4 Points:27 Played Games: 15
Here you can see that Team 2 has 1 more point than Team 1. But Team 2 has played 2 more matches and has a lower win percentage than Team 1. If you ranked these two by points, Team 2 would get a higher "rating" than Team 1.
So how should the math look to make this fair, so that Team 1 gets a better score than Team 2?

Just divide by the number of games to get the average points per game played.
Team1: 2.0 ppg
Team2: 1.8 ppg

Okay, first of all, thanks for the help.
The solution I ended up with is the following:
p/pg * p = Real points
p = Sum(points),
pg = Played games
So for the example above the real points are:
Team 1: 52
Team 2: 48.6
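
Both scores discussed above can be reproduced in a few lines of R. This is only a sketch: the data frame `teams` and its column names (`ppg`, `real_points`) are made up here for illustration, and the numbers are the two teams from the question.

```r
# The two teams from the question
teams <- data.frame(
  team   = c("Team 1", "Team 2"),
  wins   = c(8, 8),
  ties   = c(2, 3),
  losses = c(3, 4)
)

# 3 points per win, 1 per tie, 0 per loss
teams$points <- 3 * teams$wins + 1 * teams$ties
teams$played <- teams$wins + teams$ties + teams$losses

# Points per game (first answer) and p/pg * p (second answer)
teams$ppg <- teams$points / teams$played
teams$real_points <- teams$ppg * teams$points

# Rank by points per game, best first
teams[order(-teams$ppg), ]
```

Either score puts Team 1 ahead of Team 2, which is the ordering the question asks for.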

Related

find every combination of elements in a column of a dataframe, which add up to a given sum in R

I'm trying to make my life easier by writing a menu creator that builds a weekly menu from a list of my favourite dishes, so that I get a bit more variety.
I gave every dish a value for how many days it approximately lasts and tried to arrange the dishes into menus worth 7 days of food.
I've already tried solutions for knapsack functions from here, including dynamic programming, but I'm not experienced enough to adapt them. The problem is that all of these solutions return only the single most efficient option, not every combination that fills the knapsack.
library(adagio)
#create some data
dish <-c('Schnitzel','Burger','Steak','Salad','Falafel','Salmon','Mashed potatoes','MacnCheese','Hot Dogs')
days_the_food_lasts <- c(2,2,1,1,3,1,2,2,4)
price_of_the_food <- c(20,20,40,10,15,18,10,15,15)
data <- data.frame(dish,days_the_food_lasts,price_of_the_food)
#give each dish a distinct id
data$rownumber <- (1:nrow(data))
#set limit for how many days should be covered with the dishes
food_needed_for_days <- 7
#knapsack function of the adagio library as an example, but all other solutions I found to the knapsack problem behaved the same
most_expensive_food <- knapsack(days_the_food_lasts,price_of_the_food,food_needed_for_days)
data[data$rownumber %in% most_expensive_food$indices, ]
#output
  dish       days_the_food_lasts  price_of_the_food  rownumber
1 Schnitzel  2                    20                 1
2 Burger     2                    20                 2
3 Steak      1                    40                 3
4 Salad      1                    10                 4
6 Salmon     1                    18                 6
Simplified:
I need a solution to a single objective single Knapsack problem, which returns all possible combinations of dishes which add up to 7 days of food.
Thank you very much in advance
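
One way to get every combination rather than only the best one is plain exhaustive enumeration with base R's `combn`. The sketch below is not from the question: `all_menus` is a hypothetical helper name, and the approach checks all 511 non-empty subsets of the 9 dishes, so it is only reasonable for short dish lists like this one.

```r
# Data copied from the question
dish <- c("Schnitzel", "Burger", "Steak", "Salad", "Falafel", "Salmon",
          "Mashed potatoes", "MacnCheese", "Hot Dogs")
days_the_food_lasts <- c(2, 2, 1, 1, 3, 1, 2, 2, 4)
food_needed_for_days <- 7

# Hypothetical helper: try every subset size k and keep the index sets
# whose days add up to exactly `target`
all_menus <- function(days, target) {
  hits <- list()
  for (k in seq_along(days)) {
    subsets <- combn(length(days), k)                 # one subset per column
    totals <- colSums(matrix(days[subsets], nrow = k)) # days covered per subset
    for (j in which(totals == target)) {
      hits[[length(hits) + 1]] <- subsets[, j]
    }
  }
  hits
}

menus <- all_menus(days_the_food_lasts, food_needed_for_days)

# Show the first few menus by dish name
lapply(head(menus, 3), function(ix) dish[ix])
```

Each element of `menus` is a vector of row numbers, so it can be used to index the original data frame in the same way as `most_exspensive_food$indices` in the question.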

Wrong Answer with Permutation and Combination Example

I have the problem below about permutations and combinations.
I know one solution, which I give here. I also have another approach to the same problem, but it does not give me the same answer as the first one. Can someone tell me where I am making a mistake?
Problem: From a group of 7 men and 6 women, five persons are to be selected to form a committee so that at least 3 men are there in the committee. In how many ways can it be done?
First Answer:
We can select 5 men ...(option 1)
Number of ways to do this = 7C5
We can select 4 men and 1 woman ...(option 2)
Number of ways to do this = 7C4 × 6C1
We can select 3 men and 2 women ...(option 3)
Number of ways to do this = 7C3 × 6C2
Total number of ways = 7C5 + (7C4 × 6C1) + (7C3 × 6C2)
= 756.
Below is my new approach, where I am making a mistake that I cannot find.
At least 3 men must be on the committee, so the number of ways to choose 3 men out of 7 = 7C3
= 35.
Now 2 more people have to be selected from the remaining 4 men and 6 women. The number of ways this can be done = 10C2 = 45.
Therefore, total number of ways = 35*45 = 1575.
Can someone tell me what I am missing in the second approach?
Your approach counts some committees more than once.
Suppose that from the 7 men you choose
M1,M2,M3
and from the remaining 10 people you choose a man M4 and one of the women W1,W2,...,W6, say W1.
Now suppose you instead choose the men M1,M2,M4 from the 7 men
and from the remaining 10 you choose M3 and the same woman W1.
Both selections give the same committee {M1,M2,M3,M4,W1}, which should be counted only once, but you are counting them as 2 different ways. That's why your answer is greater than the expected answer.
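
The double counting can be checked numerically. In this sketch the people are numbered 1 to 13, with 1 to 7 taken to be the men (a labelling chosen here only for illustration), and every 5-person committee is enumerated with `combn`.

```r
# Case-by-case count from the first answer
by_cases <- choose(7, 5) + choose(7, 4) * choose(6, 1) + choose(7, 3) * choose(6, 2)

# Brute force: all 5-person committees from 13 people, men are 1..7
committees <- combn(13, 5)
men_per_committee <- colSums(committees <= 7)
valid <- men_per_committee >= 3
brute_force <- sum(valid)

# The second approach counts a committee once for every way of picking
# its "first 3 men", i.e. choose(m, 3) times for a committee with m men
second_approach <- sum(choose(men_per_committee[valid], 3))

c(by_cases = by_cases, brute_force = brute_force, second_approach = second_approach)
```

A committee with exactly 3 men is counted once, one with 4 men is counted 4 times and one with 5 men is counted 10 times, which is where the extra 1575 - 756 = 819 comes from.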

nMDs non-metric multi-dimensional scaling coding a data set

I have a data set of lizard retreat sites that I'd like to examine using an NMDS in R to determine which variables are likely to be important. I'm a novice with R and was told I need to code the data so R can read it. I'm using OS X 10.9.5 (13F1911) with R 3.3.3 GUI 1.69 Mavericks build (7328).
I'm not sure how to attach the data file, so I've copied the output of head(data) here:
data <- data.frame(newdataset)
head(data)
   Hide..  PIT   Year  Species                          Alive.Partial.Dead  Standing.half.fallen.fallen  X..days.obs  Total...of.day.occupied  Height  Diameter  Angle  Aspect
1  1       91A1  2004  Hog Doctor                       A                   S                            6            6                        4.2     ?         .      ?
2  2       91A1  2004  Mammie                           A                   S                            4            4                        1.8     5-10cm    90     SW
3  3       COFE  2004  Tabebuia riparia                 A                   S                            17           16                       3       5-10cm    0      ENE
4  4       COFE  2004  Columar cactus                   P                   Fallen                       2            2                        0       5-10cm    90     S
5  5       COFE  2004  ?                                D                   Fallen                       4            3                        0.2     5-10cm    60     ?
6  6       COFE  2004  Eugenia sp (check greeny fruit)  P                   S                            7            7                        3.5     10-20cm   0      W
As you can see, I managed to read the data into R, but I'm not sure what comes next. I know I need to convert my data.frame(newdataset) to a distance matrix, but I am unclear whether I have to code or create levels for some of the variables, e.g. whether the retreat site (selected by the lizard) was in a tree that was 1. Alive, 2. Partially Dead, or 3. Dead.
A little more about the variables:
1. Hide (retreat): identifies each retreat selected by lizards, i.e. one lizard may use a single retreat or multiple retreats.
2. PIT: Passive Internal Transponder identification number uniquely identifying each lizard.
3. Year: the year the data were collected.
4. Species: the tree species in which a retreat was located or, in the case of a single lizard, the substrate (rock) used.
5. Whether the tree was alive, partially alive or dead.
6. Whether the tree was standing upright, leaning over, or lying on the ground.
7. The number of days a lizard was observed using a particular retreat site.
8. The total number of days a retreat site was known to be used.
9. The height of the retreat site from the ground.
10. The diameter of the section of tree containing the retreat site.
11. The angle of the retreat site relative to the ground.
12. Aspect: the direction the retreat site faces (e.g. SW).
Thank you to anyone that can give some advice with this problem.
Cheers
Rick
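
As a rough sketch of what "coding" the data could look like, the snippet below retypes a few columns from the six rows shown above, turns the "?" and "." placeholders into NA, and converts each column to a numeric or factor type. The data frame name `retreats` is made up here, and treating Alive/Partial/Dead and Diameter as ordered categories is an assumption, not something stated in the question.

```r
# A few columns from the head() output above, retyped by hand
retreats <- data.frame(
  Alive.Partial.Dead          = c("A", "A", "A", "P", "D", "P"),
  Standing.half.fallen.fallen = c("S", "S", "S", "Fallen", "Fallen", "S"),
  Height                      = c("4.2", "1.8", "3", "0", "0.2", "3.5"),
  Diameter                    = c("?", "5-10cm", "5-10cm", "5-10cm", "5-10cm", "10-20cm"),
  Angle                       = c(".", "90", "0", "90", "60", "0"),
  Aspect                      = c("?", "SW", "ENE", "S", "?", "W"),
  stringsAsFactors = FALSE
)

# Treat the placeholder symbols as missing values
retreats[retreats == "?" | retreats == "."] <- NA

# Assumption: Alive < Partial < Dead and the diameter classes have a natural order
retreats$Alive.Partial.Dead <- factor(retreats$Alive.Partial.Dead,
                                      levels = c("A", "P", "D"), ordered = TRUE)
retreats$Diameter <- factor(retreats$Diameter,
                            levels = c("5-10cm", "10-20cm"), ordered = TRUE)

# Unordered categories
retreats$Standing.half.fallen.fallen <- factor(retreats$Standing.half.fallen.fallen)
retreats$Aspect <- factor(retreats$Aspect)

# Measurements
retreats$Height <- as.numeric(retreats$Height)
retreats$Angle <- as.numeric(retreats$Angle)

str(retreats)
```

A frame coded this way mixes numbers with ordered and unordered factors, so a mixed-type dissimilarity such as Gower's (for example `cluster::daisy(retreats, metric = "gower")`) is one common way to get the distance matrix, which can then be passed to an NMDS routine such as `vegan::metaMDS` or `MASS::isoMDS`.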

Moving Between States in a Markov Model - How to Tell R?

I have been struggling with this problem for quite a while and any help would be much appreciated.
I am trying to write a function to calculate a transition matrix from observed data for a Markov model.
The initial data I am using to build the function look something like this:
     Season  Team               State
1    1       Manchester United  1
2    1       Chelsea            1
3    1       Manchester City    1
.
.
.
99   5       Charlton Athletic  4
100  5       Watford            4
with 5 seasons and 4 states.
I know how I am going to calculate the transition matrix, but in order to do this I need to count the number of teams that move from state i to state j for each season.
I need code that will do something like this,
a<-function(x,i,j){
if("team x is in state i in season 1 and state j in season 2") 1 else 0
}
sum(a)
and then I could do this for each team and pair of states and repeat for all 5 seasons. However, I am having a hard time getting my head around how to tell R the thing in quotation marks. Sorry if there is a really obvious answer but I am a rubbish programmer.
Thanks so much for reading!
This function tells you if a team made the transition from state1 to state2 from season1 to season2
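# Returns 1 if `team` was in state1 in season1 and in state2 in season2, else 0
a <- function(team, state1, state2, data, season1, season2) {
# keep only the rows belonging to this team
team.rows <- data[data$Team == team, ]
# use the vectorised & (not &&) so every row is tested
in.season1.in.state1 <- ifelse(team.rows$Season == season1 & team.rows$State == state1, 1, 0)
in.season2.in.state2 <- ifelse(team.rows$Season == season2 & team.rows$State == state2, 1, 0)
return(sum(in.season1.in.state1) * sum(in.season2.in.state2))
}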
The first line of the function selects all rows for a particular team.
The second line flags, for each of those rows, whether the team is in state1 in season1.
The third line flags, for each row, whether the team is in state2 in season2.
The return statement gives 1 if the team was in state1 in season1 and in state2 in season2, and 0 otherwise. This only works if each team appears once per season; with duplicate rows it might return a value greater than 1.
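
Instead of calling a function once per team and pair of states, all transitions between consecutive seasons can be counted in one pass by joining the data to a copy of itself shifted by one season. This is only a sketch: `transition_counts` is a hypothetical helper, the tiny `league` data frame with teams "A", "B", "C" is made up, and the column names Season, Team and State are taken from the question.

```r
# Hypothetical helper: count moves from state i (season s) to state j (season s + 1)
transition_counts <- function(data, n_states) {
  from <- data
  to <- data
  to$Season <- to$Season - 1   # line season s + 1 up with season s
  pairs <- merge(from, to, by = c("Season", "Team"),
                 suffixes = c(".from", ".to"))
  # fixed factor levels keep states that never occur as zero rows/columns
  table(from = factor(pairs$State.from, levels = seq_len(n_states)),
        to   = factor(pairs$State.to,   levels = seq_len(n_states)))
}

# Made-up example: three teams, two seasons, two states
league <- data.frame(
  Season = c(1, 1, 1, 2, 2, 2),
  Team   = c("A", "B", "C", "A", "B", "C"),
  State  = c(1, 1, 2, 1, 2, 2)
)

counts <- transition_counts(league, n_states = 2)

# Divide each row by its total to estimate the transition probabilities
probs <- prop.table(counts, margin = 1)
probs
```

Teams that are missing from the following season are dropped by the merge, and a state with no observed departures gives a row of NaN in `probs`, so both cases need a decision before the matrix is used.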

Voting - Number of votes vs Vote percent?

I've implemented a simple up/down voting system on a website, and I keep track of individual votes as well as vote time and unique user ID (hashed IP).
My question is not how to calculate the percent or sum of the votes - but more, what is a good algorithm for determining a good score based on votes?
I find sorting by pure vote percent to be unacceptable, as well as simply tallying upvotes.
Consider this example:
Image A: 4 upvotes, 1 downvote
Image B: 5 upvotes, 4 downvotes
Image C: 1 upvote, 0 downvotes
The ideal system would put A first, maybe followed by B and then C.
In a pure percentage scenario, the order is C > A > B. (wrong)
In a pure vote count scenario, the order is B > A > C. (wrong)
I have an idea for a somewhat "hybrid" algorithm based on the system's confidence in a score, maybe something along the lines of:
// (if totalvotes > 0, else score = 0)
score = 1 - ((downvotes+1 / totalvotes+1) * sqrt(1 / totalvotes))
However, I was hoping to ask the community if there are any really well-defined algorithms already out there that I simply don't know about, before I sit around tweaking my algorithm from now until sunset.
I also have date data for each vote - however, the content of the site isn't very time-sensitive so I don't really care to sort by "what's hot" at all.
Sorting by the average of votes is not very good.
You get a much better ranking by balancing the proportion of positive ratings against the uncertainty that comes with a small number of observations, as explained in the article below.
The article explains how to avoid the mistake that many popular websites make (Amazon, Urban Dictionary, etc.).
http://evanmiller.org/how-not-to-sort-by-average-rating.html
Hope this helps!
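
The article linked above ranks items by the lower bound of the Wilson score confidence interval for the upvote proportion. A minimal R sketch of that idea follows; the function name `wilson_lower`, the 95% confidence level and the choice to score items with no votes as 0 are assumptions made here for illustration.

```r
# Lower bound of the Wilson score interval for the proportion of upvotes.
# Items with no votes get a score of 0 (an assumption of this sketch).
wilson_lower <- function(upvotes, downvotes, confidence = 0.95) {
  n <- upvotes + downvotes
  z <- qnorm(1 - (1 - confidence) / 2)   # about 1.96 for 95%
  p_hat <- upvotes / n
  lower <- (p_hat + z^2 / (2 * n) -
              z * sqrt((p_hat * (1 - p_hat) + z^2 / (4 * n)) / n)) /
    (1 + z^2 / n)
  ifelse(n == 0, 0, lower)
}

# The three images from the question
scores <- wilson_lower(upvotes = c(A = 4, B = 5, C = 1),
                       downvotes = c(1, 4, 0))
sort(scores, decreasing = TRUE)
```

For the three images in the question this gives the order A, B, C: a single upvote is not enough evidence to outrank an image with more votes and a good ratio.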
I know this doesn't really answer your question, but I spent 3 minutes for fun trying to come up with a formula, so just check it :) Column A is upvotes and column B is downvotes :)
=(LN((A1+1)/(A1+B1+1))+1)*LN(A1)
upvotes  downvotes  score
5        3          0.956866995
4        1          1.133543015
5        4          0.787295787
1        0          0
6        4          0.981910844
2        8          -0.207447157
6        5          0.826007385
3        3          0.483811507
4        0          1.386294361
5        0          1.609437912
6        1          1.552503332
5        2          1.146431478
100      100        -3.020151034
10       10         0.813671022
