Wrong Answer with Permutation and Combination Example - math

I have the problem below on permutations and combinations.
I know one solution, which I give here. I also have a second approach to the same problem, but it does not give the same answer as the first one. Can someone tell me where I am making a mistake?
Problem: From a group of 7 men and 6 women, five persons are to be selected to form a committee so that the committee contains at least 3 men. In how many ways can this be done?
First Answer:
We can select 5 men ...(option 1)
Number of ways to do this = 7C5
We can select 4 men and 1 woman ...(option 2)
Number of ways to do this = 7C4 × 6C1
We can select 3 men and 2 women ...(option 3)
Number of ways to do this = 7C3 × 6C2
Total number of ways = 7C5 + (7C4 × 6C1) + (7C3 × 6C2)
= 21 + (35 × 6) + (35 × 15) = 21 + 210 + 525 = 756.
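The case split above can be checked with R's built-in choose() function. This is just a quick arithmetic sketch of the same sum, indexed by k, the number of men on the committee.

```r
# Committees of 5 with at least 3 men: sum over k = number of men (3, 4 or 5)
men   <- 7
women <- 6
size  <- 5
k <- 3:5
ways <- choose(men, k) * choose(women, size - k)
ways       # 525 210 21 (for k = 3, 4, 5)
sum(ways)  # 756
```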
Below is my second approach, which contains a mistake that I can't find.
At least 3 men must be on the committee, so first choose 3 men out of 7: 7C3 = 35.
Now 2 more people have to be selected from the remaining 4 men and 6 women (10 people). The number of ways to do this is 10C2 = 45.
Therefore, the total number of ways = 35 × 45 = 1575.
Can someone tell me what I am missing in the second approach?

Your approach counts some committees more than once.
Suppose that from the 7 men you choose
M1, M2, M3
and from the remaining 10 people you choose the man M4 and the woman W1.
Now suppose instead that you choose M1, M2, M4 from the 7 men,
and from the remaining 10 people you choose M3 and W1.
Both choices give the same committee {M1, M2, M3, M4, W1}, which should be counted only once, but your method counts it as 2 different ways. That's why your answer is larger than the expected answer.
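To see exactly how much the second approach overcounts: a committee with exactly 4 men is produced once for each choice of which 3 of its men were picked "first", i.e. 4C3 = 4 times, and a committee with 5 men is produced 5C3 = 10 times, while a committee with 3 men is produced only once. So 1 × 525 + 4 × 210 + 10 × 21 = 525 + 840 + 210 = 1575, which is exactly the overcounted total. The R sketch below enumerates every (first 3 men, next 2 people) choice by brute force and compares it with the number of distinct committees; the labels M1..M7 and W1..W6 are illustrative.

```r
people <- c(paste0("M", 1:7), paste0("W", 1:6))
men <- people[1:7]

# Second approach: pick 3 men first, then any 2 of the remaining 10 people
picks <- character(0)
for (first in combn(men, 3, simplify = FALSE)) {
  rest <- setdiff(people, first)
  for (extra in combn(rest, 2, simplify = FALSE)) {
    # Store the committee as a sorted, comma-joined key so equal sets compare equal
    picks <- c(picks, paste(sort(c(first, extra)), collapse = ","))
  }
}

length(picks)          # choices counted by the second approach: 1575
length(unique(picks))  # distinct committees: 756
table(table(picks))    # how many committees are counted 1, 4 and 10 times
```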

Related

Find every combination of elements in a data frame column that adds up to a given sum in R

I'm trying to make my life easier by writing a menu creator, which is supposed to generate weekly menus from a list of my favourite dishes, to get a little more variety in my life.
I gave every dish a value of how many days it approximately lasts and tried to arrange the dishes to end up with menus worth 7 days of food.
I've already tried knapsack solutions from this site, including dynamic programming, but I'm not experienced enough to adapt them. The problem is that all of these solutions return only the single most efficient option, not every combination that fills the knapsack.
library(adagio)
#create some data
dish <-c('Schnitzel','Burger','Steak','Salad','Falafel','Salmon','Mashed potatoes','MacnCheese','Hot Dogs')
days_the_food_lasts <- c(2,2,1,1,3,1,2,2,4)
price_of_the_food <- c(20,20,40,10,15,18,10,15,15)
data <- data.frame(dish,days_the_food_lasts,price_of_the_food)
#give each dish a distinct id
data$rownumber <- (1:nrow(data))
#set limit for how many days should be covered with the dishes
food_needed_for_days <- 7
# knapsack() from the adagio package as an example; every other knapsack solution I found behaves the same way (it returns only one optimal selection)
most_exspensive_food <- knapsack(days_the_food_lasts,price_of_the_food,food_needed_for_days)
data[data$rownumber %in% most_exspensive_food$indices, ]
#output
dish days_the_food_lasts price_of_the_food rownumber
1 Schnitzel 2 20 1
2 Burger 2 20 2
3 Steak 1 40 3
4 Salad 1 10 4
6 Salmon 1 18 6
Simplified:
I need a solution to a single-objective, single-knapsack problem that returns all possible combinations of dishes adding up to exactly 7 days of food.
Thank you very much in advance.
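Since adagio's knapsack() only returns one optimal subset, listing every combination needs a different technique. A minimal sketch, swapping in plain brute-force enumeration with base R's combn(): try every subset size and keep the subsets whose days add up to exactly the target. The helper name all_exact_subsets is hypothetical. This checks all 2^n subsets, so it is only practical for small lists (roughly up to 20 dishes), and it assumes each dish appears at most once per week.

```r
dish <- c('Schnitzel','Burger','Steak','Salad','Falafel','Salmon','Mashed potatoes','MacnCheese','Hot Dogs')
days_the_food_lasts <- c(2,2,1,1,3,1,2,2,4)
food_needed_for_days <- 7

# Hypothetical helper: every subset of indices whose weights sum exactly to target
all_exact_subsets <- function(weights, target) {
  out <- list()
  for (k in seq_along(weights)) {
    # combn(n, k) enumerates all k-element subsets of 1..n
    for (idx in combn(length(weights), k, simplify = FALSE)) {
      if (sum(weights[idx]) == target) out[[length(out) + 1]] <- idx
    }
  }
  out
}

menus <- all_exact_subsets(days_the_food_lasts, food_needed_for_days)
length(menus)                                # number of 7-day menus
lapply(head(menus, 3), function(i) dish[i])  # first few menus as dish names
```

If you also want to rank the menus, you can compute sum(price_of_the_food[i]) for each index vector in menus and sort by it.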

Finding Specific Means and Medians in R

I am working on a school project in R that looks at swimming data from 8 different teams across each of 13 events over 6 years. I have over 8700 rows of data that I have appended, and I am trying to find out how to compute the specific means I am looking for. For example, I would like to look at the progression of mean times for team 1 in event 3 for men. Thanks!
You can subset your data frame to only the rows you need (assuming columns named team, event and times), e.g.
ss = subset(df, team == 1 & event == 3)
mean(ss$times)
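To get the progression over the years (and the medians the title asks about), one option is to add the sex filter and aggregate by year. A minimal sketch: the toy data are hypothetical, and the column names sex and year are assumptions about how the swimming data is laid out.

```r
# Hypothetical toy data; the columns team, event, sex, year and times are assumptions
df <- data.frame(
  team  = c(1, 1, 1, 1, 2, 1),
  event = c(3, 3, 3, 3, 3, 4),
  sex   = c("M", "M", "M", "F", "M", "M"),
  year  = c(2010, 2010, 2011, 2010, 2010, 2010),
  times = c(60.1, 59.9, 58.7, 65.0, 61.2, 30.5)
)

# Keep only team 1, event 3, men
ss <- subset(df, team == 1 & event == 3 & sex == "M")

# Mean and median time per year show the progression over the seasons
means   <- aggregate(times ~ year, data = ss, FUN = mean)
medians <- aggregate(times ~ year, data = ss, FUN = median)
res <- merge(means, medians, by = "year", suffixes = c("_mean", "_median"))
res
```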

Calculate how much a point is worth based on played games

My problem is that I need to calculate how much a point is worth based on the number of games played.
If a team plays a match, it gets 3 points for a win, 1 point for a tie and 0 points for a loss.
The problem is the following:
Team 1
Wins:8 Tie:2 Loss:3 Points:26 Played Games: 13
Team 2
Wins:8 Tie:3 Loss:4 Points:27 Played Games: 15
Here you can see that Team 2 has 1 more point than Team 1. But Team 2 has played 2 more matches and has a lower win percentage than Team 1. Yet if you ranked these two by points, Team 2 would get a higher "rating" than Team 1.
So what should the math look like to make this fair, so that Team 1 gets the better score here?
Just divide by the number of games to get the average points per game played.
Team1: 2.0 ppg
Team2: 1.8 ppg
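The points-per-game calculation can be sketched in R as below; the data frame simply restates the two teams from the question, and ranking by ppg puts Team 1 first.

```r
# Points per game: 3 points for a win, 1 for a tie, 0 for a loss
teams <- data.frame(team = c("Team 1", "Team 2"),
                    wins = c(8, 8), ties = c(2, 3), losses = c(3, 4))
teams$points <- 3 * teams$wins + 1 * teams$ties
teams$played <- teams$wins + teams$ties + teams$losses
teams$ppg    <- teams$points / teams$played
# Rank by points per game instead of total points
teams[order(-teams$ppg), ]
```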
Okay, first of all, thanks for the help.
The solution I ended up with is the following:
Real points = (p / pg) × p
where p = Sum(points) and pg = played games.
So for the example above, the real points will be:
Team 1: 26 / 13 × 26 = 52
Team 2: 27 / 15 × 27 = 48.6
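The "real points" formula from this follow-up is just points per game multiplied by total points, i.e. p² / pg, so it rewards both a high rate and a large total. A small R sketch of the same arithmetic (the function name real_points is illustrative):

```r
# Hypothetical helper: points per game multiplied by total points (= p^2 / pg)
real_points <- function(p, pg) p / pg * p
real_points(c(26, 27), c(13, 15))  # Team 1, Team 2
```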

To use the correct test for independence

I have two groups (data frames) in R called good and bad, which contain good users and bad users respectively.
The data frame good contains game_id, which is the id of a computer game, and number, which is how many times that game has been played.
For example, good$game_id gives 1 2 3 ... 20; there are 20 games.
Similarly, good$number gives 45214 1254 23 ... 8914, the number of times each game has been played. For example, game_id == 1 has been played 45214 times in group good.
The same holds for bad.
Both groups also contain the same number of users.
So for head(good,20) we get
game_id number
1 45214
2 1254
...
20 8914
I want to investigate whether the number of times a given computer game has been played depends on the group (good vs. bad).
For game_id == 1 I would try Pearson's chi-squared test for independence.
In R I type chisq.test(good[1,2], bad[1,2]) to test for independence between good and bad for game_id == 1, but I get the error message: x and y must have same levels.
How can this problem be solved ?
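When given two vectors, chisq.test() cross-tabulates them as factors, so two single numbers such as good[1,2] and bad[1,2] cannot form a contingency table. A minimal sketch, assuming the counts are instead arranged as a group × game table of counts: the bad counts below are made up for illustration, and only four games are used. Note that the chi-squared test treats every single play as an independent observation, which is questionable when the same users play many times, and with counts this large even tiny differences will come out as significant.

```r
# Hypothetical toy data with the same layout as good/bad (only 4 games shown)
good <- data.frame(game_id = 1:4, number = c(45214, 1254, 23, 8914))
bad  <- data.frame(game_id = 1:4, number = c(40100, 1500, 40, 9000))

# Align the two groups by game_id
both <- merge(good, bad, by = "game_id", suffixes = c("_good", "_bad"))

# Whole-table test: are plays distributed over the games in the same way in both groups?
tab <- rbind(good = both$number_good, bad = both$number_bad)
colnames(tab) <- both$game_id
res_all <- chisq.test(tab)

# Single game (game_id == 1): 2x2 table of "plays of game 1" vs "plays of all other games"
g1 <- both$game_id == 1
tab1 <- matrix(c(sum(both$number_good[g1]), sum(both$number_good[!g1]),
                 sum(both$number_bad[g1]),  sum(both$number_bad[!g1])),
               nrow = 2, byrow = TRUE,
               dimnames = list(group = c("good", "bad"), game = c("game 1", "other")))
res_g1 <- chisq.test(tab1)

res_all
res_g1
```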

Voting - Number of votes vs Vote percent?

I've implemented a simple up/down voting system on a website, and I keep track of individual votes as well as the vote time and a unique user ID (hashed IP).
My question is not how to calculate the percent or sum of the votes - but more, what is a good algorithm for determining a good score based on votes?
I find sorting by pure vote percent to be unacceptable, as well as simply tallying upvotes.
Consider this example:
Image A: 4 upvotes, 1 downvote
Image B: 5 upvotes, 4 downvotes
Image C: 1 upvote, 0 downvotes
The ideal system would put A first, maybe followed by B and then C.
In a pure percentage scenario, the order is C > A > B. (wrong)
In a pure vote count scenario, the order is B > A > C. (wrong)
I have an idea for a somewhat "hybrid" algorithm based on the system's confidence in a score, maybe something along the lines of:
// if totalvotes > 0 (otherwise score = 0)
score = 1 - (((downvotes + 1) / (totalvotes + 1)) * sqrt(1 / totalvotes))
However, I was hoping to ask the community if there are any really well-defined algorithms already out there that I simply don't know about, before I sit around tweaking my algorithm from now until sunset.
I also have date data for each vote - however, the content of the site isn't very time-sensitive so I don't really care to sort by "what's hot" at all.
Sorting by the average rating is not a good idea.
If you instead balance the proportion of positive ratings against the uncertainty that comes from a small number of observations, as explained in the article below, you get a much better ranking.
The article explains how to avoid the mistake that many popular websites make (Amazon, urbandictionary, etc.).
http://evanmiller.org/how-not-to-sort-by-average-rating.html
Hope this helps!
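The linked article recommends ranking by the lower bound of the Wilson score confidence interval for the fraction of upvotes. A minimal R sketch of that formula follows; z = 1.96 (about 95% confidence) is an assumed choice, and items with no votes get a score of 0. On the three example images it gives the order A > B > C that the question asks for.

```r
# Lower bound of the Wilson score interval for the proportion of upvotes
wilson_lower <- function(up, down, z = 1.96) {
  n <- up + down
  p_hat <- ifelse(n > 0, up / n, 0)
  lower <- (p_hat + z^2 / (2 * n) -
            z * sqrt((p_hat * (1 - p_hat) + z^2 / (4 * n)) / n)) / (1 + z^2 / n)
  ifelse(n > 0, lower, 0)  # no votes yet: score 0
}

votes <- data.frame(image = c("A", "B", "C"), up = c(4, 5, 1), down = c(1, 4, 0))
votes$score <- wilson_lower(votes$up, votes$down)
votes[order(-votes$score), ]
```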
I know this doesn't answer your question, but I just spent 3 minutes for fun trying to come up with a formula, so just check it out :) Column A is upvotes and column B is downvotes.
=(LN((A1+1)/(A1+B1+1))+1)*LN(A1)
Upvotes Downvotes Score
5 3 0.956866995
4 1 1.133543015
5 4 0.787295787
1 0 0
6 4 0.981910844
2 8 -0.207447157
6 5 0.826007385
3 3 0.483811507
4 0 1.386294361
5 0 1.609437912
6 1 1.552503332
5 2 1.146431478
10 100 -3.020151034
10 10 0.813671022
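The spreadsheet formula can be written in R to check individual rows, e.g. 4 upvotes and 1 downvote give about 1.1335. This is only a transcription of the formula (the function name fun_score is illustrative); note that log(0) is -Inf in R (LN(0) is an error in a spreadsheet), so items with 0 upvotes need special handling.

```r
# =(LN((A1+1)/(A1+B1+1))+1)*LN(A1) in R, with A = upvotes and B = downvotes
fun_score <- function(up, down) {
  (log((up + 1) / (up + down + 1)) + 1) * log(up)
}
fun_score(c(4, 1, 10), c(1, 0, 10))
```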
