I am working on a program where I have two 2D arrays, one called males and one called females. Both are 3x3. Each array contains scores of how much a person likes the other.
For example, the male array below means that male0 rates female0 8, male0 rates female1 5, male0 rates female2 8, male1 rates female0 9, male1 rates female1 5, and so on:
8 5 8
9 5 7
5 6 8
I also have another array like this for females where they rate the males.
Then I make another 2D array in which I add the scores female(i,j) and male(i,j) for each pair.
How do I go about finding out which combination gives the biggest total score?
I would like to come up with something like
Best combination is:
male0 -> female2
male1 -> female0
male2 -> female1
One way is to try every permutation of a female array, for each permutation finding the total score, and in the end picking the permutation that gives the highest score.
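The permutation approach described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the females matrix is made up (the question does not show it) and is assumed to be indexed like males, so females[i][j] is how much female j likes male i; best_pairing is a hypothetical helper name.

```python
from itertools import permutations

def best_pairing(total):
    """Try every assignment of females to males and keep the best total."""
    n = len(total)
    best_score, best_perm = None, None
    for perm in permutations(range(n)):
        # perm[m] is the female assigned to male m
        score = sum(total[m][perm[m]] for m in range(n))
        if best_score is None or score > best_score:
            best_score, best_perm = score, perm
    return best_score, best_perm

males = [[8, 5, 8],
         [9, 5, 7],
         [5, 6, 8]]
# Hypothetical scores, not from the question.
females = [[1, 2, 9],
           [9, 1, 2],
           [2, 9, 1]]
# Combined score for each (male, female) pair.
total = [[males[i][j] + females[i][j] for j in range(3)] for i in range(3)]

score, perm = best_pairing(total)
print("Best combination is:")
for m, f in enumerate(perm):
    print(f"male{m} -> female{f}")
```

This checks n! permutations, which is fine for a 3x3 array but grows quickly with n.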
This is related to the Stable Marriage Problem, although maximizing the total score is more precisely the assignment problem.
Related
I have a population categorized based on four characteristics: gender (female, male), age (young, middle-aged, old), geography (urban, rural) and education (lower, higher).
This leaves me with 24 possible combinations:
[1] Female Young Urban Lower
[2] Female Young Urban Higher
[3] Female Young Rural Lower
[4] Female Young Rural Higher
...
[24] Male Old Rural Higher
Let's say I want to draw a stratified sample of 25 people, among which:
Female: 13
Male: 12
---------------
Young: 7
Middle-aged: 8
Old: 10
---------------
Urban: 15
Rural: 10
---------------
Lower: 20
Higher: 5
In order to do so, I want to determine which combinations of the profiles above will allow me to achieve this distribution (e.g. 2x[1], 3x[2], 1x[3] ... 2x[24]), using R.
I thought I could (i) create a dataset of my 24 combinations, each taking a value from 1 to 3 (using the crossing() function), (ii) calculate the products, and (iii) check whether they match my distribution. However, I cannot even create the base dataset because it is too large for my memory (3^24 = 282429536481)...
Could someone help me achieve this in an easier way (maybe with a loop that checks whether a combination matches and drops it immediately if it does not, in order to save memory, or just a much easier way I did not think of)?
Many thanks in advance.
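The "drop it immediately if it does not match" idea can be sketched as a backtracking search. The example below is in Python rather than R, purely as an illustration; allocations is a hypothetical helper name, and it assumes a profile may be used zero or more times (not only 1 to 3 times). A branch is abandoned as soon as a count would exceed any remaining target, so the full grid is never built.

```python
from itertools import product

levels = [("Female", "Male"),
          ("Young", "Middle-aged", "Old"),
          ("Urban", "Rural"),
          ("Lower", "Higher")]
targets = {"Female": 13, "Male": 12,
           "Young": 7, "Middle-aged": 8, "Old": 10,
           "Urban": 15, "Rural": 10,
           "Lower": 20, "Higher": 5}
profiles = list(product(*levels))  # the 24 combinations

def allocations(i, remaining, counts):
    """Yield dicts {profile: count} whose margins match the targets."""
    if i == len(profiles):
        if all(v == 0 for v in remaining.values()):
            yield dict(zip(profiles, counts))
        return
    # A profile can be used at most as often as its tightest margin allows.
    cap = min(remaining[k] for k in profiles[i])
    for n in range(cap, -1, -1):
        for k in profiles[i]:
            remaining[k] -= n
        yield from allocations(i + 1, remaining, counts + [n])
        for k in profiles[i]:
            remaining[k] += n  # undo before trying the next count

# Take only the first matching allocation; there are many more.
first = next(allocations(0, dict(targets), []))
for profile, n in first.items():
    if n:
        print(n, "x", " ".join(profile))
```

Because allocations is a generator, further solutions can be pulled one at a time without holding them all in memory.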
I have a problem regarding permutations and combinations.
I know one solution, which I give here. I also have another approach to the same problem, but it does not give the same answer as the first one. Can someone tell me where I am making a mistake?
Problem: From a group of 7 men and 6 women, five persons are to be selected to form a committee so that at least 3 men are there in the committee. In how many ways can it be done?
First Answer:
We can select 5 men ...(option 1)
Number of ways to do this = 7C5
We can select 4 men and 1 woman ...(option 2)
Number of ways to do this = 7C4 × 6C1
We can select 3 men and 2 women ...(option 3)
Number of ways to do this = 7C3 × 6C2
Total number of ways = 7C5 + (7C4 × 6C1) + (7C3 × 6C2)
= 756.
Below is my new approach, in which I am making a mistake that I cannot find.
At least 3 men should be there. So the number of ways to choose 3 men out of 7 = 7C3
= 35.
Now 2 people have to be selected from the remaining 4 men and 6 women. The number of ways this can be done = 10C2 = 45.
Therefore, total number of ways = 35*45 = 1575.
Can someone tell me what I am missing in the second approach?
Your approach counts some committees more than once.
Suppose from the 7 men you choose
M1,M2,M3
and from the remaining 10 people you choose a man M4 and one of the women W1,W2,W3...W6, say W1.
Now suppose you instead choose M1,M2,M4 from the 7 men
and from the remaining 10 you choose M3 and W1.
Both of these represent the same committee {M1,M2,M3,M4,W1} and should be counted only once, but you are counting them as 2 different ways. That is why your answer is greater than the expected answer.
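The overcounting can be checked by brute force. The sketch below (Python, with hypothetical labels M1..M7 and W1..W6) lists every valid committee once and then counts how many times the second approach would produce each one: a committee with k men is reached once for every way of picking which 3 of its k men were "chosen first".

```python
from itertools import combinations
from math import comb

men = [f"M{i}" for i in range(1, 8)]    # 7 men
women = [f"W{i}" for i in range(1, 7)]  # 6 women

# First answer: split by the number of men in the committee.
correct = sum(comb(7, k) * comb(6, 5 - k) for k in (3, 4, 5))

# Enumerate every 5-person committee with at least 3 men, each exactly once.
committees = [c for c in combinations(men + women, 5)
              if sum(p in men for p in c) >= 3]

def times_counted(committee):
    """How often the second approach generates this committee."""
    k = sum(p in men for p in committee)
    return comb(k, 3)

second = sum(times_counted(c) for c in committees)

print(correct, len(committees), second)
```

The first two numbers agree (756), and the third reproduces the inflated 1575.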
I have two groups (data frames) in R called good and bad, which contain good users and bad users respectively.
The group good contains game_id, which is the id of a computer game, and number, which is how many times that game has been played.
For example, good$game_id gives 1 2 3 ... 20, so we have 20 games.
Similarly, good$number gives 45214 1254 23 ... 8914, which is the number of times each game has been played. For example, game_id==1 has been played 45214 times in group good.
The same holds for bad.
We also have the same number of users in the two groups.
So for head(good,20) we get
game_id number
1 45214
2 1254
...
20 8914
I want to investigate whether the number of times a given computer game has been played depends on the group.
For game_id==1 I would try to use Pearson's chi-squared test of independence.
In R I type chisq.test(good[1,2], bad[1,2]) to see whether there is independence between good and bad for game_id==1, but I get an error message: x and y must have same levels.
How can this problem be solved?
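A Pearson chi-squared test works on a table of counts, not on two single numbers. One possible reading of the question is a 2x2 table with the groups as rows and "plays of game 1" versus "plays of all other games" as columns. The sketch below computes the statistic by hand in Python (not R) for illustration only: chi_square_2x2 is a hypothetical helper, every number except 45214 is made up, and no continuity correction is applied.

```python
from math import erfc, sqrt

def chi_square_2x2(table):
    """Pearson chi-squared statistic and p-value (1 degree of freedom)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row = (a + b, c + d)
    col = (a + c, b + d)
    stat = 0.0
    for i, r in enumerate(table):
        for j, observed in enumerate(r):
            expected = row[i] * col[j] / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    # For 1 degree of freedom the upper tail is erfc(sqrt(stat / 2)).
    return stat, erfc(sqrt(stat / 2))

# Hypothetical totals and bad-group count; only 45214 comes from the question.
good_game1, good_total = 45214, 200000
bad_game1, bad_total = 30000, 200000
table = [[good_game1, good_total - good_game1],
         [bad_game1, bad_total - bad_game1]]

stat, p_value = chi_square_2x2(table)
print(stat, p_value)
```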
I'm working on a Crystal report which is grouped hierarchically. Sometimes the data goes down to a fifth-level grouping, sometimes only a 4th. I'd like to be able to tell if I'm at the deepest level of the tree, so that I can switch from using the data in the group header to using the data in the detail line, setting it to columns. I've tried to use the Maximum() function to see how deep the tree goes, but that requires a field, not just an expression.
I've also tried writing a hierarchical query in Oracle and using the MAX OVER (PARTITION BY parent) clause, but that's only bringing back the data I already have, and in addition, it seems to be forcing me to lose the level-1 lines.
ETA: I have checked the GroupingLevel and HierarchyLevel functions, but those don't seem to help -- they only tell me where I am, not where I'm going. When I said above that I used Maximum(), I should have clarified I meant Maximum(HierarchyLevel(blah)).
ETA2: Ok, say the data looks like this:
id parentid
1
2 1
3 2
4 3
5 3
6 1
7 6
8 7
9 8
10 8
11 8
I want something that will return True for 4, 5, 9, 10, and 11, because that's as far down the tree as I can go.
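The rule behind that expected result is that a row is at the deepest level of its branch exactly when no other row names it as a parent. The sketch below shows the logic in Python on the sample data above; it is not Crystal Reports or Oracle syntax, and the names rows and is_deepest are hypothetical.

```python
# (id, parentid) pairs from the sample data; None marks the root.
rows = [(1, None), (2, 1), (3, 2), (4, 3), (5, 3), (6, 1),
        (7, 6), (8, 7), (9, 8), (10, 8), (11, 8)]

# Every id that appears as someone's parent has at least one child.
parents = {parent for _, parent in rows if parent is not None}

# A node is at the bottom of its branch if it is nobody's parent.
is_deepest = {node: node not in parents for node, _ in rows}

leaves = sorted(node for node, leaf in is_deepest.items() if leaf)
print(leaves)
```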
first off:
Suppose I have a dataset that has variables like to_location_id, from_location_id, gender, and age. Now, if I want to know the top 5 locations people like to visit overall, I do this:
#most popular 5 locations to go to
top <- as.data.frame(sort(table(mydata$to_location_id), decreasing = TRUE)[1:5])
> top
sort(table(mydata$to_location_id), decreasing = TRUE)[1:5]
3 18544
9 18395
76 15457
5 14342
1 13898
*This gives the 5 most popular locations to go to overall in the dataset:
locations 3, 9, 76, 5, and 1.
**Similarly, I can also get the 5 most popular locations to come from overall.
Now suppose that there are 100 unique location ids (in both the from and to location ids). For each location, I want to know the top 5 most popular to locations and the top 5 most popular from locations. I know I need a loop, but I'm not sure how to do it. I have tried this (no luck):
for(i in unique(mydata$to_location_id)){
as.data.frame(sort(table(mydata$to_location_id),decreasing = TRUE)[1:5])
}
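The loop above never uses i, so every pass recomputes the same overall table. The grouping that seems to be wanted is sketched below in Python (not R) for illustration: counts are kept separately per location and the top k are read off each one. The trips list and the name top_partners are made up, and each trip is assumed to be a (from_location_id, to_location_id) pair.

```python
from collections import Counter, defaultdict

def top_partners(pairs, k=5):
    """For each key location, list its k most frequent partner locations."""
    counts = defaultdict(Counter)
    for key, partner in pairs:
        counts[key][partner] += 1
    return {key: [loc for loc, _ in c.most_common(k)]
            for key, c in counts.items()}

# Hypothetical (from_location_id, to_location_id) pairs.
trips = [(1, 3), (1, 3), (1, 9), (2, 3), (2, 76), (2, 76), (3, 1)]

# Top destinations for each starting location.
top_to = top_partners(trips)
# Top origins for each destination: swap the pair order.
top_from = top_partners([(to, frm) for frm, to in trips])

print(top_to)
print(top_from)
```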