Given M foods and N persons, where each person has a set of foods they like, find the minimum number of foods needed so that every person gets something they like for breakfast.
Example:

     f1  f2  f3
p1    1   1   0
p2    0   0   1
p3    0   1   0
The answer is 2, since f2 supports two persons (p1 and p3) and f3 supports the remaining one (p2).
It's quite unclear what you are asking here as you are not explicitly posing any questions except for the one in the heading.
This is obviously a bipartite graph, yes, but the problem you are trying to solve is not a matching problem, but rather the set cover problem.
In order to see this, make a set for each food consisting of the persons that eat this food. In your example you get the three sets {p1}, {p1, p3} and {p2} (those who like f1, f2 and f3, respectively). Finding the minimum number of these sets that together cover all of your persons, that is the set {p1, p2, p3}, is exactly an instance of the set cover problem.
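Set cover is NP-hard in general, so for anything beyond small instances the usual approach is the greedy heuristic: repeatedly pick the food that covers the most still-uncovered persons. It is not guaranteed to be optimal, but it is simple and often good enough. A minimal R sketch on the example matrix:

# person x food incidence matrix from the example above
m <- matrix(c(1, 1, 0,
              0, 0, 1,
              0, 1, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("p1", "p2", "p3"), c("f1", "f2", "f3")))

greedy_set_cover <- function(m) {
  uncovered <- rownames(m)
  chosen <- character(0)
  while (length(uncovered) > 0) {
    # pick the food covering the most still-uncovered persons
    gain <- colSums(m[uncovered, , drop = FALSE])
    best <- names(which.max(gain))
    chosen <- c(chosen, best)
    uncovered <- uncovered[m[uncovered, best] == 0]
  }
  chosen
}

greedy_set_cover(m)   # "f2" "f3" -> 2 foods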
I'm trying to make my life easier by writing a menu creator, which is supposed to permute a weekly menu from a list of my favourite dishes, in order to get a little bit more variety into my life.
I gave every dish a value of how many days it approximately lasts and tried to arrange the dishes to end up with menus worth 7 days of food.
I've already tried knapsack solutions from here, including dynamic programming, but I'm not experienced enough to get the hang of them. The problem is that all of these solutions target only the single most valuable packing, not every combination that fills the knapsack.
library(adagio)
# create some data
dish <- c('Schnitzel','Burger','Steak','Salad','Falafel','Salmon','Mashed potatoes','MacnCheese','Hot Dogs')
days_the_food_lasts <- c(2,2,1,1,3,1,2,2,4)
price_of_the_food <- c(20,20,40,10,15,18,10,15,15)
data <- data.frame(dish, days_the_food_lasts, price_of_the_food)
# give each dish a distinct id
data$rownumber <- 1:nrow(data)
# set the limit for how many days should be covered by the dishes
food_needed_for_days <- 7
# knapsack() from the adagio library as an example; all other solutions
# I found to the knapsack problem behave the same way
most_expensive_food <- knapsack(days_the_food_lasts, price_of_the_food, food_needed_for_days)
data[data$rownumber %in% most_expensive_food$indices, ]
#output
dish days_the_food_lasts price_of_the_food rownumber
1 Schnitzel 2 20 1
2 Burger 2 20 2
3 Steak 1 40 3
4 Salad 1 10 4
6 Salmon 1 18 6
Simplified:
I need a solution to a single-objective, single knapsack problem that returns all possible combinations of dishes adding up to 7 days of food.
Thank you very much in advance
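Since the list of dishes is short, one option (a brute-force sketch, assuming each dish is used at most once) is simply to test every subset of dishes and keep those whose days add up to exactly the target, reusing the data frame from above:

target_days <- 7
n <- nrow(data)
combos <- list()
for (k in 1:n) {
  sets <- combn(n, k, simplify = FALSE)
  hits <- Filter(function(idx) sum(data$days_the_food_lasts[idx]) == target_days, sets)
  combos <- c(combos, hits)
}
# each element of 'combos' is a set of row indices worth exactly 7 days of food
lapply(combos, function(idx) data$dish[idx])

With 9 dishes that is only 511 subsets, so brute force is perfectly feasible; for much longer dish lists a proper subset-sum enumeration would be needed instead.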
The problem I have is that I need to work out how much a point is worth based on the number of games played.
If a team plays a match it can get 3 points for a win, 1 point for a tie and 0 points for a loss.
And here is the problem:

Team 1
Wins: 8, Ties: 2, Losses: 3, Points: 26, Games played: 13

Team 2
Wins: 8, Ties: 3, Losses: 4, Points: 27, Games played: 15
Here you can see that Team 2 has 1 more point than Team 1, but Team 2 has played 2 more matches and has a lower win percentage than Team 1. Still, if you ranked these two by points alone, Team 2 would get a higher "rating" than Team 1.
So what should the math look like to make this fair, so that Team 1 ends up with a better score than Team 2?
Just divide by the number of games to get the average points per game played.
Team 1: 26 / 13 = 2.0 points per game
Team 2: 27 / 15 = 1.8 points per game
Okay, first of all thanks for the help.
The solution is the following:
real points = (p / pg) * p
where p = sum of points and pg = games played.
So for the example up top, the real points are:
Team 1: (26 / 13) * 26 = 52
Team 2: (27 / 15) * 27 = 48.6
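Or, expressed in R just to make the arithmetic explicit:

points <- c(26, 27)   # Team 1, Team 2
games  <- c(13, 15)
points / games              # points per game: 2.0 and 1.8
(points / games) * points   # "real points":   52  and 48.6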
I have observed nurses during 400 episodes of care and recorded the sequence of surface contacts in each.
I categorised the surfaces into 5 groups (1:5) and calculated the probability density function (PDF) of touching each one:
PDF=[ 0.255202629 0.186199343 0.104052574 0.201533406 0.253012048]
I then predicted some 1000 sequences using:
for i = 1:1000   % 1000 different nurses
    seq(i,:) = randsample(1:5, max(observed_seq_length), true, PDF);
end
e.g.
seq = 1 5 2 3 4 2 5 5 2 5
stairs(1:max(observed_seq_length), seq)
hold all
I'd like to compare my empirical sequences with my predicted one. What would you suggest to be the best strategy or property to look at?
Regards,
EDIT: I put r as a tag as this may well fall more easily under that category due to the nature of the question rather than the matlab code.
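Since the question is tagged r, one simple property to start with is the overall frequency of each surface category in the observed versus the simulated contacts, e.g. with a chi-squared test. A minimal sketch, where observed_contacts and simulated_contacts are placeholder names for the pooled data:

# 'observed_contacts' and 'simulated_contacts' are hypothetical placeholders:
# all observed surface contacts (values 1:5) pooled into one vector, and the
# simulated 'seq' matrix from above flattened into one vector.
obs_counts <- table(factor(observed_contacts, levels = 1:5))
sim_counts <- table(factor(simulated_contacts, levels = 1:5))
chisq.test(rbind(obs_counts, sim_counts))   # homogeneity of the two frequency distributions

Because the simulation draws each contact independently from the PDF, it ignores any ordering in the real data, so comparing first-order transition matrices (how often surface j follows surface i) between observed and simulated sequences would be another property worth looking at.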
I have done my research and googling but have yet to find a solution to the following problem. I have quite often found solutions to R-related issues on this forum, so I thought I'd give it a try and hope that somebody can suggest something. I need it for my PhD thesis; anybody whose code or suggestions I use will naturally be acknowledged and credited.
So: I need to draw lines/segments in R to connect points in a plot (of multidimensional scaling, specifically; SPSS-based solutions are welcome as well), but not between all points, only between those that represent properties/variables that at least one data item shares. The placement of the lines should be derived from the data that the plot itself is based on. Let me exemplify; below are some fictional data with dummy variables, where '1' means that the item has the property:
"properties"
a b c
"items" ---------
tree | 1 1 0
house | 0 1 1
hut | 0 1 1
book | 1 0 0
The plot is a multidimensional scaling plot (distances are to be interpreted as dissimilarities). This is the logic:
- there is a line between A and B, because there is at least one item in the data ("tree") that has both properties;
- there is a line between B and C, because there is at least one item in the data ("house" and "hut") that has both properties;
- there is an item ("book") that has only one property (A), so it does not affect the placement of the lines;
- importantly, there is no line between A and C, because there are no items in the data that have both properties.
What I am looking for is a way to add automatically/computationally the grey lines that I have so far drawn manually on the plot above. The automatic drawing should be based on the data as described above. With a small data set, drawing the lines manually is no problem, but it becomes one when there are tens of such "properties" and hundreds of items/rows of data.
Any ideas? Some R code (commented if possible) would be most welcome!
EDIT: It seems I forgot something very important. First of all, the solution proposed by @GaborCsardi below works perfectly with the example data, thanks for that! But I forgot to mention that the linking of the points should also be "conservative", with as few connecting lines as possible. For example, if there is an item that has all the "properties", it should not create lines between every single property point in the plot just because of that, if the points are already connected by other items, even if only indirectly. So a plot based on the following data should not be a full triangle, even though item1 has all three properties:
A B C
item1 1 1 1
item2 1 1 0
item3 0 1 1
Instead, A,B and B,C should be connected by a line, but a line between A and C would be excessive, as they are already indirectly connected (through B). Could this be done with incidence graphs?
This is very easy if you use graphs, and create the projection of the bipartite graph that you have in your table. E.g.
library(igraph)
## Some example data
mat <- " properties
items a b c
tree 1 1 0
house 0 1 1
hut 0 1 1
book 1 0 0
"
tab <- read.table(textConnection(mat), skip=1,
                  header=TRUE, row.names=1)
## Create a bipartite graph
graph <- graph.incidence(as.matrix(tab))
## Project the bipartite graph
proj <- bipartite.projection(graph)
## Plot one of the projections, the one you need
## happens to be the second one
plot(proj$proj2)
## Minimum spanning tree of the projection
plot(minimum.spanning.tree(proj$proj2))
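If you also want these links drawn as grey segments directly on the MDS plot itself, one way (a sketch, assuming the MDS coordinates come from cmdscale() on a dissimilarity matrix of the property profiles; substitute your own coordinates) is to take the edge list of the minimum spanning tree and pass it to segments():

## coordinates of the properties a, b, c (replace with your own MDS result)
coords <- cmdscale(dist(t(as.matrix(tab))))
plot(coords, pch = 16, xlab = "", ylab = "")
text(coords, labels = rownames(coords), pos = 3)
## draw one grey segment per edge of the minimum spanning tree
edges <- get.edgelist(minimum.spanning.tree(proj$proj2))
segments(coords[edges[, 1], 1], coords[edges[, 1], 2],
         coords[edges[, 2], 1], coords[edges[, 2], 2], col = "grey")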
For more information see the manual pages, i.e. ?"igraph-package", ?graph.incidence, ?bipartite.projection and ?plot.igraph.
I've implemented a simple up/down voting system on a website, and I keep track of individual votes as well as vote time and unique user iD (hashed IP).
My question is not how to calculate the percent or sum of the votes - but more, what is a good algorithm for determining a good score based on votes?
I find sorting by pure vote percent to be unacceptable, as well as simply tallying upvotes.
Consider this example:
Image A: 4 upvotes, 1 downvote
Image B: 5 upvotes, 4 downvotes
Image C: 1 upvote, 0 downvotes
The ideal system would put A first, maybe followed by B and then C.
In a pure percentage scenario, the order is C > A > B. (wrong)
In a pure vote count scenario, the order is B > A > C. (wrong)
I have an idea for a somewhat "hybrid" algorithm based on the system's confidence in a score, maybe something along the lines of:
// (if totalvotes > 0, else score = 0)
score = 1 - ((downvotes + 1) / (totalvotes + 1)) * sqrt(1 / totalvotes)
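For what it's worth, plugging the three example images into that formula does give the ordering I'm after (A > B > C); a quick sanity check in R:

hybrid_score <- function(up, down) {
  total <- up + down
  if (total == 0) return(0)
  1 - ((down + 1) / (total + 1)) * sqrt(1 / total)
}
hybrid_score(4, 1)   # image A: ~0.851
hybrid_score(5, 4)   # image B: ~0.833
hybrid_score(1, 0)   # image C: ~0.500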
However, I was hoping to ask the community if there are any really well-defined algorithms already out there that I simply don't know about, before I sit around tweaking my algorithm from now until sunset.
I also have date data for each vote - however, the content of the site isn't very time-sensitive so I don't really care to sort by "what's hot" at all.
Sorting by the average of votes is not very good.
By instead balancing the proportion of positive ratings against the uncertainty that comes with a small number of observations, as explained in the article below, you get a much better representation of your scores.
The article explains how not to make the same mistake that many popular websites (Amazon, Urban Dictionary, etc.) make.
http://evanmiller.org/how-not-to-sort-by-average-rating.html
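For reference, the score recommended there is the lower bound of the Wilson score confidence interval on the proportion of positive ratings. A minimal R sketch (assuming a 95% confidence level):

# lower bound of the Wilson score interval for the proportion of upvotes
wilson_lower_bound <- function(up, down, confidence = 0.95) {
  n <- up + down
  if (n == 0) return(0)
  z <- qnorm(1 - (1 - confidence) / 2)
  phat <- up / n
  (phat + z^2 / (2 * n) - z * sqrt((phat * (1 - phat) + z^2 / (4 * n)) / n)) /
    (1 + z^2 / n)
}
wilson_lower_bound(4, 1)   # image A: ~0.38
wilson_lower_bound(5, 4)   # image B: ~0.27
wilson_lower_bound(1, 0)   # image C: ~0.21

On the example in the question this gives A > B > C, which is exactly the ordering asked for.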
Hope this helps!
I know this doesn't answer your question, but I just spent 3 minutes for fun trying to find some formula of my own... just check it :) Column A is upvotes and column B is downvotes:
=(LN((A1+1)/(A1+B1+1))+1)*LN(A1)
upvotes  downvotes  score
      5          3  0.956866995
      4          1  1.133543015
      5          4  0.787295787
      1          0  0
      6          4  0.981910844
      2          8  -0.207447157
      6          5  0.826007385
      3          3  0.483811507
      4          0  1.386294361
      5          0  1.609437912
      6          1  1.552503332
      5          2  1.146431478
    100        100  -3.020151034
     10         10  0.813671022