Graph design (for Neo4j) for sports tournaments

I want to use a graph database for a web application that tracks the players, matches, and leagues for a given sport, say volleyball. Below is the first-level model I came up with. I would like to support the following statistics in this web application:
Player
Show all the leagues played by a player.
Show all the matches played by a player in each league.
Player's current team and his previous teams.
How many times the player was a captain and all the leagues for which he was the captain.
Team
All leagues played by a team.
How many times the team was a winner or runner-up.

Your model looks good; however, after looking at your use cases, I have a few questions/suggestions:
Query
I'll give these in Cypher, as it's the easiest way to show them.
Player
Show all the leagues played by a player.
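// Assumes Player nodes have a unique `name` property, passed in as $playerName.
// Note the colon: [:PLAYED] matches the relationship type, whereas [PLAYED] only names a variable.
MATCH (player:Player {name: $playerName})-[:PLAYED]->(m)-[:PART_OF]->(league)
RETURN DISTINCT league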
Show all the matches played by a player in each league.
// Group by league and collect that league's matches for the player
MATCH (player:Player {name: $playerName})-[:PLAYED]->(m)-[:PART_OF]->(league)
RETURN league, collect(m) AS matches
Player's current team and his previous teams.
MATCH (player:Player {name: $playerName})-[:BELONGED_TO]->(team)
RETURN team
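To make the grouping in the second query concrete (matches collected under each league, not leagues collected under each match), here is a minimal in-memory sketch. It is not a Neo4j driver call: the graph is assumed to be a plain list of hypothetical `(start, REL_TYPE, end)` triples, and the player, match and league names are made up.

```python
from collections import defaultdict

# Hypothetical stand-in for the graph: (start, REL_TYPE, end) triples.
# Relationship types follow the Cypher above; the data itself is made up.
EDGES = [
    ("alice", "PLAYED", "match1"),
    ("alice", "PLAYED", "match2"),
    ("alice", "PLAYED", "match3"),
    ("match1", "PART_OF", "league2012"),
    ("match2", "PART_OF", "league2012"),
    ("match3", "PART_OF", "league2013"),
    ("alice", "BELONGED_TO", "sharks"),
]

def neighbours(node, rel_type):
    """Follow outgoing relationships of one type from a node."""
    return [end for start, rel, end in EDGES if start == node and rel == rel_type]

def matches_by_league(player):
    """Group the player's matches under the league each match is PART_OF."""
    grouped = defaultdict(list)
    for match in neighbours(player, "PLAYED"):
        for league in neighbours(match, "PART_OF"):
            grouped[league].append(match)
    return dict(grouped)

def leagues_played(player):
    """Distinct leagues: simply the keys of the grouping above."""
    return sorted(matches_by_league(player))

print(leagues_played("alice"))     # ['league2012', 'league2013']
print(matches_by_league("alice"))  # {'league2012': ['match1', 'match2'], 'league2013': ['match3']}
```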
How many times the player was a captain and all the leagues for which he was the captain.
How do you determine if they were a captain of a league?
Team
How many times the team was a winner or runner-up.
You might want to model this as a relationship such as (match)-[:WINNER]->(team); that way, to find out how many wins your team has, all you have to do is count the WINNER relationships.
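A minimal sketch of the counting idea, using plain Python instead of a database. The edge list is made up, and the `RUNNER_UP` relationship type is an assumption added here to cover the runner-up half of the question; it is not part of the original model.

```python
from collections import Counter

# Hypothetical (match)-[:WINNER]->(team) and (match)-[:RUNNER_UP]->(team) edges.
# RUNNER_UP is an assumed extra relationship type; the data is made up.
EDGES = [
    ("final2012", "WINNER", "sharks"),
    ("final2012", "RUNNER_UP", "eagles"),
    ("final2013", "WINNER", "eagles"),
    ("final2013", "RUNNER_UP", "sharks"),
    ("final2014", "WINNER", "sharks"),
]

def count_by_team(rel_type):
    """Count how many relationships of the given type point at each team."""
    return Counter(team for _, rel, team in EDGES if rel == rel_type)

wins = count_by_team("WINNER")
runner_ups = count_by_team("RUNNER_UP")
print(wins["sharks"], runner_ups["sharks"])  # 2 1
```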
Data Model
Add a date-played property to the Match node. I'm unfamiliar with the sport, but a year alone may not be enough if players can switch teams within a year. However, Neo4j doesn't really have a good method for dealing with time, other than a 'seconds since epoch' type system.
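As a rough sketch of how a date-played property in epoch seconds could answer "current team and previous teams", the example below treats the team of the player's most recent match as current. It assumes, hypothetically, that we know which team the player represented in each match (for example via a property on PLAYED); the data is made up, and both teams appear in 2012 to show why the year alone is not enough.

```python
from datetime import datetime, timezone

def epoch_seconds(year, month, day):
    """Encode a calendar date as seconds since the Unix epoch (UTC)."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())

# Hypothetical data: 'played_on' is the Match date in epoch seconds and 'team'
# is the team the player represented in that match (an assumption).
player_matches = [
    {"match": "m1", "played_on": epoch_seconds(2012, 3, 10), "team": "sharks"},
    {"match": "m2", "played_on": epoch_seconds(2012, 9, 2), "team": "eagles"},
    {"match": "m3", "played_on": epoch_seconds(2012, 5, 20), "team": "sharks"},
    {"match": "m4", "played_on": epoch_seconds(2013, 1, 5), "team": "eagles"},
]

def current_and_previous_teams(matches):
    """Team of the most recent match is 'current'; other teams are 'previous', newest first."""
    ordered = sorted(matches, key=lambda m: m["played_on"], reverse=True)
    current = ordered[0]["team"]
    previous = []
    for m in ordered:
        if m["team"] != current and m["team"] not in previous:
            previous.append(m["team"])
    return current, previous

print(current_and_previous_teams(player_matches))  # ('eagles', ['sharks'])
```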

Related

Change google form questions based on spreadsheet and other questions

I'm currently working on some code for a Google Form for a family football pool we do every year. I have it set up so that the user selects their name from a list of competitors and then selects the team they want to pick that week. Each player can pick a given team only once, and no more than 8 players can select any given team in one week.
Is there a way to gray out teams based on the teams a player has already picked? I have a spreadsheet with everyone's name and the teams they picked, which I manually update based on last week's results. Each player is listed in a row, with all the teams they picked in the cells next to their name.
Also how do I gray out teams that have already been picked by 8 players that week?
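The two rules above reduce to a simple filter. The sketch below shows only that logic in plain Python; it does not touch Google Forms or Apps Script, and the function name, argument shapes and sample data are hypothetical (the spreadsheet row per player is modelled as a list of past picks).

```python
from collections import Counter
from typing import Dict, List

MAX_PICKS_PER_TEAM = 8  # weekly limit from the question

def available_teams(
    player: str,
    all_teams: List[str],
    past_picks: Dict[str, List[str]],
    this_week: Dict[str, str],
) -> List[str]:
    """Teams a player may still choose this week.

    past_picks: player -> teams picked in earlier weeks (one spreadsheet row each)
    this_week:  player -> team already submitted this week
    """
    used_by_player = set(past_picks.get(player, []))
    picks_this_week = Counter(this_week.values())
    return [
        team
        for team in all_teams
        if team not in used_by_player and picks_this_week[team] < MAX_PICKS_PER_TEAM
    ]

teams = ["Bears", "Packers", "Lions", "Vikings"]
past = {"Ann": ["Bears"], "Bob": ["Lions"]}
week = {f"player{i}": "Packers" for i in range(8)}  # Packers already taken 8 times
print(available_teams("Ann", teams, past, week))  # ['Lions', 'Vikings']
```

The remaining teams are the ones to leave selectable; everything else would be grayed out or removed from the choice list.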

Concur / Cognos Report Studio - Create Report of Travel Destination Countries

This question is very specific to the Concur implementation of the IBM Cognos Report Studio tool, since it primarily focuses on the data model used therein.
It contains business travel expense information, including the travel itineraries, which are the main source for this report:
Itinerary Example
My goal now is to create a report showing which destination countries the employees traveled to (all countries, if more than one country was visited in a single itinerary), how many single business trips were taken to that destination country (or set of countries), the average duration of those trips, and the total number of trips. If possible, the duration of stay by country would be great, but I have no idea how to go about that.
Mockup
Using a repeater based on the field [Arrival Country] from the itinerary, I managed to get something that looks like what I am trying to achieve, but it somehow does not include the home country once I cut out the other identifying columns (Itinerary Key, Departure Country, Arrival Country).
I then did a count(distinct [Itinerary Key] for [Arrival Country Repeater]), which gave me numbers, but I am not really sure they are correct in this case.
Repeater
Also, as soon as I add query calculations to include the average duration, the repeater fields go blank.
Is there another way to get the report i want to build?
Are there major flaws with my attempt?
Thanks a ton for any and all suggestions!
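One way to check whether the repeater numbers are plausible is to compute the same aggregation outside Cognos. The plain-Python sketch below is not Report Studio syntax; it assumes each itinerary leg is a row of (Itinerary Key, Arrival Country, trip length in days), with the trip length repeated on every leg, and all data is made up. The key point is that each itinerary is counted and averaged once per country, not once per leg.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Made-up itinerary legs: (Itinerary Key, Arrival Country, trip length in days).
# The trip length belongs to the whole itinerary, so it repeats on every leg.
LEGS: List[Tuple[str, str, int]] = [
    ("I1", "France", 5),
    ("I1", "Germany", 5),
    ("I2", "France", 2),
    ("I3", "Japan", 9),
    ("I3", "Japan", 9),  # second leg into the same country, same trip
]

def destination_summary(legs) -> Dict[str, Tuple[int, float]]:
    """Per arrival country: (distinct itineraries, average trip length in days).

    Each itinerary counts once per country, mirroring a
    count(distinct [Itinerary Key]) per [Arrival Country].
    """
    trips: Dict[str, Dict[str, int]] = defaultdict(dict)
    for itinerary, country, days in legs:
        trips[country][itinerary] = days  # repeated legs collapse onto one key
    return {
        country: (len(by_trip), sum(by_trip.values()) / len(by_trip))
        for country, by_trip in sorted(trips.items())
    }

total_trips = len({itinerary for itinerary, _, _ in LEGS})
print(destination_summary(LEGS))  # {'France': (2, 3.5), 'Germany': (1, 5.0), 'Japan': (1, 9.0)}
print(total_trips)                # 3
```

Averaging over raw legs instead would give multi-leg trips extra weight, which is one possible source of numbers that look wrong.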

Graph data modeling assistance relating to soccer matches

I am trying to model soccer matches and the referees and teams that play in them. I want to create nodes for matches, referees, and players, but I am not clear on the best approach to modeling them. Should I organize the model around cities, or around matches? Do I create a root node, IDs, etc.?
The kind of information I would be looking for later would be things like:
1) Show all the matches for a particular referee (could be in multiple cities).
2) Show all matches where the referee worked and the home team won.
3) Show all referees that have the highest count of wins for the home team.
4) Show the most active referees in a particular city.
As you can see, there are all sorts of questions, and for someone new this can be a little overwhelming. While I am reading some books, I wanted to see if any experts could help me with the scenario above. Again, I'm not sure whether I need a root node that connects all the cities, referees, and matches, or whether I should just keep things independent. Your feedback would be most appreciated.
One possible model that, at the moment, seems to satisfy the queries you've posted:
(Team)-[:PLAYS]->(Match)
(Match)-[:HAS_REFEREE]->(Referee)
(Match)-[:PLAYED_IN]->(City)
The PLAYS relation could have a property to indicate if the team was the home team. You could also have a property on the PLAYS relation to indicate whether that team won or not. Or if winning is a big part of what you're looking for, you can create an extra relation such as
(Team)-[:WON]->(Match) (though then you need to think about how to model draws. The absence of a WON relation on either of the two teams for a match could indicate a draw maybe).
1) All matches for a particular referee: start at the referee and traverse through the Match nodes to the Cities. You might index some unique property of the referee to be able to look him up quickly.
2) All matches where the referee worked and the home team won: start at the referee, find all his matches, and filter on the WON relation/property and the home-team property.
3) All referees that have the highest count of wins for the home team: same as above, but start at all referees.
4) Most active referees for a city: start at the city and find all matches and their referees.
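The four queries can be prototyped against a tiny in-memory version of the model before loading anything into Neo4j. In this sketch each Match record flattens its relationships into fields (PLAYS with a home flag becomes `home`/`away`, WON becomes `winner`, with `None` for a draw); the field names and sample data are hypothetical, and a flat scan stands in for the graph traversals described above.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class Match:
    """A Match node with its relationships flattened into fields (hypothetical names).

    home/away: the two (Team)-[:PLAYS]->(Match) edges, home flagged by a property
    winner:    the team with a (Team)-[:WON]->(Match) edge, or None for a draw
    referee:   (Match)-[:HAS_REFEREE]->(Referee)
    city:      (Match)-[:PLAYED_IN]->(City)
    """
    match_id: str
    home: str
    away: str
    winner: Optional[str]
    referee: str
    city: str

# Made-up sample data.
MATCHES = [
    Match("m1", "Reds", "Blues", "Reds", "ref_a", "CityX"),
    Match("m2", "Blues", "Reds", None, "ref_a", "CityY"),
    Match("m3", "Reds", "Greens", "Reds", "ref_b", "CityX"),
    Match("m4", "Greens", "Blues", "Blues", "ref_a", "CityX"),
    Match("m5", "Blues", "Greens", "Blues", "ref_b", "CityY"),
]

def matches_for_referee(referee: str) -> List[str]:
    """1) All matches a referee worked, whatever the city."""
    return [m.match_id for m in MATCHES if m.referee == referee]

def home_wins_for_referee(referee: str) -> List[str]:
    """2) Matches the referee worked where the home team won."""
    return [m.match_id for m in MATCHES if m.referee == referee and m.winner == m.home]

def referees_by_home_wins() -> List[Tuple[str, int]]:
    """3) Referees ranked by number of home-team wins, highest first."""
    return Counter(m.referee for m in MATCHES if m.winner == m.home).most_common()

def most_active_referees(city: str) -> List[Tuple[str, int]]:
    """4) Referees ranked by number of matches worked in one city."""
    return Counter(m.referee for m in MATCHES if m.city == city).most_common()

print(matches_for_referee("ref_a"))    # ['m1', 'm2', 'm4']
print(home_wins_for_referee("ref_a"))  # ['m1']
print(referees_by_home_wins())         # [('ref_b', 2), ('ref_a', 1)]
print(most_active_referees("CityX"))   # [('ref_a', 2), ('ref_b', 1)]
```

A draw (m2) simply has no winner, which matches the idea of representing draws by the absence of a WON relation.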
You might move things around a bit depending on more questions that you want to answer (especially home team properties, WIN/LOSE relations or properties etc.)
And I don't think you need the root node at all. You can index all matches/cities/referees, etc., if you want to find all of them.
I've done some modelling of football/soccer matches which might be interesting to look at - http://staging.thinkingingraphs.com/
It's mostly the same as what Luanne said, although I've got specific relationship types indicating which team played at home and which played away. I've been writing up what I discovered while building out the model here as well - http://www.markhneedham.com/blog/tag/neo4j/page/2/

Game story where a lower score is better

My company is producing a racing game where the best score is the fastest time. Facebook publishes the time as a regular point score, where a higher score is better. This of course is turning it all upside down.
Is there a way to control how a game's score is shown in a story? Ideally we would like to show "seconds" instead of points as well.
No, the Scores API currently only supports 'higher is better' for scores.
If you can't rework your scoring scheme to take this into account, consider using Open Graph actions instead: you can have the aggregations that appear on a user's Timeline ordered by whichever field of the object and action you need them to be ordered by.
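If you do rework the scoring scheme, one common approach is to publish a number that grows as the race time shrinks, for example by subtracting the time from a fixed cap. The sketch below assumes a hypothetical 10-minute cap (`MAX_TIME_MS`) and times in milliseconds; the trade-off is that the published value reads as points, not seconds.

```python
# Hypothetical cap: any finish at or slower than this maps to a score of 0.
MAX_TIME_MS = 600_000  # 10 minutes, an assumed upper bound for one race

def time_to_score(race_time_ms: int) -> int:
    """Map a race time to an integer score where faster (lower) times score higher."""
    return max(0, MAX_TIME_MS - race_time_ms)

print(time_to_score(95_250))   # 504750
print(time_to_score(100_000))  # 500000, so the faster 95.25 s run ranks higher
```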

Determining the popularity of a video with ratings and views

I am about to embark on a new project - a video website. Users will be able to register, and vote on videos by clicking "like" or "dislike", or something to that effect. In any event, it will be a 2-option voting system, not a 5-star system.
Every X number of days, I will be generating a "chart" of the most popular videos. So my question is: how should I determine the popularity of a given video?
If I went the route of tallying up the videos with the most views, this could have the effect of exceptionally bad videos making it to the top of the charts (just because they're so bad).
If I go the route of a scoring system based on the number of "like" and "dislike" votes (e.g., 100 like votes and 50 dislike votes equal a score of 2), videos with few views could appear at the top of the charts.
So, what I need to do is a combination of the two. Barring, of course, spammy views and votes.
What are you guys' thoughts on the subject?
Edit: the following tags were removed: [mysql] [postgresql], to make room for other, more representative tags; the SQL technology used in the intended implementation does not seem to bear much on the considerations regarding the rating model per se.
You seem to be missing the point that likes and dislikes of movies are anything but objective, even within a relatively homogeneous group of "voters". Think how the term "Chix Flix" or the success story called Netflix illustrate this subjectivity...
Yet, if you persist in implementing the model you suggest, there are several hidden variables and system dynamics that need to be acknowledged and possibly taken into account in the rating formula.
the existence of a third, implicit, value of the vote: "No vote"
i.e., when someone views the movie page and yet doesn't vote either way.
The problem with this extra value is its ambiguity: do people not vote because they didn't see the movie, or because they neither truly liked nor disliked it? Very likely a bit of both; therefore we can (and should) use the count of "page views without a vote" in the formula to boost (somewhat) the rating of movies that do not generate a strong positive or negative sentiment (lest the "polarizing" movies appear more notorious or popular).
the bandwagon effect
Past a certain threshold, and particularly if the rating and/or vote counts are visible before the page view, the rating and vote counts can influence the way people decide to vote (either way), or even whether they vote at all. The implication is that the total vote and/or view counts do not relate linearly to the effective rating.
"quality" vs. "notoriety"
Vote ratios in general (e.g. "likes"/"total" or "likes"/"dislikes") are indicative of the "quality" of a movie (note the quotes around quality...), whereas the number of votes (and of views) is indicative of the notoriety ("name recognition" etc.) of a movie.
statistical representativity
Very small vote and/or view counts must be handled carefully because they introduce a lot of volatility into the rating. Put another way, small samples make for ratings that are not very statistically representative.
trends (the time variable)
At the risk of complicating the model, consider keeping [some] record of when votes/views happened, to allow identifying "hot" (and "cooling") movies in the collection. This info may inform the rating logic, but may also be used to direct users towards currently hot items. Incidentally, this feeds the bandwagon effect mentioned above :-( but also increases the voting sample size :-).
All these considerations suggest caution in implementing this rating system. They also hint at the likely need to include statistics about the complete set of movies in the rating formula for an individual movie. In other words, do not rate a given movie solely on the basis of its own vote/view counts, but also on, say, the average vote count a movie receives, the maximum views a movie page gets, etc. In fact, an iterative process, whereby movies are [roughly] ranked at first and then the ranking is recalculated using the statistics of groups of similarly rated movies, may provide a better system (provided the formulas are "fair" and somehow converge).
A standard trick is to start with a neutral baseline: say, 10 likes and 10 dislikes, which gives a score of 1. The first few votes don't change the ratio much, but as votes accumulate, the baseline is overwhelmed. The exact choice of baseline values (the two don't have to be equal) will influence both the rating of a new movie and how many votes are needed to change the rating substantially.
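A very small sketch of the kind of bookkeeping that makes "hot" and "cooling" detectable: keep vote timestamps and compare the latest window with the one before it. The 7-day window and the three labels are assumptions for illustration, not part of any particular rating system.

```python
from datetime import datetime, timedelta
from typing import Iterable

def trend(vote_times: Iterable[datetime], now: datetime,
          window: timedelta = timedelta(days=7)) -> str:
    """Label an item 'hot', 'cooling' or 'steady' by comparing two windows.

    recent:   votes in (now - window, now]
    previous: votes in (now - 2*window, now - window]
    """
    times = list(vote_times)
    recent = sum(1 for t in times if now - window < t <= now)
    previous = sum(1 for t in times if now - 2 * window < t <= now - window)
    if recent > previous:
        return "hot"
    if recent < previous:
        return "cooling"
    return "steady"

now = datetime(2013, 1, 15)
votes = [now - timedelta(days=d) for d in (1, 2, 3, 10)]
print(trend(votes, now))  # hot (3 recent votes vs 1 in the previous week)
```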
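The baseline trick amounts to adding pseudo-votes to both counts before taking the ratio. This sketch uses the likes/dislikes ratio from the question and the 10/10 baseline from the answer; the function name is illustrative.

```python
def baseline_score(likes: int, dislikes: int,
                   base_likes: float = 10, base_dislikes: float = 10) -> float:
    """Likes/dislikes ratio with a neutral baseline of pseudo-votes added to both counts."""
    return (likes + base_likes) / (dislikes + base_dislikes)

print(baseline_score(0, 0))  # 1.0 (no votes yet: neutral)
print(baseline_score(2, 0))  # 1.2 (the raw ratio would divide by zero)
print(baseline_score(100, 50))
print(baseline_score(1000, 500))  # closer to the raw ratio of 2 as votes accumulate
```

Raising `base_likes` and `base_dislikes` makes new videos start more firmly at the neutral score and requires more real votes before they can climb the chart.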
