When making ERDs, does a recursive relationship always have to link N to N, or can it be N to 0 or 1? - erd

The title pretty much sums it up; I need help with this to continue the ERD for my class. The instructions the teacher gave were:
It is intended to keep information about the albums (title, year, comment, compilation) being relevant to keep the respective nationality, musical genre, as well as the artists who participate in it.
• Artists can be a person or a group and are characterized by a name and type of artist. It is also important to know who the artists that make up the group are.
• The artists' participation in the album implies the indication if the participation is the main role or not, as well as the type of intervention (instrument) that each artist has in the album.
I currently have it linked like this
https://gyazo.com/1b1a8ef130906c875e888ec0eb46bf09
As per the instructions I have the main role part, but I thought it could be a relation instead of an entity and would link back to the artist N to 0 or 1...
Thanks for your time

To answer your question simply: no, it doesn't have to be N..N.
The idea of a recursive relationship is that you can 'bundle' entities of the same structure that interact with each other, which simplifies the ERD.
E.g. the Employee vs Supervisor example :)
If it could only be N..N,
N Employees would always be overseen by N Supervisors.
But in most cases N Employees are overseen by 1 Supervisor, who is also an Employee. Therefore you can write your recursive relationship on the ERD as N..X, where X is whatever cardinality fits (N..1 here, or N to 0 or 1 if some employees have no supervisor).
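To make the Employee/Supervisor case concrete, here is a minimal Python sketch of an N to 0-or-1 recursive relationship. The class, field, and function names (and the sample people) are only illustrative assumptions, not part of any ERD notation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Employee:
    name: str
    # Recursive (self-referencing) relationship: a supervisor is itself an Employee.
    # Optional models the "0 or 1" side; many employees may share one supervisor (the N side).
    supervisor: Optional["Employee"] = None


def reports_of(supervisor, employees):
    """Return the names of everyone directly overseen by the given supervisor."""
    return [e.name for e in employees if e.supervisor is supervisor]


# Hypothetical sample data: Ana supervises Ben and Cy and has no supervisor herself.
boss = Employee("Ana")
staff = [Employee("Ben", supervisor=boss), Employee("Cy", supervisor=boss)]

print(reports_of(boss, staff + [boss]))
```

The same shape would fit the album exercise: a group and its members are both artists, so group membership can point back at the artist entity.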

Related

modeling scenario with mostly semi-additive facts

I'm learning dimensional modeling and I'm trying to create a model. I was thinking about a social media platform which rates hotels. The platform has the following data:
hotel information: name and address
a user can rate hotels (1-5 points)
a user can write comments
platform stores the date of the comments
hotel can answer via comment and it stores the date of it
the platform stores the total number of each rating level (i.e.: all rates with 1 point, all rates with 2 point etc.)
platform stores information of the user: sex, name, total number of votes he/she made and address
First, I tried to define which information belongs to a dimension or fact table
(here I also checked which one is additive/semi additive/non-additive)
I realized my example is kind of difficult, because it’s hard to decide if it belongs to a fact table or dimension.
I would like to hear some advice. Would someone agree with my model?
This is how I would model it:
Hotel information -> hotel dimension
User rating -> additive fact – because I can aggregate them with all dimensions
User comment -> semi-additive? – because I can aggregate them with the date dimension (I don't know if my argument is correct, but I know I would have new comments every day, which for me is a reason to store it in a fact table)
Answer as comment -> same handling like with the user comments
Date of comment-> dimension
Total Number of all votes (1/2/3/4/5) -> semi-additive facts – it makes no sense to aggregate them, since they are already totals, but I could get the average
User information sex and name, address -> user-dimension
User Information: total number of votes -> could be dimension or fact. It depends on how often it changes. If it changes often, I store it in a fact. If it's not that often, then a dimension
I still have some questions; I hope someone can help me:
1st Question: should I create two date dimensions, or can I store both dates in one date dimension?
2nd Question: each user and hotel has just one address. Are there arguments for separating the address into its own dimension? Can I create a 1:1 relationship between a user dimension and an address dimension?
Your model looks well considered, but here are some thoughts:
User comment (and answers to comments): they are an event to be captured (with new ones each day, as you mention) so are factual, with dimensionality of the commenter, type of comment, date, and the measure is at least a 'count' which is additive. But you don't want to store big text in a fact so you would put that in a dimension by itself which is 1:1 with the fact, for situations where you need to query on the comment itself.
Total Number of all votes (1/2/3/4/5) are, as you say, already aggregates, mostly for performance. Totals should be easy from the raw data itself so maybe not worthwhile to store them at all. You might also consider updating the hotel dimension with columns (hotel A has 5 '1' votes and 4 '2' votes) that you'd update as you go on, for easy filtering and categorisation.
User Information: total number of votes: it is factual information about a user (dimension) and it depends on whether you always just want to 'find it out' about a person or whether you are likely to use it to filter other information (i.e. show me all reviews for users who have made 10-20 votes). In that case you might store the total in the user dimension (and/or a banding, like 'number of reviews range' with 10-20, 20-30). You can update dimensions often if you need to, but you're right, it could still just live as a fact only.
As for date dimensions, if the 'grain' is 'day' then you only need one dimension, that you refer to from multiple facts.
As for addresses, you're right that there are arguments on both sides! Many people separate addresses into their own dimension, referred to from the other dimensions that use them. Kimball suggests you can do that behind the scenes if necessary, but prefers each dimension to have its own set of address columns (but modelled as consistently as possible).
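As a rough illustration of the layout discussed above, here is a small sketch using Python's built-in sqlite3 module. All table and column names are assumptions made up for this example: a single day-grain date dimension shared by both facts, address columns kept inside each dimension, comment text in its own dimension, and the per-level vote totals derived from the raw rating rows instead of being stored.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_date  (date_key INTEGER PRIMARY KEY, full_date TEXT);
CREATE TABLE dim_hotel (hotel_key INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE dim_user  (user_key INTEGER PRIMARY KEY, name TEXT, sex TEXT, address TEXT);
-- Comment text lives in its own dimension, 1:1 with the comment fact.
CREATE TABLE dim_comment_text (comment_key INTEGER PRIMARY KEY, body TEXT);
-- One row per rating event.
CREATE TABLE fact_rating (
    date_key  INTEGER REFERENCES dim_date(date_key),
    hotel_key INTEGER REFERENCES dim_hotel(hotel_key),
    user_key  INTEGER REFERENCES dim_user(user_key),
    points    INTEGER CHECK (points BETWEEN 1 AND 5));
-- One row per comment event; comment_count is the additive measure.
CREATE TABLE fact_comment (
    date_key    INTEGER REFERENCES dim_date(date_key),
    hotel_key   INTEGER REFERENCES dim_hotel(hotel_key),
    user_key    INTEGER REFERENCES dim_user(user_key),
    comment_key INTEGER REFERENCES dim_comment_text(comment_key),
    comment_count INTEGER DEFAULT 1);
""")

# Hypothetical sample rows.
con.execute("INSERT INTO dim_date VALUES (20240101, '2024-01-01')")
con.execute("INSERT INTO dim_hotel VALUES (1, 'Hotel A', 'Main St 1')")
con.executemany("INSERT INTO dim_user VALUES (?, ?, ?, ?)",
                [(1, 'u1', 'f', 'Addr 1'), (2, 'u2', 'm', 'Addr 2'), (3, 'u3', 'f', 'Addr 3')])
con.executemany("INSERT INTO fact_rating VALUES (20240101, 1, ?, ?)",
                [(1, 5), (2, 5), (3, 3)])

# Totals per rating level, computed from the raw facts rather than stored.
per_level = con.execute(
    "SELECT points, COUNT(*) FROM fact_rating "
    "WHERE hotel_key = 1 GROUP BY points ORDER BY points").fetchall()
print(per_level)
```

If the per-level counts are needed for filtering, they could still be copied onto the hotel dimension as extra columns, as suggested above.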

Users, Roles, and Security Groups Management - How to Set up a Downline in SuiteCRM

SuiteCRM 7.5.1 - In Reference to using Users, Roles, and the Security Groups within SuiteCRM specifically.
So, I have a specific setup and I've looked through and read lots of documentation and tried my best to wrap my head around how SuiteCRM does this.
How would one correctly implement the following scenario?:
Let's say I have a tree like so:
We'll number these rows for the sake of understanding: 1, 2, 3 and 4. Then we have Administrators who are employees to throw into the mix.
Administrators can work with almost all records, except that they cannot work with workflows, mess with code, or mess with a few custom modules. Outside of that, they have very few restrictions and don't obey any of the rules of the downline.
Then we follow the downline:
Person 1's can see all Person 2's, 3's, and 4's that are specifically within their downline and within their Territory. They cannot see any other Person 1's period. They cannot see any 2's, 3's, and 4's that aren't within their downline or their Territory. They also cannot see Administrators or anything assigned to them.
Person 2's can see all Person 3's and 4's within their specific Downline and Territory, They cannot see any Person 1's or 2's period. They cannot see any Person 3's or 4's outside of their Territory or Downline. They also cannot see Administrators or anything assigned to them.
Person 3's can see all 4's within their specific Downline and Territory, They cannot see any Person 1's, 2's, or other 3's period. They cannot see any Person 4's outside of their Territory or Downline. They also cannot see Administrators or anything assigned to them.
Person 4's can see only records assigned to them.
In this example the hierarchy is only 4 deep; in the real world, it is actually 12 deep, plus administrators, plus me, the Super Admin.
How can I go about resolving this?
I wrote SecuritySuite and what you need is fairly typical. There can be a large learning curve for figuring this out so I wrote up an example setup for a 3 deep hierarchy here to try to help with that a bit: https://www.sugaroutfitters.com/docs/securitysuite/example-of-a-typical-setup.
Your example is a 4 deep hierarchy, but it's fairly similar. The key is to create groups for the lowest level. In your case, this would be at the person 4 levels. So person 4a, 4b, 4c would all be in Group A. A role with Owner only rights would be assigned directly to Group A so that 4a/4b/4c could only access their own records.
Person 3a would be in Group A, but a "Manager" role would be created with Group access and assigned directly to person 3a. Person 3a's Group A membership would be marked as non-inheritable so that when person 3a creates a record Group A wouldn't be assigned to it directly. Person 3a would also be in Group AA along with person 3b/3c/3d (according to the picture above).
Person 2b (2nd person in the 2nd tier of the image above) would be in Group A and Group AA, both marked as non-inheritable. Person 2b would have the "Manager" role assigned directly.
Person 1 would have a role assigned directly with "All" access as this person can see everyone.

Formula for Smiley rating system

I'm using the smiley rating system in my website: Like (Smile face), Dislike (Sad face), and Neutral (Straight face).
I used the formula below for evaluating the overall rating as a smiley and not as a number.
Let x= number of likes - number of dislikes
If x<0, then rating = dislike
If x>0 & x> number of apathetic, then rating = like
If x>0 & x<number of apathetic, then rating = apathetic
How can I calculate the average rating number/score in such rating system?
Note: I need the score in order to sort the highest-rated stuff at the top and lowest-rated at the bottom
I am not sure if you are using object orientated coding, but either way, if you have 'something' to vote on, I would guess that you also have other information on that 'something', which is stored either in a database or some other medium.
That being said, you can simply store the votes, as an integer, with the object being voted on.
Example:
You have a posting site, or a blog where users can vote on posts made by other users.
Each post would be stored with information like:
Who posted it
The date
Details of the post
With that information you can also store the upvotes and downvotes for it.
Or if you wanted to take it further you could create a vote object and a relationship to the post if you wanted to store more than just the amount of upvotes and downvotes (additional information like who voted or the date they voted, if for instance you did not want people to vote on the same thing twice).
Something else to keep in mind: always keep track of all the upvotes and downvotes separately and use logic to determine the display. Do not just keep a single integer and add to or subtract from it. A post with 1 upvote and 1 downvote = 0 score is not the same as a post with 1000 upvotes and 1000 downvotes = 0 score, as the latter got 2000 votes, which on its own should count for something.
The correct way to do this:
I know that you do not want to display amounts, and rather emoticons, but you should still store it in that way and render the display differently.
Create an additional table and call it something like vote
Store information like the voter, date, vote type
Use the table as the link in a many-to-many relationship between users and the things being voted on
Every time a vote is made, update the table
Calculate the difference between the ups and downs and display the correct smileys on your front-end, or display it as a number
The "I have a crappy client", "I'm lazy" or "I'm coding on 2 hours of sleep" way of doing it:
Create a single integer field with the object being voted on
Increase and decrease that as users vote
Hope I understood the question and this helps!
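A minimal Python sketch of that approach: keep the individual votes, tally them, derive the smiley with the rule from the question, and sort on a number. The function names and sample posts are made up for illustration. Two things here are assumptions, because the question does not specify them: ties (x = 0, or x equal to the neutral count) fall back to neutral, and the sort key is the net score with total votes as a tie-breaker.

```python
from collections import Counter


def tally(votes):
    """Count raw votes; each vote is 'like', 'dislike' or 'neutral'."""
    counts = Counter(votes)
    return counts["like"], counts["dislike"], counts["neutral"]


def smiley(likes, dislikes, neutral):
    """Apply the rule from the question to pick the smiley to display."""
    x = likes - dislikes
    if x < 0:
        return "dislike"
    if x > 0 and x > neutral:
        return "like"
    # Assumption: the ties the question leaves open are shown as neutral.
    return "neutral"


def sort_key(likes, dislikes, neutral):
    # Assumption: rank by net score, then by total number of votes.
    return (likes - dislikes, likes + dislikes + neutral)


# Hypothetical posts with their stored votes.
posts = {
    "a": ["like", "like", "like", "neutral"],
    "b": ["like", "dislike"],
    "c": ["dislike", "dislike"],
}

ranked = sorted(posts, key=lambda p: sort_key(*tally(posts[p])), reverse=True)
for p in ranked:
    print(p, smiley(*tally(posts[p])))
```

Because the counts are kept separately, the sort key can later be swapped for something smarter without touching the stored data.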

Graph data modeling assistance relating to soccer matches

I am trying to model soccer matches and the referees and teams that play in them. I want to create nodes based on matches, referees, and players, and am not clear on the best approach to model them. That is, should I model it around cities or matches? Do I create a root node ID, etc.?
The kind of information I would be looking for later would be stuff like:
1). Show all the matches for a particular referee (could be in multiple cities)
2). Show all matches where referee worked and home team won
3). Show all referees that have the highest count of wins for the home team
4). Show the most active referees in a particular city
As you can see there are all sorts of questions and for someone new this can be a little overwhelming. While I am reading some books, I wanted to see if any experts could help me in the scenario above. Again not sure if I need a root node that connects all the cities and referees and matches or just keep things independent. Your feedback would be most appreciated.
One of the possible models that at the moment seem to satisfy the queries you've posted:
(Team)-[:PLAYS]->(Match)
(Match)-[:HAS_REFEREE]->(Referee)
(Match)-[:PLAYED_IN]->(City)
The PLAYS relation could have a property to indicate if the team was the home team. You could also have a property on the PLAYS relation to indicate whether that team won or not. Or if winning is a big part of what you're looking for, you can create an extra relation such as
(Team)-[:WON]->(Match) (though then you need to think about how to model draws. The absence of a WON relation on either of the two teams for a match could indicate a draw maybe).
1) All matches for a particular referee: Start at the referee, traverse through the Match to the Cities. You might index some unique property of the referee to be able to look him up quickly
2) All matches where the referee worked and the home team won: Start at the referee, find all his matches, filter on the WON relation/property and the home team property
3) All referees that have the highest count of wins for the home team: Same as above, start at all referees
4) Most active referees for a city: Start at the city, find all matches and their referees
You might move things around a bit depending on more questions that you want to answer (especially home team properties, WIN/LOSE relations or properties etc.)
And I don't think you need the root node at all. You can index all matches/cities/referees etc if you want to find all of them
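To show how the first two traversals might look without committing to any particular graph database, here is a small in-memory sketch in plain Python. It is not Neo4j code: the edge tuples, team names, match ids, and helper names are all hypothetical, and the home/won flags are stored as properties on the PLAYS relation as described above.

```python
# Each tuple is (start node, relationship type, end node, properties).
edges = [
    ("Lions", "PLAYS", "m1", {"home": True, "won": True}),
    ("Tigers", "PLAYS", "m1", {"home": False, "won": False}),
    ("Bears", "PLAYS", "m2", {"home": True, "won": False}),
    ("Lions", "PLAYS", "m2", {"home": False, "won": True}),
    ("m1", "HAS_REFEREE", "Smith", {}),
    ("m2", "HAS_REFEREE", "Smith", {}),
    ("m1", "PLAYED_IN", "Lisbon", {}),
    ("m2", "PLAYED_IN", "Porto", {}),
]


def matches_for(referee):
    """Query 1: start at the referee and follow HAS_REFEREE back to the matches."""
    return [start for start, rel, end, _ in edges
            if rel == "HAS_REFEREE" and end == referee]


def home_wins_for(referee):
    """Query 2: of those matches, keep the ones where the home team won."""
    refereed = set(matches_for(referee))
    return [end for _, rel, end, props in edges
            if rel == "PLAYS" and end in refereed and props["home"] and props["won"]]


print(matches_for("Smith"))
print(home_wins_for("Smith"))
```

Queries 3 and 4 would follow the same pattern, counting results per referee instead of listing them. Note that no root node is needed; the lookups start directly from the referee.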
I've done some modelling of football/soccer matches which might be interesting to look at - http://staging.thinkingingraphs.com/
Mostly the same as what Luanne said although I've got specific relationship types indicating which team played at home and away. I've been writing up what I discovered while building out the model here as well - http://www.markhneedham.com/blog/tag/neo4j/page/2/

Determining the popularity of a video with ratings and views

I am about to embark on a new project - a video website. Users will be able to register, and vote on videos by clicking "like" or "dislike", or something to that effect. In any event, it will be a 2-option voting system, not a 5-star system.
Every X number of days, I will be generating a "chart" of the most popular videos. So my question is: how should I determine the popularity of a given video?
If I went the route of tallying up the videos with the most views, this could have the effect of exceptionally bad videos making it to the top of the charts (just because they're so bad).
If I go the route of a scoring system based on the amount of "like" and "dislike" votes (eg. 100 like votes, and 50 dislike votes equals a score of 2), videos with few views could appear on the top of the charts.
So, what I need to do is a combination of the two. Barring, of course, spammy views and votes.
What are your thoughts on the subject?
Edit: the following tags were removed: [mysql] [postgresql], to make room for other, more representative tags; the SQL technology used in the intended implementation does not seem to bear much on the considerations regarding the rating model per-se.
You seem to be missing the point that likes and dislikes in movies are anything but objective even within the context of a relatively homogeneous group of "voters". Think how the term "Chix Flix" or the success story called "NetFlix", illustrate this subjectivity...
Yet, if you persist in implementing the model you suggest, there are several hidden variables and system dynamics that need to be acknowledged and possibly taken into account in the rating's formula.
the existence of a third, implicit, value of the vote: "No vote"
i.e. when someone views the movie page and yet doesn't vote, either way.
The problem with this extra value is its ambiguity: do people not vote because they didn't see the movie, or because they neither truly liked nor disliked it? Very likely a bit of both; therefore we can/should use the count of "page views without vote" in the formula, to boost (somewhat) the rating of movies that do not generate a strong (positive or negative) sentiment (lest the "polarizing" movies appear more notorious or popular).
the bandwagon effect
Past a certain threshold, and particularly if the rating and/or vote counts are visible before the page view, the rating and vote counts can influence the way people decide to vote (either way) or even decide to abstain from voting. The implication is that the total vote and/or view counts do not relate linearly to the effective rating.
"quality" vs. "notoriety"
Vote ratios in general (e.g. "likes"/"total" or "likes"/"dislikes") are indicative of the "quality" of a movie (note the quotes around quality...), whereas the number of votes (and of views) is indicative of the notoriety ("name recognition" etc.) of a movie.
statistical representativity
Very small vote and/or view counts are to be handled carefully because they introduce much volatility in the rating. Phrased otherwise, small samples make for not so statistically representative ratings.
trends (the time variable)
At the risk of complicating the model, consider keeping [some] record of when votes/views happened, to allow identifying "hot" (and "cooling") movies in the collection. This info may inform the rating logic, but may also be used to direct users towards currently hot items. BTW, this feeds the bandwagon effect mentioned above :-( but also increases the voting sample size :-).
All these considerations suggest caution in implementing this rating system. They also hint at the likely need to include statistics about the complete set of movies in the rating formula for an individual movie. In other words, do not rate a given movie solely on the basis of its own vote/view counts but also on, say, the average vote count a movie receives, the maximum views a movie page gets, etc. In fact, an iterative process, whereby movies are [roughly] ranked at first and the ranking is then recalculated using the statistics of groups of similarly rated movies, may provide a better system (provided the formulas are "fair" and somehow converge).
A standard trick is to start with a neutral baseline: say 10 likes and 10 dislikes, which gives a score of 1. The first few votes don't change the ratio too much, but as votes accumulate, the baseline is overwhelmed. The exact choice of the baseline values will influence the rating of a new movie (the two values don't have to be equal), and how many votes are needed to change the rating substantially.
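A minimal Python sketch of that baseline trick, using the 10/10 values from the example above as defaults. The function and parameter names are made up for illustration, and the baseline values would need tuning for a real site.

```python
def smoothed_score(likes, dislikes, base_likes=10, base_dislikes=10):
    """Like/dislike ratio after adding a fixed baseline of pretend votes."""
    return (likes + base_likes) / (dislikes + base_dislikes)


# A brand-new video starts at the neutral score.
print(smoothed_score(0, 0))

# Two early likes barely move it (the raw ratio would divide by zero).
print(smoothed_score(2, 0))

# With many votes the baseline is overwhelmed and the score approaches the raw ratio of 2.
print(smoothed_score(100, 50))
print(smoothed_score(1000, 500))
```

With this in place, a video with 2 likes and no dislikes no longer outranks one with 100 likes and 50 dislikes.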
