In my OLTP database I have a layout consisting of instructors and students. Each student can be a student of any number of instructors. A student can also sign up for an instructor, but not necessarily book any tuition (lesson).
In a data warehouse, how best would this be modelled? If I create a dimension table for Lessons, Instructors and Students and a fact table for the lessons students have taken then this will work when an instructor wants to see what lessons a student has taken.
However, how will an instructor see how many students are REGISTERED with the instructor but have not yet taken a lesson?
In my OLTP, I have a many to many table (InstructorStudents) that links each student with one or more instructors. In an OLAP database, this isn't appropriate.
What would be the best schema in this case? Would a many to many be appropriate in this instance? I can't store a list of which students are registered to which instructors in the student table, so I feel another dimension table is necessary but cannot work out what should be contained in it.
If a fact represents a transaction, you seem to have two different facts here: Sign ups & Lessons. There are always a lot of ways to go but, perhaps, you need two fact tables. They may have similar dimensionality except the sign-up table will have a Class dimension (class name, instructor name, etc.). The Lessons table will tie to the class dimension but, also, to a Lesson dimension (date, classroom used, etc.).
There are a few other ways to do this but they will be more difficult from a programming & reporting perspective.
You need a many to many dimensional model.
You need a factless fact table. The following resource describes an example close to your need:
http://www.kimballgroup.com/1996/09/02/factless-fact-tables/
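To make the factless fact table idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (dim_instructor, dim_student, fact_registration, fact_lesson) and the sample rows are assumptions made up for illustration, not taken from the question's schema. The registration table carries only dimension keys and no measures, and the query counts students who signed up with an instructor but have no matching lesson row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_instructor (instructor_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_student (student_key INTEGER PRIMARY KEY, name TEXT);

-- Factless fact: one row per (instructor, student) sign-up, no measures.
CREATE TABLE fact_registration (
    instructor_key INTEGER REFERENCES dim_instructor,
    student_key INTEGER REFERENCES dim_student,
    PRIMARY KEY (instructor_key, student_key)
);

-- Regular fact: one row per lesson taken (hypothetical measure column).
CREATE TABLE fact_lesson (
    instructor_key INTEGER REFERENCES dim_instructor,
    student_key INTEGER REFERENCES dim_student,
    lesson_date TEXT,
    duration_minutes INTEGER
);

INSERT INTO dim_instructor VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO dim_student VALUES (10, 'Sam'), (11, 'Tia'), (12, 'Uma');
INSERT INTO fact_registration VALUES (1, 10), (1, 11), (2, 11), (2, 12);
INSERT INTO fact_lesson VALUES
    (1, 10, '2024-01-05', 60),
    (2, 11, '2024-01-06', 45);
""")


def registered_without_lesson(conn):
    """Count, per instructor, students who signed up but never took a lesson."""
    rows = conn.execute("""
        SELECT i.name, COUNT(*)
        FROM fact_registration r
        JOIN dim_instructor i ON i.instructor_key = r.instructor_key
        WHERE NOT EXISTS (
            SELECT 1 FROM fact_lesson l
            WHERE l.instructor_key = r.instructor_key
              AND l.student_key = r.student_key
        )
        GROUP BY i.name
        ORDER BY i.name
    """).fetchall()
    return dict(rows)


print(registered_without_lesson(conn))
```

The same two tables also cover the other answer's suggestion of separate sign-up and lesson facts; only the registration table is factless.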
We've been given an assignment in which we are to create a conceptual model, described by a text document. There are a number of constraints given in the document, but we have also been instructed not to use constraints in the model.
We have been able to work around a few constraints, but there is one that we've been unable to tackle. I've made up a scenario that is somewhat similar to the part of the assignment that we're having issues with.
You've been tasked to create a model of the structure of a game studio. The company consists of a number of departments, and each department has at least one employee. Each employee works at a single department. There are three different types of employees: developers, designers and engineers.
In addition to this, there are a number of leadership roles that employees can have: Head of Department, Deputy Head of Department, CTO or CEO (Yes, CTO and CEO are roles that regular employees have). Each department must have 1 Head of Department and at least one Deputy Head of Department.
In addition to this, there can only be one CTO and one CEO, and these roles can only be held by engineers. Each employee can only have a single leadership role.
To solve this, we've made up an additional, abstract entity: BasicRole. This entity is a specialisation of LeadershipRole, and is a generalisation of the three roles that any employee can hold. That solves one of the problems, and now we can simply create appropriate associations between Designer/Developer and BasicRole.
However, we also want Engineer to have an association with BasicRole in addition to associations to CEO and CTO. Adding those associations results in a conceptual model that looks as such:
However, this is problematic because now we're saying that an engineer can have anywhere between 0 and 3 roles.
We've considered including Company as an entity and adding associations between Company and CTO/CEO, to specify that way that the company can only have one of each, but we've been told over and over during this course not to include the thing that we're modeling as an entity in the model.
Now, it seems as if all our problems could be solved with constraints (if we were to go ahead and read up on those), with some sort of xor for the three associations. However, seeing as we've been instructed not to use constraints in the conceptual model, we're at a loss.
If you associate Engineer with LeadershipRole (with multiplicity 0..1) and remove the relationships from Engineer to CTO, CEO and LowerRole, you will get the expected result:
Each employee can only have a single leadership role.
Since LeadershipRole is abstract, it has to be one of its concrete specialisations (CEO, CTO, HeadOfDepartment or DeputyHOD), but due to the multiplicity it can't be more than one at the same time.
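As a rough code-level analogy of that association (not a replacement for the conceptual diagram), the sketch below models LeadershipRole as an abstract class and gives Engineer a single optional reference to it, which corresponds to the 0..1 multiplicity. The class names follow the answer; the title attribute and the sample names are assumptions added for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


class LeadershipRole(ABC):
    """Abstract role: only concrete specialisations can be instantiated."""

    @property
    @abstractmethod
    def title(self) -> str: ...


class CEO(LeadershipRole):
    title = "CEO"


class CTO(LeadershipRole):
    title = "CTO"


class HeadOfDepartment(LeadershipRole):
    title = "Head of Department"


class DeputyHOD(LeadershipRole):
    title = "Deputy Head of Department"


@dataclass
class Engineer:
    name: str
    # A single optional reference models multiplicity 0..1:
    # no role at all, or exactly one concrete LeadershipRole.
    role: Optional[LeadershipRole] = None


eve = Engineer("Eve", CTO())
bob = Engineer("Bob")
print(eve.role.title, bob.role)
```

Because there is only one role slot, an engineer cannot hold CEO and CTO at the same time. The "only one CTO and one CEO in the company" rule is not expressed here.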
The "we've been told over and over during this course not to include the thing that we're modelling as an entity in the model" advice is correct if you're designing code-level documentation, but it is normal to include an entity representing the whole organisation you're modelling. In other words, don't put System (or whatever you call your system) in your system's model. But Company is something you model within your system.
2 options for "Each department must have 1 Head of Department and at least one Deputy Head of Department."
Redefinition
Nested notation
2 options for "these roles can only be held by engineers"
Redefinition
Generalization
Total 4=2*2 options
I have designed my Firebase structure and I'm not sure if it is the right way.
Some background:
Each college has some departments.
Each department has many courses. A course can belong to several departments.
Each course has some lessons. A lesson can belong to only one course.
I have a courses node with all the course keys and their information.
I have a departments node with all the departments and their information.
I have Course_Departments and Departments_courses nodes.
In addition, I have courses_lectures and courses_lectures,
to show that each course is taught by several lecturers and each lecturer teaches several courses.
So my questions are:
1. How do I connect the lessons to these tables? For example, I want to find all the lessons of course_key1 that lecture_key2 teaches.
2. Can using many tables in this way make it take a long time to get the data?
For the UI, I don't want users to wait long for the data.
Looking at your design, I think, you are on the right track.
Here are some hints on what to take care of. When designing the structure of a Firebase database, the design rules of non-relational databases should be kept in mind.
One of them is denormalization. Keep the hierarchy flat! That's an important performance factor for data change listeners, since all subnodes are involved. That's what you've done so far.
Relationships between entities can be achieved by using the keys, exactly as you did in the Courses_Department node. The built-in key generation should be used, so that keys are universally unique.
Here's a good explanation when coming from relational databases.
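For the first question (all lessons of course_key1 taught by lecture_key2), one possible approach in the same key-based style is an extra index node keyed by course and then by lecturer. The sketch below uses a plain Python dict as a stand-in for the JSON tree and makes no Firebase SDK calls; the node name course_lecturer_lessons, the lesson fields and the sample keys are assumptions for illustration only.

```python
# A plain dict standing in for the Firebase JSON tree (no SDK involved).
db = {
    "courses": {
        "course_key1": {"name": "Algebra"},
        "course_key2": {"name": "Physics"},
    },
    "lecturers": {
        "lecture_key1": {"name": "Dana"},
        "lecture_key2": {"name": "Eli"},
    },
    "lessons": {
        "lesson_key1": {"title": "Vectors", "course": "course_key1"},
        "lesson_key2": {"title": "Matrices", "course": "course_key1"},
        "lesson_key3": {"title": "Determinants", "course": "course_key1"},
        "lesson_key4": {"title": "Kinematics", "course": "course_key2"},
    },
    # Hypothetical index node: course -> lecturer -> {lesson_key: True}.
    "course_lecturer_lessons": {
        "course_key1": {
            "lecture_key1": {"lesson_key2": True},
            "lecture_key2": {"lesson_key1": True, "lesson_key3": True},
        },
        "course_key2": {
            "lecture_key2": {"lesson_key4": True},
        },
    },
}


def lessons_for(db, course_key, lecturer_key):
    """Return titles of the lessons of one course taught by one lecturer."""
    index = db["course_lecturer_lessons"].get(course_key, {}).get(lecturer_key, {})
    # Each lookup is by key, so only the matching lessons are read.
    return sorted(db["lessons"][lesson_key]["title"] for lesson_key in index)


print(lessons_for(db, "course_key1", "lecture_key2"))
```

The trade-off is the usual one for denormalized data: reads touch only a small index node plus the matching lessons, while every write of a lesson has to update the index as well.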
Situation: In the app we have up to 1000 schools. Every school has students, and students have lessons and join events (and more). We need to query lessons per student, per school and per date, quickly and often. We have 2 designs in mind and are wondering which is the best way to proceed.
1 - design with dedicated school node
2 - design with no dedicated school node
Examples of two designs
PRO design 1
- root ref to school user after login, no need to query on school IDs
- no need to mention school IDs everywhere
- no need for lessons-per-school and events-per-school nodes
- rules on school level
...
PRO design 2
- flatter data, as widely advised on the internet
For most NoSQL database structures, flattening and denormalising data is the best method. And that is exactly the case with Firebase too.
When you flatten your data, you get the following advantages:
You mostly download only the minimum required amount of data. That leads to efficiency and cost-effectiveness.
Your downloads are much faster, especially compared to the likes of SQL join queries.
Having said that, in your particular case, I think that it really depends on how much the school affects the logged in user.
Suppose that a school is only an attribute for a student, and serves no other purpose, then the second database is the way to go. For example, if the books a student can get are independent of the school she goes to, then the second database style is more suited.
However, if a school categorises students into groups that define their interaction with the database, then the first database structure is the way to go. An example of this is that a student can only get a book when it's available in the school she goes to.
Regardless of your decision, I would like to commend you on the fact that you have flattened your database quite well in both your structures! And my personal suggestion would be to go with the one that is more convenient to code, read and maintain for you.
I have both problems and solutions to over twenty years of physics PhD qualifying exams that I would like to make more accessible, searchable, and useful.
The problems on the Quals are organized into several different categories. The first category is Undergraduate or Graduate problems. (The first day of the exam is Undergraduate, the second day is Graduate). Within those categories there are several subjects that are tested: Mechanics, Electricity & Magnetism, Statistical Mechanics, Quantum Mechanics, Mathematical Methods, and Miscellaneous. Other identifying features: Year, Season, and Problem number.
I'm specifically interested in designing a web-based database system that can store the problem and solution and all the identifying pieces of information in some way so that the following types of actions could be done.
Search and return all Electricity & Magnetism problems.
Search and return all graduate Statistical Mechanics problems.
Create a random qualifying exam, meaning a new 20 question test randomly picking 2 Undergrad Mechanics problems, 2 Undergrad E&M problems, etc. from past qualifying exams (over some restricted date range).
Have the option to hide or display the solutions on results.
Any suggestions or comments on how best to do this project would be greatly appreciated!
I've written up some more details here if you're interested.
For your situation, it seems that implementing the interface matters more than the data storage. To store the data, you can use a database table or tags. Each record in the database (or tag) should have the following properties:
Year
Season
Undergraduate or Graduate
Subject: CM, EM, QM, SM, Mathematical Methods, and Miscellaneous
Problem number (is it necessary?)
Question
Answer
Search and return all Electricity & Magnetism problems.
Directly query the database and you will get an array, then display some or all questions.
Create a random qualifying exam, meaning a new 20 question test randomly picking 2 Undergrad Mechanics problems, 2 Undergrad E&M problems, etc. from past qualifying exams (over some restricted date range).
To generate a random exam, first decide the number of questions for each category and the years they are drawn from. For example, if you want 2 UG EM questions, query the database for all UG EM questions in that year range and then randomly shuffle the resulting array. Finally, select the first two of them and display these questions to the student. Continue with the other categories and you will get a complete random exam paper.
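The query, shuffle and take-the-first-few steps can be sketched in Python as follows. The in-memory problems list stands in for the result of a database query, and the field names (level, subject, year, season, number), the blueprint dictionary and the function names are hypothetical choices for this illustration.

```python
import random

SUBJECTS = ["CM", "EM", "QM", "SM", "Mathematical Methods", "Miscellaneous"]

# Hypothetical stand-in for the stored problems (question/answer text omitted).
problems = [
    {"year": year, "season": season, "level": level, "subject": subject, "number": n}
    for year in range(2000, 2020)
    for season in ("Spring", "Fall")
    for level in ("Undergraduate", "Graduate")
    for subject in SUBJECTS
    for n in (1, 2)
]


def pick(problems, level, subject, count, first_year, last_year, rng=random):
    """Filter one category within a year range, shuffle, and take `count`."""
    pool = [
        p for p in problems
        if p["level"] == level
        and p["subject"] == subject
        and first_year <= p["year"] <= last_year
    ]
    rng.shuffle(pool)  # shuffles the filtered copy, not the stored data
    return pool[:count]


def build_exam(problems, blueprint, first_year, last_year, rng=random):
    """blueprint maps (level, subject) to the number of questions wanted."""
    exam = []
    for (level, subject), count in blueprint.items():
        exam.extend(pick(problems, level, subject, count, first_year, last_year, rng))
    return exam


blueprint = {
    ("Undergraduate", "CM"): 2,
    ("Undergraduate", "EM"): 2,
    ("Graduate", "SM"): 1,
}
exam = build_exam(problems, blueprint, 2005, 2010, random.Random(0))
for p in exam:
    print(p["year"], p["season"], p["level"], p["subject"], p["number"])
```

Passing a seeded random.Random instance makes a generated exam reproducible, which may be handy if the same paper needs to be shown again later.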
Have the option to hide or display the solutions on results.
It is up to you to decide whether the students should see the answers. This should be controlled by a single variable.
Are "Electricity & Magnetism" and "Statistical Mechanics" mutually exclusive categorizations, along the same dimension? Are there multiple dimensions in the categories you want to search by?
If the answer is yes to both, then I would suggest you look into multidimensional data modeling. As a physicist, you've got a leg up on most people when it comes to evaluating the number of dimensions to the problem. Analyzing reality in a multidimensional way is one of the things physicists do.
Sometimes obtaining and learning an MDDB tool is overkill. Once you've looked into multidimensional modeling, you may decide you like the modeling concept, but you still want to implement using relational databases that use the SQL interface.
In that case, the next thing to look into is star schema design. Star schema is quite different from normalization as a design principle, and it doesn't offer the same advantages and limitations. But it's worth knowing in the case where the problem is really a multidimensional one.
I am making a cube in SQL Server Analysis Services 2005 and have a question about many to many relationships.
I have a many to many relationship between two entities that contains an additional descriptive column as part of the relationship.
I understand that I may need a bridge table to model the relationship, but I am not sure where to store the additional column: in the bridge table or elsewhere?
A many to many relationship in SSAS can be implemented via an intermediate fact table that contains the keys of both dimensions that take part in the relationship.
For example, if you have a cube with a book-sales-fact table and you want to aggregate total sales by author (an author may have many books, and a book may be written by many authors), you should also have an author-book intermediate fact table (just like in the relational database world). In this bridge table, you should have both dimension keys (Author and Book) plus any measures related to that particular book and author, such as the wages paid to the author to write the book (or chapters).
As a result, if your additional column is a kind of measure, you should add that column to the intermediate fact table.
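The relational shape of that author-book example can be sketched with Python's built-in sqlite3 module. This only illustrates the underlying tables, not the SSAS cube configuration; the table names, the wages_paid column and the sample figures are assumptions invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_author (author_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_book (book_key INTEGER PRIMARY KEY, title TEXT);

-- Main fact table: one row per sale.
CREATE TABLE fact_book_sales (
    book_key INTEGER REFERENCES dim_book,
    amount INTEGER
);

-- Intermediate (bridge) fact table: both dimension keys plus the
-- extra column that describes the relationship itself.
CREATE TABLE fact_author_book (
    author_key INTEGER REFERENCES dim_author,
    book_key INTEGER REFERENCES dim_book,
    wages_paid INTEGER,
    PRIMARY KEY (author_key, book_key)
);

INSERT INTO dim_author VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO dim_book VALUES (100, 'Cubes'), (101, 'Stars');
INSERT INTO fact_book_sales VALUES (100, 30), (100, 20), (101, 10);
INSERT INTO fact_author_book VALUES (1, 100, 500), (2, 100, 300), (1, 101, 200);
""")

# Sales per author, reached through the bridge table.
sales_by_author = dict(conn.execute("""
    SELECT a.name, SUM(s.amount)
    FROM fact_author_book ab
    JOIN dim_author a ON a.author_key = ab.author_key
    JOIN fact_book_sales s ON s.book_key = ab.book_key
    GROUP BY a.name
""").fetchall())

# The relationship's own measure, aggregated straight from the bridge table.
wages_by_author = dict(conn.execute("""
    SELECT a.name, SUM(ab.wages_paid)
    FROM fact_author_book ab
    JOIN dim_author a ON a.author_key = ab.author_key
    GROUP BY a.name
""").fetchall())

print(sales_by_author, wages_by_author)
```

Note that the per-author sales add up to more than the overall sales total, because a co-authored book's sales are counted once for each author. That is expected for a many to many relationship, so the grand total should be computed from the sales fact table rather than by summing the per-author rows.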