Situation: In the app we have up to 1000 schools. Every school has students, and students take lessons and join events (and more). We need to query lessons per student, per school and per date, quickly and often. We have 2 designs in mind and are wondering which is the best way to proceed.
1 - design with dedicated school node
2 - design with no dedicated school node
Examples of two designs
PRO design 1
- root ref points to the school of the user after login, so there is no need to query on school ids
- no need to mention school ids everywhere
- no need for separate "lessons per school" and "events per school" nodes
- rules on school level
...
PRO design 2
- flatter data, as widely advised on the internet
For most NoSQL database structures, flattening and denormalising data is the best method, and that is exactly the case with Firebase too.
When you flatten your data, you get the following advantages:
- You mostly download only the minimum amount of data required, which is efficient and cost-effective.
- Your downloads are much faster, especially compared to the likes of SQL join queries.
Having said that, in your particular case, I think that it really depends on how much the school affects the logged in user.
Suppose that a school is only an attribute for a student, and serves no other purpose, then the second database is the way to go. For example, if the books a student can get are independent of the school she goes to, then the second database style is more suited.
However, if a school categorizes students into groups that define their interaction with the database, then the first database structure is the way to go. An example of this is that a student can only get a book when it's available in the school she goes to.
Regardless of your decision, I would like to commend you on the fact that you have flattened your database quite well in both your structures! And my personal suggestion would be to go with the one that is more convenient to code, read and maintain for you.
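As a rough illustration of the trade-off, here is a minimal sketch that models both designs as plain Python dictionaries and reads "lessons per student per date" from each. The original design screenshots are not reproduced here, so every node name (`schools`, `lessons_per_student`, `lessons_per_school`) and every key and date is a hypothetical placeholder, and `get_path` only imitates how a database reference walks a path.

```python
# Hypothetical node names and keys; not taken from the original designs.

# Design 1: everything lives under a dedicated school node.
design_1 = {
    "schools": {
        "school_1": {
            "students": {"student_1": {"name": "Ann"}},
            "lessons_per_student": {
                "student_1": {
                    "2017-03-01": {"lesson_1": True},
                    "2017-03-02": {"lesson_2": True},
                }
            },
        }
    }
}

# Design 2: flat top-level nodes; the school id appears in every path.
design_2 = {
    "students": {"student_1": {"name": "Ann", "school": "school_1"}},
    "lessons_per_school": {
        "school_1": {
            "student_1": {
                "2017-03-01": {"lesson_1": True},
                "2017-03-02": {"lesson_2": True},
            }
        }
    },
}


def get_path(tree, path):
    """Walk a '/'-separated path; a missing node yields an empty dict."""
    node = tree
    for key in path.split("/"):
        node = node.get(key, {})
    return node


def lessons_design_1(school_id, student_id, date):
    # The school id is only needed once, to pick the root reference.
    path = f"schools/{school_id}/lessons_per_student/{student_id}/{date}"
    return sorted(get_path(design_1, path))


def lessons_design_2(school_id, student_id, date):
    # The school id has to be part of every per-school node.
    path = f"lessons_per_school/{school_id}/{student_id}/{date}"
    return sorted(get_path(design_2, path))
```

In both sketches the read is a single path lookup, so the choice comes down to where the school id lives and where the security rules are attached, not to query speed.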
I am designing my Firebase structure and I'm not sure whether it is the right way.
Some background:
Each college has some departments.
Each department has many courses. A course can belong to several departments.
Each course has some lessons. A lesson can belong to only one course.
I have a courses node with all the course keys and their information.
I have a departments node with all the departments and their information.
I have Course_Departments and Departments_courses nodes.
In addition I have courses_lectures and its inverse node,
to show which lecturers teach each course and which courses each lecturer teaches.
So my questions are:
1. How do I connect the lessons to these tables? For example, I want to find all the lessons of course_key1 that lecture_key2 teaches.
2. Can using many tables in this way make it take a long time to get the data?
For the UI, I don't want users to wait long for the data.
Looking at your design, I think you are on the right track.
I'll give some hints about what to take care of. When designing the structure of a Firebase database, the design rules of non-relational databases should be kept in mind.
One of them is denormalization. Keep the hierarchy flat! That's an important performance factor for data change listeners, since a listener on a node also receives all of that node's subnodes. That's what you've done so far.
Relationships between entities can be achieved by using keys, exactly as you did in the Course_Departments node. The built-in key generation should be used, so that the keys are universally unique.
Here's a good explanation when coming from relational databases.
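To make the key-based approach concrete for the first question (all lessons of course_key1 that lecture_key2 teaches), here is a minimal sketch using plain Python dictionaries in place of Firebase nodes. The index nodes `course_lessons` and `lecturer_lessons` and the lesson keys are assumptions made for illustration; they do not appear in the question.

```python
# Hypothetical index nodes: each maps a key to the set of lesson keys
# that belong to it, stored Firebase-style as {key: True}.
course_lessons = {
    "course_key1": {"lesson_key1": True, "lesson_key2": True},
    "course_key2": {"lesson_key3": True},
}
lecturer_lessons = {
    "lecture_key1": {"lesson_key2": True},
    "lecture_key2": {"lesson_key1": True, "lesson_key3": True},
}


def lessons_for(course_key, lecturer_key):
    """Return lesson keys that belong to the course AND to the lecturer."""
    by_course = course_lessons.get(course_key, {}).keys()
    by_lecturer = lecturer_lessons.get(lecturer_key, {}).keys()
    # Two small index reads, intersected on the client.
    return sorted(by_course & by_lecturer)
```

Each lookup touches only two small index nodes, which is why a flat structure with several "tables" does not have to be slow: the client never downloads the full lesson data just to filter it.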
I am using Apriori to build a recommender system to go along with my company's application. Before going down this road, I'd like to confirm with someone that has more experience that I am on the right track. Any help is appreciated.
Let me try to explain the issue. Depending on the context of the user within the application, the features that impact the recommendations can vary. For example, imagine a shopping scenario. If I shop at HEB, I usually have a predefined grocery list so the items on that list would be good recommendations if I just told the app I was going to HEB. When I go to Home Depot though, I tend to shop by department, so power tools and the associated parts are good recommendations if I tell the app I'm at Home Depot and I am doing shopping for power tools.
You see that the number of features varies in the two scenarios. In the first, my recommendations depend solely on the store while in the second, they depend on the store and the department in which I'm shopping.
I am looking to use a single Apriori model that can handle this type of situation. Would that be considered a best practice or is it better to have different models, one for when we just list the store and another for when we list the store and the department? Given that Apriori is an unsupervised algorithm, I think it can be done with one model, but wanted to double check since I don't have a ton of experience.
It seems to me like you are talking about multi-level association rules. This is from the manual page of the aggregate function in arules:
Support for Item Hierarchies
Description:
Often an item hierarchy is available for datasets used for
association rule mining. For example in a supermarket dataset
items like "bread" and "beagle" might belong to the item group
(category) "baked goods."
I guess the higher-level categories would be your departments and stores. This will be able to find associations between items, departments and stores.
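One simple way to see how a hierarchy lets a single model cover both scenarios is to extend every transaction with the ancestors of its items and then count support as usual. The sketch below does this in plain Python for item pairs only; it is not how arules implements it, and the hierarchy and transactions are made-up examples.

```python
from collections import Counter
from itertools import combinations

# Hypothetical hierarchy: item -> department -> store.
parents = {
    "drill": "power tools",
    "drill bits": "power tools",
    "power tools": "Home Depot",
    "milk": "grocery",
    "eggs": "grocery",
    "grocery": "HEB",
}


def with_ancestors(items):
    """Add every ancestor (department, store) of each item."""
    expanded = set()
    for item in items:
        while item is not None:
            expanded.add(item)
            item = parents.get(item)
    return expanded


def pair_support(transactions):
    """Support of every pair in the ancestor-extended transactions."""
    counts = Counter()
    for transaction in transactions:
        extended = sorted(with_ancestors(transaction))
        for pair in combinations(extended, 2):
            counts[pair] += 1
    total = len(transactions)
    return {pair: count / total for pair, count in counts.items()}


transactions = [{"drill", "drill bits"}, {"drill"}, {"milk", "eggs"}, {"milk"}]
support = pair_support(transactions)
```

Pairs such as `("Home Depot", "drill")` give store-level rules and pairs such as `("drill", "power tools")` give department-level rules from the same counts. A real implementation would filter out the trivial pairs that link an item to its own ancestor.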
There is an example that explains associations in UML:
A person works for a company; a company has a number of offices.
But I am unable to understand the relationship between Person, Company, and Office classes. My understanding is:
A company consists of many persons as employees, but these classes exist independently, so that is a simple association with 0..* multiplicity at the Person class's end.
A company has many offices, and those offices will not exist if there is no company, so that is a composition with Company as the parent class and 0..* multiplicity at the Office class's end.
But I am not sure about the second point. Please correct me if I am wrong.
Thank you.
Why use composition or aggregation in this situation at all? The UML spec leaves the meaning of aggregation to the modeler. What do you want it to mean to your audience? And the meaning of composition is probably too strong for this situation. Thus, why use it here? I recommend you use a simple association.
If I were you, I would stay truer to the problem domain. In the world I know, Offices don't cease to exist when a Company goes out of business. Rather, a Company occupies some number of Offices for some limited period of time. If a Company goes out of business, the Offices get sold or leased to some other Company. The Offices are not burned to the ground.
If you aren't true to the problem domain in an application, then the shortcuts you take will become invalid when the customer "changes the requirements" for that application. The problem domain doesn't actually change much, just the shortcuts you are allowed to take. If you take shortcuts to satisfy requirements in a way that is misaligned with the problem domain, it is expensive to adjust the application. Your customer becomes unhappy and you wind up working overtime. Save yourself and everyone the trouble!
While Jim's answer is correct, I want to add some extra information. There are two main uses for aggregation:
1. Memory management
2. Database management
In the first case it gives a hint how long objects shall live. This is directly related to memory usage. If the target language is one which (like most modern languages) uses a garbage collector, you can simply ignore this model information.
In the second case, it's only partially a memory question. A composite aggregation in a database indicates that the aggregated elements need to be deleted along with the aggregating element. This is less a memory issue than, in most cases, a security issue. So here you have to think twice.
A shared aggregation however has a very esoteric meaning in all cases.
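For the database case, the difference can be sketched with foreign keys in SQLite: a composite part is deleted together with its owner (`ON DELETE CASCADE`), while a simple association merely loses its link (`ON DELETE SET NULL`). The table and column names below are assumptions for illustration, and modelling `office` as a composite part is shown only to demonstrate the semantics, not as a recommendation.

```python
import sqlite3


def build_db():
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT);
        -- Composition: an office row cannot outlive its company.
        CREATE TABLE office (
            id INTEGER PRIMARY KEY,
            company_id INTEGER NOT NULL
                REFERENCES company(id) ON DELETE CASCADE,
            address TEXT
        );
        -- Simple association: a person survives, the link is cleared.
        CREATE TABLE person (
            id INTEGER PRIMARY KEY,
            employer_id INTEGER
                REFERENCES company(id) ON DELETE SET NULL,
            name TEXT
        );
        INSERT INTO company VALUES (1, 'Acme');
        INSERT INTO office VALUES (1, 1, 'Main St'), (2, 1, 'Side St');
        INSERT INTO person VALUES (1, 1, 'Pat');
    """)
    return conn


def close_company(conn, company_id):
    """Delete a company; the foreign keys decide what happens to the rest."""
    with conn:
        conn.execute("DELETE FROM company WHERE id = ?", (company_id,))
```

After `close_company(conn, 1)` the offices are gone and the person remains with no employer. Whether silently deleting the offices is acceptable is exactly the "think twice" decision mentioned above.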
I'm looking at accepting a project that would require me to clean up an existing e-commerce website. It's been relatively successful and has over 100,000 individual products, loaded both by the client and its publishers.
The site wasn't originally designed for this many products and has become fairly disorganized.
So, the client has asked me to look at a more robust search option: filterable and so forth. I completely agree it needs to be improved, but after looking at the database, I can tell that there are dozens and dozens of categories and not everything is labeled correctly etc.
Is there any database management software that could help me clean up 100,000 entries quickly? Make categories consistent - fix uppercase/lowercase problems etc.
Are there any companies out there that I can source just this particular part of the project to?
It's a massive amount of data entry. If I spent 2 minutes per product, it would take me 6 months full time just to complete the database cleanup. I either need to get it down to a matter of seconds per product or find a company that specializes in this type of work.
I don't even know what to search for on Google.
Thanks guys!
--
Thanks everyone for your ideas! I have a lot of options now so I feel a lot more comfortable heading in to this project. Right now I think the direction we will go is to build a tool that allows the client to hire data entry people that can update it as necessary. Then I will work as a consultant, taking care of any UPDATE-WHERE type functions as necessary.
Thanks again!
If there are inconsistencies like you are describing, it sounds like the problem may be more an issue of a bad data model (i.e. lack of normalization) than just dirty data. If good normalization is in place, cleaning up categories should be as simple as updating a single record for each category, but if the category name is used instead of a foreign key, then you will most likely need to perform a series of UPDATE WHERE statements to clean up the text.
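As a rough sketch of what such a series of UPDATE WHERE statements could look like when the category is stored as free text, the following uses Python's built-in `sqlite3` module. The `products` table, its columns and the sample rows are hypothetical; the real schema is not shown in the question.

```python
import sqlite3


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (
            id INTEGER PRIMARY KEY, name TEXT, category TEXT
        );
        INSERT INTO products (name, category) VALUES
            ('thing A', 'Game'),
            ('thing B', ' food'),
            ('thing C', 'games'),
            ('thing D', 'GAME ');
    """)
    return conn


def normalize_categories(conn):
    with conn:
        # Pass 1: fix stray whitespace and upper/lower-case differences.
        conn.execute("""
            UPDATE products
            SET category = lower(trim(category))
            WHERE category <> lower(trim(category))
        """)
        # Pass 2: one targeted UPDATE ... WHERE per known duplicate.
        conn.execute(
            "UPDATE products SET category = ? WHERE category = ?",
            ("game", "games"),
        )


def distinct_categories(conn):
    rows = conn.execute(
        "SELECT DISTINCT category FROM products ORDER BY category"
    )
    return [row[0] for row in rows]
```

The mechanical pass handles case and whitespace in one statement for all rows; only the genuinely ambiguous duplicates need a human decision and their own statement.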
You may want to look into an ETL (extract, transform, load) tool that can help with bulk data transformation. I'm not familiar with ETL tools for MySQL, but I'm sure they exist. SQL Server has a built-in service called SQL Server Integration Services (SSIS) that provides the ability to extract data from an existing data source, perform bulk changes or transformations, and then reload the data back into a destination database. Tools like this may help speed up the process of standardizing capitalization, punctuation, changing categories etc.
Even still, don't overlook the possibility that the data model may need tweaking to help prevent this type of situation in the future.
Edit: Wikipedia has a list of open-source ETL products that you may want to investigate.
In any case you'll probably need to do more than "clean the data", which means you'll need to build new normalized tables. So start there: build a new database that is fully normalized and import the data "as is", with all the duplicate categories, etc.
For example, new tables:

    Items
    ItemID      int identity/auto number
    ItemName    string
    CategoryID  int
    ....

    Categories
    CategoryID    int identity/auto number
    CategoryName  string
    ....

Import the bad data into the new system:

    Items
    ItemID  ItemName  CategoryID
    1       thing A   1
    2       thing B   2
    3       thing C   3
    4       thing D   1

    Categories
    CategoryID  CategoryName
    1           Game
    2           food
    3           games

Now you can consolidate the data using the PKs:

    -- point every item in the duplicate category at the one to keep
    UPDATE Items
        SET CategoryID = 1
        WHERE CategoryID = 3;

    -- then remove the now-unused duplicate category
    DELETE FROM Categories
        WHERE CategoryID = 3;
You might just write an application where the customer can do the consolidation. Let them select the duplicates on a screen and merge them into a selected parent category. The application then runs the merge SQL from above.
If you need a clean cutover date, create an application that generates a series of "Map" tables, where you store CategoryNameOld="games" and CategoryNameNew="Game", and use these when you do the conversion/load of the bad data into the new system's tables.
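The map-table idea can be sketched as follows with Python's built-in `sqlite3` module: the old data is loaded through a `CategoryMap` table, so duplicates are merged during the load itself. `Items`, `Categories`, `CategoryNameOld` and `CategoryNameNew` follow the names used above; the `LegacyProducts` source table and the `CategoryMap` table name are assumptions made for illustration.

```python
import sqlite3


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Hypothetical stand-in for the old, denormalized data.
        CREATE TABLE LegacyProducts (ItemName TEXT, CategoryName TEXT);
        CREATE TABLE CategoryMap (
            CategoryNameOld TEXT PRIMARY KEY,
            CategoryNameNew TEXT NOT NULL
        );
        CREATE TABLE Categories (
            CategoryID INTEGER PRIMARY KEY,
            CategoryName TEXT UNIQUE NOT NULL
        );
        CREATE TABLE Items (
            ItemID INTEGER PRIMARY KEY,
            ItemName TEXT,
            CategoryID INTEGER REFERENCES Categories(CategoryID)
        );
        INSERT INTO LegacyProducts VALUES
            ('thing A', 'Game'), ('thing B', 'food'),
            ('thing C', 'games'), ('thing D', 'Game');
        INSERT INTO CategoryMap VALUES ('games', 'Game');
    """)
    return conn


def load_with_map(conn):
    with conn:
        # Create each category once, using the mapped name if one exists.
        conn.execute("""
            INSERT INTO Categories (CategoryName)
            SELECT DISTINCT COALESCE(m.CategoryNameNew, p.CategoryName)
            FROM LegacyProducts AS p
            LEFT JOIN CategoryMap AS m
                ON m.CategoryNameOld = p.CategoryName
        """)
        # Load the items, resolving the category through the same map.
        conn.execute("""
            INSERT INTO Items (ItemName, CategoryID)
            SELECT p.ItemName, c.CategoryID
            FROM LegacyProducts AS p
            LEFT JOIN CategoryMap AS m
                ON m.CategoryNameOld = p.CategoryName
            JOIN Categories AS c
                ON c.CategoryName = COALESCE(m.CategoryNameNew, p.CategoryName)
        """)
```

Because the map is just data, the client can keep editing it while the old site stays live, and the load can be rerun from scratch on the cutover date.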
I would implement the new search system or whatever and build them a tool that would allow them to easily go through and cleanup the listings, re-categorize, etc. This task requires domain knowledge, so they're the best ones to do it.
Do some number crunching so they can prioritize the list and clean in order of importance.
Keep in mind that one of your options is to build a crappy interface that somebody can use to edit records, hire half a dozen data-entry people from a temp agency, spend two days training them, and let them go to town.
In my OLTP database I have a layout consisting of instructors and students. Each student can be a student of any number of instructors. A student can also sign up for an instructor, but not necessarily book any tuition (lesson).
In a data warehouse, how best would this be modelled? If I create a dimension table for Lessons, Instructors and Students and a fact table for the lessons students have taken then this will work when an instructor wants to see what lessons a student has taken.
However, how will an instructor see how many students are REGISTERED with the instructor but have not yet taken a lesson?
In my OLTP, I have a many-to-many table (InstructorStudents) that links each student with one or more instructors. In an OLAP database, this isn't appropriate.
What would be the best schema in this case? Would a many to many be appropriate in this instance? I can't store a list of which students are registered to which instructors in the student table, so I feel another dimension table is necessary but cannot work out what should be contained in it.
If a fact represents a transaction, you seem to have two different facts here: Sign ups & Lessons. There are always a lot of ways to go but, perhaps, you need two fact tables. They may have similar dimensionality except the sign-up table will have a Class dimension (class name, instructor name, etc.). The Lessons table will tie to the class dimension but, also, to a Lesson dimension (date, classroom used, etc.).
There are a few other ways to do this but they will be more difficult from a programming & reporting perspective.
You need a many to many dimensional model.
You need a factless fact table. Look at the following resource, which describes an example close to your need:
http://www.kimballgroup.com/1996/09/02/factless-fact-tables/
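A minimal sketch of that idea, using Python's built-in `sqlite3` module: registrations are stored in a factless fact table (keys only, no measures) next to the lessons fact table, and "registered but no lesson yet" becomes an anti-join between the two. All table and column names here are assumptions for illustration, not taken from the question or the linked article.

```python
import sqlite3


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_instructor (
            instructor_key INTEGER PRIMARY KEY, name TEXT
        );
        CREATE TABLE dim_student (
            student_key INTEGER PRIMARY KEY, name TEXT
        );
        -- Factless fact: records that a registration exists, nothing else.
        CREATE TABLE fact_registration (
            instructor_key INTEGER REFERENCES dim_instructor,
            student_key INTEGER REFERENCES dim_student,
            PRIMARY KEY (instructor_key, student_key)
        );
        -- Ordinary fact: one row per lesson taken, with a measure.
        CREATE TABLE fact_lesson (
            instructor_key INTEGER REFERENCES dim_instructor,
            student_key INTEGER REFERENCES dim_student,
            lesson_date TEXT,
            duration_minutes INTEGER
        );
        INSERT INTO dim_instructor VALUES (1, 'Instructor A'), (2, 'Instructor B');
        INSERT INTO dim_student VALUES (1, 'S1'), (2, 'S2'), (3, 'S3');
        INSERT INTO fact_registration VALUES (1, 1), (1, 2), (1, 3), (2, 1);
        INSERT INTO fact_lesson VALUES (1, 1, '2015-01-10', 60);
    """)
    return conn


def registered_without_lesson(conn, instructor_key):
    """Count students registered with an instructor who took no lesson."""
    row = conn.execute("""
        SELECT COUNT(*)
        FROM fact_registration AS r
        WHERE r.instructor_key = ?
          AND NOT EXISTS (
              SELECT 1 FROM fact_lesson AS l
              WHERE l.instructor_key = r.instructor_key
                AND l.student_key = r.student_key
          )
    """, (instructor_key,)).fetchone()
    return row[0]
```

With the sample rows, instructor 1 has three registered students and one of them has taken a lesson, so the function returns 2 for that instructor.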