I am new to OLAP,if I have two fact tables can they share the same dimension table?
A good example would be if I have tables fact1 and fact2, can they both have a foreign key into a single Date dimension (dimDate) table? Or, do I need/should create separate dimDate dimension tables for each separate fact?
To me, and based on my research, I don't see any downfall of sharing a dim table, but wanted to check.
Thanks!
They can, and should.
That's the whole point of conformed dimensions, keeping the attributes in a single place, so as to avoid multiple versions of truth coming from different fact tables.
So a single date dimension, with all the necessary attributes for each fact table, which is then linked from each fact table that needs it.
Same for a customer dimension. If you have a sales fact table that needs customer info such as billing address and a marketing fact table that holds info about campaigns each customer can benefit from, you would combine all those attributes in a single table. Some customers may not be referenced in the marketing fact table, others may not exist in the fact table, but all would exist in the single customer dimension, which is your single source of truth about who your customers are.
Related
I am trying to create a naming convention for different objects in DynamoDB, such as tables, partition and sort keys, LSIs, GSIs, attributes, etc. I read a lot of articles and there is no common way to do that but want to learn from real-time examples to choose which one will fit best our needs.
The infrastructure I am working on is based on microservices. Along with this, some of our development environments share the same AWS account. Based on this, I ended up with something like this:
Tables: [Environment].[Service Name].[Table Name].ddb-table
GSIs/LSIs: [Environment].[Service Name].[Table Name].[GSI/LSI Name].ddb-[gsi/lsi]
Partition Key: pk ??? (in my understanding, the keys should have abstract names, because the single table stores versatile data in the same key)
Sort Key: sk ??? (in my understanding, the keys should have abstract names, because the single table stores versatile data in the same key)
Attributes: meaningful but as short as possible as they are kept for every item in the table
Different elements are separated by dot (.)
All names are separated by dashes (kebab-case) and in lower case
Tables/GSIs/LSIs are in singular form
Here is an example:
Table: dev.user-service.user-order.ddb-table
LSI: dev.user-service.user-order.lsi1pk.ddb-lsi
GSI: dev.user-service.user-order.gsi1pk.ddb-gsi
What naming conventions do you follow?
Thanks a lot in advance!
My advice:
Use PK and SK as your partition key and sort key.
Don't put table names into code. Use ParameterStore. For example, if you ever do a table restore it will be to a new table name, and if you want to send traffic to the new name you'll not want to change code.
Thus don't get too fixed to any particular table name. Never try to have code predict a table name. Only have them be consistent to help humans.
Don't put regions in your table names. When you switch to Global Tables they all keep the same name. Awkward!
GSIs can be called GSI1, GSI2, etc. GSI keys are GSI1PK and GSI1SK, etc.
Tag your tables with their name if you ever want to track per-table costs later.
Short yet meaningful attribute names are nice because it reduces storage and can reduce RCU/WCU if you're near the 4kb or 1kb lines.
Use difference accounts for dev, staging, and production. If you want to put the names into tables as well to help you spot "OMG I'm in production" that's fine.
If you have lots of attributes as the item payload which aren't used for GSIs or filtering and are always returned together, consider just storing them as a string or binary which gets parsed client side. You can even compress them. It's more efficient and lower latency because it skips the data marshaling.
I have a table structure consisting of cities and comments. I need to get all comments related to a city. I have made my primary key for comments the name of the city. Now when I query my table I can get all the comments related to the city but I need them in order of votes for that comment. the votes value are constantly changing. I have considered adding ordered by to my query or adding vote as a range key and deleting are re-adding the recored every time the votes changes. These solutions don't seam that efficient and was wondering if there is a better way of doing it?
One easy thing you could do is to use a local secondary index - this DynamoDB feature can create a second table whose hash key is the same (the city name), but the sort key is the number of votes - which remains just an ordinary attribute in your original table. DynamoDB will automatically - and consistently - take care of the second table for you as you modify the first one.
Using a LSI is easier than coding the extra deletion and addition, and more efficient in the sense of less network activity and client work - but may not be significantly cheaper in Amazon bills, because DynamoDB charges you extra for that LSI work.
I would like to create a table that can store, say, the Title or Name of something in the first column, and then have associated people or objects in the next column. The problem is that there may be multiple people associated with the same Title or Name. If that first column is my primary key, I can't have duplicates for each row.
Name1 | Jim
| John
| Jill
Name2 | Mike
| Mary
Name3 | Jeff
Does this need to be done with intermediary tables, and if so, I'm fuzzy on how to actually code them (in sqlite). Do I just create them with foreign keys referencing the appropriate attribute in the main table? Any help would be appreciated.
Yes, in SQLite (or any relational database) you would model this by creating separate tables. It is a capital offense in the relational model to ever store two pieces of information in one column.
It's difficult to give a more precise answer to your question because you don't include any specifics, but most likely you will need one table to store the "things" you're interested in, one table to store information about people (one row per person), and a third "linking" table to associate people with "things".
This third table contains only the columns that make up the primary key of the "things" table and the columns that make up the primary key of the people table. The primary key of this table is made up of all the columns (probably two columns if "things" and people are each identified by a single column in the source table), and contains two foreign keys, one back to "things" and one to people.
One row is added to associate a person with a thing, but additional rows can be added to associate more people with the specified thing and more things with each person.
Another option is to use a DBMS specifically designed to support multiple values in a single field. There are several to choose from including at least one open source flavor. Look at d3, rocket software (U2 DBMS), Ladybridge (open qm) and search Google for multivalued DBMS for more.
This is well supported by MySQL 8.0.17+ by storing your values in a JSON array and adding a multi-valued index.
I am working on building a data mart for reporting purpose.
I am new to this field and looking for help.
I have a fact table and two dimension tables.
The fact table has only 3 fields, its primary key and foreign key references to two dimension tables.
The two dimension tables have data related to 1)phonenumbers and 2)extension numbers.
(I cannot combine these dimension tables because they have different information)
As you see my fact table does not have any quantitative columns.
I want to generate a report that displays phonenumbers and corresponding extensions.
I can get this information by performing a join on the two dimension tables.
So my question is do I have to use fact table for the report? i.e Should I first get the key from phonenumber table, perform join on fact table, get extension key and perform join on extension table?
OR
Simply join the two dimension tables to generate the report because it is possible in this case?
Do we have to involve the fact table?
Thanks for reading.
Any help is appreciated.
do I have to use fact table for the report? i.e Should I first get the key from phonenumber table, perform join on fact table, get extension key and perform join on extension table?
Often, this is necessary.
Simply join the two dimension tables to generate the report because it is possible in this case?
Sometimes, this works, also.
Do we have to involve the fact table?
Depends on the relationships.
If you have a "hierarchy" of dimensional information, then the two dimensions could be directly related. In this case, the fact table doesn't tie them together. The fact ties to the detailed dimension; the detailed dimension ties to the summary. This is rare.
Dimensions change.
If you have two or more Slowly Changing Dimensions, then your dimensions may include lots of "previous" relationship information.
Fact 1: Phone xxx-xxx-xxxx, Extension yyyy
Fact 2: Phone xxx-xxx-xxxx, Extension zzzz
Then, another load applies an SCD rule to modify zzzz to aaaa, as of 7/1/11 You may have the old dimension values available, as well as the new dimension values, with an applicable date range.
Now, the fact (and the date range) are required to define which copy of the dimension value you're going to get.
Fact 2: Phone xxx-xxx-xxxx, Extension zzzz, from beginning to before 7/1/11.
Fact 2: Phone xxx-xxx-xxxx, Extension aaaa, from 7/1/11 to end.
So, you may need the fact, dimensions and time to sort out the relationships.
Am making a cube in SQL Server Analysis Services 2005 and have a question about many to many relationships.
I have a many to many relationship between two entities that contains an additional descriptive column as part of the relationship.
I understand that I may need to a bridge table to model the relationship but I am not sure where to store the additional column - in the bridge table or elsewhere?
Many To Many relationsip in SSAS can be implemented via an intermediate fact table that contains both dimension key that subject to the relation.
For example; If you have a cube that has a book-sales-fact table and you want to aggregate total sales by author (which may have many books and a book may be written by many authors) you should also have a author-book intermediate fact table (just like in relational database world). In this bridge table, you should have both dimension keys (Author and Book) plus some measure related to the current book and author such as wages paid to the author to write the book (or chapters).
As a result, if your additional column is kind of a measure you should add that column to the intermediate fact table.