Do we have to use a fact table for reports?

I am working on building a data mart for reporting purposes.
I am new to this field and looking for help.
I have a fact table and two dimension tables.
The fact table has only three fields: its primary key and foreign key references to the two dimension tables.
The two dimension tables hold data about (1) phone numbers and (2) extension numbers.
(I cannot combine these dimension tables because they contain different information.)
As you can see, my fact table does not have any quantitative (measure) columns.
I want to generate a report that displays phone numbers and their corresponding extensions.
I can get this information by joining the two dimension tables.
So my question is: do I have to use the fact table for the report? I.e., should I first get the key from the phone number table, join to the fact table to get the extension key, and then join to the extension table?
OR
Simply join the two dimension tables to generate the report, since that is possible in this case?
Do we have to involve the fact table?
Thanks for reading.
Any help is appreciated.

Do I have to use the fact table for the report? I.e., should I first get the key from the phone number table, join to the fact table to get the extension key, and then join to the extension table?
Often, this is necessary.
Simply join the two dimension tables to generate the report, since that is possible in this case?
Sometimes, this works, also.
Do we have to involve the fact table?
Depends on the relationships.
If you have a "hierarchy" of dimensional information, then the two dimensions could be directly related. In this case, the fact table doesn't tie them together: the fact ties to the detailed dimension, and the detailed dimension ties to the summary dimension. This is rare.
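For instance, if extensions rolled up under phone numbers as such a hierarchy, a sketch like this (table and column names are hypothetical) would bypass the fact table entirely:

-- Snowflaked dimensions: DimExtension carries its parent's PhoneKey,
-- so the two dimensions join directly, without the fact table.
SELECT p.PhoneNumber,
       e.ExtensionNumber
FROM   DimExtension AS e
JOIN   DimPhone     AS p ON p.PhoneKey = e.PhoneKey;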
Dimensions change.
If you have two or more Slowly Changing Dimensions, then your dimensions may include lots of "previous" relationship information.
Fact 1: Phone xxx-xxx-xxxx, Extension yyyy
Fact 2: Phone xxx-xxx-xxxx, Extension zzzz
Then another load applies an SCD rule to change zzzz to aaaa as of 7/1/11. You may have the old dimension values available, as well as the new dimension values, with an applicable date range.
Now, the fact (and the date range) are required to define which copy of the dimension value you're going to get.
Fact 2: Phone xxx-xxx-xxxx, Extension zzzz, from beginning to before 7/1/11.
Fact 2: Phone xxx-xxx-xxxx, Extension aaaa, from 7/1/11 to end.
So, you may need the fact, dimensions and time to sort out the relationships.
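As a rough sketch (table, column, and parameter names here are hypothetical), the fact-mediated report query with the SCD date range might look like:

SELECT p.PhoneNumber,
       e.ExtensionNumber
FROM   FactPhoneExtension AS f
JOIN   DimPhone     AS p ON p.PhoneKey     = f.PhoneKey
JOIN   DimExtension AS e ON e.ExtensionKey = f.ExtensionKey
WHERE  @AsOfDate >= e.EffectiveFrom   -- pick the dimension version
  AND  @AsOfDate <  e.EffectiveTo;    -- valid on the report date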

Related

AWS DynamoDB Naming Convention

I am trying to create a naming convention for different objects in DynamoDB, such as tables, partition and sort keys, LSIs, GSIs, attributes, etc. I read a lot of articles and there is no common way to do it, but I want to learn from real-world examples to choose the one that best fits our needs.
The infrastructure I am working on is based on microservices. Along with this, some of our development environments share the same AWS account. Based on this, I ended up with something like this:
Tables: [Environment].[Service Name].[Table Name].ddb-table
GSIs/LSIs: [Environment].[Service Name].[Table Name].[GSI/LSI Name].ddb-[gsi/lsi]
Partition Key: pk ??? (in my understanding, the keys should have abstract names, because a single table stores versatile data under the same key)
Sort Key: sk ??? (same reasoning as for the partition key)
Attributes: meaningful but as short as possible as they are kept for every item in the table
Elements are separated by a dot (.)
Words within names are in lower case and separated by dashes (kebab-case)
Tables/GSIs/LSIs are named in singular form
Here is an example:
Table: dev.user-service.user-order.ddb-table
LSI: dev.user-service.user-order.lsi1pk.ddb-lsi
GSI: dev.user-service.user-order.gsi1pk.ddb-gsi
What naming conventions do you follow?
Thanks a lot in advance!
My advice:
Use PK and SK as your partition key and sort key.
Don't put table names into code; use Parameter Store instead. For example, if you ever do a table restore it will be to a new table name, and if you want to send traffic to the new table you won't want to change code.
So don't get too attached to any particular table name, and never have code try to predict a table name. Names only need to be consistent enough to help humans.
Don't put regions in your table names. When you switch to Global Tables they all keep the same name. Awkward!
GSIs can be called GSI1, GSI2, etc. GSI keys are GSI1PK and GSI1SK, etc.
Tag your tables with their name if you ever want to track per-table costs later.
Short yet meaningful attribute names are nice because they reduce storage and can reduce RCU/WCU consumption if you're near the 4 KB or 1 KB boundaries.
Use different accounts for dev, staging, and production. If you also want to put the environment name into table names to help you spot "OMG, I'm in production", that's fine.
If you have lots of attributes as the item payload which aren't used for GSIs or filtering and are always returned together, consider storing them as a single string or binary attribute that gets parsed client-side. You can even compress it. It's more efficient and lower latency because it skips the data marshaling.

Kusto: Is it possible to prefix the columns coming from a specific table?

I'm performing a query that joins data from 3 different tables. In the final table, I would like to know which table each individual column comes from, since some of the column names repeat. At the moment, I am writing a very long:
| project prefix1_column1=column1, prefix1_column2=column2
for each join.
In a perfect universe, I could add a parameter to the joins to specify a prefix, but that doesn't exist. Is there a cleaner way to do this?
What you are currently doing is the best way to do it. For the feature you are asking for, please open a suggestion on the Azure Data Explorer user voice.
You should also look at the lookup operator, which does not repeat the columns that are part of the join keys. While this won't solve your use case, it will reduce the noise.

Can the same dimension table be related to multiple fact tables?

I am new to OLAP. If I have two fact tables, can they share the same dimension table?
A good example: if I have tables fact1 and fact2, can they both have a foreign key into a single date dimension (dimDate) table? Or do I need to (or should I) create a separate dimDate dimension table for each fact?
To me, based on my research, I don't see any downside to sharing a dim table, but I wanted to check.
Thanks!
They can, and should.
That's the whole point of conformed dimensions: keeping the attributes in a single place, so as to avoid multiple versions of the truth coming from different fact tables.
So a single date dimension, with all the necessary attributes for each fact table, which is then linked from each fact table that needs it.
The same goes for a customer dimension. If you have a sales fact table that needs customer info such as billing address, and a marketing fact table that holds info about campaigns each customer can benefit from, you would combine all those attributes in a single table. Some customers may not be referenced in the marketing fact table, others may not appear in the sales fact table, but all would exist in the single customer dimension, which is your single source of truth about who your customers are.
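Schematically (column names here are illustrative), both fact tables simply carry a foreign key to the one shared dimension:

CREATE TABLE dimDate (
    DateKey      INT  NOT NULL PRIMARY KEY,  -- surrogate key, e.g. 20230701
    FullDate     DATE NOT NULL,
    CalendarYear INT  NOT NULL
);

CREATE TABLE fact1 (
    Fact1Key    INT IDENTITY PRIMARY KEY,
    DateKey     INT NOT NULL REFERENCES dimDate (DateKey),  -- shared dimDate
    SalesAmount MONEY NOT NULL
);

CREATE TABLE fact2 (
    Fact2Key     INT IDENTITY PRIMARY KEY,
    DateKey      INT NOT NULL REFERENCES dimDate (DateKey), -- same table, no copy
    CampaignCost MONEY NOT NULL
);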

Full-Text search in Sql server with multiple tables and ranking

We have a website running on DNN 7.1 with SQL Server, and we implemented full-text search to show search results. We need to search several tables and show the results to the user. Right now the implementation is: the user enters search word(s) and clicks search, the code-behind creates several threads to search the different tables, and the results are merged.
Currently we are using the CONTAINS predicate. The issue with this is that there is no ranking, and sometimes after the merge the results on the first page are not the best matches. I thought I could use CONTAINSTABLE and order the results by rank, but I read that the rank value has no meaning by itself; it merely tells which rows match best within the current result set. In my scenario I have multiple result sets, so how will I know which are the best matches across all of them?
Or am I going about this the wrong way? What is a good way to handle this scenario? We need to improve the response time along with getting better results. Any help is greatly appreciated.
This is how we implemented full-text searching across multiple tables:
1) Create a new table that stores the primary keys of the other tables in separate columns, another column that stores the concatenated string values of all the searchable fields from each table, and another column that stores the checksum of the concatenated values.
2) Implement the full-text index on this new table, and create a job that regularly synchronizes/updates the concatenated search values, but only when the BINARY_CHECKSUM value differs.
3) Use the CONTAINS predicate on this new table and, based on the results, join back to the corresponding tables via the primary keys returned (a sketch follows).
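A minimal sketch of that design (all object names here are illustrative, and it assumes a full-text catalog named ftCatalog already exists); querying the one combined table with CONTAINSTABLE also gives you a single RANK scale across every source:

CREATE TABLE dbo.SearchIndex (
    SearchIndexId INT IDENTITY NOT NULL
        CONSTRAINT PK_SearchIndex PRIMARY KEY,
    SourceTable   VARCHAR(50)   NOT NULL,  -- which table the row came from
    SourceKey     INT           NOT NULL,  -- primary key in the source table
    SearchText    NVARCHAR(MAX) NOT NULL,  -- concatenated searchable fields
    SearchCheck   INT           NOT NULL   -- BINARY_CHECKSUM of the fields
);

CREATE FULLTEXT INDEX ON dbo.SearchIndex (SearchText)
    KEY INDEX PK_SearchIndex ON ftCatalog;

DECLARE @SearchTerm NVARCHAR(100) = N'"phone*"';

SELECT TOP (50) s.SourceTable, s.SourceKey, ft.[RANK]
FROM CONTAINSTABLE(dbo.SearchIndex, SearchText, @SearchTerm) AS ft
JOIN dbo.SearchIndex AS s ON s.SearchIndexId = ft.[KEY]
ORDER BY ft.[RANK] DESC;  -- one ranking across all source tables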

SQL server inserting lots of data from ASP.NET?

I have an application with a parent-child table structure where customers can order products. The whole structure is too complex to post here, but suffice to say there is one Order table and one OrderDetails table for storing the orders. Currently what we do is INSERT one record into the Order table, and then, for each item the customer added, insert that item into the OrderDetails table in a loop. This solution is not scalable, for obvious reasons. It works fine for 100 or so items, but if the user goes over 1,000 items, or 1,000 qty of an item or so, you start to notice the unresponsiveness of the application.
A couple of solutions come to mind, but I am not sure which one would scale well. One is to use bulk insert from my ASP.NET application to insert into the OrderDetails table. The second is to generate XML and pass that to a SQL proc, which extracts and inserts the data into OrderDetails, but that has the associated overhead of the memory consumed by the generated XML. I know I could benchmark and see for myself what suits my application best, but I would like to know which is the most common strategy and which would scale better compared to the others. Also, if there is another technique I could use instead of these two that performs better (I know performance is a subjective word, but let me narrow it down to speed), I could use that. Which is generally used the most? What do you use in your application?
You could consider exploring the option of using a table-valued parameter in the database. You will have to create a table type object whose structure mimics that of the OrderDetails table. The stored proc for inserting the data will accept an input parameter of this type (such parameters are always READONLY).
In your server-side code, you can construct a DataTable object containing all the order details data, which will be mapped to the input parameter of the stored proc. Ensure that the order of columns in the DataTable object exactly matches the order of columns in the table-valued parameter. Upon executing the query, all the data will be inserted in one shot. This saves you from looping over each row of data, and it also avoids the overhead of XML parsing. This approach does, however, involve passing an entire object over the network.
You can read more about it here: MSDN Table-Valued Parameters
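As a rough illustration (the type, proc, and column names here are hypothetical; match them to your actual OrderDetails schema):

CREATE TYPE dbo.OrderDetailsType AS TABLE (
    ProductId INT   NOT NULL,
    Quantity  INT   NOT NULL,
    UnitPrice MONEY NOT NULL
);
GO

CREATE PROCEDURE dbo.InsertOrderDetails
    @OrderId      INT,
    @OrderDetails dbo.OrderDetailsType READONLY  -- TVPs must be READONLY
AS
BEGIN
    INSERT INTO dbo.OrderDetails (OrderId, ProductId, Quantity, UnitPrice)
    SELECT @OrderId, d.ProductId, d.Quantity, d.UnitPrice
    FROM @OrderDetails AS d;  -- one set-based insert instead of a per-row loop
END;
GO

On the ASP.NET side, the DataTable is passed as a SqlParameter with SqlDbType.Structured and TypeName set to "dbo.OrderDetailsType".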
1000 items for an order does seem quite excessive!
Would it be feasible to introduce a limit of 100 items per order into the business logic of the application?
