I am presently facing skewness issue while loading a fact table with Insert into ... Select from clause.
I have a staging-source tables defined with a NO PI & for now I have defined a fact table with PI on Report_Id. PI defined on Report_Id considering I will be loading data for other reports in furture, that is for sure.
However, in DBA perspective, the query is highly skewed.
Can you please suggest whether should I define a NOPI fact table or define UPI fact table. In case of UPI table, what should be the UPI key ? Identity Id ?
Please advise.
Many thanks,
Related
I'm a little unfamiliar with ClickHouse and still study it by trial and error. Got a question about it.
Talking about the star scheme of data representations, with dimensions and facts. Currently, I keep everything in PostgreSQL, but OLAP queries with aggregations start to show bad timing, so I'm going to move some fact tables to ClickHouse. Initial tests of CH show incredible performance, however, in real life the queries should include joins to dimension tables from PostgreSQL. I know I can connect them as dictionaries.
Question: I found that using dictionaries I can make requests similar to LEFT JOINs in good old RDBMS, ie values from resultset could be joined with corresponding values from the dictionary. But can they be filtered by some restrictions on dictionary keys (as in INNER JOIN)? For example, in PostgreSQL I have a table users (id, name, ...) and in ClickHouse I have table visits (user_id, source, medium, session_time, timestamp, ...) with metrics about their visits to the site. Can I make a query to CH to fetch aggregated metrics (number of daily visits for given date range) of users which name matches some condition (LIKE "EVE%" for example)?
It sounds like ODBC table function is what you're looking for. ClickHouse have a bunch of table functions which work like Postgres foreign tables. The setup is similar to Dictionaries but you gain the traditional JOIN behavior. It currently doesn't show up in the official document. You can refer to this https://github.com/yandex/ClickHouse/blob/master/dbms/tests/integration/test_odbc_interaction/test.py#L84 . And in near future (this year), ClickHouse will have standard JOIN statement supported.
The dictionary will basically replace the value first. As I understand it your dictionary would be based off your users table.
Here is an example. Hopefully I am understanding your question.
select dictGetString('accountidmap', 'domain', tuple(toString(account_id))) AS domain, sum(session) as sessions from session_distributed where date = '2018-10-15' and like(domain, '%cats%') group by domain
This is a real query on our database so If there is something you want to try/confirm let me know
I was looking for the best way to capture historical data in HANA for master data tables without the VALID_TO and VALID_FROM fields.
From my understanding, we have 2 options here.
Create a custom history table and run a stored procedure that populates this history table from the original table. Here we compromise with the real-time reporting capability on top of this table.
Enable the History table flag in SLT for this table so that SLT creates this as a history table which solves this problem.
Option 2 looks like a clear winner to me but I would like your thoughts on this as well.
Let me know.
Thanks,
Shyam
You asked for thoughts...
I would not use history tables for modeling time dependent master data. That's not the way history tables work. Think of them as system versioned temporal tables using commit IDs for the validity range. There are several posts on this topic in the SAP community.
Most applications I know need application time validity ranges instead (or sometimes both). Therefore I would rather model the time dependency explicitly using valid from / valid to. This gives you the opportunity e.g. to model temporal joins in CalcViews or query the data using "standard" SQL. The different ETL tools like EIM SDI or BODS have also options for populating such time dependent tables using special transformations like "table comparison" or "history preserving". Just search the web for "slowly changing dimensions" for the concepts.
In the future maybe temporal tables as defined in SQL 2011 could be an option as well, but I do not know when those will be available in HANA.
I am using Teradata via Sql Assistant. When I want to look up a relationship between two table I do the following : show table table1 and can see the create statement that generated the table with all primary and foreign keys. However, this is not very convenient because I might be missing something. So, is there any way to get the Entity Relationship Diagram ? I am interested in about 20 tables. So, how can I get relationships between them ?
SQL Assistant does not show relationships between objects through version 14.x. In my experience with Teradata, relationships have been modeled in proper modeling tools.
If your environment is enforcing referential integrity there are views in the DBC database that could be queried in SQL Assistant to help show you the relationships. However, the results would be in tabular form like any other query against the database.
DBC.All_RI_Children
DBC.All_RI_Parents
DBC.RI_Child_Tables
DBC.RI_Distinct_Children
DBC.RI_Distinct_Parents
DBC.RI_Parent_Tables
DBC.Tables2
Could you tell me please if it is possible to identify missing indexes for a select query using SQLite database?
Or is there any tool that can help?
Thank you!
Use EXPLAIN QUERY PLAN command to identify missing index for given SQL query. For example, the following result indicates you may need an index for a in table t1.
sqlite> EXPLAIN QUERY PLAN SELECT a, b FROM t1 WHERE a=1;
SCAN TABLE t1
If you have a lot of SQL queries and want to quickly identify missing index, I wrote a tool which helps analyze many SQL queries and saves you much time in the repeating process.
Using the sqlite3 command line tool, first enter
.expert
And then run your query as normal.
Ok, the basic situation: Due to a few mixed up starts, a project ends up with not one, but three separate databases, each containing a portion of the overall project data. All three databases are the same, it's just that, say 10% of the project was run into the first, then a new DB was made due to a code update and 15% of the project was run into the new one, then another code change required another new database for the rest of the project. Again, the pertinent tables are all exactly the same across all three databases.
Now, assume I wanted to take all three of those databases - bearing in mind that they can't just be compiled into a single databases due to Primary Key issues and so on - and run a single query that would look through all three of them, select a given set of data from each, then compile those three sets into one single result and return it to the reporting page I'm working on.
For reference, at its endpoint the data is output to an ASP.Net/VB.Net backed page, specifically a Gridview object. It doesn't need to be edited, fortunately, just displayed.
What would be the best way to approach this mess? I'm thinking that creating a temporary table would be my best bet, but honestly I'm stepping into a portion of SQL that I'm not familiar with here, and would appreciate any guidance somebody more experienced might have.
I'd say your best bet is to suck it up and combine the databases, even if it is a major pain to combine the primary keys. It may be a major pain now, but it is going to be 10x as painful over the life of the project.
You can do a union across multiple databases as Scott has pointed out, but you are in for a world of trouble as the application gets more complex. For example, even if you circumvent the technical limitations by having multiple tables/databases for the same entity, having duplicates in the PK for a logical entity is a world of trouble.
Implement the workaround solution if you must, but I guarantee you will hate yourself for it later.
Why not just use 3 part naming on the tables and union them all together?
select db1.dbo.Table1.Field1,
db1.dbo.Table1.Field2
from db1.dbo.Table1
UNION
select db2.dbo.Table1.Field1,
db2.dbo.Table1.Field2
from db2.dbo.Table1
UNION
select db3.dbo.Table1.Field1,
db3.dbo.Table1.Field2
from db3.dbo.Table1
-- where ...
-- order by ...
You should create what is called a Partitioned View for each of your tables of interest. These views do a union of the underlying base tables and eventually add a syntetic column to uniquefy the rows:
CREATE VIEW vTableXDB
AS
SELECT 'DB1' as db_key, *
FROM DB1.dbo.table
UNION ALL
SELECT 'DB2' as db_key, *
FROM DB2.dbo.table
UNION ALL
SELECT 'DB3' as db_key, *
FROM DB3.dbo.table;
You create one such view for each table and then design your reports on these views, not on the base tables. You must add the db_key to your join conditions. The query optimizzer has some understanding of the partitioned views and might be able to create plans that do the right thing and avoid joins that span multiple dbs, but that is not guaranteed. If things go haywire and the optimizer does not recognize the partitioning resulting in very bad execution times, you may have to move the db_key into the tables themselves and add some artificial check constraints on the base tables so that the optimizer can understand the partitioning (see the article I linked for details).
You can actually join tables on different databases. If I remember right the syntax is changed from "tablename.columnName" to "Server.Owner.tablename.columnName". You will need to run some stored procedures as an admin to allow this connectivity. It's also pretty slow but the effort to get it working is low.
If you have time to do it right look at data warehouse concepts. That's basically a temp table that collects the data you need to report on.
Building on Scott Ivey's excellent example above,
Use table name aliasing to simplify your code
Use UNION ALL instead of UNION assuming that your data is unique between the three databases
Code:
select
d1t1.Field1,
d1t1.Field2
from db1.dbo.Table1 AS d1t1
UNION ALL
select
d2t1.Field1,
d2.Field2
from db2.dbo.Table1 AS d2t1
UNION ALL
select
d3t1.Field1,
d3t1.Field2
from db3.dbo.Table1 AS d3t1
-- where ...
-- order by ...