In Teradata DB, generally expectation would be to get the data size for a table, but for my task I need to identify size of a View. This view contains joins from 3 different tables and all I could identify is the count of the View - 757747
Should I directly use this count and apply formula to determine it size?. If so what would be formula, is it sum(currentperm) I am confused, please advise
Related
I am working on a CR where I need to create a PL/SQL package and I am bit confused about the approach.
Background : There is a View named ‘D’ which is at end of the chain of interdependent views in sequence.
We can put it as :
A – Fact table (Populated using Informatica, source MS-Dynamics)
B – View 1 based on fact table
C – View 2 based on View1
D – View 3 based on view2
Each view has multiple joins with other tables in structure along with the base view.
Requirement: Client wants to remove all these views and create a PL/SQL Package which can insert data directly from MS-Dynamics to View3 i.e., ‘D’.
Before I come up with something complex. I would like to know, is there any standard approach to address such requirements.
Any advice/suggestions are appreciated.
It should be obvious that you still need a fact table to keep some data.
You could get rid of B and C by making D more complex (the WITH clause might help to keep it overseeable).
Inserting data into D is (most likely) not possible per se, but you can create and INSTEAD OF INSERT trigger to handle that, i.e. insert into the fact table A instead.
Example for using the WITH clause:
Instead of
create view b as select * from dual;
create view c as select * from b;
create view d as select * from c;
you could write
create view d as
with b as (select * from dual),
c as (select * from b)
select * from c;
As you can see, the existing view definition goes 1:1 into the WITH clause, so it's not too difficult to create a view to combine all views.
If you are on Oracle 12c you might look at DBMS_UTILITY.EXPAND_SQL_TEXT, though you'll probably want to clean up the output a bit for readability.
A few things first
1) A view is a predefined sql query so it is not possible to insert records directly into it. Even a materialized view which is a persistant table structure only gets populated with the results of a query thus as things stand this is not possible. What is possible is to create a new table to populate the data which is currently aggregated at view D
2) It is very possible to aggregate data at muliple levels in Informatica using combination of multiple inline sorter and aggregater transformations which will generate the data at the level you're looking for.
3) Should you do it? Data warehousing best practices would say no and keep the data as granular as possible per the original table A so that it can be rolled up in many ways (refer Kimball group site and read up on star schema for such matters). Do you have much sway in the choice though?
4) The current process (while often used) is not that much better in terms of star schema
I have multiple data sets that drive the Pentaho report. The data is derived from a handful of stored procedures. I need to access multiple data sources within the report without using sub reports and I believe the best solution is to create open formulas. The SINGLEVALUEQUERY I believe will only return the first column or row. I need to return multiple columns.
As an example here my stored procedure which is named HEADER in Pentaho (CALL Stored_procedure_test (2014, HEADER)), returns 3 values - HEADER_1, HEADER_2, HEADER_3. I'm uncertain of the correct syntax to return all three values for the open formula. Below is what I tried but was unsuccessful.
=MULTIVALUEQUERY("HEADER";?;?)
The second parameter denotes the column that contains the result.
If you dont give a column name here, the reporting engine will simply take the first column of the result. In the case of the MULTIVALUEQUERY function, the various values of the result set are then aggregated into a array of values that is suitable to be passed into a multi-select parameter or to be used in a IN clause in a SQL data-factory.
For more details see https://www.on-reporting.com/blog/using-queries-in-formulas-in-pentaho/
I have two problem sets. What I am preferably looking for is a solution which combines both.
Problem 1: I have a table of lets say 20 rows. I am reading 150,000 rows from other table (say table 2). For each row read from table 2, I have to match it with a specific row of table 1 (not matching whole row, few columns. like if table2.col1 = table1.col && table2.col2 = table1.col2) etc. Is there a way that i can cache table 1 so that i don't have to query it again and again ?
Problem 2: I want to generate query string dynamically i.e., if parameter 2 is null then don't put it in where clause. Now the only option left is to use immidiate execute which will be very slow.
Now what i am asking that how can i have dynamic query to compare it with table 1 ? any ideas ?
For problem 1, as mentioned in the comments, let the database handle it. That's what it does really well. If it is something being hit often, then the blocks for the table should remain in the database buffer cache if the buffer cache is sized appropriately. Part of DBA tuning would be to identify appropriate sizing, pinning tables into the "keep" pool, etc. But probably not something that needs worrying over.
If the desire is just to simplify writing the queries rather than performance, then views or stored procs can simplify the repetitive use of the join.
For problem 2, a query in a format like this might work for you:
SELECT id, val
FROM myTable
WHERE filter = COALESCE(v_filter, filter)
If the input parameter v_filter is null, then just automatically match the existing column. This assumes the existing filter column itself is never null (since you can't use = for null comparisons). Also, it assumes that there are other indexed portions in the WHERE clause since a function like COALESCE isn't going to be able to take advantage of an index.
For problem 1 you just join the tables. If there is an equijoin and one table is quite small and the other large then you're likely to get a hash join. This is effectively a caching mechanism, and the total cost of reading the tables and performing the join is only very slightly higher than that of reading the tables (as long as the hash table fits in memory).
It does not make a difference if the query is constructed and run through execute immediate -- the RDBMS hash join will still act as an effective cache.
I'd like to use flyway for a DB update with the situation that an DB already exists with productive data in it. The problem I'm looking at now (and I did not find a nice solution yet), is the following:
There is an existing DB table with numeric IDs, e.g.
create table objects ( obj_id number, ...)
There is a sequence "obj_seq" to allocate new obj_ids
During my DB migration I need to introduce a few new objects, hence I need new
object IDs. However I do not know at development time, what ID
numbers these will be
There is a DB trigger which later references these IDs. To improve performance I'd like to avoid determine the actual IDs every time the trigger runs but rather put the IDs directly into the trigger
Example (very simplified) of what I have in mind:
insert into objects (obj_id, ...) values (obj_seq.nextval, ...)
select obj_seq.currval from dual
-> store this in variable "newID"
create trigger on some_other_table
when new.id = newID
...
Now, is it possible to dynamically determine/use such variables? I have seen the flyway placeholders but my understanding is that I cannot set them dynamically as in the example above.
I could use a Java-based migration script and do whatever string magic I like - so, that would be a way of doing it, but maybe there is a more elegant way using SQL?
Many thx!!
tge
If the table you are updating contains only reference data, get rid of the sequence and assign the IDs manually.
If it contains a mix of reference and user data, you need to select the id based on values in other columns.
I am using a global application user account to access database A. This user account does not have permissions to modify database A's schema (ie, create tables, modify tables, etc). This user also has access to database B, but only views. I need to run SQL to feed data from a view in database B into a table in database A.
In a perfect world, I would be able to use this SQL:
create database_a.mytable as (select * from database_b) with no data
However, the user can't create tables in database A. If I could get the DDL of the select statement then I could log in under my personal account (which doesn't have any access to database B) and run the DDL in database A to create the table.
The only other option is to manually write the SQL, but I don't want to do that, especially since this view I am wanting to copy has many columns of varying data types and sizes.
Edit: I may be getting closer. I just experimented with this:
show (select * from database_b.myview)
However, it generated the DLL of every single table that is used in the view itself, as well as the definition for the view. This doesn't really help me since I just want the schema of the select statement itself. In other words, I need what would be generated if I were to use the create table as statement mentioned above.
Edit for Rob: Perhaps "DDL" was the wrong term to use. Using show view db.myview just shows the definition of the view, not the schema it represents. In my above example of create table as, I show how you can create a table that mimics the schema of a result set returned in a select. It generates a DDL on the back end for creating a table and then executes that DDL to actually create the table. You can then say show table db.newtable and see the new table's DDL. I want to get that DDL directly from a select statement so that I can copy it, log out of the app account, into my personal account, and then execute the DDL to create the table.
This is only to save me the headache of having to type out the DDL manually by hand to save time and reduce typing errors, especially since the source view has so many columns. That said, I think hitting up the DBA or writing some snazzy stored procedure to do dynamic stuff would be a bit over the top for my needs. I think there has to be a way to get the DDL for creating a table schema directly from a select statement.
Generate DDL Statements for objects:
SHOW TABLE {DatabaseB}.{Table1};
SHOW VIEW {DatabaseB}.{View1};
Breakdown of columns in a view:
HELP VIEW {DatabaseB}.{View1};
However, without the ability to create the object in the target database DatabaseA your don't have much leverage. Obviously, if the object already existed INSERT INTO SELECT ... FROM DatabaseB.Table1 or MERGE INTO would be options that you already explored.
Alternative Solution
Would it be possible to have a stored procedure created that dynamically created the table based on the view name that is provided? The global application account would simply need privilege to execute the procedure. Generally the user creating the stored procedure would need the permissions to perform the actions contained within the stored procedure. (You have some additional flexibility with this in Teradata 13.10.)
There are some caveats with this approach. You are attempting to materialize views that could reference anywhere from hundreds to billions of records. These aren't simple 1:1 views that are put on top of the target tables. Trying to determine the required space in the target database to materialize the view will be difficult. Performance can and will vary depending on the complexity of the view and the data volumes. This will not be a fast-path or data block optimized operation.
As a DBA, I would be concerned with this approach being taken on by a global application account without fully understanding the intent. I trust you have an open line of communication with the DBA(s) involved for supporting this system. I'm sure there are reasons for your madness that can't be disclosed here.
Possible Solution - VOLATILE TABLE
Unless the implicit privilege for CREATE TABLE has been revoked from the global application account this solution should work.
Volatile tables do not require perm space. There table definitions persist for the duration of the session and any data inserted into them relies on the spool space of the user who instantiated it.
CREATE VOLATILE TABLE {Global Application UserID}.{TableA_Copy} AS
(
SELECT *
FROM {DatabaseB}.{TableA}
)
WITH NO DATA
NO PRIMARY INDEX
ON COMMIT PRESERVE ROWS;
SHOW TABLE {Global Application UserID}.{TableA_Copy};
I opted to use a Teradata 13.10 feature called NO PRIMARY INDEX. By default, CREATE TABLE AS will take the first column of the SELECT statement and make it the PRIMARY INDEX of the table. This could lead to skewing and perm space issues in your testing depending on the data demographics. You can specify an explicit PRIMARY INDEX on your own as you understand the underlying data. (See the DDL manuals for details on the syntax if you're uncertain.)
The use of ON COMMIT PRESERVE ROWS for the intent of this example is probably extraneous. But in reality if you popped any data into that table for testing this clause would be beneficial in Teradata mode as the data would otherwise be lost immediately after the CREATE TABLE or any other data manipulation was performed against the volatile table.