System-versioned tables: need historical data for a new empty table for testing / development - MariaDB

I want to track the changes of a table using MariaDB system versioning.
From these changes I need to create a visualization grouped by month.
How can I create test data inside the table so that I can test my query properly? It is a new, empty table, so the test data needs to lie in the past; I am not able to wait a year or so for real data to accumulate.
Any ideas?

An example of loading data where the current time needs to be changed looks like this:
SET STATEMENT timestamp=UNIX_TIMESTAMP('2021-03-04') FOR
INSERT INTO .....
ref: fiddle
As documented, you cannot set system_versioning_asof to change the result of DML, as the fiddle example shows.

Starting with 10.11 MariaDB allows direct insertion of historical data.
https://mariadb.com/kb/en/system-versioned-tables/#system_versioning_insert_history
system_versioning_insert_history
Description: Allows direct inserts into ROW_START and ROW_END columns if secure_timestamp allows changing timestamp.
Commandline: --system-versioning-insert-history[={0|1}]
Scope: Global, Session
Dynamic: Yes
Type: Boolean
Default Value: OFF
Introduced: MariaDB 10.11.0
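A minimal sketch of how this can be combined with the back-dating approach from the question (the table and columns below are made up, and secure_timestamp has to permit changing the timestamp for your session):

-- MariaDB 10.11+: allow writing ROW_START / ROW_END directly in this session
SET @@system_versioning_insert_history = ON;

CREATE TABLE price (
    item   INT,
    amount DECIMAL(10,2)
) WITH SYSTEM VERSIONING;

-- A historical version that was valid from March to June 2021
INSERT INTO price (item, amount, row_start, row_end)
VALUES (1, 9.99, '2021-03-04', '2021-06-01');

-- Alternatively, back-date a normal insert by changing the statement timestamp,
-- as in the fiddle from the question
SET STATEMENT timestamp = UNIX_TIMESTAMP('2021-06-01') FOR
INSERT INTO price (item, amount) VALUES (1, 12.50);

With a few such rows spread across different months you can exercise the GROUP BY month query without waiting for real history to accumulate.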

Related

MariaDB - Inserting historical data into a system versioned (temporal) table

I have some tables in MariaDB whose changes I have been tracking with a separate "changelog" table that is updated every time a record is updated. However, I have recently learned about temporal data tables in MariaDB and I would like to switch to that method, as it is a much more elegant way of tracking changes. I'm wondering, however, if there is a way to transfer my "changelog" table over to the newly system-versioned tables.
So I was hoping I could insert new rows with the specified values for the table, also specify the row_start and row_end columns, and have that not trigger the table to create another historical row... is this possible? I tried just doing an "insert into (id, row_start, row_end, etc) values (x, y, z)", but that results in an unknown column "row_start" error.
Old question, but starting with 10.11, MariaDB allows direct insertion of historical data using a command-line option or setting.
https://mariadb.com/kb/en/system-versioned-tables/#system_versioning_insert_history
system_versioning_insert_history
Description: Allows direct inserts into ROW_START and ROW_END columns if secure_timestamp allows changing timestamp.
Commandline: --system-versioning-insert-history[={0|1}]
Scope: Global, Session
Dynamic: Yes
Type: Boolean
Default Value: OFF
Introduced: MariaDB 10.11.0
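A rough sketch of the migration itself (the changelog columns below are hypothetical, so adjust them to your actual layout; secure_timestamp must allow changing the timestamp):

SET @@system_versioning_insert_history = ON;

-- Replay closed changelog entries as historical versions of the versioned table
INSERT INTO mytable (id, name, row_start, row_end)
SELECT c.record_id, c.name, c.valid_from, c.valid_to
FROM changelog c
WHERE c.valid_to IS NOT NULL;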

How to Handle BQ GA Export Changes?

I'm trying to reprocess ga_sessions_yyyymmdd data, but I'm finding that ga_sessions never used to have a field called [channelGrouping], although it does in more recent data.
So my jobs work fine for the latest version of ga_sessions, but when I try to reprocess earlier ga_sessions data the job fails because it's missing the [channelGrouping] field.
Obviously this is usually what you want, but in this case it's not. I want to make sure I'm sticking to the latest ga_sessions schema, and I would like the job to just set missing columns to null where they did not exist.
Is there any way around this?
Perhaps I need to make an empty table called ga_sessions_template_latest and union it onto whatever ga_sessions_ daily table I'm handling - maybe this will 'upgrade' the old ga_sessions to the new structure.
Attached is a screenshot of exactly what I mean (my union idea will actually be horrible due to the nested fields in ga_sessions).
I don't have such a script yet, but since the tables are under your project you are able to update them. You can write a script that updates the schema of all tables with missing columns, based on the most recent schema.
I envision a script that gets the most recent table's schema, then goes back through the past tables one by one, compares them, identifies the missing columns, defines them as nullable (not required), and applies the additional columns by updating each table's schema. Data won't be modified; you will just have additional columns with null values.
You can also try this out for some tables from the Web UI.
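If it's just this one top-level field, newer BigQuery releases also let you add a nullable column per table with a DDL statement, which is the same idea as the script above (the project and dataset names below are placeholders; channelGrouping is a top-level STRING in the newer export schema, while nested fields would still need the schema-update route):

-- Add the missing column as a nullable STRING to an old daily table
ALTER TABLE `my_project.my_dataset.ga_sessions_20160101`
  ADD COLUMN IF NOT EXISTS channelGrouping STRING;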

InterSystems Caché command to get the last updated timestamp of a table

I want to know the last update time of an InterSystems Caché DB table. Please let me know the relevant command. I ran through their command documentation:
http://docs.intersystems.com/latest/csp/docboo/DocBook.UI.Page.cls?KEY=GTSQ_commands
But I don't see any such command there. I also tried searching through this :
http://docs.intersystems.com/latest/csp/docbook/DocBook.UI.Page.cls?KEY=RSQL_currenttimestamp
Is this not the complete documentation of commands?
Caché does not maintain "last updated" information by default, as it might introduce an unnecessary performance penalty on DML operations.
You can add this field manually to every table of interest:
Property LastUpdated As %TimeStamp [ SqlComputeCode = { Set {LastUpdated}= $ZDT($H, 3) }, SqlComputed, SqlComputeOnChange = (%%INSERT, %%UPDATE) ];
This way it would keep the time of last Update/Insert for every row, but still it would not help you with Delete.
Alternatively, you can set up triggers for every DML operation that maintain the timestamp in a separate table.
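A rough sketch of that trigger-based approach (the names are made up, and the exact CREATE TRIGGER syntax should be checked against your Caché version; you would add similar triggers for INSERT and DELETE):

-- Separate table holding one "last change" timestamp per tracked table
CREATE TABLE Audit.LastChange (
    TableName  VARCHAR(128) PRIMARY KEY,
    LastChange TIMESTAMP
)

-- Seed one row for the table you want to track
INSERT INTO Audit.LastChange (TableName, LastChange) VALUES ('MyApp.Person', CURRENT_TIMESTAMP)

-- Bump the timestamp on every UPDATE of the tracked table
CREATE TRIGGER PersonLastChangeUpd AFTER UPDATE ON MyApp.Person
    UPDATE Audit.LastChange SET LastChange = CURRENT_TIMESTAMP WHERE TableName = 'MyApp.Person'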
Without additional coding, the only way to gather this information is to scan the journal files, which is not really their intended use and would be slow at best.

Creation of Flyway "schema_version" fails for dashDB

I'm using Flyway to manage db migration on IBM dashDB. By default, this database organizes table content "by column", which in particular makes the creation of the "schema_version" table fail.
To get it to work, the table creation SQL statement just needs to include the "ORGANIZE BY ROW" directive:
CREATE TABLE (...)
(
    (...)
) ORGANIZE BY ROW
What would be the best approach to handle this issue? I'm looking for a solution that does not impact the default table organization.
Thanks for helping,
Cheers.
dashDB will perform best when all tables are column-based. When you start to mix row-based and column-based tables, many operations are performed in "compensation", which basically means they won't take full advantage of the columnar engine.
There are currently some compatibility reasons why a columnar table cannot be created and thus a row-based table must be used, but neither the original DDL nor the error is stated, so I can't tell in this case. If you can provide the full CREATE TABLE statement and the resulting error (if you have it), I can possibly provide an alternative solution that would allow you to still use all column-based tables.
If you only want to change a particular table from column-organized to row-organized, then an "ORGANIZE BY ROW" clause on the table definition would be the recommended way to approach this. (This seems to be what you're doing.)
Changing the default table org will change how tables are created when you don't put an "ORGANIZE BY" clause in your table DDL.
If you have admin privileges on your dashDB instance, you can change the default table org via 'Run SQL' in the dashDB console or using a dashDB client (for example: clp/clpplus).
Set default table organization to ROW:
call ADMIN_CMD('UPDATE DB CFG USING DFT_TABLE_ORG ROW');
Set default table organization to COLUMN: (default dashDB configuration)
call ADMIN_CMD('UPDATE DB CFG USING DFT_TABLE_ORG COLUMN');
Analytics will perform much better with column-organized tables, so it's recommended to have the majority of your tables column-organized.
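If briefly changing the default is acceptable, one way to keep only Flyway's metadata table row-organized is to toggle the default around the very first migration (this assumes admin privileges; a new connection may be needed for the changed default to take effect):

call ADMIN_CMD('UPDATE DB CFG USING DFT_TABLE_ORG ROW');
-- run the first Flyway migration / baseline here so "schema_version" is created row-organized
call ADMIN_CMD('UPDATE DB CFG USING DFT_TABLE_ORG COLUMN');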

Teradata: Is there a way to generate DDL from a view or select statement?

I am using a global application user account to access database A. This user account does not have permissions to modify database A's schema (ie, create tables, modify tables, etc). This user also has access to database B, but only views. I need to run SQL to feed data from a view in database B into a table in database A.
In a perfect world, I would be able to use this SQL:
create table database_a.mytable as (select * from database_b.myview) with no data
However, the user can't create tables in database A. If I could get the DDL of the select statement then I could log in under my personal account (which doesn't have any access to database B) and run the DDL in database A to create the table.
The only other option is to manually write the SQL, but I don't want to do that, especially since this view I am wanting to copy has many columns of varying data types and sizes.
Edit: I may be getting closer. I just experimented with this:
show (select * from database_b.myview)
However, it generated the DDL of every single table that is used in the view itself, as well as the definition of the view. This doesn't really help me since I just want the schema of the select statement itself. In other words, I need what would be generated if I were to use the create table as statement mentioned above.
Edit for Rob: Perhaps "DDL" was the wrong term to use. Using show view db.myview just shows the definition of the view, not the schema it represents. In my above example of create table as, I show how you can create a table that mimics the schema of a result set returned in a select. It generates a DDL on the back end for creating a table and then executes that DDL to actually create the table. You can then say show table db.newtable and see the new table's DDL. I want to get that DDL directly from a select statement so that I can copy it, log out of the app account, into my personal account, and then execute the DDL to create the table.
This is only to save me the headache of having to type out the DDL manually by hand to save time and reduce typing errors, especially since the source view has so many columns. That said, I think hitting up the DBA or writing some snazzy stored procedure to do dynamic stuff would be a bit over the top for my needs. I think there has to be a way to get the DDL for creating a table schema directly from a select statement.
Generate DDL Statements for objects:
SHOW TABLE {DatabaseB}.{Table1};
SHOW VIEW {DatabaseB}.{View1};
Breakdown of columns in a view:
HELP VIEW {DatabaseB}.{View1};
However, without the ability to create the object in the target database DatabaseA, you don't have much leverage. Obviously, if the object already existed, INSERT INTO SELECT ... FROM DatabaseB.Table1 or MERGE INTO would be options that you have already explored.
Alternative Solution
Would it be possible to have a stored procedure created that dynamically creates the table based on the view name that is provided? The global application account would simply need the privilege to execute the procedure. Generally, the user creating the stored procedure would need the permissions to perform the actions contained within it. (You have some additional flexibility with this in Teradata 13.10.)
There are some caveats with this approach. You are attempting to materialize views that could reference anywhere from hundreds to billions of records. These aren't simple 1:1 views that are put on top of the target tables. Trying to determine the required space in the target database to materialize the view will be difficult. Performance can and will vary depending on the complexity of the view and the data volumes. This will not be a fast-path or data block optimized operation.
As a DBA, I would be concerned with this approach being taken on by a global application account without fully understanding the intent. I trust you have an open line of communication with the DBA(s) involved for supporting this system. I'm sure there are reasons for your madness that can't be disclosed here.
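A rough sketch of such a procedure (the object names are hypothetical, the dynamic SQL built from the parameters would need validation, and whoever compiles the procedure needs the underlying CREATE TABLE rights):

REPLACE PROCEDURE AppDB.MaterializeView (IN SrcView VARCHAR(256), IN TgtTable VARCHAR(256))
BEGIN
  -- Build and execute the CREATE TABLE ... AS ... WITH NO DATA statement dynamically
  CALL DBC.SysExecSQL('CREATE TABLE ' || :TgtTable ||
                      ' AS (SELECT * FROM ' || :SrcView || ') WITH NO DATA NO PRIMARY INDEX');
END;

The global application account would then only need EXECUTE PROCEDURE on AppDB.MaterializeView.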
Possible Solution - VOLATILE TABLE
Unless the implicit privilege for CREATE TABLE has been revoked from the global application account this solution should work.
Volatile tables do not require perm space. Their table definitions persist for the duration of the session, and any data inserted into them relies on the spool space of the user who instantiated them.
CREATE VOLATILE TABLE {Global Application UserID}.{TableA_Copy} AS
(
SELECT *
FROM {DatabaseB}.{TableA}
)
WITH NO DATA
NO PRIMARY INDEX
ON COMMIT PRESERVE ROWS;
SHOW TABLE {Global Application UserID}.{TableA_Copy};
I opted to use a Teradata 13.10 feature called NO PRIMARY INDEX. By default, CREATE TABLE AS will take the first column of the SELECT statement and make it the PRIMARY INDEX of the table. This could lead to skewing and perm space issues in your testing depending on the data demographics. You can specify an explicit PRIMARY INDEX on your own as you understand the underlying data. (See the DDL manuals for details on the syntax if you're uncertain.)
The use of ON COMMIT PRESERVE ROWS for the intent of this example is probably extraneous. But in reality, if you popped any data into that table for testing, this clause would be beneficial in Teradata mode, as the data would otherwise be lost immediately after the CREATE TABLE or any other data manipulation performed against the volatile table.
