Perform CDC on CLOB Column in Informatica/Teradata

I have a CLOB column and need to perform CDC on it, flagging each row as update/insert/no-change based on changes in the CLOB data. My source and target are Teradata, and the ETL tool we are using is Informatica. The table has one key column plus the CLOB column, i.e., 2 columns.
Could anyone help me with how to achieve this, either in Teradata or Informatica or using both?
Thanks in advance.

My suggestion would be to go with MD5: load the data and store an MD5 hash of the CLOB alongside it. Upon reload, calculate the MD5 hash again and compare it with the stored one. Direct CLOB comparison is likely to be troublesome as well as time- and memory-consuming.
I've found a few more comments and answers here; take a look.
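To make the MD5 approach concrete, here is a minimal sketch in Python; the staging plumbing and how you fetch the stored hashes are left out, and all names are made up for illustration. (On the Informatica side you could compute the same value with the MD5() expression function plus a lookup on the target, though you may need to check how your version handles CLOB/TEXT ports.)

import hashlib

def md5_of_clob(clob_text):
    # Compare 32-character hex digests instead of the raw CLOBs.
    return hashlib.md5(clob_text.encode("utf-8")).hexdigest()

def cdc_flag(key, clob_text, stored_hashes):
    # stored_hashes: dict mapping key -> MD5 hash saved with the target row.
    new_hash = md5_of_clob(clob_text)
    old_hash = stored_hashes.get(key)
    if old_hash is None:
        return "INSERT"
    return "NOCHANGE" if new_hash == old_hash else "UPDATE"

# Example: key 42 exists with different CLOB contents -> "UPDATE"
print(cdc_flag(42, "new clob contents", {42: md5_of_clob("old clob contents")}))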

Related

DynamoDB: what is an efficient way to copy items from hash A to hash B?

I have multiple data entries under hash key A; each entry has its own range key.
I want to copy all of them to hash key B; their range keys can stay the same or take any new value.
What is the fastest way to do this? Is there any built-in way to do it?
If you are talking about existing data, there isn't really much you can do to make it faster or more efficient. You're going to have to do a scan of the data and write the new records. You can do this with a Lambda function, combined with Step Functions if the data is big enough that a single invocation won't complete in 15 minutes.
If you are talking about duplicating the data as it is added to the table, you can use DynamoDB Streams and write the duplicated record after the original record is added.
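A rough boto3 (Python) sketch of the first approach; since all the items to copy share one hash key, a Query on that key is the narrow form of the scan described above. The table and attribute names ("MyTable", "pk") are placeholders:

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("MyTable")  # placeholder name

def copy_items(old_hash, new_hash):
    # Fetch every item under the old hash key, following pagination.
    response = table.query(KeyConditionExpression=Key("pk").eq(old_hash))
    items = response["Items"]
    while "LastEvaluatedKey" in response:
        response = table.query(
            KeyConditionExpression=Key("pk").eq(old_hash),
            ExclusiveStartKey=response["LastEvaluatedKey"],
        )
        items.extend(response["Items"])
    # Rewrite the hash key and batch-write the copies; the range key
    # attribute is left untouched, so each copy keeps its range value.
    with table.batch_writer() as batch:
        for item in items:
            item["pk"] = new_hash
            batch.put_item(Item=item)

copy_items("A", "B")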

Bulk Collect and FOR loop when all the values for the insert DML are not available

I want to insert 150K records from a source table into a destination table. The problem is that I have to calculate some values for the destination table too.
How should I use BULK COLLECT and a FOR statement for the INSERT DML?
Please find a more detailed explanation below.
Source Table
Account_id | Status1 | Status2
Table 1
Account_id | column2 | column3 | column4 | column6 | column7
Table 2
Account_id | column2 | column3 | column6 | column9 | column10
Now I have to fetch the values from Table 1 for the account_ids matching the source table and insert them into Table 2, where I have to populate column9 and column10 dynamically.
BULK COLLECT requires a lot of memory and in practice is only feasible if you process your data in chunks, i.e. about 1000 rows at a time. Otherwise the memory consumption will be too much for most systems.
The best option is usually to create a single INSERT .. SELECT statement that retrieves, calculates and inserts all the data at once (a sketch follows at the end of this answer).
If this is not possible or far too complex, the second best option in my opinion is a pipelined function written in PL/SQL.
The third best and usually easiest option is a simple PL/SQL loop that selects rows one by one, calculates the required data and inserts the results row by row. Performance-wise it's usually the worst, but it can still be more than sufficient.
For more precise answers, you need to specify the exact problem at hand. Your question is rather broad.
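For illustration, a minimal sketch of the first (single-statement) option, issued from Python with the python-oracledb driver. The join and the expressions for column9/column10 are placeholders, since the question doesn't say how those values are derived:

import oracledb

# Connection details are placeholders.
conn = oracledb.connect(user="scott", password="tiger", dsn="localhost/orclpdb1")
with conn.cursor() as cur:
    # One set-based statement: the database retrieves, calculates and
    # inserts all 150K rows without any PL/SQL looping.
    cur.execute("""
        INSERT INTO table2 (account_id, column2, column3, column6, column9, column10)
        SELECT t1.account_id,
               t1.column2,
               t1.column3,
               t1.column6,
               CASE WHEN s.status1 = 'OPEN' THEN 'Y' ELSE 'N' END,  -- placeholder
               t1.column7 * 2                                        -- placeholder
          FROM table1 t1
          JOIN source_table s ON s.account_id = t1.account_id
    """)
conn.commit()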

Firebase IDs - can I extract the date/time generated?

I need to date/timestamp various transactions, and can add that explicitly into the data structure.
Firebase creates an ID like IuId2Du7p9rJoT-BARu using some algorithm.
Is there a way I can decode the date/time from the firebase-created ID and avoid storing a separate date/timestamp?
Short answer: no.
I've asked the same question previously, because my engineering instincts tell me I should never duplicate data. The conclusion I came to after thinking this through to its logical end is that even in a SQL database there exists plenty of duplication; it's simply hidden under the covers (as indices, temporary tables, and memory caches). That is part and parcel of large, active data.
So drop the timestamp into the data and go have lunch; save yourself some energy :)
Alternatively, skip the timestamp entirely. The records are already stored in timestamp order, assuming you haven't provided your own priority, so you should be good to go.

sqlite3 insert into dynamic table

I am using sqlite3 (maybe sqlite4 in the future) and I need something like dynamic tables.
I have many tables with the same format: values_2012_12_27, values_2012_12_28, ... (the number of tables is dynamic) and I want to dynamically select the table that receives some data.
I am using sqlite3_prepare with INSERT INTO ? VALUES(?,?,?). Of course this fails to compile (syntax error near "?"). Is there a nice and simple way to do this in SQLite?
Thanks
Using SQL parameters is not possible for identifiers such as table or column names.
If you don't want to keep so many prepared statements around, just prepare them on the fly whenever you need one.
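A small sketch of "prepare on the fly", using Python's sqlite3 module for brevity (the same idea applies with sqlite3_prepare in C): whitelist the table name, splice it into the SQL text, and keep binding the values as parameters:

import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "values_2012_12_28" (a, b, c)')

def insert_into_day_table(conn, table_name, a, b, c):
    # Identifiers cannot be bound, so validate the name against the
    # expected pattern before formatting it into the statement text.
    if not re.fullmatch(r"values_\d{4}_\d{2}_\d{2}", table_name):
        raise ValueError("unexpected table name: %r" % table_name)
    conn.execute('INSERT INTO "%s" VALUES (?, ?, ?)' % table_name, (a, b, c))

insert_into_day_table(conn, "values_2012_12_28", 1, 2.5, "x")
conn.commit()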
If your database were properly normalized, you would have a single big values table with an extra date column.
This organization is usually to be preferred, unless you have measured both and found that the better performance (if it actually exists) outweighs the overhead of managing multiple tables.
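For comparison, a sketch of the normalized layout; one statement then serves every date, so it can be prepared once and reused:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vals (day TEXT, a, b, c)")

# The date becomes just another bound value instead of part of a table name.
conn.execute("INSERT INTO vals VALUES (?, ?, ?, ?)", ("2012-12-28", 1, 2.5, "x"))
conn.commit()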

How to determine position of specific character/string in SQLite string column value?

I have values in a SQLite table* that contain a number of strings, of different lengths, joined by periods, something like this:
SomeApp.SomeNameSpace.InterestingString.NotInteresting
SomeApp.OtherNameSpace.WantThisOne.ReallyQuiteDull
SomeApp.OtherNameSpace.WantThisOne.AlsoDull
SomeApp.DifferentNameSpace.AlwaysWorthALook.LittleValue
I'd like to extract (in this case) the third period-delimited substring so I could write something like
SELECT interesting_string, COUNT(*)
FROM ( SELECT third_part_of_period_delimited_string(name) interesting_string )
GROUP BY interesting_string;
Obviously I can do this any number of ways programmatically; I'm wondering if there's any way to achieve this in a SQLite SELECT query?
* It's a SharpDevelop Profiler database, if anyone's curious
No.
You can, as you mention, work with the strings after you have selected them from the database. Or you can split them up into separate columns when they are stored.
If you do not have access to the code that is storing the data, you might want to consider reading the data in its entirety, splitting the strings, and storing the split-out tokens in separate columns in a new table. If the data is not too large, you might look at keeping this table in an in-memory database for excellent performance.
Whether this is worthwhile depends on whether one pass to split the data strings can be reused many times. If the data is constantly changing, this scheme would probably not work well.
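As a sketch of the first suggestion, selecting the raw values and doing the split plus the GROUP BY equivalent in Python; the table name ("names") is a guess, and the column name follows the question's example:

import sqlite3
from collections import Counter

conn = sqlite3.connect("profiler.db")  # placeholder file name

counts = Counter()
for (name,) in conn.execute("SELECT name FROM names"):  # placeholder table
    parts = name.split(".")
    if len(parts) >= 3:
        counts[parts[2]] += 1  # third period-delimited substring

for interesting_string, n in counts.most_common():
    print(interesting_string, n)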
