Teradata SET table

I have a SET table in Teradata. When I load duplicate records through Informatica, the session fails because it tries to push duplicate records into the SET table.
I want Informatica to reject duplicate records whenever they are loaded, using either a TPT or a Relational connection.
Can anyone help me with the properties I need to set?

Do you really need to keep track of which records are rejected due to duplication in the TPT logs? It seems like you are open to suggestions about TPT or relational connections, so I assume you don't really care about TPT-level logs.
If this assumption is correct, then you can simply put an Aggregator Transformation in the mapping and mark every field as Group By. This will add a GROUP BY clause to the generated query and eliminate the duplicates in the source data.
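As a sketch, the de-duplication this pushes down amounts to something like the following (the table and column names here are hypothetical):

-- group by every selected column, so each distinct row survives exactly once
SELECT cust_id, cust_name, load_dt
FROM src_customer
GROUP BY cust_id, cust_name, load_dt;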

Please try the following:
1. If you use FastLoad (fload) or the TPT LOAD operator, the utility will implicitly remove the duplicates, but it can only be used for loading into empty tables.
2. If you are loading into a non-empty table, place a Sorter in the mapping and de-dupe your data in Informatica (a SQL equivalent is sketched after this list).
3. Also try setting the "Stop on error" flag to 0 and the target "Error limit" to -1.
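For reference, a minimal SQL sketch of the de-dupe in option 2 done on the database side instead, assuming a hypothetical MULTISET stage table stg_customer that lets the duplicates land first:

-- keep one row per key when moving from the stage to the SET target
INSERT INTO tgt_customer (cust_id, cust_name, load_dt)
SELECT cust_id, cust_name, load_dt
FROM stg_customer
QUALIFY ROW_NUMBER() OVER (PARTITION BY cust_id ORDER BY load_dt DESC) = 1;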
Please share your results with us.

Related

Teradata error 2641 <tablename> was restructured. Resubmit

I have a table on which I drop and re-create an index each day, and I have another job that queries this table with an ACCESS lock.
Sometimes these jobs run at the same time, and then I get the following error:
2641 %DBID.%TVMID was restructured. Resubmit.
I have read in the documentation the following:
Explanation:
A table was changed before a statement that references the table was processed.
(For example, an index may have been added or a field removed.)
Notes:
The statement may not have the intended result because of the change in the table.
Remedy:
Examine the table and resubmit the request.
https://docs.teradata.com/reader/8MhLDQBmL52OycrEKPuGqg/Ju5pqm9uRFO6VziQdcmA6w
I guess this is because the CREATE INDEX statement requests an EXCLUSIVE lock and the SELECT statement is queued while the index is created, but when the SELECT is popped from the queue the table has a different version number and it fails.
Maybe I am completely wrong, but is there any way to avoid this behaviour?
Something along the lines of making the SELECT statement re-evaluate when it finally gets the chance to be executed.
Thank you!
It is up to the application to handle the 2641 and resubmit the request. There is no option to have the database do so automatically.

Getting IDs of inserted rows via Doctrine DBAL

I'm working on a Symfony application and I need to insert multiple rows at once. Doctrine ORM is not a good option because it executes a separate query for each row; to avoid this and insert all the rows over a single connection, I used a Doctrine DBAL prepared statement, and it works fine. Except that I need the IDs of the inserted rows, and the only available function seems to be lastInsertId(), which returns only the last ID, not all of the inserted ones. How can I achieve this?
Any help would be appreciated!
This is actually not related to Doctrine at all: it's unlikely that Doctrine would support returning a list of IDs after a batch insert when it doesn't even have batch inserts itself :) If you want all the inserted IDs, it has to be done at the database level.
Check these answers related to MySQL:
How can I Insert many rows into a MySQL table and return the new IDs?
MySQL LAST_INSERT_ID() used with multiple records INSERT statement
And it's possible in PostgreSQL (since you didn't mention your DB):
Retrieving serial id from batch inserted rows in postgresql
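For illustration, minimal sketches of both approaches against a hypothetical items table. In MySQL, LAST_INSERT_ID() returns the id of the first row of a multi-row INSERT, and on InnoDB the generated ids are consecutive as long as innodb_autoinc_lock_mode is 0 or 1, so the full range can be derived from the row count:

-- MySQL: ids are LAST_INSERT_ID() .. LAST_INSERT_ID() + row_count - 1
INSERT INTO items (name) VALUES ('a'), ('b'), ('c');
SELECT LAST_INSERT_ID();  -- id of 'a'; 'b' and 'c' follow consecutively

-- PostgreSQL: RETURNING hands back every generated id directly
INSERT INTO items (name) VALUES ('a'), ('b'), ('c') RETURNING id;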
You can actually generate IDs before inserting content into the database, for example using random UUIDs.
This library might be of use: https://github.com/ramsey/uuid
require 'vendor/autoload.php'; // Composer autoloader for the ramsey/uuid package
use Ramsey\Uuid\Uuid;

$uuid4 = Uuid::uuid4();   // generate a random (version 4) UUID
echo $uuid4->toString();

How to Handle BQ GA Export Changes?

I'm trying to reprocess ga_sessions_yyyymmdd data, but I'm finding that ga_sessions never used to have a field called [channelGrouping], although it does in more recent data.
So my jobs work fine for the latest version of ga_sessions, but when I try to reprocess earlier ga_sessions data the job fails because it's missing the [channelGrouping] field.
Usually this is what you want, but in this case it's not. I want to make sure I'm sticking to the latest ga_sessions schema, and would like the job to just set missing columns to null for dates when they did not exist.
Is there any way around this?
Perhaps I need to make an empty table called ga_sessions_template_latest and union it onto whatever ga_sessions_ daily table I'm handling - maybe this will 'upgrade' the old ga_sessions to the new structure.
Attached is a screenshot of exactly what I mean (my union idea will actually be horrible due to the nested fields in ga_sessions).
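For what it's worth, the padding idea looks roughly like this for top-level fields (standard SQL, hypothetical project/dataset/date names; as noted, the nested RECORD fields are what make it horrible in practice):

-- newer tables already have channelGrouping; pad older ones with NULL
SELECT fullVisitorId, channelGrouping
FROM `myproject.mydataset.ga_sessions_20170801`
UNION ALL
SELECT fullVisitorId, CAST(NULL AS STRING) AS channelGrouping
FROM `myproject.mydataset.ga_sessions_20160801`;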
I don't have such a script yet, but since the tables are under your project you are able to update them. You can write a script that updates the schema on all tables that are missing columns from the most recent schema.
I envision a script that gets the most recent table's schema, then goes back one by one through the past tables, does a compare, identifies the missing columns, defines them as nullable and not required, and applies the additional columns in a schema update on each table. The data won't be modified; you will just have additional columns with null values.
You can also try some of this from the web UI.
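If your project can use BigQuery's standard-SQL DDL, the per-table schema update in such a script can be a single statement per table (hypothetical project and dataset names; the added column is nullable, so existing rows simply read as null):

ALTER TABLE `myproject.mydataset.ga_sessions_20160801`
ADD COLUMN IF NOT EXISTS channelGrouping STRING;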

Why does Teradata throw "Too many data records packed in one USING row" only sometimes?

I am using JDBC to upload data to Teradata. I used to have batches of 100,000 rows and it always worked fine for me; no dataset ever failed to upload.
Now I tried to upload a one-column table (all integers) and I got "Too many data records packed in one USING row". When I changed the batch size to 16,383 it worked.
I found out that I am still able to use 100,000-row batches for tables with multiple columns, but when I try to upload a table with a single column, it throws "Too many data records packed in one USING row". I just can't understand why. Intuitively, a single-column table should be easier to upload, right? What is going on here?
16,383 is the limit on the number of rows the Teradata JDBC driver will pack into a single request for a PreparedStatement batch using a non-FastLoad INSERT. The driver packs as many data records as fit within its maximum request message size, so with multiple columns a 100,000-row batch is split across requests well before that row count is reached; with a single small column, far more records fit in one message and the 16,383 limit is what you hit first, which is likely why you only see the error on the one-column table.
Have you considered adding TYPE=FASTLOAD to your connection parameters (e.g. jdbc:teradata://yourhost/DATABASE=yourdb,TYPE=FASTLOAD) and allowing Teradata to invoke the FastLoad API to bulk-load your data for INSERT statements that FastLoad supports? The JDBC FastLoad mechanism is suggested for inserts of 100K records or more. The big factor here is that your target table in Teradata must be empty.
If it isn't empty, you may be able to FastLoad into an empty stage table and then use the ANSI MERGE statement to perform an UPSERT of the staged data into the target table.
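A rough sketch of that staged upsert, with hypothetical table and column names (Teradata expects the ON clause to match the target's primary index, assumed here to be acct_id):

-- FastLoad into the empty stage table first, then merge into the target
MERGE INTO tgt_accounts AS t
USING stg_accounts AS s
ON (t.acct_id = s.acct_id)
WHEN MATCHED THEN
  UPDATE SET balance = s.balance
WHEN NOT MATCHED THEN
  INSERT (acct_id, balance) VALUES (s.acct_id, s.balance);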

What methods are available to monitor SQL database records?

I would like to monitor 10 tables with 1000 records per table. I need to know when a record changed and which record it was.
I have looked into SQL Dependencies, however it appears that SQL Dependencies would only be able to tell me that the table changed, and not which record changed. I would then have to compare all the records in the table to find the modified record. I suspect this would be a problem for me as the records constantly change.
I have also looked into SQL triggers, but I am not sure whether triggers would work for monitoring which record changed.
Another thought I had is to create a "Monitoring" table which would have records added to it via the application code whenever a record is modified.
Do you know of any other methods?
EDIT:
I am using SQL Server 2008
I have looked into Change Data Capture, which is available in SQL Server 2008 and was suggested by Martin Smith. Change Data Capture appears to be a robust, easy-to-implement and very attractive solution. I am going to roll out CDC on my database.
You can add triggers and have them add rows to an audit table. They can audit the primary key of the rows that changed, and even additional information about the changes. For instance, in the case of an UPDATE, they can record the columns that changed.
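A minimal sketch of such a trigger, using hypothetical dbo.Orders / OrderId names (SQL Server syntax):

-- audit table holding the primary key and the kind of change
CREATE TABLE dbo.OrdersAudit (
    OrderId   int      NOT NULL,
    Operation char(1)  NOT NULL, -- 'U' = update, 'D' = delete
    ChangedAt datetime NOT NULL DEFAULT GETDATE()
);
GO
CREATE TRIGGER trgOrdersAudit ON dbo.Orders
AFTER UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- every row in 'deleted' was either updated or removed
    INSERT INTO dbo.OrdersAudit (OrderId, Operation)
    SELECT d.OrderId,
           CASE WHEN EXISTS (SELECT 1 FROM inserted i WHERE i.OrderId = d.OrderId)
                THEN 'U' ELSE 'D' END
    FROM deleted d;
END;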
Before you write/implement your own, take a look at AutoAudit:
AutoAudit is a SQL Server (2005, 2008) code-gen utility that creates audit trail triggers with:
- Created, CreatedBy, Modified, ModifiedBy, and RowVersion (incrementing INT) columns added to the table
- Insert events logged to an Audit table
- Updates logging old and new values to the Audit table
- Deletes logging all final values to the Audit table
- A view to reconstruct deleted rows
- A UDF to reconstruct row history
- A schema audit trigger to track schema changes
- Trigger re-code-gen when ALTER TABLE changes the table
What version and edition of SQL Server? Is Change Data Capture available? – Martin Smith
I am using SQL Server 2008, which supports Change Data Capture. Change Data Capture is a very robust method for tracking data changes, just as I would like. Thanks for the answer.
Here's an idea: you can have a flag on each table that is filled with the current datetime every time a record is created or updated. When you notice that a record has changed, set its flag back to null. Unchanged records then have null in their flag field, and you can query for non-null values to see which records have changed or been created, and when (and then set their flags to null again).
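A quick sketch of that flag idea against a hypothetical dbo.Orders table (SQL Server syntax, to match the question):

-- nullable flag column; the application stamps it whenever a row changes
ALTER TABLE dbo.Orders ADD ChangedAt datetime NULL;

-- poll for created/changed rows ...
SELECT OrderId, ChangedAt FROM dbo.Orders WHERE ChangedAt IS NOT NULL;

-- ... and reset the flag once they have been processed
UPDATE dbo.Orders SET ChangedAt = NULL WHERE ChangedAt IS NOT NULL;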
