I am looking for suggestions on how this requirement should be approached. I have to come up with an Informatica mapping to construct the Target File below.
Source File
Key-1 Key-2 ACCOUNT-1
Key-1 Key-2 ACCOUNT-2
Key-1 Key-2 CC-ACC-1
Key-1 Key-2 CC-ACC-2
Key-1 Key-2 CC-ACC-3
For the above source layout, I need to produce one output record per the Target File layout below.
Basically, I need to group the data on the Key-1 and Key-2 fields and create a single record that can hold more than one account and more than one set of credit card account details for a customer.
TARGET FILE << MAINFRAME FILE LAYOUT >>
Key-1 String 10
Key-2 String 10
BANK-CUSTOMER-INFO
MGR-NAME STRING 50 — data to be extracted from MGR Table
MGR-EMAIL STRING 100 — data to be extracted from MGR-ADDTNL-INFO Table
MGR-PHNE STRING 10 — data to be extracted from MGR-ADDTNL-INFO Table
CUST-NAME STRING 100 — data to be extracted from Person Table
CUST-EMAIL STRING 100 — data to be extracted from Person-Addtnl_info Table
CUST-ACCOUNT-INFO
BANK-ACCOUNT OCCURS 5 TIMES
ACC-NO STRING 10 ( Key Field )
ACC-TYPE STRING 10 — data to be extracted from A Table
ACC-TRXN-DTLS OCCURS 10 TIMES
ACC-TRXN-DATE DATE 10 — data to be extracted from X Table
ACC-TRXN-MODE STRING 10 — data to be extracted from Y Table
ACC-TRXN-AMT STRING 10 — data to be extracted from Z Table
CREDIT-CARD-ACC OCCURS 5 TIMES
CC-ACC-NO STRING 10 ( Key Field )
CC-ACC-TYPE STRING 10 — data to be extracted from B Table
CC-TRXN-DTLS OCCURS 10 TIMES
CC-TRXN-DATE DATE 10 — data to be extracted from X1 Table
CC-TRXN-MODE STRING 10 — data to be extracted from Y2 Table
CC-TRXN-AMT STRING 10 — data to be extracted from Z2 Table
Questions:
How can this mapping be accomplished?
Does Informatica support a target structure as defined above?
Most of my target attributes have to be fetched from multiple tables to construct one record; what is the most efficient way to come up with the mapping?
Also, I need to denormalise the data to get the above structure.
The structure you've shown can easily be achieved with an XML target (so long as you're happy to have the target in XML).
One way to do it is to have a source qualifier for each field you want in the target and then use Joiner transformations to denormalise the details across records. The only drawback is that if you only want to extract a few records, this method will still pick up all the records in the source tables each time the mapping runs.
Otherwise you'll need a source qualifier override query that denormalises the incoming records right at the start (you could also do this with inline lookups, which from PowerCenter 9.1 onwards can be configured to return multiple matches, but that would be fiddly).
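As a rough sketch of the override approach, assuming the bank-account rows live in an ACCOUNT table keyed by Key-1/Key-2, the account type comes from the "A Table", and the source database supports ROW_NUMBER() (all names here are placeholders), the override could number each account within its customer so the mapping can pivot rows 1 to 5 into the BANK-ACCOUNT OCCURS slots:
-- Sketch only: one row per account, numbered within each customer
SELECT a.KEY_1,
       a.KEY_2,
       a.ACC_NO,
       t.ACC_TYPE,
       ROW_NUMBER() OVER (PARTITION BY a.KEY_1, a.KEY_2
                          ORDER BY a.ACC_NO) AS ACC_SLOT
FROM   ACCOUNT a
JOIN   A_TABLE t ON t.ACC_NO = a.ACC_NO
ORDER  BY a.KEY_1, a.KEY_2, ACC_SLOT;
The credit-card accounts and the transaction detail tables could be handled the same way in parallel pipelines, with an Aggregator (or an Expression with variable ports) grouping on Key-1/Key-2 to fold the numbered rows into the repeating target fields.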
I have an Excel spreadsheet with multiple entries that I want to insert into a SQLite DB from UiPath. How do I do this?
You could do it one of two ways. For both methods, you will need to use the Excel Read Range activity to read the spreadsheet into a DataTable.
Scenario 1: You could read the table in a For Each Row loop, line by line, converting each row to an SQL statement and running it with an Execute Non-Query activity. This is slow, and if you like O notation, it is an O(n) solution.
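The per-row statement built inside that loop might look something like this (the table and columns are hypothetical stand-ins for whatever is in the spreadsheet):
-- One INSERT is built and executed per DataTable row
INSERT INTO Contacts (Name, Email, Phone)
VALUES ('Jane Doe', 'jane@example.com', '555-1234');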
Scenario 2: You could upload the entire table (as long as it's compatible with the DB table) to the database.
You will need the Database > Insert activity.
You will need to provide the DB connection (I explain how to create one in another post).
Then enter the SQLite database table you want to insert into, in quotes.
Then, in the last field, enter the DataTable that you created or pulled from another resource.
The output will be an integer (affected records).
In O notation, this is an O(1) solution, at least from our coding perspective.
I want to run the same query for multiple files. Is it possible to write a dynamic query in U-SQL, or is there any way to eliminate rewriting the same piece of code, like
Select count(*) as cnt from #table1;
Select count(*) as cnt from #table2;
can be replaced to
Select count(*) as cnt from #dynamic
where #dynamic = table1, table2
(Azure Data Lake team here)
Your question mentions reading from files, but your example shows tables. If you really do want to read from files, the EXTRACT statement supports "File Sets" that allow a single EXTRACT statement to read multiple files specified by a pattern:
#data =
    EXTRACT name string,
            age int
    FROM "/input/{*}.csv"
    USING Extractors.Csv();
Sometimes the data needs to include the filename it came from, so you can specify it like this:
#data =
    EXTRACT name string,
            age int,
            basefilename string
    FROM "/input/{basefilename}.csv"
    USING Extractors.Csv();
I use a custom CSV extractor that matches columns to values using the first row in the CSV file.
Here is the Gist, to be added as code-behind or as a custom assembly: https://gist.github.com/serri588/ff9e3047d8341398df4aea7557f0a82c
I made it because I have a list of files that have a similar structure, but slightly different columns. The standard CSV extractor is not well suited to this task. Write your EXTRACT with all the possible column names you want to pull and it will fill those values and ignore the rest.
For example:
Table_1 has columns A, B, and C.
Table_2 has columns A, C, and D.
I want A, B, and C so my extract would be
EXTRACT A string,
        B string,
        C string
FROM "Table_{*}.csv"
USING new yourNamespace.CSVExtractor();
Table 1 will populate all three columns, while Table 2 will populate A and C, ignoring D.
U-SQL does not provide a dynamic execution mode per se, but it is adding some features that can help with some of the dynamic scenarios.
Today, you have to provide the exact schema for table type parameters for TVFs/SPs. However, we are working on a feature that will give you flexible schema parameters, which will make it possible to write a TVF/SP that can be applied to any table shape (as long as your queries do not have a dependency on the shape).
Until this capability becomes available, the suggestions are:
If you know what the possible schemas are: generate a TVF/SP for each possible schema and call it accordingly (a sketch follows this list).
Use any of the SDKs (C#, PowerShell, Java, Python, node.js) to code-gen the script based on the schema information (assuming you are applying it to an object from which you can get schema information and not just a rowset expression).
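As an illustration of the first suggestion, a per-schema TVF for the count example might look roughly like this (the database object names are invented and the exact syntax should be checked against the U-SQL reference, so treat it as a sketch):
DROP FUNCTION IF EXISTS dbo.CountTable1;
CREATE FUNCTION dbo.CountTable1()
RETURNS @result TABLE(cnt long)
AS
BEGIN
    // Wraps the repeated "count the rows" query for one known schema.
    @result = SELECT COUNT(*) AS cnt FROM dbo.Table1;
END;

// In a follow-up script, call it like any other rowset:
@cnt = dbo.CountTable1();
OUTPUT @cnt TO "/output/table1_count.csv" USING Outputters.Csv();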
I have a sqlite table containing metadata extracted from thousands of audio files in a directory tree. The objective of the extraction is to run a series of queries against the table to identify and rectify anomalies in the underlying metadata. The corrected metadata is then written back from the table to the underlying files. The underlying files are grouped into albums with each album in a directory of its own. Table structure relevant to my question is as follows:
__path: unique identifier being the path and source filename combined
__dirpath: in simple terms represents the directory from which the file represented by a table record was drawn. Records making up an album will have the same __dirpath
__discnumber: number designating the disc number from which the track originates. The field can be blank or contain a string 1,2,3... etc.
I'd like to identify all records where (__dirpath is identical and __discnumber equals 1).
SELECT DISTINCT __dirpath,
                __discnumber
  FROM alib
 WHERE __dirpath IN (
           SELECT __dirpath
             FROM alib
            GROUP BY __dirpath
           HAVING count(*) > 0
       )
   AND __discnumber = 1
 ORDER BY __dirpath,
          __discnumber;
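Note that HAVING count(*) > 0 is true for every __dirpath, so the subquery doesn't actually filter anything. If "__dirpath is identical" is meant as "the same __dirpath appears on more than one record" (my assumption), raising the threshold does that:
SELECT __dirpath,
       __discnumber
  FROM alib
 WHERE __dirpath IN (
           SELECT __dirpath
             FROM alib
            GROUP BY __dirpath
           HAVING count(*) > 1   -- __dirpath shared by at least two records
       )
   AND __discnumber = 1
 ORDER BY __dirpath;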
We get new data for our database from an online form that outputs as an Excel sheet. To normalize the data for the database, I want to turn multiple phone-number columns into one row per phone number.
Example, I want data like this:
ID | Home Phone | Cell Phone | Work Phone
 1 | 555-1234   | 555-3737   | 555-3837
To become this:
PhoneID | ID | Phone Number | Phone Type
      1 |  1 | 555-1234     | Home
      2 |  1 | 555-3737     | Cell
      3 |  1 | 555-3837     | Work
To import the data, I have a button that finds the spreadsheet and then runs a bunch of queries to add the data.
How can I write a query to append this data to the end of an existing table without ending up with duplicate records? The data pulled from the website is all stored and archived in an Excel sheet that will be updated without removing the old data (we don't want to lose this extra backup), so with each import, I need it to disregard all of the previously entered data.
I was able to make a query that lists everything out in the correct format from the original spreadsheet (I imported the external spreadsheet into an unnormalized table in Access to test it), but when I try to append it to the phone number table, it adds all of the data repeatedly. I can remove the duplicates with another query, but I'd rather not leave it like that.
There are several possible approaches to this problem; which one you choose may depend on the size of the dataset relative to the number of updates being processed. Basically, the choices are:
1) Add a unique index to the destination table, so that Access will refuse to add a duplicate record. You'll need to handle the possible warning ("Access was unable to add xxx records due to index violations" or similar).
2) Import the incoming data to a staging table, then outer join the staging table to the destination table and append only records where the key field(s) in the destination table are null (i.e., there's no matching record in the destination table).
I have used both approaches in the past - I like the index approach for its simplicity, and I like the staging approach for its flexibility, because you can do a lot with the incoming data before you append it if you need to.
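For approach 2, the unmatched append in Access SQL might look something like this (tblStaging and tblPhoneNumbers are placeholders for your actual staging and destination tables):
INSERT INTO tblPhoneNumbers (ID, PhoneNumber, PhoneType)
SELECT s.ID, s.PhoneNumber, s.PhoneType
FROM tblStaging AS s
LEFT JOIN tblPhoneNumbers AS p
    ON (s.ID = p.ID) AND (s.PhoneNumber = p.PhoneNumber)
WHERE p.ID Is Null;
Only staging rows with no matching ID/number pair in the destination survive the Is Null test, so re-running the import as the archive grows appends just the new records.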
You could run a delete query on the table where you store the queried data and then run your imports.
Assuming that the data is only being updated.
The delete query will remove all records and then you can run the import to repopulate the table - therefore no duplicates.
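Concretely, with made-up names, that is just a delete followed by re-running the full append:
DELETE FROM tblPhoneNumbers;

INSERT INTO tblPhoneNumbers (ID, PhoneNumber, PhoneType)
SELECT ID, PhoneNumber, PhoneType
FROM qryUnpivotedImport;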
I'm attempting to create an order number for customers to use. I will have multiple machines that do not have access to the same database (so can't use primary keys and generate a unique ID).
I will have a unique string that I could use as a seed for some algorithm to generate a unique-looking alphanumeric ID for the order number. I do not want to use this unique string as the order number itself, because its contents would not be appropriate in appearance for a customer-facing order number.
Would it be possible to combine the use of a GUID & my unique string with some algorithm to create a unique order #?
Open to any suggestions.
If you have a relatively small number of machines and each one can have its own configuration file or setting, you can assign a letter to each machine (A, B, C...) and then append the letter onto the order number, which could just be an auto-incrementing integer in each DB.
i.e.
Starting each database ID at 1000:
1001A // First order on database A
1001B // First order on database B
1001C // First order on database C
1002A // Second order on database A
1003A // Third order on database A
1004A // etc...
1002B
1002C
Your order table in each database would have an ID column (integer) and a "machine" identifier column (character A, B, C...), so in case you ever needed to combine DBs into one, each order would still be unique.
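A minimal sketch of that table, assuming SQL Server-style identity columns (the names are made up, and the machine letter would come from each machine's own config):
CREATE TABLE Orders (
    OrderID INT IDENTITY(1001, 1) NOT NULL,   -- auto-incrementing within each database
    Machine CHAR(1) NOT NULL,                 -- 'A', 'B', 'C'... set from the local config
    PRIMARY KEY (OrderID, Machine)
);

-- Customer-facing order number, e.g. 1001A
SELECT CAST(OrderID AS VARCHAR(10)) + Machine AS OrderNumber
FROM Orders;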
Just use a straight-up GUID/UUID. Version 1 UUIDs take the MAC address of the network interface into account, which makes them unique to the generating machine.
http://en.wikipedia.org/wiki/Uuid
You can use IDs as a primary key if you generate the ID from a stored procedure (or, in Oracle, from a sequence).
What you have to do is make each machine generate IDs in a different range, e.g. machine A from 1 to 1,000,000, machine B from 1,000,001 to 2,000,000, etc.
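In Oracle terms, for example, each machine's database could own a sequence confined to its own range (mirroring the ranges above):
-- Machine A
CREATE SEQUENCE order_id_seq START WITH 1 INCREMENT BY 1 MAXVALUE 1000000 NOCYCLE;

-- Machine B
CREATE SEQUENCE order_id_seq START WITH 1000001 INCREMENT BY 1 MAXVALUE 2000000 NOCYCLE;

-- New order ID on either machine
SELECT order_id_seq.NEXTVAL FROM dual;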
You say you have a unique string that would not be 'appropriate' to show to customers.
If it's merely inappropriate and not sensitive (i.e. not security/privacy related), you could just transform it somehow. A simple example would be ROT13.
But generally I too would suggest using a UUID (version 4) for random identifiers. The probability of generating duplicates is extremely low, and there are libraries available for many programming languages.