I'm looking for a way to export related data spread over several tables, and to import that data in another schema. I'm working with an Oracle 11g Database.
To simplify my case I have tables A, B and C, where B has a foreign key on A, and C has a foreign key to B. Having one entry in A, I would like to extract all entries relating to this entry from A, B and C and insert them into another schema. Please keep in mind that in my real-world scenario it's not A, B and C, but 102 separate tables (don't ask, not my design ;-)).
What I am looking for is a tool that will use the knowledge of the relations between the tables to do the export, without the need for me to specify which tables are connected through which fields.
Is there a way to do that and stay sane?
Data Pump will let you supply a predicate per table when extracting the data, so it's a "simple" matter of relating each table back to the one that identifies the rows whose related data is to be exported. Typically the predicate would be something like "customer_id in (select customer_id from customers)".
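For illustration, the extraction could be driven by a parameter file along these lines; the directory, schema, table, and column names here are placeholders, and in your 102-table case the per-table QUERY predicates would have to be generated from the foreign-key metadata rather than written by hand:
# export.par (names below are placeholders)
DIRECTORY=dp_dir
DUMPFILE=related_data.dmp
TABLES=src_schema.a,src_schema.b,src_schema.c
QUERY=src_schema.a:"WHERE a_id = 1"
QUERY=src_schema.b:"WHERE a_id IN (SELECT a_id FROM src_schema.a WHERE a_id = 1)"
QUERY=src_schema.c:"WHERE b_id IN (SELECT b_id FROM src_schema.b WHERE a_id = 1)"
Run it with expdp parfile=export.par (plus your connect string); on the import side, impdp's REMAP_SCHEMA=src_schema:dst_schema parameter places the rows in the other schema.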
I am working on a CR where I need to create a PL/SQL package, and I am a bit confused about the approach.
Background: There is a view named 'D' which sits at the end of a chain of interdependent views.
We can put it as:
A – Fact table (Populated using Informatica, source MS-Dynamics)
B – View 1 based on fact table
C – View 2 based on View1
D – View 3 based on view2
Each view also has multiple joins to other tables in addition to the view it is based on.
Requirement: The client wants to remove all these views and create a PL/SQL package which can insert data directly from MS-Dynamics into View 3, i.e. 'D'.
Before I come up with something complex, I would like to know whether there is any standard approach to address such requirements.
Any advice/suggestions are appreciated.
It should be obvious that you still need a fact table to keep some data.
You could get rid of B and C by making D more complex (the WITH clause might help to keep it readable).
Inserting data into D is (most likely) not possible per se, but you can create an INSTEAD OF INSERT trigger to handle that, i.e. insert into the fact table A instead.
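A minimal sketch of such a trigger, assuming D exposes columns id and some_col that map straight onto the fact table A (both column names are made up here):
create or replace trigger d_instead_of_insert
instead of insert on d
for each row
begin
  -- redirect the insert into the underlying fact table
  insert into a (id, some_col)
  values (:new.id, :new.some_col);
end;
/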
Example for using the WITH clause:
Instead of
create view b as select * from dual;
create view c as select * from b;
create view d as select * from c;
you could write
create view d as
with b as (select * from dual),
c as (select * from b)
select * from c;
As you can see, the existing view definitions go 1:1 into the WITH clause, so it's not too difficult to create one view that combines them all.
If you are on Oracle 12c you might look at DBMS_UTILITY.EXPAND_SQL_TEXT, though you'll probably want to clean up the output a bit for readability.
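For example, something along these lines prints the fully expanded query behind D:
declare
  l_sql clob;
begin
  dbms_utility.expand_sql_text(
    input_sql_text  => 'select * from d',
    output_sql_text => l_sql);
  -- print the first 4000 characters; a long definition would need to be printed in chunks
  dbms_output.put_line(dbms_lob.substr(l_sql, 4000, 1));
end;
/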
A few things first
1) A view is a predefined SQL query, so it is not possible to insert records directly into it. Even a materialized view, which is a persistent table structure, only gets populated with the results of a query, so as things stand this is not possible. What is possible is to create a new table to hold the data which is currently aggregated at view D (see the sketch after this list).
2) It is very possible to aggregate data at multiple levels in Informatica using a combination of Sorter and Aggregator transformations, which will generate the data at the level you're looking for.
3) Should you do it? Data warehousing best practice would say no: keep the data as granular as possible, as in the original table A, so that it can be rolled up in many ways (refer to the Kimball Group site and read up on star schemas). Do you have much sway in the choice, though?
4) The current process (while often used) is not much better in terms of star schema design either.
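Regarding point 1, the simplest way to materialise what view D currently returns into a real table would be something along these lines (the table name is a placeholder):
create table d_data as
select * from d;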
I want to write the same query for multiple files. Is it possible to write a dynamic query in U-SQL, or is there any other way to eliminate re-writing the same piece of code? For example,
Select count(*) as cnt from #table1;
Select count(*) as cnt from #table2;
can be replaced to
Select count(*) as cnt from #dynamic
where #dynamic = table1, table2
(Azure Data Lake team here)
Your question mentions reading from files, but your example shows tables. If you really do want to read from files, the EXTRACT statement supports "File Sets", which allow a single EXTRACT statement to read multiple files specified by a pattern:
#data =
EXTRACT name string,
age int
FROM "/input/{*}.csv"
USING Extractors.Csv();
Sometimes the data needs to include the name of the file it came from, so you can specify it like this:
#data =
EXTRACT name string,
age int,
basefilename string
FROM "/input/{basefilename}.csv"
USING Extractors.Csv();
I use a custom CSV extractor that matches columns to values using the first row in the CSV file.
Here is the Gist, which can be added in code-behind or as a custom assembly: https://gist.github.com/serri588/ff9e3047d8341398df4aea7557f0a82c
I made it because I have a list of files that have a similar structure, but slightly different columns. The standard CSV extractor is not well suited to this task. Write your EXTRACT with all the possible column names you want to pull and it will fill those values and ignore the rest.
For example:
Table_1 has columns A, B, and C.
Table_2 has columns A, C, and D.
I want A, B, and C so my extract would be
EXTRACT
A string,
B string,
C string
FROM "Table_{*}.csv"
USING new yourNamespace.CSVExtractor();
Table 1 will populate all three columns, while Table 2 will populate A and C, ignoring D.
U-SQL does not provide a dynamic execution mode per se, but it is adding some features that can help with some of the dynamic scenarios.
Today, you have to provide the exact schema for table type parameters for TVFs/SPs; however, we are working on a feature that will give you flexible schema parameters, which will make it possible to write a TVF/SP that can be applied to any table shape (as long as your queries do not have a dependency on the shape).
Until this capability becomes available, the suggestions are:
If you know what the possible schemas are: Generate a TVF/SP for each possible schema and call it accordingly.
Use any of the SDKs (C#, PowerShell, Java, Python, node.js) to code-gen the script based on the schema information (assuming you are applying it to an object from which you can get schema information and not just a rowset expression).
Let's say I already have 3 columns A, B, C in my table Tb. I want to add a new column M between B and C. How can I do this?
After adding M, my table should look like A B M C and NOT A B C M.
The simple answer is that you can't. Columns are always added at the end. However, you shouldn't care about the order of columns in a table since you should always be explicitly listing columns in your queries and in your DML. And you should always have an interface layer (a view, for example) where, if order is important, you can add the new column in the appropriate place.
If you are really determined, you can create a new table with the new column order, move the data to the new table, drop the old table, and rename the new table. You'll need to recreate any indexes, constraints, or triggers on the table. Something like
ALTER TABLE tb
ADD( M NUMBER );
CREATE TABLE tb_new
AS
SELECT a, b, m, c
FROM tb;
DROP TABLE tb;
ALTER TABLE tb_new
RENAME TO tb;
I'm not sure whether it's an option in the Express Edition (I tend to doubt it, but I don't have an XE database handy to verify), but you could also potentially use the DBMS_REDEFINITION package, as Barbara shows in that example. Behind the scenes, Oracle is doing basically the same thing as above, but with some added materialized view logs that allow applications to continue to access the table during the operation.
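Very roughly, the online redefinition version of the same operation would look something like this (a sketch only; the column datatypes are made up, tb_new is the interim table with the desired column order, and TB is assumed to have a primary key, which the default options rely on):
create table tb_new (a number, b number, m number, c number);

declare
  l_errors pls_integer;
begin
  dbms_redefinition.can_redef_table(user, 'TB');
  dbms_redefinition.start_redef_table(user, 'TB', 'TB_NEW');
  -- copies indexes, constraints, triggers and grants onto the interim table
  dbms_redefinition.copy_table_dependents(user, 'TB', 'TB_NEW',
                                          num_errors => l_errors);
  dbms_redefinition.finish_redef_table(user, 'TB', 'TB_NEW');
end;
/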
If you find yourself caring about the order of columns in a table, though, you're much better off stopping to figure out what you've done wrong rather than continuing to move forward on either path. It should be exceptionally, exceptionally rare that you would care about the physical order of columns in a table.
I have a table in a MS Access 2010 Database and it can easily be split up into multiple tables. However I don't know how to do that and still keep all the data linked together. Does anyone know an easy way to do this?
I ended up just writing a bunch of Update and Append queries to create smaller tables and keep all the data synced.
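For anyone going the same route, the append queries are just INSERT ... SELECT statements against the new, smaller tables; a sketch with made-up table and field names:
INSERT INTO Customers (CustomerID, CustomerName)
SELECT CustomerID, CustomerName
FROM BigTable;
Keeping the shared key (here CustomerID) in each of the smaller tables is what keeps the data linked together.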
You would have to migrate to another database system, like MSSQL or MySQL; you can't do replication in MS Access...
Not sure what you mean by "split up into multiple tables".
Do the two tables have the same structure? Do you want to divide the table into two parts, i.e. if the original table has fields A, B, C, D, split it into Table1: A, B and Table2: C, D?
Anyway, I googled it a bit and the links below might be what you are looking for. Check them.
Split a table into related tables (MDB)
How hard is it to split a table in Access into two smaller tables?
Where do you run into trouble with the table analyzer wizard? Maybe you can work around the issue you are running into.
However, if the table analyzer wizard isn't working out, you might also consider the tactics described in http://office.microsoft.com/en-us/access-help/resolve-and-help-prevent-duplicate-data-HA010341696.aspx.
In Microsoft Access, under Database Tools, Analyze Table, I use the wizard to split a large table into multiple normalized tables. Hope that helps.
Hmmm, can't you just make a copy of the table and then delete the opposite items in each table, leaving the data the way you want? Just make sure that both tables keep the exact same AutoNumber field, and use that field to reference the other.
It may not be the most efficient way to do it, but I solved a similar issue the following way:
a) Procedure that creates a new table via SQL:
CREATE TABLE t002 (ID002 INTEGER PRIMARY KEY, CONSTRAINT SomeName FOREIGN KEY (ID002) REFERENCES t001(ID001));
The two tables are related to each other through the foreign key.
b) Procedure that adds the necessary fields to the new table (t002). In the following sample code let's use just one field, and let's call it [MyFieldName].
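For instance, assuming the field is a plain text field:
ALTER TABLE t002 ADD COLUMN MyFieldName TEXT(255);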
c) Procedure to append all values of field ID001 from Table t001 to field ID002 in Table t002, via SQL:
INSERT INTO t002 (ID002) SELECT t001.ID001 FROM t001;
d) Procedure to transfer values from fields in t001 to the corresponding fields in t002, via SQL:
UPDATE t001 INNER JOIN t002 ON t001.ID001 = t002.ID002 SET t002.MyFieldName = t001.MyFieldName;
e) Procedure to remove (drop) the fields in question in Table t001, via SQL:
ALTER TABLE t001 DROP COLUMN MyFieldName;
f) Procedure that calls them all one after the other. Fieldnames are fed into the process as parameters in the call to Procedure f.
It is quite a bunch of coding, but it did the job for me.
I am creating an activity table with many types of activities. Let's
say activities of type "jogging" will have elements a, b, and c while
activities of "football" will have elements a, d, and e. Can I create a
table in which the row elements for each column depend on that column's
type? I have considered creating one table for each activity type or
creating a single table with rows for every activity's options, but
there will be many activity types so it seems like a waste to use so
many tables or leave so many rows blank.
You cannot create such a table, it is not in the nature of databases to allow for "varargs". That is the reason we have relations in databases to model this type of stuff.
For an evil quickhack you could store the variable number of arguments in one column in a specific format and parse this again. Something like "a:foo|e:bar|f:qux". Don't do this, it will get out of hand in about 1 day.
I second James' proposal: redesign your tables. It should then look something like this.
Table: Activities
id|activity
0|jogging
1|football
2|...
Table: ElementsOfActivities
id|activity_id|element
0|0|a
1|0|b
2|0|c
3|1|a
4|1|d
5|1|e
Look up "normalization" (for example http://en.wikipedia.org/wiki/Database_normalization)
I assume that in the subject you mean column instead of row, because the whole concept of a table is built around the fact that it has a variable number of rows. The same goes for your statement "leave so many rows blank": again, I assume you are talking about columns.
What you are describing is essentially an (anti) pattern called "entity attribute value". Search for this and you'll find a lot of hits describing how to do it and why not to do it.
In Postgres things are somewhat easier. It has a contrib module called "hstore" which is essentially what you are looking for. "Multiple columns inside a single column".
The biggest drawback of the hstore module is that you lose type safety: you can only put character data into an hstore column. So you cannot say "the attribute price is numeric, the attribute name is a character value".
If you can live with that restriction, hstore is probably what you are looking for in Postgres.
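A minimal hstore sketch, with made-up table, column, and attribute names:
create extension if not exists hstore;

create table activity (
  id            serial primary key,
  activity_type text not null,
  attrs         hstore
);

insert into activity (activity_type, attrs)
values ('jogging',  'a => 10, b => 20, c => 30'),
       ('football', 'a => 10, d => 40, e => 50');

-- attribute values always come back as text
select id, activity_type, attrs -> 'a' as a
from activity;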
It's complicated. The short answer is, "No."
You should ask yourself what you're trying to report on, and try to figure out a different schema for tracking your data.
If you really want to implement a variable-column-count table, you can do something close.
Define the activity types, and the elements you'll track on each one, and a junction table to resolve the many-to-many relationship. These tables will be mostly static. Then you have an Activity table and an ActivityAttribute table.
Create an Activity table, and then Activity Type, Activity Element, Activity Type-Element, and Activity Attribute tables.
Types would be "jogging", "football".
Elements would be "a", "b", "c", "d"...
Type-Elements would have rows that look like "jogging:a", "jogging:b", "jogging:c", "football:a", "football:d"
Attributes would have the actual data: "18236:a:'0:10:24'", "18237:d:'356 yards'"
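A sketch of that schema in plain SQL, where every name and type is an assumption:
create table activity_type    (type_id    int primary key, name varchar(50));  -- "jogging", "football"
create table activity_element (element_id int primary key, name varchar(50));  -- "a", "b", "c", ...

-- which elements are valid for which activity type
create table activity_type_element (
  type_id    int references activity_type (type_id),
  element_id int references activity_element (element_id),
  primary key (type_id, element_id)
);

create table activity (
  activity_id int primary key,
  type_id     int references activity_type (type_id)
);

-- the actual values, e.g. (18236, 'a', '0:10:24')
create table activity_attribute (
  activity_id int references activity (activity_id),
  element_id  int references activity_element (element_id),
  value       varchar(100),
  primary key (activity_id, element_id)
);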
Tables aren't a limited resource (in reasonable practice), so don't obsess about "wasting" them by creating lots of them. Similarly, in most modern databases, null columns don't take up space (in PostgreSQL, beyond a minimal "null bitmask" overhead), so they aren't a particularly precious resource either.
It probably makes sense to have a table to represent distinct sets of attributes that can be defined together (this is essentially one of the general rules of database normalisation). If you want to deal with "activities" in a generic way, you may want to have common attributes in a shared table, rather like a base class in OOP... or you may not.
For example you could have:
jogging(activity_id int, a type, b type, c type)
football(activity_id int, a type, d type, e type)
and then create a view to combine these together when desired:
create view activity as
select 'jogging', activity_id, a, b, c, null as d, null as e from jogging
union all
select 'football', activity_id, a, null, null, d, e from football
Alternatively you could have:
activity_base(activity_id int, a type)
jogging(activity_id int, b type, c type)
football(activity_id int, d type, e type)
and then:
create view activity as
select case when jogging.activity_id is not null then 'jogging'
when football.activity_id is not null then 'football'
end,
activity_id, a, b, c, d, e
from activity_base
left join jogging using (activity_id)
left join football using (activity_id)
These models are mostly equivalent, the main difference being that the second one provides a clear path to a distinct activity_id identifier, which is one reason many people would prefer it, especially when using an ORM to persist the data (although you can do it the first way too by sharing a sequence).