I have recently stumbled upon a problem with selecting relationship details from a 1 table and inserting into another table, i hope someone can help.
I have a table structure as follows:
ID (PK) Name ParentID<br>
1 Myname 0<br>
2 nametwo 1<br>
3 namethree 2
e.g
This is the table i need to select from and get all the relationship data. As there could be unlimited number of sub links (is there a function i can create for this to create the loop ?)
Then once i have all the data i need to insert into another table and the ID's will now have to change as the id's must go in order (e.g. i cannot have id "2" be a sub of 3 for example), i am hoping i can use the same function for selecting to do the inserting.
If you are using SQL Server 2005 or above, you may use recursive queries to get your information. Here is an example:
With tree (id, Name, ParentID, [level])
As (
Select id, Name, ParentID, 1
From [myTable]
Where ParentID = 0
Union All
Select child.id
,child.Name
,child.ParentID
,parent.[level] + 1 As [level]
From [myTable] As [child]
Inner Join [tree] As [parent]
On [child].ParentID = [parent].id)
Select * From [tree];
This query will return the row requested by the first portion (Where ParentID = 0) and all sub-rows recursively. Does this help you?
I'm not sure I understand what you want to have happen with your insert. Can you provide more information in terms of the expected result when you are done?
Good luck!
For the retrieval part, you can take a look at Common Table Expression. This feature can provide recursive operation using SQL.
For the insertion part, you can use the CTE above to regenerate the ID, and insert accordingly.
I hope this URL helps Self-Joins in SQL
This is the problem of finding the transitive closure of a graph in sql. SQL does not support this directly, which leaves you with three common strategies:
use a vendor specific SQL extension
store the Materialized Path from the root to the given node in each row
store the Nested Sets, that is the interval covered by the subtree rooted at a given node when nodes are labeled depth first
The first option is straightforward, and if you don't need database portability is probably the best. The second and third options have the advantage of being plain SQL, but require maintaining some de-normalized state. Updating a table that uses materialized paths is simple, but for fast queries your database must support indexes for prefix queries on string values. Nested sets avoid needing any string indexing features, but can require updating a lot of rows as you insert or remove nodes.
If you're fine with always using MSSQL, I'd use the vendor specific option Adrian mentioned.
Related
I am trying to select data based on a status which is a string. What I want is that status 'draft' comes first, so I tried this:
SELECT *
FROM c
ORDER BY c.status = "draft" ? 0:1
I get an error:
Unsupported ORDER BY clause. ORDER BY item expression could not be mapped to a document path
I checked Microsoft site and I see this:
The ORDER BY clause requires that the indexing policy include an index for the fields being sorted. The Azure Cosmos DB query runtime supports sorting against a property name and not against computed properties.
Which I guess makes what I want to do impossible with queries... How could I achieve this? Using a stored procedure?
Edit:
About stored procedure: actually, I am just thinking about this, that would mean, I need to retrieve all data before ordering, that would be bad as I take max 100 value from my database... IS there any way I can do it so I don t have to retrieve all data first? Thanks
Thanks!
ORDER BY item expression could not be mapped to a document path.
Basically, we are told we can only sort with properties of document, not derived values. c.status = "draft" ? 0:1 is derived value.
My idea:
Two parts of query sql: The first one select c.* from c where c.status ='draft',second one select c.* from c where c.status <> 'draft' order by c.status. Finally, combine them.
Or you could try to use stored procedure you mentioned in your question to process the data from the result of select * from c order by c.status. Put draft data in front of others by if-else condition.
Similar to this question and this solution for PostgreSQL (in particular "INSERT missing FK rows at the same time"):
Suppose I am making an address book with a "Groups" table and a "Contact" table. When I create a new Contact, I may want to place them into a Group at the same time. So I could do:
INSERT INTO Contact VALUES (
"Bob",
(SELECT group_id FROM Groups WHERE name = "Friends")
)
But what if the "Friends" Group doesn't exist yet? Can we insert this new Group efficiently?
The obvious thing is to do a SELECT to test if the Group exists already; if not do an INSERT. Then do an INSERT into Contacts with the sub-SELECT above.
Or I can constrain Group.name to be UNIQUE, do an INSERT OR IGNORE, then INSERT into Contacts with the sub-SELECT.
I can also keep my own cache of which Groups exist, but that seems like I'm duplicating functionality of the database in the first place.
My guess is that there is no way to do this in one query, since INSERT does not return anything and cannot be used in a subquery. Is that intuition correct? What is the best practice here?
My guess is that there is no way to do this in one query, since INSERT
does not return anything and cannot be used in a subquery. Is that
intuition correct?
You could use a Trigger and a little modification of the tables and then you could do it with a single query.
For example consider the folowing
Purely for convenience of producing the demo:-
DROP TRIGGER IF EXISTS add_group_if_not_exists;
DROP TABLE IF EXISTS contact;
DROP TABLE IF EXISTS groups;
One-time setup SQL :-
CREATE TABLE IF NOT EXISTS groups (id INTEGER PRIMARY KEY, group_name TEXT UNIQUE);
INSERT INTO groups VALUES(-1,'NOTASSIGNED');
CREATE TABLE IF NOT EXISTS contact (id INTEGER PRIMARY KEY, contact TEXT, group_to_use TEXT, group_reference TEXT DEFAULT -1 REFERENCES groups(id));
CREATE TRIGGER IF NOT EXISTS add_group_if_not_exists
AFTER INSERT ON contact
BEGIN
INSERT OR IGNORE INTO groups (group_name) VALUES(new.group_to_use);
UPDATE contact SET group_reference = (SELECT id FROM groups WHERE group_name = new.group_to_use), group_to_use = NULL WHERE id = new.id;
END;
SQL that would be used on an ongoing basis :-
INSERT INTO contact (contact,group_to_use) VALUES
('Fred','Friends'),
('Mary','Family'),
('Ivan','Enemies'),
('Sue','Work colleagues'),
('Arthur','Fellow Rulers'),
('Amy','Work colleagues'),
('Henry','Fellow Rulers'),
('Canute','Fellow Ruler')
;
The number of values and the actual values would vary.
SQL Just for demonstration of the result
SELECT * FROM groups;
SELECT contact,group_name FROM contact JOIN groups ON group_reference = groups.id;
Results
This results in :-
1) The groups (noting that the group "NOTASSIGNED", is intrinsic to the working of the above and hence added initially) :-
have to be careful regard mistakes like (Fellow Ruler instead of Fellow Rulers)
-1 used because it would not be a normal value automatically generated.
2) The contacts with the respective group :-
Efficient insertion
That could likely be debated from here to eternity so I leave it for the fence sitters/destroyers to decide :). However, some considerations:-
It works and appears to do what is wanted.
It's a little wasteful due to the additional wasted column.
It tries to minimise the waste by changing the column to an empty string (NULL may be even more efficient, but for some can be confusing)
There will obviously be an overhead BUT in comparison to the alternatives probably negligible (perhaps important if you were extracting every Facebook user) but if it's user input driven likely irrelevant.
What is the best practice here?
Fences again. :)
Note Hopefully obvious, but the DROP statements are purely for convenience and that all other SQL up until the INSERT is run once
to setup the tables and triggers in preparation for the single INSERT
that adds a group if necessary.
I am working on a CR where I need to create a PL/SQL package and I am bit confused about the approach.
Background : There is a View named ‘D’ which is at end of the chain of interdependent views in sequence.
We can put it as :
A – Fact table (Populated using Informatica, source MS-Dynamics)
B – View 1 based on fact table
C – View 2 based on View1
D – View 3 based on view2
Each view has multiple joins with other tables in structure along with the base view.
Requirement: Client wants to remove all these views and create a PL/SQL Package which can insert data directly from MS-Dynamics to View3 i.e., ‘D’.
Before I come up with something complex. I would like to know, is there any standard approach to address such requirements.
Any advice/suggestions are appreciated.
It should be obvious that you still need a fact table to keep some data.
You could get rid of B and C by making D more complex (the WITH clause might help to keep it overseeable).
Inserting data into D is (most likely) not possible per se, but you can create and INSTEAD OF INSERT trigger to handle that, i.e. insert into the fact table A instead.
Example for using the WITH clause:
Instead of
create view b as select * from dual;
create view c as select * from b;
create view d as select * from c;
you could write
create view d as
with b as (select * from dual),
c as (select * from b)
select * from c;
As you can see, the existing view definition goes 1:1 into the WITH clause, so it's not too difficult to create a view to combine all views.
If you are on Oracle 12c you might look at DBMS_UTILITY.EXPAND_SQL_TEXT, though you'll probably want to clean up the output a bit for readability.
A few things first
1) A view is a predefined sql query so it is not possible to insert records directly into it. Even a materialized view which is a persistant table structure only gets populated with the results of a query thus as things stand this is not possible. What is possible is to create a new table to populate the data which is currently aggregated at view D
2) It is very possible to aggregate data at muliple levels in Informatica using combination of multiple inline sorter and aggregater transformations which will generate the data at the level you're looking for.
3) Should you do it? Data warehousing best practices would say no and keep the data as granular as possible per the original table A so that it can be rolled up in many ways (refer Kimball group site and read up on star schema for such matters). Do you have much sway in the choice though?
4) The current process (while often used) is not that much better in terms of star schema
I have two problem sets. What I am preferably looking for is a solution which combines both.
Problem 1: I have a table of lets say 20 rows. I am reading 150,000 rows from other table (say table 2). For each row read from table 2, I have to match it with a specific row of table 1 (not matching whole row, few columns. like if table2.col1 = table1.col && table2.col2 = table1.col2) etc. Is there a way that i can cache table 1 so that i don't have to query it again and again ?
Problem 2: I want to generate query string dynamically i.e., if parameter 2 is null then don't put it in where clause. Now the only option left is to use immidiate execute which will be very slow.
Now what i am asking that how can i have dynamic query to compare it with table 1 ? any ideas ?
For problem 1, as mentioned in the comments, let the database handle it. That's what it does really well. If it is something being hit often, then the blocks for the table should remain in the database buffer cache if the buffer cache is sized appropriately. Part of DBA tuning would be to identify appropriate sizing, pinning tables into the "keep" pool, etc. But probably not something that needs worrying over.
If the desire is just to simplify writing the queries rather than performance, then views or stored procs can simplify the repetitive use of the join.
For problem 2, a query in a format like this might work for you:
SELECT id, val
FROM myTable
WHERE filter = COALESCE(v_filter, filter)
If the input parameter v_filter is null, then just automatically match the existing column. This assumes the existing filter column itself is never null (since you can't use = for null comparisons). Also, it assumes that there are other indexed portions in the WHERE clause since a function like COALESCE isn't going to be able to take advantage of an index.
For problem 1 you just join the tables. If there is an equijoin and one table is quite small and the other large then you're likely to get a hash join. This is effectively a caching mechanism, and the total cost of reading the tables and performing the join is only very slightly higher than that of reading the tables (as long as the hash table fits in memory).
It does not make a difference if the query is constructed and run through execute immediate -- the RDBMS hash join will still act as an effective cache.
I am facing a big problem with simple linq query.. I am using EF 4.0..
I am trying to take all the records from a table using a linq query:
var result = context.tablename.select(x=>x);
This results in less rows than the normal sql query which is select * from tablename;
This table has more than 5 tables as child objects (foreign key relations: one to one and one to many etc)..
This result variable after executing that linq statement returns records with all child object values without doing a include statement..
I don't know is it a default behavior of EF 4.0 ..
I tried this statement in linqpad also..but there is no use...
But interesting thing is if I do a join on the same table with another one table is working same is sql inner join and count is same..but I don't know why is it acting differently with that table only..
Is it doing inner joins with all child tables before returning the all records of that parent table??
please help me..
This table has more than 5 tables as
child objects (foreign key relations:
one to one and one to many etc)..
This result variable after executing
that linq statement returns records
with all child object values without
doing a include statement..
So we are probably talking about database view or custom DefiningQuery in SSDL.
I described the same behavior here. Your entity based on joined tables probably doesn't have unique identification for each retruned row so your problem is Identity map. You must manually configure entity key of your entity. It should be composite key based on all primary keys from joined tables. Entity key is used to identify entity in indenty map. If you don't have unique key for each record only first record with the new key is used. If you didn't specify the key manually EF had infered its own.
The easiest way to troubleshoot these types of issues is to look at the generated SQL produced by the ORM tool.
If you are using SQL Server then using the SQL Profiler to view the generated SQL.
From what you are describing, a possible explanation might be that your relationships between entities are mandatory and thereby enforcing INNER joins instead of LEFT OUTER joins.