Efficient way to load referenced data in one query - sqlite

My application uses a database to save its data. I have a table Objects that looks like

localID | title | content
--------+-------+-------------------------------
1       | Test  | "1,embed","3,embed","5,append"
and another table Contents that looks like

localID | content
--------+---------
1       | Alpha
2       | Beta
3       | Gamma
4       | Delta
5       | Epsilon
The main application runs in the main thread; all the database work happens in a second thread. So when my application loads, I want to pass each record (QSqlRecord) to the main thread, where it gets processed further (loaded into real objects). I pass each record via signals. But my data is split across two tables. I want to return a record containing both, similar to a join:
localID | title | content
--------+-------+----------------------------------------------
1       | Test  | "Alpha,embed","Gamma,embed","Epsilon,append"
That way, I would have all the needed information after a single return value from the thread. Without combining, I would have to query the database separately for each referenced content.
I expect the database to contain fewer than 100,000 records, yet some content may be big (files saved as BLOBs, e.g. a book of around 300 MB).
I have two questions:
1. (How) can I join the tables this way inside a single query, efficiently?
2. Am I too concerned about threading, and should I make the application single-threaded? That way I would not need to bother with multiple read requests.
As a side note, this is my first post on Database Administrators; I was not sure whether this site or Stack Overflow was the right place to ask.

For any actual problem, use the approach recommended by @Vérace in the comments, i.e. a "linking" table. That is the way. However, if you are forced to keep the database structure, or want to do this for fun or for learning (which the migration header indicates), learning dirty tricks instead of good design, have a look at this:
select
  localID, title,
  (
    with recursive cnt(x) as
    ( select ','||a.content
      union all
      select replace(x, '"'||b.localID||',', '_"'||b.content||',')
      from cnt, toy2 as b
    )
    select replace('_"'||replace(x, ',_"', ',"'), '_","', '"') from cnt
    where not x like '%,"%' LIMIT 1
  ) as 'content'
from toy as a;
This uses a recursive method to:
- flexibly replace the numbers by the Greek names (no assumptions on the number of entries in the Contents table, or on the number of their uses)
- apply a naming scheme with "_" to create an end condition
- prepend a "_" to content, to make it be processed and cooperate with the end condition
- clean up the end-condition "_"s for the desired output
- clean up the special case at the start of the output line
- select the result of the recursion together with the other desired outputs
Note the assumption that your data does not naturally contain '__"' or '_"'. If it does, choose more "weird" marker strings there. If you have all kinds of strings in your table, then you are looking at a very meek example of what Vérace describes as "a disaster to happen". Actually, this non-trivial solution is in itself probably a disaster which already happened.
Output (with .headers on and .mode column):

localid     title       content
----------  ----------  --------------------------------------------
1           Test        "Alpha,embed","Gamma,embed","Epsilon,append"
2           mal         "Beta,append","Delta,embed"
Here is my MCVE (.dump), with an additional row "mal" for testing purposes:
BEGIN TRANSACTION;
CREATE TABLE toy (localid int, title varchar(20), content varchar(100));
INSERT INTO toy VALUES(1,'Test','"1,embed","3,embed","5,append"');
INSERT INTO toy VALUES(2,'mal','"2,append","4,embed"');
CREATE TABLE toy2 (localID int, content varchar(10));
INSERT INTO toy2 VALUES(1,'Alpha');
INSERT INTO toy2 VALUES(2,'Beta');
INSERT INTO toy2 VALUES(3,'Gamma');
INSERT INTO toy2 VALUES(4,'Delta');
INSERT INTO toy2 VALUES(5,'Epsilon');
COMMIT;
Tested with SQLite 3.18.0 2017-03-28 18:48:43.
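For contrast, here is what the recommended linking-table design looks like in practice; a minimal sketch using Python's sqlite3, where all table and column names are illustrative rather than taken from the question's schema. A single group_concat query returns one combined row per object:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE objects  (localID INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE contents (localID INTEGER PRIMARY KEY, content TEXT);
-- linking table: one row per reference, with its mode stored alongside
CREATE TABLE object_contents (
    objectID  INTEGER REFERENCES objects(localID),
    contentID INTEGER REFERENCES contents(localID),
    mode      TEXT
);
INSERT INTO objects  VALUES (1, 'Test');
INSERT INTO contents VALUES (1,'Alpha'),(2,'Beta'),(3,'Gamma'),(4,'Delta'),(5,'Epsilon');
INSERT INTO object_contents VALUES (1,1,'embed'),(1,3,'embed'),(1,5,'append');
""")

# one query, one row per object: group_concat stitches the references together
row = conn.execute("""
    SELECT o.localID, o.title,
           group_concat('"' || c.content || ',' || oc.mode || '"') AS content
    FROM objects o
    JOIN object_contents oc ON oc.objectID = o.localID
    JOIN contents c         ON c.localID   = oc.contentID
    GROUP BY o.localID
""").fetchone()
print(row)
```

Since group_concat's concatenation order is unspecified, treat the combined string as a set of parts rather than relying on their order.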

How to create a PL/SQL package to discard multiple level of cascading views

I am working on a CR where I need to create a PL/SQL package, and I am a bit confused about the approach.
Background: there is a view named 'D' which is at the end of a chain of interdependent views.
We can put it as :
A – Fact table (Populated using Informatica, source MS-Dynamics)
B – View 1 based on fact table
C – View 2 based on View1
D – View 3 based on view2
Each view has multiple joins with other tables in addition to the base view.
Requirement: the client wants to remove all these views and create a PL/SQL package which can insert data directly from MS-Dynamics into View 3, i.e. 'D'.
Before I come up with something complex, I would like to know: is there any standard approach to address such requirements?
Any advice/suggestions are appreciated.
It should be obvious that you still need a fact table to hold the data.
You could get rid of B and C by making D more complex (the WITH clause might help to keep it readable).
Inserting data into D is (most likely) not possible per se, but you can create an INSTEAD OF INSERT trigger to handle that, i.e. insert into the fact table A instead.
Example for using the WITH clause:
Instead of
create view b as select * from dual;
create view c as select * from b;
create view d as select * from c;
you could write
create view d as
with b as (select * from dual),
c as (select * from b)
select * from c;
As you can see, the existing view definition goes 1:1 into the WITH clause, so it's not too difficult to create a view to combine all views.
If you are on Oracle 12c you might look at DBMS_UTILITY.EXPAND_SQL_TEXT, though you'll probably want to clean up the output a bit for readability.
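To illustrate the INSTEAD OF INSERT idea, here is a minimal sketch; it uses SQLite via Python's sqlite3 purely because that is easy to run, and the table and view names are invented. Oracle's trigger syntax differs in detail, but the mechanism is the same:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_a (id INTEGER, val TEXT);     -- the real fact table (A)
CREATE VIEW d AS SELECT id, val FROM fact_a;    -- stand-in for view D

-- inserts aimed at the view are redirected into the fact table
CREATE TRIGGER d_insert INSTEAD OF INSERT ON d
BEGIN
    INSERT INTO fact_a (id, val) VALUES (NEW.id, NEW.val);
END;
""")

conn.execute("INSERT INTO d VALUES (1, 'hello')")  # insert "into" the view
print(conn.execute("SELECT * FROM fact_a").fetchall())  # the row landed in fact_a
```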
A few things first:
1) A view is a predefined SQL query, so it is not possible to insert records directly into it. Even a materialized view, which is a persistent table structure, only gets populated with the results of a query, so as things stand this is not possible. What is possible is to create a new table to hold the data which is currently aggregated at view D.
2) It is very possible to aggregate data at multiple levels in Informatica using a combination of multiple inline sorter and aggregator transformations, which will generate the data at the level you're looking for.
3) Should you do it? Data warehousing best practice says no: keep the data as granular as possible, per the original table A, so that it can be rolled up in many ways (refer to the Kimball Group site and read up on star schemas for such matters). Do you have much sway in the choice, though?
4) The current process (while often used) is not that much better in terms of star schema design.

SQLite data retrieve with select taking too long

I have created a table with SQLite for my Corona/Lua app. It's a hash table with ~700,000 values. The table has two columns: the hashcode (a string) and the value (another string). During the program I need to fetch data several times by providing the hashcode.
I'm using something like this code to get the data:
for p in db:nrows([[SELECT * FROM test WHERE id=']] .. hashcode .. [[';]]) do
    print(p)
    -- p = returned value --
end
This statement, though, is taking insanely long to run.
Thanks,
Edit:
Success! The mistake was with the primary key. I set the hashcode as the primary key as below, and the retrieval time went back to normal:
CREATE TABLE IF NOT EXISTS test (id STRING PRIMARY KEY, array);
I also prepared the statements in advance, as you said:
stmt = db:prepare("SELECT * FROM test WHERE id = ?;")
[...]
stmt:bind(1, s)
for p in stmt:nrows() do
The only problem is that the db file size went from around 18 MB to 29.5 MB.
You should create the table with id as a unique primary key; this will automatically make an index.
create table if not exists test
(
    id  text primary key,
    val text
);
You should not construct statements using string concatenation; this is a security issue, so avoid getting into that habit. Also, you should prepare statements in advance, at program initialization, and run the prepared statements.
Something like this... initially:
hashcode_query_stmt = db:prepare("SELECT * FROM test WHERE id = ?;")
then for each use:
hashcode_query_stmt:bind_values(hashcode)
for p in hashcode_query_stmt:urows() do ... end
Ensure that there is an index on the id/hashcode column. Without one, such queries will be slow, slow, slow. This index should probably be unique.
If only selecting the value/hashcode (SELECT value FROM ..), it may be beneficial to have a covering index over (id, value) as that can avoid additional seeking to the row data (see SQLite Query Planning). Try it with and without such a covering index.
Also, it may be worthwhile to employ caching if the same hashcodes are queried multiple times.
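Both points (a primary-key index plus a prepared, parameterized statement) can be sketched like this, shown with Python's sqlite3 rather than Lua, against an invented table mirroring the one in the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# TEXT PRIMARY KEY gives an implicit unique index, so lookups avoid a full scan
conn.execute("CREATE TABLE test (id TEXT PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO test VALUES (?, ?)",
                 [(f"hash{i}", f"value{i}") for i in range(100_000)])

# the ? placeholder is bound per call: no string concatenation, no injection risk
row = conn.execute("SELECT val FROM test WHERE id = ?", ("hash42",)).fetchone()
print(row)  # ('value42',)
```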
As already stated, make sure you have an index on id.
If you can't change the table schema now, you can add an index ad hoc:
CREATE INDEX test_id ON test (id);
About hashes: if you are computing hashes in your software to speed up searches, don't!
SQLite will treat your supplied hashes like any regular string/blob. RDBMSs are optimized for efficient searching, which can be greatly improved with indexes.
Unless you're hashing to save space, you are wasting processor time computing hashes in your application.

Dynamic query and caching

I have two problem sets. What I am preferably looking for is a solution which combines both.
Problem 1: I have a table of, let's say, 20 rows. I am reading 150,000 rows from another table (say table 2). Each row read from table 2 has to be matched with a specific row of table 1 (not matching the whole row, just a few columns, e.g. table2.col1 = table1.col1 AND table2.col2 = table1.col2). Is there a way I can cache table 1 so that I don't have to query it again and again?
Problem 2: I want to generate the query string dynamically, i.e. if parameter 2 is null then don't put it in the WHERE clause. The only option left seems to be EXECUTE IMMEDIATE, which will be very slow.
So my question is: how can I have a dynamic query to compare against table 1? Any ideas?
For problem 1, as mentioned in the comments, let the database handle it. That's what it does really well. If it is something being hit often, then the blocks for the table should remain in the database buffer cache if the buffer cache is sized appropriately. Part of DBA tuning would be to identify appropriate sizing, pinning tables into the "keep" pool, etc. But probably not something that needs worrying over.
If the desire is just to simplify writing the queries rather than performance, then views or stored procs can simplify the repetitive use of the join.
For problem 2, a query in a format like this might work for you:
SELECT id, val
FROM myTable
WHERE filter = COALESCE(v_filter, filter)
If the input parameter v_filter is null, then just automatically match the existing column. This assumes the existing filter column itself is never null (since you can't use = for null comparisons). Also, it assumes that there are other indexed portions in the WHERE clause since a function like COALESCE isn't going to be able to take advantage of an index.
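The COALESCE pattern can be exercised like this; a sketch using Python's sqlite3, with a table mimicking the one in the answer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER, val TEXT, filter TEXT NOT NULL)")
conn.executemany("INSERT INTO myTable VALUES (?,?,?)",
                 [(1, 'a', 'x'), (2, 'b', 'y'), (3, 'c', 'x')])

# same statement serves both cases: a concrete filter, or no filter at all
q = "SELECT id, val FROM myTable WHERE filter = COALESCE(?, filter)"
print(conn.execute(q, ('x',)).fetchall())   # only rows whose filter is 'x'
print(conn.execute(q, (None,)).fetchall())  # NULL parameter matches every row
```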
For problem 1 you just join the tables. If there is an equijoin and one table is quite small and the other large then you're likely to get a hash join. This is effectively a caching mechanism, and the total cost of reading the tables and performing the join is only very slightly higher than that of reading the tables (as long as the hash table fits in memory).
It does not make a difference if the query is constructed and run through execute immediate -- the RDBMS hash join will still act as an effective cache.

SQL Subquery and CAST not working

I am trying to get data from one table, where the ID is stored as varchar, and pass it to another query where the same field is an int.
Table Article has the following fields:
(ArticleID int, ArticleTitle nvarchar(200), ArticleDesc nvarchar(MAX))
Sample data:
ArticleID  ArticleTitle  ArticleDesc
1          Title1        Desc1
2          Title2        Desc2
3          Title3        Desc3
I have another table called Banner which holds banners related to Articles:
(BannerID int, BannerName nvarchar(200), BannerPath nvarchar(MAX), ArticleID varchar(200))
Sample data:
BannerID  BannerName  BannerPath   ArticleID
100       Banner1     BannerPath1  '1','3','5'
101       Banner2     BannerPath2  '2','3','5'
102       Banner3     BannerPath3  '8','3','5'
103       Banner4     BannerPath4  '10','30','5','2','3','5'
Sample Query
SELECT ArticleTitle
FROM Article
WHERE CAST(ArticleID AS varchar(200)) IN (
SELECT ArticleID FROM Banner WHERE BannerID = 2
)
In my actual project I have multiple fields in the Banner table so that I can assign a banner to an article, writer, category, or page.
For this reason I decided to store ArticleID (or WriterID or CatID) as a single field in this format: '10','30','5','2','3','5'.
If I change my structure I may end up creating hundreds of records for one banner, since one banner can be assigned to any of these articles, writers, categories, or pages.
The query below returns zero rows; maybe my casting is creating the problem. I would appreciate advice on how to get around this without changing my database structure:
SELECT ArticleTitle FROM Article WHERE CAST(ArticleID AS varchar(200)) IN (SELECT ArticleID FROM Banner WHERE BannerID = 2)
UPDATED:
No offense to anyone, but I have decided to stick to my design, as the question I asked concerns the back-end reporting section of the website, which won't be used too often. I may be wrong about not normalizing the tables...
My actual scenario: suppose users visit the URL
`abc.com/article/article.aspx?articleID=30&CatID=10&PageID=3&writerID=3`
Based on this URL I can run four queries with UNION to get the required banner. Of course I have to decide on banner precedence, so I will do it like this:
`SELECT BannerName, BannerImage FROM Banner WHERE ArticleID LIKE '%''30''%'`
UNION ALL
`SELECT BannerName, BannerImage FROM Banner WHERE CategoryID LIKE '%''10''%'`
UNION ALL
`Another query .......`
If I do it this way, the query only has to look for banners in a single table with few rows. But if I normalize the tables as JW suggests, which is the good way of doing it, I may end up with 30-40 rows for each banner in a different table, which may affect performance, as I have to add new banners for new articles (for new magazine issues).
I know I am breaking every law of normalization, but I am afraid I have to do it for performance, as I may end up having 2,000 rows for every 100 banners, and this will grow with time.
Updated again:
I hope this image gives you an overview of what I am trying to do.
If I do it this way, I only need one row per banner. If I normalize further and create more tables, I might end up with several rows per banner. For example, taking the sample from the banner table in the image above, my first banner would have 27 rows, the second banner 11 rows, and the third banner 14 rows.
To avoid this I thought of storing multiple ArticleIDs, IssueIDs, PageIDs, ... in their respective fields. This approach might be dirty, but it is working.
I definitely got some negative feedback, which from their point of view is understandable. Now that I have provided further details: is my approach totally unprofessional, or is it fine, keeping in mind that the website might get very good traffic and this approach may be faster?
It is a very bad design to save comma-separated values in a column when those values will be used for searching records.
You need to properly normalize and restructure this into a three-table design, because I can see a many-to-many relationship between Article and Banner.
Suggested Schema design:
Article Table
ArticleID (PK)
ArticleTitle
ArticleDesc
Banner Table
BannerID (PK)
BannerName
BannerPath
Article_Banner Table
ArticleID (FK) (Also a compound PK with BannerID)
BannerID (FK)
and by this design you can simply query your records like:
SELECT a.*
FROM Article a
INNER JOIN Article_Banner b
ON a.ArticleID = b.ArticleID
WHERE b.BannerID = 2
Advantages of this structure:
- you can easily create query statements
- queries can take advantage of the indexes defined
- etc.
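A runnable sketch of this three-table design, using Python's sqlite3 with invented sample rows, shows how the indexed join replaces the quoted-CSV substring matching:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Article (ArticleID INTEGER PRIMARY KEY, ArticleTitle TEXT);
CREATE TABLE Banner  (BannerID  INTEGER PRIMARY KEY, BannerName  TEXT);
CREATE TABLE Article_Banner (
    ArticleID INTEGER REFERENCES Article(ArticleID),
    BannerID  INTEGER REFERENCES Banner(BannerID),
    PRIMARY KEY (ArticleID, BannerID)   -- compound PK doubles as an index
);
INSERT INTO Article VALUES (1,'Title1'),(2,'Title2'),(3,'Title3');
INSERT INTO Banner  VALUES (100,'Banner1'),(101,'Banner2');
INSERT INTO Article_Banner VALUES (1,100),(3,100),(2,101),(3,101);
""")

# one plain join answers "which articles does banner 101 cover?"
rows = conn.execute("""
    SELECT a.ArticleTitle
    FROM Article a
    JOIN Article_Banner ab ON a.ArticleID = ab.ArticleID
    WHERE ab.BannerID = 101
    ORDER BY a.ArticleID
""").fetchall()
print(rows)  # [('Title2',), ('Title3',)]
```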
In addition to John Woo's excellent answer, I will try to answer the question "Why doesn't the query return any results?"
I'm going to leave aside the WHERE b.BannerID = 2 clause, which is obviously not met by any of the sample records.
The main issue with the query is the IN clause. IN will tell you whether an item is found in a set of items. What you are expecting it to do is iterate through a set of sets and tell you whether the item is found.
To illustrate this, here are two simplified queries:
-- this will print 0
if '1' in ('''1'',''3'',''5''')
    print 1
else
    print 0

-- this will print 1
if '1' in ('1', '3', '5')
    print 1
else
    print 0
The main point is that IN is a set-based operation, not a string function that will find a substring.
One possible solution to your problem would be to use CHARINDEX to perform the substring detection:
select ArticleTitle
from Article a
join Banner b
on charindex(CAST(a.ArticleID AS varchar(200)), b.ArticleID) > 0
This version is incorrect, because searching for the id '1' will also match values like '11','12'.
In order to get correct results, you could end up with a query similar to this (making sure you only match values enclosed in quotes):
select ArticleTitle
from Article a
join Banner b
on charindex('''' + CAST(a.ArticleID AS varchar(200)) + '''', b.ArticleID) > 0
SQLFiddle: http://www.sqlfiddle.com/#!3/2ee3c/23
This query, however, has two big disadvantages:
it gets awfully slow for relatively big tables, as it cannot use any indexes and needs to scan the Banner table for each row in Article
the code got a little bit more complex and the more functionality you'll add to it, the harder it will get to reason about it, resulting in maintainability problems.
These two problems are smells that you are doing something wrong. Following JW's solution will get rid of the two problems.
I fully agree that the above example is bad and not the correct way to do what he is doing. However, the root error-message issue still exists and is a problem under some conditions.
My situation involves a table holding some custom form field element data. Without laying out the entire structure, I'll just lay out what is needed to reproduce the issue. I can confirm the issue revolves around IsNumeric in this case, combined with the subqueries. The sample holds two items: the item name, simulating the custom field element/type, and the field value. Some are names and some are minutes of labor. They could be weights, temperatures, distances, whatever; it's customer-definable extra data.
Create Table dboSample (cKey VarChar(20), cData VarChar(50))
Insert Into dboSample (cKey, cData) Values ('name', 'Jim')
Insert Into dboSample (cKey, cData) Values ('name', 'Bob')
Insert Into dboSample (cKey, cData) Values ('labortime', '60')
Insert Into dboSample (cKey, cData) Values ('labortime', '00')
Insert Into dboSample (cKey, cData) Values ('labortime', '15')
Select *
From (Select * From dboSample Where IsNumeric(cData) = 1) As dboSampleSub
Where Cast(cData As Int) > 0
This results in the error "Conversion failed when converting the varchar value 'Jim' to data type int."
The inner query has a WHERE clause limiting the returned rows to numeric data only. However, the CAST at the outer level is clearly being evaluated against rows not included in the subquery's result: the engine is free to evaluate the outer predicate before the IsNumeric filter. I cannot locate a SELECT option flag to prevent this.

Hierarchical Database Select / Insert Statement (SQL Server)

I have recently stumbled upon a problem with selecting relationship details from one table and inserting them into another table. I hope someone can help.
I have a table structure as follows:

ID (PK) | Name      | ParentID
1       | Myname    | 0
2       | nametwo   | 1
3       | namethree | 2
This is the table I need to select from to get all the relationship data. There could be an unlimited number of sub-links (is there a function I can create to loop over this?).
Then, once I have all the data, I need to insert it into another table, and the IDs will have to change, since the IDs must go in order (e.g. I cannot have ID 2 be a child of 3). I am hoping I can use the same function used for selecting to do the inserting.
If you are using SQL Server 2005 or above, you may use recursive queries to get your information. Here is an example:
With tree (id, Name, ParentID, [level])
As (
Select id, Name, ParentID, 1
From [myTable]
Where ParentID = 0
Union All
Select child.id
,child.Name
,child.ParentID
,parent.[level] + 1 As [level]
From [myTable] As [child]
Inner Join [tree] As [parent]
On [child].ParentID = [parent].id)
Select * From [tree];
This query will return the row requested by the first portion (Where ParentID = 0) and all sub-rows recursively. Does this help you?
I'm not sure I understand what you want to have happen with your insert. Can you provide more information in terms of the expected result when you are done?
Good luck!
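The same recursive CTE also runs under SQLite's WITH RECURSIVE; here is a runnable sketch via Python's sqlite3, using the sample rows from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER, Name TEXT, ParentID INTEGER)")
conn.executemany("INSERT INTO myTable VALUES (?,?,?)",
                 [(1, 'Myname', 0), (2, 'nametwo', 1), (3, 'namethree', 2)])

# anchor: the root rows (ParentID = 0); recursive step: join children to the tree
rows = conn.execute("""
    WITH RECURSIVE tree (id, Name, ParentID, level) AS (
        SELECT id, Name, ParentID, 1 FROM myTable WHERE ParentID = 0
        UNION ALL
        SELECT c.id, c.Name, c.ParentID, p.level + 1
        FROM myTable c JOIN tree p ON c.ParentID = p.id
    )
    SELECT id, level FROM tree ORDER BY id
""").fetchall()
print(rows)  # [(1, 1), (2, 2), (3, 3)]
```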
For the retrieval part, you can take a look at Common Table Expression. This feature can provide recursive operation using SQL.
For the insertion part, you can use the CTE above to regenerate the ID, and insert accordingly.
I hope this URL helps: Self-Joins in SQL
This is the problem of finding the transitive closure of a graph in SQL. SQL does not support this directly, which leaves you with three common strategies:
use a vendor specific SQL extension
store the Materialized Path from the root to the given node in each row
store the Nested Sets, that is the interval covered by the subtree rooted at a given node when nodes are labeled depth first
The first option is straightforward, and if you don't need database portability is probably the best. The second and third options have the advantage of being plain SQL, but require maintaining some de-normalized state. Updating a table that uses materialized paths is simple, but for fast queries your database must support indexes for prefix queries on string values. Nested sets avoid needing any string indexing features, but can require updating a lot of rows as you insert or remove nodes.
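As an illustration of the materialized-path option, here is a minimal sketch using Python's sqlite3; the path format '/1/2/3/' and the table name are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# each row stores the full path of ids from the root, e.g. '/1/2/3/'
conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, name TEXT, path TEXT)")
conn.execute("CREATE INDEX node_path ON node (path)")  # enables prefix lookups
conn.executemany("INSERT INTO node VALUES (?,?,?)",
                 [(1, 'Myname', '/1/'),
                  (2, 'nametwo', '/1/2/'),
                  (3, 'namethree', '/1/2/3/')])

# the whole subtree rooted at node 2 is one prefix query (no recursion needed);
# note SQLite only uses the index for LIKE when the comparison is case-sensitive
rows = conn.execute(
    "SELECT id FROM node WHERE path LIKE '/1/2/%' ORDER BY id").fetchall()
print(rows)  # [(2,), (3,)]
```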
If you're fine with always using MSSQL, I'd use the vendor specific option Adrian mentioned.