Easy, computationally cheap way to add a row number to a table using SQLite? - sqlite

Googled this quite a bit and found answers asking similar but different questions, but not what I am looking to do.
I have a table, and I want to add a row number to it. So:
ID || value
91 || valueA
11 || valueB
71 || valueC
becomes
Row# || ID || value
1 || 91 || valueA
2 || 11 || valueB
3 || 71 || valueC
Found this answer that is a bit more complex than my use case. Also I was warned against using the answers as they are computationally expensive (n^2-ish).
Also found a few other answers like this one where the user wanted the row number returned for a query, but that is a different use case. I just want to append a row number to all the rows in the table.

Based on your question, it seemed like you were asking for another column in your database. If that's not the case please comment.
In your database creation class (or wherever you create your database), in the CREATE TABLE statement, use the following structure:
CREATE TABLE table_name(
-- SQLite accepts AUTOINCREMENT only on an INTEGER PRIMARY KEY column
RowNum INTEGER PRIMARY KEY AUTOINCREMENT,
_ID INTEGER UNIQUE,
value TEXT
);
Increment the database version by 1 and it'll be good to go.

As I think you know, in SQL tables have no inherent order. Therefore any "row number" is based on some implicit order. If you make use of any kind of "row id" in the DBMS, the implicit order is likely to be insertion order. That's cheap; if it suits your needs, that's what you want.
Any other "row number" you create requires a sort; if supported by an index, that sort will be O(N log N), else, yes, O(N^2). The math has a way of being very insistent about that.
After answering this question many times in different guises, I wrote a simple example. In SQLite I've had good experience with under a million rows. Larger sorts take longer, YMMV.
FWIW, I never store derived order. Because the system can't feasibly enforce its correctness, it's impossible to know if it's correct. Better to keep a covering index on the interesting order, and rely on a view to supply rank ordinals when needed.
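A minimal sketch of both options, run through Python's sqlite3 module. The table name `items`, the index name, and the view name `items_ranked` are made up for illustration, and the view assumes ID values are unique. It first reads SQLite's built-in rowid as a cheap insertion-order number, then derives ordinals in a view backed by an index. On SQLite 3.25 or later, ROW_NUMBER() OVER (ORDER BY ID) could replace the correlated count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# ID is deliberately NOT the primary key here, so rowid stays a separate,
# automatically assigned value (an INTEGER PRIMARY KEY would alias rowid).
conn.execute("CREATE TABLE items (ID INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO items (ID, value) VALUES (?, ?)",
    [(91, "valueA"), (11, "valueB"), (71, "valueC")],
)

# Cheap option: on a freshly filled table, rowid follows insertion order.
by_insertion = conn.execute(
    "SELECT rowid, ID, value FROM items ORDER BY rowid"
).fetchall()

# Derived option: index the interesting order and let a view compute ordinals
# on demand instead of storing them (assumes ID values are unique).
conn.execute("CREATE INDEX items_by_id ON items (ID)")
conn.execute(
    """
    CREATE VIEW items_ranked AS
    SELECT (SELECT COUNT(*) FROM items AS b WHERE b.ID <= a.ID) AS row_num,
           a.ID, a.value
    FROM items AS a
    """
)
by_id = conn.execute(
    "SELECT row_num, ID, value FROM items_ranked ORDER BY row_num"
).fetchall()

print(by_insertion)
print(by_id)
```

Note that rowid values can be reused or change after deletes and VACUUM unless the table declares an INTEGER PRIMARY KEY, so treat the first query as a convenience rather than a stable identifier.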

Related

How to put a part of a code as a string in table to use it in a procedure?

I'm trying to resolve below issue:
I need to prepare a table that consists of 3 columns:
user_id,
month,
value.
Each from over 200 users has got different values of parameters that determine expected value which are: LOB, CHANNEL, SUBSIDIARY. So I decided to store it in table ASYSTENT_GOALS_SET. But I wanted to avoid multiplying rows and thought it would be nice to put all conditions as a part of the code that I would use in "where" clause further in procedure.
So, as an example - instead of multiple rows:
I created such entry:
So far I created testing table ASYSTENT_TEST (where I collect month and value for certain user). I wrote a piece of procedure where I used BULK COLLECT.
declare
  type test_row is record
  (
    month NUMBER,
    value NUMBER
  );
  type test_tab is table of test_row;
  BULK_COLLECTOR test_tab;
  p_lob varchar2(10) := 'GOSP';
  p_sub varchar2(14);
  p_ch varchar2(10) := 'BR';
begin
  select subsidiary into p_sub from ASYSTENT_GOALS_SET where user_id = '40001001';
  execute immediate 'select mc, sum(ppln_wartosc) plan from prod_nonlife.mis_report_plans
    where report_id = (select to_number(value) from prod_nonlife.view_parameters where view_name=''MIS'' and parameter_name=''MAX_REPORT_ID'')
    and year=2017
    and month between 7 and 9
    and ppln_jsta_symbol in (:subsidiary)
    and dcs_group in (:lob)
    and kanal in (:channel)
    group by month order by month' bulk collect into BULK_COLLECTOR
    using p_sub, p_lob, p_ch;
  forall x in BULK_COLLECTOR.first..BULK_COLLECTOR.last
    insert into ASYSTENT_TEST values BULK_COLLECTOR(x);
end;
So now, when the SUBSIDIARY column (varchar) in table ASYSTENT_GOALS_SET contains the string 12_00_00 (the code of one subsidiary), everything works fine. But the problem is when a user works in two subsidiaries, let's say 12_00_00 and 13_00_00. I have no clue how to write it down. Should the SUBSIDIARY column contain:
'12_00_00','13_00_00'
or
"12_00_00","13_00_00"
or maybe
12_00_00','13_00_00
I have tried a lot of options after digging through topics like "Dealing with single/escaping/double quotes".
Maybe I should change something in execute immediate as well?
Or maybe my approach to that issue is completely wrong from the very beginning (hopefully not :) ).
I would be grateful for support.
I didn't create the table function described here but that article inspired me to go back to try regexp_substr function again.
I changed: ppln_jsta_symbol in (:subsidiary) to
ppln_jsta_symbol in (select regexp_substr((select subsidiary from ASYSTENT_GOALS_SET where user_id=''fake_num''),''[^,]+'', 1, level) from dual
connect by regexp_substr((select subsidiary from ASYSTENT_GOALS_SET where user_id=''fake_num''), ''[^,]+'', 1, level) is not null)
Now it works like a charm! Thank you @Dessma very much for your time and suggestion!
"I wanted to avoid multiplying rows and thought it would be nice to put all conditions as a part of the code that I would use in 'where' clause further in procedure"
This seems a misguided requirement. You shouldn't worry about number of rows: databases are optimized for storing and retrieving rows.
What they are not good at is dealing with "multi-value" columns. As your own solution proves, it is not nice, it is very far from nice, in fact it is a total pain in the neck. From now on, every time anybody needs to work with subsidiary they will have to invoke a function. Adding, changing or removing a user's subsidiary is much harder than it ought to be. Also there is no chance of enforcing data integrity i.e. validating that a subsidiary is valid against a reference table.
Maybe none of this matters to you. But there are very good reasons why Codd mandated "no repeating groups" as a criterion of First Normal Form, the foundation step of building a sound data model.
The correct solution, industry best practice for almost forty years, would be to recognise that SUBSIDIARY exists at a different granularity to CHANNEL and so should be stored in a separate table.
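As a rough illustration of that separate table, here is a sketch using Python's sqlite3 module in place of Oracle. The table `user_subsidiary` and the simplified `report_plans` table with its columns are hypothetical stand-ins, not the real schema from the question. With one row per (user, subsidiary) pair, the filter becomes a plain subquery and no string splitting is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical child table: one row per (user, subsidiary) pair.
    CREATE TABLE user_subsidiary (
        user_id    TEXT NOT NULL,
        subsidiary TEXT NOT NULL,
        PRIMARY KEY (user_id, subsidiary)
    );
    -- Simplified stand-in for the plans table from the question.
    CREATE TABLE report_plans (
        month      INTEGER,
        subsidiary TEXT,
        amount     REAL
    );
    INSERT INTO user_subsidiary VALUES
        ('40001001', '12_00_00'),
        ('40001001', '13_00_00');
    INSERT INTO report_plans VALUES
        (7, '12_00_00', 100), (7, '13_00_00', 50), (7, '14_00_00', 999),
        (8, '12_00_00', 200);
    """
)

# The user's subsidiaries are just rows, so IN (subquery) does the matching.
rows = conn.execute(
    """
    SELECT month, SUM(amount)
    FROM report_plans
    WHERE subsidiary IN (
        SELECT subsidiary FROM user_subsidiary WHERE user_id = ?
    )
    GROUP BY month
    ORDER BY month
    """,
    ("40001001",),
).fetchall()

print(rows)
```

Adding or removing a subsidiary for a user is then a single INSERT or DELETE, and a foreign key to a reference table of subsidiaries could enforce validity.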

Check if record exists, if it does add 1 (inside procedure)

I am trying to check if an email address already exists, and add 1 if it does (and 1 if even the email+1 exists and so on). But so far I can't even figure out how to check if it exists, inside a procedure.
if exists (select 1 from table where email='something') then ...
Gives back an error ("function or pseudo-column 'EXISTS' may be used inside a SQL statement only"). I tried other stuff as well, but that might not be worth mentioning.
After I have this I plan on making a while loop for adding 1 as much as needed.
You can select the number of matching records into a variable (which you have declared), and then check that variable's value:
select count(*) into l_count
from my_table
where email = 'something';
if l_count > 0 then
-- record exists
...
else
-- record does not exist
...
end if;
select ... into always has to get exactly one record back, and using the count aggregate function means that happens, even if more than one matching record exists.
That hopefully covers your specific issue about checking for existence. As for your underlying goal, it sounds like you're trying to find an unused value by incrementing a suffix. If so, this similar question might help. That is looking for usernames rather than emails, but the principle is the same.
As pointed out in comments, simultaneous calls to your procedure might still try to use the same suffix value; so you still need a unique constraint, and handling for that being violated in that scenario. I think that's beyond what you asked about here though.
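One possible shape for that constraint-plus-retry handling, sketched with Python's sqlite3 module rather than PL/SQL. The `users` table, the helper `insert_with_suffix`, and the rule of appending the number to the local part of the address are all assumptions made for illustration. Instead of checking first and inserting later, it lets the unique constraint reject duplicates and moves on to the next suffix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The UNIQUE constraint is what actually guarantees no duplicates,
# even if two callers race for the same value.
conn.execute("CREATE TABLE users (email TEXT NOT NULL UNIQUE)")


def insert_with_suffix(conn, local, domain, max_tries=100):
    """Insert local@domain, or local1@domain, local2@domain, ... if taken."""
    for n in range(max_tries):
        candidate = f"{local}{n if n else ''}@{domain}"
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO users (email) VALUES (?)", (candidate,)
                )
            return candidate
        except sqlite3.IntegrityError:
            continue  # value already taken, try the next suffix
    raise RuntimeError("no free suffix found")


first = insert_with_suffix(conn, "john", "example.com")
second = insert_with_suffix(conn, "john", "example.com")
third = insert_with_suffix(conn, "john", "example.com")
print(first, second, third)
```

In Oracle the equivalent would be catching DUP_VAL_ON_INDEX around the insert inside the loop.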

Optimizing String search in oracle

I have a view that contains all data related to employees.
It has about 350k records.
I have to build a name search feature
that retrieves all the data matching the keyword entered.
The query performance is very slow; it takes 15-20 seconds to retrieve data.
Cost: 15000
My query:
SELECT H.PERSON_ID,
B.EMPLOYEE_ID,
INITCAP(B.FIRST_NAME) EMP_FNAME,
INITCAP(B.MIDDLE_NAME) EMP_MNAME,
INITCAP(B.LAST_NAME) EMP_LNAME,
B.EMPLOYEE_TYPE PERSON_DESC,
B.EMPLOYMENT_STATUS STATUS_TYPE,
EA.BASE_BRANCH
FROM EMPLOYEE_BASIC_DTLS B,
EMP_ASSIGNMENT_DTLS_MV EA,
EMPLOYEE_HIS_DEPNDENT_TBL H
WHERE B.PERSON_ID = EA.PERSON_ID
AND B.PERSON_ID = H.PERSON_ID
AND ((UPPER(B.FIRST_NAME) LIKE
('%' || V_SEARCH_PARAM1 || '%')) OR
(UPPER(B.MIDDLE_NAME) LIKE
('%' || V_SEARCH_PARAM1 || '%')) OR
(UPPER(B.LAST_NAME) LIKE
('%' || V_SEARCH_PARAM1 || '%')))
AND TRUNC(SYSDATE) BETWEEN EA.EFFECTIVE_START_DATE AND
EA.EFFECTIVE_END_DATE
AND UPPER(H.RELATIONSHIP_CODE) = 'A';
Since EMPLOYEE_BASIC_DTLS is a view I can't use indexing.
While it's true you can't put an index on a view, you can certainly put indexes on the underlying tables. However, as noted by @JustinCave, even if you do add indexes to the appropriate tables this query still won't use them, because each LIKE pattern starts with a wildcard ('%...'), which rules out an ordinary index range scan. Additionally, because the UPPER function is being applied to the FIRST_NAME, MIDDLE_NAME, and LAST_NAME columns you'd need to define your indexes as function-based indexes. For example, if the 'real' table accessed by the EMPLOYEE_BASIC_DTLS view is called EMPLOYEES you could define a function-based index on the FIRST_NAME column as
CREATE INDEX EMPLOYEES_UPPER_FIRST_NAME ON EMPLOYEES (UPPER(FIRST_NAME));
I suggest you consider whether the LIKE comparisons are really needed, as working around those to get better performance is going to be difficult.
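To make the difference concrete, here is a small sketch that uses SQLite (through Python's sqlite3 module) purely as a stand-in; Oracle's optimizer differs in detail, and the `employees` table and its rows are invented. It builds an index on UPPER(first_name) and prints the query plan for an exact match and for a leading-wildcard LIKE, so you can compare how each predicate is evaluated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "person_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO employees (first_name, last_name) VALUES (?, ?)",
    [("John", "Smith"), ("johanna", "Doe"), ("Mary", "Johnson")],
)
# SQLite's counterpart of a function-based index: an index on an expression.
conn.execute(
    "CREATE INDEX employees_upper_first_name ON employees (UPPER(first_name))"
)


def plan(sql, params):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


# Equality on the indexed expression can be answered from the index.
exact_plan = plan(
    "SELECT last_name FROM employees WHERE UPPER(first_name) = ?", ("JOHN",)
)
# A pattern with a leading wildcard gives the index no starting point.
wildcard_plan = plan(
    "SELECT last_name FROM employees WHERE UPPER(first_name) LIKE ?", ("%JOH%",)
)
matches = conn.execute(
    "SELECT first_name FROM employees "
    "WHERE UPPER(first_name) LIKE ? ORDER BY person_id",
    ("%JOH%",),
).fetchall()

print(exact_plan)
print(wildcard_plan)
print(matches)
```

The exact wording of the plan text varies between SQLite versions, so read it as a hint about the access path rather than a stable format.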
If you'd like to investigate Oracle Text indexes you can find the documentation here. I think you'll find it's more suited to document or document fragment indexes, but perhaps it would give you some ideas.
Share and enjoy.
As one may look for any name or any part of a name there is no way to create an index containing the values to be searched beforehand. So that won't help you here. Oracle will do a full table scan to check every single string for a match.
What you can do though is to speed up that scan.
You can speed up a full table scan by parallelizing it via /*+parallel(EMPLOYEE_BASIC_TABLE,4)*/ for instance. (This would be my advice here.)
Or you can avoid a full table scan by having one index per column, knowing that many names occur repeatedly, so that every distinct name is scanned just once. Then you would use function-based indexes on the underlying table as Bob Jarvis suggests, because you are applying the upper function to every name. Fastest would be a combined index:
create bitmap index idx_name_search on EMPLOYEE_BASIC_TABLE (upper(first_name || '|' || middle_name || '|' || last_name))
so there is just one index to look up. (You would have to use exactly this expression in your query of course: WHERE upper(first_name || '|' || middle_name || '|' || last_name) like '%JOHN%'.) But still, you don't know what will be searched for in advance, and while '%JOHN%' may affect only 2% of your table data, '%E%' may affect 80%. The optimizer would never know. You could at least guess and have two different select statements, one with an index hint you'd use when the search string contains at least three letters and one with a full table hint you'd use otherwise, for instance.
You see, that gets quite complicated the more you think about it. I suggest to try the parallel hint first. Maybe this already speeds things up sufficiently.

Dynamic query and caching

I have two problem sets. What I am preferably looking for is a solution which combines both.
Problem 1: I have a table of, let's say, 20 rows. I am reading 150,000 rows from another table (say table 2). For each row read from table 2, I have to match it with a specific row of table 1 (not matching the whole row, just a few columns, e.g. table2.col1 = table1.col && table2.col2 = table1.col2). Is there a way that I can cache table 1 so that I don't have to query it again and again?
Problem 2: I want to generate the query string dynamically, i.e., if parameter 2 is null then don't put it in the where clause. The only option left seems to be EXECUTE IMMEDIATE, which will be very slow.
What I am asking is: how can I have a dynamic query to compare against table 1? Any ideas?
For problem 1, as mentioned in the comments, let the database handle it. That's what it does really well. If it is something being hit often, then the blocks for the table should remain in the database buffer cache if the buffer cache is sized appropriately. Part of DBA tuning would be to identify appropriate sizing, pinning tables into the "keep" pool, etc. But probably not something that needs worrying over.
If the desire is just to simplify writing the queries rather than performance, then views or stored procs can simplify the repetitive use of the join.
For problem 2, a query in a format like this might work for you:
SELECT id, val
FROM myTable
WHERE filter = COALESCE(v_filter, filter)
If the input parameter v_filter is null, then just automatically match the existing column. This assumes the existing filter column itself is never null (since you can't use = for null comparisons). Also, it assumes that there are other indexed portions in the WHERE clause since a function like COALESCE isn't going to be able to take advantage of an index.
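Here is a small sketch of that pattern, run against SQLite through Python's sqlite3 module. The column is called `category` instead of `filter` only to steer clear of a keyword in newer SQLite versions, and the rows are invented. One row deliberately has a NULL category to show the caveat mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (id INTEGER PRIMARY KEY, val TEXT, category TEXT)"
)
conn.executemany(
    "INSERT INTO myTable (id, val, category) VALUES (?, ?, ?)",
    [(1, "a", "x"), (2, "b", "y"), (3, "c", "x"), (4, "d", None)],
)

# One static statement serves both cases: a NULL parameter turns the
# predicate into "category = category".
QUERY = (
    "SELECT id, val FROM myTable "
    "WHERE category = COALESCE(?, category) ORDER BY id"
)


def find(v_filter):
    return conn.execute(QUERY, (v_filter,)).fetchall()


filtered = find("x")
unfiltered = find(None)
print(filtered)
print(unfiltered)
```

Row 4 never shows up in the unfiltered result, because NULL = NULL is not true in SQL; that is the "column itself is never null" assumption in action.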
For problem 1 you just join the tables. If there is an equijoin and one table is quite small and the other large then you're likely to get a hash join. This is effectively a caching mechanism, and the total cost of reading the tables and performing the join is only very slightly higher than that of reading the tables (as long as the hash table fits in memory).
It does not make a difference if the query is constructed and run through execute immediate -- the RDBMS hash join will still act as an effective cache.
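To see why a hash join behaves like a cache, here is a toy version in plain Python. This is only a sketch of the general idea, not how Oracle implements it, and the tuples are invented sample rows. The small table is loaded into a dictionary once, then each row of the large table is matched with a single lookup.

```python
# table1 is the small table (about 20 rows in the question),
# table2 the large one (150,000 rows); both shortened here.
table1 = [(1, "a", "first"), (2, "b", "second")]
table2 = [(1, "a", 10.0), (2, "b", 20.0), (1, "a", 30.0), (3, "c", 40.0)]


def hash_join(small, large):
    # Build phase: read the small table once into an in-memory hash table
    # keyed on the join columns (assumes the key is unique in the small table).
    build = {(col1, col2): label for col1, col2, label in small}

    # Probe phase: one pass over the large table, one lookup per row.
    joined = []
    for col1, col2, amount in large:
        label = build.get((col1, col2))
        if label is not None:  # inner join: unmatched rows are dropped
            joined.append((col1, col2, amount, label))
    return joined


joined = hash_join(table1, table2)
print(joined)
```

Each table is read exactly once, which is why the join costs only slightly more than scanning both tables.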

Best Practices for updating multiple check boxes on a web form to a database

A sample case scenario - I have a form with one question and multiple answers as checkboxes, so you can choose more than one. Table for storing answers is as below:
QuestionAnswers
(
UserID int,
QuestionID int,
AnswerID int
)
What is the best way of updating those answers to the database using a stored proc? At different jobs I've seen all spectrum, from simply deleting all previous answers and inserting new ones, to passing list of answers to remove and list of answers to add to the stored proc.
In my current project performance and scalability are pretty important, so I'm wondering what's the best way of doing it?
Thanks!
Andrey
If I had a choice of table design, and the following statements are true:
You know the maximum number of choices per question.
Each choice is a simple checked/unchecked.
Each answer can be classified as correct/wrong rather than marked on some scale. (Like 70% right.)
Then, considering performance, I would consider the following table instead of the one you presented:
QuestionAnswers
(
UserID int,
QuestionID int,
Choice1 bool,
Choice2 bool,
...
ChoiceMax bool
)
Yes, it is ugly in terms of normalization, but that denormalization will buy performance and simplify queries -- just one update/insert per question. (And I would update first and insert only if the number of affected rows is zero.)
Detecting whether the answer was correct will also be simpler -- with the following table:
QuestionCorrectAnswers
(
QuestionID int,
Choice1 bool,
Choice2 bool,
...
ChoiceMax bool
)
All you need to do is look up the row in QuestionCorrectAnswers with the same combination of choices as the user answered.
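The "update first, insert only if nothing was affected" step could look roughly like this. It is sketched with Python's sqlite3 module, with three choice columns standing in for Choice1..ChoiceMax and integers 0/1 standing in for bool; the helper name `save_answer` is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE QuestionAnswers (
        UserID     INTEGER,
        QuestionID INTEGER,
        Choice1    INTEGER NOT NULL DEFAULT 0,
        Choice2    INTEGER NOT NULL DEFAULT 0,
        Choice3    INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (UserID, QuestionID)
    )
    """
)


def save_answer(conn, user_id, question_id, choices):
    """Store the checkbox states: UPDATE first, INSERT if no row existed."""
    with conn:  # one transaction for both statements
        cur = conn.execute(
            "UPDATE QuestionAnswers SET Choice1 = ?, Choice2 = ?, Choice3 = ? "
            "WHERE UserID = ? AND QuestionID = ?",
            (*choices, user_id, question_id),
        )
        if cur.rowcount == 0:  # nothing updated, so this is the first answer
            conn.execute(
                "INSERT INTO QuestionAnswers "
                "(UserID, QuestionID, Choice1, Choice2, Choice3) "
                "VALUES (?, ?, ?, ?, ?)",
                (user_id, question_id, *choices),
            )


save_answer(conn, 1, 1, (1, 0, 1))  # first submission: inserts
save_answer(conn, 1, 1, (0, 1, 0))  # changed mind: updates in place
stored = conn.execute("SELECT * FROM QuestionAnswers").fetchall()
print(stored)
```

The primary key on (UserID, QuestionID) keeps the table at one row per user and question even if two requests race.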
If the questions are always the same, then you'd never delete anything - just run an update query on all changed Answers.
UPDATE QuestionAnswers
SET AnswerID = @AnswerID
WHERE UserID = @UserID AND QuestionID = @QuestionID
If for some reason you still need to do some delete/insert - I'd check which QuestionIDs already exist (for the given UserID) so you do a minimum of Delete/Insert.
Updates are just far faster than Delete then Insert, plus you don't make any identity columns skyrocket.
I presume you load the QuestionAnswers from DB upon entering the page, so the user can see which answers he/she gave last time - if you did you already have the necessary data in memory, for determining what to delete, insert and update.
Andrey: If the user (userid=1) selects choices a(answerid=1) & b(answerid=2) for question 1(questionid=1) and later switches to c (a-id=3) & d(a-id=4), you would have to check, then delete the old and add the new. If you choose to follow the other approach, you would not be checking if a particular record exists (so that you can update it), you would just delete old records and insert new records. Anyways, since you are not storing any identity columns, I would go with the latter approach.
Here is a simple solution:
Every [Answer] should have an integer bit value that is unique within its Question.
For example, you have Question1 and four predefined answers:
[Answer] [Bit value]
answer1 0x00001
answer2 0x00002
answer3 0x00004
answer4 0x00008
...
So your SQL INSERT/UPDATE will be:
declare @checkedMask int
set @checkedMask = 0x00009 -- answer 1 and answer 4 are checked
declare @questionId int
set @questionId = 1
-- delete results for answers that are no longer checked
delete
--select r.*
r
from QuestionResult r
inner join QuestionAnswer a
  on r.QuestionId = a.QuestionId and r.AnswerId = a.AnswerId
where r.QuestionId = @questionId
  and (a.mask & @checkedMask) = 0
-- insert results for newly checked answers
insert QuestionResult (AnswerId, QuestionId)
select
  AnswerId,
  QuestionId
from QuestionAnswer a
where a.QuestionId = @questionId
  and (a.mask & @checkedMask) > 0
  and not exists(select AnswerId from QuestionResult r
                 where r.QuestionId = @questionId and r.AnswerId = a.AnswerId)
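The same delete-then-insert logic can be tried out end to end with Python's sqlite3 module. This is a sketch only: the sample rows and the helper `save_checked` are invented, SQLite's DELETE has no join syntax so a subquery takes its place, and like the original it ignores the user dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE QuestionAnswer (QuestionId INTEGER, AnswerId INTEGER, mask INTEGER);
    CREATE TABLE QuestionResult (QuestionId INTEGER, AnswerId INTEGER);
    INSERT INTO QuestionAnswer VALUES (1, 1, 1), (1, 2, 2), (1, 3, 4), (1, 4, 8);
    -- Previously saved state: answers 1 and 2 were checked.
    INSERT INTO QuestionResult VALUES (1, 1), (1, 2);
    """
)


def save_checked(conn, question_id, checked_mask):
    with conn:  # both statements in one transaction
        # Remove results whose answer bit is not set in the new mask.
        conn.execute(
            """
            DELETE FROM QuestionResult
            WHERE QuestionId = ?
              AND AnswerId IN (
                  SELECT AnswerId FROM QuestionAnswer
                  WHERE QuestionId = ? AND (mask & ?) = 0
              )
            """,
            (question_id, question_id, checked_mask),
        )
        # Add results for set bits that are not stored yet.
        conn.execute(
            """
            INSERT INTO QuestionResult (AnswerId, QuestionId)
            SELECT a.AnswerId, a.QuestionId
            FROM QuestionAnswer AS a
            WHERE a.QuestionId = ?
              AND (a.mask & ?) > 0
              AND NOT EXISTS (
                  SELECT 1 FROM QuestionResult AS r
                  WHERE r.QuestionId = a.QuestionId AND r.AnswerId = a.AnswerId
              )
            """,
            (question_id, checked_mask),
        )


save_checked(conn, 1, 0x9)  # answers 1 and 4 are now checked
checked = [
    row[0]
    for row in conn.execute(
        "SELECT AnswerId FROM QuestionResult WHERE QuestionId = 1 ORDER BY AnswerId"
    )
]
print(checked)
```

Only the rows that actually changed are touched: answer 2 is deleted, answer 4 is inserted, and answer 1 stays as it was.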
Sorry to resurrect an old thread. I would have thought the only realistic solution is to delete all responses for that question, and create new rows where the checkbox is ticked. Having a column per answer may be efficient as far as updates go, but the inflexibility of this approach is just not an option. You need to be able to add options to a question without having to redesign your database.
Just delete and re-insert. That's what databases are designed to do: store and retrieve lots of rows of data.
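A sketch of the delete-and-re-insert approach against the QuestionAnswers table from the question, using Python's sqlite3 module. The helper `replace_answers` and the added primary key are assumptions for illustration. Wrapping both steps in one transaction means nobody ever sees the question with no answers in between.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE QuestionAnswers (
        UserID     INTEGER,
        QuestionID INTEGER,
        AnswerID   INTEGER,
        PRIMARY KEY (UserID, QuestionID, AnswerID)
    )
    """
)


def replace_answers(conn, user_id, question_id, answer_ids):
    """Replace all stored answers of one user for one question."""
    with conn:  # commit both steps together, or roll both back on error
        conn.execute(
            "DELETE FROM QuestionAnswers WHERE UserID = ? AND QuestionID = ?",
            (user_id, question_id),
        )
        conn.executemany(
            "INSERT INTO QuestionAnswers (UserID, QuestionID, AnswerID) "
            "VALUES (?, ?, ?)",
            [(user_id, question_id, answer_id) for answer_id in answer_ids],
        )


replace_answers(conn, 1, 1, [1, 2])  # user first ticks a and b
replace_answers(conn, 1, 1, [3, 4])  # later switches to c and d
current = [
    row[0]
    for row in conn.execute(
        "SELECT AnswerID FROM QuestionAnswers "
        "WHERE UserID = 1 AND QuestionID = 1 ORDER BY AnswerID"
    )
]
print(current)
```

Adding a new option to a question then needs no schema change, only a new AnswerID value.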
I disagree that regent's answer is denormalized. As long as each answer is not dependent on another column, and is only dependent on the key, it is in 3rd normal form. It is no different than a table with the following fields for a customer name:
CustomerName
(
name_prefix
name_first
name_mi
name_last
name_suffix
city
state
zip
)
Same as
QuestionAnswers
(
Q1answer1
Q1answer2
Q1answerN
)
There really is no difference between the "Question" of name and the multiple answers which may or may not be filled out and the "Question" of the form and the multiple answers that may or may not be selected.

Resources