Optimizing string search in Oracle PL/SQL

I have a view that contains all data related to employees; it has about 350k records. I have to build a name-search feature that retrieves all rows matching the entered keyword. The query is very slow: it takes 15-20 seconds to return data, with a reported cost of 15000.
My query:
SELECT H.PERSON_ID,
       B.EMPLOYEE_ID,
       INITCAP(B.FIRST_NAME) EMP_FNAME,
       INITCAP(B.MIDDLE_NAME) EMP_MNAME,
       INITCAP(B.LAST_NAME) EMP_LNAME,
       B.EMPLOYEE_TYPE PERSON_DESC,
       B.EMPLOYMENT_STATUS STATUS_TYPE,
       EA.BASE_BRANCH
  FROM EMPLOYEE_BASIC_DTLS B,
       EMP_ASSIGNMENT_DTLS_MV EA,
       EMPLOYEE_HIS_DEPNDENT_TBL H
 WHERE B.PERSON_ID = EA.PERSON_ID
   AND B.PERSON_ID = H.PERSON_ID
   AND (UPPER(B.FIRST_NAME)  LIKE '%' || V_SEARCH_PARAM1 || '%' OR
        UPPER(B.MIDDLE_NAME) LIKE '%' || V_SEARCH_PARAM1 || '%' OR
        UPPER(B.LAST_NAME)   LIKE '%' || V_SEARCH_PARAM1 || '%')
   AND TRUNC(SYSDATE) BETWEEN EA.EFFECTIVE_START_DATE AND EA.EFFECTIVE_END_DATE
   AND UPPER(H.RELATIONSHIP_CODE) = 'A';
Since EMPLOYEE_BASIC_DTLS is a view, I can't add an index to it.

While it's true you can't put an index on a view, you can certainly put indexes on the underlying tables. However, as noted by @JustinCave, even if you do add indexes to the appropriate tables this query still won't use them, because the LIKE patterns begin with a wildcard. Additionally, because the UPPER function is applied to the FIRST_NAME, MIDDLE_NAME, and LAST_NAME columns, you'd need to define your indexes as function-based indexes. For example, if the 'real' table accessed by the EMPLOYEE_BASIC_DTLS view is called EMPLOYEES, you could define a function-based index on the FIRST_NAME column as
CREATE INDEX EMPLOYEES_UPPER_FIRST_NAME ON EMPLOYEES (UPPER(FIRST_NAME));
I suggest you consider whether the LIKE comparisons are really needed, as working around those to get better performance is going to be difficult.
If you'd like to investigate Oracle Text indexes you can find the documentation here. I think you'll find it's more suited to indexing documents or document fragments, but perhaps it will give you some ideas.
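For what it's worth, a minimal sketch of that route, assuming the EMPLOYEES base table from above, and keeping in mind that a CONTEXT index is not maintained transactionally (it needs syncing after DML):
-- Hedged sketch: an Oracle Text CONTEXT index on the base table's FIRST_NAME
CREATE INDEX employees_fname_txt ON EMPLOYEES (FIRST_NAME)
    INDEXTYPE IS CTXSYS.CONTEXT;

-- Queried with CONTAINS instead of LIKE; note this matches whole tokens
-- such as JOHN, not arbitrary substrings
SELECT EMPLOYEE_ID
  FROM EMPLOYEES
 WHERE CONTAINS(FIRST_NAME, 'JOHN') > 0;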
Share and enjoy.

Since the user may search for any name or any part of a name, there is no way to build an index containing the searched values beforehand, so an ordinary index won't help here: Oracle will do a full table scan and check every single string for a match.
What you can do though is to speed up that scan.
You can speed up a full table scan by parallelizing it, for instance with /*+ PARALLEL(EMPLOYEE_BASIC_TABLE, 4) */. (This would be my advice here.)
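For illustration, a cut-down sketch of the hint applied to the original query; the hint names the view's alias, and a degree of 4 is just an example:
SELECT /*+ PARALLEL(B, 4) */
       B.EMPLOYEE_ID,
       INITCAP(B.FIRST_NAME) EMP_FNAME
  FROM EMPLOYEE_BASIC_DTLS B
 WHERE UPPER(B.FIRST_NAME) LIKE '%' || V_SEARCH_PARAM1 || '%';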
Or you can avoid a full table scan by indexing the name columns, knowing that many names repeat, so each distinct name is stored just once in the index. Because you apply the UPPER function to every name, these would have to be function-based indexes on the underlying table, as Bob Jarvis suggests. One index per column works, but fastest would be a single combined index:
create bitmap index idx_name_search on EMPLOYEE_BASIC_TABLE (upper(first_name || '|' || middle_name || '|' || last_name));
so there is just one index to look up. (You would have to use exactly this expression in your query, of course: WHERE upper(first_name || '|' || middle_name || '|' || last_name) like '%JOHN%'.) But still, you don't know in advance what will be searched for, and while '%JOHN%' may affect only 2% of your table data, '%E%' may affect 80%. The optimizer can never know. You could at least guess and have two different select statements: one with an index hint that you use when the search string contains at least three letters, and one with a full-scan hint that you use otherwise, for instance.
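A sketch of that branching in PL/SQL, assuming the combined index above exists on the underlying table; the cursor, threshold, and hints are illustrative, and whether the optimizer honors them depends on your data:
IF LENGTH(v_search_param1) >= 3 THEN
    -- selective pattern: steer toward the combined function-based index
    OPEN v_cur FOR
        SELECT /*+ INDEX(b idx_name_search) */ b.employee_id
          FROM employee_basic_table b
         WHERE UPPER(b.first_name || '|' || b.middle_name || '|' || b.last_name)
               LIKE '%' || v_search_param1 || '%';
ELSE
    -- unselective pattern: a parallel full scan is likely cheaper
    OPEN v_cur FOR
        SELECT /*+ FULL(b) PARALLEL(b, 4) */ b.employee_id
          FROM employee_basic_table b
         WHERE UPPER(b.first_name || '|' || b.middle_name || '|' || b.last_name)
               LIKE '%' || v_search_param1 || '%';
END IF;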
You see, this gets more complicated the more you think about it. I suggest trying the parallel hint first; maybe that already speeds things up sufficiently.

Related

PeopleSoft Query Manager - string expression, trying to get the third set of ||

In PeopleSoft I am trying to get some text from the middle of a cell, and I'm having trouble creating a string expression to do it. The data in the cell looks like this (what I'm trying to capture is the text after '|| Info:'):
|| SVP: Person Number one|| Interview Completed by: Person Number two
|| Info: Employee lists off a bunch of stuff.
|| Non-Relevant Question? Y
|| Manager provided data to: Person Number three
I've submitted a question here and I've come close. Here's what I have so far:
substring(J.HR_NP_NOTE_TEXT,CHARINDEX('|| Info:',J.HR_NP_NOTE_TEXT)+8,100),CHARINDEX('||',J.HR_NP_NOTE_TEXT)
The problem is that this is not stopping at the ||. A better solution - if I knew how - would be if I could get the text that is after the third set of '||' and stop before the fourth set of '||'.
Without seeing your actual output, I think that you will find that your issue is to do with the way substring actually works.
From the Transact SQL documentation I can see online (https://learn.microsoft.com/en-us/sql/t-sql/functions/substring-transact-sql?view=sql-server-2017)
substring takes 3 arguments:
SUBSTRING ( expression ,start , length )
you have the expression - J.HR_NP_NOTE_TEXT
you have the start - CHARINDEX('|| Info:', J.HR_NP_NOTE_TEXT) + 8 - followed by a fixed length of 100, which is why the result doesn't stop at the next '||'.
But when you calculate the length as CHARINDEX('||', J.HR_NP_NOTE_TEXT) you are not going to get what you expect. Without a start position it finds the first '||' in the string rather than skipping the earlier ones, and in any case it returns a position in the string, not an actual length.
A more appropriate calculation would be something like
CHARINDEX('||', J.HR_NP_NOTE_TEXT) - (CHARINDEX('|| Info:', J.HR_NP_NOTE_TEXT) + 8)
i.e. the difference between the position just after '|| Info:' and the position of the next '||'.
Making your whole query:
SUBSTRING(J.HR_NP_NOTE_TEXT, CHARINDEX('|| Info:', J.HR_NP_NOTE_TEXT) + 8, CHARINDEX('||', J.HR_NP_NOTE_TEXT) - (CHARINDEX('|| Info:', J.HR_NP_NOTE_TEXT) + 8))
As I'm not entirely sure that CHARINDEX looking for '||' will find the right occurrence, I would probably try this one instead - it finds the next '||' after '|| Info:' by passing a start position as CHARINDEX's third argument:
SUBSTRING(J.HR_NP_NOTE_TEXT, CHARINDEX('|| Info:', J.HR_NP_NOTE_TEXT) + 8, CHARINDEX('||', J.HR_NP_NOTE_TEXT, CHARINDEX('|| Info:', J.HR_NP_NOTE_TEXT) + 8) - (CHARINDEX('|| Info:', J.HR_NP_NOTE_TEXT) + 8))
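To sanity-check the expression in isolation, something like this should work in SQL Server; the sample value is adapted from your question:
DECLARE @note VARCHAR(400) =
    '|| SVP: Person one|| Interview Completed by: Person two || Info: Employee lists off a bunch of stuff. || Non-Relevant Question? Y';

SELECT SUBSTRING(@note,
                 CHARINDEX('|| Info:', @note) + 8,
                 CHARINDEX('||', @note, CHARINDEX('|| Info:', @note) + 8)
                     - (CHARINDEX('|| Info:', @note) + 8)) AS info_text;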
Although you're asking about a PS Query, you may get better answers from people focused primarily on the SQL dialect you're using, as this is not, strictly speaking, a PeopleSoft-only question.
Hope that helps!

Easy, computationally cheap way to add a row number to a table using SQLite?

Googled this quite a bit and found answers asking similar but different questions, but not what I am looking to do.
I have a table, and I want to add a row number to it. So:
ID || value
91 || valueA
11 || valueB
71 || valueC
becomes
Row# || ID || value
1 || 91 || valueA
2 || 11 || valueB
3 || 71 || valueC
Found this answer, which is a bit more complex than my use case. Also, I was warned against using those answers as they are computationally expensive (n^2-ish).
Also found a few other answers, like this one, where the user wanted the row number returned for a query, but that's a different use case. I just want to append a row number to all the rows in the table.
Based on your question, it seemed like you were asking for another column in your database. If that's not the case please comment.
In your database creation class (or wherever you create your database), in the CREATE TABLE statement, use the following structure:
CREATE TABLE table_name(
    RowNum INTEGER PRIMARY KEY AUTOINCREMENT, -- in SQLite, AUTOINCREMENT is only valid on an INTEGER PRIMARY KEY
    _ID INTEGER UNIQUE,
    value TEXT
);
Increment the database version by 1 and it'll be good to go.
As I think you know, in SQL tables have no inherent order, so any "row number" is based on some implicit order. If you make use of any kind of "row id" in the DBMS, the implicit order is likely to be insertion order. That's cheap, and if it suits your needs, that's what you want.
Any other "row number" you create requires a sort; if supported by an index, that sort will be O(N log N), else, yes, O(N^2). The math has a way of being very insistent about that.
After answering this question many times in different guises, I wrote up a simple example. In SQLite I've had good experience with it under a million rows. Larger sorts take longer; YMMV.
FWIW, I never store derived order: since the system can't feasibly enforce its correctness, it's impossible to know if it's correct. Better to keep a covering index on the interesting order and rely on a view to supply rank ordinals when needed.
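For instance, a minimal sketch of that, assuming SQLite 3.25+ (for window functions), the ID/value columns from the question, and an illustrative table name of mytable:
-- Derive row numbers in a view instead of storing them
CREATE VIEW numbered AS
SELECT ROW_NUMBER() OVER (ORDER BY ID) AS RowNum, ID, value
FROM mytable;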

Dynamic query and caching

I have two problems; what I am preferably looking for is a solution that addresses both.
Problem 1: I have a table of, let's say, 20 rows. I am reading 150,000 rows from another table (say table 2). For each row read from table 2, I have to match it with a specific row of table 1 (not the whole row, just a few columns, e.g. table2.col1 = table1.col1 AND table2.col2 = table1.col2). Is there a way I can cache table 1 so that I don't have to query it again and again?
Problem 2: I want to generate the query string dynamically, i.e., if parameter 2 is null then don't put it in the WHERE clause. The only option I see left is EXECUTE IMMEDIATE, which I expect to be very slow.
So what I am asking is: how can I have a dynamic query and still compare efficiently against table 1? Any ideas?
For problem 1, as mentioned in the comments, let the database handle it. That's what it does really well. If it is something being hit often, then the blocks for the table should remain in the database buffer cache if the buffer cache is sized appropriately. Part of DBA tuning would be to identify appropriate sizing, pinning tables into the "keep" pool, etc. But probably not something that needs worrying over.
If the desire is just to simplify writing the queries rather than performance, then views or stored procs can simplify the repetitive use of the join.
For problem 2, a query in a format like this might work for you:
SELECT id, val
FROM myTable
WHERE filter = COALESCE(v_filter, filter)
If the input parameter v_filter is null, then just automatically match the existing column. This assumes the existing filter column itself is never null (since you can't use = for null comparisons). Also, it assumes that there are other indexed portions in the WHERE clause since a function like COALESCE isn't going to be able to take advantage of an index.
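If the filter column itself can be NULL, a small variant of the same idea (same illustrative names) avoids that limitation:
SELECT id, val
FROM myTable
WHERE (v_filter IS NULL OR filter = v_filter);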
For problem 1 you just join the tables. If there is an equijoin and one table is quite small and the other large then you're likely to get a hash join. This is effectively a caching mechanism, and the total cost of reading the tables and performing the join is only very slightly higher than that of reading the tables (as long as the hash table fits in memory).
It does not make a difference if the query is constructed and run through execute immediate -- the RDBMS hash join will still act as an effective cache.
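As a minimal PL/SQL sketch of that, using OPEN ... FOR (the ref-cursor flavour of native dynamic SQL) rather than EXECUTE IMMEDIATE; the statement text varies, but the bind variable keeps it shareable. All names here are illustrative:
DECLARE
    v_sql    VARCHAR2(500) := 'SELECT id, val FROM myTable WHERE 1 = 1';
    v_filter VARCHAR2(50)  := 'ABC';
    v_cur    SYS_REFCURSOR;
BEGIN
    IF v_filter IS NOT NULL THEN
        -- append the optional predicate only when the parameter is present
        OPEN v_cur FOR v_sql || ' AND filter = :b1' USING v_filter;
    ELSE
        OPEN v_cur FOR v_sql;
    END IF;
    -- fetch from v_cur as usual, then:
    CLOSE v_cur;
END;
/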

SQLite - Get a specific row index for a Sorted/Filtered Query

I'm creating a caching system to take data from an SQLite database table using a sorted/filtered query and display it. The tables I'm pulling from can be potentially very large and, of course, I need to minimize impact on memory by only retaining a maximum number of rows in memory at any given time. This is easily done by using LIMIT and OFFSET to load only the records I need and update the cache as needed. Implementing this is trivial. The problem I'm having is determining where the insertion index is for a new record inserted into a particular query so I can update my UI appropriately. Is there an easy way to do this? So far the ideas I've had are:
Dump the entire cache, re-count the Query results (there's no guarantee the new row will be included), refresh the cache and refresh the entire UI. I hope it's obvious why that's not really desirable.
Use my own algorithm to determine whether the new row is included in the current query, whether it is included in the current cached results, and at what index it should be inserted if it's within the current cached scope. The biggest downfall of this approach is its complexity and the risk that my own sorting/filtering algorithm won't match SQLite's.
Of course, what I want is to be able to ask SQLite: Given 'Query A' what is the index of 'Row B', without loading the entire query results. However, so far I haven't been able to find a way to do this.
I don't think it matters, but this is all occurring on an iOS device, using Objective-C.
More Info
The query and subsequent cache are based on user input. Essentially the user can re-sort and filter (or search) to alter the results they're seeing. My reluctance to simply recreate the cache on insertions (and edits, actually) comes from wanting to provide a 'smoother' UI experience.
I should point out that I'm leaning toward option "2" at the moment. I played around with creating my own caching/indexing system by loading all the records in a table and performing the sort/filter in memory using my own algorithms. So much of the code needed to determine whether and/or where a particular record is in the cache is already there, so I'm slightly predisposed to use it. The danger lies in having a cache that doesn't match the underlying query. If I include a record in the cache that the query wouldn't return, I'll be in trouble and probably crash.
You don't need record numbers.
Save the values of the ordered field in the first and last records of the LIMITed query result.
Then you can use these to check whether the new record falls into this range.
In other words, assuming that you order by the Name field, and that the original query was this:
SELECT Name, ...
FROM mytab
WHERE some_conditions
ORDER BY Name
LIMIT x OFFSET y
then try to get at the new record with a similar query:
SELECT 1
FROM mytab
WHERE some_conditions
AND PrimaryKey = LastInsertedValue
AND Name BETWEEN CachedMin AND CachedMax
Similarly, to find out before (or after) which record the new record was inserted, start directly after the inserted record and use a limit of one, like this:
SELECT Name
FROM mytab
WHERE some_conditions
AND Name > MyInsertedName
AND Name BETWEEN CachedMin AND CachedMax
ORDER BY Name
LIMIT 1
This doesn't give you a number; you still have to check where the returned Name is in your cache.
Typically you'd expect a cache to be invalidated when the underlying data changes. I think dropping it and starting over will be your simplest, most maintainable solution; I would recommend it unless you have a very good reason not to.
You could run another query that just returns the row count (example below) to see whether your cache should be invalidated. That would save recreating the cache when it did not change.
SELECT name,address FROM people WHERE area_code=970;
SELECT COUNT(rowid) FROM people WHERE area_code=970;
The information you'd need from SQLite to know when your cache was invalidated would require some rather intimate knowledge of how the query and/or index works; I would say that is fairly tight coupling.
Otherwise, you'd want to know where the row landed with respect to the sort order. You would probably key each page on the sorted field, delete anything greater than the inserted/deleted key, and drop everything whenever the sorting changes.
Something like the below would be a start if you were using C++. I realize you aren't, but hopefully it makes the idea evident.
#include <set>
#include <string>
#include <vector>

struct Person {
    std::string name;
    std::string addr;
};

struct Page {
    std::string key;                  // value of the sorted field for this page
    std::vector<Person> persons;

    struct Less {
        bool operator()(const Page &lhs, const Page &rhs) const {
            return lhs.key.compare(rhs.key) < 0;
        }
    };
};

typedef std::set<Page, Page::Less> pages_t;
pages_t pages;

bool sql_insert(const Person &person); // assumed wrapper around the SQL INSERT

void insert(const Person &person) {
    if (sql_insert(person)) {
        // find the first page whose key is not less than the inserted name,
        // then drop that page and everything after it
        Page probe;
        probe.key = person.name;
        pages_t::iterator drop_cache_start = pages.lower_bound(probe);
        // ... erase from drop_cache_start to pages.end()
    }
}
You'd have to do some wrangling to get different datatypes of key to work nicely, but it's possible.
Theoretically you could just leave the pages out of it and only use the objects themselves. The database would no longer "own" the data though. If you only fill pages from the database, then you'll have less data consistency worries.
This may be a bit off topic, but you aren't re-implementing views, are you? A view doesn't cache per se, though it isn't clear whether caching is actually a requirement of your project.
The solution I came up with is not exactly simple, but it's currently working well. I realized that the index of a record in a query's results is also the count of all the records that precede it. What I needed to do was 'convert' all the ORDER BY terms in the query into a series of WHERE conditions that return only the preceding records, and then take a count of those records. It's trickier than it sounds (or maybe not... it sounds tricky). The biggest issue I had was making sure the query was, in fact, sorted in a way I could predict. This meant the order parameters needed to end with a column holding unique values, so whenever a user sorts on a column, I append another order term on a unique column (I used a "Modified Date Stamp") to break ties.
Creating the WHERE portion of the statement requires more than just tacking on a bunch of ANDs. It's easier to demonstrate. Say you have three ORDER BY columns: "LastName" ASC, "FirstName" DESC, and "Modified Stamp" ASC (the tie breaker). The WHERE clause would have to look something like this ('?' = the new record's value):
WHERE
"LastName" < ? OR
("LastName" = ? AND "FirstName" > ?) OR
("LastName" = ? AND "FirstName" = ? AND "Modified Stamp" < ?)
Each set of WHERE conditions grouped by parentheses is a tie breaker: if the "LastName" values are in fact equal, we must then look at "FirstName", and finally at "Modified Stamp". Obviously, this clause can get really long if you're sorting by a bunch of order parameters.
There's still one problem with the above solution. Comparisons against NULL values never evaluate to true, and yet SQLite sorts NULL values first. Therefore, to deal with NULL values appropriately you have to add another layer of complication. First, all equality comparisons (=) must be replaced by IS. Second, every < comparison must be wrapped in an OR ... IS NULL so that NULL values are included on the < side. This turns the above clause into:
WHERE
("LastName" < ? OR "LastName" IS NULL) OR
("LastName" IS ? AND "FirstName" > ?) OR
("LastName" IS ? AND "FirstName" IS ? AND ("Modified Stamp" < ? OR "Modified Stamp" IS NULL))
I then take a COUNT of the rowid using the above WHERE clause.
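Putting it together, the count that yields the insertion index looks something like this; the table name is illustrative, and the bound values all come from the newly inserted record:
SELECT COUNT(rowid)
FROM mytable
WHERE ("LastName" < ? OR "LastName" IS NULL)
   OR ("LastName" IS ? AND "FirstName" > ?)
   OR ("LastName" IS ? AND "FirstName" IS ? AND ("Modified Stamp" < ? OR "Modified Stamp" IS NULL));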
It turned out easy enough for me to do, mostly because I had already constructed a set of objects to represent the various parts of my SQL statement, which can be assembled to generate the statement. I can't even imagine trying to manipulate a SQL statement like this any other way.
So far, I've tested using this on several iOS devices with up to 10,000 records in a table and I've had no noticeable performance issues. Of course, it's designed for single record edits/insertions so I don't really need it to be super fast/efficient.

Hierarchical Database Select / Insert Statement (SQL Server)

I have recently stumbled upon a problem with selecting relationship details from one table and inserting them into another table, and I hope someone can help.
I have a table structure as follows:
ID (PK)   Name        ParentID
1         Myname      0
2         nametwo     1
3         namethree   2
This is the table I need to select from to get all the relationship data, as there could be an unlimited number of sub-links. (Is there a function I can create to handle the looping?)
Then, once I have all the data, I need to insert it into another table, and the IDs will have to change because they must go in order (e.g. I cannot have ID 2 be a child of ID 3). I am hoping I can use the same function used for selecting to do the inserting.
If you are using SQL Server 2005 or above, you may use recursive queries to get your information. Here is an example:
With tree (id, Name, ParentID, [level])
As (
Select id, Name, ParentID, 1
From [myTable]
Where ParentID = 0
Union All
Select child.id
,child.Name
,child.ParentID
,parent.[level] + 1 As [level]
From [myTable] As [child]
Inner Join [tree] As [parent]
On [child].ParentID = [parent].id)
Select * From [tree];
This query will return the row requested by the first portion (Where ParentID = 0) and all sub-rows recursively. Does this help you?
I'm not sure I understand what you want to have happen with your insert. Can you provide more information in terms of the expected result when you are done?
Good luck!
For the retrieval part, you can take a look at Common Table Expressions; this feature provides recursive operation in SQL.
For the insertion part, you can use the CTE above to regenerate the IDs, and insert accordingly.
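As a sketch of that insertion step (the target table name is illustrative): renumber the rows with ROW_NUMBER() ordered by level, so parents always receive smaller new IDs than their children, then remap each ParentID to its parent's new ID:
With tree (id, Name, ParentID, [level]) As (
    Select id, Name, ParentID, 1
    From [myTable]
    Where ParentID = 0
    Union All
    Select child.id, child.Name, child.ParentID, parent.[level] + 1
    From [myTable] As [child]
    Inner Join [tree] As [parent] On [child].ParentID = [parent].id
),
renumbered As (
    Select Row_Number() Over (Order By [level], id) As NewID,
           id, Name, ParentID, [level]
    From tree
)
Insert Into [myNewTable] (ID, Name, ParentID)
Select r.NewID, r.Name, IsNull(p.NewID, 0)  -- roots keep ParentID 0
From renumbered As r
Left Join renumbered As p On r.ParentID = p.id;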
I hope this URL helps: Self-Joins in SQL
This is the problem of finding the transitive closure of a graph in SQL. SQL does not support this directly, which leaves you with three common strategies:
use a vendor specific SQL extension
store the Materialized Path from the root to the given node in each row
store the Nested Sets, that is the interval covered by the subtree rooted at a given node when nodes are labeled depth first
The first option is straightforward, and if you don't need database portability is probably the best. The second and third options have the advantage of being plain SQL, but require maintaining some de-normalized state. Updating a table that uses materialized paths is simple, but for fast queries your database must support indexes for prefix queries on string values. Nested sets avoid needing any string indexing features, but can require updating a lot of rows as you insert or remove nodes.
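To make the materialized-path option concrete, a minimal sketch; the table, columns, and path format are illustrative:
CREATE TABLE nodes (
    ID   INT PRIMARY KEY,
    Name VARCHAR(50),
    Path VARCHAR(255)  -- ID chain from the root, e.g. '1/', '1/2/', '1/2/3/'
);

-- All descendants of node 2 (itself under node 1): a prefix query
-- that an ordinary index on Path can serve.
SELECT * FROM nodes WHERE Path LIKE '1/2/%';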
If you're fine with always using MSSQL, I'd use the vendor specific option Adrian mentioned.
