Skip assigning when duplicates are found - OpenEdge

I have a procedure that assigns values and sends them back. I need to implement a change so that it skips the assignment whenever it finds a duplicate IBAN code. It would go in this FOR EACH, some kind of IF or something similar: whenever it finds an IBAN code that was already used and assigned, it should not assign it a second or third time. I am new to OpenEdge Progress, so it is still hard for me to understand the syntax correctly and write the code by myself. If anyone could explain how I should implement this, or give any advice or tips, I would be very thankful.
FOR EACH viewpoint WHERE viewpoint.cif = cif.cif AND NOT viewpoint.close NO-LOCK:
    DEFINE VARIABLE cIban AS CHARACTER NO-UNDO.
    FIND FIRST paaa WHERE paaa.cif EQ cif.cif AND paaa.paaa = viewpoint.aaa AND NOT paaa.close NO-LOCK NO-ERROR.
    cIban = viewpoint.aaa.
    IF AVAILABLE paaa THEN DO:
        cIban = paaa.vaaa.
        CREATE tt_account_rights.
        ASSIGN
            tt_account_rights.iban = cIban.
    END.
END.

You have not shown the definition of tt_account_rights, but assuming that "iban" is a uniquely indexed field in tt_account_rights, you probably want something like:
DEFINE VARIABLE cIban AS CHARACTER NO-UNDO.

FOR EACH viewpoint WHERE viewpoint.cif = cif.cif AND NOT viewpoint.close NO-LOCK:
    FIND FIRST paaa WHERE paaa.cif EQ cif.cif AND paaa.paaa = viewpoint.aaa AND NOT paaa.close NO-LOCK NO-ERROR.
    cIban = viewpoint.aaa.
    IF AVAILABLE paaa THEN DO:
        cIban = paaa.vaaa.
        /* only create a record if this IBAN has not been seen yet */
        FIND tt_account_rights WHERE tt_account_rights.iban = cIban NO-ERROR.
        IF NOT AVAILABLE tt_account_rights THEN DO:
            CREATE tt_account_rights.
            ASSIGN
                tt_account_rights.iban = cIban.
        END.
    END.
END.
Some bonus perspective:
1) Try to express elements of the WHERE clause as equality matches whenever possible; this is the most significant contributor to query efficiency. So instead of saying "NOT viewpoint.close", code it as "viewpoint.close = NO".
2) Do NOT automatically throw FIRST after every FIND. You may have been exposed to code where that is the "standard"; it is nonetheless bad coding. If the FIND is unique, FIRST adds no value (it does NOT improve performance in that case). If the FIND is not unique and you do as you have done above and assign a value from that record, you are, effectively, making that FIRST record special. That is a violation of 3rd normal form (there is now a fact about the record which is not related to the key, the whole key, and nothing but the key). What if the 2nd record has a different IBAN? What if different WHERE clauses return different "first" records?
There are cases where FIRST is appropriate. The point is that it is not ALWAYS correct, and it should not be added to every FIND statement without any thought about why you are putting it there and what the impact of that keyword really is.
3) It is clearer to put the NO-LOCK (or EXCLUSIVE-LOCK or SHARE-LOCK) immediately after the table name rather than towards the end of the statement. The syntax works either way, but from a readability perspective it is better to have the lock phrase right by the table. Points 1 and 3 are applied in the sketch after this list.
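For instance, points 1 and 3 applied to the FOR EACH above (the loop body is unchanged):

FOR EACH viewpoint NO-LOCK
    WHERE viewpoint.cif = cif.cif
      AND viewpoint.close = NO:
    /* ... */
END.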

Related

What if I don't use index fields in my program?

I am a beginner with Progress 4GL. I am confused by the following logic, especially how the index actually works.
I have added 2 fields to one index. As you can see below, I have written three queries:
Query 1 uses the index and matches both fields to retrieve the data.
Query 2 uses the same index but matches only 1 field.
Query 3 uses the same indexed fields plus one non-indexed field.
define temp-table tt_creldata no-undo
    field tt_cscx_order    as character
    field tt_cscx_part     as character
    field tt_cscx_shipfrom as character
    index tt_cscx
        tt_cscx_order
        tt_cscx_part.
**Query 1:**
find first tt_creldata use-index tt_cscx
     where tt_cscx_order = "153"
       and tt_cscx_part = "113" no-lock no-error.
**Query 2:**
find first tt_creldata use-index tt_cscx
     where tt_cscx_order = "153" no-lock no-error.
**Query 3:**
find first tt_creldata use-index tt_cscx
     where tt_cscx_order = "153"
       and tt_cscx_part = "113"
       and tt_cscx_shipfrom = "US" no-lock no-error.
Question 1: Which query helps to improve performance?
Question 2: What happens if I don't use one of the indexed fields when I specify use-index?
Question 3: What happens if I add one non-indexed field when I specify use-index?
As a general rule of thumb, you should never use use-index.
The AVM will select one or more indexes to use for a query at compile time, and by forcing it to use one of your choosing, you are removing the possibility of this.
Having extra, possibly non-indexed, fields in your WHERE clause will only affect the indexes chosen if you let the AVM choose (i.e. don't use use-index). This is also true if you don't use indexed fields in your query.
You can see which indexes are used if you compile the program with the xref or xml-xref options and look for the SEARCH items.
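For example (myproc.p is an illustrative file name):

COMPILE myproc.p XREF myproc.xref.
COMPILE myproc.p XML-XREF myproc.xref.xml.

In the output, each SEARCH line names the index chosen for a query; a SEARCH flagged WHOLE-INDEX means the query is scanning the entire table.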
As nwahmaet says, you should never use USE-INDEX. In this case it is especially pointless because there is only one index. In cases where there are multiple indexes a FIND statement will only use one of them no matter how complex the WHERE clause but the compiler will almost always do a better job picking an efficient index than you will. (The FOR EACH statement and its associated dynamic queries are capable of using multiple indexes. FIND is always limited to just one index.) In those rare cases where you think you are doing a better job you should thoroughly document why your choice is better and include detailed test cases and results.
All of your queries are using FIRST. This is necessary because your index is not defined as unique. That may be your intent but it seems unusual. And it means that in the event of duplicate records with the same key values you are magically making the "first" record more special than the others. Which is a data normalization faux pas (you are making "firstness" an attribute of the data) and a bug waiting to happen.
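If uniqueness really is the intent, declare it, and the FIND no longer needs FIRST. A sketch based on the temp-table above:

define temp-table tt_creldata no-undo
    field tt_cscx_order    as character
    field tt_cscx_part     as character
    field tt_cscx_shipfrom as character
    index tt_cscx is unique primary
        tt_cscx_order
        tt_cscx_part.

find tt_creldata no-lock
     where tt_cscx_order = "153"
       and tt_cscx_part = "113" no-error.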
FIND FIRST and USE-INDEX are often used together to (try to) cover up for each other's deficiencies. By specifying a particular index the FIRST becomes more consistent. Likewise, FIRST is often used to "cure" performance issues that arise from insufficient index definitions, inadequate WHERE clauses or choosing FIND when FOR EACH would have been more appropriate.
None of these queries are going to perform notably faster than the others.
Query 2 may, or may not, return the same record as query 1. For instance, if there is a part = "112" then query 2 will have a different "first" record. But it will be just as fast to return as query 1.
Likewise, query 3 may have a different result depending on which records contain shipfrom = "US". In the best case, where the very first record with order = "153" and part = "113" also satisfies shipfrom = "US", it will be the same speed as the others.
However, query 3 might be a lot slower depending on how many records have to be scanned before one is found that has shipfrom = "US" since that field is not a part of any index and matching it will, therefore, require scanning records until one is found which matches. That might be the first record or it might be the 10 zillionth.

Is there a way of using an IF statement inside a FOR EACH in PROGRESS-4GL?

I'm converting some SQL code into Progress 4GL. The specific code I'm writing right now has a lot of possible variable insertions; for example, there are 3 checkboxes that may or may not be selected, and each selection adds an "AND ind_sit_doc IN" clause, etc.
What I'd like to do is something like this:
FOR EACH doc-fiscal USE-INDEX dctfsc-09
    WHERE doc-fiscal.dt-docto >= pd-data-1
      AND doc-fiscal.dt-docto <= pd-data-2
      AND doc-fiscal.cod-observa <> 4
      AND doc-fiscal.tipo-nat <> 3
      AND doc-fiscal.cd-situacao <> 06
      AND doc-fiscal.cd-situacao <> 22
      AND (IF pc-ind-sit-doc-1 = 'on' THEN: doc-fiscal.ind-sit-doc = 1) NO-LOCK,
    EACH natur-oper USE-INDEX natureza
        WHERE doc-fiscal.nat-operacao = natur-oper.nat-operacao NO-LOCK:
The IF part should only be evaluated if the variable is set a certain way.
Is it possible?
Yes, you can do that (more or less as nwahmaet showed).
But it is a very, very bad idea in a non-trivial query. You are very likely going to force a table-scan to occur and you may very well send all of the data to the client for selection. That's going to be really painful. You would be much better off moving the IF THEN ELSE outside of the WHERE clause and implementing two distinct FOR EACH statements.
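For instance, a sketch of the two-statement version (same customer example as the query version below):

if someCondition = true then
    for each customer no-lock where state = "ma":
        display custNum state.
    end.
else
    for each customer no-lock where state = "nh":
        display custNum state.
    end.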
If your concern with that is that you would be duplicating the body of the FOR EACH then you could use queries. Something like this:
define query q for customer.
if someCondition = true then
    open query q for each customer no-lock where state = "ma".
else
    open query q for each customer no-lock where state = "nh".
get first q.
do while available customer:
    display custNum state.
    get next q.
end.
This is going to be much more efficient for anything other than a tiny little table.
You can also go fully dynamic and just build the needed WHERE clause as a string, but that involves using handles and is more complicated. If that sounds attractive, look up QUERY-PREPARE in the documentation.
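A minimal sketch of that dynamic approach, reusing the customer example above (someCondition is assumed to be a logical variable defined elsewhere):

define variable hQuery as handle no-undo.
define variable cState as character no-undo.

cState = (if someCondition then "ma" else "nh").

create query hQuery.
hQuery:set-buffers(buffer customer:handle).
/* build the WHERE clause as a string - QUOTER supplies the quoting */
hQuery:query-prepare("for each customer no-lock where customer.state = " + quoter(cState)).
hQuery:query-open().
do while hQuery:get-next():
    display customer.custNum customer.state.
end.
hQuery:query-close().
delete object hQuery.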
You can put an IF expression in a FOR EACH's WHERE clause. You must have the complete IF ... THEN ... ELSE, though.
For example:
FOR EACH customer WHERE (IF discount > 50 THEN state = 'ma' ELSE state = ''):
    DISPLAY name state discount.
END.
That said, that condition will not be used for index selection and will only be applied on the client (if you're using networked db connections this is bad).

Delete a key-value pair in BerkeleyDB

Is there any way to delete a key-value pair where the key starts with sub-string1 and ends with sub-string2 in BerkeleyDB, without iterating through all the keys in the DB?
For example:
$sub1 = "B015";
$sub2 = "5646";
I want to delete
$key = "B015HGUJJ75646"
Note: It is guaranteed that there will be only one key for the combination of $sub1 and $sub2.
This can be done by taking an iterator over the DB and checking every key for the condition, but that will be very inefficient for large DBs. Is there any way to do it without iterating through the complete DB?
If you're using a RECNO database, you're probably out of luck. But, if you can use a BTREE, you have a couple of options.
First, and probably easiest is to iterate over only the portion of the database that makes sense. Assuming you're using the default key comparison function, you can use DB_SET_RANGE to position the starting cursor (iterator) at the start of your partial key string. In your example, this might be "B0150000000000". You then scan forwards with DB_NEXT, looking at each key in turn. When either you find the key you're looking for, or if the key you find doesn't start with "B015", you're done.
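A sketch of that scan, assuming the BerkeleyDB Perl module (the file name and substrings are illustrative):

use BerkeleyDB;

my $db = BerkeleyDB::Btree->new(-Filename => 'keys.db', -Flags => DB_CREATE)
    or die "cannot open database: $BerkeleyDB::Error";

my $cursor = $db->db_cursor();
my ($key, $value) = ('B015', '');

# DB_SET_RANGE positions the cursor at the first key >= 'B015'
my $status = $cursor->c_get($key, $value, DB_SET_RANGE);
while ($status == 0 && $key =~ /^B015/) {
    if ($key =~ /5646$/) {
        $cursor->c_del();    # the one matching key - delete it
        last;
    }
    $status = $cursor->c_get($key, $value, DB_NEXT);
}
$cursor->c_close();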
Another technique that could be applicable to your situation is to redefine the key comparison function. If, as you state, there is only one combination of $sub1 and $sub2, then perhaps you only need to compare those sections of the keys to guarantee uniqueness? Here's an example of a full string comparison (I'm assuming you're using Perl, just from the syntax you supplied above) from https://www2.informatik.hu-berlin.de/Themen/manuals/perl/DB_File.html :
sub Compare
{
    my ($key1, $key2) = @_ ;
    "\L$key1" cmp "\L$key2" ;
}
$DB_BTREE->{compare} = \&Compare ;
So, if you can rig things such that you're only comparing the starting and ending four characters, you should be able to drop the database iterator directly onto the key you're interested in.
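A sketch of such a comparator (note that this makes the first and last four characters the entire sort identity, so keys differing only in the middle collapse together, and it must be installed when the database is created because it defines the on-disk ordering):

sub Compare
{
    my ($key1, $key2) = @_ ;
    (substr($key1, 0, 4) . substr($key1, -4))
        cmp
    (substr($key2, 0, 4) . substr($key2, -4)) ;
}
$DB_BTREE->{compare} = \&Compare ;

With that installed, any key that starts with $sub1 and ends with $sub2 compares equal to the stored key, so a plain $db->del($sub1 . 'XXXXXX' . $sub2) should land directly on the record without a scan.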

SQLite - Get a specific row index for a Sorted/Filtered Query

I'm creating a caching system to take data from an SQLite database table using a sorted/filtered query and display it. The tables I'm pulling from can be potentially very large and, of course, I need to minimize impact on memory by only retaining a maximum number of rows in memory at any given time. This is easily done by using LIMIT and OFFSET to load only the records I need and update the cache as needed. Implementing this is trivial. The problem I'm having is determining where the insertion index is for a new record inserted into a particular query so I can update my UI appropriately. Is there an easy way to do this? So far the ideas I've had are:
Dump the entire cache, re-count the Query results (there's no guarantee the new row will be included), refresh the cache and refresh the entire UI. I hope it's obvious why that's not really desirable.
Use my own algorithm to determine whether the new row is included in the current query, whether it is included in the current cached results, and at what index it should be inserted if it's within the current cached scope. The biggest downfall of this approach is its complexity and the risk that my own sorting/filtering algorithm won't match SQLite's.
Of course, what I want is to be able to ask SQLite: Given 'Query A' what is the index of 'Row B', without loading the entire query results. However, so far I haven't been able to find a way to do this.
I don't think it matters, but this is all occurring on an iOS device, using Objective-C.
More Info
The query and subsequent cache are based on user input. Essentially the user can re-sort and filter (or search) to alter the results they're seeing. My hesitation to simply recreate the cache on insertions (and edits, actually) comes from wanting to provide a 'smoother' UI experience.
I should point out that I'm leaning toward option "2" at the moment. I played around with creating my own caching/indexing system by loading all the records in a table and performing the sort/filter in memory using my own algorithms. So much of the code needed to determine whether and/or where a particular record is in the cache is already there, so I'm slightly predisposed to use it. The danger lies in having a cache that doesn't match the underlying query. If I include a record in the cache that the query wouldn't return, I'll be in trouble and probably crash.
You don't need record numbers.
Save the values of the ordered field in the first and last records of the LIMITed query result.
Then you can use these to check whether the new record falls into this range.
In other words, assuming that you order by the Name field, and that the original query was this:
SELECT Name, ...
FROM mytab
WHERE some_conditions
ORDER BY Name
LIMIT x OFFSET y
then try to get at the new record with a similar query:
SELECT 1
FROM mytab
WHERE some_conditions
AND PrimaryKey = LastInsertedValue
AND Name BETWEEN CachedMin AND CachedMax
Similarly, to find out before (or after) which record the new record was inserted, start directly after the inserted record and use a limit of one, like this:
SELECT Name
FROM mytab
WHERE some_conditions
AND Name > MyInsertedName
AND Name BETWEEN CachedMin AND CachedMax
ORDER BY Name
LIMIT 1
This doesn't give you a number; you still have to check where the returned Name is in your cache.
Typically you'd expect a cache to be invalidated when the underlying data changes. I think dropping it and starting over will be your simplest, most maintainable solution, and I would recommend it unless you have a very good reason not to.
You could write another query that just returned the row count (example below) to see if your cache should be invalidated. That would save recreating the cache when it did not change.
SELECT name,address FROM people WHERE area_code=970;
SELECT COUNT(rowid) FROM people WHERE area_code=970;
The information you'd need from SQLite to know when your cache was invalidated would require some rather intimate knowledge of how the query and/or index works. I would say that is fairly high coupling.
Otherwise, you'd want to know where the record was inserted with regard to the sorting. You would probably key each page on the sorted field, delete any page greater than the inserted/deleted value, and drop everything any time you change the sorting.
Something like the below would be a start if you were using C++. I realize you aren't doing C++, but hopefully it is evident what I'm trying to do.
#include <set>
#include <string>
#include <vector>

struct Person {
    std::string name;
    std::string addr;
};

struct Page {
    std::string key;                 // value of the sorted field for this page
    std::vector<Person> persons;

    struct Less {
        bool operator()(const Page &lhs, const Page &rhs) const {
            return lhs.key.compare(rhs.key) < 0;
        }
    };
};

typedef std::set<Page, Page::Less> pages_t;
pages_t pages;

bool sql_insert(const Person &person);   // wraps the actual SQLite INSERT

void insert(const Person &person) {
    if (sql_insert(person)) {
        // first cached page whose key is >= the inserted name
        pages_t::iterator drop_cache_start = pages.lower_bound(Page{person.name, {}});
        //... drop this page and everything after it
    }
}
You'd have to do some wrangling to get different datatypes of key to work nicely, but it's possible.
Theoretically you could just leave the pages out of it and only use the objects themselves. The database would no longer "own" the data, though. If you only fill pages from the database, then you'll have fewer data-consistency worries.
This may be a bit off topic, but you aren't re-implementing views, are you? A view doesn't cache per se, but it isn't clear whether caching is a requirement of your project.
The solution I came up with is not exactly simple, but it's currently working well. I realized that the index of a record in a query's results is also the count of all the records that precede it. What I needed to do was 'convert' all the ORDER clauses in the query into a series of WHERE conditions that would return only the preceding records, and take a count of those records. It's trickier than it sounds (or maybe not...it sounds tricky). The biggest issue I had was making sure the query was, in fact, sorted in a way I could predict. This meant I needed an order column in the order parameters that was based on a column with unique values. So, whenever a user sorts on a column, I append another order parameter on a unique column (I used a "Modified Date Stamp") to break ties.
Creating the WHERE portion of the statement requires more than just tacking on a bunch of ANDs. It's easier to demonstrate. Say you have 3 Order columns: "LastName" ASC, "FirstName" DESC, and "Modified Stamp" ASC (the tie breaker). The WHERE statement would have to look something like this ('?' = record value):
WHERE
"LastName" < ? OR
("LastName" = ? AND "FirstName" > ?) OR
("LastName" = ? AND "FirstName" = ? AND "Modified Stamp" < ?)
Each set of WHERE parameters grouped together by parenthesis are tie breakers. If, in fact, the record values of "LastName" are equal, we must then look at "FirstName", and finally "Modified Stamp". Obviously, this statement can get really long if you're sorting by a bunch of order parameters.
There's still one problem with the above solution. Comparisons with NULL values always evaluate to false, and yet SQLite sorts NULL values first. Therefore, in order to deal with NULL values appropriately you've got to add another layer of complication. First, all equality operations, =, must be replaced by IS. Second, all < operations must be wrapped with an OR ... IS NULL so that NULL values are included appropriately under the < operator. This turns the above condition into:
WHERE
("LastName" < ? OR "LastName" IS NULL) OR
("LastName" IS ? AND "FirstName" > ?) OR
("LastName" IS ? AND "FirstName" IS ? AND ("Modified Stamp" < ? OR "Modified Stamp" IS NULL))
I then take a count of the rowid using the above WHERE clause.
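Putting it together, the insertion index comes back as a count (a sketch; mytable stands in for the real table, and any filter conditions from the original query would be ANDed in):

SELECT COUNT(rowid)
FROM mytable
WHERE ("LastName" < ? OR "LastName" IS NULL) OR
      ("LastName" IS ? AND "FirstName" > ?) OR
      ("LastName" IS ? AND "FirstName" IS ? AND ("Modified Stamp" < ? OR "Modified Stamp" IS NULL));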
It turned out easy enough for me to do mostly because I had already constructed a set of objects to represent various aspects of my SQL Statement which could be assembled to generate the statement. I can't even imagine trying to manipulate a SQL statement like this any other way.
So far, I've tested using this on several iOS devices with up to 10,000 records in a table and I've had no noticeable performance issues. Of course, it's designed for single record edits/insertions so I don't really need it to be super fast/efficient.

Dataset column always returns -1

I have a SQL stored proc that returns a dataset to an ASP.NET v3.5 DataSet. One of the columns in the dataset is called Attend, a nullable bit column in the SQL table. The SELECT for that column is this:
CASE WHEN Attend IS NULL THEN -1 ELSE Attend END AS Attend
When I execute the SP in Query Analyzer the row values are returned as they should be - the value for Attend is -1 in some rows, 0 in others, and 1 in others. However, when I debug the C# code and examine the dataset, the Attend column always contains -1.
If I SELECT any other columns or constant values for Attend, the results are always correct. It is only the above SELECT of the bit field that behaves strangely. I suspected the bit type was causing this, so to test I instead selected CONVERT(int, Attend), but the behavior is the same.
I have tried using ExecuteDataset to retrieve the data and I have also created a .NET Dataset schema with TableAdapter and DataTable. Still no luck.
Does anyone know what is the problem here?
Like you, I suspect the data type. If you can change the data type of Attend, change it to smallint, which supports negative numbers. If not, try changing the name of the alias from Attend to IsAttending (or whatever suits the column).
Also, you can make your query more concise by using ISNULL instead of CASE. Note that ISNULL takes its return type from the first argument, so cast the bit column first, otherwise the -1 gets coerced to a bit:
ISNULL(CAST(Attend AS int), -1)
You've suggested that the Attend field is a bit, yet it contains three values (-1, 0, 1). A bit, however, can only hold two values. Often (-1, 0) when converted to an integer, but possibly (0, 1), depending on whether the BIT is considered signed (two's complement) or unsigned (one's complement).
If your client (the ASP code) is converting all values for that field to a BIT type then both -1 and 1 will likely show as the same value. So, I would ensure two things:
- The SQL returns an INTEGER
- The Client isn't converting that to a BIT
[Though this doesn't explain the absence of 0's]
One needs to be careful with implicit conversion of types. When not specifying types explicitly, double-check the precedence. Or, to be certain, explicitly specify every type...
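For example, a fully explicit version of the original expression (just the column expression; the surrounding SELECT is elided):

CAST(CASE WHEN Attend IS NULL THEN -1 ELSE Attend END AS int) AS Attend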
Just out of interest, what do you get when using the following? (This needs a searched CASE; a simple CASE ... WHEN NULL never matches, because NULL is not equal to anything.)
CASE
    WHEN [table].attend IS NULL THEN -2
    WHEN [table].attend = 0 THEN 0
    ELSE 2
END
