DBase index - dbase

Is the index value of the first record in a dbf file 0 or 1? Is the index zero-based?

DBFs are count-based (one-based). I'm not sure what you're after, but given that fact I would have to say 1: record 1 is the first record in the table, not record 0. A GOTO 0 means go to the top.
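For illustration, a minimal xBase-style sketch (the table name is made up, and exact behavior depends on the dialect):
USE mytable          && open the DBF (hypothetical name)
GO TOP               && same as GOTO 1
? RECNO()            && prints 1 - the first record is record 1, not 0
? RECCOUNT()         && total number of records in the table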

I'm not sure, but this might give you a clue:
http://www.dbase.com/knowledgebase/int/db7_file_fmt.htm
1.3.1 Standard Property and Constraint Descriptor Array
Table field offset - base one. 01 for the first field in the table, 02 for the second field, etc. Note: this will be 0 in the case of a constraint.

Related

Does Teradata reuse values when an identity column is defined as GENERATED BY DEFAULT... NO CYCLE?

I need to delete rows from a Teradata Table that has an IDENTITY column defined as:
Some_Id INTEGER NOT NULL GENERATED BY DEFAULT AS IDENTITY
(START WITH 1
INCREMENT BY 1
MINVALUE 0
MAXVALUE 1000000000
NO CYCLE)
I want to know whether Teradata will reuse the values that the deleted rows had for new rows. I understood from the Teradata documentation that NO CYCLE won't allow this, but I'm not really sure, given what I've read in other posts, how it interacts with the GENERATED BY DEFAULT option.
I know that since it is defined as GENERATED BY DEFAULT someone could insert a row with one of the old numbers. I'm asking just for the values automatically generated by Teradata when the column value is not provided.

sqlite: insert or update a row, performance issue

sqlite table:
CREATE TABLE IF NOT EXISTS INFO
(
uri TEXT PRIMARY KEY,
cap INTEGER,
/* some other columns */
uid TEXT
);
The INFO table has 5K+ rows and lives on a low-power device (comparable to a 3-year-old mobile phone).
I have this task: insert a new URI with some values into the INFO table; however, if the URI is already present, I need to update the uid text field by appending extra text to it, but only if that extra text isn't already found within the existing uid string. All other fields should remain unchanged.
As an example: INFO already has uri="http://example.com" with this uid string: "|123||abc||xxx|".
I need to add uri="http://example.com" and uid="|abc|". Since "|abc|" is a substring of the existing uid for that uri, nothing should be updated. In any case, the remaining fields shouldn't be updated.
To get it done I have these options:
build a single SQL query (if it's possible to do something like that with sqlite in one SQL statement), or
do everything manually in two steps: a) retrieve the row for the uri and do all processing in the host code, then b) update the existing row or insert a new one as needed.
Considering this is a constrained device, which way is preferable? What if I drop the extra requirement of the sub-string match and always append uid to the existing uid field?
"If it is possible with sqlite in one sql statement":
Yes, it is possible. The "UPSERT" statement has been nicely discussed in this question.
Applied to your extended case, you could do it like this in one statement:
insert or replace into info (uri, cap, uid)
values ('http://example.com',
        coalesce((select cap from info where uri = 'http://example.com'), 'new cap'),
        (select case
                    when (select uid from info where uri = 'http://example.com') is null
                        then '|abc|'
                    when instr((select uid from info where uri = 'http://example.com'), '|abc|') = 0
                        then (select uid from info where uri = 'http://example.com') || '|abc|'
                    else (select uid from info where uri = 'http://example.com')
                end)
       );
Checking the EXPLAIN QUERY PLAN gives us
selectid order from detail
---------- ---------- ---------- -------------------------
0 0 0 EXECUTE SCALAR SUBQUERY 0
0 0 0 SEARCH TABLE info USING INDEX sqlite_autoindex_INFO_1 (uri=?)
0 0 0 EXECUTE SCALAR SUBQUERY 1
1 0 0 EXECUTE SCALAR SUBQUERY 2
2 0 0 SEARCH TABLE info USING INDEX sqlite_autoindex_INFO_1 (uri=?)
1 0 0 EXECUTE SCALAR SUBQUERY 3
3 0 0 SEARCH TABLE info USING INDEX sqlite_autoindex_INFO_1 (uri=?)
1 0 0 EXECUTE SCALAR SUBQUERY 4
4 0 0 SEARCH TABLE info USING INDEX sqlite_autoindex_INFO_1 (uri=?)
1 0 0 EXECUTE SCALAR SUBQUERY 5
5 0 0 SEARCH TABLE info USING INDEX sqlite_autoindex_INFO_1 (uri=?)
As far as I know, sqlite will not cache the results of scalar sub-queries (I could not find any evidence of caching when looking at the VM code produced by EXPLAIN for the above statement). Hence, since sqlite is an in-process db, doing things with two separate statements will most likely perform better than this.
You might want to benchmark the runtimes for this - results will of course depend on your host language and the interface you use (C, JDBC, ODBC, ...).
EDIT
A little performance benchmark using the JDBC driver and sqlite 3.7.2, running 100,000 modifications on a base data set of 5,000 rows in table info (50% updates, 50% inserts), confirms the above conclusion:
Using three prepared statements (a select first, followed by an update or insert depending on the selected data): 702ms
Using the above combined statement: 1802ms
The runtimes are quite stable across several runs.
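For reference, a minimal sketch of that multi-statement approach (the '?' parameter markers and the '|abc|' fragment are illustrative; the substring check and the branching happen in the host code):
-- 1) look up the existing row
select uid from info where uri = ?;
-- 2a) if no row came back: plain insert
insert into info (uri, cap, uid) values (?, ?, ?);
-- 2b) if a row exists and uid does not yet contain the fragment: append it
update info set uid = uid || '|abc|' where uri = ?;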

how can I get faster FTS4 query results ordered by a field in another table?

Background
I'm implementing full-text search over a body of email messages stored in SQLite, making use of its fantastic built-in FTS4 engine. I'm getting some rather poor query performance, although not exactly where I would expect. Let's take a look.
Representative schema
I'll give some simplified examples of the code in question, with links to the full code where applicable.
We've got a MessageTable that stores the data about an email message (full version spread out over several files here, here, and here):
CREATE TABLE MessageTable (
id INTEGER PRIMARY KEY,
internaldate_time_t INTEGER
);
CREATE INDEX MessageTableInternalDateTimeTIndex
ON MessageTable(internaldate_time_t);
The searchable text is added to an FTS4 table named MessageSearchTable (full version here):
CREATE VIRTUAL TABLE MessageSearchTable USING fts4(
id INTEGER PRIMARY KEY,
body
);
The id in the search table acts as a foreign key to the message table.
I'll leave it as an exercise for the reader to insert data into these tables (I certainly can't give out my private email). I have just under 26k records in each table.
Problem query
When we retrieve search results, we need them to be ordered descending by internaldate_time_t so we can pluck out only the most recent few results. Here's an example search query (full version here):
SELECT id
FROM MessageSearchTable
JOIN MessageTable USING (id)
WHERE MessageSearchTable MATCH 'a'
ORDER BY internaldate_time_t DESC
LIMIT 10 OFFSET 0
On my machine, with my email, that runs in about 150 milliseconds, as measured via:
time sqlite3 test.db <<<"..." > /dev/null
150 milliseconds is no beast of a query, but for a simple FTS lookup and indexed order, it's sluggish. If I omit the ORDER BY, it completes in 10 milliseconds, for example. Also keep in mind that the actual query has one more sub-select, so there's a little more work going on in general: the full version of the query runs in about 600 milliseconds, which is into beast territory, and omitting the ORDER BY in that case shaves 500 milliseconds off the time.
If I turn on stats inside sqlite3 and run the query, I notice the line:
Sort Operations: 1
If my interpretation of the docs about those stats is correct, it looks like the query is completely skipping using the MessageTableInternalDateTimeTIndex. The full version of the query also has the line:
Fullscan Steps: 25824
Sounds like it's walking the table somewhere, but let's ignore that for now.
What I've discovered
So let's work on optimizing that a little bit. I can rearrange the query into a sub-select and force SQLite to use our index with the INDEXED BY extension:
SELECT id
FROM MessageTable
INDEXED BY MessageTableInternalDateTimeTIndex
WHERE id IN (
SELECT id
FROM MessageSearchTable
WHERE MessageSearchTable MATCH 'a'
)
ORDER BY internaldate_time_t DESC
LIMIT 10 OFFSET 0
Lo and behold, the running time has dropped to around 100 milliseconds (300 milliseconds in the full version of the query, a 50% reduction in running time), and there are no sort operations reported. Note that with just reorganizing the query like this but not forcing the index with INDEXED BY, there's still a sort operation (though we've still shaved off a few milliseconds oddly enough), so it appears that SQLite is indeed ignoring our index unless we force it.
I've also tried some other things to see if they'd make a difference, but they didn't:
Explicitly making the index DESC as described here, with and without INDEXED BY
Explicitly adding the id column in the index, with and without internaldate_time_t ordered DESC, with and without INDEXED BY
Probably several other things I can't remember at this moment
Questions
100 milliseconds here still seems awfully slow for what seems like it should be a simple FTS lookup and indexed order.
What's going on here? Why is it ignoring the obvious index unless you force its hand?
Am I hitting some limitation with combining data from virtual and regular tables?
Why is it still so relatively slow, and is there anything else I can do to get FTS matches ordered by a field in another table?
Thanks!
An index is useful for looking up a table row based on the value of the indexed column.
Once a table row is found, indexes are no longer useful because it is not efficient to look up a table row in an index by any other criterion.
An implication of this is that it is not possible to use more than one index for each table accessed in a query.
Also see the documentation: Query Planning, Query Optimizer.
Your first query has the following EXPLAIN QUERY PLAN output:
0 0 0 SCAN TABLE MessageSearchTable VIRTUAL TABLE INDEX 4: (~0 rows)
0 1 1 SEARCH TABLE MessageTable USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
0 0 0 USE TEMP B-TREE FOR ORDER BY
What happens is that
the FTS index is used to find all matching MessageSearchTable rows;
for each row found in 1., the MessageTable primary key index is used to find the matching row;
all rows found in 2. are sorted with a temporary table;
the first 10 rows are returned.
Your second query has the following EXPLAIN QUERY PLAN output:
0 0 0 SCAN TABLE MessageTable USING COVERING INDEX MessageTableInternalDateTimeTIndex (~100000 rows)
0 0 0 EXECUTE LIST SUBQUERY 1
1 0 0 SCAN TABLE MessageSearchTable VIRTUAL TABLE INDEX 4: (~0 rows)
What happens is that
the FTS index is used to find all matching MessageSearchTable rows;
SQLite goes through all entries in the MessageTableInternalDateTimeTIndex in the index order, and returns a row when the id value is one of the values found in step 1.
SQLite stops after the tenth such row.
In this query, it is possible to use the index for (implied) sorting, but only because no other index is used for looking up rows in this table.
Using an index in this way implies that SQLite has to go through all entries, instead of looking up only the few rows that match some other condition.
When you omit the INDEXED BY clause from your second query, you get the following EXPLAIN QUERY PLAN output:
0 0 0 SEARCH TABLE MessageTable USING INTEGER PRIMARY KEY (rowid=?) (~25 rows)
0 0 0 EXECUTE LIST SUBQUERY 1
1 0 0 SCAN TABLE MessageSearchTable VIRTUAL TABLE INDEX 4: (~0 rows)
0 0 0 USE TEMP B-TREE FOR ORDER BY
which is essentially the same as your first query, except that joins and subqueries are handled slightly differently.
With your table structure, it is not really possible to get faster.
You are doing three operations:
looking up rows in MessageSearchTable;
looking up corresponding rows in MessageTable;
sorting rows by a MessageTable value.
As far as indexes are concerned, steps 2 and 3 conflict with each other.
The database has to choose whether to use an index for step 2 (in which case sorting must be done explicitly) or for step 3 (in which case it has to go through all MessageTable entries).
You could try to return fewer records from the FTS search by making the message time a part of the FTS table and searching only for the last few days (and increasing or dropping the time if you don't get enough results).
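A minimal sketch of that idea, assuming the timestamp is duplicated into the FTS table as an extra column (the 7-day window is illustrative, and the CASTs guard against text-vs-integer comparisons since FTS4 does not enforce column types):
CREATE VIRTUAL TABLE MessageSearchTable USING fts4(
    id INTEGER PRIMARY KEY,
    body,
    internaldate_time_t    -- duplicated from MessageTable
);
SELECT id
FROM MessageSearchTable
WHERE MessageSearchTable MATCH 'a'
  AND CAST(internaldate_time_t AS INTEGER) >= CAST(strftime('%s', 'now', '-7 days') AS INTEGER)
ORDER BY CAST(internaldate_time_t AS INTEGER) DESC
LIMIT 10;
Note that a bare MATCH pattern searches every FTS column, so once the timestamp column is part of the table you may want to qualify the pattern as 'body:a'.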

Dataset column always returns -1

I have a SQL stored proc that returns data to an ASP.NET v3.5 DataSet. One of the columns in the dataset is called Attend and is a nullable bit column in the SQL table. The SELECT for that column is this:
CASE WHEN Attend IS NULL THEN -1 ELSE Attend END AS Attend
When I execute the SP in Query Analyzer the row values are returned as they should be - the value for Attend is -1 in some rows, 0 in others, and 1 in others. However, when I debug the C# code and examine the dataset, the Attend column always contains -1.
If I SELECT any other columns or constant values for Attend, the results are always correct. It is only the above SELECT of the bit field that behaves strangely. I suspect the bit type is what's causing this, so to test that I instead selected "CONVERT(int, Attend)", but the behavior is the same.
I have tried using ExecuteDataset to retrieve the data and I have also created a .NET Dataset schema with TableAdapter and DataTable. Still no luck.
Does anyone know what is the problem here?
Like you, I suspect the data type. If you can change the data type of Attend, change it to smallint, which supports negative numbers. If not, try changing the name of the alias from Attend to IsAttending (or whatever suits the column).
Also, you can make your query more concise by using this instead of CASE:
ISNULL(Attend, -1)
You've suggested that the Attend field is a bit, yet it contains three values (-1, 0, 1). A bit, however, can only hold two values - often (-1, 0) when converted to an integer, but possibly (0, 1), depending on whether the BIT is considered signed (two's complement) or unsigned.
If your client (the ASP code) is converting all values for that field to a BIT type then both -1 and 1 will likely show as the same value. So, I would ensure two things:
- The SQL returns an INTEGER
- The Client isn't converting that to a BIT
[Though this doesn't explain the absence of 0's]
One needs to be careful with implicit conversion of types. When not specifying types explicitly, double-check the precedence. Or, to be certain, explicitly specify every type...
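For instance, a minimal sketch of forcing the type on the SQL side (the table name is hypothetical; the CAST happens before ISNULL because ISNULL returns the type of its first argument, and a bit cannot hold -1):
SELECT ISNULL(CAST(Attend AS INT), -1) AS Attend
FROM dbo.SessionAttendance  -- hypothetical table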
Just out of interest, what do you get when using the following?
CASE [table].attend
WHEN NULL THEN -2
WHEN 0 THEN 0
ELSE 2
END

SQLite - getting number of rows in a database

I want to get the number of rows in my table using max(id). When it returns NULL - if there are no rows in the table - I want to return 0, and when there are rows I want to return max(id) + 1.
My rows are numbered from 0 and auto-incremented.
Here is my statement:
SELECT CASE WHEN MAX(id) != NULL THEN (MAX(id) + 1) ELSE 0 END FROM words
But it always returns 0. What have I done wrong?
You can query the actual number of rows with SELECT COUNT(*) FROM tblName;
see https://www.w3schools.com/sql/sql_count_avg_sum.asp
If you want to use MAX(id) instead of the count (after reading the comments from Pax), then the following SQL will give you what you want:
SELECT COALESCE(MAX(id)+1, 0) FROM words
In SQL, NULL = NULL does not evaluate to true, so you have to use IS NULL instead:
SELECT CASE WHEN MAX(id) IS NULL THEN 0 ELSE (MAX(id) + 1) END FROM words
But, if you want the number of rows, you should just use count(id), since your solution will give 10 if your rows are (0,1,3,5,9) whereas it should give 5.
If you can guarantee you will always have ids from 0 to N, max(id)+1 may be faster depending on the index implementation (it may be faster to traverse the right side of a balanced tree than to walk the whole tree, counting).
But that's very implementation-specific and I would advise against relying on it, not least because it locks your performance to a specific DBMS.
Not sure if I understand your question, but max(id) won't give you the number of rows at all. For example, if you have only one row with id = 13 (say you deleted the previous rows), max(id) = 13 but the number of rows is 1. The correct (and fastest) solution is to use count(). BTW, if you wonder why there's a star, it's because you can also count rows based on a criterion.
I had the same problem, if I understand your question correctly: I wanted to know the last inserted id after every insert operation in SQLite. I tried the following statement:
select * from table_name order by id desc limit 1
Here id is the first column and primary key of table_name, and the statement above shows me the record with the largest id.
But the premise is that you never deleted any rows, so that the number of ids equals the number of rows.
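As a side note, if all you need is the id of the row you just inserted on the current connection, SQLite also has a built-in last_insert_rowid() function:
SELECT last_insert_rowid();  -- id of the most recent insert on this connection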
As an extension of VolkerK's answer, to make the code a little more readable you can use AS to alias the count, as in the example below:
SELECT COUNT(*) AS c from profile
This makes for much easier reading in some frameworks; for example, I'm using Exponent's (React Native) SQLite integration, and without the AS clause the code is pretty ugly.
