When doing a PartiQL select statement over a DynamoDB table:
Do we need to explicitly define which index to use, or can that be inferred from the WHERE condition? That is, if an index is not specified in the FROM clause but the condition uses an indexed column with an equality comparison, is that index used automatically to perform a query instead of a scan?
In a related question, is there any way to see the "query/scan plan" used by a PartiQL select?
Thanks
It will automatically infer the index based on the WHERE condition.
Below is the documentation, which clearly mentions that:
PartiQL Select Statement
The indexing policy documentation explains that a composite index is useful when there is a minimum of one equality and one range/ORDER BY query. But it doesn't explain whether it's useful when the query has only multiple equality filters and no range/ORDER BY filter.
MongoDB has an equivalent compound index that helps when querying with multiple equality filters and no range filter.
For example, the query will look like this:
Select * from c where c.FirstName = "Hi" and c.LastName ="Hello"
In the precise query you have as an example, it is unlikely to help. It could provide some benefit, but you'd have to have data with extreme cardinality, such as a timestamp.
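For reference, a composite index covering the two equality paths from the example query would be declared in the container's indexing policy roughly like this (a sketch only; as noted above, whether it actually helps depends on the data's cardinality):

```json
{
  "compositeIndexes": [
    [
      { "path": "/FirstName", "order": "ascending" },
      { "path": "/LastName", "order": "ascending" }
    ]
  ]
}
```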
Diesel's SqliteBackend does not implement the SupportsReturningClause trait, so the get_result method cannot be used to retrieve a newly created value.
Is there another way to find out the id of the inserted row? Python has a solution for this. The only solution I've found so far is to use a UUID for ids instead of an autoincrement field.
The underlying issue here is that SQLite does not support SQL RETURNING clauses, which would allow you to return the auto-generated id as part of your insert statement.
As the OP only provided a general question, I cannot show concrete examples of how to implement this using Diesel.
There are several ways to workaround this issue. All of them require that you execute a second query.
Order by id and select just the largest id. That's the most direct solution, and it directly shows the issue with doing a second query: there can be a racing insert at any point in time, so you can get back the wrong id (at least if you don't use transactions).
Use the last_insert_rowid() SQL function to receive the row id of the last inserted row. If not configured otherwise, that row id matches your autoincrement integer primary key. On the Diesel side you can use no_arg_sql_function!() to define the underlying SQL function in your crate.
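Not Diesel, but the underlying SQLite behaviour of the second workaround can be sketched with Python's built-in sqlite3 module (table and column names are made up for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")

cur = con.execute("INSERT INTO users (name) VALUES ('alice')")
# The SQL function, issued as a second statement on the same connection,
# as a Diesel program would do:
(rowid,) = con.execute("SELECT last_insert_rowid()").fetchone()
print(rowid)  # 1: matches the AUTOINCREMENT primary key of the row just inserted
```

Note that the result is per-connection, so another client's racing insert cannot change it, but a second insert on your own connection will.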
With Sqlite3, I am trying to do a query like:
select *
from data
where instr(filepath,'.txt') != 0
And I want to index this query to speed it up.
I tried to create an index like:
create index data_instr_filepath
on data(instr(filepath,'.txt'));
However, "explain query plan" still shows that I'm doing a table scan.
Is this doable in SQLite? The examples I have found for expression-based indexes seem to be limited to the length function and multiplying two columns together.
UPDATE:
Thanks to Mike's answer, I refactored my query to not use inequalities and was able to create an index that hits it. Below are my indexes that I ended up using:
create index data_instr_filepath_txt on data(instr(filepath,'.txt'));
create index data_instr_filepath_substr on data(substr(filepath,0,instr(filepath,'.')));
The reason is that an index will likely not be used for an inequality, as per :-
Similarly, index columns will not normally be used (for indexing
purposes) if they are to the right of a column that is constrained
only by inequalities.
(The SQLite Query Optimizer Overview)
You can try forcing the use of an index by using INDEXED BY. However, this will not work in your situation because, as per the above, the index is flagged as not usable (the query itself will still work).
e.g.
EXPLAIN QUERY PLAN
SELECT * FROM data INDEXED BY data_instr_filepath
WHERE instr(filepath,'.txt') != 0
results in :-
no query solution
Time: 0s
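As a side note, since instr() never returns a negative number, the inequality != 0 is equivalent to the range constraint > 0, which the optimizer can turn into an index range search. A small sketch with Python's built-in sqlite3 module (schema assumed from the question):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE data (filepath TEXT)")
# Expression index exactly matching the expression used in the queries below.
con.execute("CREATE INDEX data_instr_filepath ON data(instr(filepath,'.txt'))")

def plan(sql):
    # The human-readable detail text is the last column of each EXPLAIN QUERY PLAN row.
    return [row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]

# != is a constraint the optimizer cannot use as an index range: full table scan.
scan = plan("SELECT * FROM data WHERE instr(filepath,'.txt') != 0")
# A range comparison on the exact indexed expression can use the index.
search = plan("SELECT * FROM data WHERE instr(filepath,'.txt') > 0")
```

Here `scan` reports a scan of the table, while `search` reports a search using data_instr_filepath.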
I've got a list of partition keys from one table.
userId: ["123","456","235"]
I need to get an attribute that they all share. like "username".
What would be the best practice to get them all at once?
Is scan my only option knowing that I know all my partition keys?
Do I know the sort key? Yes, but only the beginning of it. Therefore I don't think I could use batchGetItem.
Scan is only appropriate if you don't know the partition keys. Because you know the partition keys you want to search, you can achieve the desired behavior with multiple Query operations.
A Query searches all documents with the specified partition key; you can only query one partition key per request, so you'll need multiple queries, but this will still be significantly more efficient than a single Scan operation.
If you're only looking for documents with a sort key that begins with something, you can include it in your KeyConditionExpression along with the partition key.
For example, if you wanted to only return documents whose sort key begins with a certain string, you could pass something like userId = :user_id AND begins_with(#SortKey, :str) as the key condition expression.
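A sketch of that approach in Python: the helper below only builds the low-level Query parameters (the table name, sort key attribute name, and prefix are hypothetical); in practice each dict would be passed to a DynamoDB client's Query call, one per known partition key:

```python
# Sketch: one Query per known partition key, instead of a single Scan.
# "Users", "sortKey" and "PROFILE#" are assumed names, not from the question.
def build_query_params(table_name, user_id, sort_key_prefix):
    return {
        "TableName": table_name,
        # Equality on the partition key plus begins_with on the sort key.
        "KeyConditionExpression": "userId = :uid AND begins_with(#sk, :prefix)",
        "ExpressionAttributeNames": {"#sk": "sortKey"},
        "ExpressionAttributeValues": {
            ":uid": {"S": user_id},
            ":prefix": {"S": sort_key_prefix},
        },
        # Only fetch the shared attribute we care about.
        "ProjectionExpression": "username",
    }

requests = [build_query_params("Users", uid, "PROFILE#")
            for uid in ["123", "456", "235"]]
```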
You can efficiently achieve this result by using a PartiQL SELECT statement. It allows you to query an array of partition keys with the IN operator and to apply additional conditions on other attributes without causing a full table scan.
To ensure that a SELECT statement does not result in a full table
scan, the WHERE clause condition must specify a partition key. Use the
equality or IN operator.
I have a query like
select * from tbl where name like 'a%' or name like 'abc%';
Does SQLite search 'a%' and 'abc%' separately? Would it check abc% is included by a%, and do only one search?
"explain query plan" returns
"0" "0" "0" "SEARCH TABLE traces USING PRIMARY KEY (name>? AND name<?)"
"0" "0" "0" "SEARCH TABLE traces USING PRIMARY KEY (name>? AND name<?)"
Is it what happens at run time?
Does SQLite search 'a%' and 'abc%' separately? Would it check abc% is
included by a%, and do only one search?
I think neither is the correct answer as it appears to be in-between the two options given.
I think trawling through the documentation will explain a little.
First port of call is The SQLite Query Optimizer Overview. This says :-
If the WHERE clause is composed of constraints separated by the OR
operator then the entire clause is considered to be a single "term" to
which the OR-clause optimization is applied.
Additionally, in EXPLAIN QUERY PLAN it states :-
If the WHERE clause of a query contains an OR expression, then SQLite
might use the "OR by union" strategy (also described here).
(link included below, to 1.8. OR-Connected Terms In The WHERE Clause)
In this
case there will be two SEARCH records, one for each index, with the
same values in both the "order" and "from" columns. For example:
sqlite> CREATE INDEX i3 ON t1(b);
sqlite> EXPLAIN QUERY PLAN SELECT * FROM t1 WHERE a=1 OR b=2;
0|0|0|SEARCH TABLE t1 USING COVERING INDEX i2 (a=?)
0|0|0|SEARCH TABLE t1 USING INDEX i3 (b=?)
So it very much appears that the "OR by union" strategy is being used as you have:-
"0" "0" "0" "SEARCH TABLE traces USING PRIMARY KEY (name>? AND name<?)"
"0" "0" "0" "SEARCH TABLE traces USING PRIMARY KEY (name>? AND name<?)"
The OR-clause optimization is explained here :-
3.0 OR optimizations (same as the first document). However, there are lots of "mights"; rather, I think the link provided in EXPLAIN QUERY PLAN to 1.8. OR-Connected Terms In The WHERE Clause is more pertinent. This includes :-
1.8. OR-Connected Terms In The WHERE Clause
Multi-column indices only work if the constraint terms in the WHERE
clause of the query are connected by AND. So Idx3 and Idx4 are helpful
when the search is for items that are both Oranges and grown in
California, but neither index would be that useful if we wanted all
items that were either oranges or are grown in California.
SELECT price FROM FruitsForSale WHERE fruit='Orange' OR state='CA';
When confronted with OR-connected terms in a WHERE clause, SQLite
examines each OR term separately and tries to use an index to find the
rowids associated with each term. It then takes the union of the
resulting rowid sets to find the end result. The following figure
illustrates this process:
The diagram above implies that SQLite computes all of the rowids
first and then combines them with a union operation before starting to
do rowid lookups on the original table. In reality, the rowid lookups
are interspersed with rowid computations. SQLite uses one index at a
time to find rowids while remembering which rowids it has seen before
so as to avoid duplicates. That is just an implementation detail,
though. The diagram, while not 100% accurate, provides a good overview
of what is happening.
In order for the OR-by-UNION technique shown above to be useful, there
must be an index available that helps resolve every OR-connected term
in the WHERE clause. If even a single OR-connected term is not
indexed, then a full table scan would have to be done in order to find
the rowids generated by the one term, and if SQLite has to do a full
table scan, it might as well do it on the original table and get all
of the results in a single pass without having to mess with union
operations and follow-on binary searches.
One can see how the OR-by-UNION technique could also be leveraged to
use multiple indices on queries where the WHERE clause has terms
connected by AND, by using an intersect operator in place of union.
Many SQL database engines will do just that. But the performance gain
over using just a single index is slight and so SQLite does not
implement that technique at this time. However, a future version of
SQLite might be enhanced to support AND-by-INTERSECT.
Another consideration is 4.0 The LIKE optimization. However, I believe this is applied on a per-LIKE-clause basis only.
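For completeness, the two-SEARCH plan from the question can be reproduced with Python's built-in sqlite3 module (PRAGMA case_sensitive_like is turned on so the LIKE optimization applies to the default BINARY collation; the table name is taken from the question's plan output, the schema is assumed):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Needed for the LIKE optimization on a BINARY-collated text column.
con.execute("PRAGMA case_sensitive_like = ON")
con.execute("CREATE TABLE traces (name TEXT PRIMARY KEY)")

rows = con.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM traces WHERE name LIKE 'a%' OR name LIKE 'abc%'"
).fetchall()
details = [r[-1] for r in rows]
# Each LIKE term becomes its own range search (name>? AND name<?) on the
# primary-key index, even though 'abc%' is a subset of 'a%' -- matching the
# two SEARCH records shown in the question.
```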