Can the LIKE statement be optimized to not do full table scans? - sqlite

I want to get a subtree from a table by tree path.
the path column stores strings like:
foo/
foo/bar/
foo/bar/baz/
If I try to select all records that start with a certain path:
EXPLAIN QUERY PLAN SELECT * FROM f WHERE path LIKE "foo/%"
it tells me that the table is scanned, even though the path column is indexed :(
Is there any way I could make LIKE use the index and not scan the table?
I found a way to achieve what I want with a closure table, but it's harder to maintain and writes are extremely slow...

To be able to use an index for LIKE in SQLite,
the table column must have TEXT affinity, i.e., have a type of TEXT or VARCHAR or something like that; and
the index must be declared as COLLATE NOCASE (either directly, or because the column has been declared as COLLATE NOCASE):
> CREATE TABLE f(path TEXT);
> CREATE INDEX fi ON f(path COLLATE NOCASE);
> EXPLAIN QUERY PLAN SELECT * FROM f WHERE path LIKE 'foo/%';
0|0|0|SEARCH TABLE f USING COVERING INDEX fi (path>? AND path<?)
The second restriction could be removed with the case_sensitive_like PRAGMA, but this would change the behaviour of LIKE.
Alternatively, one could use a case-sensitive comparison, by replacing LIKE 'foo/%' with GLOB 'foo/*'.
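To check this on your own SQLite build, here is a minimal sketch using Python's standard sqlite3 module. The table and index names come from the example above; the helper query_plan and the sample rows are only illustrative assumptions. The exact plan wording differs between SQLite versions, so the sketch prints the plans rather than promising a particular text.

```python
import sqlite3

def query_plan(conn, sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN for sql."""
    # The detail text is the last column in both old and new SQLite versions.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE f(path TEXT)")  # TEXT affinity, as required
conn.executemany(
    "INSERT INTO f VALUES (?)",
    [("foo/",), ("foo/bar/",), ("foo/bar/baz/",), ("qux/",)],  # sample data (assumed)
)

like_sql = "SELECT * FROM f WHERE path LIKE 'foo/%'"

plan_before = query_plan(conn, like_sql)   # no index yet
conn.execute("CREATE INDEX fi ON f(path COLLATE NOCASE)")
plan_after = query_plan(conn, like_sql)    # NOCASE index available

rows = sorted(r[0] for r in conn.execute(like_sql))

print("before:", plan_before)
print("after: ", plan_after)
print("rows:  ", rows)
```

If the "after" plan still shows a scan, it is worth checking whether case_sensitive_like has been changed on the connection.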

LIKE has strict requirements to be optimizable with an index (ref).
If you can relax your requirements a little, you can use lexicographic ordering to get indexed lookups, e.g.
SELECT * FROM f WHERE PATH >= 'foo/' AND PATH < 'foo0'
where 0 is the lexicographically next character after /.
This is essentially the same optimization the optimizer would do for LIKEs if the requirements for optimization are met.
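A small sketch of how the upper bound could be computed in application code, using Python's sqlite3 module. The helper prefix_upper_bound, the index name f_path and the sample rows are assumptions for illustration; the helper only handles a non-empty prefix whose last character is plain ASCII.

```python
import sqlite3

def prefix_upper_bound(prefix):
    """Exclusive upper bound for all strings that start with prefix.

    Assumes a non-empty prefix ending in an ASCII character: the last
    character is replaced by the next one, e.g. '/' becomes '0'.
    """
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE f(path TEXT)")
conn.execute("CREATE INDEX f_path ON f(path)")  # ordinary BINARY index
conn.executemany(
    "INSERT INTO f VALUES (?)",
    [("foo",), ("foo/",), ("foo/bar/",), ("foo0",), ("fop/",)],
)

lower = "foo/"
upper = prefix_upper_bound(lower)

# Range comparison on the indexed column instead of LIKE.
subtree = [
    r[0]
    for r in conn.execute(
        "SELECT path FROM f WHERE path >= ? AND path < ? ORDER BY path",
        (lower, upper),
    )
]
print(upper, subtree)
```

Unlike LIKE, this comparison is case-sensitive under the default BINARY collation, which is the relaxation mentioned above.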

Related

Show negative real in SQLite table

I have a column C of type REAL in table F in SQLite. I want to join it to another table wherever the negative value of C exists there (along with some other fields).
However, -C, 0-C, etc. all return the rounded value of C; e.g., when C contains "123,456", -C returns "-123".
Should I cast this via a string first, or is the syntax different?
Looks like the , in 123,456 is meant to be a decimal separator but SQLite treats the whole thing as a string (i.e. '123,456' rather than 123.456). Keep in mind that SQLite's type system is a little different than SQL's as values have types but columns don't:
[...] In SQLite, the datatype of a value is associated with the value itself, not with its container. [...]
So you can quietly put a string (that looks like a real number in some locales) into a real column and nothing bad happens until later.
You could fix the import process to interpret the decimal separator as desired before the data gets into SQLite or you could use replace to fix them up as needed:
sqlite> select -'123,45';
-123
sqlite> select -replace('123,45', ',', '.');
-123.45
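If fixing the import is not an option, the stored values can also be repaired in place. The following is a sketch with Python's sqlite3 module, assuming the table F and column C from the question; the UPDATE statement is my own suggestion, not something from the original data or import process.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE F(C REAL)")

# A value with a decimal comma is not a valid number, so it is stored as text
# even though the column is declared REAL.
conn.execute("INSERT INTO F VALUES ('123,456')")
type_before = conn.execute("SELECT typeof(C) FROM F").fetchone()[0]

# Rewrite only the rows that are still text: swap the comma for a dot and
# cast the result to a real number.
conn.execute(
    "UPDATE F SET C = CAST(replace(C, ',', '.') AS REAL) "
    "WHERE typeof(C) = 'text'"
)
type_after, negated = conn.execute("SELECT typeof(C), -C FROM F").fetchone()

print(type_before, type_after, negated)
```

This assumes the comma is only ever a decimal separator; values with thousands separators would need different handling.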

Extract only required files in U-SQL

Is it possible to extract files only for 3 days, without extracting all the files.
DROP VIEW IF EXISTS dbo.Read;
CREATE VIEW IF NOT EXISTS dbo.Read AS
EXTRACT
Statements
FROM
"adl://Test/{date:yyyy}/{date:M}/{date:d}/Testfile.csv"
USING Extractors.Csv(silent:true,quoting : true, nullEscape : "/N");
@res =
SELECT * FROM dbo.Read
WHERE date BETWEEN DateTime.Parse("2015/07/01") AND DateTime.Parse("2015/07/03");
OUTPUT @res
TO "adl://test/Testing/loop.csv"
USING Outputters.Csv();
Partition elimination already ensures for your query that only files matching predicates will actually be read (you can confirm that in the job graph).
See also my previous answer for How to implement Loops in U-SQL
If you have remaining concerns about performance, the job graph can also help you nail down where they originate.
You can use the pattern identifiers in the fileset specification in parts of the path or even parts of the name (see https://msdn.microsoft.com/en-us/library/azure/mt771650.aspx). You can also list several files, so if you only have one file in each directory you can do:
EXTRACT ...
FROM "adl://Test/2015/07/1/Testfile.csv"
, "adl://Test/2015/07/2/Testfile.csv"
USING ...;
If there is more than one file in each directory you can do individual extracts for each day and then union the result. Something like:
@a = EXTRACT ....
FROM "adl://Test/2015/07/1/{*}.csv"
USING ...;
@b = EXTRACT ....
FROM "adl://Test/2015/07/2/{*}.csv"
USING ...;
@fullset = SELECT * FROM @a UNION SELECT * FROM @b;
Unfortunately, I believe there is currently no list of filesets that would let you handle the above case in one EXTRACT statement.

SQLite doesn't use some indexes

Executing the following code creates a table with two columns and adds 1 million rows. One column is INT and one is TEXT. Then it creates one index per column and one collate nocase index per column. Then it executes three queries.
The first query uses the index t2 as expected.
The second query is the same as the first one, but it adds the ESCAPE clause and doesn't use the index. The presence of unescaped % or _ should prevent the index from being (fully) used, but the presence of the ESCAPE clause itself shouldn't.
Why does the ESCAPE clause prevent the index from being used?
The third query is the same as the first one, but it doesn't use the index. The only difference is that the query uses column col_i instead of col_t, which is defined as INT instead of TEXT. SQLite doesn't prevent me from creating the index, so I would expect it to be used.
Why isn't the index i2 used?
.timer on
DROP TABLE IF EXISTS tab;
CREATE TABLE tab (col_t TEXT, col_i INT);
INSERT INTO tab (col_i, col_t) WITH RECURSIVE cte (x, y) AS (SELECT hex(randomblob(16)), hex(randomblob(16)) UNION ALL SELECT hex(randomblob(16)), hex(randomblob(16)) FROM cte LIMIT 1000000) SELECT x, y FROM cte;
CREATE INDEX t ON tab (col_t);
CREATE INDEX t2 ON tab (col_t COLLATE nocase);
CREATE INDEX i ON tab (col_i);
CREATE INDEX i2 ON tab (col_i COLLATE nocase);
SELECT * FROM tab WHERE col_t LIKE 'abcabcabc';
SELECT * FROM tab WHERE col_t LIKE 'abcabcabc' ESCAPE '\';
SELECT * FROM tab WHERE col_i LIKE 'abcabcabc';
The documentation lists the conditions under which an index can be used for LIKE:
The left-hand side … must be the name of an indexed column with TEXT affinity.
The right-hand side … must be … a string literal … that does not begin with a wildcard character.
The ESCAPE clause cannot appear on the LIKE operator.
The built-in functions used to implement LIKE … must not have been overloaded using the sqlite3_create_function() API.
[…]
… the column must indexed using built-in NOCASE collating sequence.
The query optimizer has to prove that using the index cannot change the meaning of the query. These rules implement the proof.
While there exist queries that would work with the index despite violating these rules, it would be necessary to extend the optimizer to be able to prove that they work.

Teradata: how to calculate the CAST length from the length of a column

I need to use the cast function with the length of a column in Teradata.
Say I have a table with the following data:
id | name
1|dhawal
2|bhaskar
I need to use a cast operation, something like
select cast(name as CHAR(<length of column>)) from table
How can I do that?
thanks
Dhawal
You have to find the length by looking at the table definition - either manually (show table) or by writing dynamic SQL that queries dbc.ColumnsV.
update
You can find the maximum length of the actual data using
select max(length(cast(... as varchar(<large enough value>)))) from TABLE
But if this is for FastExport, I think casting as varchar(large-enough-value) and postprocessing to remove the 2-byte length info FastExport includes is a better solution (since exporting a CHAR() will result in a fixed-length output file with lots of spaces in it).
You may know this already, but just in case: Teradata usually recommends switching to TPT instead of the legacy fexp.

SQLite: select rows where a certain column is contained in a given string

I have a table which has a column named "directory" which contains strings like:
c:\mydir1\mysubdir1\
c:\mydir2
j:\myotherdir
...
I would like to do something like
SELECT FROM mytable WHERE directory is contained within 'c:\mydir2\something\'
This query should give me as a result:
c:\mydir2
Ok, I've just found that sqlite has a function instr that seems to work for my purpose.
Not sure about the performance, though.
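For reference, a minimal sketch of the instr approach with Python's sqlite3 module, using the table and sample values from the question. Comparing the result to 1 is my own assumption about the intent: it keeps only directories that are a prefix of the given string, whereas instr(...) > 0 would match the directory anywhere inside it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable(directory TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?)",
    [("c:\\mydir1\\mysubdir1\\",), ("c:\\mydir2",), ("j:\\myotherdir",)],
)

target = "c:\\mydir2\\something\\"

# instr(X, Y) returns the 1-based position of Y inside X, or 0 if absent.
# A result of 1 means the stored directory is a prefix of the target string.
matches = [
    r[0]
    for r in conn.execute(
        "SELECT directory FROM mytable WHERE instr(?, directory) = 1",
        (target,),
    )
]
print(matches)
```

As for performance, the column is an argument of a function here, so I would expect SQLite to evaluate the condition for every row rather than use an index on directory. Note also that instr is case-sensitive and that a stored value such as c:\mydir would match as well, since nothing checks that the prefix ends at a directory boundary.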
