How to escape string for SQLite FTS query - sqlite

I'm trying to perform a SQLite FTS query with untrusted user input. I do not want to give the user access to the query syntax, that is they will not be able to perform a match query like foo OR bar AND cats. If they tried to query with that string I would want to interpret it as something more like foo \OR bar \AND cats.
There doesn't seem to be anything built in to SQLite for this, so I'll probably end up building my own escaping function, but this seems dangerous and error-prone. Is there a preferred way to do this?

The FTS MATCH syntax is its own little language. For FTS5, verbatim string literals are well defined:
Within an FTS expression a string may be specified in one of two ways:
By enclosing it in double quotes ("). Within a string, any embedded double quote characters may be escaped SQL-style - by adding a second double-quote character.
(redacted special case)
It turns out that correctly escaping a string for an FTS query is simple enough to implement completely and reliably: Replace " with "" and enclose the result in " on both ends.
In my case it then works perfectly when I put it into a prepared statement such as SELECT stuff FROM fts_table WHERE fts_table MATCH ?. I would then .bind(fts_escape(user_input)) where fts_escape is the function I described above.
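For illustration, here is a minimal Python sketch of the escaping rule described above, using the standard sqlite3 module. The names fts_escape and fts_table come from the answer; the search helper and the rowid column choice are assumptions made for the example.

```python
import sqlite3


def fts_escape(user_input: str) -> str:
    """Wrap arbitrary text as a single FTS5 string literal.

    Embedded double quotes are doubled, then the whole value is
    enclosed in double quotes, as the FTS5 documentation describes.
    """
    return '"' + user_input.replace('"', '""') + '"'


def search(conn: sqlite3.Connection, user_input: str):
    """Hypothetical helper: run the escaped text through a bound parameter."""
    sql = "SELECT rowid FROM fts_table WHERE fts_table MATCH ?"
    return conn.execute(sql, (fts_escape(user_input),)).fetchall()
```

Note that a quoted string is matched as a phrase, so foo OR bar AND cats becomes a search for those five words in that order rather than five independent terms.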

OK, I've investigated further, and with some heavy magic you can access the actual tokenizer used by SQLite's FTS. The "simple" tokenizer takes your string, splits it on any character that is not in [A-Za-z0-9], and lowercases the remaining tokens. If you perform this same operation you will get a nicely "escaped" string suitable for FTS.
You can write your own, but you can access SQLite's internal one as well. See this question for details on that: Automatic OR queries using SQLite FTS4
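If you write your own, a rough Python sketch of the operation described above could look like this. It follows the description in this answer only (split on anything outside [A-Za-z0-9], then lowercase); the real tokenizer may treat non-ASCII characters differently, and the function names are made up for the example.

```python
import re


def simple_tokens(text: str) -> list[str]:
    """Split on every character outside [A-Za-z0-9] and lowercase the pieces."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


def to_plain_query(text: str) -> str:
    """Join the tokens with spaces so no quotes or punctuation reach MATCH."""
    return " ".join(simple_tokens(text))
```

Because the operator keywords are only recognized in upper case, the lowercased or and and that come out of this are searched for as ordinary words.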

Related

Unary NOT in SQLite FTS5 MATCH query

The SQLite FTS5 docs say that search queries such as SELECT ... WHERE MATCH '<query1> NOT <query2>' are supported, but it looks like there's no support for the unary NOT operator.
For example, if I want to search for everything that doesn't match <query>, I cannot use MATCH 'NOT <query>'. I would have to use NOT MATCH '<query>', which is a completely different thing (the FTS5 module never gets to see the NOT operator, as it is outside the quotation marks). Only the text inside the quotation marks is the search query.
I need to find a way to use an unary NOT operator inside the search query. I can't use it outside, because I only get to control the search query text, and not the rest of the SQL statement.
A possible approach I've thought of would be to find a search query that matches anything, and do MATCH '<match_anything> NOT <query>'. However, I've found no way to match everything in a search query.
Can you think of a way to have the behaviour of the unary NOT operator inside the search query?
Try this:
SELECT * FROM docs
WHERE ROWID NOT IN (
SELECT ROWID FROM docs WHERE content MATCH '<query>'
)
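A small Python sketch of the same idea, assuming an FTS5 table named docs with a single content column and a SQLite build that has FTS5 enabled. The sample rows and helper names are invented for the demonstration.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory FTS5 table with three sample rows (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE docs USING fts5(content)")
    conn.executemany(
        "INSERT INTO docs(content) VALUES (?)",
        [("red apple",), ("green pear",), ("red cherry",)],
    )
    return conn


def rows_not_matching(conn: sqlite3.Connection, query: str):
    """Return the rows whose rowid is NOT among the MATCH results."""
    sql = """
        SELECT rowid, content FROM docs
        WHERE rowid NOT IN (
            SELECT rowid FROM docs WHERE docs MATCH ?
        )
        ORDER BY rowid
    """
    return conn.execute(sql, (query,)).fetchall()
```

This still puts the negation in the SQL statement rather than in the search text, so it only helps if you can change the statement itself.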

Getting a table from a schema from sql with a backslash

So I have to get a table which is in a schema in a database. The schema name contains a backslash, e.g david\b.
I have my connection con so I use dbplyr
tabel <- dplyr::tbl(con, in_schema("david\\b", "some_tabel"))
But this does not work.
Every database I know would only possibly allow a backslash in a quoted identifier. So I think you need to include the double quotes as well as the (escaped) backslash:
in_schema('"david\\b"', "some_tabel")
If you click on the links in my comment, they all pretty much say identifiers (like table and schema names) can only include letters, numbers, _ and (sometimes) $ and #. Unless the identifier is quoted.

TRIM BOTH Teradata not working for single quotes

SELECT TRIM(BOTH 'a' FROM 'aaaaaaaaaaaaasdfasa');
works fine and returns sdfas
but I am trying to remove quotes, so I did
SELECT TRIM(BOTH ''' FROM (''2565','5216','5364'') ;
I get error - Query ends within a string or comment block. Please suggest how to do this
This answer assumes that the .NET application successfully inserts data and the issue is at the time of SELECT or UPDATE i.e. the issue is not in a prepared SQL statement used within the .NET application. This also assumes that data sanity checks for using such logic in WHERE .. IN are already in place.
Assume your data values are:
2565,5216,5364
'2565','5216','5364'
Your application converts these into the following:
'2565,5216,5364'
''2565','5216','5364''
For ease of query design for this answer, we can convert these to string values for SQL by escaping each single quote in the value with an additional single quote, and then putting the entire thing inside a pair of single quotes to make it a string; which gives us:
'''2565,5216,5364'''
'''''2565'',''5216'',''5364'''''
If you want to remove all single quotes in these, you can use
SELECT OREPLACE( '''2565,5216,5364''', '''', '');
SELECT OREPLACE( '''''2565'',''5216'',''5364''''', '''', '');
Which means replace all single quotes with empty strings and gives us:
2565,5216,5364
2565,5216,5364
This may be the way to go in case you are comparing with integer values.
Now if you want your data to be preserved and remove only the enclosing quotes added by the .NET application (e.g. if the comparison is also with character data), then you can combine this with further logic. Let us use the second data value, since it is the more interesting case for such an operation:
SELECT TRIM(BOTH '''' FROM '''''2565'',''5216'',''5364''''');
The query above gives the following result, which removes the enclosing quotes added by the application but also removes the first and the last quotes entered by the user:
2565','5216','5364
A better option, assuming that your application always encloses the data in quotes, is
SELECT SUBSTR('''''2565'',''5216'',''5364''''',2,CHARACTER_LENGTH('''''2565'',''5216'',''5364''''')-2);
This takes a substring starting at the second character with a length of CHARACTER_LENGTH minus 2, and thus drops exactly the two quotes inserted by the application:
'2565','5216','5364'
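To make the difference between the three approaches easier to see, here is a rough Python analogue (not Teradata code). The function names are invented for the illustration; strip_all_quotes mirrors OREPLACE, trim_both mirrors TRIM(BOTH ...), and drop_enclosing mirrors the SUBSTR call.

```python
def sql_literal(value: str) -> str:
    """Write a value as a SQL string literal: double each quote, then wrap."""
    return "'" + value.replace("'", "''") + "'"


def strip_all_quotes(value: str) -> str:
    """Like OREPLACE(value, '''', ''): remove every single quote."""
    return value.replace("'", "")


def trim_both(value: str) -> str:
    """Like TRIM(BOTH '''' FROM value): strip all leading and trailing quotes."""
    return value.strip("'")


def drop_enclosing(value: str) -> str:
    """Like SUBSTR(value, 2, length - 2): drop exactly one character per end."""
    return value[1:-1]


# The second data value after the application has wrapped it in quotes.
stored = "''2565','5216','5364''"
```

Only drop_enclosing keeps the quotes the user typed, provided the application really did add one quote at each end.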

Are there issues with Oracle usernames starting with numbers? - username in quotes

Using Oracle 11gR2
You can't create a username starting with a number:
SQL> create user 123 identified by temp;
create user 123 identified by temp
*
ERROR at line 1:
ORA-01935: missing user or role name
However, you can create it as:
SQL> create user "123" identified by temp;
User created.
Does anybody know of possible problems with this kind of user?
Does anybody know the Oracle rules or reasons why you can't create it without quotes, i.e. why usernames can't start with numbers?
Thanks in advance
Problems with quoted identifiers
Quoted identifiers can be successfully used for almost any Oracle object, including users. In theory, they work everywhere. In practice, you will run into many inconveniences and problems with quoted identifiers.
From the SQL Language Reference:
"Note: Oracle does not recommend using quoted identifiers for database object names. These quoted identifiers are accepted by SQL*Plus, but they may not be valid when using other tools that manage database objects."
Once you use double quotes, every reference to that object must use double quotes, and the correct case. You'll find lots of problems with tools that don't always use double quotes. And problems with scripts that look at metadata and don't always add double quotes. Quoted identifiers are just asking for trouble.
Why does Oracle have quoted identifiers?
This question is harder to answer, but I would guess limiting the types of characters used by objects makes parsing much easier. SQL already has a lot of keywords, and has many weird language ambiguities. If object names started with numbers it would make it difficult to differentiate between real numbers and objects.
For example, without quoted identifiers, this simple statement could be a mess:
select 1.1 + 2.2 from some_table;
Without restricting object names, 1.1 could be a huge number of things, and the parser would have to look for objects named "1", and then dependent objects named "1", and then determine if that takes precedence over the number "1.1".
Weird names are possible in languages, but I assume when someone wrote the first SQL compiler 40 years ago they decided not to make their lives so complicated just to accommodate a few weird names.
Check that the user name is not a reserved word and doesn't start with a number:
SELECT *
FROM v$reserved_words
ORDER BY keyword
If you are creating a user, try this:
alter session set "_ORACLE_SCRIPT"=true;
CREATE USER oe IDENTIFIED BY oe;
Check whether you are connected to a CDB (container database). If you are, put the prefix c## before the username in the CREATE USER command.

NHibernate, SQLite, and Cyrillic characters: case sensitivity and fallback queries

I'm querying an SQLite database using NHibernate. Generally, I want to do case insensitive string queries. Recently, I've discovered that although I can insert a row with Cyrillic characters, I can not select it using a case insensitive query. This is what the query looks like:
string foo = "foo";
IList<T> list = session.CreateCriteria(typeof(T)).
Add(Expression.Eq("Foo", foo).IgnoreCase()).List<T>();
I can, however, select the row using the above query if IgnoreCase() is removed. A naive fix would be to check if list.Count == 0 after the first query, and make a subsequent case sensitive query. The major downside of this approach is that querying for non-existent rows is a reasonably common operation that would now consist of two queries.
The question is, how can I construct a single query that will select from the Foo column that is case insensitive yet will also select rows that contain Cyrillic characters?
Case insensitive queries by default only work with ASCII characters in SQLite.
See this FAQ: Case-insensitive matching of Unicode characters does not work.
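One way around this is to supply the comparison yourself. The sketch below uses Python's sqlite3 module rather than NHibernate, so treat it as an illustration of the idea only; the collation name UNICODE_NOCASE, the items table and the helper names are all assumptions.

```python
import sqlite3


def unicode_nocase(a: str, b: str) -> int:
    """Compare two strings after Unicode case folding; return -1, 0 or 1."""
    a, b = a.casefold(), b.casefold()
    return (a > b) - (a < b)


def open_demo_db() -> sqlite3.Connection:
    """In-memory database with the custom collation and one Cyrillic row."""
    conn = sqlite3.connect(":memory:")
    conn.create_collation("UNICODE_NOCASE", unicode_nocase)
    conn.execute("CREATE TABLE items (foo TEXT)")
    conn.execute("INSERT INTO items (foo) VALUES (?)", ("Привет",))
    return conn


def find_foo(conn: sqlite3.Connection, value: str):
    """Single case-insensitive lookup that also works for non-ASCII text."""
    sql = "SELECT foo FROM items WHERE foo = ? COLLATE UNICODE_NOCASE"
    return conn.execute(sql, (value,)).fetchall()
```

The collation has to be registered on every connection that runs the query, and a plain index on the column will not be used for this comparison unless the index is created with the same collation.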
