MarkLogic 8: case-insensitive search in JSON elements (XQuery)

Existing records:
{"myLabel":"AFRICANA"}
{"myLabel":"africans"}
{"myLabel":"AFRICAN"}
{"myLabel":"Africa"}
Query : `cts:json-property-word-match("myLabel", "Africa*")`
Result:
{"myLabel":"Africa"}
The query returns only the exact-case match, not all relevant rows.
Query : `cts:json-property-word-match("myLabel", "Africa*", "case-insensitive")`
Result:
your query returned an empty sequence
With the "case-insensitive" option, the query returns an empty sequence.
I have configured a word lexicon on myLabel.
How do I search JSON data case-insensitively?

Both of the examples provided return the expected results for me. Did you intend to show the second query as a search with cts:json-property-value-query()?
If that is the case, then applying the "wildcarded" and "case-insensitive" options ensures that values are matched case-insensitively and as a wildcarded query:
cts:search(doc(),
cts:json-property-value-query("myLabel", "Africa*", ("wildcarded","case-insensitive")))
Double-check whether you have "trailing wildcard searches", or any of the three, two, or one character searches, enabled for your content database. Under the rules for wildcard searches, specific database indexes must be enabled for a query term containing wildcard patterns to be run automatically as a wildcarded query:
If neither "wildcarded" nor "unwildcarded" is present, the database configuration and $text determine wildcarding. If the database has any wildcard indexes enabled ("three character searches", "two character searches", "one character searches", or "trailing wildcard searches") and if $text contains either of the wildcard characters '?' or '*', it specifies "wildcarded". Otherwise it specifies "unwildcarded".
With any of those index settings enabled, a query term containing * should automatically execute as a wildcarded query, so you can remove the explicit "wildcarded" option:
cts:search(doc(),
cts:json-property-value-query("myLabel", "Africa*", "case-insensitive"))
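The wildcarding rule quoted above can be modelled in a few lines. This is only a plain-Python sketch of the documented decision, not MarkLogic code; the function name `resolve_wildcarding` and its parameters are hypothetical and chosen for illustration.

```python
# Index names taken from the documentation excerpt quoted above.
WILDCARD_INDEXES = {
    "three character searches",
    "two character searches",
    "one character searches",
    "trailing wildcard searches",
}


def resolve_wildcarding(options, enabled_indexes, text):
    """Return "wildcarded" or "unwildcarded" following the quoted rule.

    Hypothetical helper: it only mirrors the documented decision logic.
    """
    # An explicit option always wins.
    if "wildcarded" in options:
        return "wildcarded"
    if "unwildcarded" in options:
        return "unwildcarded"
    # Otherwise both conditions must hold: a wildcard index is enabled
    # and the query text contains '?' or '*'.
    has_wildcard_index = bool(WILDCARD_INDEXES & set(enabled_indexes))
    has_wildcard_char = any(ch in text for ch in "?*")
    if has_wildcard_index and has_wildcard_char:
        return "wildcarded"
    return "unwildcarded"


no_index = resolve_wildcarding(["case-insensitive"], [], "Africa*")
with_index = resolve_wildcarding(
    ["case-insensitive"], ["trailing wildcard searches"], "Africa*"
)
explicit = resolve_wildcarding(["wildcarded", "case-insensitive"], [], "Africa*")
```

Under this model, a database with no wildcard index treats "Africa*" as unwildcarded unless the option is passed explicitly, which is why checking the index settings matters.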

Related

Unary NOT in SQLite FTS5 MATCH query

The SQLite FTS5 docs say that search queries such as SELECT ... WHERE MATCH '<query1> NOT <query2>' are supported, but it looks like there's no support for the unary NOT operator.
For example, if I want to search for everything that doesn't match <query>, I cannot use MATCH 'NOT <query>'. I would have to use NOT MATCH '<query>', which is a completely different thing (the FTS5 module never gets to see the NOT operator, as it is outside the quotation marks). Only the text inside the quotation marks is the search query.
I need to find a way to use a unary NOT operator inside the search query. I can't use it outside, because I only get to control the search query text, and not the rest of the SQL statement.
A possible approach I've thought of would be to find a search query that matches anything, and do MATCH '<match_anything> NOT <query>'. However, I've found no way to match everything in a search query.
Can you think of a way to have the behaviour of the unary NOT operator inside the search query?
Try this:
SELECT * FROM docs
WHERE ROWID NOT IN (
SELECT ROWID FROM docs WHERE content MATCH '<query>'
)
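A minimal runnable sketch of the subquery approach, using Python's built-in sqlite3 module and assuming the underlying SQLite library was built with FTS5. The table contents and the helper name `rows_not_matching` are made up for illustration, and the table name is used on the left of MATCH.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumes FTS5 is compiled into the SQLite library that Python links against.
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(content)")
conn.executemany(
    "INSERT INTO docs(content) VALUES (?)",
    [("apple pie",), ("banana bread",), ("apple and banana",)],
)


def rows_not_matching(conn, fts_query):
    """Return (rowid, content) for every row the FTS query does NOT match."""
    sql = """
        SELECT rowid, content FROM docs
        WHERE rowid NOT IN (
            SELECT rowid FROM docs WHERE docs MATCH ?
        )
        ORDER BY rowid
    """
    return conn.execute(sql, (fts_query,)).fetchall()


non_apple = rows_not_matching(conn, "apple")
print(non_apple)
```

Note that the negation still lives in the surrounding SQL rather than in the query text, so this only helps when the statement itself can be changed.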

How to create virtual table FTS with external sqlite content table?

I want to create an SQLite FTS virtual table backed by the content of a regular table.
I have a small sample that demonstrates my problem. I have already read the official tutorial, but can't find anything wrong in this code. Some users use a rebuild option, but it doesn't work for me.
CREATE TABLE IF NOT EXISTS posts (a INTEGER PRIMARY KEY);
INSERT OR IGNORE INTO posts (a) VALUES(510000);
INSERT OR IGNORE INTO posts (a) VALUES(510001);
INSERT OR IGNORE INTO posts (a) VALUES(510300);
CREATE VIRTUAL TABLE IF NOT EXISTS posts_fts using fts5(content=posts, content_rowid=a, a);
SELECT * FROM posts_fts where posts_fts MATCH '10' ORDER BY a ASC;
If I run this, I get:
0 rows returned in 2ms from: SELECT * FROM posts_fts where posts_fts match '10' ORDER BY a ASC;
Does anyone have an idea what I am doing wrong?
"10" is not a token in the FTS table.
From the doc:
4.3.1. Unicode61 Tokenizer
The unicode tokenizer classifies all unicode characters as either
"separator" or "token" characters. By default all space and
punctuation characters, as defined by Unicode 6.1, are considered
separators, and all other characters as token characters. More
specifically, all unicode characters assigned to a general category
beginning with "L" or "N" (letters and numbers, specifically) or to
category "Co" ("other, private use") are considered tokens. All other
characters are separators.
Each contiguous run of one or more token characters is considered to
be a token. The tokenizer is case-insensitive according to the rules
defined by Unicode 6.1.
Also from the doc:
3.2. FTS5 Phrases
FTS queries are made up of phrases. A phrase is an ordered list of one
or more tokens.
You might try a "prefix query", i.e. MATCH '5*', to see that you get results.
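The difference between a substring, a whole token, and a prefix query can be checked with a small script. This is a sketch using Python's sqlite3 module and assumes the SQLite build includes FTS5. The explicit 'rebuild' step is included because an external-content index is not populated automatically; the helper name `match_rowids` is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumes FTS5 is compiled into the SQLite library that Python links against.
conn.executescript("""
    CREATE TABLE posts (a INTEGER PRIMARY KEY);
    INSERT INTO posts (a) VALUES (510000), (510001), (510300);
    CREATE VIRTUAL TABLE posts_fts USING fts5(a, content=posts, content_rowid=a);
    -- Populate the index from the external content table.
    INSERT INTO posts_fts(posts_fts) VALUES ('rebuild');
""")


def match_rowids(fts_query):
    """Return the rowids matched by an FTS5 query string, in ascending order."""
    sql = "SELECT rowid FROM posts_fts WHERE posts_fts MATCH ? ORDER BY rowid"
    return [row[0] for row in conn.execute(sql, (fts_query,))]


substring_hits = match_rowids("10")    # "10" is only part of a token
exact_hits = match_rowids("510000")    # a whole token
prefix_hits = match_rowids("5*")       # prefix query

print(substring_hits, exact_hits, prefix_hits)
```

Each number is indexed as a single token such as 510000, so only whole-token and prefix queries can find it.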

R sqlexecute wildcard

Using RODBCext (with Teradata), my SQL queries often need to be restricted, which I do with a WHERE clause. The restriction is not always required, though, and sometimes I would like to return everything while still using a single SQL query. (The actual query is more complex and has several instances of what I'm attempting here.)
To return all rows, a wildcard seems like the next best option, but nothing appears to work. For example, the SQL query is:
SELECT *
FROM MY_DB.MY_TABLE
WHERE PROC_TYPE = ?
The following does work when passing in a string for proc_type:
sqlExecute(connHandle, getSQL(SQL_script_path), proc_type, fetch = TRUE)
To bypass this filter, I would like to pass a wildcard so that all records are returned.
I've tried setting proc_type to '%' and '*', escaping both with backslashes, and enclosing them in double quotes, but no rows are ever returned, and no errors are produced.
You could use COALESCE to do this:
SELECT *
FROM MY_DB.MY_TABLE
WHERE PROC_TYPE = COALESCE(?, PROC_TYPE);
If the parameter is NULL, COALESCE falls back to PROC_TYPE, so the column is compared with itself and every row with a non-NULL PROC_TYPE is returned.
As for your wildcard attempt, you would have to switch to an operator that understands wildcards, such as LIKE instead of =. I think you would end up with some odd edge cases, though, depending on your search term and the data in that column, so the COALESCE() option is the better way to go.
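The COALESCE pattern can be tried without a Teradata connection. The sketch below uses Python's sqlite3 module as a stand-in, so the table, its sample rows, and the helper name `fetch_ids` are assumptions made for illustration; the behaviour on Teradata through RODBCext is not verified here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, proc_type TEXT)")
conn.executemany(
    "INSERT INTO my_table (id, proc_type) VALUES (?, ?)",
    [(1, "A"), (2, "B"), (3, None)],  # row 3 has a NULL proc_type
)


def fetch_ids(proc_type=None):
    """Filter on proc_type, or skip the filter when proc_type is None (NULL)."""
    sql = """
        SELECT id FROM my_table
        WHERE proc_type = COALESCE(?, proc_type)
        ORDER BY id
    """
    return [row[0] for row in conn.execute(sql, (proc_type,))]


only_a = fetch_ids("A")       # filter applied
unfiltered = fetch_ids(None)  # NULL parameter: column compared with itself

print(only_a, unfiltered)
```

Row 3 is absent from the unfiltered result because NULL = NULL does not evaluate to true, which matters if the filtered column is nullable.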

How to perform SQLite LIKE queries with wildcards not read as wildcards

I'm running into a problem in SQLite when querying on text fields that happen to have the _ or % wildcard characters.
I have a table with a 'Name' field that I want to query on. Two of my records have the values 'test' and 'te_t' in that field. If I run a query like the one below
"SELECT ALL * from Table WHERE Name LIKE 'te_t'"
This will return both the 'te_t' and 'test' records, because of '_' being read as a wildcard. How do I make it so that I only get the 'te_t' record from the above query?
I've done some research on this and read that I should be able to throw a backslash '\' character in front of the wildcard to get it to be read as a normal _ character instead of a wildcard. But when I try the query
"SELECT ALL * from Table WHERE Name LIKE 'te\_t'"
my query returns zero matches.
What am I doing wrong? Is this just not possible in SQLite?
In SQL, you can escape special characters in the LIKE pattern if you declare some escape character with ESCAPE:
SELECT * FROM MyTable WHERE Name LIKE 'te\_t' ESCAPE '\'
(see the documentation)
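A short check of the ESCAPE clause using Python's sqlite3 module, as a sketch. The table and values mirror the question; the helper name `names_like` is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Name TEXT)")
conn.executemany("INSERT INTO MyTable (Name) VALUES (?)", [("test",), ("te_t",)])


def names_like(pattern, escape=None):
    """Run a LIKE query, optionally declaring an escape character."""
    if escape is None:
        sql = "SELECT Name FROM MyTable WHERE Name LIKE ? ORDER BY Name"
        params = (pattern,)
    else:
        sql = "SELECT Name FROM MyTable WHERE Name LIKE ? ESCAPE ? ORDER BY Name"
        params = (pattern, escape)
    return [row[0] for row in conn.execute(sql, params)]


wildcard_hits = names_like("te_t")          # '_' matches any single character
literal_hits = names_like(r"te\_t", "\\")   # '\_' matches a literal underscore

print(wildcard_hits, literal_hits)
```

Without the ESCAPE clause the backslash has no special meaning in the pattern, which is why the query in the question returned nothing.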

Why does SQLite full-text search (FTS4) treat angle brackets differently in a compound search?

I have an SQLite database using FTS4. It is used to store emails with message IDs of the form <8200#comms.io>.
Searching for messages using the FTS MATCH syntax, I get a result from:
SELECT rowid FROM emails WHERE emails MATCH '<8200#comms.io>'
This returns the correct row. But when I try to find multiple emails, I get an empty response:
SELECT rowid FROM emails WHERE emails MATCH '<8200#comms.io> OR <8188#comms.io>'
Strangely though, I can search without the angle bracket characters. This returns both rows:
SELECT rowid FROM emails WHERE emails MATCH '8200#comms.io OR 8188#comms.io'
This is despite the angle brackets being present in the stored columns. I can find no mention of these being special characters in SQLite, and without the 'OR', the single-term search works fine.
Why are these characters treated differently in my compound search?
The default (simple) tokenizer reads alphanumerical characters and treats all others as word separators to be ignored.
So when searching for a message ID, you have to actually search for a phrase with multiple words (8200, comms, and io).
If you want to treat the entire message ID as a word, you have to write a custom tokenizer.
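To see which words a message ID breaks into, the tokenizer's behaviour can be approximated in plain Python. This is a rough model rather than SQLite's own code: it assumes the simple tokenizer keeps runs of ASCII alphanumerics and characters with code points of 128 or above, and lowercases ASCII letters. The function name `simple_tokenize` is hypothetical.

```python
import re
import string

# Approximation: token characters are ASCII alphanumerics plus any
# character with a code point >= 128; everything else is a separator.
_TOKEN = re.compile(r"[0-9A-Za-z\u0080-\U0010FFFF]+")
# Only ASCII upper-case letters are folded to lower case.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def simple_tokenize(text):
    """Split text into tokens roughly the way the 'simple' tokenizer would."""
    return [m.group(0).translate(_ASCII_LOWER) for m in _TOKEN.finditer(text)]


with_brackets = simple_tokenize("<8200#comms.io>")
without_brackets = simple_tokenize("8200#comms.io")

print(with_brackets, without_brackets)
```

Under this model both spellings of the message ID produce the same three words, so the angle brackets, '#', and '.' never reach the index.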
