Can I boost Lucene query parser results based on expected data? - asp.net

I have a Lucene index where one of the indexed fields contains a string that identifies the type of content.
For simplicity, say this field is called _type and will only ever contain typeone or typetwo.
I am using Lucene query parser syntax to query this index. Say my query is:
(+fieldone:term^3.0 +classname:term^2.0)
Is it possible to extend this to boost any results that have typeone in their _type field, whilst still returning typetwo records (albeit with a lower relevancy score)?
UPDATE
I've found a syntax which works but it uses the wildcard 'all documents' syntax which I suspect is not efficient. Advice appreciated.
(+fieldone:term^3.0 +classname:term^2.0) +(*:* _type:typeone^1.1)

Using just Lucene syntax, you can add the _type clause as an optional (SHOULD) clause, i.e. without a + prefix, so that it only raises the score of matching documents:
+fieldone:term^3.0 +classname:term^2.0 (_type:typeone)^2
You don't need wildcards.
Another solution, if you are querying through Solr, is the eDisMax query parser, where the bq or bf parameter can boost a particular value of a field. You can use one of the following solutions:
Solution 1: you can boost your term in the following way:
defType=edismax&bq=_type:"typeone"^3
or
Solution 2: you can use a query function in the following way:
defType=edismax&bf=if(termfreq(_type,"typeone"),3,if(termfreq(_type,"typetwo"),2,1))
where results with _type=typeone get a boost of 3, those with typetwo get a boost of 2, and all others get 1. You can modify that function according to your needs.
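As a rough sketch of how the two approaches above could be assembled in client code, the following Python builds the Lucene query string with an optional boosted clause and a set of URL-encoded eDisMax parameters. The function names are hypothetical, and passing the bare word term as q is only an assumption for illustration; nothing here talks to a real index.

```python
from urllib.parse import urlencode


def lucene_query_with_boost(required_clauses, boost_field, boost_value, boost):
    """Return a Lucene query string: required clauses plus one optional boosted clause."""
    required = " ".join(f"+{clause}" for clause in required_clauses)
    # No '+' prefix: the clause is optional (SHOULD), so it only affects scoring.
    optional = f"{boost_field}:{boost_value}^{boost}"
    return f"{required} {optional}"


def edismax_params(user_query, boost_field, boost_value, boost):
    """Return URL-encoded Solr parameters that use eDisMax with a boost query (bq)."""
    return urlencode({
        "defType": "edismax",
        "q": user_query,  # assumption: the user's search text goes here
        "bq": f'{boost_field}:"{boost_value}"^{boost}',
    })


query = lucene_query_with_boost(
    ["fieldone:term^3.0", "classname:term^2.0"], "_type", "typeone", 2
)
params = edismax_params("term", "_type", "typeone", 3)
print(query)
print(params)
```

Documents with typetwo still match because the _type clause is never required; they simply miss the extra score.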

Related

How to query a DynamoDB index with Dynamoose?

I have a DynamoDB table with animals and I'm interacting with it using Dynamoose. My table has a 'UserId' attribute, that indicates the user that registered that animal. I want to write a query that finds all the animals registered by the same user, i.e., gets all the items that have the attribute 'UserId' matching the input string.
I'm trying to use Dynamoose's queries like this MyModel.query('UserId').eq(user.id).exec();, but it always gives this error: Index can't be found for query. I imagine this happens because it is not finding the index for the attribute 'UserId', but I do have an index 'UserId-index' on my table.
I also tried specifying the index that should be used on the query with the using() method, like this MyModel.query('UserId').eq(user.id).using('UserId-index').exec();, but it gave me this other error: Either the KeyConditions or KeyConditionExpression parameter must be specified in the request, which I don't get at all.
Note that I don't wanna use scan(), as the official documentation highly encourages the developers to use query() instead.

BigQuery error: Cannot query the cross product of repeated fields

I am running the following query on Google BigQuery web interface, for data provided by Google Analytics:
SELECT *
FROM [dataset.table]
WHERE
  hits.page.pagePath CONTAINS "my-fun-path"
I would like to save the results into a new table, however I am obtaining the following error message when using Flatten Results = False:
Error: Cannot query the cross product of repeated fields
customDimensions.value and hits.page.pagePath.
This answer implies that this should be possible: Is there a way to select nested records into a table?
Is there a workaround for the issue found?
Depending on what kind of filtering is acceptable to you, you may be able to work around this by switching to OMIT IF from WHERE. It will give different results, but, again, perhaps such different results are acceptable.
The following keeps a hit record only if (some) page inside of it meets the criteria, and omits every other hit. Note two things here:
It uses OMIT hits IF instead of the more commonly used OMIT RECORD IF.
The condition is inverted, because OMIT IF is the opposite of WHERE.
The query is:
SELECT *
FROM [dataset.table]
OMIT hits IF EVERY(NOT hits.page.pagePath CONTAINS "my-fun-path")
Update: see the related thread, I am afraid this is no longer possible.
It would be possible to use NEST function and grouping by a field, but that's a long shot.
Using flatten call on the query:
SELECT *
FROM flatten([google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910],customDimensions)
WHERE
  hits.page.pagePath CONTAINS "m"
Thus in the web ui:
setting a destination table
allowing large results
and NO flatten results
does the job correctly and the produced table matches the original schema.
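I know this is an old question, but it can now be achieved by using the standard SQL dialect instead of legacy SQL:
#standardSQL
-- hits is the repeated record, so it is what gets unnested;
-- CONTAINS is a legacy SQL operator, so LIKE is used instead.
SELECT t.*
FROM `dataset.table` t, UNNEST(t.hits) AS hit
WHERE
  hit.page.pagePath LIKE '%my-fun-path%'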
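The intent behind the queries in this thread, keeping whole session rows (nested hits included) when at least one hit's page path contains the substring, can be illustrated on plain Python data. This is only a sketch of the filtering logic, not BigQuery code; the visitId field and the sample rows are made up for the example.

```python
def sessions_with_matching_hit(sessions, needle):
    """Keep whole session records in which at least one hit's page path contains needle."""
    return [
        session
        for session in sessions
        if any(needle in hit["page"]["pagePath"] for hit in session["hits"])
    ]


# Hypothetical rows shaped like nested Google Analytics sessions.
sessions = [
    {"visitId": 1, "hits": [{"page": {"pagePath": "/home"}},
                            {"page": {"pagePath": "/my-fun-path/a"}}]},
    {"visitId": 2, "hits": [{"page": {"pagePath": "/about"}}]},
]

kept = sessions_with_matching_hit(sessions, "my-fun-path")
print([session["visitId"] for session in kept])
```

The matching session keeps all of its hits, which is the "original schema" outcome the question asks for.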

Sqlite Query Optimization with OR and AND

The following query takes 5 seconds to execute:
SELECT DISTINCT(Product.Name) FROM Product WHERE (0=1 OR Product.Number="prod11");
While the following takes ONLY 15 milliseconds:
SELECT DISTINCT(Product.Name) FROM Product WHERE (Product.Number="prod11");
Interestingly the following also takes only 15 milliseconds:
SELECT DISTINCT(Product.Name) FROM Product WHERE (1=1 AND Product.Number="prod11");
The query plan shows that the first query uses a full table scan (for some unknown reason), while the second and third queries use an index (as expected).
For some reason it looks like Sqlite optimizes the "1=1 AND ..." but it doesn't optimize "0=1 OR ...".
What can I do to make Sqlite use the index for the first query as well?
The queries are built by NHibernate so it's kind of hard to change them...
Sqlite version is the latest for Windows.
SQLite's query optimizer is rather simple and does not support OR expressions very well.
For some reason, it can optimize this query if it can use a covering index, so try this:
CREATE INDEX TakeThatNHibernate ON Product(Number, Name)
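One way to check whether the covering index is actually picked up is EXPLAIN QUERY PLAN. The sketch below uses Python's sqlite3 module with an in-memory table; the extra columns and the sample rows are assumptions made for the example, and the plan text it prints varies between SQLite versions, so no particular output is claimed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Product (Id INTEGER PRIMARY KEY, Number TEXT, Name TEXT, Description TEXT)"
)
# Hypothetical sample data, just enough for the planner to have something to plan.
conn.executemany(
    "INSERT INTO Product (Number, Name, Description) VALUES (?, ?, ?)",
    [(f"prod{i}", f"Name {i}", "filler text") for i in range(1000)],
)
# The covering index suggested above: it holds every column the query reads.
conn.execute("CREATE INDEX TakeThatNHibernate ON Product(Number, Name)")

QUERIES = {
    "or": "SELECT DISTINCT Product.Name FROM Product WHERE (0=1 OR Product.Number='prod11')",
    "plain": "SELECT DISTINCT Product.Name FROM Product WHERE (Product.Number='prod11')",
}


def query_plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN for the given statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]  # the detail text is the last column


plans = {name: query_plan(sql) for name, sql in QUERIES.items()}
results = {name: conn.execute(sql).fetchall() for name, sql in QUERIES.items()}

for name, plan in plans.items():
    print(name, plan)
```

Comparing the printed plans with and without the CREATE INDEX line shows whether your SQLite version scans the table or uses the index for the OR form.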
1=1 and 1=0 are SQL expressions used in some parts of the NHibernate framework to denote empty statements that won't alter the logic of the SQL query. A Conjunction with no subcriteria generates a 1=1 expression, a Disjunction with no subcriteria generates a 1=0 expression, and an In() generates a 1=0 expression if no values are provided.
To avoid triggering this optimizer behavior, you could change the code that creates those empty expressions so that it only uses criteria that have at least one subcriterion.

Riak search queries via the java client

I am trying to perform queries using the OR operator as following:
MapReduceResult result = riakClient.
mapReduce("some_bucket", "Name:c1 OR c2").
addMapPhase(new NamedJSFunction("Riak.mapValuesJson"), true).
execute();
I only get the 1st object in the query (where name='c1').
If I change the order of the query (i.e. Name:c2 OR c1) again I get only the first object in query (where name='c2').
Is the OR operator (and other query operators) supported in the Java client?
I got this answer from a Basho engineer, Sean C.:
You either need to group the terms or qualify both of them. Without a field identifier, the search query assumes that the default field is being searched. You can determine how the query will be interpreted by using the 'search-cmd explain' command. Here are two alternate ways to express your query:
Name:c1 OR Name:c2
Name:(c1 OR c2)
Both options worked for me!
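If the query string is assembled in code, a small helper can make sure every term carries the field name. The following is a minimal Python sketch with hypothetical function names; it only builds the string in the two forms shown above and does not escape special characters or call Riak.

```python
def qualify_or_terms(field, terms):
    """Build 'field:a OR field:b' so that no term falls back to the default field."""
    if not terms:
        raise ValueError("at least one term is required")
    return " OR ".join(f"{field}:{term}" for term in terms)


def group_or_terms(field, terms):
    """Build 'field:(a OR b)', the grouped form of the same query."""
    if not terms:
        raise ValueError("at least one term is required")
    return f"{field}:({' OR '.join(terms)})"


print(qualify_or_terms("Name", ["c1", "c2"]))
print(group_or_terms("Name", ["c1", "c2"]))
```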

Verifying sqlite FTS (Full Text Search) syntax, and how to do a single term NOT search

Is there a way to determine if a MATCH query sent to an fts3 table in sqlite is valid? Currently, I don't find out if the expression is invalid until I try to run it, which makes it a little tricky. I'd like to report to the user that the syntax is invalid before even trying to run it.
I'm using sqlite with the C api.
Additionally, doing a search with the expression "NOT " will fail with a "SQLlite logic/database error". Googling seems to indicate that such a query is invalid. Is there a correct syntax to do the operation? I'm essentially trying to find entries that do NOT contain that term. Or do I have to drop back down to using LIKE and do a sequential scan and not use FTS?
It looks like the simplest way right now is to create a separate FTS table with no rows and execute the query against it. It will return immediately since there is no data in the table, and it will report an error if the query syntax is incorrect.
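The empty-table approach can be sketched as follows. The question uses the C API; this example uses Python's sqlite3 module instead, purely for brevity, and assumes the underlying SQLite build has FTS3 compiled in. The table name syntax_probe and the helper names are hypothetical. Which expressions count as invalid depends on whether the build uses the standard or the enhanced FTS3 query syntax, so no result is claimed for the NOT example.

```python
import sqlite3


def make_syntax_checker():
    """Create an empty FTS3 table and return a function that validates MATCH expressions."""
    conn = sqlite3.connect(":memory:")
    # No rows are ever inserted, so running a query against this table is cheap.
    conn.execute("CREATE VIRTUAL TABLE syntax_probe USING fts3(content)")

    def is_valid_match(expression):
        try:
            conn.execute(
                "SELECT rowid FROM syntax_probe WHERE content MATCH ?",
                (expression,),
            ).fetchall()
        except sqlite3.Error:
            # The parser rejected the expression (or the query failed some other way).
            return False
        return True

    return is_valid_match


is_valid_match = make_syntax_checker()
print(is_valid_match("hello world"))
print(is_valid_match("NOT hello"))
```

The same idea carries over to the C API: prepare and step the statement against the empty table and inspect the return code before running the real query.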
