Indexing decimal values in zope catalog - plone

I'm using Plone 4.2.5 and Dexterity.
I have added a decimal field to a custom content type and would like to index it. Preferably I'd like to be able to query the catalog using 'greater than' or 'less than or equal to' logic against the index. Is this possible? What type of index should I use?

Use a FieldIndex for your field. You can do range searches on field indexes using the range keyword:
catalog(indexname=dict(query=10.5, range='max'))
would return matches for which the indexname value is 10.5 or less. For a strict less-than, subtract sys.float_info.epsilon from the query value:
import sys

catalog(indexname=dict(query=10.5, range='max'))  # less than or equal to
catalog(indexname=dict(query=10.5 - sys.float_info.epsilon, range='max'))  # less than
One caveat: sys.float_info.epsilon is the float spacing at 1.0, so subtracting it only actually changes query values whose magnitude is near 1.0; for larger values, subtract the float spacing at that magnitude (e.g. with numpy.nextafter) instead.
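As a sketch of the whole setup, assuming a Dexterity field named price and a portal object in scope (both names are illustrative, and in a real project you would normally register the index via GenericSetup rather than by hand):

from Products.CMFCore.utils import getToolByName

# Register a FieldIndex for the field and build it; 'portal' is assumed.
catalog = getToolByName(portal, 'portal_catalog')
if 'price' not in catalog.indexes():
    catalog.addIndex('price', 'FieldIndex')
    catalog.manage_reindexIndex(ids=['price'])

# Range queries against the new index:
at_most = catalog(price=dict(query=10.5, range='max'))             # price <= 10.5
between = catalog(price=dict(query=[5.0, 10.5], range='min:max'))  # 5.0 <= price <= 10.5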

Related

Is it a good idea to use a binary attribute for GSI indexing in DynamoDB?

I have one attribute in my DynamoDB table which will take the binary values success and failure.
Can I do GSI indexing on this attribute if I have to fetch/query either the success or the failure records from this table?
Or should I make two different tables for the success and failure scenarios?
If I should not index a binary attribute:
What are the problems with GSI indexing of a binary attribute?
How will it impact the performance of query operations?
It sounds like you perhaps mean boolean (true/false) rather than binary. You cannot create a GSI on a boolean attribute in DynamoDB, but you can on a string, number or binary attribute (binary here means raw bytes, which is different from boolean), so you can represent your logical boolean as 1 / 0 or “accept” / “fail”.
You might also consider making this a sparse index if you only ever query one side of it (sketched below). If you only want to query when the value is true (or “accept” or 1 or anything, really), then when it is not true, delete the attribute rather than setting it to “failure” or 0. This makes queries far more performant because the index is smaller, but the limitation is that you can no longer query the “failure” / false / 0 cases.
To answer your questions:
1) You can't create an index on a boolean; use a string or number (or binary, but you probably want string or number).
2) If you only need to query one side of the boolean (e.g. “accept” but never “failure”), you can improve performance by creating a sparse index.
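A minimal boto3 sketch of the sparse-index pattern; the table name jobs, key job_id, and GSI name status-index are illustrative assumptions, and the GSI is assumed to already exist on the string attribute status:

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("jobs")

# "accept" items carry the status attribute, so they appear in the GSI.
table.put_item(Item={"job_id": "j-1", "status": "accept"})

# Failed items simply omit "status": they never enter the sparse GSI,
# which keeps the index small and queries against it cheap.
table.put_item(Item={"job_id": "j-2"})

# Query only the "accept" side through the sparse GSI.
resp = table.query(
    IndexName="status-index",
    KeyConditionExpression=Key("status").eq("accept"),
)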

Creating DynamoDB Table with List type as a primary key

I have a use case where I want to create a DynamoDB table which contains only 2 attributes - a list of strings (for example, countries) and a boolean value.
I am extracting this value for each country and implementing different logic for the true and false cases.
My question is: what is the best way (best practice) to create such a DynamoDB table?
I thought of a few of the following ways:
1) Boolean value as a key: use the boolean value as the key and the list as another attribute.
2) Add a row for each country: create a separate record with the country value as the key and the flag as an attribute.
3) Use the list of countries as the key and the boolean value as another attribute. (I don't think this can be a good choice.)
What could be the best practice while designing tables like this?
From AWS DynamoDB Docs, NamingRulesDataTypes:
When you create a table or a secondary index, you must specify the names and data types of each primary key attribute (partition key and sort key). Furthermore, each primary key attribute must be defined as type string, number, or binary.
There are many options to model your table, but keep in mind you have to respect the rules cited above.
Your second option is a good one:
Add a row for each country. Create a separate record with the country value as key and the flag as an attribute.
Partition key: country (string)
A non-key attribute, which you do not have to define at creation: flag (boolean)
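A minimal boto3 sketch of that design, assuming a table named countries (the name is illustrative); note that only the key attribute has to be declared when the table is created, while the boolean flag is simply written on each item:

import boto3

dynamodb = boto3.resource("dynamodb")

# Only key attributes are declared up front; "flag" is not mentioned here.
table = dynamodb.create_table(
    TableName="countries",
    KeySchema=[{"AttributeName": "country", "KeyType": "HASH"}],
    AttributeDefinitions=[{"AttributeName": "country", "AttributeType": "S"}],
    BillingMode="PAY_PER_REQUEST",
)
table.wait_until_exists()

# One item per country, with the boolean as a plain non-key attribute.
table.put_item(Item={"country": "France", "flag": True})
table.put_item(Item={"country": "Brazil", "flag": False})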

DynamoDB index with JSON attribute

I am referring to a thread about creating an index with JSON.
I have a column called data in my DynamoDB table. It holds JSON, with a structure like this:

{
  "config": "aasdfds",
  "state": "PROCESSED",
  "value": "asfdasasdf"
}

The AWS documentation says that I can create an index on a top-level JSON attribute, but I don't know how to do this exactly. When I create the index, should I specify the partition key as data.state and then, in my code, query the column data.state with the value set to PROCESSED? Or should I create the partition key as data and then, in my code, look for the column data with the value set to state = "PROCESSED"?
Top-level attribute means that DynamoDB supports creating an index on scalar attributes only (String, Number, or Binary). The JSON attribute is stored as a Document data type, and an index can't be created on a Document data type, so neither data nor data.state can be used as an index key.
The key schema for the index. Every attribute in the index key schema must be a top-level attribute of type String, Number, or Binary. Other data types, including documents and sets, are not allowed.

Scalar Types – A scalar type can represent exactly one value. The scalar types are number, string, binary, Boolean, and null.

Document Types – A document type can represent a complex structure with nested attributes—such as you would find in a JSON document. The document types are list and map.

Set Types – A set type can represent multiple scalar values. The set types are string set, number set, and binary set.
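Given that restriction, a common workaround is to duplicate the nested value into its own top-level scalar attribute and index that instead. A minimal boto3 sketch, where the table name items and the duplicated state attribute are illustrative assumptions:

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("items")

# Keep the whole document in "data", but copy the nested state out to a
# top-level scalar attribute, which can then serve as a GSI key.
table.put_item(
    Item={
        "id": "item-1",
        "data": {"config": "aasdfds", "state": "PROCESSED", "value": "asfdasasdf"},
        "state": "PROCESSED",  # illustrative duplicated attribute
    }
)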

SQLite index usage with different type affinity

I have a simple table "tags" containing a key and a value column. The key is always a string; the value can be either a string, an int64 or a double.
I do not have any real data to test with at this point, but I'm curious about index usage on the value column. I've defined the column as TEXT type - is SQLite still able to use an index on the value column when an int64 or double is bound to the statement?
Here is the test table:
CREATE TABLE "tags" ("key" TEXT,"value" TEXT DEFAULT (null) );
INSERT INTO "tags" VALUES('test','test');
INSERT INTO "tags" VALUES('testint','1');
INSERT INTO "tags" VALUES('testdouble','2.0');
I see additional "Integer" and "Affinity" entries when analyzing the query via:
explain SELECT value FROM tags WHERE key = 'testint' AND value >= 1
Otherwise I do not see any difference in index usage (e.g. idxgt is always used). But I'd rather have a definitive answer than rely on a wrong assumption drawn from such small test data.
The documentation says:
A column with TEXT affinity stores all data using storage classes NULL, TEXT or BLOB. If numerical data is inserted into a column with TEXT affinity it is converted into text form before being stored.
The sort order is well-defined for all types (NULLs sort before numbers, numbers before text, and text before BLOBs), so the index can always be used.
Forcing the affinity to be TEXT makes comparisons on this column with numbers behave as if the values were text, but that is probably not what you want.
In any case, indexes do not change the behaviour; they work correctly with all types, and apply affinities in exactly the same way as on non-indexed columns.
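To see the affinity behaviour concretely, here is a small sqlite3 sketch (the extra '10' row is an illustrative addition to make the text ordering visible); it only demonstrates the rules quoted above:

import sqlite3

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE tags ("key" TEXT, "value" TEXT)')
con.execute('CREATE INDEX tags_value ON tags("value")')
con.executemany(
    "INSERT INTO tags VALUES (?, ?)",
    [("test", "test"), ("testint", "1"), ("testdouble", "2.0"), ("big", "10")],
)

# The bound integer gets TEXT affinity applied, so this is a text
# comparison: '1' and '10' sort below '2', while '2.0' and 'test' match.
print(con.execute('SELECT "value" FROM tags WHERE "value" >= ?', (2,)).fetchall())
# e.g. [('2.0',), ('test',)] (row order may vary)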

Under what circumstances would likelihood() be useful?

Reading through the sqlite documentation I found the following function:
http://www.sqlite.org/lang_corefunc.html#likelihood
The likelihood(X,Y) function returns argument X unchanged. The value Y in likelihood(X,Y) must be a floating point constant between 0.0 and 1.0, inclusive. The likelihood(X) function is a no-op that the code generator optimizes away so that it consumes no CPU cycles during run-time (that is, during calls to sqlite3_step()). The purpose of the likelihood(X,Y) function is to provide a hint to the query planner that the argument X is a boolean that is true with a probability of approximately Y. The unlikely(X) function is short-hand for likelihood(X,0.0625).
Assuming that I know x will be true 75% of the time, how would
select likelihood(x, .75)
help the query optimizer?
The original example was this:
Consider the following schema and query:
CREATE TABLE composer(
  cid INTEGER PRIMARY KEY,
  cname TEXT
);
CREATE TABLE album(
  aid INTEGER PRIMARY KEY,
  aname TEXT
);
CREATE TABLE track(
  tid INTEGER PRIMARY KEY,
  cid INTEGER REFERENCES composer,
  aid INTEGER REFERENCES album,
  title TEXT
);
CREATE INDEX track_i1 ON track(cid);
CREATE INDEX track_i2 ON track(aid);

SELECT DISTINCT aname
  FROM album, composer, track
 WHERE cname LIKE '%bach%'
   AND composer.cid=track.cid
   AND album.aid=track.aid;
The schema is for a (simplified) music catalog application, though similar kinds of schemas come up in other situations. There is a large number of albums. Each album contains one or more tracks. Each track has a composer. Each composer might be associated with multiple tracks.
The query asks for the name of every album that contains a track with a composer whose name matches '%bach%'.
The query planner needs to choose among several alternative algorithms for this query. The best choice hinges on how well the expression "cname LIKE '%bach%'" filters the results. Let's give this expression a "filter value": a number between 1.0 and 0.0. A value of 1.0 means that cname LIKE '%bach%' is true for every row in the composer table. A value of 0.0 means the expression is never true.
The current query planner (in version 3.8.0) assumes a filter value of 1.0. In other words, it assumes that the expression is always true. The planner is assuming the worst case so that it will pick a plan that minimizes worst case run-time. That's a safe approach, but it is not optimal. The plan chosen for a filter of 1.0 is track-album-composer. That means that the "track" table is in the outer loop. For each row of track, an indexed lookup occurs on album. And then an indexed lookup occurs on composer, then the LIKE expression is run to see if the album name should be output.
A better plan would be track-composer-album. This second plan avoids the album lookup if the LIKE expression is false. The current planner would choose this second algorithm if the filter value was just slightly less than 1.0. Say 0.99. In other words, if the planner thought that the LIKE expression would be false for 1 out of every 100 rows, then it would choose the second plan. That is the correct (fastest) choice for when the filter value is large.
But in the common case of a music library, the filter value is probably much closer to 0.0 than it is to 1.0. In other words, the string "bach" is unlikely to be found in most composer names. And for values near 0.0, the best plan is composer-track-album. The composer-track-album plan is to scan the composer table once looking for entries that match '%bach%' and for each matching entry use indices to look up the track and then the album. The current 3.8.0 query planner chooses this third plan when the filter value is less than about 0.1.
The likelihood function gives the database a (hopefully) better estimate of the selectivity of a filter.
With the example query, it would look like this:
SELECT DISTINCT aname
FROM album, composer, track
WHERE likelihood(cname LIKE '%bach%', 0.05)
AND composer.cid=track.cid
AND album.aid=track.aid;
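As a quick way to watch the hint change (or not change) the chosen plan, here is a small sqlite3 sketch using EXPLAIN QUERY PLAN; the exact output depends on your SQLite version and table statistics, so treat it as illustrative only:

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE composer(cid INTEGER PRIMARY KEY, cname TEXT);
CREATE TABLE album(aid INTEGER PRIMARY KEY, aname TEXT);
CREATE TABLE track(
  tid INTEGER PRIMARY KEY,
  cid INTEGER REFERENCES composer,
  aid INTEGER REFERENCES album,
  title TEXT
);
CREATE INDEX track_i1 ON track(cid);
CREATE INDEX track_i2 ON track(aid);
""")

# Compare the plan with and without the likelihood() hint.
for where in ("cname LIKE '%bach%'",
              "likelihood(cname LIKE '%bach%', 0.05)"):
    plan = con.execute(
        "EXPLAIN QUERY PLAN SELECT DISTINCT aname "
        "FROM album, composer, track "
        "WHERE " + where +
        " AND composer.cid=track.cid AND album.aid=track.aid"
    ).fetchall()
    print(where)
    for row in plan:
        print("   ", row)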
