Does composite index also create normal indexes? - google-cloud-datastore

I have a requirement where I need to filter by propA and filter and sort by propB, but never filter by just propA or propB alone. I declared propA and propB as not indexed and created a compound index on both. But that didn't work.
As per App Engine DataStore - Compound Indexes - datastore-indexes - not working, a composite index also requires the component properties to be indexed. Does that mean that, internally, there will be 5 indexes: one for the compound index and 2 each (asc/desc) for the two props? I am trying to understand the storage requirements of a compound index.

Yes, the individual properties propA and propB have to be indexed as well.
But no, you don't have to explicitly create (asc and desc) indexes for them; just let the datastore automatically create the built-in indexes (one per property, not 2) by simply not declaring the properties "not indexed". From Indexes:
Built-in indexes
By default, a Datastore mode database automatically predefines an index for each property of each entity kind. These single property indexes are suitable for simple types of queries.
So there will be 3 indexes in your case, 2 built-in and 1 composite.
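For reference, a composite index covering "filter on propA, filter and sort on propB" would be declared roughly like this in index.yaml (or the equivalent datastore-indexes.xml; the kind name MyKind is an assumption):

indexes:
- kind: MyKind
  properties:
  - name: propA
  - name: propB

The two built-in single-property indexes then come for free, as long as you don't declare propA and propB as unindexed.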

Related

Are built-in index based ancestor queries efficient?

The indexes doc at https://cloud.google.com/datastore/docs/concepts/indexes says that built-in single property indexes can support
Queries using only ancestor and equality filters
Queries using only inequality filters (which are limited to a single property)
Since the built-in index for the property is sorted by the property value, I understand how it supports a single inequality filter. However, how is it able to support an equality filter combined with an ancestor query? Say I have a million rows for the same property value, but the given ancestor condition only matches 100 rows within those million rows. Would it have to scan all the million rows to find the 100 matching rows? I don't think that's the case, as somewhere I read that Cloud Datastore scales with the number of rows in the result set and not the number of rows in the database. So, unless the single property index is internally a multi-column index with the first column as the property and the second column as the entity key, I don't see how these ancestor + equality queries can be efficiently supported with built-in single property indexes.
Cloud Datastore built-in indexes are always split into a prefix and a postfix at query time. The prefix portion is the part that remains the same (eg equalities or ancestors), the postfix portion is the part that changes (sort order).
Built-in indexes are laid out as:
Kind, PropertyName, PropertyValue, Key
For example, a query: FROM MyKind WHERE A > 1
Would divide the prefix/postfix as:
MyKind,A | range<1, inf>
In the case you're asking about (ancestor with equality), FROM MyKind WHERE __key__ HAS ANCESTOR Key('MyAncestor', 1) AND A = 1, the first part of the prefix is easy:
MyKind,A,1
To understand the ancestor piece, we have to consider that Datastore keys are a hierarchy. In the case of MyKind, the keys might look like: (MyAncestor, 1, MyKind, 345).
This means we can make the prefix for an ancestor + equality query as:
MyKind,A,1,(MyAncestor, 1)
The postfix would then just be all the keys that have (MyAncestor,1) as a prefix and A=1.
This is why you can have an equality with an ancestor using the built-in indexes, but not an inequality with an ancestor.
If you're interested, the video Google I/O 2010 - Next gen queries dives into this in depth.
According to this documentation: "The rows of an index table are sorted first by ancestor and then by property values, in the order specified in the index definition."
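As a concrete sketch of such a query with the Node.js client (the kind and property names are taken from the example above, everything else is assumed):

import { Datastore } from "@google-cloud/datastore";

const datastore = new Datastore();

async function ancestorEqualityQuery() {
  // Ancestor + equality: answerable from the built-in index on A, because
  // the prefix MyKind,A,1,(MyAncestor,1) pins down one contiguous index range
  const ancestorKey = datastore.key(["MyAncestor", 1]);
  const query = datastore
    .createQuery("MyKind")
    .hasAncestor(ancestorKey)
    .filter("A", "=", 1);
  const [entities] = await datastore.runQuery(query);
  return entities;
}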

Firestore: Does a composite index perform better on queries with multiple where clauses?

I have the following code:
foldersRef.where("ParentID", "==", "1").where("Deleted", "==", false)
1. Will a composite index (that indexes ParentID and Deleted) be preferable to a single-field index for this query?
2. If I also have individual single-field indexes for ParentID and Deleted, will Firestore know to use the composite index?
3. When I create the composite index, does the order of fields matter?
4. Does the order of my .where() clauses / function calls matter?
Will a composite index (that indexes ParentID and Deleted) be preferable to a single-field index for this query?
If you are using the following line of code:
foldersRef.where("ParentID", "==", "1").where("Deleted", "==", false)
Without any ordering (ASCENDING or DESCENDING), there is no need to create a composite index: Firestore can serve a query with only equality filters by merging the single-field indexes it creates automatically.
If I also have individual single-field indexes for ParentID and Deleted, will Firestore know to use the composite index?
No. Single-field indexes are also created automatically by Firestore, so there is no need to create an index for a single field. Besides that, having separate single-field indexes with ordering doesn't mean that you also have a composite index for those fields; you need to create that yourself.
When I create the composite index, does the order of fields matter?
Yes, if you change the order of your where() calls, you also need to create the corresponding index accordingly.
Does the order of my .where() clauses / function calls matter?
In terms of speed, as long as you create the correct index, no.
1. Yes, for reads, a composite index will perform better than a merge of single-field indexes. It allows Cloud Firestore to use one index instead of merging multiple single-field indexes.
2. Yes, Cloud Firestore will give preference to the composite index.
3. (Also 4.) Yes, the order of the fields matters in both your index definition and your query. foldersRef.where("ParentID", "==", "1").where("Deleted", "==", false) and foldersRef.where("Deleted", "==", false).where("ParentID", "==", "1") would require two different composite indexes.
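For reference, the composite index for the ParentID-then-Deleted field order might be declared like this in firestore.indexes.json (the collection name folders is an assumption):

{
  "indexes": [
    {
      "collectionGroup": "folders",
      "queryScope": "COLLECTION",
      "fields": [
        { "fieldPath": "ParentID", "order": "ASCENDING" },
        { "fieldPath": "Deleted", "order": "ASCENDING" }
      ]
    }
  ]
}

Swapping the two entries under fields gives the index for the reversed field order.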

AWS DynamoDB Query based on non-primary keys

I'm new to AWS DynamoDB and wanted to clarify something. Is it possible to query a table and filter based on a non-primary-key attribute? My table looks like the following:
Store
Id: PrimaryKey
Name: simple string
Location: simple string
Now I want to query on the Name, but from what I know I would have to supply the key as well? Apart from that, I could use a Scan, but then I would be loading all the data.
From the docs:
The Query operation finds items based on primary key values. You can query any table or secondary index that has a composite primary key (a partition key and a sort key).
DynamoDB requires queries to always use the partition key.
In your case your options are:
create a Global Secondary Index that uses Name as a primary key (see the sketch after this list)
use a Scan + Filter if the table is relatively small, or if you expect the result set will include the majority of the records in the table
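A sketch of the first option, assuming a GSI named NameIndex whose partition key is Name (AWS SDK for JavaScript v2):

import { DynamoDB } from "aws-sdk";

const docClient = new DynamoDB.DocumentClient();

async function findStoresByName(name: string) {
  // Query the assumed "NameIndex" GSI instead of the base table
  const result = await docClient
    .query({
      TableName: "Store",
      IndexName: "NameIndex",
      // "Name" is a DynamoDB reserved word, so alias it
      KeyConditionExpression: "#n = :name",
      ExpressionAttributeNames: { "#n": "Name" },
      ExpressionAttributeValues: { ":name": name },
    })
    .promise();
  return result.Items ?? [];
}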
There are a few design principles you can follow when using DynamoDB. If you are coming from a relational background, you have already witnessed the limitation that queries must be based on primary key attributes.
Design your tables for querying, and separate hot and cold data.
Create indexes for querying on non-key attributes (you have two options: a Global Secondary Index, which you can define at any time, and a Local Secondary Index, which you must specify at table creation time).
With a Global Secondary Index you can promote any non-key attribute to be the partition key of the index and select another attribute as its sort key. With a Local Secondary Index, you can promote any non-key attribute to be the sort key while keeping the table's partition key.
Using indexes for queries is also important for using your provisioned throughput efficiently.
Although an index consumes read throughput of its own, it can also save read throughput: if you project just the attributes you need to read, it can give a huge benefit in reading. Check the following example.
Let's say you have a DynamoDB table whose items are 40KB each. Reading 10 items directly from the table consumes 100 read throughput units (10 units per item, since one unit can read 4KB, multiplied by 10 items). If you have an index that projects only the attributes needed for the listing, at 4KB per item, the same read consumes only 10 read throughput units (one unit per item), which makes a huge difference in terms of cost.
With DynamoDB it is really important how you define indexes, to optimize not only query capability but also throughput usage.
You cannot query on a non-primary-key attribute in DynamoDB.
If you still want to do that, you can use a Scan, but Scan is a costly operation in DynamoDB: it reads every item in the table, AWS charges you for every item scanned, and on a large table it will hurt performance, so it is not recommended.
There are two ways to achieve it:
Keep store Id as the partition key of the DynamoDB table and add Name or Location as the sort key (only one, as DynamoDB accepts only one attribute as the sort key by design).
Create Global Secondary Indexes for querying on the non-key attributes you need most frequently.
There are three projection options when creating a GSI in DynamoDB. In your case, choose the INCLUDE option and add Name, Location, and the store Id to the index (see the sketch after this list):
KEYS_ONLY – Each item in the index consists only of the table partition key and sort key values, plus the index key values. The KEYS_ONLY option results in the smallest possible secondary index.
INCLUDE – In addition to the attributes described in KEYS_ONLY, the secondary index will include other non-key attributes that you specify.
ALL – The secondary index includes all of the attributes from the source table. Because all of the table data is duplicated in the index, an ALL projection results in the largest possible secondary index.
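A sketch of adding such an INCLUDE-projection GSI to the existing table (the index name NameIndex is an assumption; table keys and index keys are always projected, so Id and Name are available automatically):

import { DynamoDB } from "aws-sdk";

const dynamodb = new DynamoDB();

async function addNameIndex() {
  await dynamodb
    .updateTable({
      TableName: "Store",
      AttributeDefinitions: [{ AttributeName: "Name", AttributeType: "S" }],
      GlobalSecondaryIndexUpdates: [
        {
          Create: {
            IndexName: "NameIndex",
            KeySchema: [{ AttributeName: "Name", KeyType: "HASH" }],
            // INCLUDE projects table keys, index keys, plus the listed attributes
            Projection: {
              ProjectionType: "INCLUDE",
              NonKeyAttributes: ["Location"],
            },
            ProvisionedThroughput: { ReadCapacityUnits: 5, WriteCapacityUnits: 5 },
          },
        },
      ],
    })
    .promise();
}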

DynamoDB - Global Secondary Index on set items

I have a dynamo table with the following attributes :
id (Number - primary key )
title (String)
created_at (Number - long)
tags (StringSet - contains a set of tags say android, ios, etc.,)
I want to be able to query by tags - get me all the items tagged android. How can I do that in DynamoDB? It appears that a global secondary index can be built only on scalar data types (Number and String) and not on items inside a set.
If the approach I am taking is wrong, an alternative way of doing it, either by creating different tables or changing the attributes, is also fine.
DynamoDB is not designed to optimize indexing on set values. Below is a copy of the relevant Amazon documentation (from Improving Data Access with Secondary Indexes in DynamoDB).
The key schema for the index. Every attribute in the index key schema must be a top-level attribute of type String, Number, or Binary. Nested attributes and multi-valued sets are not allowed. Other requirements for the key schema depend on the type of index: For a global secondary index, the hash attribute can be any scalar table attribute. A range attribute is optional, and it too can be any scalar table attribute. For a local secondary index, the hash attribute must be the same as the table's hash attribute, and the range attribute must be a non-key table attribute.
Amazon recommends creating a separate one-to-many table for this kind of problem. More info here: Use one to many tables
This is a really old post, sorry to revive it, but I'd take a look at "Single Table Design"
Basically, stop thinking about your data as structured data - embrace denormalization
id (Number - primary key )
title (String)
created_at (Number - long)
tags (StringSet - contains a set of tags say android, ios, etc.,)
Instead of a NoSQL table with a "header" like this:
id|title|created_at|tags
think of it like this:
pk|sk     |data....
id|id     |{title, created_at}
id|id+tag |{id, tag} <- create one record per tag
You can still return everything by querying for pk=id & sk begins_with id and joining the tags to the id records in your app logic,
and you can use a GSI to project id|id+tag into tag|id. That will still require you to write two queries against your data to get items of a given tag (get the ids, then get the items), but you won't have to duplicate your data, you won't have to scan, and you'll still be able to get your items in one query when your access pattern doesn't rely on tags.
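Concretely, a sketch of that layout (the table name Items and the GSI TagIndex, keyed on a tag attribute, are assumptions; the per-tag records form a sparse index, since the metadata record carries no tag attribute):

import { DynamoDB } from "aws-sdk";

const docClient = new DynamoDB.DocumentClient();

async function putItemWithTags() {
  // One metadata record plus one record per tag, all under the same pk
  await docClient
    .batchWrite({
      RequestItems: {
        Items: [
          { PutRequest: { Item: { pk: "42", sk: "42", title: "My item", created_at: 1700000000 } } },
          { PutRequest: { Item: { pk: "42", sk: "42#android", tag: "android" } } },
          { PutRequest: { Item: { pk: "42", sk: "42#ios", tag: "ios" } } },
        ],
      },
    })
    .promise();
}

async function idsForTag(tag: string) {
  // First of the two queries: collect the ids carrying this tag via the GSI
  const result = await docClient
    .query({
      TableName: "Items",
      IndexName: "TagIndex",
      KeyConditionExpression: "#t = :t",
      ExpressionAttributeNames: { "#t": "tag" },
      ExpressionAttributeValues: { ":t": tag },
    })
    .promise();
  return (result.Items ?? []).map((item) => item.pk);
}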
FWIW I'd start by thinking about all of your access patterns, and from there think about how you can structure composite keys and/or GSIs
cheers
You will need to create a separate table for this query.
If you are interested in fetching all items based on a tag then I suggest keeping a table with a primary key:
hash: tag
range: id
This way you can use a very simple Query to fetch all items by tag.

Does dynamodb support something like an "in" clause in its queries?

Say I have table of photos and users.
Given I have a list of users I'm following [user1,user2,...] and I want to get a list of photos of people I'm following.
How can I query the table of photos where photo.createdBy in [user1,user2,user3...]
I saw that dynamodb has a batch operation, but that takes a primary key, and in this case we would be querying against a secondary index (createdBy).
Is there a way to do a query like this in dynamodb?
If you are querying purely on photo.createdBy, then you should create a global secondary index:
To speed up queries on non-key attributes, you can create a global secondary index. A global secondary index contains a selection of attributes from the table, but they are organized by a primary key that is different from that of the table. The index key does not need to have any of the key attributes from the table; it doesn't even need to have the same key schema as a table.
A query against that index will, of course, only match one createdBy value at a time, so covering a list of users means issuing one query per user. To limit the results further when more items are returned, use a FilterExpression:
With a Query or a Scan operation, you can provide an optional filter expression to refine the results returned to you. A filter expression lets you apply conditions to the data after it is queried or scanned, but before it is returned to you. Only the items that meet your conditions are returned.
This can be applied to a Query or a Scan, but be careful about consuming too many read capacity units when scanning for matching entries.
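Putting that together, a sketch that emulates an in clause by fanning out one Query per user against an assumed CreatedByIndex GSI and merging the results client-side (key conditions have no in operator; FilterExpressions do, but they only filter items after they are read):

import { DynamoDB } from "aws-sdk";

const docClient = new DynamoDB.DocumentClient();

async function photosByUsers(userIds: string[]) {
  // One Query per followed user, run in parallel and merged afterwards
  const results = await Promise.all(
    userIds.map((userId) =>
      docClient
        .query({
          TableName: "Photos",
          IndexName: "CreatedByIndex",
          KeyConditionExpression: "createdBy = :u",
          ExpressionAttributeValues: { ":u": userId },
        })
        .promise()
    )
  );
  return results.flatMap((r) => r.Items ?? []);
}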
