Data model in Cassandra for a recursive structure - recursion

The protobuf message 'sessionproto' that I receive has a field that is recursive.
itemrelationproto references itemgroupproto, and itemgroupproto references itemrelationproto.
How do I define a data model in Cassandra to store this data?
Thanks.
message itemrelationproto {
optional string id = 1;
optional itemgroupproto itemgroup = 2;
}
message itemgroupproto {
optional string id = 1;
optional string displayname = 2;
repeated itemrelationproto itemrelations = 3;
}
message sessionproto {
optional string sessionid = 1;
optional string displayname = 3;
repeated itemrelationproto itemrelations = 4;
}
create type itemrelationproto (
id text,
itemgroup frozen<itemgroupproto>
);
create type itemgroupproto (
id text,
displayname text,
itemrelations set<frozen<itemrelationproto>>
);
create table sessionproto (
sessionid text,
displayname text,
itemrelations set<frozen<itemrelationproto>>,
primary key (sessionid)
);

Data modelling in Cassandra is not about the objects you want to store but about the queries you want to perform on your data.
The following links might be helpful:
http://www.datastax.com/dev/blog/basic-rules-of-cassandra-data-modeling
https://academy.datastax.com/resources/ds220-data-modeling
This statement from the blog post above sums it up very well:
Don’t model around relations. Don’t model around objects. Model around your queries.
Therefore, without knowing which queries you want to execute, a proper data model can't be suggested.

Cassandra is not a relational database: you cannot store items that reference each other, so there is no way of doing recursion in Cassandra.
What you are trying to do is define a type recursively, which is not currently possible. The solution I suggest is to convert your proto into a byte array, JSON, or some other serialized form and store it in a text or blob field. Another solution is to create multiple tables and store each message separately, but you will then need several requests to select the whole sessionproto.
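The first suggestion (serialize the recursive part and store it in one column) can be sketched as follows. This is a minimal illustration, not a definitive implementation: the dataclasses stand in for the generated protobuf classes, and the `payload` column name and the helper functions are assumptions made for the example.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class ItemGroup:
    id: str
    displayname: str
    itemrelations: List["ItemRelation"] = field(default_factory=list)


@dataclass
class ItemRelation:
    id: str
    itemgroup: Optional[ItemGroup] = None


@dataclass
class Session:
    sessionid: str
    displayname: str
    itemrelations: List[ItemRelation] = field(default_factory=list)


def session_to_row(session: Session) -> dict:
    """Column values for a table (sessionid text PRIMARY KEY, displayname text, payload text).

    The recursive part is stored as one JSON string; `payload` is a hypothetical column name.
    """
    payload = json.dumps(asdict(session)["itemrelations"])
    return {
        "sessionid": session.sessionid,
        "displayname": session.displayname,
        "payload": payload,
    }


def relations_from_payload(payload: str) -> List[ItemRelation]:
    """Rebuild the recursive structure from the stored JSON string."""

    def to_group(d: dict) -> ItemGroup:
        return ItemGroup(d["id"], d["displayname"],
                         [to_relation(r) for r in d["itemrelations"]])

    def to_relation(d: dict) -> ItemRelation:
        group = d.get("itemgroup")
        return ItemRelation(d["id"], to_group(group) if group else None)

    return [to_relation(d) for d in json.loads(payload)]
```

With real protobuf messages the same idea applies with the message's binary form (for example `SerializeToString()` in the Python protobuf API) stored in a blob column. The trade-off is that Cassandra cannot filter on anything inside the serialized column.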

Related

DynamoDB: how to get the item count for a partition key using .NET Core?

How can I get the item count for a particular partition key using .NET Core, preferably through the Object Persistence Interface or the Document Interface?
I cannot find any docs on this. Currently I get the count by retrieving all the items and counting them, but those reads are very expensive.
What is the best practice for such an item count request? Thank you.
DynamoDB is mostly a document-oriented key-value database, so it is not optimized for common relational database functions such as item counts.
To minimize the data that is transmitted and to improve speed, you may want to do the following:
Create a Lambda function that returns the item count
This avoids transmitting data outside of AWS, which is slow and expensive.
Query options
Use only keys in your projection expression, reducing the data that is transmitted from the database.
Use the maximum page size, reducing the number of calls needed.
Stream option
Streams could also be used for keeping counts, e.g. as described in
https://medium.com/signiant-engineering/real-time-aggregation-with-dynamodb-streams-f93547cfb244
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-gsi-aggregation.html
Related SO Question
Complexity of finding total records count with partition key in nosql dynamodb table?
I just realized that with the low-level interface you can set Select = "COUNT" on the QueryRequest; calling QueryAsync() or Query() then returns only the count, as an integer, instead of the items. Please refer to the code sample below.
using System.Collections.Generic;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;

// Builds a Query that returns only the number of matching items, not the items themselves.
private static QueryRequest getStockRecordCountQueryRequest(string tickerSymbol, string prefix)
{
    // Placeholder names used in the key condition expression.
    string partitionName = ":v_PartitionKeyName";
    string sortKeyPrefix = ":v_sortKeyPrefix";
    var request = new QueryRequest
    {
        TableName = Constants.TableName,
        ReturnConsumedCapacity = ReturnConsumedCapacity.TOTAL,
        Select = "COUNT",
        KeyConditionExpression = $"{Constants.PartitionKeyName} = {partitionName} and begins_with({Constants.SortKeyName},{sortKeyPrefix})",
        ExpressionAttributeValues = new Dictionary<string, AttributeValue>
        {
            { partitionName, new AttributeValue { S = tickerSymbol } },
            { sortKeyPrefix, new AttributeValue { S = prefix } }
        },
        // Optional parameters.
        ConsistentRead = false,
        ExclusiveStartKey = null,
    };
    return request;
}
I would like to point out that this still consumes the same read units as retrieving all the items and counting them yourself. But since it returns only the count as an integer, it is a lot more efficient than transmitting the entire item list across the wire.
I think using DynamoDB Streams is a more proper way to get the counts for a large project. It is just a lot more complicated to implement.
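One detail worth adding to the Select = "COUNT" approach: a Query response is still paginated, so for a large partition the counts of all pages have to be summed by following LastEvaluatedKey. The sketch below shows that loop in Python for brevity. The `count_items` helper is hypothetical; it takes any callable shaped like boto3's `client.query`, which is an assumption of this example and not part of the original answer.

```python
from typing import Any, Callable, Dict


def count_items(query: Callable[..., Dict[str, Any]], **params: Any) -> int:
    """Sum the Count field over every page of a Select=COUNT query.

    `query` is expected to behave like boto3's DynamoDB client.query:
    it accepts keyword parameters and returns a dict that contains "Count"
    and, when more pages remain, "LastEvaluatedKey".
    """
    total = 0
    request = dict(params, Select="COUNT")
    while True:
        page = query(**request)
        total += page.get("Count", 0)
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return total
        # Continue from where the previous page stopped.
        request["ExclusiveStartKey"] = last_key


# Possible usage with boto3 (table and attribute names are placeholders):
#   client = boto3.client("dynamodb")
#   n = count_items(
#       client.query,
#       TableName="Stocks",
#       KeyConditionExpression="Ticker = :t",
#       ExpressionAttributeValues={":t": {"S": "ABC"}},
#   )
```

Each page still consumes read capacity for the items it scans, so this reduces the bytes on the wire, not the cost in read units.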

DynamoDb - .NET Object Persistence Model - LoadAsync does not apply ScanCondition

I am fairly new to this area, and any help is appreciated.
I have a table in Dynamodb database named Tenant as below:
"TenantId" is the hash primary key and I have no other keys. And I have a field named "IsDeleted" which is boolean
Table Structure
I am trying to run a query to get the record with specified "TenantId" while it is not deleted ("IsDeleted == 0")
I can get a correct result by running the following code: (returns 0 item)
var filter = new QueryFilter("TenantId", QueryOperator.Equal, "2235ed82-41ec-42b2-bd1c-d94fba2cf9cc");
filter.AddCondition("IsDeleted", QueryOperator.Equal, 0);
var dbTenant = await
_genericRepository.FromQueryAsync(new QueryOperationConfig
{
Filter = filter
}).GetRemainingAsync();
But I have no luck when I try to get it with the following code snippet: it returns the item even though it is deleted (returns 1 item).
var queryFilter = new List<ScanCondition>();
var scanCondition = new ScanCondition("IsDeleted", ScanOperator.Equal, new object[]{0});
queryFilter.Add(scanCondition);
var dbTenant2 = await
_genericRepository.LoadAsync("2235ed82-41ec-42b2-bd1c-d94fba2cf9cc", new DynamoDBOperationConfig
{
QueryFilter = queryFilter,
ConditionalOperator = ConditionalOperatorValues.And
});
Any idea why the ScanCondition has no effect?
Later I also tried this (it throws an exception):
var dbTenant2 = await
_genericRepository.QueryAsync("2235ed82-41ec-42b2-bd1c-d94fba2cf9cc", new DynamoDBOperationConfig()
{
QueryFilter = new List<ScanCondition>()
{
new ScanCondition("IsDeleted", ScanOperator.Equal, 0)
}
}).GetRemainingAsync();
It throws with: "Message": "Must have one range key or a GSI index defined for the table Tenants"
Why does it complain about Range key or Index? I'm calling
public AsyncSearch<T> QueryAsync<T>(object hashKeyValue, DynamoDBOperationConfig operationConfig = null);
You simply can't query a table whose primary key is only a hash key, because there is one and only one item for that primary key. The result of the Query would still be that single item, which is actually a Load operation, not a Query. You can only query if you have a composite primary key (here, hash key TenantId plus a range key) or a GSI (which doesn't impose key uniqueness and therefore accepts duplicate keys in the index).
The second code snippet attempts to filter the Load. DynamoDBOperationConfig's QueryFilter has this description:
// Summary:
// Query filter for the Query operation operation. Evaluates the query results and
// returns only the matching values. If you specify more than one condition, then
// by default all of the conditions must evaluate to true. To match only some conditions,
// set ConditionalOperator to Or. Note: Conditions must be against non-key properties.
So it works only with Query operations.
Edit: So after reading your comments on this...
I don't think these conditional expressions are meant for read operations. The AWS documentation indicates they are for put or update operations. However, I am not entirely sure about this, since I have never needed to do a conditional Load. In general there is no CheckIfExists functionality either: you have to read the item and see whether it exists. A conditional load would still consume read throughput, so the only advantage would be not retrieving the item, in other words saving bandwidth (which is negligible for a single item).
My suggestion is to read the item and filter it in your application layer. Don't query for it. If you really need to, you can use TenantId as the hash key and IsDeleted as the range key. If you do so, you always have to query when you want to get a tenant, and with the query you can set the range key (IsDeleted) to 0 or 1. This isn't how I would do it. As I said, I would just read the item and filter it in my application.
Another suggestion is to create a GSI on the IsDeleted field and leave the attribute out (write null) when it is 0. This way the attribute only appears in your table when it is 1. A GSI on such an attribute is called a sparse index. Later, if you need to get all the tenants that are deleted (IsDeleted = 1), you can simply scan that entire index without conditions. When the attribute is missing, DynamoDB won't put the item in the index in the first place.
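The "read it and filter it in your application layer" suggestion can be sketched like this. It is a minimal, language-neutral illustration written in Python: `load` stands for any single-item lookup by hash key (for example a wrapper around LoadAsync or GetItem), and the `load_active` name is hypothetical.

```python
from typing import Any, Callable, Dict, Optional

Item = Dict[str, Any]


def load_active(load: Callable[[str], Optional[Item]], tenant_id: str) -> Optional[Item]:
    """Load one tenant by its hash key and hide it if it is flagged as deleted.

    A missing IsDeleted attribute is treated as "not deleted", which also matches
    the sparse-index layout where the attribute is only written for deleted items.
    """
    item = load(tenant_id)
    if item is None or item.get("IsDeleted"):
        return None
    return item
```

The read cost is identical to a plain Load; the filter only decides what the caller gets to see.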

Objects with multiple key columns in realm.io

I am writing an app using the Realm.io database that will pull data from another, server database. The server database has some tables whose primary keys are composed of more than one field. Right now I can't find a way to specify a multiple column key in realm, since the primaryKey() function only returns a String optional.
This one works:
//index
override static func primaryKey() ->String?
{
return "login"
}
But what I would need looks like this:
//index
override static func primaryKey() ->[String]?
{
return ["key_column1","key_column2"]
}
I can't find anything in the docs on how to do this.
Supplying multiple properties as the primary key isn't possible in Realm. At the moment, you can only specify one.
Could you potentially use the information in those two columns to create a single unique value that you could use instead?
It's not natively supported, but there is a decent workaround. You can add another property that holds the compound key and make that property the primary key.
Check out this conversation on GitHub for more details: https://github.com/realm/realm-cocoa/issues/1192
You can do this, conceptually, by using a hash derived from two or more fields.
Let's assume that the two fields 'name' and 'lastname' together form the primary key. Here is some sample pseudocode:
StudentSchema = {
name: 'student',
primaryKey: 'pk',
properties: {
pk: 'string',
name: 'string',
lastname: 'string',
schoolno: 'int'
}
};
...
...
// Create a hash string derived from the related fields. Before hashing, combine the fields in a fixed order.
myname="Uranus";
mylastname="SUN";
myschoolno=345;
hash_pk = Hash( Concat(myname, mylastname ) ); /* Hash(myname + mylastname) */
// Create a student object
realm.create('student',{pk:hash_pk,name:myname,lastname:mylastname,schoolno: myschoolno});
If an ObjectId is necessary, see "Convert string to ObjectID in MongoDB".
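The Hash(Concat(...)) step in the pseudocode above can be made concrete as follows. This is a sketch under assumptions: SHA-256 is one possible hash (the original does not name one), and `compound_key` is a hypothetical helper. Note that plain concatenation is ambiguous, since ("ab", "c") and ("a", "bc") concatenate to the same string, so each field is length-prefixed before hashing.

```python
import hashlib


def compound_key(*parts: str) -> str:
    """Derive a single string key from several key fields, in a fixed order."""
    # Length-prefix each part so different field splits never encode identically.
    encoded = "".join(f"{len(p)}:{p}" for p in parts)
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


# The value would be stored in the single primary-key property, e.g. `pk`.
pk = compound_key("Uranus", "SUN")
```

If the combined fields are short, storing the delimited string itself as the key (without hashing) is a simpler option and keeps the key readable.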

How to query range key programmatically in DynamoDB

How can I query on the range key programmatically in DynamoDB? I am using the .NET AWS SDK, and I am able to query on the hash key with the code below:
GetItemRequest request = new GetItemRequest
{
TableName = tableName
};
request.Key = new Dictionary<string,AttributeValue>();
request.Key.Add("ID",new AttributeValue { S = PKValue });
GetItemResponse response = client.GetItem(request);
Please suggest,
Thanks in advance.
There are two kinds of primary key in DynamoDB: hash-only and hash-and-range.
From the code above, I guess your table is hash-only and you use the hash key to retrieve the element whose hash key equals PKValue.
If your table uses a hash-and-range schema and you want to retrieve a specific element by hash key and range key, you can reuse the code above and additionally add the range key to request.Key, e.g. request.Key.Add("RangeKey", new AttributeValue { S = rangeValue }), where "RangeKey" and rangeValue stand for your own range attribute name and value.
On the other hand, Query means something different in DynamoDB: it returns a list of items that share a hash key, sorted by range key.
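The difference between the two operations shows up in the shape of the request parameters. The sketch below builds both as plain dictionaries in the low-level format accepted by boto3's `get_item` and `query`; the helper names are hypothetical, and string-typed keys are assumed.

```python
from typing import Any, Dict, Optional


def get_item_request(table: str, hash_name: str, hash_value: str,
                     range_name: Optional[str] = None,
                     range_value: Optional[str] = None) -> Dict[str, Any]:
    """GetItem parameters: the key must name every primary-key attribute."""
    key = {hash_name: {"S": hash_value}}
    if range_name is not None:
        key[range_name] = {"S": range_value}
    return {"TableName": table, "Key": key}


def query_request(table: str, hash_name: str, hash_value: str) -> Dict[str, Any]:
    """Query parameters: returns every item stored under one hash key."""
    return {
        "TableName": table,
        "KeyConditionExpression": "#pk = :pk",
        "ExpressionAttributeNames": {"#pk": hash_name},
        "ExpressionAttributeValues": {":pk": {"S": hash_value}},
    }
```

GetItem addresses exactly one item, so on a hash-and-range table both attributes are required; Query needs only the hash key and can optionally narrow the range key with a further condition.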

Different RavenDB collections with documents of same type

In RavenDB I can store objects of type Products and Categories and they will automatically be located in different collections. This is fine.
But what if I have two logically completely different types of products that use the same class? Or, instead of two, an arbitrary number of different product types? Would it then be possible to tell Raven to split the product documents into collections, let's say based on a string property available on the Product class?
Thank you in advance.
EDIT:
I have created and registered the following store listener, which changes the collection of the documents to be stored at runtime. This results in the documents correctly being stored in different collections, which gives a nice, logical grouping of the documents.
public class DynamicCollectionDefinerStoreListener : IDocumentStoreListener
{
public bool BeforeStore(string key, object entityInstance, RavenJObject metadata)
{
var entity = entityInstance as EntityData;
if(entity == null)
throw new Exception("Cannot handle object of type " + entityInstance.GetType());
metadata["Raven-Entity-Name"] = RavenJToken.FromObject(entity.TypeId);
return true;
}
public void AfterStore(string key, object entityInstance, RavenJObject metadata)
{
}
}
However, it seems I have to adjust my queries too in order to get the objects back. A typical query of mine used to look like this:
session => session.Query<EntityData>().Where(e => e.TypeId == typeId)
Here 'typeId' is the name of the new Raven collection (and the name of the entity type, which is also saved as a separate field on the EntityData object).
How would I go about querying my objects back? I can't find the place where I can define my collection at runtime prior to executing my query.
Do I have to execute some raw Lucene queries? Or can I maybe implement a query listener?
EDIT:
I found a way of storing, querying and deleting objects using dynamically defined collections, but I'm not sure this is the right way to do it:
Document store listener:
(I use the class defined above)
Method resolving index names:
private string GetIndexName(string typeId)
{
return "dynamic/" + typeId;
}
Store/Query/Delete:
// Storing
session.Store(entity);
// Query
var someResults = session.Query<EntityData>(GetIndexName(entity.TypeId)).Where(e => e.EntityId == entity.EntityId);
var someMoreResults = session.Advanced.LuceneQuery<EntityData>(GetIndexName(entityTypeId)).Where("TypeId:Colors AND Range.Basic.ColorCode:Yellow");
// Deleting
var loadedEntity = session.Query<EntityData>(GetIndexName(entity.TypeId)).Where(e =>
e.EntityId == entity.EntityId).SingleOrDefault();
if (loadedEntity != null)
{
session.Delete<EntityData>(loadedEntity);
}
I have the feeling it's getting a little dirty, but is this the way to store, query, and delete when specifying the collection names at runtime? Or am I trapping myself this way?
Stephan,
You can provide the logic for deciding on the collection name using:
store.Conventions.FindTypeTagName
This is handled statically, using the generic type.
If you want to make that decision at runtime, you can provide it using a DocumentStoreListener.

Resources