Querying DynamoDB - amazon-dynamodb

I have a table of cars with the attributes CarName, ModelYear, Color, Transmission, and Used-Mileage.
How can I query cars by just ModelYear, just Color, or just Transmission? In other words, how can I query for specific data available in the table?

#hassan, are you able to share what you already have? There are a lot of strategies you could use; you could create multiple item collections or indexes. I've made an app which models a wind farm and may help you in deciding: https://github.com/dariusjs/windpark
It also depends on what access patterns you will have in the app: are you sensitive to low latency, and how much data will there be? You could get away with scans and filtering, but that will not scale.
You could try something like this, which would target your requirement to get cars by date:
PK: MAKE#DODGE SK: YEAR#1969#MODEL#CHARGER TRANSMISSION:MANUAL
PK: MAKE#DODGE SK: YEAR#1970#MODEL#CHARGER TRANSMISSION:AUTO
PK: MAKE#DODGE SK: YEAR#1971#MODEL#CHALLENGER TRANSMISSION:AUTO
PK: MAKE#PLYMOUTH SK: YEAR#1970#MODEL#ROADRUNNER TRANSMISSION:AUTO
Next to this you could fill out the rest of the attributes of the car and potentially use secondary indexes to query those.
I recommend giving NoSQL Workbench for DynamoDB a try and experimenting a bit with this.
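For illustration, here is a minimal sketch of the "cars by year" query with the AWS SDK for Java v2. The table name Cars is an assumption; the keys follow the PK/SK layout above, and begins_with on the sort key narrows the result to one model year.

import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.QueryRequest;
import software.amazon.awssdk.services.dynamodb.model.QueryResponse;
import java.util.Map;

public class CarQueries {
    public static void main(String[] args) {
        DynamoDbClient ddb = DynamoDbClient.create();
        // Fetch all 1970 Dodge cars: the partition key pins the make,
        // begins_with on the sort key narrows to the model year.
        QueryRequest request = QueryRequest.builder()
                .tableName("Cars") // assumed table name
                .keyConditionExpression("PK = :pk AND begins_with(SK, :year)")
                .expressionAttributeValues(Map.of(
                        ":pk", AttributeValue.fromS("MAKE#DODGE"),
                        ":year", AttributeValue.fromS("YEAR#1970")))
                .build();
        QueryResponse response = ddb.query(request);
        response.items().forEach(System.out::println);
    }
}

Querying by Color or Transmission alone would need a secondary index keyed on that attribute, which is exactly the kind of trade-off that is easy to explore in the workbench.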

Related

Querying details from GraphDB

We are trying to implement customer-oriented details in GraphDB, where with a single query we can fetch the details of a customer such as their address, phone, email, etc. We have built it using 'has address' and 'has email' edges:
g.addV('member').property('id','CU10611972').property('CustomerId', 'CU10611972').property('TIN', 'xxxx').property('EntityType', 'Person').property('pk', 'pk')
g.addV('email').property('id','CU10611972E').property('pk', 'pk')
g.addV('primary').property('id','CU10611972EP').property('EmailPreference','Primary').property('EmailType', 'Home').property('EmailAddress', 'SNEHA#GMAIL.COM').property('pk', 'pk')
g.V('CU10611972').addE('has Email').to(g.V('CU10611972E'))
g.V('CU10611972E').addE('has Primary Email').to(g.V('CU10611972EP'))
This is how we have built the email relation to the customer. Similarly, we have relations for Address and Phone. Right now we are using this command to fetch the JSON related to this customer for email:
g.V('CU10611972').out('has Email').out('has Primary Email')
And for complete customer details we are using a union over each vertex: Phone, Email and Address.
Could you please suggest if there is an efficient way to query this detail?
This really comes down to two things:
General graph data modelling
Things the graph DB you are using does and does not support.
With Gremlin there are a few ways to model this data for a single vertex.
If the database supports it, have a list of names like ['home','mobile'] and use metaproperties to attach a phone number to each.
A lot of the Gremlin implementations I am aware of have chosen not to support meta properties. In these cases you have a couple of options.
(a) Have a property for 'Home' and another for 'Mobile'. If either is not known, you could either not create that property or give it a value such as "unknown".
(b) Use prefixed strings such as ["Home:123456789","Mobile:123456789"], store them in a set or list (multi properties), and access them in Gremlin using the startingWith predicate, such as g.V(id).properties('phone').hasValue(startingWith('Mobile')).value().
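As a rough sketch of option (b) with the Gremlin Java API (TinkerGraph and all IDs and numbers here are purely illustrative):

import org.apache.tinkerpop.gremlin.process.traversal.TextP;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.structure.VertexProperty.Cardinality;
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph;

public class PhonePrefixExample {
    public static void main(String[] args) {
        GraphTraversalSource g = TinkerGraph.open().traversal();

        // Store both numbers on one vertex as a multi property,
        // each value prefixed with its type.
        g.addV("member").property("CustomerId", "CU10611972")
                .property(Cardinality.set, "phone", "Home:123456789")
                .property(Cardinality.set, "phone", "Mobile:987654321")
                .iterate();

        // Pull back only the mobile number via the startingWith text predicate.
        Object mobile = g.V().has("member", "CustomerId", "CU10611972")
                .properties("phone")
                .hasValue(TextP.startingWith("Mobile"))
                .value()
                .next();
        System.out.println(mobile); // Mobile:987654321
    }
}

Note that TextP requires TinkerPop 3.4 or later, and the target database must support set cardinality for this approach to work.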

How to model complex relational data in Firestore while limiting composite indexes?

First of all thank you to anybody reading through this and offering any advice and help. It is much appreciated.
I'm developing a small custom CRM (ouch) for my father's business (specialty contracting), and I'm using Firestore for my database. It is supposed to be very lean, with not much "bling", but streamlined to his specialty contracting business, whose process is very hard to fit into any other custom CRM. I have gotten quite far and have a decent-sized implementation, but am now running into some very fundamental issues as everything expands.
I admit that having experience only with relational databases (and not much of that, either) left me scratching my head a few times when setting up my database structure, and I am running into some issues with Firestore. I'm also a fairly novice developer and feel I'm tackling something that is way out of my league (but there's not much turning around now, a year into this journey).
As of right now I'm using Top Level Collections for what I am presenting here. I recently started using Sub-Collections for some other minor features and started questioning if I should apply that for everything.
A big problem I foresee is that, because I want to query in a multitude of ways, I am already consuming almost 100 composite indexes at this time. There is still a lot to add, so I need to reduce the number of composite indexes that my current and future data structure needs.
So I am somewhat certain that my data model is probably deeply flawed and needs to be improved/optimized/changed. (Which I don't mind doing, if that's what it takes, but I'm lost on the "how".) I don't need a specific solution, maybe just some general pointers on what approaches are available. I think I might be lacking an "aha" moment: if I understand a pattern, I can usually apply it to other areas.
I will make my "Sales Leads Collection" a central concern of this post, as it has the most variations of querying.
So I have a mostly top-level collection structure like the one below, but I also want to preface that, besides writing the IDs to other documents, I will "stash" an entire Customer or Sales Rep object/document with other documents, and I have Cloud Functions that iterate through certain documents when there are updates, etc. (This avoids extra reads: when I read a SalesLead, I don't need to read the SalesRep and Customer documents, as they are also stashed/nested with the SalesLead.)
| /sales_reps //SalesReps Collection
| /docId //Document ID
| + salesRepId (document id)
| + firstName
| + lastName
| + other employee/salesRep related info etc.
| /customers //Customers Collection
| /docId //Document ID
| + customerId (document id)
| + firstName
| + lastName
| + address
| + other customer-specific info such as contact details (phone, email) etc.
Logically, Sales Leads are of course linked to a Customer (one-to-many: one Customer can have many Leads).
I need to be able to "query" and "filter" by all the fields mentioned below.
| /sales_leads //SalesLeads Collection
| /docId //Document ID
| + customerId (document id) <- this is what I would query by to look for leads for a specific customer
| + salesRepId (document id) <- this is what I would query by to look for leads for a specific sales Rep
| + status <- (String: "Open", "Sold", "Lost", "On Hold")
| + progress <- (String: "Started", "Appointment scheduled", "Estimates created", etc. etc., )
| + type <- (String: New Construction or Service/Repair)
| + jobType <- (String: different types of jobs related to what type of structures they are; 8-10 types right now)
| + reference <- (String: How the lead was referred to the company, i.e. Facebook, Google, etc. etc. );
| + many other (non queryable) data related to a lead, but not relevant here...
SalesEstimates are related to Leads in a one-to-many relationship (one Lead can have many Estimates). Estimates are not all that relevant for this discussion, but I wanted to include them anyhow. I query and filter Estimates in a very similar way to Leads, though (similar fields, etc.).
| /sales_estimates //SalesEstimates Collection
| /docId //Document ID
| + salesLeadId (document id) <- this is what I would query by to look for estimates for a specific lead
| + customerId (document id) <- this is what I would query by to look for estimates for a specific customer
| + salesRepId (document id) <- this is what I would query by to look for estimates for a specific sales Rep
| + specific sales Lead related data etc....
In my "Sales Lead List" on the client, I have some Drop Down Boxes as Filters, that contain Values (i.e. Sales Reps) but also haven an Option/Value "All" to negate any filtering.
So I would start assembling a query:
Query query = db.collection("sales_leads");
//Rep
if (!salesRepFilter.equals("All")) { //Typically only Managers/Supervisors would be able to see "all leads", whereas for a SalesRep this is set to his own ID by default.
query = query.whereEqualTo("salesRepId", salesRepFilter);
}
//Lead Status (Open, Sold, Lost, On Hold)
if (!statusFilter.contains("All")) {
query = query.whereEqualTo("status", statusFilter);
}
//Lead Progress
if (!progressFilter.contains("All")) {
query = query.whereEqualTo("progress", progressFilter);
}
//Lead Type
if (!typeFilter.contains("All")) {
query = query.whereEqualTo("leadType", typeFilter);
}
//Job Type
if (!jobTypeFilter.contains("All")) {
query = query.whereArrayContains("jobTypes", jobTypeFilter);
}
//Reference
if (!referenceFilter.contains("All")) {
query = query.whereEqualTo("reference", referenceFilter);
}
Additionally, I might want to reduce the whole query to a single customer (this typically means that all other filters are skipped and all leads for this customer are shown). This happens when the user opens the Customer Page/Details and clicks on something like "Show Leads for this Customer".
//Filter by Customer (when entering my SalesLead List from a Customer Card/Page where user clicked on "Show Leads for this Customer")
if (filterByCustomer) {
query = query.whereEqualTo("customerId", customerFilter);
}
//And at last I want to be able to query the date Range (when the lead was created) and also sort by "oldest" or "newest"
//Date Range
query = query.whereGreaterThan("leadCreatedOnDate", filterFromDate)
.whereLessThan("leadCreatedOnDate", filterToDate);
//Sort Newest vs Oldest
if (sortByNewest) { //newest first means descending by creation date
query = query.orderBy("leadCreatedOnDate", Query.Direction.DESCENDING);
} else {
query = query.orderBy("leadCreatedOnDate", Query.Direction.ASCENDING);
}
And that completes my query on sales leads. This all works great right now, but I am anxious about going forward and ultimately hitting the composite index limitation. I don't have an exact number, but I am probably entertaining 25-30 composite indexes just for my sales_leads collection. (Yikes!)
Not only are there many fields to query by; the number of composite indexes required is multiplied by the combinations of filters that can be set. (UGH)
I need to be able to query all leads and then filter them by the fields mentioned above (when describing my sales_leads collection).
So instead of keeping all these collections as top-level collections, I am guessing that somehow I should restructure my database around subcollections, but I have tried modeling this with different approaches and always seem to hit a wall.
I suppose I could have "sales_leads" as a subcollection under each customer document and use a collection group query to retrieve "all leads", but those require composite indexes too, right? So it would just be a trade-off for that one searchable field. (..hits wall..)
Sorry for the length. I hope it's readable. I appreciate any help, feedback and input. I'm in a very anxious and frustrated position.
If this doesn't work, I might need to consider professional consultation.
Thanks!
Here are a few things I think will help you.
First, watch AWS re:Invent 2018: Amazon DynamoDB Deep Dive on YouTube. It's about DynamoDB, but DynamoDB is a NoSQL database very similar to Firestore, and the concepts apply universally. Midway through the video, the presenter (Rick Houlihan) uses a company like yours as an example, and you may be surprised to see how effectively he can reduce query count simply through data modeling.
Second, familiarize yourself with Firestore's index merging. In situations like yours, it may be better to manually create your composite indices, or at least manually audit them, because Firestore's automatic indexing doesn't guarantee the most efficient menu of composite indices. Remember, composite indices are automatically created based on the order in which you execute queries, and if you later execute a query that could be better structured by voiding a previous index, Firestore will not go back and delete the old one for you; you have to.
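As a sketch of what merging buys you, reusing the collection and field names from the question: a query built purely from equality filters can be served by merging single-field indexes, while ordering on a different field is what typically forces a dedicated composite index.

// Can be served by merging the single-field indexes for salesRepId
// and status -- no composite index needed for pure equality filters.
Query merged = db.collection("sales_leads")
        .whereEqualTo("salesRepId", "rep_123") // hypothetical value
        .whereEqualTo("status", "Open");

// Ordering on another field is what requires a composite index
// (salesRepId ASC, status ASC, leadCreatedOnDate DESC).
Query ordered = db.collection("sales_leads")
        .whereEqualTo("salesRepId", "rep_123")
        .whereEqualTo("status", "Open")
        .orderBy("leadCreatedOnDate", Query.Direction.DESCENDING);

One trade-off to consider, if it fits the UI, is dropping the server-side orderBy and sorting the already-filtered results client-side, which can collapse many of those filter-combination indices.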
I'm highly suspicious of the fact that the sales-lead query consumes 25-30 composite indices; that number seems far too high to me given how many fields in the documents are indexed. Before you do anything—after having watched the video and studied index merging, of course—I'd focus entirely on this collection. You must be completely certain of the maximum number of composite indices this collection needs to consume. Perhaps create a dummy collection and experiment with index merging and really understand how it works because this alone may solve all of your problems. I would be shocked if Firestore couldn't handle your company's use case.
Third, don't be afraid to denormalize your data. The fundamental premise of NoSQL is really denormalization—that is, data storage really should be your least concern and computation/operation really should be your greatest concern. If you can reduce your query count by duplicating data over multiple documents in multiple collections, that is simply what you must do if the alternative is hitting 200 composite indices.
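For example, here is a hedged sketch of the fan-out write that such duplication implies, building on the "stashing" approach already described in the question (the IDs, values and nested field path are made up):

FirebaseFirestore db = FirebaseFirestore.getInstance();
String customerId = "customer_abc"; // hypothetical ID
String newLastName = "Smith";       // hypothetical new value

// When a customer is renamed, update the canonical document and every
// lead that embeds a copy of it, in one atomic batch (max 500 writes).
db.collection("sales_leads")
        .whereEqualTo("customerId", customerId)
        .get()
        .addOnSuccessListener(snapshot -> {
            WriteBatch batch = db.batch();
            batch.update(db.collection("customers").document(customerId),
                    "lastName", newLastName);
            for (DocumentSnapshot lead : snapshot.getDocuments()) {
                batch.update(lead.getReference(), "customer.lastName", newLastName);
            }
            batch.commit();
        });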

How do I model this in DynamoDB?

I am testing out DynamoDB for a serverless app I am building. I have successfully modeled all of my application's query patterns except one. I was hoping someone could provide some guidance. Here are the details:
Data Model
There are three simple entities: User (~1K records), Product (~100K), ActionItem (~100/product).
A User has a many-to-many relationship with Product.
A Product has a one-to-many relationship with ActionItem.
The Workflow
There's no concept of "Team" for this app. Instead, a user is assigned a set of products which they (and others) are responsible for managing. The user picks the oldest items from their products' action item list, services the item and then closes it.
The use case I am trying to model is: As a user, show me all action items for products to which I am assigned.
Any help would be greatly appreciated.
Really only two options...
If you can store the list of products within the 400KB limit of a DDB record, then you could have a record like so:
Hash Key: userID
Sort Key: "ASSIGNED_PRODUCTS"
Otherwise:
Hash Key: userID
Sort Key: "#PRODUCT#10001-54502"
userID in the above might be the raw user ID or, if using a GSI, might be something like "#USER#user-id"
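Either way, satisfying "all action items for my products" becomes a two-step read: fetch the user's assignments, then query each product's item collection. A rough sketch with the AWS SDK for Java v2, assuming the second layout above and assuming action items live under the product's partition key behind an ACTIONITEM# sort-key prefix (both assumptions, not part of the original answer):

import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.QueryRequest;
import software.amazon.awssdk.services.dynamodb.model.QueryResponse;
import java.util.Map;

public class AssignedActionItems {
    public static void main(String[] args) {
        DynamoDbClient ddb = DynamoDbClient.create();

        // Step 1: one query for the user's product assignments.
        QueryResponse assignments = ddb.query(QueryRequest.builder()
                .tableName("AppTable") // assumed table name
                .keyConditionExpression("PK = :u AND begins_with(SK, :p)")
                .expressionAttributeValues(Map.of(
                        ":u", AttributeValue.fromS("#USER#user-id"),
                        ":p", AttributeValue.fromS("#PRODUCT#")))
                .build());

        // Step 2: one query per assigned product, ascending sort order
        // so the oldest action items come back first.
        for (Map<String, AttributeValue> assignment : assignments.items()) {
            String productKey = assignment.get("SK").s(); // "#PRODUCT#10001-54502"
            QueryResponse items = ddb.query(QueryRequest.builder()
                    .tableName("AppTable")
                    .keyConditionExpression("PK = :prod AND begins_with(SK, :ai)")
                    .expressionAttributeValues(Map.of(
                            ":prod", AttributeValue.fromS(productKey),
                            ":ai", AttributeValue.fromS("ACTIONITEM#")))
                    .scanIndexForward(true) // oldest first
                    .build());
            items.items().forEach(System.out::println);
        }
    }
}

With ~100 action items per product this fan-out is parallelizable; if it proves too chatty, a GSI that repartitions action items by user is the usual alternative, at the cost of duplicating the assignment onto each action item.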

DynamoDB M-M Adjacency List Design Pattern - Delete all associations?

I'm designing a table with GSIs using the Adjacency List design pattern, which is able to perform all the specified queries, and all seems to be working well (if you see any other improvements, please mention them!).
One problem I'm stuck on is how to delete associations. Let's say a user deletes a Tag which is used across multiple pages (where pages are tagged with one or more tags).
What I was hoping to do is just use deleteItem, but it requires the whole primary key (partition key + sort key, if one exists) and deletes only ONE item.
BatchWrite would require basically the same: knowing the sort key for every association in order to delete it.
If, for example, I needed to delete "tag-article", I would need to delete three rows:
tag itself - PK: tag-article SK: tag-article
reference to page-cs-articleId - PK: tag-article SK: page-cs-articleId
reference to page-en-article2 - PK: tag-article SK: page-en-article2
Is there any other way of improving the table design which would allow me to actually delete all associations (tags) by specifying its ID?
Thank you kindly for any hints!
The adjacency list design pattern requires you to query the GSI and/or table to find everything that is related. Only once you know all the associations can you delete them.
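A sketch of that query-then-delete flow with the AWS SDK for Java v2 (the table name Tags is assumed, the keys follow the example rows above, and pagination plus retrying of unprocessed items are omitted for brevity):

import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.*;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DeleteTagAssociations {
    public static void main(String[] args) {
        DynamoDbClient ddb = DynamoDbClient.create();

        // 1. Query the tag's item collection to discover every association.
        QueryResponse associations = ddb.query(QueryRequest.builder()
                .tableName("Tags") // assumed table name
                .keyConditionExpression("PK = :tag")
                .expressionAttributeValues(Map.of(
                        ":tag", AttributeValue.fromS("tag-article")))
                .build());

        // 2. Turn each returned item into a delete request...
        List<WriteRequest> deletes = associations.items().stream()
                .map(item -> WriteRequest.builder()
                        .deleteRequest(DeleteRequest.builder()
                                .key(Map.of("PK", item.get("PK"), "SK", item.get("SK")))
                                .build())
                        .build())
                .collect(Collectors.toList());

        // ...and issue them in chunks of 25, the BatchWriteItem limit.
        for (int i = 0; i < deletes.size(); i += 25) {
            ddb.batchWriteItem(BatchWriteItemRequest.builder()
                    .requestItems(Map.of("Tags",
                            deletes.subList(i, Math.min(i + 25, deletes.size()))))
                    .build());
        }
    }
}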

firebase realtime schema design

I have two sets of entities in my Firebase Realtime Database schema, called Orders and Customers.
So far I was not actually relating them in my app, just displaying them as related. The current schema looks like:
{
"orders" : [
{"id" : 1, "name": "abc", "price": 200, "customer": "vik"}
],
"customers" : [
{"cust_id" : "10", "name" : "vik", "type": "existing"}
]
}
So I have an orders list page showing all the orders in a table, which I get by firing /orders.json.
But practically, instead of having the customer name directly in the orders, I should have a cust_id attribute, as that is the key.
That naturally makes it a standard relational schema, where I will be free to change customer attributes without worrying about mismatches in orders.
However, the downside I see right away is that if I have, say, 20 orders to show in the order list table, then instead of 1 I will end up firing 21 REST calls (1 to get the order list and 20 to fetch the customer name for each order).
What are the recommendations or standards around this ?
Firebase is a NoSQL database. So the rules of normalization that you know from relational databases don't necessarily apply.
For example: having the customer name in each order is actually quite normal. It saves having to do a client-side join for each customer record, significantly simplifying the code and improving the speed of the operation. But of course it comes at the cost of having to store data multiple times (quite normal in NoSQL databases), and having to consider if/how you update the duplicated data in case of updates of the customer record.
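If you do keep the duplicated name, the Realtime Database's multi-location updates let you correct it everywhere atomically when the customer record changes. A minimal sketch with the Android SDK, assuming the array-style keys from the JSON above (the paths and new value are illustrative):

DatabaseReference root = FirebaseDatabase.getInstance().getReference();
String newName = "vik kumar"; // hypothetical corrected name

// Fan the corrected name out to the customer record and to every
// order that duplicates it, in one atomic multi-path update.
Map<String, Object> updates = new HashMap<>();
updates.put("/customers/0/name", newName);   // the canonical record
updates.put("/orders/0/customer", newName);  // each order embedding the name
root.updateChildren(updates);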
I recommend reading NoSQL data modeling, watching Firebase for SQL developers, and reading my answer on keeping denormalized data up to date.

Resources