DynamoDB ExclusiveStartKey Misuse? - amazon-dynamodb

I was planning to use a Dynamo table as a sort of replication log, so I have a table that looks like this:
+--------------+--------+--------+
| Sequence Num | Action | Thing |
+--------------+--------+--------+
| 0 | ADD | Thing1 |
| 1 | DEL | Thing1 |
| 2 | ADD | Thing2 |
+--------------+--------+--------+
Each of my processes keeps track of the last sequence number it read. Then on an interval it issues a Scan against the table with ExclusiveStartKey set to that sequence number. I assumed this would result in reading everything after that sequence, but instead I am seeing inconsistent results.
For example, given the table above, if I do a Scan(ExclusiveStartKey=1), I get zero results when I am expecting to see the 3rd row (seq=2).
I have a feeling it has to do with the internal hashing DynamoDB uses to partition the items and that I am misusing the ExclusiveStartKey option.
Is this the wrong tool for the job?
Alternatively, each process could issue a Query for seq+1 on each interval (looping if anything was found), which would result in the same ReadThroughput, but would require N API calls instead of N/1MB I would get with a Scan.

When you do a DynamoDB Scan operation, it does not seem to proceed sorted by the hash key. So using ExclusiveStartKey does not allow you to get an arbitrary page of keys.
For this example table with the Sequence ID, what I want can be accomplished with a Kinesis stream.

Related

DynamoDB Global Secondary Index "Batch" Retrieval

I've see older posts around this but hoping to bring this topic up again. I have a table in DynamoDB that has a UUID for the primary key and I created a secondary global index (SGI) for a more business-friendly key. For example:
| account_id | email | first_name | last_name |
|------------ |---------------- |----------- |---------- |
| 4f9cb231... | linda#gmail.com | Linda | James |
| a0302e59... | bruce#gmail.com | Bruce | Thomas |
| 3e0c1dde... | harry#gmail.com | Harry | Styles |
If account_id is my primary key and email is my SGI, how do I query the table to get accounts with email in ('linda#gmail.com', 'harry#gmail.com')? I looked at the IN conditional expression but it doesn't appear to work with SGI. I'm using the go SDK v2 library but will take any guidance. Thanks.
Short answer, you can't.
DDB is designed to return a single item, via GetItem(), or a set of related items, via Query(). Related meaning that you're using a composite primary key (hash key & sort key) and the related items all have the same hash key (aka partition key).
Another way to think of it, you can't Query() a DDB Table/index. You can only Query() a specific partition in a table or index.
Scan() is the only operation that works across partitions in one shot. But scanning is very inefficient and costly since it reads the entire table every time.
You'll need to issue a GetItem() for every email you want returned.
Luckily, DDB now offers BatchGetItem() with will allow you to send multiple, up to 100, GetItem() requests in a single call. Saves a little bit of network time and automatically runs the requests in parallel; but otherwise is the little different from what your application could do itself directly with GetItem(). Make no mistake, BatchGetItem() is making individual GetItem() requests behind the scenes. In fact, the requests in a BatchGetItem() don't even have to be against the same tables/indexes. The cost for each request in a batch will be the same as if you'd used GetItem() directly.
One difference to make note of, BatchGetItem() can only return 16MB of data. So if your DDB items are large, you may not get as many returned as your requested.
For example, if you ask to retrieve 100 items, but each individual
item is 300 KB in size, the system returns 52 items (so as not to
exceed the 16 MB limit). It also returns an appropriate
UnprocessedKeys value so you can get the next page of results. If
desired, your application can include its own logic to assemble the
pages of results into one dataset.
Because you have a GSI with PK of email (from what I understand) you can use PartiQL command to get your batch of emails back. The API is called ExecuteStatment and you use a SQL like syntax:
SELECT * FROM mytable.myindex WHERE email IN ['email#email.com','email1#email.com']

Cosmos DB RU Usage by collection name

I am trying to find what's causing the higher RU usage on the Cosmos DB. I enabled the Log Analytics on the Doc DB and ran the below Kusto query to get the RU consumption by Collection Name.
AzureDiagnostics
| where TimeGenerated >= ago(24hr)
| where Category == "DataPlaneRequests"
| summarize ConsumedRUsPer15Minute = sum(todouble(requestCharge_s)) by collectionName_s, _ResourceId, bin(TimeGenerated, 15m)
| project TimeGenerated , ConsumedRUsPer15Minute , collectionName_s, _ResourceId
| render timechart
We have only one collection on the DocDb Account (prd-entities) which is represents Red line in the Chart. I am not able to figure out what the Blue line represents.
Is there a way to get more details about the empty collection name RU usage (i.e., Blue line)
I'm not sure but I think there's no empty collection costs RU actually.
Per my testing in my side, I found that when I execute your kusto query I can also get the 'empty collection', but when I watch the line details, I found all these rows are existing in my operation. What I mean here is that we shouldn't sum by collectionName_s especially you only have one collection in total, you may try to use requestResourceId_s instead.
When using requestResourceId_s, there're still some rows has no id, but they cost 0.
AzureDiagnostics
| where TimeGenerated >= ago(24hr)
| where Category == "DataPlaneRequests"
| summarize ConsumedRUsPer15Minute = sum(todouble(requestCharge_s)) by requestResourceId_s, bin(TimeGenerated, 15m)
| project TimeGenerated , ConsumedRUsPer15Minute , requestResourceId_s
| render timechart
Actually, you can check the requestCharge_s are coming from which operation, just watch details in Results, but not in Chart, and order by the collectionName_s, then you'll see those requests creating from the 'empty collection', judge if these requests existing in your collection.

How to do a basic sort in DynamoDb?

Dynamodb can make the simplest of database operations difficult. I have the following table and all I want to do is simply sort by the due column. How is this achieved in DynamoDb? I read everything I could find online and there doesn't seem to be a straightforward walk through anywhere.
payor | amount | due | paid
----------------------------------
Ally | 200.00 | 13 | 1
Chase | 80.00 | 2 | 0
Wells | 30.00 | 17 | 1
Directv | 150.00 | 5 | 0
So without considering the payor, amount or paid columns, how can I simply sort on the due column.
Simply, this can't be achieved in DynamoDB if the due attribute is not defined as sort key. Even if you define the due attribute as sort key, the ordering can be done only within the particular partition key. The ordering can't be done across the partition key.
Assume, you have defined the due as sort key of the table. You can use ScanIndexForward to true/false to order the items in ascending / descending order.
Data modeling in dynamo db involves designing the partition key and then determining the sort key for a use case. Partition key is compulsory for any query. This is a basic design premise of a key value nosql store which is completely different than a relational store

Can't filter Firebase data

I'm having problems filtering the data with the element of polymer. My code is like this:
<firebase-collection location="--url--" order-by-child="user_id" equal-to="1" log="true" data="{{message}}"></firebase-collection>
My database is:
messages
|
|__0__content
| |__letter_id
| |__user_id
| |__datetime
|
|__1__content
| |__letter_id
| |__user_id
| |__datetime
|
|__etcetera
This should get the messages with a user_id that is equal to 1. However, this shows nothing. I guess this is a problem with my syntax, but I can't figure out the problem.
Never mind, I had my Firebase database setup with user_id set as a number. Therefore it could not obtain fetch the data.
I had the same issue but fixed it by using both the orderByValueType (as number) and the orderByChild set like so:
<firebase-collection
order-by-child="posted_date"
start-at="0"
order-value-type="number"
limit-to-first="1"
log="true"
location="https://somefirebaselocation.firebaseio.com/articles"
data="{{articles}}"></firebase-collection>
Without the order-value-type the query did not work.

A rudimentary way to store comments on a proposal webpage using SQLite

I am a software engineer, but I am very new to databases and I am trying to hack up a tool to show some demo.
I have an Apache server which serves a simple web page full of tables. Each row in the table has a proposal id and a link to a web page where the proposal is explained. So just two columns.
----------------------
| id | proposal |
|--------------------
| 1 | foo.html |
| 2 | bar.html |
----------------------
Now, I want to add a third column titled Comments where a user can leave comments.
------------------------------------------------
| id | proposal | Comments |
|-----------------------------------------------
| 1 | foo.html | x: great idea ! |
| | | y: +1 |
| 2 | bar.html | z: not for this release |
------------------------------------------------
I just want to quickly hack up something to show this as a demo and get feedback. I am planning to use SQLite to create a table per id and store the userid, comments in the table. People can add comment at the same time. I am planning to use lock to perform operations on the SQLite database. I am not worried about scaling just want to show and get feedback. Are there any major flaw in this implementation?
There are similar questions. But I am looking for a simplest possible implementation.
Table per ID; why would you want to do that? If you get a large number of proposals, the number of tables can get out of hand very quickly. You just need to keep an id column in the table to keep track of things and keep the number of tables in a sane figure.
The other drawback of using a table for each proposal is that you will not be able to use prepared statements for those, because table names cannot be bound as a parameter.
SQLite assumes the table name is 'a'
Add column
alter table a add column Comments text;
Insert comment
insert into a values (4,"hello.html","New Comment");
You need to provide values for the other two columns along with the new comment.

Resources