Voting on items - how to design database/aws-lambda to minimize AWS costs

I'm working on a website that mostly displays items created by registered users. I'd say 95% of API calls are to read a single item and 5% are to store a single item. The system is designed with AWS API Gateway calling an AWS Lambda function, which manipulates data in DynamoDB.
My next step is to implement a voting system (upvote/downvote) with basic features:
Each registered user can vote only once per item, and later is only allowed to change that vote.
The number of votes needs to be displayed to all users next to every item.
Items have only single-item views, and are (almost) never displayed in a list view.
The only list view I need is "top 100 items by votes", but it is OK to calculate this once per day and serve a cached version.
My goal is to design the database/Lambda to minimize AWS costs. It's easy to make the logic work, but I'm not sure whether my solution is the optimal one:
My items table currently has hash key slug and sort key version.
I created an items-votes table with hash key slug and sort key user, plus a voted field (containing -1 or 1).
I added a votes field to the items table.
The API call to upvote/downvote inserts into the items-votes table, but first checks the condition that the user has not already voted that way. A second query then updates the items table with the new votes count (so 1 API call and 2 DB queries).
The old API call to show an item stays the same but now grabs the new votes count too (1 API call and 1 DB query).
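A minimal sketch of that flow (assuming boto3; table and attribute names match the description above, and the counter delta doubles when a user flips an existing vote):

import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.resource('dynamodb')
items = dynamodb.Table('items')
item_votes = dynamodb.Table('items-votes')

def cast_vote(slug, version, user, vote):  # vote is 1 or -1
    try:
        # Query 1: record the vote, but only if the user hasn't already
        # voted this way on this item. ALL_OLD returns any previous vote.
        resp = item_votes.put_item(
            Item={'slug': slug, 'user': user, 'voted': vote},
            ConditionExpression='attribute_not_exists(voted) OR voted <> :v',
            ExpressionAttributeValues={':v': vote},
            ReturnValues='ALL_OLD',
        )
    except ClientError as e:
        if e.response['Error']['Code'] == 'ConditionalCheckFailedException':
            return  # same vote already recorded; nothing to update
        raise
    old = resp.get('Attributes', {}).get('voted', 0)
    # Query 2: bump the denormalized counter (ADD on a number is atomic).
    items.update_item(
        Key={'slug': slug, 'version': version},
        UpdateExpression='ADD votes :d',
        ExpressionAttributeValues={':d': vote - old},
    )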
I was wondering if this can be done even better by avoiding the new items-votes table and storing user votes inside the items table instead? It looks like it is possible to save one query (and half the Lambda execution time) that way, but I'm worried it might make that table too big/complex. Each user field is a 10-character user ID, so if an item gets thousands of votes I'm not sure how Lambda/DynamoDB will behave compared to the original solution.
I don't expect thousands of votes any time soon, but it is not impossible for a few items, and I'd like to avoid a situation where I need to migrate to a different solution in the near future.

I would suggest using a DynamoDB set attribute (i.e. SS, a string set) to maintain the list of users who voted on the item. Something like below:
upvotes : ['user1', 'user2']
downvotes : ['user1', 'user2']
When you update the votes using an UpdateExpression, you can use the ADD operator, which adds users to the set only if they don't already exist.
ADD - Adds the specified value to the item, if the attribute does not already exist. If the attribute does exist, then the behavior of ADD depends on the data type of the attribute:
If the existing data type is a set and if Value is also a set, then Value is added to the existing set. For example, if the attribute value is the set [1,2], and the ADD action specified [3], then the final attribute value is [1,2,3]. An error occurs if an ADD action is specified for a set attribute and the attribute type specified does not match the existing set type. Both sets must have the same primitive data type. For example, if the existing data type is a set of strings, the Value must also be a set of strings.
This way you don't need to check whether the user has already upvoted or downvoted the item.
The only thing you need to ensure is that the same user isn't present in both the upvotes and downvotes sets. You can use DELETE (the operator that removes elements from a set; REMOVE deletes whole attributes) in the same UpdateExpression, or a ConditionExpression, to achieve this.
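A hedged sketch of that update with boto3 (the upvotes/downvotes attribute names come from the example above; the items table keys are from the question). ADD and DELETE run in one atomic update, so a user can never end up in both sets:

import boto3

items = boto3.resource('dynamodb').Table('items')

def upvote(slug, version, user_id):
    # boto3 serializes a Python set of strings as a DynamoDB string set
    # (SS). ADD inserts the voter into upvotes; DELETE removes them from
    # downvotes (a no-op if they never downvoted).
    items.update_item(
        Key={'slug': slug, 'version': version},
        UpdateExpression='ADD upvotes :u DELETE downvotes :u',
        ExpressionAttributeValues={':u': {user_id}},
    )

The displayed count can then be computed at read time as len(upvotes) - len(downvotes), or maintained in a separate numeric attribute within the same update.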

Related

Conditional insert in DynamoDB

I am creating a leave tracker app where I want to store the user ID along with the from date and to date. I am using Amazon's DynamoDB as the database, and the user enters a leave through a custom command.
Eg: apply-leave from-date to-date
I want to avoid duplicate entries in the database. For example, if a user has already applied for a leave between 06-10-2019 and 10-10-2019 and applies for a leave between the same dates again, they should get a message saying that this leave already exists, and a new record should not be created.
However, a user can apply for multiple leaves and two users can take a leave between the same dates.
I tried using a conditional statement as follows:
table.put_item(
    Item={
        'leave_id': leave_id,
        'user_id': user_id,
        'from_date': from_date,
        'to_date': to_date,
    },
    ConditionExpression='attribute_not_exists(user_id) AND attribute_not_exists(from_date) AND attribute_not_exists(to_date)'
)
where leave_id is the partition key. However, this does not work, and a new row is added every time, even for the same dates. I have looked through similar questions but haven't been able to understand how to get this configured correctly.
Any ideas on how I should go about this, or if there is a different design that I should follow?
If you are calling your code with a leave_id that doesn't yet exist in the table, the item will always be inserted: the ConditionExpression is evaluated only against the item with the same primary key, so with a fresh leave_id there is nothing to check against. If you call your code with a leave_id that already exists in your table, you should get the error: An error occurred (ConditionalCheckFailedException) when calling the PutItem operation: The conditional request failed.
I have two suggestions:
If you don't want to change your table, you can create a secondary index with user_id as the partition key and then query the index for all the items where the given user has from_date and to_date attributes.
Like this:
from boto3.dynamodb.conditions import Key, Attr

table.query(
    IndexName='user_id-index',
    KeyConditionExpression=Key('user_id').eq(user_id),
    FilterExpression=Attr('from_date').exists() & Attr('to_date').exists()
)
Then you will need to check for overlapping leave requests, etc. (e.g. a leave request that starts before one already in place finishes). After deciding that the leave request is valid, you call put_item.
Another suggestion, and probably a better one, would be to create a composite primary key on your table, with user_id as the partition key and leave_id as the sort key. That way you could query for all leave requests from a particular user without needing to create a secondary index, as sketched below.
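A hedged sketch of that second suggestion, assuming the composite key user_id (partition) + leave_id (sort), a hypothetical table name, and ISO yyyy-mm-dd dates so string comparison matches date order:

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource('dynamodb').Table('leaves')  # hypothetical name

def apply_leave(user_id, leave_id, from_date, to_date):
    # One key-only query returns every leave for this user.
    existing = table.query(
        KeyConditionExpression=Key('user_id').eq(user_id)
    )['Items']
    for leave in existing:
        # Two date ranges overlap when each starts before the other ends.
        if leave['from_date'] <= to_date and from_date <= leave['to_date']:
            return 'A leave already exists for these dates'
    table.put_item(Item={
        'user_id': user_id,
        'leave_id': leave_id,
        'from_date': from_date,
        'to_date': to_date,
    })
    return 'Leave recorded'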

Doctrine innerOrder[int] column implementation for manual sort order control

I have two tables in my app's schema: Event and Game (one-to-many). Games are ordered by a datetime field, but sometimes games are played in parallel (same datetime), and the user should be able to set their relative order.
I've added an innerOrder (int) field with a simple idea: it should have an autogenerated value that can be changed on reorder (exchanged with the neighboring record). But I can't achieve this behavior with Doctrine: GeneratedValue can't be used twice or with a separate field (it just doesn't work that way).
On my next attempt I tried to do it without autogeneration. But I need some initial value on insert, for example MAX(innerOrder) (ideally set automatically, of course).
I can't do it in prePersist or similar lifecycle methods, since I don't have access to the repository class there. And I don't want to do it with an additional query in the controller, not only because of the extra code I'd have to add each time (get the max value from the table, set the inner order), but because I'm afraid of possible conflicts (when two users are adding Games in parallel).
How should I achieve the expected behavior (or maybe I'm going about this totally wrong)?
There is no need to achieve this behavior with Doctrine; you can manage this value from the aggregate root. I.e. when you attach the Game to the Event, you can set its innerOrder value to the maximum of the currently attached games + 1. Conflicts can easily be avoided with some kind of lock on the Event you edit (i.e. fetching it with a Doctrine write lock, or some kind of shared lock or mutex; see symfony/lock).
After that you can configure the relation to be fetched in the given order, as described in this documentation:
https://www.doctrine-project.org/projects/doctrine-orm/en/2.6/tutorials/ordered-associations.html
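A rough, framework-agnostic sketch of that idea, in Python (the language of the other examples on this page) rather than PHP; in the real app the lock would be a Doctrine pessimistic write lock or symfony/lock, and Event/Game would be Doctrine entities:

import threading

class Event:
    def __init__(self):
        self.games = []
        # Stands in for a Doctrine write lock or symfony/lock.
        self._lock = threading.Lock()

    def attach_game(self, game):
        with self._lock:
            # Initial innerOrder: one past the current maximum, so two
            # parallel inserts can't pick the same value.
            game.inner_order = max(
                (g.inner_order for g in self.games), default=0
            ) + 1
            self.games.append(game)

    def swap_order(self, game_a, game_b):
        # Reordering is just an exchange with the neighboring record.
        with self._lock:
            game_a.inner_order, game_b.inner_order = (
                game_b.inner_order, game_a.inner_order
            )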
My two cents: when creating/modifying an event, you can check whether there's already one at the same time (the default innerOrder being 0, or even the count(*) of the events at the same time). You can issue a warning when there's another event, ask for the order, or take the user to a form where the order of the events can be manually reassigned.

Using timestamp as an Attribute in DynamoDB

I'm quite new to DynamoDB, but have some experience in Cassandra. I'm trying to adapt a pattern I followed in Cassandra, where each column represented a timestamped event, and wondering if it will carry over gracefully into DynamoDB or if I need to change my approach.
My goal is to query a set of documents within a date range by using the milliseconds-since-epoch timestamp as an Attribute name. I'm successfully storing the following as each report is generated with each new report being added under its own column:
{ PartitionKey: customerId,
  SortKey: reportName_yyyymm,
  '#millis_1#': {'report': doc_1},
  '#millis_2#': {'report': doc_2},
  ...
  '#millis_n#': {'report': doc_n}
}
My question is, given a millisecond-based date range, and the accompanying Partition and Sort keys, is it possible to query the set of Attributes that fall within that range or must I retrieve all columns for the matching keys and filter them at the client?
Welcome to the most powerful NoSQL database ;)
To kick off with the bad news: there is no way to query out specific attributes. You can project certain attributes in a query, but you would have to write your own logic to determine which attributes or columns should be included in the projection. To get close to your solution you could use a map attribute inside an item, with the milliseconds as keys. But there is another thing to be aware of before starting down this path.
There is a maximum total item size of 400 KB for each item in DynamoDB, including key and attribute names (see Limits in DynamoDB Items). This means you can only store so many attributes in an item. This is especially true if you intend to put the actual report inside the attribute, which I would advise against, also because you will be burning up read capacity units every time you get one attribute out of the whole item. You would be better off putting this data in a separate table, with the keys in the map.
But truthfully, in DynamoDB I would split this whole thing up: add the milliseconds to the sort key and make every document its own item. That way you can query these items directly, and you can use the "between" key condition to select specific date-time ranges. Please let me know if you meant something else.
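A hedged boto3 sketch of that split-up layout (the table name, key names, and sort-key format are assumptions; zero-padding the milliseconds keeps string order consistent with numeric order):

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource('dynamodb').Table('reports')  # hypothetical name

def reports_between(customer_id, report_name, start_ms, end_ms):
    # Each report is its own item whose sort key embeds the timestamp,
    # e.g. 'daily_0001556894275000', so 'between' selects a range directly.
    lo = f'{report_name}_{start_ms:016d}'
    hi = f'{report_name}_{end_ms:016d}'
    resp = table.query(
        KeyConditionExpression=Key('customerId').eq(customer_id)
        & Key('SortKey').between(lo, hi)
    )
    return resp['Items']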

Efficiently storing and retrieving likes

In my Firebase database I have posts, and authenticated users can "like" posts. How can I efficiently get the number of likes a post has received? I know that with MongoDB I can add/remove the user's ID to a list and then use a MongoDB function to quickly get its length and set that as the like count, but I'm not sure how to do that with Firebase. I could also add/remove it to the list and increment a likeCount variable, but that seems like it would cause concurrency issues unless Firebase has a function for that. What functions can I call to best handle this and scale well? Thanks in advance!
You can do both things:
1) Create a votes node with the UID as key and a value to sum up all the votes.
post: {
    // All the data
    likes: {
        $user_1: 1,
        $user_2: -1,
    }
}
And then just attach a single-value event or a value event (depending on whether you want to keep track of changes) and sum up all the children.
2) You can use a transaction block and just save a value, increasing or decreasing it depending on the votes (here is a link where you can find transactions for Android, iOS or Java):
https://firebase.google.com/docs/database/web/save-data#save_data_as_transactions
post: {
    // All the data,
    likes: 2,
}
It really depends on how much information you want to store and what the user can do once they have already voted for a post.
I would recommend using both: keep the per-user entry for flexibility, so the user can unlike something (as on Facebook), and use the transaction-maintained number to keep it scalable. That way, if a post gets 1,000,000 likes you don't have to count 1,000,000 children every time someone loads the post.
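A hedged sketch combining both ideas with the Firebase Admin SDK for Python (firebase_admin); the database paths and the app initialization are assumptions:

import firebase_admin
from firebase_admin import credentials, db

firebase_admin.initialize_app(
    credentials.Certificate('service-account.json'),  # hypothetical file
    {'databaseURL': 'https://example-project.firebaseio.com'},
)

def vote(post_id, user_id, value):  # value is +1 or -1
    vote_ref = db.reference(f'posts/{post_id}/likes/{user_id}')
    previous = vote_ref.get() or 0  # note: not transactional across both refs
    vote_ref.set(value)
    delta = value - previous  # 0 if unchanged, +/-2 if the vote flipped
    if delta:
        # The counter update runs as a transaction, so concurrent votes
        # from different users can't clobber each other.
        db.reference(f'posts/{post_id}/likeCount').transaction(
            lambda current: (current or 0) + delta
        )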

Firebase better way of getting total number of records

From the Transactions doc, second paragraph:
The intention here is for the client to increment the total number of chat messages sent (ignore for a moment that there are better ways of implementing this).
What are some standard "better ways" of implementing this?
Specifically, I'm looking at trying to do things like retrieve the most recent 50 records. This requires that I start from the end of the list, so I need a way to determine what the last record is.
The options as I see them:
use a transaction to update a counter each time a record is added, use the counter value with setPriority() for ordering
forEach() the parent and read all records, do my own sorting/filtering at client
write server code to analyze Firebase tables and create indexed lists like "mostRecent Messages" and "totalNumberOfMessages"
Am I missing obvious choices?
To view the last 50 records in a list, simply call "limit()" as shown:
var data = new Firebase(...);
data.limit(50).on(...);
Firebase elements are ordered first by priority, and if priorities match (or none is set), lexicographically by name. The push() command automatically creates elements that are ordered chronologically, so if you're using push(), no additional work is needed to use limit().
To count the elements in a list, I would suggest adding a "value" callback and then iterating through the snapshot (or using the transaction approach we mentioned). The note in the documentation actually refers to some upcoming features we haven't released yet, which will allow you to count elements without loading them first.
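A small sketch of the same two operations with the Python Admin SDK (firebase_admin), assuming an initialized app and a messages list built with push() so that keys sort chronologically:

from firebase_admin import db

# Keys generated by push() sort chronologically, so ordering by key and
# keeping the last 50 yields the 50 most recent records.
last_50 = db.reference('messages').order_by_key().limit_to_last(50).get()

# Counting without server-side support still means loading the children:
total = len(db.reference('messages').get() or {})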
