Let's say we have a target table, TargetTable, and N different source tables: SourceTable1, SourceTable2, ..., SourceTableN. An update policy is defined on TargetTable such that every source table feeds the target table through it, and every policy uses the same fixed function, TargetTable_loader(), as its query. The output of the .show table TargetTable policy update command looks as follows:
[
{
"IsEnabled": true,
"Source": "SourceTable1",
"Query": "TargetTable_loader()",
"IsTransactional": true,
"PropagateIngestionProperties": false
},
{
"IsEnabled": true,
"Source": "SourceTable2",
"Query": "TargetTable_loader()",
"IsTransactional": true,
"PropagateIngestionProperties": false
},
.
.
.
.
{
"IsEnabled": true,
"Source": "SourceTableN",
"Query": "TargetTable_loader()",
"IsTransactional": true,
"PropagateIngestionProperties": false
}
]
Now, I have the following two questions about this situation.
Since it's a common function for all the (Source, Target) pairs, is there a way for the function to refer to the input table using some generic variable? If the function refers to a specific table name, it won't be generic any more. At the same time, creating N different functions that each refer to a different source table would be redundant. Is there a way to parameterize this function with the input table name? Something like this:
.create-or-alter function TargetTable_loader(InputTable:string) { InputTable | ..... }
Secondly, if I have such a common function mapping all the source tables to the target table, as shown in the initial example, and data is being continuously ingested into all the source tables using streaming, what will happen if I abruptly update the definition of the common function? Of course, I will ensure that the new function definition is valid. My question is whether a sudden update of the function, which is used as the query for these N update policies, will cause update policy executions to fail merely because the update was abrupt.
Answers:
You can define a function that receives a table name as a string parameter, and then reference the table using table(TableName):
.create-or-alter function MyFunction(TableName: string) {
table(TableName)
| ...
}
Updating a function used for an update policy won't cause failures (as long as the result schema remains the same).
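As a rough illustration of how the parameterized function could be wired into the N policies, the sketch below generates the policy array in Python. It assumes that each policy's Query may pass the source table name as a string literal, for example TargetTable_loader('SourceTable1'). The helper name build_update_policies is hypothetical and not part of any Kusto SDK.

```python
import json


def build_update_policies(source_tables, loader="TargetTable_loader"):
    """Return one update-policy object per source table.

    Assumption: the loader function takes the source table name as a
    string parameter, as in the answer above.
    """
    return [
        {
            "IsEnabled": True,
            "Source": name,
            # Pass the table name as a literal so table(TableName) can resolve it.
            "Query": f"{loader}('{name}')",
            "IsTransactional": True,
            "PropagateIngestionProperties": False,
        }
        for name in source_tables
    ]


policies = build_update_policies(["SourceTable1", "SourceTable2"])
# The JSON text can then be pasted into a
# `.alter table TargetTable policy update` command.
print(json.dumps(policies, indent=2))
```

Generating the array this way keeps a single function definition while each policy still names its own source table.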
I am new to AWS in general. I am building a relatively simple application with Amplify, but I've used Google Firebase before. My question is: is there a way to set a constraint for a field to be non-negative? I have an application that does transactions and I don't want my balance to be negative. I just need a simple error/exception. Is it possible to set a field constraint in DynamoDB that says "This field should be >= 0"?
I also checked whether it was possible to do it in the Amplify-generated VTL resolver of my GraphQL mutation, and it is indeed possible to set some constraints. But somehow it allows the operation and fails on the next one (when the balance in the DB is already < 0, as if it checks the condition before applying the update). I tried something like "current_balance - transaction >= 0" but I couldn't get it to work.
So it seems that the only way is to create a custom lambda resolver that does the various checks before submitting the mutation to DynamoDB. I haven't tried it yet but I don't understand how I can do a check on the current balance (stored in the DB) without doing a query.
More generally, is it even possible to validate fields (even with simple assertions like non-negative) on Amplify/DynamoDB? Would moving to another DB like Aurora help?
Thanks for your help.
DynamoDB supports conditional updates, which apply an update only when the given condition is met. You can set the condition current_balance >= cost for your update.
However, the negative balance is not the main problem. What you should address is how to prevent other requests from updating the same current_balance at the same time, or in short, race conditions on current_balance. In order to deal with that, you also need a conditional update whose condition is "current_balance = initial_balance". The initial_balance is, I guess, what you get from DynamoDB at the very beginning of the purchase process.
Sample VTL code
#set( $remaining_balance = $initial_balance - $transaction_cost )
#if( $remaining_balance < 0 )
$util.error("Insufficient balance")
#end
{
"version" : "2018-05-29",
"operation" : "UpdateItem",
"key": { <your-dynamodb-key> },
"update" : {
"expression" : "SET current_balance = :remaining_balance",
"expressionValues" : {
":remaining_balance" : $util.dynamodb.toNumberJson($remaining_balance)
}
},
"condition": {
"expression": "current_balance = :initial_balance",
"expressionValues" : {
":initial_balance" : $util.dynamodb.toNumberJson($initial_balance)
}
}
}
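The balance check mentioned at the start of this answer can also be expressed as a single atomic write: subtract the cost in the update expression and require that the stored balance still covers it. Below is a minimal Python sketch that only builds the request arguments for DynamoDB's UpdateItem. The table name Accounts, the key attribute id, and the helper name build_debit_request are hypothetical.

```python
def build_debit_request(account_id, cost, table_name="Accounts"):
    """Build keyword arguments for a DynamoDB UpdateItem call.

    Assumptions: a table named "Accounts" keyed by a string attribute
    "id"; both names are placeholders for illustration.
    """
    if cost < 0:
        raise ValueError("cost must be non-negative")
    return {
        "TableName": table_name,
        "Key": {"id": {"S": account_id}},
        # Decrement in place, so no prior read of the balance is needed.
        "UpdateExpression": "SET current_balance = current_balance - :cost",
        # The write is rejected if the stored balance cannot cover the cost.
        "ConditionExpression": "current_balance >= :cost",
        # The low-level API encodes numbers as strings under the "N" type.
        "ExpressionAttributeValues": {":cost": {"N": str(cost)}},
    }


request = build_debit_request("user-123", 25)
print(request["ConditionExpression"])
```

With boto3, these arguments would be passed as client.update_item(**request). A failed condition surfaces as a ConditionalCheckFailedException, which can be mapped to an "insufficient balance" error.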
I have a partitioned collection with about 400k documents in a particular partition. Ideally this would be more distributed, but I need to deal with all the documents in the same partition for transaction considerations. I have a query which includes the partition key and the document id, which returns quickly with 2.58 RUs of usage.
This query is dynamic and could potentially be constructed with an IN clause to search for multiple document ids. I therefore added an ORDER BY to ensure the results come back in a consistent order. Adding the clause, however, caused the RUs to skyrocket to almost 6000! Given that the WHERE clause should filter the results down to a handful before sorting, I was surprised by these results. It almost seems like the ORDER BY is applied before the WHERE clause, which can't be right. Is there something under the covers with the ORDER BY clause that would explain this behavior?
Example document:
{
  "DocumentType": "InventoryRecord",              (PartitionKey, String)
  "id": "7867f600-c011-85c0-80f2-c44d1cf09f36",   (DocDB assigned GUID, stored as string)
  "ItemNumber": "123345",                         (String)
  "ItemName": "Item1"                             (String)
}
With a Query looking like this:
SELECT * FROM c where c.DocumentType = 'InventoryRecord' and c.id = '7867f600-c011-85c0-80f2-c44d1cf09f36' order by c.ItemNumber
You should at least add a range index on ItemNumber. This should ensure the ordering works as expected. The addition to your indexing policy would look like this:
{
"path": "/ItemNumber/?",
"indexes": [
{
"kind": "Range",
"dataType": "String",
"precision": -1
}
]
}
I have the following situation:
I have a team entity, in each team we have one or more users.
At first I thought about creating an array of IDs inside each team, then downloading all the teams and using JavaScript to go through these IDs and fetch the corresponding users.
Something like this:
"teams": {
"xxxxxxx": {
"ids": ["bKvysPZZCudBKbbjLYV8ZKr1NUo1", "XOvysPZZCudBKbbjLYV8ZKr1NUo1"]
}
}
But I do not know if it is the best solution. I would like your opinion.
Thanks.
I would recommend making a dictionary of IDs where each ID maps to the boolean value true. For example:
"team-users": {
"team1": {
"uid1": true,
"uid2": true,
...
}
}
If you want to get the teams that a user is a part of, use a parallel structure in your database. Add the following node:
"user-teams": {
"uid1": {
"team1": true,
"team2": true,
...
}
}
Reading from this separate node is faster than querying.
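Keeping the two nodes consistent is easiest when both entries are written together. A minimal Python sketch, assuming the node names team-users and user-teams from above, builds the path-to-value map that a multi-location update would take. The helper name membership_update is hypothetical.

```python
def membership_update(team_id, user_id, is_member=True):
    """Return a path -> value map touching both index nodes.

    Firebase deletes a path when null is written to it, so None is
    used here to drop a membership from both sides.
    """
    value = True if is_member else None
    return {
        f"team-users/{team_id}/{user_id}": value,
        f"user-teams/{user_id}/{team_id}": value,
    }


print(membership_update("team1", "uid1"))
```

Writing both paths in one update avoids a state where a user appears in a team's member list but the team is missing from the user's list.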
I have three models, User, Project and ProjectMember. Keeping things simple, the models have the following attributes:
User
- id
Project
- id
- owner_id
- is_published
ProjectMember
- user_id
- project_id
Using sequelize.js, I want to find all projects where the project owner is a specific user, or where there is a project member for that project whose user is that user, or where the project is published. I imagine the raw SQL would look something like this:
SELECT p.*
FROM Project p
LEFT OUTER JOIN ProjectMember m
ON p.id = m.project_id
WHERE m.user_id = 2
OR p.owner_id = 2
OR p.is_published = true;
There are plenty of examples out there on how to perform a query on an association, but I can find none on how to do so conditionally. I have been able to query just the association using this code:
projModel.findAll({
where: { },
include: [{
model: memberModel,
as: 'projectMembers',
where: { 'user_id': 2 }
}]
})
How do I combine this where query in an $or to check the project's owner_id and is_published columns?
It's frustrating: I worked for hours trying to solve this problem, and as soon as I asked here I found a way to do it. As it turns out, the sequelize.js developers recently added the ability to use raw keys in your where query, making it (at long last) possible to query an association inside the main where clause.
This is my solution:
projModel.findAll({
where: {
$or: {
'$projectMembers.user_id$': 2,
owner_id: 2,
is_published: true
}
},
include: [{
model: memberModel,
as: 'projectMembers'
}]
})
Note: this solution breaks if you use 'limit' in the find options. As an alternative, you can fetch all results and then manually limit them afterwards.
The "Structure your data" section of the Firebase docs shows how to structure data for a many-to-many user-group relationship. But why have they used "reference": true on both sides instead of a simple array of ids?
It could be done either way:
A user having an array of groups
"groups" : [ "groupId1", "groupId2", ... ]
A user having a map of groups
"groups": {
"groupId1" : true,
"groupId2" : true,
..
}
They have done it the second way. What is the reason for that?
Something was said about this in a video from Google I/O 2016, but I'm unable to recall it.
Example from structure your data:
// An index to track Ada's memberships
{
"users": {
"alovelace": {
"name": "Ada Lovelace",
// Index Ada's groups in her profile
"groups": {
// the value here doesn't matter, just that the key exists
"techpioneers": true,
"womentechmakers": true
}
},
...
},
"groups": {
"techpioneers": {
"name": "Historical Tech Pioneers",
"members": {
"alovelace": true,
"ghopper": true,
"eclarke": true
}
},
...
}
}
Firebase recommends against using arrays in its database for most cases. Instead of repeating the reasons here, I'll refer you to this classic blog post on arrays in Firebase.
Let's look at one simple reason you can easily see from your example. Since Firebase arrays in JavaScript are just associative objects with sequential integer keys, your first sample is stored as:
"groups" : {
"0": "groupId1",
"1": "groupId2"
}
To detect whether this user is in groupId2, you have to scan all the values in the array. When there are only two values, that may not be too bad. But it quickly gets slower as you have more values. You also won't be able to query or secure this data, since neither Firebase Queries nor its security rules support a contains() operator.
Now look at the alternative data structure:
"groups": {
"groupId1" : true,
"groupId2" : true
}
In this structure you can see whether the user is in groupId2 by checking precisely one location: /groups/groupId2. If that key exists, the user is a member of groupId2. The actual value doesn't really matter in this case; we just use true as a marker value (since Firebase will delete a path if there's no value).
This will also work better with queries and security rules, because you now "just" need an exists() operator.
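The difference between the two layouts can be sketched in Python, with plain dictionaries standing in for the stored data (an illustration only; no Firebase SDK is involved, and the function names are made up for this example):

```python
# Array-style storage: membership requires scanning the values.
groups_as_array = {"0": "groupId1", "1": "groupId2"}

# Map-style storage: membership is a single key lookup.
groups_as_map = {"groupId1": True, "groupId2": True}


def in_array(groups, group_id):
    """Check every stored value until a match is found (linear scan)."""
    return any(value == group_id for value in groups.values())


def in_map(groups, group_id):
    """Check one location: does the key exist?"""
    return group_id in groups


print(in_array(groups_as_array, "groupId2"), in_map(groups_as_map, "groupId2"))
```

Both checks give the same answer, but only the map version maps onto a single path lookup such as /groups/groupId2.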
For some great insights into this type of modeling, I highly recommend that article on NoSQL data modeling.