Select a tenant's DynamoDB table in AppSync using their Cognito identity

I am building a multi-tenant application, with the tenants sharing a GraphQL API backed by AppSync. All of a tenant's metadata is stored in a single DynamoDB table, and AppSync uses a DynamoDB data source on that table to resolve queries and mutations. I would like to pick which table to perform the operation on based on the Cognito identity in the request resolver.
In AppSync, each data source must specify a table to resolve the DynamoDB operations, and this cannot be overridden in the AppSync resolver (except for some batch operations that allow multiple table operations in one resolver). There does not appear to be a way to dynamically select a data source in a pipeline resolver. Since each tenant's metadata is in a separate DynamoDB table, there does not appear to be a way to define a single API that is shared between tenants when using only DynamoDB data sources.
I have tried using an HTTP resolver and recreating the DynamoDB request in it, but unfortunately there does not seem to be a $util.dynamodb.fromDynamoDB helper to convert the typed DynamoDB response into the plain JSON that the DynamoDB resolver produces automatically.
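For reference, the conversion I'm missing is what boto3's TypeDeserializer does in Python. This is only to illustrate the typed-to-plain mapping the resolver would need; it is not something that can run inside AppSync:

```python
from boto3.dynamodb.types import TypeDeserializer

deserializer = TypeDeserializer()

# A typed item as returned by the low-level DynamoDB API (and by an HTTP resolver)
typed_item = {"pk": {"S": "tenant-123"}, "count": {"N": "42"}}

# Convert each typed attribute value into a plain Python value
plain_item = {k: deserializer.deserialize(v) for k, v in typed_item.items()}
print(plain_item)  # {'pk': 'tenant-123', 'count': Decimal('42')}
```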
Here are my options as I understand them:
Put all tenant metadata into a single table, and use the tenant ID as the partition key for each item
Use a Lambda resolver that makes the subsequent DynamoDB request
Create a different AppSync API per tenant with the same definitions for everything (but a different Cognito pool configuration for each)
Option 1 doesn't work for me because I'm already using the partition key in a number of different ways, especially to generate several sparse global secondary indexes on the table. I'd also like to provision table throughput on a per-tenant basis, rather than sharing it across all tenants at once.
Option 2 may be the most straightforward answer, but I would prefer not to have to set up Lambdas for most of my resolvers when I'm just trying to perform a straightforward operation on DynamoDB.
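For illustration, each such Lambda would be little more than the following sketch (the custom claim name and the per-tenant table naming convention are just how I would wire it up, not anything AppSync prescribes):

```python
import boto3

dynamodb = boto3.resource("dynamodb")

def handler(event, context):
    # A direct Lambda resolver receives the Cognito identity in the event;
    # assume the tenant ID lives in a custom claim (hypothetical name).
    tenant_id = event["identity"]["claims"]["custom:tenantId"]

    # Hypothetical convention: one metadata table per tenant.
    table = dynamodb.Table(f"TenantMetadata-{tenant_id}")

    # A straightforward GetItem keyed by the resolver arguments.
    result = table.get_item(Key={"pk": event["arguments"]["id"]})
    return result.get("Item")
```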
Option 3 definitively solves the problem, but now I have to maintain as many APIs (and their endpoints) as I have tenants, which defeats the purpose of this question.
Are there any other options to support a multi-tenant setup with distinct tables per tenant in AppSync, preferably performing the routing logic in resolver templates?

Related

Best way to handle multi-container transaction operations in Cosmos DB NoSQL?

Currently I am trying to design an application where we have a Cosmos DB account representing a group of customers, with:
One container used as an overall metadata store that contains all customers
Other containers that will contain data specific to one customer, with data partitioned according to different categories of customer history, etc.
When we onboard a new customer (which will not happen often, and only once per customer) we'd like to create a row in the overall customer metadata container, then provision the customer-specific container, and roll back the transaction if either step fails. (In the future we'd like to remove customers as well.)
Unfortunately, Cosmos DB NoSQL only supports transactions within a single logical partition of one container, and does not support multi-container transactions. Our own POC indicates the MongoDB API does support this, but unfortunately MongoDB does not fit our use case, as we need support for Azure Functions.
The heart of the problem here isn't whether Cosmos DB supports distributed transactions. The core problem is that you can't enlist an Azure control-plane action (in this case, creating a container resource) in a transaction.
Since you're building in the cloud, my recommendation would be to employ the outbox pattern to manage the provisioning state for your customers. There's an easy-to-understand example here you can read.
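As a rough sketch of that idea with the azure-cosmos Python SDK (the container layout, field names, and partition key paths are illustrative assumptions, and the retry worker is omitted):

```python
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient(url="https://<account>.documents.azure.com:443/", credential="<key>")
database = client.get_database_client("customers")
# Assumes the metadata container is partitioned on /partitionKey
metadata = database.get_container_client("metadata")

def onboard_customer(customer_id: str) -> None:
    # 1. Record the intent first; this document doubles as the outbox entry.
    metadata.upsert_item({
        "id": customer_id,
        "partitionKey": customer_id,
        "provisioningState": "pending",
    })

    # 2. Perform the control-plane action. If this fails, the pending
    #    document lets a background worker retry (or clean up) later.
    database.create_container_if_not_exists(
        id=f"customer-{customer_id}",
        partition_key=PartitionKey(path="/category"),
    )

    # 3. Mark provisioning complete.
    metadata.upsert_item({
        "id": customer_id,
        "partitionKey": customer_id,
        "provisioningState": "done",
    })
```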
Given you are building a multi-tenant application for Cosmos DB and using containers as your tenant boundary, please note that the maximum number of databases and/or containers in an account is 500. Please see Service Quotas for more information.

Grant one IAM role access to a large number of DynamoDB tables

I have an AppSync app defined using a master CloudFormation stack and more than a dozen nested stacks. Each nested stack defines a DynamoDB table, an AppSync DataSource for that table, and an IAM role for that DataSource to access that table. The DataSource depends on the role, which depends on the table.
I would like to consolidate these IAM roles, for three reasons:
The role definitions are very repetitive and boilerplate-y.
There are many copies of this app, and it adds up to a lot of IAM roles — enough that we're running close to the soft limits.
Some resolvers use DynamoDB batch operations to access multiple tables, so at least some of the IAM roles must grant access to multiple tables anyway.
I do not want to give the role blanket access to all DynamoDB tables in the account.
The simplest way to grant one role access to every required table would be to list them manually in the policy document. This has the obvious downside of requiring that the policy be manually kept in sync when new tables are added. However, there is also a dependency problem: the DataSource in a nested stack depends on a role in the master stack, which depends on tables in the nested stacks.
I would have liked to use tags: grant for all DynamoDB tables that have a certain tag, then set that tag for each table. This way, the IAM role would not need to be edited when a new table was added. But apparently DynamoDB does not support tag-based conditions.
Is there an easy way to grant a single IAM role access to many DynamoDB tables without granting access to all of DynamoDB and without individually listing the tables in the role?
If you can name your tables in a way that gives them the same prefix, you can use a wildcard in the resource ARN:
arn:aws:dynamodb:<Region>:<Account>:table/MyPrefix-*
That will match all tables whose names start with MyPrefix-.
If you are using generated names, you can probably use the AWS::StackName value in place of MyPrefix, but be aware that with nested stacks that value may get shortened.
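For illustration, such a prefix-scoped policy might look roughly like this (sketched here with boto3; the region, account ID, action list, and prefix are all placeholders):

```python
import json
import boto3

iam = boto3.client("iam")

# Grant data access to every table sharing a naming prefix, instead of
# listing each table ARN individually.
policy_document = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "dynamodb:GetItem",
            "dynamodb:PutItem",
            "dynamodb:Query",
            "dynamodb:BatchGetItem",
        ],
        "Resource": [
            "arn:aws:dynamodb:us-east-1:123456789012:table/MyPrefix-*",
            # Include index ARNs if your resolvers query GSIs.
            "arn:aws:dynamodb:us-east-1:123456789012:table/MyPrefix-*/index/*",
        ],
    }],
}

iam.create_policy(
    PolicyName="AppSyncDynamoDBByPrefix",
    PolicyDocument=json.dumps(policy_document),
)
```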

Azure Cosmos DB secret connection strings per database

In my Azure Cosmos DB account, I can add multiple databases (each containing multiple collections).
However, I can only seem to find account-level connection strings (secrets) that are valid for every database, differing only in the database-name section.
I find this odd. Is this expected? If I want more granular control, do I need to create a separate account for each database?
PS: I'm using the Mongo API if it's somehow relevant.
Cheers
The account-level connection strings you mentioned in the question are master keys. Based on this document, Azure Cosmos DB uses two types of keys to authenticate users and provide access to its data and resources.
Master keys cannot be used to provide granular access to containers and documents.
If you want more granular control, take a look at resource tokens, which provide access to specific containers, partition keys, documents, attachments, stored procedures, triggers, and UDFs. For more details, please refer to this link.
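For illustration, a rough sketch of minting a resource token with the azure-cosmos Python SDK (note this requires the SQL/Core API rather than the MongoDB API; the database, container, user, and permission names are made up):

```python
from azure.cosmos import CosmosClient

client = CosmosClient(url="https://<account>.documents.azure.com:443/", credential="<master-key>")
database = client.get_database_client("mydb")
container = database.get_container_client("orders")

# Create a user scoped to the database, then grant it read-only access to
# a single container. The resulting token can be handed to a client app
# instead of the account-level master key.
user = database.create_user({"id": "app-reader"})
permission = user.create_permission({
    "id": "orders-read",
    "permissionMode": "Read",
    "resource": container.container_link,
})
print(permission.properties["_token"])  # the scoped resource token
```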

Is DynamoDB streams the right option for this use case?

I have a DynamoDB table that contains key-value pairs that will be read by a number of applications. On startup, each application will read the entire table and cache it in-memory.
The problem I'm trying to solve is that of getting the applications to update their cache if one or more items in the DynamoDB table have been modified.
DynamoDB streams initially seemed to be the right approach to solving the problem. I have implemented the consumer using Kinesis Client Library (KCL) as recommended by AWS. While implementing it, however, I have encountered some problems that make me believe that I'm on the wrong track. Specifically:
When I create a new consumer using KCL, it creates a new DynamoDB table to do the housekeeping of leases and checkpoints, such that when the application is restarted, KCL knows which records have been consumed and which have not. This is not what I need for this problem. Any stream records that are created while the application is offline are irrelevant, since the entire table is read upon application startup.
Several instances of the same application are running at the same time. Each of them needs to be notified of table updates. To implement that in KCL I need to assign a unique application name to each of them. Otherwise they will share the lease table and only one of the applications will get notified. One table for each application instance doesn't seem right. Also I would then need something to remove unused tables.
I also implemented it using the low level API instead. That works fine when there's a single shard. My implementation doesn't handle re-sharding like KCL, however, so it's too fragile. It seems wrong to have to implement handling of re-sharding for the simple problem I'm trying to solve.
I'm beginning to consider other solutions like:
Implementing a Lambda function that gets triggered on updates to the table. The function sends a notification to an SNS topic. Consumers create SQS subscriptions on the topic and get notified via those. This solution has too many moving parts for my liking.
Make the applications periodically re-read the entire table and determine for themselves whether changes have been made. This solution feels a bit primitive, but seems to be the simplest.
All solutions that I have considered so far have quite significant drawbacks. What am I missing?
It depends on how your KCL is pushing to the dependent apps, but I believe the SQS path is the correct choice.
You can add a presumably infinite number of consumers without being throttled.
When you do add another dependent app, it won't require changing your KCL to push to it; the new app will simply watch the SQS queue.
You gain the ability to monitor the queue when issues happen.
More moving parts to set up, but once you have the Streams -> SNS -> SQS pipe in place, it's basically bulletproof.
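For what it's worth, the Streams -> SNS hop can be a tiny Lambda subscribed to the stream, along these lines (the topic ARN and message shape are placeholders):

```python
import json
import boto3

sns = boto3.client("sns")
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:table-updates"  # placeholder

def handler(event, context):
    # Triggered by the DynamoDB stream; fan each change out to SNS so that
    # every application instance's SQS queue receives it.
    for record in event["Records"]:
        sns.publish(
            TopicArn=TOPIC_ARN,
            Message=json.dumps({
                "eventName": record["eventName"],  # INSERT / MODIFY / REMOVE
                "keys": record["dynamodb"]["Keys"],
            }),
        )
```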
Just my 2¢.
Nowadays an AWS AppSync GraphQL API with subscriptions may be the simplest approach to power this type of application, with the fewest moving parts.
Whenever one of your applications starts up, it connects to your AppSync GraphQL API using the Amplify framework or AppSync SDK and subscribes to the updates it's interested in. Then whenever an application updates information in the table via your GraphQL API, all your other applications will be notified of the change, along with the relevant changed data.
AppSync integrates well with DynamoDB out of the box, allowing you to generate DynamoDB tables with appropriate indexes alongside your GraphQL schema, or to generate GraphQL from your existing DynamoDB tables if you so choose. At a higher level, Amplify can even automatically generate an AppSync GraphQL API with associated DynamoDB tables, indexes, entity relationships, and extras like Elasticsearch-backed search, using its GraphQL transformers.

SQL Azure Sharding and Social Networking Apps

Sharding on SQL Azure is one of the top recommended options to get over the 50 GB database size limit it currently has. A key strategy in sharding is to group related records, called atomic units, together in a single shard, so that the application only needs to query a single SQL Azure instance to retrieve the data.
However, in applications such as social networking apps, grouping an atomic unit in a single shard is not trivial, due to the interconnectivity of entities and records. What would be a recommended approach in such a scenario?
Also, in a sharded DB, what primary keys should be used for the tables: BIGINT or GUID? I currently use BIGINT identity columns, but if the data were ever merged this would be a problem due to conflicts between the values in different shards. I have heard some people recommend GUIDs (uniqueidentifier), but I'm wary of how this could affect performance. Clustering on-premise SQL Server tables on uniqueidentifier columns is problematic, and I wonder how SQL Azure handles similar strategies if I were to employ a uniqueidentifier column.
For a social networking app, I'd personally forgo SQL and instead leverage a NoSQL solution such as MongoDB or Azure Table Storage. These denormalized but inexpensive systems allow you to create multiple entity datasets that are customized to your various indexing needs.
So instead of having something like...
User1 -< relationshiptable -< User2
You'd instead have tables like
Users
User1's Friends
User2's Friends
If Users 1 and 2 are friends, then you'd have two entries to define that relationship, not one. But it makes retrieving a list of a specific user's friends trivial. It also opens you up to executing tasks in parallel, by searching multiple index tables at a time.
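Sketched with the azure-data-tables Python SDK (the table name and key layout are illustrative assumptions):

```python
from azure.data.tables import TableServiceClient

service = TableServiceClient.from_connection_string("<connection-string>")
friends = service.create_table_if_not_exists("Friends")

def add_friendship(user_a: str, user_b: str) -> None:
    # Store the relationship twice, once per direction, so that listing
    # either user's friends is a single-partition query.
    friends.upsert_entity({"PartitionKey": user_a, "RowKey": user_b})
    friends.upsert_entity({"PartitionKey": user_b, "RowKey": user_a})

def list_friends(user: str) -> list[str]:
    # All of a user's friends live in one partition.
    return [e["RowKey"] for e in friends.query_entities(f"PartitionKey eq '{user}'")]
```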
This process scales extremely well, but does require that you invest more time in how the relationships are maintained. Admittedly, this is a simplified example. Things get much more complex when you start discussing tasks like searching across your entire user base.
