AWS DynamoDB data transfer (egress) pricing

I found this article regarding data transfer prices for certain services, but not including DynamoDB.
What is the rate for DynamoDB data transfer out?

It is priced the same as EC2 data transfer out, but the charge only applies when you access DynamoDB from outside that AWS region. If you want to verify, use the AWS calculator and enter 1 GB/day of EC2 data transfer out, then do the same for DynamoDB. It should be $1.40 for each.
Three examples where you might run into this charge are:
Apps in us-west-2 accessing a DynamoDB table in us-east-1 (sketched in the example after this list).
Apps accessing a DynamoDB table over a VPN or dedicated link.
Accessing a DynamoDB table over the public Internet. For obvious security reasons, this is usually not a good idea; if you are going to do it, really lock it down with IAM permissions!
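If it helps to see where that charge comes from in the first case, here is a minimal boto3 sketch of an app in us-west-2 reading a table in us-east-1; the table name, key, and item are made up for illustration.

```python
# Hypothetical illustration of the cross-region case: code running in us-west-2
# reading a table that lives in us-east-1. Pointing the client at the remote
# region is what causes the response bytes to leave us-east-1 and be billed as
# data transfer out.
import boto3

dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
table = dynamodb.Table("Orders")  # placeholder table name

# Every item returned here crosses regions, so its size counts toward the
# data transfer out charge discussed above.
response = table.get_item(Key={"order_id": "12345"})
print(response.get("Item"))
```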

Related

Using a database other than DynamoDB with iot-core

I'm just getting started with IoT and note that iot-core uses DynamoDB. That's not going to work for me (no table joins, no stored procedures, etc.). Is it possible to use a different DB? I usually use Postgres, but I appreciate it may not be the quickest for a lot of inserts. What do others use?
For IoT use cases and telemetry data storage, I'd suggest using Amazon Timestream. When data is sent to AWS IoT Core, you can use an IoT Rule to forward it to a Kinesis Data Stream (useful here for decoupling as well as for batch processing in the downstream pipeline). An AWS Lambda function can then pull data from the Kinesis Data Stream and enrich it if needed (for example, adding building, user, or company information) before storing it in Timestream. You can then use Grafana to visualize the data from Timestream.
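If you go that route, the Lambda in the middle of the pipeline is fairly small. Here is a minimal sketch, assuming the IoT Rule forwards JSON telemetry like {"device_id": ..., "temperature": ...} into the Kinesis Data Stream; the Timestream database and table names are placeholders.

```python
import base64
import json
import time

import boto3

timestream = boto3.client("timestream-write")


def handler(event, context):
    """Lambda handler for a Kinesis event source: decode records, reshape them,
    and write them to Timestream."""
    records = []
    for record in event["Records"]:
        # Kinesis delivers each payload base64-encoded.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        records.append({
            "Dimensions": [{"Name": "device_id", "Value": str(payload["device_id"])}],
            "MeasureName": "temperature",
            "MeasureValue": str(payload["temperature"]),
            "MeasureValueType": "DOUBLE",
            "Time": str(int(time.time() * 1000)),  # milliseconds since epoch
        })

    if records:
        timestream.write_records(
            DatabaseName="iot_telemetry",   # placeholder database name
            TableName="sensor_readings",    # placeholder table name
            Records=records,
        )
```

The enrichment step (adding building, user, or company information) would go between decoding the payload and appending the record.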

AWS Amplify creating multiple DynamoDB tables with duplicate information?

I'm building an offline-first mobile application using AWS Amplify, using the local DataStore and cloud sync. So far, I'm following the documentation without any variation (I think.)
As of now, I only have one model; let's call it Thing. I noticed that after running amplify push, my environment contains not one but two DynamoDB tables:
Thing-<app-id>-<env>
AmplifyDataStore-<app-id>-<env>
Whenever I save a Thing entity, it appears to be persisted redundantly in both tables. This effectively doubles my DynamoDB storage costs.
Is there a sound technical reason for this, or any way to avoid it? Or am I just making a mistake somewhere that is causing it to persist twice?
Assuming you have k models, Amplify DataStore will provision k + 1 tables. The extra table you're noticing is called the "delta sync table." It is used to store incremental changes that have occurred since the last time the client fully synchronized with AppSync. The delta sync table carries a short TTL on its records, and they are dropped if not used within that window of time.
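If you want to reassure yourself that those delta records really do expire rather than sit there as billable storage, you can check the TTL configuration on that table. A quick boto3 sketch, reusing the table-name placeholders from the question:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Substitute your real app id and environment name in the placeholder below.
response = dynamodb.describe_time_to_live(
    TableName="AmplifyDataStore-<app-id>-<env>"
)
# Expect TimeToLiveStatus to be ENABLED, along with the attribute name
# DataStore uses to expire delta records.
print(response["TimeToLiveDescription"])
```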
To learn more about Delta Sync and DataStore generally, I recommend Ed Lima's AWS AppSync offline reference architecture – powered by the Amplify DataStore. See particularly the section labeled "The Delta Sync table."
Source: I'm an engineer on this product team.

Where to use CosmosDB?

Cosmos DB has a nice feature, global distribution, which gives faster responses. That is useful for mobile applications that access Cosmos DB directly with users spread across the globe.
However, I am using an ASP.NET web application hosted in Azure, so my application-to-database communication will always cover a fixed distance.
Can I benefit from Cosmos DB in this case?
This is for an Azure-hosted ASP.NET application.
You can benefit from Cosmos DB when you (and your code) are comfortable with NoSQL concepts, when your reads and writes have different implementations, when you are planning to move to microservices, or when other projects depend on or communicate with your web app and share the same database.
There are some points you need to take into account before choosing CosmosDB as the database.
Pricing model! Cosmos DB is not a cheap database, and its pricing model is based on provisioned throughput. Requests that exceed the provisioned throughput are rejected (throttled) by the database, so first make sure you completely understand how that works.
Like other document databases, if you want to keep a graph of objects in a single document, you should consider how to handle concurrent updates to that document (if that applies to your app). Make sure you understand the differences between document and relational databases.
But regarding the benefits:
It has great integration support with other PaaS services in Azure
It scales very well if you have a good partitioning strategy (see the sketch below)
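To make the partitioning point concrete, here is a rough sketch using the azure-cosmos Python SDK; the endpoint, key, and names are placeholders, and the same concepts carry over to the .NET SDK an ASP.NET app would use.

```python
from azure.cosmos import CosmosClient, PartitionKey

# Placeholder endpoint and key; in a real app these come from configuration.
client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
database = client.create_database_if_not_exists("shop")

# A partition key with high cardinality and even access (e.g. /customerId) is
# what lets Cosmos DB spread storage and the provisioned RU/s across physical
# partitions, which is where the scaling benefit above comes from.
container = database.create_container_if_not_exists(
    id="orders",
    partition_key=PartitionKey(path="/customerId"),
    offer_throughput=400,  # provisioned RU/s; requests beyond this are throttled
)
```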

Amazon API ASP.NET

I am working on a site (ASP.NET MVC) that will ultimately display millions of books. I already have the titles and authors of these books in my MySQL database.
The goal is that when a user searches for a book, the top 20 matches (title and author) appear on the page. I then plan to use the Amazon API to get more information (ISBN, image, description, etc.) for these 20 books and flesh out the items via Ajax. I would then also add this info to MySQL so that the next time these specific books are requested, I already have the data.
My question is: which Amazon Web Service should I use? There are so many, like Amazon S3, Amazon SimpleDB, etc., and I just don't know which would be best for my needs. Cost is also a factor.
Any guidance would be greatly appreciated.
The API you're looking for is Amazon's Product Advertising API:
https://affiliate-program.amazon.com/gp/advertising/api/detail/main.html
In short, Amazon S3 is a technology oriented toward storing large objects, whereas SimpleDB is a non-relational database (as MongoDB and RavenDB are).
We use the first for storing static files (JavaScript, CSS and pictures).
The first is cheaper, but you can only retrieve one "file" at a time. The second gives you some degree of query support.
If you need a relational database, you could use Amazon RDS, which is a managed MySQL database ready for replicas.
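As for the caching plan in the question (check MySQL first, fall back to the Amazon API, then store the result), that is a straightforward cache-aside flow. A minimal sketch with the MySQL and Product Advertising API calls stubbed out as hypothetical helpers, since only the control flow is the point:

```python
from typing import Optional


def get_details_from_db(title: str) -> Optional[dict]:
    """Placeholder for a MySQL lookup of previously fetched book details."""
    return None  # pretend nothing is cached yet


def fetch_details_from_amazon(title: str) -> dict:
    """Placeholder for a signed Product Advertising API request."""
    return {"title": title, "isbn": "unknown", "description": "..."}


def save_details_to_db(details: dict) -> None:
    """Placeholder for writing the enriched record back into MySQL."""
    print("cached:", details)


def get_book_details(title: str) -> dict:
    cached = get_details_from_db(title)
    if cached is not None:
        return cached                     # already enriched on a previous request
    details = fetch_details_from_amazon(title)
    save_details_to_db(details)           # next request for this book skips the API
    return details


print(get_book_details("The Pragmatic Programmer"))
```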

Sharding at application level

I am designing a multi-tenant system and am considering sharding by tenant at the application layer instead of at the database level.
Hypothetically, the way this should work is that for an incoming request, a router process consults a global collection of tenants containing the primary attributes needed to determine the tenant for the request, as well as its virtual shard id. This virtual shard id is further mapped to an actual shard.
The actual shard contains both the application code and the whole data set for this tenant. These shards would be LNMP (Linux, Nginx, MySQL/MongoDB, PHP) servers.
The router process should act as a proxy. It should be able to run some code to determine the target shard for an incoming request, based on the collection stored in some local DB or files. To scale this better, I am considering making the shards themselves act as routers too, so that each can run a reverse proxy that forwards the request to the appropriate shard. Maybe the nginx instance running on a shard can also act as that reverse proxy, but how would it execute the application logic needed to match the request to the appropriate shard?
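To make that concrete, here is roughly what I picture the router doing; the shard counts and hostnames below are made up:

```python
# Tenant -> virtual shard -> physical shard. Keeping the number of virtual
# shards fixed (and larger than the number of servers) means moving tenants
# later only requires re-pointing virtual shards, not re-hashing everything.
import hashlib

NUM_VIRTUAL_SHARDS = 1024

# In practice this map would live in the router's local DB or config files.
VIRTUAL_TO_PHYSICAL = {
    vshard: f"shard{vshard % 4}.internal.example.com"  # 4 physical LNMP shards
    for vshard in range(NUM_VIRTUAL_SHARDS)
}


def virtual_shard_for(tenant_id: str) -> int:
    digest = hashlib.md5(tenant_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_VIRTUAL_SHARDS


def route(tenant_id: str) -> str:
    return VIRTUAL_TO_PHYSICAL[virtual_shard_for(tenant_id)]


print(route("acme-corp"))  # e.g. shard3.internal.example.com
```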
I will appreciate any ideas and suggestions for this router implementation.
Thanks
Another option would be to use a product such as dbShards. dbShards is the only sharding product that shards at the application level, so you can use any RDBMS (Postgres, MySQL, etc.) and still shard your database without having to put some kind of proxy in between. A lot of other sharding products rely on a proxy to point transactions to the correct shard, but dbShards knows where to go without having to "ask" anyone else. Great product.
Unless you expect your tenants to generate approximately equal data volume, sharding by tenant will not be very efficient.
As to application level sharding in general, let me share my own experience:
Version 1 of our high-volume SaaS product sharded at the application level. You will find that resharding as you grow will be a major headache if you shard against a SQL type solution at the application level, or you will have to write significant tooling to automate the process.
We switched to MongoDB (after considering multiple alternatives including Cassandra) in no small part because of all of the built-in support for resharding / rebalancing as data grows.
If your application does not need the relational capabilities of MySQL, I would suggest concentrating your efforts on MongoDB (since you have already identified that as a possible data platform) if you expect more than modest data growth. Allow MongoDB to handle the data sharding.
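If you do go the MongoDB route, letting MongoDB own the sharding mostly comes down to a couple of one-time admin commands. A hedged sketch with pymongo, where the database, collection, and key names are assumptions and the client is assumed to be connected to a mongos router:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://mongos.example.com:27017")  # placeholder router

client.admin.command("enableSharding", "saas")
client.admin.command(
    "shardCollection",
    "saas.events",
    key={"tenant_id": "hashed"},  # hashing spreads tenants evenly across shards,
                                  # though all of one tenant's data still lands together
)
```

From there the balancer handles chunk splitting and rebalancing as data grows, which is the resharding pain described above that you would otherwise have to automate by hand against a SQL solution.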
