Is it possible to scale out HashiCorp Vault using the DynamoDB storage backend?

I am using Vault on AWS with the DynamoDB backend. The backend supports HA.
storage "dynamodb" {
ha_enabled = "true"
region = "us-west-2"
table = "vault-data"
}
Reading the HA concept documentation:
https://www.vaultproject.io/docs/concepts/ha.html
To be highly available, one of the Vault server nodes grabs a lock within the data store. The successful server node then becomes the active node; all other nodes become standby nodes. At this point, if the standby nodes receive a request, they will either forward the request or redirect the client depending on the current configuration and state of the cluster -- see the sections below for details. Due to this architecture, HA does not enable increased scalability.
I am not interested in having a fleet of EC2 instances behind an ELB, where only one instance behaves like a master and talks to DynamoDB.
I would like to run N EC2 instances running Vault that read from and write to DynamoDB independently.
Because DynamoDB supports reads and writes from multiple EC2 instances, I would expect to be able to unseal Vault on multiple instances simultaneously and perform read and write operations. This should work even with ha_enabled = "false", without any leader election.
Why is this architecture not suggested in the documentation? Why would it not work? Is there a cryptographic limitation that I am missing?
Thank you!

Scaling out this way is a feature of Vault Enterprise. With it, you can set up a primary cluster and as many "secondary" clusters as you need, better known as performance replicas. Each cluster has its own storage and unseal mechanism, so you could have one cluster on DynamoDB and another on Raft. If both are on DynamoDB, you'll need a separate DynamoDB table for each.
But keep in mind that performance replicas will always forward write operations to the primary cluster. A write operation is something that affects the global state of Vault; in that sense, a POST to /transit is not considered a write operation.
Another possibility is to have your kv store mounted locally (with the -local flag). It will then behave like a primary even when mounted on a performance replica, at the price of not being replicated to the other clusters.
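For illustration only (not part of the original answer), here is a minimal sketch of enabling such a local mount with the hvac Python client; the address, token, and mount path are placeholders, and local=True corresponds to the -local flag mentioned above.

# Sketch: enable a KV v2 secrets engine as a *local* mount on a performance replica.
import hvac

client = hvac.Client(
    url="https://replica.vault.example.com:8200",  # hypothetical replica address
    token="s.xxxxxxxx",                            # placeholder token
)

client.sys.enable_secrets_engine(
    backend_type="kv",
    path="kv-local",               # hypothetical mount path
    options={"version": "2"},      # KV version 2
    local=True,                    # local mount: not replicated to other clusters,
                                   # so writes stay on this performance replica
)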
A final note: a DR (disaster recovery) cluster is an exact copy of the cluster it protects. Each cluster, whether a primary or a performance replica, can have its own DR cluster.

Related

Using endpoints of AWS ElastiCache for Redis

I am using AWS ElastiCache for Redis as the caching solution for my spring-boot application. I am using spring-boot-starter-data-redis and jedis client to connect with my cache.
Imagine that my cache is running in cluster mode enabled, with 3 shards and 2 nodes in each. I understand that the best way to connect is to use the configuration endpoint. Alternatively, I can list the endpoints of all the nodes and let the client do the job.
However, even if I use a single node's endpoint from one of the shards, my caching solution works. That doesn't look right to me. Even if it works, I feel it might cause problems in the cluster in the long run, since there are altogether 6 nodes partitioned into 3 shards but only one node's endpoint is being used. I have the following questions.
Does using one node's endpoint create an imbalance in the cluster?
or
Is that handled automatically by the AWS ElastiCache for Redis?
If I use only one node's endpoint, does that mean the other nodes will never be used?
Thank you!
To answer your questions:
Does using one node's endpoint create an imbalance in the cluster?
No.
Is that handled automatically by AWS ElastiCache for Redis?
Somewhat.
If I use only one node's endpoint, does that mean the other nodes will never be used?
No. All nodes are being used.
This is how cluster mode enabled works. In your case, you have 3 shards, meaning all your hash slots (where key-value data is stored) are divided among 3 sub-clusters, i.e. shards.
This was explained in this answer as well - https://stackoverflow.com/a/72058580/6024431
So, essentially, your nodes are smart enough to redirect your requests to the node that owns the hash slot where your data needs to be stored. So, no imbalances; Redis handles the redirection for you.
Now, when using node endpoints, you're going to face other problems.
ElastiCache runs on cloud infrastructure (essentially AWS hardware), and all hardware occasionally has issues. You have 3 primaries (1p, 2p, 3p) and 3 replicas (1r, 2r, 3r).
So, if a primary goes down due to a hardware issue (let's say 1p), its replica (1r) will be promoted to become the new primary for that shard.
The problem is that your application is connected directly to 1p, which has now been demoted to a replica, so all WRITE operations will fail.
And you would have to change the application configuration manually whenever this happens.
Alternatively, if you were using the configuration endpoint (or other cluster-level endpoints) instead of node endpoints, this issue would be at most a brief blip for your application, perhaps 1-2 seconds.
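As a hedged illustration (the asker uses Jedis/Spring, but the idea is the same in any cluster-aware client), here is a minimal redis-py sketch that connects through the configuration endpoint; the endpoint name is a placeholder.

# Sketch: connect via the configuration endpoint and let the cluster-aware client
# discover all shards and follow MOVED/ASK redirections automatically.
from redis.cluster import RedisCluster

rc = RedisCluster(
    host="my-cluster.xxxxxx.clustercfg.usw2.cache.amazonaws.com",  # configuration endpoint (placeholder)
    port=6379,
    decode_responses=True,
)

# Keys are hashed to slots and routed to whichever shard owns the slot;
# if a replica is promoted after a failure, the client re-discovers the topology.
rc.set("user:42", "alice")
print(rc.get("user:42"))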
Cheers!

Migrating from an unencrypted Redshift Cluster to Encrypted

I am trying to enable SSE with a Customer-Managed CMK in my production Redshift cluster to follow certain security protocols.
For POC purposes, I spun up a 1-node dc2.large Redshift cluster and, following this doc, I was able to enable SSE.
However, my question is, does enabling SSE encrypt the existing data in the cluster? If not, what steps should be taken?
Overall what are the downsides, if any, of enabling encryption at rest in a production Redshift cluster and what are the best practices?
There is no need to change anything in your code or existing pipelines/processes. This is disk encryption; it has nothing to do with your database connections or code. Note that when you enable encryption on an existing cluster, Redshift migrates your data to a new, encrypted cluster in the background, and the cluster is read-only while the migration runs.
To learn more about the process, read these links:
https://aws.amazon.com/about-aws/whats-new/2018/10/encrypt-amazon-redshift-1-click/
https://docs.aws.amazon.com/redshift/latest/mgmt/changing-cluster-encryption.html
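As an illustrative sketch (not from the original answer), the same change can be triggered with boto3's ModifyCluster call; the cluster identifier, region, and KMS key ARN below are placeholders.

# Sketch: enable encryption at rest on an existing cluster with a customer-managed key.
import boto3

redshift = boto3.client("redshift", region_name="us-east-1")

redshift.modify_cluster(
    ClusterIdentifier="my-production-cluster",                                  # placeholder
    Encrypted=True,
    KmsKeyId="arn:aws:kms:us-east-1:111122223333:key/abcd-1234",                # placeholder CMK ARN
)

# The migration runs in the background; poll the cluster status while it completes.
resp = redshift.describe_clusters(ClusterIdentifier="my-production-cluster")
print(resp["Clusters"][0]["ClusterStatus"], resp["Clusters"][0]["Encrypted"])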

DynamoDB DAX and High Availability

What's your preferred strategy for dealing with DAX's maintenance windows?
DynamoDB itself has no maintenance windows and is very highly available. When DAX is introduced into the mix, if it's the sole access point of clients to DynamoDB, then it becomes a single point of failure (SPOF). How do you then handle degradation gracefully during DAX scheduled downtimes?
My thinking was to not use the DAX Client directly but introduce some abstraction layer that allows it to fall back to direct DDB access when DAX is down. Is that a good approach?
A DAX maintenance window doesn't take the cluster offline unless it is a one-node cluster. DAX provides availability through multiple nodes in the cluster: for a multi-node cluster, the nodes go through maintenance one at a time so that the cluster remains available. With retries configured on the DAX client, your workload shouldn't see an impact during maintenance windows.
Beyond maintenance windows, cluster nodes should be spread across multiple AZs so the cluster stays available if an AZ goes down.
An abstraction layer to fall back to DDB is not a bad idea. But you need to make sure you have the provisioned capacity configured to handle the load spike.
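A minimal sketch of that abstraction layer, assuming the DAX client (for example amazondax's AmazonDaxClient) exposes the same item-level calls as the boto3 DynamoDB client, which it is designed to do; the table name and endpoint are placeholders.

# Sketch: read through DAX, fall back to plain DynamoDB if the DAX cluster is unreachable.
import boto3

class FallbackTable:
    def __init__(self, dax_client, ddb_client, table_name):
        self.dax = dax_client      # DAX client (cache)
        self.ddb = ddb_client      # plain DynamoDB client (fallback)
        self.table = table_name

    def get_item(self, key):
        try:
            return self.dax.get_item(TableName=self.table, Key=key)
        except Exception:
            # In practice, narrow this to the DAX client's connection/timeout errors.
            # Falling back sends the full read load to DynamoDB, so make sure the
            # table's provisioned capacity can absorb the spike.
            return self.ddb.get_item(TableName=self.table, Key=key)

ddb = boto3.client("dynamodb", region_name="us-west-2")
# dax = AmazonDaxClient(endpoint_url="daxs://my-cluster.xxxx.dax-clusters.us-west-2.amazonaws.com")  # hypothetical
# table = FallbackTable(dax, ddb, "my-table")
# item = table.get_item({"pk": {"S": "user#42"}})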

MariaDB Galera cluster: are replicate-do-db filters applied before or after data is sent?

I would like to synchronize only some databases on a cluster, with replicate-do-db.
If I use a Galera cluster, is all the data sent over the network, or are the nodes smart enough to only fetch their specific databases?
With "classic" master/slave MariaDB replication, filters are applied by the slave, causing network traffic for nothing if you don't replicate that database. You have to configure a blackhole proxy to filter the binary logs to avoid this (setup example), but the administration afterwards is not really easy. So it would be perfect if I could do the same thing with a cluster :)
binlog_* filters are applied on the sending (Master) node.
replicate_* filters are applied on the receiving (Slave) node.
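To make the distinction concrete, here is a hypothetical my.cnf-style sketch of each filter family (the database names are placeholders); as the question points out, the replicate_* variant still costs the network transfer because filtering happens after the events arrive.

# On the sending (Master) node -- filters what is written to the binary log at all:
[mysqld]
binlog_ignore_db = not_replicated_db

# On the receiving (Slave) node -- filters after the events have crossed the network:
[mysqld]
replicate_do_db = replicated_db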
Is this filtered server part of the cluster? If so, you are destroying much of the beauty of Galera.
On the other hand, if this is a Slave hanging off one of the Galera nodes and the Slave does not participate in the "cluster", this is a reasonable architecture.

EBS Layer behavior: created per layer or per instance?

I noticed you can create EBS volumes for each Layer in your OpsWorks stack.
My question: is the EBS volume shared among the Instances of a Layer, or does each Instance get its own EBS volume? (So, is one EBS volume created, or are many?)
Why: I'm creating a custom database layer and have configured my database to write its data to the EBS volume. Of course I don't want separate database instances in the database layer stomping on each other's data. So I would prefer separate EBS volumes, but I haven't seen anything canonical about the behavior either way.
While it's not cool to answer my own question, I've confirmed the results experimentally and am noting them for people From The Future. We who are from the past salute you!
EBS volumes are not shared across the instances of a layer; in fact, each instance gets its own EBS volume. When you stop and start an instance, it mounts the EBS volume associated with that OpsWorks hostname.
Phrased another way: data on an EBS volume is private to an instance, and writes to that volume will not stomp on another instance's data.
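As a hedged sketch (my own verification idea, not part of the original answer), you can confirm this with boto3's OpsWorks DescribeInstances/DescribeVolumes calls; the layer ID below is a placeholder.

# Sketch: list the EBS volumes OpsWorks has registered per instance in a layer,
# to confirm each instance reports its own volume(s).
import boto3

opsworks = boto3.client("opsworks", region_name="us-east-1")

instances = opsworks.describe_instances(LayerId="11111111-2222-3333-4444-555555555555")  # placeholder
for inst in instances["Instances"]:
    vols = opsworks.describe_volumes(InstanceId=inst["InstanceId"])
    # The same volume ID should never appear under two different instances.
    print(inst.get("Hostname"), [v["VolumeId"] for v in vols["Volumes"]])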
