EBS Layer behavior: created per layer or per instance? - aws-opsworks

I noticed you can create EBS volumes for each Layer in your Opsworks stack.
My question: is the EBS volume shared among the Instances of a Layer, or does each Instance get its own EBS volume? (So: is one EBS volume created, or many?)
Why: I'm creating a custom database layer and have configured my database to write its data to the EBS volume. Naturally, I don't want separate database instances in the database layer stomping on each other's data. So I would prefer separate EBS volumes, but I haven't seen anything canonical about the behavior either way.

While it's not cool to answer my own question, I've confirmed the results experimentally and am noting them here for people From The Future. We who are from the past salute you!
EBS volumes are not shared across the instances of a layer - in fact, each instance gets its own EBS volume. When you stop and start an instance, it mounts the EBS volume associated with that OpsWorks hostname.
Phrased another way: data on an EBS volume is private to an instance, and writes to that volume will not stomp on other instances' data.
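For anyone who wants to re-check this on their own stack, here is a minimal boto3 sketch (the stack ID is a placeholder, and the region must be one where the OpsWorks Stacks API is available) that lists which EBS volume is attached to each OpsWorks instance:

import boto3

opsworks = boto3.client("opsworks", region_name="us-east-1")
STACK_ID = "YOUR-STACK-ID"  # placeholder

for instance in opsworks.describe_instances(StackId=STACK_ID)["Instances"]:
    volumes = opsworks.describe_volumes(InstanceId=instance["InstanceId"])["Volumes"]
    for vol in volumes:
        # Each hostname reports its own EBS volume ID and mount point.
        print(instance["Hostname"], vol["Ec2VolumeId"], vol.get("MountPoint"))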

Related

Is it possible to create a load balancer in gcp that works without instance groups in a regional scope?

The problem is that I have three virtual machines built from the same source image in three different zones of a region. I can't put them in a MIG because each of them has to attach to a specific persistent disk, and from what I researched I have no control over which VM in the MIG gets attached to which persistent disk (please correct me if I'm wrong). I explored the unmanaged instance group option too, but it only has zonal scope. Is there any way to create a load balancer that works with my VMs, or do I have to build another solution (e.g. NGINX)?
You have two options.
Create an Unmanaged Instance Group. This allows you to set up your VM instances exactly as you want. Unmanaged instance groups are zonal, so you create one per zone and add each of them as a backend to the same load balancer.
Creating groups of unmanaged instances
Use a Network Endpoint Group. This lets you specify backends by the Compute Engine VM instance's internal IP address (and port). You can also use other endpoint types, such as an Internet address.
Network endpoint groups overview
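To make the first option concrete, here is a minimal Python sketch using the google-cloud-compute client (the project, zone, and VM names are hypothetical); you would repeat it for each zone that holds one of the three VMs and then wire all three groups into the load balancer's backend service:

from google.cloud import compute_v1

PROJECT = "my-project"      # hypothetical project ID
ZONE = "us-central1-a"      # repeat for each zone that holds a VM
VM_NAME = "web-vm-a"        # hypothetical VM name
GROUP_NAME = f"web-uig-{ZONE}"

ig_client = compute_v1.InstanceGroupsClient()

# Create an (empty) unmanaged instance group in this zone.
ig_client.insert(
    project=PROJECT,
    zone=ZONE,
    instance_group_resource=compute_v1.InstanceGroup(name=GROUP_NAME),
).result()

# Add the existing VM (with its pre-attached persistent disk) to the group.
instance_url = (
    f"https://www.googleapis.com/compute/v1/projects/{PROJECT}"
    f"/zones/{ZONE}/instances/{VM_NAME}"
)
ig_client.add_instances(
    project=PROJECT,
    zone=ZONE,
    instance_group=GROUP_NAME,
    instance_groups_add_instances_request_resource=compute_v1.InstanceGroupsAddInstancesRequest(
        instances=[compute_v1.InstanceReference(instance=instance_url)]
    ),
).result()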
The solution in my case was using a dedicated VM, installing NGINX as a load balancer, and creating static IPs for each VM in the group. I couldn't get managed or unmanaged instance groups working with the managed load balancer.
That worked fine, but another solution I found in Qwiklabs was adding all instances to an "instance pool"; maybe in the future I will implement that solution.

Is it possible to scale out HashiCorp Vault using the DynamoDB storage backend?

I am using Vault on AWS with the DynamoDB backend. The backend supports HA.
storage "dynamodb" {
ha_enabled = "true"
region = "us-west-2"
table = "vault-data"
}
Reading the HA concept documentation:
https://www.vaultproject.io/docs/concepts/ha.html
To be highly available, one of the Vault server nodes grabs a lock within the data store. The successful server node then becomes the active node; all other nodes become standby nodes. At this point, if the standby nodes receive a request, they will either forward the request or redirect the client depending on the current configuration and state of the cluster -- see the sections below for details. Due to this architecture, HA does not enable increased scalability.
I am not interested in having a fleet of EC2 instances behind an ELB, where only one instance behaves like a master and talks to DynamoDB.
I would like to run N EC2 instances running Vault that read from and write to DynamoDB independently.
Because DynamoDB supports reads and writes from multiple EC2 instances, I would expect to be able to unseal Vault on multiple instances simultaneously and perform read and write operations. This should work even with ha_enabled = "false", without doing leader election.
Why is this architecture not suggested in the documentation? Why should it not work? Is there any cryptographic limitation that I am missing?
Thank you.
Scaling out Vault is a feature of Vault Enterprise. With it, you can set up a primary cluster and as many "secondary" clusters as you want, better known as performance replicas. Each cluster has its own storage and unseal mechanism, so you could have one cluster on DynamoDB and another on Raft. If both are on DynamoDB, then you'll need a DynamoDB table for each.
But keep in mind that performance replicas always forward write operations to the primary cluster. A write operation is something that affects the global state of Vault; in that sense, a POST to /transit is not considered a write operation.
Another possibility is to have your KV store mounted locally (with the -local flag). It will then behave as if it were on a primary even when mounted on a performance replica, at the price of not being replicated to the other clusters.
A final note: DR clusters are an exact copy of a given cluster. Each cluster, whether a primary or a replica, can have its own DR cluster.
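As an illustrative sketch of the points above (the Vault address, token, and mount path are hypothetical, and the local-mount behaviour only matters on Enterprise performance replicas), the hvac Python client can show the active/standby split and create a -local mount:

import hvac

# Point this at any node in the cluster; address and token are placeholders.
client = hvac.Client(url="https://vault.example.com:8200", token="s.xxxxxxxx")

# /sys/leader shows whether this node is the active node or a standby that
# forwards/redirects requests to the leader elected via the DynamoDB lock.
print(client.sys.read_leader_status())

# A mount flagged "local" is owned by the cluster it is created on and is not
# replicated -- the API equivalent of `vault secrets enable -local kv`.
client.sys.enable_secrets_engine(
    backend_type="kv",
    path="kv-local",            # hypothetical mount path
    options={"version": "2"},
    local=True,
)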

Are there performance benefits to offloading WordPress media folder and database to GCP?

I have been looking into offloading a WordPress database to Google CloudSQL and the media folder to Google Cloud Storage.
What are the performance benefits of doing so? At what point is it worth it?
For this answer I assume your WordPress site is running on a GCE instance.
Moving the database and static resources to Cloud SQL and GCS is likely to slightly increase the amount of traffic your instance can handle before becoming overloaded. Moving the database to Cloud SQL is likely to slightly reduce the speed of individual requests, as each database hit takes a network round trip.
Where Cloud SQL and GCS will really help you is scalability and potentially reliability:
Scalability will increase because, with static resources and data moved to shared services, you no longer need to keep state on the GCE instance itself. This means you can add new GCE instances serving WordPress from the same database behind a load balancer and handle far more traffic than a single instance could.
If you add multiple instances, you will also gain reliability, as you no longer have a single point of failure. If one GCE instance goes down, the other instances will handle the traffic and the load balancer will ensure no downtime occurs. With HA set up on Cloud SQL, you also gain database reliability, and GCS has lots of redundancy built in. More importantly, you can spread your instances across zones and avoid the impact of single-zone issues in GCP.
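Plugins normally handle the GCS offload for you, but as a minimal sketch of what they do under the hood (the bucket and object names are hypothetical), the google-cloud-storage client uploads a media file once and every instance then links to the same object:

from google.cloud import storage

client = storage.Client()              # uses the instance's service account
bucket = client.bucket("my-wp-media")  # hypothetical bucket name

# Push a locally uploaded media file to the shared bucket...
blob = bucket.blob("wp-content/uploads/2024/01/hero.jpg")
blob.upload_from_filename("/var/www/html/wp-content/uploads/2024/01/hero.jpg")

# ...and link to it directly (assuming the bucket allows public reads),
# so the GCE instances never have to serve those bytes themselves.
print(blob.public_url)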

Cassandra: isolated workloads

I have three workloads.
DATACENTER1: sharing data through REST services - streaming ingest
DATACENTER2: bulk load - analysis
DATACENTER3: research
I want to isolate the workloads, so I am going to create one datacenter for each workload.
The objective is to prevent a heavy process from consuming all the resources and to guarantee high data availability.
Has anyone already tried this?
During a bulk load on DATACENTER2, is data availability still good on DATACENTER1?
The short answer is that a workload won't cause disruption of load across datacenters. How it works is as follows:
Conceptually, when you create a keyspace, Cassandra creates a virtual data center (VDC). Nodes with similar workloads must be assigned to the same VDC. Segregating workloads this way ensures that exactly one workload is ever executed in a given VDC. As long as you follow this pattern, it works.
Data sync needs to be monitored under load on busy nodes, but that's a normal concern on any Cassandra deployment.
DataStax Enterprise also supports this model, as can be seen from:
https://docs.datastax.com/en/datastax_enterprise/4.6/datastax_enterprise/deploy/deployWkLdSep.html#deployWkLdSep__srchWkLdSegreg
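To illustrate the per-datacenter isolation described above, here is a sketch using the Python cassandra-driver; the datacenter names match the question and the replication factors are only examples:

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import DCAwareRoundRobinPolicy

# Pin this application's traffic to DATACENTER1 and keep reads/writes local,
# so a bulk load running in DATACENTER2 is not routed here.
profile = ExecutionProfile(
    load_balancing_policy=DCAwareRoundRobinPolicy(local_dc="DATACENTER1"),
    consistency_level=ConsistencyLevel.LOCAL_QUORUM,
)
cluster = Cluster(["10.0.0.1"], execution_profiles={EXEC_PROFILE_DEFAULT: profile})
session = cluster.connect()

# Replicate the keyspace to each datacenter; the factors below are illustrative.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS app WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'DATACENTER1': 3, 'DATACENTER2': 3, 'DATACENTER3': 2
    }
""")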

Shared data for Amazon EC2 instances

To handle high traffic, I'm planning to scale out and run my web application (WordPress-based) on several EC2 instances (I'm very new to AWS). The instances need to work on the same data (images, videos...).
I am thinking about using S3 as the storage for this shared data.
My questions are:
If I use S3, do I need to write extra code for my application to upload data to and fetch data from S3? Or is there a magic way to mount S3 on the EC2 instances so that they can access it as if it were local storage?
I've heard that S3 is a bit slow since it is accessed through web services (if users upload files, it takes time to push those files to S3). So is there a better way to store shared data?
I've read some documents about the scaling abilities of Amazon EC2, but none of them mention how to handle shared data. Any help is highly appreciated. Thanks.
There is no native facility to 'mount' an S3 bucket as storage to an EC2 instance, although there are several 3rd-party apps which offer mechanisms to make S3 storage available as a virtual drive or repository. Most of them offer a preset amount of free storage and then a tiered charging mechanism for larger amounts - Google for 'S3 storage interface' and take a look.
Whether you write code to use S3 through the API or use an interface layer, there will always be some latency between your app and the storage. That's a fact of physics and there's nothing you can do to eliminate the delay, because the S3 repository is not local to the EC2 cluster - so you will never achieve 'local' storage access speeds.
An alternative might be to use EBS which is local to your EC2 instance - it has different properties to S3 (for example, it does not offer edge locations for regionally-localised access) but is much faster for application use because it is inside the EC2 cluster and mounted as local storage.
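If you go the API route, a short boto3 sketch (the bucket name and keys are hypothetical) is roughly all the extra code involved for media files:

import boto3

s3 = boto3.client("s3")

# Upload a media file once; every web instance then reads it from the same bucket.
s3.upload_file(
    "/var/www/html/wp-content/uploads/photo.jpg",
    "my-wp-media-bucket",
    "uploads/photo.jpg",
)

# Hand the browser a time-limited URL instead of proxying the bytes through EC2.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-wp-media-bucket", "Key": "uploads/photo.jpg"},
    ExpiresIn=3600,
)
print(url)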
You could mount an S3 bucket onto all your EC2 instances. It's a two-way mount, so all your files will be synchronized. You could use s3fs to do the mounting.
I used this guide and set mine up pretty quickly: Mount S3 onto EC2
If you are then concerned about speed, you can use Amazon ElastiCache or even use EBS as a cache drive.
For starters, your question lacks some details about your application architecture, but there are some possibilities.
First, if your project is medium-sized, you could use GlusterFS on your main nodes as servers and clients at the same time (using the native or NFS protocol), and an RDS Multi-AZ MySQL instance for the database. Use CloudFront as a CDN with the CDN Linker or W3TC plugins. Also, put an ELB in front.
In this particular case I would recommend at least a couple of c3.large instances.
Second, when your project grows, you should build an instance AMI and create an auto-scaling group that simply connects to your main storage and compute instances. (Also consider lifting compute load off these rather small instances.)
Things to consider additionally:
A great article about WordPress clusters on AWS is http://harish11g.blogspot.ru/2012/01/scaling-wordpress-aws-amazon-ec2-high.html
An alternative to the GlusterFS solution could be the Ceph File System.
You could (or maybe should) also use an SSD cache (for example flashcache) for that GlusterFS volume.
