Amazon ECS Application level monitoring and instance autohealing without ELB - http

i'm deploying and app to Amazon ECS and need some advice on application level monitoring (periodic HTTP 200 and/or body match). Usually i place it behind an ELB and i am sure that my ELB will take action if it sees too many HTTP errors.
However this time it's a very low budget project and the budget for the ELB should be avoided (also consider this is going to work with only one instance as the userbase is very limited).
What strategies could i adopt to grant that the application is alive (kill instance and restart in case of too many app errors)? Regarding the instance i know about AWS autohealing but that's infrastructure.
Obviously one of the problems is that not having an ELB i must bind the DNS to an EIP....so reassigning it it's crucial.
And obviously the solution should not involve any other EC2 instance, external services are acceptable but keeping it all inside AWS would be great.
Thanks a lot

Monitoring of ECS is important to improve the importance of your site. If you still think there could be issues related to deployment on AWS, I suggest to practice auto-scaling feature of AWS.
You can scale up ECS when needed and release it when not required.
Nagios is another open source monitoring tool that you can leverage. Easy to install and configure.

Related

Nginx multiple instances

I have an EC2 instance with AWS and I have installed nginx and created multiple server blocks to server multiple applications.
However, if nginx goes down, all the applications go down as well.
Is there any way to setup seperate nginx instance for each application? So if one nginx instance goes down, it won't affect other instances.
Yes, its technically possible to install 2 nginx instances on the same server but I would do it another way.
1 - You could just create multiple EC2 instances. The downside of this approach is that maybe it's gets harder to maintain depending on how many instances you want.
2 - You could use Docker or any of its alternatives to create containers and solve this problem. You can create as many containers you need and totally separate nginx instances. Although docker is simple to learn and start using it in no time, the downside of this approach is that you need to put a little effort to learn it and your main EC2 instance needs to have enough resources to share between the containers.
I hope it helps!
If it's possible to use ELB instead of nginx. this will be more convenient but if ELB doesn't work for you. nginx already support High Availability mode to avoid the problem you mentioned of having a single point of failure
it's documented officially here
https://www.nginx.com/products/nginx/high-availability/
it's better than having one nginx machine for every application and grantee more availability
The type of redundancy that you’re looking for is usually provided by a load balancer or reverse proxy in practice. There’s a lot of ways this can be achieved architecturally, but generally speaking looks like this;
Run multiple nginx instances with the same server definitions, and a balancer like haproxy. This allows the balancer to check which nginx instances are online and send requests to each in turn. Then if a instance goes down, or the orchestrator is bring up a new one, requests only get sent to the online ones.
If requests need to be distributed more heavily, you could have nginx instances for each server, with a reverse proxy directed at each instance or node.
There may be some overhead for nginx if you do it that way and your nginx may be difficult to maintain later because there are many nginx instances. Ex. If you need to update or add some modules it will be harder.
How about if you try using EC2 autoscaling group even a minimum 1 and desired 1? So that it will automatically launch a new instance if the current one goes down.
If you need to preserve some settings like the elastic ip of your EC2, you can try to search for EC2 instance recovery. It will restore your setup unlike the autoscaling group.
But it would be better if you will use a loadbalancer like ALB and use 2 instances at a minimum. Using an ALB will also make you more secure. You may also want to read about ALB target groups. It will give you more options on how to solve your current problem.

nginx behind load balancers

I've found at that Instagram share their technology implementation with other developers trough their blog. They've some great solutions for the problems they run into. One of those solutions they've is an Elastic Load Balancer on Amazon with 3 nginx instances behind it. What is the task of those nginx servers? And what is the task of the Elastic Load balancers, and what is the relation between them?
Disclaimer: I am no expert on this in any way and am in the process of learning about AWS ecosystem myself.
The ELB (Elastic load balancer) has no functionality on its own except receiving the requests and routing it to the right server. The servers can run Nginx, IIS, Apache, lighthttpd, you name it.
I will give you a real use case.
I had one Nginx server running one WordPress blog. This server was, like I said, powered by Nginx serving static content and "upstreaming" .php requests to phpfpm running on the same server. Everything was going fine until one day. This blog was featured on a tv show. I had a ton of users and the server could not keep up with that much traffic.
My first reaction would be to just use the AMI (Amazon machine image) to spin up a copy of my server on a more powerful instance like m1.heavy. The problem was I knew I would have traffic increasing over time over the next couple of days. Soon I would have to spin an even more powerful machine, which would mean more downtime and trouble.
Instead, I launched an ELB (elastic load balancer) and updated my DNS to point website traffic to the ELB instead of directly to the server. The user doesn’t know server IP or anything, he only sees the ELB, everything else goes on inside amazon’s cloud.
The ELB decides to which server the traffic goes. You can have ELB and only one server on at the time (if your traffic is low at the moment), or hundreds. Servers can be created and added to the server array (server group) at any time, or you can configure auto scaling to spawn new servers and add them to the ELB Server group using amazon command line, all automatically.
Amazon cloud watch (another product and important part of the AWS ecosystem) is always watching your server’s health and decides to which server it will route that user. It also knows when all the servers are becoming too loaded and is the agent that gives the order to spawn another server (using your AMI). When the servers are not under heavy load anymore they are automatically destroyed (or stopped, I don’t recall).
This way I was able to serve all users at all times, and when the load was light, I would have ELB and only one Nginx server. When the load was high I would let it decide how many servers I need (according to server load). Minimal downtime. Of course, you can set limits to how many servers you can afford at the same time and stuff like that so you don’t get billed over what you can pay.
You see, Instagram guys said the following - "we used to run 2 Nginx machines and DNS Round-Robin between them". This is inefficient IMO compared to ELB. DNS Round Robin is DNS routing each request to a different server. So first goes to server one, second goes to server two and on and on.
ELB actually watches the servers' HEALTH (CPU usage, network usage) and decides which server traffic goes based on that. Do you see the difference?
And they say: "The downside of this approach is the time it takes for DNS to update in case one of the machines needs to get decommissioned."
DNS Round robin is a form of a load balancer. But if one server goes kaput and you need to update DNS to remove this server from the server group, you will have downtime (DNS takes time to update to the whole world). Some users will get routed to this bad server. With ELB this is automatic - if the server is in bad health it does not receive any more traffic - unless of course the whole group of servers is in bad health and you do not have any kind of auto-scaling setup.
And now the guys at Instagram: "Recently, we moved to using Amazon’s Elastic Load Balancer, with 3 NGINX instances behind it that can be swapped in and out (and are automatically taken out of rotation if they fail a health check).".
The scenario I illustrated is fictional. It is actually more complex than that but nothing that cannot be solved. For instance, if users upload pictures to your application, how can you keep consistency between all the machines on the server group? You would need to store the images on an external service like Amazon s3. On another post on Instagram engineering – “The photos themselves go straight to Amazon S3, which currently stores several terabytes of photo data for us.”. If they have 3 Nginx servers on the load balancer and all servers serve HTML pages on which the links for images point to S3, you will have no problem. If the image is stored locally on the instance – no way to do it.
All servers on the ELB would also need an external database. For that amazon has RDS – All machines can point to the same database and data consistency would be guaranteed.
On the image above, you can see an RDS "Read replica" - that is RDS way of load balancing. I don't know much about that at this time, sorry.
Try and read this: http://awsadvent.tumblr.com/post/38043683444/using-elb-and-auto-scaling
Can you please point the blog entry out?
Load balancers balance load. They monitor the Web servers health (response time etc) and distribute the load between the Web servers. On more complex implementations it is possible to have new servers spawn automatically if there is a traffic spike. Of course you need to make sure there is a consistency between the servers. THEY CAN share the same databases for instance.
So I believe the load balancer gets hit and decides to which server it will route the traffic according to server health.
.
Nginx is a Web server that is extremely good at serving a lot of static content for simultaneous users.
Requests for dynamic pages can be offloaded to a different server using cgi. Or the same servers that run nginx can also run phpfpm.
.
A lot of possibilities. I am on my cell phone right now. tomorrow I can write a little more.
Best regards.
I am aware that I am late to the party, but I think the use of NGINX instances behind ELB in Istagram blogpost is to provide high available load balancer as described here.
NGINX instances do not seem to be used as web servers in the blogpost.
For that role they mention:
Next up comes the application servers that handle our requests. We run Djangoon Amazon High-CPU Extra-Large machines
So ELB is used just as a replacement for their older solution with DNS Round-Robin between NGINX instances that was not providing high availability.

Azure Cloud Services vs VMs for Existing Asp.Net website

I have seen variations of this question but couldn't find any that dealt with our particular scenario.
We have an existing aps.net website that links to a SQL Server database.
The database has clr user-defined types, hence it can only be hosted in Azure VM since Cloud Services don't support said types.
We initially wanted to use a vm for the database and cloud service for the front-end, but then some issues arose:
We use StateServer for storing State, but Azure doesn't support that. We would need to configure either Table storage, SQL Databases, or a Worker role dedicated to State management (a new worker role is an added cost). Table storage wouldn't be ideal due to performance. The other 2 options are preferable but they introduce cost or app-reconfiguration disadvantages.
We use SimpleMembership for user management. We would need to migrate the membership tables from our vm instance sql server to Azure's SQL Databases. This is an inconvenience as we want to keep all our tables in the same database, and splitting up the 2 may require making some code changes.
We are looking for a quick solution to have this app live as soon as possible, and at manageable cost. We are desperately trying to avoid re-factoring our code just to accommodate hosting part of the app in Azure Cloud services.
Questions:
Should we just go the VM route for hosting everything?
Is there any cost benefit in leveraging a VM instance (for sql server) and a Cloud Service instance (for the front-end)?
It seems to me every added "background process" to a Cloud Service will require a new worker role. For example, if we wanted to enable smtp for email services, this would require a new role, and hence more cost. Is this correct?
To run SQL Server with CLR etc, you'll need to run SQL Server in a Virtual Machine.
For the web tier, there are advantages to Cloud Services (web roles), as they are stateless - very easy to scale out/in without worrying about OS setup. And app setup is done through startup scripts upon bootup. If you can host your session content appropriately, the stateless model will be simpler to scale and maintain. However: If you have any type of complex installations to perform that take a while (or manual intervention), then a Virtual Machine may indeed be the better route, since you can build the VM out, and then create a master image from that VM. You'll still have OS and app maintenance issues to contend with, just as you would in an on-premises environment.
Let me correct you on your 3rd bullet regarding background processes. A cloud service's web role (or worker role) instances are merely Windows Server VM's with some scaffolding code for startup and process monitoring. You don't need a separate role for each. Feel free to run your entire app on a single web role and scale out; you'll just be scaling at a very coarse-grain level.
Some things to consider...
If you want to be cheap, you can have your web/worker role share the same code on a single machine by adding the RoleEntryPoint. Here is a post that actually shows how to do what you are trying to do with sending email:
http://blog.maartenballiauw.be/post/2012/11/12/Sending-e-mail-from-Windows-Azure.aspx
Session management is painfully slow in SQL Azure DB, I would use the Azure Cache if you can..it is fast.
SQL Server with VMs is going to cause problems for you, because you will also need to create a virtual network between that and any cloud services. This is really stupid, but if you deploy a cloud service AND a VM they communicate over the PUBLIC LOAD BALANCER causing a potential security concern and network latency. So, first you need to virtual network them (that is an extra cost)..then you also need to host a DNS server to address the SQL Server VM. Yes this is really stupid, unless you are OK with your web/worker roles communicating with your SQL Server over the internet :)
EDIT: changed "public internet" to "public load balancer" (and noted latency)
EDIT: The above information is 100% correct contrary to the comment by David below. Please read the guidance from Microsoft here:
http://msdn.microsoft.com/library/windowsazure/dn133152.aspx#scenario
DIRECTLY FROM MICROSOFT GUIDANCE speaking about cross Cloud Service communication (VM->web/worker roles):
"We recommend that you implement the first option as the connection process would not need to go through the public Internet. Therefore, it would provide a better network performance."
As of today (8/29/2013) Azure VMs and Worker/Web Roles are deployed into DIFFERENT "Cloud Services". Therefore communication between them needs to be secured via a Virtual Network that exposes private IP addresses between the instances.
To follow up on David's point below, that about adding an ACL. You are still sending packets over the internet using TDS (SQL Server protocol). That can be encrypted, but no sane architect/enterprise governance/security governance would "allow" this scenario to happen in a production environment.

Configuring nginx web server with multiple app server of aws stack

I am a DevOps guy and presently I am running my Ruby on Rails application on ubuntu ec2 where the app and also the web server reside inside the same box but we are using mysql RDS cluster. I can see lot of spikes due to more traffic to the web site. So I am planning to change the system. I wanna put web server nginx in a separate instance and web app in a separate instance. But this needs a load balancer which should reside in nginx box, but once the traffic goes up, the nginx instance can be configured to auto scale. What about the app server instance? It can be configured to auto scale but it needs to attach itself to the web server and web server needs to discover the new app server which was created. How can achieve this? Kindly help me out to get this done.
When you are using one single web server at the moment, a transition to using nginx as static webserver and proxy for another backend webserver on another instance really makes sense and will give you performance boost.
However I am not sure if you really need autoscaling. Autoscaling mostly makes sense if you want to react on fast traffic spikes etc. If you have a more or less continuous workload that might increase over time, it should be easier to manually launch and add another backend server in the nginx config. If this does not work for you, you can still have a look at Amazon's Elastic Loadbalancers and Autoscaling afterwards.

Security groups and UDP on Heroku

Has anyone experienced running multiple collaborating applications on Heroku? For example, an admin application to manage another application; or a stats server observing another application?
On Amazons' EC2 platform you can use security groups to restrict access to servers, creating a virtual network between your application or server instances. Is there any such way to do this on Heroku? If so, can you open UDP as well as TCP connections?
Thanks
Robbie
The comment from #elithrar is correct. To talk between applications you either need to define an API, or used shared resources. For example you can have 2 applications connect to the same database by manually copying and pasting the DATABASE_URL from one app to another. This has the downside that should we need to roll credentials (very rare) your manually copied configuration will break.
The same pattern can be used with any add-ons, such as https://addons.heroku.com/redistogo or https://addons.heroku.com/iron_mq to share a message bus or queue between two applications.

Resources