Scaling statsd with multiple servers

I am laying out an architecture where we will be using statsd and Graphite. I understand how Graphite works and how a single statsd server could communicate with it. I am wondering how the architecture and setup would work for scaling out statsd servers: would you have multiple node statsd servers and then one central statsd server pushing to Graphite? I couldn't find anything about scaling out statsd, so any ideas on how to run multiple statsd servers would be appreciated.

I'm dealing with the same problem right now. Doing naive load-balancing between multiple statsds obviously doesn't work because keys with the same name would end up in different statsds and would thus be aggregated incorrectly.
But there are a couple of options for using statsd in an environment that needs to scale:
Use client-side sampling for counter metrics, as described in the statsd documentation (i.e. instead of sending every event to statsd, send only every 10th event and make statsd multiply it by 10; see the first sketch after this list). The downside is that you need to manually set an appropriate sampling rate for each of your metrics: sample too little and your results will be inaccurate; sample too much and you'll kill your (single) statsd instance.
Build a custom load-balancer that shards by metric name to different statsds, thus circumventing the problem of broken aggregation (see the second sketch after this list). Each of those could write directly to Graphite.
Build a statsd client that counts events locally and only sends them to statsd in aggregate. This greatly reduces the traffic going to statsd and also makes it constant (as long as you don't add more servers). As long as the period with which you send the data to statsd is much smaller than statsd's own flush period, you should also get similarly accurate results.
A variation of the last point, which I have implemented with great success in production: use a first layer of multiple (in my case local) statsds, which in turn all aggregate into one central statsd, which then talks to Graphite. The first layer of statsds would need a much smaller flush time than the second. To do this, you will need a statsd-to-statsd backend. Since I faced exactly this problem, I wrote one that tries to be as network-efficient as possible: https://github.com/juliusv/ne-statsd-backend
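
For illustration, a minimal sketch of the sampling option in TypeScript (Node.js). The helper name, metric name, and the statsd address localhost:8125 are assumptions; the metric:1|c|@rate wire format is the documented statsd counter syntax:

import * as dgram from "dgram";

const sock = dgram.createSocket("udp4");
const SAMPLE_RATE = 0.1; // send roughly every 10th event

// Hypothetical helper: emit a sampled counter increment.
// The "|@0.1" suffix is statsd's sample-rate annotation, which tells
// the server to scale the count back up (multiply by 10) on aggregation.
function incrementSampled(metric: string): void {
  if (Math.random() < SAMPLE_RATE) {
    sock.send(`${metric}:1|c|@${SAMPLE_RATE}`, 8125, "localhost");
  }
}

incrementSampled("myapp.logins"); // hypothetical metric name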
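
And a rough sketch of the shard-by-metric-name load balancer: a small UDP proxy that hashes each metric name to a fixed backend, so a given key always lands on the same statsd. The backend addresses are hypothetical, and for simplicity it assumes one metric per UDP packet:

import * as dgram from "dgram";
import { createHash } from "crypto";

// Hypothetical backend statsd instances.
const BACKENDS = [
  { host: "statsd-1.internal", port: 8125 },
  { host: "statsd-2.internal", port: 8125 },
];

const proxy = dgram.createSocket("udp4");

proxy.on("message", (msg) => {
  // statsd line format: <metric.name>:<value>|<type>[|@rate]
  const name = msg.toString().split(":")[0];
  // A stable hash of the name picks the backend, so the same key
  // always ends up on the same statsd instance.
  const digest = createHash("md5").update(name).digest();
  const backend = BACKENDS[digest.readUInt32BE(0) % BACKENDS.length];
  proxy.send(msg, backend.port, backend.host);
});

proxy.bind(8125); // listen where clients expect statsd to be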
As it is, statsd was unfortunately not designed to scale in a manageable way (no, I don't see adjusting sampling rates manually as "manageable"). But the workarounds above should help if you are stuck with it.

Most of the implementations I saw use per-server metrics, like: <env>.applications.<app>.<server>.<metric>
With this approach you can have local statsd instances on each box, do the UDP work locally, and let statsd publish its aggregates to Graphite.
If you don't really need per-server metrics, you have two choices:
Combine related metrics in the visualization layer (e.g. by configuring Graphiti to do so)
Use carbon aggregation to take care of that (a sample rule follows)
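
For the carbon-aggregator route, a rule in aggregation-rules.conf along these lines, following the metric scheme above (the 60-second flush frequency is an assumption), sums the per-server series into an "all" series:

# output_template (frequency in seconds) = method input_pattern
<env>.applications.<app>.all.requests (60) = sum <env>.applications.<app>.*.requests

The named fields are captured from the input pattern and substituted into the output template; the * matches the server component, which gets aggregated away.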

If you have access to a hardware load balancer like an F5 BigIP (I'd imagine there are OSS software implementations that do this too) and happen to have each host's hostname in your metrics (i.e. you're counting things like "appname.servername.foo.bar.baz" and aggregating them at the Graphite level), you can use source-address-affinity load balancing: it sends all traffic from one source address to the same destination node (within a reasonable timeout). So, as long as each metric name comes from only one source host, this will achieve the desired result.

Related

How to send 50,000 HTTP requests in a few seconds?

I want to create a load test for a feature of my app. It uses Google App Engine and a VM. The user sends HTTP requests to the App Engine. It's realistic that the App Engine gets thousands of requests in a few seconds, so I want to create a load test where I send 20,000-50,000 requests in a timeframe of 1-10 seconds.
How would you solve this problem?
I started by trying Google Cloud Tasks, because it seems perfect for this: you schedule HTTP requests for a specific point in time. The docs say that there is a limit of 500 tasks per second per queue, and that if you need more tasks per second you can split the tasks across multiple queues. I did this, but Google Cloud Tasks does not execute all the scheduled tasks at the given time: one queue needs 2-5 minutes to execute 500 requests that are all scheduled for the same second.
I also tried a TypeScript script running asynchronous node-fetch requests, but 5,000 requests take 77 seconds on my MacBook.
I don't think you can get 50,000 HTTP requests "in a few seconds" from "your MacBook"; it's better to consider a dedicated load testing tool (which can be deployed onto a GCP virtual machine in order to minimize network latency and traffic costs).
The tool choice is up to you: either you need a machine type powerful enough to conduct 50k requests "in a few seconds" from a single virtual machine, or the tool needs to support running in clustered mode so you can kick off several machines and have them send the requests together at the same moment in time.
Given you mention TypeScript, you might want to try the k6 tool (it doesn't scale across machines, though) or check out "Open Source Load Testing Tools: Which One Should You Use?" to see what the other options are; none of the others provides a JavaScript API, but several don't require any programming knowledge at all.
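For illustration, a minimal k6 script might look like this; the URL and the vus/duration numbers are placeholders to tune toward your target rate (k6 executes JavaScript, so strip the types or bundle if you write it in TypeScript):

import http from "k6/http";

export const options = {
  vus: 500,        // concurrent virtual users
  duration: "10s", // total test length
};

export default function () {
  http.get("https://your-app.appspot.com/endpoint"); // placeholder URL
}

Run it with k6 run load.js and raise vus until you reach the request rate you need.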
A tool you could consider using is siege.
It is Linux-based, and running it from inside GCP prevents the additional cost of testing from a system outside GCP.
You could deploy siege on a relatively large machine or a few machines inside GCP.
It is fairly simple to set up, but since you mention that you need 20-50k requests in a span of a few seconds: siege by default only allows 255 concurrent users. You can raise this limit, though, so it can fit your needs.
You would need to experiment with how many connections a machine can establish, since each machine will have a certain limit based on CPU, memory, and the number of network sockets. You could keep increasing the -c number until the machine gives an "Error: system resources exhausted" or something similar; experiment with what your virtual machine on GCP can handle. See the example invocation below.
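As a hypothetical starting point (the target URL is a placeholder):

siege -c 255 -t 30S https://your-app.appspot.com/endpoint

Here -c sets the number of concurrent users and -t the test duration; to go beyond 255 concurrent users, raise the limit setting in the siege configuration file (siegerc).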

Is it "okay" to host a small wordpress blog on one AWS EC2 Instance without load balancers/beanstalk?

This is a very simple question for those with the knowledge, but I'm a newbie.
In essence, I just need to know if it would be considered okay to run a small, approx. 700 visitors/day Bitnami WordPress blog on just one t2.medium EC2 instance (without any auto-scaling or Beanstalk).
Am I at risk of it crashing? What stats should I monitor to be aware of potential dangers? Sorry for the basic nature of these questions, but this is new to me.
tl;dr: It might be "okay", but it's not ideal.
If your question is because of:
Initial setup time - Load-balancing and auto-scaling will be less expensive (more time-efficient) over time.
Cost - Auto-scaling spins down instances that aren't being used to reduce cost.
Minimal setup for a great user experience - The goal of a great AWS setup is to ensure that capacity matches demand.
Am I at risk of it crashing?
Possibly, yes. If you average 700 visitors, then the risk is traffic spikes where many visitors hit at the same time. It also depends on what your maximum visitor count is, which could vary widely from the average (or not).
What stats should I monitor to be aware of potential dangers?
Monitor the usage on high-traffic days (i.e. public holiday sales)
Set up billing alerts
Set up the right metrics:
See John Rotenstein's SO answer:
CPU Utilization is not always the right measure to use -- your application might only be able to handle a limited number of connections, it might be squeezed on RAM and the types of requests might vary too.
You can use normal monitoring tools, or you can write something that pushes metrics to Amazon CloudWatch, so that you go beyond the basic CPU and Network metrics that CloudWatch normally provides. You could even use the Load Balancer's Latency metric to trigger scaling when the application slows down (custom code required).
I'd start with:
Two or more instances - to deal with instance redundancy (an instance going down)
Several t2.small instances rather than one t2.medium can work out to be more cost-efficient in some use cases.
Add auto-scaling - automatically spin up or down instances based on minimum and maximum counts
Load balancing - to re-route users from unhealthy to healthy instances, and also to keep all of the spun-up instances working as evenly as possible (rather than a single instance handling 80% of the workload while the others bludge).
You can always reduce your instances over time, guided by monitoring.
In my opinion, with 700 visitors a day, the safer option would be to run a load-balanced/auto-scaling environment on Elastic Beanstalk with at least 2 instances. The problem with running just one instance is that, yes, you are at a great risk of crashing if you get an increase in traffic or if the instance goes down, and with just one running you will not have a fallback. You can easily set up CloudWatch monitoring on NetworkIn and NetworkOut to get a sense of the number of requests your site is receiving and serving, and set up CPU usage monitoring as well (an example alarm follows). The trade-off with running a load-balanced environment over a single-instance environment is that the cost might increase significantly as you introduce other things, such as a load balancer, into your environment. Also, if you introduce a load balancer, consider reducing the instance size to maybe a t2.small, which could help reduce the cost.
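As a sketch of the CloudWatch side, a basic CPU alarm via the AWS CLI could look like this; the alarm name, instance ID, threshold, and SNS topic ARN are all placeholders:

aws cloudwatch put-metric-alarm \
  --alarm-name wordpress-high-cpu \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --statistic Average --period 300 \
  --evaluation-periods 2 --threshold 70 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:ops-alerts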
It actually depends; the range of this question is wide. You have multiple options here.
You can use just one EC2 instance for that number of visitors, or even more if your application allows it. You can also consider caching if your app needs it.
You may put the instance in an Auto Scaling group, so that if by any chance you need more resources you can scale horizontally.
You can add a load balancer later on as well. You just need to add user data to the launch configuration attached to the Auto Scaling group, so that when an instance comes up it automatically registers itself with your load balancer.
For monitoring, you can check the request metrics in CloudWatch for the ELB. Keep an eye on your CPU and trigger the scale-out policy once it reaches a particular threshold.

How do I configure OpenSplice DDS for 100,000 nodes?

What is the right approach to use to configure OpenSplice DDS to support 100,000 or more nodes?
Can I use a hierarchical naming scheme for partition names, so "headquarters.city.location_guid_xxx" would prevent packets from leaving a location, and "company.city*" would allow samples to align across a city, and so on? Or would all the nodes know about all these partitions just in case they wanted to publish to them?
The durability service will choose a master when it comes up. If one durability service is running on a Raspberry Pi in a remote location over a 3G link, what is to prevent it from trying to become the master for "headquarters" and crashing?
I am experimenting with durability settings such that a remote node uses location_guid_xxx, but the "headquarters" cloud server uses a "Headquarters" scope.
On the remote client I might do this:
<Merge scope="Headquarters" type="Ignore"/>
<Merge scope="location_guid_xxx" type="Merge"/>
so a location won't be master for the universe, but can a durability service within a location still be master for that location?
If I have 100,000 locations does this mean I have to have all of them listed in the "Merge scope" in the ospl.xml file located at headquarters? I would think this alone might limit the size of the network I can handle.
I am assuming that this product will handle this sort of Internet of Things scenario. Has anyone else tried it?
Considering the scale of your system, I think you should seriously consider using Vortex-Cloud (see these slides: http://slidesha.re/1qMVPrq). Vortex-Cloud will allow you to scale your system better as well as deal with NATs/firewalls. Besides that, you'll be able to use TCP/IP to communicate from your Raspberry Pi to the cloud instance, thus avoiding any problems related to NATs/firewalls.
Before getting to your durability question, there is something else I'd like to point out. If you try to build a flat system with 100K nodes, you'll generate quite a bit of discovery information. Besides generating some traffic, this will take up memory in your end applications. If you use Vortex-Cloud instead, we play tricks to limit the discovery information. To give you an example, if you have a data-writer matching 100K data-readers, when using Vortex-Cloud the data-writer would only match one endpoint, thus reducing the discovery information by a factor of 100K!
Finally, concerning your durability question, you could configure some durability services as alignee-only. In that case they will never become master.
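For illustration, an alignee-only durability service in ospl.xml might look roughly like this; the namespace name and partition expression are placeholders, and the exact element and attribute names should be verified against the deployment guide for your OpenSplice version:

<DurabilityService name="durability">
  <NameSpaces>
    <NameSpace name="locationData">
      <Partition>location_guid_xxx.*</Partition>
    </NameSpace>
    <!-- aligner="false" should make this service alignee-only: it aligns
         state from a master elsewhere but never volunteers to become
         master itself -->
    <Policy nameSpace="locationData" durability="Durable"
            alignee="Initial" aligner="false"/>
  </NameSpaces>
</DurabilityService>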
HTH.
A+

Connection Speed using HTTP Request

We are making an application involving a server (Tomcat, Apache, Linux) and multiple mobile clients (Android, iPhone, Windows, Nokia J2ME).
Normally the clients and the server will communicate over HTTP.
I would like to know the download and upload speeds of the client from the HTTP requests it makes.
Ideally I would not like to have to upload and download a file to come up with these speeds. I am assuming that there might be something at the HTTP protocol level, or at some lower layer of the network, that can give me this.
If only it were that simple.
Even where the bandwidth and latency of a network are very well defined, the actual throughput will be limited by the congestion window and where the end points are in establishing the slow start threshold. These can affect throughput by a factor of 20 or more.
There's nothing in HTTP which will provide metrics for these. Some TCP stacks will expose limited information about throughput (as used by iftop, iptraf).
However, if you really want to gather useful metrics on HTTP throughput, then you need to start shoving data across the network - have a look at Yahoo's Boomerang for an implementation.
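If you do end up actively measuring, a crude client-side estimate just times a fixed-size download (TypeScript on Node 18+, where fetch is global; the URL is a placeholder, and all the slow-start caveats above still apply, so treat the result as an order-of-magnitude figure):

async function estimateDownloadMbps(url: string): Promise<number> {
  const start = Date.now();
  const res = await fetch(url);
  const body = await res.arrayBuffer(); // wait for the full payload
  const seconds = (Date.now() - start) / 1000;
  return (body.byteLength * 8) / seconds / 1e6; // megabits per second
}

estimateDownloadMbps("https://example.com/1mb-test-file") // placeholder URL
  .then((mbps) => console.log(`~${mbps.toFixed(1)} Mbit/s`));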
If the HTTP connection goes to the Apache server first, you can use Apache Bench to do all sorts of load testing. It comes with Apache and can be invoked with something like the following.
Suppose we want to see how fast Yahoo can handle 100 requests, with a maximum of 10 requests running concurrently:
ab -n 100 -c 10 http://www.yahoo.com/
HTTP does not deal with connection speeds, although I could imagine a solution that involves an HTTP (reverse) proxy that estimates speeds on a connection and sets custom headers to pass this info along. You would also need to associate the stats of different connections with particular clients. I have not yet seen a readily available solution for this.
Also note that
Network traffic can be buffered or shaped, so download speed may depend on the amount of data transferred or the previous load on the network; even downloading a file would not be accurate.
The amount of data transferred depends on the protocol level (payload wrapped in HTTP wrapped in gzip wrapped in TLS wrapped in TCP). Which one do you want to measure? And what do you want to achieve with this measured speed?
I've seen some Real User Monitoring (RUM) tools that can do this passively (they get a feed from a SPAN port or network TAP in front of the servers at the data centre).
There are probably ways of integrating the data they produce into your applications, but I'm not sure it would be easy or, given the way latency and bandwidth can "dynamically" change on a mobile network, that accurate.
I guess the real thing to focus on is the design of the app: how much data is travelling across the network, how you can minimise it, etc.
The other thing to consider is whether you could offer a solution that allows some of the application to be hosted in the telco's POPs (some telcos route all their towers back to a central POP; others have multiple POPs).

Can ordered message delivery be implemented with BizTalk scaling out using sequential convoy?

I have a BizTalk receive port connected to a queue (using MQ Series adapter) which is used to receive ordered messages. I need to scale out this port with multiple BizTalk host instances (I'm using BizTalk Server 2006 R2).
According to MSDN this cannot be done since ordered message delivery works against scale-out techniques.
Is there any other way to achieve ordered message delivery while scaling out with multiple BizTalk host instances? Is it possible to achieve this with the sequential convoy pattern?
Appreciate your feedback.
Thanks,
Chatur
This won't work - scaling out isn't going to help you if you need to process messages in sequence. How can you process the next item in the queue from another host instance if the current host instance hasn't completed? You are basically asking: how can I make my sequential delivery parallelizable?
As per the answers to your post on MSDN, turning on 'Ordered' on your MQ receive location will prevent parallel throughput from multiple receive hosts, but you should still get the benefits of reliability, failover, and 'maintenance' slots without downtime.
FWIW, we are using the MQSC adapter on 2 servers - there are many performance 'knobs' (polling interval, maximum batch size, and ?threads) on the receive location (admittedly, we aren't using ordered) which can be used to improve ordered throughput from just the one listener (exactly how many messages do you need to process per second?).
As an alternative to ordered delivery across the receive location (and assuming the documents needed for a unit of work have been split and can be correlated back together, and the documents have some kind of sequence number), you might look at per-message aggregation patterns such as this one from Seroter.
