What is the highest number of threads that it is reasonable to run simultaneously in JMeter?

I want to use the highest possible number of threads (so that I need fewer machines), but without making the client itself the bottleneck.

JMeter can simulate a very high load provided you use it correctly.
Don't listen to the urban legends that say JMeter cannot handle high load.
As for the actual number, it depends on:
your machine's power
whether your JVM is 32-bit or 64-bit
the memory allocated to the JVM (-Xmx)
your test plan (lots of BeanShell, post-processors, XPath ... means lots of CPU)
your OS configuration (tunable)
GUI / non-GUI mode
So there is no theoretical answer, but following best practices will ensure JMeter performs well.
Note that with JMeter you can distribute the load through remote testing; read:
Remote Testing > 15.4 Using a different sample sender
And finally, use cloud-based testing if that is not enough.
Read this for tuning tips:
http://www.ubik-ingenierie.com/blog/jmeter_performance_tuning_tips/
Read this book to learn how to do load testing and use JMeter correctly.
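As a concrete illustration of the non-GUI and -Xmx points above, here is a minimal sketch of launching a run from Python. It assumes jmeter is on the PATH, that your test plan is saved as plan.jmx, and that your JMeter version's startup script honours the HEAP environment variable (recent versions do; adjust for yours):

    import os
    import subprocess

    # Minimal sketch: run JMeter in non-GUI mode with a larger heap.
    # Assumes 'jmeter' is on the PATH and 'plan.jmx' exists; adjust as needed.
    env = dict(os.environ)
    env["HEAP"] = "-Xms2g -Xmx2g"   # picked up by the jmeter startup script

    subprocess.run(
        [
            "jmeter",
            "-n",                 # non-GUI mode (far cheaper than the GUI)
            "-t", "plan.jmx",     # test plan to execute
            "-l", "results.jtl",  # where to write sample results
            "-j", "jmeter.log",   # JMeter's own log file
        ],
        env=env,
        check=True,
    )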

I have used JMeter a fair bit and found it is not great at generating really high load. On a 2GHz Core 2 Duo with 2GB of memory you can reasonably expect about 100 threads.
That being said, it is best to run it on hardware where the PC's CPU does not peak at 100% - a stable 80%-90% is best, otherwise the results are affected.
I have also tried WAPT 5 - it successfully ran 1000+ threads from the same PC. It is not free, but it is more usable than JMeter, though it doesn't have all of the features.
This answer has been outdated since at least version 2.6; see https://stackoverflow.com/a/11922239/460802 for a more up-to-date one.

The JMeter Wiki reports cases where JMeter was used with as many as 1,000 threads. I have used it with at most 100 threads, but the links in the wiki suggest resource reductions I never tried.

One of the issues we had with running JMeter on Windows XP was the Windows XP TCP connection limit. This limit should be removed in order to use JMeter to the workstation's full potential.
More info here. AFAIK, this does not apply to other operating systems.

I have used JMeter since 2004 and I have run a lot of load tests.
With a Windows 7 64-bit PC, 4GB of RAM and an Intel Core i5:
I think JMeter can support 300 to 400 concurrent threads for the HTTP Sampler with only one Aggregate Report listener, which writes the results and the timings between page calls to the log file.
For a big load test you could configure JMeter with slaves (load generators), like this:
http://jmeter-plugins.org/wiki/HttpSimpleTableServer/
I have already run tests with 11 slave PCs to simulate 5,000 threads (roughly 450 threads per slave).

I have not used JMeter, but the answer probably depends on your hardware. Your best bet might be to establish performance metrics, guess at the number of threads, and then run a binary search, as follows.
Source: Wikipedia.
Number guessing game...
This rather simple game begins something like "I'm thinking of an integer between forty and sixty inclusive, and to your guesses I'll respond 'High', 'Low', or 'Yes!' as might be the case." Supposing that N is the number of possible values (here, twenty-one, as "inclusive" was stated), then at most ⌈log2(N)⌉ questions are required to determine the number, since each question halves the search space. Note that one less question (iteration) is required than for the general algorithm, since the number is already constrained to be within a particular range.
Even if the number we're guessing can be arbitrarily large, in which case there is no upper bound N, we can still find the number in at most 2⌈log2(k)⌉ steps (where k is the (unknown) selected number) by first finding an upper bound by repeated doubling. For example, if the number were 11, we could use the following sequence of guesses to find it: 1, 2, 4, 8, 16, 12, 10, 11
One could also extend the technique to include negative numbers; for example the following guesses could be used to find −13: 0, −1, −2, −4, −8, −16, −12, −14, −13
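Applied to load generation, the same idea looks roughly like the sketch below: double the thread count until a run fails (client CPU pegged, errors appearing), then binary-search between the last good and the first bad value. run_test_at() is a hypothetical hook you would implement around your own JMeter run and pass/fail criteria:

    def run_test_at(threads):
        """Hypothetical hook: run the load test with `threads` threads and
        return True if the client stayed healthy (CPU below ~80%, no errors)."""
        raise NotImplementedError

    def max_stable_threads(start=50):
        # Phase 1: find an upper bound by repeated doubling.
        low, high = 0, start
        while run_test_at(high):
            low, high = high, high * 2

        # Phase 2: binary search between the last good and first bad value.
        while high - low > 1:
            mid = (low + high) // 2
            if run_test_at(mid):
                low = mid
            else:
                high = mid
        return low  # highest thread count that still passed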

It depends more on the kind of performance testing you do (load, spike, endurance, etc.) against a specific server, and a little on the hardware.
Keep these parameters in mind:
- On the client machine on which you intend to run JMeter, a certain amount of heap memory will be allocated; make sure the allocation is healthy so that the script does not error out. The highest I have run in JMeter was 1,500 threads in a local environment (client-server architecture); on a web architecture, the highest I ran was limited to 250 threads by the non-functional requirements.
So ideally it depends on the kind of performance testing, the deployment style, and so on.

There is no standard number for this. The maximum number of threads that you can generate from one computer depends completely on the computer's hardware and the OS. The OS itself occupies a certain amount of CPU and RAM by default.
To find out the maximum number of threads your computer can handle, prepare a sample test and run it with only a few threads, then increase the thread count gradually with each test cycle. While doing this, also monitor the CPU, RAM, disk I/O and network I/O of your computer. The moment any of these gets near or beyond 80% (it's up to you to decide whether "near" or "beyond" is acceptable), that is the maximum number of threads your computer can handle. To be on the safer side I would stop at the point where resource utilization reaches 70%, as sketched below.
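A minimal sketch of that ramp-up loop, assuming the third-party psutil package is installed and that start_test()/stop_test() are hypothetical hooks around whatever launches and stops your test at a given thread count:

    import psutil  # third-party: pip install psutil

    def resources_ok(duration=60, threshold=70.0):
        """Sample CPU and RAM once a second for `duration` seconds and report
        whether both stayed below `threshold` percent the whole time."""
        for _ in range(duration):
            cpu = psutil.cpu_percent(interval=1)
            mem = psutil.virtual_memory().percent
            if cpu > threshold or mem > threshold:
                return False
        return True

    def find_safe_thread_count(start_test, stop_test, start=50, step=50):
        """Ramp up the thread count until the client machine crosses the
        resource threshold and return the last level that stayed healthy."""
        threads = start
        while True:
            start_test(threads)   # hypothetical: launch the test run
            healthy = resources_ok()
            stop_test()           # hypothetical: stop the test run
            if not healthy:
                return threads - step
            threads += step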

It'll depend on the hardware you run on as well as the underlying script. I've always felt that this fuzziness is the biggest problem with traditional load testing tools. If you've got a small budget ($200 or so gets you a LOT of testing), check out my company's load testing service, BrowserMob.
Besides our Real Browser Users (RBUs), which control thousands of actual browsers for the purpose of performance and load testing, we also have traditional virtual users (VUs). Scripts are written in JavaScript and can make various HTTP calls.
The reason I bring it up is that I always felt that the game of trying to figure out how many VUs you can fit on your load gen hardware is dangerous. It's so easy to get bad results without realizing it.
To solve that for BrowserMob, we took an extremely conservative approach to the number of VUs and RBUs per CPU core: no more than 1 browser or 50 threads per CPU core, and sometimes much less (so a 4-core load generator would run at most 200 threads). In the world of cloud computing, CPU cycles are so cheap that it just doesn't make sense to try to overload machines.

Related

Jelastic - vertical scaling

I have an application on Tomcat (no horizontal scaling) and I send it 20 requests. Each request executes complex calculations for 6-7 minutes.
I run the same thing on a standalone server with many cores (50+), and there each request is executed in a thread on a separate logical processor; execution time is 3-4 minutes.
On the Jelastic platform, it scales up to 40-42 cloudlets (even though the max is 92). Is there a way to "influence" scaling, i.e. to tell Jelastic that it should use more cloudlets? (I suppose there are fewer than 20 processors behind 42 cloudlets and time is shared between threads.)
Also, if I send twice as many requests (40), it uses around 65 cloudlets (the max in that case is set to 160).
Can I find anywhere the rules for how Jelastic decides how to scale, and what to do to make it use more cloudlets?
The overall compute power of your Jelastic node is determined by the cloudlet scaling limit. If you have a compute-intensive workload, you should set this value as high as possible, keeping your budget constraints in mind (you can also configure load alerts to notify you about persistent excessive usage).
The cloudlet consumption (in terms of billing) is calculated as the average CPU usage per hour, so if you have a spike of a few minutes - as in your example - you will want to allow it to use as much CPU as it can.
The usage that you see displayed is a function of how your workload is spreading across cores, with potentially artificial limitations on the actual compute power of each core due to the cloudlet limits that you've set. To get the best performance, you have to allow a very high cloudlet limit, because this will give you full access to each core instead of an (effectively) limited clock speed per core.
What you set as reserved cloudlets doesn't matter (it's purely a billing construct: no technical implications), and no cloudlets are added or removed from your server - again, this is purely a billing construct, not a technical one. This means there is no way to instruct Jelastic to use more cloudlets: that's happening purely due to how your code is executed (most likely the way it's spread across cores due to the overall cloudlet limits you've set vs. the compute power of the underlying hardware).
There is also a possibility that your performance will be different across different Jelastic providers depending on their hardware choices. For example some will use a high number of cores with lower clock speed, others may prioritise clock speed over the number of cores. Your application performance in that case will depend on how effectively it can be parallelised across cores (multi-threading); if your most intensive chunk of work doesn't parallelise, it will work best with a higher clock speed. Given that you get increased cloudlet consumption with more requests, it sounds like this is at least partially responsible in your case.
You might also consider that if you have better performance from dedicated hardware, you can get Jelastic as a private or hybrid cloud option. They may also offer different hardware options in different regions to cater for different types of workloads. Talk to your Jelastic provider and see what they can offer you.

How many safe parallel connections can be made to a server

I am trying to make a simple, general-purpose, multi-threaded async downloader in Python. How many parallel connections can generally be made to a server with minimal risk of being banned or rate-limited?
I am aware that the network will be limiting in some cases, but let's assume for the sake of discussion that it isn't an issue here. I/O is also done asynchronously.
According to Browserscope, browsers make a maximum of 17 connections at a time.
However, according to my research, most download managers download files in multiple parts and make 8+ connections per file.
1. How many files can be downloaded at a time?
2. How many chunks of a single file can be downloaded at one time?
3. What should be the minimum size of those chunks to make the overhead of creating parallel connections worthwhile?
It depends.
While some servers tolerate a high number of connections, others don't. General web servers might be more on the high side (low two digits); file hosters might be more sensitive.
There's little more to say unless you can check the server's configuration, or just try it and remember for next time once your ban has timed out.
You should, however, watch your bandwidth: once you max out your access line, there's no gain in increasing the number of connections further.
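Whatever limit you settle on, the practical part is enforcing it. A minimal sketch with asyncio and the third-party aiohttp library; the cap of 8 concurrent connections is just an illustrative starting point, not a recommendation for any particular server:

    import asyncio
    import aiohttp  # third-party: pip install aiohttp

    MAX_CONNECTIONS = 8  # illustrative starting point; tune per server

    async def fetch(session, sem, url):
        async with sem:  # cap the number of concurrent requests
            async with session.get(url) as resp:
                resp.raise_for_status()
                return await resp.read()

    async def download_all(urls):
        sem = asyncio.Semaphore(MAX_CONNECTIONS)
        async with aiohttp.ClientSession() as session:
            return await asyncio.gather(*(fetch(session, sem, u) for u in urls))

    # Example: asyncio.run(download_all(["https://example.com/a", "https://example.com/b"]))

The same semaphore pattern works if you split one file into chunks via HTTP Range requests: each chunk download acquires the semaphore, so the total number of open connections stays bounded.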

Single-CPU programs running on Hyper-Threading-enabled quadcore CPU

I'm a researcher in statistical pattern recognition, and I often run simulations that run for many days. I'm running Ubuntu 12.04 with Linux 3.2.0-24-generic, which, as I understand, supports multicore and hyper-threading. With my Intel Core i7 Sandy Bridge Quadcore with HTT, I often run 4 simulations (programs that take a long time) at the same time. Before I ask my question, here are the things that I already (think I) know.
My OS (Ubuntu 12.04) detects 8 CPUs due to hyper-threading.
The scheduler in my OS is clever enough never to schedule two programs to run on two logical (virtual) cores belonging to the same physical core, because the OS supports SMT (Simultaneous Multi-Threading).
I have read the Wikipedia page on Hyper-Threading.
I have read the HowStuffWorks page on Sandy Bridge.
OK, my question is as follows. When I run 4 simulations (programs) on my computer at the same time, they each run on a separate physical core. However, due to hyper-threading, each physical core is split into two logical cores. Therefore, is it true that each of the physical cores is only using half of its full capacity to run each of my simulations?
Thank you very much in advance. If any part of my question is not clear, please let me know.
This answer is probably late, but I see that nobody offered an accurate description of what's going on under the hood.
To answer your question, no, one thread will not use half a core.
Only one thread can work inside the core at a time, but that one thread can saturate the whole core's processing power.
Assume thread 1 and thread 2 belong to core #0. Thread 1 can saturate the whole core's processing power while thread 2 waits for the other thread to end its execution. It's serialized execution, not parallel.
At a glance, that extra thread looks useless. I mean, the core can only process one thread at a time, right?
Correct, but there are situations in which the cores are actually idling because of 2 important factors:
cache miss
branch misprediction
Cache miss
When it receives a task, the CPU searches inside its own cache for the memory addresses it needs to work with. In many scenarios the memory data is so scattered that it is physically impossible to keep all the required address ranges inside the cache (since the cache does have a limited capacity).
When the CPU doesn't find what it needs inside the cache, it has to access the RAM. The RAM itself is fast, but it pales compared to the CPU's on-die cache. The RAM's latency is the main issue here.
While the RAM is being accessed, the core is stalled. It's not doing anything. This is not noticeable because all these components work at a ridiculous speed anyway and you wouldn't notice it through some CPU load software, but it stacks additively. One cache miss after another and another hampers the overall performance quite noticeably.
This is where the second thread comes into play. While the core is stalled waiting for data, the second thread moves in to keep the core busy. Thus, you mostly negate the performance impact of core stalls.
I say mostly because the second thread can also stall the core if another cache miss happens, but the likelihood of 2 threads missing the cache in a row instead of 1 thread is much lower.
Branch misprediction
Branching is when a piece of code can take more than one possible path; the most basic branching construct is an if statement.
Modern CPUs have branch prediction algorithms embedded into their microcode which try to predict the execution path of a piece of code. These predictors are actually quite sophisticated and although I don't have solid data on prediction rate, I do recall reading some articles a while back stating that Intel's Sandy Bridge architecture has an average successful branch prediction rate of over 90%.
When the CPU hits a piece of branching code, it practically chooses one path (path which the predictor thinks is the right one) and executes it. Meanwhile, another part of the core evaluates the branching expression to see if the branch predictor was indeed right or not. This is called speculative execution.
This works similarly to 2 different threads: one evaluates the expression, and the other executes one of the possible paths in advance.
From here we have 2 possible scenarios:
The predictor was correct. Execution continues normally from the speculative branch which was already being executed while the code path was being decided upon.
The predictor was wrong. The entire pipeline which was processing the wrong branch has to be flushed and start over from the correct branch.
OR, the readily available thread can come in and simply execute while the mess caused by the misprediction is resolved. This is the second use of hyperthreading.
Branch prediction on average speeds up execution considerably since it has a very high rate of success. But performance does incur quite a penalty when the prediction is wrong.
Branch prediction is not a major factor of performance degradation since, like I said, the correct prediction rate is quite high.
But cache misses are a problem and will continue to be a problem in certain scenarios.
In my experience, hyper-threading helps out quite a bit with 3D rendering (which I do as a hobby). I've noticed improvements of 20-30% depending on the size of the scenes and the materials/textures required. Huge scenes use huge amounts of RAM, making cache misses far more likely. Hyper-threading helps a lot in overcoming these misses.
Since you are running on a Linux kernel you are in luck, because the scheduler is smart enough to make sure your tasks are divided between your physical cores.
Linux became hyper-threading aware in kernel 2.4.17 (ref: http://kerneltrap.org/node/391).
Note that the reference is about the old O(1) scheduler. Linux now uses the CFS scheduling algorithm, which was introduced in kernel 2.6.23 and should be even better.
But as already suggested, you can experiment by disabling hyper-threading in the BIOS and see whether your particular workload runs faster or slower with it enabled. If you start 8 tasks instead of 4, you will probably find that the total execution time for 8 tasks with hyper-threading is shorter than two separate runs with 4 tasks, but again, the best thing to do is to experiment. Good luck!
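If you want to try that experiment without touching the BIOS, a rough sketch is to run the same CPU-bound work as 4 and then as 8 parallel processes and compare total wall-clock time (substitute your own simulation for the busy-loop placeholder):

    import time
    from multiprocessing import Pool

    def simulate(_):
        """Placeholder for one CPU-bound simulation; replace with real work."""
        total = 0
        for i in range(30_000_000):
            total += i * i
        return total

    def timed_run(n_tasks):
        start = time.perf_counter()
        with Pool(processes=n_tasks) as pool:
            pool.map(simulate, range(n_tasks))
        return time.perf_counter() - start

    if __name__ == "__main__":
        for n in (4, 8):
            print(f"{n} parallel tasks took {timed_run(n):.1f} s")

If the 8-task run finishes in less than twice the 4-task time, hyper-threading is paying off for that workload.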
If you really want just 4 dedicated cores, you should be able to disable hyper-threading in your BIOS. Also, and this part I'm less clear on, I believe the processor is smart enough to do more work on a single thread if its second logical core is idle.
No, it's not exactly true. A hyperthreaded core is not two cores. Some things can run in parallel, but not as much as on two separate cores.

Host ASP.Net WebSite - What would be an Ideal Hardware Configuration?

I am studying various ASP.Net deployment approaches, and I have a basic question: is there any rule of thumb for defining the environment? What could be called a 'good' setup if I have to support 1,000 concurrent users (requests)?
I understand that there are many factors, like how the application is designed. But assuming everything else is great, what configuration should I look for: which processor, how much RAM, and so on?
Also, how many concurrent users should the configuration below be able to support?
CPU: Dual 3.40 GHz Intel Xeon (Hyper-Threaded)
Memory : 3GB
OS: Windows Server 2003 SP2
Thanks for the help.
Having been on both sides of the equation (web developer and hardware engineer), my current opinion is that the answer involves both of those sides as well.
Your hardware needs to be not only sufficient for general usage, but it also has to cope with reasonable unexpected peaks and failures - which means that it needs to be redundant, and in excess of your capacity planning.
Your software needs to be designed so that it's easily made redundant - there's no point in speccing a tiered hardware architecture (now or for future planning) if the software is going to require a significant amount of changes to handle it.
Your software also needs to be designed so that sudden unexpected peaks in resource usage don't become a regular occurrence without an external reason (e.g. a marketing campaign).
I know that you say you understand the non-hardware factors, but the real answer to your question is that there is no real way to answer it without knowing the other factors - each situation and circumstance is unique, and requires a unique solution.
However, in an effort to add generalised recommendations, try these:
CPU - choose something with a lot of cache, and individual cache per core as well. This will do wonders to speed up the system. I typically go for dual-core, dual-processor at a minimum (for a total of 4 cores on two separate physical CPUs). Processor speed ratings don't really matter as much as you think these days.
Memory - fast memory, a minimum of 8GB of it. Use the smallest DIMMs possible for the server.
Hard disk - SAS 15K RPM at a minimum; RAID 6 for the data partition on one controller, RAID 1 or 6 for the system partition on another controller. Choose a good-quality controller backed by a good support or warranty package - your controller is no good if it dies in three years' time and you can't get a replacement.
But above all, don't just install the OS and app and let it be; profile the setup as much as possible, and don't be afraid of making changes to optimise for the individual setup (within reason). Move your ASP.Net temporary files to a fast disk (or a RAM disk - if they are going to be rebuilt anyway, there's no point worrying about losing them). Move the database to a second server, with a crossover 1Gbit link between the two. Turn off disk maintenance in the OS, and turn off services you do not need.
Good luck!

Best way to determine the number of servers needed

How much traffic can one web server handle? What's the best way to see if we're beyond that?
I have an ASP.Net application that has a couple hundred users. Aspects of it are fairly processor intensive, but thus far we have done fine with only one server to run both SqlServer and the site. It's running Windows Server 2003, 3.4 GHz with 3.5 GB of RAM.
But lately I've started to notice slowdowns at various times, and I was wondering what's the best way to determine whether the server is overloaded by the usage of the application or whether I need to do something to fix the application (I don't really want to spend a lot of time hunting down little optimizations if I'm just expecting too much from the box).
What you need is some info on capacity planning.
Capacity planning is the process of planning for growth and forecasting peak usage periods in order to meet system and application capacity requirements. It involves extensive performance testing to establish the application's resource utilization and transaction throughput under load. First, you measure the number of visitors the site currently receives and how much demand each user places on the server, and then you calculate the computing resources (CPU, RAM, disk space, and network bandwidth) that are necessary to support current and future usage levels.
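As a back-of-the-envelope illustration of that arithmetic (all of the numbers below are made up; substitute your own measurements):

    # Back-of-the-envelope capacity estimate; every input here is a made-up example.
    concurrent_users   = 1000
    think_time_s       = 10.0   # average pause between one user's requests
    cpu_ms_per_request = 40.0   # measured CPU cost of one request on one core

    requests_per_sec = concurrent_users / think_time_s                    # 100 req/s
    cores_at_full_load = requests_per_sec * cpu_ms_per_request / 1000.0   # 4 cores
    print(f"{requests_per_sec:.0f} req/s -> ~{cores_at_full_load:.1f} cores at 100%; "
          f"plan for headroom (e.g. double it)")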
If you have access to some profiling tools (such as those in the Team Suite edition of Visual Studio), you can try setting up a testing server and running some synthetic requests against it to see if there's any specific part of the code taking unreasonably long to run.
You should probably check some graphs of CPU and memory usage over time before doing this, to see if it can even be that. (A number akin to the UNIX "load average" could be a useful metric; I don't know if Windows has anything like it. Basically, the average number of threads that want CPU time in every time slice.)
Also check the obvious, that you aren't running out of bandwidth.
Measure, measure, measure. Rico Mariani always says this, and he's right.
Measure req/sec, RAM, CPU, Sessions, etc.
You may come up with a caching strategy (Output caching, data caching, caching dependencies, and so on.)
Also see how your SQL Server is doing... indexes are a good place to start, but they are not the only thing to look at.
On that hardware, a .NET application should be able to serve about 200-400 requests per second. If you have only a few hundred users, I doubt you are seeing even 2 requests per second (a few hundred users would each have to issue a request every couple of minutes to reach that), so I think you have a lot of capacity on that box, even with SQL Server running.
Without knowing all of the details, I would say no, you will not see any performance improvement by adding servers.
By the way, if you're not using the Output Cache, I would start there.
