NiFi memory management - bigdata

I just want to understand how we should plan capacity for a NiFi instance.
We have a NiFi instance with around 500 flows, so the total number of processors enabled on the NiFi canvas is around 4000. We run 2-5 flows simultaneously, and each takes no more than half an hour, i.e. we process data in the MB range.
It was working fine until now, but we are seeing OutOfMemory errors very often. We increased the -Xms and -Xmx parameters from 4g to 8g, which has resolved the problem for now. But going forward we will have more flows, and we may face OutOfMemory issues again.
So, can anyone help with a capacity planning matrix, or any suggestion for avoiding such issues before they happen? E.g. if we have 3000 processors enabled, with or without any processing, then X GB of memory is required.
Any input on NiFi capacity planning would be appreciated.
Thanks in Advance.

OOM errors can occur due to specific memory-consuming processors. For example, SplitXml loads the whole record into memory, so it could load a 1 GiB file, for instance.
Each processor can document which resource considerations should be taken into account. All of the Apache processors (as far as I can tell) are documented in that respect, so you can rely on them.
In our example, by the way, SplitXml can be replaced with SplitRecord, which doesn't load the whole record into memory.
So even if you use 1000 processors simultaneously, they might not consume as much memory as one processor that loads your whole FlowFile's content into memory.
Check which processors you are using and make sure you don't use one like that (there are others like this one that load the whole document into memory).
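To illustrate the difference outside NiFi, here is a minimal Python sketch (not NiFi code) that compares peak allocations when a file is read whole versus record by record. All function names and the sample file are made up for illustration; it only shows the general pattern and says nothing about NiFi internals.

```python
import os
import tempfile
import tracemalloc


def count_records_whole(path):
    """Read the entire file into memory, then split it into records."""
    with open(path, "r", encoding="utf-8") as fh:
        content = fh.read()  # the whole payload is resident at once
    return len(content.splitlines())


def count_records_streaming(path):
    """Iterate one line at a time, so only a small buffer is resident."""
    count = 0
    with open(path, "r", encoding="utf-8") as fh:
        for _ in fh:
            count += 1
    return count


def peak_bytes(func, path):
    """Return (result, peak traced allocation in bytes) for one call of func(path)."""
    tracemalloc.start()
    try:
        result = func(path)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def make_sample_file(records=50_000):
    """Write a throwaway file of newline-delimited records and return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        for i in range(records):
            fh.write(f"record-{i:08d},payload-payload-payload\n")
    return path


if __name__ == "__main__":
    sample = make_sample_file()
    try:
        n_whole, peak_whole = peak_bytes(count_records_whole, sample)
        n_stream, peak_stream = peak_bytes(count_records_streaming, sample)
        print(f"whole-file: {n_whole} records, peak {peak_whole} bytes")
        print(f"streaming:  {n_stream} records, peak {peak_stream} bytes")
    finally:
        os.remove(sample)
```

The whole-file variant's peak grows with the file size, while the streaming variant's peak stays roughly constant, which is why one whole-content processor can outweigh many streaming ones.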

Related

Deeply control orchestration throttling and dispatching in BizTalk based on message batch size

I have a BizTalk orchestration which processes a single message. These messages are actually batches of messages. Most of the time the batch size n is small (<1,000), but once in a while there are very large batches (>50,000). We have a high throughput of messages as well.
The orchestration takes a linear O(n) amount of system memory depending on the batch size, and I know by observation that a single server can process up to an accumulated batch size of ~250k in parallel before it runs out of system memory and only returns OutOfMemoryExceptions. (This kills the BizTalk host instance, and the orchestrations start up on another host, which ultimately breaks as well, leaving our BizTalk group in a broken state that can currently only be recovered by manual intervention.)
Small batches are common, large batches are rare but kind of deadly if there is more than one at the same time.
I know the batch size in advance, so I could tell BizTalk about it. But I see no way to interact with throttling. By the time throttling detects a lack of system memory, it is already too late.
Do I have to build my own queueing and dispatching on top of BizTalk to achieve my goals?
Our current solution is to use a semaphore with a value of 8: every large message (n > 1000) needs to get a semaphore slot before it is allowed to start processing. We had an edge case the other day where even this was too much. We reduced 8 to 4 to resolve this, but now we have noticeably impacted the general throughput.
Any idea or hint is welcomed!
Don't use XmlDocument within your processing. It will further exacerbate your memory issues. Prefer XmlReader for sure here. However, I'd still try to move processing outside of your orchestration. Even if you can get the streaming working in a .NET component called from the orchestration, you can still end up with an orchestration instance that runs for a long time and consumes lots of memory, which should be avoided whenever possible. Therefore...
Avoid letting the orchestration get messages that large to begin with. It may be possible to debatch the message using the OOB XmlDisassembler if you can mark the schema as an envelope schema; if not, you may need to create a custom disassembler component to do your debatching (just remember to promote/write the proper context properties from the original message to the newly created messages). If you use some streaming techniques (see https://www.microsoft.com/en-us/download/details.aspx?id=20375) in the pipeline, you can greatly reduce the memory footprint and have much greater control there. Again, use XmlReader to actually parse and debatch the message (it shouldn't be super difficult: look into ReadToFollowing and ReadSubtree, as in the question "Splitting large xml files in to sub files without memory contention"). You might get away with doing this in an orchestration rather than a pipeline component, but in a pipeline component it should be easier to control memory usage. You may also look into promoting things like a batch ID if you need to correlate the messages back together.
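The streaming debatch idea can be sketched as follows. This is a Python analogue, not BizTalk or .NET code: `iterparse` plays the role that XmlReader with ReadToFollowing/ReadSubtree would play in a pipeline component. The function name `debatch` and the `Batch`/`Order` element names are hypothetical, and the sketch assumes the items are direct children of the envelope root.

```python
import io
import xml.etree.ElementTree as ET


def debatch(stream, item_tag):
    """Yield each <item_tag> element as its own serialized message.

    Only one item is held in memory at a time; already-emitted items are
    released from the envelope root as parsing continues.
    """
    # Ask for "start" events too, so the first event hands us the root element.
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == item_tag:
            yield ET.tostring(elem, encoding="unicode")
            root.clear()  # drop children that have already been emitted


if __name__ == "__main__":
    batch = io.BytesIO(
        b'<Batch><Order id="1"><Qty>2</Qty></Order><Order id="2"/></Batch>'
    )
    for message in debatch(batch, "Order"):
        print(message)
```

Because `debatch` is a generator, the caller decides how fast messages are pulled, which keeps the memory footprint tied to the size of one item rather than the whole batch.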
If you get a large batch, you will still need to throttle the number of concurrent orchestrations; you could do so as Richard Seroter suggests, with multiple convoys that correlate on instance IDs to prevent too many from running at once. Alternatively, you could use ordered delivery on the receive shape (see MSDN), which would probably be my preferred option as it takes significantly less work and won't face the concerns around zombie messages that are possible with convoys.
Basically: try to think small and lean as much as possible and BizTalk will be happier. BizTalk would much rather process 1000 small messages in a second than 1 very large message in a minute.
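Since the question states that the batch size is known in advance and that a server copes with an accumulated size of roughly 250k, one way to picture size-aware throttling is a weighted budget instead of a fixed slot count. The sketch below is a generic Python illustration, not a BizTalk feature and not something the answer above proposes; the class name `BatchBudget` and its methods are hypothetical.

```python
import threading


class BatchBudget:
    """Admit work only while the sum of admitted batch sizes fits a budget."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._in_use = 0
        self._cond = threading.Condition()

    def try_acquire(self, size):
        """Reserve `size` units if they fit right now; never blocks."""
        with self._cond:
            if self._in_use + size > self._capacity:
                return False
            self._in_use += size
            return True

    def acquire(self, size):
        """Block until `size` units fit into the remaining budget."""
        if size > self._capacity:
            raise ValueError("batch is larger than the whole budget")
        with self._cond:
            while self._in_use + size > self._capacity:
                self._cond.wait()
            self._in_use += size

    def release(self, size):
        """Return `size` units and wake up any waiting callers."""
        with self._cond:
            self._in_use -= size
            self._cond.notify_all()


if __name__ == "__main__":
    budget = BatchBudget(250_000)
    print(budget.try_acquire(200_000))  # a large batch is admitted
    print(budget.try_acquire(60_000))   # a second large one does not fit yet
    print(budget.try_acquire(1_000))    # small batches still get through
```

With this shape, small batches are not held back by a low slot count, while two large batches cannot be admitted together. A plain condition variable like this does not guarantee fairness, so a very large batch could wait a long time behind a steady stream of small ones.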

IIS High Thread Count

I have an IIS application which behaves like this: the total number of threads in the IIS processes is low; traffic starts at a low rate, around 5 rpm; the number of threads starts increasing alarmingly and keeps going even after the load stops; it does not come down in a reasonable time and reaches 30,000-plus threads; response time goes for a toss.
Machine config is set to autoConfig.
There are no explicit threads in the application, though there is some --very fancy-- use of Parallel.ForEach.
I am looking for tips on how to go about diagnosing this. Reducing the use of Parallel.ForEach seemed to help, but I have yet to prove it conclusively. Limiting the maximum number of threads also helps cap the thread count, but I think there is something wrong with the app that causes those threads to keep increasing, and I want to solve that.
In the picture below, the thread count is ONLY for IIS worker processes. The PUT requests are the only ones doing some work; GETs are mostly requests for static resources.
Can this be reproduced in a local or dev environment? If so, it's a good time to attach to the process and use the debugging tools to see which threads are managed and where they are in code. If that fails to unveil anything, it might be time to capture a memory dump from the process and dig into it with WinDbg.

aspnet_wp keeps recycling because of high memory consumption. How can I fix it?

I have a small WCF service which runs on an XP box with 256 MB of RAM, running in a VM.
When I make a request (with a request size of approximately 5 MB) to that service, I always get the following message in the event log:
aspnet_wp.exe was recycled because memory consumption exceeded the 153 MB (60 percent of available RAM).
and the call fails with error 500.
I've tried to increase memory limit to 95% but it still takes up all the available memory and fails in the same manner.
It looks like something is wrong with my app (I do not reuse byte[] buffers and maybe something else) but I cannot find root cause of such memory overuse.
Profiling showed that all CLR objects that I have in memory together do not take up that much space.
Doing a dump analysis with windbg showed same situation - nothing that big in object heap.
How can I find out what is contributing to such memory overuse?
Is there any way to make a dump right before process is recycled (during peak mem usage)?
Tess Ferrandez's blog "If broken it is, fix it you should" has lots of hints, tips and recommendations for sorting out exactly this sort of problem.
Of particular use to you would be Lab 3: Memory, where she walks you through working out what has caused all the memory on your machine to disappear.
Could be a lot of things; this one is hard to diagnose. Have you watched perfmon to see whether the memory usage peaks in the aspnet process or on the server itself? 256 MB is pretty low, but it should still be able to handle it. Do you have a swap file on this machine? At what point do you take the memory dump? Have you stepped through the code, and does it work on other machines? Perhaps it is getting stuck in a loop and leaking memory until it crashes?

What is the highest number of threads that is reasonable to run simultaneously in JMeter?

I want to use the highest possible number of threads (to use fewer computers) but without making the client the bottleneck.
JMeter can simulate a very high load provided you use it right.
Don't listen to urban legends that say JMeter cannot handle high load.
Now as for the answer, it depends on:
- your machine's power
- your JVM (32-bit or 64-bit)
- your JVM's allocated memory (-Xmx)
- your test plan (lots of BeanShell, post-processors, XPath ... means lots of CPU)
- your OS configuration (tunable)
- GUI / non-GUI mode
So there is no theoretical answer, but following best practices will ensure JMeter performs well.
Note that with jmeter you can distribute load through remote testing, read:
Remote Testing > 15.4 Using a different sample sender
And finally use cloud based testing if it's not enough.
Read this for tuning tips:
http://www.ubik-ingenierie.com/blog/jmeter_performance_tuning_tips/
Read this book for doing load testing and using JMeter correctly.
I have used JMeter a fair bit and found it is not great at generating really high load. On a 2 GHz Core 2 Duo with 2 GB of memory you can reasonably expect about 100 threads.
That being said, it is best to run it on your hardware so that the CPU of the PC does not peak at 100%; a stable 80%-90% is best, otherwise the results are affected.
I have also tried WAPT 5, which successfully ran 1000+ threads from the same PC. It is not free, and it is more usable than JMeter, but it doesn't have all of the features.
Outdated answer since at least version 2.6; see https://stackoverflow.com/a/11922239/460802 for a more up-to-date one.
The JMeter Wiki reports cases where JMeter was used with as many as 1000 threads. I have used it with at most 100 threads, but the links in the Wiki suggest resource reductions I never tried.
One of the issues we had with running JMeter on Windows XP was the Windows XP TCP connection limit. The limit should be removed in order to use JMeter to the workstation’s full potential.
More info here. AFAIK, does not apply to other OS.
I have used JMeter since 2004 and have launched a lot of load tests.
With a Windows 7 64-bit PC, 4 GB RAM, Core i5:
I think JMeter can support 300 to 400 concurrent threads for the HTTP (Sampler) protocol with only one "Aggregate Report" listener that writes results to the log file, and with timers between page calls.
For a big load test you could configure JMeter with slaves (load generators) like this
http://jmeter-plugins.org/wiki/HttpSimpleTableServer/
I have already done tests with 11 PC slaves to simulate 5000 threads.
I have not used JMeter, but the answer probably depends on your hardware. Best bet might be to establish metrics of performance, guess at the number of threads and then run a binary search as follows.
Source was Wikipedia.
Number guessing game...
This rather simple game begins something like "I'm thinking of an integer between forty and sixty inclusive, and to your guesses I'll respond 'High', 'Low', or 'Yes!' as might be the case." Supposing that N is the number of possible values (here, twenty-one as "inclusive" was stated), then at most ⌈log2 N⌉ questions are required to determine the number, since each question halves the search space. Note that one less question (iteration) is required than for the general algorithm, since the number is already constrained to be within a particular range.
Even if the number we're guessing can be arbitrarily large, in which case there is no upper bound N, we can still find the number in at most 2⌈log2 k⌉ steps (where k is the (unknown) selected number) by first finding an upper bound by repeated doubling. For example, if the number were 11, we could use the following sequence of guesses to find it: 1, 2, 4, 8, 16, 12, 10, 11
One could also extend the technique to include negative numbers; for example the following guesses could be used to find −13: 0, −1, −2, −4, −8, −16, −12, −14, −13
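The doubling-then-halving procedure for positive numbers can be sketched in Python as below. The names `find_by_doubling` and `make_responder` are made up for this illustration; for a thread-count search, the responder would be replaced by whatever "too many / too few" check you settle on.

```python
def find_by_doubling(respond):
    """Find a positive integer using only 'High' / 'Low' / 'Yes!' answers.

    `respond(guess)` must return 'High' if the guess is too high, 'Low' if it
    is too low, and 'Yes!' if it is correct. Returns the list of guesses made.
    """
    guesses = []

    # Phase 1: double the guess until it is no longer too low.
    guess = 1
    while True:
        guesses.append(guess)
        answer = respond(guess)
        if answer == "Yes!":
            return guesses
        if answer == "High":
            break
        guess *= 2

    # Phase 2: binary search strictly between the last 'Low' and the first 'High'.
    low, high = guess // 2 + 1, guess - 1
    while low <= high:
        mid = (low + high) // 2
        guesses.append(mid)
        answer = respond(mid)
        if answer == "Yes!":
            return guesses
        if answer == "Low":
            low = mid + 1
        else:
            high = mid - 1
    raise ValueError("answers were inconsistent")


def make_responder(secret):
    """Build the answering side of the game for a fixed secret number."""
    def respond(guess):
        if guess > secret:
            return "High"
        if guess < secret:
            return "Low"
        return "Yes!"
    return respond


if __name__ == "__main__":
    print(find_by_doubling(make_responder(11)))  # [1, 2, 4, 8, 16, 12, 10, 11]
```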
It depends more on the kind of performance testing you do (load, spike, endurance, etc.) on a specific server, and a little on the hardware.
Keep these parameters in mind:
- On the client machine where you run JMeter, a certain amount of heap memory is allocated; ensure the allocation is healthy so that the script does not error out. The highest I have run on JMeter was 1500 threads in a local environment (client-server architecture). On a web architecture, my highest run was limited to 250 threads by the non-functional requirements.
So ideally it depends on the kind of performance testing, the deployment style, and so on.
There is no standard number for this. The maximum number of threads that you can generate from one computer depends completely on the computer's hardware and the OS. The OS by default occupies a certain amount of the CPU and RAM.
To find out the maximum threads your computer can handle you can prepare a sample test and run it with only a few threads. Then with each cycle of test run increase the number of threads gradually. During this you also need to monitor the CPU, RAM, Disk I/O and Network I/O of your computer. The moment any of these reach near or beyond 80% (Again for you to decide if near is okay for you or beyond), that is the maximum number of threads your computer can handle. To be on the safer side I would stop at the number when the resource utilization reaches 70%.
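The ramp-up procedure described above can be sketched as a small loop. This is only an outline under assumptions: `measure` is a hypothetical callback that you would have to implement yourself (for example by running the sample test plan and reading your monitoring tool), and the resource names are placeholders.

```python
def find_max_threads(measure, start=10, step=10, threshold=80, limit=10_000):
    """Increase the thread count step by step until a resource hits the threshold.

    `measure(threads)` is a hypothetical callback: it should run the sample
    test with that many threads and return a dict mapping a resource name
    (e.g. CPU, RAM, disk I/O, network I/O) to its peak utilization in percent.

    Returns (threads, resource) for the first run in which any resource
    reached the threshold, or (None, None) if `limit` was passed first.
    """
    threads = start
    while threads <= limit:
        usage = measure(threads)
        # Pick the most heavily used resource for this run.
        resource, peak = max(usage.items(), key=lambda item: item[1])
        if peak >= threshold:
            return threads, resource
        threads += step
    return None, None


if __name__ == "__main__":
    # Fake measurements standing in for a real test run plus monitoring.
    def fake_measure(threads):
        return {"cpu": threads // 5, "ram": 30, "disk": 10, "net": threads // 10}

    print(find_max_threads(fake_measure, start=100, step=100, threshold=80))
```

Passing `threshold=70` gives the more conservative stopping point mentioned above.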
It'll depend on the hardware you run on as well as the underlying script. I've always felt that this fuzziness is the biggest problem with traditional load testing tools. If you've got a small budget ($200 or so gets you a LOT of testing), check out my company's load testing service, BrowserMob.
Besides our Real Browser Users (RBUs), which control thousands of actual browsers for the purpose of performance and load testing, we also have traditional virtual users (VUs). Scripts are written in JavaScript and can make various HTTP calls.
The reason I bring it up is that I always felt that the game of trying to figure out how many VUs you can fit on your load gen hardware is dangerous. It's so easy to get bad results without realizing it.
To solve that for BrowserMob, we took an extremely conservative approach on the number of VUs and RBUs per CPU core: no more than 1 browser or 50 threads per CPU core, and sometimes much less. In the world of cloud computing, CPU cycles are so cheap that it just doesn't make sense to try to overload machines.

ASP.NET - Single large web request triggers System.OutOfMemoryException - Still have plenty of available memory

Environment:
Windows 2003 Server (32-bit); IIS6, ASP.NET 2.0 (3.5); 4 GB RAM; 1 worker process
We have a situation where a very large System.Xml.XmlDocument is loaded into memory and then passed into a compiled XSL transform.
When the web request comes in, the server is sitting idle with 2500 MB of available system memory.
As the XML DOM is populated, the available memory drops by approximately 500 MB, at which point we get a System.OutOfMemoryException. At this point the system should theoretically still have 2000 MB of memory available to service the request (according to Perfmon).
The related questions I have are:
1) At what level in the stack is this out of memory limitation being met? OS? IIS? ASP.NET? worker process? Is this a per individual web request limit?
2) Is this limit configurable somewhere?
3) Why can’t this web request access the full available system memory?
1) I would guess at the worker process, but the limit on the memory a worker process can use should be configurable within IIS. Another factor is whether your software is 32-bit or 64-bit, e.g. a 32-bit process is limited to 4 GB since that is the total address space.
2) Probably, but don't forget that memory fragmentation may play a role in running out of memory sooner than you think, e.g. if there is a request for a contiguous 1000 MB piece of memory, it may not necessarily be found in the memory currently available.
3) Have you examined dump data to see what is in the memory when the exception gets thrown? If not, there are ways to get a snapshot of the memory to see what it looks like as this may give you more clues about what is going on.
You are running in a process. On 32-bit Windows, a process can by default only address 2 GB of memory. This task shares that memory with everything else running in the process, so this bit of code does not get the full 2 GB, even if it is available.
There is a 3 GB switch in the OS as well (the /3GB option in boot.ini, as far as I know, rather than a registry setting), but you will have to search MSDN for the details.
But realistically, you need to do this another way, possibly by switching to a SAX-style XML parser.
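For readers unfamiliar with the SAX style, here is a minimal Python sketch of event-driven parsing that never builds a DOM. It is an illustration only: the original problem is in .NET, where the closest built-in equivalent is XmlReader, and the names `RecordCounter`, `count_records`, and the `row` element are hypothetical.

```python
import io
import xml.sax


class RecordCounter(xml.sax.ContentHandler):
    """Count occurrences of one element without keeping the document in memory."""

    def __init__(self, record_tag):
        super().__init__()
        self.record_tag = record_tag
        self.count = 0

    def startElement(self, name, attrs):
        # Called once per opening tag as the parser streams through the input.
        if name == self.record_tag:
            self.count += 1


def count_records(stream, record_tag):
    """Parse `stream` event by event and return how many <record_tag> elements it has."""
    handler = RecordCounter(record_tag)
    xml.sax.parse(stream, handler)
    return handler.count


if __name__ == "__main__":
    document = io.BytesIO(b"<root><row/><row/><row/></root>")
    print(count_records(document, "row"))  # 3
```

The handler sees one event at a time, so memory use depends on what the handler chooses to keep rather than on the size of the document.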
I'm sure there are some bright heads here that can answer your specific questions, but have you asked yourself if there is another way to do what you want? I specifically mean that you probably do not want to process a very large XML document, but you probably more specifically want to return something back to the client. Could you rewrite the code to avoid this XML document altogether, or perhaps not load it all into memory at the same time, and still produce the same end-result?
1) Dunno. Check your logs.
2) IIS limits memory divvied out to websites/application pools. Check your settings.
3) Servers are all about uptime; if a single app hogs all the resources, everybody else suffers. That's why enterprise apps like IIS limit memory, to prevent runaways from taking down the entire server.
