Univocity - Parsing a fixedwidth flat file with one row - performance impact with 300 parallel threads - string-parsing

We have a project that deals with millions of transactions everyday which has some tight SLAs. As part of parsing the flat file that comes as input to a bean , we used beanio which was working better with out load. But with load its taking around 250ms to parse a flat file to a bean.
Requirement: Simple string has to converted to a single bean(nested and converted)
Heard the univocity can do better here - and tried the same with below settings.
FixedWidthParserSettings settings = new FixedWidthParserSettings();
settings.getFormat().setLineSeparator("\n");
settings.setRecordEndsOnNewline(false);
settings.setHeaderExtractionEnabled(false);
settings.setIgnoreLeadingWhitespaces(false);
settings.setIgnoreTrailingWhitespaces(false);
settings.setMaxColumns(100);
settings.setMaxCharsPerColumn(100);
settings.setNumberOfRecordsToRead(1);
settings.setReadInputOnSeparateThread(false);
settings.setInputBufferSize(10*1024);
settings.setLineSeparatorDetectionEnabled(false);
settings.setColumnReorderingEnabled(false);
When running with jmeter, with 200 parallel threads - the average time taken is 10ms(to parse and convert around 10 fields where in actual use case we have to the same for around 500 fields)
but when we increased it to 300 or 350 parallel threads , the average time was around 300ms. But our total SLA is around 10ms.
Any help here is highly appreciated!

Probably you are running out of memory on your JVM. Try increasing it with the -Xms and -Xmx flags. Also too many threads won't help you if you don't have enough cores available.

Related

Why wouldn't a small Firebase Functions app just use a single Function to handle logic?

...aside from the benefit in separate performance monitoring and logging.
For logging, I am confident I can get granularity through manually adding the name of the "routine" to each call. This is how it is now with several discrete Functions for different parts of the system:
There are multiple automatic logs: start and finish of the routine, for example. It would be more challenging to find out how expensive certain routines are, but it would not be impossible.
The reason I want the entire logic of the application handled by a single handle function is because of reducing cold starts: one function means only one container that can be persistently kept alive when there are very few users of the app.
If a month is ~2.6m seconds and we assume the system uses 1 GB RAM and 1 GHz CPU frequency at all times, that's:
2600000 * 0.0000025 + 2600000 * 0.000001042 = USD$9.21 a month
...for one minimum instance.
I should also state that all of my functions have the bare minimum amount of global scope code; it just sets up Firebase assets (RTDB and Firestore).
From a billing, performance (based on user wait time), and user/developer experience perspective, is there any reason why it would be smart to keep all my functions discrete?
I'd also accept an answer saying "one single function for all logic is reasonable" as long as there's a reason for it.
Thanks!
If you have very small app with ~5 end points and very low traffic. Sure you could do something like this. But why not do it:
billing and performance
The important thing to realize is that with every request a new instance of your function is created. Which means there could be 10s of them running at the same time.
If you would like to have just 1 instance handling all the traffic you should explore GCP Cloud run, where you have 1 container handling multiple requests and scaling only when it's not sufficient.
Imagine you have several end-points and every one of them have different performance requirements.
1 can need only 128MB or RAM
1 can need 1GB RAM
(FYI: You can control the CPU MHz of the function via the RAM settings too - which can speed up execution in some cases)
If you had only 1 function with 1GB of ram. Every request would allocate such function and in some cases most of the memory could go to waste.
But if you split it into multiple, some requests will require much less resources and can save you $ when we talk about bigger amount of executions / month. (tens of thousands+).
Let's imagine function, 3 second execution, 10k executions/month:
128MB would cost you $0.0693
1024MB would cost you $0.495
As you can see, with small app the difference could be nothing. But if you scale it matters. (*The cost can vary based on datacenter)
As for the logging, I don't think it matters. Usually in bigger systems there could be messages traveling trough several functions so you have to deal with that anyway.
As for the cold start. You just need good UI to facilitate that. At first I was worry about it in our apps but later on, you just get used to it that some action can take ~2s to execute (cold start). And you should have the UI "loading" regardless, because you don't know if the function will take ~100ms or 3s due to bad connection.

Out of Memory with large data in codename one

My Codename One application downloads around 16000 records of data (approx 10 fields in each record).
On my Android phone (OS6.0, RAM 2GB) it's able to load 8000 to 9000 records but then shows out of memory error.
From the trace, it looks like it run out of heap memory allocated to the app.
Any suggestion what would be the ideal way to handle that large amount of data, please?
Here is the log file
The amount of RAM on the phone doesn't mean much. The OS takes about half and then divides the rest to the various apps running in parallel. You would typically have much less see What is the maximum amount of RAM an app can use?
You need to review your code and check what is eating up memory. 16k records of 1kb each would be 16Mb which probably shouldn't crash an app so the question is where is memory taken, I would suggest reading the performance section of the developer guide to figure out memory usage.
This might not apply to your situation, but would it be possible to only download x number of records at a time? Then, when the user takes some action (scrolls, hits next page, etc) it loads the next batch? Codename one has a great endless scroller implementation. See here for an example - https://www.codenameone.com/blog/property-cross-revisited.html

OpenCL : Id of the physical core being used

I'm trying to get something to work but I run out of ideas so I figured I would ask here.
I have a kernel that has a large global size (usually 5 Million)
Each of the threads can require up to 1Mb of global memory (exact size not known in advance)
So i figured... ok, on my typical target GPU I have 6Gb and I can run 2880 threads in parrallel, more than enough right ?
My idea is to create a big buffer (well actually 2 because of the max buffer size limitation...)
Each thread pointing to a specific global memory area (with the coalescence and stuff, but you get the idea...)
My problem is, How do I know which thread is currenctly being run (in the kernel code) to point to the right memory area ?
I did find the cl_arm_get_core_id extension but this only gives me the workgroup, not the acutal thread being used, plus this does not seem to be available on all GPUs, since it's an extension.
I have the option to have work_group_size = nb_compute_units / nb_cores and have the offset to be arm_get_core_id() * work_group_size + global_id() % work_group_size
But maybe this group size is not optimal, and the portability issue still exists.
I can also enqueue a lot of kernels calls with global size 2880, and there I obviously know where to point to with the global Id.
But won't this lead to a lot of overhead because of the 5Million / 2880 kernel calls ? Plus any work group that finishes before the others will be idle until all workgroups for this call have finished their job.
Any ideas to do this properly are very welcome !
Well, you are storing 1MB per WI for temporal computations (because you are not saving them, otherwise your wouldn't have memory).
Then, why not simply let it spill to global memory? Does the compiler complain? If it does complain, then you need other approaches:
One possibility is to create a queue (just a boolean array), of the memory zones empty for usage by the WorkGroups. And every time a new workgroup is launched it takes an empty slot and sets the boolean to "used" state. You can do this with atomic_cmpxchg() atomic operation.
It may introduce a small overhead to launch each WG, but it would be probably negligible if each WI is needing 1MB of global memory.
Here you have a small example of how to do atomic_cmpxchg() LINK

Hadoop - job submission time on large data

Did anyone face any problem with submitting job on large data. Data is around 5-10 TB uncompressed, it is in approximate 500K files. When we try to submit a simple java map reduce job, it's mostly spend more than hour on getsplits() function call. And takes multiple hour to appear in job tracker. Is there any possible solution to solve this problem?
with 500k files, you are spending a lot of time tree walking to find all these files, which then need to be assigned to list of InputSplits (the result of getSplits).
As Thomas points out in his answer, if your machine performing the job submission has a low amount of memory assigned to the JVM, then you're going to see issues with the JVM performing garbage collection to try and find the memory required to build up the splits for these 500K files.
To makes matters worse, if these 500K files are splittable, and larger than a single block size, then you'll get even more input splits to process the files (a file of size say 1GB, with a block size of 256MB, you'll by default get 4 map tasks to process this file, assuming the input format and file compression supports splitting the file). If this is applicable to your job (look at the number of map tasks spawned for your job, are there more than 500k?), then you can force less mappers to be created by amending the mapred.min.split.size configuration property to a size larger then the current block size (setting it to 1GB for the previous example means you'll get a single mapper to process the file, rather than 4). This will help the performance of getSplits method the resultant list of getSplits will be smaller, requiring less memory.
The second symptom of your problem is the time is takes to serialize the input splits to a file (client side), and then the deserialization time at the job tracker end. 500K+ splits is going to take time, and the jobtracker will have similar GC issues if it has a low JVM memory limit.
It largely depends on how "strong" your submission server is (or your laptop client), maybe you need to upgrade RAM and CPU to make the getSplits call faster.
I believe you ran into swap issues there and the computation takes therfore multiple times longer than usual.

Memory leak in asp.net application - W3WP and gen 2 heap continues to grow until AppPool recycles

We have a large asp.net application that is leaking memory. Perfmon shows that this leak is in managed memory as W3WP private bytes grows at the same rate as bytes in all heaps. I can also see that Gen 2 garbage collections are running but the Gen 2 heap size continues to grow.
I took a memory dump and analysed in WinDbg and can see a very large number of objects of lots of types. Strings are the biggest type and 20% of the size of the strings is made from 51 objects.
Dumping these large strings shows outputted html either from controls or entire pages. Running !gcroot on these shows the root objects being of type System.Text.RegularExpressions.Regex or System.Web.RegularExpressions.GTRegex.
Any ideas of what could be happening or how I can investigate further?
Thanks, Simon
How about using a memory profiler such as dotTrace Memory or ANTZ Memory Profiler? Both products are available as a time-limited trial version.
That strings are the most common type on your heap is not strange at all. If you for example have 10 HashSet's containing 1000 strings each, the dump will show that you have 10 HashSet's on your heap, but 100 000 strings. Many objects contain one or several strings. Thus, the number of strings shown in the dump is the sum of all strings from all objects on the heap, which tend to be a lot.
However, if you have alot of System.Text.RegularExpressions.Regex on your heap, that can very well be the root to your memory problemts. Regex in .NET tend to take a lot of resources. Hence, my advice is that you go through your code and try to find any excessive use of regex. Also, make sure that any references to Regex objects are being taken care of, that is to say, that the references to the Regex objects are not kept alive. That way, the Garbage Collector can make sure the Regex objects are deallocated properly.
Good luck!
In theory it should be quite dificult to cause a memory leak in asp.net without using unmanaged resources. If everything is single threaded then all references to managed resources should be free to be garbage collected when the page life cycle is complete. Are you firing off worker threads to do anything and are these threads continuing to live beyond the life of the page? Or do you have any long running processes exposed as web methods that can be fired off by asynchounsly and are just taking a long time to run and being called repeatedly until the memory is full?

Resources