Out of Memory with large data in Codename One

My Codename One application downloads around 16000 records of data (approx 10 fields in each record).
On my Android phone (Android 6.0, 2 GB RAM) it's able to load 8000 to 9000 records, but then it shows an out-of-memory error.
From the trace, it looks like it ran out of the heap memory allocated to the app.
Any suggestions on what would be the ideal way to handle that amount of data, please?
Here is the log file

The amount of RAM on the phone doesn't mean much. The OS takes about half and then divides the rest among the various apps running in parallel, so you would typically have much less available; see What is the maximum amount of RAM an app can use?
You need to review your code and check what is eating up memory. 16k records of 1 KB each would be 16 MB, which shouldn't crash an app, so the question is where the memory is actually going. I would suggest reading the performance section of the developer guide to figure out the memory usage.
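As a side note (not from the original answer): if you want to see the heap budget the OS actually grants the app, a quick native-Android check, outside Codename One's portable API, might look like this:

import android.app.ActivityManager;
import android.content.Context;
import android.util.Log;

// Logs the per-app heap budget Android actually grants, which is what matters
// here rather than the 2 GB of physical RAM on the device.
public final class HeapCheck {
    public static void log(Context context) {
        ActivityManager am =
                (ActivityManager) context.getSystemService(Context.ACTIVITY_SERVICE);
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        Log.d("HeapCheck", "memoryClass=" + am.getMemoryClass() + " MB"
                + ", largeMemoryClass=" + am.getLargeMemoryClass() + " MB"
                + ", maxMemory=" + maxHeapMb + " MB");
    }
}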

This might not apply to your situation, but would it be possible to only download x records at a time? Then, when the user takes some action (scrolls, hits next page, etc.), it loads the next batch. Codename One has a great endless scroller implementation; see here for an example: https://www.codenameone.com/blog/property-cross-revisited.html
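For illustration, a minimal sketch of that batching idea using Codename One's InfiniteContainer; fetchRecordsFromServer() and MyRecord are hypothetical placeholders for your own REST call and record type:

import com.codename1.components.SpanLabel;
import com.codename1.ui.Component;
import com.codename1.ui.Form;
import com.codename1.ui.InfiniteContainer;
import com.codename1.ui.layouts.BorderLayout;

// Fetches records in batches of 30 as the user scrolls, instead of
// downloading and holding all 16,000 records up front.
Form f = new Form("Records", new BorderLayout());
InfiniteContainer ic = new InfiniteContainer(30) {
    @Override
    public Component[] fetchComponents(int index, int amount) {
        // Hypothetical helper: returns at most 'amount' records starting at 'index'.
        MyRecord[] batch = fetchRecordsFromServer(index, amount);
        if (batch == null || batch.length == 0) {
            return null; // no more data to load
        }
        Component[] rows = new Component[batch.length];
        for (int i = 0; i < batch.length; i++) {
            rows[i] = new SpanLabel(batch[i].toString());
        }
        return rows;
    }
};
f.add(BorderLayout.CENTER, ic);
f.show();

This way only the records the user actually scrolls to are downloaded and turned into components, rather than all 16,000 at once.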

Related

Why wouldn't a small Firebase Functions app just use a single Function to handle logic?

...aside from the benefit in separate performance monitoring and logging.
For logging, I am confident I can get granularity by manually adding the name of the "routine" to each call. This is how it is now with several discrete Functions for different parts of the system: there are multiple automatic logs (the start and finish of the routine, for example). It would be more challenging to find out how expensive certain routines are, but it would not be impossible.
The reason I want the entire logic of the application handled by a single handler function is to reduce cold starts: one function means only one container, which can be persistently kept alive when there are very few users of the app.
If a month is ~2.6m seconds and we assume the system uses 1 GB RAM and 1 GHz CPU frequency at all times, that's:
2600000 * 0.0000025 + 2600000 * 0.000001042 = USD$9.21 a month
...for one minimum instance.
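Broken down into its two terms (presumably the memory and CPU components, at the per-GB-second and per-GHz-second rates used above):

2600000 * 0.0000025 * 1 GB    = $6.50   (memory)
2600000 * 0.000001042 * 1 GHz ≈ $2.71   (CPU)
                        total ≈ $9.21 / month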
I should also state that all of my functions have the bare minimum amount of global-scope code; they just set up the Firebase assets (RTDB and Firestore).
From a billing, performance (based on user wait time), and user/developer experience perspective, is there any reason why it would be smart to keep all my functions discrete?
I'd also accept an answer saying "one single function for all logic is reasonable" as long as there's a reason for it.
Thanks!
If you have a very small app with ~5 endpoints and very low traffic, sure, you could do something like this. But here is why you might not want to:
Billing and performance
The important thing to realize is that with every concurrent request a new instance of your function can be created, which means there could be tens of them running at the same time.
If you would like to have just one instance handling all the traffic, you should explore GCP Cloud Run, where one container handles multiple requests and scales out only when that's not sufficient.
Imagine you have several endpoints and each of them has different performance requirements:
one might need only 128 MB of RAM,
another might need 1 GB of RAM.
(FYI: you can control the CPU MHz of the function via the RAM setting too, which can speed up execution in some cases.)
If you had only one function with 1 GB of RAM, every request would allocate that full amount, and in some cases most of the memory would go to waste.
But if you split it into multiple functions, some requests will require far fewer resources, which can save you money once you reach a larger number of executions per month (tens of thousands and up).
Let's imagine a function with a 3-second execution time and 10k executions/month:
at 128 MB it would cost you $0.0693
at 1024 MB it would cost you $0.495
As you can see, with a small app the difference is next to nothing, but at scale it matters. (*The cost can vary based on the data center.)
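For reference, those two numbers follow from the per-100 ms tier prices they imply (a 3 s execution is billed as 30 units of 100 ms):

128 MB tier:  10000 * 30 * $0.000000231 = $0.0693
1024 MB tier: 10000 * 30 * $0.000001650 = $0.495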
As for the logging, I don't think it matters. In bigger systems there are usually messages travelling through several functions, so you have to deal with that anyway.
As for cold starts, you just need a good UI to accommodate them. At first I was worried about this in our apps, but you get used to the fact that some actions can take ~2 s to execute (a cold start). You should show a "loading" state in the UI regardless, because you never know whether a call will take ~100 ms or 3 s due to a bad connection.

Univocity - Parsing a fixed-width flat file with one row - performance impact with 300 parallel threads

We have a project that deals with millions of transactions every day and has some tight SLAs. To parse the flat file that comes as input into a bean, we used BeanIO, which was working well without load, but under load it takes around 250 ms to parse a flat file into a bean.
Requirement: a simple string has to be converted into a single bean (nested and converted).
We heard that univocity can do better here, so we tried it with the settings below:
import com.univocity.parsers.fixed.FixedWidthParserSettings;

FixedWidthParserSettings settings = new FixedWidthParserSettings();
settings.getFormat().setLineSeparator("\n");
settings.setRecordEndsOnNewline(false);
settings.setHeaderExtractionEnabled(false);
settings.setIgnoreLeadingWhitespaces(false);
settings.setIgnoreTrailingWhitespaces(false);
settings.setMaxColumns(100);
settings.setMaxCharsPerColumn(100);
settings.setNumberOfRecordsToRead(1);         // each input is a single record
settings.setReadInputOnSeparateThread(false); // don't spawn a reader thread per parse
settings.setInputBufferSize(10 * 1024);
settings.setLineSeparatorDetectionEnabled(false);
settings.setColumnReorderingEnabled(false);
When running with JMeter with 200 parallel threads, the average time taken is 10 ms (to parse and convert around 10 fields; in the actual use case we have to do the same for around 500 fields),
but when we increased it to 300 or 350 parallel threads, the average time was around 300 ms. Our total SLA is around 10 ms.
Any help here is highly appreciated!
You are probably running out of memory on your JVM; try increasing the heap with the -Xms and -Xmx flags. Also, that many threads won't help you if you don't have enough cores available.
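Not part of the answer above, but another thing worth checking is whether each JMeter thread builds a brand-new parser (and its internal buffers) per request; reusing one parser per worker thread avoids that allocation churn. A rough sketch, with a hypothetical three-field layout standing in for the real ~500-field record:

import com.univocity.parsers.fixed.FixedWidthFields;
import com.univocity.parsers.fixed.FixedWidthParser;
import com.univocity.parsers.fixed.FixedWidthParserSettings;

public class SingleRowParser {
    // Parser instances are not thread-safe, so keep one per worker thread
    // and reuse it instead of rebuilding settings + parser on every call.
    private static final ThreadLocal<FixedWidthParser> PARSER =
            ThreadLocal.withInitial(SingleRowParser::newParser);

    private static FixedWidthParser newParser() {
        // Hypothetical layout; replace with the real field definitions.
        FixedWidthFields fields = new FixedWidthFields();
        fields.addField("id", 10);
        fields.addField("amount", 12);
        fields.addField("currency", 3);

        FixedWidthParserSettings settings = new FixedWidthParserSettings(fields);
        settings.setHeaderExtractionEnabled(false);
        settings.setReadInputOnSeparateThread(false);
        settings.setNumberOfRecordsToRead(1);
        return new FixedWidthParser(settings);
    }

    public static String[] parse(String line) {
        // parseLine handles a single record without opening a new input stream.
        return PARSER.get().parseLine(line);
    }
}

Even with reuse, pushing 300+ threads onto a machine with far fewer cores will raise average latency simply because threads queue for CPU, as the answer notes.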

Virtuoso 7 crashes while bulk loading

I am trying to create a local SPARQL endpoint for Freebase for running some local experiments. While using Virtuoso 7, I regularly see the server getting killed by the OOM killer. I have followed all the required steps as mentioned here, and I have also made the required changes to my virtuoso.ini file as mentioned in RDF Performance Tuning.
My system configuration is:
8 CPUs at 2.9 GHz
16 GB RAM
I have enough hard disk space too.
Regarding the data dumps, I have split the Freebase data dump (23 GB gzipped, approx 250 GB uncompressed) into 10 smaller gzipped files containing 200,000,000 triples each.
Following are the changes I made to virtuoso.ini
NumberOfBuffers = 1360000
MaxDirtyBuffers = 1000000
MaxCheckpointRemap = 340000 # (1/4th of NumberOfBuffers)
Along with this I have set vm.swappiness = 10 as mentioned in 2.
Am I missing something obvious?
P.S.:
I did try virtuoso-opensource-6.1 too, but it appeared to be too slow.
One interesting observation was that during the bulk loading process virtuoso-6.1's memory consumption rose very slowly, but that might just be because the indexing itself was slow.
Another observation was that virtuoso-6.1 occupies almost negligible memory at startup (on the order of 500 MB), whereas virtuoso-7 starts at approx 6500 MB and grows quickly.
Any help in this regard would be highly appreciated.
The number of buffers you are using is a little too high. Do not forget that some memory is also consumed by the OS and other processes.
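As a rough back-of-the-envelope check (each Virtuoso buffer caches one 8 KB database page):

1360000 buffers * 8 KB ≈ 11 GB

So the buffer pool alone would take roughly 11 of the 16 GB, leaving comparatively little for the bulk loader's working memory, the OS, and other processes, which may explain why the OOM killer steps in during the load.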
Which exact version do you use? (development or stable branch?)
Do you use disk striping?
I loaded Freebase into Virtuoso 7 too, but I used smaller files: circa 260 gzipped files of 10 million triples each (circa 100 MB per file). A commit is executed after every file load.
Maybe it would be easier for you to use one of the images with Freebase preloaded into Virtuoso.

Hadoop - job submission time on large data

Did anyone face any problems submitting a job on large data? The data is around 5-10 TB uncompressed, spread across approximately 500K files. When we try to submit a simple Java MapReduce job, it mostly spends more than an hour in the getSplits() function call, and it takes multiple hours to appear in the job tracker. Is there any possible solution to this problem?
With 500k files, you are spending a lot of time walking the directory tree to find all these files, which then need to be assigned to a list of InputSplits (the result of getSplits).
As Thomas points out in his answer, if the machine performing the job submission has a small amount of memory assigned to its JVM, then you're going to see the JVM spend its time in garbage collection trying to find the memory required to build up the splits for these 500K files.
To make matters worse, if these 500K files are splittable and larger than a single block, then you'll get even more input splits to process the files (for a file of, say, 1 GB with a block size of 256 MB, you'll by default get 4 map tasks to process it, assuming the input format and file compression support splitting). If this applies to your job (look at the number of map tasks spawned: are there more than 500k?), then you can force fewer mappers to be created by raising the mapred.min.split.size configuration property to a size larger than the current block size (setting it to 1 GB for the previous example means you'll get a single mapper to process the file rather than 4). This will help the performance of the getSplits method, as the resulting list of splits will be smaller and require less memory.
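For example, a sketch of setting that property when building the job (the old mapred.min.split.size name is shown because that's what it was called at the time; newer releases alias it to mapreduce.input.fileinputformat.split.minsize):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SubmitWithLargerSplits {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // 1 GB minimum split size: a 1 GB file with 256 MB blocks now yields
        // one split (one map task) instead of four.
        conf.setLong("mapred.min.split.size", 1024L * 1024 * 1024);

        Job job = Job.getInstance(conf, "large-input-job");
        // Equivalent helper on the new MapReduce API:
        FileInputFormat.setMinInputSplitSize(job, 1024L * 1024 * 1024);
        // ... set mapper/reducer and input/output paths, then submit the job
    }
}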
The second symptom of your problem is the time it takes to serialize the input splits to a file (client side) and then to deserialize them at the job tracker end. 500K+ splits is going to take time, and the job tracker will have similar GC issues if it has a low JVM memory limit.
It largely depends on how "strong" your submission server is (or your laptop client); maybe you need to upgrade the RAM and CPU to make the getSplits call faster.
I believe you ran into swap issues there, and the computation therefore takes multiple times longer than usual.

How much is too much asp.net session size?

I have an application on the corporate intranet that makes use of session state to store values across a wizard (a string of pages/user controls). I'm measuring the size of the session as I navigate around to make sure things don't get out of hand.
At worst, I can get the size up to 900 Bytes.
Is this too much?
I know it all depends on other factors such as the number of users and the amount of memory in the server, so let's set some parameters around these: the server is allocated 1 GB of RAM for ASP.NET (the rest is allocated to the OS and other items), and I have at most 10 users on the system concurrently.
Thanks for the help
Personally I'd say 900 bytes is nothing. Let's say it's 1 KB; that means with 1 GB of RAM you should be able to store roughly 1000k of those sessions (not counting anything else).
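Spelled out, that's roughly:

1 GB / 1 KB per session ≈ 1,000,000 sessions
10 users * 1 KB         ≈ 10 KB actually in use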
Personally I think you shouldn't look at the raw numbers. What's important is whether the stuff in the session is really meant to be in the session: keep information there that's useful while the user is browsing your website and that can be discarded when the user leaves.
As long as you don't store big data objects in your session, you should be fine.
Too much is when you run out of memory attempting to serve whatever user load you want to be able to serve on each machine. We can't tell you how much is too much, but we can tell you that 900 bytes isn't very much at all.
Given your user load, I think you should be alright...
Don't forget to handle the situation where your session drops out halfway through for some reason.
900 bytes per session, total? No, that's hardly a problem with only 10 users. Even 900 KB per user wouldn't be much of an issue, as you would only be talking about ~10 MB of session state.
900 KB per page per user would be something you'd need to worry about.
