Performance of commitWriteTransaction for a big Realm file - realm

Using Realm 1.0.2 on OS X, I have a Realm file that has reached ~3.5 GB. Now, writing a batch of new objects takes around 30 s to 1 min on average, which makes things pretty slow.
After profiling, it clearly looks like commitWriteTransaction is taking a big chunk of the time.
Is that performance normal / expected in this case? And if so, what strategies are available to make saving faster?

Realm uses copy-on-write semantics whenever changes are performed in write transactions.
The larger the structure that has to be forked & copied, the longer it will take to perform the operation.
This small unscientific benchmark on my 2.8GHz i7 MacBook Pro
import Foundation
import RealmSwift

class Model: Object {
    dynamic var prop1 = 0.0
    dynamic var prop2 = 0.0
}

// Add 60 million objects with two Double properties in batches of 10 million
autoreleasepool {
    for _ in 0..<6 {
        let start = NSDate()
        let realm = try! Realm()
        try! realm.write {
            for _ in 0..<10_000_000 {
                realm.add(Model())
            }
        }
        print(realm.objects(Model.self).count)
        print("took \(-start.timeIntervalSinceNow)s")
    }
}

// Add one item to that Realm
autoreleasepool {
    let start = NSDate()
    let realm = try! Realm()
    try! realm.write {
        realm.add(Model())
    }
    print(realm.objects(Model.self).count)
    print("took \(-start.timeIntervalSinceNow)s")
}
Logs the following:
10000000
took 25.6072470545769s
20000000
took 23.7239990234375s
30000000
took 24.4556020498276s
40000000
took 23.9790390133858s
50000000
took 24.5923230051994s
60000000
took 24.2157150506973s
60000001
took 0.0106720328330994s
So you can see that adding many objects to the Realm, with no relationships, is quite fast, and the time stays proportional to the number of objects being added in each transaction, not to the size of the existing Realm.
So it's likely that you're doing more than just adding objects to the Realm; maybe you're updating existing objects, causing them to be copied?
If you're reading a value from all objects as part of your write transactions, that will also grow proportionally to the number of objects.
Avoiding these things will shorten your write transactions.
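If the commits themselves remain slow, it can also help to keep each write transaction small and insert-only. The following is a minimal sketch, not part of the answer above, reusing the Model class from the benchmark; saveBatch and the chunk size of 10,000 are hypothetical choices. The point is simply that each commit touches a bounded amount of data and never reads or mutates existing objects:

// Minimal sketch: commit in small, insert-only chunks.
func saveBatch(newValues: [Double]) {
    autoreleasepool {
        let realm = try! Realm()
        var index = 0
        while index < newValues.count {
            let end = min(index + 10_000, newValues.count)   // hypothetical chunk size
            try! realm.write {
                for value in newValues[index..<end] {
                    let object = Model()
                    object.prop1 = value
                    realm.add(object)   // insert only; no reads or updates of existing objects
                }
            }
            index = end
        }
    }
}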

Related

How to unload objects from memory while reading realm data in batches

The following is a code example (in Swift) given by the Realm team to read a subset of the entire result set:
// Loop through the first 5 Dog objects
// restricting the number of objects read from disk
let dogs = try! Realm().objects(Dog.self)
for i in 0..<5 {
    let dog = dogs[i]
    // ...
}
Ref: https://realm.io/docs/swift/latest/ (Limiting Results)
Now I understand that objects are loaded as we read them from the Realm, but I am a little confused about how Realm unloads them from memory as we keep progressing. For example, if my query returns 100 objects as a result set and I read the first 10 objects, then on user interaction I read the next 10, and so on. What will happen to the previously read objects in memory? How can I control memory usage in such cases, as I may want to unload the previous subset when I read a newer one?
My guess is that I may need to invalidate the old Realm and make a new connection to fetch the result set again and read from a new offset. Invalidating may unload everything read previously. Please suggest.
Thank you!
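For illustration, here is a minimal sketch of the invalidate() approach the question guesses at. It assumes a Dog class with a name property, as in the Realm docs snippet, and pageSize / loadPage are made-up names, not Realm API. Results are lazy, so only the objects actually indexed are read from disk, and calling invalidate() on the Realm releases the data it has read so far (previously read objects become invalidated):

// Hypothetical paging helper; pageSize and loadPage are not Realm API.
let pageSize = 10

func loadPage(pageIndex: Int) -> [String] {
    var names = [String]()
    autoreleasepool {
        let realm = try! Realm()
        let dogs = realm.objects(Dog.self)          // lazy; nothing is loaded yet
        let start = pageIndex * pageSize
        let end = min(start + pageSize, dogs.count)
        for i in start..<end {
            names.append(dogs[i].name)              // only these objects are read from disk
        }
        // Release everything this Realm instance has read so far;
        // previously read objects become invalidated.
        realm.invalidate()
    }
    return names
}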

Concurrent updates in DynamoDB, are there any guarantees?

In general, if I want to be sure what happens when several threads make concurrent updates to the same item in DynamoDB, I should use conditional updates (i.e., "optimistic locking"). I know that. But I was wondering if there is any other case in which I can be sure that concurrent updates to the same item survive.
For example, in Cassandra, making concurrent updates to different attributes of the same item is fine, and both updates will eventually be available to read. Is the same true in DynamoDB? Or is it possible that only one of these updates survive?
A very similar question is what happens if I add, concurrently, two different values to a set or list in the same item. Am I guaranteed that I'll eventually see both values when I read this set or list, or is it possible that one of the additions will mask out the other during some sort of DynamoDB "conflict resolution" protocol?
I see that a version of my second question was already asked here in the past: Are DynamoDB "set" values CDRTs? But the answer referred to a not-very-clear FAQ entry which doesn't exist any more. What I would most like to see as an answer to my question is official DynamoDB documentation that says how DynamoDB handles concurrent updates when neither "conditional updates" nor "transactions" are involved, and in particular what happens in the above two examples. Absent such official documentation, does anyone have any real-world experience with such concurrent updates?
I just had the same question and came across this thread. Given that there was no answer, I decided to test it myself.
The answer, as far as I can observe, is that as long as you are updating different attributes, the updates will eventually succeed. It does take a little longer the more updates I push to the item, so they appear to be written in sequence rather than in parallel.
I also tried updating a single List attribute in parallel and this, as expected, failed: once all queries had completed, the resulting list only contained some of the entries pushed to it.
The test I ran was pretty rudimentary and I might be missing something, but I believe the conclusion to be correct.
For completeness, here is the script I used, nodejs.
const aws = require('aws-sdk');
const ddb = new aws.DynamoDB.DocumentClient();

const key = process.argv[2];
const num = process.argv[3];

run().then(() => {
    console.log('Done');
});

async function run() {
    const p = [];
    for (let i = 0; i < num; i++) {
        p.push(ddb.update({
            TableName: 'concurrency-test',
            Key: {x: key},
            UpdateExpression: 'SET #k = :v',
            ExpressionAttributeValues: {
                ':v': `test-${i}`
            },
            ExpressionAttributeNames: {
                '#k': `k${i}`
            }
        }).promise());
    }
    await Promise.all(p);
    const response = await ddb.get({TableName: 'concurrency-test', Key: {x: key}}).promise();
    const item = response.Item;
    console.log('keys', Object.keys(item).length);
}
Run like so:
node index.js {key} {number}
node index.js myKey 10
Timings:
10 updates: ~1.5s
100 updates: ~2s
1000 updates: ~10-20s (fluctuated a lot)
Worth noting is that the metrics show a lot of throttled events, but these are handled internally by the Node.js SDK using exponential backoff, so once the dust settled everything was written as expected.
Your post contains quite a lot of questions.
There's a note in DynamoDB's manual:
All write requests are applied in the order in which they were received.
I assume that the clients send the requests in the order the calls were made.
That should resolve the question of whether there are any guarantees. If you update different properties of an item in several requests, each updating only those properties, the item should end up in the expected state (the 'sum' of the distinct changes).
If you, on the other hand, update the whole object, the last write will win.
DynamoDB has #DynamoDbVersion, which you can use for optimistic locking to manage concurrent writes of whole objects.
For scenarios like auctions or parallel tick counts (such as "likes"), DynamoDB offers atomic counters.
If you update a list, it depends on whether you use DynamoDB's list type (L), or whether it is just a property and the client serializes the list into a String (S). If you read a property, change it, and write it back, and do that in parallel, the result will be subject to eventual consistency - what you read may not be the latest write. Applied to lists, and several times over, you'll end up with some of the elements added and some not (or, better said, added but then overwritten).

Mysterious EXC_BAD_ACCESS on dispatch_async *serial queue*

I have a location-based app that gets the location every second and saves a batch of locations at a time to the Core Data DB, so as not to let the locations array grow too large. However, for some reason it crashes with EXC_BAD_ACCESS even though I use dispatch_async on a SERIAL QUEUE:
I created a global serial custom queue like this:
var MyCustomQueue : dispatch_queue_t =
    dispatch_queue_create("uniqueID.for.custom.queue", DISPATCH_QUEUE_SERIAL);
In the didUpdateToLocation delegate method, I have this bit of code that saves the latest batch of CLLocation items to disk if the batch size exceeds a preset number:
if(myLocations.count == MAX_LOCATIONS_BUFFER)
{
    dispatch_async(MyCustomQueue)
    {
        for myLocation in self.myLocations
        {
            // the next line is where it crashes with EXC_BAD_ACCESS
            newLocation = NSEntityDescription.insertNewObjectForEntityForName(locationEntityName,
                inManagedObjectContext: context as NSManagedObjectContext);
            newLocation.setValue(..)
            // all the code to set the attributes
        }
        appDelegate.saveContext();
        dispatch_async(dispatch_get_main_queue()) {
            // once the latest batch is saved, empty the in-memory array
            self.myLocations = [];
        }
    }
}
The mystery is: isn't a SERIAL QUEUE supposed to execute all the items on it in order, even though you use dispatch_async (which simply makes the execution concurrent with the MAIN thread, which doesn't touch the SQLite data)?
Notes:
The crash happens at random times...sometimes it'll crash after 0.5 miles, other times after 2 miles, etc.
If I just take out any of the dispatch_async code (just make everything executed in order on the main thread), there is no issue.
You can take a look at this link. Your issue is probably that there is no MOC at the time you are trying to insert an item into it.
Core Data : inserting Objects crashed in global queue [ARC - iPhone simulator 6.1]
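For completeness, here is a minimal sketch, not taken from the linked answer, of the pattern Core Data expects for background inserts: give the background work its own NSManagedObjectContext confined to a private queue and do the inserts inside performBlock, rather than sharing a main-thread context across a custom GCD queue. The entity name "Location", the attribute keys, and appDelegate.persistentStoreCoordinator are assumptions based on the question:

// Sketch only: a private-queue context owned by the background work.
// "Location", "latitude"/"longitude", and persistentStoreCoordinator are assumed names.
let backgroundContext = NSManagedObjectContext(concurrencyType: .PrivateQueueConcurrencyType)
backgroundContext.persistentStoreCoordinator = appDelegate.persistentStoreCoordinator

if myLocations.count == MAX_LOCATIONS_BUFFER {
    let locationsToSave = self.myLocations   // copy the batch before leaving the current queue
    backgroundContext.performBlock {
        for myLocation in locationsToSave {
            let newLocation = NSEntityDescription.insertNewObjectForEntityForName(
                "Location", inManagedObjectContext: backgroundContext)
            newLocation.setValue(myLocation.coordinate.latitude, forKey: "latitude")
            newLocation.setValue(myLocation.coordinate.longitude, forKey: "longitude")
        }
        try! backgroundContext.save()       // saves on the context's own private queue
        dispatch_async(dispatch_get_main_queue()) {
            self.myLocations = []           // empty the in-memory array on the main thread
        }
    }
}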

Flex consuming huge memory for large data

When a Flex ArrayCollection holds a large amount of data, for example 200,000 new referenced objects, the memory in the Flex client browser shoots up by 20 MB. This excess 20 MB is independent of the variables defined in the object. A detailed example is illustrated below.
var list:ArrayCollection = new ArrayCollection();
for (var i:int = 0; i < 200000; i++)
{
    var obj:Object = new Object();
    list.addItem(obj);
}
On executing the above code, there was a 20 MB increase in the Flex client browser memory. For a different scenario I tried adding an ActionScript object into the ArrayCollection. The ActionScript object is defined below.
public class Sample
{
    public var id:int;
    public var age:int;

    public function Sample()
    {
    }
}
On adding 200,000 Sample objects into an ArrayCollection, there was still a 20 MB memory leak.

var list:ArrayCollection = new ArrayCollection();
for (var i:int = 0; i < 200000; i++)
{
    var obj:Sample = new Sample();
    obj.id = i;
    obj.age = 20;
    list.addItem(obj);
}
I even tried adding the Sample objects into a Flex ArrayList and Array, but the problem still persists. Can someone explain where this excess memory is consumed by Flex?
Requesting memory to the OS is time consuming, so Flash player requests large chunks of memory (more than it really needs) in order to minimize the number of those requests.
I have no idea whether OS allocation time is a big deal anymore - we're talking about 1.5-2 GHz CPUs on average, even on mobile - but Benoit is on the right track. Large chunks are reserved at a time mainly to avoid heap fragmentation. If memory were requested only in the exact sizes needed at the time, alongside other I/O requests, system memory would become highly fragmented very quickly. When these fragments are returned to the OS, the memory manager cannot reallocate a chunk unless it gets a request of the same size or smaller, thereby making it lost to the visible pool. To avoid this issue, Flash (and its memory manager) requests 16 MB at a time.
In your case, it wouldn't matter if you created 1 object or 100,000. You'll still start with a minimum of 16 MB of private memory (i.e., what you see in Task Manager).
The flash player allocation mechanism is based on the Mozilla MMgc.
You can read about it here: https://developer.mozilla.org/en-US/docs/MMgc

Memory usage in Flash / Flex / AS3

I'm having some trouble with memory management in a flash app. Memory usage grows quite a bit, and I've tracked it down to the way I load assets.
I embed several raster images in a class Embedded, like this
[Embed(source="/home/gabriel/text_hard.jpg")]
public static var ASSET_text_hard_DOT_jpg : Class;
I then instantiate the assets this way:
var pClass : Class = Embedded[sResource] as Class;
return new pClass() as Bitmap;
At this point, memory usage goes up, which is perfectly normal. However, nulling all the references to the object doesn't free the memory.
Based on this behavior, it looks like the Flash player is creating an instance of the class the first time I request it, but never releases it - not after removing all references, calling System.gc(), doing the double LocalConnection trick, or calling dispose() on the BitmapData objects.
Of course, this is very undesirable - memory usage would grow until everything in the SWFs is instantiated, regardless of whether I stopped using some asset long ago.
Is my analysis correct? Can anything be done to fix this?
Make sure you run your tests again in the non-debug player. Debug player doesn't always reclaim all the memory properly when releasing assets.
Also, because you're using an Embedded rather than loaded asset, the actual data might not ever be released. As it's part of your SWF, I'd say you could reasonably expect it to be in memory for the lifetime of your SWF.
Why do you want to use this?
var pClass : Class = Embedded[sResource] as Class;
return new pClass() as Bitmap;
Sometimes dynamic resource assignment is buggy and the resources fail to be freed up. I had similar problems before with the Flash player and Flex, for example loading and unloading the same external SWF... the memory kept increasing by the size of the loaded SWF without going down, even when I called System.gc() after unloading the SWF.
So my suggestion is to skip this approach and use the first case you have described.
UPDATE 1
<?xml version="1.0" encoding="utf-8"?>
<s:Application
    xmlns:fx = "http://ns.adobe.com/mxml/2009"
    xmlns:s = "library://ns.adobe.com/flex/spark"
    xmlns:mx = "library://ns.adobe.com/flex/mx"
    creationComplete = "creationComplete()">

    <fx:Script>
        <![CDATA[
            import flash.utils.Dictionary;

            [Embed(source="/assets/logo1w.png")]
            private static var asset1:Class;

            [Embed(source="/assets/060110sinkhole.jpg")]
            private static var asset2:Class;

            private static var _dict:Dictionary = new Dictionary();

            private static function initDictionary():void
            {
                _dict["/assets/logo1w.png"] = asset1;
                _dict["/assets/060110sinkhole.jpg"] = asset2;
            }

            public static function getAssetClass(assetPath:String):Class
            {
                // if the asset is already in the dictionary then just return it
                if (_dict[assetPath] != null)
                {
                    return _dict[assetPath] as Class;
                }
                return null;
            }

            private function creationComplete():void
            {
                initDictionary();

                var asset1:Class = getAssetClass("/assets/logo1w.png");
                var asset2:Class = getAssetClass("/assets/060110sinkhole.jpg");
                var asset3:Class = getAssetClass("/assets/logo1w.png");
                var asset4:Class = getAssetClass("/assets/060110sinkhole.jpg");
                var asset5:Class = getAssetClass("/assets/logo1w.png");
            }
        ]]>
    </fx:Script>
</s:Application>
"At this point, memory usage goes up, which is perfectly normal. However, nulling all the references to the object doesn't free the memory."
That's also perfectly normal. It's rare for any system to guarantee that the moment you stop referencing an object in the code is the same moment that the memory for it is returned to the operating system. That's exactly why methods like System.gc() are there, to allow you to force a clean-up when you need one. Usually the application may implement some sort of pooling to keep around objects and memory for efficiency purposes (as memory allocation is typically slow). And even if the application does return the memory to the operating system, the OS might still consider it as assigned to the application for a while, in case the app needs to request some more shortly afterwards.
You only need to worry if that freed memory is not getting reused. For example, you should find that if you create an object, free it, and repeat this process, memory usage should not grow linearly with the number of objects you create, as the memory for previously freed objects gets reallocated into the new ones. If you can confirm that does not happen, edit your question to say so. :)
It turns out I was keeping references to the objects that didn't unload. Very tricky ones. The GC does work correctly in all the cases I described, which I suspected may have been different.
More precisely, loading a SWF, instancing lots of classes defined in it, adding them to the display list, manipulating them, and then removing all references and unloading the SWF leaves me with almost exactly the same memory I started with.
