I'm starting to use RWMutex with a map in my Go project, since I now have more than one goroutine running at the same time, and while making the changes for that a question came to mind.
I know that we must use RLock when only reading, to allow other goroutines to do the same, and Lock when writing, to block the map entirely. But what are we supposed to do when editing an element that already exists in the map?
For example, let's say I have a map[int]string where I Lock, store "hello ", and then Unlock. What if I then want to append "world" to that value? Should I use Lock, or can I use RLock?
You should approach the problem from another angle.
A simple rule of thumb, which you seem to understand just fine, is:
You need to protect the map from concurrent accesses when at least one of them is a modification.
Now the real question is what constitutes a modification of a map.
To answer it properly, it helps to notice that values stored in maps are not addressable, by design. Maps were engineered that way because their internal implementation may move the values they contain around in memory, in order to provide (amortized) fast access time when the map's structure changes due to insertions and/or deletions of its elements.
The fact that map values are not addressable means you cannot do something like this:
m := make(map[int]string)
m[42] = "hello"
go mutate(&m[42]) // take a single element and modify it concurrently...
// ...while other parts of the program change _other_ values
m[123] = "blah blah"
The reason you are not allowed to do this is that the insertion operation m[123] = ... might trigger the map to move the storage of its elements around, and that might involve moving the storage of the element keyed by 42 to some other place in memory, pulling the rug out from under the goroutine running the mutate function.
So, in Go, maps really only support three operations:
Insert — or replace — an element;
Read an element;
Delete an element.
You cannot modify an element "in place"; you can only go in three steps:
(1) Read the element;
(2) Modify the variable containing the (read) copy;
(3) Replace the element with the modified copy.
As you can now see, steps (1) and (3) are mere map accesses, and so the answer to your question is (hopefully) apparent: step (1) shall be done under at least a read lock, and step (3) shall be done under a write (exclusive) lock.
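To make that concrete, here is a minimal Go sketch of those steps guarded by a sync.RWMutex (the guardedMap type and its method names are purely illustrative):

package main

import (
    "fmt"
    "sync"
)

type guardedMap struct {
    mu sync.RWMutex
    m  map[int]string
}

// Get is a pure read (step (1) on its own), so a read lock is enough.
func (g *guardedMap) Get(k int) (string, bool) {
    g.mu.RLock()
    defer g.mu.RUnlock()
    v, ok := g.m[k]
    return v, ok
}

// Append performs steps (1) through (3): read the copy, modify it, store it back.
func (g *guardedMap) Append(k int, suffix string) {
    g.mu.Lock()
    defer g.mu.Unlock()
    g.m[k] = g.m[k] + suffix
}

func main() {
    g := &guardedMap{m: make(map[int]string)}
    g.Append(42, "hello ")
    g.Append(42, "world")
    v, _ := g.Get(42)
    fmt.Println(v) // hello world
}

Note that Append holds the write lock across the whole read-modify-write rather than taking a read lock for step (1) and a separate write lock for step (3); that also prevents two goroutines from interleaving their updates and losing one of them.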
In contrast, elements of other compound types, such as arrays (and slices) and fields of struct types, do not have the restriction maps have: provided the storage of the "enclosing" variable is not relocated, it is fine to change its different elements concurrently by different goroutines.
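For contrast, here is a short sketch of what is allowed for slices, assuming nothing resizes the slice (no append), so its backing array is never relocated:

package main

import (
    "fmt"
    "sync"
)

func main() {
    s := make([]string, 2) // the backing array is allocated once and stays put

    var wg sync.WaitGroup
    wg.Add(2)
    go func() { defer wg.Done(); s[0] = "hello" }() // each goroutine writes to
    go func() { defer wg.Done(); s[1] = "world" }() // a *different* element
    wg.Wait()

    fmt.Println(s[0], s[1])
}

The two goroutines touch distinct elements, i.e. distinct memory locations, so no lock is needed; it is only the map's freedom to shuffle its storage that forces the locking discipline above.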
Since the only way to change the value associated with a key in a map is to reassign the changed value to the same key, that is a write / modification, so you have to obtain the write lock; simply using the read lock will not be sufficient.
The simplified structure of what I have is:
some single roots
each root has many (e.g. hundreds of) children
A user may update the root information, and while that happens no other operation on its children should be allowed (because a root change may affect all of them).
Also, a user may operate on children (if the root is not in use, of course). For example, a user may change two children at the same time, and this is allowed, since each child is independent.
I need locks in this structure in order to be sure there are no corruptions:
when a child is in use, lock that child. This prevents two operations on the same child at the same time.
when the root is in use, lock the root AND all of its children. This forbids operations on any child while the root is updated.
What bothers me here is the need to lock all the children: in a distributed system, that means sending that many requests to the distributed lock.
Is there any better solution I don't see?
You're missing two things. First, it's safe for multiple threads to read from a node at the same time, as long as nobody is writing to it. Second, the child nodes can be viewed as roots of their own smaller subtrees, so the same algorithm/solution may be applied to all nodes except leaf nodes. The first point is the most important. Here's how you could do this:
Use a read/write mutex on all nodes in the tree. This allows any number of processes to concurrently read, or a single process to write to a node at any time.
To read:
read-lock node and all parents all the way up to root.
read.
release all read-locks.
To write:
write-lock the least-upper-bound of the nodes you want to modify. If you're modifying a node (and possibly any of its children), write-lock that node.
do your modifications
release the write-lock
This means that two siblings may be modified concurrently, and that any number of reads may execute concurrently. However, the cost of reading is that you need to grab one read-lock per ancestor, i.e. O(tree height) locks, and with roughly 100 children at each level the tree stays very shallow (about log100 of the number of nodes). It's unlikely to be a real problem, unless your tree is huge, with extremely many reads and writes to the same leaf node.
This assumes that no child may modify its parent.
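Here is a minimal in-process sketch of that scheme in Go; node, withRead and withWrite are illustrative names, and sync.RWMutex stands in for whatever distributed read/write lock you would really use:

package main

import "sync"

type node struct {
    mu       sync.RWMutex
    parent   *node // nil for a root
    data     string
    children []*node
}

// withRead read-locks n and every ancestor up to the root, runs fn,
// then releases the locks in reverse order.
func (n *node) withRead(fn func()) {
    var locked []*node
    for cur := n; cur != nil; cur = cur.parent {
        cur.mu.RLock()
        locked = append(locked, cur)
    }
    fn()
    for i := len(locked) - 1; i >= 0; i-- {
        locked[i].mu.RUnlock()
    }
}

// withWrite is called on the least upper bound of the nodes being modified
// and write-locks just that node; readers of anything in its subtree are
// blocked because their read-lock path passes through it.
func (n *node) withWrite(fn func()) {
    n.mu.Lock()
    defer n.mu.Unlock()
    fn()
}

func main() {
    root := &node{data: "root"}
    child := &node{parent: root, data: "child"}
    root.children = append(root.children, child)

    child.withRead(func() { _ = child.data })         // read-locks child and root
    root.withWrite(func() { root.data = "new root" }) // write-locks root only
}

Note that a writer never touches its children's locks at all: because every reader of a descendant holds a read lock on the ancestor, a single write lock on the root of the affected subtree is enough, which is what removes the need to send one lock request per child.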
In Qt, I have a model subclassing QAbstractItemModel - it's a tree displayed in a QTreeView.
The model supports various forms of change which all work OK. The two of relevance are:
1) Some data in a small number of related rows changes
2) A visualisation change means that the majority of rows should change their formatting - in particular they have a change of background highlighting. Their DisplayRole data does not change.
The current design deals with both of these in the same way: for every row that has any change the model emits dataChanged(start_of_row_index,end_of_row_index). I emit the signal for both parent rows that change and for any of their children that have changed.
However, this performs badly in case 2 as the model gets big: a very large number of dataChanged signals are emitted.
I have changed the code so that in case 2 the model emits dataChanged only for the (single) row that is the parent of the entire tree.
This still appears to work correctly but does not accord with my understanding of the responsibilities of the model. But I suspect I may be wrong.
Perhaps I am misunderstanding the dataChanged signal? Does it actually cause the view to update all children as well as the specified range? Or can I avoid emitting dataChanged when it is not the DisplayRole that is changing?
Edited with my progress so far
As Jan points out, I ought to emit dataChanged either for most or all of the rows in case 2.
My code originally did this by emitting dataChanged for every changed row but this is too expensive - the view takes too long to process all these signals.
A possible solution could be to aggregate the dataChanged signal for any contiguous blocks of changed rows but this will still not perform well when, for example, every other row has changed - it would still emit too many signals.
Ideally I would like to just tell the view to consider all data as potentially changed (but all indexes still valid - the layout unchanged). This does not seem to be possible with a single signal.
Because of a quirk of the QTreeView class, it is possible (though incorrect according to the spec) to emit only one dataChanged(tl,br) as long as tl != br. I had this working and it passed our testing but left me nervous.
I have settled for now on a version which traverses the tree and emits a single dataChanged(tl,br) for every parent (with tl,br spanning all the children of that parent). This conforms to the model/view protocol and for our models it typically reduces the number of signals by about a factor of 10.
It does not seem ideal however. Any other suggestions anyone?
You are expected to let your views know whenever any data gets changed. This "letting know" can happen through multiple ways; emitting dataChanged is the most common one when the structure of the indexes has not changed; others are the "serious" ones like modelReset or layoutChanged. By coincidence, some of Qt's views are able to pick up changes even without dataChanged, e.g. on a mouseover, but you aren't supposed to rely on that. It's an implementation detail and subject to change.
To answer the final bit of your question, yes, dataChanged must be emitted whenever any data returned from the QAIM::data() changes, even if it's "just" some other role than Qt::DisplayRole.
You're citing performance problems. What are the hard numbers -- are you actually getting any measurable slowdown, or are you just prematurely worried that this might be a problem later on? Are you aware of the fact that you can use both arguments to the dataChanged to signal a change over a big matrix of indexes?
EDIT:
A couple more things to try:
Make sure that your view does not request extra data. For example, unless you set the QTreeView's uniformRowHeights (IIRC), the view will have to execute O(n) calls for each dataChanged signal, leading to O(n^2) complexity. That's bad.
If you are really sure that there's no way around this, you might get away with combining layoutAboutToBeChanged, an update of the persistent indexes (changePersistentIndexList), and layoutChanged. As you are not actually changing the structure of your indexes, this might be rather cheap. However, the optimization opportunity in the previous point is still worth taking.
Let's say I have circular objects. Each object has a diameter of 64 pixels.
The cells of my quad tree are let's say 96x96 pixels.
Everything will be fine and working well when I check collisions from the cell a circle is residing in plus all its neighbor cells.
BUT what if I have one circle that has a diameter of 512 pixels? It would cover many cells and thus this would be a problem when checking only the neighbor cells. But I can't re-size my quad-tree-grid every time a much larger object is inserted into the tree...
Instead of putting objects into a single cell, put them in all cells they collide with. That way you can just test each cell individually. Use pointers to the object so you don't create copies. Also, you only need to do this with leaf nodes, so there's no need to combine data contained in higher nodes with lower ones.
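A simplified sketch of that idea, treating the quadtree's leaves as a flat grid of cells (the circle and grid types and the cell size are invented for illustration, and coordinates are assumed non-negative):

package main

import "fmt"

type circle struct {
    x, y, r float64
}

type grid struct {
    cellSize float64
    cells    map[[2]int][]*circle // leaf cell -> pointers to every object overlapping it
}

// insert registers the circle in every cell its bounding box touches.
func (g *grid) insert(c *circle) {
    minX := int((c.x - c.r) / g.cellSize)
    maxX := int((c.x + c.r) / g.cellSize)
    minY := int((c.y - c.r) / g.cellSize)
    maxY := int((c.y + c.r) / g.cellSize)
    for ix := minX; ix <= maxX; ix++ {
        for iy := minY; iy <= maxY; iy++ {
            key := [2]int{ix, iy}
            g.cells[key] = append(g.cells[key], c) // store a pointer, not a copy
        }
    }
}

func main() {
    g := &grid{cellSize: 96, cells: make(map[[2]int][]*circle)}
    g.insert(&circle{x: 300, y: 300, r: 256}) // the 512-pixel circle lands in many cells
    fmt.Println("occupied cells:", len(g.cells))
}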
This is an interesting problem. Maybe you can extend the node or the cell with tree-height information? If you have an object bigger than the smallest cell, nest it according to the tree height. That's what map applications like Google or Bing Maps do.
Here is a link to a similar solution: http://www.gamedev.net/topic/588426-2d-quadtree-collision---variety-in-size. I was confusing the screen with the quadtree. You can check collisions with a simple recursion.
Oversearching
During the search, and starting with the largest objects first...
Test Object.Position.X against QuadTreeNode.Centre.X, and also
test Object.Position.Y against QuadTreeNode.Centre.Y;
... Then, by taking the Absolute value of the difference, treat the object as lying within a specific child node whenever the absolute value is NOT more than the radius of the object...
... that is, when some portion of the object intrudes into that quad : )
The same can be done with AABB (Axis Aligned Bounding Boxes)
The only real caveat here is that VERY large objects that cover most of the screen will force a search of the entire tree. In these cases, a different approach may be called for.
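As a sketch of that test (all types and names are invented for illustration), an object is treated as reaching into a child half whenever its centre lies on that side or within one radius of the dividing line:

package main

import (
    "fmt"
    "math"
)

type object struct {
    x, y, radius float64
}

// quadrantsOverlapped reports, for a node split at (cx, cy), whether the object
// reaches into the west/east halves and the north/south halves. A child quadrant
// must be searched if the object reaches into both of the halves that make it up.
func quadrantsOverlapped(o object, cx, cy float64) (west, east, north, south bool) {
    dx := o.x - cx
    dy := o.y - cy
    // If |dx| is not more than the radius, the object straddles the vertical
    // dividing line and therefore intrudes into the children on *both* sides.
    west = dx < 0 || math.Abs(dx) <= o.radius
    east = dx >= 0 || math.Abs(dx) <= o.radius
    north = dy < 0 || math.Abs(dy) <= o.radius
    south = dy >= 0 || math.Abs(dy) <= o.radius
    return
}

func main() {
    big := object{x: 100, y: 100, radius: 256}
    fmt.Println(quadrantsOverlapped(big, 128, 128)) // a huge object reaches all four quadrants
}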
Of course, this only takes care of the object that everything else is being tested against. To ensure that all the other large objects in the world are properly identified, you will need to alter your quadtree slightly...
Use Multiple Appearances
In this variation on the QuadTree we ONLY place objects in the leaf nodes of the QuadTree, as pointers. Larger objects may appear in multiple leaf nodes.
Since some objects have multiple appearances in the tree, we need a way to avoid them once they've already been tested against.
So...
A simple Boolean WasHit flag can avoid testing the same object multiple times in a hit-test pass... and a 'cleanup' can be run on all 'hit' objects so that they are ready for the next test.
Whilst this makes sense, it is wasteful when performing all-vs-all hit-tests.
So, getting a little cleverer, we can avoid having any cleanup at all by using a pointer 'ptrLastObjectTestedAgainst' inside each object in the scene. This avoids re-testing the same objects on this run (the pointer is set after the first encounter).
It does not require resetting when testing a new object against the scene (the new object has a different pointer value than the last one), which avoids the need to reset the pointer as you would with a simple Boolean flag.
I've used the latter approach in scenes with vastly different object sizes and it worked well.
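A sketch of that pointer trick, with all names invented for illustration:

package main

import "fmt"

type sceneObject struct {
    name              string
    lastTestedAgainst *sceneObject // replaces a WasHit flag that would need resetting
}

// testPair runs the (placeholder) narrow-phase test at most once per query object.
func testPair(query, candidate *sceneObject) {
    if candidate.lastTestedAgainst == query {
        return // already tested against this query earlier in the same pass
    }
    candidate.lastTestedAgainst = query
    fmt.Printf("testing %s vs %s\n", query.name, candidate.name)
}

func main() {
    player := &sceneObject{name: "player"}
    boulder := &sceneObject{name: "boulder"} // large: appears in several leaf nodes

    // The quadtree query hands back the boulder from two different leaves,
    // but the narrow-phase test runs only once.
    testPair(player, boulder)
    testPair(player, boulder)
}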
Elastic QuadTrees
I've also used an 'elastic' QuadTree. Basically, you set a limit on how many items can IDEALLY fit in each QuadTreeNode - But, unlike a standard QuadTree, you allow the code to override this limit in specific cases.
The overriding rule here is that an object may NOT be placed into a Node that cannot hold it ENTIRELY... with the top node catching any objects that are larger than the screen.
Thus, small objects will continue to 'fall through' to form a regular QuadTree but large objects will not always fall all the way through to the leaf node - but will instead expand the node that last fitted them.
Think of the non-leaf nodes as 'sieving' the objects as they fall down the tree
This turns out to be a very efficient choice for many scenarios : )
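A sketch of the elastic insertion rule (aabb and qnode are invented names, and the splitting of over-full nodes is omitted):

package main

import "fmt"

type aabb struct{ minX, minY, maxX, maxY float64 }

func (a aabb) contains(b aabb) bool {
    return b.minX >= a.minX && b.maxX <= a.maxX && b.minY >= a.minY && b.maxY <= a.maxY
}

type qnode struct {
    bounds   aabb
    objects  []aabb
    children []*qnode // empty for a leaf
}

// insert lets an object sink only as deep as a node that can hold it ENTIRELY.
func (n *qnode) insert(o aabb) {
    for _, c := range n.children {
        if c.bounds.contains(o) {
            c.insert(o) // small objects keep falling through
            return
        }
    }
    // No child can hold the object entirely: it stays here, "expanding" this node
    // past its ideal capacity. The top node catches anything larger than the screen.
    n.objects = append(n.objects, o)
}

func main() {
    root := &qnode{bounds: aabb{0, 0, 1024, 1024}}
    root.children = []*qnode{
        {bounds: aabb{0, 0, 512, 512}},
        {bounds: aabb{512, 0, 1024, 512}},
        {bounds: aabb{0, 512, 512, 1024}},
        {bounds: aabb{512, 512, 1024, 1024}},
    }

    root.insert(aabb{10, 10, 74, 74})     // 64-pixel object: falls into a child
    root.insert(aabb{200, 200, 712, 712}) // 512-pixel object: stays at the root
    fmt.Println(len(root.objects), len(root.children[0].objects)) // 1 1
}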
Conclusion
Remember that these standard algorithms are useful general tools, but they are not a substitute for thinking about your specific problem. Do not fall into the trap of using a specific algorithm or library 'just because it is well known' ... your application is unique, and it may benefit from a slightly different approach.
Therefore, don't just learn to apply algorithms ... learn from those algorithms, and apply the principles themselves in novel and fitting ways. These are NOT the only tools, nor are they necessarily the best fit for your application.
Hope some of those ideas helped.
I've been reading up on garbage collection looking for features to include in my programming language and I came across "weak pointers". From here:
Weak pointers are like pointers, except that references from weak pointers do not prevent garbage collection, and weak pointers must have their validity checked before they are used.
Weak pointers interact with the garbage collector because the memory to which they refer may in fact still be valid, but containing a different object than it did when the weak pointer was created. Thus, whenever a garbage collector recycles memory, it must check to see if there are any weak pointers referring to it, and mark them as invalid (this need not be implemented in such a naive way).
I've never heard of weak pointers before. I would like to support many features in my language, but in this case I cannot for the life of me think of a case where this would be useful. What would one use weak pointers for?
A really big one is caching. Let's think through how a cache would work:
The idea behind a cache is to store objects in memory until memory pressure becomes so great that some of the objects need to be pushed out (or are explicitly invalidated of course). So your cache repository object must hold on to these objects somehow. By holding onto them via weak reference, when the garbage collector goes looking for things to consume because memory is low, the items referred to only by weak reference will appear as candidates for garbage collection. Items in the cache that are currently being used by other code will have hard references still active, so those items will be protected from garbage collection.
In most situations you won't be rolling your own caching mechanism, but it is common to use a cache. Let's suppose you want to have a property which refers to an object in cache, and that property stays in scope for a long time. You would prefer to fetch the object from cache, but if it's not available, you can get it from persisted storage. You also don't want to force that particular object to stay in memory if pressure gets too high. So you can use a weak reference to that object, which will allow you to fetch it if it is available but also allow it to fall out of cache.
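Here is a minimal sketch of that pattern. It assumes Go 1.24+ for the standard library weak package, and the imageCache type and loadFromDisk function are hypothetical placeholders:

package main

import (
    "fmt"
    "weak"
)

type image struct {
    name string
    data []byte
}

type imageCache struct {
    entries map[string]weak.Pointer[image]
}

// get returns a cached image if it has not been collected yet, otherwise it
// reloads it. Because the cache holds only weak pointers, it never keeps an
// image alive by itself; once the last strong reference elsewhere is gone,
// the GC may reclaim it and Value() starts returning nil.
func (c *imageCache) get(name string) *image {
    if p, ok := c.entries[name]; ok {
        if img := p.Value(); img != nil {
            return img // cache hit
        }
    }
    img := loadFromDisk(name) // hypothetical slow path
    c.entries[name] = weak.Make(img)
    return img
}

func loadFromDisk(name string) *image {
    return &image{name: name, data: make([]byte, 1<<20)}
}

func main() {
    c := &imageCache{entries: make(map[string]weak.Pointer[image])}
    img := c.get("cat.png")
    fmt.Println(img.name, len(c.entries))
}

Note that a purely weak cache drops entries as soon as nobody else holds a strong reference, not only under memory pressure; real caches often pair weak references with a small strongly-held recently-used set to get the behaviour described above.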
A typical use case is storage of additional object attributes. Suppose you have a class with a fixed set of members, and, from the outside, you want to add more members. So you create a dictionary object -> attributes, where the keys are weak references. Then, the dictionary doesn't prevent the keys from being garbage collected; removal of the object should also trigger removal of the values in the WeakKeyDictionary (e.g. by means of a callback).
If your language's garbage collector is incapable of collecting circular data structures, then you can use weak references to enable it to do so. Normally, if you have two objects which have references to each other, but no other outside object has a reference to those two, they would be candidates for garbage collection. But, a naïve garbage collector wouldn't collect them, since they contain references to each other.
To fix this, you make it so one object has a strong reference to the second, but the second has a weak reference to the first. Then, when the last outside reference to the first object goes away, the first object becomes a candidate for garbage collection, followed shortly thereafter by the second, since now its only reference is weak.
Another example... not quite caching, but similar: Suppose an I/O library provides an object which wraps a file descriptor and permits access to the file. When the object is collected, the file descriptor is closed. It is desired to be able to list all currently opened files. If you use strong pointers for this list, then files are never closed.
Use them when you want to keep a cached list of objects but not prevent those objects from getting garbage collected if the "real" owner of the object is done with it.
A web browser might have a history object that keeps references to image objects that the browser loaded elsewhere and saved in the history/disk cache. The web browser might expire one of those images (user cleared the cache, the cache timeout elapsed, etc) but the page would still have the reference/pointer. If the page used a weak reference/pointer the object would go away as expected and the memory would be garbage collected.
One important reason for having weak references is to deal with the possibility that an object may serve as a pipeline to connect a source of information or events to one or more listeners. If there aren't any listeners, there's no reason to keep sending information to the pipeline.
Consider, for example, an enumerable collection which allows updates during enumeration. The collection may need notify any active enumerators that it has been changed, so those enumerators can adjust themselves accordingly. If some enumerators get abandoned by their creators, but the collection holds strong references to them, those enumerators will continue to exist (and process update notifications) as long as the collection exists. If the collection itself will exist for the lifetime of the application, those enumerators will effectively become a permanent memory leak.
If the collection holds weak references to the enumerators, this problem can be largely solved. If an enumerator is abandoned, it will be eligible for garbage collection, even though the collection still holds a weak reference to it. The next time the collection is changed, it can look through its list of weak references, send updates to the ones that are still valid, and remove from its list the ones that are not.
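A sketch of that idea, again assuming Go 1.24+ for the weak package, with illustrative type names:

package main

import (
    "fmt"
    "weak"
)

type enumerator struct {
    position int
}

func (e *enumerator) onChanged() { e.position = 0 } // e.g. restart or adjust

type collection struct {
    items     []string
    listeners []weak.Pointer[enumerator]
}

func (c *collection) register(e *enumerator) {
    c.listeners = append(c.listeners, weak.Make(e))
}

// add notifies every enumerator that is still alive and prunes the ones that
// were abandoned and collected, so they cannot leak for the collection's lifetime.
func (c *collection) add(item string) {
    c.items = append(c.items, item)
    alive := c.listeners[:0]
    for _, wp := range c.listeners {
        if e := wp.Value(); e != nil {
            e.onChanged()
            alive = append(alive, wp)
        }
    }
    c.listeners = alive
}

func main() {
    c := &collection{}
    e := &enumerator{}
    c.register(e)
    c.add("x")
    fmt.Println(len(c.items), len(c.listeners)) // 1 1
}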
It would be possible to achieve many of the effects of weak references using finalizers along with some extra objects, and it's possible to make such implementations more efficient than those using weak references, but there are many pitfalls and it's hard to avoid bugs. It's much easier to make a correct approach using WeakReference. The approach may not be optimally efficient, but it won't fail badly.
Weak Pointers keep whatever holds them from becoming a form of "life support" for the object the pointer points to.
Say you had a Viewport class, two UI classes, and a bunch of Widget classes. You want your UI to control the lifespan of the Widgets it creates, so your UI keeps SharedPtrs to all the Widgets it controls. For as long as your UI object is alive, none of the Widgets it references will be garbage collected (thanks to the SharedPtrs).
However, the Viewport is the class that actually does the drawing, so your UI needs to pass the Viewport a pointer to the Widgets so that it can draw them. For whatever reason, you want to change your active UI class to the other one. Let's consider two scenarios, one where the UI passed the Viewport WeakPtrs and one where it passed SharedPtrs (pointing to the Widgets).
If you had passed the Viewport all the Widgets as WeakPtrs, then as soon as the UI class was deleted there would be no more SharedPtrs to the Widgets, so they would be garbage collected. The Viewport's references to the objects wouldn't keep them on "life support", which is exactly what you want, because you aren't even using that UI anymore, much less the Widgets it created.
Now, consider if you had passed the Viewport SharedPtrs: you delete the UI, and the Widgets are NOT garbage collected! Why? Because the Viewport, which is still alive, has an array (vector or list, whatever) full of SharedPtrs to the Widgets. The Viewport has in effect become a form of "life support" for them, even though you had deleted the UI that was controlling the widgets in favor of another UI object.
Generally, a language/system/framework will garbage collect anything unless there is a "strong" reference to it somewhere in memory. Imagine if everything had a strong reference to everything: nothing would ever get garbage collected! Sometimes you want that behavior, sometimes you don't. If you use a WeakPtr, and there are no Shared/StrongPtrs left pointing at the object (only WeakPtrs), then the object will be garbage collected despite the WeakPtr references, and the WeakPtrs (should be) set to NULL (or deleted, or something).
Again, when you use a WeakPtr you're basically allowing the object you give it to to access the data, but the WeakPtr won't prevent garbage collection of the object it points to like a SharedPtr would. When you think SharedPtr, think "life support"; WeakPtr, NO "life support". Garbage collection won't (generally) occur until the object has zero life support.
Weak references can for example be used in caching scenarios - you can access data through weak references, but if you don't access the data for a long time or there is high memory pressure, the GC can free it.
The reason for garbage collection at all is that in a language like C, where memory management is totally under the explicit control of the programmer, avoiding memory leaks and dangling pointers becomes very hard when object ownership is passed around, especially between threads or, even harder, between processes sharing memory. If that weren't hard enough, you also have to deal with the need to access more objects than will fit in memory at one time: you need a way to free up some objects for a while so that other objects can be in memory.
So, some languages (e.g., Perl, Lisp, Java) provide a mechanism where you can just stop "using" an object and the garbage collector will eventually discover this and free up memory used for the object. It does this correctly without the programmer worrying about all the ways they can get it wrong (albeit there are lots of ways programmers can screw this up).
If you conceptually multiply the number of times you access an object by the time it takes to compute its value, and possibly multiply again by the cost of not having the object readily available, or by its size, since keeping a large object around in memory can prevent keeping several smaller objects around, you can classify objects into three categories.
Some objects are so important that you want to explicitly manage their existence—they will not be managed by the garbage collector or they must never be collected until explicitly freed. Some objects are cheap to compute, are small, are not accessed frequently or have similar characteristics that allow them to be garbage collected at any time.
The third class consists of objects which are expensive to recompute but can be recomputed, are accessed somewhat frequently (perhaps in short bursts), and are of large size. You'd like to keep them in memory as long as possible because they might be reused, but you don't want to run out of memory needed for critical objects. These are candidates for weak references.
You want these objects kept around as long as possible if they aren't conflicting with critical resources, but they should be dropped if memory is needed for a critical resource, because they can be recomputed when needed. That is what weak pointers are for.
An example of this might be pictures. Say you have a photo web page with thousands of pictures to display. You need to know how many pictures to lay out and maybe you have to do a database query to get the list. The memory to hold a list of a few thousand items is probably very small. You want to do the query once and keep it around.
You can only physically show perhaps a few dozen pictures at a time, though, in a pane of a web page. You don't need to fetch the bits for the pictures that the user can't be looking at. When the user scrolls the page, you'll gather the actual bits for the pictures visible. Those pictures could require many megabytes to show them. If the user scrolls back and forth between a few scroll positions, you'd like not to have to refetch those megabytes over and over again. But you can't keep all the pictures in memory all the time. So you use weak pointers.
If the user just looks at a few pictures over and over again, they may stay in cache and you don't have to refetch them. But if they scroll enough, you need to free up some memory so the visible pictures can be fetched. With a weak reference, you check the reference just before you use it. If it's still valid, you use it; if it's not, you make the expensive calculation (fetch) to get it.