Why are weak pointers useful? - pointers

I've been reading up on garbage collection looking for features to include in my programming language and I came across "weak pointers". From here:
Weak pointers are like pointers,
except that references from weak
pointers do not prevent garbage
collection, and weak pointers must
have their validity checked before
they are used.
Weak pointers interact with the
garbage collector because the memory
to which they refer may in fact still
be valid, but containing a different
object than it did when the weak
pointer was created. Thus, whenever a
garbage collector recycles memory, it
must check to see if there are any
weak pointers referring to it, and
mark them as invalid (this need not be
implemented in such a naive way).
I've never heard of weak pointers before. I would like to support many features in my language, but in this case I cannot for the life of me think of a case where this would be useful. For what would one use weak pointer?

A really big one is caching. Let's think through how a cache would work:
The idea behind a cache is to store objects in memory until memory pressure becomes so great that some of the objects need to be pushed out (or are explicitly invalidated of course). So your cache repository object must hold on to these objects somehow. By holding onto them via weak reference, when the garbage collector goes looking for things to consume because memory is low, the items referred to only by weak reference will appear as candidates for garbage collection. Items in the cache that are currently being used by other code will have hard references still active, so those items will be protected from garbage collection.
In most situations you won't be rolling your own caching mechanism, but it is common to use a cache. Let's suppose you want to have a property which refers to an object in cache, and that property stays in scope for a long time. You would prefer to fetch the object from cache, but if it's not available, you can get it from persisted storage. You also don't want to force that particular object to stay in memory if pressure gets too high. So you can use a weak reference to that object, which will allow you to fetch it if it is available but also allow it to fall out of cache.

A typical use case is storage of additional object attributes. Suppose you have a class with a fixed set of members, and, from the outside, you want to add more members. So you create a dictionary object -> attributes, where the keys are weak references. Then, the dictionary doesn't prevent the keys from being garbage collected; removal of the object should also trigger removal of the values in the WeakKeyDictionary (e.g. by means of a callback).

If your language's garbage collector is incapable of collecting circular data structures, then you can use weak references to enable it to do so. Normally, if you have two objects which have references to each other, but no other outside object has a reference to those two, they would be candidates for garbage collection. But, a naïve garbage collector wouldn't collect them, since they contain references to each other.
To fix this, you make it so one object has a strong reference to the second, but the second has a weak reference to the first. Then, when the last outside reference to the first object goes away, the first object becomes a candidate for garbage collection, followed shortly thereafter by the second, since now its only reference is weak.

Another example... not quite caching, but similar: Suppose an I/O library provides an object which wraps a file descriptor and permits access to the file. When the object is collected, the file descriptor is closed. It is desired to be able to list all currently opened files. If you use strong pointers for this list, then files are never closed.

Use them when you wanted to keep a cached list of objects but not prevent those objects from getting garbage collected if the "real" owner of the object is done with it.
A web browser might have a history object that keeps references to image objects that the browser loaded elsewhere and saved in the history/disk cache. The web browser might expire one of those images (user cleared the cache, the cache timeout elapsed, etc) but the page would still have the reference/pointer. If the page used a weak reference/pointer the object would go away as expected and the memory would be garbage collected.

One important reason for having weak references is to deal with the possibility that an object may serve as a pipeline to connect a source of information or events to one or more listeners. If there aren't any listeners, there's no reason to keep sending information to the pipeline.
Consider, for example, an enumerable collection which allows updates during enumeration. The collection may need notify any active enumerators that it has been changed, so those enumerators can adjust themselves accordingly. If some enumerators get abandoned by their creators, but the collection holds strong references to them, those enumerators will continue to exist (and process update notifications) as long as the collection exists. If the collection itself will exist for the lifetime of the application, those enumerators will effectively become a permanent memory leak.
If the collection holds weak references to the enumerators, this problem can be largely solved. If an enumerator is abandoned, it will be eligible for garbage collection, even though the collection still holds a weak reference to it. The next time the collection is changed, it can look through its list of weak references, send updates to the ones that are still valid, and remove from its list the ones that are not.
It would be possible to achieve many of the effects of weak references using finalizers along with some extra objects, and it's possible to make such implementations more efficient than those using weak references, but there are many pitfalls and it's hard to avoid bugs. It's much easier to make a correct approach using WeakReference. The approach may not be optimally efficient, but it won't fail badly.

Weak Pointers keep whatever holds them from becoming a form of "life support" for the object the pointer points to.
Say you had a Viewport class, and 2 UI classes, and a buch of Widget classes. You want your UI to control the lifespan of the Widgets it creates, so your UI keeps SharedPtrs to all the Widgets it controls. For as long as your UI object is alive, none of the Widgets it refrences will be garbage collected (thanks to SharedPtr).
However, the Viewport is your class that actually does the drawing, so your UI needs to pass the Viewport a pointer to the Widgets so that it can draw them. For whatever reason, you want to change your active UI class to the other one. Lets consider two scenarios, one where the UI passed the Viewport WeakPtrs and one where it passed SharedPtrs (pointing to the Widgets).
If you had passed the Viewport all the Widgets as WeakPointers, as soon as the UI class was deleted there would be no more SharedPointers to the Widgets, so they would be garbage collected, the Viewport's references to the objects wouldn't keep them on "life support", which is exactly what you want because you aren't even using that UI anymore, much less the Widgets it created.
Now, consider you had passed the Viewport a SharedPointer, you delete the UI, and the Widgets are NOT garbage collected! Why? because the Viewport, which is still alive has an array (vector or list, whatever) full of SharedPtrs to the Widgets. The Viewport has in effect became a form of "life support" for them, even though you had deleted the UI that was controlling the widgets for another UI object.
Generally, a language/system/framework will garbage collect anything unless there is a "strong" reference to it somewhere in memory. Imagine if everything had a strong reference to everything, nothing would ever get garbage collected! Sometimes you want that behavior sometimes you don't. If you use a WeakPtr, and there are no Shared/StrongPtrs left pointing at the object (only WeakPtrs), then the objects will be garbage collected despite the WeakPtr references, and the WeakPtrs (should be) set to NULL (or deleted, or something).
Again, when you use a WeakPtr you're basically allowing the object you're giving it too to be able to access the data, but the WeakPtr won't prevent garbage collection of the object it points to like a SharedPtr would. When you think SharedPtr, think "life support", WeakPtr, NO "life support." Garbage collection won't (generally) occur until the object has zero life support.

Weak references can for example be used in caching scenarios - you can access data through weak references, but if you don't access the data for a long time or there is high memory pressure, the GC can free it.

The reason for garbage collection at all is that in a language like C where memory management is totally under explicit control of the programmer, when object ownership is passed around, especially between threads or, even harder, between processes sharing memory, avoiding memory leaks and dangling pointers can become very hard. If that weren't hard enough, you also have to deal with the need to have access to more objects than will fit in memory at one time—you need to have a way to have free up some objects for a while so that other objects can be in memory.
So, some languages (e.g., Perl, Lisp, Java) provide a mechanism where you can just stop "using" an object and the garbage collector will eventually discover this and free up memory used for the object. It does this correctly without the programmer worrying about all the ways they can get it wrong (albeit there are lots of ways programmers can screw this up).
If you conceptually multiply the number of times you access an object by the time that it takes to compute the value of an object, and possibly multiply again by the cost of not having the object readily available or by the size of an object since keeping a large object around in memory can prevent keeping several smaller objects around, you could classify objects into three categories.
Some objects are so important that you want to explicitly manage their existence—they will not be managed by the garbage collector or they must never be collected until explicitly freed. Some objects are cheap to compute, are small, are not accessed frequently or have similar characteristics that allow them to be garbage collected at any time.
The third class, objects which are expensive to be recomputed but could be recomputed, are accessed somewhat frequently (perhaps for a short burst of time), are of large size, and so on are a third class. You'd like to keep them in memory as long as possible because they might be reused again, but you don't want to run out of memory needed for critical objects. These are candidates for weak references.
You want these objects kept around as long as possible if they aren't conflicting with critical resources, but they should be dropped if memory is needed for a critical resource because it can be recomputed again when needed. These are hat weak pointers are for.
An example of this might be pictures. Say you have a photo web page with thousands of pictures to display. You need to know how many pictures to lay out and maybe you have to do a database query to get the list. The memory to hold a list of a few thousand items is probably very small. You want to do the query once and keep it around.
You can only physically show perhaps a few dozen pictures at a time, though, in a pane of a web page. You don't need to fetch the bits for the pictures that the user can't be looking at. When the user scrolls the page, you'll gather the actual bits for the pictures visible. Those pictures could require many megabytes to show them. If the user scrolls back and forth between a few scroll positions, you'd like not to have to refetch those megabytes over and over again. But you can't keep all the pictures in memory all the time. So you use weak pointers.
If the user just looks at a few pictures over and over again, they may stay in cache and you don't have to refetch them. But if they scroll enough, you need to free up some memory so the visible pictures can be fetched. With a weak reference, you check the reference just before you use it. If its still valid, you use it. If its not, you make the expensive calculation (fetch) to get it.

Related

Lock free non-allocating collection

I am looking for a collection data structure that is:
thread safe
lock free
non-allocating (amortized or pre-allocated is fine)
non-intrusive
not using exotic intrinsics
Element order does not matter. Stack, queue, bag, anything is fine. I have found plenty of examples that fulfill four of these five requirements, for example:
.NET's List is not thread safe.
If I put a mutex on it, then it's not lock free.
.NET's ConcurrentStack is thread safe, lock free, uses a simple CompareExchange, but allocates a new Node for each element.
If I move the next pointer from the Node to the element itself, then it's intrusive.
Array based lock free data structures tend to require multi-word intrinsics.
I feel like I'm missing something super obvious. This should be a solved problem.
.NET's ConcurrentQueue fulfills all five requirements. It does allocate when the backing storage runs out of space, similar to List<T>, but as long as there is extra capacity, no allocations occur. Unfortunately the only way to reserve extra capacity upfront, is to initialize it with a collection of same size and then dequeuing all the elements.
same is true for .NET's ConcurrentBag

High rate of Gen 1 garbage collections

I am profiling an application(using VS 2010) that is behaving badly in production. Once of the recommendations given by VS 2010 is:
Relatively high rate of Gen 1 garbage collections is occurring. If, by
design, most of your program's data structures are allocated and
persisted for a long time, this is not ordinarily a problem. However,
if this behavior is unintended, your app may be pinning objects. If
you are not certain, you can gather .NET memory allocation data and
object lifetime information to understand the pattern of memory
allocation your application uses.
Searching on google gives the following link=> http://msdn.microsoft.com/en-us/library/ee815714.aspx. Are there some obvious things that I can do to reduce this issue?I seem to be lost here.
Double-click the message in the Errors List window to navigate to the
Marks View of the profiling data. Find the .NET CLR Memory# of Gen 0
Collections and .NET CLR Memory# of Gen 1 Collections columns.
Determine if there are specific phases of program execution where
garbage collection is occurring more frequently. Compare these values
to the % Time in GC column to see if the pattern of managed memory
allocations is causing excessive memory management overhead.
To understand the application’s pattern of managed memory usage,
profile it again running a.NET Memory allocation profile and request
Object Lifetime measurements.
For information about how to improve garbage collection performance,
see Garbage Collector Basics and Performance Hints on the Microsoft
Web site. For information about the overhead of automatic garbage
collection, see Large Object Heap Uncovered.
The relevant line there is:
To understand the application’s pattern of managed memory usage, profile it again running a.NET Memory allocation profile and request Object Lifetime measurements.
You need to understand how many objects are being allocated by your application and when, and how long they are alive for. You're probably allocating hundreds (or thousands!) of tiny objects inside a loop somewhere without really thinking about the consequences of reclaiming that memory when the references fall out of scope.
http://msdn.microsoft.com/en-us/library/ms973837.aspx states:
Now that we have a basic model for how things are working, let's
consider some things that could go wrong that would make it slow. That
will give us a good idea what sorts of things we should try to avoid
to get the best performance out of the collector.
Too Many Allocations
This is really the most basic thing that can go wrong. Allocating new
memory with the garbage collector is really quite fast. As you can see
in Figure 2 above is all that needs to happen typically is for the
allocation pointer to get moved to create space for your new object on
the "allocated" side—it doesn't get much faster than that. However,
sooner or later a garbage collect has to happen and, all things being
equal, it's better for that to happen later than sooner. So you want
to make sure when you're creating new objects that it's really
necessary and appropriate to do so, even though creating just one is
fast.
This may sound like obvious advice, but actually it's remarkably easy
to forget that one little line of code you write could trigger a lot
of allocations. For example, suppose you're writing a comparison
function of some kind, and suppose that your objects have a keywords
field and that you want your comparison to be case insensitive on the
keywords in the order given. Now in this case you can't just compare
the entire keywords string, because the first keyword might be very
short. It would be tempting to use String.Split to break the keyword
string into pieces and then compare each piece in order using the
normal case-insensitive compare. Sounds great right?
Well, as it turns out doing it like that isn't such a good idea. You
see, String.Split is going to create an array of strings, which means
one new string object for every keyword originally in your keywords
string plus one more object for the array. Yikes! If we're doing this
in the context of a sort, that's a lot of comparisons and your
two-line comparison function is now creating a very large number of
temporary objects. Suddenly the garbage collector is going to be
working very hard on your behalf, and even with the cleverest
collection scheme there is just a lot of trash to clean up. Better to
write a comparison function that doesn't require the allocations at
all.

Flex loitering objects with 0 paths

My application is leaking a visual component called GraphViewer. Every time the user changes graphs a new viewer is created and the old one is removed from the stage and discarded. Yet memory seems to leak. When I use the Flex profiler to track loitering objects it shows that GraphViewer instances indeed leak, but when I examine the object references of the loitering viewers I see that all of them (except one) have 0 paths to GC root.
I take a memory snapshot after GC and then change the graph (create a new viewer) N times. Then I do GC, take another snapshot and look at loitering objects. I see N GraphViewer objects loitering, but N-1 of them actually have 0 paths and only one has anything actually referencing it.
Why is the Flex profiler showing objects as loitering when they cannot be reached from the GC root? Is the Flex profiler reliable?
First off, why do you need to create a new instance of your component when new data arrives? Seems a bit wasteful. It's better to reuse an instance than to create a new one.
Second, it's hard to answer your issue without code, but often times the reason why a view component is kept in memory is because someone either still have a reference to it or an event listener hasn't been cleaned properly.
And lastly, there has been a known bug in the GC for some time now (though I haven't tested it recently; been about a year since I could reproduce) where large memory 'islands' (think of a very large module) won't clean up properly because the round trip algorithm for the GC won't figure that it's disconnected from the rest. To alleviate that, you might want to implement an IDisposable interface where your 'parent' view calls a destroy function before removing from stage (which then propagates throughout the component and it's children to destroy as well).
Good luck.

Garbage collection vs. shared pointers

What are the differences between shared pointers (such as boost::shared_ptr or the new std::shared_ptr) and garbage collection methods (such as those implemented in Java or C#)? The way I understand it, shared pointers keep track of how many times variables points to the resource and will automatically destruct the resource when the count reaches zero. However, my understanding is that the garbage collector also manages memory resources, but requires additional resources to determine if an object is still being referred to and doesn't necessarily destruct the resource immediately.
Am I correct in my assumptions, and are there any other differences between using garbage collectors and shared pointers? Also, why would anyone ever used a garbage collector over a shared pointer if they perform similar tasks but with varying performance figures?
The main difference lies, as you noted, in when the resource is released/destroyed.
One advantage where a GC might come in handy is if you have resources that take a long time to be released. For a short program lifetime, it might be nice to leave the resources dangling and have them cleaned up in the end. If resource limits are reached, then the GC can act to release some of them. Shared pointers, on the other hand, release their resources as soon as the reference count hits zero. This could be costly for frequent acquisition-release cycles of a resource with costly time requirements.
On the other hand, in some garbage collection implementations, garbage collection requires that the whole program pause its execution while memory is examined, moved around, and freed. There are smarter implementations, but none are perfect.
Those Shared Pointers (usually called reference counting) run the risk of cycles.
Garbage collection (Mark and Sweep) does not have this problem.
In a simple garbage-collected system, nobody will hold a direct pointer to any object; instead, code will hold references to table entries which point to objects on the heap. Each object on the heap will store its size (meaning all heap objects will form a singly-linked list) and a back-reference to the object in the object table which holds it (or at least used to).
When either the heap or the object table gets full, the system will set a "delete me" flag on every object in the table. It will examine every object it knows about and, if its "delete flag" was set, unset it and add all the objects it knows about to the list of objects to be examined. Once that is done, any object whose "delete me" flag is still set can be deleted.
Once that is done, the system will start at the beginning of the heap, take each object stored there, and see if its object reference still points to it. If so, it will copy that object to the beginning of the heap, or just past the end of the last copied object; otherwise the object will be skipped (and will likely be overwritten when other objects are copied).
In languages with a garbage collector (GC), the GC keeps track of and cleans up memory that isn’t being used anymore, and we don’t need to think about it. In most languages without a GC, it’s our responsibility to identify when memory is no longer being used and to call code to explicitly free it, just as we did to request it.
more details: HERE

Tradeoff of creating collection on the fly or beforehand

Whats the pro's and con's of creating a collection instance on the fly or beforehand, during initialisation for use later.
I have a whole bundle of threads that each need to output a buffer which is enqueued on a priority or intervalheap queue. I was wondering if it would be more effecient in c# to create, a circular buffer of type X, of size 2048 beforehand, and just write into each one, and reuse later, or allow each thread to create a buffer on the fly and enqueue it, i.e. create them when necessary, and allow for normal cleanup as per.
I know that the GC would try and disappear the pre-created circular queue. I've had strange debugging problems in the past, looking for object that no longer the exist, because the GC removed it as per.
Any help or advice would be appreciated.
Bob.
GC won't remove an object you still have a reference to - in other words, if you were able to use your pre-created buffer, it wouldn't be garbage collected - unless you had a WeakReference to it, of course.
Do you know that this will be a performance bottleneck at all? Why not write the simplest code that works first, and measure how well it performs?

Resources