Related
Kotlins SortedMap is "a map that further provides a total ordering on its keys."
As a result, it should be indexable. However, this extension doesn't exist
`sortedMap.forEachIndexed()`
Why not? Am i overlooking something? Is it performance reasons? Didn't anyone bother?
(Yes, i know, i could use a List<Pair<Key, Value>>, but that's doesn't feel like the "intuitive" structure for my usecase, a map fits much better)
Most of the things that have a forEachIndexed get it either from Iterable or have it as an extension function. Map does not, but one of its properties, the entries is actually a Set, which does have forEachIndexed because it inherits from Collection (which inherits from Iterable).
That means that you can do something like this:
map.entries.forEachIndexed { index, (key, value) ->
//do stuff
}
The reason I've added this to the already existing asIterable().forEachIndex answer, is because asIterable() creates a new object.
forEachIndexed is an extension function on all sorts of Arrays and also Iterable. SortedMap is unrelated to those types, so you can't call forEachIndexed on it.
However, SortedMap does have asIterable (inherited from Map), which converts it to an Iterable. After that, you can access forEachIndexed:
someSortedMap.asIterable().forEachIndex { index, entry ->
// ...
}
However, the newer extension function onEachIndexed are declared on Maps. Unlike forEachIndexed, this also returns the map itself.
Ceylon has several different concepts for things that might all be considered some kind of array: List, Tuple, Sequence, Sequential, Iterable, Array, Collection, Category, etc. What's is different about these these types and when should I use them?
The best place to start to learn about these things at a basic level is the Ceylon tour. And the place to learn about these things in depth is the module API. It can also be helpful to look at the source files for these.
Like all good modern programming languages, the first few interfaces are super abstract. They are built around one formal member and provide their functionality through a bunch of default and actual members. (In programming languages created before Java 8, you may have heard these called "traits" to distinguish them from traditional interfaces which have only formal members and no functionality.)
Category
Let's start by talking about the interface Category. It represents types of which you can ask "does this collection contain this object", but you may not necessarily be able to get any of the members out of the collection. It's formal member is:
shared formal Boolean contains(Element element)
An example might be the set of all the factors of a large number—you can efficiently test if any integer is a factor, but not efficiently get all the factors.
Iterable
A subtype of Category is the interface Iterable. It represents types from which you can get each element one at a time, but not necessarily index the elements. The elements may not have a well-defined order. The elements may not even exist yet but are generated on the fly. The collection may even be infinitely long! It's formal member is:
shared formal Iterator<Element> iterator()
An example would be a stream of characters like standard out. Another example would be a range of integers provided to a for loop, for which it is more memory efficient to generate the numbers one at a time.
This is a special type in Ceylon and can be abbreviated {Element*} or {Element+} depending on if the iterable might be empty or is definitely not empty, respectively.
Collection
One of Iterable's subtypes is the interface Collection. It has one formal member:
shared formal Collection<Element> clone()
But that doesn't really matter. The important thing that defines a Collection is this line in the documentation:
All Collections are required to support a well-defined notion of value
equality, but the definition of equality depends upon the kind of
collection.
Basically, a Collection is a collection who structure is well-defined enough to be equatable to each other and clonable. This requirement for a well-defined structure means that this is the last of the super abstract interfaces, and the rest are going to look like more familiar collections.
List
One of Collection's subtypes is the interface List. It represents a collection whose elements we can get by index (myList[42]). Use this type when your function requires an array to get things out of, but doesn't care if it is mutable or immutable. It has a few formal methods, but the important one comes from its other supertype Correspondence:
shared formal Item? get(Integer key)
Sequential, Sequence, Empty
The most important of List's subtypes is the interface Sequential. It represents an immutable List. Ceylon loves this type and builds a lot of syntax around it. It is known as [Element*] and Element[]. It has exactly two subtypes:
Empty (aka []), which represents empty collections
Sequence (aka [Element+]), which represents nonempty collections.
Because the collections are immutable, there are lots of things you can do with them that you can't do with mutable collections. For one, numerous operations can fail with null on empty lists, like reduce and first, but if you first test that the type is Sequence then you can guarantee these operations will always succeed because the collection can't become empty later (they're immutable after all).
Tuple
A very special subtype of Sequence is Tuple, the first true class listed here. Unlike Sequence, where all the elements are constrained to one type Element, a Tuple has a type for each element. It gets special syntax in Ceylon, where [String, Integer, String] is an immutable list of exactly three elements with exactly those types in exactly that order.
Array
Another subtype of List is Array, also a true class. This is the familiar Java array, a mutable fixed-size list of elements.
drhagen has already answered the first part of your question very well, so I’m just going to say a bit on the second part: when do you use which type?
In general: when writing a function, make it accept the most general type that supports the operations you need. So far, so obvious.
Category is very abstract and rarely useful.
Iterable should be used if you expect some stream of elements which you’re just going to iterate over (or use stream operations like filter, map, etc.).
Another thing to consider about Iterable is that it has some extra syntax sugar in named arguments:
void printAll({Anything*} things, String prefix = "") {
for (thing in things) {
print(prefix + (thing?.string else "<null>"));
}
}
printAll { "a", "b", "c" };
printAll { prefix = "X"; "a", "b", "c" };
Try online
Any parameter of type Iterable can be supplied as a list of comma-separated arguments at the end of a named argument list. That is,
printAll { "a", "b", "c" };
is equivalent to
printAll { things = { "a", "b", "c" }; };
This allows you to craft DSL-style expressions; the tour has some nice examples.
Collection is, like Correspondence, fairly abstract and in my experience rarely used directly.
List sounds like it should be a often used type, but actually I don’t recall using it a lot. I’m not sure why. I seem to skip over it and declare my parameters as either Iterable or Sequential.
Sequential and Sequence are when you want an immutable, fixed-length list. It also has some syntax sugar: variadic methods like void foo(String* bar) are a shortcut for a Sequential or Sequence parameter. Sequential also allows you to do use the nonempty operator, which often works out nicely in combination with first and rest:
String commaSeparated(String[] strings) {
if (nonempty strings) {
value sb = StringBuilder();
sb.append(strings.first); // known to exist
for (string in strings.rest) { // skip the first one
sb.append(", ").append(string);
// we don’t need a separate boolean to track if we need the comma or not :)
}
return sb.string;
} else {
return "";
}
}
Try online
I usually use Sequential and Sequence when I’m going to iterate over a stream several times (which could be expensive for a generic Iterable), though List might be the better interface for that.
Tuple should never be used as Tuple (except in the rare case where you’re abstracting over them), but with the [X, Y, Z] syntax sugar it’s often useful. You can often refine a Sequential member to a Tuple in a subclass, e. g. the superclass has a <String|Integer>[] elements which in one subclass is known to be a [String, Integer] elements.
Array I’ve never used as a parameter type, only rarely as a class to instantiate and use.
Can someone tell me the exact difference between node() and element() types in XQuery? The documentation states that element() is an element node, while node() is any node, so if I understand it correctly element() is a subset of node().
The thing is I have an XQuery function like this:
declare function local:myFunction($arg1 as element()) as element() {
let $value := data($arg1/subelement)
etc...
};
Now I want to call the function with a parameter which is obtained by another function, say functionX (which I have no control over):
let $parameter := someNamespace:functionX()
return local:myFunction($parameter)
The problem is, functionX returns an node() so it will not let me pass the $parameter directly. I tried changing the type of my function to take a node() instead of an element(), but then I can’t seem to read any data from it. $value is just empty.
Is there some way of either converting the node to an element or should am I just missing something?
EDIT: As far as I can tell the problem is in the part where I try to get the subelement using $arg1/subelement. Apparently you can do this if $arg1 is an element() but not if it is a node().
UPDATE: I have tested the example provided by Dimitre below, and it indeed works fine, both with Saxon and with eXist DB (which is what I am using as the XQuery engine). The problem actually occurs with the request:get-data() function from eXist DB. This function gets data provided by the POST request when using eXist through REST, parses it as XML and returns it as a node(). But for some reason when I pass the data to another function XQuery doesn’t acknowledge it as being a valid element(), even though it is. If I extract it manually (i.e. copy the output and paste it to my source code), assign it to a variable and pass it to my function all goes well. But if I pass it directly it gives me a runtime error (and indeed fails the instance of test).
I need to be able to either make it ignore this type-check or “typecast” the data to an element().
data() returning empty for an element just because the argument type is node() sounds like a bug to me. What XQuery processor are you using?
It sounds like you need to placate static type checking, which you can do using a treat as expression. I don't believe a dynamic test using instance of will suffice.
Try this:
let $parameter := someNamespace:functionX() treat as element()
return local:myFunction($parameter)
Quoting from the 4th edition of Michael Kay's magnum opus, "The treat as operator is essentially telling the system that you know what the runtime type is going to be, and you want any checking to be deferred until runtime, because you're confident that your code is correct." (p. 679)
UPDATE: I think the above is actually wrong, since treat as is just an assertion. It doesn't change the type annotation node(), which means it's also a wrong assertion and doesn't help you. Hmmm... What I really want is cast as, but that only works for atomic types. I guess I'm stumped. Maybe you should change XQuery engines. :-) I'll report back if I think of something else. Also, I'm curious to find out if Dimitre's solution works for you.
UPDATE #2: I had backpedaled here earlier. Can I backpedal again? ;-) Now my theory is that treat as will work based on the fact that node() is interpreted as a union of the various specific node type annotations, and not as a run-time type annotation itself (see the "Note" in the "Item types" section of the XQuery formal semantics.) At run time, the type annotation will be element(). Use treat as to guarantee to the type checker that this will be true. Now I wait on bated breath: does it work for you?
EXPLANATORY ADDENDUM: Assuming this works, here's why. node() is a union type. Actual items at run time are never annotated with node(). "An item type is either an atomic type, an element type, an attribute type, a document node type, a text node type, a comment node type, or a processing instruction type."1 Notice that node() is not in that list. Thus, your XQuery engine isn't complaining that an item has type node(); rather it's complaining that it doesn't know what the type is going to be (node() means it could end up being attribute(), element(), text(), comment(), processing-instruction(), or document-node()). Why does it have to know? Because you're telling it elsewhere that it's an element (in your function's signature). It's not enough to narrow it down to one of the above six possibilities. Static type checking means that you have to guarantee—at compile time—that the types will match up (element with element, in this case). treat as is used to narrow down the static type from a general type (node()) to a more specific type (element()). It doesn't change the dynamic type. cast as, on the other hand, is used to convert an item from one type to another, changing both the static and dynamic types (e.g., xs:string to xs:boolean). It makes sense that cast as can only be used with atomic values (and not nodes), because what would it mean to convert an attribute to an element (etc.)? And there's no such thing as converting a node() item to an element() item, because there's no such thing as a node() item. node() only exists as a static union type. Moral of the story? Avoid XQuery processors that use static type checking. (Sorry for the snarky conclusion; I feel I've earned the right. :-) )
NEW ANSWER BASED ON UPDATED INFORMATION: It sounds like static type checking is a red herring (a big fat one). I believe you are in fact not dealing with an element but a document node, which is the invisible root node that contains the top-level element (document element) in the XPath data model representation of a well-formed XML document.
The tree is thus modeled like this:
[document-node]
|
<docElement>
|
<subelement>
and not like this:
<docElement>
|
<subelement>
I had assumed you were passing the <docElement> node. But if I'm right, you were actually passing the document node (its parent). Since the document node is invisible, its serialization (what you copied and pasted) is indistinguishable from an element node, and the distinction was lost when you pasted what is now interpreted as a bare element constructor in your XQuery. (To construct a document node in XQuery, you have to wrap the element constructor with document{ ... }.)
The instance of test fails because the node is not an element but a document-node. (It's not a node() per se, because there's no such thing; see explanation above.)
Also, this would explain why data() returns empty when you tried to get the <subelement> child of the document node (after relaxing the function argument type to node()). The first tree representation above shows that <subelement> is not a child of the document node; thus it returns the empty sequence.
Now for the solution. Before passing the (document node) parameter, get its element child (the document element), by appending /* (or /element() which is equivalent) like this:
let $parameter := someNamespace:functionX()/*
return local:myFunction($parameter)
Alternatively, let your function take a document node and update the argument you pass to data():
declare function local:myFunction($arg1 as document-node()) as element() {
let $value := data($arg1/*/subelement)
etc...
};
Finally, it looks like the description of eXist's request:get-data() function is perfectly consistent with this explanation. It says: "If its not a binary document, we attempt to parse it as XML and return a document-node()." (emphasis added)
Thanks for the adventure. This turned out to be a common XPath gotcha (awareness of document nodes), but I learned a few things from our detour into static type checking.
This works perfectly using Saxon 9.3:
declare namespace my = "my:my";
declare namespace their = "their:their";
declare function my:fun($arg1 as element()) as element()
{
$arg1/a
};
declare function their:fun2($arg1 as node()) as node()
{
$arg1
};
my:fun(their:fun2(/*) )
when the code above is applied on the following XML document:
<t>
<a/>
</t>
the correct result is produced with no error messages:
<a/>
Update:
The following should work even with the most punctuential static type-checking XQuery implementation:
declare namespace my = "my:my";
declare namespace their = "their:their";
declare function my:fun($arg1 as element()) as element()
{
$arg1/a
};
declare function their:fun2($arg1 as node()) as node()
{
$arg1
};
let $vRes := their:fun2(/*)
(: this prevents our code from runtime crash :)
return if($vRes instance of element())
then
(: and this assures the static type-checker
that the type is element() :)
my:fun(their:fun2(/*) treat as element())
else()
node() is an element, attribute, processing instruction, text node, etc.
But data() converts the result to a string, which isn't any of those; it's a primitive type.
You might want to try item(), which should match either.
See 2.5.4.2 Matching an ItemType and an Item in the W3C XQuery spec.
Although it's not shown in your example code, I assume you are actually returning a value (like the $value you are working with) from the local:myFunction.
When using reference counting, what are possible solutions/techniques to deal with circular references?
The most well-known solution is using weak references, however many articles about the subject imply that there are other methods as well, but keep repeating the weak-referencing example. Which makes me wonder, what are these other methods?
I am not asking what are alternatives to reference counting, rather what are solutions to circular references when using reference counting.
This question isn't about any specific problem/implementation/language rather a general question.
I've looked at the problem a dozen different ways over the years, and the only solution I've found that works every time is to re-architect my solution to not use a circular reference.
Edit:
Can you expand? For example, how would you deal with a parent-child relation when the child needs to know about/access the parent? – OB OB
As I said, the only good solution is to avoid such constructs unless you are using a runtime that can deal with them safely.
That said, if you must have a tree / parent-child data structure where the child knows about the parent, you're going to have to implement your own, manually called teardown sequence (i.e. external to any destructors you might implement) that starts at the root (or at the branch you want to prune) and does a depth-first search of the tree to remove references from the leaves.
It gets complex and cumbersome, so IMO the only solution is to avoid it entirely.
Here is a solution I've seen:
Add a method to each object to tell it to release its references to the other objects, say call it Teardown().
Then you have to know who 'owns' each object, and the owner of an object must call Teardown() on it when they're done with it.
If there is a circular reference, say A <-> B, and C owns A, then when C's Teardown() is called, it calls A's Teardown, which calls Teardown on B, B then releases its reference to A, A then releases its reference to B (destroying B), and then C releases its reference to A (destroying A).
I guess another method, used by garbage collectors, is "mark and sweep":
Set a flag in every object instance
Traverse the graph of every instance that's reachable, clearing that flag
Every remaining instance which still has the flag set is unreachable, even if some of those instances have circular references to each other.
I'd like to suggest a slightly different method that occured to me, I don't know if it has any official name:
Objects by themeselves don't have a reference counter. Instead, groups of one or more objects have a single reference counter for the entire group, which defines the lifetime of all the objects in the group.
In a similiar fashion, references share groups with objects, or belong to a null group.
A reference to an object affects the reference count of the (object's) group only if it's (the reference) external to the group.
If two objects form a circular reference, they should be made a part of the same group. If two groups create a circular reference, they should be united into a single group.
Bigger groups allow more reference-freedom, but objects of the group have more potential of staying alive while not needed.
Put things into a hierarchy
Having weak references is one solution. The only other solution I know of is to avoid circular owning references all together. If you have shared pointers to objects, then this means semantically that you own that object in a shared manner. If you use shared pointers only in this way, then you can hardly get cyclic references. It does not occur very often that objects own each other in a cyclic manner, instead objects are usually connected through a hierarchical tree-like structure. This is the case I'll describe next.
Dealing with trees
If you have a tree with objects having a parent-child relationship, then the child does not need an owning reference to its parent, since the parent will outlive the child anyways. Hence a non-owning raw back pointer will do. This also applies to elements pointing to a container in which they are situated. The container should, if possible, use unique pointers or values instead of shared pointers anyways, if possible.
Emulating garbage collection
If you have a bunch of objects that can wildly point to each other and you want to clean up as soon as some objects are not reachable, then you might want to build a container for them and an array of root references in order to do garbage collection manually.
Use unique pointers, raw pointers and values
In the real world I have found that the actual use cases of shared pointers are very limited and they should be avoided in favor of unique pointers, raw pointers, or -- even better -- just value types. Shared pointers are usually used when you have multiple references pointing to a shared variable. Sharing causes friction and contention and should be avoided in the first place, if possible. Unique pointers and non-owning raw pointers and/or values are much easier to reason about. However, sometimes shared pointers are needed. Shared pointers are also used in order to extend the lifetime of an object. This does usually not lead to cyclic references.
Bottom line
Use shared pointers sparingly. Prefer unique pointers and non-owning raw pointers or plain values. Shared pointers indicate shared ownership. Use them in this way. Order your objects in a hierarchy. Child objects or objects on the same level in a hierarchy should not use owning shared references to each other or to their parent, but they should use non-owning raw pointers instead.
No one has mentioned that there is a whole class of algorithms that collect cycles, not by doing mark and sweep looking for non-collectable data, but only by scanning a smaller set of possibly circular data, detecting cycles in them and collecting them without a full sweep.
To add more detail, one idea for making a set of possible nodes for scanning would be ones whose reference count was decremented but which didn't go to zero on the decrement. Only nodes to which this has happened can be the point at which a loop was cut off from the root set.
Python has a collector that does that, as does PHP.
I'm still trying to get my head around the algorithm because there are advanced versions that claim to be able to do this in parallel without stopping the program...
In any case it's not simple, it requires multiple scans, an extra set of reference counters, and decrementing elements (in the extra counter) in a "trial" to see if the self referential data ends up being collectable.
Some papers:
"Down for the Count? Getting Reference Counting Back in the Ring" Rifat Shahriyar, Stephen M. Blackburn and Daniel Frampton
http://users.cecs.anu.edu.au/~steveb/downloads/pdf/rc-ismm-2012.pdf
"A Unified Theory of Garbage Collection" by David F. Bacon, Perry Cheng and V.T. Rajan
http://www.cs.virginia.edu/~cs415/reading/bacon-garbage.pdf
There are lots more topics in reference counting such as exotic ways of reducing or getting rid of interlocked instructions in reference counting. I can think of 3 ways, 2 of which have been written up.
I have always redesigned to avoid the issue. One of the common cases where this comes up is the parent child relationship where the child needs to know about the parent. There are 2 solutions to this
Convert the parent to a service, the parent then does not know about the children and the parent dies when there are no more children or the main program drops the parent reference.
If the parent must have access to the children, then have a register method on the parent which accepts a pointer that is not reference counted, such as an object pointer, and a corresponding unregister method. The child will need to call the register and unregister method. When the parent needs to access a child then it type casts the object pointer to the reference counted interface.
When using reference counting, what are possible solutions/techniques to deal with circular references?
Three solutions:
Augment naive reference counting with a cycle detector: counts decremented to non-zero values are considered to be potential sources of cycles and the heap topology around them is searched for cycles.
Augment naive reference counting with a conventional garbage collector like mark-sweep.
Constrain the language such that its programs can only ever produce acyclic (aka unidirectional) heaps. Erlang and Mathematica do this.
Replace references with dictionary lookups and then implement your own garbage collector that can collect cycles.
i too am looking for a good solution to the circularly reference counted problem.
i was stealing borrowing an API from World of Warcraft dealing with achievements. i was implicitely translating it into interfaces when i realized i had circular references.
Note: You can replace the word achievements with orders if you don't like achievements. But who doesn't like achievements?
There's the achievement itself:
IAchievement = interface(IUnknown)
function GetName: string;
function GetDescription: string;
function GetPoints: Integer;
function GetCompleted: Boolean;
function GetCriteriaCount: Integer;
function GetCriteria(Index: Integer): IAchievementCriteria;
end;
And then there's the list of criteria of the achievement:
IAchievementCriteria = interface(IUnknown)
function GetDescription: string;
function GetCompleted: Boolean;
function GetQuantity: Integer;
function GetRequiredQuantity: Integer;
end;
All achievements register themselves with a central IAchievementController:
IAchievementController = interface
{
procedure RegisterAchievement(Achievement: IAchievement);
procedure UnregisterAchievement(Achievement: IAchievement);
}
And the controller can then be used to get a list of all the achievements:
IAchievementController = interface
{
procedure RegisterAchievement(Achievement: IAchievement);
procedure UnregisterAchievement(Achievement: IAchievement);
function GetAchievementCount(): Integer;
function GetAchievement(Index: Integer): IAchievement;
}
The idea was going to be that as something interesting happened, the system would call the IAchievementController and notify them that something interesting happend:
IAchievementController = interface
{
...
procedure Notify(eventType: Integer; gParam: TGUID; nParam: Integer);
}
And when an event happens, the controller will iterate through each child and notify them of the event through their own Notify method:
IAchievement = interface(IUnknown)
function GetName: string;
...
function GetCriteriaCount: Integer;
function GetCriteria(Index: Integer): IAchievementCriteria;
procedure Notify(eventType: Integer; gParam: TGUID; nParam: Integer);
end;
If the Achievement object decides the event is something it would be interested in it will notify its child criteria:
IAchievementCriteria = interface(IUnknown)
function GetDescription: string;
...
procedure Notify(eventType: Integer; gParam: TGUID; nParam: Integer);
end;
Up until now the dependancy graph has always been top-down:
IAchievementController --> IAchievement --> IAchievementCriteria
But what happens when the achievement's criteria have been met? The Criteria object was going to have to notify its parent `Achievement:
IAchievementController --> IAchievement --> IAchievementCriteria
^ |
| |
+----------------------+
Meaning that the Criteria will need a reference to its parent; the who are now referencing each other - memory leak.
And when an achievement is finally completed, it is going to have to notify its parent controller, so it can update views:
IAchievementController --> IAchievement --> IAchievementCriteria
^ | ^ |
| | | |
+----------------------+ +----------------------+
Now the Controller and its child Achievements circularly reference each other - more memory leaks.
i thought that perhaps the Criteria object could instead notify the Controller, removing the reference to its parent. But we still have a circular reference, it just takes longer:
IAchievementController --> IAchievement --> IAchievementCriteria
^ | |
| | |
+<---------------------+ |
| |
+-------------------------------------------------+
The World of Warcraft solution
Now the World of Warcraft api is not object-oriented friendly. But it does solve any circular references:
Do not pass references to the Controller. Have a single, global, singleton, Controller class. That way an achievement doesn't have to reference the controller, just use it.
Cons: Makes testing, and mocking, impossible - because you have to have a known global variable.
An achievement doesn't know its list of criteria. If you want the Criteria for an Achievement you ask the Controller for them:
IAchievementController = interface(IUnknown)
function GetAchievementCriteriaCount(AchievementGUID: TGUID): Integer;
function GetAchievementCriteria(Index: Integer): IAchievementCriteria;
end;
Cons: An Achievement can no longer decide to pass notifications to it's Criteria, because it doesn't have any criteria. You now have to register Criteria with the Controller
When a Criteria is completed, it notifies the Controller, who notifies the Achievement:
IAchievementController-->IAchievement IAchievementCriteria
^ |
| |
+----------------------------------------------+
Cons: Makes my head hurt.
i'm sure a Teardown method is much more desirable that re-architecting an entire system into a horribly messy API.
But, like you wonder, perhaps there's a better way.
If you need to store the cyclic data, for a snapShot into a string,
I attach a cyclic boolean, to any object that may be cyclic.
Step 1:
When parsing the data to a JSON string, I push any object.is_cyclic that hasn't been used into an array and save the index to the string. (Any used objects are replaced with the existing index).
Step 2: I traverse the array of objects, setting any children.is_cyclic to the specified index, or pushing any new objects to the array. Then parsing the array into a JSON string.
NOTE: By pushing new cyclic objects to the end of the array, will force recursion until all cyclic references are removed..
Step 3: Last I parse both JSON strings into a single String;
Here is a javascript fiddle...
https://jsfiddle.net/7uondjhe/5/
function my_json(item) {
var parse_key = 'restore_reference',
stringify_key = 'is_cyclic';
var referenced_array = [];
var json_replacer = function(key,value) {
if(typeof value == 'object' && value[stringify_key]) {
var index = referenced_array.indexOf(value);
if(index == -1) {
index = referenced_array.length;
referenced_array.push(value);
};
return {
[parse_key]: index
}
}
return value;
}
var json_reviver = function(key, value) {
if(typeof value == 'object' && value[parse_key] >= 0) {
return referenced_array[value[parse_key]];
}
return value;
}
var unflatten_recursive = function(item, level) {
if(!level) level = 1;
for(var key in item) {
if(!item.hasOwnProperty(key)) continue;
var value = item[key];
if(typeof value !== 'object') continue;
if(level < 2 || !value.hasOwnProperty(parse_key)) {
unflatten_recursive(value, level+1);
continue;
}
var index = value[parse_key];
item[key] = referenced_array[index];
}
};
var flatten_recursive = function(item, level) {
if(!level) level = 1;
for(var key in item) {
if(!item.hasOwnProperty(key)) continue;
var value = item[key];
if(typeof value !== 'object') continue;
if(level < 2 || !value[stringify_key]) {
flatten_recursive(value, level+1);
continue;
}
var index = referenced_array.indexOf(value);
if(index == -1) (item[key] = {})[parse_key] = referenced_array.push(value)-1;
else (item[key] = {})[parse_key] = index;
}
};
return {
clone: function(){
return JSON.parse(JSON.stringify(item,json_replacer),json_reviver)
},
parse: function() {
var object_of_json_strings = JSON.parse(item);
referenced_array = JSON.parse(object_of_json_strings.references);
unflatten_recursive(referenced_array);
return JSON.parse(object_of_json_strings.data,json_reviver);
},
stringify: function() {
var data = JSON.stringify(item,json_replacer);
flatten_recursive(referenced_array);
return JSON.stringify({
data: data,
references: JSON.stringify(referenced_array)
});
}
}
}
Here are some techniques described in Algorithms for Dynamic Memory Management by R. Jones and R. Lins. Both suggest that you can look at the cyclic structures as a whole in some way or another.
Friedman and Wise
The first way is one suggested by Friedman and Wise. It is for handling cyclic references when implementing recursive calls in functional programming languages. They suggest that you can observe the cycle as a single entity with a root object. To do that, you should be able to:
Create the cyclic structure all at once
Access the cyclic structure only through a single node - a root, and mark which reference closes the cycle
If you need to reuse just a part of the cyclic structure, you shouldn't add external references but instead make copies of what you need
That way you should be able to count the structure as a single entity and whenever the RC to the root falls to zero - collect it.
This should be the paper for it if anyone is interested in more details - https://www.academia.edu/49293107/Reference_counting_can_manage_the_circular_invironments_of_mutual_recursion
Bobrow's technique
Here you could have more than one reference to the cyclic structure. We just distinguish between internal and external references for it.
Overall, all allocated objects are assigned by the programmer to a group. And all groups are reference counted. We distinguish between internal and external references for the group and of course - objects can be moved between groups. In this case, intra-group cycles are easily reclaimed whenever there are no more external references to the group. However, inter-group cyclic structures should still be an issue.
If I'm not mistaken, this should be the original paper - https://dl.acm.org/doi/pdf/10.1145/357103.357104
I'm in the search for other general purpose alternatives besides those algorithms and using weak pointers.
Y Combinator
I wrote a programming language once in which every object was immutable. As such, an object could only contain pointers to objects that were created earlier. Since all reference pointed backwards in time, there could not possibly be any cyclic references, and reference counting was a perfectly viable way of managing memory.
The question then is, "How do you create self-referencing data structures without circular references?" Well, functional programming has been doing this trick forever using the fixed-point/Y combinator to write recursive functions when you can only refer to previously-written functions. See: https://en.wikipedia.org/wiki/Fixed-point_combinator
That Wikipedia page is complicated, but the concept is really not that complicated. How it works in terms of data structures is like this:
Let's say you want to make a Map<String,Object>, where the values in the map can refer to other values in the map. Well, instead of actually storing those objects, you store functions that generate those objects on-demand given a pointer to the map.
So you might say object = map.get(key), and what this will do is call a private method like _getDefinition(key) that returns this function and calls it:
Object get(String key) {
var definition = _getDefinition(key);
return definition == null ? null : definition(this);
}
So the map has a reference to the function that defines the value. This function doesn't need to have a reference to the map, because it will be passed the map as an argument.
When the definition called, it returns a new object that does have a pointer to the map in it somewhere, but the map it self does not have a pointer to this object, so there are no circular references.
There are couple of ways I know of for walking around this:
The first (and preferred one) is simply extracting the common code into third assembly, and make both references use that one
The second one is adding the reference as "File reference" (dll) instead of "Project reference"
Hope this helps
Should you be allowed to delete an item from the collection you are currently iterating in a foreach loop?
If so, what should be the correct behavior?
I can take quite a sophisticated Collection to support enumerators that track changes to the collection to keep position info correct. Even if it does some compromisation or assumptions need to be made. For that reason most libraries simply outlaw such a thing or mutter about unexpected behaviour in their docs.
Hence the safest approach is to loop. Collect references to things that need deleting and subsequently use the collected references to delete items from the original collection.
It really depends on the language. Some just hammer through an array and explode when you change that array. Some use arrays and don't explode. Some call iterators (which are wholly more robust) and carry on just fine.
Generally, modifying a collection in a foreach loop is a bad idea, because your intention is unknown to the program. Did you mean to loop through all items before the change, or do you want it to just go with the new configuration? What about the items that have already been looped through?
Instead, if you want to modify the collection, either make a predefined list of items to loop through, or use indexed looping.
Some collections such as hash tables and dictionaries have no notion of "position" and the order of iteration is generally not guaranteed. Therefore it would be quite difficult to allow deletion of items while iterating.
You have to understand the concept of the foreach first, and actually it depends on the programming language. But as a general answer you should avoid changing your collections inside foreach
Just use a standard for loop, iterate through the item collection backwards and you should have no problem deleting items as you go.
iterate in reverse direction and delete item one by one... That should proper solution.
No, you should not. The correct behaviour should be to signal that a potential concurrency problem has been encountered, however that is done in your language of choice (throw exception, return error code, raise() a signal).
If you modify a data structure while iterating over its elements, the iterator might no longer be valid, which means that you risk working on objects that are no longer part of the collection. If you want to filter elements based on some more complex notation, you could do something like this (in Java):
List<T> toFilter = ...;
List<T> shadow;
for ( T element : toFilter )
if ( keep(element) )
shadow.add(element);
/* If you'll work with toFilter in the same context as the filter */
toFilter = shadow;
/* Alternatively, if you want to modify toFilter in place, for instance if it's
* been given as a method parameter
*/
toFilter.clear();
toFilter.addAll(shadow);
The best way to remove an item from a collection you are iterating over it to use the iterator explitly. For example.
List<String> myList = ArrayList<String>();
Iterator<String> myIt = myList.iterator();
while (myIt.hasNext()) {
myIt.remove();
}