Grakn rule that matches on a super-relation failing to find matching instance of its subrelation - vaticle-typedb

I have a Grakn schema fragment for modelling an in-game 'research tree' that looks like this:
# Entities
research-project sub entity,
has name,
plays research-task,
plays required-tech,
plays research-to-begin;
campaign sub entity,
has name,
plays campaign-with-tasks;
# Relations
requirement sub relation,
abstract,
relates prerequisite,
relates outcome,
plays task;
tech-requirement sub requirement,
abstract,
relates required-tech as prerequisite,
relates outcome;
tech-requirement-to-begin-task sub tech-requirement,
abstract,
relates required-tech as prerequisite,
relates task-to-begin as outcome;
tech-requirement-to-begin-research sub tech-requirement-to-begin-task,
relates required-tech as prerequisite,
relates research-to-begin as outcome;
campaign-task sub relation,
abstract,
relates campaign-with-tasks,
relates task,
has started,
has progress,
has can-proceed,
has completed;
campaign-research-task sub campaign-task,
relates campaign-with-tasks,
relates research-task as task;
The property can-proceed of campaign-task is governed by a set of rules, one of which is the following:
when {
$campaign isa campaign;
$campaign_task($campaign, task: $task) isa campaign-task, has completed false;
not {
(task-to-begin: $task, required-tech: $prerequisite) isa tech-requirement-to-begin-task;
($campaign, $prerequisite) isa campaign-task, has completed false;
};
}, then {
$campaign_task has can-proceed true;
};
Each prerequisite is modelled using a tech-requirement-to-begin-research relation, which is a subrelation of the abstract tech-requirement-to-begin-task.
Expected outcome: it should mark a campaign-task as having can-proceed = true only if it has NO prerequisite campaign-tasks that are not completed.
Actual outcome: it does not detect the tech-requirement-to-begin-research instances, and simply marks all campaign-tasks with can-proceed = true, even if they have unfulfilled prerequisites.
If I change the rule by replacing task-to-begin with research-to-begin, AND tech-requirement-to-begin-task with tech-requirement-to-begin-research, then the rule correctly applies only to tasks that have no unmet prerequisites. However, doing so loses generality, and will require rewriting the rule for every subrelation of tech-requirement-to-begin-task, resulting in a lot of code duplication.
What is the expected behaviour in this case and what is the best way to resolve the issue?

I believe that your expected outcome is correct - the rule should apply the negation over tech-requirement-to-begin-task and its subtypes (including tech-requirement-to-begin-research). Therefore, the best way to resolve this is to open an issue on the Grakn repo and have this behaviour fixed in Grakn's reasoner!


Can Code be Protected From Rogue Callers In Ada?

I'm a fairly new Ada programmer. I have read the book by Barnes (twice I might add) and even managed to write a fair terminal program in Ada. My main language is C++ though.
I am currently wondering if there is a way to "protect" subroutine calls in Ada, perhaps in Ada 2012 (of which I know basically nothing). Let me explain what I mean (although in C++ terms).
Suppose you have a class Secret like this:
class Secret
{
private:
int secret_int;
public:
void Set_Secret_Value( int i );
};
Now this is the usual stuff: don't expose secret_int, manipulate it only through access functions. However, the problem is that anybody with access to an object of type Secret can manipulate the value, whether that particular code section is supposed to do it or not. So the danger of rogue altering of secret_int has only been reduced to anybody altering secret_int through the permitted functions, even if it happens in a code section that's not supposed to manipulate it.
To remedy that I came up with the following construct
class Secret
{
friend class Secret_Interface;
private:
int secret_int;
void Set_Secret_Value( int i );
void Super_Secret_Function();
};
class Secret_Interface
{
friend class Client;
private:
static void Set_Secret_Value( Secret &rc_secret_object, int i )
{
rc_secret_object.Set_Secret_Value( i );
}
};
class Client
{
void Some_Function()
{
...
Secret_Interface::Set_Secret_Value( c_object, some_value );
...
}
};
Now the class Secret_Interface can determine which other classes can use its private functions and by doing so, indirectly, the functions of class Secret that are exposed to Secret_Interface. This way class Secret still has private functions that cannot be called by anybody outside the class, for instance function Super_Secret_Function().
Well I was wondering if anything of this sort is possible in Ada. Basically my desire is to be able to say:
Code A may only be executed by code B but not by anybody else
Thanks for any help.
Edit:
I have added a diagram here showing the kind of program structure I have in mind. What I mean is the transport of a data structure across a wide area of the software: definition, creation, and use of a record should happen in code sections that are otherwise unrelated.
I think the key is to realize that, unlike C++ and other languages, Ada's primary top-level unit is the package, and visibility control (i.e. public vs. private) is on a per-package basis, not a per-type (or per-class) basis. I'm not sure I'm saying that correctly, but hopefully things will be explained below.
One of the main purposes of friend in C++ is so that you can write two (or more) closely related classes that both take part in implementing one concept. In that case, it makes sense that the code in one class would be able to have more direct access to the code in another class, since they're working together. I assume that in your C++ example, Secret and Client have that kind of close relationship. If I understand C++ correctly, they do all have to be defined in the same source file; if you say friend class Client, then the Client class has to be defined somewhere later in the same source file (and it can't be defined earlier, because at that point the methods in Secret or Secret_Interface haven't yet been declared).
In Ada, you can simply define the types in the same package.
package P is
type Secret is tagged private;
type Client is tagged private;
-- define public operations for both types
private
type Secret is tagged record ... end record;
type Client is tagged record ... end record;
-- define private operations for either or both types
end P;
Now, the body of P will contain the actual code for the public and private operations of both types. All code in the package body of P has access to those things defined in P's private part, regardless of which type they operate on. And, in fact, all code has access to the full definitions of both types. This means that a procedure that operates on a Client can call a private operation that operates on a Secret, and in fact it can read and write a Secret's record components directly. (And vice versa.) This may seem bizarre to programmers used to the class paradigm used by most other OOP languages, but it works fine in Ada. (In fact, if you don't need Secret to be accessible to anything else besides the implementation of Client, the type and its operations can be defined in the private part of P, or the package body.) This arrangement doesn't violate the principles behind OOP (encapsulation, information hiding), as long as the two types are truly two pieces of the implementation of one coherent concept.
If that isn't what you want, i.e. if Secret and Client aren't that closely related, then I would need to see a larger example to find out just what kind of use case you're trying to implement.
MORE THOUGHTS: After looking over your diagram, I think that the way you're trying to solve the problem is inferior design--an anti-pattern, if you will. When you write a "module" (whatever that means--a class or package, or in some cases two or more closely related classes or packages cooperating with each other), the module defines how other modules may use it--what public operations it provides on its objects, and what those operations do.
But the module (let's call it M1) should work the same way, according to its contract, regardless of what other module calls it, and how. M1 will get a sequence of "messages" instructing it to perform certain tasks or return certain information; M1 should not care where those messages are coming from. In particular, M1 should not be making decisions about the structure of the clients that use it. By having M1 decree that "procedure XYZ can only be called from package ABC", M1 is imposing structural requirements on the clients that use it. This, I believe, causes M1 to be too tightly coupled to the rest of the program. It is not good design.
However, it may make sense for the module that uses M1 to exercise some sort of control like that, internally. Suppose we have a "module" M2 that actually uses a number of packages as part of its implementation. The "main" package in M2 (the one that clients of M2 use to get M2 to perform its task) uses M1 to create a new object, and then passes that object to several other packages that do the work. It seems like a reasonable design goal to find a way that M2 could pass that object to some packages or subprograms without giving them the ability to, say, update the object, but pass it to other packages or subprograms that would have that ability.
There are some solutions that would protect against most accidents. For example:
package M1 is
type Secret is tagged private;
procedure Harmless_Operation (X : in out Secret);
type Secret_With_Updater is new Secret with null record;
procedure Dangerous_Operation (X : in out Secret_With_Updater);
end M1;
Now, the packages that could take a "Secret" object but should not have the ability to update it would have procedures defined with Secret'Class parameters. M2 would create a Secret_With_Updater object; since this object type is in Secret'Class, it could be passed as a parameter to procedures with Secret'Class parameters. However, those procedures would not be able to call Dangerous_Operation on their parameters; that would not compile.
A package with a Secret'Class parameter could still call the dangerous operation with a type conversion:
procedure P (X : in out Secret'Class) is
begin
-- ...
M1.Secret_With_Updater(X).Dangerous_Operation;
-- ...
end P;
The language can't prevent this, because it can't make Secret_With_Updater visible to some packages but not others (without using a child package hierarchy). But it would be harder to do this accidentally. If you really wish to go further and prevent even this (if you think there will be a programmer whose understanding of good design principles is so poor that they'd be willing to write code like this), then you could go a little further:
package M1 is
type Secret is tagged private;
procedure Harmless_Operation (X : in out Secret);
type Secret_Acc is access all Secret;
type Secret_With_Updater is tagged private;
function Get_Secret (X : Secret_With_Updater) return Secret_Acc;
-- this will be "return X.S"
procedure Dangerous_Operation (X : in out Secret_With_Updater);
private
-- ...
type Secret_With_Updater is tagged record
S : Secret_Acc;
end record;
-- ...
end M1;
Then, to create a Secret, M2 would call something that creates a Secret_With_Updater that returns a record with an access to a Secret. It would then pass X.Get_Secret to those procedures which would not be allowed to call Dangerous_Operation, but X itself to those that would be allowed. (You might also be able to declare S : aliased Secret, declare Get_Secret to return access Secret, and implement it with return X.S'access. This may avoid a potential memory leak, but it may also run into accessibility-check issues. I haven't tried this.)
Anyway, perhaps some of these ideas could help accomplish what you want to accomplish without introducing unnecessary coupling by forcing M1 to know about the structure of the application that uses it. It's hard to tell because your description of the problem, even with the diagram, is still at too abstract a level for me to see what you really want to do.
You could do this by using child packages:
package Hidden is
private
A : Integer;
B : Integer;
end Hidden;
and then
package Hidden.Client_A_View is
function Get_A return Integer;
procedure Set_A (To : Integer);
end Hidden.Client_A_View;
Then, Client_A can write
with Hidden.Client_A_View;
procedure Client_A is
Tmp : Integer;
begin
Tmp := Hidden.Client_A_View.Get_A;
Hidden.Client_A_View.Set_A (Tmp + 1);
end Client_A;
Your question is extremely unclear (and all the C++ code doesn't help explaining what you need), but if your point is that you want a type to have some publicly accessible operations, and some private operations, then it is easily done:
package Example is
type Instance is private;
procedure Public_Operation (Item : in out Instance);
private
procedure Private_Operation (Item : in out Instance);
type Instance is ... -- whatever you need it to be
end Example;
The procedure Example.Private_Operation is accessible to children of Example. If you want an operation to be purely internal, you declare it only in the package body:
package body Example is
procedure Internal_Operation (Item : in out Instance);
...
end Example;
Well I was wondering if anything of this sort is possible in Ada. Basically my desire is to be able to say:
Code A may only be executed by code B but not by anybody else
If limited to language features, no.
Programmatically, code execution can be protected if the provider must be provided an approved "key" to allow execution of its services, and only authorized clients are supplied with such keys.
Devising the nature, generation, and security of such keys is left as an exercise for the reader.

Dictionary Behaves Strangely During Databinding

I was trying to do a little data access optimization, and I ran into a situation where a dictionary appeared to get out of sync in a way that should be impossible, unless I'm somehow getting into a multithreaded situation without knowing it.
One column of GridLabels binds to a property that does data access -- which is a tad expensive. However, multiple rows end up making the same call, so I should be able to head any problems off at the pass by doing a little caching.
However, elsewhere in the app, this same code is called in ways where caching would not be appropriate, I needed a way to enable caching on demand. So my databinding code looks like this:
OrderLabelAPI.MultiSyringeCacheEnabled = True
Me.GridLabels.DataBind()
OrderLabelAPI.MultiSyringeCacheEnabled = False
And the expensive call where the caching happens looks like this:
Private Shared MultiSyringeCache As New Dictionary(Of Integer, Boolean)
Private Shared m_MultiSyringeCacheEnabled As Boolean = False
Public Shared Function IsMultiSyringe(orderLabelID As Integer) As Boolean
If m_MultiSyringeCacheEnabled Then
'Since this can get hit a lot, we cache the values into a dictionary. Obviously,
'it goes away after each request. And the cache is disabled by default.
If Not MultiSyringeCache.ContainsKey(orderLabelID) Then
MultiSyringeCache.Add(orderLabelID, DoIsMultiSyringe(orderLabelID))
End If
Return MultiSyringeCache(orderLabelID)
Else
Return DoIsMultiSyringe(orderLabelID)
End If
End Function
And here is the MultiSyringeCacheEnabled property:
Public Shared Property MultiSyringeCacheEnabled As Boolean
Get
Return m_MultiSyringeCacheEnabled
End Get
Set(value As Boolean)
ClearMultiSyringeCache()
m_MultiSyringeCacheEnabled = value
End Set
End Property
Very, very rarely (too rarely to reproduce) I will get the following exception: The given key was not present in the dictionary.
If you look closely at the caching code, that's impossible since the first thing it does is ensure that the key exists. If DoIsMultiSyringe tampered with the dictionary (either explicitly or by setting MultiSyringeCacheEnabled), that could also cause problems, and for a while I assumed this had to be the culprit. But it isn't. I've been over the code very carefully several times. I would post it here but it gets into a deeper object graph than would be appropriate.
So. My question is, does datagridview databinding actually get into some kind of zany multithreaded situation that is causing the dictionary to seize? Am I missing some aspect of shared members?
I've actually gone ahead and yanked this code from the project, but I want to understand what I'm missing. Thanks!
Since this is ASP.NET, you have an implicit multithreaded scenario. You are using a shared variable (see What is the use of a shared variable in VB.NET?), which is (as the keyword implies) "shared" across multiple threads (from different people visiting the site).
You can very easily have a scenario where one visitor's thread gets to here:
'Since this can get hit a lot, we cache the values into a dictionary. Obviously,
'it goes away after each request. And the cache is disabled by default.
If Not MultiSyringeCache.ContainsKey(orderLabelID) Then
MultiSyringeCache.Add(orderLabelID, DoIsMultiSyringe(orderLabelID))
End If
' My thread is right here, when you visit the site
Return MultiSyringeCache(orderLabelID)
and then your thread comes in here and supersedes my thread:
Set(value As Boolean)
ClearMultiSyringeCache()
m_MultiSyringeCacheEnabled = value
End Set
Then my thread is going to try to read a value from the dictionary after you've cleared it.
That said, I am not sure what performance benefit you expect from a "cache" that you clear with every request. It looks like you should simply not make this variable shared: make it an instance variable, and each user request accessing it will have its own copy.
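To illustrate the instance-variable suggestion, here is a minimal sketch in Python (used only for brevity; the original code is VB.NET). `LabelService`, `handle_request`, and the lambda standing in for DoIsMultiSyringe are hypothetical names invented for this example. Each simulated request builds its own service object, so no other thread can clear or mutate its cache between the "contains" check and the read.

```python
import threading


class LabelService:
    """Hypothetical per-request object: the cache lives on the instance,
    so it is never visible to another request's thread."""

    def __init__(self, lookup):
        self._lookup = lookup   # stand-in for the expensive DoIsMultiSyringe call
        self._cache = {}        # instance-level, not shared/static
        self.calls = 0          # how many times the expensive lookup actually ran

    def is_multi_syringe(self, order_label_id):
        if order_label_id not in self._cache:
            self.calls += 1
            self._cache[order_label_id] = self._lookup(order_label_id)
        return self._cache[order_label_id]


def handle_request(label_ids, results, index):
    # One service (and therefore one cache) per simulated request.
    service = LabelService(lambda label_id: label_id % 2 == 0)
    answers = [service.is_multi_syringe(i) for i in label_ids]
    results[index] = (answers, service.calls)


results = [None, None]
threads = [
    threading.Thread(target=handle_request, args=([1, 2, 2, 1], results, n))
    for n in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each request still gets the benefit of caching repeated rows (two lookups for four rows here), and the cache simply disappears with the object when the request ends, so there is nothing to enable, disable, or clear.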

Why would I unit test the business layer in MVP?

I created the following sample method in the business logic layer. My database doesn't allow nulls for the name and parent columns:
public void Insert(string catName, long catParent)
{
EntityContext con = new EntityContext();
Category cat = new Category();
cat.Name = catName;
cat.Parent = catParent;
con.Category.AddObject(cat);
con.SaveChanges();
}
So I unit test this, and the tests for an empty name and an empty parent fail. To get around that issue I have to refactor the Insert method as follows:
// catParent is taken as a string here so that long.TryParse has something to validate
public void Insert(string catName, string catParent)
{
//added to pass the test
if(string.IsNullOrEmpty(catName)) throw new InvalidOperationException("wrong action. name is empty.");
long parent;
if(long.TryParse(catParent, out parent) == false) throw new InvalidOperationException("wrong action. parent didn't parse.");
//real business logic
EntityContext con = new EntityContext();
Category cat = new Category();
cat.Name = catName;
cat.Parent = parent;
con.Category.AddObject(cat);
con.SaveChanges();
}
My entire business layer consists of simple calls to the database, so now I'm validating the data again! I had already planned to do my validation in the UI and to test that kind of thing in the UI unit tests. What should I test in my business logic methods other than validation-related tasks? And if there is nothing to unit test, why does everybody say "unit test all the layers" and similar things, which I find a lot online?
The idea behind testing is that you break your program down into smaller parts (smaller components or even classes) and test those small parts. As you assemble those parts, you write less comprehensive tests, because the smaller parts are already proven to work, until you have a functional, tested program, which you then give to users for "user tests".
It's preferable to test smaller parts because:
It's simpler to write the tests. You need less data, you only set up one object, and you have to inject fewer dependencies.
It's easier to figure out what to test. You know the failing conditions from a simple reading of the code (or, better yet, from the technical specification).
Now, how can you guarantee that your business layer, simple as it is, is correctly implemented? Even a simple database insert can fail if badly written. Besides, how can you protect yourself from changes? Right now the code works, but what will happen in the future if the database is changed or someone updates the business logic?
However, and this is important, you don't actually need to test everything. Use your intuition (which is also called experience) to understand what needs testing and what doesn't. If your method is simple enough, just make sure the client code is correctly tested.
Finally, you've said that all your validation will occur in the UI. The business layer should be able to validate the data itself, so that it can be reused across your application. If it doesn't, then whoever changes your code in the future might create a new UI and forget to add the required validations.
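As a rough illustration of what a business-layer unit test can look like without touching a real database, here is a small Python sketch (Python is used only for brevity; the question's code is C#). `insert_category`, `ValidationError`, and `FakeCategoryStore` are hypothetical names made up for this example, and the fake store stands in for the EntityContext.

```python
class ValidationError(Exception):
    """Raised by the business layer when the input is not acceptable."""


class FakeCategoryStore:
    """In-memory stand-in for the database, so tests need no real connection."""

    def __init__(self):
        self.rows = []

    def add(self, name, parent_id):
        self.rows.append((name, parent_id))


def insert_category(store, name, parent):
    # Validation lives in the business layer, so every UI that calls it is covered.
    if not name or not name.strip():
        raise ValidationError("name is empty")
    try:
        parent_id = int(parent)
    except (TypeError, ValueError):
        raise ValidationError("parent is not a number") from None
    store.add(name.strip(), parent_id)
    return parent_id


def expect_rejected(store, name, parent):
    """Tiny test helper: True if the call raised ValidationError."""
    try:
        insert_category(store, name, parent)
    except ValidationError:
        return True
    return False
```

A test then checks both the happy path (the row reaches the store) and the failure paths (nothing is stored when the input is rejected), which is exactly the behaviour a second UI would silently depend on.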

StateFinalization/StateInitialization activity only runs on leaf states

I am trying to get my Windows State Machine workflow to communicate with end users. The general pattern I am trying to implement within a StateActivity is:
StateInitializationActivity: Send a message to user requesting an answer to a question (e.g. "Do you approve this document?"), together with the context for...
...EventDrivenActivity: Deal with answer sent by user
StateFinalizationActivity: Cancel message (e.g. document is withdrawn and no longer needs approval)
This all works fine if the StateActivity is a "Leaf State" (i.e. has no child states). However, it does not work if I want to use recursive composition of states. For non-leaf states, StateInitialization and StateFinalization do not run (I confirmed this behaviour by using Reflector to inspect the StateActivity source code). The EventDrivenActivity is still listening, but the end user doesn't know what's going on.
For StateInitialization, I thought that one way to work around this would be to replace it with an EventDrivenActivity and a zero-delay timer. I'm stuck with what to do about StateFinalization.
So - does anyone have any ideas about how to get a State Finalization Activity to always run, even for non-leaf states?
It's unfortunate that the structure of "nested states" is presented as a "parent" containing "children", and that the designer UI reinforces this concept. Hence it's quite natural and intuitive to think the way you are thinking. It's unfortunate because it's wrong.
The true relationship is one of "General" -> "Specific". It is in effect a hierarchical class structure. Consider a much more familiar relationship of this kind:
public class MySuperClass
{
public MySuperClass(object parameter) { }
protected void DoSomething() { }
}
public class MySubClass : MySuperClass
{
protected void DoSomethingElse() { }
}
Here MySubClass inherits DoSomething from MySuperClass. The above is broken, though, because MySuperClass doesn't have a default constructor. Also, the parameterised constructor of MySuperClass is not inherited by MySubClass. In fact, logically a sub-class never inherits the constructors (or destructors) of the super-class. (Yes, there is some magic wiring up default constructors, but that's more sugar than substance.)
Similarly, the relationship between StateActivities contained within another StateActivity is actually that the contained activity is a specialisation of the container. Each contained activity inherits the set of event-driven activities of the container. However, each contained StateActivity is a first-class discrete state in the workflow, the same as any other state.
The containing activity actually becomes abstract: it cannot be transitioned to and, importantly, there is no real concept of transitioning to a state "inside" another state. By extension, there is no concept of leaving such an outer state either. As a result there is no initialization or finalization of the containing StateActivity.
A quirk of the designer allows you to add a StateInitialization and StateFinalization then add StateActivities to a state. If you try it the other way round the designer won't let you because it knows the Initialization and Finalization will never be run.
I realise this doesn't actually answer your question and I'm loath to say in this case "It can't be done" but if it can it will be a little hacky.
OK, so here’s what I decided to do in the end. I created a custom tracking service which looks for activity events corresponding to entering or leaving the states which are involved in communication with end users. This service enters decisions for the user into a database when the state is entered and removes them when the state is left. The user can query the database to see what decisions the workflow is waiting on. The workflow listens for user responses using a ReceiveActivity in an EventDrivenActivity. This also works for decisions in parent ‘superstates’. This might not be exactly what a "Tracking Service" is meant for, but it seems to work.
I've thought of another way of solving the problem. Originally, I had in mind that for communications I would use the WCF-integrated SendActivity and ReceiveActivity provided in WF 3.5.
However, in the end I came to the conclusion that it's easier to ignore these activities and implement your own IEventActivity with a local service. IEventActivity.Subscribe can be used to indicate to users that there is a question for them to answer and IEventActivity.Unsubscribe can be used to cancel the question. This means that separate activities in the State's initialization and finalization blocks are not required. The message routing is done manually using workflow queues, and the user's response is added to the queue with the appropriate name. I used GUIDs for the queue names, and these are passed to the user during the IEventActivity.Subscribe call.
I used the 'File System Watcher' example in MSDN to work out how to do this.
I also found this article very instructive: http://www.infoq.com/articles/lublinksy-workqueue-mgr

What solutions are there for circular references?

When using reference counting, what are possible solutions/techniques to deal with circular references?
The most well-known solution is using weak references, however many articles about the subject imply that there are other methods as well, but keep repeating the weak-referencing example. Which makes me wonder, what are these other methods?
I am not asking what are alternatives to reference counting, rather what are solutions to circular references when using reference counting.
This question isn't about any specific problem/implementation/language rather a general question.
I've looked at the problem a dozen different ways over the years, and the only solution I've found that works every time is to re-architect my solution to not use a circular reference.
Edit:
Can you expand? For example, how would you deal with a parent-child relation when the child needs to know about/access the parent? – OB OB
As I said, the only good solution is to avoid such constructs unless you are using a runtime that can deal with them safely.
That said, if you must have a tree / parent-child data structure where the child knows about the parent, you're going to have to implement your own, manually called teardown sequence (i.e. external to any destructors you might implement) that starts at the root (or at the branch you want to prune) and does a depth-first search of the tree to remove references from the leaves.
It gets complex and cumbersome, so IMO the only solution is to avoid it entirely.
Here is a solution I've seen:
Add a method to each object to tell it to release its references to the other objects, say call it Teardown().
Then you have to know who 'owns' each object, and the owner of an object must call Teardown() on it when they're done with it.
If there is a circular reference, say A <-> B, and C owns A, then when C's Teardown() is called, it calls A's Teardown, which calls Teardown on B, B then releases its reference to A, A then releases its reference to B (destroying B), and then C releases its reference to A (destroying A).
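A minimal sketch of that sequence, in Python and assuming CPython's reference counting (the cycle collector is switched off for the demo so that only reference counts decide when objects die). The `Node` class and `demo` function are hypothetical names for illustration; weak references are used purely to observe whether A and B were destroyed.

```python
import gc
import weakref


class Node:
    """Object holding strong references to peers; teardown() drops them."""

    def __init__(self, name):
        self.name = name
        self.refs = []

    def teardown(self):
        # Detach first, so a cycle leading back to this node does not recurse forever.
        peers, self.refs = self.refs, []
        for peer in peers:
            peer.teardown()


def demo():
    was_enabled = gc.isenabled()
    gc.disable()  # rely on reference counting alone (CPython assumption)
    try:
        a, b = Node("A"), Node("B")
        a.refs.append(b)  # A -> B
        b.refs.append(a)  # B -> A, completing the cycle
        watch_a, watch_b = weakref.ref(a), weakref.ref(b)

        owner = Node("C")
        owner.refs.append(a)  # C owns A
        del a, b              # from here on only C and the cycle keep A and B alive

        owner.teardown()      # C tears down A, which tears down B
        return watch_a() is None, watch_b() is None
    finally:
        if was_enabled:
            gc.enable()
```

Without the `owner.teardown()` call, A and B would keep each other alive under pure reference counting even after C lets go of A, which is the leak the Teardown() convention is meant to prevent.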
I guess another method, used by garbage collectors, is "mark and sweep":
Set a flag in every object instance
Traverse the graph of every instance that's reachable, clearing that flag
Every remaining instance which still has the flag set is unreachable, even if some of those instances have circular references to each other.
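Those three steps can be sketched in a few lines of Python. The heap here is modelled as a plain dictionary from object id to the ids it references, which is an assumption made for the example rather than how any particular collector stores its objects; `find_unreachable` is a hypothetical name.

```python
def find_unreachable(heap, roots):
    """heap maps each object id to the ids it references;
    roots are the ids the program itself still holds."""
    flagged = set(heap)           # step 1: set the flag on every instance
    stack = list(roots)
    while stack:                  # step 2: traverse everything reachable
        obj = stack.pop()
        if obj in flagged:
            flagged.discard(obj)  # clear the flag on the first visit
            stack.extend(heap[obj])
    return flagged                # step 3: still flagged means unreachable


# "a" and "b" form a cycle that is reachable from the root;
# "x" and "y" form a cycle that nothing reachable points to.
heap = {"root": ["a"], "a": ["b"], "b": ["a"], "x": ["y"], "y": ["x"]}
unreachable = find_unreachable(heap, ["root"])
```

Here `unreachable` ends up containing only "x" and "y": both cycles look identical to a reference counter, but only the one that cannot be reached from a root is garbage.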
I'd like to suggest a slightly different method that occurred to me; I don't know if it has an official name:
Objects by themselves don't have a reference counter. Instead, groups of one or more objects have a single reference counter for the entire group, which defines the lifetime of all the objects in the group.
In a similar fashion, references share groups with objects, or belong to a null group.
A reference to an object affects the reference count of the (object's) group only if it's (the reference) external to the group.
If two objects form a circular reference, they should be made a part of the same group. If two groups create a circular reference, they should be united into a single group.
Bigger groups allow more reference-freedom, but objects of the group have more potential of staying alive while not needed.
Put things into a hierarchy
Having weak references is one solution. The only other solution I know of is to avoid circular owning references altogether. If you have shared pointers to objects, then this means semantically that you own that object in a shared manner. If you use shared pointers only in this way, then you can hardly get cyclic references. It does not occur very often that objects own each other in a cyclic manner; instead, objects are usually connected through a hierarchical tree-like structure. This is the case I'll describe next.
Dealing with trees
If you have a tree with objects having a parent-child relationship, then the child does not need an owning reference to its parent, since the parent will outlive the child anyway. Hence a non-owning raw back pointer will do. This also applies to elements pointing to a container in which they are situated. The container should, if possible, use unique pointers or values instead of shared pointers.
Emulating garbage collection
If you have a bunch of objects that can wildly point to each other and you want to clean up as soon as some objects are not reachable, then you might want to build a container for them and an array of root references in order to do garbage collection manually.
Use unique pointers, raw pointers and values
In the real world I have found that the actual use cases of shared pointers are very limited and they should be avoided in favor of unique pointers, raw pointers, or -- even better -- just value types. Shared pointers are usually used when you have multiple references pointing to a shared variable. Sharing causes friction and contention and should be avoided in the first place, if possible. Unique pointers and non-owning raw pointers and/or values are much easier to reason about. However, sometimes shared pointers are needed. Shared pointers are also used in order to extend the lifetime of an object. This does usually not lead to cyclic references.
Bottom line
Use shared pointers sparingly. Prefer unique pointers and non-owning raw pointers or plain values. Shared pointers indicate shared ownership. Use them in this way. Order your objects in a hierarchy. Child objects or objects on the same level in a hierarchy should not use owning shared references to each other or to their parent, but they should use non-owning raw pointers instead.
No one has mentioned that there is a whole class of algorithms that collect cycles, not by doing mark and sweep looking for non-collectable data, but only by scanning a smaller set of possibly circular data, detecting cycles in them and collecting them without a full sweep.
To add more detail, one idea for making a set of possible nodes for scanning would be ones whose reference count was decremented but which didn't go to zero on the decrement. Only nodes to which this has happened can be the point at which a loop was cut off from the root set.
Python has a collector that does that, as does PHP.
I'm still trying to get my head around the algorithm because there are advanced versions that claim to be able to do this in parallel without stopping the program...
In any case it's not simple, it requires multiple scans, an extra set of reference counters, and decrementing elements (in the extra counter) in a "trial" to see if the self referential data ends up being collectable.
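The "trial" decrement on an extra set of counters can be sketched roughly as below. This is a simplified, stop-the-world version over an explicit graph, closer in spirit to what CPython's collector does than to the concurrent variants from the papers. The function name and the dictionary-based heap representation are assumptions made for illustration.

```python
def find_cyclic_garbage(refcounts, edges):
    """Return the names of objects that are only kept alive by cycles.

    refcounts: dict mapping object name -> its real reference count
    edges:     dict mapping object name -> list of names it references
    Only objects listed in `refcounts` are examined (the candidate set).
    """
    # Extra set of counters, so the real counts are never touched.
    trial = dict(refcounts)

    # Trial decrement: remove every reference that comes from inside
    # the candidate set itself.
    for src in refcounts:
        for dst in edges.get(src, ()):
            if dst in trial:
                trial[dst] -= 1

    # Whatever still has a positive trial count is referenced from
    # outside; everything reachable from it is alive as well.
    live = set()
    stack = [name for name, count in trial.items() if count > 0]
    while stack:
        name = stack.pop()
        if name in live:
            continue
        live.add(name)
        stack.extend(dst for dst in edges.get(name, ()) if dst in trial)

    return sorted(set(refcounts) - live)
```

For example, with `a` and `b` referencing only each other, and `c` held by an outside reference and pointing at `d`, the call `find_cyclic_garbage({"a": 1, "b": 1, "c": 1, "d": 1}, {"a": ["b"], "b": ["a"], "c": ["d"]})` reports `a` and `b` as collectable.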
Some papers:
"Down for the Count? Getting Reference Counting Back in the Ring" Rifat Shahriyar, Stephen M. Blackburn and Daniel Frampton
http://users.cecs.anu.edu.au/~steveb/downloads/pdf/rc-ismm-2012.pdf
"A Unified Theory of Garbage Collection" by David F. Bacon, Perry Cheng and V.T. Rajan
http://www.cs.virginia.edu/~cs415/reading/bacon-garbage.pdf
There are lots more topics in reference counting such as exotic ways of reducing or getting rid of interlocked instructions in reference counting. I can think of 3 ways, 2 of which have been written up.
I have always redesigned to avoid the issue. One of the common cases where this comes up is the parent-child relationship where the child needs to know about the parent. There are two solutions to this:
Convert the parent to a service, the parent then does not know about the children and the parent dies when there are no more children or the main program drops the parent reference.
If the parent must have access to the children, then have a register method on the parent which accepts a pointer that is not reference counted, such as an object pointer, and a corresponding unregister method. The child will need to call the register and unregister method. When the parent needs to access a child then it type casts the object pointer to the reference counted interface.
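The register/unregister variant might look like this in Python. This is only a sketch: `weakref.WeakSet` plays the role of the pointers that are not reference counted, the `Parent` and `Child` classes are hypothetical, and the immediate cleanup shown in `demo` assumes CPython's reference counting.

```python
import gc
import weakref


class Parent:
    def __init__(self):
        # Weak references only: registering a child does not own it.
        self._children = weakref.WeakSet()

    def register(self, child):
        self._children.add(child)

    def unregister(self, child):
        self._children.discard(child)

    def child_names(self):
        return sorted(child.name for child in self._children)


class Child:
    def __init__(self, name, parent):
        self.name = name
        self.parent = parent  # strong, counted reference child -> parent
        parent.register(self)

    def close(self):
        self.parent.unregister(self)


def demo():
    was_enabled = gc.isenabled()
    gc.disable()  # reference counting only
    try:
        parent = Parent()
        a = Child("a", parent)
        b = Child("b", parent)
        before = parent.child_names()
        a.close()
        after = parent.child_names()
        probe = weakref.ref(parent)
        del parent
        kept_alive_by_children = probe() is not None
        del a, b  # last children gone, so the parent goes too
        return before, after, kept_alive_by_children, probe() is None
    finally:
        if was_enabled:
            gc.enable()
```

The only counted references run from child to parent, so the parent dies once the last child and the main program have let go of it.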
When using reference counting, what are possible solutions/techniques to deal with circular references?
Four solutions:
Augment naive reference counting with a cycle detector: counts decremented to non-zero values are considered to be potential sources of cycles and the heap topology around them is searched for cycles.
Augment naive reference counting with a conventional garbage collector like mark-sweep.
Constrain the language such that its programs can only ever produce acyclic (aka unidirectional) heaps. Erlang and Mathematica do this.
Replace references with dictionary lookups and then implement your own garbage collector that can collect cycles.
i too am looking for a good solution to the circular reference counting problem.
i was borrowing an API from World of Warcraft dealing with achievements. i was implicitly translating it into interfaces when i realized i had circular references.
Note: You can replace the word achievements with orders if you don't like achievements. But who doesn't like achievements?
There's the achievement itself:
IAchievement = interface(IUnknown)
function GetName: string;
function GetDescription: string;
function GetPoints: Integer;
function GetCompleted: Boolean;
function GetCriteriaCount: Integer;
function GetCriteria(Index: Integer): IAchievementCriteria;
end;
And then there's the list of criteria of the achievement:
IAchievementCriteria = interface(IUnknown)
function GetDescription: string;
function GetCompleted: Boolean;
function GetQuantity: Integer;
function GetRequiredQuantity: Integer;
end;
All achievements register themselves with a central IAchievementController:
IAchievementController = interface(IUnknown)
procedure RegisterAchievement(Achievement: IAchievement);
procedure UnregisterAchievement(Achievement: IAchievement);
end;
And the controller can then be used to get a list of all the achievements:
IAchievementController = interface(IUnknown)
procedure RegisterAchievement(Achievement: IAchievement);
procedure UnregisterAchievement(Achievement: IAchievement);
function GetAchievementCount(): Integer;
function GetAchievement(Index: Integer): IAchievement;
end;
The idea was going to be that as something interesting happened, the system would call the IAchievementController and notify it that something interesting happened:
IAchievementController = interface(IUnknown)
...
procedure Notify(eventType: Integer; gParam: TGUID; nParam: Integer);
end;
And when an event happens, the controller will iterate through each child and notify them of the event through their own Notify method:
IAchievement = interface(IUnknown)
function GetName: string;
...
function GetCriteriaCount: Integer;
function GetCriteria(Index: Integer): IAchievementCriteria;
procedure Notify(eventType: Integer; gParam: TGUID; nParam: Integer);
end;
If the Achievement object decides the event is something it would be interested in it will notify its child criteria:
IAchievementCriteria = interface(IUnknown)
function GetDescription: string;
...
procedure Notify(eventType: Integer; gParam: TGUID; nParam: Integer);
end;
Up until now the dependency graph has always been top-down:
IAchievementController --> IAchievement --> IAchievementCriteria
But what happens when the achievement's criteria have been met? The Criteria object was going to have to notify its parent Achievement:
IAchievementController --> IAchievement --> IAchievementCriteria
                                ^                      |
                                |                      |
                                +----------------------+
Meaning that the Criteria will need a reference to its parent; the two are now referencing each other - memory leak.
And when an achievement is finally completed, it is going to have to notify its parent controller, so it can update views:
IAchievementController --> IAchievement --> IAchievementCriteria
      ^                      |    ^                      |
      |                      |    |                      |
      +----------------------+    +----------------------+
Now the Controller and its child Achievements circularly reference each other - more memory leaks.
i thought that perhaps the Criteria object could instead notify the Controller, removing the reference to its parent. But we still have a circular reference, it just takes longer:
IAchievementController --> IAchievement --> IAchievementCriteria
      ^                      |                          |
      |                      |                          |
      +<---------------------+                          |
      |                                                 |
      +-------------------------------------------------+
The World of Warcraft solution
Now the World of Warcraft api is not object-oriented friendly. But it does solve any circular references:
Do not pass references to the Controller. Have a single, global, singleton, Controller class. That way an achievement doesn't have to reference the controller, just use it.
Cons: Makes testing, and mocking, impossible - because you have to have a known global variable.
An achievement doesn't know its list of criteria. If you want the Criteria for an Achievement you ask the Controller for them:
IAchievementController = interface(IUnknown)
function GetAchievementCriteriaCount(AchievementGUID: TGUID): Integer;
function GetAchievementCriteria(AchievementGUID: TGUID; Index: Integer): IAchievementCriteria;
end;
Cons: An Achievement can no longer decide to pass notifications to its Criteria, because it doesn't have any criteria. You now have to register Criteria with the Controller.
When a Criteria is completed, it notifies the Controller, who notifies the Achievement:
IAchievementController-->IAchievement      IAchievementCriteria
      ^                                              |
      |                                              |
      +----------------------------------------------+
Cons: Makes my head hurt.
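To make the shape of that design concrete, here is a small Python sketch of the controller-centric layout described above. Every name in it (`Controller`, `CONTROLLER`, `register_criteria`, `criteria_completed` and so on) is hypothetical and is not the actual World of Warcraft API. Criteria store only the identifier of their achievement, so the only object references run from the controller downwards.

```python
class Controller:
    """Owns all achievements and criteria; everything else uses ids."""

    def __init__(self):
        self._achievements = {}  # achievement id -> Achievement
        self._criteria = {}      # achievement id -> list of Criteria

    def register_achievement(self, achievement):
        self._achievements[achievement.guid] = achievement
        self._criteria.setdefault(achievement.guid, [])

    def register_criteria(self, criteria):
        self._criteria.setdefault(criteria.achievement_guid, []).append(criteria)

    def get_criteria(self, guid):
        return list(self._criteria.get(guid, []))

    def criteria_completed(self, guid):
        """Called by a criteria; marks the achievement done if all are met."""
        achievement = self._achievements.get(guid)
        if achievement is not None and all(
            c.completed for c in self._criteria[guid]
        ):
            achievement.completed = True


CONTROLLER = Controller()  # the single global instance


class Achievement:
    def __init__(self, guid, name):
        self.guid = guid
        self.name = name
        self.completed = False  # note: no list of criteria here


class Criteria:
    def __init__(self, achievement_guid, required):
        self.achievement_guid = achievement_guid  # an id, not a reference
        self.required = required
        self.quantity = 0

    @property
    def completed(self):
        return self.quantity >= self.required

    def notify(self, amount):
        self.quantity += amount
        if self.completed:
            CONTROLLER.criteria_completed(self.achievement_guid)


def demo():
    achievement = Achievement("explorer", "Explore both zones")
    CONTROLLER.register_achievement(achievement)
    north = Criteria("explorer", required=1)
    south = Criteria("explorer", required=2)
    CONTROLLER.register_criteria(north)
    CONTROLLER.register_criteria(south)
    north.notify(1)
    halfway = achievement.completed
    south.notify(2)
    return halfway, achievement.completed
```

The global `CONTROLLER` is what removes the back references, and it is also the part that makes testing and mocking harder, as noted in the cons above.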
i'm sure a Teardown method is much more desirable than re-architecting an entire system into a horribly messy API.
But, like you wonder, perhaps there's a better way.
If you need to store cyclic data, for example to snapshot it into a string, I attach an is_cyclic boolean to any object that may be part of a cycle.
Step 1: When serializing the data to a JSON string, I push any object with is_cyclic set that hasn't been seen yet into an array and write its index into the string instead. (Objects already seen are replaced with their existing index.)
Step 2: I traverse the array of objects, replacing any child that has is_cyclic set with its index, and pushing any new objects onto the array. Then I serialize the array into a second JSON string.
NOTE: Because new cyclic objects are pushed to the end of the array, the traversal continues until all cyclic references are removed.
Step 3: Last, I combine both JSON strings into a single string.
Here is a javascript fiddle...
https://jsfiddle.net/7uondjhe/5/
function my_json(item) {
var parse_key = 'restore_reference',
stringify_key = 'is_cyclic';
var referenced_array = [];
var json_replacer = function(key,value) {
if(value !== null && typeof value == 'object' && value[stringify_key]) {
var index = referenced_array.indexOf(value);
if(index == -1) {
index = referenced_array.length;
referenced_array.push(value);
}
return {
[parse_key]: index
}
}
return value;
}
var json_reviver = function(key, value) {
if(value !== null && typeof value == 'object' && value[parse_key] >= 0) {
return referenced_array[value[parse_key]];
}
return value;
}
var unflatten_recursive = function(item, level) {
if(!level) level = 1;
for(var key in item) {
if(!item.hasOwnProperty(key)) continue;
var value = item[key];
if(value === null || typeof value !== 'object') continue;
if(level < 2 || !value.hasOwnProperty(parse_key)) {
unflatten_recursive(value, level+1);
continue;
}
var index = value[parse_key];
item[key] = referenced_array[index];
}
};
var flatten_recursive = function(item, level) {
if(!level) level = 1;
for(var key in item) {
if(!item.hasOwnProperty(key)) continue;
var value = item[key];
if(value === null || typeof value !== 'object') continue;
if(level < 2 || !value[stringify_key]) {
flatten_recursive(value, level+1);
continue;
}
var index = referenced_array.indexOf(value);
if(index == -1) (item[key] = {})[parse_key] = referenced_array.push(value)-1;
else (item[key] = {})[parse_key] = index;
}
};
return {
clone: function(){
return JSON.parse(JSON.stringify(item,json_replacer),json_reviver)
},
parse: function() {
var object_of_json_strings = JSON.parse(item);
referenced_array = JSON.parse(object_of_json_strings.references);
unflatten_recursive(referenced_array);
return JSON.parse(object_of_json_strings.data,json_reviver);
},
stringify: function() {
var data = JSON.stringify(item,json_replacer);
flatten_recursive(referenced_array);
return JSON.stringify({
data: data,
references: JSON.stringify(referenced_array)
});
}
}
}
Here are some techniques described in Garbage Collection: Algorithms for Automatic Dynamic Memory Management by R. Jones and R. Lins. Both techniques treat the cyclic structure as a whole in one way or another.
Friedman and Wise
The first way is one suggested by Friedman and Wise. It is for handling cyclic references when implementing recursive calls in functional programming languages. They suggest that you can observe the cycle as a single entity with a root object. To do that, you should be able to:
Create the cyclic structure all at once
Access the cyclic structure only through a single node - a root, and mark which reference closes the cycle
If you need to reuse just a part of the cyclic structure, you shouldn't add external references but instead make copies of what you need
That way you should be able to count the structure as a single entity and whenever the RC to the root falls to zero - collect it.
This should be the paper for it if anyone is interested in more details - https://www.academia.edu/49293107/Reference_counting_can_manage_the_circular_invironments_of_mutual_recursion
Bobrow's technique
Here you could have more than one reference to the cyclic structure. We just distinguish between internal and external references for it.
Overall, the programmer assigns every allocated object to a group, and all groups are reference counted. Only references that come from outside a group count towards it, and objects can be moved between groups. Intra-group cycles are therefore easily reclaimed whenever there are no more external references to the group. However, inter-group cyclic structures remain an issue.
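A minimal Python sketch of the group idea follows, under the assumption that only references crossing a group boundary (or coming from a root) are counted. The `Group` and `Cell` classes and the `add_ref`/`drop_ref` helpers are names invented for this illustration and are not taken from the paper. Moving objects between groups is left out.

```python
class Group:
    def __init__(self, name):
        self.name = name
        self.members = set()
        self.external_refs = 0  # references coming from outside the group
        self.reclaimed = False


class Cell:
    def __init__(self, name, group):
        self.name = name
        self.group = group
        self.refs = []
        group.members.add(self)


def add_ref(src, dst):
    """Record a reference src -> dst; src=None means a root reference."""
    if src is not None:
        src.refs.append(dst)
    if src is None or src.group is not dst.group:
        dst.group.external_refs += 1  # internal references are not counted


def drop_ref(src, dst):
    """Remove a reference src -> dst and reclaim dst's group if unused."""
    if src is not None:
        src.refs.remove(dst)
    if src is None or src.group is not dst.group:
        dst.group.external_refs -= 1
        if dst.group.external_refs == 0:
            reclaim(dst.group)


def reclaim(group):
    """Free the whole group at once, internal cycles included."""
    group.reclaimed = True
    members = list(group.members)
    group.members.clear()
    for cell in members:
        for target in list(cell.refs):
            if target.group is not group:
                drop_ref(cell, target)  # release what the group pointed at
        cell.refs.clear()


def demo():
    ring = Group("ring")
    a = Cell("a", ring)
    b = Cell("b", ring)
    add_ref(None, a)  # one root reference into the group
    add_ref(a, b)     # a <-> b is an internal cycle
    add_ref(b, a)
    counted = ring.external_refs
    drop_ref(None, a)
    return counted, ring.reclaimed
```

As the text says, a cycle that spans two groups would still leak in this scheme, since each group would keep the other's external count above zero.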
If I'm not mistaken, this should be the original paper - https://dl.acm.org/doi/pdf/10.1145/357103.357104
I'm in the search for other general purpose alternatives besides those algorithms and using weak pointers.
Y Combinator
I wrote a programming language once in which every object was immutable. As such, an object could only contain pointers to objects that were created earlier. Since all references pointed backwards in time, there could not possibly be any cyclic references, and reference counting was a perfectly viable way of managing memory.
The question then is, "How do you create self-referencing data structures without circular references?" Well, functional programming has been doing this trick forever using the fixed-point/Y combinator to write recursive functions when you can only refer to previously-written functions. See: https://en.wikipedia.org/wiki/Fixed-point_combinator
That Wikipedia page is complicated, but the concept is really not that complicated. How it works in terms of data structures is like this:
Let's say you want to make a Map<String,Object>, where the values in the map can refer to other values in the map. Well, instead of actually storing those objects, you store functions that generate those objects on-demand given a pointer to the map.
So you might say object = map.get(key), and what this will do is call a private method like _getDefinition(key) that returns this function and calls it:
Object get(String key) {
var definition = _getDefinition(key);
return definition == null ? null : definition(this);
}
So the map has a reference to the function that defines the value. This function doesn't need to have a reference to the map, because it will be passed the map as an argument.
When the definition is called, it returns a new object that does have a pointer to the map in it somewhere, but the map itself does not have a pointer to this object, so there are no circular references.
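The same idea can be tried out in Python. This is only a sketch: `DefinitionMap` and the two sample definitions are made up for the example. The map stores functions that build a value when handed the map, so two values can refer to each other without either being stored inside the other.

```python
class DefinitionMap:
    """Maps keys to definitions: functions that build the value on demand."""

    def __init__(self, definitions):
        self._definitions = dict(definitions)

    def get(self, key):
        definition = self._definitions.get(key)
        # The map is passed in as an argument, so the stored definition
        # never needs to hold a reference back to the map.
        return None if definition is None else definition(self)


# Two mutually recursive values. Each definition looks the other one up
# through the map it is given instead of capturing it directly.
definitions = {
    "is_even": lambda m: (lambda n: True if n == 0 else m.get("is_odd")(n - 1)),
    "is_odd": lambda m: (lambda n: False if n == 0 else m.get("is_even")(n - 1)),
}

number_map = DefinitionMap(definitions)
is_even = number_map.get("is_even")
is_odd = number_map.get("is_odd")
```

Here `is_even(10)` and `is_odd(7)` both evaluate to `True`. The returned closures point at the map, but the map only holds the original definitions, which point at nothing.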
There are a couple of ways I know of for working around this:
The first (and preferred) one is simply extracting the common code into a third assembly and making both projects reference that one
The second one is adding the reference as a "File reference" (dll) instead of a "Project reference"
Hope this helps

Resources