Applying a visitor to an abstract syntax tree

I am building a C interpreter. My AST uses the composite pattern. To check semantics and perform actions, I want to use the visitor pattern. Now there's one problem. This is a grammar rule of the C preprocessor: if-section = if-group [ elif-groups ] [ else-group ] endif-line. The visitor of if-section needs information about the child nodes to know which groups have to be skipped. In the visitor pattern, every "visit" method returns void, so I can't get any information from these nodes (only by adding information to the nodes, but that's ugly...). Is there any way around this?

You've nailed the problem: you have to have additional information above and beyond the raw data that comprises the AST.
You can associate all of that extra information with individual tree nodes; if you do that, you'll end up building what is called an attributed tree. In theory (and if you work at it), you can make this idea work completely. Your visitor may have to inspect/update the information associated not only with the AST node it is visiting, but also with that of key children and parents.
In practice, it is useful to build auxiliary data structures (e.g., symbol tables) which can be consulted (and updated) by the visitor as it walks the tree. You end up with a kind of degenerate attributed tree: you associate symbol table chunks with the AST nodes that form scopes.
You've artificially constrained your visitor from returning any value; if you didn't do that, a child visitor could pass useful values to a parent visitor, enabling the parent to do less ad hoc reaching down the tree.
In your problem statement, you have not constrained your visitor from passing values down to children, so you can pass down useful values. An extremely useful value to pass is the symbol table associated with the surrounding scope, so that child visitors don't have to climb back up the tree to find the scoping node and its associated symbol table.
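As a minimal sketch of both ideas (my own illustration, not code from the answer; all type and method names are hypothetical), here is a visitor whose visit methods return a value and which threads the enclosing scope's symbol table down through the calls:

import java.util.ArrayList;
import java.util.List;

class SymbolTable { /* scope contents elided */ }
class Expr { boolean value; }  // stand-in for a real condition expression

interface Node {
    boolean accept(Visitor v, SymbolTable scope);
}

interface Visitor {
    boolean visitIfGroup(IfGroup g, SymbolTable scope);
    boolean visitIfSection(IfSection s, SymbolTable scope);
}

class IfGroup implements Node {
    Expr condition = new Expr();
    public boolean accept(Visitor v, SymbolTable scope) { return v.visitIfGroup(this, scope); }
}

class IfSection implements Node {
    IfGroup ifGroup = new IfGroup();
    List<IfGroup> elifGroups = new ArrayList<>();
    public boolean accept(Visitor v, SymbolTable scope) { return v.visitIfSection(this, scope); }
}

class SemanticVisitor implements Visitor {
    public boolean visitIfGroup(IfGroup g, SymbolTable scope) {
        return g.condition.value;  // stand-in for real condition evaluation
    }
    public boolean visitIfSection(IfSection s, SymbolTable scope) {
        // The child's return value tells the parent which groups to skip:
        if (s.ifGroup.accept(this, scope)) return true;
        for (IfGroup g : s.elifGroups)
            if (g.accept(this, scope)) return true;  // first elif whose condition holds
        return false;  // no condition held: the else-group, if any, applies
    }
}

The parent never reaches down into child nodes: each child reports its result upward, and the scope travels downward as an argument.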

Aggregation and navigability at the same end

In UML, is it possible to draw an aggregation where the component object can access the composite object? Like in this image, but with only one association line, so the association end touching A would have a diamond and an arrow.
If that isn't possible, is the diagram I drew valid? If not, why not?
From another point of view, navigability is important to show how it is possible to navigate the model and how to access instances.
Another point is about OCL: if navigability is not defined, some OCL queries will be hard to write.
The specification states (p. 198): Navigability means that instances participating in links at runtime (instances of an Association) can be accessed efficiently from instances at the other ends of the Association. The precise mechanism by which such efficient access is achieved is implementation specific. If an end is not navigable, access from the other ends may or may not be possible, and if it is, it might not be efficient.
And about the Property class (p. 149): The query isNavigable() indicates whether it is possible to navigate across the property. body: not classifier->isEmpty() or association.navigableOwnedEnd->includes(self).
So whether or not to model navigability is important.
If you want navigability on both sides, the following image shows that:
But in section 6 of the specification, it is written:
An association with neither end marked by navigability arrows means that the association is navigable in both directions.
Arrow notation is used to denote association end navigability. By definition, all class-owned association ends are navigable. By convention, all association owned ends in the metamodel are not navigable.
So the following schema is the same as the one above. This is tricky, but it seems to be true.
Of course that's possible.
If you want to save space, you can use a single line for the association:
Here's my personal opinion about navigability: The navigational arrow is not needed as the existence of the property owner implies that already.
P. 110 of the specs:
When a Property is owned by a Classifier other than an Association via ownedAttribute, then it represents an attribute of the Classifier.
P. 200:
Navigability notation was often used in the past according to an informal convention, whereby non-navigable ends were assumed to be owned by the Association whereas navigable ends were assumed to be owned by the Classifier at the opposite end. This convention is now deprecated. Aggregation type, navigability, and end ownership are separate concepts, each with their own explicit notation. Association ends owned by classes are always navigable, while those owned by associations may be navigable or not.
But what is an association that is merely named? So far it's a useless construct. What you actually intend is to create attributes in either class - by means of properties, in which case you add this (new) dot. To me this is simply over-engineered and impractical. Who really uses those dots? In EA you have to open sub-menus to make them appear. For me (and probably most UML readers) a role name represents a property, and thus an attribute on the other side. That's just "(my) human logic" behind the proposition. So, my practical approach:
Use simple connectors for associations
Where needed, add (non-)navigability arrows (and crosses)
Later add role names to indicate the use of attributes at the other end (rather than adding a typed attribute to the class's attribute list)
And that's it. Just forget that silly dot, which is a) hard to produce (in EA) and b) even harder to recognize.
Once again: this last part here is my recommendation for practical modeling.

How to ensure there is a single edge in a graph for a given order_id?

My current scenario is that I have product, customer, and seller nodes in my graph ecosystem. The problem I am facing is that I have to ensure the uniqueness of
(customer)-[buys]->(product)
with order_item_id as a property of the buys relation edge. I have to ensure that there is a unique buys edge for a given order_item_id. In this way I want to ensure that my graph insertion remains idempotent and no repeated buys edges are created for a given order_item_id.
Creating an order_item_id property key:
// fetch the property key if it already exists, otherwise create it
if (mgmt.getPropertyKey("order_item_id") == null) {
    order_item_id = mgmt.makePropertyKey("order_item_id").dataType(Integer.class).make();
} else {
    order_item_id = mgmt.getPropertyKey("order_item_id");
}
What I have found so far is that building a unique index might solve my problem, like:
// reuse the graph index if it already exists, otherwise build a unique composite index
if (mgmt.getGraphIndex("order_item_id") != null) {
    ridIndexBuilder = mgmt.getGraphIndex("order_item_id");
} else {
    ridIndexBuilder = mgmt.buildIndex("order_item_id", Edge.class)
                          .addKey(order_item_id)
                          .unique()
                          .buildCompositeIndex();
}
Or I can also use something like:
mgmt.buildEdgeIndex(graph.getOrCreateEdgeLabel("product"), "uniqueOrderItemId", Direction.BOTH, order_item_id)
How should I ensure this uniqueness of a single buys edge for a given order_item_id? (I don't have a use case for searching based on order_item_id.)
What is the basic difference between creating an index on an edge using buildIndex and using buildEdgeIndex?
You cannot enforce the uniqueness of properties at the edge level, i.e. between two vertices (see this question on the mailing list). If I understand your problem correctly, building a CompositeIndex on edges with a uniqueness constraint for the given property should address your problem, even though you do not plan to search these edges by the indexed property. However, this could lead to performance issues when inserting many edges simultaneously, due to locking. Depending on the rate at which you insert data, you may have to skip the locking (= skip the uniqueness constraint), risk duplicate edges, and then handle deduplication yourself at read time and/or run post-import batch jobs to clean up potential duplicates.
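For illustration, here is a sketch of how such an index could be built with Titan's Java management API (class and method names per Titan 1.0; the index name is my own invention). Note that unique() on a composite index is enforced through locking, which setConsistency makes explicit:

import com.thinkaurelius.titan.core.PropertyKey;
import com.thinkaurelius.titan.core.schema.ConsistencyModifier;
import com.thinkaurelius.titan.core.schema.TitanGraphIndex;
import com.thinkaurelius.titan.core.schema.TitanManagement;
import org.apache.tinkerpop.gremlin.structure.Edge;

// assumes `graph` is an open TitanGraph
TitanManagement mgmt = graph.openManagement();
PropertyKey oid = mgmt.containsPropertyKey("order_item_id")
        ? mgmt.getPropertyKey("order_item_id")
        : mgmt.makePropertyKey("order_item_id").dataType(Integer.class).make();
TitanGraphIndex idx = mgmt.buildIndex("orderItemIdUnique", Edge.class)
        .addKey(oid)
        .unique()
        .buildCompositeIndex();
// locking is what actually enforces uniqueness under concurrent writes
mgmt.setConsistency(idx, ConsistencyModifier.LOCK);
mgmt.commit();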
buildIndex() builds a global graph index (either a CompositeIndex or a MixedIndex). This kind of index typically allows you to quickly find starting points for your traversal within the graph.
buildEdgeIndex(), on the other hand, builds a local "vertex-centric index", which is meant to speed up traversal between vertices with a potentially high degree (= a huge number of incident edges, incoming and/or outgoing). Such vertices are called "super nodes" (see the A solution to the supernode problem blog post from Aurelius) - and although they tend to be quite rare, the likelihood of traversing them isn't that low.
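A sketch of the vertex-centric variant with illustrative names (note that such an index speeds up traversals over the incident buys edges but does not enforce uniqueness):

import com.thinkaurelius.titan.core.EdgeLabel;
import com.thinkaurelius.titan.core.Order;
import org.apache.tinkerpop.gremlin.structure.Direction;

TitanManagement mgmt = graph.openManagement();
EdgeLabel buys = mgmt.getOrCreateEdgeLabel("buys");
PropertyKey oid = mgmt.getPropertyKey("order_item_id");
// sorts/filters the incident buys edges by order_item_id at each vertex
mgmt.buildEdgeIndex(buys, "buysByOrderItemId", Direction.BOTH, Order.decr, oid);
mgmt.commit();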
Reference: Titan documentation, Ch. 8 - Indexing for better Performance.

Neo4j Design: Property vs "Node & Relationship"

I have a node type with a string property that will very often have the same value, e.g. millions of nodes with only 5 distinct values of that string. I will be doing searches by that property.
My question would be what is better in terms of performance and memory:
a) Implement it as a node property and have lots of duplicates (and search using WHERE).
b) Implement it as 5 additional nodes, where all original nodes reference one of them (and search using additional MATCH).
Without knowing further details it's hard to give a general purpose answer.
From a performance perspective it's better to limit the search as early as possible. It is even more beneficial if you do not have to look into properties at all during a traversal.
Given that, I assume it's better to move the lookup property into a separate node and use the value as the relationship type.
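A hedged sketch of that shape, with the value encoded as the relationship type (Cypher via the modern Neo4j Java driver, which postdates this answer; all labels and types are invented for illustration):

import org.neo4j.driver.Session;
import org.neo4j.driver.Values;

// assumes an open org.neo4j.driver.Driver instance `driver`
try (Session session = driver.session()) {
    // one shared hub node; the status value lives in the relationship type
    session.run("MERGE (:StatusHub {name: 'status'})");
    session.run("MATCH (i:Item {id: $id}), (s:StatusHub) CREATE (i)-[:STARTED]->(s)",
                Values.parameters("id", 42));
    // the lookup then traverses structure instead of scanning a property:
    session.run("MATCH (i:Item)-[:STARTED]->(:StatusHub) RETURN i");
}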
Use labels; this blog post is a good intro to this new Neo4j 2.0 feature:
Labels and Schema Indexes in Neo4j
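With labels, the property-based variant from the question could look like this (a sketch in Neo4j 2.0-era syntax; newer servers spell the index statement CREATE INDEX FOR (i:Item) ON (i.status)):

try (Session session = driver.session()) {
    // schema index so the property lookup doesn't scan every :Item node
    session.run("CREATE INDEX ON :Item(status)");
    session.run("MATCH (i:Item) WHERE i.status = $s RETURN i",
                Values.parameters("s", "STARTED"));
}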
I've thought about this problem a little as well. In my case, I had to represent state:
STARTED
IN_PROGRESS
SUBMITTED
COMPLETED
Overall the Node + Relationship approach looks more appealing, in that only a single relationship reference needs to be maintained each time, rather than a property string plus an extra index on that property which has to be maintained and scanned (memory and performance would intuitively favor this approach).
Another advantage is that it easily supports a node being linked to multiple "special nodes". If you foresee a situation where this should be possible in your model, this is better than having to use a property array (and searching using IN).
In practice I found that the problem then becomes: how do you access these special nodes each time? Either you maintain some sort of constants reference holding the node IDs of these special nodes, so you can jump right to them in your START statement (this is what we do), or you need to search against a property of the special node each time (its name, perhaps) and then traverse down its relationships. This doesn't make for the prettiest of Cypher queries.
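The two access patterns side by side, written in modern Cypher for readability (the answer's START-by-node-ID approach is expressed here with id(); all names are invented):

try (Session session = driver.session()) {
    // (a) jump straight to the special node via a node ID kept as a constant
    session.run("MATCH (s) WHERE id(s) = $statusId MATCH (i)-[:HAS_STATUS]->(s) RETURN i",
                Values.parameters("statusId", 123L));
    // (b) look the special node up by a property each time, then traverse
    session.run("MATCH (i)-[:HAS_STATUS]->(:Status {name: 'STARTED'}) RETURN i");
}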

How does representing all information in nodes vs attributes affect storage and computation?

While using graph databases (in my case Neo4j), we can represent the same information in many ways: making each entity a Node and connecting all entities through relationships, or just adding the entities to the attribute list of a Node.
Following are two different representations of the same data.
Overall, which mechanism is suitable under which conditions?
My use case involves traversing the database from different nodes down to a depth of 4 and examining the information through connected nodes or attributes (depending on the approach).
One query of interest may be, "Who are the friends of John who went to Stanford?"
What is the difference in terms of storage and computation?
Normally, properties are loaded lazily and are more expensive to hold in cache, especially strings. Nodes and relationships are most effective for traversal, especially since relationship types are stored together with the relationship records and thus don't trigger property loads when used in traversals.
Also, a balanced graph (that is, one without many dense nodes with over, say, 10K relationships) is most effective to traverse.
I would try to model most of the recurring properties as nodes connected to the entities, thus using the graph itself to index these values, instead of having to resort to filtering on property values or indexing the property with an expensive index lookup.
The first one is much better, since you're querying on entities such as Stanford - and that entity is related to many person nodes. My opinion is that modeling as nodes is more intuitive and easier to query. "Find all persons who went to Stanford" would not be very easy in your second model, as you don't have a place to start traversing from.
I'd use attributes mainly to describe the node/entity and to filter results from the query, e.g. "Who are the friends of John who went to Stanford in the year 2010?" In this case, the year attribute would just be used to trim the results. It depends on your use case - if year is really important, drives a lot of queries, or is used to represent a timeline, you could even model the year as a node attached to Stanford.
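As a sketch of that combined shape (entities as nodes, a plain attribute only to trim results; Cypher via the Java driver, with all labels and relationship types invented):

try (Session session = driver.session()) {
    // start from the John node and fan out through structure; the year
    // property on the relationship only filters the results
    session.run("MATCH (:Person {name: 'John'})-[:FRIEND]-(f:Person)" +
                "-[:WENT_TO {year: 2010}]->(:University {name: 'Stanford'}) RETURN f");
}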

Eclipse Constraint Programming - search/6

I'm having trouble understanding this documentation for the search/6 predicate in the ECLiPSe constraint programming framework.
I understand that the choice parameter basically affects the value ordering.
It also seems like the selection method chooses the variable ordering, but I don't entirely understand all the options for it.
I don't really understand the other parameters so I was wondering if someone could explain them in words. I have a pretty good understanding of the theory of constraint logic programming so feel free to refer to those concepts. I just don't understand a lot of the CS lingo in that documentation (arity, etc.)
Thank you
I'll try to answer it as briefly as possible, since search/6 is one of the most complex predicates you can find in the ECLiPSe system.
Any more detailed follow-up questions would probably better be asked in the ECLiPSe user mailing list, though.
The search/6 predicate is a generic predicate for controlling the search for a solution of a CLP problem. It allows the user to control the shape of the search tree (the order of variables along the branches, the order of the branches, and the portion of the search tree that is visited). The predicate has 6 parameters: search(+L, ++Arg, ++Select, +Choice, ++Method, +Option). (+ and ++ denote the mode of the parameter)
The first two parameters go together. L is either a list of variables or a list of terms. If it's the former, then Arg must be 0; if it's the latter, then Arg denotes the position of the variables that should be instantiated during the search, e.g.:
search([A,B],0,input_order,indomain,complete,[]).
or
search([p(1,A),p(2,B)],2,input_order,indomain,complete,[]).
In both cases, the variables A and B are instantiated during search.
The third parameter is the selection method. This method is used by search/6 to select the next variable from the list L to instantiate.
The simplest option is input_order: the search simply iterates over the variables in the list. In the examples above, it would instantiate A first, then B. The other options consider the domain size and/or the number of constraints attached to the variables, and make the selection accordingly. E.g., first_fail chooses the variable with the smallest domain. If the current domain of A is [1,2,3] and B has the domain [1,3], then B will be selected and instantiated first. If more than one variable has the same smallest domain size, then the first of these in input order will be selected. Selection methods that take the domain size into account achieve a dynamic variable ordering, since the domain sizes will change (shrink) during search, depending on the amount of propagation that the constraints achieve.
The other selection methods should now be self-explanatory.
It is also possible to define your own selection method, provided that the predicate that implements it has arity 2, i.e., has two parameters. The predicate must take a variable as input and calculate some criterion value. The variable with the smallest criterion value will be selected.
The fourth parameter is the choice method. Once a variable is selected, the choice method controls the order in which the values in its domain are tried during search.
The simplest option is indomain, which chooses the values in the variable's current domain in ascending order. I.e., if variable A has the domain [1,3,5], then the search will initially bind A to 1, on backtracking bind it to 3, and finally to 5. indomain_middle will start with 3, then 1, then 5.
The more complex choice methods (i.e., other than indomain) will remove a tried value on backtracking, i.e., basically add additional constraints like A#\=1. This will cause additional propagation which may in turn allow earlier detection of infeasibilities. You can see the effect when running the n-queens example from the search/6 documentation that you linked to in your question.
Again, it is also possible to define your own choice method. The predicate must be of arity 1 or 3. If the arity is 1, then the predicate takes one variable as input and binds it to a value (or makes some other choice which alters the domain of the variable). If the arity is 3, then you can use the two additional parameters to pass along some state information which you can use to make the choice.
The fifth parameter is the search method. This controls the size of the section of the search tree that the search should explore (whereas the selection method controls the order of the variables along the branches of the tree, and the choice method controls the order of the branches in the search tree).
The simplest option is complete, which searches the tree left-to-right until the tree is exhausted. All other options (apart from symmetry breaking) are incomplete search methods, i.e., there will be branches in the search tree that are left unexplored. If the solution is on the leaf of such an unexplored branch, then it will not be found. You have to make sure that selection and choice methods shape the search tree in a way that the incomplete search method is able to find the solution. The option bbs, for instance, restricts the number of backtracks that can be made during search. If that number is exhausted, then the search will stop.
Symmetry breaking will only exclude branches that are equivalent (symmetrical) to other branches, in some way.
The sixth parameter is a list of possible additional options, described in the search/6 documentation. Normally, you won't need them.
