faunus script map completing but not mutating graph - gremlin

Prelude: Several months experience using both Gremlin "dialects" for FaunusGraph & TitanGraph, so well aware of the functional and syntactic diffs. Have successfully used Faunus script step (http://architects.dzone.com/articles/distributed-graph-computing , https://github.com/thinkaurelius/faunus/blob/master/src/main/java/com/thinkaurelius/faunus/mapreduce/sideeffect/ScriptMap.java) for relatively simple deletion & mutation of subgraphs.
Problem: Implemented a complex mutation script map to "move" edge properties to either the out-vertex or the in-vertex per a direction-oriented convention for naming properties. My TitanGraph Gremlin prototype works on small graphs, but I can't get the scaled-up implementation to work: the map completes successfully but the graph isn't changed (I am committing the changes). NOTE: my Logger object is only outputing the first INFO message that displays the prefix args, indicating I'm not satifying the edge namespace guard condition (I did a run without the condition, but no change). Following is my code (fat-fingering from an internal net, so typos are possible)
//faunus pipe driver - usage gremlin -e faunus.ns.set-props.grm
import java.io.Console
//get args
console=System.console()
arg=console.readLine('> type <namespace>;<faunus.ns.set-props.mapper_path>;<from_prefix>;<to_prefix>
inargs=arg.split(";")
//establish FaunusGraph connection
f=FaunusFactory.open('titan-client.properties')
f.getConf().set("faunus.graph.input.titan.storage.read-consistency-level", "ALL")
f.getConf().set("faunus.graph.input.titan.storage.write-consistency-level", "ALL")
//Faunus pipe incl. script step
f.V().has("_namespace", inargs[0]).script(inargs[1], inargs[2], inargs[3]
//script map - usage f.V().has("_namespace", <namespace_string>).script(<this_script_path>, <outV_key_prefix_string>, <inV_key_prefix_string>)
def g
def mylog
def setup(args) {
mylog=java.util.logging.Logger.getLogger("script_map")
println("configuring graph ...")
conf=new BaseConfiguration()
conf.setProperty("storage.backend", "cassandra")
conf.setProperty("storage.keyspace", "titan")
conf.setProperty("storage.index.index-name", "titan")
conf.setProperty("storage.hostname", "localhost")
g=TitanFactory.open(conf)
}
def map(v, args) {
mylog.info("*****READ***** args: "+args[0].toString()+", "+args[1].toString())
//fetch all edges incident on Titan vertex corresponding to incoming Faunus vertex
gv=g.v(v.id)
edges=gv.bothE();null
//iterate through incident edges
while(edges.hasNext()) {
e=edges.next()
if (e.hasProperty("_namespace")) { //_namespace removed from previously processed edges
/*fetch terminal vertices of current edge, add incidence & adjacency props
to support metrics and analytics
*/
from=e.getVertex(OUT)
from.setProperty("inV_degree", from.in().count())
from.setProperty("inE_degree", from.inE().count())
from.setProperty("outV_degree" from.out().count())
from.setProperty("outE_degree", from.outE().count())
to=e.getVertex(IN)
to.setProperty("inV_degree", from.in().count())
to.setProperty("inE_degree", from.inE().count())
to.setProperty("outV_degree" from.out().count())
to.setProperty("outE_degree", from.outE().count())
mylog.info("*****READ*****edge id: "+e.id)
mylog.info("*****READ*****edge vertices: from id"+fromid+"; to id: "+to.id)
//fetch property keys of current edge
ekeys=e.getPropertyKeys()
//iterate through edge property keys
for(String ekey:ekeys)
eprop=e.getProperty(ekey) //get value of current property key
goodprop=!(eprop == "" || eprop == null)
mylog.info("*****READ*****edge key/value: "+ekey+"="eprop)
/*determine placement of current key/value on one or neither of the
terminal vertices based on key prefix arges and property value,
remove prefix from re-assigned key/value
*/
if(ekey.startsWith(args[0]) && goodprop) {
vkey=ekey.split(args[0])[1]
if(!from.hasProperty(vkey)) from.setProperty(vkey, eprop)
else {
vprop=from.getProperty(vkey)
if(!vprop.equal(eprop) from.setProperty(vkey, vprop+";"+eprop)
}
mylog.info("*****READ*****from vertex key/value: "+vkey+"="+from.getProperty(vkey)
}
else if(ekey.startsWith(args[1]) && goodprop) {
vkey=ekey.split(args[1])[1]
if(!to.hasProperty(vkey)) to.setProperty(vkey, eprop)
else {
vprop=to.getProperty(vkey)
if(!vprop.equal(eprop) to.setProperty(vkey, vprop+";"+eprop)
}
mylog.info("*****READ*****tovertex key/value: "+vkey+"="+to.getProperty(vkey)
}
//if current edge property key is re-assigned, remove it from the edge
if(ekey.startsWith(args[0]) || ekey.startsWith(args[1])) {
e.removeProperty(ekey)
if(e.hasProperty(ekey) println(ekey+" NOT remvoded from edge")
else println(ekey+ "removed from edge")
}
e.removeProperty("_namespace") // marks edge as processed per outer loop guard
}
}
}
g.commit()
}
def cleanup(args) {
g.shutdown()
}

This line:
if (e.hasProperty("_namespace")) {
hasProperty doesn't seem to be in the Blueprints API or the Titan API. Since that is the case, I'm not sure how this code worked in your smaller test db, as it will always resolve to false and you will never see the inside of that if statement:
gremlin> x = TinkerGraphFactory.createTinkerGraph()
==>tinkergraph[vertices:6 edges:6]
gremlin> v = x.V('name','marko').next()
==>v[1]
gremlin> if (v.hasProperty('name')) { true } else { false }
==>false
I suppose you really want to try this:
gremlin> if (v.getProperty('name')) { true } else { false }
==>true
gremlin> if (v.getProperty('xxx')) { true } else { false }
==>false

Related

Remove nodes which are single or have 2nd degree visjs

I've a network graph
Now I've some connected nodes and as you can see most of the nodes only have one connected node that is their degree is 1. Now I'd like to remove such nodes to clear the clutter. Unable to find how to since last 2 days. No such helper functions available in visjs documentation. Would appreciate help.
I believe the algorithm suggested by the 1st answer -by macramole- (before updates) would actually hide the non-connected nodes (degree 0), instead of the ones with degree 1.
I would probably just iterate over all the edges in the network while keeping 'degree' counters for each node that is an endpoint in the edge you are visiting (you can obtain these nodes by grabbing the edge.from and edge.to values, as shown above). You would increment the degree counter for a node, whenever the node is 'hit' in this search through the edges.
Eventually you'll end up with the degree value for each node in the network, at which point you can decide which ones to hide.
Updating this answer now to include my suggested code (note: nodes and edges are vis DataSet instances):
Example code:
var nodeToDegrees = {}; // keeps a map of node ids to degrees
var nodeFrom, nodeTo;
for (edge in edges) {
nodeFrom = edge.from;
nodeTo = edge.to;
nodeToDegrees[nodeFrom] = nodeToDegrees[nodeFrom] ? nodeToDegrees[nodeFrom] + 1 : 0;
nodeToDegrees[nodeTo] = nodeToDegrees[nodeTo] ? nodeToDegrees[nodeTo] + 1 : 0;
}
for (node in nodes) {
if (nodeToDegrees[node.id] = 1) nodes.update([{node.id, hidden: true}]);
}
This might work:
var DEGREES_HIDDEN = 1;
for ( var node of nodes ) {
node.cantLinks = 0;
for ( var link of links ) {
if ( link.from == node.id || link.to == node.id ) {
node.cantLinks++;
}
}
}
for ( var node of nodes ) {
if ( node.cantLinks <= DEGREES_HIDDEN ) {
node.hidden = true;
}
}
Nodes and links are arrays not vis.DataSet, I create the latter after doing that.
Doesn't look very nice perfomance wise but it does get the job done. Hope you find it useful.

Create nested map from key in groovy

I'm relatively new to groovy and am using it in the context of a gradle build. So please don't be harsh if there is an easy out-of-the-box solution for this.
Basically I'm trying to accomplish the reverse of Return Nested Key in Groovy. That is, I have some keys read from the System.properties map for example user.home and corresponding values like C:\User\dpr. Now I want to create a map that reflects this structure to use it in a groovy.text.SimpleTemplateEngine as bindings:
[user : [home : 'C:\Users\dpr']]
The keys may define an arbitrary deep hierarchy. For example java.vm.specification.vendor=Oracle Corporation should become:
[java : [vm : [spec : [vendor : 'Oracle Corporation']]]]
Additionally there are properties with the same parents such as user.name=dpr and user.country=US:
[
user: [
name: 'dpr',
country: 'US'
]
]
Edit: While ConfigSlurper is really nice, it is somewhat too defensive with creating the nested maps as it stops nesting at the minimum depth of a certain key.
I currently ended up using this
def bindings = [:]
System.properties.sort().each {
def map = bindings
def split = it.key.split("\\.")
for (int i = 0; i < split.length; i++) {
def part = split[i];
// There is already a property value with the same parent
if (!(map instanceof Map)) {
println "Skipping property ${it.key}"
break;
}
if (!map.containsKey(part)) {
map[part] = [:]
}
if (i == split.length - 1) {
map[part] = it.value
} else {
map = map[part]
}
}
map = it.value
}
With this solution the properties file.encoding.pkg, java.vendor.url and java.vendor.url.bug are discarded, which is not nice but something I can cope with.
However the above code is not very groovyish.
You can use a ConfigSlurper :
def conf = new ConfigSlurper().parse(System.properties)
println conf.java.specification.version

Traversing based on multiple vertexes

I've a graph in OrientDB with vertexes Area & Place with edges visited. Your average path goes Area > visited > Place > visited > Place > visited > Place > visited > Place and so on. It tracks which places user visited after the previous one. visited contains YYYYmmDD datestamp.
I'm trying to find out all Area vertexes based on arbitrary Place vertexes for certain day - i.e. I want to know from which areas users came to a certain place after visiting certain place first.
Traversing from any single Place in the path would be easy but I need to to follow the path for only for a specific datestamp. What I did was that I created index for datestamp to get day's visited edges quickly and then finds the one that has in to the first Place. However now I can't figure out how to create a fast query that finds all Area vertexes based on the first Place while also making sure that the path contains second Place as well. I can get path between first and second Place via shortestPath() but I still have the same problem with extending the path to include Area vertexes.
I found some theory on the subject but if somebody could point me to the right direction how to use OrientDB to do this instead of pure graph theory I would really appreciate it - I've been working on this for the past week now. Originally this was done via bruteforce by traversing everything and then selecting but as the database grows it's not obviously sustainable.
I created the three Vertices 'Area', 'Place' and 'User' and the two Edges 'visited' and 'placed' where datestamp is a property on the edge 'visited.
In this way you don't have to insert everytime the User as a property on the edge.
Edit
Try this JavaScript function that has three parameters(places,date,propertyPlace)
var g=orient.getGraph();
var myPlaces=places.substring(1,places.length-1).split(",");
var b=g.command("sql","select from Area");
var result=[];
if(checkPlaces){
for(i=0;i<b.length;i++){
var listPlaces=[];
for(ind=0;ind<myPlaces.length;ind++){
listPlaces.push(myPlaces[ind]);
}
if(search(b[i],listPlaces)){
result.push(b[i]);
}
}
}
return result;
function checkPlaces() {
for(index=0;index<myPlaces.length;index++){
var place=g.command("sql","select from Place where "+ propertyPlace + "='"+myPlaces[index]+"'");
if(place.length==0){
return false;
}
}
return true;
}
function checkDate(edge){
var datestamp=edge.getRecord().field("datestamp");
var year=datestamp.getYear()+1900;
var month=datestamp.getMonth()+1;
var day=datestamp.getDate();
var app=date.split("-");
var myYear=parseInt(app[0]);
var myMonth=parseInt(app[1]);
var myDay=parseInt(app[2]);
if(year==myYear && month==myMonth && day==myDay){
return true;
}
return false;
}
function search(v,places){
var edge=v.getRecord().field("out_visited");
if(edge!=null){
var edgeIterator=edge.iterator();
while(edgeIterator.hasNext()){
var edge = edgeIterator.next();
if (checkDate(edge)) {
var v1 = edge.field("in");
if(v1!=null){
var name = v1.field(propertyPlace);
for(j=0;j<places.length;j++){
if(name==(places[j])) {
places.splice(j, 1);
break;
}
}
if(places.length==0){
return true;
}
else if(search(v1,places)){
return true;
}
}
}
}
}
return false;
}
Using the following command
select expand(result) from (select myFunction("[place1,place2]","2015-12-03","name") as result)
Let me know if it works
This is not a solution to this exact problem but instead a workaround I came up with. Inspired by Orientdb get last vertex from each path when Traversing by edge property
I changed to structure so that Area is now created per visitation instead of being static and it includes yyyymmdd timestamp also. Now I can use Area to start the query and use visited edges to get Place vertices only for a certain date.
Here's the query:
SELECT $path, $depth FROM (
TRAVERSE * FROM (
SELECT outE('visited') FROM (
SELECT EXPAND(rid) FROM INDEX:area.dt WHERE key = 20151205
)
) WHILE (#class = 'visited' AND dt = 20151205) OR #class = 'place')
WHERE #class = 'place' AND NOT (outE() contains (dt=20151205))
This returns correct paths with vertices and edges so you can verify it's only for a certain day. However note that Area is not contained in the path and I still need to figure out how to do that but if you want, you can just traverse the first visited edge backwards in the path and get it that way.

How do I translate LR(1) Parse into a Abstract syntax tree?

I have coded a table driven LR(1) parser and it is working very well however I am having a bit of a disconnect on the stage of turing a parse into a syntax tree/abstract syntax tree. This is a project that I m very passionate about but I have really just hit a dead end here. Thank you for your help in advance.
Edit: Also my parser just uses a 2d array and an action object that tells it where to go next or if its a reduction where to go and how many items to pop. I noticed that many people use the visitor pattern. Im not sure how they know what type of node to make.
Here is the pushdown automata for context
while (lexer.hasNext() || parseStack.size() > 0) {
Action topOfStack = parseStack.peek();
token = parseStack.size() > 0 ? lexer.nextToken() : new Token(TokenType.EOF, "EOF");
topOfStack.setToken(token);
int row = topOfStack.getTransitionIndex();
int column = getTerminalIndex(token.getLexeme());
column = token.getType() == TokenType.IDENTIFIER
&& !terminalsContain(token.getLexeme()) ? 0 : column;
Action action = actionTable[row][column];
if (action instanceof Accept) {
System.out.println("valid parse!!!!!!");
} else if (action instanceof Reduction) {
Reduction reduction = (Reduction) action;
popStack(parseStack, reduction.getNumberOfItemsToPop());
column = reduction.getTransitionIndex();
row = parseStack.peek().getTransitionIndex();
parseStack.push(new Action(gotoTable[row][column]));
lexer.backupTokenStream();
} else if (action != null) {
parseStack.push(actionTable[row][column]);
} else {
System.out.println("Parse error");
System.out.println("On token: " + token.getLexeme());
break;
}
Each reduction in the LR parsing process corresponds to an internal node in the parse tree. The rule being reduced is the internal AST node, and the items popped off the stack correspond to the children of that internal node. The item pushed for the goto corresponds to the internal node, while those pushed by shift actions correspond to leaves (tokens) of the AST.
Putting all that together, you can easily build an AST by createing a new internal node every time you do a reduction and wiring everything together appropriately.

ArangoDB: traverse only edges within a time range

I am experimenting with time based versioning.
I have created a vertex that is connected to other vertices with this edge on one side:
{
"_id": "edges/426647569364",
"_key": "426647569364",
"_rev": "426647569364",
"_from": "nodes/426640688084",
"_to": "nodes/426629284820",
"valid_from": "1385787600000",
"valid_till": "9007199254740991"
}
And this edge on the other:
{
"_id": "edges/426679485396",
"_key": "426679485396",
"_rev": "426845488084",
"_from": "nodes/426675749844",
"_to": "nodes/426629284820",
"valid_from": "1322629200000",
"valid_till": "1417323600000"
}
The valid_till value in the first edge is the output of the Number.MAX_SAFE_INTEGER function.
I looked at custom vistors a little and it looks like its focused on filtering vertices rather than edges.
How can I restrict my traversal to edges with a valid_till value between new Date().getTime() and Number.MAX_SAFE_INTEGER?
You can use the followEdges attribute in a traversal.
followEdges can optionally be a JavaScript function for filtering edges. It will be invoked for each edge in the traversal:
var expandFilter = function (config, vertex, edge, path) {
return (edge.vaild_till >= new Date().getTime() &&
edge.valid_till <= Number.MAX_SAFE_INTEGER);
};
require("org/arangodb/aql/functions").register("my::expandFilter", expandFilter);
It can then be used in a traversal like a regular custom filter by specifying it in the followEdges attribute of the traversal options, e.g.:
LET options = {
followEdges: 'my::expandFilter'
}
FOR doc IN TRAVERSAL(nodes, edges, 'nodes/startNode', 'inbound', options)
RETURN doc.vertex

Resources