DSE Graph Loader mapping edges - datastax-enterprise-graph

I have to map data from JSON files to DSE.
Everything is working fine, but I couldn't find any documentation on how to map edges that share the same label but connect different kinds of nodes.
Example:
[A:Car] -- [OWNER] --> [B:Person]
[C:Car] -- [OWNER] --> [D:Company]
I've tried several approaches. In the end I added custom fields to each edge that explicitly state the class of the nodes it connects:
Data sample:
// Nodes
{"id":"A","label":"Car"}
{"id":"B","label":"Person"}
{"id":"C","label":"Car"}
{"id":"D","label":"Company"}
// Edges
{"out":"A","label":"OWNER","in":"B", "outLabel":"Car","inLabel":"Person"}
{"out":"C","label":"OWNER","in":"D", "outLabel":"Car","inLabel":"Company"}
Here is the mapping script:
load(nodesInput).asVertices {
    labelField "label"
    key "id"
}
load(edgesInput).asEdges {
    label "OWNER"
    outV "out", {
        key "id"
        label "Car"
    }
    inV "in", {
        key "id"
        labelField "inLabel" // <-- this declaration seems to fail
    }
}
Any ideas?

I believe you could accomplish the above with something like the following.
load(edgesInput).asEdges {
    label "OWNER"
    outV "out", {
        key "id"
        label "Car"
    }
    inV "in", {
        key "id"
        label it["inLabel"]
    }
}
https://docs.datastax.com/en/latest-dse/datastax_enterprise/graph/dgl/dglMapScript.html

Related

Get a filtered list from a terraform map of objects?

I have a map of users with attributes like this, and I want to create a local value that lists the names of the users in the "maker" group.
variable "users" {
  type = map(object({
    groups = list(string)
  }))
  default = {
    "kevin.mccallister" = {
      groups = ["kids", "maker"],
    },
    "biff" = {
      groups = ["kids", "teens", "bully"],
    },
  }
}
I want to write the local like this, but Terraform complains:
Error: Invalid 'for' expression ... Key expression is required when
building an object.
locals {
  makers_list = flatten({
    for user, attr in var.users: user
    if contains(attr.groups, "maker")
  })
}
How can I take that map of objects and get out a list of names based on group affiliation?

flatten() is not required here. The {} braces tell Terraform to build an object, which is why it asks for a key expression. Use [] instead to build a list, and the result is the list of user names filtered by group membership:
makers_list = [
  for user, attr in var.users: user
  if contains(attr.groups, "maker")
]
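For readers who know Python better than HCL, the list-versus-object distinction can be sketched with comprehensions. This is only an analogy, not Terraform code: the `users` dictionary below is a hypothetical Python copy of the `users` variable from the question.

```python
# Hypothetical Python mirror of the Terraform "users" variable.
users = {
    "kevin.mccallister": {"groups": ["kids", "maker"]},
    "biff": {"groups": ["kids", "teens", "bully"]},
}

# Square brackets build a list, comparable to Terraform's
# [for user, attr in var.users : user if ...].
makers_list = [name for name, attr in users.items() if "maker" in attr["groups"]]

# Curly braces need a key as well as a value, comparable to Terraform's
# {for user, attr in var.users : user => attr.groups if ...}.
makers_map = {
    name: attr["groups"] for name, attr in users.items() if "maker" in attr["groups"]
}

print(makers_list)
print(makers_map)
```

In both languages the bracket type decides the result type, which is why the curly-brace form in the question demanded a key expression.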

does ios computed properties increment arc count of any object?

In the code below, does the computed property increment the ARC retain count of the object it returns? For example,
is the retain count of mycustomlabel 2?
var mycustomlabel = UILabel()
var myc : UILabel {
    get {
        return mycustomlabel
    }
}

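No, you have a single strong reference at this point. A computed property has no storage of its own, so it does not hold a reference; only the stored property mycustomlabel does.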

How to add a DynamoDB global secondary Index via Python/Boto3

Is it possible to add a Global Secondary Index to an existing DynamoDB table AFTER it has been created? I am using Python 3.x with Boto3 and have not been able to find any examples of an index being added to a table after it was created.

In general, yes, it is possible to add a Global Secondary Index (GSI) after the table is created.
However, it can take a long time for the change to take effect, because building the GSI requires a table scan.
In the case of boto3, have a look at the documentation for update_table.
For example, you could try something like this:
import boto3

client = boto3.client('dynamodb')

response = client.update_table(
    TableName = 'YourTableName',
    # ...snip... (AttributeDefinitions must describe 'YourGSIFieldName')
    GlobalSecondaryIndexUpdates=[
        {
            'Create': {
                'IndexName': 'YourGSIName',
                'KeySchema': [
                    {
                        'AttributeName': 'YourGSIFieldName',
                        'KeyType': 'HASH'
                    }
                ],
                'Projection': {
                    'ProjectionType': 'ALL'
                },
                'ProvisionedThroughput': {
                    'ReadCapacityUnits': 1,
                    'WriteCapacityUnits': 1
                }
            }
        }
    ],
    # ...snip...
)
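As a rough sketch of how the request above could be assembled in full, the helpers below build the update_table arguments, including the AttributeDefinitions entry that a new index key needs, and check the index status in a describe_table response. The helper names (build_gsi_create_request, create_gsi, index_is_active), the default attribute type "S", and the capacity values are assumptions made for illustration; they are not part of boto3.

```python
from typing import Any, Dict


def build_gsi_create_request(
    table_name: str,
    index_name: str,
    key_attribute: str,
    attribute_type: str = "S",  # assumed: the key is a string attribute
    read_units: int = 1,
    write_units: int = 1,
) -> Dict[str, Any]:
    """Build keyword arguments for update_table that create one GSI."""
    return {
        "TableName": table_name,
        # The index key must be declared here as well as in KeySchema.
        "AttributeDefinitions": [
            {"AttributeName": key_attribute, "AttributeType": attribute_type}
        ],
        "GlobalSecondaryIndexUpdates": [
            {
                "Create": {
                    "IndexName": index_name,
                    "KeySchema": [
                        {"AttributeName": key_attribute, "KeyType": "HASH"}
                    ],
                    "Projection": {"ProjectionType": "ALL"},
                    # Only meaningful for tables in provisioned capacity mode.
                    "ProvisionedThroughput": {
                        "ReadCapacityUnits": read_units,
                        "WriteCapacityUnits": write_units,
                    },
                }
            }
        ],
    }


def create_gsi(client: Any, table_name: str, index_name: str, key_attribute: str) -> Any:
    """Send the request through a DynamoDB client passed in by the caller."""
    request = build_gsi_create_request(table_name, index_name, key_attribute)
    return client.update_table(**request)


def index_is_active(description: Dict[str, Any], index_name: str) -> bool:
    """Return True if a describe_table response reports the index as ACTIVE."""
    for gsi in description.get("Table", {}).get("GlobalSecondaryIndexes", []):
        if gsi.get("IndexName") == index_name:
            return gsi.get("IndexStatus") == "ACTIVE"
    return False
```

With a real client this would be called as create_gsi(boto3.client("dynamodb"), "YourTableName", "YourGSIName", "YourGSIFieldName"). Because the index is backfilled in the background, one option is to poll client.describe_table(TableName=...) and pass the response to index_is_active before querying the index.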

How do I configure OrientDB ETL to import an edge list with attributes

I have a CSV file that contains an edge list, one edge per row. It looks like this:
id1, id2, attr1, attr2, attrX, attrY, attrZ
From this, I want to be able to create (or update) the following, per row:
Vertex A of class X, with id1 and attribute attr1
Vertex B of class X, with id2 and attribute attr2
Edge A->B with edge attributes attrX, attrY, attrZ
This is the configuration file I'm feeding to oetl.sh (using OrientDB 2.2 beta2):
{
  "source": { "file": { "path": "/data/sample/test.csv" } },
  "extractor": { "row": {} },
  "transformers" :
  [
    { "csv" : {} },
    { "merge" : { "joinFieldName":"id1", "lookup":"X.id" } },
    { "vertex" : { "class" : "X", "skipDuplicates":true } },
    { "edge" : {
        "unresolvedLinkAction" : "WARNING",
        "class" : "EdgeTypeClass",
        "joinFieldName" : "id2",
        "lookup": "X.id",
        "edgeFields":{"attrX":"${input.attrX}", "attrY":"${input.attrY}","attrZ":"${input.attrZ}"}
      }
    },
    { "field" : { "fieldNames" : [ "id1", "id2", "attr1", "attr2", "attrX", "attrY", "attrZ" ], "operation": "remove" } }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "remote:localhost/test2",
      "dbType": "graph"
    }
  }
}
The sample data I used to run the test is as follows:
10,11,"A","B",100,200,1
11,12,"B","C",110,201,5
12,14,"C","D",90,250,10
14,13,"D","E",105,210,3
When I run the oetl.sh script with the given configuration and sample data, it creates 4 vertices instead of 5 and no edges. There are no attributes on the vertices at all.
So these are the questions:
Is there a way in the vertex clause to specify vertex attributes/fields the same way that one can do for edges (i.e. edgeFields)? The documentation doesn't mention anything about it but it seems odd that you wouldn't be able to do it.
Rather than relying on the edge to create the outbound vertex, should I instead be creating two vertices explicitly and if so how do I specify that in the configuration file? When I try to add two "vertex" clauses it only seems to pick up the last one as the "current" vertex.
It's possible that the specific edge (id1 -> id2) already exists. Is it possible to only update the edge attributes in this case?
My sinking feeling is that, given how many things I'm trying to pack into this, it will be simpler to write my own ETL (e.g. using the Java API) than to rely on oetl. I was hoping to avoid that, if only because an oetl configuration is more maintainable.

faunus script map completing but not mutating graph

Prelude: I have several months' experience using both Gremlin "dialects" for FaunusGraph and TitanGraph, so I'm well aware of the functional and syntactic differences. I have successfully used the Faunus script step (http://architects.dzone.com/articles/distributed-graph-computing , https://github.com/thinkaurelius/faunus/blob/master/src/main/java/com/thinkaurelius/faunus/mapreduce/sideeffect/ScriptMap.java) for relatively simple deletion and mutation of subgraphs.
Problem: I implemented a complex mutation script map to "move" edge properties to either the out-vertex or the in-vertex, following a direction-oriented convention for naming properties. My TitanGraph Gremlin prototype works on small graphs, but I can't get the scaled-up implementation to work: the map completes successfully but the graph isn't changed (I am committing the changes). NOTE: my Logger object only outputs the first INFO message, the one that displays the prefix args, which indicates that I'm not satisfying the edge namespace guard condition (I did a run without the condition, but nothing changed). My code follows (I'm retyping it by hand from an internal network, so typos are possible):
//faunus pipe driver - usage gremlin -e faunus.ns.set-props.grm
import java.io.Console
//get args
console=System.console()
arg=console.readLine('> type <namespace>;<faunus.ns.set-props.mapper_path>;<from_prefix>;<to_prefix> ')
inargs=arg.split(";")
//establish FaunusGraph connection
f=FaunusFactory.open('titan-client.properties')
f.getConf().set("faunus.graph.input.titan.storage.read-consistency-level", "ALL")
f.getConf().set("faunus.graph.input.titan.storage.write-consistency-level", "ALL")
//Faunus pipe incl. script step
f.V().has("_namespace", inargs[0]).script(inargs[1], inargs[2], inargs[3])
//script map - usage f.V().has("_namespace", <namespace_string>).script(<this_script_path>, <outV_key_prefix_string>, <inV_key_prefix_string>)
def g
def mylog
def setup(args) {
    mylog=java.util.logging.Logger.getLogger("script_map")
    println("configuring graph ...")
    conf=new BaseConfiguration()
    conf.setProperty("storage.backend", "cassandra")
    conf.setProperty("storage.keyspace", "titan")
    conf.setProperty("storage.index.index-name", "titan")
    conf.setProperty("storage.hostname", "localhost")
    g=TitanFactory.open(conf)
}
def map(v, args) {
    mylog.info("*****READ***** args: "+args[0].toString()+", "+args[1].toString())
    //fetch all edges incident on Titan vertex corresponding to incoming Faunus vertex
    gv=g.v(v.id)
    edges=gv.bothE();null
    //iterate through incident edges
    while(edges.hasNext()) {
        e=edges.next()
        if (e.hasProperty("_namespace")) { //_namespace removed from previously processed edges
            /*fetch terminal vertices of current edge, add incidence & adjacency props
            to support metrics and analytics
            */
            from=e.getVertex(OUT)
            from.setProperty("inV_degree", from.in().count())
            from.setProperty("inE_degree", from.inE().count())
            from.setProperty("outV_degree", from.out().count())
            from.setProperty("outE_degree", from.outE().count())
            to=e.getVertex(IN)
            to.setProperty("inV_degree", to.in().count())
            to.setProperty("inE_degree", to.inE().count())
            to.setProperty("outV_degree", to.out().count())
            to.setProperty("outE_degree", to.outE().count())
            mylog.info("*****READ*****edge id: "+e.id)
            mylog.info("*****READ*****edge vertices: from id: "+from.id+"; to id: "+to.id)
            //fetch property keys of current edge
            ekeys=e.getPropertyKeys()
            //iterate through edge property keys
            for(String ekey:ekeys) {
                eprop=e.getProperty(ekey) //get value of current property key
                goodprop=!(eprop == "" || eprop == null)
                mylog.info("*****READ*****edge key/value: "+ekey+"="+eprop)
                /*determine placement of current key/value on one or neither of the
                terminal vertices based on key prefix args and property value,
                remove prefix from re-assigned key/value
                */
                if(ekey.startsWith(args[0]) && goodprop) {
                    vkey=ekey.split(args[0])[1]
                    if(!from.hasProperty(vkey)) from.setProperty(vkey, eprop)
                    else {
                        vprop=from.getProperty(vkey)
                        if(!vprop.equals(eprop)) from.setProperty(vkey, vprop+";"+eprop)
                    }
                    mylog.info("*****READ*****from vertex key/value: "+vkey+"="+from.getProperty(vkey))
                }
                else if(ekey.startsWith(args[1]) && goodprop) {
                    vkey=ekey.split(args[1])[1]
                    if(!to.hasProperty(vkey)) to.setProperty(vkey, eprop)
                    else {
                        vprop=to.getProperty(vkey)
                        if(!vprop.equals(eprop)) to.setProperty(vkey, vprop+";"+eprop)
                    }
                    mylog.info("*****READ*****to vertex key/value: "+vkey+"="+to.getProperty(vkey))
                }
                //if current edge property key is re-assigned, remove it from the edge
                if(ekey.startsWith(args[0]) || ekey.startsWith(args[1])) {
                    e.removeProperty(ekey)
                    if(e.hasProperty(ekey)) println(ekey+" NOT removed from edge")
                    else println(ekey+" removed from edge")
                }
                e.removeProperty("_namespace") // marks edge as processed per outer loop guard
            }
        }
    }
    g.commit()
}
def cleanup(args) {
    g.shutdown()
}

This line:
if (e.hasProperty("_namespace")) {
hasProperty doesn't seem to be in the Blueprints API or the Titan API. Since that is the case, I'm not sure how this code worked in your smaller test db, as it will always resolve to false and you will never see the inside of that if statement:
gremlin> x = TinkerGraphFactory.createTinkerGraph()
==>tinkergraph[vertices:6 edges:6]
gremlin> v = x.V('name','marko').next()
==>v[1]
gremlin> if (v.hasProperty('name')) { true } else { false }
==>false
I suppose you really want to try this:
gremlin> if (v.getProperty('name')) { true } else { false }
==>true
gremlin> if (v.getProperty('xxx')) { true } else { false }
==>false
Note that this check relies on Groovy truth, so a property whose value is 0, false, or an empty string also evaluates to false. If you need a strict existence check, compare against null instead, e.g. v.getProperty('name') != null.