Gremlin - search for multiple substrings - gremlin

For a given vertex, I can check whether a property myproperty contains a single substring substring1, like so:
g.V(993280096)
.filter({it.get().value("myproperty").contains("substring1")})
How can I extend this to search for multiple substrings in the same query?
Something along the lines of:
g.V(993280096)
.filter({ it.get().value("myproperty")
.contains(or("substring1", "substring2"))})
And is there a better way to do this instead of using lambda expressions? Note that I do not want to use the graph database (in my case JanusGraph) builtins because I'm using gremlin-python.

You can use the new text filter predicates. On the modern sample graph you could do this for example:
gremlin> TinkerFactory.createModern().traversal().V().
has("name", containing("ark").or(containing("os"))).values("name")
==>marko
==>josh

Just after posting I identified a solution (though I don't know if this is the best way) using matches instead of contains:
g.V(993280096)
.filter({ it.get().value("myproperty").matches(".*substring1.*|.*substring2.*")})
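Since String.matches() has to match the whole value, each alternative needs to be wrapped in ".*". As a rough sketch (plain Python, not gremlin-python; the helper names build_matches_pattern and contains_any are made up for illustration), the pattern can be generated from a list of substrings and checked against what a plain "contains any" test would return. Note that Python's re.escape is only assumed to be close enough to Java regex escaping for simple literals.

```python
import re

def build_matches_pattern(substrings):
    """Build a full-match regex that is true when any substring occurs."""
    # The whole value must match, so each alternative is wrapped in ".*";
    # re.escape keeps regex metacharacters in the substrings literal.
    return "|".join(".*" + re.escape(s) + ".*" for s in substrings)

def contains_any(value, substrings):
    """Plain-Python equivalent of the check the regex performs."""
    return any(s in value for s in substrings)

wanted = ["substring1", "substring2"]
pattern = build_matches_pattern(wanted)
print(pattern)
for value in ["has substring2 inside", "nothing relevant"]:
    # Both columns should agree for every value.
    print(value, bool(re.fullmatch(pattern, value)), contains_any(value, wanted))
```

The generated string could then be pasted into the matches(...) call of the lambda.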

Related

Displaying level in gremlin query

I am executing the gremlin query as follows:
g.V().hasLabel('A').has('label_A','A').emit().repeat(outE().inV()).valueMap()
Getting the desired output of nodes at multiple levels.
Along with the properties, I want to add a level property to the output. How can I achieve it?
Adding another answer to point out you can avoid sack using loops as an alternative.
g.V().hasLabel('A').has('label_A','A').
emit().
repeat(group('x').by(loops()).by(valueMap().fold()).out()).
cap('x')
You can use withSack for depth:
g.withSack(0).V().hasLabel('A').has('label_A','A').emit().
repeat(sack(sum).
by(constant(1)).
out()).
project('depth', 'properties').
by(sack()).
by(valueMap())
example: https://gremlify.com/ca32zczgvtkh6
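To make the idea behind both answers concrete, here is a minimal plain-Python sketch (not Gremlin) of what "depth" means in these traversals: the start vertex is at depth 0 and each out() step adds 1. The adjacency dict and the function name nodes_with_depth are assumptions for illustration, and the graph is assumed to be acyclic.

```python
from collections import deque

def nodes_with_depth(adjacency, start):
    """Return (depth, node) for start and everything reachable via out-edges."""
    # Like emit().repeat(out()), every path is followed, so a node reachable
    # by two paths would be reported twice; the sample graph here is a tree.
    queue = deque([(0, start)])
    results = []
    while queue:
        depth, node = queue.popleft()
        results.append((depth, node))
        for child in adjacency.get(node, []):
            queue.append((depth + 1, child))
    return results

graph = {"A": ["B", "C"], "B": ["D"], "C": [], "D": []}
for depth, node in nodes_with_depth(graph, "A"):
    print(depth, node)
```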

Can I extract all matches with functions like aregexec?

I've been enjoying the powerful function aregexec that allows me to mine strings in a fuzzy way.
For that I can search for a string of nucleotide "ATGGCTTCGTC" within a DNA section with defined allowance of insertion, deletion and substitute.
However, it only shows me the first match and does not go through the rest of the string. For example,
If I run
aregexec("a","adfasdfasdfaa")
only the first "a" will show up from the result. I'd like to see all the matches.
I wonder if there are other, more powerful functions, or an argument that can be added to this one.
Thank you very much.
P.S. I explained the fuzzy search poorly. I mean that the match doesn't have to be perfect. Say I allow a substitution of one character and search for AATTGG in ctagtactaAATGGGatctgct: the capitalized part will be considered a match. I can similarly allow insertions and deletions of a certain number of characters.
gregexpr reports every occurrence of the pattern in the string, as in this example:
gregexpr("as","adfasdfasdfaa")
There is much more information under ?grep in R, which explains every aspect of using regex.
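For illustration only, here is a small Python sketch (not R) of the two ideas in this thread: listing every exact match, and listing every approximate match when only substitutions are allowed. The function names are made up, offsets are 0-based (R reports 1-based positions), and insertions and deletions are deliberately not handled, so this is a much simpler technique than what aregexec does.

```python
import re

def exact_matches(pattern, text):
    """Return 0-based start offsets of every non-overlapping regex match."""
    return [m.start() for m in re.finditer(pattern, text)]

def substitution_matches(pattern, text, max_subs):
    """Return (start, window) for windows differing in at most max_subs positions."""
    hits = []
    width = len(pattern)
    # Slide a window of the pattern's length over the text and count mismatches.
    for start in range(len(text) - width + 1):
        window = text[start:start + width]
        mismatches = sum(1 for p, t in zip(pattern, window) if p != t)
        if mismatches <= max_subs:
            hits.append((start, window))
    return hits

print(exact_matches("as", "adfasdfasdfaa"))
print(substitution_matches("AATTGG", "ctagtactaAATGGGatctgct", 1))
```

The comparison is case-sensitive, which is why only the capitalized stretch from the question qualifies.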

How, with Gremlin, to return properties from in-vertices the same as I do from out-vertices? (Not as arrays)

I'm trying to start traversing from one set of labelled vertices, then get all their in-vertices connected by a particular kind of edge, then from there, return a property of those in-vertices as objects. I can do this same thing with some out-vertices starting from the same set of labelled vertices with no problem, but get a "The provided traverser does not map to a value:" error when I attempt it with some in-vertices.
I have found a workaround, but it is not ideal, as it returns the desired property values as arrays of length one.
Here is how I do the very similar task successfully with out-vertices:
g.V().hasLabel('TestCenter').project('address').by(out('physical').project('street').by(values('street1')))
This returns things like
==>{address={street=561 PLACE DE CEDARE}}
==>{address={street=370 N BLACK STATION AVE}}
This is great!
Then I try the same sort of query with some in-vertices, like this:
g.V().hasLabel('TestCenter').project('host').by(__.in('hosts').project('aCode').by(values('code')))
and get the above mentioned error.
The workaround I've been able to find is to add a .fold() to the final "by" like this:
g.V().hasLabel('TestCenter').project('host').by(__.in('hosts').project('aCode').by(values('code')).fold())
but then my responses are like this
==>{host=[{aCode=7387}]}
==>{host=[{aCode=9160}]}
What I would like is a response looking like this:
==>{host={aCode=4325}}
==>{host={aCode=1234}}
(Note: I am not sure if this is relevant, but I am connecting Gremlin to a Neptune DB Instance)
Judging from the error above and your workaround, it seems that not all of your 'TestCenter' vertices have an incoming 'hosts' edge. When using project, each by has to map to a valid value.
You can do one of two things:
1) Make sure a value is always returned in the project:
g.V().hasLabel('TestCenter').project('host')
.by(coalesce(__.in('hosts').project('aCode').by(values('code')), constant('empty')))
2) Filter out the vertices that have no such edge:
g.V().hasLabel('TestCenter').where(__.in('hosts'))
.project('host').by(__.in('hosts').project('aCode').by(values('code')))
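The difference between the two options can be sketched in plain Python (not Gremlin). The sample data, the 'hosts_in' key, and the function names below are all hypothetical; the point is only that option 1 keeps every TestCenter and substitutes a constant, while option 2 drops the ones without a host.

```python
centers = [
    {"name": "tc1", "hosts_in": [{"code": 4325}]},
    {"name": "tc2", "hosts_in": []},  # no incoming 'hosts' edge
]

def project_with_default(center, default="empty"):
    # Mirrors coalesce(...): use the first host if present, else a constant.
    hosts = center["hosts_in"]
    return {"host": {"aCode": hosts[0]["code"]} if hosts else default}

def project_filtered(all_centers):
    # Mirrors where(__.in('hosts')): drop centers without a host first.
    return [
        {"host": {"aCode": c["hosts_in"][0]["code"]}}
        for c in all_centers
        if c["hosts_in"]
    ]

print([project_with_default(c) for c in centers])
print(project_filtered(centers))
```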

Marklogic: Find documents containing elements without a particular attribute (maybe many per document)

I have some data which looks something like this:
<wrapper>
<inner a="1"/>
<inner a="2" b="3"/>
</wrapper>
The attribute b may or may not be present on each inner element. My aim is to find all documents containing at least one inner element that doesn't have attribute b.*
This similar question proposes the answer:
cts:not-query(cts:element-attribute-value-query(xs:QName('inner'), xs:QName('b'), '*', ("wildcarded")))
but that doesn't work, because some inner elements on the same document may have attribute b, and not-queries work on the entire fragment, so a mixed case like the example above would not be returned. Wrapping it in an element-query doesn't help, and cts:and-not-query seems to behave the same way.
I have also tried attacking the problem using co-occurrence/values functions to read the values of the relevant attribute a, but that also seems to be impossible. It might have been possible with proximity settings on the co-occurrence calls, except that there is no element text, so the attributes are indexed with the same word positions.
Are there any alternatives to the blunt xpath?
//inner[@a and not(@b)]
You can always make the xpath more complicated if simplicity isn't your goal.
How about this one? It more accurately answers the exact question of 'return all documents that contain inner elements that do not have an attribute @b':
doc()[exists(//inner[not(@b)])]
I do not know how well this is optimized -- some xpath expressions optimize down to the equivalent cts: query and some do not.
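As a sanity check of what that predicate selects, here is a small Python sketch using the standard library's ElementTree (not MarkLogic, and with no index involved). The function name has_inner_without_b is made up for illustration; it returns True when a document has at least one inner element lacking attribute b.

```python
import xml.etree.ElementTree as ET

def has_inner_without_b(xml_text):
    """True if any <inner> element in the document has no 'b' attribute."""
    root = ET.fromstring(xml_text)
    return any("b" not in el.attrib for el in root.iter("inner"))

mixed = '<wrapper><inner a="1"/><inner a="2" b="3"/></wrapper>'
all_b = '<wrapper><inner a="1" b="9"/><inner a="2" b="3"/></wrapper>'
print(has_inner_without_b(mixed))
print(has_inner_without_b(all_b))
```

The mixed document from the question is the case that a fragment-level not-query misses.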
There is another 'trick' involving combining cts expressions represented as maps. Take the results of 2 searches, use the options that return the results as a map, then you can use the operations on this page https://developer.marklogic.com/blog/im-a-map to do extremely efficient set operations (union, intersection, difference etc). When properly constructed, this technique can be as fast as 'native' cts searches --- the cts searches use the same general technique internally for resolving results.
Make the XPath a path range index. //inner[@a and not(@b)], or if there's no element text, //inner[@a and not(@b)]/@a, then do
cts:path-range-query('//inner[@a and not(@b)]/@a','>','')
This happens to also allow us to efficiently answer the question of which @a values have a missing @b, using cts:values.
cts:not-in-query has the necessary behaviour to make this work where cts:and-not-query doesn’t. E.g.
cts:not-in-query(
cts:element-query(xs:QName('inner'), cts:true-query()),
cts:element-attribute-value-query(xs:QName('inner'), xs:QName('b'),'*','wildcarded')
)
Finds all ‘inner’ elements at positions that do not match the positions of ‘inner’ elements with attribute b.
Element position index must be enabled. Wildcard index must be enabled.
http://docs.marklogic.com/cts:not-in-query

Graphite: recursive descent of nodes in graphs / functions?

A follow-up question from Graphite: sum all stats that match a pattern?:
Is there any Graphite magic to recursively descend node names? I now know that I can use patterns like so:
stats.timers.api.*.200.count
... but imagine that I have the following:
stats.timers.api.foo.bar.200.count
stats.timers.api.baz.200.count
I'd like to see both of those stats (and all others of arbitrary depth) on the same chart. I tried the following:
stats.timers.api.*.200.count
stats.timers.api.**.200.count
The former only shows me items like the 'baz' example above; the latter is an error.
Is there some other way to match metrics in a depth-insensitive manner?
A neater, single line version of dannyla's answer would be:
stats.timers.api.{*,*.*,*.*.*}.200.count
But the short answer to your question would be no, there's no magic to recursively descend node names.
I know it's not 100% what you are after, but you can have multiple targets on the same graph.
You could just combine the targets below on the same graph to get the results, although it's not the clean solution you're after.
stats.timers.api.*.200.count
stats.timers.api.*.*.200.count
stats.timers.api.*.*.*.200.count
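Since both answers boil down to enumerating one wildcard path per depth, the targets can be generated rather than typed by hand. This is a minimal Python sketch; the function names and the max_depth parameter are assumptions for illustration, and the resulting strings still have to be passed to Graphite as ordinary targets.

```python
def depth_targets(prefix, suffix, max_depth):
    """One target per depth: prefix.*.suffix, prefix.*.*.suffix, ..."""
    return [
        ".".join([prefix] + ["*"] * depth + [suffix])
        for depth in range(1, max_depth + 1)
    ]

def brace_target(prefix, suffix, max_depth):
    """Single-line form using a {a,b,c} value list of wildcard runs."""
    alternatives = ",".join(
        ".".join(["*"] * depth) for depth in range(1, max_depth + 1)
    )
    return f"{prefix}.{{{alternatives}}}.{suffix}"

for target in depth_targets("stats.timers.api", "200.count", 3):
    print(target)
print(brace_target("stats.timers.api", "200.count", 3))
```

Either form only covers depths up to max_depth, so it is still not a true recursive match.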

Resources