Problem finding the right syntax for group() and by() commands - gremlin

I have the following Gremlin code:
g.V('body4608', 'body491')
.repeat(
__.as('body')
.in().aggregate('allVertices')
.bothE().as('edges')
.otherV().as('otherV')
.where('otherV', within('allVertices')).as('vertices')
.dedup()
)
.until(
__.count().is(1))
// .group()
// .by(__.select('body').id())
// .by(select('vertices').id().dedup().fold()).as('gVertices')
.group()
.by(__.select('body').id())
.by(select('edges').label().fold()).as('gEdges')
Each of the group() steps works on its own. For the 'edges' group, the output looks like this:
{'body4608': ['add', 'add', 'start', 'push', 'push_task_through', 'add'],
'body491': ['can_claim_task_for', 'is_in', 'Additionally_should_familiarize', 'can_claim_task_for', 'are_assigned_to', 'are', 'are_assigned_to', 'completes', 'can_use']}
What I would like to achieve is to add the values from the group that is currently commented out to the list of words from the 'edges' group, so that I end up with one combined list of words for each 'body'.

Related

Returning multiple values from one step using 'select', 'by' in Gremlin

My graph schema looks like this:
(Location)<-[:INVENTOR_LOCATED_IN]-(Inventor)-[:INVENTOR_OF]->(Patent)
I'm trying to return multiple values from each step in the query path. Here's the query I have so far that runs correctly:
g.V().and(has('Location', 'city', textContains('Bloomington')), has('Location','state',textContains('IN'))).as('a').
bothE().bothV().hasLabel('Inventor').as('b').
bothE().bothV().has('Patent', 'title', textContains('Lid')).as('c').
select('a', 'b', 'c').
by('state').by('name_first').by('title').
fold();
What I would like to do is return two node properties for each step. I tried the following, but it returns an error:
g.V().and(has('Location', 'city', textContains('Bloomington')), has('Location', 'state',textContains('IN'))).as('a').
bothE().bothV().hasLabel('Inventor').as('b').
bothE().bothV().has('Patent', 'title', textContains('Lid')).as('c').
select('a', 'b', 'c').
by('city', 'state').by('name_first', 'name_last').by('title', 'abstract').
fold();
Can anyone suggest syntax that will allow me to return multiple properties from each node in the path?
by(key) is meant as a shorthand for values(key), so if you need more than one value per element you can pass a traversal to by() instead:
g.V().and(has('Location', 'city', textContains('Bloomington')), has('Location', 'state',textContains('IN'))).as('a').
bothE().bothV().hasLabel('Inventor').as('b').
bothE().bothV().has('Patent', 'title', textContains('Lid')).as('c').
select('a', 'b', 'c').
by(values('city', 'state').fold()).
by(values('name_first', 'name_last').fold()).
by(values('title', 'abstract').fold()).
fold()
You might also consider forms of elementMap(), valueMap(), or project() as alternatives. Since by() takes a Traversal you have a lot of flexibility.

Use dictionary keys and associated values as wildcards in snakemake

I have a great number of analyses that need to be done in one go, and thus I thought that I can make a dictionary and parse the keys and values as wildcards (every snakemake run needs two wildcards to be used).
My dict will look like this:
myDict = {
    "Apple": ["fruity", "red", "green"],
    "Banana": ["fruity", "yellow"]
}
Here the first key in the dictionary will be wildcard1, here {Apple}, with its first value as wildcard2, here {fruity}, and snakemake runs with these two until the final rule has been run.
Then the same key will again be used ({Apple} as wildcard1) with the second associated value, here {red}, as wildcard2, and run snakemake until the last rule has been run.
Then after the final value belonging to {Apple} has been used as wildcard2, switch over to {Banana} as wildcard1 with its first value, {fruity} as wildcard2.
This will go on until all keys and their associated values have been used as wildcards and snakemake will stop. (That is keys as wildcard1, and their values as wildcard2).
My question is if this is possible, and if so, how can I achieve that?
I bet there is a way to do it with a single expand, but you can use a more verbose list comprehension. I'll take the files to be {wc1}_{wc2}.out for wildcards 1 and 2. Then you have
myDict = {
    "Apple": ["fruity", "red", "green"],
    "Banana": ["fruity", "yellow"]
}
inputs = [expand('{wc1}_{wc2}.out',
                 wc1=key, wc2=value)
          for key, value in myDict.items()]
# inputs = [['Apple_fruity.out', 'Apple_red.out', 'Apple_green.out'], ['Banana_fruity.out', 'Banana_yellow.out']]
rule all:
    input: inputs
Edited to address comment:
To make two lists, keys and values, you can use
keys = []
values = []
for key, value in myDict.items():
    for v in value:
        keys.append(key)
        values.append(v)
print(keys)    # ['Apple', 'Apple', 'Apple', 'Banana', 'Banana']
print(values)  # ['fruity', 'red', 'green', 'fruity', 'yellow']
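In a Snakefile, these two parallel lists can then be paired element by element with expand('{wc1}_{wc2}.out', zip, wc1=keys, wc2=values). As a minimal sketch of that pairing in plain Python, runnable without snakemake (the names my_dict, pairs and targets are only illustrative):

```python
my_dict = {
    "Apple": ["fruity", "red", "green"],
    "Banana": ["fruity", "yellow"],
}

# One (key, value) pair per run, in dictionary order.
pairs = [(key, value) for key, values in my_dict.items() for value in values]

# Target file names following the {wc1}_{wc2}.out pattern assumed above.
targets = [f"{wc1}_{wc2}.out" for wc1, wc2 in pairs]

print(targets)
```

Unlike the nested list in the first snippet, this gives one flat list with exactly one target per key/value pair.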

Gremlin: inject() and has() not working together as expected

I need to create vertices without duplication based on a list passed to inject(). The list is very large, so I need to use inject(). I tried this but it didn't work:
g.inject(["Macka", "Pedro", "Albert"]).unfold().map(
coalesce(
V().has("name", identity()),
addV("user").property("name", identity())
)
)
You can try here:
https://gremlify.com/765qiupxinw
Why doesn't this work?
It seems that V().has() is returning all vertices, why?
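has(key, traversal) applies the traversal to each vertex's own property value, and identity() always yields a result, so every vertex that has a name property passes the filter. I think in this case you should use the where() step rather than has():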
g.inject(["Macka", "Pedro", "Albert"]).unfold().as('n').map(
coalesce(
V().where(eq('n')).by('name').by(),
addV("user").property("name", identity())
)
)
example: https://gremlify.com/06q0zxgd2uam
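A rough plain-Python analogy of the two filters and of the coalesce() get-or-create pattern may help. Dicts stand in for vertices, and the helper names are made up for illustration; none of this is Gremlin API:

```python
existing = [{"label": "user", "name": "Pedro"}]
incoming = ["Macka", "Pedro", "Albert"]


def has_with_identity(vertices):
    # Analogy for has("name", identity()): identity() is evaluated on each
    # vertex's own "name" value and always yields a result, so the incoming
    # name is never compared and every vertex with a name passes.
    return [v for v in vertices if "name" in v]


def where_eq(vertices, name):
    # Analogy for where(eq('n')).by('name').by(): compare each vertex's
    # "name" with the incoming value labelled 'n'.
    return [v for v in vertices if v.get("name") == name]


def get_or_create(vertices, name):
    # Analogy for coalesce(lookup, addV(...)): take the first match,
    # otherwise create a new vertex.
    matches = where_eq(vertices, name)
    if matches:
        return matches[0]
    vertex = {"label": "user", "name": name}
    vertices.append(vertex)
    return vertex


for name in incoming:
    get_or_create(existing, name)

names = [v["name"] for v in existing]
print(names)
```

After the loop, "Pedro" still appears only once, while "Macka" and "Albert" have been added.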

Union step does not work with multiple elements

The following query returns a user map with an "injected" property called "questions". It works as expected when g.V().has() returns a single user, but not when it returns multiple users:
return g.V().has("user", "userId", 1)
.union(
__.valueMap().by(__.unfold()),
__.project('questions').by(
__.outE('response').valueMap().by(__.unfold()).fold()
)
)
.unfold()
.group()
.by(__.select(column.keys))
.by(__.select(column.values));
It works, but when I change the first line to return multiple users:
g.V().hasLabel("user").union(....
I finish the query by calling .toList(), so I expected to get a list of all the users, each shaped the same way as in the single-user case. Instead I still get a single user.
How can I get my query to work for both, multiple users or a single user?
When using Gremlin, you have to think in terms of a stream. The stream contains traversers which travel through the steps you've written. In your case, with your initial test of:
g.V().has("user", "userId", 1)
.union(
__.valueMap().by(__.unfold()),
__.project('questions').by(
__.outE('response').valueMap().by(__.unfold()).fold()
)
)
.unfold()
.group()
.by(__.select(column.keys))
.by(__.select(column.values))
you have one traverser (i.e. V().has("user", "userId", 1) produces one user) that flows to the union() and is split so that it goes to both valueMap() and project() both producing Map instances. You now have two traversers which are unfolded to a stream and grouped together to one final Map traverser.
So with that in mind, what changes when you do hasLabel("user")? Well, you now have more than one starting traverser, which means you will produce two traversers for each of those users when you get to union(). They will each be flattened to a stream by unfold() and then they will just overwrite one another (because they have the same keys) to produce one final Map.
You really want to execute your union() and follow-on operations once per initial "user" vertex traverser. You can tell Gremlin to do that with map():
g.V().has("user", "userId", 1)
.map(
__.union(
__.valueMap().by(__.unfold()),
__.project('questions').by(
__.outE('response').valueMap().by(__.unfold()).fold()
)
)
.unfold()
.group()
.by(__.select(column.keys))
.by(__.select(column.values))
)
Finally, you can simplify your final by() modulators as:
g.V().has("user", "userId", 1)
.map(
__.union(
__.valueMap().by(__.unfold()),
__.project('questions').by(
__.outE('response').valueMap().by(__.unfold()).fold()
)
)
.unfold()
.group()
.by(keys)
.by(values)
)
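The difference between grouping the whole stream and grouping once per user can be mimicked in plain Python. This is only an analogy under made-up sample data (the users list and the entries helper are hypothetical), not how Gremlin is implemented:

```python
# Hypothetical users; each dict plays the role of the merged
# valueMap() + project('questions') output for one vertex.
users = [
    {"userId": 1, "name": "Ann", "questions": ["q1"]},
    {"userId": 2, "name": "Bob", "questions": ["q2", "q3"]},
]


def entries(user):
    # Mimics union(...).unfold(): emit the user's map entries one by one.
    return list(user.items())


# Without map(): the entries of all users share one stream and are grouped
# into ONE map, so entries with the same key collide and a single value per
# key survives (which one wins is not the point here).
merged = {}
for user in users:
    for key, value in entries(user):
        merged[key] = value

# With map(): the grouping runs once per user, giving one map per user.
per_user = [dict(entries(user)) for user in users]

print(len(merged), len(per_user))
```

The merged result holds only one user's worth of keys, while the per-user result keeps one complete map for each starting vertex.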

U-SQL: How can I execute a LINQ solution that returns a Dictionary?

I have a list of strings and, for each group defined by another column, I want to get the element that occurs most often. I'm trying to do this with a LINQ expression, but it doesn't work. Is it possible to run the code shown below?
#test=(from a in #data
group a by new {a.PostCode}
into obj
select obj).ToDictionary(x => x.Key,x=>x.ToList()
.Select(y=>y.Statistic).GroupBy(s => s)
.OrderByDescending(s => s.Count())
.First().Key);
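For reference, the computation the query appears to aim for (the most frequent Statistic per PostCode) can be sketched in plain Python. The sample rows are made up; only the column names come from the query above:

```python
from collections import Counter, defaultdict

# Made-up sample rows; only the column names come from the query above.
data = [
    {"PostCode": "A1", "Statistic": "x"},
    {"PostCode": "A1", "Statistic": "y"},
    {"PostCode": "A1", "Statistic": "x"},
    {"PostCode": "B2", "Statistic": "z"},
]

# Group the Statistic values by PostCode.
by_postcode = defaultdict(list)
for row in data:
    by_postcode[row["PostCode"]].append(row["Statistic"])

# For each PostCode keep the Statistic that occurs most often.
most_frequent = {
    postcode: Counter(stats).most_common(1)[0][0]
    for postcode, stats in by_postcode.items()
}

print(most_frequent)  # {'A1': 'x', 'B2': 'z'}
```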
