Find whether a vertex already has an edge to a specific vertex ID, and merge that into the result - gremlin

Consider a Facebook-style people-search results scenario. I want to get all the people from the database (hasLabel('person')). For each of these people, I want to know whether the logged-in person is already connected to them and follows them. What is the best way to get this in Gremlin (ideally avoiding duplication)?
g.addV('person').property('id',1).as('1').
addV('person').property('id',2).as('2').
addV('person').property('id',3).as('3').
addV('person').property('id',4).as('4').
addE('connected').from('1').to('2').
addE('connected').from('2').to('3').
addE('connected').from('3').to('1').
addE('connected').from('4').to('2').
addE('follows').from('1').to('2').
addE('follows').from('1').to('3').
addE('follows').from('1').to('4').
addE('follows').from('2').to('1').
addE('follows').from('2').to('3').
addE('follows').from('3').to('1').
addE('follows').from('3').to('4').
addE('follows').from('4').to('2').
addE('follows').from('4').to('3').iterate()
For instance, if the logged-in person id is 2, the formatted JSON response will be
[
{
"id": 1,
"follows": true,
"connected": true
},
{
"id": 3,
"follows": true,
"connected": false
},
{
"id": 4,
"follows": false,
"connected": true
}
]
and if the logged-in person id is 4
[
{
"id": 1,
"follows": false,
"connected": false
},
{
"id": 2,
"follows": true,
"connected": true
},
{
"id": 3,
"follows": true,
"connected": false
}
]
Note: The JSON responses are provided only to illustrate the expected outcome; I just want the Gremlin query that produces it.

Below is the general pattern you are looking for; however, based on the script you listed above and the direction of the edges, it is unclear exactly when to return true and when not to.
g.V().
  hasLabel('person').
  not(has('id', 2)).                //exclude the logged-in person (id 2)
  project('id', 'follows', 'connected').
    by('id').
    by(
      __.in('follows').
        has('id', 2).               //follow the inbound follows edges to see if any comes from person 2
        fold().                     //create a list (empty if no such edge)
        coalesce(unfold().constant(true), constant(false))).  //return true if the edge exists, else false
    by(
      __.out('connected').
        has('id', 2).
        fold().
        coalesce(unfold().constant(true), constant(false)))
Based on the script you provided, there is no way to get exactly the answers you asked for. Let's look at just the connected edges.
For vertex 2:
using in() we would get true for 1 and 4, and false for 3
using out() we would get true for 3, and false for 1 and 4
using both() all would be true
So, based on the results above, it looks like you want to use in() edges. However, when we apply that to vertex 4, all the results would be false.
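For completeness, the same pattern can be parameterized so the logged-in person's id is not hard-coded. This is just a sketch; meId is a hypothetical binding you would supply from your application (for example as a Gremlin binding or script parameter):
meId = 2   //hypothetical parameter: the logged-in person's id
g.V().
  hasLabel('person').
  not(has('id', meId)).
  project('id', 'follows', 'connected').
    by('id').
    by(__.in('follows').has('id', meId).
         fold().
         coalesce(unfold().constant(true), constant(false))).
    by(__.out('connected').has('id', meId).
         fold().
         coalesce(unfold().constant(true), constant(false)))
You would still need to settle the in()/out()/both() question above before relying on the connected flag.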

Related

OpenAI package leaving linebreak in response

I've started using the OpenAI API in R. I downloaded the openai package. I keep getting a double line break in the text response. Here's an example of my code:
library(openai)
vector = create_completion(
model = "text-davinci-003",
prompt = "Tell me what the weather is like in London, UK, in Celsius in 5 words.",
max_tokens = 20,
temperature = 0,
echo = FALSE
)
vector_2 = vector$choices[1]
vector_2$text
[1] "\n\nRainy, mild, cool, humid."
Is there a way to get rid of this without 'correcting' the response text using other functions?
No, it's not possible.
The OpenAI API returns the completion starting with \n\n by default. There is no parameter for the Completions endpoint to control this.
You need to remove the line breaks manually.
Example response looks like this:
{
"id": "cmpl-uqkvlQyYK7bGYrRHQ0eXlWi7",
"object": "text_completion",
"created": 1589478378,
"model": "text-davinci-003",
"choices": [
{
"text": "\n\nThis is indeed a test",
"index": 0,
"logprobs": null,
"finish_reason": "length"
}
],
"usage": {
"prompt_tokens": 5,
"completion_tokens": 7,
"total_tokens": 12
}
}
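For example, a minimal sketch of how you might strip the leading newlines in base R (vector_2 being the object from the question):
# Remove any leading newlines from the completion text
clean_text <- sub("^\n+", "", vector_2$text)
# or, equivalently, trim only the left side with base R's trimws()
clean_text <- trimws(vector_2$text, which = "left")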

How to group by parent and collect all property values of child in gremlin?

I want to collect all shows and their associated genres together. GENRE vertices are children of SHOW vertices.
Sample gremlin graph
So that the output is something similar to:
"1" [a,b]
"2" [c,d]
Sample graph: https://gremlify.com/x8i8stszn2
You can accomplish this using the project() step within Gremlin like this:
g.V("2789").out('WATCHED').hasLabel('SHOW').
project('show', 'genre').
by('NAME').
by(out('HAS_GENRE').values('NAME').fold())
This will return your data formatted like this:
[
{
"show": 1,
"genre": [
"a",
"b"
]
},
{
"show": 2,
"genre": [
"c",
"d"
]
}
]
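If you would rather have the output keyed by show, closer to the "1" [a,b] shape in the question, a group()-based variant should also work. This is a sketch under the same assumed labels and property names:
g.V("2789").out('WATCHED').hasLabel('SHOW').
  group().
    by('NAME').                                   //key: show name
    by(out('HAS_GENRE').values('NAME').fold())    //value: list of genre names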

Printing objects without ansible_host

I'm trying to use jq to process JSON from the VMware Ansible inventory module so I can produce a list of objects (VMs) that don't have ansible_host defined.
The closest I've been able to get is:
{
"config.cpuHotAddEnabled": true,
"config.cpuHotRemoveEnabled": false,
"config.hardware.numCPU": 1,
"config.instanceUuid": "500e4e98-50ec-a3a7-9d45-b0ac36c2d192",
"config.name": "tu-openldap-01",
"config.template": false,
"guest.guestId": "rhel6_64Guest",
"guest.guestState": "notRunning",
"guest.hostName": "tu-openldap-01",
"guest.ipAddress": null,
"name": "tu-openldap-01",
"runtime.maxMemoryUsage": 2048
}
{
"config.cpuHotAddEnabled": true,
"config.cpuHotRemoveEnabled": false,
"config.hardware.numCPU": 1,
"config.instanceUuid": "500efaa5-baac-163b-65c0-7ed2a19f1d7d",
"config.name": "tu1vcm7tst2001",
"config.template": false,
"guest.guestId": "rhel7_64Guest",
"guest.guestState": "running",
"guest.hostName": "rhel7-template",
"guest.ipAddress": null,
"name": "tu1vcm7tst2001",
"runtime.maxMemoryUsage": 4096
}
using the following:
jq '._meta.hostvars[] | select(.ansible_host | not)' prod-inventory_201905070920.json
This is almost what I want, but how do I print these objects together with the key for each object?
If I do:
jq '._meta.hostvars | select(.ansible_host | not)' prod-inventory_201905070920.json
I get these:
"tw1pttest1001_420e92f4-453e-1267-4331-d6253d771882": {
"ansible_host": "<omitted>",
"config.cpuHotAddEnabled": true,
"config.cpuHotRemoveEnabled": false,
"config.hardware.numCPU": 2,
"config.instanceUuid": "500ef630-16c1-cb91-be9c-e9e667b551d9",
"config.name": "tw1pttest1001",
"config.template": false,
"guest.guestId": "windows9Server64Guest",
"guest.guestState": "running",
"guest.hostName": "<omitted>",
"guest.ipAddress": "<omitted>",
"name": "tw1pttest1001",
"runtime.maxMemoryUsage": 49152
},
"tw1swsrm1001_420e18d2-0c96-0df5-e6c7-1ff8fc070cdb": {
"ansible_host": "<omitted>",
"config.cpuHotAddEnabled": true,
"config.cpuHotRemoveEnabled": false,
"config.hardware.numCPU": 4,
"config.instanceUuid": "500e231d-1eda-4e66-3f4a-8c68392a70b5",
"config.name": "tw1swsrm1001",
"config.template": false,
"guest.guestId": "windows9Server64Guest",
"guest.guestState": "running",
"guest.hostName": "<omitted>",
"guest.ipAddress": "<omitted>",
"name": "tw1swsrm1001",
"runtime.maxMemoryUsage": 16384
},
Any suggestions? I feel like it's something simple that I'm missing.
Assuming you are searching for items found in the ._meta.hostvars object, you can filter the objects by key/value by using something like with_entries/1.
$ jq '._meta.hostvars | with_entries(select(.value.ansible_host | not))' prod-inventory_201905070920.json
This will in effect take the hostvars object and only keep properties that match that condition (does not have an ansible_host property value).
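If you only need the names of the VMs that are missing ansible_host, rather than the full objects, a to_entries-based variant along the same lines should work too; a sketch, assuming the same input file:
$ jq -r '._meta.hostvars | to_entries[] | select(.value.ansible_host | not) | .key' prod-inventory_201905070920.json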

Understanding fold() and its impact on gremlin query cost in Azure Cosmos DB

I am trying to understand query costs in Azure Cosmos DB.
I cannot figure out what the difference is between the following examples and why using fold() lowers the cost:
g.V().hasLabel('item').project('itemId', 'id').by('itemId').by('id')
which produces the following output:
[
{
"itemId": 14,
"id": "186de1fb-eaaf-4cc2-b32b-de8d7be289bb"
},
{
"itemId": 5,
"id": "361753f5-7d18-4a43-bb1d-cea21c489f2e"
},
{
"itemId": 6,
"id": "1c0840ee-07eb-4a1e-86f3-abba28998cd1"
},
....
{
"itemId": 5088,
"id": "2ed1871d-c0e1-4b38-b5e0-78087a5a75fc"
}
]
The cost is 15642 RUs x 0.00008 $/RU = 1.25$
g.V().hasLabel('item').project('itemId', 'id').by('itemId').by('id').fold()
which produces the following output:
[
[
{
"itemId": 14,
"id": "186de1fb-eaaf-4cc2-b32b-de8d7be289bb"
},
{
"itemId": 5,
"id": "361753f5-7d18-4a43-bb1d-cea21c489f2e"
},
{
"itemId": 6,
"id": "1c0840ee-07eb-4a1e-86f3-abba28998cd1"
},
...
{
"itemId": 5088,
"id": "2ed1871d-c0e1-4b38-b5e0-78087a5a75fc"
}
]
]
The cost is 787 RUs x 0.00008$/RU = 0.06$
g.V().hasLabel('item').values('id', 'itemId')
with the following output:
[
"186de1fb-eaaf-4cc2-b32b-de8d7be289bb",
14,
"361753f5-7d18-4a43-bb1d-cea21c489f2e",
5,
"1c0840ee-07eb-4a1e-86f3-abba28998cd1",
6,
...
"2ed1871d-c0e1-4b38-b5e0-78087a5a75fc",
5088
]
cost: 10639 RUs x 0.00008 $/RU = 0.85$
g.V().hasLabel('item').values('id', 'itemId').fold()
with the following output:
[
[
"186de1fb-eaaf-4cc2-b32b-de8d7be289bb",
14,
"361753f5-7d18-4a43-bb1d-cea21c489f2e",
5,
"1c0840ee-07eb-4a1e-86f3-abba28998cd1",
6,
...
"2ed1871d-c0e1-4b38-b5e0-78087a5a75fc",
5088
]
]
The cost is 724.27 RUs x 0.00008 $/RU = 0.057$
As you can see, the impact on the cost is tremendous.
This is just for approximately 3200 nodes with a few properties each.
I would like to understand why adding fold() changes the cost so much.
I was trying to reproduce your example, but unfortunately got the opposite results (500 vertices in Cosmos):
g.V().hasLabel('test').values('id')
or
g.V().hasLabel('test').project('id').by('id')
gave 86.08 and 91.44 RU respectively, while the same queries followed by a fold() step resulted in 585.06 and 590.43 RU.
This result seems reasonable, since according to the TinkerPop documentation:
There are situations when the traversal stream needs a "barrier" to aggregate all the objects and emit a computation that is a function of the aggregate. The fold()-step (map) is one particular instance of this.
Since Cosmos DB charges RUs both for the number of objects accessed and for the computations done on those objects (fold() in this particular case), the higher cost for fold() is expected.
You can try running the executionProfile() step on your traversal, which can help you investigate your case. When I tried:
g.V().hasLabel('test').values('id').executionProfile()
I got 2 additional steps for the fold() version (identical parts of the output omitted for brevity), and this ProjectAggregation is where the result set was mapped from 500 down to 1:
...
{
"name": "ProjectAggregation",
"time": 165,
"annotations": {
"percentTime": 8.2
},
"counts": {
"resultCount": 1
}
},
{
"name": "QueryDerivedTableOperator",
"time": 1,
"annotations": {
"percentTime": 0.05
},
"counts": {
"resultCount": 1
}
}
...
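For comparison, you could profile the fold() variant the same way; a sketch, using the same hypothetical 'test' label:
g.V().hasLabel('test').values('id').fold().executionProfile()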

How to process output from match function in jq?

I'm using the jq tool to parse some JSON strings. My minimal example is the following command:
echo '"foo foo"' | jq 'match("(foo)"; "g")'
Which results in the following output:
{
"offset": 0,
"length": 3,
"string": "foo",
"captures": [
{
"offset": 0,
"length": 3,
"string": "foo",
"name": null
}
]
}
{
"offset": 4,
"length": 3,
"string": "foo",
"captures": [
{
"offset": 4,
"length": 3,
"string": "foo",
"name": null
}
]
}
I want my final output for this example to be:
"foo,foo"
But in this case I get two separate objects instead of an array or similar that I could call implode on. I guess either the API isn't made for my use case or my understanding of it is very wrong. Please advise.
The following script takes the string value from each of the separate objects with .string, wraps them in an array [...] and then joins the members of the array with commas using join.
I modified the regex because you didn't actually need a capture group for the given use case, but if you wanted to access the capture groups you could do .captures[].string instead of .string.
echo '"foo foo"' | jq '[match("foo"; "g").string] | join(",")'
