Gremlin: removing duplicate values while using as() - azure-cosmosdb

This is my graph example:
The picture shows my example graph. The question is: how do I get friend suggestions for jetse?
Here is the vertex and edge creation for this example
I tried different queries. Sometimes I get a column reference error. I found a solution, but I'm sure it's not the optimal one.
Starting vertex: jetse
All vertices have a property called fname.
Edges: Works, KNOWS, Teaches, Enroll
What I'm trying to do is: jetse works at itph, so get the people who work with jetse (stef), then the people who know stef (remsey), then the students enrolled in the course that remsey teaches.
The problem shows up when I output the result:
stef
remsey
Omar
stef
remsey
ufuk1
This happens because both result rows carry the same labelled variables (stef and remsey) with them.
I want the result to look like this:
Stef Remsey Omar ufuk1 ufuk2
My solution is:
g.V('jetse').as('exclude').
  out('works').
  in().as('sug').
  where(neq('exclude')).
  in('knows').as('b').
  out('teaches').
  in('enroll').as('std').
  union(select('sug').by('fname'),
        select('b').by('fname'),
        select('std').by('fname')).
  dedup()
Do you have a better query for this?

Thanks to tiny-wa for giving me confidence in my answer.
g.V('jetse').as('exclude').
  out('works').
  in().as('sug').
  where(neq('exclude')).
  in('knows').as('b').
  out('teaches').
  in('enroll').as('std').
  union(select('sug').by('fname'),
        select('b').by('fname'),
        select('std').by('fname')).
  dedup()
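To see why union(...) followed by dedup() gives the flat list, here is a minimal Python sketch of the same idea. It is only a model, not how the Gremlin engine works internally: each dictionary stands for one result row with its as() labels, and the helper names union_select and dedup are made up for this illustration. The two rows are the ones shown in the question's output.

```python
# Each dict models one traverser's labelled path values (from the question's output).
paths = [
    {"sug": "stef", "b": "remsey", "std": "Omar"},
    {"sug": "stef", "b": "remsey", "std": "ufuk1"},
]

def union_select(rows, labels):
    # Like union(select(l1), select(l2), ...): emit each labelled value separately.
    for row in rows:
        for label in labels:
            yield row[label]

def dedup(values):
    # Like dedup(): keep only the first occurrence of each value, preserving order.
    seen = set()
    for value in values:
        if value not in seen:
            seen.add(value)
            yield value

names = list(dedup(union_select(paths, ["sug", "b", "std"])))
print(names)  # ['stef', 'remsey', 'Omar', 'ufuk1']
```

Flattening first and deduplicating afterwards is what removes the repeated stef and remsey values that each row carries along.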

Related

How do I do simple calculations in CosmosDB using the Gremlin API

I am using CosmosDB with the Gremlin API and I would like to perform a simple calculation even though CosmosDB does not support the math step.
Imagine that I have a vertex "Person" with the property Age that can have an edge "Owns" to another vertex "Pet" that also has the property Age.
I would like to know if a given person has a cat that is younger than the person but not more than 10 years younger.
The query (I know this is just a part of it, but this is where my problem is):
g.V().hasLabel("Person").has("Name", "Jonathan Q. Arbuckle").as("owner").values("age").inject(-10).sum().as("minAge").select("owner")
Returns an empty result but
g.V().hasLabel("Person").has("Name", "Jonathan Q. Arbuckle").as("owner").values("age").inject(-10).as("minAge").select("owner")
Returns the selected owner.
It seems that if I do a sum() or a count() in the query, then I cannot do select("owner") anymore.
I do not understand this behaviour. What should I do to be able to do a select("owner") and still filter the Pets based on their age?
Is there some other way I can write this query?
Thank you in advance
Steps like sum, count and max are known as reducing barrier steps. They cause what has happened earlier in the traversal to essentially be forgotten. One way to work around this is to use a project step. As I do not have your data I used the air-routes data set and used airport elevation as a substitute for age in your graph.
gremlin> g.V(3).
           project("elev","minelev","city").
             by("elev").
             by(values("elev").inject(-10).sum()).
             by("city")
==>[elev:542,minelev:532,city:Austin]
I wrote some notes about reducing barrier steps here: http://kelvinlawrence.net/book/PracticalGremlin.html#rbarriers
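The effect of a reducing barrier can be pictured with a small Python toy model. This is only a sketch under simplifying assumptions, not TinkerPop's implementation: the Traverser class and the functions reduce_sum, select and project_with_offset are hypothetical names invented here, and the string "Austin" stands in for the labelled vertex. The numbers 542 and -10 come from the example above.

```python
from dataclasses import dataclass, field

@dataclass
class Traverser:
    """Toy stand-in for a Gremlin traverser: a current value plus as() labels."""
    value: object
    labels: dict = field(default_factory=dict)

def reduce_sum(traversers):
    # Reducing barrier: consumes every incoming traverser and emits one new
    # traverser that carries no labels from the earlier steps.
    return [Traverser(sum(t.value for t in traversers))]

def select(traversers, label):
    # Only traversers that still carry the label produce a result.
    return [t.labels[label] for t in traversers if label in t.labels]

def project_with_offset(traversers, offset):
    # project()-style: the arithmetic happens in a per-traverser sub-computation,
    # so the outer traverser and its labels survive.
    return [
        Traverser({"elev": t.value, "minelev": t.value + offset}, t.labels)
        for t in traversers
    ]

start = [Traverser(542, {"a": "Austin"})]

after_sum = reduce_sum(start + [Traverser(-10)])   # models inject(-10).sum()
after_project = project_with_offset(start, -10)    # models project(...).by(...)

print(select(after_sum, "a"))      # [] : the label is gone after the reduction
print(select(after_project, "a"))  # ['Austin']
print(after_project[0].value)      # {'elev': 542, 'minelev': 532}
```

In this model the sum still produces 532 either way; the difference is only whether the label set earlier is still reachable afterwards.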
UPDATED
If you want to find airports whose elevation is lower than the starting airport's by no more than 10, while avoiding the math step, you can use this formulation.
g.V(3).as('a').
  project('min').by(values('elev').inject(-10).sum()).as('p').
  select('a').
  out().
  where(lt('a')).by('elev').
  where(gt('p')).by('elev').by('min')

Displaying level in gremlin query

I am executing the gremlin query as follows:
g.V().hasLabel('A').has('label_A','A').emit().repeat(outE().inV()).valueMap()
I get the desired output of nodes at multiple levels.
Along with the properties, I want to add a level property to the output. How can I achieve it?
Adding another answer to point out you can avoid sack using loops as an alternative.
g.V().hasLabel('A').has('label_A','A').
  emit().
  repeat(group('x').by(loops()).by(valueMap().fold()).out()).
  cap('x')
You can use withSack for depth:
g.withSack(0).V().hasLabel('A').has('label_A','A').emit().
  repeat(sack(sum).
           by(constant(1)).
         out()).
  project('depth', 'properties').
    by(sack()).
    by(valueMap())
example: https://gremlify.com/ca32zczgvtkh6
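Both answers boil down to carrying a depth counter along while walking outgoing edges. A minimal Python sketch of that idea follows, assuming an acyclic graph stored as a plain adjacency dictionary; the graph contents and the function name walk_with_depth are made up for illustration and are not taken from the question's data.

```python
def walk_with_depth(graph, start):
    """Yield (vertex, depth) pairs level by level, following outgoing edges.

    Assumes the graph is acyclic; like emit().repeat(out()), a vertex that is
    reachable along several paths is yielded once per path.
    """
    frontier = [(start, 0)]
    while frontier:
        next_frontier = []
        for vertex, depth in frontier:
            yield vertex, depth  # emit the vertex together with its level
            for child in graph.get(vertex, []):
                next_frontier.append((child, depth + 1))
        frontier = next_frontier

# Hypothetical graph: A -> B, A -> C, B -> D
graph = {"A": ["B", "C"], "B": ["D"]}
levels = list(walk_with_depth(graph, "A"))
print(levels)  # [('A', 0), ('B', 1), ('C', 1), ('D', 2)]
```

The depth value plays the same role as loops() or the sack in the queries above: it starts at 0 for the starting vertex and grows by one per hop.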

How to find vertices that have at least one single-direction link in Gremlin?

I am trying to find each vertex that has a one-way connection to at least one other vertex. This is what I have, but it is clearly wrong.
g.V().has("label","SomeVertex").as('Vertex').Out().where(__.in().hasNot('Vertex'))
Any ideas? Thank you!
You can try something like this:
g.V().as('a').
  where(out().not(where(out().as('a'))))
example: https://gremlify.com/8m
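The logic of that query (keep a vertex if at least one of its out-neighbours has no edge back to it) can be checked on a small edge list in plain Python. This is a sketch with made-up data and a hypothetical function name, one_way_sources, and it ignores edge labels and properties.

```python
def one_way_sources(edges):
    """Return the vertices that have at least one outgoing edge with no reverse edge."""
    edge_set = set(edges)
    # Keep source s when some edge (s, t) exists without a matching (t, s).
    return sorted({s for s, t in edge_set if (t, s) not in edge_set})

# Hypothetical directed edges: a<->b is mutual, a->c and d->a are one-way.
edges = [("a", "b"), ("b", "a"), ("a", "c"), ("d", "a")]
result = one_way_sources(edges)
print(result)  # ['a', 'd']
```

Here b is excluded because its only outgoing edge is answered by a->b, and c is excluded because it has no outgoing edges at all.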

Searching wikipedia through R

I have a list of names in my dataframe and I want to find a way to query them in Wikipedia. It's not as simple as just appending the name to "https://en.wikipedia.org/wiki/": I want to actually query Wikipedia so that there will be a suggestion even if the name is not spelt correctly. For example, if I were to put in Dick Dawkins, it would come up with Richard Dawkins. I checked, and that is actually the first hit on Wikipedia.
Ideally I'd want to use rvest, but I don't want to manually get every URL. Is this possible?
You are right. I, too, had a hard time getting Dick Dawkins out of Wikipedia. So much so that even searching for Dick Dawkins in the Wikipedia search brought me straight to Richard Dawkins.
However, if you want to search for a term (say "Richard Dawkins") then Wikipedia has a proper API for you (https://www.mediawiki.org/wiki/API:Tutorial). You can play around and find the right parameters that work for you.
Just to get you started, I wrote a function (which is somewhat similar to rg255's post). You can change the parameter passed to the MySearch function. Please make sure that spaces in the search string are replaced by '%20' for every query from your dataframe; a simple gsub call should do the job. You will also have to install the 'jsonlite' package for this to work.
library(jsonlite)
# Query the MediaWiki search API and return the parsed JSON response.
# srsearch must already be URL-encoded (spaces replaced by '%20').
MySearch <- function(srsearch){
  FullSearchString <- paste("http://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=",srsearch,"&format=json",sep="")
  Response <- fromJSON(FullSearchString)
  return(Response)
}
Response <- MySearch("Richard%20Dawkins")
You can now use the parsed JSON to use the properties that you want. As I said, you will have to play with the parameters to get it right.
Please let me know if this is not what you wanted.
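For comparison, here is a minimal Python sketch of just the URL-building step, where the encoding of spaces is handled by the standard library instead of a manual replacement. It uses the same endpoint and parameters as the R function above; the function name build_search_url is made up for this example, and no request is actually sent.

```python
from urllib.parse import quote, urlencode

API_BASE = "http://en.wikipedia.org/w/api.php"  # same endpoint as in the R function

def build_search_url(term):
    """Build the search URL for a raw (unencoded) search term."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": term,
        "format": "json",
    }
    # quote_via=quote encodes spaces as %20 (the default, quote_plus, would use '+').
    return API_BASE + "?" + urlencode(params, quote_via=quote)

url = build_search_url("Richard Dawkins")
print(url)
# http://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=Richard%20Dawkins&format=json
```

Letting a library do the encoding also covers other characters that are not safe in a query string, such as '&' or accented letters in names.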

Genbank query (package seqinr): searching in sequence description

I am using the function query() of package seqinr to download myoglobin DNA sequences from Genbank. E.g.:
query("myoglobins","K=myoglobin AND SP=Turdus merula")
Unfortunately, for a lot of the species I'm looking for I don't get any sequence at all (or, for this species, only a very short one), even though I find sequences when I search manually on the website. This is because the query searches for "myoglobin" in the keywords only, while often there isn't any entry there. Often the protein type is only specified in the name ("definition" on Genbank), but I have no idea how to search for this.
The help page on query() doesn't seem to offer any option for this in the details, a "generic search" without any "K=" doesn't work, and I haven't found anything via googling.
I'd be happy about any links, explanations and help. Thank you! :)
There is a complete manual for the seqinr package which describes the query language more in depth in chapter 5 (available at http://seqinr.r-forge.r-project.org/seqinr_2_0-1.pdf). I was trying to do a similar query and the description for many of the genes/cds is blank so they don't come up when searching using the k= option. One alternative would be to search for the organism alone, then match gene names in the individual annotations and pull out the accession numbers, which you could then use to re-query the database for your sequences.
This would pull out the annotation for the first gene:
choosebank("emblTP")
query("ACexample", "sp=Turdus merula")
getName(ACexample$req[[1]])
annotations <- getAnnot(ACexample$req[[1]])
cat(annotations, sep = "\n")
I think that this would be a pretty time-consuming way to tackle the problem, but there doesn't seem to be an efficient way of searching the annotations directly. I'd be interested in any solutions you might come up with.
