Iterate over a list in a Match query - collections

I have a relation that has a list of ids s_ids as a property of the relation. each id in the list correspond to another node that has a sentence corresponding to an id.I used:
MATCH (c: term)-[r: semrel]->(t: term), (b: Sentence)
Where r.source = "xyz" And b.sentence_id IN r.s_id
return r,b
to return all sentences corresponding to the relation,
the result looks like :
r b
w abc
w rty
w zxv
e nmx
e qrt
the relation r is repeated for every sentence how can I group the list of sentences corresponding to each relation to get
r b
w abc, rty, zxv
e nmx,qrt
Thanks

This should return each r and its collection of sentences:
MATCH (c: term)-[r: semrel]->(t: term), (b: Sentence)
WHERE r.source = "xyz" AND b.sentence_id IN r.s_i
RETURN r, COLLECT(b) AS sentences;
For better performance, if you create an index on :Sentence(sentence_id), like this:
CREATE INDEX ON :Sentence(sentence_id);
then this query (which adds a hint to use the index) should be faster (as the b nodes can be found using the index):
MATCH (c: term)-[r: semrel]->(t: term), (b: Sentence)
USING INDEX b:Sentence(sentence_id)
WHERE r.source = "xyz" AND b.sentence_id IN r.s_i
RETURN r, COLLECT(b) AS sentences;

Related

SML Create function receives list of tuples and return list with sum each pair

I'm studying Standard ML and one of the exercices I have to do is to write a function called opPairs that receives a list of tuples of type int, and returns a list with the sum of each pair.
Example:
input: opPairs [(1, 2), (3, 4)]
output: val it = [3, 7]
These were my attempts, which are not compiling:
ATTEMPT 1
type T0 = int * int;
fun opPairs ((h:TO)::t) = let val aux =(#1 h + #2 h) in
aux::(opPairs(t))
end;
The error message is:
Error: unbound type constructor: TO
Error: operator and operand don't agree [type mismatch]
operator domain: {1:'Y; 'Z}
operand: [E]
in expression:
(fn {1=1,...} => 1) h
ATTEMPT 2
fun opPairs2 l = map (fn x => #1 x + #2 x ) l;
The error message is: Error: unresolved flex record (need to know the names of ALL the fields
in this context)
type: {1:[+ ty], 2:[+ ty]; 'Z}
The first attempt has a typo: type T0 is defined, where 0 is zero, but then type TO is referenced in the pattern, where O is the letter O. This gets rid of the "operand and operator do not agree" error, but there is a further problem. The pattern ((h:T0)::t) does not match an empty list, so there is a "match nonexhaustive" warning with the corrected type identifier. This manifests as an exception when the function is used, because the code needs to match an empty list when it reaches the end of the input.
The second attempt needs to use a type for the tuples. This is because the tuple accessor #n needs to know the type of the tuple it accesses. To fix this problem, provide the type of the tuple argument to the anonymous function:
fun opPairs2 l = map (fn x:T0 => #1 x + #2 x) l;
But, really it is bad practice to use #1, #2, etc. to access tuple fields; use pattern matching instead. Here is a cleaner approach, more like the first attempt, but taking full advantage of pattern matching:
fun opPairs nil = nil
| opPairs ((a, b)::cs) = (a + b)::(opPairs cs);
Here, opPairs returns an empty list when the input is an empty list, otherwise pattern matching provides the field values a and b to be added and consed recursively onto the output. When the last tuple is reached, cs is the empty list, and opPairs cs is then also the empty list: the individual tuple sums are then consed onto this empty list to create the output list.
To extend on exnihilo's answer, once you have achieved familiarity with the type of solution that uses explicit recursion and pattern matching (opPairs ((a, b)::cs) = ...), you can begin to generalise the solution using list combinators:
val opPairs = map op+

Deleting item out of a List

Does anyone has an idea how I can delete items out of the list (e.g. word a, the)? I am trying several ways, but do not arrive at a solution. Thank you for your help!
lst = list()
for key, val in list(counts.items()):
lst.append((val, key))
lst.sort(reverse=True)
for key, val in lst[:10]:
print(key, val)
You don't show the definition of counts but it looks like a dict of words to number of occurrences. If so, then this will do what you want
blacklist = ['a', 'the', 'an', 'and']
for key, value in [(k, v) for (k, v) in sorted(counts.iteritems(), reverse=True) if k not in blacklist]:
print "%s: %s" % (key, value)
The code above doesn't actually remove keys from your counts, it just filters them out of the loop. If you actually need to delete them (since that's what you asked for), use dict.pop() (docs)
blacklist = ['a', 'the', 'an', 'and']
for key in blacklist:
counts.pop(key, None)

Pattern Matching SML?

Can someone please explain the: "description of g"? How can f1 takes unit and returns an int & the rest i'm confused about too!!
(* Description of g:
* g takes f1: unit -> int, f2: string -> int and p: pattern, and returns
* an int. f1 and f2 are used to specify what number to be returned for
* each Wildcard and Variable in p respectively. The return value is the
* sum of all those numbers for all the patterns wrapped in p.
*)
datatype pattern = Wildcard
| Variable of string
| UnitP
| ConstP of int
| TupleP of pattern list
| ConstructorP of string * pattern
datatype valu = Const of int
| Unit
| Tuple of valu list
| Constructor of string * valu
fun g f1 f2 p =
let
val r = g f1 f2
in
case p of
Wildcard => f1 ()
| Variable x => f2 x
| TupleP ps => List.foldl (fn (p,i) => (r p) + i) 0 ps
| ConstructorP (_,p) => r p
| _ => 0
end
Wildcard matches everything and produces the empty list of bindings.
Variable s matches any value v and produces the one-element list holding (s,v).
UnitP matches only Unit and produces the empty list of bindings.
ConstP 17 matches only Const 17 and produces the empty list of bindings (and similarly for other integers).
TupleP ps matches a value of the form Tuple vs if ps and vs have the same length and for all i, the i-th element of ps matches the i-th element of vs. The list of bindings produced is all the lists from the nested pattern matches appended together.
ConstructorP(s1,p) matches Constructor(s2,v) if s1 and s2 are the same string (you can compare them with =) and p matches v. The list of bindings produced is the list from the nested pattern match. We call the strings s1 and s2 the constructor name.
Nothing else matches.
Can someone please explain the: "description of g"? How can f1 takes unit and returns an int & the rest i'm confused about too!!
The function g has type (unit → int) → (string → int) → pattern → int, so it takes three (curried) parameters of which two are functions and one is a pattern.
The parameters f1 and f2 must either be deterministic functions that always return the same constant, or functions with side-effects that can return an arbitrary integer / string, respectively, determined by external sources.
Since the comment speaks of "what number to be returned for each Wildcard and Variable", it sounds more likely that the f1 should return different numbers at different times (and I'm not sure what number refers to in the case of f2!). One definition might be this:
local
val counter = ref 0
in
fun uniqueInt () = !counter before counter := !counter + 1
fun uniqueString () = "s" ^ Int.toString (uniqueInt ())
end
Although this is just a guess. This definition only works up to Int.maxInt.
The comment describes g's return value as
[...] the sum of all those numbers for all the patterns wrapped in p.
Since the numbers are not ascribed any meaning, it doesn't seem like g serves any practical purpose but to compare the output of an arbitrarily given set of f1 and f2 against an arbitrary test that isn't given.
Catch-all patterns are often bad:
...
| _ => 0
Nothing else matches.
The reason is that if you extend pattern with additional types of patterns, the compiler will not notify you of a missing pattern in the function g; the catch-all will erroneously imply meaning for cases that are possibly yet undefined.

Dictionary key from pdb file

I'm trying to go through a .pdb file, calculate distance between alpha carbon atoms from different residues on chains A and B of a protein complex, then store the distance in a dictionary, together with the chain identifier and residue number.
For example, if the first alpha carbon ("CA") is found on residue 100 on chain A and the one it binds to is on residue 123 on chain B I would want my dictionary to look something like d={(A, 100):[B, 123, distance_between_atoms]}
from Bio.PDB.PDBParser import PDBParser
parser=PDBParser()
struct = parser.get_structure("1trk", "1trk.pdb")
def getAlphaCarbons(chain):
vec = []
for residue in chain:
for atom in residue:
if atom.get_name() == "CA":
vec = vec + [atom.get_vector()]
return vec
def dist(a,b):
return (a-b).norm()
chainA = struct[0]['A']
chainB = struct[0]['B']
vecA = getAlphaCarbons(chainA)
vecB = getAlphaCarbons(chainB)
t={}
model=struct[0]
for model in struct:
for chain in model:
for residue in chain:
for a in vecA:
for b in vecB:
if dist(a,b)<=8:
t={(chain,residue):[(a, b, dist(a, b))]}
break
print t
It's been running the programme for ages and I had to abort the run (have I made an infinite loop somewhere??)
I was also trying to do this:
t = {i:[((a, b, dist(a,b)) for a in vecA) for b in vecB if dist(a, b) <= 8] for i in chainA}
print t
But it's printing info about residues in the following format:
<Residue PHE het= resseq=591 icode= >: []
It's not printing anything related to the distance.
Thanks a lot, I hope everything is clear.
Would strongly suggest using C libraries while calculating distances. I use mdtraj for this sort of thing and it works much quicker than all the for loops in BioPython.
To get all pairs of alpha-Carbons:
import mdtraj as md
def get_CA_pairs(self,pdbfile):
traj = md.load_pdb(pdbfile)
topology = traj.topology
CA_index = ([atom.index for atom in topology.atoms if (atom.name == 'CA')])
pairs=list(itertools.combinations(CA_index,2))
return pairs
Then, for quick computation of distances:
def get_distances(self,pdbfile,pairs):
#returns list of resid1, resid2,distances between CA-CA
traj = md.load_pdb(pdbfile)
pairs=self.get_CA_pairs(pdbfile)
dist=md.compute_distances(traj,pairs)
#make dictionary you desire.
dict=dict(zip(CA, pairs))
return dict
This includes all alpha-Carbons. There should be a chain identifier too in mdtraj to select CA's from each chain.

Convert string [,] into string representation

Let say I have this two-dimensional array:
let a = Array2D.create 2 2 "*"
What is an idiomatic way to turn that into the following string?
**\n
**\n
My thought would be that I need to iterate over the rows and then map string.concat over the items in each row. However I can't seem to figure out how to iterate just the rows.
I think you'll have to iterate over the rows by hand (Array2D does not have any handy function for this),
but you can get a row using splicing syntax. To get the row at index row, you can write array.[row, *]:
let a = Array2D.create 3 2 "*"
[ for row in 0 .. a.GetLength(0)-1 ->
String.concat "" a.[row,*] ]
|> String.concat "\n"
This creates a list of rows (each turned into a string using the first String.concat) and then concatenates the rows using the second String.concat.
Alternatively, you may use StringBuilder and involve Array2D.iteri function:
let data' = [|[|"a"; "b"; "c"|]; [|"d"; "e"; "f"|]|]
let data = Array2D.init 2 3 (fun i j -> data'.[i].[j])
open System.Text
let concatenateArray2D (data:string[,]) =
let sb = new StringBuilder()
data
|> Array2D.iteri (fun row col value ->
(if col=0 && row<>0 then sb.Append "\n" else sb)
.Append value |> ignore
)
sb.ToString()
data |> concatenateArray2D |> printfn "%s"
This prints:
abc
def

Resources