Program recurs in different line than supposed to - recursion

I have some rules called sessions that have a Title(String) and Topics (a list of String). I'm creating a query where I input some keywords and find which session has a best match. Each keyword can have a weight (keyword is followed by -2 for example if weight of keyword is 2, if weight is 1 there is nothing next to the keyword).
My problem is in the recursion. For inputs without weights everything is ok but with weights the program recurs differently.
session('Rules; Semantic Technology; and Cross-Industry Standards',
['XBRL - Extensible Business Reporting Language',
'MISMO - Mortgage Industry Standards Maintenance Org',
'FIXatdl - FIX Algorithmic Trading Definition Language',
'FpML - Financial products Markup Language',
'HL7 - Health Level 7',
'Acord - Association for Cooperative Operations Research and Development (Insurance Industry)',
'Rules for Governance; Risk; and Compliance (GRC); eg; rules for internal audit; SOX compliance; enterprise risk management (ERM); operational risk; etc',
'Rules and Corporate Actions']).
session('Rule Transformation and Extraction',
['Transformation and extraction with rule standards; such as SBVR; RIF and OCL',
'Extraction of rules from code',
'Transformation and extraction in the context of frameworks such as KDM (Knowledge Discovery meta-model)',
'Extraction of rules from natural language',
'Transformation or rules from one dialect into another']).
accessList([H|T], H, T).
is_empty_string(String) :-
String == ''.
not_empty_string(String) :-
String \== ''.
get_weight_of_token(MyToken, WeightOfToken) :-
split_string(MyToken,"-","",TempList), % tokenize MyToken into a list ( [string_of_token, weight_of_token] )
accessList(TempList,_,TempWeight), %
accessList(TempWeight,TempWeight2,_), % accessing the tail of the list (weight_of_token)
number_string(WeightOfToken,TempWeight2) % convert string to number
;
WeightOfToken is 1.
find_relevance([], SessionTitle, SessionTopics).
find_relevance([H | T], SessionTitle, SessionTopics) :-
get_weight_of_token(H, WeightOfToken),
format('Token = ~w with weight = ~d ~n', [H,WeightOfToken]),
format('Title = ~w~nTopics = ~w ~n', [SessionTitle,SessionTopics]),
find_relevance(T, SessionTitle, SessionTopics).
query(ListOfKeywords) :-
is_empty_string(ListOfKeywords), write('List of keywords is empty!')
;
not_empty_string(ListOfKeywords),
split_string(ListOfKeywords," ","",KeywordsTokens), %tokenize string with space as separator
session(Title,Topics), find_relevance(KeywordsTokens, Title, Topics), fail.
With an input without weights like 'firword secword', this is the result:
https://imgur.com/a/Q2fU2IM
As you can see the program recurs fine and calls find_revelance for the next session.
With an input with weights like 'firword-2 secword', this is the result:
https://imgur.com/xKFpFvh
As you can see the program doesn't recur to (9) for the next session but recurs to (10) for some reason.. and specifically to the part after ;..
Why is this happening? Thanks in advance.
*I changed the titles and topics to something smaller so the images can be more clear

Related

Neo4j - Project graph from Neo4j with GraphDataScience

So I have a graph of nodes -- "Papers" and relationships -- "Citations".
Nodes have properties: "x", a list with 0/1 entries corresponding to whether a word is present in the paper or not, and "y" an integer label (one of the classes from 0-6).
I want to project the graph from Neo4j using GraphDataScience.
I've been using this documentation and I indeed managed to project the nodes and vertices of the graph:
Code
from graphdatascience import GraphDataScience
AURA_CONNECTION_URI = "neo4j+s://xxxx.databases.neo4j.io"
AURA_USERNAME = "neo4j"
AURA_PASSWORD = "my_code:)"
# Client instantiation
gds = GraphDataScience(
AURA_CONNECTION_URI,
auth=(AURA_USERNAME, AURA_PASSWORD),
aura_ds=True
)
#Shorthand projection --works
shorthand_graph, result = gds.graph.project(
"short-example-graph",
["Paper"],
["Citation"]
)
When I do print(result) it shows
nodeProjection {'Paper': {'label': 'Paper', 'properties': {}}}
relationshipProjection {'Citation': {'orientation': 'NATURAL', 'aggre...
graphName short-example-graph
nodeCount 2708
relationshipCount 10556
projectMillis 34
Name: 0, dtype: object
However, no properties of the nodes are projected. I then use the extended syntax as described in the documentation:
# Project a graph using the extended syntax
extended_form_graph, result = gds.graph.project(
"Long-form-example-graph",
{'Paper': {properties: "x"}},
"Citation"
)
print(result)
#Errors
I get the error:
NameError: name 'properties' is not defined
I tried various variations of this, with or without " ", but none have worked so far (also documentation is very confusing because one of the docs always uses " " and in another place I did not see " ").
Also, note that all my properties are integers in the Neo4j db (in AuraDS), as I used to have the error that String properties are not supported.
Some clarification on the correct way of projecting node features (aka properties) would be very useful.
thank you,
Dina
The keys in the Python dictionaries that you use with the GraphDataScience library should be enclosed in quotation marks. This is different from Cypher syntax, where map keys are not enclosed with quotation marks.
This should work for you.
extended_form_graph, result = gds.graph.project(
"Long-form-example-graph",
{'Paper': {"properties": "x"}},
"Citation"
)
Best wishes,
Nathan

Spring Neo4j #Query not populating related entities until a subsequent .findOne called

I have the following Cypher which is working correctly in the Neo4j browser and returning all related entities as expected:
MATCH (cmp:Competition)-[:COMPETITION_COUNTRY]-(cc:Country)
WHERE ID(cmp)=16860
MATCH (cmp)-[:COMPETITION]-(s:Season)<-[:SEASON]-(f:Fixture)
WHERE s.yearStart=2019
MATCH (f)-[:HOME_TEAM]-(ht:Team)-[:TEAM_COUNTRY]-(htc:Country)
MATCH (f)-[:HOME_TEAM]-(at:Team)-[:TEAM_COUNTRY]-(atc:Country)
RETURN cmp, s, f, ht, at, htc, atc
ORDER BY f.matchDate DESC, ht.name DESC
And I have this as a Neo4jRepository function as follows:
#Query("MATCH (cmp:Competition)-[:COMPETITION_COUNTRY]-(cc:Country)\n" +
"WHERE ID(cmp)={0}\n" +
"\n" +
"MATCH (cmp)-[:COMPETITION]-(s:Season)<-[:SEASON]-(f:Fixture)\n" +
"WHERE s.yearStart={1}\n" +
"\n" +
"MATCH (f)-[:HOME_TEAM]-(ht:Team)-[:TEAM_COUNTRY]-(htc:Country)\n" +
"MATCH (f)-[:HOME_TEAM]-(at:Team)-[:TEAM_COUNTRY]-(atc:Country)\n" +
"\n" +
"RETURN cmp, s, f, ht, at, htc, atc\n" +
"ORDER BY f.matchDate DESC, ht.name DESC"
)
List<Fixture> getCompetitionYearFixtures(
Long competitionId, Integer yearStart
);
I call this method as follows:
List<Fixture> fixtures =
fixtureRepository
.getCompetitionYearFixtures(
competitionId, year);
Although all the expected Fixtures are returned, none of the related entities are populated:
However, I discovered that if I immediately run the following statement for any one (and only one) of the returned Fxtures, then every Fixture in fixtures is suddenly fully populated:
fixtureRepository.findOne(fixtures.get(0).getId(), 3);
As so:
So my question is, is there a way of returning the Fixtures with all related entities populated after the first trip to the database, without having to go back?
I specifically retrieved everything I needed in the Cypher, and the idea of using a depth of 3 in the findOne is something I'm a little uncomfortable with as in future I may add new relations which I won't necessarily want bloating the query.
EDIT
Solution, with thanks to FrantiĊĦek Hartman:
MATCH r1=(cmp:Competition)-[cmp_c:COMPETITION_COUNTRY]-(cmpc:Country)
WHERE ID(cmp)=16860
MATCH r2=(cmp)-[cmp_s:COMPETITION]-(s:Season)<-[s_f:SEASON]-(f:Fixture)
WHERE s.yearStart=2019
MATCH r3=(f)-[f_ht:HOME_TEAM]-(ht:Team)-[ht_c:TEAM_COUNTRY]-(htc:Country)
MATCH r4=(f)-[f_at:AWAY_TEAM]-(at:Team)-[at_c:TEAM_COUNTRY]-(atc:Country)
RETURN r1,r2,r3,r4
ORDER BY f.matchDate DESC,ht.name DESC
You need to return everything that you want to map, in your case you are expecting the relationships to be mapped as well (even if you don't have relationship entities, the relationship says e.g. the fixture and home team are related).
To see what is exactly returned in your Neo4j browser go to "Browser settings" (cog icon left bottom) -> Uncheck "Connect result nodes"
You query with relationships (you might be able to come up with better names then r1-r7):
MATCH (cmp:Competition)-[r1:COMPETITION_COUNTRY]-(cc:Country)
WHERE ID(cmp)=16860
MATCH (cmp)-[r2:COMPETITION]-(s:Season)<-[r3:SEASON]-(f:Fixture)
WHERE s.yearStart=2019
MATCH (f)-[r4:HOME_TEAM]-(ht:Team)-[r5:TEAM_COUNTRY]-(htc:Country)
MATCH (f)-[r6:HOME_TEAM]-(at:Team)-[r7:TEAM_COUNTRY]-(atc:Country)
RETURN cmp, s, f, ht, at, htc, atc, r1, r2, r3, r4, r5, r6, r7
ORDER BY f.matchDate DESC, ht.name DESC
The findOne method (with depth parameter 3) loads everything, including the relationships up to 3 steps, which will be populated into cached instances of your data.

How can I find the duration of a song when given its ID using XQuery?

First off, yes this is homework - please suggest where I am going wrong,but please do not do my homework for me.
I am learning XQuery, and one of my tasks is to take a list of song ID's for a performance and determine the total duration of the performance. Given the snippits below, can anyone point me to where I can determine how to cross reference the songID from the performance to the duration of the song?
I've listed my attempts at the end of the question.
my current XQuery code looks like:
let $songIDs := doc("C:/Users/rob/Downloads/A4_FLOWR.xml")
//SongSet/Song
for $performance in doc("C:/Users/rob/Downloads/A4_FLOWR.xml")
//ContestantSet/Contestant/Performance
return if($performance/SongRef[. =$songIDs/#SongID])
then <performanceDuration>{
data($performance/SongRef)
}</performanceDuration>
else ()
Which outputs:
<performanceDuration>S005 S003 S004</performanceDuration>
<performanceDuration>S001 S007 S002</performanceDuration>
<performanceDuration>S008 S009 S006</performanceDuration>
<performanceDuration>S002 S004 S007</performanceDuration>
Each S00x is the ID of a song, which us found in the referenced xml document (partial document):
<SongSet>
<Song SongID="S001">
<Title>Bah Bah Black Sheep</Title>
<Composer>Mother Goose</Composer>
<Duration>2.99</Duration>
</Song>
<Song SongID="S005">
<Title>Thank You Baby</Title>
<Composer>Shania Twain</Composer>
<Duration>3.02</Duration>
</Song>
</SongSet>
The performance section looks like:
<Contestant Name="Fletcher Gee" Hometown="Toronto">
<Repertoire>
<SongRef>S001</SongRef>
<SongRef>S002</SongRef>
<SongRef>S007</SongRef>
<SongRef>S010</SongRef>
</Repertoire>
<Performance>
<SongRef>S001</SongRef>
<SongRef>S007</SongRef>
<SongRef>S002</SongRef>
</Performance>
</Contestant>
My Attempts
I thought I would use nested loops, but that fails:
let $songs := doc("C:/Users/rob/Downloads/A4_FLOWR.xml")
//SongSet/Song
for $performance in doc("C:/Users/rob/Downloads/A4_FLOWR.xml")
//ContestantSet/Contestant/Performance
return if($performance/SongRef[. =$songs/#SongID])
for $song in $songIDs
(: gives an error in BaseX about incomplete if :)
then <performanceDuration>{
data($performance/SongRef)
}</performanceDuration>
else ()
--Edit--
I've fixed the inner loop, however I am getting all the songs durations, not just the ones that match id's. I have a feeling that this is due to variable scope, but I'm not sure:
let $songs := doc("C:/Users/rob/Downloads/A4_FLOWR.xml")//SongSet/Song
for $performance in doc("C:/Users/rob/Downloads/A4_FLOWR.xml")//ContestantSet/Contestant/Performance
return if($performance/SongRef[. =$songs/#SongID])
then <performanceDuration>{
for $song in $songs
return if($performance/SongRef[. =$songs/#SongID])
then
sum($song/Duration)
else ()
}</performanceDuration>
else ()
}
Output:
<performanceDuration>2.99 1.15 3.15 2.2 3.02 2.25 3.45 1.29 2.33 3.1</performanceDuration>
Your immediate problem is syntactic: you've inserted your inner loop between the condition and the keyword 'then' in a conditional. Fix that first:
return if ($performance/SongRef = $songs/#SongID) then
<performanceDuration>{
(: put your inner loop HERE :)
}</performanceDuration>
else ()
Now think yourself into the situation of the query evaluator inside the performanceDuration element. You have the variable $performance, you can find all the song references using $performance/SongRef, and for each song reference in the performance element, you can find the corresponding song element by matching the SongRef value with $songs/#SongID.
My next step at this point would be to ask myself:
For a given song reference, how do I find the song element for that song, and then the duration for that song?
Is there a way to get the sum of some set of durations? Is there, for example, a sum() function? (I'm pretty sure there is, but at this point I always pull up the Functions and Operators spec and look it up to be sure of the signature.)
What type does the duration info have? I'd expect it to be minutes and seconds, and I'd be worrying about duration arithmetic, but your sample makes it look like decimals, which will be easy.

pyparsing multiple lines optional missing data in result set

I am quite new pyparsing user and have missing match i don't understand:
Here is the text i would like to parse:
polraw="""
set policy id 800 from "Untrust" to "Trust" "IP_10.124.10.6" "MIP(10.0.2.175)" "TCP_1002" permit
set policy id 800
set dst-address "MIP(10.0.2.188)"
set service "TCP_1002-1005"
set log session-init
exit
set policy id 724 from "Trust" to "Untrust" "IP_10.16.14.28" "IP_10.24.10.6" "TCP_1002" permit
set policy id 724
set src-address "IP_10.162.14.38"
set dst-address "IP_10.3.28.38"
set service "TCP_1002-1005"
set log session-init
exit
set policy id 233 name "THE NAME is 527 ;" from "Untrust" to "Trust" "IP_10.24.108.6" "MIP(10.0.2.149)" "TCP_1002" permit
set policy id 233
set service "TCP_1002-1005"
set service "TCP_1006-1008"
set service "TCP_1786"
set log session-init
exit
"""
I setup grammar this way:
KPOL = Suppress(Keyword('set policy id'))
NUM = Regex(r'\d+')
KSVC = Suppress(Keyword('set service'))
KSRC = Suppress(Keyword('set src-address'))
KDST = Suppress(Keyword('set dst-address'))
SVC = dblQuotedString.setParseAction(lambda t: t[0].replace('"',''))
ADDR = dblQuotedString.setParseAction(lambda t: t[0].replace('"',''))
EXIT = Suppress(Keyword('exit'))
EOL = LineEnd().suppress()
P_SVC = KSVC + SVC + EOL
P_SRC = KSRC + ADDR + EOL
P_DST = KDST + ADDR + EOL
x = KPOL + NUM('PId') + EOL + Optional(ZeroOrMore(P_SVC)) + Optional(ZeroOrMore(P_SRC)) + Optional(ZeroOrMore(P_DST))
for z in x.searchString(polraw):
print z
Result set is such as
['800', 'MIP(10.0.2.188)']
['724', 'IP_10.162.14.38', 'IP_10.3.28.38']
['233', 'TCP_1002-1005', 'TCP_1006-1008', 'TCP_1786']
The 800 is missing service tag ???
What's wrong here.
Thanks by advance
Laurent
The problem you are seeing is that in your expression, DST's are only looked for after having skipped over optional SVC's and SRC's. You have a couple of options, I'll go through each so you can get a sense of what all is going on here.
(But first, there is no point in writing "Optional(ZeroOrMore(anything))" - ZeroOrMore already implies Optional, so I'm going to drop the Optional part in any of these choices.)
If you are going to get SVC's, SRC's, and DST's in any order, you could refactor your ZeroOrMore to accept any of the three data types, like this:
x = KPOL + NUM('PId') + EOL + ZeroOrMore(P_SVC|P_SRC|P_DST)
This will allow you to intermix different types of statements, and they will all get collected as part of the ZeroOrMore repetition.
If you want to keep these different types of statements in groups, then you can add a results name to each:
x = KPOL + NUM('PId') + EOL + ZeroOrMore(P_SVC("svc*")|
P_SRC("src*")|
P_DST("dst*"))
Note the trailing '*' on each name - this is equivalent to calling setResultsName with the listAllMatches argument equal to True. As each different expression is matched, the results for the different types will get collected into the "svc", "src", or "dst" results name. Calling z.dump() will list the tokens and the results names and their values, so you can see how this works.
set policy id 233
set service "TCP_1002-1005"
set dst-address "IP_10.3.28.38"
set service "TCP_1006-1008"
set service "TCP_1786"
set log session-init
exit
shows this for z.dump():
['233', 'TCP_1002-1005', 'IP_10.3.28.38', 'TCP_1006-1008', 'TCP_1786']
- PId: 233
- dst: [['IP_10.3.28.38']]
- svc: [['TCP_1002-1005'], ['TCP_1006-1008'], ['TCP_1786']]
If you wrap ungroup on the P_xxx expressions, maybe like this:
P_SVC,P_SRC,P_DST = (ungroup(expr) for expr in (P_SVC,P_SRC,P_DST))
then the output is even cleaner-looking:
['233', 'TCP_1002-1005', 'IP_10.3.28.38', 'TCP_1006-1008', 'TCP_1786']
- PId: 233
- dst: ['IP_10.3.28.38']
- svc: ['TCP_1002-1005', 'TCP_1006-1008', 'TCP_1786']
This is actually looking pretty good, but let me pass on one other option. There are a number of cases where parsers have to look for several sub-expressions in any order. Let's say they are A,B,C, and D. To accept these in any order, you could write something like OneOrMore(A|B|C|D), but this would accept multiple A's, or A, B, and C, but not D. The exhaustive/exhausting combinatorial explosion of (A+B+C+D) | (A+B+D+C) | etc. could be written, or you could maybe automate it with something like
from itertools import permutations
mixNmatch = MatchFirst(And(p) for p in permutations((A,B,C,D),4))
But there is a class in pyparsing called Each that allows to write the same kind of thing:
Each([A,B,C,D])
meaning "must have one each of A, B, C, and D, in any order". And like And, Or, NotAny, etc., there is an operator shortcut too:
A & B & C & D
which means the same thing.
If you want "must have A, B, and C, and optionally D", then write:
A & B & C & Optional(D)
and this will parse with the same kind of behavior, looking for A, B, C, and D, regardless of the incoming order, and whether D is last or mixed in with A, B, and C. You can also use OneOrMore and ZeroOrMore to indicate optional repetition of any of the expressions.
So you could write your expression as:
x = KPOL + NUM('PId') + EOL + (ZeroOrMore(P_SVC) &
ZeroOrMore(P_SRC) &
ZeroOrMore(P_DST))
I looked at using results names with this expression, and the ZeroOrMore's seem to be confusing things, maybe still a bug in how this is done. So you may have to reserve using Each for more basic cases like my A,B,C,D example. But I wanted to make you aware of it.
Some other notes on your parser:
dblQuotedString.setParseAction(lambda t: t[0].replace('"','')) is probably better written
dblQuotedString.setParseAction(removeQuotes). You don't have any embedded quotes in your examples, but it's good to be aware of where your assumptions might not translate to a future application. Here are a couple of ways of removing the defining quotes:
dblQuotedString.setParseAction(lambda t: t[0].replace('"',''))
print dblQuotedString.parseString(r'"This is an embedded quote \" and an ending quote \""')[0]
# prints 'This is an embedded quote \ and an ending quote \'
# removed leading and trailing "s, but also internal ones too, which are
# really part of the quoted string
dblQuotedString.setParseAction(lambda t: t[0].strip('"'))
print dblQuotedString.parseString(r'"This is an embedded quote \" and an ending quote \""')[0]
# prints 'This is an embedded quote \" and an ending quote \'
# removed leading and trailing "s, and leaves the one internal ones but strips off
# the escaped ending quote
dblQuotedString.setParseAction(removeQuotes)
print dblQuotedString.parseString(r'"This is an embedded quote \" and an ending quote \""')[0]
# prints 'This is an embedded quote \" and an ending quote \"'
# just removes leading and trailing " characters, leaves escaped "s in place
KPOL = Suppress(Keyword('set policy id')) is a bit fragile, as it will break if there are any extra spaces between 'set' and 'policy', or between 'policy' and 'id'. I usually define these kind of expressions by first defining all the keywords individually:
SET,POLICY,ID,SERVICE,SRC_ADDRESS,DST_ADDRESS,EXIT = map(Keyword,
"set policy id service src-address dst-address exit".split())
and then define the separate expressions using:
KSVC = Suppress(SET + SERVICE)
KSRC = Suppress(SET + SRC_ADDRESS)
KDST = Suppress(SET + DST_ADDRESS)
Now your parser will cleanly handle extra whitespace (or even comments!) between individual keywords in your expressions.

How can I model a scalable set of definition/term pairs?

Right now my flashcard game is using a prepvocab() method where I
define the terms and translations for a week's worth of terms as a dictionary
add a description of that week's terms
lump them into a list of dictionaries, where a user selects their "weeks" to study
Every time I add a new week's worth of terms and translations, I'm stuck adding another element to the list of available dictionaries. I can definitely see this as not being a Good Thing.
class Vocab(object):
def __init__(self):
vocab = {}
self.new_vocab = vocab
self.prepvocab()
def prepvocab(self):
week01 = {"term":"translation"} #and many more...
week01d = "Simple Latvian words"
week02 = {"term":"translation"}
week02d = "Simple Latvian colors"
week03 = {"I need to add this":"to self.selvocab below"}
week03d = "Body parts"
self.selvocab = [week01, week02] #, week03, weekn]
self.descs = [week01d, week02d] #, week03, weekn]
Vocab.selvocab(self)
def selvocab(self):
"""I like this because as long as I maintain self.selvocab,
the for loop cycles through the options just fine"""
for x in range(self.selvocab):
YN = input("Would you like to add week " \
+ repr(x + 1) + " vocab? (y or n) \n" \
"Description: " + self.descs[x] + " ").lower()
if YN in "yes":
self.new_vocab.update(self.selvocab[x])
self.makevocab()
I can definitely see that this is going to be a pain with 20+ yes no questions. I'm reading up on curses at the moment, and was thinking of printing all the descriptions at once, and letting the user pick all that they'd like to study for the round.
How do I keep this part of my code better maintained? Anybody got a radical overhaul that isn't so....procedural?
You should store your term:translation pairs and descriptions in a text file in some manner. Your program should then parse the text file and discover all available lessons. This will allow you to extend the set of lessons available without having to edit any code.
As for your selection of lessons, write a print_lesson_choices function that displays the available lessons and descriptions to the user, and then ask for their input in selecting them. Instead of asking a question of them for every lesson, why not make your prompt something like:
self.selected_weeks = []
def selvocab(self):
self.print_lesson_choices()
selection = input("Select a lesson number or leave blank if done selecting: ")
if selection == "": #Done selecting
self.makevocab()
elif selection in self.available_lessons:
if selection not in self.selected_weeks:
self.selected_weeks.append(selection)
print "Added lesson %s"%selection
self.selvocab() #Display the list of options so the user can select again
else:
print "Bad selection, try again."
self.selvocab()
Pickling objects into a database means it'll take some effort to create an interface to modify the weekly lessons from the front end, but is well worth the time.

Resources