Why are keywords not parsed first and omitted from free text matching? - pyparsing

I thought I understood pyparsing's logic, but cannot figure out why the bottom example is failing.
I'm trying to parse open text comments where a product or set of products can be mentioned either in the beginning or the end of the comment. Product names can also be omitted from the comment.
The output should be a list of the mentioned products and the description regarding them.
Below are some test cases. The parser identifies everything as 'description' instead of first picking up the products (isn't that what the negative lookahead is supposed to prevent?)
What's wrong in my understanding?
import pyparsing as pp
products_list = ['aaa', 'bbb', 'ccc']
products = pp.OneOrMore(' '.join(products_list))
word = ~products + pp.Word(pp.alphas)
description = pp.OneOrMore(word)
comment_expr = (pp.Optional(products("product1")) + description("description") + pp.Optional(products("product2")))
matches = comment_expr.scanString("""\
aaa is a good product
I prefer aaa
No comment
aaa bbb are both good products""")
for match in matches:
    print(match)
The expected results would be:
product1: aaa, description: is a good product
product2: aaa, description: I prefer
description: No comment
product1: [aaa, bbb] description: are both good products

Pyparsing's shortcut equivalence between strings and Literals is intended to be a convenience, but sometimes it results in unexpected and unwanted circumstances. In these lines:
products_list = ['aaa', 'bbb', 'ccc']
products = pp.OneOrMore(' '.join(products_list))
I'm pretty sure you wanted product to match on any product. But instead, OneOrMore gets passed this as its argument:
' '.join(products_list)
This is purely a string expression, resulting in the string "aaa bbb ccc". Passing this to OneOrMore, you are saying that products is one or more instances of the string "aaa bbb ccc".
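To get the lookahead to work, products needs to be an expression that matches any one of the product names:
products = pp.oneOf(products_list)
or even better, since Keyword only matches whole words and will not match a product name embedded in a longer word:
products = pp.MatchFirst(pp.Keyword(p) for p in products_list)
Then your negative lookahead will stop the description at the first product name.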
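Putting the pieces together, here is a minimal sketch of the whole grammar. It assumes a few things the answer does not state: product names are wrapped in OneOrMore and Group so that several products are collected as one list, the description words are joined back into a string with a parse action, and each comment is parsed on its own line, since scanString over the whole block would treat newlines as ordinary whitespace. The helper name parse_comment is only for illustration.

```python
import pyparsing as pp

products_list = ['aaa', 'bbb', 'ccc']

# Keyword matches whole words only, so "aaab" is not read as product "aaa".
product = pp.MatchFirst([pp.Keyword(p) for p in products_list])
# Group keeps all consecutive product names together as one list.
products = pp.Group(pp.OneOrMore(product))

# A description word is any alphabetic word that is not a product name.
word = ~product + pp.Word(pp.alphas)
description = pp.OneOrMore(word).setParseAction(lambda tokens: ' '.join(tokens))

comment_expr = (pp.Optional(products("product1"))
                + description("description")
                + pp.Optional(products("product2")))


def parse_comment(line):
    """Parse one comment and return its named parts as a plain dict."""
    result = comment_expr.parseString(line, parseAll=True)
    return {name: (value if isinstance(value, str) else value.asList())
            for name, value in result.items()}


comments = [
    "aaa is a good product",
    "I prefer aaa",
    "No comment",
    "aaa bbb are both good products",
]
for comment in comments:
    print(parse_comment(comment))
```

Names that did not match (for example product1 in "I prefer aaa") are simply absent from the returned dict.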

Related

How do I access a calculated field in Rails?

In my first attempt to develop something in Ruby on Rails :) ... I have a list of names stored in fields "first_name" and "last_name". In my Person model, I have defined something like this:
def sort_name
  sort_name = last_name + ',' + first_name
end
Now I want to show all persons in a list, sorted by sort_name, but (in my controller) something like
@persons = Person.order(:sort_name)
doesn't work (Unknown column 'sort_name' in 'order clause'). How do I refer to the calculated field sort_name in my controller?
I am sure this is an "oh my god I am so stupid" moment, but I'd be happy for any advice!
If the Person model has the fields name, first_lastname and second_lastname, you can do the following:
Person.order(:name, :first_lastname, :second_lastname)
By default the ordering is ascending. You can also specify ascending or descending order for each field:
Person.order(name: :asc, first_lastname: :desc, second_lastname: :asc)
Additionally, if you want to add a column with the complete name, you can use select. With PostgreSQL the code would be:
people = Person.order(
name: :asc, first_lastname: :desc, second_lastname: :asc
).select(
"*, concat(name,' ', first_lastname, ' ',second_lastname) as sort_name"
)
people[0].sort_name
# the sort_name can be for example "Adán Saucedo Salas"

Query is unable to match parts after "/" or parts within "()" in the data

I have a search request written as
import sqlite3
conn = sqlite3.connect('locker_data.db')
c = conn.cursor()
def search1(teacher):
    test = 'SELECT Name FROM locker_data WHERE Name or Email LIKE "%{0}%"'.format(teacher)
    data1 = c.execute(test)
    return data1
def display1(data1):
    Display1 = []
    for Name in data1:
        temp1 = str(Name[0])
        Display1.append("Name: {0}".format(temp1))
    return Display1
def locker_searcher(teacher):
    data = display1(search1(teacher))
    return data
This allows me to search for the row containing "Mr FishyPower (Mr Swag)" or "Mr FishyPower / Mr Swag" with a search input of "FishyPower". However, when I try searching with an input of "Swag", I am then unable to find the same row.
In the search below, it should have given me the same search results.
The database is just a simple 1x1 sqlite3 database containing 'FishyPower / Mr Swag'
Edit: I technically did solve it by limiting the columns being searched to only 'Name', but I intended the code to search both the 'Name' and 'Email' columns and output the results as long as the search term is within either or both columns.
Edit2: SELECT Name FROM locker_data WHERE Email LIKE "%{0}%" or Name LIKE "%{0}%" was the right way to go.
I'm gonna guess that Mr. FishyPower's email address is something like mrFishyPower@something.com. SQL parses Name or Email LIKE "%{0}%" as Name OR (Email LIKE "%{0}%"), so the pattern is only ever compared against Email. If it was
WHERE Name LIKE "%{0}%"
OR Email LIKE "%{0}%"
you would (probably) get the result you want.
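As a minimal sketch of that fix, the query below repeats the LIKE test for each column. It also passes the pattern as a bound parameter instead of formatting it into the SQL string, which is a change the answer does not mention. The in-memory database, the sample row, the email address and the function name search_names are assumptions made only for this example.

```python
import sqlite3


def search_names(conn, teacher):
    """Return every Name whose Name or Email contains the search text."""
    pattern = "%{0}%".format(teacher)
    cursor = conn.execute(
        "SELECT Name FROM locker_data WHERE Name LIKE ? OR Email LIKE ?",
        (pattern, pattern),
    )
    return [row[0] for row in cursor.fetchall()]


# Hypothetical sample data, built in memory so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE locker_data (Name TEXT, Email TEXT)")
conn.execute(
    "INSERT INTO locker_data VALUES (?, ?)",
    ("Mr FishyPower / Mr Swag", "mrFishyPower@something.com"),
)

print(search_names(conn, "FishyPower"))
print(search_names(conn, "Swag"))
```

Both searches return the same row, because "Swag" is now compared against Name as well as Email.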

Find word (not containing substrings) in comma separated string

I'm using a LINQ query where I do something like this:
viewModel.REGISTRATIONGRPS = (From a In db.TABLEA
Select New SubViewModel With {
.SOMEVALUE1 = a.SOMEVALUE1,
...
...
.SOMEVALUE2 = If(commaseparatedstring.Contains(a.SOMEVALUE1), True, False)
}).ToList()
Now my problem is that this doesn't search for words but for substrings, so for example:
commaseparatedstring = "EWM,KI,KP"
SOMEVALUE1 = "EW"
It returns True because "EW" is contained in "EWM".
What I need is to find whole words (not substrings) in the comma-separated string.
Option 1: Regular Expressions
Regex.IsMatch(commaseparatedstring, @"\b" + Regex.Escape(a.SOMEVALUE1) + @"\b")
The \b parts are called "word boundaries" and tell the regex engine that you are looking for a "full word". The Regex.Escape(...) ensures that the regex engine will not try to interpret "special characters" in the text you are trying to match. For example, if you are trying to match "one+two", the Regex.Escape method will return "one\+two".
Also, be sure to import the System.Text.RegularExpressions namespace at the top of your code file.
See Regex.IsMatch Method (String, String) on MSDN for more information.
Option 2: Split the String
You could also try splitting the string which would be a bit simpler, though probably less efficient.
commaseparatedstring.Split(new Char[] { ',' }).Contains( a.SOMEVALUE1 )
what about:
- separating the commaseparatedstring by comma
- calling equals() on each substring instead of contains() on whole thing?
.SOMEVALUE2 = If(commaseparatedstring.Split(',').Contains(a.SOMEVALUE1), True, False)

How to extract results from asp.net regex.match?

Coming from Perl, I'm a bit confused by the ASP.NET regex classes.
I have a simple pattern I'm trying to match: "number text number"
My code looks like:
Match results = Regex.Match(mystring, @"(\d+)\s+(Highway|Hwy|Route|Rte)\s+(\d+)",RegexOptions.IgnoreCase);
foreach (Group g in results.Groups)
{
string token = g.Value;
}
The problem is that the groups seem to contain 4 results, not the 3 I would expect: the first is the entire string that gets matched, while the next 3 are what I would expect.
Is there a simple way to directly access my 3 results?
You could use Matches:
// Define a regular expression for repeated words.
Regex rx = new Regex(@"\b(?<word>\w+)\s+(\k<word>)\b",
    RegexOptions.Compiled | RegexOptions.IgnoreCase);
// Define a test string.
string text = "The the quick brown fox fox jumped over the lazy dog dog.";
// Find matches.
MatchCollection matches = rx.Matches(text);
// Report the number of matches found.
Console.WriteLine("{0} matches found in:\n {1}",
    matches.Count,
    text);
// Report on each match.
foreach (Match match in matches)
{
    ...
}
var results = Regex.Match("55 Hwy 66", @"(\d+)\s+(Highway|Hwy|Route|Rte)\s+(\d+)", RegexOptions.IgnoreCase).Groups.OfType<Group>().Select((name, index) => new {name, index}).Where(x => x.index > 0).Select(x => x.name).ToList();
This is simply how it is designed to work: the first group is always the entire match, so you just ignore it and start from the second group. I do agree that it is a strange implementation and not how I would have expected it to work.
If the regular expression engine can find a match, the first element of the GroupCollection object returned by the Groups property contains a string that matches the entire regular expression pattern.
Taken from here
I know this is an old question, but I ended up here through a search confirming my own thoughts and there was no definitive answer.

How can I model a scalable set of definition/term pairs?

Right now my flashcard game is using a prepvocab() method where I
- define the terms and translations for a week's worth of terms as a dictionary
- add a description of that week's terms
- lump them into a list of dictionaries, where a user selects their "weeks" to study
Every time I add a new week's worth of terms and translations, I'm stuck adding another element to the list of available dictionaries. I can definitely see this as not being a Good Thing.
class Vocab(object):
    def __init__(self):
        vocab = {}
        self.new_vocab = vocab
        self.prepvocab()
    def prepvocab(self):
        week01 = {"term":"translation"} #and many more...
        week01d = "Simple Latvian words"
        week02 = {"term":"translation"}
        week02d = "Simple Latvian colors"
        week03 = {"I need to add this":"to self.selvocab below"}
        week03d = "Body parts"
        self.selvocab = [week01, week02] #, week03, weekn]
        self.descs = [week01d, week02d] #, week03d, weeknd]
        Vocab.selvocab(self)
    def selvocab(self):
        """I like this because as long as I maintain self.selvocab,
        the for loop cycles through the options just fine"""
        for x in range(len(self.selvocab)):
            YN = input("Would you like to add week " \
                + repr(x + 1) + " vocab? (y or n) \n" \
                "Description: " + self.descs[x] + " ").lower()
            if YN in "yes":
                self.new_vocab.update(self.selvocab[x])
        self.makevocab()
I can definitely see that this is going to be a pain with 20+ yes no questions. I'm reading up on curses at the moment, and was thinking of printing all the descriptions at once, and letting the user pick all that they'd like to study for the round.
How do I keep this part of my code better maintained? Anybody got a radical overhaul that isn't so....procedural?
You should store your term:translation pairs and descriptions in a text file in some manner. Your program should then parse the text file and discover all available lessons. This will allow you to extend the set of lessons available without having to edit any code.
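One way to do that, as a rough sketch: keep the lessons in a JSON file and load them at startup. The file layout (a list of objects with "description" and "terms" keys), the file name lessons.json and the helper names load_lessons and build_vocab are all assumptions for illustration; any format your program can parse would do.

```python
import json
import os
import tempfile


def load_lessons(path):
    """Read every lesson from a JSON file; adding a week means editing the file only."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def build_vocab(lessons, selected_numbers):
    """Merge the term dictionaries of the selected lessons (numbered from 1)."""
    vocab = {}
    for number in selected_numbers:
        vocab.update(lessons[number - 1]["terms"])
    return vocab


# Demo with a throwaway file; a real program would ship lessons.json alongside the code.
sample = [
    {"description": "Simple Latvian words", "terms": {"term1": "translation1"}},
    {"description": "Simple Latvian colors", "terms": {"term2": "translation2"}},
]
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "lessons.json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(sample, handle)
    lessons = load_lessons(path)

# Print all descriptions at once so the user can pick lessons by number.
for number, lesson in enumerate(lessons, start=1):
    print(number, lesson["description"])
print(build_vocab(lessons, [2]))
```

With this layout the number of available lessons is simply len(lessons), so no list in the code has to be kept in sync by hand.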
As for your selection of lessons, write a print_lesson_choices function that displays the available lessons and descriptions to the user, and then ask for their input in selecting them. Instead of asking a question of them for every lesson, why not make your prompt something like:
self.selected_weeks = []  # initialise once, e.g. in __init__
def selvocab(self):
    self.print_lesson_choices()
    selection = input("Select a lesson number or leave blank if done selecting: ")
    if selection == "":  # Done selecting
        self.makevocab()
    elif selection in self.available_lessons:
        if selection not in self.selected_weeks:
            self.selected_weeks.append(selection)
            print("Added lesson %s" % selection)
        self.selvocab()  # Display the list of options so the user can select again
    else:
        print("Bad selection, try again.")
        self.selvocab()
Pickling objects into a database means it'll take some effort to create an interface to modify the weekly lessons from the front end, but is well worth the time.
