Parsey mcparseface : how to get position of word in sentence along with parse tree - syntaxnet

I am using Parsey McParseface and SyntaxNet to parse some text. I wish to extract the position of each word in the sentence along with the parse tree.
Currently the output is:
echo 'Alice brought the pizza to Alice.' | syntaxnet/demo.sh
Input: Alice brought the pizza to Alice .
Parse:
brought VBD ROOT
 +-- Alice NNP nsubj
 +-- pizza NN dobj
 |   +-- the DT det
 +-- to IN prep
 |   +-- Alice NNP pobj
 +-- . . punct
How I need it to be:
Input: Alice brought the pizza to Alice .
Parse:
brought VBD ROOT 2
 +-- Alice NNP nsubj 1
 +-- pizza NN dobj 4
 |   +-- the DT det 3
 +-- to IN prep 5
 |   +-- Alice NNP pobj 6
 +-- . . punct 7
or similar. (This will be particularly useful when there are many occurrences of the same word.)
Thank you

You can edit conll2tree.py:
https://github.com/tensorflow/models/blob/master/syntaxnet/syntaxnet/conll2tree.py
Changing token_str to
token_str = ['%s %d %s %s' % (token.word, tind,
                              token.tag, token.label)
             for tind, token in enumerate(sentence.token, 1)]
should do it. The 1-based position is then printed right after each word.
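To see what that change produces without running SyntaxNet, here is a minimal sketch. The `Token` namedtuple and the `token_strings` helper are hypothetical stand-ins for the protobuf tokens in `sentence.token`; only the `enumerate(..., 1)` numbering and the format string come from the answer above.

```python
from collections import namedtuple

# Hypothetical stand-in for the protobuf tokens in sentence.token.
Token = namedtuple("Token", ["word", "tag", "label"])


def token_strings(tokens):
    """Return 'word position tag label' strings with 1-based positions."""
    return ['%s %d %s %s' % (token.word, tind, token.tag, token.label)
            for tind, token in enumerate(tokens, 1)]


tokens = [
    Token("Alice", "NNP", "nsubj"),
    Token("brought", "VBD", "ROOT"),
    Token("the", "DT", "det"),
    Token("pizza", "NN", "dobj"),
    Token("to", "IN", "prep"),
    Token("Alice", "NNP", "pobj"),
    Token(".", ".", "punct"),
]

for line in token_strings(tokens):
    print(line)
```

Because the position comes from the token's index in the sentence, the two occurrences of "Alice" get different numbers.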

Related

How do I pretty print or visualise an object of class 'CoreNLP_pb2.ParseTree' in Python/Jupyter Notebook?

I'm using Stanza's CoreNLP client in a Jupyter notebook to do constituency parsing on a string. The final output came in the form of an object of class 'CoreNLP_pb2.ParseTree'.
>>> print(type(result))
<class 'CoreNLP_pb2.ParseTree'>
How should I print this in a visible way? When I directly call print(result), there is no output.
You can convert the CoreNLP_pb2.ParseTree into an nltk.tree.Tree and then call pretty_print() to print the parse tree in a readable way.
from nltk.tree import Tree
def convert_parse_tree_to_nltk_tree(parse_tree):
    if not parse_tree.child:
        return parse_tree.value
    # Recurse into each child to build the nltk Tree.
    return Tree(parse_tree.value,
                [convert_parse_tree_to_nltk_tree(child) for child in parse_tree.child])
convert_parse_tree_to_nltk_tree(constituency_parse).pretty_print()
The result is as follows:
                    ROOT
                     |
                     S
      _______________|________________
      |              VP              |
      |          ____|_____          |
      NP         |        NP         |
  ____|_____     |   _____|_____     |
 NNP      NNP   VBZ DT  JJ    NN     .
  |        |     |   |   |     |     |
Chris   Manning  is  a  nice person  .

Robot Framework API - how to get suite and its test cases results

I have a test suite directory which contains test suite files with one or more test cases. Let's say it looks like this:
TestSuite
    Test-1
        Step 1
        Step 2
    Test-2
        Step 1
    Test-3
        Step 1
        Step 2
        Step 3
I would like to parse output.xml to get results like this:
Test-1 | PASS
Test-1 | Step 1 | PASS
Test-1 | Step 2 | PASS
Test-2 | PASS
Test-2 | Step 1 | PASS
Test-3 | PASS
Test-3 | Step 1 | PASS
Test-3 | Step 2 | PASS
Test-3 | Step 3 | PASS
So far I have managed to get only suite files names and results using this code:
from robot.api import ExecutionResult, SuiteVisitor

class PrintSuiteInfo(SuiteVisitor):
    def visit_suite(self, suite):
        print('{} | {}'.format(suite.name, suite.status))

result = ExecutionResult('output.xml')
result.suite.suites.visit(PrintSuiteInfo())
which gives this output:
Test-1 | PASS
Test-2 | PASS
Test-3 | PASS
I can get test case names and results with this code:
from robot.api import ExecutionResult, ResultVisitor

class PrintTestInfo(ResultVisitor):
    def visit_test(self, test):
        print('{} | {}'.format(test.name, test.status))

result = ExecutionResult('output.xml')
result.visit(PrintTestInfo())
but the output is:
Step 1 | PASS
Step 2 | PASS
Step 1 | PASS
Step 1 | PASS
Step 2 | PASS
Step 3 | PASS
so there is no relation to suite files which I need to update results in Jira.
The only thing that came to my mind is to include the suite file name in each test case name but I would like to learn more about robot.api. I looked into the documentation many times but it is not clear enough for me now.
One of my colleagues helped me with solving this. What I was missing was:
test.parent
or one that I figured out myself:
test.longname
which gives output like this:
TestSuite.Test-1.Step 1
TestSuite.Test-1.Step 2
...
It is documented here.
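To get lines in the format asked for in the question, the parent suite's name can be combined with the test's own name and status. The helper below is only a sketch: `format_result_line` is a hypothetical name, and it relies solely on the `name`, `status` and `parent` attributes mentioned above, so it is exercised here with stand-in objects rather than a real output.xml.

```python
from types import SimpleNamespace


def format_result_line(test):
    """Build 'suite | test | status' from a test and its parent suite."""
    return '{} | {} | {}'.format(test.parent.name, test.name, test.status)


# Stand-in objects that mimic the attributes used above (assumption:
# the real result objects expose name, status and parent the same way).
suite = SimpleNamespace(name='Test-1', status='PASS')
tests = [
    SimpleNamespace(name='Step 1', status='PASS', parent=suite),
    SimpleNamespace(name='Step 2', status='PASS', parent=suite),
]

for test in tests:
    print(format_result_line(test))
```

Inside a ResultVisitor subclass, the same call would go in visit_test, for example print(format_result_line(test)).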

Searching for a number of words in dataframe$text and keeping the rows where they are found

I want to search df$text for a number of words; if any of them is present in a tweet, I want to place the whole row in a new dataframe. The problem is that I searched for the keywords "pat", "ppp", "jui", "jip", but the dataset I got also contains tweets where only the user's name, not the tweet text, contains one of these keywords. I want to remove the tweets that do not have a keyword in their text.
The dataframe looks like:
screen_name | text
1| pat_bing | RT #timkaine: 22 school shootings in 2018. 3 in the last week. How many times must our hearts break hearing news like this - this time in…
2| artguroo | RT #RabiaBaluch: Khurram Nawaz Gandapur (Dr Tahir Qadir’s right-hand man and PAT/Minhaj-ul-Quran leader) abusing and threatening young girl…
3| ppp_007 | RT #atDavidHoffman: Before today’s shooting in Santa Fe, Texas, no one was talking about the NRA & gun control anymore. Except the Parkland…
4| jip_1 | RT #TravisAllen02: What do Republicans care more about?
5| esha_jip | I want jip to become the best party ever #jip #ppp #anp #pmln #pti
The desired df should be like:
screen_name | text
2| artguroo | RT #RabiaBaluch: Khurram Nawaz Gandapur (Dr Tahir Qadir’s right-hand man and PAT/Minhaj-ul-Quran leader) abusing and threatening young girl…
5| esha_jip | I want jip to become the best party ever #jip #ppp #anp #pmln #pti
I'm done extracting tweets just want to clean up this mess. Help!
You can get this with grep and a regular expression. Since you include row 2, I assume that you want to ignore case.
grep("pat|ppp|jui|jip", dat$text, ignore.case = TRUE)
[1] 2 5
dat[grep("pat|ppp|jui|jip", dat$text, ignore.case = TRUE), ]
  screen_name
2    artguroo
5    esha_jip
  text
2 RT #RabiaBaluch: Khurram Nawaz Gandapur (Dr Tahir Qadir’s right-hand man and PAT/Minhaj-ul-Quran leader) abusing and threatening young girl…
5 I want jip to become the best party ever #jip #ppp #anp #pmln #pti

Is there a Unix style command like `column` that formats into a table?

Is there a command line tool that takes lines of delimiter-separated values and arranges them in a SQL-style table? E.g.,
id,name
1,apple
2,banana
3,yogurt
into
 id | name
----+---------
  1 | apple
  2 | banana
  3 | yogurt
With Perl and its format statement:
Input file:
$ cat file.csv
id,name
1,apple
2,banana
3,yogurt
Code:
$ cat ./format-STDIN.pl
#!/usr/bin/env perl
use strict; use warnings;

sep();
while (<>) {
    $. == 2 and sep();    # print the separator under the header row
format STDOUT =
|@<< | @<<<<<<<<<<<|
split /,/
.
    write;
}
sep();

sub sep { print "+----+-------------+\n"; }
Output:
$ ./format-STDIN.pl file.csv
+----+-------------+
|id  | name        |
+----+-------------+
|1   | apple       |
|2   | banana      |
|3   | yogurt      |
+----+-------------+
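The Perl format above hard-codes the column widths. As an alternative, here is a minimal Python sketch that computes the widths from the data. It is not an existing command line tool; the `format_table` function and its `delimiter` parameter are names made up for this illustration.

```python
import csv
import io


def format_table(lines, delimiter=","):
    """Render delimiter-separated lines as a bordered text table."""
    rows = [row for row in csv.reader(lines, delimiter=delimiter) if row]
    if not rows:
        return ""
    ncols = max(len(row) for row in rows)
    # Pad short rows so every row has the same number of cells.
    rows = [row + [""] * (ncols - len(row)) for row in rows]
    # Each column is as wide as its longest cell.
    widths = [max(len(row[i]) for row in rows) for i in range(ncols)]
    sep = "+" + "+".join("-" * (w + 2) for w in widths) + "+"

    def render(row):
        cells = (" " + cell.ljust(w) + " " for cell, w in zip(row, widths))
        return "|" + "|".join(cells) + "|"

    # The first row is treated as the header.
    out = [sep, render(rows[0]), sep]
    if len(rows) > 1:
        out.extend(render(row) for row in rows[1:])
        out.append(sep)
    return "\n".join(out)


sample = io.StringIO("id,name\n1,apple\n2,banana\n3,yogurt\n")
print(format_table(sample))
```

To use it as a filter, the same function can be called on standard input, for example print(format_table(sys.stdin)) after importing sys.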

Match multiple columns with same value in ODBC

Hi I have an Access Table like this.
----------------------------------------------------------------
| firstname | surname | address |
----------------------------------------------------------------
| Joan | Rivers | 123 Fake St. |
| Michael | Jackson | 69 Balls Head St. |
| Justin | Bieber | None |
----------------------------------------------------------------
I'm wondering if it is possible, over ODBC, to construct a query that allows me to match my input to any column.
Something like this:
SELECT * FROM NEMESISES WHERE '%value%' LIKE firstname or surname or address;
and when value is plugged in for example: '%bie%', it outputs the Justin Bieber row or when '%st%' is plugged in it outputs the Joan Rivers and Michael Jackson row.
Thank You!
You can split it into three separate comparisons, one per column:
SELECT * FROM NEMESISES
WHERE firstname LIKE '%value%'
OR surname LIKE '%value%'
OR address LIKE '%value%';
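Or you can match against the concatenated column values (`||` is the standard SQL concatenation operator; Access itself uses `&` instead):
SELECT * FROM NEMESISES
WHERE firstname || surname || address LIKE '%value%';
I would prefer the first solution: the database has less work to do.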
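The first form can be tried out with Python's built-in sqlite3 module. This is only a sketch: SQLite stands in for the Access/ODBC connection (an assumption, since the database from the question is not available here), and `find_matches` is a hypothetical helper. The search term is passed as a bound parameter rather than pasted into the SQL string.

```python
import sqlite3


def find_matches(conn, value):
    """Return rows where any of the three columns contains value."""
    pattern = "%" + value + "%"
    sql = (
        "SELECT firstname, surname, address FROM NEMESISES "
        "WHERE firstname LIKE ? OR surname LIKE ? OR address LIKE ?"
    )
    return conn.execute(sql, (pattern, pattern, pattern)).fetchall()


# In-memory SQLite database standing in for the Access table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE NEMESISES (firstname TEXT, surname TEXT, address TEXT)"
)
conn.executemany(
    "INSERT INTO NEMESISES VALUES (?, ?, ?)",
    [
        ("Joan", "Rivers", "123 Fake St."),
        ("Michael", "Jackson", "69 Balls Head St."),
        ("Justin", "Bieber", "None"),
    ],
)

print(find_matches(conn, "bie"))
```

One thing this makes easy to check: with the sample rows, searching for "st" also returns the Justin Bieber row, because "Justin" itself contains "st".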
