How to preload SyntaxNet so that it takes less time to produce dependency parse output

I am using the demo.sh script provided in the syntaxnet repository. If I give it input separated by '\n', it takes 27.05 seconds to process 3000 lines of text, but when I run each line individually, it takes more than one hour.
This means that loading the model takes over 2.5 seconds. If this step were separated out and cached, the whole pipeline would be faster.
Here is a modified version of demo.sh:
PARSER_EVAL=bazel-bin/syntaxnet/parser_eval
MODEL_DIR=syntaxnet/models/parsey_mcparseface
[[ "$1" == "--conll" ]] && INPUT_FORMAT=stdin-conll || INPUT_FORMAT=stdin
$PARSER_EVAL \
  --input=$INPUT_FORMAT \
  --output=stdout-conll \
  --hidden_layer_sizes=64 \
  --arg_prefix=brain_tagger \
  --graph_builder=structured \
  --task_context=$MODEL_DIR/context.pbtxt \
  --model_path=$MODEL_DIR/tagger-params \
  --slim_model \
  --batch_size=1024 \
  --alsologtostderr \
  | \
  $PARSER_EVAL \
  --input=stdin-conll \
  --output=stdout-conll \
  --hidden_layer_sizes=512,512 \
  --arg_prefix=brain_parser \
  --graph_builder=structured \
  --task_context=$MODEL_DIR/context.pbtxt \
  --model_path=$MODEL_DIR/parser-params \
  --slim_model \
  --batch_size=1024 \
  --alsologtostderr
I want to build a function that takes an input sentence and returns its dependency parse, with the loaded model stored in a variable, like below (the code below is just to make the question clear):
dependency_parsing_model = ...
def give_dependency_parser(sentence, model=dependency_parsing_model):
    ...
    # logic here
    ...
    return dependency_parsing_output
In the above, the model is stored in a variable, so each function call takes less time to run.
How can I do this?

The current version of syntaxnet's Parsey McParseface has two limitations which you've run across:
Sentences are read from stdin or a file, not from a variable
The model is in two parts and not a single executable
I have a branch of tensorflow/models:
https://github.com/dmansfield/models/tree/documents-from-tensor
which I'm working with the maintainers to get merged. With this branch of the code you can build the entire model in one graph (using a new python script called parsey_mcparseface.py) and feed sentences in with a tensor (i.e. a python variable).
Not the best answer in the world I'm afraid because it's very much in flux. There's no simple recipe for getting this working at the moment.
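Until a single-graph version is usable, one workaround that follows from the timings in the question is to pay the model-loading cost once per batch rather than once per sentence. The following is only a minimal sketch under stated assumptions: `split_conll` and `parse_batch` are hypothetical helper names, the default command path assumes the modified demo.sh from the question (the one that prints CoNLL to stdout) is run from the syntaxnet directory, and it assumes the parser emits one blank-line-separated block per input line.

```python
import subprocess


def split_conll(output):
    """Split CoNLL text into one list of token rows per sentence.

    Sentences are assumed to be separated by blank lines and columns by tabs.
    """
    sentences, current = [], []
    for line in output.splitlines():
        if line.strip():
            current.append(line.split("\t"))
        elif current:
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


def parse_batch(sentences, command=("syntaxnet/demo.sh",)):
    """Send all sentences through a single run of the script (hypothetical path).

    The model is loaded once for the whole batch instead of once per sentence.
    """
    text = "\n".join(s.replace("\n", " ") for s in sentences) + "\n"
    result = subprocess.run(
        list(command), input=text, capture_output=True, text=True, check=True
    )
    return split_conll(result.stdout)
```

This does not keep the model in a Python variable as the question asks; it only amortizes the load time, so it helps when sentences can be collected and parsed together.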

Related

What is the best way to iterate through lines in a CSV file?

I am trying to iterate through the lines of a CSV file.
However, the library https://pypi.org/project/robotframework-csvlibrary/ does not support Python 3.
So, do you know another way to iterate over the lines? My objective is to do something like:
@{lines}=    Get Lines    dataset.csv
:FOR    ${line}    IN    @{lines}
\    Log    ${line}[column1]
\    Log    ${line}[column2]
\    Log    ${line}[column3]
Thank you for your help and advice.
I found this way:
Python
import csv

def get_lines_from_csv(csv_path):
    """Return every row of the CSV file as a list of column values."""
    data = []
    with open(csv_path, 'rt', newline='') as csvfile:
        reader = csv.reader(csvfile, delimiter=';')
        for row in reader:
            data.append(row)
    return data
Robotframework
@{lines} =    Get Lines From Csv    ${DATAFILE_PATH}
${lines_length} =    Get Length    ${lines}
FOR    ${csv_row_index}    IN RANGE    2    ${lines_length}
\    @{currentLine}=    Set Variable    @{lines}[${csv_row_index}]
Not tested, but one kind of a solution.
@{lines}=    Get Lines    dataset.csv
:FOR    ${line}    IN    @{lines}
\    ${csv_row_as_a_list}=    Split String    ${line}    ,
\    Log    ${csv_row_as_a_list}[1]
\    Log    ${csv_row_as_a_list}[2]
\    Log    ${csv_row_as_a_list}[3]
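Since the goal in the question is to address columns by name (column1, column2, ...), a variant of the Python keyword that returns one dictionary per row may be closer to what is wanted. This is a sketch only: `read_rows` and `get_rows_from_csv` are hypothetical names, the first line of the file is assumed to be a header, and the ';' delimiter is carried over from the answer above.

```python
import csv
import io


def read_rows(csvfile, delimiter=";"):
    """Return each data row as a dict keyed by the header line's column names."""
    return [dict(row) for row in csv.DictReader(csvfile, delimiter=delimiter)]


def get_rows_from_csv(csv_path, delimiter=";"):
    """File-based wrapper, usable as a Robot Framework keyword (hypothetical name)."""
    with open(csv_path, newline="", encoding="utf-8") as csvfile:
        return read_rows(csvfile, delimiter)


# Small in-memory example so the sketch runs without a file on disk.
sample = io.StringIO("column1;column2;column3\na;b;c\nd;e;f\n")
rows = read_rows(sample)
print(rows[0]["column1"])  # prints "a"
```

Each row can then be indexed by column name rather than by position.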

Firebase BigQuery migration bash error

I am using Google's standard workflow to run the migration from the old dataset to the new dataset (Migration steps). I filled in the missing values, such as Property ID, BigQuery ID, etc. When I ran the bash script, the following error occurred:
Migrating mindfulness.com_mindfulness_ANDROID.app_events_20180515
--allow_large_results --append_table --batch --debug_mode --destination_table=analytics_171690789.events_20180515 --noflatten_results --nouse_legacy_sql --parameter=firebase_app_id::1:437512149764:android:0dfd4ab1e9926c7c --parameter=date::20180515 --parameter=platform::ANDROID#platform --project_id=mindfulness --use_gce_service_account
FATAL Flags positioning error: Flag '--project_id=mindfulness' appears after final command line argument. Please reposition the flag.
Run 'bq help' to get help.
I couldn't find a solution on Stack Overflow or Google. Does anyone have an idea how to solve this?
My migration.sh script (with small modifications to the IDs to stay anonymous):
# Analytics Property ID for the Project. Find this in Analytics Settings in Firebase
PROPERTY_ID=171230123
# Bigquery Export Project
BQ_PROJECT_ID="mindfulness" #(e.g., "firebase-public-project")
# Firebase App ID for the app.
FIREBASE_APP_ID="1:123412149764:android:0dfd4ab1e1234c7c" #(e.g., "1:300830567303:ios:
# Dataset to import from.
BQ_DATASET="com_mindfulness_ANDROID" #(e.g., "com_firebase_demo_IOS")
# Platform
PLATFORM="ANDROID"#"platform of the app. ANDROID or IOS"
# Date range for which you want to run migration, [START_DATE,END_DATE]
START_DATE=20180515
END_DATE=20180517
# Do not modify the script below, unless you know what you are doing :)
startdate=$(date -d"$START_DATE" +%Y%m%d) || exit -1
enddate=$(date -d"$END_DATE" +%Y%m%d) || exit -1
# Iterate through the dates.
DATE="$startdate"
while [ "$DATE" -le "$enddate" ]; do
  # BQ table constructed from above params.
  BQ_TABLE="$BQ_PROJECT_ID.$BQ_DATASET.app_events_$DATE"
  echo "Migrating $BQ_TABLE"
  cat migration_script.sql | sed -e "s/SCRIPT_GENERATED_TABLE_NAME/$BQ_TABLE/g" | bq query \
    --debug_mode \
    --allow_large_results \
    --noflatten_results \
    --use_legacy_sql=False \
    --destination_table analytics_$PROPERTY_ID.events_$DATE \
    --batch \
    --append_table \
    --parameter=firebase_app_id::$FIREBASE_APP_ID \
    --parameter=date::$DATE \
    --parameter=platform::$PLATFORM \
    --project_id=$BQ_PROJECT_ID
  temp=$(date -I -d "$DATE + 1 day")
  DATE=$(date -d "$temp" +%Y%m%d)
done
exit
# END OF SCRIPT
If you look at the output of your script, it contains this bit of text, right before the flag that's out of order:
--parameter=platform::ANDROID#platform --project_id=mindfulness
I'm pretty sure you want your platform to be ANDROID, not ANDROID#platform.
I suspect you can fix this just by putting a space between the end of the string and the inline comment, so that you have something like this:
PLATFORM="ANDROID" #"platform of the app. ANDROID or IOS"
Although to be safe, you might want to remove the comments entirely at the end of each line.
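To see why the missing space matters, the shell's behaviour can be approximated with Python's shlex module. In a shell, # only starts a comment at the beginning of a word, so without the space the quoted "comment" is glued onto the value. When $PLATFORM is later expanded unquoted, the extra words become separate arguments, which would explain why bq sees --project_id after a positional argument. This is an illustration under those assumptions, not a full shell emulation.

```python
import shlex

# The assignment exactly as written in the question's migration.sh.
line = 'PLATFORM="ANDROID"#"platform of the app. ANDROID or IOS"'

# shlex.split applies POSIX quoting rules and, by default, does not treat
# '#' as a comment character, so the whole line parses as a single word.
(word,) = shlex.split(line)
name, value = word.split("=", 1)
print(repr(value))

# Approximate the unquoted expansion --parameter=platform::$PLATFORM:
# the expanded text is split on whitespace into separate arguments.
argv = ("--parameter=platform::" + value).split()
print(argv)

# With a space before '#', the rest of the line is a comment.
fixed = shlex.split('PLATFORM="ANDROID" #"platform of the app."', comments=True)
print(fixed)
```

The first element of `argv` matches the `--parameter=platform::ANDROID#platform` flag visible in the error output, and the remaining elements are the stray words that push `--project_id` out of position.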

Makefile Target: Cannot assign variable from loop

The problem is that furl doesn't get assigned the value of the file variable. If I use any plain text instead of $$file, it works. I have no idea why this doesn't work.
Makefile:
test:
	@for file in $(shell pwd)/demo/*; do \
	$(eval furl := $(shell echo $$file)) \
	echo $(furl); \
	done
I'm pretty sure this should be easy to fix, but I could not find a solution. Any ideas?
I'm using the latest Lubuntu.
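You have mixed up shell and Make variables in the recipe. Make expands $(eval ...) and $(shell ...) before it hands the recipe to the shell, so the loop variable file has no value yet when furl is assigned. I think furl should be a shell variable. Something like this:
test:; @for file in demo/*; do \
	furl="`echo $$file`"; \
	echo "$$furl"; \
	done
In general, you shouldn't assign Make variables in recipes. An obvious exception is $(foreach).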

How do I parse hard drive identifiers

This image shows some hard drive IDs; they look pretty standardized (they were taken from a web GUI that gathered the data from the command prompt on CentOS).
Are these drive IDs standardized, and how can I parse the data out (for any set of hard drives on the market)? That is, I want to end up with the following variables (would a regex work for any drive on the market?):
type=scsi
type2=SATA
MFR=WDC
model=WDC_WD1001FALS
serial=WD-WCATR6632234
Is this apparent order truly standardized across all manufacturers, and how do I parse it?
The pattern you see comes from a .rules file on your computer, something like "60-persistent-storage.rules":
# by-id (hardware serial number)
KERNEL=="hd*[!0-9]", ENV{ID_SERIAL}=="?*", \
  SYMLINK+="disk/by-id/ata-$env{ID_SERIAL}"
KERNEL=="hd*[0-9]", ENV{ID_SERIAL}=="?*", \
  SYMLINK+="disk/by-id/ata-$env{ID_SERIAL}-part%n"
KERNEL=="sd*[!0-9]", ENV{ID_SCSI_COMPAT}=="?*", \
  SYMLINK+="disk/by-id/scsi-$env{ID_SCSI_COMPAT}"
KERNEL=="sd*[0-9]", ENV{ID_SCSI_COMPAT}=="?*", \
  SYMLINK+="disk/by-id/scsi-$env{ID_SCSI_COMPAT}-part%n"
ENV{DEVTYPE}=="disk", ENV{ID_BUS}=="?*", ENV{ID_SERIAL}=="?*", \
  SYMLINK+="disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}"
ENV{DEVTYPE}=="partition", ENV{ID_BUS}=="?*", ENV{ID_SERIAL}=="?*", \
  SYMLINK+="disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}-part%n"
ENV{DEVTYPE}=="disk", ENV{ID_EDD}=="?*", \
  SYMLINK+="disk/by-id/edd-$env{ID_EDD}"
ENV{DEVTYPE}=="partition", ENV{ID_EDD}=="?*", \
  SYMLINK+="disk/by-id/edd-$env{ID_EDD}-part%n"
ENV{DEVTYPE}=="disk", ENV{ID_WWN_WITH_EXTENSION}=="?*", \
  SYMLINK+="disk/by-id/wwn-$env{ID_WWN_WITH_EXTENSION}"
ENV{DEVTYPE}=="partition", ENV{ID_WWN_WITH_EXTENSION}=="?*", \
  SYMLINK+="disk/by-id/wwn-$env{ID_WWN_WITH_EXTENSION}-part%n"
These rules can be changed.
Note that your strings are SCSI IDs, and SCSI IDs are generated by these rules (although I'm not sure exactly how this works).
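Because the names come from configurable rules, the only structure that can be relied on is what the rules themselves impose: a prefix, a hyphen, an identifier, and an optional -partN suffix. A minimal sketch of parsing just that much is below. The function name `parse_by_id` and the example name are assumptions (the example is reconstructed from the variables in the question, not taken from the image), and splitting the identifier further into manufacturer, model, and serial is deliberately not attempted, since that layout depends on the drive and the rules in use.

```python
import re

# prefix, then the identifier, then an optional "-partN" suffix,
# mirroring the SYMLINK patterns in the rules above.
BY_ID = re.compile(r"^(?P<prefix>[^-]+)-(?P<ident>.+?)(?:-part(?P<part>\d+))?$")


def parse_by_id(name):
    """Split a /dev/disk/by-id name into prefix, identifier and partition number."""
    match = BY_ID.match(name)
    if match is None:
        return None
    fields = match.groupdict()
    if fields["part"] is not None:
        fields["part"] = int(fields["part"])
    return fields


# Hypothetical name assembled from the values listed in the question.
print(parse_by_id("scsi-SATA_WDC_WD1001FALS_WD-WCATR6632234-part1"))
```

The identifier itself may contain hyphens (as the serial does here), which is why the pattern anchors on the first hyphen and on the trailing -partN rather than splitting on every hyphen.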

Completion when program has sub-commands

I have written a command-line tool that uses sub-commands much like Mercurial, Git, Subversion &c., in that its general usage is:
>myapp [OPTS] SUBCOMMAND [SUBCOMMAND-OPTS] [ARGS]
E.g.
>myapp --verbose speak --voice=samantha --quickly "hello there"
I'm now in the process of building Zsh completion for it but have quickly found out that it is a very complex beast. I have had a look at the _hg and _git completions but they are very complex and different in approach (I struggle to understand them), but both seem to handle each sub-command separately.
Does anyone know if there is a way to use the built-in functions (_arguments, _values, _pick_variant &c.) to handle the concept of sub-commands correctly, including handling general options and sub-command-specific options appropriately? Or would the best approach be to manually handle the general options and the sub-command?
A noddy example would be very much appreciated.
Many thanks.
Writing completion scripts for zsh can be quite difficult. Your best bet is to use an existing one as a guide. The one for Git is way too much for a beginner. You can use this repo:
https://github.com/zsh-users/zsh-completions
As for your question, you have to use the concept of state. You define your subcommands in a list and then identify via $state which command you are in. Then you define the options for each command. You can see this in the completion script for play. A simplified version is below:
_play() {
  local ret=1
  _arguments -C \
    '1: :_play_cmds' \
    '*::arg:->args' \
  && ret=0
  case $state in
    (args)
      case $line[1] in
        (build-module|list-modules|lm|check|id)
          _message 'no more arguments' && ret=0
        ;;
        (dependencies|deps)
          _arguments \
            '1:: :_play_apps' \
            '(--debug)--debug[Debug mode (even more informations logged than in verbose mode)]' \
            '(--jpda)--jpda[Listen for JPDA connection. The process will suspended until a client is plugged to the JPDA port.]' \
            '(--sync)--sync[Keep lib/ and modules/ directory synced. Delete unknow dependencies.]' \
            '(--verbose)--verbose[Verbose Mode]' \
          && ret=0
        ;;
      esac
  esac
}
(If you are going to paste this, use the original source, as this won't work).
It looks daunting, but the general idea is not that complicated:
The subcommand comes first (_play_cmds is a list of subcommands with a description for each one).
Then come the arguments. The arguments are built based on which subcommand you are choosing. Note that you can group multiple subcommands if they share arguments.
With man zshcompsys, you can find more info about the whole system, although it is somewhat dense.
I found a technique that works well and is easy to understand. Basically, you create a new completion function for each subcommand and call it from the top-level completion function. Here's an example with dolt, showing how dolt completes to dolt table and dolt table completes to dolt table import, which then completes with a set of flags:
_dolt() {
    local line state
    _arguments -C \
        "1: :->cmds" \
        "*::arg:->args"
    case "$state" in
        cmds)
            _values "dolt command" \
                "table[Commands for copying, renaming, deleting, and exporting tables.]"
            ;;
        args)
            case $line[1] in
                table)
                    _dolt_table
                    ;;
            esac
            ;;
    esac
}
_dolt_table() {
    local line state
    _arguments -C \
        "1: :->cmds" \
        "*::arg:->args"
    case "$state" in
        cmds)
            _values "dolt_table command" \
                "import[Creates, overwrites, replaces, or updates a table from the data in a file.]"
            ;;
        args)
            case $line[1] in
                import)
                    _dolt_table_import
                    ;;
            esac
            ;;
    esac
}
_dolt_table_import() {
    _arguments -s \
        {-c,--create-table}'[Create a new table, or overwrite an existing table (with the -f flag) from the imported data.]' \
        {-u,--update-table}'[Update an existing table with the imported data.]' \
        {-f,--force}'[If a create operation is being executed, data already exists in the destination, the force flag will allow the target to be overwritten.]' \
        {-r,--replace-table}'[Replace existing table with imported data while preserving the original schema.]' \
        '(--continue)--continue[Continue importing when row import errors are encountered.]' \
        {-s,--schema}'[The schema for the output data.]' \
        {-m,--map}'[A file that lays out how fields should be mapped from input data to output data.]' \
        {-pk,--pk}'[Explicitly define the name of the field in the schema which should be used as the primary key.]' \
        '(--file-type)--file-type[Explicitly define the type of the file if it can''t be inferred from the file extension.]' \
        '(--delim)--delim[Specify a delimiter for a csv style file with a non-comma delimiter.]'
}
I wrote a full guide here:
https://www.dolthub.com/blog/2021-11-15-zsh-completions-with-subcommands/

Resources