Training on deeppavlov for NER keeps failing - bert-language-model

I have been trying to train a DeepPavlov model for NER using the training syntax given in their docs, and it keeps failing with the error message below:
/opt/anaconda3/envs/py36/lib/python3.6/site-packages/deeppavlov/dataset_readers/conll2003_reader.py in parse_ner_file(self, file_name)
104 items = line.split()
105 if len(items) < expected_items:
--> 106 raise Exception(f"Input is not valid {line}")
107 tokens.append(items[0])
108 tags.append(items[-1])
Exception: Input is not valid aio-pika==6.4.1
I used the following code to train the DeepPavlov model. It works on their sample dataset, but when I created my own dataset following their training sample guide, I keep getting the error message above.
NER training code:
from deeppavlov import configs, train_model, build_model
from deeppavlov.core.commands.utils import parse_config
import json
with configs.ner.ner_ontonotes_bert_mult.open(encoding='utf8') as f:
    ner_config = json.load(f)
ner_config['dataset_reader']['data_path'] = '/Users/smankari001/deeppavlov' # directory with train.txt, valid.txt and test.txt files
ner_config['metadata']['variables']['NER_PATH'] = '/Users/smankari001/deeppavlov'
ner_config['metadata']['download'] = [ner_config['metadata']['download'][-1]] # do not download the pretrained ontonotes model
ner_model = train_model(ner_config, download=True)
input train.txt file:
What O
kind O
of O
memory O
? O
We O
respectfully O
invite O
you O
to O
watch O
a O
special O
edition O
of O
Across B-ORG
China I-ORG
. O
WW B-WORK_OF_ART
II I-WORK_OF_ART
Landmarks I-WORK_OF_ART
on I-WORK_OF_ART
the I-WORK_OF_ART
Great I-WORK_OF_ART
Earth I-WORK_OF_ART
of I-WORK_OF_ART
China I-WORK_OF_ART
: I-WORK_OF_ART
Eternal I-WORK_OF_ART
Memories I-WORK_OF_ART
of I-WORK_OF_ART
Taihang I-WORK_OF_ART
Mountain I-WORK_OF_ART
Standing O
tall O
on O
Taihang B-LOC
Mountain I-LOC
is O
the B-WORK_OF_ART
Monument I-WORK_OF_ART
to I-WORK_OF_ART
the I-WORK_OF_ART
Hundred I-WORK_OF_ART
Regiments I-WORK_OF_ART
Offensive I-WORK_OF_ART
. O
It O
is O
composed O
of O
a O
primary O
stele O
, O
secondary O
steles O
, O
a O
huge O
round O
sculpture O
and O
beacon O
tower O
, O
and O
the B-WORK_OF_ART
Great I-WORK_OF_ART
Wall I-WORK_OF_ART
, O
among O
other O
things O
. O
A O
primary O
stele O
, O
three B-CARDINAL
secondary O
steles O
, O
and O
two B-CARDINAL
inscribed O
steles O
. O
The B-EVENT
Hundred I-EVENT
Regiments I-EVENT
Offensive I-EVENT
was O
the O
campaign O
of O
the O
largest O
scale O
launched O
by O
the B-ORG
Eighth I-ORG
Route I-ORG
Army I-ORG
during O
the B-EVENT
War I-EVENT
of I-EVENT
Resistance I-EVENT
against I-EVENT
Japan I-EVENT
. O
This O
campaign O
broke O
through O
the O
Japanese B-NORP
army O
's O
blockade O
to O
reach O
base O
areas O
behind O
enemy O
lines O
, O
stirring O
up O
anti-Japanese B-NORP
spirit O
throughout O
the O
nation O
and O
influencing O
the O
situation O
of O
the O
anti-fascist O
war O
of O
the O
people O
worldwide O
. O

For ner_config['dataset_reader']['data_path'], you need to specify the path to a folder that contains only the dataset files (train/valid/test).
This error:
Exception: Input is not valid aio-pika==6.4.1
says that the DatasetReader started to read lines from a requirements.txt file.
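Before training, it can help to check the folder yourself. The following is a minimal sketch, not part of DeepPavlov: the helper names `unexpected_files` and `invalid_lines` are made up for illustration, the three file names come from the comment in the question's code, and `expected_items=2` (one token plus one tag per line) is an assumption based on the check visible in the traceback.

```python
from pathlib import Path

# File names taken from the comment in the question's code.
EXPECTED_FILES = {"train.txt", "valid.txt", "test.txt"}


def unexpected_files(data_dir):
    """Return the names of entries in data_dir that are not dataset files."""
    return sorted(
        p.name for p in Path(data_dir).iterdir() if p.name not in EXPECTED_FILES
    )


def invalid_lines(file_path, expected_items=2):
    """Return (line_number, line) pairs that have too few columns.

    Mirrors the length check shown in the traceback; blank lines are
    skipped here because they separate sentences (an assumption).
    """
    bad = []
    with open(file_path, encoding="utf8") as f:
        for number, line in enumerate(f, start=1):
            stripped = line.strip()
            if stripped and len(stripped.split()) < expected_items:
                bad.append((number, stripped))
    return bad
```

If `unexpected_files('/Users/smankari001/deeppavlov')` returns anything (for example `requirements.txt`), move the dataset files to a folder of their own and point `data_path` there.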

Related

Is there a quick way to find shift value in rot algorithm

I was watching a scene where Elliot and Mr. Robot decrypt text just by looking at it. I know ROT is weak encryption and that the shift value can be found by trying each shift in turn while decrypting, but is there a quicker way to find the shift value? And how can one be sure that the encryption algorithm used is ROT?
Can you find the correct decryption by looking at this?
NBCM CM UH YRUGJFY
ocdn dn vi zsvhkgz
pdeo eo wj atwilha
qefp fp xk buxjmib
rfgq gq yl cvyknjc
sghr hr zm dwzlokd
this is an example
uijt jt bo fybnqmf
vjku ku cp gzcorng
wklv lv dq hadpsoh
xlmw mw er ibeqtpi
ymnx nx fs jcfruqj
znoy oy gt kdgsvrk
aopz pz hu lehtwsl
bpqa qa iv mfiuxtm
cqrb rb jw ngjvyun
drsc sc kx ohkwzvo
estd td ly pilxawp
ftue ue mz qjmybxq
guvf vf na rknzcyr
hvwg wg ob sloadzs
iwxh xh pc tmpbeat
jxyi yi qd unqcfbu
kyzj zj re vordgcv
lzak ak sf wpsehdw
mabl bl tg xqtfiex
Most of the time you don't even have to look at all 25 possible rotations. It's even easier if you have a spellchecker on.
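The list above can be generated mechanically, and the "spellchecker" idea can be approximated by counting known words in each candidate. This is only a sketch: `rotate`, `all_rotations` and `best_guess` are illustrative names, and the word list passed in is a tiny hypothetical stand-in for a real dictionary.

```python
import string


def rotate(text, shift):
    """Shift every ASCII letter forward by `shift` positions, keeping case."""
    result = []
    for ch in text:
        if ch in string.ascii_lowercase:
            result.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            result.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            result.append(ch)  # spaces and punctuation are left alone
    return "".join(result)


def all_rotations(ciphertext):
    """Return (shift, candidate) pairs for the 25 non-trivial shifts."""
    return [(shift, rotate(ciphertext, shift)) for shift in range(1, 26)]


def best_guess(ciphertext, known_words):
    """Pick the rotation that contains the most words from known_words."""
    def score(candidate):
        return sum(word in known_words for word in candidate.lower().split())

    return max(all_rotations(ciphertext), key=lambda pair: score(pair[1]))
```

For the example above, `best_guess("NBCM CM UH YRUGJFY", {"this", "is", "an", "example"})` picks shift 6. If no rotation scores well against a reasonable word list, the text was probably not encrypted with a plain rotation.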

How to decrypt a monoalphabetic substitution cipher message using Linux commands

I have been trying to decrypt a message, which is a SEED Labs task. I have to use Linux commands. They have provided guidelines, but as I am new to this I couldn't find proper help.
What commands do I need to run in order to decrypt this message? The instructions are attached below. The ciphertext.txt file, which I need to decrypt into plain text, is attached as well.
ciphertext.txt
ytn xqavhq yzhu xu qzupvd ltmat qnncq vgxzy hmrty vbynh ytmq ixur qyhvurn
vlvhpq yhme ytn gvrrnh bnniq imsn v uxuvrnuvhmvu yxx
ytn vlvhpq hvan lvq gxxsnupnp gd ytn pncmqn xb tvhfnd lnmuqynmu vy myq xzyqny
vup ytn veevhnuy mceixqmxu xb tmq bmic axcevud vy ytn nup vup my lvq qtvenp gd
ytn ncnhrnuan xb cnyxx ymcnq ze givasrxlu eximymaq vhcavupd vaymfmqc vup
v uvymxuvi axufnhqvymxu vq ghmnb vup cvp vq v bnfnh phnvc vgxzy ltnytnh ytnhn
xzrty yx gn v ehnqmpnuy lmubhnd ytn qnvqxu pmpuy ozqy qnnc nkyhv ixur my lvq
nkyhv ixur gnavzqn ytn xqavhq lnhn cxfnp yx ytn bmhqy lnnsnup mu cvhat yx
vfxmp axubimaymur lmyt ytn aixqmur anhncxud xb ytn lmuynh xidcemaq ytvusq
ednxuratvur
First of all, you need to perform a frequency analysis on your ciphertext. There are many online tools available to do that, but the most powerful one I found was this:
http://www.brianveitch.com/maze-runner/frequency-analysis/index.html
Based on the letter frequencies in your ciphertext, you make assumptions, replace the letters one by one, and then check that the result makes sense. The more correct guesses you make, the closer you get, and eventually you will be able to crack the whole monoalphabetic cipher.
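If you would rather count letters locally than use an online tool, a few lines of Python are enough. This is a minimal sketch; `letter_frequencies` is an illustrative name, not something from the lab material.

```python
from collections import Counter


def letter_frequencies(text):
    """Return (letter, count) pairs, most frequent first, ignoring non-letters."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    return Counter(letters).most_common()
```

Call it on the contents of ciphertext.txt, for example `letter_frequencies(open("ciphertext.txt").read())`, and compare the top entries with the letters that are typically most frequent in English text, such as E and T.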
For the ciphertext you provided in your ciphertext.txt file, the following mapping holds (replace the lowercase letters with the uppercase letters).
n - E
y - T
v - A
t - H
x - O
u - N
h - R
b - F
q - S
i - L
m - I
r - G
p - D
c - M
s - K
z - U
a - C
d - Y
k - X
l - W
e - P
g - B
f - V
j - Q
o - Z
A quick way to apply this mapping is with tr:
tr 'nyvtxuhbqimrpcszadklegfjo' 'ETAHONRFSLIGDMKUCYXWPBVQZ' < ciphertext.txt > out.txt
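The same substitution can be tried out in Python, which is handy while you are still guessing and want to see partial results. This is a sketch of the same idea, not a replacement for the tr command the task asks for; the two strings are copied from the mapping above and `decrypt` is an illustrative name.

```python
# Cipher letters and their plaintext replacements, position by position,
# copied from the mapping above. Letters without a mapping stay unchanged.
CIPHER_LETTERS = "nyvtxuhbqimrpcszadklegfjo"
PLAIN_LETTERS = "ETAHONRFSLIGDMKUCYXWPBVQZ"

TABLE = str.maketrans(CIPHER_LETTERS, PLAIN_LETTERS)


def decrypt(text):
    """Replace each lowercase cipher letter with its uppercase plaintext letter."""
    return text.translate(TABLE)
```

For instance, `decrypt("ytn xqavhq yzhu xu qzupvd")` gives `THE OSCARS TURN ON SUNDAY`. Keeping the guessed letters in uppercase makes it easy to see which ones are still undecided.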

Breadth and depth first search on a graph with returning edges

I do understand depth-first and breadth-first search, but this graph got me confused, as there are nodes that point to preceding nodes in the graph.
So let's say, for instance, that N is the goal state; then using depth-first search we would have
A B E J K L F G M N
So is it correct this way? I don't repeat A because it was visited before, right?
And using breadth-first search I would go level by level, and so I would have
A B C D E F G H I J K L M N
Is this correct?
And if we change the goal state to P,
then DFS will give us A B E J K L F G M N H O P
and BFS will give us A B C D E F G H I J K L M N O P
I feel I got this right; I am just uncertain because of the returning edges in this graph. So I just want someone to confirm that I am on the right track here.
That sounds correct to me. When pointing to a node that's already in your result set, it should not be added into the result set a second time.
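To make the "don't add it a second time" rule concrete, here is a minimal sketch of both traversals with a visited set. The graph from the question is not reproduced here, so `HYPOTHETICAL_GRAPH` below is a small made-up graph with edges pointing back to earlier nodes; the function names are illustrative as well.

```python
from collections import deque

# A made-up graph with back edges (B -> A, C -> A, D -> B); NOT the graph
# from the question.
HYPOTHETICAL_GRAPH = {
    "A": ["B", "C"],
    "B": ["D", "A"],
    "C": ["A", "D"],
    "D": ["B"],
}


def dfs_order(graph, start, goal=None):
    """Visit nodes depth first, skipping visited ones, stopping at goal."""
    visited, order, stack = set(), [], [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue  # reached again through a back edge: do not repeat it
        visited.add(node)
        order.append(node)
        if node == goal:
            break
        # Push in reverse so the first listed neighbour is explored first.
        stack.extend(reversed(graph.get(node, [])))
    return order


def bfs_order(graph, start, goal=None):
    """Visit nodes level by level, skipping visited ones, stopping at goal."""
    visited, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        if node == goal:
            break
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return order
```

On this small graph, `dfs_order(HYPOTHETICAL_GRAPH, "A")` gives `['A', 'B', 'D', 'C']` and `bfs_order(HYPOTHETICAL_GRAPH, "A")` gives `['A', 'B', 'C', 'D']`; the edges back to A and B never cause a node to appear twice.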

How to remove/add spaces in all textfiles?

I have several files that look like these, e.g. test.in:
apple foo bar
hello world
I need to achieve this desired output, a space after every character:
a p p l e f o o b a r
h e l l o w o r l d
I thought I could first remove all spaces and then add a space after each character, like this:
sed 's/\s//g' test.in | sed -e 's/\(.\)/\1 /g'
but are there other ways?
This awk may do:
awk -v FS="" '{gsub(/ /,"");$1=$1}1' file
a p p l e f o o b a r
h e l l o w o r l d
This first removes all spaces; then, since FS (the field separator) is set to the empty string, each character becomes its own field and $1=$1 rebuilds the line with one space between the fields.
Unlike most of the other sed and perl commands here, this does not add a space at the end.
Or, based on the sed command posted here:
awk '{gsub(/ /,"");gsub(/./,"& ")}1' file
a p p l e f o o b a r
h e l l o w o r l d
You can combine your two sed commands into a single command instead:
$ sed 's/\s//g;s/./& /g' test.in
a p p l e f o o b a r
h e l l o w o r l d
Note the use of . and & instead of \(.\) and \1.
On systems where sed does not support \s for matching whitespace, you can use [[:blank:]] instead:
$ sed 's/[[:blank:]]//g;s/./& /g' test.in
a p p l e f o o b a r
h e l l o w o r l d
Using perl:
$ perl -ple 's/([^ ]|^)(?! )/$1 /g' file
a p p l e f o o b a r
h e l l o w o r l d
Add the in-place edit option -i to save the changes to the file:
perl -i -ple 's/([^ ]|^)(?! )/$1 /g' file
sed 's/ //g;s/./& /g' filename
& refers to the portion of the pattern space that matched.
Or maybe something like this with sed (note the two spaces in the second expression):
$ sed 's/./& /g;s/  //g' file
a p p l e f o o b a r
h e l l o w o r l d
This might work for you (GNU sed):
sed 's/\B/ /g' file
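If a scripting language is an option, the same two steps (drop all whitespace, then put one space between the remaining characters) are short in Python too. This is just a sketch; `space_out` is an illustrative name, and like the first awk command it adds no trailing space.

```python
def space_out(line):
    """Remove all whitespace, then join the remaining characters with spaces."""
    compact = "".join(line.split())  # split() drops spaces, tabs and newlines
    return " ".join(compact)
```

To process a file, apply it to each line, for example `print(space_out(line))` for every `line` read from `test.in`.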

2-D FFT using MPI_Alltoall in MPI

I have a 4X6 matrix split as two 2X6 matrices on 2 processors. Since the rows have been split in a contiguous way (C Language) on the processors, we can carry out a 1-D FFT on the rows. The problem is we need an MPI_Alltoall() to collect contiguous columns on a particular processor for 1-D column FFT i.e. Processor 0 has:
A B C D E F
G H I J K L
Processor 1 has:
M N O P Q R
S T U V W X
The MPI_Alltoall() needs to convert this to the following on Processor 0:
A G M S
B H N T
C I O U
And the following on Processor 1:
D J P V
E K Q W
F L R X
I defined a vector type with count=2, blocklength=1, stride=6 as the send type, set its extent to that of one MPI_INT with displacement 0, and sent 3 instances of this vector using MPI_Alltoall(). As I understand it, this should send A G, B H, and C I to processor 0 and D J, E K, and F L to processor 1. Similarly, it should send M S, N T, and O U to processor 0 and P V, Q W, and R X to processor 1.
Is my interpretation correct? If yes, how do I receive these 3 instances of the vector on the receiving side (what datatype do I define, and how)?
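To pin down the data movement being asked about, here is a plain-Python simulation of the exchange. It uses no MPI at all and does not answer the datatype question; it only reproduces, under the assumption that each process sends one block of columns to every process and that received pieces are concatenated in rank order, the layout described above. The name `simulate_alltoall_transpose` is made up for illustration.

```python
def simulate_alltoall_transpose(local_rows):
    """Simulate the column exchange described above, without MPI.

    local_rows[p] is the list of rows (strings) held by process p.
    Returns, for each process, the columns it owns after the exchange,
    each column written out as a string.
    """
    nprocs = len(local_rows)
    ncols = len(local_rows[0][0])
    cols_per_proc = ncols // nprocs  # assumes ncols is divisible by nprocs

    # send[p][q] is what process p sends to process q: for every column
    # that q will own, the entries of that column taken from p's rows.
    send = [
        [
            [
                [row[c] for row in local_rows[p]]
                for c in range(q * cols_per_proc, (q + 1) * cols_per_proc)
            ]
            for q in range(nprocs)
        ]
        for p in range(nprocs)
    ]

    # After the all-to-all, process q holds send[p][q] from every p.
    # Concatenating the pieces in rank order rebuilds each full column.
    result = []
    for q in range(nprocs):
        columns = []
        for k in range(cols_per_proc):
            pieces = [x for p in range(nprocs) for x in send[p][q][k]]
            columns.append("".join(pieces))
        result.append(columns)
    return result
```

With `[["ABCDEF", "GHIJKL"], ["MNOPQR", "STUVWX"]]` as input, process 0 ends up with `AGMS`, `BHNT`, `CIOU` and process 1 with `DJPV`, `EKQW`, `FLRX`, which is the target layout from the question.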
