How to use Spacy with Pyinstaller? - pyinstaller

is there the possibility, that someone could post a working pyinstaller example of a minnimal program that includes spacy and a language model?
I tried to follow the tips on stackoverflow, but maybe i dont understand them well. I still get the same error, that the model isn't found.

I was able to use PyInstaller to package my 'prediction server' using Spacy 2.x, but it took a whale of trials to get all the 'hooks' right.
My prediction server loads the NR model and listens (on a socket) for prediction requests (data). For each request, it returns predicted NR entities.
I got my server working without using GPU. Adding GPU support resulted in a monster executable, which was always missing something - not to mention it would likely be only portable to machine(s) with the same CUDA version.
My SpacyServer.spec:
# -*- mode: python ; coding: utf-8 -*-
block_cipher = None
a = Analysis(['SpacyServer.py'],
pathex=['C:\\Work\\ML\\Spacy\\deploy'],
binaries=[],
datas=[],
hiddenimports=[],
hookspath=['.'],
runtime_hooks=[],
excludes=['cupy'],
win_no_prefer_redirects=False,
win_private_assemblies=False,
cipher=block_cipher,
noarchive=False)
pyz = PYZ(a.pure, a.zipped_data,
cipher=block_cipher)
exe = EXE(pyz,
a.scripts,
a.binaries,
a.zipfiles,
a.datas,
[],
name='SpacyServer',
debug=False,
bootloader_ignore_signals=False,
strip=False,
upx=True,
upx_exclude=[],
runtime_tmpdir=None,
console=True )
And my hook-spacy.py:
# HOOK FILE FOR SPACY
from PyInstaller.utils.hooks import collect_all
# ----------------------------- SPACY -----------------------------
print('=================== SPACY =====================')
data = collect_all('spacy')
datas = data[0]
binaries = data[1]
hiddenimports = data[2]
# ----------------------------- THINC -----------------------------
data = collect_all('thinc')
datas += data[0]
binaries += data[1]
hiddenimports += data[2]
# ----------------------------- CYMEM -----------------------------
data = collect_all('cymem')
datas += data[0]
binaries += data[1]
hiddenimports += data[2]
print(data[2])
# ----------------------------- PRESHED -----------------------------
data = collect_all('preshed')
datas += data[0]
binaries += data[1]
hiddenimports += data[2]
# ----------------------------- BLIS -----------------------------
data = collect_all('blis')
datas += data[0]
binaries += data[1]
hiddenimports += data[2]
# ----------------------------- OTHER ----------------------------
hiddenimports += ['srsly.msgpack.util']
# This hook file is a bit of a hack - really, all of the libraries should be in seperate hook files. (Eg hook-blis.py with the blis part of the hook)
# But it looks we need to process everything when we import spacy, else we do not even hit import cymem.
print('=================== SPACY DONE =====================')
Finally, I run:
pyinstaller --onefile --name SpacyServer SpacyServer.py --additional-hooks-dir=. --exclude-module=cupy

Related

webdriver_manager error question (project using pyinstaller)

First, set up a virtual environment with anaconda, then install modules such as selenium,pyqt5, pyinstaller, and webdriver_manager in the virtual environment,
After creating the .spec file by setting pyinstaller.exe -w -F (project), the .spec file was modified as follows.
# -*- mode: python ; coding: utf-8 -*-
block_cipher = None
a = Analysis(
['GUI.py'],
pathex=[],
binaries=[],
datas=[('gui.ui','.'),('BHA_icon.png','.'),('process.py','.')],
hiddenimports=["webdriver_manager.chrome","selenium"],
hookspath=[],
hooksconfig={},
runtime_hooks=[],
excludes=[],
win_no_prefer_redirects=False,
win_private_assemblies=False,
cipher=block_cipher,
noarchive=False,
)
pyz = PYZ(a.pure, a.zipped_data, cipher=block_cipher)
exe = EXE(
pyz,
a.scripts,
a.binaries,
a.zipfiles,
a.datas,
[],
name='Baekjoon Hub Automation',
debug=False,
bootloader_ignore_signals=False,
strip=False,
upx=True,
icon=r'BHA_icon.ico',
upx_exclude=[],
runtime_tmpdir=None,
console=False,
disable_windowed_traceback=False,
argv_emulation=False,
target_arch=None,
codesign_identity=None,
entitlements_file=None,
)
After that, I built it as an exe file, and the program ran well with the console window below on my computer, but when I ran the exe file on another computer, the following error occurred.
enter image description here
enter image description here
and my console window : [13676:20232:0201/200946.234:ERROR:network_change_notifier_win.cc(224)] WSALookupServiceBegin failed with: 0
DevTools listening on ws://127.0.0.1:7831/devtools/browser/4cce63d7-9b48-420e-a020-ff714694c193
[16408:19512:0201/200946.366:ERROR:network_change_notifier_win.cc(224)] WSALookupServiceBegin failed with: 0
[13676:6804:0201/200947.548:ERROR:cert_issuer_source_aia.cc(34)] Error parsing cert retrieved from AIA (as DER):
ERROR: Couldn't read tbsCertificate as SEQUENCE
ERROR: Failed parsing Certificate
[13676:20232:0201/200950.547:ERROR:device_event_log_impl.cc(215)] [20:09:50.547] Bluetooth: bluetooth_adapter_winrt.cc:1205 Getting Radio failed. Chrome will be unable to change the power state by itself.
[13676:20232:0201/200950.565:ERROR:device_event_log_impl.cc(215)] [20:09:50.565] Bluetooth: bluetooth_adapter_winrt.cc:1283 OnPoweredRadioAdded(), Number of Powered Radios: 1
[13676:20232:0201/200950.566:ERROR:device_event_log_impl.cc(215)] [20:09:50.566] Bluetooth: bluetooth_adapter_winrt.cc:1298 OnPoweredRadiosEnumerated(), Number of Powered Radios: 1
[7568:21300:0201/201146.489:ERROR:gpu_init.cc(523)] Passthrough is not supported, GL is disabled, ANGLE is
[13676:20232:0201/202032.150:ERROR:device_event_log_impl.cc(215)] [20:20:32.148] USB: usb_service_win.cc:104 SetupDiGetDeviceProperty({{A45C254E-DF1C-4EFD-8020-67D146A850E0}, 6}) failed: 요소가 없습니다. (0x490)
[13676:20232:0201/202032.174:ERROR:device_event_log_impl.cc(215)] [20:20:32.173] USB: usb_service_win.cc:307 SetupDiEnumDeviceInterfaces: 사용 가능한 데이터가 없습니다. (0x103)
driver setting code
import time
from selenium.webdriver.common.by import By
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
def set_chrome_driver():
options = webdriver.ChromeOptions()
options.add_experimental_option("detach", True)
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
return driver
#####
def start_setting_BJ(id, pw, i, driver):
if i!=True:
url = 'https://www.acmicpc.net/login'
driver.get(url)
selector = "#submit_button"
time.sleep(1)
driver.find_element(By.NAME, 'login_user_id').send_keys(id)
driver.find_element(By.NAME, 'login_password').send_keys(pw)
elem = driver.find_element(By.CSS_SELECTOR, selector)
elem.click()
time.sleep(2)
login_check= driver.find_element(By.XPATH, "/html/body/div[2]/div[1]/div[1]/div/ul/li[3]/a") # 로그인 여부 확인용 "로그인" 텍스트 인식
if login_check.text=="로그인":
return -1
else:
driver.get('https://www.acmicpc.net/user/' + id)
Peoblem_list = '/html/body/div[2]/div[2]/div[3]/div[2]/div/div[2]/div[2]/div[2]/div'
time.sleep(1)
elem = driver.find_element(By.XPATH, Peoblem_list)
all = elem.text.split()
return all

How to Read Data from report.html file in Robotic Framework

I have a use case to read data from report.html (Example Test case name & Elapsed time) and store into MySQL, then implement Grafana dashboard wrt test case name & Elapsed time)
How can I achieve it ? How can I read data from report.html ?
Read output.xml,
You can use Python methods with this lib "xml.etree.ElementTree".
I already parse the output.xml from robotFramework for personal usage, this is an exemple for the beginning of parsing:
# -*- coding: utf:8 -*-
import os, sys
import xml.etree.ElementTree as ET
class OutputAnalyser() :
def __init__(self) :
self.xml_report_file = '/Logs/output.xml'
self.root_full_report = self.load_output_xml_results_file()
self.all_test_by_suite = self.get_all_tests_by_suite()
def load_output_xml_results_file(self):
try:
root_full_report = ET.parse(self.xml_report_file).getroot()
except FileNotFoundError as e:
raise FileNotFoundError({'errorCode' : 'FileNotFoundError', 'errorMessage' : 'File : ' + str(self.xml_report_file) + ' inexisting. Error : ' + str(e)})
return root_full_report
def get_all_tests_by_suite(self):
all_suite = [item for elem in self.root_full_report.findall('.//suite') for item in elem]
all_test_by_suite = []
for suite in all_suite:
sublist_test = {}
sublist_test["suiteName"] = suite.get('name')
sublist_test["tests"] = suite.findall('./test')
all_test_by_suite.append(sublist_test)
return all_test_by_suite

Calling Julia from Streamlit App using PyJulia

I'm trying to use a julia function from a streamlit app. Created a toy example to test the interaction, simply returning a matrix from a julia functions based on a single parameter to specify the value of the diagonal elements.
Will also note at the outset that both julia_import_method = "api_compiled_false" and julia_import_method = "main_include" works when importing the function in Spyder IDE (rather than at the command line to launch the streamlit app via streamlit run streamlit_julia_test.py).
My project directory looks like:
├── my_project_directory
│   ├── julia_test.jl
│   └── streamlit_julia_test.py
The julia function is in julia_test.jl and just simply returns a matrix with diagonals specified by the v parameter:
function get_matrix_from_julia(v::Int)
m=[v 0
0 v]
return m
end
The streamlit app is streamlit_julia_test.py and is defined as:
import os
from io import BytesIO
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import streamlit as st
# options:
# main_include
# api_compiled_false
# dont_import_julia
julia_import_method = "api_compiled_false"
if julia_import_method == "main_include":
# works in Spyder IDE
import julia
from julia import Main
Main.include("julia_test.jl")
elif julia_import_method == "api_compiled_false":
# works in Spyder IDE
from julia.api import Julia
jl = Julia(compiled_modules=False)
this_dir = os.getcwd()
julia_test_path = """include(\""""+ this_dir + """/julia_test.jl\"""" +")"""
print(julia_test_path)
jl.eval(julia_test_path)
get_matrix_from_julia = jl.eval("get_matrix_from_julia")
elif julia_import_method == "dont_import_julia":
print("Not importing ")
else:
ValueError("Not handling this case:" + julia_import_method)
st.header('Using Julia in Streamlit App Example')
st.text("Using Method:" + julia_import_method)
matrix_element = st.selectbox('Set Matrix Diagonal to:', [1,2,3])
matrix_numpy = np.array([[matrix_element,0],[0,matrix_element]])
col1, col2 = st.columns([4,4])
with col1:
fig, ax = plt.subplots(figsize=(5,5))
sns.heatmap(matrix_numpy, ax = ax, cmap="Blues",annot=True)
ax.set_title('Matrix Using Python Numpy')
buf = BytesIO()
fig.savefig(buf, format="png")
st.image(buf)
with col2:
if julia_import_method == "dont_import_julia":
matrix_julia = matrix_numpy
else:
matrix_julia = get_matrix_from_julia(matrix_element)
fig, ax = plt.subplots(figsize=(5,5))
sns.heatmap(matrix_julia, ax = ax, cmap="Blues",annot=True)
ax.set_title('Matrix from External Julia Script')
buf = BytesIO()
fig.savefig(buf, format="png")
st.image(buf)
If the app were working correctly, it would look like this (which can be reproduced by setting the julia_import_method = "dont_import_julia" on line 13):
Testing
When I try julia_import_method = "main_include", I get the well known error:
julia.core.UnsupportedPythonError: It seems your Julia and PyJulia setup are not supported.
Julia executable:
julia
Python interpreter and libpython used by PyCall.jl:
/Users/myusername/opt/anaconda3/bin/python3
/Users/myusername/opt/anaconda3/lib/libpython3.9.dylib
Python interpreter used to import PyJulia and its libpython.
/Users/myusername/opt/anaconda3/bin/python
/Users/myusername/opt/anaconda3/lib/libpython3.9.dylib
Your Python interpreter "/Users/myusername/opt/anaconda3/bin/python"
is statically linked to libpython. Currently, PyJulia does not fully
support such Python interpreter.
The easiest workaround is to pass `compiled_modules=False` to `Julia`
constructor. To do so, first *reboot* your Python REPL (if this happened
inside an interactive session) and then evaluate:
>>> from julia.api import Julia
>>> jl = Julia(compiled_modules=False)
Another workaround is to run your Python script with `python-jl`
command bundled in PyJulia. You can simply do:
$ python-jl PATH/TO/YOUR/SCRIPT.py
See `python-jl --help` for more information.
For more information, see:
https://pyjulia.readthedocs.io/en/latest/troubleshooting.html
As suggested, when I set julia_import_method = "api_compiled_false", I get a seg fault:
include("my_project_directory/julia_test.jl")
2022-04-03 10:23:13.406 Traceback (most recent call last):
File "/Users/myusername/opt/anaconda3/lib/python3.9/site-packages/streamlit/script_runner.py", line 430, in _run_script
exec(code, module.__dict__)
File "my_project_directory/streamlit_julia_test.py", line 25, in <module>
jl = Julia(compiled_modules=False)
File "/Users/myusername/.local/lib/python3.9/site-packages/julia/core.py", line 502, in __init__
if not self.api.was_initialized: # = jl_is_initialized()
File "/Users/myusername/.local/lib/python3.9/site-packages/julia/libjulia.py", line 114, in __getattr__
return getattr(self.libjulia, name)
File "/Users/myusername/opt/anaconda3/lib/python3.9/ctypes/__init__.py", line 395, in __getattr__
func = self.__getitem__(name)
File "/Users/myuserame/opt/anaconda3/lib/python3.9/ctypes/__init__.py", line 400, in __getitem__
func = self._FuncPtr((name_or_ordinal, self))
AttributeError: dlsym(0x21b8e5840, was_initialized): symbol not found
signal (11): Segmentation fault: 11
in expression starting at none:0
Allocations: 35278345 (Pool: 35267101; Big: 11244); GC: 36
zsh: segmentation fault streamlit run streamlit_julia_test.py
I've also tried the alternative recommendation provided in the PyJulia response message regarding the use of:
python-jl my_project_directory/streamlit_julia_test.py
But I get this error when running the python-jl command:
INTEL MKL ERROR: dlopen(/Users/myusername/opt/anaconda3/lib/libmkl_intel_thread.1.dylib, 0x0009): Library not loaded: #rpath/libiomp5.dylib
Referenced from: /Users/myusername/opt/anaconda3/lib/libmkl_intel_thread.1.dylib
Reason: tried: '/Applications/Julia-1.7.app/Contents/Resources/julia/bin/../lib/libiomp5.dylib' (no such file), '/usr/local/lib/libiomp5.dylib' (no such file), '/usr/lib/libiomp5.dylib' (no such file).
Intel MKL FATAL ERROR: Cannot load libmkl_intel_thread.1.dylib.
So I'm stuck, thanks in advance for a modified reproducible example or instructions for the following system specs!
System specs:
Mac OS Monterey 12.2.1 (Chip - Mac M1 Pro)
Python 3.9.7
Julia 1.7.2
PyJulia 0.5.8.dev
Streamlit 1.7.0
yes we can use it streamlit_julia_test.py NumPy for instance

Why does pyinstaller create a huge exe file when using it with pdfminer modules in Python 3.6?

I try to create an exe file with pyinstaller (in Python 3.6) from a script that is using pdfminer modules but the created exe file is huge, around 240 MB. In contrast, when using pyinstaller in Python 2.7 with a similar script the created exe file is only around 10 MB.
What is it that I am doing wrong?
I create the exe file with the following command: pyinstaller.exe --onefile {filename/path}
My code:
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfpage import PDFPage
...
def convert_pdf_to_txt(path):
rsrcmgr = PDFResourceManager()
retstr = io.StringIO()
#codec = 'windows-1250'
laparams = LAParams()
device = TextConverter(rsrcmgr, retstr, laparams=laparams)
fp = open(path, 'rb')
# reply = s.get(path, stream=True, verify= False)
# fp = StringIO()
# fp.write(reply.content)
# fp.seek(0)
interpreter = PDFPageInterpreter(rsrcmgr, device)
password = ""
maxpages = 0
caching = True
pagenos = set()
for page in PDFPage.get_pages(fp, pagenos, maxpages=maxpages, password=password, caching=caching, check_extractable=True):
interpreter.process_page(page)
text = retstr.getvalue()
fp.close()
device.close()
retstr.close()
return text
...
I found the following solution:
Created a virtual environment
Via pip installed only the packages needed (i.e. pdfminer and pyinstaller).
The result in now an exe file that is only 7.8 MB.

In pytest-allure, how do I capture screenshots when use case execution fails?

I don't know how to add a plugins to the pytest-allure. I hope you can give me an example. I hope to hook the failure result of the case when the case fails and capture it in the allure report.
My code is as follows: conftest.py
# -*- coding: utf-8 -*-
import pytest
from allure.constants import AttachmentType
from selenium import webdriver
drvier = webdriver.Chrome()
#pytest.mark.hookwrapper(hookwrapper=True, tryfirst=True)
def pytest_runtest_makereport(item):
outcome = yield
rep = outcome.get_result()
setattr(item, "rep_" + rep.when, rep)
if rep.when == 'call':
if rep.failed:
png = item.name + '.png'
pytest.allure.attach(png, drvier.get_screenshot_as_png(), type=AttachmentType.PNG)
The following is Main()
# coding=utf-8
import subprocess
import pytest
from conf import config_path
def action(case):
pytest.main(['-s', '-q', config_path.testcase_path + case+'.py', '--alluredir', config_path.report_path])
command_report = config_path.allurecmd_path +'allure generate ' + config_path.report_path + ' -o ' + config_path.report_path
print(command_report)
progress = subprocess.Popen(command_report, shell=True)
progress.wait()
if __name__ == '__main__':
action('Login_test')
I hope you can help me solve this problem. thanks!

Resources