Is the working directory of the dagster main process different from that of the scheduler processes - dagster

I'm having an issue with the loading of a file from dagster code (setup, not pipelines). Say I have the following project structure:
pipelines/
-app/
--environments/
---schedules.yaml
--repository.py
--repository.yaml
When I run dagit from inside the project folder ($ cd pipelines && dagit -y app/repository.yaml), this folder becomes the working directory, and inside repository.py I can load a file knowing the root is the project folder:
# repository.py
with open('app/environments/schedules.yaml', 'r') as f:
    ...  # do something with the file
However, if I set up a schedule, the pipelines in the project do not run. Checking the cron logs, it seems the open call throws a FileNotFoundError. I was wondering if this happens because the working directory is different when the job executes under cron.
For context, I'm loading a config file with cron schedule parameters for each pipeline. Also, here's the tail of the stack trace in my case:
File "/home/user/.local/share/virtualenvs/pipelines-mfP13m0c/lib/python3.8/site-packages/dagster/core/definitions/handle.py", line 190, in from_yaml
return LoaderEntrypoint.from_file_target(
File "/home/user/.local/share/virtualenvs/pipelines-mfP13m0c/lib/python3.8/site-packages/dagster/core/definitions/handle.py", line 161, in from_file_target
module = import_module_from_path(module_name, os.path.abspath(python_file))
File "/home/user/.local/share/virtualenvs/pipelines-mfP13m0c/lib/python3.8/site-packages/dagster/seven/__init__.py", line 75, in import_module_from_path
spec.loader.exec_module(module)
File "<frozen importlib._bootstrap_external>", line 783, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/home/user/pipelines/app/repository.py", line 28, in <module>
schedule_builder = ScheduleBuilder(settings.CRON_PRESET, settings.ENV_DICT)
File "/home/user/pipelines/app/schedules.py", line 12, in __init__
self.cron_schedules = self._load_schedules_yaml()
File "/home/user/pipelines/app/schedules.py", line 16, in _load_schedules_yaml
with open(path) as f:
FileNotFoundError: [Errno 2] No such file or directory: 'app/environments/schedules.yaml'
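One quick way to test the working-directory hypothesis (an illustrative check, not part of the original post) is to log the current directory at import time in repository.py:
import os
print('repository.py imported with cwd =', os.getcwd())
If the cron log shows a different directory than an interactive dagit run, the relative path is the culprit.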

You could open the file using its absolute path so that it opens correctly regardless of the working directory:
from dagster.utils import file_relative_path
with open(file_relative_path(__file__, './environments/schedules.yaml'), 'r') as f:
    ...  # do something with the file
All file_relative_path does is the following, so you can call the os.path methods directly if you prefer:
def file_relative_path(dunderfile, relative_path):
    return os.path.join(os.path.dirname(dunderfile), relative_path)
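Equivalently, inline and without the helper (a minimal sketch using the path from the question):
import os

# resolve the YAML file relative to this module, not the process working directory
path = os.path.join(os.path.dirname(__file__), 'environments/schedules.yaml')
with open(path) as f:
    ...  # do something with the file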

Related

Script compiled with PyInstaller is missing a .dll file; when the file is manually copied into the program's folder, it just dies

I have a Python script which is basically a graphical interface (PySimpleGUI) to a MySQL database.
I am working in Python 3.8; my dependencies are:
PySimpleGUI 4.55.1
sqlalchemy 1.3.20
pymysql 1.0.2
pandas 1.1.3
regex 2020.10.15
pillow 8.0.1
The code works, and I'd like to compile it to an .exe to distribute it to users in my organization.
I tried to compile it with:
pyinstaller -D .\db_interface_v3.6.1_release.py --debug=imports
However, pyinstaller throws some errors when compiling:
201667 INFO: Building COLLECT COLLECT-00.toc
Traceback (most recent call last):
File "c:\users\spit\anaconda3\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "c:\users\spit\anaconda3\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\Spit\anaconda3\Scripts\pyinstaller.exe\__main__.py", line 7, in <module>
File "c:\users\spit\anaconda3\lib\site-packages\PyInstaller\__main__.py", line 124, in run
run_build(pyi_config, spec_file, **vars(args))
File "c:\users\spit\anaconda3\lib\site-packages\PyInstaller\__main__.py", line 58, in run_build
PyInstaller.building.build_main.main(pyi_config, spec_file, **kwargs)
File "c:\users\spit\anaconda3\lib\site-packages\PyInstaller\building\build_main.py", line 782, in main
build(specfile, kw.get('distpath'), kw.get('workpath'), kw.get('clean_build'))
File "c:\users\spit\anaconda3\lib\site-packages\PyInstaller\building\build_main.py", line 714, in build
exec(code, spec_namespace)
File "C:\Users\Spit\Desktop\DIPEx db parser\db_interface_v3.6.1_release.spec", line 37, in <module>
coll = COLLECT(exe,
File "c:\users\spit\anaconda3\lib\site-packages\PyInstaller\building\api.py", line 818, in __init__
self.__postinit__()
File "c:\users\spit\anaconda3\lib\site-packages\PyInstaller\building\datastruct.py", line 155, in __postinit__
self.assemble()
File "c:\users\spit\anaconda3\lib\site-packages\PyInstaller\building\api.py", line 866, in assemble
shutil.copy(fnm, tofnm)
File "c:\users\spit\anaconda3\lib\shutil.py", line 415, in copy
copyfile(src, dst, follow_symlinks=follow_symlinks)
File "c:\users\spit\anaconda3\lib\shutil.py", line 261, in copyfile
with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\Spit\\Desktop\\DIPEx db parser\\dist\\db_interface_v3.6.1_release\\share\\jupyter\\lab\\staging\\node_modules\\.cache\\terser-webpack-plugin\\content-v2\\sha512\\2e\\ba\\cfce62ec1f408830c0335f2b46219d58ee5b068473e7328690e542d2f92f2058865c600d845a2e404e282645529eb0322aa4429a84e189eb6b58c1b97c1a'
If I try to run the compiled exe, I get an error regarding a specific .dll:
INTEL MKL ERROR: Impossibile trovare il modulo specificato. mkl_intel_thread.dll. (Italian for "The specified module could not be found.")
Intel MKL FATAL ERROR: Cannot load mkl_intel_thread.dll.
If I take this missing .dll from my Anaconda environment and copy it into the program's folder, when I try to run the .exe again it just dies without further messages:
import 'numpy.ma' # <pyimod03_importers.FrozenImporter object at 0x000001F6A455BEE0>
PS C:\Users\Spit\Desktop\DIPEx db parser\dist\db_interface_v3.6.1_release>
Any idea on how to sort it out?
Thanks!
Sorted out. For future reference, if someone stumbles upon this question: the error is caused by Windows' PATH_MAX limitation, which prevents PyInstaller from finding all the necessary files.
To disable said limitation, see: https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=cmd
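The linked page enables long paths via the LongPathsEnabled registry value; for example, from an elevated command prompt (a sketch based on that documentation, not on the original answer):
reg add HKLM\SYSTEM\CurrentControlSet\Control\FileSystem /v LongPathsEnabled /t REG_DWORD /d 1 /f
A sign-out or reboot may be needed before the new limit takes effect.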
Kudos to https://github.com/bwoodsend

FileNotFoundError: [Errno 2] No such file or directory: 'java': 'java'

I am facing this error after deploying my Streamlit app on Streamlit sharing. The app runs fine on my localhost but not after deployment. I think the bash commands I have in my Streamlit app are not being run on the server.
import os
import subprocess

# Performs the descriptor calculation with PaDEL-Descriptor
bashCommand = "java -Xms2G -Xmx2G -Djava.awt.headless=true -jar ./PaDEL-Descriptor/PaDEL-Descriptor.jar -removesalt -standardizenitro -fingerprints -descriptortypes ./PaDEL-Descriptor/PubchemFingerprinter.xml -dir ./ -file descriptors_output.csv"
process = subprocess.Popen(bashCommand.split(), stdout=subprocess.PIPE)
output, error = process.communicate()
os.remove('molecule.smi')
I am getting this as an error:
FileNotFoundError: [Errno 2] No such file or directory: 'java': 'java'
Traceback:
File "/home/appuser/venv/lib/python3.7/site-packages/streamlit/script_runner.py", line 332, in _run_script
exec(code, module.__dict__)
File "/app/bioactivity-prediction/app.py", line 69, in <module>
desc_calc()
File "/app/bioactivity-prediction/app.py", line 13, in desc_calc
process = subprocess.Popen(bashCommand.split(), stdout=subprocess.PIPE)
File "/usr/local/lib/python3.7/subprocess.py", line 800, in __init__
restore_signals, start_new_session)
File "/usr/local/lib/python3.7/subprocess.py", line 1551, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
Also, here is a link to my deployed app:
https://share.streamlit.io/rahul97532/bioactivity-prediction/app.py
To install Java on Streamlit Cloud, you need to create a file packages.txt with the line default-jre in it, as demonstrated in this example repository:
https://github.com/randyzwitch/test-java/blob/main/packages.txt
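In other words, packages.txt at the repository root contains the single line below; Streamlit Cloud apt-installs everything listed in that file before the app starts, which puts java on the PATH:
default-jre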

Jupyter path error when on different server with mapped home directory

I work on two servers, serverA and serverB. On both of them, my home directory is mapped to the same location. Other than the home directory, the servers have independent file systems. This includes system directories and application directories. So, I created a special .bashrc_serverb file that is sourced if my hostname is serverB. This resets my path.
balter@serverB:~$ echo $PATH
/mnt/scratch/miniconda3/bin:/bin:/usr/local/bin:/usr/bin
I first installed conda and jupyter while logged in to serverA. Apparently it created a file ~/.local/share/jupyter/kernels/python3. I also installed conda and jupyter on serverB. Now when I try to run jupyter notebook or jupyter-console on serverB, I get:
```
balter@serverB:~$ jupyter-console
[ZMQTerminalIPythonApp] ERROR | Failed to run command:
['/home/...miniconda3/bin/python', '-m', 'ipykernel', '-f', '/home/users/balter/.local/share/jupyter/runtime/kernel-26741.json']
PATH='/mnt/scratch/miniconda3/bin:/bin:/usr/local/bin:/usr/bin'
with kwargs:
{'stdin': -1, 'cwd': None, 'start_new_session': True, 'stdout': None, 'stderr': None}
Traceback (most recent call last):
File "/mnt/scratch/miniconda3/bin/jupyter-console", line 5, in
app.main()
File "/mnt/scratch/miniconda3/lib/python3.5/site-packages/jupyter_core/application.py", line 267, in launch_instance
return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs)
File "/mnt/scratch/miniconda3/lib/python3.5/site-packages/traitlets/config/application.py", line 657, in launch_instance
app.initialize(argv)
File "", line 2, in initialize
File "/mnt/scratch/miniconda3/lib/python3.5/site-packages/traitlets/config/application.py", line 87, in catch_config_error
return method(app, *args, **kwargs)
File "/mnt/scratch/miniconda3/lib/python3.5/site-packages/jupyter_console/app.py", line 141, in initialize
self.init_shell()
File "/mnt/scratch/miniconda3/lib/python3.5/site-packages/jupyter_console/app.py", line 109, in init_shell
JupyterConsoleApp.initialize(self)
File "/mnt/scratch/miniconda3/lib/python3.5/site-packages/jupyter_client/consoleapp.py", line 334, in initialize
self.init_kernel_manager()
File "/mnt/scratch/miniconda3/lib/python3.5/site-packages/jupyter_client/consoleapp.py", line 288, in init_kernel_manager
self.kernel_manager.start_kernel(**kwargs)
File "/mnt/scratch/miniconda3/lib/python3.5/site-packages/jupyter_client/manager.py", line 243, in start_kernel
**kw)
File "/mnt/scratch/miniconda3/lib/python3.5/site-packages/jupyter_client/manager.py", line 189, in _launch_kernel
return launch_kernel(kernel_cmd, **kw)
File "/mnt/scratch/miniconda3/lib/python3.5/site-packages/jupyter_client/launcher.py", line 123, in launch_kernel
proc = Popen(cmd, **kwargs)
File "/mnt/scratch/miniconda3/lib/python3.5/subprocess.py", line 947, in init
restore_signals, start_new_session)
File "/mnt/scratch/miniconda3/lib/python3.5/subprocess.py", line 1551, in _execute_child
raise child_exception_type(errno_num, err_msg)
FileNotFoundError: [Errno 2] No such file or directory: '/home/...miniconda3/bin/python'
```
The last line is the crucial one. That path is on serverA (full path obscured for security).
What is the fix for this?
Cross-posted as a jupyter issue.
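One way to see which interpreter each kernelspec launches (a diagnostic sketch using jupyter_client's kernelspec API; not part of the original thread):
import json
import os

from jupyter_client.kernelspec import KernelSpecManager

# print each kernelspec's name, location, and the interpreter it launches
for name, resource_dir in KernelSpecManager().find_kernel_specs().items():
    with open(os.path.join(resource_dir, 'kernel.json')) as f:
        argv = json.load(f)['argv']
    print(name, resource_dir, argv[0])
Because ~/.local/share/jupyter/kernels lives in the mapped home directory, a kernelspec written on serverA is visible on serverB, which is how a serverA interpreter path can leak into serverB sessions.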

DAG started from airflow scheduler not finding values in airflow.cfg

I'm taking advantage of the fact that Airflow v1.7.1.3 provides access to airflow.cfg to place some configuration values there rather than embedding them in the code. We added the following as the first lines of the airflow.cfg file:
[foo]
bar = foo
    bar
In foobarDAG.py, the module defining the DAG, I do the following:
from airflow.configuration import conf
…
def fooBar():
    pass

foobarList = conf['foo']['bar'].split('\n')

foobarOperator = PythonOperator(
    task_id='fooBar',
    provide_context=True,
    python_callable=fooBar,
    op_args=[foobarList],
    dag=dag)
Testing this manually from the Python prompt is easy:
>>> from foobarDAG import foobarList
…
>>> foobarList
['foo', 'bar']
That's just what I would expect from the information in airflow.cfg, above.
We've also performed a test on the DAG directly:
airflow test foobarDAG fooBar 10-19-2016
That doesn't report any problems.
The problem crops up when we try to use the scheduler to schedule that one DAG:
airflow scheduler -d foobarDAG >& foobar_log.txt
In the web UI, we see the following at the top of the "DAGS" section:
Broken DAG: [/path/to/…/foobarDAG.py] 'foo'
And in foobar_log.txt, here is the error message:
[2016-10-19 14:56:09,028] {models.py:250} ERROR - Failed to import: /path/to/foobarDAG.py
Traceback (most recent call last):
File "/path/to/airflow/models.py", line 247, in process_file
m = imp.load_source(mod_name, filepath)
File "/path/to/anaconda3/envs/foobarenv/lib/python3.5/imp.py", line 172, in load_source
module = _load(spec)
File "<frozen importlib._bootstrap>", line 693, in _load
File "<frozen importlib._bootstrap>", line 673, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 662, in exec_module
File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
File "/path/to/foobarDAG.py", line 67, in <module>
foobarList = conf['foo']['bar'].split('\n')
File "/path/to/anaconda3/envs/foobarenv/lib/python3.5/configparser.py", line 956, in __getitem__
raise KeyError(key)
KeyError: 'foo'
So, oddly, it appears that the scheduler isn't retrieving the ['foo'] section from airflow.cfg and providing it to the DAG. Any idea why?
It turns out that everything was working properly, but the scheduler hadn't been restarted. The scheduler was apparently still using the old airflow.cfg, which did not have the added section.

How to add a custom parser to logster?

I want to track the HTTP response codes returned by my nginx web-server, using logster.
1) I found and installed logster. I also pip-installed pygtail, which is required for logster.
https://github.com/etsy/logster
2) I found a Python script that parses the nginx access_log and placed it in the parsers subdir.
https://github.com/metabrainz/logster/blob/master/musicbrainz/logster/NginxStatus.py
...but when I run the logster command, I get a Python exception:
Traceback (most recent call last):
File "/usr/local/bin/logster", line 5, in <module>
pkg_resources.run_script('logster==0.0.1', 'logster')
File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 505, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 1245, in run_script
execfile(script_filename, namespace, namespace)
File "/usr/local/lib/python2.7/dist-packages/logster-0.0.1-py2.7.egg/EGG-INFO/scripts/logster", line 449, in <module>
main()
File "/usr/local/lib/python2.7/dist-packages/logster-0.0.1-py2.7.egg/EGG-INFO/scripts/logster", line 380, in main
module = __import__(module_name, globals(), locals(), [parser_name])
ImportError: No module named NginxStatus1
What am I doing wrong?
The exception message was rather misleading: the file was placed in the right place (the parsers subdir), but, as it turns out, logster must be re-installed after a new parser is added (this isn't documented, unfortunately). So just run:
sudo python setup.py install
in the logster directory, and things should start working correctly.
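After reinstalling, a dry run should confirm the parser is picked up; for example (the parser name and log path are illustrative, following logster's README-style usage, and are not taken from the original post):
sudo logster --dry-run --output=ganglia NginxStatus /var/log/nginx/access.log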
