How to start the grpc server when definitions are in a module - dagster

My project structure looks like this:
/definitions (for all dagster python definitions)
__init__.py
repositories.py
/exchangerates
pipelines.py
...
...
workspace.yaml
I've tried running the gRPC server in various ways, in particular the following (run from the project root):
dagster api grpc -h 0.0.0.0 -p 4000 -f definitions/repositories.py
dagster api grpc -h 0.0.0.0 -p 4000 -m definitions
dagster api grpc -h 0.0.0.0 -p 4000 -m definitions.repositories
The first command yields the following error:
dagster.core.errors.DagsterImportError: Encountered ImportError: attempted relative import with no known parent package while importing module repositories from file C:\Users\Klaus\PycharmProjects\dagsterexchangerates\definitions\repositories.py. Consider using the module-based options -m for CLI-based targets or the python_package workspace.yaml target.
The second and third command yield the following error:
(stacktrace comes before this)
ModuleNotFoundError: No module named 'definitions'
How can this be solved?
EDIT:
I've uploaded the current version of the example I'm working on to GitHub: https://github.com/kstadler/dagster-exchangerates
EDIT2:
Reflected changes in directory structure

Sorry about the trouble - there are a couple of options here to get your server running.
To get it working with the '-f' option, the relative imports need to be replaced with absolute imports. That would look like this:
-from .pipelines import exchangerates_pipline
-from .partitions import year_partition_set
+from definitions.pipelines import exchangerates_pipline
+from definitions.partitions import year_partition_set
(This is the same error you would get if you tried to run python definitions/repositories.py directly).
I'm still digging into why exactly the third '-m' option isn't working the way I'd expect it to. Curiously, the following command, which should be nearly identical, works for me:
python -m dagster.grpc -h 0.0.0.0 -p 4000 -m definitions.repositories
Incidentally, your example contains a workspace.yaml that should cause dagit and other Dagster processes to automatically start up the gRPC server for you for that module - so depending on your goal you may not need to run dagster api grpc yourself at all.
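As a quick sanity check on the '-m' targets, it can help to confirm that the package resolves at all from the directory you launch the server in. Here is a hypothetical helper (not part of Dagster; the filename is made up), saved in the project root next to workspace.yaml:
# check_import.py - run `python check_import.py` from the project root.
# If this raises ModuleNotFoundError, then `dagster api grpc -m definitions.repositories`
# launched from the same directory is likely to fail to find the module for the same reason.
import importlib

module = importlib.import_module("definitions.repositories")
print("resolved:", module.__file__)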

Related

Running python -m http.server does not load localhost

Previously, when I ran the python -m http.server command from a folder, it would show a link like "http://localhost:5500". But after I accidentally ran npm install http-server, it no longer shows localhost and instead prints something like this:
Serving HTTP on :: port 8000 (http://[::]:8000/) ...
Is there any way to change this back to localhost? http://[::]:8000/ does not work and I get the error "This site can’t be reached".
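If the goal is simply to get a localhost URL back, one option is to bind the server explicitly to IPv4 localhost, either with python -m http.server --bind 127.0.0.1 8000 or with a small script. A minimal sketch using only the standard library (the filename is made up):
# serve_localhost.py - serve the current directory on IPv4 localhost only, so the
# startup message points at http://127.0.0.1:8000/ instead of http://[::]:8000/
from http.server import HTTPServer, SimpleHTTPRequestHandler

server = HTTPServer(("127.0.0.1", 8000), SimpleHTTPRequestHandler)
print("Serving HTTP on http://127.0.0.1:8000/ ...")
server.serve_forever()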

Using Docker image without Entrypoint to serve R plumber API

I use the geospatial rocker2 image to deploy Rstudio for development and a Shiny app for production. By using a single image, I have a consistent package library, credentials and database connections. I would like to use this same image to serve a plumber API.
Using the standard plumber.R example and the standard plumber Docker example I have tried to serve it as follows:
docker run -v `pwd`/app/plumber.R:/plumber.R --name plumber --restart=unless-stopped \
-p 8000:8000 my_rocker2_fork/geospatial Rscript /plumber.R
Success, kind of. The plumber.R file is clearly being sourced, but it is not being "plumbed". Another issue is that the container continually restarts (visible in the output of docker ps - please ignore the node.js container that is also running). One more oddity is that port 8000 isn't always shown - sometimes it is, sometimes it isn't - which I think is related to the restarting behaviour.
My code isn't plumbed, because I don't have the Entrypoint that is standard in the rstudio/plumber Dockerfile, and I don't think I want this Entrypoint, as it may cause issues with Rstudio Server and the Shiny app that are also in this image. Therefore, I think it is probably optimal to "plumb" by expanding the Rscript command at the end of my Docker run statement:
docker run -v `pwd`/app/plumber.R:/plumber.R -p 8000:8000 my_rocker2_fork/geospatial \
'Rscript pr("/plumber.R") %>% pr_run(port = 8000)' &
However, this fails because of all the special characters (like the pipe operator). How can I serve plumber code with an arbitrary Dockerfile without an Entrypoint?
The answer is simple! Call a script that sets the plumbing in motion, e.g.
docker run -v `pwd`/app/plumb_start.R:/plumb_start.R -p 8000:8000 my_rocker2_fork/geospatial \
Rscript plumb_start.R
Where plumb_start.R contains:
library(plumber)
pr("plumber.R") %>% pr_run(port=8000)
Make sure that you also expose port 8000 in the Dockerfile.

How to resolve "Error: No module named 'airflow.www'" while starting airflow webserver

I'm getting the error below while starting the Airflow webserver:
balajee@Balajees-MacBook-Air.local:~$ airflow webserver -p 8080
[2018-12-03 00:29:37,066] {__init__.py:51} INFO - Using executor SequentialExecutor
[2018-12-03 00:29:38,776] {models.py:271} INFO - Filling up the DagBag from /Users/balajee/airflow/dags
Running the Gunicorn Server with:
Workers: 4 sync
Host: 0.0.0.0:8080
Timeout: 120
Logfiles: - -
Error: No module named 'airflow.www'
Fixed for me
pip3 uninstall -y gunicorn
pip3 install gunicorn==19.4.0
I got this problem this morning and found a strange solution; maybe it will help you. I think you just need to change the directory you run the command from.
I installed Airflow's basic dependencies in my virtualenv directory venv with PyCharm's help, and I used PyCharm's built-in Terminal tab to work directly in that venv. I ran airflow initdb to initialize the SQLite database that stores all my logs and ops, and then, following the official tutorial, used airflow webserver to start the webserver. But today I used my Mac terminal instead, activated the virtualenv, started airflow webserver, and hit this problem:
Running the Gunicorn Server with:
Workers: 4 sync
Host: 0.0.0.0:8080
Timeout: 120
Logfiles: - -
=================================================================
Error: No module named 'airflow.www'
[2019-05-26 07:45:27,130] {cli.py:833} ERROR - No response from gunicorn master within 120 seconds
[2019-05-26 07:45:27,130] {cli.py:834} ERROR - Shutting down webserver
I tried @Evgeniy Sobolev's solution of reinstalling gunicorn and nothing changed, but from my PyCharm Terminal it still runs successfully. I guess the directory from which you first init your db and run the webserver is critical. By default, when I use the PyCharm Terminal to init the db and start the webserver, that directory is the project root, like:
(venv) root@root:~/GitHub/FakeProject$ airflow webserver
But today I went into a subdirectory to activate the virtualenv, so the working directory changed:
root@root:~/GitHub/FakeProject/SubDir$ source venv/bin/activate
(venv) root@root:~/GitHub/FakeProject/SubDir$ airflow webserver
** Error **
Run this way, it hits Error: No module named 'airflow.www'. So I checked the directory, changed back up a level, and the webserver ran successfully, just like in the PyCharm Terminal:
(venv) root@root:~/GitHub/FakeProject/SubDir$ cd ..
(venv) root@root:~/GitHub/FakeProject$ airflow webserver
** It works **
I think Airflow may store some metadata (a path, perhaps) the first time you init your airflow db, so you cannot change the directory you run the command from afterwards.
I hope this helps somebody in the future. Just check your directory!
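If you want to check whether this directory effect applies to your setup, you can ask the interpreter directly where (and whether) it finds airflow.www from each directory. A hypothetical diagnostic snippet (not an Airflow command; the filename is made up):
# where_airflow_www.py - run it with the same virtualenv active, once from each
# directory, and compare the output.
import importlib.util
import sys

print("interpreter:", sys.executable)
try:
    spec = importlib.util.find_spec("airflow.www")
    print("airflow.www:", spec.origin if spec else "NOT FOUND")
except ModuleNotFoundError as exc:
    print("airflow.www:", exc)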
Looks like you have a problem with gunicorn.
Try executing these two commands:
sudo -H pip3 uninstall -y gunicorn
sudo -H pip3 install gunicorn
It should resolve your problem, because Airflow shows an unclear error message for what are really gunicorn problems.
These are the steps that led to the problem for me:
create a separate virtualenv only for airflow (I use the Anaconda distribution)
activate this env with conda activate
install airflow: pip install apache-airflow
at this point the error No module named 'airflow.www' showed up for me
To fix it, follow these steps:
Find where gunicorn is installed: whereis gunicorn
gunicorn should exist only in your virtualenv directory: /home/yourname/anaconda3/envs/airflow_env/bin/gunicorn
If it shows up in two directories, keep only the copy in your airflow environment and remove the others.
Another way to check whether gunicorn lives in other directories is to print your PATH variable: echo $PATH. Look for gunicorn in /home/yourname/.local/bin and in other Anaconda directories on PATH, and remove all of those references. Remove gunicorn from the conda base env as well: pip uninstall gunicorn.
With these steps, I think your problem will be solved.
I used the Anaconda distribution, but I think the same process works without it. I used Airflow 1.10.0 and Python 3.6.
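A quick way to confirm that gunicorn and Airflow resolve from the same environment is to compare the interpreter path with the first gunicorn found on PATH. A small hypothetical helper (the filename is made up):
# which_gunicorn.py - ideally both printed paths live inside the same virtualenv or
# conda env; if gunicorn resolves to /usr/local/bin or ~/.local/bin instead, that copy
# is the one shadowing the gunicorn installed alongside Airflow.
import shutil
import sys

print("python:  ", sys.executable)
print("gunicorn:", shutil.which("gunicorn"))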
If you defined a custom home directory for Airflow other than the default one (~/airflow) during installation:
First, export the custom path:
export AIRFLOW_HOME=/your/custom/path/airflow
Go to the airflow directory and then run the webserver:
airflow webserver -p 8080
Run the scheduler too:
airflow scheduler
Please check whether gunicorn is already installed on the server. For me it was installed in /usr/local/bin and was taking precedence over the gunicorn version installed with Airflow. Uninstall the earlier one or fix your $PATH variable.
I solved this by starting the webserver from the airflow folder itself.
I was previously trying to start the server from the home directory, where the required modules could not be found, which may be the case here.
Late to the party, but this could help others who end up here.
I got the same issue using the latest Airflow version, 2.5.0.
Make sure the env variable AIRFLOW_HOME is pointing to the right location.
Thanks all for sharing.
I added sudo and it worked just fine.
I got the same error today and sudo did the trick for me.

Airflow dags and PYTHONPATH

I have some DAGs that can't seem to locate Python modules. Inside the Airflow UI, I see a ton of variations of this message:
Broken DAG: [/home/airflow/source/airflow/dags/test.py] No module named 'paramiko'
Inside a DAG file I can directly modify the Python sys.path, and that seems to mitigate the issue:
import sys
sys.path.append('/home/airflow/.local/lib/python2.7/site-packages')
Having to set the path directly in my code doesn't feel right, though. I've tried exporting PYTHONPATH in the Airflow user account's .bashrc, but it doesn't seem to be read when the DAG jobs are executed. What's the correct way to go about this?
Thanks.
----- update -----
Thanks for the responses.
Below are my systemd service files.
::::::::::::::
airflow-scheduler-airflow2.service
::::::::::::::
[Unit]
Description=Airflow scheduler daemon
[Service]
EnvironmentFile=/usr/local/airflow/instances/airflow2/etc/envars
User=airflow2
Group=airflow2
Type=simple
ExecStart=/usr/local/airflow/instances/airflow2/venv/bin/airflow scheduler
Restart=always
RestartSec=5s
[Install]
WantedBy=multi-user.target
::::::::::::::
airflow-webserver-airflow2.service
::::::::::::::
[Unit]
Description=Airflow webserver daemon
[Service]
EnvironmentFile=/usr/local/airflow/instances/airflow2/etc/envars
User=airflow2
Group=airflow2
Type=simple
ExecStart=/usr/local/airflow/instances/airflow2/venv/bin/airflow webserver
Restart=always
RestartSec=5s
[Install]
WantedBy=multi-user.target
This is the contents of the EnvironmentFile used above:
more /usr/local/airflow/instances/airflow2/etc/envars
PATH=/usr/local/airflow/instances/airflow2/venv/bin:/usr/local/bin:/usr/bin:/bin
AIRFLOW_HOME=/usr/local/airflow/instances/airflow2/home
AIRFLOW_CONFIG=/usr/local/airflow/instances/airflow2/etc/airflow.cfg
I had a similar issue:
Python wasn't being loaded from the virtualenv when running Airflow (fixing this made Airflow's dependencies come from the virtualenv)
Submodules under the dags path weren't being loaded because of a different base path (fixing this made my own modules under the dags folder importable)
I added the following lines to the environment file for the systemd service
(/usr/local/airflow/instances/airflow2/etc/envars in your case):
source /home/ubuntu/venv/airflow/bin/activate
PYTHONPATH=/home/ubuntu/venv/airflow/dags:$PYTHONPATH
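To verify what the scheduler environment actually looks like after a change like this, one option is to temporarily drop a tiny module into the dags folder that logs the interpreter and sys.path when the file is parsed. A hypothetical diagnostic (the filename is made up; remove it when done):
# dags/_debug_paths.py - the airflow scheduler parses every file in the dags folder,
# so these warnings should show up in its DAG-processing logs.
import logging
import sys

log = logging.getLogger(__name__)
log.warning("interpreter: %s", sys.executable)
log.warning("sys.path: %s", sys.path)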
It looks like your python environment is degraded - you have multiple instances of python on your vm (python 3.6 and python 2.7) and multiple instances of pip. There is a pip with python3.6 that is trying to be used, but all of your modules are actually with your python 2.7.
This can be solved easily by using symbolic links to redirect to 2.7.
Type these commands and see which instance of python is used (2.7.5, 2.7.14, 3.6, etc.):
python
python2
python2.7
or type which python to find which python instance is being used by your vm. You can also do which pip to see what pip instance is being used.
I am going to assume that python and which python lead to Python 3 (which you do not want to use), while python2 and python2.7 lead to the instance you do want to use.
To make /home/airflow/.local/lib/python2.7/ the instance that gets used, create the following symbolic links:
cd /home/airflow/.local/lib/python2.7
ln -s python2 python
ln -s /home/airflow/.local/lib/python2.7 python2
The symbolic link syntax is: ln -s TARGET LINK_NAME
You are essentially saying: when the command python is run, go to python2; when python2 is run, go to /home/airflow/.local/lib/python2.7. It's all being redirected.
Now re-run the three commands above (python, python2, python2.7). All should lead to the python instance you want.
Hope this helps!
You can add this directly to the Airflow Dockerfile, as in the example below. If you have a .env file you can do ENV PYTHONPATH "${PYTHONPATH}:${AIRFLOW_HOME}".
FROM puckel/docker-airflow:1.10.6
RUN pip install --user psycopg2-binary
ENV AIRFLOW_HOME=/usr/local/airflow
# add persistent python path (for local imports)
ENV PYTHONPATH "/home/jovyan/work:${AIRFLOW_HOME}"
COPY ./airflow.cfg /usr/local/airflow/airflow.cfg
CMD ["airflow initdb"]
I still have the same problem when I try to trigger a DAG from the UI (it can't locate local Python modules, i.e. my_module.my_sub_module, etc.), but when I test with:
airflow test my_dag my_task 2021-04-01
it works fine!
I also have this line in my .bashrc (pointing to where the local Python modules are supposed to be found):
export PYTHONPATH="/home/my_user"
Sorry, I know this topic is very old, but I had a lot of problems launching Airflow as a daemon, so I'll share my solution.
First I installed Anaconda in /home/myuser/anaconda3 and installed all the libraries I use in my DAGs, then I created the following files:
/etc/systemd/system/airflow-webserver.service
[Unit]
Description=Airflow webserver daemon
After=network.target
[Service]
Environment="PATH=/home/ubuntu/anaconda3/envs/airflow/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
RuntimeDirectory=airflow
RuntimeDirectoryMode=0775
User=myuser
Group=myuser
Type=simple
ExecStart=/bin/bash -c 'source /home/myuser/anaconda3/bin/activate; airflow webserver -p 8080 --pid /home/myuser/airflow/webserver.pid'
Restart=on-failure
RestartSec=5s
PrivateTmp=true
[Install]
WantedBy=multi-user.target
The same for the scheduler daemon:
/etc/systemd/system/airflow-schedule.service
[Unit]
Description=Airflow schedule daemon
After=network.target
[Service]
Environment="PATH=/home/ubuntu/anaconda3/envs/airflow/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
RuntimeDirectory=airflow
RuntimeDirectoryMode=0775
User=myuser
Group=myuser
Type=simple
ExecStart=/bin/bash -c 'source /home/myuser/anaconda3/bin/activate; airflow scheduler'
Restart=on-failure
RestartSec=5s
PrivateTmp=true
[Install]
WantedBy=multi-user.target
Then run these systemctl commands:
sudo systemctl daemon-reload
sudo systemctl enable airflow-webserver.service
sudo systemctl enable airflow-schedule.service
sudo systemctl start airflow-webserver.service
sudo systemctl start airflow-schedule.service

Why does OpenMPI use a different server given a different -n setting?

I am testing out OpenMPI, provided and compiled by another user (I am using soft links to his directories for bin, include, etc. - all the mandatory directories), but I ran into this weird thing:
First of all, if I run mpirun with an -n setting <= 10, I can run the command below. testrunmpi.py simply prints out "run." from each core.
# I am in serverA.
bash-3.2$ /home/karl/bin/mpirun -n 10 ./testrunmpi.py
run.
run.
run.
run.
run.
run.
run.
run.
run.
run.
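(testrunmpi.py itself is not shown; a minimal stand-in that matches the description - every launched process just prints one line - would be something like the following.)
#!/usr/bin/env python
# guessed stand-in for testrunmpi.py: each rank started by mpirun prints a single line
print("run.")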
However, when I try running with -n greater than 10, I run into this:
bash-3.2$ /home/karl/bin/mpirun -n 24 ./testrunmpi.py
karl@serverB's password: Could not chdir to home directory /home/karl: No such file or directory
bash: /home/karl/bin/orted: No such file or directory
--------------------------------------------------------------------------
A daemon (pid 19203) died unexpectedly with status 127 while attempting
to launch so we are aborting.
There may be more information reported by the environment (see above).
This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
bash-3.2$
bash-3.2$
Permission denied, please try again.
karl@serverB's password:
Permission denied, please try again.
karl@serverB's password:
I see that the work is dispatched to serverB while I am on serverA. I don't have an account on serverB. But if I invoke mpirun with -n <= 10, the work stays on serverA.
This is strange, so I checked /home/karl/etc/openmpi-default-hostfile and tried setting the following:
serverA slots=24 max_slots=24
serverB slots=0 max_slots=32
But the problem persists and still gives the same error message as above. What must I do to have my program run on serverA only?
The default hostfile in Open MPI is system-wide, i.e. its location is determined while the library is being built and installed and there is no user-specific version of it. The actual location can be obtained by running the ompi_info command like this:
$ ompi_info --param orte orte | grep orte_default_hostfile
MCA orte: parameter "orte_default_hostfile" (current value: <LOOK HERE>, data source: default value)
You can override the list of hosts in several different ways. First, you can provide your own hostfile via the -hostfile option to mpirun. If so, you don't have to put hosts with zero slots inside it - simply omit machines that you have no access to. For example:
localhost slots=10 max_slots=10
serverA slots=24 max_slots=24
You can also change the path to the default hostfile by setting the orte_default_hostfile MCA parameter:
$ mpirun --mca orte_default_hostfile /path/to/your/hostfile -n 10 executable
Instead of passing the --mca option each time, you can set the value in an exported environment variable called OMPI_MCA_orte_default_hostfile. This could be set in your shell's dot-rc file, e.g. in .bashrc if using Bash.
You can also specify the list of nodes directly via the -H (or -host) option.
