Is it possible to run a custom OpenAI gym environment entirely from within Jupyter Notebook - jupyter-notebook

Long story short: I have been given some Python code for a custom openAI gym environment. I can successfully run the code via ExperimentGrid from the command line but would like to be able to run the entire experiment from within Jupyter notebook, rather than calling scripts. This would be more convenient for some experiments that I will be doing farther down the road.
My question: Is it possible to execute an experiment on a custom OpenAI gym environment entirely from within Jupyter Notebook and if so, how? I've seen plenty of examples of people executing gym's standard environments (like SpaceInvaders-v0 or CartPole-v0) from Jupyter but even then, they are calling the environment with
env=gym.make('SpaceInvaders-v0')
and essentially executing that environment's script behind the scenes.
Below is a basic description of how my code is set-up to run from the command line and the errors that I'm getting in Jupyter.
Any advice would be appreciated. I am admittedly rather new to Gym, Python and Linux.
My basic environment code is structured like this in, say, envs/mygames/Custom_Env.py:
various import statements (numpy, gym, pyglet, copy)
class Entity()
class State()
class The_Custom_Env(core.Env) # This is the main environment class
class Shell_Class # This class calls The_Custom_Env and provides some arguments
In mygames/__ init__.py,I import the Shell_Class:
from gym.envs.mygames.Custom_Env import Shell_Class
In envs/__ init__.py, I have the environment registered
register(
id='TEST-v0',
entry_point='gym.envs.mygames:Shell_Class',
max_episode_steps=200,
reward_threshold=25.0,)
Finally, if I execute a script containing this code from the command line, the experiment works without issue:
from spinup.utils.run_utils import ExperimentGrid
from spinup import ppo_pytorch
import torch
if __name__ == '__main__':
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--cpu', type=int, default=4)
parser.add_argument('--num_runs', type=int, default=1)
args = parser.parse_args()
eg = ExperimentGrid(name='super-cool-test')
eg.add('env_name', 'TEST-v0', '', True)
eg.add('seed', [10*i for i in range(args.num_runs)])
eg.add('epochs', [10])
eg.add('steps_per_epoch', 4000)
eg.add('ac_kwargs:hidden_sizes', [(32, 32)], 'hid')
eg.add('ac_kwargs:activation', [torch.nn.ReLU], '')
eg.add('pi_lr', [0.001])
eg.add('clip_ratio', 0.3)
eg.run(ppo_pytorch, num_cpu=args.cpu)
My Jupyter Attempt
I put all of the code from Custom_env.py in cell #1.
I then registered the environment in cell #2:
gym.register(
id='TEST-v1',
entry_point='__main__:Shell_Class',
max_episode_steps=200,
reward_threshold=25.0,)
based on this Q/A: Register gym environment that is defined inside a jupyter notebook cell
, I make the environment in cell #3:
gym.make('TEST-v1')
and get this non-descriptive output:
<TimeLimit<Shell_Class< TEST-v1 >>>
In cell #4, I tried to execute ExperimentGrid code directly within Jupyter like so:
from spinup.utils.run_utils import ExperimentGrid
from spinup import ppo_pytorch
import torch
num_runs=1
cpu=4
env_name='TEST-v1'
eg = ExperimentGrid(name='Jupyter-test')
eg.add('env_name', env_name, '', True)
eg.add('seed', [10*i for i in range(num_runs)])
eg.add('epochs', 500)
eg.add('steps_per_epoch', 4000)
eg.add('ac_kwargs:hidden_sizes', [(32, 32)], 'hid')
eg.add('ac_kwargs:activation', [torch.nn.ReLU], '')
eg.add('pi_lr', 0.001)
eg.add('clip_ratio', 0.3)
eg.run(ppo_pytorch, num_cpu=cpu)
The experiment starts up as usual but then runs into some kind of error:
> ================================================================================
ExperimentGrid [Jupyter-test] runs over parameters:
env_name []
TEST-v1
seed [see]
0
epochs [epo]
500
steps_per_epoch [ste]
4000
ac_kwargs:hidden_sizes [hid]
(32, 32)
ac_kwargs:activation []
ReLU
pi_lr [pi]
0.001
clip_ratio [cli]
0.3
Variants, counting seeds: 1
Variants, not counting seeds: 1
================================================================================
Preparing to run the following experiments...
Jupyter-test_test-v1
================================================================================
Launch delayed to give you a few seconds to review your experiments.
To customize or disable this behavior, change WAIT_BEFORE_LAUNCH in
spinup/user_config.py.
================================================================================
Running experiment:
Jupyter-test_test-v1
with kwargs:
{
"ac_kwargs": {
"activation": "ReLU",
"hidden_sizes": [
32,
32
]
},
"clip_ratio": 0.3,
"env_name": "TEST-v1",
"epochs": 500,
"pi_lr": 0.001,
"seed": 0,
"steps_per_epoch": 4000
}
================================================================================
There appears to have been an error in your experiment.
Check the traceback above to see what actually went wrong. The
traceback below, included for completeness (but probably not useful
for diagnosing the error), shows the stack leading up to the
experiment launch.
================================================================================
---------------------------------------------------------------------------
CalledProcessError Traceback (most recent call last)
<ipython-input-14-de843fd528cf> in <module>
15 eg.add('pi_lr', 0.001)
16 eg.add('clip_ratio', 0.3)
---> 17 eg.run(ppo_pytorch, num_cpu=cpu)
~/Downloads/spinningup/spinup/utils/run_utils.py in run(self, thunk, num_cpu, data_dir, datestamp)
544
545 call_experiment(exp_name, thunk_, num_cpu=num_cpu,
--> 546 data_dir=data_dir, datestamp=datestamp, **var)
547
548
~/Downloads/spinningup/spinup/utils/run_utils.py in call_experiment(exp_name, thunk, seed, num_cpu, data_dir, datestamp, **kwargs)
169 cmd = [sys.executable if sys.executable else 'python', entrypoint, encoded_thunk]
170 try:
--> 171 subprocess.check_call(cmd, env=os.environ)
172 except CalledProcessError:
173 err_msg = '\n'*3 + '='*DIV_LINE_WIDTH + '\n' + dedent("""
~/anaconda3/envs/spinningup/lib/python3.6/subprocess.py in check_call(*popenargs, **kwargs)
309 if cmd is None:
310 cmd = popenargs[0]
--> 311 raise CalledProcessError(retcode, cmd)
312 return 0
313

Related

Connect to Azure ML Workspace with Python SDK1 in Compute Instance

I have a script that allow me to connect to Azure ML Workspace. But it only works in local (interactive authentification) or on Cluster Instances while doing experiences.
But I can not manage to make it work in Compute Instances.
from azureml.core import Workspace
from azureml.core.run import Run, _OfflineRun
run = Run.get_context()
if isinstance(run, _OfflineRun):
workspace = Workspace(
"subscription_id",
"resource_group",
"workspace_name",
)
else:
workspace = run.experiment.workspace
I tried to use https://learn.microsoft.com/en-us/python/api/azureml-core/azureml.core.authentication.msiauthentication?view=azure-ml-py but it did not work.
from azureml.core.authentication import MsiAuthentication
from azureml.core import Workspace
msi_auth = MsiAuthentication()
workspace = Workspace(
"subscription_id",
"resource_group",
"workspace_name",
auth=msi_auth,
)
File ~/localfiles/.venv/lib/python3.8/site-packages/azureml/_vendor/azure_cli_core/auth/adal_authentication.py:65, in MSIAuthenticationWrapper.set_token(self)
63 from azureml._vendor.azure_cli_core.azclierror import AzureConnectionError, AzureResponseError
64 try:
---> 65 super(MSIAuthenticationWrapper, self).set_token()
66 except requests.exceptions.ConnectionError as err:
67 logger.debug('throw requests.exceptions.ConnectionError when doing MSIAuthentication: \n%s',
68 traceback.format_exc())
File ~/localfiles/.venv/lib/python3.8/site-packages/msrestazure/azure_active_directory.py:596, in MSIAuthentication.set_token(self)
594 def set_token(self):
595 if _is_app_service():
--> 596 self.scheme, _, self.token = get_msi_token_webapp(self.resource, self.msi_conf)
597 elif "MSI_ENDPOINT" in os.environ:
598 self.scheme, _, self.token = get_msi_token(self.resource, self.port, self.msi_conf)
File ~/localfiles/.venv/lib/python3.8/site-packages/msrestazure/azure_active_directory.py:548, in get_msi_token_webapp(resource, msi_conf)
546 raise RuntimeError(err_msg)
547 _LOGGER.debug('MSI: token retrieved')
--> 548 token_entry = result.json()
549 return token_entry['token_type'], token_entry['access_token'], token_entry
File ~/localfiles/.venv/lib/python3.8/site-packages/requests/models.py:975, in Response.json(self, **kwargs)
971 return complexjson.loads(self.text, **kwargs)
972 except JSONDecodeError as e:
973 # Catch JSON-related errors and raise as requests.JSONDecodeError
974 # This aliases json.JSONDecodeError and simplejson.JSONDecodeError
--> 975 raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
I do not want to use SDK2, it will break all my existing code.
I don't understand why the identidy is not automatically managed when starting the Compute Instances like Cluster Instances.
Does any one has a solution for this?
Use the below code blocks to connect to the existing workspace using the python SDK using the compute instances.
Create a compute instance
from azureml.core import Workspace
ws = Workspace.from_config()
ws.get_details()
Output:
Connecting to workspace using Config file:
import os
from azureml.core import Workspace
ws = Workspace.from_config()

Calling Julia from Streamlit App using PyJulia

I'm trying to use a julia function from a streamlit app. Created a toy example to test the interaction, simply returning a matrix from a julia functions based on a single parameter to specify the value of the diagonal elements.
Will also note at the outset that both julia_import_method = "api_compiled_false" and julia_import_method = "main_include" works when importing the function in Spyder IDE (rather than at the command line to launch the streamlit app via streamlit run streamlit_julia_test.py).
My project directory looks like:
├── my_project_directory
│   ├── julia_test.jl
│   └── streamlit_julia_test.py
The julia function is in julia_test.jl and just simply returns a matrix with diagonals specified by the v parameter:
function get_matrix_from_julia(v::Int)
m=[v 0
0 v]
return m
end
The streamlit app is streamlit_julia_test.py and is defined as:
import os
from io import BytesIO
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import streamlit as st
# options:
# main_include
# api_compiled_false
# dont_import_julia
julia_import_method = "api_compiled_false"
if julia_import_method == "main_include":
# works in Spyder IDE
import julia
from julia import Main
Main.include("julia_test.jl")
elif julia_import_method == "api_compiled_false":
# works in Spyder IDE
from julia.api import Julia
jl = Julia(compiled_modules=False)
this_dir = os.getcwd()
julia_test_path = """include(\""""+ this_dir + """/julia_test.jl\"""" +")"""
print(julia_test_path)
jl.eval(julia_test_path)
get_matrix_from_julia = jl.eval("get_matrix_from_julia")
elif julia_import_method == "dont_import_julia":
print("Not importing ")
else:
ValueError("Not handling this case:" + julia_import_method)
st.header('Using Julia in Streamlit App Example')
st.text("Using Method:" + julia_import_method)
matrix_element = st.selectbox('Set Matrix Diagonal to:', [1,2,3])
matrix_numpy = np.array([[matrix_element,0],[0,matrix_element]])
col1, col2 = st.columns([4,4])
with col1:
fig, ax = plt.subplots(figsize=(5,5))
sns.heatmap(matrix_numpy, ax = ax, cmap="Blues",annot=True)
ax.set_title('Matrix Using Python Numpy')
buf = BytesIO()
fig.savefig(buf, format="png")
st.image(buf)
with col2:
if julia_import_method == "dont_import_julia":
matrix_julia = matrix_numpy
else:
matrix_julia = get_matrix_from_julia(matrix_element)
fig, ax = plt.subplots(figsize=(5,5))
sns.heatmap(matrix_julia, ax = ax, cmap="Blues",annot=True)
ax.set_title('Matrix from External Julia Script')
buf = BytesIO()
fig.savefig(buf, format="png")
st.image(buf)
If the app were working correctly, it would look like this (which can be reproduced by setting the julia_import_method = "dont_import_julia" on line 13):
Testing
When I try julia_import_method = "main_include", I get the well known error:
julia.core.UnsupportedPythonError: It seems your Julia and PyJulia setup are not supported.
Julia executable:
julia
Python interpreter and libpython used by PyCall.jl:
/Users/myusername/opt/anaconda3/bin/python3
/Users/myusername/opt/anaconda3/lib/libpython3.9.dylib
Python interpreter used to import PyJulia and its libpython.
/Users/myusername/opt/anaconda3/bin/python
/Users/myusername/opt/anaconda3/lib/libpython3.9.dylib
Your Python interpreter "/Users/myusername/opt/anaconda3/bin/python"
is statically linked to libpython. Currently, PyJulia does not fully
support such Python interpreter.
The easiest workaround is to pass `compiled_modules=False` to `Julia`
constructor. To do so, first *reboot* your Python REPL (if this happened
inside an interactive session) and then evaluate:
>>> from julia.api import Julia
>>> jl = Julia(compiled_modules=False)
Another workaround is to run your Python script with `python-jl`
command bundled in PyJulia. You can simply do:
$ python-jl PATH/TO/YOUR/SCRIPT.py
See `python-jl --help` for more information.
For more information, see:
https://pyjulia.readthedocs.io/en/latest/troubleshooting.html
As suggested, when I set julia_import_method = "api_compiled_false", I get a seg fault:
include("my_project_directory/julia_test.jl")
2022-04-03 10:23:13.406 Traceback (most recent call last):
File "/Users/myusername/opt/anaconda3/lib/python3.9/site-packages/streamlit/script_runner.py", line 430, in _run_script
exec(code, module.__dict__)
File "my_project_directory/streamlit_julia_test.py", line 25, in <module>
jl = Julia(compiled_modules=False)
File "/Users/myusername/.local/lib/python3.9/site-packages/julia/core.py", line 502, in __init__
if not self.api.was_initialized: # = jl_is_initialized()
File "/Users/myusername/.local/lib/python3.9/site-packages/julia/libjulia.py", line 114, in __getattr__
return getattr(self.libjulia, name)
File "/Users/myusername/opt/anaconda3/lib/python3.9/ctypes/__init__.py", line 395, in __getattr__
func = self.__getitem__(name)
File "/Users/myuserame/opt/anaconda3/lib/python3.9/ctypes/__init__.py", line 400, in __getitem__
func = self._FuncPtr((name_or_ordinal, self))
AttributeError: dlsym(0x21b8e5840, was_initialized): symbol not found
signal (11): Segmentation fault: 11
in expression starting at none:0
Allocations: 35278345 (Pool: 35267101; Big: 11244); GC: 36
zsh: segmentation fault streamlit run streamlit_julia_test.py
I've also tried the alternative recommendation provided in the PyJulia response message regarding the use of:
python-jl my_project_directory/streamlit_julia_test.py
But I get this error when running the python-jl command:
INTEL MKL ERROR: dlopen(/Users/myusername/opt/anaconda3/lib/libmkl_intel_thread.1.dylib, 0x0009): Library not loaded: #rpath/libiomp5.dylib
Referenced from: /Users/myusername/opt/anaconda3/lib/libmkl_intel_thread.1.dylib
Reason: tried: '/Applications/Julia-1.7.app/Contents/Resources/julia/bin/../lib/libiomp5.dylib' (no such file), '/usr/local/lib/libiomp5.dylib' (no such file), '/usr/lib/libiomp5.dylib' (no such file).
Intel MKL FATAL ERROR: Cannot load libmkl_intel_thread.1.dylib.
So I'm stuck, thanks in advance for a modified reproducible example or instructions for the following system specs!
System specs:
Mac OS Monterey 12.2.1 (Chip - Mac M1 Pro)
Python 3.9.7
Julia 1.7.2
PyJulia 0.5.8.dev
Streamlit 1.7.0
yes we can use it streamlit_julia_test.py NumPy for instance

dagster solid Parallel Run Test exmaple

I tried to run parallel, but it didn't work as I expected
The progress bar doesn't work the way I thought it would.
I think that both operations should be executed at the same time.
but first run find_highest_calorie_cereal after find_highest_protein_cereal
import csv
import time
import requests
from dagster import pipeline, solid
# start_complex_pipeline_marker_0
#solid
def download_cereals():
response = requests.get("https://docs.dagster.io/assets/cereal.csv")
lines = response.text.split("\n")
return [row for row in csv.DictReader(lines)]
#solid
def find_highest_calorie_cereal(cereals):
time.sleep(5)
sorted_cereals = list(
sorted(cereals, key=lambda cereal: cereal["calories"])
)
return sorted_cereals[-1]["name"]
#solid
def find_highest_protein_cereal(context, cereals):
time.sleep(10)
sorted_cereals = list(
sorted(cereals, key=lambda cereal: cereal["protein"])
)
# for i in range(1, 11):
# context.log.info(str(i) + '~~~~~~~~')
# time.sleep(1)
return sorted_cereals[-1]["name"]
#solid
def display_results(context, most_calories, most_protein):
context.log.info(f"Most caloric cereal 테스트: {most_calories}")
context.log.info(f"Most protein-rich cereal: {most_protein}")
#pipeline
def complex_pipeline():
cereals = download_cereals()
display_results(
most_protein=find_highest_protein_cereal(cereals),
most_calories=find_highest_calorie_cereal(cereals),
)
I am not sure but I think you should set up a executor with parallelism available. You could use multiprocess_executor.
"Executors are responsible for executing steps within a pipeline run.
Once a run has launched and the process for the run, or run worker,
has been allocated and started, the executor assumes responsibility
for execution."
modes provide the possible set of executors one can use. Use the executor_defs property on ModeDefinition.
MODE_DEV = ModeDefinition(name="dev", executor_defs=[multiprocess_executor])
#pipeline(mode_defs=[MODE_DEV], preset_defs=[Preset_test])
the execution config section of the run config determines the actual executor.
in the yml file or run_config, set:
execution:
multiprocess:
config:
max_concurrent: 4
retrieved from : https://docs.dagster.io/deployment/executors

How to deal with airflow error taskinstance.py line 983, in _run_raw_task / psycopg2?

I have a couple of DAGs running, which execute python scripts which are basically copying data from A to B. But one of the dags throws an error - but data is still getting copied, so somehow it does not have influence on the execution of the python script.
The only "special" what is within this dag is, that the python script builds up a connecton to a postgres database using psycopg2~=2.8.5 but not sure if this is somehow the root cause.
I also checked the permissions for the user, which seem to be fine at least in the dags folder.
Is there any specific timeout value I have to adjust in the config?
[2021-05-19 12:53:42,036] {taskinstance.py:1145} ERROR - Bash command failed 255
Traceback (most recent call last):
File "/hereisthepath/venv/lib64/python3.6/site-packages/airflow/models/taskinstance.py", line 983, in _run_raw_task
result = task_copy.execute(context=context)
File "/hereisthepath/venv/lib64/python3.6/site-packages/airflow/operators/bash_operator.py", line 134, in execute
raise AirflowException("Bash command failed")
airflow.exceptions.AirflowException: Bash command failed
Update: This is the passage of the operator, which fails. I copied the entire function, however the error throws at line 134 ("raise AirflowException("Bash command failed"))
def execute(self, context):
"""
Execute the bash command in a temporary directory
which will be cleaned afterwards
"""
self.log.info("Tmp dir root location: \n %s", gettempdir())
# Prepare env for child process.
env = self.env
if env is None:
env = os.environ.copy()
airflow_context_vars = context_to_airflow_vars(context, in_env_var_format=True)
self.log.debug('Exporting the following env vars:\n%s',
'\n'.join(["{}={}".format(k, v)
for k, v in airflow_context_vars.items()]))
env.update(airflow_context_vars)
self.lineage_data = self.bash_command
with TemporaryDirectory(prefix='airflowtmp') as tmp_dir:
with NamedTemporaryFile(dir=tmp_dir, prefix=self.task_id) as f:
f.write(bytes(self.bash_command, 'utf_8'))
f.flush()
fname = f.name
script_location = os.path.abspath(fname)
self.log.info(
"Temporary script location: %s",
script_location
)
def pre_exec():
# Restore default signal disposition and invoke setsid
for sig in ('SIGPIPE', 'SIGXFZ', 'SIGXFSZ'):
if hasattr(signal, sig):
signal.signal(getattr(signal, sig), signal.SIG_DFL)
os.setsid()
self.log.info("Running command: %s", self.bash_command)
self.sub_process = Popen(
['bash', fname],
stdout=PIPE, stderr=STDOUT,
cwd=tmp_dir, env=env,
preexec_fn=pre_exec)
self.log.info("Output:")
line = ''
for line in iter(self.sub_process.stdout.readline, b''):
line = line.decode(self.output_encoding).rstrip()
self.log.info(line)
self.sub_process.wait()
self.log.info(
"Command exited with return code %s",
self.sub_process.returncode
)
if self.sub_process.returncode:
raise AirflowException("Bash command failed")
if self.xcom_push_flag:
return line
Update2: It really seems, that this behavior is related to the psycopg2: I now tested all other possible error sources and only when I test with the postgres datasource using psycopg2 package, the error occurs. Meanwhile I also upgraded to the most recent version of psycopg2 (2.8.6) but without success.
Maybe this helps for further investigation

Dagster: Multiple and Conditional Outputs (Type check failed for step output xxx PySparkDataFrame)

I'm executing the Dagster tutorial, and I got stuck at the Multiple and Conditional Outputs step.
In the solid definitions, it asks to declare (among other things):
output_defs=[
OutputDefinition(
name="hot_cereals", dagster_type=DataFrame, is_required=False
),
OutputDefinition(
name="cold_cereals", dagster_type=DataFrame, is_required=False
),
],
But there's no information where the DataFrame cames from.
Firstly I have tried with pandas.DataFrame but I faced the error: {dagster_type} is not a valid dagster type. It happens when I try to submit it via $ dagit -f multiple_outputs.py.
Then I installed the dagster_pyspark and gave a try with the dagster_pyspark.DataFrame. This time I managed to summit the DAG to the UI. However, when I run it from the UI, I got the following error:
dagster.core.errors.DagsterTypeCheckDidNotPass: Type check failed for step output hot_cereals of type PySparkDataFrame.
File "/Users/bambrozio/.local/share/virtualenvs/dagster-tutorial/lib/python3.7/site-packages/dagster/core/execution/plan/execute_plan.py", line 210, in _dagster_event_sequence_for_step
for step_event in check.generator(step_events):
File "/Users/bambrozio/.local/share/virtualenvs/dagster-tutorial/lib/python3.7/site-packages/dagster/core/execution/plan/execute_step.py", line 273, in core_dagster_event_sequence_for_step
for evt in _create_step_events_for_output(step_context, user_event):
File "/Users/bambrozio/.local/share/virtualenvs/dagster-tutorial/lib/python3.7/site-packages/dagster/core/execution/plan/execute_step.py", line 298, in _create_step_events_for_output
for output_event in _type_checked_step_output_event_sequence(step_context, output):
File "/Users/bambrozio/.local/share/virtualenvs/dagster-tutorial/lib/python3.7/site-packages/dagster/core/execution/plan/execute_step.py", line 221, in _type_checked_step_output_event_sequence
dagster_type=step_output.dagster_type,
Does anyone know how to fix it? Thanks for the help!
As Arthur pointed out, the full tutorial code is available on Dagster's github.
However, you do not need dagster_pandas, rather, the key lines missing from your code are:
if typing.TYPE_CHECKING:
DataFrame = list
else:
DataFrame = PythonObjectDagsterType(list, name="DataFrame") # type: Any
The reason for the above structure is to achieve MyPy compliance, see the Types & Expectations section of the tutorial.
See also the documentation on Dagster types.
I was stuck here, too, but luckily I found the updated source code.
They have updated the docs so that the OutputDefinition is defined beforehand.
Update your code before sorting and pipeline like below:
import csv
import os
from dagster import (
Bool,
Field,
Output,
OutputDefinition,
execute_pipeline,
pipeline,
solid,
)
#solid
def read_csv(context, csv_path):
lines = []
csv_path = os.path.join(os.path.dirname(__file__), csv_path)
with open(csv_path, "r") as fd:
for row in csv.DictReader(fd):
row["calories"] = int(row["calories"])
lines.append(row)
context.log.info("Read {n_lines} lines".format(n_lines=len(lines)))
return lines
#solid(
config_schema={
"process_hot": Field(Bool, is_required=False, default_value=True),
"process_cold": Field(Bool, is_required=False, default_value=True),
},
output_defs=[
OutputDefinition(name="hot_cereals", is_required=False),
OutputDefinition(name="cold_cereals", is_required=False),
],
)
def split_cereals(context, cereals):
if context.solid_config["process_hot"]:
hot_cereals = [cereal for cereal in cereals if cereal["type"] == "H"]
yield Output(hot_cereals, "hot_cereals")
if context.solid_config["process_cold"]:
cold_cereals = [cereal for cereal in cereals if cereal["type"] == "C"]
yield Output(cold_cereals, "cold_cereals")
You can also find the whole lines of codes from here.
Try first to install the dagster pandas integration:
pip install dagster_pandas
Then do:
from dagster_pandas import DataFrame
You can find the code from the tutorial here.

Resources