How do I load a software-defined asset in Dagster from outside Dagster?

I'm fairly new to Dagster, but I really hope it has the feature of loading a software-defined asset from outside Dagster.
To explain my ask:
consider this dagster graph:
from dagster import asset

@asset
def users() -> list[int]:
    return [1, 2, 3]

@asset
def new_users(users) -> list[int]:
    return [u for u in users if is_new(u)]
The details don't really matter, just that there is some dag that generates some output.
Outside the Dagster project, I have some Jupyter notebooks. I'd like to be able to load the new_users list. If I had to invent an API for it, it would look something like:
# In some jupyter notebook for example
from dagster.{something} import Project
project = Project(...)
new_users = project.load_asset(asset='new_users', force_refresh=False)
Does dagster have this type of functionality?

Dagster has the load_asset_value() function for that; see the docs at https://docs.dagster.io/concepts/assets/software-defined-assets#loading-asset-values-outside-of-dagster-runs
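As a minimal sketch of how that could look from a notebook (the module name my_dagster_project and the defs object are assumptions; the real names depend on how the project exposes its Definitions, and the assets must have been materialized with an IO manager the notebook's environment can read from):
# In a Jupyter notebook; assumes the project exposes a Definitions object
# (imported here as `defs`) and that new_users was materialized with an
# IO manager readable from this environment.
from dagster import AssetKey
from my_dagster_project import defs  # hypothetical module name

new_users = defs.load_asset_value(AssetKey("new_users"))
print(new_users)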

Related

Project structure in Julia with clean separation

I am trying to correctly configure a Julia project.
The project initially had:
One module in its own directory
Another "kind of module" in another directory, but there was no module keyword and it was patched together with help of includes.
A few files in the root to execute it all by just running "julia run.jl" for example.
No packages, no Project.toml, no Manifest.toml - there is "packages.jl", which manually calls "Pkg.add" for a preset list of dependencies.
Not all includes were used to put it all together; there was some fiddling with LOAD_PATH.
Logically, the project contains 3 parts that I would think of as "packages" in, for example, the Python world:
One "Common" module with basic util functions shared by all interested modules.
One module A, which has dependency on "Common".
Module B, which has dependency on A and "Common".
What I did: I created 3 modules in their own directories to put it all together. These modules make sense internally and there is no real reason to expose them "outside". The whole code is in the end executable and would probably expose just one function that executes everything. I created a loader file which included all 3 module files. That way I got rid of LOAD_PATH and references in the IDE started to work. This works for our purposes, but still isn't ideal. I have been reading quite a lot about Julia project structure and possibilities - modules, packages - but I still don't fully understand it. And Revise doesn't work.
Is it correct to have modules like this? I like modules, as they clearly set boundaries between each other using export lines.
I would also like to make the code as compatible with the IDE as possible (LOAD_PATH settings didn't work for VS Code and references to functions were broken) and also with Revise.
What is the typical structure for this?
How to clearly separate code while making development easy?
How to make this work with Revise?
I expect it's a good idea to make a package for this, but should I make it for the whole project? That would mean one "project" module and 3 submodules: A, B and Common?
Or should it be 3 (or 4) packages, one per module?
Thanks for any input. A comparison of some of the principles to Python/Java/Kotlin/C#/JavaScript modules/packages would be helpful.
In CubicEoS.jl I've made a single package which provides two modules: CubicEoS with general algorithms and CubicEoS.BrusilovskyEoS with an implementation of the necessary interface to make CubicEoS work on a concrete case. The latter module depends on the former. Revise works well. As a user (not developer) of CubicEoS, I have scripts which run some calculations.
So, in your case, I would create a single package with four modules. The fourth module is a hood over the others: Common, A and B. A possible file structure may look like this:
src/
TheHood.jl
Common/Common.jl
Module_A/Module_A.jl
Module_B/Module_B.jl
test/
...
examples/ # these may be put elsewhere, but the important examples may go in test/
...
Project.toml
And the possible module structure may look like this:
# TheHood.jl
module TheHood
export bar
include("Common/Common.jl")
include("Module_A/Module_A.jl")
include("Module_B/Module_B.jl")
end
# Common/Common.jl
module Common
export util_1, util_2
using LinearAlgebra
util_1(x) = "util_1's implementation"
util_2(x) = "util_2's implementation"
end
# Module_A/Module_A.jl
module Module_A
import ..Common
export foo
foo(x) = "foo's implementation"
end
# Module_B/Module_B.jl
module Module_B
import ..Common
import ..Module_A
export bar
bar(x) = "bar's implementation"
end
Now, I'll answer the questions
What is the typical structure for this?
If the modules are not used independently, I would use the structure above. For small projects, I've found the "one package = one module" strategy painful when a core package updates.
How to clearly separate code while making development easy?
In Julia, a module is effectively a namespace. In inner modules (like A, B, Common), I usually import the dev modules, but use the modules that a dev module depends on (see above: using LinearAlgebra vs import ..Common). That's my preference, to keep names clear.
How to make this work with Revise?
Turn the code into a package. That's preferable, because Revise-ing standalone modules is buggy (at least, in my experience). I usually use Revise like this:
% cd where_the_package_lives  # actually, where the Project.toml is
% julia --project=.           # or plain julia, then pkg> activate .
julia> using Revise
julia> using ThePackage
After that I can usually edit source code and call the updated methods without restarting the REPL. But Revise has some limitations.
I expect it's a good idea to make a package for this, but should I make it for the whole project? That would mean one "project" module and 3 submodules: A, B and Common?
Or should it be 3 (or 4) packages, one per module?
You should separate "scripty" code (throwaway or command-line scripts) from core (reusable) code. The core I would put in a single package. The scripty files (like examples/ or CLI programs) should stand alone, just using the package. The scripty files may define modules or whatever, but their users are end users, not developers (e.g. running a script involves an I/O operation).
Just to close this out.
The main thing I was missing, which would have saved me a lot of time had I known it:
If you have your own package, the file src/YourPackage.jl will be "imported/included" once the using keyword is used, and everything just works. No need to do any kind of import/LOAD_PATH magic.
Just doing that fixed 95% of my problems. I ended up following the patterns in the accepted answer and also the comment recommending DrWatson.
The only small thing is that "scripts" don't work well with VS Code: "find all references", "go to definition" and autocomplete don't work. They do work perfectly for everything within "YourPackage.jl" and its imports, and so does Revise. It's a really tiny thing, since scripts are usually about 3 lines of code.

Why use `exec_statement` when writing PyInstaller hooks?

In the PyInstaller docs they demonstrate the use of eval_statement() and exec_statement(), which call eval() or exec() in a new instance of Python. But they don't say why you would want to run your code in a separate instance.
For example, why couldn't their example of:
from PyInstaller.utils.hooks import exec_statement
mpl_data_dir = exec_statement(
    "import matplotlib; print(matplotlib._get_data_path())"
)
datas = [(mpl_data_dir, "")]
not just be:
import matplotlib
datas = [(matplotlib._get_data_path(), "")]
I've tried doing this with my own library and it doesn't seem to do any harm. So why the extra complexity? Why do all the other hooks included in PyInstaller use the first method?
All the hooks use the first method to guard against path and environment manipulations. What happens if matplotlib changes sys.path, or makes some other environmental change that could mess with the build process? Using a separate Python instance means this won't happen.
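To make the isolation concrete, here is a hedged sketch (the package somepkg and its data_dir() helper are made up): whatever the import does to sys.path or the environment happens only in the throwaway child interpreter, and only the printed string comes back to the build process.
from PyInstaller.utils.hooks import exec_statement

# Runs in a separate Python process; any sys.path or os.environ changes made
# by importing the (hypothetical) package are discarded with that process.
data_dir = exec_statement("import somepkg; print(somepkg.data_dir())")
datas = [(data_dir, "somepkg_data")]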

How to create a sys image that has multiple packages with their precompiled functions cached in julia?

Let's say we have a symbol array of packages, packages::Vector{Symbol} = [...], and we want to create a sysimage using PackageCompiler.jl. We could simply use
using PackageCompiler
create_sysimage(packages; incremental = false, sysimage_path = "custom_sys.dll")
but without a precompile_execution_file, this isn't going to be worth it.
Note: sysimage_path = "custom_sys.so" on Linux and "custom_sys.dylib" on macOS...
For the precompile_execution_file, I thought running the tests for each package might do it, so I did something like this:
precompilation.jl
packages = [...]
@assert typeof(packages) == Vector{Symbol}
import Pkg
m = Module()
try Pkg.test.(Base.require.(m, packages)) catch ; end
The try catch is for when some tests give an error and we don't want it to fail.
Then, executing the following in a shell,
using PackageCompiler
import Pkg
packages = [...]
Pkg.add.(String.(packages))
Pkg.update()
Pkg.build.(String.(packages))
create_sysimage(packages; incremental = false,
                sysimage_path = "custom_sys.dll",
                precompile_execution_file = "precompilation.jl")
produced a sysimage dynamic library which loaded without a problem. When I did using Makie, there was no delay, so that part's fine. But when I did some plotting with Makie, there was still the first-time plot delay, so I am guessing the precompilation script didn't do what I thought it would do.
Also, when using tab to get suggestions in the REPL, it would freeze the first time, but I am guessing this is an expected side effect.
There are a few problems with your precompilation.jl script that make the tests throw errors, which you don't see because of the try...catch.
But although running the tests for each package might be a good idea to exercise precompilation, there are deeper reasons why I don't think it can work so simply:
Pkg.test spawns a new process in which the tests actually run. I don't think PackageCompiler can see what happens in this separate process.
To circumvent that, you might want to simply include() every package's test/runtests.jl file. But this is likely to fail too, because of missing test-specific dependencies.
So I would say that, for this to work reliably and systematically for all packages, you'd have to re-implement (or re-use, if you can) some of the internal logic of Pkg.test in order to add all test-specific dependencies to the current environment.
That being said, some packages have ready-to-use precompilation scripts helping to do just this. This is the case for Makie, which suggests in its documentation to use the following file to build system images:
joinpath(pkgdir(Makie), "test", "test_for_precompile.jl")

Displaying function dependency graph in iPython Notebook

I frequently use Jupyter notebooks for collaborative purposes. Frequently, people write functions within their own modules that call functions from other modules, all of which are part of our library. An example case would be:
module1.f1 -> module2.f2 -> module3.f3 -> pandas functions.
All the functions f1/f2/f3 follow the docstring format. Is there a way to display the function hierarchy f1 -> f2 -> f3 inside the notebook?
Maybe use PlantUML?
You then choose the UML diagram type that gives you something you like ...
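As a hedged sketch of that idea (the call chain here is hard-coded; in practice you would extract it from your own modules), you can build PlantUML source in the notebook and render the resulting .puml file with whatever PlantUML integration or command-line tool you have installed:
# Build PlantUML text for the call chain and write it to a file.
calls = [
    ("module1.f1", "module2.f2"),
    ("module2.f2", "module3.f3"),
    ("module3.f3", "pandas"),
]
lines = ["@startuml"] + [f'"{src}" --> "{dst}"' for src, dst in calls] + ["@enduml"]
puml_source = "\n".join(lines)
with open("call_graph.puml", "w") as fh:
    fh.write(puml_source)
print(puml_source)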

How can you programmatically tell the CPython interpreter to enter interactive mode when done?

If you invoke the CPython interpreter with the -i option, it will enter interactive mode upon completing any commands or scripts it has been given to run. Is there a way, within a program, to get the interpreter to do this even when it has not been given -i? The obvious use case is debugging by interactively inspecting the state when an exceptional condition has occurred.
You want the code module.
#!/usr/bin/env python
import code
code.interact("Enter Here")
Set the PYTHONINSPECT environment variable. This can also be done in the script itself:
import os
os.environ["PYTHONINSPECT"] = "1"
For debugging unexpected exceptions, you could also use this nice recipe: http://code.activestate.com/recipes/65287/
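The gist of that recipe (sketched from memory rather than copied, so treat the details as approximate) is to install a sys.excepthook that prints the traceback and then drops into a post-mortem debugger:
import sys
import pdb
import traceback

def debug_hook(exc_type, exc_value, tb):
    # Print the traceback as usual, then inspect the frame that raised.
    traceback.print_exception(exc_type, exc_value, tb)
    pdb.post_mortem(tb)

sys.excepthook = debug_hook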
The recipe mentioned in the other answer, using sys.excepthook, sounds like what you want. Otherwise, you could run code.interact on program exit:
import code
import sys
sys.exitfunc = code.interact
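Note that sys.exitfunc existed only in Python 2; on Python 3 the equivalent is to register the hook with atexit, roughly like this:
import atexit
import code

# Python 3 replacement for sys.exitfunc: run an interactive prompt at exit.
atexit.register(code.interact)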
The best way to do this that I know of is:
from IPython import embed
embed()
which allows access to variables in the current scope and brings you the full power of IPython.
