4. Architecture
- compute
- computation
- phase
The definition & execution of networked operation is split in 1+2 phases:
… it is constrained by these IO data-structures:
… populates these low-level data-structures:
network (COMPOSE time)
execution dag (COMPILE time)
execution steps (COMPILE time)
solution (EXECUTE time)
… and utilizes these main classes:
graphtik.fnop.FnOp(fn[, name, rescheduled, …]): An operation performing a callable (i.e. a function, a method, a lambda).
graphtik.pipeline.Pipeline(operations, name, *): An operation that can compute a network-graph of operations.
graphtik.planning.Network(*operations[, graph]): A graph of operations that can compile an execution plan.
graphtik.execution.ExecutionPlan(net, needs, …): A pre-compiled list of operation steps that can execute for the given inputs/outputs.
graphtik.execution.Solution(plan, input_values): The solution chain-map and execution state (e.g.
… plus those for plotting:
graphtik.plot.Plotter(theme, **styles_kw)
graphtik.plot.Theme(*, _prototype, **kw): The poor man’s css-like plot theme (see also StyleStack).
- compose
- composition
The phase where operations are constructed and grouped into pipelines and corresponding networks based on their dependencies.
Tip
Use the operation() factory to construct FnOp instances (a.k.a. operations).
Use the compose() factory to build Pipeline instances (a.k.a. pipelines).
- combine pipelines
When operations and/or pipelines are composed together, there are two ways to combine the operations contained into the new pipeline: operation merging (default) and operation nesting.
They are selected by the nest parameter of the compose() factory.
- operation merging
The default method to combine pipelines, also applied when simply merging operations.
Any identically-named operations override each other, with the operations added earlier in the .compose() call (further to the left) winning over those added later (further to the right).
- seealso
- operation nesting
The elaborate method to combine pipelines forming clusters.
The original pipelines are preserved intact in “isolated” clusters, by prefixing the names of their operations (and optionally data) by the name of the respective original pipeline that contained them (or the user defines the renames).
- seealso
Nesting, compose(), RenArgs, nest_any_node(), dep_renamed(), PlotArgs.clusters, Hierarchical data and further tricks (example).
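The two ways of combining pipelines described above can be sketched with plain dictionaries standing in for pipelines. This is only an illustration of the naming rules, not graphtik's implementation: the helper names `merge_ops` and `nest_ops` are hypothetical, and the dot used as the prefix separator is an assumption.

```python
def merge_ops(*pipelines):
    """Flatten pipelines into one name->op map; earlier (left) entries win."""
    merged = {}
    for ops in pipelines:
        for name, op in ops.items():
            merged.setdefault(name, op)  # keep the first operation seen
    return merged


def nest_ops(**pipelines):
    """Keep every operation, prefixing its name with its pipeline's name."""
    return {
        f"{pipe_name}.{op_name}": op  # "." separator is an assumption
        for pipe_name, ops in pipelines.items()
        for op_name, op in ops.items()
    }


# Plain strings stand in for real operations.
weekday = {"load": "weekday-loader", "train": "trainer"}
weekend = {"load": "weekend-loader", "report": "reporter"}

merged = merge_ops(weekday, weekend)
nested = nest_ops(weekday=weekday, weekend=weekend)

print(merged["load"])   # the identically-named operation added first survives
print(sorted(nested))   # both "load" operations survive under distinct names
```

Merging keeps the flat namespace small at the cost of silently dropping duplicates, whereas nesting preserves everything but requires renamed dependencies to link the clusters.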
- compile
- planning
The phase where the Network creates a new execution plan by pruning all graph nodes into a subgraph dag, and deriving the execution steps.
- execute
- execution
- sequential
The phase where the plan derived from a pipeline calls the underlying functions of all operations contained in its execution steps, with inputs/outputs taken/written to the solution.
Currently there are 2 ways to execute:
sequential
(unstable) parallel, with a multiprocessing.pool.Pool
Plans may abort their execution by setting the abort run global flag.
- network
- graph
A Network.graph of operations linked by their dependencies implementing a pipeline.
During composition, the nodes of the graph are connected by repeated calls of Network._append_operation() within the Network constructor.
During planning the graph is pruned based on the given inputs, outputs & node predicate to extract the dag, and it is ordered, to derive the execution steps, stored in a new plan, which is then cached on the Network class.
- plan
- execution plan
Class ExecutionPlan performs the execution phase, and contains the dag and the steps.
Compiled execution plans are cached in Network._cached_plans across runs with (inputs, outputs, predicate) as key.
- solution
A map of dependency-named values fed to/from the pipeline during execution.
It feeds operations with inputs, collects their outputs, records the status of executed or canceled operations, tracks any overwrites, and applies any evictions, as orchestrated by the plan.
A new Solution instance is created either internally by Pipeline.compute() and populated with user-inputs, or must be created externally with those values and fed into the said method.
The class inherits collections.ChainMap, to keep the results of each operation executed in a separate solution layer dictionary (+1 for user-inputs).
The results of the last operation executed “win” in the outputs produced, and the base (least precedence) is the user-inputs given when the execution started.
Certain values may be extracted/populated with accessors.
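The layering behaviour can be tried out with the standard-library `collections.ChainMap` alone. The sketch below is not the `Solution` class itself; the operation names in the comments are hypothetical, and it only shows why the most recently executed operation "wins" while the user-inputs stay at the bottom.

```python
from collections import ChainMap

user_inputs = {"a": 1, "b": 2}
solution = ChainMap(user_inputs)  # base layer: least precedence

# Each executed operation contributes a new layer in front of the older ones.
solution = solution.new_child({"a_plus_b": 3})          # hypothetical op "add"
solution = solution.new_child({"a": 10, "scaled": 30})  # hypothetical op "rescale"

print(solution["a"])       # → 10 (newest layer is searched first)
print(solution["b"])       # → 2  (falls through to the user-inputs layer)
print(len(solution.maps))  # → 3  (two operation layers + 1 for user-inputs)
```

The original input value is not lost: it still lives, untouched, in the base dictionary.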
- layer
- solution layer
By default, the solution class keeps the outputs of each executed operation (and given inputs) in separate dictionaries (layers).
This layering is disabled if a jsonp dependency exists in the network, assuming that the set_layered_solution() configuration has not been called with True/False, nor has the respective parameter been given to methods compute()/execute().
Hint
Combining hierarchical data with per-operation layers in solution leads to duplications of container nodes in the data tree. To retrieve the complete solution, merging of overwritten nodes across the layers would then be needed.
- overwrite
Solution values written by more than one operation in the respective layer, accessed by the Solution.overwrites attribute (assuming that layers have not been disabled e.g. due to hierarchical data).
Note that sideffected outputs always produce an overwrite.
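With one dictionary per layer, overwrites can be detected by looking for names that appear in more than one layer. The helper `find_overwrites` below is a hypothetical stand-in written against a plain `ChainMap`; the shape of its result (values listed newest first) is an assumption of this sketch and not necessarily what `Solution.overwrites` returns.

```python
from collections import ChainMap


def find_overwrites(layers: ChainMap) -> dict:
    """Map each name present in 2+ layers to its values, newest first."""
    seen = {}
    for layer in layers.maps:  # .maps is ordered newest -> oldest
        for name, value in layer.items():
            seen.setdefault(name, []).append(value)
    return {name: vals for name, vals in seen.items() if len(vals) > 1}


layers = ChainMap(
    {"a": 10, "scaled": 30},  # newest operation layer
    {"a_plus_b": 3},          # older operation layer
    {"a": 1, "b": 2},         # user-inputs
)
overwrites = find_overwrites(layers)
print(overwrites)  # → {'a': [10, 1]}
```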
- prune
- pruning
A subphase of planning performed by method Network._prune_graph(), which extracts a subgraph dag that does not contain any unsatisfied operations.
It topologically sorts the graph, and prunes based on given inputs, asked outputs, node predicate and operation needs & provides.
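A much simplified picture of pruning, assuming operations are given as plain `(needs, provides)` pairs: a forward pass drops operations whose needs can never be satisfied from the inputs, a backward pass drops those not leading to the asked outputs, and the survivors are ordered topologically. The function `prune` and the sample operations are hypothetical, node predicates are ignored, and the real `Network._prune_graph()` works on a graph object rather than on dictionaries.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# name -> (needs, provides)
OPS = {
    "add": (["a", "b"], ["ab"]),
    "double": (["ab"], ["ab2"]),
    "log": (["c"], ["log_c"]),   # "c" is never given: unsatisfied
    "square": (["a"], ["a2"]),   # satisfiable, but not needed for "ab2"
}


def prune(ops, inputs, outputs):
    # Forward pass: keep only operations whose needs are reachable.
    available, satisfied = set(inputs), {}
    changed = True
    while changed:
        changed = False
        for name, (needs, provides) in ops.items():
            if name not in satisfied and set(needs) <= available:
                satisfied[name] = (needs, provides)
                available |= set(provides)
                changed = True
    # Backward pass: keep only operations leading to the asked outputs.
    wanted, kept = set(outputs), {}
    changed = True
    while changed:
        changed = False
        for name, (needs, provides) in satisfied.items():
            if name not in kept and wanted & set(provides):
                kept[name] = (needs, provides)
                wanted |= set(needs)
                changed = True
    # Order survivors: an operation depends on the producers of its needs.
    producer = {p: n for n, (_, provides) in kept.items() for p in provides}
    deps = {
        n: {producer[d] for d in needs if d in producer}
        for n, (needs, _) in kept.items()
    }
    return list(TopologicalSorter(deps).static_order())


steps = prune(OPS, inputs=["a", "b"], outputs=["ab2"])
print(steps)  # → ['add', 'double']
```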
- unsatisfied operation
The core of pruning & rescheduling, performed by the planning.unsatisfied_operations() function, which collects all operations with unreachable dependencies:
- dag
- execution dag
- solution dag
There are 2 directed-acyclic-graph instances used:
the ExecutionPlan.dag, in the execution plan, which contains the pruned nodes, used to decide the execution steps;
the Solution.dag in the solution, which derives the canceled operations due to rescheduled/failed operations upstream.
- steps
- execution steps
The plan contains a list of the operation-nodes only from the dag, topologically sorted, and interspersed with instruction steps needed to compute the asked outputs from the given inputs.
They are built by Network._build_execution_steps() based on the subgraph dag.
The only instruction step other than an operation is for performing an eviction.
- eviction
A memory footprint optimization where intermediate inputs & outputs are erased from solution as soon as they are not needed further down the dag.
Evictions are pre-calculated during planning, denoted with the dependency inserted in the steps of the execution plan.
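One way to picture pre-calculated evictions is to record, for every dependency, the last step that touches it, and to schedule an eviction right after that step unless the dependency is an asked output. The function `add_evictions` and the tuple-based step format are hypothetical; this is a sketch of the idea, not the algorithm in `Network._build_execution_steps()`.

```python
def add_evictions(ordered_ops, outputs):
    """Interleave ("evict", name) steps after the last use of each value.

    ordered_ops: topologically sorted (name, needs, provides) tuples.
    outputs: names asked by the user; if empty, nothing is evicted.
    """
    if not outputs:
        return [("op", name) for name, _, _ in ordered_ops]

    last_use = {}
    for idx, (_, needs, provides) in enumerate(ordered_ops):
        for dep in (*needs, *provides):
            last_use[dep] = idx  # later steps overwrite earlier ones

    steps = []
    for idx, (name, needs, provides) in enumerate(ordered_ops):
        steps.append(("op", name))
        for dep in dict.fromkeys((*needs, *provides)):  # de-duplicated
            if last_use[dep] == idx and dep not in outputs:
                steps.append(("evict", dep))
    return steps


ordered = [("add", ["a", "b"], ["ab"]), ("double", ["ab"], ["ab2"])]
steps = add_evictions(ordered, outputs={"ab2"})
for step in steps:
    print(step)
```

Here `a` and `b` can be dropped right after `add`, and `ab` right after `double`, leaving only the asked `ab2` in memory.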
- inputs
The named input values that are fed into an operation (or pipeline) through the Operation.compute() method according to its needs.
These values are either:
given by the user to the outer pipeline, at the start of a computation, or
derived from solution using needs as keys, during intermediate execution.
- outputs
The dictionary of computed values returned by an operation (or a pipeline) matching its provides, when method Operation.compute() is called.
Those values are either:
retained in the solution, internally during execution, keyed by the respective provide, or
returned to user after the outer pipeline has finished computation.
When no specific outputs are requested from a pipeline, Pipeline.compute() returns all intermediate inputs along with the outputs, that is, no evictions happen.
An operation may return partial outputs.
- pipeline
The Pipeline composes and computes a network of operations against given inputs & outputs.
This class is also an operation, so it specifies needs & provides but these are not fixed, in the sense that Pipeline.compute() can potentially consume and provide different subsets of inputs/outputs.
- operation
Either the abstract notion of an action with specified needs and provides, dependencies, or the concrete wrapper FnOp for (any callable()), that feeds on inputs and updates outputs, from/to solution, or given-by/returned-to the user by a pipeline.
The distinction between needs/provides and inputs/outputs is akin to function parameters and arguments during define-time and run-time, respectively.
- dependency
The (possibly hierarchical) name of a solution value an operation needs or provides.
Dependencies are declared during composition, when building FnOp instances. Operations are then interlinked together, by matching the needs & provides of all operations contained in a pipeline.
During planning the graph is then pruned based on the reachability of the dependencies.
During execution Operation.compute() performs 2 “matchings”:
inputs & outputs in solution are accessed by the needs & provides names of the operations;
operation needs & provides are zipped against the underlying function’s arguments and results.
These matchings are affected by modifiers, printed out with diacritics.
Differences between various dependency operation attributes:
| dependency | attribute    | dupes | sfx | alias | SFXED    |
| needs      | needs        | ✓     | ✓   |       | SINGULAR |
|            | op_needs     | ✗     | ✓   |       | SINGULAR |
|            | _fn_needs    | ✓     | ✗   |       | STRIPPED |
| provides   | provides     | ✓     | ✓   | ✗     | SINGULAR |
|            | op_provides  | ✗     | ✓   | ✓     | SINGULAR |
|            | _fn_provides | ✓     | ✗   | ✗     | STRIPPED |
- needs
- fn_needs
The list of dependency names an operation requires from solution as inputs, roughly corresponding to underlying function’s arguments (fn_needs).
Specifically, Operation.compute() extracts input values from solution by these names, and matches them against function arguments, mostly by their positional order. Whenever this matching is not 1-to-1, and function-arguments differ from the regular needs, modifiers must be used.
- provides
- op_provides
- fn_provides
The list of dependency names an operation writes to the solution as outputs, roughly corresponding to underlying function’s results (fn_provides).
Specifically, Operation.compute() “zips” this list-of-names with the output values produced when the operation’s function is called. Whenever this “zipping” is not 1-to-1, and function-results differ from the regular operation (op_provides) (or results are not a list), it is possible to:
mark the operation that its function returns dictionary,
artificially extend the provides with aliased fn_provides, or
use modifiers to annotate certain names as sideffects.
- alias
Map an existing name in fn_provides into a duplicate, artificial one in op_provides.
You cannot alias an alias. See Aliased provides.
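The "zipping" of function results against provides, and the effect of an alias, can be illustrated without the library. The helper `collect_outputs` and its parameters are hypothetical; it assumes results arrive either as a sequence to be zipped positionally or, for an operation that returns a dictionary, already keyed by name.

```python
def collect_outputs(fn_provides, results, aliases=(), returns_dict=False):
    """Name a function's results, then duplicate any aliased names."""
    if returns_dict:
        outputs = dict(results)  # names come from the function itself
    else:
        if len(fn_provides) == 1:
            results = [results]  # a single provide is not unpacked
        outputs = dict(zip(fn_provides, results))  # positional "zipping"
    for src, dst in aliases:
        if src in outputs:
            outputs[dst] = outputs[src]  # alias: same value, extra name
    return outputs


def divmod_op(a, b):
    return a // b, a % b


zipped = collect_outputs(
    ["quot", "rem"], divmod_op(7, 2), aliases=[("rem", "modulo")]
)
partial = collect_outputs(["quot", "rem"], {"quot": 3}, returns_dict=True)

print(zipped)   # → {'quot': 3, 'rem': 1, 'modulo': 1}
print(partial)  # → {'quot': 3}
```

The dictionary form is what lets a function report only the outputs it actually produced.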
- conveyor operation
- default identity function
The default function if none given to an operation that conveys needs to provides.
For this to happen when FnOp.compute() is called, an operation name must have been given AND the number of provides must match the number of needs.
- seealso
Default conveyor operation & identity_function().
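A conveyor can be thought of as an identity function whose results are zipped onto differently named provides. The functions `identity` and `convey` below are hypothetical illustrations of that idea and make no claim about the body of graphtik's `identity_function()`.

```python
def identity(*args):
    """Return the single argument as-is, or all arguments as a tuple."""
    return args[0] if len(args) == 1 else args


def convey(needs, provides, solution):
    """Copy the values of `needs` under the names in `provides`."""
    if len(needs) != len(provides):
        raise ValueError("a conveyor requires as many provides as needs")
    values = identity(*(solution[n] for n in needs))
    if len(provides) == 1:
        values = (values,)  # re-wrap the lone value for zipping
    return dict(zip(provides, values))


conveyed = convey(["a", "b"], ["A", "B"], {"a": 1, "b": 2})
print(conveyed)  # → {'A': 1, 'B': 2}
```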
- returns dictionary
When an operation is marked with the FnOp.returns_dict flag, the underlying function is not expected to return fn_provides as a sequence but as a dictionary; hence, no “zipping” of function-results –> fn_provides takes place.
Useful for operations returning partial outputs to have full control over which outputs were actually produced, or to cancel sideffects.
- modifier
- diacritic
A modifier changes dependency behavior during planning or execution.
For instance, needs may be annotated as keyword() and/or optionals function arguments, while provides and needs can be annotated as “ghost” sideffects or assigned an accessor to work with hierarchical data.
The representation of modifier-annotated dependencies utilizes a combination of these diacritics:
> : keyword()
? : optional()
* : vararg()
+ : varargs()
$ : accessor (mostly for jsonp)
See graphtik.modifier module.
- optionals
A needs-only modifier for inputs that do not hinder operation execution (prune) if absent from solution.
In the underlying function it corresponds to either:
non-compulsory function arguments (with defaults), annotated with optional(), or
varargish arguments (*args), annotated with vararg() or varargs().
- varargish
A needs-only modifier for inputs to be appended as *args (if present in solution).
There are 2 kinds, both, by definition, optionals:
the vararg() annotates any solution value to be appended once in the *args;
the varargs() annotates iterable values and all its items are appended in the *args one-by-one.
Attention
To avoid user mistakes, varargs do not accept str inputs (though iterables):
>>> graph(a=5, b="mistake")
Traceback (most recent call last):
...
graphtik.base.MultiValueError: Failed preparing needs:
    1. Expected needs['b'(+)] to be non-str iterables!
    +++inputs: ['a', 'b']
    +++FnOp(name='enlist', needs=['a', 'b'(+)], provides=['sum'], fn='enlist')
(tip: set GRAPHTIK_DEBUG envvar to raise immediately and/or enable DEBUG-logging)
In printouts, it is denoted either with the * or + diacritic.
See also the elaborate example in Hierarchical data and further tricks section.
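The difference between the two varargish kinds, and the refusal of `str` values, can be mimicked in a few lines. In this sketch the function `build_args` and the `(name, kind)` tuples are hypothetical stand-ins for modifier-annotated needs, and a plain `ValueError` replaces the library's own exception type.

```python
def build_args(solution, needs):
    """Assemble positional arguments from (name, kind) needs.

    kind is one of "plain", "vararg" or "varargs" (hypothetical labels).
    """
    args = []
    for name, kind in needs:
        if kind == "plain":
            args.append(solution[name])  # compulsory: KeyError if absent
        elif name not in solution:
            continue                     # varargish needs are optional
        elif kind == "vararg":
            args.append(solution[name])  # appended once, as a whole
        elif kind == "varargs":
            value = solution[name]
            if isinstance(value, str):
                raise ValueError(
                    f"Expected needs[{name!r}(+)] to be non-str iterables!"
                )
            args.extend(value)           # appended item by item
    return args


def enlist(*values):
    return list(values)


needs = [("a", "plain"), ("b", "varargs"), ("c", "vararg")]
result = enlist(*build_args({"a": 5, "b": [1, 2]}, needs))
print(result)  # → [5, 1, 2]  ("c" is absent, so it is simply skipped)
```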
- sideffects
A modifier denoting a fictive dependency linking operations into virtual flows, without real data exchanges.
The side-effect modification may happen to some internal state not fully represented in the graph & solution.
There are actually 2 relevant modifiers:
An abstract sideffect modifier (annotated with sfx()) describing modifications taking place beyond the scope of the solution. It may have just the “optional” diacritic in printouts.
The sideffected modifier (annotated with sfxed()) denoting modifications on a real dependency read from and written to the solution.
Both kinds of sideffects participate in the planning of the graph, and both may be given or asked in the inputs & outputs of a pipeline, but they are never given to functions. A function of a returns dictionary operation can return a falsy value to declare it as canceled.
- sideffected
A modifier that denotes sideffects on a dependency that exists in solution, allowing to declare an operation that both needs and provides that sideffected dependency.
Note
To be precise, the “sideffected dependency” is the name held in the _Modifier.sideffected attribute of a modifier created by the sfxed() function.
The outputs of a sideffected dependency will produce an overwrite if the sideffected dependency is declared both as needs and provides of some operation.
It is annotated with sfxed(); it may have all diacritics in printouts.
See also the elaborate example in Hierarchical data and further tricks section.
- accessor
Getter/setter functions to extract/populate solution values given as a modifier parameter (not applicable for pure sideffects).
See the Accessor defining class and the jsonp() concrete factory.
- subdoc
- superdoc
- doc chain
- hierarchical data
A subdoc is a dependency value nested further into another one (the superdoc), accessed with a json pointer path expression with respect to the solution, denoted with slashes like:
root/parent/child/leaf
Note that if a nested output is asked, then all docs-in-chain are kept i.e. all superdocs till the root dependency (the “superdocs”) plus all its subdocs (the “subdocs”); as depicted below for a hypothetical dependency /stats/b/b1:
For instance, if the root has been asked as output, no subdoc can be subsequently evicted.
- seealso
Hierarchical data and further tricks (example)
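Accessing a subdoc through a slash-separated path amounts to walking nested containers one step at a time. The function `resolve_jsonp` below is a hypothetical, read-only sketch over plain dicts and lists; it does not handle the escape sequences of the JSON Pointer standard and is not the accessor that the library installs.

```python
def resolve_jsonp(solution, path):
    """Walk a slash-separated path through nested dicts and lists."""
    root, *steps = path.strip("/").split("/")
    doc = solution[root]  # the first step names a value in the solution
    for step in steps:
        if isinstance(doc, list):
            doc = doc[int(step)]  # numeric steps index into sequences
        else:
            doc = doc[step]
    return doc


solution = {"stats": {"a": 1, "b": {"b1": [10, 20], "b2": "x"}}}

leaf = resolve_jsonp(solution, "stats/b/b1/1")
superdoc = resolve_jsonp(solution, "/stats/b")

print(leaf)      # → 20
print(superdoc)  # → {'b1': [10, 20], 'b2': 'x'}
```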
- json pointer path
- jsonp
A modifier containing slashes (/) accessing subdoc values with json pointer expressions, like root/parent/child/1/item.
The first step (e.g. root) is the name of a dependency in the solution which becomes the root document for the jsonp expression following.
- reschedule
- rescheduling
- partial outputs
- canceled operation
The partial pruning of the solution’s dag during execution. It happens when any of these 2 conditions apply:
an operation is marked with the FnOp.rescheduled attribute, which means that its underlying callable may produce only a subset of its provides (partial outputs);
endurance is enabled, either globally (in the configurations), or for a specific operation.
The solution must then reschedule the remaining operations downstream, and possibly cancel some of those (assigned in Solution.canceled).
Partial operations are usually declared with returns dictionary so that the underlying function can control which of the outputs are returned.
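The effect of partial outputs on downstream operations can be sketched by re-checking, in execution order, which of the remaining operations can still have their needs met. The function `cancel_downstream` and the sample operations are hypothetical; the real rescheduling works on the solution's dag.

```python
def cancel_downstream(ops, available):
    """Split the remaining operations into (runnable, canceled).

    ops: ordered (name, needs, provides) tuples still to be executed.
    available: names of the values actually present so far.
    """
    available = set(available)
    runnable, canceled = [], []
    for name, needs, provides in ops:
        if set(needs) <= available:
            runnable.append(name)
            available |= set(provides)  # optimistic: assume full outputs
        else:
            canceled.append(name)       # some need can no longer be produced
    return runnable, canceled


# A hypothetical rescheduled operation declared provides "x" and "y",
# but its function actually returned only "x".
remaining = [
    ("use_x", ["x"], ["x2"]),
    ("use_y", ["y"], ["y2"]),
    ("use_y2", ["y2"], ["z"]),
]
runnable, canceled = cancel_downstream(remaining, available={"x"})

print(runnable)  # → ['use_x']
print(canceled)  # → ['use_y', 'use_y2']
```

Cancellation propagates: `use_y2` is dropped only because its sole producer, `use_y`, was dropped first.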
- endurance
- endured
Keep executing as many operations as possible, even if some of them fail. Endurance for an operation is enabled if set_endure_operations() is true globally in the configurations or if FnOp.endured is true.
You may interrogate Solution.executed to discover the status of each executed operation or call one of check_if_incomplete() or scream_if_incomplete().
- predicate
- node predicate
A callable(op, node-data) that should return true for nodes to be included in graph during planning.
- abort run
A global configurations flag that, when set with the abort_run() function, halts the execution of all current or future plans.
It is reset automatically on every call of Pipeline.compute() (after a successful intermediate planning), or manually, by calling reset_abort().
- parallel
- parallel execution
- execution pool
- task
Execute operations in parallel, with a thread pool or process pool (instead of sequential). Operations and pipelines are marked as such on construction, or enabled globally from configurations.
Note that sideffects are not expected to function with process pools, certainly not when marshalling is enabled.
- process pool
When the multiprocessing.pool.Pool class is used for parallel execution, the tasks must be communicated to/from the worker process, which requires pickling, and that may fail. With pickling failures you may try marshalling with the dill library, and see if that helps.
Note that sideffects are not expected to function at all, certainly not when marshalling is enabled.
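The kind of pickling failure mentioned above is easy to reproduce with the standard library alone: a lambda cannot be serialized by the stock `pickle` module, yet it runs fine on a thread pool, where nothing has to cross a process boundary. The snippet only demonstrates this stdlib behaviour; the variable names are illustrative and no graphtik API is involved.

```python
import pickle
from multiprocessing.dummy import Pool as ThreadPool

task = lambda x: x * x  # lambdas cannot be pickled by the stock pickle module

try:
    pickle.dumps(task)
    picklable = True
except (pickle.PicklingError, AttributeError):
    picklable = False

# Threads share the interpreter's memory, so the task is never pickled.
with ThreadPool(2) as pool:
    squares = pool.map(task, [1, 2, 3])

print(picklable)  # → False
print(squares)    # → [1, 4, 9]
```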
- thread pool
When the multiprocessing.dummy.Pool() class is used for parallel execution, the tasks are run in-process, so no marshalling is needed.
- marshalling
Pickling parallel operations and their inputs/outputs using the dill module. It is configured either globally with set_marshal_tasks() or set with a flag on each operation / pipeline.
Note that sideffects do not work when this is enabled.
- plottable
Objects that can plot their graph network, such as those inheriting Plottable (FnOp, Pipeline, Network, ExecutionPlan, Solution), or a pydot.Dot instance (the result of the Plottable.plot() method).
Such objects may render as SVG in Jupyter notebooks (through their plot() method) and can render in a Sphinx site with the graphtik RST directive. You may control the rendered image as explained in the tip of the Plotting section.
SVGs are rendered with the zoom-and-pan JavaScript library.
Attention
Zoom-and-pan does not work in Sphinx sites for Chrome locally - serve the HTML files through some HTTP server, e.g. launch this command to view the site of this project:
python -m http.server 8080 --directory build/sphinx/html/
- plotter
- plotting
A Plotter is responsible for rendering plottables as images. It is the active plotter that does that, unless overridden in a Plottable.plot() call. Plotters can be customized by various means, such as the plot theme.
- active plotter
- default active plotter
The plotter currently installed “in-context” of the respective graphtik configuration - this term implies also any Plot customizations done on the active plotter (such as plot theme).
Installation happens by calling one of the active_plotter_plugged() or set_active_plotter() functions.
The default active plotter is the plotter instance that this project comes pre-configured with, i.e., when no plot-customizations have yet happened.
Attention
It is recommended to use other means for Plot customizations instead of modifying directly theme’s class-attributes.
All Theme class-attributes are deep-copied when constructing new instances, to avoid modifications by mistake, while attempting to update instance-attributes instead (hint: almost all its attributes are containers i.e. dicts). Therefore any class-attributes modification will be ignored, until a new Theme instance from the patched class is used.
- plot theme
- current theme
The mergeable and expandable styles contained in a plot.Theme instance.
The current theme in-use is the Plotter.default_theme attribute of the active plotter, unless overridden with the theme parameter when calling Plottable.plot() (conveyed internally as the value of the PlotArgs.theme attribute).
- style
- style expansion
A style is an attribute of a plot theme, either a scalar value or a dictionary.
Styles are collected in stacks and are merged into a single dictionary after performing the following expansions:
Resolve any Ref instances, first against the current nx_attrs and then against the attributes of the current theme.
Render jinja2 templates (see _expand_styles()) with template-arguments all the attributes of the plot_args instance in use (hence much more flexible than Ref).
Call any callables with current plot_args and replace them by their result (even more flexible than templates).
Any None results above are discarded.
Workaround pydot/pydot#228 pydot-cstor not supporting styles-as-lists.
Tip
If DEBUG is enabled, the provenance of all style values appears in the tooltips of plotted graphs.
- configurations
- graphtik configuration
The functions controlling compile & execution globally are defined in the config module and +1 in the graphtik.plot module; the underlying global data are stored in contextvars.ContextVar instances, to allow for nested control.
All boolean configuration flags are tri-state (None, False, True), allowing to “force” all operations, when they are not set to the None value. All of them default to None (false).
- jetsam
When operations fail, the original exception gets annotated with salvaged values from locals() and raised intact.
See Jetsam on exceptions.
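The general idea of annotating and re-raising the same exception can be sketched as follows. The wrapper `run_with_jetsam` is hypothetical, the `jetsam` attribute name is an assumption borrowed from the glossary term, and the sketch salvages explicitly chosen values rather than inspecting `locals()`.

```python
def run_with_jetsam(fn, **inputs):
    """Call fn; on failure, attach salvaged values to the same exception."""
    try:
        return fn(**inputs)
    except Exception as ex:
        # Hypothetical attribute name, chosen after the glossary term.
        ex.jetsam = {"fn": fn.__name__, "inputs": dict(inputs)}
        raise  # re-raise the original exception, type and traceback intact


def divide(a, b):
    return a / b


try:
    run_with_jetsam(divide, a=1, b=0)
except ZeroDivisionError as ex:
    salvaged = ex.jetsam

print(salvaged)  # → {'fn': 'divide', 'inputs': {'a': 1, 'b': 0}}
```

Because the exception object itself is reused, callers that catch the original exception type keep working and gain the extra context for debugging.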