4. Architecture

compute
computation
phase

[Flowchart (graphtik v4.1.0): operations → compose → network → compile → execution plan → execute → solution; input names, output names and the node predicate feed into compile; input values feed into execute.]

The definition & execution of networked operations is split in 1+2 phases:

… it is constrained by these IO data-structures:

… populates these low-level data-structures:

… and utilizes these main classes:

graphtik.op.FunctionalOperation([fn, name, …])

An operation performing a callable (ie a function, a method, a lambda).

graphtik.netop.NetworkOperation(operations, …)

An operation that can compute a network-graph of operations.

graphtik.network.Network(*operations[, graph])

A graph of operations that can compile an execution plan.

graphtik.network.ExecutionPlan

A pre-compiled list of operation steps that can execute for the given inputs/outputs.

graphtik.network.Solution(plan, input_values)

A chain-map collecting solution outputs and execution state (eg overwrites)

compose
composition

The phase where operations are constructed and grouped into netops and corresponding networks.

compile
compilation

The phase where the Network creates a new execution plan by pruning all graph nodes into a subgraph dag, and deriving the execution steps.

execute
execution
sequential

The phase where the ExecutionPlan calls the underlying functions of all operations contained in execution steps, with inputs/outputs taken from the solution.

Currently there are 2 ways to execute:

  • sequential

  • parallel, with a thread pool or a process pool

Plans may abort their execution by setting the abort run global flag.

net
network

The Network contains a graph of operations and can compile (and cache) execution plans, or prune a cloned network for given inputs/outputs/node predicate.

plan
execution plan

The ExecutionPlan class performs the execution phase; it contains the dag and the steps.

Compiled execution plans are cached in Network._cached_plans across runs, with (inputs, outputs, predicate) as key.

solution

A Solution instance created internally by NetworkOperation.compute() to hold the values of both inputs & outputs, and the status of executed operations. It is based on a collections.ChainMap, to keep one dictionary for each operation executed, +1 for inputs.

The results of the last operation executed “win” in the outputs produced, and the base (least precedence) is the inputs given when the execution started.
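The layering described above can be sketched with a plain collections.ChainMap. This is only an illustration of the precedence rule, not the actual Solution class; the operation names in the comments are hypothetical.

```python
from collections import ChainMap

# Base layer: the inputs given when execution started (least precedence).
inputs = {"a": 1, "b": 2}
solution = ChainMap(inputs)

# Each executed operation pushes a new layer in front of the older ones.
solution = solution.new_child({"c": 3})            # hypothetical op1 provides "c"
solution = solution.new_child({"c": 30, "d": 4})   # hypothetical op2 rewrites "c"

print(solution["c"])       # 30: the last executed operation wins
print(solution["a"])       # 1: falls through to the inputs layer
print(len(solution.maps))  # 3: two operations, +1 for the inputs
```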

graph
network graph

A graph of operations linked by their dependencies forming a pipeline.

The Network.graph (currently a DAG) contains all FunctionalOperation and data-nodes (string or modifier) of a netop.

They are laid out and connected by repeated calls of Network._append_operation() by the Network constructor during composition.

This graph is then pruned to extract the dag, and the execution steps are calculated, all ingredients for a new ExecutionPlan.
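As a rough illustration of how matching needs & provides yields an ordered graph, the sketch below links two operations and sorts them topologically using only the standard library. It is not graphtik's code (which also keeps data-nodes in its graph); the `ops` mapping and its layout are assumptions made for the example.

```python
from graphlib import TopologicalSorter

# Hypothetical operations: name -> (needs, provides)
ops = {
    "mul": (["ab", "c"], ["abc"]),
    "add": (["a", "b"], ["ab"]),
}

# Which operation provides each dependency name.
producers = {p: name for name, (_, provides) in ops.items() for p in provides}

# An operation depends on the producers of its needs; names without a
# producer ("a", "b", "c") must come from the user's inputs.
deps = {
    name: {producers[n] for n in needs if n in producers}
    for name, (needs, _) in ops.items()
}

order = list(TopologicalSorter(deps).static_order())
print(order)  # ['add', 'mul']
```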

prune
pruning

A subphase of compilation performed by method Network._prune_graph(), which extracts a subgraph dag that does not contain any unsatisfied operations.

It topologically sorts the graph, and prunes based on given inputs, asked outputs, node predicate and operation needs & provides.

unsatisfied operation

The core of pruning & rescheduling, performed by the network._unsatisfied_operations() function, which collects all operations with unreachable dependencies:

  • they have needs that do not correspond to any of the given inputs or the intermediately computed outputs of the solution;

  • all their provides are NOT needed by any other operation, nor are asked as outputs.
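The two conditions above can be sketched as a forward and a backward pass over topologically sorted operations. This is a simplified illustration that ignores modifiers such as optionals, not the real network._unsatisfied_operations(); the `Op` record is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Hypothetical stand-in for an operation node."""
    name: str
    needs: tuple
    provides: tuple


def unsatisfied_operations(ops, inputs, outputs):
    """Return the names of operations to prune; `ops` must be topologically sorted."""
    unsatisfied = []

    # Forward pass: an operation is reachable only if all its needs are
    # given as inputs or provided by an earlier reachable operation.
    available = set(inputs)
    reachable = []
    for op in ops:
        if set(op.needs) <= available:
            available.update(op.provides)
            reachable.append(op)
        else:
            unsatisfied.append(op.name)

    # Backward pass: drop operations whose provides are neither asked as
    # outputs nor needed by any retained operation further downstream.
    wanted = set(outputs)
    for op in reversed(reachable):
        if wanted & set(op.provides):
            wanted.update(op.needs)
        else:
            unsatisfied.append(op.name)

    return sorted(unsatisfied)


ops = [
    Op("add", ("a", "b"), ("ab",)),
    Op("mul", ("ab", "c"), ("abc",)),   # "c" is never given
    Op("neg", ("a",), ("na",)),         # "na" is never asked
]
pruned = unsatisfied_operations(ops, inputs={"a", "b"}, outputs={"ab"})
print(pruned)  # ['mul', 'neg']
```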

dag
execution dag
solution dag

There are 2 directed-acyclic-graph instances used:

steps
execution steps

The plan contains a list of the operation-nodes only from the dag, topologically sorted, and interspersed with instruction steps needed to compute the asked outputs from the given inputs.

They are built by Network._build_execution_steps() based on the subgraph dag.

The only instruction step is for performing evictions.

evictions

A memory footprint optimization where intermediate inputs & outputs are erased from solution as soon as they are not needed further down the dag.

Evictions are pre-calculated during compilation, where _EvictInstruction steps are inserted in the execution plan.
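One way to pre-calculate such steps is to find the last reader of every data name and emit an eviction right after it. The sketch below assumes operations are given as (name, needs, provides) tuples in topological order, and represents instructions as ("evict", name) tuples; both are simplifications and not the library's _EvictInstruction machinery.

```python
def build_steps(ops, outputs=None):
    """Interleave operation names with ("evict", data_name) instructions."""
    if not outputs:
        # No specific outputs asked: everything is kept, no evictions.
        return [name for name, _, _ in ops]

    # Index of the last operation reading each data name.
    last_use = {}
    for i, (_, needs, _) in enumerate(ops):
        for need in needs:
            last_use[need] = i

    steps = []
    for i, (name, needs, provides) in enumerate(ops):
        steps.append(name)
        # Evict inputs whose last reader has just run, unless asked as outputs.
        for need in needs:
            if last_use[need] == i and need not in outputs:
                steps.append(("evict", need))
        # Evict provides that nobody reads and nobody asked for.
        for provide in provides:
            if provide not in last_use and provide not in outputs:
                steps.append(("evict", provide))
    return steps


ops = [
    ("add", ["a", "b"], ["ab"]),
    ("mul", ["ab", "a"], ["result"]),
]
steps = build_steps(ops, outputs={"result"})
print(steps)
# ['add', ('evict', 'b'), 'mul', ('evict', 'ab'), ('evict', 'a')]
```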

overwrites

Values in the solution that have been written by more than one operation, accessed by Solution.overwrites. Note that solution sideffect dependencies produce, almost always, overwrites.
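Given a layered solution, overwrites could be detected by looking for names present in more than one layer. The helper below is a hypothetical sketch over a plain collections.ChainMap; it assumes the inputs layer counts as a writer, and it is not how Solution.overwrites is implemented.

```python
from collections import ChainMap


def find_overwrites(solution):
    """Map each name found in more than one layer to its values, newest first."""
    history = {}
    for layer in solution.maps:  # ChainMap layers are ordered newest-first
        for name, value in layer.items():
            history.setdefault(name, []).append(value)
    return {name: values for name, values in history.items() if len(values) > 1}


layers = ChainMap({"c": 30, "d": 4}, {"c": 3}, {"a": 1, "b": 2})
overwrites = find_overwrites(layers)
print(overwrites)  # {'c': [30, 3]}
```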

inputs

The named input values that are fed into an operation (or netop) through Operation.compute() method according to its needs.

These values are either:

outputs

The dictionary of computed values returned by an operation (or a netop) matching its provides, when method Operation.compute() is called.

Those values are either:

  • retained in the solution, internally during execution, keyed by the respective provide, or

  • returned to user after the outer netop has finished computation.

When no specific outputs are requested from a netop, NetworkOperation.compute() returns all intermediate inputs along with the outputs, that is, no evictions happen.

An operation may return partial outputs.

netop
network operation
pipeline

The NetworkOperation class holding a network of operations and dependencies.

operation

Either the abstract notion of an action with specified needs and provides (its dependencies), or the concrete wrapper FunctionalOperation for any callable(), that feeds on inputs and updates outputs, from/to the solution, or given-by/returned-to the user by a netop.

The distinction between needs/provides and inputs/outputs is akin to function parameters and arguments during define-time and run-time, respectively.

dependency

The name of a solution value an operation needs or provides.

  • Dependencies are declared during composition, when building FunctionalOperation instances. Operations are then interlinked together, by matching the needs & provides of all operations contained in a pipeline.

  • During compilation the graph is then pruned based on the reachability of the dependencies.

  • During execution Operation.compute() performs 2 “matchings”:

    • inputs & outputs in solution are accessed by the needs & provides names of the operations;

    • operation needs & provides are zipped against the underlying function’s arguments and results.

    These matchings are affected by modifiers.
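The two matchings can be sketched in a few lines. The `compute` helper below is hypothetical and ignores modifiers; wrapping a single result into a one-item list is an assumption of this sketch.

```python
def compute(fn, needs, provides, solution):
    """Run `fn` on values taken from `solution`, returning named outputs."""
    # Matching 1: input values are looked up in the solution by the needs names.
    args = [solution[name] for name in needs]

    # Matching 2: needs feed the function's positional arguments, and its
    # results are zipped against the provides names.
    results = fn(*args)
    if len(provides) == 1:
        results = [results]  # assumption: a lone result is not a sequence
    return dict(zip(provides, results))


solution = {"a": 7, "b": 2, "unrelated": 0}
outs = compute(divmod, needs=["a", "b"], provides=["quot", "rem"], solution=solution)
print(outs)  # {'quot': 3, 'rem': 1}
```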

needs
fn_needs

The list of dependency names an operation requires from solution as inputs, roughly corresponding to underlying function’s arguments (fn_needs).

Specifically, Operation.compute() extracts input values from solution by these names, and matches them against function arguments, mostly by their positional order. Whenever this matching is not 1-to-1, and function-arguments differ from the regular needs, modifiers must be used.

provides
op_provides
fn_provides

The list of dependency names an operation writes to the solution as outputs, roughly corresponding to underlying function’s results (fn_provides).

Specifically, Operation.compute() “zips” this list-of-names with the output values produced when the operation’s function is called. Whenever this “zipping” is not 1-to-1, and function-results differ from the regular operation (op_provides) (or results are not a list), it is possible to:

alias

Map an existing name in fn_provides into a duplicate, artificial one in op_provides.

You cannot alias an alias. See Aliased provides.

returns dictionary

When an operation is marked with this flag, the underlying function is not expected to return fn_provides as a sequence but as a dictionary; hence, no “zipping” of function-results –> op_provides takes place.

Useful for operations returning partial outputs.
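The difference between the two modes can be sketched as follows. The `collect_outputs` helper and its `returns_dict` parameter are hypothetical names for this illustration; filtering out undeclared keys is an assumption too.

```python
def collect_outputs(result, provides, returns_dict=False):
    """Turn a function's result into a {provide-name: value} dictionary."""
    if returns_dict:
        # No zipping: the function named its own results. Names that are
        # missing from the dictionary are simply partial outputs.
        return {name: value for name, value in result.items() if name in provides}
    # Regular case: results are a sequence, zipped against the provides names.
    return dict(zip(provides, result))


zipped = collect_outputs((2.0, 0.5), ["mean", "std"])
partial = collect_outputs({"mean": 2.0}, ["mean", "std"], returns_dict=True)
print(zipped)   # {'mean': 2.0, 'std': 0.5}
print(partial)  # {'mean': 2.0}
```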

modifier

Annotations on a dependency such as optionals & sideffects.

(see graphtik.modifiers module)

optionals

A modifier applied only on needs dependencies, corresponding to either:

  • function arguments-with-defaults (annotated with optional), or

  • *args (annotated with vararg & varargs),

that do not hinder execution of the operation if absent from inputs.
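For the arguments-with-defaults case, the effect can be sketched by passing an optional need as a keyword argument only when it is present in the inputs. The `gather_args` helper is hypothetical, and *args handling is left out.

```python
def gather_args(needs, optional_needs, solution):
    """Required needs become positional args; present optionals become kwargs."""
    args = [solution[name] for name in needs]  # KeyError if a required need is absent
    kwargs = {name: solution[name] for name in optional_needs if name in solution}
    return args, kwargs


def scale(x, factor=10):
    return x * factor


# "factor" is absent: the function's own default applies.
args, kwargs = gather_args(["x"], ["factor"], {"x": 2})
without_optional = scale(*args, **kwargs)

# "factor" is present: it overrides the default.
args, kwargs = gather_args(["x"], ["factor"], {"x": 2, "factor": 3})
with_optional = scale(*args, **kwargs)

print(without_optional, with_optional)  # 20 6
```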

sideffects

A modifier denoting a fictive dependency linking operations into virtual flows, without real data exchanges.

A sideffect is a dependency denoting a modification to some internal state that may not be fully represented in the graph & solution. Sideffects participate in the compilation of the graph, and a dummy value gets written in the solution during execution, but they are never given/asked to/from functions.

There are actually 2 relevant modifiers:

  • An abstract sideffect (annotated with sideffect modifier) describing modifications taking place beyond the scope of the solution.

  • The solution sideffect (annotated with sol_sideffect modifier) denoting modifications on dependencies that are read and written in solution.

Attention

Sideffects are not compatible with optionals and partial outputs.

solution sideffect
sideffected

A modifier that denotes sideffects on a dependency that exists in solution, …

allowing to declare an operation that both needs and provides that sideffected dependency.

All solution sideffect outputs produce, by definition, overwrites. It is annotated with sol_sideffect class.

reschedule
rescheduling
partial outputs
partial operation
canceled operation

The partial pruning of the solution’s dag during execution. It happens when any of these 2 conditions apply:

the solution must then reschedule the remaining operations downstream, and possibly cancel some of those (assigned in Solution.canceled).

Operations with partial outputs are incompatible with solution sideffects, i.e. they cannot control which of their sideffects they have produced, it’s either all or nothing.

See Operations with partial outputs (rescheduled)

endurance
endured

Keep executing as many operations as possible, even if some of them fail. Endurance for an operation is enabled if set_endure_operations() is true globally in the configurations or if FunctionalOperation.endured is true.

You may interrogate Solution.executed to discover the status of each executed operation, or call one of check_if_incomplete() or scream_if_incomplete().

See Resilience on errors (endured)

predicate
node predicate

A callable(op, node-data) that should return true for nodes to be included in graph during compilation.
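A predicate of this shape could be used to filter nodes as sketched below. The node-data layout (a "tags" set) and the `prune_by_predicate` helper are assumptions made for the illustration.

```python
def prune_by_predicate(nodes, predicate):
    """Keep only the nodes for which predicate(op, node_data) is true."""
    return {op: data for op, data in nodes.items() if predicate(op, data)}


# Hypothetical operation nodes with their node-data dictionaries.
nodes = {
    "load": {"tags": {"io"}},
    "train": {"tags": {"gpu"}},
    "report": {"tags": set()},
}


def no_gpu(op, data):
    return "gpu" not in data["tags"]


kept = prune_by_predicate(nodes, no_gpu)
print(sorted(kept))  # ['load', 'report']
```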

abort run

A global configurations flag that, when set with the abort_run() function, halts the execution of all currently running or future plans.

It is reset automatically on every call of NetworkOperation.compute() (after a successful intermediate compilation), or manually, by calling reset_abort().

parallel
parallel execution
execution pool
task

Execute operations in parallel, with a thread pool or process pool (instead of sequential). Operations and netops are marked as such on construction, or enabled globally from configurations.

Note that sideffects are not expected to function with process pools, certainly not when marshalling is enabled.

process pool

When the multiprocessing.pool.Pool class is used for parallel execution, the tasks must be communicated to/from the worker process, which requires pickling, and that may fail. With pickling failures you may try marshalling with dill library, and see if that helps.

Note that sideffects are not expected to function at all, certainly not when marshalling is enabled.
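A typical pickling failure is easy to reproduce with the standard pickle module: functions are pickled by reference to an importable name, so a lambda cannot be sent to a worker process. The `is_picklable` helper is a hypothetical convenience for checking a task beforehand.

```python
import pickle


def is_picklable(obj):
    """Return True if `obj` survives the standard pickle module."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


print(is_picklable(abs))               # True: pickled by its importable name
print(is_picklable(lambda x: 2 * x))   # False: a lambda has no importable name
```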

thread pool

When the multiprocessing.dummy.Pool() class is used for parallel execution, the tasks are run in process, so no marshalling is needed.
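Since threads share the memory of the process, even unpicklable callables such as lambdas can be dispatched. A minimal sketch with hypothetical tasks:

```python
from multiprocessing.dummy import Pool  # thread-based, same API as multiprocessing.Pool

# Hypothetical tasks; lambdas are fine here because nothing gets pickled.
tasks = [lambda: 1 + 1, lambda: 2 * 3, lambda: 10 - 4]

with Pool(3) as pool:
    results = pool.map(lambda task: task(), tasks)

print(results)  # [2, 6, 6]
```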

marshalling

Pickling parallel operations and their inputs/outputs using the dill module. It is configured either globally with set_marshal_tasks() or set with a flag on each operation / netop.

Note that sideffects do not work when this is enabled.

plottable

Objects that can plot their graph network, such as those inheriting Plottable, (FunctionalOperation, NetworkOperation, Network, ExecutionPlan, Solution) or a pydot.Dot instance (the result of the Plottable.plot() method).

Such objects may render as SVG in Jupyter notebooks (through their plot() method) and can render in a Sphinx site with the graphtik RST directive. You may control the rendered image as explained in the tip of the Plotting section.

SVGs are rendered with the zoom-and-pan JavaScript library.

Attention

Zoom-and-pan does not work in Sphinx sites for Chrome locally - serve the HTML files through some HTTP server, e.g. launch this command to view the site of this project:

python -m http.server 8080 --directory build/sphinx/html/

plotter

A Plotter is responsible for rendering plottables as images. It is the active plotter that does that, unless overridden in a Plottable.plot() call. Plotters can be customized by various means, such as the plot theme.

active plotter
default active plotter

The plotter currently installed “in-context” of the respective graphtik configuration - this term implies also any Plot customizations done on the active plotter (such as plot theme).

Installation happens by calling one of active_plotter_plugged() or set_active_plotter() functions.

The default active plotter is the plotter instance that this project comes pre-configured with, ie, when no plot-customizations have yet happened.

plot theme
theme expansion

The mergeable and auto-expandable attributes of plot.Theme instances in use.

The actual theme in-use is the Plotter.default_theme attribute of the active plotter, unless overridden with the theme parameter when calling Plottable.plot() (conveyed internally as the value of the PlotArgs.theme attribute).

The following expansions apply in the attribute-values of Theme instances:

  • Any lists will be merged (important for multi-valued Graphviz attributes like style).

  • Any Ref instances will be resolved against the attributes of the current theme.

  • Any jinja2 templates will be rendered, using as template-arguments all the attributes of the plot_args instance in use.

Attention

All Theme class attributes are deep-copied when constructing new instances, to avoid modifying them by mistake, while attempting to update instance attributes instead (hint: almost all its attributes are containers i.e. dicts).

Therefore it is recommended to use other means for Plot customizations instead of modifying directly theme’s class-attributes.

configurations
graphtik configuration

The functions controlling compilation & execution globally are defined in the config module, plus one in the graphtik.plot module; the underlying global data are stored in contextvars.ContextVar instances, to allow for nested control.

All boolean configuration flags are tri-state (None, False, True), allowing to “force” all operations, when they are not set to the None value. All of them default to None (false).