3. Plotting and Debugging

Plotting

For errors & debugging, it is necessary to visualize the graph-operation. You may plot the original graph and annotate on top of it the execution plan and solution of the last computation, calling methods with arguments like this:

netop.plot(show=True)        # open a matplotlib window
netop.plot("netop.svg")      # write an SVG file; other supported formats: png, jpg, pdf, ...
netop.plot()                 # without arguments, return a pydot.Dot object
netop.plot(solution=out)     # annotate graph with solution values
(Figure: execution plan)

(Figure: Graphtik Legend. The legend for all graphtik diagrams, generated by legend().)

The same Plotter.plot() method applies to NetworkOperation, Network & ExecutionPlan, each one capable of producing diagrams of increasing complexity. Whenever possible, the top-level plot() method delegates to the ones below.

For instance, when a net-operation has just been composed, plotting it produces a bare-bone diagram, with just the 2 types of nodes (data & operations), their dependencies, and the sequence of the execution-plan.

(Figure: barebone graph)

But as soon as you run it, the net plot calls will show more of the internals. Internally, it delegates to ExecutionPlan.plot() of the NetworkOperation.last_plan attribute, which caches the last run to facilitate debugging. If you want the bare-bone diagram, plot the network:

netop.net.plot(...)
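The delegation described above can be pictured with a few toy classes. This is a hypothetical sketch, not graphtik's code: the class and attribute names mirror the text (NetworkOperation, Network, ExecutionPlan, last_plan), but their bodies, and the strings they return in place of real diagrams, are made up for illustration.

```python
# Toy stand-ins for the classes named above; NOT graphtik's implementation.
class Network:
    def plot(self):
        return "network diagram"  # bare bone: data & operation nodes only


class ExecutionPlan:
    def __init__(self, net):
        self.net = net

    def plot(self):
        # A plan extends the diagram of the network it was derived from.
        return self.net.plot() + " + plan steps"


class NetworkOperation:
    def __init__(self, net):
        self.net = net
        self.last_plan = None  # cached after each run, to facilitate debugging

    def compute(self, **inputs):
        self.last_plan = ExecutionPlan(self.net)
        return dict(inputs)

    def plot(self):
        # Delegate to the last plan when there is one, else to the bare network.
        target = self.last_plan if self.last_plan is not None else self.net
        return target.plot()


netop = NetworkOperation(Network())
print(netop.plot())      # network diagram
netop.compute(a=1)
print(netop.plot())      # network diagram + plan steps
print(netop.net.plot())  # network diagram
```

Plotting the network directly always bypasses the cached plan, which is why it gives back the bare-bone view.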

Note

For plots, the Graphviz program must be in your PATH, and the pydot & matplotlib Python packages must be installed. You may install both packages by installing graphtik with its plot extras:

pip install graphtik[plot]

Tip

The pydot.Dot instances returned by Plotter.plot() are rendered directly in Jupyter/IPython notebooks as SVG images.

You may increase the height of the SVG cell output with something like this:

netop.plot(jupyter_render={"svg_element_styles": "height: 600px; width: 100%"})

Check default_jupyter_render for defaults.

Errors & debugging

Graphs may become arbitrarily deep. Launching a debugger-session to inspect deeply nested stacks is notoriously hard.

As a workaround, when some operation fails, the original exception gets annotated with the following properties, as a debug aid:

>>> from graphtik import compose, operation
>>> from pprint import pprint
>>> def scream(*args):
...     raise ValueError("Wrong!")
>>> try:
...     compose("errgraph",
...             operation(name="screamer", needs=['a'], provides=["foo"])(scream)
...     )(a=None)
... except ValueError as ex:
...     pprint(ex.jetsam)
{'args': {'args': [None], 'kwargs': {}},
 'executed': set(),
 'network': Network(
    +--a
    +--FunctionalOperation(name='screamer', needs=['a'], provides=['foo'], fn='scream')
    +--foo),
 'operation': FunctionalOperation(name='screamer', needs=['a'], provides=['foo'], fn='scream'),
 'outputs': None,
 'plan': ExecutionPlan(needs=['a'], provides=['foo'], steps:
  +--FunctionalOperation(name='screamer', needs=['a'], provides=['foo'], fn='scream')),
 'provides': None,
 'results_fn': None,
 'results_op': None,
 'solution': {'a': None}}

In an interactive REPL console, you may use this to get the last raised exception:

import sys

sys.last_value.jetsam

The following annotated attributes might have a meaningful value on an exception:

network
the innermost network owning the failed operation/function
plan
the innermost plan that was executing when an operation crashed
operation
the innermost operation that failed
args
either the input arguments list fed into the function, or a dict with both args & kwargs keys in it.
outputs
the names of the outputs the function was expected to return
provides
the names the graph eventually needed from the operation; a subset of the above, and not always what has been declared in the operation.
results_fn
the raw results of the operation’s function, if any
results_op
the results, always a dictionary, as matched with operation’s provides
executed
a set with the operation nodes & instructions executed until the error happened.

Of course, you may use many of the above “jetsam” values when plotting.
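The annotation of exceptions described above can be sketched with a context manager that attaches a jetsam dict to any exception passing through it, then re-raises it unchanged. The helper annotate_errors() and the toy run_operation()/run_network() functions are hypothetical, not graphtik's internals; nesting them shows why the innermost values are the ones that survive.

```python
import contextlib


@contextlib.contextmanager
def annotate_errors(**state):
    """Attach `state` to the `jetsam` dict of any exception raised inside, then re-raise."""
    try:
        yield
    except Exception as ex:
        if not hasattr(ex, "jetsam"):
            ex.jetsam = {}
        # Inner blocks annotate first, so setdefault() keeps the innermost values.
        for key, value in state.items():
            ex.jetsam.setdefault(key, value)
        raise


def scream(*args):
    raise ValueError("Wrong!")


def run_operation(name, fn, args):
    with annotate_errors(operation=name, args={"args": list(args), "kwargs": {}}):
        return fn(*args)


def run_network(solution):
    executed = set()
    with annotate_errors(network="errgraph", solution=solution, executed=executed):
        run_operation("screamer", scream, [solution["a"]])
        executed.add("screamer")  # never reached: the operation fails


try:
    run_network({"a": None})
except ValueError as ex:
    caught = getattr(ex, "jetsam", {})  # be defensive: annotation may be missing

print(sorted(caught))  # ['args', 'executed', 'network', 'operation', 'solution']
```

Reading the values through getattr() with a default keeps debugging code working even on exceptions that were never annotated.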

Note

The plotting capabilities, along with the above annotation of exceptions with the internal state of plan/operation, often render a debugger session unnecessary. But since the state of the annotated values might be incomplete, you may not always avoid one.

Execution internals

Network-based computation of operations & data.

The execution of network operations is split into 2 phases:

COMPILE:
prune unsatisfied nodes, sort the dag topologically & solve it, and derive the execution steps (see below) based on the given inputs and requested outputs.
EXECUTE:
sequential or parallel invocation of the underlying functions of the operations with arguments from the solution.
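The two phases can be sketched in plain Python, using the standard-library graphlib module (Python 3.9+) instead of networkx. The operation table OPS and the names compile_plan(), execute() and _plan_cache are hypothetical; the sketch assumes single-output operations, while graphtik's real pruning works on a networkx graph and covers more cases. Like graphtik's _cached_plans, it memoizes compiled plans keyed by inputs/outputs.

```python
from graphlib import TopologicalSorter

# Hypothetical operations: name -> (needs, provides, function).
OPS = {
    "add": (["a", "b"], ["ab"], lambda a, b: a + b),
    "mul": (["ab", "c"], ["abc"], lambda ab, c: ab * c),
    "neg": (["x"], ["nx"], lambda x: -x),  # pruned: "x" is never given
}

_plan_cache = {}  # compiled plans keyed by (inputs, outputs)


def compile_plan(inputs, outputs):
    """COMPILE: prune operations, then sort the survivors topologically."""
    key = (frozenset(inputs), frozenset(outputs))
    if key in _plan_cache:
        return _plan_cache[key]

    # Forward pass: keep operations whose needs can all be satisfied.
    available, satisfied = set(inputs), []
    changed = True
    while changed:
        changed = False
        for name, (needs, provides, _) in OPS.items():
            if name not in satisfied and set(needs) <= available:
                satisfied.append(name)
                available.update(provides)
                changed = True

    # Backward pass: keep only operations contributing to the asked outputs.
    wanted, kept = set(outputs), set()
    for name in reversed(satisfied):
        needs, provides, _ = OPS[name]
        if wanted & set(provides):
            kept.add(name)
            wanted.update(needs)

    # Topological sort: an operation depends on the kept ops producing its needs.
    deps = {
        name: {other for other in kept if set(OPS[other][1]) & set(OPS[name][0])}
        for name in kept
    }
    steps = list(TopologicalSorter(deps).static_order())
    _plan_cache[key] = steps
    return steps


def execute(steps, inputs, outputs):
    """EXECUTE: invoke each operation sequentially, feeding it from the solution."""
    solution = dict(inputs)
    for name in steps:
        needs, provides, fn = OPS[name]
        solution[provides[0]] = fn(*(solution[n] for n in needs))
    return {k: solution[k] for k in outputs}


inputs = {"a": 1, "b": 2, "c": 3}
steps = compile_plan(inputs, ["abc"])
print(steps)                            # ['add', 'mul']
print(execute(steps, inputs, ["abc"]))  # {'abc': 9}
```

Keeping compilation separate from execution is what makes caching worthwhile: a second run with the same input and output names skips pruning and sorting entirely.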

Computations are based on 5 data-structures:

Network.graph

A networkx graph (still a DAG) containing alternating layers of Operation and _DataNode nodes. They are laid out and connected by repeated calls of add_OP().

The computation starts with prune() extracting a DAG subgraph by pruning its nodes based on given inputs and requested outputs in compute().

ExecutionPlan.dag
A directed acyclic graph containing the pruned nodes as built by prune(). This pruned subgraph is used to decide the ExecutionPlan.steps (below). The containing ExecutionPlan instance is cached in _cached_plans across runs with inputs/outputs as key.
ExecutionPlan.steps

It is the list of the operation-nodes only from the dag (above), topologically sorted, and interspersed with instruction steps needed to complete the run. It is built by _build_execution_steps() based on the subgraph dag extracted above. The containing ExecutionPlan instance is cached in _cached_plans across runs with inputs/outputs as key.

The instruction items achieve the following:

  • _EvictInstruction: evicts items from solution as soon as
    they are not needed further down the dag, to reduce memory footprint while computing.
  • _PinInstruction: avoids overwriting any given intermediate
    inputs, and still allows their providing operations to run (because they are needed for their other outputs).
solution
a local variable in compute(), initialized on each run to hold the values of the given inputs, generated (intermediate) data, and output values. It is returned as-is if no specific outputs were requested; no data-eviction happens then.
overwrites
the optional argument given to compute() to collect the intermediate calculated values that are overwritten by intermediate (aka “pinned”) input-values.
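The instructions, the solution and the overwrites dict can be tied together in a small sketch. Instructions are modeled here as hypothetical ("op" | "pin" | "evict", name) tuples rather than graphtik's _EvictInstruction/_PinInstruction classes, and build_steps()/run() are illustrative names. Following the text, a pin step lets the providing operation run for its other outputs, records the computed value into overwrites, and then restores the given input; no eviction happens when no specific outputs are asked.

```python
# Hypothetical multi-output operations: name -> (needs, provides, function).
OPS = {
    "f": (["a"], ["b", "d"], lambda a: (a + 1, a * 2)),
    "g": (["b", "d"], ["c"], lambda b, d: (b * 10 + d,)),
}


def build_steps(order, inputs, outputs=None):
    """Intersperse ("pin", data) and ("evict", data) instructions among operations."""
    steps = []
    for i, name in enumerate(order):
        needs, provides, _ = OPS[name]
        steps.append(("op", name))
        # Pin: the operation (re)computes a value the caller gave as input.
        steps.extend(("pin", p) for p in provides if p in inputs)
        if outputs is None:
            continue  # no specific outputs asked: keep everything, never evict
        # Evict: data not asked as output and not needed by any later operation.
        later_needs = {n for later in order[i + 1:] for n in OPS[later][0]}
        steps.extend(
            ("evict", n) for n in needs if n not in outputs and n not in later_needs
        )
    return steps


def run(steps, inputs, overwrites=None):
    """Execute the steps; `solution` is a fresh local dict on every run."""
    solution = dict(inputs)
    for kind, name in steps:
        if kind == "op":
            needs, provides, fn = OPS[name]
            results = fn(*(solution[n] for n in needs))
            solution.update(zip(provides, results))
        elif kind == "pin":
            if overwrites is not None:
                overwrites[name] = solution[name]  # collect the computed value
            solution[name] = inputs[name]          # restore the given input
        elif kind == "evict":
            solution.pop(name, None)
    return solution


inputs = {"a": 1, "b": 5}
steps = build_steps(["f", "g"], inputs, outputs={"c"})
overwrites = {}
print(run(steps, inputs, overwrites))  # {'c': 52}
print(overwrites)                      # {'b': 2}
```

Here "g" sees the pinned b=5 rather than the computed b=2, yet still gets d=2 from "f"; the discarded b=2 ends up in overwrites.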