3. Plotting and Debugging
Plotting
When debugging errors, it helps to visualize the graph-operation. You may plot the original graph and annotate on top of it the execution plan and the solution of the last computation, by calling methods with arguments like this:
netop.plot(show=True) # open a matplotlib window
netop.plot("netop.svg") # other supported formats: png, jpg, pdf, ...
netop.plot() # without arguments, returns a pydot.Dot object
netop.plot(solution=out) # annotate graph with solution values
The same Plotter.plot() method applies to NetworkOperation, Network & ExecutionPlan, each one capable of producing diagrams of increasing complexity. Whenever possible, the top-level plot() methods delegate to the ones below.
For instance, when a net-operation has just been composed, plotting it will come out bare bone, with just the 2 types of nodes (data & operations), their dependencies, and the sequence of the execution-plan.
But as soon as you run it, the net plot calls will print more of the internals.
Internally it delegates to ExecutionPlan.plot() of the NetworkOperation.last_plan attribute, which caches the last run to facilitate debugging.
If you want the bare-bone diagram, plot the network:
netop.net.plot(...)
Note
For plots, the Graphviz program must be in your PATH, and the pydot & matplotlib python packages must be installed. You may install both packages when installing graphtik with its plot extras:
pip install graphtik[plot]
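Before plotting, it may help to verify these prerequisites from Python. The following is only a sketch using the standard library: the helper name `missing_plot_prerequisites` is hypothetical (not part of graphtik), and it assumes that Graphviz is detectable through its `dot` executable.

```python
import importlib.util
import shutil


def missing_plot_prerequisites():
    """Return a list of the plotting prerequisites that could not be found."""
    missing = []
    # Assumption: Graphviz is detected through its `dot` executable on PATH.
    if shutil.which("dot") is None:
        missing.append("graphviz (dot executable)")
    # The two python packages named in the note above.
    for package in ("pydot", "matplotlib"):
        if importlib.util.find_spec(package) is None:
            missing.append(package)
    return missing


if __name__ == "__main__":
    print(missing_plot_prerequisites() or "all plotting prerequisites found")
```

An empty list means that all three prerequisites were located; it does not guarantee that the installed versions are compatible with each other.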
Tip
The pydot.Dot instances returned by Plotter.plot() are rendered directly in Jupyter/IPython notebooks as SVG images.
You may increase the height of the SVG cell output with something like this:
netop.plot(jupyter_render={"svg_element_styles": "height: 600px; width: 100%"})
Check default_jupyter_render for defaults.
Errors & debugging
Graphs may become arbitrarily deep. Launching a debugger-session to inspect deeply nested stacks is notoriously hard.
As a workaround, when some operation fails, the original exception gets annotated with the following properties, as a debug aid:
>>> from graphtik import compose, operation
>>> from pprint import pprint
>>> def scream(*args):
...     raise ValueError("Wrong!")
>>> try:
...     compose("errgraph",
...         operation(name="screamer", needs=['a'], provides=["foo"])(scream)
...     )(a=None)
... except ValueError as ex:
...     pprint(ex.jetsam)
{'args': {'args': [None], 'kwargs': {}},
'executed': set(),
'network': Network(
+--a
+--FunctionalOperation(name='screamer', needs=['a'], provides=['foo'], fn='scream')
+--foo),
'operation': FunctionalOperation(name='screamer', needs=['a'], provides=['foo'], fn='scream'),
'outputs': None,
'plan': ExecutionPlan(needs=['a'], provides=['foo'], steps:
+--FunctionalOperation(name='screamer', needs=['a'], provides=['foo'], fn='scream')),
'provides': None,
'results_fn': None,
'results_op': None,
'solution': {'a': None}}
In an interactive REPL console, you may use this to inspect the annotations of the last raised exception:
import sys
sys.last_value.jetsam
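To illustrate the mechanism, here is a minimal, library-independent sketch of how an in-flight exception can be annotated and re-raised. The helper names `annotate_with_jetsam` and `run_operation` are hypothetical, and the "inner annotations win" rule is an assumption made to keep the "innermost" values; graphtik's actual implementation may differ.

```python
def annotate_with_jetsam(ex, **values):
    """Attach debugging values to an exception under a `jetsam` dict attribute."""
    jetsam = getattr(ex, "jetsam", None)
    if jetsam is None:
        jetsam = ex.jetsam = {}
    # Assumption: earlier (inner) annotations win, so "innermost" values survive.
    for key, value in values.items():
        jetsam.setdefault(key, value)
    return ex


def run_operation(name, fn, *args):
    """Call `fn`, annotating any exception with the call's context."""
    try:
        return fn(*args)
    except Exception as ex:
        annotate_with_jetsam(
            ex, operation=name, args={"args": list(args), "kwargs": {}}
        )
        raise  # re-raise the *same* exception object, now annotated


def scream(*args):
    raise ValueError("Wrong!")


try:
    run_operation("screamer", scream, None)
except ValueError as ex:
    caught = ex.jetsam  # keep a reference; `ex` is unbound after the block
    print(caught)
```

Because the same exception object is re-raised, callers further up the stack still catch the original type and message, with the extra context riding along as an attribute.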
The following annotated attributes might have meaningful values on an exception:
network
- the innermost network owning the failed operation/function
plan
- the innermost plan that was executing when an operation crashed
operation
- the innermost operation that failed
args
- either the input arguments list fed into the function, or a dict with both args & kwargs keys in it.
outputs
- the names of the outputs the function was expected to return
provides
- the names the graph eventually needed from the operation; a subset of the above, and not always what has been declared in the operation.
results_fn
- the raw results of the operation’s function, if any
results_op
- the results, always a dictionary, as matched with operation’s provides
executed
- a set with the operation nodes & instructions executed till the error happened.
Of course you may use many of the above “jetsam” values when plotting.
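As a hedged illustration, a small helper could plot whatever was attached to a failed run. The function `plot_failure` is hypothetical; it assumes that the jetsam is a dict (as the pretty-printed output above suggests) and that plot() accepts a filename together with a solution keyword, combining the two call styles shown in the Plotting section.

```python
def plot_failure(ex, filename=None):
    """Plot the innermost plan (or network) attached to a failed run, if any."""
    jetsam = getattr(ex, "jetsam", None) or {}
    # Prefer the plan, which knows the execution steps; else use the network.
    plottable = jetsam.get("plan")
    if plottable is None:
        plottable = jetsam.get("network")
    if plottable is None:
        return None  # nothing was annotated on this exception
    solution = jetsam.get("solution")
    if filename is None:
        # Assumption: without a filename, plot() returns the diagram object.
        return plottable.plot(solution=solution)
    # Assumption: a filename and a solution may be passed in the same call.
    return plottable.plot(filename, solution=solution)
```

In a REPL it might be used like `plot_failure(sys.last_value, "failed.svg")` right after a computation has crashed.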
Note
The plotting capabilities, along with the above annotation of exceptions with the internal state of plan/operation, often render a debugger session unnecessary. But since the state of the annotated values might be incomplete, you may not always avoid one.
Execution internals
Network-based computation of operations & data.
The execution of network operations is split into 2 phases:
- COMPILE: prune unsatisfied nodes, sort the DAG topologically & solve it, and derive the execution steps (see below) based on the given inputs and asked outputs.
- EXECUTE: sequential or parallel invocation of the underlying functions of the operations, with arguments taken from the solution.
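The two phases can be sketched on a toy graph with the standard-library `graphlib` module (Python 3.9+). This is a much simplified illustration, not graphtik's code: the `OPS` table and both function names are hypothetical, each operation provides a single value, and a missing input raises an error instead of pruning the unsatisfied operation.

```python
from graphlib import TopologicalSorter

# Hypothetical toy operations: name -> (function, needs, provides).
OPS = {
    "add": (lambda a, b: a + b, ["a", "b"], "ab"),
    "double": (lambda ab: ab * 2, ["ab"], "ab2"),
    "unused": (lambda c: -c, ["c"], "neg_c"),
}


def compile_steps(ops, inputs, outputs):
    """COMPILE: keep only the operations needed, then sort them topologically."""
    producer = {provides: name for name, (_, _, provides) in ops.items()}
    # Walk backwards from the asked outputs to collect the needed operations.
    needed, pending = set(), list(outputs)
    while pending:
        data = pending.pop()
        if data in inputs:
            continue  # a given value: no operation has to produce it
        if data not in producer:
            raise ValueError(f"unsatisfied data: {data!r}")
        name = producer[data]
        if name not in needed:
            needed.add(name)
            pending.extend(ops[name][1])
    # Each needed operation depends on the producers of its non-given needs.
    deps = {
        name: {producer[n] for n in ops[name][1] if n not in inputs}
        for name in needed
    }
    return list(TopologicalSorter(deps).static_order())


def execute(ops, steps, inputs):
    """EXECUTE: invoke each function sequentially, feeding it from the solution."""
    solution = dict(inputs)
    for name in steps:
        fn, needs, provides = ops[name]
        solution[provides] = fn(*(solution[n] for n in needs))
    return solution


steps = compile_steps(OPS, inputs={"a": 1, "b": 2}, outputs=["ab2"])
solution = execute(OPS, steps, {"a": 1, "b": 2})
```

Keeping compilation separate from execution means the derived steps depend only on the names of the inputs and outputs, so they can be reused for other runs with different values.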
Computations are based on 5 data-structures:
Network.graph
- A networkx graph (yet a DAG) containing interchanging layers of Operation and _DataNode nodes. They are laid out and connected by repeated calls of add_OP().
  The computation starts with prune() extracting a DAG subgraph by pruning its nodes based on the given inputs and requested outputs in compute().
ExecutionPlan.dag
- A directed-acyclic-graph containing the pruned nodes as built by prune(). This pruned subgraph is used to decide the ExecutionPlan.steps (below). The containing ExecutionPlan instance is cached in _cached_plans across runs, with inputs/outputs as key.
ExecutionPlan.steps
- The list of the operation-nodes only from the dag (above), topologically sorted, and interspersed with the instruction steps needed to complete the run. It is built by _build_execution_steps() based on the subgraph dag extracted above. The containing ExecutionPlan instance is cached in _cached_plans across runs, with inputs/outputs as key.
  The instruction items achieve the following:
  - _EvictInstruction: evicts items from the solution as soon as they are not needed further down the dag, to reduce the memory footprint while computing.
  - _PinInstruction: avoids overwriting any given intermediate inputs, while still allowing their providing operations to run (because they are needed for their other outputs).
var solution
- A local-var in compute(), initialized on each run to hold the values of the given inputs, generated (intermediate) data, and output values. It is returned as is if no specific outputs are requested; no data-eviction happens then.
arg overwrites
- The optional argument given to compute() to collect the intermediate calculated values that are overwritten by intermediate (aka “pinned”) input-values.
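The interplay of steps, instructions, solution and overwrites described above can be sketched in plain Python. Everything here is illustrative and hypothetical: instructions are modeled as ("pin", name) and ("evict", name) tuples instead of the _PinInstruction and _EvictInstruction classes, and the `OPS` table, `build_steps` and `execute` are not graphtik's API.

```python
# Hypothetical operations: name -> (function returning a tuple, needs, provides).
OPS = {
    "split": (lambda x: (x // 10, x % 10), ["x"], ["tens", "units"]),
    "join": (lambda tens, units: (f"{tens}-{units}",), ["tens", "units"], ["label"]),
}


def build_steps(ops, order, inputs, outputs=None):
    """Intersperse operation names with ("pin", data) / ("evict", data) items."""
    last_use = {}  # data name -> index of the last operation needing it
    for i, name in enumerate(order):
        for need in ops[name][1]:
            last_use[need] = i
    steps = []
    for i, name in enumerate(order):
        steps.append(name)
        _, needs, provides = ops[name]
        # Pin: a given input that this operation would otherwise overwrite.
        steps.extend(("pin", data) for data in provides if data in inputs)
        # Evict: only when outputs were asked, and nothing further down needs it.
        for data in dict.fromkeys([*needs, *provides]):
            if outputs and data not in outputs and last_use.get(data, -1) <= i:
                steps.append(("evict", data))
    return steps


def execute(ops, steps, inputs, overwrites=None):
    """Run the steps sequentially, maintaining the solution dictionary."""
    solution = dict(inputs)
    for step in steps:
        if isinstance(step, tuple):
            kind, data = step
            if kind == "evict":
                solution.pop(data, None)
            elif kind == "pin":
                if overwrites is not None:
                    overwrites[data] = solution[data]  # keep the calculated value
                solution[data] = inputs[data]  # restore the given value
            continue
        fn, needs, provides = ops[step]
        solution.update(zip(provides, fn(*(solution[n] for n in needs))))
    return solution


inputs = {"x": 42, "tens": 9}  # "tens" is a given intermediate input
steps = build_steps(OPS, ["split", "join"], inputs, outputs=["label"])
overwrites = {}
solution = execute(OPS, steps, inputs, overwrites)
```

Here "split" still runs because its "units" output is needed, while the given "tens" value is restored right afterwards and the calculated one lands in overwrites. When no outputs are asked, the sketch emits no evictions and returns the full solution.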