4. API Reference
Package: graphtik
Lightweight computation graphs for Python.
Module: base
Mostly utilities.
-
class
graphtik.base.
Plotter
[source]¶ Classes wishing to plot their graphs should inherit this and …
implement property
plot
to return a “partial” callable that eventually calls plot.render_pydot()
with the graph and any other arguments bound appropriately. The purpose is to avoid duplicating this function and its documentation elsewhere.
-
plot
(filename=None, show=False, jupyter_render: Union[None, Mapping[KT, VT_co], str] = None, **kws)
Entry-point for plotting ready-made operation graphs.
Parameters:
- filename (str) – Write the diagram into a file. Common extensions are .png .dot .jpg .jpeg .pdf .svg; call plot.supported_plot_formats() for more.
- show – If it evaluates to true, opens the diagram in a matplotlib window. If it equals -1, it plots but does not open the window.
- inputs – an optional name list, any nodes in there are plotted as a “house”
- outputs – an optional name list, any nodes in there are plotted as an “inverted-house”
- solution – an optional dict with values to annotate nodes, drawn “filled” (currently content not shown, but node drawn as “filled”)
- executed – an optional container with operations executed, drawn “filled”
- title – an optional string to display at the bottom of the graph
- node_props – an optional nested dict of Graphviz attributes for certain nodes
- edge_props – an optional nested dict of Graphviz attributes for certain edges
- clusters – an optional mapping of nodes –> cluster-names, to group them
- jupyter_render – a nested dictionary controlling the rendering of graph-plots in Jupyter cells; if None, defaults to default_jupyter_render (you may modify it in place, and the changes apply to all future calls).
Returns: a pydot.Dot instance (for the API reference visit: https://pydotplus.readthedocs.io/reference.html#pydotplus.graphviz.Dot)
Tip
The pydot.Dot instance returned is rendered directly in Jupyter/IPython notebooks as SVG images.
You may increase the height of the SVG cell output with something like this:
netop.plot(svg_element_styles="height: 600px; width: 100%")
Check default_jupyter_render for defaults.
Note that the graph argument is absent: each Plotter provides its own graph internally; use render_pydot() directly to provide a different graph.
NODES:
- oval: function
- egg: subgraph operation
- house: given input
- inverted-house: asked output
- polygon: given both as input & asked as output (what?)
- square: intermediate data, neither given nor asked
- red frame: evict-instruction, to free up memory
- blue frame: pinned-instruction, not to overwrite intermediate inputs
- filled: data node has a value in solution OR function has been executed
- thick frame: function/data node in execution steps
ARROWS:
- solid black arrows: dependencies (source-data needed by target-operations, source-operations provide target-data)
- dashed black arrows: optional needs
- blue arrows: sideffect needs/provides
- wheat arrows: broken dependency (provide) during pruning
- green-dotted arrows: execution steps labeled in succession
To generate the legend, see legend().
Sample code:
>>> from graphtik import compose, operation
>>> from graphtik.modifiers import optional
>>> from operator import add

>>> netop = compose("netop",
...     operation(name="add", needs=["a", "b1"], provides=["ab1"])(add),
...     operation(name="sub", needs=["a", optional("b2")], provides=["ab2"])(lambda a, b=1: a-b),
...     operation(name="abb", needs=["ab1", "ab2"], provides=["asked"])(add),
... )

>>> netop.plot(show=True);                 # plot just the graph in a matplotlib window # doctest: +SKIP
>>> inputs = {'a': 1, 'b1': 2}
>>> solution = netop(**inputs)             # now plots will include the execution-plan

>>> netop.plot('plot1.svg', inputs=inputs, outputs=['asked', 'b1'], solution=solution);  # doctest: +SKIP
>>> dot = netop.plot(solution=solution);   # just get the `pydot.Dot` object, renderable in Jupyter
>>> print(dot)
digraph G {
fontname=italic;
label=netop;
a [fillcolor=wheat, shape=invhouse, style=filled, tooltip=1];
...
-
-
graphtik.base.
aslist
(i, argname, allowed_types=<class 'list'>)[source]¶ Utility to accept singular strings as lists, and None –> [].
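The coercion described above can be pictured with a minimal sketch. It is not the library's implementation: the name `aslist_sketch`, the handling of non-string iterables and the error wording are all assumptions made for illustration.

```python
def aslist_sketch(item, argname, allowed_types=list):
    """Hypothetical stand-in for `aslist()`: coerce `item` into a list."""
    if item is None:
        return []  # None -> []
    if isinstance(item, str):
        return [item]  # a single string becomes a one-element list
    if isinstance(item, allowed_types):
        return item  # already of an accepted type, returned untouched
    try:
        return list(item)  # assumption: other iterables are converted
    except TypeError as ex:
        # The error wording is an assumption, not the library's message.
        raise ValueError(f"Cannot list-ize {argname}({item!r}): {ex}") from None


needs = aslist_sketch("a", "needs")
nothing = aslist_sketch(None, "provides")
several = aslist_sketch(("a", "b"), "needs")
```

Accepting both a bare string and a collection lets callers write `needs="a"` without wrapping it in a list.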
-
graphtik.base.
jetsam
(ex, locs, *salvage_vars, annotation='jetsam', **salvage_mappings)[source]¶ Annotate exception with salvaged values from locals() and raise!
Parameters:
- ex – the exception to annotate
- locs – locals() from the context-manager’s block containing vars to be salvaged in case of exception.
ATTENTION: the wrapped function must finally call locals(), because the locals dictionary only reflects local-var changes after the call.
- annotation – the name of the attribute to attach on the exception
- salvage_vars – local variable names to save as is in the salvaged annotations dictionary.
- salvage_mappings – a mapping of destination-annotation-keys –> source-locals-keys; if a source is callable, the value to salvage is retrieved by calling value(locs). They take precedence over salvage_vars.
Raises: any exception raised by the wrapped function, annotated with values assigned as attributes on this context-manager.
- Any attributes attached on this manager are attached as a new dict on the raised exception, as a new jetsam attribute with a dict as value.
- If the exception is already annotated, any new items are inserted, but existing ones are preserved.
Example:
Call it with the managed block’s locals() and tell which of them to salvage in case of errors:
try:
    a = 1
    b = 2
    raise Exception()
except Exception as ex:
    jetsam(ex, locals(), "a", salvaged_b="b", c_var="c")
And then from a REPL:
import sys
sys.last_value.jetsam
{'a': 1, 'salvaged_b': 2, "c_var": None}
Reason:
Graphs may become arbitrarily deep. Debugging such graphs is notoriously hard.
The purpose is not to require a debugger-session to inspect the root-causes (without precluding one).
Naively salvaging values with a simple try/except block around each function blocks the debugger from landing on the real cause of the error: it would land on that block instead, which could be many nested levels above it.
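The annotation rules above (mappings take precedence over plain variable names, existing annotations are preserved) can be sketched in a few lines. This is an illustrative stand-in, not the library's code: `jetsam_sketch` and `failing` are hypothetical names, and re-raising is left to the caller as a simplification.

```python
def jetsam_sketch(ex, locs, *salvage_vars, annotation="jetsam", **salvage_mappings):
    """Attach salvaged locals on `ex` under the `annotation` attribute."""
    annotations = getattr(ex, annotation, None)
    if not isinstance(annotations, dict):
        annotations = {}
        setattr(ex, annotation, annotations)

    collected = {var: locs.get(var) for var in salvage_vars}
    for dst_key, src in salvage_mappings.items():
        # Mappings win over plain variable names that share the same key.
        collected[dst_key] = src(locs) if callable(src) else locs.get(src)

    for key, value in collected.items():
        annotations.setdefault(key, value)  # existing items are preserved


def failing():
    try:
        a = 1
        b = 2
        raise ValueError("boom")
    except ValueError as ex:
        jetsam_sketch(ex, locals(), "a", salvaged_b="b", c_var="c")
        raise  # re-raise, so a debugger still lands on the original frame


try:
    failing()
except ValueError as err:
    annotated = err.jetsam
```

Because the exception itself carries the salvaged values, they survive through any number of nested frames up to the REPL.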
Module: op
About operation nodes (but not net-ops to break cycle).
-
class
graphtik.op.
FunctionalOperation
An Operation performing a callable (i.e. function, method, lambda).
Use the operation() factory to build instances of this class instead.
-
compute
(named_inputs, outputs=None) → dict
Compute (optional) asked outputs for the given named_inputs.
It is called by Network. End-users should simply call the operation with named_inputs as kwargs.
Parameters: named_inputs (list) – the input values with which to feed the computation.
Returns list: Should return a list of values representing the results of running the feed-forward computation on inputs.
-
-
class
graphtik.op.
Operation
An abstract class representing an action with compute().
-
compute
(named_inputs, outputs=None)
Compute (optional) asked outputs for the given named_inputs.
It is called by Network. End-users should simply call the operation with named_inputs as kwargs.
Parameters: named_inputs (list) – the input values with which to feed the computation.
Returns list: Should return a list of values representing the results of running the feed-forward computation on inputs.
-
-
class
graphtik.op.
operation
(fn: Callable = None, *, name=None, needs: Union[Collection[T_co], str, None] = None, provides: Union[Collection[T_co], str, None] = None, returns_dict=None, node_props: Mapping[KT, VT_co] = None)
A builder for graph-operations wrapping functions.
Parameters:
- fn (function) – The function used by this operation. This does not need to be specified when the operation object is instantiated and can instead be set via __call__ later.
- name (str) – The name of the operation in the computation graph.
- needs (list) – Names of input data objects this operation requires. These should correspond to the args of fn.
- provides (list) – Names of output data objects this operation provides. If more than one is given, those must be returned in an iterable, unless returns_dict is true, in which case a dictionary with as many elements must be returned.
- returns_dict (bool) – if true, it means the fn returns a dictionary with all provides, and no further processing is done on them (i.e. the returned output-values are not zipped with provides)
- node_props – added as-is into the NetworkX graph
Returns: when called, it returns a FunctionalOperation
Example:
This is an example of its use, based on the “builder pattern”:
>>> from graphtik import operation
>>> opb = operation(name='add_op')
>>> opb.withset(needs=['a', 'b'])
operation(name='add_op', needs=['a', 'b'], provides=[], fn=None)
>>> opb.withset(provides='SUM', fn=sum)
operation(name='add_op', needs=['a', 'b'], provides=['SUM'], fn='sum')
You may keep calling withset() till you invoke a final __call__() on the builder; then you get the actual FunctionalOperation instance:
>>> # Create `Operation` and overwrite function at the last moment.
>>> opb(sum)
FunctionalOperation(name='add_op', needs=['a', 'b'], provides=['SUM'], fn='sum')
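The interplay of provides and returns_dict described in the parameters can be sketched as follows. The helper `name_results` is hypothetical and only mirrors the documented rule (zip the returned values with provides, unless returns_dict is true); how the library treats a single provided name is an assumption here.

```python
def name_results(provides, result, returns_dict=False):
    """Pair the return value of a function with the `provides` names."""
    if returns_dict:
        return dict(result)  # taken as-is, no zipping with `provides`
    if len(provides) == 1:
        return {provides[0]: result}  # assumption: one output is not unpacked
    return dict(zip(provides, result))  # many outputs: zip with the names


single = name_results(["SUM"], sum([1, 2]))
multiple = name_results(["quot", "rem"], divmod(7, 2))
as_dict = name_results(["quot", "rem"], {"quot": 3, "rem": 1}, returns_dict=True)
```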
Module: netop
About network-operations (those based on graphs)
-
class
graphtik.netop.
NetworkOperation
(net, name, *, inputs=None, outputs=None, predicate: Callable[[Any, Mapping[KT, VT_co]], bool] = None, method=None, overwrites_collector=None)[source]¶ An Operation performing a network-graph of other operations.
Tip
Use the compose() factory to prepare the net and build instances of this class.
-
compute
(named_inputs, outputs=None) → dict
Solve & execute the graph, sequentially or in parallel.
See also Operation.compute().
Parameters:
- named_inputs (dict) – A mapping of names –> values that must contain at least the compulsory inputs that were specified when the plan was built (but cannot enforce that!). Cloned, not modified.
- outputs – a string or a list of strings with all data asked to compute. If you set this variable to None, all data nodes will be kept and returned at runtime.
Returns: a dictionary of output data objects, keyed by name.
Raises: ValueError –
- If outputs asked do not exist in network, with msg: Unknown output nodes: …
- If plan does not contain any operations, with msg: Unsolvable graph: …
- If given inputs mismatched plan’s needs, with msg: Plan needs more inputs…
- If outputs asked cannot be produced by the dag, with msg: Impossible outputs…
-
last_plan
= None[source]¶ The execution_plan of the last call to compute(), stored as debugging aid.
-
narrowed
(inputs: Union[Collection[T_co], str, None] = None, outputs: Union[Collection[T_co], str, None] = None, name=None, predicate: Callable[[Any, Mapping[KT, VT_co]], bool] = None) → graphtik.netop.NetworkOperation
Return a copy with a network pruned for the given needs & provides.
Parameters:
- inputs – prune net against these possible inputs for compute(); method will WARN for any irrelevant inputs given. If None, they are collected from the net. They become the needs of the returned netop.
- outputs – prune net against these possible outputs for compute(); method will RAISE if any irrelevant outputs asked. If None, they are collected from the net. They become the provides of the returned netop.
- name – the name for the new netop:
  - if None, the same name is kept;
  - if True, a distinct name is devised: <old-name>-<uid>
  - otherwise, the given name is applied.
- predicate – a 2-argument callable(op, node-data) that should return true for nodes to include
Returns: A narrowed netop clone, which MIGHT be empty!
Raises: ValueError – If outputs asked do not exist in network, with msg: Unknown output nodes: …
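As an illustration of the predicate argument, the sketch below builds a 2-argument callable and applies it to a plain dict standing in for the graph's node data. The `layer` property, the node names and the filtering loop are hypothetical; only the `callable(op, node-data)` shape comes from the description above.

```python
def by_layer(wanted):
    """Build a predicate keeping only operations tagged with `wanted`."""
    def predicate(op, node_data):
        return node_data.get("layer") == wanted
    return predicate


# Hypothetical node properties, as they might be given through `node_props`.
nodes = {
    "add": {"layer": "math"},
    "sub": {"layer": "math"},
    "log_it": {"layer": "io"},
}
keep = by_layer("math")
kept = [op for op, data in nodes.items() if keep(op, data)]
```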
-
set_execution_method
(method)[source]¶ Determine how the network will be executed.
Parameters: method (str) – If “parallel”, execute graph operations concurrently using a threadpool.
-
set_overwrites_collector
(collector)
Asks to put all overwrites into the collector after computing.
An “overwrite” is an intermediate value calculated but NOT stored into the results, because it has also been given as an intermediate input value, and the operation that would overwrite it MUST run for its other results.
Parameters: collector – a mutable dict to be filled with named values
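A minimal sketch of the idea, assuming a plain dict as the solution: results for pinned names are diverted into the collector instead of replacing the given value. The function `store_results` and its arguments are hypothetical names for illustration, not part of the library.

```python
def store_results(solution, results, pinned, overwrites=None):
    """Store operation results, keeping any pinned (given) values intact."""
    for name, value in results.items():
        if name in pinned:
            if overwrites is not None:
                overwrites[name] = value  # calculated, but not stored
        else:
            solution[name] = value


solution = {"a": 1, "ab1": 100}  # "ab1" was given as an intermediate input
collector = {}
store_results(solution, {"ab1": 3, "extra": 7}, pinned={"ab1"}, overwrites=collector)
```

Without a collector the calculated value for `ab1` would simply be dropped.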
-
-
graphtik.netop.
compose
(name, op1, *operations, needs: Union[Collection[T_co], str, None] = None, provides: Union[Collection[T_co], str, None] = None, merge=False, node_props=None, method=None, overwrites_collector=None) → graphtik.netop.NetworkOperation
Composes a collection of operations into a single computation graph, obeying the merge property, if set in the constructor.
Parameters:
- name (str) – An optional name for the graph being composed by this object.
- op1 – syntactically force at least 1 operation
- operations – Each argument should be an operation instance created using operation.
- merge (bool) – If True, this compose object will attempt to merge together operation instances that represent entire computation graphs. Specifically, if one of the operation instances passed to this compose object is itself a graph operation created by an earlier use of compose, the sub-operations in that graph are compared against other operations passed to this compose instance (as well as the sub-operations of other graphs passed to this compose instance). If any two operations are the same (based on name), then that operation is computed only once, instead of multiple times (one for each time the operation appears).
- node_props – added as-is into the NetworkX graph, to provide for filtering by NetworkOperation.narrowed().
- method – either parallel or None (default); if "parallel", launches multi-threading. Set when invoking a composed graph or by set_execution_method().
- overwrites_collector – (optional) a mutable dict to be filled with named values. If missing, values are simply discarded.
Returns: a special type of operation class, which represents an entire computation graph as a single operation.
Raises: ValueError – If the net cannot produce the asked outputs from the given inputs.
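The name-based merging described for merge=True might be pictured like this. The encoding is purely hypothetical: an operation is a `(name, function)` pair and a list of such pairs stands for a previously composed graph; the library's real objects are richer.

```python
from operator import add, mul, neg, sub


def merged_operations(*operations):
    """Flatten nested graphs, keeping the first operation seen per name."""
    by_name = {}
    for item in operations:
        # A list stands for a previously composed graph (hypothetical encoding).
        for name, fn in (item if isinstance(item, list) else [item]):
            by_name.setdefault(name, fn)
    return by_name


graph1 = [("add", add), ("sub", sub)]
graph2 = [("add", add), ("mul", mul)]
merged = merged_operations(graph1, graph2, ("neg", neg))
```

The operation named `add` appears in both graphs but ends up only once in the merged result.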
Module: network
Network-based computation of operations & data.
The execution of network operations is split into 2 phases:
- COMPILE: prune unsatisfied nodes, sort the dag topologically & solve it, and derive the execution steps (see below) based on the given inputs and asked outputs.
- EXECUTE: sequential or parallel invocation of the underlying functions of the operations with arguments from the solution.
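The two phases can be sketched with the standard library alone. This is a simplified picture, not the library's algorithm: pruning and instruction steps are omitted, every operation provides a single name, and `OPS`, `compile_steps` and `execute_steps` are hypothetical names.

```python
from graphlib import TopologicalSorter

# Hypothetical operations: name -> (function, needs, provides).
OPS = {
    "add": (lambda a, b1: a + b1, ["a", "b1"], "ab1"),
    "sub": (lambda a, b2: a - b2, ["a", "b2"], "ab2"),
    "abb": (lambda ab1, ab2: ab1 + ab2, ["ab1", "ab2"], "asked"),
}


def compile_steps(ops, inputs):
    """COMPILE: order operations so that each runs after its providers."""
    providers = {provides: name for name, (_, _, provides) in ops.items()}
    graph = {
        name: {providers[n] for n in needs if n in providers and n not in inputs}
        for name, (_, needs, _) in ops.items()
    }
    return list(TopologicalSorter(graph).static_order())


def execute_steps(ops, steps, named_inputs):
    """EXECUTE: call each function with arguments taken from the solution."""
    solution = dict(named_inputs)  # cloned, the inputs are not modified
    for name in steps:
        fn, needs, provides = ops[name]
        solution[provides] = fn(*(solution[n] for n in needs))
    return solution


inputs = {"a": 1, "b1": 2, "b2": 3}
steps = compile_steps(OPS, inputs)
solution = execute_steps(OPS, steps, inputs)
```

Separating the phases means the ordering work can be reused whenever the same inputs and outputs are asked again.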
Computations are based on 5 data-structures:
Network.graph
A networkx graph (yet a DAG) containing interchanging layers of Operation and _DataNode nodes. They are laid out and connected by repeated calls of add_OP().
The computation starts with _prune_graph() extracting a DAG subgraph by pruning its nodes based on given inputs and requested outputs in compute().
ExecutionPlan.dag
A directed-acyclic-graph containing the pruned nodes as built by _prune_graph(). This pruned subgraph is used to decide the ExecutionPlan.steps (below). The containing ExecutionPlan instance is cached in _cached_plans across runs with inputs/outputs as key.
ExecutionPlan.steps
It is the list of the operation-nodes only from the dag (above), topologically sorted, and interspersed with instruction steps needed to complete the run. It is built by _build_execution_steps() based on the subgraph dag extracted above. The containing ExecutionPlan instance is cached in _cached_plans across runs with inputs/outputs as key.
The instruction items achieve the following:
- _EvictInstruction: evicts items from solution as soon as they are not needed further down the dag, to reduce memory footprint while computing.
- _PinInstruction: avoids overwriting any given intermediate inputs, and still allows their providing operations to run (because they are needed for their other outputs).
var solution
A local-var in compute(), initialized on each run to hold the values of the given inputs, generated (intermediate) data, and output values. It is returned as is if no specific outputs requested; no data-eviction happens then.
arg overwrites
The optional argument given to compute() to collect the intermediate calculated values that are overwritten by intermediate (aka “pinned”) input-values.
-
exception
graphtik.network.
AbortedException
[source]¶ Raised from the Network code when
abort_run()
is called.
-
graphtik.network.
_execution_configs
= <ContextVar name='execution_configs' default={'execution_pool': <multiprocessing.pool.ThreadPool object>, 'abort': False, 'skip_evictions': False}>
Global configurations for all (nested) networks in a computation run.
-
class
graphtik.network.
Network
(*operations, graph=None)[source]¶ Assemble operations & data into a directed-acyclic-graph (DAG) to run them.
Variables: - needs – the “base”, all data-nodes that are not produced by some operation
- provides – the “base”, all data-nodes produced by some operation
-
_append_operation
(graph, operation: graphtik.op.Operation)[source]¶ Adds the given operation and its data requirements to the network graph.
- Invoked during constructor only (immutability).
- Identities are based on the name of the operation, the names of the operation’s needs, and the names of the data it provides.
Parameters: - graph – the networkx graph to append to
- operation – operation instance to append
-
_build_execution_steps
(pruned_dag, inputs: Collection[T_co], outputs: Optional[Collection[T_co]]) → List[T]
Create the list of operation-nodes & instructions evaluating all operations & instructions needed a) to free memory and b) to avoid overwriting given intermediate inputs.
Parameters:
- pruned_dag – The original dag, pruned; not broken.
- outputs – output-names to decide whether to add (and which) evict-instructions
Instances of _EvictInstruction are inserted in steps between operation nodes to reduce the memory footprint of solutions while the computation is running. An evict-instruction is inserted whenever a need is not used by any other operation further down the DAG.
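The eviction rule just described can be sketched over an already sorted list of operations. It is a simplification under stated assumptions: operations are hypothetical `(name, needs, provides)` tuples, evictions are encoded as `("evict", data)` tuples, pin-instructions are left out, and nothing is evicted when no specific outputs are asked.

```python
def add_evictions(ordered_ops, outputs):
    """Intersperse evict steps among topologically sorted operations."""
    steps = []
    for i, (name, needs, _provides) in enumerate(ordered_ops):
        steps.append(name)
        if not outputs:
            continue  # nothing asked specifically: keep every value
        later_needs = {n for _, later, _ in ordered_ops[i + 1:] for n in later}
        for need in needs:
            # Evict a need once no later operation uses it, unless it is asked.
            if need not in later_needs and need not in outputs:
                steps.append(("evict", need))
    return steps


ordered = [
    ("add", ["a", "b1"], ["ab1"]),
    ("sub", ["a", "b2"], ["ab2"]),
    ("abb", ["ab1", "ab2"], ["asked"]),
]
with_evictions = add_evictions(ordered, outputs=["asked"])
keep_all = add_evictions(ordered, outputs=None)
```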
-
_cached_plans
= None[source]¶ Speed up
compile()
call and avoid a multithreading issue(?) that is occurring when accessing the dag in networkx.
-
_prune_graph
(inputs: Union[Collection[T_co], str, None], outputs: Union[Collection[T_co], str, None], predicate: Callable[[Any, Mapping[KT, VT_co]], bool] = None) → Tuple[nx.DiGraph, Collection[T_co], Collection[T_co], Collection[T_co]]
Determines what graph steps need to run to get to the requested outputs from the provided inputs:
- Eliminate steps that are not on a path arriving to requested outputs;
- Eliminate unsatisfied operations: partial inputs or no outputs needed;
- Consolidate the list of needs & provides.
Parameters:
- inputs – The names of all given inputs.
- outputs – The desired output names. This can also be None, in which case the necessary steps are all graph nodes that are reachable from the provided inputs.
- predicate – a 2-argument callable(op, node-data) that should return true for nodes to include
Returns: a 4-tuple with the pruned_dag, the out-edges of the inputs, and needs/provides resolved based on given inputs/outputs (which might be a subset of all needs/outputs of the returned graph).
Use the returned needs/provides to build a new plan.
Raises: ValueError – if outputs asked do not exist in network, with msg: Unknown output nodes: …
-
_unsatisfied_operations
(dag, inputs: Collection[T_co])[source]¶ Traverse topologically sorted dag to collect un-satisfied operations.
Unsatisfied operations are those suffering from ANY of the following:
- They are missing at least one compulsory need-input. Since the dag is ordered, as soon as we’re on an operation, all its needs have been accounted, so we can get its satisfaction.
- Their provided outputs are not linked to any data in the dag. An operation might not have any output link when _prune_graph() has broken them, due to given intermediate inputs.
Parameters:
- dag – a graph with broken edges, i.e. those arriving to existing inputs
- inputs – an iterable of the names of the input values
Returns: a list of unsatisfied operations to prune
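The first criterion (a missing compulsory need) can be sketched by walking the operations in topological order and tracking which data become available. This is a simplified illustration: operations are hypothetical `(name, needs, provides)` tuples, all needs are treated as compulsory, and the second criterion (outputs not linked to any data) is not modelled.

```python
def unsatisfied_sketch(ordered_ops, inputs):
    """Collect operations that lack at least one of their needs."""
    available = set(inputs)
    unsatisfied = []
    for name, needs, provides in ordered_ops:
        if all(need in available for need in needs):
            available.update(provides)  # its outputs can feed later operations
        else:
            unsatisfied.append(name)  # its outputs never become available
    return unsatisfied


ordered = [
    ("add", ["a", "b1"], ["ab1"]),
    ("sub", ["a", "b2"], ["ab2"]),
    ("abb", ["ab1", "ab2"], ["asked"]),
]
to_prune = unsatisfied_sketch(ordered, inputs=["a", "b1"])
```

Since `b2` is not given, `sub` cannot run, and `abb` fails in turn because `ab2` is never produced.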
-
compile
(inputs: Union[Collection[T_co], str, None] = None, outputs: Union[Collection[T_co], str, None] = None) → graphtik.network.ExecutionPlan
Create or get from cache an execution-plan for the given inputs/outputs.
See _prune_graph() and _build_execution_steps() for detailed description.
Parameters:
- inputs – A collection with the names of all the given inputs. If None, all inputs that lead to given outputs are assumed. If string, it is converted to a single-element collection.
- outputs – A collection or the name of the output name(s). If None, all reachable nodes from the given inputs are assumed. If string, it is converted to a single-element collection.
Returns: the cached or fresh new execution-plan
Raises: ValueError –
- If outputs asked do not exist in network, with msg: Unknown output nodes: …
- If solution does not contain any operations, with msg: Unsolvable graph: …
- If given inputs mismatched plan’s needs, with msg: Plan needs more inputs…
- If outputs asked cannot be produced by the dag, with msg: Impossible outputs…
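Caching plans by inputs/outputs might look like the sketch below. `CachingCompiler`, `as_key` and `build_plan` are hypothetical names, and normalising the key by sorting the names is an assumption; the source only states that plans are cached with inputs/outputs as key and that a string becomes a single-element collection.

```python
def as_key(names):
    """Normalise None, a single string or a collection into a hashable key."""
    if names is None:
        return None
    if isinstance(names, str):
        names = [names]  # a string becomes a single-element collection
    return tuple(sorted(names))  # assumption: the order of names is irrelevant


class CachingCompiler:
    def __init__(self, build_plan):
        self._build_plan = build_plan
        self._cached_plans = {}

    def compile(self, inputs=None, outputs=None):
        key = (as_key(inputs), as_key(outputs))
        if key not in self._cached_plans:
            self._cached_plans[key] = self._build_plan(*key)
        return self._cached_plans[key]


built = []


def build_plan(inputs, outputs):
    built.append((inputs, outputs))  # record every real compilation
    return {"needs": inputs, "provides": outputs}


compiler = CachingCompiler(build_plan)
first = compiler.compile(["a", "b"], "asked")
second = compiler.compile(("b", "a"), ["asked"])
```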
-
narrowed
(inputs: Union[Collection[T_co], str, None] = None, outputs: Union[Collection[T_co], str, None] = None, predicate: Callable[[Any, Mapping[KT, VT_co]], bool] = None) → graphtik.network.Network[source]¶ Return a pruned network supporting just the given inputs & outputs.
Parameters: - inputs – all possible inputs names
- outputs – all possible output names
- predicate – a 2-argument callable(op, node-data) that should return true for nodes to include
Returns: the pruned clone, or this, if both inputs & outputs were None
-
class
graphtik.network.
ExecutionPlan
[source]¶ The result of the network’s compilation phase.
Note the execution plan’s attributes are on purpose immutable tuples.
Variables:
- net – The parent Network
- needs – An iset with the input names needed to exist in order to produce all provides.
- provides – An iset with the output names produced when all inputs are given.
- dag – The regular (not broken) pruned subgraph of net-graph.
- broken_edges – Tuple of broken incoming edges to given data.
- steps – The tuple of operation-nodes & instructions needed to evaluate the given inputs & asked outputs, free memory and avoid overwriting any given intermediate inputs.
- evict – when false, keep all inputs & outputs, and skip prefect-evictions check.
-
_execute_sequential_method
(solution, overwrites, executed)[source]¶ This method runs the graph one operation at a time in a single thread
Parameters: solution – must contain the input values only, gets modified
-
_execute_thread_pool_barrier_method
(solution, overwrites, executed)
This method runs the graph using a parallel pool of thread executors. You may achieve lower total latency if your graph is sufficiently subdivided into operations using this method.
Parameters: solution – must contain the input values only, gets modified
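One way to picture a thread-pool execution with barriers is the sketch below: all operations whose needs are already in the solution run concurrently, and the next batch starts only after the whole batch has finished. It is an illustration under assumptions, not the library's method; `run_in_batches` and the `(name, fn, needs, provides)` tuples are hypothetical.

```python
from multiprocessing.pool import ThreadPool
from operator import add, sub


def run_in_batches(ops, solution, pool):
    """Run ready operations concurrently, one batch after the other."""
    pending = list(ops)
    executed = []
    while pending:
        ready = [op for op in pending if all(n in solution for n in op[2])]
        if not ready:
            raise ValueError(f"Unsatisfied operations: {[op[0] for op in pending]}")
        # Barrier: `pool.map()` returns only when the whole batch has finished.
        results = pool.map(lambda op: op[1](*(solution[n] for n in op[2])), ready)
        for (name, _fn, _needs, provides), value in zip(ready, results):
            solution[provides] = value  # the solution gets modified in place
            executed.append(name)
        pending = [op for op in pending if op not in ready]
    return executed


OPS = [
    ("add", add, ["a", "b1"], "ab1"),
    ("sub", sub, ["a", "b2"], "ab2"),
    ("abb", add, ["ab1", "ab2"], "asked"),
]
solution = {"a": 1, "b1": 2, "b2": 3}
with ThreadPool(2) as pool:
    executed = run_in_batches(OPS, solution, pool)
```

Here `add` and `sub` form the first batch, and `abb` runs alone in the second one.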
-
execute
(named_inputs, outputs=None, *, overwrites=None, method=None)
Parameters:
- named_inputs – A mapping of names –> values that must contain at least the compulsory inputs that were specified when the plan was built (but cannot enforce that!). Cloned, not modified.
- outputs – If not None, they are just checked if possible, based on provides, and scream if not.
- overwrites – (optional) a mutable dict to collect calculated-but-discarded values because they were “pinned” by input values. If missing, the overwrites values are simply discarded.
Raises: ValueError –
- If plan does not contain any operations, with msg: Unsolvable graph: …
- If given inputs mismatched plan’s needs, with msg: Plan needs more inputs…
- If outputs asked cannot be produced by the dag, with msg: Impossible outputs…
-
validate
(inputs: Union[Collection[T_co], str, None], outputs: Union[Collection[T_co], str, None])
Scream on invalid inputs, outputs or no operations in graph.
Raises: ValueError –
- If cannot produce any outputs from the given inputs, with msg: Unsolvable graph: …
- If given inputs mismatched plan’s needs, with msg: Plan needs more inputs…
- If outputs asked cannot be produced by the dag, with msg: Impossible outputs…
Module: plot
Plotting graphtik graphs
-
graphtik.plot.
build_pydot
(graph, steps=None, inputs=None, outputs=None, solution=None, executed=None, title=None, node_props=None, edge_props=None, clusters=None) → pydot.Dot
Build a Graphviz graph out of a Network graph/steps/inputs/outputs and return it.
See
Plotter.plot()
for the arguments, sample code, and the legend of the plots.
-
graphtik.plot.
default_jupyter_render
= {'svg_container_styles': '', 'svg_element_styles': 'width: 100%; height: 300px;', 'svg_pan_zoom_json': '{controlIconsEnabled: true, zoomScaleSensitivity: 0.4, fit: true}'}
A nested dictionary controlling the rendering of graph-plots in Jupyter cells, as those returned from Plotter.plot() (currently as SVGs). Either modify it in place, or pass another one in the respective methods.
The following keys are supported.
Parameters:
- svg_pan_zoom_json – arguments controlling the rendering of a zoomable SVG in Jupyter notebooks, as defined in https://github.com/ariutta/svg-pan-zoom#how-to-use; if None, defaults to string (also maps supported): "{controlIconsEnabled: true, zoomScaleSensitivity: 0.4, fit: true}"
- svg_element_styles – mostly for sizing the zoomable SVG in Jupyter notebooks. Inspect & experiment on the html page of the notebook with browser tools. If None, defaults to string (also maps supported): "width: 100%; height: 300px;"
- svg_container_styles – like svg_element_styles; if None, defaults to empty string (also maps supported).
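The "if None, defaults to" behaviour of these keys can be sketched as a merge of user overrides over the defaults. The helper `resolve_render` is hypothetical and only illustrates the documented fallback; how the library actually resolves the values is not shown in this reference.

```python
DEFAULT_RENDER = {
    "svg_container_styles": "",
    "svg_element_styles": "width: 100%; height: 300px;",
    "svg_pan_zoom_json": "{controlIconsEnabled: true, zoomScaleSensitivity: 0.4, fit: true}",
}


def resolve_render(jupyter_render=None):
    """Fall back to the defaults for a missing dict or for None values."""
    if jupyter_render is None:
        return DEFAULT_RENDER  # shared object: in-place edits affect later calls
    resolved = dict(DEFAULT_RENDER)
    resolved.update({k: v for k, v in jupyter_render.items() if v is not None})
    return resolved


taller = resolve_render(
    {"svg_element_styles": "height: 600px; width: 100%", "svg_pan_zoom_json": None}
)
```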
-
graphtik.plot.
legend
(filename=None, show=None, jupyter_render: Mapping[KT, VT_co] = None)[source]¶ Generate a legend for all plots (see
Plotter.plot()
for args)
-
graphtik.plot.
render_pydot
(dot: pydot.Dot, filename=None, show=False, jupyter_render: str = None)
Plot a Graphviz dot in a matplotlib window, into a file, or return it for Jupyter.
Parameters:
- dot – the pre-built Graphviz pydot.Dot instance
- filename (str) – Write the diagram into a file. Common extensions are .png .dot .jpg .jpeg .pdf .svg; call plot.supported_plot_formats() for more.
- show – If it evaluates to true, opens the diagram in a matplotlib window. If it equals -1, it returns the image but does not open the window.
- jupyter_render – a nested dictionary controlling the rendering of graph-plots in Jupyter cells. If None, defaults to default_jupyter_render (you may modify those in place and they will apply for all future calls).
Returns: the matplotlib image if show=-1, or the dot.
See Plotter.plot() for sample code.
for sample code.- dot – the pre-built Graphviz