Graphtik

Latest release: 8.3.0 (git: v8.3.0, May 12, 2020), on PyPI & GitHub · CI: Travis (Linux) · docs: ReadTheDocs · Apache License, version 2.0


It’s a DAG all the way down!

[diagram: the "covid19" example pipeline - quarantine → get_out_or_stay_home → space & time → exercise & read_book → fun, body, brain]

Lightweight computation graphs for Python

Graphtik is an understandable and lightweight Python module for building, running and plotting graphs of functions (a.k.a pipelines).

  • The API posits a fair compromise between features and complexity, without precluding any.

  • It can be used as is to build machine learning pipelines for data science projects.

  • It should be extendable to act as the core for a custom ETL engine, a workflow-processor for interdependent files like GNU Make, or an Excel-like spreadsheet.

Graphtik sprang from Graphkit (summer 2019, v1.2.2) to experiment with Python 3.6+ features, but has diverged significantly with enhancements ever since.

Table of Contents

Operations

An operation is a function in a computation pipeline, abstractly represented by the Operation class. This class specifies the dependencies forming the pipeline’s network.

Defining Operations

You may inherit the Operation abstract class and override its Operation.compute() method to manually do the following:

  • read the values declared as needs from the solution,

  • match those values into function arguments,

  • call your function to do its business,

  • “zip” the function’s results with the operation’s declared provides, and finally

  • hand back those zipped values to solution for further actions.
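The steps above can be sketched in plain Python (a toy illustration, not graphtik's actual implementation; the `compute` helper below is hypothetical):

```python
from operator import add

def compute(fn, needs, provides, solution):
    """Toy sketch of the steps above (hypothetical helper, not graphtik code)."""
    # read the `needs` values from the solution & match them into fn arguments
    args = [solution[n] for n in needs]
    # call the function to do its business
    results = fn(*args)
    # "zip" the results with the declared `provides`
    if len(provides) == 1:
        results = [results]
    # hand back the zipped values, to be merged into the solution
    return dict(zip(provides, results))

compute(add, ["a", "b"], ["a_plus_b"], {"a": 3, "b": 4})  # {'a_plus_b': 7}
```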

But there is an easier way – actually half of the code in this project is dedicated to retrofitting existing functions unaware of all these, into operations.

Operations from existing functions

The FunctionalOperation provides a concrete lightweight wrapper around any arbitrary function to define and execute within a pipeline. Use the operation() factory to instantiate one:

>>> from operator import add
>>> from graphtik import operation
>>> add_op = operation(add,
...                    needs=['a', 'b'],
...                    provides=['a_plus_b'])
>>> add_op
FunctionalOperation(name='add', needs=['a', 'b'], provides=['a_plus_b'], fn='add')

You may still call the original function at FunctionalOperation.fn, thus bypassing any operation pre-processing:

>>> add_op.fn(3, 4)
7

But the proper way is to call the operation (either directly or by calling the FunctionalOperation.compute() method). Notice though that unnamed positional parameters are not supported:

>>> add_op(a=3, b=4)
{'a_plus_b': 7}

Tip

In case your function needs to access the execution machinery or its wrapping operation, it can do that through the task_context (unstable API).

Builder pattern

There are two ways to instantiate a FunctionalOperation, each one suitable for different scenarios.

We’ve seen that calling operation() manually allows putting into a pipeline functions that are defined elsewhere (e.g. in another module, or system functions).

But that method is also useful if you want to create multiple operation instances with similar attributes, e.g. needs:

>>> op_factory = operation(needs=['a'])

Notice that we did not specify a fn, in order to get back a factory (a decorator) instead of a complete FunctionalOperation instance.

>>> from graphtik import operation, compose
>>> from functools import partial
>>> def mypow(a, p=2):
...    return a ** p
>>> pow_op2 = op_factory.withset(fn=mypow, provides="^2")
>>> pow_op3 = op_factory.withset(fn=partial(mypow, p=3), name='pow_3', provides='^3')
>>> pow_op0 = op_factory.withset(fn=lambda a: 1, name='pow_0', provides='^0')
>>> graphop = compose('powers', pow_op2, pow_op3, pow_op0)
>>> graphop
Pipeline('powers', needs=['a'], provides=['^2', '^3', '^0'], x3 ops:
   mypow, pow_3, pow_0)
>>> graphop(a=2)
{'a': 2, '^2': 4, '^3': 8, '^0': 1}

Tip

See Plotting on how to make diagrams like this.

Decorator specification

If you are defining your computation graph and the functions that comprise it all in the same script, the decorator specification of operation instances might be particularly useful, as it allows you to assign computation graph structure to functions as they are defined. Here’s an example:

>>> from graphtik import operation, compose
>>> @operation(needs=['b', 'a', 'r'], provides='bar')
... def foo(a, b, c):
...   return c * (a + b)
>>> graphop = compose('foo_graph', foo)
  • Notice that if name is not given, it is deduced from the function name.

Specifying graph structure: provides and needs

Each operation is a node in a computation graph, depending and supplying data from and to other nodes (via the solution), in order to compute.

This graph structure is specified (mostly) via the provides and needs arguments to the operation() factory, specifically:

needs

this argument names the list of (positionally ordered) input data the operation requires to receive from the solution. The list corresponds, roughly, to the arguments of the underlying function (plus any sideffects).

It can be a single string, in which case a 1-element iterable is assumed.

seealso

needs, modifier, FunctionalOperation.needs, FunctionalOperation.op_needs, FunctionalOperation._fn_needs

provides

this argument names the list of (positionally ordered) output data the operation provides into the solution. The list corresponds, roughly, to the returned values of the fn (plus any sideffects & aliases).

It can be a single string, in which case a 1-element iterable is assumed.

If there are more than one, the underlying function must return an iterable with the same number of elements (unless it returns a dictionary).

seealso

provides, modifier, FunctionalOperation.provides, FunctionalOperation.op_provides, FunctionalOperation._fn_provides

Declarations of needs and provides are affected by modifiers like keyword():

Map inputs to different function arguments
graphtik.modifiers.keyword(name: str, fn_kwarg: str = None)[source]

Annotate a needs that (optionally) maps inputs name –> keyword argument name.

The value of a keyword dependency is passed in as keyword argument to the underlying function.

Parameters

fn_kwarg

The argument-name corresponding to this named-input. If it is None, it is assumed to be the same as name, so the dependency always behaves like a kw-type arg and preserves its fn-name even if renamed.

Note

This extra mapping argument is needed either for optionals (but not varargish), or for functions with keywords-only arguments (like def func(*, foo, bar): ...), since inputs are normally fed into functions by-position, not by-name.
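A plain-Python reminder of why such a mapping is needed for keyword-only parameters (a toy snippet, unrelated to graphtik itself):

```python
def func(*, foo, bar):
    # keyword-only parameters cannot be reached by positional feeding:
    # func(1, 2) would raise TypeError; they must be passed by name.
    return foo + bar

func(foo=1, bar=2)  # 3
```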

Returns

a _Modifier instance, even if no fn_kwarg is given OR it is the same as name.

Example:

In case the name of a function argument differs from its name in the inputs (or the name in the inputs is simply not a valid argument-name), you may map it with the 2nd argument of keyword():

>>> from graphtik import operation, compose, keyword
>>> @operation(needs=['a', keyword("name-in-inputs", "b")], provides="sum")
... def myadd(a, *, b):
...    return a + b
>>> myadd
FunctionalOperation(name='myadd',
                    needs=['a', 'name-in-inputs'(>'b')],
                    provides=['sum'],
                    fn='myadd')
>>> graph = compose('mygraph', myadd)
>>> graph
Pipeline('mygraph', needs=['a', 'name-in-inputs'], provides=['sum'], x1 ops: myadd)
>>> sol = graph.compute({"a": 5, "name-in-inputs": 4})['sum']
>>> sol
9
Operations may execute with missing inputs
graphtik.modifiers.optional(name: str, fn_kwarg: str = None)[source]

Annotate optional needs corresponding to defaulted op-function arguments, received only if present in the inputs (when the operation is invoked).

The value of an optional dependency is passed in as a keyword argument to the underlying function.

Parameters

fn_kwarg – the name of the function argument it corresponds to; if falsy, it is assumed the same as name, so the dependency always behaves like a kw-type arg and preserves its fn-name even if renamed.

Example:

>>> from graphtik import operation, compose, optional
>>> @operation(name='myadd',
...            needs=["a", optional("b")],
...            provides="sum")
... def myadd(a, b=0):
...    return a + b

Notice the default value 0 for b, the argument annotated as optional:

>>> graph = compose('mygraph', myadd)
>>> graph
Pipeline('mygraph',
                 needs=['a', 'b'(?)],
                 provides=['sum'],
                 x1 ops: myadd)

The graph works both with and without b provided in the inputs:

>>> graph(a=5, b=4)['sum']
9
>>> graph(a=5)
{'a': 5, 'sum': 5}

Like keyword() you may map input-name to a different function-argument:

>>> operation(needs=['a', optional("quasi-real", "b")],
...           provides="sum"
... )(myadd.fn)  # Cannot wrap an operation, its `fn` only.
FunctionalOperation(name='myadd',
                    needs=['a', 'quasi-real'(?>'b')],
                    provides=['sum'],
                    fn='myadd')
Calling functions with varargs (*args)
graphtik.modifiers.vararg(name: str)[source]

Annotate a varargish needs to be fed as function’s *args.

See also

Consult also the example test-case in: test/test_op.py:test_varargs(), in the full sources of the project.

Example:

We designate b & c as vararg arguments:

>>> from graphtik import operation, compose, vararg
>>> @operation(
...     needs=['a', vararg('b'), vararg('c')],
...     provides='sum'
... )
... def addall(a, *b):
...    return a + sum(b)
>>> addall
FunctionalOperation(name='addall',
                    needs=['a', 'b'(*), 'c'(*)],
                    provides=['sum'],
                    fn='addall')
>>> graph = compose('mygraph', addall)

The graph works with and without any of b or c inputs:

>>> graph(a=5, b=2, c=4)['sum']
11
>>> graph(a=5, b=2)
{'a': 5, 'b': 2, 'sum': 7}
>>> graph(a=5)
{'a': 5, 'sum': 5}
graphtik.modifiers.varargs(name: str)[source]

A varargish like vararg(), naming an iterable value in the inputs.

See also

Consult also the example test-case in: test/test_op.py:test_varargs(), in the full sources of the project.

Example:

>>> from graphtik import operation, compose, varargs
>>> def enlist(a, *b):
...    return [a] + list(b)
>>> graph = compose('mygraph',
...     operation(name='enlist', needs=['a', varargs('b')],
...     provides='sum')(enlist)
... )
>>> graph
Pipeline('mygraph',
                 needs=['a', 'b'(+)],
                 provides=['sum'],
                 x1 ops: enlist)

The graph works with or without b in the inputs:

>>> graph(a=5, b=[2, 20])['sum']
[5, 2, 20]
>>> graph(a=5)
{'a': 5, 'sum': [5]}
>>> graph(a=5, b=0xBAD)
Traceback (most recent call last):
...
graphtik.base.MultiValueError: Failed preparing needs:
    1. Expected needs['b'(+)] to be non-str iterables!
    +++inputs: ['a', 'b']
    +++FunctionalOperation(name='enlist', needs=['a', 'b'(+)], provides=['sum'], fn='enlist')

Attention

To avoid user mistakes, varargs do not accept str inputs (although strings are iterables):

>>> graph(a=5, b="mistake")
Traceback (most recent call last):
...
graphtik.base.MultiValueError: Failed preparing needs:
    1. Expected needs['b'(+)] to be non-str iterables!
    +++inputs: ['a', 'b']
    +++FunctionalOperation(name='enlist',
                           needs=['a', 'b'(+)],
                           provides=['sum'],
                           fn='enlist')
Aliased provides

Sometimes, you need to interface functions & operations that name a dependency differently. This is doable without introducing a “pipe-through” interface operation, either by annotating certain needs with keyword() modifiers (above), or by aliasing certain provides to different names:

>>> op = operation(str,
...                name="`provides` with `aliases`",
...                needs="anything",
...                provides="real thing",
...                aliases=("real thing", "phony"))
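Conceptually, an alias simply re-publishes an existing provide under an extra name once the operation has computed its results. A toy sketch of that behavior (apply_aliases is a hypothetical helper, not graphtik code):

```python
def apply_aliases(results, aliases):
    """Copy each computed provide under its aliased name (toy sketch)."""
    out = dict(results)
    for src, dst in aliases:
        out[dst] = out[src]  # same value, published under both names
    return out

apply_aliases({"real thing": "foo"}, [("real thing", "phony")])
# {'real thing': 'foo', 'phony': 'foo'}
```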

Considerations for when building pipelines

When many operations are composed into a computation graph, Graphtik matches up the values in their needs and provides to form the edges of that graph (see Pipelines for more on that), like the operations from the script in Quick start:

>>> from operator import mul, sub
>>> from functools import partial
>>> from graphtik import compose, operation
>>> def abspow(a, p):
...   """Compute |a|^p. """
...   c = abs(a) ** p
...   return c
>>> # Compose the mul, sub, and abspow operations into a computation graph.
>>> graphop = compose("graphop",
...    operation(mul, needs=["a", "b"], provides=["ab"]),
...    operation(sub, needs=["a", "ab"], provides=["a_minus_ab"]),
...    operation(name="abspow1", needs=["a_minus_ab"], provides=["abs_a_minus_ab_cubed"])
...    (partial(abspow, p=3))
... )
>>> graphop
Pipeline('graphop',
                 needs=['a', 'b', 'ab', 'a_minus_ab'],
                 provides=['ab', 'a_minus_ab', 'abs_a_minus_ab_cubed'],
                 x3 ops: mul, sub, abspow1)
  • Notice the use of functools.partial() to set parameter p to a constant value.

  • This is done by calling once more the “decorator” returned by operation(), since it was called without a function.

The needs and provides arguments to the operations in this script define a computation graph that looks like this:

Pipelines

Graphtik’s compose() factory handles the work of tying together operation instances into a runnable computation graph.

The simplest use case is to assemble a collection of individual operations into a runnable computation graph. The example script from Quick start illustrates this well:

>>> from operator import mul, sub
>>> from functools import partial
>>> from graphtik import compose, operation
>>> def abspow(a, p):
...    """Computes |a|^p. """
...    c = abs(a) ** p
...    return c

The call to compose() below yields a runnable computation graph (in the plots, circles are operations, squares are data, and octagons are parameters):

>>> # Compose the mul, sub, and abspow operations into a computation graph.
>>> graphop = compose("graphop",
...    operation(name="mul1", needs=["a", "b"], provides=["ab"])(mul),
...    operation(name="sub1", needs=["a", "ab"], provides=["a_minus_ab"])(sub),
...    operation(name="abspow1", needs=["a_minus_ab"], provides=["abs_a_minus_ab_cubed"])
...    (partial(abspow, p=3))
... )

This yields a graph which looks like this (see Plotting):

>>> graphop.plot('calc_power.svg')  

graphop

Running a pipeline

The graph composed above can be run by simply calling it with a dictionary of values with keys corresponding to the named dependencies (needs & provides):

>>> # Run the graph and request all of the outputs.
>>> out = graphop(a=2, b=5)
>>> out
{'a': 2, 'b': 5, 'ab': 10, 'a_minus_ab': -8, 'abs_a_minus_ab_cubed': 512}

You may plot the solution:

>>> out.plot('a_solution.svg')  

the solution of the graph

Producing a subset of outputs

By default, calling a graph-operation on a set of inputs will yield all of that graph’s outputs. You can use the outputs parameter to request only a subset. For example, if graphop is as above:

>>> # Run the graph-operation and request a subset of the outputs.
>>> out = graphop.compute({'a': 2, 'b': 5}, outputs="a_minus_ab")
>>> out
{'a_minus_ab': -8}

When asking for a subset of the graph’s outputs, Graphtik does 2 things:

  • it prunes any operations that are not on the path from given inputs to the requested outputs (e.g. the abspow1 operation, above, is not executed);

  • it evicts any intermediate data from solution as soon as they are not needed.
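The eviction in (2) can be sketched in plain Python (a toy model of the behavior, not graphtik's actual planner; `run_evicting` is a hypothetical helper):

```python
from operator import mul, sub

def run_evicting(ops, inputs, outputs):
    """Toy model: run ops in sequence, evicting data no later op needs."""
    sol = dict(inputs)
    for i, (needs, provides, fn) in enumerate(ops):
        sol.update(zip(provides, fn(*(sol[n] for n in needs))))
        later_needs = {n for nn, _, _ in ops[i + 1:] for n in nn}
        for k in list(sol):
            if k not in outputs and k not in later_needs:
                del sol[k]  # evicted: neither requested nor needed anymore
    return sol

# the mul/sub part of the quick-start graph; fns return 1-tuples to zip
ops = [(["a", "b"], ["ab"], lambda a, b: (mul(a, b),)),
       (["a", "ab"], ["a_minus_ab"], lambda a, ab: (sub(a, ab),))]
run_evicting(ops, {"a": 2, "b": 5}, {"a_minus_ab"})  # {'a_minus_ab': -8}
```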

You may see (2) in action by including the sequence of execution steps into the plot:

>>> dot = out.plot(theme={"include_steps": True})
Short-circuiting a pipeline

You can short-circuit a graph computation, making certain inputs unnecessary, by providing a value in the graph that is further downstream in the graph than those inputs. For example, in the graph-operation we’ve been working with, you could provide the value of a_minus_ab to make the inputs a and b unnecessary:

>>> # Run the graph-operation and request a subset of the outputs.
>>> out = graphop(a_minus_ab=-8)
>>> out
{'a_minus_ab': -8, 'abs_a_minus_ab_cubed': 512}

When you do this, any operation nodes that are not on a path from the downstream input to the requested outputs (i.e. predecessors of the downstream input) are not computed. For example, the mul1 and sub1 operations are not executed here.
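A toy model of this pruning (not graphtik's real compilation; `plan` is a hypothetical helper): an operation is skipped when its provides are already supplied, or when its needs can never be satisfied:

```python
def plan(ops, inputs):
    """ops: list of (name, needs, provides); return names that would run."""
    have = set(inputs)
    runnable = []
    for name, needs, provides in ops:
        if set(provides) <= have:      # value already supplied downstream
            continue
        if set(needs) <= have:         # all inputs reachable
            runnable.append(name)
            have |= set(provides)
    return runnable

ops = [("mul1", ["a", "b"], ["ab"]),
       ("sub1", ["a", "ab"], ["a_minus_ab"]),
       ("abspow1", ["a_minus_ab"], ["abs_a_minus_ab_cubed"])]
plan(ops, {"a_minus_ab"})  # ['abspow1']
```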

This can be useful if you have a graph-operation that accepts alternative forms of the same input. For example, if your graph-operation requires a PIL.Image as input, you could allow your graph to be run in an API server by adding an earlier operation that accepts as input a string of raw image data and converts that data into the needed PIL.Image. Then, you can either provide the raw image data string as input, or you can provide the PIL.Image if you have it and skip providing the image data string.

Extending pipelines

Sometimes we begin with existing computation graph(s) which we want to extend with other operations and/or pipelines.

There are 2 ways to combine pipelines together, merging (the default) and nesting.

Merging

This is the default mode for compose() when combining individual operations, and it works exactly the same when whole pipelines are involved.

For example, let’s suppose that this simple pipeline describes the daily scheduled workload of an “empty” day:

>>> weekday = compose("weekday",
...     operation(str, name="wake up", needs="backlog", provides="tasks"),
...     operation(str, name="sleep", needs="tasks", provides="todos"),
... )

Now let’s do some “work”:

>>> weekday = compose("weekday",
...     operation(lambda t: (t[:-1], t[-1:]),
...               name="work!", needs="tasks", provides=["tasks done", "todos"]),
...     operation(str, name="sleep"),
...     weekday,
... )

Notice that the pipeline to override was added last, at the bottom; that’s because the operations added earlier in the call (further to the left) override any identically-named operations added later.
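A toy model of this override rule (plain dict semantics, not graphtik's implementation): merged operations are keyed by name, and the first (earliest) definition wins:

```python
def merge_by_name(*ops):
    """Toy sketch: (name, payload) pairs merged so the earliest name wins."""
    merged = {}
    for name, payload in ops:
        merged.setdefault(name, payload)  # earlier definition wins
    return merged

merged = merge_by_name(("work!", "new op"),
                       ("sleep", "overriding op"),
                       ("sleep", "original op"))
merged["sleep"]  # 'overriding op'
```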

Notice also that the overridden “sleep” operation hasn’t got any actual role in the schedule. We can eliminate “sleep” altogether by pre-registering the special NULL_OP operation under the same name, “sleep”:

>>> from graphtik import NULL_OP
>>> weekday = compose("weekday", NULL_OP("sleep"), weekday)
Nesting

Other times we want to preserve all the operations composed, regardless of clashes in their names. This is doable with compose(..., nest=True).

Let’s build a schedule for the 3-day week (covid19 γαρ…), by combining 3 mutated copies of the daily schedule we built earlier:

>>> weekdays = [weekday.withset(name=f"day {i}") for i in range(3)]
>>> week = compose("week", *weekdays, nest=True)

We now have 3 “isolated” clusters, because all operations & data have been prefixed with the name of the pipeline they originally belonged to.

Let’s suppose we want to break the isolation, and have all sub-pipelines consume & produce from a common “backlog” (n.b. in real life, we would have “feeder” & “collector” operations).

We do that by passing as the nest parameter a callable() which will decide which names of the original pipeline (operations & dependencies) should be prefixed (see also compose() & RenArgs for how to use that param):

>>> def rename_predicate(ren_args):
...     if ren_args.name not in ("backlog", "tasks done", "todos"):
...         return True
>>> week = compose("week", *weekdays, nest=rename_predicate)

Finally we may run the week’s schedule and get the outcome (whatever that might be :-); hover the results in the diagram to see them:

>>> sol = week.compute({'backlog': "a lot!"})
>>> sol
{'backlog': 'a lot!',
 'day 0.tasks': 'a lot!',
 'tasks done': 'a lot', 'todos': '!',
 'day 1.tasks': 'a lot!',
 'day 2.tasks': 'a lot!'}
>>> dot = sol.plot(clusters=True)

Tip

We had to plot with clusters=True to prevent the plan from inserting the “after pruning” cluster (see PlotArgs.clusters).

See also

Consult these test-cases from the full sources of the project:

  • test/test_graphtik.py:test_network_simple_merge()

  • test/test_graphtik.py:test_network_deep_merge()

Advanced pipelines

Depending on sideffects
graphtik.modifiers.sfx(name, optional: bool = None)[source]

sideffects denoting modifications beyond the scope of the solution.

Both needs & provides may be designated as sideffects using this modifier. They work as usual while solving the graph (compilation) but they have a limited interaction with the operation’s underlying function; specifically:

  • input sideffects must exist in the solution as inputs for an operation depending on them to kick in when the computation starts - but this is not necessary for intermediate sideffects in the solution during execution;

  • input sideffects are NOT fed into underlying functions;

  • output sideffects are not expected from underlying functions, unless a rescheduled operation with partial outputs designates a sideffected as canceled by returning it with a falsy value (the operation must return a dictionary).

Hint

If modifications involve some input/output, prefer the sfxed() modifier.

You may still convey these relationships by including the dependency name in the string - in the end, it’s just a string - but no enforcement of any kind will happen from graphtik, like:

>>> from graphtik import sfx
>>> sfx("price[sales_df]")
sfx('price[sales_df]')

Example:

A typical use-case is to signify changes in some “global” context, outside solution:

>>> from graphtik import operation, compose, sfx
>>> @operation(provides=sfx("lights off"))  # sideffect names can be anything
... def close_the_lights():
...    pass
>>> graph = compose('strip ease',
...     close_the_lights,
...     operation(
...         name='undress',
...         needs=[sfx("lights off")],
...         provides="body")(lambda: "TaDa!")
... )
>>> graph
Pipeline('strip ease',
                 needs=[sfx('lights off')],
                 provides=[sfx('lights off'), 'body'],
                 x2 ops: close_the_lights, undress)
>>> sol = graph()
>>> sol
{'body': 'TaDa!'}

sideffect

Note

Something has to provide a sideffect for a function needing it to execute - this could be another operation, like above, or the user-inputs; just specify some truthy value for the sideffect:

>>> sol = graph.compute({sfx("lights off"): True})
Modifying existing values in solutions
graphtik.modifiers.sfxed(dependency: str, sfx0: str, *sfx_list: str, fn_kwarg: str = None, optional: bool = None)[source]

Annotates a sideffected dependency in the solution sustaining side-effects.

Parameters

fn_kwarg – the name of the function argument it corresponds to. When optional, it defaults to name if falsy, so the dependency always behaves like a kw-type arg and preserves its fn-name even if renamed. When not optional, it may simply be omitted.

Like sfx() but annotating a real dependency in the solution, allowing that dependency to be present both in needs and provides of the same function.

Example:

A typical use-case is to signify columns required to produce new ones in pandas dataframes (emulated with dictionaries):

>>> from graphtik import operation, compose, sfxed
>>> @operation(needs="order_items",
...            provides=sfxed("ORDER", "Items", "Prices"))
... def new_order(items: list) -> "pd.DataFrame":
...     order = {"items": items}
...     # Pretend we get the prices from sales.
...     order['prices'] = list(range(1, len(order['items']) + 1))
...     return order
>>> @operation(
...     needs=[sfxed("ORDER", "Items"), "vat rate"],
...     provides=sfxed("ORDER", "VAT")
... )
... def fill_in_vat(order: "pd.DataFrame", vat: float):
...     order['VAT'] = [i * vat for i in order['prices']]
...     return order
>>> @operation(
...     needs=[sfxed("ORDER", "Prices", "VAT")],
...     provides=sfxed("ORDER", "Totals")
... )
... def finalize_prices(order: "pd.DataFrame"):
...     order['totals'] = [p + v for p, v in zip(order['prices'], order['VAT'])]
...     return order

To view all internal dependencies, enable DEBUG in configurations:

>>> from graphtik.config import debug_enabled
>>> with debug_enabled(True):
...     finalize_prices
FunctionalOperation(name='finalize_prices',
                    needs=[sfxed('ORDER', 'Prices'), sfxed('ORDER', 'VAT')],
                    op_needs=[sfxed('ORDER', 'Prices'), sfxed('ORDER', 'VAT')],
                    _fn_needs=['ORDER'],
                    provides=[sfxed('ORDER', 'Totals')],
                    op_provides=[sfxed('ORDER', 'Totals')],
                    _fn_provides=['ORDER'],
                    fn='finalize_prices')

Notice that declaring a single sideffected with many items in sfx_list expands into multiple “singular” sideffected dependencies in the network (check needs & op_needs above).

>>> proc_order = compose('process order', new_order, fill_in_vat, finalize_prices)
>>> sol = proc_order.compute({
...      "order_items": ["toilet-paper", "soap"],
...      "vat rate": 0.18,
... })
>>> sol
{'order_items': ['toilet-paper', 'soap'],
 'vat rate': 0.18,
 'ORDER': {'items': ['toilet-paper', 'soap'],
           'prices': [1, 2],
           'VAT': [0.18, 0.36],
           'totals': [1.18, 2.36]}}

sideffecteds

Notice that although many functions consume & produce the same ORDER dependency (check _fn_needs & _fn_provides above), which would otherwise have formed cycles, the wrapping operations need and provide different sideffected instances, breaking those cycles.

Resilience when operations fail (endurance)

It is possible for a pipeline to keep executing operations even if some of them raise errors, if those operations are marked as endured:
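A toy model of endurance (plain Python, not the real scheduler; `run_endured` is a hypothetical helper): failures of endured operations are recorded instead of raised, and operations whose needs were canceled are simply skipped:

```python
def run_endured(ops, sol):
    """ops: list of (name, needs, provides, fn); returns (solution, failures)."""
    failures = {}
    for name, needs, provides, fn in ops:
        if not all(n in sol for n in needs):
            continue  # inputs canceled by an earlier failure: skip op
        try:
            sol.update(zip(provides, fn(*(sol[n] for n in needs))))
        except Exception as exc:
            failures[name] = exc  # endured: record the error, carry on
    return sol, failures

def get_out():
    raise ValueError("Quarantined!")

# mirrors the covid19 example below; fns return tuples to zip with provides
ops = [("get_out", [], ["space", "time"], get_out),
       ("stay_home", [], ["time"], lambda: ("1h",)),
       ("exercise", ["space"], ["fun"], lambda where: ("refreshed",)),
       ("read_book", ["time"], ["fun"], lambda how_long: ("relaxed",))]
run_endured(ops, {})  # ({'time': '1h', 'fun': 'relaxed'}, {'get_out': ...})
```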

>>> @operation(endured=1, provides=["space", "time"])
... def get_out():
...     raise ValueError("Quarantined!")
>>> get_out
FunctionalOperation!(name='get_out', provides=['space', 'time'], fn='get_out')

Notice the exclamation(!) before the parenthesis in the string representation of the operation.

>>> @operation(needs="space", provides="fun")
... def exercise(where):
...     return "refreshed"
>>> @operation(endured=1, provides="time")
... def stay_home():
...     return "1h"
>>> @operation(needs="time", provides="fun")
... def read_book(for_how_long):
...     return "relaxed"
>>> pipeline = compose("covid19", get_out, stay_home, exercise, read_book)
>>> pipeline
Pipeline('covid19',
                 needs=['space', 'time'],
                 provides=['space', 'time', 'fun'],
                 x4 ops: get_out, stay_home, exercise, read_book)

Notice the thick outlines of the endured (or rescheduled, see below) operations.

When executed, the pipeline produces outputs even though one of its operations has failed:

>>> sol = pipeline()
>>> sol
{'time': '1h', 'fun': 'relaxed'}

You may still abort on failures, later, by raising an appropriate exception from Solution:

>>> sol.scream_if_incomplete()
Traceback (most recent call last):
...
graphtik.base.IncompleteExecutionError:
Not completed x2 operations ['exercise', 'get_out'] due to x1 failures and x0 partial-ops:
  +--get_out: ValueError('Quarantined!')
Operations with partial outputs (rescheduled)

In case the outputs actually produced depend on some condition in the inputs, the solution has to reschedule the plan amidst execution, and consider the provides actually delivered:

>>> @operation(rescheduled=1,
...            needs="quarantine",
...            provides=["space", "time"],
...            returns_dict=True)
... def get_out_or_stay_home(quarantine):
...     if quarantine:
...          return {"time": "1h"}
...     else:
...          return {"space": "around the block"}
>>> get_out_or_stay_home
FunctionalOperation?(name='get_out_or_stay_home',
                     needs=['quarantine'],
                     provides=['space', 'time'],
                     fn{}='get_out_or_stay_home')
>>> @operation(needs="space", provides=["fun", "body"])
... def exercise(where):
...     return "refreshed", "strong feet"
>>> @operation(needs="time", provides=["fun", "brain"])
... def read_book(for_how_long):
...     return "relaxed", "popular physics"
>>> pipeline = compose("covid19", get_out_or_stay_home, exercise, read_book)

Depending on the “quarantine” state, we get to execute a different part of the pipeline:

>>> sol = pipeline(quarantine=True)
>>> sol = pipeline(quarantine=False)

In both cases, a warning is raised about the missing outputs, but execution proceeds regularly, evaluating whatever is possible. You may collect a report of what has been canceled using this:

>>> print(sol.check_if_incomplete())
Not completed x1 operations ['read_book'] due to x0 failures and x1 partial-ops:
  +--get_out_or_stay_home: ['time']

In case you wish to cancel the output of a single-result operation, return the special value graphtik.NO_RESULT.

Plotting and Debugging

Plotting

For errors & debugging, it is necessary to visualize the graph-operation. You may plot any plottable and annotate on top of it the execution plan and solution of the last computation, calling methods with arguments like this:

pipeline.plot(True)                   # open a matplotlib window
pipeline.plot("pipeline.svg")         # other supported formats: png, jpg, pdf, ...
pipeline.plot()                       # without arguments, return a pydot.Dot object
pipeline.plot(solution=solution)      # annotate graph with solution values
solution.plot()                       # plot solution only

… or for the last …:

solution.plot(...)
[diagram: execution plan]
Graphtik Legend

The legend for all graphtik diagrams, generated by legend().

The same Plottable.plot() method applies also for the rest of the plottables (e.g. the network, the execution plan and the solution), each one capable of producing diagrams of increasing complexity.

For instance, when a pipeline has just been composed, plotting it will come out bare bone, with just the 2 types of nodes (data & operations), their dependencies, and (optionally, if plot theme include_steps is true) the sequence of the execution-steps of the plan.

barebone graph

But as soon as you run it, the next plot calls will print more of the internals. Internally, plotting delegates to ExecutionPlan.plot() of the plan attribute, which caches the last run to facilitate debugging. If you want the bare-bone diagram, plot the network:

pipeline.net.plot(...)

If you want all details, plot the solution:

solution.plot(...)

Note

For plots, the Graphviz program must be in your PATH, and the pydot & matplotlib python packages installed. You may install both when installing graphtik with its plot extras:

pip install graphtik[plot]

Tip

A description of the similar API to pydot.Dot instance returned by plot() methods is here: https://pydotplus.readthedocs.io/reference.html#pydotplus.graphviz.Dot

Jupyter notebooks

The pydot.Dot instances returned by Plottable.plot() are rendered directly in Jupyter/IPython notebooks as SVG images.

You may increase the height of the SVG cell output with something like this:

pipeline.plot(jupyter_render={"svg_element_styles": "height: 600px; width: 100%"})

See default_jupyter_render for those defaults and recommendations.

Plot customizations

Rendering of plots is performed by the active plotter (class plot.Plotter). All Graphviz styling attributes are controlled by the active plot theme, which is the plot.Theme instance installed in its Plotter.default_theme attribute.

The following style expansions apply in the attribute-values of Theme instances:

  • Resolve any Ref instances, first against the current nx_attrs and then against the attributes of the current theme.

  • Render jinja2 templates (see _expand_styles()) with template-arguments all the attributes of the plot_args instance in use (hence much more flexible than Ref).

  • Call any callables with current plot_args and replace them by their result (even more flexible than templates).

  • Any None results above are discarded.

  • Workaround pydot/pydot#228 pydot-cstor not supporting styles-as-lists.

You may customize the theme and/or plotter behavior with various strategies, ordered by breadth of the effects (most broadly effecting method at the top):

  0. (zeroth, because it is discouraged!)

    Modify in-place Theme class attributes, and monkeypatch Plotter methods.

    This is the most invasive method, affecting all past and future plotter instances, but only future(!) theme instances, used during a Python session.

Attention

It is recommended to use other means for Plot customizations instead of modifying directly theme’s class-attributes.

All Theme class-attributes are deep-copied when constructing new instances, to avoid accidental modifications when updating instance-attributes was intended (hint: almost all of its attributes are containers, i.e. dicts). Therefore any class-attribute modification will be ignored until a new Theme instance is constructed from the patched class.

  1. Modify the default_theme attribute of the default active plotter, like that:

    get_active_plotter().default_theme.kw_op["fillcolor"] = "purple"
    

    This will affect all Plottable.plot() calls for a Python session.

  2. Create a new Plotter with customized Plotter.default_theme, or clone and customize the theme of an existing plotter by the use of its Plotter.with_styles() method, and make that the new active plotter.

    • This will affect all calls in context.

    • If customizing theme constants is not enough, you may subclass and install a new Plotter class in context.

  3. Pass theme or plotter arguments when calling Plottable.plot():

    pipeline.plot(plotter=Plotter(kw_legend=None))
    pipeline.plot(theme=Theme(include_steps=True))
    

    You may clone and customize an existing plotter, to preserve any pre-existing customizations:

    active_plotter = get_active_plotter()
    pipeline.plot(theme={"include_steps": True})
    

    … OR:

    pipeline.plot(plotter=active_plotter.with_styles(kw_legend=None))
    

    You may create a new class to override Plotter’s methods that way.

    Hint

    This project dogfoods (3) in its own docs/source/conf.py sphinx file. In particular, it configures the base-url of operation node links (by default, nodes do not link to any url):

    ## Plot graphtik SVGs with links to docs.
    #
    import inspect

    from graphtik import base, plot


    def _make_py_item_url(fn):
        if not inspect.isbuiltin(fn):
            fn_name = base.func_name(fn, None, mod=1, fqdn=1, human=0)
            if fn_name:
                return f"../reference.html#{fn_name}"


    plotter = plot.get_active_plotter()
    plot.set_active_plotter(
        plotter.with_styles(
            kw_op_label={
                **plotter.default_theme.kw_op_label,
                "op_url": lambda plot_args: _make_py_item_url(plot_args.nx_item),
                "fn_url": lambda plot_args: _make_py_item_url(plot_args.nx_item.fn),
            }
        )
    )
    
    
Sphinx-generated sites

This library contains a new Sphinx extension (adapted from the sphinx.ext.doctest) that can render plottables in sites from python code in “doctests”.

To enable it, append the module "graphtik.sphinxext" as a string to the extensions list in your docs/conf.py, and then intersperse the graphtik or graphtik-output directives with regular doctest-code to embed graph-plots into the site; you may refer to those plotted graphs with the graphtik role, by their :name: option (see Examples below).

Hint

Note that Sphinx does not doctest the actual python modules, unless the plotting code has ended up, somehow, in the site (e.g. through some autodoc directive). Contrary to pytest and the standard doctest module, the module's globals are not imported (until sphinx#6590 is resolved), so you may need to import them in your doctests, like this:


Unfortunately, you cannot use relative imports, and have to write your module's full name.

Directives
.. graphtik::

Renders a figure with graphtik plots from doctest code.

It supports:

  • all configurations from sphinx.ext.doctest sphinx-extension, plus those described below, in Configurations.

  • all options from ‘doctest’ directive,

    • hide

    • options

    • pyversion

    • skipif

  • these options from image directive, except target (plot elements may already link to URLs):

    • height

    • width

    • scale

    • class

    • alt

  • these options from figure directive:

    • name

    • align

    • figwidth

    • figclass

  • and the following new options:

    • graphvar

    • graph-format

    • caption

Specifically the “interesting” options are these:

:graphvar: (string, optional) varname (`str`)

the variable name containing what to render, which can be:

If missing, it renders the last variable in the doctest code that was assigned a value of the above types.

Attention

If no :graphvar: is given and the doctest code fails, it will still render any plottable created from code that has run previously, without any warnings!

:graph-format: png | svg | svgz | pdf | `None` (choice, default: `None`)
if None, the format is decided according to the active builder, roughly:
  • “html”-like: svg

  • “latex”: pdf

Note that SVGs support zooming, tooltips & URL links, while PNGs support image maps for linkable areas.

:zoomable: <empty>, (true, 1, yes, on) | (false, 0, no, off) (`bool`)

Enable/disable interactive pan+zoom of SVGs; if missing/empty, graphtik_zoomable_svg is assumed.

:zoomable-opts: <empty> | JS-object (`str`)

A JS-object with the options for the interactive zoom+pan of SVGs. If missing, graphtik_zoomable_options is assumed. Specify {} explicitly to force the library's default options.

:name: link target id (`str`)

Make this pipeline a hyperlink target identified by this name. If :name: is given but no :caption:, a caption is created out of the name, to act as a permalink.

:caption: figure's caption (`str`)

Text to put underneath the pipeline.

:alt: (`str`)

If not given, derived from string representation of the pipeline.

.. graphtik-output::

Like graphtik, but works like doctest’s testoutput directive.

:graphtik:

An interpreted text role to refer to graphs plotted by graphtik or graphtik-output directives by their :name: option.

Configurations
graphtik_default_graph_format
  • type: Union[str, None]

  • default: None

The file extension of the generated plot images (without the leading dot .), used when no :graph-format: option is given in a graphtik or graphtik-output directive.

If None, the format is chosen from graphtik_graph_formats_by_builder configuration.

graphtik_graph_formats_by_builder
  • type: Map[str, str]

  • default: check the sources

a dictionary defining which plot image formats to choose, depending on the active builder.

  • Keys are regexes matching the name of the active builder;

  • values are strings from the supported formats for pydot library, e.g. png (see supported_plot_formats()).

If the builder does not match any key, and no format is given in the directive, no graphtik plot is rendered; so by default, plots are generated only for the html & latex builders.

Warning

Latex is probably not working :-(

graphtik_zoomable_svg
  • type: bool

  • default: True

Whether to render SVGs with the zoom-and-pan javascript library, unless the :zoomable: directive-option is given (and not empty).

Attention

Zoom-and-pan does not work in Sphinx sites for Chrome locally - serve the HTML files through some HTTP server, e.g. launch this command to view the site of this project:

python -m http.server 8080 --directory build/sphinx/html/
graphtik_zoomable_options
  • type: str

  • default: {controlIconsEnabled: true, zoomScaleSensitivity: 0.4, fit: true}

A JS-object with the options for the interactive zoom+pan of SVGs, when the :zoomable-opts: directive option is missing. If empty, {} is assumed (the library's default options).

graphtik_plot_keywords
  • type: dict

  • default: {}

Arguments for build_pydot(), applied when rendering plottables.

graphtik_save_dot_files
  • type: bool, None

  • default: None

For debugging purposes, if enabled, store another <img>.txt file next to each image file with the DOT text that produced it.

When None (the default), it is controlled by config.is_debug() from configurations (which in turn obeys the GRAPHTIK_DEBUG environment variable); otherwise, the boolean given here takes precedence.

graphtik_warning_is_error
  • type: bool

  • default: false

If false, doctest errors are suppressed, to avoid failures when building the site with the -W option, since such errors are unrelated to the building of the site.

doctest_test_doctest_blocks (foreign config)

Don't disable doctesting of literal-blocks, i.e. don't reset the doctest_test_doctest_blocks configuration value, or else such code would be invisible to the graphtik directive.

trim_doctest_flags (foreign config)

This configuration is forced to False (default was True).

Attention

This means that in the rendered site, options-in-comments like # doctest: +SKIP and <BLANKLINE> artifacts will be visible.

Examples

The following directive renders a diagram of its doctest code, beneath it:

.. graphtik::
   :graphvar: addmul
   :name: addmul-operation

   >>> from graphtik import compose, operation
   >>> addmul = compose(
   ...       "addmul",
   ...       operation(name="add", needs="abc".split(), provides="ab")(lambda a, b, c: (a + b) * c)
   ... )

addmul-operation

which you may reference with this syntax:

you may :graphtik:`reference <addmul-operation>` with ...

Hint

In this case, the :graphvar: parameter is not really needed, since the code contains just one variable assignment, receiving an instance of a Plottable subclass or a pydot.Dot instance.

Additionally, the doctest code producing the plottables does not have to be contained in the graphtik directive as a whole.

So the above could have been simply written like this:

>>> from graphtik import compose, operation
>>> addmul = compose(
...       "addmul",
...       operation(name="add", needs="abc".split(), provides="ab")(lambda a, b, c: (a + b) * c)
... )

.. graphtik::
   :name: addmul-operation

Errors & debugging

Graphs may become arbitrarily deep. Launching a debugger session to inspect deeply nested stacks is notoriously hard.

Logging

Increase the logging verbosity; logging statements have been placed meticulously to describe the execution flows (but not compilation :-(), with each log statement accompanied by the solution id of that flow, like the (3C40) & (8697) below, which is important when running pipelines in parallel:

--------------------- Captured log call ---------------------
DEBUG    === Compiling pipeline(t)...
DEBUG    ... cache-updated key: ((), None, None)
DEBUG    === (3C40) Executing pipeline(t), in parallel, on inputs[], according to ExecutionPlan(needs=[], provides=['b'], x2 steps: op1, op2)...
DEBUG    +++ (3C40) Parallel batch['op1'] on solution[].
DEBUG    +++ (3C40) Executing OpTask(FunctionalOperation|(name='op1', needs=[], provides=[sfx: 'b'], fn{}='<lambda>'), sol_keys=[])...
INFO     graphtik.op.py:534 Results[sfx: 'b'] contained +1 unknown provides[sfx: 'b']
FunctionalOperation|(name='op1', needs=[], provides=[sfx: 'b'], fn{}='<lambda>')
DEBUG    ... (3C40) op(op1) completed in 1.406ms.

...

DEBUG    === Compiling pipeline(t)...
DEBUG    ... cache-hit key: ((), None, None)
DEBUG    === (8697) Executing pipeline(t), evicting, on inputs[], according to ExecutionPlan(needs=[], provides=['b'], x3 steps: op1, op2, sfx: 'b')...
DEBUG    +++ (8697) Executing OpTask(FunctionalOperation(name='op1', needs=[], provides=[sfx: 'b'], fn{}='<lambda>'), sol_keys=[])...
INFO     graphtik.op.py:534 Results[sfx: 'b'] contained +1 unknown provides[sfx: 'b']
FunctionalOperation(name='op1', needs=[], provides=[sfx: 'b'], fn{}='<lambda>')
DEBUG    ... (8697) op(op1) completed in 0.149ms.
DEBUG    +++ (8697) Executing OpTask(FunctionalOperation(name='op2', needs=[sfx: 'b'], provides=['b'], fn='<lambda>'), sol_keys=[sfx: 'b'])...
DEBUG    ... (8697) op(op2) completed in 0.08ms.
DEBUG    ... (8697) evicting 'sfx: 'b'' from solution[sfx: 'b', 'b'].
DEBUG    === (8697) Completed pipeline(t) in 0.229ms.
DEBUG flag

Enable debug with set_debug() in configurations, or externally, by setting the GRAPHTIK_DEBUG environment variable, to enact the following:

  • net objects print details recursively;

  • plotted SVG diagrams include style-provenance as tooltips;

  • Sphinx extension also saves the original DOT file next to each image (see graphtik_save_dot_files).

Tip

From code, you may wrap the code you are interested in with the config.debug_enabled() "context-manager", to get augmented print-outs for selected code-paths only.

Jetsam on exceptions

Additionally, when some operation fails, the original exception gets annotated with the following properties, as a debug aid:

>>> from graphtik import compose, operation
>>> from pprint import pprint
>>> def scream(*args):
...     raise ValueError("Wrong!")
>>> try:
...     compose("errgraph",
...             operation(name="screamer", needs=['a'], provides=["foo"])(scream)
...     )(a=None)
... except ValueError as ex:
...     pprint(ex.jetsam)
{'aliases': None,
 'args': {'kwargs': {}, 'positional': [None], 'varargs': []},
 'network': Network(x3 nodes, x1 ops: screamer),
 'operation': FunctionalOperation(name='screamer', needs=['a'], provides=['foo'], fn='scream'),
 'outputs': None,
 'plan': ExecutionPlan(needs=['a'], provides=['foo'], x1 steps: screamer),
 'results_fn': None,
 'results_op': None,
 'solution': {'a': None},
 'task': OpTask(FunctionalOperation(name='screamer', needs=['a'], provides=['foo'], fn='scream'), sol_keys=['a'])}

In an interactive REPL console, you may use this to get the last raised exception:

import sys

sys.last_value.jetsam

The following annotated attributes might have meaningful value on an exception:

network

the innermost network owning the failed operation/function

plan

the innermost plan that was executing when an operation crashed

operation

the innermost operation that failed

args

either the input arguments list fed into the function, or a dict with both args & kwargs keys in it.

outputs

the names of the outputs the function was expected to return

provides

the names the graph eventually needed from the operation; a subset of the above, and not always what has been declared in the operation.

fn_results

the raw results of the operation’s function, if any

op_results

the results, always a dictionary, as matched with operation’s provides

solution

an instance of Solution, contains inputs & outputs till the error happened; note that Solution.executed contain the list of executed operations so far.

Of course you may use many of the above “jetsam” values when plotting.

Debugger

The plotting capabilities, along with the above annotation of exceptions with the internal state of the plan/operation, often render a debugger session unnecessary. But since the annotated values might be incomplete, you may not always avoid one.

Your best shot is to enable “post mortem debugging”.

If you set a breakpoint() in one of your functions, move up a few frames to find the ExecutionPlan._handle_task() method, where the "live" ExecutionPlan & Solution instances reside, useful when investigating problems with computed values.

Architecture

compute
computation
phase

[Figure: the graphtik-v4.4+ "compute" flowchart: operations are composed into a network; the network, constrained by the input/output names and a node predicate, is compiled into an execution plan (pruned dag); the plan, fed with input values, is executed into a solution (which may re-prune its dag on reschedule).]

The definition & execution of networked operations is split in 1+2 phases:

… it is constrained by these IO data-structures:

… populates these low-level data-structures:

… and utilizes these main classes:

graphtik.op.FunctionalOperation([fn, name, …])

An operation performing a callable (ie a function, a method, a lambda).

graphtik.pipeline.Pipeline(operations, name, *)

An operation that can compute a network-graph of operations.

graphtik.network.Network(*operations[, graph])

A graph of operations that can compile an execution plan.

graphtik.execution.ExecutionPlan

A pre-compiled list of operation steps that can execute for the given inputs/outputs.

graphtik.execution.Solution(plan, input_values)

The solution chain-map and execution state (e.g.

… plus those for plotting:

graphtik.plot.Plotter([theme])

A plotter renders diagram images of plottables.

graphtik.plot.Theme(*[, _prototype])

The poor man’s css-like plot theme (see also StyleStack).

compose
composition

The phase where operations are constructed and grouped into pipelines and corresponding networks based on their dependencies.

Tip

combine pipelines

When operations and/or pipelines are composed together, there are two ways to combine the operations contained into the new pipeline: operation merging (default) and operation nesting.

They are selected by the nest parameter of compose() factory.

operation merging

The default method to combine pipelines, also applied when simply merging operations.

Any identically-named operations override each other, with the operations added earlier in the .compose() call (further to the left) winning over those added later (further to the right).

seealso

Merging

operation nesting

The elaborate method to combine pipelines forming clusters.

The original pipelines are preserved intact in "isolated" clusters, by prefixing the names of their operations (and optionally data) with the name of the respective original pipeline that contained them (or with renames the user defines).

seealso

Nesting, compose(), RenArgs, nest_any_node(), dep_renamed(), PlotArgs.clusters

compile
compilation

The phase where the Network creates a new execution plan by pruning all graph nodes into a subgraph dag, and deriving the execution steps.

execute
execution
sequential

The phase where the ExecutionPlan calls the underlying functions of all operations contained in execution steps, with inputs/outputs taken from the solution.

Currently there are 2 ways to execute:

  • sequential

  • parallel, with a multiprocessing.pool.Pool

Plans may abort their execution by setting the abort run global flag.

net
network

The Network contains a graph of operations and can compile (and cache) execution plans, or prune a cloned network for the given inputs/outputs/node predicate.

plan
execution plan

The ExecutionPlan class performs the execution phase; it contains the dag and the steps.

Compiled execution plans are cached in Network._cached_plans across runs, with (inputs, outputs, predicate) as key.

solution

A Solution instance is created internally by Pipeline.compute() to hold both the input & output values, and the status of executed operations. It is based on a collections.ChainMap, keeping one dictionary for each operation executed, +1 for the inputs.

The results of the last operation executed "win" in the outputs produced, while the base (least precedence) is the inputs given when the execution started.
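The layering described above can be sketched with a plain collections.ChainMap (hypothetical operation-result dicts; the real Solution adds execution state, evictions, etc. on top of this idea):

```python
from collections import ChainMap

# Hypothetical inputs & per-operation results.
inputs = {"a": 1, "b": 2}           # base layer: least precedence
op1_results = {"c": 3}
op2_results = {"c": 30, "d": 4}     # overwrites op1's "c"

# Each executed operation pushes a fresh, higher-precedence layer.
solution = ChainMap(inputs).new_child(op1_results).new_child(op2_results)

assert solution["c"] == 30          # the last operation executed "wins"
assert solution["a"] == 1           # inputs remain visible at the base
```

Earlier values for the same key survive in the lower layers, which is how overwrites remain accessible.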

graph
network graph

A graph of operations linked by their dependencies forming a pipeline.

The Network.graph (currently a DAG) contains all FunctionalOperation and data-nodes (string or modifier) of a pipeline.

They are laid out and connected by repeated calls of Network._append_operation() by the Network constructor during composition.

This graph is then pruned to extract the dag, and the execution steps are calculated, all ingredients for a new ExecutionPlan.

prune
pruning

A subphase of compilation performed by method Network._prune_graph(), which extracts a subgraph dag that does not contain any unsatisfied operations.

It topologically sorts the graph, and prunes based on given inputs, asked outputs, node predicate and operation needs & provides.

unsatisfied operation

The core of pruning & rescheduling, performed by network.unsatisfied_operations() function, which collects all operations with unreachable dependencies:

  • they have needs that do not correspond to any of the given inputs or the intermediately computed outputs of the solution;

  • all their provides are NOT needed by any other operation, nor are asked as outputs.
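The first bullet's reachability check can be sketched as a fixed-point loop (hypothetical (name, needs, provides) tuples; the real network.unsatisfied_operations() also handles the second bullet, optionals and asked outputs):

```python
def prune_unsatisfied(ops, inputs):
    """Drop operations with unreachable needs, repeating until stable."""
    ops = list(ops)
    while True:
        # Names reachable from the given inputs or any remaining op's provides.
        reachable = set(inputs) | {p for _, _, provides in ops for p in provides}
        kept = [op for op in ops if set(op[1]) <= reachable]
        if len(kept) == len(ops):
            return kept
        ops = kept

ops = [
    ("sum_ab", ["a", "b"], ["ab"]),
    ("mul_bc", ["b", "c"], ["bc"]),   # unsatisfied: no "c" anywhere
    ("square_bc", ["bc"], ["bc2"]),   # starves once mul_bc is dropped
]
assert [name for name, _, _ in prune_unsatisfied(ops, ["a", "b"])] == ["sum_ab"]
```

The loop matters: dropping one starved operation may starve another downstream, as square_bc shows.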

dag
execution dag
solution dag

There are 2 directed-acyclic-graph instances used:

steps
execution steps

The plan contains a list of the operation-nodes only from the dag, topologically sorted, and interspersed with instruction steps needed to compute the asked outputs from the given inputs.

They are built by Network._build_execution_steps() based on the subgraph dag.

The only instruction step is for performing evictions.

evictions

A memory footprint optimization where intermediate inputs & outputs are erased from solution as soon as they are not needed further down the dag.

Evictions are pre-calculated during compilation, where _EvictInstruction steps are inserted in the execution plan.
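The pre-calculation can be sketched as a walk over the sorted steps, evicting any consumed value that no later step needs nor was asked as an output (hypothetical tuples; the real plan inserts _EvictInstruction objects and also considers provides):

```python
def steps_with_evictions(steps, asked_outputs):
    """steps: topologically sorted (name, needs, provides) tuples."""
    plan = []
    for i, (name, needs, _) in enumerate(steps):
        plan.append(name)
        # Anything this step consumed that no later step needs may go.
        later_needs = {n for _, nn, _ in steps[i + 1:] for n in nn}
        for dep in needs:
            if dep not in later_needs and dep not in asked_outputs:
                plan.append(f"evict({dep})")
    return plan

steps = [("add", ["a", "b"], ["ab"]), ("square", ["ab"], ["ab2"])]
assert steps_with_evictions(steps, asked_outputs=["ab2"]) == [
    "add", "evict(a)", "evict(b)", "square", "evict(ab)"]
```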

overwrite

Values in the solution that have been written by more than one operation, accessed via Solution.overwrites. Note that a sideffected dependency usually produces an overwrite.

inputs

The named input values that are fed into an operation (or pipeline) through Operation.compute() method according to its needs.

These values are either:

outputs

The dictionary of computed values returned by an operation (or a pipeline) matching its provides, when method Operation.compute() is called.

Those values are either:

  • retained in the solution, internally during execution, keyed by the respective provide, or

  • returned to user after the outer pipeline has finished computation.

When no specific outputs are requested from a pipeline, Pipeline.compute() returns all intermediate inputs along with the outputs; that is, no evictions happen.

An operation may return partial outputs.

pipeline

The Pipeline class holding a network of operations and dependencies.

operation

Either the abstract notion of an action with specified needs and provides dependencies, or the concrete wrapper FunctionalOperation for any callable, that feeds on inputs and updates outputs, from/to solution, or given-by/returned-to the user by a pipeline.

The distinction between needs/provides and inputs/outputs is akin to function parameters and arguments during define-time and run-time, respectively.

dependency

The name of a solution value an operation needs or provides.

  • Dependencies are declared during composition, when building FunctionalOperation instances. Operations are then interlinked together, by matching the needs & provides of all operations contained in a pipeline.

  • During compilation the graph is then pruned based on the reachability of the dependencies.

  • During execution Operation.compute() performs 2 “matchings”:

    • inputs & outputs in solution are accessed by the needs & provides names of the operations;

    • operation needs & provides are zipped against the underlying function’s arguments and results.

    These matchings are affected by modifiers, printed out with diacritics.

needs
fn_needs

The list of dependency names an operation requires from solution as inputs, roughly corresponding to the underlying function's arguments (fn_needs).

Specifically, Operation.compute() extracts input values from solution by these names, and matches them against function arguments, mostly by their positional order. Whenever this matching is not 1-to-1, and function-arguments differ from the regular needs, modifiers must be used.
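In its simplest form (no modifiers), this extraction and positional matching can be sketched as (hypothetical helper name):

```python
def call_with_needs(fn, needs, solution):
    """Match solution values to positional arguments, in needs order."""
    return fn(*(solution[name] for name in needs))

solution = {"a": 10, "b": 3, "unrelated": None}
assert call_with_needs(lambda a, b: a - b, ["a", "b"], solution) == 7
```

Note that the order of needs, not the argument names, drives the matching; modifiers exist precisely for the cases where this is not enough.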

provides
op_provides
fn_provides

The list of dependency names an operation writes to the solution as outputs, roughly corresponding to the underlying function's results (fn_provides).

Specifically, Operation.compute() "zips" this list-of-names with the output values produced when the operation's function is called. Whenever this "zipping" is not 1-to-1, and function results differ from the regular operation provides (op_provides) (or the results are not a list), it is possible to:

alias

Map an existing name in fn_provides into a duplicate, artificial one in op_provides.

You cannot alias an alias. See Aliased provides.

returns dictionary

When an operation is marked with the FunctionalOperation.returns_dict flag, the underlying function is expected to return fn_provides not as a sequence but as a dictionary; hence, no "zipping" of function-results -> fn_provides takes place.

Useful for operations returning partial outputs, to have full control over which outputs were actually produced, or to cancel sideffects.
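The two result-matching modes can be sketched like this (hypothetical helper; the real code also validates lengths and handles aliases, sideffects, etc.):

```python
def match_results(provides, results, returns_dict=False):
    """Zip fn results with provides names, or pass a dict through."""
    if returns_dict:
        # The function already named its (possibly partial) outputs.
        return dict(results)
    if len(provides) == 1:
        results = [results]       # a single value is not iterated over
    return dict(zip(provides, results))

assert match_results(["sum", "diff"], (7, 3)) == {"sum": 7, "diff": 3}
assert match_results(["sum"], 7) == {"sum": 7}
# returns-dict mode: partial outputs are simply whatever keys came back.
assert match_results(["sum", "diff"], {"sum": 7}, returns_dict=True) == {"sum": 7}
```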

modifier
diacritic

A modifier changes dependency behavior during compilation or execution.

For instance, needs may be annotated as optional function arguments, and provides and needs can be annotated as "ghost" sideffects.

When printed, modifiers annotate regular or sideffect dependencies with these diacritics:

>   : keyword (fn_keyword)
?   : optional (fn_keyword)
*   : vararg
+   : varargs

See graphtik.modifiers module.

optionals

A needs-only modifier for inputs that do not hinder operation execution (prune) if absent from solution.

In the underlying function it corresponds to either:

varargish

A needs-only modifier for inputs to be appended as *args (if present in solution).

There are 2 kinds, both, by definition, optionals:

  • the vararg() annotates any solution value to be appended once in the *args;

  • the varargs() annotates iterable values and all its items are appended in the *args one-by-one.

Attention

To avoid user mistakes, varargs do not accept str inputs (although strings are iterable):

>>> graph(a=5, b="mistake")
Traceback (most recent call last):
...
graphtik.base.MultiValueError: Failed preparing needs:
    1. Expected needs['b'(+)] to be non-str iterables!
    +++inputs: ['a', 'b']
    +++FunctionalOperation(name='enlist',
                           needs=['a', 'b'(+)],
                           provides=['sum'],
                           fn='enlist')

In printouts, it is denoted either with * or + diacritic.

sideffects

A modifier denoting a fictive dependency linking operations into virtual flows, without real data exchanges.

The side-effect modification may happen to some internal state not fully represented in the graph & solution.

There are actually 2 relevant modifiers:

  • An abstract sideffect modifier (annotated with sfx()) describing modifications taking place beyond the scope of the solution. It may have just the “optional” diacritic in printouts.

  • The sideffected modifier (annotated with sfxed()) denoting modifications on a real dependency read from and written to the solution.

Both kinds of sideffects participate in the compilation of the graph, and both may be given or asked in the inputs & outputs of a pipeline, but they are never given to functions. A function of a returns dictionary operation can return a falsy value to declare it as canceled.

sideffected

A modifier that denotes sideffects on a dependency that exists in solution, allowing you to declare an operation that both needs and provides that sideffected dependency.

Note

To be precise, the “sideffected dependency” is the name held in _Modifier.sideffected attribute of a modifier created by sfxed() function.

The outputs of a sideffected dependency will produce an overwrite if the sideffected dependency is declared both as needs and provides of some operation.

It is annotated with sfxed(); it may have all diacritics in printouts.

reschedule
rescheduling
partial outputs
canceled operation

The partial pruning of the solution's dag during execution. It happens when either of these 2 conditions applies:

the solution must then reschedule the remaining operations downstream, and possibly cancel some of them (assigned in Solution.canceled).

Partial operations are usually declared with returns dictionary so that the underlying function can control which of the outputs are returned.

See Operations with partial outputs (rescheduled)

endurance
endured

Keep executing as many operations as possible, even if some of them fail. Endurance for an operation is enabled if set_endure_operations() is true globally in the configurations or if FunctionalOperation.endured is true.

You may interrogate Solution.executed to discover the status of each executed operation, or call one of check_if_incomplete() or scream_if_incomplete().

See Depending on sideffects

predicate
node predicate

A callable(op, node-data) that should return true for nodes to be included in graph during compilation.

abort run

A global configurations flag that when set with abort_run() function, it halts the execution of all currently or future plans.

It is reset automatically on every call of Pipeline.compute() (after a successful intermediate compilation), or manually, by calling reset_abort().

parallel
parallel execution
execution pool
task

Execute operations in parallel, with a thread pool or process pool (instead of sequentially). Operations and pipelines are marked as such on construction, or enabled globally from configurations.

Note that sideffects are not expected to function with process pools, certainly not when marshalling is enabled.

process pool

When the multiprocessing.pool.Pool class is used for parallel execution, the tasks must be communicated to/from the worker processes, which requires pickling, and that may fail. With pickling failures you may try marshalling with the dill library, and see if that helps.
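For instance, plain pickle serializes functions "by reference" (their importable name), so an operation wrapping a lambda cannot reach the workers; this is the kind of failure dill works around:

```python
import pickle

# Plain pickle cannot serialize an anonymous lambda, since it has
# no importable name to record.
try:
    pickle.dumps(lambda x: x + 1)
    lambda_pickled = True
except Exception:
    lambda_pickled = False

assert not lambda_pickled   # a typical cause of process-pool failures
```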

Note that sideffects are not expected to function at all, certainly not when marshalling is enabled.

thread pool

When the multiprocessing.dummy.Pool class is used for parallel execution, the tasks run in-process, so no marshalling is needed.
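A sketch of why no marshalling is needed: multiprocessing.dummy.Pool runs tasks on threads of the same process, so even unpicklable callables such as lambdas work:

```python
from multiprocessing.dummy import Pool  # thread-based drop-in for Pool

with Pool(2) as pool:
    # A lambda would fail to pickle with a real process pool, but
    # threads share the interpreter, so nothing is serialized.
    results = pool.map(lambda x: x * x, [1, 2, 3])

assert results == [1, 4, 9]
```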

marshalling

Pickling parallel operations and their inputs/outputs using the dill module. It is configured either globally with set_marshal_tasks() or set with a flag on each operation / pipeline.

Note that sideffects do not work when this is enabled.

plottable

Objects that can plot their graph network, such as those inheriting Plottable, (FunctionalOperation, Pipeline, Network, ExecutionPlan, Solution) or a pydot.Dot instance (the result of the Plottable.plot() method).

Such objects may render as SVG in Jupyter notebooks (through their plot() method) and can render in a Sphinx site with the graphtik RsT directive. You may control the rendered image as explained in the tip of the Plotting section.

SVGs are rendered with the zoom-and-pan javascript library.

Attention

Zoom-and-pan does not work in Sphinx sites for Chrome locally - serve the HTML files through some HTTP server, e.g. launch this command to view the site of this project:

python -m http.server 8080 --directory build/sphinx/html/
plotter
plotting

A Plotter is responsible for rendering plottables as images. It is the active plotter that does the rendering, unless overridden in a Plottable.plot() call. Plotters can be customized by various means, such as the plot theme.

active plotter
default active plotter

The plotter currently installed “in-context” of the respective graphtik configuration - this term implies also any Plot customizations done on the active plotter (such as plot theme).

Installation happens by calling one of active_plotter_plugged() or set_active_plotter() functions.

The default active plotter is the plotter instance that this project comes pre-configured with, ie, when no plot-customizations have yet happened.

Attention

It is recommended to use other means for Plot customizations instead of modifying directly theme’s class-attributes.

All Theme class-attributes are deep-copied when constructing new instances, to avoid accidental modifications when attempting to update instance-attributes instead (hint: almost all of its attributes are containers, i.e. dicts). Therefore any class-attribute modification will be ignored until a new Theme instance from the patched class is used.

plot theme
current theme

The mergeable and expandable styles contained in a plot.Theme instance.

The current theme in-use is the Plotter.default_theme attribute of the active plotter, unless overridden with the theme parameter when calling Plottable.plot() (conveyed internally as the value of the PlotArgs.theme attribute).

style
style expansion

A style is an attribute of a plot theme, either a scalar value or a dictionary.

Styles are collected in stacks and are merged into a single dictionary after performing the following expansions:

  • Resolve any Ref instances, first against the current nx_attrs and then against the attributes of the current theme.

  • Render jinja2 templates (see _expand_styles()) with template-arguments all the attributes of the plot_args instance in use (hence much more flexible than Ref).

  • Call any callables with current plot_args and replace them by their result (even more flexible than templates).

  • Any None results above are discarded.

  • Workaround pydot/pydot#228 pydot-cstor not supporting styles-as-lists.
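A much-simplified sketch of that merge in plain Python (hypothetical helper; the real code also resolves Ref instances and renders jinja2 templates against the full plot_args):

```python
def expand_styles(stack, plot_args):
    """Merge a stack of style dicts, expanding callables & dropping Nones."""
    merged = {}
    for style in stack:
        for key, val in style.items():
            if callable(val):          # callables receive the current plot_args
                val = val(plot_args)
            if val is None:            # None results are discarded
                merged.pop(key, None)
                continue
            merged[key] = val
    return merged

plot_args = {"solution": {"x": 1}}
stack = [
    {"fillcolor": "wheat", "penwidth": 2},
    {"fillcolor": lambda pa: "green" if pa.get("solution") else None},
]
print(expand_styles(stack, plot_args))  # {'fillcolor': 'green', 'penwidth': 2}
```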

Tip

if DEBUG is enabled, the provenance of all style values appears in the tooltips of plotted graphs.

configurations
graphtik configuration

The functions controlling compile & execution globally are defined in the config module (plus one more in the graphtik.plot module); the underlying global data are stored in contextvars.ContextVar instances, to allow for nested control.

All boolean configuration flags are tri-state (None, False, True): when set to a non-None value, they “force” the behavior of all operations. All of them default to None (falsy).
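The tri-state, nestable behavior can be sketched directly with contextvars (the flag name below is hypothetical; graphtik’s real accessors live in the config module):

```python
from contextvars import ContextVar

# None means "unset" (falsy); True/False "force" the behavior for all operations.
_marshal_all: ContextVar = ContextVar("marshal_all", default=None)

def is_marshal_forced() -> bool:
    return bool(_marshal_all.get())  # None behaves as False

def run_with_override() -> bool:
    token = _marshal_all.set(True)   # nested override
    try:
        return is_marshal_forced()
    finally:
        _marshal_all.reset(token)    # outer value restored on exit

print(is_marshal_forced())  # False (default None is falsy)
print(run_with_override())  # True (forced inside the nested scope)
print(is_marshal_forced())  # False (outer scope unaffected)
```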

jetsam

When operations fail, the original exception gets annotated with salvaged values from locals() and raised intact.

See Jetsam on exceptions.
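The pattern can be sketched in plain Python: salvage interesting locals onto the exception as an attribute (named jetsam here for illustration) and re-raise the same exception, keeping the traceback intact.

```python
def compute(inputs):
    results = {"partial": inputs["a"] * 2}
    try:
        results["final"] = results["partial"] / inputs["divisor"]
        return results
    except Exception as ex:
        # Annotate & re-raise the *same* exception -- the traceback stays intact.
        ex.jetsam = {"results": results, "inputs": inputs}
        raise

try:
    compute({"a": 3, "divisor": 0})
except ZeroDivisionError as ex:
    print(ex.jetsam["results"])  # {'partial': 6} salvaged for debugging
```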

API Reference

graphtik

Lightweight computation graphs for Python.

graphtik.op

Define operation & dependency and match/zip inputs/outputs during execution.

graphtik.pipeline

compose operations into pipeline and the backing network.

graphtik.modifiers

modifiers change dependency behavior during compilation & execution.

graphtik.network

compose network of operations & dependencies, compile the plan.

graphtik.execution

execute the plan to derive the solution.

graphtik.plot

plotting handled by the active plotter & current theme.

graphtik.config

configurations for network execution, and utilities on them.

graphtik.base

Generic utilities, exceptions and operation & plottable base classes.

graphtik.sphinxext

Extends Sphinx with graphtik directive for plotting from doctest code.

(figure: graphtik-v8.1+ module dependencies, among the core modules plot.py, base.py, sphinxext/, config.py, modifiers.py, pipeline.py, op.py, network.py & execution.py)

Module: op

Define operation & dependency and match/zip inputs/outputs during execution.

Note

This module (along with modifiers & pipeline) is what client code needs to define pipelines at import time without incurring a heavy price (<5ms on a 2019 fast PC).

class graphtik.op.FunctionalOperation(fn: Callable = None, name=None, needs: Union[Collection, str, None] = None, provides: Union[Collection, str, None] = None, aliases: Mapping = None, *, rescheduled=None, endured=None, parallel=None, marshalled=None, returns_dict=None, node_props: Mapping = None)[source]

An operation performing a callable (ie a function, a method, a lambda).

Tip

  • Use operation() factory to build instances of this class instead.

  • Call withset() on existing instances to re-configure new clones.

  • See diacritics to understand printouts of this class.

__abstractmethods__ = frozenset({})[source]
__call__(*args, **kwargs)[source]

Like dict args, delegates to compute().

__eq__(other)[source]

Operation identity is based on name.

__hash__()[source]

Operation identity is based on name.

__init__(fn: Callable = None, name=None, needs: Union[Collection, str, None] = None, provides: Union[Collection, str, None] = None, aliases: Mapping = None, *, rescheduled=None, endured=None, parallel=None, marshalled=None, returns_dict=None, node_props: Mapping = None)[source]

Build a new operation out of some function and its requirements.

See operation() for the full documentation of parameters, study the code for attributes (or read them from rendered sphinx site).

__module__ = 'graphtik.op'
__repr__()[source]

Display operation & dependency names annotated with diacritics.

_abc_impl = <_abc_data object>
_fn_needs = None[source]

Value names the underlying function requires (dupes preserved, without sideffects, with stripped sideffected dependencies).

_fn_provides = None[source]

Value names the underlying function produces (dupes preserved, without aliases & sideffects, with stripped sideffected dependencies).

_match_inputs_with_fn_needs(named_inputs) → Tuple[list, list, dict][source]
_prepare_match_inputs_error(exceptions: List[Tuple[Any, Exception]], missing: List, varargs_bad: List, named_inputs: Mapping) → ValueError[source]
_zip_results_with_provides(results) → dict[source]

Zip results with expected “real” (without sideffects) provides.

aliases = None[source]

an optional mapping of fn_provides to additional ones, together comprising this operations op_provides.

You cannot alias an alias.
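The effect of aliases can be sketched like this (a simplification of the attribute relationships, not graphtik’s validation code):

```python
def expand_aliases(fn_provides, aliases):
    """fn_provides plus one extra name per alias (aliasing an alias rejected)."""
    for src in aliases:
        if src not in fn_provides:
            raise ValueError(f"cannot alias the alias {src!r}")
    return list(fn_provides) + list(aliases.values())

print(expand_aliases(["b"], {"b": "B-aliased"}))  # ['b', 'B-aliased']
```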

compute(named_inputs=None, outputs: Union[Collection, str, None] = None) → dict[source]

Compute (optional) asked outputs for the given named_inputs.

It is called by Network. End-users should simply call the operation with named_inputs as kwargs.

Parameters

named_inputs – the input values with which to feed the computation.

Returns list

Should return a list of values representing the results of running the feed-forward computation on inputs.

property deps

All dependency names, including op_ & internal _fn_.

if not DEBUG, all deps are converted into lists, ready to be printed.

endured = None[source]

If true, even if callable fails, solution will reschedule; ignored if endurance enabled globally.

fn = None[source]

The operation’s underlying function.

marshalled = None[source]

If true, operation will be marshalled while computed, along with its inputs & outputs (useful when run in parallel with a process pool).

name = None[source]

a name for the operation (e.g. ‘conv1’, ‘sum’, etc..); any “parents split by dots(.)”. :seealso: Nesting

needs = None[source]

The needs almost as given by the user (which may contain MULTI-sideffecteds and dupes), roughly morphed into _fn_needs + sideffects (dupes preserved, with sideffects & SINGULARIZED sideffecteds). It is stored for builder functionality to work.

node_props = None[source]

Added as-is into NetworkX graph, and you may filter operations by Pipeline.withset(). Also plot-rendering affected if they match Graphviz properties, unless they start with underscore(_).

op_needs = None[source]

Value names ready to lay the graph for pruning (NO dupes, WITH aliases & sideffects, and SINGULAR sideffecteds).

op_provides = None[source]

Value names ready to lay the graph for pruning (NO dupes, WITH aliases & sideffects, and SINGULAR sideffecteds).

parallel = None[source]

execute in parallel

prepare_plot_args(plot_args: graphtik.base.PlotArgs) → graphtik.base.PlotArgs[source]

Delegate to a provisional network with a single op .

provides = None[source]

The provides almost as given by the user (which may contain MULTI-sideffecteds and dupes), roughly morphed into _fn_provides + sideffects (dupes preserved, without aliases, with sideffects & SINGULARIZED sideffecteds). It is stored for builder functionality to work.

rescheduled = None[source]

If true, underlying callable may produce a subset of provides, and the plan must then reschedule after the operation has executed. In that case, it makes more sense for the callable to returns_dict.

returns_dict = None[source]

If true, it means the underlying function returns a dictionary, and no further processing is done on its results, i.e. the returned output-values are not zipped with provides.

It does not have to return any alias outputs.

Can be changed amidst execution by the operation’s function, but it is easier for that function to simply call set_results_by_name().

withset(fn: Callable = Ellipsis, name=Ellipsis, needs: Union[Collection, str, None] = Ellipsis, provides: Union[Collection, str, None] = Ellipsis, aliases: Mapping = Ellipsis, *, rescheduled=Ellipsis, endured=Ellipsis, parallel=Ellipsis, marshalled=Ellipsis, returns_dict=Ellipsis, node_props: Mapping = Ellipsis, renamer=None) → graphtik.op.FunctionalOperation[source]

Make a clone with some values replaced, or operation and dependencies renamed.

if renamer is given, it is applied on top of (and after) any other changed values, for the operation-name, needs, provides & any aliases.

Parameters

renamer

  • if a dictionary, it renames any operations & data named as keys into the respective values by feeding them into dep_renamed(), so values may be single-input callables themselves.

  • if it is a callable(), it is given a RenArgs instance to decide the node’s name.

The callable may return a str for the new-name, or any other falsy value to leave the node named as is.

Attention

The callable SHOULD preserve any modifier on dependencies, and use dep_renamed() if a callable is given.

Returns

a clone operation with changed/renamed values asked

Raise
  • (ValueError, TypeError): all cstor validation errors

  • ValueError: if a renamer dict contains a non-string and non-callable value

Examples

>>> from graphtik import sfx
>>> op = operation(str, "foo", needs="a",
...     provides=["b", sfx("c")],
...     aliases={"b": "B-aliased"})
>>> op.withset(renamer={"foo": "BAR",
...                     'a': "A",
...                     'b': "B",
...                     sfx('c'): "cc",
...                     "B-aliased": "new.B-aliased"})
FunctionalOperation(name='BAR',
                    needs=['A'],
                    provides=['B', sfx('cc')],
                    aliases=[('B', 'new.B-aliased')],
                    fn='str')
  • Notice that the 'c' rename changes the sideffect name, without the destination name being an sfx() modifier (but the source name must match the sfx-specifier).

  • Notice that the source of the alias from b-->B is handled implicitly from the respective rename on the provides.

But usually a callable is more practical, like the one below renaming only data names:

>>> op.withset(renamer=lambda ren_args:
...            dep_renamed(ren_args.name, lambda n: f"parent.{n}")
...            if ren_args.typ != 'op' else
...            False)
FunctionalOperation(name='foo',
                    needs=['parent.a'],
                    provides=['parent.b', sfx('parent.c')],
                    aliases=[('parent.b', 'parent.B-aliased')],
                    fn='str')

Notice the double use of lambdas with dep_renamed() – an equivalent rename callback would be:

dep_renamed(ren_args.name, f"parent.{dependency(ren_args.name)}")
graphtik.op.NO_RESULT = <NO_RESULT>

A special return value for the function of a rescheduled operation, signifying that it did not produce any result at all (including sideffects); otherwise, it would have been a single result, None. Useful for rescheduled operations that want to cancel their single result without being declared as returns dictionary.

graphtik.op.NO_RESULT_BUT_SFX = <NO_RESULT_BUT_SFX>

Like NO_RESULT but does not cancel any sideffects declared as provides.

graphtik.op._spread_sideffects(deps: Collection[str]) → Tuple[Collection[str], Collection[str]][source]

Build fn/op dependencies from user ones by stripping or singularizing any sideffects.

Returns

the given deps duplicated as (fn_deps,  op_deps), where any instances of sideffects are processed like this:

fn_deps
  • any sfxed() are replaced by the stripped dependency consumed/produced by underlying functions, in the order they are first met (the remaining duplicate sideffecteds are discarded).

  • any sfx() are simply dropped;

op_deps

any sfxed() are replaced by a sequence of “singularized” instances, one for each item in their _Modifier.sfx_list attribute, in the order they are first met (any duplicates are discarded, order is irrelevant, since they don’t reach the function);

graphtik.op.as_renames(i, argname)[source]

Parses a list of (source –> destination) renames from a dict, a list of 2-items, or a single 2-tuple.

Returns

a (possibly empty) list of pairs

Note

The same source may be repeatedly renamed to multiple destinations.
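A sketch of that normalization in plain Python (hypothetical helper mirroring the accepted shapes, not graphtik’s own code):

```python
def parse_renames(i):
    """Normalize dict / list-of-2-items / single 2-tuple into a list of pairs."""
    if not i:
        return []
    if isinstance(i, dict):
        return list(i.items())
    if isinstance(i, tuple) and len(i) == 2 and isinstance(i[0], str):
        return [i]                      # a single (source, destination) pair
    return [tuple(pair) for pair in i]  # a list of 2-items

print(parse_renames({"a": "A"}))                # [('a', 'A')]
print(parse_renames(("a", "A")))                # [('a', 'A')]
print(parse_renames([("a", "A"), ("a", "B")]))  # same source, two destinations
```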

graphtik.op.operation(fn: Callable = None, name=None, needs: Union[Collection, str, None] = None, provides: Union[Collection, str, None] = None, aliases: Mapping = None, *, rescheduled=None, endured=None, parallel=None, marshalled=None, returns_dict=None, node_props: Mapping = None)[source]

An operation factory that can function as a decorator.

Parameters
  • fn

    The callable underlying this operation. If given, it builds the operation right away (along with any other arguments).

    If not given, it returns a “fancy decorator” that still supports all arguments here AND the withset() method.

    Hint

    This is a twisted way for “fancy decorators”.

    After all that, you can always call FunctionalOperation.withset() on existing operation, to obtain a re-configured clone.

  • name (str) – The name of the operation in the computation graph. If not given, deduce from any fn given.

  • needs

    the list of (positionally ordered) names of the data needed by the operation to receive as inputs, roughly corresponding to the arguments of the underlying fn (plus any sideffects).

    It can be a single string, in which case a 1-element iterable is assumed.

  • provides

    the list of (positionally ordered) output data this operation provides, which must, roughly, correspond to the returned values of the fn (plus any sideffects & aliases).

    It can be a single string, in which case a 1-element iterable is assumed.

If they are more than one, the underlying function must return an iterable with the same number of elements, unless param returns_dict is true, in which case it must return a dictionary containing (at least) those named elements.

  • aliases – an optional mapping of provides to additional ones

  • rescheduled – If true, underlying callable may produce a subset of provides, and the plan must then reschedule after the operation has executed. In that case, it makes more sense for the callable to returns_dict.

  • endured – If true, even if callable fails, solution will reschedule; ignored if endurance is enabled globally.

  • parallel – execute in parallel

  • marshalled – If true, operation will be marshalled while computed, along with its inputs & outputs (useful when run in parallel with a process pool).

  • returns_dict – if true, it means the fn returns a dictionary with all provides, and no further processing is done on them (i.e. the returned output-values are not zipped with provides)

  • node_props – Added as-is into NetworkX graph, and you may filter operations by Pipeline.withset(). Also plot-rendering is affected if they match Graphviz properties, unless they start with underscore(_)

Returns

when called with fn, it returns a FunctionalOperation, otherwise it returns a decorator function that accepts fn as the 1st argument.

Note

Actually the returned decorator is the FunctionalOperation.withset() method and accepts all arguments, monkeypatched to support calling a virtual withset() method on it, so as not to interrupt the builder-pattern; besides that trick, it is just a bound method.

Example:

This is an example of its use, based on the “builder pattern”:

>>> from graphtik import operation, varargs
>>> op = operation()
>>> op
<function FunctionalOperation.withset at ...

That’s a “fancy decorator”.

>>> op = op.withset(needs=['a', 'b'])
>>> op
FunctionalOperation(name=None, needs=['a', 'b'], fn=None)

If you call an operation with fn un-initialized, it will scream:

>>> op.compute({"a":1, "b": 2})
Traceback (most recent call last):
ValueError: Operation must have a callable `fn` and a non-empty `name`:
  FunctionalOperation(name=None, needs=['a', 'b'], fn=None)

You may keep calling withset() until a valid operation instance is returned, and compute it:

>>> op = op.withset(needs=['a', 'b'],
...                 provides='SUM', fn=lambda a, b: a + b)
>>> op
FunctionalOperation(name='<lambda>', needs=['a', 'b'], provides=['SUM'], fn='<lambda>')
>>> op.compute({"a":1, "b": 2})
{'SUM': 3}
>>> op.withset(fn=lambda a, b: a * b).compute({'a': 2, 'b': 5})
{'SUM': 10}
graphtik.op.reparse_operation_data(name, needs, provides, aliases=()) → Tuple[Hashable, Collection[str], Collection[str], Collection[Tuple[str, str]]][source]

Validate & reparse operation data as lists.

Returns

name, needs, provides, aliases

As a separate function to be reused by client building operations, to detect errors early.

Module: pipeline

compose operations into pipeline and the backing network.

Note

This module (along with op & modifiers) is what client code needs to define pipelines at import time without incurring a heavy price (<5ms on a 2019 fast PC).

class graphtik.pipeline.NULL_OP(name)[source]

Eliminates same-named operations added later during operation merging.

Seealso

Merging

__abstractmethods__ = frozenset({})[source]
__eq__(o)[source]

Return self==value.

__hash__()[source]

Return hash(self).

__init__(name)[source]

Initialize self. See help(type(self)) for accurate signature.

__module__ = 'graphtik.pipeline'
__repr__()[source]

Return repr(self).

_abc_impl = <_abc_data object>
compute(*args, **kw)[source]

Compute (optional) asked outputs for the given named_inputs.

It is called by Network. End-users should simply call the operation with named_inputs as kwargs.

Parameters

named_inputs – the input values with which to feed the computation.

Returns list

Should return a list of values representing the results of running the feed-forward computation on inputs.

name = None[source]
needs = None[source]
op_needs = None[source]
op_provides = None[source]
prepare_plot_args(*args, **kw)[source]

Called by plot() to create the nx-graph and other plot-args, e.g. solution.

  • Clone the graph or merge it with the one in the plot_args (see PlotArgs.clone_or_merge_graph().

  • For the rest args, prefer PlotArgs.with_defaults() over _replace(), not to override user args.

provides = None[source]
class graphtik.pipeline.Pipeline(operations, name, *, outputs=None, predicate: NodePredicate = None, rescheduled=None, endured=None, parallel=None, marshalled=None, node_props=None, renamer=None)[source]

An operation that can compute a network-graph of operations.

Tip

  • Use compose() factory to prepare the net and build instances of this class.

  • See diacritics to understand printouts of this class.

__abstractmethods__ = frozenset({})[source]
__call__(**input_kwargs) → Solution[source]

Delegates to compute(), respecting any narrowed outputs.

__init__(operations, name, *, outputs=None, predicate: NodePredicate = None, rescheduled=None, endured=None, parallel=None, marshalled=None, node_props=None, renamer=None)[source]

For arguments, see withset() & class attributes.

Raises

ValueError

if dupe operation, with msg:

Operations may only be added once, …

__module__ = 'graphtik.pipeline'
__repr__()[source]

Display more informative names for the Operation class

_abc_impl = <_abc_data object>
compile(inputs=None, outputs=<UNSET>, predicate: NodePredicate = <UNSET>) → ExecutionPlan[source]

Produce a plan for the given args or outputs/predicate narrowed earlier.

Parameters
  • inputs – a string or a list of strings that should be fed to the needs of all operations.

  • outputs – A string or a list of strings with all data asked to compute. If None, all possible intermediate outputs will be kept. If not given, those set by a previous call to withset() or cstor are used.

  • predicate – Will be stored and applied on the next compute() or compile(). If not given, those set by a previous call to withset() or cstor are used.

Returns

the execution plan satisfying the given inputs, outputs & predicate

Raises

ValueError

  • If outputs asked do not exist in network, with msg:

    Unknown output nodes: …

  • If solution does not contain any operations, with msg:

    Unsolvable graph: …

  • If given inputs mismatched plan’s needs, with msg:

    Plan needs more inputs…

  • If net cannot produce asked outputs, with msg:

    Unreachable outputs…

compute(named_inputs: Mapping = <UNSET>, outputs: Union[Collection, str, None] = <UNSET>, predicate: NodePredicate = None, solution_class: Type[Solution] = None) → Solution[source]

Compile a plan & execute the graph, sequentially or parallel.

Attention

If the intermediate compilation is successful, the global abort run flag is reset before the execution starts.

Parameters
  • named_inputs – A mapping of names –> values that will be fed to the needs of all operations. Cloned, not modified.

  • outputs – A string or a list of strings with all data asked to compute. If None, all intermediate data will be kept.

  • predicate – filter-out nodes before compiling

  • solution_class – a custom solution factory to use

Returns

The solution, which contains the results of each executed operation, plus the inputs, in separate dictionaries.

Raises

ValueError

  • If outputs asked do not exist in network, with msg:

    Unknown output nodes: …

  • If plan does not contain any operations, with msg:

    Unsolvable graph: …

  • If given inputs mismatched plan’s needs, with msg:

    Plan needs more inputs…

  • If net cannot produce asked outputs, with msg:

    Unreachable outputs…

See also Operation.compute().

name: str = None[source]

The name for the new pipeline, used when nesting them.

needs = None[source]
op_needs = None[source]
op_provides = None[source]
property ops

A new list with all operations contained in the network.

outputs = None[source]

The outputs names (possibly None) used to compile the plan.

predicate = None[source]

The node predicate is a 2-argument callable(op, node-data) that should return true for nodes to include; if None, all nodes included.

prepare_plot_args(plot_args: graphtik.base.PlotArgs) → graphtik.base.PlotArgs[source]

Delegate to network.

provides = None[source]
withset(outputs: Union[Collection, str, None] = <UNSET>, predicate: NodePredicate = <UNSET>, *, name=None, rescheduled=None, endured=None, parallel=None, marshalled=None, node_props=None, renamer=None) → Pipeline[source]

Return a copy with a network pruned for the given needs & provides.

Parameters
  • outputs – Will be stored and applied on the next compute() or compile(). If not given, the value of this instance is conveyed to the clone.

  • predicate – Will be stored and applied on the next compute() or compile(). If not given, the value of this instance is conveyed to the clone.

  • name

    the name for the new pipeline:

    • if None, the same name is kept;

    • if True, a distinct name is devised:

      <old-name>-<uid>
      
    • otherwise, the given name is applied.

  • rescheduled – applies rescheduled to all contained operations

  • endured – applies endurance to all contained operations

  • parallel – mark all contained operations to be executed in parallel

  • marshalled – mark all contained operations to be marshalled (useful when run in parallel with a process pool).

  • renamer – see respective parameter in FunctionalOperation.withset().

Returns

A narrowed pipeline clone, which MIGHT be empty!

Raises

ValueError

  • If outputs asked do not exist in network, with msg:

    Unknown output nodes: …

graphtik.pipeline._id_bool(b)[source]
graphtik.pipeline._id_tristate_bool(b)[source]
graphtik.pipeline.compose(name, op1, *operations, outputs: Union[Collection, str, None] = None, rescheduled=None, endured=None, parallel=None, marshalled=None, nest: Union[Callable[[graphtik.base.RenArgs], str], Mapping[str, str], bool, str] = None, node_props=None) → graphtik.pipeline.Pipeline[source]

Merge or nest operations & pipelines into a new pipeline, based on the nest parameter (read below).

Operations given earlier (further to the left) override those following (further to the right), similar to set behavior (and contrary to dict).

Parameters
  • name (str) – An optional name for the graph being composed by this object.

  • op1 – syntactically force at least 1 operation

  • operations – each argument should be an operation or pipeline instance

  • nest

    a dictionary or callable corresponding to the renamer parameter of Pipeline.withset(), but the callable receives a ren_args with RenArgs.parent set when merging a pipeline, and applies the default nesting behavior (nest_any_node()) on truthies.

    Specifically:

    • if it is a dictionary, it renames any operations & data named as keys into the respective values, like that:

      • if a value is callable or str, it is fed into dep_renamed() (hint: it can be single-arg callable like: (str) -> str)

      • any other truthy value applies the default all-nodes nesting;

      Note that you cannot access the “parent” name with dictionaries, you can only apply default all-node nesting by returning a non-string truthy.

    • if it is a callable(), it is given a RenArgs instance to decide the node’s name.

      The callable may return a str for the new-name, or any other truthy/falsy value to apply or skip the default all-nodes nesting.

      For example, to nest just operation’s names (but not their dependencies), call:

      compose(
          ...,
          nest=lambda ren_args: ren_args.typ == "op"
      )
      

      Attention

      The callable SHOULD preserve any modifier on dependencies, and use dep_renamed().

    • If false (default), applies operation merging, not nesting.

    • if true, applies default operation nesting to all types of nodes.

    In all other cases, the names are preserved.

  • rescheduled – applies rescheduled to all contained operations

  • endured – applies endurance to all contained operations

  • parallel – mark all contained operations to be executed in parallel

  • marshalled – mark all contained operations to be marshalled (useful when run in parallel with a process pool).

  • node_props – Added as-is into NetworkX graph, to provide for filtering by Pipeline.withset(). Also plot-rendering affected if they match Graphviz properties, unless they start with underscore(_)

Returns

Returns a special type of operation class, which represents an entire computation graph as a single operation.

Raises

ValueError

  • If the net cannot produce the asked outputs from the given inputs.

  • If the nest callable/dictionary produced a non-string or empty name (see NetworkPipeline).

graphtik.pipeline.nest_any_node(ren_args: graphtik.base.RenArgs) → str[source]

Nest both operation & data under parent’s name (if given).

Returns

the nested name of the operation or data

Module: modifiers

modifiers change dependency behavior during compilation & execution.

The needs and provides annotated with modifiers designate, for instance, optional function arguments, or “ghost” sideffects.

Note

This module (along with op & pipeline) is what client code needs to define pipelines at import time without incurring a heavy price (~7ms on a 2019 fast PC).

Diacritics

When printed, modifiers annotate regular or sideffect dependencies with these diacritics:

>   : keyword (fn_keyword)
?   : optional (fn_keyword)
*   : vararg
+   : varargs
class graphtik.modifiers._Modifier[source]

Annotate a dependency with a combination of modifiers.

It is private, in the sense that users should use only the public factory functions of this module, or at least _modifier().

Note

User code had better call the _modifier() factory, which may return a plain string if no other arg but name is given.

_func

needed to reconstruct cstor code in cmd

_repr

pre-calculated representation

_withset(name=Ellipsis, fn_kwarg=Ellipsis, optional: graphtik.modifiers._Optionals = Ellipsis, sideffected=Ellipsis, sfx_list=Ellipsis) → Union[graphtik.modifiers._Modifier, str][source]

Make a new modifier with changes – handle with care.

Returns

Delegates to _modifier(), so returns a plain string if no args left.

property cmd

the code to reproduce it

fn_kwarg

Map my name in needs into this kw-argument of the function. is_mapped() returns it.

optional

required is None, regular optional or varargish? is_optional() returns it. All regulars are keyword.

sfx_list

At least one name denoting the sideffect modification(s) on the sideffected, performed/required by the operation.

  • If it is an empty tuple, it is an abstract sideffect,

    and is_pure_optional() returns True.

  • If not empty is_sfxed() returns true (the sideffected).

sideffected

Has value only for sideffects: the pure-sideffect string or the existing sideffected dependency.

class graphtik.modifiers._Optionals[source]

An enumeration.

graphtik.modifiers._modifier(name, fn_kwarg=None, optional: graphtik.modifiers._Optionals = None, sideffected=None, sfx_list=())[source]

A _Modifier factory that may return a plain str when no other args given.

It decides the final name and _repr for the new modifier by matching the given inputs with the _modifier_cstor_matrix.

graphtik.modifiers._modifier_cstor_matrix = {70000: None, 70010: ("sfx('%(dep)s')", "sfx('%(dep)s')", 'sfx'), 70011: ("sfxed('%(dep)s', %(sfx)s)", "sfxed('%(dep)s', %(sfx)s)", 'sfxed'), 70110: ("sfx('%(dep)s')", "sfx('%(dep)s'(?))", 'sfx'), 70200: ('%(dep)s', "'%(dep)s'(*)", 'vararg'), 70211: ("sfxed('%(dep)s', %(sfx)s)", "sfxed('%(dep)s'(*), %(sfx)s)", 'sfxed_vararg'), 70300: ('%(dep)s', "'%(dep)s'(+)", 'varargs'), 70311: ("sfxed('%(dep)s', %(sfx)s)", "sfxed('%(dep)s'(+), %(sfx)s)", 'sfxed_varargs'), 71000: ('%(dep)s', "'%(dep)s'(%(kw)s)", 'keyword'), 71011: ("sfxed('%(dep)s', %(sfx)s)", "sfxed('%(dep)s'(%(kw)s), %(sfx)s)", 'sfxed'), 71100: ('%(dep)s', "'%(dep)s'(?%(kw)s)", 'optional'), 71111: ("sfxed('%(dep)s', %(sfx)s)", "sfxed('%(dep)s'(?%(kw)s), %(sfx)s)", 'sfxed')}

Arguments-presence patterns for _Modifier constructor. Combinations missing raise errors.

graphtik.modifiers.dep_renamed(dep, ren)[source]

Renames dep as ren, or calls ren (if callable) to decide its name,

preserving any keyword() to old-name.

For sideffected dependencies it renames the dependency (not the sfx-list) – you have to do that manually with a custom renamer-function, if ever the need arises.

graphtik.modifiers.dep_singularized(dep)[source]

Yield one sideffected for each sfx in sfx_list, or iterate dep in other cases.

graphtik.modifiers.dep_stripped(dep)[source]

Return the _Modifier.sideffected if dep is sideffected, dep otherwise,

conveying all other properties of the original modifier to the stripped dependency.

graphtik.modifiers.dependency(dep)[source]

For non-sideffects, it coincides with str(); otherwise, it is the pure-sideffect string or the existing sideffected dependency stored in sideffected.

graphtik.modifiers.is_mapped(dep) → Optional[str][source]

Check if a dependency is keyword (and get it).

All non-varargish optionals are “keyword” (including sideffected ones).

Returns

the fn_kwarg

graphtik.modifiers.is_optional(dep) → bool[source]

Check if a dependency is optional.

Varargish & optional sideffects are included.

Returns

the optional

graphtik.modifiers.is_pure_sfx(dep) → bool[source]

Check if it is sideffects but not a sideffected.

graphtik.modifiers.is_sfx(dep) → bool[source]

Check if a dependency is sideffects or sideffected.

Returns

the sideffected

graphtik.modifiers.is_sfxed(dep) → bool[source]

Check if it is sideffected.

graphtik.modifiers.is_vararg(dep) → bool[source]

Check if an optionals dependency is vararg.

graphtik.modifiers.is_varargish(dep) → bool[source]

Check if an optionals dependency is varargish.

graphtik.modifiers.is_varargs(dep) → bool[source]

Check if an optionals dependency is varargs.

graphtik.modifiers.keyword(name: str, fn_kwarg: str = None)[source]

Annotate a needs that (optionally) maps inputs name –> keyword argument name.

The value of a keyword dependency is passed in as keyword argument to the underlying function.

Parameters

fn_kwarg

The argument-name corresponding to this named-input. If it is None, it is assumed the same as name, so as to always behave like a kw-type arg, and to preserve its fn-name if ever renamed.

Note

This extra mapping argument is needed either for optionals (but not varargish), or for functions with keywords-only arguments (like def func(*, foo, bar): ...), since inputs are normally fed into functions by-position, not by-name.

Returns

a _Modifier instance, even if no fn_kwarg is given OR it is the same as name.

Example:

In case the name of the function arguments is different from the name in the inputs (or just because the name in the inputs is not a valid argument-name), you may map it with the 2nd argument of keyword():

>>> from graphtik import operation, compose, keyword
>>> @operation(needs=['a', keyword("name-in-inputs", "b")], provides="sum")
... def myadd(a, *, b):
...    return a + b
>>> myadd
FunctionalOperation(name='myadd',
                    needs=['a', 'name-in-inputs'(>'b')],
                    provides=['sum'],
                    fn='myadd')
>>> graph = compose('mygraph', myadd)
>>> graph
Pipeline('mygraph', needs=['a', 'name-in-inputs'], provides=['sum'], x1 ops: myadd)
>>> sol = graph.compute({"a": 5, "name-in-inputs": 4})['sum']
>>> sol
9
graphtik.modifiers.optional(name: str, fn_kwarg: str = None)[source]

Annotate optionals needs corresponding to defaulted op-function arguments, received only if present in the inputs (when the operation is invoked).

The value of an optional dependency is passed in as a keyword argument to the underlying function.

Parameters

fn_kwarg – the name for the function argument it corresponds to; if a falsy value is given, it is assumed the same as name, to always behave like a kw-type arg and to preserve its fn-name if ever renamed.

Example:

>>> from graphtik import operation, compose, optional
>>> @operation(name='myadd',
...            needs=["a", optional("b")],
...            provides="sum")
... def myadd(a, b=0):
...    return a + b

Notice the default value 0 to the b annotated as optional argument:

>>> graph = compose('mygraph', myadd)
>>> graph
Pipeline('mygraph',
                 needs=['a', 'b'(?)],
                 provides=['sum'],
                 x1 ops: myadd)

The graph works both with and without b provided in the inputs:

>>> graph(a=5, b=4)['sum']
9
>>> graph(a=5)
{'a': 5, 'sum': 5}

Like keyword() you may map input-name to a different function-argument:

>>> operation(needs=['a', optional("quasi-real", "b")],
...           provides="sum"
... )(myadd.fn)  # Cannot wrap an operation, its `fn` only.
FunctionalOperation(name='myadd',
                    needs=['a', 'quasi-real'(?>'b')],
                    provides=['sum'],
                    fn='myadd')
graphtik.modifiers.sfx(name, optional: bool = None)[source]

sideffects denoting modifications beyond the scope of the solution.

Both needs & provides may be designated as sideffects using this modifier. They work as usual while solving the graph (compilation) but they have a limited interaction with the operation’s underlying function; specifically:

  • input sideffects must exist in the solution as inputs for an operation depending on it to kick-in, when the computation starts - but this is not necessary for intermediate sideffects in the solution during execution;

  • input sideffects are NOT fed into underlying functions;

  • output sideffects are not expected from underlying functions, unless a rescheduled operation with partial outputs designates a sideffected as canceled by returning it with a falsy value (the operation must return a dictionary).

Hint

If modifications involve some input/output, prefer the sfxed() modifier.

You may still convey these relationships by including the dependency name in the string - in the end, it's just a string - but no enforcement of any kind will be performed by graphtik, like:

>>> from graphtik import sfx
>>> sfx("price[sales_df]")
sfx('price[sales_df]')

Example:

A typical use-case is to signify changes in some “global” context, outside solution:

>>> from graphtik import operation, compose, sfx
>>> @operation(provides=sfx("lights off"))  # sideffect names can be anything
... def close_the_lights():
...    pass
>>> graph = compose('strip ease',
...     close_the_lights,
...     operation(
...         name='undress',
...         needs=[sfx("lights off")],
...         provides="body")(lambda: "TaDa!")
... )
>>> graph
Pipeline('strip ease',
                 needs=[sfx('lights off')],
                 provides=[sfx('lights off'), 'body'],
                 x2 ops: close_the_lights, undress)
>>> sol = graph()
>>> sol
{'body': 'TaDa!'}

sideffect

Note

Something has to provide a sideffect for a function needing it to execute - this could be another operation, like above, or the user-inputs; just specify some truthy value for the sideffect:

>>> sol = graph.compute({sfx("lights off"): True})
graphtik.modifiers.sfxed(dependency: str, sfx0: str, *sfx_list: str, fn_kwarg: str = None, optional: bool = None)[source]

Annotates a sideffected dependency in the solution sustaining side-effects.

Parameters

fn_kwarg – the name for the function argument it corresponds to. When optional, it becomes the same as name if falsy, so as to always behave like a kw-type arg, and to preserve the fn-name if ever renamed. When not optional, it may simply be omitted.

Like sfx() but annotating a real dependency in the solution, allowing that dependency to be present both in needs and provides of the same function.

Example:

A typical use-case is to signify columns required to produce new ones in pandas dataframes (emulated with dictionaries):

>>> from graphtik import operation, compose, sfxed
>>> @operation(needs="order_items",
...            provides=sfxed("ORDER", "Items", "Prices"))
... def new_order(items: list) -> "pd.DataFrame":
...     order = {"items": items}
...     # Pretend we get the prices from sales.
...     order['prices'] = list(range(1, len(order['items']) + 1))
...     return order
>>> @operation(
...     needs=[sfxed("ORDER", "Items"), "vat rate"],
...     provides=sfxed("ORDER", "VAT")
... )
... def fill_in_vat(order: "pd.DataFrame", vat: float):
...     order['VAT'] = [i * vat for i in order['prices']]
...     return order
>>> @operation(
...     needs=[sfxed("ORDER", "Prices", "VAT")],
...     provides=sfxed("ORDER", "Totals")
... )
... def finalize_prices(order: "pd.DataFrame"):
...     order['totals'] = [p + v for p, v in zip(order['prices'], order['VAT'])]
...     return order

To view all internal dependencies, enable DEBUG in configurations:

>>> from graphtik.config import debug_enabled
>>> with debug_enabled(True):
...     finalize_prices
FunctionalOperation(name='finalize_prices',
                    needs=[sfxed('ORDER', 'Prices'), sfxed('ORDER', 'VAT')],
                    op_needs=[sfxed('ORDER', 'Prices'), sfxed('ORDER', 'VAT')],
                    _fn_needs=['ORDER'],
                    provides=[sfxed('ORDER', 'Totals')],
                    op_provides=[sfxed('ORDER', 'Totals')],
                    _fn_provides=['ORDER'],
                    fn='finalize_prices')

Notice that declaring a single sideffected with many items in sfx_list expands into multiple "singular" sideffected dependencies in the network (check needs & op_needs above).

>>> proc_order = compose('process order', new_order, fill_in_vat, finalize_prices)
>>> sol = proc_order.compute({
...      "order_items": ["toilet-paper", "soap"],
...      "vat rate": 0.18,
... })
>>> sol
{'order_items': ['toilet-paper', 'soap'],
 'vat rate': 0.18,
 'ORDER': {'items': ['toilet-paper', 'soap'],
           'prices': [1, 2],
           'VAT': [0.18, 0.36],
           'totals': [1.18, 2.36]}}

sideffecteds

Notice that although many functions consume & produce the same ORDER dependency (check fn_needs & fn_provides above), which would otherwise have formed cycles, the wrapping operations need and provide different sideffected instances, breaking those cycles.

graphtik.modifiers.sfxed_vararg(dependency: str, sfx0: str, *sfx_list: str)[source]

Like sideffected() + vararg().

graphtik.modifiers.sfxed_varargs(dependency: str, sfx0: str, *sfx_list: str)[source]

Like sideffected() + varargs().

graphtik.modifiers.vararg(name: str)[source]

Annotate a varargish needs to be fed as function’s *args.

See also

Consult also the example test-case in: test/test_op.py:test_varargs(), in the full sources of the project.

Example:

We designate b & c as vararg arguments:

>>> from graphtik import operation, compose, vararg
>>> @operation(
...     needs=['a', vararg('b'), vararg('c')],
...     provides='sum'
... )
... def addall(a, *b):
...    return a + sum(b)
>>> addall
FunctionalOperation(name='addall',
                    needs=['a', 'b'(*), 'c'(*)],
                    provides=['sum'],
                    fn='addall')
>>> graph = compose('mygraph', addall)

The graph works with and without any of b or c inputs:

>>> graph(a=5, b=2, c=4)['sum']
11
>>> graph(a=5, b=2)
{'a': 5, 'b': 2, 'sum': 7}
>>> graph(a=5)
{'a': 5, 'sum': 5}
graphtik.modifiers.varargs(name: str)[source]

A varargish like vararg(), naming an iterable value in the inputs.

See also

Consult also the example test-case in: test/test_op.py:test_varargs(), in the full sources of the project.

Example:

>>> from graphtik import operation, compose, varargs
>>> def enlist(a, *b):
...    return [a] + list(b)
>>> graph = compose('mygraph',
...     operation(name='enlist', needs=['a', varargs('b')],
...     provides='sum')(enlist)
... )
>>> graph
Pipeline('mygraph',
                 needs=['a', 'b'(+)],
                 provides=['sum'],
                 x1 ops: enlist)

The graph works with or without b in the inputs:

>>> graph(a=5, b=[2, 20])['sum']
[5, 2, 20]
>>> graph(a=5)
{'a': 5, 'sum': [5]}
>>> graph(a=5, b=0xBAD)
Traceback (most recent call last):
...
graphtik.base.MultiValueError: Failed preparing needs:
    1. Expected needs['b'(+)] to be non-str iterables!
    +++inputs: ['a', 'b']
    +++FunctionalOperation(name='enlist', needs=['a', 'b'(+)], provides=['sum'], fn='enlist')

Attention

To avoid user mistakes, varargs do not accept str inputs (though iterables):

>>> graph(a=5, b="mistake")
Traceback (most recent call last):
...
graphtik.base.MultiValueError: Failed preparing needs:
    1. Expected needs['b'(+)] to be non-str iterables!
    +++inputs: ['a', 'b']
    +++FunctionalOperation(name='enlist',
                           needs=['a', 'b'(+)],
                           provides=['sum'],
                           fn='enlist')
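The non-str-iterable check raising the error above can be sketched like this (validate_varargs is a hypothetical helper, not graphtik's actual code):

```python
from collections.abc import Iterable


def validate_varargs(name: str, value) -> list:
    # Toy check mirroring the error above: reject strings,
    # accept any other iterable.
    if isinstance(value, str) or not isinstance(value, Iterable):
        raise ValueError(
            f"Expected needs[{name!r}(+)] to be non-str iterables!")
    return list(value)


print(validate_varargs("b", [2, 20]))   # ok: a real iterable
# validate_varargs("b", "mistake")      # would raise ValueError
```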

Module: network

compose network of operations & dependencies, compile the plan.

class graphtik.network.Network(*operations, graph=None)[source]

A graph of operations that can compile an execution plan.

needs[source]

the “base”, all data-nodes that are not produced by some operation

provides[source]

the “base”, all data-nodes produced by some operation

__abstractmethods__ = frozenset({})[source]
__init__(*operations, graph=None)[source]
Parameters
  • operations – to be added in the graph

  • graph – if None, create a new.

Raises

ValueError

if dupe operation, with msg:

Operations may only be added once, …

__module__ = 'graphtik.network'
__repr__()[source]

Return repr(self).

_abc_impl = <_abc_data object>
_append_operation(graph, operation: graphtik.base.Operation)[source]

Adds the given operation and its data requirements to the network graph.

  • Invoked during constructor only (immutability).

  • Identities are based on the name of the operation, the names of the operation’s needs, and the names of the data it provides.

  • Adds needs, operation & provides, in that order.

Parameters
  • graph – the networkx graph to append to

  • operation – operation instance to append

_apply_graph_predicate(graph, predicate)[source]
_build_execution_steps(pruned_dag, inputs: Collection, outputs: Optional[Collection]) → List[source]

Create the list of operation-nodes & instructions evaluating all operations & instructions needed a) to free memory and b) to avoid overwriting given intermediate inputs.

Parameters
  • pruned_dag – The original dag, pruned; not broken.

  • outputs – outp-names to decide whether to add (and which) evict-instructions

Instances of _EvictInstruction are inserted in steps between operation nodes to reduce the memory footprint of solutions while the computation is running. An evict-instruction is inserted whenever a need is not used by any other operation further down the DAG.
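The eviction rule can be sketched in plain Python (insert_evictions is a hypothetical helper over op-names, not graphtik's actual code):

```python
def insert_evictions(steps, op_needs):
    # Toy pass: after each operation, evict any of its needs
    # that no later operation in the plan still uses.
    out = []
    for i, op in enumerate(steps):
        out.append(op)
        later = {n for later_op in steps[i + 1:] for n in op_needs[later_op]}
        for need in op_needs[op]:
            if need not in later:
                out.append(("evict", need))
    return out


print(insert_evictions(["A", "B"], {"A": ["x"], "B": ["y"]}))
```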

_cached_plans = None[source]

Speed up compile() call and avoid a multithreading issue(?) that is occurring when accessing the dag in networkx.

_prune_graph(inputs: Union[Collection, str, None], outputs: Union[Collection, str, None], predicate: Callable[[Any, Mapping], bool] = None) → Tuple[networkx.classes.digraph.DiGraph, Collection, Collection, Collection][source]

Determines what graph steps need to run to get to the requested outputs from the provided inputs:

  • Eliminate steps that are not on a path arriving to requested outputs;

  • Eliminate unsatisfied operations: partial inputs or no outputs needed;

  • consolidate the list of needs & provides.

Parameters
  • inputs – The names of all given inputs.

  • outputs – The desired output names. This can also be None, in which case the necessary steps are all graph nodes that are reachable from the provided inputs.

  • predicate – the node predicate is a 2-argument callable(op, node-data) that should return true for nodes to include; if None, all nodes included.

Returns

a 3-tuple with the pruned_dag & the needs/provides resolved based on the given inputs/outputs (which might be a subset of all needs/outputs of the returned graph).

Use the returned needs/provides to build a new plan.

Raises

ValueError

  • if outputs asked do not exist in network, with msg:

    Unknown output nodes: …

_topo_sort_nodes(dag) → List[source]

Topo-sort dag respecting operation-insertion order to break ties.
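The tie-breaking idea can be sketched with Kahn's algorithm, where a FIFO queue preserves node-insertion order among ready nodes (a minimal sketch, not graphtik's networkx-based implementation):

```python
from collections import deque


def topo_sort(nodes, edges):
    # Toy Kahn's algorithm; the FIFO queue keeps the original
    # node-insertion order when breaking ties.
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for a, b in edges:
        successors[a].append(b)
        indegree[b] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in successors[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)
    return order


print(topo_sort(["a", "b", "c"], [("a", "c"), ("b", "c")]))
```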

compile(inputs: Union[Collection, str, None] = None, outputs: Union[Collection, str, None] = None, predicate=None) → ExecutionPlan[source]

Create or get from cache an execution-plan for the given inputs/outputs.

See _prune_graph() and _build_execution_steps() for detailed description.

Parameters
  • inputs – A collection with the names of all the given inputs. If None, all inputs that lead to given outputs are assumed. If string, it is converted to a single-element collection.

  • outputs – A collection or the name of the output name(s). If None, all reachable nodes from the given inputs are assumed. If string, it is converted to a single-element collection.

  • predicate – the node predicate is a 2-argument callable(op, node-data) that should return true for nodes to include; if None, all nodes included.

Returns

the cached or fresh new execution plan

Raises

ValueError

  • If outputs asked do not exist in network, with msg:

    Unknown output nodes: …

  • If solution does not contain any operations, with msg:

    Unsolvable graph: …

  • If given inputs mismatched plan’s needs, with msg:

    Plan needs more inputs…

  • If net cannot produce asked outputs, with msg:

    Unreachable outputs…

find_op_by_name(name) → Optional[graphtik.base.Operation][source]

Fetch the 1st operation named with the given name.

find_ops(predicate) → List[graphtik.base.Operation][source]

Scan operation nodes and fetch those satisfying predicate.

Parameters

predicate – the node predicate is a 2-argument callable(op, node-data) that should return true for nodes to include.

graph = None[source]

The networkx (Di)Graph containing all operations and dependencies, prior to compilation.

prepare_plot_args(plot_args: graphtik.base.PlotArgs) → graphtik.base.PlotArgs[source]

Called by plot() to create the nx-graph and other plot-args, e.g. solution.

  • Clone the graph or merge it with the one in the plot_args (see PlotArgs.clone_or_merge_graph()).

  • For the rest of the args, prefer PlotArgs.with_defaults() over _replace(), so as not to override user args.

class graphtik.network._EvictInstruction[source]

A step in the ExecutionPlan to evict a computed value from the solution.

It’s a step in ExecutionPlan.steps for the data-node str that frees its data-value from solution after it is no longer needed, to reduce memory footprint while computing the graph.

__module__ = 'graphtik.network'
__repr__()[source]

Return repr(self).

__slots__ = ()[source]
graphtik.network._optionalized(graph, data)[source]

Retain optionality of a data node based on all needs edges.

graphtik.network._yield_datanodes(nodes)[source]

May scan dag nodes.

graphtik.network.build_network(operations, rescheduled=None, endured=None, parallel=None, marshalled=None, node_props=None, renamer=None)[source]

The network factory that does operation merging before constructing it.

Parameters

nest – see same-named param in compose()

graphtik.network.collect_requirements(graph) → Tuple[boltons.setutils.IndexedSet, boltons.setutils.IndexedSet][source]

Collect & split datanodes in (possibly overlapping) needs/provides.
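The split can be sketched over toy operations given as (needs, provides) name-list pairs (split_requirements is a hypothetical helper; note how an intermediate datum may land in both sets):

```python
def split_requirements(operations):
    # Toy split over (needs, provides) pairs: collect, in insertion
    # order and without dupes, everything some op needs and everything
    # some op produces - the two lists may overlap for intermediate data.
    needs, provides = [], []
    for op_needs, op_provides in operations:
        needs += [n for n in op_needs if n not in needs]
        provides += [p for p in op_provides if p not in provides]
    return needs, provides


# "b" is intermediate: needed by one op, provided by another.
print(split_requirements([(["a"], ["b"]), (["b"], ["c"])]))
```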

graphtik.network.log = <Logger graphtik.network (WARNING)>

If this logger is eventually DEBUG-enabled, the string-representation of network-objects (network, plan, solution) is augmented with children’s details.

graphtik.network.unsatisfied_operations(dag, inputs: Collection) → List[source]

Traverse topologically sorted dag to collect un-satisfied operations.

Unsatisfied operations are those suffering from ANY of the following:

  • They are missing at least one compulsory need-input.

    Since the dag is ordered, by the time we reach an operation all its needs have been accounted for, so we can tell whether it is satisfied.

  • Their provided outputs are not linked to any data in the dag.

    An operation might not have any output link when _prune_graph() has broken them, due to given intermediate inputs.

Parameters
  • dag – a graph with those edges broken that arrive at existing inputs

  • inputs – an iterable of the names of the input values

Returns

a list of unsatisfied operations to prune
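The first criterion (missing compulsory needs) can be sketched as a single sweep over topologically sorted toy ops (unsatisfied_ops is a hypothetical helper; the real function also prunes ops with no output links):

```python
def unsatisfied_ops(sorted_ops, inputs):
    # Toy sweep over topologically sorted (name, needs, provides) ops:
    # an op is pruned if some compulsory need is neither a given input
    # nor produced by a satisfied upstream op.
    available = set(inputs)
    pruned = []
    for name, needs, provides in sorted_ops:
        if set(needs) <= available:
            available |= set(provides)
        else:
            pruned.append(name)
    return pruned


ops = [("f", ["a"], ["b"]), ("g", ["z"], ["w"]), ("h", ["b"], ["c"])]
print(unsatisfied_ops(ops, ["a"]))  # "g" lacks input "z"
```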

graphtik.network.yield_node_names(nodes)[source]

Yield either op.name or str(node).

graphtik.network.yield_ops(nodes)[source]

May scan (preferably) plan.steps or dag nodes.

Module: execution

execute the plan to derive the solution.

class graphtik.execution.ExecutionPlan[source]

A pre-compiled list of operation steps that can execute for the given inputs/outputs.

It is the result of the network’s compilation phase.

Note the execution plan’s attributes are on purpose immutable tuples.

net

The parent Network

needs

An IndexedSet with the input names needed to exist in order to produce all provides.

provides

An IndexedSet with the output names produced when all inputs are given.

dag

The regular (not broken) pruned subgraph of net-graph.

steps

The tuple of operation-nodes & instructions needed to evaluate the given inputs & asked outputs, free memory and avoid overwriting any given intermediate inputs.

asked_outs

When true, evictions may kick in (unless disabled by configurations), otherwise, evictions (along with prefect-evictions check) are skipped.

__abstractmethods__ = frozenset({})[source]
__dict__ = mappingproxy({'__module__': 'graphtik.execution', '__doc__': "\n A pre-compiled list of operation steps that can :term:`execute` for the given inputs/outputs.\n\n It is the result of the network's :term:`compilation` phase.\n\n Note the execution plan's attributes are on purpose immutable tuples.\n\n .. attribute:: net\n\n The parent :class:`Network`\n .. attribute:: needs\n\n An :class:`.IndexedSet` with the input names needed to exist in order to produce all `provides`.\n .. attribute:: provides\n\n An :class:`.IndexedSet` with the outputs names produces when all `inputs` are given.\n .. attribute:: dag\n\n The regular (not broken) *pruned* subgraph of net-graph.\n .. attribute:: steps\n\n The tuple of operation-nodes & *instructions* needed to evaluate\n the given inputs & asked outputs, free memory and avoid overwriting\n any given intermediate inputs.\n .. attribute:: asked_outs\n\n When true, :term:`evictions` may kick in (unless disabled by :term:`configurations`),\n otherwise, *evictions* (along with prefect-evictions check) are skipped.\n ", 'prepare_plot_args': <function ExecutionPlan.prepare_plot_args>, '__repr__': <function ExecutionPlan.__repr__>, 'validate': <function ExecutionPlan.validate>, '_check_if_aborted': <function ExecutionPlan._check_if_aborted>, '_prepare_tasks': <function ExecutionPlan._prepare_tasks>, '_handle_task': <function ExecutionPlan._handle_task>, '_execute_thread_pool_barrier_method': <function ExecutionPlan._execute_thread_pool_barrier_method>, '_execute_sequential_method': <function ExecutionPlan._execute_sequential_method>, 'execute': <function ExecutionPlan.execute>, '__dict__': <attribute '__dict__' of 'ExecutionPlan' objects>, '__abstractmethods__': frozenset(), '_abc_impl': <_abc_data object>})
__module__ = 'graphtik.execution'
__repr__()[source]

Return a nicely formatted representation string

_abc_impl = <_abc_data object>
_check_if_aborted(solution)[source]
_execute_sequential_method(solution: graphtik.execution.Solution)[source]

This method runs the graph one operation at a time in a single thread.

Parameters

solution – must contain the input values only, gets modified

_execute_thread_pool_barrier_method(solution: graphtik.execution.Solution)[source]

This method runs the graph using a parallel pool of thread executors. You may achieve lower total latency with this method if your graph is sufficiently subdivided into operations.

Parameters

solution – must contain the input values only, gets modified
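The barrier idea can be sketched with concurrent.futures, running each "ready" layer of ops in parallel and synchronizing before the next (run_layers is a hypothetical helper, not graphtik's actual scheduler):

```python
from concurrent.futures import ThreadPoolExecutor


def run_layers(layers, fns, solution):
    # Toy "barrier" scheme: every layer holds ops whose needs are already
    # satisfied; run the layer in parallel, then sync before the next one.
    with ThreadPoolExecutor() as pool:
        for layer in layers:
            results = pool.map(lambda name: fns[name](solution), layer)
            for name, res in zip(layer, results):
                solution[name] = res
    return solution


fns = {"double": lambda s: s["x"] * 2, "plus1": lambda s: s["double"] + 1}
print(run_layers([["double"], ["plus1"]], fns, {"x": 3}))
```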

_handle_task(future, op, solution) → None[source]

Un-dill parallel task results (if marshalled), and update solution / handle failure.

_prepare_tasks(operations, solution, pool, global_parallel, global_marshal) → Union[Future, graphtik.execution._OpTask, bytes][source]

Combine ops+inputs, apply marshalling, and submit to the execution pool (or not), based on global/per-op configs.

execute(named_inputs, outputs=None, *, name='', solution_class=None) → graphtik.execution.Solution[source]
Parameters
  • named_inputs – A mapping of names –> values that must contain at least the compulsory inputs that were specified when the plan was built (but cannot enforce that!). Cloned, not modified.

  • outputs – If not None, they are just checked if possible, based on provides, and scream if not.

  • name – name of the pipeline used for logging

  • solution_class – a custom solution factory to use

Returns

The solution, which contains the results of each operation executed, plus one more dictionary for the inputs, all kept in separate dictionaries.

Raises

ValueError

  • If plan does not contain any operations, with msg:

    Unsolvable graph: …

  • If given inputs mismatched plan’s needs, with msg:

    Plan needs more inputs…

  • If net cannot produce asked outputs, with msg:

    Unreachable outputs…

prepare_plot_args(plot_args: graphtik.base.PlotArgs) → graphtik.base.PlotArgs[source]

Called by plot() to create the nx-graph and other plot-args, e.g. solution.

  • Clone the graph or merge it with the one in the plot_args (see PlotArgs.clone_or_merge_graph()).

  • For the rest of the args, prefer PlotArgs.with_defaults() over _replace(), so as not to override user args.

validate(inputs: Union[Collection, str, None], outputs: Union[Collection, str, None])[source]

Scream on invalid inputs, outputs or no operations in graph.

Raises

ValueError

  • If cannot produce any outputs from the given inputs, with msg:

    Unsolvable graph: …

  • If given inputs mismatched plan’s needs, with msg:

    Plan needs more inputs…

  • If net cannot produce asked outputs, with msg:

    Unreachable outputs…

class graphtik.execution.Solution(plan, input_values: dict)[source]

The solution chain-map and execution state (e.g. overwrite or canceled operation)

__abstractmethods__ = frozenset({})[source]
__copy__()[source]

New ChainMap or subclass with a new copy of maps[0] and refs to maps[1:]

__delitem__(key)[source]
__init__(plan, input_values: dict)[source]

Initialize a ChainMap by setting maps to the given mappings. If no mappings are provided, a single empty dictionary is used.

__module__ = 'graphtik.execution'
__repr__()[source]

Return repr(self).

_abc_impl = <_abc_data object>
_layers = None[source]

An ordered mapping of plan-operations to their results (initially empty dicts). The result dictionaries pre-populate this (self) chainmap, with the 1st map (which wins all reads) being that of the last operation, and the last map being the input_values dict.
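The layering can be sketched with a plain collections.ChainMap (a simplification of the real Solution class):

```python
from collections import ChainMap

# One dict per layer: inputs last, each operation's results stacked on top.
inputs = {"a": 1}
op1_results = {"b": 2}
op2_results = {"b": 3}   # op2 recomputes "b": an overwrite

# Newest results first - the 1st map wins all reads.
solution = ChainMap(op2_results, op1_results, inputs)
print(solution["b"], solution["a"])
```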

_reschedule(dag, nbunch_to_break, op)[source]

Update dag/canceled/executed ops and return newly-canceled ops.

canceled = None[source]

A sorted set of canceled operations due to upstream failures.

check_if_incomplete() → Optional[graphtik.base.IncompleteExecutionError][source]

Return a IncompleteExecutionError if pipeline operations failed/canceled.

dag = None[source]

Cloned from plan; will be modified by removing the downstream edges of:

  • any partial outputs not provided, or

  • all provides of failed operations.

FIXME: SPURIOUS dag reversals on multi-threaded runs (see below next assertion)!

debugstr()[source]
executed = None[source]

A dictionary with keys the operations executed, and values their status:

  • no key: not executed yet

  • value None: execution ok

  • value Exception: execution failed

  • value Collection: canceled provides

is_failed(op)[source]
operation_executed(op, outputs)[source]

Invoked once per operation, with its results.

It will update executed with the operation status and if outputs were partials, it will update canceled with the unsatisfied ops downstream of op.

Parameters
  • op – the operation that completed ok

  • outputs – The named values the op actually produced, which may be a subset of its provides. Sideffects are not considered.

operation_failed(op, ex)[source]

Invoked once per operation, with its results.

It will update executed with the operation status and the canceled with the unsatisfied ops downstream of op.

property overwrites

The data in the solution that exist more than once.

A “virtual” property to a dictionary with keys the names of values that exist more than once, and values, all those values in a list, ordered in reverse compute order (1st is the last one computed).
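Over a newest-first stack of result dicts (like the ChainMap layers above), the property can be sketched like this (a hypothetical helper, not the actual implementation):

```python
from collections import Counter


def overwrites(maps):
    # Toy version: keys present in more than one layer, mapped to all
    # their values in reverse compute order (maps are newest-first).
    counts = Counter(k for m in maps for k in m)
    return {k: [m[k] for m in maps if k in m]
            for k, n in counts.items() if n > 1}


# "b" was computed twice; the 1st value listed is the last one computed.
print(overwrites([{"b": 3}, {"b": 2}, {"a": 1}]))
```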

plan = None[source]

the plan that produced this solution

prepare_plot_args(plot_args: graphtik.base.PlotArgs) → graphtik.base.PlotArgs[source]

delegate to plan, with solution

scream_if_incomplete()[source]

Raise a IncompleteExecutionError if pipeline operations failed/canceled.

solid = None[source]

A unique identifier to distinguish separate flows in execution logs.

class graphtik.execution._OpTask(op, sol, solid, result=<UNSET>)[source]

Mimic concurrent.futures.Future for sequential execution.

This intermediate class is needed to solve a pickling issue with the process executor.

__call__()[source]

Call self as a function.

__init__(op, sol, solid, result=<UNSET>)[source]

Initialize self. See help(type(self)) for accurate signature.

__module__ = 'graphtik.execution'
__repr__()[source]

Return repr(self).

__slots__ = ('op', 'sol', 'solid', 'result')
get()[source]

Call self as a function.

logname = 'graphtik.execution'
marshalled()[source]
op
result
sol
solid
graphtik.execution._do_task(task)[source]

Un-dill the simpler _OpTask & Dill the results, to pass through pool-processes.

See https://stackoverflow.com/a/24673524/548792

graphtik.execution._isDebugLogging()[source]
graphtik.execution.log = <Logger graphtik.execution (WARNING)>

If this logger is eventually DEBUG-enabled, the string-representation of network-objects (network, plan, solution) is augmented with children’s details.

graphtik.execution.task_context: None = <ContextVar name='task_context'>

Populated with the _OpTask for the currently executing operation. It does not work for parallel execution.

Module: plot

plotting handled by the active plotter & current theme.

class graphtik.plot.Plotter(theme: graphtik.plot.Theme = None, **styles_kw)[source]

a plotter renders diagram images of plottables.

default_theme[source]

The customizable Theme instance controlling theme values & dictionaries for plots.

build_pydot(plot_args: graphtik.base.PlotArgs) → pydot.Dot[source]

Build a pydot.Dot out of a Network graph/steps/inputs/outputs and return it, to be fed into Graphviz to render.

See Plottable.plot() for the arguments, sample code, and the legend of the plots.

legend(filename=None, jupyter_render: Mapping = None, theme: graphtik.plot.Theme = None)[source]

Generate a legend for all plots (see Plottable.plot() for args)

See Plotter.render_pydot() for the rest of the arguments.

plot(plot_args: graphtik.base.PlotArgs)[source]
render_pydot(dot: pydot.Dot, filename=None, jupyter_render: str = None)[source]

Render a pydot.Dot instance with Graphviz in a file and/or in a matplotlib window.

Parameters
  • dot – the pre-built pydot.Dot instance

  • filename (str) –

    Write a file or open a matplotlib window.

    • If it is a string or file, the diagram is written into the file-path

      Common extensions are .png .dot .jpg .jpeg .pdf .svg; call plot.supported_plot_formats() for more.

    • If it IS True, opens the diagram in a matplotlib window (requires matplotlib package to be installed).

    • If it equals -1, it mat-plots but does not open the window.

    • Otherwise, just return the pydot.Dot instance.

    seealso

    PlotArgs.filename

  • jupyter_render

    a nested dictionary controlling the rendering of graph-plots in Jupyter cells. If None, defaults to default_jupyter_render; you may modify those in place and they will apply for all future calls (see Jupyter notebooks).

    You may increase the height of the SVG cell output with something like this:

    plottable.plot(jupyter_render={"svg_element_styles": "height: 600px; width: 100%"})
    

Returns

the matplotlib image if filename=-1, or the given dot annotated with any jupyter-rendering configurations given in jupyter_render parameter.

See Plottable.plot() for sample code.

with_styles(**kw) → graphtik.plot.Plotter[source]

Returns a cloned plotter with a deep-copied theme modified as given.

See also Theme.withset().

class graphtik.plot.Ref(ref, default=Ellipsis)[source]

Deferred attribute reference, resolve()d on some object(s).

default
ref
resolve(*objects, default=Ellipsis)[source]

Makes re-based clone to the given class/object.
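The deferred-reference idea can be sketched with a toy class (ToyRef is hypothetical and much simpler than graphtik's Ref, which also supports re-basing):

```python
class ToyRef:
    # Toy deferred attribute reference: resolved later against one or
    # more candidate objects in order, falling back to a default.
    def __init__(self, ref, default=None):
        self.ref, self.default = ref, default

    def resolve(self, *objects):
        for obj in objects:
            try:
                return getattr(obj, self.ref)
            except AttributeError:
                pass
        return self.default


class FakeTheme:
    fill = "wheat"


print(ToyRef("fill").resolve(object(), FakeTheme))   # found on 2nd object
print(ToyRef("missing", "gray").resolve(FakeTheme))  # falls back to default
```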

class graphtik.plot.StylesStack[source]

A mergeable stack of dicts preserving provenance and style expansion.

The merge() method joins the collected stack of styles into a single dictionary, and if DEBUG (see remerge()) inserts their provenance in a 'tooltip' attribute; any lists are merged (important for multi-valued Graphviz attributes like style).

Then they are expanded.
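The list-merging behavior can be sketched like this (merge_styles is a hypothetical helper, ignoring provenance and expansion):

```python
def merge_styles(stack):
    # Toy merge: later dicts win per-key, except list values, which are
    # concatenated (multi-valued Graphviz attributes like `style`).
    merged = {}
    for styles in stack:
        for key, value in styles.items():
            if isinstance(value, list) and isinstance(merged.get(key), list):
                merged[key] = merged[key] + value
            else:
                merged[key] = value
    return merged


print(merge_styles([{"style": ["filled"], "color": "red"},
                    {"style": ["rounded"]}]))
```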

add(name, kw=Ellipsis)[source]

Adds a style by name from style-attributes, or provenanced explicitly, or fails early.

Parameters
  • name – Either the provenance name when the kw styles is given, OR just an existing attribute of style instance.

  • kw – if given and is None/empty, ignored.

expand(style: dict) → dict[source]

Apply style expansions on an already merged style.

  • Resolve any Ref instances, first against the current nx_attrs and then against the attributes of the current theme.

  • Render jinja2 templates (see _expand_styles()) with template-arguments all the attributes of the plot_args instance in use (hence much more flexible than Ref).

  • Call any callables with current plot_args and replace them by their result (even more flexible than templates).

  • Any Nones results above are discarded.

  • Workaround pydot/pydot#228 pydot-cstor not supporting styles-as-lists.

property ignore_errors

When true, keep merging despite expansion errors.

merge(debug=None) → dict[source]

Recursively merge named_styles and expand() the result style.

Parameters

debug – When not None, override config.is_debug() flag. When debug is enabled, tooltips are overridden with provenance & nx_attrs.

Returns

the merged styles

property named_styles

A list of 2-tuples: (name, dict) containing the actual styles along with their provenance.

property plot_args

Current item’s plot data with at least the PlotArgs.theme attribute.

stack_user_style(nx_attrs: dict, skip=())[source]

Appends keys in nx_attrs starting with USER_STYLE_PREFFIX into the stack.

class graphtik.plot.Theme(*, _prototype: Optional[graphtik.plot.Theme] = None, **kw)[source]

The poor man’s css-like plot theme (see also StylesStack).

To use the values contained in theme-instances, stack them in a StylesStack, and StylesStack.merge() them with style expansions.

Attention

It is recommended to use other means for plot customizations instead of directly modifying the theme’s class-attributes.

All Theme class-attributes are deep-copied when constructing new instances, to avoid mistaken modifications made while attempting to update instance-attributes instead (hint: almost all of its attributes are containers, i.e. dicts). Therefore any class-attribute modification is ignored until a new Theme instance is constructed from the patched class.
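
The deep-copy behavior can be illustrated in plain Python (a simplified model of what the paragraph above describes, not graphtik's actual implementation):

```python
import copy

class MiniTheme:
    # class-level container attribute, deep-copied per instance
    kw_graph = {"fontname": "italic"}

    def __init__(self):
        # deep-copy the class-attribute container into the instance
        self.kw_graph = copy.deepcopy(type(self).kw_graph)

t1 = MiniTheme()
t1.kw_graph["fontname"] = "bold"             # touches the instance copy only
assert MiniTheme.kw_graph["fontname"] == "italic"

MiniTheme.kw_graph["fontname"] = "courier"   # patching the class...
assert t1.kw_graph["fontname"] == "bold"     # ...is ignored by old instances
t2 = MiniTheme()                             # ...but picked up by new ones
assert t2.kw_graph["fontname"] == "courier"
```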

arch_url = 'https://graphtik.readthedocs.io/en/latest/arch.html'

the url to the architecture section explaining graphtik glossary, linked by legend.

broken_color = 'Red'
canceled_color = '#a9a9a9'
evicted = '#006666'
failed_color = 'LightCoral'
fill_color = 'wheat'
include_steps = False[source]
kw_data = {'margin': '0.04,0.02'}

Reduce margins, since sideffects take a lot of space (default margin: x=0.11, y=0.055)

kw_data_canceled = {'fillcolor': Ref('canceled_color'), 'style': ['filled'], 'tooltip': '(canceled)'}
kw_data_evicted = {'penwidth': '3', 'tooltip': '(evicted)'}
kw_data_in_solution = {'fillcolor': Ref('fill_color'), 'style': ['filled'], 'tooltip': <function make_data_value_tooltip>}
kw_data_io_choice = {0: {'shape': 'rect'}, 1: {'shape': 'invhouse'}, 2: {'shape': 'house'}, 3: {'shape': 'hexagon'}}

SHAPE changes if the node has inputs/outputs; see https://graphviz.gitlab.io/_pages/doc/info/shapes.html

kw_data_overwritten = {'fillcolor': Ref('overwrite_color'), 'style': ['filled']}
kw_data_pruned = {'color': Ref('pruned_color'), 'fontcolor': Ref('pruned_color'), 'tooltip': '(pruned)'}
kw_data_sideffect = {'color': 'blue', 'fontcolor': 'blue'}
kw_data_sideffected = {'label': <Template memory:7f5d773d0860>}
kw_data_to_evict = {'color': Ref('evicted'), 'fontcolor': Ref('evicted'), 'style': ['dashed'], 'tooltip': '(to evict)'}
kw_edge = {}[source]
kw_edge_alias = {'fontsize': 11, 'label': <Template memory:7f5d76d4add8>}

Added conditionally if alias_of found in edge-attrs.

kw_edge_broken = {'color': Ref('broken_color')}
kw_edge_endured = {'style': ['dashed']}
kw_edge_mapping_fn_kwarg = {'fontname': 'italic', 'fontsize': 11, 'label': <Template memory:7f5d76d4ae80>}

Rendered if fn_kwarg exists in nx_attrs.

kw_edge_optional = {'style': ['dashed']}
kw_edge_pruned = {'color': Ref('pruned_color')}
kw_edge_rescheduled = {'style': ['dashed']}
kw_edge_sideffect = {'color': 'blue'}
kw_graph = {'fontname': 'italic', 'graph_type': 'digraph'}
kw_graph_plottable_type = {'ExecutionPlan': {}, 'FunctionalOperation': {}, 'Network': {}, 'Pipeline': {}, 'Solution': {}}

styles per plot-type

kw_graph_plottable_type_unknown = {}[source]

For when the type-name of PlotArgs.plottable is not found in kw_plottable_type (or missing altogether).

kw_legend = {'URL': 'https://graphtik.readthedocs.io/en/latest/_images/GraphtikLegend.svg', 'fillcolor': 'yellow', 'name': 'legend', 'shape': 'component', 'style': 'filled', 'target': '_blank'}

If the 'URL' key is missing/empty, no legend icon is included in plots.

kw_op = {'name': <function Theme.<lambda>>, 'shape': 'plain', 'tooltip': <function Theme.<lambda>>}

props for the operation node (outside of the label)

kw_op_canceled = {'fillcolor': Ref('canceled_color'), 'tooltip': '(canceled)'}
kw_op_endured = {'badges': ['!'], 'penwidth': Ref('resched_thickness'), 'style': ['dashed'], 'tooltip': '(endured)'}
kw_op_executed = {'fillcolor': Ref('fill_color')}
kw_op_failed = {'fillcolor': Ref('failed_color'), 'tooltip': <Template memory:7f5d76d4aa58>}
kw_op_label = {'fn_link_target': '_top', 'fn_name': <function Theme.<lambda>>, 'fn_tooltip': <function make_fn_tooltip>, 'fn_url': Ref('fn_url'), 'op_link_target': '_top', 'op_name': <function Theme.<lambda>>, 'op_tooltip': <function make_op_tooltip>, 'op_url': Ref('op_url')}

props of the HTML-Table label for Operations

kw_op_marshalled = {'badges': ['&']}
kw_op_parallel = {'badges': ['|']}
kw_op_pruned = {'color': Ref('pruned_color'), 'fontcolor': Ref('pruned_color')}
kw_op_rescheduled = {'badges': ['?'], 'penwidth': Ref('resched_thickness'), 'style': ['dashed'], 'tooltip': '(rescheduled)'}
kw_op_returns_dict = {'badges': ['}']}
kw_step = {'arrowhead': 'vee', 'color': Ref('steps_color'), 'fontcolor': Ref('steps_color'), 'fontname': 'bold', 'fontsize': 18, 'splines': True, 'style': 'dotted'}
op_bad_html_label_keys = {'label', 'shape', 'style'}

Keys to ignore from operation styles & node-attrs, because they are handled internally by HTML-Label, and/or interact badly with that label.

op_badge_styles = {'badge_styles': {'!': {'bgcolor': '#04277d', 'color': 'white', 'tooltip': 'endured'}, '&': {'bgcolor': '#4e3165', 'color': 'white', 'tooltip': 'marshalled'}, '?': {'bgcolor': '#fc89ac', 'color': 'white', 'tooltip': 'rescheduled'}, '|': {'bgcolor': '#b1ce9a', 'color': 'white', 'tooltip': 'parallel'}, '}': {'bgcolor': '#cc5500', 'color': 'white', 'tooltip': 'returns_dict'}}}

Operation styles may specify one or more “letters” in a badges list item, as long as the “letter” is contained in the dictionary below.

op_template = <Template memory:7f5d77e47240>

Try to mimic regular Graphviz node attributes (see examples in test.test_plot.test_op_template_full() for params). TODO: fix the un-picklable jinja2 template!

overwrite_color = 'SkyBlue'
pruned_color = '#d3d3d3'
resched_thickness = 4
steps_color = '#00bbbb'
static theme_attributes(obj) → dict[source]

Extract public data attributes of a Theme instance.

withset(**kw) → graphtik.plot.Theme[source]

Returns a deep-clone modified by kw.

graphtik.plot.USER_STYLE_PREFFIX = 'graphviz.'

Any nx-attributes starting with this prefix are appended verbatim as graphviz attributes, by stack_user_style().

graphtik.plot.active_plotter_plugged(plotter: graphtik.plot.Plotter) → None[source]

Like set_active_plotter() as a context-manager, resetting back to old value.

graphtik.plot.as_identifier(s)[source]

Convert string into a valid ID, both for html & graphviz.

It must not rely on Graphviz’s HTML-like string, because it would not be a valid HTML-ID.
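
A minimal sketch of such a conversion (an assumption about the behavior, not the actual implementation): replace characters invalid in identifiers, and prefix an underscore when the string starts with a digit, since HTML IDs must not:

```python
import re

def as_identifier_sketch(s: str) -> str:
    """Rough ID-sanitizer: word-chars only, never starting with a digit."""
    s = re.sub(r"\W", "_", s)
    if s and s[0].isdigit():
        s = "_" + s
    return s

assert as_identifier_sketch("a:b c") == "a_b_c"
assert as_identifier_sketch("1st") == "_1st"
```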

graphtik.plot.default_jupyter_render = {'svg_container_styles': '', 'svg_element_styles': 'width: 100%; height: 300px;', 'svg_pan_zoom_json': '{controlIconsEnabled: true, fit: true}'}

A nested dictionary controlling the rendering of graph-plots in Jupyter cells,

as those returned from Plottable.plot() (currently as SVGs). Either modify it in place, or pass another one in the respective methods.

The following keys are supported.

Parameters
  • svg_pan_zoom_json

    arguments controlling the rendering of a zoomable SVG in Jupyter notebooks, as defined in https://github.com/ariutta/svg-pan-zoom#how-to-use. If None, defaults to this string (maps are also supported):

    "{controlIconsEnabled: true, zoomScaleSensitivity: 0.4, fit: true}"
    

  • svg_element_styles

    mostly for sizing the zoomable SVG in Jupyter notebooks. Inspect & experiment on the html page of the notebook with browser tools. If None, defaults to this string (maps are also supported):

    "width: 100%; height: 300px;"
    

  • svg_container_styles – like svg_element_styles; if None, defaults to an empty string (maps are also supported).

graphtik.plot.get_active_plotter()graphtik.plot.Plotter[source]

Get the previously active plotter instance or default one.

graphtik.plot.get_node_name(nx_node, raw=False)[source]
graphtik.plot.graphviz_html_string(s, *, repl_nl=None, repl_colon=None, xmltext=None)[source]

Workaround pydot parsing of node-id & labels by encoding as HTML.

  • pydot library does not quote DOT-keywords anywhere (pydot#111).

  • Char : on node-names denote port/compass-points and break IDs (pydot#224).

  • Non-strings are not quote_if_necessary by pydot.

  • NLs in tooltips of HTML-Table labels need substitution with the XML-entity.

  • HTML-Label attributes (xmlattr=True) need both html-escape & quote.

Attention

It does not correctly handle ID:port:compass-point format.

See https://www.graphviz.org/doc/info/lang.html

graphtik.plot.legend(filename=None, show=None, jupyter_render: Mapping = None, plotter: graphtik.plot.Plotter = None)[source]

Generate a legend for all plots (see Plottable.plot() for args)

Parameters
  • plotter – override the active plotter

  • show

    Deprecated since version v6.1.1: Merged with filename param (filename takes precedence).

See Plotter.render_pydot() for the rest of the arguments.

graphtik.plot.make_data_value_tooltip(plot_args: graphtik.base.PlotArgs)[source]

Called on datanodes, when solution exists.

graphtik.plot.make_fn_tooltip(plot_args: graphtik.base.PlotArgs)[source]

the sources of the operation-function

graphtik.plot.make_op_tooltip(plot_args: graphtik.base.PlotArgs)[source]

the string-representation of an operation (name, needs, provides)

graphtik.plot.make_template(s)[source]

Makes dedented jinja2 templates supporting extra escape filters for Graphviz:

ee

Like default escape filter e, but Nones/empties evaluate to false. Needed because the default escape filter breaks xmlattr filter with Nones .

eee

Escape for when writing inside HTML-strings. Collapses nones/empties (unlike default e).

hrefer

Dubious escape for when writing URLs inside Graphviz attributes. Does NOT collapse nones/empties (like default e).

graphtik.plot.quote_html_tooltips(s)[source]

Graphviz HTML-Labels ignore NLs & TABs.

graphtik.plot.quote_node_id(s)[source]

See graphviz_html_string()

graphtik.plot.remerge(*containers, source_map: list = None)[source]

Merge recursively dicts and extend lists with boltons.iterutils.remap()

screaming on type conflicts, i.e. a list needs a list, etc., unless one of them is None, which is ignored.

Parameters
  • containers – a list of dicts or lists to merge; later ones take precedence (last-wins). If source_map is given, these must be 2-tuples of (name: container).

  • source_map

    If given, it must be a dictionary, and containers arg must be 2-tuples like (name: container). The source_map will be populated with mappings between path and the name of the container it came from.

    Warning

    if source_map is given, the order of input dictionaries is NOT preserved in the results (important if your code relies on PY3.7's insertion-ordered dictionaries).

Returns

returns a new, merged top-level container.

Example

>>> defaults = {
...     'subdict': {
...         'as_is': 'hi',
...         'overridden_key1': 'value_from_defaults',
...         'overridden_key2': 2222,
...         'merged_list': ['hi', {'untouched_subdict': 'v1'}],
...     }
... }
>>> overrides = {
...     'subdict': {
...         'overridden_key1': 'overridden value',
...         'overridden_key2': 5555,
...         'merged_list': ['there'],
...     }
... }
>>> from graphtik.plot import remerge
>>> source_map = {}
>>> remerge(
...     ("defaults", defaults),
...     ("overrides", overrides),
...     source_map=source_map)
{'subdict': {'as_is': 'hi',
             'overridden_key1': 'overridden value',
             'merged_list': ['hi', {'untouched_subdict': 'v1'}, 'there'],
             'overridden_key2': 5555}}
>>> source_map
{('subdict', 'as_is'): 'defaults',
 ('subdict', 'overridden_key1'): 'overrides',
 ('subdict', 'merged_list'): ['defaults', 'overrides'],
 ('subdict',): 'overrides',
 ('subdict', 'overridden_key2'): 'overrides'}
graphtik.plot.set_active_plotter(plotter: graphtik.plot.Plotter)[source]

The default instance to render plottables,

unless overridden with a plotter argument in Plottable.plot().

Parameters

plotter – the plotter instance to install

graphtik.plot.supported_plot_formats() → List[str][source]

Return all supported plot formats, discovered automatically from pydot extensions.

Module: config

configurations for network execution, and utilities on them.

See also

methods plot.active_plotter_plugged(), plot.set_active_plotter(), plot.get_active_plotter()

Plot configurations are not defined here, to avoid polluting the import space until they are actually needed.

Note

The context-manager functions XXX_plugged() or XXX_enabled() do NOT launch their code blocks using contextvars.Context.run() in a separate “context”, so any changes to these or other context-vars will persist (unless they are also made within such context-managers)

graphtik.config.abort_run()[source]

Sets the abort run global flag, to halt all currently executing and future plans.

This global flag is reset when any Pipeline.compute() is executed, or manually, by calling reset_abort().

graphtik.config.debug_enabled(enabled)[source]

Like set_debug() as a context-manager, resetting back to old value.

See also

disclaimer about context-managers the top of this config module.

graphtik.config.evictions_skipped(enabled)[source]

Like set_skip_evictions() as a context-manager, resetting back to old value.

See also

disclaimer about context-managers the top of this config module.

graphtik.config.execution_pool_plugged(pool: Optional[Pool])[source]

Like set_execution_pool() as a context-manager, resetting back to old value.

See also

disclaimer about context-managers the top of this config module.

graphtik.config.get_execution_pool() → Optional[Pool][source]

Get the process-pool for parallel plan executions.

graphtik.config.is_abort()[source]

Return True if networks have been signaled to stop execution.

graphtik.config.is_debug() → Optional[bool][source]

see set_debug()

graphtik.config.is_endure_operations() → Optional[bool][source]

see set_endure_operations()

graphtik.config.is_marshal_tasks() → Optional[bool][source]

see set_marshal_tasks()

graphtik.config.is_parallel_tasks() → Optional[bool][source]

see set_parallel_tasks()

graphtik.config.is_reschedule_operations() → Optional[bool][source]

see set_reschedule_operations()

graphtik.config.is_skip_evictions() → Optional[bool][source]

see set_skip_evictions()

graphtik.config.operations_endured(enabled)[source]

Like set_endure_operations() as a context-manager, resetting back to old value.

See also

disclaimer about context-managers the top of this config module.

graphtik.config.operations_reschedullled(enabled)[source]

Like set_reschedule_operations() as a context-manager, resetting back to old value.

See also

disclaimer about context-managers the top of this config module.

graphtik.config.reset_abort()[source]

Reset the abort run global flag, to permit plan executions to proceed.

graphtik.config.set_debug(enabled)[source]

When true, increase details on string-representation of network objects and errors.

Parameters

enabled

  • None, False, string(0, false, off, no): Disabled

  • 1: Enable ALL DEBUG_XXX

  • integers: Enable respective DEBUG_XXX bit-field constants

  • anything else: Enable ALL DEBUG_XXX

Affected behavior:

  • net objects print details recursively;

  • plotted SVG diagrams include style-provenance as tooltips;

  • Sphinx extension also saves the original DOT file next to each image (see graphtik_save_dot_files).

Note

The default is controlled with GRAPHTIK_DEBUG environment variable.

Note that enabling this flag is different from enabling logging in DEBUG, since it affects all code (e.g. interactive printing in a debugger session, exceptions, doctests), not just debug statements (which are also affected by this flag).

Returns

a “reset” token (see ContextVar.set())

graphtik.config.set_endure_operations(enabled)[source]

Enable/disable globally endurance to keep executing even if some operations fail.

Parameters

enable

  • If None (default), respect the flag on each operation;

  • If true/false, force it for all operations.

Returns

a “reset” token (see ContextVar.set())


graphtik.config.set_execution_pool(pool: Optional[Pool])[source]

Set the process-pool for parallel plan executions.

You may also have to call set_marshal_tasks() to resolve pickling issues.

graphtik.config.set_marshal_tasks(enabled)[source]

Enable/disable globally marshalling of parallel operations, …

inputs & outputs with dill, which might help for pickling problems.

Parameters

enable

  • If None (default), respect the respective flag on each operation;

  • If true/false, force it for all operations.

Returns

a “reset” token (see ContextVar.set())

graphtik.config.set_parallel_tasks(enabled)[source]

Enable/disable globally parallel execution of operations.

Parameters

enable

  • If None (default), respect the respective flag on each operation;

  • If true/false, force it for all operations.

Returns

a “reset” token (see ContextVar.set())

graphtik.config.set_reschedule_operations(enabled)[source]

Enable/disable globally rescheduling for operations returning only partial outputs.

Parameters

enable

  • If None (default), respect the flag on each operation;

  • If true/false, force it for all operations.

Returns

a “reset” token (see ContextVar.set())


graphtik.config.set_skip_evictions(enabled)[source]

When true, globally disable evictions, to keep all intermediate solution values, …

regardless of asked outputs.

Returns

a “reset” token (see ContextVar.set())

graphtik.config.tasks_in_parallel(enabled)[source]

Like set_parallel_tasks() as a context-manager, resetting back to old value.

See also

disclaimer about context-managers the top of this config module.

graphtik.config.tasks_marshalled(enabled)[source]

Like set_marshal_tasks() as a context-manager, resetting back to old value.

See also

disclaimer about context-managers the top of this config module.

Module: base

Generic utilities, exceptions and operation & plottable base classes.

exception graphtik.base.AbortedException[source]

Raised from Network when abort_run() is called, and contains the solution …

with any values populated so far.

exception graphtik.base.IncompleteExecutionError[source]

Reported when any endured/rescheduled operations were canceled.

The exception contains 3 arguments:

  1. the causal errors and conditions (1st arg),

  2. the list of collected exceptions (2nd arg), and

  3. the solution instance (3rd argument), to interrogate for more.

Returned by check_if_incomplete() or raised by scream_if_incomplete().

exception graphtik.base.MultiValueError[source]
class graphtik.base.Operation[source]

An abstract class representing an action with compute().

abstract compute(named_inputs, outputs=None)[source]

Compute (optional) asked outputs for the given named_inputs.

It is called by Network. End-users should simply call the operation with named_inputs as kwargs.

Parameters

named_inputs – the input values with which to feed the computation.

Returns list

Should return a list of values representing the results of running the feed-forward computation on inputs.

name: str = None[source]
needs: Items = None[source]
op_needs: Items = None[source]
op_provides: Items = None[source]
provides: Items = None[source]
class graphtik.base.PlotArgs[source]

All the args of a Plottable.plot() call,

check this method for a more detailed explanation of its attributes.

clone_or_merge_graph(base_graph) → graphtik.base.PlotArgs[source]

Overlay graph over base_graph, or clone base_graph, if no attribute.

Returns

the updated plot_args

property clustered

Collect the actual clustered dot_nodes among the given nodes.

property clusters

Either a mapping of node-names to dot(.)-separated cluster-names, or false/true to enable plotter’s default clustering of nodes based on their dot-separated name parts.

Note that if it’s None (default), the plotter will cluster based on node-names, BUT the Plan may replace the None with a dictionary with the “pruned” cluster (when its dag differs from network’s graph); to suppress the pruned-cluster, pass a truthy, NON-dictionary value.

property dot

Where to add graphviz nodes & stuff.

property dot_item

The pydot-node/edge created

property filename

where to write image or show in a matplotlib window

property graph

what to plot (or the “overlay” when calling Plottable.plot())

property inputs

the list of input names.

property jupyter_render

jupyter configuration overrides

property kw_render_pydot
property name

The name of the graph in the dot-file (important for cmaps).

property nx_attrs

Attributes gotten from nx-graph for the given graph/node/edge. They are NOT a clone, so any modifications affect the nx graph.

property nx_item

The node (data(str) or Operation) or edge as gotten from nx-graph.

property outputs

the list of output names.

property plottable

who is the caller

property plotter

If given, overrides the active plotter.

property solution

Contains the computed results, which might be different from plottable.

property steps

the list of execution plan steps.

property theme

If given, overrides the plot theme the plotter will use. It can be any mapping, in which case it overrides the current theme.

with_defaults(*args, **kw) → graphtik.base.PlotArgs[source]

Replace only fields with None values.

class graphtik.base.Plottable[source]

Classes wishing to plot their graphs should inherit this and …

implement property plot to return a “partial” callable that somehow ends up calling plot.render_pydot() with the graph or any other args bound appropriately. The purpose is to avoid copying this function & documentation around.

plot(filename: Union[str, bool, int] = None, show=None, *, plotter: graphtik.plot.Plotter = None, theme: graphtik.plot.Theme = None, graph: networkx.Graph = None, name=None, steps=None, inputs=None, outputs=None, solution: graphtik.network.Solution = None, clusters: Mapping = None, jupyter_render: Union[None, Mapping, str] = None) → pydot.Dot[source]

Entry-point for plotting ready made operation graphs.

Parameters
  • filename (str) –

    Write a file or open a matplotlib window.

    • If it is a string or file, the diagram is written into the file-path

      Common extensions are .png .dot .jpg .jpeg .pdf .svg; call plot.supported_plot_formats() for more.

    • If it IS True, opens the diagram in a matplotlib window (requires matplotlib package to be installed).

    • If it equals -1, it mat-plots but does not open the window.

    • Otherwise, just return the pydot.Dot instance.

    seealso

    PlotArgs.filename, Plotter.render_pydot()

  • plottable

    the plottable that ordered the plotting. Automatically set downstream to one of:

    op | pipeline | net | plan | solution | <missing>
    
    seealso

    PlotArgs.plottable

  • plotter

    the plotter to handle plotting; if none, the active plotter is used by default.

    seealso

    PlotArgs.plotter

  • theme

    Any plot theme or dictionary overrides; if none, the Plotter.default_theme of the active plotter is used.

    seealso

    PlotArgs.theme

  • name

    if not given, the dot-lang graph is named “G”; it must be unique when referring to generated CMAPs. No need to quote it; it is handled by the plotter, downstream.

    seealso

    PlotArgs.name

  • graph (str) –

    (optional) A nx.Digraph with overrides to merge with the graph provided by underlying plottables (translated by the active plotter).

    It may contain graph, node & edge attributes for any usage, but these conventions apply:

    'graphviz.xxx' (graph/node/edge attributes)

    Any “user-overrides” with this prefix are sent verbatim as Graphviz attributes.

    Note

    Remember to escape those values as Graphviz HTML-Like strings (use plot.graphviz_html_string()).

    no_plot (node/edge attribute)

    element skipped from plotting (see “Examples:” section, below)

    seealso

    PlotArgs.graph

  • inputs

    an optional name list, any nodes in there are plotted as a “house”

    seealso

    PlotArgs.inputs

  • outputs

    an optional name list, any nodes in there are plotted as an “inverted-house”

    seealso

    PlotArgs.outputs

  • solution

    an optional dict with values to annotate nodes, drawn “filled” (currently content not shown, but node drawn as “filled”). It extracts more infos from a Solution instance, such as, if solution has an executed attribute, operations contained in it are drawn as “filled”.

    seealso

    PlotArgs.solution

  • clusters

    Either a mapping, or false/true to enable the plotter’s default clustering of nodes based on their dot-separated name parts.

    Note that if it’s None (default), the plotter will cluster based on node-names, BUT the Plan may replace the None with a dictionary with the “pruned” cluster (when its dag differs from network’s graph); to suppress the pruned-cluster, pass a truthy, NON-dictionary value.

    Practically, when it is a:

    • dictionary of node-names –> dot(.)-separated cluster-names, it is respected, even if empty;

    • truthy: cluster based on dot(.)-separated node-name parts;

    • falsy: don’t cluster at all.

    seealso

    PlotArgs.clusters

  • jupyter_render

    a nested dictionary controlling the rendering of graph-plots in Jupyter cells. If None, defaults to default_jupyter_render; you may modify it in place and it will apply for all future calls (see Jupyter notebooks).

    seealso

    PlotArgs.jupyter_render

  • show

    Deprecated since version v6.1.1: Merged with filename param (filename takes precedence).

Returns

a pydot.Dot instance (for a reference to a similar API, visit: https://pydotplus.readthedocs.io/reference.html#pydotplus.graphviz.Dot)

The pydot.Dot instance returned is rendered directly in Jupyter/IPython notebooks as SVG images (see Jupyter notebooks).

Note that the graph argument is absent: each Plottable provides its own graph internally; use render_pydot() directly to provide a different graph.

Graphtik Legend

NODES:

oval

function

egg

subgraph operation

house

given input

inverted-house

asked output

polygon

given both as input & asked as output (what?)

square

intermediate data, neither given nor asked.

red frame

evict-instruction, to free up memory.

filled

data node has a value in solution OR function has been executed.

thick frame

function/data node in execution steps.

ARROWS

solid black arrows

dependencies (source-data need-ed by target-operations, sources-operations provides target-data)

dashed black arrows

optional needs

blue arrows

sideffect needs/provides

wheat arrows

broken dependency (provide) during pruning

green-dotted arrows

execution steps labeled in succession

To generate the legend, see legend().

Examples:

>>> from graphtik import compose, operation
>>> from graphtik.modifiers import optional
>>> from operator import add
>>> pipeline = compose("pipeline",
...     operation(name="add", needs=["a", "b1"], provides=["ab1"])(add),
...     operation(name="sub", needs=["a", optional("b2")], provides=["ab2"])(lambda a, b=1: a-b),
...     operation(name="abb", needs=["ab1", "ab2"], provides=["asked"])(add),
... )
>>> pipeline.plot(True);                       # plot just the graph in a matplotlib window # doctest: +SKIP
>>> inputs = {'a': 1, 'b1': 2}
>>> solution = pipeline(**inputs)              # now plots will include the execution-plan

The solution is also plottable:

>>> solution.plot('plot1.svg');                                                             # doctest: +SKIP

or you may augment the pipeline with the requested inputs/outputs & solution:

>>> pipeline.plot('plot1.svg', inputs=inputs, outputs=['asked', 'b1'], solution=solution);  # doctest: +SKIP

In any case you may get the pydot.Dot object (n.b. it is renderable in Jupyter as-is):

>>> dot = pipeline.plot(solution=solution);
>>> print(dot)
digraph pipeline {
fontname=italic;
label=<pipeline>;
<a> [fillcolor=wheat, margin="0.04,0.02", shape=invhouse, style=filled, tooltip="(int) 1"];
...

You may use the PlotArgs.graph overlay to skip certain nodes (or edges) from the plots:

>>> import networkx as nx
>>> g = nx.DiGraph()  # the overlay
>>> to_hide = pipeline.net.find_op_by_name("sub")
>>> g.add_node(to_hide, no_plot=True)
>>> dot = pipeline.plot(graph=g)
>>> assert "<sub>" not in str(dot), str(dot)
abstract prepare_plot_args(plot_args: graphtik.base.PlotArgs) → graphtik.base.PlotArgs[source]

Called by plot() to create the nx-graph and other plot-args, e.g. solution.

class graphtik.base.RenArgs[source]

Arguments received by callbacks in rename() and operation nesting.

property name

Alias for field number 2

property op

the operation currently being processed

property parent

The parent Pipeline of the operation currently being processed. It has a value only when doing operation nesting from compose().

property typ

what is currently being renamed, one of the string: (op | needs | provides | aliases)

class graphtik.base.Token(*args)[source]

Guarantee equality, not(!) identity, across processes.

hashid
graphtik.base.aslist(i, argname, allowed_types=<class 'list'>)[source]

Utility to accept singular strings as lists, and None –> [].
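
Its behavior, as described, can be sketched like so (a hypothetical simplification of the utility, not the actual code):

```python
def aslist_sketch(i, argname):
    """None -> [], strings wrapped in a list, other iterables listed."""
    if i is None:
        return []
    if isinstance(i, str):
        return [i]
    try:
        return list(i)
    except TypeError:
        raise ValueError(f"{argname} cannot be converted to a list: {i!r}")

assert aslist_sketch(None, "needs") == []
assert aslist_sketch("abc", "needs") == ["abc"]
assert aslist_sketch(("a", "b"), "needs") == ["a", "b"]
```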

graphtik.base.astuple(i, argname, allowed_types=<class 'tuple'>)[source]
graphtik.base.first_solid(*tristates, default=None)[source]

Utility combining multiple tri-state booleans.
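
A tri-state boolean is True, False, or None (“unset”); combining them plausibly means the first non-None value wins (a sketch inferred from the one-line description, not the actual code):

```python
def first_solid_sketch(*tristates, default=None):
    """Return the first non-None tri-state, else the default."""
    for t in tristates:
        if t is not None:
            return t
    return default

assert first_solid_sketch(None, None, False, True) is False
assert first_solid_sketch(None, default=True) is True
```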

graphtik.base.func_name(fn, default=Ellipsis, mod=None, fqdn=None, human=None, partials=None) → Optional[str][source]

FQDN of fn, descending into partials to print their args.

Parameters
  • default – What to return if it fails; by default it raises.

  • mod – when true, prepend module like module.name.fn_name

  • fqdn – when true, use __qualname__ (instead of __name__) which differs mostly on methods, where it contains class(es), and locals, respectively (PEP 3155). Sphinx uses fqdn=True for generating IDs.

  • human – when true, explain built-ins, and assume partials=True (if that was None)

  • partials – when true (or omitted & human true), partials denote their args like fn({"a": 1}, ...)

Returns

a (possibly dot-separated) string, or default (unless it is ...).

Raises

Only if default is ..., otherwise, errors debug-logged.

Examples

>>> func_name(func_name)
'func_name'
>>> func_name(func_name, mod=1)
'graphtik.base.func_name'
>>> func_name(func_name.__format__, fqdn=0)
'__format__'
>>> func_name(func_name.__format__, fqdn=1)
'function.__format__'

Even functions defined in docstrings are reported:

>>> def f():
...     def inner():
...         pass
...     return inner
>>> func_name(f, mod=1, fqdn=1)
'graphtik.base.f'
>>> func_name(f(), fqdn=1)
'f.<locals>.inner'

On failures, arg default controls the outcomes:

TBD

graphtik.base.func_source(fn, default=Ellipsis, human=None) → Optional[Tuple[str, int]][source]

Like inspect.getsource() supporting partials.

Parameters
  • default – If given, it should be a 2-tuple matching the return type, or ... to raise.

  • human – when true, denote builtins like python does

graphtik.base.func_sourcelines(fn, default=Ellipsis, human=None) → Optional[Tuple[str, int]][source]

Like inspect.getsourcelines() supporting partials.

Parameters

default – If given, it should be a 2-tuple matching the return type, or ... to raise.

graphtik.base.jetsam(ex, locs, *salvage_vars: str, annotation='jetsam', **salvage_mappings)[source]

Annotate exception with salvaged values from locals() and raise!

Parameters
  • ex – the exception to annotate

  • locs

    locals() from the context-manager’s block containing vars to be salvaged in case of exception

    ATTENTION: wrapped function must finally call locals(), because locals dictionary only reflects local-var changes after call.

  • annotation – the name of the attribute to attach on the exception

  • salvage_vars – local variable names to save as is in the salvaged annotations dictionary.

  • salvage_mappings – a mapping of destination-annotation-keys –> source-locals-keys; if a source is callable, the value to salvage is retrieved by calling value(locs). They take precedence over salvage_vars.

Raises

any exception raised by the wrapped function, annotated with values assigned as attributes on this context-manager

  • Any attributes attached on this manager are attached as a new dict on the raised exception as new jetsam attribute with a dict as value.

  • If the exception is already annotated, any new items are inserted, but existing ones are preserved.

Example:

Call it with managed-block’s locals() and tell which of them to salvage in case of errors:

try:
    a = 1
    b = 2
    raise Exception()
except Exception as ex:
    jetsam(ex, locals(), "a", b="salvaged_b", c_var="c")
    raise

And then from a REPL:

import sys
sys.last_value.jetsam
{'a': 1, 'salvaged_b': 2, 'c_var': None}

**Reason:**

Graphs may become arbitrarily deep. Debugging such graphs is notoriously hard.

The purpose is not to require a debugger-session to inspect the root-causes (without precluding one).

Naively salvaging values with a simple try/except block around each function blocks the debugger from landing on the real cause of the error – it would land on that block instead, possibly many nested levels above the cause.

Module: sphinxext

Extends Sphinx with graphtik directive for plotting from doctest code.

class graphtik.sphinxext.DocFilesPurgatory[source]

Keeps 2-way associations of docs <–> abs-files, to purge them.

doc_fpaths: Dict[str, Set[Path]] = None[source]

Multi-map: docnames -> abs-file-paths

purge_doc(docname: str)[source]

Remove doc-files not used by any other doc.

register_doc_fpath(docname: str, fpath: pathlib.Path)[source]

Must be absolute, for purging to work.

class graphtik.sphinxext.GraphtikDoctestDirective(name, arguments, options, content, lineno, content_offset, block_text, state, state_machine)[source]

Embeds plots from doctest code (see graphtik).

class graphtik.sphinxext.GraphtikTestoutputDirective(name, arguments, options, content, lineno, content_offset, block_text, state, state_machine)[source]

Like graphtik directive, but emulates doctest testoutput blocks.

class graphtik.sphinxext.dynaimage(rawsource='', *children, **attributes)[source]

Writes the tag stored in its tag attribute (<img> for PNGs, <object> for SVGs/PDFs).

class graphtik.sphinxext.graphtik_node(rawsource='', *children, **attributes)[source]

Non-writtable node wrapping (literal-block + figure(dynaimage)) nodes.

The figure gets the :name: & :align: options; the contained dynaimage gets the :height:, :width:, :scale:, :classes: options.

Graphtik Changelog

Changelog

v8.3.0.dev0 (12 May 2020, @ankostis): mapped–>keyword, drop sol-finalize
  • BREAK: rename mapped --> keyword, which conveys the most important meaning.

  • DROP Solution.finalized() method – has stopped being used to reverse values since sfxed have been introduced (v7+).

  • doc(modifiers): explain diacritic symbols of dependencies when in printouts.

v8.2.0 (11 May 2020, @ankostis): custom Solutions, Task-context
  • FEAT(exe): compute() supports custom Solution classes.

  • FEAT(exe): underlying functions gain access to wrapping Operation with execution.task_context.

v8.1.0 (11 May 2020, @ankostis): drop last plan, Rename/Nest, Netop–>Pipeline, purify modules
  • DROP(pipeline): After solution class was introduced, last_plan attribute was redundant.

  • ENH(op): Rename & Nest operations with dictionary or callable.

  • FEAT(pipeline): NO_RESULT_BUT_SFX token can cancel regular data but leave sideffects of a rescheduled op.

  • REFACT: revert module splits and arrive back to base.py, op.py & pipeline.py, to facilitate development with smaller files, but still with very few import-time dependencies.

    Importing project composition classes takes less than 4ms in a fast 2019 PC (down from 300ms).

  • FIX(plot): updated Legend, which had become outdated since v6+.

  • fix(modifiers): dep_renamed() was faking sideffect-renaming only on repr() (but fix not stressed, bc v8.0.x is not actually released).

  • enh(pipe): accept a dictionary with renames when doing operation nesting (instead of callables or truthies).

  • refact(op): force abstract Operation to be Plottable.

  • enh(modifiers): add _Modifier.cmd with code to reproduce modifier.

v8.0.2 (7 May 2020, @ankostis): re-MODULE; sideffect –> sfx; all DIACRITIC Modifiers; invert “merge” meaning

–((superseded immediately by v8.0.1 & v8.0.2 with more restructurings))–

  • BREAK: restructured netop && network modules:

    • BREAK: stopped(!) importing config top-level.

    • BREAK: network module was split into execution, which now contains plan+solution;

    • BREAK: unified modules op + netop --> composition.

    • DOC: module dependencies diagram in API Reference; importing composition is now x60 faster (300ms –> 5ms).

  • BREAK: sideffect modifier functions shortened to sfx() & sfxed().

    • FEAT: +Sideffected varargish – sideffecteds now fully mirror a regular dependency.

    • ENH: change visual representation of modifiers with DIACRITICS only.

    • refact(modifiers): use cstor matrix to combine modifier arguments; new utility method for renaming dependencies dep_renamed() (useful when Nesting, see below).

    • ENH: possible to rename also sideffects; the actual sideffect string is now stored in the modifier.

  • BREAK/ENH: invert “merge” meaning with the (newly introduced) “nest” (operation nesting); the default is now merge:

    • FEAT: introduce the NULL_OP operation that can “erase” an existing operation when merging pipelines.

    • ENH: compose(..., nest=nest_cb) where the callback accepts class .RenArgs and can perform any kind of renaming on data + operations before combining pipelines.

    • doc: “merge” identically-named ops override each other, “nest” means they are prefixed, “combine” means both operations.

    • DOC: re-written a merge-vs-nest tutorial for humanity.

  • DROP(op): parent attribute is no longer maintained – operation identity now based only on name, which may implicitly be nested by dots(.).

  • ENH(plot): accept bare dictionary as theme overrides when plotting.

  • doc: fix site configuration for using the standard <s5defs> include for colored/font-size sphinx roles.

v8.0.0, v8.0.1 (7 May 2020, @ankostis): retracted bc found more restructurings

–((all changes above in v8.0.2 actually happened in these 2 releases))–

v7.1.2 (6 May 2020, @ankostis): minor reschedule fixes and refactoring

Actually it contains just what was destined for v7.1.1.

  • FIX(op): v7.0.0 promise that op.__call__ delegates to compute() was a fake; now it is fixed.

  • fix(config): endurance flags were misbehaving.

  • refact(net): factor out a _reschedule() method for both endurance & rescheduled ops.

  • feat(build): +script to launch pytest on a local clone repo before pushing.

v7.1.1 (5 May 2020, @ankostis): canceled, by mistake contained features for 8.x

(removed from PyPi/RTD, new features by mistake were removed from v7.1.2)

v7.1.0 (4 May 2020, @ankostis): Cancelable sideffects, theme-ize & expand everything

[diagram: graphtik-v4.4+ flowchart – operations –> compose –> network –> compile (with inputs, outputs, predicate) –> plan –> execute (with input values) –> solution; solution re-prunes the dag on reschedule]

Should have been a MAJOR BUMP due to breaking renames, but we had just come out of the 6.x –> 7.x major bump.

NET
  • FIX: rescheduled operations were not canceling all downstream deps & operations.

  • FEAT: Cancelable sideffects: a rescheduled operation may return a “falsy” sideffect to cancel downstream operations.

    • NO_RESULT constant cancels also sideffects.

  • ENH(OP): more intuitive API: compute() may be called with no args, or with a single string as the outputs param. Operation’s __call__ now delegates to compute(); to quickly experiment with the function, access it from the operation’s FunctionalOperation.fn attribute.

MODIFIERS
  • BREAK: renamed modifiers sol_sideffect --> sideffected, to reduce terminology mental load for the users.

  • ENH: support combinations of modifiers (e.g. optional sideffects).

  • REFACT: convert modifiers classes –> factory functions, producing _Modifier instances (normally not managed by the user).

PLOT
  • ENH: Theme-ize all; expand callables (beyond Refs and templates).

  • BREAK: rename Theme.with_set() –> Theme.withset().

  • break: pass verbatim any nx-attrs starting with 'graphviz.' into plotting process (instead of passing everything but private attributes).

  • break: rename graph/node/edge control attributes:

    • _no_plot --> no_plot.

    • _alias_of --> alias_of.

  • FEAT: draw combined pipelines as clusters

  • enh: corrected and richer styles for data nodes.

  • enh: unify op-badges on plot with diacritics in their string-representation.

  • ENH(sphinxext): clicking on an SVG opens the diagram in a new tab.

  • fix(sphinxext): don’t choke on duplicate :name: in graphtik directives.

  • fix(sphinxext): fix deprecation of sphinx add_object() with note_object().

Various
  • break: raise TypeError instead of ValueError wherever it must.

  • DOC(operations): heavily restructured chapter - now might stand alone. Started using the pipeline name more often.

  • doc: use an “endured” diagram as the sample in the project opening (instead of an outdated, plain simple one).

  • doc: renamed document: composition.py --> pipelines.py

v7.0.0 (28 Apr 2020, @ankostis): In-solution sideffects, unified OpBuilder, plot badges
  • BREAK: stacking of solution results changed to the more natural “chronological” one (outputs written later in the solution override previous ones).

    Previously it was the opposite during execution while reading intermediate solution values (1st result or user-inputs won), and it was “reversed” to regular chronological right before the solution was finalized.

  • FEAT(op, netop): add __name__ attribute to operations, to disguise as functions.

  • BREAK(op): The operation() factory function (used to be class) now behave like a regular decorator when fn given in the first call, and constructs the FunctionalOperation without a need to call again the factory.

    Specifically the last empty call at the end () is not needed (or possible):

    operation(str, name=...)()
    

    became simply like that:

    operation(str, name=...)
    
  • DROP(NET): _DataNode and use str + modifier-classes as data-nodes;

MODIFIERS:
  • BREAK: rename arg --> mapped, which conveys the correct meaning.

  • FEAT: Introduced sideffecteds, to allow certain dependencies to be produced & consumed by functions to apply “sideffects”, without creating “cycles”:

    • feat(op): introduce _fn_needs, op_needs & op_provides on FunctionalOperation, used when matching Inps/Outs and when pruning graph.

    • FEAT(op): print detailed deps when DEBUG enabled.

PLOT:
  • ENH: recursively merge Graphviz-styles attributes, with expanding jinja2-template and extending lists while preserving theme-provenance, for debugging.

  • BREAK: rename class & attributes related to Style --> Theme, to distinguish them from styles (stacks of dictionaries).

  • UPD: do not plot Steps by default; use this Plot customization to re-enable them:

    plottable.plot(plotter=Plotter(include_steps=True))
    
  • FEAT: now operations are also plottable.

  • FEAT: Operation BADGES to distinguish endured, rescheduled, parallel, marshalled, returns_dict.

  • FIX: Cancel/Evict styles were misclassified.

  • feat(plot): change label in sol_sideffects; add exceptions as tooltips on failed operations, etc.

  • enh: improve plot theme, e.g. prunes are all grey, sideffects all blue, “evictions” are colored closer to steps, etc. Add many neglected styles.

Sphinx extension:
  • enh: Save DOTs if DEBUG; save it before…

  • fix: save debug-DOT before rendering images, to still get those files as debug aid in case of errors.

  • fix: workaround missing lineno on doctest failures, an incomplete solution introduced upstream by sphinx-doc/sphinx#4584.

Configurations:
  • BREAK: rename context-manager configuration function debug –> debug_enabled.

  • FEAT: respect GRAPHTIK_DEBUG for enabling is_debug() configuration.

DOC:
v6.2.0 (19 Apr 2020, @ankostis): plotting fixes & more styles, net find util methods
  • PLOT:

    • DEPRECATE(plot): show argument in plot methods & functions; dropped completely from the args of the younger class Plotter.

      It has merged with filename param (the later takes precedence if both given).

    • ENH: apply more styles on data-nodes; distinguish between Prune/Cancel/Evict data Styles and add tooltips for those cases (ie data nodes without values).

    • DROP: do not plot with splines=ortho, because it crashes with some shapes; explain in docs how to re-enable it (x2 ways).

    • FIX: node/edge attributes were ignored due to networkx API misuse - add TCs on that.

    • FIX: Networks were not plotting Inps/Outs/Name due to forgotten namedtuple._replace() assignment.

    • feat: introduce _no_plot nx-attribute to filter out nodes/edges.

  • ENH(base): improve auto-naming of operations, descending partials politely and handling better builtins.

  • FEAT(net): add Network.find_ops() & Network.find_op_by_name() utility methods.

  • enh(build, site, doc): graft Build Ver/Date as gotten from Git in PyPi landing-page.

v6.1.0 (14 Apr 2020, @ankostis): config plugs & fix styles

Should have been a MAJOR BUMP due to breaking renames, but…no clients yet (and we had just come out of the 5.x –> 6.x major bump).

  • REFACT/BREAK(plot): rename installed_plotter --> active_plotter.

  • REFACT/BREAK(config): denote context-manager functions by adding a "_plugged" suffix.

  • FEAT(plot): offer with_XXX() cloning methods on Plotter/Style instances.

  • FIX(plot): Style cstor had its methods broken due to eagerly copying them from its parent class.

v6.0.0 (13 Apr 2020, @ankostis): New Plotting Device…

–((superseded by v6.1.0 due to installed_plotter –> active_plotter))–

  • ENH/REFACT(PLOT):

    • REFACT/BREAK: plots are now fully configurable with plot theme through the use of installed plotter.

    • ENH: Render operation nodes with Graphviz HTML-Table Labels.

    • ENH: Convey graph, node & edge (“non-private”) attributes from the networkx graph given to the plotter.

    • FEAT: Operation node link to docs (hackish, based on a URL formatting).

    • Improved plotting documentation & +3 new terms.

  • FIX: ReadTheDocs deps

  • drop(plot): don’t suppress the grafting of the title in netop images.

v5.7.1 (7 Apr 2020, @ankostis): Plot job, fix RTD deps
  • ENH(PLOT): Operation tooltips now show function sources.

  • FIX(site): RTD failing since 5.6.0 due to sphinxcontrib-spelling extension not included in its requirements.

  • FEAT(sphinxext): add graphtik_plot_keywords sphinx-configuration with a default value that suppresses grafting the title of a netop in the images, to avoid duplication when graphtik:name: option is given.

  • enh(plot): URL/tooltips are now overridable with node_props

  • enh(sphinxext): permalink plottables with :name: option.

  • enh(plot): pan-zoom follows parent container block, on window resize; reduce zoom mouse speed.

v5.7.0 (6 Apr 2020, @ankostis): FIX +SphinxExt in Wheel

All previous distributions in PyPi since sphinx-extension was added in v5.3.0 were missing the new package sphinxext needed to build sites with the .. graphtik:: directive.

v5.6.0 (6 Apr 2020, @ankostis, BROKEN): +check_if_incomplete

–((BROKEN because wheel in PyPi is missing sphinxext package))–

  • feat(sol): + Solution.check_if_incomplete() just to get multi-errors (not raise them)

  • doc: integrate spellchecking of VSCode IDE & sphinxcontrib.spelling.

v5.5.0 (1 Apr 2020, @ankostis, BROKEN): ortho plots

–((BROKEN because wheel in PyPi is missing sphinxext package))–

Should have been a major bump due to breaking rename of Plotter class, but…no clients yet.

  • ENH(plot): plot edges in graphs with Graphviz splines=ortho.

  • REFACT(plot): rename base class from Plotter --> Plottable;

  • enh(build): add [dev] distribution extras as an alias to [all]. doc: referred to the new name from a new term in glossary.

  • enh(site): put RST substitutions in rst_epilog configuration (instead of importing them from README’s tails).

  • doc(quickstart): exemplify @operation as a decorator.

v5.4.0 (29 Mar 2020, @ankostis, BROKEN): auto-name ops, dogfood quickstart

–((BROKEN because wheel in PyPi is missing sphinxext package))–

  • enh(op): use func_name if none given.

  • DOC(quickstart): dynamic plots with sphinxext.

v5.3.0 (28 Mar 2020, @ankostis, BROKEN): Sphinx plots, fail-early on bad op

–((BROKEN because wheel in PyPi is missing sphinxext package))–

  • FEAT(PLOT,SITE): Sphinx extension for plotting graph-diagrams as zoomable SVGs (default), PNGs (with link maps), PDFs, etc.

    • replace pre-plotted diagrams with dynamic ones.

    • deps: sphinx >=2; split (optional) matplolib dependencies from graphviz.

    • test: install and use Sphinx’s harness for testing site features & extensions.

  • ENH(op): fail early if 1st argument of operation is not a callable.

  • enh(plot): possible to control the name of the graph, in the result DOT-language (it was stuck to 'G' before).

  • upd(conf): detailed object representations are enabled by new configuration debug flag (instead of piggybacking on logger.DEBUG).

  • enh(site):

    • links-to-sources resolution function was discarding parent object if it could not locate the exact position in the sources;

    • TC: launch site building in pytest interpreter, to control visibility of logs & stdout;

    • add index pages, linked from TOCs.

v5.2.2 (03 Mar 2020, @ankostis): stuck in PARALLEL, fix Impossible Outs, plot quoting, legend node
  • FIX(NET): PARALLEL was ALWAYS enabled.

  • FIX(PLOT): workaround pydot parsing of node-ID & labels (see pydot#111 about DOT-keywords & pydot#224 about colons :) by converting IDs to HTML-strings; additionally, this project did not follow Graphviz grammatical-rules for IDs.

  • FIX(NET): impossible outs (outputs that cannot be produced from given inputs) were not raised!

  • enh(plot): clicking the background of a diagram would link to the legend url, which was annoying; replaced with a separate “legend” node.

v5.2.1 (28 Feb 2020, @ankostis): fix plan cache on skip-evictions, PY3.8 TCs, docs
  • FIX(net): Execution-plans were cached against also the transient is_skip_evictions() configuration (instead of just whether no-outputs were asked).

  • doc(readme): explain “fork” status in the opening.

  • ENH(travis): run full tests on Python-3.7 & Python-3.8.

v5.2.0 (27 Feb 2020, @ankostis): Map needs inputs –> args, SPELLCHECK
  • FEAT(modifiers): optionals and new modifier mapped() can now fetch values from inputs into differently-named arguments of operation functions.

    • refact: decouple varargs from optional modifiers hierarchy.

  • REFACT(OP): preparation of NEEDS –> function-args happens once for each argument, allowing to report all errors at once.

  • feat(base): +MultiValueError exception class.

  • DOC(modifiers,arch): modifiers were not included in “API reference”, nor in the glossary sections.

  • FIX: spell-check everything, and add all custom words in the VSCode settings file .vscode.settings.json.

v5.1.0 (22 Jan 2020, @ankostis): accept named-tuples/objects provides
  • ENH(OP): flag returns_dict handles also named-tuples & objects (__dict__).

v5.0.0 (31 Dec 2019, @ankostis): Method–>Parallel, all configs now per op flags; Screaming Solutions on fails/partials
  • BREAK(NETOP): compose(method="parallel") --> compose(parallel=None/False/True) and DROP netop.set_execution_method(method); parallel now also controlled with the global set_parallel_tasks() configurations function.

    • feat(jetsam): report task executed in raised exceptions.

  • break(netop): rename netop.narrowed() --> withset() to mimic Operation API.

  • break: rename flags:

    • reschedule --> rescheduleD

    • marshal --> marshalLED.

  • break: rename global configs, as context-managers:

    • marshal_parallel_tasks --> tasks_marshalled

    • endure_operations --> operations_endured

  • FIX(net, plan,.TC): global skip evictions flag were not fully obeyed (was untested).

  • FIX(OP): revamped zipping of function outputs with expected provides, for all combinations of rescheduled, NO_RESULT & returns dictionary flags.

  • configs:

    • refact: extract configs in their own module.

    • refact: make all global flags tri-state (None, False, True), allowing to “force” operation flags when not None. All default to None (false).

  • ENH(net, sol, logs): include a “solution-id” in revamped log messages, to facilitate developers to discover issues when multiple netops are running concurrently. Heavily enhanced log messages make sense to the reader of all actions performed.

  • ENH(plot): set tooltips with repr(op) to view all operation flags.

  • FIX(TCs): close process-pools; now much more TCs for parallel combinations of threaded, process-pool & marshalled.

  • ENH(netop,net): possible to abort many netops at once, by resetting abort flag on every call of Pipeline.compute() (instead of on the first stopped netop).

  • FEAT(SOL): scream_if_incomplete() will raise the new IncompleteExecutionError exception if failures/partial-outs of endured/rescheduled operations prevented all operations to complete; exception message details causal errors and conditions.

  • feat(build): +``all`` extras.

  • FAIL: x2 multi-threaded TCs fail spuriously with “inverse dag edges”:

    • test_multithreading_plan_execution()

    • test_multi_threading_computes()

    both marked as xfail.

v4.4.1 (22 Dec 2019, @ankostis): bugfix debug print
  • fix(net): had forgotten a debug-print on every operation call.

  • doc(arch): explain parallel & the need for marshalling with process pools.

v4.4.0 (21 Dec 2019, @ankostis): RESCHEDULE for PARTIAL Outputs, on a per op basis
  • [x] dynamic Reschedule after operations with partial outputs execute.

  • [x] raise after jetsam.

  • [x] plots link to legend.

  • [x] refact netop

  • [x] endurance per op.

  • [x] endurance/reschedule for all netop ops.

  • [x] merge _Rescheduler into Solution.

  • [x] keep order of outputs in Solution even for parallels.

  • [x] keep solution layers ordered also for parallel.

  • [x] require user to create & enter pools.

  • [x] FIX pickling THREAD POOL –> Process.

Details
  • FIX(NET): keep Solution’s insertion order also for PARALLEL executions.

  • FEAT(NET, OP): rescheduled operations with partial outputs; they must have FunctionalOperation.rescheduled set to true, or else they will fail.

  • FEAT(OP, netop): specify endurance/reschedule on a per operation basis, or collectively for all operations grouped under some pipeline.

  • REFACT(NETOP):

    • feat(netop): new method Pipeline.compile(), delegating to same-named method of network.

    • drop(net): method Net.narrowed(); remember netop.narrowed(outputs+predicate) and apply them on netop.compute() & netop.compile().

      • PROS: cache narrowed plans.

      • CONS: cannot review network, must review plan of (new) netop.compile().

    • drop(netop): inputs args in narrowed() didn’t make much sense (leftover from “unvarying netops”); but they exist in netop.compile().

    • refact(netop): move net-assembly from compose() –> NetOp cstor; now reschedule/endured/merge/method args in cstor.

  • NET,OP,TCs: FIX PARALLEL POOL CONCURRENCY

    • Network:

      • feat: +marshal +_OpTask

      • refact: plan._call_op –> _handle_task

      • enh: Make abort run variable a shared-memory Value.

    • REFACT(OP,.TC): not a namedtuple, breaks pickling.

    • ENH(pool): Pool

    • FIX: compare Tokens with is –> ==, or else, it won’t work for sub-processes.

    • TEST: x MULTIPLE TESTS

      • +4 tags: parallel, thread, proc, marshal.

      • many uses of exemethod.

  • FIX(build): PyPi README check did not detect forbidden raw directives, and travis auto-deployments were failing.

  • doc(arch): more terms.

v4.3.0 (16 Dec 2019, @ankostis): Aliases
  • FEAT(OP): support “aliases” of provides, to avoid trivial pipe-through operations, just to rename & match operations.

v4.2.0 (16 Dec 2019, @ankostis): ENDURED Execution
  • FEAT(NET): when set_endure_operations() configuration is set to true, a pipeline will keep on calculating solution, skipping any operations downstream from failed ones. The solution eventually collects all failures in Solution.failures attribute.

  • ENH(DOC,plot): Links in Legend and Architecture Workflow SVGs now work, and delegate to architecture terms.

  • ENH(plot): mark overwrite, failed & canceled in repr() (see endurance).

  • refact(conf): fully rename configuration operation skip_evictions.

  • REFACT(jetsam): raise after jetsam in situ, better for Readers & Linters.

  • enh(net): improve logging.

v4.1.0 (13 Dec 2019, @ankostis): ChainMap Solution for Rewrites, stable TOPOLOGICAL sort

[diagram: graphtik-v4.1.0 flowchart – operations –> compose –> network –> compile (with inputs, outputs, predicate) –> plan –> execute (with input values) –> solution]

  • FIX(NET): TOPOLOGICALLY-sort now break ties respecting operations insertion order.

  • ENH(NET): new Solution class to collect all computation values, based on a collections.ChainMap to distinguish outputs per operation executed:

    • ENH(NETOP): compute() return Solution, consolidating:

    • drop(net): _PinInstruction class is not needed.

    • drop(netop): overwrites_collector parameter; now in Solution.overwrites().

    • ENH(plot): Solution is also a Plottable; e.g. use sol.plot(...).

  • DROP(plot): executed arg from plotting; now embedded in solution.

  • ENH(PLOT.jupyter,doc): allow to set jupyter graph-styling selectively; fix instructions for jupyter cell-resizing.

  • fix(plan): time-keeping worked only for sequential execution, not parallel. Refactor it to happen centrally.

  • enh(NET,.TC): Add PREDICATE argument also for compile().

  • FEAT(DOC): add GLOSSARY as new Architecture section, linked from API HEADERS.

v4.0.1 (12 Dec 2019, @ankostis): bugfix
  • FIX(plan): plan.repr() was failing on empty plans.

  • fix(site): minor badge fix & landing diagram.

v4.0.0 (11 Dec 2019, @ankostis): NESTED merge, revert v3.x Unvarying, immutable OPs, “color” nodes
  • BREAK/ENH(NETOP): MERGE NESTED NetOps by collecting all their operations in a single Network; now children netops are not pruned in case some of their needs are unsatisfied.

    • feat(op): support multiple nesting under other netops.

  • BREAK(NETOP): REVERT Unvarying NetOps+base-plan, and narrow Networks instead; netops were too rigid, code was cumbersome, and could not really pinpoint the narrowed needs always correctly (e.g. when they were also provides).

    • A netop always narrows its net based on given inputs/outputs. This means that the net might be a subset of the one constructed out of the given operations. If you want all nodes, don’t specify needs/provides.

    • drop 3 ExecutionPlan attributes: plan, needs, plan

    • drop recompile flag in Network.compute().

    • feat(net): new method Network.narrowed() clones and narrows.

    • Network() cstor accepts a (cloned) graph to support narrowed() methods.

  • BREAK/REFACT(OP): simplify hierarchy, make Operation fully abstract, without name or requirements.

  • refact(net): consider as netop needs also intermediate data nodes.

  • FEAT(#1, net, netop): support pruning based on arbitrary operation attributes (e.g. assign “colors” to nodes and solve a subset each time).

  • enh(netop): repr() now counts number of contained operations.

  • refact(netop): rename netop.narrow() --> narrowed()

  • drop(netop): don’t topologically-sort sub-networks before merging them; might change some results, but gives control back to the user to define nets.

v3.1.0 (6 Dec 2019, @ankostis): cooler prune()
  • break/refact(NET): scream on plan.execute() (not net.prune()), so as to calmly solve needs vs provides based on the given inputs/outputs.

  • FIX(plot): was failing when plotting graphs with ops without fn set.

  • enh(net): minor fixes on assertions.

v3.0.0 (2 Dec 2019, @ankostis): UNVARYING NetOperations, narrowed, API refact
  • Pipelines:

    • BREAK(NET): RAISE if the graph is UNSOLVABLE for the given needs & provides! (see “raises” list of compute()).

    • BREAK: Pipeline.__call__() accepts solution as keyword-args, to mimic API of Operation.__call__(). outputs keyword has been dropped.

      Tip

      Use Pipeline.compute() when you ask different outputs, or set the recompile flag if just different inputs are given.

      Read the next change-items for the new behavior of the compute() method.

    • UNVARYING NetOperations:

      • BREAK: calling method Pipeline.compute() with a single argument is now UNVARYING, meaning that all needs are demanded, and hence, all provides are produced, unless the recompile flag is true or outputs asked.

      • BREAK: net-operations behave like regular operations when nested inside another netop, and always produce all their provides, or scream if less inputs than needs are given.

      • ENH: a newly created or cloned netop can be narrowed() to specific needs & provides, so as not needing to pass outputs on every call to compute().

      • feat: implemented based on the new “narrowed” Pipeline.plan attribute.

    • FIX: netop needs are not all optional by default; optionality applied only if all underlying operations have a certain need as optional.

    • FEAT: support function **args with 2 new modifiers vararg() & varargs(), acting like optional() (but without feeding into underlying functions like keywords).

    • BREAK(yahoo#12): simplify compose API by turning it from class –> function; all args and operations are now given in a single compose() call.

    • REFACT(net, netop): make Network IMMUTABLE by appending all operations together, in Pipeline constructor.

    • ENH(net): publicize _prune_graph() –> Network.prune(), which can be used to interrogate needs & provides for a given graph. It accepts None inputs & outputs to auto-derive them.

  • FIX(SITE): autodocs API chapter were not generated in at all, due to import errors, fixed by using autodoc_mock_imports on networkx, pydot & boltons libs.

  • enh(op): polite error-msg when calling an operation with missing needs (instead of an abrupt KeyError).

  • FEAT(CI): test also on Python-3.8

v2.3.0 (24 Nov 2019, @ankostis): Zoomable SVGs & more op jobs
  • FEAT(plot): render Zoomable SVGs in jupyter(lab) notebooks.

  • break(netop): rename execution-method "sequential" --> None.

  • break(netop): move overwrites_collector & method args from netop.__call__() –> cstor

  • refact(netop): convert remaining **kwargs into named args, tighten up API.

v2.2.0 (20 Nov 2019, @ankostis): enhance OPERATIONS & restruct their modules
  • REFACT(src): split module nodes.py –> op.py + netop.py and move Operation from base.py –> op.py, in order to break cycle of base(op) <– net <– netop, and keep utils only in base.py.

  • ENH(op): allow Operations WITHOUT any NEEDS.

  • ENH(op): allow Operation FUNCTIONS to return directly Dictionaries.

  • ENH(op): validate function Results against operation provides; jetsam now includes results variables: results_fn & results_op.

  • BREAK(op): drop unused Operation._after_init() pickle-hook; use dill instead.

  • refact(op): convert Operation._validate() into a function, to be called by clients wishing to automate operation construction.

  • refact(op): replace **kwargs with named-args in class:FunctionalOperation, because it allowed too wide args, and offered no help to the user.

  • REFACT(configs): privatize network._execution_configs; expose more config-methods from base package.

v2.1.1 (12 Nov 2019, @ankostis): global configs
  • BREAK: drop Python-3.6 compatibility.

  • FEAT: Use (possibly multiple) global configurations for all networks, stored in a contextvars.ContextVar.

  • ENH/BREAK: Use a (possibly) single execution_pool in global-configs.

  • feat: add abort flag in global-configs.

  • feat: add skip_evictions flag in global-configs.

v2.1.0 (20 Oct 2019, @ankostis): DROP BW-compatible, Restruct modules/API, Plan perfect evictions

The first non pre-release for 2.x train.

  • BREAK API: DROP Operation’s params - use functools.partial() instead.
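    The replacement suggested above can be sketched with the stdlib alone; a minimal, generic example (no Graphtik API involved, the power function is hypothetical):

    ```python
    from functools import partial

    def power(base, exponent):
        # A plain function whose former operation-"param" we now bind up-front.
        return base ** exponent

    # Fix exponent=3 before wiring the function into a pipeline:
    cube = partial(power, exponent=3)

    assert cube(2) == 8
    assert cube(5) == 125
    ```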

  • BREAK API: DROP Backward-Compatible Data & Operation classes.

  • BREAK: DROP Pickle workarounds - expected to use dill instead.

  • break(jetsam): drop “graphtik_” prefix from annotated attribute

  • ENH(op): operation() now supports the “builder pattern” via the operation.withset() method.

  • REFACT: renamed internal package functional –> nodes and moved classes around, to break cycles more easily (base works as it is supposed to), to avoid importing everything early, and to fail plotting early if the pydot dependency is missing.

  • REFACT: move PLAN and compute() up, from Network --> Pipeline.

  • ENH(NET): new PLAN BUILDING algorithm produces PERFECT EVICTIONS, that is, it gradually eliminates from the solution all non-asked outputs.

    • enh: pruning now cleans isolated data.

    • enh: eviction-instructions are inserted due to two different conditions: once for unneeded data in the past, and another for unused produced data (those not belonging to the pruned dag).

    • enh: discard immediately irrelevant inputs.

  • ENH(net): changed results, now unrelated inputs are not included in solution.

  • refact(sideffect): store them as node-attributes in DAG, fix their combination with pinning & eviction.

  • fix(parallel): eviction was not working due to a typo 65 commits back!

v2.0.0b1 (15 Oct 2019, @ankostis): Rebranded as Graphtik for Python 3.6+

Continuation of yahoo#30 as yahoo#31, containing review-fixes in huyng/graphkit#1.

Network
  • FIX: multithreaded operations were failing due to shared ExecutionPlan.executed.

  • FIX: pruning sometimes was inserting a plain string into the DAG (not a _DataNode).

  • ENH: heavily reinforced exception annotations (“jetsam”):

    • FIX: (8f3ec3a) outer graphs/ops do not override the inner cause.

    • ENH: retrofitted exception-annotations as a single dictionary, to print it in one shot (8f3ec3a & 8d0de1f)

    • enh: more data in a dictionary

    • TCs: Add thorough TCs (8f3ec3a & b8063e5).

  • REFACT: rename Delete –> Evict, remove Placeholder from data nodes, privatize node-classes.

  • ENH: collect “jetsam” on errors and annotate exceptions with them.

  • ENH(sideffects): make them always DIFFERENT from regular DATA, to allow to co-exist.

  • fix(sideffects): typo in add_op() was mixing needs/provides.

  • enh: accept a single string as outputs when running graphs.

Testing & other code:
  • TCs: pytest now checks sphinx-site builds without any warnings.

  • Established chores with build services:

    • Travis (and auto-deploy to PyPi),

    • codecov

    • ReadTheDocs

v1.3.0 (Oct 2019, @ankostis): NEVER RELEASED: new DAG solver, better plotting & “sideffect”

Kept the external API (hopefully) the same, but revamped the pruning algorithm and refactored the network compute/compile structure, so results may change; significantly enhanced plotting. The only genuinely new feature is the .sideffect modifier.

Network:
  • FIX(yahoo#18, yahoo#26, yahoo#29, yahoo#17, yahoo#20): Revamped DAG SOLVER to fix bad pruning described in yahoo#24 & yahoo#25

    Pruning now works by breaking incoming provide-links to any given intermediate inputs, and by dropping operations with partial inputs or without outputs.

    The end result is that operations in the graph that do not have all their inputs satisfied are skipped (in v1.2.4 they crashed).

    Also started annotating edges with optional/sideffects, to make proper use of the underlying networkx graph.

    graphtik-v1.3.0 flowchart

  • REFACT(yahoo#21, yahoo#29): Refactored Network and introduced ExecutionPlan to keep compilation results (the old steps list, plus input/output names).

    Moved also the check for when to evict a value, from running the execution-plan, to when building it; thus, execute methods don’t need outputs anymore.

  • ENH(yahoo#26): “Pin” input values that may be overwritten by calculated ones.

    This required the introduction of the new _PinInstruction in the execution plan.

  • FIX(yahoo#23, yahoo#22-2.4.3): Keep consistent order of networkx.DiGraph and sets, to generate deterministic solutions.

    Unfortunately, the non-determinism has not been fixed for < PY3.5; only the frequency of spurious failures, caused by unstable dicts and the use of subgraphs, has been reduced.

  • enh: Mark outputs produced by Pipeline’s needs as optional. TODO: subgraph network-operations would not be fully functional until “optional outputs” are dealt with (see yahoo#22-2.5).

  • enh: Annotate operation exceptions with ExecutionPlan to aid debug sessions,

  • drop: methods list_layers()/show_layers() not needed; repr() is a better replacement.

Plotting:
  • ENH(yahoo#13, yahoo#26, yahoo#29): Now the network remembers the last plan and uses it to overlay graphs with the internals of the planning and execution: sample graphtik plot

    • execution-steps & order

    • evict & pin instructions

    • given inputs & asked outputs

    • solution values (just if they are present)

    • “optional” needs & broken links during pruning

  • REFACT: Move all plotting API docs into a single module, split into 2 phases: build DOT & render DOT.

  • FIX(yahoo#13): bring plot writing into files up-to-date from PY2; do not create plot-file if given file-extension is not supported.

  • FEAT: patch the pydot library to support rendering in Jupyter notebooks.

Testing & other code:
  • Increased coverage from 77% –> 90%.

  • ENH(yahoo#28): use pytest, to facilitate TCs parametrization.

  • ENH(yahoo#30): Doctest all code; enabled many assertions that were just print-outs in v1.2.4.

  • FIX: operation.__repr__() was crashing when not all arguments had been set - a condition frequently met during debugging session or failed TCs (inspired by @syamajala’s 309338340).

  • enh: Sped up parallel/multithread TCs by reducing delays & repetitions.

    Tip

    You need pytest -m slow to run those slow tests.

Chore & Docs:
v1.2.4 (Mar 7, 2018)
  • Issues in pruning algorithm: yahoo#24, yahoo#25

  • Blocking bug in plotting code for Python-3.x.

  • Test-cases without assertions (just prints).

graphtik-v1.2.4 flowchart

1.2.2 (Mar 7, 2018, @huyng): Fixed versioning

Versioning now is manually specified to avoid bug where the version was not being correctly reflected on pip install deployments

1.2.1 (Feb 23, 2018, @huyng): Fixed multi-threading bug and faster compute through caching of find_necessary_steps

We’ve introduced a cache to avoid computing find_necessary_steps multiple times during each inference call.

This has 2 benefits:

  • It reduces computation time of the compute call

  • It avoids a subtle multi-threading bug in networkx when accessing the graph from a high number of threads.
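The caching idea can be sketched generically with the stdlib memoization decorator; this is a hypothetical stand-in for the real planner, not Graphkit's actual code:

```python
from functools import lru_cache

calls = []

@lru_cache(maxsize=None)
def find_necessary_steps(outputs, inputs):
    # Hypothetical stand-in for the expensive planning routine;
    # arguments must be hashable (e.g. frozensets) for caching to work.
    calls.append((outputs, inputs))
    return tuple(sorted(outputs | inputs))  # dummy "plan"

plan1 = find_necessary_steps(frozenset({"c"}), frozenset({"a", "b"}))
plan2 = find_necessary_steps(frozenset({"c"}), frozenset({"a", "b"}))
assert plan1 == plan2
assert len(calls) == 1  # the second call was answered from the cache
```

Serving repeated identical queries from the cache also sidesteps the concurrent graph traversals that triggered the networkx multi-threading bug.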

1.2.0 (Feb 13, 2018, @huyng)

Added set_execution_method(‘parallel’) for execution of graphs in parallel.
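The underlying idea is that independent operations can be dispatched to a pool; a generic sketch with concurrent.futures (not Graphkit's internal mechanism):

```python
from concurrent.futures import ThreadPoolExecutor
from operator import add, mul

# Two independent "operations" consuming the same inputs may run concurrently:
ops = {"ab": mul, "a_plus_b": add}
a, b = 2, 5

with ThreadPoolExecutor(max_workers=2) as pool:
    futures = {name: pool.submit(fn, a, b) for name, fn in ops.items()}
    results = {name: fut.result() for name, fut in futures.items()}

assert results == {"ab": 10, "a_plus_b": 7}
```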

1.1.0 (Nov 9, 2017, @huyng)

Update setup.py

1.0.4 (Nov 3, 2017, @huyng): Networkx 2.0 compatibility

Minor Bug Fixes:

  • Compatibility fix for networkx 2.0

  • net.times now only stores timing info from the most recent run

1.0.3 (Jan 31, 2017, @huyng): Make plotting dependencies optional
  • Merge pull request yahoo#6 from yahoo/plot-optional

  • make plotting dependencies optional

1.0.2 (Sep 29, 2016, @pumpikano): Merge pull request yahoo#5 from yahoo/remove-packaging-dep
  • Remove ‘packaging’ as dependency

1.0.1 (Aug 24, 2016)
1.0 (Aug 2, 2016, @robwhess)

First public release in PyPi & GitHub.

  • Merge pull request yahoo#3 from robwhess/travis-build

  • Travis build

Index

Quick start

Here’s how to install:

pip install graphtik

OR with dependencies for plotting support (you also need to install the Graphviz program separately with your OS tools):

pip install graphtik[plot]

Let’s build a graphtik computation pipeline that produces x3 outputs out of 2 inputs a and b:

    a × b
    a − a × b
    |a − a × b|³
>>> from graphtik import compose, operation
>>> from operator import mul, sub
>>> @operation(name="abs qubed",
...            needs=["a_minus_ab"],
...            provides=["abs_a_minus_ab_cubed"])
... def abs_qubed(a):
...    return abs(a) ** 3

Compose the abs_qubed function along with the mul & sub functions from the operator module into a computation graph:

>>> graphop = compose("graphop",
...    operation(mul, needs=["a", "b"], provides=["ab"]),
...    operation(sub, needs=["a", "ab"], provides=["a_minus_ab"]),
...    abs_qubed,
... )
>>> graphop
Pipeline('graphop', needs=['a', 'b', 'ab', 'a_minus_ab'],
                  provides=['ab', 'a_minus_ab', 'abs_a_minus_ab_cubed'],
                  x3 ops: mul, sub, abs qubed)

You may plot the function graph in a file like this (if in jupyter, no need to specify the file, see Jupyter notebooks):

>>> graphop.plot('graphop.svg')      # doctest: +SKIP

As you can see, any function can be used as an operation in Graphtik, even ones imported from system modules.

Run the graph-operation and request all of the outputs:

>>> sol = graphop(**{'a': 2, 'b': 5})
>>> sol
{'a': 2, 'b': 5, 'ab': 10, 'a_minus_ab': -8, 'abs_a_minus_ab_cubed': 512}

Solutions are plottable as well:

>>> sol.plot('solution.svg')      # doctest: +SKIP

Run the graph-operation and request a subset of the outputs:

>>> solution = graphop.compute({'a': 2, 'b': 5}, outputs=["a_minus_ab"])
>>> solution
{'a_minus_ab': -8}

… where the (interactive) legend is this:

>>> from graphtik.plot import legend
>>> l = legend()