Graphtik¶
11.0.0.dev0+v10.5.0-41-g1079c1f, May 02, 2023
It’s a DAG all the way down!
Computation graphs for Python & Pandas¶
Graphtik is a library to compose, solve, execute & plot graphs of python functions (a.k.a pipelines) that consume and populate named data (a.k.a dependencies), whose names may be nested (such as pandas dataframe columns), based on whether values for those dependencies exist in the inputs or have been calculated earlier.
In mathematical terms, given:
a partially populated data tree, and
a set of functions operating (consuming/producing) on branches of the data tree,
graphtik collects a subset of the functions into a graph that, when executed, consumes & produces as many values as possible in the data-tree.
Its primary use case is building flexible algorithms for data science/machine learning projects.
It should be extendable to implement the following:
an IoC dependency resolver (e.g. Java Spring, Google Guice);
an executor of interdependent tasks based on files (e.g. GNU Make);
a custom ETL engine;
a spreadsheet calculation engine.
Graphtik sprang from Graphkit (summer 2019, v1.2.2) to experiment with Python 3.6+ features, but has diverged significantly with enhancements ever since.
Operations¶
An operation is a function in a computation pipeline,
abstractly represented by the Operation
class.
This class specifies the dependencies forming the pipeline’s
network.
Defining Operations¶
You may inherit the Operation
abstract class to do the following:
define the needs & provides properties as collections of dependencies (needed to solve the dependencies network),
override the
compute(solution)
method to read from the solution argument those values listed in needs (only those values are guaranteed to exist when called), do some business, and then
populate the values listed in provides back into solution (if other values are populated, they may be ignored).
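The contract above can be sketched, without graphtik, as a toy stand-in class (all names here are hypothetical, not the library's actual base class):

```python
# Conceptual sketch only -- NOT graphtik's Operation class.
# It mimics the compute(solution) contract described above.
class ToyOperation:
    def __init__(self, name, needs, provides, fn):
        self.name, self.needs, self.provides, self.fn = name, needs, provides, fn

    def compute(self, solution: dict) -> dict:
        # Read only the declared `needs` from the solution...
        inputs = [solution[n] for n in self.needs]
        # ...do some business...
        results = self.fn(*inputs)
        if len(self.provides) == 1:
            results = (results,)
        # ...and write the declared `provides` back.
        return dict(zip(self.provides, results))

add_op = ToyOperation("add", needs=["a", "b"], provides=["a_plus_b"],
                      fn=lambda a, b: a + b)
print(add_op.compute({"a": 3, "b": 4}))  # → {'a_plus_b': 7}
```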
But there is an easier way – actually half of the code in this project is dedicated to retrofitting existing functions unaware of all these, into operations.
Operations from existing functions¶
The FnOp
provides a concrete wrapper around any arbitrary function
to define and execute within a pipeline.
Use the operation()
factory to instantiate one:
>>> from operator import add
>>> from graphtik import operation
>>> add_op = operation(add,
... needs=['a', 'b'],
... provides=['a_plus_b'])
>>> add_op
FnOp(name='add', needs=['a', 'b'], provides=['a_plus_b'], fn='add')
You may still call the original function at FnOp.fn,
thus bypassing any operation pre-processing:
>>> add_op.fn(3, 4)
7
But the proper way is to call the operation (either directly or by calling the
FnOp.compute()
method). Notice though that unnamed
positional parameters are not supported:
>>> add_op(a=3, b=4)
{'a_plus_b': 7}
Tip
(unstable API) In case your function needs to access the execution
machinery
or its wrapping operation, it can do that through the task_context
(unstable API, not working during (deprecated) parallel execution,
see Accessing wrapper operation from task-context)
Builder pattern¶
There are two ways to instantiate FnOps, each one suitable
for different scenarios.
We’ve seen that calling operation() manually allows putting into a pipeline
allows putting into a pipeline
functions that are defined elsewhere (e.g. in another module, or are system functions).
But that method is also useful if you want to create multiple operation instances
with similar attributes, e.g. needs
:
>>> op_factory = operation(needs=['a'])
Notice that a FnOp instance (and not a decorator) is returned only once a fn has been specified:
>>> from graphtik import operation, compose
>>> from functools import partial
>>> def mypow(a, p=2):
... return a ** p
>>> pow_op2 = op_factory.withset(fn=mypow, provides="^2")
>>> pow_op3 = op_factory.withset(fn=partial(mypow, p=3), name='pow_3', provides='^3')
>>> pow_op0 = op_factory.withset(fn=lambda a: 1, name='pow_0', provides='^0')
>>> graphop = compose('powers', pow_op2, pow_op3, pow_op0)
>>> graphop
Pipeline('powers', needs=['a'], provides=['^2', '^3', '^0'], x3 ops:
mypow, pow_3, pow_0)
>>> graphop(a=2)
{'a': 2, '^2': 4, '^3': 8, '^0': 1}
Tip
See Plotting on how to make diagrams like this.
Decorator specification¶
If you are defining your computation graph and the functions that comprise it all in the same script,
the decorator specification of operation
instances might be particularly useful,
as it allows you to assign computation graph structure to functions as they are defined.
Here’s an example:
>>> from graphtik import operation, compose
>>> @operation(needs=['b', 'a', 'r'], provides='bar')
... def foo(a, b, c):
... return c * (a + b)
>>> graphop = compose('foo_graph', foo)
Notice that if name is not given, it is deduced from the function name.
Specifying graph structure: provides
and needs
¶
Each operation is a node in a computation graph, depending and supplying data from and to other nodes (via the solution), in order to compute.
This graph structure is specified (mostly) via the provides
and needs
arguments
to the operation()
factory, specifically:
needs
this argument names the list of (positionally ordered) input data the operation requires to receive from the solution. The list corresponds, roughly, to the arguments of the underlying function (plus any tokens).
It can be a single string, in which case a 1-element iterable is assumed.
- See also: needs, modifier, FnOp.needs, FnOp._user_needs, FnOp._fn_needs
provides
this argument names the list of (positionally ordered) output data the operation provides into the solution. The list corresponds, roughly, to the values returned by the fn (plus any tokens & aliases).
It can be a single string, in which case a 1-element iterable is assumed.
If there are more than one, the underlying function must return an iterable with the same number of elements (unless it is a returns dictionary operation).
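The pairing rule above can be sketched with a small hypothetical helper (not graphtik's internals), showing how a function's returned iterable or dictionary is matched against the declared provides:

```python
# Hypothetical helper illustrating how multiple `provides` pair up
# with a function's returned iterable (or a returned dictionary).
def zip_provides(provides, result, returns_dict=False):
    """Pair a function's result with its declared `provides` names."""
    if returns_dict:
        # A "returns dictionary" fn names its own outputs.
        return {name: result[name] for name in provides}
    if len(provides) == 1:
        # A single provide takes the bare value as-is.
        return {provides[0]: result}
    results = list(result)
    if len(results) != len(provides):
        raise ValueError(f"expected {len(provides)} results, got {len(results)}")
    return dict(zip(provides, results))

print(zip_provides(["quotient", "remainder"], divmod(17, 5)))
# → {'quotient': 3, 'remainder': 2}
```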
Declarations of needs and provides are affected by modifiers like keyword():
Map inputs(& outputs) to differently named function arguments (& results)¶
- graphtik.modifier.keyword(name: str, keyword: Optional[str] = None, accessor: Optional[Accessor] = None, jsonp=None) _Modifier [source]
Annotate a dependency that maps to a different name in the underlying function.
When used on needs dependencies:
The value of the name dependency is read from the solution, and then
that value is passed into the function as a keyword-argument named keyword.
When used on provides dependencies:
The operation must be a returns dictionary operation.
The value keyed with keyword is read from the function’s returned dictionary, and then
that value is placed into the solution under the name name.
- Parameters
keyword – The argument-name corresponding to this named-input. If it is None, it is assumed the same as name, so as to always behave like a kw-type arg and to preserve its fn-name if ever renamed.
accessor – the functions to access values to/from the solution (see Accessor); actually a 2-tuple with functions is ok.
jsonp – None (derived from name), False, str, or a collection of str/callables (the last one); see the generic modify() modifier.
- Returns
a _Modifier instance, even if no keyword is given OR it is the same as name.
Example:
In case the name of a function input argument is different from the name in the graph (or just because the name in the inputs is not a valid argument-name), you may map it with the 2nd argument of keyword():

>>> from graphtik import operation, compose, keyword

>>> @operation(needs=[keyword("name-in-inputs", "fn_name")], provides="result")
... def foo(*, fn_name):  # it works also with non-positional args
...     return fn_name
>>> foo
FnOp(name='foo', needs=['name-in-inputs'(>'fn_name')], provides=['result'], fn='foo')

>>> pipe = compose('map a need', foo)
>>> pipe
Pipeline('map a need', needs=['name-in-inputs'], provides=['result'], x1 ops: foo)

>>> sol = pipe.compute({"name-in-inputs": 4})
>>> sol['result']
4
You can do the same thing to the results of a returns dictionary operation:
>>> op = operation(lambda: {"fn key": 1},
...                name="renaming `provides` with a `keyword`",
...                provides=keyword("graph key", "fn key"),
...                returns_dict=True)
>>> op
FnOp(name='renaming `provides` with a `keyword`', provides=['graph key'(>'fn key')], fn{}='<lambda>')
Hint
Mapping provides names wouldn’t make sense for regular operations, since these are defined arbitrarily at the operation level. OTOH, the result names of returns dictionary operation are decided by the underlying function, which may lie beyond the control of the user (e.g. from a 3rd-party object).
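What the keyword() mapping accomplishes for needs can be sketched in plain Python, without graphtik (the helper and its names are hypothetical): the dependency name in the solution differs from the function's argument name, and a small rename bridges the two.

```python
# Hypothetical sketch of keyword-style renaming: read values from the
# solution under their dependency names, pass them to the function
# under its own (different) argument names.
def call_with_keyword_map(fn, solution, needs_to_kw):
    kwargs = {kw: solution[need] for need, kw in needs_to_kw.items()}
    return fn(**kwargs)

def foo(*, fn_name):
    return fn_name

result = call_with_keyword_map(
    foo, {"name-in-inputs": 4}, {"name-in-inputs": "fn_name"})
print(result)  # → 4
```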
Operations may execute with missing inputs¶
- graphtik.modifier.optional(name: str, keyword: Optional[str] = None, accessor: Optional[Accessor] = None, jsonp=None) _Modifier [source]
Annotate optional needs corresponding to defaulted op-function arguments, …
received only if present in the inputs (when operation is invoked).
The value of an optional dependency is passed in as a keyword argument to the underlying function.
- Parameters
keyword – the name of the function argument it corresponds to; if a falsy is given, it is assumed the same as name, to always behave like a kw-type arg and to preserve its fn-name if ever renamed.
accessor – the functions to access values to/from the solution (see Accessor); actually a 2-tuple with functions is ok.
jsonp – None (derived from name), False, str, or a collection of str/callables (the last one); see the generic modify() modifier.
Example:
>>> from graphtik import operation, compose, optional
>>> @operation(name='myadd',
...            needs=["a", optional("b")],
...            provides="sum")
... def myadd(a, b=0):
...     return a + b
Notice the default value 0 of b, the argument annotated as optional:

>>> graph = compose('mygraph', myadd)
>>> graph
Pipeline('mygraph', needs=['a', 'b'(?)], provides=['sum'], x1 ops: myadd)

The graph works both with and without b provided in the inputs:

>>> graph(a=5, b=4)['sum']
9
>>> graph(a=5)
{'a': 5, 'sum': 5}
Like keyword(), you may map the input-name to a different function-argument:

>>> operation(needs=['a', optional("quasi-real", "b")],
...           provides="sum"
... )(myadd.fn)  # Cannot wrap an operation, only its `fn`.
FnOp(name='myadd', needs=['a', 'quasi-real'(?'b')], provides=['sum'], fn='myadd')
Calling functions with varargs (*args
)¶
- graphtik.modifier.vararg(name: str, accessor: Optional[Accessor] = None, jsonp=None) _Modifier [source]
Annotate a varargish need to be fed into the function's *args.

See also
Consult also the example test-case in test/test_op.py:test_varargs(), in the full sources of the project.

Example:
We designate b & c as vararg arguments:

>>> from graphtik import operation, compose, vararg

>>> @operation(
...     needs=['a', vararg('b'), vararg('c')],
...     provides='sum'
... )
... def addall(a, *b):
...     return a + sum(b)
>>> addall
FnOp(name='addall', needs=['a', 'b'(*), 'c'(*)], provides=['sum'], fn='addall')
>>> graph = compose('mygraph', addall)
The graph works with and without any of the b or c inputs:

>>> graph(a=5, b=2, c=4)['sum']
11
>>> graph(a=5, b=2)
{'a': 5, 'b': 2, 'sum': 7}
>>> graph(a=5)
{'a': 5, 'sum': 5}
- graphtik.modifier.varargs(name: str, accessor: Optional[Accessor] = None, jsonp=None) _Modifier [source]
A varargish vararg(), naming an iterable value in the inputs.

See also
Consult also the example test-case in test/test_op.py:test_varargs(), in the full sources of the project.

Example:
>>> from graphtik import operation, compose, varargs
>>> def enlist(a, *b):
...     return [a] + list(b)

>>> graph = compose('mygraph',
...     operation(name='enlist', needs=['a', varargs('b')],
...               provides='sum')(enlist)
... )
>>> graph
Pipeline('mygraph', needs=['a', 'b'(+)], provides=['sum'], x1 ops: enlist)
The graph works with or without b in the inputs:
>>> graph(a=5, b=[2, 20])['sum']
[5, 2, 20]
>>> graph(a=5)
{'a': 5, 'sum': [5]}
>>> graph(a=5, b=0xBAD)
Traceback (most recent call last):
ValueError: Failed matching inputs <=> needs for FnOp(name='enlist', needs=['a', 'b'(+)], provides=['sum'], fn='enlist'):
    1. Expected varargs inputs to be non-str iterables: {'b'(+): 2989}
    +++inputs: ['a', 'b']
Attention
To avoid user mistakes, varargs do not accept str inputs (though they accept other iterables):

>>> graph(a=5, b="mistake")
Traceback (most recent call last):
ValueError: Failed matching inputs <=> needs for FnOp(name='enlist', needs=['a', 'b'(+)], provides=['sum'], fn='enlist'):
    1. Expected varargs inputs to be non-str iterables: {'b'(+): 'mistake'}
    +++inputs: ['a', 'b']
See also
The elaborate example in Hierarchical data and further tricks section.
Interface differently named dependencies: aliases & keyword modifier¶
Sometimes you need to interface functions & operations that name a dependency differently. There are 4 different ways to accomplish that:

1. Introduce some “pipe-through” operation (see the example in Default conveyor operation, below).
2. Annotate certain needs with the keyword() modifier (exemplified in the modifier).
3. For a returns dictionary operation, annotate certain provides with the keyword() modifier (exemplified in the modifier).
4. Alias (clone) certain provides to different names:

>>> op = operation(str,
...                name="cloning `provides` with an `alias`",
...                provides="real thing",
...                aliases={"real thing": "clone"})
Default conveyor operation¶
If you don’t specify a callable, the default identity function gets assigned, as long as a name for the operation is given and the number of needs matches the number of provides.
This facilitates conveying inputs into renamed outputs without the need to define a trivial identity function matching the needs & provides each time:
>>> from graphtik import keyword, optional, vararg
>>> op = operation(
... None,
... name="a",
... needs=[optional("opt"), vararg("vararg"), "pos", keyword("kw")],
... # positional vararg, keyword, optional
... provides=["pos", "vararg", "kw", "opt"],
... )
>>> op(opt=5, vararg=6, pos=7, kw=8)
{'pos': 7, 'vararg': 6, 'kw': 8, 'opt': 5}
Notice that the order of the results is not that of the needs
(or that of the inputs in the compute()
method), but, as explained in the comment-line,
it follows Python semantics.
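Those Python semantics can be demonstrated with a plain function whose signature mirrors the conveyor's argument kinds (the function below is an illustration, not graphtik code): results come back in the order positional, *varargs, keyword, optional.

```python
# Plain-Python reminder of the calling order the conveyor follows:
# positional args first, then *varargs, then keyword/optional args.
def convey(pos, *varargs, kw, opt=None):
    return (pos, *varargs, kw, opt)

print(convey(7, 6, kw=8, opt=5))  # → (7, 6, 8, 5)
```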
Considerations when building pipelines¶
When many operations are composed into a computation graph, Graphtik matches up the values in their needs and provides to form the edges of that graph (see Pipelines for more on that), like the operations from the sample formula (1) in Quick start section:
>>> from operator import mul, sub
>>> from functools import partial
>>> from graphtik import compose, operation
>>> def abspow(a, p):
... """Compute |a|^p. """
... c = abs(a) ** p
... return c
>>> # Compose the mul, sub, and abspow operations into a computation graph.
>>> graphop = compose("graphop",
... operation(mul, needs=["α", "β"], provides=["α×β"]),
... operation(sub, needs=["α", "α×β"], provides=["α-α×β"]),
... operation(name="abspow1", needs=["α-α×β"], provides=["|α-α×β|³"])
... (partial(abspow, p=3))
... )
>>> graphop
Pipeline('graphop',
needs=['α', 'β', 'α×β', 'α-α×β'],
provides=['α×β', 'α-α×β', '|α-α×β|³'],
x3 ops: mul, sub, abspow1)
Notice the use of functools.partial() to set parameter p to a constant value.
Notice also that the wrapping is done by calling once more the “decorator” returned from operation() when it was called without a function.
The needs
and provides
arguments to the operations in this script define
a computation graph that looks like this:
Pipelines¶
Graphtik’s operation.compose()
factory handles the work of tying together operation
instances into a runnable computation graph.
The simplest use case is to assemble a collection of individual operations into a runnable computation graph. The sample formula (1) from Quick start section illustrates this well:
>>> from operator import mul, sub
>>> from functools import partial
>>> from graphtik import compose, operation
>>> def abspow(a, p):
... """Computes |a|^p. """
... c = abs(a) ** p
... return c
The call here to compose()
yields a runnable computation graph that looks like this
(where the circles are operations, squares are data, and octagons are parameters):
>>> # Compose the mul, sub, and abspow operations into a computation graph.
>>> formula = compose("maths",
... operation(name="mul1", needs=["α", "β"], provides=["α×β"])(mul),
... operation(name="sub1", needs=["α", "α×β"], provides=["α-α×β"])(sub),
... operation(name="abspow1", needs=["α-α×β"], provides=["|α-α×β|³"])
... (partial(abspow, p=3))
... )
This yields a graph which looks like this (see Plotting):
>>> formula.plot('calc_power.svg')
Compiling and Running a pipeline¶
The graph composed above can be run by simply calling it with a dictionary of values with keys corresponding to the named dependencies (needs & provides):
>>> # Run the graph and request all of the outputs.
>>> out = formula(α=2, β=5)
>>> out
{'α': 2, 'β': 5, 'α×β': 10, 'α-α×β': -8, '|α-α×β|³': 512}
You may plot the solution:
>>> out.plot('a_solution.svg')
The solution of the graph.¶
Alternatively, you may compile only (and validate()
)
the pipeline, to see which operations will be included in the graph
(assuming the graph is solvable at all), based on the given inputs/outputs
combination:
>>> plan = formula.compile(['α', 'β'], outputs='α-α×β')
>>> plan
ExecutionPlan(needs=['α', 'β'],
provides=['α-α×β'],
x5 steps: mul1, β, sub1, α, α×β)
>>> plan.validate() # all fine
Plotting the plan reveals the pruned operations, and numbers operations and evictions (see next section) in the order of execution:
>>> plan.plot()
Obtaining just the execution plan.¶
Tip
Hover over pruned (grey) operations to see why they were excluded from the plan.
But if an impossible combination of inputs & outputs is asked, the plan comes out empty:
>>> plan = formula.compile('α', outputs="α-α×β")
>>> plan
ExecutionPlan(needs=[], provides=[], x0 steps: )
>>> plan.validate()
Traceback (most recent call last):
ValueError: Unsolvable graph:
+--Network(x8 nodes, x3 ops: mul1, sub1, abspow1)
+--possible inputs: ['α', 'β', 'α×β', 'α-α×β']
+--possible outputs: ['α×β', 'α-α×β', '|α-α×β|³']
Evictions: producing a subset of outputs¶
By default, calling a graph-operation on a set of inputs will yield all of
that graph’s outputs.
You can use the outputs
parameter to request only a subset.
For example, if formula
is as above:
>>> # Run the graph-operation and request a subset of the outputs.
>>> out = formula.compute({'α': 2, 'β': 5}, outputs="α-α×β")
>>> out
{'α-α×β': -8}
When asking a subset of the graph’s outputs, Graphtik does 2 things:
it prunes any operations that are not on the path from given inputs to the requested outputs (e.g. the
abspow1
operation, above, is not executed);it evicts any intermediate data from solution as soon as they are not needed.
You may see (2) in action by including the sequence of execution steps into the plot:
>>> dot = out.plot(theme={"show_steps": True})
Tip
Read Plot customizations to understand the trick with the plot theme, above.
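The eviction idea in step (2) can be illustrated with a toy loop (this is an illustration of the concept, not graphtik's planner): after each step, any value that is neither requested nor needed by a later step is dropped from the solution.

```python
# Toy illustration of eviction -- NOT graphtik internals.
# Each step: (op name, needs, a single provide).
steps = [
    ("mul1", ["α", "β"], "α×β"),
    ("sub1", ["α", "α×β"], "α-α×β"),
]
ops = {"mul1": lambda a, b: a * b, "sub1": lambda a, b: a - b}
solution = {"α": 2, "β": 5}
requested = {"α-α×β"}

for i, (name, needs, provide) in enumerate(steps):
    solution[provide] = ops[name](*[solution[n] for n in needs])
    # Evict any value neither requested nor needed by a later step:
    later_needs = {n for _, ns, _ in steps[i + 1:] for n in ns}
    for key in list(solution):
        if key not in requested and key not in later_needs:
            del solution[key]

print(solution)  # → {'α-α×β': -8}
```

Only the requested output survives, matching the subset-output run above.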
Short-circuiting a pipeline¶
You can short-circuit a graph computation, making certain inputs unnecessary,
by providing a value in the graph that is further downstream in the graph than those inputs.
For example, in the graph-operation we’ve been working with, you could provide
the value of α-α×β
to make the inputs α
and β
unnecessary:
>>> # Run the graph-operation and request a subset of the outputs.
>>> out = formula.compute({"α-α×β": -8})
>>> out
{'α-α×β': -8, '|α-α×β|³': 512}
When you do this, any operation
nodes that are not on a path from the downstream input
to the requested outputs (i.e. predecessors of the downstream input) are not computed.
For example, the mul1
and sub1
operations are not executed here.
This can be useful if you have a graph-operation that accepts alternative forms of the same input.
For example, if your graph-operation requires a PIL.Image
as input, you could
allow your graph to be run in an API server by adding an earlier operation
that accepts as input a string of raw image data and converts that data into the needed PIL.Image
.
Then, you can either provide the raw image data string as input, or you can provide
the PIL.Image
if you have it and skip providing the image data string.
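The "alternative input forms" idea can be sketched without graphtik (all names below are hypothetical): a converter step fills in the canonical value only when it is missing, so supplying the downstream value directly short-circuits the conversion.

```python
# Hypothetical sketch: an upstream converter produces the canonical
# form, but is skipped when the canonical value is supplied directly.
def build_solution(inputs, converters):
    solution = dict(inputs)
    for (src, dst), convert in converters.items():
        if dst not in solution and src in solution:
            solution[dst] = convert(solution[src])
    return solution

converters = {("raw_text", "number"): int}
print(build_solution({"raw_text": "42"}, converters))  # converter runs
print(build_solution({"number": 7}, converters))       # short-circuited
```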
Re-computations¶
If you take the solution from a pipeline, change some values in it,
and feed it back into the same pipeline as inputs, the recomputation will,
unexpectedly, fail with Unsolvable graph
error – all dependencies
have already values, therefore any operations producing them are pruned out,
till no operation remains:
>>> new_inp = formula.compute({"α": 2, "β": 5})
>>> new_inp["α"] = 20
>>> formula.compute(new_inp)
Traceback (most recent call last):
ValueError: Unsolvable graph:
+--Network(x8 nodes, x3 ops: mul1, sub1, abspow1)
+--possible inputs: ['α', 'β', 'α×β', 'α-α×β']
+--possible outputs: ['α×β', 'α-α×β', '|α-α×β|³']
One way to proceed is to avoid recompiling, by executing directly the pre-compiled plan, which will run all the original operations on the new values:
>>> sol = new_inp.plan.execute(new_inp)
>>> sol
{'α': 20, 'β': 5, 'α×β': 100, 'α-α×β': -80, '|α-α×β|³': 512000}
>>> [op.name for op in sol.executed]
['mul1', 'sub1', 'abspow1']
Hint
Notice that all values have been marked as overwrites.
But that trick wouldn’t work if the modified value is an inner dependency of the graph – in that case, the operations upstream would simply overwrite it:
>>> new_inp["α-α×β"] = 123
>>> sol = new_inp.plan.execute(new_inp)
>>> sol["α-α×β"] # should have been 123!
-80
You can still do that using the recompute_from
argument of Pipeline.compute()
.
It accepts a string/list of dependencies to recompute, downstream:
>>> sol = formula.compute(new_inp, recompute_from="α-α×β")
>>> sol
{'α': 20, 'β': 5, 'α×β': 10, 'α-α×β': 123, '|α-α×β|³': 1860867}
>>> [op.name for op in sol.executed]
['abspow1']
The old values are retained, although the operations producing them have been pruned from the plan.
Note
The value of α-α×β
is no longer the correct result of sub1
operation,
above it (hover to see sub1
inputs & output).
Extending pipelines¶
Sometimes we begin with existing computation graph(s) that we want to extend with other operations and/or pipelines.
There are 2 ways to combine pipelines together, merging (the default) and nesting.
Merging¶
This is the default mode for compose() when combining individual operations,
and it works exactly the same when whole pipelines are involved.
For example, let’s suppose that this simple pipeline describes the daily scheduled workload of an “empty” day:
>>> weekday = compose("weekday",
... operation(str, name="wake up", needs="backlog", provides="tasks"),
... operation(str, name="sleep", needs="tasks", provides="todos"),
... )
Now let’s do some “work”:
>>> weekday = compose("weekday",
... operation(lambda t: (t[:-1], t[-1:]),
... name="work!", needs="tasks", provides=["tasks done", "todos"]),
... operation(str, name="sleep"),
... weekday,
... )
Notice that the pipeline to override was added last, at the bottom; that’s because the operations added earlier in the call (further to the left) override any identically-named operations added later.
Notice also that the overridden “sleep” operation hasn’t got any actual role
in the schedule.
We can eliminate “sleep” by using the excludes
argument (it accepts also list):
>>> weekday = compose("weekday", weekday, excludes="sleep")
Nesting¶
Other times we want to preserve all the operations composed, regardless of clashes
in their names. This is doable with compose(..., nest=True).
Let’s build a schedule for the 3-day week (covid19 γαρ…), by combining 3 mutated copies of the daily schedule we built earlier:
>>> weekdays = [weekday.withset(name=f"day {i}") for i in range(3)]
>>> week = compose("week", *weekdays, nest=True)
We now have 3 “isolated” clusters because all operations & data have been prefixed with the name of the pipeline they originally belonged to.
Let’s suppose we want to break the isolation, and have all sub-pipelines consume & produce from a common “backlog” (n.b. in real life, we would have “feeder” & “collector” operations).
We do that by passing as the nest
parameter a callable()
which will decide
which names of the original pipeline (operations & dependencies) should be prefixed
(see also compose()
& RenArgs
for how to use that param):
>>> def rename_predicate(ren_args):
... if ren_args.name not in ("backlog", "tasks done", "todos"):
... return True
>>> week = compose("week", *weekdays, nest=rename_predicate)
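The renaming that nesting performs can be sketched in plain Python (the helper below is hypothetical, not graphtik's API): every name is prefixed with its pipeline's name unless a predicate marks it as shared.

```python
# Hypothetical sketch of nesting's rename step: prefix each dependency
# with its pipeline's name, unless the predicate keeps it shared.
def nest_deps(pipeline_name, deps, predicate=lambda name: True):
    return [f"{pipeline_name}.{d}" if predicate(d) else d for d in deps]

shared = {"backlog", "tasks done", "todos"}
print(nest_deps("day 0", ["backlog", "tasks"], lambda n: n not in shared))
# → ['backlog', 'day 0.tasks']
```

This mirrors the solution keys above, where `day 0.tasks` is prefixed while `backlog` stays common.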
Finally we may run the week’s schedule and get the outcome (whatever that might be :-); hover over the results to see them:
>>> sol = week.compute({'backlog': "a lot!"})
>>> sol
{'backlog': 'a lot!',
'day 0.tasks': 'a lot!',
'tasks done': 'a lot', 'todos': '!',
'day 1.tasks': 'a lot!',
'day 2.tasks': 'a lot!'}
>>> dot = sol.plot(clusters=True)
Tip
We had to plot with clusters=True to prevent the plan
from inserting the “after pruning” cluster (see PlotArgs.clusters).
See also
Consult these test-cases from the full sources of the project:
test/test_combine.py
Advanced pipelines¶
Depending on sideffects¶
- graphtik.modifier.token(name, optional: Optional[bool] = None) _Modifier [source]
tokens denoting modifications beyond the scope of the solution.
Both needs & provides may be designated as tokens using this modifier. They work as usual while solving the graph (planning) but they have a limited interaction with the operation’s underlying function; specifically:
input tokens must exist in the solution as inputs for an operation depending on it to kick-in, when the computation starts - but this is not necessary for intermediate tokens in the solution during execution;
input tokens are NOT fed into underlying functions;
output tokens are not expected from underlying functions, unless a rescheduled operation with partial outputs designates a sideffected as canceled by returning it with a falsy value (the operation must be a returns dictionary one).
Tip

If modifications involve some input/output, prefer the implicit() or sfxed() modifiers.

You may still convey such relationships by including the dependency name in the string - in the end, it’s just a string - but no enforcement of any kind will happen from graphtik, like:

>>> from graphtik import token
>>> token("price[sales_df]")
'price[sales_df]'($)
Example:
A typical use-case is to signify changes in some “global” context, outside solution:
>>> from graphtik import operation, compose, token
>>> @operation(provides=token("lights off"))  # sideffect names can be anything
... def close_the_lights():
...     pass

>>> graph = compose('strip ease',
...     close_the_lights,
...     operation(
...         name='undress',
...         needs=[token("lights off")],
...         provides="body")(lambda: "TaDa!")
... )
>>> graph
Pipeline('strip ease', needs=['lights off'($)], provides=['lights off'($), 'body'], x2 ops: close_the_lights, undress)

>>> sol = graph()
>>> sol
{'body': 'TaDa!'}
sideffect¶
Note
Something has to provide a sideffect for a function needing it to execute - this could be another operation, like above, or the user-inputs; just specify some truthy value for the sideffect:
>>> sol = graph.compute({token("lights off"): True})
Modifying existing values in solutions¶
- graphtik.modifier.sfxed(dependency: str, sfx0: str, *sfx_list: str, keyword: Optional[str] = None, optional: Optional[bool] = None, accessor: Optional[Accessor] = None, jsonp=None) _Modifier [source]
Annotates a sideffected dependency in the solution sustaining side-effects.
- Parameters
dependency – the actual dependency receiving the sideffect, which will be fed into/out of the function.
sfx0 – the 1st (arbitrary object) sideffect marked as “acting” on the dependency.
sfx_list – more (arbitrary object) sideffects (like the sfx0)
keyword – the name of the function argument it corresponds to. When optional, it becomes the same as name if falsy, so as to always behave like a kw-type arg and to preserve its fn-name if ever renamed; when not optional, it may simply be omitted.
accessor – the functions to access values to/from the solution (see Accessor); actually a 2-tuple with functions is ok.
jsonp – None (derived from name), False, str, or a collection of str/callables (the last one); see the generic modify() modifier.
Like token() but annotating a real dependency in the solution, allowing that dependency to be present both in the needs and the provides of the same function.

Example:
A typical use-case is to signify columns required to produce new ones in pandas dataframes (emulated with dictionaries):
>>> from graphtik import operation, compose, sfxed
>>> @operation(needs="order_items",
...            provides=sfxed("ORDER", "Items", "Prices"))
... def new_order(items: list) -> "pd.DataFrame":
...     order = {"items": items}
...     # Pretend we get the prices from sales.
...     order['prices'] = list(range(1, len(order['items']) + 1))
...     return order

>>> @operation(
...     needs=[sfxed("ORDER", "Items"), "vat rate"],
...     provides=sfxed("ORDER", "VAT")
... )
... def fill_in_vat(order: "pd.DataFrame", vat: float):
...     order['VAT'] = [i * vat for i in order['prices']]
...     return order

>>> @operation(
...     needs=[sfxed("ORDER", "Prices", "VAT")],
...     provides=sfxed("ORDER", "Totals")
... )
... def finalize_prices(order: "pd.DataFrame"):
...     order['totals'] = [p + v for p, v in zip(order['prices'], order['VAT'])]
...     return order
To view all internal dependencies, enable DEBUG in configurations:
>>> from graphtik.config import debug_enabled
>>> with debug_enabled(True):
...     finalize_prices
FnOp(name='finalize_prices',
     needs=[sfxed('ORDER', 'Prices'), sfxed('ORDER', 'VAT')],
     _user_needs=[sfxed('ORDER', 'Prices', 'VAT')],
     _fn_needs=['ORDER'],
     provides=[sfxed('ORDER', 'Totals')],
     _user_provides=[sfxed('ORDER', 'Totals')],
     _fn_provides=['ORDER'],
     fn='finalize_prices')
Notice that declaring a single sideffected with many items in sfx_list expands into multiple “singular” sideffected dependencies in the network (check needs vs _user_needs above).

>>> proc_order = compose('process order', new_order, fill_in_vat, finalize_prices)
>>> sol = proc_order.compute({
...     "order_items": ["toilet-paper", "soap"],
...     "vat rate": 0.18,
... })
>>> sol
{'order_items': ['toilet-paper', 'soap'],
 'vat rate': 0.18,
 'ORDER': {'items': ['toilet-paper', 'soap'],
           'prices': [1, 2],
           'VAT': [0.18, 0.36],
           'totals': [1.18, 2.36]}}
sideffecteds¶
Notice that although many functions consume & produce the same ORDER dependency, something that would have formed cycles, the wrapping operations need and provide different sideffected instances, thus breaking the cycles.

See also
The elaborate example in the Hierarchical data and further tricks section.
Resilience when operations fail (endurance)¶
It is possible for a pipeline to keep executing operations even if some of them raise errors, as long as those are marked as endured:
>>> @operation(endured=1, provides=["space", "time"])
... def get_out():
... raise ValueError("Quarantined!")
>>> get_out
FnOp!(name='get_out', provides=['space', 'time'], fn='get_out')
Notice the exclamation(!
) before the parenthesis in the string representation
of the operation.
>>> @operation(needs="space", provides="fun")
... def exercise(where):
... return "refreshed"
>>> @operation(endured=1, provides="time")
... def stay_home():
... return "1h"
>>> @operation(needs="time", provides="fun")
... def read_book(for_how_long):
... return "relaxed"
>>> covid19 = compose("covid19", get_out, stay_home, exercise, read_book)
>>> covid19
Pipeline('covid19',
needs=['space', 'time'],
provides=['space', 'time', 'fun'],
x4 ops: get_out, stay_home, exercise, read_book)
Notice the thick outlines of the endured (or rescheduled, see below) operations.
When executed, the pipeline produces outputs even though one of its operations has failed:
>>> sol = covid19()
>>> sol
{'time': '1h', 'fun': 'relaxed'}
You may still abort on failures, later, by raising an appropriate exception from
Solution
:
>>> sol.scream_if_incomplete()
Traceback (most recent call last):
graphtik.base.IncompleteExecutionError:
Not completed x2 operations ['exercise', 'get_out'] due to x1 failures and x0 partial-ops:
+--get_out: ValueError('Quarantined!')
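The endurance behaviour can be sketched as a plain-Python loop (a conceptual illustration, not graphtik's executor): failing operations are recorded instead of aborting the run, and operations whose inputs never materialize are silently canceled.

```python
# Toy sketch of "endurance" -- NOT graphtik's executor.
def run_endured(ops, solution):
    failures = {}
    for name, needs, provide, fn in ops:
        if any(n not in solution for n in needs):
            continue  # canceled: its inputs never materialized
        try:
            solution[provide] = fn(*[solution[n] for n in needs])
        except Exception as exc:
            failures[name] = exc  # endure: record the error, keep going
    return solution, failures

def get_out():
    raise ValueError("Quarantined!")

ops = [
    ("get_out", [], "space", get_out),
    ("stay_home", [], "time", lambda: "1h"),
    ("exercise", ["space"], "fun", lambda where: "refreshed"),
    ("read_book", ["time"], "fun", lambda t: "relaxed"),
]
sol, failures = run_endured(ops, {})
print(sol)  # → {'time': '1h', 'fun': 'relaxed'}
```

As in the pipeline above, `get_out` fails, `exercise` is canceled for lack of `space`, yet `stay_home` and `read_book` still deliver their outputs.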
Operations with partial outputs (rescheduled)¶
In case the outputs actually produced depend on some condition in the inputs, the solution has to reschedule the plan amidst execution, and consider the provides actually delivered:
>>> @operation(rescheduled=1,
...            needs="quarantine",
...            provides=["space", "time"],
...            returns_dict=True)
... def get_out_or_stay_home(quarantine):
...     if quarantine:
...         return {"time": "1h"}
...     else:
...         return {"space": "around the block"}
>>> get_out_or_stay_home
FnOp?(name='get_out_or_stay_home',
needs=['quarantine'],
provides=['space', 'time'],
fn{}='get_out_or_stay_home')
>>> @operation(needs="space", provides=["fun", "body"])
... def exercise(where):
...     return "refreshed", "strong feet"
>>> @operation(needs="time", provides=["fun", "brain"])
... def read_book(for_how_long):
...     return "relaxed", "popular physics"
>>> covid19 = compose("covid19", get_out_or_stay_home, exercise, read_book)
Depending on the “quarantine” state we get to execute a different part of the pipeline:
>>> sol = covid19(quarantine=True)
>>> sol = covid19(quarantine=False)
In both cases, a warning gets raised about the missing outputs, but the execution proceeds regularly to whatever is possible to evaluate. You may collect a report of what has been canceled using this:
>>> print(sol.check_if_incomplete())
Not completed x1 operations ['read_book'] due to x0 failures and x1 partial-ops:
+--get_out_or_stay_home: ['time']
In case you wish to cancel the output of a single-result operation,
return the special value graphtik.NO_RESULT
.
Hierarchical data and further tricks¶
Working with hierarchical data relies upon dependencies expressed as json pointer paths against solution data-tree. Let’s retrofit the “weekly tasks” example from Nesting section, above.
In the previous example, we had left out the collection of the tasks and the TODOs – this time we’re going to:
properly distribute & backlog the tasks to be done across the days using sideffected dependencies to modify the original stack of tasks in-place, while the workflow is running,
exemplify further the use of operation nesting & renaming, and
(unstable API) access the wrapping operation and
execution
machinery from within the function by using task_context
, and finally store the input backlog, the work done, and the TODOs from the tasks in this data-tree:
+--backlog
+--Monday.tasks
+--Wednesday.tasks
+--Tuesday.tasks
+--daily_tasks/
   +--Monday
   +--Tuesday
   +--Wednesday
+--weekly_tasks
+--todos
First, let’s build the single day’s workflow, without any nesting:
>>> from graphtik import NO_RESULT, sfxed
>>> from graphtik.base import RenArgs # type hints for autocompletion.
>>> from graphtik.execution import task_context
>>> from graphtik.modifier import dep_renamed
>>> todos = sfxed("backlog", "todos")
>>> @operation(name="wake up",
...                needs="backlog",
...                provides=["tasks", todos],
...                rescheduled=True
... )
... def pick_tasks(backlog):
...     if not backlog:
...         return NO_RESULT
...     # Pick from backlog 1/3 of len-of-chars of my day-name.
...     n_tasks = int(len(task_context.get().op.name) / 3)
...     my_tasks, todos = backlog[:n_tasks], backlog[n_tasks:]
...     return my_tasks, todos
The actual work is emulated with a conveyor operation:
>>> do_tasks = operation(fn=None, name="work!", needs="tasks", provides="daily_tasks")
>>> weekday = compose("weekday", pick_tasks, do_tasks)
Notice that the “backlog” sideffected result of the “wake up” operation is also listed in its needs; through this trick, each day can remove the tasks it completed from the initial backlog, for the next day to pick up. The “todos” sfx is just a name to denote the kind of modification performed on the “backlog”.
Note also that the tasks passed from the “wake up” –> “work!” operations are not hierarchical,
but kept “private” in each day by nesting them with a dot (.):
Now let’s clone the daily-task x3 and nest it, to make a 3-day workflow:
>>> days = ["Monday", "Tuesday", "Wednesday"]
>>> weekdays = [weekday.withset(name=d) for d in days]
>>> def nester(ra: RenArgs):
...     dep = ra.name
...     if ra.typ == "op":
...         return True  # Nest by dot.
...     if ra.typ.endswith(".jsonpart"):
...         return False  # Don't touch the json-pointer parts.
...     if dep == "tasks":
...         return True  # Nest by dot.
...     if dep == "daily_tasks":
...         # Nest as subdoc.
...         return dep_renamed(dep, lambda n: f"{n}/{ra.parent.name}")
...     return False
>>> week = compose("week", *weekdays, nest=nester)
And this is now the pipeline for a 3-day week.
Notice the daily_tasks/{day}
subdoc nodes at the bottom of the diagram:
Finally combine all weekly-work using a “collector” operation:
>>> from graphtik import vararg
>>> @operation(
...     name="collect tasks",
...     needs=[todos, *(vararg(f"daily_tasks/{d}") for d in days)],
...     provides=["weekly_tasks", "todos"],
... )
... def collector(backlog, *daily_tasks):
...     return daily_tasks or (), backlog or ()
...
>>> week = compose("week", week, collector)
This is the final week pipeline:
We can now feed the week pipeline with a “workload” of 17 imaginary tasks. We know from each “wake up” operation that Monday, Tuesday & Wednesday will pick 4, 5 & 5 tasks respectively, leaving 3 tasks as “todo”:
>>> sol = week.compute({"backlog": range(17)})
>>> sol
{'backlog': range(14, 17),
'Monday.tasks': range(0, 4),
'daily_tasks': {'Monday': range(0, 4),
'Tuesday': range(4, 9),
'Wednesday': range(9, 14)},
'Tuesday.tasks': range(4, 9),
'Wednesday.tasks': range(9, 14),
'weekly_tasks': (range(0, 4), range(4, 9), range(9, 14)),
'todos': range(14, 17)}
Or we can reduce the workload, and see that Wednesday is left without any work to do:
>>> sol = week.compute(
... {"backlog": range(9)},
... outputs=["daily_tasks", "weekly_tasks", "todos"])
Hover over the data nodes to see the results. Specifically check the “daily_tasks” which is a nested dictionary:
>>> sol
{'daily_tasks': {'Monday': range(0, 4),
'Tuesday': range(4, 9)},
'weekly_tasks': (range(0, 4), range(4, 9)),
'todos': ()}
Tip
If an operation works with dependencies only in some sub-document and below, its prefix can be factored-out as a current-working-document, an argument given when defining the operation.
Concatenating Pandas¶
Writing output values into jsonp paths works fine for dictionaries,
but it is not always possible to modify pandas objects that way
(e.g. multi-indexed objects).
In that case you may concatenate the output pandas
with those in solution, by annotating provides with hcat()
or vcat
modifiers (which eventually select different accessors).
For instance, assuming an input document that contains 2 dataframes with the same number of rows:
/data_lake/satellite_data: pd.DataFrame(...)
/db/planet_ephemeris: pd.DataFrame(...)
… we can copy multiple columns from satellite_data
–> planet_ephemeris
,
at once, with something like this:
@operation(
    needs="data_lake/satellite_data",
    provides=hcat("db/planet_ephemeris/orbitals")
)
def extract_planets_columns(satellite_df):
    orbitals_df = satellite_df.iloc[:, 3:8]  # the orbital columns
    orbitals_df.columns = pd.MultiIndex.from_product(
        [["orbitals"], orbitals_df.columns]
    )
    return orbitals_df
Hint
Notice that we used the same orbitals
name, both
for the sub-name in the jsonp expression, and as a new level
in the multi-index columns of the orbitals_df
dataframe.
That will help further down the road, to index and extract that group of columns
with /db/planet_ephemeris/orbitals
dependency, and continue building
the network.
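The hint can be demonstrated with plain pandas, on a toy frame (column names assumed):

```python
import pandas as pd

df = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]], columns=["apoapsis", "periapsis"])
# Push the columns under a new "orbitals" top level, as done above.
df.columns = pd.MultiIndex.from_product([["orbitals"], df.columns])
orbitals = df["orbitals"]  # extract the whole column group by its top-level name
```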
Plotting and Debugging¶
Plotting¶
For errors & debugging it is necessary to visualize the graph-operation (e.g. to see why nodes were pruned). You may plot any plottable and annotate on top the execution plan and solution of the last computation, calling methods with arguments like this:
pipeline.plot(True) # open a matplotlib window
pipeline.plot("pipeline.svg") # other supported formats: png, jpg, pdf, ...
pipeline.plot() # without arguments, returns a pydot.Dot object
pipeline.plot(solution=solution) # annotate graph with solution values
solution.plot() # plot solution only
… or for the last …:
solution.plot(...)
The legend for all graphtik diagrams, generated by legend()
.¶
The same Plottable.plot()
method applies also for:
each one capable of producing diagrams with increasing complexity.
For instance, when a pipeline has just been composed, plotting it will
come out bare bone, with just the 2 types of nodes (data & operations), their
dependencies, and (optionally, if plot theme show_steps
is true)
the sequence of the execution-steps of the plan.
But as soon as you run it, the next plot calls will include more of the internals.
Internally it delegates to ExecutionPlan.plot()
of the plan
attribute, which caches the last run to facilitate debugging.
If you want the bare-bone diagram, plot the network:
pipeline.net.plot(...)
If you want all details, plot the solution:
solution.net.plot(...)
Note
For plots, Graphviz program must be in your PATH,
and pydot
& matplotlib
python packages installed.
You may install both when installing graphtik
with its plot
extras:
pip install graphtik[plot]
Tip
A description of the API of the pydot.Dot
instance returned by plot()
methods is here: https://pydotplus.readthedocs.io/reference.html#pydotplus.graphviz.Dot
Jupyter notebooks¶
The pydot.Dot
instances returned by
Plottable.plot()
are rendered directly in Jupyter/IPython notebooks
as SVG images.
You may increase the height of the SVG cell output with something like this:
pipeline.plot(jupyter_render={"svg_element_styles": "height: 600px; width: 100%"})
See default_jupyter_render
for those defaults and recommendations.
Plot customizations¶
Rendering of plots is performed by the active plotter (class plot.Plotter
).
All Graphviz styling attributes are controlled by the active plot theme,
which is the plot.Theme
instance installed in its Plotter.default_theme
attribute.
The following style expansions apply in the attribute-values
of Theme
instances:
Call any callables found as keys, values or the whole style-dict, passing in the current
plot_args
, and replace those with the callable’s result (even more flexible than templates).Resolve any
Ref
instances, first against the current nx_attrs and then against the attributes of the current theme.Render jinja2 templates with template-arguments all attributes of
plot_args
instance in use, (hence much more flexible thanRef
Any None results above are discarded.
Workaround pydot/pydot#228 pydot-cstor not supporting styles-as-lists.
Merge tooltip & tooltip lists.
You may customize the theme and/or plotter behavior with various strategies, ordered by breadth of the effects (most broadly effecting method at the top):
(zeroth, because it is discouraged!)
Modify in-place
Theme
class attributes, and monkeypatch Plotter
methods.

This is the most invasive method, affecting all past and future plotter instances, and future-only(!) themes used during a Python session.
Attention
It is recommended to use other means for plot customizations instead of directly modifying the theme’s class-attributes.
All
Theme
class-attributes are deep-copied when constructing new instances, to avoid modifications by mistake while attempting to update instance-attributes instead (hint: almost all of its attributes are containers, i.e. dicts). Therefore any class-attribute modifications will be ignored until a new Theme
instance from the patched class is used.
Modify the
default_theme
attribute of the default active plotter, like that:

get_active_plotter().default_theme.kw_op["fillcolor"] = "purple"
This will affect all
Plottable.plot()
calls for a Python session.

Create a new
Plotter
with customized Plotter.default_theme
, or clone and customize the theme of an existing plotter by the use of its Plotter.with_styles()
method, and make that the new active plotter.

This will affect all calls in
context
. If customizing theme constants is not enough, you may subclass and install a new
Plotter
class in context.
Pass theme or plotter arguments when calling
Plottable.plot()
:

pipeline.plot(plotter=Plotter(kw_legend=None))
pipeline.plot(theme=Theme(show_steps=True))
You may clone and customize an existing plotter, to preserve any pre-existing customizations:
active_plotter = get_active_plotter()
pipeline.plot(theme={"show_steps": True})
… OR:
pipeline.plot(plotter=active_plotter.with_styles(kw_legend=None))
You may create a new class to override Plotter’s methods that way.
Hint
This project dogfoods (3) in its own
docs/source/conf.py
sphinx file. In particular, it configures the base-url of operation node links (by default, nodes do not link to any url):

## Plot graphtik SVGs with links to docs.
#
def _make_py_item_url(fn):
    if not inspect.isbuiltin(fn):
        fn_name = base.func_name(fn, None, mod=1, fqdn=1, human=0)
        if fn_name:
            return f"../reference.html#{fn_name}"

plotter = plot.get_active_plotter()
plot.set_active_plotter(
    plot.get_active_plotter().with_styles(
        kw_op_label={
            **plotter.default_theme.kw_op_label,
            "op_url": lambda plot_args: _make_py_item_url(plot_args.nx_item),
            "fn_url": lambda plot_args: _make_py_item_url(plot_args.nx_item.fn),
        }
    )
)
Sphinx-generated sites¶
This library contains a new Sphinx extension (adapted from the sphinx.ext.doctest
)
that can render plottables in sites from python code in “doctests”.
To enable it, append module graphtik.sphinxext
as a string in your docs/conf.py
: extensions
list, and then intersperse the graphtik
or graphtik-output
directives with regular doctest-code to embed graph-plots into the site; you may
refer to those plotted graphs with the graphtik
role referring to
their :name: option (see Examples below).
Hint
Note that Sphinx is not doctesting the actual python modules, unless the plotting code has ended up, somehow, in the site (e.g. through some autodoc directive). Contrary to pytest and doctest standard module, the module’s globals are not imported (until sphinx#6590 is resolved), so you may need to import it in your doctests, like this:
Unfortunately, you cannot use relative imports, and have to write out your module’s full name.
Directives¶
- .. graphtik::¶
Renders a figure with graphtik plots from doctest code.
It supports:
all configurations from
sphinx.ext.doctest
sphinx-extension, plus those described below, in Configurations.all options from ‘doctest’ directive,
hide
options
pyversion
skipif
these options from
image
directive, excepttarget
(plot elements may already link to URLs):height
width
scale
class
alt
these options from
figure
directive:name
align
figwidth
figclass
and the following new options:
graphvar
graph-format
caption
Specifically the “interesting” options are these:
- :graphvar: (string, optional) varname (`str`)¶
the variable name containing what to render, which can be:
an instance of
Plottable
(such asFnOp
,Pipeline
,Network
,ExecutionPlan
orSolution
);an already plotted
pydot.Dot
instance, i.e. the result of a Plottable.plot()
call
If missing, it renders the last variable in the doctest code assigned with the above types.
Attention
If no
:graphvar:
is given and the doctest code fails, it will still render any plottable created from code that has run previously, without any warnings!
- :graph-format: png | svg | svgz | pdf | `None` (choice, default: `None`)¶
- if None, the format is decided according to the active builder, roughly:
“html”-like: svg
“latex”: pdf
Note that SVGs support zooming, tooltips & URL links, while PNGs support image maps for linkable areas.
- :zoomable: <empty>, (true, 1, yes, on) | (false, 0, no, off) (`bool`)¶
Enable/disable interactive pan+zoom of SVGs; if missing/empty,
graphtik_zoomable
assumed.
- :zoomable-opts: <empty>, (true, 1, yes, on) | (false, 0, no, off) (`str`)¶
A JS-object with the options for the interactive zoom+pan of SVGs. If missing,
graphtik_zoomable_options
assumed. Specify {}
explicitly to force library’s default options.
- :name: link target id (`str`)¶
Make this pipeline a hyperlink target identified by this name. If :name: given and no :caption: given, one is created out of this, to act as a permalink.
- :caption: figure's caption (`str`)¶
Text to put underneath the pipeline.
- .. graphtik-output::¶
Like
graphtik
, but works like doctest’stestoutput
directive.
- :graphtik:¶
An interpreted text role to refer to graphs plotted by
graphtik
orgraphtik-output
directives by their:name:
option.
Configurations¶
- graphtik_default_graph_format¶
type: Union[str, None]
default: None
The file extension of the generated plot images (without the leading dot), used when no
:graph-format:
option is given in agraphtik
orgraphtik-output
directive.If None, the format is chosen from
graphtik_graph_formats_by_builder
configuration.
- graphtik_graph_formats_by_builder¶
type: Map[str, str]
default: check the sources
a dictionary defining which plot image formats to choose, depending on the active builder.
Keys are regexes matching the name of the active builder;
values are strings from the supported formats for pydot library, e.g.
png
(seesupported_plot_formats()
).
If a builder does not match any key, and no format is given in the directive, no graphtik plot is rendered; so by default, it only generates plots for html & latex.
Warning
Latex is probably not working :-(
- graphtik_zoomable_svg¶
type: bool
default:
True
Whether to render SVGs with the zoom-and-pan javascript library, unless the
:zoomable:
directive-option is given (and not empty).Attention
Zoom-and-pan does not work in Sphinx sites for Chrome locally - serve the HTML files through some HTTP server, e.g. launch this command to view the site of this project:
python -m http.server 8080 --directory build/sphinx/html/
- graphtik_zoomable_options¶
type: str
default:
{"controlIconsEnabled": true, "fit": true}
A JS-object with the options for the interactive zoom+pan of SVGs, when the
:zoomable-opts:
directive option is missing. If empty,{}
assumed (library’s default options).
- graphtik_plot_keywords¶
type: dict
default:
{}
Arguments for
build_pydot()
to apply when rendering plottables.
- graphtik_save_dot_files¶
type: bool, None
default: None
For debugging purposes, if enabled, store another
<img>.txt
file next to each image file with the DOT text that produced it. When
None
(default), controlled by DEBUG flag from configurations, otherwise, any boolean takes precedence here.
- graphtik_warning_is_error¶
type: bool
default:
false
If false, suppress doctest errors, and avoid failures when building site with
-W
option, since these are unrelated to the building of the site.
doctest_test_doctest_blocks
(foreign config)Don’t disable doctesting of literal-blocks, ie, don’t reset the
doctest_test_doctest_blocks
configuration value, or else, such code would be invisible tographtik
directive.trim_doctest_flags
(foreign config)This configuration is forced to
False
(default wasTrue
).Attention
This means that in the rendered site, options-in-comments like
# doctest: +SKIP
and <BLANKLINE>
artifacts will be visible.
Examples¶
The following directive renders a diagram of its doctest code, beneath it:
.. graphtik::
   :graphvar: addmul
   :name: addmul-operation
>>> from graphtik import compose, operation
>>> addmul = compose(
... "addmul",
... operation(name="add", needs="abc".split(), provides="(a+b)×c")(lambda a, b, c: (a + b) * c)
... )
addmul-operation¶
which you may reference
with this syntax:
you may :graphtik:`reference <addmul-operation>` with ...
Hint
In this case, the :graphvar:
parameter is not really needed, since
the code contains just one variable assignment receiving a subclass
of Plottable
or pydot.Dot
instance.
Additionally, the doctest code producing the plottables does not have to be contained in the graphtik directive as a whole.
So the above could have been simply written like this:
>>> from graphtik import compose, operation
>>> addmul = compose(
... "addmul",
... operation(name="add", needs="abc".split(), provides="(a+b)×c")(lambda a, b, c: (a + b) * c)
... )
.. graphtik::
   :name: addmul-operation
Errors & debugging¶
Graphs are complex, and execution pipelines may become arbitrarily deep. Launching a debugger-session to inspect deeply nested stacks is notoriously hard.
This project has dogfooded various approaches when designing and debugging pipelines.
Logging¶
The 1st pit-stop is to increase the logging verbosity.
Logging statements have been meticulously placed to describe the pruning
while planning and subsequent execution flow;
execution flow log-statements are accompanied by the unique solution id
of each flow, like the (3C40)
& (8697)
below,
important for when running pipelines in (deprecated) parallel:
--------------------- Captured log call ---------------------
INFO === Compiling pipeline(t)...
INFO ... pruned step #4 due to unsatisfied-needs['d'] ...
DEBUG ... adding evict-1 for not-to-be-used NEED-chain{'a'} of topo-sorted #1 OpTask(FnOp|(name='...
DEBUG ... cache-updated key: ((), None, None)
INFO === (3C40) Executing pipeline(t), in parallel, on inputs[], according to ExecutionPlan(needs=[], provides=['b'], x2 steps: op1, op2)...
DEBUG +++ (3C40) Parallel batch['op1'] on solution[].
DEBUG +++ (3C40) Executing OpTask(FnOp|(name='op1', needs=[], provides=[sfx: 'b'], fn{}='<lambda>'), sol_keys=[])...
INFO graphtik.fnop.py:534 Results[sfx: 'b'] contained +1 unknown provides[sfx: 'b']
FnOp|(name='op1', needs=[], provides=[sfx: 'b'], fn{}='<lambda>')
INFO ... (3C40) op(op1) completed in 1.406ms.
...
DEBUG === Compiling pipeline(t)...
DEBUG ... cache-hit key: ((), None, None)
INFO === (8697) Executing pipeline(t), evicting, on inputs[], according to ExecutionPlan(needs=[], provides=['b'], x3 steps: op1, op2, sfx: 'b')...
DEBUG +++ (8697) Executing OpTask(FnOp(name='op1', needs=[], provides=[sfx: 'b'], fn{}='<lambda>'), sol_keys=[])...
INFO graphtik.fnop.py:534 Results[sfx: 'b'] contained +1 unknown provides[sfx: 'b']
FnOp(name='op1', needs=[], provides=[sfx: 'b'], fn{}='<lambda>')
INFO ... (8697) op(op1) completed in 0.149ms.
DEBUG +++ (8697) Executing OpTask(FnOp(name='op2', needs=[sfx: 'b'], provides=['b'], fn='<lambda>'), sol_keys=[sfx: 'b'])...
INFO ... (8697) op(op2) completed in 0.08ms.
INFO ... (8697) evicting 'sfx: 'b'' from solution[sfx: 'b', 'b'].
INFO === (8697) Completed pipeline(t) in 0.229ms.
Particularly useful are the “pruned step #…” logs, which explain why the network does not behave as expected.
DEBUG
flag¶
The 2nd pit-stop is to make DEBUG
in configurations
return true, either by calling set_debug(), or externally,
, or externally,
by setting the GRAPHTIK_DEBUG
environment variable,
to enact the following:
on errors, plots the 1st errored solution/plan/pipeline/net (in that order) in an SVG file inside the temp-directory, and its path is logged in ERROR-level;
jetsam logs in ERROR (instead of in DEBUG) all annotations on all calls up the stack trace (logged from
graphtik.jetsam.err
logger);FnOp.compute()
prints out full given-inputs (not just their keys);net objects print more details recursively, like fields (not just op-names) and prune-comments;
plotted SVG diagrams include style-provenance as tooltips;
Sphinx extension also saves the original DOT file next to each image (see
graphtik_save_dot_files
).
Of particular interest is the automatic plotting of the failed plottable.
Tip
From code you may wrap the code you are interested in with config.debug_enabled()
“context-manager”, to get augmented print-outs for selected code-paths only.
Jetsam on exceptions¶
If you are on an interactive session, you may access many in-progress variables
on raised exception (e.g. sys.last_value
) from their “jetsam” attribute,
as an immediate post-mortem debugging aid:
>>> from graphtik import compose, operation
>>> from pprint import pprint
>>> def scream(*args):
...     raise ValueError("Wrong!")
>>> try:
...     compose("errgraph",
...             operation(name="screamer", needs=['a'], provides=["foo"])(scream)
...     )(a=None)
... except ValueError as ex:
...     pprint(ex.jetsam)
{'aliases': None,
'args': {'kwargs': {}, 'positional': [None], 'varargs': []},
'network': Network(x3 nodes, x1 ops: screamer),
'operation': FnOp(name='screamer', needs=['a'], provides=['foo'], fn='scream'),
'outputs': None,
'pipeline': Pipeline('errgraph', needs=['a'], provides=['foo'], x1 ops: screamer),
'plan': ExecutionPlan(needs=['a'], provides=['foo'], x1 steps: screamer),
'results_fn': None,
'results_op': None,
'solution': {'a': None},
'task': OpTask(FnOp(name='screamer', needs=['a'], provides=['foo'], fn='scream'), sol_keys=['a'])}
In interactive REPL console you may use this to get the last raised exception:
import sys
sys.last_value.jetsam
The following annotated attributes might have meaningful value on an exception (press [Tab] to auto-complete):
solution
– the most useful object to inspect (plot) – an instance of
Solution
, containing inputs & outputs till the error happened; note that Solution.executed
contains the list of executed operations so far.
plan
the innermost plan that was executing when an operation crashed
network
the innermost network owning the failed operation/function
pruned_dag
The result of pruning, ingredient of a plan while compiling.
op_comments
Reason why operations were pruned. Ingredient of a plan while compiling.
sorted_nodes
Topo-sort dag respecting operation-insertion order to break ties. Ingredient of a plan while compiling.
needs
provides
pipeline
the innermost pipeline that crashed
operation
the innermost operation that failed
args
either the input arguments list fed into the function, or a dict with both
args
&kwargs
keys in it.outputs
the names of the outputs the function was expected to return
provides
the names eventually the graph needed from the operation; a subset of the above, and not always what has been declared in the operation.
fn_results
the raw results of the operation’s function, if any
op_results
the results, always a dictionary, as matched with operation’s provides
plot_fpath
if DEBUG flag is enabled, the path where the broken plottable has been saved
Of course you may plot some “jetsam” values, to visualize the condition that caused the error (see Plotting).
Debugger¶
The Plotting capabilities, along with the above annotation of exceptions with the internal state of plan/operation, often render a debugger session unnecessary. But since the state of the annotated values might be incomplete, you may not always avoid one.
You may enable “post mortem debugging” on any program,
but a lot of utilities have a special --pdb
option for it, like pytest
(or scrapy).
For instance, if you are extending this project, to enter the debugger when a test-case breaks, call
pytest --pdb -k <test-case>
from the console.

Alternatively, you may set a
breakpoint()
anywhere in your (or 3rd-party) code.
As soon as you arrive in the debugger-prompt, move up a few frames until you locate
either the Solution
, or the ExecutionPlan
instances,
and plot them.
It takes some practice to familiarize yourself with the internals of graphtik, for instance:
in
FnOp._match_inputs_with_fn_needs()
method, the solution is found in the named_inputs
argument. For instance, to index into the solution with the 1st need: named_inputs[self.needs[0]]
in
ExecutionPlan._handle_task()
method, the solution
argument contains the “live” instance, while the
ExecutionPlan
is contained in the Solution.plan
, or the plan is the
self
argument, if arrived in the Network.compile()
method.
Setting a breakpoint on a specific operation¶
You may take advantage of the callbacks facility and install a breakpoint for a specific operation before calling the pipeline.
Add this code (interactively, or somewhere in your sources):
def break_on_my_op(op_cb):
    if op_cb.op.name == "buggy_operation":
        breakpoint()
And then call your pipeline with the callbacks
argument:
pipe.compute({...}, callbacks=break_on_my_op)
And that way you may single-step and inspect the inputs & outputs
of the buggy_operation
.
Accessing wrapper operation from task-context¶
Attention
Unstable API, in favor of supporting a specially-named function argument to receive the same instances.
Alternatively, when the debugger is stopped inside an underlying function,
you may access the wrapper FnOp
and the Solution
through the graphtik.execution.task_context
context-var.
This is populated with the OpTask
instance of the currently executing operation,
as shown in the pdb
session printout, below:
(Pdb) from graphtik.execution import task_context
(Pdb) op_task = task_context.get()
Get possible completions on the returned operation-task with [TAB]:
(Pdb) p op_task.[TAB][TAB]
op_task.__call__
op_task.__class__
...
op_task.get
op_task.logname
op_task.marshalled
op_task.op
op_task.result
op_task.sol
op_task.solid
Printing the operation-task gives you a quick overview of the operation and the available solution keys (but not the values, not to clutter the debugger console):
(Pdb) p op_task
OpTask(FnOp(name=..., needs=..., provides=..., fn=...), sol_keys=[...])
Print the wrapper operation:
(Pdb) p op_task.op
...
Print the solution:
(Pdb) p op_task.sol
...
Architecture¶
In mathematical terms, given:
a partially populated data tree, and
a set of functions operating (consuming/producing) on branches of the data tree,
graphtik collects a subset of functions in a graph that when executed consume & produce as values as possible in the data-tree.
- compute¶
- computation¶
- phase¶
The definition & execution of pipelines happens in 3 phases:
… it is constrained by these IO data-structures:
… populates these low-level data-structures:
network (COMPOSE time)
execution dag (COMPILE time)
execution steps (COMPILE time)
solution (EXECUTE time)
… and utilizes these main classes:
graphtik.fnop.FnOp
([fn, name, needs, ...])An operation performing a callable (ie a function, a method, a lambda).
graphtik.pipeline.Pipeline
(operations, name, *)An operation that can compute a network-graph of operations.
graphtik.planning.Network
(*operations[, graph])A graph of operations that can compile an execution plan.
graphtik.execution.ExecutionPlan
(net, needs, ...)A pre-compiled list of operation steps that can execute for the given inputs/outputs.
graphtik.execution.Solution
(plan, input_values)The solution chain-map and execution state (e.g.
… plus those for plotting:
graphtik.plot.Plotter
([theme])graphtik.plot.Theme
(*[, _prototype])The poor man's css-like plot theme (see also
StyleStack
).- compose¶
- composition¶
The phase where operations are constructed and grouped into pipelines and corresponding networks based on their dependencies.
Tip
Use
operation()
factory to constructFnOp
instances (a.k.a. operations).Use
compose()
factory to buildPipeline
instances (a.k.a. pipelines).
- recompute¶
There are 2 ways to feed the solution back into the same pipeline:
by reusing the pre-compiled plan (coarse-grained), or
by using the
compute(recompute_from=...)
argument (fine-grained),
as described in Re-computations tutorial section.
Attention
This feature is not well implemented (e.g.
test_recompute_NEEDS_FIX()
), neither thoroughly tested.- combine pipelines¶
When operations and/or pipelines are composed together, there are two ways to combine the operations contained into the new pipeline: operation merging (default) and operation nesting.
They are selected by the
nest
parameter ofcompose()
factory.- operation merging¶
The default method to combine pipelines, also applied when simply merging operations.
Any identically-named operations override each other, with the operations added earlier in the
.compose()
call (further to the left) winning over those added later (further to the right).- seealso
- operation nesting¶
The elaborate method to combine pipelines forming clusters.
The original pipelines are preserved intact in “isolated” clusters, by prefixing the names of their operations (and optionally data) with the name of the respective original pipeline that contained them (or with renames defined by the user).
- seealso
Nesting,
compose()
,RenArgs
,nest_any_node()
,dep_renamed()
,PlotArgs.clusters
, Hierarchical data and further tricks (example).
- compile¶
- compilation¶
- planning¶
The phase where the
Network
creates a new execution plan by pruning all graph nodes into a subgraph dag, and deriving the execution steps.- execute¶
- execution¶
- sequential¶
The phase where the plan derived from a pipeline calls the underlying functions of all operations contained in its execution steps, with inputs/outputs taken/written to the solution.
Currently there are 2 ways to execute:
sequential, or
(deprecated) parallel, with a
multiprocessing.pool.Pool
Plans may abort their execution by setting the abort run global flag.
- network¶
- graph¶
A
Network.graph
of operations linked by their dependencies, implementing a pipeline.
During composition, the nodes of the graph are connected by repeated calls of
Network._append_operation()
within the
Network
constructor.
During planning the graph is pruned based on the given inputs, outputs & node predicate to extract the dag, and it is ordered to derive the execution steps, stored in a new plan, which is then cached on the
Network
class.
- plan¶
- execution plan¶
Class
ExecutionPlan
performs the execution phase; it contains the dag and the steps.
Compiled execution plans are cached in
Network._cached_plans
across runs, with (inputs, outputs, predicate) as key.
- solution¶
A map of dependency-named values fed to/from the pipeline during execution.
It feeds operations with inputs, collects their outputs, records the status of executed or canceled operations, tracks any overwrites, and applies any evictions, as orchestrated by the plan.
A new
Solution
instance is created either internally by
Pipeline.compute()
and populated with user-inputs, or must be created externally with those values and fed into the said method.
The results of the last operation executed “win” in the layers, and the base (least precedence) is the user-inputs given when the execution started.
Certain values may be extracted/populated with accessors.
- layer¶
- solution layer¶
The solution class inherits
ChainMap
to store the actual outputs of each executed operation in a separate dictionary (+1 for user-inputs).
When layers are disabled, the solution populates the passed-in inputs and stores in layers just the keys of outputs produced.
The layering, by default, is disabled if there is no jsonp dependency in the network, the
set_layered_solution()
configuration has not been set, and the respective parameter has not been given to methods
compute()
/
execute()
.
If disabled, overwrites are lost, but are marked as such.
Hint
Combining hierarchical data with per-operation layers in solution leads to duplications of container nodes in the data tree. To retrieve the complete solution, merging of overwritten nodes across the layers would then be needed.
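The layering described above can be illustrated with a plain collections.ChainMap (a simplified sketch of the idea, not graphtik's actual Solution class):

```python
from collections import ChainMap

# Base layer (least precedence): the user-inputs starting the execution.
user_inputs = {"a": 1}
# Each executed operation writes its outputs into its own layer.
op1_outputs = {"b": 2}
op2_outputs = {"b": 3}  # writes "b" again -> an overwrite

# Later layers are pushed in front, so the last operation executed "wins".
solution = ChainMap(op2_outputs, op1_outputs, user_inputs)
assert solution["b"] == 3  # op2's value shadows op1's
# The shadowed value is still recoverable from the underlying layers:
assert [m["b"] for m in solution.maps if "b" in m] == [3, 2]
```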
- overwrite¶
Solution values written by more than one operation in the respective layer, accessed by the
Solution.overwrites
attribute (assuming that layers have not been disabled, e.g. due to hierarchical data, in which case just the dependency names of the outputs actually produced are stored).
Note that sideffected outputs always produce an overwrite.
Overwrites will not work for evicted outputs.
- prune¶
- pruning¶
A subphase of planning performed by method
Network._prune_graph()
, which extracts a subgraph dag that does not contain any unsatisfied operations.
It topologically sorts the graph, and prunes based on given inputs, asked outputs, node predicate and operation needs & provides.
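As a rough plain-Python illustration of the idea (a sketch only, with made-up operation names; not graphtik's actual algorithm):

```python
# Keep only operations whose needs are reachable from the given inputs.
ops = {  # hypothetical op -> (needs, provides)
    "mul": ({"a", "b"}, {"ab"}),
    "sub": ({"ab", "a"}, {"diff"}),
    "other": ({"x"}, {"y"}),  # unsatisfied: "x" is never available
}
available = {"a", "b"}  # the given inputs
satisfied = {}
changed = True
while changed:  # propagate reachability until a fixed point
    changed = False
    for name, (needs, provides) in ops.items():
        if name not in satisfied and needs <= available:
            satisfied[name] = (needs, provides)
            available |= provides
            changed = True
assert set(satisfied) == {"mul", "sub"}  # "other" is pruned
```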
- unsatisfied operation¶
The core of pruning & rescheduling, performed by the
planning.unsatisfied_operations()
function, which collects all operations with unreachable dependencies.
- dag¶
- execution dag¶
- solution dag¶
There are 2 directed-acyclic-graphs instances used:
the
ExecutionPlan.dag
, in the execution plan, which contains the pruned nodes, used to decide the execution steps;
the
Solution.dag
in the solution, which derives the canceled operations due to rescheduled/failed operations upstream.
- steps¶
- execution steps¶
The plan contains a list of the operation-nodes only from the dag, topologically sorted, and interspersed with instruction steps needed to compute the asked outputs from the given inputs.
They are built by
Network._build_execution_steps()
based on the subgraph dag.
The only instruction step other than an operation is for performing an eviction.
- eviction¶
A memory footprint optimization where intermediate inputs & outputs are erased from solution as soon as they are not needed further down the dag.
Evictions are pre-calculated during planning, denoted with the dependency inserted in the steps of the execution plan.
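Conceptually (a plain-Python sketch with hypothetical step and data names, not graphtik's planner):

```python
# A plan interleaves operation steps with eviction instructions:
# once "ab" has been consumed by "sub", it can be dropped from solution.
steps = ["mul", "sub", ("evict", "ab")]
solution = {"a": 2, "b": 5, "ab": 10, "diff": 8}

for step in steps:
    if isinstance(step, tuple) and step[0] == "evict":
        solution.pop(step[1], None)  # free the intermediate value
assert solution == {"a": 2, "b": 5, "diff": 8}
```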
- inputs¶
The named input values that are fed into an operation (or pipeline) through the
Operation.compute()
method, according to its needs.
These values are either:
given by the user to the outer pipeline, at the start of a computation, or
derived from solution using needs as keys, during intermediate execution.
- outputs¶
The dictionary of computed values returned by an operation (or a pipeline) matching its provides, when method
Operation.compute()
is called.
Those values are either:
retained in the solution, internally during execution, keyed by the respective provide, or
returned to user after the outer pipeline has finished computation.
When no specific outputs are requested from a pipeline,
Pipeline.compute()
returns all intermediate inputs along with the outputs; that is, no evictions happen.
An operation may return partial outputs.
- pipeline¶
The
Pipeline
composes and computes a network of operations against given inputs & outputs.
This class is also an operation, so it specifies needs & provides, but these are not fixed, in the sense that
Pipeline.compute()
can potentially consume and provide different subsets of inputs/outputs.
- operation¶
Either the abstract notion of an action with specified needs and provides dependencies, or the concrete wrapper
FnOp
for any
callable()
, that feeds on inputs and updates outputs, from/to solution, or given-by/returned-to the user by a pipeline.
The distinction between needs/provides and inputs/outputs is akin to function parameters and arguments during define-time and run-time, respectively.
- dependency¶
The (possibly hierarchical) name of a solution value an operation needs or provides.
Dependencies are declared during composition, when building
FnOp
instances. Operations are then interlinked together, by matching the needs & provides of all operations contained in a pipeline.
During planning the graph is then pruned based on the reachability of the dependencies.
During execution
Operation.compute()
performs 2 “matchings”:
inputs & outputs in solution are accessed by the needs & provides names of the operations;
operation needs & provides are zipped against the underlying function’s arguments and results.
These matchings are affected by modifiers, print-out with diacritics.
Differences between various dependency operation attributes:

dependency attribute    dupes   token   alias   sfxed
needs                     ✗       ✓             SINGULAR
_user_needs               ✓       ✓
_fn_needs                 ✓       ✗             STRIPPED
provides                  ✗       ✓       ✓     SINGULAR
_user_provides            ✓       ✓       ✗
_fn_provides              ✓       ✗       ✗     STRIPPED

where:
“dupes=no” means the collection drops any duplicated dependencies
“SINGULAR” means sfxed('A', 'a', 'b') ==> sfxed('A', 'a'), sfxed('A', 'b')
“STRIPPED” means sfxed('A', 'a', 'b') ==> token('a'), token('b')
- needs¶
- fn_needs¶
- matching inputs¶
The list of dependency names an operation requires from solution as inputs,
roughly corresponding to the underlying function’s arguments (fn_needs).
Specifically,
Operation.compute()
extracts input values from solution by these names, and matches them against function arguments, mostly by their positional order. Whenever this matching is not 1-to-1, and function-arguments differ from the regular needs, modifiers must be used.
- provides¶
- user_provides¶
- fn_provides¶
- zipping outputs¶
The list of dependency names an operation writes to the solution as outputs,
roughly corresponding to underlying function’s results (fn_provides).
Specifically,
Operation.compute()
“zips” this list-of-names with the output values produced when the operation’s function is called. You may alter this “zipping” by one of the following methods:
artificially extend the provides with aliased fn_provides,
use modifiers to annotate certain names with
keyword()
, tokens and/or implicit, or
mark the operation that its function returns dictionary, and cancel zipping.
Note
When joining a pipeline this must not be empty, or will scream! (an operation without provides would always be pruned)
- alias¶
Map an existing name in fn_provides into a duplicate, artificial one in provides.
You cannot alias an alias. See Interface differently named dependencies: aliases & keyword modifier.
- conveyor operation¶
- default identity function¶
The default function if none given to an operation that conveys needs to provides.
For this to happen when
FnOp.compute()
is called, an operation name must have been given AND the number of provides must match the number of needs.
- seealso
Default conveyor operation &
identity_function()
.
- returns dictionary¶
When an operation is marked with the
FnOp.returns_dict
flag, the underlying function is not expected to return fn_provides as a sequence but as a dictionary; hence, no “zipping” of function-results –> fn_provides takes place.
Useful for operations returning partial outputs, to have full control over which outputs were actually produced, or to cancel tokens.
- modifier¶
- diacritic¶
Dependency annotations modifying their behavior during planning, execution,
and when binding them as the inputs & outputs of operations.
The needs and provides may be annotated as:
as
keyword()
and/or optionals, to modify their binding with the underlying function’s arguments;
as sideffected, to operate on the same data more than once (something that a DAG does not allow);
as implicit, to let functions do the actual read/write in solution;
as accessor, to modularize that access of solution data (only jsonp implemented so far);
the last two often working together to manipulate hierarchical data.
The
representation
of modifier-annotated dependencies utilizes a combination of these diacritics:
> :
keyword()
? :
optional()
* :
vararg()
+ :
varargs()
@ : accessor (mostly for jsonp)
$ :
token()
^ :
implicit()
See
graphtik.modifier
module.
- optionals¶
A needs-only modifier for inputs that do not hinder operation execution (prune) if absent from solution.
In the underlying function it may correspond to either:
non-compulsory function arguments (with defaults), annotated with
optional()
, or
- varargish¶
A needs-only modifier for inputs to be appended as
*args
(if present in solution).
There are 2 kinds, both, by definition, optionals:
the
vararg()
annotates any solution value to be appended once in the
*args
;
the
varargs()
annotates iterable values, and all its items are appended in the
*args
one-by-one.
Attention
To avoid user mistakes, varargs do not accept
str
inputs (though they accept other iterables):

>>> graph(a=5, b="mistake")
Traceback (most recent call last):
ValueError: Failed matching inputs <=> needs for FnOp(name='enlist',
    needs=['a', 'b'(+)], provides=['sum'], fn='enlist'):
    1. Expected varargs inputs to be non-str iterables: {'b'(+): 'mistake'}
    +++inputs: ['a', 'b']
In printouts, it is denoted either with the
*
or
+
diacritic.
See also the elaborate example in Hierarchical data and further tricks section.
- implicit¶
A modifier denoting a dependency not fed into/out of the function, but the dependency is still considered while planning, expected to exist in the solution, downstream.
One use case is for an operation to consume/produce a subdoc(s) with its own means (not through jsonp accessors).
Constructed with the
implicit()
modifier function, they can also be optionals and jsonp (but without accessors). If an implicit cannot solve your problems, try sideffected or tokens…
- tokens¶
- sideffects¶
A modifier denoting a fictive dependency linking operations into virtual flows, without real data exchanges.
The side-effect modification may happen to some internal state not fully represented in the graph & solution.
There are actually 2 relevant modifiers:
The
token()
modifier, describing modifications taking place beyond the scope of the solution; it can connect operations arbitrarily, irrespective of data exchanges. It may have just the “optional” diacritic in printouts.
Tip
Probably you either need implicit, or the next variant, not this one.
The sideffected modifier (annotated with
sfxed()
) denoting modifications on a real dependency read from and written to the solution.
Both kinds of sideffects participate in the planning of the graph, and both may be given or asked in the inputs & outputs of a pipeline, but they are never given to functions. A function of a returns dictionary operation can return a falsy value to declare it as canceled.
- sideffected¶
- sfx_list¶
A modifier denoting (sfx_list) sideffects acting on a solution dependency.
Note
To be precise, the “sideffected dependency” is the name held in the
_Modifier._sideffected
attribute of a modifier created by the
sfxed()
function; it may have all diacritics in printouts.
The main use case is to declare an operation that both needs and provides the same dependency, in order to mutate it. When designing a network with many sfxed modifiers all based on the same sideffected dependency (i.e. with different sfx_list), these should form a strict (no forks) sequence, or else fork modifications will be lost.
The outputs of a sideffected dependency will produce an overwrite if the sideffected dependency is declared both as needs and provides of some operation.
See also the elaborate example in Hierarchical data and further tricks section.
- accessor¶
Getter/setter functions to extract/populate solution values given as a modifier parameter (not applicable for tokens & implicit).
See the
Accessor
defining class and the
modify()
concrete factory.
- subdoc¶
- superdoc¶
- doc chain¶
- data tree¶
- hierarchical data¶
A subdoc is a dependency value nested further into another one (the superdoc), accessed with a json pointer path expression with respect to the solution, denoted with slashes like:
root/parent/child/leaf
Whenever a nested dependency is given/asked, then all docs-in-chain (depicted below) are topologically sorted, before executing any operations working on them.
The docs-in-chain for a hypothetical dependency
stats/b/b1
: superdocs at the left, subdocs at the right of
b1
, respectively.
Note that if the root has been asked in outputs, none of its subdocs will be evicted.
- seealso
Hierarchical data and further tricks (example)
- json pointer path¶
- jsonp¶
A dependency containing slashes (
/
) & accessors that can read and write subdoc values with json pointer expressions, like
root/parent/child/1/item
, resolved from solution.
In addition to writing values, the
vcat()
or
hcat()
modifiers (& respective accessors) also support pandas concatenation for provides.
Note that all non-root dependencies are implicitly created as jsonp if the operation has a current-working-document defined.
- cwd¶
- current-working-document¶
A jsonp prefix of an operation (or pipeline) to prefix any non-root dependency defined.
- pandas concatenation¶
A jsonp dependency in provides may designate its respective
DataFrame
and/or
Series
output value to be concatenated with existing Pandas objects in the solution (useful when working with Pandas advanced indexing; otherwise, sideffecteds are needed to break read-update cycles on dataframes).
See example in Concatenating Pandas.
- reschedule¶
- rescheduling¶
- partial outputs¶
- canceled operation¶
The partial pruning of the solution’s dag during execution. It happens when either of these 2 conditions applies:
an operation is marked with the
FnOp.rescheduled
attribute, which means that its underlying callable may produce only a subset of its provides (partial outputs);
endurance is enabled, either globally (in the configurations), or for a specific operation.
The solution must then reschedule the remaining operations downstream, and possibly cancel some of those (assigned in
Solution.canceled
).
Partial operations are usually declared with returns dictionary so that the underlying function can control which of the outputs are returned.
- endurance¶
- endured¶
Keep executing as many operations as possible, even if some of them fail. Endurance for an operation is enabled if
set_endure_operations()
is true globally in the configurations, or if
FnOp.endured
is true.
You may interrogate
Solution.executed
to discover the status of each executed operation, or call one of
check_if_incomplete()
or
scream_if_incomplete()
.
- predicate¶
- node predicate¶
A callable(op, node-data) that should return true for nodes to be included in graph during planning.
- abort run¶
A global configurations flag that, when set with the
abort_run()
function, halts the execution of all current or future plans.
It is reset automatically on every call of
Pipeline.compute()
(after a successful intermediate planning), or manually, by calling
reset_abort()
.
- parallel¶
- parallel execution¶
- execution pool¶
- task¶
Attention
Deprecated, in favor of always producing a list of “parallelizable batches”, to hook with other executors (e.g. Dask, Apache’s airflow, Celery). In the future, just the single-process implementation will be kept, and marshalling should be handled externally.
Execute operations in parallel, with a thread pool or process pool (instead of sequential). Operations and pipelines are marked as such on construction, or enabled globally from configurations.
Note that tokens are not expected to function with process pools, certainly not when marshalling is enabled.
- process pool¶
When the
multiprocessing.pool.Pool
class is used for (deprecated) parallel execution, the tasks must be communicated to/from the worker process, which requires pickling, and that may fail. With pickling failures you may try marshalling with the dill library, and see if that helps.
Note that tokens are not expected to function at all, certainly not when marshalling is enabled.
- thread pool¶
When the
multiprocessing.dummy.Pool()
class is used for (deprecated) parallel execution, the tasks are run in process, so no marshalling is needed.- marshalling¶
(deprecated) Pickling parallel operations and their inputs/outputs using the
dill
module. It is configured either globally with
set_marshal_tasks()
or set with a flag on each operation / pipeline.
Note that tokens do not work when this is enabled.
- plottable¶
Objects that can plot their graph network, such as those inheriting
Plottable
, (FnOp
,Pipeline
,Network
,ExecutionPlan
,Solution
) or apydot.Dot
instance (the result of thePlottable.plot()
method).Such objects may render as SVG in Jupiter notebooks (through their
plot()
method) and can render in a Sphinx site with with thegraphtik
RsT directive. You may control the rendered image as explained in the tip of the Plotting section.SVGs are in rendered with the zoom-and-pan javascript library
Attention
Zoom-and-pan does not work in Sphinx sites for Chrome locally - serve the HTML files through some HTTP server, e.g. launch this command to view the site of this project:
python -m http.server 8080 --directory build/sphinx/html/
- plotter¶
- plotting¶
A
Plotter
is responsible for rendering plottables as images. It is the active plotter that does that, unless overridden in a
Plottable.plot()
call. Plotters can be customized by various means, such as the plot theme.
- active plotter¶
- default active plotter¶
The plotter currently installed “in-context” of the respective graphtik configuration - this term implies also any Plot customizations done on the active plotter (such as plot theme).
Installation happens by calling one of the
active_plotter_plugged()
or
set_active_plotter()
functions.
The default active plotter is the plotter instance that this project comes pre-configured with, i.e. when no plot-customizations have yet happened.
Attention
It is recommended to use other means for Plot customizations instead of modifying directly theme’s class-attributes.
All
Theme
class-attributes are deep-copied when constructing new instances, to avoid mistaken modifications when the intent was to update instance-attributes instead (hint: almost all of its attributes are containers, i.e. dicts). Therefore any class-attributes modification will be ignored, until a new
Theme
instance from the patched class is used.
- plot theme¶
- current theme¶
The mergeable and expandable styles contained in a
plot.Theme
instance.
The current theme in-use is the
Plotter.default_theme
attribute of the active plotter, unless overridden with the
theme
parameter when calling
Plottable.plot()
(conveyed internally as the value of the
PlotArgs.theme
attribute).
- style¶
- style expansion¶
A style is an attribute of a plot theme, either a scalar value or a dictionary.
Styles are collected in
stacks
and are
merged
into a single dictionary after performing the following
expansions
:
Call any callables found as keys, values or the whole style-dict, passing in the current
plot_args
, and replace those with the callable’s result (even more flexible than templates).
Resolve any
Ref
instances, first against the current nx_attrs and then against the attributes of the current theme.
Render jinja2 templates with template-arguments all the attributes of the
plot_args
instance in use (hence much more flexible than
Ref
).
Any None results above are discarded.
Workaround pydot/pydot#228 pydot-cstor not supporting styles-as-lists.
Merge tooltip & tooltip lists.
Tip
if DEBUG flag is enabled, the provenance of all style values appears in the tooltips of plotted graphs.
- configurations¶
- graphtik configuration¶
The functions controlling compile & execution globally are defined in the
config
module, and +1 in the
graphtik.plot
module; the underlying global data are stored in
contextvars.ContextVar
instances, to allow for nested control.
All boolean configuration flags are tri-state (
None, False, True
), allowing to “force” all operations when they are not set to the
None
value. All of them default to
None
(false).
- callbacks¶
Two optional callables, given to
Pipeline.compute()
, called before/after each operation. Attention: any errors raised in them will abort the pipeline execution.
Called from solution code before marshalling. A use case would be to validate solution, or trigger a breakpoint by some condition.
- post-op-callback:
Called after solution has been populated with operation results. A use case would be to validate operation outputs and/or solution after results have been populated.
Callbacks must have this signature:
callbacks(op_cb) -> None
… where
op_cb
is an instance of the
OpTask
class.
- jetsam¶
When a pipeline or an operation fails, the original exception gets annotated with salvaged values from
locals()
and raised intact; optionally (if the DEBUG flag is set) the diagram of the failed plottable is saved in a temporary file.
See Jetsam on exceptions.
API Reference¶
computation graphs for Python & Pandas |
|
compose operation/dependency from functions, matching/zipping inputs/outputs during execution. |
|
Harvest dependencies annotated callables to form pipelines. |
|
modifiers (print-out with diacritics) change dependency behavior during planning & execution. |
|
compose network of operations & dependencies, compile the plan. |
|
plotting handled by the active plotter & current theme. |
|
configurations for network execution, and utilities on them. |
|
Generic utilities, exceptions and operation & plottable base classes. |
|
jetsam utility for annotating exceptions from |
|
Utility for json pointer path modifier |
|
Extends Sphinx with |
Package: graphtik¶
computation graphs for Python & Pandas
Tip
The module import-time dependencies have been carefully optimized so that importing everything from the package takes minimal time (e.g. <10ms on a 2019 laptop):
>>> %time from graphtik import * # doctest: +SKIP
CPU times: user 8.32 ms, sys: 34 µs, total: 8.35 ms
Wall time: 7.53 ms
Still, constructing your pipelines at import time would take considerably more time (e.g. ~300ms for the 1st pipeline). So prefer to construct them in “factory” module functions (remember to annotate them with typing hints to denote their return type).
See also
plot.active_plotter_plugged()
, plot.set_active_plotter()
&
plot.get_active_plotter()
configs, not imported, unless plot is needed.
Module: fnop¶
compose operation/dependency from functions, matching/zipping inputs/outputs during execution.
Note
This module (along with modifier
& pipeline
) is what client code needs
to define pipelines on import time without incurring a heavy price
(<5ms on a 2019 fast PC)
- class graphtik.fnop.FnOp(fn: Optional[Callable] = None, name=None, needs: Optional[Union[Collection, str]] = None, provides: Optional[Union[Collection, str]] = None, aliases: Optional[Mapping] = None, *, cwd=None, rescheduled=None, endured=None, parallel=None, marshalled=None, returns_dict=None, node_props: Optional[Mapping] = None)[source]¶
An operation performing a callable (ie a function, a method, a lambda).
Tip
Use
operation()
factory to build instances of this class instead.Call
withset()
on existing instances to re-configure new clones.See diacritics to understand printouts of this class.
Differences between various dependency operation attributes:

dependency attribute    dupes   token   alias   sfxed
needs                     ✗       ✓             SINGULAR
_user_needs               ✓       ✓
_fn_needs                 ✓       ✗             STRIPPED
provides                  ✗       ✓       ✓     SINGULAR
_user_provides            ✓       ✓       ✗
_fn_provides              ✓       ✗       ✗     STRIPPED

where:
“dupes=no” means the collection drops any duplicated dependencies
“SINGULAR” means sfxed('A', 'a', 'b') ==> sfxed('A', 'a'), sfxed('A', 'b')
“STRIPPED” means sfxed('A', 'a', 'b') ==> token('a'), token('b')
- __init__(fn: Optional[Callable] = None, name=None, needs: Optional[Union[Collection, str]] = None, provides: Optional[Union[Collection, str]] = None, aliases: Optional[Mapping] = None, *, cwd=None, rescheduled=None, endured=None, parallel=None, marshalled=None, returns_dict=None, node_props: Optional[Mapping] = None)[source]¶
Build a new operation out of some function and its requirements.
See
operation()
for the full documentation of parameters, study the code for attributes (or read them from rendered sphinx site).
- __module__ = 'graphtik.fnop'¶
- _fn_needs[source]¶
Value names the underlying function requires (DUPES preserved, NO-SFX, STRIPPED sideffected).
- _fn_provides[source]¶
Value names the underlying function produces (DUPES, NO-ALIASES, NO_SFX, STRIPPED sideffected).
- _prepare_match_inputs_error(missing: List, varargs_bad: List, named_inputs: Mapping) ValueError [source]¶
- _zip_results_plain(results, is_rescheduled) dict [source]¶
Handle result sequence: no-result, single-item, or many.
- _zip_results_with_provides(results) dict [source]¶
Zip results with expected “real” (without sideffects) provides.
- aliases[source]¶
an optional mapping of fn_provides to additional ones, together comprising this operation’s provides.
You cannot alias an alias.
- compute(named_inputs=None, outputs: Optional[Union[Collection, str]] = None, *args, **kw) dict [source]¶
- Parameters
named_inputs – a
Solution
instanceargs – ignored – to comply with superclass contract
kw – ignored – to comply with superclass contract
- property deps: Mapping[str, Collection]¶
All dependency names, including internal _user_ & _fn_.
if not DEBUG, all deps are converted into lists, ready to be printed.
- endured[source]¶
If true, even if callable fails, solution will reschedule; ignored if endurance enabled globally.
- marshalled[source]¶
If true, the operation will be marshalled while computed, along with its inputs & outputs (useful when run in (deprecated) parallel with a process pool).
- name: str[source]¶
a name for the operation (e.g. ‘conv1’, ‘sum’, etc..); any “parents” are split by dots (
.
).
:seealso: Nesting
- needs: Optional[Union[Collection, str]][source]¶
Dependencies ready to lay the graph for pruning (NO-DUPES, SFX, SINGULAR sideffecteds).
- node_props[source]¶
Added as-is into NetworkX graph, and you may filter operations by
Pipeline.withset()
. Also plot-rendering affected if they match Graphviz properties, if they start withUSER_STYLE_PREFFIX
, unless they start with underscore(_
).
- prepare_plot_args(plot_args: PlotArgs) PlotArgs [source]¶
Delegate to a provisional network with a single op .
- provides: Optional[Union[Collection, str]][source]¶
Value names ready to lay the graph for pruning (NO DUPES, ALIASES, SFX, SINGULAR sideffecteds, +alias destinations).
- rescheduled[source]¶
If true, underlying callable may produce a subset of provides, and the plan must then reschedule after the operation has executed. In that case, it makes more sense for the callable to returns_dict.
- returns_dict[source]¶
If true, it means the underlying function returns dictionary , and no further processing is done on its results, i.e. the returned output-values are not zipped with provides.
It does not have to return any alias outputs.
Can be changed amidst execution by the operation’s function.
- validate_fn_name()[source]¶
Call it before enclosing it in a pipeline, or it will fail on compute().
- withset(fn: Callable = Ellipsis, name=Ellipsis, needs: Optional[Union[Collection, str]] = Ellipsis, provides: Optional[Union[Collection, str]] = Ellipsis, aliases: Mapping = Ellipsis, *, cwd=Ellipsis, rescheduled=Ellipsis, endured=Ellipsis, parallel=Ellipsis, marshalled=Ellipsis, returns_dict=Ellipsis, node_props: Mapping = Ellipsis, renamer=None) FnOp [source]¶
Make a clone with some values replaced, or the operation and its dependencies renamed.
If renamer is given, it is applied on top of (and after) any other changed values, for the operation-name, needs, provides & any aliases.
- Parameters
renamer –
if a dictionary, it renames any operations & data named as keys into the respective values, by feeding them into
dep_renamed()
, so values may be single-input callables themselves.
if it is a
callable()
, it is given aRenArgs
instance to decide the node’s name.
The callable may return a str for the new-name, or any other false value to leave node named as is.
Attention
The callable SHOULD wish to preserve any modifier on dependencies, and use
dep_renamed()
if a callable is given.- Returns
a clone operation with changed/renamed values asked
- Raise
(ValueError, TypeError): all cstor validation errors
ValueError: if a renamer dict contains a non-string and non-callable value
Examples
>>> from graphtik import operation, token
>>> op = operation(str, "foo", needs="a", ... provides=["b", token("c")], ... aliases={"b": "B-aliased"}) >>> op.withset(renamer={"foo": "BAR", ... 'a': "A", ... 'b': "B", ... token('c'): "cc", ... "B-aliased": "new.B-aliased"}) FnOp(name='BAR', needs=['A'], provides=['B', 'cc'($), 'new.B-aliased'], aliases=[('B', 'new.B-aliased')], fn='str')
Notice that the
'c'
rename changes the token name, without the destination name being a
token()
modifier (but the source name must match the sfx-specifier).
Notice that the source of the alias
b-->B
is handled implicitly from the respective rename on the provides.
But usually a callable is more practical, like the one below renaming only data names:
>>> from graphtik.modifier import dep_renamed >>> op.withset(renamer=lambda ren_args: ... dep_renamed(ren_args.name, lambda n: f"parent.{n}") ... if ren_args.typ != 'op' else ... False) FnOp(name='foo', needs=['parent.a'], provides=['parent.b', 'parent.c'($), 'parent.B-aliased'], aliases=[('parent.b', 'parent.B-aliased')], fn='str')
Notice the double use of lambdas with
dep_renamed()
– an equivalent rename callback would be:dep_renamed(ren_args.name, f"parent.{dependency(ren_args.name)}")
- graphtik.fnop.NO_RESULT = <NO_RESULT>¶
A special return value for the function of a reschedule operation signifying that it did not produce any result at all (including tokens); otherwise, it would have been a single result,
None
. Useful for rescheduled operations that want to cancel their single result without being declared as returns dictionary.
- graphtik.fnop.NO_RESULT_BUT_SFX = <NO_RESULT_BUT_SFX>¶
Like
NO_RESULT
but does not cancel any tokens declared as provides.
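These markers follow the common sentinel-object pattern. A minimal generic sketch (not graphtik code) of why a dedicated marker is needed to distinguish "no result at all" from a legitimate `None` result:

```python
class _Sentinel:
    """A unique marker object with a readable repr."""
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"<{self.name}>"


NO_RESULT = _Sentinel("NO_RESULT")


def collect(result):
    # Identity check: only the marker itself means "nothing produced";
    # a plain None is still a real single result.
    return [] if result is NO_RESULT else [result]


print(collect(None))       # None is a legitimate single result
print(collect(NO_RESULT))  # nothing was produced at all
```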
- graphtik.fnop._process_dependencies(deps: Collection[str]) Tuple[Collection[str], Collection[str]] [source]¶
Strip or singularize any implicit/tokens and apply CWD.
- Parameters
cwd – The current-working-document, when given, all non-root dependencies (needs, provides & aliases) become jsonps, prefixed with this.
- Returns
a x2 tuple
(op_deps, fn_deps)
, where any instances of tokens in deps are processed like this:
- op_deps
any
sfxed()
is replaced by a sequence of “singularized” instances, one for each item in its sfx_list;
any duplicates are discarded;
order is irrelevant, since they don’t reach the function.
- fn_deps
- graphtik.fnop.as_renames(i, argname)[source]¶
Parses a list of (source–>destination) from dict or list-of-2-items.
- Returns
a (possibly empty) list-of-pairs
Note
The same source may be repeatedly renamed to multiple destinations.
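The normalization described above can be sketched like this (a hypothetical simplification, not the library's actual code): both input shapes reduce to a list of (source, destination) pairs, and only the list form can repeat a source:

```python
def as_pairs(renames):
    """Normalize a dict or a list-of-2-items into (src, dst) pairs."""
    if hasattr(renames, "items"):  # dict-like: each source appears once
        return list(renames.items())
    # list form: the same source may map to several destinations
    return [tuple(pair) for pair in renames]


print(as_pairs({"a": "A"}))
print(as_pairs([("a", "A1"), ("a", "A2")]))  # same source, twice
```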
- graphtik.fnop.identity_fn(*args, **kwargs)[source]¶
Act as the default function for the conveyor operation when no fn is given.
Adapted from https://stackoverflow.com/a/58524115/548792
- graphtik.fnop.jsonp_ize_all(deps, cwd: Sequence[str])[source]¶
Auto-convert deps with slashes as jsonp (unless
no_jsonp
).
- graphtik.fnop.operation(fn: ~typing.Callable = <UNSET>, name=<UNSET>, needs: ~typing.Optional[~typing.Union[~typing.Collection, str]] = <UNSET>, provides: ~typing.Optional[~typing.Union[~typing.Collection, str]] = <UNSET>, aliases: ~typing.Mapping = <UNSET>, *, cwd=<UNSET>, rescheduled=<UNSET>, endured=<UNSET>, parallel=<UNSET>, marshalled=<UNSET>, returns_dict=<UNSET>, node_props: ~typing.Mapping = <UNSET>) FnOp [source]¶
An operation factory that works like a “fancy decorator”.
- Parameters
fn –
The callable underlying this operation:
if not given, it returns the
withset()
method as the decorator, so it still supports all arguments, apart from fn.
if given, it builds the operation right away (along with any other arguments);
if given, but is
None
, it will assign the default identity function right before it is computed.
Hint
This is a twisted way for “fancy decorators”.
After all that, you can always call
FnOp.withset()
on an existing operation, to obtain a re-configured clone.
If the fn is still not given when calling
FnOp.compute()
, then the default identity function is implied, if name is given and the number of provides matches the number of needs.
name (str) – The name of the operation in the computation graph. If not given, deduced from any fn given.
needs –
the list of (positionally ordered) names of the data needed by the operation to receive as inputs, roughly corresponding to the arguments of the underlying fn (plus any tokens).
It can be a single string, in which case a 1-element iterable is assumed.
provides –
the list of (positionally ordered) output data this operation provides, which must, roughly, correspond to the returned values of the fn (plus any tokens & aliases).
It can be a single string, in which case a 1-element iterable is assumed.
If they are more than one, the underlying function must return an iterable with the same number of elements, unless param returns_dict is true, in which case it must return a dictionary containing (at least) those named elements.
Note
When joining a pipeline this must not be empty, or will scream! (an operation without provides would always be pruned)
aliases – an optional mapping of provides to additional ones; if you need to map the same source provides into multiple destinations, use a list of tuples, like:
aliases=[("a", "A1"), ("a", "A2")]
.
cwd – The current-working-document, when given, all non-root dependencies (needs, provides & aliases) become jsonps, prefixed with this.
rescheduled – If true, underlying callable may produce a subset of provides, and the plan must then reschedule after the operation has executed. In that case, it makes more sense for the callable to returns_dict.
endured – If true, even if callable fails, solution will reschedule. ignored if endurance enabled globally.
parallel – (deprecated) execute in parallel
marshalled – If true, operation will be marshalled while computed, along with its inputs & outputs (useful when run in (deprecated) parallel with a process pool).
returns_dict – if true, it means the fn returns dictionary with all provides, and no further processing is done on them (i.e. the returned output-values are not zipped with provides)
node_props – Added as-is into NetworkX graph, and you may filter operations by
Pipeline.withset()
. Also plot-rendering is affected if they match Graphviz properties, unless they start with underscore (_)
- Returns
when called with fn, it returns a
FnOp
, otherwise it returns a decorator function that accepts fn as the 1st argument.
Note
Actually the returned decorator is the
FnOp.withset()
method and accepts all arguments, monkeypatched to support calling a virtual
withset()
method on it, not to interrupt the builder-pattern, but only that - besides that trick, it is just a bound method.
Example:
If no fn given, it returns the
withset
method, to act as a decorator:
>>> from graphtik import operation, varargs
>>> op = operation()
>>> op
<function FnOp.withset at ...
- But if fn is set to None
>>> op = op(needs=['a', 'b'])
>>> op
FnOp(name=None, needs=['a', 'b'], fn=None)
If you call an operation without fn and no name, it will scream:
>>> op.compute({"a":1, "b": 2})
Traceback (most recent call last):
ValueError: Operation must have a callable `fn` and a non-empty `name`:
  FnOp(name=None, needs=['a', 'b'], fn=None)
  (tip: for defaulting `fn` to conveyor-identity, # of provides must equal needs)
But if you give just a name with
None
as fn, it will build a conveyor operation for some needs & provides:
>>> op = operation(None, name="copy", needs=["foo", "bar"], provides=["FOO", "BAZ"])
>>> op.compute({"foo":1, "bar": 2})
{'FOO': 1, 'BAZ': 2}
You may keep calling
withset()
on an operation, to build modified clones:
>>> op = op.withset(needs=['a', 'b'],
...                 provides='SUM', fn=lambda a, b: a + b)
>>> op
FnOp(name='copy', needs=['a', 'b'], provides=['SUM'], fn='<lambda>')
>>> op.compute({"a":1, "b": 2})
{'SUM': 3}
>>> op.withset(fn=lambda a, b: a * b).compute({'a': 2, 'b': 5})
{'SUM': 10}
- graphtik.fnop.prefixed(dep, cwd)[source]¶
Converts dep into a jsonp and prepends prefix (unless dep was rooted).
TODO: make prefixed a TOP_LEVEL modifier.
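The prefixing rule can be sketched as follows (a hypothetical simplification, assuming a "rooted" dep is one starting with `/`):

```python
def prefixed(dep: str, cwd: str) -> str:
    """Join a dependency under the current-working-document.

    Non-rooted dependencies become json-pointer paths under `cwd`;
    rooted ones (starting with "/") escape the cwd unchanged.
    """
    return dep if dep.startswith("/") else f"{cwd}/{dep}"


print(prefixed("a", "data"))        # joined under the cwd
print(prefixed("/root/a", "data"))  # rooted: left as-is
```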
- graphtik.fnop.reparse_operation_data(name, needs, provides, aliases=(), cwd: Optional[Sequence[str]] = None) Tuple[str, Collection[str], Collection[str], Collection[Tuple[str, str]]] [source]¶
Validate & reparse operation data as lists.
- Returns
name, needs, provides, aliases
A separate function, so that it can be reused by clients building operations, to detect errors early.
Harvest dependencies annotated callables to form pipelines.
- class graphtik.autograph.Autograph(out_patterns: Optional[Union[str, Pattern, Iterable[Union[str, Pattern]]]] = None, overrides: Optional[Mapping[Union[str, Pattern, Iterable[Union[str, Pattern]]], Mapping]] = None, renames: Optional[Mapping] = None, full_path_names: bool = False, domain: Optional[Union[str, int, Collection]] = None, sep=None)[source]¶
Make a graphtik operation by inspecting a function
The params below (except full_path_names) are merged in this order (1st takes precedence):
dict from overrides keyed by name
decorated with
autographed()
inspected from the callable
Example:
>>> from graphtik.autograph import *
>>> from graphtik import keyword
>>> def calc_sum_ab(a, b=0):
...     return a + b
>>> aug = Autograph(
...     out_patterns=['calc_', 'upd_'], renames={"a": "A"},
...     overrides={
...         "autographed": {"needs": [keyword("fn", "other_fn"), ...]},
...     })
>>> aug.wrap_funcs([autographed, get_autograph_decors, is_regular_class, FnHarvester])
[FnOp(name='autographed', needs=['fn'(>'other_fn'), 'name'(?), 'needs'(?), 'provides'(?), 'renames'(?), 'returns_dict'(?), 'aliases'(?), 'inp_sideffects'(?), 'out_sideffects'(?), 'domain'(?)], fn='autographed'),
 FnOp(name='get_autograph_decors', needs=['fn', 'default'(?), 'domain'(?)], fn='get_autograph_decors'),
 FnOp(name='is_regular_class', needs=['name', 'item'], fn='is_regular_class'),
 FnOp(name='FnHarvester', needs=['excludes'(?), 'base_modules'(?), 'predicate'(?), 'include_methods'(?), 'sep'(?)], provides=['fn_harvester'], fn='FnHarvester')]
Hint
Notice the use of triple-dot(
...
) to indicate that the rest of the needs should be parsed from the operation’s underlying function.
- __init__(out_patterns: Optional[Union[str, Pattern, Iterable[Union[str, Pattern]]]] = None, overrides: Optional[Mapping[Union[str, Pattern, Iterable[Union[str, Pattern]]], Mapping]] = None, renames: Optional[Mapping] = None, full_path_names: bool = False, domain: Optional[Union[str, int, Collection]] = None, sep=None)[source]¶
- __module__ = 'graphtik.autograph'¶
- _apply_renames(rename_maps: Iterable[Union[Mapping, UNSET]], word_lists: Iterable)[source]¶
Rename words in all word_lists matching keys in rename_maps.
- _collect_rest_op_args(decors: dict)[source]¶
Collect the rest operation arguments from autographed decoration.
- _match_fn_name_pattern(fn_name, pattern) Optional[Union[str, Tuple[str, str]]] [source]¶
Return the matched group or groups, the callable’s results, or the string after a matched prefix.
- domain: Collection[source]¶
the
autographed()
domains to search when wrapping functions, in order; if undefined, only the default domain (
None
) is included; otherwise the default,
None
, must be appended explicitly (usually at the end). List-ified if a single str; the
autographed()
decors for the 1st one matching are used.
- full_path_names[source]¶
Whether operation-nodes would be named after the fully qualified name (separated with . by default)
- out_patterns[source]¶
Autodeduce provides by parsing function-names against a collection of these items, and decide provides by the 1st one matching (unless provides are specified in the overrides):
regex: may contain 1 or 2 groups:
1 group: the name of a single provides
2 groups: 2nd is the name of a single sideffected dependency, the 1st is the sideffect acting upon the former;
str: matched as a prefix of the function-name, which is trimmed by the first one matching, to derive a single provides;
Note that any out_sideffects in overrides, alone, do not block the rule above.
- overrides[source]¶
a mapping of
fn-keys --> dicts
with keys:name, needs, provides, renames, inp_sideffects, out_sideffects
An fn-key may be a string-tuple of names like:
([module, [class, ...] callable)
- renames[source]¶
global
from --> to
renamings applied both onto needs & provides. They are applied after merging has been completed, so they can rename even “inspected” names.
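As a rough illustration (a hypothetical sketch, not the library's actual implementation), such a global `from --> to` mapping applied over already-merged word-lists looks like:

```python
def apply_renames(renames, word_lists):
    """Apply a global from-->to mapping onto each word-list,
    leaving unmatched words untouched."""
    return [[renames.get(w, w) for w in words] for words in word_lists]


needs, provides = apply_renames({"a": "A"}, [["a", "b"], ["a", "out"]])
print(needs, provides)
```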
- wrap_funcs(funcs: Collection[Union[Callable, Tuple[Union[str, Collection[str]], Union[Callable, Collection[Callable]]]]], exclude=(), domain: Optional[Union[str, int, Collection]] = None) List[FnOp] [source]¶
Convert a (possibly @autographed) function into one (or more) operations.
- Parameters
funcs – a list of funcs (or of 2-tuples (name-path, fn-path))
See also
yield_wrapped_ops()
for the rest arguments.
- yield_wrapped_ops(fn: Union[Callable, Tuple[Union[str, Collection[str]], Union[Callable, Collection[Callable]]]], exclude=(), domain: Optional[Union[str, int, Collection]] = None) Iterable[FnOp] [source]¶
Convert a (possibly @autographed) function into graphtik FnOps,
respecting any configured overrides
- Parameters
fn –
either a callable, or a 2-tuple(name-path, fn-path) for:
[module[, class, ...]] callable
If fn is an operation, yielded as is (found also in 2-tuple).
Both tuple elements may be singulars, and are auto-tuple-zed.
The name-path may (or may not) correspond to the given fn-path, and is used to derive the operation-name; If not given, the function name is inspected.
The last elements of the name-path are overridden by names in decorations; if the decor-name is the “default” (None), the name-path becomes the op-name.
The name-path is not used when matching overrides.
exclude – a list of decor-names to exclude, as stored in decors. Ignored if fn already an operation.
domain – if given, overrides
domain
forautographed()
decorators to search. List-ified if a single str,autographed()
decors for the 1st one matching are used.
- Returns
one or more
FnOp
instances (if more than one name is defined when the given function wasautographed()
).
Overrides order: my-args, self.overrides, autograph-decorator, inspection
See also: David Brubeck Quartet, “40 days”
- class graphtik.autograph.FnHarvester(*, excludes: Optional[Iterable[Union[str, Pattern, Iterable[Union[str, Pattern]]]]] = None, base_modules: Optional[Iterable[Union[module, str]]] = None, predicate: Optional[Callable[[Any], bool]] = None, include_methods=False, sep=None)[source]¶
Collect public ops, routines, classes & their methods, partials into
collected
.
- Parameters
collected –
a list of 2-tuples:
(name_path, item_path)
where the 2 paths correspond to the same items; the last path element is always a callable, and the previous items may be modules and/or classes, in case non-modules are given directly in
harvest()
:
[module, [class, ...] callable
E.g. the path of a class constructor is
(module_name, class_name)
.
For operations, the name-part is
None
.
excludes – names to exclude; they can.be.prefixed or not
base_modules – skip function/classes not in these modules; if not given, include all items. If string, they are searched in
sys.modules
.predicate – any user callable accepting a single argument returning falsy to exclude the visited item
include_methods – Whether to collect methods from classes
Example:
>>> from graphtik.autograph import *
>>> modules = ('os', 'sys')
>>> funcs = FnHarvester(
...     base_modules=modules,
...     include_methods=False,
... ).harvest()
>>> len(funcs) > 50
True
>>> funcs
[(('os', 'PathLike'), ...
Use this pattern when iterating, to account for any operation instances:
>>> funcs = [
...     (name, fn if isinstance(fn, Operation) else fn)
...     for name, fn
...     in funcs
... ]
- __annotations__ = {'collected': typing.List[typing.Tuple[typing.Tuple[str, ...], typing.Tuple[typing.Callable, ...]]], 'include_methods': <class 'bool'>}¶
- __init__(*, excludes: Optional[Iterable[Union[str, Pattern, Iterable[Union[str, Pattern]]]]] = None, base_modules: Optional[Iterable[Union[module, str]]] = None, predicate: Optional[Callable[[Any], bool]] = None, include_methods=False, sep=None)[source]¶
- __module__ = 'graphtik.autograph'¶
- harvest(*items: Any, base_modules=Ellipsis) List[Tuple[str, Callable]] [source]¶
Collect any callable items and children, respecting base_modules, excludes etc.
- Parameters
items –
module fqdn (if already imported), items with
__name__
, like modules, classes, functions, or partials (without
__name__
).
If nothing is given, base_modules is used in its place.
Note
This parameter works differently from
base_modules
, that is, harvesting is not limited to those modules only, recursing to any imported ones from items.
- Returns
the
collected
- class graphtik.autograph.Prefkey(sep=None)[source]¶
Index into dicts with a key or a joined(prefix(tuple)+key).
- __dict__ = mappingproxy({'__module__': 'graphtik.autograph', '__doc__': 'Index into dicts with a key or a joined(prefix(tuple)+key).', 'sep': '.', '__init__': <function Prefkey.__init__>, '_join_path_names': <function Prefkey._join_path_names>, '_prefkey': <function Prefkey._prefkey>, '__dict__': <attribute '__dict__' of 'Prefkey' objects>, '__weakref__': <attribute '__weakref__' of 'Prefkey' objects>, '__annotations__': {}})¶
- __module__ = 'graphtik.autograph'¶
- __weakref__¶
list of weak references to the object (if defined)
- _prefkey(d, key: Union[str, Pattern, Iterable[Union[str, Pattern]]], default: Optional[Union[Callable, Any]] = None)[source]¶
- sep = '.'¶
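The indexing behavior can be sketched roughly as follows (a hypothetical simplification; the real `_prefkey()` also supports patterns and callable defaults):

```python
def prefkey(d, key, sep=".", default=None):
    """Index `d` with a plain key, or with a tuple of path-names
    joined with `sep` before the lookup."""
    if isinstance(key, tuple):
        key = sep.join(key)
    return d.get(key, default)


conf = {"mod.func": 1, "plain": 2}
print(prefkey(conf, ("mod", "func")))  # tuple joined with sep='.'
print(prefkey(conf, "plain"))          # plain key
```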
- graphtik.autograph.autographed(fn=<UNSET>, *, name=None, needs=<UNSET>, provides=<UNSET>, renames=<UNSET>, returns_dict=<UNSET>, aliases=<UNSET>, inp_sideffects=<UNSET>, out_sideffects=<UNSET>, domain: ~typing.Optional[~typing.Union[str, int, ~typing.Collection]] = None, **kws)[source]¶
Decorator adding
_autograph
func-attribute with overrides forAutograph
.- Parameters
name –
the name of the operation.
If the same name has already been defined for the same domain, it is overwritten; otherwise, a new decoration is appended, so that
Autograph.yield_wrapped_ops()
will produce more than one operation.
If not given, it will be derived from the fn at wrap-time.
domain – one or more list-ified domains to assign decors into (instead of the “default” domain); it allows reusing the same function to build different operations, when later wrapped into an operation by
Autograph
.
renames – mappings to rename any matching names among the final needs & provides
inp_sideffects – appended into needs; if a tuple, makes it a
sfxed
out_sideffects – appended into provides; if a tuple, makes it a
sfxed
kws –
the rest arguments of
graphtik.operation
, such as:kwd, endured, parallel, marshalled, node_props
- graphtik.autograph.camel_2_snake_case(word)[source]¶
>>> camel_2_snake_case("HTTPResponseCodeXYZ")
'http_response_code_xyz'
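One possible implementation matching the doctest above (a sketch adapted from a common StackOverflow regex idiom for this conversion, not necessarily the library's exact code):

```python
import re


def camel_2_snake_case(word):
    # Insert "_" before an uppercase letter that follows a lowercase/digit,
    # or before an uppercase letter (not at start) that begins a new
    # lowercase word; then lowercase everything.
    return re.sub(
        r"((?<=[a-z0-9])[A-Z]|(?!^)[A-Z](?=[a-z]))", r"_\1", word
    ).lower()


print(camel_2_snake_case("HTTPResponseCodeXYZ"))  # http_response_code_xyz
```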
- graphtik.autograph.get_autograph_decors(fn, default=None, domain: Optional[Union[str, int, Collection]] = None) dict [source]¶
Get the 1st match in domain of the fn
autographed()
special attribute.- Parameters
default – return this if fn non-autographed, or domain don’t match
domain – list-ified if a single str
- Returns
the decors that will override
Autograph
attributes, as found from the given fn, and for the 1st matching domain in domain:
<fn>():
    _autograph        (function-attribute)
        <domain>      (dict)
            <name>    (dict)
                <decors>  (dict)
Module: pipeline¶
Compose pipelines by combining operations into a network.
Note
This module (along with op
& modifier
) is what client code needs
to define pipelines on import time without incurring a heavy price
(<5ms on a 2019 fast PC)
- class graphtik.pipeline.Pipeline(operations, name, *, outputs=None, predicate: NodePredicate = None, cwd: str = None, rescheduled=None, endured=None, parallel=None, marshalled=None, node_props=None, renamer=None, excludes=None)[source]¶
An operation that can compute a network-graph of operations.
Tip
- __call__(**input_kwargs) Solution [source]¶
Delegates to
compute()
, respecting any narrowed outputs.
- __init__(operations, name, *, outputs=None, predicate: NodePredicate = None, cwd: str = None, rescheduled=None, endured=None, parallel=None, marshalled=None, node_props=None, renamer=None, excludes=None)[source]¶
For arguments, see
withset()
& class attributes.- Raises
if dupe operation, with msg:
Operations may only be added once, …
- __module__ = 'graphtik.pipeline'¶
- __name__ = 'Pipeline'¶
Fake function attributes.
- __qualname__ = 'Pipeline'¶
Fake function attributes.
- _abc_impl = <_abc_data object>¶
- compile(inputs=None, outputs=<UNSET>, recompute_from=None, *, predicate: NodePredicate = <UNSET>) ExecutionPlan [source]¶
Produce a plan for the given args or outputs/predicate narrowed earlier.
- Parameters
named_inputs – a string or a list of strings that should be fed to the needs of all operations.
outputs – A string or a list of strings with all data asked to compute. If
None
, all possible intermediate outputs will be kept. If not given, those set by a previous call towithset()
or cstor are used.recompute_from – Described in
Pipeline.compute()
.predicate – Will be stored and applied on the next
compute()
orcompile()
. If not given, those set by a previous call towithset()
or cstor are used.
- Returns
the execution plan satisfying the given inputs, outputs & predicate
- Raises
- Unknown output nodes…
if outputs asked do not exist in network.
- Unsolvable graph: …
if it cannot produce any outputs from the given inputs.
- Plan needs more inputs…
if given inputs mismatched plan’s
needs
.- Unreachable outputs…
if net cannot produce asked outputs.
- compute(named_inputs: ~typing.Mapping = None, outputs: ~typing.Optional[~typing.Union[~typing.Collection, str]] = <UNSET>, recompute_from: ~typing.Optional[~typing.Union[~typing.Collection, str]] = None, *, predicate: NodePredicate = <UNSET>, callbacks=None, solution_class: Type[Solution] = None, layered_solution=None) Solution [source]¶
Compile & execute the plan, log jetsam & plot plottable on errors.
Attention
If intermediate planning is successful, the global abort run flag is reset before the execution starts.
- Parameters
named_inputs – A mapping of names –> values that will be fed to the needs of all operations. Cloned, not modified.
outputs – A string or a list of dependencies with all data asked to compute. If
None
, all possible intermediate outputs will be kept. If not given, those set by a previous call towithset()
or cstor are used.recompute_from –
recompute operations downstream from these (string or list) dependencies. In effect, before compiling, it marks all values strictly downstream (excluding themselves) from the dependencies listed here, as missing from named_inputs.
Traversing downstream stops when arriving at any dep in outputs.
Any dependencies here unreachable downstream from values in named_inputs are ignored, but logged.
Any dependencies here unreachable upstream from outputs (if given) are ignored, but logged.
Results may differ even if graph is unchanged, in the presence of overwrites.
predicate – filter-out nodes before compiling If not given, those set by a previous call to
withset()
or cstor are used.callbacks – If given, a 2-tuple with (optional) callbacks to call before/after computing operation, with
OpTask
as argument, containing the op & solution. A single (scalar) callback, fewer than 2, or no callbacks at all are also accepted.
solution_class – a custom solution factory to use
layered_solution –
whether to store operation results or just keys into separate solution layers
Unless overridden by a True/False in
set_layered_solution()
of configurations, it accepts the following values:
When True (False), always keep the results (just the keys) in a separate layer for each operation, regardless of any jsonp dependencies.
If
None
, layers are used only if there are NO jsonp dependencies in the network.
- Returns
The solution, which contains the results of each operation executed, plus one more dictionary for the inputs, all in separate dictionaries.
- Raises
If outputs asked do not exist in network, with msg:
Unknown output nodes: …
If plan does not contain any operations, with msg:
Unsolvable graph: …
If given inputs mismatched plan’s
needs
, with msg:Plan needs more inputs…
If net cannot produce asked outputs, with msg:
Unreachable outputs…
See also
Operation.compute()
.
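The recompute_from marking described above can be sketched with plain dicts (a hypothetical helper, not graphtik's actual planner): values strictly downstream of the given deps (excluding the deps themselves) are dropped from the inputs before compiling.

```python
def mark_recompute(downstream_edges, named_inputs, recompute_from):
    """Drop from `named_inputs` every value strictly downstream of
    the deps in `recompute_from`, so the planner must recompute them.

    downstream_edges: dep -> list of deps computed from it.
    """
    stack, downstream = list(recompute_from), set()
    while stack:  # traverse downstream, excluding the start deps
        for nxt in downstream_edges.get(stack.pop(), ()):
            if nxt not in downstream:
                downstream.add(nxt)
                stack.append(nxt)
    return {k: v for k, v in named_inputs.items() if k not in downstream}


g = {"a": ["b"], "b": ["c"]}
print(mark_recompute(g, {"a": 1, "b": 2, "c": 3}, ["a"]))  # only 'a' survives
```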
- property graph¶
- withset(outputs: ~typing.Optional[~typing.Union[~typing.Collection, str]] = <UNSET>, predicate: NodePredicate = <UNSET>, *, name=None, cwd=None, rescheduled=None, endured=None, parallel=None, marshalled=None, node_props=None, renamer=None) Pipeline [source]¶
Return a copy with a network pruned for the given needs & provides.
- Parameters
outputs – Will be stored and applied on the next
compute()
orcompile()
. If not given, the value of this instance is conveyed to the clone.predicate – Will be stored and applied on the next
compute()
orcompile()
. If not given, the value of this instance is conveyed to the clone.name –
the name for the new pipeline:
if None, the same name is kept;
if True, a distinct name is devised:
<old-name>-<uid>
if ellipses(
...
), the name of the function where this function call happened is used,otherwise, the given name is applied.
cwd – The current-working-document, when given, all non-root dependencies (needs, provides & aliases) on all contained operations become jsonps, prefixed with this.
rescheduled – applies rescheduled to all contained operations
endured – applies endurance to all contained operations
parallel – (deprecated) mark all contained operations to be executed in parallel
marshalled – mark all contained operations to be marshalled (useful when run in (deprecated) parallel with a process pool).
renamer – see respective parameter in
FnOp.withset()
.
- Returns
A narrowed pipeline clone, which MIGHT be empty!
- Raises
If outputs asked do not exist in network, with msg:
Unknown output nodes: …
- graphtik.pipeline.build_network(operations, cwd=None, rescheduled=None, endured=None, parallel=None, marshalled=None, node_props=None, renamer=None, excludes=None)[source]¶
The network factory that does operation merging before constructing it.
- Parameters
nest – see same-named param in
compose()
- graphtik.pipeline.compose(name: Optional[Union[str, ellipsis]], op1: Operation, *operations: Operation, excludes=None, outputs: Optional[Union[Collection, str]] = None, cwd: Optional[str] = None, rescheduled=None, endured=None, parallel=None, marshalled=None, nest: Optional[Union[Callable[[RenArgs], str], Mapping[str, str], bool, str]] = None, node_props=None) Pipeline [source]¶
Merge or nest operations & pipelines into a new pipeline.
Tip
The module import-time dependencies have been carefully optimized so that importing all from package takes the minimum time (e.g. <10ms in a 2019 laptop):
>>> %time from graphtik import *  # doctest: +SKIP
CPU times: user 8.32 ms, sys: 34 µs, total: 8.35 ms
Wall time: 7.53 ms
Still, constructing your pipelines at import time would take considerably more time (e.g. ~300ms for the 1st pipeline). So prefer to construct them in “factory” module functions (remember to annotate them with typing hints to denote their return type).
Operations given earlier (further to the left) override those following (further to the right), similar to set behavior (and contrary to dict).
- Parameters
name – An optional name for the graph being composed by this object. If ellipses(
...
), derived from the function name where the pipeline is defined.
op1 – syntactically forces at least 1 operation
operations – each argument should be an operation or pipeline instance
excludes – A single string or list of operation-names to exclude from the final network (particularly useful when composing existing pipelines).
nest –
a dictionary or callable corresponding to the renamer parameter of
Pipeline.withset()
, but the callable receives a ren_args with
RenArgs.parent
set when merging a pipeline, and applies the default nesting behavior (
nest_any_node()
) on truthies.
Specifically:
if it is a dictionary, it renames any operations & data named as keys into the respective values, like that:
if a value is callable or str, it is fed into
dep_renamed()
(hint: it can be single-arg callable like:(str) -> str
)it applies default all-nodes nesting if other truthy;
Note that you cannot access the “parent” name with dictionaries, you can only apply default all-node nesting by returning a non-string truthy.
if it is a
callable()
, it is given aRenArgs
instance to decide the node’s name.The callable may return a str for the new-name, or any other true/false to apply default all-nodes nesting.
For example, to nest just operation’s names (but not their dependencies), call:
compose( ..., nest=lambda ren_args: ren_args.typ == "op" )
Attention
The callable SHOULD preserve any modifier on the dependencies, and use
dep_renamed()
forRenArgs.typ
not ending in.jsonpart
.If false (default), applies operation merging, not nesting.
if true, applies default operation nesting to all types of nodes.
In all other cases, the names are preserved.
See also
Nesting for examples
Default nesting applied by
nest_any_node()
cwd – The current-working-document, when given, all non-root dependencies (needs, provides & aliases) on all contained operations become jsonps, prefixed with this.
rescheduled – applies rescheduled to all contained operations
endured – applies endurance to all contained operations
parallel – (deprecated) mark all contained operations to be executed in parallel
marshalled – mark all contained operations to be marshalled (useful when run in (deprecated) parallel with a process pool).
node_props – Added as-is into NetworkX graph, to provide for filtering by
Pipeline.withset()
. Also plot-rendering affected if they match Graphviz properties, unless they start with underscore(_
)
- Returns
Returns a special type of operation class, which represents an entire computation graph as a single operation.
- Raises
If the net cannot produce the asked outputs from the given inputs.
If the nest callable/dictionary produced a non-string or empty name
Module: modifier¶
modifiers (print-out with diacritics) change dependency behavior during planning & execution.
|
Annotate optionals needs corresponding to defaulted op-function arguments, ... |
|
Annotate a dependency that maps to a different name in the underlying function. |
|
implicit dependencies are not fed into/out of the function, usually they are accessed as jsonp from some other dependency or from the solution directly (eg through the "unstable" API |
|
tokens denoting modifications beyond the scope of the solution. |
|
Annotates a sideffected dependency in the solution sustaining side-effects. |
|
Like |
|
Like |
|
Annotate a varargish needs to be fed as function's |
|
An varargish |
|
Provides-only, see pandas concatenation & generic |
|
Provides-only, see pandas concatenation & generic |
|
Generic modifier for json pointer path & implicit dependencies. |
|
Returns the underlying dependency name (just str) |
|
Check if a dependency is keyword (and get it, last step if jsonp). |
|
Check if a dependency is optional. |
|
Check if an optionals dependency is vararg. |
|
Check if an optionals dependency is varargs. |
|
|
|
Parse dep as jsonp (unless modified with |
|
Check if dependency is json pointer path and return its steps. |
|
Check if a dependency is tokens or sideffected. |
|
Check if it is tokens but not a sideffected. |
|
Check if it is sideffected. |
|
Return if it is an implicit dependency. |
|
Check if dependency has an accessor, and get it (if funcs below are unfit) |
|
Renames dep as ren, or calls ren (if callable) to decide its name, |
|
Return one sideffected for each sfx in |
|
Return the |
|
Make a new modifier with changes -- handle with care. |
The needs and provides annotated with modifiers designate, for instance, optional function arguments, or “ghost” tokens.
Note
This module (along with op
& pipeline
) is what client code needs
to define pipelines on import time without incurring a heavy price
(~7ms on a 2019 fast PC)
Diacritics
The representation
of modifier-annotated dependencies
utilizes a combination of these diacritics:
> :keyword()
? :optional()
* :vararg()
+ :varargs()
@ : accessor (mostly for jsonp)
$ :
token()
^ :implicit()
- class graphtik.modifier.Accessor(contains: Callable[[dict, str], Any], getitem: Callable[[dict, str], Any], setitem: Callable[[dict, str, Any], None], delitem: Callable[[dict, str], None], update: Optional[Callable[[dict, Collection[Tuple[str, Any]]], None]] = None)[source]¶
Getter/setter functions to extract/populate values from a solution layer.
Note
Don’t use its attributes directly, prefer instead the functions returned from
acc_contains()
etc on any dep (plain strings included).TODO: drop accessors, push functionality into jsonp alone.
- static __new__(_cls, contains: Callable[[dict, str], Any], getitem: Callable[[dict, str], Any], setitem: Callable[[dict, str, Any], None], delitem: Callable[[dict, str], None], update: Optional[Callable[[dict, Collection[Tuple[str, Any]]], None]] = None)¶
Create new instance of Accessor(contains, getitem, setitem, delitem, update)
- property contains¶
the containment checker, like:
dep in sol
;
- property delitem¶
the deleter, like:
delitem(sol, dep)
- property getitem¶
the getter, like:
getitem(sol, dep) -> value
- property setitem¶
the setter, like:
setitem(sol, dep, val)
,
- property update¶
mass updater, like:
update(sol, item_values)
,
- graphtik.modifier.HCatAcc()[source]¶
Read/write jsonp and concat columns (axis=1) if both doc & value are Pandas.
- graphtik.modifier.JsonpAcc()[source]¶
Read/write jsonp paths found on the modifier’s “extra” attribute jsonpath.
- graphtik.modifier.VCatAcc()[source]¶
Read/write jsonp and concat rows (axis=0) if both doc & value are Pandas.
- class graphtik.modifier._Modifier(name, _repr, _func, keyword, optional: _Optionals, implicit, accessor, sideffected, sfx_list, **kw)[source]¶
Annotate a dependency with a combination of modifiers.
This class is private: client code should not need to call its cstor, or check a dependency with
isinstance()
, but use these facilities instead:

- the factory functions like
keyword()
, optional()
etc,
- the predicates like
is_optional()
, is_token()
etc,
- the conversion functions like
dep_renamed()
, dep_stripped()
etc,
- and only rarely (and with care) call its
modifier_withset()
method or the _modifier()
factory function.
- Parameters
kw – any extra attributes not needed by execution machinery such as the jsonp, which is used only by accessor.
Note

The factory function
_modifier()
may return a plain string, if no other arg but
name
is given.

- static __new__(cls, name, _repr, _func, keyword, optional: _Optionals, implicit, accessor, sideffected, sfx_list, **kw) _Modifier [source]¶
Warning, returns None!
- __weakref__¶
list of weak references to the object (if defined)
- _accessor: Accessor = None[source]¶
An accessor with getter/setter functions to read/write solution values. Any sequence of 2-callables will do.
- _keyword: str = None[source]¶
Map my name in needs into this kw-argument of the function.
get_keyword()
returns it.
- _optional: _Optionals = None[source]¶
Is the dependency required (None), a regular optional, or varargish?
is_optional()
returns it. All regulars are keyword.
- _sfx_list: Tuple[Optional[str]] = ()[source]¶
At least one name denoting the token modification(s) on the sideffected, performed/required by the operation.
If it is an empty tuple, it is a token.
If not empty,
is_sfxed()
returns true (for the
_sideffected
).
- _sideffected: str = None[source]¶
Has value only for sideffects: the pure-sideffect string or the existing sideffected dependency.
- property cmd¶
the code to reproduce it
- graphtik.modifier._modifier(name, *, keyword=None, optional: Optional[_Optionals] = None, implicit=None, accessor=None, sideffected=None, sfx_list=(), jsonp=None, **kw) Union[str, _Modifier] [source]¶
A
_Modifier
factory that may return a plain str when no other args given.
It decides the final name and _repr for the new modifier by matching the given inputs with the
_modifier_cstor_matrix
.
- graphtik.modifier._modifier_cstor_matrix = {7000000: None, 7000010: ('$%(dep)s', "'%(dep)s'($)", 'token'), 7000011: ("sfxed('%(dep)s', %(sfx)s)", "sfxed(%(acs)s'%(dep)s', %(sfx)s)", 'sfxed'), 7000100: ('%(dep)s', "'%(dep)s'(@)", 'accessor'), 7000111: ("sfxed('%(dep)s', %(sfx)s)", "sfxed('%(dep)s'(@), %(sfx)s)", 'sfxed'), 7001000: ('%(dep)s', "'%(dep)s'(^)", 'implicit'), 7010010: ('$%(dep)s', "'%(dep)s'($?)", 'token'), 7011000: ('%(dep)s', "'%(dep)s'(^?)", 'implicit'), 7020000: ('%(dep)s', "'%(dep)s'(*)", 'vararg'), 7020011: ("sfxed('%(dep)s', %(sfx)s)", "sfxed(%(acs)s'%(dep)s'(*), %(sfx)s)", 'sfxed_vararg'), 7020100: ('%(dep)s', "'%(dep)s'(@*)", 'vararg'), 7020111: ("sfxed('%(dep)s', %(sfx)s)", "sfxed('%(dep)s'(@*), %(sfx)s)", 'sfxed_vararg'), 7030000: ('%(dep)s', "'%(dep)s'(+)", 'varargs'), 7030011: ("sfxed('%(dep)s', %(sfx)s)", "sfxed(%(acs)s'%(dep)s'(+), %(sfx)s)", 'sfxed_varargs'), 7030100: ('%(dep)s', "'%(dep)s'(@+)", 'varargs'), 7030111: ("sfxed('%(dep)s', %(sfx)s)", "sfxed('%(dep)s'(@+), %(sfx)s)", 'sfxed_varargs'), 7100000: ('%(dep)s', "'%(dep)s'(%(acs)s>%(kw)s)", 'keyword'), 7100011: ("sfxed('%(dep)s', %(sfx)s)", "sfxed(%(acs)s'%(dep)s'(>%(kw)s), %(sfx)s)", 'sfxed'), 7100100: ('%(dep)s', "'%(dep)s'(@>%(kw)s)", 'keyword'), 7100111: ("sfxed('%(dep)s', %(sfx)s)", "sfxed('%(dep)s'(@>%(kw)s), %(sfx)s)", 'sfxed'), 7110000: ('%(dep)s', "'%(dep)s'(%(acs)s?%(kw)s)", 'optional'), 7110011: ("sfxed('%(dep)s', %(sfx)s)", "sfxed(%(acs)s'%(dep)s'(?%(kw)s), %(sfx)s)", 'sfxed'), 7110100: ('%(dep)s', "'%(dep)s'(@?%(kw)s)", 'optional'), 7110111: ("sfxed('%(dep)s', %(sfx)s)", "sfxed('%(dep)s'(@?%(kw)s), %(sfx)s)", 'sfxed')}¶
Arguments-presence patterns for
_Modifier
constructor. Missing combinations raise errors.
- graphtik.modifier.acc_contains(dep) Callable[[Collection, str], Any] [source]¶
A fn like
operator.contains()
for any dep (with-or-without accessor)
- graphtik.modifier.acc_delitem(dep) Callable[[Collection, str], None] [source]¶
A fn like
operator.delitem()
for any dep (with-or-without accessor)
- graphtik.modifier.acc_getitem(dep) Callable[[Collection, str], Any] [source]¶
A fn like
operator.getitem()
for any dep (with-or-without accessor)
- graphtik.modifier.acc_setitem(dep) Callable[[Collection, str, Any], None] [source]¶
A fn like
operator.setitem()
for any dep (with-or-without accessor)
- graphtik.modifier.dep_renamed(dep, ren, jsonp=None) Union[_Modifier, str] [source]¶
Renames dep as ren, or calls ren (if callable) to decide its name,
preserving any
keyword()
to the old-name.

- Parameters
jsonp – None (derived from name),
False
, str, or a collection of str/callable (the last one); see the generic
modify()
modifier.
For sideffected it renames the dependency (not the sfx-list) – you have to do that manually with a custom renamer-function, if ever the need arises.
- graphtik.modifier.dep_singularized(dep) Iterable[Union[str, _Modifier]] [source]¶
Return one sideffected for each sfx in
_sfx_list
, or iterate dep in other cases.
- graphtik.modifier.dep_stripped(dep) Union[str, _Modifier] [source]¶
Return the
_sideffected
if dep is sideffected, dep otherwise, conveying all other properties of the original modifier to the stripped dependency.
- graphtik.modifier.dependency(dep) str [source]¶
Returns the underlying dependency name (just str)
For non-sideffects, it coincides with str(); otherwise, it is the pure-sideffect string or the existing sideffected dependency stored in
_sideffected
.
- graphtik.modifier.get_accessor(dep) bool [source]¶
Check if dependency has an accessor, and get it (if funcs below are unfit)
- Returns
the
_accessor
- graphtik.modifier.get_jsonp(dep) Optional[List[str]] [source]¶
Check if dependency is json pointer path and return its steps.
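As a rough illustration (a hypothetical simplification: `naive_jsonp_steps` is not the library's implementation, and it ignores json-pointer escaping and `jsonp=False` annotations), the returned steps are essentially the slash-separated parts of the name:

```python
def naive_jsonp_steps(dep: str):
    """Toy sketch: split a jsonp-looking dependency name into its steps.

    The real get_jsonp() also honors modifiers annotated with jsonp=False
    and json-pointer escaping, which this sketch ignores.
    """
    return dep.split("/") if "/" in dep else None

assert naive_jsonp_steps("inputs/a") == ["inputs", "a"]
assert naive_jsonp_steps("plain-name") is None
```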
- graphtik.modifier.get_keyword(dep) Optional[str] [source]¶
Check if a dependency is keyword (and get it, last step if jsonp).
All non-varargish optionals are “keyword” (including sideffected ones).
- Returns
the
_keyword
- graphtik.modifier.hcat(name, *, keyword: Optional[str] = None, jsonp=None) _Modifier [source]¶
Provides-only, see pandas concatenation & generic
modify()
modifier.
- graphtik.modifier.implicit(name, *, optional: Optional[bool] = None, jsonp=None) _Modifier [source]¶
Implicit dependencies are not fed into/out of the function; usually they are accessed as jsonp from some other dependency, or from the solution directly (e.g. through the “unstable” API
task_context()
).
- graphtik.modifier.is_optional(dep) Optional[_Optionals] [source]¶
Check if a dependency is optional.
Varargish & optional sideffects are included.
- Returns
the
_optional
- graphtik.modifier.is_sfx(dep) Optional[str] [source]¶
Check if a dependency is a token or a sideffected.
- Returns
the
_sideffected
- graphtik.modifier.is_sfxed(dep) bool [source]¶
Check if it is sideffected.
- Returns
the
_sfx_list
if it is a sideffected dep, None/empty-tuple otherwise
- graphtik.modifier.is_token(dep) bool [source]¶
Check if it is a token but not a sideffected.
- graphtik.modifier.jsonp_ize(dep)[source]¶
Parse dep as jsonp, unless it is modified with
jsonp=False
or is a pure sfx.
- graphtik.modifier.keyword(name: str, keyword: Optional[str] = None, accessor: Optional[Accessor] = None, jsonp=None) _Modifier [source]¶
Annotate a dependency that maps to a different name in the underlying function.
When used on needs dependencies:
The value of the
name
dependency is read from the solution, and thenthat value is passed in the function as a keyword-argument named
keyword
.
When used on provides dependencies:
The operation must be a returns dictionary one.
The value keyed with
keyword
is read from function’s returned dictionary, and thenthat value is placed into solution named as
name
.
- Parameters
keyword –
The argument-name corresponding to this named-input. If it is None, it is assumed the same as name, so as to always behave like a kw-type arg, and to preserve its fn-name if ever renamed.
accessor – the functions to access values to/from solution (see
Accessor
) (actually a 2-tuple with functions is ok)

jsonp – None (derived from name),
False
, str, or a collection of str/callable (the last one); see the generic
modify()
modifier.
- Returns
a
_Modifier
instance, even if no keyword is given OR it is the same as name.
Example:
In case the name of a function input argument is different from the name in the graph (or just because the name in the inputs is not a valid argument-name), you may map it with the 2nd argument of
keyword()
:

>>> from graphtik import operation, compose, keyword
>>> @operation(needs=[keyword("name-in-inputs", "fn_name")], provides="result")
... def foo(*, fn_name):  # it works also with non-positional args
...     return fn_name
>>> foo
FnOp(name='foo', needs=['name-in-inputs'(>'fn_name')], provides=['result'], fn='foo')
>>> pipe = compose('map a need', foo)
>>> pipe
Pipeline('map a need', needs=['name-in-inputs'], provides=['result'], x1 ops: foo)
>>> sol = pipe.compute({"name-in-inputs": 4})
>>> sol['result']
4
You can do the same thing to the results of a returns dictionary operation:
>>> op = operation(lambda: {"fn key": 1},
...                name="renaming `provides` with a `keyword`",
...                provides=keyword("graph key", "fn key"),
...                returns_dict=True)
>>> op
FnOp(name='renaming `provides` with a `keyword`', provides=['graph key'(>'fn key')], fn{}='<lambda>')
Hint
Mapping provides names wouldn’t make sense for regular operations, since these are defined arbitrarily at the operation level. OTOH, the result names of returns dictionary operation are decided by the underlying function, which may lie beyond the control of the user (e.g. from a 3rd-party object).
- graphtik.modifier.modifier_withset(dep, name=Ellipsis, keyword=Ellipsis, optional: _Optionals = Ellipsis, implicit=Ellipsis, accessor=Ellipsis, sideffected=Ellipsis, sfx_list=Ellipsis, **kw) Union[_Modifier, str] [source]¶
Make a new modifier with changes – handle with care.
- Returns
Delegates to
_modifier()
, so returns a plain string if no args left.
- graphtik.modifier.modify(name: str, *, keyword=None, jsonp=None, implicit=None, accessor: Optional[Accessor] = None) _Modifier [source]¶
Generic modifier for json pointer path & implicit dependencies.
- Parameters
jsonp –
If given, it may be some other json-pointer expression, or the pre-split parts of the jsonp dependency – in that case, the dependency name is irrelevant – or a falsy (but not
None
) value, to disable the automatic interpreting of the dependency name as a json pointer path, regardless of any containing slashes.

If accessing pandas, you may pass an already split path, with its last part being a callable indexer (Selection by callable).
In addition to writing values, the
vcat()
or
hcat()
modifiers (& respective accessors) also support pandas concatenation for provides (see example in Concatenating Pandas).

implicit – implicit dependencies are not fed into/out of the function. You may use
implicit()
directly.

accessor – Annotate the dependency with accessor functions to read/write the solution (actually a 2-tuple with functions is ok)
Example:
Let’s use json pointer dependencies along with the default conveyor operation to build an operation copying values around in the solution:
>>> from graphtik import operation, compose, modify
>>> copy_values = operation(
...     fn=None,  # ask for the "conveyor op"
...     name="copy a+b-->A+BB",
...     needs=["inputs/a", "inputs/b"],
...     provides=["RESULTS/A", "RESULTS/BB"]
... )
>>> results = copy_values.compute({"inputs": {"a": 1, "b": 2}})
Traceback (most recent call last):
ValueError: Failed matching inputs <=> needs for FnOp(name='copy a+b-->A+BB', needs=['inputs/a'(@), 'inputs/b'(@)], provides=['RESULTS/A'(@), 'RESULTS/BB'(@)], fn='identity_fn'):
    1. Missing compulsory needs['inputs/a'(@), 'inputs/b'(@)]!
    +++inputs: ['inputs']
>>> results = copy_values.compute({"inputs/a": 1, "inputs/b": 2})
>>> results
{'RESULTS/A'(@): 1, 'RESULTS/BB'(@): 2}
Notice that the hierarchical dependencies did not work yet, because jsonp modifiers work internally with accessors, and
FnOp
is unaware of them – it’s the
Solution
class that supports accessors, and this requires the operation to be wrapped in a pipeline (see below).

Note also that we see the “representation” of the key as
'RESULTS/A'(@)
but the actual string value is simpler:

>>> str(next(iter(results)))
'RESULTS/A'
The results were not nested, because this modifier works with accessor functions, that act only on a real
Solution
, given to the operation only when wrapped in a pipeline (as done below).

Now watch how these paths access deep into the solution when the same operation is wrapped in a pipeline:
>>> pipe = compose("copy pipe", copy_values)
>>> sol = pipe.compute({"inputs": {"a": 1, "b": 2}}, outputs="RESULTS")
>>> sol
{'RESULTS': {'A': 1, 'BB': 2}}
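The nested writes above can be emulated on plain dicts in a few lines; this is a sketch of the kind of work the jsonp accessors do inside a Solution, minus escaping, concatenation and pandas handling (`jsonp_set` is a hypothetical helper, not part of the graphtik API):

```python
def jsonp_set(doc: dict, path: str, value):
    """Write value at the slash-separated path, creating intermediate dicts."""
    *parents, leaf = path.split("/")
    for part in parents:
        doc = doc.setdefault(part, {})  # descend, creating dicts as needed
    doc[leaf] = value

sol = {}
jsonp_set(sol, "RESULTS/A", 1)
jsonp_set(sol, "RESULTS/BB", 2)
assert sol == {"RESULTS": {"A": 1, "BB": 2}}
```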
- graphtik.modifier.optional(name: str, keyword: Optional[str] = None, accessor: Optional[Accessor] = None, jsonp=None) _Modifier [source]¶
Annotate optional needs corresponding to defaulted op-function arguments, …
received only if present in the inputs (when operation is invoked).
The value of an optional dependency is passed in as a keyword argument to the underlying function.
- Parameters
keyword – the name of the function argument it corresponds to; if a falsy is given, it is assumed the same as name, to always behave like a kw-type arg and to preserve its fn-name if ever renamed.
accessor – the functions to access values to/from solution (see
Accessor
) (actually a 2-tuple with functions is ok)

jsonp – None (derived from name),
False
, str, or a collection of str/callable (the last one); see the generic
modify()
modifier.
Example:
>>> from graphtik import operation, compose, optional
>>> @operation(name='myadd',
...            needs=["a", optional("b")],
...            provides="sum")
... def myadd(a, b=0):
...     return a + b
Notice the default value
0
of the
b
argument annotated as optional:

>>> graph = compose('mygraph', myadd)
>>> graph
Pipeline('mygraph', needs=['a', 'b'(?)], provides=['sum'], x1 ops: myadd)
The graph works both with and without
b
provided in the inputs:

>>> graph(a=5, b=4)['sum']
9
>>> graph(a=5)
{'a': 5, 'sum': 5}
Like
keyword()
you may map an input-name to a different function-argument:

>>> operation(needs=['a', optional("quasi-real", "b")],
...           provides="sum"
...           )(myadd.fn)  # Cannot wrap an operation, its `fn` only.
FnOp(name='myadd', needs=['a', 'quasi-real'(?'b')], provides=['sum'], fn='myadd')
- graphtik.modifier.sfxed(dependency: str, sfx0: str, *sfx_list: str, keyword: Optional[str] = None, optional: Optional[bool] = None, accessor: Optional[Accessor] = None, jsonp=None) _Modifier [source]¶
Annotates a sideffected dependency in the solution sustaining side-effects.
- Parameters
dependency – the actual dependency receiving the sideffect, which will be fed into/out of the function.
sfx0 – the 1st (arbitrary object) sideffect marked as “acting” on the dependency.
sfx_list – more (arbitrary object) sideffects (like the sfx0)
keyword – the name of the function argument it corresponds to. When optional, it becomes the same as name if falsy, so as to always behave like a kw-type arg, and to preserve the fn-name if ever renamed. When not optional, it may simply be omitted.
accessor – the functions to access values to/from solution (see
Accessor
) (actually a 2-tuple with functions is ok)

jsonp – None (derived from name),
False
, str, or a collection of str/callable (the last one); see the generic
modify()
modifier.
Like
token()
but annotating a real dependency in the solution, allowing that dependency to be present both in the needs and the provides of the same function.

Example:
A typical use-case is to signify columns required to produce new ones in pandas dataframes (emulated with dictionaries):
>>> from graphtik import operation, compose, sfxed
>>> @operation(needs="order_items",
...            provides=sfxed("ORDER", "Items", "Prices"))
... def new_order(items: list) -> "pd.DataFrame":
...     order = {"items": items}
...     # Pretend we get the prices from sales.
...     order['prices'] = list(range(1, len(order['items']) + 1))
...     return order
>>> @operation(
...     needs=[sfxed("ORDER", "Items"), "vat rate"],
...     provides=sfxed("ORDER", "VAT")
... )
... def fill_in_vat(order: "pd.DataFrame", vat: float):
...     order['VAT'] = [i * vat for i in order['prices']]
...     return order
>>> @operation(
...     needs=[sfxed("ORDER", "Prices", "VAT")],
...     provides=sfxed("ORDER", "Totals")
... )
... def finalize_prices(order: "pd.DataFrame"):
...     order['totals'] = [p + v for p, v in zip(order['prices'], order['VAT'])]
...     return order
To view all internal dependencies, enable DEBUG in configurations:
>>> from graphtik.config import debug_enabled
>>> with debug_enabled(True):
...     finalize_prices
FnOp(name='finalize_prices', needs=[sfxed('ORDER', 'Prices'), sfxed('ORDER', 'VAT')], _user_needs=[sfxed('ORDER', 'Prices', 'VAT')], _fn_needs=['ORDER'], provides=[sfxed('ORDER', 'Totals')], _user_provides=[sfxed('ORDER', 'Totals')], _fn_provides=['ORDER'], fn='finalize_prices')
Notice that declaring a single sideffected with many items in sfx_list expands into multiple “singular”
sideffected
dependencies in the network (check
needs
vs
_user_needs
above).

>>> proc_order = compose('process order', new_order, fill_in_vat, finalize_prices)
>>> sol = proc_order.compute({
...     "order_items": ["toilet-paper", "soap"],
...     "vat rate": 0.18,
... })
>>> sol
{'order_items': ['toilet-paper', 'soap'], 'vat rate': 0.18, 'ORDER': {'items': ['toilet-paper', 'soap'], 'prices': [1, 2], 'VAT': [0.18, 0.36], 'totals': [1.18, 2.36]}}
sideffecteds¶
Notice that although many functions consume & produce the same
ORDER
dependency, something that would have formed cycles, the wrapping operations need and provide different sideffected instances, thus breaking the cycles.

See also
The elaborate example in Hierarchical data and further tricks section.
- graphtik.modifier.sfxed_vararg(dependency: str, sfx0: str, *sfx_list: str, accessor: Optional[Accessor] = None, jsonp=None) _Modifier [source]¶
Like
sideffected()
+vararg()
.
- graphtik.modifier.sfxed_varargs(dependency: str, sfx0: str, *sfx_list: str, accessor: Optional[Accessor] = None, jsonp=None) _Modifier [source]¶
Like
sideffected()
+varargs()
.
- graphtik.modifier.token(name, optional: Optional[bool] = None) _Modifier [source]¶
tokens denoting modifications beyond the scope of the solution.
Both needs & provides may be designated as tokens using this modifier. They work as usual while solving the graph (planning) but they have a limited interaction with the operation’s underlying function; specifically:
input tokens must exist in the solution as inputs for an operation depending on it to kick-in, when the computation starts - but this is not necessary for intermediate tokens in the solution during execution;
input tokens are NOT fed into underlying functions;
output tokens are not expected from underlying functions, unless a rescheduled operation with partial outputs designates a sideffected as canceled, by returning it with a falsy value (the operation must be a returns dictionary one).
Tip
If modifications involve some input/output, prefer
implicit()
or
sfxed()
modifiers.

You may still convey these relationships by including the dependency name in the string - in the end, it’s just a string - but no enforcement of any kind will happen from graphtik, like:
>>> from graphtik import token
>>> token("price[sales_df]")
'price[sales_df]'($)
Example:
A typical use-case is to signify changes in some “global” context, outside solution:
>>> from graphtik import operation, compose, token
>>> @operation(provides=token("lights off"))  # sideffect names can be anything
... def close_the_lights():
...     pass
>>> graph = compose('strip ease',
...     close_the_lights,
...     operation(
...         name='undress',
...         needs=[token("lights off")],
...         provides="body")(lambda: "TaDa!")
... )
>>> graph
Pipeline('strip ease', needs=['lights off'($)], provides=['lights off'($), 'body'], x2 ops: close_the_lights, undress)
>>> sol = graph()
>>> sol
{'body': 'TaDa!'}
sideffect¶
Note
Something has to provide a sideffect for a function needing it to execute - this could be another operation, like above, or the user-inputs; just specify some truthy value for the sideffect:
>>> sol = graph.compute({token("lights off"): True})
- graphtik.modifier.vararg(name: str, accessor: Optional[Accessor] = None, jsonp=None) _Modifier [source]¶
Annotate a varargish need to be fed into the function’s
*args
.

- Parameters
See also
Consult also the example test-case in:
test/test_op.py:test_varargs()
, in the full sources of the project.

Example:
We designate
b
&
c
as vararg arguments:

>>> from graphtik import operation, compose, vararg
>>> @operation(
...     needs=['a', vararg('b'), vararg('c')],
...     provides='sum'
... )
... def addall(a, *b):
...     return a + sum(b)
>>> addall
FnOp(name='addall', needs=['a', 'b'(*), 'c'(*)], provides=['sum'], fn='addall')
>>> graph = compose('mygraph', addall)
The graph works with and without any of
b
orc
inputs:

>>> graph(a=5, b=2, c=4)['sum']
11
>>> graph(a=5, b=2)
{'a': 5, 'b': 2, 'sum': 7}
>>> graph(a=5)
{'a': 5, 'sum': 5}
- graphtik.modifier.varargs(name: str, accessor: Optional[Accessor] = None, jsonp=None) _Modifier [source]¶
A varargish
vararg()
, naming an iterable value in the inputs.

- Parameters
See also
Consult also the example test-case in:
test/test_op.py:test_varargs()
, in the full sources of the project.

Example:
>>> from graphtik import operation, compose, varargs
>>> def enlist(a, *b): ... return [a] + list(b)
>>> graph = compose('mygraph',
...     operation(name='enlist', needs=['a', varargs('b')],
...               provides='sum')(enlist)
... )
>>> graph
Pipeline('mygraph', needs=['a', 'b'(?)], provides=['sum'], x1 ops: enlist)
The graph works with or without b in the inputs:
>>> graph(a=5, b=[2, 20])['sum']
[5, 2, 20]
>>> graph(a=5)
{'a': 5, 'sum': [5]}
>>> graph(a=5, b=0xBAD)
Traceback (most recent call last):
ValueError: Failed matching inputs <=> needs for FnOp(name='enlist', needs=['a', 'b'(+)], provides=['sum'], fn='enlist'):
    1. Expected varargs inputs to be non-str iterables: {'b'(+): 2989}
    +++inputs: ['a', 'b']
Attention
To avoid user mistakes, varargs do not accept
str
inputs (although these are iterables):

>>> graph(a=5, b="mistake")
Traceback (most recent call last):
ValueError: Failed matching inputs <=> needs for FnOp(name='enlist', needs=['a', 'b'(+)], provides=['sum'], fn='enlist'):
    1. Expected varargs inputs to be non-str iterables: {'b'(+): 'mistake'}
    +++inputs: ['a', 'b']
See also
The elaborate example in Hierarchical data and further tricks section.
Module: planning¶
Compose a network of operations & dependencies, and compile the execution plan.
- class graphtik.planning.Network(*operations, graph=None)[source]¶
A graph of operations that can compile an execution plan.
- needs[source]¶
the “base”, all data-nodes that are not produced by some operation, decided on construction.
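In set terms, the base needs are the data names consumed by some operation but produced by none. A minimal sketch with plain sets (hypothetical `(needs, provides)` records, not the Network internals):

```python
def base_needs(ops):
    """ops: iterable of (needs, provides) name-collections.

    Returns the data names no operation produces, i.e. the names
    that must come from the inputs.
    """
    consumed = {n for needs, _ in ops for n in needs}
    produced = {p for _, provides in ops for p in provides}
    return consumed - produced

ops = [(["a", "b"], ["ab"]), (["ab", "c"], ["abc"])]
assert base_needs(ops) == {"a", "b", "c"}  # "ab" is produced by the 1st op
```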
- __init__(*operations, graph=None)[source]¶
- Parameters
operations – to be added in the graph
graph – if None, a new graph is created.
- Raises
if dupe operation, with msg:
Operations may only be added once, …
- _append_operation(graph, operation: Operation)[source]¶
Adds the given operation and its data requirements to the network graph.
Invoked during constructor only (immutability).
Identities are based on the name of the operation, the names of the operation’s needs, and the names of the data it provides.
Adds needs, operation & provides, in that order.
- Parameters
graph – the networkx graph to append to
operation – operation instance to append
- _build_execution_steps(pruned_dag, sorted_nodes, inputs: Collection, outputs: Collection) List [source]¶
Create the list of operations and eviction steps, to execute given IOs.
- Parameters
pruned_dag – The original dag, pruned; not broken.
sorted_nodes – an
IndexedSet
with all graph nodes topo-sorted (including pruned ones) by execution order & operation-insertion order to break ties (see
_topo_sort_nodes()
).

inputs – Not used(!); useless inputs will be evicted when the solution is created.
outputs – output names, to decide whether (and which) evict-instructions to add
- Returns
the list of operations (or dependencies to evict), in computation order
IMPLEMENTATION:
The operation steps are based on the topological sort of the DAG, therefore pruning must have eliminated any cycles.
Then the eviction steps are introduced between the operation nodes (if enabled, and outputs have been asked, or else all outputs are kept), to reduce the solution’s memory footprint asap while the computation is running.
An evict-instruction is inserted on 2 occasions:
whenever a need of an executed op is not used by any other operation further down the DAG.
whenever a provide falls beyond the pruned_dag.
For doc chains, either the whole chain is evicted (from the root), or nothing at all.
For eviction purposes,
sfxed
dependencies are equivalent to their stripped sideffected ones, so these are also inserted in the graph (after sorting, to evade cycles).
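The first occasion above amounts to a last-use analysis: once no later step consumes a dependency, an evict-instruction can follow the op that last used it. A simplified sketch under stated assumptions (plain `(name, needs, provides)` tuples, no doc chains or sideffecteds; `plan_evictions` is a hypothetical helper, not the library's `_build_execution_steps()`):

```python
def plan_evictions(steps, keep_outputs):
    """steps: topo-sorted list of (op_name, needs, provides) tuples.

    Returns op names interleaved with ("evict", dep) instructions for
    needs not consumed by any later step (asked outputs are kept).
    """
    out = []
    for i, (op, needs, provides) in enumerate(steps):
        out.append(op)
        for dep in needs:
            if dep in keep_outputs:
                continue  # asked outputs are never evicted
            used_later = any(dep in later_needs
                             for _, later_needs, _ in steps[i + 1:])
            if not used_later:
                out.append(("evict", dep))
    return out

steps = [("op1", ["a"], ["b"]), ("op2", ["b"], ["c"])]
assert plan_evictions(steps, keep_outputs={"c"}) == [
    "op1", ("evict", "a"), "op2", ("evict", "b")]
```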
- _cached_plans[source]¶
Speed up
compile()
calls and avoid a multithreading issue(?) that occurs when accessing the dag in networkx.
- _prune_graph(inputs: Optional[Union[Collection, str]], outputs: Optional[Union[Collection, str]], predicate: Optional[Callable[[Any, Mapping], bool]] = None) Tuple[DiGraph, Tuple, Tuple, Mapping[Operation, Any]] [source]¶
Determines what graph steps need to run to get to the requested outputs from the provided inputs:

- Eliminate steps that are not on a path arriving to requested outputs;
- Eliminate unsatisfied operations: partial inputs or no outputs needed;
- Consolidate the list of needs & provides.
- Parameters
inputs – The names of all given inputs.
outputs – The desired output names. This can also be
None
, in which case the necessary steps are all graph nodes that are reachable from the provided inputs.predicate – the node predicate is a 2-argument callable(op, node-data) that should return true for nodes to include; if None, all nodes included.
- Returns
a 4-tuple:
the pruned execution dag,
net’s needs & outputs based on the given inputs/outputs and the net (may overlap, see
collect_requirements()
),

an {op, prune-explanation} dictionary
Use the returned needs/provides to build a new plan.
- Raises
if outputs asked do not exist in network, with msg:
Unknown output nodes: …
- compile(inputs: Optional[Union[Collection, str]] = None, outputs: Optional[Union[Collection, str]] = None, recompute_from=None, *, predicate=None) ExecutionPlan [source]¶
Create or get from cache an execution-plan for the given inputs/outputs.
See
_prune_graph()
and_build_execution_steps()
for detailed description.- Parameters
inputs – A collection with the names of all the given inputs. If None, all inputs that lead to the given outputs are assumed. If a string, it is converted to a single-element collection.
outputs – A collection, or the name of the output(s). If None, all nodes reachable from the given inputs are assumed. If a string, it is converted to a single-element collection.
recompute_from – Described in
Pipeline.compute()
.predicate – the node predicate is a 2-argument callable(op, node-data) that should return true for nodes to include; if None, all nodes included.
- Returns
the cached or fresh new execution plan
- Raises
- Unknown output nodes…
if outputs asked do not exist in network.
- Unsolvable graph: …
if it cannot produce any outputs from the given inputs.
- Plan needs more inputs…
if given inputs mismatched plan’s
needs
.- Unreachable outputs…
if net cannot produce asked outputs.
- graph: networkx.Graph[source]¶
The
networkx
(Di)Graph containing all operations and dependencies, prior to planning.
- prepare_plot_args(plot_args: PlotArgs) PlotArgs [source]¶
Called by
plot()
to create the nx-graph and other plot-args, e.g. solution.Clone the graph or merge it with the one in the plot_args (see
PlotArgs.clone_or_merge_graph()
).

For the rest of the args, prefer
PlotArgs.with_defaults()
over
_replace()
, not to override user args.
- graphtik.planning._optionalized(graph, data)[source]¶
Retain optionality of a data node based on all needs edges.
- graphtik.planning._topo_sort_nodes(graph) IndexedSet [source]¶
Topo-sort graph by execution order & operation-insertion order to break ties.
This means (probably!?) that the first inserted wins the needs, but the last one wins the provides (and the final solution).
Inform user in case of cycles.
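Tie-breaking by insertion order can be sketched with Kahn's algorithm over insertion-ordered structures (Python dicts preserve insertion order); a simplified version, assuming a hypothetical `{node: successors}` adjacency mapping rather than the real networkx graph:

```python
def topo_sort_stable(adj):
    """adj: {node: [successors]} in insertion order.

    Returns nodes topo-sorted, with earlier-inserted nodes first among ties.
    Raises on cycles, mirroring the "inform user" behavior above.
    """
    indeg = {n: 0 for n in adj}
    for succs in adj.values():
        for s in succs:
            indeg[s] += 1
    order = []
    ready = [n for n in adj if indeg[n] == 0]  # insertion order preserved
    while ready:
        n = ready.pop(0)
        order.append(n)
        for s in adj[n]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    if len(order) != len(adj):
        raise ValueError("graph contains a cycle")
    return order

# "a" and "b" are tied; "a" wins because it was inserted first.
assert topo_sort_stable({"a": ["c"], "b": ["c"], "c": []}) == ["a", "b", "c"]
```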
- graphtik.planning._yield_also_chained_docs(dig_dag: List[Tuple[str, int]], dag, doc: str, stop_set=()) Iterable[str] [source]¶
Dig the doc and its sub/super docs, not recursing in those already in stop_set.
- Parameters
dig_dag – a sequence of 2-tuples like
("in_edges", 0)
, with the name of a networkx method and which edge-node to pick, 0 := src, 1 := dst

stop_set – Stop traversing (and don’t return) doc if already contained in this set.
- Returns
the given doc, and any other docs discovered with dig_dag linked with a “subdoc” attribute on their edge, except those sub-trees with a root node already in stop_set. If doc is not in dag, returns empty.
- graphtik.planning._yield_chained_docs(dig_dag: Union[Tuple[str, int], List[Tuple[str, int]]], dag, docs: Iterable[str], stop_set=()) Iterable[str] [source]¶
Like
_yield_also_chained_docs()
but digging for many docs at once.

- Returns
the given docs, and any other nodes discovered with dig_dag linked with a “subdoc” attribute on their edge, except those sub-trees with a root node already in stop_set.
- graphtik.planning.clone_graph_with_stripped_sfxed(graph)[source]¶
Clone graph including ALSO stripped sideffected deps, with original attrs.
- graphtik.planning.collect_requirements(graph) Tuple[IndexedSet, IndexedSet] [source]¶
Collect & split datanodes in (possibly overlapping) needs/provides.
- graphtik.planning.inputs_for_recompute(graph, inputs: Sequence[str], recompute_from: Sequence[str], recompute_till: Optional[Sequence[str]] = None) Tuple[IndexedSet, IndexedSet] [source]¶
Clear the inputs between recompute_from >–<= recompute_till.
- Parameters
graph – MODIFIED, at most 2 helper nodes inserted
inputs – a sequence
recompute_from – None or a sequence, including any out-of-graph deps (logged)
recompute_till – (optional) a sequence, only in-graph deps.
- Returns
a 2-tuple with the reduced inputs by the dependencies that must be removed from the graph to recompute (along with those dependencies).
It works by temporarily adding x2 nodes to find and remove the intersection of:
strict-descendants(recompute_from) & ancestors(recompute_till)
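The intersection above can be sketched with plain reachability searches over adjacency mappings (networkx provides `descendants()`/`ancestors()` for the real graph). The helpers below are hypothetical simplifications: single-node endpoints, and a precomputed reversed adjacency `radj` instead of graph reversal:

```python
def reachable(adj, start):
    """All nodes reachable from start (excluding start itself)."""
    seen, stack = set(), list(adj.get(start, ()))
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(adj.get(n, ()))
    return seen

def recompute_region(adj, radj, recompute_from, recompute_till):
    # strict-descendants(recompute_from) & ancestors(recompute_till),
    # plus recompute_till itself (the ">--<=" endpoints in the docstring).
    return (reachable(adj, recompute_from)
            & (reachable(radj, recompute_till) | {recompute_till}))

adj = {"a": ["b"], "b": ["c"], "c": ["d"]}
radj = {"d": ["c"], "c": ["b"], "b": ["a"]}
assert recompute_region(adj, radj, "a", "c") == {"b", "c"}
```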
FIXME: merge recompute() with traversing unsatisfied (see
test_recompute_NEEDS_FIX
) because it clears inputs of unsatisfied ops (cannot be replaced later).
- graphtik.planning.log = <Logger graphtik.planning (WARNING)>¶
If this logger is eventually DEBUG-enabled, the string-representation of network-objects (network, plan, solution) is augmented with children’s details.
- graphtik.planning.root_doc(dag, doc: str) str [source]¶
Return the topmost superdoc, or the same doc if it is not in a chain, or raise if the node is unknown.
- graphtik.planning.unsatisfied_operations(dag, inputs: Iterable) Tuple[Mapping[Operation, Any], IndexedSet] [source]¶
Traverse topologically sorted dag to collect un-satisfied operations.
Unsatisfied operations are those suffering from ANY of the following:
- They are missing at least one compulsory need-input.
Since the dag is ordered, as soon as we’re on an operation, all its needs have been accounted for, so we can determine its satisfaction.
- Their provided outputs are not linked to any data in the dag.
An operation might not have any output link when
_prune_graph()
has broken them, due to given intermediate inputs.
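The first rule can be sketched by walking the ops in topological order while tracking which data become available; a simplification that ignores the broken-edge and no-output-link cases (`unsatisfied_ops` is a hypothetical helper, not the library function):

```python
def unsatisfied_ops(sorted_ops, inputs):
    """sorted_ops: topo-sorted list of (name, compulsory_needs, provides).

    Returns a {pruned-op-name: explanation} dict, mirroring the shape of
    the real function's first return value.
    """
    have = set(inputs)
    pruned = {}
    for name, needs, provides in sorted_ops:
        missing = [n for n in needs if n not in have]
        if missing:
            pruned[name] = f"missing compulsory needs: {missing}"
        else:
            have.update(provides)  # downstream ops may now be satisfied
    return pruned

ops = [("add", ["a", "b"], ["ab"]),
       ("double", ["ab"], ["ab2"]),
       ("other", ["z"], ["zz"])]
assert unsatisfied_ops(ops, inputs=["a", "b"]) == {
    "other": "missing compulsory needs: ['z']"}
```

Because the walk is topological, `double` is satisfied even though `ab` is not among the inputs: it becomes available once `add` runs.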
- Parameters
dag – a graph whose edges arriving at existing inputs have been broken
inputs – an iterable of the names of the input values
- Returns
a 2-tuple with ({pruned-op, unsatisfied-explanation}, topo-sorted-nodes)
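The topological traversal described above can be sketched in plain Python. This is an illustrative toy, not graphtik's implementation; the tuple-based operation encoding and the unsatisfied_ops name are hypothetical:

```python
# Illustrative sketch: walk ops in topological order; an op missing a
# compulsory input is unsatisfied, otherwise its provides become available
# for the ops downstream.
def unsatisfied_ops(topo_ops, available):
    """topo_ops: [(name, needs, provides)] in topological order;
    available: the data names present as inputs."""
    available = set(available)
    unsatisfied = {}
    for name, needs, provides in topo_ops:
        missing = [n for n in needs if n not in available]
        if missing:
            unsatisfied[name] = f"needs: {missing}"
        else:
            available.update(provides)  # op can run; its outputs now exist
    return unsatisfied

ops = [
    ("add", ["a", "b"], ["ab"]),
    ("mul", ["ab", "c"], ["abc"]),
    ("neg", ["d"], ["negd"]),
]
unsatisfied_ops(ops, {"a", "b", "c"})   # {'neg': "needs: ['d']"}
```

Note how "mul" is satisfied only because "add" ran earlier in the same pass and contributed "ab", mirroring the single ordered sweep described above.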
- graphtik.planning.yield_also_chaindocs(dag, doc: str, stop_set=()) Iterable[str] [source]¶
Calls
_yield_also_chained_docs()
for both subdocs & superdocs.
- graphtik.planning.yield_also_subdocs(dag, doc: str, stop_set=()) Iterable[str] [source]¶
Calls
_yield_also_chained_docs()
for subdocs.
- graphtik.planning.yield_also_superdocs(dag, doc: str, stop_set=()) Iterable[str] [source]¶
Calls
_yield_also_chained_docs()
for superdocs.
- graphtik.planning.yield_chaindocs(dag, docs: Iterable[str], stop_set=()) Iterable[str] [source]¶
Calls
_yield_chained_docs()
for both subdocs & superdocs.
- graphtik.planning.yield_ops(nodes) List[Operation] [source]¶
May scan (preferably)
plan.steps
or dag nodes.
Module: execution¶
Execute the plan to derive the solution.
- class graphtik.execution.ExecutionPlan(net, needs, provides, dag, steps, asked_outs, comments)[source]¶
A pre-compiled list of operation steps that can execute for the given inputs/outputs.
It is the result of the network’s planning phase.
Note the execution plan’s attributes are on purpose immutable tuples.
- net¶
The parent
Network
- needs¶
An
IndexedSet
with the input names needed to exist in order to produce all provides.
- provides¶
An
IndexedSet
with the output names produced when all inputs are given.
- dag¶
The regular (not broken) pruned subgraph of net-graph.
- steps¶
The tuple of operation-nodes & instructions needed to evaluate the given inputs & asked outputs, free memory and avoid overwriting any given intermediate inputs.
- asked_outs¶
When true, evictions may kick in (unless disabled by configurations); otherwise, evictions (along with the perfect-evictions check) are skipped.
- comments¶
an {op, prune-explanation} dictionary
- __module__ = 'graphtik.execution'¶
- _abc_impl = <_abc_data object>¶
- _execute_sequential_method(solution: Solution)[source]¶
This method runs the graph one operation at a time in a single thread.
- Parameters
solution – must contain the input values only, gets modified
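The sequential strategy can be pictured with a toy loop, illustrative only (graphtik's real method also handles instructions, evictions, and failures); the run_sequential name and tuple-based steps are hypothetical:

```python
# Minimal sketch of sequential plan execution: each step consumes named
# values from the solution dict and writes its outputs back into it.
def run_sequential(steps, solution):
    for fn, needs, provides in steps:
        inputs = [solution[n] for n in needs]   # values guaranteed by planning
        outputs = fn(*inputs)
        # single-provide ops return a bare value, multi-provide a sequence
        solution.update(zip(provides, [outputs] if len(provides) == 1 else outputs))
    return solution

sol = {"a": 2, "b": 3}
steps = [
    (lambda a, b: a + b, ["a", "b"], ["ab"]),
    (lambda ab: ab * 10, ["ab"], ["ab_x10"]),
]
run_sequential(steps, sol)   # sol: {'a': 2, 'b': 3, 'ab': 5, 'ab_x10': 50}
```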
- _execute_thread_pool_barrier_method(solution: Solution)[source]¶
(deprecated) This method runs the graph using a parallel pool of thread executors. You may achieve lower total latency with this method if your graph is sufficiently subdivided into operations.
- Parameters
solution – must contain the input values only, gets modified
- _handle_task(future: Union[OpTask, AsyncResult], op, solution) None [source]¶
Un-dill parallel task results (if marshalled), and update solution / handle failure.
- _prepare_tasks(operations, solution, pool, global_parallel, global_marshal) Union[Future, OpTask, bytes] [source]¶
Combine ops+inputs, apply marshalling, and submit to execution pool (or not) …
based on global/per-op configs.
- execute(named_inputs, outputs=None, *, name='', callbacks: Optional[Tuple[Callable[[OpTask], None], ...]] = None, solution_class=None, layered_solution=None) Solution [source]¶
- Parameters
named_inputs – A mapping of names –> values that must contain at least the compulsory inputs that were specified when the plan was built (but cannot enforce that!). Cloned, not modified.
outputs – If not None, they are just checked if possible, based on
provides
, and scream if not.
name – name of the pipeline used for logging
callbacks – If given, a 2-tuple with (optional) callbacks to call before/after computing an operation, with an
OpTask
as argument containing the op & solution. A single (scalar) callable, fewer than 2, or no elements are also accepted.
solution_class – a custom solution factory to use
layered_solution –
whether to store operation results into separate solution layer
Unless overridden by a True/False in
set_layered_solution()
of configurations, it accepts the following values:
When True (False), always keep (don’t keep) results in a separate layer for each operation, regardless of any jsonp dependencies.
If
None
, layers are used only if there are NO jsonp dependencies in the network.
- Returns
The solution which contains the results of each operation executed +1 for inputs in separate dictionaries.
- Raises
- Unsolvable graph…
if it cannot produce any outputs from the given inputs.
- Plan needs more inputs…
if given inputs mismatched plan’s
needs
.- Unreachable outputs…
if net cannot produce asked outputs.
- property graph¶
- prepare_plot_args(plot_args: PlotArgs) PlotArgs [source]¶
Called by
plot()
to create the nx-graph and other plot-args, e.g. solution.
Clone the graph or merge it with the one in the plot_args (see
PlotArgs.clone_or_merge_graph()
).
For the rest of the args, prefer
PlotArgs.with_defaults()
over
_replace()
, not to override user args.
- validate(inputs: ~typing.Optional[~typing.Union[~typing.Collection, str]] = <UNSET>, outputs: ~typing.Optional[~typing.Union[~typing.Collection, str]] = <UNSET>)[source]¶
Scream on invalid inputs, outputs or no operations in graph.
- Parameters
- Raises
- Unsolvable graph…
if it cannot produce any outputs from the given inputs.
- Plan needs more inputs…
if given inputs mismatched plan’s
needs
.- Unreachable outputs…
if net cannot produce asked outputs.
- class graphtik.execution.OpTask(op, sol, solid, result=<UNSET>)[source]¶
Mimic
concurrent.futures.Future
for sequential execution.
This intermediate class is needed to solve a pickling issue with the process executor.
- __module__ = 'graphtik.execution'¶
- __slots__ = ('op', 'sol', 'solid', 'result')¶
- logname = 'graphtik.execution'¶
- op¶
the operation about to be computed.
- result¶
Initially
UNSET
; will be set after execution to the operation’s outputs, or to the raised exception.
- sol¶
the solution (might be just a plain dict if it has been marshalled).
- solid¶
the operation identity, needed if sol is a plain dict.
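The "mimic Future" idea can be sketched with a hypothetical stand-in class (not graphtik's OpTask): a lazily-run task exposing the same result() entry point as concurrent.futures.Future, so sequential and pooled execution paths can share result-handling code:

```python
class SeqTask:
    """Hypothetical sequential task mimicking a Future's result() interface."""
    __slots__ = ("fn", "args", "_result", "_done")

    def __init__(self, fn, *args):
        self.fn, self.args = fn, args
        self._done = False
        self._result = None

    def result(self):
        if not self._done:              # run on first access, in the caller's thread
            self._result = self.fn(*self.args)
            self._done = True
        return self._result

SeqTask(sum, [1, 2, 3]).result()   # 6
```

Keeping the attributes in __slots__ (as the real OpTask also does) avoids a per-instance __dict__, which matters when thousands of tasks are created.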
- class graphtik.execution.Solution(plan, input_values: dict, callbacks: Optional[Tuple[Callable[[OpTask], None], Callable[[OpTask], None]]] = None, is_layered=None)[source]¶
The solution chain-map and execution state (e.g. overwrites or canceled operations).
It inherits
collections.ChainMap
, to keep a separate dictionary for each operation executed, +1 for the user inputs.
- __annotations__ = {'broken': typing.Mapping[graphtik.base.Operation, graphtik.base.Operation], 'canceled': typing.Mapping[graphtik.base.Operation, typing.Any], 'dag': <class 'networkx.classes.digraph.DiGraph'>, 'executed': typing.Mapping[graphtik.base.Operation, typing.Any], 'graph': "'networkx.Graph'", 'is_layered': <class 'bool'>, 'solid': <class 'str'>}¶
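The layering idea is standard collections.ChainMap behavior, sketched below (data names are made up; the real Solution adds execution state on top of this):

```python
from collections import ChainMap

# One dict per executed operation, chained in front of the user inputs;
# the most recently pushed layer wins on lookup.
inputs = {"a": 1, "b": 2}
solution = ChainMap(inputs)

op1_layer = {"ab": 3}               # outputs of a 1st operation
solution = solution.new_child(op1_layer)

op2_layer = {"ab": 30, "c": 4}      # a 2nd operation overwriting "ab"
solution = solution.new_child(op2_layer)

solution["ab"]       # 30 -- the newest layer shadows the older value
solution.maps[-1]    # {'a': 1, 'b': 2} -- user inputs remain a separate layer
```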
- __init__(plan, input_values: dict, callbacks: Optional[Tuple[Callable[[OpTask], None], Callable[[OpTask], None]]] = None, is_layered=None)[source]¶
Initialize a ChainMap by setting maps to the given mappings. If no mappings are provided, a single empty dictionary is used.
- __module__ = 'graphtik.execution'¶
- _abc_impl = <_abc_data object>¶
- _populate_op_layer_with_outputs(op, outputs) dict [source]¶
Installs & populates a new 1st chained-map if layered, else uses named_inputs.
- _reschedule(dag, reason, op)[source]¶
Re-prune dag, and then update and return any newly-canceled ops.
- Parameters
dag – The dag to discover unsatisfied operations from.
reason – for logging
op – for logging
- broken: Mapping[Operation, Operation] = {}[source]¶
A map of {rescheduled operation -> dynamically pruned ops, downstream}.
- canceled: Mapping[Operation, Any] = {}[source]¶
A {op, prune-explanation} dictionary with canceled operations due to upstream failures.
- check_if_incomplete() Optional[IncompleteExecutionError] [source]¶
Return a
IncompleteExecutionError
if pipeline operations failed/canceled.
- dag: DiGraph[source]¶
Cloned from plan; will be modified by removing the downstream edges of:
any partial outputs not provided, or
all provides of failed operations.
- executed: Mapping[Operation, Any] = {}[source]¶
A dictionary with keys the operations executed, and values their layer, or status:
no key: not executed yet
value == dict: execution ok, produced those outputs
value == Exception: execution failed
Keys are ordered as operations executed (last, most recently executed).
When
is_layered
, its value-dicts are inserted, in reverse order, into my maps
(from chain-map).
- property graph¶
- is_layered: bool[source]¶
- Command solution layering; by default, false if there is no jsonp in the dependencies.
See
executed
below
- property layers: List[Mapping[Operation, Any]]¶
Outputs by operation, in execution order (last, most recently executed).
- operation_executed(op, outputs)[source]¶
Invoked once per operation, with its results.
It will update
executed
with the operation status and, if outputs were partial, it will update
canceled
with the unsatisfied ops downstream of op.
- Parameters
op – the operation that completed ok
outputs – The named values the op actually produced, which may be a subset of its provides. Sideffects are not considered.
- operation_failed(op, ex)[source]¶
Invoked once per operation, with its results.
It will update
executed
with the operation status and the
canceled
with the unsatisfied ops downstream of op.
- property overwrites: Mapping[Any, List]¶
The data in the solution that exist more than once (refreshed on every call).
A “virtual” property to a dictionary with keys the names of values that exist more than once, and as values all those values in a list, ordered in reverse compute order (1st is the most recently computed; last, any given inputs).
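Computing such an overwrites-view from a chain-map can be sketched as follows (illustrative helper, not graphtik's property; ChainMap.maps is ordered newest layer first, matching the reverse compute order above):

```python
from collections import ChainMap

def overwrites(chainmap):
    """Names appearing in more than one layer, with all values newest-first."""
    result = {}
    for name in chainmap:                   # iterate de-duplicated keys
        values = [m[name] for m in chainmap.maps if name in m]
        if len(values) > 1:                 # present in >1 layers => overwritten
            result[name] = values
    return result

sol = ChainMap({"x": 3}, {"x": 2, "y": 1}, {"x": 0})
overwrites(sol)   # {'x': [3, 2, 0]}
```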
- plan = 'ExecutionPlan'¶
the plan that produced this solution
- graphtik.execution._do_task(task)[source]¶
Un-dill the simpler
OpTask
& Dill the results, to pass through pool-processes.
- graphtik.execution.log = <Logger graphtik.execution (WARNING)>¶
If this logger is eventually DEBUG-enabled, the string-representation of network-objects (network, plan, solution) is augmented with children’s details.
- graphtik.execution.task_context: ContextVar = <ContextVar name='task_context'>¶
(unstable API) Populated with the
OpTask
for the currently executing operation. It does not work for (deprecated) parallel execution.
See also
The elaborate example in Hierarchical data and further tricks section
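The pattern behind such a task context-var is plain contextvars, sketched below with made-up names (run_task, current_task); code deep inside an operation can then ask "which task am I running in?":

```python
from contextvars import ContextVar

current_task: ContextVar = ContextVar("current_task")

def run_task(name, fn):
    """Run fn with current_task set to `name`, restoring the old value after."""
    token = current_task.set(name)
    try:
        return fn()
    finally:
        current_task.reset(token)   # restore whatever was set before

def who_am_i():
    return current_task.get()       # visible anywhere down the call-stack

run_task("add-op", who_am_i)   # 'add-op'
```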
Module: plot¶
plotting handled by the active plotter & current theme.
- class graphtik.plot.Plotter(theme: Optional[Theme] = None, **styles_kw)[source]¶
A plotter renders diagram images of plottables.
- default_theme[source]¶
The customizable
Theme
instance controlling theme values & dictionaries for plots.
- build_pydot(plot_args: PlotArgs) Dot [source]¶
Build a
pydot.Dot
out of a Network graph/steps/inputs/outputs and return it, to be fed into Graphviz to render.
See
Plottable.plot()
for the arguments, sample code, and the legend of the plots.
- legend(filename=None, jupyter_render: Optional[Mapping] = None, theme: Optional[Theme] = None)[source]¶
Generate a legend for all plots (see
Plottable.plot()
for args).
See
Plotter.render_pydot()
for the rest arguments.
- render_pydot(dot: Dot, filename=None, jupyter_render: Optional[str] = None)[source]¶
Render a
pydot.Dot
instance with Graphviz in a file and/or in a matplotlib window.
- Parameters
dot – the pre-built
pydot.Dot
instance
filename (str) –
Write a file or open a matplotlib window.
If it is a string or file, the diagram is written into the file-path.
Common extensions are
.png .dot .jpg .jpeg .pdf .svg
; call
plot.supported_plot_formats()
for more.
If it IS True, opens the diagram in a matplotlib window (requires the matplotlib package to be installed).
If it equals -1, it mat-plots but does not open the window.
Otherwise, just return the
pydot.Dot
instance.
jupyter_render –
a nested dictionary controlling the rendering of graph-plots in Jupyter cells. If None, defaults to
default_jupyter_render
; values for omitted sub-keys are also taken from the above dict - pass an empty dict ({}
) if that is not desirable (e.g. to revert to the panZoom library’s defaults).
You may modify those in place and they will apply to all future calls (see Jupyter notebooks).
You may increase the height of the SVG cell output with something like this:
from graphtik.plot import default_jupyter_render
plottable.plot(jupyter_render={
    **default_jupyter_render,
    "svg_element_styles": "height: 600px; width: 100%",
})
- Returns
the matplotlib image if
filename=-1
, or the given dot annotated with any jupyter-rendering configurations given in jupyter_render parameter.
See
Plottable.plot()
for sample code.
- with_styles(**kw) Plotter [source]¶
Returns a cloned plotter with a deep-copied theme modified as given.
See also
Theme.withset()
.
- class graphtik.plot.Ref(ref, default=Ellipsis)[source]¶
Deferred attribute reference
resolve()
d on some object(s).
- default¶
- ref¶
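The deferred-reference idea can be sketched with a hypothetical mini class (MiniRef and ThemeStub are made-up names, not graphtik's Ref/Theme): the attribute lookup is delayed until resolve-time, trying each supplied object in turn:

```python
_MISSING = object()   # sentinel distinguishing "no default" from default=None

class MiniRef:
    """Hypothetical deferred attribute reference, resolved lazily."""
    def __init__(self, ref, default=_MISSING):
        self.ref, self.default = ref, default

    def resolve(self, *objects):
        for obj in objects:               # first object having the attribute wins
            try:
                return getattr(obj, self.ref)
            except AttributeError:
                pass
        if self.default is not _MISSING:
            return self.default
        raise AttributeError(self.ref)

class ThemeStub:
    fill_color = "wheat"

MiniRef("fill_color").resolve(object(), ThemeStub)    # 'wheat'
MiniRef("missing", default=None).resolve(ThemeStub)   # None
```

This is roughly how a style value like Ref('fill_color') can stay symbolic inside a theme dictionary until plot-time.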
- class graphtik.plot.StylesStack(plot_args: PlotArgs, named_styles: List[Tuple[str, dict]], ignore_errors: bool = False)[source]¶
A mergeable stack of dicts preserving provenance and style expansion.
The
merge()
method joins the collected stack of styles into a single dictionary, and if DEBUG (see
remerge()
) inserts their provenance in a
'tooltip'
attribute; any lists are merged (important for multi-valued Graphviz attributes like
style
).
Then they are
expanded
.- add(name, kw=Ellipsis)[source]¶
Adds a style by name from style-attributes, or provenanced explicitly, or fail early.
- Parameters
name – Either the provenance name when the kw styles is given, OR just an existing attribute of
style
instance.
kw – if given and None/empty, it is ignored.
- expand(style: dict) dict [source]¶
Apply style expansions on an already merged style.
Call any callables found as keys, values or the whole style-dict, passing in the current
plot_args
, and replace those with the callable’s result (even more flexible than templates).
Resolve any
Ref
instances, first against the current nx_attrs and then against the attributes of the current theme.
Render jinja2 templates with, as template-arguments, all attributes of the
plot_args
instance in use (hence much more flexible than
Ref
).
Any None results above are discarded.
Work around pydot/pydot#228: the pydot constructor not supporting styles-as-lists.
Merge tooltip & tooltip lists.
- property ignore_errors¶
When true, keep merging despite expansion errors.
- merge(debug=None) dict [source]¶
Recursively merge
named_styles
andexpand()
the result style.- Parameters
debug – When not None, override DEBUG flag; when enabled, tooltips are overridden with provenance & nx_attrs.
- Returns
the merged styles
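The list-merging behavior described above (scalars last-wins, lists concatenate) can be sketched as a toy helper, not graphtik's merge() (which also tracks provenance and expansions):

```python
def merge_styles(*styles):
    """Merge style dicts: list values extend each other, scalars overwrite."""
    merged = {}
    for style in styles:
        for key, value in style.items():
            if isinstance(value, list) and isinstance(merged.get(key), list):
                merged[key] = merged[key] + value   # extend, don't overwrite
            else:
                merged[key] = value
    return merged

merge_styles(
    {"style": ["filled"], "fillcolor": "white"},
    {"style": ["dashed"], "fillcolor": "wheat"},
)
# {'style': ['filled', 'dashed'], 'fillcolor': 'wheat'}
```

List concatenation matters because Graphviz accepts multi-valued attributes like style="filled,dashed", where a plain dict-update would lose the earlier value.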
- property named_styles¶
A list of 2-tuples: (name, dict) containing the actual styles along with their provenance.
- property plot_args¶
current item’s plot data with at least
PlotArgs.theme
attribute.
- stack_user_style(nx_attrs: dict, skip=())[source]¶
Appends keys in nx_attrs starting with
USER_STYLE_PREFFIX
into the stack.
- class graphtik.plot.Theme(*, _prototype: Optional[Theme] = None, **kw)[source]¶
The poor man’s css-like plot theme (see also
StylesStack
).To use the values contained in theme-instances, stack them in a
StylesStack
, and
StylesStack.merge()
them with style expansions (read it from
StylesStack.expand()
).
Attention
It is recommended to use other means for plot customizations instead of directly modifying the theme’s class-attributes.
All
Theme
class-attributes are deep-copied when constructing new instances, to avoid modifications by mistake while attempting to update instance-attributes instead (hint: almost all of its attributes are containers, i.e. dicts). Therefore any class-attribute modification will be ignored, until a new
Theme
instance from the patched class is used.
- arch_url = 'https://graphtik.readthedocs.io/en/latest/arch.html'¶
the url to the architecture section explaining graphtik glossary, linked by legend.
- broken_color = 'Red'¶
- canceled_color = '#a9a9a9'¶
- data_bad_html_label_keys = {'label'}¶
Keys to ignore from data styles & node-attrs, because they are handled internally by HTML-Label, and/or interact badly with that label.
- data_template = <Template memory:7fb871779890>¶
- edge_defaults = {}[source]¶
Attributes applying to all edges with
edge [...]
graphviz construct, appended in graph only if non-empty.
- evicted_color = '#006666'¶
- failed_color = 'LightCoral'¶
- fill_color = 'wheat'¶
- kw_data = {'fixedsize': 'shape', 'shape': 'rect'}¶
Reduce margins, since sideffects take a lot of space (default margin: x=0.11, y=0.055)
- kw_data_evicted = {'penwidth': '3', 'tooltip': ['(evicted)']}¶
- kw_data_in_solution = {'fillcolor': Ref('fill_color'), 'style': ['filled'], 'tooltip': [<function make_data_value_tooltip>]}¶
- kw_data_in_solution_null = {'fillcolor': Ref('null_color'), 'tooltip': ['(null-result)']}¶
- kw_data_inp_only = {'shape': 'invhouse', 'tooltip': ['(input)']}¶
- kw_data_io = {'shape': 'hexagon', 'tooltip': ['(input+output)']}¶
- kw_data_missing = {'color': Ref('canceled_color'), 'fontcolor': Ref('canceled_color'), 'tooltip': ['(missing-optional or canceled)']}¶
- kw_data_out_only = {'shape': 'house', 'tooltip': ['(output)']}¶
- kw_data_overwritten = {'fillcolor': Ref('overwrite_color'), 'style': ['filled'], 'tooltip': [<function make_overwrite_tooltip>]}¶
- kw_data_pruned = {'color': Ref('pruned_color'), 'fontcolor': Ref('pruned_color'), 'tooltip': ['(pruned)']}¶
- kw_data_sideffect = {'color': Ref('sideffect_color'), 'fontcolor': Ref('sideffect_color')}¶
- kw_data_to_evict = {'color': Ref('evicted_color'), 'style': ['filled', 'dashed'], 'tooltip': ['(to evict)']}¶
- kw_edge = {'headport': 'n', 'tailport': 's'}¶
- kw_edge_alias = {'fontsize': 11, 'label': <Template memory:7fb8717a0c10>}¶
Added conditionally if alias_of found in edge-attrs.
- kw_edge_broken = {'color': Ref('broken_color'), 'tooltip': ['(partial-broken)']}¶
- kw_edge_endured = {'style': ['dashed']}¶
- kw_edge_head_op = {'arrowtail': 'inv', 'dir': 'back'}¶
- kw_edge_implicit = {<function Theme.<lambda>>: 'obox', 'dir': 'both', 'tooltip': ['(implicit)'], 'fontcolor': Ref('sideffect_color')}¶
- kw_edge_mapping_keyword = {'fontname': 'italic', 'fontsize': 11, 'label': <Template memory:7fb8784af8d0>, 'tooltip': ['(mapped-fn-keyword)']}¶
Rendered if
keyword
exists in nx_attrs.
- kw_edge_null_result = {'color': Ref('null_color'), 'tooltip': ['(null-result)']}¶
- kw_edge_optional = {'style': ['dashed'], 'tooltip': ['(optional)']}¶
- kw_edge_pruned = {'color': Ref('pruned_color'), 'fontcolor': Ref('pruned_color')}¶
- kw_edge_rescheduled = {'style': ['dashed']}¶
- kw_edge_sideffect = {'color': Ref('sideffect_color')}¶
- kw_edge_subdoc = {'arrowtail': 'odot', 'color': Ref('subdoc_color'), 'dir': 'back', 'headport': 'nw', 'tailport': 'se', 'tooltip': ['(subdoc)']}¶
- kw_graph = {'fontname': 'italic', 'graph_type': 'digraph'}¶
- kw_graph_plottable_type = {'ExecutionPlan': {}, 'FnOp': {}, 'Network': {}, 'Pipeline': {}, 'Solution': {}}¶
styles per plot-type
- kw_graph_plottable_type_unknown = {}[source]¶
For when type-name of
PlotArgs.plottable
is not found inkw_plottable_type
(or missing altogether).
- kw_legend = {'URL': 'https://graphtik.readthedocs.io/en/latest/_images/GraphtikLegend.svg', 'fillcolor': 'yellow', 'name': 'legend', 'shape': 'component', 'style': 'filled', 'target': '_blank'}¶
If
'URL'
key missing/empty, no legend icon included in plots.
- kw_op = {'name': <function Theme.<lambda>>, 'shape': 'plain', 'tooltip': [<function Theme.<lambda>>]}¶
props for operation node (outside of label)
- kw_op_canceled = {'fillcolor': Ref('canceled_color'), 'tooltip': ['(canceled)']}¶
- kw_op_endured = {'badges': ['!'], 'penwidth': Ref('resched_thickness'), 'style': ['dashed'], 'tooltip': ['(endured)']}¶
- kw_op_executed = {'fillcolor': Ref('fill_color')}¶
- kw_op_failed = {'fillcolor': Ref('failed_color'), 'tooltip': [<Template memory:7fb871769f10>]}¶
- kw_op_label = {'fn_link_target': '_top', 'fn_name': <function Theme.<lambda>>, 'fn_truncate': Ref('truncate_args'), 'fn_url': Ref('fn_url'), 'op_link_target': '_top', 'op_name': <function Theme.<lambda>>, 'op_truncate': Ref('truncate_args'), 'op_url': Ref('op_url')}¶
Jinja2 params for the HTML-Table label, applied 1ST.
- kw_op_label2 = {'fn_tooltip': [<function make_fn_tooltip>], 'op_tooltip': [<function make_op_tooltip>]}¶
Jinja2 params for the HTML-Table label applied AT THE END.
- kw_op_marshalled = {'badges': ['&']}¶
- kw_op_parallel = {'badges': ['|']}¶
- kw_op_prune_comment = {'op_tooltip': [<function make_op_prune_comment>]}¶
- kw_op_pruned = {'color': Ref('pruned_color'), 'fontcolor': Ref('pruned_color')}¶
- kw_op_rescheduled = {'badges': ['?'], 'penwidth': Ref('resched_thickness'), 'style': ['dashed'], 'tooltip': ['(rescheduled)']}¶
- kw_op_returns_dict = {'badges': ['}']}¶
- kw_step = {'arrowhead': 'vee', 'color': Ref('steps_color'), 'fontcolor': Ref('steps_color'), 'fontname': 'bold', 'fontsize': 18, 'splines': True, 'style': 'dotted'}¶
step edges
- kw_step_badge = {'step_bgcolor': Ref('steps_color'), 'step_color': 'white', 'step_target': '_top', 'step_tooltip': 'computation order', 'step_url': 'https://graphtik.readthedocs.io/en/latest/arch.html#term-steps', 'vector_color': Ref('vector_color')}¶
Available as jinja2 params for both data & operation templates.
- node_defaults = {'fillcolor': 'white', 'style': ['filled']}¶
Attributes applying to all nodes with
node [...]
graphviz construct, appended in graph only if non-empty.
- null_color = '#ffa9cd'¶
- op_bad_html_label_keys = {'label', 'shape', 'style'}¶
Keys to ignore from operation styles & node-attrs, because they are handled internally by HTML-Label, and/or interact badly with that label.
- op_badge_styles = {'badge_styles': {'!': {'URL': 'https://graphtik.readthedocs.io/en/latest/arch.html#term-endured', 'bgcolor': '#04277d', 'color': 'white', 'target': '_top', 'tooltip': 'endured'}, '&': {'URL': 'https://graphtik.readthedocs.io/en/latest/arch.html#term-marshalling', 'bgcolor': '#4e3165', 'color': 'white', 'target': '_top', 'tooltip': 'marshalled'}, '?': {'URL': 'https://graphtik.readthedocs.io/en/latest/arch.html#term-partial-outputs', 'bgcolor': '#fc89ac', 'color': 'white', 'target': '_top', 'tooltip': 'rescheduled'}, '|': {'URL': 'https://graphtik.readthedocs.io/en/latest/arch.html#term-parallel-execution', 'bgcolor': '#b1ce9a', 'color': 'white', 'target': '_top', 'tooltip': 'parallel'}, '}': {'URL': 'https://graphtik.readthedocs.io/en/latest/arch.html#term-returns-dictionary', 'bgcolor': '#cc5500', 'color': 'white', 'target': '_top', 'tooltip': 'returns_dict'}}}¶
Operation styles may specify one or more “letters” in a badges list item, as long as the “letter” is contained in the dictionary below.
- op_template = <Template memory:7fb871deba10>¶
Try to mimic a regular Graphviz node attributes (see examples in
test.test_plot.test_op_template_full()
for params). TODO: fix jinja2 template being un-picklable!
- overwrite_color = 'SkyBlue'¶
- pruned_color = '#d3d3d3'¶
- resched_thickness = 4¶
- show_chaindocs = None[source]¶
- None:
hide any parent/subdoc not related directly to some operation;
- true:
plot also hierarchical data nodes not directly linked to operations;
- false:
hide also parent-subdoc relation edges.
- show_steps = None[source]¶
- None:
plot just a badge with the order (a number) of each op/data in steps (if contained);
- true:
plot also execution steps, linking operations and evictions with green dotted lines labeled with numbers denoting the execution order;
- false:
hide even op/data step order badges.
- sideffect_color = 'blue'¶
- steps_color = '#00bbbb'¶
- subdoc_color = '#8B4513'¶
- truncate_args = ((23, True), {'reverse': True})¶
args for jinja2 patched truncate filter, above.
- vector_color = '#7193ff'¶
- graphtik.plot.USER_STYLE_PREFFIX = 'graphviz.'¶
Any nx-attributes starting with this prefix are appended verbatim as graphviz attributes, by
stack_user_style()
.
- graphtik.plot.active_plotter_plugged(plotter: Plotter) None [source]¶
Like
set_active_plotter()
as a context-manager, resetting back to old value.
- graphtik.plot.as_identifier(s)[source]¶
Convert string into a valid ID, both for html & graphviz.
It must not rely on Graphviz’s HTML-like string, because it would not be a valid HTML-ID.
Adapted from https://stackoverflow.com/a/3303361/548792,
HTML rule from https://stackoverflow.com/a/79022/548792
Graphviz rules: https://www.graphviz.org/doc/info/lang.html
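The sanitizing idea can be sketched with a toy regex helper (illustrative only; the real as_identifier() may apply different rules): replace invalid characters and never start with a digit, since HTML ids must begin with a letter:

```python
import re

def to_identifier(s: str) -> str:
    """Hypothetical sanitizer for strings used as HTML/graphviz IDs."""
    s = re.sub(r"\W", "_", s)          # non-word chars -> underscore
    if not s or s[0].isdigit():
        s = "_" + s                    # ids must not start with a digit
    return s

to_identifier("a/b:1")   # 'a_b_1'  (':' would break graphviz ports, see above)
to_identifier("1st")     # '_1st'
```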
- graphtik.plot.default_jupyter_render = {'svg_container_styles': '', 'svg_element_styles': 'width: 100%; height: 300px;', 'svg_pan_zoom_json': {'controlIconsEnabled': True, 'fit': True}}¶
A nested dictionary controlling the rendering of graph-plots in Jupyter cells, as those returned from
Plottable.plot()
(currently as SVGs). Either modify it in place, or pass another one in the respective methods.
Any keys omitted are taken from
default_jupyter_render
- pass an empty dict({}
) if that is not desirable (e.g. to revert to the panZoom library’s defaults).
The following keys are supported.
- Parameters
svg_pan_zoom_json –
arguments controlling the rendering of a zoomable SVG in Jupyter notebooks, as defined in https://github.com/bumbu/svg-pan-zoom#how-to-use; if None, defaults to this map (json strings also supported):
{"controlIconsEnabled": True, "fit": True}
svg_element_styles –
mostly for sizing the zoomable SVG in Jupyter notebooks. Inspect & experiment on the html page of the notebook with browser tools. If None, defaults to this string (maps also supported):
"width: 100%; height: 300px;"
svg_container_styles – like svg_element_styles, if None, defaults to empty string (also maps supported).
Note
referred also by
graphtik
’sgraphtik_zoomable_options
default configuration value.
- graphtik.plot.get_active_plotter() Plotter [source]¶
Get the previously active
plotter
instance or default one.
- graphtik.plot.graphviz_html_string(s, *, repl_nl=None, repl_colon=None, xmltext=None)[source]¶
Workaround pydot parsing of node-id & labels by encoding as HTML.
pydot library does not quote DOT-keywords anywhere (pydot#111).
Char
:
on node-names denote port/compass-points and break IDs (pydot#224).
Non-strings are not passed through quote_if_necessary by pydot.
NLs in tooltips of HTML-Table labels need substitution with the XML-entity.
HTML-Label attributes (
xmlattr=True
) need both html-escape & quote.
Attention
It does not correctly handle
ID:port:compass-point
format.
- graphtik.plot.is_nx_node_dependent(graph, nx_node)[source]¶
Return true if node’s edges are not subdoc only.
- graphtik.plot.legend(filename=None, show=None, jupyter_render: Optional[Mapping] = None, plotter: Optional[Plotter] = None)[source]¶
Generate a legend for all plots (see
Plottable.plot()
for args).
- Parameters
plotter – override the active plotter
show –
Deprecated since version v6.1.1: Merged with filename param (filename takes precedence).
See
Plotter.render_pydot()
for the rest arguments.
- graphtik.plot.make_data_value_tooltip(plot_args: PlotArgs)[source]¶
Called on datanodes, when solution exists.
- graphtik.plot.make_op_tooltip(plot_args: PlotArgs)[source]¶
the string-representation of an operation (name, needs, provides)
- graphtik.plot.make_overwrite_tooltip(plot_args: PlotArgs)[source]¶
Called on datanodes with multiple overwrite values.
- graphtik.plot.make_template(s)[source]¶
Makes dedented jinja2 templates supporting extra escape filters for Graphviz:
ee
Like default escape filter
e
, but Nones/empties evaluate to false. Needed because the default escape filter breaks the xmlattr filter with Nones.
eee
Escape for when writing inside HTML-strings. Collapses Nones/empties (unlike default
e
).hrefer
Dubious escape for when writing URLs inside Graphviz attributes. Does NOT collapse Nones/empties (like default
e
)ex
format exceptions
truncate
reversing truncate (keep tail) if truncate arg is true
sideffected
return the sideffected part of an sfxed or none
sfx_list
return the sfx_list part of an sfxed or none
jsonp
return the jsonp list of a dependency or none
- graphtik.plot.remerge(*containers, source_map: Optional[list] = None)[source]¶
Merge recursively dicts and extend lists with
boltons.iterutils.remap()
…screaming on type conflicts, i.e. a list needs a list, etc., unless one of them is None, which is ignored.
- Parameters
containers – a list of dicts or lists to merge; later ones take precedence (last-wins). If source_map is given, these must be 2-tuples of
(name: container)
.source_map –
If given, it must be a dictionary, and containers arg must be 2-tuples like
(name: container)
. The source_map will be populated with mappings between path and the name of the container it came from.
Warning
if source_map is given, the order of input dictionaries is NOT preserved in the results (important if your code relies on PY3.7 stable dictionaries).
- Returns
returns a new, merged top-level container.
Adapted from https://gist.github.com/mahmoud/db02d16ac89fa401b968 but for lists and dicts only, ignoring Nones and screams on incompatible types.
Discussion in: https://gist.github.com/pleasantone/c99671172d95c3c18ed90dc5435ddd57
Example
>>> defaults = {
...     'subdict': {
...         'as_is': 'hi',
...         'overridden_key1': 'value_from_defaults',
...         'overridden_key2': 2222,
...         'merged_list': ['hi', {'untouched_subdict': 'v1'}],
...     }
... }
>>> overrides = {
...     'subdict': {
...         'overridden_key1': 'overridden value',
...         'overridden_key2': 5555,
...         'merged_list': ['there'],
...     }
... }
>>> from graphtik.plot import remerge
>>> source_map = {}
>>> remerge(
...     ("defaults", defaults),
...     ("overrides", overrides),
...     source_map=source_map)
{'subdict': {'as_is': 'hi', 'overridden_key1': 'overridden value', 'merged_list': ['hi', {'untouched_subdict': 'v1'}, 'there'], 'overridden_key2': 5555}}
>>> source_map
{('subdict', 'as_is'): 'defaults', ('subdict', 'overridden_key1'): 'overrides', ('subdict', 'merged_list'): ['defaults', 'overrides'], ('subdict',): 'overrides', ('subdict', 'overridden_key2'): 'overrides'}
- graphtik.plot.save_plot_file_by_sha1(plottable: Plottable, dir_prefix: Path)[source]¶
Save plottable in a file-path generated from the sha1 of the dot.
- graphtik.plot.set_active_plotter(plotter: Plotter)[source]¶
Set the default instance to render plottables,
unless overridden with a plotter argument in
Plottable.plot()
.
- Parameters
plotter – the
plotter
instance to install
Module: config¶
Configurations for network execution, and utilities on them.
See also
methods plot.active_plotter_plugged()
, plot.set_active_plotter()
,
plot.get_active_plotter()
Plot configurations are not defined here, to avoid polluting the import space until they are actually needed.
Note
The context-manager functions XXX_plugged()
or XXX_enabled()
do NOT launch
their code blocks using contextvars.Context.run()
in a separate “context”,
so any changes to these or other context-vars will persist
(unless they are also done within such context-managers)
- graphtik.config.abort_run()[source]¶
Set the abort-run global flag, to halt all currently executing or future plans.
This global flag is reset when any
Pipeline.compute()
is executed, or manually, by calling reset_abort()
.
- graphtik.config.debug_enabled(enabled=True)[source]¶
Like
set_debug()
as a context-manager, resetting back to the old value.
See also
disclaimer about context-managers at the top of this
config
module.
- graphtik.config.evictions_skipped(enabled=True)[source]¶
Like
set_skip_evictions()
as a context-manager, resetting back to the old value.
See also
disclaimer about context-managers at the top of this
config
module.
- graphtik.config.execution_pool_plugged(pool: Optional[Pool])[source]¶
Like
set_execution_pool()
as a context-manager, resetting back to the old value.
See also
disclaimer about context-managers at the top of this
config
module.
- graphtik.config.get_execution_pool() Optional[Pool] [source]¶
(deprecated) Get the process-pool for parallel plan executions.
- graphtik.config.is_debug() Optional[bool] [source]¶
Return
set_debug()
or True if GRAPHTIK_DEBUG
is not one of 0 false off no
.
Affected behavior when the DEBUG flag is enabled:
on errors, plots the 1st errored solution/plan/pipeline/net (in that order) in an SVG file inside the temp-directory, and its path is logged in ERROR-level;
jetsam logs in ERROR (instead of in DEBUG) all annotations on all calls up the stack trace (logged from
graphtik.jetsam.err
logger);
FnOp.compute()
prints out full given-inputs (not just their keys);
net objects print more details recursively, like fields (not just op-names) and prune-comments;
plotted SVG diagrams include style-provenance as tooltips;
Sphinx extension also saves the original DOT file next to each image (see
graphtik_save_dot_files
).
Note
The default is controlled with
GRAPHTIK_DEBUG
environment variable.
Note that enabling this flag is different from enabling logging in DEBUG, since it affects all code (eg interactive printing in debugger session, exceptions, doctests), not just debug statements (also affected by this flag).
- Returns
a “reset” token (see
ContextVar.set()
)
- graphtik.config.is_marshal_tasks() Optional[bool] [source]¶
(deprecated) see
set_marshal_tasks()
- graphtik.config.operations_endured(enabled=True)[source]¶
Like
set_endure_operations()
as a context-manager, resetting back to the old value.
See also
disclaimer about context-managers at the top of this
config
module.
- graphtik.config.operations_reschedullled(enabled=True)[source]¶
Like
set_reschedule_operations()
as a context-manager, resetting back to the old value.
See also
disclaimer about context-managers at the top of this
config
module.
- graphtik.config.reset_abort()[source]¶
Reset the abort run global flag, to permit plan executions to proceed.
- graphtik.config.set_debug(enabled)[source]¶
Enable/disable debug-mode.
- Parameters
enabled –
None, False, string(0, false, off, no)
: Disabled
anything else: Enable DEBUG
see
is_debug()
- graphtik.config.set_endure_operations(enabled)[source]¶
Enable/disable globally endurance to keep executing even if some operations fail.
- Parameters
enable –
If
None
(default), respect the flag on each operation;
If true/false, force it for all operations.
- Returns
a “reset” token (see
ContextVar.set()
)
.
- graphtik.config.set_execution_pool(pool: Optional[Pool])[source]¶
(deprecated) Set the process-pool for parallel plan executions.
You may also have to call set_marshal_tasks() to resolve pickling issues.
- graphtik.config.set_layered_solution(enabled)[source]¶
Whether to store operation results into a separate solution layer.
- Parameters
enable – If false/true, it overrides any param given when executing a pipeline or a plan. If None (default), results are layered only if there are NO jsonp dependencies in the network.
- Returns
a “reset” token (see
ContextVar.set()
)
- graphtik.config.set_marshal_tasks(enabled)[source]¶
(deprecated) Enable/disable globally marshalling of parallel operations, …
inputs & outputs with
dill
, which might help for pickling problems.
- Parameters
enable –
If
None
(default), respect the respective flag on each operation;
If true/false, force it for all operations.
- Returns
a “reset” token (see
ContextVar.set()
)
- graphtik.config.set_parallel_tasks(enabled)[source]¶
Enable/disable globally parallel execution of operations.
- Parameters
enable –
If
None
(default), respect the respective flag on each operation;
If true/false, force it for all operations.
- Returns
a “reset” token (see
ContextVar.set()
)
- graphtik.config.set_reschedule_operations(enabled)[source]¶
Enable/disable globally rescheduling for operations returning only partial outputs.
- Parameters
enable –
If
None
(default), respect the flag on each operation;
If true/false, force it for all operations.
- Returns
a “reset” token (see
ContextVar.set()
)
.
- graphtik.config.set_skip_evictions(enabled)[source]¶
When true, globally disable evictions, keeping all intermediate solution values, …
regardless of asked outputs.
- Returns
a “reset” token (see
ContextVar.set()
)
- graphtik.config.solution_layered(enabled=True)[source]¶
Like
set_layered_solution()
as a context-manager, resetting back to the old value.
See also
disclaimer about context-managers at the top of this
config
module.
- graphtik.config.tasks_in_parallel(enabled=True)[source]¶
(deprecated) Like
set_parallel_tasks()
as a context-manager, resetting back to the old value.
See also
disclaimer about context-managers at the top of this
config
module.
- graphtik.config.tasks_marshalled(enabled=True)[source]¶
(deprecated) Like
set_marshal_tasks()
as a context-manager, resetting back to the old value.
See also
disclaimer about context-managers at the top of this
config
module.
Module: base¶
Generic utilities, exceptions and operation & plottable base classes.
- exception graphtik.base.AbortedException[source]¶
Raised from Network when
abort_run()
is called, and contains the solution with any values populated so far.
- exception graphtik.base.IncompleteExecutionError[source]¶
Reported when any endured/rescheduled operations were canceled.
The exception contains 3 arguments:
the causal errors and conditions (1st arg),
the list of collected exceptions (2nd arg), and
the solution instance (3rd argument), to interrogate for more.
Returned by
check_if_incomplete()
or raised by scream_if_incomplete()
.
- class graphtik.base.Operation[source]¶
An abstract class representing an action with
compute()
.
- abstract compute(named_inputs, outputs=None, recompute_from=None, *kw)[source]¶
Compute (optional) asked outputs for the given named_inputs.
It is called by
Network
. End-users should simply call the operation with named_inputs as kwargs.
- Parameters
named_inputs – the input values with which to feed the computation.
outputs – what results to compute, see
Pipeline.compute()
.recompute_from – recompute all downstream from those dependencies, see
Pipeline.compute()
.
- Returns list
Should return a list of values representing the results of running the feed-forward computation on
inputs
.
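The `compute()` contract described above (read the values listed in needs from named_inputs, return the provides values as a list) can be illustrated with a minimal self-contained stand-in. This is NOT graphtik's actual class, just a sketch of the pattern; the `ToyOperation` name and its constructor are hypothetical:

```python
class ToyOperation:
    """Illustrative stand-in for graphtik.base.Operation (not the real class)."""

    def __init__(self, name, needs, provides, fn):
        self.name, self.needs, self.provides, self.fn = name, needs, provides, fn

    def compute(self, named_inputs, outputs=None):
        # Only the values listed in `needs` are guaranteed to exist in the inputs.
        args = [named_inputs[n] for n in self.needs]
        result = self.fn(*args)
        # Per the contract above, return a list of values matching `provides` order.
        return [result] if len(self.provides) == 1 else list(result)

# A tiny "add" operation consuming `a` & `b` and producing `ab`.
op = ToyOperation("add", needs=["a", "b"], provides=["ab"], fn=lambda a, b: a + b)
```

A network would then zip the returned list with `provides` to populate the solution with the new values.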
- class graphtik.base.PlotArgs(plottable: Plottable = None, graph: nx.Graph = None, name: str = None, steps: Collection = None, inputs: Collection = None, outputs: Collection = None, solution: graphtik.planning.Solution = None, clusters: Mapping = None, plotter: graphtik.plot.Plotter = None, theme: graphtik.plot.Theme = None, dot: pydot.Dot = None, nx_item: Any = None, nx_attrs: dict = None, dot_item: Any = None, clustered: dict = None, jupyter_render: Mapping = None, filename: Union[str, bool, int] = None)[source]¶
All the args of a
Plottable.plot()
call; check this method for a more detailed explanation of its attributes.
- clone_or_merge_graph(base_graph) PlotArgs [source]¶
Overlay
graph
over base_graph, or clone base_graph, if no such attribute.
- Returns
the updated plot_args
- property clustered¶
Collect the actual clustered dot_nodes among the given nodes.
- property clusters¶
Either a mapping of node-names to dot(
.
)-separated cluster-names, or false/true to enable plotter’s default clustering of nodes based on their dot-separated name parts.
Note that if it’s None (default), the plotter will cluster based on node-names, BUT the Plan may replace the None with a dictionary with the “pruned” cluster (when its dag differs from network’s graph); to suppress the pruned-cluster, pass a truthy, NON-dictionary value.
- property dot¶
Where to add graphviz nodes & stuff.
- property dot_item¶
The pydot-node/edge created
- property filename¶
where to write image or show in a matplotlib window
- property graph¶
what to plot (or the “overlay” when calling
Plottable.plot()
)
- property inputs¶
the list of input names .
- property jupyter_render¶
jupyter configuration overrides
- property name¶
The name of the graph in the dot-file (important for cmaps).
- property nx_attrs¶
Attributes gotten from nx-graph for the given graph/node/edge. They are NOT a clone, so any modifications affect the nx graph.
- property outputs¶
the list of output names .
- property plottable¶
who is the caller
- property plotter¶
If given, overrides the active plotter.
- property steps¶
the list of execution plan steps.
- property theme¶
If given, overrides the plot theme the plotter will use. It can be any mapping, in which case it overrides the current theme.
- class graphtik.base.Plottable[source]¶
plottable capabilities and graph props for all major classes of the project.
Classes wishing to plot their graphs should inherit this and implement property
plot
to return a “partial” callable that somehow ends up callingplot.render_pydot()
with the graph or any other args bound appropriately. The purpose is to avoid copying this function & documentation around.
- find_op_by_name(name) Optional[Operation] [source]¶
Fetch the 1st operation named with the given name.
- find_ops(predicate) List[Operation] [source]¶
Scan operation nodes and fetch those satisfying predicate.
- Parameters
predicate – the node predicate is a 2-argument callable(op, node-data) that should return true for nodes to include.
- graph: networkx.Graph[source]¶
- plot(filename: Union[str, bool, int] = None, show=None, *, plotter: graphtik.plot.Plotter = None, theme: graphtik.plot.Theme = None, graph: networkx.Graph = None, name=None, steps=None, inputs=None, outputs=None, solution: graphtik.planning.Solution = None, clusters: Mapping = None, jupyter_render: Union[None, Mapping, str] = None) pydot.Dot [source]¶
Entry-point for plotting ready made operation graphs.
- Parameters
filename (str) –
Write a file or open a matplotlib window.
If it is a string or file, the diagram is written into the file-path
Common extensions are
.png .dot .jpg .jpeg .pdf .svg
call plot.supported_plot_formats()
for more.
If it IS True, opens the diagram in a matplotlib window (requires the matplotlib package to be installed).
If it equals -1, it mat-plots but does not open the window.
Otherwise, just return the
pydot.Dot
instance.
- seealso
plottable –
the plottable that ordered the plotting. Automatically set downstreams to one of:
op | pipeline | net | plan | solution | <missing>
- seealso
plotter –
the plotter to handle plotting; if none, the active plotter is used by default.
- seealso
theme –
Any plot theme or dictionary overrides; if none, the
Plotter.default_theme
of the active plotter is used.
- seealso
name –
if not given, the dot-lang graph is named “G”; it must be unique when referring to generated CMAPs. No need to quote it; it is handled by the plotter, downstream.
- seealso
graph (str) –
(optional) A
nx.Digraph
with overrides to merge with the graph provided by underlying plottables (translated by the active plotter).
It may contain graph, node & edge attributes for any usage, but these conventions apply:
'graphviz.xxx'
(graph/node/edge attributes)
Any “user-overrides” with this prefix are sent verbatim as Graphviz attributes.
Note
Remember to escape those values as Graphviz HTML-Like strings (use
plot.graphviz_html_string()
).
no_plot
(node/edge attribute)
element skipped from plotting (see “Examples:” section, below)
- seealso
inputs –
an optional name list, any nodes in there are plotted as a “house”
- seealso
outputs –
an optional name list, any nodes in there are plotted as an “inverted-house”
- seealso
solution –
an optional dict with values to annotate nodes, drawn “filled” (currently the content is not shown, but the node is drawn as “filled”). It extracts more info from a
Solution
instance, such as, if solution has anexecuted
attribute, operations contained in it are drawn as “filled”.
- seealso
clusters –
Either a mapping, or false/true to enable plotter’s default clustering of nodes based on their dot-separated name parts.
Note that if it’s None (default), the plotter will cluster based on node-names, BUT the Plan may replace the None with a dictionary with the “pruned” cluster (when its dag differs from network’s graph); to suppress the pruned-cluster, pass a truthy, NON-dictionary value.
Practically, when it is a:
dictionary of node-names –> dot(
.
)-separated cluster-names, it is respected, even if empty;
truthy: cluster based on dot(
.
)-separated node-name parts;
falsy: don’t cluster at all.
- seealso
jupyter_render –
a nested dictionary controlling the rendering of graph-plots in Jupyter cells, if None, defaults to
jupyter_render
; you may modify it in place and apply for all future calls (see Jupyter notebooks).
- seealso
show –
Deprecated since version v6.1.1: Merged with filename param (filename takes precedence).
- Returns
a
pydot.Dot
instance (for a reference to its similar API, see the pydot.Dot
entry at: https://pydotplus.readthedocs.io/reference.html#pydotplus.graphviz.Dot)
The
pydot.Dot
instance returned is rendered directly in Jupyter/IPython notebooks as SVG images (see Jupyter notebooks).
Note that the graph argument is absent - Each Plottable provides its own graph internally; use directly
render_pydot()
to provide a different graph.
NODES:
- oval
function
- egg
subgraph operation
- house
given input
- inversed-house
asked output
- polygon
given both as input & asked as output (what?)
- square
intermediate data, neither given nor asked.
- red frame
evict-instruction, to free up memory.
- filled
data node has a value in solution OR function has been executed.
- thick frame
function/data node in execution steps.
ARROWS
- solid black arrows
dependencies (source-data needed by target-operations, source-operations provide target-data)
- dashed black arrows
optional needs
- blue arrows
sideffect needs/provides
- wheat arrows
broken dependency (
provide
) during pruning
- green-dotted arrows
execution steps labeled in succession
To generate the legend, see
legend()
.
Examples:
>>> from graphtik import compose, operation
>>> from graphtik.modifier import optional
>>> from operator import add
>>> pipeline = compose("pipeline",
...     operation(name="add", needs=["a", "b1"], provides=["ab1"])(add),
...     operation(name="sub", needs=["a", optional("b2")], provides=["ab2"])(lambda a, b=1: a-b),
...     operation(name="abb", needs=["ab1", "ab2"], provides=["asked"])(add),
... )
>>> pipeline.plot(True);           # plot just the graph in a matplotlib window  # doctest: +SKIP
>>> inputs = {'a': 1, 'b1': 2}
>>> solution = pipeline(**inputs)  # now plots will include the execution-plan
The solution is also plottable:
>>> solution.plot('plot1.svg'); # doctest: +SKIP
or you may augment the pipeline with the requested inputs/outputs & solution:
>>> pipeline.plot('plot1.svg', inputs=inputs, outputs=['asked', 'b1'], solution=solution); # doctest: +SKIP
In any case you may get the pydot.Dot object (n.b. it is renderable in Jupyter as-is):
>>> dot = pipeline.plot(solution=solution);
>>> print(dot)
digraph pipeline {
fontname=italic;
label=<pipeline>;
node [fillcolor=white, style=filled];
<a> [fillcolor=wheat, fixedsize=shape, label=<<TABLE CELLBORDER="0" CELLSPACING="0" BORDER="0">
...
You may use the
PlotArgs.graph
overlay to skip certain nodes (or edges) from the plots:
>>> import networkx as nx
>>> g = nx.DiGraph()  # the overlay
>>> to_hide = pipeline.net.find_op_by_name("sub")
>>> g.add_node(to_hide, no_plot=True)
>>> dot = pipeline.plot(graph=g)
>>> assert "<sub>" not in str(dot), str(dot)
- abstract prepare_plot_args(plot_args: PlotArgs) PlotArgs [source]¶
Called by
plot()
to create the nx-graph and other plot-args, e.g. solution.
Clone the graph or merge it with the one in the plot_args (see
PlotArgs.clone_or_merge_graph()
).
For the rest of the args, prefer
PlotArgs.with_defaults()
over_replace()
, not to override user args.
- class graphtik.base.RenArgs(typ: str, op: Operation, name: str, parent: Pipeline = None)[source]¶
Arguments received by callbacks in
rename()
and operation nesting.- property name¶
Alias for field number 2
- property op¶
the operation currently being processed
- property parent¶
The parent
Pipeline
of the operation currently being processed. Has a value only when doing operation nesting from compose()
.
- class graphtik.base.Token(s)[source]¶
Guarantee equality, not(!) identity, across processes.
- hashid¶
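“Guarantee equality, not(!) identity, across processes” suggests a value-based equality plus a process-independent hash (the built-in `hash()` for strings is salted per interpreter). A minimal sketch of the idea, NOT graphtik's actual implementation (the `str`-subclass design and `sha1` digest are assumptions for illustration):

```python
import hashlib

class Token(str):
    """A str-subclass whose instances compare equal by value across processes."""

    def __new__(cls, s):
        # Wrap the name so the token prints distinctly from a plain string.
        return super().__new__(cls, f"<{s}>")

    @property
    def hashid(self):
        # A stable, process-independent digest, unlike the salted built-in hash().
        return hashlib.sha1(self.encode("utf-8")).hexdigest()

t1, t2 = Token("UNSET"), Token("UNSET")
```

Two tokens built from the same name are equal and share the same `hashid`, yet are distinct objects, so identity checks (`is`) still distinguish them.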
- graphtik.base.asdict(i, argname: str)[source]¶
Converts iterables-of-pairs or just a pair-tuple into a dict.
- Parameters
argname – Used in the exception raised when i is not an iterable.
- graphtik.base.aslist(i, argname, allowed_types=<class 'list'>)[source]¶
Convert iterables (except strings) or wrap a scalar value into a list.
- Parameters
i – None is converted into an empty list; empty allowed_types are returned as is.
argname – If a string, it’s used in the exception raised when i is not an iterable. ATTENTION: if None, any scalar (non-iterable) i is wrapped in a single-item list, and no exception is ever raised.
- graphtik.base.astuple(i, argname, allowed_types=<class 'tuple'>)[source]¶
Convert iterables (except strings) or wrap a scalar value into a tuple.
- Parameters
i – None is converted into an empty tuple; empty allowed_types are returned as is.
argname – If a string, it’s used in the exception raised when i is not an iterable. ATTENTION: if None, any scalar (non-iterable) i is wrapped in a single-item tuple, and no exception is ever raised.
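The conversion rules described for `aslist()`/`astuple()` can be sketched as follows; this is an illustrative approximation of the documented behavior, not graphtik's actual code (e.g. the exact exception message is made up):

```python
def aslist(i, argname, allowed_types=list):
    """Convert iterables (except strings) into a list, or wrap scalars."""
    if i is None:
        return []                      # None -> empty list
    if isinstance(i, allowed_types):
        return i                       # already the allowed type: returned as-is
    if isinstance(i, str):
        return [i]                     # strings are treated as scalars, not iterables
    try:
        return list(i)
    except TypeError:
        if argname is None:
            return [i]                 # silently wrap any scalar
        raise ValueError(f"{argname} must be an iterable, got: {i!r}")
```

`astuple()` would follow the same shape with `tuple` as the allowed type and wrapper.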
- graphtik.base.first_solid(*tristates, default=None)[source]¶
Utility combining multiple tri-state booleans.
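“Combining multiple tri-state booleans” presumably means the first value that is not None (“solid”) wins, falling back to default; a hedged sketch of that semantic:

```python
def first_solid(*tristates, default=None):
    """Return the first non-None ("solid") value among tri-state booleans."""
    for value in tristates:
        if value is not None:
            return value          # False counts as solid, only None is "unset"
    return default
```

This matches the config-module pattern above, where a per-operation flag of None defers to a global setting.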
- graphtik.base.func_name(fn, default=Ellipsis, mod=None, fqdn=None, human=None, partials=None) Optional[str] [source]¶
FQDN of fn, descending into partials to print their args.
- Parameters
default – What to return if it fails; by default it raises.
mod – when true, prepend module like
module.name.fn_name
fqdn – when true, use
__qualname__
(instead of__name__
) which differs mostly on methods, where it contains class(es), and locals, respectively (PEP 3155). Sphinx uses fqdn=True for generating IDs.human – when true, explain built-ins, and assume
partials=True
(if that was None)partials – when true (or omitted & human true), partials denote their args like
fn({"a": 1}, ...)
- Returns
a (possibly dot-separated) string, or default (unless this is
...
).
- Raises
Only if default is
...
, otherwise, errors are debug-logged.
Examples
>>> func_name(func_name)
'func_name'
>>> func_name(func_name, mod=1)
'graphtik.base.func_name'
>>> func_name(func_name.__format__, fqdn=0)
'__format__'
>>> func_name(func_name.__format__, fqdn=1)
'function.__format__'
Even functions defined in docstrings are reported:
>>> def f():
...     def inner():
...         pass
...     return inner
>>> func_name(f, mod=1, fqdn=1)
'graphtik.base.f'
>>> func_name(f(), fqdn=1)
'f.<locals>.inner'
On failures, arg default controls the outcomes:
TBD
Module: jetsam¶
jetsam utility for annotating exceptions from locals()
like PEP 678
PY3.11 exception-notes.
- class graphtik.jetsam.Jetsam(*args, **kwargs)[source]¶
The jetsam is a dict with items accessed also as attributes.
From https://stackoverflow.com/a/14620633/548792
- log_n_plot(plot=None) Path [source]¶
Log collected items, and plot 1st plottable in a temp-file, if DEBUG flag.
- Parameters
plot – override DEBUG-flag if given (true, plots, false not)
- Returns
the name of the temp-file, also ERROR-logged along with the rest of the jetsam
- graphtik.jetsam.save_jetsam(ex, locs, *salvage_vars: str, annotation='jetsam', **salvage_mappings)[source]¶
Annotate exception with salvaged values from locals(), log, (if DEBUG flag) plot.
- Parameters
ex – the exception to annotate
locs –
locals()
from the context-manager’s block containing vars to be salvaged in case of exceptionATTENTION: wrapped function must finally call
locals()
, because the locals dictionary only reflects local-var changes after the call.
annotation – the name of the attribute to attach on the exception
salvage_vars – local variable names to save as is in the salvaged annotations dictionary.
salvage_mappings – a mapping of destination-annotation-keys –> source-locals-keys; if a source is callable, the value to salvage is retrieved by calling
value(locs)
. They take precedence over salvage_vars.
- Returns
the
Jetsam
annotation, also attached on the exception- Raises
any exception raised by the wrapped function, annotated with values assigned as attributes on this context-manager
Any attributes attached on this manager are attached as a new dict on the raised exception as new
jetsam
attribute with a dict as value.If the exception is already annotated, any new items are inserted, but existing ones are preserved.
If DEBUG flag is enabled, plots the 1st found errored in order solution/plan/pipeline/net, and log its path.
Example:
Call it with managed-block’s
locals()
and tell which of them to salvage in case of errors:
>>> try:
...     a = 1
...     b = 2
...     raise Exception("trouble!")
... except Exception as ex:
...     save_jetsam(ex, locals(), "a", b="salvaged_b", c_var="c")
...     raise
Traceback (most recent call last):
Exception: trouble!
And then from a REPL:
>>> import sys
>>> sys.exc_info()[1].jetsam  # doctest: +SKIP
{'a': 1, 'salvaged_b': 2, "c_var": None}
Note
In order not to obfuscate the landing position of post-mortem debuggers in the case of errors, use the
try-finally
with
ok
flag pattern:
>>> ok = False
>>> try:
...     pass  # do risky stuff
...     ok = True  # last statement in the try-body.
... except Exception as ex:
...     if not ok:
...         ex = sys.exc_info()[1]
...         save_jetsam(...)
**Reason:**
Graphs may become arbitrary deep. Debugging such graphs is notoriously hard.
The purpose is not to require a debugger-session to inspect the root-causes (without precluding one).
Module: jsonpointer¶
Utility for json pointer path modifier
Copied from pypi/pandalone.
- exception graphtik.jsonpointer.ResolveError[source]¶
A
KeyError
raised when a json-pointer does not resolve.
- property part¶
the part where the resolution broke
- property path¶
the json-pointer that failed to resolve
- graphtik.jsonpointer.collection_popper(doc: Collection, part, do_pop)[source]¶
Resolve part in doc, or pop it with default if do_pop.
- graphtik.jsonpointer.collection_scouter(doc: Doc, key, mother, overwrite, concat_axis) Tuple[Any, Optional[Doc]] [source]¶
Get item key from doc collection, or create a new one from mother.
- Parameters
mother – factory producing the child containers to extend missing steps, or the “child” value (when overwrite is true).
- Returns
a 2-tuple (child, doc) where doc is not None if it needs to be replaced in its parent container (e.g. due to df-concat with value).
- graphtik.jsonpointer.contains_path(doc: Doc, path: Union[str, Iterable[str]], root: Doc = '%%UNSET%%', descend_objects=True) bool [source]¶
Test if doc has a value for json-pointer path by calling
resolve_path()
.
- graphtik.jsonpointer.escape_jsonpointer_part(part: str) str [source]¶
convert path-part according to the json-pointer standard
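Per the json-pointer standard (RFC 6901), `~` escapes to `~0` and `/` to `~1`, and the replacement order matters in each direction; a sketch of both helpers (illustrative, matching the spec rather than graphtik's exact source):

```python
def escape_jsonpointer_part(part: str) -> str:
    """Escape a single path-part per RFC 6901 ('~' -> '~0', '/' -> '~1')."""
    # '~' must be escaped first, or the '~1' produced for '/' would be corrupted.
    return part.replace("~", "~0").replace("/", "~1")

def unescape_jsonpointer_part(part: str) -> str:
    """Inverse of escape: '~1' -> '/' first, then '~0' -> '~'."""
    return part.replace("~1", "/").replace("~0", "~")
```

Escaping in the wrong order would turn `"~"` into `"~01"` instead of `"~0"` on round-trip, which is why each function fixes its replacement order.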
- graphtik.jsonpointer.json_pointer(parts: Sequence[str]) str [source]¶
Escape & join parts into a jsonpointer path (inverse of
jsonp_path()
).Examples:
>>> json_pointer(["a", "b"])
'a/b'
>>> json_pointer(['', "a", "b"])
'/a/b'
>>> json_pointer([1, "a", 2])
'1/a/2'
>>> json_pointer([""])
''
>>> json_pointer(["a", ""])
''
>>> json_pointer(["", "a", "", "b"])
'/b'
>>> json_pointer([])
''
>>> json_pointer(["/", "~"])
'~1/~0'
- graphtik.jsonpointer.jsonp_path(jsonpointer: str) List[str] [source]¶
Generates the path parts according to jsonpointer spec.
- Parameters
path – a path to resolve within document
- Returns
The parts of the path (as a generator), without converting any step to int, and None if None (the 1st step of absolute-paths is always
''
)
In order to support relative & absolute paths along with a sensible
set_path_value()
, it departs from the standard in these aspects:A double slash or a slash at the end of the path restarts from the root.
- Author
Julian Berman, ankostis
Examples:
>>> jsonp_path('a')
['a']
>>> jsonp_path('a/')
['']
>>> jsonp_path('a/b')
['a', 'b']
>>> jsonp_path('/a')
['', 'a']
>>> jsonp_path('/')
['']
>>> jsonp_path('')
[]
>>> jsonp_path('a/b//c')
['', 'c']
>>> jsonp_path('a//b////c')
['', 'c']
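The restart-from-root rule shown in the examples above (a double slash, or a slash at the end of the path, restarts from the root) can be sketched with a simplified reimplementation; this is illustrative only and skips spec corner-cases graphtik may handle differently:

```python
def jsonp_path(jsonpointer: str):
    """Split a json-pointer into parts, restarting from root on empty parts."""
    if not jsonpointer:
        return []
    # Unescape each part per RFC 6901 ('~1' -> '/', then '~0' -> '~').
    parts = [p.replace("~1", "/").replace("~0", "~")
             for p in jsonpointer.split("/")]
    # An empty part AFTER the first position means "restart from root":
    # keep only a root-marker ('') plus whatever follows the LAST empty part.
    for i in range(len(parts) - 1, 0, -1):
        if parts[i] == "":
            return [""] + parts[i + 1:]
    return parts
```

Note how `'a/b//c'` collapses to `['', 'c']`: everything before the last empty part is discarded, and the leading `''` marks an absolute path.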
- graphtik.jsonpointer.list_popper(doc: MutableSequence, part, do_pop)[source]¶
Call
collection_popper()
with integer part.
- graphtik.jsonpointer.list_scouter(doc: Doc, idx, mother, overwrite) Tuple[Any, Optional[Doc]] [source]¶
Get doc list item by (int) idx, or create a new one from mother.
- Parameters
mother – factory producing the child containers to extend missing steps, or the “child” value (when overwrite is true).
- Returns
a 2-tuple (child,
None
)
NOTE: must come after collection-scouter due to special
-
index collision.
- graphtik.jsonpointer.object_popper(doc: Collection, part, do_pop)[source]¶
Resolve part in doc attributes, or
delattr
it, returning its value or default.
- graphtik.jsonpointer.object_scouter(doc: Doc, attr, mother, overwrite) Tuple[Any, Optional[Doc]] [source]¶
Get attribute attr in doc object, or create a new one from mother.
- Parameters
mother – factory producing the child containers to extend missing steps, or the “child” value (when overwrite is true).
- Returns
a 2-tuple (child,
None
)
- graphtik.jsonpointer.pop_path(doc: Doc, path: Union[str, Iterable[str]], default='%%UNSET%%', root: Doc = '%%UNSET%%', descend_objects=True)[source]¶
Delete and return the item referenced by json-pointer path from the nested doc .
- Parameters
doc – the current document to start searching path (which may be different than root)
path –
An absolute or relative json-pointer expression to resolve within doc document (or just the unescaped steps).
Attention
Relative paths DO NOT support the json-pointer extension https://tools.ietf.org/id/draft-handrews-relative-json-pointer-00.html
default – the value to return if path does not resolve; by default, it raises.
root – From where to start resolving absolute paths or double-slashes(
//
), or final slashes. IfNone
, only relative paths allowed; by default, the given doc is assumed as root (so absolute paths are also accepted).descend_objects – If true, a last ditch effort is made for each part, whether it matches the name of an attribute of the parent item.
- Returns
the deleted item in doc, or default if given and path didn’t exist
- Raises
ResolveError – if path cannot resolve and no default given
ValueError – if path was an absolute path and a
None
root had been given.
See
resolve_path()
for departures from the json-pointer standard.
Examples:
>>> dt = {
...     'pi': 3.14,
...     'foo': 'bar',
...     'df': pd.DataFrame(np.ones((3,2)), columns=list('VN')),
...     'sub': {
...         'sr': pd.Series({'abc': 'def'}),
...     }
... }
>>> resolve_path(dt, '/pi', default=UNSET)
3.14
>>> resolve_path(dt, 'df/V')
0    1.0
1    1.0
2    1.0
Name: V, dtype: float64
>>> resolve_path(dt, '/pi/BAD', 'Hi!')
'Hi!'
- Author
Julian Berman, ankostis
- graphtik.jsonpointer.prepend_parts(prefix_parts: Sequence[str], parts: Sequence[str]) Sequence[str] [source]¶
Prepend prefix_parts before given parts (unless they are rooted).
Both parts & prefix_parts must have been produced by
json_path()
so that any root(""
) must come first, and must not be empty (except prefix-parts).
Examples:
>>> prepend_parts(["prefix"], ["b"])
['prefix', 'b']
>>> prepend_parts(("", "prefix"), ["b"])
['', 'prefix', 'b']
>>> prepend_parts(["prefix ignored due to rooted"], ("", "b"))
('', 'b')
>>> prepend_parts([], ["b"])
['b']
>>> prepend_parts(["prefix irrelevant"], [])
Traceback (most recent call last):
IndexError: list index out of range
- graphtik.jsonpointer.resolve_path(doc: Doc, path: Union[str, Iterable[str]], default='%%UNSET%%', root: Doc = '%%UNSET%%', descend_objects=True)[source]¶
Resolve roughly like a json-pointer path within the referenced doc.
- Parameters
doc – the current document to start searching path (which may be different than root)
path –
An absolute or relative json-pointer expression to resolve within doc document (or just the unescaped steps).
Attention
Relative paths DO NOT support the json-pointer extension https://tools.ietf.org/id/draft-handrews-relative-json-pointer-00.html
default – the value to return if path does not resolve; by default, it raises.
root – From where to start resolving absolute paths or double-slashes(
//
), or final slashes. IfNone
, only relative paths allowed; by default, the given doc is assumed as root (so absolute paths are also accepted).descend_objects – If true, a last ditch effort is made for each part, whether it matches the name of an attribute of the parent item.
- Returns
the resolved doc-item
- Raises
ResolveError – if path cannot resolve and no default given
ValueError – if path was an absolute path and a
None
root had been given.
In order to support relative & absolute paths along with a sensible
set_path_value()
, it departs from the standard in these aspects:Supports also relative paths (but not the official extension).
For arrays, it tries 1st as an integer, and then falls back to normal indexing (useful when accessing pandas).
A
/
path does not bring the value of the empty (‘’) key but the whole document (aka the “root”).
A double slash or a slash at the end of the path restarts from the root.
Examples:
>>> dt = {
...     'pi': 3.14,
...     'foo': 'bar',
...     'df': pd.DataFrame(np.ones((3,2)), columns=list('VN')),
...     'sub': {
...         'sr': pd.Series({'abc': 'def'}),
...     }
... }
>>> resolve_path(dt, '/pi')
3.14
>>> resolve_path(dt, 'df/V')
0    1.0
1    1.0
2    1.0
Name: V, dtype: float64
>>> resolve_path(dt, '/pi/BAD')
Traceback (most recent call last):
graphtik.jsonpointer.ResolveError: Failed resolving step (#2) "BAD" of path '/pi/BAD'. Check debug logs.
>>> resolve_path(dt, '/pi/BAD', 'Hi!')
'Hi!'
- Author
Julian Berman, ankostis
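The core resolution loop can be sketched over plain dicts and lists, showing the integer-fallback for arrays and the restart-from-root behavior; this minimal version deliberately omits pandas support, attribute descent, and part-unescaping subtleties, and its error message is made up:

```python
_UNSET = "%%UNSET%%"  # sentinel mirroring the module's default marker

def resolve_path(doc, path, default=_UNSET, root=_UNSET):
    """Minimal json-pointer-ish resolver over dicts/lists (illustrative only)."""
    root = doc if root is _UNSET else root
    parts = path.split("/") if isinstance(path, str) else list(path)
    item = doc
    try:
        for part in parts:
            if part == "":
                item = root  # absolute step / double-slash restarts from root
                continue
            part = part.replace("~1", "/").replace("~0", "~")
            if isinstance(item, (list, tuple)):
                item = item[int(part)]  # arrays: try integer indexing first
            else:
                item = item[part]
        return item
    except (KeyError, IndexError, TypeError, ValueError):
        if default is _UNSET:
            raise KeyError(f"Failed resolving path {path!r}")
        return default
```

With a nested dict, `/pi` resolves from the root, `sub/sr/1` indexes a list with an integer step, and a bad path falls back to the given default instead of raising.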
- graphtik.jsonpointer.set_path_value(doc: ~graphtik.jsonpointer.Doc, path: ~typing.Union[str, ~typing.Iterable[str]], value, container_factory=<class 'dict'>, root: ~graphtik.jsonpointer.Doc = '%%UNSET%%', descend_objects=True, concat_axis: ~typing.Optional[int] = None)[source]¶
Set value into a jsonp path within the referenced doc.
Special treatment (i.e. concat) if a DataFrame must be inserted into a DataFrame, with steps
.
(vertical) and-
(horizontal) denoting the concatenation axis.
- Parameters
doc – the document to extend & insert value
path –
An absolute or relative json-pointer expression to resolve within doc document (or just the unescaped steps).
For sequences (arrays), it supports the special index dash(
-
) char, to refer to the position beyond the last item, as per the spec, BUT it does not raise; it always adds a new item.
Attention
Relative paths DO NOT support the json-pointer extension https://tools.ietf.org/id/draft-handrews-relative-json-pointer-00.html
container_factory – a factory producing the container to extend missing steps (usually a mapping or a sequence).
root – From where to start resolving absolute paths, double-slashes(
//
) or final slashes. If
None
, only relative paths are allowed; by default, the given doc is assumed as root (so absolute paths are also accepted).
descend_objects – If true, a last-ditch effort is made for each part, checking whether it matches the name of an attribute of the parent item.
concat_axis – if 0 or 1, applies pandas concatenation vertically or horizontally, by clipping the last step when traversing it, when doc & value are both Pandas objects.
- Raises
if the jsonpointer is empty, missing, or has invalid content
the given doc/root may have changed (e.g. due to being concat-ed with value)
See
resolve_path()
for departures from the json-pointer standard
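The container-creating walk and the always-appending '-' step described above can be sketched as follows — `set_sketch` is a hypothetical toy helper for illustration only; the real set_path_value() additionally handles roots, escaping, attribute descent and pandas concatenation:

```python
# Hypothetical toy setter: walks the path, creating missing containers
# with `container_factory`, and treats a final '-' step on a list as
# "append past the last item" -- always adding, never raising.
def set_sketch(doc, path, value, container_factory=dict):
    steps = path.strip("/").split("/")
    obj = doc
    for step in steps[:-1]:
        if isinstance(obj, list):
            obj = obj[int(step)]                             # descend lists by index
        else:
            obj = obj.setdefault(step, container_factory())  # create missing containers
    last = steps[-1]
    if isinstance(obj, list):
        if last == "-":
            obj.append(value)       # '-' always appends a new item
        else:
            obj[int(last)] = value
    else:
        obj[last] = value

doc = {}
set_sketch(doc, "/a/b", 1)      # doc becomes {'a': {'b': 1}}
doc["lst"] = []
set_sketch(doc, "/lst/-", "x")  # doc['lst'] becomes ['x']
```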
- graphtik.jsonpointer.unescape_jsonpointer_part(part: str) str [source]¶
Convert a path-part according to the json-pointer standard.
- graphtik.jsonpointer.update_paths(doc: ~graphtik.jsonpointer.Doc, paths_vals: ~typing.Collection[~typing.Tuple[str, ~typing.Any]], container_factory=<class 'dict'>, root: ~graphtik.jsonpointer.Doc = '%%UNSET%%', descend_objects=True, concat_axis: ~typing.Optional[int] = None) None [source]¶
Mass-update paths_vals (jsonp, value) pairs into doc.
Groups jsonp-keys by nesting level, to optimize.
- Parameters
concat_axis – None, 0 or 1, see
set_path_value()
.- Returns
the updated doc (if it was a dataframe and
pd.concat
needed)
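The grouping optimization mentioned above (“Groups jsonp-keys by nesting level”) can be sketched like this — `update_sketch` is a hypothetical toy helper, not graphtik's code: pairs sharing a parent path resolve (or create) that parent container only once:

```python
from itertools import groupby

# Hypothetical toy mass-updater: (path, value) pairs are grouped by their
# parent path, so each parent container is resolved/created only once.
def update_sketch(doc, paths_vals, container_factory=dict):
    def split_parent(pv):
        steps = pv[0].strip("/").split("/")
        return tuple(steps[:-1]), steps[-1], pv[1]

    triples = sorted((split_parent(pv) for pv in paths_vals),
                     key=lambda t: t[0])
    for parent, group in groupby(triples, key=lambda t: t[0]):
        obj = doc
        for step in parent:                               # resolve the parent once
            obj = obj.setdefault(step, container_factory())
        for _, leaf, value in group:                      # then set all its leaves
            obj[leaf] = value

doc = {}
update_sketch(doc, [("/a/x", 1), ("/a/y", 2), ("/b", 3)])
# doc == {'a': {'x': 1, 'y': 2}, 'b': 3}
```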
Package: sphinxext¶
Extends Sphinx with graphtik
directive for plotting from doctest code.
- class graphtik.sphinxext.DocFilesPurgatory[source]¶
Keeps 2-way associations of docs <--> abs-files, to purge them.
- class graphtik.sphinxext.GraphtikDoctestDirective(name, arguments, options, content, lineno, content_offset, block_text, state, state_machine)[source]¶
Embeds plots from doctest code (see
graphtik
).- option_spec = {'align': <function Figure.align>, 'alt': <function unchanged>, 'caption': <function unchanged>, 'class': <function class_option>, 'figclass': <function class_option>, 'figwidth': <function Figure.figwidth_value>, 'graph-format': <function _valid_format_option>, 'graphvar': <function unchanged_required>, 'height': <function length_or_unitless>, 'hide': <function flag>, 'name': <function unchanged>, 'no-trim-doctest-flags': <function flag>, 'options': <function unchanged>, 'pyversion': <function unchanged_required>, 'scale': <function percentage>, 'skipif': <function unchanged_required>, 'target': <function unchanged_required>, 'trim-doctest-flags': <function flag>, 'width': <function length_or_percentage_or_unitless>, 'zoomable': <function _tristate_bool_option>, 'zoomable-opts': functools.partial(<function _valid_json>, 'zoomable-opts')}¶
Mapping of option names to validator functions.
- class graphtik.sphinxext.GraphtikTestoutputDirective(name, arguments, options, content, lineno, content_offset, block_text, state, state_machine)[source]¶
Like
graphtik
directive, but emulates doctesttestoutput
blocks.- option_spec = {'align': <function Figure.align>, 'alt': <function unchanged>, 'caption': <function unchanged>, 'class': <function class_option>, 'figclass': <function class_option>, 'figwidth': <function Figure.figwidth_value>, 'graph-format': <function _valid_format_option>, 'graphvar': <function unchanged_required>, 'height': <function length_or_unitless>, 'hide': <function flag>, 'name': <function unchanged>, 'no-trim-doctest-flags': <function flag>, 'options': <function unchanged>, 'pyversion': <function unchanged_required>, 'scale': <function percentage>, 'skipif': <function unchanged_required>, 'target': <function unchanged_required>, 'trim-doctest-flags': <function flag>, 'width': <function length_or_percentage_or_unitless>, 'zoomable': <function _tristate_bool_option>, 'zoomable-opts': functools.partial(<function _valid_json>, 'zoomable-opts')}¶
Mapping of option names to validator functions.
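For orientation, here is a minimal, hypothetical usage of the directive in an .rst page — the :graphvar: and :caption: option names come from the option_spec mapping above, while the pipeline in the doctest body is invented for illustration:

```rst
.. graphtik::
   :graphvar: pipeline
   :caption: A tiny one-operation pipeline, plotted from doctest code.

   >>> from graphtik import compose, operation
   >>> pipeline = compose(
   ...     "tiny",
   ...     operation(name="add", needs=["a", "b"], provides=["ab"])(lambda a, b: a + b),
   ... )
```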
Graphtik Changelog¶
Table of Contents
v10.5.0 (25 Apr 2023, @ankostis): REVIVE project, Bump DEPS
v10.4.0 (9 Oct 2020, @ankostis): CWD, callbacks non-marshalled, preserve concat index-names
v10.3.0 (21 Sep 2020, @ankostis): CONCAT pandas, Hierarchical overwrites, implicit(), post-cb
v10.2.0 (16 Sep 2020, @ankostis): RECOMPUTE, pre-callback, drop op_xxx, ops-eq-op.name, drop NULL_OP
v10.1.0 (5 Aug 2020, @ankostis): rename return-dict outs; step number badges
v10.0.0 (19 Jul 2020, @ankostis): Implicits; modify(); auto-name pipelines; plot data as overspilled
v9.3.0 (8 Jul 2020, @ankostis): Sphinx AutoDocumenter; fix plot sfx-nodes
v9.1.0 (4 Jul 2020, @ankostis): Bugfix, panda-polite, privatize modifier fields
v9.0.0 (30 Jun 2020, @ankostis): JSONP; net, evictions & sfxed fixes; conveyor fn; rename modules
v8.4.0 (15 May 2020, @ankostis): subclass-able Op, plot edges from south-->north of nodes
v8.3.1 (14 May 2020, @ankostis): plot edges from south-->north of nodes
v8.3.0 (12 May 2020, @ankostis): mapped-->keyword, drop sol-finalize
v8.2.0 (11 May 2020, @ankostis): custom Solutions, Task-context
v8.1.0 (11 May 2020, @ankostis): drop last plan, Rename/Nest, Netop-->Pipeline, purify modules
v7.1.2 (6 May 2020, @ankostis): minor reschedule fixes and refactoring
v7.1.0 (4 May 2020, @ankostis): Cancelable sideffects, theme-ize & expand everything
v7.0.0 (28 Apr 2020, @ankostis): In-solution sideffects, unified OpBuilder, plot badges
v6.2.0 (19 Apr 2020, @ankostis): plotting fixes & more styles, net find util methods
v5.2.2 (03 Mar 2020, @ankostis): stuck in PARALLEL, fix Impossible Outs, plot quoting, legend node
v5.2.1 (28 Feb 2020, @ankostis): fix plan cache on skip-evictions, PY3.8 TCs, docs
v5.2.0 (27 Feb 2020, @ankostis): Map needs inputs --> args, SPELLCHECK
v5.1.0 (22 Jan 2020, @ankostis): accept named-tuples/objects provides
v4.4.0 (21 Dec 2019, @ankostis): RESCHEDULE for PARTIAL Outputs, on a per op basis
v4.1.0 (13 Dec 2019, @ankostis): ChainMap Solution for Rewrites, stable TOPOLOGICAL sort
v4.0.0 (11 Dec 2019, @ankostis): NESTED merge, revert v3.x Unvarying, immutable OPs, “color” nodes
v3.0.0 (2 Dec 2019, @ankostis): UNVARYING NetOperations, narrowed, API refact
v2.3.0 (24 Nov 2019, @ankostis): Zoomable SVGs & more op jobs
v2.2.0 (20 Nov 2019, @ankostis): enhance OPERATIONS & restruct their modules
v2.1.0 (20 Oct 2019, @ankostis): DROP BW-compatible, Restruct modules/API, Plan perfect evictions
v2.0.0b1 (15 Oct 2019, @ankostis): Rebranded as Graphtik for Python 3.6+
v1.3.0 (Oct 2019, @ankostis): NEVER RELEASED: new DAG solver, better plotting & “sideffect”
1.0.3 (Jan 31, 2017, @huyng): Make plotting dependencies optional
1.0.2 (Sep 29, 2016, @pumpikano): Merge pull request yahoo#5 from yahoo/remove-packaging-dep
TODOs¶
- TODOs in the wiki
Changelog¶
- GitHub Releases
v11.0.0.dev0 (?? Apr 2023, @ankostis): AUTOGRAPHED¶
…
v10.5.0 (25 Apr 2023, @ankostis): REVIVE project, Bump DEPS¶
Fix all TCs and warnings for recent dependencies.
FIX(SITE): bump to latest sphinx:
FIX(TCs): bumped jinja2, added MarkupSafe and unpinned pytest so that all TCs run OK again with latest deps.
fix(sphinxext.TC): graphviz probably generates slightly different captions in SVGs.
fix: version-parsing in docs/conf.py neglected since v1.0 and manifested now as erroneously parsing pep440-version id - simplified it a lot; stop using forgotten packaging dep.
fix:
sphinx-extlinks >= 4
plugin demands%s
also in caption, or else pytest fails and a warning is issued on site generation.
fix(site): docutils-0.17+ broke long substitutions like
|version|
; fixed it with adocutils.conf
file (instead of resorting to pinning).
fix(build): unpin
black==20.8b1
since it’s been years sinceblack
was released andpip install -e[all]
was failing when trying to build end-of-lifedtyped-ast
e.g. on Python 3.11.
style: black & isort imports all over (with a black profile for the latter).
DOC: various nitpicks and diagram beauties.
DOC: add section about diffs with schedula project.
plot: various minor improvements.
ENH(plot) add RC-Shape badges in data-nodes
CHORE: TravisCI site lost :-( moved to GitHub Actions.
v10.4.0 (9 Oct 2020, @ankostis): CWD, callbacks non-marshalled, preserve concat index-names¶
FEAT(compose): a current-working-document given when defining operation prefixes its non-root dependencies as jsonp expressions.
feat(plot): color as “pink” any
None
s in the results, to facilitate identification of operations returning nothing, by mistake, or non-produced implicits. Include “shape” when printing vectors (np-arrays and data-frames).refact/feat(exe): the argument to callbacks now contains the results; replace
OpCb
class with pre-existing_OpTask
(now publicized). Callbacks are now called from solution context, before marshalling.
ENH(solution): preserve index/column names when concatenating pandas (workaround https://github.com/pandas-dev/pandas/issues/13475, to be fixed with pandas-v1.2).
feat:
sol.update()
supports jsonps, and applies them grouped, to avoid resolving paths repeatedly.
doc(api): add forgotten jsonpointer module; updated module-dependencies diagram
v10.3.0 (21 Sep 2020, @ankostis): CONCAT pandas, Hierarchical overwrites, implicit(), post-cb¶
FEAT(solution+jsonp): can extend pandas in place (dataframes & series) horizontally/vertically with pandas concatenation. Useful when working with Pandas advanced indexing; or else, sideffecteds are needed to break read-update cycles on dataframes.
fix(jsonp):
set_path_value()
should have failed to modify object attributes (but not mass-updated, utilized by accessors).
FEAT/fix(solution): overwrites work properly for non-layered solutions.
refact: dropped
_layers
solution-attribute, role passed to (existing)executed
.
FEAT(execution): support also post-callbacks.
FEAT/DROP(modifier): added x3 new modifiers,
vcat()
andhcat()
, (and respective accessors),implicit()
– dropped never-usedaccessor
modifier.FEAT: parse string explicitly passed in
jsonp=...
argument in modifiers.feat(modifier+fnop): keep just the last part of a keyword+jsonp dependency, to save an explicit conversion (jsonps are invalid as python-identifiers).
break(modifier+fnop): forbid implicit sfxed – hard to find a solid use-case; tokens would be preferable in that case.
fix: forbid aliasing implicits – they wouldn’t work anyway.
enh(compose): check for node type collisions (i.e. a dependency name clashing with some operation name).
v10.2.1 (18 Sep 2020, @ankostis): plot sol bugfix¶
FIX(PLOT): passing simple dictionaries into
plot(solution=...)
was crashing.
enh(plot): use also a different label (not only the format) to distinguish sfx/sfxed in plots.
v10.2.0 (16 Sep 2020, @ankostis): RECOMPUTE, pre-callback, drop op_xxx, ops-eq-op.name, drop NULL_OP¶
Should have been a “major” release due to x2 API-BREAKs, but they are not that important.
FEAT(pipeline+execution): add term
pre_callback
to be invoked prior to computing each operation (seepre_callback
arg inPipeline.compute()
).FEAT(pipeline+plan): can recompute modified solutions, partial or complete – The
recompute_from=<dep / list-of-deps>
argument has been added toPipeline.compute()
,Pipeline.compile()
&Network.compile()
methods.REFACT/break(OP): replace badly-specified public attributes
op_needs
&op_provides
with privateFnOp._user_needs
&FnOp._user_provides
– now it must be easier to inheritOperation
anew (but) UNTESTED :-().
refact: don’t crash on operations lacking
rescheduled
,marshalled
etc attributes.
ENH(OP):
Operation.name
andname
string compare equal – that is, dictionaries of operations, such asSolution.executed
, can be indexed with their names (note, they didn’t equal by accident).REFACT: move
FnOp.__hash__()/__eq__()
up, to Operation class.
FEAT/break(pipeline): replace
NULL_OP
operation a newcompose(excludes=..)
argument, in order to delete existing operations when merging pipelines.FIX(PLAN): compile cache was ignoring(!) the state of the eviction flag.
FIX(execution): solution
copy()
was crashing (for 9 months till now).ENH(plot): make all nodes “filled” to facilitate hovering for tooltips.
fix(plot): “overwrite” tooltip was written “overridden”.
fix(plan): bug when printing a list of “unsolvable graph” error-message.
FIX(TEST):
exemethod
fixture’sexe_method
was always empty when interrogated for deciding “xfail”s.enh(build): pin
black
version, changes in format affect commits.doc(parallel): Deprecate(!), but just in docs, in favor of always producing a list of “parallelizable batches”, to fed to 3rdp parallelizing executors.
doc(execution+fnop): Mark
execution.task_context
as unstable API, in favor of supporting a specially-named function argument to receive the same instances.doc(site+doctests): use greek letters in algebraic formulas AND dependencies.
v10.1.0 (5 Aug 2020, @ankostis): rename return-dict outs; step number badges¶
FEAT(op): keyword()
modifier can rename outputs of a returns dictionary
function.
fix: rescheduled function err-msg were listing wrong missing provides.
enh: err-msg did not list returns-dict mismatches.
ENH(plot): add number badges on operations & data nodes to denote execution order; theme
show_steps=False
can hide them;feat: data-nodes are Graphviz HTML-ized to support badges.
break(fnop): do not accept operations without any provides - they were pruned before, but it is better to fail asap.
fix/break: clip extra data from a returns dictionary function - previously, full-eviction assertion was kicking.
enh(plan): why operations are pruned is now explained in the plan & the plot tooltips.
fix(plan):
ExecutionPlan.validate()
may be called with no args, and uses the compiled ones.enh: suggest mode of action on the error message, when graph has cycles; top-sort nodes only once, and report them to jetsam (see below).
doc: enhance existing tutorial section to explain compilation.
feat(jetsam): pipeline was not collected, till now.
FEAT: items can be accessed as jetsam attributes (helpful for interactive REPLs).
ENH: don’t “catch” exception, use
try-finally
withok
flag instead, not to corrupt the landing position of a post-mortem debugger.feat: now collecting also
pruned_dag
,op_comments
,sorted_nodes
while compiling.revert: stop logging every jetsam item on each salvaging point, not to flood logs (which had been introduced in the previous release).
refact: move into own module.
fix(SphinxExt): catch top-level errors that if occurred, message and stack trace were lost.
doc: list anti-features (when to avoid using this lib).
v10.0.0 (19 Jul 2020, @ankostis): Implicits; modify(); auto-name pipelines; plot data as overspilled¶
FEAT: new implicit modifier doing a better job than
sfx()
.FEAT(pipeline): auto-derive name from enclosing function.
BREAK/fix(modifier): rename modifier
jsonp =>
modify()
; parameterjsonp=False
now works.FEAT(jspoint): descend object attributes were disabled before.
ENH(modifier): privatize all fields (str with foreign attributes interact badly with 3rdp libs).
ENH(plot): stackable tooltips; now data nodes kind and state is fully explained there.
enh: split jsonp data nodes in separate lines forming a tree.
enh: label overspill data-node’s shapes.
enh: theme-stack now expands any callables in keys or whole kv-pairs.
feat:
show_chaindocs=False
them attribute now hides even subdoc relationships (edges).fix: various fixes & enhancements (“canceled” were misattributed, update legend, infective user
'graphviz.xxx'
attributes, plotting no-edge diagrams)
enh(planning): explained why nodes were pruned in
DEBUG
logs.enh(jetsam): exception-annotated contents accessed also as attributes.
doc(debug) improve instructions.
enh(tests): check library also with
DEBUG
logging level.
v9.3.0 (8 Jul 2020, @ankostis): Sphinx AutoDocumenter; fix plot sfx-nodes¶
FIX/FEAT(SPHINXEXT): so far,
operation()
-annotated module functions were excluded from generated sites, until the installed autodoc function-documenter was instructed how to render the wrapped function in place of the wrapping
FnOp
.
fix(fnop, pipeline): wrapped function attributes are conveyed to the wrapping FnOp.
FIX(plot): sideffect templates were left broken by recent privatization of modifier fields; add x2 Jinja-filters encapsulating the access to these fields.
fix(op): fully fake callables by attaching a
__qualname__
property on operations; also teachfunc_name()
not to choke if__qualname__
missing.
v9.2.0 (4 Jul 2020, @ankostis): Drop MultiValueError¶
Delayed raising of needs errors hindered debug.
v9.1.0 (4 Jul 2020, @ankostis): Bugfix, panda-polite, privatize modifier fields¶
BREAK(modifier): privatize all
_Modifier
properties; it is uncanny for a str to have more public attributes.fix: avoid equality checks on results, to avoid pandas notorious “The truth value of a Series/DataFrame is ambiguous.”
break(plot): Rename theme property
include_steps => show_steps
.feat(plot): new theme property
show_chaindocs
by default false, that when enabled, plots all nodes in the subdoc hierarchy (even if those not used as deps), like this:pipeline.plot(theme={"show_chaindocs": True})
fix(plot): returns-dictionary op-badge had broken url.
v9.0.0 (30 Jun 2020, @ankostis): JSONP; net, evictions & sfxed fixes; conveyor fn; rename modules¶
FEAT(modifier): Dependencies with json pointer path that can read/write subdocs (e.g. nested dicts & pandas).
feat(config): added
set_layered_solution()
into configurations which when True (or jsonps in the network if None (default)) all results are stored in the given inputs to the pipeline (this may become the regular behavior in the future).feat(modifier, solution): +modifier with accessor functions to read/write Solution.
doc: new section Hierarchical data and further tricks putting together all advanced features of the project in a “Weekly task runner”.
BREAK/REFACT: modules and objects renamed:
FROM
TO
modifierS.py
modifier.py
func: modifiers.fn_kwarg
network.py
planning.py
op.py
fnop.py
class: op.FunctionalOperation
FEAT(op): default
identity_function()
acting as conveyor operation.FIX(NET, EXECUTION): discovered and fixed bugs in pruning, evictions and rescheduling with overwrites, while testing new jsonp modifier; rely on dag alone while pruning (and not digging into op needs/provides).
Dupe Evictions of pruned output were deliberately & wrongly consolidated, while it is possible to need to evict repeatedly the same out from multiple ops providing it.
Less aggressive prune-isolated-data permits SFX not to be asked explicitly, and to behave more like regular data. Now, for certain cases, the more specific error “Unreachable out” gets raised, instead of the too generic “Unsolvable graph”.
Prune-by-outputs was ignoring given inputs, choking on computation cycles that were possible to avoid!
DROP(net):
_EvictionInstruction
class was obscuring modifier combinations, and it didn’t make sense any more, being the only instruction.FEAT(ops, pipelines, net, sol): unified
Plottable.ops()
utility properties.ENH: Error reporting:
enh(op, pipe): fail earlier if no function/name given when defining operations and pipelines.
enh(op): when
GRAPHTIK_DEBUG
var defined, any errors during inputs/needs matching are raised immediately.enh: improve tips & hints in exception messages; log past executed operations when a pipeline fails.
DOC(op): table explaining the differences between various dependency attributes of
FnOp
.Differences between various dependency operation attributes:
dependency attribute    dupes   token   alias   sfxed
-----------------------------------------------------
needs:
    needs                 ✗       ✓              SINGULAR
    _user_needs           ✓       ✓
    _fn_needs             ✓       ✗              STRIPPED
provides:
    provides              ✗       ✓       ✓      SINGULAR
    _user_provides        ✓       ✓       ✗
    _fn_provides          ✓       ✗       ✗      STRIPPED
where:
“dupes=no” means the collection drops any duplicated dependencies
“SINGULAR” means
sfxed('A', 'a', 'b') ==> sfxed('A', 'a'), sfxed('A', 'b')
“STRIPPED” means
sfxed('A', 'a', 'b') ==> token('a'), sfxed('b')
enh(op, pipe): restrict operation names to be strings (were
collection.abc.Hashable
).feat(modifier): public-ize
modifier_withset()
to produce modified clones – handle it with care.feat(doc): Add new section with most significant Features of this project.
fix(travis): update pytest or else pip-install chokes with pytest-coverage plugin.
enh(pytest): add
--logger-disabled
CLI option when running TCs, as explained in pytest-dev/pytest#7431.refact(tests): split big
test/test_graphtik.py
TC file into multiple ones, per functionality area (features).
v8.4.0 (15 May 2020, @ankostis): subclass-able Op, plot edges from south-->north of nodes¶
ENH(pipe): nest all Ops (not just FnOps), dropping
FnOp
dependency in network code, to allow for further sub-classingOperation
.FIX(pipeline): due to a side-effect on a
kw
dictionary, it was mixing the attributes of earlier operations into later ones while merging them into pipelines.REFACT(solution): facilitate inheriting Solution by extracting :meth:` .Solution._update_op_outs` into a separate method.
refact(pipe): move build_net() --> back to pipeline module, dropping further network.py-->pipeline.py mod-dep.
enh(plot): StyleStack-ize data-io shape selection into separate theme-able dicts.
DOC(exe, plotting): task-context section in Debugger
v8.3.1 (14 May 2020, @ankostis): plot edges from south-->north of nodes¶
ENH(plot): have all the link-edges between data and operations route out and into the same point on the nodes (src: south, dst: north). Distinguish needs edges from provides with a “dot”.
v8.3.0 (12 May 2020, @ankostis): mapped-->keyword, drop sol-finalize¶
BREAK: rename
mapped --> keyword
, which conveys the most important meaning.
DROP Solution.finalized() method – it had stopped being used to reverse values since sfxed were introduced (v7+).
doc(modifiers): explain diacritic symbols of dependencies when in printouts.
v8.2.0 (11 May 2020, @ankostis): custom Solutions, Task-context¶
FEAT(exe):
compute()
supports custom Solution classes.FEAT(exe): underlying functions gain access to wrapping Operation with
execution.task_context
.
v8.1.0 (11 May 2020, @ankostis): drop last plan, Rename/Nest, Netop-->Pipeline, purify modules¶
DROP(pipeline): After solution class was introduced,
last_plan
attribute was redundant.ENH(op): Rename & Nest operations with dictionary or callable.
FEAT(pipeline):
NO_RESULT_BUT_SFX
token can cancel regular data but leave sideffects of a rescheduled op.REFACT: revert module splits and arrive back to
base.py
,fnop.py
&pipeline.py
, to facilitate development with smaller files, but still with very few import-time dependencies.Importing project composition classes takes less than 4ms in a fast 2019 PC (down from 300ms).
FIX(plot): updated Legend, which had become outdated since v6+.
fix(modifiers): dep_renamed() was faking sideffect-renaming only on repr() (but fix not stressed, bc v8.0.x is not actually released).
enh(pipe): accept a dictionary with renames when doing operation nesting (instead of callables or truthies).
enh(modifiers): add
_Modifier.cmd
with code to reproduce modifier.
v8.0.2 (7 May 2020, @ankostis): re-MODULE; sideffect --> sfx; all DIACRITIC Modifiers; invert “merge” meaning¶
–((superseded immediately by v8.0.1 & v8.0.2 with more restructurings))–
BREAK: restructured
netop
&
network
modules:BREAK: stopped(!) importing
config
top-level.BREAK:
network
module was split into
execution
which now contains plan+solution;BREAK: unified modules
op
+netop
--> composition.
DOC: module dependencies diagram in API Reference; now x60 faster
import composition
from 300ms --> 5ms.
BREAK: sideffect modifier functions shortened to
sfx()
&sfxed()
.FEAT: +Sideffected varargish – now sideffected fully mirror a regular dependency.
ENH: change visual representation of modifiers with DIACRITICS only.
refact(modifiers): use cstor matrix to combine modifier arguments; new utility method for renaming dependencies
dep_renamed()
(useful when Nesting, see below).
ENH: it is now possible to also rename sideffects; the actual sideffect string is now stored in the modifier.
BREAK/ENH: invert “merge” meaning with the (newly introduced) “nest” (operation nesting); the default now is merge:
FEAT: introduce the
NULL_OP
operation that can “erase” an existing operation when merging pipelines.ENH:
compose(..., nest=nest_cb)
where the callback accepts class.RenArgs
and can perform any kind of renaming on data + operations before combining pipelines.doc: “merge” identically-named ops override each other, “nest” means they are prefixed, “combine” means both operations.
DOC: re-written a merge-vs-nest tutorial for humanity.
DROP(op): parent attribute is no longer maintained – operation identity now based only on name, which may implicitly be nested by dots(
.
).ENH(plot): accept bare dictionary as theme overrides when plotting.
doc: fix site configuration for using the standard
<s5defs>
include for colored/font-size sphinx roles.
v8.0.0, v8.0.1 (7 May 2020, @ankostis): retracted bc found more restructurings¶
–((all changes above in b8.0.2 happened actually in these 2 releases))–
v7.1.2 (6 May 2020, @ankostis): minor reschedule fixes and refactoring¶
Actually it contains just what was destined for v7.1.1.
FIX(op): v7.0.0 promise that
op.__call__
delegates tocompute()
was a fake; now it is fixed.fix(config): endurance flags were miss-behaving.
refact(net): factor out a
_reschedule()
method for both endurance & rescheduled ops.feat(build): +script to launch pytest on a local clone repo before pushing.
v7.1.1 (5 May 2020, @ankostis): canceled, by mistake contained features for 8.x¶
(removed from PyPi/RTD, new features by mistake were removed from v7.1.2)
v7.1.0 (4 May 2020, @ankostis): Cancelable sideffects, theme-ize & expand everything¶
Should have been a MAJOR BUMP due to breaking renames, but we had just come out of another 6.x --> 7.x major bump.
NET: fix rescheduled, cancelable sfx, improve compute API¶
FIX: rescheduled operations were not canceling all downstream deps & operations.
FEAT: Cancelable sideffects: a reschedules operation may return a “falsy” sideffect to cancel downstream operations.
NO_RESULT
constant cancels also sideffects.
ENH(OP): more intuitive API,
compute()
may be called with no args, or a single string as outputs param. Operation’s__call__
now delegates tocompute()
- to quickly experiment with function, access it from the operation’sFnOp.fn
attribute
MODIFIERS: modifier combinations, rename sol_sideffects¶
BREAK: renamed modifiers
sol_sideffect --> sideffected
, to reduce terminology mental load for the users.ENH: support combinations of modifiers (e.g. optional sideffects).
REFACT: convert modifiers classes –> factory functions, producing
_Modifier
instances (normally not managed by the user).
PLOT: them-ize all, convey user-attrs, draw nest clusters, click SVGs to open in tab, …¶
ENH: Theme-ize all; expand callables (beyond Refs and templates).
BREAK: rename
Theme.with_set()
-->
Theme.withset()
.
break: pass verbatim any nx-attrs starting with
'graphviz.'
into plotting process (instead of passing everything but private attributes).break: rename graph/node/edge control attributes:
_no_plot --> no_plot
._alias_of --> alias_of
.
FEAT: draw combined pipelines as clusters
enh: corrected and richer styles for data nodes.
enh: unify op-badges on plot with diacritics in their string-representation.
ENH(sphinxext): clicking on an SVG opens the diagram in a new tab.
fix(sphinxext): don’t choke on duplicate
:name:
ingraphtik
directives.fix(sphinxext): fix deprecation of sphinx
add_object()
withnote_object()
.
Various: raise TypeErrors, improve “operations” section¶
break: raise
TypeError
instead ofValueError
wherever it must.DOC(operations): heavily restructured chapter - now might stand alone. Started using the pipeline name more often.
doc: use as sample diagram in the project opening an “endured” one (instead of an outdated plain simple on).
doc: renamed document:
composition.py --> pipelines.py
v7.0.0 (28 Apr 2020, @ankostis): In-solution sideffects, unified OpBuilder, plot badges¶
BREAK: stacking of solution results changed to the more natural “chronological” one (outputs written later in the solution override previous ones).
Previously it was the opposite during execution while reading intermediate solution values (1st result or user-inputs won), and it was “reversed” to regular chronological right before the solution was finalized.
FEAT(op, netop): add
__name__
attribute to operations, to disguise as functions.BREAK(op): The
operation()
factory function (used to be class) now behave like a regular decorator when fn given in the first call, and constructs theFnOp
without a need to call again the factory.Specifically the last empty call at the end
()
is not needed (or possible):operation(str, name=...)()
became simply like that:
operation(str, name=...)
DROP(NET):
_DataNode
and use str + modifier-classes as data-nodes;
MODIFIERS: Sideffecteds; arg–> mapped¶
BREAK: rename arg --> mapped, which conveys the correct meaning.
FEAT: Introduced sideffecteds, to allow for certain dependencies to be produced & consumed by a function to apply “sideffects”, without creating “cycles”:
feat(op): introduce
_fn_needs
,op_needs
&op_provides
onFnOp
, used when matching Inps/Outs and when pruning graph.FEAT(op): print detailed deps when DEBUG enabled.
PLOT: Badges, StyleStacks, refact Themes, fix style mis-classifications, don’t plot steps¶
ENH: recursively merge Graphviz-styles attributes, with expanding jinja2-template and extending lists while preserving theme-provenance, for debugging.
BREAK: rename class & attributes related to
Style --> Theme
, to distinguish them from styles (stacks of dictionaries).UPD: dot no plot Steps by default; use this Plot customizations to re-enable them:
plottable.plot(plotter=Plotter(show_steps=True))
FEAT: now operations are also plottable.
FEAT: Operation BADGES to distinguish endured, rescheduled, parallel, marshalled, returns_dict.
FIX: Cancel/Evict styles were misclassified.
feat(plot): change label in sol_sideffects; add exceptions as tooltips on failed operations, etc.
enh: improve plot theme, e.g. prunes are all grey, sideffects all blue, “evictions” are colored closer to steps, etc. Add many neglected styles.
Sphinx extension:¶
enh: Save DOTs if DEBUG; save it before…
fix: save debug-DOT before rendering images, to still get those files as debug aid in case of errors.
fix: workaround missing lineno on doctest failures, an incomplete solution introduced upstream by sphinx-doc/sphinx#4584.
Configurations:¶
BREAK: rename context-manager configuration function debug --> debug_enabled.
FEAT: respect
GRAPHTIK_DEBUG
for enabling is_debug() configuration.
DOC:¶
feat: new sections about composing pipelines with reschedule / endured operations & aliases.
enh: Clarified relation and duties of the new term dependency.
enh: Linked many terms from quick-start section.
enh(site): support for Sphinx’s standard colored-text roles.
v6.2.0 (19 Apr 2020, @ankostis): plotting fixes & more styles, net find util methods¶
PLOT:
DEPRECATE(plot): show argument in plot methods & functions; dropped completely from the args of the younger class
Plotter
It has merged with the filename param (the latter takes precedence if both are given).
ENH: apply more styles on data-nodes; distinguish between Prune/Cancel/Evict data Styles and add tooltips for those cases (ie data nodes without values).
DROP: do not plot with
splines=ortho
, because it crashes with some shapes; the docs explain how to re-enable this (x2 ways).
FIX: node/edge attributes were ignored due to networkx API misuse - added TCs on that.
FIX: Networks were not plotting Inps/Outs/Name due to forgotten
namedtuple._replace()
assignment.feat: introduce
_no_plot
nx-attribute to filter out nodes/edges.
ENH(base): improve auto-naming of operations, descending partials politely and handling better builtins.
FEAT(net): add
Network.find_ops()
&Network.find_op_by_name()
utility methods.enh(build, site, doc): graft Build Ver/Date as gotten from Git in PyPi landing-page.
v6.1.0 (14 Apr 2020, @ankostis): config plugs & fix styles¶
Should have been a MAJOR BUMP due to breaking renames, but…no clients yet (and just out of the 5.x --> 6.x major bump).
REFACT/BREAK(plot): rename installed_plotter --> active_plotter.
REFACT/BREAK(config): denote context-manager functions by adding a "_plugged" suffix.
FEAT(plot): offer with_XXX() cloning methods on Plotter/Style instances.
FIX(plot): Style cstor had its methods broken due to eagerly copying them from its parent class.
v6.0.0 (13 Apr 2020, @ankostis): New Plotting Device…¶
–((superseded by v6.1.0 due to installed_plotter –> active_plotter))–
ENH/REFACT(PLOT):
REFACT/BREAK: plots are now fully configurable with plot theme through the use of installed plotter.
ENH: Render operation nodes with Graphviz HTML-Table Labels.
ENH: Convey graph, node & edge (“non-private”) attributes from the networkx graph given to the plotter.
FEAT: Operation node link to docs (hackish, based on a URL formatting).
Improved plotting documentation & +3 new terms.
FIX: ReadTheDocs deps
drop(plot): don’t suppress the grafting of the title in netop images.
v5.7.1 (7 Apr 2020, @ankostis): Plot job, fix RTD deps¶
ENH(PLOT): Operation tooltips now show function sources.
FIX(site): RTD failing since 5.6.0 due to sphinxcontrib-spelling extension not included in its requirements.
FEAT(sphinxext): add graphtik_plot_keywords sphinx-configuration with a default value that suppresses grafting the title of a netop in the images, to avoid duplication when the graphtik:name: option is given.
enh(plot): URL/tooltips are now overridable with node_props
enh(sphinxext): permalink plottables with :name: option.
enh(plot): pan-zoom follows parent container block, on window resize; reduce zoom mouse speed.
v5.7.0 (6 Apr 2020, @ankostis): FIX +SphinxExt in Wheel¶
All previous distributions in PyPi since sphinx-extension was added in v5.3.0
were missing the new package sphinxext
needed to build sites with
the .. graphtik::
directive.
v5.6.0 (6 Apr 2020, @ankostis, BROKEN): +check_if_incomplete¶
–((BROKEN because wheel in PyPi is missing sphinxext package))–
feat(sol): + Solution.check_if_incomplete() just to get multi-errors (not raise them)
doc: integrate spellchecking of VSCode IDE & sphinxcontrib.spelling.
v5.5.0 (1 Apr 2020, @ankostis, BROKEN): ortho plots¶
–((BROKEN because wheel in PyPi is missing sphinxext package))–
Should have been a major bump due to breaking rename of Plotter
class,
but…no clients yet.
ENH(plot): plot edges in graphs with Graphviz splines=ortho.
REFACT(plot): rename base class from Plotter --> Plottable.
enh(build): add [dev] distribution extras as an alias to [all].
doc: referred to the new name from a new term in glossary.
enh(site): put RST substitutions in rst_epilog configuration (instead of importing them from README’s tails).
doc(quickstart): exemplify @operation as a decorator.
v5.4.0 (29 Mar 2020, @ankostis, BROKEN): auto-name ops, dogfood quickstart¶
–((BROKEN because wheel in PyPi is missing sphinxext package))–
enh(op): use func_name if none given.
DOC(quickstart): dynamic plots with sphinxext.
v5.3.0 (28 Mar 2020, @ankostis, BROKEN): Sphinx plots, fail-early on bad op¶
–((BROKEN because wheel in PyPi is missing sphinxext package))–
FEAT(PLOT,SITE): Sphinx extension for plotting graph-diagrams as zoomable SVGs (default), PNGs (with link maps), PDFs, etc.
replace pre-plotted diagrams with dynamic ones.
deps: sphinx >=2; split (optional) matplotlib dependencies from graphviz.
test: install and use Sphinx’s harness for testing site features & extensions.
ENH(op): fail early if 1st argument of operation is not a callable.
enh(plot): possible to control the name of the graph in the resulting DOT-language (it was stuck to 'G' before).
upd(conf): detailed object representations are enabled by the new configuration debug flag (instead of piggybacking on logger.DEBUG).
enh(site): links-to-sources resolution function was discarding parent object if it could not locate the exact position in the sources;
TC: launch site building in pytest interpreter, to control visibility of logs & stdout;
add index pages, linked from TOCs.
v5.2.2 (03 Mar 2020, @ankostis): stuck in PARALLEL, fix Impossible Outs, plot quoting, legend node¶
FIX(NET): PARALLEL was ALWAYS enabled.
FIX(PLOT): workaround pydot parsing of node-ID & labels (see pydot#111 about DOT-keywords & pydot#224 about colons :) by converting IDs to HTML-strings; additionally, this project did not follow Graphviz grammatical-rules for IDs.
FIX(NET): impossible outs (outputs that cannot be produced from given inputs) were not raised!
enh(plot): clicking the background of a diagram would link to the legend url, which was annoying; replaced with a separate “legend” node.
v5.2.1 (28 Feb 2020, @ankostis): fix plan cache on skip-evictions, PY3.8 TCs, docs¶
FIX(net): Execution-plans were cached also on the transient is_skip_evictions() configuration (instead of just whether no-outputs were asked).
doc(readme): explain “fork” status in the opening.
ENH(travis): run full tests from Python-3.7–> Python-3.8.
v5.2.0 (27 Feb 2020, @ankostis): Map needs inputs –> args, SPELLCHECK¶
FEAT(modifiers): optionals and the new modifier mapped() can now fetch values from inputs into differently-named arguments of operation functions.
refact: decouple varargs from the optional modifiers hierarchy.
REFACT(OP): preparation of NEEDS –> function-args happens once for each argument, allowing to report all errors at once.
feat(base): +MultiValueError exception class.
DOC(modifiers,arch): modifiers were not included in “API reference”, nor in the glossary sections.
FIX: spell-check everything, and add all custom words in the VSCode settings file .vscode.settings.json.
v5.1.0 (22 Jan 2020, @ankostis): accept named-tuples/objects provides¶
ENH(OP): flag returns_dict handles also named-tuples & objects (__dict__).
v5.0.0 (31 Dec 2019, @ankostis): Method–>Parallel, all configs now per op flags; Screaming Solutions on fails/partials¶
BREAK(NETOP): compose(method="parallel") --> compose(parallel=None/False/True) and DROP netop.set_execution_method(method); parallel now also controlled with the global set_parallel_tasks() configurations function.
feat(jetsam): report task executed in raised exceptions.
break(netop): rename netop.narrowed() --> withset() to mimic Operation API.
break: rename flags: reschedule --> rescheduleD, marshal --> marshalLED.
break: rename global configs, as context-managers:
marshal_parallel_tasks --> tasks_marshalled
endure_operations --> operations_endured
FIX(net, plan, .TC): the global skip-evictions flag was not fully obeyed (was untested).
FIX(OP): revamped zipping of function outputs with expected provides, for all combinations of rescheduled, NO_RESULT & returns-dictionary flags.
refact: extract configs in their own module.
refact: make all global flags tri-state (None, False, True), allowing to “force” operation flags when not None. All default to None (false).
ENH(net, sol, logs): include a “solution-id” in revamped log messages, to help developers discover issues when multiple netops are running concurrently. Heavily enhanced log messages make sense to the reader of all actions performed.
ENH(plot): set tooltips with repr(op) to view all operation flags.
FIX(TCs): close process-pools; now many more TCs for parallel combinations of threaded, process-pool & marshalled.
ENH(netop,net): possible to abort many netops at once, by resetting the abort flag on every call of Pipeline.compute() (instead of on the first stopped netop).
FEAT(SOL): scream_if_incomplete() will raise the new IncompleteExecutionError exception if failures/partial-outs of endured/rescheduled operations prevented all operations from completing; the exception message details causal errors and conditions.
feat(build): +all extras.
FAIL: x2 multi-threaded TCs fail spuriously with “inverse dag edges”: test_multithreading_plan_execution() & test_multi_threading_computes(), both marked as xfail.
v4.4.1 (22 Dec 2019, @ankostis): bugfix debug print¶
fix(net): had forgotten a debug-print on every operation call.
doc(arch): explain parallel & the need for marshalling with process pools.
v4.4.0 (21 Dec 2019, @ankostis): RESCHEDULE for PARTIAL Outputs, on a per op basis¶
[x] dynamic Reschedule after operations with partial outputs execute.
[x] raise after jetsam.
[x] plots link to legend.
[x] refact netop
[x] endurance per op.
[x] endurance/reschedule for all netop ops.
[x] merge _Rescheduler into Solution.
[x] keep order of outputs in Solution even for parallels.
[x] keep solution layers ordered also for parallel.
[x] require user to create & enter pools.
[x] FIX pickling THREAD POOL –>Process.
Details¶
FIX(NET): keep Solution’s insertion order also for PARALLEL executions.
FEAT(NET, OP): rescheduled operations with partial outputs; they must have FnOp.rescheduled set to true, or else they will fail.
FEAT(OP, netop): specify endurance/reschedule on a per operation basis, or collectively for all operations grouped under some pipeline.
REFACT(NETOP):
feat(netop): new method Pipeline.compile(), delegating to the same-named method of network.
drop(net): method Net.narrowed(); remember netop.narrowed(outputs+predicate) and apply them on netop.compute() & netop.compile().
PROS: cache narrowed plans.
CONS: cannot review network, must review plan of (new) netop.compile().
drop(netop): inputs args in narrowed() didn’t make much sense, leftover from “unvarying netops”; but exist in netop.compile().
refact(netop): move net-assembly from compose() –> NetOp cstor; now reschedule/endured/merge/method args in cstor.
NET,OP,TCs: FIX PARALLEL POOL CONCURRENCY
Network:
feat: +marshal +_OpTask
refact: plan._call_op –> _handle_task
enh: Make abort run variable a shared-memory Value.
REFACT(OP,.TC): not a namedtuple, breaks pickling.
ENH(pool): Pool
FIX: compare Tokens with is –> ==, or else, it won’t work for sub-processes.
TEST: x MULTIPLE TESTS
+4 tags: parallel, thread, proc, marshal.
many uses of exemethod.
FIX(build): PyPi README check did not detect forbidden raw directives, and travis auto-deployments were failing.
doc(arch): more terms.
v4.3.0 (16 Dec 2019, @ankostis): Aliases¶
FEAT(OP): support “aliases” of provides, to avoid trivial pipe-through operations, just to rename & match operations.
v4.2.0 (16 Dec 2019, @ankostis): ENDURED Execution¶
FEAT(NET): when set_endure_operations() configuration is set to true, a pipeline will keep on calculating the solution, skipping any operations downstream from failed ones. The solution eventually collects all failures in the Solution.failures attribute.
ENH(DOC,plot): Links in Legend and Architecture Workflow SVGs now work, and delegate to architecture terms.
ENH(plot): mark overwrite, failed & canceled in repr() (see endurance).
refact(conf): fully rename configuration operation skip_evictions.
REFACT(jetsam): raise after jetsam in situ, better for Readers & Linters.
enh(net): improve logging.
v4.1.0 (13 Dec 2019, @ankostis): ChainMap Solution for Rewrites, stable TOPOLOGICAL sort¶
FIX(NET): TOPOLOGICAL-sort now breaks ties respecting operations insertion order.
ENH(NET): new Solution class to collect all computation values, based on a collections.ChainMap to distinguish outputs per operation executed.
ENH(NETOP): compute() returns Solution, consolidating:
drop(net): _PinInstruction class is not needed.
drop(netop): overwrites_collector parameter; now in Solution.overwrites().
ENH(plot): Solution is also a Plottable; e.g. use sol.plot(...).
DROP(plot): executed arg from plotting; now embedded in solution.
ENH(PLOT.jupyter,doc): allow to set jupyter graph-styling selectively; fix instructions for jupyter cell-resizing.
fix(plan): time-keeping worked only for sequential execution, not parallel; refactored it to happen centrally.
enh(NET, .TC): Add PREDICATE argument also for compile().
FEAT(DOC): add GLOSSARY as new Architecture section, linked from API HEADERS.
v4.0.1 (12 Dec 2019, @ankostis): bugfix¶
FIX(plan): plan.repr() was failing on empty plans.
fix(site): minor badge fix & landing diagram.
v4.0.0 (11 Dec 2019, @ankostis): NESTED merge, revert v3.x Unvarying, immutable OPs, “color” nodes¶
BREAK/ENH(NETOP): MERGE NESTED NetOps by collecting all their operations in a single Network; now children netops are not pruned in case some of their needs are unsatisfied.
feat(op): support multiple nesting under other netops.
BREAK(NETOP): REVERT Unvarying NetOps+base-plan, and narrow Networks instead; netops were too rigid, code was cumbersome, and could not really pinpoint the narrowed needs always correctly (e.g. when they were also provides).
A netop always narrows its net based on given inputs/outputs. This means that the net might be a subset of the one constructed out of the given operations. If you want all nodes, don’t specify needs/provides.
drop 3 ExecutionPlan attributes: plan, needs, plan
drop recompile flag in Network.compute().
feat(net): new method Network.narrowed() clones and narrows; Network() cstor accepts a (cloned) graph to support narrowed() methods.
BREAK/REFACT(OP): simplify hierarchy, make Operation fully abstract, without name or requirements.
enh: make FnOp IMMUTABLE, by inheriting from namedtuple.
refact(net): consider as netop needs also intermediate data nodes.
FEAT(#1, net, netop): support pruning based on arbitrary operation attributes (e.g. assign “colors” to nodes and solve a subset each time).
enh(netop): repr() now counts number of contained operations.
refact(netop): rename netop.narrow() --> narrowed()
drop(netop): don’t topologically-sort sub-networks before merging them; might change some results, but gives control back to the user to define nets.
v3.1.0 (6 Dec 2019, @ankostis): cooler prune()¶
break/refact(NET): scream on plan.execute() (not net.prune()) so as to calmly solve needs vs provides, based on the given inputs/outputs.
FIX(plot): was failing when plotting graphs with ops without fn set.
enh(net): minor fixes on assertions.
v3.0.0 (2 Dec 2019, @ankostis): UNVARYING NetOperations, narrowed, API refact¶
Pipelines:
BREAK(NET): RAISE if the graph is UNSOLVABLE for the given needs & provides! (see “raises” list of compute()).
BREAK: Pipeline.__call__() accepts solution as keyword-args, to mimic API of Operation.__call__(). The outputs keyword has been dropped.
Tip
Use Pipeline.compute() when you ask different outputs, or set the recompile flag if just different inputs are given. Read the next change-items for the new behavior of the compute() method.
UNVARYING NetOperations:
BREAK: calling method Pipeline.compute() with a single argument is now UNVARYING, meaning that all needs are demanded, and hence, all provides are produced, unless the recompile flag is true or outputs are asked.
BREAK: net-operations behave like regular operations when nested inside another netop, and always produce all their provides, or scream if fewer inputs than needs are given.
ENH: a newly created or cloned netop can be narrowed() to specific needs & provides, so as not to need passing outputs on every call to compute().
feat: implemented based on the new “narrowed” Pipeline.plan attribute.
FIX: netop needs are not all optional by default; optionality applied only if all underlying operations have a certain need as optional.
FEAT: support function **args with 2 new modifiers vararg() & varargs(), acting like optional() (but without feeding into underlying functions like keywords).
BREAK(yahoo#12): simplify compose API by turning it from class –> function; all args and operations are now given in a single compose() call.
REFACT(net, netop): make Network IMMUTABLE by appending all operations together, in the Pipeline constructor.
ENH(net): publicize _prune_graph() –> Network.prune(), which can be used to interrogate needs & provides for a given graph. It accepts None inputs & outputs to auto-derive them.
FIX(SITE): autodocs API chapters were not generated at all, due to import errors, fixed by using autodoc_mock_imports on networkx, pydot & boltons libs.
enh(op): polite error-msg when calling an operation with missing needs (instead of an abrupt KeyError).
FEAT(CI): test also on Python-3.8
v2.3.0 (24 Nov 2019, @ankostis): Zoomable SVGs & more op jobs¶
FEAT(plot): render Zoomable SVGs in jupyter(lab) notebooks.
break(netop): rename execution-method "sequential" --> None.
break(netop): move overwrites_collector & method args from netop.__call__() –> cstor
refact(netop): convert remaining **kwargs into named args, tighten up API.
v2.2.0 (20 Nov 2019, @ankostis): enhance OPERATIONS & restruct their modules¶
REFACT(src): split module nodes.py –> fnop.py + netop.py and move Operation from base.py –> fnop.py, in order to break the cycle of base(op) <– net <– netop, and keep utils only in base.py.
ENH(op): allow Operations WITHOUT any NEEDS.
ENH(op): allow Operation FUNCTIONS to return directly Dictionaries.
ENH(op): validate function Results against operation provides; jetsam now includes results variables: results_fn & results_op.
BREAK(op): drop unused Operation._after_init() pickle-hook; use dill instead.
refact(op): convert Operation._validate() into a function, to be called by clients wishing to automate operation construction.
refact(op): replace **kwargs with named-args in FnOp, because they allowed too-wide args, and offered no help to the user.
REFACT(configs): privatize network._execution_configs; expose more config-methods from the base package.
v2.1.1 (12 Nov 2019, @ankostis): global configs¶
BREAK: drop Python < 3.6 compatibility.
FEAT: Use (possibly multiple) global configurations for all networks, stored in a contextvars.ContextVar.
ENH/BREAK: Use a (possibly) single execution_pool in global-configs.
feat: add abort flag in global-configs.
feat: add skip_evictions flag in global-configs.
v2.1.0 (20 Oct 2019, @ankostis): DROP BW-compatible, Restruct modules/API, Plan perfect evictions¶
The first non-pre-release for the 2.x train.
BREAK API: DROP Operation’s params - use functools.partial() instead.
BREAK API: DROP Backward-Compatible Data & Operation classes.
BREAK: DROP Pickle workarounds - expected to use dill instead.
break(jetsam): drop “graphtik_” prefix from annotated attributes
ENH(op): now operation() supports the “builder pattern” with the operation.withset() method.
REFACT: renamed internal package functional –> nodes and moved classes around, to break cycles more easily (base works as supposed to), not to import everything early, but to fail plot early if the pydot dependency is missing.
REFACT: move PLAN and compute() up, from Network --> Pipeline
.ENH(NET): new PLAN BUILDING algorithm produces PERFECT EVICTIONS, that is, it gradually eliminates from the solution all non-asked outputs.
enh: pruning now cleans isolated data.
enh: eviction-instructions are inserted due to two different conditions: once for unneeded data in the past, and another for unused produced data (those not belonging to the pruned dag).
enh: discard immediately irrelevant inputs.
ENH(net): changed results, now unrelated inputs are not included in solution.
refact(sideffect): store them as node-attributes in DAG, fix their combination with pinning & eviction.
fix(parallel): eviction was not working due to a typo 65 commits back!
v2.0.0b1 (15 Oct 2019, @ankostis): Rebranded as Graphtik for Python 3.6+¶
Continuation of yahoo#30 as yahoo#31, containing review-fixes in huyng/graphkit#1.
Network¶
FIX: multithreaded operations were failing due to shared ExecutionPlan.executed.
FIX: pruning sometimes was inserting the plan string in the DAG (not _DataNode).
ENH: heavily reinforced exception annotations (“jetsam”):
FIX: (8f3ec3a) outer graphs/ops do not override the inner cause.
ENH: retrofitted exception-annotations as a single dictionary, to print it in one shot (8f3ec3a & 8d0de1f)
enh: more data in a dictionary
TCs: Add thorough TCs (8f3ec3a & b8063e5).
REFACT: rename Delete –> Evict, removed Placeholder from data nodes, privatize node-classes.
ENH: collect “jetsam” on errors and annotate exceptions with them.
ENH(sideffects): make them always DIFFERENT from regular DATA, to allow to co-exist.
fix(sideffects): typo in add_op() was mixing needs/provides.
enh: accept a single string as outputs when running graphs.
Testing & other code:¶
TCs: pytest now checks sphinx-site builds without any warnings.
Established chores with build services:
Travis (and auto-deploy to PyPi),
codecov
ReadTheDocs
v1.3.0 (Oct 2019, @ankostis): NEVER RELEASED: new DAG solver, better plotting & “sideffect”¶
Kept the external API (hopefully) the same, but revamped the pruning algorithm and refactored the network compute/compile structure, so results may change; significantly enhanced plotting. The only new feature actually is the sideffect modifier.
Network:¶
FIX(yahoo#18, yahoo#26, yahoo#29, yahoo#17, yahoo#20): Revamped DAG SOLVER to fix bad pruning described in yahoo#24 & yahoo#25.
Pruning now works by breaking incoming provide-links to any given intermediate inputs, dropping operations with partial inputs or without outputs. The end result is that operations in the graph that do not have all inputs satisfied are skipped (in v1.2.4 they crashed).
Also started annotating edges with optional/sideffects, to make proper use of the underlying networkx graph.
REFACT(yahoo#21, yahoo#29): Refactored Network and introduced ExecutionPlan to keep compilation results (the old steps list, plus input/output names). Moved also the check for when to evict a value, from running the execution-plan, to when building it; thus, execute methods don’t need outputs anymore.
ENH(yahoo#26): “Pin” input values that may be overwritten by calculated ones. This required the introduction of the new _PinInstruction in the execution plan.
FIX(yahoo#23, yahoo#22-2.4.3): Keep consistent order of networkx.DiGraph and sets, to generate deterministic solutions. Unfortunately, non-determinism has not been fixed in < PY3.5, just reduced the frequency of spurious failures, caused by unstable dicts, and the use of subgraphs.
enh: Mark outputs produced by Pipeline’s needs as optional. TODO: subgraph network-operations would not be fully functional until “optional outputs” are dealt with (see yahoo#22-2.5).
enh: Annotate operation exceptions with ExecutionPlan to aid debug sessions.
drop: methods list_layers() / show_layers() not needed, repr() is a better replacement.
Plotting:¶
ENH(yahoo#13, yahoo#26, yahoo#29): Now network remembers last plan and uses that to overlay graphs with the internals of the planing and execution:
execution-steps & order
evict & pin instructions
given inputs & asked outputs
solution values (just if they are present)
“optional” needs & broken links during pruning
REFACT: Move all API doc on plotting in a single module, split in 2 phases, build DOT & render DOT
FIX(yahoo#13): bring plot writing into files up-to-date from PY2; do not create plot-file if given file-extension is not supported.
FEAT: patch pydot library to support rendering in Jupyter notebooks.
Testing & other code:¶
Increased coverage from 77% –> 90%.
ENH(yahoo#28): use pytest, to facilitate TCs parametrization.
ENH(yahoo#30): Doctest all code; enabled many assertions that were just print-outs in v1.2.4.
FIX: operation.__repr__() was crashing when not all arguments had been set - a condition frequently met during debugging sessions or failed TCs (inspired by @syamajala’s 309338340).
enh: Sped up parallel/multithread TCs by reducing delays & repetitions.
Tip
You need pytest -m slow to run those slow tests.
Chore & Docs:¶
FEAT: add changelog in CHANGES.rst file, containing flowcharts to compare versions v1.2.4 <--> v1.3.0.
enh: updated site & documentation for all new features, comparing with v1.2.4.
enh(yahoo#30): added “API reference” chapter.
drop(build): sphinx_rtd_theme library is the default theme for Sphinx now.
enh(build): Add test pip extras.
sound: https://www.youtube.com/watch?v=-527VazA4IQ, https://www.youtube.com/watch?v=8J182LRi8sU&t=43s
v1.2.4 (Mar 7, 2018)¶
Blocking bug in plotting code for Python-3.x.
Test-cases without assertions (just prints).
1.2.2 (Mar 7, 2018, @huyng): Fixed versioning¶
Versioning now is manually specified to avoid bug where the version was not being correctly reflected on pip install deployments
1.2.1 (Feb 23, 2018, @huyng): Fixed multi-threading bug and faster compute through caching of find_necessary_steps¶
We’ve introduced a cache to avoid computing find_necessary_steps multiple times during each inference call.
This has 2 benefits:
It reduces computation time of the compute call
It avoids a subtle multi-threading bug in networkx when accessing the graph from a high number of threads.
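The caching idea described above can be sketched in plain Python. The sketch below is illustrative only, not graphkit's actual internals: it memoizes a stand-in find_necessary_steps on its (hashable) inputs/outputs key, so repeated compute calls with the same signature reuse the cached plan instead of re-traversing the graph.

```python
from functools import lru_cache

# Illustrative sketch only (not graphkit internals): memoize plan-building
# on the (inputs, outputs) key; frozensets are used because cache keys
# must be hashable.
@lru_cache(maxsize=None)
def find_necessary_steps(inputs: frozenset, outputs: frozenset) -> tuple:
    # Stand-in for the real graph traversal: pretend each missing output
    # needs one step, returned in deterministic order.
    return tuple(sorted(outputs - inputs))

plan1 = find_necessary_steps(frozenset({"a"}), frozenset({"b", "c"}))
plan2 = find_necessary_steps(frozenset({"a"}), frozenset({"b", "c"}))
assert plan1 is plan2  # second call served from the cache, same object
```

Serving the second call from the cache is also what sidesteps the networkx threading issue: the graph is traversed once, not once per thread.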
1.2.0 (Feb 13, 2018, @huyng)¶
Added set_execution_method(‘parallel’) for execution of graphs in parallel.
1.1.0 (Nov 9, 2017, @huyng)¶
Update setup.py
1.0.4 (Nov 3, 2017, @huyng): Networkx 2.0 compatibility¶
Minor Bug Fixes:
Compatibility fix for networkx 2.0
net.times now only stores timing info from the most recent run
1.0.3 (Jan 31, 2017, @huyng): Make plotting dependencies optional¶
Merge pull request yahoo#6 from yahoo/plot-optional
make plotting dependencies optional
1.0.2 (Sep 29, 2016, @pumpikano): Merge pull request yahoo#5 from yahoo/remove-packaging-dep¶
Remove ‘packaging’ as dependency
1.0.1 (Aug 24, 2016)¶
1.0 (Aug 2, 2016, @robwhess)¶
First public release in PyPi & GitHub.
Merge pull request yahoo#3 from robwhess/travis-build
Travis build
Index¶
Features¶
Deterministic pre-decided execution plan (unless partial-outputs or endured operations defined, see below).
Can assemble existing functions without modifications into pipelines.
Dependency resolution can bypass calculation cycles based on the data given and asked.
Support functions with partial outputs; keep working even if certain endured operations fail.
Facilitate trivial conveyor operations and alias on provides.
Support cycles, by annotating repeated updates of dependency values as tokens or sideffected (e.g. to add columns into pandas.DataFrames).
Hierarchical dependencies may access data values deep in solution with json pointer path expressions.
Hierarchical dependencies annotated as implicit imply which subdoc dependency the function reads or writes in the parent-doc.
Early eviction of intermediate results from solution, to optimize memory footprint.
Solution tracks all intermediate overwritten values for the same dependency.
Elaborate Graphviz plotting with configurable plot themes.
Integration with Sphinx sites with the new graphtik directive.
Authored with debugging in mind.
Parallel execution (but underdeveloped & DEPRECATED).
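The early-eviction feature above can be illustrated with a rough plain-Python sketch (not graphtik's actual implementation): a scheduler tracks how many downstream consumers each dependency still has, and drops an intermediate value from the solution as soon as its last consumer has run. The operation names mirror the quick-start example; everything else is hypothetical.

```python
# Illustrative sketch of early eviction (not graphtik internals):
# drop each intermediate value once its last consumer has executed.
ops = [  # (name, needs, provides, fn) - assumed topologically sorted
    ("mul", ("a", "b"), ("ab",), lambda a, b: a * b),
    ("sub", ("a", "ab"), ("a_ab",), lambda a, ab: a - ab),
]
asked = {"a_ab"}  # outputs requested by the caller

# Count remaining consumers for every dependency.
consumers = {}
for _, needs, _, _ in ops:
    for n in needs:
        consumers[n] = consumers.get(n, 0) + 1

sol = {"a": 2, "b": 5}
for name, needs, provides, fn in ops:
    result = fn(*(sol[n] for n in needs))
    sol.update(zip(provides, (result,)))
    for n in needs:  # evict values no longer needed downstream
        consumers[n] -= 1
        if consumers[n] == 0 and n not in asked:
            del sol[n]

print(sol)  # only still-needed values remain; intermediates are gone
```

With inputs a=2, b=5 and only "a_ab" asked, both inputs and the "ab" intermediate are evicted right after their last use, shrinking the solution's memory footprint as execution proceeds.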
Anti-features¶
It’s not meant to follow complex conditional logic based on dependency values (though it does support that to a limited degree).
It’s not an orchestrator for long-running tasks, nor a calendar scheduler - Apache Airflow, Dagster or Luigi may help for that.
It’s not really a parallelizing optimizer, neither a map-reduce framework - look additionally at Dask, IpyParallel, Celery, Hive, Pig, Hadoop, etc.
It’s not a stream/batch processor, like Spark, Storm, Flink, Kinesis, because it retains function-call semantics, calling each function only once to process data-items.
Differences with schedula¶
schedula is a powerful library written roughly for the same purpose, and works differently along these lines (ie features below refer to schedula):
terminology (<graphtik> := <schedula>):
pipeline := dispatcher
plan := workflow
solution := solution
Dijkstra planning runs while calling operations:
Powerful & flexible (ie all operations are dynamic, domains are possible, etc).
Supports weights.
Cannot pre-calculate & cache execution plans (slow).
Calculated values are stored inside a graph (mimicking the structure of the functions):
graph visualizations absolutely needed to inspect & debug its solutions.
graphs imply complex pre/post processing & traversal algos (vs constructing/traversing data-trees).
Reactive plotted diagrams, web-server runs behind the scenes.
Operation graphs are stackable:
plotted nested-graphs support drill-down.
graphtik emulates that with data/operation names (operation nesting), but always a unified graph is solved at once, because it is impossible to dress nesting-funcs as python-funcs and pre-solve the plan (schedula does not pre-solve the plan, Dijkstra runs all the time). See TODO about plotting such nested graphs.
Schedula does not calculate all possible values (ie no overwrites).
Schedula computes precedence based on weights and lexicographical order of function name.
Re-inserting an operation does not override its current function - must remove it first.
graphtik precedence is based on insertion order during composition.
Virtual start and end data-nodes needed for Dijkstra to solve the dag.
No domains (execute-time conditionals deciding whether a function must run).
Re-computation is probably more straightforward in graphtik.
TODO: more differences with schedula exist.
Quick start¶
Here’s how to install:
pip install graphtik
OR with dependencies for plotting support (and you need to install Graphviz program separately with your OS tools):
pip install graphtik[plot]
Let’s build a graphtik computation pipeline that produces the following x3 outputs out of x2 inputs (α and β):
>>> from graphtik import compose, operation
>>> from operator import mul, sub
>>> @operation(name="abs qubed",
... needs=["α-α×β"],
... provides=["|α-α×β|³"])
... def abs_qubed(a):
... return abs(a) ** 3
Hint
Notice that graphtik has no problem working with unicode chars for dependency names.
Compose the abs_qubed function along with the mul & sub built-ins into a computation graph:
>>> graphop = compose("graphop",
... operation(mul, needs=["α", "β"], provides=["α×β"]),
... operation(sub, needs=["α", "α×β"], provides=["α-α×β"]),
... abs_qubed,
... )
>>> graphop
Pipeline('graphop', needs=['α', 'β', 'α×β', 'α-α×β'],
provides=['α×β', 'α-α×β', '|α-α×β|³'],
x3 ops: mul, sub, abs qubed)
You may plot the function graph in a file like this (if in jupyter, no need to specify the file, see Jupyter notebooks):
>>> graphop.plot('graphop.svg') # doctest: +SKIP
As you can see, any function can be used as an operation in Graphtik, even ones imported from system modules.
Run the graph-operation and request all of the outputs:
>>> sol = graphop(**{'α': 2, 'β': 5})
>>> sol
{'α': 2, 'β': 5, 'α×β': 10, 'α-α×β': -8, '|α-α×β|³': 512}
Solutions are plottable as well:
>>> sol.plot('solution.svg')         # doctest: +SKIP
Run the graph-operation and request a subset of the outputs:
>>> solution = graphop.compute({'α': 2, 'β': 5}, outputs=["α-α×β"])
>>> solution
{'α-α×β': -8}
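The narrowing above can be sketched in plain Python. This is an illustrative backward-pruning pass, not graphtik's actual solver: starting from the asked outputs, walk the operations in reverse topological order, keep only those whose provides are still needed, and accumulate their needs (minus the given inputs) as further requirements. The op table mirrors the quick-start pipeline; the prune helper is hypothetical.

```python
# Illustrative backward-pruning sketch (not graphtik's actual solver):
# keep only operations whose provides are (transitively) needed.
ops = {  # name: (needs, provides) - assumed topologically ordered
    "mul": (["α", "β"], ["α×β"]),
    "sub": (["α", "α×β"], ["α-α×β"]),
    "abs qubed": (["α-α×β"], ["|α-α×β|³"]),
}

def prune(asked_outputs, given_inputs):
    needed, keep = set(asked_outputs), []
    for name in reversed(list(ops)):     # walk ops backwards
        needs, provides = ops[name]
        if needed & set(provides):       # op produces something needed
            keep.append(name)
            needed |= set(needs) - set(given_inputs)
    return list(reversed(keep))

print(prune(["α-α×β"], ["α", "β"]))      # ['mul', 'sub'] - 'abs qubed' pruned
```

With only "α-α×β" asked, "abs qubed" contributes nothing needed and drops out, which is why the solution above contains a single value instead of all five.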
… where the (interactive) legend is this:
>>> from graphtik.plot import legend
>>> l = legend()
legend¶