1. Operations¶
An operation is a function in a computation pipeline, abstractly represented by the Operation class.
This class specifies the dependencies forming the pipeline’s network.
Defining Operations¶
You may inherit the Operation abstract class to do the following:
define the needs & provides properties as collections of dependencies (needed to solve the dependencies network),
override the compute(solution) method to read from the solution argument those values listed in needs (only those values are guaranteed to exist when called), do some business, and then
populate the values listed in provides back into solution (if other values are populated, they may be ignored).
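The steps above can be sketched in plain Python. This is only an illustration under assumptions: `SketchOperation` and `AddOp` are hypothetical stand-ins, not Graphtik's own `Operation` class, and a plain `dict` plays the role of the solution.

```python
from abc import ABC, abstractmethod


class SketchOperation(ABC):
    """Hypothetical stand-in for an operation base class (not Graphtik's)."""

    needs: tuple = ()     # names read from the solution
    provides: tuple = ()  # names written back into the solution

    @abstractmethod
    def compute(self, solution: dict) -> None:
        """Read `needs` from `solution` and write `provides` into it."""


class AddOp(SketchOperation):
    needs = ("a", "b")
    provides = ("a_plus_b",)

    def compute(self, solution: dict) -> None:
        # Read only the values listed in `needs`.
        a, b = (solution[name] for name in self.needs)
        # Populate only the values listed in `provides`.
        solution[self.provides[0]] = a + b


solution = {"a": 3, "b": 4}
AddOp().compute(solution)
print(solution)  # {'a': 3, 'b': 4, 'a_plus_b': 7}
```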
But there is an easier way – actually half of the code in this project is dedicated to retrofitting existing functions unaware of all these, into operations.
Operations from existing functions¶
The FnOp provides a concrete wrapper around any arbitrary function to define and execute within a pipeline.
Use the operation() factory to instantiate one:
>>> from operator import add
>>> from graphtik import operation
>>> add_op = operation(add,
... needs=['a', 'b'],
... provides=['a_plus_b'])
>>> add_op
FnOp(name='add', needs=['a', 'b'], provides=['a_plus_b'], fn='add')
You may still call the original function at FnOp.fn, thus bypassing any operation pre-processing:
>>> add_op.fn(3, 4)
7
But the proper way is to call the operation (either directly or by calling the FnOp.compute() method). Notice though that unnamed positional parameters are not supported:
>>> add_op(a=3, b=4)
{'a_plus_b': 7}
Tip
(unstable API) In case your function needs to access the execution machinery or its wrapping operation, it can do that through the task_context (not working during (deprecated) parallel execution, see Accessing wrapper operation from task-context).
Builder pattern¶
There are two ways to instantiate FnOp instances, each one suitable for different scenarios.
We’ve seen that calling operation() manually allows putting into a pipeline functions that are defined elsewhere (e.g. in another module, or system functions).
But that method is also useful if you want to create multiple operation instances with similar attributes, e.g. needs:
>>> op_factory = operation(needs=['a'])
Notice that we specified a fn, in order to get back a FnOp
instance (and not a decorator).
>>> from graphtik import operation, compose
>>> from functools import partial
>>> def mypow(a, p=2):
...     return a ** p
>>> pow_op2 = op_factory.withset(fn=mypow, provides="^2")
>>> pow_op3 = op_factory.withset(fn=partial(mypow, p=3), name='pow_3', provides='^3')
>>> pow_op0 = op_factory.withset(fn=lambda a: 1, name='pow_0', provides='^0')
>>> graphop = compose('powers', pow_op2, pow_op3, pow_op0)
>>> graphop
Pipeline('powers', needs=['a'], provides=['^2', '^3', '^0'], x3 ops:
mypow, pow_3, pow_0)
>>> graphop(a=2)
{'a': 2, '^2': 4, '^3': 8, '^0': 1}
Tip
See Plotting on how to make diagrams like this.
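The builder idea (one shared template, many modified copies) can be approximated in plain Python with an immutable dataclass. This is an analogy only: `OpSpec` and its `withset()` method are hypothetical and merely mimic the copy-with-changes behaviour described above, without claiming to reproduce how FnOp is implemented.

```python
from dataclasses import dataclass, replace
from functools import partial
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class OpSpec:
    """Hypothetical immutable operation description (not Graphtik's FnOp)."""

    needs: Tuple[str, ...]
    fn: Optional[Callable] = None
    name: Optional[str] = None
    provides: Tuple[str, ...] = ()

    def withset(self, **changes) -> "OpSpec":
        # Return a modified copy; the template itself stays untouched.
        return replace(self, **changes)


def mypow(a, p=2):
    return a ** p


template = OpSpec(needs=("a",))
pow2 = template.withset(fn=mypow, name="pow_2", provides=("^2",))
pow3 = template.withset(fn=partial(mypow, p=3), name="pow_3", provides=("^3",))

print(template.fn is None)     # True
print(pow2.fn(2), pow3.fn(2))  # 4 8
```

Sharing the `needs` in one template keeps the per-operation calls short and avoids repeating attributes that never change.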
Decorator specification¶
If you are defining your computation graph and the functions that comprise it all in the same script, the decorator specification of operation instances might be particularly useful, as it allows you to assign computation graph structure to functions as they are defined.
Here’s an example:
>>> from graphtik import operation, compose
>>> @operation(needs=['b', 'a', 'r'], provides='bar')
... def foo(a, b, c):
...     return c * (a + b)
>>> graphop = compose('foo_graph', foo)
Notice that if name is not given, it is deduced from the function name.
Specifying graph structure: provides and needs¶
Each operation is a node in a computation graph, depending on data from other nodes and supplying data to them (via the solution), in order to compute.
This graph structure is specified (mostly) via the provides and needs arguments to the operation() factory, specifically:
needs
this argument names the list of (positionally ordered) input data the operation requires to receive from solution. The list corresponds, roughly, to the arguments of the underlying function (plus any tokens).
It can be a single string, in which case a 1-element iterable is assumed.
- seealso
needs, modifier, FnOp.needs, FnOp._user_needs, FnOp._fn_needs
provides
this argument names the list of (positionally ordered) output data the operation provides into the solution. The list corresponds, roughly, to the returned values of the fn (plus any tokens & aliases).
It can be a single string, in which case a 1-element iterable is assumed.
If more than one is given, the underlying function must return an iterable with the same number of elements (unless it returns a dictionary).
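How returned values pair up with the declared provides can be sketched in plain Python. The helper below is hypothetical (`zip_provides` is not part of Graphtik); it only mirrors the rules stated above: a single name takes the return value as-is, several names require an iterable of equal length, and a dictionary result is looked up by key.

```python
def zip_provides(provides, result, returns_dict=False):
    """Hypothetical helper pairing declared output names with returned values."""
    # A single string is treated as a 1-element list.
    names = [provides] if isinstance(provides, str) else list(provides)
    if returns_dict:
        # The function's dictionary is already keyed by output name.
        return {name: result[name] for name in names}
    if len(names) == 1:
        # One declared output: the returned value is used as-is.
        return {names[0]: result}
    values = list(result)
    if len(values) != len(names):
        raise ValueError(f"expected {len(names)} results, got {len(values)}")
    return dict(zip(names, values))


print(zip_provides("sum", 7))                   # {'sum': 7}
print(zip_provides(["q", "r"], divmod(17, 5)))  # {'q': 3, 'r': 2}
print(zip_provides(["q", "r"], {"q": 3, "r": 2}, returns_dict=True))
```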
Declarations of needs and provides are affected by modifiers like keyword():
Map inputs (& outputs) to differently named function arguments (& results)¶
- graphtik.modifier.keyword(name: str, keyword: Optional[str] = None, accessor: Optional[Accessor] = None, jsonp=None) → _Modifier
Annotate a dependency that maps to a different name in the underlying function.
When used on needs dependencies:
The value of the name dependency is read from the solution, and then that value is passed in the function as a keyword-argument named keyword.
When used on provides dependencies:
The operation must be a returns dictionary operation.
The value keyed with keyword is read from the function’s returned dictionary, and then that value is placed into solution named as name.
- Parameters
keyword – The argument-name corresponding to this named-input. If it is None, it is assumed to be the same as name, so as to always behave like a kw-type arg, and to preserve its fn-name if ever renamed.
accessor – the functions to access values to/from solution (see Accessor) (actually a 2-tuple with functions is ok)
jsonp – None (derived from name), False, str, collection of str/callable (last one). See the generic modify() modifier.
- Returns
a _Modifier instance, even if no keyword is given OR it is the same as name.
Example:
In case the name of a function input argument is different from the name in the graph (or just because the name in the inputs is not a valid argument-name), you may map it with the 2nd argument of keyword():
>>> from graphtik import operation, compose, keyword
>>> @operation(needs=[keyword("name-in-inputs", "fn_name")], provides="result")
... def foo(*, fn_name):  # it works also with non-positional args
...     return fn_name
>>> foo
FnOp(name='foo', needs=['name-in-inputs'(>'fn_name')], provides=['result'], fn='foo')
>>> pipe = compose('map a need', foo)
>>> pipe
Pipeline('map a need', needs=['name-in-inputs'], provides=['result'], x1 ops: foo)
>>> sol = pipe.compute({"name-in-inputs": 4})
>>> sol['result']
4
You can do the same thing to the results of a returns dictionary operation:
>>> op = operation(lambda: {"fn key": 1},
...                name="renaming `provides` with a `keyword`",
...                provides=keyword("graph key", "fn key"),
...                returns_dict=True)
>>> op
FnOp(name='renaming `provides` with a `keyword`', provides=['graph key'(>'fn key')], fn{}='<lambda>')
Hint
Mapping provides names wouldn’t make sense for regular operations, since these are defined arbitrarily at the operation level. OTOH, the result names of returns dictionary operation are decided by the underlying function, which may lie beyond the control of the user (e.g. from a 3rd-party object).
Operations may execute with missing inputs¶
- graphtik.modifier.optional(name: str, keyword: Optional[str] = None, accessor: Optional[Accessor] = None, jsonp=None) → _Modifier
Annotate optionals needs corresponding to defaulted op-function arguments, …
received only if present in the inputs (when operation is invoked).
The value of an optional dependency is passed in as a keyword argument to the underlying function.
- Parameters
keyword – the name of the function argument it corresponds to; if a falsy value is given, it is assumed to be the same as name, so as to always behave like a kw-type arg and to preserve its fn-name if ever renamed.
accessor – the functions to access values to/from solution (see Accessor) (actually a 2-tuple with functions is ok)
jsonp – None (derived from name), False, str, collection of str/callable (last one). See the generic modify() modifier.
Example:
>>> from graphtik import operation, compose, optional
>>> @operation(name='myadd',
...            needs=["a", optional("b")],
...            provides="sum")
... def myadd(a, b=0):
...     return a + b
Notice the default value 0 of the b argument, which is annotated as optional:
>>> graph = compose('mygraph', myadd)
>>> graph
Pipeline('mygraph', needs=['a', 'b'(?)], provides=['sum'], x1 ops: myadd)
The graph works both with and without b provided in the inputs:
>>> graph(a=5, b=4)['sum']
9
>>> graph(a=5)
{'a': 5, 'sum': 5}
Like keyword(), you may map the input-name to a different function-argument:
>>> operation(needs=['a', optional("quasi-real", "b")],
...           provides="sum"
... )(myadd.fn)  # Cannot wrap an operation, its `fn` only.
FnOp(name='myadd', needs=['a', 'quasi-real'(?'b')], provides=['sum'], fn='myadd')
Calling functions with varargs (*args)¶
- graphtik.modifier.vararg(name: str, accessor: Optional[Accessor] = None, jsonp=None) → _Modifier
Annotate a varargish needs to be fed as function’s *args.
See also
Consult also the example test-case in: test/test_op.py:test_varargs(), in the full sources of the project.
Example:
We designate b & c as vararg arguments:
>>> from graphtik import operation, compose, vararg
>>> @operation(
...     needs=['a', vararg('b'), vararg('c')],
...     provides='sum'
... )
... def addall(a, *b):
...     return a + sum(b)
>>> addall
FnOp(name='addall', needs=['a', 'b'(*), 'c'(*)], provides=['sum'], fn='addall')
>>> graph = compose('mygraph', addall)
The graph works with and without any of b or c inputs:
>>> graph(a=5, b=2, c=4)['sum']
11
>>> graph(a=5, b=2)
{'a': 5, 'b': 2, 'sum': 7}
>>> graph(a=5)
{'a': 5, 'sum': 5}
- graphtik.modifier.varargs(name: str, accessor: Optional[Accessor] = None, jsonp=None) → _Modifier
A varargish vararg(), naming an iterable value in the inputs.
See also
Consult also the example test-case in: test/test_op.py:test_varargs(), in the full sources of the project.
Example:
>>> from graphtik import operation, compose, varargs
>>> def enlist(a, *b):
...     return [a] + list(b)
>>> graph = compose('mygraph',
...     operation(name='enlist', needs=['a', varargs('b')],
...               provides='sum')(enlist)
... )
>>> graph
Pipeline('mygraph', needs=['a', 'b'(?)], provides=['sum'], x1 ops: enlist)
The graph works with or without b in the inputs:
>>> graph(a=5, b=[2, 20])['sum']
[5, 2, 20]
>>> graph(a=5)
{'a': 5, 'sum': [5]}
>>> graph(a=5, b=0xBAD)
Traceback (most recent call last):
ValueError: Failed matching inputs <=> needs for FnOp(name='enlist', needs=['a', 'b'(+)], provides=['sum'], fn='enlist'):
    1. Expected varargs inputs to be non-str iterables: {'b'(+): 2989}
    +++inputs: ['a', 'b']
Attention
To avoid user mistakes, varargs do not accept str inputs (even though they are iterables):
>>> graph(a=5, b="mistake")
Traceback (most recent call last):
ValueError: Failed matching inputs <=> needs for FnOp(name='enlist', needs=['a', 'b'(+)], provides=['sum'], fn='enlist'):
    1. Expected varargs inputs to be non-str iterables: {'b'(+): 'mistake'}
    +++inputs: ['a', 'b']
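The difference between the two modifiers can be sketched without Graphtik: a vararg-style name contributes one value to `*args`, while a varargs-style name contributes every element of a non-str iterable. The helper `collect_star_args` below is hypothetical and only imitates the behaviour shown in the examples above.

```python
from collections.abc import Iterable


def collect_star_args(inputs, vararg_names=(), varargs_names=()):
    """Hypothetical helper gathering the values destined for a function's *args."""
    star_args = []
    for name in vararg_names:
        if name in inputs:
            # vararg: the input itself becomes one positional value.
            star_args.append(inputs[name])
    for name in varargs_names:
        if name in inputs:
            value = inputs[name]
            # varargs: the input must be a non-str iterable, which is unpacked.
            if isinstance(value, str) or not isinstance(value, Iterable):
                raise ValueError(
                    f"Expected varargs input {name!r} to be a non-str iterable: {value!r}"
                )
            star_args.extend(value)
    return star_args


def enlist(a, *b):
    return [a] + list(b)


print(enlist(5, *collect_star_args({"b": 2, "c": 4}, vararg_names=["b", "c"])))  # [5, 2, 4]
print(enlist(5, *collect_star_args({"b": [2, 20]}, varargs_names=["b"])))        # [5, 2, 20]
print(enlist(5, *collect_star_args({}, varargs_names=["b"])))                    # [5]
```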
See also
The elaborate example in Hierarchical data and further tricks section.
Interface differently named dependencies: aliases & keyword modifier¶
Sometimes, you need to interface functions & operations where they name a dependency differently. There are 4 different ways to accomplish that:
Introduce some “pipe-through” operation (see the example in Default conveyor operation, below).
Annotate certain needs with a keyword() modifier (exemplified in the modifier).
For a returns dictionary operation, annotate certain provides with a keyword() modifier (exemplified in the modifier).
Alias (clone) certain provides to different names:
>>> op = operation(str,
...                name="cloning `provides` with an `alias`",
...                provides="real thing",
...                aliases={"real thing": "clone"})
Default conveyor operation¶
If you don’t specify a callable, the default identity function gets assigned, as long as a name for the operation is given and the number of needs matches the number of provides.
This facilitates conveying inputs into renamed outputs without the need to define a trivial identity function matching the needs & provides each time:
>>> from graphtik import keyword, optional, vararg
>>> op = operation(
... None,
... name="a",
... needs=[optional("opt"), vararg("vararg"), "pos", keyword("kw")],
... # positional vararg, keyword, optional
... provides=["pos", "vararg", "kw", "opt"],
... )
>>> op(opt=5, vararg=6, pos=7, kw=8)
{'pos': 7, 'vararg': 6, 'kw': 5, 'opt': 8}
Notice that the order of the results is not that of the needs (or that of the inputs in the compute() method), but, as explained in the comment-line, it follows Python semantics.
Considerations for when building pipelines¶
When many operations are composed into a computation graph, Graphtik matches up the values in their needs and provides to form the edges of that graph (see Pipelines for more on that), like the operations from the sample formula (1) in Quick start section:
>>> from operator import mul, sub
>>> from functools import partial
>>> from graphtik import compose, operation
>>> def abspow(a, p):
...     """Compute |a|^p. """
...     c = abs(a) ** p
...     return c
>>> # Compose the mul, sub, and abspow operations into a computation graph.
>>> graphop = compose("graphop",
... operation(mul, needs=["α", "β"], provides=["α×β"]),
... operation(sub, needs=["α", "α×β"], provides=["α-α×β"]),
... operation(name="abspow1", needs=["α-α×β"], provides=["|α-α×β|³"])
... (partial(abspow, p=3))
... )
>>> graphop
Pipeline('graphop',
needs=['α', 'β', 'α×β', 'α-α×β'],
provides=['α×β', 'α-α×β', '|α-α×β|³'],
x3 ops: mul, sub, abspow1)
Notice the use of functools.partial() to set parameter p to a constant value. The partial is wrapped by calling once more the “decorator” that operation() returns when called without a function.
The needs and provides arguments to the operations in this script define a computation graph that looks like this:
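The edges of that graph can also be derived by hand, by matching each need against the operation that provides it. The sketch below uses plain dictionaries in place of real operations; the `ops` table and the variable names are illustrative assumptions, not Graphtik's internal data structures.

```python
# Needs and provides copied from the three operations composed above.
ops = {
    "mul": {"needs": ["α", "β"], "provides": ["α×β"]},
    "sub": {"needs": ["α", "α×β"], "provides": ["α-α×β"]},
    "abspow1": {"needs": ["α-α×β"], "provides": ["|α-α×β|³"]},
}

# Map every provided name to the operation producing it.
producers = {
    name: op_name for op_name, spec in ops.items() for name in spec["provides"]
}

# An edge exists wherever one operation's need is another one's provide.
edges = [
    (producers[name], name, op_name)
    for op_name, spec in ops.items()
    for name in spec["needs"]
    if name in producers
]

# Names needed by some operation but produced by none must come from the user.
graph_inputs = sorted(
    {name for spec in ops.values() for name in spec["needs"]} - set(producers)
)

print(edges)         # [('mul', 'α×β', 'sub'), ('sub', 'α-α×β', 'abspow1')]
print(graph_inputs)  # ['α', 'β']
```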