1. Operations
At a high level, an operation is a node in a computation graph. Graphtik uses an Operation class to abstractly represent these computations. The class specifies the requirements for a function to participate in a computation graph: its input-data needs and the output-data it provides.
The FunctionalOperation provides a lightweight wrapper around an arbitrary function to define those specifications.
class graphtik.op.Operation(name, needs=None, provides=None)
An abstract class representing a data transformation by compute().

compute(named_inputs, outputs=None)
Compute (optional) asked outputs for the given named_inputs.
It is called by Network. End-users should simply call the operation with named_inputs as kwargs.
Parameters: named_inputs (list) – A list of Data objects on which to run the layer’s feed-forward computation.
Returns: list – Should return a list of values representing the results of running the feed-forward computation on inputs.
There is a better way to instantiate a FunctionalOperation than simply constructing it, and we’ll get to it later.
First off, though, here are the specifications for the operation classes:
class graphtik.op.FunctionalOperation(fn: Callable, name, needs=None, provides=None, *, returns_dict=None)
An Operation performing a callable (i.e. function, method, lambda).
Use the operation() factory to build instances of this class instead.

__init__(fn: Callable, name, needs=None, provides=None, *, returns_dict=None)
Create a new layer instance. Names may be given to this layer and its inputs and outputs. This is important when connecting layers and data in a Network object, as the names are used to construct the graph.
Parameters:
- name (str) – The name of the operation (e.g. conv1, conv2, etc.)
- needs (list) – Names of input data objects this layer requires.
- provides (list) – Names of output data objects this layer provides.

compute(named_inputs, outputs=None) → dict
Compute (optional) asked outputs for the given named_inputs.
It is called by Network. End-users should simply call the operation with named_inputs as kwargs.
Parameters: named_inputs (list) – A list of Data objects on which to run the layer’s feed-forward computation.
Returns: list – Should return a list of values representing the results of running the feed-forward computation on inputs.

__call__(*args, **kwargs)
Call self as a function.
Operations are just functions
At the heart of each operation is just a function, any arbitrary function.
Indeed, you can instantiate an operation with a function and then call it just like the original function, e.g.:
>>> from operator import add
>>> from graphtik import operation
>>> add_op = operation(name='add_op', needs=['a', 'b'], provides=['a_plus_b'])(add)
>>> add_op(3, 4) == add(3, 4)
True
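To make the "just a function" idea concrete, here is a minimal sketch of a callable wrapper that carries graph metadata alongside a function. The `ToyOperation` class is hypothetical and written only for illustration; it is not graphtik's `FunctionalOperation`, which does considerably more.

```python
from dataclasses import dataclass, field
from operator import add
from typing import Callable, List


@dataclass
class ToyOperation:
    """Hypothetical stand-in for an operation: a function plus graph metadata."""
    fn: Callable
    name: str
    needs: List[str] = field(default_factory=list)
    provides: List[str] = field(default_factory=list)

    def __call__(self, *args, **kwargs):
        # Calling the wrapper simply forwards to the wrapped function.
        return self.fn(*args, **kwargs)


add_op = ToyOperation(add, name="add_op", needs=["a", "b"], provides=["a_plus_b"])
print(add_op(3, 4) == add(3, 4))    # True
print(add_op.needs, add_op.provides)
```

The metadata (`needs`, `provides`) is inert here; it only becomes meaningful once something reads it to wire operations together.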
Specifying graph structure: provides and needs
Of course, each operation is more than just a function.
It is a node in a computation graph, depending on other nodes in the graph for input data and supplying output data that may be used by other nodes in the graph (or as a graph output).
This graph structure is specified via the provides and needs arguments to the operation constructor. Specifically:
- provides: this argument names the outputs (i.e. the returned values) of a given operation. If multiple outputs are specified by provides, then the function comprising the operation must return an iterable.
- needs: this argument names data that is needed as input by a given operation. Each piece of data named in needs may either be provided by another operation in the same graph (i.e. specified in the provides argument of that operation), or it may be specified as a named input to a graph computation (more on graph computations here).
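The rule that several provides require an iterable return value can be sketched as a zip between the provides names and the returned items. The helper `name_outputs` below is hypothetical and only illustrates the idea; graphtik's own bookkeeping may differ (for instance, it also supports functions returning a dict via `returns_dict`).

```python
from typing import Any, Dict, List


def name_outputs(provides: List[str], result: Any) -> Dict[str, Any]:
    """Hypothetical helper: pair each name in `provides` with a returned value."""
    if len(provides) == 1:
        # A single output: the whole return value is that output.
        return {provides[0]: result}
    values = list(result)  # several outputs: the function must return an iterable
    if len(values) != len(provides):
        raise ValueError(f"expected {len(provides)} outputs, got {len(values)}")
    return dict(zip(provides, values))


print(name_outputs(["quotient", "remainder"], divmod(17, 5)))
# {'quotient': 3, 'remainder': 2}
print(name_outputs(["total"], 42))
# {'total': 42}
```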
When many operations are composed into a computation graph (see Graph Composition for more on that), Graphtik matches up the values in their needs and provides to form the edges of that graph.
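The matching of needs and provides into edges can be sketched with plain data. Below, the needs/provides of the Quick start operations are copied into a dictionary, and a hypothetical `graph_edges` function derives a bipartite graph (data → operation for needs, operation → data for provides). This is an illustration of the idea, not graphtik's internal network builder.

```python
from typing import Dict, List, Set, Tuple

# needs/provides of the Quick start operations, as plain data (hypothetical layout).
specs: Dict[str, Tuple[List[str], List[str]]] = {
    "mul1": (["a", "b"], ["ab"]),
    "sub1": (["a", "ab"], ["a_minus_ab"]),
    "abspow1": (["a_minus_ab"], ["abs_a_minus_ab_cubed"]),
}


def graph_edges(specs) -> Set[Tuple[str, str]]:
    """Return bipartite edges: data -> operation for needs, operation -> data for provides."""
    edges = set()
    for op_name, (needs, provides) in specs.items():
        edges.update((data, op_name) for data in needs)
        edges.update((op_name, data) for data in provides)
    return edges


edges = graph_edges(specs)
# "ab" links mul1 to sub1: one provides it, the other needs it.
print(("mul1", "ab") in edges and ("ab", "sub1") in edges)  # True
# Data that no operation provides must be given as graph inputs.
provided = {d for _, provs in specs.values() for d in provs}
inputs = sorted({d for needs, _ in specs.values() for d in needs} - provided)
print(inputs)  # ['a', 'b']
```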
Let’s look again at the operations from the script in Quick start, for example:
>>> from operator import mul, sub
>>> from functools import partial
>>> from graphtik import compose, operation
>>> # Computes |a|^p.
>>> def abspow(a, p):
... c = abs(a) ** p
... return c
>>> # Compose the mul, sub, and abspow operations into a computation graph.
>>> graphop = compose("graphop",
... operation(name="mul1", needs=["a", "b"], provides=["ab"])(mul),
... operation(name="sub1", needs=["a", "ab"], provides=["a_minus_ab"])(sub),
... operation(name="abspow1", needs=["a_minus_ab"], provides=["abs_a_minus_ab_cubed"])
... (partial(abspow, p=3))
... )
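To see what this graph computes, the sketch below runs the same three functions with a deliberately naive scheduler: it repeatedly executes any operation whose needs are all available, feeding needs positionally in the order listed. The `run` function, the tuple layout of `ops`, and the input values `a=2, b=5` are all assumptions made for illustration; graphtik plans and executes its network differently.

```python
from functools import partial
from operator import mul, sub


def abspow(a, p):
    return abs(a) ** p


# Hypothetical plain-data copy of the three operations: (function, needs, provides).
ops = [
    (mul, ["a", "b"], ["ab"]),
    (sub, ["a", "ab"], ["a_minus_ab"]),
    (partial(abspow, p=3), ["a_minus_ab"], ["abs_a_minus_ab_cubed"]),
]


def run(ops, inputs):
    """Naive scheduler: repeatedly run any operation whose needs are all available."""
    values = dict(inputs)
    pending = list(ops)
    while pending:
        ready = [op for op in pending if all(n in values for n in op[1])]
        if not ready:
            raise ValueError("unsatisfied needs")
        for op in ready:
            fn, needs, provides = op
            # Needs are fed positionally; this sketch supports one output per op.
            values[provides[0]] = fn(*(values[n] for n in needs))
            pending.remove(op)
    return values


print(run(ops, {"a": 2, "b": 5}))
# {'a': 2, 'b': 5, 'ab': 10, 'a_minus_ab': -8, 'abs_a_minus_ab_cubed': 512}
```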
Tip
Notice the use of functools.partial() to set parameter p to a constant value.
The needs and provides arguments to the operations in this script define a computation graph that looks like this (where the ovals are operations, squares/houses are data):
[Figure: computation graph of the mul1, sub1 and abspow1 operations]
Tip
See Plotting on how to make diagrams like this.
Instantiating operations
There are several ways to instantiate an operation, each of which might be more suitable for different scenarios.
Decorator specification
If you are defining your computation graph and the functions that comprise it all in the same script, the decorator specification of operation instances might be particularly useful, as it allows you to assign computation graph structure to functions as they are defined. Here’s an example:
>>> from graphtik import operation, compose
>>> @operation(name='foo_op', needs=['a', 'b', 'c'], provides='foo')
... def foo(a, b, c):
... return c * (a + b)
>>> graphop = compose('foo_graph', foo)
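The mechanics of a parameterized decorator can be sketched in a few lines. The `toy_operation` decorator factory below is hypothetical: it merely attaches the graph metadata as attributes and returns the original function, whereas graphtik's operation decorator returns a FunctionalOperation instance.

```python
from typing import Callable, List, Union


def toy_operation(name: str, needs: List[str], provides: Union[str, List[str]]):
    """Hypothetical decorator factory: attach graph metadata to a function."""
    if isinstance(provides, str):
        provides = [provides]  # a single string names a single output

    def decorate(fn: Callable) -> Callable:
        fn.op_name, fn.needs, fn.provides = name, list(needs), list(provides)
        return fn  # the function stays directly callable

    return decorate


@toy_operation(name="foo_op", needs=["a", "b", "c"], provides="foo")
def foo(a, b, c):
    return c * (a + b)


print(foo(1, 2, 3), foo.needs, foo.provides)  # 9 ['a', 'b', 'c'] ['foo']
```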
Functional specification
If the functions underlying your computation graph operations are defined elsewhere than the script in which your graph itself is defined (e.g. they are defined in another module, or they are system functions), you can use the functional specification of operation instances:
>>> from operator import add, mul
>>> from graphtik import operation, compose
>>> add_op = operation(name='add_op', needs=['a', 'b'], provides='sum')(add)
>>> mul_op = operation(name='mul_op', needs=['c', 'sum'], provides='product')(mul)
>>> graphop = compose('add_mul_graph', add_op, mul_op)
The functional specification is also useful if you want to create multiple operation instances from the same function, perhaps with different parameter values, e.g.:
>>> from functools import partial
>>> def mypow(a, p=2):
... return a ** p
>>> pow_op1 = operation(name='pow_op1', needs=['a'], provides='a_squared')(mypow)
>>> pow_op2 = operation(name='pow_op2', needs=['a'], provides='a_cubed')(partial(mypow, p=3))
>>> graphop = compose('two_pows_graph', pow_op1, pow_op2)
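In plain Python, the two functions backing pow_op1 and pow_op2 behave as follows; this sketch only exercises functools.partial and does not involve graphtik. The names `square` and `cube` are introduced here for illustration.

```python
from functools import partial


def mypow(a, p=2):
    return a ** p


# One function, two parameterizations: each could back its own operation.
square = mypow                # uses the default p=2
cube = partial(mypow, p=3)    # p frozen to 3

print(square(2), cube(2))                 # 4 8
print(cube.func is mypow, cube.keywords)  # True {'p': 3}
```

Because partial keeps a reference to the original function (`cube.func`), both operations share the same implementation and differ only in the frozen keyword.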
A slightly different approach can be used here to accomplish the same effect by creating an operation “builder pattern”:
>>> def mypow(a, p=2):
... return a ** p
>>> pow_op_factory = operation(mypow, needs=['a'], provides='a_squared')
>>> pow_op1 = pow_op_factory(name='pow_op1')
>>> pow_op2 = pow_op_factory.withset(name='pow_op2', provides='a_cubed')(partial(mypow, p=3))
>>> pow_op3 = pow_op_factory(lambda a: 1, name='pow_op0')
>>> graphop = compose('two_pows_graph', pow_op1, pow_op2, pow_op3)
>>> graphop(a=2)
{'a': 2, 'a_cubed': 8, 'a_squared': 4}
Note
You cannot call the factory again to overwrite the function; you have to either use the fn= keyword with the withset() method, or call it once more.
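The builder pattern itself can be sketched independently of graphtik. The `ToyOpBuilder` class below is hypothetical: as a design choice of this sketch, its `withset()` returns a modified copy (via dataclasses.replace) so the base builder is never mutated. That is not a claim about how graphtik's own operation factory handles state; consult its API reference for that.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class ToyOpBuilder:
    """Hypothetical immutable builder: collects settings, then wraps a function."""
    fn: Optional[Callable] = None
    name: Optional[str] = None
    needs: tuple = ()
    provides: tuple = ()

    def withset(self, **changes) -> "ToyOpBuilder":
        # Return a modified copy; the original builder is left untouched.
        return replace(self, **changes)

    def build(self) -> dict:
        if self.fn is None or self.name is None:
            raise ValueError("fn and name are required")
        return {"name": self.name, "fn": self.fn,
                "needs": self.needs, "provides": self.provides}


def mypow(a, p=2):
    return a ** p


base = ToyOpBuilder(fn=mypow, needs=("a",), provides=("a_squared",))
op1 = base.withset(name="pow_op1").build()
op2 = base.withset(name="pow_op2", provides=("a_cubed",),
                   fn=lambda a: mypow(a, 3)).build()
print(op1["provides"], op1["fn"](2))  # ('a_squared',) 4
print(op2["provides"], op2["fn"](2))  # ('a_cubed',) 8
print(base.name)                      # None: the base builder was not mutated
```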
Modifiers on operation inputs and outputs
Certain modifiers are available to apply to input or output values in needs and provides, for example, to designate optional inputs, or “ghost” sideffect inputs & outputs.
These modifiers are available in the graphtik.modifiers module:
Optionals
class graphtik.modifiers.optional
An optional need signifies that the function’s argument may not receive a value.

Only input values in needs may be designated as optional using this modifier. An operation will receive a value for an optional need only if it is available in the graph at the time of its invocation. The operation’s function should have a defaulted parameter with the same name as the optional, and the input value will be passed as a keyword argument, if it is available.

Here is an example of an operation that uses an optional argument:

>>> from graphtik import operation, compose, optional
>>> def myadd(a, b, c=0):
...     return a + b + c

Designate c as an optional argument:

>>> graph = compose('mygraph',
...     operation(name='myadd', needs=['a', 'b', optional('c')], provides='sum')(myadd)
... )
>>> graph
NetworkOperation(name='mygraph', needs=['a', 'b', optional('c')], provides=['sum'])

The graph works with and without c provided as input:

>>> graph(a=5, b=2, c=4)['sum']
11
>>> graph(a=5, b=2)
{'a': 5, 'b': 2, 'sum': 7}
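The calling convention described above (required needs always passed, optional ones passed as keyword arguments only when available) can be sketched as follows. The helper `call_with_optionals` is hypothetical and mimics only the described behavior, not graphtik's actual code path.

```python
from typing import Any, Callable, Dict, List


def call_with_optionals(fn: Callable, needs: List[str], optionals: List[str],
                        available: Dict[str, Any]) -> Any:
    """Hypothetical sketch: required needs go positionally, optional ones as
    keyword arguments only when a value is available."""
    args = [available[n] for n in needs]  # KeyError if a required need is missing
    kwargs = {n: available[n] for n in optionals if n in available}
    return fn(*args, **kwargs)


def myadd(a, b, c=0):
    return a + b + c


print(call_with_optionals(myadd, ["a", "b"], ["c"], {"a": 5, "b": 2, "c": 4}))  # 11
print(call_with_optionals(myadd, ["a", "b"], ["c"], {"a": 5, "b": 2}))          # 7
```

When c is absent, the function's own default (`c=0`) takes over, which is why the defaulted parameter is required.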
Varargs
class graphtik.modifiers.vararg
Like optional, but feeds as ONE OF the *args into the function (instead of **kwargs).

For instance:

>>> from graphtik import operation, compose, vararg
>>> def addall(a, *b):
...     return a + sum(b)

Designate b & c as vararg arguments:

>>> graph = compose('mygraph',
...     operation(name='addall', needs=['a', vararg('b'), vararg('c')],
...               provides='sum')(addall)
... )
>>> graph
NetworkOperation(name='mygraph', needs=['a', optional('b'), optional('c')], provides=['sum'])

The graph works with and without any of the b and c inputs:

>>> graph(a=5, b=2, c=4)['sum']
11
>>> graph(a=5, b=2)
{'a': 5, 'b': 2, 'sum': 7}
>>> graph(a=5)
{'a': 5, 'sum': 5}
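The vararg behavior (each available value appended as one extra positional argument, missing ones silently skipped) can be sketched with a hypothetical helper `call_with_vararg`; it reproduces the three results shown above but is not graphtik's implementation.

```python
def call_with_vararg(fn, needs, varargs, available):
    """Hypothetical sketch: each available vararg value becomes ONE extra positional arg."""
    args = [available[n] for n in needs]
    args += [available[n] for n in varargs if n in available]  # missing ones are skipped
    return fn(*args)


def addall(a, *b):
    return a + sum(b)


print(call_with_vararg(addall, ["a"], ["b", "c"], {"a": 5, "b": 2, "c": 4}))  # 11
print(call_with_vararg(addall, ["a"], ["b", "c"], {"a": 5, "b": 2}))          # 7
print(call_with_vararg(addall, ["a"], ["b", "c"], {"a": 5}))                  # 5
```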
class graphtik.modifiers.varargs
An optional like vararg, but feeds its value as MANY *args into the function (instead of **kwargs).

Read also the example test-case in: test/test_op.py:test_varargs()
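One reading of "feeds as MANY *args" is that a single input holding an iterable gets spread into several positional arguments. The sketch below illustrates that interpretation with a hypothetical helper `call_with_varargs` and a hypothetical input name `bs`; the test-case cited above remains the authority on graphtik's exact semantics.

```python
def call_with_varargs(fn, needs, varargs_name, available):
    """Hypothetical sketch: a single input holding an iterable is spread into
    MANY positional arguments, if it is available."""
    args = [available[n] for n in needs]
    if varargs_name in available:
        args.extend(available[varargs_name])  # each item becomes its own *arg
    return fn(*args)


def addall(a, *b):
    return a + sum(b)


print(call_with_varargs(addall, ["a"], "bs", {"a": 5, "bs": [2, 4]}))  # 11
print(call_with_varargs(addall, ["a"], "bs", {"a": 5}))                # 5
```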
Sideffects
class graphtik.modifiers.sideffect
A sideffect data-dependency participates in the graph but is never given to, or asked from, functions.

Both inputs & outputs in needs & provides may be designated as sideffects using this modifier. Sideffects work as usual while solving the graph, but they do not interact with the operation’s function; specifically:

- input sideffects are NOT fed into the function;
- output sideffects are NOT expected from the function.

Their purpose is to describe operations that modify the internal state of some of their arguments (“side-effects”). A typical use case is to signify columns required to produce new ones in pandas dataframes:

>>> import pandas as pd
>>> from graphtik import operation, compose, sideffect
>>> # Function appending a new dataframe column from two pre-existing ones.
>>> def addcolumns(df):
...     df['sum'] = df['a'] + df['b']

Designate the b & sum column names as sideffect arguments:

>>> graph = compose('mygraph',
...     operation(
...         name='addcolumns',
...         needs=['df', sideffect('df.b')],  # sideffect names can be anything
...         provides=[sideffect('df.sum')])(addcolumns)
... )
>>> graph
NetworkOperation(name='mygraph', needs=['df', 'sideffect(df.b)'], provides=['sideffect(df.sum)'])
>>> df = pd.DataFrame({'a': [5, 0], 'b': [2, 1]})
>>> graph({'df': df})['df']
   a  b
0  5  2
1  0  1

We didn’t get the sum column because the b sideffect was unsatisfied. We have to add its key to the inputs (with any value):

>>> graph({'df': df, sideffect("df.b"): 0})['df']
   a  b  sum
0  5  2    7
1  0  1    1
Note that regular data in needs and provides do not match same-named sideffects. That is, in the following operation, the price input is different from the sideffect(price) output:

>>> def upd_prices(sales_df, prices):
...     sales_df["Prices"] = prices
>>> operation(fn=upd_prices,
...           name="upd_prices",
...           needs=["sales_df", "price"],
...           provides=[sideffect("price")])
operation(name='upd_prices', needs=['sales_df', 'price'], provides=['sideffect(price)'], fn='upd_prices')
Note
An operation with sideffect outputs only has a function that returns no value at all (like the one above). Such an operation is still called for its side-effects.

Tip
You may associate sideffects with other data to convey their relationships, simply by including their names in the string (in the end, it’s just a string), but no enforcement will happen from graphtik.

>>> sideffect("price[sales_df]")
'sideffect(price[sales_df])'