1. Operations

At a high level, an operation is a node in a computation graph. Graphtik uses an Operation class to abstractly represent these computations. The class specifies the requirements for a function to participate in a computation graph: its input-data needs and the output-data it provides.

The FunctionalOperation provides a lightweight wrapper around an arbitrary function to define those specifications.
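As a rough mental model, such a wrapper pairs a function with the names of the data it consumes and produces. The sketch below uses a hypothetical `ToyOperation` class written in plain Python without graphtik. It is not the library's implementation; it only illustrates the needs/provides contract, assuming needs are fed positionally and results are mapped onto provides.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ToyOperation:
    """Hypothetical stand-in for an operation: a function plus its data names."""

    fn: Callable[..., Any]
    name: str
    needs: List[str] = field(default_factory=list)
    provides: List[str] = field(default_factory=list)

    def compute(self, named_inputs: Dict[str, Any]) -> Dict[str, Any]:
        # Pick the needed values by name and feed them positionally, in `needs` order.
        result = self.fn(*(named_inputs[n] for n in self.needs))
        if len(self.provides) == 1:
            return {self.provides[0]: result}
        # Several provides: the function must return an iterable of matching length.
        values = list(result)
        if len(values) != len(self.provides):
            raise ValueError(
                f"{self.name}: expected {len(self.provides)} results, got {len(values)}"
            )
        return dict(zip(self.provides, values))


op = ToyOperation(divmod, "divmod_op", needs=["x", "y"], provides=["quot", "rem"])
print(op.compute({"x": 17, "y": 5}))  # {'quot': 3, 'rem': 2}
```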

class graphtik.op.Operation(name, needs=None, provides=None)

An abstract class representing a data transformation by compute().

compute(named_inputs, outputs=None)

Compute the (optionally) asked outputs for the given named_inputs.

It is called by Network. End-users should simply call the operation with named_inputs as kwargs.

Parameters: named_inputs (list) – A list of Data objects on which to run the layer’s feed-forward computation.
Returns: list – a list of values representing the results of running the feed-forward computation on inputs.

There is a better way to instantiate a FunctionalOperation than simply constructing it, and we’ll get to it later. First off, though, here are the specifications for the operation classes:

class graphtik.op.FunctionalOperation(fn: Callable, name, needs=None, provides=None, *, returns_dict=None)

An Operation performing a callable (i.e. a function, method, or lambda).

Use the operation() factory to build instances of this class instead.

__init__(fn: Callable, name, needs=None, provides=None, *, returns_dict=None)

Create a new layer instance. Names may be given to this layer and its inputs and outputs. This is important when connecting layers and data in a Network object, as the names are used to construct the graph.

Parameters:
  • name (str) – The name of the operation (e.g. conv1, conv2, etc.)
  • needs (list) – Names of input data objects this layer requires.
  • provides (list) – Names of output data objects this layer provides.
compute(named_inputs, outputs=None) → dict

Compute the (optionally) asked outputs for the given named_inputs.

It is called by Network. End-users should simply call the operation with named_inputs as kwargs.

Parameters: named_inputs (list) – A list of Data objects on which to run the layer’s feed-forward computation.
Returns: list – a list of values representing the results of running the feed-forward computation on inputs.
__call__(*args, **kwargs)

Call self as a function.

Operations are just functions

At the heart of each operation is just a function, any arbitrary function. Indeed, you can instantiate an operation with a function and then call it just like the original function, e.g.:

>>> from operator import add
>>> from graphtik import operation

>>> add_op = operation(name='add_op', needs=['a', 'b'], provides=['a_plus_b'])(add)

>>> add_op(3, 4) == add(3, 4)
True
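One way a wrapper can behave like this is by forwarding its __call__ unchanged to the wrapped function. The hypothetical `CallableOp` below sketches that design in plain Python; graphtik's FunctionalOperation documents a __call__ method too, but this is not its code.

```python
from operator import add
from typing import Any, Callable, Sequence


class CallableOp:
    """Hypothetical wrapper: graph metadata plus a pass-through __call__."""

    def __init__(self, fn: Callable[..., Any], name: str,
                 needs: Sequence[str], provides: Sequence[str]) -> None:
        self.fn = fn
        self.name = name
        self.needs = list(needs)
        self.provides = list(provides)

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        # Calling the wrapper is exactly like calling the original function.
        return self.fn(*args, **kwargs)


add_op = CallableOp(add, "add_op", needs=["a", "b"], provides=["a_plus_b"])
print(add_op(3, 4) == add(3, 4))  # True
```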

Specifying graph structure: provides and needs

Of course, each operation is more than just a function. It is a node in a computation graph, depending on other nodes in the graph for input data and supplying output data that may be used by other nodes in the graph (or as a graph output). This graph structure is specified via the provides and needs arguments to the operation constructor. Specifically:

  • provides: this argument names the outputs (i.e. the returned values) of a given operation. If multiple outputs are specified by provides, then the function comprising the operation must return an iterable.
  • needs: this argument names data that is needed as input by a given operation. Each piece of data named in needs may either be provided by another operation in the same graph (i.e. specified in the provides argument of that operation), or it may be specified as a named input to a graph computation (graph computations are described further in Graph Composition).

When many operations are composed into a computation graph (see Graph Composition for more on that), Graphtik matches up the values in their needs and provides to form the edges of that graph.
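The matching can be pictured as a bipartite graph: every name in needs becomes a data→operation edge, every name in provides becomes an operation→data edge, and a data name that one operation provides and another needs links the two. The sketch below is plain Python over hypothetical (name, needs, provides) tuples, not graphtik's internal graph representation.

```python
from typing import Iterable, List, Set, Tuple

# (operation name, needs, provides): hypothetical declarations for illustration.
Op = Tuple[str, List[str], List[str]]
OPS: List[Op] = [
    ("add_op", ["a", "b"], ["sum"]),
    ("mul_op", ["c", "sum"], ["product"]),
]


def build_edges(ops: Iterable[Op]) -> Set[Tuple[str, str]]:
    """Directed edges: data -> operation for needs, operation -> data for provides."""
    edges: Set[Tuple[str, str]] = set()
    for name, needs, provides in ops:
        edges.update((data, name) for data in needs)
        edges.update((name, data) for data in provides)
    return edges


def free_inputs(ops: Iterable[Op]) -> Set[str]:
    """Data needed by some operation but provided by none: must be given as inputs."""
    ops = list(ops)
    needed = {d for _, needs, _ in ops for d in needs}
    provided = {d for _, _, provides in ops for d in provides}
    return needed - provided


edges = build_edges(OPS)
print(("add_op", "sum") in edges and ("sum", "mul_op") in edges)  # True
print(sorted(free_inputs(OPS)))                                   # ['a', 'b', 'c']
```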

Let’s look again at the operations from the script in Quick start, for example:

>>> from operator import mul, sub
>>> from functools import partial
>>> from graphtik import compose, operation

>>> # Computes |a|^p.
>>> def abspow(a, p):
...   c = abs(a) ** p
...   return c

>>> # Compose the mul, sub, and abspow operations into a computation graph.
>>> graphop = compose("graphop",
...    operation(name="mul1", needs=["a", "b"], provides=["ab"])(mul),
...    operation(name="sub1", needs=["a", "ab"], provides=["a_minus_ab"])(sub),
...    operation(name="abspow1", needs=["a_minus_ab"], provides=["abs_a_minus_ab_cubed"])
...    (partial(abspow, p=3))
... )

Tip

Notice the use of functools.partial() to set parameter p to a constant value.

The needs and provides arguments to the operations in this script define a computation graph that looks like this (where ovals are operations and squares/houses are data):

[Figure: barebone_3ops.svg, the computation graph of the mul1, sub1 and abspow1 operations]

Tip

See Plotting on how to make diagrams like this.
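To trace the Quick start graph by hand, the sketch below runs the same three functions in dependency order in plain Python, without graphtik; the inputs a=2 and b=5 are arbitrary. Assuming graphtik evaluates the graph as drawn, computing graphop with these inputs should yield the same values for ab, a_minus_ab and abs_a_minus_ab_cubed.

```python
from functools import partial
from operator import mul, sub


def abspow(a, p):
    """Computes |a|^p."""
    return abs(a) ** p


a, b = 2, 5
ab = mul(a, b)                                # mul1:    2 * 5 = 10
a_minus_ab = sub(a, ab)                       # sub1:    2 - 10 = -8
abs_cubed = partial(abspow, p=3)(a_minus_ab)  # abspow1: |-8| ** 3 = 512
print(ab, a_minus_ab, abs_cubed)              # 10 -8 512
```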

Instantiating operations

There are several ways to instantiate an operation, each of which might be more suitable for different scenarios.

Decorator specification

If you are defining your computation graph and the functions that comprise it all in the same script, the decorator specification of operation instances might be particularly useful, as it allows you to assign computation graph structure to functions as they are defined. Here’s an example:

>>> from graphtik import operation, compose

>>> @operation(name='foo_op', needs=['a', 'b', 'c'], provides='foo')
... def foo(a, b, c):
...   return c * (a + b)

>>> graphop = compose('foo_graph', foo)
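A decorator taking arguments like this is a factory that returns the actual decorator. The sketch below shows only those mechanics with a hypothetical `toy_operation` that attaches graph metadata to the function and returns it unchanged; graphtik's operation() builds an operation object instead.

```python
from typing import Any, Callable, List, Union


def toy_operation(name: str, needs: List[str], provides: Union[str, List[str]]):
    """Hypothetical decorator factory recording graph metadata on a function."""
    provides_list = [provides] if isinstance(provides, str) else list(provides)

    def decorate(fn: Callable[..., Any]) -> Callable[..., Any]:
        fn.op_name = name
        fn.needs = list(needs)
        fn.provides = provides_list
        return fn  # the function itself stays directly callable

    return decorate


@toy_operation(name="foo_op", needs=["a", "b", "c"], provides="foo")
def foo(a, b, c):
    return c * (a + b)


print(foo(1, 2, 3), foo.op_name, foo.provides)  # 9 foo_op ['foo']
```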

Functional specification

If the functions underlying your computation graph operations are defined elsewhere than the script in which your graph itself is defined (e.g. they are defined in another module, or they are system functions), you can use the functional specification of operation instances:

>>> from operator import add, mul
>>> from graphtik import operation, compose

>>> add_op = operation(name='add_op', needs=['a', 'b'], provides='sum')(add)
>>> mul_op = operation(name='mul_op', needs=['c', 'sum'], provides='product')(mul)

>>> graphop = compose('add_mul_graph', add_op, mul_op)

The functional specification is also useful if you want to create multiple operation instances from the same function, perhaps with different parameter values, e.g.:

>>> from functools import partial

>>> def mypow(a, p=2):
...    return a ** p

>>> pow_op1 = operation(name='pow_op1', needs=['a'], provides='a_squared')(mypow)
>>> pow_op2 = operation(name='pow_op2', needs=['a'], provides='a_cubed')(partial(mypow, p=3))

>>> graphop = compose('two_pows_graph', pow_op1, pow_op2)
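It may help to see why the partial object still fits an operation that needs only a: freezing p by keyword turns it into a keyword-only parameter with a new default, so the only remaining positional parameter is a. The sketch below uses just the standard library's inspect.signature().

```python
import inspect
from functools import partial


def mypow(a, p=2):
    return a ** p


cubed = partial(mypow, p=3)
print(inspect.signature(mypow))  # (a, p=2)
print(inspect.signature(cubed))  # (a, *, p=3)
print(cubed(2))                  # 8: only `a` is supplied, as the need 'a' would be
print(cubed(2, p=2))             # 4: the frozen keyword can still be overridden
```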

A slightly different approach can be used here to accomplish the same effect by creating an operation “builder pattern”:

>>> def mypow(a, p=2):
...    return a ** p

>>> pow_op_factory = operation(mypow, needs=['a'], provides='a_squared')

>>> pow_op1 = pow_op_factory(name='pow_op1')
>>> pow_op2 = pow_op_factory.withset(name='pow_op2', provides='a_cubed')(partial(mypow, p=3))
>>> pow_op3 = pow_op_factory(lambda a: 1, name='pow_op0')

>>> graphop = compose('two_pows_graph', pow_op1, pow_op2, pow_op3)
>>> graphop(a=2)
{'a': 2, 'a_cubed': 8, 'a_squared': 4}

Note

You cannot call the factory again to overwrite its function; you have to either pass the fn= keyword to the withset() method, or call the result of withset() once more with the new function (as done for pow_op2 above).
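The builder idea can be sketched with an immutable dataclass whose withset() returns a modified copy, so the original factory is never changed. `ToyBuilder` and its run() method are hypothetical; they mirror the names used above but make no claim about graphtik's internals.

```python
from dataclasses import dataclass, replace
from functools import partial
from typing import Any, Callable, Optional, Tuple


@dataclass(frozen=True)
class ToyBuilder:
    """Hypothetical immutable operation builder."""

    fn: Optional[Callable[..., Any]] = None
    name: Optional[str] = None
    needs: Tuple[str, ...] = ()
    provides: Tuple[str, ...] = ()

    def withset(self, **kw: Any) -> "ToyBuilder":
        # A new builder with some fields overridden; `self` is left untouched.
        return replace(self, **kw)

    def run(self, **inputs: Any) -> dict:
        if self.fn is None:
            raise ValueError("no function set")
        args = [inputs[n] for n in self.needs]
        return {self.provides[0]: self.fn(*args)}


def mypow(a, p=2):
    return a ** p


factory = ToyBuilder(mypow, needs=("a",), provides=("a_squared",))
op1 = factory.withset(name="pow_op1")
op2 = factory.withset(name="pow_op2", provides=("a_cubed",), fn=partial(mypow, p=3))
print(op1.run(a=2), op2.run(a=2))  # {'a_squared': 4} {'a_cubed': 8}
print(factory.name)                # None: the original factory is unchanged
```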

Modifiers on operation inputs and outputs

Certain modifiers are available to apply to input or output values in needs and provides, for example, to designate optional inputs, or “ghost” sideffect inputs & outputs. These modifiers are available in the graphtik.modifiers module:

Optionals

class graphtik.modifiers.optional

An optional need signifies that the function’s argument may not receive a value.

Only input values in needs may be designated as optional using this modifier. An operation will receive a value for an optional need only if it is available in the graph at the time of its invocation. The operation’s function should have a defaulted parameter with the same name as the optional, and the input value will be passed as a keyword argument, if it is available.

Here is an example of an operation that uses an optional argument:

>>> from graphtik import operation, compose, optional

>>> def myadd(a, b, c=0):
...    return a + b + c

Designate c as an optional argument:

>>> graph = compose('mygraph',
...     operation(name='myadd', needs=['a', 'b', optional('c')], provides='sum')(myadd)
... )
>>> graph
NetworkOperation(name='mygraph',
                 needs=['a', 'b', optional('c')],
                 provides=['sum'])

The graph works with and without c provided as input:

>>> graph(a=5, b=2, c=4)['sum']
11
>>> graph(a=5, b=2)
{'a': 5, 'b': 2, 'sum': 7}
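The feeding rule described above can be sketched in plain Python: regular needs go positionally, while an optional need is passed as a keyword argument only when its value is present. The helper `call_with_needs` is hypothetical and only illustrates that rule, not graphtik's code.

```python
from typing import Any, Callable, Dict, List, Tuple


def call_with_needs(fn: Callable[..., Any], needs: List[Tuple[str, bool]],
                    inputs: Dict[str, Any]) -> Any:
    """Call fn with (name, is_optional) needs taken from inputs (hypothetical helper)."""
    args: List[Any] = []
    kwargs: Dict[str, Any] = {}
    for name, is_optional in needs:
        if not is_optional:
            args.append(inputs[name])    # required: positional, must be present
        elif name in inputs:
            kwargs[name] = inputs[name]  # optional: keyword, only if available
    return fn(*args, **kwargs)


def myadd(a, b, c=0):
    return a + b + c


needs = [("a", False), ("b", False), ("c", True)]
print(call_with_needs(myadd, needs, {"a": 5, "b": 2, "c": 4}))  # 11
print(call_with_needs(myadd, needs, {"a": 5, "b": 2}))          # 7
```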

Varargs

class graphtik.modifiers.vararg

Like optional but feeds as ONE OF the *args into the function (instead of **kwargs).

For instance:

>>> from graphtik import operation, compose, vararg

>>> def addall(a, *b):
...    return a + sum(b)

Designate b & c as vararg arguments:

>>> graph = compose('mygraph',
...     operation(name='addall', needs=['a', vararg('b'), vararg('c')],
...     provides='sum')(addall)
... )
>>> graph
NetworkOperation(name='mygraph',
                 needs=['a', optional('b'), optional('c')],
                 provides=['sum'])

The graph works with and without any of b and c inputs:

>>> graph(a=5, b=2, c=4)['sum']
11
>>> graph(a=5, b=2)
{'a': 5, 'b': 2, 'sum': 7}
>>> graph(a=5)
{'a': 5, 'sum': 5}
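The vararg behaviour can be sketched with a hypothetical helper, `call_with_varargs`, that feeds the regular needs first and then appends each vararg value that is actually available as one extra positional argument. It illustrates the rule above and is not graphtik's implementation.

```python
from typing import Any, Callable, Dict, List


def call_with_varargs(fn: Callable[..., Any], needs: List[str],
                      vararg_names: List[str], inputs: Dict[str, Any]) -> Any:
    """Hypothetical helper: regular needs first, then each present vararg as ONE *arg."""
    args = [inputs[n] for n in needs]
    args.extend(inputs[n] for n in vararg_names if n in inputs)
    return fn(*args)


def addall(a, *b):
    return a + sum(b)


print(call_with_varargs(addall, ["a"], ["b", "c"], {"a": 5, "b": 2, "c": 4}))  # 11
print(call_with_varargs(addall, ["a"], ["b", "c"], {"a": 5, "b": 2}))          # 7
print(call_with_varargs(addall, ["a"], ["b", "c"], {"a": 5}))                  # 5
```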
class graphtik.modifiers.varargs

An optional like vararg, but its value (an iterable) is fed as MANY *args into the function (instead of **kwargs).

Read also the example test-case in: test/test_op.py:test_varargs()
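The difference between the two modifiers can be sketched in plain Python, assuming the value of a varargs need is an iterable: vararg passes the value as a single positional argument, while varargs unpacks it into many. The helper `feed` is hypothetical.

```python
from typing import Any, Callable, Iterable


def feed(fn: Callable[..., Any], a: Any, extra: Iterable[Any], expand: bool) -> Any:
    """Hypothetical helper contrasting vararg-style and varargs-style feeding."""
    if expand:
        return fn(a, *extra)  # varargs-style: MANY positional args
    return fn(a, extra)       # vararg-style: the value is ONE positional arg


def collect(a, *rest):
    return (a, rest)


inputs = {"a": 5, "b": [2, 4]}
print(feed(collect, inputs["a"], inputs["b"], expand=False))  # (5, ([2, 4],))
print(feed(collect, inputs["a"], inputs["b"], expand=True))   # (5, (2, 4))
```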

Sideffects

class graphtik.modifiers.sideffect

A sideffect data-dependency participates in the graph but is never given to, or asked from, functions.

Both inputs & outputs in needs & provides may be designated as sideffects using this modifier. Sideffects work as usual while solving the graph but they do not interact with the operation’s function; specifically:

  • input sideffects are NOT fed into the function;
  • output sideffects are NOT expected from the function.

Their purpose is to describe operations that modify the internal state of some of their arguments (“side-effects”). A typical use case is to signify columns required to produce new ones in pandas dataframes:

>>> import pandas as pd
>>> from graphtik import operation, compose, sideffect

>>> # Function appending a new dataframe column from two pre-existing ones.
>>> def addcolumns(df):
...    df['sum'] = df['a'] + df['b']

Designate the b & sum column names as sideffect arguments:

>>> graph = compose('mygraph',
...     operation(
...         name='addcolumns',
...         needs=['df', sideffect('df.b')],  # sideffect names can be anything
...         provides=[sideffect('df.sum')])(addcolumns)
... )
>>> graph
NetworkOperation(name='mygraph', needs=['df', 'sideffect(df.b)'],
                 provides=['sideffect(df.sum)'])

>>> df = pd.DataFrame({'a': [5, 0], 'b': [2, 1]})   
>>> graph({'df': df})['df']                         
        a       b
0       5       2
1       0       1

We didn’t get the sum column because the b sideffect was unsatisfied. We have to add its key to the inputs (with any value):

>>> graph({'df': df, sideffect("df.b"): 0})['df']   
        a       b       sum
0       5       2       7
1       0       1       1
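The behaviour just shown can be sketched in plain Python: every need, sideffects included, must be present for the function to run, but sideffect needs are not fed to it. Here `ToySideffect` is a hypothetical str subclass whose value is prefixed, so it never equals the plain data name; a dict of lists stands in for the pandas DataFrame. This only mimics the rules above and is not graphtik's implementation.

```python
from typing import Any, Callable, Dict, List


class ToySideffect(str):
    """Hypothetical marker: a str whose value is 'sideffect(<name>)'."""

    def __new__(cls, name: str) -> "ToySideffect":
        return super().__new__(cls, f"sideffect({name})")


def run_if_satisfied(fn: Callable[..., Any], needs: List[str],
                     inputs: Dict[str, Any]) -> bool:
    """Call fn only if every need, sideffects included, is present in inputs."""
    if any(n not in inputs for n in needs):
        return False  # an unsatisfied need: fn is not called
    # Sideffect needs take part in the check above but are NOT fed to fn.
    fn(*[inputs[n] for n in needs if not isinstance(n, ToySideffect)])
    return True


def addcolumns(df):
    df["sum"] = [x + y for x, y in zip(df["a"], df["b"])]


df = {"a": [5, 0], "b": [2, 1]}
needs = ["df", ToySideffect("df.b")]
print(run_if_satisfied(addcolumns, needs, {"df": df}))                          # False
print(run_if_satisfied(addcolumns, needs, {"df": df, ToySideffect("df.b"): 0}))  # True
print(df["sum"])                                                               # [7, 1]
print("df.b" == ToySideffect("df.b"))  # False: plain names never match sideffects
```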

Note that regular data in needs and provides do not match same-named sideffects. That is, in the following operation, the price input is different from the sideffect(price) output:

>>> def upd_prices(sales_df, prices):
...     sales_df["Prices"] = prices
>>> operation(fn=upd_prices,
...           name="upd_prices",
...           needs=["sales_df", "price"],
...           provides=[sideffect("price")])
operation(name='upd_prices', needs=['sales_df', 'price'],
          provides=['sideffect(price)'], fn='upd_prices')

Note

An operation with only sideffect outputs has a function that returns no value at all (like the one above). Such an operation is still called for its side-effects.

Tip

You may associate sideffects with other data to convey their relationships, simply by including the data names in the sideffect string (in the end, it’s just a string); graphtik will not enforce any such relationship.

>>> sideffect("price[sales_df]")
'sideffect(price[sales_df])'