Steering

The misunderstanding
NCL has a null propagation cycle, so it must necessarily be slower, bigger and more power-hungry.

The understanding
If NCL (NULL Convention Logic) is simply inserted into the CBL (clocked Boolean logic) computation model, it is indeed very inefficient. But NCL engenders a model of computation very different from that of CBL, and it is this native NCL flow computation model that delivers computational efficiency. This page illustrates how steering, enabled by NCL and flow computation, can be more efficient than MUX selection.

Emptiness and steering
The NCL null value is not just a spacer between successive data presentations. It is a background of emptiness through which data wavefronts can be steered. We will illustrate this with an example of a simple case statement.

case (A)
  0: B <= X + Y;
  1: B <= C + W;
  2: B <= D + Z;
  default: B <= 0;
endcase

The CBL rendering of this reads all the inputs, executes all the statements, selects the result corresponding to the value of A with a MUX ladder, and discards all the other results, as below. Since all paths always assert a valid data value, the only approach is to select output paths after computation.

CBL case
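
As a rough behavioural sketch of the select model (not a gate-level description), the Python below evaluates every branch and then picks one result with a chain of two-way selections. The function name `cbl_select` and the bookkeeping lists it returns are illustrative assumptions, not part of the original design.

```python
def cbl_select(A, X, Y, C, W, D, Z):
    """Behavioural sketch of the select model (names are illustrative)."""
    # Every path always carries valid data, so every adder computes.
    results = {0: X + Y, 1: C + W, 2: D + Z}
    labels = {0: "X+Y", 1: "C+W", 2: "D+Z"}
    computed = [labels[v] for v in (0, 1, 2)]

    # MUX ladder: a chain of two-way selections, starting from the default 0.
    B = 0
    for value in (2, 1, 0):
        B = results[value] if A == value else B

    # Everything that was computed but not selected is thrown away.
    discarded = [labels[v] for v in (0, 1, 2) if v != A]
    return B, computed, discarded


for A in range(4):
    print(A, cbl_select(A, X=1, Y=2, C=3, W=4, D=5, Z=6))
```

Whatever the value of A, all three additions are performed and at least two of the results are discarded.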

The NCL null value, on the other hand, explicitly represents not-data, so data validity can be expressed by the transition from not-data (null) to data. The conditional results of A are therefore used to enable input paths before computation, as below, rather than to select output paths after computation, as above.

NCL case

The way steering works
A arrives. If A == 0 then X and Y are enabled and presented to X + Y, which flows to the 1of and on to B. The C + W, the D + Z and the 0 remain quiescent (empty) and do not flow. C, W, D and Z are not enabled. The flow activity is illustrated with the red (transition to data) path below.

NCL case A0

If A is not 0, the A == 1 test is enabled. If A == 1 then C and W are enabled and presented to C + W, which flows to the 1of and on to B. The X + Y, the D + Z and the 0 remain quiescent (empty) and do not flow. X, Y, D and Z are not enabled. The flow activity is illustrated with the red path below.

NCL case A1

 

If A is not 1, the A == 2 test is enabled. If A == 2 then D and Z are enabled and presented to D + Z, which flows to the 1of and on to B. The X + Y, the C + W and the 0 remain quiescent (empty) and do not flow. X, Y, C and W are not enabled. The flow activity is illustrated with the red path below.

NCL case A2

 

If A is not 2, the default 0 is enabled, which flows to the 1of and on to B. The X + Y, the C + W and the D + Z remain quiescent (empty) and do not flow. X, Y, C, W, D and Z are not enabled. The flow activity is illustrated with the red path below.

NCL case def
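
The four cases above can be summarised in a small behavioural sketch. It is only an illustration under stated assumptions: `ncl_steer` and `NULL` are hypothetical names, Python's `None` stands in for the null value, and the chain of tests on A follows the order described in the text.

```python
NULL = None  # stand-in for the NCL null value (emptiness)


def ncl_steer(A, inputs):
    """Behavioural sketch of the steer model (names are illustrative).

    Returns (B, enabled inputs, number of A tests enabled, quiescent operations).
    """
    tests = 0
    enabled = ()
    for value, pair in ((0, ("X", "Y")), (1, ("C", "W")), (2, ("D", "Z"))):
        tests += 1                 # each failed test enables the next one
        if A == value:
            enabled = pair
            break

    # Only enabled inputs become data; all the others stay null.
    wires = {n: (v if n in enabled else NULL) for n, v in inputs.items()}

    if enabled:
        fired = "+".join(enabled)
        B = wires[enabled[0]] + wires[enabled[1]]
    else:
        fired = "0"                # default branch: the constant 0 flows to B
        B = 0

    quiescent = [op for op in ("X+Y", "C+W", "D+Z", "0") if op != fired]
    return B, list(enabled), tests, quiescent


inputs = {"X": 1, "Y": 2, "C": 3, "W": 4, "D": 5, "Z": 6}
for A in range(4):
    print(A, ncl_steer(A, inputs))
```

In every case exactly one operation flows and the other three stay empty, while the number of tests on A that are enabled depends on where in the chain A matches.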

 

Data wavefronts flow over specific paths through a background of emptiness (null). The background remains quiescent except for the data flow paths. A null wavefront follows each data wavefront over the same paths, returning the whole to background emptiness (null), ready for the next data wavefront to be steered through.
Unnecessary activity does not occur: no glitching and no discarding of computation.
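
One way to picture this cycle is a toy model in which every wire starts as null, a data wavefront fills only the steered path, and a null wavefront then empties it again. The wire names, the `cycle` helper and the use of `None` for null are assumptions made for illustration; the sketch ignores how NCL actually encodes signals.

```python
NULL = None  # stand-in for the NCL null value (emptiness)


def cycle(wires, path_values):
    """Send one data wavefront down a path, then a null wavefront after it."""
    transitions = 0
    for name, value in path_values.items():   # data wavefront: null -> data
        assert wires[name] is NULL             # the path must start out empty
        wires[name] = value
        transitions += 1
    snapshot = dict(wires)                     # state while data is present
    for name in path_values:                   # null wavefront: data -> null
        wires[name] = NULL
        transitions += 1
    return snapshot, transitions


wires = {name: NULL for name in ("X", "Y", "C", "W", "D", "Z", "B")}
snapshot, transitions = cycle(wires, {"X": 1, "Y": 2, "B": 3})
print(snapshot)      # only X, Y and B hold data; the rest stay null
print(transitions)   # two transitions per wire on the path
print(wires)         # everything is back to null
```

Each wire on the path makes two transitions per cycle and wires off the path make none, which is why the NCL activity figures on this page are doubled to account for the null wavefront.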

The CBL and NCL circuits can be compared by counting activities that are roughly comparable. The OR is less costly than a MUX, but we consider them roughly comparable. We consider the logical and arithmetic functions as roughly comparable ALU ops, and a register read and an enable as roughly comparable. The table below tallies the activities.

steer 3

NCL activity differs for each value of A, giving an average activity of 6.5 over random values of A. CBL activity is the same for all values of A, so its average activity is 15. Doubling 6.5 to 13 to account for the null wavefront still leaves NCL with less activity than CBL.

The worst case activity for NCL is 8 and for CBL is 15. The longest path for NCL is 7 and for CBL is 5.

NCL can be substituted into the CBL select model and it will work correctly, but its activity would then be 15, or 30 with the null wavefront. The fact that NCL can implement the select model as well as the steer model, and work correctly in both cases, is one source of confusion.

The value of NCL grows with scale. We extrapolate from the example to a case with 10 values for A and 6 inputs.

steer 10

The average case activity for NCL is now 10.6. Even doubled to 21.2 to account for the null wavefront, this is still considerably less than the CBL activity.

Discussion
There are other considerations, such as the possibility that the Boolean inputs are the same as on the previous cycle, in which case the Boolean functions do not switch. But:

CBL glitches and NCL does not.

The NCL circuit activates only when an A arrives. The Boolean circuit can be clock gated to limit its activity, but clock gating requires extra logic, and there is a limit to its granularity: the cost of the gating can exceed the cost of the activity saved. So even if the case statement is clock gated, there will still be clock ticks on which it is active but its result is discarded. The NCL case, being active only when an A flows into it and performing only the activity that the value of A specifies, is analogous to perfect clock gating. This behavior is inherent in NCL and requires no extra logic to achieve.

Conclusion
We have shown a first-order comparison of a CBL design and a functionally equivalent native NCL design, and found that steering can be more efficient than selecting.

Multi-rail
The next step is to demonstrate the multi-rail nature of NCL and its effect on activity.