P4C
The P4 Compiler
|
Files | |
phv_slicing_dfs_iterator.h | |
phv_slicing_iterator.h | |
phv_slicing_split.h | |
types.h | |
The slicing iterator takes a super cluster and some other constraints, like packingConflict
, pa_container_size
, then returns all possible valid slicings of the input cluster. A slicing is valid if all super clusters in the result can be allocated to a container group. The iterator also accepts feedback from the allocation algorithm through the invalidate(const SuperCluster::SliceList* sl)
interface, which tells the iterator to not generate results that contain the input slice list.
All files related to slicing are under phv/slicing
directory, the general structure is
types.h
- basic type definitions for slicing.phv_slicing_split.h
- static functions that can split a SuperCluster according to a schema. These are inherited from the previous implementation, and should be refactored later.phv_slicing_iterator.h
- PImpl
class for slicing iterator, to isolate the implementation and allows us to write white-box tests on the actual implementation.phv_slicing_dfs_iterator.h
- implementation of a DFS-based slicing algorithm.This document explains the DFS-based algorithm, as follows. We start with the basic constraint-free algorithm, and incrementally enhance it by adding constraints. At each stage, we demonstrate how the algorithm handles the additional constraint. We assume no knowledge about Tofino. The discussion will be based on an informal mathematical model, extracted from the hardware details. All unrelated details of hardware are elided.
Let's start with a few definitions:
f1<32>[0:7]
represents the first 8 bits of f1
. For slices that encompass an entire field, we use the field as shorthand. For example, f1<32>
is short for f1<32>[0:31]
.For example, suppose we have four headers A,B,C,D, with this layout:
A valid slicing can be
The above problem can be solved by a simple DFS algorithm. The algorithm maintains two pieces of state: a header list that contains headers to be split, and a result list of headers that has been split. The algorithm is as follows:
Pseudo codes of the DFS:
If the input Headers
is [A, C]
, then we expect to see:
The time complexity for finding all solutions is ~O(2^(M/8)), where M is the total bit sizes of all N headers. Note that there are no constraints on any fields: because those headers are unrelated, all possible slicings are valid. So, the time complexity of finding the first solution is O(M).
Let's add a new constraint to field slices. Define a rotational cluster as a set of field slices that all must be allocated to same-sized containers. That is, in the slicing result, the slices must reside in lists that have the same bit-length. We also require that the field slices in each rotational cluster be split at the same bit.
For example, here is a header list with rotational constraints:
The rotational cluster { a1<12>, b2<128>[0:11] }
means that if we split a1
at the 8th bit, then we must also split b2<128>[0:11]
at the 8th bit. Making this split would result in two additional rotational-cluster constraints:
This constraint transitively links different headers into a set that is called a super cluster. Because of this constraint, some slicings are no longer valid. For example, the 3rd solution
is now invalid because c4
ends up in a 8-bit list while a3
is in a 16-bit list.
We refine the previous algorithm with some pruning strategies to speed up the search. These pruning strategies are early validations for detecting whether the current search state is doomed to produce an invalid result.
Let's introduce the after-split constraint. An after-split constraint of a field slice represents a requirement on the bit-size of the list containing the field slice in the output. We can deduce after-split constraints for field slices in the result list, as they won't be split again. For example, suppose the current state is
then a1
, a2
, and a3
will be in a 32-bit list for any sub-states derived. The implementation prints out after-split constraints with LOG5:
(if we re-define after-split constraints so that they are propagated through rotational clusters, then the second and third pruning strategies are redundant with the first.)
If there are field slices in the same rotational cluster with incompatible after-split constraints, then the current search state can be pruned.
Let's use the example from the above:
If the current state is
then we know there is a conflict on a3
and c4
because they are in the same super cluster, but their after-split constraints are incompatible: a3
is in a 16-bit list, while c4
is in an 8-bit list.
At this point, we should backtrack instead of further splitting header B or D in the header list. By pruning the search here, we eliminate 2^37 invalid results!
The time complexity for this strategy is O(N), where N is the number of field slices. It can eliminate O(2^(K/8)) invalid solutions, where K is the number of bits remaining in the header list. Note that this strategy will prune more invalid states if K is larger. We will talk about this in [Optimization: Search order].
Since headers (i.e., slice lists) only get smaller as the search progresses, we can check the after-split constraints on a field slice against the bit-length of the slice list containing that slice. That is, if an after-split constraint on a field slice f1 requires a list larger than the list currently holding slice f2, and f1 and f2 are in the same rotational cluster, then current state is invalid and can be pruned.
Example:
Time Complexity: O(n), where n equals to the number of field slices.
Consider a header [f1<24>, f2<8>]
, and two after-split constraints enforced by other field slices in the same rotational cluster with f1/f2, f1 must =32, f2 must =16
, there is no valid solution, because only one after-split constraint can be satisfied. Generally, after we collect after-split constraints for all field slices in a list, we can check whether they can all be satisfied on that header/slicelist.
To check whether all after-split constraints of a slice list can be satisfied, we create a bitmap that has the same size as the list, representing after-split constraints on each bit of the list. Then for each field slice, the algorithm will seek back to the left-most possible starting bit, leftmost, then mark bits in [leftmost, leftmost + after-split-constraint.size] to the after-split constraints by taking an intersection of constraint of the field and the constraint recorded on the bitmap. If the intersection of two constraints on the bit is empty, then it means there exists two constraints that cannot be satisfied together. Please refer to DfsItrContext::dfs_prune_unsat_slicelist_constraints
in the implementation for more details.
Example:
The time complexity is O(M), where M is the total number of bits of the input super cluster.
Recall that in Enhancement 1, we stated that rotational clusters transitively link headers into a super cluster. We add another constraint: all slicelists in a super cluster must have a same size. We say that a super cluster is well-formed if it satisfies this constraint.
Example:
This strategy is simple. If we find a super cluster whose slice lists are all in the result list, we check whether this cluster is well-formed.
For slices of a field with same_container_group
constraint, all those slices will end up in a same supercluster, so they share same aftersplit constraints just slices in a rotation cluster. AfterSplitConstraint of any fieldslice will be propagated to all other slices of the same field.
The above discussion assumes that the size of a slice list must be one of the 8, 16, or 32. In this enhancement, we introduce a new type of list, whose size is not subject to this constraint.
Metadata is a special type of field that has virtual padding at the beginning of the field. The size of the virtual padding can any integer from 0 to 7. On Tofino, this virtual padding corresponds to the alignment constraint, and is called Alignment
in the code as well. In this document, we prefer the term "virtual padding". We write fieldname^5<8>
to represent a metadata field that is 8-bits long, preceded by 5 bits of virtual padding.
Metadata fields form metadata slice lists, and we cannot mix metadata fields with normal fields to form a list. Metadata can be in a rotational cluster with other normal fields. Define the virtual size of a metadata list to be the combined length of the slicess in the list, plus the size of the virtual padding on the first field slice. The size of a metadata list is not limited to 8, 16, or 32; instead, its virtual size cannot exceed 32 bits.
Example:
Metadata field slices in a rotational cluster, are not required to be in same-sized lists. However, if there are normal fields in the rotational cluster, then the virtual size of those metadata lists cannot exceed the size of list normal fields(they are supposed to be equal).
Also, for super-cluster slice-list constraint, the constraint becomes that normal slice lists must have the same size, and metadata list's virtual size cannot be larger than that size.
(Not sure I follow this section. Examples would help.)
Although it seems that metadata type complicates the problem, we can handle them by simply enhancing our after-split constraint. Previously, an after-split constraint can only specify the 'exact' requirement. We introduce a less restricted form of after-split constraint, which specifies a minimum size of the list that a metadata resides in.
With the enhanced AfterSplitConstraint
class, the above pruning strategies still apply.
When a medatada list joins two exact containers lists of different sizes into one supercluster, we should prune early because the supercluster cannot be well-formed already. For example,
sl3 will join sl1 and sl2 into one super cluster, and we can infer that this cluster is invalid because there are a 32-bit exact list and a 16-bit exact list.
Two fields exhibit a pack conflict when they cannot end up in the same list. This constraint is orthogonal to the above constraints. The algorithm checks all splitted slice lists to ensure that no slice list contains a pair of pack-conflicted fields.
Because we only split at byte boundaries (i.e., we only split out the first 8, 16, or 32 bits from the front of the list), fields that share a byte will inevitably end up in the same list. For these fields, we skip the pack-confict check. Doing so is consistent with the action-PHV constraint check in the PHV-allocation algorithm.
The @pa_container_size
pragma specifies the sizes of lists that contain a field after slicing. This pragma is surprisingly powerful in that it supports two modes.
The pragma is used in exact-size mode if the specified container sizes add up to the size of the field (e.g., @pa_container_size(f1<32>, 32)
or @pa_container_size(f1<32>, 16, 8, 8)
). To handle this constraint, before doing DFS, we do a pre-slicing following the specified schema, that fully splits out those smaller chunks of fields.
For example,
headers will be pre-sliced into
The pragma also has an up-casting mode, wherein the sum of container sizes is larger than the size of the field (e.g., @pa_container_size(f1<20>, 32)
or @pa_container_size(f2<20>, 16, 8)
). In this mode, we can only do a limited pre-slicing. For example, with the first example usage, we don't know where to split around f1
so that it ends up in a 32-bit list: it can either be 12 bits before f1
or 12 bits after f1
. The good news is, for f2
in the second example, we can know where to split even in up-casting mode. The code implements this by giving the tailing field slice an after-split constraint.
(Not following this last point. An example would help.)
A general optimization for a DFS algorithm is to schedule the search order at each step to form a thinner search tree by leveraging pruning strategies. For example, A*
sorts its search frontier according to f(n) = h(n) + g(n)
. In our slicing DFS algorithm, the decision at each step is to choose N and X, and split out first N bits from list X. For picking the X, similar to Algorithm X, we want to choose the list with most constraints. There is a class called NextSplitTargetMetrics
that can sort lists by constraints. For picking the N, the make_choices
function will return a sorted list where preferred values of N are at front of the list.