P4C
The P4 Compiler
Loading...
Searching...
No Matches
clot Directory Reference
Directory dependency graph for clot:

Directories

 pragma
 

Files

 allocate_clot.h
 
 check_clot_groups.h
 
 clot.h
 
 clot_candidate.h
 
 clot_info.h
 
 deparse_graph.h
 
 field_pov_analysis.h
 
 field_slice_extract_info.h
 
 field_slice_set.h
 
 header_validity_analysis.h
 
 merge_desugared_varbit_valids.h
 
 pseudoheader.h
 

Detailed Description

This documents the design of the CLOT allocator and the metric associated with CLOT allocation.

Definitions

When an object (field, slice, byte, etc.) is allocated to both PHVs and to a CLOT, we say that the object is double-allocated.

We categorize the field slices in the program, as follows:

A CLOT-allocated field slice is overwritten if the value in the slice’s CLOT is replaced by a value from a PHV container or by a calculated checksum during deparsing. More precisely, a CLOT-allocated slice is overwritten if it satisfies any of the following conditions.

Objective

The objective of the CLOT allocator is two-fold:

  1. CLOT-allocate as many unused and read-only slices as possible, while minimizing the number of CLOTs used.
  2. Identify checksum and modified slices and compute their offsets within their CLOT. This is so that we can produce the correct overwrite offsets for the slices’ PHV containers and checksum values in the deparser. Because CLOT allocation occurs before PHV allocation, the CLOT allocator needs to do this itself.

CLOT layout constraints

The following are the layout constraints for CLOT allocation.

CLOT-eligible field slices

A field slice is CLOT-eligible if it is part of a CLOT-eligible field. A field is CLOT-eligible if it is extracted by the parser and satisfies all of the following properties.

CLOT candidates

A CLOT candidate consists of a maximal sequence of CLOT-eligible field slices whose extracts satisfy all of the following properties.

While CLOTs can span more than one parser state, this allocation algorithm does not use this property.

Identifying CLOT candidates

An initial set of CLOT candidates is identified and created through the following sequence of phases.

  1. Finding pseudoheaders. With the decaf optimization, headers are no longer the units of deparsing: fields may be emitted separately from the rest of their header. Fields are therefore grouped into pseudoheaders, which are contiguous sequences of fields emitted by the deparser, with all fields sharing the same set of POV bits.2
  2. Grouping field extracts by pseudoheader. CLOT-eligible fields are grouped by pseudoheader. Each CLOT-eligible field is associated with a map from each parser state in which it is extracted to the field’s offset from the start of the parser state.
  3. Creating CLOT candidates. The sequence of fields in each pseudoheader is traversed, and the parser-state and field-offset information from Phase 2 is used to find corresponding contiguous sequences of field extracts that satisfy the requirements for CLOT candidacy.

Allocation algorithm

The CLOT allocator uses a greedy algorithm to allocate CLOTs in each gress independently. For each gress, it identifies CLOT candidates and allocates them in descending order of number of unused bits. Ties are broken by preferring candidates with more read-only bits. As each candidate is allocated, the remaining candidates are adjusted by splitting, shrinking, or removing from candidacy as needed, to account for the inter-CLOT-gap requirement. This adjustment must account for overlapping fields and look-ahead extracts. The numbers of unused and read-only bits in each candidate are updated to account for this adjustment. Shrinking of candidates is done at the granularity of slices.

Candidate adjustment

After allocating a CLOT candidate C, we adjust the remaining candidates by splitting, shrinking, or removing from candidacy as needed to ensure the bytes in the remaining candidates do not overlap with C in the input packet. We further remove the beginning and end of every remaining unallocated candidate C’ to account for the inter-CLOT gap. First, a couple of definitions.

If C may be parsed before C’ (i.e., GAPS(C, C’) ≠ ∅), then we remove the first few bytes at the beginning of C’ if any of the following is satisfied.

Dually, if C’ may be parsed before C (i.e., GAPS(C’, C) ≠ ∅), then we remove the last few bytes at the end of C’ if any of the following is satisfied.

These adjustments are not mutually exclusive; adjustments may be necessary at both the beginning and end of C’. The number of bytes removed by these adjustments is computed to be the smallest amount needed to achieve the required inter-CLOT gap separation between C and C’ on all paths through the parser.

Allocation adjustment

Once PHV allocation is complete, the allocated CLOTs are adjusted to account for PHV container packing. Consider the example shown below. The CLOT begins with the field f2, which is double-allocated, read-only, and packed into a PHV container with f1. The PHV container also has a slice of f3 to ensure the container is byte-aligned.

Because f1 is modified, the PHV container will be deparsed, and will overwrite the CLOT. But the overwriting container straddles the CLOT boundary, which is not permitted. The CLOT, therefore, needs to be adjusted to exclude those portions that overlap the PHV container. A similar situation occurs for overwrites straddling the end of CLOTs.

header h_t {
bit<8> f1; // modified
bit<4> f2; // read-only
bit<12> f3; // unused
...
PHV container CLOT
│ │
│ │
│ ▼
▼ ──── ──── ──── ──── ──── ────
┌ ─ ─ ─ ─ ─ ┴ ─ ─ ─ ─ ─ │
┌──────────┬────┬─────┴──────────┐ │
││ f1 │ f2 │ f3 │ ... │
└──────────┴────┴─────┬──────────┘ │
└ ─ ─ ─ ─ ─ ┬ ─ ─ ─ ─ ─
└ ──── ──── ──── ──── ──── ───┘

To make these adjustments, we shrink all allocated CLOTs until they neither start nor end with an overwritten field slice. If the entire CLOT is overwritten, then the CLOT is unnecessary, and is de-allocated.

Metrics

We form a metric for CLOT allocation by first identifying header bis that can benefit from CLOT allocation. These are bits satisfying all of the following conditions.

We then report:

We could categorize these bytes into sub-metrics, as follows:

Not all of these sub-metrics will necessarily be indicative of a better CLOT allocation, however. For example, an allocator that allocates an additional CLOT might reduce the overall number of bytes that are not CLOT-allocated, but increase the number of inter-CLOT gap bytes.