P4C
The P4 Compiler
StageUseEstimate Struct Reference

Classes

struct  RAM_counter
 

Public Member Functions

 StageUseEstimate (const IR::MAU::Table *, int &, attached_entries_t &, LayoutChoices *lc, bool prev_placed, bool gateway_attached, bool disable_split, PhvInfo &phv)
 
bool adjust_choices (const IR::MAU::Table *tbl, int &entries, attached_entries_t &)
 
void calculate_attached_rams (const IR::MAU::Table *tbl, const attached_entries_t &att_entries, LayoutOption *lo)
 
void calculate_for_leftover_atcams (const IR::MAU::Table *tbl, int srams_left, int &entries, attached_entries_t &)
 
bool calculate_for_leftover_srams (const IR::MAU::Table *tbl, int &srams_left, int &entries, attached_entries_t &)
 
void calculate_for_leftover_tcams (const IR::MAU::Table *tbl, int srams_left, int tcams_left, int &entries, attached_entries_t &)
 
void calculate_partition_sizes (const IR::MAU::Table *tbl, LayoutOption *lo, int ram_depth)
 
void calculate_per_row_vector (safe_vector< RAM_counter > &per_word_and_width, const IR::MAU::Table *tbl, LayoutOption *lo)
 
void calculate_way_sizes (const IR::MAU::Table *tbl, LayoutOption *lo, int &calculated_depth)
 
bool can_be_identity_hash (const IR::MAU::Table *tbl, LayoutOption *lo, int &calculated_depth)
 
void clear ()
 
void determine_initial_layout_option (const IR::MAU::Table *tbl, int &entries, attached_entries_t &)
 
void fill_estimate_from_option (int &entries)
 
void known_srams_needed (const IR::MAU::Table *tbl, const attached_entries_t &, LayoutOption *lo)
 
void max_entries_best_option ()
 
StageUseEstimate operator+ (const StageUseEstimate &a) const
 
StageUseEstimate & operator+= (const StageUseEstimate &a)
 
bool operator<= (const StageUseEstimate &a)
 
void options_to_atcam_entries (const IR::MAU::Table *tbl, int entries)
 
void options_to_dleft_entries (const IR::MAU::Table *tbl, const attached_entries_t &att_entries)
 
void options_to_rams (const IR::MAU::Table *tbl, const attached_entries_t &att_entries)
 
void options_to_ternary_entries (const IR::MAU::Table *tbl, int entries)
 
void options_to_ways (const IR::MAU::Table *tbl, int entries)
 
const LayoutOption * preferred () const
 
const ActionData::Format::Use * preferred_action_format () const
 
const MeterALU::Format::Use * preferred_meter_format () const
 
cstring ran_out () const
 
void remove_invalid_option ()
 
void select_best_option (const IR::MAU::Table *tbl)
 
void select_best_option_ternary ()
 
void shrink_preferred_atcams_lo (const IR::MAU::Table *tbl, int &entries, attached_entries_t &attached_entries)
 
void shrink_preferred_srams_lo (const IR::MAU::Table *tbl, int &entries, attached_entries_t &attached_entries)
 
void shrink_preferred_tcams_lo (const IR::MAU::Table *tbl, int &entries, attached_entries_t &attached_entries)
 
void srams_left_best_option (int srams_left)
 
int stages_required () const
 
void tcams_left_best_option ()
 
void unknown_atcams_needed (const IR::MAU::Table *tbl, LayoutOption *lo, int srams_left)
 
void unknown_srams_needed (const IR::MAU::Table *tbl, LayoutOption *lo, int srams_left)
 
void unknown_tcams_needed (const IR::MAU::Table *tbl, LayoutOption *lo, int tcams_left, int srams_left)
 
bool ways_provided (const IR::MAU::Table *tbl, LayoutOption *lo, int &calculated_depth)
 

Static Public Member Functions

static StageUseEstimate max ()
 

Public Attributes

safe_vector< ActionData::Format::Use > action_formats
 
int exact_ixbar_bytes = 0
 
ActionData::FormatType_t format_type
 
int hash_bits_masked = 0
 
safe_vector< LayoutOption > layout_options
 
int local_tinds = 0
 
int logical_ids = 0
 
int maprams = 0
 
int meter_alus = 0
 
MeterALU::Format::Use meter_format
 
size_t preferred_index = 0
 
int srams = 0
 
int stats_alus = 0
 
int tcams = 0
 
int ternary_ixbar_groups = 0
 

Static Public Attributes

static constexpr int COMPILER_DEFAULT_SELECTOR_POOLS = 4
 
static constexpr int MAX_DLEFT_HASH_SIZE = 23
 
static constexpr int MAX_LOCAL_TINDS = 16
 
static constexpr int MAX_METER_ALUS = 4
 
static constexpr int MAX_MOD = 31
 
static constexpr int MAX_MOD_SHIFT = 5
 
static constexpr int MAX_POOL_RAMLINES = MAX_MOD << MAX_MOD_SHIFT
 
static constexpr int MAX_STATS_ALUS = 4
 
static constexpr int MAX_WAYS = 8
 
static constexpr int MIN_WAYS = 1
 
static constexpr int MOD_INPUT_BITS = 10
 
static constexpr int SINGLE_RAMLINE_POOL_SIZE = 120
 

Member Function Documentation

◆ calculate_partition_sizes()

void StageUseEstimate::calculate_partition_sizes ( const IR::MAU::Table * tbl,
LayoutOption * lo,
int initial_ram_depth )

Estimates the total number of logical tables needed, given the number of RAMs dedicated to an ATCAM table. The goal is to calculate the minimum number of logical tables required, and then to balance the sizes of those logical tables.
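
The "minimum count, then balance" idea can be sketched as follows. This is only an illustration: the names `min_logical_tables` and `balanced_depth`, and the per-table limit `cap`, are hypothetical and are not taken from the compiler source. Both functions assume `depth > 0` and `cap > 0`.

```cpp
// Hypothetical sketch: `depth` is the RAM depth to cover and `cap` is an
// assumed maximum depth per logical table (not a value from the compiler).

// Minimum number of logical tables needed to cover `depth`.
constexpr int min_logical_tables(int depth, int cap) {
    return (depth + cap - 1) / cap;
}

// Balanced depth of logical table `index` (0-based). The remainder is spread
// over the first tables, so sizes differ by at most one.
constexpr int balanced_depth(int depth, int cap, int index) {
    return depth / min_logical_tables(depth, cap) +
           (index < depth % min_logical_tables(depth, cap) ? 1 : 0);
}

// A depth of 10 with a cap of 4 needs 3 tables, balanced as 4 + 3 + 3
// rather than the unbalanced 4 + 4 + 2.
static_assert(min_logical_tables(10, 4) == 3, "three logical tables");
static_assert(balanced_depth(10, 4, 0) == 4, "first table absorbs the remainder");
static_assert(balanced_depth(10, 4, 2) == 3, "last table");
```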

◆ calculate_way_sizes()

void StageUseEstimate::calculate_way_sizes ( const IR::MAU::Table * tbl,
LayoutOption * lo,
int & calculated_depth )

Calculates the number of simultaneous lookups within an exact match table that uses cuckoo hashing. RAM selection uses particular bits of the 52-bit hash bus: the lower 40 bits are broken into four 10-bit sections for RAM line selection, and the upper 12 bits are used for RAM select.

To fit at least 90% of entries without having to move other match entries, 4 ways are generally required, each a completely independent lookup. Thus, if the number of entries requested for the table is smaller than a particular number, the algorithm still bumps the number of entries up in order to maintain this number of independent ways.

Consider the following example. Say that the number of entries for a particular table requires 4 independent ways of 8 RAMs each. The hash bus would be allocated as follows:

  • Each of the 4 independent ways has its own 10 bits of RAM row select, totalling 40 bits
  • To select one RAM out of the 8 in a way, each way requires 3 bits of RAM select, totalling 12 bits. Together this comes to 52 bits, which is exactly the number of hash select bits available
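
The bit budget in this example can be checked with a short sketch. The bus widths (52, 40, 12 and 10 bits) come from the description above; the function and constant names are hypothetical and do not appear in the compiler.

```cpp
constexpr int HASH_BUS_BITS = 52;            // total hash bus width
constexpr int ROW_SELECT_BUDGET = 40;        // lower bits: four 10-bit sections
constexpr int RAM_SELECT_BUDGET = 12;        // upper bits: RAM select
constexpr int ROW_SELECT_BITS_PER_WAY = 10;  // one RAM line out of 1024

// Smallest number of bits able to address n distinct RAMs (n >= 1).
constexpr int ceil_log2(int n) {
    int bits = 0;
    while ((1 << bits) < n) ++bits;
    return bits;
}

constexpr int row_select_bits(int ways) {
    return ways * ROW_SELECT_BITS_PER_WAY;
}

constexpr int ram_select_bits(int ways, int rams_per_way) {
    return ways * ceil_log2(rams_per_way);
}

// True if fully independent ways fit without sharing any hash bits.
constexpr bool fits_on_hash_bus(int ways, int rams_per_way) {
    return row_select_bits(ways) <= ROW_SELECT_BUDGET &&
           ram_select_bits(ways, rams_per_way) <= RAM_SELECT_BUDGET;
}

// 4 ways of 8 RAMs: 40 bits of row select + 12 bits of RAM select = 52 bits.
static_assert(row_select_bits(4) == 40, "row select");
static_assert(ram_select_bits(4, 8) == 12, "RAM select");
static_assert(row_select_bits(4) + ram_select_bits(4, 8) == HASH_BUS_BITS,
              "exactly fills the bus");
static_assert(fits_on_hash_bus(4, 8), "fits");
```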

One optimization is that select bits can be reused. For example, say the number of entries required 40 RAMs. One could in theory break this up into 5 independent ways of 8 RAMs. However, this would not fit onto the hash bus: it would need 50 bits of RAM row select plus 15 bits of RAM select, 65 bits in total, far more than the 52 bits on a hash select bus.

However, the compiler will optimize so that way 1 and way 5 share the same 10 bits of RAM row select and the same 3 upper bits of RAM select. This means that ways 1 and 5 are not independent; they use exactly the same hash bits. However, this is not an issue for our constraint, as we still have 4 independent hash lookups.
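
A minimal sketch of this sharing follows. The wrap-around assignment (way index modulo 4) is an assumption chosen to reproduce the "way 1 and way 5" case; the text does not say how the compiler pairs ways in general, and all names here are hypothetical.

```cpp
constexpr int MAX_INDEPENDENT_WAYS = 4;  // four 10-bit row-select sections
constexpr int ROW_BITS_PER_WAY = 10;

// Hash-bit group used by a way (0-based). Assumed policy: ways beyond the
// fourth wrap around and reuse the bits of an earlier way.
constexpr int hash_group_for_way(int way) {
    return way % MAX_INDEPENDENT_WAYS;
}

// Number of truly independent lookups for a given number of ways.
constexpr int independent_lookups(int ways) {
    return ways < MAX_INDEPENDENT_WAYS ? ways : MAX_INDEPENDENT_WAYS;
}

// Hash bits consumed once shared ways are counted only once.
constexpr int hash_bits_used(int ways, int ram_select_bits_per_way) {
    return independent_lookups(ways) *
           (ROW_BITS_PER_WAY + ram_select_bits_per_way);
}

// Way 5 (index 4) reuses the bits of way 1 (index 0), so 5 ways of 8 RAMs
// consume 52 bits instead of 65 and still give 4 independent lookups.
static_assert(hash_group_for_way(4) == hash_group_for_way(0), "shared bits");
static_assert(independent_lookups(5) == 4, "still four independent lookups");
static_assert(hash_bits_used(5, 3) == 52, "fits the 52-bit bus");
```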

This cannot be used indefinitely, however. For example, say we needed 64 RAMs, with 4 ways of 16 RAMs. Even though all RAM row selection fits in the lower 40 bits, this would require 16 bits of RAM select. In this case, we cannot reuse the select bits, as this would not provide at least 4 independent hash lookups, which is the standard required by the driver.

In the case just described, we would actually require 2 separate RAM select buses, and thus two separate search buses. Fortunately, the maximum number of RAMs is 80 per MAU stage, so whenever the input xbar requirements are this high, the RAM array requirements are high as well.

◆ can_be_identity_hash()

bool StageUseEstimate::can_be_identity_hash ( const IR::MAU::Table * tbl,
LayoutOption * lo,
int & calculated_depth )

An optimization for an exact match table.

If a key is under a certain number of bits, then instead of using a random hash of that key to find the position, an identity hash can be used. This makes sense for keys of 10 bits or fewer, as an identity hash would map directly to an individual RAM line.

Wider keys can also be supported. For example, with a 12-bit key and 4 entries per RAM line, an identity hash still lets every entry fit within a single RAM.
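
The capacity arithmetic behind this can be sketched as below. It assumes 1024 lines per RAM (from the 10-bit row select) and checks only whether every possible key value has a slot in one RAM. The function name is hypothetical, and the compiler's actual check considers more than capacity.

```cpp
constexpr int RAM_LINE_SELECT_BITS = 10;  // 1024 lines per RAM

// True if every value of a key_bits-wide key has its own slot in one RAM
// that packs entries_per_line entries into each line.
constexpr bool identity_fits_one_ram(int key_bits, int entries_per_line) {
    return (1LL << key_bits) <=
           (1LL << RAM_LINE_SELECT_BITS) * entries_per_line;
}

// 10-bit key: 1024 values, one per line.
static_assert(identity_fits_one_ram(10, 1), "10-bit key, 1 entry per line");
// 12-bit key: 4096 values = 1024 lines * 4 entries per line.
static_assert(identity_fits_one_ram(12, 4), "12-bit key, 4 entries per line");
// 12-bit key with only 2 entries per line would need a second RAM.
static_assert(!identity_fits_one_ram(12, 2), "does not fit");
```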

The driver, however, is limited in its current support. The driver uses a reserved entry as the miss entry. This miss entry will be used if the table ever misses. If the miss-entry has action data requirements or potentially stateful requirements, then those entries must be stored.

The driver always programs the miss entry at the highest address. This means that if an identity hash is used and the table requires a miss-entry, then the all-1s key will collide with the miss-entry.

This could be fixed by a dynamic miss-entry. If the miss-entry could move to any open entry, then all of these tables could support this identity hash. If the table were ever to fill all entries, then by definition, the table could never miss.

The current limitation applies whenever a direct resource is required. This will reserve the all 0 miss-entry, no matter what.

In the future, when this is supported, a table with a direct resource can still use this identity optimization if and only if the miss-entry never uses that resource. This is a more complex check, but not hard to add.

UPDATE: Following driver fixes, the driver now checks whether the EXM table requires a table location to be reserved for the default (miss) entry. Originally the check was simply whether or not the table used direct resources (action, idle, counter, meter, stful). Now, the check is whether any of the possible default actions uses the direct resources.

With this change, the compiler check mimics the driver check and uses an identity hash in cases where no default action has an attached resource.
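
The updated rule can be sketched as follows. `ActionSketch` and both function names are hypothetical stand-ins, not types from the compiler or the driver, and the sketch models only the default-action condition, not the key-width requirement described earlier.

```cpp
#include <cassert>
#include <vector>

// Hypothetical summary of one table action.
struct ActionSketch {
    bool can_be_default;        // may be installed as the default (miss) action
    bool uses_direct_resource;  // touches action data, idle, counter, meter or stful
};

// A slot must be reserved for the miss entry only if some possible default
// action uses a direct resource.
bool needs_reserved_miss_entry(const std::vector<ActionSketch>& actions) {
    for (const ActionSketch& a : actions) {
        if (a.can_be_default && a.uses_direct_resource) return true;
    }
    return false;
}

// Identity hash stays available when no slot has to be reserved.
bool identity_hash_allowed(const std::vector<ActionSketch>& actions) {
    return !needs_reserved_miss_entry(actions);
}

void identity_hash_example() {
    // Only the non-default action uses a counter: no reserved slot is needed.
    assert(identity_hash_allowed(
        {ActionSketch{false, true}, ActionSketch{true, false}}));
    // A possible default action uses a counter: a slot must be reserved.
    assert(!identity_hash_allowed({ActionSketch{true, true}}));
}
```

Under the original rule, the first table in `identity_hash_example` would have been rejected, because the table as a whole uses a direct resource.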

◆ options_to_atcam_entries()

void StageUseEstimate::options_to_atcam_entries ( const IR::MAU::Table * tbl,
int entries )

Calculates the total number of entries for each layout option of an ATCAM table. The number of RAMs for the whole table is calculated from the following:

  • ways_per_partition: ceil_log2(select_bits of the atcam_partition_index)
  • partition_entries: total (logical) simultaneous lookups in the table
  • ram_depth: number of RAMs to hold all partitions, if the match was one RAM wide

◆ options_to_dleft_entries()

void StageUseEstimate::options_to_dleft_entries ( const IR::MAU::Table * tbl,
const attached_entries_t & attached_entries )

Currently this is a very simple way to split up the dleft hash tables into a reasonable number of ALUs with a particular size. Eventually, according to Mike Ferrera, the hash mod could potentially be used to calculate a RAM size exactly, so that the addresses don't have to be a power of two.

◆ unknown_atcams_needed()

void StageUseEstimate::unknown_atcams_needed ( const IR::MAU::Table * tbl,
LayoutOption * lo,
int srams_left )

Given a number of SRAMs, calculates the size of the possible ATCAM table for the given layout option. This differs from the normal SRAM case because the algorithm has to grow all ways simultaneously.
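
One way to read "grow all ways simultaneously" is that each increase in depth costs one RAM per way (times the match width), so leftover RAMs that cannot complete a full step stay unused. The sketch below shows only that arithmetic. The parameter names `ways_per_partition` and `match_width` and all function names are assumptions, not the compiler's implementation.

```cpp
// RAMs consumed by one step of depth: one RAM per way, times the match width.
constexpr int rams_per_depth_step(int ways_per_partition, int match_width) {
    return ways_per_partition * match_width;
}

// Largest depth that fits in srams_left when every way grows together.
constexpr int atcam_depth(int srams_left, int ways_per_partition, int match_width) {
    return srams_left / rams_per_depth_step(ways_per_partition, match_width);
}

// SRAMs actually consumed at that depth; the remainder cannot be used.
constexpr int atcam_srams_used(int srams_left, int ways_per_partition, int match_width) {
    return atcam_depth(srams_left, ways_per_partition, match_width) *
           rams_per_depth_step(ways_per_partition, match_width);
}

// 22 SRAMs with 4 ways, one RAM wide: depth 5 uses 20 RAMs, 2 are left over.
static_assert(atcam_depth(22, 4, 1) == 5, "depth");
static_assert(atcam_srams_used(22, 4, 1) == 20, "RAMs used");
```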

◆ ways_provided()

bool StageUseEstimate::ways_provided ( const IR::MAU::Table * tbl,
LayoutOption * lo,
int & calculated_depth )

There are now two supported pragmas, ways and simul_lookups. For an SRAM-based table that uses cuckoo hashing, multiple RAMs are looked up simultaneously, each accessed by a different hash function. The number of ways is the number of simultaneous lookups. Each way corresponds to a single hash function provided by the 52-bit hash bus.

The difference in meaning is the following:

 ways - Each hash function must be entirely independent, i.e. cannot use the same
     hash bits
 simul_lookups - simultaneous lookups can use the same hash bits, an optimization
     supported only in Brig.  Really, one can think of this as making an
     individual way deeper

simul_lookups is only supported internally at this point, and is necessary to make progress on power.p4.