P4C
The P4 Compiler
Classes | |
struct | RAM_counter |
Public Member Functions | |
StageUseEstimate (const IR::MAU::Table *, int &, attached_entries_t &, LayoutChoices *lc, bool prev_placed, bool gateway_attached, bool disable_split, PhvInfo &phv) | |
bool | adjust_choices (const IR::MAU::Table *tbl, int &entries, attached_entries_t &) |
void | calculate_attached_rams (const IR::MAU::Table *tbl, const attached_entries_t &att_entries, LayoutOption *lo) |
void | calculate_for_leftover_atcams (const IR::MAU::Table *tbl, int srams_left, int &entries, attached_entries_t &) |
bool | calculate_for_leftover_srams (const IR::MAU::Table *tbl, int &srams_left, int &entries, attached_entries_t &) |
void | calculate_for_leftover_tcams (const IR::MAU::Table *tbl, int srams_left, int tcams_left, int &entries, attached_entries_t &) |
void | calculate_partition_sizes (const IR::MAU::Table *tbl, LayoutOption *lo, int ram_depth) |
void | calculate_per_row_vector (safe_vector< RAM_counter > &per_word_and_width, const IR::MAU::Table *tbl, LayoutOption *lo) |
void | calculate_way_sizes (const IR::MAU::Table *tbl, LayoutOption *lo, int &calculated_depth) |
bool | can_be_identity_hash (const IR::MAU::Table *tbl, LayoutOption *lo, int &calculated_depth) |
void | clear () |
void | determine_initial_layout_option (const IR::MAU::Table *tbl, int &entries, attached_entries_t &) |
void | fill_estimate_from_option (int &entries) |
void | known_srams_needed (const IR::MAU::Table *tbl, const attached_entries_t &, LayoutOption *lo) |
void | max_entries_best_option () |
StageUseEstimate | operator+ (const StageUseEstimate &a) const |
StageUseEstimate & | operator+= (const StageUseEstimate &a) |
bool | operator<= (const StageUseEstimate &a) |
void | options_to_atcam_entries (const IR::MAU::Table *tbl, int entries) |
void | options_to_dleft_entries (const IR::MAU::Table *tbl, const attached_entries_t &att_entries) |
void | options_to_rams (const IR::MAU::Table *tbl, const attached_entries_t &att_entries) |
void | options_to_ternary_entries (const IR::MAU::Table *tbl, int entries) |
void | options_to_ways (const IR::MAU::Table *tbl, int entries) |
const LayoutOption * | preferred () const |
const ActionData::Format::Use * | preferred_action_format () const |
const MeterALU::Format::Use * | preferred_meter_format () const |
cstring | ran_out () const |
void | remove_invalid_option () |
void | select_best_option (const IR::MAU::Table *tbl) |
void | select_best_option_ternary () |
void | shrink_preferred_atcams_lo (const IR::MAU::Table *tbl, int &entries, attached_entries_t &attached_entries) |
void | shrink_preferred_srams_lo (const IR::MAU::Table *tbl, int &entries, attached_entries_t &attached_entries) |
void | shrink_preferred_tcams_lo (const IR::MAU::Table *tbl, int &entries, attached_entries_t &attached_entries) |
void | srams_left_best_option (int srams_left) |
int | stages_required () const |
void | tcams_left_best_option () |
void | unknown_atcams_needed (const IR::MAU::Table *tbl, LayoutOption *lo, int srams_left) |
void | unknown_srams_needed (const IR::MAU::Table *tbl, LayoutOption *lo, int srams_left) |
void | unknown_tcams_needed (const IR::MAU::Table *tbl, LayoutOption *lo, int tcams_left, int srams_left) |
bool | ways_provided (const IR::MAU::Table *tbl, LayoutOption *lo, int &calculated_depth) |
Static Public Member Functions | |
static StageUseEstimate | max () |
Public Attributes | |
safe_vector< ActionData::Format::Use > | action_formats |
int | exact_ixbar_bytes = 0 |
ActionData::FormatType_t | format_type |
int | hash_bits_masked = 0 |
safe_vector< LayoutOption > | layout_options |
int | local_tinds = 0 |
int | logical_ids = 0 |
int | maprams = 0 |
int | meter_alus = 0 |
MeterALU::Format::Use | meter_format |
size_t | preferred_index = 0 |
int | srams = 0 |
int | stats_alus = 0 |
int | tcams = 0 |
int | ternary_ixbar_groups = 0 |
void StageUseEstimate::calculate_partition_sizes (const IR::MAU::Table *tbl, LayoutOption *lo, int initial_ram_depth)
Calculates an estimate for the total number of logical tables, given the number of RAMs dedicated to an ATCAM table. The goal is to calculate the minimum number of logical tables needed, and then balance the sizes of those logical tables.
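The "minimum count, then balance" idea can be sketched as follows. This is only an illustration, not the compiler's implementation: the helper name `balance_partitions` and the per-table cap `max_per_table` are assumptions introduced for the example.

```cpp
#include <vector>

// Hypothetical helper (not part of StageUseEstimate): split `ram_depth` RAMs
// across the fewest logical tables, assuming each logical table may hold at
// most `max_per_table` RAMs, then even out the sizes.
std::vector<int> balance_partitions(int ram_depth, int max_per_table) {
    std::vector<int> sizes;
    if (ram_depth <= 0 || max_per_table <= 0) return sizes;

    // Minimum number of logical tables: ceiling division.
    int tables = (ram_depth + max_per_table - 1) / max_per_table;

    // Balance: every table gets `base` RAMs, the first `extra` get one more.
    int base = ram_depth / tables;
    int extra = ram_depth % tables;
    for (int i = 0; i < tables; ++i) {
        sizes.push_back(base + (i < extra ? 1 : 0));
    }
    return sizes;
}
```

With these assumptions, `balance_partitions(10, 4)` needs three logical tables and sizes them 4, 3 and 3 rather than 4, 4 and 2.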
void StageUseEstimate::calculate_way_sizes (const IR::MAU::Table *tbl, LayoutOption *lo, int &calculated_depth)
This calculates the number of simultaneous lookups within an exact match table that uses cuckoo hashing. RAM selection is done with particular bits on the 52 bit hash bus. The lower 40 bits are broken into four 10 bit sections for RAM line selection, and the upper 12 bits are used for RAM select.
In order to fit at least 90% of entries without having to move other match entries, generally 4 ways are required for completely independent lookups. Thus, if the number of entries requested for the table is smaller than a particular number, the algorithm will still bump the number of entries up in order to maintain this number of independent ways.
As an example, say that the number of entries for a particular table requires 4 independent ways of 8 RAMs each. Each way then takes one 10 bit section for RAM line selection and 3 of the upper bits for RAM select, so the 4 ways use all 40 + 12 = 52 bits of the hash bus.
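The allocation for this example can be written out as a small sketch. The struct `WayBits`, the function `allocate_ways`, and the choice to hand out bit ranges in ascending order per way are assumptions made for illustration; only the 10 bit row sections, the 40/12 split, and the 52 bit bus width come from the description above.

```cpp
#include <vector>

// Inclusive bit ranges on the hash bus used by one way (hypothetical type).
struct WayBits {
    int row_lo, row_hi;  // RAM line (row) select bits
    int sel_lo, sel_hi;  // RAM select bits; sel_hi < sel_lo means none needed
};

// ceil(log2(n)) for n >= 1.
int ceil_log2(int n) {
    int bits = 0;
    while ((1 << bits) < n) ++bits;
    return bits;
}

// Assign bits to `ways` independent ways of `rams_per_way` RAMs each.
// Returns an empty vector if the request does not fit on one hash bus.
std::vector<WayBits> allocate_ways(int ways, int rams_per_way) {
    const int kRowBits = 10;     // bits per RAM line select section
    const int kSelectBase = 40;  // RAM select bits start above the row bits
    const int kBusBits = 52;     // total width of the hash bus

    std::vector<WayBits> out;
    int sel_bits = ceil_log2(rams_per_way);
    if (ways * kRowBits > kSelectBase) return out;
    if (ways * sel_bits > kBusBits - kSelectBase) return out;

    for (int i = 0; i < ways; ++i) {
        int sel_lo = kSelectBase + i * sel_bits;
        out.push_back({i * kRowBits, i * kRowBits + kRowBits - 1,
                       sel_lo, sel_lo + sel_bits - 1});
    }
    return out;
}
```

Under these assumptions, `allocate_ways(4, 8)` gives the first way row bits 0-9 with select bits 40-42, and the fourth way row bits 30-39 with select bits 49-51.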
One optimization is that select bits can be reused. For example, say the number of entries requires 40 RAMs. One could in theory break this up into 5 independent ways of 8 RAMs. However, this would not fit onto the hash bus: it needs 50 bits of RAM row select + 15 bits of RAM select, 65 bits in total, far more than the 52 bits available.
However, the compiler will optimize so that way 1 and way 5 share the same 10 bits of RAM row select and the same 3 upper bits of RAM select. This means that ways 1 and 5 are not independent; they use exactly the same hash bits. This is not an issue for our constraint, as we still have 4 independent hash lookups.
This cannot be used indefinitely, however. For example, say we needed 64 RAMs, as 4 ways of 16 RAMs. Even though all RAM row selection fits in the lower 40 bits, this would require 16 bits of RAM select (4 bits per way), more than the 12 available. In this case, we cannot repeat the use of select bits, as this would not provide at least 4 independent hash lookups, which is the standard required by the driver.
In the case just described, we would actually require 2 separate RAM select buses, and thus two separate search buses. Fortunately, the maximum number of RAMs is 80 per MAU stage, so even if the input xbar requirements are high, the RAM array requirements are high as well.
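The bit-sharing argument can be checked with a short calculation. The sketch below is an assumption-laden illustration, not the compiler's code: `BusPlan` and `plan_hash_bus` are hypothetical names, and the model simply treats any lookup beyond the fourth as reusing the bits of an earlier way.

```cpp
#include <algorithm>

// Hypothetical summary of how one hash bus would be used.
struct BusPlan {
    int independent_ways;  // lookups that get their own hash bits
    int row_bits;          // RAM line select bits consumed
    int select_bits;       // RAM select bits consumed
    bool fits;             // true if one 52 bit hash bus is enough
};

// ceil(log2(n)) for n >= 1.
int select_bits_for(int rams) {
    int bits = 0;
    while ((1 << bits) < rams) ++bits;
    return bits;
}

// Model: at most 4 lookups get their own bits; further lookups repeat the
// bits of an earlier way, so they cost nothing extra on the bus.
BusPlan plan_hash_bus(int lookups, int rams_per_lookup) {
    BusPlan p;
    p.independent_ways = std::min(lookups, 4);
    p.row_bits = p.independent_ways * 10;
    p.select_bits = p.independent_ways * select_bits_for(rams_per_lookup);
    p.fits = p.row_bits <= 40 && p.select_bits <= 12;
    return p;
}
```

In this model, 5 lookups of 8 RAMs fit (40 row bits and 12 select bits, with the fifth lookup sharing), while 4 lookups of 16 RAMs do not, because they need 16 select bits.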
bool StageUseEstimate::can_be_identity_hash (const IR::MAU::Table *tbl, LayoutOption *lo, int &calculated_depth)
An optimization for an exact match table.
If a key is under a certain number of bits, then instead of using a random hash of that key to find the position, an identity hash can be used. This makes sense for keys of 10 bits or less, as an identity hash would just map to an individual RAM line.
A slightly wider key, for example a 12 bit key, can also be supported. If 4 entries fit per RAM line, then by using an identity hash, every entry can fit within a single RAM (1024 lines x 4 entries = 4096 = 2^12 possible keys).
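A minimal sketch of the 12 bit case follows. The mapping used here (low 10 bits pick the RAM line, the remaining high bits pick the entry within the line) is an assumption chosen for illustration; the documentation does not state which bits the hardware or compiler actually uses. The names `Slot`, `identity_slot` and `entries_per_line_needed` are hypothetical.

```cpp
#include <cstdint>

// Position of an entry inside one RAM (hypothetical type).
struct Slot {
    int line;   // RAM line, 0..1023
    int entry;  // entry index within that line
};

// Number of entries each RAM line must hold so that every possible key of
// `key_bits` bits has its own slot in a single 1024-line RAM.
int entries_per_line_needed(int key_bits) {
    const int kLineBits = 10;
    return key_bits <= kLineBits ? 1 : 1 << (key_bits - kLineBits);
}

// Assumed identity mapping: low 10 bits select the line, the rest select
// the entry within the line.
Slot identity_slot(std::uint32_t key) {
    const std::uint32_t kLineMask = 0x3FF;  // 10 bits
    Slot s;
    s.line = static_cast<int>(key & kLineMask);
    s.entry = static_cast<int>(key >> 10);
    return s;
}
```

For a 12 bit key, `entries_per_line_needed(12)` is 4, and the all-ones key 0xFFF lands on line 1023, entry 3 under this mapping.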
The driver, however, is limited in its current support. The driver uses a reserved entry as the miss entry, which is used whenever the table misses. If the miss-entry has action data requirements or potentially stateful requirements, then those entries must be stored.
The driver always programs the miss entry at the highest address. With an identity hash, if the table requires a miss-entry, the all 1 key will collide with the miss-entry.
This could be fixed by a dynamic miss-entry. If the miss-entry could move to any open entry, then all of these tables could support the identity hash. If the table were ever to fill all entries, then by definition the table could never miss.
The current limitation applies when a direct resource is required. This will reserve the all 0 miss-entry, no matter what.
In the future, when this is supported, a table with a direct resource can still use this identity optimization if and only if the miss-entry never uses that resource. This is a more complex check, but not hard to add.
UPDATE: Based on driver fixes, the driver now checks whether the EXM table requires a table location to be reserved for the default (miss) entry. Originally the check was simply whether or not the table used direct resources (action, idle, counter, meter, stful). Now, the check is whether any of the possible default actions use the direct resources.
With this change, the compiler check mimics the driver check and uses the identity hash in cases where the default actions do not have an attached resource.
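The updated rule can be sketched as a simple predicate. The `ActionInfo` struct and both function names are hypothetical stand-ins; the real check operates on the compiler's IR and is not reproduced here.

```cpp
#include <vector>

// Hypothetical summary of one table action.
struct ActionInfo {
    bool can_be_default;        // action may be installed as the miss action
    bool uses_direct_resource;  // action, idle, counter, meter or stful
};

// New-style check: a location must be reserved for the miss entry only if
// some possible default action uses a direct resource.
bool miss_entry_must_be_reserved(const std::vector<ActionInfo> &actions) {
    for (const ActionInfo &a : actions) {
        if (a.can_be_default && a.uses_direct_resource) return true;
    }
    return false;
}

// The identity hash is only considered when no location must be reserved.
bool identity_hash_allowed(const std::vector<ActionInfo> &actions) {
    return !miss_entry_must_be_reserved(actions);
}
```

Under the original rule, any use of a direct resource would have blocked the optimization; in this sketch a table whose hit actions use a counter but whose only default action does not is still eligible.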
void StageUseEstimate::options_to_atcam_entries (const IR::MAU::Table *tbl, int entries)
Calculating the total number of entries for each layout option for an atcam table. The number of RAMs for the whole table is the following calculation:
ways_per_partition: ceil_log2(select_bits of the atcam_partition_index)
partition_entries: total (logical) simultaneous lookups in the table
ram_depth: number of RAMs to hold all partitions, if the match was one ram wide
void StageUseEstimate::options_to_dleft_entries (const IR::MAU::Table *tbl, const attached_entries_t &attached_entries)
Currently a very simple way to split up the dleft hash tables into a reasonable number of ALUs with a particular size. Eventually, the hash mod can potentially be used in order to calculate a RAM size exactly, according to Mike Ferrera, so that the addresses don't have to be a power of two.
void StageUseEstimate::unknown_atcams_needed (const IR::MAU::Table *tbl, LayoutOption *lo, int srams_left)
Given a number of SRAMs, calculates the size of the possible ATCAM table for the given layout option. This differs from the normal SRAM calculation because the algorithm has to grow all ways simultaneously.
bool StageUseEstimate::ways_provided (const IR::MAU::Table *tbl, LayoutOption *lo, int &calculated_depth)
There are now two supported pragmas, ways and simul_lookups. For an SRAM based table that uses cuckoo hashing, multiple RAMs are looked up simultaneously, each accessed by a different hash function. The number of ways is the number of simultaneous lookups. Each way corresponds to a single hash function provided by the 52 bit hash bus.
The difference in the meaning is the following:
ways - Each hash function must be entirely independent, i.e. cannot use the same hash bits
simul_lookups - simultaneous lookups can use the same hash bits, an optimization supported only in Brig. Really, one can think of this as making an individual way deeper
simul_lookups is only supported internally at this point, and is necessary to make progress on power.p4