[v4,0/4] Introduce Topology NUMA grouping for lcores

Message ID 20241105102849.1947-1-vipin.varghese@amd.com (mailing list archive)

Varghese, Vipin Nov. 5, 2024, 10:28 a.m. UTC
This series improves NUMA topology awareness for DPDK logical cores
(lcores). The goal is to expose an API that lets users select the
best-placed lcores for their application. Lcores can be selected from
different kinds of NUMA domains, such as CPU (cache) and I/O domains.

Change Summary:
 - Introduces the concept of NUMA domain partitioning based on CPU and
   I/O topology.
 - Adds support for grouping DPDK logical cores within the same Cache
   and I/O domain for improved locality.
 - Implements topology detection and core grouping logic that
   distinguishes between the following NUMA configurations:
    * CPU topology & I/O topology (e.g., AMD SoC EPYC, Intel Xeon SPR)
    * CPU+I/O topology (e.g., Ampere One with SLC, Intel Xeon SPR with SNC)
 - Improves performance by minimizing lcore dispersion across tiles or
   compute packages with different L2/L3 cache or I/O domains.

Reason:
 - Applications using DPDK libraries rely on consistent memory access.
 - Lcores should be in, or close to, the same NUMA domain as their I/O.
 - Lcores that work together should share the same cache.

Latency is minimized by using lcores that share the same NUMA topology.
Memory access is optimized by using cores within the same NUMA domain or
tile. Cache coherence traffic stays within the shared cache domain, which
reduces remote accesses via snooping to other tiles or compute packages
(accesses hit locally in L2 or L3 within the same NUMA domain).

Library dependency: hwloc

Topology Flags:
---------------
 - RTE_LCORE_DOMAIN_L1: group cores sharing the same L1 cache
 - RTE_LCORE_DOMAIN_SMT: same as RTE_LCORE_DOMAIN_L1
 - RTE_LCORE_DOMAIN_L2: group cores sharing the same L2 cache
 - RTE_LCORE_DOMAIN_L3: group cores sharing the same L3 cache
 - RTE_LCORE_DOMAIN_L4: group cores sharing the same L4 cache
 - RTE_LCORE_DOMAIN_IO: group cores sharing the same IO domain
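
The flag list above can be pictured as a set of single-bit selectors. The
sketch below is only an illustration: the cover letter does not state how the
flags are encoded, so the bit positions are assumptions, and every
SKETCH_/sketch_ name is hypothetical rather than part of the series. The one
property taken from the list is that SMT is an alias of L1.

```c
#include <stdint.h>

/* Assumed encoding: one bit per topology level (not taken from the patch). */
#define SKETCH_DOMAIN_L1  (UINT32_C(1) << 0)
#define SKETCH_DOMAIN_SMT SKETCH_DOMAIN_L1   /* alias, as in the flag list */
#define SKETCH_DOMAIN_L2  (UINT32_C(1) << 1)
#define SKETCH_DOMAIN_L3  (UINT32_C(1) << 2)
#define SKETCH_DOMAIN_L4  (UINT32_C(1) << 3)
#define SKETCH_DOMAIN_IO  (UINT32_C(1) << 4)
#define SKETCH_DOMAIN_ALL (SKETCH_DOMAIN_L1 | SKETCH_DOMAIN_L2 | \
			   SKETCH_DOMAIN_L3 | SKETCH_DOMAIN_L4 | \
			   SKETCH_DOMAIN_IO)

/* Return 1 when exactly one known topology flag is set, otherwise 0. */
int sketch_domain_flag_is_valid(uint32_t flag)
{
	if (flag == 0 || (flag & ~SKETCH_DOMAIN_ALL) != 0)
		return 0;
	/* A power of two has a single bit set. */
	return (flag & (flag - 1)) == 0;
}

/* Map a single topology flag to a printable name. */
const char *sketch_domain_name(uint32_t flag)
{
	switch (flag) {
	case SKETCH_DOMAIN_L1: return "L1";   /* also covers SMT */
	case SKETCH_DOMAIN_L2: return "L2";
	case SKETCH_DOMAIN_L3: return "L3";
	case SKETCH_DOMAIN_L4: return "L4";
	case SKETCH_DOMAIN_IO: return "IO";
	default:               return "unknown";
	}
}
```

Under this assumed encoding, a caller would pass one flag at a time as the
topology selector; combining two flags is rejected by the validity check.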

< Function: Purpose >
---------------------
 - rte_get_domain_count: get domain count based on Topology Flag
 - rte_lcore_count_from_domain: get valid lcores count under each domain
 - rte_get_lcore_in_domain: valid lcore id based on index
 - rte_lcore_cpuset_in_domain: return valid cpuset based on index
 - rte_lcore_is_main_in_domain: return true if the main lcore is in the
   domain, false otherwise
 - rte_get_next_lcore_from_domain: next valid lcore within domain
 - rte_get_next_lcore_from_next_domain: next valid lcore from next domain
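
To make the purpose of the query functions concrete, here is a minimal,
self-contained model of the same lookups over a fixed table. It does not call
the DPDK API: the cover letter does not give the function signatures, so the
sketch_ functions, their parameters, and the 8-lcore table are all
hypothetical stand-ins.

```c
#include <limits.h>

#define SKETCH_NUM_LCORES    8u
#define SKETCH_LCORE_INVALID UINT_MAX

/* Hypothetical lcore -> I/O domain table; not taken from real hardware. */
static const unsigned int sketch_domain_of[SKETCH_NUM_LCORES] = {
	0, 0, 1, 1, 0, 0, 1, 1
};

/* Number of domains (models the role of rte_get_domain_count). */
unsigned int sketch_domain_count(void)
{
	unsigned int i, max = 0;

	for (i = 0; i < SKETCH_NUM_LCORES; i++)
		if (sketch_domain_of[i] > max)
			max = sketch_domain_of[i];
	return max + 1;
}

/* Lcores in one domain (models rte_lcore_count_from_domain). */
unsigned int sketch_lcore_count_from_domain(unsigned int domain)
{
	unsigned int i, count = 0;

	for (i = 0; i < SKETCH_NUM_LCORES; i++)
		if (sketch_domain_of[i] == domain)
			count++;
	return count;
}

/* Lcore id at position pos within a domain (models rte_get_lcore_in_domain). */
unsigned int sketch_lcore_in_domain(unsigned int domain, unsigned int pos)
{
	unsigned int i;

	for (i = 0; i < SKETCH_NUM_LCORES; i++) {
		if (sketch_domain_of[i] != domain)
			continue;
		if (pos == 0)
			return i;
		pos--;
	}
	return SKETCH_LCORE_INVALID;
}

/*
 * Next lcore after prev in the same domain (models
 * rte_get_next_lcore_from_domain). Pass SKETCH_LCORE_INVALID to start.
 */
unsigned int sketch_next_lcore_from_domain(unsigned int domain,
					   unsigned int prev)
{
	unsigned int i = (prev == SKETCH_LCORE_INVALID) ? 0 : prev + 1;

	for (; i < SKETCH_NUM_LCORES; i++)
		if (sketch_domain_of[i] == domain)
			return i;
	return SKETCH_LCORE_INVALID;
}
```

With this table, domain 0 holds lcores 0, 1, 4, 5 and domain 1 holds lcores
2, 3, 6, 7, so the first lcore of a domain is not simply a multiple of the
domain size. The logs later in this message show the same effect.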

Note:
 1. A topology is a NUMA grouping level.
 2. A domain is one of the sub-groups within a specific topology.

Topology example: L1, L2, L3, L4, IO
Domain example: IO-A, IO-B

< MACRO: Purpose >
------------------
 - RTE_LCORE_FOREACH_DOMAIN: iterate lcores from all domains
 - RTE_LCORE_FOREACH_WORKER_DOMAIN: iterate worker lcores from all domains
 - RTE_LCORE_FORN_NEXT_DOMAIN: iterate over domains, selecting the n'th
   lcore of each
 - RTE_LCORE_FORN_WORKER_NEXT_DOMAIN: iterate over domains, selecting the
   n'th worker lcore of each
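
The "n'th lcore from each domain" iteration can be sketched as a for-loop
macro. This is a model of the idea only: the real macro arguments are not
shown in this cover letter, so SKETCH_FORN_NEXT_DOMAIN, its parameters, and
the lcore table below are hypothetical.

```c
#include <limits.h>

#define SKETCH_NUM_LCORES    8u
#define SKETCH_NUM_DOMAINS   2u
#define SKETCH_LCORE_INVALID UINT_MAX

/* Hypothetical lcore -> I/O domain table; not taken from real hardware. */
static const unsigned int sketch_domain_of[SKETCH_NUM_LCORES] = {
	0, 0, 1, 1, 0, 0, 1, 1
};

/* Return the n'th lcore of a domain, or SKETCH_LCORE_INVALID if absent. */
unsigned int sketch_nth_lcore(unsigned int domain, unsigned int n)
{
	unsigned int i;

	for (i = 0; i < SKETCH_NUM_LCORES; i++) {
		if (sketch_domain_of[i] != domain)
			continue;
		if (n == 0)
			return i;
		n--;
	}
	return SKETCH_LCORE_INVALID;
}

/* Visit every domain once, yielding the n'th lcore of each in lcore. */
#define SKETCH_FORN_NEXT_DOMAIN(lcore, dom, n)                    \
	for ((dom) = 0, (lcore) = sketch_nth_lcore((dom), (n));   \
	     (dom) < SKETCH_NUM_DOMAINS;                          \
	     (dom)++, (lcore) = sketch_nth_lcore((dom), (n)))

/*
 * Collect the n'th lcore of each domain into out (at most cap entries).
 * Domains with fewer than n + 1 lcores are skipped. Returns entries written.
 */
unsigned int sketch_pick_nth(unsigned int n, unsigned int *out,
			     unsigned int cap)
{
	unsigned int lcore, dom, used = 0;

	SKETCH_FORN_NEXT_DOMAIN(lcore, dom, n) {
		if (lcore == SKETCH_LCORE_INVALID || used == cap)
			continue;
		out[used++] = lcore;
	}
	return used;
}
```

Picking position 0 from each domain spreads work so that each selected lcore
sits in a different domain, which is the kind of placement the FORN macros
are described as providing.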

Future work (after merge):
--------------------------
 - dma-perf per IO NUMA
 - eventdev per L3 NUMA
 - pipeline per SMT|L3 NUMA
 - distributor per L3 for Port-Queue
 - l2fwd-power per SMT
 - testpmd option for IO NUMA per port

Platform tested on:
-------------------
 - INTEL(R) XEON(R) PLATINUM 8562Y+ (supports IO NUMA 1 & 2)
 - AMD EPYC 8534P (supports IO NUMA 1 & 2)
 - AMD EPYC 9554 (supports IO NUMA 1, 2, 4)

Logs:
-----
1. INTEL(R) XEON(R) PLATINUM 8562Y+:
 - SNC=1
        Domain (IO): at index (0) there are 48 core, with (0) at index 0
 - SNC=2
        Domain (IO): at index (0) there are 24 core, with (0) at index 0
        Domain (IO): at index (1) there are 24 core, with (12) at index 0

2. AMD EPYC 8534P:
 - NPS=1:
        Domain (IO): at index (0) there are 128 core, with (0) at index 0
 - NPS=2:
        Domain (IO): at index (0) there are 64 core, with (0) at index 0
        Domain (IO): at index (1) there are 64 core, with (32) at index 0
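
As a reading aid for the log lines above, each line reports the topology
name, the domain index, the number of lcores in that domain, and the lcore id
found at position 0. The helper below reproduces that layout. The format
string is copied from the logs, but the function itself is hypothetical and
how the series actually prints these lines is an assumption.

```c
#include <stdio.h>

/*
 * Format one domain summary line in the layout shown in the logs.
 * Returns the snprintf result: the length that was (or would be) written.
 */
int sketch_format_domain_line(char *buf, size_t len, const char *topology,
			      unsigned int domain_index,
			      unsigned int lcore_count,
			      unsigned int first_lcore)
{
	return snprintf(buf, len,
		"Domain (%s): at index (%u) there are %u core, with (%u) at index 0",
		topology, domain_index, lcore_count, first_lcore);
}
```

For example, the second NPS=2 line corresponds to topology "IO", domain
index 1, 64 lcores, and lcore 32 at position 0.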

Signed-off-by: Vipin Varghese <vipin.varghese@amd.com>

Vipin Varghese (4):
  eal/lcore: add topology based functions
  test/lcore: enable tests for topology
  doc: add topology grouping details
  examples: update with lcore topology API

 app/test/test_lcores.c                        | 528 +++++++++++++
 config/meson.build                            |  18 +
 .../prog_guide/env_abstraction_layer.rst      |  22 +
 examples/helloworld/main.c                    | 154 +++-
 examples/l2fwd/main.c                         |  56 +-
 examples/skeleton/basicfwd.c                  |  22 +
 lib/eal/common/eal_common_lcore.c             | 714 ++++++++++++++++++
 lib/eal/common/eal_private.h                  |  58 ++
 lib/eal/freebsd/eal.c                         |  10 +
 lib/eal/include/rte_lcore.h                   | 209 +++++
 lib/eal/linux/eal.c                           |  11 +
 lib/eal/meson.build                           |   4 +
 lib/eal/version.map                           |  11 +
 lib/eal/windows/eal.c                         |  12 +
 14 files changed, 1819 insertions(+), 10 deletions(-)