[v2,3/6] service: reduce average case service core overhead

Message ID 20221005091615.94652-4-mattias.ronnblom@ericsson.com (mailing list archive)
State Accepted, archived
Delegated to: David Marchand
Series Service cores performance and statistics improvements

Checks

Context Check Description
ci/checkpatch success coding style OK

Commit Message

Mattias Rönnblom Oct. 5, 2022, 9:16 a.m. UTC
  Optimize service loop so that the starting point is the lowest-indexed
service mapped to the lcore in question, and terminate the loop at the
highest-indexed service.

While the worst case latency remains the same, this patch
significantly reduces the service framework overhead for the average
case. In particular, scenarios where an lcore only runs a single
service, or multiple services whose id values are close (e.g., three
services with ids 17, 18 and 22), show significant improvements.

The worst case is where the lcore has two services mapped to it: one
with service id 0 and the other with id 63.
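
As a rough illustration of the average and worst cases described above,
the sketch below computes how many service ids the bounded loop visits
for a given mask. The helper name scan_span() is hypothetical and is not
part of rte_service.c; it uses the "ll" builtin variants so that the
arithmetic is 64-bit regardless of the width of long.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not in rte_service.c): number of service ids
 * visited when looping from the lowest to the highest set bit.
 */
static unsigned int
scan_span(uint64_t service_mask)
{
	unsigned int start_id;
	unsigned int end_id;

	/* ctz/clz are undefined for 0, so the caller must filter it out. */
	assert(service_mask != 0);

	/* Index of the lowest-indexed mapped service. */
	start_id = (unsigned int)__builtin_ctzll(service_mask);
	/* One past the index of the highest-indexed mapped service. */
	end_id = 64 - (unsigned int)__builtin_clzll(service_mask);

	return end_id - start_id;
}
```

With services 17, 18 and 22 mapped, scan_span() yields 6 (ids 17
through 22). A single mapped service yields 1, and the id 0 plus id 63
case yields the full 64.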

On a service lcore serving a single service, the service loop overhead
is reduced from ~190 core clock cycles to ~46, on an Intel Cascade
Lake generation Xeon. On weakly ordered CPUs, the gain is larger,
since the loop included load-acquire atomic operations.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>

---

v2: Added build-time assertion to prevent the maximum number of
    services from accidentally being changed to a higher value than
    the implementation supports. (Harry van Haaren)
---
 lib/eal/common/rte_service.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)
  

Patch

diff --git a/lib/eal/common/rte_service.c b/lib/eal/common/rte_service.c
index 4d51de638d..035c36b8bb 100644
--- a/lib/eal/common/rte_service.c
+++ b/lib/eal/common/rte_service.c
@@ -78,6 +78,11 @@  static uint32_t rte_service_library_initialized;
 int32_t
 rte_service_init(void)
 {
+	/* Hard limit due to the use of an uint64_t-based bitmask (and the
+	 * clzl intrinsic).
+	 */
+	RTE_BUILD_BUG_ON(RTE_SERVICE_NUM_MAX > 64);
+
 	if (rte_service_library_initialized) {
 		RTE_LOG(NOTICE, EAL,
 			"service library init() called, init flag %d\n",
@@ -472,7 +477,6 @@  static int32_t
 service_runner_func(void *arg)
 {
 	RTE_SET_USED(arg);
-	uint32_t i;
 	const int lcore = rte_lcore_id();
 	struct core_state *cs = &lcore_states[lcore];
 
@@ -486,10 +490,17 @@  service_runner_func(void *arg)
 			RUNSTATE_RUNNING) {
 
 		const uint64_t service_mask = cs->service_mask;
+		uint8_t start_id;
+		uint8_t end_id;
+		uint8_t i;
 
-		for (i = 0; i < RTE_SERVICE_NUM_MAX; i++) {
-			if (!service_registered(i))
-				continue;
+		if (service_mask == 0)
+			continue;
+
+		start_id = __builtin_ctzl(service_mask);
+		end_id = 64 - __builtin_clzl(service_mask);
+
+		for (i = start_id; i < end_id; i++) {
 			/* return value ignored as no change to code flow */
 			service_run(i, cs, service_mask, service_get(i), 1);
 		}
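
The loop in the hunk above can be tried outside DPDK with a minimal
standalone sketch such as the one below. The names scan_stats and
scan_services() are hypothetical. The per-id bit test stands in for
service_run(), on the assumption that service_run() skips ids whose bit
is clear in service_mask. The sketch uses __builtin_ctzll() and
__builtin_clzll() rather than the "l" variants, since the latter take
unsigned long, whose width depends on the platform.

```c
#include <stdint.h>

/* Hypothetical counters, for illustration only. */
struct scan_stats {
	unsigned int visited; /* ids the loop iterated over */
	unsigned int ran;     /* ids whose bit is set in the mask */
};

/* Hypothetical standalone version of the bounded service loop. */
static struct scan_stats
scan_services(uint64_t service_mask)
{
	struct scan_stats st = { 0, 0 };
	uint8_t start_id;
	uint8_t end_id;
	uint8_t i;

	/* ctz/clz are undefined for 0; nothing is mapped anyway. */
	if (service_mask == 0)
		return st;

	start_id = (uint8_t)__builtin_ctzll(service_mask);
	end_id = (uint8_t)(64 - __builtin_clzll(service_mask));

	for (i = start_id; i < end_id; i++) {
		st.visited++;
		/* Stand-in for service_run(): act only on mapped ids. */
		if (service_mask & (UINT64_C(1) << i))
			st.ran++;
	}

	return st;
}
```

Ids inside the range that are not mapped (19 to 21 in the 17/18/22
example) are still visited, so the saving comes only from trimming both
ends of the range.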