From patchwork Mon Jan 24 14:59:53 2022
X-Patchwork-Submitter: Olivier Matz <olivier.matz@6wind.com>
X-Patchwork-Id: 106351
X-Patchwork-Delegate: thomas@monjalon.net
From: Olivier Matz <olivier.matz@6wind.com>
To: dev@dpdk.org
Cc: mb@smartsharesystems.com, andrew.rybchenko@oktetlabs.ru,
 bruce.richardson@intel.com, jerinjacobk@gmail.com,
 olivier.matz@6wind.com, thomas@monjalon.net
Subject: [PATCH v2] mempool: test performance with constant n
Date: Mon, 24 Jan 2022 15:59:53 +0100
Message-Id: <20220124145953.14281-1-olivier.matz@6wind.com>
In-Reply-To: <20220119113732.40167-1-mb@smartsharesystems.com>
References: <20220119113732.40167-1-mb@smartsharesystems.com>
List-Id: DPDK patches and discussions

From: Morten Brørup <mb@smartsharesystems.com>

"What gets measured gets done."
This patch adds mempool performance tests where the number of objects
to put and get is constant at compile time, which may significantly
improve the performance of these functions. [*]

Also, it is ensured that the array holding the objects used for testing
is cache line aligned, for maximum performance.

And finally, the following entries are added to the list of tests:
- Number of kept objects: 512
- Number of objects to get and to put: the number of pointers fitting
  into a cache line, i.e. 8 or 16

[*] Some example performance test (with cache) results:

get_bulk=4 put_bulk=4 keep=128 constant_n=false rate_persec=280480972
get_bulk=4 put_bulk=4 keep=128 constant_n=true rate_persec=622159462

get_bulk=8 put_bulk=8 keep=128 constant_n=false rate_persec=477967155
get_bulk=8 put_bulk=8 keep=128 constant_n=true rate_persec=917582643

get_bulk=32 put_bulk=32 keep=32 constant_n=false rate_persec=871248691
get_bulk=32 put_bulk=32 keep=32 constant_n=true rate_persec=1134021836

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---

Hi Morten,

Here is the updated patch.

I ran mempool_perf on my desktop machine, but I cannot reproduce your
numbers: constant and non-constant n give almost the same rate on my
machine (it is even worse with constants). I tested with your initial
patch and with this one. Can you please try this patch, and/or give
some details about your test environment?

Here is what I get:

with your patch:

mempool_autotest cache=512 cores=1 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=false rate_persec=152620236
mempool_autotest cache=512 cores=1 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=true rate_persec=144716595
mempool_autotest cache=512 cores=2 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=false rate_persec=306996838
mempool_autotest cache=512 cores=2 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=true rate_persec=287375359
mempool_autotest cache=512 cores=12 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=false rate_persec=977626723
mempool_autotest cache=512 cores=12 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=true rate_persec=963103944

with this patch:

mempool_autotest cache=512 cores=1 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=0 rate_persec=156460646
mempool_autotest cache=512 cores=1 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=1 rate_persec=142173798
mempool_autotest cache=512 cores=2 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=0 rate_persec=312410111
mempool_autotest cache=512 cores=2 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=1 rate_persec=281699942
mempool_autotest cache=512 cores=12 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=0 rate_persec=983315247
mempool_autotest cache=512 cores=12 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=1 rate_persec=950350638

v2:
- use a flag instead of a negative value to enable tests with
  compile-time constant
- use a static inline function instead of a macro
- remove some "noise" (do not change variable type when not required)

Thanks,
Olivier
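For readers wondering where the gain with constant n can come from: the
mempool get/put fast path ends up in a small pointer-copy loop, and
when the burst size is a compile-time constant the compiler can
propagate it into that loop and unroll it. A minimal sketch of the
effect (copy_objs() is a hypothetical stand-in, not a DPDK function):

	/* With a constant n, the compiler can fully unroll this loop
	 * into straight-line loads and stores; with a run-time n, the
	 * loop and its exit branch remain. */
	static inline void
	copy_objs(void **dst, void * const *src, unsigned int n)
	{
		unsigned int i;

		for (i = 0; i < n; i++)
			dst[i] = src[i];
	}

	void get_constant(void **dst, void * const *src)
	{
		copy_objs(dst, src, 8);	/* n known at compile time */
	}

	void get_runtime(void **dst, void * const *src, unsigned int n)
	{
		copy_objs(dst, src, n);	/* n only known at run time */
	}

This is also why the test_loop() helper in the patch below is
__rte_always_inline: each call site that passes literal burst sizes
gets its own specialized copy of the loop body.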
 app/test/test_mempool_perf.c | 110 ++++++++++++++++++++++++-----------
 1 file changed, 77 insertions(+), 33 deletions(-)

-- 
2.30.2

diff --git a/app/test/test_mempool_perf.c b/app/test/test_mempool_perf.c
index 87ad251367..ce7c6241ab 100644
--- a/app/test/test_mempool_perf.c
+++ b/app/test/test_mempool_perf.c
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2010-2014 Intel Corporation
+ * Copyright(c) 2022 SmartShare Systems
  */
 
 #include <string.h>
@@ -55,19 +56,24 @@
  *
  *      - Bulk get from 1 to 32
  *      - Bulk put from 1 to 32
+ *      - Bulk get and put from 1 to 32, compile time constant
  *
  *    - Number of kept objects (*n_keep*)
  *
  *      - 32
  *      - 128
+ *      - 512
  */
 
 #define N 65536
 #define TIME_S 5
 #define MEMPOOL_ELT_SIZE 2048
-#define MAX_KEEP 128
+#define MAX_KEEP 512
 #define MEMPOOL_SIZE ((rte_lcore_count()*(MAX_KEEP+RTE_MEMPOOL_CACHE_MAX_SIZE))-1)
 
+/* Number of pointers fitting into one cache line. */
+#define CACHE_LINE_BURST (RTE_CACHE_LINE_SIZE / sizeof(uintptr_t))
+
 #define LOG_ERR() printf("test failed at %s():%d\n", __func__, __LINE__)
 #define RET_ERR() do {							\
 		LOG_ERR();						\
@@ -91,6 +97,9 @@ static unsigned n_put_bulk;
 /* number of objects retrieved from mempool before putting them back */
 static unsigned n_keep;
 
+/* true if we want to test with constant n_get_bulk and n_put_bulk */
+static int use_constant_values;
+
 /* number of enqueues / dequeues */
 struct mempool_test_stats {
 	uint64_t enq_count;
@@ -111,11 +120,43 @@ my_obj_init(struct rte_mempool *mp, __rte_unused void *arg,
 	*objnum = i;
 }
 
+static __rte_always_inline int
+test_loop(struct rte_mempool *mp, struct rte_mempool_cache *cache,
+	  unsigned int x_keep, unsigned int x_get_bulk, unsigned int x_put_bulk)
+{
+	void *obj_table[MAX_KEEP] __rte_cache_aligned;
+	unsigned int idx;
+	unsigned int i;
+	int ret;
+
+	for (i = 0; likely(i < (N / x_keep)); i++) {
+		/* get x_keep objects by bulk of x_get_bulk */
+		for (idx = 0; idx < x_keep; idx += x_get_bulk) {
+			ret = rte_mempool_generic_get(mp,
+						      &obj_table[idx],
+						      x_get_bulk,
+						      cache);
+			if (unlikely(ret < 0)) {
+				rte_mempool_dump(stdout, mp);
+				return ret;
+			}
+		}
+
+		/* put the objects back by bulk of x_put_bulk */
+		for (idx = 0; idx < x_keep; idx += x_put_bulk) {
+			rte_mempool_generic_put(mp,
+						&obj_table[idx],
+						x_put_bulk,
+						cache);
+		}
+	}
+
+	return 0;
+}
+
 static int
 per_lcore_mempool_test(void *arg)
 {
-	void *obj_table[MAX_KEEP];
-	unsigned i, idx;
 	struct rte_mempool *mp = arg;
 	unsigned lcore_id = rte_lcore_id();
 	int ret = 0;
@@ -139,6 +180,9 @@ per_lcore_mempool_test(void *arg)
 		GOTO_ERR(ret, out);
 	if (((n_keep / n_put_bulk) * n_put_bulk) != n_keep)
 		GOTO_ERR(ret, out);
+	/* for constant n, n_get_bulk and n_put_bulk must be the same */
+	if (use_constant_values && n_put_bulk != n_get_bulk)
+		GOTO_ERR(ret, out);
 
 	stats[lcore_id].enq_count = 0;
 
@@ -149,31 +193,23 @@ per_lcore_mempool_test(void *arg)
 
 	start_cycles = rte_get_timer_cycles();
 
 	while (time_diff/hz < TIME_S) {
-		for (i = 0; likely(i < (N/n_keep)); i++) {
-			/* get n_keep objects by bulk of n_bulk */
-			idx = 0;
-			while (idx < n_keep) {
-				ret = rte_mempool_generic_get(mp,
-							      &obj_table[idx],
-							      n_get_bulk,
-							      cache);
-				if (unlikely(ret < 0)) {
-					rte_mempool_dump(stdout, mp);
-					/* in this case, objects are lost... */
-					GOTO_ERR(ret, out);
-				}
-				idx += n_get_bulk;
-			}
+		if (!use_constant_values)
+			ret = test_loop(mp, cache, n_keep, n_get_bulk, n_put_bulk);
+		else if (n_get_bulk == 1)
+			ret = test_loop(mp, cache, n_keep, 1, 1);
+		else if (n_get_bulk == 4)
+			ret = test_loop(mp, cache, n_keep, 4, 4);
+		else if (n_get_bulk == CACHE_LINE_BURST)
+			ret = test_loop(mp, cache, n_keep,
+					CACHE_LINE_BURST, CACHE_LINE_BURST);
+		else if (n_get_bulk == 32)
+			ret = test_loop(mp, cache, n_keep, 32, 32);
+		else
+			ret = -1;
+
+		if (ret < 0)
+			GOTO_ERR(ret, out);
 
-			/* put the objects back */
-			idx = 0;
-			while (idx < n_keep) {
-				rte_mempool_generic_put(mp, &obj_table[idx],
-							n_put_bulk,
-							cache);
-				idx += n_put_bulk;
-			}
-		}
 		end_cycles = rte_get_timer_cycles();
 		time_diff = end_cycles - start_cycles;
 		stats[lcore_id].enq_count += N;
@@ -203,10 +239,10 @@ launch_cores(struct rte_mempool *mp, unsigned int cores)
 	memset(stats, 0, sizeof(stats));
 
 	printf("mempool_autotest cache=%u cores=%u n_get_bulk=%u "
-	       "n_put_bulk=%u n_keep=%u ",
+	       "n_put_bulk=%u n_keep=%u constant_n=%u ",
 	       use_external_cache ?
 		   external_cache_size : (unsigned) mp->cache_size,
-	       cores, n_get_bulk, n_put_bulk, n_keep);
+	       cores, n_get_bulk, n_put_bulk, n_keep, use_constant_values);
 
 	if (rte_mempool_avail_count(mp) != MEMPOOL_SIZE) {
 		printf("mempool is not full\n");
@@ -253,9 +289,9 @@ launch_cores(struct rte_mempool *mp, unsigned int cores)
 static int
 do_one_mempool_test(struct rte_mempool *mp, unsigned int cores)
 {
-	unsigned bulk_tab_get[] = { 1, 4, 32, 0 };
-	unsigned bulk_tab_put[] = { 1, 4, 32, 0 };
-	unsigned keep_tab[] = { 32, 128, 0 };
+	unsigned int bulk_tab_get[] = { 1, 4, CACHE_LINE_BURST, 32, 0 };
+	unsigned int bulk_tab_put[] = { 1, 4, CACHE_LINE_BURST, 32, 0 };
+	unsigned int keep_tab[] = { 32, 128, 512, 0 };
 	unsigned *get_bulk_ptr;
 	unsigned *put_bulk_ptr;
 	unsigned *keep_ptr;
@@ -265,13 +301,21 @@ do_one_mempool_test(struct rte_mempool *mp, unsigned int cores)
 		for (put_bulk_ptr = bulk_tab_put; *put_bulk_ptr; put_bulk_ptr++) {
 			for (keep_ptr = keep_tab; *keep_ptr; keep_ptr++) {
 
+				use_constant_values = 0;
 				n_get_bulk = *get_bulk_ptr;
 				n_put_bulk = *put_bulk_ptr;
 				n_keep = *keep_ptr;
 				ret = launch_cores(mp, cores);
-
 				if (ret < 0)
 					return -1;
+
+				/* replay test with constant values */
+				if (n_get_bulk == n_put_bulk) {
+					use_constant_values = 1;
+					ret = launch_cores(mp, cores);
+					if (ret < 0)
+						return -1;
+				}
 			}
 		}
 	}
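For reference, the rate_persec numbers above come from the dpdk-test
application; a rough sketch of how to reproduce them, assuming a
default meson build (build directory and EAL options will differ per
setup):

	$ meson setup build && ninja -C build
	$ ./build/app/test/dpdk-test
	RTE>> mempool_perf_autotest

Each n_get_bulk/n_put_bulk/n_keep combination runs for TIME_S seconds
per core count, and combinations where n_get_bulk == n_put_bulk are
replayed with use_constant_values set, producing the constant_n=1
lines.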