From patchwork Thu Aug 31 11:19:37 2023
X-Patchwork-Submitter: Fengnan Chang
X-Patchwork-Id: 130972
X-Patchwork-Delegate: thomas@monjalon.net
From: Fengnan Chang
To: anatoly.burakov@intel.com, dev@dpdk.org, dkozlyuk@nvidia.com
Cc: Fengnan Chang
Subject: [RFC PATCH] move memset out of hold lock when rte_free
Date: Thu, 31 Aug 2023 19:19:37 +0800
Message-Id: <20230831111937.60975-1-changfengnan@bytedance.com>
List-Id: DPDK patches and discussions

In rte_free, the most expensive part of the whole process is the memset.
We can do the memset without holding heap->lock; the benefit is reduced
lock contention when multiple threads try to alloc or free.
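The idea above can be sketched outside of DPDK. The following is a minimal, self-contained illustration of the pattern, not the patch itself: it uses a pthread mutex in place of rte_spinlock, and the `slow_free`/`fast_free` names and the free-list comment are hypothetical. The point is only that moving the (potentially large) memset out of the critical section stops it from serializing concurrent alloc/free callers.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;

/* Before: the memset runs inside the critical section, so every
 * concurrent alloc/free caller waits for it. */
static void slow_free(void *ptr, size_t len)
{
	pthread_mutex_lock(&heap_lock);
	memset(ptr, 0, len);
	/* ... return element to the heap's free list ... */
	pthread_mutex_unlock(&heap_lock);
}

/* After: the memset runs before the lock is taken; only the cheap
 * free-list manipulation is serialized. */
static void fast_free(void *ptr, size_t len)
{
	memset(ptr, 0, len);
	pthread_mutex_lock(&heap_lock);
	/* ... return element to the heap's free list ... */
	pthread_mutex_unlock(&heap_lock);
}
```

Both variants leave the buffer zeroed; they differ only in how long the lock is held, which is why the win in the table below grows with the allocation size.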
On a 40-core machine, I added some code to account for the cost of the
whole function in test_align_overlap_per_lcore with different alloc
sizes, under legacy memory mode without pre-existing hugepage files.
Test results:

size     w/      w/o
64       119us   118us
128      124us   118us
1024     137us   127us
4096     137us   140us
8192     142us   158us
16384    138us   186us
65536    139us   375us
131072   133us   627us
524277   694us   2973us
1048576  2117us  7685us

Signed-off-by: Fengnan Chang
---
 lib/eal/common/malloc_elem.c | 16 ----------------
 lib/eal/common/malloc_heap.c | 26 ++++++++++++++++++++++++--
 2 files changed, 24 insertions(+), 18 deletions(-)

diff --git a/lib/eal/common/malloc_elem.c b/lib/eal/common/malloc_elem.c
index 35a2313d04..763bbe179b 100644
--- a/lib/eal/common/malloc_elem.c
+++ b/lib/eal/common/malloc_elem.c
@@ -569,12 +569,6 @@ malloc_elem_join_adjacent_free(struct malloc_elem *elem)
 struct malloc_elem *
 malloc_elem_free(struct malloc_elem *elem)
 {
-	void *ptr;
-	size_t data_len;
-
-	ptr = RTE_PTR_ADD(elem, MALLOC_ELEM_HEADER_LEN);
-	data_len = elem->size - MALLOC_ELEM_OVERHEAD;
-
 	/*
 	 * Consider the element clean for the purposes of joining.
 	 * If both neighbors are clean or non-existent,
@@ -591,16 +585,6 @@ malloc_elem_free(struct malloc_elem *elem)
 
 	/* decrease heap's count of allocated elements */
 	elem->heap->alloc_count--;
-
-#ifndef RTE_MALLOC_DEBUG
-	/* Normally clear the memory when needed. */
-	if (!elem->dirty)
-		memset(ptr, 0, data_len);
-#else
-	/* Always poison the memory in debug mode. */
-	memset(ptr, MALLOC_POISON, data_len);
-#endif
-
 	return elem;
 }
 
diff --git a/lib/eal/common/malloc_heap.c b/lib/eal/common/malloc_heap.c
index d25bdc98f9..a5fdc4cc6f 100644
--- a/lib/eal/common/malloc_heap.c
+++ b/lib/eal/common/malloc_heap.c
@@ -862,6 +862,8 @@ malloc_heap_free(struct malloc_elem *elem)
 	unsigned int i, n_segs, before_space, after_space;
 	int ret;
 	bool unmapped = false;
+	void *ptr;
+	size_t data_len;
 	const struct internal_config *internal_conf =
 		eal_get_internal_configuration();
@@ -875,16 +877,36 @@ malloc_heap_free(struct malloc_elem *elem)
 	msl = elem->msl;
 	page_sz = (size_t)msl->page_sz;
 
-	rte_spinlock_lock(&(heap->lock));
-
 	void *asan_ptr = RTE_PTR_ADD(elem, MALLOC_ELEM_HEADER_LEN + elem->pad);
 	size_t asan_data_len = elem->size - MALLOC_ELEM_OVERHEAD - elem->pad;
+	ptr = RTE_PTR_ADD(elem, MALLOC_ELEM_HEADER_LEN);
+	data_len = elem->size - MALLOC_ELEM_OVERHEAD;
+
+	/* If orig_elem is clean, any child elem should be clean, so let's
+	 * do the memset before taking the lock.
+	 */
+	if (internal_conf->legacy_mem && !elem->orig_elem->dirty)
+		memset(ptr, 0, data_len);
+
+	rte_spinlock_lock(&(heap->lock));
 	/* mark element as free */
 	elem->state = ELEM_FREE;
 
 	elem = malloc_elem_free(elem);
+#ifndef RTE_MALLOC_DEBUG
+	if (internal_conf->legacy_mem) {
+		/* If orig_elem is dirty but the joined element is clean,
+		 * we need to do the memset now */
+		if (elem->orig_elem->dirty && !elem->dirty)
+			memset(ptr, 0, data_len);
+	} else if (!elem->dirty) {
+		memset(ptr, 0, data_len);
+	}
+#else
+	/* Always poison the memory in debug mode. */
+	memset(ptr, MALLOC_POISON, data_len);
+#endif
 
 	/* anything after this is a bonus */
 	ret = 0;
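The subtlety the second hunk deals with is that malloc_elem_free() may join the freed element with its neighbors, so a cleanliness decision made before taking the lock can be invalidated by the join. Below is a hedged sketch of that decision only; `need_memset_after_join` is a hypothetical helper (it does not exist in the patch), and `legacy_mem`, `dirty`, and `orig_elem` follow the names used above.

```c
#include <stdbool.h>

/* Sketch of the patch's two-phase clearing decision.
 * orig_dirty   - snapshot of elem->orig_elem->dirty (checked pre-lock)
 * joined_dirty - elem->dirty after malloc_elem_free() may have joined
 *                neighbors (checked under the lock)
 * Returns whether a memset is still required after the join. */
static bool need_memset_after_join(bool legacy_mem, bool orig_dirty,
		bool joined_dirty)
{
	if (legacy_mem) {
		/* The pre-lock memset already ran when orig_elem was
		 * clean; only a dirty origin that joined into a clean
		 * element still needs clearing now. */
		return orig_dirty && !joined_dirty;
	}
	/* Non-legacy mode: no pre-lock memset was attempted, so any
	 * clean joined element must be cleared here. */
	return !joined_dirty;
}
```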