From patchwork Fri Jul 13 12:47:57 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anatoly Burakov X-Patchwork-Id: 43043 Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id CF7B72E81; Fri, 13 Jul 2018 14:48:13 +0200 (CEST) Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by dpdk.org (Postfix) with ESMTP id 945222C18 for ; Fri, 13 Jul 2018 14:48:09 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga106.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 13 Jul 2018 05:48:06 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.51,347,1526367600"; d="scan'208";a="245436371" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by fmsmga006.fm.intel.com with ESMTP; 13 Jul 2018 05:48:05 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w6DCm4en016830; Fri, 13 Jul 2018 13:48:04 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w6DCm4tg006236; Fri, 13 Jul 2018 13:48:04 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w6DCm450006232; Fri, 13 Jul 2018 13:48:04 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: ray.kinsella@intel.com, kuralamudhan.ramakrishnan@intel.com, louise.m.daly@intel.com, bruce.richardson@intel.com, ferruh.yigit@intel.com, konstantin.ananyev@intel.com, thomas@monjalon.net Date: Fri, 13 Jul 2018 13:47:57 +0100 Message-Id: <53d64ed31c0cc2f570dd78e0226c3fcd986184e0.1531485955.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v3 1/8] fbarray: support no-shconf mode X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" When using --no-shconf option, the expectation is that no multiprocess will be supported as no shared files are created. However, fbarray still creates some shared files that prevent multiple processes with the same prefix from starting. Fix this by avoiding creating shared files whenever noshconf option is specified. Since virtual areas we get from eal_get_virtual_area() are read-only, remap them as writable. Signed-off-by: Anatoly Burakov --- Notes: RFC->v1: - Use --no-shconf only lib/librte_eal/common/eal_common_fbarray.c | 71 +++++++++++++--------- 1 file changed, 42 insertions(+), 29 deletions(-) diff --git a/lib/librte_eal/common/eal_common_fbarray.c b/lib/librte_eal/common/eal_common_fbarray.c index 977174c4f..43caf3ced 100644 --- a/lib/librte_eal/common/eal_common_fbarray.c +++ b/lib/librte_eal/common/eal_common_fbarray.c @@ -705,39 +705,52 @@ rte_fbarray_init(struct rte_fbarray *arr, const char *name, unsigned int len, if (data == NULL) goto fail; - eal_get_fbarray_path(path, sizeof(path), name); + if (internal_config.no_shconf) { + /* remap virtual area as writable */ + void *new_data = mmap(data, mmap_len, PROT_READ | PROT_WRITE, + MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); + if (new_data == MAP_FAILED) { + RTE_LOG(DEBUG, EAL, "%s(): couldn't remap anonymous memory: %s\n", + __func__, strerror(errno)); + goto fail; + } + } else { + eal_get_fbarray_path(path, sizeof(path), name); - /* - * Each fbarray is unique to process namespace, i.e. the filename - * depends on process prefix. Try to take out a lock and see if we - * succeed. If we don't, someone else is using it already. - */ - fd = open(path, O_CREAT | O_RDWR, 0600); - if (fd < 0) { - RTE_LOG(DEBUG, EAL, "%s(): couldn't open %s: %s\n", __func__, - path, strerror(errno)); - rte_errno = errno; - goto fail; - } else if (flock(fd, LOCK_EX | LOCK_NB)) { - RTE_LOG(DEBUG, EAL, "%s(): couldn't lock %s: %s\n", __func__, - path, strerror(errno)); - rte_errno = EBUSY; - goto fail; - } + /* + * Each fbarray is unique to process namespace, i.e. the + * filename depends on process prefix. Try to take out a lock + * and see if we succeed. If we don't, someone else is using it + * already. + */ + fd = open(path, O_CREAT | O_RDWR, 0600); + if (fd < 0) { + RTE_LOG(DEBUG, EAL, "%s(): couldn't open %s: %s\n", + __func__, path, strerror(errno)); + rte_errno = errno; + goto fail; + } else if (flock(fd, LOCK_EX | LOCK_NB)) { + RTE_LOG(DEBUG, EAL, "%s(): couldn't lock %s: %s\n", + __func__, path, strerror(errno)); + rte_errno = EBUSY; + goto fail; + } - /* take out a non-exclusive lock, so that other processes could still - * attach to it, but no other process could reinitialize it. - */ - if (flock(fd, LOCK_SH | LOCK_NB)) { - rte_errno = errno; - goto fail; - } + /* take out a non-exclusive lock, so that other processes could + * still attach to it, but no other process could reinitialize + * it. + */ + if (flock(fd, LOCK_SH | LOCK_NB)) { + rte_errno = errno; + goto fail; + } - if (resize_and_map(fd, data, mmap_len)) - goto fail; + if (resize_and_map(fd, data, mmap_len)) + goto fail; - /* we've mmap'ed the file, we can now close the fd */ - close(fd); + /* we've mmap'ed the file, we can now close the fd */ + close(fd); + } /* initialize the data */ memset(data, 0, mmap_len); From patchwork Fri Jul 13 12:47:58 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anatoly Burakov X-Patchwork-Id: 43044 Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 4947232A5; Fri, 13 Jul 2018 14:48:15 +0200 (CEST) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by dpdk.org (Postfix) with ESMTP id BDFB52C18 for ; Fri, 13 Jul 2018 14:48:09 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 13 Jul 2018 05:48:07 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.51,347,1526367600"; d="scan'208";a="245000235" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by fmsmga005.fm.intel.com with ESMTP; 13 Jul 2018 05:48:05 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w6DCm43x016833; Fri, 13 Jul 2018 13:48:04 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w6DCm4Lq006243; Fri, 13 Jul 2018 13:48:04 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w6DCm4AY006239; Fri, 13 Jul 2018 13:48:04 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: ray.kinsella@intel.com, kuralamudhan.ramakrishnan@intel.com, louise.m.daly@intel.com, bruce.richardson@intel.com, ferruh.yigit@intel.com, konstantin.ananyev@intel.com, thomas@monjalon.net Date: Fri, 13 Jul 2018 13:47:58 +0100 Message-Id: X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v3 2/8] ipc: add support for no-shconf mode X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" IPC is an inter-process communication mechanism. Since no secondaries can ever be expected to run in no-shconf mode, IPC will be useless, so do not enable it in the first place. In the interests of API usage convenience, we will still allow registering callbacks, but obviously they won't ever be triggered. Signed-off-by: Anatoly Burakov --- Notes: RFC->v1: - Use --no-shconf only lib/librte_eal/common/eal_common_proc.c | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/lib/librte_eal/common/eal_common_proc.c b/lib/librte_eal/common/eal_common_proc.c index f010ef59e..c19b4b406 100644 --- a/lib/librte_eal/common/eal_common_proc.c +++ b/lib/librte_eal/common/eal_common_proc.c @@ -626,6 +626,14 @@ rte_mp_channel_init(void) int dir_fd; pthread_t mp_handle_tid, async_reply_handle_tid; + /* in no shared files mode, we do not have secondary processes support, + * so no need to initialize IPC. + */ + if (internal_config.no_shconf) { + RTE_LOG(DEBUG, EAL, "No shared files mode enabled, IPC will be disabled\n"); + return 0; + } + /* create filter path */ create_socket_path("*", path, sizeof(path)); strlcpy(mp_filter, basename(path), sizeof(mp_filter)); @@ -988,6 +996,12 @@ rte_mp_request_sync(struct rte_mp_msg *req, struct rte_mp_reply *reply, if (check_input(req) == false) return -1; + + if (internal_config.no_shconf) { + RTE_LOG(DEBUG, EAL, "No shared files mode enabled, IPC is disabled\n"); + return 0; + } + if (gettimeofday(&now, NULL) < 0) { RTE_LOG(ERR, EAL, "Faile to get current time\n"); rte_errno = errno; @@ -1072,6 +1086,12 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts, if (check_input(req) == false) return -1; + + if (internal_config.no_shconf) { + RTE_LOG(DEBUG, EAL, "No shared files mode enabled, IPC is disabled\n"); + return 0; + } + if (gettimeofday(&now, NULL) < 0) { RTE_LOG(ERR, EAL, "Faile to get current time\n"); rte_errno = errno; @@ -1213,5 +1233,10 @@ rte_mp_reply(struct rte_mp_msg *msg, const char *peer) return -1; } + if (internal_config.no_shconf) { + RTE_LOG(DEBUG, EAL, "No shared files mode enabled, IPC is disabled\n"); + return 0; + } + return mp_send(msg, peer, MP_REP); } From patchwork Fri Jul 13 12:47:59 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anatoly Burakov X-Patchwork-Id: 43048 Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 3973D4C8C; Fri, 13 Jul 2018 14:48:23 +0200 (CEST) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by dpdk.org (Postfix) with ESMTP id 42C092C2F for ; Fri, 13 Jul 2018 14:48:12 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 13 Jul 2018 05:48:10 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.51,347,1526367600"; d="scan'208";a="72469314" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by orsmga001.jf.intel.com with ESMTP; 13 Jul 2018 05:48:05 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w6DCm4RG016836; Fri, 13 Jul 2018 13:48:04 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w6DCm41S006250; Fri, 13 Jul 2018 13:48:04 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w6DCm4Yv006246; Fri, 13 Jul 2018 13:48:04 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: Bruce Richardson , ray.kinsella@intel.com, kuralamudhan.ramakrishnan@intel.com, louise.m.daly@intel.com, ferruh.yigit@intel.com, konstantin.ananyev@intel.com, thomas@monjalon.net Date: Fri, 13 Jul 2018 13:47:59 +0100 Message-Id: <1657ee590afbaddc3fb66c3dbcddd767bf45b6e5.1531485955.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v3 3/8] eal: add support for no-shconf for hugepage info X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Do not create any shared hugepage size info files if we were asked to not create any shared files. Signed-off-by: Anatoly Burakov --- Notes: RFC->v1: - Use --no-shconf only lib/librte_eal/bsdapp/eal/eal_hugepage_info.c | 4 ++++ lib/librte_eal/linuxapp/eal/eal_hugepage_info.c | 4 ++++ 2 files changed, 8 insertions(+) diff --git a/lib/librte_eal/bsdapp/eal/eal_hugepage_info.c b/lib/librte_eal/bsdapp/eal/eal_hugepage_info.c index 836feb672..1e8f5df23 100644 --- a/lib/librte_eal/bsdapp/eal/eal_hugepage_info.c +++ b/lib/librte_eal/bsdapp/eal/eal_hugepage_info.c @@ -101,6 +101,10 @@ eal_hugepage_info_init(void) hpi->num_pages[0] = num_buffers; hpi->lock_descriptor = fd; + /* for no shared files mode, do not create shared memory config */ + if (internal_config.no_shconf) + return 0; + tmp_hpi = create_shared_memory(eal_hugepage_info_path(), sizeof(internal_config.hugepage_info)); if (tmp_hpi == NULL ) { diff --git a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c index 7eca711ba..7f8e2fd9c 100644 --- a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c +++ b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c @@ -446,6 +446,10 @@ eal_hugepage_info_init(void) if (hugepage_info_init() < 0) return -1; + /* for no shared files mode, we're done */ + if (internal_config.no_shconf) + return 0; + hpi = &internal_config.hugepage_info[0]; tmp_hpi = create_shared_memory(eal_hugepage_info_path(), From patchwork Fri Jul 13 12:48:00 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anatoly Burakov X-Patchwork-Id: 43046 Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 1F3B544C3; Fri, 13 Jul 2018 14:48:19 +0200 (CEST) Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by dpdk.org (Postfix) with ESMTP id 133622C23 for ; Fri, 13 Jul 2018 14:48:10 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by orsmga102.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 13 Jul 2018 05:48:07 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.51,347,1526367600"; d="scan'208";a="54117443" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by fmsmga007.fm.intel.com with ESMTP; 13 Jul 2018 05:48:05 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w6DCm4KW016839; Fri, 13 Jul 2018 13:48:04 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w6DCm4w4006261; Fri, 13 Jul 2018 13:48:04 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w6DCm4MV006257; Fri, 13 Jul 2018 13:48:04 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: ray.kinsella@intel.com, kuralamudhan.ramakrishnan@intel.com, louise.m.daly@intel.com, bruce.richardson@intel.com, ferruh.yigit@intel.com, konstantin.ananyev@intel.com, thomas@monjalon.net Date: Fri, 13 Jul 2018 13:48:00 +0100 Message-Id: X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v3 4/8] eal: add support for no-shconf in hugepage data file X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Do not create a shared hugepage data file if we were asked to not create any shared files. Signed-off-by: Anatoly Burakov --- Notes: RFC->v1: - Use --no-shconf only lib/librte_eal/linuxapp/eal/eal_memory.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c index 5d3c8831b..ddfa8b133 100644 --- a/lib/librte_eal/linuxapp/eal/eal_memory.c +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c @@ -521,7 +521,18 @@ static void * create_shared_memory(const char *filename, const size_t mem_size) { void *retval; - int fd = open(filename, O_CREAT | O_RDWR, 0666); + int fd; + + /* if no shared files mode is used, create anonymous memory instead */ + if (internal_config.no_shconf) { + retval = mmap(NULL, mem_size, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); + if (retval == MAP_FAILED) + return NULL; + return retval; + } + + fd = open(filename, O_CREAT | O_RDWR, 0666); if (fd < 0) return NULL; if (ftruncate(fd, mem_size) < 0) { From patchwork Fri Jul 13 12:48:01 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anatoly Burakov X-Patchwork-Id: 43050 Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id E75694CBD; Fri, 13 Jul 2018 14:48:44 +0200 (CEST) Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by dpdk.org (Postfix) with ESMTP id DD41014EC for ; Fri, 13 Jul 2018 14:48:43 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga106.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 13 Jul 2018 05:48:42 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.51,347,1526367600"; d="scan'208";a="56618770" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by orsmga008.jf.intel.com with ESMTP; 13 Jul 2018 05:48:05 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w6DCm46x016842; Fri, 13 Jul 2018 13:48:05 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w6DCm4Vs006271; Fri, 13 Jul 2018 13:48:04 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w6DCm4vF006266; Fri, 13 Jul 2018 13:48:04 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: Bruce Richardson , ray.kinsella@intel.com, kuralamudhan.ramakrishnan@intel.com, louise.m.daly@intel.com, ferruh.yigit@intel.com, konstantin.ananyev@intel.com, thomas@monjalon.net Date: Fri, 13 Jul 2018 13:48:01 +0100 Message-Id: X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v3 5/8] eal: do not create runtime dir in no-shconf mode X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Now that the rest of the EAL is adjusted to not create any shared files, prevent runtime directory from ever being created. Signed-off-by: Anatoly Burakov --- Notes: RFC->v1: - Use --no-shconf only lib/librte_eal/bsdapp/eal/eal.c | 3 ++- lib/librte_eal/linuxapp/eal/eal.c | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c index dc279542d..13b6f8ae1 100644 --- a/lib/librte_eal/bsdapp/eal/eal.c +++ b/lib/librte_eal/bsdapp/eal/eal.c @@ -601,7 +601,8 @@ rte_eal_init(int argc, char **argv) } /* create runtime data directory */ - if (eal_create_runtime_dir() < 0) { + if (internal_config.no_shconf == 0 && + eal_create_runtime_dir() < 0) { rte_eal_init_alert("Cannot create runtime directory\n"); rte_errno = EACCES; return -1; diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c index ec7cea55d..191960caa 100644 --- a/lib/librte_eal/linuxapp/eal/eal.c +++ b/lib/librte_eal/linuxapp/eal/eal.c @@ -832,7 +832,8 @@ rte_eal_init(int argc, char **argv) } /* create runtime data directory */ - if (eal_create_runtime_dir() < 0) { + if (internal_config.no_shconf == 0 && + eal_create_runtime_dir() < 0) { rte_eal_init_alert("Cannot create runtime directory\n"); rte_errno = EACCES; return -1; From patchwork Fri Jul 13 12:48:02 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anatoly Burakov X-Patchwork-Id: 43049 Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 49F594CBB; Fri, 13 Jul 2018 14:48:24 +0200 (CEST) Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by dpdk.org (Postfix) with ESMTP id 1F97D2C2F for ; Fri, 13 Jul 2018 14:48:12 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga106.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 13 Jul 2018 05:48:12 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.51,347,1526367600"; d="scan'208";a="64448452" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by FMSMGA003.fm.intel.com with ESMTP; 13 Jul 2018 05:48:05 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w6DCm5Oo016845; Fri, 13 Jul 2018 13:48:05 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w6DCm5xa006278; Fri, 13 Jul 2018 13:48:05 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w6DCm4vq006274; Fri, 13 Jul 2018 13:48:04 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: ray.kinsella@intel.com, kuralamudhan.ramakrishnan@intel.com, louise.m.daly@intel.com, bruce.richardson@intel.com, ferruh.yigit@intel.com, konstantin.ananyev@intel.com, thomas@monjalon.net Date: Fri, 13 Jul 2018 13:48:02 +0100 Message-Id: X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v3 6/8] mem: add support for hugepage-unlink mode X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Unlink hugepages after creating them, to honor the hugepage-unlink mode. We cannot resize non-existing files, so make single file segments explicitly unsupported. Signed-off-by: Anatoly Burakov --- Notes: v1->v2: - Move check for hugepage unlink into this patch, to be consistent with commit message RFC->v1: - Use --huge-unlink only RFC->v1: - Use --huge-unlink only lib/librte_eal/common/eal_common_options.c | 6 ++++++ lib/librte_eal/linuxapp/eal/eal_memalloc.c | 16 +++++++++++++++- 2 files changed, 21 insertions(+), 1 deletion(-) diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c index 45ea01a8b..df5d53648 100644 --- a/lib/librte_eal/common/eal_common_options.c +++ b/lib/librte_eal/common/eal_common_options.c @@ -1332,6 +1332,12 @@ eal_check_common_options(struct internal_config *internal_cfg) " is only supported in non-legacy memory mode\n"); return -1; } + if (internal_cfg->single_file_segments && + internal_cfg->hugepage_unlink) { + RTE_LOG(ERR, EAL, "Option --"OPT_SINGLE_FILE_SEGMENTS" is " + "not compatible with --"OPT_HUGE_UNLINK"\n"); + return -1; + } return 0; } diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c index 69604f823..d610923b8 100644 --- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c +++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c @@ -489,6 +489,13 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id, __func__, strerror(errno)); goto resized; } + if (internal_config.hugepage_unlink) { + if (unlink(path)) { + RTE_LOG(DEBUG, EAL, "%s(): unlink() failed: %s\n", + __func__, strerror(errno)); + goto resized; + } + } } /* @@ -587,7 +594,8 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id, /* ignore failure, can't make it any worse */ } else { /* only remove file if we can take out a write lock */ - if (lock(fd, LOCK_EX) == 1) + if (internal_config.hugepage_unlink == 0 && + lock(fd, LOCK_EX) == 1) unlink(path); close(fd); } @@ -612,6 +620,12 @@ free_seg(struct rte_memseg *ms, struct hugepage_info *hi, return -1; } + /* if we've already unlinked the page, nothing needs to be done */ + if (internal_config.hugepage_unlink) { + memset(ms, 0, sizeof(*ms)); + return 0; + } + /* if we are not in single file segments mode, we're going to unmap the * segment and thus drop the lock on original fd, but hugepage dir is * now locked so we can take out another one without races. From patchwork Fri Jul 13 12:48:03 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anatoly Burakov X-Patchwork-Id: 43047 Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id CFD454C74; Fri, 13 Jul 2018 14:48:20 +0200 (CEST) Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by dpdk.org (Postfix) with ESMTP id 321CC2C28 for ; Fri, 13 Jul 2018 14:48:10 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by orsmga106.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 13 Jul 2018 05:48:07 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.51,347,1526367600"; d="scan'208";a="54778354" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by fmsmga008.fm.intel.com with ESMTP; 13 Jul 2018 05:48:05 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w6DCm5CL016848; Fri, 13 Jul 2018 13:48:05 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w6DCm5lQ006286; Fri, 13 Jul 2018 13:48:05 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w6DCm5Cr006282; Fri, 13 Jul 2018 13:48:05 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: ray.kinsella@intel.com, kuralamudhan.ramakrishnan@intel.com, louise.m.daly@intel.com, bruce.richardson@intel.com, ferruh.yigit@intel.com, konstantin.ananyev@intel.com, thomas@monjalon.net Date: Fri, 13 Jul 2018 13:48:03 +0100 Message-Id: X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v3 7/8] eal: add --in-memory option X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" This command-line option will cause DPDK to operate entirely in memory and not create any shared files at runtime, including any shared configuration or hugetlbfs files. This is useful for debug purposes, as well as for certain use cases like containers or automatic memory cleanup. Currently, this option acts as a strict superset of --no-shconf and --huge-unlink commands. Signed-off-by: Anatoly Burakov --- Notes: RFC->v1: - Do not deprecate old options, instead just coopt them lib/librte_eal/common/eal_common_options.c | 18 ++++++++++++++---- lib/librte_eal/common/eal_internal_cfg.h | 4 ++++ lib/librte_eal/common/eal_options.h | 2 ++ 3 files changed, 20 insertions(+), 4 deletions(-) diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c index df5d53648..f308b57c3 100644 --- a/lib/librte_eal/common/eal_common_options.c +++ b/lib/librte_eal/common/eal_common_options.c @@ -66,6 +66,7 @@ eal_long_options[] = { {OPT_NO_HUGE, 0, NULL, OPT_NO_HUGE_NUM }, {OPT_NO_PCI, 0, NULL, OPT_NO_PCI_NUM }, {OPT_NO_SHCONF, 0, NULL, OPT_NO_SHCONF_NUM }, + {OPT_IN_MEMORY, 0, NULL, OPT_IN_MEMORY_NUM }, {OPT_PCI_BLACKLIST, 1, NULL, OPT_PCI_BLACKLIST_NUM }, {OPT_PCI_WHITELIST, 1, NULL, OPT_PCI_WHITELIST_NUM }, {OPT_PROC_TYPE, 1, NULL, OPT_PROC_TYPE_NUM }, @@ -1170,6 +1171,13 @@ eal_parse_common_option(int opt, const char *optarg, conf->no_shconf = 1; break; + case OPT_IN_MEMORY_NUM: + conf->in_memory = 1; + /* in-memory is a superset of noshconf and huge-unlink */ + conf->no_shconf = 1; + conf->hugepage_unlink = 1; + break; + case OPT_PROC_TYPE_NUM: conf->process_type = eal_parse_proc_type(optarg); break; @@ -1321,8 +1329,8 @@ eal_check_common_options(struct internal_config *internal_cfg) "be specified together with --"OPT_NO_HUGE"\n"); return -1; } - - if (internal_cfg->no_hugetlbfs && internal_cfg->hugepage_unlink) { + if (internal_cfg->no_hugetlbfs && internal_cfg->hugepage_unlink && + !internal_cfg->in_memory) { RTE_LOG(ERR, EAL, "Option --"OPT_HUGE_UNLINK" cannot " "be specified together with --"OPT_NO_HUGE"\n"); return -1; @@ -1330,12 +1338,12 @@ eal_check_common_options(struct internal_config *internal_cfg) if (internal_config.force_socket_limits && internal_config.legacy_mem) { RTE_LOG(ERR, EAL, "Option --"OPT_SOCKET_LIMIT " is only supported in non-legacy memory mode\n"); - return -1; } if (internal_cfg->single_file_segments && internal_cfg->hugepage_unlink) { RTE_LOG(ERR, EAL, "Option --"OPT_SINGLE_FILE_SEGMENTS" is " - "not compatible with --"OPT_HUGE_UNLINK"\n"); + "not compatible with neither --"OPT_IN_MEMORY" nor " + "--"OPT_HUGE_UNLINK"\n"); return -1; } @@ -1386,6 +1394,8 @@ eal_common_usage(void) " Set specific log level\n" " -v Display version information on startup\n" " -h, --help This help\n" + " --"OPT_IN_MEMORY" Operate entirely in memory. This will \n" + " disable secondary process support\n" "\nEAL options for DEBUG use only:\n" " --"OPT_HUGE_UNLINK" Unlink hugepage files after init\n" " --"OPT_NO_HUGE" Use malloc instead of hugetlbfs\n" diff --git a/lib/librte_eal/common/eal_internal_cfg.h b/lib/librte_eal/common/eal_internal_cfg.h index d66cd0313..00ee6e06e 100644 --- a/lib/librte_eal/common/eal_internal_cfg.h +++ b/lib/librte_eal/common/eal_internal_cfg.h @@ -41,6 +41,10 @@ struct internal_config { volatile unsigned vmware_tsc_map; /**< true to use VMware TSC mapping * instead of native TSC */ volatile unsigned no_shconf; /**< true if there is no shared config */ + volatile unsigned in_memory; + /**< true if DPDK should operate entirely in-memory and not create any + * shared files or runtime data. + */ volatile unsigned create_uio_dev; /**< true to create /dev/uioX devices */ volatile enum rte_proc_type_t process_type; /**< multi-process proc type */ /** true to try allocating memory on specific sockets */ diff --git a/lib/librte_eal/common/eal_options.h b/lib/librte_eal/common/eal_options.h index 6d92f64a8..96e166787 100644 --- a/lib/librte_eal/common/eal_options.h +++ b/lib/librte_eal/common/eal_options.h @@ -45,6 +45,8 @@ enum { OPT_NO_PCI_NUM, #define OPT_NO_SHCONF "no-shconf" OPT_NO_SHCONF_NUM, +#define OPT_IN_MEMORY "in-memory" + OPT_IN_MEMORY_NUM, #define OPT_SOCKET_MEM "socket-mem" OPT_SOCKET_MEM_NUM, #define OPT_SOCKET_LIMIT "socket-limit" From patchwork Fri Jul 13 12:48:04 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anatoly Burakov X-Patchwork-Id: 43045 Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 4125B3772; Fri, 13 Jul 2018 14:48:17 +0200 (CEST) Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by dpdk.org (Postfix) with ESMTP id AE8D32C16 for ; Fri, 13 Jul 2018 14:48:09 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 13 Jul 2018 05:48:08 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.51,347,1526367600"; d="scan'208";a="215740285" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by orsmga004.jf.intel.com with ESMTP; 13 Jul 2018 05:48:06 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w6DCm5QD016851; Fri, 13 Jul 2018 13:48:05 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w6DCm5Uq006294; Fri, 13 Jul 2018 13:48:05 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w6DCm5kk006290; Fri, 13 Jul 2018 13:48:05 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: ray.kinsella@intel.com, kuralamudhan.ramakrishnan@intel.com, louise.m.daly@intel.com, bruce.richardson@intel.com, ferruh.yigit@intel.com, konstantin.ananyev@intel.com, thomas@monjalon.net Date: Fri, 13 Jul 2018 13:48:04 +0100 Message-Id: <3e5056f09adc3d4613b354e36531cd8103e9a869.1531485955.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v3 8/8] mem: support in-memory mode X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Implement the final piece of the in-memory mode puzzle - enable running DPDK entirely in memory, without creating any files. To do it, use mmap with MAP_HUGETLB and size flags to enable DPDK to work without hugetlbfs mountpoints. In order to enable this, a few things needed to be changed. First of all, we need to allow empty hugetlbfs mountpoints in hugepage_info, and handle them correctly (by not trying to create any files and lock any directories). Next, we need to reorder the mapping sequence, because the page is not really allocated until the page fault, and we cannot get its IOVA address before we trigger the page fault. Finally, decide at compile time whether we are going to be supporting anonymous hugepages or not, because we cannot check for it at runtime. Signed-off-by: Anatoly Burakov --- Notes: RFC->v1: - Drop memfd and instead use mmap() with MAP_HUGETLB. This will drop the kernel requirements down to 3.8, and does not impose any restrictions glibc (as far as i known). Unfortunately, there's a bit of an issue with this approach, because mmap() is stupid and will happily ignore unsupported arguments. This means that if the binary were to be compiled on a 3.8+ kernel but run on a pre-3.8 kernel (such as currently supported minimum of 3.2), then most likely the memory would be allocated using regular pages, causing unthinkable performance degradation. No solution to this problem is currently known to me. .../linuxapp/eal/eal_hugepage_info.c | 91 ++++++++---- lib/librte_eal/linuxapp/eal/eal_memalloc.c | 140 +++++++++++------- lib/librte_eal/linuxapp/eal/eal_memory.c | 3 +- 3 files changed, 149 insertions(+), 85 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c index 7f8e2fd9c..3a7d4b222 100644 --- a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c +++ b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c @@ -18,6 +18,8 @@ #include #include +#include /* for hugetlb-related flags */ + #include #include #include @@ -313,11 +315,49 @@ compare_hpi(const void *a, const void *b) return hpi_b->hugepage_sz - hpi_a->hugepage_sz; } +static void +calc_num_pages(struct hugepage_info *hpi, struct dirent *dirent) +{ + uint64_t total_pages = 0; + unsigned int i; + + /* + * first, try to put all hugepages into relevant sockets, but + * if first attempts fails, fall back to collecting all pages + * in one socket and sorting them later + */ + total_pages = 0; + /* we also don't want to do this for legacy init */ + if (!internal_config.legacy_mem) + for (i = 0; i < rte_socket_count(); i++) { + int socket = rte_socket_id_by_idx(i); + unsigned int num_pages = + get_num_hugepages_on_node( + dirent->d_name, socket); + hpi->num_pages[socket] = num_pages; + total_pages += num_pages; + } + /* + * we failed to sort memory from the get go, so fall + * back to old way + */ + if (total_pages == 0) { + hpi->num_pages[0] = get_num_hugepages(dirent->d_name); + +#ifndef RTE_ARCH_64 + /* for 32-bit systems, limit number of hugepages to + * 1GB per page size */ + hpi->num_pages[0] = RTE_MIN(hpi->num_pages[0], + RTE_PGSIZE_1G / hpi->hugepage_sz); +#endif + } +} + static int hugepage_info_init(void) { const char dirent_start_text[] = "hugepages-"; const size_t dirent_start_len = sizeof(dirent_start_text) - 1; - unsigned int i, total_pages, num_sizes = 0; + unsigned int i, num_sizes = 0; DIR *dir; struct dirent *dirent; @@ -355,6 +395,22 @@ hugepage_info_init(void) "%" PRIu64 " reserved, but no mounted " "hugetlbfs found for that size\n", num_pages, hpi->hugepage_sz); + /* if we have kernel support for reserving hugepages + * through mmap, and we're in in-memory mode, treat this + * page size as valid. we cannot be in legacy mode at + * this point because we've checked this earlier in the + * init process. + */ +#ifdef MAP_HUGE_SHIFT + if (internal_config.in_memory) { + RTE_LOG(DEBUG, EAL, "In-memory mode enabled, " + "hugepages of size %" PRIu64 " bytes " + "will be allocated anonymously\n", + hpi->hugepage_sz); + calc_num_pages(hpi, dirent); + num_sizes++; + } +#endif continue; } @@ -371,35 +427,7 @@ hugepage_info_init(void) if (clear_hugedir(hpi->hugedir) == -1) break; - /* - * first, try to put all hugepages into relevant sockets, but - * if first attempts fails, fall back to collecting all pages - * in one socket and sorting them later - */ - total_pages = 0; - /* we also don't want to do this for legacy init */ - if (!internal_config.legacy_mem) - for (i = 0; i < rte_socket_count(); i++) { - int socket = rte_socket_id_by_idx(i); - unsigned int num_pages = - get_num_hugepages_on_node( - dirent->d_name, socket); - hpi->num_pages[socket] = num_pages; - total_pages += num_pages; - } - /* - * we failed to sort memory from the get go, so fall - * back to old way - */ - if (total_pages == 0) - hpi->num_pages[0] = get_num_hugepages(dirent->d_name); - -#ifndef RTE_ARCH_64 - /* for 32-bit systems, limit number of hugepages to - * 1GB per page size */ - hpi->num_pages[0] = RTE_MIN(hpi->num_pages[0], - RTE_PGSIZE_1G / hpi->hugepage_sz); -#endif + calc_num_pages(hpi, dirent); num_sizes++; } @@ -423,8 +451,7 @@ hugepage_info_init(void) for (j = 0; j < RTE_MAX_NUMA_NODES; j++) num_pages += hpi->num_pages[j]; - if (strnlen(hpi->hugedir, sizeof(hpi->hugedir)) != 0 && - num_pages > 0) + if (num_pages > 0) return 0; } diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c index d610923b8..79443c56a 100644 --- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c +++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c @@ -28,6 +28,7 @@ #include #endif #include +#include /* for hugetlb-related mmap flags */ #include #include @@ -41,6 +42,15 @@ #include "eal_memalloc.h" #include "eal_private.h" +const int anonymous_hugepages_supported = +#ifdef MAP_HUGE_SHIFT + 1; +#define RTE_MAP_HUGE_SHIFT MAP_HUGE_SHIFT +#else + 0; +#define RTE_MAP_HUGE_SHIFT 26 +#endif + /* * not all kernel version support fallocate on hugetlbfs, so fall back to * ftruncate and disallow deallocation if fallocate is not supported. @@ -461,6 +471,8 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id, int cur_socket_id = 0; #endif uint64_t map_offset; + rte_iova_t iova; + void *va; char path[PATH_MAX]; int ret = 0; int fd; @@ -468,43 +480,66 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id, int flags; void *new_addr; - /* takes out a read lock on segment or segment list */ - fd = get_seg_fd(path, sizeof(path), hi, list_idx, seg_idx); - if (fd < 0) { - RTE_LOG(ERR, EAL, "Couldn't get fd on hugepage file\n"); - return -1; - } - alloc_sz = hi->hugepage_sz; - if (internal_config.single_file_segments) { - map_offset = seg_idx * alloc_sz; - ret = resize_hugefile(fd, path, list_idx, seg_idx, map_offset, - alloc_sz, true); - if (ret < 0) - goto resized; - } else { + if (internal_config.in_memory && anonymous_hugepages_supported) { + int log2, flags; + + log2 = rte_log2_u32(alloc_sz); + /* as per mmap() manpage, all page sizes are log2 of page size + * shifted by MAP_HUGE_SHIFT + */ + flags = (log2 << RTE_MAP_HUGE_SHIFT) | MAP_HUGETLB | MAP_FIXED | + MAP_PRIVATE | MAP_ANONYMOUS; + fd = -1; + va = mmap(addr, alloc_sz, PROT_READ | PROT_WRITE, flags, -1, 0); + + /* single-file segments codepath will never be active because + * in-memory mode is incompatible with it and it's stopped at + * EAL initialization stage, however the compiler doesn't know + * that and complains about map_offset being used uninitialized + * on failure codepaths while having in-memory mode enabled. so, + * assign a value here. + */ map_offset = 0; - if (ftruncate(fd, alloc_sz) < 0) { - RTE_LOG(DEBUG, EAL, "%s(): ftruncate() failed: %s\n", - __func__, strerror(errno)); - goto resized; + } else { + /* takes out a read lock on segment or segment list */ + fd = get_seg_fd(path, sizeof(path), hi, list_idx, seg_idx); + if (fd < 0) { + RTE_LOG(ERR, EAL, "Couldn't get fd on hugepage file\n"); + return -1; } - if (internal_config.hugepage_unlink) { - if (unlink(path)) { - RTE_LOG(DEBUG, EAL, "%s(): unlink() failed: %s\n", + + if (internal_config.single_file_segments) { + map_offset = seg_idx * alloc_sz; + ret = resize_hugefile(fd, path, list_idx, seg_idx, + map_offset, alloc_sz, true); + if (ret < 0) + goto resized; + } else { + map_offset = 0; + if (ftruncate(fd, alloc_sz) < 0) { + RTE_LOG(DEBUG, EAL, "%s(): ftruncate() failed: %s\n", __func__, strerror(errno)); goto resized; } + if (internal_config.hugepage_unlink) { + if (unlink(path)) { + RTE_LOG(DEBUG, EAL, "%s(): unlink() failed: %s\n", + __func__, strerror(errno)); + goto resized; + } + } } + + /* + * map the segment, and populate page tables, the kernel fills + * this segment with zeros if it's a new page. + */ + va = mmap(addr, alloc_sz, PROT_READ | PROT_WRITE, + MAP_SHARED | MAP_POPULATE | MAP_FIXED, fd, + map_offset); } - /* - * map the segment, and populate page tables, the kernel fills this - * segment with zeros if it's a new page. - */ - void *va = mmap(addr, alloc_sz, PROT_READ | PROT_WRITE, - MAP_SHARED | MAP_POPULATE | MAP_FIXED, fd, map_offset); - if (va == MAP_FAILED) { RTE_LOG(DEBUG, EAL, "%s(): mmap() failed: %s\n", __func__, strerror(errno)); @@ -519,24 +554,6 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id, goto resized; } - rte_iova_t iova = rte_mem_virt2iova(addr); - if (iova == RTE_BAD_PHYS_ADDR) { - RTE_LOG(DEBUG, EAL, "%s(): can't get IOVA addr\n", - __func__); - goto mapped; - } - -#ifdef RTE_EAL_NUMA_AWARE_HUGEPAGES - move_pages(getpid(), 1, &addr, NULL, &cur_socket_id, 0); - - if (cur_socket_id != socket_id) { - RTE_LOG(DEBUG, EAL, - "%s(): allocation happened on wrong socket (wanted %d, got %d)\n", - __func__, socket_id, cur_socket_id); - goto mapped; - } -#endif - /* In linux, hugetlb limitations, like cgroup, are * enforced at fault time instead of mmap(), even * with the option of MAP_POPULATE. Kernel will send @@ -549,9 +566,6 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id, (unsigned int)(alloc_sz >> 20)); goto mapped; } - /* for non-single file segments, we can close fd here */ - if (!internal_config.single_file_segments) - close(fd); /* we need to trigger a write to the page to enforce page fault and * ensure that page is accessible to us, but we can't overwrite value @@ -560,6 +574,28 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id, */ *(volatile int *)addr = *(volatile int *)addr; + iova = rte_mem_virt2iova(addr); + if (iova == RTE_BAD_PHYS_ADDR) { + RTE_LOG(DEBUG, EAL, "%s(): can't get IOVA addr\n", + __func__); + goto mapped; + } + +#ifdef RTE_EAL_NUMA_AWARE_HUGEPAGES + move_pages(getpid(), 1, &addr, NULL, &cur_socket_id, 0); + + if (cur_socket_id != socket_id) { + RTE_LOG(DEBUG, EAL, + "%s(): allocation happened on wrong socket (wanted %d, got %d)\n", + __func__, socket_id, cur_socket_id); + goto mapped; + } +#endif + /* for non-single file segments that aren't in-memory, we can close fd + * here */ + if (!internal_config.single_file_segments && !internal_config.in_memory) + close(fd); + ms->addr = addr; ms->hugepage_sz = alloc_sz; ms->len = alloc_sz; @@ -588,6 +624,7 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id, RTE_LOG(CRIT, EAL, "Can't mmap holes in our virtual address space\n"); } resized: + /* in-memory mode will never be single-file-segments mode */ if (internal_config.single_file_segments) { resize_hugefile(fd, path, list_idx, seg_idx, map_offset, alloc_sz, false); @@ -595,6 +632,7 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id, } else { /* only remove file if we can take out a write lock */ if (internal_config.hugepage_unlink == 0 && + internal_config.in_memory == 0 && lock(fd, LOCK_EX) == 1) unlink(path); close(fd); @@ -705,7 +743,7 @@ alloc_seg_walk(const struct rte_memseg_list *msl, void *arg) * during init, we already hold a write lock, so don't try to take out * another one. */ - if (wa->hi->lock_descriptor == -1) { + if (wa->hi->lock_descriptor == -1 && !internal_config.in_memory) { dir_fd = open(wa->hi->hugedir, O_RDONLY); if (dir_fd < 0) { RTE_LOG(ERR, EAL, "%s(): Cannot open '%s': %s\n", @@ -809,7 +847,7 @@ free_seg_walk(const struct rte_memseg_list *msl, void *arg) * during init, we already hold a write lock, so don't try to take out * another one. */ - if (wa->hi->lock_descriptor == -1) { + if (wa->hi->lock_descriptor == -1 && !internal_config.in_memory) { dir_fd = open(wa->hi->hugedir, O_RDONLY); if (dir_fd < 0) { RTE_LOG(ERR, EAL, "%s(): Cannot open '%s': %s\n", diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c index ddfa8b133..dbf19499e 100644 --- a/lib/librte_eal/linuxapp/eal/eal_memory.c +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c @@ -1088,8 +1088,7 @@ get_socket_mem_size(int socket) for (i = 0; i < internal_config.num_hugepage_sizes; i++){ struct hugepage_info *hpi = &internal_config.hugepage_info[i]; - if (strnlen(hpi->hugedir, sizeof(hpi->hugedir)) != 0) - size += hpi->hugepage_sz * hpi->num_pages[socket]; + size += hpi->hugepage_sz * hpi->num_pages[socket]; } return size;