17/01/2022 09:14, Dmitry Kozlyuk:
> Expose Linux EAL ability to reuse existing hugepage files
> via --huge-unlink=never switch.
> Default behavior is unchanged, it can also be specified
> using --huge-unlink=existing for consistency.
> Old --huge-unlink switch is kept,
> it is an alias for --huge-unlink=always.
>
> Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> ---
> doc/guides/linux_gsg/linux_eal_parameters.rst | 21 ++++++++--
> .../prog_guide/env_abstraction_layer.rst | 9 +++++
> doc/guides/rel_notes/release_22_03.rst | 7 ++++
> lib/eal/common/eal_common_options.c | 39 +++++++++++++++++--
> 4 files changed, 69 insertions(+), 7 deletions(-)
>
> diff --git a/doc/guides/linux_gsg/linux_eal_parameters.rst b/doc/guides/linux_gsg/linux_eal_parameters.rst
> index 74df2611b5..7586f15ce3 100644
> --- a/doc/guides/linux_gsg/linux_eal_parameters.rst
> +++ b/doc/guides/linux_gsg/linux_eal_parameters.rst
> @@ -84,10 +84,23 @@ Memory-related options
> Use specified hugetlbfs directory instead of autodetected ones. This can be
> a sub-directory within a hugetlbfs mountpoint.
>
> -* ``--huge-unlink``
> -
> - Unlink hugepage files after creating them (implies no secondary process
> - support).
> +* ``--huge-unlink[=existing|always|never]``
> +
> + No ``--huge-unlink`` option or ``--huge-unlink=existing`` is the default:
> + existing hugepage files are removed and re-created
> + to ensure the kernel clears the memory and prevents any data leaks.
> +
> + With ``--huge-unlink`` (no value) or ``--huge-unlink=always``,
> + hugepage files are also removed after creating them,
> + so that the application leaves no files in hugetlbfs.
> + This mode implies no multi-process support.
> +
> + When ``--huge-unlink=never`` is specified, existing hugepage files
> + are not removed either before or after mapping them.
One detail is not clear: does the second unlink happen before or after mapping?
> + This makes restart faster by saving time to clear memory at initialization,
> + but it may slow down zeroed allocations later.
> + Reused hugepages can contain data from previous processes that used them,
> + which may be a security concern.
I absolutely love these options.
They keep compatibility while making things consistent and understandable.
Acked-by: Thomas Monjalon <thomas@monjalon.net>
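
For readers skimming the thread, the three modes described in the documentation hunk can be summarized in a few lines of C. This is only a sketch of the documented behavior, not the EAL code: `struct unlink_plan` and `huge_unlink_plan()` are hypothetical names invented here, and the sketch deliberately takes no position on whether the second unlink happens before or after mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of one mode; not a DPDK structure. */
struct unlink_plan {
	bool remove_existing;     /* remove leftover files before re-creating them */
	bool remove_after_create; /* also remove the files once they are created */
};

/*
 * Fill *out for a given option value.
 * value == NULL models a bare --huge-unlink (documented as 'always').
 * Returns 0 on success, -1 for an unrecognized value.
 */
int
huge_unlink_plan(const char *value, struct unlink_plan *out)
{
	assert(out != NULL);
	if (value == NULL || strcmp(value, "always") == 0) {
		*out = (struct unlink_plan){ true, true };
		return 0;
	}
	if (strcmp(value, "existing") == 0) {
		/* documented default: clean up leftovers, keep new files */
		*out = (struct unlink_plan){ true, false };
		return 0;
	}
	if (strcmp(value, "never") == 0) {
		/* reuse whatever is there; memory may be dirty */
		*out = (struct unlink_plan){ false, false };
		return 0;
	}
	return -1;
}
```

Calling `huge_unlink_plan("never", &plan)` leaves both fields false, which is the only mode where data from a previous process can survive in the mapped hugepages.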
@@ -84,10 +84,23 @@ Memory-related options
Use specified hugetlbfs directory instead of autodetected ones. This can be
a sub-directory within a hugetlbfs mountpoint.
-* ``--huge-unlink``
-
- Unlink hugepage files after creating them (implies no secondary process
- support).
+* ``--huge-unlink[=existing|always|never]``
+
+ No ``--huge-unlink`` option or ``--huge-unlink=existing`` is the default:
+ existing hugepage files are removed and re-created
+ to ensure the kernel clears the memory and prevents any data leaks.
+
+ With ``--huge-unlink`` (no value) or ``--huge-unlink=always``,
+ hugepage files are also removed after creating them,
+ so that the application leaves no files in hugetlbfs.
+ This mode implies no multi-process support.
+
+ When ``--huge-unlink=never`` is specified, existing hugepage files
+ are not removed either before or after mapping them.
+ This makes restart faster by saving time to clear memory at initialization,
+ but it may slow down zeroed allocations later.
+ Reused hugepages can contain data from previous processes that used them,
+ which may be a security concern.
* ``--match-allocations``
@@ -277,6 +277,15 @@ to prevent data leaks from previous users of the same hugepage.
EAL ensures this behavior by removing existing backing files at startup
and by recreating them before opening for mapping (as a precaution).
+One exception is ``--huge-unlink=never`` mode.
+It is used to speed up EAL initialization, usually on application restart.
+Clearing memory constitutes more than 95% of hugepage mapping time.
+EAL can save it by remapping existing backing files
+with all the data left in the mapped hugepages ("dirty" memory).
+Such segments are marked with ``RTE_MEMSEG_FLAG_DIRTY``.
+Memory allocator detects dirty segments and handles them accordingly,
+in particular, it clears memory requested with ``rte_zmalloc*()``.
+
Anonymous mapping does not allow multi-process architecture,
but it is free of filename conflicts and leftover files on hugetlbfs.
If memfd_create(2) is supported both at build and run time,
@@ -55,6 +55,13 @@ New Features
Also, make sure to start the actual text at the margin.
=======================================================
+* **Added ability to reuse hugepages in Linux.**
+
+ It is possible to reuse files in hugetlbfs to speed up hugepage mapping,
+ which may be useful for fast restart and large allocations.
+ The new mode is activated with ``--huge-unlink=never``
+ and has security implications; refer to the user and programmer guides.
+
Removed Items
-------------
@@ -74,7 +74,7 @@ eal_long_options[] = {
{OPT_FILE_PREFIX, 1, NULL, OPT_FILE_PREFIX_NUM },
{OPT_HELP, 0, NULL, OPT_HELP_NUM },
{OPT_HUGE_DIR, 1, NULL, OPT_HUGE_DIR_NUM },
- {OPT_HUGE_UNLINK, 0, NULL, OPT_HUGE_UNLINK_NUM },
+ {OPT_HUGE_UNLINK, 2, NULL, OPT_HUGE_UNLINK_NUM },
{OPT_IOVA_MODE, 1, NULL, OPT_IOVA_MODE_NUM },
{OPT_LCORES, 1, NULL, OPT_LCORES_NUM },
{OPT_LOG_LEVEL, 1, NULL, OPT_LOG_LEVEL_NUM },
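
A note on the `0` to `2` change above: in `struct option`, a `has_arg` value of 2 is `optional_argument`. With `getopt_long()`, an optional argument to a long option is only recognized in the `--name=value` form, so `--huge-unlink never` (space-separated) would be parsed as a bare `--huge-unlink` followed by a positional argument. The standalone sketch below illustrates this; it assumes glibc-style `getopt_long()` on Linux, and `huge_unlink_value()` is a hypothetical helper, not part of the EAL.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/*
 * Return the value attached to --huge-unlink:
 * the string after '=', "" for a bare option,
 * or NULL when the option is absent or parsing fails.
 */
const char *
huge_unlink_value(int argc, char **argv)
{
	static const struct option long_opts[] = {
		{ "huge-unlink", optional_argument, NULL, 'u' },
		{ NULL, 0, NULL, 0 },
	};
	const char *value = NULL;
	int opt;

	assert(argv != NULL);
	optind = 1; /* restart scanning so the helper can be called repeatedly */
	opterr = 0; /* keep the sketch quiet on unknown options */
	while ((opt = getopt_long(argc, argv, "", long_opts, NULL)) != -1) {
		if (opt != 'u')
			return NULL;
		/* optarg stays NULL unless the value was given with '=' */
		value = (optarg != NULL) ? optarg : "";
	}
	return value;
}
```

With `{ "app", "--huge-unlink=never" }` the helper returns `"never"`; with `{ "app", "--huge-unlink", "never" }` it returns an empty string, which in the patch would mean `always`. It may be worth stating the `=` requirement in the usage text.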
@@ -1596,6 +1596,28 @@ available_cores(void)
return str;
}
+#define HUGE_UNLINK_NEVER "never"
+
+static int
+eal_parse_huge_unlink(const char *arg, struct hugepage_file_discipline *out)
+{
+ if (arg == NULL || strcmp(arg, "always") == 0) {
+ out->unlink_before_mapping = true;
+ return 0;
+ }
+ if (strcmp(arg, "existing") == 0) {
+ /* same as not specifying the option */
+ return 0;
+ }
+ if (strcmp(arg, HUGE_UNLINK_NEVER) == 0) {
+ RTE_LOG(WARNING, EAL, "Using --"OPT_HUGE_UNLINK"="
+ HUGE_UNLINK_NEVER" may create data leaks.\n");
+ out->keep_existing = true;
+ return 0;
+ }
+ return -1;
+}
+
int
eal_parse_common_option(int opt, const char *optarg,
struct internal_config *conf)
@@ -1737,7 +1759,10 @@ eal_parse_common_option(int opt, const char *optarg,
/* long options */
case OPT_HUGE_UNLINK_NUM:
- conf->hugepage_file.unlink_before_mapping = true;
+ if (eal_parse_huge_unlink(optarg, &conf->hugepage_file) < 0) {
+ RTE_LOG(ERR, EAL, "invalid --"OPT_HUGE_UNLINK" option\n");
+ return -1;
+ }
break;
case OPT_NO_HUGE_NUM:
@@ -2068,6 +2093,12 @@ eal_check_common_options(struct internal_config *internal_cfg)
"not compatible with --"OPT_HUGE_UNLINK"\n");
return -1;
}
+ if (internal_cfg->hugepage_file.keep_existing &&
+ internal_cfg->in_memory) {
+ RTE_LOG(ERR, EAL, "Option --"OPT_IN_MEMORY" is not compatible "
+ "with --"OPT_HUGE_UNLINK"="HUGE_UNLINK_NEVER"\n");
+ return -1;
+ }
if (internal_cfg->legacy_mem &&
internal_cfg->in_memory) {
RTE_LOG(ERR, EAL, "Option --"OPT_LEGACY_MEM" is not compatible "
@@ -2200,7 +2231,9 @@ eal_common_usage(void)
" --"OPT_NO_TELEMETRY" Disable telemetry support\n"
" --"OPT_FORCE_MAX_SIMD_BITWIDTH" Force the max SIMD bitwidth\n"
"\nEAL options for DEBUG use only:\n"
- " --"OPT_HUGE_UNLINK" Unlink hugepage files after init\n"
+ " --"OPT_HUGE_UNLINK"[=existing|always|never]\n"
+ " When to unlink files in hugetlbfs\n"
+ " ('existing' by default, no value means 'always')\n"
" --"OPT_NO_HUGE" Use malloc instead of hugetlbfs\n"
" --"OPT_NO_PCI" Disable PCI\n"
" --"OPT_NO_HPET" Disable HPET\n"