[v4] eal: allow worker lcore stacks to be allocated from hugepage memory

Message ID 20220517153136.23128-1-donw@xsightlabs.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Series [v4] eal: allow worker lcore stacks to be allocated from hugepage memory

Checks

Context                           Check    Description
ci/checkpatch                     success  coding style OK
ci/github-robot: build            success  github build: passed
ci/iol-intel-Functional           success  Functional Testing PASS
ci/iol-intel-Performance          success  Performance Testing PASS
ci/iol-aarch64-unit-testing       success  Testing PASS
ci/iol-x86_64-unit-testing        success  Testing PASS
ci/iol-aarch64-compile-testing    success  Testing PASS
ci/iol-abi-testing                success  Testing PASS
ci/iol-x86_64-compile-testing     success  Testing PASS
ci/Intel-compilation              success  Compilation OK
ci/intel-Testing                  success  Testing PASS

Commit Message

Don Wallwork May 17, 2022, 3:31 p.m. UTC
Add support for using hugepages for worker lcore stack memory.  The
intent is to improve performance by reducing stack-memory-related TLB
misses and also by using memory local to the NUMA node of each lcore.

EAL option '--huge-worker-stack[=stack-size-in-kbytes]' is added to allow
the feature to be enabled at runtime.  If the size is not specified,
the system pthread stack size will be used.

Signed-off-by: Don Wallwork <donw@xsightlabs.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
---
 doc/guides/linux_gsg/eal_args.include.rst     |  6 ++
 .../prog_guide/env_abstraction_layer.rst      | 21 +++++++
 lib/eal/common/eal_common_options.c           | 41 +++++++++++++
 lib/eal/common/eal_internal_cfg.h             |  4 ++
 lib/eal/common/eal_options.h                  |  2 +
 lib/eal/linux/eal.c                           | 61 ++++++++++++++++++-
 6 files changed, 133 insertions(+), 2 deletions(-)
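The option is registered with has_arg = 2 (optional_argument), so getopt_long() only picks up a size that is attached with '='. `--huge-worker-stack=512` yields the argument "512", while `--huge-worker-stack 512` enables the option with no argument (so the default pthread stack size applies) and leaves `512` as a separate non-option argument. The stand-alone sketch below demonstrates this with getopt_long() directly; it is not EAL's parser, and find_huge_worker_stack() and the 'H' return value are hypothetical. Resetting optind to 0 to force a rescan is a glibc/musl convention, assumed here.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if --huge-worker-stack was given and
 * stores its attached argument (or NULL when no "=size" followed) in
 * *arg_out; returns 0 otherwise. argv must be writable, since GNU
 * getopt_long() may permute it.
 */
static int
find_huge_worker_stack(int argc, char **argv, const char **arg_out)
{
	static const struct option opts[] = {
		/* optional_argument == 2, matching the patch's table entry */
		{ "huge-worker-stack", optional_argument, NULL, 'H' },
		{ NULL, 0, NULL, 0 }
	};
	int seen = 0;
	int c;

	assert(argc >= 1);
	optind = 0;	/* glibc/musl: 0 forces a full re-initialisation */
	opterr = 0;	/* do not print diagnostics for unknown options */
	*arg_out = NULL;
	while ((c = getopt_long(argc, argv, "", opts, NULL)) != -1) {
		if (c == 'H') {
			seen = 1;
			*arg_out = optarg;	/* NULL unless "=..." was attached */
		}
	}
	return seen;
}
```

With `app --huge-worker-stack=512` the helper reports "512"; with `app --huge-worker-stack 512` it reports the option as present but with a NULL argument.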
  

Comments

Stephen Hemminger May 17, 2022, 3:56 p.m. UTC | #1
On Tue, 17 May 2022 11:31:36 -0400
Don Wallwork <donw@xsightlabs.com> wrote:

> Add support for using hugepages for worker lcore stack memory.  The
> intent is to improve performance by reducing stack memory related TLB
> misses and also by using memory local to the NUMA node of each lcore.
> 
> EAL option '--huge-worker-stack [stack-size-in-kbytes]' is added to allow
> the feature to be enabled at runtime.  If the size is not specified,
> the system pthread stack size will be used.
> 
> Signed-off-by: Don Wallwork <donw@xsightlabs.com>
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> ---

This looks great, just thinking a little more about what the impact
of using it would be.

Since the memory region for the stack is never freed, it will cause
complaints from address sanitizer and maybe from valgrind.

One way to work around that would be to use the lower-level allocation
routine to get the memory segments. This would make stacks a multiple
of page size, which would not be a bad idea anyway.
Plus you could use eal_memalloc_alloc_seg_bulk.
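Allocating stacks as whole memory segments effectively rounds each stack up to a multiple of the hugepage size. The arithmetic is sketched below; stack_size_round_to_page() is a hypothetical helper, not a DPDK API, and it assumes the page size is a power of two. With a typical 8 MB Linux default thread stack (the actual default derives from RLIMIT_STACK and may differ), 2 MB pages add no overhead, whereas 1 GB pages turn every worker stack into a full 1 GB.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Round size up to the next multiple of page_sz.
 * Assumes page_sz is a power of two and that size + page_sz - 1
 * does not overflow size_t.
 */
static size_t
stack_size_round_to_page(size_t size, size_t page_sz)
{
	assert(page_sz != 0 && (page_sz & (page_sz - 1)) == 0);
	return (size + page_sz - 1) & ~(page_sz - 1);
}
```

For example, an 8 MB stack stays 8 MB with 2 MB pages, but a 512 KB stack still consumes a whole 2 MB page.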
  
Don Wallwork May 18, 2022, 2:10 p.m. UTC | #2
On 5/17/2022 11:56 AM, Stephen Hemminger wrote:
> On Tue, 17 May 2022 11:31:36 -0400
> Don Wallwork <donw@xsightlabs.com> wrote:
>
>> Add support for using hugepages for worker lcore stack memory.  The
>> intent is to improve performance by reducing stack memory related TLB
>> misses and also by using memory local to the NUMA node of each lcore.
>>
>> EAL option '--huge-worker-stack [stack-size-in-kbytes]' is added to allow
>> the feature to be enabled at runtime.  If the size is not specified,
>> the system pthread stack size will be used.
>>
>> Signed-off-by: Don Wallwork <donw@xsightlabs.com>
>> Acked-by: Morten Brørup <mb@smartsharesystems.com>
>> ---
> This looks great, just thinking a little more about what the impact
> of using it would be.
>
> Since the memory region for the stack is never freed, it will cause
> complaints from address sanitizer and maybe from valgrind.
>
> One way to workaround that would be to use the lower level allocation
> routine to get the memory segments. This would make stacks a multiple
> of page size which would not be bad idea anyway.
> Plus you could use eal_memalloc_seg_bulk.
>
The problem with using this API is that it requires allocating page-sized
stacks, which would be undesirable in memory-constrained environments or
when the hugepage size is 1GB.

We looked for a place to free this memory, but could not find any place
in DPDK where the worker threads are canceled. Obviously the worker threads
have to be stopped before we can free this memory.
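The ordering constraint here is that a caller-supplied stack may only be released after the thread running on it has terminated. The sketch below shows that lifecycle with plain POSIX threads: allocate, create the thread with pthread_attr_setstack(), pthread_join(), then free. It uses aligned_alloc() instead of rte_zmalloc_socket() so it is self-contained; sketch_worker(), sketch_run_on_own_stack() and the 4096-byte alignment are assumptions for illustration. EAL worker threads in this patch are never joined, which is why their stacks are never freed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical worker: touches its stack and returns arg + 1. */
static void *
sketch_worker(void *arg)
{
	volatile char scratch[4096];	/* lives on the caller-supplied stack */

	scratch[0] = 1;
	return (void *)((uintptr_t)arg + (uintptr_t)scratch[0]);
}

/*
 * Run sketch_worker on a heap-allocated stack and free that stack only
 * after pthread_join() confirms the thread has exited.
 * stack_size must be a multiple of 4096 for aligned_alloc() here.
 */
static int
sketch_run_on_own_stack(size_t stack_size, uintptr_t arg, uintptr_t *result)
{
	pthread_attr_t attr;
	pthread_t tid;
	void *stack;
	void *ret;
	int rc;

	assert(stack_size % 4096 == 0);
	stack = aligned_alloc(4096, stack_size);
	if (stack == NULL)
		return -1;

	if (pthread_attr_init(&attr) != 0) {
		free(stack);
		return -1;
	}
	rc = pthread_attr_setstack(&attr, stack, stack_size);
	if (rc == 0)
		rc = pthread_create(&tid, &attr, sketch_worker, (void *)arg);
	pthread_attr_destroy(&attr);	/* attr is not needed after create */
	if (rc != 0) {
		free(stack);	/* no thread was started on this stack */
		return -1;
	}

	/* The stack must stay allocated until the thread has terminated. */
	if (pthread_join(tid, &ret) != 0)
		return -1;	/* leak rather than free a possibly live stack */
	*result = (uintptr_t)ret;
	free(stack);
	return 0;
}
```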
  
fengchengwen May 20, 2022, 8:30 a.m. UTC | #3
Acked-by: Chengwen Feng <fengchengwen@huawei.com>

On 2022/5/17 23:31, Don Wallwork wrote:
> Add support for using hugepages for worker lcore stack memory.  The
> intent is to improve performance by reducing stack memory related TLB
> misses and also by using memory local to the NUMA node of each lcore.
> 
> EAL option '--huge-worker-stack [stack-size-in-kbytes]' is added to allow
> the feature to be enabled at runtime.  If the size is not specified,
> the system pthread stack size will be used.
> 
> Signed-off-by: Don Wallwork <donw@xsightlabs.com>
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> ---

snip
  
Kathleen Capella May 23, 2022, 10:35 p.m. UTC | #4
In this section of the code:

stack_ptr = rte_zmalloc_socket("lcore_stack",
				       stack_size,
				       stack_size,
				       rte_lcore_to_socket_id(lcore_id));

The stack memory is aligned to stack_size. According to the implementation of rte_zmalloc_socket, the alignment must be a power of two. If the user specifies a number of kbytes that is not a power of two, the allocation fails with the generic error message "EAL: Cannot allocate worker lcore stack memory." A check for this case with a more descriptive error message, plus a note in the documentation, would be good to include.
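A check along these lines could reject such sizes up front with a specific message. This is a sketch only: sketch_is_power_of_two() and sketch_check_stack_align() are hypothetical names, and the message text is illustrative rather than an existing EAL log. It only matters if the stack size keeps doubling as the rte_zmalloc_socket() alignment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

static bool
sketch_is_power_of_two(size_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Hypothetical validation for a stack size (in bytes) that is also used
 * as the allocation alignment. Returns 0 if usable, -1 after printing a
 * descriptive error.
 */
static int
sketch_check_stack_align(size_t stack_size)
{
	if (!sketch_is_power_of_two(stack_size)) {
		fprintf(stderr,
			"EAL: --huge-worker-stack size of %zu bytes is not a power of two\n",
			stack_size);
		return -1;
	}
	return 0;
}
```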

> -----Original Message-----
> From: Don Wallwork <donw@xsightlabs.com>
> Sent: Tuesday, May 17, 2022 10:32 AM
> To: dev@dpdk.org
> Cc: donw@xsightlabs.com; stephen@networkplumber.org;
> fengchengwen@huawei.com; mb@smartsharesystems.com;
> anatoly.burakov@intel.com; dmitry.kozliuk@gmail.com;
> bruce.richardson@intel.com; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>;
> haiyue.wang@intel.com
> Subject: [PATCH v4] eal: allow worker lcore stacks to be allocated from
> hugepage memory
> 
  
Don Wallwork May 24, 2022, 1:48 p.m. UTC | #5
On 5/23/2022 6:35 PM, Kathleen Capella wrote:
> In this section of the code:
>
> stack_ptr = rte_zmalloc_socket("lcore_stack",
> 				       stack_size,
> 				       stack_size,
> 				       rte_lcore_to_socket_id(lcore_id));
>
> stack memory is aligned to the stack_size. According to the implementation of rte_zmalloc_socket, the alignment must be a power of two. If the user inputs a number of KBs that is not a power of two, this will fail with a generic error message of " EAL: Cannot allocate worker lcore stack memory." A check for this occurrence with a more descriptive error message and a note in the documentation would be good to include.
Good point.  Alignment to stack size is not necessary.  I'll post a new 
version that only requires cache line alignment.
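With a fixed cache-line alignment, only that constant has to be a power of two, so any stack size works. The sketch below illustrates the idea with aligned_alloc(); it assumes 64-byte cache lines (DPDK's RTE_CACHE_LINE_SIZE is platform-dependent), and sketch_alloc_worker_stack() is hypothetical. Unlike rte_zmalloc_socket(), it does not place the memory on a particular NUMA node.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: 64-byte cache lines; RTE_CACHE_LINE_SIZE varies by platform. */
#define SKETCH_CACHE_LINE 64u

/*
 * Return a zeroed, cache-line-aligned buffer of at least size bytes and
 * store the actual length in *alloc_size. size need not be a power of two.
 */
static void *
sketch_alloc_worker_stack(size_t size, size_t *alloc_size)
{
	size_t rounded;
	void *p;

	assert(size != 0);
	/* C11 aligned_alloc() expects size to be a multiple of the alignment. */
	rounded = (size + SKETCH_CACHE_LINE - 1) & ~(size_t)(SKETCH_CACHE_LINE - 1);
	if (rounded < size)
		return NULL;	/* size was too close to SIZE_MAX */
	p = aligned_alloc(SKETCH_CACHE_LINE, rounded);
	if (p != NULL) {
		memset(p, 0, rounded);	/* zeroed, like rte_zmalloc_socket() */
		*alloc_size = rounded;
	}
	return p;
}
```

A 100 KB request (102400 bytes, not a power of two) succeeds here, which is exactly the case that failed with stack-size alignment.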
  
Anatoly Burakov May 24, 2022, 2:40 p.m. UTC | #6
On 17-May-22 4:31 PM, Don Wallwork wrote:
> Add support for using hugepages for worker lcore stack memory.  The
> intent is to improve performance by reducing stack memory related TLB
> misses and also by using memory local to the NUMA node of each lcore.
> 
> EAL option '--huge-worker-stack [stack-size-in-kbytes]' is added to allow
> the feature to be enabled at runtime.  If the size is not specified,
> the system pthread stack size will be used.
> 
> Signed-off-by: Don Wallwork <donw@xsightlabs.com>
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> ---

<snip>

> +++ b/lib/eal/common/eal_common_options.c
> @@ -103,6 +103,7 @@ eal_long_options[] = {
>   	{OPT_TELEMETRY,         0, NULL, OPT_TELEMETRY_NUM        },
>   	{OPT_NO_TELEMETRY,      0, NULL, OPT_NO_TELEMETRY_NUM     },
>   	{OPT_FORCE_MAX_SIMD_BITWIDTH, 1, NULL, OPT_FORCE_MAX_SIMD_BITWIDTH_NUM},
> +	{OPT_HUGE_WORKER_STACK, 2, NULL, OPT_HUGE_WORKER_STACK_NUM     },
>   
>   	{0,                     0, NULL, 0                        }
>   };
> @@ -1618,6 +1619,28 @@ eal_parse_huge_unlink(const char *arg, struct hugepage_file_discipline *out)
>   	return -1;
>   }
>   
> +#ifndef RTE_EXEC_ENV_WINDOWS

Why the #ifdef-ery? This is common code, I think we can just leave it 
there? You could just add a check for `huge_worker_stack_size` in 
Windows EAL to guard against using this setting for Windows, but 
otherwise I see no need for an #ifdef here.
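The runtime-check alternative could look roughly like this: the option is parsed in common code on every platform, and an EAL that does not implement hugepage worker stacks rejects a nonzero huge_worker_stack_size during init. The struct and function names below are hypothetical stand-ins for internal_config and a check in the FreeBSD or Windows EAL; 0 meaning "option not given" follows the patch's eal_worker_thread_create().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the relevant field of struct internal_config. */
struct sketch_internal_config {
	size_t huge_worker_stack_size;	/* 0 = option not given */
};

/*
 * Hypothetical init-time check for an EAL without hugepage worker stacks:
 * reject the option at runtime instead of compiling it out with #ifdef.
 */
static int
sketch_check_unsupported_options(const struct sketch_internal_config *cfg)
{
	assert(cfg != NULL);
	if (cfg->huge_worker_stack_size != 0) {
		fprintf(stderr,
			"EAL: --huge-worker-stack is not supported on this platform\n");
		return -1;
	}
	return 0;
}
```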
  
Don Wallwork May 24, 2022, 7:38 p.m. UTC | #7
On 5/24/2022 10:40 AM, Burakov, Anatoly wrote:
> On 17-May-22 4:31 PM, Don Wallwork wrote:
>> Add support for using hugepages for worker lcore stack memory.  The
>> intent is to improve performance by reducing stack memory related TLB
>> misses and also by using memory local to the NUMA node of each lcore.
>>
>> EAL option '--huge-worker-stack [stack-size-in-kbytes]' is added to allow
>> the feature to be enabled at runtime.  If the size is not specified,
>> the system pthread stack size will be used.
>>
>> Signed-off-by: Don Wallwork <donw@xsightlabs.com>
>> Acked-by: Morten Brørup <mb@smartsharesystems.com>
>> ---
>
> <snip>
>
>> +++ b/lib/eal/common/eal_common_options.c
>> @@ -103,6 +103,7 @@ eal_long_options[] = {
>>       {OPT_TELEMETRY,         0, NULL, OPT_TELEMETRY_NUM },
>>       {OPT_NO_TELEMETRY,      0, NULL, OPT_NO_TELEMETRY_NUM },
>>       {OPT_FORCE_MAX_SIMD_BITWIDTH, 1, NULL, OPT_FORCE_MAX_SIMD_BITWIDTH_NUM},
>> +    {OPT_HUGE_WORKER_STACK, 2, NULL, OPT_HUGE_WORKER_STACK_NUM     },
>>
>>       {0,                     0, NULL, 0                        }
>>   };
>> @@ -1618,6 +1619,28 @@ eal_parse_huge_unlink(const char *arg, struct hugepage_file_discipline *out)
>>       return -1;
>>   }
>>
>> +#ifndef RTE_EXEC_ENV_WINDOWS
>
> Why the #ifdef-ery? This is common code, I think we can just leave it 
> there? You could just add a check for `huge_worker_stack_size` in 
> Windows EAL to guard against using this setting for Windows, but 
> otherwise I see no need for an #ifdef here.
>

Was trying to follow the convention used in other cases, but I will post 
a new version that eliminates the ifdefs and checks 
huge_worker_stack_size in FreeBSD and Windows EAL.
  

Patch

diff --git a/doc/guides/linux_gsg/eal_args.include.rst b/doc/guides/linux_gsg/eal_args.include.rst
index 3549a0cf56..9cfbf7de84 100644
--- a/doc/guides/linux_gsg/eal_args.include.rst
+++ b/doc/guides/linux_gsg/eal_args.include.rst
@@ -116,6 +116,12 @@  Memory-related options
 
     Force IOVA mode to a specific value.
 
+*   ``--huge-worker-stack[=size]``
+
+    Allocate worker stack memory from hugepage memory. Stack size defaults
+    to system pthread stack size unless the optional size (in kbytes) is
+    specified.
+
 Debugging options
 ~~~~~~~~~~~~~~~~~
 
diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index 5f0748fba1..e74516f0cf 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -329,6 +329,27 @@  Another option is to use bigger page sizes. Since fewer pages are required to
 cover the same memory area, fewer file descriptors will be stored internally
 by EAL.
 
+.. _huge-worker-stack:
+
+Hugepage Worker Stacks
+^^^^^^^^^^^^^^^^^^^^^^
+
+When the ``--huge-worker-stack[=size]`` EAL option is specified, worker
+thread stacks are allocated from hugepage memory local to the NUMA node
+of the thread. Worker stack size defaults to system pthread stack size
+if the optional size parameter is not specified.
+
+.. warning::
+    Stacks allocated from hugepage memory are not protected by guard
+    pages. Worker stacks must be sufficiently sized to prevent stack
+    overflow when this option is used.
+
+    As with normal thread stacks, hugepage worker thread stack size is
+    fixed and is not dynamically resized. Therefore, an application that
+    is free of stack page faults under a given load should be safe with
+    hugepage worker thread stacks given the same thread stack size and
+    loading conditions.
+
 Support for Externally Allocated Memory
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/lib/eal/common/eal_common_options.c b/lib/eal/common/eal_common_options.c
index f247a42455..370801f19b 100644
--- a/lib/eal/common/eal_common_options.c
+++ b/lib/eal/common/eal_common_options.c
@@ -103,6 +103,7 @@  eal_long_options[] = {
 	{OPT_TELEMETRY,         0, NULL, OPT_TELEMETRY_NUM        },
 	{OPT_NO_TELEMETRY,      0, NULL, OPT_NO_TELEMETRY_NUM     },
 	{OPT_FORCE_MAX_SIMD_BITWIDTH, 1, NULL, OPT_FORCE_MAX_SIMD_BITWIDTH_NUM},
+	{OPT_HUGE_WORKER_STACK, 2, NULL, OPT_HUGE_WORKER_STACK_NUM     },
 
 	{0,                     0, NULL, 0                        }
 };
@@ -1618,6 +1619,28 @@  eal_parse_huge_unlink(const char *arg, struct hugepage_file_discipline *out)
 	return -1;
 }
 
+#ifndef RTE_EXEC_ENV_WINDOWS
+static int
+eal_parse_huge_worker_stack(const char *arg, size_t *huge_worker_stack_size)
+{
+	size_t worker_stack_size;
+	char *end;
+
+	if (arg == NULL || arg[0] == '\0') {
+		*huge_worker_stack_size = WORKER_STACK_SIZE_FROM_OS;
+		return 0;
+	}
+	errno = 0;
+	worker_stack_size = strtoul(arg, &end, 10);
+	if (errno || end == NULL || worker_stack_size == 0 ||
+	    worker_stack_size >= (size_t)-1 / 1024)
+		return -1;
+
+	*huge_worker_stack_size = worker_stack_size * 1024;
+	return 0;
+}
+#endif
+
 int
 eal_parse_common_option(int opt, const char *optarg,
 			struct internal_config *conf)
@@ -1921,6 +1944,17 @@  eal_parse_common_option(int opt, const char *optarg,
 		}
 		break;
 
+#ifndef RTE_EXEC_ENV_WINDOWS
+	case OPT_HUGE_WORKER_STACK_NUM:
+		if (eal_parse_huge_worker_stack(optarg,
+						&conf->huge_worker_stack_size) < 0) {
+			RTE_LOG(ERR, EAL, "invalid parameter for --"
+				OPT_HUGE_WORKER_STACK"\n");
+			return -1;
+		}
+		break;
+#endif /* !RTE_EXEC_ENV_WINDOWS */
+
 	/* don't know what to do, leave this to caller */
 	default:
 		return 1;
@@ -2235,5 +2269,12 @@  eal_common_usage(void)
 	       "  --"OPT_NO_PCI"            Disable PCI\n"
 	       "  --"OPT_NO_HPET"           Disable HPET\n"
 	       "  --"OPT_NO_SHCONF"         No shared config (mmap'd files)\n"
+#ifndef RTE_EXEC_ENV_WINDOWS
+	       "  --"OPT_HUGE_WORKER_STACK"[=size]\n"
+	       "                      Allocate worker thread stacks from\n"
+	       "                      hugepage memory. Size is in units of\n"
+	       "                      kbytes and defaults to system thread\n"
+	       "                      stack size if not specified.\n"
+#endif
 	       "\n", RTE_MAX_LCORE);
 }
diff --git a/lib/eal/common/eal_internal_cfg.h b/lib/eal/common/eal_internal_cfg.h
index b71faadd18..5e154967e4 100644
--- a/lib/eal/common/eal_internal_cfg.h
+++ b/lib/eal/common/eal_internal_cfg.h
@@ -48,6 +48,9 @@  struct hugepage_file_discipline {
 	bool unlink_existing;
 };
 
+/** Worker hugepage stack size should default to OS value. */
+#define WORKER_STACK_SIZE_FROM_OS ((size_t)~0)
+
 /**
  * internal configuration
  */
@@ -102,6 +105,7 @@  struct internal_config {
 	unsigned int no_telemetry; /**< true to disable Telemetry */
 	struct simd_bitwidth max_simd_bitwidth;
 	/**< max simd bitwidth path to use */
+	size_t huge_worker_stack_size; /**< worker thread stack size */
 };
 
 void eal_reset_internal_config(struct internal_config *internal_cfg);
diff --git a/lib/eal/common/eal_options.h b/lib/eal/common/eal_options.h
index 8e4f7202a2..3cc9cb6412 100644
--- a/lib/eal/common/eal_options.h
+++ b/lib/eal/common/eal_options.h
@@ -87,6 +87,8 @@  enum {
 	OPT_NO_TELEMETRY_NUM,
 #define OPT_FORCE_MAX_SIMD_BITWIDTH  "force-max-simd-bitwidth"
 	OPT_FORCE_MAX_SIMD_BITWIDTH_NUM,
+#define OPT_HUGE_WORKER_STACK  "huge-worker-stack"
+	OPT_HUGE_WORKER_STACK_NUM,
 
 	OPT_LONG_MAX_NUM
 };
diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
index 1ef263434a..2bee66577e 100644
--- a/lib/eal/linux/eal.c
+++ b/lib/eal/linux/eal.c
@@ -857,6 +857,64 @@  is_iommu_enabled(void)
 	return n > 2;
 }
 
+static int
+eal_worker_thread_create(struct internal_config *internal_conf,
+			 int lcore_id)
+{
+	pthread_attr_t attr;
+	size_t stack_size;
+	void *stack_ptr;
+	int ret;
+
+	if (internal_conf->huge_worker_stack_size == 0)
+		return pthread_create(&lcore_config[lcore_id].thread_id,
+				      NULL,
+				      eal_thread_loop,
+				      (void *)(uintptr_t)lcore_id);
+
+	/* Allocate NUMA aware stack memory and set pthread attributes */
+	if (pthread_attr_init(&attr) != 0) {
+		rte_eal_init_alert("Cannot init pthread attributes");
+		rte_errno = EFAULT;
+		return -1;
+	}
+	if (internal_conf->huge_worker_stack_size == WORKER_STACK_SIZE_FROM_OS) {
+		if (pthread_attr_getstacksize(&attr, &stack_size) != 0) {
+			rte_errno = EFAULT;
+			return -1;
+		}
+	} else {
+		stack_size = internal_conf->huge_worker_stack_size;
+	}
+	stack_ptr = rte_zmalloc_socket("lcore_stack",
+				       stack_size,
+				       stack_size,
+				       rte_lcore_to_socket_id(lcore_id));
+
+	if (stack_ptr == NULL) {
+		rte_eal_init_alert("Cannot allocate worker lcore stack memory");
+		rte_errno = ENOMEM;
+		return -1;
+	}
+
+	if (pthread_attr_setstack(&attr, stack_ptr, stack_size) != 0) {
+		rte_eal_init_alert("Cannot set pthread stack attributes");
+		rte_errno = EFAULT;
+		return -1;
+	}
+
+	ret = pthread_create(&lcore_config[lcore_id].thread_id, &attr,
+			     eal_thread_loop,
+			     (void *)(uintptr_t)lcore_id);
+
+	if (pthread_attr_destroy(&attr) != 0) {
+		rte_eal_init_alert("Cannot destroy pthread attributes");
+		rte_errno = EFAULT;
+		return -1;
+	}
+	return ret;
+}
+
 /* Launch threads, called at application init(). */
 int
 rte_eal_init(int argc, char **argv)
@@ -1144,8 +1202,7 @@  rte_eal_init(int argc, char **argv)
 		lcore_config[i].state = WAIT;
 
 		/* create a thread for each lcore */
-		ret = pthread_create(&lcore_config[i].thread_id, NULL,
-				     eal_thread_loop, (void *)(uintptr_t)i);
+		ret = eal_worker_thread_create(internal_conf, i);
 		if (ret != 0)
 			rte_panic("Cannot create thread\n");
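In eal_parse_huge_worker_stack() above, the `end == NULL` test can never be true once strtoul() returns, because strtoul() always stores a pointer through a non-NULL endptr; an argument such as `64k` is therefore accepted as 64 kbytes, and strtoul() also skips leading whitespace and accepts a sign. A stricter variant is sketched below. It is not the patch's code: the function name and SKETCH_SIZE_FROM_OS (mirroring WORKER_STACK_SIZE_FROM_OS) are hypothetical, and it requires the whole argument to be decimal digits.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Mirrors WORKER_STACK_SIZE_FROM_OS ((size_t)~0) from the patch. */
#define SKETCH_SIZE_FROM_OS SIZE_MAX

/* Parse the optional kbytes argument; set *out (in bytes) and return 0, or -1. */
static int
sketch_parse_huge_worker_stack(const char *arg, size_t *out)
{
	unsigned long long kb;
	char *end;

	assert(out != NULL);
	if (arg == NULL || arg[0] == '\0') {
		*out = SKETCH_SIZE_FROM_OS;	/* no "=size": use the OS default */
		return 0;
	}
	if (arg[0] < '0' || arg[0] > '9')	/* reject sign and leading spaces */
		return -1;
	errno = 0;
	kb = strtoull(arg, &end, 10);
	/* whole string must be consumed; 0 and overflowing sizes are invalid */
	if (errno != 0 || *end != '\0' || kb == 0 || kb >= SIZE_MAX / 1024)
		return -1;
	*out = (size_t)kb * 1024;
	return 0;
}
```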