[v2,1/1] app/testpmd: add GPU memory option in iofwd engine

Message ID 20211111214141.26612-2-eagostini@nvidia.com (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit
Series app/testpmd: add GPU memory option in iofwd engine

Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/github-robot: build success github build: passed
ci/iol-spell-check-testing warning Testing issues
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-aarch64-unit-testing success Testing PASS
ci/iol-x86_64-compile-testing success Testing PASS
ci/iol-aarch64-compile-testing success Testing PASS
ci/iol-x86_64-unit-testing success Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-intel-Functional success Functional Testing PASS

Commit Message

Elena Agostini Nov. 11, 2021, 9:41 p.m. UTC
  From: eagostini <eagostini@nvidia.com>

This patch introduces GPU memory in testpmd through the gpudev library.
Testpmd can be used for network benchmarks when using GPU memory
instead of regular CPU memory to send and receive packets.
This option is currently limited to iofwd engine.

Signed-off-by: Elena Agostini <eagostini@nvidia.com>

Depends-on: series-19465 ("GPU library")
Depends-on: series-20422 ("common/mlx5: fix external memory pool registration")
---
 app/test-pmd/cmdline.c                |  14 +++
 app/test-pmd/meson.build              |   2 +-
 app/test-pmd/parameters.c             |  13 ++-
 app/test-pmd/testpmd.c                | 133 +++++++++++++++++++++++---
 app/test-pmd/testpmd.h                |  16 +++-
 doc/guides/testpmd_app_ug/run_app.rst |   3 +
 6 files changed, 164 insertions(+), 17 deletions(-)
  

Comments

Slava Ovsiienko Nov. 16, 2021, 4:28 p.m. UTC | #1
Hi, Elena

> -----Original Message-----
> From: eagostini@nvidia.com <eagostini@nvidia.com>
> Sent: Thursday, November 11, 2021 23:42
> To: dev@dpdk.org
> Cc: Elena Agostini <eagostini@nvidia.com>
> Subject: [PATCH v2 1/1] app/testpmd: add GPU memory option in iofwd
> engine
> 
> From: eagostini <eagostini@nvidia.com>

Could you, please, set "Author" correctly - "Elena Agostini <eagostini@nvidia.com>"?
Otherwise, we see in the git log:

"Author: eagostini <eagostini@nvidia.com>"

Compare with:
"Author: Bing Zhao <bingz@nvidia.com>"
"Author: Viacheslav Ovsiienko <viacheslavo@nvidia.com>"

Also, please, see the codestyle issues, too many code lines far beyond 100 chars.
Lines like this:
+		if (mbuf_mem_types[idx] == MBUF_MEM_GPU && strcmp(cur_fwd_eng->fwd_mode_name, "io") != 0) {
can easily be split.

>>app/testpmd: add GPU memory option in iofwd
As I see from the patch - it adds the new mbuf pool type (residing on GPU memory).
May be, we should reword the title?
" app/testpmd: add GPU memory option for mbuf pools"

> 
> This patch introduces GPU memory in testpmd through the gpudev library.
> Testpmd can be used for network benchmarks when using GPU memory
> instead of regular CPU memory to send and receive packets.
> This option is currently limited to iofwd engine.
Why? Because iofwd is the only mode not touching the mbuf data?
Is it critical for functionality? Is the GPU mbuf pool memory accessible from the CPU side?
I would explain the reasons (for supporting iofwd mode only) in the commit message.

> 
> Signed-off-by: Elena Agostini <eagostini@nvidia.com>
> 
> Depends-on: series-19465 ("GPU library")
> Depends-on: series-20422 ("common/mlx5: fix external memory pool
> registration")
> ---
>  app/test-pmd/cmdline.c                |  14 +++
>  app/test-pmd/meson.build              |   2 +-
>  app/test-pmd/parameters.c             |  13 ++-
>  app/test-pmd/testpmd.c                | 133 +++++++++++++++++++++++---
>  app/test-pmd/testpmd.h                |  16 +++-
>  doc/guides/testpmd_app_ug/run_app.rst |   3 +
>  6 files changed, 164 insertions(+), 17 deletions(-)
> 
> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index
> 4f51b259fe..36193bc566 100644
> --- a/app/test-pmd/cmdline.c
> +++ b/app/test-pmd/cmdline.c
> @@ -3614,6 +3614,7 @@ parse_item_list(const char *str, const char

The "parse_item_list()" is used to parse the list looking like "number0, number1, ...., numberN"
and invoked for "core","port", "txtimes", etc. Not sure all of these params need to handle "g"
suffix. We should allow "g" processing only for "mbuf-size". We have "item_name" argument
to check whether we are invoked on "mbuf-size".
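As a rough, self-contained sketch of that stricter parsing (the helper name and how it would plug into parse_item_list() are assumptions, not code from the patch):

#include <stdlib.h>
#include <string.h>

/*
 * Accept the 'g' suffix only for "mbuf-size" and only as the last
 * character of a number: "2048g" is valid, "20g48" and "g2048" are not.
 * Any other trailing character is rejected instead of silently assuming CPU.
 */
static int
parse_size_token(const char *item_name, const char *tok,
		 unsigned int *value, int *gpu_mbuf)
{
	char *end = NULL;

	*gpu_mbuf = 0;
	*value = strtoul(tok, &end, 10);
	if (end == tok)
		return -1;	/* no digits at all */
	if (*end == '\0')
		return 0;	/* plain CPU pool size */
	if (*end == 'g' && end[1] == '\0' &&
	    strcmp(item_name, "mbuf-size") == 0) {
		*gpu_mbuf = 1;
		return 0;
	}
	return -1;		/* unrecognized suffix: reject */
}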

> *item_name, unsigned int max_items,
>  	unsigned int j;
>  	int value_ok;
>  	char c;
> +	int gpu_mbuf = 0;
> 
>  	/*
>  	 * First parse all items in the list and store their value.
> @@ -3628,6 +3629,14 @@ parse_item_list(const char *str, const char
> *item_name, unsigned int max_items,
>  			value_ok = 1;
>  			continue;
>  		}
> +		if (c == 'g') {
We should check whether "g" is a single-character suffix (the last char).
Otherwise, "20g48" and "g20gggg48" would also be valid values.

> +			/*
> +			 * When this flag is set, mbufs for this segment
> +			 * will be created on GPU memory.
> +			 */
> +			gpu_mbuf = 1;
> +			continue;
> +		}
>  		if (c != ',') {
>  			fprintf(stderr, "character %c is not a decimal digit\n",
> c);
>  			return 0;
> @@ -3640,6 +3649,8 @@ parse_item_list(const char *str, const char
> *item_name, unsigned int max_items,
>  			parsed_items[nb_item] = value;
>  			value_ok = 0;
>  			value = 0;
> +			mbuf_mem_types[nb_item] = gpu_mbuf ?
> MBUF_MEM_GPU : MBUF_MEM_CPU;
> +			gpu_mbuf = 0;
>  		}
>  		nb_item++;
>  	}
> @@ -3648,6 +3659,9 @@ parse_item_list(const char *str, const char
> *item_name, unsigned int max_items,
>  			item_name, nb_item + 1, max_items);
>  		return 0;
>  	}
> +
> +	mbuf_mem_types[nb_item] = gpu_mbuf ? MBUF_MEM_GPU :
> MBUF_MEM_CPU;
> +
>  	parsed_items[nb_item++] = value;
>  	if (! check_unique_values)
>  		return nb_item;
> diff --git a/app/test-pmd/meson.build b/app/test-pmd/meson.build index
> d5df52c470..5c8ca68c9d 100644
> --- a/app/test-pmd/meson.build
> +++ b/app/test-pmd/meson.build
> @@ -32,7 +32,7 @@ if dpdk_conf.has('RTE_HAS_JANSSON')
>      ext_deps += jansson_dep
>  endif
> 
> -deps += ['ethdev', 'gro', 'gso', 'cmdline', 'metrics', 'bus_pci']
> +deps += ['ethdev', 'gro', 'gso', 'cmdline', 'metrics', 'bus_pci',
> +'gpudev']
>  if dpdk_conf.has('RTE_CRYPTO_SCHEDULER')
>      deps += 'crypto_scheduler'
>  endif
> diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c index
> 0974b0a38f..d41f7f220b 100644
> --- a/app/test-pmd/parameters.c
> +++ b/app/test-pmd/parameters.c
> @@ -87,7 +87,10 @@ usage(char* progname)
>  	       "in NUMA mode.\n");
>  	printf("  --mbuf-size=N,[N1[,..Nn]: set the data size of mbuf to "
>  	       "N bytes. If multiple numbers are specified the extra pools "
> -	       "will be created to receive with packet split features\n");
> +	       "will be created to receive with packet split features\n"
> +		   "Use 'g' suffix for GPU memory.\n"
> +		   "If no or an unrecognized suffix is provided, CPU is
> assumed\n");
Unrecognized suffix? I would emit an error and abort the launch.

> +
>  	printf("  --total-num-mbufs=N: set the number of mbufs to be
> allocated "
>  	       "in mbuf pools.\n");
>  	printf("  --max-pkt-len=N: set the maximum size of packet to N
> bytes.\n"); @@ -595,6 +598,7 @@ launch_args_parse(int argc, char** argv)
>  	struct rte_eth_dev_info dev_info;
>  	uint16_t rec_nb_pkts;
>  	int ret;
> +	uint32_t idx = 0;
> 
>  	static struct option lgopts[] = {
>  		{ "help",			0, 0, 0 },
> @@ -1538,4 +1542,11 @@ launch_args_parse(int argc, char** argv)
>  				  "ignored\n");
>  		mempool_flags = 0;
>  	}
> +
> +	for (idx = 0; idx < mbuf_data_size_n; idx++) {
> +		if (mbuf_mem_types[idx] == MBUF_MEM_GPU &&
> strcmp(cur_fwd_eng->fwd_mode_name, "io") != 0) {
> +			fprintf(stderr, "GPU memory mbufs can be used with
> iofwd engine only\n");
> +			rte_exit(EXIT_FAILURE, "Command line is
> incorrect\n");
> +		}
> +	}
Please note, the forwarding mode can be changed from the interactive prompt with the "set fwd <mode>" command.
If iofwd mode is crucial for GPU functionality, we should prevent switching to other modes if GPU pools are engaged.
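A possible guard, sketched below; the helper is new and the exact hook in the "set fwd" handler is an assumption (it builds on the patch's mbuf_mem_types[] and mbuf_data_size_n):

#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Declarations from the patch, repeated here so the sketch is self-contained. */
enum mbuf_mem_type { MBUF_MEM_CPU, MBUF_MEM_GPU };
extern enum mbuf_mem_type mbuf_mem_types[];
extern uint32_t mbuf_data_size_n;

/*
 * Refuse to leave "io" forwarding while any mbuf pool lives in GPU memory;
 * the "set fwd <mode>" handler could call this before switching.
 */
static int
fwd_mode_switch_allowed(const char *new_mode)
{
	uint32_t i;

	if (strcmp(new_mode, "io") == 0)
		return 1;
	for (i = 0; i < mbuf_data_size_n; i++) {
		if (mbuf_mem_types[i] == MBUF_MEM_GPU) {
			fprintf(stderr,
				"GPU mbuf pools in use, only io forwarding is supported\n");
			return 0;
		}
	}
	return 1;
}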


>  }
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
> a66dfb297c..1af235c4d8 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -205,6 +205,12 @@ uint32_t mbuf_data_size_n = 1; /* Number of
> specified mbuf sizes. */  uint16_t mbuf_data_size[MAX_SEGS_BUFFER_SPLIT]
> = {
>  	DEFAULT_MBUF_DATA_SIZE
>  }; /**< Mbuf data space size. */
> +
> +/* Mbuf memory types. */
> +enum mbuf_mem_type mbuf_mem_types[MAX_SEGS_BUFFER_SPLIT];
> +/* Pointers to external memory allocated for mempools. */ uintptr_t
> +mempools_ext_ptr[MAX_SEGS_BUFFER_SPLIT];
> +
>  uint32_t param_total_num_mbufs = 0;  /**< number of mbufs in all pools - if
>                                        * specified on command-line. */  uint16_t
> stats_period; /**< Period to show statistics (disabled by default) */ @@ -
> 543,6 +549,12 @@ int proc_id;
>   */
>  unsigned int num_procs = 1;
> 
> +/*
> + * In case of GPU memory external mbufs use, for simplicity,
> + * the first GPU device in the list.
> + */
> +int gpu_id = 0;
It is assigned zero and never changes. Is only the first GPU supported?
This limitation should be mentioned in the documentation.

> +
>  static void
>  eth_rx_metadata_negotiate_mp(uint16_t port_id)  { @@ -1103,6 +1115,79
> @@ setup_extbuf(uint32_t nb_mbufs, uint16_t mbuf_sz, unsigned int
> socket_id,
>  	return ext_num;
>  }
> 
> +static struct rte_mempool *
> +gpu_mbuf_pool_create(uint16_t mbuf_seg_size, unsigned int nb_mbuf,
> +			unsigned int socket_id, uint16_t port_id,
> +			int gpu_id, uintptr_t * mp_addr)
> +{
> +	int ret = 0;
> +	char pool_name[RTE_MEMPOOL_NAMESIZE];
> +	struct rte_eth_dev_info dev_info;
> +	struct rte_mempool *rte_mp = NULL;
> +	struct rte_pktmbuf_extmem gpu_mem;
> +	struct rte_gpu_info ginfo;
> +	uint8_t gpu_page_shift = 16;
> +	uint32_t gpu_page_size = (1UL << gpu_page_shift);
> +
> +	ret = eth_dev_info_get_print_err(port_id, &dev_info);
> +	if (ret != 0)
> +		rte_exit(EXIT_FAILURE,
> +			"Failed to get device info for port %d\n",
> +			port_id);
> +
> +	mbuf_poolname_build(socket_id, pool_name, sizeof(pool_name),
> port_id, MBUF_MEM_GPU);
> +	if (!is_proc_primary()) {
> +		rte_mp = rte_mempool_lookup(pool_name);
> +		if (rte_mp == NULL)
> +			rte_exit(EXIT_FAILURE,
> +				"Get mbuf pool for socket %u failed: %s\n",
> +				socket_id, rte_strerror(rte_errno));
> +		return rte_mp;
> +	}
> +
> +	if (rte_gpu_info_get(gpu_id, &ginfo))
> +		rte_exit(EXIT_FAILURE, "Can't retrieve info about GPU %d -
> bye\n",
> +gpu_id);
> +
> +	TESTPMD_LOG(INFO,
> +		"create a new mbuf pool <%s>: n=%u, size=%u, socket=%u
> GPU device=%s\n",
> +		pool_name, nb_mbuf, mbuf_seg_size, socket_id, ginfo.name);
> +
> +	/* Create an external memory mempool using memory allocated on
> the
> +GPU. */
> +
> +	gpu_mem.elt_size = RTE_MBUF_DEFAULT_BUF_SIZE;
> +	gpu_mem.buf_len = RTE_ALIGN_CEIL(nb_mbuf * gpu_mem.elt_size,
> gpu_page_size);
> +	gpu_mem.buf_iova = RTE_BAD_IOVA;
> +
> +	gpu_mem.buf_ptr = rte_gpu_mem_alloc(gpu_id, gpu_mem.buf_len);
> +	if (gpu_mem.buf_ptr == NULL)
> +		rte_exit(EXIT_FAILURE, "Could not allocate GPU device
> memory\n");
> +
> +	ret = rte_extmem_register(gpu_mem.buf_ptr, gpu_mem.buf_len,
> NULL, gpu_mem.buf_iova, gpu_page_size);
> +	if (ret)
> +		rte_exit(EXIT_FAILURE, "Unable to register addr 0x%p, ret
> %d\n",
> +gpu_mem.buf_ptr, ret);
> +
> +	uint16_t pid = 0;
> +
> +	RTE_ETH_FOREACH_DEV(pid)
> +	{
> +		ret = rte_dev_dma_map(dev_info.device, gpu_mem.buf_ptr,
> +					  gpu_mem.buf_iova,
> gpu_mem.buf_len);
> +		if (ret) {
> +			rte_exit(EXIT_FAILURE, "Unable to DMA map addr
> 0x%p for device %s\n",
> +					 gpu_mem.buf_ptr, dev_info.device-
> >name);
> +		}
> +	}
> +
> +	rte_mp = rte_pktmbuf_pool_create_extbuf(pool_name, nb_mbuf,
> mb_mempool_cache, 0, mbuf_seg_size, socket_id, &gpu_mem, 1);
> +	if (rte_mp == NULL) {
> +		rte_exit(EXIT_FAILURE, "Creation of GPU mempool <%s>
> failed\n", pool_name);
> +	}
> +
> +	*mp_addr = (uintptr_t) gpu_mem.buf_ptr;
> +
> +	return rte_mp;
> +}
> +
>  /*
>   * Configuration initialisation done once at init time.
>   */
> @@ -1117,7 +1202,7 @@ mbuf_pool_create(uint16_t mbuf_seg_size,
> unsigned nb_mbuf,
> 
>  	mb_size = sizeof(struct rte_mbuf) + mbuf_seg_size;  #endif
> -	mbuf_poolname_build(socket_id, pool_name, sizeof(pool_name),
> size_idx);
> +	mbuf_poolname_build(socket_id, pool_name, sizeof(pool_name),
> size_idx,
> +MBUF_MEM_CPU);
>  	if (!is_proc_primary()) {
>  		rte_mp = rte_mempool_lookup(pool_name);
>  		if (rte_mp == NULL)
> @@ -1700,19 +1785,42 @@ init_config(void)
> 
>  		for (i = 0; i < num_sockets; i++)
>  			for (j = 0; j < mbuf_data_size_n; j++)
> -				mempools[i * MAX_SEGS_BUFFER_SPLIT + j]
> =
> -					mbuf_pool_create(mbuf_data_size[j],
> -							  nb_mbuf_per_pool,
> -							  socket_ids[i], j);
> +			{
> +				if (mbuf_mem_types[j] == MBUF_MEM_GPU)
> {
> +					if (rte_gpu_count_avail() == 0)
> +						rte_exit(EXIT_FAILURE, "No
> GPU device available.\n");
> +
> +					mempools[i *
> MAX_SEGS_BUFFER_SPLIT + j] =
> +
> 	gpu_mbuf_pool_create(mbuf_data_size[j],
What about GPU/CPU adherence? Do we create one GPU pool per CPU socket,
disregarding hardware topology at all?
We can mitigate the issue with the "--socket-mem/--socket-num" parameters though.

> +
> 		 nb_mbuf_per_pool,
> +
> 		 socket_ids[i], j, gpu_id,
> +
> 		 &(mempools_ext_ptr[j]));
> +				} else {
> +					mempools[i *
> MAX_SEGS_BUFFER_SPLIT + j] =
> +
> 	mbuf_pool_create(mbuf_data_size[j],
> +
> 	nb_mbuf_per_pool,
> +								socket_ids[i],
> j);
> +				}
> +			}
>  	} else {
>  		uint8_t i;
> 
>  		for (i = 0; i < mbuf_data_size_n; i++)
> -			mempools[i] = mbuf_pool_create
> -					(mbuf_data_size[i],
> -					 nb_mbuf_per_pool,
> -					 socket_num == UMA_NO_CONFIG ?
> -					 0 : socket_num, i);
> +		{
> +			if (mbuf_mem_types[i] == MBUF_MEM_GPU) {
> +				mempools[i] =
> gpu_mbuf_pool_create(mbuf_data_size[i],
> +
> 			   nb_mbuf_per_pool,
> +
> 			   socket_num == UMA_NO_CONFIG ? 0 : socket_num,
> +
> 			   i, gpu_id,
> +
> 			   &(mempools_ext_ptr[i]));
> +			} else {
> +				mempools[i] =
> mbuf_pool_create(mbuf_data_size[i],
> +							nb_mbuf_per_pool,
> +							socket_num ==
> UMA_NO_CONFIG ?
> +							0 : socket_num, i);
> +			}
> +		}
> +
>  	}
> 
>  	init_port_config();
> @@ -3415,8 +3523,11 @@ pmd_test_exit(void)
>  		}
>  	}
>  	for (i = 0 ; i < RTE_DIM(mempools) ; i++) {
> -		if (mempools[i])
> +		if (mempools[i]) {
>  			mempool_free_mp(mempools[i]);
> +			if (mbuf_mem_types[i] == MBUF_MEM_GPU)
> +				rte_gpu_mem_free(gpu_id, (void
> *)mempools_ext_ptr[i]);
> +		}
>  	}
>  	free(xstats_display);
> 
> diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h index
> 669ce1e87d..9919044372 100644
> --- a/app/test-pmd/testpmd.h
> +++ b/app/test-pmd/testpmd.h
> @@ -12,6 +12,7 @@
>  #include <rte_gro.h>
>  #include <rte_gso.h>
>  #include <rte_os_shim.h>
> +#include <rte_gpudev.h>
>  #include <cmdline.h>
>  #include <sys/queue.h>
>  #ifdef RTE_HAS_JANSSON
> @@ -474,6 +475,11 @@ extern uint8_t dcb_config;  extern uint32_t
> mbuf_data_size_n;  extern uint16_t
> mbuf_data_size[MAX_SEGS_BUFFER_SPLIT];
>  /**< Mbuf data space size. */
> +enum mbuf_mem_type {
> +	MBUF_MEM_CPU,
> +	MBUF_MEM_GPU
> +};
> +extern enum mbuf_mem_type
> mbuf_mem_types[MAX_SEGS_BUFFER_SPLIT];
>  extern uint32_t param_total_num_mbufs;
> 
>  extern uint16_t stats_period;
> @@ -717,14 +723,16 @@ current_fwd_lcore(void)
>  /* Mbuf Pools */
>  static inline void
>  mbuf_poolname_build(unsigned int sock_id, char *mp_name,
> -		    int name_size, uint16_t idx)
> +		    int name_size, uint16_t idx, enum mbuf_mem_type
> mem_type)
>  {
> +	const char *suffix = mem_type == MBUF_MEM_GPU ? "_gpu" : "";
> +
>  	if (!idx)
>  		snprintf(mp_name, name_size,
> -			 MBUF_POOL_NAME_PFX "_%u", sock_id);
> +			 MBUF_POOL_NAME_PFX "_%u%s", sock_id, suffix);
>  	else
>  		snprintf(mp_name, name_size,
> -			 MBUF_POOL_NAME_PFX "_%hu_%hu",
> (uint16_t)sock_id, idx);
> +			 MBUF_POOL_NAME_PFX "_%hu_%hu%s",
> (uint16_t)sock_id, idx, suffix);
>  }
> 
>  static inline struct rte_mempool *
> @@ -732,7 +740,7 @@ mbuf_pool_find(unsigned int sock_id, uint16_t idx)  {
>  	char pool_name[RTE_MEMPOOL_NAMESIZE];
> 
> -	mbuf_poolname_build(sock_id, pool_name, sizeof(pool_name), idx);
> +	mbuf_poolname_build(sock_id, pool_name, sizeof(pool_name), idx,
> +mbuf_mem_types[idx]);
>  	return rte_mempool_lookup((const char *)pool_name);  }
> 
> diff --git a/doc/guides/testpmd_app_ug/run_app.rst
> b/doc/guides/testpmd_app_ug/run_app.rst
> index 30edef07ea..ede7b79abb 100644
> --- a/doc/guides/testpmd_app_ug/run_app.rst
> +++ b/doc/guides/testpmd_app_ug/run_app.rst
> @@ -119,6 +119,9 @@ The command line options are:
>      The default value is 2048. If multiple mbuf-size values are specified the
>      extra memory pools will be created for allocating mbufs to receive packets
>      with buffer splitting features.
> +    Providing mbuf size with a 'g' suffix (e.g. ``--mbuf-size=2048g``),
> +    will cause the mempool to be created on a GPU memory area allocated.
> +    This option is currently limited to iofwd engine with the first GPU.
Mmm, if we have multiple GPUs, we have no way to specify on which one we should allocate the pool.
The syntax is not complete ☹. Possibly we should specify the GPU port id after the suffix? Like 2048g2?
If the port index is omitted, we should assume there is only one GPU and check this.
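For what it is worth, parsing such an optional index could look roughly like this (illustrative only; the helper name is not from the patch):

#include <stdlib.h>

/*
 * "2048g"  -> GPU 0 (only valid if a single GPU is present)
 * "2048g2" -> GPU 2
 * Anything else after the index is rejected.
 */
static int
parse_gpu_suffix(const char *suffix, int *gpu_idx)
{
	char *end = NULL;

	if (suffix[0] != 'g')
		return -1;		/* not a GPU pool request */
	if (suffix[1] == '\0') {
		*gpu_idx = 0;		/* no index: assume the only GPU */
		return 0;
	}
	*gpu_idx = strtol(suffix + 1, &end, 10);
	return (*end == '\0') ? 0 : -1;	/* trailing junk: reject */
}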

With best regards,
Slava
  
Ananyev, Konstantin Nov. 16, 2021, 5:16 p.m. UTC | #2
> > diff --git a/doc/guides/testpmd_app_ug/run_app.rst
> > b/doc/guides/testpmd_app_ug/run_app.rst
> > index 30edef07ea..ede7b79abb 100644
> > --- a/doc/guides/testpmd_app_ug/run_app.rst
> > +++ b/doc/guides/testpmd_app_ug/run_app.rst
> > @@ -119,6 +119,9 @@ The command line options are:
> >      The default value is 2048. If multiple mbuf-size values are specified the
> >      extra memory pools will be created for allocating mbufs to receive packets
> >      with buffer splitting features.
> > +    Providing mbuf size with a 'g' suffix (e.g. ``--mbuf-size=2048g``),
> > +    will cause the mempool to be created on a GPU memory area allocated.
> > +    This option is currently limited to iofwd engine with the first GPU.
> Mmm, if we have multiple GPUs - we have no way to specify on which one we should allocate the pool.
> Syntax is not complete☹. Possible we should specify GPU port id after the suffix ? Like 2048g2 ?
> If port index is omitted we should consider we have the only GPU and check this.

Do we need to overload --mbuf-size parameter here?
Why not add 'gpu' option to --mp-alloc parameter?
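Roughly, it could sit next to the existing --mp-alloc keywords; a hypothetical sketch (MP_ALLOC_GPU does not exist today, and the native/anon/xmem/xmemhuge keywords are the ones parameters.c already accepts):

#include <string.h>

enum { MP_ALLOC_GPU = 100 };	/* hypothetical new mp_alloc type */

/*
 * Returns 1 and sets *alloc_type when the user asked for GPU pools,
 * 0 to let the existing keyword handling take over.
 */
static int
mp_alloc_parse_gpu(const char *arg, unsigned int *alloc_type)
{
	if (strcmp(arg, "gpu") != 0)
		return 0;
	*alloc_type = MP_ALLOC_GPU;
	return 1;
}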
  
Ferruh Yigit Nov. 16, 2021, 5:55 p.m. UTC | #3
On 11/11/2021 9:41 PM, eagostini@nvidia.com wrote:
> --- a/app/test-pmd/meson.build
> +++ b/app/test-pmd/meson.build
> @@ -32,7 +32,7 @@ if dpdk_conf.has('RTE_HAS_JANSSON')
>       ext_deps += jansson_dep
>   endif
>   
> -deps += ['ethdev', 'gro', 'gso', 'cmdline', 'metrics', 'bus_pci']
> +deps += ['ethdev', 'gro', 'gso', 'cmdline', 'metrics', 'bus_pci', 'gpudev']

I didn't review the set, but at a very high level do we want to add
'gpudev' as a dependency? Isn't this like adding 'rawdev' as a dependency?
  
Elena Agostini Nov. 16, 2021, 6:06 p.m. UTC | #4
> From: Ferruh Yigit <ferruh.yigit@intel.com>
> Date: Tuesday, 16 November 2021 at 19:00
> To: Elena Agostini <eagostini@nvidia.com>
> Cc: dev@dpdk.org <dev@dpdk.org>, Slava Ovsiienko <viacheslavo@nvidia.com>
> Subject: Re: [PATCH v2 1/1] app/testpmd: add GPU memory option in iofwd engine
> External email: Use caution opening links or attachments>
>

> On 11/11/2021 9:41 PM, eagostini@nvidia.com wrote:
> > --- a/app/test-pmd/meson.build
> > +++ b/app/test-pmd/meson.build
> > @@ -32,7 +32,7 @@ if dpdk_conf.has('RTE_HAS_JANSSON')
> >       ext_deps += jansson_dep
> >   endif
> >
> > -deps += ['ethdev', 'gro', 'gso', 'cmdline', 'metrics', 'bus_pci']
> > +deps += ['ethdev', 'gro', 'gso', 'cmdline', 'metrics', 'bus_pci', 'gpudev']>

> I didn't review the set, but in a very high level do we want to add
> 'gpudev' as dependency? Isn't this like adding 'rawdev' as dependency.

gpudev is a library that can be built without a GPU driver, like all the other libraries,
and it is actually used only in the case of a GPU memory mempool.

Reasons for this patch are:

- Have an upstreamed benchmark tool to measure network metrics using GPU memory
- Test some DPDK features not really tested anywhere like the external memory mempool feature
  
Ferruh Yigit Nov. 16, 2021, 6:11 p.m. UTC | #5
On 11/16/2021 6:06 PM, Elena Agostini wrote:
>  > From: Ferruh Yigit <ferruh.yigit@intel.com>
>  > Date: Tuesday, 16 November 2021 at 19:00
>  > To: Elena Agostini <eagostini@nvidia.com>
>  > Cc: dev@dpdk.org <dev@dpdk.org>, Slava Ovsiienko <viacheslavo@nvidia.com>
>  > Subject: Re: [PATCH v2 1/1] app/testpmd: add GPU memory option in iofwd engine
>  > External email: Use caution opening links or attachments>
>  >
>  > On 11/11/2021 9:41 PM, eagostini@nvidia.com wrote:
>  > > --- a/app/test-pmd/meson.build
>  > > +++ b/app/test-pmd/meson.build
>  > > @@ -32,7 +32,7 @@ if dpdk_conf.has('RTE_HAS_JANSSON')
>  > >       ext_deps += jansson_dep
>  > >   endif
>  > >
>  > > -deps += ['ethdev', 'gro', 'gso', 'cmdline', 'metrics', 'bus_pci']
>  > > +deps += ['ethdev', 'gro', 'gso', 'cmdline', 'metrics', 'bus_pci', 'gpudev']>
>  > I didn't review the set, but in a very high level do we want to add
>  > 'gpudev' as dependency? Isn't this like adding 'rawdev' as dependency.
> 
> gpudev is a library that can be built without a gpu driver as all the other libraries
> and it is actually used only in case of GPU memory mempool.
> 
> Reasons for this patch are:
> 
> - Have an upstreamed benchmark tool to measure network metrics using GPU memory
> - Test some DPDK features not really tested anywhere like the external memory mempool feature

I can see the reason, that is obvious, yet again: why are we not adding rawdev
testing to testpmd, but are adding gpudev?
It is easier to add it to testpmd, and from a testing perspective it
makes sense, but I am still not quite sure about this new dependency; I would
like to get more feedback.
  
Elena Agostini Nov. 16, 2021, 6:15 p.m. UTC | #6
> > > diff --git a/doc/guides/testpmd_app_ug/run_app.rst
> > > b/doc/guides/testpmd_app_ug/run_app.rst
> > > index 30edef07ea..ede7b79abb 100644
> > > --- a/doc/guides/testpmd_app_ug/run_app.rst
> > > +++ b/doc/guides/testpmd_app_ug/run_app.rst
> > > @@ -119,6 +119,9 @@ The command line options are:
> > >      The default value is 2048. If multiple mbuf-size values are specified the
> > >      extra memory pools will be created for allocating mbufs to receive packets
> > >      with buffer splitting features.
> > > +    Providing mbuf size with a 'g' suffix (e.g. ``--mbuf-size=2048g``),
> > > +    will cause the mempool to be created on a GPU memory area allocated.
> > > +    This option is currently limited to iofwd engine with the first GPU.
> > Mmm, if we have multiple GPUs we have no way to specify which one the pool should be allocated on.
> > The syntax is not complete ☹. Possibly we should specify the GPU port id after the suffix? Like 2048g2?
> > If the port index is omitted we should assume there is only one GPU and check this.
>
> Do we need to overload --mbuf-size parameter here?
> Why not add 'gpu' option to --mp-alloc parameter?

--mbuf-size is preferred to --mp-alloc because with --mbuf-size you can enable the buffer split feature,
allocating multiple mempools with different mbuf sizes on different types of memory.

An example would be --mbuf-size=2048g,1024, which creates two mempools to split packets into:
one in GPU memory and the other in CPU memory.
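
For instance, a possible invocation could look like this (the PCI address, core list and
queue settings below are only placeholders for illustration, not taken from the patch):

    ./dpdk-testpmd -l 0-3 -n 4 -a 0000:b1:00.0 -- --forward-mode=io \
            --mbuf-size=2048g,1024 --rxq=1 --txq=1

Pool 0 (2048-byte mbufs) would then be backed by GPU memory allocated through gpudev
(rte_gpu_mem_alloc), while pool 1 (1024-byte mbufs) stays in regular CPU memory; the
usual buffer split options (e.g. --rxpkts) still apply on top of this.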
  
Jerin Jacob Nov. 16, 2021, 7:09 p.m. UTC | #7
On Tue, Nov 16, 2021 at 11:42 PM Ferruh Yigit <ferruh.yigit@intel.com> wrote:
>
> On 11/16/2021 6:06 PM, Elena Agostini wrote:
> >  > From: Ferruh Yigit <ferruh.yigit@intel.com>
> >
> >  > Date: Tuesday, 16 November 2021 at 19:00
> >
> >  > To: Elena Agostini <eagostini@nvidia.com>
> >
> >  > Cc: dev@dpdk.org <dev@dpdk.org>, Slava Ovsiienko <viacheslavo@nvidia.com>
> >
> >  > Subject: Re: [PATCH v2 1/1] app/testpmd: add GPU memory option in iofwd engine
> >
> >  > External email: Use caution opening links or attachments>
> >
> >  >
> >
> >  > On 11/11/2021 9:41 PM, eagostini@nvidia.com wrote:
> >
> >  > > --- a/app/test-pmd/meson.build
> >
> >  > > +++ b/app/test-pmd/meson.build
> >
> >  > > @@ -32,7 +32,7 @@ if dpdk_conf.has('RTE_HAS_JANSSON')
> >
> >  > >       ext_deps += jansson_dep
> >
> >  > >   endif
> >
> >  > >
> >
> >  > > -deps += ['ethdev', 'gro', 'gso', 'cmdline', 'metrics', 'bus_pci']
> >
> >  > > +deps += ['ethdev', 'gro', 'gso', 'cmdline', 'metrics', 'bus_pci', 'gpudev']>
> >
> >  > I didn't review the set, but in a very high level do we want to add
> >
> >  > 'gpudev' as dependency? Isn't this like adding 'rawdev' as dependency.
> >
> > gpudev is a library that can be built without a GPU driver, like all the other libraries,
> >
> > and it is actually used only in the case of a GPU memory mempool.
> >
> > Reasons for this patch are:
> >
> > - Have an upstreamed benchmark tool to measure network metrics using GPU memory
> >
> > - Test some DPDK features not really tested anywhere like the external memory mempool feature
> >
>
> I can see the reason, that is obvious; yet again, why are we not adding rawdev
> testing to testpmd, but are adding gpudev?
> It is easier to add it to testpmd, and from a testing perspective it
> makes sense, but I am still not quite sure about this new dependency; I would
> like to get more feedback.

I had a similar concern earlier. IMO, it is better to have a
separate test application for gpudev, like
the other device classes. For eventdev, when it needs to work with
ethdev for Rx adapter cases,
we have enabled such code in app/test-eventdev to keep the testpmd focus on ethdev.
  
Elena Agostini Nov. 16, 2021, 7:14 p.m. UTC | #8
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Date: Tuesday, 16 November 2021 at 20:09
> To: Ferruh Yigit <ferruh.yigit@intel.com>
> Cc: Elena Agostini <eagostini@nvidia.com>, dev@dpdk.org <dev@dpdk.org>, Slava Ovsiienko <viacheslavo@nvidia.com>
> Subject: Re: [PATCH v2 1/1] app/testpmd: add GPU memory option in iofwd engine
> External email: Use caution opening links or attachments
>
>
> On Tue, Nov 16, 2021 at 11:42 PM Ferruh Yigit <ferruh.yigit@intel.com> wrote:
> > I can see the reason, that is obvious; yet again, why are we not adding rawdev
> > testing to testpmd, but are adding gpudev?
> > It is easier to add it to testpmd, and from a testing perspective it
> > makes sense, but I am still not quite sure about this new dependency; I would
> > like to get more feedback.
>
> I had a similar concern earlier. IMO, it is better to have a
> separate test application for gpudev, like
> the other device classes. For eventdev, when it needs to work with
> ethdev for Rx adapter cases,
> we have enabled such code in app/test-eventdev to keep the testpmd focus on ethdev.

gpudev already has a test app in app/test-gpudev.

gpudev needs to also be tested with a network card, and today another application
dedicated to testing gpudev over the network would be very similar to testpmd io.

At this stage, there is no point in reinventing the wheel
  
Jerin Jacob Nov. 16, 2021, 7:21 p.m. UTC | #9
On Wed, Nov 17, 2021 at 12:44 AM Elena Agostini <eagostini@nvidia.com> wrote:
>
> gpudev already has a test app in app/test-gpudev.
>
> gpudev needs to also be tested with a network card, and today another application
> dedicated to testing gpudev over the network would be very similar to testpmd io.
>
> At this stage, there is no point in reinventing the wheel


I think it is not about reinventing the wheel; it is about
the maintenance of testpmd:
currently, the features are specific to ethdev. Adding more
cross-device-specific features
will clutter testpmd. I had a similar case when network cases
needed to be integrated with eventdev,
and I chose to put that in test-eventdev so that the testpmd focus remains on ethdev.
  
Bruce Richardson Nov. 17, 2021, 8:55 a.m. UTC | #10
On Wed, Nov 17, 2021 at 12:51:13AM +0530, Jerin Jacob wrote:
> I think it is not about reinventing the wheel; it is about
> the maintenance of testpmd:
> currently, the features are specific to ethdev. Adding more
> cross-device-specific features
> will clutter testpmd. I had a similar case when network cases
> needed to be integrated with eventdev,
> and I chose to put that in test-eventdev so that the testpmd focus remains on ethdev.

Since there is an increasing push for libraries to be able to be disabled
in DPDK, at minimum I would look to have gpudev as an optional dependency
here, so that disabling it would not completely disable building testpmd.

Regards,
/Bruce
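
As a rough illustration of the optional-dependency suggestion, a minimal sketch for
app/test-pmd/meson.build could look as follows (the RTE_LIB_GPUDEV flag and the
TESTPMD_HAS_GPUDEV guard macro are assumptions for illustration, not part of this patch):

    # pull in gpudev only when the library is part of the build
    if dpdk_conf.has('RTE_LIB_GPUDEV')
        deps += 'gpudev'
        cflags += ['-DTESTPMD_HAS_GPUDEV']
    endif

The GPU-specific code paths in parameters.c and testpmd.c would then be wrapped in the
same guard so that testpmd still builds when gpudev is disabled.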
  

Patch

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 4f51b259fe..36193bc566 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -3614,6 +3614,7 @@  parse_item_list(const char *str, const char *item_name, unsigned int max_items,
 	unsigned int j;
 	int value_ok;
 	char c;
+	int gpu_mbuf = 0;
 
 	/*
 	 * First parse all items in the list and store their value.
@@ -3628,6 +3629,14 @@  parse_item_list(const char *str, const char *item_name, unsigned int max_items,
 			value_ok = 1;
 			continue;
 		}
+		if (c == 'g') {
+			/*
+			 * When this flag is set, mbufs for this segment
+			 * will be created on GPU memory.
+			 */
+			gpu_mbuf = 1;
+			continue;
+		}
 		if (c != ',') {
 			fprintf(stderr, "character %c is not a decimal digit\n", c);
 			return 0;
@@ -3640,6 +3649,8 @@  parse_item_list(const char *str, const char *item_name, unsigned int max_items,
 			parsed_items[nb_item] = value;
 			value_ok = 0;
 			value = 0;
+			mbuf_mem_types[nb_item] = gpu_mbuf ? MBUF_MEM_GPU : MBUF_MEM_CPU;
+			gpu_mbuf = 0;
 		}
 		nb_item++;
 	}
@@ -3648,6 +3659,9 @@  parse_item_list(const char *str, const char *item_name, unsigned int max_items,
 			item_name, nb_item + 1, max_items);
 		return 0;
 	}
+
+	mbuf_mem_types[nb_item] = gpu_mbuf ? MBUF_MEM_GPU : MBUF_MEM_CPU;
+
 	parsed_items[nb_item++] = value;
 	if (! check_unique_values)
 		return nb_item;
diff --git a/app/test-pmd/meson.build b/app/test-pmd/meson.build
index d5df52c470..5c8ca68c9d 100644
--- a/app/test-pmd/meson.build
+++ b/app/test-pmd/meson.build
@@ -32,7 +32,7 @@  if dpdk_conf.has('RTE_HAS_JANSSON')
     ext_deps += jansson_dep
 endif
 
-deps += ['ethdev', 'gro', 'gso', 'cmdline', 'metrics', 'bus_pci']
+deps += ['ethdev', 'gro', 'gso', 'cmdline', 'metrics', 'bus_pci', 'gpudev']
 if dpdk_conf.has('RTE_CRYPTO_SCHEDULER')
     deps += 'crypto_scheduler'
 endif
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 0974b0a38f..d41f7f220b 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -87,7 +87,10 @@  usage(char* progname)
 	       "in NUMA mode.\n");
 	printf("  --mbuf-size=N,[N1[,..Nn]: set the data size of mbuf to "
 	       "N bytes. If multiple numbers are specified the extra pools "
-	       "will be created to receive with packet split features\n");
+	       "will be created to receive with packet split features\n"
+		   "Use 'g' suffix for GPU memory.\n"
+		   "If no or an unrecognized suffix is provided, CPU is assumed\n");
+
 	printf("  --total-num-mbufs=N: set the number of mbufs to be allocated "
 	       "in mbuf pools.\n");
 	printf("  --max-pkt-len=N: set the maximum size of packet to N bytes.\n");
@@ -595,6 +598,7 @@  launch_args_parse(int argc, char** argv)
 	struct rte_eth_dev_info dev_info;
 	uint16_t rec_nb_pkts;
 	int ret;
+	uint32_t idx = 0;
 
 	static struct option lgopts[] = {
 		{ "help",			0, 0, 0 },
@@ -1538,4 +1542,11 @@  launch_args_parse(int argc, char** argv)
 				  "ignored\n");
 		mempool_flags = 0;
 	}
+
+	for (idx = 0; idx < mbuf_data_size_n; idx++) {
+		if (mbuf_mem_types[idx] == MBUF_MEM_GPU && strcmp(cur_fwd_eng->fwd_mode_name, "io") != 0) {
+			fprintf(stderr, "GPU memory mbufs can be used with iofwd engine only\n");
+			rte_exit(EXIT_FAILURE, "Command line is incorrect\n");
+		}
+	}
 }
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index a66dfb297c..1af235c4d8 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -205,6 +205,12 @@  uint32_t mbuf_data_size_n = 1; /* Number of specified mbuf sizes. */
 uint16_t mbuf_data_size[MAX_SEGS_BUFFER_SPLIT] = {
 	DEFAULT_MBUF_DATA_SIZE
 }; /**< Mbuf data space size. */
+
+/* Mbuf memory types. */
+enum mbuf_mem_type mbuf_mem_types[MAX_SEGS_BUFFER_SPLIT];
+/* Pointers to external memory allocated for mempools. */
+uintptr_t mempools_ext_ptr[MAX_SEGS_BUFFER_SPLIT];
+
 uint32_t param_total_num_mbufs = 0;  /**< number of mbufs in all pools - if
                                       * specified on command-line. */
 uint16_t stats_period; /**< Period to show statistics (disabled by default) */
@@ -543,6 +549,12 @@  int proc_id;
  */
 unsigned int num_procs = 1;
 
+/*
+ * In case of GPU memory external mbufs use, for simplicity,
+ * the first GPU device in the list.
+ */
+int gpu_id = 0;
+
 static void
 eth_rx_metadata_negotiate_mp(uint16_t port_id)
 {
@@ -1103,6 +1115,79 @@  setup_extbuf(uint32_t nb_mbufs, uint16_t mbuf_sz, unsigned int socket_id,
 	return ext_num;
 }
 
+static struct rte_mempool *
+gpu_mbuf_pool_create(uint16_t mbuf_seg_size, unsigned int nb_mbuf,
+			unsigned int socket_id, uint16_t port_id,
+			int gpu_id, uintptr_t * mp_addr)
+{
+	int ret = 0;
+	char pool_name[RTE_MEMPOOL_NAMESIZE];
+	struct rte_eth_dev_info dev_info;
+	struct rte_mempool *rte_mp = NULL;
+	struct rte_pktmbuf_extmem gpu_mem;
+	struct rte_gpu_info ginfo;
+	uint8_t gpu_page_shift = 16;
+	uint32_t gpu_page_size = (1UL << gpu_page_shift);
+
+	ret = eth_dev_info_get_print_err(port_id, &dev_info);
+	if (ret != 0)
+		rte_exit(EXIT_FAILURE,
+			"Failed to get device info for port %d\n",
+			port_id);
+
+	mbuf_poolname_build(socket_id, pool_name, sizeof(pool_name), port_id, MBUF_MEM_GPU);
+	if (!is_proc_primary()) {
+		rte_mp = rte_mempool_lookup(pool_name);
+		if (rte_mp == NULL)
+			rte_exit(EXIT_FAILURE,
+				"Get mbuf pool for socket %u failed: %s\n",
+				socket_id, rte_strerror(rte_errno));
+		return rte_mp;
+	}
+
+	if (rte_gpu_info_get(gpu_id, &ginfo))
+		rte_exit(EXIT_FAILURE, "Can't retrieve info about GPU %d - bye\n", gpu_id);
+
+	TESTPMD_LOG(INFO,
+		"create a new mbuf pool <%s>: n=%u, size=%u, socket=%u GPU device=%s\n",
+		pool_name, nb_mbuf, mbuf_seg_size, socket_id, ginfo.name);
+
+	/* Create an external memory mempool using memory allocated on the GPU. */
+
+	gpu_mem.elt_size = RTE_MBUF_DEFAULT_BUF_SIZE;
+	gpu_mem.buf_len = RTE_ALIGN_CEIL(nb_mbuf * gpu_mem.elt_size, gpu_page_size);
+	gpu_mem.buf_iova = RTE_BAD_IOVA;
+
+	gpu_mem.buf_ptr = rte_gpu_mem_alloc(gpu_id, gpu_mem.buf_len);
+	if (gpu_mem.buf_ptr == NULL)
+		rte_exit(EXIT_FAILURE, "Could not allocate GPU device memory\n");
+
+	ret = rte_extmem_register(gpu_mem.buf_ptr, gpu_mem.buf_len, NULL, gpu_mem.buf_iova, gpu_page_size);
+	if (ret)
+		rte_exit(EXIT_FAILURE, "Unable to register addr 0x%p, ret %d\n", gpu_mem.buf_ptr, ret);
+
+	uint16_t pid = 0;
+
+	RTE_ETH_FOREACH_DEV(pid)
+	{
+		ret = rte_dev_dma_map(dev_info.device, gpu_mem.buf_ptr,
+					  gpu_mem.buf_iova, gpu_mem.buf_len);
+		if (ret) {
+			rte_exit(EXIT_FAILURE, "Unable to DMA map addr 0x%p for device %s\n",
+					 gpu_mem.buf_ptr, dev_info.device->name);
+		}
+	}
+
+	rte_mp = rte_pktmbuf_pool_create_extbuf(pool_name, nb_mbuf, mb_mempool_cache, 0, mbuf_seg_size, socket_id, &gpu_mem, 1);
+	if (rte_mp == NULL) {
+		rte_exit(EXIT_FAILURE, "Creation of GPU mempool <%s> failed\n", pool_name);
+	}
+
+	*mp_addr = (uintptr_t) gpu_mem.buf_ptr;
+
+	return rte_mp;
+}
+
 /*
  * Configuration initialisation done once at init time.
  */
@@ -1117,7 +1202,7 @@  mbuf_pool_create(uint16_t mbuf_seg_size, unsigned nb_mbuf,
 
 	mb_size = sizeof(struct rte_mbuf) + mbuf_seg_size;
 #endif
-	mbuf_poolname_build(socket_id, pool_name, sizeof(pool_name), size_idx);
+	mbuf_poolname_build(socket_id, pool_name, sizeof(pool_name), size_idx, MBUF_MEM_CPU);
 	if (!is_proc_primary()) {
 		rte_mp = rte_mempool_lookup(pool_name);
 		if (rte_mp == NULL)
@@ -1700,19 +1785,42 @@  init_config(void)
 
 		for (i = 0; i < num_sockets; i++)
 			for (j = 0; j < mbuf_data_size_n; j++)
-				mempools[i * MAX_SEGS_BUFFER_SPLIT + j] =
-					mbuf_pool_create(mbuf_data_size[j],
-							  nb_mbuf_per_pool,
-							  socket_ids[i], j);
+			{
+				if (mbuf_mem_types[j] == MBUF_MEM_GPU) {
+					if (rte_gpu_count_avail() == 0)
+						rte_exit(EXIT_FAILURE, "No GPU device available.\n");
+
+					mempools[i * MAX_SEGS_BUFFER_SPLIT + j] =
+						gpu_mbuf_pool_create(mbuf_data_size[j],
+											 nb_mbuf_per_pool,
+											 socket_ids[i], j, gpu_id,
+											 &(mempools_ext_ptr[j]));
+				} else {
+					mempools[i * MAX_SEGS_BUFFER_SPLIT + j] =
+						mbuf_pool_create(mbuf_data_size[j],
+								nb_mbuf_per_pool,
+								socket_ids[i], j);
+				}
+			}
 	} else {
 		uint8_t i;
 
 		for (i = 0; i < mbuf_data_size_n; i++)
-			mempools[i] = mbuf_pool_create
-					(mbuf_data_size[i],
-					 nb_mbuf_per_pool,
-					 socket_num == UMA_NO_CONFIG ?
-					 0 : socket_num, i);
+		{
+			if (mbuf_mem_types[i] == MBUF_MEM_GPU) {
+				mempools[i] = gpu_mbuf_pool_create(mbuf_data_size[i],
+												   nb_mbuf_per_pool,
+												   socket_num == UMA_NO_CONFIG ? 0 : socket_num,
+												   i, gpu_id,
+												   &(mempools_ext_ptr[i]));
+			} else {
+				mempools[i] = mbuf_pool_create(mbuf_data_size[i],
+							nb_mbuf_per_pool,
+							socket_num == UMA_NO_CONFIG ?
+							0 : socket_num, i);
+			}
+		}
+
 	}
 
 	init_port_config();
@@ -3415,8 +3523,11 @@  pmd_test_exit(void)
 		}
 	}
 	for (i = 0 ; i < RTE_DIM(mempools) ; i++) {
-		if (mempools[i])
+		if (mempools[i]) {
 			mempool_free_mp(mempools[i]);
+			if (mbuf_mem_types[i] == MBUF_MEM_GPU)
+				rte_gpu_mem_free(gpu_id, (void *)mempools_ext_ptr[i]);
+		}
 	}
 	free(xstats_display);
 
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 669ce1e87d..9919044372 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -12,6 +12,7 @@ 
 #include <rte_gro.h>
 #include <rte_gso.h>
 #include <rte_os_shim.h>
+#include <rte_gpudev.h>
 #include <cmdline.h>
 #include <sys/queue.h>
 #ifdef RTE_HAS_JANSSON
@@ -474,6 +475,11 @@  extern uint8_t dcb_config;
 extern uint32_t mbuf_data_size_n;
 extern uint16_t mbuf_data_size[MAX_SEGS_BUFFER_SPLIT];
 /**< Mbuf data space size. */
+enum mbuf_mem_type {
+	MBUF_MEM_CPU,
+	MBUF_MEM_GPU
+};
+extern enum mbuf_mem_type mbuf_mem_types[MAX_SEGS_BUFFER_SPLIT];
 extern uint32_t param_total_num_mbufs;
 
 extern uint16_t stats_period;
@@ -717,14 +723,16 @@  current_fwd_lcore(void)
 /* Mbuf Pools */
 static inline void
 mbuf_poolname_build(unsigned int sock_id, char *mp_name,
-		    int name_size, uint16_t idx)
+		    int name_size, uint16_t idx, enum mbuf_mem_type mem_type)
 {
+	const char *suffix = mem_type == MBUF_MEM_GPU ? "_gpu" : "";
+
 	if (!idx)
 		snprintf(mp_name, name_size,
-			 MBUF_POOL_NAME_PFX "_%u", sock_id);
+			 MBUF_POOL_NAME_PFX "_%u%s", sock_id, suffix);
 	else
 		snprintf(mp_name, name_size,
-			 MBUF_POOL_NAME_PFX "_%hu_%hu", (uint16_t)sock_id, idx);
+			 MBUF_POOL_NAME_PFX "_%hu_%hu%s", (uint16_t)sock_id, idx, suffix);
 }
 
 static inline struct rte_mempool *
@@ -732,7 +740,7 @@  mbuf_pool_find(unsigned int sock_id, uint16_t idx)
 {
 	char pool_name[RTE_MEMPOOL_NAMESIZE];
 
-	mbuf_poolname_build(sock_id, pool_name, sizeof(pool_name), idx);
+	mbuf_poolname_build(sock_id, pool_name, sizeof(pool_name), idx, mbuf_mem_types[idx]);
 	return rte_mempool_lookup((const char *)pool_name);
 }
 
diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst
index 30edef07ea..ede7b79abb 100644
--- a/doc/guides/testpmd_app_ug/run_app.rst
+++ b/doc/guides/testpmd_app_ug/run_app.rst
@@ -119,6 +119,9 @@  The command line options are:
     The default value is 2048. If multiple mbuf-size values are specified the
     extra memory pools will be created for allocating mbufs to receive packets
     with buffer splitting features.
+    Providing an mbuf size with a 'g' suffix (e.g. ``--mbuf-size=2048g``)
+    will cause the mempool to be created in a GPU memory area allocated by the gpudev library.
+    This option is currently limited to the iofwd engine, using the first GPU.
 
 *   ``--total-num-mbufs=N``