[RFC] mbuf: support dynamic fields and flags

Message ID 20190710092907.5565-1-olivier.matz@6wind.com
State Superseded
Delegated to: Thomas Monjalon
Headers show
Series
  • [RFC] mbuf: support dynamic fields and flags
Related show

Checks

Context Check Description
ci/Intel-compilation success Compilation OK
ci/checkpatch warning coding style issues

Commit Message

Olivier Matz July 10, 2019, 9:29 a.m.
Many features require to store data inside the mbuf. As the room in mbuf
structure is limited, it is not possible to have a field for each
feature. Also, changing fields in the mbuf structure can break the API
or ABI.

This commit addresses these issues, by enabling the dynamic registration
of fields or flags:

- a dynamic field is a named area in the rte_mbuf structure, with a
  given size (>= 1 byte) and alignment constraint.
- a dynamic flag is a named bit in the rte_mbuf structure.

The typical use case is a PMD that registers space for an offload
feature, when the application requests to enable this feature.  As
the space in mbuf is limited, the space should only be reserved if it
is going to be used (i.e when the application explicitly asks for it).

The registration can be done at any moment, but it is not possible
to unregister fields or flags for now.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 app/test/test_mbuf.c                 |  83 +++++++-
 lib/librte_mbuf/Makefile             |   2 +
 lib/librte_mbuf/meson.build          |   6 +-
 lib/librte_mbuf/rte_mbuf.h           |  25 ++-
 lib/librte_mbuf/rte_mbuf_dyn.c       | 373 +++++++++++++++++++++++++++++++++++
 lib/librte_mbuf/rte_mbuf_dyn.h       | 119 +++++++++++
 lib/librte_mbuf/rte_mbuf_version.map |   4 +
 7 files changed, 607 insertions(+), 5 deletions(-)
 create mode 100644 lib/librte_mbuf/rte_mbuf_dyn.c
 create mode 100644 lib/librte_mbuf/rte_mbuf_dyn.h

Comments

Haiyue Wang July 10, 2019, 5:14 p.m. | #1
Hi,

Sounds cool, just have some questions inline.

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
> Sent: Wednesday, July 10, 2019 17:29
> To: dev@dpdk.org
> Subject: [dpdk-dev] [RFC] mbuf: support dynamic fields and flags
> 
> Many features require to store data inside the mbuf. As the room in mbuf
> structure is limited, it is not possible to have a field for each
> feature. Also, changing fields in the mbuf structure can break the API
> or ABI.
> 
> This commit addresses these issues, by enabling the dynamic registration
> of fields or flags:
> 
> - a dynamic field is a named area in the rte_mbuf structure, with a
>   given size (>= 1 byte) and alignment constraint.
> - a dynamic flag is a named bit in the rte_mbuf structure.
> 
> The typical use case is a PMD that registers space for an offload
> feature, when the application requests to enable this feature.  As
> the space in mbuf is limited, the space should only be reserved if it
> is going to be used (i.e when the application explicitly asks for it).
> 
> The registration can be done at any moment, but it is not possible
> to unregister fields or flags for now.
> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> ---
>  app/test/test_mbuf.c                 |  83 +++++++-
>  lib/librte_mbuf/Makefile             |   2 +
>  lib/librte_mbuf/meson.build          |   6 +-
>  lib/librte_mbuf/rte_mbuf.h           |  25 ++-
>  lib/librte_mbuf/rte_mbuf_dyn.c       | 373 +++++++++++++++++++++++++++++++++++
>  lib/librte_mbuf/rte_mbuf_dyn.h       | 119 +++++++++++
>  lib/librte_mbuf/rte_mbuf_version.map |   4 +
>  7 files changed, 607 insertions(+), 5 deletions(-)
>  create mode 100644 lib/librte_mbuf/rte_mbuf_dyn.c
>  create mode 100644 lib/librte_mbuf/rte_mbuf_dyn.h
> 
> diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
> index 2a97afe20..8008cc766 100644
> --- a/app/test/test_mbuf.c
> +++ b/app/test/test_mbuf.c
> @@ -28,6 +28,7 @@
>  #include <rte_random.h>
>  #include <rte_cycles.h>
>  #include <rte_malloc.h>
> +#include <rte_mbuf_dyn.h>
> 
>  #include "test.h"
> 
> @@ -502,7 +503,6 @@ test_attach_from_different_pool(struct rte_mempool *pktmbuf_pool,
>  		rte_pktmbuf_free(clone2);
>  	return -1;
>  }
> -#undef GOTO_FAIL
> 
>  /*
>   * test allocation and free of mbufs
> @@ -1122,6 +1122,81 @@ test_tx_offload(void)
>  }
> 
>  static int
> +test_mbuf_dyn(struct rte_mempool *pktmbuf_pool)
> +{
> +	struct rte_mbuf *m = NULL;
> +	int offset, offset2;
> +	int flag, flag2;
> +
> +	offset = rte_mbuf_dynfield_register("test-dynfield", sizeof(uint8_t),
> +					__alignof__(uint8_t), 0);
> +	if (offset == -1)
> +		GOTO_FAIL("failed to register dynamic field, offset=%d: %s",
> +			offset, strerror(errno));
> +
> +	offset2 = rte_mbuf_dynfield_register("test-dynfield", sizeof(uint8_t),
> +					__alignof__(uint8_t), 0);
> +	if (offset2 != offset)
> +		GOTO_FAIL("failed to lookup dynamic field, offset=%d, offset2=%d: %s",
> +			offset, offset2, strerror(errno));
> +
> +	offset2 = rte_mbuf_dynfield_register("test-dynfield2", sizeof(uint16_t),
> +					__alignof__(uint16_t), 0);
> +	if (offset2 == -1 || offset2 == offset || (offset & 1))
> +		GOTO_FAIL("failed to register dynfield field 2, offset=%d, offset2=%d: %s",
> +			offset, offset2, strerror(errno));
> +
> +	printf("offset = %d, offset2 = %d\n", offset, offset2);
> +
> +	offset = rte_mbuf_dynfield_register("test-dynfield-fail", 256, 1, 0);
> +	if (offset != -1)
> +		GOTO_FAIL("dynamic field creation should fail (too big)");
> +
> +	offset = rte_mbuf_dynfield_register("test-dynfield-fail", 1, 3, 0);
> +	if (offset != -1)
> +		GOTO_FAIL("dynamic field creation should fail (bad alignment)");
> +
> +	flag = rte_mbuf_dynflag_register("test-dynflag");
> +	if (flag == -1)
> +		GOTO_FAIL("failed to register dynamic field, flag=%d: %s",
> +			flag, strerror(errno));
> +
> +	flag2 = rte_mbuf_dynflag_register("test-dynflag");
> +	if (flag2 != flag)
> +		GOTO_FAIL("failed to lookup dynamic field, flag=%d, flag2=%d: %s",
> +			flag, flag2, strerror(errno));
> +
> +	flag2 = rte_mbuf_dynflag_register("test-dynflag2");
> +	if (flag2 == -1 || flag2 == flag)
> +		GOTO_FAIL("failed to register dynflag field 2, flag=%d, flag2=%d: %s",
> +			flag, flag2, strerror(errno));
> +
> +	printf("flag = %d, flag2 = %d\n", flag, flag2);
> +
> +	/* set, get dynamic field */
> +	m = rte_pktmbuf_alloc(pktmbuf_pool);
> +	if (m == NULL)
> +		GOTO_FAIL("Cannot allocate mbuf");
> +
> +	*RTE_MBUF_DYNFIELD(m, offset, uint8_t *) = 1;
> +	if (*RTE_MBUF_DYNFIELD(m, offset, uint8_t *) != 1)
> +		GOTO_FAIL("failed to read dynamic field");
> +	*RTE_MBUF_DYNFIELD(m, offset2, uint16_t *) = 1000;
> +	if (*RTE_MBUF_DYNFIELD(m, offset2, uint16_t *) != 1000)
> +		GOTO_FAIL("failed to read dynamic field");
> +
> +	/* set a dynamic flag */
> +	m->ol_flags |= (1ULL << flag);
> +
> +	rte_pktmbuf_free(m);
> +	return 0;
> +fail:
> +	rte_pktmbuf_free(m);
> +	return -1;
> +}
> +#undef GOTO_FAIL
> +
> +static int
>  test_mbuf(void)
>  {
>  	int ret = -1;
> @@ -1140,6 +1215,12 @@ test_mbuf(void)
>  		goto err;
>  	}
> 
> +	/* test registration of dynamic fields and flags */
> +	if (test_mbuf_dyn(pktmbuf_pool) < 0) {
> +		printf("mbuf dynflag test failed\n");
> +		goto err;
> +	}
> +
>  	/* create a specific pktmbuf pool with a priv_size != 0 and no data
>  	 * room size */
>  	pktmbuf_pool2 = rte_pktmbuf_pool_create("test_pktmbuf_pool2",
> diff --git a/lib/librte_mbuf/Makefile b/lib/librte_mbuf/Makefile
> index c8f6d2689..5a9bcee73 100644
> --- a/lib/librte_mbuf/Makefile
> +++ b/lib/librte_mbuf/Makefile
> @@ -17,8 +17,10 @@ LIBABIVER := 5
> 
>  # all source are stored in SRCS-y
>  SRCS-$(CONFIG_RTE_LIBRTE_MBUF) := rte_mbuf.c rte_mbuf_ptype.c rte_mbuf_pool_ops.c
> +SRCS-$(CONFIG_RTE_LIBRTE_MBUF) += rte_mbuf_dyn.c
> 
>  # install includes
>  SYMLINK-$(CONFIG_RTE_LIBRTE_MBUF)-include := rte_mbuf.h rte_mbuf_ptype.h rte_mbuf_pool_ops.h
> +SYMLINK-$(CONFIG_RTE_LIBRTE_MBUF)-include += rte_mbuf_dyn.h
> 
>  include $(RTE_SDK)/mk/rte.lib.mk
> diff --git a/lib/librte_mbuf/meson.build b/lib/librte_mbuf/meson.build
> index 6cc11ebb4..9137e8f26 100644
> --- a/lib/librte_mbuf/meson.build
> +++ b/lib/librte_mbuf/meson.build
> @@ -2,8 +2,10 @@
>  # Copyright(c) 2017 Intel Corporation
> 
>  version = 5
> -sources = files('rte_mbuf.c', 'rte_mbuf_ptype.c', 'rte_mbuf_pool_ops.c')
> -headers = files('rte_mbuf.h', 'rte_mbuf_ptype.h', 'rte_mbuf_pool_ops.h')
> +sources = files('rte_mbuf.c', 'rte_mbuf_ptype.c', 'rte_mbuf_pool_ops.c',
> +	'rte_mbuf_dyn.c')
> +headers = files('rte_mbuf.h', 'rte_mbuf_ptype.h', 'rte_mbuf_pool_ops.h',
> +	'rte_mbuf_dyn.h')
>  deps += ['mempool']
> 
>  allow_experimental_apis = true
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index 98225ec80..ef588cd54 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -198,9 +198,12 @@ extern "C" {
>  #define PKT_RX_OUTER_L4_CKSUM_GOOD	(1ULL << 22)
>  #define PKT_RX_OUTER_L4_CKSUM_INVALID	((1ULL << 21) | (1ULL << 22))
> 
> -/* add new RX flags here */
> +/* add new RX flags here, don't forget to update PKT_FIRST_FREE */
> 
> -/* add new TX flags here */
> +#define PKT_FIRST_FREE (1ULL << 23)
> +#define PKT_LAST_FREE (1ULL << 39)
> +
> +/* add new TX flags here, don't forget to update PKT_LAST_FREE  */
> 
>  /**
>   * Indicate that the metadata field in the mbuf is in use.
> @@ -738,6 +741,8 @@ struct rte_mbuf {
>  	 */
>  	struct rte_mbuf_ext_shared_info *shinfo;
> 
> +	uint64_t dynfield1; /**< Reserved for dynamic fields. */
> +	uint64_t dynfield2; /**< Reserved for dynamic fields. */
>  } __rte_cache_aligned;
> 
>  /**
> @@ -1685,6 +1690,21 @@ rte_pktmbuf_attach_extbuf(struct rte_mbuf *m, void *buf_addr,
>  #define rte_pktmbuf_detach_extbuf(m) rte_pktmbuf_detach(m)
> 
>  /**
> + * Copy dynamic fields from m_src to m_dst.
> + *
> + * @param m_dst
> + *   The destination mbuf.
> + * @param m_src
> + *   The source mbuf.
> + */
> +static inline void
> +rte_mbuf_dynfield_copy(struct rte_mbuf *m_dst, const struct rte_mbuf *m_src)
> +{
> +	m_dst->dynfield1 = m_src->dynfield1;
> +	m_dst->dynfield2 = m_src->dynfield2;
> +}
> +
> +/**
>   * Attach packet mbuf to another packet mbuf.
>   *
>   * If the mbuf we are attaching to isn't a direct buffer and is attached to
> @@ -1732,6 +1752,7 @@ static inline void rte_pktmbuf_attach(struct rte_mbuf *mi, struct rte_mbuf *m)
>  	mi->vlan_tci_outer = m->vlan_tci_outer;
>  	mi->tx_offload = m->tx_offload;
>  	mi->hash = m->hash;
> +	rte_mbuf_dynfield_copy(mi, m);
> 
>  	mi->next = NULL;
>  	mi->pkt_len = mi->data_len;
> diff --git a/lib/librte_mbuf/rte_mbuf_dyn.c b/lib/librte_mbuf/rte_mbuf_dyn.c
> new file mode 100644
> index 000000000..6a96a43da
> --- /dev/null
> +++ b/lib/librte_mbuf/rte_mbuf_dyn.c
> @@ -0,0 +1,373 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright 2019 6WIND S.A.
> + */
> +
> +#include <sys/queue.h>
> +
> +#include <rte_common.h>
> +#include <rte_eal.h>
> +#include <rte_eal_memconfig.h>
> +#include <rte_tailq.h>
> +#include <rte_errno.h>
> +#include <rte_malloc.h>
> +#include <rte_string_fns.h>
> +#include <rte_mbuf.h>
> +#include <rte_mbuf_dyn.h>
> +
> +#define RTE_MBUF_DYN_MZNAME "rte_mbuf_dyn"
> +
> +struct mbuf_dynfield {
> +	TAILQ_ENTRY(mbuf_dynfield) next;
> +	char name[RTE_MBUF_DYN_NAMESIZE];
> +	size_t size;
> +	size_t align;
> +	unsigned int flags;
> +	int offset;
> +};
> +TAILQ_HEAD(mbuf_dynfield_list, rte_tailq_entry);
> +
> +static struct rte_tailq_elem mbuf_dynfield_tailq = {
> +	.name = "RTE_MBUF_DYNFIELD",
> +};
> +EAL_REGISTER_TAILQ(mbuf_dynfield_tailq);
> +
> +struct mbuf_dynflag {
> +	TAILQ_ENTRY(mbuf_dynflag) next;
> +	char name[RTE_MBUF_DYN_NAMESIZE];
> +	int bitnum;
> +};
> +TAILQ_HEAD(mbuf_dynflag_list, rte_tailq_entry);
> +
> +static struct rte_tailq_elem mbuf_dynflag_tailq = {
> +	.name = "RTE_MBUF_DYNFLAG",
> +};
> +EAL_REGISTER_TAILQ(mbuf_dynflag_tailq);
> +
> +struct mbuf_dyn_shm {
> +	/** For each mbuf byte, free_space[i] == 1 if space is free. */
> +	uint8_t free_space[sizeof(struct rte_mbuf)];
> +	/** Bitfield of available flags. */
> +	uint64_t free_flags;
> +};
> +static struct mbuf_dyn_shm *shm;
> +
> +/* allocate and initialize the shared memory */
> +static int
> +init_shared_mem(void)
> +{
> +	const struct rte_memzone *mz;
> +	uint64_t mask;
> +
> +	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
> +		mz = rte_memzone_reserve_aligned(RTE_MBUF_DYN_MZNAME,
> +						sizeof(struct mbuf_dyn_shm),
> +						SOCKET_ID_ANY, 0,
> +						RTE_CACHE_LINE_SIZE);
> +	} else {
> +		mz = rte_memzone_lookup(RTE_MBUF_DYN_MZNAME);
> +	}
> +	if (mz == NULL)
> +		return -1;
> +
> +	shm = mz->addr;
> +
> +#define mark_free(field)						\
> +	memset(&shm->free_space[offsetof(struct rte_mbuf, field)],	\
> +		0xff, sizeof(((struct rte_mbuf *)0)->field))
> +
> +	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
> +		/* init free_space, keep it sync'd with
> +		 * rte_mbuf_dynfield_copy().
> +		 */
> +		memset(shm, 0, sizeof(*shm));
> +		mark_free(dynfield1);
> +		mark_free(dynfield2);
> +
> +		/* init free_flags */
> +		for (mask = PKT_FIRST_FREE; mask <= PKT_LAST_FREE; mask <<= 1)
> +			shm->free_flags |= mask;
> +	}
> +#undef mark_free
> +
> +	return 0;
> +}
> +
> +/* check if this offset can be used */
> +static int
> +check_offset(size_t offset, size_t size, size_t align, unsigned int flags)
> +{
> +	size_t i;
> +
> +	(void)flags;
> +
> +	if ((offset & (align - 1)) != 0)
> +		return -1;
> +	if (offset + size > sizeof(struct rte_mbuf))
> +		return -1;
> +
> +	for (i = 0; i < size; i++) {
> +		if (!shm->free_space[i + offset])
> +			return -1;
> +	}
> +
> +	return 0;
> +}
> +
> +/* assume tailq is locked */
> +static struct mbuf_dynfield *
> +__mbuf_dynfield_lookup(const char *name)
> +{
> +	struct mbuf_dynfield_list *mbuf_dynfield_list;
> +	struct mbuf_dynfield *mbuf_dynfield;
> +	struct rte_tailq_entry *te;
> +
> +	mbuf_dynfield_list = RTE_TAILQ_CAST(
> +		mbuf_dynfield_tailq.head, mbuf_dynfield_list);
> +
> +	TAILQ_FOREACH(te, mbuf_dynfield_list, next) {
> +		mbuf_dynfield = (struct mbuf_dynfield *)te->data;
> +		if (strncmp(name, mbuf_dynfield->name,
> +				RTE_MBUF_DYN_NAMESIZE) == 0)
> +			break;
> +	}
> +
> +	if (te == NULL) {
> +		rte_errno = ENOENT;
> +		return NULL;
> +	}
> +
> +	return mbuf_dynfield;
> +}
> +
> +int
> +rte_mbuf_dynfield_lookup(const char *name, size_t *size, size_t *align)
> +{
> +	struct mbuf_dynfield *mbuf_dynfield;
> +
> +	if (shm == NULL) {
> +		rte_errno = ENOENT;
> +		return -1;
> +	}
> +
> +	rte_mcfg_tailq_read_lock();
> +	mbuf_dynfield = __mbuf_dynfield_lookup(name);
> +	rte_mcfg_tailq_read_unlock();
> +
> +	if (mbuf_dynfield == NULL) {
> +		rte_errno = ENOENT;
> +		return -1;
> +	}
> +
> +	if (size != NULL)
> +		*size = mbuf_dynfield->size;
> +	if (align != NULL)
> +		*align = mbuf_dynfield->align;
> +
> +	return mbuf_dynfield->offset;
> +}
> +
> +int
> +rte_mbuf_dynfield_register(const char *name, size_t size, size_t align,
> +			unsigned int flags)
> +{
> +	struct mbuf_dynfield_list *mbuf_dynfield_list;
> +	struct mbuf_dynfield *mbuf_dynfield = NULL;
> +	struct rte_tailq_entry *te = NULL;
> +	int offset, ret;
> +	size_t i;
> +
> +	if (shm == NULL && init_shared_mem() < 0)
> +		goto fail;
> +	if (size >= sizeof(struct rte_mbuf)) {
> +		rte_errno = EINVAL;
> +		goto fail;
> +	}
> +	if (!rte_is_power_of_2(align)) {
> +		rte_errno = EINVAL;
> +		goto fail;
> +	}
> +
> +	rte_mcfg_tailq_write_lock();
> +
> +	mbuf_dynfield = __mbuf_dynfield_lookup(name);
> +	if (mbuf_dynfield != NULL) {
> +		if (mbuf_dynfield->size != size ||
> +				mbuf_dynfield->align != align ||
> +				mbuf_dynfield->flags != flags) {
> +			rte_errno = EEXIST;
> +			goto fail_unlock;
> +		}
> +		offset = mbuf_dynfield->offset;
> +		goto out_unlock;
> +	}
> +
> +	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
> +		rte_errno = EPERM;
> +		goto fail_unlock;
> +	}
> +
> +	for (offset = 0;
> +	     offset < (int)sizeof(struct rte_mbuf);
> +	     offset++) {
> +		if (check_offset(offset, size, align, flags) == 0)
> +			break;
> +	}
> +
> +	if (offset == sizeof(struct rte_mbuf)) {
> +		rte_errno = ENOENT;
> +		goto fail_unlock;
> +	}
> +
> +	mbuf_dynfield_list = RTE_TAILQ_CAST(
> +		mbuf_dynfield_tailq.head, mbuf_dynfield_list);
> +
> +	te = rte_zmalloc("MBUF_DYNFIELD_TAILQ_ENTRY", sizeof(*te), 0);
> +	if (te == NULL)
> +		goto fail_unlock;
> +
> +	mbuf_dynfield = rte_zmalloc("mbuf_dynfield", sizeof(*mbuf_dynfield), 0);
> +	if (mbuf_dynfield == NULL)
> +		goto fail_unlock;
> +
> +	ret = strlcpy(mbuf_dynfield->name, name, sizeof(mbuf_dynfield->name));
> +	if (ret < 0 || ret >= (int)sizeof(mbuf_dynfield->name)) {
> +		rte_errno = ENAMETOOLONG;
> +		goto fail_unlock;
> +	}
> +	mbuf_dynfield->size = size;
> +	mbuf_dynfield->align = align;
> +	mbuf_dynfield->flags = flags;
> +	mbuf_dynfield->offset = offset;
> +	te->data = mbuf_dynfield;
> +
> +	TAILQ_INSERT_TAIL(mbuf_dynfield_list, te, next);
> +
> +	for (i = offset; i < offset + size; i++)
> +		shm->free_space[i] = 0;
> +
> +out_unlock:
> +	rte_mcfg_tailq_write_unlock();
> +
> +	return offset;
> +
> +fail_unlock:
> +	rte_mcfg_tailq_write_unlock();
> +fail:
> +	rte_free(mbuf_dynfield);
> +	rte_free(te);
> +	return -1;
> +}
> +
> +/* assume tailq is locked */
> +static struct mbuf_dynflag *
> +__mbuf_dynflag_lookup(const char *name)
> +{
> +	struct mbuf_dynflag_list *mbuf_dynflag_list;
> +	struct mbuf_dynflag *mbuf_dynflag;
> +	struct rte_tailq_entry *te;
> +
> +	mbuf_dynflag_list = RTE_TAILQ_CAST(
> +		mbuf_dynflag_tailq.head, mbuf_dynflag_list);
> +
> +	TAILQ_FOREACH(te, mbuf_dynflag_list, next) {
> +		mbuf_dynflag = (struct mbuf_dynflag *)te->data;
> +		if (strncmp(name, mbuf_dynflag->name,
> +				RTE_MBUF_DYN_NAMESIZE) == 0)
> +			break;
> +	}
> +
> +	if (te == NULL) {
> +		rte_errno = ENOENT;
> +		return NULL;
> +	}
> +
> +	return mbuf_dynflag;
> +}
> +
> +int
> +rte_mbuf_dynflag_lookup(const char *name)
> +{
> +	struct mbuf_dynflag *mbuf_dynflag;
> +
> +	if (shm == NULL) {
> +		rte_errno = ENOENT;
> +		return -1;
> +	}
> +
> +	rte_mcfg_tailq_read_lock();
> +	mbuf_dynflag = __mbuf_dynflag_lookup(name);
> +	rte_mcfg_tailq_read_unlock();
> +
> +	if (mbuf_dynflag == NULL) {
> +		rte_errno = ENOENT;
> +		return -1;
> +	}
> +
> +	return mbuf_dynflag->bitnum;
> +}
> +
> +int
> +rte_mbuf_dynflag_register(const char *name)
> +{
> +	struct mbuf_dynflag_list *mbuf_dynflag_list;
> +	struct mbuf_dynflag *mbuf_dynflag = NULL;
> +	struct rte_tailq_entry *te = NULL;
> +	int bitnum, ret;
> +
> +	if (shm == NULL && init_shared_mem() < 0)
> +		goto fail;
> +
> +	rte_mcfg_tailq_write_lock();
> +
> +	mbuf_dynflag = __mbuf_dynflag_lookup(name);
> +	if (mbuf_dynflag != NULL) {
> +		bitnum = mbuf_dynflag->bitnum;
> +		goto out_unlock;
> +	}
> +
> +	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
> +		rte_errno = EPERM;
> +		goto fail_unlock;
> +	}
> +
> +	if (shm->free_flags == 0) {
> +		rte_errno = ENOENT;
> +		goto fail_unlock;
> +	}
> +	bitnum = rte_bsf64(shm->free_flags);
> +
> +	mbuf_dynflag_list = RTE_TAILQ_CAST(
> +		mbuf_dynflag_tailq.head, mbuf_dynflag_list);
> +
> +	te = rte_zmalloc("MBUF_DYNFLAG_TAILQ_ENTRY", sizeof(*te), 0);
> +	if (te == NULL)
> +		goto fail_unlock;
> +
> +	mbuf_dynflag = rte_zmalloc("mbuf_dynflag", sizeof(*mbuf_dynflag), 0);
> +	if (mbuf_dynflag == NULL)
> +		goto fail_unlock;
> +
> +	ret = strlcpy(mbuf_dynflag->name, name, sizeof(mbuf_dynflag->name));
> +	if (ret < 0 || ret >= (int)sizeof(mbuf_dynflag->name)) {
> +		rte_errno = ENAMETOOLONG;
> +		goto fail_unlock;
> +	}
> +	mbuf_dynflag->bitnum = bitnum;
> +	te->data = mbuf_dynflag;
> +
> +	TAILQ_INSERT_TAIL(mbuf_dynflag_list, te, next);
> +
> +	shm->free_flags &= ~(1ULL << bitnum);
> +
> +out_unlock:
> +	rte_mcfg_tailq_write_unlock();
> +
> +	return bitnum;
> +
> +fail_unlock:
> +	rte_mcfg_tailq_write_unlock();
> +fail:
> +	rte_free(mbuf_dynflag);
> +	rte_free(te);
> +	return -1;
> +}
> diff --git a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
> new file mode 100644
> index 000000000..a86986a0f
> --- /dev/null
> +++ b/lib/librte_mbuf/rte_mbuf_dyn.h
> @@ -0,0 +1,119 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright 2019 6WIND S.A.
> + */
> +
> +#ifndef _RTE_MBUF_DYN_H_
> +#define _RTE_MBUF_DYN_H_
> +
> +/**
> + * @file
> + * RTE Mbuf dynamic fields and flags
> + *
> + * Many features require to store data inside the mbuf. As the room in
> + * mbuf structure is limited, it is not possible to have a field for
> + * each feature. Also, changing fields in the mbuf structure can break
> + * the API or ABI.
> + *
> + * This module addresses this issue, by enabling the dynamic
> + * registration of fields or flags:
> + *
> + * - a dynamic field is a named area in the rte_mbuf structure, with a
> + *   given size (>= 1 byte) and alignment constraint.
> + * - a dynamic flag is a named bit in the rte_mbuf structure.
> + *
> + * The typical use case is a PMD that registers space for an offload
> + * feature, when the application requests to enable this feature.  As
> + * the space in mbuf is limited, the space should only be reserved if it
> + * is going to be used (i.e when the application explicitly asks for it).
> + *
> + * The registration can be done at any moment, but it is not possible
> + * to unregister fields or flags for now.
> + *
> + * Example of use:
> + *
> + * - RTE_MBUF_DYN_<feature>_(ID|SIZE|ALIGN) are defined in this file

Does it means that all PMDs define their own 'RTE_MBUF_DYN_<feature>_(ID|SIZE|ALIGN)'
here ? In other words, each PMD can expose its private DYN_<feature> here for public
using ?

How about adding another eth_dev_ops API definitions to show the PMD's supporting feature
names, sizes, align in run time for testpmd ? And also another eth_dev_ops API for showing
the data saved in rte_mbuf by 'dump_pkt_burst' ? Adding a new command for testpmd to set
the dynamic feature may be good for PMD test.

> + * - If the application asks for the feature, the PMD use

How does the application ask for the feature ? By ' rte_mbuf_dynfield_register()' ?

> + *   rte_mbuf_dynfield_register() to get the dynamic offset and stores
> + *   in a global variable.

In case, the PMD calls 'rte_mbuf_dynfield_register()' for 'dyn_feature' firstly, this
means that PMD requests the dynamic feature itself if I understand correctly. Should
PMD calls 'rte_mbuf_dynfield_lookup' for 'dyn_feature' to query the name exists, the
size and align are right as expected ? If exists, but size and align are not right, may
be for PMD change its definition, then PMD can give a warning or error message. If name
exists, both size and align are expected, then PMD think that the application request
the right dynamic features.

> + * - The application also calls rte_mbuf_dynfield_register() to get the
> + *   dynamic offset and stores it in a global variable.
> + * - When the field must be used by the PMD or the application, they
> + *   use the RTE_MBUF_DYNFIELD() helper.
> + */
> +
> +struct rte_mbuf;
> +
> +/**
> + * Register space for a dynamic field in the mbuf structure.
> + *
> + * @param name
> + *   A string identifying the dynamic field. External applications or
> + *   libraries must not define identifers prefixed with "rte_", which
> + *   are reserved for standard features.
> + * @param size
> + *   The number of bytes to reserve.
> + * @param align
> + *   The alignment constraint, which must be a power of 2.
> + * @param flags
> + *   Reserved for future use.
> + * @return
> + *   The offset in the mbuf structure, or -1 on error (rte_errno is set).
> + */
> +__rte_experimental
> +int rte_mbuf_dynfield_register(const char *name, size_t size, size_t align,
> +			unsigned int flags);
> +
> +/**
> + * Lookup for a registered dynamic mbuf field.
> + *
> + * @param name
> + *   A string identifying the dynamic field.
> + * @param size
> + *   If not NULL, the number of reserved bytes for this field is stored
> + *   at this address.
> + * @param align
> + *   If not NULL, the alignement constraint for this field is stored
> + *   at this address.
> + * @return
> + *   The offset of this field in the mbuf structure, or -1 on error
> + *   (rte_errno is set).
> + */
> +__rte_experimental
> +int rte_mbuf_dynfield_lookup(const char *name, size_t *size, size_t *align);
> +
> +/**
> + * Register a dynamic flag in the mbuf structure.
> + *
> + * @param name
> + *   A string identifying the dynamic flag. External applications or
> + *   libraries must not define identifers prefixed with "rte_", which
> + *   are reserved for standard features.
> + * @return
> + *   The number of the reserved bit, or -1 on error (rte_errno is set).
> + */
> +__rte_experimental
> +int rte_mbuf_dynflag_register(const char *name);
> +
> +/**
> + * Lookup for a registered dynamic mbuf flag.
> + *
> + * @param name
> + *   A string identifying the dynamic flag.
> + * @return
> + *   The offset of this flag in the mbuf structure, or -1 on error
> + *   (rte_errno is set).
> + */
> +__rte_experimental
> +int rte_mbuf_dynflag_lookup(const char *name);
> +
> +/**
> + * Helper macro to access to a dynamic field.
> + */
> +#define RTE_MBUF_DYNFIELD(m, offset, type) ((type)((char *)(m) + (offset)))
> +
> +/**
> + * Maximum length of the dynamic field or flag string.
> + */
> +#define RTE_MBUF_DYN_NAMESIZE 32
> +
> +#endif
> diff --git a/lib/librte_mbuf/rte_mbuf_version.map b/lib/librte_mbuf/rte_mbuf_version.map
> index 2662a37bf..a98310570 100644
> --- a/lib/librte_mbuf/rte_mbuf_version.map
> +++ b/lib/librte_mbuf/rte_mbuf_version.map
> @@ -50,4 +50,8 @@ EXPERIMENTAL {
>  	global:
> 
>  	rte_mbuf_check;
> +	rte_mbuf_dynfield_lookup;
> +	rte_mbuf_dynfield_register;
> +	rte_mbuf_dynflag_lookup;
> +	rte_mbuf_dynflag_register;
>  } DPDK_18.08;
> --
> 2.11.0
Stephen Hemminger July 10, 2019, 5:49 p.m. | #2
On Wed, 10 Jul 2019 11:29:07 +0200
Olivier Matz <olivier.matz@6wind.com> wrote:

>  /**
>   * Indicate that the metadata field in the mbuf is in use.
> @@ -738,6 +741,8 @@ struct rte_mbuf {
>  	 */
>  	struct rte_mbuf_ext_shared_info *shinfo;
>  
> +	uint64_t dynfield1; /**< Reserved for dynamic fields. */
> +	uint64_t dynfield2; /**< Reserved for dynamic fields. */
>  } __rte_cache_aligned;

Growing mbuf is a fundamental ABI break and this needs
higher level approval.  Why not one pointer?

It looks like you are creating something like FreeBSD m_tag.
Why not use that?
Wiles, Keith July 10, 2019, 6:12 p.m. | #3
> On Jul 10, 2019, at 12:49 PM, Stephen Hemminger <stephen@networkplumber.org> wrote:
> 
> On Wed, 10 Jul 2019 11:29:07 +0200
> Olivier Matz <olivier.matz@6wind.com> wrote:
> 
>> /**
>>  * Indicate that the metadata field in the mbuf is in use.
>> @@ -738,6 +741,8 @@ struct rte_mbuf {
>> 	 */
>> 	struct rte_mbuf_ext_shared_info *shinfo;
>> 
>> +	uint64_t dynfield1; /**< Reserved for dynamic fields. */
>> +	uint64_t dynfield2; /**< Reserved for dynamic fields. */
>> } __rte_cache_aligned;
> 
> Growing mbuf is a fundamental ABI break and this needs
> higher level approval.  Why not one pointer?
> 
> It looks like you are creating something like FreeBSD m_tag.
> Why not use that?

Changing the mbuf structure causes a big problem for a number reasons as Stephen states.

If we leave the mbuf stucture alone and add this feature to the headroom space between the mbuf structure and the packet. When setting up the mempool/mbuf pool we define a headroom to hold the extra data when the mbuf pool is created or just use the current headroom space. Using this method we can eliminate the mbuf structure change and add the data to the packet buffer. We can do away with dynfield1 and 2 as we know where headroom space begins and ends. Just a thought.

Regards,
Keith
Olivier Matz July 11, 2019, 7:26 a.m. | #4
Hi,

On Wed, Jul 10, 2019 at 05:14:33PM +0000, Wang, Haiyue wrote:
> Hi,
> 
> Sounds cool, just have some questions inline.
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
> > Sent: Wednesday, July 10, 2019 17:29
> > To: dev@dpdk.org
> > Subject: [dpdk-dev] [RFC] mbuf: support dynamic fields and flags
> > 
> > Many features require to store data inside the mbuf. As the room in mbuf
> > structure is limited, it is not possible to have a field for each
> > feature. Also, changing fields in the mbuf structure can break the API
> > or ABI.
> > 
> > This commit addresses these issues, by enabling the dynamic registration
> > of fields or flags:
> > 
> > - a dynamic field is a named area in the rte_mbuf structure, with a
> >   given size (>= 1 byte) and alignment constraint.
> > - a dynamic flag is a named bit in the rte_mbuf structure.
> > 
> > The typical use case is a PMD that registers space for an offload
> > feature, when the application requests to enable this feature.  As
> > the space in mbuf is limited, the space should only be reserved if it
> > is going to be used (i.e when the application explicitly asks for it).
> > 
> > The registration can be done at any moment, but it is not possible
> > to unregister fields or flags for now.
> > 
> > Signed-off-by: Olivier Matz <olivier.matz@6wind.com>

(...)

> > +/**
> > + * @file
> > + * RTE Mbuf dynamic fields and flags
> > + *
> > + * Many features require to store data inside the mbuf. As the room in
> > + * mbuf structure is limited, it is not possible to have a field for
> > + * each feature. Also, changing fields in the mbuf structure can break
> > + * the API or ABI.
> > + *
> > + * This module addresses this issue, by enabling the dynamic
> > + * registration of fields or flags:
> > + *
> > + * - a dynamic field is a named area in the rte_mbuf structure, with a
> > + *   given size (>= 1 byte) and alignment constraint.
> > + * - a dynamic flag is a named bit in the rte_mbuf structure.
> > + *
> > + * The typical use case is a PMD that registers space for an offload
> > + * feature, when the application requests to enable this feature.  As
> > + * the space in mbuf is limited, the space should only be reserved if it
> > + * is going to be used (i.e when the application explicitly asks for it).
> > + *
> > + * The registration can be done at any moment, but it is not possible
> > + * to unregister fields or flags for now.
> > + *
> > + * Example of use:
> > + *
> > + * - RTE_MBUF_DYN_<feature>_(ID|SIZE|ALIGN) are defined in this file
> 
> Does it means that all PMDs define their own 'RTE_MBUF_DYN_<feature>_(ID|SIZE|ALIGN)'
> here ? In other words, each PMD can expose its private DYN_<feature> here for public
> using ?

For generic fields, I think they should be declared in this file. For
instance, if we decide to replace the current m->timestamp field by a
dynamic field, we should add like this:

#define RTE_MBUF_DYN_TIMESTAMP_ID "rte_timestamp"
#define RTE_MBUF_DYN_TIMESTAMP_SIZE sizeof(uint64_t)
#define RTE_MBUF_DYN_TIMESTAMP_ALIGN __alignof__(uint64_t)

If the feature is PMD-specific, the defines could be exposed in a
PMD header.

> How about adding another eth_dev_ops API definitions to show the PMD's supporting feature
> names, sizes, align in run time for testpmd ? And also another eth_dev_ops API for showing
> the data saved in rte_mbuf by 'dump_pkt_burst' ? Adding a new command for testpmd to set
> the dynamic feature may be good for PMD test.
> 
> > + * - If the application asks for the feature, the PMD use
> 
> How does the application ask for the feature ? By ' rte_mbuf_dynfield_register()' ?

No change in this area. If we take again the timestamp example, the
feature is asked by the application through the ethdev layer by passing
DEV_RX_OFFLOAD_TIMESTAMP to port or queue configuration.

> 
> > + *   rte_mbuf_dynfield_register() to get the dynamic offset and stores
> > + *   in a global variable.
> 
> In case, the PMD calls 'rte_mbuf_dynfield_register()' for 'dyn_feature' firstly, this
> means that PMD requests the dynamic feature itself if I understand correctly. Should
> PMD calls 'rte_mbuf_dynfield_lookup' for 'dyn_feature' to query the name exists, the
> size and align are right as expected ? If exists, but size and align are not right, may
> be for PMD change its definition, then PMD can give a warning or error message. If name
> exists, both size and align are expected, then PMD think that the application request
> the right dynamic features.

The PMD should only call rte_mbuf_dynfield_register() if the application
requests the feature (through ethdev, or through another mean if it's a
PMD-specific feature). The goal is to only reserve the area in the mbuf
for features that are actually needed.

Hope this is clearer now. I think I need to enhance the documentation in
next version ;)

Thanks for the feedback.
Olivier Matz July 11, 2019, 7:36 a.m. | #5
On Wed, Jul 10, 2019 at 10:49:17AM -0700, Stephen Hemminger wrote:
> On Wed, 10 Jul 2019 11:29:07 +0200
> Olivier Matz <olivier.matz@6wind.com> wrote:
> 
> >  /**
> >   * Indicate that the metadata field in the mbuf is in use.
> > @@ -738,6 +741,8 @@ struct rte_mbuf {
> >  	 */
> >  	struct rte_mbuf_ext_shared_info *shinfo;
> >  
> > +	uint64_t dynfield1; /**< Reserved for dynamic fields. */
> > +	uint64_t dynfield2; /**< Reserved for dynamic fields. */
> >  } __rte_cache_aligned;
> 
> Growing mbuf is a fundamental ABI break and this needs
> higher level approval.

The size of the mbuf is still 128, I used the last 16 bytes that
were unused.

Later, we can think about removing existing fields and replace
them by a dynfield area, which can be anywhere in the structure
(even if it is in a 1 byte hole).

>  Why not one pointer?

A pointer to what?

> It looks like you are creating something like FreeBSD m_tag.
> Why not use that?

My implementation targets performance (accessing to *(mbuf + offset)
should be nearly as fast as accessing to a static field), at the price
of less flexibility compared to something like FreeBSD m_tag.
Olivier Matz July 11, 2019, 7:53 a.m. | #6
Hi Keith,

On Wed, Jul 10, 2019 at 06:12:16PM +0000, Wiles, Keith wrote:
> 
> 
> > On Jul 10, 2019, at 12:49 PM, Stephen Hemminger <stephen@networkplumber.org> wrote:
> > 
> > On Wed, 10 Jul 2019 11:29:07 +0200
> > Olivier Matz <olivier.matz@6wind.com> wrote:
> > 
> >> /**
> >>  * Indicate that the metadata field in the mbuf is in use.
> >> @@ -738,6 +741,8 @@ struct rte_mbuf {
> >> 	 */
> >> 	struct rte_mbuf_ext_shared_info *shinfo;
> >> 
> >> +	uint64_t dynfield1; /**< Reserved for dynamic fields. */
> >> +	uint64_t dynfield2; /**< Reserved for dynamic fields. */
> >> } __rte_cache_aligned;
> > 
> > Growing mbuf is a fundamental ABI break and this needs
> > higher level approval.  Why not one pointer?
> > 
> > It looks like you are creating something like FreeBSD m_tag.
> > Why not use that?
> 
> Changing the mbuf structure causes a big problem for a number reasons as Stephen states.

Can you elaborate?

This is indeed an ABI break, but I think this is only due to the adding
of rte_mbuf_dynfield_copy() in rte_pktmbuf_attach(). The size of the
mbuf does not change and the fields are not initialized when creating a
new mbuf. So I think there is no ABI change for code that is not using
rte_pktmbuf_attach().

I don't think it's a problem to have one ABI change, if it avoids many
others in the future.

> If we leave the mbuf stucture alone and add this feature to the
> headroom space between the mbuf structure and the packet. When setting
> up the mempool/mbuf pool we define a headroom to hold the extra data
> when the mbuf pool is created or just use the current headroom
> space. Using this method we can eliminate the mbuf structure change
> and add the data to the packet buffer. We can do away with dynfield1
> and 2 as we know where headroom space begins and ends. Just a thought.

The size of the mbuf metadata (between the mbuf structure and the
buffer) is configured per pool, so it can be different accross
mbufs. So, the access to the dynamic field would be slower:
*(mbuf + dynfield_offset + metadata_size(mbuf))

Also, the size of the data buffer can be 0: it happens for mbuf pools
that are dedicated to mbuf clones (that reference data in another mbuf
or in an external buffer). In this case, there is no room after metadata
to store the dynamic fields.

Thanks,
Olivier
Haiyue Wang July 11, 2019, 8:04 a.m. | #7
> -----Original Message-----
> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> Sent: Thursday, July 11, 2019 15:26
> To: Wang, Haiyue <haiyue.wang@intel.com>
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [RFC] mbuf: support dynamic fields and flags
> 
> Hi,
> 
> On Wed, Jul 10, 2019 at 05:14:33PM +0000, Wang, Haiyue wrote:
> > Hi,
> >
> > Sounds cool, just have some questions inline.
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
> > > Sent: Wednesday, July 10, 2019 17:29
> > > To: dev@dpdk.org
> > > Subject: [dpdk-dev] [RFC] mbuf: support dynamic fields and flags
> > >
> > > Many features require to store data inside the mbuf. As the room in mbuf
> > > structure is limited, it is not possible to have a field for each
> > > feature. Also, changing fields in the mbuf structure can break the API
> > > or ABI.
> > >
> > > This commit addresses these issues, by enabling the dynamic registration
> > > of fields or flags:
> > >
> > > - a dynamic field is a named area in the rte_mbuf structure, with a
> > >   given size (>= 1 byte) and alignment constraint.
> > > - a dynamic flag is a named bit in the rte_mbuf structure.
> > >
> > > The typical use case is a PMD that registers space for an offload
> > > feature, when the application requests to enable this feature.  As
> > > the space in mbuf is limited, the space should only be reserved if it
> > > is going to be used (i.e when the application explicitly asks for it).
> > >
> > > The registration can be done at any moment, but it is not possible
> > > to unregister fields or flags for now.
> > >
> > > Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> 
> (...)
> 
> > > +/**
> > > + * @file
> > > + * RTE Mbuf dynamic fields and flags
> > > + *
> > > + * Many features require to store data inside the mbuf. As the room in
> > > + * mbuf structure is limited, it is not possible to have a field for
> > > + * each feature. Also, changing fields in the mbuf structure can break
> > > + * the API or ABI.
> > > + *
> > > + * This module addresses this issue, by enabling the dynamic
> > > + * registration of fields or flags:
> > > + *
> > > + * - a dynamic field is a named area in the rte_mbuf structure, with a
> > > + *   given size (>= 1 byte) and alignment constraint.
> > > + * - a dynamic flag is a named bit in the rte_mbuf structure.
> > > + *
> > > + * The typical use case is a PMD that registers space for an offload
> > > + * feature, when the application requests to enable this feature.  As
> > > + * the space in mbuf is limited, the space should only be reserved if it
> > > + * is going to be used (i.e when the application explicitly asks for it).
> > > + *
> > > + * The registration can be done at any moment, but it is not possible
> > > + * to unregister fields or flags for now.
> > > + *
> > > + * Example of use:
> > > + *
> > > + * - RTE_MBUF_DYN_<feature>_(ID|SIZE|ALIGN) are defined in this file
> >
> > Does it means that all PMDs define their own 'RTE_MBUF_DYN_<feature>_(ID|SIZE|ALIGN)'
> > here ? In other words, each PMD can expose its private DYN_<feature> here for public
> > using ?
> 
> For generic fields, I think they should be declared in this file. For
> instance, if we decide to replace the current m->timestamp field by a
> dynamic field, we should add like this:
> 
> #define RTE_MBUF_DYN_TIMESTAMP_ID "rte_timestamp"
> #define RTE_MBUF_DYN_TIMESTAMP_SIZE sizeof(uint64_t)
> #define RTE_MBUF_DYN_TIMESTAMP_ALIGN __alignof__(uint64_t)
> 
> If the feature is PMD-specific, the defines could be exposed in a
> PMD header.
> 

Now, understand the comments a little : ... must not define identifers prefixed with "rte_",
which are reserved for standard features. Seems have big plan ?

> > How about adding another eth_dev_ops API definitions to show the PMD's supporting feature
> > names, sizes, align in run time for testpmd ? And also another eth_dev_ops API for showing
> > the data saved in rte_mbuf by 'dump_pkt_burst' ? Adding a new command for testpmd to set
> > the dynamic feature may be good for PMD test.
> >
> > > + * - If the application asks for the feature, the PMD use
> >
> > How does the application ask for the feature ? By ' rte_mbuf_dynfield_register()' ?
> 
> No change in this area. If we take again the timestamp example, the
> feature is asked by the application through the ethdev layer by passing
> DEV_RX_OFFLOAD_TIMESTAMP to port or queue configuration.
> 
> >
> > > + *   rte_mbuf_dynfield_register() to get the dynamic offset and stores
> > > + *   in a global variable.
> >
> > In case, the PMD calls 'rte_mbuf_dynfield_register()' for 'dyn_feature' firstly, this
> > means that PMD requests the dynamic feature itself if I understand correctly. Should
> > PMD calls 'rte_mbuf_dynfield_lookup' for 'dyn_feature' to query the name exists, the
> > size and align are right as expected ? If exists, but size and align are not right, may
> > be for PMD change its definition, then PMD can give a warning or error message. If name
> > exists, both size and align are expected, then PMD think that the application request
> > the right dynamic features.
> 
> The PMD should only call rte_mbuf_dynfield_register() if the application
> requests the feature (through ethdev, or through another mean if it's a
> PMD-specific feature). The goal is to only reserve the area in the mbuf
> for features that are actually needed.
> 
> Hope this is clearer now. I think I need to enhance the documentation in
> next version ;)
> 

Clearer now, more test code also will be better for fully understanding, thanks! :)

> Thanks for the feedback.
Olivier Matz July 11, 2019, 8:20 a.m. | #8
On Thu, Jul 11, 2019 at 08:04:00AM +0000, Wang, Haiyue wrote:
> > -----Original Message-----
> > From: Olivier Matz [mailto:olivier.matz@6wind.com]
> > Sent: Thursday, July 11, 2019 15:26
> > To: Wang, Haiyue <haiyue.wang@intel.com>
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [RFC] mbuf: support dynamic fields and flags
> > 
> > Hi,
> > 
> > On Wed, Jul 10, 2019 at 05:14:33PM +0000, Wang, Haiyue wrote:
> > > Hi,
> > >
> > > Sounds cool, just have some questions inline.
> > >
> > > > -----Original Message-----
> > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
> > > > Sent: Wednesday, July 10, 2019 17:29
> > > > To: dev@dpdk.org
> > > > Subject: [dpdk-dev] [RFC] mbuf: support dynamic fields and flags
> > > >
> > > > Many features require to store data inside the mbuf. As the room in mbuf
> > > > structure is limited, it is not possible to have a field for each
> > > > feature. Also, changing fields in the mbuf structure can break the API
> > > > or ABI.
> > > >
> > > > This commit addresses these issues, by enabling the dynamic registration
> > > > of fields or flags:
> > > >
> > > > - a dynamic field is a named area in the rte_mbuf structure, with a
> > > >   given size (>= 1 byte) and alignment constraint.
> > > > - a dynamic flag is a named bit in the rte_mbuf structure.
> > > >
> > > > The typical use case is a PMD that registers space for an offload
> > > > feature, when the application requests to enable this feature.  As
> > > > the space in mbuf is limited, the space should only be reserved if it
> > > > is going to be used (i.e when the application explicitly asks for it).
> > > >
> > > > The registration can be done at any moment, but it is not possible
> > > > to unregister fields or flags for now.
> > > >
> > > > Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > 
> > (...)
> > 
> > > > +/**
> > > > + * @file
> > > > + * RTE Mbuf dynamic fields and flags
> > > > + *
> > > > + * Many features require to store data inside the mbuf. As the room in
> > > > + * mbuf structure is limited, it is not possible to have a field for
> > > > + * each feature. Also, changing fields in the mbuf structure can break
> > > > + * the API or ABI.
> > > > + *
> > > > + * This module addresses this issue, by enabling the dynamic
> > > > + * registration of fields or flags:
> > > > + *
> > > > + * - a dynamic field is a named area in the rte_mbuf structure, with a
> > > > + *   given size (>= 1 byte) and alignment constraint.
> > > > + * - a dynamic flag is a named bit in the rte_mbuf structure.
> > > > + *
> > > > + * The typical use case is a PMD that registers space for an offload
> > > > + * feature, when the application requests to enable this feature.  As
> > > > + * the space in mbuf is limited, the space should only be reserved if it
> > > > + * is going to be used (i.e when the application explicitly asks for it).
> > > > + *
> > > > + * The registration can be done at any moment, but it is not possible
> > > > + * to unregister fields or flags for now.
> > > > + *
> > > > + * Example of use:
> > > > + *
> > > > + * - RTE_MBUF_DYN_<feature>_(ID|SIZE|ALIGN) are defined in this file
> > >
> > > Does it means that all PMDs define their own 'RTE_MBUF_DYN_<feature>_(ID|SIZE|ALIGN)'
> > > here ? In other words, each PMD can expose its private DYN_<feature> here for public
> > > using ?
> > 
> > For generic fields, I think they should be declared in this file. For
> > instance, if we decide to replace the current m->timestamp field by a
> > dynamic field, we should add like this:
> > 
> > #define RTE_MBUF_DYN_TIMESTAMP_ID "rte_timestamp"
> > #define RTE_MBUF_DYN_TIMESTAMP_SIZE sizeof(uint64_t)
> > #define RTE_MBUF_DYN_TIMESTAMP_ALIGN __alignof__(uint64_t)
> > 
> > If the feature is PMD-specific, the defines could be exposed in a
> > PMD header.
> > 
> 
> Now, understand the comments a little : ... must not define identifers prefixed with "rte_",
> which are reserved for standard features. Seems have big plan ?

The dynamic field can also be used by an external application or by an
external library. For instance, a field to tag a packet, like skb->mark
in linux. In this case, id, size and alignment would be defined outside
dpdk subtree.

To avoid name conflicts, I think we should define a convention for
identifiers, so they are in different namespaces:

- "rte_*" for identifiers declared inside dpdk subtree
- any other name for identifiers declared in an external application or
  library
Haiyue Wang July 11, 2019, 8:34 a.m. | #9
> -----Original Message-----
> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> Sent: Thursday, July 11, 2019 16:21
> To: Wang, Haiyue <haiyue.wang@intel.com>
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [RFC] mbuf: support dynamic fields and flags
> 
> On Thu, Jul 11, 2019 at 08:04:00AM +0000, Wang, Haiyue wrote:
> > > -----Original Message-----
> > > From: Olivier Matz [mailto:olivier.matz@6wind.com]
> > > Sent: Thursday, July 11, 2019 15:26
> > > To: Wang, Haiyue <haiyue.wang@intel.com>
> > > Cc: dev@dpdk.org
> > > Subject: Re: [dpdk-dev] [RFC] mbuf: support dynamic fields and flags
> > >
> > > Hi,
> > >
> > > On Wed, Jul 10, 2019 at 05:14:33PM +0000, Wang, Haiyue wrote:
> > > > Hi,
> > > >
> > > > Sounds cool, just have some questions inline.
> > > >
> > > > > -----Original Message-----
> > > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
> > > > > Sent: Wednesday, July 10, 2019 17:29
> > > > > To: dev@dpdk.org
> > > > > Subject: [dpdk-dev] [RFC] mbuf: support dynamic fields and flags
> > > > >
> > > > > Many features require to store data inside the mbuf. As the room in mbuf
> > > > > structure is limited, it is not possible to have a field for each
> > > > > feature. Also, changing fields in the mbuf structure can break the API
> > > > > or ABI.
> > > > >
> > > > > This commit addresses these issues, by enabling the dynamic registration
> > > > > of fields or flags:
> > > > >
> > > > > - a dynamic field is a named area in the rte_mbuf structure, with a
> > > > >   given size (>= 1 byte) and alignment constraint.
> > > > > - a dynamic flag is a named bit in the rte_mbuf structure.
> > > > >
> > > > > The typical use case is a PMD that registers space for an offload
> > > > > feature, when the application requests to enable this feature.  As
> > > > > the space in mbuf is limited, the space should only be reserved if it
> > > > > is going to be used (i.e when the application explicitly asks for it).
> > > > >
> > > > > The registration can be done at any moment, but it is not possible
> > > > > to unregister fields or flags for now.
> > > > >
> > > > > Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > >
> > > (...)
> > >
> > > > > +/**
> > > > > + * @file
> > > > > + * RTE Mbuf dynamic fields and flags
> > > > > + *
> > > > > + * Many features require to store data inside the mbuf. As the room in
> > > > > + * mbuf structure is limited, it is not possible to have a field for
> > > > > + * each feature. Also, changing fields in the mbuf structure can break
> > > > > + * the API or ABI.
> > > > > + *
> > > > > + * This module addresses this issue, by enabling the dynamic
> > > > > + * registration of fields or flags:
> > > > > + *
> > > > > + * - a dynamic field is a named area in the rte_mbuf structure, with a
> > > > > + *   given size (>= 1 byte) and alignment constraint.
> > > > > + * - a dynamic flag is a named bit in the rte_mbuf structure.
> > > > > + *
> > > > > + * The typical use case is a PMD that registers space for an offload
> > > > > + * feature, when the application requests to enable this feature.  As
> > > > > + * the space in mbuf is limited, the space should only be reserved if it
> > > > > + * is going to be used (i.e when the application explicitly asks for it).
> > > > > + *
> > > > > + * The registration can be done at any moment, but it is not possible
> > > > > + * to unregister fields or flags for now.
> > > > > + *
> > > > > + * Example of use:
> > > > > + *
> > > > > + * - RTE_MBUF_DYN_<feature>_(ID|SIZE|ALIGN) are defined in this file
> > > >
> > > > Does it means that all PMDs define their own 'RTE_MBUF_DYN_<feature>_(ID|SIZE|ALIGN)'
> > > > here ? In other words, each PMD can expose its private DYN_<feature> here for public
> > > > using ?
> > >
> > > For generic fields, I think they should be declared in this file. For
> > > instance, if we decide to replace the current m->timestamp field by a
> > > dynamic field, we should add like this:
> > >
> > > #define RTE_MBUF_DYN_TIMESTAMP_ID "rte_timestamp"
> > > #define RTE_MBUF_DYN_TIMESTAMP_SIZE sizeof(uint64_t)
> > > #define RTE_MBUF_DYN_TIMESTAMP_ALIGN __alignof__(uint64_t)
> > >
> > > If the feature is PMD-specific, the defines could be exposed in a
> > > PMD header.
> > >
> >
> > Now, understand the comments a little : ... must not define identifers prefixed with "rte_",
> > which are reserved for standard features. Seems have big plan ?
> 
> The dynamic field can also be used by an external application or by an
> external library. For instance, a field to tag a packet, like skb->mark
> in linux. In this case, id, size and alignment would be defined outside
> dpdk subtree.
> 
> To avoid name conflicts, I think we should define a convention for
> identifiers, so they are in different namespaces:
> 
> - "rte_*" for identifiers declared inside dpdk subtree
> - any other name for identifiers declared in an external application or
>   library

Very clearer now, thanks, this convention can be in programming guide document. :)
Thomas Monjalon July 11, 2019, 9:24 a.m. | #10
10/07/2019 11:29, Olivier Matz:
> Many features require to store data inside the mbuf. As the room in mbuf
> structure is limited, it is not possible to have a field for each
> feature. Also, changing fields in the mbuf structure can break the API
> or ABI.
> 
> This commit addresses these issues, by enabling the dynamic registration
> of fields or flags:
> 
> - a dynamic field is a named area in the rte_mbuf structure, with a
>   given size (>= 1 byte) and alignment constraint.
> - a dynamic flag is a named bit in the rte_mbuf structure.
> 
> The typical use case is a PMD that registers space for an offload
> feature, when the application requests to enable this feature.  As
> the space in mbuf is limited, the space should only be reserved if it
> is going to be used (i.e when the application explicitly asks for it).
> 
> The registration can be done at any moment, but it is not possible
> to unregister fields or flags for now.
> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>

I fully support this solution.
It will give a lot of space for new features and will solve
the ABI stability problem.

Next step, I would like to move some existing mbuf fields to this
dynamic model. It will increase the free space in mbuf to be used
by dynamic fields. By converting some fields which are currently
union'ed, we can also fix the issue of these features being exclusive.

Acked-by: Thomas Monjalon <thomas@monjalon.net>
Wiles, Keith July 11, 2019, 2:37 p.m. | #11
> On Jul 11, 2019, at 2:53 AM, Olivier Matz <olivier.matz@6wind.com> wrote:
> 
> Hi Keith,
> 
> On Wed, Jul 10, 2019 at 06:12:16PM +0000, Wiles, Keith wrote:
>> 
>> 
>>> On Jul 10, 2019, at 12:49 PM, Stephen Hemminger <stephen@networkplumber.org> wrote:
>>> 
>>> On Wed, 10 Jul 2019 11:29:07 +0200
>>> Olivier Matz <olivier.matz@6wind.com> wrote:
>>> 
>>>> /**
>>>> * Indicate that the metadata field in the mbuf is in use.
>>>> @@ -738,6 +741,8 @@ struct rte_mbuf {
>>>> 	 */
>>>> 	struct rte_mbuf_ext_shared_info *shinfo;
>>>> 
>>>> +	uint64_t dynfield1; /**< Reserved for dynamic fields. */
>>>> +	uint64_t dynfield2; /**< Reserved for dynamic fields. */
>>>> } __rte_cache_aligned;
>>> 
>>> Growing mbuf is a fundamental ABI break and this needs
>>> higher level approval.  Why not one pointer?
>>> 
>>> It looks like you are creating something like FreeBSD m_tag.
>>> Why not use that?
>> 
>> Changing the mbuf structure causes a big problem for a number reasons as Stephen states.
> 
> Can you elaborate?
> 
> This is indeed an ABI break, but I think this is only due to the adding
> of rte_mbuf_dynfield_copy() in rte_pktmbuf_attach(). The size of the
> mbuf does not change and the fields are not initialized when creating a
> new mbuf. So I think there is no ABI change for code that is not using
> rte_pktmbuf_attach().
> 
> I don't think it's a problem to have one ABI change, if it avoids many
> others in the future.
> 
>> If we leave the mbuf stucture alone and add this feature to the
>> headroom space between the mbuf structure and the packet. When setting
>> up the mempool/mbuf pool we define a headroom to hold the extra data
>> when the mbuf pool is created or just use the current headroom
>> space. Using this method we can eliminate the mbuf structure change
>> and add the data to the packet buffer. We can do away with dynfield1
>> and 2 as we know where headroom space begins and ends. Just a thought.
> 
> The size of the mbuf metadata (between the mbuf structure and the
> buffer) is configured per pool, so it can be different accross
> mbufs. So, the access to the dynamic field would be slower:
> *(mbuf + dynfield_offset + metadata_size(mbuf))

We can force that space to be a minimum size when the mempool is created in the case of a cloned mbuf. The cloned mbuf is a small use case, but am important one and increasing the size for those special mbufs by a cache line should not be a huge problem.

I think most allocations do not change the size from the default value of the headroom (128). The mbuf + buffer are normally rounded to 2K or a bit bigger, which gives a bit more space in those cases of a packet size of 1518-1522. Jumbo frames are the same. Using the headroom size for an application needs to be defined and setup for the max size anyway for the application needs, so normally all mbuf creates should contain the same size to account for mbuf moments within the system.

That is my $0.02.

> 
> Also, the size of the data buffer can be 0: it happens for mbuf pools
> that are dedicated to mbuf clones (that reference data in another mbuf
> or in an external buffer). In this case, there is no room after metadata
> to store the dynamic fields.
> 
> Thanks,
> Olivier

Regards,
Keith
Stephen Hemminger July 11, 2019, 3:31 p.m. | #12
On Thu, 11 Jul 2019 09:26:19 +0200
Olivier Matz <olivier.matz@6wind.com> wrote:

> For generic fields, I think they should be declared in this file. For
> instance, if we decide to replace the current m->timestamp field by a
> dynamic field, we should add like this:
> 
> #define RTE_MBUF_DYN_TIMESTAMP_ID "rte_timestamp"
> #define RTE_MBUF_DYN_TIMESTAMP_SIZE sizeof(uint64_t)
> #define RTE_MBUF_DYN_TIMESTAMP_ALIGN __alignof__(uint64_t)


Let's use  structures (like rte_flow) rather that macros for
this?
Olivier Matz July 12, 2019, 9:06 a.m. | #13
Hi,

On Thu, Jul 11, 2019 at 02:37:23PM +0000, Wiles, Keith wrote:
> 
> 
> > On Jul 11, 2019, at 2:53 AM, Olivier Matz <olivier.matz@6wind.com> wrote:
> > 
> > Hi Keith,
> > 
> > On Wed, Jul 10, 2019 at 06:12:16PM +0000, Wiles, Keith wrote:
> >> 
> >> 
> >>> On Jul 10, 2019, at 12:49 PM, Stephen Hemminger <stephen@networkplumber.org> wrote:
> >>> 
> >>> On Wed, 10 Jul 2019 11:29:07 +0200
> >>> Olivier Matz <olivier.matz@6wind.com> wrote:
> >>> 
> >>>> /**
> >>>> * Indicate that the metadata field in the mbuf is in use.
> >>>> @@ -738,6 +741,8 @@ struct rte_mbuf {
> >>>> 	 */
> >>>> 	struct rte_mbuf_ext_shared_info *shinfo;
> >>>> 
> >>>> +	uint64_t dynfield1; /**< Reserved for dynamic fields. */
> >>>> +	uint64_t dynfield2; /**< Reserved for dynamic fields. */
> >>>> } __rte_cache_aligned;
> >>> 
> >>> Growing mbuf is a fundamental ABI break and this needs
> >>> higher level approval.  Why not one pointer?
> >>> 
> >>> It looks like you are creating something like FreeBSD m_tag.
> >>> Why not use that?
> >> 
> >> Changing the mbuf structure causes a big problem for a number reasons as Stephen states.
> > 
> > Can you elaborate?
> > 
> > This is indeed an ABI break, but I think this is only due to the adding
> > of rte_mbuf_dynfield_copy() in rte_pktmbuf_attach(). The size of the
> > mbuf does not change and the fields are not initialized when creating a
> > new mbuf. So I think there is no ABI change for code that is not using
> > rte_pktmbuf_attach().
> > 
> > I don't think it's a problem to have one ABI change, if it avoids many
> > others in the future.
> > 
> >> If we leave the mbuf stucture alone and add this feature to the
> >> headroom space between the mbuf structure and the packet. When setting
> >> up the mempool/mbuf pool we define a headroom to hold the extra data
> >> when the mbuf pool is created or just use the current headroom
> >> space. Using this method we can eliminate the mbuf structure change
> >> and add the data to the packet buffer. We can do away with dynfield1
> >> and 2 as we know where headroom space begins and ends. Just a thought.
> > 
> > The size of the mbuf metadata (between the mbuf structure and the
> > buffer) is configured per pool, so it can be different accross
> > mbufs. So, the access to the dynamic field would be slower:
> > *(mbuf + dynfield_offset + metadata_size(mbuf))
> 

> We can force that space to be a minimum size when the mempool is
> created in the case of a cloned mbuf. The cloned mbuf is a small use
> case, but am important one and increasing the size for those special
> mbufs by a cache line should not be a huge problem.
> 
> I think most allocations do not change the size from the default value
> of the headroom (128). The mbuf + buffer are normally rounded to 2K or
> a bit bigger, which gives a bit more space in those cases of a packet
> size of 1518-1522. Jumbo frames are the same. Using the headroom size
> for an application needs to be defined and setup for the max size
> anyway for the application needs, so normally all mbuf creates should
> contain the same size to account for mbuf moments within the system.

If we want more room for dynamic fields, we can do something like
this. But I don't think this is something that will happen soon: we
already have 16 bytes available, and I'm sure we can get another 16
bytes very easily by just converting fields like timestamp or sequence
numbers.

To attach larger amount of data to mbufs, the metadata feature still
exists. We can imagine to extend the dynamic fields feature to be able
to use more space after the mbuf structure (in metadata?), but it is
more complex.

I don't think that using headroom or tailroom is a good idea. That's
true that mbufs are usually a bit more than 2K, and some space is lost
when mtu is 1500. But smaller mbufs are perfectly legal too, except that
some drivers do not support it. Anyway, headroom and tailroom must be
used for what they are designed: reserving room to prepend or append
data. If we want more space for dynamic fields, let's add a specific
location for it, when it will be needed.
Olivier Matz July 12, 2019, 9:18 a.m. | #14
On Thu, Jul 11, 2019 at 08:31:19AM -0700, Stephen Hemminger wrote:
> On Thu, 11 Jul 2019 09:26:19 +0200
> Olivier Matz <olivier.matz@6wind.com> wrote:
> 
> > For generic fields, I think they should be declared in this file. For
> > instance, if we decide to replace the current m->timestamp field by a
> > dynamic field, we should add like this:
> > 
> > #define RTE_MBUF_DYN_TIMESTAMP_ID "rte_timestamp"
> > #define RTE_MBUF_DYN_TIMESTAMP_SIZE sizeof(uint64_t)
> > #define RTE_MBUF_DYN_TIMESTAMP_ALIGN __alignof__(uint64_t)
> 
> 
> Let's use  structures (like rte_flow) rather that macros for
> this?

The purpose of having defines is:
- to avoid typos when registering dynamic fields/flags
- to avoid name conflicts (because define names are derived from identifier)
- associate a known size and alignment to a given dynamic field

Using strings instead of numeric identifiers is also done on
purpose, to facilitate the definition of unique identifiers outside
the dpdk subtree (as soon as we respect contraints on namespace).

Instead of defines, are you suggesting something like this?

	struct rte_mbuf_dynfield {
		const char *id;
		size_t size;
		size_t align;
	};

	/* definition of a dynamic field */
	static const struct rte_mbuf_dynfield rte_mbuf_dynfield_timestamp {
		.id = "rte_mbuf_dynfield_timestamp",
		.size = sizeof(uint64_t),
		.size = __alignof__(uint64_t),
	};

	/* ...and same for dynamic flags... */

And change the registration API to have one argument of type (struct
rte_mbuf_dynfield *) ?

I agree it could be quite nice with structs.
Jerin Jacob Kollanukkaran July 12, 2019, 12:23 p.m. | #15
> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Olivier Matz
> Sent: Thursday, July 11, 2019 1:07 PM
> To: Stephen Hemminger <stephen@networkplumber.org>
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [RFC] mbuf: support dynamic fields and flags
> 
> On Wed, Jul 10, 2019 at 10:49:17AM -0700, Stephen Hemminger wrote:
> > On Wed, 10 Jul 2019 11:29:07 +0200
> > Olivier Matz <olivier.matz@6wind.com> wrote:
> >
> > >  /**
> > >   * Indicate that the metadata field in the mbuf is in use.
> > > @@ -738,6 +741,8 @@ struct rte_mbuf {
> > >  	 */
> > >  	struct rte_mbuf_ext_shared_info *shinfo;
> > >
> > > +	uint64_t dynfield1; /**< Reserved for dynamic fields. */
> > > +	uint64_t dynfield2; /**< Reserved for dynamic fields. */

Since the mbuf size is fixed, What is the downside of union scheme[1] vs upside of proposed scheme

[1] Example like:
        RTE_STD_C11
        union {
                void *userdata;   /**< Can be used for external metadata */
                uint64_t udata64; /**< Allow 8-byte userdata on 32-bit */
        };

# The fields like mbuf: hash.usr, used in variety  of use case together
Like libraries like distributor() and Eventdev using it. If we switch
to dynamic mbuf scheme, We wil take those field using rte_mbuf_dynfield_register()
on library init?

# I see an upside of dynamic mbuf if we can add rte_mbuf_dynfield_unregister API.
But can we ever do that? Because it will be complex if we need introduce notification mechanism etc.

# In the real world use case, if with union scheme, fastpath API can simply deference 
specific element (say mbuf->fieldx). With dynamic scheme, the offset need to store
in some other data structure  and de reference in fastpath before assessing the interested field.
Right?
Andrew Rybchenko July 12, 2019, 2:54 p.m. | #16
On 10.07.2019 12:29, Olivier Matz wrote:
> Many features require to store data inside the mbuf. As the room in mbuf
> structure is limited, it is not possible to have a field for each
> feature. Also, changing fields in the mbuf structure can break the API
> or ABI.
>
> This commit addresses these issues, by enabling the dynamic registration
> of fields or flags:
>
> - a dynamic field is a named area in the rte_mbuf structure, with a
>    given size (>= 1 byte) and alignment constraint.
> - a dynamic flag is a named bit in the rte_mbuf structure.
>
> The typical use case is a PMD that registers space for an offload
> feature, when the application requests to enable this feature.  As
> the space in mbuf is limited, the space should only be reserved if it
> is going to be used (i.e when the application explicitly asks for it).
>
> The registration can be done at any moment, but it is not possible
> to unregister fields or flags for now.
>
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>

I like the idea.

I think it would be very useful to measure performance impact. Since it is
core structure which is heavily used on datapath, performance impact is
required to make decision to go or not to go. If acceptable, more fields
can be converted to dynamic: timestamp, user data, sequence number,
timesync data etc. Rules on which fields should be static and which
dynamic are required. Timestamp, for example, is located in the first
cache line. Do we need a way prioritize some dynamic fields to be located
(if possible) in the first cache line? Or is it better simply move some 
static
to the first cache line instead?

I think rules should be better defined and imposed, if possible, when
dynamic fields may be registered. Which entities are allowed to register
dynamic fields? Do we need to keep track which entity has registered
which dynamic fields? What to expect if a dynamic field is registered
after port start (the field is registered, but most likely not filled in)?
What to expect on port restart?
Olivier Matz July 16, 2019, 9:39 a.m. | #17
On Fri, Jul 12, 2019 at 12:23:19PM +0000, Jerin Jacob Kollanukkaran wrote:
> > -----Original Message-----
> > From: dev <dev-bounces@dpdk.org> On Behalf Of Olivier Matz
> > Sent: Thursday, July 11, 2019 1:07 PM
> > To: Stephen Hemminger <stephen@networkplumber.org>
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [RFC] mbuf: support dynamic fields and flags
> > 
> > On Wed, Jul 10, 2019 at 10:49:17AM -0700, Stephen Hemminger wrote:
> > > On Wed, 10 Jul 2019 11:29:07 +0200
> > > Olivier Matz <olivier.matz@6wind.com> wrote:
> > >
> > > >  /**
> > > >   * Indicate that the metadata field in the mbuf is in use.
> > > > @@ -738,6 +741,8 @@ struct rte_mbuf {
> > > >  	 */
> > > >  	struct rte_mbuf_ext_shared_info *shinfo;
> > > >
> > > > +	uint64_t dynfield1; /**< Reserved for dynamic fields. */
> > > > +	uint64_t dynfield2; /**< Reserved for dynamic fields. */
> 
> Since the mbuf size is fixed, What is the downside of union scheme[1] vs upside of proposed scheme
> 
> [1] Example like:
>         RTE_STD_C11
>         union {
>                 void *userdata;   /**< Can be used for external metadata */
>                 uint64_t udata64; /**< Allow 8-byte userdata on 32-bit */
>         };

In the particular case of userdata, the union is not an issue, it
just means that there are several ways to represent the same data.
If needed, it is possible to register a union as a dynamic field.

In other case, like m->hash, having a union makes it impossible to
use several features of the union at the same time. This would be
solved by dynamic fields.

> # The fields like mbuf: hash.usr, used in variety  of use case together
> Like libraries like distributor() and Eventdev using it. If we switch
> to dynamic mbuf scheme, We wil take those field using rte_mbuf_dynfield_register()
> on library init?

If we decide that these fields must be converted to a dynamic field,
yes, each library/application will call rte_mbuf_dynfield_register().

> # I see an upside of dynamic mbuf if we can add rte_mbuf_dynfield_unregister API.
> But can we ever do that? Because it will be complex if we need introduce notification mechanism etc.

An unregister mechanism seems hard to implement, or we can leave the
hard part to the user: either ensure that no mbuf is in use anywhere, or
that removing the dynamic field won't have any impact. But I'd prefer
not introducing an unregistration function until we have a real use-case
for it.

> # In the real world use case, if with union scheme, fastpath API can simply deference 
> specific element (say mbuf->fieldx). With dynamic scheme, the offset need to store
> in some other data structure  and de reference in fastpath before assessing the interested field.
> Right?

Yes, with dynamic fields, the offset is stored in a variable. A global
variable (static to the file or module using it) does the job. This may
have a small performance impact.
Olivier Matz July 16, 2019, 9:49 a.m. | #18
On Fri, Jul 12, 2019 at 05:54:57PM +0300, Andrew Rybchenko wrote:
> On 10.07.2019 12:29, Olivier Matz wrote:
> > Many features require to store data inside the mbuf. As the room in mbuf
> > structure is limited, it is not possible to have a field for each
> > feature. Also, changing fields in the mbuf structure can break the API
> > or ABI.
> > 
> > This commit addresses these issues, by enabling the dynamic registration
> > of fields or flags:
> > 
> > - a dynamic field is a named area in the rte_mbuf structure, with a
> >    given size (>= 1 byte) and alignment constraint.
> > - a dynamic flag is a named bit in the rte_mbuf structure.
> > 
> > The typical use case is a PMD that registers space for an offload
> > feature, when the application requests to enable this feature.  As
> > the space in mbuf is limited, the space should only be reserved if it
> > is going to be used (i.e when the application explicitly asks for it).
> > 
> > The registration can be done at any moment, but it is not possible
> > to unregister fields or flags for now.
> > 
> > Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> 
> I like the idea.
> 
> I think it would be very useful to measure performance impact. Since it is
> core structure which is heavily used on datapath, performance impact is
> required to make decision to go or not to go. If acceptable, more fields
> can be converted to dynamic: timestamp, user data, sequence number,
> timesync data etc.

Agree. I'll try to do this in the coming days.

> Rules on which fields should be static and which
> dynamic are required. Timestamp, for example, is located in the first
> cache line. Do we need a way prioritize some dynamic fields to be located
> (if possible) in the first cache line? Or is it better simply move some
> static
> to the first cache line instead?

There is a "flags" argument, which is designed for this purpose. Today,
there is no room in the first cache line, but as soon as we remove
something from it, we can add a flag to ask to register a dynamic field
in the first cache line.

> I think rules should be better defined and imposed, if possible, when
> dynamic fields may be registered. Which entities are allowed to register
> dynamic fields?

I think there is no restriction. Library, PMD, App can register their
dynamic fields as soon as there is room for it.

> Do we need to keep track which entity has registered
> which dynamic fields?

Looks quite difficult to me. Most of the time, a dynamic field will be
registered at several places. Only the first registration is effective,
the other will just get the offset.

But at least we could add a log in the registration function.

> What to expect if a dynamic field is registered
> after port start (the field is registered, but most likely not filled in)?
> What to expect on port restart?

Registration of dynamic field can be done at any moment.

But to register a field that will be used by a PMD, we need to ask for
the feature at port configuration (usually through ethdev). Then the PMD
will register the dynamic field. If it fails, the configuration of the
port should fail.

The application that will access to the field will also register it. It
can be done before or after the PMD initialization.
Andrew Rybchenko July 16, 2019, 11:31 a.m. | #19
On 7/16/19 12:49 PM, Olivier Matz wrote:
> On Fri, Jul 12, 2019 at 05:54:57PM +0300, Andrew Rybchenko wrote:
>> On 10.07.2019 12:29, Olivier Matz wrote:
>>> Many features require to store data inside the mbuf. As the room in mbuf
>>> structure is limited, it is not possible to have a field for each
>>> feature. Also, changing fields in the mbuf structure can break the API
>>> or ABI.
>>>
>>> This commit addresses these issues, by enabling the dynamic registration
>>> of fields or flags:
>>>
>>> - a dynamic field is a named area in the rte_mbuf structure, with a
>>>     given size (>= 1 byte) and alignment constraint.
>>> - a dynamic flag is a named bit in the rte_mbuf structure.
>>>
>>> The typical use case is a PMD that registers space for an offload
>>> feature, when the application requests to enable this feature.  As
>>> the space in mbuf is limited, the space should only be reserved if it
>>> is going to be used (i.e when the application explicitly asks for it).
>>>
>>> The registration can be done at any moment, but it is not possible
>>> to unregister fields or flags for now.
>>>
>>> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
>> I like the idea.
>>
>> I think it would be very useful to measure performance impact. Since it is
>> core structure which is heavily used on datapath, performance impact is
>> required to make decision to go or not to go. If acceptable, more fields
>> can be converted to dynamic: timestamp, user data, sequence number,
>> timesync data etc.
> Agree. I'll try to do this in the coming days.
>
>> Rules on which fields should be static and which
>> dynamic are required. Timestamp, for example, is located in the first
>> cache line. Do we need a way prioritize some dynamic fields to be located
>> (if possible) in the first cache line? Or is it better simply move some
>> static
>> to the first cache line instead?
> There is a "flags" argument, which is designed for this purpose. Today,
> there is no room in the first cache line, but as soon as we remove
> something from it, we can add a flag to ask to register a dynamic field
> in the first cache line.
>
>> I think rules should be better defined and imposed, if possible, when
>> dynamic fields may be registered. Which entities are allowed to register
>> dynamic fields?
> I think there is no restriction. Library, PMD, App can register their
> dynamic fields as soon as there is room for it.

I see that API itself has no restrictions, but the goal is to have
something working and it is very easy to break things with
dynamic fields and flags. May be obvious requirements are
sufficient (e.g. should be registered before lookup to be found
by lookup), but it is getting more complicated when drivers,
core libraries and applications come into play with their life
cycles. But may be it is really out-of-scope of the API description.

Thanks.
Stephen Hemminger July 16, 2019, 2:43 p.m. | #20
On Tue, 16 Jul 2019 11:39:50 +0200
Olivier Matz <olivier.matz@6wind.com> wrote:

> On Fri, Jul 12, 2019 at 12:23:19PM +0000, Jerin Jacob Kollanukkaran wrote:
> > > -----Original Message-----
> > > From: dev <dev-bounces@dpdk.org> On Behalf Of Olivier Matz
> > > Sent: Thursday, July 11, 2019 1:07 PM
> > > To: Stephen Hemminger <stephen@networkplumber.org>
> > > Cc: dev@dpdk.org
> > > Subject: Re: [dpdk-dev] [RFC] mbuf: support dynamic fields and flags
> > > 
> > > On Wed, Jul 10, 2019 at 10:49:17AM -0700, Stephen Hemminger wrote:  
> > > > On Wed, 10 Jul 2019 11:29:07 +0200
> > > > Olivier Matz <olivier.matz@6wind.com> wrote:
> > > >  
> > > > >  /**
> > > > >   * Indicate that the metadata field in the mbuf is in use.
> > > > > @@ -738,6 +741,8 @@ struct rte_mbuf {
> > > > >  	 */
> > > > >  	struct rte_mbuf_ext_shared_info *shinfo;
> > > > >
> > > > > +	uint64_t dynfield1; /**< Reserved for dynamic fields. */
> > > > > +	uint64_t dynfield2; /**< Reserved for dynamic fields. */  
> > 
> > Since the mbuf size is fixed, What is the downside of union scheme[1] vs upside of proposed scheme
> > 
> > [1] Example like:
> >         RTE_STD_C11
> >         union {
> >                 void *userdata;   /**< Can be used for external metadata */
> >                 uint64_t udata64; /**< Allow 8-byte userdata on 32-bit */
> >         };  
> 
> In the particular case of userdata, the union is not an issue, it
> just means that there are several ways to represent the same data.
> If needed, it is possible to register a union as a dynamic field.
> 
> In other case, like m->hash, having a union makes it impossible to
> use several features of the union at the same time. This would be
> solved by dynamic fields.
> 
> > # The fields like mbuf: hash.usr, used in variety  of use case together
> > Like libraries like distributor() and Eventdev using it. If we switch
> > to dynamic mbuf scheme, We wil take those field using rte_mbuf_dynfield_register()
> > on library init?  
> 
> If we decide that these fields must be converted to a dynamic field,
> yes, each library/application will call rte_mbuf_dynfield_register().
> 
> > # I see an upside of dynamic mbuf if we can add rte_mbuf_dynfield_unregister API.
> > But can we ever do that? Because it will be complex if we need introduce notification mechanism etc.  
> 
> An unregister mechanism seems hard to implement, or we can leave the
> hard part to the user: either ensure that no mbuf is in use anywhere, or
> that removing the dynamic field won't have any impact. But I'd prefer
> not introducing an unregistration function until we have a real use-case
> for it.
> 
> > # In the real world use case, if with union scheme, fastpath API can simply deference 
> > specific element (say mbuf->fieldx). With dynamic scheme, the offset need to store
> > in some other data structure  and de reference in fastpath before assessing the interested field.
> > Right?  
> 
> Yes, with dynamic fields, the offset is stored in a variable. A global
> variable (static to the file or module using it) does the job. This may
> have a small performance impact.
> 

Applications are already using userdata reusing that in a driver
would cause a worse disaster than breaking ABI.

Patch

diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
index 2a97afe20..8008cc766 100644
--- a/app/test/test_mbuf.c
+++ b/app/test/test_mbuf.c
@@ -28,6 +28,7 @@ 
 #include <rte_random.h>
 #include <rte_cycles.h>
 #include <rte_malloc.h>
+#include <rte_mbuf_dyn.h>
 
 #include "test.h"
 
@@ -502,7 +503,6 @@  test_attach_from_different_pool(struct rte_mempool *pktmbuf_pool,
 		rte_pktmbuf_free(clone2);
 	return -1;
 }
-#undef GOTO_FAIL
 
 /*
  * test allocation and free of mbufs
@@ -1122,6 +1122,81 @@  test_tx_offload(void)
 }
 
 static int
+test_mbuf_dyn(struct rte_mempool *pktmbuf_pool)
+{
+	struct rte_mbuf *m = NULL;
+	int offset, offset2;
+	int flag, flag2;
+
+	offset = rte_mbuf_dynfield_register("test-dynfield", sizeof(uint8_t),
+					__alignof__(uint8_t), 0);
+	if (offset == -1)
+		GOTO_FAIL("failed to register dynamic field, offset=%d: %s",
+			offset, strerror(errno));
+
+	offset2 = rte_mbuf_dynfield_register("test-dynfield", sizeof(uint8_t),
+					__alignof__(uint8_t), 0);
+	if (offset2 != offset)
+		GOTO_FAIL("failed to lookup dynamic field, offset=%d, offset2=%d: %s",
+			offset, offset2, strerror(errno));
+
+	offset2 = rte_mbuf_dynfield_register("test-dynfield2", sizeof(uint16_t),
+					__alignof__(uint16_t), 0);
+	if (offset2 == -1 || offset2 == offset || (offset & 1))
+		GOTO_FAIL("failed to register dynfield field 2, offset=%d, offset2=%d: %s",
+			offset, offset2, strerror(errno));
+
+	printf("offset = %d, offset2 = %d\n", offset, offset2);
+
+	offset = rte_mbuf_dynfield_register("test-dynfield-fail", 256, 1, 0);
+	if (offset != -1)
+		GOTO_FAIL("dynamic field creation should fail (too big)");
+
+	offset = rte_mbuf_dynfield_register("test-dynfield-fail", 1, 3, 0);
+	if (offset != -1)
+		GOTO_FAIL("dynamic field creation should fail (bad alignment)");
+
+	flag = rte_mbuf_dynflag_register("test-dynflag");
+	if (flag == -1)
+		GOTO_FAIL("failed to register dynamic field, flag=%d: %s",
+			flag, strerror(errno));
+
+	flag2 = rte_mbuf_dynflag_register("test-dynflag");
+	if (flag2 != flag)
+		GOTO_FAIL("failed to lookup dynamic field, flag=%d, flag2=%d: %s",
+			flag, flag2, strerror(errno));
+
+	flag2 = rte_mbuf_dynflag_register("test-dynflag2");
+	if (flag2 == -1 || flag2 == flag)
+		GOTO_FAIL("failed to register dynflag field 2, flag=%d, flag2=%d: %s",
+			flag, flag2, strerror(errno));
+
+	printf("flag = %d, flag2 = %d\n", flag, flag2);
+
+	/* set, get dynamic field */
+	m = rte_pktmbuf_alloc(pktmbuf_pool);
+	if (m == NULL)
+		GOTO_FAIL("Cannot allocate mbuf");
+
+	*RTE_MBUF_DYNFIELD(m, offset, uint8_t *) = 1;
+	if (*RTE_MBUF_DYNFIELD(m, offset, uint8_t *) != 1)
+		GOTO_FAIL("failed to read dynamic field");
+	*RTE_MBUF_DYNFIELD(m, offset2, uint16_t *) = 1000;
+	if (*RTE_MBUF_DYNFIELD(m, offset2, uint16_t *) != 1000)
+		GOTO_FAIL("failed to read dynamic field");
+
+	/* set a dynamic flag */
+	m->ol_flags |= (1ULL << flag);
+
+	rte_pktmbuf_free(m);
+	return 0;
+fail:
+	rte_pktmbuf_free(m);
+	return -1;
+}
+#undef GOTO_FAIL
+
+static int
 test_mbuf(void)
 {
 	int ret = -1;
@@ -1140,6 +1215,12 @@  test_mbuf(void)
 		goto err;
 	}
 
+	/* test registration of dynamic fields and flags */
+	if (test_mbuf_dyn(pktmbuf_pool) < 0) {
+		printf("mbuf dynflag test failed\n");
+		goto err;
+	}
+
 	/* create a specific pktmbuf pool with a priv_size != 0 and no data
 	 * room size */
 	pktmbuf_pool2 = rte_pktmbuf_pool_create("test_pktmbuf_pool2",
diff --git a/lib/librte_mbuf/Makefile b/lib/librte_mbuf/Makefile
index c8f6d2689..5a9bcee73 100644
--- a/lib/librte_mbuf/Makefile
+++ b/lib/librte_mbuf/Makefile
@@ -17,8 +17,10 @@  LIBABIVER := 5
 
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_MBUF) := rte_mbuf.c rte_mbuf_ptype.c rte_mbuf_pool_ops.c
+SRCS-$(CONFIG_RTE_LIBRTE_MBUF) += rte_mbuf_dyn.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_MBUF)-include := rte_mbuf.h rte_mbuf_ptype.h rte_mbuf_pool_ops.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_MBUF)-include += rte_mbuf_dyn.h
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_mbuf/meson.build b/lib/librte_mbuf/meson.build
index 6cc11ebb4..9137e8f26 100644
--- a/lib/librte_mbuf/meson.build
+++ b/lib/librte_mbuf/meson.build
@@ -2,8 +2,10 @@ 
 # Copyright(c) 2017 Intel Corporation
 
 version = 5
-sources = files('rte_mbuf.c', 'rte_mbuf_ptype.c', 'rte_mbuf_pool_ops.c')
-headers = files('rte_mbuf.h', 'rte_mbuf_ptype.h', 'rte_mbuf_pool_ops.h')
+sources = files('rte_mbuf.c', 'rte_mbuf_ptype.c', 'rte_mbuf_pool_ops.c',
+	'rte_mbuf_dyn.c')
+headers = files('rte_mbuf.h', 'rte_mbuf_ptype.h', 'rte_mbuf_pool_ops.h',
+	'rte_mbuf_dyn.h')
 deps += ['mempool']
 
 allow_experimental_apis = true
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 98225ec80..ef588cd54 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -198,9 +198,12 @@  extern "C" {
 #define PKT_RX_OUTER_L4_CKSUM_GOOD	(1ULL << 22)
 #define PKT_RX_OUTER_L4_CKSUM_INVALID	((1ULL << 21) | (1ULL << 22))
 
-/* add new RX flags here */
+/* add new RX flags here, don't forget to update PKT_FIRST_FREE */
 
-/* add new TX flags here */
+#define PKT_FIRST_FREE (1ULL << 23)
+#define PKT_LAST_FREE (1ULL << 39)
+
+/* add new TX flags here, don't forget to update PKT_LAST_FREE  */
 
 /**
  * Indicate that the metadata field in the mbuf is in use.
@@ -738,6 +741,8 @@  struct rte_mbuf {
 	 */
 	struct rte_mbuf_ext_shared_info *shinfo;
 
+	uint64_t dynfield1; /**< Reserved for dynamic fields. */
+	uint64_t dynfield2; /**< Reserved for dynamic fields. */
 } __rte_cache_aligned;
 
 /**
@@ -1685,6 +1690,21 @@  rte_pktmbuf_attach_extbuf(struct rte_mbuf *m, void *buf_addr,
 #define rte_pktmbuf_detach_extbuf(m) rte_pktmbuf_detach(m)
 
 /**
+ * Copy dynamic fields from m_src to m_dst.
+ *
+ * @param m_dst
+ *   The destination mbuf.
+ * @param m_src
+ *   The source mbuf.
+ */
+static inline void
+rte_mbuf_dynfield_copy(struct rte_mbuf *m_dst, const struct rte_mbuf *m_src)
+{
+	m_dst->dynfield1 = m_src->dynfield1;
+	m_dst->dynfield2 = m_src->dynfield2;
+}
+
+/**
  * Attach packet mbuf to another packet mbuf.
  *
  * If the mbuf we are attaching to isn't a direct buffer and is attached to
@@ -1732,6 +1752,7 @@  static inline void rte_pktmbuf_attach(struct rte_mbuf *mi, struct rte_mbuf *m)
 	mi->vlan_tci_outer = m->vlan_tci_outer;
 	mi->tx_offload = m->tx_offload;
 	mi->hash = m->hash;
+	rte_mbuf_dynfield_copy(mi, m);
 
 	mi->next = NULL;
 	mi->pkt_len = mi->data_len;
diff --git a/lib/librte_mbuf/rte_mbuf_dyn.c b/lib/librte_mbuf/rte_mbuf_dyn.c
new file mode 100644
index 000000000..6a96a43da
--- /dev/null
+++ b/lib/librte_mbuf/rte_mbuf_dyn.c
@@ -0,0 +1,373 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 6WIND S.A.
+ */
+
+#include <sys/queue.h>
+
+#include <rte_common.h>
+#include <rte_eal.h>
+#include <rte_eal_memconfig.h>
+#include <rte_tailq.h>
+#include <rte_errno.h>
+#include <rte_malloc.h>
+#include <rte_string_fns.h>
+#include <rte_mbuf.h>
+#include <rte_mbuf_dyn.h>
+
+#define RTE_MBUF_DYN_MZNAME "rte_mbuf_dyn"
+
+struct mbuf_dynfield {
+	TAILQ_ENTRY(mbuf_dynfield) next;
+	char name[RTE_MBUF_DYN_NAMESIZE];
+	size_t size;
+	size_t align;
+	unsigned int flags;
+	int offset;
+};
+TAILQ_HEAD(mbuf_dynfield_list, rte_tailq_entry);
+
+static struct rte_tailq_elem mbuf_dynfield_tailq = {
+	.name = "RTE_MBUF_DYNFIELD",
+};
+EAL_REGISTER_TAILQ(mbuf_dynfield_tailq);
+
+struct mbuf_dynflag {
+	TAILQ_ENTRY(mbuf_dynflag) next;
+	char name[RTE_MBUF_DYN_NAMESIZE];
+	int bitnum;
+};
+TAILQ_HEAD(mbuf_dynflag_list, rte_tailq_entry);
+
+static struct rte_tailq_elem mbuf_dynflag_tailq = {
+	.name = "RTE_MBUF_DYNFLAG",
+};
+EAL_REGISTER_TAILQ(mbuf_dynflag_tailq);
+
+struct mbuf_dyn_shm {
+	/** For each mbuf byte, free_space[i] == 1 if space is free. */
+	uint8_t free_space[sizeof(struct rte_mbuf)];
+	/** Bitfield of available flags. */
+	uint64_t free_flags;
+};
+static struct mbuf_dyn_shm *shm;
+
+/* allocate and initialize the shared memory */
+static int
+init_shared_mem(void)
+{
+	const struct rte_memzone *mz;
+	uint64_t mask;
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		mz = rte_memzone_reserve_aligned(RTE_MBUF_DYN_MZNAME,
+						sizeof(struct mbuf_dyn_shm),
+						SOCKET_ID_ANY, 0,
+						RTE_CACHE_LINE_SIZE);
+	} else {
+		mz = rte_memzone_lookup(RTE_MBUF_DYN_MZNAME);
+	}
+	if (mz == NULL)
+		return -1;
+
+	shm = mz->addr;
+
+#define mark_free(field)						\
+	memset(&shm->free_space[offsetof(struct rte_mbuf, field)],	\
+		0xff, sizeof(((struct rte_mbuf *)0)->field))
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		/* init free_space, keep it sync'd with
+		 * rte_mbuf_dynfield_copy().
+		 */
+		memset(shm, 0, sizeof(*shm));
+		mark_free(dynfield1);
+		mark_free(dynfield2);
+
+		/* init free_flags */
+		for (mask = PKT_FIRST_FREE; mask <= PKT_LAST_FREE; mask <<= 1)
+			shm->free_flags |= mask;
+	}
+#undef mark_free
+
+	return 0;
+}
+
+/* check if this offset can be used */
+static int
+check_offset(size_t offset, size_t size, size_t align, unsigned int flags)
+{
+	size_t i;
+
+	(void)flags;
+
+	if ((offset & (align - 1)) != 0)
+		return -1;
+	if (offset + size > sizeof(struct rte_mbuf))
+		return -1;
+
+	for (i = 0; i < size; i++) {
+		if (!shm->free_space[i + offset])
+			return -1;
+	}
+
+	return 0;
+}
+
+/* assume tailq is locked */
+static struct mbuf_dynfield *
+__mbuf_dynfield_lookup(const char *name)
+{
+	struct mbuf_dynfield_list *mbuf_dynfield_list;
+	struct mbuf_dynfield *mbuf_dynfield;
+	struct rte_tailq_entry *te;
+
+	mbuf_dynfield_list = RTE_TAILQ_CAST(
+		mbuf_dynfield_tailq.head, mbuf_dynfield_list);
+
+	TAILQ_FOREACH(te, mbuf_dynfield_list, next) {
+		mbuf_dynfield = (struct mbuf_dynfield *)te->data;
+		if (strncmp(name, mbuf_dynfield->name,
+				RTE_MBUF_DYN_NAMESIZE) == 0)
+			break;
+	}
+
+	if (te == NULL) {
+		rte_errno = ENOENT;
+		return NULL;
+	}
+
+	return mbuf_dynfield;
+}
+
+int
+rte_mbuf_dynfield_lookup(const char *name, size_t *size, size_t *align)
+{
+	struct mbuf_dynfield *mbuf_dynfield;
+
+	if (shm == NULL) {
+		rte_errno = ENOENT;
+		return -1;
+	}
+
+	rte_mcfg_tailq_read_lock();
+	mbuf_dynfield = __mbuf_dynfield_lookup(name);
+	rte_mcfg_tailq_read_unlock();
+
+	if (mbuf_dynfield == NULL) {
+		rte_errno = ENOENT;
+		return -1;
+	}
+
+	if (size != NULL)
+		*size = mbuf_dynfield->size;
+	if (align != NULL)
+		*align = mbuf_dynfield->align;
+
+	return mbuf_dynfield->offset;
+}
+
+int
+rte_mbuf_dynfield_register(const char *name, size_t size, size_t align,
+			unsigned int flags)
+{
+	struct mbuf_dynfield_list *mbuf_dynfield_list;
+	struct mbuf_dynfield *mbuf_dynfield = NULL;
+	struct rte_tailq_entry *te = NULL;
+	int offset, ret;
+	size_t i;
+
+	if (shm == NULL && init_shared_mem() < 0)
+		goto fail;
+	if (size >= sizeof(struct rte_mbuf)) {
+		rte_errno = EINVAL;
+		goto fail;
+	}
+	if (!rte_is_power_of_2(align)) {
+		rte_errno = EINVAL;
+		goto fail;
+	}
+
+	rte_mcfg_tailq_write_lock();
+
+	mbuf_dynfield = __mbuf_dynfield_lookup(name);
+	if (mbuf_dynfield != NULL) {
+		if (mbuf_dynfield->size != size ||
+				mbuf_dynfield->align != align ||
+				mbuf_dynfield->flags != flags) {
+			rte_errno = EEXIST;
+			goto fail_unlock;
+		}
+		offset = mbuf_dynfield->offset;
+		goto out_unlock;
+	}
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+		rte_errno = EPERM;
+		goto fail_unlock;
+	}
+
+	for (offset = 0;
+	     offset < (int)sizeof(struct rte_mbuf);
+	     offset++) {
+		if (check_offset(offset, size, align, flags) == 0)
+			break;
+	}
+
+	if (offset == sizeof(struct rte_mbuf)) {
+		rte_errno = ENOENT;
+		goto fail_unlock;
+	}
+
+	mbuf_dynfield_list = RTE_TAILQ_CAST(
+		mbuf_dynfield_tailq.head, mbuf_dynfield_list);
+
+	te = rte_zmalloc("MBUF_DYNFIELD_TAILQ_ENTRY", sizeof(*te), 0);
+	if (te == NULL)
+		goto fail_unlock;
+
+	mbuf_dynfield = rte_zmalloc("mbuf_dynfield", sizeof(*mbuf_dynfield), 0);
+	if (mbuf_dynfield == NULL)
+		goto fail_unlock;
+
+	ret = strlcpy(mbuf_dynfield->name, name, sizeof(mbuf_dynfield->name));
+	if (ret < 0 || ret >= (int)sizeof(mbuf_dynfield->name)) {
+		rte_errno = ENAMETOOLONG;
+		goto fail_unlock;
+	}
+	mbuf_dynfield->size = size;
+	mbuf_dynfield->align = align;
+	mbuf_dynfield->flags = flags;
+	mbuf_dynfield->offset = offset;
+	te->data = mbuf_dynfield;
+
+	TAILQ_INSERT_TAIL(mbuf_dynfield_list, te, next);
+
+	for (i = offset; i < offset + size; i++)
+		shm->free_space[i] = 0;
+
+out_unlock:
+	rte_mcfg_tailq_write_unlock();
+
+	return offset;
+
+fail_unlock:
+	rte_mcfg_tailq_write_unlock();
+fail:
+	rte_free(mbuf_dynfield);
+	rte_free(te);
+	return -1;
+}
+
+/* assume tailq is locked */
+static struct mbuf_dynflag *
+__mbuf_dynflag_lookup(const char *name)
+{
+	struct mbuf_dynflag_list *mbuf_dynflag_list;
+	struct mbuf_dynflag *mbuf_dynflag;
+	struct rte_tailq_entry *te;
+
+	mbuf_dynflag_list = RTE_TAILQ_CAST(
+		mbuf_dynflag_tailq.head, mbuf_dynflag_list);
+
+	TAILQ_FOREACH(te, mbuf_dynflag_list, next) {
+		mbuf_dynflag = (struct mbuf_dynflag *)te->data;
+		if (strncmp(name, mbuf_dynflag->name,
+				RTE_MBUF_DYN_NAMESIZE) == 0)
+			break;
+	}
+
+	if (te == NULL) {
+		rte_errno = ENOENT;
+		return NULL;
+	}
+
+	return mbuf_dynflag;
+}
+
+int
+rte_mbuf_dynflag_lookup(const char *name)
+{
+	struct mbuf_dynflag *mbuf_dynflag;
+
+	if (shm == NULL) {
+		rte_errno = ENOENT;
+		return -1;
+	}
+
+	rte_mcfg_tailq_read_lock();
+	mbuf_dynflag = __mbuf_dynflag_lookup(name);
+	rte_mcfg_tailq_read_unlock();
+
+	if (mbuf_dynflag == NULL) {
+		rte_errno = ENOENT;
+		return -1;
+	}
+
+	return mbuf_dynflag->bitnum;
+}
+
+int
+rte_mbuf_dynflag_register(const char *name)
+{
+	struct mbuf_dynflag_list *mbuf_dynflag_list;
+	struct mbuf_dynflag *mbuf_dynflag = NULL;
+	struct rte_tailq_entry *te = NULL;
+	int bitnum, ret;
+
+	if (shm == NULL && init_shared_mem() < 0)
+		goto fail;
+
+	rte_mcfg_tailq_write_lock();
+
+	mbuf_dynflag = __mbuf_dynflag_lookup(name);
+	if (mbuf_dynflag != NULL) {
+		bitnum = mbuf_dynflag->bitnum;
+		goto out_unlock;
+	}
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+		rte_errno = EPERM;
+		goto fail_unlock;
+	}
+
+	if (shm->free_flags == 0) {
+		rte_errno = ENOENT;
+		goto fail_unlock;
+	}
+	bitnum = rte_bsf64(shm->free_flags);
+
+	mbuf_dynflag_list = RTE_TAILQ_CAST(
+		mbuf_dynflag_tailq.head, mbuf_dynflag_list);
+
+	te = rte_zmalloc("MBUF_DYNFLAG_TAILQ_ENTRY", sizeof(*te), 0);
+	if (te == NULL)
+		goto fail_unlock;
+
+	mbuf_dynflag = rte_zmalloc("mbuf_dynflag", sizeof(*mbuf_dynflag), 0);
+	if (mbuf_dynflag == NULL)
+		goto fail_unlock;
+
+	ret = strlcpy(mbuf_dynflag->name, name, sizeof(mbuf_dynflag->name));
+	if (ret < 0 || ret >= (int)sizeof(mbuf_dynflag->name)) {
+		rte_errno = ENAMETOOLONG;
+		goto fail_unlock;
+	}
+	mbuf_dynflag->bitnum = bitnum;
+	te->data = mbuf_dynflag;
+
+	TAILQ_INSERT_TAIL(mbuf_dynflag_list, te, next);
+
+	shm->free_flags &= ~(1ULL << bitnum);
+
+out_unlock:
+	rte_mcfg_tailq_write_unlock();
+
+	return bitnum;
+
+fail_unlock:
+	rte_mcfg_tailq_write_unlock();
+fail:
+	rte_free(mbuf_dynflag);
+	rte_free(te);
+	return -1;
+}
diff --git a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
new file mode 100644
index 000000000..a86986a0f
--- /dev/null
+++ b/lib/librte_mbuf/rte_mbuf_dyn.h
@@ -0,0 +1,119 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 6WIND S.A.
+ */
+
+#ifndef _RTE_MBUF_DYN_H_
+#define _RTE_MBUF_DYN_H_
+
+/**
+ * @file
+ * RTE Mbuf dynamic fields and flags
+ *
+ * Many features require to store data inside the mbuf. As the room in
+ * mbuf structure is limited, it is not possible to have a field for
+ * each feature. Also, changing fields in the mbuf structure can break
+ * the API or ABI.
+ *
+ * This module addresses this issue, by enabling the dynamic
+ * registration of fields or flags:
+ *
+ * - a dynamic field is a named area in the rte_mbuf structure, with a
+ *   given size (>= 1 byte) and alignment constraint.
+ * - a dynamic flag is a named bit in the rte_mbuf structure.
+ *
+ * The typical use case is a PMD that registers space for an offload
+ * feature, when the application requests to enable this feature.  As
+ * the space in mbuf is limited, the space should only be reserved if it
+ * is going to be used (i.e when the application explicitly asks for it).
+ *
+ * The registration can be done at any moment, but it is not possible
+ * to unregister fields or flags for now.
+ *
+ * Example of use:
+ *
+ * - RTE_MBUF_DYN_<feature>_(ID|SIZE|ALIGN) are defined in this file
+ * - If the application asks for the feature, the PMD use
+ *   rte_mbuf_dynfield_register() to get the dynamic offset and stores
+ *   in a global variable.
+ * - The application also calls rte_mbuf_dynfield_register() to get the
+ *   dynamic offset and stores it in a global variable.
+ * - When the field must be used by the PMD or the application, they
+ *   use the RTE_MBUF_DYNFIELD() helper.
+ */
+
+struct rte_mbuf;
+
+/**
+ * Register space for a dynamic field in the mbuf structure.
+ *
+ * @param name
+ *   A string identifying the dynamic field. External applications or
+ *   libraries must not define identifers prefixed with "rte_", which
+ *   are reserved for standard features.
+ * @param size
+ *   The number of bytes to reserve.
+ * @param align
+ *   The alignment constraint, which must be a power of 2.
+ * @param flags
+ *   Reserved for future use.
+ * @return
+ *   The offset in the mbuf structure, or -1 on error (rte_errno is set).
+ */
+__rte_experimental
+int rte_mbuf_dynfield_register(const char *name, size_t size, size_t align,
+			unsigned int flags);
+
+/**
+ * Lookup for a registered dynamic mbuf field.
+ *
+ * @param name
+ *   A string identifying the dynamic field.
+ * @param size
+ *   If not NULL, the number of reserved bytes for this field is stored
+ *   at this address.
+ * @param align
+ *   If not NULL, the alignement constraint for this field is stored
+ *   at this address.
+ * @return
+ *   The offset of this field in the mbuf structure, or -1 on error
+ *   (rte_errno is set).
+ */
+__rte_experimental
+int rte_mbuf_dynfield_lookup(const char *name, size_t *size, size_t *align);
+
+/**
+ * Register a dynamic flag in the mbuf structure.
+ *
+ * @param name
+ *   A string identifying the dynamic flag. External applications or
+ *   libraries must not define identifers prefixed with "rte_", which
+ *   are reserved for standard features.
+ * @return
+ *   The number of the reserved bit, or -1 on error (rte_errno is set).
+ */
+__rte_experimental
+int rte_mbuf_dynflag_register(const char *name);
+
+/**
+ * Lookup for a registered dynamic mbuf flag.
+ *
+ * @param name
+ *   A string identifying the dynamic flag.
+ * @return
+ *   The offset of this flag in the mbuf structure, or -1 on error
+ *   (rte_errno is set).
+ */
+__rte_experimental
+int rte_mbuf_dynflag_lookup(const char *name);
+
+/**
+ * Helper macro to access to a dynamic field.
+ */
+#define RTE_MBUF_DYNFIELD(m, offset, type) ((type)((char *)(m) + (offset)))
+
+/**
+ * Maximum length of the dynamic field or flag string.
+ */
+#define RTE_MBUF_DYN_NAMESIZE 32
+
+#endif
diff --git a/lib/librte_mbuf/rte_mbuf_version.map b/lib/librte_mbuf/rte_mbuf_version.map
index 2662a37bf..a98310570 100644
--- a/lib/librte_mbuf/rte_mbuf_version.map
+++ b/lib/librte_mbuf/rte_mbuf_version.map
@@ -50,4 +50,8 @@  EXPERIMENTAL {
 	global:
 
 	rte_mbuf_check;
+	rte_mbuf_dynfield_lookup;
+	rte_mbuf_dynfield_register;
+	rte_mbuf_dynflag_lookup;
+	rte_mbuf_dynflag_register;
 } DPDK_18.08;