[RFC] mempool: introduce indexed memory pool

Message ID 1571295301-25911-1-git-send-email-xuemingl@mellanox.com (mailing list archive)
State RFC, archived
Delegated to: David Marchand
Headers
Series [RFC] mempool: introduce indexed memory pool

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Xueming Li Oct. 17, 2019, 6:55 a.m. UTC
  Indexed memory pool manages memory entries by index: an allocation from
the pool returns both a memory pointer and an index (ID). Users can save
the ID as a u32, or even a u16, instead of a traditional 8-byte pointer.
Memory can later be retrieved from, or returned to, the pool by index.

The pool allocates backend memory in chunks on demand, so the pool size
grows dynamically. A bitmap tracks entry usage within each chunk, so the
management overhead is one bit per entry.

Standard rte_malloc imposes a malloc overhead (64B) and a minimal data
size (64B). This pool aims to save that cost, as well as the pointer
size. For a scenario like creating millions of rte_flows, each
consisting of small pieces of memory, the difference is huge.

Like the standard memory pool, this lightweight pool only supports
fixed-size memory allocation; a separate pool should be created for each
size.

To facilitate memory allocated by index, a set of ILIST_XXX macros is
defined to operate on entries as a regular LIST.

By setting the entry size to zero, the pool can be used as an ID
generator.

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
---
 lib/librte_mempool/Makefile                |   3 +-
 lib/librte_mempool/rte_indexed_pool.c      | 289 +++++++++++++++++++++
 lib/librte_mempool/rte_indexed_pool.h      | 224 ++++++++++++++++
 lib/librte_mempool/rte_mempool_version.map |   7 +-
 4 files changed, 521 insertions(+), 2 deletions(-)
 create mode 100644 lib/librte_mempool/rte_indexed_pool.c
 create mode 100644 lib/librte_mempool/rte_indexed_pool.h
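
The mechanism the commit message describes can be sketched as a minimal standalone C model (illustrative only; names like `trunk_alloc` are invented here and are not the patch's actual API): a trunk holds fixed-size entries, a per-trunk bitmap marks free entries with one bit each, and the index doubles as the key for both lookup and free.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TRUNK_ENTRIES 128                /* entries per trunk */
#define BITMAP_WORDS (TRUNK_ENTRIES / 64)

struct trunk {
	uint64_t free_map[BITMAP_WORDS]; /* bit set => entry is free */
	uint8_t data[];                  /* TRUNK_ENTRIES * entry_size bytes */
};

static struct trunk *trunk_new(size_t entry_size)
{
	struct trunk *t = malloc(sizeof(*t) + TRUNK_ENTRIES * entry_size);

	if (t != NULL)
		memset(t->free_map, 0xff, sizeof(t->free_map)); /* all free */
	return t;
}

/* Allocate one entry: find a set bit, clear it, return its index. */
static int trunk_alloc(struct trunk *t, uint32_t *idx)
{
	for (int row = 0; row < BITMAP_WORDS; row++) {
		if (!t->free_map[row])
			continue;
		int col = __builtin_ctzll(t->free_map[row]);

		t->free_map[row] &= ~(1ULL << col);
		*idx = (uint32_t)(row * 64 + col);
		return 0;
	}
	return -1; /* trunk full */
}

/* Return an entry to the trunk by index. */
static void trunk_free(struct trunk *t, uint32_t idx)
{
	t->free_map[idx / 64] |= 1ULL << (idx % 64);
}

/* Index -> pointer: plain offset arithmetic inside the trunk. */
static void *trunk_ptr(struct trunk *t, size_t entry_size, uint32_t idx)
{
	return t->data + (size_t)idx * entry_size;
}
```

A freed index is handed out again on the next allocation, which is what lets users treat the u32 as a stable, reusable ID.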
  

Comments

Jerin Jacob Oct. 17, 2019, 7:13 a.m. UTC | #1
On Thu, Oct 17, 2019 at 12:25 PM Xueming Li <xuemingl@mellanox.com> wrote:
>
> Indexed memory pool manages memory entries by index, allocation from
> pool returns both memory pointer and index(ID). users save ID as u32
> or less(u16) instead of traditional 8 bytes pointer. Memory could be
> retrieved from pool or returned to pool later by index.
>
> Pool allocates backend memory in chunk on demand, pool size grows
> dynamically. Bitmap is used to track entry usage in chunk, thus
> management overhead is one bit per entry.
>
> Standard rte_malloc demands malloc overhead(64B) and minimal data
> size(64B). This pool aims to such cost saving also pointer size.
> For scenario like creating millions of rte_flows each consists
> of small pieces of memories, the difference is huge.
>
> Like standard memory pool, this lightweight pool only support fixed
> size memory allocation. Pools should be created for each different
> size.
>
> To facilitate memory allocated by index, a set of ILIST_XXX macro
> defined to operate entries as regular LIST.
>
> By setting entry size to zero, pool can be used as ID generator.
>
> Signed-off-by: Xueming Li <xuemingl@mellanox.com>
> ---
>  lib/librte_mempool/Makefile                |   3 +-
>  lib/librte_mempool/rte_indexed_pool.c      | 289 +++++++++++++++++++++
>  lib/librte_mempool/rte_indexed_pool.h      | 224 ++++++++++++++++

Can this be abstracted over the driver interface instead of creating a
new APIS? ie using drivers/mempool/
  
Xueming Li Oct. 17, 2019, 1:13 p.m. UTC | #2
> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Thursday, October 17, 2019 3:14 PM
> To: Xueming(Steven) Li <xuemingl@mellanox.com>
> Cc: Olivier Matz <olivier.matz@6wind.com>; Andrew Rybchenko
> <arybchenko@solarflare.com>; dpdk-dev <dev@dpdk.org>; Asaf Penso
> <asafp@mellanox.com>; Ori Kam <orika@mellanox.com>
> Subject: Re: [dpdk-dev] [RFC] mempool: introduce indexed memory pool
> 
> On Thu, Oct 17, 2019 at 12:25 PM Xueming Li <xuemingl@mellanox.com> wrote:
> >
> > Indexed memory pool manages memory entries by index, allocation from
> > pool returns both memory pointer and index(ID). users save ID as u32
> > or less(u16) instead of traditional 8 bytes pointer. Memory could be
> > retrieved from pool or returned to pool later by index.
> >
> > Pool allocates backend memory in chunk on demand, pool size grows
> > dynamically. Bitmap is used to track entry usage in chunk, thus
> > management overhead is one bit per entry.
> >
> > Standard rte_malloc demands malloc overhead(64B) and minimal data
> > size(64B). This pool aims to such cost saving also pointer size.
> > For scenario like creating millions of rte_flows each consists of
> > small pieces of memories, the difference is huge.
> >
> > Like standard memory pool, this lightweight pool only support fixed
> > size memory allocation. Pools should be created for each different
> > size.
> >
> > To facilitate memory allocated by index, a set of ILIST_XXX macro
> > defined to operate entries as regular LIST.
> >
> > By setting entry size to zero, pool can be used as ID generator.
> >
> > Signed-off-by: Xueming Li <xuemingl@mellanox.com>
> > ---
> >  lib/librte_mempool/Makefile                |   3 +-
> >  lib/librte_mempool/rte_indexed_pool.c      | 289 +++++++++++++++++++++
> >  lib/librte_mempool/rte_indexed_pool.h      | 224 ++++++++++++++++
> 
> Can this be abstracted over the driver interface instead of creating a new APIS?
> ie using drivers/mempool/

The driver interface manages memory entries with pointers, while this API uses a u32 index as the key...
  
Jerin Jacob Oct. 17, 2019, 4:40 p.m. UTC | #3
On Thu, Oct 17, 2019 at 6:43 PM Xueming(Steven) Li
<xuemingl@mellanox.com> wrote:
>
> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > Sent: Thursday, October 17, 2019 3:14 PM
> > To: Xueming(Steven) Li <xuemingl@mellanox.com>
> > Cc: Olivier Matz <olivier.matz@6wind.com>; Andrew Rybchenko
> > <arybchenko@solarflare.com>; dpdk-dev <dev@dpdk.org>; Asaf Penso
> > <asafp@mellanox.com>; Ori Kam <orika@mellanox.com>
> > Subject: Re: [dpdk-dev] [RFC] mempool: introduce indexed memory pool
> >
> > On Thu, Oct 17, 2019 at 12:25 PM Xueming Li <xuemingl@mellanox.com> wrote:
> > >
> > > Indexed memory pool manages memory entries by index, allocation from
> > > pool returns both memory pointer and index(ID). users save ID as u32
> > > or less(u16) instead of traditional 8 bytes pointer. Memory could be
> > > retrieved from pool or returned to pool later by index.
> > >
> > > Pool allocates backend memory in chunk on demand, pool size grows
> > > dynamically. Bitmap is used to track entry usage in chunk, thus
> > > management overhead is one bit per entry.
> > >
> > > Standard rte_malloc demands malloc overhead(64B) and minimal data
> > > size(64B). This pool aims to such cost saving also pointer size.
> > > For scenario like creating millions of rte_flows each consists of
> > > small pieces of memories, the difference is huge.
> > >
> > > Like standard memory pool, this lightweight pool only support fixed
> > > size memory allocation. Pools should be created for each different
> > > size.
> > >
> > > To facilitate memory allocated by index, a set of ILIST_XXX macro
> > > defined to operate entries as regular LIST.
> > >
> > > By setting entry size to zero, pool can be used as ID generator.
> > >
> > > Signed-off-by: Xueming Li <xuemingl@mellanox.com>
> > > ---
> > >  lib/librte_mempool/Makefile                |   3 +-
> > >  lib/librte_mempool/rte_indexed_pool.c      | 289 +++++++++++++++++++++
> > >  lib/librte_mempool/rte_indexed_pool.h      | 224 ++++++++++++++++
> >
> > Can this be abstracted over the driver interface instead of creating a new APIS?
> > ie using drivers/mempool/
>
> The driver interface manage memory entries with pointers, while this api uses u32 index as key...

I see. As a use case, it makes sense to me.
Have you checked the possibility of reusing/extending
lib/librte_eal/common/include/rte_bitmap.h for the bitmap management,
instead of rolling a new one?
  
Xueming Li Oct. 18, 2019, 10:10 a.m. UTC | #4
> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Friday, October 18, 2019 12:41 AM
> To: Xueming(Steven) Li <xuemingl@mellanox.com>
> Cc: Olivier Matz <olivier.matz@6wind.com>; Andrew Rybchenko
> <arybchenko@solarflare.com>; dpdk-dev <dev@dpdk.org>; Asaf Penso
> <asafp@mellanox.com>; Ori Kam <orika@mellanox.com>; Stephen
> Hemminger <stephen@networkplumber.org>
> Subject: Re: [dpdk-dev] [RFC] mempool: introduce indexed memory pool
> 
> On Thu, Oct 17, 2019 at 6:43 PM Xueming(Steven) Li
> <xuemingl@mellanox.com> wrote:
> >
> > > -----Original Message-----
> > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > Sent: Thursday, October 17, 2019 3:14 PM
> > > To: Xueming(Steven) Li <xuemingl@mellanox.com>
> > > Cc: Olivier Matz <olivier.matz@6wind.com>; Andrew Rybchenko
> > > <arybchenko@solarflare.com>; dpdk-dev <dev@dpdk.org>; Asaf Penso
> > > <asafp@mellanox.com>; Ori Kam <orika@mellanox.com>
> > > Subject: Re: [dpdk-dev] [RFC] mempool: introduce indexed memory pool
> > >
> > > On Thu, Oct 17, 2019 at 12:25 PM Xueming Li <xuemingl@mellanox.com>
> wrote:
> > > >
> > > > Indexed memory pool manages memory entries by index, allocation
> > > > from pool returns both memory pointer and index(ID). users save ID
> > > > as u32 or less(u16) instead of traditional 8 bytes pointer. Memory
> > > > could be retrieved from pool or returned to pool later by index.
> > > >
> > > > Pool allocates backend memory in chunk on demand, pool size grows
> > > > dynamically. Bitmap is used to track entry usage in chunk, thus
> > > > management overhead is one bit per entry.
> > > >
> > > > Standard rte_malloc demands malloc overhead(64B) and minimal data
> > > > size(64B). This pool aims to such cost saving also pointer size.
> > > > For scenario like creating millions of rte_flows each consists of
> > > > small pieces of memories, the difference is huge.
> > > >
> > > > Like standard memory pool, this lightweight pool only support
> > > > fixed size memory allocation. Pools should be created for each
> > > > different size.
> > > >
> > > > To facilitate memory allocated by index, a set of ILIST_XXX macro
> > > > defined to operate entries as regular LIST.
> > > >
> > > > By setting entry size to zero, pool can be used as ID generator.
> > > >
> > > > Signed-off-by: Xueming Li <xuemingl@mellanox.com>
> > > > ---
> > > >  lib/librte_mempool/Makefile                |   3 +-
> > > >  lib/librte_mempool/rte_indexed_pool.c      | 289
> +++++++++++++++++++++
> > > >  lib/librte_mempool/rte_indexed_pool.h      | 224 ++++++++++++++++
> > >
> > > Can this be abstracted over the driver interface instead of creating a new
> APIS?
> > > ie using drivers/mempool/
> >
> > The driver interface manage memory entries with pointers, while this api
> uses u32 index as key...
> 
> I see. As a use case, it makes sense to me.

> Have you checked the possibility reusing/extending
> lib/librte_eal/common/include/rte_bitmap.h for bitmap management,
> instead of rolling a new one?

Yes. rte_bitmap is designed for a fixed bitmap size; to grow it, almost the entire bitmap (array1 + array2) has to be copied.
This pool distributes array2 into each trunk, and the trunk array actually plays the array1 role.
When growing, only array1, which is smaller, is grown; the existing array2 in each trunk is untouched.

The map_xxx() naming might confuse people, so I'll make the following changes in the next version:
	map_get()/map_set(): only used once and the code is simple; move the code into the callers.
	map_is_empty()/map_clear(): unused, remove.
	map_clear_any(): relatively simple; embed it into the caller.
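
The array1/array2 scheme above can be illustrated with a short sketch (hypothetical code, not the patch's implementation): each trunk carries its own bitmap, so growing the pool only reallocates the small trunk-pointer array and never copies existing bitmaps.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct sk_trunk {
	uint64_t free_map;        /* per-trunk usage bitmap ("array2") */
};

struct sk_pool {
	struct sk_trunk **trunks; /* the small trunk array ("array1") */
	unsigned int n_trunk;
	unsigned int cap;
};

static int pool_grow(struct sk_pool *p)
{
	if (p->n_trunk == p->cap) {
		/* Only array1 is reallocated and copied here. */
		unsigned int new_cap = p->cap ? p->cap * 2 : 8;
		struct sk_trunk **nt =
			realloc(p->trunks, new_cap * sizeof(*nt));

		if (nt == NULL)
			return -1;
		memset(nt + p->cap, 0, (new_cap - p->cap) * sizeof(*nt));
		p->trunks = nt;
		p->cap = new_cap;
	}
	/* New trunk brings its own bitmap; old trunks are untouched. */
	struct sk_trunk *t = malloc(sizeof(*t));

	if (t == NULL)
		return -1;
	t->free_map = ~0ULL; /* all entries free */
	p->trunks[p->n_trunk++] = t;
	return 0;
}
```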
  
Jerin Jacob Oct. 19, 2019, 12:28 p.m. UTC | #5
On Fri, 18 Oct, 2019, 3:40 pm Xueming(Steven) Li, <xuemingl@mellanox.com>
wrote:

> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > Sent: Friday, October 18, 2019 12:41 AM
> > To: Xueming(Steven) Li <xuemingl@mellanox.com>
> > Cc: Olivier Matz <olivier.matz@6wind.com>; Andrew Rybchenko
> > <arybchenko@solarflare.com>; dpdk-dev <dev@dpdk.org>; Asaf Penso
> > <asafp@mellanox.com>; Ori Kam <orika@mellanox.com>; Stephen
> > Hemminger <stephen@networkplumber.org>
> > Subject: Re: [dpdk-dev] [RFC] mempool: introduce indexed memory pool
> >
> > On Thu, Oct 17, 2019 at 6:43 PM Xueming(Steven) Li
> > <xuemingl@mellanox.com> wrote:
> > >
> > > > -----Original Message-----
> > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > Sent: Thursday, October 17, 2019 3:14 PM
> > > > To: Xueming(Steven) Li <xuemingl@mellanox.com>
> > > > Cc: Olivier Matz <olivier.matz@6wind.com>; Andrew Rybchenko
> > > > <arybchenko@solarflare.com>; dpdk-dev <dev@dpdk.org>; Asaf Penso
> > > > <asafp@mellanox.com>; Ori Kam <orika@mellanox.com>
> > > > Subject: Re: [dpdk-dev] [RFC] mempool: introduce indexed memory pool
> > > >
> > > > On Thu, Oct 17, 2019 at 12:25 PM Xueming Li <xuemingl@mellanox.com>
> > wrote:
> > > > >
> > > > > Indexed memory pool manages memory entries by index, allocation
> > > > > from pool returns both memory pointer and index(ID). users save ID
> > > > > as u32 or less(u16) instead of traditional 8 bytes pointer. Memory
> > > > > could be retrieved from pool or returned to pool later by index.
> > > > >
> > > > > Pool allocates backend memory in chunk on demand, pool size grows
> > > > > dynamically. Bitmap is used to track entry usage in chunk, thus
> > > > > management overhead is one bit per entry.
> > > > >
> > > > > Standard rte_malloc demands malloc overhead(64B) and minimal data
> > > > > size(64B). This pool aims to such cost saving also pointer size.
> > > > > For scenario like creating millions of rte_flows each consists of
> > > > > small pieces of memories, the difference is huge.
> > > > >
> > > > > Like standard memory pool, this lightweight pool only support
> > > > > fixed size memory allocation. Pools should be created for each
> > > > > different size.
> > > > >
> > > > > To facilitate memory allocated by index, a set of ILIST_XXX macro
> > > > > defined to operate entries as regular LIST.
> > > > >
> > > > > By setting entry size to zero, pool can be used as ID generator.
> > > > >
> > > > > Signed-off-by: Xueming Li <xuemingl@mellanox.com>
> > > > > ---
> > > > >  lib/librte_mempool/Makefile                |   3 +-
> > > > >  lib/librte_mempool/rte_indexed_pool.c      | 289
> > +++++++++++++++++++++
> > > > >  lib/librte_mempool/rte_indexed_pool.h      | 224 ++++++++++++++++
> > > >
> > > > Can this be abstracted over the driver interface instead of creating
> a new
> > APIS?
> > > > ie using drivers/mempool/
> > >
> > > The driver interface manage memory entries with pointers, while this
> api
> > uses u32 index as key...
> >
> > I see. As a use case, it makes sense to me.
>
> > Have you checked the possibility reusing/extending
> > lib/librte_eal/common/include/rte_bitmap.h for bitmap management,
> > instead of rolling a new one?
>
> Yes, the rte_bitmap designed for fixed bitmap size, to grow, have to copy
> almost entire bitmap(array1+array2).
> This pool distribute array2 into each trunk, and the trunk array actually
> plays the array1 role.
> When growing, just grow array1 which is smaller, no touch to existing
> array2 in each trunk.
>

IMO, growing a bitmap is a generic problem, so moving the bitmap
management logic to a common place will be useful for other libraries in
the future. My suggestion would be to enhance rte_bitmap to support
dynamic bitmaps through new APIs.



> The map_xxx() naming might confused people, I'll make following change in
> next version:
>         map_get()/map_set(): only used once and the code is simple, move
> code into caller.
>         map_is_empty()/map_clear()/ : unused, remove
>         map_clear_any(): relative simple, embed into caller.
>
  
Xueming Li Oct. 25, 2019, 3:29 p.m. UTC | #6
>From: Jerin Jacob <jerinjacobk@gmail.com> 
>Sent: Saturday, October 19, 2019 8:28 PM
>
>On Fri, 18 Oct, 2019, 3:40 pm Xueming(Steven) Li, <mailto:xuemingl@mellanox.com> wrote:
>> -----Original Message-----
>> From: Jerin Jacob <mailto:jerinjacobk@gmail.com>
>> Sent: Friday, October 18, 2019 12:41 AM
>> To: Xueming(Steven) Li <mailto:xuemingl@mellanox.com>
>> Cc: Olivier Matz <mailto:olivier.matz@6wind.com>; Andrew Rybchenko
>> <mailto:arybchenko@solarflare.com>; dpdk-dev <mailto:dev@dpdk.org>; Asaf Penso
>> <mailto:asafp@mellanox.com>; Ori Kam <mailto:orika@mellanox.com>; Stephen
>> Hemminger <mailto:stephen@networkplumber.org>
>> Subject: Re: [dpdk-dev] [RFC] mempool: introduce indexed memory pool
>> 
>> On Thu, Oct 17, 2019 at 6:43 PM Xueming(Steven) Li
>> <mailto:xuemingl@mellanox.com> wrote:
>> >
>> > > -----Original Message-----
>> > > From: Jerin Jacob <mailto:jerinjacobk@gmail.com>
>> > > Sent: Thursday, October 17, 2019 3:14 PM
>> > > To: Xueming(Steven) Li <mailto:xuemingl@mellanox.com>
>> > > Cc: Olivier Matz <mailto:olivier.matz@6wind.com>; Andrew Rybchenko
>> > > <mailto:arybchenko@solarflare.com>; dpdk-dev <mailto:dev@dpdk.org>; Asaf Penso
>> > > <mailto:asafp@mellanox.com>; Ori Kam <mailto:orika@mellanox.com>
>> > > Subject: Re: [dpdk-dev] [RFC] mempool: introduce indexed memory pool
>> > >
>> > > On Thu, Oct 17, 2019 at 12:25 PM Xueming Li <mailto:xuemingl@mellanox.com>
>> wrote:
>> > > >
>> > > > Indexed memory pool manages memory entries by index, allocation
>> > > > from pool returns both memory pointer and index(ID). users save ID
>> > > > as u32 or less(u16) instead of traditional 8 bytes pointer. Memory
>> > > > could be retrieved from pool or returned to pool later by index.
>> > > >
>> > > > Pool allocates backend memory in chunk on demand, pool size grows
>> > > > dynamically. Bitmap is used to track entry usage in chunk, thus
>> > > > management overhead is one bit per entry.
>> > > >
>> > > > Standard rte_malloc demands malloc overhead(64B) and minimal data
>> > > > size(64B). This pool aims to such cost saving also pointer size.
>> > > > For scenario like creating millions of rte_flows each consists of
>> > > > small pieces of memories, the difference is huge.
>> > > >
>> > > > Like standard memory pool, this lightweight pool only support
>> > > > fixed size memory allocation. Pools should be created for each
>> > > > different size.
>> > > >
>> > > > To facilitate memory allocated by index, a set of ILIST_XXX macro
>> > > > defined to operate entries as regular LIST.
>> > > >
>> > > > By setting entry size to zero, pool can be used as ID generator.
>> > > >
>> > > > Signed-off-by: Xueming Li <mailto:xuemingl@mellanox.com>
>> > > > ---
>> > > >  lib/librte_mempool/Makefile                |   3 +-
>> > > >  lib/librte_mempool/rte_indexed_pool.c      | 289
>> +++++++++++++++++++++
>> > > >  lib/librte_mempool/rte_indexed_pool.h      | 224 ++++++++++++++++
>> > >
>> > > Can this be abstracted over the driver interface instead of creating a new
>> APIS?
>> > > ie using drivers/mempool/
>> >
>> > The driver interface manage memory entries with pointers, while this api
>> uses u32 index as key...
>> 
>> I see. As a use case, it makes sense to me.
>
>> Have you checked the possibility reusing/extending
>> lib/librte_eal/common/include/rte_bitmap.h for bitmap management,
>> instead of rolling a new one?
>
>Yes, the rte_bitmap designed for fixed bitmap size, to grow, have to copy almost entire bitmap(array1+array2).
>This pool distribute array2 into each trunk, and the trunk array actually plays the array1 role.
>When growing, just grow array1 which is smaller, no touch to existing array2 in each trunk.
>
>IMO, Growing bit map is generic problem so moving bitmap management logic to common place will be usefull for other libraries in future. My suggestion would be to enchanse rte_bitmap to support dynamic bitmap through new APIs.
>
Interesting that people always think of this API as a bitmap; now I start to realize that is meaningful, since the memory is just optional attachment storage for each bit :)
I'll add the missing APIs, such as setting a bit by index, then move it to the eal common folder. Should the header file be rte_bitmap2.h?

>
>
>The map_xxx() naming might confused people, I'll make following change in next version:
>        map_get()/map_set(): only used once and the code is simple, move code into caller.
>        map_is_empty()/map_clear()/ : unused, remove
>        map_clear_any(): relative simple, embed into caller.
>
  
Jerin Jacob Oct. 25, 2019, 4:28 p.m. UTC | #7
On Fri, Oct 25, 2019 at 8:59 PM Xueming(Steven) Li
<xuemingl@mellanox.com> wrote:
>
>
> >From: Jerin Jacob <jerinjacobk@gmail.com>
> >Sent: Saturday, October 19, 2019 8:28 PM
> >
> >On Fri, 18 Oct, 2019, 3:40 pm Xueming(Steven) Li, <mailto:xuemingl@mellanox.com> wrote:
> >> -----Original Message-----
> >> From: Jerin Jacob <mailto:jerinjacobk@gmail.com>
> >> Sent: Friday, October 18, 2019 12:41 AM
> >> To: Xueming(Steven) Li <mailto:xuemingl@mellanox.com>
> >> Cc: Olivier Matz <mailto:olivier.matz@6wind.com>; Andrew Rybchenko
> >> <mailto:arybchenko@solarflare.com>; dpdk-dev <mailto:dev@dpdk.org>; Asaf Penso
> >> <mailto:asafp@mellanox.com>; Ori Kam <mailto:orika@mellanox.com>; Stephen
> >> Hemminger <mailto:stephen@networkplumber.org>
> >> Subject: Re: [dpdk-dev] [RFC] mempool: introduce indexed memory pool
> >>
> >> On Thu, Oct 17, 2019 at 6:43 PM Xueming(Steven) Li
> >> <mailto:xuemingl@mellanox.com> wrote:
> >> >
> >> > > -----Original Message-----
> >> > > From: Jerin Jacob <mailto:jerinjacobk@gmail.com>
> >> > > Sent: Thursday, October 17, 2019 3:14 PM
> >> > > To: Xueming(Steven) Li <mailto:xuemingl@mellanox.com>
> >> > > Cc: Olivier Matz <mailto:olivier.matz@6wind.com>; Andrew Rybchenko
> >> > > <mailto:arybchenko@solarflare.com>; dpdk-dev <mailto:dev@dpdk.org>; Asaf Penso
> >> > > <mailto:asafp@mellanox.com>; Ori Kam <mailto:orika@mellanox.com>
> >> > > Subject: Re: [dpdk-dev] [RFC] mempool: introduce indexed memory pool
> >> > >
> >> > > On Thu, Oct 17, 2019 at 12:25 PM Xueming Li <mailto:xuemingl@mellanox.com>
> >> wrote:
> >> > > >
> >> > > > Indexed memory pool manages memory entries by index, allocation
> >> > > > from pool returns both memory pointer and index(ID). users save ID
> >> > > > as u32 or less(u16) instead of traditional 8 bytes pointer. Memory
> >> > > > could be retrieved from pool or returned to pool later by index.
> >> > > >
> >> > > > Pool allocates backend memory in chunk on demand, pool size grows
> >> > > > dynamically. Bitmap is used to track entry usage in chunk, thus
> >> > > > management overhead is one bit per entry.
> >> > > >
> >> > > > Standard rte_malloc demands malloc overhead(64B) and minimal data
> >> > > > size(64B). This pool aims to such cost saving also pointer size.
> >> > > > For scenario like creating millions of rte_flows each consists of
> >> > > > small pieces of memories, the difference is huge.
> >> > > >
> >> > > > Like standard memory pool, this lightweight pool only support
> >> > > > fixed size memory allocation. Pools should be created for each
> >> > > > different size.
> >> > > >
> >> > > > To facilitate memory allocated by index, a set of ILIST_XXX macro
> >> > > > defined to operate entries as regular LIST.
> >> > > >
> >> > > > By setting entry size to zero, pool can be used as ID generator.
> >> > > >
> >> > > > Signed-off-by: Xueming Li <mailto:xuemingl@mellanox.com>
> >> > > > ---
> >> > > >  lib/librte_mempool/Makefile                |   3 +-
> >> > > >  lib/librte_mempool/rte_indexed_pool.c      | 289
> >> +++++++++++++++++++++
> >> > > >  lib/librte_mempool/rte_indexed_pool.h      | 224 ++++++++++++++++
> >> > >
> >> > > Can this be abstracted over the driver interface instead of creating a new
> >> APIS?
> >> > > ie using drivers/mempool/
> >> >
> >> > The driver interface manage memory entries with pointers, while this api
> >> uses u32 index as key...
> >>
> >> I see. As a use case, it makes sense to me.
> >
> >> Have you checked the possibility reusing/extending
> >> lib/librte_eal/common/include/rte_bitmap.h for bitmap management,
> >> instead of rolling a new one?
> >
> >Yes, the rte_bitmap designed for fixed bitmap size, to grow, have to copy almost entire bitmap(array1+array2).
> >This pool distribute array2 into each trunk, and the trunk array actually plays the array1 role.
> >When growing, just grow array1 which is smaller, no touch to existing array2 in each trunk.
> >
> >IMO, Growing bit map is generic problem so moving bitmap management logic to common place will be usefull for other libraries in future. My suggestion would be to enchanse rte_bitmap to support dynamic bitmap through new APIs.
> >
> Interesting that people always think this api a bitmap, now start to realize it meaningful, memory just an optional attachment storage to each bit :)
> I'll append missing api like set bitmap by index, then move it to eal common folder, the header file should be rte_bitmap2.h?

+ cristian.dumitrescu@intel.com(rte_bitmap.h maintainer) for any comments.

rte_bitmap2.h may not be an intuitive name. One option could be to have
separate APIs for the new case in rte_bitmap.h. No strong opinion on
the code organization in eal.


>
> >
> >
> >The map_xxx() naming might confused people, I'll make following change in next version:
> >        map_get()/map_set(): only used once and the code is simple, move code into caller.
> >        map_is_empty()/map_clear()/ : unused, remove
> >        map_clear_any(): relative simple, embed into caller.
> >
  
Olivier Matz Dec. 26, 2019, 11:05 a.m. UTC | #8
Hi Xueming,

On Thu, Oct 17, 2019 at 06:55:01AM +0000, Xueming Li wrote:
> Indexed memory pool manages memory entries by index, allocation from
> pool returns both memory pointer and index(ID). users save ID as u32
> or less(u16) instead of traditional 8 bytes pointer. Memory could be
> retrieved from pool or returned to pool later by index.
> 
> Pool allocates backend memory in chunk on demand, pool size grows
> dynamically. Bitmap is used to track entry usage in chunk, thus
> management overhead is one bit per entry.
> 
> Standard rte_malloc demands malloc overhead(64B) and minimal data
> size(64B). This pool aims to such cost saving also pointer size.
> For scenario like creating millions of rte_flows each consists
> of small pieces of memories, the difference is huge.
> 
> Like standard memory pool, this lightweight pool only support fixed
> size memory allocation. Pools should be created for each different
> size.
> 
> To facilitate memory allocated by index, a set of ILIST_XXX macro
> defined to operate entries as regular LIST.
> 
> By setting entry size to zero, pool can be used as ID generator.
> 
> Signed-off-by: Xueming Li <xuemingl@mellanox.com>

I think some inputs are missing about the use case and what exact
problem you are trying to solve. Where do you plan to use it in dpdk?

My understanding of your needs are:
- An allocator for objects of similar size (i.e. a pool)
- Avoid wasted memory when allocating many small objects
- Identify objects by index instead of pointer (why?)
- Dynamically growable, i.e. add more objects to the pool on demand
- Quick conversion from id to pointer
- Alloc/free does not need to be fast

Is it correct? Anything else?

What prevents you from using a rte_mempool? It should support dynamic
growing (but I don't think it has ever been tested). The missing feature
is the conversion between id and pointer, but it looks quite easy to me.
The reverse conversion (pointer to id) would be more tricky.

Did I miss a requirement?
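
For the id/pointer conversion mentioned above, the simple case of one contiguous region is just offset arithmetic (sketch under that assumption; real rte_mempool memory may span several chunks, which is what makes pointer-to-id trickier):

```c
#include <stddef.h>
#include <stdint.h>

/* id -> pointer: scale the index by the element size. */
static inline void *id_to_ptr(void *base, size_t elt_size, uint32_t id)
{
	return (uint8_t *)base + (size_t)id * elt_size;
}

/* pointer -> id: the reverse, valid only within one contiguous region. */
static inline uint32_t ptr_to_id(void *base, size_t elt_size, void *obj)
{
	return (uint32_t)(((uint8_t *)obj - (uint8_t *)base) / elt_size);
}
```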

It looks like the indexed_pool allocator is not designed for fast
allocation/free, right? I don't see any bulk-based functions. Also, as
it is now, I don't really see the link with rte_mempool (except that it
is a pool), and I suggest that it could be a separate library instead.

An example of use would be welcome.

Another general comment is about the naming convention. Public
functions, structures, defines should have the same form, as much as
possible: "rte_<libname>_xxx".

Some more comments below.

> ---
>  lib/librte_mempool/Makefile                |   3 +-
>  lib/librte_mempool/rte_indexed_pool.c      | 289 +++++++++++++++++++++
>  lib/librte_mempool/rte_indexed_pool.h      | 224 ++++++++++++++++
>  lib/librte_mempool/rte_mempool_version.map |   7 +-
>  4 files changed, 521 insertions(+), 2 deletions(-)
>  create mode 100644 lib/librte_mempool/rte_indexed_pool.c
>  create mode 100644 lib/librte_mempool/rte_indexed_pool.h
> 
> diff --git a/lib/librte_mempool/Makefile b/lib/librte_mempool/Makefile
> index 20bf63fbca..343e945336 100644
> --- a/lib/librte_mempool/Makefile
> +++ b/lib/librte_mempool/Makefile
> @@ -21,7 +21,8 @@ CFLAGS += -DALLOW_EXPERIMENTAL_API
>  SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool.c
>  SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_ops.c
>  SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_ops_default.c
> +SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_indexed_pool.c
>  # install includes
> -SYMLINK-$(CONFIG_RTE_LIBRTE_MEMPOOL)-include := rte_mempool.h
> +SYMLINK-$(CONFIG_RTE_LIBRTE_MEMPOOL)-include := rte_mempool.h rte_indexed_pool.h
>  
>  include $(RTE_SDK)/mk/rte.lib.mk

meson.build support is missing.

> diff --git a/lib/librte_mempool/rte_indexed_pool.c b/lib/librte_mempool/rte_indexed_pool.c
> new file mode 100644
> index 0000000000..b9069a6555
> --- /dev/null
> +++ b/lib/librte_mempool/rte_indexed_pool.c
> @@ -0,0 +1,289 @@

Copyright/license is missing.

> +#include "rte_indexed_pool.h"
> +
> +#include <string.h>
> +#include <assert.h>
> +
> +#include <rte_eal.h>
> +#include <rte_malloc.h>
> +#include <rte_debug.h>
> +#include <rte_log.h>
> +
> +
> +/* return 1 if bitmap empty after clear. */
> +static int
> +map_clear_any(uint64_t *map, int n, uint32_t *idx)
> +{
> +	int row, col;
> +
> +	*idx = UINT32_MAX;
> +	/* Locate free entry in trunk */
> +	for (row = 0; row < n / 64; row++) {
> +		if (*idx != UINT32_MAX && map[row])
> +			return 0;
> +		if (!map[row])
> +			continue;
> +		col = __builtin_ctzll(map[row]);
> +		*idx = row * 64 + col;
> +		map[row] &= ~(1LU << col);
> +		if (map[row])
> +			return 0;
> +	}
> +	return 1;
> +}

From what I see, map_clear_any() finds a bit set to 1 in the bitmap, sets
it to 0, then returns 1 if the bitmap is all 0. For me, the name of the
function does not reflect what it does. What about take_bit() or
get_bit()?

"int n" could be "size_t len", and 64 should be UINT64_BIT, to
differentiate cases where 64 is an arbitrary value from cases where it
must be the number of bits in a uint64_t.

These comments apply to other functions.
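
One way those suggestions could be applied is sketched below (an illustrative rewrite, keeping the semantics of the original map_clear_any(): clear one set bit, report its index, and return 1 if the bitmap is now all zero):

```c
#include <stdint.h>
#include <stddef.h>

#define UINT64_BIT 64

/* Clear the lowest set bit, store its index in *idx (UINT32_MAX if no
 * bit was set), and return 1 if the bitmap is empty afterwards. */
static int take_bit(uint64_t *map, size_t len, uint32_t *idx)
{
	size_t row;
	int found = 0;

	*idx = UINT32_MAX;
	for (row = 0; row < len / UINT64_BIT; row++) {
		if (!map[row])
			continue;
		if (found)
			return 0; /* a later word still has bits set */
		int col = __builtin_ctzll(map[row]);

		*idx = (uint32_t)(row * UINT64_BIT + col);
		map[row] &= ~(1ULL << col);
		if (map[row])
			return 0; /* same word still has bits set */
		found = 1;
	}
	return 1;
}
```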

> +
> +/* Return 1 if all zero. */
> +static inline int __rte_unused
> +map_is_empty(uint64_t *map, int n)
> +{
> +	int row;
> +
> +	for (row = 0; row < n / 64; row++) {
> +		if (map[row])
> +			return 0;
> +	}
> +	return 1;
> +}
> +
> +
> +static inline void __rte_unused
> +map_clear(uint64_t *map, uint32_t idx)
> +{
> +	int row = idx / 64;
> +	int col = idx % 64;
> +
> +	map[row] &= ~(1LU << col);
> +}
> +
> +static inline void
> +map_set(uint64_t *map, uint32_t idx)
> +{
> +	int row = idx / 64;
> +	int col = idx % 64;
> +
> +	map[row] |= (1LU << col);
> +}
> +
> +static inline uint64_t
> +map_get(uint64_t *map, uint32_t idx)
> +{
> +	int row = idx / 64;
> +	int col = idx % 64;
> +
> +	return map[row] & (1LU << col);
> +}
> +
> +static inline void
> +rte_ipool_init(struct rte_indexed_pool *pool, size_t size)
> +{
> +	pool->size = size;
> +	pool->first_free = TRUNK_INVALID;
> +	if (!pool->malloc)
> +		pool->malloc = rte_malloc_socket;
> +	if (!pool->free)
> +		pool->free = rte_free;
> +	rte_spinlock_init(&pool->lock);
> +}

I see that this function is called by rte_izmalloc(). But there is no
function to initialize a new struct rte_indexed_pool *.

> +
> +static int
> +rte_ipool_grow(struct rte_indexed_pool *pool)
> +{
> +	struct rte_indexed_trunk *trunk;
> +	void *p;
> +
> +	if (pool->n_trunk == UINT16_MAX)
> +		return -ENOMEM;
> +	if (pool->n_trunk == pool->n_trunk_list) {
> +		/* No free trunk flags, expand trunk list. */
> +		int n_grow = pool->n_trunk ? pool->n_trunk :
> +			     RTE_CACHE_LINE_SIZE / sizeof(void *);
> +		/* rte_ralloc is buggy. */
> +		p = pool->malloc(pool->type, (pool->n_trunk + n_grow) * 8,
> +				 RTE_CACHE_LINE_SIZE, rte_socket_id());

8 should be sizeof(uint64)

> +		if (!p)
> +			return -ENOMEM;
> +		if (pool->trunks) {
> +			memcpy(p, pool->trunks, pool->n_trunk * 8);
> +			pool->free(pool->trunks);
> +		}
> +		memset(RTE_PTR_ADD(p, pool->n_trunk * 8), 0, n_grow * 8);
> +		pool->trunks = p;
> +		pool->n_trunk_list += n_grow;
> +	}
> +	assert(pool->trunks[pool->n_trunk] == NULL);
> +	trunk = pool->malloc(pool->type,
> +			     sizeof(*trunk) + pool->size * IDX_POOL_TRUNK_ENTRY,
> +			     RTE_CACHE_LINE_SIZE, rte_socket_id());
> +	if (!trunk)
> +		return -ENOMEM;
> +	pool->trunks[pool->n_trunk] = trunk;
> +	trunk->idx = pool->n_trunk;
> +	trunk->open = 1;
> +	trunk->next = TRUNK_INVALID;
> +	assert(pool->first_free == TRUNK_INVALID);
> +	pool->first_free = pool->n_trunk;
> +	/* Mark all entries as available. */
> +	memset(trunk->free, 0xff, sizeof(trunk->free));
> +	pool->n_trunk++;
> +#ifdef POOL_DEBUG
> +	pool->trunk_new++;
> +	pool->trunk_avail++;
> +#endif
> +	return 0;
> +}
> +
> +void *
> +rte_imalloc(struct rte_indexed_pool *pool, size_t size, uint32_t *idx)
> +{
> +	struct rte_indexed_trunk *trunk;
> +	int empty;
> +	void *p;
> +
> +	if (!pool->trunks)
> +		rte_ipool_init(pool, size);
> +	else if (pool->size != size) {
> +		RTE_LOG(ERR, MEMPOOL,
> +			"Trying to alloc different size from pool\n");
> +		return NULL;
> +	}
> +	if (pool->need_lock)
> +		rte_spinlock_lock(&pool->lock);
> +	while (1) {
> +		if (pool->first_free == TRUNK_INVALID) {
> +			/* If no available trunks, grow new. */
> +			if (rte_ipool_grow(pool)) {
> +				if (pool->need_lock)
> +					rte_spinlock_unlock(&pool->lock);
> +				return NULL;
> +			}
> +			trunk = pool->trunks[pool->first_free];
> +			break;
> +		}
> +		trunk = pool->trunks[pool->first_free];
> +		if (trunk->open)
> +			break;
> +		/* Evict full used trunk from free list. */
> +		pool->first_free = trunk->next;
> +		trunk->next = TRUNK_INVALID;
> +	}

Is this while(1) loop really needed? Can it happen that several full
trunks are in free_list?

> +	assert(pool->first_free != TRUNK_INVALID);
> +	assert(pool->first_free < pool->n_trunk);
> +	assert(trunk->open);
> +	empty = map_clear_any(trunk->free, IDX_POOL_TRUNK_ENTRY, idx);
> +	assert(*idx != UINT32_MAX);
> +	assert(*idx < IDX_POOL_TRUNK_ENTRY);
> +	p = &trunk->data[*idx * pool->size];
> +	*idx += IDX_POOL_TRUNK_ENTRY * trunk->idx;
> +	*idx += 1; /* non-zero index. */
> +#ifdef POOL_DEBUG
> +	pool->n_entry++;
> +#endif
> +	if (empty) {
> +		/* Full trunk will be removed from free list in imalloc. */
> +		trunk->open = 0;

Why not remove it here?

> +#ifdef POOL_DEBUG
> +		pool->trunk_empty++;
> +		pool->trunk_avail--;
> +#endif
> +	}
> +	if (pool->need_lock)
> +		rte_spinlock_unlock(&pool->lock);
> +	return p;
> +}
> +
> +void *
> +rte_izmalloc(struct rte_indexed_pool *pool, size_t size, uint32_t *idx)
> +{
> +	void *entry = rte_imalloc(pool, size, idx);
> +
> +	if (entry)
> +		memset(entry, 0, size);
> +	return entry;
> +}
> +
> +void
> +rte_ifree(struct rte_indexed_pool *pool, uint32_t idx)
> +{
> +	struct rte_indexed_trunk *trunk;
> +
> +	if (!idx)
> +		return;
> +	idx -= 1;
> +	if (pool->need_lock)
> +		rte_spinlock_lock(&pool->lock);
> +	trunk = pool->trunks[idx / IDX_POOL_TRUNK_ENTRY];
> +	assert(trunk);
> +	assert(trunk->idx == idx / IDX_POOL_TRUNK_ENTRY);
> +	map_set(trunk->free, idx % IDX_POOL_TRUNK_ENTRY);

Note: a modulo operation with a non-power of 2 divisor can be slow on
some architectures.
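As an illustration of that note: the patch's 63 * 64 = 4032 entries per trunk forces a real division, while a power-of-two trunk size (a hypothetical 64 * 64 = 4096 here) lets the compiler reduce `/` and `%` to a shift and a mask:

```c
#include <stdint.h>

/* Hypothetical power-of-two entries per trunk: 64 * 64 = 4096. */
#define TRUNK_ENTRY 4096u

/* idx % TRUNK_ENTRY becomes a simple bit mask... */
uint32_t
trunk_offset(uint32_t idx)
{
	return idx & (TRUNK_ENTRY - 1);
}

/* ...and idx / TRUNK_ENTRY becomes a shift. */
uint32_t
trunk_id(uint32_t idx)
{
	return idx >> 12; /* log2(4096) == 12 */
}
```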

> +	if (!trunk->open) {
> +		/* Trunk become available, put into free trunk list. */
> +		trunk->next = pool->first_free;
> +		pool->first_free = trunk->idx;
> +		trunk->open = 1;
> +#ifdef POOL_DEBUG
> +		pool->trunk_empty--;
> +		pool->trunk_avail++;
> +#endif
> +	}
> +#ifdef POOL_DEBUG
> +	pool->n_entry--;
> +#endif
> +	if (pool->need_lock)
> +		rte_spinlock_unlock(&pool->lock);
> +}
> +
> +void *
> +rte_iget(struct rte_indexed_pool *pool, uint32_t idx)
> +{
> +	struct rte_indexed_trunk *trunk;
> +	void *p = NULL;
> +
> +	if (!idx)
> +		return NULL;
> +	if (pool->need_lock)
> +		rte_spinlock_lock(&pool->lock);
> +	idx -= 1;
> +	trunk = pool->trunks[idx / IDX_POOL_TRUNK_ENTRY];
> +	assert(trunk);
> +	assert(trunk->idx == idx / IDX_POOL_TRUNK_ENTRY);
> +	if (trunk) {
> +		assert(!map_get(trunk->free, idx % IDX_POOL_TRUNK_ENTRY));
> +		p = &trunk->data[(idx % IDX_POOL_TRUNK_ENTRY) * pool->size];
> +	}
> +	if (pool->need_lock)
> +		rte_spinlock_unlock(&pool->lock);
> +	return p;
> +}
> +
> +int
> +rte_ipool_close(struct rte_indexed_pool *pool)
> +{
> +	struct rte_indexed_trunk **trunks;
> +	int i;
> +
> +	assert(pool);
> +	if (!pool->trunks)
> +		return 0;
> +	if (pool->need_lock)
> +		rte_spinlock_lock(&pool->lock);
> +	trunks = pool->trunks;
> +	for (i = 0; i < pool->n_trunk_list; i++) {
> +		if (trunks[i])
> +			pool->free(trunks[i]);
> +	}
> +	pool->free(pool->trunks);
> +	memset(pool, 0, sizeof(*pool));
> +	return 0;
> +}
> +
> +void
> +rte_ipool_dump(struct rte_indexed_pool *pool, const char *name)
> +{
> +	printf("Pool %s entry size %u, trunks %hu, %d entry per trunk, total: %d\n",
> +	       name, pool->size, pool->n_trunk, IDX_POOL_TRUNK_ENTRY,
> +	       pool->n_trunk * IDX_POOL_TRUNK_ENTRY);
> +#ifdef POOL_DEBUG
> +	printf("Pool %s entry %ld, trunk alloc %ld, empty: %ld, available %ld free %ld\n",
> +	       name, pool->n_entry, pool->trunk_new, pool->trunk_empty,
> +	       pool->trunk_avail, pool->trunk_free);
> +#endif
> +}
> diff --git a/lib/librte_mempool/rte_indexed_pool.h b/lib/librte_mempool/rte_indexed_pool.h
> new file mode 100644
> index 0000000000..73640c5a1b
> --- /dev/null
> +++ b/lib/librte_mempool/rte_indexed_pool.h
> @@ -0,0 +1,224 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright 2018 Mellanox.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of 6WIND S.A. nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */

SPDX tag should be used.
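For reference, DPDK's convention replaces the full BSD text with a two-line SPDX header of roughly this shape (license identifier assumed from the BSD-3-Clause text above):

```c
/* SPDX-License-Identifier: BSD-3-Clause
 * Copyright 2018 Mellanox.
 */
```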

> +
> +#ifndef RTE_MEM_POOL_H_
> +#define RTE_MEM_POOL_H_
> +
> +#include <unistd.h>
> +#include <stdint.h>
> +
> +#include <rte_spinlock.h>
> +#include <rte_memory.h>
> +
> +#define IDX_POOL_TRUNK_ENTRY (63 * 64)

64 should be UINT64_BIT.

> +#define TRUNK_INVALID UINT16_MAX
> +#define POOL_DEBUG 1
> +
> +struct rte_indexed_trunk {
> +	uint64_t free[IDX_POOL_TRUNK_ENTRY / 64]; /* map of free entries. */
> +	uint16_t idx; /* Trunk id. */
> +	uint16_t next; /* Next free trunk in free list. */
> +	uint8_t open; /* Free entries available, enlisted in free list */
> +	uint8_t data[] __rte_cache_min_aligned; /* Entry data start. */
> +};
> +
> +struct rte_indexed_pool {
> +	rte_spinlock_t lock;
> +	uint32_t size; /* Entry size. */
> +	uint16_t n_trunk; /* Trunks allocated. */
> +	uint16_t n_trunk_list; /* Trunk pointer array size. */
> +	/* Dim of trunk pointer array. */
> +	struct rte_indexed_trunk **trunks;
> +	uint16_t first_free; /* Index to first free trunk. */
> +	int need_lock:1;
> +	int no_trunk_free:1;
> +	const char *type;
> +	void *(*malloc)(const char *type, size_t size, unsigned int align,
> +			int socket);
> +	void (*free)(void *addr);
> +#ifdef POOL_DEBUG
> +	int64_t n_entry;
> +	int64_t trunk_new;
> +	int64_t trunk_avail;
> +	int64_t trunk_empty;
> +	int64_t trunk_free;
> +#endif

I don't think the topology of public structures should change
depending on a #ifdef (in rte_mempool it is done like this but
it is not a good idea).

Also, to facilitate API stability, structures shall be hidden to users
as much as possible.
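One common way to hide the structure, as suggested: expose only a forward declaration in the public header and reach the fields through a constructor and accessors. A minimal sketch of the pattern (names hypothetical, not the proposed API):

```c
#include <stdlib.h>
#include <stddef.h>

/* The public header would only contain:
 *   struct ipool;                              // opaque type
 *   struct ipool *ipool_create(size_t entry_size);
 *   size_t ipool_entry_size(const struct ipool *p);
 *   void ipool_destroy(struct ipool *p);
 * The definition below would live in the .c file, so debug-only
 * fields can change without breaking the ABI. */
struct ipool {
	size_t entry_size;
	/* ... trunks, bitmaps, optional debug counters ... */
};

struct ipool *
ipool_create(size_t entry_size)
{
	struct ipool *p = calloc(1, sizeof(*p));

	if (p != NULL)
		p->entry_size = entry_size;
	return p;
}

size_t
ipool_entry_size(const struct ipool *p)
{
	return p->entry_size; /* accessor instead of direct field access */
}

void
ipool_destroy(struct ipool *p)
{
	free(p);
}
```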

> +};
> +
> +/**
> + * This function allocates non-initialized memory from pool.
> + * In NUMA systems, the memory allocated resides on the same
> + * NUMA socket as the core that calls this function.
> + *
> + * Memory is allocated from memory trunk, no alignment.
> + * If size 0 specified, essential a unique ID generator.
> + *
> + * @param pool
> + *   Pointer to indexed memory pool.
> + *   No initialization required.
> + * @param size
> + *   Size (in bytes) to be allocated.
> + *   Size has to be same in each allocation request to same pool.
> + * @param[out] idx
> + *   Pointer to memory to save allocated index.
> + *   Memory index always positive value.
> + * @return
> + *   - Pointer to the allocated memory
> + *   - NULL on error. Not enough memory, or invalid arguments.
> + */
> +__rte_experimental
> +void *rte_imalloc(struct rte_indexed_pool *pool, size_t size, uint32_t *idx);
> +
> +/**
> + * This function allocates zero initialized memory from pool.
> + * In NUMA systems, the memory allocated resides on the same
> + * NUMA socket as the core that calls this function.
> + *
> + * Memory is allocated from memory trunk, no alignment.
> + * If size 0 specified, essential a unique ID generator.
> + *
> + * @param pool
> + *   Pointer to indexed memory pool.
> + *   No initialization required.
> + * @param size
> + *   Size (in bytes) to be allocated.
> + *   Size has to be same in each allocation request to same pool.
> + * @param[out] idx
> + *   Pointer to memory to save allocated index.
> + *   Memory index always positive value.
> + * @return
> + *   - Pointer to the allocated memory
> + *   - NULL on error. Not enough memory, or invalid arguments.
> + */
> +__rte_experimental
> +void *rte_izmalloc(struct rte_indexed_pool *pool, size_t size, uint32_t *idx);
> +
> +/**
> + * This function free indexed memory to pool.
> + * Caller has to make sure that the index is allocated from same pool.
> + *
> + * @param pool
> + *   Pointer to indexed memory pool.
> + * @param idx
> + *   Allocated memory index.
> + */
> +__rte_experimental
> +void rte_ifree(struct rte_indexed_pool *pool, uint32_t idx);
> +
> +/**
> + * This function return pointer of indexed memory from index.
> + * Caller has to make sure that the index is allocated from same pool.
> + *
> + * @param pool
> + *   Pointer to indexed memory pool.
> + * @param idx
> + *   Pointer of memory index allocated.
> + * @return
> + *   - Pointer to indexed memory.
> + *   - NULL on not found.
> + */
> +__rte_experimental
> +void *rte_iget(struct rte_indexed_pool *pool, uint32_t idx);
> +
> +/**
> + * This function release all resources of pool.
> + * Caller has to make sure that all indexes and memories allocated
> + * from this pool not referenced anymore.
> + *
> + * @param pool
> + *   Pointer to indexed memory pool.
> + * @return
> + *   - non-zero value on error.
> + *   - 0 on success.
> + */
> +__rte_experimental
> +int rte_ipool_close(struct rte_indexed_pool *pool);
> +
> +/**
> + * This function dump debug info of pool.
> + *
> + * @param pool
> + *   Pointer to indexed memory pool.
> + */
> +__rte_experimental
> +void rte_ipool_dump(struct rte_indexed_pool *pool, const char *name);
> +
> +/*
> + * Macros for linked list based on indexed memory.
> + * Example data structure:
> + * struct Foo {
> + *	ILIST_ENTRY(uint16_t) next;
> + *	...
> + * }
> + *
> + */
> +#define ILIST_ENTRY(type)						\
> +struct {								\
> +	type prev; /* Index of previous element. */			\
> +	type next; /* Index of next element. */				\
> +}
> +
> +#define ILIST_INSERT(pool, idx, elem, field, head, peer)		\
> +	do {								\
> +		assert(elem && idx);					\
> +		elem->field.next = *head;				\
> +		elem->field.prev = 0;					\
> +		if (*head) {						\
> +			peer = rte_iget(pool, *head);			\
> +			peer->field.prev = idx;				\
> +		}							\
> +		*head = idx;						\
> +	} while (0)
> +
> +#define ILIST_REMOVE(pool, elem, idx, field, head, peer)		\
> +	do {								\
> +		assert(elem);						\
> +		assert(head);						\
> +		if (elem->field.prev) {					\
> +			peer = rte_iget(pool, elem->field.prev);	\
> +			peer->field.next = elem->field.next;		\
> +		}							\
> +		if (elem->field.next) {					\
> +			peer = rte_iget(pool, elem->field.next);	\
> +			peer->field.prev = elem->field.prev;		\
> +		}							\
> +		if (*head == idx)					\
> +			*head = elem->field.next;			\
> +	} while (0)
> +
> +#define ILIST_FOREACH(pool, head, idx, elem, field)			\
> +	for (idx = head, elem = rte_iget(pool, idx);			\
> +	     elem;							\
> +	     idx = elem->field.next, elem = rte_iget(pool, idx))
> +
> +#endif /* RTE_MEM_POOL_H_ */
> +
> diff --git a/lib/librte_mempool/rte_mempool_version.map b/lib/librte_mempool/rte_mempool_version.map
> index 17cbca4607..3b264189be 100644
> --- a/lib/librte_mempool/rte_mempool_version.map
> +++ b/lib/librte_mempool/rte_mempool_version.map
> @@ -41,7 +41,6 @@ DPDK_17.11 {
>  	global:
>  
>  	rte_mempool_populate_iova;
> -
>  } DPDK_16.07;
>  
>  DPDK_18.05 {
> @@ -57,4 +56,10 @@ EXPERIMENTAL {
>  	global:
>  
>  	rte_mempool_ops_get_info;
> +	rte_imalloc;
> +	rte_izmalloc;
> +	rte_ifree;
> +	rte_iget;
> +	rte_ipool_close;
> +	rte_ipool_dump;
>  };
> -- 
> 2.17.1
> 


Regards,
Olivier
  
Suanming Mou March 5, 2020, 7:43 a.m. UTC | #9
Hi Olivier,

Sorry for the late response.
Xueming is currently busy with other tasks, so I will try to continue
with the RFC.

On 12/26/2019 7:05 PM, Olivier Matz wrote:
> Hi Xueming,
>
> On Thu, Oct 17, 2019 at 06:55:01AM +0000, Xueming Li wrote:
>> Indexed memory pool manages memory entries by index, allocation from
>> pool returns both memory pointer and index(ID). users save ID as u32
>> or less(u16) instead of traditional 8 bytes pointer. Memory could be
>> retrieved from pool or returned to pool later by index.
>>
>> Pool allocates backend memory in chunk on demand, pool size grows
>> dynamically. Bitmap is used to track entry usage in chunk, thus
>> management overhead is one bit per entry.
>>
>> Standard rte_malloc demands malloc overhead(64B) and minimal data
>> size(64B). This pool aims to such cost saving also pointer size.
>> For scenario like creating millions of rte_flows each consists
>> of small pieces of memories, the difference is huge.
>>
>> Like standard memory pool, this lightweight pool only support fixed
>> size memory allocation. Pools should be created for each different
>> size.
>>
>> To facilitate memory allocated by index, a set of ILIST_XXX macro
>> defined to operate entries as regular LIST.
>>
>> By setting entry size to zero, pool can be used as ID generator.
>>
>> Signed-off-by: Xueming Li <xuemingl@mellanox.com>
> I think some inputs are missing about the use case and what exact
> problem you are trying to solve. Where do you plan to use it in dpdk?
>
> My understanding of your needs are:
> - An allocator for objects of similar size (i.e. a pool)
Yes.
> - Avoid wasted memory when allocating many small objects
Yes, exactly. If we allocate millions of small objects with the standard
rte_xalloc() functions, the MALLOC_ELEM_HEADER_LEN overhead costs quite
a lot of memory.
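A back-of-the-envelope sketch of that cost, using the 64-byte header and 64-byte minimum data size quoted in the commit message (assumed figures for illustration, not measured):

```c
#include <stdint.h>

#define HEADER_BYTES 64ULL /* assumed rte_malloc per-allocation header */
#define MIN_DATA     64ULL /* assumed minimum data size per allocation */

/* Total bytes consumed by n allocations of req bytes each. */
uint64_t
malloc_cost(uint64_t n, uint64_t req)
{
	uint64_t data = req < MIN_DATA ? MIN_DATA : req;

	return n * (HEADER_BYTES + data);
}
```

Under these assumptions, one million 8-byte allocations consume 128 MB, against roughly 8 MB of payload plus one tracking bit per entry in the indexed pool.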
> - Identify objects by index instead of pointer (why?)
For a struct that contains many mempool element pointers: since a
pointer is 8 bytes, storing a 4-byte index instead of a pointer saves
4 bytes for every reference.
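A small sketch of that per-link saving (hypothetical structs, 64-bit target assumed):

```c
#include <stdint.h>

struct node_ptr { /* pointer-based doubly linked node */
	struct node_ptr *prev;
	struct node_ptr *next;
}; /* 2 * 8 = 16 bytes on a 64-bit target */

struct node_idx { /* index-based links into an indexed pool */
	uint32_t prev;
	uint32_t next;
}; /* 2 * 4 = 8 bytes */
```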
> - Dynamically growable, i.e. add more objects to the pool on demand
> - Quick conversion from id to pointer
> - Alloc/free does not need to be fast
>
> Is it correct? Anything else?
>
> What prevents you from using a rte_mempool? It should support dynamic
> growing (but I don't think it has ever been tested). The missing feature
> is the conversion between id and pointer, but it looks quite easy to me.
> The reverse conversion (pointer to id) would be more tricky.

Do you mean to add a new mempool device driver? If yes, I appreciate 
the suggestion.

But when we create a new mempool, we have to specify the pool element 
count "n" and populate the pool.

I think for a dynamically growing mempool, populating is not needed, 
since we don't know the exact number of elements it will allocate.

How about adding a new flag, MEMPOOL_F_DYNAMIC_GROW, when creating the 
mempool, and setting "n" to zero? The flag together with the zero count 
would indicate that we are allocating a dynamically growing mempool.

Or just set "n" to zero to indicate dynamic growing, so that no new 
flag is needed?

>
> Did I miss a requirement?
>
> It looks that the indexed_pool allocator is not designed for fast
> allocation/free, right? I don't see any bulk-based functions. Also, as
> it is now, I don't really see the link with rte_mempool (except it is a
> pool), and I suggest that it could be a separated library instead.
Bulk is not needed.
>
> An example of use would be welcome.
>
> Another general comment is about the naming convention. Public
> functions, structures, defines should have the same form, as much as
> possible: "rte_<libname>_xxx".
>
> Some more comments below.
>
>> ---
>>   lib/librte_mempool/Makefile                |   3 +-
>>   lib/librte_mempool/rte_indexed_pool.c      | 289 +++++++++++++++++++++
>>   lib/librte_mempool/rte_indexed_pool.h      | 224 ++++++++++++++++
>>   lib/librte_mempool/rte_mempool_version.map |   7 +-
>>   4 files changed, 521 insertions(+), 2 deletions(-)
>>   create mode 100644 lib/librte_mempool/rte_indexed_pool.c
>>   create mode 100644 lib/librte_mempool/rte_indexed_pool.h
>>
>> diff --git a/lib/librte_mempool/Makefile b/lib/librte_mempool/Makefile
>> index 20bf63fbca..343e945336 100644
>> --- a/lib/librte_mempool/Makefile
>> +++ b/lib/librte_mempool/Makefile
>> @@ -21,7 +21,8 @@ CFLAGS += -DALLOW_EXPERIMENTAL_API
>>   SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool.c
>>   SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_ops.c
>>   SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_ops_default.c
>> +SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_indexed_pool.c
>>   # install includes
>> -SYMLINK-$(CONFIG_RTE_LIBRTE_MEMPOOL)-include := rte_mempool.h
>> +SYMLINK-$(CONFIG_RTE_LIBRTE_MEMPOOL)-include := rte_mempool.h rte_indexed_pool.h
>>   
>>   include $(RTE_SDK)/mk/rte.lib.mk
> meson.build support is missing.
>
>> diff --git a/lib/librte_mempool/rte_indexed_pool.c b/lib/librte_mempool/rte_indexed_pool.c
>> new file mode 100644
>> index 0000000000..b9069a6555
>> --- /dev/null
>> +++ b/lib/librte_mempool/rte_indexed_pool.c
>> @@ -0,0 +1,289 @@
> Copyright/license is missing.
>
>> +#include "rte_indexed_pool.h"
>> +
>> +#include <string.h>
>> +#include <assert.h>
>> +
>> +#include <rte_eal.h>
>> +#include <rte_malloc.h>
>> +#include <rte_debug.h>
>> +#include <rte_log.h>
>> +
>> +
>> +/* return 1 if bitmap empty after clear. */
>> +static int
>> +map_clear_any(uint64_t *map, int n, uint32_t *idx)
>> +{
>> +	int row, col;
>> +
>> +	*idx = UINT32_MAX;
>> +	/* Locate free entry in trunk */
>> +	for (row = 0; row < n / 64; row++) {
>> +		if (*idx != UINT32_MAX && map[row])
>> +			return 0;
>> +		if (!map[row])
>> +			continue;
>> +		col = __builtin_ctzll(map[row]);
>> +		*idx = row * 64 + col;
>> +		map[row] &= ~(1LU << col);
>> +		if (map[row])
>> +			return 0;
>> +	}
>> +	return 1;
>> +}
>> + *
>> + * @param pool
>> + *   Pointer to indexed memory pool.
>> + * @param idx
>> + *   Pointer of memory index allocated.
>> + * @return
>> + *   - Pointer to indexed memory.
>> + *   - NULL on not found.
>> + */
>> +__rte_experimental
>> +void *rte_iget(struct rte_indexed_pool *pool, uint32_t idx);
>> +
>> +/**
>> + * This function release all resources of pool.
>> + * Caller has to make sure that all indexes and memories allocated
>> + * from this pool not referenced anymore.
>> + *
>> + * @param pool
>> + *   Pointer to indexed memory pool.
>> + * @return
>> + *   - non-zero value on error.
>> + *   - 0 on success.
>> + */
>> +__rte_experimental
>> +int rte_ipool_close(struct rte_indexed_pool *pool);
>> +
>> +/**
>> + * This function dump debug info of pool.
>> + *
>> + * @param pool
>> + *   Pointer to indexed memory pool.
>> + */
>> +__rte_experimental
>> +void rte_ipool_dump(struct rte_indexed_pool *pool, const char *name);
>> +
>> +/*
>> + * Macros for linked list based on indexed memory.
>> + * Example data structure:
>> + * struct Foo {
>> + *	ILIST_ENTRY(uint16_t) next;
>> + *	...
>> + * }
>> + *
>> + */
>> +#define ILIST_ENTRY(type)						\
>> +struct {								\
>> +	type prev; /* Index of previous element. */			\
>> +	type next; /* Index of next element. */				\
>> +}
>> +
>> +#define ILIST_INSERT(pool, idx, elem, field, head, peer)		\
>> +	do {								\
>> +		assert(elem && idx);					\
>> +		elem->field.next = *head;				\
>> +		elem->field.prev = 0;					\
>> +		if (*head) {						\
>> +			peer = rte_iget(pool, *head);			\
>> +			peer->field.prev = idx;				\
>> +		}							\
>> +		*head = idx;						\
>> +	} while (0)
>> +
>> +#define ILIST_REMOVE(pool, elem, idx, field, head, peer)		\
>> +	do {								\
>> +		assert(elem);						\
>> +		assert(head);						\
>> +		if (elem->field.prev) {					\
>> +			peer = rte_iget(pool, elem->field.prev);	\
>> +			peer->field.next = elem->field.next;		\
>> +		}							\
>> +		if (elem->field.next) {					\
>> +			peer = rte_iget(pool, elem->field.next);	\
>> +			peer->field.prev = elem->field.prev;		\
>> +		}							\
>> +		if (*head == idx)					\
>> +			*head = elem->field.next;			\
>> +	} while (0)
>> +
>> +#define ILIST_FOREACH(pool, head, idx, elem, field)			\
>> +	for (idx = head, elem = rte_iget(pool, idx);			\
>> +	     elem;							\
>> +	     idx = elem->field.next, elem = rte_iget(pool, idx))
>> +
>> +#endif /* RTE_MEM_POOL_H_ */
>> +
>> diff --git a/lib/librte_mempool/rte_mempool_version.map b/lib/librte_mempool/rte_mempool_version.map
>> index 17cbca4607..3b264189be 100644
>> --- a/lib/librte_mempool/rte_mempool_version.map
>> +++ b/lib/librte_mempool/rte_mempool_version.map
>> @@ -41,7 +41,6 @@ DPDK_17.11 {
>>   	global:
>>   
>>   	rte_mempool_populate_iova;
>> -
>>   } DPDK_16.07;
>>   
>>   DPDK_18.05 {
>> @@ -57,4 +56,10 @@ EXPERIMENTAL {
>>   	global:
>>   
>>   	rte_mempool_ops_get_info;
>> +	rte_imalloc;
>> +	rte_izmalloc;
>> +	rte_ifree;
>> +	rte_iget;
>> +	rte_ipool_close;
>> +	rte_ipool_dump;
>>   };
>> -- 
>> 2.17.1
>>
>
> Regards,
> Olivier
>
  
Morten Brørup March 5, 2020, 9:52 a.m. UTC | #10
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Suanming Mou
> On 12/26/2019 7:05 PM, Olivier Matz wrote:
> > On Thu, Oct 17, 2019 at 06:55:01AM +0000, Xueming Li wrote:
> >> Indexed memory pool manages memory entries by index, allocation from
> >> pool returns both memory pointer and index(ID). users save ID as u32
> >> or less(u16) instead of traditional 8 bytes pointer. Memory could be
> >> retrieved from pool or returned to pool later by index.
> >>
> >> Pool allocates backend memory in chunk on demand, pool size grows
> >> dynamically. Bitmap is used to track entry usage in chunk, thus
> >> management overhead is one bit per entry.
> >>
> >> Standard rte_malloc demands malloc overhead(64B) and minimal data
> >> size(64B). This pool aims to such cost saving also pointer size.
> >> For scenario like creating millions of rte_flows each consists
> >> of small pieces of memories, the difference is huge.
> >>
> >> Like standard memory pool, this lightweight pool only support fixed
> >> size memory allocation. Pools should be created for each different
> >> size.
> >>
> >> To facilitate memory allocated by index, a set of ILIST_XXX macro
> >> defined to operate entries as regular LIST.
> >>
> >> By setting entry size to zero, pool can be used as ID generator.
> >>
> >> Signed-off-by: Xueming Li <xuemingl@mellanox.com>

So, you have a use case where 64 bit pointers use too much memory, and you want to optimize for memory at the cost of performance by using 16, 24 or 32 bit references instead. A lot of compilers have an option to do this, so this is generally a valid optimization from a high level point of view.

I like the general concept, so I have a few high level comments to the RFC:

Your API should separate pool creation from element allocation, i.e. define one function to create a pool and set the element size of that pool, and define other functions to allocate (get) and free (put) elements in a pool.

Furthermore, your implementation takes a lock when referencing an element. Dereferencing an element by its index should be optimized for speed, and should be lockless. Remember: DPDK is a data plane development kit, not a control plane development kit.
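A lockless dereference along these lines is feasible when the trunk pointer array is stable, i.e. trunks are only ever added and never freed while any index is live. A minimal sketch, with all names invented for illustration:

```c
#include <stdint.h>
#include <stddef.h>

#define TRUNK_ENTRIES (63 * 64)	/* entries per trunk, as in the RFC */

/* Hypothetical lockless lookup: index -> trunk -> offset.
 * Safe only if trunks[] and the trunks themselves are never freed
 * while any index is still live (trunks are only ever added). */
static inline void *
ipool_deref(uint8_t **trunks, size_t entry_size, uint32_t idx)
{
	idx -= 1;	/* the RFC hands out 1-based indexes; 0 means "null" */
	return trunks[idx / TRUNK_ENTRIES] +
	       (size_t)(idx % TRUNK_ENTRIES) * entry_size;
}
```

The fast path is then two loads and some arithmetic, with no atomic operations at all; the cost of locking is paid only on the allocation/free path where trunks may change.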

Also consider providing functions to allocate/free consecutive arrays of elements, so they can be dereferenced even faster because only the address of the first element in the array needs to be looked up through your library. Alternatively, provide a function for bulk dereferencing. I don't know if there is a use case for this... just mentioning it. And if the library's dereferencing function is fast enough, this becomes less relevant.

This library will be used for well defined structures, so the library should resemble the Mempool library (for fixed size element allocations) more than the Malloc library (for variable size allocations).

You can consider copying the Mempool API, but returning indexes instead of pointers.

You should also consider building your implementation on top of the Mempool library, like the Mbuf library does. This will give you per-lcore caching and other benefits already provided by the Mempool library.


Finally, a small detail: The macros for using your indexed mempool elements in a linked list should not be part of the library itself. They should be placed in a separate library, so similar macros/functions for using indexed mempool elements in other structures (hashes, queues, etc.) can also be added as separate libraries at a later time.


Med venlig hilsen / kind regards
- Morten Brørup
  
Suanming Mou March 6, 2020, 7:27 a.m. UTC | #11
Hi Morten,

Thanks for the comments.

> -----Original Message-----
> From: Morten Brørup <mb@smartsharesystems.com>
> Sent: Thursday, March 5, 2020 5:52 PM
> To: Suanming Mou <suanmingm@mellanox.com>; Olivier Matz
> <olivier.matz@6wind.com>; Xueming(Steven) Li <xuemingl@mellanox.com>
> Cc: Andrew Rybchenko <arybchenko@solarflare.com>; dev@dpdk.org; Asaf
> Penso <asafp@mellanox.com>; Ori Kam <orika@mellanox.com>
> Subject: RE: [dpdk-dev] [RFC] mempool: introduce indexed memory pool
> 
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Suanming Mou On
> > 12/26/2019 7:05 PM, Olivier Matz wrote:
> > > On Thu, Oct 17, 2019 at 06:55:01AM +0000, Xueming Li wrote:
> > >> Indexed memory pool manages memory entries by index, allocation
> > >> from pool returns both memory pointer and index(ID). users save ID
> > >> as u32 or less(u16) instead of traditional 8 bytes pointer. Memory
> > >> could be retrieved from pool or returned to pool later by index.
> > >>
> > >> Pool allocates backend memory in chunk on demand, pool size grows
> > >> dynamically. Bitmap is used to track entry usage in chunk, thus
> > >> management overhead is one bit per entry.
> > >>
> > >> Standard rte_malloc demands malloc overhead(64B) and minimal data
> > >> size(64B). This pool aims to such cost saving also pointer size.
> > >> For scenario like creating millions of rte_flows each consists of
> > >> small pieces of memories, the difference is huge.
> > >>
> > >> Like standard memory pool, this lightweight pool only support fixed
> > >> size memory allocation. Pools should be created for each different
> > >> size.
> > >>
> > >> To facilitate memory allocated by index, a set of ILIST_XXX macro
> > >> defined to operate entries as regular LIST.
> > >>
> > >> By setting entry size to zero, pool can be used as ID generator.
> > >>
> > >> Signed-off-by: Xueming Li <xuemingl@mellanox.com>
> 
> So, you have a use case where 64 bit pointers use too much memory, and you
> want to optimize for memory at the cost of performance by using 16, 24 or 32
> bit references instead. A lot of compilers have an option to do this, so this is
> generally a valid optimization from a high level point of view.
> 
> I like the general concept, so I have a few high level comments to the RFC:
> 
> Your API should separate pool creation from element allocation, i.e. define one
> function to create a pool and set the element size of that pool, and define other
> functions to allocate (get) and free (put) elements in a pool.
> 
> Furthermore, your implementation takes a lock when referencing an element.
> Dereferencing an element by its index should be optimized for speed, and should
> be lockless. Remember: DPDK is a data plane development kit, not a control
> plane development kit.

Agreed, but this pool is used for the control plane, and there is already a need_lock option for the user to configure.
It also seems the v1 RFC is missing the code that frees a trunk's memory once the trunk is no longer used at all.
In that case, a lock is required with multiple threads, since trunk memory may be freed.
 
> 
> Also consider providing functions to allocate/free consecutive arrays of
> elements, so they can be dereferenced even faster because only the address of
> the first element in the array needs to be looked up through your library.
> Alternatively, provide a function for bulk dereferencing. I don't know if there is a
> use case for this... just mentioning it. And if the library's dereferencing function
> is fast enough, this becomes less relevant.

Currently it is not needed, but once this is implemented as a Mempool driver, it will be supported.

> 
> This library will be used for well defined structures, so the library should
> resemble the Mempool library (for fixed size element allocations) more than the
> Malloc library (for variable size allocations).
> 
> You can consider copying the Mempool API, but returning indexes instead of
> pointers.
> 
> You should also consider building your implementation on top of the Mempool
> library, like the Mbuf library does. This will give you per-lcore caching and other
> benefits already provided by the Mempool library.

Step by step; whether to build on top of the Mempool library will depend on the requirements.

> 
> 
> Finally, a small detail: The macros for using your indexed mempool elements in a
> linked list should not be part of the library itself. They should be placed in a
> separate library, so similar macros/functions for using indexed mempool
> elements in other structures (hashes, queues, etc.) can also be added as
> separate libraries at a later time.

Good suggestion.

> 
> 
> Med venlig hilsen / kind regards
> - Morten Brørup
  
Morten Brørup March 6, 2020, 8:57 a.m. UTC | #12
> From: Suanming Mou [mailto:suanmingm@mellanox.com]
> 
> Hi Morten,
> 
> Thanks for the comments.
> 
> > From: Morten Brørup <mb@smartsharesystems.com>
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Suanming Mou
> On
> > > 12/26/2019 7:05 PM, Olivier Matz wrote:
> > > > On Thu, Oct 17, 2019 at 06:55:01AM +0000, Xueming Li wrote:
> > > >> Indexed memory pool manages memory entries by index, allocation
> > > >> from pool returns both memory pointer and index(ID). users save
> ID
> > > >> as u32 or less(u16) instead of traditional 8 bytes pointer.
> Memory
> > > >> could be retrieved from pool or returned to pool later by index.
> > > >>
> > > >> Pool allocates backend memory in chunk on demand, pool size
> grows
> > > >> dynamically. Bitmap is used to track entry usage in chunk, thus
> > > >> management overhead is one bit per entry.
> > > >>
> > > >> Standard rte_malloc demands malloc overhead(64B) and minimal
> data
> > > >> size(64B). This pool aims to such cost saving also pointer size.
> > > >> For scenario like creating millions of rte_flows each consists
> of
> > > >> small pieces of memories, the difference is huge.
> > > >>
> > > >> Like standard memory pool, this lightweight pool only support
> fixed
> > > >> size memory allocation. Pools should be created for each
> different
> > > >> size.
> > > >>
> > > >> To facilitate memory allocated by index, a set of ILIST_XXX
> macro
> > > >> defined to operate entries as regular LIST.
> > > >>
> > > >> By setting entry size to zero, pool can be used as ID generator.
> > > >>
> > > >> Signed-off-by: Xueming Li <xuemingl@mellanox.com>
> >
> > So, you have a use case where 64 bit pointers use too much memory,
> and you
> > want to optimize for memory at the cost of performance by using 16,
> 24 or 32
> > bit references instead. A lot of compilers have an option to do this,
> so this is
> > generally a valid optimization from a high level point of view.
> >
> > I like the general concept, so I have a few high level comments to
> the RFC:
> >
> > Your API should separate pool creation from element allocation, i.e.
> define one
> > function to create a pool and set the element size of that pool, and
> define other
> > functions to allocate (get) and free (put) elements in a pool.
> >
> > Furthermore, your implementation takes a lock when referencing an
> element.
> > Dereferencing an element by its index should be optimized for speed,
> and should
> > be lockless. Remember: DPDK is a data plane development kit, not a
> control
> > plane development kit.
> 
> Agree with that, however, this is used for control plane. And there is
> already a lock need option for user to configure.

The library must be designed for the data plane, at least at the API level. If done right, this library could become a core library - a genuine "memory optimized" alternative to the Mempool library.

With that said, the implementation can be done step by step, so the first version uses locking and serves your control plane use case well. But there should be a roadmap towards a later version for use in the data plane, preferably lockless.

I am not saying that you are obligated to implement the lockless data plane version. I am only asking you to design the API with the data plane in mind.
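As a concrete illustration of that API split - creation fixes the element size once, and get/put then trade in small indexes - here is a toy fixed-capacity sketch. All names and the intrusive free-list layout are invented; a real version would grow in trunks like the RFC and could be lockless on the dereference path:

```c
#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy fixed-capacity index pool. Indexes are 1-based; 0 is "null". */
struct idx_pool {
	uint32_t elt_size;
	uint32_t capacity;
	uint32_t free_head;	/* head of intrusive free list, 0 = empty */
	uint32_t *next;		/* next[i] = next free index */
	uint8_t *data;		/* capacity * elt_size bytes */
};

static struct idx_pool *
idx_pool_create(uint32_t elt_size, uint32_t capacity)
{
	struct idx_pool *p = calloc(1, sizeof(*p));

	if (p == NULL)
		return NULL;
	p->elt_size = elt_size;
	p->capacity = capacity;
	p->next = malloc((capacity + 1) * sizeof(*p->next));
	p->data = calloc(capacity, elt_size ? elt_size : 1);
	if (p->next == NULL || p->data == NULL) {
		free(p->next);
		free(p->data);
		free(p);
		return NULL;
	}
	/* Chain all entries into the free list: 1 -> 2 -> ... -> 0. */
	for (uint32_t i = 1; i <= capacity; i++)
		p->next[i] = (i < capacity) ? i + 1 : 0;
	p->free_head = 1;
	return p;
}

static int
idx_pool_get(struct idx_pool *p, uint32_t *idx)
{
	if (p->free_head == 0)
		return -1;	/* pool exhausted */
	*idx = p->free_head;
	p->free_head = p->next[*idx];
	return 0;
}

static void
idx_pool_put(struct idx_pool *p, uint32_t idx)
{
	p->next[idx] = p->free_head;
	p->free_head = idx;
}

static void *
idx_pool_deref(const struct idx_pool *p, uint32_t idx)
{
	return p->data + (size_t)(idx - 1) * p->elt_size;
}

static void
idx_pool_destroy(struct idx_pool *p)
{
	free(p->next);
	free(p->data);
	free(p);
}
```

Note how this mirrors the Mempool get/put naming rather than the malloc/free naming of the RFC, which keeps the door open for per-lcore caching later.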

> It seems the v1 RFC misses some code which will free the pool trunk
> memory once the trunk is totally not used anymore.
> In this case, lock is required with multiple threads as trunk memory
> maybe freed.
> 
> >
> > Also consider providing functions to allocate/free consecutive arrays
> of
> > elements, so they can be dereferenced even faster because only the
> address of
> > the first element in the array needs to be looked up through your
> library.
> > Alternatively, provide a function for bulk dereferencing. I don't
> know if there is a
> > use case for this... just mentioning it. And if the library's
> dereferencing function
> > is fast enough, this becomes less relevant.
> 
> Currently, it is not needed. But once implemented as Mempool device
> driver, it will be supported.
> 
> >
> > This library will be used for well defined structures, so the library
> should
> > resemble the Mempool library (for fixed size element allocations)
> more than the
> > Malloc library (for variable size allocations).
> >
> > You can consider copying the Mempool API, but returning indexes
> instead of
> > pointers.
> >
> > You should also consider building your implementation on top of the
> Mempool
> > library, like the Mbuf library does. This will give you per-lcore
> caching and other
> > benefits already provided by the Mempool library.
> 
> Step by step, the on top of Mempool implementation will depend on the
> requirement.
> 
> >
> >
> > Finally, a small detail: The macros for using your indexed mempool
> elements in a
> > linked list should not be part of the library itself. They should be
> placed in a
> > separate library, so similar macros/functions for using indexed
> mempool
> > elements in other structures (hashes, queues, etc.) can also be added
> as
> > separate libraries at a later time.
> 
> Good suggestion.
> 
> >
> >
> > Med venlig hilsen / kind regards
> > - Morten Brørup
  

Patch

diff --git a/lib/librte_mempool/Makefile b/lib/librte_mempool/Makefile
index 20bf63fbca..343e945336 100644
--- a/lib/librte_mempool/Makefile
+++ b/lib/librte_mempool/Makefile
@@ -21,7 +21,8 @@  CFLAGS += -DALLOW_EXPERIMENTAL_API
 SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool.c
 SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_ops.c
 SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_ops_default.c
+SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_indexed_pool.c
 # install includes
-SYMLINK-$(CONFIG_RTE_LIBRTE_MEMPOOL)-include := rte_mempool.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_MEMPOOL)-include := rte_mempool.h rte_indexed_pool.h
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_mempool/rte_indexed_pool.c b/lib/librte_mempool/rte_indexed_pool.c
new file mode 100644
index 0000000000..b9069a6555
--- /dev/null
+++ b/lib/librte_mempool/rte_indexed_pool.c
@@ -0,0 +1,289 @@ 
+#include "rte_indexed_pool.h"
+
+#include <string.h>
+#include <assert.h>
+
+#include <rte_eal.h>
+#include <rte_malloc.h>
+#include <rte_debug.h>
+#include <rte_log.h>
+
+
+/* return 1 if bitmap empty after clear. */
+static int
+map_clear_any(uint64_t *map, int n, uint32_t *idx)
+{
+	int row, col;
+
+	*idx = UINT32_MAX;
+	/* Locate free entry in trunk */
+	for (row = 0; row < n / 64; row++) {
+		if (*idx != UINT32_MAX && map[row])
+			return 0;
+		if (!map[row])
+			continue;
+		col = __builtin_ctzll(map[row]);
+		*idx = row * 64 + col;
+		map[row] &= ~(1LU << col);
+		if (map[row])
+			return 0;
+	}
+	return 1;
+}
+
+/* Return 1 if all zero. */
+static inline int __rte_unused
+map_is_empty(uint64_t *map, int n)
+{
+	int row;
+
+	for (row = 0; row < n / 64; row++) {
+		if (map[row])
+			return 0;
+	}
+	return 1;
+}
+
+
+static inline void __rte_unused
+map_clear(uint64_t *map, uint32_t idx)
+{
+	int row = idx / 64;
+	int col = idx % 64;
+
+	map[row] &= ~(1LU << col);
+}
+
+static inline void
+map_set(uint64_t *map, uint32_t idx)
+{
+	int row = idx / 64;
+	int col = idx % 64;
+
+	map[row] |= (1LU << col);
+}
+
+static inline uint64_t
+map_get(uint64_t *map, uint32_t idx)
+{
+	int row = idx / 64;
+	int col = idx % 64;
+
+	return map[row] & (1LU << col);
+}
+
+static inline void
+rte_ipool_init(struct rte_indexed_pool *pool, size_t size)
+{
+	pool->size = size;
+	pool->first_free = TRUNK_INVALID;
+	if (!pool->malloc)
+		pool->malloc = rte_malloc_socket;
+	if (!pool->free)
+		pool->free = rte_free;
+	rte_spinlock_init(&pool->lock);
+}
+
+static int
+rte_ipool_grow(struct rte_indexed_pool *pool)
+{
+	struct rte_indexed_trunk *trunk;
+	void *p;
+
+	if (pool->n_trunk == UINT16_MAX)
+		return -ENOMEM;
+	if (pool->n_trunk == pool->n_trunk_list) {
+		/* No free trunk flags, expand trunk list. */
+		int n_grow = pool->n_trunk ? pool->n_trunk :
+			     RTE_CACHE_LINE_SIZE / sizeof(void *);
+		/* rte_realloc is buggy. */
+		p = pool->malloc(pool->type, (pool->n_trunk + n_grow) * 8,
+				 RTE_CACHE_LINE_SIZE, rte_socket_id());
+		if (!p)
+			return -ENOMEM;
+		if (pool->trunks) {
+			memcpy(p, pool->trunks, pool->n_trunk * 8);
+			pool->free(pool->trunks);
+		}
+		memset(RTE_PTR_ADD(p, pool->n_trunk * 8), 0, n_grow * 8);
+		pool->trunks = p;
+		pool->n_trunk_list += n_grow;
+	}
+	assert(pool->trunks[pool->n_trunk] == NULL);
+	trunk = pool->malloc(pool->type,
+			     sizeof(*trunk) + pool->size * IDX_POOL_TRUNK_ENTRY,
+			     RTE_CACHE_LINE_SIZE, rte_socket_id());
+	if (!trunk)
+		return -ENOMEM;
+	pool->trunks[pool->n_trunk] = trunk;
+	trunk->idx = pool->n_trunk;
+	trunk->open = 1;
+	trunk->next = TRUNK_INVALID;
+	assert(pool->first_free == TRUNK_INVALID);
+	pool->first_free = pool->n_trunk;
+	/* Mark all entries as available. */
+	memset(trunk->free, 0xff, sizeof(trunk->free));
+	pool->n_trunk++;
+#ifdef POOL_DEBUG
+	pool->trunk_new++;
+	pool->trunk_avail++;
+#endif
+	return 0;
+}
+
+void *
+rte_imalloc(struct rte_indexed_pool *pool, size_t size, uint32_t *idx)
+{
+	struct rte_indexed_trunk *trunk;
+	int empty;
+	void *p;
+
+	if (!pool->trunks)
+		rte_ipool_init(pool, size);
+	else if (pool->size != size) {
+		RTE_LOG(ERR, MEMPOOL,
+			"Trying to alloc different size from pool\n");
+		return NULL;
+	}
+	if (pool->need_lock)
+		rte_spinlock_lock(&pool->lock);
+	while (1) {
+		if (pool->first_free == TRUNK_INVALID) {
+			/* If no available trunks, grow new. */
+			if (rte_ipool_grow(pool)) {
+				if (pool->need_lock)
+					rte_spinlock_unlock(&pool->lock);
+				return NULL;
+			}
+			trunk = pool->trunks[pool->first_free];
+			break;
+		}
+		trunk = pool->trunks[pool->first_free];
+		if (trunk->open)
+			break;
+		/* Evict full used trunk from free list. */
+		pool->first_free = trunk->next;
+		trunk->next = TRUNK_INVALID;
+	}
+	assert(pool->first_free != TRUNK_INVALID);
+	assert(pool->first_free < pool->n_trunk);
+	assert(trunk->open);
+	empty = map_clear_any(trunk->free, IDX_POOL_TRUNK_ENTRY, idx);
+	assert(*idx != UINT32_MAX);
+	assert(*idx < IDX_POOL_TRUNK_ENTRY);
+	p = &trunk->data[*idx * pool->size];
+	*idx += IDX_POOL_TRUNK_ENTRY * trunk->idx;
+	*idx += 1; /* non-zero index. */
+#ifdef POOL_DEBUG
+	pool->n_entry++;
+#endif
+	if (empty) {
+		/* Full trunk will be removed from free list in imalloc. */
+		trunk->open = 0;
+#ifdef POOL_DEBUG
+		pool->trunk_empty++;
+		pool->trunk_avail--;
+#endif
+	}
+	if (pool->need_lock)
+		rte_spinlock_unlock(&pool->lock);
+	return p;
+}
+
+void *
+rte_izmalloc(struct rte_indexed_pool *pool, size_t size, uint32_t *idx)
+{
+	void *entry = rte_imalloc(pool, size, idx);
+
+	if (entry)
+		memset(entry, 0, size);
+	return entry;
+}
+
+void
+rte_ifree(struct rte_indexed_pool *pool, uint32_t idx)
+{
+	struct rte_indexed_trunk *trunk;
+
+	if (!idx)
+		return;
+	idx -= 1;
+	if (pool->need_lock)
+		rte_spinlock_lock(&pool->lock);
+	trunk = pool->trunks[idx / IDX_POOL_TRUNK_ENTRY];
+	assert(trunk);
+	assert(trunk->idx == idx / IDX_POOL_TRUNK_ENTRY);
+	map_set(trunk->free, idx % IDX_POOL_TRUNK_ENTRY);
+	if (!trunk->open) {
+		/* Trunk become available, put into free trunk list. */
+		trunk->next = pool->first_free;
+		pool->first_free = trunk->idx;
+		trunk->open = 1;
+#ifdef POOL_DEBUG
+		pool->trunk_empty--;
+		pool->trunk_avail++;
+#endif
+	}
+#ifdef POOL_DEBUG
+	pool->n_entry--;
+#endif
+	if (pool->need_lock)
+		rte_spinlock_unlock(&pool->lock);
+}
+
+void *
+rte_iget(struct rte_indexed_pool *pool, uint32_t idx)
+{
+	struct rte_indexed_trunk *trunk;
+	void *p = NULL;
+
+	if (!idx)
+		return NULL;
+	if (pool->need_lock)
+		rte_spinlock_lock(&pool->lock);
+	idx -= 1;
+	trunk = pool->trunks[idx / IDX_POOL_TRUNK_ENTRY];
+	assert(trunk);
+	assert(trunk->idx == idx / IDX_POOL_TRUNK_ENTRY);
+	if (trunk) {
+		assert(!map_get(trunk->free, idx % IDX_POOL_TRUNK_ENTRY));
+		p = &trunk->data[(idx % IDX_POOL_TRUNK_ENTRY) * pool->size];
+	}
+	if (pool->need_lock)
+		rte_spinlock_unlock(&pool->lock);
+	return p;
+}
+
+int
+rte_ipool_close(struct rte_indexed_pool *pool)
+{
+	struct rte_indexed_trunk **trunks;
+	int i;
+
+	assert(pool);
+	if (!pool->trunks)
+		return 0;
+	if (pool->need_lock)
+		rte_spinlock_lock(&pool->lock);
+	trunks = pool->trunks;
+	for (i = 0; i < pool->n_trunk_list; i++) {
+		if (trunks[i])
+			pool->free(trunks[i]);
+	}
+	pool->free(pool->trunks);
+	memset(pool, 0, sizeof(*pool));
+	return 0;
+}
+
+void
+rte_ipool_dump(struct rte_indexed_pool *pool, const char *name)
+{
+	printf("Pool %s entry size %u, trunks %hu, %d entry per trunk, total: %d\n",
+	       name, pool->size, pool->n_trunk, IDX_POOL_TRUNK_ENTRY,
+	       pool->n_trunk * IDX_POOL_TRUNK_ENTRY);
+#ifdef POOL_DEBUG
+	printf("Pool %s entry %ld, trunk alloc %ld, empty: %ld, available %ld free %ld\n",
+	       name, pool->n_entry, pool->trunk_new, pool->trunk_empty,
+	       pool->trunk_avail, pool->trunk_free);
+#endif
+}
diff --git a/lib/librte_mempool/rte_indexed_pool.h b/lib/librte_mempool/rte_indexed_pool.h
new file mode 100644
index 0000000000..73640c5a1b
--- /dev/null
+++ b/lib/librte_mempool/rte_indexed_pool.h
@@ -0,0 +1,224 @@ 
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2018 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef RTE_MEM_POOL_H_
+#define RTE_MEM_POOL_H_
+
+#include <unistd.h>
+#include <stdint.h>
+
+#include <rte_spinlock.h>
+#include <rte_memory.h>
+
+#define IDX_POOL_TRUNK_ENTRY (63 * 64)
+#define TRUNK_INVALID UINT16_MAX
+#define POOL_DEBUG 1
+
+struct rte_indexed_trunk {
+	uint64_t free[IDX_POOL_TRUNK_ENTRY / 64]; /* map of free entries. */
+	uint16_t idx; /* Trunk id. */
+	uint16_t next; /* Next free trunk in free list. */
+	uint8_t open; /* Free entries available, enlisted in free list */
+	uint8_t data[] __rte_cache_min_aligned; /* Entry data start. */
+};
+
+struct rte_indexed_pool {
+	rte_spinlock_t lock;
+	uint32_t size; /* Entry size. */
+	uint16_t n_trunk; /* Trunks allocated. */
+	uint16_t n_trunk_list; /* Trunk pointer array size. */
+	/* Dim of trunk pointer array. */
+	struct rte_indexed_trunk **trunks;
+	uint16_t first_free; /* Index to first free trunk. */
+	int need_lock:1;
+	int no_trunk_free:1;
+	const char *type;
+	void *(*malloc)(const char *type, size_t size, unsigned int align,
+			int socket);
+	void (*free)(void *addr);
+#ifdef POOL_DEBUG
+	int64_t n_entry;
+	int64_t trunk_new;
+	int64_t trunk_avail;
+	int64_t trunk_empty;
+	int64_t trunk_free;
+#endif
+};
+
+/**
+ * This function allocates uninitialized memory from the pool.
+ * In NUMA systems, the memory allocated resides on the same
+ * NUMA socket as the core that calls this function.
+ *
+ * Memory is allocated from a memory trunk, with no alignment guarantee.
+ * If size 0 is specified, the pool acts as a pure ID generator.
+ *
+ * @param pool
+ *   Pointer to indexed memory pool.
+ *   No initialization required.
+ * @param size
+ *   Size (in bytes) to be allocated.
+ *   The size must be the same in every allocation request to the same pool.
+ * @param[out] idx
+ *   Pointer to memory where the allocated index is stored.
+ *   The returned index is always a positive value.
+ * @return
+ *   - Pointer to the allocated memory
+ *   - NULL on error. Not enough memory, or invalid arguments.
+ */
+__rte_experimental
+void *rte_imalloc(struct rte_indexed_pool *pool, size_t size, uint32_t *idx);
+
+/**
+ * This function allocates zero-initialized memory from the pool.
+ * In NUMA systems, the memory allocated resides on the same
+ * NUMA socket as the core that calls this function.
+ *
+ * Memory is allocated from a memory trunk, with no alignment guarantee.
+ * If size 0 is specified, the pool acts as a pure ID generator.
+ *
+ * @param pool
+ *   Pointer to indexed memory pool.
+ *   No initialization required.
+ * @param size
+ *   Size (in bytes) to be allocated.
+ *   The size must be the same in every allocation request to the same pool.
+ * @param[out] idx
+ *   Pointer to memory where the allocated index is stored.
+ *   The returned index is always a positive value.
+ * @return
+ *   - Pointer to the allocated memory
+ *   - NULL on error. Not enough memory, or invalid arguments.
+ */
+__rte_experimental
+void *rte_izmalloc(struct rte_indexed_pool *pool, size_t size, uint32_t *idx);
+
+/**
+ * This function frees indexed memory back to the pool.
+ * The caller must make sure that the index was allocated from the same pool.
+ *
+ * @param pool
+ *   Pointer to indexed memory pool.
+ * @param idx
+ *   Allocated memory index.
+ */
+__rte_experimental
+void rte_ifree(struct rte_indexed_pool *pool, uint32_t idx);
+
+/**
+ * This function returns the memory pointer for a given index.
+ * The caller must make sure that the index was allocated from the same pool.
+ *
+ * @param pool
+ *   Pointer to indexed memory pool.
+ * @param idx
+ *   Memory index previously allocated.
+ * @return
+ *   - Pointer to indexed memory.
+ *   - NULL on not found.
+ */
+__rte_experimental
+void *rte_iget(struct rte_indexed_pool *pool, uint32_t idx);
+
+/**
+ * This function releases all resources of the pool.
+ * The caller must make sure that no indexes or memory allocated
+ * from this pool are referenced anymore.
+ *
+ * @param pool
+ *   Pointer to indexed memory pool.
+ * @return
+ *   - non-zero value on error.
+ *   - 0 on success.
+ */
+__rte_experimental
+int rte_ipool_close(struct rte_indexed_pool *pool);
+
+/**
+ * This function dumps debug info of the pool.
+ *
+ * @param pool
+ *   Pointer to indexed memory pool.
+ * @param name
+ *   Pool name to show in the dump output.
+ */
+__rte_experimental
+void rte_ipool_dump(struct rte_indexed_pool *pool, const char *name);
+
+/*
+ * Macros for linked list based on indexed memory.
+ * Example data structure:
+ * struct Foo {
+ *	ILIST_ENTRY(uint16_t) next;
+ *	...
+ * }
+ *
+ */
+#define ILIST_ENTRY(type)						\
+struct {								\
+	type prev; /* Index of previous element. */			\
+	type next; /* Index of next element. */				\
+}
+
+#define ILIST_INSERT(pool, idx, elem, field, head, peer)		\
+	do {								\
+		assert(elem && idx);					\
+		elem->field.next = *head;				\
+		elem->field.prev = 0;					\
+		if (*head) {						\
+			peer = rte_iget(pool, *head);			\
+			peer->field.prev = idx;				\
+		}							\
+		*head = idx;						\
+	} while (0)
+
+#define ILIST_REMOVE(pool, elem, idx, field, head, peer)		\
+	do {								\
+		assert(elem);						\
+		assert(head);						\
+		if (elem->field.prev) {					\
+			peer = rte_iget(pool, elem->field.prev);	\
+			peer->field.next = elem->field.next;		\
+		}							\
+		if (elem->field.next) {					\
+			peer = rte_iget(pool, elem->field.next);	\
+			peer->field.prev = elem->field.prev;		\
+		}							\
+		if (*head == idx)					\
+			*head = elem->field.next;			\
+	} while (0)
+
+#define ILIST_FOREACH(pool, head, idx, elem, field)			\
+	for (idx = head, elem = rte_iget(pool, idx);			\
+	     elem;							\
+	     idx = elem->field.next, elem = rte_iget(pool, idx))
+
+#endif /* RTE_MEM_POOL_H_ */
+
diff --git a/lib/librte_mempool/rte_mempool_version.map b/lib/librte_mempool/rte_mempool_version.map
index 17cbca4607..3b264189be 100644
--- a/lib/librte_mempool/rte_mempool_version.map
+++ b/lib/librte_mempool/rte_mempool_version.map
@@ -41,7 +41,6 @@  DPDK_17.11 {
 	global:
 
 	rte_mempool_populate_iova;
-
 } DPDK_16.07;
 
 DPDK_18.05 {
@@ -57,4 +56,10 @@  EXPERIMENTAL {
 	global:
 
 	rte_mempool_ops_get_info;
+	rte_imalloc;
+	rte_izmalloc;
+	rte_ifree;
+	rte_iget;
+	rte_ipool_close;
+	rte_ipool_dump;
 };