eal: add option --iso-cmem for external custom memory
diff mbox series

Message ID 20191015053047.52260-1-ajit.khaparde@broadcom.com
State Rejected, archived
Delegated to: David Marchand
Headers show
Series
  • eal: add option --iso-cmem for external custom memory
Related show

Checks

Context Check Description
ci/iol-compilation success Compile Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/Intel-compilation success Compilation OK
ci/checkpatch success coding style OK
ci/iol-mellanox-Performance success Performance Testing PASS

Commit Message

Ajit Khaparde Oct. 15, 2019, 5:30 a.m. UTC
From: Rajesh Ravi <rajesh.ravi@broadcom.com>

Support external custom memory added to the heap being used with VFIO,
via a new --iso-cmem option. Type1 memory mapping is bypassed for
external memory, so an exception is added to allow external custom
memory to be used with VFIO.

Signed-off-by: Rajesh Ravi <rajesh.ravi@broadcom.com>
Reviewed-by: Jonathan Richardson <jonathan.richardson@broadcom.com>
Reviewed-by: Scott Branden <scott.branden@broadcom.com>
Reviewed-by: Vikram Mysore Prakash <vikram.prakash@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
---
 lib/librte_eal/common/eal_common_options.c | 5 +++++
 lib/librte_eal/common/eal_internal_cfg.h   | 1 +
 lib/librte_eal/common/eal_options.h        | 2 ++
 lib/librte_eal/linux/eal/eal_vfio.c        | 2 +-
 4 files changed, 9 insertions(+), 1 deletion(-)

Comments

Burakov, Anatoly Oct. 17, 2019, 3:45 p.m. UTC | #1
On 15-Oct-19 6:30 AM, Ajit Khaparde wrote:
> From: Rajesh Ravi <rajesh.ravi@broadcom.com>
> 
> Support external custom memory added to heap to be used with vfio with
> --iso-cmem option.Type1 memory mapping was by passed for external memory.
> But an exception is added to allow external custom memory to be used with
> vfio.
> 
> Signed-off-by: Rajesh Ravi <rajesh.ravi@broadcom.com>
> Reviewed-by: Jonathan Richardson <jonathan.richardson@broadcom.com>
> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
> Reviewed-by: Vikram Mysore Prakash <vikram.prakash@broadcom.com>
> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
> ---

Hi,

First of all, what is "iso-cmem"? It doesn't seem to have any defined 
meaning nor any relation to any existing functionality, and it's not 
explained anywhere what is "isolated cmem".

More importantly, why is this necessary? Type1 map only bypasses 
external segments when adding memory at startup - it doesn't stop you 
from calling rte_vfio_dma_map() to map the memory with VFIO when you 
create the segment.
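The alternative suggested here, creating the segment and then mapping it explicitly, could be sketched as follows. This is a minimal illustration only: the stub bodies stand in for the real DPDK functions (signatures loosely follow the DPDK 19.x headers), and the heap name is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stub stand-ins for the real DPDK calls: the bodies here only record
 * that the call happened, so the orchestration can be shown without a
 * running EAL. */
static int dma_mapped;

static int rte_malloc_heap_memory_add(const char *heap_name, void *va,
                                      size_t len, uint64_t iova_addrs[],
                                      unsigned int n_pages, size_t page_sz)
{
    (void)heap_name; (void)va; (void)len;
    (void)iova_addrs; (void)n_pages; (void)page_sz;
    return 0; /* pretend the external segment was added to the heap */
}

static int rte_vfio_dma_map(uint64_t vaddr, uint64_t iova, uint64_t len)
{
    (void)vaddr; (void)iova; (void)len;
    dma_mapped = 1; /* the real call would issue VFIO_IOMMU_MAP_DMA */
    return 0;
}

/* Add an external segment to a named heap, then explicitly map it for
 * DMA with VFIO, as suggested in the review. */
static int add_and_map(void *va, size_t len, uint64_t iova)
{
    uint64_t iovas[1] = { iova };

    if (rte_malloc_heap_memory_add("cmem_heap", va, len, iovas, 1, len) != 0)
        return -1;
    return rte_vfio_dma_map((uint64_t)(uintptr_t)va, iova, len);
}
```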
Rajesh Ravi Oct. 18, 2019, 10:54 a.m. UTC | #2
+ Srinath

Thanks, Anatoly, for reviewing this. Please find my replies inline below:

[Anatoly] First of all, what is "iso-cmem"? It doesn't seem to have any
defined
meaning nor any relation to any existing functionality, and it's not
explained anywhere what is "isolated cmem".
[Rajesh] I'll correct the commit message to include a clearer explanation.

*Context & purpose*
We're using this to improve SPDK performance. When NVMe completion queues
are allocated from a certain memory range, out-of-order completions are
enabled by our PCIe hardware, which improves performance.

*Usage*
spdk_env_init() --> (calls rte_eal_init(), then calls)
spdk_env_dpdk_post_init() [file: spdk/lib/env_dpdk/init.c]
--> spdk_mem_map_init() [lib/env_dpdk/memory.c] --> map_cmem_virtual_area()
[this is our custom function]
In map_cmem_virtual_area() we mmap an anonymous, private region and then
create an IOVA table whose addresses fall in the custom memory region.
Then we call rte_malloc_heap_memory_add() and allocate NVMe completion
queues from this heap.

[Anatoly] More importantly, why is this necessary? Type1 map only bypasses
external segments when adding memory at startup - it doesn't stop you
from calling rte_vfio_dma_map() to map the memory with VFIO when you
create the segment.
[Rajesh] Please correct me if I'm wrong or missing something here.
  A) We wanted to create a heap from which we can allocate dynamically.

  B) I see type1_map()
[file: dpdk/lib/librte_eal/linuxapp/eal/eal_vfio.c] being called once
rte_malloc_heap_memory_add() is invoked.
       SPDK uses the Type1 DMA map [spdk/lib/env_dpdk/memory.c];
       vfio_type1_dma_map() is registered as the .dma_map_func member
[dpdk/lib/librte_eal/linuxapp/eal/eal_vfio.c],
       so type1_map() is called. Not sure whether we can change this.

Regards,
Rajesh

Burakov, Anatoly Oct. 18, 2019, 4:53 p.m. UTC | #3
On 18-Oct-19 11:54 AM, Rajesh Ravi wrote:
> + Srinath
> 
> Thanks Anatoly for reviewing this. Please find my reply inline  below:
> 
> [Anatoly] First of all, what is "iso-cmem"? It doesn't seem to have any 
> defined
> meaning nor any relation to any existing functionality, and it's not
> explained anywhere what is "isolated cmem".
> [Rajesh] I 'll correct commit message to include clearer explanation.
> _
> _
> _Context & purpose_
> We 're using this to improve SPDK performance. When NVMe completion 
> queues are allocated from a certain memory range,
> out of order competions completions  are enabled with our PCIe and it 
> improves performance.
> 
> _Usage_
> spdk_env_init()-->(calls rte_eal_init(), and then calls 
> )spdk_env_dpdk_post_init() [file: spdk/lib/env_dpdk/init.c] 
> --> spdk_mem_map_init() 
> [lib/env_dpdk/memory.c]-->map_cmem_virtual_area() [this is our custom 
> function]
> In map_cmem_virtual_area(): we 're mmaping anonymous &  private region 
> and then creating iova table which makes iova addresses which fall in  
> custom memory region.
> Then we call rte_malloc_heap_memory_add() and then allocate NVMe 
> completion queues from this heap.
> 
> [Anatoly] More importantly, why is this necessary? Type1 map only bypasses
> external segments when adding memory at startup - it doesn't stop you
> from calling rte_vfio_dma_map() to map the memory with VFIO when you
> create the segment.
> [Rajesh] Please correct me if I'm wrong or  missing some thing here.
>    A) We wanted to create a heap from which we can allocate dynamically
> 
>    B) I see that type1_map()  
> [file:dpdk/lib/librte_eal/linuxapp/eal/eal_vfio.c] getting called once 
> rte_malloc_heap_memory_add()() was invoked.
>         SPDK uses type1 dma map [spdk/lib/env_dpdk/memory.c]
>          vfio_type1_dma_map() is registered for .dma_map_func member 
> [dpdk/lib/librte_eal/linuxapp/eal/eal_vfio.c]
>          So type1_map is called. Not sure whether we can change this.
> 

I'm not sure I'm following. You mean you *don't* want this mapping to be
performed?
Rajesh Ravi Oct. 21, 2019, 3:46 p.m. UTC | #4
Thanks, Anatoly, for the prompt reply, and sorry for my delayed response; I
took some time to re-verify with SPDK.

In fact, I do want the IOMMU mapping to be performed. I don't want it to be
bypassed by type1_map() [lib/librte_eal/linuxapp/eal/eal_vfio.c] for
external memory.

Then, if I understood correctly, you suggested calling rte_vfio_dma_map()
as an alternative.

*Context & clarification*

1) We're using DPDK to manage/allocate memory for SPDK through the heap API.

  The only DPDK APIs we're calling are:
  A) rte_malloc_heap_memory_add() to add external memory to the heap.
  B) rte_malloc_heap_get_socket() & rte_malloc_socket() to allocate memory.

*Are you suggesting we make a call to rte_vfio_dma_map() from SPDK, in
addition to the APIs listed under 1)A and 1)B, instead of modifying the
DPDK VFIO functions?*
   Please confirm; I may have missed the context and not understood fully.


2) .dma_user_map_func = vfio_type1_dma_mem_map() is called from two paths in
DPDK. In either case the call to dma_user_map_func() is skipped.

   A) *During startup, as you said before:*
      rte_vfio_setup_device() --> type1_map()

   B) During an allocation event:
      vfio_mem_event_callback() (type=RTE_MEM_EVENT_ALLOC, ...)
      --> vfio_dma_mem_map() --> dma_user_map_func()
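The dispatch described in 2) can be modeled with a small ops table. The names loosely mirror eal_vfio.c, but the struct layout and bodies here are illustrative stubs only:

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: the IOMMU-type-specific mapping routine is reached through
 * a .dma_user_map_func pointer in an ops table, so both call paths
 * (device setup and the mem-event callback) funnel into the same Type1
 * function. */
struct vfio_iommu_type {
    const char *name;
    int (*dma_user_map_func)(int vfio_fd, uint64_t vaddr,
                             uint64_t iova, uint64_t len, int do_map);
};

static int type1_map_calls;

static int vfio_type1_dma_mem_map(int vfio_fd, uint64_t vaddr,
                                  uint64_t iova, uint64_t len, int do_map)
{
    (void)vfio_fd; (void)vaddr; (void)iova; (void)len; (void)do_map;
    type1_map_calls++; /* the real code issues the VFIO_IOMMU_MAP_DMA ioctl */
    return 0;
}

static const struct vfio_iommu_type iommu_type1 = {
    .name = "Type1",
    .dma_user_map_func = vfio_type1_dma_mem_map,
};

/* Both path A (startup) and path B (allocation event) end up here. */
static int vfio_dma_mem_map(const struct vfio_iommu_type *t,
                            uint64_t vaddr, uint64_t iova, uint64_t len)
{
    return t->dma_user_map_func(-1, vaddr, iova, len, 1);
}
```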
----------------------------------------------------------------------
*Conclusion*

So we have two alternatives:

A) Make an additional call to the rte_vfio_dma_map() API after adding
external memory using the rte_malloc_heap_memory_add() API.

B) Remove the msl->external check which bypasses the call to
dma_user_map_func() in DPDK.

I modified the DPDK functions [option B)]. I guess you're suggesting
option A).

Please confirm.


Regards,
Rajesh

Rajesh Ravi Oct. 22, 2019, 7:56 a.m. UTC | #5
Hi Anatoly,
I tried calling rte_vfio_dma_map(), but it failed for me:

vfio_dma_mem_map() failed with the error "VFIO support not initialized",
because default_vfio_cfg->vfio_iommu_type == NULL (container_dma_map()
calls vfio_dma_mem_map() with vfio_cfg = default_vfio_cfg).

Call sequence: rte_vfio_dma_map() --> container_dma_map()
--> vfio_dma_mem_map()

Regards,
Rajesh

Burakov, Anatoly Oct. 24, 2019, 11:43 a.m. UTC | #6
On 22-Oct-19 8:56 AM, Rajesh Ravi wrote:
> Hi Anatoly,
> I tried calling rte_vfio_dma_map() but, it  failed for me because
> 
> vfio_dma_mem_map() failed with error:VFIO support not initialized  
> because: default_vfio_cfg->vfio_iommu_type=NULL ( container_dma_map 
> calls vfio_dma_mem_map  with vfio_cfg = default_vfio_cfg )
> 
> call sequence: rte_vfio_dma_map() -->container_dma_map() 
> --> vfio_dma_mem_map
> 
> Regards,
> Rajesh
> 
> On Mon, Oct 21, 2019 at 9:16 PM Rajesh Ravi <rajesh.ravi@broadcom.com 
> <mailto:rajesh.ravi@broadcom.com>> wrote:
> 
>     Thanks Anatoly for prompt response. Sorry for the delayed response,
>     I took some time to reverify with SPDK.
> 
>     Infact, I do want the iommu mapping to be performed. I don't want it
>     to be bypassed by type1_map()
>     [lib/librte_eal/linuxapp/eal/eal_vfio.c] for external memory.
> 
>     Then, if I understood correctly, you suggested to call
>     rte_vfio_dma_map() as an alternative.
> 
>     _Context & clarification_
> 
>     1) We 're using DPDK to manage/allocate memory for SPDK through heap
>     API.
> 
>        The only DPDK APIs we 're calling are:
>        A)rte_malloc_heap_memory_add() to add external memory to heap.
>        B)rte_malloc_heap_get_socket() & rte_malloc_socket() to allocate
>     memory
> 
>     /Are you suggesting to make a call to rte_vfio_dma_map() from spdk,
>     in addition to the APIs listed under 1)A & 1)B instead of modifying
>     DPDK vfio functions?/
>         Please confirm, Probably I missed the context and might not have
>     understood fully.
> 
> 
>     2) .dma_user_map_func=vfio_type1_dma_mem_map() is called from 2
>     paths in dpdk. In either case call to dma_user_map_func() is skipped.
>         A) _During the startup, as you said before:_
>           rte_vfio_setup_device() --> type1_map()
> 
>         B)During allocation event:
>              vfio_mem_event_callback()  (type=RTE_MEM_EVENT_ALLOC,..)
>     -->vfio_dma_mem_map() -->dma_user_map_func()
>     ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>     _Conclusion_
> 
>     So we have 2 alternatives:
> 
>     A) make additional call to rte_vfio_dma_map() API after adding
>     external memory using rte_malloc_heap_memory_add()  API.
> 
>     B) remove msl->external check which bypasses call to
>     dma_user_map_func() in DPDK.
> 
>     I modified DPDK functions [Option B) ]. I guess you 're
>     suggesting option A)
> 
>     Please confirm.
> 
>     Regards,
>     Rajesh

Hi Rajesh,

Apologies for delayed response.

I'm still not sure I understand your problem.

When adding memory with rte_malloc_heap_memory_add(), this memory is, as
far as I can tell, automatically mapped for DMA with VFIO. This
happens because:

rte_malloc_heap_memory_add() -> malloc_heap_add_external_memory() -> 
eal_memalloc_mem_event_notify() -> vfio_mem_event_callback() -> 
vfio_dma_mem_map()

VFIO registers itself for mem event callbacks in 
eal_vfio.c:rte_vfio_setup_device():791.

So, there is no need to map anything for DMA when you're creating 
external heaps - it is being mapped automatically by EAL.
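The registration-plus-notification chain above can be modeled with stubs. The names loosely mirror the EAL functions; the bodies are illustrative only:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: VFIO registers a mem-event callback once at device setup,
 * and adding external memory to a heap notifies that callback, which
 * performs the DMA mapping -- no explicit map call from the app. */
enum mem_event { MEM_EVENT_ALLOC, MEM_EVENT_FREE };

typedef void (*mem_event_cb_t)(enum mem_event ev, const void *addr, size_t len);

static mem_event_cb_t registered_cb;
static int vfio_mappings;

static void vfio_mem_event_callback(enum mem_event ev,
                                    const void *addr, size_t len)
{
    (void)addr; (void)len;
    if (ev == MEM_EVENT_ALLOC)
        vfio_mappings++; /* stands in for vfio_dma_mem_map() */
}

/* Models rte_vfio_setup_device() registering the callback at startup. */
static void setup_device(void)
{
    registered_cb = vfio_mem_event_callback;
}

/* Models malloc_heap_add_external_memory() notifying registered callbacks. */
static int heap_memory_add(const void *va, size_t len)
{
    if (registered_cb != NULL)
        registered_cb(MEM_EVENT_ALLOC, va, len);
    return 0;
}
```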

For an example implementation have a look at testpmd, specifically 
testpmd.c:mbuf_pool_create() for MP_ALLOC_XMEM type. This calls 
testpmd.c:setup_extmem(), which creates an external heap. Notice how it 
doesn't do anything to map the memory for DMA, because that is already 
done by EAL. You can verify this working by adding --mp-alloc=xmem 
argument to testpmd to force it to use external memory for its mempools.

So, what is the actual issue here, given that the memory is mapped for 
DMA by EAL automatically when it is added?
Rajesh Ravi Oct. 25, 2019, 12:53 p.m. UTC | #7
Thanks, Anatoly.
I think we're now close to being in sync. You understood it right: the
problem is that the DMA/IOMMU mappings are not getting created.

1) vfio_mem_event_callback() is called later, when memory is allocated
from the heap, but not when external memory is added to the heap.

2) vfio_dma_mem_map() is called inside vfio_mem_event_callback(), but it's
skipped in the case of external memory because it's invoked inside the
following if condition:
    if (rte_eal_iova_mode() == RTE_IOVA_VA && msl->external == 0)

    [I also verified this with gdb by stepping line by line inside
vfio_mem_event_callback().]

We're using DPDK v19.02, but I see almost the same code and flow in DPDK
19.08-rc4 as well.
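The quoted guard can be modeled as a small predicate to make the skip explicit. This illustrates only the condition as described in the thread, not the full callback logic:

```c
#include <assert.h>
#include <stdbool.h>

enum iova_mode { IOVA_PA, IOVA_VA };

/* Models the guard in vfio_mem_event_callback():
 *     if (rte_eal_iova_mode() == RTE_IOVA_VA && msl->external == 0)
 * The whole-chunk DMA map on this path is only taken for *internal*
 * memory in IOVA-as-VA mode, which is why, as described here, external
 * heap segments end up without a mapping. */
static bool whole_chunk_map_taken(enum iova_mode mode, bool msl_external)
{
    return mode == IOVA_VA && !msl_external;
}
```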
----------------------------------------------------------------------
[*Further explanation of point 1):*
 I used gdb, set breakpoints, and checked the backtrace to verify the call
stack: first I set a breakpoint at rte_malloc_heap_memory_add().
After hitting it, I set a breakpoint at vfio_mem_event_callback(). The
backtrace shows it is reached from malloc_heap_alloc() with
event = RTE_MEM_EVENT_ALLOC, as can be seen from the log at the end of
this mail.

 I also don't see any event for memory ADD, as below:
enum rte_mem_event {
         RTE_MEM_EVENT_ALLOC = 0, /**< Allocation event. */
         RTE_MEM_EVENT_FREE,      /**< Deallocation event. */
};]

*Log*

(gdb) bt
#0  vfio_mem_event_callback (type=RTE_MEM_EVENT_ALLOC, addr=0x200016c00000,
len=4194304, arg=0x0) at
/usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/linuxapp/eal/eal_vfio.c:512
#1  0x00000000004b78f0 in eal_memalloc_mem_event_notify
(event=event@entry=RTE_MEM_EVENT_ALLOC,
start=start@entry=0x200016c00000, len=len@entry=4194304)
    at
/usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/common/eal_common_memalloc.c:248
#2  0x00000000004c7a90 in try_expand_heap_primary (contig=false, bound=0,
align=35184753770496, flags=0, socket=0, elt_size=35184753770496,
pg_sz=2097152, heap=0x569b3c <early_mem_config+10364>)
    at
/usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/common/malloc_heap.c:412
#3  try_expand_heap (heap=heap@entry=0x569b3c <early_mem_config+10364>,
pg_sz=2097152, elt_size=elt_size@entry=8192, socket=socket@entry=0,
flags=flags@entry=0, align=align@entry=2097152, bound=bound@entry=0,
contig=contig@entry=false)
    at
/usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/common/malloc_heap.c:494
#4  0x00000000004c7f28 in alloc_more_mem_on_socket (heap=heap@entry=0x569b3c
<early_mem_config+10364>, size=size@entry=8192, socket=0, flags=flags@entry=0,
align=align@entry=2097152, bound=bound@entry=0, contig=contig@entry=false)
    at
/usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/common/malloc_heap.c:622
#5  0x00000000004c822c in malloc_heap_alloc_on_heap_id (size=size@entry=8192,
heap_id=heap_id@entry=0, flags=flags@entry=0, align=<optimized out>,
align@entry=2097152, bound=bound@entry=0, contig=contig@entry=false,
type=0x0)
    at
/usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/common/malloc_heap.c:676
#6  0x00000000004c8464 in malloc_heap_alloc (type=type@entry=0x0,
size=size@entry=8192, socket_arg=<optimized out>, flags=flags@entry=0,
align=2097152, bound=bound@entry=0, contig=contig@entry=false)
    at
/usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/common/malloc_heap.c:714
#7  0x00000000004c498c in rte_malloc_socket (type=type@entry=0x0,
size=size@entry=8192, align=align@entry=2097152, socket_arg=<optimized
out>, socket_arg@entry=-1)
    at
/usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/common/rte_malloc.c:58
#8  0x000000000049fe24 in spdk_malloc (size=size@entry=8192,
align=align@entry=2097152, phys_addr=phys_addr@entry=0x0,
socket_id=socket_id@entry=-1, flags=flags@entry=3)
    at
/usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/env_dpdk/env.c:95
#9  0x000000000049fe94 in spdk_zmalloc (size=8192, align=align@entry=2097152,
phys_addr=phys_addr@entry=0x0, socket_id=socket_id@entry=-1,
flags=flags@entry=3) at
/usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/env_dpdk/env.c:108
#10 0x0000000000431f64 in nvme_pcie_qpair_construct (qpair=0x200016aed1d8,
opts=opts@entry=0x0) at
/usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme_pcie.c:1054
#11 0x000000000043233c in nvme_pcie_ctrlr_construct_admin_qpair
(ctrlr=0x200016aed340) at
/usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme_pcie.c:719
#12 nvme_pcie_ctrlr_construct (trid=<optimized out>, opts=<optimized out>,
devhandle=0x70f4b0) at
/usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme_pcie.c:881
#13 0x0000000000437674 in nvme_transport_ctrlr_construct
(trid=trid@entry=0xffffffffec98,
opts=opts@entry=0xffffffffea00, devhandle=devhandle@entry=0x70f4b0)
    at
/usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme_transport.c:99
#14 0x0000000000436b08 in nvme_ctrlr_probe (trid=trid@entry=0xffffffffec98,
probe_ctx=0x70eae0, devhandle=devhandle@entry=0x70f4b0) at
/usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme.c:426
#15 0x0000000000430d40 in pcie_nvme_enum_cb (ctx=0xffffffffefe8,
pci_dev=0x70f4b0) at
/usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme_pcie.c:752
#16 0x00000000004a2e20 in spdk_pci_device_init (_drv=0x51b660
<g_nvme_pci_drv>, _dev=0x61b5f0) at
/usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/env_dpdk/pci.c:347
#17 0x00000000004d5c68 in rte_pci_probe_one_driver (dev=0x61b5f0,
dr=0x51b660 <g_nvme_pci_drv>) at
/usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/drivers/bus/pci/pci_common.c:185
#18 pci_probe_all_drivers (dev=0x61b5f0) at
/usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/drivers/bus/pci/pci_common.c:259
#19 rte_pci_probe () at
/usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/drivers/bus/pci/pci_common.c:294
#20 0x00000000004bb490 in rte_bus_probe () at
/usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/common/eal_common_bus.c:100
#21 0x00000000004a3030 in spdk_pci_enumerate (driver=0x51b660
<g_nvme_pci_drv>, enum_cb=enum_cb@entry=0x430c50 <pcie_nvme_enum_cb>,
enum_ctx=enum_ctx@entry=0xffffffffefe8)
    at
/usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/env_dpdk/pci.c:502
#22 0x00000000004319c4 in nvme_pcie_ctrlr_scan (probe_ctx=0x70eae0,
direct_connect=<optimized out>) at
/usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme_pcie.c:780
#23 0x00000000004376ac in nvme_transport_ctrlr_scan
(probe_ctx=probe_ctx@entry=0x70eae0, direct_connect=direct_connect@entry=false)
at
/usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme_transport.c:106
#24 0x0000000000436be4 in spdk_nvme_probe_internal
(probe_ctx=probe_ctx@entry=0x70eae0, direct_connect=direct_connect@entry=false)
at /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme.c:553
#25 0x0000000000436fc4 in spdk_nvme_probe_async (trid=0xfffffffff0d8,
cb_ctx=cb_ctx@entry=0x6402c0, probe_cb=probe_cb@entry=0x420b70 <probe_cb>,
attach_cb=0x421b50 <attach_cb>, remove_cb=0x0)
    at
/usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme.c:1078
#26 0x00000000004372d0 in spdk_nvme_probe (trid=<optimized out>,
trid@entry=0x0,
cb_ctx=cb_ctx@entry=0x6402c0, probe_cb=probe_cb@entry=0x420b70 <probe_cb>,
attach_cb=attach_cb@entry=0x421b50 <attach_cb>, remove_cb=remove_cb@entry
=0x0)
    at /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme.c:625
#27 0x00000000004223b0 in bdev_nvme_library_init () at
/usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/bdev/nvme/bdev_nvme.c:1539
#28 0x000000000048cf80 in spdk_bdev_modules_init () at
/usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/bdev/bdev.c:963
#29 spdk_bdev_initialize (cb_fn=<optimized out>, cb_arg=<optimized out>) at
/usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/bdev/bdev.c:1084
#30 0x00000000004811b8 in spdk_subsystem_init (cb_fn=<optimized out>,
cb_arg=<optimized out>) at
/usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/event/subsystem.c:178
#31 0x0000000000484720 in _spdk_msg_queue_run_batch (max_msgs=<optimized
out>, thread=0x60c600) at
/usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/thread/thread.c:406
#32 spdk_thread_poll (thread=thread@entry=0x60c600, max_msgs=max_msgs@entry=0,
now=now@entry=2386134829398) at
/usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/thread/thread.c:462
#33 0x000000000048040c in _spdk_reactor_run (arg=0x60bcc0) at
/usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/event/reactor.c:272
#34 0x0000000000480990 in spdk_reactors_start () at
/usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/event/reactor.c:381
#35 0x000000000047f7d4 in spdk_app_start (opts=opts@entry=0xfffffffffa78,
start_fn=start_fn@entry=0x407350 <nvmf_tgt_started>, arg1=arg1@entry=0x0)
at /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/event/app.c:692
#36 0x0000000000405b2c in main (argc=7, argv=0xfffffffffc78) at
/usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/app/nvmf_tgt/nvmf_main.c:75

Regards,
Rajesh

Burakov, Anatoly Oct. 25, 2019, 3:02 p.m. UTC | #8
On 25-Oct-19 1:53 PM, Rajesh Ravi wrote:
> Thanks Anatoly.
> I think now we 're close to sync. You understood it right: the problem 
> is dma/iommu mappings are not getting created.
> 
> 1)vfio_mem_event_callback() is getting called later when memory is 
> allocated from heap but not when external memory is added to heap

Please correct me if I'm wrong, but I believe that's not true. 
vfio_mem_event_callback() is called every time memory is added to a 
heap. That includes internal and external memory [1] [2].

[1] http://git.dpdk.org/dpdk/tree/lib/librte_eal/common/rte_malloc.c#n397
[2] http://git.dpdk.org/dpdk/tree/lib/librte_eal/common/malloc_heap.c#n1231

The only way the callback doesn't get called is if VFIO doesn't get 
initialized and/or fails to register said callback.

> 
> 2)vfio_dma_mem_map() is called inside vfio_mem_event_callback() but it's 
> skipped in case of external memory because it's invoked inside following 
> if condition:
>      if (rte_eal_iova_mode() == RTE_IOVA_VA && msl->external == 0)
>      [ I verified this also with gdb by stepping over line by line 
> inside vfio_mem_event_callback() ]
>    We 're using DPDK v19.02 but I see almost same code and flow in DPDK 
> 19.08-rc4 as well.

I believe that is not true either.

The line you're referring to [1] does *not* skip external memory 
segments (or rather, that's not its intention). Instead, it is checking 
if we can assume that the region is IOVA-contiguous. For IOVA as VA 
mode, we can assume that, *but not for external memory*, which is why it 
is skipped.

After this check, there is a loop that goes memseg by memseg [2], and it 
does *not* skip external memory. The memseg will only get skipped if it 
doesn't have a valid IOVA address [3], which will only happen if you 
don't provide an IOVA table at heap creation time (we can't make 
assumptions about desired IOVA addresses, so we just skip them). Perhaps 
it should output a debug message when it does that, but the important 
thing is, the code does nothing to skip external memory.

[1] http://git.dpdk.org/dpdk/tree/lib/librte_eal/linux/eal/eal_vfio.c#n521
[2] http://git.dpdk.org/dpdk/tree/lib/librte_eal/linux/eal/eal_vfio.c#n533
[3] http://git.dpdk.org/dpdk/tree/lib/librte_eal/linux/eal/eal_vfio.c#n536
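
To illustrate the point about the IOVA table: a minimal sketch of adding external memory to a heap together with explicit IOVA addresses, so the memseg walk in eal_vfio.c finds valid IOVAs and performs the DMA mapping. This is not code from the thread; the heap name, buffer, and page geometry are illustrative, and error handling is trimmed.

```c
/* Sketch (illustrative, assumes DPDK headers/libs are available):
 * add an externally allocated, page-aligned buffer to a named heap
 * with an IOVA table. Without the table, segments carry RTE_BAD_IOVA
 * and are skipped during VFIO DMA mapping. */
#include <rte_common.h>
#include <rte_malloc.h>
#include <rte_memory.h>

#define PG_SZ   (2UL << 20)   /* 2 MB pages, for illustration */
#define N_PAGES 16

static int
add_ext_mem(void *buf)        /* buf: page-aligned, N_PAGES * PG_SZ bytes */
{
	rte_iova_t iova[N_PAGES];
	unsigned int i;

	if (rte_malloc_heap_create("ext_heap") != 0)
		return -1;

	/* Build the IOVA table; rte_mem_virt2iova() is one way to fill
	 * it when the process can resolve physical addresses. */
	for (i = 0; i < N_PAGES; i++)
		iova[i] = rte_mem_virt2iova(RTE_PTR_ADD(buf, i * PG_SZ));

	return rte_malloc_heap_memory_add("ext_heap", buf,
					  N_PAGES * PG_SZ,
					  iova, N_PAGES, PG_SZ);
}
```

Allocations can then be directed at this heap via rte_malloc_heap_get_socket("ext_heap") and rte_malloc_socket(), as described earlier in the thread.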

> ---------------------------------------------------------------------------------------------------------------------------------
> [ _Further explanation of point 1) :_
>   I used gdb and set break point and checked back trace to verify the 
> call stack trace: first I set break point at rte_malloc_heap_memory_add.
>    Then I hit break point and then set break point at 
> vfio_mem_event_callback. I see from back trace that it's from 
> malloc_heap_alloc () with event = RTE_MEM_EVENT_ALLOC as can be seen 
> from log at the end of this mail.
> 
>   And also I don't see any event for memory ADD as below:
> enum rte_mem_event {
>           RTE_MEM_EVENT_ALLOC = 0, /**< Allocation event. */
>           RTE_MEM_EVENT_FREE,      /**< Deallocation event. */
>   }; ]
> 
> > _Log_
> >
> (gdb) bt
> #0  vfio_mem_event_callback (type=RTE_MEM_EVENT_ALLOC, 
> addr=0x200016c00000, len=4194304, arg=0x0) at 
> /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/linuxapp/eal/eal_vfio.c:512
> #1  0x00000000004b78f0 in eal_memalloc_mem_event_notify 
> (event=event@entry=RTE_MEM_EVENT_ALLOC, 
> start=start@entry=0x200016c00000, len=len@entry=4194304)
>      at 
> /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/common/eal_common_memalloc.c:248
> #2  0x00000000004c7a90 in try_expand_heap_primary (contig=false, 
> bound=0, align=35184753770496, flags=0, socket=0, 
> elt_size=35184753770496, pg_sz=2097152, heap=0x569b3c 
> <early_mem_config+10364>)
>      at 
> /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/common/malloc_heap.c:412
> #3  try_expand_heap (heap=heap@entry=0x569b3c <early_mem_config+10364>, 
> pg_sz=2097152, elt_size=elt_size@entry=8192, socket=socket@entry=0, 
> flags=flags@entry=0, align=align@entry=2097152, bound=bound@entry=0, 
> contig=contig@entry=false)
>      at 
> /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/common/malloc_heap.c:494
> #4  0x00000000004c7f28 in alloc_more_mem_on_socket 
> (heap=heap@entry=0x569b3c <early_mem_config+10364>, 
> size=size@entry=8192, socket=0, flags=flags@entry=0, 
> align=align@entry=2097152, bound=bound@entry=0, contig=contig@entry=false)
>      at 
> /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/common/malloc_heap.c:622
> #5  0x00000000004c822c in malloc_heap_alloc_on_heap_id 
> (size=size@entry=8192, heap_id=heap_id@entry=0, flags=flags@entry=0, 
> align=<optimized out>, align@entry=2097152, bound=bound@entry=0, 
> contig=contig@entry=false, type=0x0)
>      at 
> /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/common/malloc_heap.c:676
> #6  0x00000000004c8464 in malloc_heap_alloc (type=type@entry=0x0, 
> size=size@entry=8192, socket_arg=<optimized out>, flags=flags@entry=0, 
> align=2097152, bound=bound@entry=0, contig=contig@entry=false)
>      at 
> /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/common/malloc_heap.c:714
> #7  0x00000000004c498c in rte_malloc_socket (type=type@entry=0x0, 
> size=size@entry=8192, align=align@entry=2097152, socket_arg=<optimized 
> out>, socket_arg@entry=-1)
>      at 
> /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/common/rte_malloc.c:58
> #8  0x000000000049fe24 in spdk_malloc (size=size@entry=8192, 
> align=align@entry=2097152, phys_addr=phys_addr@entry=0x0, 
> socket_id=socket_id@entry=-1, flags=flags@entry=3)
>      at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/env_dpdk/env.c:95
> #9  0x000000000049fe94 in spdk_zmalloc (size=8192, 
> align=align@entry=2097152, phys_addr=phys_addr@entry=0x0, 
> socket_id=socket_id@entry=-1, flags=flags@entry=3) at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/env_dpdk/env.c:108
> #10 0x0000000000431f64 in nvme_pcie_qpair_construct 
> (qpair=0x200016aed1d8, opts=opts@entry=0x0) at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme_pcie.c:1054
> #11 0x000000000043233c in nvme_pcie_ctrlr_construct_admin_qpair 
> (ctrlr=0x200016aed340) at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme_pcie.c:719
> #12 nvme_pcie_ctrlr_construct (trid=<optimized out>, opts=<optimized 
> out>, devhandle=0x70f4b0) at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme_pcie.c:881
> #13 0x0000000000437674 in nvme_transport_ctrlr_construct 
> (trid=trid@entry=0xffffffffec98, opts=opts@entry=0xffffffffea00, 
> devhandle=devhandle@entry=0x70f4b0)
>      at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme_transport.c:99
> #14 0x0000000000436b08 in nvme_ctrlr_probe 
> (trid=trid@entry=0xffffffffec98, probe_ctx=0x70eae0, 
> devhandle=devhandle@entry=0x70f4b0) at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme.c:426
> #15 0x0000000000430d40 in pcie_nvme_enum_cb (ctx=0xffffffffefe8, 
> pci_dev=0x70f4b0) at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme_pcie.c:752
> #16 0x00000000004a2e20 in spdk_pci_device_init (_drv=0x51b660 
> <g_nvme_pci_drv>, _dev=0x61b5f0) at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/env_dpdk/pci.c:347
> #17 0x00000000004d5c68 in rte_pci_probe_one_driver (dev=0x61b5f0, 
> dr=0x51b660 <g_nvme_pci_drv>) at 
> /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/drivers/bus/pci/pci_common.c:185
> #18 pci_probe_all_drivers (dev=0x61b5f0) at 
> /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/drivers/bus/pci/pci_common.c:259
> #19 rte_pci_probe () at 
> /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/drivers/bus/pci/pci_common.c:294
> #20 0x00000000004bb490 in rte_bus_probe () at 
> /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/common/eal_common_bus.c:100
> #21 0x00000000004a3030 in spdk_pci_enumerate (driver=0x51b660 
> <g_nvme_pci_drv>, enum_cb=enum_cb@entry=0x430c50 <pcie_nvme_enum_cb>, 
> enum_ctx=enum_ctx@entry=0xffffffffefe8)
>      at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/env_dpdk/pci.c:502
> #22 0x00000000004319c4 in nvme_pcie_ctrlr_scan (probe_ctx=0x70eae0, 
> direct_connect=<optimized out>) at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme_pcie.c:780
> #23 0x00000000004376ac in nvme_transport_ctrlr_scan 
> (probe_ctx=probe_ctx@entry=0x70eae0, 
> direct_connect=direct_connect@entry=false) at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme_transport.c:106
> #24 0x0000000000436be4 in spdk_nvme_probe_internal 
> (probe_ctx=probe_ctx@entry=0x70eae0, 
> direct_connect=direct_connect@entry=false) at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme.c:553
> #25 0x0000000000436fc4 in spdk_nvme_probe_async (trid=0xfffffffff0d8, 
> cb_ctx=cb_ctx@entry=0x6402c0, probe_cb=probe_cb@entry=0x420b70 
> <probe_cb>, attach_cb=0x421b50 <attach_cb>, remove_cb=0x0)
>      at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme.c:1078
> #26 0x00000000004372d0 in spdk_nvme_probe (trid=<optimized out>, 
> trid@entry=0x0, cb_ctx=cb_ctx@entry=0x6402c0, 
> probe_cb=probe_cb@entry=0x420b70 <probe_cb>, 
> attach_cb=attach_cb@entry=0x421b50 <attach_cb>, 
> remove_cb=remove_cb@entry=0x0)
>      at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme.c:625
> #27 0x00000000004223b0 in bdev_nvme_library_init () at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/bdev/nvme/bdev_nvme.c:1539
> #28 0x000000000048cf80 in spdk_bdev_modules_init () at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/bdev/bdev.c:963
> #29 spdk_bdev_initialize (cb_fn=<optimized out>, cb_arg=<optimized out>) 
> at /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/bdev/bdev.c:1084
> #30 0x00000000004811b8 in spdk_subsystem_init (cb_fn=<optimized out>, 
> cb_arg=<optimized out>) at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/event/subsystem.c:178
> #31 0x0000000000484720 in _spdk_msg_queue_run_batch (max_msgs=<optimized 
> out>, thread=0x60c600) at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/thread/thread.c:406
> #32 spdk_thread_poll (thread=thread@entry=0x60c600, 
> max_msgs=max_msgs@entry=0, now=now@entry=2386134829398) at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/thread/thread.c:462
> #33 0x000000000048040c in _spdk_reactor_run (arg=0x60bcc0) at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/event/reactor.c:272
> #34 0x0000000000480990 in spdk_reactors_start () at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/event/reactor.c:381
> #35 0x000000000047f7d4 in spdk_app_start 
> (opts=opts@entry=0xfffffffffa78, start_fn=start_fn@entry=0x407350 
> <nvmf_tgt_started>, arg1=arg1@entry=0x0) at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/event/app.c:692
> #36 0x0000000000405b2c in main (argc=7, argv=0xfffffffffc78) at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/app/nvmf_tgt/nvmf_main.c:75
> 
> Regards,
> Rajesh
> 
> On Thu, Oct 24, 2019 at 5:13 PM Burakov, Anatoly 
> <anatoly.burakov@intel.com <mailto:anatoly.burakov@intel.com>> wrote:
> 
>     On 22-Oct-19 8:56 AM, Rajesh Ravi wrote:
>      > Hi Anatoly,
>      > I tried calling rte_vfio_dma_map() but, it  failed for me because
>      >
>      > vfio_dma_mem_map() failed with error:VFIO support not initialized
>      > because: default_vfio_cfg->vfio_iommu_type=NULL ( container_dma_map
>      > calls vfio_dma_mem_map  with vfio_cfg = default_vfio_cfg )
>      >
>      > call sequence: rte_vfio_dma_map() -->container_dma_map()
>      > --> vfio_dma_mem_map
>      >
>      > Regards,
>      > Rajesh
>      >
>      > On Mon, Oct 21, 2019 at 9:16 PM Rajesh Ravi
>     <rajesh.ravi@broadcom.com <mailto:rajesh.ravi@broadcom.com>
>      > <mailto:rajesh.ravi@broadcom.com
>     <mailto:rajesh.ravi@broadcom.com>>> wrote:
>      >
>      >     Thanks Anatoly for prompt response. Sorry for the delayed
>     response,
>      >     I took some time to reverify with SPDK.
>      >
>      >     Infact, I do want the iommu mapping to be performed. I don't
>     want it
>      >     to be bypassed by type1_map()
>      >     [lib/librte_eal/linuxapp/eal/eal_vfio.c] for external memory.
>      >
>      >     Then, if I understood correctly, you suggested to call
>      >     rte_vfio_dma_map() as an alternative.
>      >
>      >     _Context & clarification_
>      >
>      >     1) We 're using DPDK to manage/allocate memory for SPDK
>     through heap
>      >     API.
>      >
>      >        The only DPDK APIs we 're calling are:
>      >        A)rte_malloc_heap_memory_add() to add external memory to heap.
>      >        B)rte_malloc_heap_get_socket() & rte_malloc_socket() to
>     allocate
>      >     memory
>      >
>      >     /Are you suggesting to make a call to rte_vfio_dma_map() from
>     spdk,
>      >     in addition to the APIs listed under 1)A & 1)B instead of
>     modifying
>      >     DPDK vfio functions?/
>      >         Please confirm, Probably I missed the context and might
>     not have
>      >     understood fully.
>      >
>      >
>      >     2) .dma_user_map_func=vfio_type1_dma_mem_map() is called from 2
>      >     paths in dpdk. In either case call to dma_user_map_func() is
>     skipped.
>      >         A) _During the startup, as you said before:_
>      >           rte_vfio_setup_device() --> type1_map()
>      >
>      >         B)During allocation event:
>      >              vfio_mem_event_callback()  (type=RTE_MEM_EVENT_ALLOC,..)
>      >     -->vfio_dma_mem_map() -->dma_user_map_func()
>      >   
>       ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>      >     _Conclusion_
>      >
>      >     So we have 2 alternatives:
>      >
>      >     A) make additional call to rte_vfio_dma_map() API after adding
>      >     external memory using rte_malloc_heap_memory_add()  API.
>      >
>      >     B) remove msl->external check which bypasses call to
>      >     dma_user_map_func() in DPDK.
>      >
>      >     I modified DPDK functions [Option B) ]. I guess you 're
>      >     suggesting option A)
>      >
>      >     Please confirm.
>      >
>      >     Regards,
>      >     Rajesh
> 
>     Hi Rajesh,
> 
>     Apologies for delayed response.
> 
>     I'm still not sure i understand your problem.
> 
>     When adding memory with rte_malloc_heap_memory_add(), this memory
>     is, as
>     far as i can tell, automatically mapped for DMA mapping with VFIO. This
>     happens because:
> 
>     rte_malloc_heap_memory_add() -> malloc_heap_add_external_memory() ->
>     eal_memalloc_mem_event_notify() -> vfio_mem_event_callback() ->
>     vfio_dma_mem_map()
> 
>     VFIO registers itself for mem event callbacks in
>     eal_vfio.c:rte_vfio_setup_device():791.
> 
>     So, there is no need to map anything for DMA when you're creating
>     external heaps - it is being mapped automatically by EAL.
> 
>     For an example implementation have a look at testpmd, specifically
>     testpmd.c:mbuf_pool_create() for MP_ALLOC_XMEM type. This calls
>     testpmd.c:setup_extmem(), which creates an external heap. Notice how it
>     doesn't do anything to map the memory for DMA, because that is already
>     done by EAL. You can verify this working by adding --mp-alloc=xmem
>     argument to testpmd to force it to use external memory for its mempools.
> 
>     So, what is the actual issue here, given that the memory is mapped for
>     DMA by EAL automatically when it is added?
> 
>     -- 
>     Thanks,
>     Anatoly
> 
> 
> 
> -- 
> Regards,
> Rajesh
Rajesh Ravi Oct. 30, 2019, 7:50 p.m. UTC | #9
Thanks Anatoly.
Please find replies inline below:

[Anatoly] vfio_mem_event_callback() is called every time memory is added
to a heap. That includes internal and external memory.

[Rajesh] malloc_heap_add_external_memory() does call
eal_memalloc_mem_event_notify() [instead of vfio_mem_event_callback()].
But no callback function is getting called from inside
eal_memalloc_mem_event_notify(); execution never enters the following loop:

              TAILQ_FOREACH(entry, &mem_event_callback_list, next) {
                      RTE_LOG(DEBUG, EAL, "Calling mem event callback '%s:%p'\n",
                              entry->name, entry->arg);
                      entry->clb(event, start, len, entry->arg);
              }

Do you mean to say we are supposed to explicitly register a callback
which separately builds IOMMU tables, in addition to calling the
rte_malloc_heap_memory_add() API?

Regards,
Rajesh
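
For reference, the public API for registering such a mem event callback looks roughly as sketched below. The callback body here is an illustrative stub that only logs notifications; it is not the fix discussed in the thread (VFIO registers its own callback internally from rte_vfio_setup_device()).

```c
/* Sketch (illustrative, assumes DPDK headers/libs are available):
 * register a callback that is invoked by
 * eal_memalloc_mem_event_notify() on ALLOC/FREE events. */
#include <rte_common.h>
#include <rte_log.h>
#include <rte_memory.h>

static void
ext_mem_event_cb(enum rte_mem_event event, const void *addr, size_t len,
		 void *arg __rte_unused)
{
	/* Stub: just report what was allocated or freed. */
	RTE_LOG(DEBUG, EAL, "mem event %d: addr %p, len %zu\n",
		(int)event, addr, len);
}

static int
register_ext_mem_cb(void)
{
	/* "ext-mem-cb" is an arbitrary, illustrative callback name. */
	return rte_mem_event_callback_register("ext-mem-cb",
					       ext_mem_event_cb, NULL);
}
```

If this callback fires for internal allocations but not when rte_malloc_heap_memory_add() is called, that would localize the problem to the notification path rather than to VFIO's mapping logic.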


On Fri, Oct 25, 2019 at 8:32 PM Burakov, Anatoly <anatoly.burakov@intel.com>
wrote:

> On 25-Oct-19 1:53 PM, Rajesh Ravi wrote:
> > Thanks Anatoly.
> > I think now we 're close to sync. You understood it right: the problem
> > is dma/iommu mappings are not getting created.
> >
> > 1)vfio_mem_event_callback() is getting called later when memory is
> > allocated from heap but not when external memory is added to heap
>
> Please correct me if i'm wrong, but i believe that's not true.
> vfio_mem_event_callback() is called every time memory is added to a
> heap. That includes internal and external memory [1] [2].
>
> [1] http://git.dpdk.org/dpdk/tree/lib/librte_eal/common/rte_malloc.c#n397
> [2]
> http://git.dpdk.org/dpdk/tree/lib/librte_eal/common/malloc_heap.c#n1231
>
> The only way the callback doesn't get called is if VFIO doesn't get
> initialize and/or fails to register said callback.
>
> >
> > 2)vfio_dma_mem_map() is called inside vfio_mem_event_callback() but it's
> > skipped in case of external memory because it's invoked inside following
> > if condition:
> >      if (rte_eal_iova_mode() == RTE_IOVA_VA && msl->external == 0)
> >      [ I verified this also with gdb by stepping over line by line
> > inside vfio_mem_event_callback() ]
> >    We 're using DPDK v19.02 but I see almost same code and flow in DPDK
> > 19.08-rc4 as well.
>
> I believe that is not true either.
>
> The line you're referring to [1] does *not* skip external memory
> segments (or rather, that's not its intention). Instead, it is checking
> if we can assume that the region is IOVA-contiguous. For IOVA as VA
> mode, we can assume that, *but not for external memory*, which is why it
> is skipped.
>
> After this check, there is a loop that goes memseg by memseg [2], and it
> does *not* skip external memory. The memseg will only get skipped if it
> doesn't have a valid IOVA address [3], which will only happen if you
> don't provide an IOVA table at heap creation time (we can't make
> assumptions about desired IOVA addresses, so we just skip them). Perhaps
> it should output a debug message when it does that, but the important
> thing is, the code does nothing to skip external memory.
>
> [1] http://git.dpdk.org/dpdk/tree/lib/librte_eal/linux/eal/eal_vfio.c#n521
> [2] http://git.dpdk.org/dpdk/tree/lib/librte_eal/linux/eal/eal_vfio.c#n533
> [3] http://git.dpdk.org/dpdk/tree/lib/librte_eal/linux/eal/eal_vfio.c#n536
>
> >
> ---------------------------------------------------------------------------------------------------------------------------------
> > [ _Further explanation of point 1) :_
> >   I used gdb and set break point and checked back trace to verify the
> > call stack trace: first I set break point at rte_malloc_heap_memory_add.
> >    Then I hit break point and then set break point at
> > vfio_mem_event_callback. I see from back trace that it's from
> > malloc_heap_alloc () with event = RTE_MEM_EVENT_ALLOC as can be seen
> > from log at the end of this mail.
> >
> >   And also I don't see any event for memory ADD as below:
> > enum rte_mem_event {
> >           RTE_MEM_EVENT_ALLOC = 0, /**< Allocation event. */
> >           RTE_MEM_EVENT_FREE,      /**< Deallocation event. */
> >   }; ]
> >
> >
> > Regards,
> > Rajesh
> >
> > On Thu, Oct 24, 2019 at 5:13 PM Burakov, Anatoly
> > <anatoly.burakov@intel.com <mailto:anatoly.burakov@intel.com>> wrote:
> >
> >     On 22-Oct-19 8:56 AM, Rajesh Ravi wrote:
> >      > Hi Anatoly,
> >      > I tried calling rte_vfio_dma_map() but, it  failed for me because
> >      >
> >      > vfio_dma_mem_map() failed with error:VFIO support not initialized
> >      > because: default_vfio_cfg->vfio_iommu_type=NULL (
> container_dma_map
> >      > calls vfio_dma_mem_map  with vfio_cfg = default_vfio_cfg )
> >      >
> >      > call sequence: rte_vfio_dma_map() -->container_dma_map()
> >      > --> vfio_dma_mem_map
> >      >
> >      > Regards,
> >      > Rajesh
> >      >
> >      > On Mon, Oct 21, 2019 at 9:16 PM Rajesh Ravi
> >     <rajesh.ravi@broadcom.com <mailto:rajesh.ravi@broadcom.com>
> >      > <mailto:rajesh.ravi@broadcom.com
> >     <mailto:rajesh.ravi@broadcom.com>>> wrote:
> >      >
> >      >     Thanks Anatoly for prompt response. Sorry for the delayed
> >     response,
> >      >     I took some time to reverify with SPDK.
> >      >
> >      >     Infact, I do want the iommu mapping to be performed. I don't
> >     want it
> >      >     to be bypassed by type1_map()
> >      >     [lib/librte_eal/linuxapp/eal/eal_vfio.c] for external memory.
> >      >
> >      >     Then, if I understood correctly, you suggested to call
> >      >     rte_vfio_dma_map() as an alternative.
> >      >
> >      >     _Context & clarification_
> >      >
> >      >     1) We 're using DPDK to manage/allocate memory for SPDK
> >     through heap
> >      >     API.
> >      >
> >      >        The only DPDK APIs we 're calling are:
> >      >        A)rte_malloc_heap_memory_add() to add external memory to
> heap.
> >      >        B)rte_malloc_heap_get_socket() & rte_malloc_socket() to
> >     allocate
> >      >     memory
> >      >
> >      >     /Are you suggesting to make a call to rte_vfio_dma_map() from
> >     spdk,
> >      >     in addition to the APIs listed under 1)A & 1)B instead of
> >     modifying
> >      >     DPDK vfio functions?/
> >      >         Please confirm, Probably I missed the context and might
> >     not have
> >      >     understood fully.
> >      >
> >      >
> >      >     2) .dma_user_map_func=vfio_type1_dma_mem_map() is called from
> 2
> >      >     paths in dpdk. In either case call to dma_user_map_func() is
> >     skipped.
> >      >         A) _During the startup, as you said before:_
> >      >           rte_vfio_setup_device() --> type1_map()
> >      >
> >      >         B)During allocation event:
> >      >              vfio_mem_event_callback()
> (type=RTE_MEM_EVENT_ALLOC,..)
> >      >     -->vfio_dma_mem_map() -->dma_user_map_func()
> >      >
> >
>  ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
> >      >     _Conclusion_
> >      >
> >      >     So we have 2 alternatives:
> >      >
> >      >     A) make additional call to rte_vfio_dma_map() API after adding
> >      >     external memory using rte_malloc_heap_memory_add()  API.
> >      >
> >      >     B) remove msl->external check which bypasses call to
> >      >     dma_user_map_func() in DPDK.
> >      >
> >      >     I modified DPDK functions [Option B) ]. I guess you 're
> >      >     suggesting option A)
> >      >
> >      >     Please confirm.
> >      >
> >      >     Regards,
> >      >     Rajesh
> >
> >     Hi Rajesh,
> >
> >     Apologies for delayed response.
> >
> >     I'm still not sure i understand your problem.
> >
> >     When adding memory with rte_malloc_heap_memory_add(), this memory
> >     is, as
> >     far as i can tell, automatically mapped for DMA mapping with VFIO.
> This
> >     happens because:
> >
> >     rte_malloc_heap_memory_add() -> malloc_heap_add_external_memory() ->
> >     eal_memalloc_mem_event_notify() -> vfio_mem_event_callback() ->
> >     vfio_dma_mem_map()
> >
> >     VFIO registers itself for mem event callbacks in
> >     eal_vfio.c:rte_vfio_setup_device():791.
> >
> >     So, there is no need to map anything for DMA when you're creating
> >     external heaps - it is being mapped automatically by EAL.
> >
> >     For an example implementation have a look at testpmd, specifically
> >     testpmd.c:mbuf_pool_create() for MP_ALLOC_XMEM type. This calls
> >     testpmd.c:setup_extmem(), which creates an external heap. Notice how
> it
> >     doesn't do anything to map the memory for DMA, because that is
> already
> >     done by EAL. You can verify this working by adding --mp-alloc=xmem
> >     argument to testpmd to force it to use external memory for its
> mempools.
> >
> >     So, what is the actual issue here, given that the memory is mapped
> for
> >     DMA by EAL automatically when it is added?
> >
> >     --
> >     Thanks,
> >     Anatoly
> >
> >
> >
> > --
> > Regards,
> > Rajesh
>
>
> --
> Thanks,
> Anatoly
>
Burakov, Anatoly Nov. 4, 2019, 10:25 a.m. UTC | #10
On 30-Oct-19 7:50 PM, Rajesh Ravi wrote:
> Thanks Anatoly.
> Please find  inline below:
> 
> [Anatoly] vfio_mem_event_callback() is called every time memory is added 
> to a
> heap. That includes internal and external memory
> 
> [Rajesh] malloc_heap_add_external_memory() does call 
> eal_memalloc_mem_event_notify [ instead of vfio_mem_event_callback() ]
>                But, no callback function is getting called from inside 
> eal_memalloc_mem_event_notify()
>                execution flow is not entering inside following loop:
> 
> /TAILQ_FOREACH(entry, &mem_event_callback_list, next) {/
> /                 RTE_LOG(DEBUG, EAL, "Calling mem event callback 
> '%s:%p'\n",
>                           entry->name, entry->arg);
>                   entry->clb(event, start, len, entry->arg);
>                }/
> 
> Do you mean to say,  we are supposed to explicitly register a callback 
> which separately builds  iommu tables in addition to calling 
> rte_malloc_heap_memory_add()  API?

Hi,

No, the callback in VFIO should be registered automatically [1] at EAL 
initialization (or, more precisely, when default container is 
initialized). Does that not happen in your case?

[1] http://git.dpdk.org/dpdk/tree/lib/librte_eal/linux/eal/eal_vfio.c#n791
Burakov, Anatoly Nov. 5, 2019, 11:41 a.m. UTC | #11
On 04-Nov-19 10:25 AM, Burakov, Anatoly wrote:
> On 30-Oct-19 7:50 PM, Rajesh Ravi wrote:
>> Thanks Anatoly.
>> Please find  inline below:
>>
>> [Anatoly] vfio_mem_event_callback() is called every time memory is 
>> added to a
>> heap. That includes internal and external memory
>>
>> [Rajesh] malloc_heap_add_external_memory() does call 
>> eal_memalloc_mem_event_notify [ instead of vfio_mem_event_callback() ]
>>                But, no callback function is getting called from inside 
>> eal_memalloc_mem_event_notify()
>>                execution flow is not entering inside following loop:
>>
>> /TAILQ_FOREACH(entry, &mem_event_callback_list, next) {/
>> /                 RTE_LOG(DEBUG, EAL, "Calling mem event callback 
>> '%s:%p'\n",
>>                           entry->name, entry->arg);
>>                   entry->clb(event, start, len, entry->arg);
>>                }/
>>
>> Do you mean to say,  we are supposed to explicitly register a callback 
>> which separately builds  iommu tables in addition to calling 
>> rte_malloc_heap_memory_add()  API?
> 
> Hi,
> 
> No, the callback in VFIO should be registered automatically [1] at EAL 
> initialization (or, more precisely, when default container is 
> initialized). Does that not happen in your case?
> 
> [1] http://git.dpdk.org/dpdk/tree/lib/librte_eal/linux/eal/eal_vfio.c#n791
> 

Hi Rajesh,

I think i figured it out. It is a defect in the design of how external 
memory heaps are handled.

When VFIO initializes, it will find the first VFIO-bound device, initialize 
the container, and set up DMA mappings. Then, you can add more memory 
by creating custom memory regions without adding them to a heap 
(mmap() + rte_extmem_register() + rte_dev_dma_map()), or by adding 
them to a heap (mmap() + rte_malloc_heap_memory_add()).

The problem is, memory registered through rte_dev_dma_map() will get 
added into a list of user maps, while heap memory will not - the 
assumption is that the DMA mapping will happen through the callback, but 
there is no record left anywhere that this memory is supposed to be mapped.

This makes it so that, if there are no VFIO-bound devices at startup, 
then you create a heap, and *then* you hotplug a device, the heap will 
not be mapped because (as you have correctly pointed out) type1_map() 
skips it, and it's not present in a list of user mem maps either, 
because it is heap memory, so EAL is supposed to handle it by itself and 
not through user map list.

There could be two fixes here. The easiest one is to just add another 
flag to the memseglist - that will work for 19.11, and that's what i 
intend on doing since we're breaking ABI anyway.

For older releases, a different approach would be required (i think 
scanning heaps is best we can do here) in order to keep the ABI and not 
introduce new stuff into rte_memseg_list.

I'll submit a patch shortly, it would be great if you could test it.
Rajesh Ravi Nov. 5, 2019, 2:10 p.m. UTC | #12
Thanks a lot Anatoly.
Will the same solution work with DPDK 19.02 as well? We're actually using
DPDK 19.02 for memory allocations for SPDK 19.07.
DPDK 19.11 may not be supported by the SPDK 19.07 we're currently using.

I'll definitely test whether the patch works with DPDK 19.02.

Thanks again for your time and help.

Regards,
Rajesh




On Tue, Nov 5, 2019 at 5:11 PM Burakov, Anatoly <anatoly.burakov@intel.com>
wrote:

> On 04-Nov-19 10:25 AM, Burakov, Anatoly wrote:
> > On 30-Oct-19 7:50 PM, Rajesh Ravi wrote:
> >> Thanks Anatoly.
> >> Please find  inline below:
> >>
> >> [Anatoly] vfio_mem_event_callback() is called every time memory is
> >> added to a
> >> heap. That includes internal and external memory
> >>
> >> [Rajesh] malloc_heap_add_external_memory() does call
> >> eal_memalloc_mem_event_notify [ instead of vfio_mem_event_callback() ]
> >>                But, no callback function is getting called from inside
> >> eal_memalloc_mem_event_notify()
> >>                execution flow is not entering inside following loop:
> >>
> >> /TAILQ_FOREACH(entry, &mem_event_callback_list, next) {/
> >> /                 RTE_LOG(DEBUG, EAL, "Calling mem event callback
> >> '%s:%p'\n",
> >>                           entry->name, entry->arg);
> >>                   entry->clb(event, start, len, entry->arg);
> >>                }/
> >>
> >> Do you mean to say,  we are supposed to explicitly register a callback
> >> which separately builds  iommu tables in addition to calling
> >> rte_malloc_heap_memory_add()  API?
> >
> > Hi,
> >
> > No, the callback in VFIO should be registered automatically [1] at EAL
> > initialization (or, more precisely, when default container is
> > initialized). Does that not happen in your case?
> >
> > [1]
> http://git.dpdk.org/dpdk/tree/lib/librte_eal/linux/eal/eal_vfio.c#n791
> >
>
> Hi Rajesh,
>
> I think i figured it out. It is a defect in design of how external
> memory heaps are handled.
>
> When VFIO initializes, it will find first VFIO-bound device, initialize
> the container, and set up DMA mappings. Then, you can add more memory
> through creating custom memory regions without adding them to heap
> (mmap() + rte_extmem_register() + rte_dev_dma_map()), or with adding
> them to heap (mmap() + rte_malloc_heap_add_memory()).
>
> The problem is, memory registered through rte_dev_dma_map() will get
> added into a list of user maps, while heap memory will not - the
> assumption is that the DMA mapping will happen through the callback, but
> there is no record left anywhere that this memory is supposed to be mapped.
>
> This makes it so that, if there are no VFIO-bound devices at startup,
> then you create a heap, and *then* you hotplug a device, the heap will
> not be mapped because (as you have correctly pointed out) type1_map()
> skips it, and it's not present in a list of user mem maps either,
> because it is heap memory, so EAL is supposed to handle it by itself and
> not through user map list.
>
> There could be two fixes here. The easiest one is to just add another
> flag to the memseglist - that will work for 19.11, and that's what i
> intend on doing since we're breaking ABI anyway.
>
> For older releases, a different approach would be required (i think
> scanning heaps is best we can do here) in order to keep the ABI and not
> introduce new stuff into rte_memseg_list.
>
> I'll submit a patch shortly, it would be great if you could test it.
>
> --
> Thanks,
> Anatoly
>
Burakov, Anatoly Nov. 5, 2019, 3:18 p.m. UTC | #13
On 05-Nov-19 2:10 PM, Rajesh Ravi wrote:
> Thanks a lot Anatoly.
> Will the same solution work with  DPDK 19.02 as well? We 're actually 
> using DPDK 19.02 for memory allocations for SPDK 19.07.
> DPDK 19.11 may not be supported by SPDK 19.07 we 're currently using.
> 
> I 'll definitely test if the patch works with DPDK 19.02
> 
> Thanks again for your time and help.
> 
> Regards,
> Rajesh

Hi Rajesh,

Please test the patch [1] and see if it works for you. The patch is not 
intended for 19.02 (in fact, 19.02 is not even supported any more), but 
it may still apply. In the meantime, i'm working on a different patch 
for 18.11, that will achieve the same thing using different means.

[1] http://patches.dpdk.org/patch/62468/
Burakov, Anatoly Nov. 5, 2019, 5:13 p.m. UTC | #14
On 05-Nov-19 3:18 PM, Burakov, Anatoly wrote:
> On 05-Nov-19 2:10 PM, Rajesh Ravi wrote:
>> Thanks a lot Anatoly.
>> Will the same solution work with  DPDK 19.02 as well? We 're actually 
>> using DPDK 19.02 for memory allocations for SPDK 19.07.
>> DPDK 19.11 may not be supported by SPDK 19.07 we 're currently using.
>>
>> I 'll definitely test if the patch works with DPDK 19.02
>>
>> Thanks again for your time and help.
>>
>> Regards,
>> Rajesh
> 
> Hi Rajesh,
> 
> Please test the patch [1] and see if it works for you. The patch is not 
> intended for 19.02 (in fact, 19.02 is not even supported any more), but 
> it may still apply. In the meantime, i'm working on a different patch 
> for 18.11, that will achieve the same thing using different means.
> 
> [1] http://patches.dpdk.org/patch/62468/
> 

The following is an alternative implementation that is based on 18.11:

http://patches.dpdk.org/patch/62486/
David Marchand Nov. 6, 2019, 1:55 p.m. UTC | #15
On Tue, Nov 5, 2019 at 6:14 PM Burakov, Anatoly
<anatoly.burakov@intel.com> wrote:
>
> On 05-Nov-19 3:18 PM, Burakov, Anatoly wrote:
> > On 05-Nov-19 2:10 PM, Rajesh Ravi wrote:
> >> Thanks a lot Anatoly.
> >> Will the same solution work with  DPDK 19.02 as well? We 're actually
> >> using DPDK 19.02 for memory allocations for SPDK 19.07.
> >> DPDK 19.11 may not be supported by SPDK 19.07 we 're currently using.
> >>
> >> I 'll definitely test if the patch works with DPDK 19.02
> >>
> >> Thanks again for your time and help.
> >>
> >> Regards,
> >> Rajesh
> >
> > Hi Rajesh,
> >
> > Please test the patch [1] and see if it works for you. The patch is not
> > intended for 19.02 (in fact, 19.02 is not even supported any more), but
> > it may still apply. In the meantime, i'm working on a different patch
> > for 18.11, that will achieve the same thing using different means.
> >
> > [1] http://patches.dpdk.org/patch/62468/
> >
>
> The following is an alternative implementation that is based off 18.11:
>
> http://patches.dpdk.org/patch/62486/

Rajesh,

Did you have a chance to test Anatoly alternative patch?


Thanks.
David Marchand Nov. 7, 2019, 3:51 p.m. UTC | #16
On Wed, Nov 6, 2019 at 2:55 PM David Marchand <david.marchand@redhat.com> wrote:
>
> On Tue, Nov 5, 2019 at 6:14 PM Burakov, Anatoly
> <anatoly.burakov@intel.com> wrote:
> >
> > On 05-Nov-19 3:18 PM, Burakov, Anatoly wrote:
> > > On 05-Nov-19 2:10 PM, Rajesh Ravi wrote:
> > >> Thanks a lot Anatoly.
> > >> Will the same solution work with  DPDK 19.02 as well? We 're actually
> > >> using DPDK 19.02 for memory allocations for SPDK 19.07.
> > >> DPDK 19.11 may not be supported by SPDK 19.07 we 're currently using.
> > >>
> > >> I 'll definitely test if the patch works with DPDK 19.02
> > >>
> > >> Thanks again for your time and help.
> > >>
> > >> Regards,
> > >> Rajesh
> > >
> > > Hi Rajesh,
> > >
> > > Please test the patch [1] and see if it works for you. The patch is not
> > > intended for 19.02 (in fact, 19.02 is not even supported any more), but
> > > it may still apply. In the meantime, i'm working on a different patch
> > > for 18.11, that will achieve the same thing using different means.
> > >
> > > [1] http://patches.dpdk.org/patch/62468/
> > >
> >
> > The following is an alternative implementation that is based off 18.11:
> >
> > http://patches.dpdk.org/patch/62486/
>

Thanks Rajesh, following your test report, I applied Anatoly patch.
A backport will go to 18.11.

I will mark this current proposal as rejected.
If you think it should still be considered, then please set this patch
state back to new in patchwork and explain what is missing in this
thread.


Thanks.
Rajesh Ravi Nov. 7, 2019, 4:16 p.m. UTC | #17
Thanks David.

With Anatoly's patch applied, my patch is no longer needed, as it serves the
same purpose.

Regards,
Rajesh


On Thu, Nov 7, 2019 at 9:21 PM David Marchand <david.marchand@redhat.com>
wrote:

> On Wed, Nov 6, 2019 at 2:55 PM David Marchand <david.marchand@redhat.com>
> wrote:
> >
> > On Tue, Nov 5, 2019 at 6:14 PM Burakov, Anatoly
> > <anatoly.burakov@intel.com> wrote:
> > >
> > > On 05-Nov-19 3:18 PM, Burakov, Anatoly wrote:
> > > > On 05-Nov-19 2:10 PM, Rajesh Ravi wrote:
> > > >> Thanks a lot Anatoly.
> > > >> Will the same solution work with  DPDK 19.02 as well? We 're
> actually
> > > >> using DPDK 19.02 for memory allocations for SPDK 19.07.
> > > >> DPDK 19.11 may not be supported by SPDK 19.07 we 're currently
> using.
> > > >>
> > > >> I 'll definitely test if the patch works with DPDK 19.02
> > > >>
> > > >> Thanks again for your time and help.
> > > >>
> > > >> Regards,
> > > >> Rajesh
> > > >
> > > > Hi Rajesh,
> > > >
> > > > Please test the patch [1] and see if it works for you. The patch is
> not
> > > > intended for 19.02 (in fact, 19.02 is not even supported any more),
> but
> > > > it may still apply. In the meantime, i'm working on a different patch
> > > > for 18.11, that will achieve the same thing using different means.
> > > >
> > > > [1] http://patches.dpdk.org/patch/62468/
> > > >
> > >
> > > The following is an alternative implementation that is based off 18.11:
> > >
> > > http://patches.dpdk.org/patch/62486/
> >
>
> Thanks Rajesh, following your test report, I applied Anatoly patch.
> A backport will go to 18.11.
>
> I will mark this current proposal as rejected.
> If you think it should still be considered, then please set this patch
> state back to new in patchwork and explain what is missing in this
> thread.
>
>
> Thanks.
> --
> David Marchand
>
>

Patch
diff mbox series

diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
index 05cae5f75..b46fb3870 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -78,6 +78,7 @@  eal_long_options[] = {
 	{OPT_VDEV,              1, NULL, OPT_VDEV_NUM             },
 	{OPT_VFIO_INTR,         1, NULL, OPT_VFIO_INTR_NUM        },
 	{OPT_VMWARE_TSC_MAP,    0, NULL, OPT_VMWARE_TSC_MAP_NUM   },
+	{OPT_ISO_CMEM,          0, NULL, OPT_ISO_CMEM_NUM         },
 	{OPT_LEGACY_MEM,        0, NULL, OPT_LEGACY_MEM_NUM       },
 	{OPT_SINGLE_FILE_SEGMENTS, 0, NULL, OPT_SINGLE_FILE_SEGMENTS_NUM},
 	{OPT_MATCH_ALLOCATIONS, 0, NULL, OPT_MATCH_ALLOCATIONS_NUM},
@@ -1327,6 +1328,10 @@  eal_parse_common_option(int opt, const char *optarg,
 		conf->no_hpet = 1;
 		break;
 
+	case OPT_ISO_CMEM_NUM:
+		conf->iso_cmem = 1;
+		break;
+
 	case OPT_VMWARE_TSC_MAP_NUM:
 		conf->vmware_tsc_map = 1;
 		break;
diff --git a/lib/librte_eal/common/eal_internal_cfg.h b/lib/librte_eal/common/eal_internal_cfg.h
index a42f34923..fb64b5f79 100644
--- a/lib/librte_eal/common/eal_internal_cfg.h
+++ b/lib/librte_eal/common/eal_internal_cfg.h
@@ -43,6 +43,7 @@  struct internal_config {
 	volatile unsigned no_hugetlbfs;   /**< true to disable hugetlbfs */
 	unsigned hugepage_unlink;         /**< true to unlink backing files */
 	volatile unsigned no_pci;         /**< true to disable PCI */
+	unsigned int iso_cmem;            /**< true to enable isolated cmem */
 	volatile unsigned no_hpet;        /**< true to disable HPET */
 	volatile unsigned vmware_tsc_map; /**< true to use VMware TSC mapping
 										* instead of native TSC */
diff --git a/lib/librte_eal/common/eal_options.h b/lib/librte_eal/common/eal_options.h
index 9855429e5..f56e2536b 100644
--- a/lib/librte_eal/common/eal_options.h
+++ b/lib/librte_eal/common/eal_options.h
@@ -61,6 +61,8 @@  enum {
 	OPT_VFIO_INTR_NUM,
 #define OPT_VMWARE_TSC_MAP    "vmware-tsc-map"
 	OPT_VMWARE_TSC_MAP_NUM,
+#define OPT_ISO_CMEM          "iso-cmem"
+	OPT_ISO_CMEM_NUM,
 #define OPT_LEGACY_MEM    "legacy-mem"
 	OPT_LEGACY_MEM_NUM,
 #define OPT_SINGLE_FILE_SEGMENTS    "single-file-segments"
diff --git a/lib/librte_eal/linux/eal/eal_vfio.c b/lib/librte_eal/linux/eal/eal_vfio.c
index 501c74f23..8fbad63cc 100644
--- a/lib/librte_eal/linux/eal/eal_vfio.c
+++ b/lib/librte_eal/linux/eal/eal_vfio.c
@@ -1250,7 +1250,7 @@  type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
 {
 	int *vfio_container_fd = arg;
 
-	if (msl->external)
+	if (msl->external && !internal_config.iso_cmem)
 		return 0;
 
 	return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,