Message ID: cover.1550048187.git.shahafs@mellanox.com (mailing list archive)
From: Shahaf Shuler <shahafs@mellanox.com>
Date: Wed, 13 Feb 2019 11:10:20 +0200
Subject: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for external memory
Series: introduce DMA memory mapping for external memory
Message
Shahaf Shuler
Feb. 13, 2019, 9:10 a.m. UTC
This series is a follow-up to RFC[1].

The DPDK APIs expose 3 different modes to work with memory used for DMA:

1. Use the DPDK-owned memory (backed by the DPDK-provided hugepages). This
   memory is allocated by the DPDK libraries, included in the DPDK memory
   system (memseg lists) and automatically DMA-mapped by the DPDK layers.

2. Use memory allocated by the user and registered to the DPDK memory
   system. This is also referred to as external memory. Upon registration
   of the external memory, the DPDK layers will DMA-map it to all needed
   devices.

3. Use memory allocated by the user and not registered to the DPDK memory
   system. This is for users who want tight control over this memory. The
   user needs to explicitly call a DMA map function in order to register
   such memory with the different devices.

The scope of this patchset is #3 above.

Currently the only way to map external memory is through VFIO
(rte_vfio_dma_map). While VFIO is common, there are other vendors which
use different ways to map memory (e.g. Mellanox and NXP).

The work in this patchset moves the DMA mapping to vendor-agnostic APIs.
New map and unmap ops were added to the rte_bus structure; they are
currently implemented only for the PCI bus. The implementation treats the
driver's map and unmap callbacks as a bypass of the VFIO mapping. That is,
when the PCI driver provides no specific map/unmap, VFIO mapping, if
possible, will be used.

Application use of those APIs is quite simple:
* allocate memory
* take a device, and query its rte_device
* call the bus map function for this device

Future work will deprecate the rte_vfio_dma_map and rte_vfio_dma_unmap
APIs, leaving the PCI device APIs as the preferred option for the user.
[1] https://patches.dpdk.org/patch/47796/

Shahaf Shuler (6):
  vfio: allow DMA map of memory for the default vfio fd
  vfio: don't fail to DMA map if memory is already mapped
  bus: introduce DMA memory mapping for external memory
  net/mlx5: refactor external memory registration
  net/mlx5: support PCI device DMA map and unmap
  doc: deprecate VFIO DMA map APIs

 doc/guides/prog_guide/env_abstraction_layer.rst |   2 +-
 doc/guides/rel_notes/deprecation.rst            |   4 +
 drivers/bus/pci/pci_common.c                    |  78 +++++++
 drivers/bus/pci/rte_bus_pci.h                   |  14 ++
 drivers/net/mlx5/mlx5.c                         |   2 +
 drivers/net/mlx5/mlx5_mr.c                      | 232 ++++++++++++++++---
 drivers/net/mlx5/mlx5_rxtx.h                    |   5 +
 lib/librte_eal/common/eal_common_bus.c          |  22 ++
 lib/librte_eal/common/include/rte_bus.h         |  57 +++++
 lib/librte_eal/common/include/rte_vfio.h        |  12 +-
 lib/librte_eal/linuxapp/eal/eal_vfio.c          |  26 ++-
 lib/librte_eal/rte_eal_version.map              |   2 +
 12 files changed, 418 insertions(+), 38 deletions(-)
Comments
On Wed, Feb 13, 2019 at 9:11 AM Shahaf Shuler <shahafs@mellanox.com> wrote:

> [...]
>
> 3. Use memory allocated by the user and not registered to the DPDK memory
>    system. This is for users who wants to have tight control on this
>    memory. The user will need to explicitly call DMA map function in
>    order to register such memory to the different devices.
>
> The scope of the patch focus on #3 above.

Why can not we have case 2 covering case 3?

> Currently the only way to map external memory is through VFIO
> (rte_vfio_dma_map). While VFIO is common, there are other vendors
> which use different ways to map memory (e.g. Mellanox and NXP).

As you say, VFIO is common, and when allowing DMAs programmed in user
space, the right thing to do. I'm assuming there is an IOMMU hardware and
this is what Mellanox and NXP rely on in some way or another.

Having each driver doing things in their own way will end up in a harder
to validate system. If there is an IOMMU hardware, same mechanism should
be used always, leaving to the IOMMU hw specific implementation to deal
with the details. If a NIC is IOMMU-able, that should not be supported by
specific vendor drivers but through a generic solution like VFIO which
will validate a device with such capability and to perform the required
actions for that case. VFIO and IOMMU should be modified as needed for
supporting this requirement instead of leaving vendor drivers to
implement their own solution.

In any case, I think this support should be in a different patchset than
the private user space mappings.

> The work in this patch moves the DMA mapping to vendor agnostic APIs.
> [...]
Wednesday, February 13, 2019 1:43 PM, Alejandro Lucero:
> Subject: Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for
> external memory
>
> On Wed, Feb 13, 2019 at 9:11 AM Shahaf Shuler <shahafs@mellanox.com>
> wrote:
>
> > [...]
> >
> > The scope of the patch focus on #3 above.
>
> Why can not we have case 2 covering case 3?

Because it is not our choice, rather the DPDK application's.
We could not allow it, and force the application to register their
external memory to the DPDK memory management system. However IMO that
would be wrong. The use case exists - some applications want to manage
their memory by themselves: w/o the extra overhead of rte_malloc, without
creating a special socket to populate the memory and without redundant
API calls to rte_extmem_*.

Simply allocate a chunk of memory, DMA map it to the device and that's it.

> > Currently the only way to map external memory is through VFIO
> > (rte_vfio_dma_map). While VFIO is common, there are other vendors
> > which use different ways to map memory (e.g. Mellanox and NXP).
>
> As you say, VFIO is common, and when allowing DMAs programmed in user
> space, the right thing to do.

It is common indeed. Why is it the right thing to do?

> I'm assuming there is an IOMMU hardware and
> this is what Mellanox and NXP rely on in some way or another.

For Mellanox, the device works with virtual memory, not physical. If you
think of it, it is more secure for user space applications. The Mellanox
device has an internal memory translation unit between virtual memory and
physical memory.
An IOMMU can be added on top of it, in case the host doesn't trust the
device or the device is given to an untrusted entity like a VM.

> Having each driver doing things in their own way will end up in a harder
> to validate system.

Different vendors will have different HW implementations. We cannot force
everybody to align on the IOMMU.
What we can do is ease the user's life and provide vendor-agnostic APIs
which just provide the needed functionality. In our case, DMA map and
unmap.
The user should not care whether it is IOMMU, Mellanox memory registration
through verbs or NXP special mapping.
The sys admin should set/unset the IOMMU as a general means of protection,
and this of course will work also w/ Mellanox devices.

> If there is an IOMMU hardware, same mechanism should be
> used always, leaving to the IOMMU hw specific implementation to deal with
> the details. If a NIC is IOMMU-able, that should not be supported by
> specific vendor drivers but through a generic solution like VFIO which
> will validate a device with such capability and to perform the required
> actions for that case. VFIO and IOMMU should be modified as needed for
> supporting this requirement instead of leaving vendor drivers to
> implement their own solution.

Again - I am against forcing every PCI device to use VFIO, and I don't
think the IOMMU as a HW device should control other PCI devices.
I see nothing wrong with a device which also has the extra capability of
memory translation, and adds another level of security to the user
application.

> In any case, I think this support should be in a different patchset than
> the private user space mappings.
>
> [...]
On 13-Feb-19 7:24 PM, Shahaf Shuler wrote:
> [...]
>
> Because it is not our choice rather the DPDK application.
> We could not allow it, and force the application to register their
> external memory to the DPDK memory management system. However IMO it
> will be wrong.
> The use case exists - some application wants to manage their memory by
> themselves. w/o the extra overhead of rte_malloc, without creating a
> special socket to populate the memory and without redundant API calls to
> rte_extmem_*.
>
> Simply allocate chunk of memory, DMA map it to device and that’s it.

Just a small note: while this sounds good on paper, i should point out
that at least *registering* the memory with DPDK is a necessity. You may
see rte_extmem_* calls as redundant (and i agree, to an extent), but we
don't advertise our PMD's capabilities in a way that makes it easy to
determine whether a particular PMD will or will not work without
registering external memory within DPDK (i.e. does it use
rte_virt2memseg() internally, for example). So, extmem register calls are
a necessary evil in such case, and IMO should be called out as required
for such external memory usage scenario.
On Wed, Feb 13, 2019 at 7:24 PM Shahaf Shuler <shahafs@mellanox.com> wrote:

> [...]
>
> The use case exists - some application wants to manage their memory by
> themselves. w/o the extra overhead of rte_malloc, without creating a
> special socket to populate the memory and without redundant API calls to
> rte_extmem_*.
>
> Simply allocate chunk of memory, DMA map it to device and that’s it.

Usability is a strong point, but up to some extent. DPDK is all about
performance, and adding options the user can choose from will add
pressure and complexity for keeping the performance. Your proposal makes
sense from a user point of view, but will it avoid modifying things in
the DPDK core for supporting this case broadly in the future?
Multiprocess will be hard to get, if not impossible, without adding more
complexity, and although you likely do not expect that use case requiring
multiprocess support, once we have DPDK apps using this model, sooner or
later those companies with products based on such an option will demand
broad support. I can foresee not just multiprocess support will require
changes in the future.

This reminds me of the case of obtaining real time: the more complexity,
the less determinism can be obtained. It is not impossible, simply it is
far more complex. Pure real time operating systems can add new
functionalities, but it is hard to do it properly without jeopardising
the main goal. General purpose operating systems can try to improve
determinism, but up to some extent and with important complexity costs.
DPDK is the real time operating system in this comparison.

> > As you say, VFIO is common, and when allowing DMAs programmed in user
> > space, the right thing to do.
>
> It is common indeed. Why it the right thing to do?

Compared with UIO, for sure. VFIO does have the right view of the system
in terms of which devices can properly be isolated. Can you confirm a
specific implementation by a vendor can ensure the same behaviour? If so,
do you have duplicated code then? If the answer is you are using VFIO
data, why not use VFIO as the interface and add the required connection
between VFIO and drivers?

What about mapping validation? Is the driver doing that part or relying
on kernel code? Or is it just assuming the mapping is safe?

> For Mellanox, the device works with virtual memory, not physical. If you
> think of it, it is more secure for user space application. Mellanox
> device has internal memory translation unit between virtual memory and
> physical memory.
> IOMMU can be added on top of it, in case the host doesn't trust the
> device or the device is given to untrusted entity like VM.

Any current NIC or device will work with virtual addresses if an IOMMU is
in place, no matter if the device is IOMMU-aware or not. Any vendor, with
that capability in their devices, should follow generic paths and a
common interface with the vendor drivers being the executors. The drivers
know how to tell the device, but they should be told what to tell, and
not by the user but by the kernel.

I think reading your comment "in case the host doesn't trust the device"
makes it easier to understand what you try to obtain, and at the same
time makes my concerns not a problem at all. This is a case for DPDK
being used in certain scenarios where the full system is trusted, which I
think is a completely rightful option. My only concern then is the
complexity it could imply sooner or later, although I have to admit it is
not a strong one :-)

> Different vendors will have different HW implementations. We cannot
> force everybody to align the IOMMU.
> What we can do, is to ease the user life and provide vendor agnostic
> APIs which just provide the needed functionality. On our case DMA map
> and unmap.
> The user should not care if its IOMMU, Mellanox memory registration
> through verbs or NXP special mapping.
>
> The sys admin should set/unset the IOMMU as a general mean of
> protection. And this of course will work also w/ Mellanox devices.
>
> Again - I am against of forcing every PCI device to use VFIO, and I
> don't think IOMMU as a HW device should control other PCI devices.
> I see nothing wrong with device which also has extra capabilities of
> memory translation, and adds another level of security to the user
> application.

In a system with untrusted components using the device, a generic way of
properly configuring the system with the right protections should be used
instead of relying on specific vendor implementations.

> [...]
On Thu, Feb 14, 2019 at 12:22 PM Alejandro Lucero < alejandro.lucero@netronome.com> wrote: > > > On Wed, Feb 13, 2019 at 7:24 PM Shahaf Shuler <shahafs@mellanox.com> > wrote: > >> Wednesday, February 13, 2019 1:43 PM, Alejandro Lucero: >> > Subject: Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for >> > external memory >> > >> > On Wed, Feb 13, 2019 at 9:11 AM Shahaf Shuler <shahafs@mellanox.com> >> > wrote: >> > >> > > This series is in continue to RFC[1]. >> > > >> > > The DPDK APIs expose 3 different modes to work with memory used for >> > DMA: >> > > >> > > 1. Use the DPDK owned memory (backed by the DPDK provided >> > hugepages). >> > > This memory is allocated by the DPDK libraries, included in the DPDK >> > > memory system (memseg lists) and automatically DMA mapped by the >> > DPDK >> > > layers. >> > > >> > > 2. Use memory allocated by the user and register to the DPDK memory >> > > systems. This is also referred as external memory. Upon registration >> > > of the external memory, the DPDK layers will DMA map it to all needed >> > > devices. >> > > >> > > 3. Use memory allocated by the user and not registered to the DPDK >> > > memory system. This is for users who wants to have tight control on >> > > this memory. The user will need to explicitly call DMA map function in >> > > order to register such memory to the different devices. >> > > >> > > The scope of the patch focus on #3 above. >> > > >> > > >> > Why can not we have case 2 covering case 3? >> >> Because it is not our choice rather the DPDK application. >> We could not allow it, and force the application to register their >> external memory to the DPDK memory management system. However IMO it will >> be wrong. >> The use case exists - some application wants to manage their memory by >> themselves. w/o the extra overhead of rte_malloc, without creating a >> special socket to populate the memory and without redundant API calls to >> rte_extmem_*. 
>> >> Simply allocate chunk of memory, DMA map it to device and that’s it. >> >> > Usability is a strong point, but up to some extent. DPDK is all about > performance, and adding options the user can choose from will add pressure > and complexity for keeping the performance. Your proposal makes sense from > an user point of view, but will it avoid to modify things in the DPDK core > for supporting this case broadly in the future? Multiprocess will be hard > to get, if not impossible, without adding more complexity, and although you > likely do not expect that use case requiring multiprocess support, once we > have DPDK apps using this model, sooner or later those companies with > products based on such option will demand broadly support. I can foresee > not just multiprocess support will require changes in the future. > > This reminds me the case of obtaining real time: the more complexity the > less determinism can be obtained. It is not impossible, simply it is far > more complex. Pure real time operating systems can add new functionalities, > but it is hard to do it properly without jeopardising the main goal. > Generic purpose operating systems can try to improve determinism, but up to > some extent and with important complexity costs. DPDK is the real time > operating system in this comparison. > > >> > >> > >> > > Currently the only way to map external memory is through VFIO >> > > (rte_vfio_dma_map). While VFIO is common, there are other vendors >> > > which use different ways to map memory (e.g. Mellanox and NXP). >> > > >> > > >> > As you say, VFIO is common, and when allowing DMAs programmed in user >> > space, the right thing to do. >> >> It is common indeed. Why it the right thing to do? >> >> > Compared with UIO, for sure. VFIO does have the right view of the system > in terms of which devices can properly be isolated. Can you confirm a > specific implementation by a vendor can ensure same behaviour? If so, do > you have duplicated code then? 
if the answer is your are using VFIO data, > why not to use VFIO as the interface and add the required connection > between VFIO and drivers? > > What about mapping validation? is the driver doing that part or relying on > kernel code? or is it just assuming the mapping is safe? > > >> I'm assuming there is an IOMMU hardware and >> > this is what Mellanox and NXP rely on in some way or another. >> >> For Mellanox, the device works with virtual memory, not physical. If you >> think of it, it is more secure for user space application. Mellanox device >> has internal memory translation unit between virtual memory and physical >> memory. >> IOMMU can be added on top of it, in case the host doesn't trust the >> device or the device is given to untrusted entity like VM. >> >> > Any current NIC or device will work with virtual addresses if IOMMU is in > place, not matter if the device is IOMMU-aware or not. Any vendor, with > that capability in their devices, should follow generic paths and a common > interface with the vendor drivers being the executors. The drivers know how > to tell the device, but they should be told what to tell and not by the > user but by the kernel. > > I think reading your comment "in case the host doesn't trust the device" > makes easier to understand what you try to obtain, and at the same time > makes my concerns not a problem at all. This is a case for DPDK being used > in certain scenarios where the full system is trusted, what I think is a > completely rightful option. My only concern then is the complexity it could > imply sooner or later, although I have to admit it is not a strong one :-) > > I forgot to mention the problem of leaving that option open in not fully trusted systems. I do not know how it could be avoided, maybe some checks in EAL initialization, but maybe this is not possible at all. Anyway, I think this is worth to be discussed further. 
> >> > Having each driver doing things in their own way will end up in a
> >> > harder to validate system.
>>
>> Different vendors will have different HW implementations. We cannot force
>> everybody to align to the IOMMU.
>> What we can do is ease the user's life and provide vendor agnostic APIs
>> which just provide the needed functionality. In our case, DMA map and
>> unmap.
>> The user should not care if it's IOMMU, Mellanox memory registration
>> through verbs or NXP special mapping.
>>
>> The sys admin should set/unset the IOMMU as a general means of protection.
>> And this of course will also work with Mellanox devices.
>>
>> > If there is IOMMU hardware, the same mechanism should be
>> > used always, leaving the IOMMU hw specific implementation to deal with
>> > the details. If a NIC is IOMMU-able, that should not be supported by
>> > specific vendor drivers but through a generic solution like VFIO, which
>> > will validate a device with such capability and perform the required
>> > actions for that case.
>> > VFIO and the IOMMU should be modified as needed for supporting this
>> > requirement instead of leaving vendor drivers to implement their own
>> > solution.
>>
>> Again - I am against forcing every PCI device to use VFIO, and I don't
>> think the IOMMU as a HW device should control other PCI devices.
>> I see nothing wrong with a device which also has extra capabilities of
>> memory translation, and adds another level of security to the user
>> application.
>>
>
> In a system with untrusted components using the device, a generic way of
> properly configuring the system with the right protections should be used
> instead of relying on specific vendor implementations.
>
>
>> >
>> > In any case, I think this support should be in a different patchset
>> > than the private user space mappings.
>> >
>> >
>> > > The work in this patch moves the DMA mapping to vendor agnostic APIs.
>> > > New map and unmap ops were added to the rte_bus structure.
>> > > Implementation of those was done currently only on the PCI bus. The
>> > > implementation takes the driver map and unmap implementation as a
>> > > bypass to the VFIO mapping.
>> > > That is, in case of no specific map/unmap from the PCI driver, VFIO
>> > > mapping, if possible, will be used.
>> > >
>> > > Application use with those APIs is quite simple:
>> > > * allocate memory
>> > > * take a device, and query its rte_device.
>> > > * call the bus map function for this device.
>> > >
>> > > Future work will deprecate the rte_vfio_dma_map and
>> > > rte_vfio_dma_unmap APIs, leaving the PCI device APIs as the preferred
>> > > option for the user.
>> > >
>> > > [1] https://patches.dpdk.org/patch/47796/
>> > >
>> > > Shahaf Shuler (6):
>> > >   vfio: allow DMA map of memory for the default vfio fd
>> > >   vfio: don't fail to DMA map if memory is already mapped
>> > >   bus: introduce DMA memory mapping for external memory
>> > >   net/mlx5: refactor external memory registration
>> > >   net/mlx5: support PCI device DMA map and unmap
>> > >   doc: deprecate VFIO DMA map APIs
>> > >
>> > >  doc/guides/prog_guide/env_abstraction_layer.rst |   2 +-
>> > >  doc/guides/rel_notes/deprecation.rst            |   4 +
>> > >  drivers/bus/pci/pci_common.c                    |  78 +++++++
>> > >  drivers/bus/pci/rte_bus_pci.h                   |  14 ++
>> > >  drivers/net/mlx5/mlx5.c                         |   2 +
>> > >  drivers/net/mlx5/mlx5_mr.c                      | 232 ++++++++++++++++---
>> > >  drivers/net/mlx5/mlx5_rxtx.h                    |   5 +
>> > >  lib/librte_eal/common/eal_common_bus.c          |  22 ++
>> > >  lib/librte_eal/common/include/rte_bus.h         |  57 +++++
>> > >  lib/librte_eal/common/include/rte_vfio.h        |  12 +-
>> > >  lib/librte_eal/linuxapp/eal/eal_vfio.c          |  26 ++-
>> > >  lib/librte_eal/rte_eal_version.map              |   2 +
>> > >  12 files changed, 418 insertions(+), 38 deletions(-)
>> > >
>> > > --
>> > > 2.12.0
>> > >
>> > >
>>
>
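The three application steps listed in the cover letter, put into code, would look roughly like this. This is a sketch only: the `dma_map` bus operation and its exact signature are assumptions based on this series' description, not a merged DPDK API, and `map_external_buf` is a hypothetical helper.

```c
/* Sketch of the proposed flow; assumes the dma_map bus op from this
 * series. Not compilable against any released DPDK. */
#include <stdint.h>
#include <stdlib.h>
#include <rte_bus.h>
#include <rte_ethdev.h>

int
map_external_buf(uint16_t port_id, size_t len)
{
	struct rte_eth_dev *edev = &rte_eth_devices[port_id];

	/* 1. allocate memory (outside of DPDK's memory manager) */
	void *addr = aligned_alloc(4096, len);
	if (addr == NULL)
		return -1;

	/* 2. take a device, and query its rte_device */
	struct rte_device *dev = edev->device;

	/* 3. call the bus map function for this device; with no
	 * driver-specific callback this falls back to VFIO mapping */
	return dev->bus->dma_map(dev, addr, (uint64_t)(uintptr_t)addr, len);
}
```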
Thursday, February 14, 2019 12:19 PM, Burakov, Anatoly:
> Subject: Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for
> external memory
>
> On 13-Feb-19 7:24 PM, Shahaf Shuler wrote:
> > Wednesday, February 13, 2019 1:43 PM, Alejandro Lucero:
> >> Subject: Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for
> >> external memory
> >>
> >> On Wed, Feb 13, 2019 at 9:11 AM Shahaf Shuler
> <shahafs@mellanox.com>
> >> wrote:
> >>
> >>> This series is a continuation of RFC [1].
> >>>
> >>> The DPDK APIs expose 3 different modes to work with memory used for
> >> DMA:
> >>>
> >>> 1. Use the DPDK owned memory (backed by the DPDK provided
> >> hugepages).
> >>> This memory is allocated by the DPDK libraries, included in the DPDK
> >>> memory system (memseg lists) and automatically DMA mapped by the
> >> DPDK
> >>> layers.
> >>>
> >>> 2. Use memory allocated by the user and registered to the DPDK memory
> >>> systems. This is also referred to as external memory. Upon registration
> >>> of the external memory, the DPDK layers will DMA map it to all
> >>> needed devices.
> >>>
> >>> 3. Use memory allocated by the user and not registered to the DPDK
> >>> memory system. This is for users who want to have tight control over
> >>> this memory. The user will need to explicitly call the DMA map function
> >>> in order to register such memory to the different devices.
> >>>
> >>> The scope of the patch focuses on #3 above.
> >>>
> >>>
> >> Why can we not have case 2 covering case 3?
> >
> > Because it is not our choice, rather the DPDK application's.
> > We could not allow it, and force the application to register their external
> memory to the DPDK memory management system. However, IMO it would be
> wrong.
> > The use case exists - some applications want to manage their memory by
> themselves. W/o the extra overhead of rte_malloc, without creating a special
> socket to populate the memory and without redundant API calls to
> rte_extmem_*.
> > > Simply allocate a chunk of memory, DMA map it to the device and that’s it.
> >
> Just a small note: while this sounds good on paper, I should point out that
> at least *registering* the memory with DPDK is a necessity. You may see
> rte_extmem_* calls as redundant (and I agree, to an extent), but we don't
> advertise our PMDs' capabilities in a way that makes it easy to determine
> whether a particular PMD will or will not work without registering external
> memory within DPDK (i.e. does it use
> rte_virt2memseg() internally, for example).
>
> So, extmem register calls are a necessary evil in such a case, and IMO
> should be called out as required for such an external memory usage scenario.

If we are going to force everyone to use extmem, then there is no need for
this API. We can have the PMDs do the mapping when the memory is registered.
We can just drop the vfio_dma_map APIs and that's it.

>
> --
> Thanks,
> Anatoly
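For reference, the register-then-map flow under discussion looks roughly like this against the 19.02-era API. A sketch only: error handling is elided, the hugepage mmap flags are an assumption about how the application obtained its memory, and `register_and_map` is a hypothetical helper.

```c
/* Sketch of "registering is a necessity": user-managed memory is first
 * made known to DPDK via rte_extmem_register(), then DMA mapped with
 * the VFIO-specific API this series proposes to deprecate. */
#include <stdint.h>
#include <sys/mman.h>
#include <rte_memory.h>
#include <rte_vfio.h>

static int
register_and_map(size_t page_sz, unsigned int n_pages)
{
	size_t len = page_sz * n_pages;
	void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (addr == MAP_FAILED)
		return -1;

	/* let DPDK know this memory exists - no malloc heap or external
	 * socket is involved */
	if (rte_extmem_register(addr, len, NULL, 0, page_sz) != 0)
		return -1;

	/* DMA map it for the default VFIO container (IOVA == VA here) */
	return rte_vfio_dma_map((uint64_t)(uintptr_t)addr,
				(uint64_t)(uintptr_t)addr, len);
}
```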
Thursday, February 14, 2019 2:22 PM, Alejandro Lucero: >On Wed, Feb 13, 2019 at 7:24 PM Shahaf Shuler <shahafs@mellanox.com> wrote: >Wednesday, February 13, 2019 1:43 PM, Alejandro Lucero: >> Subject: Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for >> external memory >> >> On Wed, Feb 13, 2019 at 9:11 AM Shahaf Shuler <shahafs@mellanox.com> >> wrote: >> >> > This series is in continue to RFC[1]. >> > >> > The DPDK APIs expose 3 different modes to work with memory used for >> DMA: >> > >> > 1. Use the DPDK owned memory (backed by the DPDK provided >> hugepages). >> > This memory is allocated by the DPDK libraries, included in the DPDK >> > memory system (memseg lists) and automatically DMA mapped by the >> DPDK >> > layers. >> > >> > 2. Use memory allocated by the user and register to the DPDK memory >> > systems. This is also referred as external memory. Upon registration >> > of the external memory, the DPDK layers will DMA map it to all needed >> > devices. >> > >> > 3. Use memory allocated by the user and not registered to the DPDK >> > memory system. This is for users who wants to have tight control on >> > this memory. The user will need to explicitly call DMA map function in >> > order to register such memory to the different devices. >> > >> > The scope of the patch focus on #3 above. >> > >> > >> Why can not we have case 2 covering case 3? > >Because it is not our choice rather the DPDK application. >We could not allow it, and force the application to register their external memory to the DPDK memory management system. However IMO it will be wrong. >The use case exists - some application wants to manage their memory by themselves. w/o the extra overhead of rte_malloc, without creating a special socket to populate the memory and without redundant API calls to rte_extmem_*. > >Simply allocate chunk of memory, DMA map it to device and that’s it. > >Usability is a strong point, but up to some extent. 
DPDK is all about performance, and adding options the user can choose from will add pressure and complexity for keeping that performance. Your proposal makes sense from a user point of view, but will it avoid having to modify things in the DPDK core for supporting this case broadly in the future? Multiprocess will be hard to get, if not impossible, without adding more complexity, and although you likely do not expect that use case to require multiprocess support, once we have DPDK apps using this model, sooner or later those companies with products based on such an option will demand broad support. I can foresee that not just multiprocess support will require changes in the future.
>
>This reminds me of the case of obtaining real time: the more complexity, the less determinism can be obtained. It is not impossible, simply it is far more complex. Pure real time operating systems can add new functionalities, but it is hard to do it properly without jeopardising the main goal. General purpose operating systems can try to improve determinism, but only up to some extent and with important complexity costs. DPDK is the real time operating system in this comparison.

It makes some sense. As I wrote to Anatoly, I am not against forcing the user to work only with DPDK registered memory. We may cause some overhead for the application, but it will make things less complex.
We just need to agree on it, and remove backdoors like vfio_dma_map (which, BTW, is currently being used by applications - check out VPP).

>
>>
>>
>> > Currently the only way to map external memory is through VFIO
>> > (rte_vfio_dma_map). While VFIO is common, there are other vendors
>> > which use different ways to map memory (e.g. Mellanox and NXP).
>> >
>> >
>> As you say, VFIO is common, and when allowing DMAs programmed in user
>> space, the right thing to do.
>
>It is common indeed. Why is it the right thing to do?
>
>Compared with UIO, for sure. VFIO does have the right view of the system in terms of which devices can properly be isolated.
Can you confirm a specific implementation by a vendor can ensure the same behaviour? If so, do you have duplicated code then? If the answer is you are using VFIO data, why not use VFIO as the interface and add the required connection between VFIO and the drivers?
>
>What about mapping validation? Is the driver doing that part or relying on kernel code? Or is it just assuming the mapping is safe?

Mapping validation is done by the Mellanox kernel module; the kernel is trusted.

>
> I'm assuming there is IOMMU hardware and
>> this is what Mellanox and NXP rely on in some way or another.
>
>For Mellanox, the device works with virtual memory, not physical. If you think of it, it is more secure for the user space application. The Mellanox device has an internal memory translation unit between virtual memory and physical memory.
>An IOMMU can be added on top of it, in case the host doesn't trust the device or the device is given to an untrusted entity like a VM.
>
>Any current NIC or device will work with virtual addresses if an IOMMU is in place, no matter whether the device is IOMMU-aware or not.

Not sure what you mean here. For example, Intel devices work with VFIO and use IOVAs to provide buffers to the NIC, hence protection between multiple processes is the application's responsibility, or requires a new VFIO container.
For devices which work with virtual addresses, sharing a device between multiple processes is simple and secure.

Any vendor, with that capability in their devices, should follow generic paths and a common interface, with the vendor drivers being the executors. The drivers know how to tell the device, but they should be told what to tell, and not by the user but by the kernel.
>
>I think reading your comment "in case the host doesn't trust the device" makes it easier to understand what you are trying to obtain, and at the same time makes my concerns not a problem at all. This is a case for DPDK being used in certain scenarios where the full system is trusted, which I think is a completely legitimate option.
My only concern then is the complexity it could imply sooner or later, although I have to admit it is not a strong one :-)
>
>>
>> Having each driver doing things in their own way will end up in a harder
>> to validate system.
>
>Different vendors will have different HW implementations. We cannot force everybody to align to the IOMMU.
>What we can do is ease the user's life and provide vendor agnostic APIs which just provide the needed functionality. In our case, DMA map and unmap.
>The user should not care if it's IOMMU, Mellanox memory registration through verbs or NXP special mapping.
>
>The sys admin should set/unset the IOMMU as a general means of protection. And this of course will also work with Mellanox devices.
>
>If there is IOMMU hardware, the same mechanism should be
>> used always, leaving the IOMMU hw specific implementation to deal with
>> the details. If a NIC is IOMMU-able, that should not be supported by specific
>> vendor drivers but through a generic solution like VFIO, which will validate a
>> device with such capability and perform the required actions for that case.
>> VFIO and the IOMMU should be modified as needed for supporting this
>> requirement instead of leaving vendor drivers to implement their own
>> solution.
>
>Again - I am against forcing every PCI device to use VFIO, and I don't think the IOMMU as a HW device should control other PCI devices.
>I see nothing wrong with a device which also has extra capabilities of memory translation, and adds another level of security to the user application.
>
>In a system with untrusted components using the device, a generic way of properly configuring the system with the right protections should be used instead of relying on specific vendor implementations.
>
>>
>> In any case, I think this support should be in a different patchset than the
>> private user space mappings.
>>
>>
>>
>> > The work in this patch moves the DMA mapping to vendor agnostic APIs.
>> > New map and unmap ops were added to the rte_bus structure.
>> > Implementation of those was done currently only on the PCI bus. The
>> > implementation takes the driver map and unmap implementation as a
>> > bypass to the VFIO mapping.
>> > That is, in case of no specific map/unmap from the PCI driver, VFIO
>> > mapping, if possible, will be used.
>> >
>
On 14-Feb-19 1:28 PM, Shahaf Shuler wrote: > Thursday, February 14, 2019 12:19 PM, Burakov, Anatoly: >> Subject: Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for >> external memory >> >> On 13-Feb-19 7:24 PM, Shahaf Shuler wrote: >>> Wednesday, February 13, 2019 1:43 PM, Alejandro Lucero: >>>> Subject: Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for >>>> external memory >>>> >>>> On Wed, Feb 13, 2019 at 9:11 AM Shahaf Shuler >> <shahafs@mellanox.com> >>>> wrote: >>>> >>>>> This series is in continue to RFC[1]. >>>>> >>>>> The DPDK APIs expose 3 different modes to work with memory used for >>>> DMA: >>>>> >>>>> 1. Use the DPDK owned memory (backed by the DPDK provided >>>> hugepages). >>>>> This memory is allocated by the DPDK libraries, included in the DPDK >>>>> memory system (memseg lists) and automatically DMA mapped by the >>>> DPDK >>>>> layers. >>>>> >>>>> 2. Use memory allocated by the user and register to the DPDK memory >>>>> systems. This is also referred as external memory. Upon registration >>>>> of the external memory, the DPDK layers will DMA map it to all >>>>> needed devices. >>>>> >>>>> 3. Use memory allocated by the user and not registered to the DPDK >>>>> memory system. This is for users who wants to have tight control on >>>>> this memory. The user will need to explicitly call DMA map function >>>>> in order to register such memory to the different devices. >>>>> >>>>> The scope of the patch focus on #3 above. >>>>> >>>>> >>>> Why can not we have case 2 covering case 3? >>> >>> Because it is not our choice rather the DPDK application. >>> We could not allow it, and force the application to register their external >> memory to the DPDK memory management system. However IMO it will be >> wrong. >>> The use case exists - some application wants to manage their memory by >> themselves. 
w/o the extra overhead of rte_malloc, without creating a special
>> socket to populate the memory and without redundant API calls to
>> rte_extmem_*.
>>>
>>> Simply allocate a chunk of memory, DMA map it to the device and that’s it.
>>
>> Just a small note: while this sounds good on paper, I should point out that
>> at least *registering* the memory with DPDK is a necessity. You may see
>> rte_extmem_* calls as redundant (and I agree, to an extent), but we don't
>> advertise our PMDs' capabilities in a way that makes it easy to determine
>> whether a particular PMD will or will not work without registering external
>> memory within DPDK (i.e. does it use
>> rte_virt2memseg() internally, for example).
>>
>> So, extmem register calls are a necessary evil in such a case, and IMO
>> should be called out as required for such an external memory usage scenario.
>
> If we are going to force everyone to use extmem, then there is no need for this API. We can have the PMDs do the mapping when the memory is registered.
> We can just drop the vfio_dma_map APIs and that's it.
>

Well, whether we need it or not is not really my call, but what I can say
is that using extmem_register is _necessary_ if you're going to use the
PMDs. You're right, we could just map memory for DMA at register time -
that would save one API call to get the memory working. It makes it a bit
weird semantically, but I think we can live with that :)
On 14-Feb-19 1:41 PM, Shahaf Shuler wrote:
> Thursday, February 14, 2019 2:22 PM, Alejandro Lucero:
>
> >Any current NIC or device will work with virtual addresses if an IOMMU is
> in place, no matter whether the device is IOMMU-aware or not.
>
> Not sure what you mean here. For example, Intel devices work with VFIO and
> use IOVAs to provide buffers to the NIC, hence protection between multiple
> processes is the application's responsibility, or requires a new VFIO
> container.
>

As far as VFIO is concerned, "multiprocess protection" is not a thing,
because the device cannot be used twice in the first place - each usage is
strictly limited to one VFIO container. We just sidestep this "limitation"
by sharing container/device file descriptors with multiple processes via
IPC.

So while it's technically true that multiprocess protection is "application
responsibility" because we can pass around fd's, it's still protected by
the kernel. IOVA mappings are per-container, so the same IOVA range can be
mapped twice (thrice...), as long as it's for a different set of devices,
in effect making them virtual addresses.
Thursday, February 14, 2019 6:20 PM, Burakov, Anatoly:
> Subject: Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for
> external memory
>
> On 14-Feb-19 1:28 PM, Shahaf Shuler wrote:
> > Thursday, February 14, 2019 12:19 PM, Burakov, Anatoly:
> >> Subject: Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for
> >> external memory
> >>
> >> On 13-Feb-19 7:24 PM, Shahaf Shuler wrote:
> >>> Wednesday, February 13, 2019 1:43 PM, Alejandro Lucero:
> >>>> Subject: Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping
> >>>> for external memory
> >>>>

[...]

> >
> > If we are going to force everyone to use extmem, then there is no need
> for this API. We can have the PMDs do the mapping when the memory is
> registered.
> > We can just drop the vfio_dma_map APIs and that's it.
> >
>
> Well, whether we need it or not is not really my call, but what I can say
> is that using extmem_register is _necessary_ if you're going to use the
> PMDs. You're right, we could just map memory for DMA at register time -
> that would save one API call to get the memory working. It makes it a bit
> weird semantically, but I think we can live with that :)

This was not my suggestion 😊. I don't think the register API should do the
mapping as well.
My thoughts were on one of two options:
1. Have the series I propose here, and enable the user to work with memory
managed outside of DPDK. Either force the user to call rte_extmem_register
before the mapping, or let devices which need the memory to also be
registered in the DPDK system fail the mapping.
2. Not provide such an option to applications, and force applications to
populate a socket with external memory.

I vote for #1.

>
> --
> Thanks,
> Anatoly
On 17-Feb-19 6:18 AM, Shahaf Shuler wrote: > Thursday, February 14, 2019 6:20 PM, Burakov, Anatoly: >> Subject: Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for >> external memory >> >> On 14-Feb-19 1:28 PM, Shahaf Shuler wrote: >>> Thursday, February 14, 2019 12:19 PM, Burakov, Anatoly: >>>> Subject: Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for >>>> external memory >>>> >>>> On 13-Feb-19 7:24 PM, Shahaf Shuler wrote: >>>>> Wednesday, February 13, 2019 1:43 PM, Alejandro Lucero: >>>>>> Subject: Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping >>>>>> for external memory >>>>>> > > [...] > >>> >>> If we are going to force all to use the extmem, then there is no need w/ >> this API. we can have the PMDs to register when the memory is registered. >>> We can just drop the vfio_dma_map APIs and that's it. >>> >> >> Well, whether we needed it or not is not really my call, but what i can say is >> that using extmem_register is _necessary_ if you're going to use the PMD's. >> You're right, we could just map memory for DMA at register time - that >> would save one API call to get the memory working. It makes it a bit weird >> semantically, but i think we can live with that :) > > This was not my suggestion 😊. I don't think the register API should do the mapping as well. > My thoughts were on one of the two options: > 1. have the series I propose here, and enable the user to work with memory managed outside of DPDK. Either force the user to call rte_extmem_register before the mapping or devices which needs memory to be also registered in the DPDK system can fail the mapping. > 2. not providing such option to application, and forcing applications to populate a socket w/ external memory. > > I vote for #1. I too think #1 is better - we want this to be a valid use case. 
Allowing such usage in the first place is already gracious enough - all we
ask in return is one extra API call, to politely let DPDK know that this
memory exists and is going to be used for DMA :)

Also, having the memory registered will allow us to refuse the mapping if
the memory cannot be found in DPDK's maps - if rte_virt2memseg() returns
NULL, that means extmem_register was not called. I.e. we can _enforce_
usage of extmem_register, which I believe is a good thing for usability.

>
>>
>> --
>> Thanks,
>> Anatoly
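The enforcement described above could be sketched inside the (proposed) PCI bus map implementation. A sketch only: `rte_virt2memseg()` is the real EAL API, but `pci_dma_map_checked` and `pci_do_map` are hypothetical names, and the proposed bus callback signature is an assumption from the cover letter.

```c
/* Sketch: refuse to DMA map memory that was never registered with
 * DPDK. rte_virt2memseg() returns NULL for unknown memory, which
 * means rte_extmem_register() was not called for it. */
#include <stdint.h>
#include <rte_memory.h>
#include <rte_errno.h>

static int
pci_dma_map_checked(struct rte_pci_device *pdev, void *addr,
		    uint64_t iova, size_t len)
{
	if (rte_virt2memseg(addr, NULL) == NULL) {
		/* not known to DPDK: extmem_register was not called */
		rte_errno = EINVAL;
		return -1;
	}
	/* driver-specific map if provided, otherwise VFIO fallback */
	return pci_do_map(pdev, addr, iova, len); /* hypothetical helper */
}
```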