[dpdk-dev,v10,3/9] linuxapp/eal_pci: get iommu class
Commit Message
Get the IOMMU class of the PCI devices on the bus and return the preferred
IOVA mapping mode for that bus.
Patch also introduces the RTE_PCI_DRV_IOVA_AS_VA drv flag.
Flag is used when a driver needs to operate in iova=va mode.
Algorithm for iova scheme selection for PCI bus:
0. If no device is bound, return the RTE_IOVA_DC mapping mode,
else go to 1).
1. Look for a device that is attached to the vfio kdrv and whose driver
has RTE_PCI_DRV_IOVA_AS_VA set in .drv_flags.
2. Look for any device attached to a UIO class of driver.
3. Check whether vfio-noiommu mode is enabled.
If 2) and 3) are false and 1) is true, select
RTE_IOVA_VA as the mapping scheme. Otherwise use the default
mapping scheme (RTE_IOVA_PA).
Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
---
lib/librte_eal/common/include/rte_pci.h | 2 +
lib/librte_eal/linuxapp/eal/eal_pci.c | 89 ++++++++++++++++++++++++++++++++-
lib/librte_eal/linuxapp/eal/eal_vfio.c | 19 +++++++
lib/librte_eal/linuxapp/eal/eal_vfio.h | 4 ++
4 files changed, 113 insertions(+), 1 deletion(-)
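The selection algorithm in the commit message can be read as a small decision function. The following is only a sketch under stated assumptions: `demo_bus_state`, `demo_iova_mode` and `demo_select_iova_mode` are hypothetical names made up for illustration, and the four booleans stand in for the results of the bus scans described in steps 0 to 3. It is not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for the RTE_IOVA_* values named in the commit message. */
enum demo_iova_mode { DEMO_IOVA_DC, DEMO_IOVA_PA, DEMO_IOVA_VA };

/* Hypothetical summary of a bus scan; not a DPDK structure. */
struct demo_bus_state {
	bool any_bound;     /* step 0: some device is bound to a kernel driver */
	bool vfio_wants_va; /* step 1: a vfio-bound device whose driver sets the flag */
	bool any_uio;       /* step 2: some device is bound to a UIO driver */
	bool noiommu;       /* step 3: vfio-noiommu mode is enabled */
};

static enum demo_iova_mode
demo_select_iova_mode(const struct demo_bus_state *s)
{
	assert(s != NULL);

	/* Step 0: nothing bound, so the bus has no preference. */
	if (!s->any_bound)
		return DEMO_IOVA_DC;

	/* VA only if 1) holds and neither 2) nor 3) does. */
	if (s->vfio_wants_va && !s->any_uio && !s->noiommu)
		return DEMO_IOVA_VA;

	/* Everything else falls back to the default. */
	return DEMO_IOVA_PA;
}
```

Note that a single UIO-bound device, or noiommu mode, is enough to force the PA result even when a driver asked for VA.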
Comments
Hi,
Nice patch series. But I still have a small question about the flag below.
On 10/6/2017 7:03 PM, Santosh Shukla wrote:
> Get iommu class of PCI device on the bus and returns preferred iova
> mapping mode for that bus.
>
> Patch also introduces RTE_PCI_DRV_IOVA_AS_VA drv flag.
> Flag used when driver needs to operate in iova=va mode.
>
Does this flag mean that VA must be used as IOVA, or is it only a nice-to-have?
In detail, the commit log above says "needs to operate in iova=va mode",
but the comment in the patch says this flag means "driver supports
IOVA as VA".
If it's the latter, I would expect all drivers to support using VA
as IOVA when the NICs are bound to vfio-pci (iommu mode). Please correct
me if I'm wrong.
Thanks,
Jianfeng
On Wednesday 11 October 2017 07:17 AM, Tan, Jianfeng wrote:
> Hi,
>
> Nice patch series. But I still have a small question about below flag.
>
>
> On 10/6/2017 7:03 PM, Santosh Shukla wrote:
>> Get iommu class of PCI device on the bus and returns preferred iova
>> mapping mode for that bus.
>>
>> Patch also introduces RTE_PCI_DRV_IOVA_AS_VA drv flag.
>> Flag used when driver needs to operate in iova=va mode.
>>
> Does this flag indicate a must to use VA as IOVA, or a nice-to-have one? In detail, above commit log says, "needs to operate in iova=va mode", but the comment in the patch indicates this flag means "driver supports IOVA as VA".
>
> If it's the latter case, I would suppose all drivers support to use VA as IOVA, if the NICs are binded to vfio-pci (iommu mode). Please correct me if I'm wrong.
>
- Any IOMMU-backed PMD could choose to use this flag.
- The reason we need it is performance for our external mempool PMD: it avoids
phys-to-virt translation on mbufs and thus saves cycles.
> Thanks,
> Jianfeng
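To illustrate the performance argument above: when a device hands back a buffer address, a driver in iova=pa mode has to translate it to a virtual address before touching the buffer, while in iova=va mode the address can be used as is. A minimal sketch, assuming a single contiguous region with a known PA base; `demo_region`, `demo_addr_mode` and `demo_hw_addr_to_va` are hypothetical names, and real translation in DPDK is more involved than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum demo_addr_mode { DEMO_ADDR_PA, DEMO_ADDR_VA };

/* Hypothetical contiguous memory region; not a DPDK structure. */
struct demo_region {
	uintptr_t va_base; /* virtual address of the first byte */
	uint64_t pa_base;  /* physical address of the first byte */
	size_t len;        /* region length in bytes */
};

/* Turn an address reported by the device into a usable pointer. */
static void *
demo_hw_addr_to_va(enum demo_addr_mode mode, const struct demo_region *r,
		uint64_t hw_addr)
{
	assert(r != NULL);

	/* iova=va: the device address already is the virtual address. */
	if (mode == DEMO_ADDR_VA)
		return (void *)(uintptr_t)hw_addr;

	/* iova=pa: range check, then offset into the virtual mapping. */
	if (hw_addr < r->pa_base || hw_addr - r->pa_base >= r->len)
		return NULL;
	return (void *)(r->va_base + (uintptr_t)(hw_addr - r->pa_base));
}
```

The VA branch is a plain cast, which is where the saved cycles would come from on a per-mbuf path.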
On 10/11/2017 12:43 PM, santosh wrote:
> On Wednesday 11 October 2017 07:17 AM, Tan, Jianfeng wrote:
>> Hi,
>>
>> Nice patch series. But I still have a small question about below flag.
>>
>>
>> On 10/6/2017 7:03 PM, Santosh Shukla wrote:
>>> Get iommu class of PCI device on the bus and returns preferred iova
>>> mapping mode for that bus.
>>>
>>> Patch also introduces RTE_PCI_DRV_IOVA_AS_VA drv flag.
>>> Flag used when driver needs to operate in iova=va mode.
>>>
>> Does this flag indicate a must to use VA as IOVA, or a nice-to-have one? In detail, above commit log says, "needs to operate in iova=va mode", but the comment in the patch indicates this flag means "driver supports IOVA as VA".
>>
>> If it's the latter case, I would suppose all drivers support to use VA as IOVA, if the NICs are binded to vfio-pci (iommu mode). Please correct me if I'm wrong.
>>
> - Any iommu backed pmd could choose to use this flag.
But if this can be assumed for all PMDs, why do we go to the
trouble of introducing this flag?
> - Reasoning for need was performance for our external mempool pmd: avoid phy2virt translation on
> mbuf thus save cycles.
>
Agreed, and it also allows running DPDK without root privileges.
Thanks,
Jianfeng
On Wednesday 11 October 2017 11:01 AM, Tan, Jianfeng wrote:
>
>
> On 10/11/2017 12:43 PM, santosh wrote:
>> On Wednesday 11 October 2017 07:17 AM, Tan, Jianfeng wrote:
>>> Hi,
>>>
>>> Nice patch series. But I still have a small question about below flag.
>>>
>>>
>>> On 10/6/2017 7:03 PM, Santosh Shukla wrote:
>>>> Get iommu class of PCI device on the bus and returns preferred iova
>>>> mapping mode for that bus.
>>>>
>>>> Patch also introduces RTE_PCI_DRV_IOVA_AS_VA drv flag.
>>>> Flag used when driver needs to operate in iova=va mode.
>>>>
>>> Does this flag indicate a must to use VA as IOVA, or a nice-to-have one? In detail, above commit log says, "needs to operate in iova=va mode", but the comment in the patch indicates this flag means "driver supports IOVA as VA".
>>>
>>> If it's the latter case, I would suppose all drivers support to use VA as IOVA, if the NICs are binded to vfio-pci (iommu mode). Please correct me if I'm wrong.
>>>
>> - Any iommu backed pmd could choose to use this flag.
>
> But if this is characterized by assumption for all PMDs, why do we trouble to introduce this flag.
>
To hint to the bus layer that _this_ driver prefers iova=va mapping; the default is iova=pa.
Thanks.
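As a sketch of what "hint from the driver" means in practice: a PMD that prefers iova=va sets the new bit in its driver flags, and a PMD that leaves it unset keeps the iova=pa default. The flag values below are the ones shown in the rte_pci.h hunk of this patch; `demo_pci_driver` and `demo_driver_hints_iova_va` are hypothetical simplifications for illustration, not the real struct rte_pci_driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values as they appear in the rte_pci.h hunk of this patch. */
#define RTE_PCI_DRV_INTR_RMV   0x0010
#define RTE_PCI_DRV_IOVA_AS_VA 0x0040

/* Simplified stand-in for struct rte_pci_driver; only the flags are modelled. */
struct demo_pci_driver {
	const char *name;
	uint32_t drv_flags;
};

/* A PMD that hints iova=va to the bus layer. */
static const struct demo_pci_driver demo_va_pmd = {
	.name = "demo_va_pmd",
	.drv_flags = RTE_PCI_DRV_INTR_RMV | RTE_PCI_DRV_IOVA_AS_VA,
};

/* A PMD that sets no hint and therefore keeps the iova=pa default. */
static const struct demo_pci_driver demo_default_pmd = {
	.name = "demo_default_pmd",
	.drv_flags = RTE_PCI_DRV_INTR_RMV,
};

static bool
demo_driver_hints_iova_va(const struct demo_pci_driver *drv)
{
	assert(drv != NULL);
	return (drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) != 0;
}
```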
> -----Original Message-----
> From: santosh [mailto:santosh.shukla@caviumnetworks.com]
> Sent: Wednesday, October 11, 2017 1:38 PM
> To: Tan, Jianfeng; olivier.matz@6wind.com; dev@dpdk.org
> Cc: thomas@monjalon.net; jerin.jacob@caviumnetworks.com;
> hemant.agrawal@nxp.com; aconole@redhat.com;
> stephen@networkplumber.org; Burakov, Anatoly; gaetan.rivet@6wind.com;
> shreyansh.jain@nxp.com; Richardson, Bruce; Gonzalez Monroy, Sergio;
> maxime.coquelin@redhat.com
> Subject: Re: [dpdk-dev] [PATCH v10 3/9] linuxapp/eal_pci: get iommu class
>
>
> On Wednesday 11 October 2017 11:01 AM, Tan, Jianfeng wrote:
> >
> >
> > On 10/11/2017 12:43 PM, santosh wrote:
> >> On Wednesday 11 October 2017 07:17 AM, Tan, Jianfeng wrote:
> >>> Hi,
> >>>
> >>> Nice patch series. But I still have a small question about below flag.
> >>>
> >>>
> >>> On 10/6/2017 7:03 PM, Santosh Shukla wrote:
> >>>> Get iommu class of PCI device on the bus and returns preferred iova
> >>>> mapping mode for that bus.
> >>>>
> >>>> Patch also introduces RTE_PCI_DRV_IOVA_AS_VA drv flag.
> >>>> Flag used when driver needs to operate in iova=va mode.
> >>>>
> >>> Does this flag indicate a must to use VA as IOVA, or a nice-to-have one? In detail, above commit log says, "needs to operate in iova=va mode", but the comment in the patch indicates this flag means "driver supports IOVA as VA".
> >>>
> >>> If it's the latter case, I would suppose all drivers support to use VA as IOVA, if the NICs are binded to vfio-pci (iommu mode). Please correct me if I'm wrong.
> >>>
> >> - Any iommu backed pmd could choose to use this flag.
> >
> > But if this is characterized by assumption for all PMDs, why do we trouble to introduce this flag.
> >
> to hint bus layer about iova=va mapping choice for _this_ driver and default is iova=pa.
>
So it sounds like, if this flag is set by some PMD, we must use iova=va.
Then how about we enable iova=va only if all PCI devices are bound to vfio-pci (iommu mode)?
Thanks,
Jianfeng
On Wednesday 11 October 2017 12:34 PM, Tan, Jianfeng wrote:
>
>> -----Original Message-----
>> From: santosh [mailto:santosh.shukla@caviumnetworks.com]
>> Sent: Wednesday, October 11, 2017 1:38 PM
>> To: Tan, Jianfeng; olivier.matz@6wind.com; dev@dpdk.org
>> Cc: thomas@monjalon.net; jerin.jacob@caviumnetworks.com;
>> hemant.agrawal@nxp.com; aconole@redhat.com;
>> stephen@networkplumber.org; Burakov, Anatoly; gaetan.rivet@6wind.com;
>> shreyansh.jain@nxp.com; Richardson, Bruce; Gonzalez Monroy, Sergio;
>> maxime.coquelin@redhat.com
>> Subject: Re: [dpdk-dev] [PATCH v10 3/9] linuxapp/eal_pci: get iommu class
>>
>>
>> On Wednesday 11 October 2017 11:01 AM, Tan, Jianfeng wrote:
>>>
>>> On 10/11/2017 12:43 PM, santosh wrote:
>>>> On Wednesday 11 October 2017 07:17 AM, Tan, Jianfeng wrote:
>>>>> Hi,
>>>>>
>>>>> Nice patch series. But I still have a small question about below flag.
>>>>>
>>>>>
>>>>> On 10/6/2017 7:03 PM, Santosh Shukla wrote:
>>>>>> Get iommu class of PCI device on the bus and returns preferred iova
>>>>>> mapping mode for that bus.
>>>>>>
>>>>>> Patch also introduces RTE_PCI_DRV_IOVA_AS_VA drv flag.
>>>>>> Flag used when driver needs to operate in iova=va mode.
>>>>>>
>>>>> Does this flag indicate a must to use VA as IOVA, or a nice-to-have one? In detail, above commit log says, "needs to operate in iova=va mode", but the comment in the patch indicates this flag means "driver supports IOVA as VA".
>>>>> If it's the latter case, I would suppose all drivers support to use VA as IOVA, if the NICs are binded to vfio-pci (iommu mode). Please correct me if I'm wrong.
>>>> - Any iommu backed pmd could choose to use this flag.
>>> But if this is characterized by assumption for all PMDs, why do we trouble to introduce this flag.
>> to hint bus layer about iova=va mapping choice for _this_ driver and default is iova=pa.
>>
> So that sounds if this flag is set by some PMD, we must use iova=va.
>
> Then how about we enable this, iova=va, if only all PCI devices are binded to vfio-pci (iommu mode)?
Right, I proposed the same in v2 (I guess): the iova bus auto-detects, and if it sees all devices bound
to vfio-pci it auto-selects iova=va. In the v3 series discussion (I guess), it was concluded that
it is better to send a hint from the driver. Refer to the work history. The iova bus still does said
auto-detection, though.
Thanks.
> > Then how about we enable this, iova=va, if only all PCI devices are binded to vfio-pci (iommu mode)?
>
> Right, same I proposed (I guess) in v2 such that iova bus autodetecting in case see all device bound
> to vfio-pci then autoselect iova=va, in v3 series (I guess) discussion: it was concluded that
> better to send hint from driver. Refer work history, though iova bus still does said
> auto-detection.
Sorry, I missed that. Then I tend to think that almost all PMDs for physical devices should add this flag.
Thanks,
Jianfeng
On Wednesday 11 October 2017 02:01 PM, Tan, Jianfeng wrote:
>>> Then how about we enable this, iova=va, if only all PCI devices are binded to vfio-pci (iommu mode)?
>>
>> Right, same I proposed (I guess) in v2 such that iova bus autodetecting in case see all device bound
>> to vfio-pci then autoselect iova=va, in v3 series (I guess) discussion: it was concluded that
>> better to send hint from driver. Refer work history, though iova bus still does said
>> auto-detection.
> Sorry I missed that. I tend to think that almost all PMDs for physical devices shall add this flag then.
IMO +1, but the decision is up to the PMD owner.
Thanks.
@@ -202,6 +202,8 @@ struct rte_pci_bus {
#define RTE_PCI_DRV_INTR_RMV 0x0010
/** Device driver needs to keep mapped resources if unsupported dev detected */
#define RTE_PCI_DRV_KEEP_MAPPED_RES 0x0020
+/** Device driver supports iova as va */
+#define RTE_PCI_DRV_IOVA_AS_VA 0x0040
/**
* A structure describing a PCI mapping.
@@ -45,6 +45,7 @@
#include "eal_filesystem.h"
#include "eal_private.h"
#include "eal_pci_init.h"
+#include "eal_vfio.h"
/**
* @file
@@ -488,11 +489,97 @@ rte_pci_scan(void)
}
/*
- * Get iommu class of pci devices on the bus.
+ * Is pci device bound to any kdrv
+ */
+static inline int
+pci_one_device_is_bound(void)
+{
+ struct rte_pci_device *dev = NULL;
+ int ret = 0;
+
+ FOREACH_DEVICE_ON_PCIBUS(dev) {
+ if (dev->kdrv == RTE_KDRV_UNKNOWN ||
+ dev->kdrv == RTE_KDRV_NONE) {
+ continue;
+ } else {
+ ret = 1;
+ break;
+ }
+ }
+ return ret;
+}
+
+/*
+ * Any one of the device bound to uio
+ */
+static inline int
+pci_one_device_bound_uio(void)
+{
+ struct rte_pci_device *dev = NULL;
+
+ FOREACH_DEVICE_ON_PCIBUS(dev) {
+ if (dev->kdrv == RTE_KDRV_IGB_UIO ||
+ dev->kdrv == RTE_KDRV_UIO_GENERIC) {
+ return 1;
+ }
+ }
+ return 0;
+}
+
+/*
+ * Any one of the device has iova as va
+ */
+static inline int
+pci_one_device_has_iova_va(void)
+{
+ struct rte_pci_device *dev = NULL;
+ struct rte_pci_driver *drv = NULL;
+
+ FOREACH_DRIVER_ON_PCIBUS(drv) {
+ if (drv && drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) {
+ FOREACH_DEVICE_ON_PCIBUS(dev) {
+ if (dev->kdrv == RTE_KDRV_VFIO &&
+ rte_pci_match(drv, dev))
+ return 1;
+ }
+ }
+ }
+ return 0;
+}
+
+/*
+ * Get iommu class of PCI devices on the bus.
*/
enum rte_iova_mode
rte_pci_get_iommu_class(void)
{
+ bool is_bound;
+ bool is_vfio_noiommu_enabled = true;
+ bool has_iova_va;
+ bool is_bound_uio;
+
+ is_bound = pci_one_device_is_bound();
+ if (!is_bound)
+ return RTE_IOVA_DC;
+
+ has_iova_va = pci_one_device_has_iova_va();
+ is_bound_uio = pci_one_device_bound_uio();
+#ifdef VFIO_PRESENT
+ is_vfio_noiommu_enabled = vfio_noiommu_is_enabled() == true ?
+ true : false;
+#endif
+
+ if (has_iova_va && !is_bound_uio && !is_vfio_noiommu_enabled)
+ return RTE_IOVA_VA;
+
+ if (has_iova_va) {
+ RTE_LOG(WARNING, EAL, "Some devices want iova as va but pa will be used because.. ");
+ if (is_vfio_noiommu_enabled)
+ RTE_LOG(WARNING, EAL, "vfio-noiommu mode configured\n");
+ if (is_bound_uio)
+ RTE_LOG(WARNING, EAL, "few device bound to UIO\n");
+ }
+
return RTE_IOVA_PA;
}
@@ -816,4 +816,23 @@ vfio_noiommu_dma_map(int __rte_unused vfio_container_fd)
return 0;
}
+int
+vfio_noiommu_is_enabled(void)
+{
+ int fd, ret, cnt __rte_unused;
+ char c = 0; /* stays 0 if read() fails, so the 'Y' check below is well-defined */
+
+ ret = -1;
+ fd = open(VFIO_NOIOMMU_MODE, O_RDONLY);
+ if (fd < 0)
+ return -1;
+
+ cnt = read(fd, &c, 1);
+ if (c == 'Y')
+ ret = 1;
+
+ close(fd);
+ return ret;
+}
+
#endif
@@ -150,6 +150,8 @@ struct vfio_config {
#define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u"
#define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)
#define VFIO_GET_REGION_IDX(x) (x >> 40)
+#define VFIO_NOIOMMU_MODE \
+ "/sys/module/vfio/parameters/enable_unsafe_noiommu_mode"
/* DMA mapping function prototype.
* Takes VFIO container fd as a parameter.
@@ -210,6 +212,8 @@ int pci_vfio_is_enabled(void);
int vfio_mp_sync_setup(void);
+int vfio_noiommu_is_enabled(void);
+
#define SOCKET_REQ_CONTAINER 0x100
#define SOCKET_REQ_GROUP 0x200
#define SOCKET_CLR_GROUP 0x300
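For comparison with vfio_noiommu_is_enabled() in the patch, here is a hedged variant of the same sysfs read that separates three outcomes: enabled, not enabled, and could not tell. This is a sketch, not the patch's behaviour (the patch returns -1 for anything other than 'Y'); `demo_read_noiommu_flag` is a hypothetical name, and taking the path as a parameter is an assumption made so the function can be exercised against an ordinary file.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Read the first byte of a module-parameter file.
 * Returns 1 if it is 'Y', 0 if it is anything else,
 * and -1 if the file cannot be opened or read.
 */
static int
demo_read_noiommu_flag(const char *path)
{
	char c = 0;
	ssize_t cnt;
	int fd;

	assert(path != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	cnt = read(fd, &c, 1);
	close(fd);

	/* A failed or empty read is reported as an error, not as "disabled". */
	if (cnt != 1)
		return -1;

	return c == 'Y' ? 1 : 0;
}
```

A caller would pass the path defined by VFIO_NOIOMMU_MODE in this patch, "/sys/module/vfio/parameters/enable_unsafe_noiommu_mode", and decide for itself how to treat the -1 case.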