[v10,4/5] kni: add IOVA=VA support in KNI module

Message ID 20190816061252.17214-5-vattunuru@marvell.com (mailing list archive)
State Changes Requested, archived
Delegated to: Ferruh Yigit
Series kni: add IOVA=VA support

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation fail Compilation issues

Commit Message

Vamsi Krishna Attunuru Aug. 16, 2019, 6:12 a.m. UTC
  From: Kiran Kumar K <kirankumark@marvell.com>

Patch adds support for the kernel module to work in IOVA = VA mode.
The idea is to get the physical address from the IOVA address using
the iommu_iova_to_phys API and later use the phys_to_virt API to
convert the physical address to a kernel virtual address.

When compared with IOVA = PA mode, there is no performance
drop with this approach.

This approach does not work with kernel versions older
than 4.4.0 because of API compatibility issues.

Patch also updates these support details in the KNI documentation.

Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
---
 kernel/linux/kni/compat.h   |  4 +++
 kernel/linux/kni/kni_dev.h  |  4 +++
 kernel/linux/kni/kni_misc.c | 71 +++++++++++++++++++++++++++++++++++++++------
 kernel/linux/kni/kni_net.c  | 59 ++++++++++++++++++++++++++++---------
 4 files changed, 116 insertions(+), 22 deletions(-)
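
For reference, the translation path described in the commit message (IOVA ->
physical address via iommu_iova_to_phys(), then physical -> kernel virtual
address via phys_to_virt()) boils down to a helper of the following shape.
This is only an illustrative sketch that assumes a valid iommu_domain has
already been looked up for the device; the helpers actually added by the patch
are iova2kva()/iova2data_kva() in kni_net.c below.

#include <linux/iommu.h>
#include <linux/io.h>

/* Sketch: translate an IOVA into a kernel virtual address. */
static inline void *
kni_iova_to_kva(struct iommu_domain *domain, dma_addr_t iova)
{
	/* IOVA -> physical address, resolved through the device's IOMMU domain */
	phys_addr_t pa = iommu_iova_to_phys(domain, iova);

	/* physical address -> kernel virtual address */
	return phys_to_virt(pa);
}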
  

Comments

Ferruh Yigit Oct. 15, 2019, 3:43 p.m. UTC | #1
On 8/16/2019 7:12 AM, vattunuru@marvell.com wrote:
> From: Kiran Kumar K <kirankumark@marvell.com>
> 
> Patch adds support for kernel module to work in IOVA = VA mode,
> the idea is to get physical address from IOVA address using
> iommu_iova_to_phys API and later use phys_to_virt API to
> convert the physical address to kernel virtual address.
> 
> When compared with IOVA = PA mode, there is no performance
> drop with this approach.
> 
> This approach does not work with the kernel versions less
> than 4.4.0 because of API compatibility issues.
> 
> Patch also updates these support details in KNI documentation.
> 
> Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
> Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>

<...>

> @@ -348,15 +351,65 @@ kni_ioctl_create(struct net *net, uint32_t ioctl_num,
>  	strncpy(kni->name, dev_info.name, RTE_KNI_NAMESIZE);
>  
>  	/* Translate user space info into kernel space info */
> -	kni->tx_q = phys_to_virt(dev_info.tx_phys);
> -	kni->rx_q = phys_to_virt(dev_info.rx_phys);
> -	kni->alloc_q = phys_to_virt(dev_info.alloc_phys);
> -	kni->free_q = phys_to_virt(dev_info.free_phys);
> -
> -	kni->req_q = phys_to_virt(dev_info.req_phys);
> -	kni->resp_q = phys_to_virt(dev_info.resp_phys);
> -	kni->sync_va = dev_info.sync_va;
> -	kni->sync_kva = phys_to_virt(dev_info.sync_phys);
> +	if (dev_info.iova_mode) {
> +#ifdef HAVE_IOVA_AS_VA_SUPPORT
> +		pci = pci_get_device(dev_info.vendor_id,
> +				     dev_info.device_id, NULL);
> +		if (pci == NULL) {
> +			pr_err("pci dev does not exist\n");
> +			return -ENODEV;
> +		}

If there is no PCI device KNI should still work.
  
Stephen Hemminger Oct. 15, 2019, 3:46 p.m. UTC | #2
On Tue, 15 Oct 2019 16:43:08 +0100
"Yigit, Ferruh" <ferruh.yigit@linux.intel.com> wrote:

> On 8/16/2019 7:12 AM, vattunuru@marvell.com wrote:
> > From: Kiran Kumar K <kirankumark@marvell.com>
> > 
> > Patch adds support for kernel module to work in IOVA = VA mode,
> > the idea is to get physical address from IOVA address using
> > iommu_iova_to_phys API and later use phys_to_virt API to
> > convert the physical address to kernel virtual address.
> > 
> > When compared with IOVA = PA mode, there is no performance
> > drop with this approach.
> > 
> > This approach does not work with the kernel versions less
> > than 4.4.0 because of API compatibility issues.
> > 
> > Patch also updates these support details in KNI documentation.
> > 
> > Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
> > Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>  
> 
> <...>
> 
> > @@ -348,15 +351,65 @@ kni_ioctl_create(struct net *net, uint32_t ioctl_num,
> >  	strncpy(kni->name, dev_info.name, RTE_KNI_NAMESIZE);
> >  
> >  	/* Translate user space info into kernel space info */
> > -	kni->tx_q = phys_to_virt(dev_info.tx_phys);
> > -	kni->rx_q = phys_to_virt(dev_info.rx_phys);
> > -	kni->alloc_q = phys_to_virt(dev_info.alloc_phys);
> > -	kni->free_q = phys_to_virt(dev_info.free_phys);
> > -
> > -	kni->req_q = phys_to_virt(dev_info.req_phys);
> > -	kni->resp_q = phys_to_virt(dev_info.resp_phys);
> > -	kni->sync_va = dev_info.sync_va;
> > -	kni->sync_kva = phys_to_virt(dev_info.sync_phys);
> > +	if (dev_info.iova_mode) {
> > +#ifdef HAVE_IOVA_AS_VA_SUPPORT
> > +		pci = pci_get_device(dev_info.vendor_id,
> > +				     dev_info.device_id, NULL);
> > +		if (pci == NULL) {
> > +			pr_err("pci dev does not exist\n");
> > +			return -ENODEV;
> > +		}  
> 
> If there is no PCI device KNI should still work.

Right now it is possible to use KNI with netvsc PMD on Hyper-V/Azure.
With this patch that won't be possible.
  
Vamsi Krishna Attunuru Oct. 16, 2019, 11:26 a.m. UTC | #3
> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Tuesday, October 15, 2019 9:16 PM
> To: Yigit, Ferruh <ferruh.yigit@linux.intel.com>
> Cc: Vamsi Krishna Attunuru <vattunuru@marvell.com>; dev@dpdk.org;
> thomas@monjalon.net; Jerin Jacob Kollanukkaran <jerinj@marvell.com>;
> olivier.matz@6wind.com; ferruh.yigit@intel.com; anatoly.burakov@intel.com;
> arybchenko@solarflare.com; Kiran Kumar Kokkilagadda
> <kirankumark@marvell.com>
> Subject: [EXT] Re: [dpdk-dev] [PATCH v10 4/5] kni: add IOVA=VA support in KNI
> module
> 
> External Email
> 
> ----------------------------------------------------------------------
> On Tue, 15 Oct 2019 16:43:08 +0100
> "Yigit, Ferruh" <ferruh.yigit@linux.intel.com> wrote:
> 
> > On 8/16/2019 7:12 AM, vattunuru@marvell.com wrote:
> > > From: Kiran Kumar K <kirankumark@marvell.com>
> > >
> > > Patch adds support for kernel module to work in IOVA = VA mode, the
> > > idea is to get physical address from IOVA address using
> > > iommu_iova_to_phys API and later use phys_to_virt API to convert the
> > > physical address to kernel virtual address.
> > >
> > > When compared with IOVA = PA mode, there is no performance drop with
> > > this approach.
> > >
> > > This approach does not work with the kernel versions less than 4.4.0
> > > because of API compatibility issues.
> > >
> > > Patch also updates these support details in KNI documentation.
> > >
> > > Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
> > > Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
> >
> > <...>
> >
> > > @@ -348,15 +351,65 @@ kni_ioctl_create(struct net *net, uint32_t
> ioctl_num,
> > >  	strncpy(kni->name, dev_info.name, RTE_KNI_NAMESIZE);
> > >
> > >  	/* Translate user space info into kernel space info */
> > > -	kni->tx_q = phys_to_virt(dev_info.tx_phys);
> > > -	kni->rx_q = phys_to_virt(dev_info.rx_phys);
> > > -	kni->alloc_q = phys_to_virt(dev_info.alloc_phys);
> > > -	kni->free_q = phys_to_virt(dev_info.free_phys);
> > > -
> > > -	kni->req_q = phys_to_virt(dev_info.req_phys);
> > > -	kni->resp_q = phys_to_virt(dev_info.resp_phys);
> > > -	kni->sync_va = dev_info.sync_va;
> > > -	kni->sync_kva = phys_to_virt(dev_info.sync_phys);
> > > +	if (dev_info.iova_mode) {
> > > +#ifdef HAVE_IOVA_AS_VA_SUPPORT
> > > +		pci = pci_get_device(dev_info.vendor_id,
> > > +				     dev_info.device_id, NULL);
> > > +		if (pci == NULL) {
> > > +			pr_err("pci dev does not exist\n");
> > > +			return -ENODEV;
> > > +		}
> >
> > If there is no PCI device KNI should still work.
> 
> Right now it is possible to use KNI with netvsc PMD on Hyper-V/Azure.
> With this patch that won't be possible.

Hi Ferruh, Stephen,

These issues can be fixed by forcing the IOMMU mode to PA when vdevs are used
for the KNI use case.

rte_bus_get_iommu_class(void)
 {
        enum rte_iova_mode mode = RTE_IOVA_DC;
+       struct rte_devargs *devargs = NULL;
        bool buses_want_va = false;
        bool buses_want_pa = false;
        struct rte_bus *bus;

+       if (rte_eal_check_module("rte_kni") == 1) {
+               RTE_EAL_DEVARGS_FOREACH("vdev", devargs) {
+                       return RTE_IOVA_PA;
+               }
+       }
+
        TAILQ_FOREACH(bus, &rte_bus_list, next) {
                enum rte_iova_mode bus_iova_mode;

I think this will solve the various use cases/combinations, i.e. PA or VA mode, pdev or vdev used for KNI.
Existing use cases would not be affected by this patch series with the above fix.
  
Vamsi Krishna Attunuru Oct. 16, 2019, 2:37 p.m. UTC | #4
> -----Original Message-----
> From: Vamsi Krishna Attunuru
> Sent: Wednesday, October 16, 2019 4:56 PM
> To: Stephen Hemminger <stephen@networkplumber.org>; Yigit, Ferruh
> <ferruh.yigit@linux.intel.com>
> Cc: dev@dpdk.org; thomas@monjalon.net; Jerin Jacob Kollanukkaran
> <jerinj@marvell.com>; olivier.matz@6wind.com; ferruh.yigit@intel.com;
> anatoly.burakov@intel.com; arybchenko@solarflare.com; Kiran Kumar
> Kokkilagadda <kirankumark@marvell.com>
> Subject: RE: [EXT] Re: [dpdk-dev] [PATCH v10 4/5] kni: add IOVA=VA support in
> KNI module
> 
> 
> 
> > -----Original Message-----
> > From: Stephen Hemminger <stephen@networkplumber.org>
> > Sent: Tuesday, October 15, 2019 9:16 PM
> > To: Yigit, Ferruh <ferruh.yigit@linux.intel.com>
> > Cc: Vamsi Krishna Attunuru <vattunuru@marvell.com>; dev@dpdk.org;
> > thomas@monjalon.net; Jerin Jacob Kollanukkaran <jerinj@marvell.com>;
> > olivier.matz@6wind.com; ferruh.yigit@intel.com;
> > anatoly.burakov@intel.com; arybchenko@solarflare.com; Kiran Kumar
> > Kokkilagadda <kirankumark@marvell.com>
> > Subject: [EXT] Re: [dpdk-dev] [PATCH v10 4/5] kni: add IOVA=VA support
> > in KNI module
> >
> > External Email
> >
> > ----------------------------------------------------------------------
> > On Tue, 15 Oct 2019 16:43:08 +0100
> > "Yigit, Ferruh" <ferruh.yigit@linux.intel.com> wrote:
> >
> > > On 8/16/2019 7:12 AM, vattunuru@marvell.com wrote:
> > > > From: Kiran Kumar K <kirankumark@marvell.com>
> > > >
> > > > Patch adds support for kernel module to work in IOVA = VA mode,
> > > > the idea is to get physical address from IOVA address using
> > > > iommu_iova_to_phys API and later use phys_to_virt API to convert
> > > > the physical address to kernel virtual address.
> > > >
> > > > When compared with IOVA = PA mode, there is no performance drop
> > > > with this approach.
> > > >
> > > > This approach does not work with the kernel versions less than
> > > > 4.4.0 because of API compatibility issues.
> > > >
> > > > Patch also updates these support details in KNI documentation.
> > > >
> > > > Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
> > > > Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
> > >
> > > <...>
> > >
> > > > @@ -348,15 +351,65 @@ kni_ioctl_create(struct net *net, uint32_t
> > ioctl_num,
> > > >  	strncpy(kni->name, dev_info.name, RTE_KNI_NAMESIZE);
> > > >
> > > >  	/* Translate user space info into kernel space info */
> > > > -	kni->tx_q = phys_to_virt(dev_info.tx_phys);
> > > > -	kni->rx_q = phys_to_virt(dev_info.rx_phys);
> > > > -	kni->alloc_q = phys_to_virt(dev_info.alloc_phys);
> > > > -	kni->free_q = phys_to_virt(dev_info.free_phys);
> > > > -
> > > > -	kni->req_q = phys_to_virt(dev_info.req_phys);
> > > > -	kni->resp_q = phys_to_virt(dev_info.resp_phys);
> > > > -	kni->sync_va = dev_info.sync_va;
> > > > -	kni->sync_kva = phys_to_virt(dev_info.sync_phys);
> > > > +	if (dev_info.iova_mode) {
> > > > +#ifdef HAVE_IOVA_AS_VA_SUPPORT
> > > > +		pci = pci_get_device(dev_info.vendor_id,
> > > > +				     dev_info.device_id, NULL);
> > > > +		if (pci == NULL) {
> > > > +			pr_err("pci dev does not exist\n");
> > > > +			return -ENODEV;
> > > > +		}
> > >
> > > If there is no PCI device KNI should still work.
> >
> > Right now it is possible to use KNI with netvsc PMD on Hyper-V/Azure.
> > With this patch that won't be possible.
> 
> Hi Ferruh, Stephen,
> 
> These can be fixed by forcing iommu_mode as PA when vdevs are used for KNI
> usecase.
> 
> rte_bus_get_iommu_class(void)
>  {
>         enum rte_iova_mode mode = RTE_IOVA_DC;
> +       struct rte_devargs *devargs = NULL;
>         bool buses_want_va = false;
>         bool buses_want_pa = false;
>         struct rte_bus *bus;
> 
> +       if (rte_eal_check_module("rte_kni") == 1) {
> +               RTE_EAL_DEVARGS_FOREACH("vdev", devargs) {
> +                       return RTE_IOVA_PA;
> +               }
> +       }
> +
>         TAILQ_FOREACH(bus, &rte_bus_list, next) {
>                 enum rte_iova_mode bus_iova_mode;
> 
> I think this will solve various use cases/combinations like PA or VA mode, pdev or
> vdev used for KNI.
> Existing use cases would not be affected by these patch series with above fix.

Hi Ferruh, Stephen,

Pushed a patch for the above-mentioned fix. The fix has moved to the vdev bus driver:
the patch adds a get_iommu_class callback in the vdev bus driver that returns iova=pa for the KNI use case.
http://patches.dpdk.org/patch/61312/
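
For context, the proposal boils down to a bus-level callback along these lines.
This is only a hedged sketch of the idea (the actual implementation is in the
patch linked above); rte_eal_check_module() reports whether the rte_kni kernel
module is loaded.

#include <rte_bus.h>
#include <rte_eal.h>

/* Sketch: vdev bus get_iommu_class callback forcing IOVA as PA when the
 * rte_kni kernel module is loaded, so KNI keeps working with vdevs. */
static enum rte_iova_mode
vdev_get_iommu_class(void)
{
	if (rte_eal_check_module("rte_kni") == 1)
		return RTE_IOVA_PA;

	return RTE_IOVA_DC;
}

/* Registered through the bus descriptor, e.g.
 *	.get_iommu_class = vdev_get_iommu_class,
 * in rte_vdev_bus. */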
  
Ferruh Yigit Oct. 16, 2019, 4:14 p.m. UTC | #5
On 10/16/2019 12:26 PM, Vamsi Krishna Attunuru wrote:
> 
> 
>> -----Original Message-----
>> From: Stephen Hemminger <stephen@networkplumber.org>
>> Sent: Tuesday, October 15, 2019 9:16 PM
>> To: Yigit, Ferruh <ferruh.yigit@linux.intel.com>
>> Cc: Vamsi Krishna Attunuru <vattunuru@marvell.com>; dev@dpdk.org;
>> thomas@monjalon.net; Jerin Jacob Kollanukkaran <jerinj@marvell.com>;
>> olivier.matz@6wind.com; ferruh.yigit@intel.com; anatoly.burakov@intel.com;
>> arybchenko@solarflare.com; Kiran Kumar Kokkilagadda
>> <kirankumark@marvell.com>
>> Subject: [EXT] Re: [dpdk-dev] [PATCH v10 4/5] kni: add IOVA=VA support in KNI
>> module
>>
>> External Email
>>
>> ----------------------------------------------------------------------
>> On Tue, 15 Oct 2019 16:43:08 +0100
>> "Yigit, Ferruh" <ferruh.yigit@linux.intel.com> wrote:
>>
>>> On 8/16/2019 7:12 AM, vattunuru@marvell.com wrote:
>>>> From: Kiran Kumar K <kirankumark@marvell.com>
>>>>
>>>> Patch adds support for kernel module to work in IOVA = VA mode, the
>>>> idea is to get physical address from IOVA address using
>>>> iommu_iova_to_phys API and later use phys_to_virt API to convert the
>>>> physical address to kernel virtual address.
>>>>
>>>> When compared with IOVA = PA mode, there is no performance drop with
>>>> this approach.
>>>>
>>>> This approach does not work with the kernel versions less than 4.4.0
>>>> because of API compatibility issues.
>>>>
>>>> Patch also updates these support details in KNI documentation.
>>>>
>>>> Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
>>>> Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
>>>
>>> <...>
>>>
>>>> @@ -348,15 +351,65 @@ kni_ioctl_create(struct net *net, uint32_t
>> ioctl_num,
>>>>  	strncpy(kni->name, dev_info.name, RTE_KNI_NAMESIZE);
>>>>
>>>>  	/* Translate user space info into kernel space info */
>>>> -	kni->tx_q = phys_to_virt(dev_info.tx_phys);
>>>> -	kni->rx_q = phys_to_virt(dev_info.rx_phys);
>>>> -	kni->alloc_q = phys_to_virt(dev_info.alloc_phys);
>>>> -	kni->free_q = phys_to_virt(dev_info.free_phys);
>>>> -
>>>> -	kni->req_q = phys_to_virt(dev_info.req_phys);
>>>> -	kni->resp_q = phys_to_virt(dev_info.resp_phys);
>>>> -	kni->sync_va = dev_info.sync_va;
>>>> -	kni->sync_kva = phys_to_virt(dev_info.sync_phys);
>>>> +	if (dev_info.iova_mode) {
>>>> +#ifdef HAVE_IOVA_AS_VA_SUPPORT
>>>> +		pci = pci_get_device(dev_info.vendor_id,
>>>> +				     dev_info.device_id, NULL);
>>>> +		if (pci == NULL) {
>>>> +			pr_err("pci dev does not exist\n");
>>>> +			return -ENODEV;
>>>> +		}
>>>
>>> If there is no PCI device KNI should still work.
>>
>> Right now it is possible to use KNI with netvsc PMD on Hyper-V/Azure.
>> With this patch that won't be possible.
> 
> Hi Ferruh, Stephen,
> 
> These can be fixed by forcing iommu_mode as PA when vdevs are used
> for KNI usecase.
> 
> rte_bus_get_iommu_class(void)
>  {
>         enum rte_iova_mode mode = RTE_IOVA_DC;
> +       struct rte_devargs *devargs = NULL;
>         bool buses_want_va = false;
>         bool buses_want_pa = false;
>         struct rte_bus *bus;
> 
> +       if (rte_eal_check_module("rte_kni") == 1) {
> +               RTE_EAL_DEVARGS_FOREACH("vdev", devargs) {
> +                       return RTE_IOVA_PA;
> +               }
> +       }
> +
>         TAILQ_FOREACH(bus, &rte_bus_list, next) {
>                 enum rte_iova_mode bus_iova_mode;
> 
> I think this will solve various use cases/combinations like PA or VA mode, pdev or vdev used for KNI.
> Existing use cases would not be affected by these patch series with above fix.
> 

Hi Vamsi,

I think this is not a problem of using vdev, so we can't solve it via a
vdev check only.

The sample I gave (KNI PMD) is using a vdev, but an application can use the KNI
library APIs directly to create a kni interface and have kernel/user space
communication without any device involved. KNI PMD (vdev) is just a wrapper to
make this easy.

Just thinking aloud,
KNI is sharing a userspace buffer with the kernel, so basically it needs to do
virtual address to kernel virtual address translation, in a reasonably fast manner.

iova=va breaks KNI because:
1) physical memory is not contiguous anymore, which breaks our address
translation logic
2) we were using the physical address of the buffer for the address translation,
but we no longer have it; we now have the iova address.

I assume 1) is resolved with 'rte_kni_pktmbuf_pool_create()', it would be helpful
though if you can explain how this works?

For the second, a simple question: do we need PCIe device information to be
able to convert an iova to a kernel virtual address? Can't I get this information
from the iommu somehow?
Think about a case where "--iova-mode=va" is provided but there is no physical
device bound to vfio-pci, can I still allocate memory? And how can I use KNI in
that case?
  
Vamsi Krishna Attunuru Oct. 18, 2019, 5:15 p.m. UTC | #6
> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Ferruh Yigit
> Sent: Wednesday, October 16, 2019 9:44 PM
> To: Vamsi Krishna Attunuru <vattunuru@marvell.com>; Stephen Hemminger
> <stephen@networkplumber.org>; Yigit, Ferruh
> <ferruh.yigit@linux.intel.com>
> Cc: dev@dpdk.org; thomas@monjalon.net; Jerin Jacob Kollanukkaran
> <jerinj@marvell.com>; olivier.matz@6wind.com;
> anatoly.burakov@intel.com; arybchenko@solarflare.com; Kiran Kumar
> Kokkilagadda <kirankumark@marvell.com>
> Subject: Re: [dpdk-dev] [EXT] Re: [PATCH v10 4/5] kni: add IOVA=VA support
> in KNI module
> 
> On 10/16/2019 12:26 PM, Vamsi Krishna Attunuru wrote:
> >
> >
> >> -----Original Message-----
> >> From: Stephen Hemminger <stephen@networkplumber.org>
> >> Sent: Tuesday, October 15, 2019 9:16 PM
> >> To: Yigit, Ferruh <ferruh.yigit@linux.intel.com>
> >> Cc: Vamsi Krishna Attunuru <vattunuru@marvell.com>; dev@dpdk.org;
> >> thomas@monjalon.net; Jerin Jacob Kollanukkaran <jerinj@marvell.com>;
> >> olivier.matz@6wind.com; ferruh.yigit@intel.com;
> >> anatoly.burakov@intel.com; arybchenko@solarflare.com; Kiran Kumar
> >> Kokkilagadda <kirankumark@marvell.com>
> >> Subject: [EXT] Re: [dpdk-dev] [PATCH v10 4/5] kni: add IOVA=VA
> >> support in KNI module
> >>
> >> External Email
> >>
> >> ---------------------------------------------------------------------
> >> -
> >> On Tue, 15 Oct 2019 16:43:08 +0100
> >> "Yigit, Ferruh" <ferruh.yigit@linux.intel.com> wrote:
> >>
> >>> On 8/16/2019 7:12 AM, vattunuru@marvell.com wrote:
> >>>> From: Kiran Kumar K <kirankumark@marvell.com>
> >>>>
> >>>> Patch adds support for kernel module to work in IOVA = VA mode, the
> >>>> idea is to get physical address from IOVA address using
> >>>> iommu_iova_to_phys API and later use phys_to_virt API to convert
> >>>> the physical address to kernel virtual address.
> >>>>
> >>>> When compared with IOVA = PA mode, there is no performance drop
> >>>> with this approach.
> >>>>
> >>>> This approach does not work with the kernel versions less than
> >>>> 4.4.0 because of API compatibility issues.
> >>>>
> >>>> Patch also updates these support details in KNI documentation.
> >>>>
> >>>> Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
> >>>> Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
> >>>
> >>> <...>
> >>>
> >>>> @@ -348,15 +351,65 @@ kni_ioctl_create(struct net *net, uint32_t
> >> ioctl_num,
> >>>>  	strncpy(kni->name, dev_info.name, RTE_KNI_NAMESIZE);
> >>>>
> >>>>  	/* Translate user space info into kernel space info */
> >>>> -	kni->tx_q = phys_to_virt(dev_info.tx_phys);
> >>>> -	kni->rx_q = phys_to_virt(dev_info.rx_phys);
> >>>> -	kni->alloc_q = phys_to_virt(dev_info.alloc_phys);
> >>>> -	kni->free_q = phys_to_virt(dev_info.free_phys);
> >>>> -
> >>>> -	kni->req_q = phys_to_virt(dev_info.req_phys);
> >>>> -	kni->resp_q = phys_to_virt(dev_info.resp_phys);
> >>>> -	kni->sync_va = dev_info.sync_va;
> >>>> -	kni->sync_kva = phys_to_virt(dev_info.sync_phys);
> >>>> +	if (dev_info.iova_mode) {
> >>>> +#ifdef HAVE_IOVA_AS_VA_SUPPORT
> >>>> +		pci = pci_get_device(dev_info.vendor_id,
> >>>> +				     dev_info.device_id, NULL);
> >>>> +		if (pci == NULL) {
> >>>> +			pr_err("pci dev does not exist\n");
> >>>> +			return -ENODEV;
> >>>> +		}
> >>>
> >>> If there is no PCI device KNI should still work.
> >>
> >> Right now it is possible to use KNI with netvsc PMD on Hyper-V/Azure.
> >> With this patch that won't be possible.
> >
> > Hi Ferruh, Stephen,
> >
> > These can be fixed by forcing iommu_mode as PA when vdevs are used for
> > KNI usecase.
> >
> > rte_bus_get_iommu_class(void)
> >  {
> >         enum rte_iova_mode mode = RTE_IOVA_DC;
> > +       struct rte_devargs *devargs = NULL;
> >         bool buses_want_va = false;
> >         bool buses_want_pa = false;
> >         struct rte_bus *bus;
> >
> > +       if (rte_eal_check_module("rte_kni") == 1) {
> > +               RTE_EAL_DEVARGS_FOREACH("vdev", devargs) {
> > +                       return RTE_IOVA_PA;
> > +               }
> > +       }
> > +
> >         TAILQ_FOREACH(bus, &rte_bus_list, next) {
> >                 enum rte_iova_mode bus_iova_mode;
> >
> > I think this will solve various use cases/combinations like PA or VA mode,
> pdev or vdev used for KNI.
> > Existing use cases would not be affected by these patch series with above
> fix.
> >
> 
> Hi Vamsi,
> 
> I think this is not a problem of using vdev so I think we can't solve this via
> vdev check only.
> 
> The sample I give (KNI PMD) is using vdev, but application can use KNI library
> APIs directly to create kni interface and have kernel/user space
> communication without any device involved. KNI PMD (vdev) is just a
> wrapper to make this easy.
> 
> Just thinking aloud,
> KNI is sharing in userspace buffer with kernel, so basically it needs to do
> virtual address to kernel virtual address translation, in a reasonably fast
> manner.
> 
> iova=va breaks KNI because:
> 1) physical memory is not continuous anymore which break our address
> translation logic
> 2) we were using physical address of the buffer for the address translation,
> but we have no more have it, we now have iova address.
> 
> I assume 1) is resolved with 'rte_kni_pktmbuf_pool_create()' it would be
> helpful though if you can explain how this works?

The KNI kernel module uses pa2va/va2pa calls to translate buffer pointers; this only works when both pa and va are 1-to-1 contiguous. For that to hold, DPDK mbuf memory must not cross a physical page boundary: if an mbuf crosses one, its iova/va will still be contiguous, but its pa may or may not be. The KNI pktmbuf pool API ensures that the mempool is not populated with that type of mbuf (ones that cross a page boundary). Once the pool is created, all mbufs in the pool reside within page limits.
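
For clarity, the constant-offset translation this answer relies on looks roughly
like the pa2va() helper in kni_net.c; the sketch below is illustrative and shows
why the whole mbuf has to sit in one physically contiguous region (e.g. within a
single page) for the arithmetic to hold.

/* Sketch: translate a physical/IOVA buffer pointer into the corresponding
 * user virtual address by applying the mbuf's constant (va - pa) offset.
 * The offset is only constant across the whole buffer if the mbuf does not
 * cross a physical page boundary, which is what
 * rte_kni_pktmbuf_pool_create() guarantees. */
static void *
pa2va(void *pa, struct rte_kni_mbuf *m)
{
	return (void *)((unsigned long)pa +
			(unsigned long)m->buf_addr -
			(unsigned long)m->buf_physaddr);
}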
> 
> For second, a simple question, do we need to get a PCIe device information
> to be able to convert iova to kernel virtual address? Can't I get this
> information from iommu somehow?
> Think about a case "--iova-mode=va" provided but there is no physical device
> bind to vfio-pci, can I still allocated memor? And how can I use KNI in that
> case?

We found a way to translate iova/uva to kva using kernel mm subsystem calls: using the get_user_pages_remote() call, the KNI module is able to get the iova-to-kva mapping. This solution works for pdevs (without sharing any dev info) and also for wrapper vdev PMDs.

I will push the next version of the patches with this solution and the other command line arg changes. Since the major architectural issue in enabling iova=va for KNI is fixed, we are planning to merge this patch series in the 19.11 release.
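
A rough sketch of that approach is shown below; it is only illustrative, since
the get_user_pages_remote() signature differs across kernel versions (this
follows the form that takes a task_struct, as on ~4.10 kernels), and the actual
next-version implementation may differ in the details.

#include <linux/mm.h>
#include <linux/sched.h>
#include <linux/io.h>

/* Sketch: resolve a user-space IOVA/VA to a physical address by pinning the
 * backing page through the mm of the process that created the KNI device.
 * A kernel virtual address then follows from phys_to_virt(). */
static phys_addr_t
iova_to_phys(struct task_struct *tsk, unsigned long iova)
{
	unsigned long offset = iova & (PAGE_SIZE - 1);
	struct page *page = NULL;
	phys_addr_t phys_addr;
	long ret;

	/* Pin exactly one page backing the user address. */
	ret = get_user_pages_remote(tsk, tsk->mm, iova, 1, FOLL_TOUCH,
				    &page, NULL, NULL);
	if (ret != 1)
		return 0;

	phys_addr = page_to_phys(page) | offset;
	put_page(page);

	return phys_addr;
}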
  
Ferruh Yigit Oct. 21, 2019, 11:45 a.m. UTC | #7
On 10/18/2019 6:15 PM, Vamsi Krishna Attunuru wrote:
> 
> 
>> -----Original Message-----
>> From: dev <dev-bounces@dpdk.org> On Behalf Of Ferruh Yigit
>> Sent: Wednesday, October 16, 2019 9:44 PM
>> To: Vamsi Krishna Attunuru <vattunuru@marvell.com>; Stephen Hemminger
>> <stephen@networkplumber.org>; Yigit, Ferruh
>> <ferruh.yigit@linux.intel.com>
>> Cc: dev@dpdk.org; thomas@monjalon.net; Jerin Jacob Kollanukkaran
>> <jerinj@marvell.com>; olivier.matz@6wind.com;
>> anatoly.burakov@intel.com; arybchenko@solarflare.com; Kiran Kumar
>> Kokkilagadda <kirankumark@marvell.com>
>> Subject: Re: [dpdk-dev] [EXT] Re: [PATCH v10 4/5] kni: add IOVA=VA support
>> in KNI module
>>
>> On 10/16/2019 12:26 PM, Vamsi Krishna Attunuru wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Stephen Hemminger <stephen@networkplumber.org>
>>>> Sent: Tuesday, October 15, 2019 9:16 PM
>>>> To: Yigit, Ferruh <ferruh.yigit@linux.intel.com>
>>>> Cc: Vamsi Krishna Attunuru <vattunuru@marvell.com>; dev@dpdk.org;
>>>> thomas@monjalon.net; Jerin Jacob Kollanukkaran <jerinj@marvell.com>;
>>>> olivier.matz@6wind.com; ferruh.yigit@intel.com;
>>>> anatoly.burakov@intel.com; arybchenko@solarflare.com; Kiran Kumar
>>>> Kokkilagadda <kirankumark@marvell.com>
>>>> Subject: [EXT] Re: [dpdk-dev] [PATCH v10 4/5] kni: add IOVA=VA
>>>> support in KNI module
>>>>
>>>> External Email
>>>>
>>>> ---------------------------------------------------------------------
>>>> -
>>>> On Tue, 15 Oct 2019 16:43:08 +0100
>>>> "Yigit, Ferruh" <ferruh.yigit@linux.intel.com> wrote:
>>>>
>>>>> On 8/16/2019 7:12 AM, vattunuru@marvell.com wrote:
>>>>>> From: Kiran Kumar K <kirankumark@marvell.com>
>>>>>>
>>>>>> Patch adds support for kernel module to work in IOVA = VA mode, the
>>>>>> idea is to get physical address from IOVA address using
>>>>>> iommu_iova_to_phys API and later use phys_to_virt API to convert
>>>>>> the physical address to kernel virtual address.
>>>>>>
>>>>>> When compared with IOVA = PA mode, there is no performance drop
>>>>>> with this approach.
>>>>>>
>>>>>> This approach does not work with the kernel versions less than
>>>>>> 4.4.0 because of API compatibility issues.
>>>>>>
>>>>>> Patch also updates these support details in KNI documentation.
>>>>>>
>>>>>> Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
>>>>>> Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
>>>>>
>>>>> <...>
>>>>>
>>>>>> @@ -348,15 +351,65 @@ kni_ioctl_create(struct net *net, uint32_t
>>>> ioctl_num,
>>>>>>  	strncpy(kni->name, dev_info.name, RTE_KNI_NAMESIZE);
>>>>>>
>>>>>>  	/* Translate user space info into kernel space info */
>>>>>> -	kni->tx_q = phys_to_virt(dev_info.tx_phys);
>>>>>> -	kni->rx_q = phys_to_virt(dev_info.rx_phys);
>>>>>> -	kni->alloc_q = phys_to_virt(dev_info.alloc_phys);
>>>>>> -	kni->free_q = phys_to_virt(dev_info.free_phys);
>>>>>> -
>>>>>> -	kni->req_q = phys_to_virt(dev_info.req_phys);
>>>>>> -	kni->resp_q = phys_to_virt(dev_info.resp_phys);
>>>>>> -	kni->sync_va = dev_info.sync_va;
>>>>>> -	kni->sync_kva = phys_to_virt(dev_info.sync_phys);
>>>>>> +	if (dev_info.iova_mode) {
>>>>>> +#ifdef HAVE_IOVA_AS_VA_SUPPORT
>>>>>> +		pci = pci_get_device(dev_info.vendor_id,
>>>>>> +				     dev_info.device_id, NULL);
>>>>>> +		if (pci == NULL) {
>>>>>> +			pr_err("pci dev does not exist\n");
>>>>>> +			return -ENODEV;
>>>>>> +		}
>>>>>
>>>>> If there is no PCI device KNI should still work.
>>>>
>>>> Right now it is possible to use KNI with netvsc PMD on Hyper-V/Azure.
>>>> With this patch that won't be possible.
>>>
>>> Hi Ferruh, Stephen,
>>>
>>> These can be fixed by forcing iommu_mode as PA when vdevs are used for
>>> KNI usecase.
>>>
>>> rte_bus_get_iommu_class(void)
>>>  {
>>>         enum rte_iova_mode mode = RTE_IOVA_DC;
>>> +       struct rte_devargs *devargs = NULL;
>>>         bool buses_want_va = false;
>>>         bool buses_want_pa = false;
>>>         struct rte_bus *bus;
>>>
>>> +       if (rte_eal_check_module("rte_kni") == 1) {
>>> +               RTE_EAL_DEVARGS_FOREACH("vdev", devargs) {
>>> +                       return RTE_IOVA_PA;
>>> +               }
>>> +       }
>>> +
>>>         TAILQ_FOREACH(bus, &rte_bus_list, next) {
>>>                 enum rte_iova_mode bus_iova_mode;
>>>
>>> I think this will solve various use cases/combinations like PA or VA mode,
>> pdev or vdev used for KNI.
>>> Existing use cases would not be affected by these patch series with above
>> fix.
>>>
>>
>> Hi Vamsi,
>>
>> I think this is not a problem of using vdev so I think we can't solve this via
>> vdev check only.
>>
>> The sample I give (KNI PMD) is using vdev, but application can use KNI library
>> APIs directly to create kni interface and have kernel/user space
>> communication without any device involved. KNI PMD (vdev) is just a
>> wrapper to make this easy.
>>
>> Just thinking aloud,
>> KNI is sharing in userspace buffer with kernel, so basically it needs to do
>> virtual address to kernel virtual address translation, in a reasonably fast
>> manner.
>>
>> iova=va breaks KNI because:
>> 1) physical memory is not continuous anymore which break our address
>> translation logic
>> 2) we were using physical address of the buffer for the address translation,
>> but we have no more have it, we now have iova address.
>>
>> I assume 1) is resolved with 'rte_kni_pktmbuf_pool_create()' it would be
>> helpful though if you can explain how this works?
> 
> KNI kernel module uses pa2va, va2pa calls to translate buffer pointers, this will only work when both pa & va are 1-to-1 contiguous. In such cases, dpdk mbuf memory should not cross physical page boundary,  If it crosses, it's iova/va will be contiguous, but it's pa may or may not be contiguous. Kni pktmbuf pool api ensures that mempool is not populated with those type of mbufs(that cross page boundary). Once the pool is created, all mbufs in that pool resides with in the page limits.

ack.

>>
>> For second, a simple question, do we need to get a PCIe device information
>> to be able to convert iova to kernel virtual address? Can't I get this
>> information from iommu somehow?
>> Think about a case "--iova-mode=va" provided but there is no physical device
>> bind to vfio-pci, can I still allocated memor? And how can I use KNI in that
>> case?
> 
> We found a way to translate iova/uva to kva using kernel mm subsystem calls, using get_user_pages_remote() call KNI module is able to get mapping for iova to kva. This solution worked for pdevs(without sharing any dev info) and also for wrapper vdev pmds.

Good to hear this.

> 
> I will push next version of patches with these solution and other command line arg changes. Since the major architectural issue is fixed in enabling iova=va for KNI, we are planning to merge these patch series in 19.11 release.
>
  

Patch

diff --git a/kernel/linux/kni/compat.h b/kernel/linux/kni/compat.h
index 562d8bf..ee997a6 100644
--- a/kernel/linux/kni/compat.h
+++ b/kernel/linux/kni/compat.h
@@ -121,3 +121,7 @@ 
 #if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 11, 0)
 #define HAVE_SIGNAL_FUNCTIONS_OWN_HEADER
 #endif
+
+#if KERNEL_VERSION(4, 4, 0) <= LINUX_VERSION_CODE
+#define HAVE_IOVA_AS_VA_SUPPORT
+#endif
diff --git a/kernel/linux/kni/kni_dev.h b/kernel/linux/kni/kni_dev.h
index c1ca678..d5898f3 100644
--- a/kernel/linux/kni/kni_dev.h
+++ b/kernel/linux/kni/kni_dev.h
@@ -25,6 +25,7 @@ 
 #include <linux/netdevice.h>
 #include <linux/spinlock.h>
 #include <linux/list.h>
+#include <linux/iommu.h>
 
 #include <rte_kni_common.h>
 #define KNI_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */
@@ -41,6 +42,9 @@  struct kni_dev {
 	/* kni list */
 	struct list_head list;
 
+	uint8_t iova_mode;
+	struct iommu_domain *domain;
+
 	uint32_t core_id;            /* Core ID to bind */
 	char name[RTE_KNI_NAMESIZE]; /* Network device name */
 	struct task_struct *pthread;
diff --git a/kernel/linux/kni/kni_misc.c b/kernel/linux/kni/kni_misc.c
index 2b75502..8660205 100644
--- a/kernel/linux/kni/kni_misc.c
+++ b/kernel/linux/kni/kni_misc.c
@@ -295,6 +295,9 @@  kni_ioctl_create(struct net *net, uint32_t ioctl_num,
 	struct rte_kni_device_info dev_info;
 	struct net_device *net_dev = NULL;
 	struct kni_dev *kni, *dev, *n;
+	struct pci_dev *pci = NULL;
+	struct iommu_domain *domain = NULL;
+	phys_addr_t phys_addr;
 
 	pr_info("Creating kni...\n");
 	/* Check the buffer size, to avoid warning */
@@ -348,15 +351,65 @@  kni_ioctl_create(struct net *net, uint32_t ioctl_num,
 	strncpy(kni->name, dev_info.name, RTE_KNI_NAMESIZE);
 
 	/* Translate user space info into kernel space info */
-	kni->tx_q = phys_to_virt(dev_info.tx_phys);
-	kni->rx_q = phys_to_virt(dev_info.rx_phys);
-	kni->alloc_q = phys_to_virt(dev_info.alloc_phys);
-	kni->free_q = phys_to_virt(dev_info.free_phys);
-
-	kni->req_q = phys_to_virt(dev_info.req_phys);
-	kni->resp_q = phys_to_virt(dev_info.resp_phys);
-	kni->sync_va = dev_info.sync_va;
-	kni->sync_kva = phys_to_virt(dev_info.sync_phys);
+	if (dev_info.iova_mode) {
+#ifdef HAVE_IOVA_AS_VA_SUPPORT
+		pci = pci_get_device(dev_info.vendor_id,
+				     dev_info.device_id, NULL);
+		if (pci == NULL) {
+			pr_err("pci dev does not exist\n");
+			return -ENODEV;
+		}
+
+		while (pci) {
+			if ((pci->bus->number == dev_info.bus) &&
+			    (PCI_SLOT(pci->devfn) == dev_info.devid) &&
+			    (PCI_FUNC(pci->devfn) == dev_info.function)) {
+				domain = iommu_get_domain_for_dev(&pci->dev);
+				break;
+			}
+			pci = pci_get_device(dev_info.vendor_id,
+					     dev_info.device_id, pci);
+		}
+
+		if (domain == NULL) {
+			pr_err("Failed to get pci dev domain info\n");
+			return -ENODEV;
+		}
+#else
+		pr_err("Kernel version does not support IOVA as VA\n");
+		return -EINVAL;
+#endif
+		kni->domain = domain;
+		phys_addr = iommu_iova_to_phys(domain, dev_info.tx_phys);
+		kni->tx_q = phys_to_virt(phys_addr);
+		phys_addr = iommu_iova_to_phys(domain, dev_info.rx_phys);
+		kni->rx_q = phys_to_virt(phys_addr);
+		phys_addr = iommu_iova_to_phys(domain, dev_info.alloc_phys);
+		kni->alloc_q = phys_to_virt(phys_addr);
+		phys_addr = iommu_iova_to_phys(domain, dev_info.free_phys);
+		kni->free_q = phys_to_virt(phys_addr);
+		phys_addr = iommu_iova_to_phys(domain, dev_info.req_phys);
+		kni->req_q = phys_to_virt(phys_addr);
+		phys_addr = iommu_iova_to_phys(domain, dev_info.resp_phys);
+		kni->resp_q = phys_to_virt(phys_addr);
+		kni->sync_va = dev_info.sync_va;
+		phys_addr = iommu_iova_to_phys(domain, dev_info.sync_phys);
+		kni->sync_kva = phys_to_virt(phys_addr);
+		kni->iova_mode = 1;
+
+	} else {
+
+		kni->tx_q = phys_to_virt(dev_info.tx_phys);
+		kni->rx_q = phys_to_virt(dev_info.rx_phys);
+		kni->alloc_q = phys_to_virt(dev_info.alloc_phys);
+		kni->free_q = phys_to_virt(dev_info.free_phys);
+
+		kni->req_q = phys_to_virt(dev_info.req_phys);
+		kni->resp_q = phys_to_virt(dev_info.resp_phys);
+		kni->sync_va = dev_info.sync_va;
+		kni->sync_kva = phys_to_virt(dev_info.sync_phys);
+		kni->iova_mode = 0;
+	}
 
 	kni->mbuf_size = dev_info.mbuf_size;
 
diff --git a/kernel/linux/kni/kni_net.c b/kernel/linux/kni/kni_net.c
index 7bd3a9f..8382859 100644
--- a/kernel/linux/kni/kni_net.c
+++ b/kernel/linux/kni/kni_net.c
@@ -36,6 +36,21 @@  static void kni_net_rx_normal(struct kni_dev *kni);
 /* kni rx function pointer, with default to normal rx */
 static kni_net_rx_t kni_net_rx_func = kni_net_rx_normal;
 
+/* iova to kernel virtual address */
+static inline void *
+iova2kva(struct kni_dev *kni, void *pa)
+{
+	return phys_to_virt(iommu_iova_to_phys(kni->domain,
+				(uintptr_t)pa));
+}
+
+static inline void *
+iova2data_kva(struct kni_dev *kni, struct rte_kni_mbuf *m)
+{
+	return phys_to_virt(iommu_iova_to_phys(kni->domain,
+			   (uintptr_t)m->buf_physaddr) + m->data_off);
+}
+
 /* physical address to kernel virtual address */
 static void *
 pa2kva(void *pa)
@@ -62,6 +77,24 @@  kva2data_kva(struct rte_kni_mbuf *m)
 	return phys_to_virt(m->buf_physaddr + m->data_off);
 }
 
+static inline void *
+get_kva(struct kni_dev *kni, void *pa)
+{
+	if (kni->iova_mode == 1)
+		return iova2kva(kni, pa);
+
+	return pa2kva(pa);
+}
+
+static inline void *
+get_data_kva(struct kni_dev *kni, void *pkt_kva)
+{
+	if (kni->iova_mode == 1)
+		return iova2data_kva(kni, pkt_kva);
+
+	return kva2data_kva(pkt_kva);
+}
+
 /*
  * It can be called to process the request.
  */
@@ -178,7 +211,7 @@  kni_fifo_trans_pa2va(struct kni_dev *kni,
 			return;
 
 		for (i = 0; i < num_rx; i++) {
-			kva = pa2kva(kni->pa[i]);
+			kva = get_kva(kni, kni->pa[i]);
 			kni->va[i] = pa2va(kni->pa[i], kva);
 
 			kva_nb_segs = kva->nb_segs;
@@ -266,8 +299,8 @@  kni_net_tx(struct sk_buff *skb, struct net_device *dev)
 	if (likely(ret == 1)) {
 		void *data_kva;
 
-		pkt_kva = pa2kva(pkt_pa);
-		data_kva = kva2data_kva(pkt_kva);
+		pkt_kva = get_kva(kni, pkt_pa);
+		data_kva = get_data_kva(kni, pkt_kva);
 		pkt_va = pa2va(pkt_pa, pkt_kva);
 
 		len = skb->len;
@@ -338,9 +371,9 @@  kni_net_rx_normal(struct kni_dev *kni)
 
 	/* Transfer received packets to netif */
 	for (i = 0; i < num_rx; i++) {
-		kva = pa2kva(kni->pa[i]);
+		kva = get_kva(kni, kni->pa[i]);
 		len = kva->pkt_len;
-		data_kva = kva2data_kva(kva);
+		data_kva = get_data_kva(kni, kva);
 		kni->va[i] = pa2va(kni->pa[i], kva);
 
 		skb = netdev_alloc_skb(dev, len);
@@ -437,9 +470,9 @@  kni_net_rx_lo_fifo(struct kni_dev *kni)
 		num = ret;
 		/* Copy mbufs */
 		for (i = 0; i < num; i++) {
-			kva = pa2kva(kni->pa[i]);
+			kva = get_kva(kni, kni->pa[i]);
 			len = kva->data_len;
-			data_kva = kva2data_kva(kva);
+			data_kva = get_data_kva(kni, kva);
 			kni->va[i] = pa2va(kni->pa[i], kva);
 
 			while (kva->next) {
@@ -449,8 +482,8 @@  kni_net_rx_lo_fifo(struct kni_dev *kni)
 				kva = next_kva;
 			}
 
-			alloc_kva = pa2kva(kni->alloc_pa[i]);
-			alloc_data_kva = kva2data_kva(alloc_kva);
+			alloc_kva = get_kva(kni, kni->alloc_pa[i]);
+			alloc_data_kva = get_data_kva(kni, alloc_kva);
 			kni->alloc_va[i] = pa2va(kni->alloc_pa[i], alloc_kva);
 
 			memcpy(alloc_data_kva, data_kva, len);
@@ -517,9 +550,9 @@  kni_net_rx_lo_fifo_skb(struct kni_dev *kni)
 
 	/* Copy mbufs to sk buffer and then call tx interface */
 	for (i = 0; i < num; i++) {
-		kva = pa2kva(kni->pa[i]);
+		kva = get_kva(kni, kni->pa[i]);
 		len = kva->pkt_len;
-		data_kva = kva2data_kva(kva);
+		data_kva = get_data_kva(kni, kva);
 		kni->va[i] = pa2va(kni->pa[i], kva);
 
 		skb = netdev_alloc_skb(dev, len);
@@ -550,8 +583,8 @@  kni_net_rx_lo_fifo_skb(struct kni_dev *kni)
 					break;
 
 				prev_kva = kva;
-				kva = pa2kva(kva->next);
-				data_kva = kva2data_kva(kva);
+				kva = get_kva(kni, kva->next);
+				data_kva = get_data_kva(kni, kva);
 				/* Convert physical address to virtual address */
 				prev_kva->next = pa2va(prev_kva->next, kva);
 			}