[v4] eal: add VFIO-PCI SR-IOV support

Message ID 20200413082930.84050-1-haiyue.wang@intel.com (mailing list archive)
State Superseded, archived
Series [v4] eal: add VFIO-PCI SR-IOV support

Checks

Context                     Check    Description
ci/checkpatch               success  coding style OK
ci/Intel-compilation        success  Compilation OK
ci/travis-robot             warning  Travis build: failed
ci/iol-intel-Performance    success  Performance Testing PASS
ci/iol-mellanox-Performance success  Performance Testing PASS
ci/iol-testing              success  Testing PASS

Commit Message

Wang, Haiyue April 13, 2020, 8:29 a.m. UTC
The vfio-pci kernel module has introduced the VF token to enable SR-IOV
support since Linux 5.7.

The VF token can be set by a vfio-pci based PF driver and must be known
by the vfio-pci based VF driver in order to gain access to the device.

An example VF token option would take this form:

1. Install vfio-pci with option 'enable_sriov=1'

2. ./usertools/dpdk-devbind.py -b vfio-pci 0000:87:00.0

3. echo 2 > /sys/bus/pci/devices/0000:87:00.0/sriov_numvfs

4. Start the PF:
  ./x86_64-native-linux-gcc/app/testpmd -l 22-25 -n 4 \
         -w 87:00.0,vf_token=2ab74924-c335-45f4-9b16-8569e5b08258 \
         --file-prefix=pf -- -i

5. Start the VF:
   ./x86_64-native-linux-gcc/app/testpmd -l 26-29 -n 4 \
         -w 87:02.0,vf_token=2ab74924-c335-45f4-9b16-8569e5b08258 \
         --file-prefix=vf1 -- -i

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Vamsi Attunuru <vattunuru@marvell.com>
---
v4: 1. Ignore rte_vfio_setup_device ABI check since it is
       for Linux driver use.

v3: https://patchwork.dpdk.org/patch/68254/ 
	Fix the Travis build failed:
       (1). rte_uuid.h:97:55: error: unknown type name ‘size_t’
       (2). rte_uuid.h:58:2: error: implicit declaration of function ‘memcpy’

v2: https://patchwork.dpdk.org/patch/68240/ 
         Fix the FreeBSD build error.

v1: https://patchwork.dpdk.org/patch/68237/
         Update the commit message.

RFC v2: https://patchwork.dpdk.org/patch/68114/
	   Based on Vamsi's RFC v1, and Alex's patch for Qemu
	       [https://lore.kernel.org/lkml/20200204161737.34696b91@w520.home/]:
	   Use the devarg to pass-down the VF token.

RFC v1: https://patchwork.dpdk.org/patch/66281/ by Vamsi.
---
 devtools/libabigail.abignore      |  3 ++
 drivers/bus/pci/linux/pci_vfio.c  | 56 +++++++++++++++++++++++++++++--
 lib/librte_eal/freebsd/eal.c      |  3 +-
 lib/librte_eal/include/rte_uuid.h |  2 ++
 lib/librte_eal/include/rte_vfio.h |  8 ++++-
 lib/librte_eal/linux/eal_vfio.c   | 20 +++++++++--
 6 files changed, 85 insertions(+), 7 deletions(-)
  

Comments

Thomas Monjalon April 13, 2020, 12:18 p.m. UTC | #1
Hi,

About the title, I think it does not convey what is new here.
VFIO is not new, SR-IOV is already supported.
The title should mention the new VFIO feature in a few simple words.
Is it only about using VFIO for PF?


13/04/2020 10:29, Haiyue Wang:
> v4: 1. Ignore rte_vfio_setup_device ABI check since it is
>        for Linux driver use.
[...]
> +; Ignore this function which is only relevant to linux for driver
> +[suppress_type]
> +	name = rte_vfio_setup_device

Adding such exception for all internal "driver interface" functions
is not scaling. Please use __rte_internal.
I am waiting for the patchset about rte_internal to be reviewed or completed.
As it is not progressing, the decision is to block any patch having
ABI issue because of internal false positive.
Please help, thanks.
  
Andrew Rybchenko April 13, 2020, 3:37 p.m. UTC | #2
On 4/13/20 11:29 AM, Haiyue Wang wrote:
> The kernel module vfio-pci introduces the VF token to enable SR-IOV
> support since 5.7.
> 
> The VF token can be set by a vfio-pci based PF driver and must be known
> by the vfio-pci based VF driver in order to gain access to the device.
> 
> An example VF token option would take this form:
> 
> 1. Install vfio-pci with option 'enable_sriov=1'
> 
> 2. ./usertools/dpdk-devbind.py -b vfio-pci 0000:87:00.0
> 
> 3. echo 2 > /sys/bus/pci/devices/0000:87:00.0/sriov_numvfs
> 
> 4. Start the PF:
>   ./x86_64-native-linux-gcc/app/testpmd -l 22-25 -n 4 \
>          -w 87:00.0,vf_token=2ab74924-c335-45f4-9b16-8569e5b08258 \
>          --file-prefix=pf -- -i

Should I get a token from my head? Any?

> 5. Start the VF:
>    ./x86_64-native-linux-gcc/app/testpmd -l 26-29 -n 4 \
>          -w 87:02.0,vf_token=2ab74924-c335-45f4-9b16-8569e5b08258 \
>          --file-prefix=vf1 -- -i
> 
> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> Acked-by: Vamsi Attunuru <vattunuru@marvell.com>
> ---
> v4: 1. Ignore rte_vfio_setup_device ABI check since it is
>        for Linux driver use.
> 
> v3: https://patchwork.dpdk.org/patch/68254/ 
> 	Fix the Travis build failed:
>        (1). rte_uuid.h:97:55: error: unknown type name ‘size_t’
>        (2). rte_uuid.h:58:2: error: implicit declaration of function ‘memcpy’
> 
> v2: https://patchwork.dpdk.org/patch/68240/ 
>          Fix the FreeBSD build error.
> 
> v1: https://patchwork.dpdk.org/patch/68237/
>          Update the commit message.
> 
> RFC v2: https://patchwork.dpdk.org/patch/68114/
> 	   Based on Vamsi's RFC v1, and Alex's patch for Qemu
> 	       [https://lore.kernel.org/lkml/20200204161737.34696b91@w520.home/]:
> 	   Use the devarg to pass-down the VF token.
> 
> RFC v1: https://patchwork.dpdk.org/patch/66281/ by Vamsi.
> ---
>  devtools/libabigail.abignore      |  3 ++
>  drivers/bus/pci/linux/pci_vfio.c  | 56 +++++++++++++++++++++++++++++--
>  lib/librte_eal/freebsd/eal.c      |  3 +-
>  lib/librte_eal/include/rte_uuid.h |  2 ++
>  lib/librte_eal/include/rte_vfio.h |  8 ++++-
>  lib/librte_eal/linux/eal_vfio.c   | 20 +++++++++--
>  6 files changed, 85 insertions(+), 7 deletions(-)
> 
> diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
> index a59df8f13..d918746b4 100644
> --- a/devtools/libabigail.abignore
> +++ b/devtools/libabigail.abignore
> @@ -11,3 +11,6 @@
>          type_kind = enum
>          name = rte_crypto_asym_xform_type
>          changed_enumerators = RTE_CRYPTO_ASYM_XFORM_TYPE_LIST_END
> +; Ignore this function which is only relevant to linux for driver
> +[suppress_type]
> +	name = rte_vfio_setup_device
> diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
> index 64cd84a68..7f99337c7 100644
> --- a/drivers/bus/pci/linux/pci_vfio.c
> +++ b/drivers/bus/pci/linux/pci_vfio.c
> @@ -11,6 +11,7 @@
>  #include <sys/mman.h>
>  #include <stdbool.h>
>  
> +#include <rte_devargs.h>
>  #include <rte_log.h>
>  #include <rte_pci.h>
>  #include <rte_bus_pci.h>
> @@ -644,11 +645,59 @@ pci_vfio_msix_is_mappable(int vfio_dev_fd, int msix_region)
>  	return ret;
>  }
>  
> +static void
> +vfio_pci_vf_token_arg(struct rte_devargs *devargs, rte_uuid_t uu)
> +{
> +#define VF_TOKEN_ARG "vf_token="
> +	char c, *p, *vf_token;
> +
> +	if (devargs == NULL)
> +		return;
> +
> +	p = strstr(devargs->args, VF_TOKEN_ARG);
> +	if (!p)
> +		return;
> +
> +	vf_token = p + strlen(VF_TOKEN_ARG);
> +	if (strlen(vf_token) < (RTE_UUID_STRLEN - 1))
> +		return;
> +
> +	c = vf_token[RTE_UUID_STRLEN - 1];
> +	if (c != '\0' && c != ',')
> +		return;
> +
> +	vf_token[RTE_UUID_STRLEN - 1] = '\0';

Is it possible to parse and handle devargs using rte_kvargs.h?

> +	if (rte_uuid_parse(vf_token, uu)) {
> +		RTE_LOG(ERR, EAL,
> +			"The VF token is not a valid uuid : %s\n", vf_token);
> +		vf_token[RTE_UUID_STRLEN - 1] = c;
> +		return;

I think that the function must return error which is handled
by the caller when something bad happens (e.g. invalid
UUID).

> +	}
> +
> +	RTE_LOG(DEBUG, EAL,
> +		"The VF token is found : %s\n", vf_token);
> +
> +	vf_token[RTE_UUID_STRLEN - 1] = c;
> +
> +	/* Purge this vfio-pci specific token from the device arguments */
> +	if (c != '\0') {
> +		/* 1. Handle the case : 'vf_token=uuid,arg1=val1' */
> +		memmove(p, vf_token + RTE_UUID_STRLEN,
> +			strlen(vf_token + RTE_UUID_STRLEN) + 1);
> +	} else {
> +		/* 2. Handle the case : 'arg1=val1,vf_token=uuid' */
> +		if (p != devargs->args)
> +			p--;
> +
> +		*p = '\0';
> +	}

Is it really required to purge? Why? If yes, it should be explained in
the comment above.

> +}
>  
>  static int
>  pci_vfio_map_resource_primary(struct rte_pci_device *dev)
>  {
>  	struct vfio_device_info device_info = { .argsz = sizeof(device_info) };
> +	rte_uuid_t vf_token = RTE_UUID_INIT(0, 0, 0, 0, 0ULL);

Maybe it would be better if vfio_pci_vf_token_arg()
always initialized it, instead of duplicating the init
in two places?

>  	char pci_addr[PATH_MAX] = {0};
>  	int vfio_dev_fd;
>  	struct rte_pci_addr *loc = &dev->addr;
> @@ -668,8 +717,9 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev)
>  	snprintf(pci_addr, sizeof(pci_addr), PCI_PRI_FMT,
>  			loc->domain, loc->bus, loc->devid, loc->function);
>  
> +	vfio_pci_vf_token_arg(dev->device.devargs, vf_token);
>  	ret = rte_vfio_setup_device(rte_pci_get_sysfs_path(), pci_addr,
> -					&vfio_dev_fd, &device_info);
> +					&vfio_dev_fd, &device_info, vf_token);
>  	if (ret)
>  		return ret;
>  
> @@ -797,6 +847,7 @@ static int
>  pci_vfio_map_resource_secondary(struct rte_pci_device *dev)
>  {
>  	struct vfio_device_info device_info = { .argsz = sizeof(device_info) };
> +	rte_uuid_t vf_token = RTE_UUID_INIT(0, 0, 0, 0, 0ULL);
>  	char pci_addr[PATH_MAX] = {0};
>  	int vfio_dev_fd;
>  	struct rte_pci_addr *loc = &dev->addr;
> @@ -830,8 +881,9 @@ pci_vfio_map_resource_secondary(struct rte_pci_device *dev)
>  		return -1;
>  	}
>  
> +	vfio_pci_vf_token_arg(dev->device.devargs, vf_token);
>  	ret = rte_vfio_setup_device(rte_pci_get_sysfs_path(), pci_addr,
> -					&vfio_dev_fd, &device_info);
> +					&vfio_dev_fd, &device_info, vf_token);
>  	if (ret)
>  		return ret;
>  
> diff --git a/lib/librte_eal/freebsd/eal.c b/lib/librte_eal/freebsd/eal.c
> index 6ae37e7e6..a92584795 100644
> --- a/lib/librte_eal/freebsd/eal.c
> +++ b/lib/librte_eal/freebsd/eal.c
> @@ -995,7 +995,8 @@ rte_eal_vfio_intr_mode(void)
>  int rte_vfio_setup_device(__rte_unused const char *sysfs_base,
>  		      __rte_unused const char *dev_addr,
>  		      __rte_unused int *vfio_dev_fd,
> -		      __rte_unused struct vfio_device_info *device_info)
> +		      __rte_unused struct vfio_device_info *device_info,
> +		      __rte_unused rte_uuid_t vf_token)
>  {
>  	return -1;
>  }
> diff --git a/lib/librte_eal/include/rte_uuid.h b/lib/librte_eal/include/rte_uuid.h
> index 044afbdfa..8b42e070a 100644
> --- a/lib/librte_eal/include/rte_uuid.h
> +++ b/lib/librte_eal/include/rte_uuid.h
> @@ -15,6 +15,8 @@ extern "C" {
>  #endif
>  
>  #include <stdbool.h>
> +#include <stddef.h>
> +#include <string.h>
>  
>  /**
>   * Struct describing a Universal Unique Identifier
> diff --git a/lib/librte_eal/include/rte_vfio.h b/lib/librte_eal/include/rte_vfio.h
> index 20ed8c45a..1f9e22d82 100644
> --- a/lib/librte_eal/include/rte_vfio.h
> +++ b/lib/librte_eal/include/rte_vfio.h
> @@ -16,6 +16,8 @@ extern "C" {
>  
>  #include <stdint.h>
>  
> +#include <rte_uuid.h>
> +
>  /*
>   * determine if VFIO is present on the system
>   */
> @@ -102,13 +104,17 @@ struct vfio_device_info;
>   * @param device_info
>   *   Device information.
>   *
> + * @param vf_token
> + *   VF token.

Such comments are useless and just eat space while adding nothing
useful. Please make it useful and explain what is behind the
parameter: when is it necessary, and why? Should it be specified
for the PF case, the VF case, or both?

> + *
>   * @return
>   *   0 on success.
>   *   <0 on failure.
>   *   >1 if the device cannot be managed this way.
>   */
>  int rte_vfio_setup_device(const char *sysfs_base, const char *dev_addr,
> -		int *vfio_dev_fd, struct vfio_device_info *device_info);
> +		int *vfio_dev_fd, struct vfio_device_info *device_info,
> +		rte_uuid_t vf_token);

"rte_uuid_t vf_token" looks confusing. Shouldn't it be
"rte_uuid_t *vf_token"?

>  
>  /**
>   * Release a device mapped to a VFIO-managed I/O MMU group.
> diff --git a/lib/librte_eal/linux/eal_vfio.c b/lib/librte_eal/linux/eal_vfio.c
> index 4502aefed..916082b5d 100644
> --- a/lib/librte_eal/linux/eal_vfio.c
> +++ b/lib/librte_eal/linux/eal_vfio.c
> @@ -702,7 +702,8 @@ rte_vfio_clear_group(int vfio_group_fd)
>  
>  int
>  rte_vfio_setup_device(const char *sysfs_base, const char *dev_addr,
> -		int *vfio_dev_fd, struct vfio_device_info *device_info)
> +		int *vfio_dev_fd, struct vfio_device_info *device_info,
> +		rte_uuid_t vf_token)
>  {
>  	struct vfio_group_status group_status = {
>  			.argsz = sizeof(group_status)
> @@ -712,6 +713,7 @@ rte_vfio_setup_device(const char *sysfs_base, const char *dev_addr,
>  	int vfio_container_fd;
>  	int vfio_group_fd;
>  	int iommu_group_num;
> +	char dev[PATH_MAX];

Why PATH_MAX?

>  	int i, ret;
>  
>  	/* get group number */
> @@ -895,8 +897,19 @@ rte_vfio_setup_device(const char *sysfs_base, const char *dev_addr,
>  				t->type_id, t->name);
>  	}
>  
> +	if (!rte_uuid_is_null(vf_token)) {
> +		char vf_token_str[RTE_UUID_STRLEN];
> +
> +		rte_uuid_unparse(vf_token, vf_token_str, sizeof(vf_token_str));
> +		snprintf(dev, sizeof(dev),
> +			 "%s vf_token=%s", dev_addr, vf_token_str);
> +	} else {
> +		snprintf(dev, sizeof(dev),
> +			 "%s", dev_addr);
> +	}
> +
>  	/* get a file descriptor for the device */
> -	*vfio_dev_fd = ioctl(vfio_group_fd, VFIO_GROUP_GET_DEVICE_FD, dev_addr);
> +	*vfio_dev_fd = ioctl(vfio_group_fd, VFIO_GROUP_GET_DEVICE_FD, dev);
>  	if (*vfio_dev_fd < 0) {
>  		/* if we cannot get a device fd, this implies a problem with
>  		 * the VFIO group or the container not having IOMMU configured.
> @@ -2081,7 +2094,8 @@ int
>  rte_vfio_setup_device(__rte_unused const char *sysfs_base,
>  		__rte_unused const char *dev_addr,
>  		__rte_unused int *vfio_dev_fd,
> -		__rte_unused struct vfio_device_info *device_info)
> +		__rte_unused struct vfio_device_info *device_info,
> +		__rte_unused rte_uuid_t vf_token)
>  {
>  	return -1;
>  }
>
  
Wang, Haiyue April 13, 2020, 4:45 p.m. UTC | #3
Hi Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <arybchenko@solarflare.com>
> Sent: Monday, April 13, 2020 23:38
> To: Wang, Haiyue <haiyue.wang@intel.com>; dev@dpdk.org; thomas@monjalon.net; vattunuru@marvell.com;
> jerinj@marvell.com; alex.williamson@redhat.com; david.marchand@redhat.com
> Subject: Re: [dpdk-dev] [PATCH v4] eal: add VFIO-PCI SR-IOV support
> 
> On 4/13/20 11:29 AM, Haiyue Wang wrote:
> > The kernel module vfio-pci introduces the VF token to enable SR-IOV
> > support since 5.7.
> >
> > The VF token can be set by a vfio-pci based PF driver and must be known
> > by the vfio-pci based VF driver in order to gain access to the device.
> >
> > An example VF token option would take this form:
> >
> > 1. Install vfio-pci with option 'enable_sriov=1'
> >
> > 2. ./usertools/dpdk-devbind.py -b vfio-pci 0000:87:00.0
> >
> > 3. echo 2 > /sys/bus/pci/devices/0000:87:00.0/sriov_numvfs
> >
> > 4. Start the PF:
> >   ./x86_64-native-linux-gcc/app/testpmd -l 22-25 -n 4 \
> >          -w 87:00.0,vf_token=2ab74924-c335-45f4-9b16-8569e5b08258 \
> >          --file-prefix=pf -- -i
> 
> Should I get a token from my head? Any?
> 
> > 5. Start the VF:
> >    ./x86_64-native-linux-gcc/app/testpmd -l 26-29 -n 4 \
> >          -w 87:02.0,vf_token=2ab74924-c335-45f4-9b16-8569e5b08258 \
> >          --file-prefix=vf1 -- -i
> >
> > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > Acked-by: Vamsi Attunuru <vattunuru@marvell.com>
> > ---
> > v4: 1. Ignore rte_vfio_setup_device ABI check since it is
> >        for Linux driver use.
> >
> > v3: https://patchwork.dpdk.org/patch/68254/
> > 	Fix the Travis build failed:
> >        (1). rte_uuid.h:97:55: error: unknown type name ‘size_t’
> >        (2). rte_uuid.h:58:2: error: implicit declaration of function ‘memcpy’
> >
> > v2: https://patchwork.dpdk.org/patch/68240/
> >          Fix the FreeBSD build error.
> >
> > v1: https://patchwork.dpdk.org/patch/68237/
> >          Update the commit message.
> >
> > RFC v2: https://patchwork.dpdk.org/patch/68114/
> > 	   Based on Vamsi's RFC v1, and Alex's patch for Qemu
> > 	       [https://lore.kernel.org/lkml/20200204161737.34696b91@w520.home/]:
> > 	   Use the devarg to pass-down the VF token.
> >
> > RFC v1: https://patchwork.dpdk.org/patch/66281/ by Vamsi.
> > ---
> >  devtools/libabigail.abignore      |  3 ++
> >  drivers/bus/pci/linux/pci_vfio.c  | 56 +++++++++++++++++++++++++++++--
> >  lib/librte_eal/freebsd/eal.c      |  3 +-
> >  lib/librte_eal/include/rte_uuid.h |  2 ++
> >  lib/librte_eal/include/rte_vfio.h |  8 ++++-
> >  lib/librte_eal/linux/eal_vfio.c   | 20 +++++++++--
> >  6 files changed, 85 insertions(+), 7 deletions(-)
> >
> > diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
> > index a59df8f13..d918746b4 100644
> > --- a/devtools/libabigail.abignore
> > +++ b/devtools/libabigail.abignore
> > @@ -11,3 +11,6 @@
> >          type_kind = enum
> >          name = rte_crypto_asym_xform_type
> >          changed_enumerators = RTE_CRYPTO_ASYM_XFORM_TYPE_LIST_END
> > +; Ignore this function which is only relevant to linux for driver
> > +[suppress_type]
> > +	name = rte_vfio_setup_device
> > diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
> > index 64cd84a68..7f99337c7 100644
> > --- a/drivers/bus/pci/linux/pci_vfio.c
> > +++ b/drivers/bus/pci/linux/pci_vfio.c
> > @@ -11,6 +11,7 @@
> >  #include <sys/mman.h>
> >  #include <stdbool.h>
> >
> > +#include <rte_devargs.h>
> >  #include <rte_log.h>
> >  #include <rte_pci.h>
> >  #include <rte_bus_pci.h>
> > @@ -644,11 +645,59 @@ pci_vfio_msix_is_mappable(int vfio_dev_fd, int msix_region)
> >  	return ret;
> >  }
> >
> > +static void
> > +vfio_pci_vf_token_arg(struct rte_devargs *devargs, rte_uuid_t uu)
> > +{
> > +#define VF_TOKEN_ARG "vf_token="
> > +	char c, *p, *vf_token;
> > +
> > +	if (devargs == NULL)
> > +		return;
> > +
> > +	p = strstr(devargs->args, VF_TOKEN_ARG);
> > +	if (!p)
> > +		return;
> > +
> > +	vf_token = p + strlen(VF_TOKEN_ARG);
> > +	if (strlen(vf_token) < (RTE_UUID_STRLEN - 1))
> > +		return;
> > +
> > +	c = vf_token[RTE_UUID_STRLEN - 1];
> > +	if (c != '\0' && c != ',')
> > +		return;
> > +
> > +	vf_token[RTE_UUID_STRLEN - 1] = '\0';
> 
> Is it possible to parse and handle devargs using rte_kvargs.h?
> 

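The 'vf_token' argument has to be removed from the devargs string because
it is not a valid PMD argument, so it must be parsed and then deleted here.
A PMD that passes its own valid_keys to rte_kvargs_parse() would otherwise
see an unknown key:

struct rte_kvargs *rte_kvargs_parse(const char *args, const char * const valid_keys[]);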
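
The removal of one entry from the argument string can be sketched on its
own, outside DPDK. This is only an illustration and not the code from the
patch: purge_kv() is a hypothetical helper, and unlike the strstr() lookup
in the patch it accepts a match only at the start of an entry, so a key
such as 'xvf_token' is left alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Remove the first "key=value" entry from a comma-separated argument
 * string, in place. Returns true if an entry was removed.
 * Hypothetical helper for illustration; not part of DPDK.
 */
static bool
purge_kv(char *args, const char *key)
{
	size_t klen;
	char *p, *end;

	assert(args != NULL && key != NULL);
	klen = strlen(key);
	if (klen == 0)
		return false;

	for (p = args; (p = strstr(p, key)) != NULL; p += klen) {
		/* Accept only "key=" at the start of an entry. */
		if ((p == args || p[-1] == ',') && p[klen] == '=')
			break;
	}
	if (p == NULL)
		return false;

	end = strchr(p, ',');
	if (end != NULL) {
		/* "key=v,rest": shift the remainder over the entry. */
		memmove(p, end + 1, strlen(end + 1) + 1);
	} else {
		/* "rest,key=v" or "key=v": cut at the preceding comma. */
		if (p != args)
			p--;
		*p = '\0';
	}
	return true;
}
```

Called as purge_kv(args, "vf_token") on a writable copy of the devargs
string, it covers the two cases named in the patch comments plus the
case where the token sits between two other arguments.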

> > +	if (rte_uuid_parse(vf_token, uu)) {
> > +		RTE_LOG(ERR, EAL,
> > +			"The VF token is not a valid uuid : %s\n", vf_token);
> > +		vf_token[RTE_UUID_STRLEN - 1] = c;
> > +		return;
> 
> I think that the function must return error which is handled
> by the caller when something bad happens (e.g. invalid
> UUID).
> 

Yes, that makes sense; I will add the error handling.
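
One possible shape for such an interface, as a sketch only: the lookup
returns 0 when no token is given, 1 when a well-formed token was found,
and -EINVAL when the value is malformed, so the caller can stop the
mapping on a bad token. The demo_ names and the simplified format check
(36 characters, hyphens at the usual positions, hex digits elsewhere) are
assumptions for illustration; the real code would use rte_uuid_parse().

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <string.h>

#define DEMO_UUID_STRLEN 37 /* 36 characters plus the terminating NUL */

/*
 * Look for "vf_token=<uuid>" in args and copy the UUID text to out.
 * Returns 0 if absent, 1 if found and well formed, -EINVAL otherwise.
 * Hypothetical helper for illustration; not part of DPDK.
 */
static int
demo_find_vf_token(const char *args, char out[DEMO_UUID_STRLEN])
{
	static const char key[] = "vf_token=";
	const char *val;
	size_t i, len;

	assert(out != NULL);
	if (args == NULL || (val = strstr(args, key)) == NULL)
		return 0;

	val += sizeof(key) - 1;
	len = strcspn(val, ","); /* value ends at ',' or end of string */
	if (len != DEMO_UUID_STRLEN - 1)
		return -EINVAL;

	for (i = 0; i < len; i++) {
		if (i == 8 || i == 13 || i == 18 || i == 23) {
			if (val[i] != '-')
				return -EINVAL;
		} else if (!isxdigit((unsigned char)val[i])) {
			return -EINVAL;
		}
	}

	memcpy(out, val, len);
	out[len] = '\0';
	return 1;
}
```

With a three-way result like this, the caller can tell a missing token
(continue without one) from an invalid one (fail the probe).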

> > +	}
> > +
> > +	RTE_LOG(DEBUG, EAL,
> > +		"The VF token is found : %s\n", vf_token);
> > +
> > +	vf_token[RTE_UUID_STRLEN - 1] = c;
> > +
> > +	/* Purge this vfio-pci specific token from the device arguments */
> > +	if (c != '\0') {
> > +		/* 1. Handle the case : 'vf_token=uuid,arg1=val1' */
> > +		memmove(p, vf_token + RTE_UUID_STRLEN,
> > +			strlen(vf_token + RTE_UUID_STRLEN) + 1);
> > +	} else {
> > +		/* 2. Handle the case : 'arg1=val1,vf_token=uuid' */
> > +		if (p != devargs->args)
> > +			p--;
> > +
> > +		*p = '\0';
> > +	}
> 
> Is it really required to purge? Why? If yes, it should be explained in
> the comment above.

Please see the reply above.

> 
> > +}
> >
> >  static int
> >  pci_vfio_map_resource_primary(struct rte_pci_device *dev)
> >  {
> >  	struct vfio_device_info device_info = { .argsz = sizeof(device_info) };
> > +	rte_uuid_t vf_token = RTE_UUID_INIT(0, 0, 0, 0, 0ULL);
> 
> May be it would be better if vfio_pci_vf_token_arg()
> initializes it anyway instead of duplication init
> in two places?

+1, will update it.

> 
> >  	char pci_addr[PATH_MAX] = {0};
> >  	int vfio_dev_fd;
> >  	struct rte_pci_addr *loc = &dev->addr;
> > @@ -668,8 +717,9 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev)
> >  	snprintf(pci_addr, sizeof(pci_addr), PCI_PRI_FMT,
> >  			loc->domain, loc->bus, loc->devid, loc->function);
> >
> > +	vfio_pci_vf_token_arg(dev->device.devargs, vf_token);
> >  	ret = rte_vfio_setup_device(rte_pci_get_sysfs_path(), pci_addr,
> > -					&vfio_dev_fd, &device_info);
> > +					&vfio_dev_fd, &device_info, vf_token);
> >  	if (ret)
> >  		return ret;
> >
> > @@ -797,6 +847,7 @@ static int
> >  pci_vfio_map_resource_secondary(struct rte_pci_device *dev)
> >  {
> >  	struct vfio_device_info device_info = { .argsz = sizeof(device_info) };
> > +	rte_uuid_t vf_token = RTE_UUID_INIT(0, 0, 0, 0, 0ULL);
> >  	char pci_addr[PATH_MAX] = {0};
> >  	int vfio_dev_fd;
> >  	struct rte_pci_addr *loc = &dev->addr;
> > @@ -830,8 +881,9 @@ pci_vfio_map_resource_secondary(struct rte_pci_device *dev)
> >  		return -1;
> >  	}
> >
> > +	vfio_pci_vf_token_arg(dev->device.devargs, vf_token);
> >  	ret = rte_vfio_setup_device(rte_pci_get_sysfs_path(), pci_addr,
> > -					&vfio_dev_fd, &device_info);
> > +					&vfio_dev_fd, &device_info, vf_token);
> >  	if (ret)
> >  		return ret;
> >
> > diff --git a/lib/librte_eal/freebsd/eal.c b/lib/librte_eal/freebsd/eal.c
> > index 6ae37e7e6..a92584795 100644
> > --- a/lib/librte_eal/freebsd/eal.c
> > +++ b/lib/librte_eal/freebsd/eal.c
> > @@ -995,7 +995,8 @@ rte_eal_vfio_intr_mode(void)
> >  int rte_vfio_setup_device(__rte_unused const char *sysfs_base,
> >  		      __rte_unused const char *dev_addr,
> >  		      __rte_unused int *vfio_dev_fd,
> > -		      __rte_unused struct vfio_device_info *device_info)
> > +		      __rte_unused struct vfio_device_info *device_info,
> > +		      __rte_unused rte_uuid_t vf_token)
> >  {
> >  	return -1;
> >  }
> > diff --git a/lib/librte_eal/include/rte_uuid.h b/lib/librte_eal/include/rte_uuid.h
> > index 044afbdfa..8b42e070a 100644
> > --- a/lib/librte_eal/include/rte_uuid.h
> > +++ b/lib/librte_eal/include/rte_uuid.h
> > @@ -15,6 +15,8 @@ extern "C" {
> >  #endif
> >
> >  #include <stdbool.h>
> > +#include <stddef.h>
> > +#include <string.h>
> >
> >  /**
> >   * Struct describing a Universal Unique Identifier
> > diff --git a/lib/librte_eal/include/rte_vfio.h b/lib/librte_eal/include/rte_vfio.h
> > index 20ed8c45a..1f9e22d82 100644
> > --- a/lib/librte_eal/include/rte_vfio.h
> > +++ b/lib/librte_eal/include/rte_vfio.h
> > @@ -16,6 +16,8 @@ extern "C" {
> >
> >  #include <stdint.h>
> >
> > +#include <rte_uuid.h>
> > +
> >  /*
> >   * determine if VFIO is present on the system
> >   */
> > @@ -102,13 +104,17 @@ struct vfio_device_info;
> >   * @param device_info
> >   *   Device information.
> >   *
> > + * @param vf_token
> > + *   VF token.
> 
> Such comments are useles and just eat space adding  nothing
> useful. Please, make it useful and explain what is behind the
> parameter, when it is necessary, why? Should it be specified
> for PF case, VF case, both?
> 

I will expand the comment. Yes, it is for both the PF and the VF, as
explained in Alex's Linux patch.

> > + *
> >   * @return
> >   *   0 on success.
> >   *   <0 on failure.
> >   *   >1 if the device cannot be managed this way.
> >   */
> >  int rte_vfio_setup_device(const char *sysfs_base, const char *dev_addr,
> > -		int *vfio_dev_fd, struct vfio_device_info *device_info);
> > +		int *vfio_dev_fd, struct vfio_device_info *device_info,
> > +		rte_uuid_t vf_token);
> 
> "rte_uuid_t vf_token" looks confusing. Shouldn't it be
> "rte_uuid_t *vf_token"?

This follows the UUID API design and type definition. rte_uuid_t is an
array type, so as a parameter it is already passed by pointer:

bool rte_uuid_is_null(const rte_uuid_t uu);

DPDK: typedef unsigned char rte_uuid_t[16];

vs

Linux: typedef struct {
	__u8 b[UUID_SIZE];
} uuid_t;
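
The practical effect of the array typedef can be shown in a few lines of
plain C. This is a minimal sketch with illustrative demo_ names, not DPDK
code: a parameter declared with an array typedef is adjusted to a
pointer, so the callee writes directly into the caller's storage without
an explicit '*'.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Same shape as DPDK's rte_uuid_t; the demo_ names are illustrative. */
typedef unsigned char demo_uuid_t[16];

/*
 * The parameter type is really 'unsigned char *', so this writes
 * straight into the caller's array.
 */
static void
demo_uuid_fill(demo_uuid_t uu, unsigned char byte)
{
	assert(uu != NULL);
	memset(uu, byte, sizeof(demo_uuid_t));
}

/* Read-only access: the parameter is really 'const unsigned char *'. */
static bool
demo_uuid_is_null(const demo_uuid_t uu)
{
	size_t i;

	for (i = 0; i < sizeof(demo_uuid_t); i++)
		if (uu[i] != 0)
			return false;
	return true;
}
```

A struct-wrapped type such as the Linux uuid_t is copied when passed by
value, which is why a pointer is needed there to return a result.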

> 
> >
> >  /**
> >   * Release a device mapped to a VFIO-managed I/O MMU group.
> > diff --git a/lib/librte_eal/linux/eal_vfio.c b/lib/librte_eal/linux/eal_vfio.c
> > index 4502aefed..916082b5d 100644
> > --- a/lib/librte_eal/linux/eal_vfio.c
> > +++ b/lib/librte_eal/linux/eal_vfio.c
> > @@ -702,7 +702,8 @@ rte_vfio_clear_group(int vfio_group_fd)
> >
> >  int
> >  rte_vfio_setup_device(const char *sysfs_base, const char *dev_addr,
> > -		int *vfio_dev_fd, struct vfio_device_info *device_info)
> > +		int *vfio_dev_fd, struct vfio_device_info *device_info,
> > +		rte_uuid_t vf_token)
> >  {
> >  	struct vfio_group_status group_status = {
> >  			.argsz = sizeof(group_status)
> > @@ -712,6 +713,7 @@ rte_vfio_setup_device(const char *sysfs_base, const char *dev_addr,
> >  	int vfio_container_fd;
> >  	int vfio_group_fd;
> >  	int iommu_group_num;
> > +	char dev[PATH_MAX];
> 
> Why PATH_MAX?

It is carried over from Vamsi's RFC v1, and it seemed reasonable because the existing code sizes pci_addr the same way:

static int
pci_vfio_map_resource_primary(struct rte_pci_device *dev)
{
	struct vfio_device_info device_info = { .argsz = sizeof(device_info) };
	char pci_addr[PATH_MAX] = {0}; <----
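
For comparison, the device string could also be sized from its parts.
The following is a sketch under stated assumptions, not a proposal from
this thread: it assumes the common 12-character PCI address form
("0000:87:02.0"), and the DEMO_ macros and demo_format_dev() are
hypothetical names. The "%s vf_token=%s" layout is the one used in the
patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_PCI_ADDR_STRLEN 13 /* "0000:87:02.0" plus NUL (assumed form) */
#define DEMO_UUID_STRLEN     37 /* 36 characters plus NUL */
/* address + " vf_token=" + UUID text + NUL */
#define DEMO_DEV_STRLEN \
	(DEMO_PCI_ADDR_STRLEN - 1 + sizeof(" vf_token=") - 1 + DEMO_UUID_STRLEN)

/*
 * Build the string handed to VFIO_GROUP_GET_DEVICE_FD.
 * Returns 0 on success, -1 if the result did not fit in buf.
 */
static int
demo_format_dev(char *buf, size_t size, const char *addr, const char *token)
{
	int n;

	assert(buf != NULL && addr != NULL);
	if (token != NULL && token[0] != '\0')
		n = snprintf(buf, size, "%s vf_token=%s", addr, token);
	else
		n = snprintf(buf, size, "%s", addr);

	/* snprintf() reports the untruncated length, so detect overflow. */
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Under these assumptions DEMO_DEV_STRLEN is 59 bytes, and the truncation
check is worth keeping whichever buffer size is chosen.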

> 
> >  	int i, ret;
> >
> >  	/* get group number */
> > @@ -895,8 +897,19 @@ rte_vfio_setup_device(const char *sysfs_base, const char *dev_addr,
> >  				t->type_id, t->name);
> >  	}
> >
> > +	if (!rte_uuid_is_null(vf_token)) {
> > +		char vf_token_str[RTE_UUID_STRLEN];
> > +
> > +		rte_uuid_unparse(vf_token, vf_token_str, sizeof(vf_token_str));
> > +		snprintf(dev, sizeof(dev),
> > +			 "%s vf_token=%s", dev_addr, vf_token_str);
> > +	} else {
> > +		snprintf(dev, sizeof(dev),
> > +			 "%s", dev_addr);
> > +	}
> > +
> >  	/* get a file descriptor for the device */
> > -	*vfio_dev_fd = ioctl(vfio_group_fd, VFIO_GROUP_GET_DEVICE_FD, dev_addr);
> > +	*vfio_dev_fd = ioctl(vfio_group_fd, VFIO_GROUP_GET_DEVICE_FD, dev);
> >  	if (*vfio_dev_fd < 0) {
> >  		/* if we cannot get a device fd, this implies a problem with
> >  		 * the VFIO group or the container not having IOMMU configured.
> > @@ -2081,7 +2094,8 @@ int
> >  rte_vfio_setup_device(__rte_unused const char *sysfs_base,
> >  		__rte_unused const char *dev_addr,
> >  		__rte_unused int *vfio_dev_fd,
> > -		__rte_unused struct vfio_device_info *device_info)
> > +		__rte_unused struct vfio_device_info *device_info,
> > +		__rte_unused rte_uuid_t vf_token)
> >  {
> >  	return -1;
> >  }
> >
  
Wang, Haiyue April 13, 2020, 5:01 p.m. UTC | #4
Hi Thomas,

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Monday, April 13, 2020 20:18
> To: Wang, Haiyue <haiyue.wang@intel.com>
> Cc: dev@dpdk.org; vattunuru@marvell.com; jerinj@marvell.com; alex.williamson@redhat.com;
> david.marchand@redhat.com
> Subject: Re: [PATCH v4] eal: add VFIO-PCI SR-IOV support
> 
> Hi,
> 
> About the title, I think it does not convey what is new here.
> VFIO is not new, SR-IOV is already supported.
> The title should mention the new VFIO feature in few simple words.
> Is it only about using VFIO for PF?
> 

For both. To align with the title of Alex's patch, 'vfio-pci: QEMU support for vfio-pci VF tokens',

how about: 'eal: support for VFIO-PCI VF tokens'?

> 
> 13/04/2020 10:29, Haiyue Wang:
> > v4: 1. Ignore rte_vfio_setup_device ABI check since it is
> >        for Linux driver use.
> [...]
> > +; Ignore this function which is only relevant to linux for driver
> > +[suppress_type]
> > +	name = rte_vfio_setup_device
> 
> Adding such exception for all internal "driver interface" functions
> is not scaling. Please use __rte_internal.
> I am waiting for the patchset about rte_internal to be reviewed or completed.
> As it is not progressing, the decision is to block any patch having
> ABI issue because of internal false positive.
> Please help, thanks.
> 

I will drop the ABI workaround from this patch, and try to follow the '__rte_internal' design.
  

Patch

diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
index a59df8f13..d918746b4 100644
--- a/devtools/libabigail.abignore
+++ b/devtools/libabigail.abignore
@@ -11,3 +11,6 @@ 
         type_kind = enum
         name = rte_crypto_asym_xform_type
         changed_enumerators = RTE_CRYPTO_ASYM_XFORM_TYPE_LIST_END
+; Ignore this function which is only relevant to linux for driver
+[suppress_type]
+	name = rte_vfio_setup_device
diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
index 64cd84a68..7f99337c7 100644
--- a/drivers/bus/pci/linux/pci_vfio.c
+++ b/drivers/bus/pci/linux/pci_vfio.c
@@ -11,6 +11,7 @@ 
 #include <sys/mman.h>
 #include <stdbool.h>
 
+#include <rte_devargs.h>
 #include <rte_log.h>
 #include <rte_pci.h>
 #include <rte_bus_pci.h>
@@ -644,11 +645,59 @@  pci_vfio_msix_is_mappable(int vfio_dev_fd, int msix_region)
 	return ret;
 }
 
+static void
+vfio_pci_vf_token_arg(struct rte_devargs *devargs, rte_uuid_t uu)
+{
+#define VF_TOKEN_ARG "vf_token="
+	char c, *p, *vf_token;
+
+	if (devargs == NULL)
+		return;
+
+	p = strstr(devargs->args, VF_TOKEN_ARG);
+	if (!p)
+		return;
+
+	vf_token = p + strlen(VF_TOKEN_ARG);
+	if (strlen(vf_token) < (RTE_UUID_STRLEN - 1))
+		return;
+
+	c = vf_token[RTE_UUID_STRLEN - 1];
+	if (c != '\0' && c != ',')
+		return;
+
+	vf_token[RTE_UUID_STRLEN - 1] = '\0';
+	if (rte_uuid_parse(vf_token, uu)) {
+		RTE_LOG(ERR, EAL,
+			"The VF token is not a valid uuid : %s\n", vf_token);
+		vf_token[RTE_UUID_STRLEN - 1] = c;
+		return;
+	}
+
+	RTE_LOG(DEBUG, EAL,
+		"The VF token is found : %s\n", vf_token);
+
+	vf_token[RTE_UUID_STRLEN - 1] = c;
+
+	/* Purge this vfio-pci specific token from the device arguments */
+	if (c != '\0') {
+		/* 1. Handle the case : 'vf_token=uuid,arg1=val1' */
+		memmove(p, vf_token + RTE_UUID_STRLEN,
+			strlen(vf_token + RTE_UUID_STRLEN) + 1);
+	} else {
+		/* 2. Handle the case : 'arg1=val1,vf_token=uuid' */
+		if (p != devargs->args)
+			p--;
+
+		*p = '\0';
+	}
+}
 
 static int
 pci_vfio_map_resource_primary(struct rte_pci_device *dev)
 {
 	struct vfio_device_info device_info = { .argsz = sizeof(device_info) };
+	rte_uuid_t vf_token = RTE_UUID_INIT(0, 0, 0, 0, 0ULL);
 	char pci_addr[PATH_MAX] = {0};
 	int vfio_dev_fd;
 	struct rte_pci_addr *loc = &dev->addr;
@@ -668,8 +717,9 @@  pci_vfio_map_resource_primary(struct rte_pci_device *dev)
 	snprintf(pci_addr, sizeof(pci_addr), PCI_PRI_FMT,
 			loc->domain, loc->bus, loc->devid, loc->function);
 
+	vfio_pci_vf_token_arg(dev->device.devargs, vf_token);
 	ret = rte_vfio_setup_device(rte_pci_get_sysfs_path(), pci_addr,
-					&vfio_dev_fd, &device_info);
+					&vfio_dev_fd, &device_info, vf_token);
 	if (ret)
 		return ret;
 
@@ -797,6 +847,7 @@  static int
 pci_vfio_map_resource_secondary(struct rte_pci_device *dev)
 {
 	struct vfio_device_info device_info = { .argsz = sizeof(device_info) };
+	rte_uuid_t vf_token = RTE_UUID_INIT(0, 0, 0, 0, 0ULL);
 	char pci_addr[PATH_MAX] = {0};
 	int vfio_dev_fd;
 	struct rte_pci_addr *loc = &dev->addr;
@@ -830,8 +881,9 @@  pci_vfio_map_resource_secondary(struct rte_pci_device *dev)
 		return -1;
 	}
 
+	vfio_pci_vf_token_arg(dev->device.devargs, vf_token);
 	ret = rte_vfio_setup_device(rte_pci_get_sysfs_path(), pci_addr,
-					&vfio_dev_fd, &device_info);
+					&vfio_dev_fd, &device_info, vf_token);
 	if (ret)
 		return ret;
 
diff --git a/lib/librte_eal/freebsd/eal.c b/lib/librte_eal/freebsd/eal.c
index 6ae37e7e6..a92584795 100644
--- a/lib/librte_eal/freebsd/eal.c
+++ b/lib/librte_eal/freebsd/eal.c
@@ -995,7 +995,8 @@  rte_eal_vfio_intr_mode(void)
 int rte_vfio_setup_device(__rte_unused const char *sysfs_base,
 		      __rte_unused const char *dev_addr,
 		      __rte_unused int *vfio_dev_fd,
-		      __rte_unused struct vfio_device_info *device_info)
+		      __rte_unused struct vfio_device_info *device_info,
+		      __rte_unused rte_uuid_t vf_token)
 {
 	return -1;
 }
diff --git a/lib/librte_eal/include/rte_uuid.h b/lib/librte_eal/include/rte_uuid.h
index 044afbdfa..8b42e070a 100644
--- a/lib/librte_eal/include/rte_uuid.h
+++ b/lib/librte_eal/include/rte_uuid.h
@@ -15,6 +15,8 @@  extern "C" {
 #endif
 
 #include <stdbool.h>
+#include <stddef.h>
+#include <string.h>
 
 /**
  * Struct describing a Universal Unique Identifier
diff --git a/lib/librte_eal/include/rte_vfio.h b/lib/librte_eal/include/rte_vfio.h
index 20ed8c45a..1f9e22d82 100644
--- a/lib/librte_eal/include/rte_vfio.h
+++ b/lib/librte_eal/include/rte_vfio.h
@@ -16,6 +16,8 @@  extern "C" {
 
 #include <stdint.h>
 
+#include <rte_uuid.h>
+
 /*
  * determine if VFIO is present on the system
  */
@@ -102,13 +104,17 @@  struct vfio_device_info;
  * @param device_info
  *   Device information.
  *
+ * @param vf_token
+ *   VF token.
+ *
  * @return
  *   0 on success.
  *   <0 on failure.
  *   >1 if the device cannot be managed this way.
  */
 int rte_vfio_setup_device(const char *sysfs_base, const char *dev_addr,
-		int *vfio_dev_fd, struct vfio_device_info *device_info);
+		int *vfio_dev_fd, struct vfio_device_info *device_info,
+		rte_uuid_t vf_token);
 
 /**
  * Release a device mapped to a VFIO-managed I/O MMU group.
diff --git a/lib/librte_eal/linux/eal_vfio.c b/lib/librte_eal/linux/eal_vfio.c
index 4502aefed..916082b5d 100644
--- a/lib/librte_eal/linux/eal_vfio.c
+++ b/lib/librte_eal/linux/eal_vfio.c
@@ -702,7 +702,8 @@  rte_vfio_clear_group(int vfio_group_fd)
 
 int
 rte_vfio_setup_device(const char *sysfs_base, const char *dev_addr,
-		int *vfio_dev_fd, struct vfio_device_info *device_info)
+		int *vfio_dev_fd, struct vfio_device_info *device_info,
+		rte_uuid_t vf_token)
 {
 	struct vfio_group_status group_status = {
 			.argsz = sizeof(group_status)
@@ -712,6 +713,7 @@  rte_vfio_setup_device(const char *sysfs_base, const char *dev_addr,
 	int vfio_container_fd;
 	int vfio_group_fd;
 	int iommu_group_num;
+	char dev[PATH_MAX];
 	int i, ret;
 
 	/* get group number */
@@ -895,8 +897,19 @@  rte_vfio_setup_device(const char *sysfs_base, const char *dev_addr,
 				t->type_id, t->name);
 	}
 
+	if (!rte_uuid_is_null(vf_token)) {
+		char vf_token_str[RTE_UUID_STRLEN];
+
+		rte_uuid_unparse(vf_token, vf_token_str, sizeof(vf_token_str));
+		snprintf(dev, sizeof(dev),
+			 "%s vf_token=%s", dev_addr, vf_token_str);
+	} else {
+		snprintf(dev, sizeof(dev),
+			 "%s", dev_addr);
+	}
+
 	/* get a file descriptor for the device */
-	*vfio_dev_fd = ioctl(vfio_group_fd, VFIO_GROUP_GET_DEVICE_FD, dev_addr);
+	*vfio_dev_fd = ioctl(vfio_group_fd, VFIO_GROUP_GET_DEVICE_FD, dev);
 	if (*vfio_dev_fd < 0) {
 		/* if we cannot get a device fd, this implies a problem with
 		 * the VFIO group or the container not having IOMMU configured.
@@ -2081,7 +2094,8 @@  int
 rte_vfio_setup_device(__rte_unused const char *sysfs_base,
 		__rte_unused const char *dev_addr,
 		__rte_unused int *vfio_dev_fd,
-		__rte_unused struct vfio_device_info *device_info)
+		__rte_unused struct vfio_device_info *device_info,
+		__rte_unused rte_uuid_t vf_token)
 {
 	return -1;
 }