[1/2] vfio: fix check for vfio_group_fd

Message ID 1589859720-16224-1-git-send-email-wangyunjian@huawei.com (mailing list archive)
State Accepted, archived
Delegated to: David Marchand
Headers
Series fixes for vfio |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-nxp-Performance success Performance Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS

Commit Message

Yunjian Wang May 19, 2020, 3:42 a.m. UTC
  From: Yunjian Wang <wangyunjian@huawei.com>

The issue is that a file descriptor at 0 is a valid one. Currently
the file not found, the return value will be set to 0. As a result,
it is impossible to distinguish between a correct descriptor and a
failed return value. Fix it to return -ENOENT instead of 0.

Fixes: b758423bc4fe ("vfio: fix race condition with sysfs")
Fixes: ff0b67d1c868 ("vfio: DMA mappinge")
Cc: stable@dpdk.org

Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
---
 lib/librte_eal/linux/eal_vfio.c         | 23 +++++++++++++----------
 lib/librte_eal/linux/eal_vfio_mp_sync.c |  4 ++--
 2 files changed, 15 insertions(+), 12 deletions(-)
  

Comments

David Marchand May 19, 2020, 7:43 a.m. UTC | #1
On Tue, May 19, 2020 at 5:42 AM wangyunjian <wangyunjian@huawei.com> wrote:
>
> From: Yunjian Wang <wangyunjian@huawei.com>
>
> The issue is that a file descriptor at 0 is a valid one. Currently
> the file not found, the return value will be set to 0. As a result,
> it is impossible to distinguish between a correct descriptor and a
> failed return value. Fix it to return -ENOENT instead of 0.
>
> Fixes: b758423bc4fe ("vfio: fix race condition with sysfs")
> Fixes: ff0b67d1c868 ("vfio: DMA mappinge")
> Cc: stable@dpdk.org

This patch reverts the (marked for stable) fix b758423bc4fe ("vfio:
fix race condition with sysfs") and comes with a different fix.

- Kevin, Luca, I would put this b758423bc4fe backport on hold.

- Anatoly, I don't want to put 20.05 at risk.
My simple question for 20.05 is, should we revert b758423bc4fe?

- This patchset will go to 20.08 as I don't feel confident in taking it now.
  
Burakov, Anatoly May 21, 2020, 12:53 p.m. UTC | #2
On 19-May-20 8:43 AM, David Marchand wrote:
> On Tue, May 19, 2020 at 5:42 AM wangyunjian <wangyunjian@huawei.com> wrote:
>>
>> From: Yunjian Wang <wangyunjian@huawei.com>
>>
>> The issue is that a file descriptor at 0 is a valid one. Currently
>> the file not found, the return value will be set to 0. As a result,
>> it is impossible to distinguish between a correct descriptor and a
>> failed return value. Fix it to return -ENOENT instead of 0.
>>
>> Fixes: b758423bc4fe ("vfio: fix race condition with sysfs")
>> Fixes: ff0b67d1c868 ("vfio: DMA mappinge")
>> Cc: stable@dpdk.org
> 
> This patch reverts the (marked for stable) fix b758423bc4fe ("vfio:
> fix race condition with sysfs") and comes with a different fix.
> 
> - Kevin, Luca, I would put this b758423bc4fe backport on hold.
> 
> - Anatoly, I don't want to put 20.05 at risk.
> My simple question for 20.05 is, should we revert b758423bc4fe?

No, let's not revert anything.

As far as i can tell, this patch is a more complete fix, but it 
essentially does the same thing, just in a different (and better) way. 
Still, i haven't reviewed it in detail.

> 
> - This patchset will go to 20.08 as I don't feel confident in taking it now.
> 

Yes, let's leave it for 20.08.
  
David Marchand May 22, 2020, 7:36 a.m. UTC | #3
On Thu, May 21, 2020 at 2:54 PM Burakov, Anatoly
<anatoly.burakov@intel.com> wrote:
>
> On 19-May-20 8:43 AM, David Marchand wrote:
> > On Tue, May 19, 2020 at 5:42 AM wangyunjian <wangyunjian@huawei.com> wrote:
> >>
> >> From: Yunjian Wang <wangyunjian@huawei.com>
> >>
> >> The issue is that a file descriptor at 0 is a valid one. Currently
> >> the file not found, the return value will be set to 0. As a result,
> >> it is impossible to distinguish between a correct descriptor and a
> >> failed return value. Fix it to return -ENOENT instead of 0.
> >>
> >> Fixes: b758423bc4fe ("vfio: fix race condition with sysfs")
> >> Fixes: ff0b67d1c868 ("vfio: DMA mappinge")
> >> Cc: stable@dpdk.org
> >
> > This patch reverts the (marked for stable) fix b758423bc4fe ("vfio:
> > fix race condition with sysfs") and comes with a different fix.
> >
> > - Kevin, Luca, I would put this b758423bc4fe backport on hold.
> >
> > - Anatoly, I don't want to put 20.05 at risk.
> > My simple question for 20.05 is, should we revert b758423bc4fe?
>
> No, let's not revert anything.
>
> As far as i can tell, this patch is a more complete fix, but it
> essentially does the same thing, just in a different (and better) way.
> Still, i haven't reviewed it in detail.
>
> >
> > - This patchset will go to 20.08 as I don't feel confident in taking it now.
> >
>
> Yes, let's leave it for 20.08.

Thanks for the analysis Anatoly.
  
Kevin Traynor May 27, 2020, 10:45 a.m. UTC | #4
On 21/05/2020 13:53, Burakov, Anatoly wrote:
> On 19-May-20 8:43 AM, David Marchand wrote:
>> On Tue, May 19, 2020 at 5:42 AM wangyunjian <wangyunjian@huawei.com> wrote:
>>>
>>> From: Yunjian Wang <wangyunjian@huawei.com>
>>>
>>> The issue is that a file descriptor at 0 is a valid one. Currently
>>> the file not found, the return value will be set to 0. As a result,
>>> it is impossible to distinguish between a correct descriptor and a
>>> failed return value. Fix it to return -ENOENT instead of 0.
>>>
>>> Fixes: b758423bc4fe ("vfio: fix race condition with sysfs")
>>> Fixes: ff0b67d1c868 ("vfio: DMA mappinge")
>>> Cc: stable@dpdk.org
>>
>> This patch reverts the (marked for stable) fix b758423bc4fe ("vfio:
>> fix race condition with sysfs") and comes with a different fix.
>>
>> - Kevin, Luca, I would put this b758423bc4fe backport on hold.
>>
>> - Anatoly, I don't want to put 20.05 at risk.
>> My simple question for 20.05 is, should we revert b758423bc4fe?
> 
> No, let's not revert anything.
> 
> As far as i can tell, this patch is a more complete fix, but it 
> essentially does the same thing, just in a different (and better) way. 
> Still, i haven't reviewed it in detail.
> 

b758423bc4fe was not reverted in master - should that be backported now?
or should we wait until this new fix is ready for backport too?

>>
>> - This patchset will go to 20.08 as I don't feel confident in taking it now.
>>
> 
> Yes, let's leave it for 20.08.
>
  
Yunjian Wang July 29, 2020, 11:43 a.m. UTC | #5
Ping for review

Thanks,
Yunjian

> -----Original Message-----
> From: Kevin Traynor [mailto:ktraynor@redhat.com]
> Sent: Wednesday, May 27, 2020 6:45 PM
> To: Burakov, Anatoly <anatoly.burakov@intel.com>; David Marchand
> <david.marchand@redhat.com>; wangyunjian <wangyunjian@huawei.com>;
> Luca Boccassi <bluca@debian.org>
> Cc: dev <dev@dpdk.org>; Hemant Agrawal <hemant.agrawal@nxp.com>;
> Sachin Saxena <sachin.saxena@nxp.com>; Lilijun (Jerry)
> <jerry.lilijun@huawei.com>; xudingke <xudingke@huawei.com>; dpdk stable
> <stable@dpdk.org>; Luca Boccassi <bluca@debian.org>
> Subject: Re: [dpdk-stable] [dpdk-dev] [PATCH 1/2] vfio: fix check for
> vfio_group_fd
> 
> On 21/05/2020 13:53, Burakov, Anatoly wrote:
> > On 19-May-20 8:43 AM, David Marchand wrote:
> >> On Tue, May 19, 2020 at 5:42 AM wangyunjian <wangyunjian@huawei.com>
> wrote:
> >>>
> >>> From: Yunjian Wang <wangyunjian@huawei.com>
> >>>
> >>> The issue is that a file descriptor at 0 is a valid one. Currently
> >>> the file not found, the return value will be set to 0. As a result,
> >>> it is impossible to distinguish between a correct descriptor and a
> >>> failed return value. Fix it to return -ENOENT instead of 0.
> >>>
> >>> Fixes: b758423bc4fe ("vfio: fix race condition with sysfs")
> >>> Fixes: ff0b67d1c868 ("vfio: DMA mappinge")
> >>> Cc: stable@dpdk.org
> >>
> >> This patch reverts the (marked for stable) fix b758423bc4fe ("vfio:
> >> fix race condition with sysfs") and comes with a different fix.
> >>
> >> - Kevin, Luca, I would put this b758423bc4fe backport on hold.
> >>
> >> - Anatoly, I don't want to put 20.05 at risk.
> >> My simple question for 20.05 is, should we revert b758423bc4fe?
> >
> > No, let's not revert anything.
> >
> > As far as i can tell, this patch is a more complete fix, but it
> > essentially does the same thing, just in a different (and better) way.
> > Still, i haven't reviewed it in detail.
> >
> 
> b758423bc4fe was not reverted in master - should that be backported now?
> or should we wait until this new fix is ready for backport too?
> 
> >>
> >> - This patchset will go to 20.08 as I don't feel confident in taking it now.
> >>
> >
> > Yes, let's leave it for 20.08.
> >
  
Burakov, Anatoly Sept. 17, 2020, 11:44 a.m. UTC | #6
On 19-May-20 4:42 AM, wangyunjian wrote:
> From: Yunjian Wang <wangyunjian@huawei.com>
> 
> The issue is that a file descriptor at 0 is a valid one. Currently
> the file not found, the return value will be set to 0. As a result,
> it is impossible to distinguish between a correct descriptor and a
> failed return value. Fix it to return -ENOENT instead of 0.
> 
> Fixes: b758423bc4fe ("vfio: fix race condition with sysfs")
> Fixes: ff0b67d1c868 ("vfio: DMA mappinge")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
> ---

Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
  

Patch

diff --git a/lib/librte_eal/linux/eal_vfio.c b/lib/librte_eal/linux/eal_vfio.c
index d26e1649a..8c5a13be6 100644
--- a/lib/librte_eal/linux/eal_vfio.c
+++ b/lib/librte_eal/linux/eal_vfio.c
@@ -293,7 +293,7 @@  vfio_open_group_fd(int iommu_group_num)
 							strerror(errno));
 					return -1;
 				}
-				return 0;
+				return -ENOENT;
 			}
 			/* noiommu group found */
 		}
@@ -318,12 +318,12 @@  vfio_open_group_fd(int iommu_group_num)
 			vfio_group_fd = mp_rep->fds[0];
 		} else if (p->result == SOCKET_NO_FD) {
 			RTE_LOG(ERR, EAL, "  bad VFIO group fd\n");
-			vfio_group_fd = 0;
+			vfio_group_fd = -ENOENT;
 		}
 	}
 
 	free(mp_reply.msgs);
-	if (vfio_group_fd < 0)
+	if (vfio_group_fd < 0 && vfio_group_fd != -ENOENT)
 		RTE_LOG(ERR, EAL, "  cannot request group fd\n");
 	return vfio_group_fd;
 }
@@ -379,9 +379,9 @@  vfio_get_group_fd(struct vfio_config *vfio_cfg,
 	}
 
 	vfio_group_fd = vfio_open_group_fd(iommu_group_num);
-	if (vfio_group_fd <= 0) {
+	if (vfio_group_fd < 0) {
 		RTE_LOG(ERR, EAL, "Failed to open group %d\n", iommu_group_num);
-		return -1;
+		return vfio_group_fd;
 	}
 
 	cur_grp->group_num = iommu_group_num;
@@ -728,11 +728,14 @@  rte_vfio_setup_device(const char *sysfs_base, const char *dev_addr,
 
 	/* get the actual group fd */
 	vfio_group_fd = rte_vfio_get_group_fd(iommu_group_num);
-	if (vfio_group_fd < 0)
+	if (vfio_group_fd < 0 && vfio_group_fd != -ENOENT)
 		return -1;
 
-	/* if group_fd == 0, that means the device isn't managed by VFIO */
-	if (vfio_group_fd == 0) {
+	/*
+	 * if vfio_group_fd == -ENOENT, that means the device
+	 * isn't managed by VFIO
+	 */
+	if (vfio_group_fd == -ENOENT) {
 		RTE_LOG(WARNING, EAL, " %s not managed by VFIO driver, skipping\n",
 				dev_addr);
 		return 1;
@@ -955,10 +958,10 @@  rte_vfio_release_device(const char *sysfs_base, const char *dev_addr,
 
 	/* get the actual group fd */
 	vfio_group_fd = rte_vfio_get_group_fd(iommu_group_num);
-	if (vfio_group_fd <= 0) {
+	if (vfio_group_fd < 0) {
 		RTE_LOG(INFO, EAL, "rte_vfio_get_group_fd failed for %s\n",
 				   dev_addr);
-		ret = -1;
+		ret = vfio_group_fd;
 		goto out;
 	}
 
diff --git a/lib/librte_eal/linux/eal_vfio_mp_sync.c b/lib/librte_eal/linux/eal_vfio_mp_sync.c
index 5f2a5fc1d..6254696ae 100644
--- a/lib/librte_eal/linux/eal_vfio_mp_sync.c
+++ b/lib/librte_eal/linux/eal_vfio_mp_sync.c
@@ -44,9 +44,9 @@  vfio_mp_primary(const struct rte_mp_msg *msg, const void *peer)
 		r->req = SOCKET_REQ_GROUP;
 		r->group_num = m->group_num;
 		fd = rte_vfio_get_group_fd(m->group_num);
-		if (fd < 0)
+		if (fd < 0 && fd != -ENOENT)
 			r->result = SOCKET_ERR;
-		else if (fd == 0)
+		else if (fd == -ENOENT)
 			/* if VFIO group exists but isn't bound to VFIO driver */
 			r->result = SOCKET_NO_FD;
 		else {