net/mlx5: fix use after free when releasing tx queues

Message ID 1708421499-42236-1-git-send-email-wangyunjian@huawei.com
State Accepted, archived
Delegated to: Raslan Darawsheh
Series net/mlx5: fix use after free when releasing tx queues

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/github-robot: build success github build: passed
ci/iol-mellanox-Performance success Performance Testing PASS
ci/intel-Functional success Functional PASS
ci/iol-abi-testing success Testing PASS
ci/iol-compile-amd64-testing success Testing PASS
ci/iol-unit-amd64-testing success Testing PASS
ci/iol-unit-arm64-testing success Testing PASS
ci/iol-compile-arm64-testing success Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-sample-apps-testing success Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-intel-Functional success Functional Testing PASS

Commit Message

Yunjian Wang Feb. 20, 2024, 9:31 a.m. UTC
From: Pengfei Sun <sunpengfei16@huawei.com>

In mlx5_dev_configure(), priv->txqs is set to point to
dev->data->tx_queues. When a member is removed from a bond,
eth_dev_tx_queue_config() is called, which frees dev->data->tx_queues
and resets it to NULL. However, mlx5_dev_close() still accesses the
stale priv->txqs afterwards, causing a use-after-free.

Fix this by also checking in mlx5_dev_close() that dev->data->tx_queues
is not NULL before releasing the Tx queues through priv->txqs.
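The ownership problem and the guard can be illustrated with a minimal, self-contained model. It does not use DPDK: `struct model_dev_data`, `struct model_priv`, `model_configure()`, `model_tx_queue_config_zero()` and `model_dev_close()` are hypothetical stand-ins for `dev->data`, the mlx5 `priv`, `mlx5_dev_configure()`, `eth_dev_tx_queue_config(dev, 0)` and `mlx5_dev_close()`. It assumes, as described above, that releasing all Tx queues frees the array and clears the ethdev pointer, which is what makes the NULL check meaningful.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the ethdev-owned fields of dev->data. */
struct model_dev_data {
	void **tx_queues;           /* array owned and freed by the ethdev layer */
	unsigned int nb_tx_queues;
};

/* Hypothetical stand-in for the mlx5 private data. */
struct model_priv {
	void **txqs;                /* alias of dev->data->tx_queues, not an owner */
	unsigned int txqs_n;
};

/* Models mlx5_dev_configure(): the driver keeps an alias of the array. */
static int model_configure(struct model_dev_data *data,
			   struct model_priv *priv, unsigned int n)
{
	data->tx_queues = calloc(n, sizeof(*data->tx_queues));
	if (data->tx_queues == NULL)
		return -1;
	data->nb_tx_queues = n;
	priv->txqs = data->tx_queues;
	priv->txqs_n = n;
	return 0;
}

/* Models eth_dev_tx_queue_config(dev, 0), reached when a member leaves a
 * bond: the array is freed and the ethdev pointer is cleared, while the
 * driver's alias priv->txqs still holds the old, now dangling, address. */
static void model_tx_queue_config_zero(struct model_dev_data *data)
{
	free(data->tx_queues);
	data->tx_queues = NULL;
	data->nb_tx_queues = 0;
}

/* Models the patched check in mlx5_dev_close(). Returns how many queue
 * slots were released through the alias, so the guard's effect is visible. */
static unsigned int model_dev_close(struct model_dev_data *data,
				    struct model_priv *priv)
{
	unsigned int i, released = 0;

	/* The model tests the ethdev pointer first so the dangling alias is
	 * never even read; the patch itself tests priv->txqs first. */
	if (data->tx_queues != NULL && priv->txqs != NULL) {
		assert(priv->txqs == data->tx_queues); /* alias still valid */
		for (i = 0; i != priv->txqs_n; ++i) {
			priv->txqs[i] = NULL; /* stands in for mlx5_txq_release() */
			++released;
		}
		priv->txqs = NULL;
		priv->txqs_n = 0;
	}
	return released;
}
```

Calling `model_configure()`, then `model_tx_queue_config_zero()`, then `model_dev_close()` releases nothing: the loop is skipped instead of reading the freed array, which corresponds to the read that ASan reports in mlx5_txq_release() below.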

build/app/dpdk-testpmd -c7 -a 0000:08:00.2 -- -i --nb-cores=2 \
    --total-num-mbufs=2048

testpmd> port stop 0
testpmd> create bonding device 4 0
testpmd> add bonding member 0 1
testpmd> remove bonding member 0 1
testpmd> quit

ASan reports:
==2571911==ERROR: AddressSanitizer: heap-use-after-free on address
0x000174529880 at pc 0x0000113c8440 bp 0xffffefae0ea0 sp 0xffffefae0eb0
READ of size 8 at 0x000174529880 thread T0
    #0 0x113c843c in mlx5_txq_release ../drivers/net/mlx5/mlx5_txq.c:1203
    #1 0xffdb53c in mlx5_dev_close ../drivers/net/mlx5/mlx5.c:2286
    #2 0xe12dc0 in rte_eth_dev_close ../lib/ethdev/rte_ethdev.c:1877
    #3 0x6bac1c in close_port ../app/test-pmd/testpmd.c:3540
    #4 0x6bc320 in pmd_test_exit ../app/test-pmd/testpmd.c:3808
    #5 0x6c1a94 in main ../app/test-pmd/testpmd.c:4759
    #6 0xffff9328f038  (/usr/lib64/libc.so.6+0x2b038)
    #7 0xffff9328f110 in __libc_start_main (/usr/lib64/libc.so.6+0x2b110)

Fixes: 6e78005 ("net/mlx5: add reference counter on DPDK Tx queues")
Cc: stable@dpdk.org

Reported-by: Yunjian Wang <wangyunjian@huawei.com>
Signed-off-by: Pengfei Sun <sunpengfei16@huawei.com>
---
 drivers/net/mlx5/mlx5.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

Dariusz Sosnowski Feb. 20, 2024, 1:55 p.m. UTC | #1
Hi,

Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>

Thank you for the patch.

Question to ethdev maintainers:

While reviewing this patch, I took a look at rte_eth_dev_internal_reset(), which is called by the bonding PMD for removed members.
It resets the Rx and Tx queue configuration and dev->data->dev_conf,
but not the dev->data->dev_configured flag.
So theoretically, after this call, a port can be started without being configured, which seems invalid.
What do you think? Should it be fixed?

Best regards,
Dariusz Sosnowski
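The concern about dev_configured can be made concrete with a minimal model of only the state mentioned above; it is not ethdev code. `struct model_port`, `model_port_configure()`, `model_internal_reset()` and `model_port_start()` are hypothetical stand-ins, the start function assumes the only guard is the configured flag, and the `clear_flag` parameter is a hypothetical illustration of one possible fix.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for the port configuration. */
struct model_conf {
	unsigned int nb_rx_queues;
	unsigned int nb_tx_queues;
};

/* Hypothetical stand-in for the relevant parts of dev->data. */
struct model_port {
	struct model_conf dev_conf;  /* stands in for dev->data->dev_conf */
	bool dev_configured;         /* stands in for dev->data->dev_configured */
	bool started;
};

/* Models a successful port configuration. */
static void model_port_configure(struct model_port *p,
				 unsigned int nb_rx, unsigned int nb_tx)
{
	p->dev_conf.nb_rx_queues = nb_rx;
	p->dev_conf.nb_tx_queues = nb_tx;
	p->dev_configured = true;
}

/* Models the reset described above: the configuration is wiped, and the
 * configured flag is cleared only when the hypothetical fix is enabled. */
static void model_internal_reset(struct model_port *p, bool clear_flag)
{
	memset(&p->dev_conf, 0, sizeof(p->dev_conf));
	if (clear_flag)
		p->dev_configured = false;
}

/* Models a start path that only consults the configured flag. */
static int model_port_start(struct model_port *p)
{
	if (!p->dev_configured)
		return -EINVAL;
	p->started = true;
	return 0;
}
```

With the flag left set, the reset port still starts with an all-zero configuration; clearing the flag makes the start fail with `-EINVAL` until the port is configured again.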
  
Thomas Monjalon Feb. 21, 2024, 10:24 a.m. UTC | #2
20/02/2024 14:55, Dariusz Sosnowski:
> Question to ethdev maintainers:
> 
> While reviewing this patch, I took a look at rte_eth_dev_internal_reset() which is called by bonding PMD for removed members.
> This resets Rx and Tx queue configuration, and dev->data->dev_conf,
> but not dev->data->dev_configured flag.
> So theoretically, after this call, a port can be started without port configuration, which seems invalid.
> What do you think? Should it be fixed?

Probably yes
  
Raslan Darawsheh Feb. 27, 2024, 4:15 p.m. UTC | #3
Hi,
Patch applied to next-net-mlx,
Kindest regards
Raslan Darawsheh
  

Patch

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 3a182de..6b5a4da 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -2279,7 +2279,7 @@  mlx5_dev_close(struct rte_eth_dev *dev)
 		mlx5_free(priv->rxq_privs);
 		priv->rxq_privs = NULL;
 	}
-	if (priv->txqs != NULL) {
+	if (priv->txqs != NULL && dev->data->tx_queues != NULL) {
 		/* XXX race condition if mlx5_tx_burst() is still running. */
 		rte_delay_us_sleep(1000);
 		for (i = 0; (i != priv->txqs_n); ++i)