[3/3] net/vdev: fix insert vdev core dump

Message ID 20240314093630.1066948-4-mingjinx.ye@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series fix insert dev core dump |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/github-robot: build success github build: passed
ci/intel-Functional success Functional PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-abi-testing success Testing PASS
ci/iol-compile-amd64-testing success Testing PASS
ci/iol-unit-amd64-testing success Testing PASS
ci/iol-sample-apps-testing success Testing PASS
ci/iol-unit-arm64-testing success Testing PASS
ci/iol-compile-arm64-testing success Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS

Commit Message

Mingjin Ye March 14, 2024, 9:36 a.m. UTC
Inserting a vdev device when the device arguments are already stored
in devargs_list, the rte_devargs_insert function replaces the supplied
new devargs with the found devargs and frees the new devargs. As a
result, the use of free devargs results in a core dump.

This patch fixes the issue by using valid devargs.

Fixes: f3a1188cee4a ("devargs: make device representation generic")
Cc: stable@dpdk.org

Signed-off-by: Mingjin Ye <mingjinx.ye@intel.com>
---
 drivers/bus/vdev/vdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

Jiang, YuX March 15, 2024, 5:51 a.m. UTC | #1
> -----Original Message-----
> From: Mingjin Ye <mingjinx.ye@intel.com>
> Sent: Thursday, March 14, 2024 5:37 PM
> To: dev@dpdk.org
> Cc: Ye, MingjinX <mingjinx.ye@intel.com>; stable@dpdk.org
> Subject: [PATCH 3/3] net/vdev: fix insert vdev core dump
> 
> Inserting a vdev device when the device arguments are already stored in
> devargs_list, the rte_devargs_insert function replaces the supplied new
> devargs with the found devargs and frees the new devargs. As a result, the use
> of free devargs results in a core dump.
> 
> This patch fixes the issue by using valid devargs.
> 
> Fixes: f3a1188cee4a ("devargs: make device representation generic")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Mingjin Ye <mingjinx.ye@intel.com>
> ---
>  drivers/bus/vdev/vdev.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 

Tested-by:  Yu Jiang <yux.jiang@intel.com>

Best regards,
Yu Jiang
  
Thomas Monjalon June 19, 2024, 8:16 p.m. UTC | #2
14/03/2024 10:36, Mingjin Ye:
> Inserting a vdev device when the device arguments are already stored
> in devargs_list, the rte_devargs_insert function replaces the supplied
> new devargs with the found devargs and frees the new devargs. As a
> result, the use of free devargs results in a core dump.
> 
> This patch fixes the issue by using valid devargs.
> 
> Fixes: f3a1188cee4a ("devargs: make device representation generic")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Mingjin Ye <mingjinx.ye@intel.com>
> ---
>  	dev->device.bus = &rte_vdev_bus;
>  	dev->device.numa_node = SOCKET_ID_ANY;
> -	dev->device.name = devargs->name;
>  
>  	if (find_vdev(name)) {
>  		/*
> @@ -300,6 +299,7 @@ insert_vdev(const char *name, const char *args, struct rte_vdev_device **p_dev)
>  
>  	rte_devargs_insert(&devargs);
>  	dev->device.devargs = devargs;
> +	dev->device.name = devargs->name;
>  	TAILQ_INSERT_TAIL(&vdev_device_list, dev, next);

How setting the name later can have an impact here?
  
Mingjin Ye June 20, 2024, 6:41 a.m. UTC | #3
> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Thursday, June 20, 2024 4:16 AM
> To: Ye, MingjinX <mingjinx.ye@intel.com>
> Cc: dev@dpdk.org; stable@dpdk.org; Marchand, David
> <david.marchand@redhat.com>; stephen@networkplumber.org;
> Richardson, Bruce <bruce.richardson@intel.com>
> Subject: Re: [PATCH 3/3] net/vdev: fix insert vdev core dump
> 
> 14/03/2024 10:36, Mingjin Ye:
> > Inserting a vdev device when the device arguments are already stored
> > in devargs_list, the rte_devargs_insert function replaces the supplied
> > new devargs with the found devargs and frees the new devargs. As a
> > result, the use of free devargs results in a core dump.
> >
> > This patch fixes the issue by using valid devargs.
> >
> > Fixes: f3a1188cee4a ("devargs: make device representation generic")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Mingjin Ye <mingjinx.ye@intel.com>
> > ---
> >  	dev->device.bus = &rte_vdev_bus;
> >  	dev->device.numa_node = SOCKET_ID_ANY;
> > -	dev->device.name = devargs->name;
> >
> >  	if (find_vdev(name)) {
> >  		/*
> > @@ -300,6 +299,7 @@ insert_vdev(const char *name, const char *args,
> > struct rte_vdev_device **p_dev)
> >
> >  	rte_devargs_insert(&devargs);
> >  	dev->device.devargs = devargs;
> > +	dev->device.name = devargs->name;
> >  	TAILQ_INSERT_TAIL(&vdev_device_list, dev, next);
> 
> How setting the name later can have an impact here?
> 
/**
 * Insert an rte_devargs in the global list.
 *
 * @param da
 *  The devargs structure to insert.
 *  If a devargs for the same device is already inserted,
 *  it will be updated and returned. It means *da pointer can change.
 *
 * @return
 *   - 0 on success
 *   - Negative on error.
 */
int
rte_devargs_insert(struct rte_devargs **da);

Info: dpdk/lib/eal/include/rte_devargs.h
  
Burakov, Anatoly July 11, 2024, 4:10 p.m. UTC | #4
On 3/14/2024 10:36 AM, Mingjin Ye wrote:
> Inserting a vdev device when the device arguments are already stored
> in devargs_list, the rte_devargs_insert function replaces the supplied
> new devargs with the found devargs and frees the new devargs. As a
> result, the use of free devargs results in a core dump.
> 
> This patch fixes the issue by using valid devargs.
> 
> Fixes: f3a1188cee4a ("devargs: make device representation generic")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Mingjin Ye <mingjinx.ye@intel.com>

I am not too familiar with how devargs works, so I have a trivial question.

I understand the point of this patch, and it is roughly as follows:

1) we enter insert_vdev and allocated new devargs structure (and copy 
the `name` into devargs)
2) we set dev->device.name to devargs->name
3) we insert devargs into the list using rte_devargs_insert
4) if devargs list already had devargs with the same name, our devargs 
is destroyed and replaced with devargs that is in the list
5) because of this, dev->device.name becomes invalid as that specific 
devargs has been freed - it points to name from the old devargs

We do need to store devargs->name in dev->device.name, and we need to do 
so after calling rte_devargs_insert to avoid keeping reference to memory 
that was freed. So, provisionally,

Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>

My question is, under what circumstances would there be duplicate 
entries in devargs list? I assume there is an answer to that question 
because rte_devargs_insert() expects this case, so this is just for my 
understanding - I can't seem to figure out how we would arrive at a 
situation where we have duplicate devargs.

> ---
>   drivers/bus/vdev/vdev.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/bus/vdev/vdev.c b/drivers/bus/vdev/vdev.c
> index 1a6cc7d12d..53fc4a6e21 100644
> --- a/drivers/bus/vdev/vdev.c
> +++ b/drivers/bus/vdev/vdev.c
> @@ -286,7 +286,6 @@ insert_vdev(const char *name, const char *args, struct rte_vdev_device **p_dev)
>   
>   	dev->device.bus = &rte_vdev_bus;
>   	dev->device.numa_node = SOCKET_ID_ANY;
> -	dev->device.name = devargs->name;
>   
>   	if (find_vdev(name)) {
>   		/*
> @@ -300,6 +299,7 @@ insert_vdev(const char *name, const char *args, struct rte_vdev_device **p_dev)
>   
>   	rte_devargs_insert(&devargs);
>   	dev->device.devargs = devargs;
> +	dev->device.name = devargs->name;
>   	TAILQ_INSERT_TAIL(&vdev_device_list, dev, next);
>   
>   	if (p_dev)
  
Mingjin Ye July 12, 2024, 2:18 a.m. UTC | #5
> -----Original Message-----
> From: Burakov, Anatoly <anatoly.burakov@intel.com>
> Sent: Friday, July 12, 2024 12:10 AM
> To: Ye, MingjinX <mingjinx.ye@intel.com>; dev@dpdk.org
> Cc: stable@dpdk.org
> Subject: Re: [PATCH 3/3] net/vdev: fix insert vdev core dump
> 
> On 3/14/2024 10:36 AM, Mingjin Ye wrote:
> > Inserting a vdev device when the device arguments are already stored
> > in devargs_list, the rte_devargs_insert function replaces the supplied
> > new devargs with the found devargs and frees the new devargs. As a
> > result, the use of free devargs results in a core dump.
> >
> > This patch fixes the issue by using valid devargs.
> >
> > Fixes: f3a1188cee4a ("devargs: make device representation generic")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Mingjin Ye <mingjinx.ye@intel.com>
> 
> I am not too familiar with how devargs works, so I have a trivial question.
> 
> I understand the point of this patch, and it is roughly as follows:
> 
> 1) we enter insert_vdev and allocated new devargs structure (and copy the
> `name` into devargs)
> 2) we set dev->device.name to devargs->name
> 3) we insert devargs into the list using rte_devargs_insert
> 4) if devargs list already had devargs with the same name, our devargs is
> destroyed and replaced with devargs that is in the list
> 5) because of this, dev->device.name becomes invalid as that specific
> devargs has been freed - it points to name from the old devargs
> 
> We do need to store devargs->name in dev->device.name, and we need to
> do so after calling rte_devargs_insert to avoid keeping reference to memory
> that was freed. So, provisionally,
> 
> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
> 
> My question is, under what circumstances would there be duplicate entries
> in devargs list?

In a multi-process circumstances, the primary and secondary processes both have "vdev" startup
Parameters. And their parameters have the same name. This causes multiple devargs objects with the same name
to be created in the secondary process, the 1st one constructed from the startup parameter and
the 2nd one constructed due to the VDEV_SCAN_ONE message received.

Path 1:
eal_parse_args->eal_parse_common_option: OPT_VDEV_NUM
eal_option_device_parse->rte_devargs_add

Path 2:
vdev_action: vdev_scan_one ->insert_vdev(alloc_devargs)

> I assume there is an answer to that question because
> rte_devargs_insert() expects this case, so this is just for my understanding - I
> can't seem to figure out how we would arrive at a situation where we have
> duplicate devargs.
> 
> > ---
> >   drivers/bus/vdev/vdev.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/bus/vdev/vdev.c b/drivers/bus/vdev/vdev.c index
> > 1a6cc7d12d..53fc4a6e21 100644
> > --- a/drivers/bus/vdev/vdev.c
> > +++ b/drivers/bus/vdev/vdev.c
> > @@ -286,7 +286,6 @@ insert_vdev(const char *name, const char *args,
> > struct rte_vdev_device **p_dev)
> >
> >   	dev->device.bus = &rte_vdev_bus;
> >   	dev->device.numa_node = SOCKET_ID_ANY;
> > -	dev->device.name = devargs->name;
> >
> >   	if (find_vdev(name)) {
> >   		/*
> > @@ -300,6 +299,7 @@ insert_vdev(const char *name, const char *args,
> > struct rte_vdev_device **p_dev)
> >
> >   	rte_devargs_insert(&devargs);
> >   	dev->device.devargs = devargs;
> > +	dev->device.name = devargs->name;
> >   	TAILQ_INSERT_TAIL(&vdev_device_list, dev, next);
> >
> >   	if (p_dev)
> 
> --
> Thanks,
> Anatoly
  
Burakov, Anatoly July 12, 2024, 8:38 a.m. UTC | #6
On 7/12/2024 4:18 AM, Ye, MingjinX wrote:
> 
> 
>> -----Original Message-----
>> From: Burakov, Anatoly <anatoly.burakov@intel.com>
>> Sent: Friday, July 12, 2024 12:10 AM
>> To: Ye, MingjinX <mingjinx.ye@intel.com>; dev@dpdk.org
>> Cc: stable@dpdk.org
>> Subject: Re: [PATCH 3/3] net/vdev: fix insert vdev core dump
>>
>> On 3/14/2024 10:36 AM, Mingjin Ye wrote:
>>> Inserting a vdev device when the device arguments are already stored
>>> in devargs_list, the rte_devargs_insert function replaces the supplied
>>> new devargs with the found devargs and frees the new devargs. As a
>>> result, the use of free devargs results in a core dump.
>>>
>>> This patch fixes the issue by using valid devargs.
>>>
>>> Fixes: f3a1188cee4a ("devargs: make device representation generic")
>>> Cc: stable@dpdk.org
>>>
>>> Signed-off-by: Mingjin Ye <mingjinx.ye@intel.com>
>>
>> I am not too familiar with how devargs works, so I have a trivial question.
>>
>> I understand the point of this patch, and it is roughly as follows:
>>
>> 1) we enter insert_vdev and allocated new devargs structure (and copy the
>> `name` into devargs)
>> 2) we set dev->device.name to devargs->name
>> 3) we insert devargs into the list using rte_devargs_insert
>> 4) if devargs list already had devargs with the same name, our devargs is
>> destroyed and replaced with devargs that is in the list
>> 5) because of this, dev->device.name becomes invalid as that specific
>> devargs has been freed - it points to name from the old devargs
>>
>> We do need to store devargs->name in dev->device.name, and we need to
>> do so after calling rte_devargs_insert to avoid keeping reference to memory
>> that was freed. So, provisionally,
>>
>> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
>>
>> My question is, under what circumstances would there be duplicate entries
>> in devargs list?
> 
> In a multi-process circumstances, the primary and secondary processes both have "vdev" startup
> Parameters. And their parameters have the same name. This causes multiple devargs objects with the same name
> to be created in the secondary process, the 1st one constructed from the startup parameter and
> the 2nd one constructed due to the VDEV_SCAN_ONE message received.
> 
> Path 1:
> eal_parse_args->eal_parse_common_option: OPT_VDEV_NUM
> eal_option_device_parse->rte_devargs_add
> 
> Path 2:
> vdev_action: vdev_scan_one ->insert_vdev(alloc_devargs)

So, technically, this is only a problem because we're specifying --vdev 
at secondary process startup even though we could've just relied on vdev 
multiprocess hotplug to provide us with a device? Or is there more to it?

In any case, I think this patch is OK. I think the Fixes: tag is 
pointing to an incorrect commit. I've walked through the commit history, 
and it seems that this wasn't a bug initially, but became a bug when 
vdev multiprocess hotplug was added, so I think the correct Fixes: tag 
should be as follows:

Fixes: cdb068f031c6 ("bus/vdev: scan by multi-process channel")

So, on account of the above, I suggest rewording the commit message to 
something like the following:

In secondary processes, insert_vdev() may be called multiple times on 
the same device due to a combination of multiprocess hotplug for vdev 
bus, and EAL argument adding the same vdev. In this circumstance, when 
rte_devargs_insert() is called, the devargs->name reference becomes 
invalid because rte_devargs_insert() will destroy the just-allocated 
devargs and will replace the pointer with one from the devargs list. As 
a result, the reference to devargs->name stored in dev->device.name 
becomes invalid. Fix the issue by not setting device name until after 
rte_devargs_insert() was called.

Fixes: cdb068f031c6 ("bus/vdev: scan by multi-process channel")
Cc: stable@dpdk.org

Signed-off-by: Mingjin Ye <mingjinx.ye@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
  

Patch

diff --git a/drivers/bus/vdev/vdev.c b/drivers/bus/vdev/vdev.c
index 1a6cc7d12d..53fc4a6e21 100644
--- a/drivers/bus/vdev/vdev.c
+++ b/drivers/bus/vdev/vdev.c
@@ -286,7 +286,6 @@  insert_vdev(const char *name, const char *args, struct rte_vdev_device **p_dev)
 
 	dev->device.bus = &rte_vdev_bus;
 	dev->device.numa_node = SOCKET_ID_ANY;
-	dev->device.name = devargs->name;
 
 	if (find_vdev(name)) {
 		/*
@@ -300,6 +299,7 @@  insert_vdev(const char *name, const char *args, struct rte_vdev_device **p_dev)
 
 	rte_devargs_insert(&devargs);
 	dev->device.devargs = devargs;
+	dev->device.name = devargs->name;
 	TAILQ_INSERT_TAIL(&vdev_device_list, dev, next);
 
 	if (p_dev)