[v7] eal: add bus cleanup to eal cleanup

Message ID 20220603143601.230519-1-kevin.laatz@intel.com (mailing list archive)
State Superseded, archived
Delegated to: David Marchand
Series [v7] eal: add bus cleanup to eal cleanup

Checks

Context                         Check    Description
ci/checkpatch                   success  coding style OK
ci/iol-intel-Functional         success  Functional Testing PASS
ci/iol-mellanox-Performance     success  Performance Testing PASS
ci/iol-intel-Performance        success  Performance Testing PASS
ci/github-robot: build          success  github build: passed
ci/iol-x86_64-compile-testing   success  Testing PASS
ci/iol-aarch64-compile-testing  success  Testing PASS
ci/iol-aarch64-unit-testing     success  Testing PASS
ci/iol-x86_64-unit-testing      success  Testing PASS
ci/Intel-compilation            success  Compilation OK
ci/intel-Testing                success  Testing PASS
ci/iol-abi-testing              success  Testing PASS

Commit Message

Kevin Laatz June 3, 2022, 2:36 p.m. UTC
  During EAL init, all buses are probed and the devices found are
initialized. On eal_cleanup(), the inverse does not happen, meaning any
allocated memory and other configuration will not be cleaned up
appropriately on exit.

Currently, in order for device cleanup to take place, applications must
call the driver-relevant functions to ensure proper cleanup is done before
the application exits. Since initialization occurs for all devices on the
bus, not just the devices used by an application, it requires a)
application awareness of all bus devices that could have been probed on the
system, and b) code duplication across applications to ensure cleanup is
performed. An example of this is rte_eth_dev_close() which is commonly used
across the example applications.

This patch proposes adding bus cleanup to the eal_cleanup() to make EAL's
init/exit more symmetrical, ensuring all bus devices are cleaned up
appropriately without the application needing to be aware of all bus types
that may have been probed during initialization.

Contained in this patch are the changes required to perform cleanup for
devices on the PCI bus and VDEV bus during eal_cleanup(). There would be an
ask for bus maintainers to add the relevant cleanup for their buses since
they have the domain expertise.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Reviewed-by: Bruce Richardson <bruce.richardson@intel.com>

---
v7:
* free rte_pci_device structs during cleanup
* free rte_vdev_device structs during cleanup

v6:
* fix units in doc API descriptions

v5:
* add doc updates for new APIs

v4:
* fix return value when scaling_freq_max is not set
* fix mismatching comments

v3:
* move setters from arg parse function to init
* consider 0 as 'not set' for scaling_freq_max
* other minor fixes

v2:
* add doc update for l3fwd-power
* order version.map additions alphabetically
---
 devtools/libabigail.abignore    |  9 +++++++++
 drivers/bus/pci/pci_common.c    | 26 ++++++++++++++++++++++++++
 drivers/bus/vdev/vdev.c         | 26 ++++++++++++++++++++++++++
 lib/eal/common/eal_common_bus.c | 17 +++++++++++++++++
 lib/eal/common/eal_private.h    | 10 ++++++++++
 lib/eal/freebsd/eal.c           |  1 +
 lib/eal/include/rte_bus.h       | 13 +++++++++++++
 lib/eal/linux/eal.c             |  1 +
 lib/eal/windows/eal.c           |  1 +
 9 files changed, 104 insertions(+)
  

Comments

Stephen Hemminger June 3, 2022, 3:11 p.m. UTC | #1
On Fri,  3 Jun 2022 15:36:01 +0100
Kevin Laatz <kevin.laatz@intel.com> wrote:

> +/* Clean up all devices of all buses */
> +int
> +eal_bus_cleanup(void)
> +{
> +	int ret = 0;
> +	struct rte_bus *bus;
> +
> +	TAILQ_FOREACH(bus, &rte_bus_list, next) {
> +		if (bus->cleanup == NULL)
> +			continue;
> +		if (bus->cleanup() != 0)
> +			ret = -1;
> +	}
> +
> +	return ret;
> +}
> +

This is an internal function, and none of its callers
appear to use the return value.

Why not make the function void eal_bus_cleanup()
and simplify back up the call chain?
  
Bruce Richardson June 3, 2022, 3:39 p.m. UTC | #2
On Fri, Jun 03, 2022 at 08:11:54AM -0700, Stephen Hemminger wrote:
> On Fri,  3 Jun 2022 15:36:01 +0100
> Kevin Laatz <kevin.laatz@intel.com> wrote:
> 
> > +/* Clean up all devices of all buses */
> > +int
> > +eal_bus_cleanup(void)
> > +{
> > +	int ret = 0;
> > +	struct rte_bus *bus;
> > +
> > +	TAILQ_FOREACH(bus, &rte_bus_list, next) {
> > +		if (bus->cleanup == NULL)
> > +			continue;
> > +		if (bus->cleanup() != 0)
> > +			ret = -1;
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> 
> This is an internal  function, and all users of it
> look like they don't use the return value.
> 
> Why not make the function void eal_bus_cleanup()
> and simplify back up the call chain?

Is there really that much difference in doing so? My own slight preference
would be to have the error codes available for future use in case we want
them, so long as the overhead of them is not great (which it should not
be). However, if others all feel that having these functions return void is
best, I'm happy enough with that too.
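
For reference, the trade-off discussed above can be sketched outside DPDK.
The following is a minimal standalone example, not the patch itself:
demo_bus, demo_bus_list and demo_bus_cleanup() are hypothetical stand-ins for
struct rte_bus, rte_bus_list and eal_bus_cleanup(), built only on
<sys/queue.h>. It shows the behaviour under discussion: every bus callback is
still invoked after an earlier one fails, and the aggregate result remains
available to callers that want it.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct rte_bus; only the fields used here. */
struct demo_bus {
	TAILQ_ENTRY(demo_bus) next;
	const char *name;
	int (*cleanup)(void);	/* NULL if the bus has nothing to clean up */
};

TAILQ_HEAD(demo_bus_list, demo_bus);

/*
 * Call every registered cleanup callback. A failure on one bus does not
 * stop the walk; the function returns -1 if any callback failed.
 */
static int
demo_bus_cleanup(struct demo_bus_list *list)
{
	struct demo_bus *bus;
	int ret = 0;

	TAILQ_FOREACH(bus, list, next) {
		if (bus->cleanup == NULL)
			continue;
		if (bus->cleanup() != 0)
			ret = -1;
	}

	return ret;
}

/* Demo callbacks: count invocations so the walk can be observed. */
static int demo_calls;

static int
demo_cleanup_ok(void)
{
	demo_calls++;
	return 0;
}

static int
demo_cleanup_fail(void)
{
	demo_calls++;
	return -1;
}
```

A caller that does not care about the result can discard it with
"(void)demo_bus_cleanup(&list);", which is effectively what the
rte_eal_cleanup() call sites in the patch do. Keeping the int return costs
one local variable and leaves room for logging later.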
  
lihuisong (C) June 4, 2022, 2:07 a.m. UTC | #3
On 2022/6/3 22:36, Kevin Laatz wrote:
> During EAL init, all buses are probed and the devices found are
> initialized. On eal_cleanup(), the inverse does not happen, meaning any
> allocated memory and other configuration will not be cleaned up
> appropriately on exit.
>
> Currently, in order for device cleanup to take place, applications must
> call the driver-relevant functions to ensure proper cleanup is done before
> the application exits. Since initialization occurs for all devices on the
> bus, not just the devices used by an application, it requires a)
> application awareness of all bus devices that could have been probed on the
> system, and b) code duplication across applications to ensure cleanup is
> performed. An example of this is rte_eth_dev_close() which is commonly used
> across the example applications.
>
> This patch proposes adding bus cleanup to the eal_cleanup() to make EAL's
> init/exit more symmetrical, ensuring all bus devices are cleaned up
> appropriately without the application needing to be aware of all bus types
> that may have been probed during initialization.
>
> Contained in this patch are the changes required to perform cleanup for
> devices on the PCI bus and VDEV bus during eal_cleanup(). There would be an
> ask for bus maintainers to add the relevant cleanup for their buses since
> they have the domain expertise.
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> Reviewed-by: Bruce Richardson <bruce.richardson@intel.com>
>
> ---
> v7:
> * free rte_pci_device structs during cleanup
> * free rte_vdev_device structs during cleanup
>
> v6:
> * fix units in doc API descriptions
>
> v5:
> * add doc updates for new APIs
>
> v4:
> * fix return value when scaling_freq_max is not set
> * fix mismatching comments
>
> v3:
> * move setters from arg parse function to init
> * consider 0 as 'not set' for scaling_freq_max
> * other minor fixes
>
> v2:
> * add doc update for l3fwd-power
> * order version.map additions alphabetically
> ---
>   devtools/libabigail.abignore    |  9 +++++++++
>   drivers/bus/pci/pci_common.c    | 26 ++++++++++++++++++++++++++
>   drivers/bus/vdev/vdev.c         | 26 ++++++++++++++++++++++++++
>   lib/eal/common/eal_common_bus.c | 17 +++++++++++++++++
>   lib/eal/common/eal_private.h    | 10 ++++++++++
>   lib/eal/freebsd/eal.c           |  1 +
>   lib/eal/include/rte_bus.h       | 13 +++++++++++++
>   lib/eal/linux/eal.c             |  1 +
>   lib/eal/windows/eal.c           |  1 +
>   9 files changed, 104 insertions(+)
>
> diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
> index 79ff15dc4e..3e519ee42a 100644
> --- a/devtools/libabigail.abignore
> +++ b/devtools/libabigail.abignore
> @@ -56,3 +56,12 @@
>   ; Ignore libabigail false-positive in clang builds, after moving code.
>   [suppress_function]
>   	name = rte_eal_remote_launch
> +
> +; Ignore field inserted to rte_bus, adding cleanup function
> +[suppress_type]
> +        name = rte_bus
> +        has_data_member_inserted_at = end
> +
> +; Ignore changes to internally used structs containing rte_bus
> +[suppress_type]
> +        name = rte_pci_bus, rte_vmbus_bus, rte_vdev_bus
> diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
> index 37ab879779..75b312eef1 100644
> --- a/drivers/bus/pci/pci_common.c
> +++ b/drivers/bus/pci/pci_common.c
> @@ -25,6 +25,7 @@
>   #include <rte_common.h>
>   #include <rte_devargs.h>
>   #include <rte_vfio.h>
> +#include <rte_tailq.h>
>   
>   #include "private.h"
>   
> @@ -394,6 +395,30 @@ pci_probe(void)
>   	return (probed && probed == failed) ? -1 : 0;
>   }
>   
> +static int
> +pci_cleanup(void)
> +{
> +	struct rte_pci_device *dev, *tmp_dev;
> +	int error = 0;
> +
> +	RTE_TAILQ_FOREACH_SAFE(dev, &rte_pci_bus.device_list, next, tmp_dev) {
> +		struct rte_pci_driver *drv = dev->driver;
> +		int ret = 0;
> +
> +		if (drv == NULL || drv->remove == NULL)
> +			continue;
It seems that 'dev->driver' still points to the 'rte_pci_driver' or
'rte_vdev_driver' after the device has been closed with 'dev_close()', so
there is a risk of removing a device twice. Do you intend to rely on the
'remove()' callback itself to guard against this?
> +
> +		ret = drv->remove(dev);
> +		if (ret < 0) {
> +			rte_errno = errno;
> +			error = -1;
> +		}
> +		free(dev);
Could 'local_dev_remove()' be used here to remove the device and its
rte_pci/vdev_device from the bus? Other resources may also need to be
released, for example by the release operations in 'rte_pci_detach_dev()'.
> +	}
> +
> +	return error;
> +}
> +
>   /* dump one device */
>   static int
>   pci_dump_one_device(FILE *f, struct rte_pci_device *dev)
> @@ -813,6 +838,7 @@ struct rte_pci_bus rte_pci_bus = {
>   	.bus = {
>   		.scan = rte_pci_scan,
>   		.probe = pci_probe,
> +		.cleanup = pci_cleanup,
>   		.find_device = pci_find_device,
>   		.plug = pci_plug,
>   		.unplug = pci_unplug,
> diff --git a/drivers/bus/vdev/vdev.c b/drivers/bus/vdev/vdev.c
> index a8d8b2327e..707ea1bbb5 100644
> --- a/drivers/bus/vdev/vdev.c
> +++ b/drivers/bus/vdev/vdev.c
> @@ -569,6 +569,31 @@ vdev_probe(void)
>   	return ret;
>   }
>   
> +static int
> +vdev_cleanup(void)
> +{
> +	struct rte_vdev_device *dev, *tmp_dev;
> +	int error = 0;
> +
> +	RTE_TAILQ_FOREACH_SAFE(dev, &vdev_device_list, next, tmp_dev) {
> +		const struct rte_vdev_driver *drv;
> +		int ret = 0;
> +
> +		drv = container_of(dev->device.driver, const struct rte_vdev_driver, driver);
> +
> +		if (drv == NULL || drv->remove == NULL)
> +			continue;
> +
> +		ret = drv->remove(dev);
> +		if (ret < 0)
> +			error = -1;
> +
> +		free(dev);
> +	}
> +
> +	return error;
> +}
> +
>   struct rte_device *
>   rte_vdev_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
>   		     const void *data)
> @@ -627,6 +652,7 @@ vdev_get_iommu_class(void)
>   static struct rte_bus rte_vdev_bus = {
>   	.scan = vdev_scan,
>   	.probe = vdev_probe,
> +	.cleanup = vdev_cleanup,
>   	.find_device = rte_vdev_find_device,
>   	.plug = vdev_plug,
>   	.unplug = vdev_unplug,
> diff --git a/lib/eal/common/eal_common_bus.c b/lib/eal/common/eal_common_bus.c
> index baa5b532af..3fe67af0ba 100644
> --- a/lib/eal/common/eal_common_bus.c
> +++ b/lib/eal/common/eal_common_bus.c
> @@ -85,6 +85,23 @@ rte_bus_probe(void)
>   	return 0;
>   }
>   
> +/* Clean up all devices of all buses */
> +int
> +eal_bus_cleanup(void)
> +{
> +	int ret = 0;
> +	struct rte_bus *bus;
> +
> +	TAILQ_FOREACH(bus, &rte_bus_list, next) {
> +		if (bus->cleanup == NULL)
> +			continue;
> +		if (bus->cleanup() != 0)
> +			ret = -1;
> +	}
> +
> +	return ret;
> +}
> +
>   /* Dump information of a single bus */
>   static int
>   bus_dump_one(FILE *f, struct rte_bus *bus)
> diff --git a/lib/eal/common/eal_private.h b/lib/eal/common/eal_private.h
> index 44d14241f0..eea4749af4 100644
> --- a/lib/eal/common/eal_private.h
> +++ b/lib/eal/common/eal_private.h
> @@ -441,6 +441,16 @@ int rte_eal_memory_detach(void);
>    */
>   struct rte_bus *rte_bus_find_by_device_name(const char *str);
>   
> +/**
> + * For each device on the buses, call the driver-specific function for
> + * device cleanup.
> + *
> + * @return
> + * 0 for successful cleanup
> + * !0 otherwise
> + */
> +int eal_bus_cleanup(void);
> +
>   /**
>    * Create the unix channel for primary/secondary communication.
>    *
> diff --git a/lib/eal/freebsd/eal.c b/lib/eal/freebsd/eal.c
> index a6b20960f2..97ed2c4678 100644
> --- a/lib/eal/freebsd/eal.c
> +++ b/lib/eal/freebsd/eal.c
> @@ -893,6 +893,7 @@ rte_eal_cleanup(void)
>   		eal_get_internal_configuration();
>   	rte_service_finalize();
>   	rte_mp_channel_cleanup();
> +	eal_bus_cleanup();
>   	/* after this point, any DPDK pointers will become dangling */
>   	rte_eal_memory_detach();
>   	rte_eal_alarm_cleanup();
> diff --git a/lib/eal/include/rte_bus.h b/lib/eal/include/rte_bus.h
> index bbbb6efd28..9908a013f6 100644
> --- a/lib/eal/include/rte_bus.h
> +++ b/lib/eal/include/rte_bus.h
> @@ -66,6 +66,18 @@ typedef int (*rte_bus_scan_t)(void);
>    */
>   typedef int (*rte_bus_probe_t)(void);
>   
> +/**
> + * Implementation specific cleanup function which is responsible for cleaning up
> + * devices on that bus with applicable drivers.
> + *
> + * This is called while iterating over each registered bus.
> + *
> + * @return
> + * 0 for successful cleanup
> + * !0 for any error during cleanup
> + */
> +typedef int (*rte_bus_cleanup_t)(void);
> +
>   /**
>    * Device iterator to find a device on a bus.
>    *
> @@ -277,6 +289,7 @@ struct rte_bus {
>   				/**< handle hot-unplug failure on the bus */
>   	rte_bus_sigbus_handler_t sigbus_handler;
>   					/**< handle sigbus error on the bus */
> +	rte_bus_cleanup_t cleanup;   /**< Cleanup devices on bus */
>   
>   };
>   
> diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
> index 1ef263434a..9b32265ef5 100644
> --- a/lib/eal/linux/eal.c
> +++ b/lib/eal/linux/eal.c
> @@ -1266,6 +1266,7 @@ rte_eal_cleanup(void)
>   	vfio_mp_sync_cleanup();
>   #endif
>   	rte_mp_channel_cleanup();
> +	eal_bus_cleanup();
>   	/* after this point, any DPDK pointers will become dangling */
>   	rte_eal_memory_detach();
>   	eal_mp_dev_hotplug_cleanup();
> diff --git a/lib/eal/windows/eal.c b/lib/eal/windows/eal.c
> index 122de2a319..fedd6c971a 100644
> --- a/lib/eal/windows/eal.c
> +++ b/lib/eal/windows/eal.c
> @@ -262,6 +262,7 @@ rte_eal_cleanup(void)
>   
>   	eal_intr_thread_cancel();
>   	eal_mem_virt2iova_cleanup();
> +	eal_bus_cleanup();
>   	/* after this point, any DPDK pointers will become dangling */
>   	rte_eal_memory_detach();
>   	eal_cleanup_config(internal_conf);
  
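
The two inline remarks in this reply (a device being removed twice, and what
gets released per device) can be illustrated with a small standalone sketch.
This is not the DPDK code and not necessarily how the patch was later
reworked: demo_dev, demo_drv, demo_dev_remove() and demo_bus_dev_cleanup()
are hypothetical names. The sketch assumes devices are heap-allocated, clears
the driver pointer once the remove callback has run so the callback is
invoked at most once per device, and unlinks and frees every device whether
or not a driver is bound. It pops from the list head instead of using a
FOREACH_SAFE macro, since not every <sys/queue.h> provides one. It does not
model the extra release steps mentioned for 'rte_pci_detach_dev()'.

```c
#include <stdlib.h>
#include <sys/queue.h>

struct demo_dev;

/* Hypothetical driver: only a remove callback. */
struct demo_drv {
	int (*remove)(struct demo_dev *dev);
};

/* Hypothetical device, standing in for rte_pci_device / rte_vdev_device. */
struct demo_dev {
	TAILQ_ENTRY(demo_dev) next;
	struct demo_drv *driver;	/* NULL when no driver is bound */
};

TAILQ_HEAD(demo_dev_list, demo_dev);

/*
 * Detach one device from its driver. The driver pointer is cleared after
 * the callback runs, so calling this again on the same device is a no-op.
 */
static int
demo_dev_remove(struct demo_dev *dev)
{
	int ret = 0;

	if (dev->driver != NULL && dev->driver->remove != NULL)
		ret = dev->driver->remove(dev);
	dev->driver = NULL;

	return ret < 0 ? -1 : 0;
}

/* Remove, unlink and free every device; returns -1 if any remove failed. */
static int
demo_bus_dev_cleanup(struct demo_dev_list *list)
{
	struct demo_dev *dev;
	int error = 0;

	/* Pop from the head so that freeing the current node is safe. */
	while ((dev = TAILQ_FIRST(list)) != NULL) {
		if (demo_dev_remove(dev) != 0)
			error = -1;
		TAILQ_REMOVE(list, dev, next);
		free(dev);	/* assumes the device was heap-allocated */
	}

	return error;
}

/* Demo callback: counts how often a driver's remove is invoked. */
static int demo_remove_calls;

static int
demo_remove_ok(struct demo_dev *dev)
{
	(void)dev;
	demo_remove_calls++;
	return 0;
}
```

With this shape, an application that has already closed a device before
calling the bus-wide cleanup does not trigger a second remove, and devices
that never had a driver bound are still freed.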
Thomas Monjalon June 7, 2022, 11:09 a.m. UTC | #4
03/06/2022 16:36, Kevin Laatz:
> During EAL init, all buses are probed and the devices found are
> initialized. On eal_cleanup(), the inverse does not happen, meaning any
> allocated memory and other configuration will not be cleaned up
> appropriately on exit.
[...]
> --- a/devtools/libabigail.abignore
> +++ b/devtools/libabigail.abignore
> @@ -56,3 +56,12 @@
>  ; Ignore libabigail false-positive in clang builds, after moving code.
>  [suppress_function]
>  	name = rte_eal_remote_launch
> +
> +; Ignore field inserted to rte_bus, adding cleanup function
> +[suppress_type]
> +        name = rte_bus
> +        has_data_member_inserted_at = end
> +
> +; Ignore changes to internally used structs containing rte_bus
> +[suppress_type]
> +        name = rte_pci_bus, rte_vmbus_bus, rte_vdev_bus

I'm not sure we can safely consider these structs as internal.
The right process is to send a deprecation notice,
and then remove them from the public API.

For info, Li has sent a patch for the bus cleanup
which does not update the bus code:
https://patches.dpdk.org/project/dpdk/patch/20220606114650.209612-3-lizh@nvidia.com/
It may be a temporary solution before the deprecation.
  
David Marchand June 7, 2022, 3:12 p.m. UTC | #5
On Tue, Jun 7, 2022 at 1:09 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 03/06/2022 16:36, Kevin Laatz:
> > During EAL init, all buses are probed and the devices found are
> > initialized. On eal_cleanup(), the inverse does not happen, meaning any
> > allocated memory and other configuration will not be cleaned up
> > appropriately on exit.
> [...]
> > --- a/devtools/libabigail.abignore
> > +++ b/devtools/libabigail.abignore
> > @@ -56,3 +56,12 @@
> >  ; Ignore libabigail false-positive in clang builds, after moving code.
> >  [suppress_function]
> >       name = rte_eal_remote_launch
> > +
> > +; Ignore field inserted to rte_bus, adding cleanup function
> > +[suppress_type]
> > +        name = rte_bus
> > +        has_data_member_inserted_at = end
> > +
> > +; Ignore changes to internally used structs containing rte_bus
> > +[suppress_type]
> > +        name = rte_pci_bus, rte_vmbus_bus, rte_vdev_bus

(This change is strange as there is no rte_vdev_bus type, but I won't
investigate the relevance of this rule for now).

>
> I'm not sure we can safely consider these structs as internal.
> The right process is to send a deprecation notice,
> and then remove them from the public API.

Same here, I don't think we can safely ignore them.

An rte_bus struct is embedded in rte_pci_bus (resp. rte_vmbus_bus).
If we make it grow, any inlined access to a field located after the
rte_bus object (such as a walk of device_list or driver_list) is broken
for code accessing it from outside DPDK.
Such code might exist out there, since we expose
FOREACH_DEVICE_ON_PCIBUS, for example.
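
The layout concern can be illustrated with a small standalone sketch. All
type names below are hypothetical and much smaller than the real rte_bus and
rte_pci_bus; the point is only that appending a member to a struct that is
embedded by value moves every field that follows it in the containing
struct, so a binary compiled against the old layout reads the wrong offset.

```c
#include <stddef.h>

/* Hypothetical "old" bus layout: two callbacks. */
struct demo_bus_v1 {
	int (*scan)(void);
	int (*probe)(void);
};

/* Same struct after a callback is appended at the end. */
struct demo_bus_v2 {
	int (*scan)(void);
	int (*probe)(void);
	int (*cleanup)(void);
};

/* Wrappers embedding the bus by value, as rte_pci_bus embeds rte_bus. */
struct demo_pci_bus_v1 {
	struct demo_bus_v1 bus;
	void *device_list;	/* stand-in for the real list head */
};

struct demo_pci_bus_v2 {
	struct demo_bus_v2 bus;
	void *device_list;
};

/* Offset an out-of-tree binary built against the old headers has baked in. */
static size_t
demo_old_device_list_offset(void)
{
	return offsetof(struct demo_pci_bus_v1, device_list);
}

/* Offset the rebuilt library actually uses. */
static size_t
demo_new_device_list_offset(void)
{
	return offsetof(struct demo_pci_bus_v2, device_list);
}
```

Members that precede the insertion point keep their offsets, which is why
"inserted at end" looks harmless for rte_bus alone; it is the embedding in a
larger exposed struct that makes the change visible to inlined accessors.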


>
> For info, Li has sent a patch for the bus cleanup
> which is not updating the bus code:
> https://patches.dpdk.org/project/dpdk/patch/20220606114650.209612-3-lizh@nvidia.com/
> It may be a temporary solution before the deprecation.

In principle, that's probably the best option: it raises no question
about where the ABI boundary lies.
(In practice though, the mentioned patch is triggering segfaults in
two CIs, for pdump).

Hiding the rte_bus object should be straightforward in v22.11; I had
some patches, but never finished the work.

It would also be great to look into rte_driver and rte_device, which
are important exposed types, but that's another story.
  
Bruce Richardson June 13, 2022, 3:58 p.m. UTC | #6
On Tue, Jun 07, 2022 at 05:12:02PM +0200, David Marchand wrote:
> On Tue, Jun 7, 2022 at 1:09 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> >
> > 03/06/2022 16:36, Kevin Laatz:
> > > During EAL init, all buses are probed and the devices found are
> > > initialized. On eal_cleanup(), the inverse does not happen, meaning any
> > > allocated memory and other configuration will not be cleaned up
> > > appropriately on exit.
> > [...]
> > > --- a/devtools/libabigail.abignore
> > > +++ b/devtools/libabigail.abignore
> > > @@ -56,3 +56,12 @@
> > >  ; Ignore libabigail false-positive in clang builds, after moving code.
> > >  [suppress_function]
> > >       name = rte_eal_remote_launch
> > > +
> > > +; Ignore field inserted to rte_bus, adding cleanup function
> > > +[suppress_type]
> > > +        name = rte_bus
> > > +        has_data_member_inserted_at = end
> > > +
> > > +; Ignore changes to internally used structs containing rte_bus
> > > +[suppress_type]
> > > +        name = rte_pci_bus, rte_vmbus_bus, rte_vdev_bus
> 
> (This change is strange as there is no rte_vdev_bus type, but I won't
> investigate the relevance of this rule for now).
> 
> >
> > I'm not sure we can safely consider these structs as internal.
> > The right process is to send a deprecation notice,
> > and then remove them from the public API.
> 
> Same for me, I don't think we can safely ignore.
> 
> A rte_bus struct is embedded in rte_pci_bus (resp. rte_vmbus_bus).
> If we make it grow, any inlined access (like walk in device_list or
> driver_list) after the rte_bus object is broken for code accessing it
> out of DPDK.
> Such code might exist out there, since we expose
> FOREACH_DEVICE_ON_PCIBUS, for example.
> 
> 
> >
> > For info, Li has sent a patch for the bus cleanup
> > which is not updating the bus code:
> > https://patches.dpdk.org/project/dpdk/patch/20220606114650.209612-3-lizh@nvidia.com/
> > It may be a temporary solution before the deprecation.
> 
> On the principle, that's probably the best, there is no question about
> unclear frontier of the ABI.
> (In practice though, the mentionned patch is triggering segfaults in
> two CI, for pdump).
> 
> Hiding rte_bus object should be straightforward in v22.11, I had some
> patches, but never finished the work.
> 
> It would be great too, to look into rte_driver and rte_device which
> are exposed important types, but that's another story.
> 
Agreed, we need to look into all this for the 22.11 release. Let's defer
this patch until we have a proper deprecation process in place. The
temporary patch looks fine as a fix too.

/Bruce
  
David Marchand Oct. 3, 2022, 12:35 p.m. UTC | #7
Hello Bruce, Kevin,

On Mon, Jun 13, 2022 at 5:59 PM Bruce Richardson
<bruce.richardson@intel.com> wrote:
> > > For info, Li has sent a patch for the bus cleanup
> > > which is not updating the bus code:
> > > https://patches.dpdk.org/project/dpdk/patch/20220606114650.209612-3-lizh@nvidia.com/
> > > It may be a temporary solution before the deprecation.
> >
> > On the principle, that's probably the best, there is no question about
> > unclear frontier of the ABI.
> > (In practice though, the mentionned patch is triggering segfaults in
> > two CI, for pdump).
> >
> > Hiding rte_bus object should be straightforward in v22.11, I had some
> > patches, but never finished the work.
> >
> > It would be great too, to look into rte_driver and rte_device which
> > are exposed important types, but that's another story.
> >
> Agreed, we need to look into all this for 22.11 release, let's defer this
> patch until we get proper deprecation process. Temporary patch looks fine
> as a fix too.

The patch needs some rebasing to make it into 22.11.
Can you work on it this week?


Thanks!
  
Kevin Laatz Oct. 3, 2022, 2:39 p.m. UTC | #8
Hi David,

On 03/10/2022 13:35, David Marchand wrote:
> Hello Bruce, Kevin,
>
> On Mon, Jun 13, 2022 at 5:59 PM Bruce Richardson
> <bruce.richardson@intel.com> wrote:
>>>> For info, Li has sent a patch for the bus cleanup
>>>> which is not updating the bus code:
>>>> https://patches.dpdk.org/project/dpdk/patch/20220606114650.209612-3-lizh@nvidia.com/
>>>> It may be a temporary solution before the deprecation.
>>> On the principle, that's probably the best, there is no question about
>>> unclear frontier of the ABI.
>>> (In practice though, the mentionned patch is triggering segfaults in
>>> two CI, for pdump).
>>>
>>> Hiding rte_bus object should be straightforward in v22.11, I had some
>>> patches, but never finished the work.
>>>
>>> It would be great too, to look into rte_driver and rte_device which
>>> are exposed important types, but that's another story.
>>>
>> Agreed, we need to look into all this for 22.11 release, let's defer this
>> patch until we get proper deprecation process. Temporary patch looks fine
>> as a fix too.
> The patch needs some rebasing for making it into 22.11.
> Can you work on it, this week?
>
Yes, I'll have a look at it - thanks for your work on the deprecations 
and cleanup!

-Kevin
  

Patch

diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
index 79ff15dc4e..3e519ee42a 100644
--- a/devtools/libabigail.abignore
+++ b/devtools/libabigail.abignore
@@ -56,3 +56,12 @@ 
 ; Ignore libabigail false-positive in clang builds, after moving code.
 [suppress_function]
 	name = rte_eal_remote_launch
+
+; Ignore field inserted to rte_bus, adding cleanup function
+[suppress_type]
+        name = rte_bus
+        has_data_member_inserted_at = end
+
+; Ignore changes to internally used structs containing rte_bus
+[suppress_type]
+        name = rte_pci_bus, rte_vmbus_bus, rte_vdev_bus
diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index 37ab879779..75b312eef1 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -25,6 +25,7 @@ 
 #include <rte_common.h>
 #include <rte_devargs.h>
 #include <rte_vfio.h>
+#include <rte_tailq.h>
 
 #include "private.h"
 
@@ -394,6 +395,30 @@  pci_probe(void)
 	return (probed && probed == failed) ? -1 : 0;
 }
 
+static int
+pci_cleanup(void)
+{
+	struct rte_pci_device *dev, *tmp_dev;
+	int error = 0;
+
+	RTE_TAILQ_FOREACH_SAFE(dev, &rte_pci_bus.device_list, next, tmp_dev) {
+		struct rte_pci_driver *drv = dev->driver;
+		int ret = 0;
+
+		if (drv == NULL || drv->remove == NULL)
+			continue;
+
+		ret = drv->remove(dev);
+		if (ret < 0) {
+			rte_errno = errno;
+			error = -1;
+		}
+		free(dev);
+	}
+
+	return error;
+}
+
 /* dump one device */
 static int
 pci_dump_one_device(FILE *f, struct rte_pci_device *dev)
@@ -813,6 +838,7 @@  struct rte_pci_bus rte_pci_bus = {
 	.bus = {
 		.scan = rte_pci_scan,
 		.probe = pci_probe,
+		.cleanup = pci_cleanup,
 		.find_device = pci_find_device,
 		.plug = pci_plug,
 		.unplug = pci_unplug,
diff --git a/drivers/bus/vdev/vdev.c b/drivers/bus/vdev/vdev.c
index a8d8b2327e..707ea1bbb5 100644
--- a/drivers/bus/vdev/vdev.c
+++ b/drivers/bus/vdev/vdev.c
@@ -569,6 +569,31 @@  vdev_probe(void)
 	return ret;
 }
 
+static int
+vdev_cleanup(void)
+{
+	struct rte_vdev_device *dev, *tmp_dev;
+	int error = 0;
+
+	RTE_TAILQ_FOREACH_SAFE(dev, &vdev_device_list, next, tmp_dev) {
+		const struct rte_vdev_driver *drv;
+		int ret = 0;
+
+		drv = container_of(dev->device.driver, const struct rte_vdev_driver, driver);
+
+		if (drv == NULL || drv->remove == NULL)
+			continue;
+
+		ret = drv->remove(dev);
+		if (ret < 0)
+			error = -1;
+
+		free(dev);
+	}
+
+	return error;
+}
+
 struct rte_device *
 rte_vdev_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 		     const void *data)
@@ -627,6 +652,7 @@  vdev_get_iommu_class(void)
 static struct rte_bus rte_vdev_bus = {
 	.scan = vdev_scan,
 	.probe = vdev_probe,
+	.cleanup = vdev_cleanup,
 	.find_device = rte_vdev_find_device,
 	.plug = vdev_plug,
 	.unplug = vdev_unplug,
diff --git a/lib/eal/common/eal_common_bus.c b/lib/eal/common/eal_common_bus.c
index baa5b532af..3fe67af0ba 100644
--- a/lib/eal/common/eal_common_bus.c
+++ b/lib/eal/common/eal_common_bus.c
@@ -85,6 +85,23 @@  rte_bus_probe(void)
 	return 0;
 }
 
+/* Clean up all devices of all buses */
+int
+eal_bus_cleanup(void)
+{
+	int ret = 0;
+	struct rte_bus *bus;
+
+	TAILQ_FOREACH(bus, &rte_bus_list, next) {
+		if (bus->cleanup == NULL)
+			continue;
+		if (bus->cleanup() != 0)
+			ret = -1;
+	}
+
+	return ret;
+}
+
 /* Dump information of a single bus */
 static int
 bus_dump_one(FILE *f, struct rte_bus *bus)
diff --git a/lib/eal/common/eal_private.h b/lib/eal/common/eal_private.h
index 44d14241f0..eea4749af4 100644
--- a/lib/eal/common/eal_private.h
+++ b/lib/eal/common/eal_private.h
@@ -441,6 +441,16 @@  int rte_eal_memory_detach(void);
  */
 struct rte_bus *rte_bus_find_by_device_name(const char *str);
 
+/**
+ * For each device on the buses, call the driver-specific function for
+ * device cleanup.
+ *
+ * @return
+ * 0 for successful cleanup
+ * !0 otherwise
+ */
+int eal_bus_cleanup(void);
+
 /**
  * Create the unix channel for primary/secondary communication.
  *
diff --git a/lib/eal/freebsd/eal.c b/lib/eal/freebsd/eal.c
index a6b20960f2..97ed2c4678 100644
--- a/lib/eal/freebsd/eal.c
+++ b/lib/eal/freebsd/eal.c
@@ -893,6 +893,7 @@  rte_eal_cleanup(void)
 		eal_get_internal_configuration();
 	rte_service_finalize();
 	rte_mp_channel_cleanup();
+	eal_bus_cleanup();
 	/* after this point, any DPDK pointers will become dangling */
 	rte_eal_memory_detach();
 	rte_eal_alarm_cleanup();
diff --git a/lib/eal/include/rte_bus.h b/lib/eal/include/rte_bus.h
index bbbb6efd28..9908a013f6 100644
--- a/lib/eal/include/rte_bus.h
+++ b/lib/eal/include/rte_bus.h
@@ -66,6 +66,18 @@  typedef int (*rte_bus_scan_t)(void);
  */
 typedef int (*rte_bus_probe_t)(void);
 
+/**
+ * Implementation specific cleanup function which is responsible for cleaning up
+ * devices on that bus with applicable drivers.
+ *
+ * This is called while iterating over each registered bus.
+ *
+ * @return
+ * 0 for successful cleanup
+ * !0 for any error during cleanup
+ */
+typedef int (*rte_bus_cleanup_t)(void);
+
 /**
  * Device iterator to find a device on a bus.
  *
@@ -277,6 +289,7 @@  struct rte_bus {
 				/**< handle hot-unplug failure on the bus */
 	rte_bus_sigbus_handler_t sigbus_handler;
 					/**< handle sigbus error on the bus */
+	rte_bus_cleanup_t cleanup;   /**< Cleanup devices on bus */
 
 };
 
diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
index 1ef263434a..9b32265ef5 100644
--- a/lib/eal/linux/eal.c
+++ b/lib/eal/linux/eal.c
@@ -1266,6 +1266,7 @@  rte_eal_cleanup(void)
 	vfio_mp_sync_cleanup();
 #endif
 	rte_mp_channel_cleanup();
+	eal_bus_cleanup();
 	/* after this point, any DPDK pointers will become dangling */
 	rte_eal_memory_detach();
 	eal_mp_dev_hotplug_cleanup();
diff --git a/lib/eal/windows/eal.c b/lib/eal/windows/eal.c
index 122de2a319..fedd6c971a 100644
--- a/lib/eal/windows/eal.c
+++ b/lib/eal/windows/eal.c
@@ -262,6 +262,7 @@  rte_eal_cleanup(void)
 
 	eal_intr_thread_cancel();
 	eal_mem_virt2iova_cleanup();
+	eal_bus_cleanup();
 	/* after this point, any DPDK pointers will become dangling */
 	rte_eal_memory_detach();
 	eal_cleanup_config(internal_conf);