[RFC,0/6] replace telemetry with process_info

Message ID 20191205173128.64543-1-ciara.power@intel.com (mailing list archive)

Message

Power, Ciara Dec. 5, 2019, 5:31 p.m. UTC
  From: Bruce Richardson <bruce.richardson@intel.com>

This patchset proposes a new library, called "process-info" for now, to
replace the existing telemetry library in DPDK. (Name subject to change
if someone can propose a better one).

The existing telemetry library provides useful capabilities when enabled:
  - Creates a unix socket on the system to allow external programs to
    connect and gather stats about the DPDK process.
  - Supports outputting the xstats for various network cards on the
    system.
  - Can be used to output any other information exported to the metrics
    library, e.g. by applications.
  - Uses a JSON message format, which is widely supported by other
    languages and systems.
  - Is supported by a plugin to collectd.

However, the library also has some issues and limitations that could be
improved upon:
  - Has a dependency on libjansson for JSON processing, so is disabled
    by default.
  - Tied entirely to the metrics library for statistics.
  - No support for sending non-stats data, e.g. something as simple as
    the DPDK version string.
  - All data gathering functions are in the library itself, which also
    means…
  - No support for libraries or drivers to present their own
    information via the library.

We therefore propose to keep the good points of the existing library,
but change the way things work to rectify the downsides.
This leads to the following design choices in the library:
  - Keep the existing idea of using a unix socket for connection (just
    simplifying the connection handling).
  - We would like to use JSON format, where possible, but the jansson
    library dependency is problematic. However, creating JSON-encoded
    data is easier than trying to parse JSON in C code, so we can keep
    the JSON output format for processing by e.g. collectd and other
    tools, while simplifying the input to be plain text commands:
        - Commands in this RFC are designed to all start with "/" for
          consistency.
        - Any parameters to the commands, e.g. the specific port to get
          stats for, are placed after a comma ",".
  - Have the library only handle socket creation and input handling.
    All data gathering should be provided by functions external to the
    library registered by other components, e.g. have ethdev library
    provide the function to query NIC xstats, etc.
  - Have the library initialized directly by EAL by default, so that
    monitoring is available on all DPDK apps unless an app explicitly
    opts out.
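
As a rough illustration of the command format and callback model
described above, here is a toy model in Python. It is not code from the
patches: the function names, the command names ("/ethdev/xstats",
"/version"), the error reply and the returned values are all assumptions
made up for the example. It only shows the shape of the idea, i.e. plain
text in, JSON out, with data gathering done by registered callbacks.

```python
import json

# Hypothetical registry mapping command names to callbacks. None of these
# names come from the patches; they are placeholders for illustration.
_callbacks = {}


def register(command, callback):
    """Register a data-gathering callback for a command starting with '/'."""
    if not command.startswith("/"):
        raise ValueError("commands must start with '/'")
    _callbacks[command] = callback


def handle_request(line):
    """Parse a plain-text request and return a JSON-encoded reply."""
    # Anything after the first comma is handed to the callback unparsed.
    command, _, params = line.strip().partition(",")
    callback = _callbacks.get(command)
    if callback is None:
        # Assumed error shape; the RFC text does not define one.
        return json.dumps({"error": "unknown command", "command": command})
    return json.dumps({command: callback(params)})


# A component such as ethdev would supply its own callback. The values
# returned here are dummy data, not real statistics.
def fake_xstats(params):
    port_id = int(params) if params else 0
    return {"port": port_id, "rx_good_packets": 0, "tx_good_packets": 0}


register("/ethdev/xstats", fake_xstats)
register("/version", lambda params: "example-version-string")

if __name__ == "__main__":
    print(handle_request("/ethdev/xstats,1"))
    print(handle_request("/version"))
```

The point of the split is that handle_request() knows nothing about
ports or stats; it only routes the text command to whichever component
registered for it.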

The obvious question that remains to be answered here is: "why a new
library rather than just fixing the old one?"

The answer is that the final two design choices above conflict with the
desire to keep backward compatibility. Those design choices require
that the library be built early in the build, since other libraries
will call into it to register callbacks. Backward compatibility, e.g.
for use with the collectd plugin, requires that the existing library
code be kept around and built - as it is now - at the end of the build
process, since it calls into other DPDK libraries. We therefore cannot
have one library that meets both requirements, hence the replacement,
which allows us to maintain backward compatibility by just leaving the
old lib in place until e.g. 20.11.
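
The build-order conflict can be seen with a small dependency check. The
sketch below uses Python's standard graphlib module (Python 3.9+). The
dependency sets are a simplified assumption for illustration, not the
real DPDK build graph: a single library that is both called by
EAL/ethdev and calls into ethdev forms a cycle, while splitting it into
two libraries does not.

```python
from graphlib import CycleError, TopologicalSorter


def build_order(deps):
    """Return a valid build order for {lib: libs it depends on}.

    Returns None when the dependencies are circular.
    """
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError:
        return None


# One library doing both jobs (simplified, assumed edges): EAL and ethdev
# call into it to initialize it and register callbacks, while its legacy
# code calls into ethdev and metrics to gather stats.
single_lib = {
    "eal": {"telemetry"},
    "metrics": {"eal"},
    "ethdev": {"eal", "telemetry"},
    "telemetry": {"ethdev", "metrics"},
}

# Two libraries: the new one sits at the bottom of the build, and the old
# one stays at the top as it is today.
split_libs = {
    "process_info": set(),
    "eal": {"process_info"},
    "metrics": {"eal"},
    "ethdev": {"eal", "process_info"},
    "telemetry": {"eal", "ethdev", "metrics"},
}

if __name__ == "__main__":
    print("single library:", build_order(single_lib))
    print("split libraries:", build_order(split_libs))
```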

This is also why the new library is called "process_info", since the
name "telemetry" is already taken. Suggestions for a better name
welcome.

Bruce Richardson (4):
  process-info: introduce process-info library
  eal: integrate process-info library
  usertools: add process-info python script
  ethdev: add callback support for process-info

Ciara Power (2):
  rawdev: add callback support for process-info
  examples/l3fwd-power: enable use of process-info

 config/common_base                            |   5 +
 examples/l3fwd-power/main.c                   |  83 ++-----
 lib/Makefile                                  |   3 +-
 lib/librte_eal/common/eal_common_options.c    |  75 ++++++
 lib/librte_eal/common/eal_internal_cfg.h      |   1 +
 lib/librte_eal/common/eal_options.h           |   5 +
 lib/librte_eal/freebsd/eal/Makefile           |   1 +
 lib/librte_eal/freebsd/eal/eal.c              |  14 ++
 lib/librte_eal/linux/eal/Makefile             |   1 +
 lib/librte_eal/linux/eal/eal.c                |  15 ++
 lib/librte_eal/meson.build                    |   2 +-
 lib/librte_ethdev/Makefile                    |   2 +-
 lib/librte_ethdev/rte_ethdev.c                |  73 ++++++
 lib/librte_process_info/Makefile              |  26 ++
 lib/librte_process_info/meson.build           |   8 +
 lib/librte_process_info/process_info.c        | 231 ++++++++++++++++++
 lib/librte_process_info/rte_process_info.h    |  25 ++
 .../rte_process_info_version.map              |   6 +
 lib/librte_rawdev/Makefile                    |   3 +-
 lib/librte_rawdev/meson.build                 |   1 +
 lib/librte_rawdev/rte_rawdev.c                |  82 +++++++
 lib/meson.build                               |   1 +
 mk/rte.app.mk                                 |   1 +
 usertools/test_new_telemetry.py               |  28 +++
 24 files changed, 630 insertions(+), 62 deletions(-)
 create mode 100644 lib/librte_process_info/Makefile
 create mode 100644 lib/librte_process_info/meson.build
 create mode 100644 lib/librte_process_info/process_info.c
 create mode 100644 lib/librte_process_info/rte_process_info.h
 create mode 100644 lib/librte_process_info/rte_process_info_version.map
 create mode 100755 usertools/test_new_telemetry.py
  

Comments

David Marchand Feb. 5, 2020, 3:21 p.m. UTC | #1
Hello Ciara, Bruce,

On Thu, Dec 5, 2019 at 6:34 PM Ciara Power <ciara.power@intel.com> wrote:
>
> From: Bruce Richardson <bruce.richardson@intel.com>
>
> [...]

The only user of the rte_telemetry api I could find is the (not yet
merged [1]) dpdk collectd plugin.

How will this impact it?
Can we expect compatibility?


1: https://github.com/collectd/collectd/pull/3273
  
Bruce Richardson Feb. 5, 2020, 5:12 p.m. UTC | #2
On Wed, Feb 05, 2020 at 04:21:16PM +0100, David Marchand wrote:
> Hello Ciara, Bruce,
> 
> On Thu, Dec 5, 2019 at 6:34 PM Ciara Power <ciara.power@intel.com> wrote:
> >
> > From: Bruce Richardson <bruce.richardson@intel.com>
> >
> > [...]
> 
> The only user of the rte_telemetry api I could find is the (not yet
> merged [1]) dpdk collectd plugin.
> 
> How will this impact it?
> Can we expect compatibility?
> 
> 
> 1: https://github.com/collectd/collectd/pull/3273
> 
Yes, we are aware of this, and we are investigating compatibility options.
Hopefully, we'll have more on this to share in the 20.05 timeframe, as
we do more prototyping and investigation.

/Bruce