[2/2] doc/contributing: guidelines for logging, tracing and telemetry

Message ID 20230613143355.77914-3-bruce.richardson@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series Improve docs on getting info on running process |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/Intel-compilation success Compilation OK
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/intel-Testing success Testing PASS
ci/iol-aarch-unit-testing success Testing PASS
ci/iol-unit-testing success Testing PASS
ci/github-robot: build success github build: passed
ci/intel-Functional success Functional PASS
ci/iol-abi-testing success Testing PASS
ci/iol-x86_64-compile-testing success Testing PASS
ci/iol-testing success Testing PASS
ci/iol-x86_64-unit-testing success Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-aarch64-compile-testing success Testing PASS

Commit Message

Bruce Richardson June 13, 2023, 2:33 p.m. UTC
  As discussed by DPDK technical board [1], out contributor guide should
include some details as to when to use logging vs tracing vs telemetry
to provide the end user with information about the running process and
the DPDK libraries it uses.

[1] https://mails.dpdk.org/archives/dev/2023-March/265204.html

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 doc/guides/contributing/coding_style.rst |  2 ++
 doc/guides/contributing/design.rst       | 34 ++++++++++++++++++++++++
 doc/guides/prog_guide/telemetry_lib.rst  |  2 ++
 doc/guides/prog_guide/trace_lib.rst      |  2 ++
 4 files changed, 40 insertions(+)
  

Comments

Stephen Hemminger June 13, 2023, 3:21 p.m. UTC | #1
On Tue, 13 Jun 2023 15:33:55 +0100
Bruce Richardson <bruce.richardson@intel.com> wrote:

> +Below are some guidelines for when each should be used:
> +
> +* For reporting error conditions, or other abnormal runtime issues, *logging* should be used.
> +  Depending on the severity of the issue, the appropriate log level, for example,
> +  ``ERROR``, ``WARNING`` or ``NOTICE``, should be used.
> +* In general, it is not recommended that the DPDK logging functions should be used for debugging.
> +  Although the ``DEBUG`` log level may be used in components, it should only be used sparingly,
> +  and the *tracing* functionality used instead.
> +  More specifically:
> +
> +  * for cases where a path through the code is only likely to be taken once,
> +    for example, initialization code, either *logging* at ``DEBUG`` level or *tracing* may be used -
> +    though tracing is preferred.
> +  * where a path is to be taken multiple times within a short timeframe, for example,
> +    datapath or regular control plane code, *tracing* should be used.

When a new feature, bus or device type is added it *must* not log when not present.
No new messages for existing users please.
  
Morten Brørup June 13, 2023, 7:38 p.m. UTC | #2
> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> Sent: Tuesday, 13 June 2023 16.34
> 
> As discussed by DPDK technical board [1], out contributor guide should

Typo: out -> our

> include some details as to when to use logging vs tracing vs telemetry
> to provide the end user with information about the running process and
> the DPDK libraries it uses.
> 
> [1] https://mails.dpdk.org/archives/dev/2023-March/265204.html
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---

Debug logging can be more verbose than trace in the one-off cases, so I have a slightly different personal preference. But perhaps this is mainly relevant in the abnormal execution paths, and thus already covered.

Anyway, I agree with everything in this patch.

Acked-by: Morten Brørup <mb@smartsharesystems.com>

You might also consider mentioning the xyz_dump(FILE *f) functions such as rte_mempool_list_dump(FILE *f).

I find this type of semi-structured verbose dumping very useful for investigating the state of systems in production. Our debug CLI provides access to these _dump() functions, including some of our own, from various subsystems in the application.
  
Bruce Richardson June 14, 2023, 8:36 a.m. UTC | #3
On Tue, Jun 13, 2023 at 09:38:55PM +0200, Morten Brørup wrote:
> > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > Sent: Tuesday, 13 June 2023 16.34
> > 
> > As discussed by DPDK technical board [1], out contributor guide should
> 
> Typo: out -> our
> 
> > include some details as to when to use logging vs tracing vs telemetry
> > to provide the end user with information about the running process and
> > the DPDK libraries it uses.
> > 
> > [1] https://mails.dpdk.org/archives/dev/2023-March/265204.html
> > 
> > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > ---
> 
> Debug logging can be more verbose than trace in the one-off cases, so I have a slightly different personal preference. But perhaps this is mainly relevant in the abnormal execution paths, and thus already covered.
> 

Feel free to suggest alternate wordings if you have any. That particular
item is hard to come up with good guidelines for - especially ones that are
not pages long! :-)

> Anyway, I agree with everything in this patch.
> 
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> 

Thanks.

> You might also consider mentioning the xyz_dump(FILE *f) functions such as rte_mempool_list_dump(FILE *f).
> 
> I find this type of semi-structured verbose dumping very useful for investigating the state of systems in production. Our debug CLI provides access to these _dump() functions, including some of our own, from various subsystems in the application.
> 

I'll try and put some mention of that in V2.

/Bruce
  
Morten Brørup June 14, 2023, 9:39 a.m. UTC | #4
> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> Sent: Wednesday, 14 June 2023 10.37
> 
> On Tue, Jun 13, 2023 at 09:38:55PM +0200, Morten Brørup wrote:
> > > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > > Sent: Tuesday, 13 June 2023 16.34
> > >
> > > As discussed by DPDK technical board [1], out contributor guide should
> >
> > Typo: out -> our
> >
> > > include some details as to when to use logging vs tracing vs telemetry
> > > to provide the end user with information about the running process and
> > > the DPDK libraries it uses.
> > >
> > > [1] https://mails.dpdk.org/archives/dev/2023-March/265204.html
> > >
> > > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > > ---
> >
> > Debug logging can be more verbose than trace in the one-off cases, so I have
> a slightly different personal preference. But perhaps this is mainly relevant
> in the abnormal execution paths, and thus already covered.
> >
> 
> Feel free to suggest alternate wordings if you have any. That particular
> item is hard to come up with good guidelines for - especially ones that are
> not pages long! :-)

+  * For cases where a path through the code is only likely to be taken once,
+    for example, initialization code, either *logging* at ``DEBUG`` level or *tracing* may be used -
+    though tracing is preferred.

How about adding this sentence at the end of the bullet:
+    though tracing is preferred. When conveying a lot of information, *logging* with more details can supplement *tracing* with the basic details.

+  * Where a path is to be taken multiple times within a short timeframe, for example,
+    datapath or regular control plane code, *tracing* should be used.

Capitalize the first letter at the bullets: "* for"->"* For" and "* where"->"* Where"

> 
> > Anyway, I agree with everything in this patch.
> >
> > Acked-by: Morten Brørup <mb@smartsharesystems.com>
> >
> 
> Thanks.
> 
> > You might also consider mentioning the xyz_dump(FILE *f) functions such as
> rte_mempool_list_dump(FILE *f).
> >
> > I find this type of semi-structured verbose dumping very useful for
> investigating the state of systems in production. Our debug CLI provides
> access to these _dump() functions, including some of our own, from various
> subsystems in the application.
> >
> 
> I'll try and put some mention of that in V2.
> 
> /Bruce
  
Ferruh Yigit June 14, 2023, 11:35 a.m. UTC | #5
On 6/13/2023 3:33 PM, Bruce Richardson wrote:
> As discussed by DPDK technical board [1], out contributor guide should
> include some details as to when to use logging vs tracing vs telemetry
> to provide the end user with information about the running process and
> the DPDK libraries it uses.
> 
> [1] https://mails.dpdk.org/archives/dev/2023-March/265204.html
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>  doc/guides/contributing/coding_style.rst |  2 ++
>  doc/guides/contributing/design.rst       | 34 ++++++++++++++++++++++++
>  doc/guides/prog_guide/telemetry_lib.rst  |  2 ++
>  doc/guides/prog_guide/trace_lib.rst      |  2 ++
>  4 files changed, 40 insertions(+)
> 
> diff --git a/doc/guides/contributing/coding_style.rst b/doc/guides/contributing/coding_style.rst
> index f50a78a346..13b2390d9e 100644
> --- a/doc/guides/contributing/coding_style.rst
> +++ b/doc/guides/contributing/coding_style.rst
> @@ -794,6 +794,8 @@ Control Statements
>                   /* NOTREACHED */
>           }
>  
> +.. _dynamic_logging:
> +
>  Dynamic Logging
>  ---------------
>  
> diff --git a/doc/guides/contributing/design.rst b/doc/guides/contributing/design.rst
> index d24a7ff6a0..9edf3ea43b 100644
> --- a/doc/guides/contributing/design.rst
> +++ b/doc/guides/contributing/design.rst
> @@ -4,6 +4,40 @@
>  Design
>  ======
>  
> +Runtime Information - Logging, Tracing and Telemetry
> +-------------------------------------------------------
> +
> +It is often desirable to provide information to the end-user as to what is happening to the application at runtime.
>

I think target audience matters, who is consumer of the runtime
information, possible candidates I can think of:
- dev-ops engineer deploys and maintains a dpdk application
- DPDK based application developer
- DPDK developer

Should we distinguish them? Like I guess telemetry output can be more
useful to dev-ops engineer, but debug output can be more useful to
application developer.

To roughly set target consumer for each method may help developer to
decide which one to use, but not sure if we can narrow it like this.

> +DPDK provides a number of built-in mechanisms to provide this introspection:
> +
> +* :ref:`Logging<dynamic_logging>`
> +* :ref:`Tracing<trace_library>`
> +* :ref:`Telemetry<telemetry_library>`
> +
> +Each of these has it's own strengths and suitabilities for use within DPDK components.
> +
> +Below are some guidelines for when each should be used:
> +
> +* For reporting error conditions, or other abnormal runtime issues, *logging* should be used.
> +  Depending on the severity of the issue, the appropriate log level, for example,
> +  ``ERROR``, ``WARNING`` or ``NOTICE``, should be used.

+1

> +* In general, it is not recommended that the DPDK logging functions should be used for debugging.
> +  Although the ``DEBUG`` log level may be used in components, it should only be used sparingly,
> +  and the *tracing* functionality used instead.

I think DEBUG logging is still useful for control path, 'tracing' is
missing the custom message, so I found it more useful for to analyze
order and frequency of a function call, which suits more to datapath.

> +  More specifically:
> +
> +  * for cases where a path through the code is only likely to be taken once,
> +    for example, initialization code, either *logging* at ``DEBUG`` level or *tracing* may be used -
> +    though tracing is preferred.>

I am for preferring 'logging' for this case, because there is extra
effort required to analyze the trace output, but logs are simple and
just there.

> +  * where a path is to be taken multiple times within a short timeframe, for example,
> +    datapath or regular control plane code, *tracing* should be used.
> +
>

+1 to use tracing mainly for datapath

Also there is logging support for datapath, it would be nice to clarify
that usage too. Datapath logging used:
a) RTE_LOG_DP macros where code compiled out based on selected log level
b) 'RTE_ETHDEV_DEBUG_RX' & 'RTE_ETHDEV_DEBUG_TX' macros

ethdev layer use b for logging,
drivers use combination of a & b

Tracing helps on datapath to get profiling etc, but there are cases
driver needs to log some variables, like descriptor details to ensure
*functionality* is correct.


I suggest to remove/deprecate RTE_LOG_DP [1], and use
'RTE_ETHDEV_DEBUG_x' macros for developer logging, and use tracing for
profiling. What do you think?


[1]
Intention is good with RTE_LOG_DP macro to set the desired log level and
compile out not used logs so that they won't take extra cycle.
But every time log level changed, it requires compiling again anyway,
and in practice there is no multiple level logging need for datapath,
most of the time single macro to enable/disable logging is sufficient.
That why I believe simpler to use an explicit macro to enable/disable
datapath logging, like 'RTE_ETHDEV_DEBUG_RX' & 'RTE_ETHDEV_DEBUG_TX'.



> +* For numerical or statistical data generated by a component, for example, per-packet statistics,
> +  *telemetry* should be used.
>

+1


> +* For any data where the data may need to be gathered at any point in the execution to help assess the state of the application component,
> +  for example, core configuration, device information, *telemetry* should be used.
> +

+1

Should we explicitly state that telemetry only used as readonly?
I think it implicitly states, but since there was a patch in the past, I
wonder if this needs to be highlighted?
  
Jerin Jacob June 15, 2023, 11:51 a.m. UTC | #6
On Tue, Jun 13, 2023 at 8:04 PM Bruce Richardson
<bruce.richardson@intel.com> wrote:
>
> As discussed by DPDK technical board [1], out contributor guide should
> include some details as to when to use logging vs tracing vs telemetry
> to provide the end user with information about the running process and
> the DPDK libraries it uses.
>
> [1] https://mails.dpdk.org/archives/dev/2023-March/265204.html
>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>  doc/guides/contributing/coding_style.rst |  2 ++
>  doc/guides/contributing/design.rst       | 34 ++++++++++++++++++++++++
>  doc/guides/prog_guide/telemetry_lib.rst  |  2 ++
>  doc/guides/prog_guide/trace_lib.rst      |  2 ++
>  4 files changed, 40 insertions(+)
>
> diff --git a/doc/guides/contributing/coding_style.rst b/doc/guides/contributing/coding_style.rst
> index f50a78a346..13b2390d9e 100644
> --- a/doc/guides/contributing/coding_style.rst
> +++ b/doc/guides/contributing/coding_style.rst
> @@ -794,6 +794,8 @@ Control Statements
>                   /* NOTREACHED */
>           }
>
> +.. _dynamic_logging:
> +
>  Dynamic Logging
>  ---------------
>
> diff --git a/doc/guides/contributing/design.rst b/doc/guides/contributing/design.rst
> index d24a7ff6a0..9edf3ea43b 100644
> --- a/doc/guides/contributing/design.rst
> +++ b/doc/guides/contributing/design.rst
> @@ -4,6 +4,40 @@
>  Design
>  ======
>
> +Runtime Information - Logging, Tracing and Telemetry
> +-------------------------------------------------------
> +
> +It is often desirable to provide information to the end-user as to what is happening to the application at runtime.
> +DPDK provides a number of built-in mechanisms to provide this introspection:
> +
> +* :ref:`Logging<dynamic_logging>`
> +* :ref:`Tracing<trace_library>`
> +* :ref:`Telemetry<telemetry_library>`
> +
> +Each of these has it's own strengths and suitabilities for use within DPDK components.
> +
> +Below are some guidelines for when each should be used:
> +
> +* For reporting error conditions, or other abnormal runtime issues, *logging* should be used.
> +  Depending on the severity of the issue, the appropriate log level, for example,
> +  ``ERROR``, ``WARNING`` or ``NOTICE``, should be used.
> +* In general, it is not recommended that the DPDK logging functions should be used for debugging.
> +  Although the ``DEBUG`` log level may be used in components, it should only be used sparingly,
> +  and the *tracing* functionality used instead.
> +  More specifically:
> +
> +  * for cases where a path through the code is only likely to be taken once,
> +    for example, initialization code, either *logging* at ``DEBUG`` level or *tracing* may be used -
> +    though tracing is preferred.
> +  * where a path is to be taken multiple times within a short timeframe, for example,
> +    datapath or regular control plane code, *tracing* should be used.


If you think it makes sense, the following point may be added for trace.
# Since trace is using CommonTraceFormat, it is possible to write
_reusable_ post-processing tools with
https://babeltrace.org/docs/v1.5/python/babeltrace/ like finding idle
CPU time etc.

Rest looks good to me.



> +
> +* For numerical or statistical data generated by a component, for example, per-packet statistics,
> +  *telemetry* should be used.
> +* For any data where the data may need to be gathered at any point in the execution to help assess the state of the application component,
> +  for example, core configuration, device information, *telemetry* should be used.
> +
> +
>  Environment or Architecture-specific Sources
>  --------------------------------------------
>
> diff --git a/doc/guides/prog_guide/telemetry_lib.rst b/doc/guides/prog_guide/telemetry_lib.rst
> index 32f525a67f..71f8bd735e 100644
> --- a/doc/guides/prog_guide/telemetry_lib.rst
> +++ b/doc/guides/prog_guide/telemetry_lib.rst
> @@ -1,6 +1,8 @@
>  ..  SPDX-License-Identifier: BSD-3-Clause
>      Copyright(c) 2020 Intel Corporation.
>
> +.. _telemetry_library:
> +
>  Telemetry Library
>  =================
>
> diff --git a/doc/guides/prog_guide/trace_lib.rst b/doc/guides/prog_guide/trace_lib.rst
> index e5718feddc..a3b8a7c2eb 100644
> --- a/doc/guides/prog_guide/trace_lib.rst
> +++ b/doc/guides/prog_guide/trace_lib.rst
> @@ -1,6 +1,8 @@
>  ..  SPDX-License-Identifier: BSD-3-Clause
>      Copyright(C) 2020 Marvell International Ltd.
>
> +.. _trace_library:
> +
>  Trace Library
>  =============
>
> --
> 2.39.2
>
  

Patch

diff --git a/doc/guides/contributing/coding_style.rst b/doc/guides/contributing/coding_style.rst
index f50a78a346..13b2390d9e 100644
--- a/doc/guides/contributing/coding_style.rst
+++ b/doc/guides/contributing/coding_style.rst
@@ -794,6 +794,8 @@  Control Statements
                  /* NOTREACHED */
          }
 
+.. _dynamic_logging:
+
 Dynamic Logging
 ---------------
 
diff --git a/doc/guides/contributing/design.rst b/doc/guides/contributing/design.rst
index d24a7ff6a0..9edf3ea43b 100644
--- a/doc/guides/contributing/design.rst
+++ b/doc/guides/contributing/design.rst
@@ -4,6 +4,40 @@ 
 Design
 ======
 
+Runtime Information - Logging, Tracing and Telemetry
+-------------------------------------------------------
+
+It is often desirable to provide information to the end-user as to what is happening to the application at runtime.
+DPDK provides a number of built-in mechanisms to provide this introspection:
+
+* :ref:`Logging<dynamic_logging>`
+* :ref:`Tracing<trace_library>`
+* :ref:`Telemetry<telemetry_library>`
+
+Each of these has it's own strengths and suitabilities for use within DPDK components.
+
+Below are some guidelines for when each should be used:
+
+* For reporting error conditions, or other abnormal runtime issues, *logging* should be used.
+  Depending on the severity of the issue, the appropriate log level, for example,
+  ``ERROR``, ``WARNING`` or ``NOTICE``, should be used.
+* In general, it is not recommended that the DPDK logging functions should be used for debugging.
+  Although the ``DEBUG`` log level may be used in components, it should only be used sparingly,
+  and the *tracing* functionality used instead.
+  More specifically:
+
+  * for cases where a path through the code is only likely to be taken once,
+    for example, initialization code, either *logging* at ``DEBUG`` level or *tracing* may be used -
+    though tracing is preferred.
+  * where a path is to be taken multiple times within a short timeframe, for example,
+    datapath or regular control plane code, *tracing* should be used.
+
+* For numerical or statistical data generated by a component, for example, per-packet statistics,
+  *telemetry* should be used.
+* For any data where the data may need to be gathered at any point in the execution to help assess the state of the application component,
+  for example, core configuration, device information, *telemetry* should be used.
+
+
 Environment or Architecture-specific Sources
 --------------------------------------------
 
diff --git a/doc/guides/prog_guide/telemetry_lib.rst b/doc/guides/prog_guide/telemetry_lib.rst
index 32f525a67f..71f8bd735e 100644
--- a/doc/guides/prog_guide/telemetry_lib.rst
+++ b/doc/guides/prog_guide/telemetry_lib.rst
@@ -1,6 +1,8 @@ 
 ..  SPDX-License-Identifier: BSD-3-Clause
     Copyright(c) 2020 Intel Corporation.
 
+.. _telemetry_library:
+
 Telemetry Library
 =================
 
diff --git a/doc/guides/prog_guide/trace_lib.rst b/doc/guides/prog_guide/trace_lib.rst
index e5718feddc..a3b8a7c2eb 100644
--- a/doc/guides/prog_guide/trace_lib.rst
+++ b/doc/guides/prog_guide/trace_lib.rst
@@ -1,6 +1,8 @@ 
 ..  SPDX-License-Identifier: BSD-3-Clause
     Copyright(C) 2020 Marvell International Ltd.
 
+.. _trace_library:
+
 Trace Library
 =============