[v2,0/5] net/mlx5: introduce Tx datapath tracing

Message ID 20230613165845.19109-1-viacheslavo@nvidia.com (mailing list archive)

Message

Slava Ovsiienko June 13, 2023, 4:58 p.m. UTC
  The mlx5 driver provides packet send scheduling at a specific moment of
time, and for applications relying on this feature it would be extremely
useful to have extra debug information: when and how packets were scheduled
and when the actual sending was completed by the NIC hardware (this helps
the application track internal delay issues).

Because the DPDK Tx datapath API does not provide any feedback from the
driver, and the feature appears to be mlx5-specific, it seems reasonable
to use the existing DPDK datapath tracing capability.

The work cycle is supposed to be:
  - compile the application with tracing enabled
  - run the application with EAL parameters configuring the tracing in
    the mlx5 Tx datapath
  - store the dump file with the gathered tracing information
  - run the analyzing script (in Python) to combine related events (packet
    firing and completion) and see the data in a human-readable view
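
As an illustration of the last step, combining a firing event with its
completion can be sketched as below. This is only a minimal sketch: the
Event record, its field names and the "fire"/"complete" kinds are
hypothetical and do not reproduce the actual mlx5 tracepoints or the logic
of mlx5_trace.py.

```python
from collections import namedtuple

# Hypothetical event record; the real tracepoints and their fields are
# defined by the mlx5 driver and are not reproduced here.
Event = namedtuple("Event", "kind queue index ts_ns")


def pair_events(events):
    """Match each 'fire' event with the 'complete' event of the same WQE."""
    pending = {}  # (queue, index) -> firing timestamp in ns
    paired = []   # (queue, index, latency in ns)
    for ev in sorted(events, key=lambda e: e.ts_ns):
        key = (ev.queue, ev.index)
        if ev.kind == "fire":
            pending[key] = ev.ts_ns
        elif ev.kind == "complete" and key in pending:
            paired.append((ev.queue, ev.index, ev.ts_ns - pending.pop(key)))
    return paired


sample = [
    Event("fire", 0, 1, 1000),
    Event("fire", 0, 2, 1200),
    Event("complete", 0, 1, 1750),
    Event("complete", 0, 2, 2100),
]
for queue, index, latency in pair_events(sample):
    print(f"queue {queue} wqe {index}: {latency} ns")
```

Completions that have no matching firing event are silently skipped in this
sketch; a real tool would probably want to report them.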

Below are detailed instructions on how to gather all the debug data,
including the full timing information, with an mlx5 NIC.


1. Build the DPDK application with datapath tracing enabled

The meson option should be specified:
   -Denable_trace_fp=true

The c_args should be specified:
   -DALLOW_EXPERIMENTAL_API

The DPDK configuration examples:

  meson configure --buildtype=debug -Denable_trace_fp=true \
        -Dc_args='-DRTE_LIBRTE_MLX5_DEBUG -DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API' build

  meson configure --buildtype=debug -Denable_trace_fp=true \
        -Dc_args='-DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API' build

  meson configure --buildtype=release -Denable_trace_fp=true \
        -Dc_args='-DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API' build

  meson configure --buildtype=release -Denable_trace_fp=true \
        -Dc_args='-DALLOW_EXPERIMENTAL_API' build


2. Configuring the NIC

If the send completion timings are important, the NIC should be configured
to provide real-time timestamps: the REAL_TIME_CLOCK_ENABLE NV settings
parameter should be set to TRUE, for example with the following command
(followed by a FW/driver reset):

  sudo mlxconfig -d /dev/mst/mt4125_pciconf0 s REAL_TIME_CLOCK_ENABLE=1


3. Run DPDK application to gather the traces

EAL parameters controlling the trace capability at runtime:

  --trace=pmd.net.mlx5.tx - the regular expression selecting the tracepoints
                            to enable by name. At least the tracepoints
                            matching "pmd.net.mlx5.tx" must be enabled to
                            gather all events needed to analyze the mlx5 Tx
                            datapath and its timings.
                            By default all tracepoints are disabled.

  --trace-dir=/var/log - directory where the trace is stored

  --trace-bufsz=<val>B|<val>K|<val>M - optional, trace data buffer size
                                       per thread. The default is 1MB.

  --trace-mode=overwrite|discard  - optional, selects trace data buffer mode.
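
For scripted runs, the options above can be assembled and checked before the
application is launched. The helper below is a hedged sketch, not part of
the series: the function names are made up for illustration, and treating K
and M as binary multiples (1024 and 1024 * 1024) is an assumption.

```python
import re

# Assumption: K and M are binary multiples of a byte.
_UNITS = {"B": 1, "K": 1024, "M": 1024 * 1024}


def parse_bufsz(text):
    """Convert a '<val>B', '<val>K' or '<val>M' string into a byte count."""
    match = re.fullmatch(r"(\d+)([BKM])", text)
    if match is None:
        raise ValueError(f"invalid trace buffer size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def trace_eal_args(pattern="pmd.net.mlx5.tx", trace_dir="/var/log",
                   bufsz=None, mode=None):
    """Build the list of EAL trace arguments described above."""
    args = [f"--trace={pattern}", f"--trace-dir={trace_dir}"]
    if bufsz is not None:
        parse_bufsz(bufsz)  # only validates the format
        args.append(f"--trace-bufsz={bufsz}")
    if mode is not None:
        if mode not in ("overwrite", "discard"):
            raise ValueError(f"invalid trace mode: {mode!r}")
        args.append(f"--trace-mode={mode}")
    return args


print(" ".join(trace_eal_args(bufsz="4M", mode="overwrite")))
```

The returned list can be appended to the EAL part of the application command
line.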


4. Installing or Building Babeltrace2 Package

The gathered trace data can be analyzed with the provided Python script.
To parse the trace data, the script uses the Babeltrace2 library.
The package should be either installed or built from source code as
shown below:

  git clone https://github.com/efficios/babeltrace.git
  cd babeltrace
  ./bootstrap
  ./configure --help
  ./configure --disable-api-doc --disable-man-pages \
              --disable-python-bindings-doc --enable-python-plugins \
              --enable-python-bindings

5. Running the Analyzing Script

The analyzing script is located in the folder ./drivers/net/mlx5/tools.
It requires Python 3.6 and the Babeltrace2 package, and takes a single
parameter, the trace data file. For example:

   ./mlx5_trace.py /var/log/rte-2023-01-23-AM-11-52-39
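
For readers unfamiliar with Babeltrace2, reading a trace from Python looks
roughly like the sketch below. It is not the code of mlx5_trace.py: the two
function names are hypothetical, and it assumes that the bt2 Python bindings
are installed and that the trace events carry a default clock snapshot.

```python
def iter_events(trace_path):
    """Yield (timestamp_ns, event_name) for every event in a CTF trace."""
    import bt2  # imported lazily: needs the Babeltrace2 Python bindings

    for msg in bt2.TraceCollectionMessageIterator(trace_path):
        # Skip stream/packet beginning and end messages, keep events only.
        if type(msg) is bt2._EventMessageConst:
            yield msg.default_clock_snapshot.ns_from_origin, msg.event.name


def count_by_name(events, prefix="pmd.net.mlx5.tx"):
    """Count events whose name starts with the given tracepoint prefix."""
    counts = {}
    for _ts, name in events:
        if name.startswith(prefix):
            counts[name] = counts.get(name, 0) + 1
    return counts
```

With a real trace this could be used as
count_by_name(iter_events("/var/log/rte-2023-01-23-AM-11-52-39")) to check
quickly that the expected tracepoints were actually recorded.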


6. Interpreting the Script Output Data

All timings are given in nanoseconds.
The output presents the list of Tx (and, in the future, Rx) bursts per
port/queue. Each list element contains the list of built WQEs with their
opcodes, and each WQE contains the list of the packets it encompasses.
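
The nesting described above (burst, then WQEs, then packets) can be pictured
with the small model below. It is only an illustrative sketch: the class
names, the fields and the printed layout are assumptions and do not
reproduce the actual output format of the script.

```python
class Wqe:
    """One built WQE with its opcode and the packets it carries."""

    def __init__(self, opcode):
        self.opcode = opcode
        self.packets = []  # packet lengths in bytes (hypothetical field)


class Burst:
    """One Tx burst on a given port/queue."""

    def __init__(self, port, queue, start_ns):
        self.port = port
        self.queue = queue
        self.start_ns = start_ns
        self.wqes = []


def format_burst(burst):
    """Render a burst as an indented burst/WQE/packet listing."""
    lines = [f"port {burst.port} queue {burst.queue} "
             f"burst at {burst.start_ns} ns"]
    for wqe in burst.wqes:
        lines.append(f"  WQE {wqe.opcode}: {len(wqe.packets)} packet(s)")
        for length in wqe.packets:
            lines.append(f"    packet, {length} bytes")
    return "\n".join(lines)


burst = Burst(port=0, queue=1, start_ns=5000)
wqe = Wqe("SEND")  # opcode label chosen for illustration only
wqe.packets.extend([64, 128])
burst.wqes.append(wqe)
print(format_burst(burst))
```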

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

--
v2: - comment addressed: "dump_trace" command is replaced with "save_trace"
    - Windows build failure addressed, Windows does not support tracing

Viacheslav Ovsiienko (5):
  app/testpmd: add trace save command
  common/mlx5: introduce tracepoints for mlx5 drivers
  net/mlx5: add Tx datapath tracing
  net/mlx5: add comprehensive send completion trace
  net/mlx5: add Tx datapath trace analyzing script

 app/test-pmd/cmdline.c               |  38 ++++
 drivers/common/mlx5/meson.build      |   1 +
 drivers/common/mlx5/mlx5_trace.c     |  25 +++
 drivers/common/mlx5/mlx5_trace.h     |  72 +++++++
 drivers/common/mlx5/version.map      |   8 +
 drivers/net/mlx5/linux/mlx5_verbs.c  |   8 +-
 drivers/net/mlx5/mlx5_devx.c         |   8 +-
 drivers/net/mlx5/mlx5_rx.h           |  19 --
 drivers/net/mlx5/mlx5_rxtx.h         |  19 ++
 drivers/net/mlx5/mlx5_tx.c           |   9 +
 drivers/net/mlx5/mlx5_tx.h           |  88 ++++++++-
 drivers/net/mlx5/tools/mlx5_trace.py | 271 +++++++++++++++++++++++++++
 12 files changed, 537 insertions(+), 29 deletions(-)
 create mode 100644 drivers/common/mlx5/mlx5_trace.c
 create mode 100644 drivers/common/mlx5/mlx5_trace.h
 create mode 100755 drivers/net/mlx5/tools/mlx5_trace.py
  

Comments

Raslan Darawsheh June 20, 2023, noon UTC | #1
Hi,

> -----Original Message-----
> From: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> Sent: Tuesday, June 13, 2023 7:59 PM
> To: dev@dpdk.org
> Subject: [PATCH v2 0/5] net/mlx5: introduce Tx datapath tracing
> 

Series applied to next-net-mlx,

Kindest regards
Raslan Darawsheh
  
Thomas Monjalon June 27, 2023, 12:46 a.m. UTC | #2
20/06/2023 14:00, Raslan Darawsheh:
> Hi,
> 
> > -----Original Message-----
> > From: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> > Sent: Tuesday, June 13, 2023 7:59 PM
> > To: dev@dpdk.org
> > Subject: [PATCH v2 0/5] net/mlx5: introduce Tx datapath tracing
> > 

This information should be in the documentation.

I think we should request a review of the Python script from people familiar with tracing
and from people more familiar with Python scripting for user tools.
  
Slava Ovsiienko June 27, 2023, 11:24 a.m. UTC | #3
> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Tuesday, June 27, 2023 3:46 AM
> To: Slava Ovsiienko <viacheslavo@nvidia.com>
> Cc: dev@dpdk.org; Raslan Darawsheh <rasland@nvidia.com>;
> rjarry@redhat.com; jerinj@marvell.com
> Subject: Re: [PATCH v2 0/5] net/mlx5: introduce Tx datapath tracing
> 
> 
> This information should be in the documentation.
OK, should we make this cover-letter part of mlx5.rst?

> 
> I think we should request a review of the Python script from people familiar
> with tracing and from people more familiar with Python scripting for user
> tools.
Would be very helpful, could you recommend/ask someone?

With best regards,
Slava



  
Thomas Monjalon June 27, 2023, 11:34 a.m. UTC | #4
27/06/2023 13:24, Slava Ovsiienko:
> 
> > -----Original Message-----
> > From: Thomas Monjalon <thomas@monjalon.net>
> > Sent: Tuesday, June 27, 2023 3:46 AM
> > To: Slava Ovsiienko <viacheslavo@nvidia.com>
> > Cc: dev@dpdk.org; Raslan Darawsheh <rasland@nvidia.com>;
> > rjarry@redhat.com; jerinj@marvell.com
> > Subject: Re: [PATCH v2 0/5] net/mlx5: introduce Tx datapath tracing
> > 
> > 
> > This information should be in the documentation.
> OK, should we make this cover-letter part of mlx5.rst?

Kind of, yes.

> > I think we should request a review of the Python script from people familiar
> > with tracing and from people more familiar with Python scripting for user
> > tools.
> Would be very helpful, could you recommend/ask someone?

Jerin, what do you think of such a script?
Robin, would you have time to look at this trace processing script please?
  
Robin Jarry June 28, 2023, 2:18 p.m. UTC | #5
Thomas Monjalon, Jun 27, 2023 at 13:34:
> Robin, would you have time to look at this trace processing script
> please?

Hi there,

I've had a brief look at the script. I don't exactly know what it is
taking as input and should be producing as output. Could you give some
examples?

Maybe I could suggest a few ideas to make it "feel" more python-esque.

Cheers,
  
Slava Ovsiienko June 29, 2023, 7:16 a.m. UTC | #6
Hi, Robin

Thank you for kindly agreeing to review the script.
Please see the attachment: the raw data gathered as a result of tracing, and a brief description.

With best regards,
Slava

  
Robin Jarry June 29, 2023, 9:08 a.m. UTC | #7
Slava Ovsiienko, Jun 29, 2023 at 09:16:
> Hi, Robin
>
> Thank you for kindly agreeing to review the script.
> Please see the attachment: the raw data gathered as a result of tracing, and a brief description.

Thanks for the details. I think that most of the contents of the
included pdf file should go into the docs and/or into the script help.

As for the script itself, the first thing to do would be to fix all
warnings reported by pylint:

$ pylint --enable=all mlx5_trace.py

After that, I have a few general remarks:

* do not use global variables except for constants
* most of the time, there is no need to use sys.exit() explicitly
* print errors on stderr
* remember that python has exceptions, it makes error handling easier
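
A minimal sketch of a script skeleton following these remarks is shown
below. The names TraceError and analyze() are hypothetical placeholders, not
taken from mlx5_trace.py, and the analysis itself is left out.

```python
import argparse
import sys


class TraceError(Exception):
    """Raised when the trace cannot be processed."""


def analyze(path):
    """Placeholder for the real analysis; returns a printable report."""
    if not path:
        raise TraceError("empty trace path")
    return f"analyzed {path}"


def main(argv=None):
    """Parse arguments, run the analysis and return an exit status."""
    parser = argparse.ArgumentParser(description="Analyze a trace.")
    parser.add_argument("trace", help="path to the trace data")
    args = parser.parse_args(argv)
    try:
        print(analyze(args.trace))
    except TraceError as err:
        # Errors go to stderr; all state stays local, no globals needed.
        print(f"error: {err}", file=sys.stderr)
        return 1
    return 0
```

The only place that needs an explicit exit is then a single
`sys.exit(main())` under the usual `if __name__ == "__main__":` guard.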

I would also advise formatting your code with [black][1] so that you
don't have to bother about coding style.

[1]: https://github.com/psf/black

Feel free to take inspiration from the general structure present in some
of the scripts that I have written:

* usertools/dpdk-pmdinfo.py
* usertools/dpdk-rss-flows.py (not yet applied,
  http://patches.dpdk.org/project/dpdk/patch/20230628134748.117697-3-rjarry@redhat.com/)

Cheers,
Robin