[v3,00/12] mldev: introduce machine learning device library

Message ID: 20230207151316.835441-1-jerinj@marvell.com (mailing list archive)

Jerin Jacob Feb. 7, 2023, 3:13 p.m. UTC
From: Jerin Jacob <jerinj@marvell.com>

Machine learning inference library
==================================

Definition of machine learning inference
----------------------------------------
Inference in machine learning is the process of making an output prediction
based on new input data using a pre-trained machine learning model.

The scope of this RFC includes only inferencing with pre-trained machine learning
models; training and building/compiling ML models are out of scope for this RFC
and the DPDK mldev API. Use existing machine learning compiler frameworks for
model creation.

Motivation for the new library
------------------------------
Multiple semiconductor vendors are offering accelerator products such as DPUs
(often called SmartNICs), FPGAs, GPUs, etc., which have ML inferencing capabilities
integrated as part of the product. Use of ML inferencing is increasing in the
packet processing domain for flow classification, intrusion, malware and anomaly detection.

Lack of inferencing support through DPDK APIs will involve complexities and
increased latency from moving data across frameworks (i.e., dataplane to
non-dataplane ML frameworks and vice-versa). A standardized DPDK API for ML
inferencing would enable dataplane solutions to harness the benefit of inline
inferencing supported by the hardware.

Contents
--------
A) API specification for:

1) Discovery of ML capabilities (e.g., device specific features) in a vendor
independent fashion
2) Definition of functions to handle ML devices, which includes probing,
initialization and termination of the devices.
3) Definition of functions to handle ML models used to perform inference operations.
4) Definition of functions to handle quantize and dequantize operations

B) Common code for above specification

rfc..v1:
- Added programmer guide documentation
- Added implementation for common code

v2..v1:
- Moved dynamic log (Stephen)
- model id to uint16_t from int16_t (Stephen)
- added release note updates

v3..v2:
- Introduced rte_ml_dev_init() similar to rte_gpu_init() (Stephen, Thomas)
- In struct rte_ml_dev_data, removed reserved[3] and __rte_cache_aligned.
Also, moved name field to the end (Stephen)
 
Machine learning library framework
----------------------------------

The ML framework is built on the following model:


    +-----------------+               rte_ml_[en|de]queue_burst()
    |                 |                          |
    |     Machine     o------+     +--------+    |
    |     Learning    |      |     | queue  |    |    +------+
    |     Inference   o------+-----o        |<===o===>|Core 0|
    |     Engine      |      |     | pair 0 |         +------+
    |                 o----+ |     +--------+
    |                 |    | |
    +-----------------+    | |     +--------+
             ^             | |     | queue  |         +------+
             |             | +-----o        |<=======>|Core 1|
             |             |       | pair 1 |         +------+
             |             |       +--------+
    +--------+--------+    |
    | +-------------+ |    |       +--------+
    | |   Model 0   | |    |       | queue  |         +------+
    | +-------------+ |    +-------o        |<=======>|Core N|
    | +-------------+ |            | pair N |         +------+
    | |   Model 1   | |            +--------+
    | +-------------+ |
    | +-------------+ |<------- rte_ml_model_load()
    | |   Model ..  | |-------> rte_ml_model_info()
    | +-------------+ |<------- rte_ml_model_start()
    | +-------------+ |<------- rte_ml_model_stop()
    | |   Model N   | |<------- rte_ml_model_params_update()
    | +-------------+ |<------- rte_ml_model_unload()
    +-----------------+

ML Device: A hardware or software-based implementation of the ML device API for
running inferences using a pre-trained ML model.

ML Model: An ML model is an algorithm trained over a dataset. A model consists of
the procedure/algorithm and the data/pattern required to make predictions on live data.
Once the model is created and trained outside of the DPDK scope, it can be loaded
via rte_ml_model_load() and then started using the rte_ml_model_start() API.
rte_ml_model_params_update() can be used to update model parameters, such as weights
and bias, without unloading the model via rte_ml_model_unload().
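
A rough sketch of this model life cycle is shown below. The buffer names are
placeholders, and the rte_ml_model_params_update() argument list is an assumption
(it is not spelled out in this cover letter):

    /* Sketch of the model life cycle; 'model_buf'/'model_size' and
     * 'new_params' are application-provided placeholder buffers.
     */
    struct rte_ml_model_params params;
    uint16_t model_id;

    params.addr = model_buf;                   /* pre-trained model binary */
    params.size = model_size;
    rte_ml_model_load(0, &params, &model_id);  /* driver assigns model_id */
    rte_ml_model_start(0, model_id);           /* model ready for inference */

    /* ... enqueue/dequeue inference jobs ... */

    rte_ml_model_stop(0, model_id);                       /* stop before update */
    rte_ml_model_params_update(0, model_id, new_params);  /* assumed signature */
    rte_ml_model_start(0, model_id);                      /* resume inference */

    rte_ml_model_stop(0, model_id);
    rte_ml_model_unload(0, model_id);                     /* release the model slot */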

ML Inference: ML inference is the process of feeding data to the model via the
rte_ml_enqueue_burst() API and using the rte_ml_dequeue_burst() API to get the
calculated outputs/predictions from the started model.
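
A minimal enqueue/poll sketch of this flow follows. The op status field and the
rte_ml_op_error_get() call are assumptions based on the error-information patch
in this series; they are not detailed in this cover letter:

    /* 'op' points to an already filled rte_ml_op (see the example below). */
    struct rte_ml_op *op_done;

    while (rte_ml_enqueue_burst(0 /* dev */, 0 /* qp */, &op, 1) != 1)
        ;                                  /* retry if the queue pair is full */

    while (rte_ml_dequeue_burst(0, 0, &op_done, 1) != 1)
        ;                                  /* poll until the job completes */

    if (op_done->status != RTE_ML_OP_STATUS_SUCCESS) {
        struct rte_ml_op_error err;        /* assumed error-query structure */
        rte_ml_op_error_get(0, op_done, &err);
    }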

In all functions of the ML device API, the ML device is designated by an
integer >= 0 named the device identifier *dev_id*.

The functions exported by the ML device API to set up a device designated by
its device identifier must be invoked in the following order:

     - rte_ml_dev_configure()
     - rte_ml_dev_queue_pair_setup()
     - rte_ml_dev_start()

A model is required to run inference operations with the user-specified inputs.
The application needs to invoke the ML model APIs in the following order before
queueing inference jobs.

     - rte_ml_model_load()
     - rte_ml_model_start()

The rte_ml_model_info() API is provided to retrieve information related to the model,
including the shape and type of the input and output required for the inference.

Data quantization and dequantization are key aspects of the ML domain. This involves
conversion of the input data from a higher precision to a lower precision data type,
and vice-versa for the output. APIs are provided to enable quantization through
rte_ml_io_quantize() and dequantization through rte_ml_io_dequantize(). These APIs
can handle input and output buffers holding data for multiple batches.
Two utility APIs, rte_ml_io_input_size_get() and rte_ml_io_output_size_get(), can be
used to get the size of quantized and dequantized multi-batch input and output buffers.
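
A sketch of how these I/O helpers fit together is shown below. The argument order
of rte_ml_io_quantize()/rte_ml_io_dequantize() is an assumption (device, model,
batches, source buffer, destination buffer), since only the size-query helpers are
shown elsewhere in this cover letter; 'model_id' and 'info' are assumed to come
from rte_ml_model_load() and the model info query, and all buffers are placeholders:

    uint64_t q_in_sz, d_in_sz, q_out_sz, d_out_sz;

    /* Query quantized (q_*) and dequantized (d_*) buffer sizes. */
    rte_ml_io_input_size_get(0, model_id, info.batch_size, &q_in_sz, &d_in_sz);
    rte_ml_io_output_size_get(0, model_id, info.batch_size, &q_out_sz, &d_out_sz);

    /* Convert application data to the model's quantized input format. */
    rte_ml_io_quantize(0, model_id, info.batch_size, d_input, q_input);

    /* ... run inference on q_input, producing q_output ... */

    /* Convert the quantized output back to the application data type. */
    rte_ml_io_dequantize(0, model_id, info.batch_size, q_output, d_output);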

Users can optionally update the model parameters with rte_ml_model_params_update()
after invoking the rte_ml_model_stop() API on a given model ID.

The application can invoke, in any order, the functions exported by the ML API to
enqueue inference jobs and dequeue inference responses.

If the application wants to change the device configuration (i.e., call
rte_ml_dev_configure() or rte_ml_dev_queue_pair_setup()), then the application must
stop the device using the rte_ml_dev_stop() API. Likewise, if model parameters need
to be updated, the application must call rte_ml_model_stop() followed by
rte_ml_model_params_update() for the given model. The application does not need to
call rte_ml_dev_stop() for any model re-configuration, such as
rte_ml_model_params_update(), rte_ml_model_unload(), etc.
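
These rules translate into the following sketch: device-level changes require a
device stop, while a parameter update only requires stopping the affected model.
The configuration values and the 'new_params' buffer are placeholders, and the
rte_ml_model_params_update() argument list is assumed as noted earlier:

    /* Device-level reconfiguration: the device must be stopped first. */
    rte_ml_dev_stop(0);
    config.nb_queue_pairs = 2;                 /* example change */
    rte_ml_dev_configure(0, &config);
    rte_ml_dev_queue_pair_setup(0, 1, &qp_conf, config.socket_id);
    rte_ml_dev_start(0);

    /* Model-level reconfiguration: only the model is stopped. */
    rte_ml_model_stop(0, model_id);
    rte_ml_model_params_update(0, model_id, new_params);
    rte_ml_model_start(0, model_id);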

Once the device is in the started state after invoking the rte_ml_dev_start() API,
and the model is in the started state after invoking the rte_ml_model_start() API,
the application can call rte_ml_enqueue_burst() and rte_ml_dequeue_burst() on the
intended device and model ID.

Finally, an application can close an ML device by invoking the rte_ml_dev_close() function.

Typical application utilisation of the ML API will follow the programming flow
below.

- rte_ml_dev_configure()
- rte_ml_dev_queue_pair_setup()
- rte_ml_model_load()
- rte_ml_model_start()
- rte_ml_model_info()
- rte_ml_dev_start()
- rte_ml_enqueue_burst()
- rte_ml_dequeue_burst()
- rte_ml_model_stop()
- rte_ml_model_unload()
- rte_ml_dev_stop()
- rte_ml_dev_close()

Regarding multi-threading, by default, all the functions of the ML device API exported
by a PMD are lock-free functions which are assumed not to be invoked in parallel on
different logical cores to work on the same target object. For instance, the dequeue
function of a poll mode driver cannot be invoked in parallel on two logical cores to
operate on the same queue pair. Of course, this function can be invoked in parallel by
different logical cores on different queue pairs. It is the responsibility of the user
application to enforce this rule.
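
One way to honour this rule is to give every worker lcore its own queue pair, as in
the hypothetical sketch below (the worker function, its argument, the burst size and
the 'force_quit' flag are illustrative placeholders):

    /* Each worker owns exactly one queue pair, so no locking is needed. */
    static int
    ml_worker(void *arg)
    {
        uint16_t qp_id = *(uint16_t *)arg;   /* unique queue pair per worker */
        struct rte_ml_op *ops[32];
        uint16_t nb;

        while (!force_quit) {                /* assumed application flag */
            nb = rte_ml_dequeue_burst(0, qp_id, ops, 32);
            /* ... post-process 'nb' completed ops, enqueue new ones ... */
        }

        return 0;
    }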

Example application usage for ML inferencing
--------------------------------------------
This example application demonstrates the programming model of the ML device
library. The example omits error checks to simplify the application. It also
assumes that the input data received is quantized and that the expected output is
quantized as well. In order to handle non-quantized inputs and outputs, users can
invoke rte_ml_io_quantize() or rte_ml_io_dequantize() for data type conversions.

#define ML_MODEL_NAME "model"
#define IO_MZ "io_mz"

struct app_ctx {
	char model_file[PATH_MAX];
	char inp_file[PATH_MAX];
	char out_file[PATH_MAX];

	struct rte_ml_model_params params;
	struct rte_ml_model_info info;
	uint16_t id;

	uint64_t input_size;
	uint64_t output_size;
	uint8_t *input_buffer;
	uint8_t *output_buffer;
} __rte_cache_aligned;

struct app_ctx ctx;

static int
parse_args(int argc, char **argv)
{
	int opt, option_index;
	static struct option lgopts[] = {{"model", required_argument, NULL, 'm'},
					 {"input", required_argument, NULL, 'i'},
					 {"output", required_argument, NULL, 'o'},
					 {NULL, 0, NULL, 0}};

	while ((opt = getopt_long(argc, argv, "m:i:o:", lgopts, &option_index)) != EOF)
		switch (opt) {
		case 'm':
			strncpy(ctx.model_file, optarg, PATH_MAX - 1);
			break;
		case 'i':
			strncpy(ctx.inp_file, optarg, PATH_MAX - 1);
			break;
		case 'o':
			strncpy(ctx.out_file, optarg, PATH_MAX - 1);
			break;
		default:
			return -1;
		}

	return 0;
}

int
main(int argc, char **argv)
{
	struct rte_ml_dev_qp_conf qp_conf;
	struct rte_ml_dev_config config;
	struct rte_ml_dev_info dev_info;
	const struct rte_memzone *mz;
	struct rte_mempool *op_pool;
	struct rte_ml_op *op_enq;
	struct rte_ml_op *op_deq;

	FILE *fp;
	int rc;

	/* Initialize EAL */
	rc = rte_eal_init(argc, argv);
	if (rc < 0)
		rte_exit(EXIT_FAILURE, "Invalid EAL arguments\n");
	argc -= rc;
	argv += rc;

	/* Parse application arguments (after the EAL args) */
	if (parse_args(argc, argv) < 0)
		rte_exit(EXIT_FAILURE, "Invalid application arguments\n");

	/* Step 1: Check for ML devices */
	if (rte_ml_dev_count() <= 0)
		rte_exit(EXIT_FAILURE, "Failed to find ML devices\n");

	/* Step 2: Get device info */
	if (rte_ml_dev_info_get(0, &dev_info) != 0)
		rte_exit(EXIT_FAILURE, "Failed to get device info\n");

	/* Step 3: Configure ML device, use device 0 */
	config.socket_id = rte_ml_dev_socket_id(0);
	config.max_nb_models = dev_info.max_models;
	config.nb_queue_pairs = dev_info.max_queue_pairs;
	if (rte_ml_dev_configure(0, &config) != 0)
		rte_exit(EXIT_FAILURE, "Device configuration failed\n");

	/* Step 4: Setup queue pairs, used qp_id = 0 */
	qp_conf.nb_desc = 1;
	if (rte_ml_dev_queue_pair_setup(0, 0, &qp_conf, config.socket_id) != 0)
		rte_exit(EXIT_FAILURE, "Queue-pair setup failed\n");

	/* Step 5: Start device */
	if (rte_ml_dev_start(0) != 0)
		rte_exit(EXIT_FAILURE, "Device start failed\n");

	/* Step 6: Read model data and update load params structure */
	fp = fopen(ctx.model_file, "r+");
	if (fp == NULL)
		rte_exit(EXIT_FAILURE, "Failed to open model file\n");

	fseek(fp, 0, SEEK_END);
	ctx.params.size = ftell(fp);
	fseek(fp, 0, SEEK_SET);

	ctx.params.addr = malloc(ctx.params.size);
	if (fread(ctx.params.addr, 1, ctx.params.size, fp) != ctx.params.size) {
		fclose(fp);
		rte_exit(EXIT_FAILURE, "Failed to read model\n");
	}
	fclose(fp);
	strcpy(ctx.params.name, ML_MODEL_NAME);

	/* Step 7: Load the model */
	if (rte_ml_model_load(0, &ctx.params, &ctx.id) != 0)
		rte_exit(EXIT_FAILURE, "Failed to load model\n");
	free(ctx.params.addr);

	/* Step 8: Start the model */
	if (rte_ml_model_start(0, ctx.id) != 0)
		rte_exit(EXIT_FAILURE, "Failed to start model\n");

	/* Step 9: Allocate buffers for quantized input and output */

	/* Get model information */
	if (rte_ml_model_info_get(0, ctx.id, &ctx.info) != 0)
		rte_exit(EXIT_FAILURE, "Failed to get model info\n");

	/* Get the buffer size for input and output */
	rte_ml_io_input_size_get(0, ctx.id, ctx.info.batch_size, &ctx.input_size, NULL);
	rte_ml_io_output_size_get(0, ctx.id, ctx.info.batch_size, &ctx.output_size, NULL);

	mz = rte_memzone_reserve(IO_MZ, ctx.input_size + ctx.output_size, config.socket_id, 0);
	if (mz == NULL)
		rte_exit(EXIT_FAILURE, "Failed to create IO memzone\n");

	ctx.input_buffer = mz->addr;
	ctx.output_buffer = ctx.input_buffer + ctx.input_size;

	/* Step 10: Fill the input data */
	fp = fopen(ctx.inp_file, "r+");
	if (fp == NULL)
		rte_exit(EXIT_FAILURE, "Failed to open input file\n");

	if (fread(ctx.input_buffer, 1, ctx.input_size, fp) != ctx.input_size) {
		fclose(fp);
		rte_exit(EXIT_FAILURE, "Failed to read input file\n");
	}
	fclose(fp);

	/* Step 11: Create ML op mempool */
	op_pool = rte_ml_op_pool_create("ml_op_pool", 1, 0, 0, config.socket_id);
	if (op_pool == NULL)
		rte_exit(EXIT_FAILURE, "Failed to create op pool\n");

	/* Step 12: Form an ML op */
	rte_mempool_get_bulk(op_pool, (void **)&op_enq, 1);
	op_enq->model_id = ctx.id;
	op_enq->nb_batches = ctx.info.batch_size;
	op_enq->mempool = op_pool;
	op_enq->input.addr = ctx.input_buffer;
	op_enq->input.length = ctx.input_size;
	op_enq->input.next = NULL;
	op_enq->output.addr = ctx.output_buffer;
	op_enq->output.length = ctx.output_size;
	op_enq->output.next = NULL;

	/* Step 13: Enqueue jobs */
	rte_ml_enqueue_burst(0, 0, &op_enq, 1);

	/* Step 14: Dequeue jobs and release op pool */
	while (rte_ml_dequeue_burst(0, 0, &op_deq, 1) != 1)
		;

	/* Step 15: Write output */
	fp = fopen(ctx.out_file, "w+");
	if (fp == NULL)
		rte_exit(EXIT_FAILURE, "Failed to open output file\n");
	fwrite(ctx.output_buffer, 1, ctx.output_size, fp);
	fclose(fp);

	/* Step 16: Clean up */
	/* Stop ML model */
	rte_ml_model_stop(0, ctx.id);
	/* Unload ML model */
	rte_ml_model_unload(0, ctx.id);
	/* Free input/output memory */
	rte_memzone_free(rte_memzone_lookup(IO_MZ));
	/* Free the ml op back to pool */
	rte_mempool_put_bulk(op_pool, (void **)&op_deq, 1);
	/* Free ml op pool */
	rte_mempool_free(op_pool);
	/* Stop the device */
	rte_ml_dev_stop(0);
	rte_ml_dev_close(0);
	rte_eal_cleanup();

	return 0;
}


Jerin Jacob (1):
  mldev: introduce machine learning device library

Srikanth Yalavarthi (11):
  mldev: support PMD functions for ML device
  mldev: support ML device handling functions
  mldev: support ML device queue-pair setup
  mldev: support handling ML models
  mldev: support input and output data handling
  mldev: support ML op pool and ops
  mldev: support inference enqueue and dequeue
  mldev: support device statistics
  mldev: support device extended statistics
  mldev: support to retrieve error information
  mldev: support to get debug info and test device

 MAINTAINERS                              |    5 +
 doc/api/doxy-api-index.md                |    1 +
 doc/api/doxy-api.conf.in                 |    1 +
 doc/guides/prog_guide/img/mldev_flow.svg |  714 ++++++++++++++
 doc/guides/prog_guide/index.rst          |    1 +
 doc/guides/prog_guide/mldev.rst          |  186 ++++
 doc/guides/rel_notes/release_23_03.rst   |    5 +
 lib/meson.build                          |    1 +
 lib/mldev/meson.build                    |   27 +
 lib/mldev/rte_mldev.c                    |  947 ++++++++++++++++++
 lib/mldev/rte_mldev.h                    | 1119 ++++++++++++++++++++++
 lib/mldev/rte_mldev_core.h               |  717 ++++++++++++++
 lib/mldev/rte_mldev_pmd.c                |   62 ++
 lib/mldev/rte_mldev_pmd.h                |  151 +++
 lib/mldev/version.map                    |   51 +
 15 files changed, 3988 insertions(+)
 create mode 100644 doc/guides/prog_guide/img/mldev_flow.svg
 create mode 100644 doc/guides/prog_guide/mldev.rst
 create mode 100644 lib/mldev/meson.build
 create mode 100644 lib/mldev/rte_mldev.c
 create mode 100644 lib/mldev/rte_mldev.h
 create mode 100644 lib/mldev/rte_mldev_core.h
 create mode 100644 lib/mldev/rte_mldev_pmd.c
 create mode 100644 lib/mldev/rte_mldev_pmd.h
 create mode 100644 lib/mldev/version.map
  

Comments

Ferruh Yigit Feb. 15, 2023, 12:55 p.m. UTC | #1
On 2/7/2023 3:13 PM, jerinj@marvell.com wrote:
> From: Jerin Jacob <jerinj@marvell.com>
> 

Hi Jerin,

Please find some comments/questions gathered with the help of some
colleagues.

> Machine learning inference library
> ==================================
> 
> Definition of machine learning inference
> ----------------------------------------
> Inference in machine learning is the process of making an output prediction
> based on new input data using a pre-trained machine learning model.
> 
> The scope of the RFC would include only inferencing with pre-trained machine learning models,
> training and building/compiling the ML models is out of scope for this RFC or
> DPDK mldev API. Use existing machine learning compiler frameworks for model creation.
> 
> Motivation for the new library
> ------------------------------
> Multiple semiconductor vendors are offering accelerator products such as DPU
> (often called Smart-NIC), FPGA, GPU, etc., which have ML inferencing capabilities
> integrated as part of the product. Use of ML inferencing is increasing in the domain
> of packet processing for flow classification, intrusion, malware and anomaly detection.
> 

Agree on this need.

> Lack of inferencing support through DPDK APIs will involve complexities and
> increased latency from moving data across frameworks (i.e, dataplane to
> non dataplane ML frameworks and vice-versa). Having a standardized DPDK APIs for ML
> inferencing would enable the dataplane solutions to harness the benefit of inline
> inferencing supported by the hardware.
> 

ack

> Contents
> ---------------
> A) API specification for:
> 
> 1) Discovery of ML capabilities (e.g., device specific features) in a vendor
> independent fashion
> 2) Definition of functions to handle ML devices, which includes probing,
> initialization and termination of the devices.
> 3) Definition of functions to handle ML models used to perform inference operations.
> 4) Definition of function to handle quantize and dequantize operations
> 
> B) Common code for above specification
> 
> rfc..v1:
> - Added programmer guide documentation
> - Added implementation for common code
> 
> v2..v1:
> - Moved dynamic log (Stephen)
> - model id to uint16_t from int16t_t (Stephen)
> - added release note updates
> 
> v3..v2:
> - Introduced rte_ml_dev_init() similar to rte_gpu_init() (Stephen, Thomas)
> - In struct rte_ml_dev_data, removed reserved[3] and   __rte_cache_aligned.
> Also, moved name field to the end(Stephen)
>  
> Machine learning library framework
> ----------------------------------
> 
> The ML framework is built on the following model:
> 
> 
>     +-----------------+               rte_ml_[en|de]queue_burst()
>     |                 |                          |
>     |     Machine     o------+     +--------+    |
>     |     Learning    |      |     | queue  |    |    +------+
>     |     Inference   o------+-----o        |<===o===>|Core 0|
>     |     Engine      |      |     | pair 0 |         +------+
>     |                 o----+ |     +--------+
>     |                 |    | |
>     +-----------------+    | |     +--------+
>              ^             | |     | queue  |         +------+
>              |             | +-----o        |<=======>|Core 1|
>              |             |       | pair 1 |         +------+
>              |             |       +--------+
>     +--------+--------+    |
>     | +-------------+ |    |       +--------+
>     | |   Model 0   | |    |       | queue  |         +------+
>     | +-------------+ |    +-------o        |<=======>|Core N|
>     | +-------------+ |            | pair N |         +------+
>     | |   Model 1   | |            +--------+
>     | +-------------+ |
>     | +-------------+ |<------- rte_ml_model_load()
>     | |   Model ..  | |-------> rte_ml_model_info()
>     | +-------------+ |<------- rte_ml_model_start()
>     | +-------------+ |<------- rte_ml_model_stop()
>     | |   Model N   | |<------- rte_ml_model_params_update()
>     | +-------------+ |<------- rte_ml_model_unload()
>     +-----------------+
> 


Should model load/unload and params_update be part of DPDK, or can DPDK
assume these are already in place? For FPGA both options work; what is
the benefit of having these APIs as part of DPDK? What are the use cases
for other architectures?


Are multiple active models supported at the same time?
For the FPGA case multiple models may exist at the same time; it would be
good to have a way to select the model to use, like a model handle that
the API accepts.
Similarly, a model handle may help with chaining models, possibly with the
help of additional APIs to define the chaining.

> ML Device: A hardware or software-based implementation of ML device API for
> running inferences using a pre-trained ML model.
> 

Can this device consume multiple queues in parallel?

> ML Model: An ML model is an algorithm trained over a dataset. A model consists of
> procedure/algorithm and data/pattern required to make predictions on live data.
> Once the model is created and trained outside of the DPDK scope, the model can be loaded
> via rte_ml_model_load() and then start it using rte_ml_model_start() API.
> The rte_ml_model_params_update() can be used to update the model parameters such as weight
> and bias without unloading the model using rte_ml_model_unload().
> 
> ML Inference: ML inference is the process of feeding data to the model via
> rte_ml_enqueue_burst() API and use rte_ml_dequeue_burst() API to get the calculated
> outputs/predictions from the started model.
> 
> In all functions of the ML device API, the ML device is designated by an
> integer >= 0 named as device identifier *dev_id*.
> 
> The functions exported by the ML device API to setup a device designated by
> its device identifier must be invoked in the following order:
> 
>      - rte_ml_dev_configure()
>      - rte_ml_dev_queue_pair_setup()
>      - rte_ml_dev_start()
> 
> A model is required to run the inference operations with the user specified inputs.
> Application needs to invoke the ML model API in the following order before queueing
> inference jobs.
> 
>      - rte_ml_model_load()
>      - rte_ml_model_start()
> 
> The rte_ml_model_info() API is provided to retrieve the information related to the model.
> The information would include the shape and type of input and output required for the inference.
> 

It seems there is a standardization effort for model description, called
ONNX (https://onnx.ai/), supported by many vendors.

Does it make sense that 'rte_ml_model_info()' describes the data, and
perhaps the model itself too, using the ONNX format?

> Data quantization and dequantization is one of the main aspects in ML domain. This involves
> conversion of input data from a higher precision to a lower precision data type and vice-versa
> for the output. APIs are provided to enable quantization through rte_ml_io_quantize() and
> dequantization through rte_ml_io_dequantize(). These APIs have the capability to handle input
> and output buffers holding data for multiple batches.
> Two utility APIs rte_ml_io_input_size_get() and rte_ml_io_output_size_get() can used to get the
> size of quantized and de-quantized multi-batch input and output buffers.
> 


It seems quantize and dequantize can be part of the model and optimized
during training; can you please share some information on the HW
architectures that need these APIs?

Does it make sense to have quantize/dequantize as a capability, so that
when the HW has specific support for it this can be used, and otherwise
the host can provide this functionality?

> User can optionally update the model parameters with rte_ml_model_params_update() after
> invoking rte_ml_model_stop() API on a given model ID.
> 
> The application can invoke, in any order, the functions exported by the ML API to enqueue
> inference jobs and dequeue inference response.
> 
> If the application wants to change the device configuration (i.e., call
> rte_ml_dev_configure() or rte_ml_dev_queue_pair_setup()), then application must stop the
> device using rte_ml_dev_stop() API. Likewise, if model parameters need to be updated then
> the application must call rte_ml_model_stop() followed by rte_ml_model_params_update() API
> for the given model. The application does not need to call rte_ml_dev_stop() API for
> any model re-configuration such as rte_ml_model_params_update(), rte_ml_model_unload() etc.
> 
> Once the device is in the start state after invoking rte_ml_dev_start() API and the model is in
> start state after invoking rte_ml_model_start() API, then the application can call
> rte_ml_enqueue() and rte_ml_dequeue() API on the destined device and model ID.
> 
> Finally, an application can close an ML device by invoking the rte_ml_dev_close() function.
> 
> Typical application utilisation of the ML API will follow the following
> programming flow.
> 
> - rte_ml_dev_configure()
> - rte_ml_dev_queue_pair_setup()
> - rte_ml_model_load()
> - rte_ml_model_start()
> - rte_ml_model_info()
> - rte_ml_dev_start()
> - rte_ml_enqueue_burst()
> - rte_ml_dequeue_burst()
> - rte_ml_model_stop()
> - rte_ml_model_unload()
> - rte_ml_dev_stop()
> - rte_ml_dev_close()
> 

Is a 'reset()' API needed?

> [...]
  
Jerin Jacob Feb. 15, 2023, 5:03 p.m. UTC | #2
On Wed, Feb 15, 2023 at 6:25 PM Ferruh Yigit <ferruh.yigit@amd.com> wrote:
>
> On 2/7/2023 3:13 PM, jerinj@marvell.com wrote:
> > From: Jerin Jacob <jerinj@marvell.com>
> >
>
> Hi Jerin,
>
> Please find some comments/questions gathered with the help of some
> collegues.


Thanks Ferruh for the review.


>
> > Machine learning inference library
> > ==================================
> >
> > Definition of machine learning inference
> > ----------------------------------------
> > Inference in machine learning is the process of making an output prediction
> > based on new input data using a pre-trained machine learning model.
> >
> > The scope of the RFC would include only inferencing with pre-trained machine learning models,
> > training and building/compiling the ML models is out of scope for this RFC or
> > DPDK mldev API. Use existing machine learning compiler frameworks for model creation.
> >
> > Motivation for the new library
> > ------------------------------
> > Multiple semiconductor vendors are offering accelerator products such as DPU
> > (often called Smart-NIC), FPGA, GPU, etc., which have ML inferencing capabilities
> > integrated as part of the product. Use of ML inferencing is increasing in the domain
> > of packet processing for flow classification, intrusion, malware and anomaly detection.
> >
>
> Agree on this need.
>
> > Lack of inferencing support through DPDK APIs will involve complexities and
> > increased latency from moving data across frameworks (i.e, dataplane to
> > non dataplane ML frameworks and vice-versa). Having a standardized DPDK APIs for ML
> > inferencing would enable the dataplane solutions to harness the benefit of inline
> > inferencing supported by the hardware.
> >
>
> ack
>
> > Contents
> > ---------------
> > A) API specification for:
> >
> > 1) Discovery of ML capabilities (e.g., device specific features) in a vendor
> > independent fashion
> > 2) Definition of functions to handle ML devices, which includes probing,
> > initialization and termination of the devices.
> > 3) Definition of functions to handle ML models used to perform inference operations.
> > 4) Definition of function to handle quantize and dequantize operations
> >
> > B) Common code for above specification
> >
> > rfc..v1:
> > - Added programmer guide documentation
> > - Added implementation for common code
> >
> > v2..v1:
> > - Moved dynamic log (Stephen)
> > - model id to uint16_t from int16t_t (Stephen)
> > - added release note updates
> >
> > v3..v2:
> > - Introduced rte_ml_dev_init() similar to rte_gpu_init() (Stephen, Thomas)
> > - In struct rte_ml_dev_data, removed reserved[3] and   __rte_cache_aligned.
> > Also, moved name field to the end(Stephen)
> >
> > Machine learning library framework
> > ----------------------------------
> >
> > The ML framework is built on the following model:
> >
> >
> >     +-----------------+               rte_ml_[en|de]queue_burst()
> >     |                 |                          |
> >     |     Machine     o------+     +--------+    |
> >     |     Learning    |      |     | queue  |    |    +------+
> >     |     Inference   o------+-----o        |<===o===>|Core 0|
> >     |     Engine      |      |     | pair 0 |         +------+
> >     |                 o----+ |     +--------+
> >     |                 |    | |
> >     +-----------------+    | |     +--------+
> >              ^             | |     | queue  |         +------+
> >              |             | +-----o        |<=======>|Core 1|
> >              |             |       | pair 1 |         +------+
> >              |             |       +--------+
> >     +--------+--------+    |
> >     | +-------------+ |    |       +--------+
> >     | |   Model 0   | |    |       | queue  |         +------+
> >     | +-------------+ |    +-------o        |<=======>|Core N|
> >     | +-------------+ |            | pair N |         +------+
> >     | |   Model 1   | |            +--------+
> >     | +-------------+ |
> >     | +-------------+ |<------- rte_ml_model_load()
> >     | |   Model ..  | |-------> rte_ml_model_info()
> >     | +-------------+ |<------- rte_ml_model_start()
> >     | +-------------+ |<------- rte_ml_model_stop()
> >     | |   Model N   | |<------- rte_ml_model_params_update()
> >     | +-------------+ |<------- rte_ml_model_unload()
> >     +-----------------+
> >
>
>
> Should model load/unload, params_update be part of dpdk, or dpdk can
> assume these are already in place.

The driver hooks can be NOPs if the model is already loaded or for
fixed-model FPGA solutions.
Probably we can add a parameter info_get() in case someone thinks the
user needs to be aware of it.
Currently it is an experimental API; when such a device comes we can
extend the rte_ml_dev_info structure if/as needed.


> For FPGA both options works, what is
> the benefit to have these APIs part of DPDK?

Support for runtime model load/unload if the ML device supports it.
It also has some bearing on the data-path, as inference
needs to be stopped if one needs to unload the model.

> What are usecases for other  architectures?

When a device supports a maximum of N models (rte_ml_dev_info::max_models),
a user can replace an unused model at runtime when the maximum number of
models is reached.

>
>
> Is multiple active models at same time supported?
> For FPGA case multiple models may exist at same time, it would be good
> to have a way to select the model to use, like a handle for model that
> API accepts.
> Similarly a handle for model may help chaining models, possibly with
> help of additional APIs to define the chaining.

Yes, multiple active models are supported simultaneously.
Each loaded model has a unique model_id assigned by the driver,
which can be used as a handle while queueing inference requests or
doing slow-path operations.


>
> > ML Device: A hardware or software-based implementation of ML device API for
> > running inferences using a pre-trained ML model.
> >
>
> Can this device consume multiple queues in parallel?

Yes, the spec doesn't impose any restrictions on the number of queues that
can be consumed by the device.
The actual number of queues permitted depends on the device;
see rte_ml_dev_info::max_queue_pairs.

>
> > ML Model: An ML model is an algorithm trained over a dataset. A model consists of
> > procedure/algorithm and data/pattern required to make predictions on live data.
> > Once the model is created and trained outside of the DPDK scope, the model can be loaded
> > via rte_ml_model_load() and then start it using rte_ml_model_start() API.
> > The rte_ml_model_params_update() can be used to update the model parameters such as weight
> > and bias without unloading the model using rte_ml_model_unload().
> >
> > ML Inference: ML inference is the process of feeding data to the model via
> > rte_ml_enqueue_burst() API and use rte_ml_dequeue_burst() API to get the calculated
> > outputs/predictions from the started model.
> >
> > In all functions of the ML device API, the ML device is designated by an
> > integer >= 0 named as device identifier *dev_id*.
> >
> > The functions exported by the ML device API to setup a device designated by
> > its device identifier must be invoked in the following order:
> >
> >      - rte_ml_dev_configure()
> >      - rte_ml_dev_queue_pair_setup()
> >      - rte_ml_dev_start()
> >
> > A model is required to run the inference operations with the user specified inputs.
> > Application needs to invoke the ML model API in the following order before queueing
> > inference jobs.
> >
> >      - rte_ml_model_load()
> >      - rte_ml_model_start()
> >
> > The rte_ml_model_info() API is provided to retrieve the information related to the model.
> > The information would include the shape and type of input and output required for the inference.
> >
>
> It seems there is a sandardization effort for model description, called
> ONNX (https://onnx.ai/), supported by many vendors.
>
> Does it make sense that 'rte_ml_model_info()' describes data, and
> perhaps model itself too, using onnx format?


The ONNX format presents a higher number of details, which may not be
required from a dataplane point of view.
We have proposed an rte_ml_model_info struct with the most important
fields that are required. This structure can be expanded further as
needed.

>
> > Data quantization and dequantization is one of the main aspects in ML domain. This involves
> > conversion of input data from a higher precision to a lower precision data type and vice-versa
> > for the output. APIs are provided to enable quantization through rte_ml_io_quantize() and
> > dequantization through rte_ml_io_dequantize(). These APIs have the capability to handle input
> > and output buffers holding data for multiple batches.
> > Two utility APIs rte_ml_io_input_size_get() and rte_ml_io_output_size_get() can used to get the
> > size of quantized and de-quantized multi-batch input and output buffers.
> >
>
>
> It seems quantize and dequantize can be part of model and optimized
> during training, can you please some information HW architecture that
> needs these APIs?

Marvell HW engines don't support quantization/dequantization in HW, so it
has to be done by software on ARM64/x64 cores.
The same applies to SW-based ML devices.
So it can be a NOP for a device which already supports it, or we can
introduce the capability when such drivers get added to DPDK. There is no
issue in updating the API when a new driver comes.

>
> Does it make sense to have quantize/dequantize as a capability, like in
> case HW has specific support for it this can be used, else host can
> provide this functionality.

We tried to keep the APIs minimal in the initial version. Yes, a
quantize/dequantize capability can be part of the device capabilities
when such devices are added.

>
> > User can optionally update the model parameters with rte_ml_model_params_update() after
> > invoking rte_ml_model_stop() API on a given model ID.
> >
> > The application can invoke, in any order, the functions exported by the ML API to enqueue
> > inference jobs and dequeue inference response.
> >
> > If the application wants to change the device configuration (i.e., call
> > rte_ml_dev_configure() or rte_ml_dev_queue_pair_setup()), then application must stop the
> > device using rte_ml_dev_stop() API. Likewise, if model parameters need to be updated then
> > the application must call rte_ml_model_stop() followed by rte_ml_model_params_update() API
> > for the given model. The application does not need to call rte_ml_dev_stop() API for
> > any model re-configuration such as rte_ml_model_params_update(), rte_ml_model_unload() etc.
> >
> > Once the device is in the start state after invoking rte_ml_dev_start() API and the model is in
> > start state after invoking rte_ml_model_start() API, then the application can call
> > rte_ml_enqueue() and rte_ml_dequeue() API on the destined device and model ID.
> >
> > Finally, an application can close an ML device by invoking the rte_ml_dev_close() function.
> >
> > Typical application utilisation of the ML API will follow the following
> > programming flow.
> >
> > - rte_ml_dev_configure()
> > - rte_ml_dev_queue_pair_setup()
> > - rte_ml_model_load()
> > - rte_ml_model_start()
> > - rte_ml_model_info()
> > - rte_ml_dev_start()
> > - rte_ml_enqueue_burst()
> > - rte_ml_dequeue_burst()
> > - rte_ml_model_stop()
> > - rte_ml_model_unload()
> > - rte_ml_dev_stop()
> > - rte_ml_dev_close()
> >
>
> is a 'reset()' API needed?

We can add one in the future if a specific HW needs/supports it. Keeping
the bare minimum for the first version.

> [...]
  
Thomas Monjalon March 9, 2023, 5:33 p.m. UTC | #3
07/02/2023 16:13, jerinj@marvell.com:
> Jerin Jacob (1):
>   mldev: introduce machine learning device library
> 
> Srikanth Yalavarthi (11):
>   mldev: support PMD functions for ML device
>   mldev: support ML device handling functions
>   mldev: support ML device queue-pair setup
>   mldev: support handling ML models
>   mldev: support input and output data handling
>   mldev: support ML op pool and ops
>   mldev: support inference enqueue and dequeue
>   mldev: support device statistics
>   mldev: support device extended statistics
>   mldev: support to retrieve error information
>   mldev: support to get debug info and test device

Applied with a few styling improvements for the doc, thanks.

One more library in DPDK :)