[v1,00/12] mldev: introduce machine learning device library

Message ID 20221114120238.2143832-1-jerinj@marvell.com (mailing list archive)

Jerin Jacob Nov. 14, 2022, 12:02 p.m. UTC
From: Jerin Jacob <jerinj@marvell.com>

Machine learning inference library
==================================

Definition of machine learning inference
----------------------------------------
Inference in machine learning is the process of making an output prediction
based on new input data using a pre-trained machine learning model.

The scope of this RFC includes only inferencing with pre-trained machine learning models.
Training and building/compiling the ML models are out of scope for this RFC and the
DPDK mldev API. Use existing machine learning compiler frameworks for model creation.

Motivation for the new library
------------------------------
Multiple semiconductor vendors are offering accelerator products such as DPU
(often called Smart-NIC), FPGA, GPU, etc., which have ML inferencing capabilities
integrated as part of the product. Use of ML inferencing is increasing in the domain
of packet processing for flow classification, intrusion, malware and anomaly detection.

Lack of inferencing support through DPDK APIs would add complexity and
increase latency from moving data across frameworks (i.e., dataplane to
non-dataplane ML frameworks and vice versa). Having standardized DPDK APIs for ML
inferencing would enable dataplane solutions to harness the benefit of inline
inferencing supported by the hardware.

Contents
--------

A) API specification for:

1) Discovery of ML capabilities (e.g., device specific features) in a vendor
independent fashion
2) Definition of functions to handle ML devices, which includes probing,
initialization and termination of the devices.
3) Definition of functions to handle ML models used to perform inference operations.
4) Definition of functions to handle quantize and dequantize operations

B) Common code for the above specification

Roadmap
-------
1) SW mldev driver based on TVM (https://tvm.apache.org/)
2) HW mldev driver for cn10k
3) Add app/test-mldev application similar to other device class tests

rfc..v1:
- Added programmer guide documentation
- Added implementation for common code

Machine learning library framework
----------------------------------

The ML framework is built on the following model:


    +-----------------+               rte_ml_[en|de]queue_burst()
    |                 |                          |
    |     Machine     o------+     +--------+    |
    |     Learning    |      |     | queue  |    |    +------+
    |     Inference   o------+-----o        |<===o===>|Core 0|
    |     Engine      |      |     | pair 0 |         +------+
    |                 o----+ |     +--------+
    |                 |    | |
    +-----------------+    | |     +--------+
             ^             | |     | queue  |         +------+
             |             | +-----o        |<=======>|Core 1|
             |             |       | pair 1 |         +------+
             |             |       +--------+
    +--------+--------+    |
    | +-------------+ |    |       +--------+
    | |   Model 0   | |    |       | queue  |         +------+
    | +-------------+ |    +-------o        |<=======>|Core N|
    | +-------------+ |            | pair N |         +------+
    | |   Model 1   | |            +--------+
    | +-------------+ |
    | +-------------+ |<------- rte_ml_model_load()
    | |   Model ..  | |-------> rte_ml_model_info()
    | +-------------+ |<------- rte_ml_model_start()
    | +-------------+ |<------- rte_ml_model_stop()
    | |   Model N   | |<------- rte_ml_model_params_update()
    | +-------------+ |<------- rte_ml_model_unload()
    +-----------------+

ML Device: A hardware or software-based implementation of ML device API for
running inferences using a pre-trained ML model.

ML Model: An ML model is an algorithm trained over a dataset. A model consists of the
procedure/algorithm and data/pattern required to make predictions on live data.
Once the model is created and trained outside of the DPDK scope, it can be loaded
via rte_ml_model_load() and then started using the rte_ml_model_start() API.
rte_ml_model_params_update() can be used to update model parameters such as weights
and biases without unloading the model using rte_ml_model_unload().

ML Inference: ML inference is the process of feeding data to the model via the
rte_ml_enqueue_burst() API and using the rte_ml_dequeue_burst() API to get the calculated
outputs/predictions from the started model.

In all functions of the ML device API, the ML device is designated by an
integer >= 0 named as device identifier *dev_id*.

The functions exported by the ML device API to setup a device designated by
its device identifier must be invoked in the following order:

     - rte_ml_dev_configure()
     - rte_ml_dev_queue_pair_setup()
     - rte_ml_dev_start()
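
The ordering rule above can be pictured as a small state machine. The following is a
stand-alone sketch, not code from the mldev patches: struct toy_dev, the toy_dev_*()
functions, the state names and the error codes are all hypothetical and only illustrate
why configure and queue-pair setup have to come before start.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical device states; not taken from the mldev patches. */
enum toy_dev_state {
	TOY_DEV_UNCONFIGURED,
	TOY_DEV_CONFIGURED,
	TOY_DEV_QP_READY,
	TOY_DEV_STARTED
};

struct toy_dev {
	enum toy_dev_state state;
};

/* Configuration is rejected while the device is running. */
static int
toy_dev_configure(struct toy_dev *d)
{
	if (d->state == TOY_DEV_STARTED)
		return -EBUSY;
	d->state = TOY_DEV_CONFIGURED;
	return 0;
}

/* Queue pairs can be set up only after configure and before start. */
static int
toy_dev_queue_pair_setup(struct toy_dev *d)
{
	if (d->state == TOY_DEV_STARTED)
		return -EBUSY;
	if (d->state == TOY_DEV_UNCONFIGURED)
		return -EINVAL;
	d->state = TOY_DEV_QP_READY;
	return 0;
}

/* Start succeeds only once at least one queue pair has been set up. */
static int
toy_dev_start(struct toy_dev *d)
{
	if (d->state != TOY_DEV_QP_READY)
		return -EINVAL;
	d->state = TOY_DEV_STARTED;
	return 0;
}

/* Stop returns the device to a state where it can be reconfigured. */
static int
toy_dev_stop(struct toy_dev *d)
{
	if (d->state != TOY_DEV_STARTED)
		return -EINVAL;
	d->state = TOY_DEV_QP_READY;
	return 0;
}
```

In this toy model, calling toy_dev_start() on a freshly created device fails, and
toy_dev_configure() on a started device fails until toy_dev_stop() has been called.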

A model is required to run the inference operations with the user specified inputs.
Application needs to invoke the ML model API in the following order before queueing
inference jobs.

     - rte_ml_model_load()
     - rte_ml_model_start()

The rte_ml_model_info() API is provided to retrieve the information related to the model.
The information would include the shape and type of input and output required for the inference.

Data quantization and dequantization are among the main aspects of the ML domain. They involve
conversion of input data from a higher-precision to a lower-precision data type and vice versa
for the output. APIs are provided to enable quantization through rte_ml_io_quantize() and
dequantization through rte_ml_io_dequantize(). These APIs can handle input
and output buffers holding data for multiple batches.
Two utility APIs, rte_ml_io_input_size_get() and rte_ml_io_output_size_get(), can be used to get the
size of quantized and dequantized multi-batch input and output buffers.
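
To make the precision conversion concrete, here is a minimal stand-alone sketch of one
common scheme, affine float32 <-> int8 quantization. It is an assumption for illustration
only: the actual conversion performed by rte_ml_io_quantize() and rte_ml_io_dequantize()
is model and driver specific, and struct qparams, its scale/zero_point fields and both
function names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-tensor parameters; a real driver derives its own. */
struct qparams {
	float scale;        /* real-value step represented by one integer step */
	int32_t zero_point; /* integer that represents the real value 0.0 */
};

/* Quantize float32 -> int8: scale, shift, saturate, round to nearest. */
static void
quantize_f32_to_i8(const float *in, int8_t *out, size_t n, struct qparams qp)
{
	for (size_t i = 0; i < n; i++) {
		float q = in[i] / qp.scale + (float)qp.zero_point;

		/* Saturate to the int8 range before converting. */
		if (q > 127.0f)
			q = 127.0f;
		if (q < -128.0f)
			q = -128.0f;
		/* Round half away from zero without needing libm. */
		out[i] = (int8_t)(q >= 0.0f ? q + 0.5f : q - 0.5f);
	}
}

/* Dequantize int8 -> float32: the inverse mapping, up to rounding loss. */
static void
dequantize_i8_to_f32(const int8_t *in, float *out, size_t n, struct qparams qp)
{
	for (size_t i = 0; i < n; i++)
		out[i] = (float)(in[i] - qp.zero_point) * qp.scale;
}
```

With scale 0.5 and zero point 0, the value 1.0 maps to the integer 2 and back to 1.0,
while 100.0 saturates at 127 and comes back as 63.5, which shows the information lost
by the lower-precision type.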

User can optionally update the model parameters with rte_ml_model_params_update() after
invoking rte_ml_model_stop() API on a given model ID.

The application can invoke, in any order, the functions exported by the ML API to enqueue
inference jobs and dequeue inference response.
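
The burst semantics can be sketched with a toy queue pair. This is a stand-alone
illustration, not the mldev implementation: struct toy_qp, struct toy_op and the toy_*()
functions are hypothetical, and the convention that a burst call returns the number of
ops it actually processed is an assumption based on how the example application polls
rte_ml_dequeue_burst().

```c
#include <assert.h>
#include <stdint.h>

#define TOY_QP_DESC 4 /* hypothetical queue depth (a power of two) */

struct toy_op {
	int model_id;
	int done;
};

/* Toy queue pair: a FIFO of op pointers with free-running counters. */
struct toy_qp {
	struct toy_op *ring[TOY_QP_DESC];
	uint16_t head; /* next slot to dequeue */
	uint16_t tail; /* next slot to enqueue */
};

/* Enqueue up to nb_ops; returns how many were accepted (may be fewer). */
static uint16_t
toy_enqueue_burst(struct toy_qp *qp, struct toy_op **ops, uint16_t nb_ops)
{
	uint16_t n = 0;

	while (n < nb_ops && (uint16_t)(qp->tail - qp->head) < TOY_QP_DESC) {
		qp->ring[qp->tail % TOY_QP_DESC] = ops[n++];
		qp->tail++;
	}
	return n;
}

/* Dequeue up to nb_ops; in this toy every queued op completes immediately. */
static uint16_t
toy_dequeue_burst(struct toy_qp *qp, struct toy_op **ops, uint16_t nb_ops)
{
	uint16_t n = 0;

	while (n < nb_ops && qp->head != qp->tail) {
		ops[n] = qp->ring[qp->head % TOY_QP_DESC];
		ops[n]->done = 1;
		n++;
		qp->head++;
	}
	return n;
}
```

A caller that submits six ops to this four-entry queue gets 4 back from the first
enqueue and has to retry the remaining two after dequeuing, which is why the return
value of a burst call should not be ignored.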

If the application wants to change the device configuration (i.e., call
rte_ml_dev_configure() or rte_ml_dev_queue_pair_setup()), then the application must first stop the
device using the rte_ml_dev_stop() API. Likewise, if model parameters need to be updated, then
the application must call rte_ml_model_stop() followed by the rte_ml_model_params_update() API
for the given model. The application does not need to call the rte_ml_dev_stop() API for
any model re-configuration such as rte_ml_model_params_update(), rte_ml_model_unload(), etc.
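
The stop/update/start rule for a model can be sketched the same way, independently of
the device state. This is a stand-alone illustration under stated assumptions: struct
toy_model, the toy_model_*() functions, the params_version counter and the error codes
are hypothetical and do not come from the mldev patches.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model states; not taken from the mldev patches. */
enum toy_model_state {
	TOY_MODEL_UNLOADED,
	TOY_MODEL_LOADED,
	TOY_MODEL_STARTED
};

struct toy_model {
	enum toy_model_state state;
	int params_version; /* bumped on every successful parameter update */
};

static int
toy_model_load(struct toy_model *m)
{
	if (m->state != TOY_MODEL_UNLOADED)
		return -EEXIST;
	m->state = TOY_MODEL_LOADED;
	m->params_version = 0;
	return 0;
}

static int
toy_model_start(struct toy_model *m)
{
	if (m->state != TOY_MODEL_LOADED)
		return -EINVAL;
	m->state = TOY_MODEL_STARTED;
	return 0;
}

static int
toy_model_stop(struct toy_model *m)
{
	if (m->state != TOY_MODEL_STARTED)
		return -EINVAL;
	m->state = TOY_MODEL_LOADED;
	return 0;
}

/* Parameters may change only while the model is loaded but stopped. */
static int
toy_model_params_update(struct toy_model *m)
{
	if (m->state == TOY_MODEL_STARTED)
		return -EBUSY;
	if (m->state != TOY_MODEL_LOADED)
		return -EINVAL;
	m->params_version++;
	return 0;
}

/* Unload is likewise refused while the model is still started. */
static int
toy_model_unload(struct toy_model *m)
{
	if (m->state == TOY_MODEL_STARTED)
		return -EBUSY;
	if (m->state != TOY_MODEL_LOADED)
		return -EINVAL;
	m->state = TOY_MODEL_UNLOADED;
	return 0;
}
```

In this toy model, an update attempted on a started model is refused, while the
sequence stop, update, start succeeds without the model ever being unloaded.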

Once the device is in the start state after invoking the rte_ml_dev_start() API and the model is in
the start state after invoking the rte_ml_model_start() API, the application can call the
rte_ml_enqueue_burst() and rte_ml_dequeue_burst() APIs on the destined device and model ID.

Finally, an application can close an ML device by invoking the rte_ml_dev_close() function.

A typical application uses the ML API in the following
programming flow.

- rte_ml_dev_configure()
- rte_ml_dev_queue_pair_setup()
- rte_ml_model_load()
- rte_ml_model_start()
- rte_ml_model_info()
- rte_ml_dev_start()
- rte_ml_enqueue_burst()
- rte_ml_dequeue_burst()
- rte_ml_model_stop()
- rte_ml_model_unload()
- rte_ml_dev_stop()
- rte_ml_dev_close()

Regarding multi-threading, by default, all the functions of the ML Device API exported by a PMD
are lock-free functions that are assumed not to be invoked in parallel on different logical cores
on the same target object. For instance, the dequeue function of a poll mode driver cannot be
invoked in parallel on two logical cores to operate on the same queue pair. Of course, this function
can be invoked in parallel by different logical cores on different queue pairs.
It is the responsibility of the user application to enforce this rule.
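
One simple way for an application to enforce this rule is a one-to-one mapping of
workers to queue pairs. The helper below is a hypothetical sketch (assign_queue_pairs()
is not part of the proposed API); it only shows the bookkeeping, with each worker then
using its own queue pair ID for every enqueue and dequeue call.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Give each worker its own queue pair so that no two workers ever operate
 * on the same queue pair. Returns 0 on success, or -1 if there are not
 * enough queue pairs for a one-to-one mapping.
 */
static int
assign_queue_pairs(uint16_t nb_workers, uint16_t nb_queue_pairs,
		   uint16_t *qp_of_worker)
{
	if (nb_workers > nb_queue_pairs)
		return -1;

	for (uint16_t w = 0; w < nb_workers; w++)
		qp_of_worker[w] = w;

	return 0;
}
```

When there are fewer queue pairs than workers, the application has to fall back to its
own locking or to funnelling ops through a subset of the cores.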

Example application usage for ML inferencing
--------------------------------------------
This example application demonstrates the programming model of the ML device
library. It omits error checks to keep the application simple. It also
assumes that the input data received is quantized and that the expected output
is also quantized. To handle non-quantized inputs and outputs, users can
invoke rte_ml_io_quantize() or rte_ml_io_dequantize() for data type conversions.

#define ML_MODEL_NAME "model"
#define IO_MZ "io_mz"

struct app_ctx {
	char model_file[PATH_MAX];
	char inp_file[PATH_MAX];
	char out_file[PATH_MAX];

	struct rte_ml_model_params params;
	struct rte_ml_model_info info;
	uint16_t id;

	uint64_t input_size;
	uint64_t output_size;
	uint8_t *input_buffer;
	uint8_t *output_buffer;
} __rte_cache_aligned;

struct app_ctx ctx;

static int
parse_args(int argc, char **argv)
{
	int opt, option_index;
	static struct option lgopts[] = {{"model", required_argument, NULL, 'm'},
					 {"input", required_argument, NULL, 'i'},
					 {"output", required_argument, NULL, 'o'},
					 {NULL, 0, NULL, 0}};

	while ((opt = getopt_long(argc, argv, "m:i:o:", lgopts, &option_index)) != EOF)
		switch (opt) {
		case 'm':
			strncpy(ctx.model_file, optarg, PATH_MAX - 1);
			break;
		case 'i':
			strncpy(ctx.inp_file, optarg, PATH_MAX - 1);
			break;
		case 'o':
			strncpy(ctx.out_file, optarg, PATH_MAX - 1);
			break;
		default:
			return -1;
		}

	return 0;
}

int
main(int argc, char **argv)
{
	struct rte_ml_dev_qp_conf qp_conf;
	struct rte_ml_dev_config config;
	struct rte_ml_dev_info dev_info;
	const struct rte_memzone *mz;
	struct rte_mempool *op_pool;
	struct rte_ml_op *op_enq;
	struct rte_ml_op *op_deq;

	FILE *fp;
	int rc;

	/* Initialize EAL */
	rc = rte_eal_init(argc, argv);
	if (rc < 0)
		rte_exit(EXIT_FAILURE, "Invalid EAL arguments\n");
	argc -= rc;
	argv += rc;

	/* Parse application arguments (after the EAL args) */
	if (parse_args(argc, argv) < 0)
		rte_exit(EXIT_FAILURE, "Invalid application arguments\n");

	/* Step 1: Check for ML devices */
	if (rte_ml_dev_count() <= 0)
		rte_exit(EXIT_FAILURE, "Failed to find ML devices\n");

	/* Step 2: Get device info */
	if (rte_ml_dev_info_get(0, &dev_info) != 0)
		rte_exit(EXIT_FAILURE, "Failed to get device info\n");

	/* Step 3: Configure ML device, use device 0 */
	config.socket_id = rte_ml_dev_socket_id(0);
	config.max_nb_models = dev_info.max_models;
	config.nb_queue_pairs = dev_info.max_queue_pairs;
	if (rte_ml_dev_configure(0, &config) != 0)
		rte_exit(EXIT_FAILURE, "Device configuration failed\n");

	/* Step 4: Setup queue pairs, use qp_id = 0 */
	qp_conf.nb_desc = 1;
	if (rte_ml_dev_queue_pair_setup(0, 0, &qp_conf, config.socket_id) != 0)
		rte_exit(EXIT_FAILURE, "Queue-pair setup failed\n");

	/* Step 5: Start device */
	if (rte_ml_dev_start(0) != 0)
		rte_exit(EXIT_FAILURE, "Device start failed\n");

	/* Step 6: Read model data and update load params structure */
	fp = fopen(ctx.model_file, "rb");
	if (fp == NULL)
		rte_exit(EXIT_FAILURE, "Failed to open model file\n");

	fseek(fp, 0, SEEK_END);
	ctx.params.size = ftell(fp);
	fseek(fp, 0, SEEK_SET);

	ctx.params.addr = malloc(ctx.params.size);
	if (fread(ctx.params.addr, 1, ctx.params.size, fp) != ctx.params.size){
		fclose(fp);
		rte_exit(EXIT_FAILURE, "Failed to read model\n");
	}
	fclose(fp);
	strcpy(ctx.params.name, ML_MODEL_NAME);

	/* Step 7: Load the model */
	if (rte_ml_model_load(0, &ctx.params, &ctx.id) != 0)
		rte_exit(EXIT_FAILURE, "Failed to load model\n");
	free(ctx.params.addr);

	/* Step 8: Start the model */
	if (rte_ml_model_start(0, ctx.id) != 0)
		rte_exit(EXIT_FAILURE, "Failed to start model\n");

	/* Step 9: Allocate buffers for quantized input and output */

	/* Get model information */
	if (rte_ml_model_info_get(0, ctx.id, &ctx.info) != 0)
		rte_exit(EXIT_FAILURE, "Failed to get model info\n");

	/* Get the buffer size for input and output */
	rte_ml_io_input_size_get(0, ctx.id, ctx.info.batch_size, &ctx.input_size, NULL);
	rte_ml_io_output_size_get(0, ctx.id, ctx.info.batch_size, &ctx.output_size, NULL);

	mz = rte_memzone_reserve(IO_MZ, ctx.input_size + ctx.output_size, config.socket_id, 0);
	if (mz == NULL)
		rte_exit(EXIT_FAILURE, "Failed to create IO memzone\n");

	ctx.input_buffer = mz->addr;
	ctx.output_buffer = ctx.input_buffer + ctx.input_size;

	/* Step 10: Fill the input data */
	fp = fopen(ctx.inp_file, "rb");
	if (fp == NULL)
		rte_exit(EXIT_FAILURE, "Failed to open input file\n");

	if (fread(ctx.input_buffer, 1, ctx.input_size, fp) != ctx.input_size) {
		fclose(fp);
		rte_exit(EXIT_FAILURE, "Failed to read input file\n");
	}
	fclose(fp);

	/* Step 11: Create ML op mempool */
	op_pool = rte_ml_op_pool_create("ml_op_pool", 1, 0, 0, config.socket_id);
	if (op_pool == NULL)
		rte_exit(EXIT_FAILURE, "Failed to create op pool\n");

	/* Step 12: Form an ML op */
	rte_mempool_get_bulk(op_pool, (void **)&op_enq, 1);
	op_enq->model_id = ctx.id;
	op_enq->nb_batches = ctx.info.batch_size;
	op_enq->mempool = op_pool;
	op_enq->input.addr = ctx.input_buffer;
	op_enq->input.length = ctx.input_size;
	op_enq->input.next = NULL;
	op_enq->output.addr = ctx.output_buffer;
	op_enq->output.length = ctx.output_size;
	op_enq->output.next = NULL;

	/* Step 13: Enqueue jobs */
	rte_ml_enqueue_burst(0, 0, &op_enq, 1);

	/* Step 14: Dequeue jobs (busy-poll until the op completes) */
	while (rte_ml_dequeue_burst(0, 0, &op_deq, 1) != 1)
		;

	/* Step 15: Write output */
	fp = fopen(ctx.out_file, "w+");
	if (fp == NULL)
		rte_exit(EXIT_FAILURE, "Failed to open output file\n");
	fwrite(ctx.output_buffer, 1, ctx.output_size, fp);
	fclose(fp);

	/* Step 16: Clean up */
	/* Stop ML model */
	rte_ml_model_stop(0, ctx.id);
	/* Unload ML model */
	rte_ml_model_unload(0, ctx.id);
	/* Free input/output memory */
	rte_memzone_free(rte_memzone_lookup(IO_MZ));
	/* Free the ml op back to pool */
	rte_mempool_put_bulk(op_pool, (void **)&op_deq, 1);
	/* Free ml op pool */
	rte_mempool_free(op_pool);
	/* Stop the device */
	rte_ml_dev_stop(0);
	rte_ml_dev_close(0);
	rte_eal_cleanup();

	return 0;
}


Jerin Jacob (1):
  mldev: introduce machine learning device library

Srikanth Yalavarthi (11):
  mldev: add PMD functions for ML device
  mldev: support device handling functions
  mldev: support device queue-pair setup
  mldev: support handling ML models
  mldev: support input and output data handling
  mldev: support op pool and its operations
  mldev: support inference enqueue and dequeue
  mldev: support device statistics
  mldev: support device extended statistics
  mldev: support to retrieve error information
  mldev: support to get debug info and test device

 MAINTAINERS                              |    5 +
 config/rte_config.h                      |    3 +
 doc/api/doxy-api-index.md                |    1 +
 doc/api/doxy-api.conf.in                 |    1 +
 doc/guides/prog_guide/img/mldev_flow.svg |  714 ++++++++++++++
 doc/guides/prog_guide/index.rst          |    1 +
 doc/guides/prog_guide/mldev.rst          |  186 ++++
 lib/eal/common/eal_common_log.c          |    1 +
 lib/eal/include/rte_log.h                |    1 +
 lib/meson.build                          |    1 +
 lib/mldev/meson.build                    |   27 +
 lib/mldev/rte_mldev.c                    |  901 ++++++++++++++++++
 lib/mldev/rte_mldev.h                    | 1092 ++++++++++++++++++++++
 lib/mldev/rte_mldev_core.h               |  724 ++++++++++++++
 lib/mldev/rte_mldev_pmd.c                |   61 ++
 lib/mldev/rte_mldev_pmd.h                |  151 +++
 lib/mldev/version.map                    |   49 +
 17 files changed, 3919 insertions(+)
 create mode 100644 doc/guides/prog_guide/img/mldev_flow.svg
 create mode 100644 doc/guides/prog_guide/mldev.rst
 create mode 100644 lib/mldev/meson.build
 create mode 100644 lib/mldev/rte_mldev.c
 create mode 100644 lib/mldev/rte_mldev.h
 create mode 100644 lib/mldev/rte_mldev_core.h
 create mode 100644 lib/mldev/rte_mldev_pmd.c
 create mode 100644 lib/mldev/rte_mldev_pmd.h
 create mode 100644 lib/mldev/version.map
  

Comments

Thomas Monjalon Jan. 25, 2023, 2:20 p.m. UTC | #1
14/11/2022 13:02, jerinj@marvell.com:
> From: Jerin Jacob <jerinj@marvell.com>
> 
> Machine learning inference library
> ==================================
> 
> Definition of machine learning inference
> ----------------------------------------
> Inference in machine learning is the process of making an output prediction
> based on new input data using a pre-trained machine learning model.
> 
> The scope of the RFC would include only inferencing with pre-trained machine learning models,
> training and building/compiling the ML models is out of scope for this RFC or
> DPDK mldev API. Use existing machine learning compiler frameworks for model creation.
> 
> Motivation for the new library
> ------------------------------
> Multiple semiconductor vendors are offering accelerator products such as DPU
> (often called Smart-NIC), FPGA, GPU, etc., which have ML inferencing capabilities
> integrated as part of the product. Use of ML inferencing is increasing in the domain
> of packet processing for flow classification, intrusion, malware and anomaly detection.
> 
> Lack of inferencing support through DPDK APIs will involve complexities and
> increased latency from moving data across frameworks (i.e, dataplane to
> non dataplane ML frameworks and vice-versa). Having a standardized DPDK APIs for ML
> inferencing would enable the dataplane solutions to harness the benefit of inline
> inferencing supported by the hardware.
> 
> Contents
> ---------------
> 
> A) API specification for:
> 
> 1) Discovery of ML capabilities (e.g., device specific features) in a vendor
> independent fashion
> 2) Definition of functions to handle ML devices, which includes probing,
> initialization and termination of the devices.
> 3) Definition of functions to handle ML models used to perform inference operations.
> 4) Definition of function to handle quantize and dequantize operations
> 
> B) Common code for above specification

Can we compare this library with WinML?
https://learn.microsoft.com/en-us/windows/ai/windows-ml/api-reference
Are there things we can learn from it?


> ML Model: An ML model is an algorithm trained over a dataset. A model consists of
> procedure/algorithm and data/pattern required to make predictions on live data.
> Once the model is created and trained outside of the DPDK scope, the model can be loaded
> via rte_ml_model_load() and then start it using rte_ml_model_start() API.
> The rte_ml_model_params_update() can be used to update the model parameters such as weight
> and bias without unloading the model using rte_ml_model_unload().

The fact that the model is prepared outside means the model format is free
and probably different per mldev driver.
I think it is OK but it requires a lot of documentation effort to explain
how to bind the model and its parameters with the DPDK API.
Also we may need to pass some metadata from the model builder
to the inference engine in order to enable optimizations prepared in the model.
And the other way, we may need inference capabilities in order to generate
an optimized model which can run in the inference engine.


[...]
> Typical application utilisation of the ML API will follow the following
> programming flow.
> 
> - rte_ml_dev_configure()
> - rte_ml_dev_queue_pair_setup()
> - rte_ml_model_load()
> - rte_ml_model_start()
> - rte_ml_model_info()
> - rte_ml_dev_start()
> - rte_ml_enqueue_burst()
> - rte_ml_dequeue_burst()
> - rte_ml_model_stop()
> - rte_ml_model_unload()
> - rte_ml_dev_stop()
> - rte_ml_dev_close()

Where is parameters update in this flow?
Should we update all parameters at once or can it be done more fine-grain?


Question about the memory used by mldev:
Can we manage where the memory is allocated (host, device, mix, etc)?
  
Jerin Jacob Jan. 25, 2023, 7:01 p.m. UTC | #2
On Wed, Jan 25, 2023 at 7:50 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 14/11/2022 13:02, jerinj@marvell.com:
> > From: Jerin Jacob <jerinj@marvell.com>
> >
> > Machine learning inference library
> > ==================================
> >
> > Definition of machine learning inference
> > ----------------------------------------
> > Inference in machine learning is the process of making an output prediction
> > based on new input data using a pre-trained machine learning model.
> >
> > The scope of the RFC would include only inferencing with pre-trained machine learning models,
> > training and building/compiling the ML models is out of scope for this RFC or
> > DPDK mldev API. Use existing machine learning compiler frameworks for model creation.
> >
> > Motivation for the new library
> > ------------------------------
> > Multiple semiconductor vendors are offering accelerator products such as DPU
> > (often called Smart-NIC), FPGA, GPU, etc., which have ML inferencing capabilities
> > integrated as part of the product. Use of ML inferencing is increasing in the domain
> > of packet processing for flow classification, intrusion, malware and anomaly detection.
> >
> > Lack of inferencing support through DPDK APIs will involve complexities and
> > increased latency from moving data across frameworks (i.e, dataplane to
> > non dataplane ML frameworks and vice-versa). Having a standardized DPDK APIs for ML
> > inferencing would enable the dataplane solutions to harness the benefit of inline
> > inferencing supported by the hardware.
> >
> > Contents
> > ---------------
> >
> > A) API specification for:
> >
> > 1) Discovery of ML capabilities (e.g., device specific features) in a vendor
> > independent fashion
> > 2) Definition of functions to handle ML devices, which includes probing,
> > initialization and termination of the devices.
> > 3) Definition of functions to handle ML models used to perform inference operations.
> > 4) Definition of function to handle quantize and dequantize operations
> >
> > B) Common code for above specification


Thanks for the review.

>
> Can we compare this library with WinML?
> https://learn.microsoft.com/en-us/windows/ai/windows-ml/api-reference

Proposed DPDK library supports only inferencing with pre-trained models.

> Is there things we can learn from it?

Compared to WinML, the API provides functionality similar to what the
"LearningModel*" classes provide.
Support related to handling custom operators and native APIs like
WinML is provided through this API.
There may be more features that we can add when there are drivers that
support them.

>
>
> > ML Model: An ML model is an algorithm trained over a dataset. A model consists of
> > procedure/algorithm and data/pattern required to make predictions on live data.
> > Once the model is created and trained outside of the DPDK scope, the model can be loaded
> > via rte_ml_model_load() and then start it using rte_ml_model_start() API.
> > The rte_ml_model_params_update() can be used to update the model parameters such as weight
> > and bias without unloading the model using rte_ml_model_unload().
>
> The fact that the model is prepared outside means the model format is free
> and probably different per mldev driver.
> I think it is OK but it requires a lot of documentation effort to explain
> how to bind the model and its parameters with the DPDK API.
> Also we may need to pass some metadata from the model builder
> to the inference engine in order to enable optimizations prepared in the model.
> And the other way, we may need inference capabilities in order to generate
> an optimized model which can run in the inference engine.

The base API specification is kept to an absolute minimum. Currently, weight and bias
parameters are updated through rte_ml_model_params_update(). It can be extended
when there are drivers that support it, or if you have any specific
parameter you would like to add
to rte_ml_model_params_update().

Other metadata such as batch, shapes and formats is queried using rte_ml_io_info().


>
>
> [...]
> > Typical application utilisation of the ML API will follow the following
> > programming flow.
> >
> > - rte_ml_dev_configure()
> > - rte_ml_dev_queue_pair_setup()
> > - rte_ml_model_load()
> > - rte_ml_model_start()
> > - rte_ml_model_info()
> > - rte_ml_dev_start()
> > - rte_ml_enqueue_burst()
> > - rte_ml_dequeue_burst()
> > - rte_ml_model_stop()
> > - rte_ml_model_unload()
> > - rte_ml_dev_stop()
> > - rte_ml_dev_close()
>
> Where is parameters update in this flow?

Added the mandatory APIs in the top-level flow doc.
rte_ml_model_params_update() is used to update the parameters.

> Should we update all parameters at once or can it be done more fine-grain?

Currently, rte_ml_model_params_update() can be used to update weights
and biases via a buffer when the device is
in the stop state and without unloading the model.

>
>
> Question about the memory used by mldev:
> Can we manage where the memory is allocated (host, device, mix, etc)?

Just passing buffer pointers for now, like other subsystems.
Other EAL infra services can take care of the locality of memory as it
is not specific to ML dev.

+/** ML operation's input and output buffer representation as scatter gather list
+ */
+struct rte_ml_buff_seg {
+ rte_iova_t iova_addr;
+ /**< IOVA address of segment buffer. */
+ void *addr;
+ /**< Virtual address of segment buffer. */
+ uint32_t length;
+ /**< Segment length. */
+ uint32_t reserved;
+ /**< Reserved for future use. */
+ struct rte_ml_buff_seg *next;
+ /**< Points to next segment. Value NULL represents the last segment. */
+};
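
As a sketch of how such a chain would be consumed, the helper below walks the list and
sums the segment lengths. It is a stand-alone illustration: struct buff_seg is a local
mirror of the struct quoted above with uint64_t standing in for rte_iova_t, and
buff_seg_total_len() is a hypothetical helper, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the quoted struct; uint64_t stands in for rte_iova_t. */
struct buff_seg {
	uint64_t iova_addr;    /* IOVA address of segment buffer */
	void *addr;            /* virtual address of segment buffer */
	uint32_t length;       /* segment length */
	uint32_t reserved;     /* reserved for future use */
	struct buff_seg *next; /* next segment, NULL for the last one */
};

/* Sum the lengths of all segments in the chain; NULL ends the list. */
static uint64_t
buff_seg_total_len(const struct buff_seg *seg)
{
	uint64_t total = 0;

	for (; seg != NULL; seg = seg->next)
		total += seg->length;

	return total;
}
```

A single contiguous buffer is simply a one-element chain whose next pointer is NULL, as
in the example application where input.next and output.next are set to NULL.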


>
>
  
Thomas Monjalon Jan. 26, 2023, 11:11 a.m. UTC | #3
25/01/2023 20:01, Jerin Jacob:
> On Wed, Jan 25, 2023 at 7:50 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > 14/11/2022 13:02, jerinj@marvell.com:
> > > ML Model: An ML model is an algorithm trained over a dataset. A model consists of
> > > procedure/algorithm and data/pattern required to make predictions on live data.
> > > Once the model is created and trained outside of the DPDK scope, the model can be loaded
> > > via rte_ml_model_load() and then start it using rte_ml_model_start() API.
> > > The rte_ml_model_params_update() can be used to update the model parameters such as weight
> > > and bias without unloading the model using rte_ml_model_unload().
> >
> > The fact that the model is prepared outside means the model format is free
> > and probably different per mldev driver.
> > I think it is OK but it requires a lot of documentation effort to explain
> > how to bind the model and its parameters with the DPDK API.
> > Also we may need to pass some metadata from the model builder
> > to the inference engine in order to enable optimizations prepared in the model.
> > And the other way, we may need inference capabilities in order to generate
> > an optimized model which can run in the inference engine.
> 
> The base API specification kept absolute minimum. Currently, weight and biases
> parameters updated through rte_ml_model_params_update(). It can be extended
> when there are drivers supports it or if you have any specific
> parameter you would like to add
> it in rte_ml_model_params_update().

This function is
int rte_ml_model_params_update(int16_t dev_id, int16_t model_id, void *buffer);

How are we supposed to provide separate parameters in this void* ?


> Other metadata data like batch, shapes, formats queried using rte_ml_io_info().

Copying:
+/** Input and output data information structure
+ *
+ * Specifies the type and shape of input and output data.
+ */
+struct rte_ml_io_info {
+       char name[RTE_ML_STR_MAX];
+       /**< Name of data */
+       struct rte_ml_io_shape shape;
+       /**< Shape of data */
+       enum rte_ml_io_type qtype;
+       /**< Type of quantized data */
+       enum rte_ml_io_type dtype;
+       /**< Type of de-quantized data */
+};

Is it the right place to notify the app that some model optimizations
are supported? (example: merge some operations in the graph)

> > [...]
> > > Typical application utilisation of the ML API will follow the following
> > > programming flow.
> > >
> > > - rte_ml_dev_configure()
> > > - rte_ml_dev_queue_pair_setup()
> > > - rte_ml_model_load()
> > > - rte_ml_model_start()
> > > - rte_ml_model_info()
> > > - rte_ml_dev_start()
> > > - rte_ml_enqueue_burst()
> > > - rte_ml_dequeue_burst()
> > > - rte_ml_model_stop()
> > > - rte_ml_model_unload()
> > > - rte_ml_dev_stop()
> > > - rte_ml_dev_close()
> >
> > Where is parameters update in this flow?
> 
> Added the mandatory APIs in the top level flow doc.
> rte_ml_model_params_update() used to update the parameters.

The question is "where" should it be done?
Before/after start?

> > Should we update all parameters at once or can it be done more fine-grain?
> 
> Currently, rte_ml_model_params_update() can be used to update weight
> and bias via buffer when device is
> in stop state and without unloading the model.

The question is "can we update a single parameter"?
And how?

> > Question about the memory used by mldev:
> > Can we manage where the memory is allocated (host, device, mix, etc)?
> 
> Just passing buffer pointers now like other subsystem.
> Other EAL infra service can take care of the locality of memory as it
> is not specific to ML dev.

I was thinking about memory allocation required by the inference engine.
How to specify where to allocate? Is it just hardcoded in the driver?
  
Shivah Shankar Shankar Narayan Rao Jan. 27, 2023, 2:33 a.m. UTC | #4
25/01/2023 20:01, Jerin Jacob:
> On Wed, Jan 25, 2023 at 7:50 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > 14/11/2022 13:02, jerinj@marvell.com:
> > > ML Model: An ML model is an algorithm trained over a dataset. A 
> > > model consists of procedure/algorithm and data/pattern required to make predictions on live data.
> > > Once the model is created and trained outside of the DPDK scope, 
> > > the model can be loaded via rte_ml_model_load() and then start it using rte_ml_model_start() API.
> > > The rte_ml_model_params_update() can be used to update the model 
> > > parameters such as weight and bias without unloading the model using rte_ml_model_unload().
> >
> > The fact that the model is prepared outside means the model format 
> > is free and probably different per mldev driver.
> > I think it is OK but it requires a lot of documentation effort to 
> > explain how to bind the model and its parameters with the DPDK API.
> > Also we may need to pass some metadata from the model builder to the 
> > inference engine in order to enable optimizations prepared in the model.
> > And the other way, we may need inference capabilities in order to 
> > generate an optimized model which can run in the inference engine.
> 
> The base API specification kept absolute minimum. Currently, weight 
> and biases parameters updated through rte_ml_model_params_update(). It 
> can be extended when there are drivers supports it or if you have any 
> specific parameter you would like to add it in 
> rte_ml_model_params_update().

This function is
int rte_ml_model_params_update(int16_t dev_id, int16_t model_id, void *buffer);

How are we supposed to provide separate parameters in this void* ?

Just to clarify what "parameters" means: they are the weights and biases of the model, which are the parameters of a model.
Also, the proposed APIs are for running inference on a pre-trained model. For running inference, the amount of parameter tuning needed is limited or none.
The only parameters that may get changed are the weights and biases, which the rte_ml_model_params_update() API caters to.

While running inference on a model, there won't be any random addition or removal of operators to/from the model, nor any changes in the actual flow of the model.
Since the only parameters that can be changed are the weights and biases, the above API should suffice.

> Other metadata data like batch, shapes, formats queried using rte_ml_io_info().

Copying:
+/** Input and output data information structure
+ *
+ * Specifies the type and shape of input and output data.
+ */
+struct rte_ml_io_info {
+       char name[RTE_ML_STR_MAX];
+       /**< Name of data */
+       struct rte_ml_io_shape shape;
+       /**< Shape of data */
+       enum rte_ml_io_type qtype;
+       /**< Type of quantized data */
+       enum rte_ml_io_type dtype;
+       /**< Type of de-quantized data */ };

Is it the right place to notify the app that some model optimizations are supported? (example: merge some operations in the graph)

Inference is run on a pre-trained model, which means no merges or additions of operations to the graph are done.
If any such changes are made, the changed model needs to go through training and compilation once again, which is out of scope of these APIs.

> > [...]
> > > Typical application utilisation of the ML API will follow the 
> > > following programming flow.
> > >
> > > - rte_ml_dev_configure()
> > > - rte_ml_dev_queue_pair_setup()
> > > - rte_ml_model_load()
> > > - rte_ml_model_start()
> > > - rte_ml_model_info()
> > > - rte_ml_dev_start()
> > > - rte_ml_enqueue_burst()
> > > - rte_ml_dequeue_burst()
> > > - rte_ml_model_stop()
> > > - rte_ml_model_unload()
> > > - rte_ml_dev_stop()
> > > - rte_ml_dev_close()
> >
> > Where is parameters update in this flow?
> 
> Added the mandatory APIs in the top level flow doc.
> rte_ml_model_params_update() used to update the parameters.

The question is "where" should it be done?
Before/after start?

The model image comes with the weights and biases, which are loaded and used as part of rte_ml_model_load() and rte_ml_model_start().
In the rare scenario where the user wants to update the weights and biases of an already loaded model, rte_ml_model_stop() can be called to stop the model, the weights and biases can be updated using the rte_ml_model_params_update() API, and rte_ml_model_start() can then be called to start the model with the new weights and biases.

> > Should we update all parameters at once or can it be done more fine-grain?
> 
> Currently, rte_ml_model_params_update() can be used to update weight 
> and bias via buffer when device is in stop state and without unloading 
> the model.

The question is "can we update a single parameter"?
And how?
As mentioned above, for running inference the model is already trained; the only parameters that are updated are the weights and biases.
"Parameters" is another word for weights and biases. No other parameters are considered.

Are there any other parameters you have in mind?

> > Question about the memory used by mldev:
> > Can we manage where the memory is allocated (host, device, mix, etc)?
> 
> Just passing buffer pointers now like other subsystem.
> Other EAL infra service can take care of the locality of memory as it 
> is not specific to ML dev.

I was thinking about memory allocation required by the inference engine.
How to specify where to allocate? Is it just hardcoded in the driver?

Any memory within the hardware is managed by the driver.
  
Jerin Jacob Jan. 27, 2023, 4:29 a.m. UTC | #5
On Fri, Jan 27, 2023 at 8:04 AM Shivah Shankar Shankar Narayan Rao
<sshankarnara@marvell.com> wrote:
>
> 25/01/2023 20:01, Jerin Jacob:
> > On Wed, Jan 25, 2023 at 7:50 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > 14/11/2022 13:02, jerinj@marvell.com:
> > > > ML Model: An ML model is an algorithm trained over a dataset. A
> > > > model consists of procedure/algorithm and data/pattern required to make predictions on live data.
> > > > Once the model is created and trained outside of the DPDK scope,
> > > > the model can be loaded via rte_ml_model_load() and then start it using rte_ml_model_start() API.
> > > > The rte_ml_model_params_update() can be used to update the model
> > > > parameters such as weight and bias without unloading the model using rte_ml_model_unload().
> > >
> > > The fact that the model is prepared outside means the model format
> > > is free and probably different per mldev driver.
> > > I think it is OK but it requires a lot of documentation effort to
> > > explain how to bind the model and its parameters with the DPDK API.
> > > Also we may need to pass some metadata from the model builder to the
> > > inference engine in order to enable optimizations prepared in the model.
> > > And the other way, we may need inference capabilities in order to
> > > generate an optimized model which can run in the inference engine.
> >
> > The base API specification kept absolute minimum. Currently, weight
> > and biases parameters updated through rte_ml_model_params_update(). It
> > can be extended when there are drivers supports it or if you have any
> > specific parameter you would like to add it in
> > rte_ml_model_params_update().
>
> This function is
> int rte_ml_model_params_update(int16_t dev_id, int16_t model_id, void *buffer);
>
> How are we supposed to provide separate parameters in this void* ?
>
> Just to clarify on what "parameters" mean, they just mean weights and biases of the model, which are the parameters for a model.
> Also, the Proposed APIs are for running the inference on a pre-trained model. For running the inference the amount of parameters tuning needed/done is limited/none.
> The only parameters that get may get changed are the Weights and Bias which the API rte_ml_model_params_update() caters to.
>
> While running the inference on a Model there won't be any random addition or removal of operators to/from the model or there won't be any changes in the actual flow of model.
> Since the only parameter that can be changed is Weights and Biases the above API should take care.
>
> > Other metadata data like batch, shapes, formats queried using rte_ml_io_info().
>
> Copying:
> +/** Input and output data information structure
> + *
> + * Specifies the type and shape of input and output data.
> + */
> +struct rte_ml_io_info {
> +       char name[RTE_ML_STR_MAX];
> +       /**< Name of data */
> +       struct rte_ml_io_shape shape;
> +       /**< Shape of data */
> +       enum rte_ml_io_type qtype;
> +       /**< Type of quantized data */
> +       enum rte_ml_io_type dtype;
> +       /**< Type of de-quantized data */ };
>
> Is it the right place to notify the app that some model optimizations are supported? (example: merge some operations in the graph)
>
> The inference is run on a pre-trained model, which means any merges /additions of operations to the graph are NOT done.
> If any such things are done then the changed model needs to go through the training and compilation once again which is out of scope of these APIs.
>
> > > [...]
> > > > Typical application utilisation of the ML API will follow the
> > > > following programming flow.
> > > >
> > > > - rte_ml_dev_configure()
> > > > - rte_ml_dev_queue_pair_setup()
> > > > - rte_ml_model_load()
> > > > - rte_ml_model_start()
> > > > - rte_ml_model_info()
> > > > - rte_ml_dev_start()
> > > > - rte_ml_enqueue_burst()
> > > > - rte_ml_dequeue_burst()
> > > > - rte_ml_model_stop()
> > > > - rte_ml_model_unload()
> > > > - rte_ml_dev_stop()
> > > > - rte_ml_dev_close()
> > >
> > > Where is parameters update in this flow?
> >
> > Added the mandatory APIs in the top level flow doc.
> > rte_ml_model_params_update() used to update the parameters.
>
> The question is "where" should it be done?
> Before/after start?
>
> The model image comes with the Weights and Bias and will be loaded and used as a part of rte_ml_model_load and rte_ml_model_start.
> In rare scenarios where the user wants to update the Weights and Bias of an already loaded model, the rte_ml_model_stop can be called to stop the model and the Weights and Biases can be updated using the The parameters (Weights&Biases) can be updated when the  rte_ml_model_params_update() API followed by rte_ml_model_start to start the model with the new Weights and Biases.
>
> > > Should we update all parameters at once or can it be done more fine-grain?
> >
> > Currently, rte_ml_model_params_update() can be used to update weight
> > and bias via buffer when device is in stop state and without unloading
> > the model.
>
> The question is "can we update a single parameter"?
> And how?
> As mentioned above for running inference the model is already trained the only parameter that is updated is the Weights and Biases.
> "Parameters" is another word for Weights and Bias. No other parameters are considered.
>
> Are there any other parameters you have on your mind?
>
> > > Question about the memory used by mldev:
> > > Can we manage where the memory is allocated (host, device, mix, etc)?
> >
> > Just passing buffer pointers now like other subsystem.
> > Other EAL infra service can take care of the locality of memory as it
> > is not specific to ML dev.
>
> I was thinking about memory allocation required by the inference engine.
> How to specify where to allocate? Is it just hardcoded in the driver?
>
> Any memory within the hardware is managed by the driver.

I think Thomas is asking about the input and output memory for inference. If
so, struct rte_ml_buff_seg may need a type field or similar.
Thomas, please propose what parameters you want here.
If it is about internal driver memory, we can pass the memory
type in rte_ml_dev_configure(). If so, please propose
the memory types you need and the parameters.



>
>
  
Thomas Monjalon Jan. 27, 2023, 11:34 a.m. UTC | #6
Hi,

Shivah Shankar, please quote your replies
so we can distinguish what I said from what you say.

Please try to understand my questions, you tend to reply to something else.


27/01/2023 05:29, Jerin Jacob:
> On Fri, Jan 27, 2023 at 8:04 AM Shivah Shankar Shankar Narayan Rao
> <sshankarnara@marvell.com> wrote:
> > 25/01/2023 20:01, Jerin Jacob:
> > > On Wed, Jan 25, 2023 at 7:50 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > 14/11/2022 13:02, jerinj@marvell.com:
> > > > > > ML Model: An ML model is an algorithm trained over a dataset. A
> > > > > > model consists of procedure/algorithm and data/pattern required to
> > > > > > make predictions on live data. Once the model is created and
> > > > > > trained outside of the DPDK scope,
> > > > > > the model can be loaded via rte_ml_model_load() and then start it
> > > > > > using rte_ml_model_start() API. The rte_ml_model_params_update()
> > > > > > can be used to update the model
> > > > > > parameters such as weight and bias without unloading the model
> > > > > > > using rte_ml_model_unload().
> > > > > >
> > > > > The fact that the model is prepared outside means the model format
> > > > > is free and probably different per mldev driver.
> > > > > I think it is OK but it requires a lot of documentation effort to
> > > > > explain how to bind the model and its parameters with the DPDK API.
> > > > > Also we may need to pass some metadata from the model builder to the
> > > > > inference engine in order to enable optimizations prepared in the
> > > > > model.
> > > > > And the other way, we may need inference capabilities in order to
> > > > > generate an optimized model which can run in the inference engine.
> > > > 
> > > > The base API specification kept absolute minimum. Currently, weight
> > > > and biases parameters updated through rte_ml_model_params_update(). It
> > > > can be extended when there are drivers supports it or if you have any
> > > > specific parameter you would like to add it in
> > > > rte_ml_model_params_update().
> > > 
> > > This function is
> > > int rte_ml_model_params_update(int16_t dev_id, int16_t model_id, void
> > > *buffer);
> > > 
> > > How are we supposed to provide separate parameters in this void* ?
> >
> > Just to clarify on what "parameters" mean,
> > they just mean weights and biases of the model,
> > which are the parameters for a model.
> > Also, the Proposed APIs are for running the inference
> > on a pre-trained model.
> > For running the inference the amount of parameters tuning
> > needed/done is limited/none.

Why is it limited?
I think you are limiting to *your* model.

> > The only parameters that get may get changed are the Weights and Bias
> > which the API rte_ml_model_params_update() caters to.

Can't we imagine a model with more types of parameters?

> > While running the inference on a Model there won't be any random
> > addition or removal of operators to/from the model or there won't
> > be any changes in the actual flow of model.
> > Since the only parameter that can be changed is Weights and Biases
> > the above API should take care.

No, you are not replying to my question.
I want to be able to change a single parameter.
I am expecting a more fine-grained API than a simple "void*".
We could give the name of the parameter and a value, why not?
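
To make the name-and-value idea concrete, here is a minimal sketch in C. It is
not part of the mldev patch: every identifier below (ml_param, ml_param_table,
ml_param_add, ml_param_update) is hypothetical, and a plain float value is
assumed only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names: none of these exist in the mldev patch. */
#define ML_PARAM_NAME_MAX 32
#define ML_PARAM_MAX 8

struct ml_param {
	char name[ML_PARAM_NAME_MAX];
	float value;
};

struct ml_param_table {
	struct ml_param params[ML_PARAM_MAX];
	size_t nb_params;
};

/* Register a named parameter, as a driver might do at model load time. */
static int
ml_param_add(struct ml_param_table *t, const char *name, float value)
{
	assert(t != NULL && name != NULL);
	if (t->nb_params == ML_PARAM_MAX)
		return -ENOSPC;
	if (strlen(name) >= ML_PARAM_NAME_MAX)
		return -EINVAL;
	strcpy(t->params[t->nb_params].name, name);
	t->params[t->nb_params].value = value;
	t->nb_params++;
	return 0;
}

/* Update exactly one parameter by name; all others are left untouched. */
static int
ml_param_update(struct ml_param_table *t, const char *name, float value)
{
	size_t i;

	assert(t != NULL && name != NULL);
	for (i = 0; i < t->nb_params; i++) {
		if (strcmp(t->params[i].name, name) == 0) {
			t->params[i].value = value;
			return 0;
		}
	}
	return -ENOENT; /* unknown parameter name */
}
```

With such a table, updating "layer0.bias" would leave "layer0.weight"
unchanged, which is what an opaque buffer cannot express on its own.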

> > > > Other metadata data like batch, shapes, formats queried using
> > > > rte_ml_io_info().
> > > Copying:
> > > +/** Input and output data information structure
> > > + *
> > > + * Specifies the type and shape of input and output data.
> > > + */
> > > +struct rte_ml_io_info {
> > > +       char name[RTE_ML_STR_MAX];
> > > +       /**< Name of data */
> > > +       struct rte_ml_io_shape shape;
> > > +       /**< Shape of data */
> > > +       enum rte_ml_io_type qtype;
> > > +       /**< Type of quantized data */
> > > +       enum rte_ml_io_type dtype;
> > > +       /**< Type of de-quantized data */ };
> > > 
> > > Is it the right place to notify the app that some model optimizations
> > > are supported? (example: merge some operations in the graph)
> >
> > The inference is run on a pre-trained model, which means
> > any merges /additions of operations to the graph are NOT done.
> > If any such things are done then the changed model needs to go
> > through the training and compilation once again
> > which is out of scope of these APIs.

Please try to understand what I am saying.
I want the application to be able to know which capabilities are supported
by the inference driver.
That would allow generating the model with some optimizations.

> > > > > [...]
> > > > > > Typical application utilisation of the ML API will follow the
> > > > > > following programming flow.
> > > > > > 
> > > > > > - rte_ml_dev_configure()
> > > > > > - rte_ml_dev_queue_pair_setup()
> > > > > > - rte_ml_model_load()
> > > > > > - rte_ml_model_start()
> > > > > > - rte_ml_model_info()
> > > > > > - rte_ml_dev_start()
> > > > > > - rte_ml_enqueue_burst()
> > > > > > - rte_ml_dequeue_burst()
> > > > > > - rte_ml_model_stop()
> > > > > > - rte_ml_model_unload()
> > > > > > - rte_ml_dev_stop()
> > > > > > - rte_ml_dev_close()
> > > > > 
> > > > > Where is parameters update in this flow?
> > > > 
> > > > Added the mandatory APIs in the top level flow doc.
> > > > rte_ml_model_params_update() used to update the parameters.
> > > 
> > > The question is "where" should it be done?
> > > Before/after start?
> >
> > The model image comes with the Weights and Bias
> > and will be loaded and used as a part of rte_ml_model_load
> > and rte_ml_model_start.
> > In rare scenarios where the user wants to update
> > the Weights and Bias of an already loaded model,
> > the rte_ml_model_stop can be called to stop the model
> > and the Weights and Biases can be updated using the
> > The parameters (Weights&Biases) can be updated
> > when the  rte_ml_model_params_update() API
> > followed by rte_ml_model_start to start the model
> > with the new Weights and Biases.

OK, please make sure it is documented that the parameter update
must be done on a stopped engine.

> > > > > Should we update all parameters at once or can it be done more
> > > > > fine-grain?
> > > > 
> > > > Currently, rte_ml_model_params_update() can be used to update weight
> > > > and bias via buffer when device is in stop state and without unloading
> > > > the model.

Passing a raw buffer is a really dark API.
We need to know how to fill the buffer.

> > > The question is "can we update a single parameter"?
> > > And how?
> > 
> > As mentioned above for running inference the model is already trained
> > the only parameter that is updated is the Weights and Biases.
> > "Parameters" is another word for Weights and Bias.
> > No other parameters are considered.

You are not replying to the question.
How can we update a single parameter?

> > Are there any other parameters you have on your mind?

No

> > > > > Question about the memory used by mldev:
> > > > > Can we manage where the memory is allocated (host, device, mix,
> > > > > etc)?
> > > > 
> > > > Just passing buffer pointers now like other subsystem.
> > > > Other EAL infra service can take care of the locality of memory as it
> > > > is not specific to ML dev.
> > > 
> > > I was thinking about memory allocation required by the inference engine.
> > > How to specify where to allocate? Is it just hardcoded in the driver?
> >
> > Any memory within the hardware is managed by the driver.
> 
> I think, Thomas is asking input and output memory for interference. If
> so, the parameters for
> struct rte_ml_buff_seg or needs to add type or so. Thomas, Please
> propose what parameters you want here.
> In case if it is for internal driver memory, We can pass the memory
> type in rte_ml_dev_configure(), If so, please propose
> the memory types you need and the parameters.

I'm talking about the memory used by the driver to make the inference work.
In some cases we may prefer that the hardware use host memory,
in others the device memory.
I think that's something we may tune in the configuration.
I suppose we are fine with allocation hardcoded in the driver for now,
as I don't have a clear need.
  
Jerin Jacob Jan. 28, 2023, 11:27 a.m. UTC | #7
On Fri, Jan 27, 2023 at 6:28 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> Hi,
>
> Shivah Shankar, please quote your replies
> so we can distinguish what I said from what you say.
>
> Please try to understand my questions, you tend to reply to something else.
>
>
> 27/01/2023 05:29, Jerin Jacob:
> > On Fri, Jan 27, 2023 at 8:04 AM Shivah Shankar Shankar Narayan Rao
> > <sshankarnara@marvell.com> wrote:
> > > 25/01/2023 20:01, Jerin Jacob:
> > > > On Wed, Jan 25, 2023 at 7:50 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > > 14/11/2022 13:02, jerinj@marvell.com:
> > > > > > > ML Model: An ML model is an algorithm trained over a dataset. A
> > > > > > > model consists of procedure/algorithm and data/pattern required to
> > > > > > > make predictions on live data. Once the model is created and
> > > > > > > trained outside of the DPDK scope,
> > > > > > > the model can be loaded via rte_ml_model_load() and then start it
> > > > > > > using rte_ml_model_start() API. The rte_ml_model_params_update()
> > > > > > > can be used to update the model
> > > > > > > parameters such as weight and bias without unloading the model
> > > > > > > > using rte_ml_model_unload().
> > > > > > >
> > > > > > The fact that the model is prepared outside means the model format
> > > > > > is free and probably different per mldev driver.
> > > > > > I think it is OK but it requires a lot of documentation effort to
> > > > > > explain how to bind the model and its parameters with the DPDK API.
> > > > > > Also we may need to pass some metadata from the model builder to the
> > > > > > inference engine in order to enable optimizations prepared in the
> > > > > > model.
> > > > > > And the other way, we may need inference capabilities in order to
> > > > > > generate an optimized model which can run in the inference engine.
> > > > >
> > > > > The base API specification kept absolute minimum. Currently, weight
> > > > > and biases parameters updated through rte_ml_model_params_update(). It
> > > > > can be extended when there are drivers supports it or if you have any
> > > > > specific parameter you would like to add it in
> > > > > rte_ml_model_params_update().
> > > >
> > > > This function is
> > > > int rte_ml_model_params_update(int16_t dev_id, int16_t model_id, void
> > > > *buffer);
> > > >
> > > > How are we supposed to provide separate parameters in this void* ?
> > >
> > > Just to clarify on what "parameters" mean,
> > > they just mean weights and biases of the model,
> > > which are the parameters for a model.
> > > Also, the Proposed APIs are for running the inference
> > > on a pre-trained model.
> > > For running the inference the amount of parameters tuning
> > > needed/done is limited/none.
>
> Why is it limited?
> I think you are limiting to *your* model.

See below.


>
> > > The only parameters that get may get changed are the Weights and Bias
> > > which the API rte_ml_model_params_update() caters to.
>
> We cannot imagine a model with more type of parameters?
>
> > > While running the inference on a Model there won't be any random
> > > addition or removal of operators to/from the model or there won't
> > > be any changes in the actual flow of model.
> > > Since the only parameter that can be changed is Weights and Biases
> > > the above API should take care.
>
> No, you don't reply to my question.
> I want to be able to change a single parameter.
> I am expecting a more fine-grain API than a simple "void*".
> We could give the name of the parameter and a value, why not?

The current API model is as follows:
1) The model is developed outside DPDK and the binary file is loaded via
rte_ml_model_load().
2) The model's "read only" capabilities, like shape or quantized data, can
be read through the
rte_ml_model_info_get() API. If you wish to advertise any other
capability for optimization etc.,
please reply inline around rte_ml_io_info with the parameter and
its comment.
We can review and add it.
3) Now comes the parameter, which is the "update" to the model that was
loaded earlier via rte_ml_model_load().
It is also created outside DPDK. The user has an "update" to the parameters
when a new round of training happens.
Currently we treat this as a single blob because it
is model specific and is just a contiguous
stream of bytes from the model; thus void* is given.

If you have a use case, or your model supports updating more parameters as
separate blobs, we should be
able to update rte_ml_model_params_update() as needed. Please suggest the new
rte_ml_model_params_type enum or so. We can add that to
rte_ml_model_params_update().
Also, if you have a concrete data type instead of void* for a given TYPE,
please propose the structure
for that as well. We should be able to update struct rte_ml_dev_info
with these capabilities
to abstract the differences between models or inference engines.
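
For discussion, a typed update could look roughly like the sketch below. It
is only an assumption of how a params type enum might be shaped: the enum
values, the ml_params and ml_params_sizes structures, and the rule that the
whole blob is the weights followed by the bias are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parameter kinds; only the whole blob exists in v1. */
enum ml_params_type {
	ML_PARAMS_TYPE_WB_BLOB, /* whole weights-and-bias blob, as in v1 */
	ML_PARAMS_TYPE_WEIGHTS, /* hypothetical: weights only */
	ML_PARAMS_TYPE_BIAS,    /* hypothetical: bias only */
};

/* Hypothetical typed replacement for the bare void* buffer. */
struct ml_params {
	enum ml_params_type type;
	const void *data;
	size_t size;
};

/* Sizes a driver could report for the loaded model (assumed layout). */
struct ml_params_sizes {
	size_t weights_size;
	size_t bias_size;
};

/* Check that the buffer size matches what the given type expects. */
static int
ml_params_validate(const struct ml_params *p, const struct ml_params_sizes *s)
{
	size_t expected;

	assert(s != NULL);
	if (p == NULL || p->data == NULL)
		return -EINVAL;

	switch (p->type) {
	case ML_PARAMS_TYPE_WB_BLOB:
		expected = s->weights_size + s->bias_size;
		break;
	case ML_PARAMS_TYPE_WEIGHTS:
		expected = s->weights_size;
		break;
	case ML_PARAMS_TYPE_BIAS:
		expected = s->bias_size;
		break;
	default:
		return -ENOTSUP; /* type unknown to this driver */
	}
	return p->size == expected ? 0 : -EINVAL;
}
```

A driver that only supports the single blob would return -ENOTSUP for the
other types, and the supported set could be advertised in the device info.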

>
> > > > > Other metadata data like batch, shapes, formats queried using
> > > > > rte_ml_io_info().
> > > > Copying:
> > > > +/** Input and output data information structure
> > > > + *
> > > > + * Specifies the type and shape of input and output data.
> > > > + */
> > > > +struct rte_ml_io_info {
> > > > +       char name[RTE_ML_STR_MAX];
> > > > +       /**< Name of data */
> > > > +       struct rte_ml_io_shape shape;
> > > > +       /**< Shape of data */
> > > > +       enum rte_ml_io_type qtype;
> > > > +       /**< Type of quantized data */
> > > > +       enum rte_ml_io_type dtype;
> > > > +       /**< Type of de-quantized data */ };
> > > >
> > > > Is it the right place to notify the app that some model optimizations
> > > > are supported? (example: merge some operations in the graph)
> > >
> > > The inference is run on a pre-trained model, which means
> > > any merges /additions of operations to the graph are NOT done.
> > > If any such things are done then the changed model needs to go
> > > through the training and compilation once again
> > > which is out of scope of these APIs.
>
> Please try to understand what I am saying.
> I want the application to be able to know some capabilities are supported
> by the inference driver.
> So it will allow to generate the model with some optimizations.

See above. Yes, this is the place to add that. Please propose any changes
that you want to add.

>
> > > > > > [...]
> > > > > > > Typical application utilisation of the ML API will follow the
> > > > > > > following programming flow.
> > > > > > >
> > > > > > > - rte_ml_dev_configure()
> > > > > > > - rte_ml_dev_queue_pair_setup()
> > > > > > > - rte_ml_model_load()
> > > > > > > - rte_ml_model_start()
> > > > > > > - rte_ml_model_info()
> > > > > > > - rte_ml_dev_start()
> > > > > > > - rte_ml_enqueue_burst()
> > > > > > > - rte_ml_dequeue_burst()
> > > > > > > - rte_ml_model_stop()
> > > > > > > - rte_ml_model_unload()
> > > > > > > - rte_ml_dev_stop()
> > > > > > > - rte_ml_dev_close()
> > > > > >
> > > > > > Where is parameters update in this flow?
> > > > >
> > > > > Added the mandatory APIs in the top level flow doc.
> > > > > rte_ml_model_params_update() used to update the parameters.
> > > >
> > > > The question is "where" should it be done?
> > > > Before/after start?
> > >
> > > The model image comes with the Weights and Bias
> > > and will be loaded and used as a part of rte_ml_model_load
> > > and rte_ml_model_start.
> > > In rare scenarios where the user wants to update
> > > the Weights and Bias of an already loaded model,
> > > the rte_ml_model_stop can be called to stop the model
> > > and the Weights and Biases can be updated using the
> > > The parameters (Weights&Biases) can be updated
> > > when the  rte_ml_model_params_update() API
> > > followed by rte_ml_model_start to start the model
> > > with the new Weights and Biases.
>
> OK please sure it is documented that parameters update
> must be done on a stopped engine.

The doc is already there in the existing patch. Please see:

+/**
+ * Update the model parameters without unloading model.
+ *
+ * Update model parameters such as weights and bias without unloading the model.
+ * rte_ml_model_stop() must be called before invoking this API.
+ *
+ * @param[in] dev_id
+ *   The identifier of the device.
+ * @param[in] model_id
+ *   Identifier for the model created
+ * @param[in] buffer
+ *   Pointer to the model weights and bias buffer.
+ * Size of the buffer is equal to wb_size returned in *rte_ml_model_info*.
+ *
+ * @return
+ *   - Returns 0 on success
+ *   - Returns negative value on failure
+ */
+__rte_experimental
+int
+rte_ml_model_params_update(int16_t dev_id, int16_t model_id, void *buffer);
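
The ordering rule in the comment above can be sketched with a small mock.
This does not call DPDK: the mock_model type, the mock_* functions, the fixed
MOCK_WB_SIZE (standing in for wb_size) and the specific errno values are
assumptions made for illustration; the patch only states that a negative
value is returned on failure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MOCK_WB_SIZE 16 /* stands in for wb_size from the model info */

enum mock_model_state { MOCK_MODEL_STOPPED, MOCK_MODEL_STARTED };

/* Mock of a loaded model; not a DPDK structure. */
struct mock_model {
	enum mock_model_state state;
	uint8_t wb[MOCK_WB_SIZE]; /* weights and bias blob */
};

static int
mock_model_start(struct mock_model *m)
{
	assert(m != NULL);
	m->state = MOCK_MODEL_STARTED;
	return 0;
}

static int
mock_model_stop(struct mock_model *m)
{
	assert(m != NULL);
	m->state = MOCK_MODEL_STOPPED;
	return 0;
}

/* Replace the whole blob; refused unless the model is stopped. */
static int
mock_model_params_update(struct mock_model *m, const void *buffer)
{
	assert(m != NULL);
	if (buffer == NULL)
		return -EINVAL;
	if (m->state != MOCK_MODEL_STOPPED)
		return -EBUSY; /* assumed error code */
	memcpy(m->wb, buffer, MOCK_WB_SIZE);
	return 0;
}
```

The intended call order is stop, update, start: an update attempted on a
started model is rejected and leaves the existing blob intact.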


>
> > > > > > Should we update all parameters at once or can it be done more
> > > > > > fine-grain?
> > > > >
> > > > > Currently, rte_ml_model_params_update() can be used to update weight
> > > > > and bias via buffer when device is in stop state and without unloading
> > > > > the model.
>
> Passing a raw buffer is a really dark API.
> We need to know how to fill the buffer.

See above. Currently it is model specific, and the model emits
the parameter
blob update after training or so. The DPDK inference engine API is a means to
transport these blobs from the model to the ML engine.


>
> > > > The question is "can we update a single parameter"?
> > > > And how?
> > >
> > > As mentioned above for running inference the model is already trained
> > > the only parameter that is updated is the Weights and Biases.
> > > "Parameters" is another word for Weights and Bias.
> > > No other parameters are considered.
>
> You are not replying to the question.
> How can we update a single parameter?

See above.

I see the main comments are on the param update and getting the capabilities.
To enable that, please propose the changes around rte_ml_model_params_update() and
rte_ml_model_info. We should be able to take that and send v2.


>
> > > Are there any other parameters you have on your mind?
>
> No
>
> > > > > > Question about the memory used by mldev:
> > > > > > Can we manage where the memory is allocated (host, device, mix,
> > > > > > etc)?
> > > > >
> > > > > Just passing buffer pointers now like other subsystem.
> > > > > Other EAL infra service can take care of the locality of memory as it
> > > > > is not specific to ML dev.
> > > >
> > > > I was thinking about memory allocation required by the inference engine.
> > > > How to specify where to allocate? Is it just hardcoded in the driver?
> > >
> > > Any memory within the hardware is managed by the driver.
> >
> > I think, Thomas is asking input and output memory for interference. If
> > so, the parameters for
> > struct rte_ml_buff_seg or needs to add type or so. Thomas, Please
> > propose what parameters you want here.
> > In case if it is for internal driver memory, We can pass the memory
> > type in rte_ml_dev_configure(), If so, please propose
> > the memory types you need and the parameters.
>
> I'm talking about the memory used by the driver to make the inference works.
> In some cases we may prefer the hardware using host memory,
> sometimes use the device memory.
> I think that's something we may tune in the configuration.
> I suppose we are fine with allocation hardcoded in the driver for now,
> as I don't have a clear need.

OK.


>
>
  
Thomas Monjalon Feb. 1, 2023, 4:57 p.m. UTC | #8
28/01/2023 12:27, Jerin Jacob:
> I see main comments are on param update and get the capablities.
> To enable that, please propose the changes around rte_ml_model_params_update(),
> rte_ml_model_info. We should able to take that and send v2.

Sorry, I don't have the bandwidth to work on mldev now.
I understand you took the easy path of an opaque pointer,
and you are OK to refine it if needed.
Because there are not many reviews, I think we should merge it as-is
and keep it experimental for the time needed to get more feedback
and a second vendor implementing it.
  
Jerin Jacob Feb. 1, 2023, 5:33 p.m. UTC | #9
On Wed, Feb 1, 2023 at 10:27 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 28/01/2023 12:27, Jerin Jacob:
> > I see main comments are on param update and get the capablities.
> > To enable that, please propose the changes around rte_ml_model_params_update(),
> > rte_ml_model_info. We should able to take that and send v2.
>
> Sorry I don't have the bandwidth to work on mldev now.

Understandable.

> I understand you took the easy path of opaque pointer,

I would not call it the easy path; rather, it follows from the model that
we are supporting and from my not being aware of other use cases.

> and you are OK to refine it if needed.

Yes


> Because there is not much reviews, I think we should merge it as-is
> and keep it experimental the time needed to have more feedbacks
> and a second vendor implementing it.

Ack. I think it is reasonable; the first patch was pushed on Aug 3, so
there have been around 6 months for reviews.
https://inbox.dpdk.org/dev/20220803132839.2747858-2-jerinj@marvell.com/


>
>