[v4,3/3] ethdev: add standby flags for live migration

Message ID 20230118154447.595231-4-rongweil@nvidia.com (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit
Series add API for live migration

Checks

Context                          Check    Description
ci/checkpatch                    success  coding style OK
ci/loongarch-compilation         success  Compilation OK
ci/loongarch-unit-testing        success  Unit Testing PASS
ci/Intel-compilation             success  Compilation OK
ci/iol-broadcom-Performance      success  Performance Testing PASS
ci/iol-mellanox-Performance      success  Performance Testing PASS
ci/iol-intel-Performance         success  Performance Testing PASS
ci/intel-Testing                 success  Testing PASS
ci/iol-aarch64-unit-testing      success  Testing PASS
ci/iol-intel-Functional          success  Functional Testing PASS
ci/iol-x86_64-compile-testing    success  Testing PASS
ci/github-robot: build           success  github build: passed
ci/iol-testing                   success  Testing PASS
ci/iol-x86_64-unit-testing       success  Testing PASS
ci/iol-aarch64-compile-testing   success  Testing PASS
ci/iol-abi-testing               success  Testing PASS

Commit Message

Rongwei Liu Jan. 18, 2023, 3:44 p.m. UTC
  Some flags are added to the process state API for live migration
in order to change the behavior of the flow rules in a standby process.

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
---
 lib/ethdev/rte_ethdev.h | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)
  

Comments

Jerin Jacob Jan. 23, 2023, 1:20 p.m. UTC | #1
On Wed, Jan 18, 2023 at 9:15 PM Rongwei Liu <rongweil@nvidia.com> wrote:
>
> Some flags are added to the process state API for live migration
> in order to change the behavior of the flow rules in a standby process.
>
> Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
> ---
>  lib/ethdev/rte_ethdev.h | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
>
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index 1505396ced..9ae4f426a7 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -2260,6 +2260,27 @@ int rte_eth_dev_owner_get(const uint16_t port_id,
>  __rte_experimental
>  int rte_eth_process_set_role(bool standby, uint32_t flags);
>
> +/**@{@name Process role flags
> + * used when migrating from an application to another one.
> + * @see rte_eth_process_set_active
> + */
> +/**
> + * When set on a standby process, ingress flow rules will be effective
> + * in active and standby processes, so the ingress traffic may be duplicated.
> + */
> +#define RTE_ETH_PROCESS_FLAG_STANDBY_DUP_FLOW_INGRESS      RTE_BIT32(0)


How would duplication work if an action has stateful items? For example:
rte_flow_action_security::security_session -> it stores a live pointer
rte_flow_action_meter::mtr_id -> MTR object ID created with rte_mtr_create()
  
Rongwei Liu Jan. 30, 2023, 2:47 a.m. UTC | #2
Hi Jerin

BR
Rongwei

> How to duplicate if action has statefull items for example,
> rte_flow_action_security::security_session -> it store the live pointer
> rte_flow_action_meter::mtr_id; -> MTR object ID created with
> rte_mtr_create()
I agree with you: not all actions can be supported in the active/standby model.
That's why we have return value checking and rollback.
In the Nvidia driver doc, we suggest that users start with the 'rss/queue/jump' actions.
A meter is possible, at least in my view.
Assume: "meter g_action queue 0 / y_action drop / r_action drop"
Old application: creates meter_id 'A' with a pre-defined limit.
New application: creates meter_id 'B' with the same parameters as 'A'.
1. First possible approach:
	The hardware duplicates the traffic; the old application uses meter 'A' and the new application uses meter 'B' to control traffic throughput.
	Since the traffic is duplicated, each copy can go to a different meter.
2. Second possible approach:
	Meters 'A' and 'B' point to the same hardware resource; traffic reaches this resource first and is duplicated only if it is green.
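
The first approach above can be illustrated with a toy model: two independent meters created with identical parameters and fed the same duplicated packet stream reach the same decisions. This is only a sketch under stated assumptions. The `toy_meter` type and its functions are hypothetical and are not DPDK API, and the model is a single-rate, two-color token bucket rather than the three-color meter in the example rule.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-rate, two-color token bucket; not a DPDK type. */
struct toy_meter {
	uint64_t rate_bytes_per_sec; /* refill rate */
	uint64_t burst_bytes;        /* bucket capacity */
	uint64_t tokens;             /* bytes currently available */
	uint64_t last_ns;            /* timestamp of the previous packet */
};

/* Start with a full bucket at time 0. */
void toy_meter_init(struct toy_meter *m, uint64_t rate, uint64_t burst)
{
	m->rate_bytes_per_sec = rate;
	m->burst_bytes = burst;
	m->tokens = burst;
	m->last_ns = 0;
}

/* Returns true for green (forward to the queue), false for red (drop). */
bool toy_meter_is_green(struct toy_meter *m, uint64_t now_ns, uint32_t pkt_len)
{
	/* Refill in proportion to elapsed time, capped at the burst size. */
	uint64_t elapsed_ns = now_ns - m->last_ns;
	uint64_t refill = elapsed_ns * m->rate_bytes_per_sec / 1000000000ULL;

	m->tokens += refill;
	if (m->tokens > m->burst_bytes)
		m->tokens = m->burst_bytes;
	m->last_ns = now_ns;

	if (m->tokens < pkt_len)
		return false;
	m->tokens -= pkt_len;
	return true;
}
```

Because meters 'A' and 'B' in this model hold separate state, the old and new application each need their own meter object; only the parameters are shared.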
  
Jerin Jacob Jan. 30, 2023, 5:10 p.m. UTC | #3
On Mon, Jan 30, 2023 at 8:17 AM Rongwei Liu <rongweil@nvidia.com> wrote:
> I agree with you, not all actions can be supported in the active/standby model.

IMO, wherever rules are not standalone (the way QUEUE, RSS, etc. are), it
will be architecturally impossible to migrate them with pointers.
That's where I have a concern about generalizing this feature for ethdev.

Also, I don't believe any real HW support is needed for this.
IMO, standard DPDK multiprocess can do this: the secondary
application can migrate while all the SW logic stays in the
primary process, with the housekeeping done in the application. On the
plus side, it works with pointers too.

I am not sure how much housekeeping is offloaded to _HW_ in your case. In
my view, this should be generic utility functions that track the flows
and install the rules using rte_flow APIs, keeping the scope limited
to rte_flow.

That's just my view. I leave the rest of the review and the decision
on this series to the ethdev maintainers.

  
Rongwei Liu Jan. 31, 2023, 2:53 a.m. UTC | #4
HI Jerin:

BR
Rongwei

> IMO, Where ever rules are not standalone (like QUEUE, RSS) etc, It will be
> architecturally is not possible to migrate with pointers.
> That's where I have concern generalizing this feature for this ethdev.
> 
Not sure I understand your concern correctly. What is the pointer concept here?
Queue and RSS actions can be migrated, per my local test. The active and standby applications each have their own rxq/txq.
They are completely separate processes, like two members in a pipeline: the 2nd member can't be fed while the 1st member is alive and handling the traffic.

> Also, I don't believe there is any real HW support needed for this.
> IMO, Having DPDK standard multiprocess can do this by keeping secondary
> application can migrate, keeping all the SW logic in the primary process by
> doing the housekeeping in the application. On plus side, it works with pointers
> too.
IMO, in the multi-process model, the primary process usually owns the hardware resources via mmap/iomap/pci_map etc.
The secondary process is not able to run if the primary quits, whether gracefully or by crashing.
This patch wants to introduce a "backup to alive" model.
Assume the user wants to upgrade from DPDK version 22.03 to 23.03: 22.03 is running in the active role while 23.03 comes up in standby.
Both DPDK processes have their own resources and don't rely on each other.
The user can migrate the application by following the steps in the commit message with minimum traffic downtime.
SW logic such as flow rules can be handled with an iptables-save/iptables-restore style approach.
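
A minimal sketch of how the new process in this upgrade scenario might use the role API. The signature and the flag bit mirror `rte_eth_process_set_role(bool standby, uint32_t flags)` and `RTE_ETH_PROCESS_FLAG_STANDBY_DUP_FLOW_INGRESS` from the quoted patch, but every `toy_`/`TOY_` name below is a hypothetical stand-in, and the exact ordering of steps is an assumption, since the commit-message steps are not reproduced in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_BIT32(nr) (UINT32_C(1) << (nr))
/* Mirrors RTE_ETH_PROCESS_FLAG_STANDBY_DUP_FLOW_INGRESS from the patch. */
#define TOY_FLAG_STANDBY_DUP_FLOW_INGRESS TOY_BIT32(0)

/* Last role requested by this process (stub bookkeeping only). */
struct toy_role {
	bool standby;
	uint32_t flags;
};
struct toy_role toy_current_role;

/*
 * Stand-in with the same signature as the proposed API. A real
 * implementation would call into the PMD and could fail.
 */
int toy_process_set_role(bool standby, uint32_t flags)
{
	toy_current_role.standby = standby;
	toy_current_role.flags = flags;
	return 0;
}

/* Assumed sequence for the newly started (e.g. 23.03) process. */
int toy_new_process_take_over(void)
{
	int ret;

	/* 1. Come up as standby and ask for duplicated ingress traffic. */
	ret = toy_process_set_role(true, TOY_FLAG_STANDBY_DUP_FLOW_INGRESS);
	if (ret != 0)
		return ret; /* the old process keeps running untouched */

	/* 2. Install flow rules here, then wait for the old process to exit. */

	/* 3. Become the active process. */
	return toy_process_set_role(false, 0);
}
```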
> 
> I am not sure how much housekeeping offload to _HW_ in your case. In my
> view, it should be generic utils functions to track the flow and installing the
> rules using rte_flow APIs and keep the scope only for rte_flow.
For the rules part, I totally agree with you. The issue is that there may be millions of flow rules in the field, and each rule may take different steps to re-install depending on the vendor's implementation.
This series wants to propose a unified interface that is easy for upper-layer applications to use.
> 
  
Jerin Jacob Jan. 31, 2023, 8:45 a.m. UTC | #5
On Tue, Jan 31, 2023 at 8:23 AM Rongwei Liu <rongweil@nvidia.com> wrote:
> Not sure I understand your concern correctly. What' the pointer concept here?

I meant that any HW resource for which the driver deals with "pointers" or "fixed IDs"
cannot get the same value
in the new application. That's why I believe this whole concept
works only for very standalone rte_flow patterns and actions.


> Queue RSS actions can be migrated per my local test. Active/Standby application have its fully own rxq/txq.

Yes, that is because it is standalone.

> For rules part, totally agree with you. Issue is there maybe millions of flow rules in field and each rule may take different steps to
> re-install per vendor' implementations.

I understand the desire to migrate millions of flows, which makes
sense. IMO, it may be easier to keep this feature within
the rte_flow namespace: just have APIs to export() the existing rules for
a given port and import() the exported rules,
rather than going into ethdev space and calling it "live migration".
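
To make the export()/import() idea and the earlier "pointers or fixed IDs" concern concrete, here is a minimal sketch. No such rte_flow API is defined in this thread, so every name below (`toy_rule`, `toy_id_map`, `toy_import_rules`) is hypothetical. It assumes exported rules are pointer-free records, and that the importing process supplies a table mapping the old process's meter IDs to the IDs it created itself.

```c
#include <stddef.h>
#include <stdint.h>

enum toy_action_type {
	TOY_ACTION_QUEUE, /* standalone: the queue index means the same thing everywhere */
	TOY_ACTION_METER, /* stateful: the ID is only valid in the exporting process */
};

/* Pointer-free description of one rule, as a hypothetical export() might emit. */
struct toy_rule {
	enum toy_action_type type;
	uint32_t id; /* queue index or meter ID, depending on type */
};

/* Maps a meter ID from the old process to one created by the new process. */
struct toy_id_map {
	uint32_t old_id;
	uint32_t new_id;
};

/*
 * Rewrites process-local meter IDs in place.
 * Returns 0 on success, -1 if a meter ID has no mapping; in that case the
 * rules before the failing one have already been rewritten.
 */
int toy_import_rules(struct toy_rule *rules, size_t n_rules,
		     const struct toy_id_map *map, size_t n_map)
{
	for (size_t i = 0; i < n_rules; i++) {
		size_t j;

		if (rules[i].type != TOY_ACTION_METER)
			continue; /* standalone action: usable as-is */

		for (j = 0; j < n_map; j++) {
			if (map[j].old_id == rules[i].id) {
				rules[i].id = map[j].new_id;
				break;
			}
		}
		if (j == n_map)
			return -1; /* stateful object was never recreated */
	}
	return 0;
}
```

The remap step is where standalone actions (QUEUE, RSS) and stateful ones (meters, security sessions) diverge: the former pass through untouched, while the latter need an object recreated in the new process first.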

  
Rongwei Liu Jan. 31, 2023, 9:01 a.m. UTC | #6
Hi Jerin:

BR
Rongwei

> I understand the desire for millon flow migrations. Which makes sense.IMO, It
> may be just easy to make this feature just for rte_flow name space. Just have
> APIs to export() existing rules for the given port and import() the rules
> exported rather than going to ethdev space and call it as "live migration".
> 
Do you mean the API should be named "rte_flow_process_set_role()" instead of "rte_eth_process_set_role()"?
And moved to the rte_flow.c/.h files? Is it fine to keep the PMD callback in the eth_dev layer?
A simple export()/import() may not work. Imagine some flow rules are exclusive and can't be issued from both applications.
We would need to stop the old application first. I am afraid this would introduce a long time window during which traffic stops.
Applications won't like this behavior.
With this callback, each PMD can decide per rule: queue it, use a lower priority if it is exclusive, or return an error.
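
The per-rule decision described above could look roughly like the following. This is a sketch only: the enum, the capability struct, and the order in which the options are tried are all assumptions made for illustration, not part of the patch or of any PMD.

```c
#include <stdbool.h>

/* What a PMD might do with a rule created by a standby process. */
enum toy_standby_decision {
	TOY_STANDBY_DUPLICATE,    /* install now; traffic reaches both processes */
	TOY_STANDBY_QUEUE,        /* hold the rule until the active process exits */
	TOY_STANDBY_LOW_PRIORITY, /* install now, below the active process's rule */
	TOY_STANDBY_REJECT,       /* return an error to the application */
};

/* Hypothetical properties of a rule and of the hardware behind the PMD. */
struct toy_rule_caps {
	bool exclusive;         /* rule cannot be issued from both applications */
	bool hw_can_queue;      /* PMD can defer installation */
	bool hw_has_priorities; /* PMD can install at a lower priority */
};

enum toy_standby_decision toy_pmd_decide(const struct toy_rule_caps *caps)
{
	if (!caps->exclusive)
		return TOY_STANDBY_DUPLICATE;
	if (caps->hw_can_queue)
		return TOY_STANDBY_QUEUE;
	if (caps->hw_has_priorities)
		return TOY_STANDBY_LOW_PRIORITY;
	return TOY_STANDBY_REJECT;
}
```

Returning an explicit reject value matches the "return value checking and rollback" mentioned earlier in the thread: the application learns immediately which rules cannot be prepared while the old process is still active.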

  
Jerin Jacob Jan. 31, 2023, 2:37 p.m. UTC | #7
On Tue, Jan 31, 2023 at 2:31 PM Rongwei Liu <rongweil@nvidia.com> wrote:
>
> Hi Jerin:
>
> BR
> Rongwei
>
> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > Sent: Tuesday, January 31, 2023 16:46
> > To: Rongwei Liu <rongweil@nvidia.com>
> > Cc: dev@dpdk.org; Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> > <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> > Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>;
> > stephen@networkplumber.org; Raslan Darawsheh <rasland@nvidia.com>;
> > Ferruh Yigit <ferruh.yigit@amd.com>; Andrew Rybchenko
> > <andrew.rybchenko@oktetlabs.ru>
> > Subject: Re: [PATCH v4 3/3] ethdev: add standby flags for live migration
> >
> > On Tue, Jan 31, 2023 at 8:23 AM Rongwei Liu <rongweil@nvidia.com> wrote:
> > >
> > > HI Jerin:
> > >
> > > BR
> > > Rongwei
> > >
> > > > -----Original Message-----
> > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > Sent: Tuesday, January 31, 2023 01:10
> > > > To: Rongwei Liu <rongweil@nvidia.com>
> > > > Cc: dev@dpdk.org; Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> > > > <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> > > > Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>;
> > > > stephen@networkplumber.org; Raslan Darawsheh <rasland@nvidia.com>;
> > > > Ferruh Yigit <ferruh.yigit@amd.com>; Andrew Rybchenko
> > > > <andrew.rybchenko@oktetlabs.ru>
> > > > Subject: Re: [PATCH v4 3/3] ethdev: add standby flags for live
> > > > migration
> > > >
> > > > On Mon, Jan 30, 2023 at 8:17 AM Rongwei Liu <rongweil@nvidia.com>
> > wrote:
> > > > >
> > > > > Hi Jerin
> > > > >
> > > > > BR
> > > > > Rongwei
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > > Sent: Monday, January 23, 2023 21:20
> > > > > > To: Rongwei Liu <rongweil@nvidia.com>
> > > > > > Cc: dev@dpdk.org; Matan Azrad <matan@nvidia.com>; Slava
> > > > > > Ovsiienko <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>;
> > > > > > NBU-Contact- Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>;
> > > > > > stephen@networkplumber.org; Raslan Darawsheh
> > > > > > <rasland@nvidia.com>; Ferruh Yigit <ferruh.yigit@amd.com>;
> > > > > > Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> > > > > > Subject: Re: [PATCH v4 3/3] ethdev: add standby flags for live
> > > > > > migration
> > > > > >
> > > > > > On Wed, Jan 18, 2023 at 9:15 PM Rongwei Liu
> > > > > > <rongweil@nvidia.com>
> > > > wrote:
> > > > > > >
> > > > > > > Some flags are added to the process state API for live
> > > > > > > migration in order to change the behavior of the flow rules in a
> > standby process.
> > > > > > >
> > > > > > > Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
> > > > > > > ---
> > > > > > >  lib/ethdev/rte_ethdev.h | 21 +++++++++++++++++++++
> > > > > > >  1 file changed, 21 insertions(+)
> > > > > > >
> > > > > > > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > > > > > > index
> > > > > > > 1505396ced..9ae4f426a7 100644
> > > > > > > --- a/lib/ethdev/rte_ethdev.h
> > > > > > > +++ b/lib/ethdev/rte_ethdev.h
> > > > > > > @@ -2260,6 +2260,27 @@ int rte_eth_dev_owner_get(const
> > > > > > > uint16_t port_id,  __rte_experimental  int
> > > > > > > rte_eth_process_set_role(bool standby, uint32_t flags);
> > > > > > >
> > > > > > > +/**@{@name Process role flags
> > > > > > > + * used when migrating from an application to another one.
> > > > > > > + * @see rte_eth_process_set_active  */
> > > > > > > +/**
> > > > > > > + * When set on a standby process, ingress flow rules will be
> > > > > > > +effective
> > > > > > > + * in active and standby processes, so the ingress traffic
> > > > > > > +may be
> > > > duplicated.
> > > > > > > + */
> > > > > > > +#define RTE_ETH_PROCESS_FLAG_STANDBY_DUP_FLOW_INGRESS
> > > > > > RTE_BIT32(0)
> > > > > >
> > > > > >
> > > > > > How to duplicate if action has statefull items for example,
> > > > > > rte_flow_action_security::security_session -> it store the live
> > > > > > pointer rte_flow_action_meter::mtr_id; -> MTR object ID created
> > > > > > with
> > > > > > rte_mtr_create()
> > > > > I agree with you, not all actions can be supported in the
> > > > > active/standby
> > > > model.
> > > >
> > > > IMO, Where ever rules are not standalone (like QUEUE, RSS) etc, It
> > > > will be architecturally is not possible to migrate with pointers.
> > > > That's where I have concern generalizing this feature for this ethdev.
> > > >
> > > Not sure I understand your concern correctly. What' the pointer concept
> > here?
> >
> > I meant, Any HW resource driver deals with "pointers" or "fixed ID"
> > can not get the same value
> > for the new application. That's where I believe this whole concepts works for
> > very standalone rte_flow patterns and actions.
> >
> >
> > > Queue RSS actions can be migrated per my local test. Active/Standby
> > application have its fully own rxq/txq.
> >
> > Yes. It because it is standalone.
> >
> > > They are totally separated processes and like two members in pipeline. 2nd
> > member can't be feed if 1st member alive and handle the traffic.
> > >
> > > > Also, I don't believe there is any real HW support needed for this.
> > > > IMO, Having DPDK standard multiprocess can do this by keeping
> > > > secondary application can migrate, keeping all the SW logic in the
> > > > primary process by doing the housekeeping in the application. On
> > > > plus side, it works with pointers too.
> >
> > > IMO, in multiple process model, primary process usually owns the hardware
> > resources via mmap/iomap/pci_map etc.
> > > Secondary process is not able to run if primary quits no matter gracefully or
> > crashing.
> > > This patch wants to introduce a "backup to alive" model.
> > > Assume user wants to upgrade from DPDK version 22.03 to 23.03, 22.03 is
> > running and active role while 23.03 comes up in standby.
> > > Both DPDK processes have its own resources and doesn't rely on each other.
> > > User can migrate the application following the steps in commit message
> > with minimum traffic downtime.
> > > SW logic like flow rules can be done following iptables-save/iptables-restore
> > approach.
> > > >
> > > > I am not sure how much housekeeping offload to _HW_ in your case. In
> > > > my view, it should be generic utils functions to track the flow and
> > > > installing the rules using rte_flow APIs and keep the scope only for
> > rte_flow.
> > > For rules part, totally agree with you. Issue is there maybe millions
> > > of flow rules in field and each rule may take different steps to re-install per
> > vendor' implementations.
> >
> > I understand the desire for millon flow migrations. Which makes sense.IMO, It
> > may be just easy to make this feature just for rte_flow name space. Just have
> > APIs to export() existing rules for the given port and import() the rules
> > exported rather than going to ethdev space and call it as "live migration".
> >
> Do you mean the API naming should be "rte_flow_process_set_role()" instead of "rte_eth_process_set_role()" ?
> Also move to rte_flow.c/.h files? Are we good to keep the PMD callback in eth_dev layer?

Yes, something with an rte_flow_ prefix, though I am not sure about the _set_role() kind of scheme.

> Simple export()/import() may not work. Image some flow rules are exclusive and can't be issued from both applications.
> We need to stop old application. I am afraid this will introduce big time window which traffic stops.

Yes, I think the sequence is:
1. rte_flow_rules_export() on app 1
2. stop app 1
3. rte_flow_rules_import() of app 1's rules by app 2
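
The sequence above can be sketched roughly as follows. This is only an illustration: rte_flow_rules_export() and rte_flow_rules_import() are names proposed in this thread, not existing DPDK APIs, so the sketch models them with hypothetical sketch_* helpers over a plain in-memory rule table and does not call into DPDK at all.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_RULES 8

/* Hypothetical stand-in for a flow rule; real rte_flow rules carry
 * patterns and actions, not just a priority and a queue index. */
struct sketch_rule {
	unsigned int priority;
	unsigned int queue;
};

/* Hypothetical per-application rule table. */
struct sketch_app {
	int running;
	size_t nb_rules;
	struct sketch_rule rules[SKETCH_MAX_RULES];
};

/* Step 1: copy the rules of an application into a caller buffer.
 * Returns the number of rules exported, or -1 if the buffer is too small. */
static int
sketch_rules_export(const struct sketch_app *app,
		    struct sketch_rule *buf, size_t buf_len)
{
	if (app->nb_rules > buf_len)
		return -1;
	memcpy(buf, app->rules, app->nb_rules * sizeof(buf[0]));
	return (int)app->nb_rules;
}

/* Step 2: stop the old application; its rules no longer handle traffic. */
static void
sketch_app_stop(struct sketch_app *app)
{
	app->running = 0;
	app->nb_rules = 0;
}

/* Step 3: install previously exported rules into the new application.
 * Returns 0 on success, or -1 if the table cannot hold them. */
static int
sketch_rules_import(struct sketch_app *app,
		    const struct sketch_rule *buf, size_t nb)
{
	if (nb > SKETCH_MAX_RULES)
		return -1;
	memcpy(app->rules, buf, nb * sizeof(buf[0]));
	app->nb_rules = nb;
	return 0;
}

/* Run the three steps in order. Nothing handles traffic between
 * step 2 and the end of step 3. */
static int
sketch_migrate(struct sketch_app *old_app, struct sketch_app *new_app)
{
	struct sketch_rule buf[SKETCH_MAX_RULES];
	int nb;

	assert(old_app != NULL && new_app != NULL);
	nb = sketch_rules_export(old_app, buf, SKETCH_MAX_RULES);
	if (nb < 0)
		return -1;
	sketch_app_stop(old_app);
	return sketch_rules_import(new_app, buf, (size_t)nb);
}
```

The gap between sketch_app_stop() and the completion of sketch_rules_import() corresponds to the traffic-stop window raised in the quoted concern above; with millions of rules that window grows with the import time.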


  
Ori Kam Jan. 31, 2023, 2:45 p.m. UTC | #8
Hi Jerin and Rongwei,

> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Tuesday, 31 January 2023 16:37
> 
> On Tue, Jan 31, 2023 at 2:31 PM Rongwei Liu <rongweil@nvidia.com> wrote:
> >
> > Hi Jerin:
> >
> > BR
> > Rongwei
> >
> > > -----Original Message-----
> > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > Sent: Tuesday, January 31, 2023 16:46
> > > To: Rongwei Liu <rongweil@nvidia.com>
> > > Cc: dev@dpdk.org; Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> > > <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> > > Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>;
> > > stephen@networkplumber.org; Raslan Darawsheh
> <rasland@nvidia.com>;
> > > Ferruh Yigit <ferruh.yigit@amd.com>; Andrew Rybchenko
> > > <andrew.rybchenko@oktetlabs.ru>
> > > Subject: Re: [PATCH v4 3/3] ethdev: add standby flags for live migration
> > >
> > > On Tue, Jan 31, 2023 at 8:23 AM Rongwei Liu <rongweil@nvidia.com>
> wrote:
> > > >
> > > > HI Jerin:
> > > >
> > > > BR
> > > > Rongwei
> > > >
> > > > > -----Original Message-----
> > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > Sent: Tuesday, January 31, 2023 01:10
> > > > > To: Rongwei Liu <rongweil@nvidia.com>
> > > > > Cc: dev@dpdk.org; Matan Azrad <matan@nvidia.com>; Slava
> Ovsiienko
> > > > > <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-
> Contact-
> > > > > Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>;
> > > > > stephen@networkplumber.org; Raslan Darawsheh
> <rasland@nvidia.com>;
> > > > > Ferruh Yigit <ferruh.yigit@amd.com>; Andrew Rybchenko
> > > > > <andrew.rybchenko@oktetlabs.ru>
> > > > > Subject: Re: [PATCH v4 3/3] ethdev: add standby flags for live
> > > > > migration
> > > > >
> > > > > On Mon, Jan 30, 2023 at 8:17 AM Rongwei Liu
> <rongweil@nvidia.com>
> > > wrote:
> > > > > >
> > > > > > Hi Jerin
> > > > > >
> > > > > > BR
> > > > > > Rongwei
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > > > Sent: Monday, January 23, 2023 21:20
> > > > > > > To: Rongwei Liu <rongweil@nvidia.com>
> > > > > > > Cc: dev@dpdk.org; Matan Azrad <matan@nvidia.com>; Slava
> > > > > > > Ovsiienko <viacheslavo@nvidia.com>; Ori Kam
> <orika@nvidia.com>;
> > > > > > > NBU-Contact- Thomas Monjalon (EXTERNAL)
> <thomas@monjalon.net>;
> > > > > > > stephen@networkplumber.org; Raslan Darawsheh
> > > > > > > <rasland@nvidia.com>; Ferruh Yigit <ferruh.yigit@amd.com>;
> > > > > > > Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> > > > > > > Subject: Re: [PATCH v4 3/3] ethdev: add standby flags for live
> > > > > > > migration
> > > > > > >
> > > > > > > On Wed, Jan 18, 2023 at 9:15 PM Rongwei Liu
> > > > > > > <rongweil@nvidia.com>
> > > > > wrote:
> > > > > > > >
> > > > > > > > Some flags are added to the process state API for live
> > > > > > > > migration in order to change the behavior of the flow rules in a
> > > standby process.
> > > > > > > >
> > > > > > > > Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
> > > > > > > > ---
> > > > > > > >  lib/ethdev/rte_ethdev.h | 21 +++++++++++++++++++++
> > > > > > > >  1 file changed, 21 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > > > > > > > index
> > > > > > > > 1505396ced..9ae4f426a7 100644
> > > > > > > > --- a/lib/ethdev/rte_ethdev.h
> > > > > > > > +++ b/lib/ethdev/rte_ethdev.h
> > > > > > > > @@ -2260,6 +2260,27 @@ int rte_eth_dev_owner_get(const
> > > > > > > > uint16_t port_id,  __rte_experimental  int
> > > > > > > > rte_eth_process_set_role(bool standby, uint32_t flags);
> > > > > > > >
> > > > > > > > +/**@{@name Process role flags
> > > > > > > > + * used when migrating from an application to another one.
> > > > > > > > + * @see rte_eth_process_set_active  */
> > > > > > > > +/**
> > > > > > > > + * When set on a standby process, ingress flow rules will be
> > > > > > > > +effective
> > > > > > > > + * in active and standby processes, so the ingress traffic
> > > > > > > > +may be
> > > > > duplicated.
> > > > > > > > + */
> > > > > > > > +#define
> RTE_ETH_PROCESS_FLAG_STANDBY_DUP_FLOW_INGRESS
> > > > > > > RTE_BIT32(0)
> > > > > > >
> > > > > > >
> > > > > > > How to duplicate if action has statefull items for example,
> > > > > > > rte_flow_action_security::security_session -> it store the live
> > > > > > > pointer rte_flow_action_meter::mtr_id; -> MTR object ID created
> > > > > > > with
> > > > > > > rte_mtr_create()
> > > > > > I agree with you, not all actions can be supported in the
> > > > > > active/standby
> > > > > model.
> > > > >
> > > > > IMO, Where ever rules are not standalone (like QUEUE, RSS) etc, It
> > > > > will be architecturally is not possible to migrate with pointers.
> > > > > That's where I have concern generalizing this feature for this ethdev.
> > > > >
> > > > Not sure I understand your concern correctly. What' the pointer
> concept
> > > here?
> > >
> > > I meant, Any HW resource driver deals with "pointers" or "fixed ID"
> > > can not get the same value
> > > for the new application. That's where I believe this whole concepts works
> for
> > > very standalone rte_flow patterns and actions.
> > >
> > >
> > > > Queue RSS actions can be migrated per my local test. Active/Standby
> > > application have its fully own rxq/txq.
> > >
> > > Yes. It because it is standalone.
> > >
> > > > They are totally separated processes and like two members in pipeline.
> 2nd
> > > member can't be feed if 1st member alive and handle the traffic.
> > > >
> > > > > Also, I don't believe there is any real HW support needed for this.
> > > > > IMO, Having DPDK standard multiprocess can do this by keeping
> > > > > secondary application can migrate, keeping all the SW logic in the
> > > > > primary process by doing the housekeeping in the application. On
> > > > > plus side, it works with pointers too.
> > >
> > > > IMO, in multiple process model, primary process usually owns the
> hardware
> > > resources via mmap/iomap/pci_map etc.
> > > > Secondary process is not able to run if primary quits no matter
> gracefully or
> > > crashing.
> > > > This patch wants to introduce a "backup to alive" model.
> > > > Assume user wants to upgrade from DPDK version 22.03 to 23.03, 22.03
> is
> > > running and active role while 23.03 comes up in standby.
> > > > Both DPDK processes have its own resources and doesn't rely on each
> other.
> > > > User can migrate the application following the steps in commit message
> > > with minimum traffic downtime.
> > > > SW logic like flow rules can be done following iptables-save/iptables-
> restore
> > > approach.
> > > > >
> > > > > I am not sure how much housekeeping offload to _HW_ in your case.
> In
> > > > > my view, it should be generic utils functions to track the flow and
> > > > > installing the rules using rte_flow APIs and keep the scope only for
> > > rte_flow.
> > > > For rules part, totally agree with you. Issue is there maybe millions
> > > > of flow rules in field and each rule may take different steps to re-install
> per
> > > vendor' implementations.
> > >
> > > I understand the desire for millon flow migrations. Which makes
> sense.IMO, It
> > > may be just easy to make this feature just for rte_flow name space. Just
> have
> > > APIs to export() existing rules for the given port and import() the rules
> > > exported rather than going to ethdev space and call it as "live migration".
> > >
> > Do you mean the API naming should be "rte_flow_process_set_role()"
> instead of "rte_eth_process_set_role()" ?
> > Also move to rte_flow.c/.h files? Are we good to keep the PMD callback in
> eth_dev layer?
> 
> Yes. something with rte_flow_ prefix and not sure _set_role() kind of
> scheme.

I think the upgrade process relates to the entire port and not only to rte_flow.
I don't mind this flag being part of rte_flow, but this information looks like it belongs at a higher level.

> 
> > Simple export()/import() may not work. Image some flow rules are
> exclusive and can't be issued from both applications.
> > We need to stop old application. I am afraid this will introduce big time
> window which traffic stops.
> 
> Yes, I think the  sequence is
> rte_flow_rules_export() on app 1
> stop the app 1
> rte_flow_rules_import() of app 1 by app2.
> 
I don't think export is the best solution, since the second application may not want
all the rules.
From my understanding, the idea is to set a priority between the two processes, so that when
one application closes, its traffic is received by the second application.
We also have the option of the second process receiving traffic duplicated from the
first application.
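
A minimal sketch of that priority model, under my own assumptions: the two processes install the same ingress rule, the standby copy sits at a lower priority, and a duplication option mirrors the standby ingress flag from the patch. The function and macro names are hypothetical and the code only models which process would receive a matching packet; it is not how any PMD implements it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical receiver bits for the two processes. */
#define SKETCH_RX_ACTIVE  (UINT32_C(1) << 0)
#define SKETCH_RX_STANDBY (UINT32_C(1) << 1)

/* Return the set of processes that receive a packet matching a rule
 * installed by both the active and the standby process. */
static uint32_t
sketch_receivers(bool active_alive, bool standby_alive, bool dup_ingress)
{
	uint32_t rx = 0;

	/* The active rule has the higher priority and always wins. */
	if (active_alive)
		rx |= SKETCH_RX_ACTIVE;

	/* The lower-priority standby rule is hit only once the active rule
	 * is gone, unless duplication was requested. */
	if (standby_alive && (!active_alive || dup_ingress))
		rx |= SKETCH_RX_STANDBY;

	return rx;
}
```

With both processes alive and no duplication, only the active process receives the packet; once the active process closes, the same packet reaches the standby process without any rule being re-installed.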

  
Thomas Monjalon Jan. 31, 2023, 5:50 p.m. UTC | #9
31/01/2023 15:45, Ori Kam:
> From: Jerin Jacob <jerinjacobk@gmail.com>
> > On Tue, Jan 31, 2023 at 2:31 PM Rongwei Liu <rongweil@nvidia.com> wrote:
> > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > On Tue, Jan 31, 2023 at 8:23 AM Rongwei Liu <rongweil@nvidia.com>
> > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > > On Mon, Jan 30, 2023 at 8:17 AM Rongwei Liu
> > > > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > > > > On Wed, Jan 18, 2023 at 9:15 PM Rongwei Liu
> > > > > > > > > +/**@{@name Process role flags
> > > > > > > > > + * used when migrating from an application to another one.
> > > > > > > > > + * @see rte_eth_process_set_active  */
> > > > > > > > > +/**
> > > > > > > > > + * When set on a standby process, ingress flow rules will be
> > > > > > > > > +effective
> > > > > > > > > + * in active and standby processes, so the ingress traffic
> > > > > > > > > +may be duplicated.
> > > > > > > > > + */
> > > > > > > > > +#define RTE_ETH_PROCESS_FLAG_STANDBY_DUP_FLOW_INGRESS RTE_BIT32(0)
> > > > > > > >
> > > > > > > >
> > > > > > > > How to duplicate if action has statefull items for example,
> > > > > > > > rte_flow_action_security::security_session -> it store the live
> > > > > > > > pointer rte_flow_action_meter::mtr_id; -> MTR object ID created
> > > > > > > > with
> > > > > > > > rte_mtr_create()
> > > > > > > I agree with you, not all actions can be supported in the
> > > > > > > active/standby model.
> > > > > >
> > > > > > IMO, Where ever rules are not standalone (like QUEUE, RSS) etc, It
> > > > > > will be architecturally is not possible to migrate with pointers.
> > > > > > That's where I have concern generalizing this feature for this ethdev.
> > > > > >
> > > > > Not sure I understand your concern correctly. What' the pointer concept here?
> > > >
> > > > I meant, Any HW resource driver deals with "pointers" or "fixed ID"
> > > > can not get the same value
> > > > for the new application. That's where I believe this whole concepts works
> > > > for very standalone rte_flow patterns and actions.
> > > >
> > > > > Queue RSS actions can be migrated per my local test. Active/Standby
> > > > application have its fully own rxq/txq.
> > > >
> > > > Yes. It because it is standalone.
> > > >
> > > > > They are totally separated processes and like two members in pipeline.
> > > > > 2nd member can't be feed if 1st member alive and handle the traffic.
> > > > >
[...]
> > > > > > my view, it should be generic utils functions to track the flow and
> > > > > > installing the rules using rte_flow APIs and keep the scope only for
> > > > > > rte_flow.
> > > > > 
> > > > > For rules part, totally agree with you. Issue is there maybe millions
> > > > > of flow rules in field and each rule may take different steps
> > > > > to re-install per vendor' implementations.
> > > >
> > > > I understand the desire for millon flow migrations. Which makes sense.
> > > > IMO, It may be just easy to make this feature just for rte_flow name space.
> > > > Just have APIs to export() existing rules for the given port
> > > > and import() the rules
> > > > exported rather than going to ethdev space and call it as "live migration".
> > > >
> > > Do you mean the API naming should be "rte_flow_process_set_role()"
> > > instead of "rte_eth_process_set_role()" ?
> > > Also move to rte_flow.c/.h files? Are we good to keep the PMD callback
> > > in eth_dev layer?
> > 
> > Yes. something with rte_flow_ prefix and not sure _set_role() kind of
> > scheme.
> 
> I think that the process of upgrade relates to the entire port and not only the rte_flow,
> I don't mind that this flag will be part  of rte_flow, but it looks like this information is in higher level.

I agree, application migration is a high-level concept.
For now we see that we can take advantage of it for some flow rules.
It could help in more use cases.

I also agree that it is not a full solution.
Migration is complex; we certainly cannot solve it in a few weeks,
and we'll need to add more functions and helpers to make it easy to use
in more cases.


> > > Simple export()/import() may not work. Image some flow rules are
> > exclusive and can't be issued from both applications.
> > > We need to stop old application. I am afraid this will introduce big time
> > window which traffic stops.
> > 
> > Yes, I think the  sequence is
> > rte_flow_rules_export() on app 1
> > stop the app 1
> > rte_flow_rules_import() of app 1 by app2.
> > 
> I don't think export is the best solution, since maybe the second application doesn't want
> all rules.
> From my understanding the idea is to set priority between two process so when 
> one application closes the traffic is going to be received by the second application.
> We have also the option that the second process will get duplicated traffic with the
> First application.
> 
> > > Application won't like this behavior.
> > > With this callback, each PMD can specify each rule, queue it or use lower
> > priority if exclusive. Or return error.
> > >
> > > > > This serial wants to propose a unified interface for upper layer
> > application'
> > > > easy use.
> > > > > >
> > > > > > That's just my view. I leave to ethdev maintainers for the rest of
> > > > > > the review and decision on this series.

That's a first step, which allows declaring the migration intent.
We should try to build on top of it and keep it experimental
for as long as needed to achieve good migration support.

I am for going in this direction (accepting the patch) for now.
If we discover in the coming months that there is a better direction,
we can change.


  
Jerin Jacob Jan. 31, 2023, 6:10 p.m. UTC | #10
On Tue, Jan 31, 2023 at 11:20 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 31/01/2023 15:45, Ori Kam:
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > On Tue, Jan 31, 2023 at 2:31 PM Rongwei Liu <rongweil@nvidia.com> wrote:
> > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > On Tue, Jan 31, 2023 at 8:23 AM Rongwei Liu <rongweil@nvidia.com>
> > > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > > > On Mon, Jan 30, 2023 at 8:17 AM Rongwei Liu
> > > > > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > > > > > On Wed, Jan 18, 2023 at 9:15 PM Rongwei Liu
> > > > > > > > > > +/**@{@name Process role flags
> > > > > > > > > > + * used when migrating from an application to another one.
> > > > > > > > > > + * @see rte_eth_process_set_active  */
> > > > > > > > > > +/**
> > > > > > > > > > + * When set on a standby process, ingress flow rules will be
> > > > > > > > > > +effective
> > > > > > > > > > + * in active and standby processes, so the ingress traffic
> > > > > > > > > > +may be duplicated.
> > > > > > > > > > + */
> > > > > > > > > > +#define RTE_ETH_PROCESS_FLAG_STANDBY_DUP_FLOW_INGRESS RTE_BIT32(0)
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > How to duplicate if action has statefull items for example,
> > > > > > > > > rte_flow_action_security::security_session -> it store the live
> > > > > > > > > pointer rte_flow_action_meter::mtr_id; -> MTR object ID created
> > > > > > > > > with
> > > > > > > > > rte_mtr_create()
> > > > > > > > I agree with you, not all actions can be supported in the
> > > > > > > > active/standby model.
> > > > > > >
> > > > > > > IMO, Where ever rules are not standalone (like QUEUE, RSS) etc, It
> > > > > > > will be architecturally is not possible to migrate with pointers.
> > > > > > > That's where I have concern generalizing this feature for this ethdev.
> > > > > > >
> > > > > > Not sure I understand your concern correctly. What' the pointer concept here?
> > > > >
> > > > > I meant, Any HW resource driver deals with "pointers" or "fixed ID"
> > > > > can not get the same value
> > > > > for the new application. That's where I believe this whole concepts works
> > > > > for very standalone rte_flow patterns and actions.
> > > > >
> > > > > > Queue RSS actions can be migrated per my local test. Active/Standby
> > > > > application have its fully own rxq/txq.
> > > > >
> > > > > Yes. It because it is standalone.
> > > > >
> > > > > > They are totally separated processes and like two members in pipeline.
> > > > > > 2nd member can't be feed if 1st member alive and handle the traffic.
> > > > > >
> [...]
> > > > > > > my view, it should be generic utils functions to track the flow and
> > > > > > > installing the rules using rte_flow APIs and keep the scope only for
> > > > > > > rte_flow.
> > > > > >
> > > > > > For rules part, totally agree with you. Issue is there maybe millions
> > > > > > of flow rules in field and each rule may take different steps
> > > > > > to re-install per vendor' implementations.
> > > > >
> > > > > I understand the desire for millon flow migrations. Which makes sense.
> > > > > IMO, It may be just easy to make this feature just for rte_flow name space.
> > > > > Just have APIs to export() existing rules for the given port
> > > > > and import() the rules
> > > > > exported rather than going to ethdev space and call it as "live migration".
> > > > >
> > > > Do you mean the API naming should be "rte_flow_process_set_role()"
> > > > instead of "rte_eth_process_set_role()" ?
> > > > Also move to rte_flow.c/.h files? Are we good to keep the PMD callback
> > > > in eth_dev layer?
> > >
> > > Yes. something with rte_flow_ prefix and not sure _set_role() kind of
> > > scheme.
> >
> > I think that the process of upgrade relates to the entire port and not only the rte_flow,
> > I don't mind that this flag will be part  of rte_flow, but it looks like this information is in higher level.
>
> I agree, application migration is a high-level concept.
> For now we see that we can take advantage of it for some flow rules.
> It could help more use cases.
>
> I also agree that it is not a full solution.
> Migration is complex, that's sure we cannot solve it in few weeks,
> and we'll need to add more functions and helpers to make it easy to use
> in more cases.

Makes sense.

>
>
> > > > Simple export()/import() may not work. Image some flow rules are
> > > exclusive and can't be issued from both applications.
> > > > We need to stop old application. I am afraid this will introduce big time
> > > window which traffic stops.
> > >
> > > Yes, I think the  sequence is
> > > rte_flow_rules_export() on app 1
> > > stop the app 1
> > > rte_flow_rules_import() of app 1 by app2.
> > >
> > I don't think export is the best solution, since maybe the second application doesn't want
> > all rules.
> > From my understanding the idea is to set priority between two process so when
> > one application closes the traffic is going to be received by the second application.
> > We have also the option that the second process will get duplicated traffic with the
> > First application.
> >
> > > > Application won't like this behavior.
> > > > With this callback, each PMD can specify each rule, queue it or use lower
> > > priority if exclusive. Or return error.
> > > >
> > > > > > This serial wants to propose a unified interface for upper layer
> > > application'
> > > > > easy use.
> > > > > > >
> > > > > > > That's just my view. I leave to ethdev maintainers for the rest of
> > > > > > > the review and decision on this series.
>
> That's a first step which allows to declare the migration intent.
> We should try to build on top of it and keep it as experimental
> as long as needed to achieve a good migration support.
>
> I am for going in this direction (accept the patch) for now.
> If we discover in the next months that there is a better direction,
> we can change.

Please have driver support and a test application that exercises this API
when merging this patch.
  

Patch

diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 1505396ced..9ae4f426a7 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -2260,6 +2260,27 @@  int rte_eth_dev_owner_get(const uint16_t port_id,
 __rte_experimental
 int rte_eth_process_set_role(bool standby, uint32_t flags);
 
+/**@{@name Process role flags
+ * used when migrating from an application to another one.
+ * @see rte_eth_process_set_role
+ */
+/**
+ * When set on a standby process, ingress flow rules will be effective
+ * in active and standby processes, so the ingress traffic may be duplicated.
+ */
+#define RTE_ETH_PROCESS_FLAG_STANDBY_DUP_FLOW_INGRESS      RTE_BIT32(0)
+/**
+ * When set on a standby process, egress flow rules will be effective
+ * in active and standby processes, so the egress traffic may be duplicated.
+ */
+#define RTE_ETH_PROCESS_FLAG_STANDBY_DUP_FLOW_EGRESS       RTE_BIT32(1)
+/**
+ * When set on a standby process, transfer flow rules will be effective
+ * in active and standby processes, so the transfer traffic may be duplicated.
+ */
+#define RTE_ETH_PROCESS_FLAG_STANDBY_DUP_FLOW_TRANSFER     RTE_BIT32(2)
+/**@}*/
+
 /**
  * Get the number of ports which are usable for the application.
  *
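
As a rough illustration of how an application might build the flags argument from the three definitions in this patch, here is a small standalone sketch. The flag macros are local copies of the ones in the diff so that it builds without DPDK headers; the sketch_* helpers and SKETCH_ALL_STANDBY_FLAGS are hypothetical names of mine and are not part of the series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the definitions from the patch, so the sketch builds
 * without DPDK headers. In a real build they come from rte_ethdev.h. */
#define RTE_BIT32(nr) (UINT32_C(1) << (nr))
#define RTE_ETH_PROCESS_FLAG_STANDBY_DUP_FLOW_INGRESS  RTE_BIT32(0)
#define RTE_ETH_PROCESS_FLAG_STANDBY_DUP_FLOW_EGRESS   RTE_BIT32(1)
#define RTE_ETH_PROCESS_FLAG_STANDBY_DUP_FLOW_TRANSFER RTE_BIT32(2)

/* Hypothetical mask of every flag defined by the patch. */
#define SKETCH_ALL_STANDBY_FLAGS \
	(RTE_ETH_PROCESS_FLAG_STANDBY_DUP_FLOW_INGRESS | \
	 RTE_ETH_PROCESS_FLAG_STANDBY_DUP_FLOW_EGRESS | \
	 RTE_ETH_PROCESS_FLAG_STANDBY_DUP_FLOW_TRANSFER)

/* Hypothetical helper: build the flags word for a standby process. */
static uint32_t
sketch_standby_flags(bool dup_ingress, bool dup_egress, bool dup_transfer)
{
	uint32_t flags = 0;

	if (dup_ingress)
		flags |= RTE_ETH_PROCESS_FLAG_STANDBY_DUP_FLOW_INGRESS;
	if (dup_egress)
		flags |= RTE_ETH_PROCESS_FLAG_STANDBY_DUP_FLOW_EGRESS;
	if (dup_transfer)
		flags |= RTE_ETH_PROCESS_FLAG_STANDBY_DUP_FLOW_TRANSFER;
	return flags;
}

/* Hypothetical check: true if no bit outside the three flags is set. */
static bool
sketch_flags_valid(uint32_t flags)
{
	return (flags & ~SKETCH_ALL_STANDBY_FLAGS) == 0;
}
```

Given the declaration shown in the diff context, the standby application would presumably pass the result as the second argument, along the lines of rte_eth_process_set_role(true, sketch_standby_flags(true, false, false)); how a PMD reacts to each flag is left to the driver.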