[RFC] ethdev: support flow aging

Message ID 1558865893-23381-1-git-send-email-matan@mellanox.com (mailing list archive)
State RFC, archived
Delegated to: Ferruh Yigit
Series [RFC] ethdev: support flow aging

Checks

Context                Check     Description
ci/checkpatch          warning   coding style issues
ci/Intel-compilation   success   Compilation OK

Commit Message

Matan Azrad May 26, 2019, 10:18 a.m. UTC
  One reason to destroy a flow is that no packet has matched it for a
"timeout" period.
For example, when TCP/UDP sessions are closed abruptly.

Currently, DPDK has no mechanism for flow aging, and applications
use their own ways to detect and destroy aged-out flows.

This RFC introduces flow aging APIs to offload the flow aging task from
the application to the port.

Design:
- A new rte_flow action: RTE_FLOW_ACTION_TYPE_AGE to set the timeout and
  the application flow context for each flow.
- A new ethdev event: RTE_ETH_EVENT_FLOW_AGED for the driver to report
  that there are new aged-out flows.
- A new rte_flow API: rte_flow_get_aged_flows to get the aged-out flows
  contexts from the port.

With this design, each PMD can implement aging in the way that best fits
the device offloads supported by its HW.

Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 lib/librte_ethdev/rte_ethdev.h |  1 +
 lib/librte_ethdev/rte_flow.h   | 56 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+)
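
The role of the per-flow context in the design can be illustrated with a minimal sketch. It does not use the DPDK headers: `struct age_conf` is only a stand-in that mirrors the fields of the proposed `struct rte_flow_action_age`, and `struct session`, `session_age_conf()` and `release_aged_sessions()` are hypothetical application-side names. The assumption is that the application stores a pointer to its own session object as the context, so that the contexts later reported for aged-out flows lead straight back to the sessions to clean up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in mirroring the fields of the proposed struct rte_flow_action_age. */
struct age_conf {
	uint16_t timeout; /* seconds without a matching packet */
	void *context;    /* opaque pointer handed back when the flow ages out */
};

/* Hypothetical application session tracked for each offloaded flow. */
struct session {
	int id;
	int offloaded; /* 1 while a flow rule exists for this session */
};

/* Build the AGE configuration for the flow rule of one session. */
static struct age_conf
session_age_conf(struct session *s, uint16_t timeout_sec)
{
	struct age_conf conf = { .timeout = timeout_sec, .context = s };

	assert(s != NULL);
	return conf;
}

/*
 * Handle the contexts reported for aged-out flows (what the proposed
 * rte_flow_get_aged_flows() would fill in). Each context is the session
 * pointer stored at flow creation. Returns the number of sessions released.
 */
static int
release_aged_sessions(void *contexts[], int n)
{
	int released = 0;

	for (int i = 0; i < n; i++) {
		struct session *s = contexts[i];

		if (s != NULL && s->offloaded) {
			s->offloaded = 0; /* the flow rule would be destroyed here */
			released++;
		}
	}
	return released;
}
```

In a real application the release step would also call the flow destroy API; that part is left out because it depends on the final form of this RFC.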
  

Comments

Jerin Jacob Kollanukkaran June 6, 2019, 10:24 a.m. UTC | #1
> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Matan Azrad
> Sent: Sunday, May 26, 2019 3:48 PM
> To: Adrien Mazarguil <adrien.mazarguil@6wind.com>; dev@dpdk.org
> Subject: [dpdk-dev] [PATCH] [RFC] ethdev: support flow aging
> 
> One of the reasons to destroy a flow is the fact that no packet matches the
> flow for "timeout" time.
> For example, when TCP\UDP sessions are suddenly closed.
> 
> Currently, there is no any dpdk mechanism for flow aging and the
> applications use there own ways to detect and destroy aged-out flows.
> 
> This RFC introduces flow aging APIs to offload the flow aging task from the
> application to the port.
> 
> Design:
> - A new rte_flow action: RTE_FLOW_ACTION_TYPE_AGE to set the timeout
> and
>   the application flow context for each flow.
> - A new ethdev event: RTE_ETH_EVENT_FLOW_AGED for the driver to report
>   that there are new aged-out flows.
> - A new rte_flow API: rte_flow_get_aged_flows to get the aged-out flows
>   contexts from the port.
> 
> By this design each PMD can use its best way to do the aging with the device
> offloads supported by its HW.
> 
> Signed-off-by: Matan Azrad <matan@mellanox.com>
> ---
>  lib/librte_ethdev/rte_ethdev.h |  1 +
>  lib/librte_ethdev/rte_flow.h   | 56
> ++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 57 insertions(+)
> 
> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> index 1f35e1d..6fc1531 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> @@ -2771,6 +2771,7 @@ enum rte_eth_event_type {
>  	RTE_ETH_EVENT_NEW,      /**< port is probed */
>  	RTE_ETH_EVENT_DESTROY,  /**< port is released */
>  	RTE_ETH_EVENT_IPSEC,    /**< IPsec offload related event */
> +	RTE_ETH_EVENT_FLOW_AGED,/**< New aged-out flows detected in
> the port

Is this event supported in HW? Or are you planning to implement it with an alarm or
timer? Just asking because, if no HW supports the interrupt, then
the synchronous rte_flow_get_aged_flows() API alone would be enough.
  
Matan Azrad June 6, 2019, 10:51 a.m. UTC | #2
Hi Jerin

From: Jerin Jacob 
> > -----Original Message-----
> > From: dev <dev-bounces@dpdk.org> On Behalf Of Matan Azrad
> > Sent: Sunday, May 26, 2019 3:48 PM
> > To: Adrien Mazarguil <adrien.mazarguil@6wind.com>; dev@dpdk.org
> > Subject: [dpdk-dev] [PATCH] [RFC] ethdev: support flow aging
> >
> > One of the reasons to destroy a flow is the fact that no packet
> > matches the flow for "timeout" time.
> > For example, when TCP\UDP sessions are suddenly closed.
> >
> > Currently, there is no any dpdk mechanism for flow aging and the
> > applications use there own ways to detect and destroy aged-out flows.
> >
> > This RFC introduces flow aging APIs to offload the flow aging task
> > from the application to the port.
> >
> > Design:
> > - A new rte_flow action: RTE_FLOW_ACTION_TYPE_AGE to set the timeout
> > and
> >   the application flow context for each flow.
> > - A new ethdev event: RTE_ETH_EVENT_FLOW_AGED for the driver to
> report
> >   that there are new aged-out flows.
> > - A new rte_flow API: rte_flow_get_aged_flows to get the aged-out flows
> >   contexts from the port.
> >
> > By this design each PMD can use its best way to do the aging with the
> > device offloads supported by its HW.
> >
> > Signed-off-by: Matan Azrad <matan@mellanox.com>
> > ---
> >  lib/librte_ethdev/rte_ethdev.h |  1 +
> >  lib/librte_ethdev/rte_flow.h   | 56
> > ++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 57 insertions(+)
> >
> > diff --git a/lib/librte_ethdev/rte_ethdev.h
> > b/lib/librte_ethdev/rte_ethdev.h index 1f35e1d..6fc1531 100644
> > --- a/lib/librte_ethdev/rte_ethdev.h
> > +++ b/lib/librte_ethdev/rte_ethdev.h
> > @@ -2771,6 +2771,7 @@ enum rte_eth_event_type {
> >  	RTE_ETH_EVENT_NEW,      /**< port is probed */
> >  	RTE_ETH_EVENT_DESTROY,  /**< port is released */
> >  	RTE_ETH_EVENT_IPSEC,    /**< IPsec offload related event */
> > +	RTE_ETH_EVENT_FLOW_AGED,/**< New aged-out flows detected in
> > the port
> Does this event supported in HW?
It depends on the PMD implementation and the HW capability.

> Or Are planning to implement with alarm
> or timer.
Again, it depends on the PMD implementation.

> Just asking because, if none of the HW supports the interrupt then
> only rte_flow_get_aged_flows sync API be enough()
Why?

According to the above design, this is how the PMD notifies the application as soon as it has aged-out flows.
So, if the PMD uses an alarm/timer or any other means to support the aging action, in some cases it is better to notify the user asynchronously than to have the application poll.
The idea is to let the application decide what is better for its usage.

For the mlx5 case,
the plan is to raise this event from HW interrupt handling (same as the link event).

Matan.
  
Jerin Jacob Kollanukkaran June 6, 2019, 12:15 p.m. UTC | #3
> -----Original Message-----
> From: Matan Azrad <matan@mellanox.com>
> Sent: Thursday, June 6, 2019 4:22 PM
> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Adrien Mazarguil
> <adrien.mazarguil@6wind.com>; dev@dpdk.org
> Subject: [EXT] RE: [PATCH] [RFC] ethdev: support flow aging
> 
> Hi Jerin

Hi Matan,

> 
> From: Jerin Jacob
> > > -----Original Message-----
> > > From: dev <dev-bounces@dpdk.org> On Behalf Of Matan Azrad
> > > Sent: Sunday, May 26, 2019 3:48 PM
> > > To: Adrien Mazarguil <adrien.mazarguil@6wind.com>; dev@dpdk.org
> > > Subject: [dpdk-dev] [PATCH] [RFC] ethdev: support flow aging
> > >
> > > One of the reasons to destroy a flow is the fact that no packet
> > > matches the flow for "timeout" time.
> > > For example, when TCP\UDP sessions are suddenly closed.
> > >
> > > Currently, there is no any dpdk mechanism for flow aging and the
> > > applications use there own ways to detect and destroy aged-out flows.
> > >
> > > This RFC introduces flow aging APIs to offload the flow aging task
> > > from the application to the port.
> > >
> > > Design:
> > > - A new rte_flow action: RTE_FLOW_ACTION_TYPE_AGE to set the
> timeout
> > > and
> > >   the application flow context for each flow.
> > > - A new ethdev event: RTE_ETH_EVENT_FLOW_AGED for the driver to
> > report
> > >   that there are new aged-out flows.
> > > - A new rte_flow API: rte_flow_get_aged_flows to get the aged-out
> flows
> > >   contexts from the port.
> > >
> > > By this design each PMD can use its best way to do the aging with
> > > the device offloads supported by its HW.
> > >
> > > Signed-off-by: Matan Azrad <matan@mellanox.com>
> > > ---
> > >  lib/librte_ethdev/rte_ethdev.h |  1 +
> > >  lib/librte_ethdev/rte_flow.h   | 56
> > > ++++++++++++++++++++++++++++++++++++++++++
> > >  2 files changed, 57 insertions(+)
> > >
> > > diff --git a/lib/librte_ethdev/rte_ethdev.h
> > > b/lib/librte_ethdev/rte_ethdev.h index 1f35e1d..6fc1531 100644
> > > --- a/lib/librte_ethdev/rte_ethdev.h
> > > +++ b/lib/librte_ethdev/rte_ethdev.h
> > > @@ -2771,6 +2771,7 @@ enum rte_eth_event_type {
> > >  	RTE_ETH_EVENT_NEW,      /**< port is probed */
> > >  	RTE_ETH_EVENT_DESTROY,  /**< port is released */
> > >  	RTE_ETH_EVENT_IPSEC,    /**< IPsec offload related event */
> > > +	RTE_ETH_EVENT_FLOW_AGED,/**< New aged-out flows detected in
> > > the port
> > Does this event supported in HW?
> It depends in the PMD implementation and HW capability.
> 
> > Or Are planning to implement with alarm or timer.
> Again, depends in the PMD implementation.
> 
> > Just asking because, if none of the HW supports the interrupt then
> > only rte_flow_get_aged_flows sync API be enough()
> Why?

If none of the HW supports it, then the application/common code can poll periodically.
If mlx5 HW supports it, then it is fine to have the interrupt.
But I think we need a means to express that a HW/implementation does not support it,
as there may be the following reasons why drivers choose not to take the timer/alarm path:
1) Some EAL ports do not support timer/alarm, for example the FreeBSD DPDK port.
2) If we need to support a few kilo rules, then a timer/alarm implementation will be heavy.
So an option to express an unsupported event would be fine.

> 
> According to the above design this is the way for the PMD to notify the
> application when it has some aged flows ASAP.
> So, if the PMD uses an alarm\timer or any other way to support aging action
> it is better in part of the cases to notify the user asynchronically instead of
> doing polling by the application.
> The idea is to let the application to decide what is better for its usage.
> 
> For mlx5 case,
> The plan is to raise this event from an HW interrupt handling(same as link
> event).

Good to know.

> 
> Matan.
  
Matan Azrad June 18, 2019, 5:56 a.m. UTC | #4
Hi Jerin

From: Jerin Jacob
> Sent: Thursday, June 6, 2019 3:16 PM
> To: Matan Azrad <matan@mellanox.com>; Adrien Mazarguil
> <adrien.mazarguil@6wind.com>; dev@dpdk.org
> Subject: RE: [PATCH] [RFC] ethdev: support flow aging
> 
> > -----Original Message-----
> > From: Matan Azrad <matan@mellanox.com>
> > Sent: Thursday, June 6, 2019 4:22 PM
> > To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Adrien Mazarguil
> > <adrien.mazarguil@6wind.com>; dev@dpdk.org
> > Subject: [EXT] RE: [PATCH] [RFC] ethdev: support flow aging
> >
> > Hi Jerin
> 
> Hi Matan,
> 
> >
> > From: Jerin Jacob
> > > > -----Original Message-----
> > > > From: dev <dev-bounces@dpdk.org> On Behalf Of Matan Azrad
> > > > Sent: Sunday, May 26, 2019 3:48 PM
> > > > To: Adrien Mazarguil <adrien.mazarguil@6wind.com>; dev@dpdk.org
> > > > Subject: [dpdk-dev] [PATCH] [RFC] ethdev: support flow aging
> > > >
> > > > One of the reasons to destroy a flow is the fact that no packet
> > > > matches the flow for "timeout" time.
> > > > For example, when TCP\UDP sessions are suddenly closed.
> > > >
> > > > Currently, there is no any dpdk mechanism for flow aging and the
> > > > applications use there own ways to detect and destroy aged-out flows.
> > > >
> > > > This RFC introduces flow aging APIs to offload the flow aging task
> > > > from the application to the port.
> > > >
> > > > Design:
> > > > - A new rte_flow action: RTE_FLOW_ACTION_TYPE_AGE to set the
> > timeout
> > > > and
> > > >   the application flow context for each flow.
> > > > - A new ethdev event: RTE_ETH_EVENT_FLOW_AGED for the driver to
> > > report
> > > >   that there are new aged-out flows.
> > > > - A new rte_flow API: rte_flow_get_aged_flows to get the aged-out
> > flows
> > > >   contexts from the port.
> > > >
> > > > By this design each PMD can use its best way to do the aging with
> > > > the device offloads supported by its HW.
> > > >
> > > > Signed-off-by: Matan Azrad <matan@mellanox.com>
> > > > ---
> > > >  lib/librte_ethdev/rte_ethdev.h |  1 +
> > > >  lib/librte_ethdev/rte_flow.h   | 56
> > > > ++++++++++++++++++++++++++++++++++++++++++
> > > >  2 files changed, 57 insertions(+)
> > > >
> > > > diff --git a/lib/librte_ethdev/rte_ethdev.h
> > > > b/lib/librte_ethdev/rte_ethdev.h index 1f35e1d..6fc1531 100644
> > > > --- a/lib/librte_ethdev/rte_ethdev.h
> > > > +++ b/lib/librte_ethdev/rte_ethdev.h
> > > > @@ -2771,6 +2771,7 @@ enum rte_eth_event_type {
> > > >  	RTE_ETH_EVENT_NEW,      /**< port is probed */
> > > >  	RTE_ETH_EVENT_DESTROY,  /**< port is released */
> > > >  	RTE_ETH_EVENT_IPSEC,    /**< IPsec offload related event */
> > > > +	RTE_ETH_EVENT_FLOW_AGED,/**< New aged-out flows detected in
> > > > the port
> > > Does this event supported in HW?
> > It depends in the PMD implementation and HW capability.
> >
> > > Or Are planning to implement with alarm or timer.
> > Again, depends in the PMD implementation.
> >
> > > Just asking because, if none of the HW supports the interrupt then
> > > only rte_flow_get_aged_flows sync API be enough()
> > Why?
> 
> If none of the HW supports it then application/common code can periodically
> polls it.
> If mlx5 hw supports it then it fine to have interrupt.

Actually, MLX5 doesn't fully support aging in HW, but the HW can help do it better.
Look, the PMD knows best how to do aging with its HW, even if the HW does not fully support aging.
And that may give the application a meaningful efficiency gain.

> But I think, we need to have means to express a HW/Implementation does
> not support its As there may following reasons why drivers choose to not
> take timer/alarm path
> 1) Some EAL port does not support timer/alarm example: FreeBSD DPDK port
	OK, but why not support it for the other cases (non-FreeBSD ports)?

> 2) If we need to support a few killo rules then timer/alarm implementation
> will be heavy

Not sure; it depends on the HW capability.

> So an option to express un supported event would be fine.

Can you explain more about your intention here (2)?

> > According to the above design this is the way for the PMD to notify
> > the application when it has some aged flows ASAP.
> > So, if the PMD uses an alarm\timer or any other way to support aging
> > action it is better in part of the cases to notify the user
> > asynchronically instead of doing polling by the application.
> > The idea is to let the application to decide what is better for its usage.
> >
> > For mlx5 case,
> > The plan is to raise this event from an HW interrupt handling(same as
> > link event).
> 
> Good to know.

The MLX5 plan is still to use a timer/alarm together with an interrupt mechanism to support aging:
the HW help here is the ability to query a batch of flow counters asynchronously, so the response with the new counter values arrives via an interrupt.

The timer/alarm will call a devX operation to read a batch of counters asynchronously (a fast command).
The interrupt handler will catch the response and check the timeout for each flow
(no need to copy the counters from HW memory, as the values are already in PMD memory); if there is a newly aged flow, it raises the event.
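
The per-flow timeout check described here could look roughly like the sketch below. It is not mlx5 code: `struct flow_age_state` and `age_check_batch()` are hypothetical names, and the assumption is that the handler receives one hit counter per flow together with the time elapsed since the previous batch. A flow whose counter did not move accumulates idle time; once the idle time reaches its timeout, it is marked aged and counted, so the caller knows an event should be raised.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-flow aging state kept in PMD memory. */
struct flow_age_state {
	uint64_t last_hits;   /* counter value seen at the previous check */
	uint32_t idle_sec;    /* seconds elapsed with no new hits */
	uint16_t timeout_sec; /* timeout requested by the AGE action */
	int aged;             /* set once the flow has been reported */
};

/*
 * Process one batch of freshly read counters. 'elapsed_sec' is the time
 * since the previous batch. Returns the number of flows that newly aged
 * out; a non-zero result means an aged event should be raised.
 */
static int
age_check_batch(struct flow_age_state *flows, const uint64_t *hits,
		int n, uint32_t elapsed_sec)
{
	int newly_aged = 0;

	assert(flows != NULL && hits != NULL);
	for (int i = 0; i < n; i++) {
		struct flow_age_state *f = &flows[i];

		if (f->aged)
			continue; /* already reported */
		if (hits[i] != f->last_hits) {
			/* Traffic was seen: restart the idle period. */
			f->last_hits = hits[i];
			f->idle_sec = 0;
			continue;
		}
		f->idle_sec += elapsed_sec;
		if (f->idle_sec >= f->timeout_sec) {
			f->aged = 1;
			newly_aged++;
		}
	}
	return newly_aged;
}
```

The resolution of such a scheme is bounded by the batch period: a flow can be reported up to one period later than its nominal timeout.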
  
Jerin Jacob Kollanukkaran June 24, 2019, 6:26 a.m. UTC | #5
> -----Original Message-----
> From: Matan Azrad <matan@mellanox.com>
> Sent: Tuesday, June 18, 2019 11:27 AM
> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Adrien Mazarguil
> <adrien.mazarguil@6wind.com>; dev@dpdk.org
> Subject: [EXT] RE: [PATCH] [RFC] ethdev: support flow aging
> 
> Hi Jerin

Hi Matan,

> 
> From: Jerin Jacob
> > Sent: Thursday, June 6, 2019 3:16 PM
> > To: Matan Azrad <matan@mellanox.com>; Adrien Mazarguil
> > <adrien.mazarguil@6wind.com>; dev@dpdk.org
> > Subject: RE: [PATCH] [RFC] ethdev: support flow aging
> >
> > > -----Original Message-----
> > > From: Matan Azrad <matan@mellanox.com>
> > > Sent: Thursday, June 6, 2019 4:22 PM
> > > To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Adrien Mazarguil
> > > <adrien.mazarguil@6wind.com>; dev@dpdk.org
> > > Subject: [EXT] RE: [PATCH] [RFC] ethdev: support flow aging
> > >
> > > Hi Jerin
> >
> > Hi Matan,
> >
> > >
> > > From: Jerin Jacob
> > > > > -----Original Message-----
> > > > > From: dev <dev-bounces@dpdk.org> On Behalf Of Matan Azrad
> > > > > Sent: Sunday, May 26, 2019 3:48 PM
> > > > > To: Adrien Mazarguil <adrien.mazarguil@6wind.com>; dev@dpdk.org
> > > > > Subject: [dpdk-dev] [PATCH] [RFC] ethdev: support flow aging
> > > > >
> > > > > One of the reasons to destroy a flow is the fact that no packet
> > > > > matches the flow for "timeout" time.
> > > > > For example, when TCP\UDP sessions are suddenly closed.
> > > > >
> > > > > Currently, there is no any dpdk mechanism for flow aging and the
> > > > > applications use there own ways to detect and destroy aged-out
> flows.
> > > > >
> > > > > This RFC introduces flow aging APIs to offload the flow aging
> > > > > task from the application to the port.
> > > > >
> > > > > Design:
> > > > > - A new rte_flow action: RTE_FLOW_ACTION_TYPE_AGE to set the
> > > timeout
> > > > > and
> > > > >   the application flow context for each flow.
> > > > > - A new ethdev event: RTE_ETH_EVENT_FLOW_AGED for the driver
> to
> > > > report
> > > > >   that there are new aged-out flows.
> > > > > - A new rte_flow API: rte_flow_get_aged_flows to get the
> > > > > aged-out
> > > flows
> > > > >   contexts from the port.
> > > > >
> > > > > By this design each PMD can use its best way to do the aging
> > > > > with the device offloads supported by its HW.
> > > > >
> > > > > Signed-off-by: Matan Azrad <matan@mellanox.com>
> > > > > ---
> > > > >  lib/librte_ethdev/rte_ethdev.h |  1 +
> > > > >  lib/librte_ethdev/rte_flow.h   | 56
> > > > > ++++++++++++++++++++++++++++++++++++++++++
> > > > >  2 files changed, 57 insertions(+)
> > > > >
> > > > > diff --git a/lib/librte_ethdev/rte_ethdev.h
> > > > > b/lib/librte_ethdev/rte_ethdev.h index 1f35e1d..6fc1531 100644
> > > > > --- a/lib/librte_ethdev/rte_ethdev.h
> > > > > +++ b/lib/librte_ethdev/rte_ethdev.h
> > > > > @@ -2771,6 +2771,7 @@ enum rte_eth_event_type {
> > > > >  	RTE_ETH_EVENT_NEW,      /**< port is probed */
> > > > >  	RTE_ETH_EVENT_DESTROY,  /**< port is released */
> > > > >  	RTE_ETH_EVENT_IPSEC,    /**< IPsec offload related event */
> > > > > +	RTE_ETH_EVENT_FLOW_AGED,/**< New aged-out flows
> detected in
> > > > > the port
> > > > Does this event supported in HW?
> > > It depends in the PMD implementation and HW capability.
> > >
> > > > Or Are planning to implement with alarm or timer.
> > > Again, depends in the PMD implementation.
> > >
> > > > Just asking because, if none of the HW supports the interrupt then
> > > > only rte_flow_get_aged_flows sync API be enough()
> > > Why?
> >
> > If none of the HW supports it then application/common code can
> > periodically polls it.
> > If mlx5 hw supports it then it fine to have interrupt.
> 
> Actually MLX5 doesn't support aging fully by HW but the HW can help to do it
> better.
> Look, the PMD is the best one to know what is the best way to do aging by its
> HW even if aging is not fully supported by it.
> And it may add a meaningful efficiency to the application.
> 
> > But I think, we need to have means to express a HW/Implementation does
> > not support its As there may following reasons why drivers choose to
> > not take timer/alarm path
> > 1) Some EAL port does not support timer/alarm example: FreeBSD DPDK
> > port
> 	OK, but why not to support it for other cases (no FreeBSD port)?
> 
> > 2) If we need to support a few killo rules then timer/alarm
> > implementation will be heavy
> 
> Not sure, Depend in the HW ability.

Yes, when the HW does not support it at all.

> 
> > So an option to express un supported event would be fine.
> 
> Can you explain more what is your intension here (2)?

To address the case where the HW and/or the OS (like FreeBSD) does not support it at all. In such a case,
expressing that the event is unsupported would help the application handle aging in a synchronous manner.
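
A minimal sketch of what such a fallback could look like on the application side follows. Nothing in the RFC defines how "unsupported" is expressed; the sketch assumes, purely for illustration, that registering for the aged event returns -ENOTSUP in that case. `choose_aging_mode()`, `aged_event_register_t` and the two mock registration functions are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* How the application will learn about aged-out flows. */
enum aging_mode {
	AGING_MODE_EVENT, /* the driver raises the aged event */
	AGING_MODE_POLL,  /* the application polls for aged flows itself */
	AGING_MODE_ERROR  /* registration failed for another reason */
};

/* Hypothetical hook: 0 on success, -ENOTSUP when events are unsupported. */
typedef int (*aged_event_register_t)(void *port);

/* Try the asynchronous path first and fall back to synchronous polling. */
static enum aging_mode
choose_aging_mode(aged_event_register_t reg, void *port)
{
	int ret;

	assert(reg != NULL);
	ret = reg(port);
	if (ret == 0)
		return AGING_MODE_EVENT;
	if (ret == -ENOTSUP)
		return AGING_MODE_POLL;
	return AGING_MODE_ERROR;
}

/* Mock drivers used only to exercise the sketch. */
static int mock_reg_supported(void *port) { (void)port; return 0; }
static int mock_reg_unsupported(void *port) { (void)port; return -ENOTSUP; }
```

With the mocks above, `choose_aging_mode(mock_reg_unsupported, NULL)` selects the polling mode, which is the synchronous handling mentioned in this reply.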

> 
> > > According to the above design this is the way for the PMD to notify
> > > the application when it has some aged flows ASAP.
> > > So, if the PMD uses an alarm\timer or any other way to support aging
> > > action it is better in part of the cases to notify the user
> > > asynchronically instead of doing polling by the application.
> > > The idea is to let the application to decide what is better for its usage.
> > >
> > > For mlx5 case,
> > > The plan is to raise this event from an HW interrupt handling(same
> > > as link event).
> >
> > Good to know.
> 
> The MLX5 plan is still to use timer/alarm and interrupt mechanism to support
> aging:
>  The HW help here is the ability to query batch of flows counters
> asynchronically, so getting the response of the new counters values by an
> interrupt.
> 
> The timer\alarm will call to devX operation to read batch of counters
> asynchronically - fast command.
> The interrupt handler to catch the response and to check timeout for each
> flow (no need to copy the counters from the HW memory - the values are in
> the PMD memory) - if there is a new aged flow - raise the event.
  
Matan Azrad June 27, 2019, 8:26 a.m. UTC | #6
Hi all

Thanks Jerin for your comments.
Looks like we agree that the feature is relevant at least for mlx5...

Does anyone else have more comments?


From: Jerin Jacob Kollanukkaran 
> > -----Original Message-----
> > From: Matan Azrad <matan@mellanox.com>
> > Sent: Tuesday, June 18, 2019 11:27 AM
> > To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Adrien Mazarguil
> > <adrien.mazarguil@6wind.com>; dev@dpdk.org
> > Subject: [EXT] RE: [PATCH] [RFC] ethdev: support flow aging
> >
> > Hi Jerin
> 
> Hi Matan,
> 
> >
> > From: Jerin Jacob
> > > Sent: Thursday, June 6, 2019 3:16 PM
> > > To: Matan Azrad <matan@mellanox.com>; Adrien Mazarguil
> > > <adrien.mazarguil@6wind.com>; dev@dpdk.org
> > > Subject: RE: [PATCH] [RFC] ethdev: support flow aging
> > >
> > > > -----Original Message-----
> > > > From: Matan Azrad <matan@mellanox.com>
> > > > Sent: Thursday, June 6, 2019 4:22 PM
> > > > To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Adrien
> > > > Mazarguil <adrien.mazarguil@6wind.com>; dev@dpdk.org
> > > > Subject: [EXT] RE: [PATCH] [RFC] ethdev: support flow aging
> > > >
> > > > Hi Jerin
> > >
> > > Hi Matan,
> > >
> > > >
> > > > From: Jerin Jacob
> > > > > > -----Original Message-----
> > > > > > From: dev <dev-bounces@dpdk.org> On Behalf Of Matan Azrad
> > > > > > Sent: Sunday, May 26, 2019 3:48 PM
> > > > > > To: Adrien Mazarguil <adrien.mazarguil@6wind.com>;
> > > > > > dev@dpdk.org
> > > > > > Subject: [dpdk-dev] [PATCH] [RFC] ethdev: support flow aging
> > > > > >
> > > > > > One of the reasons to destroy a flow is the fact that no
> > > > > > packet matches the flow for "timeout" time.
> > > > > > For example, when TCP\UDP sessions are suddenly closed.
> > > > > >
> > > > > > Currently, there is no any dpdk mechanism for flow aging and
> > > > > > the applications use there own ways to detect and destroy
> > > > > > aged-out
> > flows.
> > > > > >
> > > > > > This RFC introduces flow aging APIs to offload the flow aging
> > > > > > task from the application to the port.
> > > > > >
> > > > > > Design:
> > > > > > - A new rte_flow action: RTE_FLOW_ACTION_TYPE_AGE to set the
> > > > timeout
> > > > > > and
> > > > > >   the application flow context for each flow.
> > > > > > - A new ethdev event: RTE_ETH_EVENT_FLOW_AGED for the driver
> > to
> > > > > report
> > > > > >   that there are new aged-out flows.
> > > > > > - A new rte_flow API: rte_flow_get_aged_flows to get the
> > > > > > aged-out
> > > > flows
> > > > > >   contexts from the port.
> > > > > >
> > > > > > By this design each PMD can use its best way to do the aging
> > > > > > with the device offloads supported by its HW.
> > > > > >
> > > > > > Signed-off-by: Matan Azrad <matan@mellanox.com>
> > > > > > ---
> > > > > >  lib/librte_ethdev/rte_ethdev.h |  1 +
> > > > > >  lib/librte_ethdev/rte_flow.h   | 56
> > > > > > ++++++++++++++++++++++++++++++++++++++++++
> > > > > >  2 files changed, 57 insertions(+)
> > > > > >
> > > > > > diff --git a/lib/librte_ethdev/rte_ethdev.h
> > > > > > b/lib/librte_ethdev/rte_ethdev.h index 1f35e1d..6fc1531 100644
> > > > > > --- a/lib/librte_ethdev/rte_ethdev.h
> > > > > > +++ b/lib/librte_ethdev/rte_ethdev.h
> > > > > > @@ -2771,6 +2771,7 @@ enum rte_eth_event_type {
> > > > > >  	RTE_ETH_EVENT_NEW,      /**< port is probed */
> > > > > >  	RTE_ETH_EVENT_DESTROY,  /**< port is released */
> > > > > >  	RTE_ETH_EVENT_IPSEC,    /**< IPsec offload related event */
> > > > > > +	RTE_ETH_EVENT_FLOW_AGED,/**< New aged-out flows
> > detected in
> > > > > > the port
> > > > > Does this event supported in HW?
> > > > It depends in the PMD implementation and HW capability.
> > > >
> > > > > Or Are planning to implement with alarm or timer.
> > > > Again, depends in the PMD implementation.
> > > >
> > > > > Just asking because, if none of the HW supports the interrupt
> > > > > then only rte_flow_get_aged_flows sync API be enough()
> > > > Why?
> > >
> > > If none of the HW supports it then application/common code can
> > > periodically polls it.
> > > If mlx5 hw supports it then it fine to have interrupt.
> >
> > Actually MLX5 doesn't support aging fully by HW but the HW can help to
> > do it better.
> > Look, the PMD is the best one to know what is the best way to do aging
> > by its HW even if aging is not fully supported by it.
> > And it may add a meaningful efficiency to the application.
> >
> > > But I think, we need to have means to express a HW/Implementation
> > > does not support its As there may following reasons why drivers
> > > choose to not take timer/alarm path
> > > 1) Some EAL port does not support timer/alarm example: FreeBSD DPDK
> > > port
> > 	OK, but why not to support it for other cases (no FreeBSD port)?
> >
> > > 2) If we need to support a few killo rules then timer/alarm
> > > implementation will be heavy
> >
> > Not sure, Depend in the HW ability.
> 
> Yes when HW does not support at all.
> 
> >
> > > So an option to express un supported event would be fine.
> >
> > Can you explain more what is your intension here (2)?
> 
> To address the case where HW and/or OS(Like FreeBSD) does not support at
> all . In such case, Expressing the unsupported would help application to
> handle in synchronous manner.
> 
> >
> > > > According to the above design this is the way for the PMD to
> > > > notify the application when it has some aged flows ASAP.
> > > > So, if the PMD uses an alarm\timer or any other way to support
> > > > aging action it is better in part of the cases to notify the user
> > > > asynchronically instead of doing polling by the application.
> > > > The idea is to let the application to decide what is better for its usage.
> > > >
> > > > For mlx5 case,
> > > > The plan is to raise this event from an HW interrupt handling(same
> > > > as link event).
> > >
> > > Good to know.
> >
> > The MLX5 plan is still to use timer/alarm and interrupt mechanism to
> > support
> > aging:
> >  The HW help here is the ability to query batch of flows counters
> > asynchronically, so getting the response of the new counters values by
> > an interrupt.
> >
> > The timer\alarm will call to devX operation to read batch of counters
> > asynchronically - fast command.
> > The interrupt handler to catch the response and to check timeout for
> > each flow (no need to copy the counters from the HW memory - the
> > values are in the PMD memory) - if there is a new aged flow - raise the
> event.
  
Stephen Hemminger March 16, 2020, 4:13 p.m. UTC | #7
On Thu, 6 Jun 2019 12:15:50 +0000
Jerin Jacob Kollanukkaran <jerinj@marvell.com> wrote:

> > -----Original Message-----
> > From: Matan Azrad <matan@mellanox.com>
> > Sent: Thursday, June 6, 2019 4:22 PM
> > To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Adrien Mazarguil
> > <adrien.mazarguil@6wind.com>; dev@dpdk.org
> > Subject: [EXT] RE: [PATCH] [RFC] ethdev: support flow aging
> > 
> > Hi Jerin  
> 
> Hi Matan,
> 
> > 
> > From: Jerin Jacob  
> > > > -----Original Message-----
> > > > From: dev <dev-bounces@dpdk.org> On Behalf Of Matan Azrad
> > > > Sent: Sunday, May 26, 2019 3:48 PM
> > > > To: Adrien Mazarguil <adrien.mazarguil@6wind.com>; dev@dpdk.org
> > > > Subject: [dpdk-dev] [PATCH] [RFC] ethdev: support flow aging
> > > >
> > > > One of the reasons to destroy a flow is the fact that no packet
> > > > matches the flow for "timeout" time.
> > > > For example, when TCP\UDP sessions are suddenly closed.
> > > >
> > > > Currently, there is no any dpdk mechanism for flow aging and the
> > > > applications use there own ways to detect and destroy aged-out flows.
> > > >
> > > > This RFC introduces flow aging APIs to offload the flow aging task
> > > > from the application to the port.
> > > >
> > > > Design:
> > > > - A new rte_flow action: RTE_FLOW_ACTION_TYPE_AGE to set the  
> > timeout  
> > > > and
> > > >   the application flow context for each flow.
> > > > - A new ethdev event: RTE_ETH_EVENT_FLOW_AGED for the driver to  
> > > report  
> > > >   that there are new aged-out flows.
> > > > - A new rte_flow API: rte_flow_get_aged_flows to get the aged-out  
> > flows  
> > > >   contexts from the port.
> > > >
> > > > By this design each PMD can use its best way to do the aging with
> > > > the device offloads supported by its HW.
> > > >
> > > > Signed-off-by: Matan Azrad <matan@mellanox.com>
> > > > ---
> > > >  lib/librte_ethdev/rte_ethdev.h |  1 +
> > > >  lib/librte_ethdev/rte_flow.h   | 56
> > > > ++++++++++++++++++++++++++++++++++++++++++
> > > >  2 files changed, 57 insertions(+)
> > > >
> > > > diff --git a/lib/librte_ethdev/rte_ethdev.h
> > > > b/lib/librte_ethdev/rte_ethdev.h index 1f35e1d..6fc1531 100644
> > > > --- a/lib/librte_ethdev/rte_ethdev.h
> > > > +++ b/lib/librte_ethdev/rte_ethdev.h
> > > > @@ -2771,6 +2771,7 @@ enum rte_eth_event_type {
> > > >  	RTE_ETH_EVENT_NEW,      /**< port is probed */
> > > >  	RTE_ETH_EVENT_DESTROY,  /**< port is released */
> > > >  	RTE_ETH_EVENT_IPSEC,    /**< IPsec offload related event */
> > > > +	RTE_ETH_EVENT_FLOW_AGED,/**< New aged-out flows detected in
> > > > the port  
> > > Does this event supported in HW?  
> > It depends in the PMD implementation and HW capability.
> >   
> > > Or Are planning to implement with alarm or timer.  
> > Again, depends in the PMD implementation.
> >   
> > > Just asking because, if none of the HW supports the interrupt then
> > > only rte_flow_get_aged_flows sync API be enough()  
> > Why?  
> 
> If none of the HW supports it then application/common code can periodically polls it.
> If mlx5 hw supports it then it fine to have interrupt. 
> But I think, we need to have means to express a HW/Implementation does not support its
> As there may following reasons why drivers choose to not take timer/alarm path 
> 1) Some EAL port does not support timer/alarm example: FreeBSD DPDK port
> 2) If we need to support a few killo rules then timer/alarm implementation will be heavy
> So an option to express un supported event would be fine.

This API needs to be defined in a way that makes it possible to write
an application that works on multiple types of hardware. This is often hard
to do with DPDK because too often APIs are added that are convenient for the
driver writer.

There must be only one way that flow aging notifications happen, and they must
only occur in a specific context. Is this in a normal DPDK thread, in an interrupt thread,
or in an alarm thread? Choose one and make all drivers do the same thing.
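
One way an application could stay independent of the notification context is sketched below. This is not part of the RFC and does not answer which context the API should mandate; it only assumes C11 atomics and uses hypothetical names (`aged_pending`, `on_flow_aged_event()`, `take_aged_notification()`). The event callback does nothing but set a flag, and all real work happens in a thread the application controls.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Set by the event callback, consumed by the application's own thread. */
static atomic_bool aged_pending;

/*
 * Body of the aged-event callback. It may run in an interrupt, alarm or
 * normal thread, so it only records that aged flows are available.
 */
static void
on_flow_aged_event(void)
{
	atomic_store(&aged_pending, true);
}

/*
 * Called from the application's own loop. Returns true at most once per
 * burst of events; the caller then fetches and destroys the aged flows.
 */
static bool
take_aged_notification(void)
{
	return atomic_exchange(&aged_pending, false);
}
```

Several events raised before the loop runs collapse into a single notification, which is harmless if the loop then drains every aged flow the port reports.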
  

Patch

diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index 1f35e1d..6fc1531 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -2771,6 +2771,7 @@  enum rte_eth_event_type {
 	RTE_ETH_EVENT_NEW,      /**< port is probed */
 	RTE_ETH_EVENT_DESTROY,  /**< port is released */
 	RTE_ETH_EVENT_IPSEC,    /**< IPsec offload related event */
+	RTE_ETH_EVENT_FLOW_AGED,/**< New aged-out flows detected in the port */
 	RTE_ETH_EVENT_MAX       /**< max value of this enum */
 };
 
diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
index 63f84fc..757e65f 100644
--- a/lib/librte_ethdev/rte_flow.h
+++ b/lib/librte_ethdev/rte_flow.h
@@ -1650,6 +1650,12 @@  enum rte_flow_action_type {
 	 * See struct rte_flow_action_set_mac.
 	 */
 	RTE_FLOW_ACTION_TYPE_SET_MAC_DST,
+	/**
+	 * Report as aged-out if timeout passed without any matching on the flow.
+	 *
+	 * See struct rte_flow_action_age.
+	 */
+	RTE_FLOW_ACTION_TYPE_AGE,
 };
 
 /**
@@ -2131,6 +2137,22 @@  struct rte_flow_action_set_mac {
 	uint8_t mac_addr[ETHER_ADDR_LEN];
 };
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this structure may change without prior notice
+ *
+ * RTE_FLOW_ACTION_TYPE_AGE
+ *
+ * Report as aged-out if timeout passed without any matching on the flow.
+ *
+ * The flow context and the flow handle will be reported by the
+ * rte_flow_get_aged_flows API.
+ */
+struct rte_flow_action_age {
+	uint16_t timeout; /**< Time in seconds. */
+	void *context; /**< The user flow context. */
+};
+
 /*
  * Definition of a single action.
  *
@@ -2686,6 +2708,40 @@  struct rte_flow_desc {
 	      const void *src,
 	      struct rte_flow_error *error);
 
+/**
+ * Get aged-out flows of a given port.
+ *
+ * RTE_ETH_EVENT_FLOW_AGED is triggered when a port detects aged-out flows.
+ * This function can be called to get the aged flows asynchronously from the
+ * event callback or synchronously when the user wants it.
+ * Synchronizing with the callback is the user's responsibility.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in, out] flows
+ *   An allocated array to get the aged-out flow handles.
+ *   NULL indicates the flow handles should not be reported.
+ * @param[in, out] contexts
+ *   An allocated array to get the aged-out flow contexts.
+ *   NULL indicates the flow contexts should not be reported.
+ * @param[in] n
+ *   Number of entries allocated in @p flows and @p contexts, if present.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. Initialized in case of
+ *   error only.
+ *
+ * @return
+ *   0 if there are no aged-out flows, a positive value giving the number
+ *   of aged-out flows reported to @p flows and/or @p contexts, or a
+ *   negative errno value otherwise, in which case rte_errno is set.
+ *
+ * @see rte_flow_action_age
+ */
+__rte_experimental
+int
+rte_flow_get_aged_flows(uint16_t port_id, struct rte_flow *flows[],
+			void *contexts[], int n, struct rte_flow_error *error);
+
 #ifdef __cplusplus
 }
 #endif
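
The return-value convention documented for rte_flow_get_aged_flows() (0 for none, positive for a count, negative for an error) suggests a drain loop on the caller side. The sketch below is a self-contained mock, not the DPDK implementation: `struct mock_port`, `mock_get_aged_flows()` and `drain_aged_flows()` are hypothetical, the flow-handle array and the error argument are omitted, and it assumes that flows which do not fit in the caller's array stay queued for the next call, which the RFC does not state.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MOCK_MAX_AGED 8

/* Mock port holding the contexts of flows that have aged out. */
struct mock_port {
	void *aged[MOCK_MAX_AGED];
	int nb_aged;
};

/*
 * Mock following the documented return convention: number of reported
 * flows, 0 when none are aged, negative errno on error.
 */
static int
mock_get_aged_flows(struct mock_port *port, void *contexts[], int n)
{
	int count;

	if (port == NULL || n < 0)
		return -EINVAL;
	count = port->nb_aged < n ? port->nb_aged : n;
	for (int i = 0; i < count; i++) {
		if (contexts != NULL)
			contexts[i] = port->aged[i];
	}
	/* Assumption: unreported entries stay queued for the next call. */
	for (int i = count; i < port->nb_aged; i++)
		port->aged[i - count] = port->aged[i];
	port->nb_aged -= count;
	return count;
}

/* Fetch aged flows in batches until the port reports none or an error. */
static int
drain_aged_flows(struct mock_port *port, int batch)
{
	void *contexts[MOCK_MAX_AGED];
	int total = 0;
	int ret;

	assert(batch > 0 && batch <= MOCK_MAX_AGED);
	while ((ret = mock_get_aged_flows(port, contexts, batch)) > 0)
		total += ret; /* the application would destroy these flows here */
	return ret < 0 ? ret : total;
}
```

Looping until the call returns 0 lets a small, fixed-size array handle any number of aged flows, at the cost of several calls per event.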