mbox series

[v6,0/8] fib: implement AVX512 vector lookup

Message ID cover.1594638050.git.vladimir.medvedkin@intel.com (mailing list archive)
Headers
Series fib: implement AVX512 vector lookup |

Message

Vladimir Medvedkin July 13, 2020, 11:11 a.m. UTC
  This patch series implements vectorized lookup using AVX512 for
ipv4 dir24_8 and ipv6 trie algorithms.
Also introduced rte_fib_set_lookup_fn() to change lookup function type.
Added option to select lookup function type in testfib application.

v6:
 - style fixes

v5:
 - prefix zmm macro in rte_vect.h with RTE_X86
 - remove unnecessary typedef for _x86_zmm_t
 - reword commit title
 - fix typos

v4:
 - use __rte_aligned() instead of using compiler attribute directly
 - rework and add comments to meson.build

v3:
 - separate out the AVX-512 code into a separate file

v2:
 - rename rte_zmm to __rte_x86_zmm to reflect its internal usage
 - make runtime decision to use avx512 lookup

Vladimir Medvedkin (8):
  eal/x86: introduce AVX 512-bit type
  fib: make lookup function type configurable
  fib: move lookup definition into the header file
  fib: introduce AVX512 lookup
  fib6: make lookup function type configurable
  fib6: move lookup definition into the header file
  fib6: introduce AVX512 lookup
  app/testfib: add support for different lookup functions

 app/test-fib/main.c                   |  58 +++++-
 lib/librte_eal/x86/include/rte_vect.h |  19 ++
 lib/librte_fib/Makefile               |  24 +++
 lib/librte_fib/dir24_8.c              | 281 +++++---------------------
 lib/librte_fib/dir24_8.h              | 226 ++++++++++++++++++++-
 lib/librte_fib/dir24_8_avx512.c       | 165 +++++++++++++++
 lib/librte_fib/dir24_8_avx512.h       |  24 +++
 lib/librte_fib/meson.build            |  31 +++
 lib/librte_fib/rte_fib.c              |  21 +-
 lib/librte_fib/rte_fib.h              |  24 +++
 lib/librte_fib/rte_fib6.c             |  20 +-
 lib/librte_fib/rte_fib6.h             |  22 ++
 lib/librte_fib/rte_fib_version.map    |   2 +
 lib/librte_fib/trie.c                 | 161 +++------------
 lib/librte_fib/trie.h                 | 119 ++++++++++-
 lib/librte_fib/trie_avx512.c          | 269 ++++++++++++++++++++++++
 lib/librte_fib/trie_avx512.h          |  20 ++
 17 files changed, 1114 insertions(+), 372 deletions(-)
 create mode 100644 lib/librte_fib/dir24_8_avx512.c
 create mode 100644 lib/librte_fib/dir24_8_avx512.h
 create mode 100644 lib/librte_fib/trie_avx512.c
 create mode 100644 lib/librte_fib/trie_avx512.h
  

Comments

Stephen Hemminger July 13, 2020, 10:19 p.m. UTC | #1
On Mon, 13 Jul 2020 12:11:19 +0100
Vladimir Medvedkin <vladimir.medvedkin@intel.com> wrote:

> This patch series implements vectorized lookup using AVX512 for
> ipv4 dir24_8 and ipv6 trie algorithms.
> Also introduced rte_fib_set_lookup_fn() to change lookup function type.
> Added option to select lookup function type in testfib application.
> 
> v6:
>  - style fixes
> 
> v5:
>  - prefix zmm macro in rte_vect.h with RTE_X86
>  - remove unnecessary typedef for _x86_zmm_t
>  - reword commit title
>  - fix typos
> 
> v4:
>  - use __rte_aligned() instead of using compiler attribute directly
>  - rework and add comments to meson.build
> 
> v3:
>  - separate out the AVX-512 code into a separate file
> 
> v2:
>  - rename rte_zmm to __rte_x86_zmm to reflect its internal usage
>  - make runtime decision to use avx512 lookup
> 
> Vladimir Medvedkin (8):
>   eal/x86: introduce AVX 512-bit type
>   fib: make lookup function type configurable
>   fib: move lookup definition into the header file
>   fib: introduce AVX512 lookup
>   fib6: make lookup function type configurable
>   fib6: move lookup definition into the header file
>   fib6: introduce AVX512 lookup
>   app/testfib: add support for different lookup functions
> 
>  app/test-fib/main.c                   |  58 +++++-
>  lib/librte_eal/x86/include/rte_vect.h |  19 ++
>  lib/librte_fib/Makefile               |  24 +++
>  lib/librte_fib/dir24_8.c              | 281 +++++---------------------
>  lib/librte_fib/dir24_8.h              | 226 ++++++++++++++++++++-
>  lib/librte_fib/dir24_8_avx512.c       | 165 +++++++++++++++
>  lib/librte_fib/dir24_8_avx512.h       |  24 +++
>  lib/librte_fib/meson.build            |  31 +++
>  lib/librte_fib/rte_fib.c              |  21 +-
>  lib/librte_fib/rte_fib.h              |  24 +++
>  lib/librte_fib/rte_fib6.c             |  20 +-
>  lib/librte_fib/rte_fib6.h             |  22 ++
>  lib/librte_fib/rte_fib_version.map    |   2 +
>  lib/librte_fib/trie.c                 | 161 +++------------
>  lib/librte_fib/trie.h                 | 119 ++++++++++-
>  lib/librte_fib/trie_avx512.c          | 269 ++++++++++++++++++++++++
>  lib/librte_fib/trie_avx512.h          |  20 ++
>  17 files changed, 1114 insertions(+), 372 deletions(-)
>  create mode 100644 lib/librte_fib/dir24_8_avx512.c
>  create mode 100644 lib/librte_fib/dir24_8_avx512.h
>  create mode 100644 lib/librte_fib/trie_avx512.c
>  create mode 100644 lib/librte_fib/trie_avx512.h
> 

Did anyone else see the recent AVX512 discussion from Linus:
  "I hope AVX512 dies a painful death, and that Intel starts fixing real problems 
   instead of trying to create magic instructions to then create benchmarks that they can look good on.
  
Ray Kinsella July 14, 2020, 7:31 a.m. UTC | #2
On 13/07/2020 23:19, Stephen Hemminger wrote:
> On Mon, 13 Jul 2020 12:11:19 +0100
> Vladimir Medvedkin <vladimir.medvedkin@intel.com> wrote:
> 
>> This patch series implements vectorized lookup using AVX512 for
>> ipv4 dir24_8 and ipv6 trie algorithms.
>> Also introduced rte_fib_set_lookup_fn() to change lookup function type.
>> Added option to select lookup function type in testfib application.
>>
>> v6:
>>  - style fixes
>>
>> v5:
>>  - prefix zmm macro in rte_vect.h with RTE_X86
>>  - remove unnecessary typedef for _x86_zmm_t
>>  - reword commit title
>>  - fix typos
>>
>> v4:
>>  - use __rte_aligned() instead of using compiler attribute directly
>>  - rework and add comments to meson.build
>>
>> v3:
>>  - separate out the AVX-512 code into a separate file
>>
>> v2:
>>  - rename rte_zmm to __rte_x86_zmm to reflect its internal usage
>>  - make runtime decision to use avx512 lookup
>>
>> Vladimir Medvedkin (8):
>>   eal/x86: introduce AVX 512-bit type
>>   fib: make lookup function type configurable
>>   fib: move lookup definition into the header file
>>   fib: introduce AVX512 lookup
>>   fib6: make lookup function type configurable
>>   fib6: move lookup definition into the header file
>>   fib6: introduce AVX512 lookup
>>   app/testfib: add support for different lookup functions
>>
>>  app/test-fib/main.c                   |  58 +++++-
>>  lib/librte_eal/x86/include/rte_vect.h |  19 ++
>>  lib/librte_fib/Makefile               |  24 +++
>>  lib/librte_fib/dir24_8.c              | 281 +++++---------------------
>>  lib/librte_fib/dir24_8.h              | 226 ++++++++++++++++++++-
>>  lib/librte_fib/dir24_8_avx512.c       | 165 +++++++++++++++
>>  lib/librte_fib/dir24_8_avx512.h       |  24 +++
>>  lib/librte_fib/meson.build            |  31 +++
>>  lib/librte_fib/rte_fib.c              |  21 +-
>>  lib/librte_fib/rte_fib.h              |  24 +++
>>  lib/librte_fib/rte_fib6.c             |  20 +-
>>  lib/librte_fib/rte_fib6.h             |  22 ++
>>  lib/librte_fib/rte_fib_version.map    |   2 +
>>  lib/librte_fib/trie.c                 | 161 +++------------
>>  lib/librte_fib/trie.h                 | 119 ++++++++++-
>>  lib/librte_fib/trie_avx512.c          | 269 ++++++++++++++++++++++++
>>  lib/librte_fib/trie_avx512.h          |  20 ++
>>  17 files changed, 1114 insertions(+), 372 deletions(-)
>>  create mode 100644 lib/librte_fib/dir24_8_avx512.c
>>  create mode 100644 lib/librte_fib/dir24_8_avx512.h
>>  create mode 100644 lib/librte_fib/trie_avx512.c
>>  create mode 100644 lib/librte_fib/trie_avx512.h
>>
> 
> Did anyone else see the recent AVX512 discussion from Linus:
>   "I hope AVX512 dies a painful death, and that Intel starts fixing real problems 
>    instead of trying to create magic instructions to then create benchmarks that they can look good on. 

Yup - I saw this one.
Sweeping statements like these are good to provoke debate, the truth is generally more nuanced.
If you continue to read the post, Linus appears to be mostly questioning microprocessor design decisions.

That is an interesting discussion, however the reality is that the technology does exists and may be beneficial for Packet Processing. 

I would suggest, we continue to apply the same logic governing adoption of any technology by DPDK. 
When the technology is present and a clear benefit is shown, we use it with caution.

In the case of Vladimir's patch,
the user has to explicitly switch on the AVX512 lookup with RTE_FIB_DIR24_8_VECTOR_AVX512.

Thanks, 

Ray K
  
Stephen Hemminger July 14, 2020, 2:38 p.m. UTC | #3
On Tue, 14 Jul 2020 08:31:32 +0100
"Kinsella, Ray" <mdr@ashroe.eu> wrote:

> On 13/07/2020 23:19, Stephen Hemminger wrote:
> > On Mon, 13 Jul 2020 12:11:19 +0100
> > Vladimir Medvedkin <vladimir.medvedkin@intel.com> wrote:
> >   
> >> This patch series implements vectorized lookup using AVX512 for
> >> ipv4 dir24_8 and ipv6 trie algorithms.
> >> Also introduced rte_fib_set_lookup_fn() to change lookup function type.
> >> Added option to select lookup function type in testfib application.
> >>
> >> v6:
> >>  - style fixes
> >>
> >> v5:
> >>  - prefix zmm macro in rte_vect.h with RTE_X86
> >>  - remove unnecessary typedef for _x86_zmm_t
> >>  - reword commit title
> >>  - fix typos
> >>
> >> v4:
> >>  - use __rte_aligned() instead of using compiler attribute directly
> >>  - rework and add comments to meson.build
> >>
> >> v3:
> >>  - separate out the AVX-512 code into a separate file
> >>
> >> v2:
> >>  - rename rte_zmm to __rte_x86_zmm to reflect its internal usage
> >>  - make runtime decision to use avx512 lookup
> >>
> >> Vladimir Medvedkin (8):
> >>   eal/x86: introduce AVX 512-bit type
> >>   fib: make lookup function type configurable
> >>   fib: move lookup definition into the header file
> >>   fib: introduce AVX512 lookup
> >>   fib6: make lookup function type configurable
> >>   fib6: move lookup definition into the header file
> >>   fib6: introduce AVX512 lookup
> >>   app/testfib: add support for different lookup functions
> >>
> >>  app/test-fib/main.c                   |  58 +++++-
> >>  lib/librte_eal/x86/include/rte_vect.h |  19 ++
> >>  lib/librte_fib/Makefile               |  24 +++
> >>  lib/librte_fib/dir24_8.c              | 281 +++++---------------------
> >>  lib/librte_fib/dir24_8.h              | 226 ++++++++++++++++++++-
> >>  lib/librte_fib/dir24_8_avx512.c       | 165 +++++++++++++++
> >>  lib/librte_fib/dir24_8_avx512.h       |  24 +++
> >>  lib/librte_fib/meson.build            |  31 +++
> >>  lib/librte_fib/rte_fib.c              |  21 +-
> >>  lib/librte_fib/rte_fib.h              |  24 +++
> >>  lib/librte_fib/rte_fib6.c             |  20 +-
> >>  lib/librte_fib/rte_fib6.h             |  22 ++
> >>  lib/librte_fib/rte_fib_version.map    |   2 +
> >>  lib/librte_fib/trie.c                 | 161 +++------------
> >>  lib/librte_fib/trie.h                 | 119 ++++++++++-
> >>  lib/librte_fib/trie_avx512.c          | 269 ++++++++++++++++++++++++
> >>  lib/librte_fib/trie_avx512.h          |  20 ++
> >>  17 files changed, 1114 insertions(+), 372 deletions(-)
> >>  create mode 100644 lib/librte_fib/dir24_8_avx512.c
> >>  create mode 100644 lib/librte_fib/dir24_8_avx512.h
> >>  create mode 100644 lib/librte_fib/trie_avx512.c
> >>  create mode 100644 lib/librte_fib/trie_avx512.h
> >>  
> > 
> > Did anyone else see the recent AVX512 discussion from Linus:
> >   "I hope AVX512 dies a painful death, and that Intel starts fixing real problems 
> >    instead of trying to create magic instructions to then create benchmarks that they can look good on.   
> 
> Yup - I saw this one.
> Sweeping statements like these are good to provoke debate, the truth is generally more nuanced.
> If you continue to read the post, Linus appears to be mostly questioning microprocessor design decisions.
> 
> That is an interesting discussion, however the reality is that the technology does exists and may be beneficial for Packet Processing. 
> 
> I would suggest, we continue to apply the same logic governing adoption of any technology by DPDK. 
> When the technology is present and a clear benefit is shown, we use it with caution.
> 
> In the case of Vladimir's patch,
> the user has to explicitly switch on the AVX512 lookup with RTE_FIB_DIR24_8_VECTOR_AVX512.
> 

Using what is available makes sense in DPDK.
  
Thomas Monjalon July 15, 2020, 9:47 a.m. UTC | #4
14/07/2020 16:38, Stephen Hemminger:
> "Kinsella, Ray" <mdr@ashroe.eu> wrote:
> > On 13/07/2020 23:19, Stephen Hemminger wrote:
> > > Did anyone else see the recent AVX512 discussion from Linus:
> > >   "I hope AVX512 dies a painful death, and that Intel starts fixing real problems 
> > >    instead of trying to create magic instructions to then create benchmarks that they can look good on.   
> > 
> > Yup - I saw this one.
> > Sweeping statements like these are good to provoke debate, the truth is generally more nuanced.
> > If you continue to read the post, Linus appears to be mostly questioning microprocessor design decisions.
> > 
> > That is an interesting discussion, however the reality is that the technology does exists and may be beneficial for Packet Processing. 
> > 
> > I would suggest, we continue to apply the same logic governing adoption of any technology by DPDK. 
> > When the technology is present and a clear benefit is shown, we use it with caution.
> > 
> > In the case of Vladimir's patch,
> > the user has to explicitly switch on the AVX512 lookup with RTE_FIB_DIR24_8_VECTOR_AVX512.
> 
> Using what is available makes sense in DPDK. 

Why does it require explicit  enabling in application?
AVX512 is not reliable enough to be automatically used when available?
  
Vladimir Medvedkin July 15, 2020, 10:35 a.m. UTC | #5
On 15/07/2020 10:47, Thomas Monjalon wrote:
> 14/07/2020 16:38, Stephen Hemminger:
>> "Kinsella, Ray" <mdr@ashroe.eu> wrote:
>>> On 13/07/2020 23:19, Stephen Hemminger wrote:
>>>> Did anyone else see the recent AVX512 discussion from Linus:
>>>>    "I hope AVX512 dies a painful death, and that Intel starts fixing real problems
>>>>     instead of trying to create magic instructions to then create benchmarks that they can look good on.
>>>
>>> Yup - I saw this one.
>>> Sweeping statements like these are good to provoke debate, the truth is generally more nuanced.
>>> If you continue to read the post, Linus appears to be mostly questioning microprocessor design decisions.
>>>
>>> That is an interesting discussion, however the reality is that the technology does exists and may be beneficial for Packet Processing.
>>>
>>> I would suggest, we continue to apply the same logic governing adoption of any technology by DPDK.
>>> When the technology is present and a clear benefit is shown, we use it with caution.
>>>
>>> In the case of Vladimir's patch,
>>> the user has to explicitly switch on the AVX512 lookup with RTE_FIB_DIR24_8_VECTOR_AVX512.
>>
>> Using what is available makes sense in DPDK.
> 
> Why does it require explicit  enabling in application?
> AVX512 is not reliable enough to be automatically used when available?
> 

It is reliable enough. User have to explicitly trigger to avx512 lookup 
because using avx512 instructions can reduce the frequency of your 
cores. The user knows their environment better. So the scalar version is 
used so as not to affect the frequency.


> 
>
  
Thomas Monjalon July 15, 2020, 11:59 a.m. UTC | #6
15/07/2020 12:35, Medvedkin, Vladimir:
> On 15/07/2020 10:47, Thomas Monjalon wrote:
> > 14/07/2020 16:38, Stephen Hemminger:
> >> "Kinsella, Ray" <mdr@ashroe.eu> wrote:
> >>> On 13/07/2020 23:19, Stephen Hemminger wrote:
> >>>> Did anyone else see the recent AVX512 discussion from Linus:
> >>>>    "I hope AVX512 dies a painful death, and that Intel starts fixing real problems
> >>>>     instead of trying to create magic instructions to then create benchmarks that they can look good on.
> >>>
> >>> Yup - I saw this one.
> >>> Sweeping statements like these are good to provoke debate, the truth is generally more nuanced.
> >>> If you continue to read the post, Linus appears to be mostly questioning microprocessor design decisions.
> >>>
> >>> That is an interesting discussion, however the reality is that the technology does exists and may be beneficial for Packet Processing.
> >>>
> >>> I would suggest, we continue to apply the same logic governing adoption of any technology by DPDK.
> >>> When the technology is present and a clear benefit is shown, we use it with caution.
> >>>
> >>> In the case of Vladimir's patch,
> >>> the user has to explicitly switch on the AVX512 lookup with RTE_FIB_DIR24_8_VECTOR_AVX512.
> >>
> >> Using what is available makes sense in DPDK.
> > 
> > Why does it require explicit  enabling in application?
> > AVX512 is not reliable enough to be automatically used when available?
> 
> It is reliable enough. User have to explicitly trigger to avx512 lookup 
> because using avx512 instructions can reduce the frequency of your 
> cores. The user knows their environment better. So the scalar version is 
> used so as not to affect the frequency.

So the user must know which micro-optimization is better for a code
they don't know. Reminder: an user is not a developper.
I understand we have no better solution though.
Can we improve the user experience with some recommendations, numbers, etc?
  
Vladimir Medvedkin July 15, 2020, 12:29 p.m. UTC | #7
On 15/07/2020 12:59, Thomas Monjalon wrote:
> 15/07/2020 12:35, Medvedkin, Vladimir:
>> On 15/07/2020 10:47, Thomas Monjalon wrote:
>>> 14/07/2020 16:38, Stephen Hemminger:
>>>> "Kinsella, Ray" <mdr@ashroe.eu> wrote:
>>>>> On 13/07/2020 23:19, Stephen Hemminger wrote:
>>>>>> Did anyone else see the recent AVX512 discussion from Linus:
>>>>>>     "I hope AVX512 dies a painful death, and that Intel starts fixing real problems
>>>>>>      instead of trying to create magic instructions to then create benchmarks that they can look good on.
>>>>>
>>>>> Yup - I saw this one.
>>>>> Sweeping statements like these are good to provoke debate, the truth is generally more nuanced.
>>>>> If you continue to read the post, Linus appears to be mostly questioning microprocessor design decisions.
>>>>>
>>>>> That is an interesting discussion, however the reality is that the technology does exists and may be beneficial for Packet Processing.
>>>>>
>>>>> I would suggest, we continue to apply the same logic governing adoption of any technology by DPDK.
>>>>> When the technology is present and a clear benefit is shown, we use it with caution.
>>>>>
>>>>> In the case of Vladimir's patch,
>>>>> the user has to explicitly switch on the AVX512 lookup with RTE_FIB_DIR24_8_VECTOR_AVX512.
>>>>
>>>> Using what is available makes sense in DPDK.
>>>
>>> Why does it require explicit  enabling in application?
>>> AVX512 is not reliable enough to be automatically used when available?
>>
>> It is reliable enough. User have to explicitly trigger to avx512 lookup
>> because using avx512 instructions can reduce the frequency of your
>> cores. The user knows their environment better. So the scalar version is
>> used so as not to affect the frequency.
> 
> So the user must know which micro-optimization is better for a code
> they don't know. Reminder: an user is not a developper.
> I understand we have no better solution though.
> Can we improve the user experience with some recommendations, numbers, etc?
> 

In case where a user is a developer (dpdk users are mostly devs, aren't 
they?) who uses the fib library in their app may decide to switch to 
avx512 lookup using rte_fib_set_lookup_fn() when they know that their 
code is already using avx512 (ifdef, startup check, etc).
In other case an app developer, for example, could provide to user 
command line option or some interactive command to switch lookup function.
I'd recommend to run testfib app with various "-v" options to evaluate 
lookup performance on a target system to make a decision.

>
  
Thomas Monjalon July 15, 2020, 12:45 p.m. UTC | #8
15/07/2020 14:29, Medvedkin, Vladimir:
> On 15/07/2020 12:59, Thomas Monjalon wrote:
> > 15/07/2020 12:35, Medvedkin, Vladimir:
> >> On 15/07/2020 10:47, Thomas Monjalon wrote:
> >>> 14/07/2020 16:38, Stephen Hemminger:
> >>>> "Kinsella, Ray" <mdr@ashroe.eu> wrote:
> >>>>> On 13/07/2020 23:19, Stephen Hemminger wrote:
> >>>>>> Did anyone else see the recent AVX512 discussion from Linus:
> >>>>>>     "I hope AVX512 dies a painful death, and that Intel starts fixing real problems
> >>>>>>      instead of trying to create magic instructions to then create benchmarks that they can look good on.
> >>>>>
> >>>>> Yup - I saw this one.
> >>>>> Sweeping statements like these are good to provoke debate, the truth is generally more nuanced.
> >>>>> If you continue to read the post, Linus appears to be mostly questioning microprocessor design decisions.
> >>>>>
> >>>>> That is an interesting discussion, however the reality is that the technology does exists and may be beneficial for Packet Processing.
> >>>>>
> >>>>> I would suggest, we continue to apply the same logic governing adoption of any technology by DPDK.
> >>>>> When the technology is present and a clear benefit is shown, we use it with caution.
> >>>>>
> >>>>> In the case of Vladimir's patch,
> >>>>> the user has to explicitly switch on the AVX512 lookup with RTE_FIB_DIR24_8_VECTOR_AVX512.
> >>>>
> >>>> Using what is available makes sense in DPDK.
> >>>
> >>> Why does it require explicit  enabling in application?
> >>> AVX512 is not reliable enough to be automatically used when available?
> >>
> >> It is reliable enough. User have to explicitly trigger to avx512 lookup
> >> because using avx512 instructions can reduce the frequency of your
> >> cores. The user knows their environment better. So the scalar version is
> >> used so as not to affect the frequency.
> > 
> > So the user must know which micro-optimization is better for a code
> > they don't know. Reminder: an user is not a developper.
> > I understand we have no better solution though.
> > Can we improve the user experience with some recommendations, numbers, etc?
> > 
> 
> In case where a user is a developer (dpdk users are mostly devs, aren't 
> they?) who uses the fib library in their app may decide to switch to 
> avx512 lookup using rte_fib_set_lookup_fn() when they know that their 
> code is already using avx512 (ifdef, startup check, etc).
> In other case an app developer, for example, could provide to user 
> command line option or some interactive command to switch lookup function.
> I'd recommend to run testfib app with various "-v" options to evaluate 
> lookup performance on a target system to make a decision.

I think this is the difference between a library for hackers,
and a product for end-users.
We are not building a product, but we can make a step in that direction
by documenting some knowledge.
I don't know exactly what it means in this case, so I'll let others
suggest some doc improvements (if anyone cares).
  
Bruce Richardson July 17, 2020, 4:43 p.m. UTC | #9
> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Wednesday, July 15, 2020 1:45 PM
> To: Medvedkin, Vladimir <vladimir.medvedkin@intel.com>
> Cc: Kinsella, Ray <mdr@ashroe.eu>; Stephen Hemminger
> <stephen@networkplumber.org>; dev@dpdk.org; david.marchand@redhat.com;
> jerinj@marvell.com; Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> Richardson, Bruce <bruce.richardson@intel.com>; Mcnamara, John
> <john.mcnamara@intel.com>; O'Driscoll, Tim <tim.odriscoll@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v6 0/8] fib: implement AVX512 vector lookup
> 
> 15/07/2020 14:29, Medvedkin, Vladimir:
> > On 15/07/2020 12:59, Thomas Monjalon wrote:
> > > 15/07/2020 12:35, Medvedkin, Vladimir:
> > >> On 15/07/2020 10:47, Thomas Monjalon wrote:
> > >>> 14/07/2020 16:38, Stephen Hemminger:
> > >>>> "Kinsella, Ray" <mdr@ashroe.eu> wrote:
> > >>>>> On 13/07/2020 23:19, Stephen Hemminger wrote:
> > >>>>>> Did anyone else see the recent AVX512 discussion from Linus:
> > >>>>>>     "I hope AVX512 dies a painful death, and that Intel starts
> fixing real problems
> > >>>>>>      instead of trying to create magic instructions to then
> create benchmarks that they can look good on.
> > >>>>>
> > >>>>> Yup - I saw this one.
> > >>>>> Sweeping statements like these are good to provoke debate, the
> truth is generally more nuanced.
> > >>>>> If you continue to read the post, Linus appears to be mostly
> questioning microprocessor design decisions.
> > >>>>>
> > >>>>> That is an interesting discussion, however the reality is that the
> technology does exists and may be beneficial for Packet Processing.
> > >>>>>
> > >>>>> I would suggest, we continue to apply the same logic governing
> adoption of any technology by DPDK.
> > >>>>> When the technology is present and a clear benefit is shown, we
> use it with caution.
> > >>>>>
> > >>>>> In the case of Vladimir's patch, the user has to explicitly
> > >>>>> switch on the AVX512 lookup with RTE_FIB_DIR24_8_VECTOR_AVX512.
> > >>>>
> > >>>> Using what is available makes sense in DPDK.
> > >>>
> > >>> Why does it require explicit  enabling in application?
> > >>> AVX512 is not reliable enough to be automatically used when
> available?
> > >>
> > >> It is reliable enough. User have to explicitly trigger to avx512
> > >> lookup because using avx512 instructions can reduce the frequency
> > >> of your cores. The user knows their environment better. So the
> > >> scalar version is used so as not to affect the frequency.
> > >
> > > So the user must know which micro-optimization is better for a code
> > > they don't know. Reminder: an user is not a developper.
> > > I understand we have no better solution though.
> > > Can we improve the user experience with some recommendations, numbers,
> etc?
> > >
> >
> > In case where a user is a developer (dpdk users are mostly devs,
> > aren't
> > they?) who uses the fib library in their app may decide to switch to
> > avx512 lookup using rte_fib_set_lookup_fn() when they know that their
> > code is already using avx512 (ifdef, startup check, etc).
> > In other case an app developer, for example, could provide to user
> > command line option or some interactive command to switch lookup
> function.
> > I'd recommend to run testfib app with various "-v" options to evaluate
> > lookup performance on a target system to make a decision.
> 
> I think this is the difference between a library for hackers, and a
> product for end-users.
> We are not building a product, but we can make a step in that direction by
> documenting some knowledge.
> I don't know exactly what it means in this case, so I'll let others
> suggest some doc improvements (if anyone cares).
> 

We have got a patchset in the works to try and make AVX-512 use simpler for 20.11,
by providing both developer APIs and end-user cmdline flags to control this
centrally for DPDK, rather than having each library provide its own magic hooks
to optionally enable this support. As part of that set, we'll see about what
doc updates need to be made also - again covering both developer and end-app user.

Hopefully we can get that set out soon to get early feedback and reach a good
conclusion.

/Bruce
  
Thomas Monjalon July 19, 2020, 10:04 a.m. UTC | #10
17/07/2020 18:43, Richardson, Bruce:
> From: Thomas Monjalon <thomas@monjalon.net>
> > 15/07/2020 14:29, Medvedkin, Vladimir:
> > > On 15/07/2020 12:59, Thomas Monjalon wrote:
> > > > 15/07/2020 12:35, Medvedkin, Vladimir:
> > > >> On 15/07/2020 10:47, Thomas Monjalon wrote:
> > > >>> 14/07/2020 16:38, Stephen Hemminger:
> > > >>>> "Kinsella, Ray" <mdr@ashroe.eu> wrote:
> > > >>>>> On 13/07/2020 23:19, Stephen Hemminger wrote:
> > > >>>>>> Did anyone else see the recent AVX512 discussion from Linus:
> > > >>>>>>     "I hope AVX512 dies a painful death, and that Intel starts
> > fixing real problems
> > > >>>>>>      instead of trying to create magic instructions to then
> > create benchmarks that they can look good on.
> > > >>>>>
> > > >>>>> Yup - I saw this one.
> > > >>>>> Sweeping statements like these are good to provoke debate, the
> > truth is generally more nuanced.
> > > >>>>> If you continue to read the post, Linus appears to be mostly
> > questioning microprocessor design decisions.
> > > >>>>>
> > > >>>>> That is an interesting discussion, however the reality is that the
> > technology does exists and may be beneficial for Packet Processing.
> > > >>>>>
> > > >>>>> I would suggest, we continue to apply the same logic governing
> > adoption of any technology by DPDK.
> > > >>>>> When the technology is present and a clear benefit is shown, we
> > use it with caution.
> > > >>>>>
> > > >>>>> In the case of Vladimir's patch, the user has to explicitly
> > > >>>>> switch on the AVX512 lookup with RTE_FIB_DIR24_8_VECTOR_AVX512.
> > > >>>>
> > > >>>> Using what is available makes sense in DPDK.
> > > >>>
> > > >>> Why does it require explicit  enabling in application?
> > > >>> AVX512 is not reliable enough to be automatically used when
> > available?
> > > >>
> > > >> It is reliable enough. User have to explicitly trigger to avx512
> > > >> lookup because using avx512 instructions can reduce the frequency
> > > >> of your cores. The user knows their environment better. So the
> > > >> scalar version is used so as not to affect the frequency.
> > > >
> > > > So the user must know which micro-optimization is better for a code
> > > > they don't know. Reminder: an user is not a developper.
> > > > I understand we have no better solution though.
> > > > Can we improve the user experience with some recommendations, numbers,
> > etc?
> > > >
> > >
> > > In case where a user is a developer (dpdk users are mostly devs,
> > > aren't
> > > they?) who uses the fib library in their app may decide to switch to
> > > avx512 lookup using rte_fib_set_lookup_fn() when they know that their
> > > code is already using avx512 (ifdef, startup check, etc).
> > > In other case an app developer, for example, could provide to user
> > > command line option or some interactive command to switch lookup
> > function.
> > > I'd recommend to run testfib app with various "-v" options to evaluate
> > > lookup performance on a target system to make a decision.
> > 
> > I think this is the difference between a library for hackers, and a
> > product for end-users.
> > We are not building a product, but we can make a step in that direction by
> > documenting some knowledge.
> > I don't know exactly what it means in this case, so I'll let others
> > suggest some doc improvements (if anyone cares).
> > 
> 
> We have got a patchset in the works to try and make AVX-512 use simpler for 20.11,
> by providing both developer APIs and end-user cmdline flags to control this
> centrally for DPDK, rather than having each library provide its own magic hooks
> to optionally enable this support. As part of that set, we'll see about what
> doc updates need to be made also - again covering both developer and end-app user.
> 
> Hopefully we can get that set out soon to get early feedback and reach a good
> conclusion.

We cannot merge anymore in 20.08 because we passed -rc1.
I am in favor of merging this feature the day after 20.08 release.