[v5,00/29] graph: introduce graph subsystem

Message ID 20200411141428.1987768-1-jerinj@marvell.com (mailing list archive)

Jerin Jacob Kollanukkaran April 11, 2020, 2:13 p.m. UTC
  From: Jerin Jacob <jerinj@marvell.com>

Using graph traversal for packet processing is a proven architecture
that has been implemented in various open source libraries.

Graph architecture for packet processing abstracts the data processing
functions as “nodes” and “links” them together into a complex “graph”,
which yields reusable, modular data processing functions.

The patchset also brings performance enhancements and modularity to
DPDK, as discussed in more detail below.
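
For readers new to the model, the node/link/graph idea can be sketched in a few lines of plain C. This is only an illustration under simplifying assumptions: it is not the librte_graph API, all toy_* names are made up, every burst follows a single edge, and the graph is assumed to be acyclic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_EDGES 4
#define TOY_BURST     8

struct toy_node;

/* A node's process callback handles a burst of objects and returns the
 * index of the edge the whole burst should follow, or -1 to stop. */
typedef int (*toy_process_t)(struct toy_node *node, void **objs,
			     uint16_t nb_objs);

struct toy_node {
	const char *name;
	toy_process_t process;
	struct toy_node *next[TOY_MAX_EDGES]; /* the "links" to other nodes */
	uint64_t objs_seen;                   /* objects handled by this node */
};

/* Hypothetical node: counts the burst and passes it along edge 0. */
int toy_forward(struct toy_node *node, void **objs, uint16_t nb_objs)
{
	(void)objs;
	node->objs_seen += nb_objs;
	return 0;
}

/* Hypothetical node: counts the burst and ends the path. */
int toy_sink(struct toy_node *node, void **objs, uint16_t nb_objs)
{
	(void)objs;
	node->objs_seen += nb_objs;
	return -1;
}

/* Walk from a source node, handing the same burst to each node in turn.
 * Returns the number of nodes visited. Assumes the graph has no cycles. */
unsigned toy_graph_walk(struct toy_node *src, void **objs, uint16_t nb_objs)
{
	unsigned visited = 0;
	struct toy_node *node = src;

	while (node != NULL) {
		assert(node->process != NULL);
		int edge = node->process(node, objs, nb_objs);

		visited++;
		node = (edge >= 0 && edge < TOY_MAX_EDGES) ?
			node->next[edge] : NULL;
	}
	return visited;
}
```

Linking three such nodes as rx -> lookup -> tx and calling toy_graph_walk() on rx visits all three with the same burst. New processing steps are added by writing another callback and setting one more next[] pointer, which is the modularity argument made above.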

v5..v4:
------
Addressed the following review comments from Andrzej Ostruszka.

1) Addressed the comments in (http://mails.dpdk.org/archives/dev/2020-April/162184.html),
improved the following function prototypes/return types, and adjusted the
implementation accordingly:
a) rte_graph_node_get
b) rte_graph_max_count
c) rte_graph_export
d) rte_graph_destroy
2) Updated UT and l3fwd-graph for updated function prototype
3) bug fix in edge_update
4) avoid reading graph_src_nodes_count() twice in rte_graph_create()
5) Fix graph_mem_fixup_secondray typo
6) Fixed Doxygen comments for rte_node_next_stream_put
7) Updated the documentation to reflect the same.
8) Removed RTE prefix from rte_node_mbuf_priv[1|2] * as they are
internal defines
9) Limited next_hop id provided to LPM route add in
librte_node/ip4_lookup.c to 24 bits ()
10) Fixed pattern array overflow issue in l3fwd-graph/main.c by splitting
the pattern array into default + non-default arrays. Updated the doc with
the same info.
11) Fixed parsing issues in parse_config() in l3fwd-graph/main.c, in line
with the issues reported in l2fwd-event
12) Removed next_hop field in l3fwd-graph/main.c main()
13) Fixed graph create error check in l3fwd-graph/main.c main()
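
To illustrate item 9: an id that must fit in 24 bits can be range-checked before it is stored, so that an oversized value is rejected rather than silently truncated. The sketch below is only an assumption about how such a check could look; the toy_* names are hypothetical and this is not the code in librte_node/ip4_lookup.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 24 usable bits for the next-hop id, as stated in item 9. */
#define TOY_NEXT_HOP_BITS 24u
#define TOY_NEXT_HOP_MAX  ((UINT32_C(1) << TOY_NEXT_HOP_BITS) - 1)

/* True when the id fits in 24 bits and can be stored without truncation. */
bool toy_next_hop_is_valid(uint32_t next_hop)
{
	return next_hop <= TOY_NEXT_HOP_MAX;
}

/* Hypothetical route-add helper: stores the id only if it is in range.
 * Returns 0 on success, -1 if the id needs more than 24 bits. */
int toy_route_add(uint32_t *table_slot, uint32_t next_hop)
{
	assert(table_slot != NULL);

	if (!toy_next_hop_is_valid(next_hop))
		return -1;
	*table_slot = next_hop;
	return 0;
}
```

With this check, 0xFFFFFF is the largest id accepted, and 0x1000000 is rejected while the existing table entry is left untouched.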

v4..v3:
-------
Addressed the following review comments from Wang, Xiao W

1) Remove unnecessary line from rte_graph.h
2) Fix a typo from rte_graph.h
3) Move NODE_ID_CHECK to 3rd patch where it is first used.
4) Fixed bug in edge_update()

v3..v2:
-------
1) Refactored ipv4 node lookup by moving SSE and NEON specific code to
lib/librte_node/ip4_lookup_sse.h and lib/librte_node/ip4_lookup_neon.h
2) Added a scalar version of the process() function for ipv4 lookup so
that the node works on machines other than x86 and arm64.
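
The split described in items 1 and 2 (architecture-specific vector code plus a scalar fallback) follows a common compile-time selection pattern. The sketch below shows that pattern on a deliberately simple stand-in routine, a byte swap of a burst of IPv4 addresses, rather than on the LPM lookup itself. It keys off compiler-defined macros, which is an assumption; the patchset may select its variants differently, and all toy_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pick a vector variant when the compiler advertises one, else scalar. */
#if defined(__ARM_NEON)
#include <arm_neon.h>
#define TOY_HAVE_NEON 1
#elif defined(__SSSE3__)
#include <tmmintrin.h>
#define TOY_HAVE_SSSE3 1
#endif

/* Scalar byte swap of one 32-bit value; works on any architecture. */
static inline uint32_t toy_bswap32(uint32_t x)
{
	return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
	       ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Byte-swap n addresses in place: four at a time where a vector unit is
 * available, with the scalar loop handling the tail (or everything). */
void toy_ip4_bswap_burst(uint32_t *addrs, size_t n)
{
	size_t i = 0;

	assert(addrs != NULL || n == 0);
#if defined(TOY_HAVE_NEON)
	for (; i + 4 <= n; i += 4) {
		uint32x4_t v = vld1q_u32(&addrs[i]);

		v = vreinterpretq_u32_u8(vrev32q_u8(vreinterpretq_u8_u32(v)));
		vst1q_u32(&addrs[i], v);
	}
#elif defined(TOY_HAVE_SSSE3)
	/* Shuffle mask that reverses the bytes within each 32-bit lane. */
	const __m128i mask = _mm_set_epi8(12, 13, 14, 15, 8, 9, 10, 11,
					  4, 5, 6, 7, 0, 1, 2, 3);
	for (; i + 4 <= n; i += 4) {
		__m128i v = _mm_loadu_si128((const __m128i *)&addrs[i]);

		v = _mm_shuffle_epi8(v, mask);
		_mm_storeu_si128((__m128i *)&addrs[i], v);
	}
#endif
	for (; i < n; i++)
		addrs[i] = toy_bswap32(addrs[i]);
}
```

Callers see one function regardless of target. On a build without NEON or SSSE3 the vector loops compile away and only the scalar loop remains, which is what lets the same node run on other architectures.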

v2..v1:
------
1) Added programmer guide/implementation documentation and l3fwd-graph doc

RFC..v1:
--------

1) Split the patch to more logical ones for review.
2) Added doxygen comments for the API
3) Code cleanup
4) Additional performance improvements.
Delta between l3fwd and l3fwd-graph is negligible now
(~1%) on octeontx2.
5) Added SIMD routines for x86 in addition to arm64.

Hosted in netlify for easy reference:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Programmer’s Guide:
https://dpdk-graph.netlify.com/doc/html/guides/prog_guide/graph_lib.html

l3fwd-graph doc:
https://dpdk-graph.netlify.com/doc/html/guides/sample_app_ug/l3_forward_graph.html

API doc:
https://dpdk-graph.netlify.com/doc/html/api/rte__graph_8h.html
https://dpdk-graph.netlify.com/doc/html/api/rte__graph__worker_8h.html
https://dpdk-graph.netlify.com/doc/html/api/rte__node__eth__api_8h.html
https://dpdk-graph.netlify.com/doc/html/api/rte__node__ip4__api_8h.html

2) Added the release notes for this feature

3) Fix build issues reported by CI for v1:
http://mails.dpdk.org/archives/test-report/2020-March/121326.html


Additional nodes planned for v20.08
----------------------------------
1) Packet classification node
2) Support for IPV6 LPM node


This patchset contains
-----------------------------
1) The API definition to "create" nodes and "link" them together to create a
"graph" for packet processing. See lib/librte_graph/rte_graph.h

2) The Fast path API definition for the graph walker and enqueue
function used by the workers. See, lib/librte_graph/rte_graph_worker.h

3) Optimized SW implementation for (1) and (2). See, lib/librte_graph/

4) Test case to verify the graph infrastructure functionality
See, app/test/test_graph.c
 
5) Performance test cases to evaluate the cost of graph walker and nodes
enqueue fast-path function for various combinations.

See app/test/test_graph_perf.c

6) Packet processing nodes (Null, Rx, Tx, Pkt drop, IPv4 rewrite, IPv4
lookup) using the graph infrastructure. See lib/librte_node/*

7) An example application to showcase l3fwd (same functionality as the
existing examples/l3fwd) using the graph infrastructure and the packet
processing nodes from item (6). See examples/l3fwd-graph/.

Performance
-----------
1) Graph walk and node enqueue overhead can be tested with performance
test case application [1]
# If all packets go from one node to another (we call this a
# "homerun"), moving them is just a pointer swap for a burst of packets.
# In the worst case, it takes a handful of cycles to move an object from
# one node to another.

2) Performance comparison of the existing l3fwd (completely static code
without any nodes) vs the modular l3fwd-graph with 5 nodes
(ip4_lookup, ip4_rewrite, ethdev_tx, ethdev_rx, pkt_drop).
Here is a graphical representation of the l3fwd-graph as a Graphviz dot
file:
http://bit.ly/39UPPGm

# l3fwd-graph performance is -1.2% wrt static l3fwd.

# We have run a similar test with the existing librte_pipeline
# application [4].
# ip_pipeline application is -48.62% wrt static l3fwd.

The above results are on octeontx2 and may vary on other platforms.
Platforms with larger L1 and L2 caches will perform even better.
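
The "homerun" case from item 1 can be pictured as follows. When every object held by one node is destined for the same next node, and that node's stream is empty, the two pointer arrays can simply be exchanged instead of copying each entry. This is a minimal sketch under that assumption, with hypothetical toy_* names; it is not the rte_graph_worker.h implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_BURST_MAX 256

struct toy_stream {
	void   **objs; /* array of object (packet) pointers */
	uint16_t idx;  /* number of valid entries in objs */
};

/* General case: append one object to dst.
 * Returns 0 on success, -1 if the stream is full. */
int toy_stream_enqueue(struct toy_stream *dst, void *obj)
{
	if (dst->idx >= TOY_BURST_MAX)
		return -1;
	dst->objs[dst->idx++] = obj;
	return 0;
}

/* Homerun: hand the whole burst from src to an empty dst by swapping
 * the two array pointers. The cost does not depend on the burst size. */
void toy_stream_move(struct toy_stream *src, struct toy_stream *dst)
{
	void **tmp = dst->objs;

	assert(dst->idx == 0); /* only valid when dst holds nothing */
	dst->objs = src->objs;
	dst->idx = src->idx;
	src->objs = tmp;
	src->idx = 0;
}
```

toy_stream_enqueue() is the per-object path that costs a few cycles each, while toy_stream_move() is the constant-cost path for a full burst.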


Tested architectures:
--------------------
1) AArch64
2) X86


Identified tweaking for better performance on different targets
---------------------------------------------------------------
1) Test with various burst size values (256, 128, 64, 32) using
CONFIG_RTE_GRAPH_BURST_SIZE config option.
Based on our testing, the sweet spot on x86 and arm64 servers is a burst
size of 256, while on arm64 embedded SoCs it is either 64 or 128.

2) Disable node statistics (using the CONFIG_RTE_LIBRTE_GRAPH_STATS config
option) if not needed.

3) Use arm64 optimized memory copy for arm64 architecture by
selecting CONFIG_RTE_ARCH_ARM64_MEMCPY. 

Commands to run tests
---------------------

[1] 
perf test:
echo "graph_perf_autotest" | sudo ./build/app/test/dpdk-test -c 0x30

[2]
functionality test:
echo "graph_autotest" | sudo ./build/app/test/dpdk-test -c 0x30

[3]
l3fwd-graph:
./l3fwd-graph -c 0x100  -- -p 0x3 --config="(0, 0, 8)" -P

[4]
# ./ip_pipeline --c 0xff0000 -- -s route.cli

Route.cli: (Copy paste to the shell to avoid dos format issues)

https://pastebin.com/raw/B4Ktx7TT
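
A note on the -c arguments used above: the value is a hexadecimal core mask in which bit N selects lcore N. So 0x30 selects lcores 4 and 5, and 0x100 selects lcore 8, which matches the lcore in --config="(0, 0, 8)" (port, queue, lcore). The helper below is a hypothetical illustration of that decoding and is not part of DPDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes the indices of the set bits in mask to cores[] (at most max
 * entries, lowest bit first) and returns the total number of set bits. */
size_t toy_coremask_to_list(uint64_t mask, unsigned *cores, size_t max)
{
	size_t n = 0;

	assert(cores != NULL || max == 0);
	for (unsigned bit = 0; bit < 64; bit++) {
		if (mask & (UINT64_C(1) << bit)) {
			if (n < max)
				cores[n] = bit;
			n++;
		}
	}
	return n;
}
```

Calling toy_coremask_to_list(0x30, cores, 4) fills cores[] with 4 and 5; the same function applied to 0xff0000 from command [4] reports lcores 16 through 23.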

Jerin Jacob (13):
  graph: define the public API for graph support
  graph: implement node registration
  graph: implement node operations
  graph: implement node debug routines
  graph: implement internal graph operation helpers
  graph: populate fastpath memory for graph reel
  graph: implement create and destroy APIs
  graph: implement graph operation APIs
  graph: implement Graphviz export
  graph: implement debug routines
  graph: implement stats support
  graph: implement fastpath API routines
  doc: add graph library programmer's guide guide

Kiran Kumar K (2):
  graph: add unit test case
  node: add ipv4 rewrite node

Nithin Dabilpuram (11):
  node: add log infra and null node
  node: add ethdev Rx node
  node: add ethdev Tx node
  node: add ethdev Rx and Tx node ctrl API
  node: ipv4 lookup for arm64
  node: add ipv4 rewrite and lookup ctrl API
  node: add packet drop node
  l3fwd-graph: add graph based l3fwd skeleton
  l3fwd-graph: add ethdev configuration changes
  l3fwd-graph: add graph config and main loop
  doc: add l3fwd graph application user guide

Pavan Nikhilesh (3):
  graph: add performance testcase
  node: add generic ipv4 lookup node
  node: ipv4 lookup for x86

 MAINTAINERS                                   |   14 +
 app/test/Makefile                             |    7 +
 app/test/meson.build                          |   12 +-
 app/test/test_graph.c                         |  819 ++++
 app/test/test_graph_perf.c                    | 1057 ++++++
 config/common_base                            |   12 +
 config/rte_config.h                           |    4 +
 doc/api/doxy-api-index.md                     |    5 +
 doc/api/doxy-api.conf.in                      |    2 +
 doc/guides/prog_guide/graph_lib.rst           |  397 ++
 .../prog_guide/img/anatomy_of_a_node.svg      | 1078 ++++++
 .../prog_guide/img/graph_mem_layout.svg       |  702 ++++
 doc/guides/prog_guide/img/link_the_nodes.svg  | 3330 +++++++++++++++++
 doc/guides/prog_guide/index.rst               |    1 +
 doc/guides/rel_notes/release_20_05.rst        |   32 +
 doc/guides/sample_app_ug/index.rst            |    1 +
 doc/guides/sample_app_ug/intro.rst            |    4 +
 doc/guides/sample_app_ug/l3_forward_graph.rst |  334 ++
 examples/Makefile                             |    3 +
 examples/l3fwd-graph/Makefile                 |   58 +
 examples/l3fwd-graph/main.c                   | 1126 ++++++
 examples/l3fwd-graph/meson.build              |   13 +
 examples/meson.build                          |    6 +-
 lib/Makefile                                  |    6 +
 lib/librte_graph/Makefile                     |   28 +
 lib/librte_graph/graph.c                      |  587 +++
 lib/librte_graph/graph_debug.c                |   84 +
 lib/librte_graph/graph_ops.c                  |  169 +
 lib/librte_graph/graph_populate.c             |  234 ++
 lib/librte_graph/graph_private.h              |  347 ++
 lib/librte_graph/graph_stats.c                |  406 ++
 lib/librte_graph/meson.build                  |   11 +
 lib/librte_graph/node.c                       |  421 +++
 lib/librte_graph/rte_graph.h                  |  668 ++++
 lib/librte_graph/rte_graph_version.map        |   47 +
 lib/librte_graph/rte_graph_worker.h           |  510 +++
 lib/librte_node/Makefile                      |   32 +
 lib/librte_node/ethdev_ctrl.c                 |  115 +
 lib/librte_node/ethdev_rx.c                   |  221 ++
 lib/librte_node/ethdev_rx_priv.h              |   81 +
 lib/librte_node/ethdev_tx.c                   |   86 +
 lib/librte_node/ethdev_tx_priv.h              |   62 +
 lib/librte_node/ip4_lookup.c                  |  215 ++
 lib/librte_node/ip4_lookup_neon.h             |  238 ++
 lib/librte_node/ip4_lookup_sse.h              |  244 ++
 lib/librte_node/ip4_rewrite.c                 |  326 ++
 lib/librte_node/ip4_rewrite_priv.h            |   77 +
 lib/librte_node/log.c                         |   14 +
 lib/librte_node/meson.build                   |   10 +
 lib/librte_node/node_private.h                |   79 +
 lib/librte_node/null.c                        |   23 +
 lib/librte_node/pkt_drop.c                    |   26 +
 lib/librte_node/rte_node_eth_api.h            |   64 +
 lib/librte_node/rte_node_ip4_api.h            |   78 +
 lib/librte_node/rte_node_version.map          |    9 +
 lib/meson.build                               |    5 +-
 meson.build                                   |    1 +
 mk/rte.app.mk                                 |    2 +
 58 files changed, 14538 insertions(+), 5 deletions(-)
 create mode 100644 app/test/test_graph.c
 create mode 100644 app/test/test_graph_perf.c
 create mode 100644 doc/guides/prog_guide/graph_lib.rst
 create mode 100644 doc/guides/prog_guide/img/anatomy_of_a_node.svg
 create mode 100644 doc/guides/prog_guide/img/graph_mem_layout.svg
 create mode 100644 doc/guides/prog_guide/img/link_the_nodes.svg
 create mode 100644 doc/guides/sample_app_ug/l3_forward_graph.rst
 create mode 100644 examples/l3fwd-graph/Makefile
 create mode 100644 examples/l3fwd-graph/main.c
 create mode 100644 examples/l3fwd-graph/meson.build
 create mode 100644 lib/librte_graph/Makefile
 create mode 100644 lib/librte_graph/graph.c
 create mode 100644 lib/librte_graph/graph_debug.c
 create mode 100644 lib/librte_graph/graph_ops.c
 create mode 100644 lib/librte_graph/graph_populate.c
 create mode 100644 lib/librte_graph/graph_private.h
 create mode 100644 lib/librte_graph/graph_stats.c
 create mode 100644 lib/librte_graph/meson.build
 create mode 100644 lib/librte_graph/node.c
 create mode 100644 lib/librte_graph/rte_graph.h
 create mode 100644 lib/librte_graph/rte_graph_version.map
 create mode 100644 lib/librte_graph/rte_graph_worker.h
 create mode 100644 lib/librte_node/Makefile
 create mode 100644 lib/librte_node/ethdev_ctrl.c
 create mode 100644 lib/librte_node/ethdev_rx.c
 create mode 100644 lib/librte_node/ethdev_rx_priv.h
 create mode 100644 lib/librte_node/ethdev_tx.c
 create mode 100644 lib/librte_node/ethdev_tx_priv.h
 create mode 100644 lib/librte_node/ip4_lookup.c
 create mode 100644 lib/librte_node/ip4_lookup_neon.h
 create mode 100644 lib/librte_node/ip4_lookup_sse.h
 create mode 100644 lib/librte_node/ip4_rewrite.c
 create mode 100644 lib/librte_node/ip4_rewrite_priv.h
 create mode 100644 lib/librte_node/log.c
 create mode 100644 lib/librte_node/meson.build
 create mode 100644 lib/librte_node/node_private.h
 create mode 100644 lib/librte_node/null.c
 create mode 100644 lib/librte_node/pkt_drop.c
 create mode 100644 lib/librte_node/rte_node_eth_api.h
 create mode 100644 lib/librte_node/rte_node_ip4_api.h
 create mode 100644 lib/librte_node/rte_node_version.map
  

Comments

Tom Barbette April 30, 2020, 8:07 a.m. UTC | #1
Hi all,

I could not follow all the discussions regarding the graph subsystem, but I
could not find the rationale behind re-creating yet another graph
processing system like VPP, BESS, Click/FastClick and a few others, which
all support DPDK already and come with up to thousands of "nodes"
already built.

Is there something fundamentally better than those? Or is this just to
provide a clean in-house API?

Thanks,

Tom

  
Jerin Jacob April 30, 2020, 8:42 a.m. UTC | #2
On Thu, Apr 30, 2020 at 1:38 PM Tom Barbette <barbette@kth.se> wrote:
>
> Hi all,
>
> I could not follow all the discussions regarding the graph subsystem, but I
> could not find the rationale behind re-creating yet another graph
> processing system like VPP, BESS, Click/FastClick and a few others, which
> all support DPDK already and come with up to thousands of "nodes"
> already built.
>
> Is there something fundamentally better than those? Or is this just to
> provide a clean in-house API?

Enumerating some of the features:

0) Native mbuf based nodes. This avoids the cost of converting mbufs to
another data plane container in the fastpath.
1) Nodes as plugins.
2) Support for out of tree nodes.
3) Inbuilt nodes for packet processing.
4) Multi-process support.
5) Low overhead graph walk and node enqueue [1]. See the performance data
from the cover letter.
6) Low overhead statistics collection infrastructure.
7) Support to export the graph as a Graphviz dot file. See rte_graph_export().
8) Allow having another graph walk implementation in the future by
segregating the fast path(rte_graph_worker.h) and slow path code.
9) Native DPDK apps for DPDK drivers to leverage graph architecture.

I think items (0), (4), (5), and (9) are useful for the DPDK ecosystem.
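
As an aside on item (7), exporting a graph in Graphviz dot form amounts to printing one line per edge. The sketch below shows the general idea with hypothetical toy_* names; it does not reproduce the output format or the signature of rte_graph_export().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct toy_edge {
	const char *from;
	const char *to;
};

/* Formats the edges as a Graphviz "digraph" into buf. Returns the number
 * of characters written (excluding the NUL), or -1 if buf is too small. */
int toy_graph_to_dot(const char *name, const struct toy_edge *edges,
		     size_t nb_edges, char *buf, size_t len)
{
	size_t off;
	int n;

	assert(name != NULL && buf != NULL);
	assert(edges != NULL || nb_edges == 0);

	n = snprintf(buf, len, "digraph %s {\n", name);
	if (n < 0 || (size_t)n >= len)
		return -1;
	off = (size_t)n;

	for (size_t i = 0; i < nb_edges; i++) {
		n = snprintf(buf + off, len - off, "\t\"%s\" -> \"%s\";\n",
			     edges[i].from, edges[i].to);
		if (n < 0 || (size_t)n >= len - off)
			return -1;
		off += (size_t)n;
	}

	n = snprintf(buf + off, len - off, "}\n");
	if (n < 0 || (size_t)n >= len - off)
		return -1;
	off += (size_t)n;
	return (int)off;
}
```

Feeding it the edges ethdev_rx -> ip4_lookup and ip4_lookup -> ip4_rewrite produces text that the dot tool can render directly.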

Performance[1]
-----------
1) Graph walk and node enqueue overhead can be tested with performance
test case application [1]
# If all packets go from one node to another (we call this a
# "homerun"), moving them is just a pointer swap for a burst of packets.
# In the worst case, it takes a handful of cycles to move an object from
# one node to another.

2) Performance comparison of the existing l3fwd (completely static code
without any nodes) vs the modular l3fwd-graph with 5 nodes
(ip4_lookup, ip4_rewrite, ethdev_tx, ethdev_rx, pkt_drop).
Here is a graphical representation of the l3fwd-graph as a Graphviz dot
file:
http://bit.ly/39UPPGm

# l3fwd-graph performance is -1.2% wrt static l3fwd.

# We have run a similar test with the existing librte_pipeline
# application [4].
# ip_pipeline application is -48.62% wrt static l3fwd.

The above results are on octeontx2 and may vary on other platforms.
Platforms with larger L1 and L2 caches will perform even better.
>
> Thanks,
>
> Tom
>
> Le 11/04/2020 à 16:13, jerinj@marvell.com a écrit :
> > From: Jerin Jacob <jerinj@marvell.com>
> >
> > Using graph traversal for packet processing is a proven architecture
> > that has been implemented in various open source libraries.
> >
> > Graph architecture for packet processing enables abstracting the data
> > processing functions as “nodes” and “links” them together to create a
> > complex “graph” to create reusable/modular data processing functions.
> >
> > The patchset further includes performance enhancements and modularity
> > to the DPDK as discussed in more detail below.
> >
> > v5..v4:
> > ------
> > Addressed the following review comments from Andrzej Ostruszka.
> >
> > 1) Addressed and comment in (http://mails.dpdk.org/archives/dev/2020-April/162184.html)
> > and improved following function prototypes/return types and adjusted the
> > implementation
> > a) rte_graph_node_get
> > b) rte_graph_max_count
> > c) rte_graph_export
> > d) rte_graph_destroy
> > 2) Updated UT and l3fwd-graph for updated function prototype
> > 3) bug fix in edge_update
> > 4) avoid reading graph_src_nodes_count() twice in rte_graph_create()
> > 5) Fix graph_mem_fixup_secondray typo
> > 6) Fixed Doxygen comments for rte_node_next_stream_put
> > 7) Updated the documentation to reflect the same.
> > 8) Removed RTE prefix from rte_node_mbuf_priv[1|2] * as they are
> > internal defines
> > 9) Limited next_hop id provided to LPM route add in
> > librte_node/ip4_lookup.c to 24 bits ()
> > 10) Fixed pattern array overflow issue with l3fwd-graph/main.c by
> > splitting the pattern array into default + non-default arrays.
> > Updated doc with the same info.
> > 11) Fixed parsing issues in parse_config() in l3fwd-graph/main.c in line
> > with issues reported in l2fwd-event
> > 12) Removed next_hop field in l3fwd-graph/main.c main()
> > 13) Fixed graph create error check in l3fwd-graph/main.c main()
> >
> > v4..v3:
> > -------
> > Addressed the following review comments from Wang, Xiao W
> >
> > 1) Remove unnecessary line from rte_graph.h
> > 2) Fix a typo from rte_graph.h
> > 3) Move NODE_ID_CHECK to 3rd patch where it is first used.
> > 4) Fixed bug in edge_update()
> >
> > v3..v2:
> > -------
> > 1) refactor ipv4 node lookup by moving SSE and NEON specific code to
> > lib/librte_node/ip4_lookup_sse.h and lib/librte_node/ip4_lookup_neon.h
> > 2) Add scalar version of process() function for ipv4 lookup to make
> > the node work on machines other than x86 and arm64.
> >
> > v2..v1:
> > ------
> > 1) Added programmer guide/implementation documentation and l3fwd-graph doc
> >
> > RFC..v1:
> > --------
> >
> > 1) Split the patch to more logical ones for review.
> > 2) Added doxygen comments for the API
> > 3) Code cleanup
> > 4) Additional performance improvements.
> > Delta between l3fwd and l3fwd-graph is negligible now.
> > (~1%) on octeontx2.
> > 5) Added SIMD routines for x86 in addition to arm64.
> >
> > Hosted in netlify for easy reference:
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > Programmer’s Guide:
> > https://dpdk-graph.netlify.com/doc/html/guides/prog_guide/graph_lib.html
> >
> > l3fwd-graph doc:
> > https://dpdk-graph.netlify.com/doc/html/guides/sample_app_ug/l3_forward_graph.html
> >
> > API doc:
> > https://dpdk-graph.netlify.com/doc/html/api/rte__graph_8h.html
> > https://dpdk-graph.netlify.com/doc/html/api/rte__graph__worker_8h.html
> > https://dpdk-graph.netlify.com/doc/html/api/rte__node__eth__api_8h.html
> > https://dpdk-graph.netlify.com/doc/html/api/rte__node__ip4__api_8h.html
> >
> > 2) Added the release notes for this feature
> >
> > 3) Fix build issues reported by CI for v1:
> > http://mails.dpdk.org/archives/test-report/2020-March/121326.html
> >
> >
> > Additional nodes planned for v20.08
> > ----------------------------------
> > 1) Packet classification node
> > 2) Support for IPV6 LPM node
> >
> >
> > This patchset contains
> > -----------------------------
> > 1) The API definition to "create" nodes and "link" them together to create a
> > "graph" for packet processing. See, lib/librte_graph/rte_graph.h
> >
> > 2) The Fast path API definition for the graph walker and enqueue
> > function used by the workers. See, lib/librte_graph/rte_graph_worker.h
> >
> > 3) Optimized SW implementation for (1) and (2). See, lib/librte_graph/
> >
> > 4) Test case to verify the graph infrastructure functionality
> > See, app/test/test_graph.c
> >
> > 5) Performance test cases to evaluate the cost of graph walker and nodes
> > enqueue fast-path function for various combinations.
> >
> > See app/test/test_graph_perf.c
> >
> > 6) Packet processing nodes(Null, Rx, Tx, Pkt drop, IPV4 rewrite, IPv4
> > lookup)
> > using graph infrastructure. See lib/librte_node/*
> >
> > 7) An example application to showcase l3fwd
> > (functionality same as existing examples/l3fwd) using graph
> > infrastructure and the packet processing nodes (item (6)). See examples/l3fwd-graph/.
> >
> > Performance
> > -----------
> > 1) Graph walk and node enqueue overhead can be tested with performance
> > test case application [1]
> > # If all packets go from a node to another node (we call it a
> > # "home run"), then it will be just a pointer swap for a burst of packets.
> > # In the worst case, it takes a handful of cycles to move an object from
> > # a node to another node.
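
A rough illustration of why the all-packets-to-one-node case described
above can cost only a pointer swap. This is a sketch under my own
assumptions, not the code from rte_graph_worker.h: struct toy_node,
toy_enqueue_one() and toy_stream_move() are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BURST 8 /* toy burst size; the real one is a build-time option */

/* Hypothetical stand-in for a node's stream of pending objects. */
struct toy_node {
	void **objs;   /* array of object pointers waiting to be processed */
	uint16_t idx;  /* number of valid entries in objs */
	uint16_t size; /* capacity of objs */
};

/* Per-object path: append one object pointer to the next node's stream. */
static int toy_enqueue_one(struct toy_node *next, void *obj)
{
	if (next->idx == next->size)
		return -1; /* no room left in this toy model */
	next->objs[next->idx++] = obj;
	return 0;
}

/* Move everything pending on 'src' to 'dst'. Returns 0 on success and -1
 * (moving nothing) when 'dst' lacks room for the copy path. */
static int toy_stream_move(struct toy_node *src, struct toy_node *dst)
{
	uint16_t i;

	assert(src != dst);
	if (dst->idx == 0) {
		/* Fast path: 'dst' is empty, so the two nodes exchange
		 * their arrays; no object pointer is copied. */
		void **tmp_objs = dst->objs;
		uint16_t tmp_size = dst->size;

		dst->objs = src->objs;
		dst->size = src->size;
		dst->idx = src->idx;
		src->objs = tmp_objs;
		src->size = tmp_size;
		src->idx = 0;
		return 0;
	}
	if ((uint16_t)(dst->size - dst->idx) < src->idx)
		return -1;
	/* Slow path: 'dst' already holds objects, copy one by one. */
	for (i = 0; i < src->idx; i++)
		toy_enqueue_one(dst, src->objs[i]);
	src->idx = 0;
	return 0;
}
```

In this model the fast path costs the same whatever the burst holds, while
the slow path pays a few instructions per object, which is the contrast
the two cases above are drawing.
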
> >
> > 2) Performance comparison with existing l3fwd (The complete static code
> > without any nodes) vs modular l3fwd-graph with 5 nodes
> > (ip4_lookup, ip4_rewrite, ethdev_tx, ethdev_rx, pkt_drop).
> > Here is a graphical representation of the l3fwd-graph as a Graphviz dot
> > file:
> > http://bit.ly/39UPPGm
> >
> > # l3fwd-graph performance is -1.2% wrt static l3fwd.
> >
> > # We have simulated a similar test with the existing librte_pipeline
> > # application [4].
> > ip_pipeline application is -48.62% wrt static l3fwd.
> >
> > The above results are on octeontx2 and may vary on other platforms.
> > Platforms with larger L1 and L2 caches are expected to perform better
> > still.
> >
> >
> > Tested architectures:
> > --------------------
> > 1) AArch64
> > 2) X86
> >
> >
> > Identified tweaking for better performance on different targets
> > ---------------------------------------------------------------
> > 1) Test with various burst size values (256, 128, 64, 32) using
> > CONFIG_RTE_GRAPH_BURST_SIZE config option.
> > Based on our testing, on x86 and arm64 servers, the sweet spot is a
> > burst size of 256,
> > while on arm64 embedded SoCs, it is either 64 or 128.
> >
> > 2) Disable node statistics (use CONFIG_RTE_LIBRTE_GRAPH_STATS config
> > option)
> > if not needed.
> >
> > 3) Use arm64 optimized memory copy for arm64 architecture by
> > selecting CONFIG_RTE_ARCH_ARM64_MEMCPY.
> >
> > Commands to run tests
> > ---------------------
> >
> > [1]
> > perf test:
> > echo "graph_perf_autotest" | sudo ./build/app/test/dpdk-test -c 0x30
> >
> > [2]
> > functionality test:
> > echo "graph_autotest" | sudo ./build/app/test/dpdk-test -c 0x30
> >
> > [3]
> > l3fwd-graph:
> > ./l3fwd-graph -c 0x100  -- -p 0x3 --config="(0, 0, 8)" -P
> >
> > [4]
> > # ./ip_pipeline -c 0xff0000 -- -s route.cli
> >
> > Route.cli: (Copy paste to the shell to avoid dos format issues)
> >
> > https://pastebin.com/raw/B4Ktx7TT
> >
> > Jerin Jacob (13):
> >    graph: define the public API for graph support
> >    graph: implement node registration
> >    graph: implement node operations
> >    graph: implement node debug routines
> >    graph: implement internal graph operation helpers
> >    graph: populate fastpath memory for graph reel
> >    graph: implement create and destroy APIs
> >    graph: implement graph operation APIs
> >    graph: implement Graphviz export
> >    graph: implement debug routines
> >    graph: implement stats support
> >    graph: implement fastpath API routines
> >    doc: add graph library programmer's guide guide
> >
> > Kiran Kumar K (2):
> >    graph: add unit test case
> >    node: add ipv4 rewrite node
> >
> > Nithin Dabilpuram (11):
> >    node: add log infra and null node
> >    node: add ethdev Rx node
> >    node: add ethdev Tx node
> >    node: add ethdev Rx and Tx node ctrl API
> >    node: ipv4 lookup for arm64
> >    node: add ipv4 rewrite and lookup ctrl API
> >    node: add packet drop node
> >    l3fwd-graph: add graph based l3fwd skeleton
> >    l3fwd-graph: add ethdev configuration changes
> >    l3fwd-graph: add graph config and main loop
> >    doc: add l3fwd graph application user guide
> >
> > Pavan Nikhilesh (3):
> >    graph: add performance testcase
> >    node: add generic ipv4 lookup node
> >    node: ipv4 lookup for x86
> >
> >   MAINTAINERS                                   |   14 +
> >   app/test/Makefile                             |    7 +
> >   app/test/meson.build                          |   12 +-
> >   app/test/test_graph.c                         |  819 ++++
> >   app/test/test_graph_perf.c                    | 1057 ++++++
> >   config/common_base                            |   12 +
> >   config/rte_config.h                           |    4 +
> >   doc/api/doxy-api-index.md                     |    5 +
> >   doc/api/doxy-api.conf.in                      |    2 +
> >   doc/guides/prog_guide/graph_lib.rst           |  397 ++
> >   .../prog_guide/img/anatomy_of_a_node.svg      | 1078 ++++++
> >   .../prog_guide/img/graph_mem_layout.svg       |  702 ++++
> >   doc/guides/prog_guide/img/link_the_nodes.svg  | 3330 +++++++++++++++++
> >   doc/guides/prog_guide/index.rst               |    1 +
> >   doc/guides/rel_notes/release_20_05.rst        |   32 +
> >   doc/guides/sample_app_ug/index.rst            |    1 +
> >   doc/guides/sample_app_ug/intro.rst            |    4 +
> >   doc/guides/sample_app_ug/l3_forward_graph.rst |  334 ++
> >   examples/Makefile                             |    3 +
> >   examples/l3fwd-graph/Makefile                 |   58 +
> >   examples/l3fwd-graph/main.c                   | 1126 ++++++
> >   examples/l3fwd-graph/meson.build              |   13 +
> >   examples/meson.build                          |    6 +-
> >   lib/Makefile                                  |    6 +
> >   lib/librte_graph/Makefile                     |   28 +
> >   lib/librte_graph/graph.c                      |  587 +++
> >   lib/librte_graph/graph_debug.c                |   84 +
> >   lib/librte_graph/graph_ops.c                  |  169 +
> >   lib/librte_graph/graph_populate.c             |  234 ++
> >   lib/librte_graph/graph_private.h              |  347 ++
> >   lib/librte_graph/graph_stats.c                |  406 ++
> >   lib/librte_graph/meson.build                  |   11 +
> >   lib/librte_graph/node.c                       |  421 +++
> >   lib/librte_graph/rte_graph.h                  |  668 ++++
> >   lib/librte_graph/rte_graph_version.map        |   47 +
> >   lib/librte_graph/rte_graph_worker.h           |  510 +++
> >   lib/librte_node/Makefile                      |   32 +
> >   lib/librte_node/ethdev_ctrl.c                 |  115 +
> >   lib/librte_node/ethdev_rx.c                   |  221 ++
> >   lib/librte_node/ethdev_rx_priv.h              |   81 +
> >   lib/librte_node/ethdev_tx.c                   |   86 +
> >   lib/librte_node/ethdev_tx_priv.h              |   62 +
> >   lib/librte_node/ip4_lookup.c                  |  215 ++
> >   lib/librte_node/ip4_lookup_neon.h             |  238 ++
> >   lib/librte_node/ip4_lookup_sse.h              |  244 ++
> >   lib/librte_node/ip4_rewrite.c                 |  326 ++
> >   lib/librte_node/ip4_rewrite_priv.h            |   77 +
> >   lib/librte_node/log.c                         |   14 +
> >   lib/librte_node/meson.build                   |   10 +
> >   lib/librte_node/node_private.h                |   79 +
> >   lib/librte_node/null.c                        |   23 +
> >   lib/librte_node/pkt_drop.c                    |   26 +
> >   lib/librte_node/rte_node_eth_api.h            |   64 +
> >   lib/librte_node/rte_node_ip4_api.h            |   78 +
> >   lib/librte_node/rte_node_version.map          |    9 +
> >   lib/meson.build                               |    5 +-
> >   meson.build                                   |    1 +
> >   mk/rte.app.mk                                 |    2 +
> >   58 files changed, 14538 insertions(+), 5 deletions(-)
> >   create mode 100644 app/test/test_graph.c
> >   create mode 100644 app/test/test_graph_perf.c
> >   create mode 100644 doc/guides/prog_guide/graph_lib.rst
> >   create mode 100644 doc/guides/prog_guide/img/anatomy_of_a_node.svg
> >   create mode 100644 doc/guides/prog_guide/img/graph_mem_layout.svg
> >   create mode 100644 doc/guides/prog_guide/img/link_the_nodes.svg
> >   create mode 100644 doc/guides/sample_app_ug/l3_forward_graph.rst
> >   create mode 100644 examples/l3fwd-graph/Makefile
> >   create mode 100644 examples/l3fwd-graph/main.c
> >   create mode 100644 examples/l3fwd-graph/meson.build
> >   create mode 100644 lib/librte_graph/Makefile
> >   create mode 100644 lib/librte_graph/graph.c
> >   create mode 100644 lib/librte_graph/graph_debug.c
> >   create mode 100644 lib/librte_graph/graph_ops.c
> >   create mode 100644 lib/librte_graph/graph_populate.c
> >   create mode 100644 lib/librte_graph/graph_private.h
> >   create mode 100644 lib/librte_graph/graph_stats.c
> >   create mode 100644 lib/librte_graph/meson.build
> >   create mode 100644 lib/librte_graph/node.c
> >   create mode 100644 lib/librte_graph/rte_graph.h
> >   create mode 100644 lib/librte_graph/rte_graph_version.map
> >   create mode 100644 lib/librte_graph/rte_graph_worker.h
> >   create mode 100644 lib/librte_node/Makefile
> >   create mode 100644 lib/librte_node/ethdev_ctrl.c
> >   create mode 100644 lib/librte_node/ethdev_rx.c
> >   create mode 100644 lib/librte_node/ethdev_rx_priv.h
> >   create mode 100644 lib/librte_node/ethdev_tx.c
> >   create mode 100644 lib/librte_node/ethdev_tx_priv.h
> >   create mode 100644 lib/librte_node/ip4_lookup.c
> >   create mode 100644 lib/librte_node/ip4_lookup_neon.h
> >   create mode 100644 lib/librte_node/ip4_lookup_sse.h
> >   create mode 100644 lib/librte_node/ip4_rewrite.c
> >   create mode 100644 lib/librte_node/ip4_rewrite_priv.h
> >   create mode 100644 lib/librte_node/log.c
> >   create mode 100644 lib/librte_node/meson.build
> >   create mode 100644 lib/librte_node/node_private.h
> >   create mode 100644 lib/librte_node/null.c
> >   create mode 100644 lib/librte_node/pkt_drop.c
> >   create mode 100644 lib/librte_node/rte_node_eth_api.h
> >   create mode 100644 lib/librte_node/rte_node_ip4_api.h
> >   create mode 100644 lib/librte_node/rte_node_version.map
> >
  
Thomas Monjalon May 5, 2020, 9:44 p.m. UTC | #3
11/04/2020 16:13, jerinj@marvell.com:
> From: Jerin Jacob <jerinj@marvell.com>
> 
> Using graph traversal for packet processing is a proven architecture
> that has been implemented in various open source libraries.
> 
> Graph architecture for packet processing enables abstracting the data
> processing functions as “nodes” and “links” them together to create a
> complex “graph” to create reusable/modular data processing functions. 
[...]
> Jerin Jacob (13):
>   graph: define the public API for graph support
>   graph: implement node registration
>   graph: implement node operations
>   graph: implement node debug routines
>   graph: implement internal graph operation helpers
>   graph: populate fastpath memory for graph reel
>   graph: implement create and destroy APIs
>   graph: implement graph operation APIs
>   graph: implement Graphviz export
>   graph: implement debug routines
>   graph: implement stats support
>   graph: implement fastpath API routines
>   doc: add graph library programmer's guide guide
> 
> Kiran Kumar K (2):
>   graph: add unit test case
>   node: add ipv4 rewrite node
> 
> Nithin Dabilpuram (11):
>   node: add log infra and null node
>   node: add ethdev Rx node
>   node: add ethdev Tx node
>   node: add ethdev Rx and Tx node ctrl API
>   node: ipv4 lookup for arm64
>   node: add ipv4 rewrite and lookup ctrl API
>   node: add packet drop node
>   l3fwd-graph: add graph based l3fwd skeleton
>   l3fwd-graph: add ethdev configuration changes
>   l3fwd-graph: add graph config and main loop
>   doc: add l3fwd graph application user guide
> 
> Pavan Nikhilesh (3):
>   graph: add performance testcase
>   node: add generic ipv4 lookup node
>   node: ipv4 lookup for x86

Applied with below small changes:
	- removed allow_experimental from libs
	- minor changes in MAINTAINERS
	- fixed SVG because of lines wrapped at 990 in email formatting

Thanks for the new experimental libraries.