From patchwork Sat Jan 7 01:06:06 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "De Lara Guarch, Pablo" X-Patchwork-Id: 18987 Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [IPv6:::1]) by dpdk.org (Postfix) with ESMTP id 73B39F922; Sat, 7 Jan 2017 02:05:41 +0100 (CET) Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by dpdk.org (Postfix) with ESMTP id C73ED2A6C for ; Sat, 7 Jan 2017 02:04:46 +0100 (CET) Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by orsmga103.jf.intel.com with ESMTP; 06 Jan 2017 17:04:43 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.33,327,1477983600"; d="scan'208";a="210558083" Received: from silpixa00381631.ir.intel.com (HELO silpixa00381631.ger.corp.intel.com) ([10.237.222.122]) by fmsmga004.fm.intel.com with ESMTP; 06 Jan 2017 17:04:42 -0800 From: Pablo de Lara To: dev@dpdk.org Cc: Pablo de Lara , Sameh Gobriel Date: Sat, 7 Jan 2017 01:06:06 +0000 Message-Id: <1483751166-32423-6-git-send-email-pablo.de.lara.guarch@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1483751166-32423-1-git-send-email-pablo.de.lara.guarch@intel.com> References: <1480690340-17652-1-git-send-email-pablo.de.lara.guarch@intel.com> <1483751166-32423-1-git-send-email-pablo.de.lara.guarch@intel.com> Subject: [dpdk-dev] [PATCH v2 5/5] doc: add flow distributor guide X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Signed-off-by: Sameh Gobriel Signed-off-by: Pablo de Lara --- MAINTAINERS | 1 + doc/guides/sample_app_ug/flow_distributor.rst | 492 ++++++++++++++++++++++ doc/guides/sample_app_ug/img/flow_distributor.svg | 417 ++++++++++++++++++ doc/guides/sample_app_ug/index.rst | 1 + 4 files changed, 911 insertions(+) create mode 100644 doc/guides/sample_app_ug/flow_distributor.rst create mode 100644 doc/guides/sample_app_ug/img/flow_distributor.svg diff --git a/MAINTAINERS b/MAINTAINERS index 66e9466..0d3b247 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -535,6 +535,7 @@ F: lib/librte_efd/ F: doc/guides/prog_guide/efd_lib.rst F: app/test/test_efd* F: examples/flow_distributor/ +F: doc/guides/sample_app_ug/flow_distributor.rst Hashes M: Bruce Richardson diff --git a/doc/guides/sample_app_ug/flow_distributor.rst b/doc/guides/sample_app_ug/flow_distributor.rst new file mode 100644 index 0000000..39820f0 --- /dev/null +++ b/doc/guides/sample_app_ug/flow_distributor.rst @@ -0,0 +1,492 @@ +.. BSD LICENSE + Copyright(c) 2010-2016 Intel Corporation. All rights reserved. + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of Intel Corporation nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +Flow Distributor Sample Application +=================================== + +This sample application demonstrates the use of EFD library as a flow-level +load balancer, for more information about the EFD Library please refer to the +DPDK programmer's guide. + +This sample application is a variant of the :ref:`client-server sample application ` +where a specific target node is specified for every and each flow +(not in a round-robin fashion as the original load balancing sample application). + +Overview +-------- + +The architecture of the EFD flow-based load balancer sample application is +presented in the following figure. + +.. _figure_efd_sample_app_overview: + +.. figure:: img/flow_distributor.* + + Using EFD as a Flow-Level Load Balancer + +As shown in figure, the sample application consists of a front end node +(distributor) using the EFD library to create a load-balancing table for flows, +for each flow a target backend worker node is specified. The EFD table does not +store the flow key (unlike a regular hash table), and hence, it can +individually load-balance millions of flows (number of targets * maximum number +of flows fit in a flow table per target) while still fitting in CPU cache. + +It should be noted that although they are referred to as nodes, the frontend +distributor and worker nodes are processes running on the same platform. + +Front-end Distributor +~~~~~~~~~~~~~~~~~~~~~ + +Upon initializing, the frontend distributor node (process) creates a flow +distributor table (based on the EFD library) which is populated with flow +information and its intended target node. + +The sample application assigns a specific target node_id (process) for each of +the IP destination addresses as follows: + +.. code-block:: c + + node_id = i % num_nodes; /* Target node id is generated */ + ip_dst = rte_cpu_to_be_32(i); /* Specific ip destination address is + assigned to this target node */ + +then the pair of is inserted into the flow distribution table. + +The main loop of the the distributor node receives a burst of packets, then for +each packet a flow key (IP destination address) is extracted. The flow +distributor table is looked up and the target node id is returned. Packets are +then enqueued to the specified target node id. + +It should be noted that flow distributor table is not a membership test table. +I.e. if the key has already been inserted the target node id will be correct, +but for new keys the flow distributor table will return a value (which can be +valid). + +Backend Worker Nodes +~~~~~~~~~~~~~~~~~~~~ + +Upon initializing, the worker node (process) creates a flow table (a regular +hash table that stores the key default size 1M flows) which is populated with +only the flow information that are serviced at this node. This flow key is +essential to point out new keys that have not been inserted before. + +The worker node's main loop is simply receiving packets then doing a hash table +lookup. If a match occurs then statistics are updated for flows serviced by +this node. If no match is found in the local hash table then this indicates +that this is a new flows which is dropped. + + +Compiling the Application +------------------------- + +The sequence of steps used to build the application is: + +#. Export the required environment variables: + + .. code-block:: console + + export RTE_SDK=/path/to/rte_sdk + export RTE_TARGET=x86_64-native-linuxapp-gcc + +#. Build the application executable file: + + .. code-block:: console + + cd ${RTE_SDK}/examples/flow_distributor/ + make + + For more details on how to build the DPDK libraries and sample + applications, + please refer to the *DPDK Getting Started Guide.* + + +Running the Application +----------------------- + +The application has two binaries to be run: the front-end distributor +and the back-end node. + +The frontend distributor (distributor) has the following command line options:: + + ./distributor [EAL options] -- -p PORTMASK -n NUM_NODES -f NUM_FLOWS + +Where, + +* ``-p PORTMASK:`` Hexadecimal bitmask of ports to configure +* ``-n NUM_NODES:`` Number of back-end nodes that will be used +* ``-f NUM_FLOWS:`` Number of flows to be added in the EFD table (1 million, by default) + +The back-end node (node) has the following command line options:: + + ./node [EAL options] -- -n NODE_ID + +Where, + +* ``-n NODE_ID:`` Node ID, which cannot be equal or higher than NUM_MODES + + +First, the distributor app must be run, with the number of nodes that will be run. +Once it has been started, the node instances can be run, with different NODE_ID. +These instances have to be run as secondary processes, with `--proc-type=secondary` +in the EAL options, which will attach to the primary process memory, and therefore, +they can access the queues created by the primary process to distribute packets. + +To successfully run the application, the command line used to start the +application has to be in sync with the traffic flows configured on the traffic +generator side. + +For examples of application command lines and traffic generator flows, please +refer to the DPDK Test Report. For more details on how to set up and run the +sample applications provided with DPDK package, please refer to the +:ref:`DPDK Getting Started Guide for Linux ` and +:ref:`DPDK Getting Started Guide for FreeBSD `. + + +Explanation +----------- + +As described in previous sections, there are two processes in this example. + +The first process, the front-end distributor, creates and populates the EFD table, +which is used to distribute packets to nodes, which the number of flows +specified in the command line (1 million, by default). + + +.. code-block:: c + + static void + create_flow_distributor_table(void) + { + uint8_t socket_id = rte_socket_id(); + + /* create table */ + efd_table = rte_efd_create("flow table", num_flows * 2, sizeof(uint32_t), + 1 << socket_id, socket_id); + + if (efd_table == NULL) + rte_exit(EXIT_FAILURE, "Problem creating the flow table\n"); + } + + static void + populate_flow_distributor_table(void) + { + unsigned int i; + int32_t ret; + uint32_t ip_dst; + uint8_t socket_id = rte_socket_id(); + uint64_t node_id; + + /* Add flows in table */ + for (i = 0; i < num_flows; i++) { + node_id = i % num_nodes; + + ip_dst = rte_cpu_to_be_32(i); + ret = rte_efd_update(efd_table, socket_id, + (void *)&ip_dst, (efd_value_t)node_id); + if (ret < 0) + rte_exit(EXIT_FAILURE, "Unable to add entry %u in " + "flow distributor table\n", i); + } + + printf("EFD table: Adding 0x%x keys\n", num_flows); + } + +After initialization, packets are received from the enabled ports, and the IPv4 +address from the packets is used as a key to look up in the EFD table, +which tells the node where the packet has to be distributed. + +.. code-block:: c + + static void + process_packets(uint32_t port_num __rte_unused, struct rte_mbuf *pkts[], + uint16_t rx_count, unsigned int socket_id) + { + uint16_t i; + uint8_t node; + efd_value_t data[EFD_BURST_MAX]; + const void *key_ptrs[EFD_BURST_MAX]; + + struct ipv4_hdr *ipv4_hdr; + uint32_t ipv4_dst_ip[EFD_BURST_MAX]; + + for (i = 0; i < rx_count; i++) { + /* Handle IPv4 header.*/ + ipv4_hdr = rte_pktmbuf_mtod_offset(pkts[i], struct ipv4_hdr *, + sizeof(struct ether_hdr)); + ipv4_dst_ip[i] = ipv4_hdr->dst_addr; + key_ptrs[i] = (void *)&ipv4_dst_ip[i]; + } + + rte_efd_lookup_bulk(efd_table, socket_id, rx_count, + (const void **) key_ptrs, data); + for (i = 0; i < rx_count; i++) { + node = (uint8_t) ((uintptr_t)data[i]); + + if (node >= num_nodes) { + /* + * Node is out of range, which means that + * flow has not been inserted + */ + flow_dist_stats.drop++; + rte_pktmbuf_free(pkts[i]); + } else { + flow_dist_stats.distributed++; + enqueue_rx_packet(node, pkts[i]); + } + } + + for (i = 0; i < num_nodes; i++) + flush_rx_queue(i); + } + +The burst of packets received is enqueued in temporary buffers (per node), +and enqueued in the shared ring between the distributor and the node. +After this, a new burst of packets is received and this process is +repeated infinitely. + +.. code-block:: c + + static void + flush_rx_queue(uint16_t node) + { + uint16_t j; + struct node *cl; + + if (cl_rx_buf[node].count == 0) + return; + + cl = &nodes[node]; + if (rte_ring_enqueue_bulk(cl->rx_q, (void **)cl_rx_buf[node].buffer, + cl_rx_buf[node].count) != 0){ + for (j = 0; j < cl_rx_buf[node].count; j++) + rte_pktmbuf_free(cl_rx_buf[node].buffer[j]); + cl->stats.rx_drop += cl_rx_buf[node].count; + } else + cl->stats.rx += cl_rx_buf[node].count; + + cl_rx_buf[node].count = 0; + } + +The second process, the back-end node, receives the packets from the shared +ring with the distributor and send them out, if they belong to the node. + +At initialization, it attaches to the distributor process memory, to have +access to the shared ring, parameters and statistics. + +.. code-block:: c + + rx_ring = rte_ring_lookup(get_rx_queue_name(node_id)); + if (rx_ring == NULL) + rte_exit(EXIT_FAILURE, "Cannot get RX ring - " + "is distributor process running?\n"); + + mp = rte_mempool_lookup(PKTMBUF_POOL_NAME); + if (mp == NULL) + rte_exit(EXIT_FAILURE, "Cannot get mempool for mbufs\n"); + + mz = rte_memzone_lookup(MZ_SHARED_INFO); + if (mz == NULL) + rte_exit(EXIT_FAILURE, "Cannot get port info structure\n"); + info = mz->addr; + tx_stats = &(info->tx_stats[node_id]); + filter_stats = &(info->filter_stats[node_id]); + +Then, the hash table that contains the flows that will be handled +by the node is created and populated. + +.. code-block:: c + + static struct rte_hash * + create_hash_table(const struct shared_info *info) + { + uint32_t num_flows_node = info->num_flows / info->num_nodes; + char name[RTE_HASH_NAMESIZE]; + struct rte_hash *h; + + /* create table */ + struct rte_hash_parameters hash_params = { + .entries = num_flows_node * 2, /* table load = 50% */ + .key_len = sizeof(uint32_t), /* Store IPv4 dest IP address */ + .socket_id = rte_socket_id(), + .hash_func_init_val = 0, + }; + + snprintf(name, sizeof(name), "hash_table_%d", node_id); + hash_params.name = name; + h = rte_hash_create(&hash_params); + + if (h == NULL) + rte_exit(EXIT_FAILURE, + "Problem creating the hash table for node %d\n", + node_id); + return h; + } + + static void + populate_hash_table(const struct rte_hash *h, const struct shared_info *info) + { + unsigned int i; + int32_t ret; + uint32_t ip_dst; + uint32_t num_flows_node = 0; + uint64_t target_node; + + /* Add flows in table */ + for (i = 0; i < info->num_flows; i++) { + target_node = i % info->num_nodes; + if (target_node != node_id) + continue; + + ip_dst = rte_cpu_to_be_32(i); + + ret = rte_hash_add_key(h, (void *) &ip_dst); + if (ret < 0) + rte_exit(EXIT_FAILURE, "Unable to add entry %u " + "in hash table\n", i); + else + num_flows_node++; + + } + + printf("Hash table: Adding 0x%x keys\n", num_flows_node); + } + +After initialization, packets are dequeued from the shared ring (from the distributor) and, +like in the distributor process, the IPv4 address from the packets is used as +a key to look up in the hash table. +If there is a hit, packet is stored in a buffer, to be eventually transmitted +in one of the enabled ports. If key is not there, packet is dropped, since the +flow is not handled by the node. + +.. code-block:: c + + static inline void + handle_packets(struct rte_hash *h, struct rte_mbuf **bufs, uint16_t num_packets) + { + struct ipv4_hdr *ipv4_hdr; + uint32_t ipv4_dst_ip[PKT_READ_SIZE]; + const void *key_ptrs[PKT_READ_SIZE]; + unsigned int i; + int32_t positions[PKT_READ_SIZE] = {0}; + + for (i = 0; i < num_packets; i++) { + /* Handle IPv4 header.*/ + ipv4_hdr = rte_pktmbuf_mtod_offset(bufs[i], struct ipv4_hdr *, + sizeof(struct ether_hdr)); + ipv4_dst_ip[i] = ipv4_hdr->dst_addr; + key_ptrs[i] = &ipv4_dst_ip[i]; + } + /* Check if packets belongs to any flows handled by this node */ + rte_hash_lookup_bulk(h, key_ptrs, num_packets, positions); + + for (i = 0; i < num_packets; i++) { + if (likely(positions[i] >= 0)) { + filter_stats->passed++; + transmit_packet(bufs[i]); + } else { + filter_stats->drop++; + /* Drop packet, as flow is not handled by this node */ + rte_pktmbuf_free(bufs[i]); + } + } + } + +Finally, note that both processes updates statistics, such as transmitted, received +and dropped packets, which are shown and refreshed by the distributor app. + +.. code-block:: c + + static void + do_stats_display(void) + { + unsigned int i, j; + const char clr[] = {27, '[', '2', 'J', '\0'}; + const char topLeft[] = {27, '[', '1', ';', '1', 'H', '\0'}; + uint64_t port_tx[RTE_MAX_ETHPORTS], port_tx_drop[RTE_MAX_ETHPORTS]; + uint64_t node_tx[MAX_NODES], node_tx_drop[MAX_NODES]; + + /* to get TX stats, we need to do some summing calculations */ + memset(port_tx, 0, sizeof(port_tx)); + memset(port_tx_drop, 0, sizeof(port_tx_drop)); + memset(node_tx, 0, sizeof(node_tx)); + memset(node_tx_drop, 0, sizeof(node_tx_drop)); + + for (i = 0; i < num_nodes; i++) { + const struct tx_stats *tx = &info->tx_stats[i]; + + for (j = 0; j < info->num_ports; j++) { + const uint64_t tx_val = tx->tx[info->id[j]]; + const uint64_t drop_val = tx->tx_drop[info->id[j]]; + + port_tx[j] += tx_val; + port_tx_drop[j] += drop_val; + node_tx[i] += tx_val; + node_tx_drop[i] += drop_val; + } + } + + /* Clear screen and move to top left */ + printf("%s%s", clr, topLeft); + + printf("PORTS\n"); + printf("-----\n"); + for (i = 0; i < info->num_ports; i++) + printf("Port %u: '%s'\t", (unsigned int)info->id[i], + get_printable_mac_addr(info->id[i])); + printf("\n\n"); + for (i = 0; i < info->num_ports; i++) { + printf("Port %u - rx: %9"PRIu64"\t" + "tx: %9"PRIu64"\n", + (unsigned int)info->id[i], info->rx_stats.rx[i], + port_tx[i]); + } + + printf("\nFLOW DISTRIBUTOR\n"); + printf("-----\n"); + printf("distributed: %9"PRIu64", drop: %9"PRIu64"\n", + flow_dist_stats.distributed, flow_dist_stats.drop); + + printf("\nNODES\n"); + printf("-------\n"); + for (i = 0; i < num_nodes; i++) { + const unsigned long long rx = nodes[i].stats.rx; + const unsigned long long rx_drop = nodes[i].stats.rx_drop; + const struct filter_stats *filter = &info->filter_stats[i]; + + printf("Node %2u - rx: %9llu, rx_drop: %9llu\n" + " tx: %9"PRIu64", tx_drop: %9"PRIu64"\n" + " filter_passed: %9"PRIu64", " + "filter_drop: %9"PRIu64"\n", + i, rx, rx_drop, node_tx[i], node_tx_drop[i], + filter->passed, filter->drop); + } + + printf("\n"); + } diff --git a/doc/guides/sample_app_ug/img/flow_distributor.svg b/doc/guides/sample_app_ug/img/flow_distributor.svg new file mode 100644 index 0000000..b35faad --- /dev/null +++ b/doc/guides/sample_app_ug/img/flow_distributor.svg @@ -0,0 +1,417 @@ + + + + + + + + + + + + + + Page-1 + + + Sheet.1 + + + + + + diff --git a/doc/guides/sample_app_ug/index.rst b/doc/guides/sample_app_ug/index.rst index 775e2f7..260f6a5 100644 --- a/doc/guides/sample_app_ug/index.rst +++ b/doc/guides/sample_app_ug/index.rst @@ -57,6 +57,7 @@ Sample Applications User Guides l3_forward_virtual link_status_intr load_balancer + flow_distributor multi_process qos_metering qos_scheduler