From patchwork Tue Jan 14 14:25:15 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Andrzej Ostruszka [C]" X-Patchwork-Id: 64668 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id B19FDA04FF; Tue, 14 Jan 2020 15:25:34 +0100 (CET) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id A0B3C1C1AB; Tue, 14 Jan 2020 15:25:26 +0100 (CET) Received: from mx0b-0016f401.pphosted.com (mx0a-0016f401.pphosted.com [67.231.148.174]) by dpdk.org (Postfix) with ESMTP id 91D731C1A5 for ; Tue, 14 Jan 2020 15:25:25 +0100 (CET) Received: from pps.filterd (m0045849.ppops.net [127.0.0.1]) by mx0a-0016f401.pphosted.com (8.16.0.42/8.16.0.42) with SMTP id 00EEKUmO013360 for ; Tue, 14 Jan 2020 06:25:24 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=marvell.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=pfpt0818; bh=YsQqc2Hj+Nvgdb5LVljYpZyCMMyAbbPYlzxjM07aWZk=; b=mP284PMJShHzqvO07jPZZO2SEiDwBrVL4CEzv+WZKKnpsIWU4ReIEZyAJZl2bvewzPUn Evrs/qHSKfZGO/nqEiSh8rL1guOTrWEMg5hbvpZStKM25+Noy6rgcIDkc5jBLQJAkQdh al3v+ekhDsRXNZdupa5BEx6E+8Wmvj8gojjRnBy0QyDjwzq5lFA3RMm8/uvh2ZQOR6pb nHjBm/rKIZRfqdWVXNeMyE2GOTckTma6VELmAszQ3jolcv2FzMb0UAKhUylO85eLfUKM 1cqwjSPfAcQs2XCYEFXxHXHj3KhVLHVThzoV4vLP5DwDpPdZM8aGtV/h1k3QQK8DmwOv 1g== Received: from sc-exch01.marvell.com ([199.233.58.181]) by mx0a-0016f401.pphosted.com with ESMTP id 2xhc6sgngc-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-SHA384 bits=256 verify=NOT) for ; Tue, 14 Jan 2020 06:25:24 -0800 Received: from SC-EXCH03.marvell.com (10.93.176.83) by SC-EXCH01.marvell.com (10.93.176.81) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Tue, 14 Jan 2020 06:25:22 -0800 Received: from 
maili.marvell.com (10.93.176.43) by SC-EXCH03.marvell.com (10.93.176.83) with Microsoft SMTP Server id 15.0.1497.2 via Frontend Transport; Tue, 14 Jan 2020 06:25:22 -0800 Received: from amok.marvell.com (unknown [10.95.130.253]) by maili.marvell.com (Postfix) with ESMTP id 4D3E43F7040; Tue, 14 Jan 2020 06:25:21 -0800 (PST) From: Andrzej Ostruszka To: CC: Jerin Jacob Kollanukkaran , Nithin Kumar Dabilpuram , Pavan Nikhilesh Bhagavatula , Kiran Kumar Kokkilagadda , Krzysztof Kanas Date: Tue, 14 Jan 2020 15:25:15 +0100 Message-ID: <20200114142517.29522-2-aostruszka@marvell.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20200114142517.29522-1-aostruszka@marvell.com> References: <20200114142517.29522-1-aostruszka@marvell.com> MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.138, 18.0.572 definitions=2020-01-14_04:2020-01-13, 2020-01-14 signatures=0 Subject: [dpdk-dev] [RFC PATCH 1/3] lib: introduce IF proxy library (API) X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" This library allows to designate ports visible to the system (such as Tun/Tap or KNI) as port representors serving as proxies for other DPDK ports. When such a proxy is configured this library initially queries network configuration from the system and later monitors its changes. The information gathered is passed to the application via a set of user registered callbacks. This way user can use normal network utilities (like those from the iproute2 suite) to configure DPDK ports. 
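The notification flow described above can be illustrated with a minimal, self-contained sketch. It does not use DPDK: every demo_* name and DEMO_MAX_PORTS is a hypothetical stand-in invented for illustration, mirroring the port-to-proxy binding and per-port callback dispatch that this library provides through rte_ifpx_port_bind() and the registered callbacks. It only shows that one change on a proxy results in one callback invocation per bound port.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define DEMO_MAX_PORTS 8 /* hypothetical stand-in for RTE_MAX_ETHPORTS */

/* Stand-in for the port-to-proxy map; DEMO_MAX_PORTS means "not bound". */
static uint16_t demo_p2p[DEMO_MAX_PORTS];
/* Stand-in for a single registered mtu_change callback. */
static void (*demo_mtu_cb)(uint16_t port_id, uint16_t mtu);

/* Bookkeeping so the effect of the callback can be inspected. */
static unsigned int demo_calls;
static uint16_t demo_last_port, demo_last_mtu;

static void demo_init(void)
{
	unsigned int p;

	for (p = 0; p < DEMO_MAX_PORTS; ++p)
		demo_p2p[p] = DEMO_MAX_PORTS;
	demo_mtu_cb = NULL;
	demo_calls = 0;
}

/* Logical binding only: no packets are forwarded by this step. */
static int demo_port_bind(uint16_t port_id, uint16_t proxy_id)
{
	if (port_id >= DEMO_MAX_PORTS || proxy_id >= DEMO_MAX_PORTS)
		return -1;
	demo_p2p[port_id] = proxy_id;
	return 0;
}

/* What happens when a change is observed on a proxy: the callback is
 * invoked once for every port bound to that proxy.
 */
static void demo_notify_mtu(uint16_t proxy_id, uint16_t mtu)
{
	uint16_t p;

	for (p = 0; p < DEMO_MAX_PORTS; ++p) {
		if (demo_p2p[p] == proxy_id && demo_mtu_cb)
			demo_mtu_cb(p, mtu);
	}
}

/* Application side: a real application would reconfigure the port here. */
static void on_mtu_change(uint16_t port_id, uint16_t mtu)
{
	printf("port %u: new MTU %u\n", (unsigned int)port_id,
	       (unsigned int)mtu);
	demo_last_port = port_id;
	demo_last_mtu = mtu;
	++demo_calls;
}

/* Bind ports 0 and 1 to proxy 5, port 2 to proxy 6, then report an MTU
 * change on proxy 5. Returns the number of callback invocations.
 */
static unsigned int demo_run(void)
{
	int rc;

	demo_init();
	demo_mtu_cb = on_mtu_change;
	rc = demo_port_bind(0, 5);
	assert(rc == 0);
	rc = demo_port_bind(1, 5);
	assert(rc == 0);
	rc = demo_port_bind(2, 6);
	assert(rc == 0);
	demo_notify_mtu(5, 1400);
	return demo_calls;
}
```

Calling demo_run() invokes the callback for ports 0 and 1 only, since port 2 is bound to a different proxy. With the real library the change would come from the system (for example an MTU change made with iproute2 on the proxy interface) rather than from a direct function call.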
Signed-off-by: Andrzej Ostruszka --- lib/librte_if_proxy/rte_if_proxy.h | 364 +++++++++++++++++++++++++++++ 1 file changed, 364 insertions(+) create mode 100644 lib/librte_if_proxy/rte_if_proxy.h diff --git a/lib/librte_if_proxy/rte_if_proxy.h b/lib/librte_if_proxy/rte_if_proxy.h new file mode 100644 index 000000000..83895d8b7 --- /dev/null +++ b/lib/librte_if_proxy/rte_if_proxy.h @@ -0,0 +1,364 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(C) 2020 Marvell International Ltd. + */ + +#ifndef _RTE_IF_PROXY_H_ +#define _RTE_IF_PROXY_H_ + +/** + * @file + * RTE IF Proxy library + * + * The IF Proxy library allows for monitoring of system network configuration + * and configuration of DPDK ports by using usual system utilities (like the + * ones from iproute2 package). + * + * It is based on the notion of "proxy interface" which actually can be any DPDK + * port which is also visible to the system - that is it has non-zero 'if_index' + * field in 'rte_eth_dev_info' structure. + * + * If application doesn't have any such port (or doesn't want to use it for + * proxy) it can create one by calling: + * + * proxy_id = rte_ifpx_create(RTE_IFPX_DEFAULT); + * + * This function is just a wrapper that constructs valid 'devargs' string based + * on the proxy type chosen (currently Tap or KNI) and creates the interface by + * calling rte_ifpx_create_by_devarg(). + * + * Once one has DPDK port capable of being proxy one can bind target DPDK port + * to it by calling: + * + * rte_ifpx_port_bind(port_id, proxy_id); + * + * This binding is a logical one - there is no automatic packet forwarding + * between port and its proxy since the library doesn't know the structure of + * application's packet processing. It remains application responsibility to + * forward the packets from/to proxy port (by calling the usual DPDK RX/TX burst + * API).
However when the library notes some change to the proxy interface it + * will simply call appropriate callback with 'port_id' of the DPDK port that is + * bound to this proxy interface. The binding can be 1 to many - that is many + * ports can point to one proxy - in that case registered callbacks will be + * called for every bound port. + * + * The callbacks that are used for notifications are described by the + * 'rte_ifpx_callbacks' structure and they are registered by calling: + * + * rte_ifpx_callbacks_register(&cbs); + * + * Finally the application should call: + * + * rte_ifpx_listen(); + * + * which will query system for present network configuration and start listening + * to its changes. + */ + +#include +#include + +#ifdef __cplusplus +extern "C" { +#endif + +/** + * Enum naming the type of proxy to create. + * + * @see rte_ifpx_create() + */ +enum rte_ifpx_type { + RTE_IFPX_DEFAULT, /**< Use default proxy type for given arch. */ + RTE_IFPX_TAP, /**< Use Tap based port for proxy. */ + RTE_IFPX_KNI /**< Use KNI based port for proxy. */ +}; + +/** + * Create DPDK port that can serve as an interface proxy. + * + * This function is just a wrapper around rte_ifpx_create_by_devarg() that + * constructs its 'devarg' argument based on type of proxy requested. + * + * @param type + * A type of proxy to create. + * + * @return + * DPDK port id on success, RTE_MAX_ETHPORTS otherwise. + * + * @see enum rte_ifpx_type + * @see rte_ifpx_create_by_devarg() + */ +__rte_experimental +uint16_t rte_ifpx_create(enum rte_ifpx_type type); + +/** + * Create DPDK port that can serve as an interface proxy. + * + * @param devarg + * A string passed to rte_dev_probe() to create proxy port. + * + * @return + * DPDK port id on success, RTE_MAX_ETHPORTS otherwise. + */ +__rte_experimental +uint16_t rte_ifpx_create_by_devarg(const char *devarg); + +/** + * Remove DPDK proxy port. + * + * In addition to removing the proxy port the bindings (if any) are cleared. 
+ * + * @param proxy_id + * Port id of the proxy that should be removed. + * + * @return + * 0 on success, negative on error. + */ +__rte_experimental +int rte_ifpx_destroy(uint16_t proxy_id); + +/** + * This structure groups the callbacks that might be called as notification + * events for changing network configuration. Not every platform might + * implement all of them and you can query the availability with + * rte_ifpx_callbacks_available() function and testing each bit against bit mask + * values defined in enum rte_ifpx_cb_bit. + * @see enum rte_ifpx_cb_bit + * @see rte_ifpx_callbacks_available() + * @see rte_ifpx_callbacks_register() + */ +struct rte_ifpx_callbacks { + void (*mac_change)(uint16_t port_id, const struct rte_ether_addr *mac); + /**< Callback for notification about MAC change of the proxy interface. + * This callback (as all other port related callbacks) is called for + * each port (with its port_id as a first argument) bound to the proxy + * interface for which change has been observed. + * @see RTE_IFPX_MAC_CHANGE + */ + void (*mtu_change)(uint16_t port_id, uint16_t mtu); + /**< Callback for notification about MTU change. + * @see RTE_IFPX_MTU_CHANGE + */ + void (*link_change)(uint16_t port_id, int is_up); + /**< Callback for notification about link going up/down. + * @see RTE_IFPX_LINK_CHANGE + */ + /* All IPv4 addresses are in host order */ + void (*addr_add)(uint16_t port_id, uint32_t ip); + /**< Callback for notification about IPv4 address being added. + * @see RTE_IFPX_ADDR_ADD + */ + void (*addr_del)(uint16_t port_id, uint32_t ip); + /**< Callback for notification about IPv4 address removal. + * @see RTE_IFPX_ADDR_DEL + */ + void (*addr6_add)(uint16_t port_id, const uint8_t *ip); + /**< Callback for notification about IPv6 address being added. + * @see RTE_IFPX_ADDR6_ADD + */ + void (*addr6_del)(uint16_t port_id, const uint8_t *ip); + /**< Callback for notification about IPv6 address removal.
+ * @see RTE_IFPX_ADDR6_DEL + */ + void (*route_add)(uint32_t ip, uint8_t depth); + /**< Callback for notification about IPv4 route being added. + * Note that "route" callbacks might be also called when user adds + * address to the interface (that is in addition to address related + * callbacks). + * @see RTE_IFPX_ROUTE_ADD + */ + void (*route_del)(uint32_t ip, uint8_t depth); + /**< Callback for notification about IPv4 route removal. + * @see RTE_IFPX_ROUTE_DEL + */ + void (*route6_add)(const uint8_t *ip, uint8_t depth); + /**< Callback for notification about IPv6 route being added. + * @see RTE_IFPX_ROUTE6_ADD + */ + void (*route6_del)(const uint8_t *ip, uint8_t depth); + /**< Callback for notification about IPv6 route removal. + * @see RTE_IFPX_ROUTE6_DEL + */ + void (*cfg_finished)(void); + /**< Lib specific callback - called when initial network configuration + * query is finished. + */ +}; + +/** + * The rte_ifpx_cb_bit enum defines bit mask values to test against value + * returned by rte_ifpx_callbacks_available() to learn about type of callbacks + * implemented for this platform. + */ +enum rte_ifpx_cb_bit { + RTE_IFPX_MAC_CHANGE = 1ULL << 0, /**< @see mac_change callback */ + RTE_IFPX_MTU_CHANGE = 1ULL << 1, /**< @see mtu_change callback */ + RTE_IFPX_LINK_CHANGE = 1ULL << 2, /**< @see link_change callback */ + RTE_IFPX_ADDR_ADD = 1ULL << 3, /**< @see addr_add callback */ + RTE_IFPX_ADDR_DEL = 1ULL << 4, /**< @see addr_del callback */ + RTE_IFPX_ADDR6_ADD = 1ULL << 5, /**< @see addr6_add callback */ + RTE_IFPX_ADDR6_DEL = 1ULL << 6, /**< @see addr6_del callback */ + RTE_IFPX_ROUTE_ADD = 1ULL << 7, /**< @see route_add callback */ + RTE_IFPX_ROUTE_DEL = 1ULL << 8, /**< @see route_del callback */ + RTE_IFPX_ROUTE6_ADD = 1ULL << 9, /**< @see route6_add callback */ + RTE_IFPX_ROUTE6_DEL = 1ULL << 10, /**< @see route6_del callback */ +}; +/** + * Get the bit mask of implemented callbacks for this platform. + * + * @return + * Bit mask of callbacks implemented. 
+ * @see enum rte_ifpx_cb_bit + */ +__rte_experimental +uint64_t rte_ifpx_callbacks_available(void); + +/** + * Typedef naming type of value returned during callback registration. + * + * @see rte_ifpx_callbacks_register() + */ +typedef const void *rte_ifpx_cbs_hndl; + +/** + * Register proxy callbacks. + * + * This function registers callbacks to be called upon appropriate network + * event notification. + * + * @param cbs + * Set of callbacks that will be called. The library does not take any + * ownership of the pointer passed - the callbacks are stored internally. + * + * @return + * Non-NULL pointer upon successful registration - that pointer can be used + * as a handle to unregister callbacks (and nothing more). On failure NULL + * is returned. + */ +__rte_experimental +rte_ifpx_cbs_hndl rte_ifpx_callbacks_register(const + struct rte_ifpx_callbacks *cbs); + +/** + * Unregister proxy callbacks. + * + * This function unregisters callbacks previously registered with + * rte_ifpx_callbacks_register(). + * + * @param cbs + * Handle/pointer returned on previous callback registration. + * + * @return + * 0 on success, negative otherwise. + */ +__rte_experimental +int rte_ifpx_callbacks_unregister(rte_ifpx_cbs_hndl cbs); + +/** + * Bind the port to its proxy. + * + * After calling this function all network configuration of the proxy (and its + * changes) will be passed to given port by calling registered callbacks with + * 'port_id' as an argument. + * + * Note: since both arguments are of the same type in order to not mix them and + * ease remembering the order the first one is kept the same for bind/unbind. + * + * @param port_id + * Id of the port to be bound. + * @param proxy_id + * Id of the proxy the port needs to be bound to. + * @return + * 0 on success, negative on error. + */ +__rte_experimental +int rte_ifpx_port_bind(uint16_t port_id, uint16_t proxy_id); + +/** + * Unbind the port from its proxy.
+ * + * After calling this function registered callbacks will no longer be called for + * this port (but they might be called for other ports in one to many binding + * scenario). + * + * @param port_id + * Id of the port to unbind. + * @return + * 0 on success, negative on error. + */ +__rte_experimental +int rte_ifpx_port_unbind(uint16_t port_id); + +/** + * Get the system network configuration and start listening to its changes. + * + * @return + * 0 on success, negative otherwise. + */ +__rte_experimental +int rte_ifpx_listen(void); + +/** + * Remove all bindings/callbacks and stop listening to network configuration. + * + * @return + * 0 on success, negative otherwise. + */ +__rte_experimental +int rte_ifpx_close(void); + +/** + * Get the id of the proxy the port is bound to. + * + * @param port_id + * Id of the port for which to get proxy. + * @return + * Port id of the proxy on success, RTE_MAX_ETHPORTS on error. + */ +__rte_experimental +uint16_t rte_ifpx_proxy_get(uint16_t port_id); + +/** + * Get the ids of the ports bound to the proxy. + * + * @param proxy_id + * Id of the proxy for which to get ports. + * @param ports + * Array where to store the port ids. + * @param num + * Size of the 'ports' array. + * @return + * The number of ports bound to given proxy. Note that this function return + * value does not depend on the ports/num argument - so you can call it first + * with NULL/0 to query for the size of the buffer to create or call it with + * the buffer you have and later check if it was large enough. + */ +__rte_experimental +unsigned int rte_ifpx_port_get(uint16_t proxy_id, + uint16_t *ports, unsigned int num); + +/** + * The structure containing some properties of the proxy interface. + */ +struct rte_ifpx_info { + unsigned int if_index; /* entry valid iff if_index != 0 */ + uint16_t mtu; + struct rte_ether_addr mac; + char if_name[RTE_ETH_NAME_MAX_LEN]; +}; + +/** + * Get the properties of the proxy interface given port is bound to.
+ * + * @param port_id + * Id of the port for which to get proxy properties. + * @return + * Pointer to the proxy information structure. + */ +__rte_experimental +const struct rte_ifpx_info *rte_ifpx_info_get(uint16_t port_id); + +#ifdef __cplusplus +} +#endif + +#endif /* _RTE_IF_PROXY_H_ */ From patchwork Tue Jan 14 14:25:16 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Andrzej Ostruszka [C]" X-Patchwork-Id: 64669 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 03B88A04FF; Tue, 14 Jan 2020 15:25:48 +0100 (CET) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id DD2A61C1BE; Tue, 14 Jan 2020 15:25:30 +0100 (CET) Received: from mx0b-0016f401.pphosted.com (mx0b-0016f401.pphosted.com [67.231.156.173]) by dpdk.org (Postfix) with ESMTP id F260F1C1B7 for ; Tue, 14 Jan 2020 15:25:28 +0100 (CET) Received: from pps.filterd (m0045851.ppops.net [127.0.0.1]) by mx0b-0016f401.pphosted.com (8.16.0.42/8.16.0.42) with SMTP id 00EEPRg6007770; Tue, 14 Jan 2020 06:25:28 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=marvell.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=pfpt0818; bh=6dXfNNpIjK+jZl3afcTqg62n8ua3BmA4HDbdP8foEjA=; b=MrLdWURqqHlenhkHmnjjbU0WlQtr3FId52h1n0rTI2R2MLWaXBVsOqI+K25paIMfB+Mt zAkKhfPIDJBfOwEotEg5S4jwNk4mfNAgjLnGc/8fnXfm/PGJc1FF+kCIsLkFtKR2+2Ht aQ0y/TLxmkvjUk8KxC2rdewVz7xRMEj3a2ebHvo8DQ1kQqLbLiFww7IhqMHMxwhMkNKC 5vvFbyPyBjPnIGzuiefNrkbffgyebC8F82bGfCSUKKfMtCY3FBZRPfyyyylJhvku05xv wK1Q+NqPHea3ATmbX3Evspq/NnR1R7jvDMKkfdWoK9Gcmn27MgZKlQQrESSv5qX+XRGz CA== Received: from sc-exch02.marvell.com ([199.233.58.182]) by mx0b-0016f401.pphosted.com with ESMTP id 2xgng4vxm0-1 
(version=TLSv1.2 cipher=ECDHE-RSA-AES256-SHA384 bits=256 verify=NOT); Tue, 14 Jan 2020 06:25:28 -0800 Received: from SC-EXCH03.marvell.com (10.93.176.83) by SC-EXCH02.marvell.com (10.93.176.82) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Tue, 14 Jan 2020 06:25:25 -0800 Received: from maili.marvell.com (10.93.176.43) by SC-EXCH03.marvell.com (10.93.176.83) with Microsoft SMTP Server id 15.0.1497.2 via Frontend Transport; Tue, 14 Jan 2020 06:25:25 -0800 Received: from amok.marvell.com (unknown [10.95.130.253]) by maili.marvell.com (Postfix) with ESMTP id 868423F703F; Tue, 14 Jan 2020 06:25:23 -0800 (PST) From: Andrzej Ostruszka To: , Thomas Monjalon CC: Jerin Jacob Kollanukkaran , Nithin Kumar Dabilpuram , Pavan Nikhilesh Bhagavatula , Kiran Kumar Kokkilagadda , Krzysztof Kanas Date: Tue, 14 Jan 2020 15:25:16 +0100 Message-ID: <20200114142517.29522-3-aostruszka@marvell.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20200114142517.29522-1-aostruszka@marvell.com> References: <20200114142517.29522-1-aostruszka@marvell.com> MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.138, 18.0.572 definitions=2020-01-14_04:2020-01-13, 2020-01-14 signatures=0 Subject: [dpdk-dev] [RFC PATCH 2/3] if_proxy: add preliminary Linux implementation X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" This commit adds a preliminary Linux implementation of the IF Proxy library. It should allow one to play around with the idea and check its usefulness. 
Signed-off-by: Andrzej Ostruszka --- config/common_base | 5 + lib/Makefile | 2 + .../common/include/rte_eal_interrupts.h | 2 + lib/librte_eal/linux/eal/eal_interrupts.c | 14 +- lib/librte_if_proxy/Makefile | 25 + lib/librte_if_proxy/meson.build | 7 + lib/librte_if_proxy/rte_if_proxy.c | 803 ++++++++++++++++++ lib/meson.build | 2 +- 8 files changed, 855 insertions(+), 5 deletions(-) create mode 100644 lib/librte_if_proxy/Makefile create mode 100644 lib/librte_if_proxy/meson.build create mode 100644 lib/librte_if_proxy/rte_if_proxy.c diff --git a/config/common_base b/config/common_base index 7dec7ed45..f20296750 100644 --- a/config/common_base +++ b/config/common_base @@ -1056,6 +1056,11 @@ CONFIG_RTE_LIBRTE_BPF_ELF=n # CONFIG_RTE_LIBRTE_IPSEC=y +# +# Compile librte_if_proxy +# +CONFIG_RTE_LIBRTE_IF_PROXY=y + # # Compile the test application # diff --git a/lib/Makefile b/lib/Makefile index 46b91ae1a..0a60f3656 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -118,6 +118,8 @@ DIRS-$(CONFIG_RTE_LIBRTE_TELEMETRY) += librte_telemetry DEPDIRS-librte_telemetry := librte_eal librte_metrics librte_ethdev DIRS-$(CONFIG_RTE_LIBRTE_RCU) += librte_rcu DEPDIRS-librte_rcu := librte_eal +DIRS-$(CONFIG_RTE_LIBRTE_IF_PROXY) += librte_if_proxy +DEPDIRS-librte_if_proxy := librte_eal ifeq ($(CONFIG_RTE_EXEC_ENV_LINUX),y) DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni diff --git a/lib/librte_eal/common/include/rte_eal_interrupts.h b/lib/librte_eal/common/include/rte_eal_interrupts.h index b370c0d26..f3d39a5ce 100644 --- a/lib/librte_eal/common/include/rte_eal_interrupts.h +++ b/lib/librte_eal/common/include/rte_eal_interrupts.h @@ -35,7 +35,9 @@ enum rte_intr_handle_type { RTE_INTR_HANDLE_EXT, /**< external handler */ RTE_INTR_HANDLE_VDEV, /**< virtual device */ RTE_INTR_HANDLE_DEV_EVENT, /**< device event handle */ + RTE_INTR_HANDLE_NETLINK, /**< netlink notification handle */ RTE_INTR_HANDLE_VFIO_REQ, /**< VFIO request handle */ + RTE_INTR_HANDLE_MAX /**< count of elements */ }; diff --git 
a/lib/librte_eal/linux/eal/eal_interrupts.c b/lib/librte_eal/linux/eal/eal_interrupts.c index 14ebb108c..ccdd94002 100644 --- a/lib/librte_eal/linux/eal/eal_interrupts.c +++ b/lib/librte_eal/linux/eal/eal_interrupts.c @@ -680,6 +680,9 @@ rte_intr_enable(const struct rte_intr_handle *intr_handle) break; /* not used at this moment */ case RTE_INTR_HANDLE_ALARM: +#if RTE_LIBRTE_IF_PROXY + case RTE_INTR_HANDLE_NETLINK: +#endif return -1; #ifdef VFIO_PRESENT case RTE_INTR_HANDLE_VFIO_MSIX: @@ -796,6 +799,9 @@ rte_intr_disable(const struct rte_intr_handle *intr_handle) break; /* not used at this moment */ case RTE_INTR_HANDLE_ALARM: +#if RTE_LIBRTE_IF_PROXY + case RTE_INTR_HANDLE_NETLINK: +#endif return -1; #ifdef VFIO_PRESENT case RTE_INTR_HANDLE_VFIO_MSIX: @@ -889,12 +895,12 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds) break; #endif #endif - case RTE_INTR_HANDLE_VDEV: case RTE_INTR_HANDLE_EXT: - bytes_read = 0; - call = true; - break; + case RTE_INTR_HANDLE_VDEV: case RTE_INTR_HANDLE_DEV_EVENT: +#if RTE_LIBRTE_IF_PROXY + case RTE_INTR_HANDLE_NETLINK: +#endif bytes_read = 0; call = true; break; diff --git a/lib/librte_if_proxy/Makefile b/lib/librte_if_proxy/Makefile new file mode 100644 index 000000000..9dd5f4791 --- /dev/null +++ b/lib/librte_if_proxy/Makefile @@ -0,0 +1,25 @@ +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(C) 2019 Marvell International Ltd. 
+ + +include $(RTE_SDK)/mk/rte.vars.mk + +# library name +LIB = librte_if_proxy.a + +CFLAGS += -DALLOW_EXPERIMENTAL_API +CFLAGS += -O3 +CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) +LDLIBS += -lrte_eal + +EXPORT_MAP := rte_if_proxy_version.map + +LIBABIVER := 1 + +# all source are stored in SRCS-y +SRCS-$(CONFIG_RTE_LIBRTE_IF_PROXY) := rte_if_proxy.c + +# install this header file +SYMLINK-$(CONFIG_RTE_LIBRTE_IF_PROXY)-include := rte_if_proxy.h + +include $(RTE_SDK)/mk/rte.lib.mk diff --git a/lib/librte_if_proxy/meson.build b/lib/librte_if_proxy/meson.build new file mode 100644 index 000000000..f9ed410b6 --- /dev/null +++ b/lib/librte_if_proxy/meson.build @@ -0,0 +1,7 @@ +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(C) 2019 Marvell International Ltd. + +version = 1 +allow_experimental_apis = true +sources = files('rte_if_proxy.c') +headers = files('rte_if_proxy.h') diff --git a/lib/librte_if_proxy/rte_if_proxy.c b/lib/librte_if_proxy/rte_if_proxy.c new file mode 100644 index 000000000..770462702 --- /dev/null +++ b/lib/librte_if_proxy/rte_if_proxy.c @@ -0,0 +1,803 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(C) 2020 Marvell International Ltd. + */ + +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include + +static +int ifpx_log_type; +#define IFPX_LOG(level, fmt, args...) \ + rte_log(RTE_LOG_ ## level, ifpx_log_type, "%s(): " fmt "\n", \ + __func__, ##args) + +static +struct rte_intr_handle ifpx_irq = { + .type = RTE_INTR_HANDLE_NETLINK, + .fd = -1, +}; + +static +unsigned int ifpx_pid; + +/* Port to proxy mapping table */ +static uint16_t ifpx_p2p[RTE_MAX_ETHPORTS]; + +/* Since this library is really slow/config path we guard data structures with + * a lock - and only one for all of them should be enough. But only callback + * and proxies lists are protected, I don't expect the need to protect port to + * proxy map table above. 
+ */ +static +rte_spinlock_t ifpx_lock = RTE_SPINLOCK_INITIALIZER; + +/* List of configured proxies */ +struct ifpx_proxies_node { + TAILQ_ENTRY(ifpx_proxies_node) elem; + uint16_t proxy_id; + struct rte_ifpx_info info; +}; +static +TAILQ_HEAD(ifpx_proxies_head, ifpx_proxies_node) ifpx_proxies = + TAILQ_HEAD_INITIALIZER(ifpx_proxies); + +/* List of registered callbacks */ +struct ifpx_cbs_node { + TAILQ_ENTRY(ifpx_cbs_node) elem; + struct rte_ifpx_callbacks cbs; +}; +static +TAILQ_HEAD(ifpx_cbs_head, ifpx_cbs_node) ifpx_callbacks = + TAILQ_HEAD_INITIALIZER(ifpx_callbacks); + +static +int request_info(int type, int index); + +uint64_t rte_ifpx_callbacks_available(void) +{ + return RTE_IFPX_MAC_CHANGE | RTE_IFPX_MTU_CHANGE | + RTE_IFPX_LINK_CHANGE | RTE_IFPX_ADDR_ADD | + RTE_IFPX_ADDR_DEL | RTE_IFPX_ADDR6_ADD | + RTE_IFPX_ADDR6_DEL | RTE_IFPX_ROUTE_ADD | + RTE_IFPX_ROUTE_DEL | RTE_IFPX_ROUTE6_ADD | + RTE_IFPX_ROUTE6_DEL; +} + +uint16_t rte_ifpx_create(enum rte_ifpx_type type) +{ + char devargs[16] = { '\0' }; + int dev_cnt = 0, nlen; + uint16_t port_id; + + switch (type) { + case RTE_IFPX_DEFAULT: + case RTE_IFPX_TAP: + nlen = strlcpy(devargs, "net_tap", sizeof(devargs)); + break; + case RTE_IFPX_KNI: + nlen = strlcpy(devargs, "net_kni", sizeof(devargs)); + break; + default: + IFPX_LOG(ERR, "Unknown proxy type: %d", type); + return RTE_MAX_ETHPORTS; + } + + RTE_ETH_FOREACH_DEV(port_id) { + if (strcmp(rte_eth_devices[port_id].device->driver->name, + devargs) == 0) + ++dev_cnt; + } + snprintf(devargs+nlen, sizeof(devargs)-nlen, "%d", dev_cnt); + + return rte_ifpx_create_by_devarg(devargs); +} + +uint16_t rte_ifpx_create_by_devarg(const char *devarg) +{ + uint16_t port_id = RTE_MAX_ETHPORTS; + struct rte_dev_iterator iter; + + if (rte_dev_probe(devarg) < 0) { + IFPX_LOG(ERR, "Failed to create proxy port %s\n", devarg); + return RTE_MAX_ETHPORTS; + } + + RTE_ETH_FOREACH_MATCHING_DEV(port_id, devarg, &iter) { + break; + } + if (port_id != RTE_MAX_ETHPORTS) + 
rte_eth_iterator_cleanup(&iter); + + return port_id; +} + +int rte_ifpx_destroy(uint16_t proxy_id) +{ + struct ifpx_proxies_node *px; + unsigned int i; + int ec = 0; + + rte_spinlock_lock(&ifpx_lock); + TAILQ_FOREACH(px, &ifpx_proxies, elem) { + if (px->proxy_id == proxy_id) + break; + } + if (!px) { + ec = -EINVAL; + goto exit; + } + TAILQ_REMOVE(&ifpx_proxies, px, elem); + free(px); + + /* Clear any bindings for this proxy. */ + for (i = 0; i < RTE_DIM(ifpx_p2p); ++i) { + if (ifpx_p2p[i] == proxy_id) + ifpx_p2p[i] = RTE_MAX_ETHPORTS; + } + + ec = rte_dev_remove(rte_eth_devices[proxy_id].device); +exit: + rte_spinlock_unlock(&ifpx_lock); + return ec; +} + +int rte_ifpx_port_bind(uint16_t port_id, uint16_t proxy_id) +{ + struct rte_eth_dev_info proxy_eth_info; + struct ifpx_proxies_node *px; + int ec; + + if (port_id >= RTE_MAX_ETHPORTS || proxy_id >= RTE_MAX_ETHPORTS) { + IFPX_LOG(ERR, "Invalid port_id: %d", port_id); + return -EINVAL; + } + + /* Do automatic rebinding but issue a warning since this is not + * considered to be a valid behaviour. + */ + if (ifpx_p2p[port_id] != RTE_MAX_ETHPORTS) { + IFPX_LOG(WARNING, "Port already bound: %d -> %d", port_id, + ifpx_p2p[port_id]); + } + + ec = rte_eth_dev_info_get(proxy_id, &proxy_eth_info); + if (ec < 0) { + IFPX_LOG(ERR, "Failed to read proxy dev info: %d", ec); + return ec; + } + if (proxy_eth_info.if_index == 0) { + IFPX_LOG(ERR, "Proxy with no IF index"); + return -EINVAL; + } + + /* Search for existing proxy - if not found add one to the list.
*/ + rte_spinlock_lock(&ifpx_lock); + TAILQ_FOREACH(px, &ifpx_proxies, elem) { + if (px->proxy_id == proxy_id) + break; + } + if (!px) { + px = malloc(sizeof(*px)); + if (!px) { + rte_spinlock_unlock(&ifpx_lock); + return -ENOMEM; + } + px->proxy_id = proxy_id; + px->info.if_index = proxy_eth_info.if_index; + rte_eth_dev_get_mtu(proxy_id, &px->info.mtu); + rte_eth_macaddr_get(proxy_id, &px->info.mac); + memset(px->info.if_name, 0, sizeof(px->info.if_name)); + TAILQ_INSERT_TAIL(&ifpx_proxies, px, elem); + } + rte_spinlock_unlock(&ifpx_lock); + ifpx_p2p[port_id] = proxy_id; + + if (ifpx_irq.fd != -1) + request_info(RTM_GETLINK, px->info.if_index); + + return 0; +} + +int rte_ifpx_port_unbind(uint16_t port_id) +{ + if (port_id >= RTE_MAX_ETHPORTS || + ifpx_p2p[port_id] == RTE_MAX_ETHPORTS) + return -EINVAL; + + ifpx_p2p[port_id] = RTE_MAX_ETHPORTS; + /* Proxy without any port bound is OK - that is the state of the proxy + * that has just been created, and it can still report routing + * information. So we do not even check if this is the case. 
+ */ + + return 0; +} + +rte_ifpx_cbs_hndl rte_ifpx_callbacks_register(const + struct rte_ifpx_callbacks *cbs) +{ + rte_ifpx_cbs_hndl cb_hndl = NULL; + struct ifpx_cbs_node *node; + + if (!cbs) + return NULL; + + rte_spinlock_lock(&ifpx_lock); + TAILQ_FOREACH(node, &ifpx_callbacks, elem) { + if (&node->cbs == cbs) { + cb_hndl = cbs; + goto exit; + } + } + + node = malloc(sizeof(*node)); + if (!node) + goto exit; + + node->cbs = *cbs; + TAILQ_INSERT_TAIL(&ifpx_callbacks, node, elem); + cb_hndl = &node->cbs; +exit: + rte_spinlock_unlock(&ifpx_lock); + + return cb_hndl; +} + +int rte_ifpx_callbacks_unregister(rte_ifpx_cbs_hndl cbs) +{ + struct ifpx_cbs_node *node; + int ec = -EINVAL; + + if (!cbs) + return ec; + + rte_spinlock_lock(&ifpx_lock); + TAILQ_FOREACH(node, &ifpx_callbacks, elem) { + if (&node->cbs == cbs) { + TAILQ_REMOVE(&ifpx_callbacks, node, elem); + free(node); + ec = 0; + break; + } + } + rte_spinlock_unlock(&ifpx_lock); + + return ec; +} + +uint16_t rte_ifpx_proxy_get(uint16_t port_id) +{ + if (port_id >= RTE_MAX_ETHPORTS) + return RTE_MAX_ETHPORTS; + + return ifpx_p2p[port_id]; +} + +unsigned int rte_ifpx_port_get(uint16_t proxy_id, + uint16_t *ports, unsigned int num) +{ + unsigned int p, cnt = 0; + + for (p = 0; p < RTE_DIM(ifpx_p2p); ++p) { + if (ifpx_p2p[p] == proxy_id) { + ++cnt; + if (ports && num > 0) { + *ports++ = p; + --num; + } + } + } + return cnt; +} + +const struct rte_ifpx_info *rte_ifpx_info_get(uint16_t port_id) +{ + struct ifpx_proxies_node *px; + + if (port_id >= RTE_MAX_ETHPORTS || + ifpx_p2p[port_id] == RTE_MAX_ETHPORTS) + return NULL; + + rte_spinlock_lock(&ifpx_lock); + TAILQ_FOREACH(px, &ifpx_proxies, elem) { + if (px->proxy_id == ifpx_p2p[port_id]) + break; + } + rte_spinlock_unlock(&ifpx_lock); + RTE_ASSERT(px && "Internal IF Proxy library error"); + + return &px->info; +} + +static +void handle_link(const struct nlmsghdr *h) +{ + const struct ifinfomsg *ifi = NLMSG_DATA(h); + int alen = h->nlmsg_len -
NLMSG_LENGTH(sizeof(*ifi)); + const struct rtattr *attrs[IFLA_MAX+1] = { NULL }; + const struct rtattr *attr; + struct ifpx_proxies_node *px; + struct ifpx_cbs_node *cb; + uint16_t p; + + IFPX_LOG(DEBUG, "\tLink action (%u): %u, 0x%x/0x%x (flags/changed)", + ifi->ifi_index, h->nlmsg_type, ifi->ifi_flags, + ifi->ifi_change); + + rte_spinlock_lock(&ifpx_lock); + TAILQ_FOREACH(px, &ifpx_proxies, elem) { + if (px->info.if_index == (unsigned int)ifi->ifi_index) + break; + } + rte_spinlock_unlock(&ifpx_lock); + + /* Drop messages that are not associated with any proxy */ + if (!px) + return; + /* When message is a reply to request for specific interface then keep + * it only when it contains info for this interface. + */ + if (h->nlmsg_pid == ifpx_pid && h->nlmsg_seq >> 8 && + (h->nlmsg_seq >> 8) != (unsigned int)ifi->ifi_index) + return; + + for (attr = IFLA_RTA(ifi); RTA_OK(attr, alen); + attr = RTA_NEXT(attr, alen)) { + if (attr->rta_type > IFLA_MAX) + continue; + attrs[attr->rta_type] = attr; + } + + rte_spinlock_lock(&ifpx_lock); + if (ifi->ifi_change & IFF_UP) { + TAILQ_FOREACH(cb, &ifpx_callbacks, elem) { + if (!cb->cbs.link_change) + continue; + for (p = 0; p < RTE_DIM(ifpx_p2p); ++p) { + if (ifpx_p2p[p] != px->proxy_id) + continue; + cb->cbs.link_change(p, + ifi->ifi_flags & IFF_UP); + } + } + } + if (attrs[IFLA_MTU]) { + uint16_t mtu = *(const int *)RTA_DATA(attrs[IFLA_MTU]); + if (mtu != px->info.mtu) { + px->info.mtu = mtu; + TAILQ_FOREACH(cb, &ifpx_callbacks, elem) { + if (!cb->cbs.mtu_change) + continue; + for (p = 0; p < RTE_DIM(ifpx_p2p); ++p) { + if (ifpx_p2p[p] != px->proxy_id) + continue; + cb->cbs.mtu_change(p, mtu); + } + } + } + } + if (attrs[IFLA_ADDRESS]) { + const struct rte_ether_addr *mac = + RTA_DATA(attrs[IFLA_ADDRESS]); + + RTE_ASSERT(RTA_PAYLOAD(attrs[IFLA_ADDRESS]) == + RTE_ETHER_ADDR_LEN); + if (memcmp(mac, &px->info.mac, RTE_ETHER_ADDR_LEN) != 0) { + memcpy(px->info.mac.addr_bytes, mac, RTE_ETHER_ADDR_LEN); + TAILQ_FOREACH(cb, 
&ifpx_callbacks, elem) { + if (!cb->cbs.mac_change) + continue; + for (p = 0; p < RTE_DIM(ifpx_p2p); ++p) { + if (ifpx_p2p[p] != px->proxy_id) + continue; + cb->cbs.mac_change(p, mac); + } + } + } + } + rte_spinlock_unlock(&ifpx_lock); + if (h->nlmsg_pid == ifpx_pid) { + RTE_ASSERT((h->nlmsg_seq & 0xFF) == RTM_GETLINK); + /* If this is reply for specific link request (not initial + * global dump) then follow up with address request, otherwise + * just store the interface name. + */ + if (h->nlmsg_seq >> 8) + request_info(RTM_GETADDR, ifi->ifi_index); + else if (!px->info.if_name[0] && attrs[IFLA_IFNAME]) + strlcpy(px->info.if_name, RTA_DATA(attrs[IFLA_IFNAME]), + sizeof(px->info.if_name)); + } +} + +static +void handle_addr(const struct nlmsghdr *h, bool needs_del) +{ + const struct ifaddrmsg *ifa = NLMSG_DATA(h); + int alen = h->nlmsg_len - NLMSG_LENGTH(sizeof(*ifa)); + const struct rtattr *attrs[IFA_MAX+1] = { NULL }; + const struct rtattr *attr; + struct ifpx_proxies_node *px; + struct ifpx_cbs_node *cb; + const uint8_t *ip; + uint16_t p; + + rte_spinlock_lock(&ifpx_lock); + TAILQ_FOREACH(px, &ifpx_proxies, elem) { + if (px->info.if_index == ifa->ifa_index) + break; + } + rte_spinlock_unlock(&ifpx_lock); + + /* Drop messages that are not associated with any proxy */ + if (!px) + return; + /* When message is a reply to request for specific interface then keep + * it only when it contains info for this interface. 
+ */ + if (h->nlmsg_pid == ifpx_pid && h->nlmsg_seq >> 8 && + (h->nlmsg_seq >> 8) != ifa->ifa_index) + return; + + for (attr = IFA_RTA(ifa); RTA_OK(attr, alen); + attr = RTA_NEXT(attr, alen)) { + if (attr->rta_type > IFA_MAX) + continue; + attrs[attr->rta_type] = attr; + } + + rte_spinlock_lock(&ifpx_lock); + if (attrs[IFA_ADDRESS]) { + TAILQ_FOREACH(cb, &ifpx_callbacks, elem) { + struct rte_ifpx_callbacks *cbs = &cb->cbs; + + ip = RTA_DATA(attrs[IFA_ADDRESS]); + if (ifa->ifa_family == AF_INET) { + /* address is in network order */ + uint32_t ipv4 = + RTE_IPV4(ip[0], ip[1], ip[2], ip[3]); + + for (p = 0; p < RTE_DIM(ifpx_p2p); ++p) { + if (ifpx_p2p[p] != px->proxy_id) + continue; + if (needs_del && cbs->addr_del) + cb->cbs.addr_del(p, ipv4); + else if (!needs_del && cbs->addr_add) + cb->cbs.addr_add(p, ipv4); + } + } else if (ifa->ifa_family == AF_INET6) { + for (p = 0; p < RTE_DIM(ifpx_p2p); ++p) { + if (ifpx_p2p[p] != px->proxy_id) + continue; + if (needs_del && cbs->addr6_del) + cb->cbs.addr6_del(p, ip); + else if (!needs_del && cbs->addr6_add) + cb->cbs.addr6_add(p, ip); + } + } + } + } + rte_spinlock_unlock(&ifpx_lock); +} + +static +void handle_route(const struct nlmsghdr *h, bool needs_del) +{ + const struct rtmsg *r = NLMSG_DATA(h); + int alen = h->nlmsg_len - NLMSG_LENGTH(sizeof(*r)); + const struct rtattr *attrs[RTA_MAX+1] = { NULL }; + const struct rtattr *attr; + struct ifpx_cbs_node *node; + const uint8_t *ip; + + for (attr = RTM_RTA(r); RTA_OK(attr, alen); + attr = RTA_NEXT(attr, alen)) { + if (attr->rta_type > RTA_MAX) + continue; + attrs[attr->rta_type] = attr; + } + + rte_spinlock_lock(&ifpx_lock); + if (attrs[RTA_DST]) { + TAILQ_FOREACH(node, &ifpx_callbacks, elem) { + struct rte_ifpx_callbacks *cbs = &node->cbs; + + ip = RTA_DATA(attrs[RTA_DST]); + if (r->rtm_family == AF_INET) { + /* address is in network order */ + uint32_t ipv4 = + RTE_IPV4(ip[0], ip[1], ip[2], ip[3]); + + if (needs_del && cbs->route_del) + cbs->route_del(ipv4, 
r->rtm_dst_len); + else if (!needs_del && cbs->route_add) + cbs->route_add(ipv4, r->rtm_dst_len); + } else if (r->rtm_family == AF_INET6) { + if (needs_del && cbs->route6_del) + cbs->route6_del(ip, r->rtm_dst_len); + else if (!needs_del && cbs->route6_add) + cbs->route6_add(ip, r->rtm_dst_len); + } + } + } + rte_spinlock_unlock(&ifpx_lock); +} + +static +int request_info(int type, int index) +{ + static rte_spinlock_t send_lock = RTE_SPINLOCK_INITIALIZER; + struct info_get { + struct nlmsghdr h; + union { + struct ifinfomsg ifm; + struct ifaddrmsg ifa; + struct rtmsg rtm; + } __rte_aligned(NLMSG_ALIGNTO); + } info_req; + int ret; + + IFPX_LOG(DEBUG, "\tRequesting msg %d for: %u", type, index); + + memset(&info_req, 0, sizeof(info_req)); + /* First byte of these messages is family, so just make sure that this + * memset is enough to get all families. + */ + RTE_ASSERT(AF_UNSPEC == 0); + + info_req.h.nlmsg_pid = ifpx_pid; + info_req.h.nlmsg_type = type; + info_req.h.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP; + info_req.h.nlmsg_len = offsetof(struct info_get, ifm); + + switch (type) { + case RTM_GETLINK: + info_req.h.nlmsg_len += sizeof(info_req.ifm); + info_req.ifm.ifi_index = index; + break; + case RTM_GETADDR: + info_req.h.nlmsg_len += sizeof(info_req.ifa); + info_req.ifa.ifa_index = index; + break; + case RTM_GETROUTE: + info_req.h.nlmsg_len += sizeof(info_req.rtm); + break; + default: + return -EINVAL; + } + /* Store request type (and if it is global or link specific) in 'seq'. + * Later it is used during handling of reply to continue requesting of + * information dump from system - if needed. 
+ */ + info_req.h.nlmsg_seq = index << 8 | type; + + rte_spinlock_lock(&send_lock); + ret = send(ifpx_irq.fd, &info_req, info_req.h.nlmsg_len, 0); + if (ret < 0) { + IFPX_LOG(ERR, "Failed to send netlink msg: %d", errno); + rte_errno = errno; + } + rte_spinlock_unlock(&send_lock); + + return ret; +} + +static +void notify_cfg_finished(void) +{ + struct ifpx_cbs_node *node; + + rte_spinlock_lock(&ifpx_lock); + TAILQ_FOREACH(node, &ifpx_callbacks, elem) { + if (!node->cbs.cfg_finished) + continue; + node->cbs.cfg_finished(); + } + rte_spinlock_unlock(&ifpx_lock); +} + +static +void if_proxy_intr_callback(void *arg __rte_unused) +{ + struct nlmsghdr *h; + struct sockaddr_nl addr; + socklen_t addr_len = sizeof(addr); + char buf[8192]; + ssize_t len; + +restart: + len = recvfrom(ifpx_irq.fd, buf, sizeof(buf), 0, + (struct sockaddr *)&addr, &addr_len); + if (len < 0) { + if (errno == EINTR) { + IFPX_LOG(DEBUG, "recvfrom() interrupted"); + goto restart; + } + IFPX_LOG(ERR, "Failed to read netlink msg: %zd (errno %d)", + len, errno); + return; + } + if (addr_len != sizeof(addr)) { + IFPX_LOG(ERR, "Invalid netlink addr size: %d", addr_len); + return; + } + IFPX_LOG(DEBUG, "Read %zd bytes (buf %zu) from %u/%u", len, + sizeof(buf), addr.nl_pid, addr.nl_groups); + + for (h = (struct nlmsghdr *)buf; NLMSG_OK(h, len); + h = NLMSG_NEXT(h, len)) { + IFPX_LOG(DEBUG, "Recv msg: %u (%u/%u/%u seq/flags/pid)", + h->nlmsg_type, h->nlmsg_seq, h->nlmsg_flags, + h->nlmsg_pid); + + switch (h->nlmsg_type) { + case RTM_NEWLINK: + case RTM_DELLINK: + handle_link(h); + break; + case RTM_NEWADDR: + case RTM_DELADDR: + handle_addr(h, h->nlmsg_type == RTM_DELADDR); + break; + case RTM_NEWROUTE: + case RTM_DELROUTE: + handle_route(h, h->nlmsg_type == RTM_DELROUTE); + break; + } + + /* If this is a reply for global request then follow up with + * additional requests and notify about finish. 
+ */ + if (h->nlmsg_pid == ifpx_pid && (h->nlmsg_seq >> 8) == 0 && + h->nlmsg_type == NLMSG_DONE) { + if ((h->nlmsg_seq & 0xFF) == RTM_GETLINK) + request_info(RTM_GETADDR, 0); + else if ((h->nlmsg_seq & 0xFF) == RTM_GETADDR) + request_info(RTM_GETROUTE, 0); + else { + RTE_ASSERT((h->nlmsg_seq & 0xFF) == + RTM_GETROUTE); + notify_cfg_finished(); + } + } + } + IFPX_LOG(DEBUG, "Finished msg loop: %zd bytes left", len); +} + +int rte_ifpx_listen(void) +{ + struct sockaddr_nl addr = { + .nl_family = AF_NETLINK, + .nl_pid = 0, + }; + socklen_t addr_len = sizeof(addr); + int ret; + + if (ifpx_irq.fd != -1) { + rte_errno = EBUSY; + return -1; + } + + addr.nl_groups = 1 << (RTNLGRP_LINK-1) + | 1 << (RTNLGRP_IPV4_IFADDR-1) + | 1 << (RTNLGRP_IPV6_IFADDR-1) + | 1 << (RTNLGRP_IPV4_ROUTE-1) + | 1 << (RTNLGRP_IPV6_ROUTE-1); + + ifpx_irq.fd = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, + NETLINK_ROUTE); + if (ifpx_irq.fd == -1) { + IFPX_LOG(ERR, "Failed to create netlink socket: %d", errno); + goto error; + } + /* Starting with kernel 4.19 you can request dump for a specific + * interface and kernel will filter out and send only relevant info. + * Otherwise NLM_F_DUMP will generate info for all interfaces and you + * need to filter them yourself. 
+ */ +#ifdef NETLINK_DUMP_STRICT_CHK + ret = 1; /* use this var also as an input param */ + ret = setsockopt(ifpx_irq.fd, SOL_SOCKET, NETLINK_DUMP_STRICT_CHK, + &ret, sizeof(ret)); + if (ret < 0) { + IFPX_LOG(ERR, "Failed to set socket option: %d", errno); + goto error; + } +#endif + + ret = bind(ifpx_irq.fd, (struct sockaddr *)&addr, addr_len); + if (ret < 0) { + IFPX_LOG(ERR, "Failed to bind socket: %d", errno); + goto error; + } + ret = getsockname(ifpx_irq.fd, (struct sockaddr *)&addr, &addr_len); + if (ret < 0) { + IFPX_LOG(ERR, "Failed to get socket addr: %d", errno); + goto error; + } else { + ifpx_pid = addr.nl_pid; + IFPX_LOG(DEBUG, "Assigned port ID: %u", addr.nl_pid); + } + + ret = rte_intr_callback_register(&ifpx_irq, if_proxy_intr_callback, + NULL); + if (ret < 0) + goto error; + + request_info(RTM_GETLINK, 0); + + return 0; + +error: + rte_errno = errno; + if (ifpx_irq.fd != -1) { + close(ifpx_irq.fd); + ifpx_irq.fd = -1; + } + return -1; +} + +int rte_ifpx_close(void) +{ + int ec; + unsigned int p; + struct ifpx_cbs_node *cbs; + struct ifpx_proxies_node *px; + + if (ifpx_irq.fd < 0) + return -EBADFD; + +restart: + ec = rte_intr_callback_unregister(&ifpx_irq, + if_proxy_intr_callback, NULL); + if (ec == -EAGAIN) /* unlikely but possible - at least I think so */ + goto restart; + + rte_spinlock_lock(&ifpx_lock); + + close(ifpx_irq.fd); + ifpx_irq.fd = -1; + ifpx_pid = 0; + + /* Clear callbacks. */ + while (!TAILQ_EMPTY(&ifpx_callbacks)) { + cbs = TAILQ_FIRST(&ifpx_callbacks); + TAILQ_REMOVE(&ifpx_callbacks, cbs, elem); + free(cbs); + } + + /* Clear proxies. 
*/ + while (!TAILQ_EMPTY(&ifpx_proxies)) { + px = TAILQ_FIRST(&ifpx_proxies); + TAILQ_REMOVE(&ifpx_proxies, px, elem); + free(px); + } + + for (p = 0; p < RTE_DIM(ifpx_p2p); ++p) + ifpx_p2p[p] = RTE_MAX_ETHPORTS; + + rte_spinlock_unlock(&ifpx_lock); + + return 0; +} + +RTE_INIT(if_proxy_init) +{ + unsigned int i; + for (i = 0; i < RTE_DIM(ifpx_p2p); ++i) + ifpx_p2p[i] = RTE_MAX_ETHPORTS; + + ifpx_log_type = rte_log_register("lib.if_proxy"); + if (ifpx_log_type >= 0) + rte_log_set_level(ifpx_log_type, RTE_LOG_WARNING); +} diff --git a/lib/meson.build b/lib/meson.build index 0af3efab2..c913b33dd 100644 --- a/lib/meson.build +++ b/lib/meson.build @@ -19,7 +19,7 @@ libraries = [ 'acl', 'bbdev', 'bitratestats', 'cfgfile', 'compressdev', 'cryptodev', 'distributor', 'efd', 'eventdev', - 'gro', 'gso', 'ip_frag', 'jobstats', + 'gro', 'gso', 'if_proxy', 'ip_frag', 'jobstats', 'kni', 'latencystats', 'lpm', 'member', 'power', 'pdump', 'rawdev', 'rcu', 'rib', 'reorder', 'sched', 'security', 'stack', 'vhost', From patchwork Tue Jan 14 14:25:17 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Andrzej Ostruszka [C]" X-Patchwork-Id: 64670 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id C5DC7A04FF; Tue, 14 Jan 2020 15:25:57 +0100 (CET) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 6E49F1C1C1; Tue, 14 Jan 2020 15:25:33 +0100 (CET) Received: from mx0b-0016f401.pphosted.com (mx0a-0016f401.pphosted.com [67.231.148.174]) by dpdk.org (Postfix) with ESMTP id 561EA1C1C1 for ; Tue, 14 Jan 2020 15:25:31 +0100 (CET) Received: from pps.filterd (m0045849.ppops.net [127.0.0.1]) by mx0a-0016f401.pphosted.com (8.16.0.42/8.16.0.42) with SMTP id 00EEKULf013366; Tue, 14 Jan 2020 06:25:30 -0800 DKIM-Signature: 
v=1; a=rsa-sha256; c=relaxed/relaxed; d=marvell.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=pfpt0818; bh=31ESREwEYa4sdPp199Ejs1RynvMaRAj7vKifLQW5QkE=; b=GqHJWwuwoq1L6RxAjKVjRoM9yJMLNMopyDSH+N8OGRJzpr6Lt/x02OXclmljmHd34wH/ jUzBMIycrtAYyv0UMq7jx8RQC06Uly6gRMGBJJ1TBu36sYtnMpxw/20jaV2l8MVy2Jyp Lcxcno34mobxWlLhDR3IQFSzj5VV4JSh5wMwMZzlEgsHEhmgYxxx8NK/sHliMCMm4iFp IzxYfKv3UfSJ7Fk1oIhvn+MF4z/Lz/i5ybfG57IgB9B+8wIai6wBlUad8F5Igqcu8NRi t+jXfhpb6VaIN8ojDBzXHWL77RD7rOhz9sQBJiz7LfUiyriQZ7r6zpGDpk4i/gHmItkV 1Q== Received: from sc-exch01.marvell.com ([199.233.58.181]) by mx0a-0016f401.pphosted.com with ESMTP id 2xhc6sgngg-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-SHA384 bits=256 verify=NOT); Tue, 14 Jan 2020 06:25:30 -0800 Received: from SC-EXCH03.marvell.com (10.93.176.83) by SC-EXCH01.marvell.com (10.93.176.81) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Tue, 14 Jan 2020 06:25:28 -0800 Received: from maili.marvell.com (10.93.176.43) by SC-EXCH03.marvell.com (10.93.176.83) with Microsoft SMTP Server id 15.0.1497.2 via Frontend Transport; Tue, 14 Jan 2020 06:25:28 -0800 Received: from amok.marvell.com (unknown [10.95.130.253]) by maili.marvell.com (Postfix) with ESMTP id 6BD153F7040; Tue, 14 Jan 2020 06:25:26 -0800 (PST) From: Andrzej Ostruszka To: , John McNamara , Marko Kovacevic CC: Jerin Jacob Kollanukkaran , Nithin Kumar Dabilpuram , Pavan Nikhilesh Bhagavatula , Kiran Kumar Kokkilagadda , Krzysztof Kanas Date: Tue, 14 Jan 2020 15:25:17 +0100 Message-ID: <20200114142517.29522-4-aostruszka@marvell.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20200114142517.29522-1-aostruszka@marvell.com> References: <20200114142517.29522-1-aostruszka@marvell.com> MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.138, 18.0.572 definitions=2020-01-14_04:2020-01-13, 2020-01-14 signatures=0 Subject: [dpdk-dev] [RFC PATCH 3/3] if_proxy: add example, test and 
documentation X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev"

This commit adds a test, documentation and a small example. The example
just creates one proxy port and binds all available ports to it. You can
then play around with changing the network configuration of this proxy
port and you should observe notifications from the appropriate callbacks.
Below is sample output (with some parts elided and some comments
added) - 'dtap0' is the name of the proxy interface.

  sudo ./if_proxy -w 00:03.0 -w 00:04.0
  ...
  Press ^C to quit
  route add -> 10.0.0.0/16
  route add -> 192.168.123.0/24
  ...
  route6 add -> ::1/128
  route6 add -> fe80::/64
  route6 add -> fe80::ee05:deaf:6827:b435/128
  ...
  [[ output on: ip link set dtap0 mtu 1600 ]]
  mtu change for port 0 -> 1600
  mtu change for port 1 -> 1600
  [[ output on: ip link set dtap0 up ]]
  port 0 going up
  port 1 going up
  route6 add -> ff00::/8
  route6 add -> fe80::/64
  address6 add for port 0 -> fe80::2436:17ff:fefd:94ed
  address6 add for port 1 -> fe80::2436:17ff:fefd:94ed
  route6 add -> fe80::2436:17ff:fefd:94ed/128

Signed-off-by: Andrzej Ostruszka
---
 app/test/Makefile                      |   5 +
 app/test/meson.build                   |   1 +
 app/test/test_if_proxy.c               | 431 +++++++++++++++++++++++++
 doc/guides/prog_guide/if_proxy_lib.rst | 103 ++++++
 doc/guides/prog_guide/index.rst        |   1 +
 examples/Makefile                      |   1 +
 examples/if_proxy/Makefile             |  58 ++++
 examples/if_proxy/main.c               | 203 ++++++++++++
 examples/if_proxy/meson.build          |  12 +
 examples/meson.build                   |   2 +-
 10 files changed, 816 insertions(+), 1 deletion(-)
 create mode 100644 app/test/test_if_proxy.c
 create mode 100644 doc/guides/prog_guide/if_proxy_lib.rst
 create mode 100644 examples/if_proxy/Makefile
 create mode 100644 examples/if_proxy/main.c
 create mode 100644 examples/if_proxy/meson.build

diff --git a/app/test/Makefile b/app/test/Makefile index 
57930c00b..f621978d7 100644 --- a/app/test/Makefile +++ b/app/test/Makefile @@ -230,6 +230,11 @@ SRCS-$(CONFIG_RTE_LIBRTE_BPF) += test_bpf.c SRCS-$(CONFIG_RTE_LIBRTE_RCU) += test_rcu_qsbr.c test_rcu_qsbr_perf.c +ifeq ($(CONFIG_RTE_LIBRTE_IF_PROXY),y) +SRCS-y += test_if_proxy.c +LDLIBS += -lrte_if_proxy +endif + SRCS-$(CONFIG_RTE_LIBRTE_IPSEC) += test_ipsec.c SRCS-$(CONFIG_RTE_LIBRTE_IPSEC) += test_ipsec_sad.c ifeq ($(CONFIG_RTE_LIBRTE_IPSEC),y) diff --git a/app/test/meson.build b/app/test/meson.build index fb49d804b..2a3b5fef2 100644 --- a/app/test/meson.build +++ b/app/test/meson.build @@ -61,6 +61,7 @@ test_sources = files('commands.c', 'test_hash_perf.c', 'test_hash_readwrite_lf.c', 'test_interrupts.c', + 'test_if_proxy.c', 'test_ipsec.c', 'test_ipsec_sad.c', 'test_kni.c', diff --git a/app/test/test_if_proxy.c b/app/test/test_if_proxy.c new file mode 100644 index 000000000..0ecfb79b4 --- /dev/null +++ b/app/test/test_if_proxy.c @@ -0,0 +1,431 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(C) 2020 Marvell International Ltd. 
+ */ + +#include "test.h" + +#include +#include + +#include +#include +#include +#include +#include +#include +#include + +static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER; +static pthread_cond_t cond = PTHREAD_COND_INITIALIZER; + +enum net_op { + INITIALIZED = 1U << 0, + LOOP_ROUTE = 1U << 1, + LOOP6_ROUTE = 1U << 2, + LINK_CHANGED = 1U << 3, + MAC_CHANGED = 1U << 4, + MTU_CHANGED = 1U << 5, + ADDR_ADD = 1U << 6, + ADDR_DEL = 1U << 7, + ROUTE_ADD = 1U << 8, + ROUTE_DEL = 1U << 9, + ADDR6_ADD = 1U << 10, + ADDR6_DEL = 1U << 11, + ROUTE6_ADD = 1U << 12, + ROUTE6_DEL = 1U << 13, +}; + +static unsigned int state; + +static struct { + struct rte_ether_addr mac_addr; + uint16_t port_id, mtu; + struct in_addr ipv4, route4; + struct in6_addr ipv6, route6; + uint16_t depth4, depth6; + int is_up; +} net_cfg; + +static +int unlock_notify(unsigned int op) +{ + /* the mutex is expected to be locked on entry */ + RTE_VERIFY(pthread_mutex_trylock(&mutex) == EBUSY); + state |= op; + + pthread_mutex_unlock(&mutex); + return pthread_cond_signal(&cond); +} + +static +int wait_for(unsigned int op_mask, unsigned int sec) +{ + struct timespec time; + int ec = pthread_mutex_trylock(&mutex); + + /* the mutex is expected to be locked on entry */ + RTE_VERIFY(ec == EBUSY); + + ec = 0; + clock_gettime(CLOCK_REALTIME, &time); + time.tv_sec += sec; + + while ((state & op_mask) != op_mask && ec == 0) + ec = pthread_cond_timedwait(&cond, &mutex, &time); + + return ec; +} + +static +int expect(unsigned int op_mask, const char *fmt, ...) +#if __GNUC__ + __attribute__((format(printf, 2, 3))); +#endif + +static +int expect(unsigned int op_mask, const char *fmt, ...) +{ + char cmd[128]; + va_list args; + int ret; + + state &= ~op_mask; + va_start(args, fmt); + vsnprintf(cmd, sizeof(cmd), fmt, args); + va_end(args); + ret = system(cmd); + if (ret == 0) + /* IPv6 address notifications seem to need that long delay. 
*/ + return wait_for(op_mask, 2); + return ret; +} + +static +void mac_change(uint16_t port_id, const struct rte_ether_addr *mac) +{ + pthread_mutex_lock(&mutex); + RTE_VERIFY(port_id == net_cfg.port_id); + if (memcmp(mac->addr_bytes, net_cfg.mac_addr.addr_bytes, + RTE_ETHER_ADDR_LEN) == 0) { + unlock_notify(MAC_CHANGED); + return; + } + pthread_mutex_unlock(&mutex); +} + +static +void mtu_change(uint16_t port_id, uint16_t mtu) +{ + pthread_mutex_lock(&mutex); + RTE_VERIFY(port_id == net_cfg.port_id); + if (net_cfg.mtu == mtu) { + unlock_notify(MTU_CHANGED); + return; + } + pthread_mutex_unlock(&mutex); +} + +static +void link_change(uint16_t port_id, int is_up) +{ + pthread_mutex_lock(&mutex); + RTE_VERIFY(port_id == net_cfg.port_id); + if (net_cfg.is_up == is_up) { + unlock_notify(LINK_CHANGED); + return; + } + pthread_mutex_unlock(&mutex); +} + +static +void addr_add(uint16_t port_id, uint32_t ip) +{ + pthread_mutex_lock(&mutex); + RTE_VERIFY(port_id == net_cfg.port_id); + if (net_cfg.ipv4.s_addr == ip) { + unlock_notify(ADDR_ADD); + return; + } + pthread_mutex_unlock(&mutex); +} + +static +void addr_del(uint16_t port_id, uint32_t ip) +{ + pthread_mutex_lock(&mutex); + RTE_VERIFY(port_id == net_cfg.port_id); + if (net_cfg.ipv4.s_addr == ip) { + unlock_notify(ADDR_DEL); + return; + } + pthread_mutex_unlock(&mutex); +} + +static +void addr6_add(uint16_t port_id, const uint8_t *ip) +{ + pthread_mutex_lock(&mutex); + RTE_VERIFY(port_id == net_cfg.port_id); + if (memcmp(ip, net_cfg.ipv6.s6_addr, 16) == 0) { + unlock_notify(ADDR6_ADD); + return; + } + pthread_mutex_unlock(&mutex); +} + +static +void addr6_del(uint16_t port_id, const uint8_t *ip) +{ + pthread_mutex_lock(&mutex); + RTE_VERIFY(port_id == net_cfg.port_id); + if (memcmp(ip, net_cfg.ipv6.s6_addr, 16) == 0) { + unlock_notify(ADDR6_DEL); + return; + } + pthread_mutex_unlock(&mutex); +} + +static +void route_add(uint32_t ip, uint8_t depth) +{ + pthread_mutex_lock(&mutex); + /* Since we are 
checking if during initialization we get some routing + * info we need to notify either when we are not initialized or when + * the exact route matches. + */ + if (!(state & INITIALIZED) || + (net_cfg.depth4 == depth && net_cfg.route4.s_addr == ip)) { + unlock_notify(ROUTE_ADD); + return; + } + pthread_mutex_unlock(&mutex); +} + +static +void route_del(uint32_t ip, uint8_t depth) +{ + pthread_mutex_lock(&mutex); + if (net_cfg.depth4 == depth && net_cfg.route4.s_addr == ip) { + unlock_notify(ROUTE_DEL); + return; + } + pthread_mutex_unlock(&mutex); +} + +static +void route6_add(const uint8_t *ip, uint8_t depth) +{ + pthread_mutex_lock(&mutex); + /* Since we are checking if during initialization we get some routing + * info we need to notify either when we are not initialized or when + * the exact route matches. + */ + if (!(state & INITIALIZED) || + (net_cfg.depth6 == depth && + /* don't check for trailing zeros */ + memcmp(ip, net_cfg.route6.s6_addr, depth/8) == 0)) { + unlock_notify(ROUTE6_ADD); + return; + } + pthread_mutex_unlock(&mutex); +} + +static +void route6_del(const uint8_t *ip, uint8_t depth) +{ + pthread_mutex_lock(&mutex); + if (net_cfg.depth6 == depth && + /* don't check for trailing zeros */ + memcmp(ip, net_cfg.route6.s6_addr, depth/8) == 0) { + unlock_notify(ROUTE6_DEL); + return; + } + pthread_mutex_unlock(&mutex); +} + +static +void cfg_finished(void) +{ + pthread_mutex_lock(&mutex); + unlock_notify(INITIALIZED); +} + +static +struct rte_ifpx_callbacks cbs = { + .mac_change = mac_change, + .mtu_change = mtu_change, + .link_change = link_change, + .addr_add = addr_add, + .addr_del = addr_del, + .addr6_add = addr6_add, + .addr6_del = addr6_del, + .route_add = route_add, + .route_del = route_del, + .route6_add = route6_add, + .route6_del = route6_del, + /* lib specific callback */ + .cfg_finished = cfg_finished, +}; + +static int +test_if_proxy(void) +{ + int ec; + char buf[INET6_ADDRSTRLEN]; + const struct rte_ifpx_info *pinfo; + + state = 0; + 
memset(&net_cfg, 0, sizeof(net_cfg)); + /* Since we are not going to test RX/TX we can just create proxy and + * bind it to itself to test just notification functionality. + */ + net_cfg.port_id = rte_ifpx_create(RTE_IFPX_DEFAULT); + RTE_VERIFY(net_cfg.port_id != RTE_MAX_ETHPORTS); + rte_ifpx_port_bind(net_cfg.port_id, net_cfg.port_id); + rte_ifpx_callbacks_register(&cbs); + rte_ifpx_listen(); + + pthread_mutex_lock(&mutex); + /* During initialization we should observe IPv4/6 loopback routes. */ + net_cfg.route4.s_addr = RTE_IPV4(127, 0, 0, 1); + net_cfg.depth4 = 32; + memcpy(net_cfg.route6.s6_addr, in6addr_loopback.s6_addr, 16); + net_cfg.depth6 = 128; + ec = wait_for(INITIALIZED | ROUTE_ADD | ROUTE6_ADD, 2); + if (ec != 0) { + printf("Failed to obtain network configuration\n"); + goto exit; + } + pinfo = rte_ifpx_info_get(net_cfg.port_id); + RTE_VERIFY(pinfo); + + /* Make sure the link is down. */ + net_cfg.is_up = 0; + ec = expect(LINK_CHANGED, "ip link set dev %s down", pinfo->if_name); + RTE_VERIFY(ec == ETIMEDOUT || ec == 0); + + /* Test link up notification. */ + net_cfg.is_up = 1; + ec = expect(LINK_CHANGED, "ip link set dev %s up", pinfo->if_name); + if (ec != 0) { + printf("Failed to notify about link going up\n"); + goto exit; + } + + /* Test for MAC changes notification. */ + rte_eth_random_addr(net_cfg.mac_addr.addr_bytes); + rte_ether_format_addr(buf, sizeof(buf), &net_cfg.mac_addr); + ec = expect(MAC_CHANGED, "ip link set dev %s address %s", + pinfo->if_name, buf); + if (ec != 0) { + printf("Missing/wrong notification about mac change\n"); + goto exit; + } + + /* Test for MTU changes notification. */ + net_cfg.mtu = pinfo->mtu + 100; + ec = expect(MTU_CHANGED, "ip link set dev %s mtu %d", + pinfo->if_name, net_cfg.mtu); + if (ec != 0) { + printf("Missing/wrong notification about mtu change\n"); + goto exit; + } + + /* Test for adding of IPv4 address - using address from TEST-2 pool. 
+ * This test is specific to linux netlink behaviour - after adding + * address we get both notification about address being added and new + * route. So I check both. + */ + net_cfg.ipv4.s_addr = RTE_IPV4(198, 51, 100, 14); + net_cfg.route4.s_addr = net_cfg.ipv4.s_addr; + net_cfg.depth4 = 32; + ec = expect(ADDR_ADD | ROUTE_ADD, "ip addr add 198.51.100.14 dev %s", + pinfo->if_name); + if (ec != 0) { + printf("Missing/wrong notifications about IPv4 address add\n"); + goto exit; + } + + /* Test for IPv4 address removal. See comment above for 'addr add'. */ + ec = expect(ADDR_DEL | ROUTE_DEL, "ip addr del 198.51.100.14/32 dev %s", + pinfo->if_name); + if (ec != 0) { + printf("Missing/wrong notifications about IPv4 address del\n"); + goto exit; + } + + /* Test for adding IPv4 route. */ + net_cfg.route4.s_addr = RTE_IPV4(198, 51, 100, 0); + net_cfg.depth4 = 24; + ec = expect(ROUTE_ADD, "ip route add 198.51.100.0/24 dev %s", + pinfo->if_name); + if (ec != 0) { + printf("Missing/wrong notifications about IPv4 route add\n"); + goto exit; + } + + /* Test for IPv4 route removal. */ + ec = expect(ROUTE_DEL, "ip route del 198.51.100.0/24 dev %s", + pinfo->if_name); + if (ec != 0) { + printf("Missing/wrong notifications about IPv4 route del\n"); + goto exit; + } + + /* Now the same for IPv6 - with address from "documentation pool". */ + inet_pton(AF_INET6, "2001:db8::dead:beef", net_cfg.ipv6.s6_addr); + /* This is specific to linux netlink behaviour - after adding address + * we get both notification about address being added and new route. + * So I wait for both. + */ + memcpy(net_cfg.route6.s6_addr, net_cfg.ipv6.s6_addr, 16); + net_cfg.depth6 = 128; + ec = expect(ADDR6_ADD | ROUTE6_ADD, + "ip addr add 2001:db8::dead:beef dev %s", + pinfo->if_name); + if (ec != 0) { + printf("Missing/wrong notifications about IPv6 address add\n"); + goto exit; + } + + /* See comment above for 'addr6 add'. 
*/ + ec = expect(ADDR6_DEL | ROUTE6_DEL, + "ip addr del 2001:db8::dead:beef/128 dev %s", + pinfo->if_name); + if (ec != 0) { + printf("Missing/wrong notifications about IPv6 address del\n"); + goto exit; + } + + net_cfg.depth6 = 96; + ec = expect(ROUTE6_ADD, "ip route add 2001:db8::dead:0/96 dev %s", + pinfo->if_name); + if (ec != 0) { + printf("Missing/wrong notifications about IPv6 route add\n"); + goto exit; + } + + ec = expect(ROUTE6_DEL, "ip route del 2001:db8::dead:0/96 dev %s", + pinfo->if_name); + if (ec != 0) { + printf("Missing/wrong notifications about IPv6 route del\n"); + goto exit; + } + + /* Finally put link down and test for notification. */ + net_cfg.is_up = 0; + ec = expect(LINK_CHANGED, "ip link set dev %s down", pinfo->if_name); + if (ec != 0) { + printf("Failed to notify about link going down\n"); + goto exit; + } + +exit: + pthread_mutex_unlock(&mutex); + rte_ifpx_destroy(net_cfg.port_id); + rte_ifpx_close(); + + return ec; +} + +REGISTER_TEST_COMMAND(if_proxy_autotest, test_if_proxy) diff --git a/doc/guides/prog_guide/if_proxy_lib.rst b/doc/guides/prog_guide/if_proxy_lib.rst new file mode 100644 index 000000000..dc1202cdf --- /dev/null +++ b/doc/guides/prog_guide/if_proxy_lib.rst @@ -0,0 +1,103 @@ +.. SPDX-License-Identifier: BSD-3-Clause + Copyright(C) 2019 Marvell International Ltd. + +.. _IF_Proxy_Library: + +IF Proxy Library +================ + +When a network interface is assigned to DPDK it usually disappears from +the system. +This way the user loses the ability to configure it via typical configuration +tools and is left basically with two options: + + - configure it via command line arguments, + + - add support for live configuration via some IPC mechanism. + +The first option is static and the second one requires some work to add +a communication loop (e.g. a separate thread listening/communicating on +a socket). + +This library adds the ability to configure DPDK ports by using normal +configuration utilities (e.g. from the iproute2 suite). 
+It requires the user to configure additional DPDK ports that are visible to +the system (such as Tap or KNI - actually any port that has a valid +'if_index' in 'struct rte_eth_dev_info' will do) and designate them as +port representors (proxies) in the system. + +Let's illustrate the typical intended usage with an example. +Suppose that you have an application that handles traffic on two ports (in +the white list below). + + ./app -w 00:14.0 -w 00:16.0 --vdev=net_tap0 --vdev=net_tap1 + +So in addition you configure two proxy ports and in the application code +you bind them to the "main" ports: + + rte_ifpx_port_bind(port0, proxy0); + rte_ifpx_port_bind(port1, proxy1); + +This binding is a logical one - there is no automatic packet forwarding +configured. +This is because the library cannot tell up front what portion of the traffic +received on ports 0/1 should be redirected to the system via proxies and +also it does not know how the application is structured (what packet +processing engines it uses). +Therefore it is the application writer's responsibility to include proxy ports +in its packet processing and forward appropriate packets between +proxies and ports. +What the library actually does is get the network configuration +from the system and listen for changes to it. 
+This information is then matched against the 'if_index' of the configured +proxies (when applicable - routing information is global) and passed to +the application via a set of callbacks that the user has to register: + + rte_ifpx_callbacks_register(&cbs); + +Here 'cbs' is a 'struct rte_ifpx_callbacks' which has the following +members: + + void (*mac_change)(uint16_t port_id, const struct rte_ether_addr *mac); + void (*mtu_change)(uint16_t port_id, uint16_t mtu); + void (*link_change)(uint16_t port_id, int is_up); + /* IPv4 addresses are in host order */ + void (*addr_add)(uint16_t port_id, uint32_t ip); + void (*addr_del)(uint16_t port_id, uint32_t ip); + void (*addr6_add)(uint16_t port_id, const uint8_t *ip); + void (*addr6_del)(uint16_t port_id, const uint8_t *ip); + void (*route_add)(uint32_t ip, uint8_t depth); + void (*route_del)(uint32_t ip, uint8_t depth); + void (*route6_add)(const uint8_t *ip, uint8_t depth); + void (*route6_del)(const uint8_t *ip, uint8_t depth); + /* lib specific callback - called when initial network configuration + * query is finished */ + void (*cfg_finished)(void); + +So for example when the user issues the command: + + ip link set dev dtap0 mtu 1600 + +then the library will call the `mtu_change()` callback with port_id equal to +'port0' (id of the port bound to this proxy) and 'mtu' equal to 1600 +('dtap0' is the default interface name for 'net_tap0'). +The application can simply call `rte_eth_dev_set_mtu()` from this callback. +In the same way `rte_eth_dev_default_mac_addr_set()` can be called from +`mac_change()` and `rte_eth_dev_set_link_up/down()` can be called from +the `link_change()` callback, dispatching on the 'is_up' argument. + +Please note however that the context in which these callbacks are called +is most probably different from the one in which packets are handled and +it is the application writer's responsibility to use proper synchronization +mechanisms - if they are needed. 
+ +If the application supports an IP protocol stack then it can utilize +callbacks for adding/removing of addresses to the proxies and also +routing information (note that routing info is not associated with any +port). +E.g. the application can feed some LPM tables with these addresses and upon +reception of a packet on some port match this packet against those +tables to figure out what to do with this packet. +If the decision is to pass it to the system then it can simply forward +it to the proxy corresponding to the port on which the packet has been +received by using the standard PMD TX interface. diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst index dc4851c57..0a1541f34 100644 --- a/doc/guides/prog_guide/index.rst +++ b/doc/guides/prog_guide/index.rst @@ -57,6 +57,7 @@ Programmer's Guide metrics_lib bpf_lib ipsec_lib + if_proxy_lib source_org dev_kit_build_system dev_kit_root_make_help diff --git a/examples/Makefile b/examples/Makefile index feff79784..5aa9ab431 100644 --- a/examples/Makefile +++ b/examples/Makefile @@ -81,6 +81,7 @@ else $(info vm_power_manager requires libvirt >= 0.9.3) endif endif +DIRS-$(CONFIG_RTE_LIBRTE_IF_PROXY) += if_proxy DIRS-y += eventdev_pipeline diff --git a/examples/if_proxy/Makefile b/examples/if_proxy/Makefile new file mode 100644 index 000000000..dd0515fa4 --- /dev/null +++ b/examples/if_proxy/Makefile @@ -0,0 +1,58 @@ +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2019 Marvell International Ltd. 
+
+# binary name
+APP = if_proxy
+
+# all source are stored in SRCS-y
+SRCS-y := main.c
+
+# Build using pkg-config variables if possible
+ifeq ($(shell pkg-config --exists libdpdk && echo 0),0)
+
+all: shared
+.PHONY: shared static
+shared: build/$(APP)-shared
+	ln -sf $(APP)-shared build/$(APP)
+static: build/$(APP)-static
+	ln -sf $(APP)-static build/$(APP)
+
+PKGCONF=pkg-config --define-prefix
+
+PC_FILE := $(shell $(PKGCONF) --path libdpdk)
+CFLAGS += -O3 $(shell $(PKGCONF) --cflags libdpdk)
+LDFLAGS_SHARED = $(shell $(PKGCONF) --libs libdpdk)
+LDFLAGS_STATIC = -Wl,-Bstatic $(shell $(PKGCONF) --static --libs libdpdk)
+
+build/$(APP)-shared: $(SRCS-y) Makefile $(PC_FILE) | build
+	$(CC) $(CFLAGS) $(SRCS-y) -o $@ $(LDFLAGS) $(LDFLAGS_SHARED)
+
+build/$(APP)-static: $(SRCS-y) Makefile $(PC_FILE) | build
+	$(CC) $(CFLAGS) $(SRCS-y) -o $@ $(LDFLAGS) $(LDFLAGS_STATIC)
+
+build:
+	@mkdir -p $@
+
+.PHONY: clean
+clean:
+	rm -f build/$(APP) build/$(APP)-static build/$(APP)-shared
+	test -d build && rmdir -p build || true
+
+else # Build using legacy build system
+
+ifeq ($(RTE_SDK),)
+$(error "Please define RTE_SDK environment variable")
+endif
+
+# Default target, detect a build directory, by looking for a path with a .config
+RTE_TARGET ?= $(notdir $(abspath $(dir $(firstword $(wildcard $(RTE_SDK)/*/.config)))))
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+CFLAGS += -O3
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+CFLAGS += $(WERROR_FLAGS)
+LDLIBS += -lrte_if_proxy -lrte_ethdev -lrte_eal
+
+include $(RTE_SDK)/mk/rte.extapp.mk
+endif
diff --git a/examples/if_proxy/main.c b/examples/if_proxy/main.c
new file mode 100644
index 000000000..2195fb490
--- /dev/null
+++ b/examples/if_proxy/main.c
@@ -0,0 +1,203 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2020 Marvell International Ltd.
+ */
+
+#include
+
+#include
+#include
+#include
+#include
+
+static
+char buf[INET6_ADDRSTRLEN];
+
+static
+uint16_t proxy_id = RTE_MAX_ETHPORTS;
+
+static
+void mac_change(uint16_t port_id, const struct rte_ether_addr *mac)
+{
+	char buf[3*RTE_ETHER_ADDR_LEN];
+
+	rte_ether_format_addr(buf, sizeof(buf), mac);
+	printf("\tmac change for port %u -> %s\n", port_id, buf);
+}
+
+static
+void mtu_change(uint16_t port_id, uint16_t mtu)
+{
+	printf("\tmtu change for port %u -> %u\n", port_id, mtu);
+}
+
+static
+void link_change(uint16_t port_id, int is_up)
+{
+	printf("\tport %u going %s\n", port_id, is_up ? "up" : "down");
+}
+
+static
+void addr_add(uint16_t port_id, uint32_t ip)
+{
+	struct in_addr a = { .s_addr = htonl(ip) };
+
+	printf("\taddress add for port %u -> %s\n", port_id,
+	       inet_ntop(AF_INET, &a, buf, sizeof(buf)));
+}
+
+static
+void addr_del(uint16_t port_id, uint32_t ip)
+{
+	struct in_addr a = { .s_addr = htonl(ip) };
+
+	printf("\taddress del for port %u -> %s\n", port_id,
+	       inet_ntop(AF_INET, &a, buf, sizeof(buf)));
+}
+
+static
+void addr6_add(uint16_t port_id, const uint8_t *ip)
+{
+	struct in6_addr a;
+
+	memcpy(a.s6_addr, ip, 16);
+	printf("\taddress6 add for port %u -> %s\n", port_id,
+	       inet_ntop(AF_INET6, &a, buf, sizeof(buf)));
+}
+
+static
+void addr6_del(uint16_t port_id, const uint8_t *ip)
+{
+	struct in6_addr a;
+
+	memcpy(a.s6_addr, ip, 16);
+	printf("\taddress6 del for port %u -> %s\n", port_id,
+	       inet_ntop(AF_INET6, &a, buf, sizeof(buf)));
+}
+
+static
+void route_add(uint32_t ip, uint8_t depth)
+{
+	struct in_addr a = { .s_addr = htonl(ip) };
+
+	printf("\troute add -> %s/%u\n",
+	       inet_ntop(AF_INET, &a, buf, sizeof(buf)), depth);
+}
+
+static
+void route_del(uint32_t ip, uint8_t depth)
+{
+	struct in_addr a = { .s_addr = htonl(ip) };
+
+	printf("\troute del -> %s/%u\n",
+	       inet_ntop(AF_INET, &a, buf, sizeof(buf)), depth);
+}
+
+static
+void route6_add(const uint8_t *ip, uint8_t depth)
+{
+	struct in6_addr a;
+
+	memcpy(a.s6_addr, ip,
16);
+	printf("\troute6 add -> %s/%u\n",
+	       inet_ntop(AF_INET6, &a, buf, sizeof(buf)), depth);
+}
+
+static
+void route6_del(const uint8_t *ip, uint8_t depth)
+{
+	struct in6_addr a;
+
+	memcpy(a.s6_addr, ip, 16);
+	printf("\troute6 del -> %s/%u\n",
+	       inet_ntop(AF_INET6, &a, buf, sizeof(buf)), depth);
+}
+
+struct rte_ifpx_callbacks cbs = {
+	.mac_change = mac_change,
+	.mtu_change = mtu_change,
+	.link_change = link_change,
+	.addr_add = addr_add,
+	.addr_del = addr_del,
+	.addr6_add = addr6_add,
+	.addr6_del = addr6_del,
+	.route_add = route_add,
+	.route_del = route_del,
+	.route6_add = route6_add,
+	.route6_del = route6_del,
+};
+
+static
+void proxy_bind_change(int sig)
+{
+	uint16_t port;
+	if (sig == SIGUSR1)
+		port = 0;
+	else if (sig == SIGUSR2)
+		port = 1;
+	else
+		return;
+
+	if (port >= rte_eth_dev_count_avail()) {
+		printf("\tNot enough ports allocated!\n");
+		return;
+	}
+
+	if (rte_ifpx_proxy_get(port) == RTE_MAX_ETHPORTS) {
+		printf("\tbinding port %d to proxy\n", port);
+		rte_ifpx_port_bind(port, proxy_id);
+	} else {
+		printf("\tunbinding port %d\n", port);
+		rte_ifpx_port_unbind(port);
+	}
+}
+
+int
+main(int argc, char **argv)
+{
+	int i, sig, nb_ports;
+	sigset_t set;
+
+	/* init EAL */
+	i = rte_eal_init(argc, argv);
+	if (i < 0)
+		rte_exit(EXIT_FAILURE, "Invalid EAL arguments\n");
+	argc -= i;
+	argv += i;
+
+	nb_ports = rte_eth_dev_count_avail();
+	if (nb_ports == 0)
+		rte_exit(EXIT_FAILURE, "No Ethernet ports - bye\n");
+
+	proxy_id = rte_ifpx_create(RTE_IFPX_DEFAULT);
+	if (proxy_id >= RTE_MAX_ETHPORTS) {
+		printf("Failed to create default proxy\n");
+		return -1;
+	}
+	/* Bind all ports to the same proxy. */
+	for (i = 0; i < nb_ports; ++i)
+		rte_ifpx_port_bind(i, proxy_id);
+	rte_ifpx_callbacks_register(&cbs);
+	rte_ifpx_listen();
+
+	/* Since we do not process packets - only listen to net events - we only
+	 * wait for signal either to quit or to change proxy binding.
+	 */
+	signal(SIGUSR1, proxy_bind_change);
+	signal(SIGUSR2, proxy_bind_change);
+
+	sigemptyset(&set);
+	sigaddset(&set, SIGINT);
+	sigprocmask(SIG_BLOCK, &set, NULL);
+	printf("Press ^C to quit\n");
+	do {
+		i = sigwait(&set, &sig);
+	} while (i != 0 || sig != SIGINT);
+
+	RTE_ETH_FOREACH_DEV(i) {
+		printf("\nClosing port %d...\n", i);
+		rte_eth_dev_close(i);
+	}
+	printf("Bye\n");
+
+	return 0;
+}
diff --git a/examples/if_proxy/meson.build b/examples/if_proxy/meson.build
new file mode 100644
index 000000000..5f5826a90
--- /dev/null
+++ b/examples/if_proxy/meson.build
@@ -0,0 +1,12 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2019 Marvell International Ltd.
+
+# meson file, for building this example as part of a main DPDK build.
+#
+# To build this example as a standalone application with an already-installed
+# DPDK instance, use 'make'
+
+allow_experimental_apis = true
+sources = files(
+	'main.c'
+)
diff --git a/examples/meson.build b/examples/meson.build
index 1f2b6f516..468ef8a90 100644
--- a/examples/meson.build
+++ b/examples/meson.build
@@ -16,7 +16,7 @@ all_examples = [
 	'eventdev_pipeline',
 	'fips_validation', 'flow_classify',
 	'flow_filtering', 'helloworld',
-	'ioat',
+	'if_proxy', 'ioat',
 	'ip_fragmentation', 'ip_pipeline',
 	'ip_reassembly', 'ipsec-secgw',
 	'ipv4_multicast', 'kni',