From patchwork Tue Oct 2 06:30:36 2018
X-Patchwork-Submitter: Slava Ovsiienko
X-Patchwork-Id: 45801
X-Patchwork-Delegate: shahafs@mellanox.com
From: Slava Ovsiienko
To: "dev@dpdk.org"
CC: Shahaf Shuler, Slava Ovsiienko
Date: Tue, 2 Oct 2018 06:30:36 +0000
Message-ID: <1538461807-37507-2-git-send-email-viacheslavo@mellanox.com>
References: <1538461807-37507-1-git-send-email-viacheslavo@mellanox.com>
In-Reply-To: <1538461807-37507-1-git-send-email-viacheslavo@mellanox.com>
Subject: [dpdk-dev] [PATCH 2/5] net/mlx5: e-switch VXLAN netlink routines update

This part of the patchset updates the Netlink exchange routines:
message sequence numbers are no longer random, multipart reply
messages are supported, errors are no longer propagated to subsequent
socket calls, and the Netlink reply buffer size is increased to
MNL_SOCKET_BUFFER_SIZE. A short usage sketch of the reworked socket
API is shown below.
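For illustration only (not part of the patch), a minimal sketch of how the
reworked API is consumed; "ifindex" is a placeholder for the egress
interface index and the fragment is assumed to live inside a probing
function:

    struct mlx5_tcf_socket tcf;
    struct rte_flow_error error;

    /* Open the NETLINK_ROUTE libmnl socket and seed the sequence number. */
    if (mlx5_flow_tcf_socket_open(&tcf))
            return -rte_errno;
    /* Replace the ingress qdisc of the egress interface. */
    if (mlx5_flow_tcf_ifindex_init(&tcf, ifindex, &error)) {
            mlx5_flow_tcf_socket_close(&tcf);
            return -rte_errno;
    }
    /* ... send flow rules through flow_tcf_nl_ack(&tcf, nlh) ... */
    mlx5_flow_tcf_socket_close(&tcf); /* Safe to call twice: tcf->nl is reset. */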
Suggested-by: Adrien Mazarguil Signed-off-by: Viacheslav Ovsiienko --- drivers/net/mlx5/mlx5.c | 18 ++-- drivers/net/mlx5/mlx5.h | 7 +- drivers/net/mlx5/mlx5_flow.h | 9 +- drivers/net/mlx5/mlx5_flow_tcf.c | 214 +++++++++++++++++++++++---------------- 4 files changed, 147 insertions(+), 101 deletions(-) diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index 4be6a1c..201a26e 100644 --- a/drivers/net/mlx5/mlx5.c +++ b/drivers/net/mlx5/mlx5.c @@ -287,8 +287,7 @@ close(priv->nl_socket_route); if (priv->nl_socket_rdma >= 0) close(priv->nl_socket_rdma); - if (priv->mnl_socket) - mlx5_flow_tcf_socket_destroy(priv->mnl_socket); + mlx5_flow_tcf_socket_close(&priv->tcf_socket); ret = mlx5_hrxq_ibv_verify(dev); if (ret) DRV_LOG(WARNING, "port %u some hash Rx queue still remain", @@ -1138,8 +1137,9 @@ claim_zero(mlx5_mac_addr_add(eth_dev, &mac, 0, 0)); if (vf && config.vf_nl_en) mlx5_nl_mac_addr_sync(eth_dev); - priv->mnl_socket = mlx5_flow_tcf_socket_create(); - if (!priv->mnl_socket) { + /* Initialize Netlink socket for e-switch control */ + err = mlx5_flow_tcf_socket_open(&priv->tcf_socket); + if (err) { err = -rte_errno; DRV_LOG(WARNING, "flow rules relying on switch offloads will not be" @@ -1154,16 +1154,15 @@ error.message = "cannot retrieve network interface index"; } else { - err = mlx5_flow_tcf_init(priv->mnl_socket, ifindex, - &error); + err = mlx5_flow_tcf_ifindex_init(&priv->tcf_socket, + ifindex, &error); } if (err) { DRV_LOG(WARNING, "flow rules relying on switch offloads will" " not be supported: %s: %s", error.message, strerror(rte_errno)); - mlx5_flow_tcf_socket_destroy(priv->mnl_socket); - priv->mnl_socket = NULL; + mlx5_flow_tcf_socket_close(&priv->tcf_socket); } } TAILQ_INIT(&priv->flows); @@ -1218,8 +1217,7 @@ close(priv->nl_socket_route); if (priv->nl_socket_rdma >= 0) close(priv->nl_socket_rdma); - if (priv->mnl_socket) - mlx5_flow_tcf_socket_destroy(priv->mnl_socket); + mlx5_flow_tcf_socket_close(&priv->tcf_socket); if (own_domain_id) claim_zero(rte_eth_switch_domain_free(priv->domain_id)); rte_free(priv); diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index 8de0d74..b327a39 100644 --- a/drivers/net/mlx5/mlx5.h +++ b/drivers/net/mlx5/mlx5.h @@ -160,6 +160,11 @@ struct mlx5_drop { struct mnl_socket; +struct mlx5_tcf_socket { + uint32_t seq; /* Message sequence number. */ + struct mnl_socket *nl; /* NETLINK_ROUTE libmnl socket. */ +}; + struct priv { LIST_ENTRY(priv) mem_event_cb; /* Called by memory event callback. */ struct rte_eth_dev_data *dev_data; /* Pointer to device data. */ @@ -220,12 +225,12 @@ struct priv { int nl_socket_rdma; /* Netlink socket (NETLINK_RDMA). */ int nl_socket_route; /* Netlink socket (NETLINK_ROUTE). */ uint32_t nl_sn; /* Netlink message sequence number. */ + struct mlx5_tcf_socket tcf_socket; /* Libmnl socket for tcf. */ #ifndef RTE_ARCH_64 rte_spinlock_t uar_lock_cq; /* CQs share a common distinct UAR */ rte_spinlock_t uar_lock[MLX5_UAR_PAGE_NUM_MAX]; /* UAR same-page access control required in 32bit implementations. */ #endif - struct mnl_socket *mnl_socket; /* Libmnl socket. 
*/ }; #define PORT_ID(priv) ((priv)->dev_data->port_id)
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h index 2d56ced..fff905a 100644 --- a/drivers/net/mlx5/mlx5_flow.h +++ b/drivers/net/mlx5/mlx5_flow.h @@ -348,9 +348,10 @@ int mlx5_flow_validate_item_vxlan_gpe(const struct rte_flow_item *item, /* mlx5_flow_tcf.c */ -int mlx5_flow_tcf_init(struct mnl_socket *nl, unsigned int ifindex, - struct rte_flow_error *error); -struct mnl_socket *mlx5_flow_tcf_socket_create(void); -void mlx5_flow_tcf_socket_destroy(struct mnl_socket *nl); +int mlx5_flow_tcf_ifindex_init(struct mlx5_tcf_socket *tcf, + unsigned int ifindex, + struct rte_flow_error *error); +int mlx5_flow_tcf_socket_open(struct mlx5_tcf_socket *tcf); +void mlx5_flow_tcf_socket_close(struct mlx5_tcf_socket *tcf); #endif /* RTE_PMD_MLX5_FLOW_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow_tcf.c b/drivers/net/mlx5/mlx5_flow_tcf.c index 5c93412..15e250c 100644 --- a/drivers/net/mlx5/mlx5_flow_tcf.c +++ b/drivers/net/mlx5/mlx5_flow_tcf.c @@ -1552,8 +1552,8 @@ struct flow_tcf_ptoi { /** * Send Netlink message with acknowledgment. * - * @param nl - * Libmnl socket to use. + * @param tcf + * Libmnl socket context to use. * @param nlh * Message to send. This function always raises the NLM_F_ACK flag before * sending. @@ -1562,26 +1562,108 @@ struct flow_tcf_ptoi { * 0 on success, a negative errno value otherwise and rte_errno is set. */ static int -flow_tcf_nl_ack(struct mnl_socket *nl, struct nlmsghdr *nlh) +flow_tcf_nl_ack(struct mlx5_tcf_socket *tcf, struct nlmsghdr *nlh) { alignas(struct nlmsghdr) - uint8_t ans[mnl_nlmsg_size(sizeof(struct nlmsgerr)) + - nlh->nlmsg_len - sizeof(*nlh)]; - uint32_t seq = random(); - int ret; - + uint8_t ans[MNL_SOCKET_BUFFER_SIZE]; + unsigned int portid = mnl_socket_get_portid(tcf->nl); + uint32_t seq = tcf->seq++; + struct mnl_socket *nl = tcf->nl; + int err, ret; + + assert(nl); + if (!seq) + seq = tcf->seq++; nlh->nlmsg_flags |= NLM_F_ACK; nlh->nlmsg_seq = seq; ret = mnl_socket_sendto(nl, nlh, nlh->nlmsg_len); - if (ret != -1) - ret = mnl_socket_recvfrom(nl, ans, sizeof(ans)); - if (ret != -1) - ret = mnl_cb_run - (ans, ret, seq, mnl_socket_get_portid(nl), NULL, NULL); + err = (ret <= 0) ? errno : 0; + nlh = (struct nlmsghdr *)ans; + /* + * The following loop postpones non-fatal errors until multipart + * messages are complete. + */ if (ret > 0) + while (true) { + ret = mnl_socket_recvfrom(nl, ans, sizeof(ans)); + if (ret < 0) { + err = errno; + if (err != ENOSPC) + break; + } + if (!err) { + ret = mnl_cb_run(nlh, ret, seq, portid, + NULL, NULL); + if (ret < 0) { + err = errno; + break; + } + } + /* Will receive till end of multipart message */ + if (!(nlh->nlmsg_flags & NLM_F_MULTI) || + nlh->nlmsg_type == NLMSG_DONE) + break; + } + if (!err) return 0; - rte_errno = errno; - return -rte_errno; + rte_errno = err; + return -err; +} + +/** + * Initialize ingress qdisc of a given network interface. + * + * @param tcf + * Libmnl socket context object. + * @param ifindex + * Index of network interface to initialize. + * @param[out] error + * Perform verbose error reporting if not NULL. + * + * @return + * 0 on success, a negative errno value otherwise and rte_errno is set. + */ +int +mlx5_flow_tcf_ifindex_init(struct mlx5_tcf_socket *tcf, unsigned int ifindex, + struct rte_flow_error *error) +{ + struct nlmsghdr *nlh; + struct tcmsg *tcm; + alignas(struct nlmsghdr) + uint8_t buf[mnl_nlmsg_size(sizeof(*tcm) + 128)]; + + /* Destroy existing ingress qdisc and everything attached to it.
*/ + nlh = mnl_nlmsg_put_header(buf); + nlh->nlmsg_type = RTM_DELQDISC; + nlh->nlmsg_flags = NLM_F_REQUEST; + tcm = mnl_nlmsg_put_extra_header(nlh, sizeof(*tcm)); + tcm->tcm_family = AF_UNSPEC; + tcm->tcm_ifindex = ifindex; + tcm->tcm_handle = TC_H_MAKE(TC_H_INGRESS, 0); + tcm->tcm_parent = TC_H_INGRESS; + /* Ignore errors when qdisc is already absent. */ + if (flow_tcf_nl_ack(tcf, nlh) && + rte_errno != EINVAL && rte_errno != ENOENT) + return rte_flow_error_set(error, rte_errno, + RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL, + "netlink: failed to remove ingress" + " qdisc"); + /* Create fresh ingress qdisc. */ + nlh = mnl_nlmsg_put_header(buf); + nlh->nlmsg_type = RTM_NEWQDISC; + nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL; + tcm = mnl_nlmsg_put_extra_header(nlh, sizeof(*tcm)); + tcm->tcm_family = AF_UNSPEC; + tcm->tcm_ifindex = ifindex; + tcm->tcm_handle = TC_H_MAKE(TC_H_INGRESS, 0); + tcm->tcm_parent = TC_H_INGRESS; + mnl_attr_put_strz_check(nlh, sizeof(buf), TCA_KIND, "ingress"); + if (flow_tcf_nl_ack(tcf, nlh)) + return rte_flow_error_set(error, rte_errno, + RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL, + "netlink: failed to create ingress" + " qdisc"); + return 0; } /** @@ -1602,18 +1684,25 @@ struct flow_tcf_ptoi { struct rte_flow_error *error) { struct priv *priv = dev->data->dev_private; - struct mnl_socket *nl = priv->mnl_socket; + struct mlx5_tcf_socket *tcf = &priv->tcf_socket; struct mlx5_flow *dev_flow; struct nlmsghdr *nlh; + int ret; dev_flow = LIST_FIRST(&flow->dev_flows); /* E-Switch flow can't be expanded. */ assert(!LIST_NEXT(dev_flow, next)); + if (dev_flow->tcf.applied) + return 0; nlh = dev_flow->tcf.nlh; nlh->nlmsg_type = RTM_NEWTFILTER; nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL; - if (!flow_tcf_nl_ack(nl, nlh)) + ret = flow_tcf_nl_ack(tcf, nlh); + if (!ret) { + dev_flow->tcf.applied = 1; return 0; + } + DRV_LOG(WARNING, "Failed to create TC rule (%d)", rte_errno); return rte_flow_error_set(error, rte_errno, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL, "netlink: failed to create TC flow rule"); @@ -1631,7 +1720,7 @@ struct flow_tcf_ptoi { flow_tcf_remove(struct rte_eth_dev *dev, struct rte_flow *flow) { struct priv *priv = dev->data->dev_private; - struct mnl_socket *nl = priv->mnl_socket; + struct mlx5_tcf_socket *tcf = &priv->tcf_socket; struct mlx5_flow *dev_flow; struct nlmsghdr *nlh; @@ -1645,7 +1734,8 @@ struct flow_tcf_ptoi { nlh = dev_flow->tcf.nlh; nlh->nlmsg_type = RTM_DELTFILTER; nlh->nlmsg_flags = NLM_F_REQUEST; - flow_tcf_nl_ack(nl, nlh); + flow_tcf_nl_ack(tcf, nlh); + dev_flow->tcf.applied = 0; } /** @@ -1683,93 +1773,45 @@ struct flow_tcf_ptoi { }; /** - * Initialize ingress qdisc of a given network interface. - * - * @param nl - * Libmnl socket of the @p NETLINK_ROUTE kind. - * @param ifindex - * Index of network interface to initialize. - * @param[out] error - * Perform verbose error reporting if not NULL. + * Creates and configures a libmnl socket for Netlink flow rules. * + * @param tcf + * tcf socket object to be initialized by function. * @return * 0 on success, a negative errno value otherwise and rte_errno is set. */ int -mlx5_flow_tcf_init(struct mnl_socket *nl, unsigned int ifindex, - struct rte_flow_error *error) -{ - struct nlmsghdr *nlh; - struct tcmsg *tcm; - alignas(struct nlmsghdr) - uint8_t buf[mnl_nlmsg_size(sizeof(*tcm) + 128)]; - - /* Destroy existing ingress qdisc and everything attached to it. 
*/ - nlh = mnl_nlmsg_put_header(buf); - nlh->nlmsg_type = RTM_DELQDISC; - nlh->nlmsg_flags = NLM_F_REQUEST; - tcm = mnl_nlmsg_put_extra_header(nlh, sizeof(*tcm)); - tcm->tcm_family = AF_UNSPEC; - tcm->tcm_ifindex = ifindex; - tcm->tcm_handle = TC_H_MAKE(TC_H_INGRESS, 0); - tcm->tcm_parent = TC_H_INGRESS; - /* Ignore errors when qdisc is already absent. */ - if (flow_tcf_nl_ack(nl, nlh) && - rte_errno != EINVAL && rte_errno != ENOENT) - return rte_flow_error_set(error, rte_errno, - RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL, - "netlink: failed to remove ingress" - " qdisc"); - /* Create fresh ingress qdisc. */ - nlh = mnl_nlmsg_put_header(buf); - nlh->nlmsg_type = RTM_NEWQDISC; - nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL; - tcm = mnl_nlmsg_put_extra_header(nlh, sizeof(*tcm)); - tcm->tcm_family = AF_UNSPEC; - tcm->tcm_ifindex = ifindex; - tcm->tcm_handle = TC_H_MAKE(TC_H_INGRESS, 0); - tcm->tcm_parent = TC_H_INGRESS; - mnl_attr_put_strz_check(nlh, sizeof(buf), TCA_KIND, "ingress"); - if (flow_tcf_nl_ack(nl, nlh)) - return rte_flow_error_set(error, rte_errno, - RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL, - "netlink: failed to create ingress" - " qdisc"); - return 0; -} - -/** - * Create and configure a libmnl socket for Netlink flow rules. - * - * @return - * A valid libmnl socket object pointer on success, NULL otherwise and - * rte_errno is set. - */ -struct mnl_socket * -mlx5_flow_tcf_socket_create(void) +mlx5_flow_tcf_socket_open(struct mlx5_tcf_socket *tcf) { struct mnl_socket *nl = mnl_socket_open(NETLINK_ROUTE); + tcf->nl = NULL; if (nl) { mnl_socket_setsockopt(nl, NETLINK_CAP_ACK, &(int){ 1 }, sizeof(int)); - if (!mnl_socket_bind(nl, 0, MNL_SOCKET_AUTOPID)) - return nl; + if (!mnl_socket_bind(nl, 0, MNL_SOCKET_AUTOPID)) { + tcf->nl = nl; + tcf->seq = random(); + return 0; + } } rte_errno = errno; if (nl) mnl_socket_close(nl); - return NULL; + return -rte_errno; } /** - * Destroy a libmnl socket. + * Destroys tcf object (closes MNL socket). * - * @param nl - * Libmnl socket of the @p NETLINK_ROUTE kind. + * @param tcf + * tcf socket object to be destroyed by function. 
*/ void -mlx5_flow_tcf_socket_destroy +mlx5_flow_tcf_socket_close(struct mlx5_tcf_socket *tcf) { - mnl_socket_close(nl); + if (tcf->nl) { + mnl_socket_close(tcf->nl); + tcf->nl = NULL; + } }

From patchwork Tue Oct 2 06:30:38 2018
X-Patchwork-Submitter: Slava Ovsiienko
X-Patchwork-Id: 45802
X-Patchwork-Delegate: shahafs@mellanox.com
From: Slava Ovsiienko
To: "dev@dpdk.org"
CC: Shahaf Shuler, Slava Ovsiienko
Date: Tue, 2 Oct 2018 06:30:38 +0000
Message-ID: <1538461807-37507-3-git-send-email-viacheslavo@mellanox.com>
References: <1538461807-37507-1-git-send-email-viacheslavo@mellanox.com>
In-Reply-To: <1538461807-37507-1-git-send-email-viacheslavo@mellanox.com>
Subject: [dpdk-dev] [PATCH 3/5] net/mlx5: e-switch VXLAN flow validation routine

This part of the patchset adds validation of the flow item and action lists.
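For illustration only (not part of the patch), a VXLAN_ENCAP action
definition that satisfies the constraints enforced by the validators
below; all addresses, the port and the VNI are placeholder values, and
the default item masks are assumed (the validators substitute them when
.mask is NULL):

    struct rte_flow_item_ipv4 ipv4_spec = {
            .hdr = {
                    /* Must match an address of the egress interface. */
                    .src_addr = RTE_BE32(0xC0A80001), /* 192.168.0.1 */
                    .dst_addr = RTE_BE32(0xC0A80002), /* 192.168.0.2 */
            },
    };
    struct rte_flow_item_udp udp_spec = {
            /* Non-zero remote port; source port would be ignored. */
            .hdr = { .dst_port = RTE_BE16(4789) }, /* IANA VXLAN port. */
    };
    struct rte_flow_item_vxlan vxlan_spec = {
            .vni = { 0, 0, 1 }, /* Non-zero VNI is mandatory. */
    };
    struct rte_flow_item vxlan_encap_pattern[] = {
            { .type = RTE_FLOW_ITEM_TYPE_ETH }, /* L2 spec may stay empty. */
            { .type = RTE_FLOW_ITEM_TYPE_IPV4, .spec = &ipv4_spec },
            { .type = RTE_FLOW_ITEM_TYPE_UDP, .spec = &udp_spec },
            { .type = RTE_FLOW_ITEM_TYPE_VXLAN, .spec = &vxlan_spec },
            { .type = RTE_FLOW_ITEM_TYPE_END },
    };
    struct rte_flow_action_vxlan_encap encap_conf = {
            .definition = vxlan_encap_pattern,
    };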
The following entities are now supported:

- RTE_FLOW_ITEM_TYPE_VXLAN, which contains the tunnel VNI.

- RTE_FLOW_ACTION_TYPE_VXLAN_DECAP. If this action is specified, the
  items in the flow item list are treated as outer network parameters
  for the tunnel outer header match. Ethernet layer addresses are
  always treated as inner ones.

- RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP, which contains the item list used
  to build the encapsulation header. In the current implementation the
  values are subject to some constraints:
    - the outer source IP must coincide with an address assigned to
      the outer egress interface;
    - the outer source MAC address is always unconditionally set to
      one of the MAC addresses of the outer egress interface;
    - there is no way to specify the source UDP port;
    - all of the above parameters are ignored if specified in the
      rule, and warning messages are emitted to the log.

Suggested-by: Adrien Mazarguil
Signed-off-by: Viacheslav Ovsiienko
---
 drivers/net/mlx5/mlx5_flow_tcf.c | 717 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 713 insertions(+), 4 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow_tcf.c b/drivers/net/mlx5/mlx5_flow_tcf.c index 15e250c..97451bd 100644 --- a/drivers/net/mlx5/mlx5_flow_tcf.c +++ b/drivers/net/mlx5/mlx5_flow_tcf.c @@ -558,6 +558,630 @@ struct flow_tcf_ptoi { } /** + * Validate VXLAN_ENCAP action RTE_FLOW_ITEM_TYPE_ETH item for E-Switch. + * + * @param[in] item + * Pointer to the item structure. + * @param[out] error + * Pointer to the error structure. + * + * @return + * 0 on success, a negative errno value otherwise and rte_errno is set. + **/ +static int +flow_tcf_validate_vxlan_encap_eth(const struct rte_flow_item *item, + struct rte_flow_error *error) +{ + const struct rte_flow_item_eth *spec = item->spec; + const struct rte_flow_item_eth *mask = item->mask; + + if (!spec) + /* + * Specification for L2 addresses can be empty + * because these ones are optional and not + * required directly by tc rule. + */ + return 0; + if (!mask) + /* If mask is not specified use the default one. */ + mask = &rte_flow_item_eth_mask; + if (memcmp(&mask->dst, + &flow_tcf_mask_empty.eth.dst, + sizeof(flow_tcf_mask_empty.eth.dst))) { + if (memcmp(&mask->dst, + &rte_flow_item_eth_mask.dst, + sizeof(rte_flow_item_eth_mask.dst))) + return rte_flow_error_set(error, ENOTSUP, + RTE_FLOW_ERROR_TYPE_ITEM_MASK, mask, + "no support for partial mask on" + " \"eth.dst\" field"); + /* + * Ethernet addresses are not supported by + * tc as tunnel_key parameters. Destination + * L2 address is needed to form encap packet + * header and retrieved by kernel from implicit + * sources (ARP table, etc), address masks are + * not supported at all.
+ */ + DRV_LOG(WARNING, + "outer ethernet destination address " + "cannot be forced for VXLAN " + "encapsulation, parameter ignored"); + } + if (memcmp(&mask->src, + &flow_tcf_mask_empty.eth.src, + sizeof(flow_tcf_mask_empty.eth.src))) { + if (memcmp(&mask->src, + &rte_flow_item_eth_mask.src, + sizeof(rte_flow_item_eth_mask.src))) + return rte_flow_error_set(error, ENOTSUP, + RTE_FLOW_ERROR_TYPE_ITEM_MASK, mask, + "no support for partial mask on" + " \"eth.src\" field"); + DRV_LOG(WARNING, + "outer ethernet source address " + "cannot be forced for VXLAN " + "encapsulation, parameter ignored"); + } + if (mask->type != RTE_BE16(0x0000)) { + if (mask->type != RTE_BE16(0xffff)) + return rte_flow_error_set(error, ENOTSUP, + RTE_FLOW_ERROR_TYPE_ITEM_MASK, mask, + "no support for partial mask on" + " \"eth.type\" field"); + DRV_LOG(WARNING, + "outer ethernet type field " + "cannot be forced for VXLAN " + "encapsulation, parameter ignored"); + } + return 0; +} + +/** + * Validate VXLAN_ENCAP action RTE_FLOW_ITEM_TYPE_IPV4 item for E-Switch. + * + * @param[in] item + * Pointer to the item structure. + * @param[out] error + * Pointer to the error structure. + * + * @return + * 0 on success, a negative errno value otherwise and rte_errno is set. + **/ +static int +flow_tcf_validate_vxlan_encap_ipv4(const struct rte_flow_item *item, + struct rte_flow_error *error) +{ + const struct rte_flow_item_ipv4 *spec = item->spec; + const struct rte_flow_item_ipv4 *mask = item->mask; + + if (!spec) + /* + * Specification for L3 addresses cannot be empty + * because it is required by tunnel_key parameter. + */ + return rte_flow_error_set(error, EINVAL, + RTE_FLOW_ERROR_TYPE_ITEM, item, + "NULL outer L3 address specification" + " for VXLAN encapsulation"); + if (!mask) + mask = &rte_flow_item_ipv4_mask; + if (mask->hdr.dst_addr != RTE_BE32(0x00000000)) { + if (mask->hdr.dst_addr != RTE_BE32(0xffffffff)) + return rte_flow_error_set(error, ENOTSUP, + RTE_FLOW_ERROR_TYPE_ITEM_MASK, mask, + "no support for partial mask on" + " \"ipv4.hdr.dst_addr\" field"); + /* More L3 address validations can be put here. */ + } else { + /* + * Kernel uses the destination L3 address to determine + * the routing path and obtain the L2 destination + * address, so L3 destination address must be + * specified in the tc rule. + */ + return rte_flow_error_set(error, EINVAL, + RTE_FLOW_ERROR_TYPE_ITEM, item, + "outer L3 destination address must be " + "specified for VXLAN encapsulation"); + } + if (mask->hdr.src_addr != RTE_BE32(0x00000000)) { + if (mask->hdr.src_addr != RTE_BE32(0xffffffff)) + return rte_flow_error_set(error, ENOTSUP, + RTE_FLOW_ERROR_TYPE_ITEM_MASK, mask, + "no support for partial mask on" + " \"ipv4.hdr.src_addr\" field"); + /* More L3 address validations can be put here. */ + } else { + /* + * Kernel uses the source L3 address to select the + * interface for egress encapsulated traffic, so + * it must be specified in the tc rule. + */ + return rte_flow_error_set(error, EINVAL, + RTE_FLOW_ERROR_TYPE_ITEM, item, + "outer L3 source address must be " + "specified for VXLAN encapsulation"); + } + return 0; +} + +/** + * Validate VXLAN_ENCAP action RTE_FLOW_ITEM_TYPE_IPV6 item for E-Switch. + * + * @param[in] item + * Pointer to the item structure. + * @param[out] error + * Pointer to the error structure. + * + * @return + * 0 on success, a negative errno value otherwise and rte_errno is set.
+ **/ +static int +flow_tcf_validate_vxlan_encap_ipv6(const struct rte_flow_item *item, + struct rte_flow_error *error) +{ + const struct rte_flow_item_ipv6 *spec = item->spec; + const struct rte_flow_item_ipv6 *mask = item->mask; + + if (!spec) + /* + * Specification for L3 addresses cannot be empty + * because it is required by tunnel_key parameter. + */ + return rte_flow_error_set(error, EINVAL, + RTE_FLOW_ERROR_TYPE_ITEM, item, + "NULL outer L3 address specification" + " for VXLAN encapsulation"); + if (!mask) + mask = &rte_flow_item_ipv6_mask; + if (memcmp(&mask->hdr.dst_addr, + &flow_tcf_mask_empty.ipv6.hdr.dst_addr, + sizeof(flow_tcf_mask_empty.ipv6.hdr.dst_addr))) { + if (memcmp(&mask->hdr.dst_addr, + &rte_flow_item_ipv6_mask.hdr.dst_addr, + sizeof(rte_flow_item_ipv6_mask.hdr.dst_addr))) + return rte_flow_error_set(error, ENOTSUP, + RTE_FLOW_ERROR_TYPE_ITEM_MASK, mask, + "no support for partial mask on" + " \"ipv6.hdr.dst_addr\" field"); + /* More L3 address validations can be put here. */ + } else { + /* + * Kernel uses the destination L3 address to determine + * the routing path and obtain the L2 destination + * address (neighbor or gateway), so L3 destination address + * must be specified within the tc rule. + */ + return rte_flow_error_set(error, EINVAL, + RTE_FLOW_ERROR_TYPE_ITEM, item, + "outer L3 destination address must be " + "specified for VXLAN encapsulation"); + } + if (memcmp(&mask->hdr.src_addr, + &flow_tcf_mask_empty.ipv6.hdr.src_addr, + sizeof(flow_tcf_mask_empty.ipv6.hdr.src_addr))) { + if (memcmp(&mask->hdr.src_addr, + &rte_flow_item_ipv6_mask.hdr.src_addr, + sizeof(rte_flow_item_ipv6_mask.hdr.src_addr))) + return rte_flow_error_set(error, ENOTSUP, + RTE_FLOW_ERROR_TYPE_ITEM_MASK, mask, + "no support for partial mask on" + " \"ipv6.hdr.src_addr\" field"); + /* More L3 address validation can be put here. */ + } else { + /* + * Kernel uses the source L3 address to select the + * interface for egress encapsulated traffic, so + * it must be specified in the tc rule. + */ + return rte_flow_error_set(error, EINVAL, + RTE_FLOW_ERROR_TYPE_ITEM, item, + "outer L3 source address must be " + "specified for VXLAN encapsulation"); + } + return 0; +} + +/** + * Validate VXLAN_ENCAP action RTE_FLOW_ITEM_TYPE_UDP item for E-Switch. + * + * @param[in] item + * Pointer to the item structure. + * @param[out] error + * Pointer to the error structure. + * + * @return + * 0 on success, a negative errno value otherwise and rte_errno is set. + **/ +static int +flow_tcf_validate_vxlan_encap_udp(const struct rte_flow_item *item, + struct rte_flow_error *error) +{ + const struct rte_flow_item_udp *spec = item->spec; + const struct rte_flow_item_udp *mask = item->mask; + + if (!spec) + /* + * Specification for UDP ports cannot be empty + * because it is required by tunnel_key parameter.
+ */ + return rte_flow_error_set(error, EINVAL, + RTE_FLOW_ERROR_TYPE_ITEM, item, + "NULL UDP port specification" + " for VXLAN encapsulation"); + if (!mask) + mask = &rte_flow_item_udp_mask; + if (mask->hdr.dst_port != RTE_BE16(0x0000)) { + if (mask->hdr.dst_port != RTE_BE16(0xffff)) + return rte_flow_error_set(error, ENOTSUP, + RTE_FLOW_ERROR_TYPE_ITEM_MASK, mask, + "no support for partial mask on" + " \"udp.hdr.dst_port\" field"); + if (!spec->hdr.dst_port) + return rte_flow_error_set(error, EINVAL, + RTE_FLOW_ERROR_TYPE_ITEM, item, + "zero encap remote UDP port"); + } else { + return rte_flow_error_set(error, EINVAL, + RTE_FLOW_ERROR_TYPE_ITEM, item, + "outer UDP remote port must be " + "specified for VXLAN encapsulation"); + } + if (mask->hdr.src_port != RTE_BE16(0x0000)) { + if (mask->hdr.src_port != RTE_BE16(0xffff)) + return rte_flow_error_set(error, ENOTSUP, + RTE_FLOW_ERROR_TYPE_ITEM_MASK, mask, + "no support for partial mask on" + " \"udp.hdr.src_port\" field"); + DRV_LOG(WARNING, + "outer UDP source port cannot be " + "forced for VXLAN encapsulation, " + "parameter ignored"); + } + return 0; +} + +/** + * Validate VXLAN_ENCAP action RTE_FLOW_ITEM_TYPE_VXLAN item for E-Switch. + * + * @param[in] item + * Pointer to the item structure. + * @param[out] error + * Pointer to the error structure. + * + * @return + * 0 on success, a negative errno value otherwise and rte_errno is set. + **/ +static int +flow_tcf_validate_vxlan_encap_vni(const struct rte_flow_item *item, + struct rte_flow_error *error) +{ + const struct rte_flow_item_vxlan *spec = item->spec; + const struct rte_flow_item_vxlan *mask = item->mask; + + if (!spec) + /* Outer VNI is required by tunnel_key parameter. */ + return rte_flow_error_set(error, EINVAL, + RTE_FLOW_ERROR_TYPE_ITEM, item, + "NULL VNI specification" + " for VXLAN encapsulation"); + if (!mask) + mask = &rte_flow_item_vxlan_mask; + if (mask->vni[0] != 0 || + mask->vni[1] != 0 || + mask->vni[2] != 0) { + if (mask->vni[0] != 0xff || + mask->vni[1] != 0xff || + mask->vni[2] != 0xff) + return rte_flow_error_set(error, ENOTSUP, + RTE_FLOW_ERROR_TYPE_ITEM_MASK, mask, + "no support for partial mask on" + " \"vxlan.vni\" field"); + if (spec->vni[0] == 0 && + spec->vni[1] == 0 && + spec->vni[2] == 0) + return rte_flow_error_set(error, EINVAL, + RTE_FLOW_ERROR_TYPE_ITEM, item, + "VXLAN vni cannot be 0"); + } else { + return rte_flow_error_set(error, EINVAL, + RTE_FLOW_ERROR_TYPE_ITEM, + item, + "outer VNI must be specified " + "for VXLAN encapsulation"); + } + return 0; +} + +/** + * Validate VXLAN_ENCAP action item list for E-Switch. + * + * @param[in] action + * Pointer to the VXLAN_ENCAP action structure. + * @param[out] error + * Pointer to the error structure. + * + * @return + * 0 on success, a negative errno value otherwise and rte_errno is set.
+ **/ +static int +flow_tcf_validate_vxlan_encap(const struct rte_flow_action *action, + struct rte_flow_error *error) +{ + const struct rte_flow_item *items; + int ret; + uint32_t item_flags = 0; + + assert(action->type == RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP); + if (!action->conf) + return rte_flow_error_set + (error, EINVAL, RTE_FLOW_ERROR_TYPE_ACTION, + action, "Missing VXLAN tunnel " + "action configuration"); + items = ((const struct rte_flow_action_vxlan_encap *) + action->conf)->definition; + if (!items) + return rte_flow_error_set + (error, EINVAL, RTE_FLOW_ERROR_TYPE_ACTION, + action, "Missing VXLAN tunnel " + "encapsulation parameters"); + for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) { + switch (items->type) { + case RTE_FLOW_ITEM_TYPE_VOID: + break; + case RTE_FLOW_ITEM_TYPE_ETH: + ret = mlx5_flow_validate_item_eth(items, item_flags, + error); + if (ret < 0) + return ret; + ret = flow_tcf_validate_vxlan_encap_eth(items, error); + if (ret < 0) + return ret; + item_flags |= MLX5_FLOW_LAYER_OUTER_L2; + break; + case RTE_FLOW_ITEM_TYPE_IPV4: + ret = mlx5_flow_validate_item_ipv4(items, item_flags, + error); + if (ret < 0) + return ret; + ret = flow_tcf_validate_vxlan_encap_ipv4(items, error); + if (ret < 0) + return ret; + item_flags |= MLX5_FLOW_LAYER_OUTER_L3_IPV4; + break; + case RTE_FLOW_ITEM_TYPE_IPV6: + ret = mlx5_flow_validate_item_ipv6(items, item_flags, + error); + if (ret < 0) + return ret; + ret = flow_tcf_validate_vxlan_encap_ipv6(items, error); + if (ret < 0) + return ret; + item_flags |= MLX5_FLOW_LAYER_OUTER_L3_IPV6; + break; + case RTE_FLOW_ITEM_TYPE_UDP: + ret = mlx5_flow_validate_item_udp(items, item_flags, + 0xFF, error); + if (ret < 0) + return ret; + ret = flow_tcf_validate_vxlan_encap_udp(items, error); + if (ret < 0) + return ret; + item_flags |= MLX5_FLOW_LAYER_OUTER_L4_UDP; + break; + case RTE_FLOW_ITEM_TYPE_VXLAN: + ret = mlx5_flow_validate_item_vxlan(items, + item_flags, error); + if (ret < 0) + return ret; + ret = flow_tcf_validate_vxlan_encap_vni(items, error); + if (ret < 0) + return ret; + item_flags |= MLX5_FLOW_LAYER_VXLAN; + break; + default: + return rte_flow_error_set(error, ENOTSUP, + RTE_FLOW_ERROR_TYPE_ITEM, items, + "VXLAN encap item not supported"); + } + } + if (!(item_flags & MLX5_FLOW_LAYER_OUTER_L3)) + return rte_flow_error_set(error, EINVAL, + RTE_FLOW_ERROR_TYPE_ACTION, action, + "no outer L3 layer found" + " for VXLAN encapsulation"); + if (!(item_flags & MLX5_FLOW_LAYER_OUTER_L4_UDP)) + return rte_flow_error_set(error, EINVAL, + RTE_FLOW_ERROR_TYPE_ACTION, action, + "no outer L4 layer found" + " for VXLAN encapsulation"); + if (!(item_flags & MLX5_FLOW_LAYER_VXLAN)) + return rte_flow_error_set(error, EINVAL, + RTE_FLOW_ERROR_TYPE_ACTION, action, + "no VXLAN VNI found" + " for VXLAN encapsulation"); + return 0; +} + +/** + * Validate VXLAN_DECAP action outer tunnel items for E-Switch. + * + * @param[in] item_flags + * Mask of the provided outer tunnel parameters. + * @param[in] action + * Pointer to the VXLAN_DECAP action structure. + * @param[in] ipv4 + * Outer IPv4 address item (if any, NULL otherwise). + * @param[in] ipv6 + * Outer IPv6 address item (if any, NULL otherwise). + * @param[in] udp + * Outer UDP layer item (if any, NULL otherwise). + * @param[out] error + * Pointer to the error structure. + * + * @return + * 0 on success, a negative errno value otherwise and rte_errno is set.
+ **/ +static int +flow_tcf_validate_vxlan_decap(uint32_t item_flags, + const struct rte_flow_action *action, + const struct rte_flow_item *ipv4, + const struct rte_flow_item *ipv6, + const struct rte_flow_item *udp, + struct rte_flow_error *error) +{ + if (!ipv4 && !ipv6) + return rte_flow_error_set(error, EINVAL, + RTE_FLOW_ERROR_TYPE_ACTION, action, + "no outer L3 layer found" + " for VXLAN decapsulation"); + if (ipv4) { + const struct rte_flow_item_ipv4 *spec = ipv4->spec; + const struct rte_flow_item_ipv4 *mask = ipv4->mask; + + if (!spec) + /* + * Specification for L3 addresses cannot be empty + * because it is required as decap parameter. + */ + return rte_flow_error_set(error, EINVAL, + RTE_FLOW_ERROR_TYPE_ITEM, ipv4, + "NULL outer L3 address specification" + " for VXLAN decapsulation"); + if (!mask) + mask = &rte_flow_item_ipv4_mask; + if (mask->hdr.dst_addr != RTE_BE32(0x00000000)) { + if (mask->hdr.dst_addr != RTE_BE32(0xffffffff)) + return rte_flow_error_set(error, ENOTSUP, + RTE_FLOW_ERROR_TYPE_ITEM_MASK, mask, + "no support for partial mask on" + " \"ipv4.hdr.dst_addr\" field"); + /* More L3 address validations can be put here. */ + } else { + /* + * Kernel uses the destination L3 address + * to determine the ingress network interface + * for traffic being decapsulated. + */ + return rte_flow_error_set(error, EINVAL, + RTE_FLOW_ERROR_TYPE_ITEM, ipv4, + "outer L3 destination address must be " + "specified for VXLAN decapsulation"); + } + /* Source L3 address is optional for decap. */ + if (mask->hdr.src_addr != RTE_BE32(0x00000000)) + if (mask->hdr.src_addr != RTE_BE32(0xffffffff)) + return rte_flow_error_set(error, ENOTSUP, + RTE_FLOW_ERROR_TYPE_ITEM_MASK, mask, + "no support for partial mask on" + " \"ipv4.hdr.src_addr\" field"); + } else { + const struct rte_flow_item_ipv6 *spec = ipv6->spec; + const struct rte_flow_item_ipv6 *mask = ipv6->mask; + + if (!spec) + /* + * Specification for L3 addresses cannot be empty + * because it is required as decap parameter. + */ + return rte_flow_error_set(error, EINVAL, + RTE_FLOW_ERROR_TYPE_ITEM, ipv6, + "NULL outer L3 address specification" + " for VXLAN decapsulation"); + if (!mask) + mask = &rte_flow_item_ipv6_mask; + if (memcmp(&mask->hdr.dst_addr, + &flow_tcf_mask_empty.ipv6.hdr.dst_addr, + sizeof(flow_tcf_mask_empty.ipv6.hdr.dst_addr))) { + if (memcmp(&mask->hdr.dst_addr, + &rte_flow_item_ipv6_mask.hdr.dst_addr, + sizeof(rte_flow_item_ipv6_mask.hdr.dst_addr))) + return rte_flow_error_set(error, ENOTSUP, + RTE_FLOW_ERROR_TYPE_ITEM_MASK, mask, + "no support for partial mask on" + " \"ipv6.hdr.dst_addr\" field"); + /* More L3 address validations can be put here. */ + } else { + /* + * Kernel uses the destination L3 address + * to determine the ingress network interface + * for traffic being decapsulated. + */ + return rte_flow_error_set(error, EINVAL, + RTE_FLOW_ERROR_TYPE_ITEM, ipv6, + "outer L3 destination address must be " + "specified for VXLAN decapsulation"); + } + /* Source L3 address is optional for decap.
*/ + if (memcmp(&mask->hdr.src_addr, + &flow_tcf_mask_empty.ipv6.hdr.src_addr, + sizeof(flow_tcf_mask_empty.ipv6.hdr.src_addr))) { + if (memcmp(&mask->hdr.src_addr, + &rte_flow_item_ipv6_mask.hdr.src_addr, + sizeof(mask->hdr.src_addr))) + return rte_flow_error_set(error, ENOTSUP, + RTE_FLOW_ERROR_TYPE_ITEM_MASK, mask, + "no support for partial mask on" + " \"ipv6.hdr.src_addr\" field"); + } + } + if (!udp) { + return rte_flow_error_set(error, EINVAL, + RTE_FLOW_ERROR_TYPE_ACTION, action, + "no outer L4 layer found" + " for VXLAN decapsulation"); + } else { + const struct rte_flow_item_udp *spec = udp->spec; + const struct rte_flow_item_udp *mask = udp->mask; + + if (!spec) + /* + * Specification for UDP ports cannot be empty + * because it is required as decap parameter. + */ + return rte_flow_error_set(error, EINVAL, + RTE_FLOW_ERROR_TYPE_ITEM, udp, + "NULL UDP port specification" + " for VXLAN decapsulation"); + if (!mask) + mask = &rte_flow_item_udp_mask; + if (mask->hdr.dst_port != RTE_BE16(0x0000)) { + if (mask->hdr.dst_port != RTE_BE16(0xffff)) + return rte_flow_error_set(error, ENOTSUP, + RTE_FLOW_ERROR_TYPE_ITEM_MASK, mask, + "no support for partial mask on" + " \"udp.hdr.dst_port\" field"); + if (!spec->hdr.dst_port) + return rte_flow_error_set(error, EINVAL, + RTE_FLOW_ERROR_TYPE_ITEM, udp, + "zero decap local UDP port"); + } else { + return rte_flow_error_set(error, EINVAL, + RTE_FLOW_ERROR_TYPE_ITEM, udp, + "outer UDP destination port must be " + "specified for VXLAN decapsulation"); + } + if (mask->hdr.src_port != RTE_BE16(0x0000)) { + if (mask->hdr.src_port != RTE_BE16(0xffff)) + return rte_flow_error_set(error, ENOTSUP, + RTE_FLOW_ERROR_TYPE_ITEM_MASK, mask, + "no support for partial mask on" + " \"udp.hdr.src_port\" field"); + DRV_LOG(WARNING, + "outer UDP local port cannot be " + "forced for VXLAN decapsulation, " + "parameter ignored"); + } + } + if (!(item_flags & MLX5_FLOW_LAYER_VXLAN)) + return rte_flow_error_set(error, EINVAL, + RTE_FLOW_ERROR_TYPE_ACTION, action, + "no VXLAN VNI found" + " for VXLAN decapsulation"); + /* VNI is already validated, extra check can be put here. */ + return 0; +} +/** * Validate flow for E-Switch. * * @param[in] priv @@ -589,6 +1213,7 @@ struct flow_tcf_ptoi { const struct rte_flow_item_ipv6 *ipv6; const struct rte_flow_item_tcp *tcp; const struct rte_flow_item_udp *udp; + const struct rte_flow_item_vxlan *vxlan; } spec, mask; union { const struct rte_flow_action_port_id *port_id; @@ -597,7 +1222,11 @@ struct flow_tcf_ptoi { of_set_vlan_vid; const struct rte_flow_action_of_set_vlan_pcp * of_set_vlan_pcp; + const struct rte_flow_action_vxlan_encap *vxlan_encap; } conf; + const struct rte_flow_item *ipv4 = NULL; /* storage to check */ + const struct rte_flow_item *ipv6 = NULL; /* outer tunnel. */ + const struct rte_flow_item *udp = NULL; /* parameters.
*/ uint32_t item_flags = 0; uint32_t action_flags = 0; uint8_t next_protocol = -1; @@ -724,7 +1353,6 @@ struct flow_tcf_ptoi { error); if (ret < 0) return ret; - item_flags |= MLX5_FLOW_LAYER_OUTER_L3_IPV4; mask.ipv4 = flow_tcf_item_mask (items, &rte_flow_item_ipv4_mask, &flow_tcf_mask_supported.ipv4, @@ -745,13 +1373,22 @@ struct flow_tcf_ptoi { next_protocol = ((const struct rte_flow_item_ipv4 *) (items->spec))->hdr.next_proto_id; + if (item_flags & MLX5_FLOW_LAYER_OUTER_L3_IPV4) { + /* + * Multiple outer items are not allowed as + * tunnel parameters. + */ + ipv4 = NULL; + } else { + ipv4 = items; + item_flags |= MLX5_FLOW_LAYER_OUTER_L3_IPV4; + } break; case RTE_FLOW_ITEM_TYPE_IPV6: ret = mlx5_flow_validate_item_ipv6(items, item_flags, error); if (ret < 0) return ret; - item_flags |= MLX5_FLOW_LAYER_OUTER_L3_IPV6; mask.ipv6 = flow_tcf_item_mask (items, &rte_flow_item_ipv6_mask, &flow_tcf_mask_supported.ipv6, @@ -772,13 +1409,22 @@ struct flow_tcf_ptoi { next_protocol = ((const struct rte_flow_item_ipv6 *) (items->spec))->hdr.proto; + if (item_flags & MLX5_FLOW_LAYER_OUTER_L3_IPV6) { + /* + * Multiple outer items are not allowed as + * tunnel parameters. + */ + ipv6 = NULL; + } else { + ipv6 = items; + item_flags |= MLX5_FLOW_LAYER_OUTER_L3_IPV6; + } break; case RTE_FLOW_ITEM_TYPE_UDP: ret = mlx5_flow_validate_item_udp(items, item_flags, next_protocol, error); if (ret < 0) return ret; - item_flags |= MLX5_FLOW_LAYER_OUTER_L4_UDP; mask.udp = flow_tcf_item_mask (items, &rte_flow_item_udp_mask, &flow_tcf_mask_supported.udp, @@ -787,13 +1433,18 @@ struct flow_tcf_ptoi { error); if (!mask.udp) return -rte_errno; + if (item_flags & MLX5_FLOW_LAYER_OUTER_L4_UDP) { + udp = NULL; + } else { + udp = items; + item_flags |= MLX5_FLOW_LAYER_OUTER_L4_UDP; + } break; case RTE_FLOW_ITEM_TYPE_TCP: ret = mlx5_flow_validate_item_tcp(items, item_flags, next_protocol, error); if (ret < 0) return ret; - item_flags |= MLX5_FLOW_LAYER_OUTER_L4_TCP; mask.tcp = flow_tcf_item_mask (items, &rte_flow_item_tcp_mask, &flow_tcf_mask_supported.tcp, @@ -802,6 +1453,31 @@ struct flow_tcf_ptoi { error); if (!mask.tcp) return -rte_errno; + item_flags |= MLX5_FLOW_LAYER_OUTER_L4_TCP; + break; + case RTE_FLOW_ITEM_TYPE_VXLAN: + ret = mlx5_flow_validate_item_vxlan(items, + item_flags, error); + if (ret < 0) + return ret; + mask.vxlan = flow_tcf_item_mask + (items, &rte_flow_item_vxlan_mask, + &flow_tcf_mask_supported.vxlan, + &flow_tcf_mask_empty.vxlan, + sizeof(flow_tcf_mask_supported.vxlan), + error); + if (!mask.vxlan) + return -rte_errno; + if (mask.vxlan->vni[0] != 0xff || + mask.vxlan->vni[1] != 0xff || + mask.vxlan->vni[2] != 0xff) + return rte_flow_error_set + (error, ENOTSUP, + RTE_FLOW_ERROR_TYPE_ITEM_MASK, + mask.vxlan, + "no support for partial or " + "empty mask on \"vxlan.vni\" field"); + item_flags |= MLX5_FLOW_LAYER_VXLAN; break; default: return rte_flow_error_set(error, ENOTSUP, @@ -857,6 +1533,33 @@ struct flow_tcf_ptoi { case RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_PCP: action_flags |= MLX5_ACTION_OF_SET_VLAN_PCP; break; + case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP: + if (action_flags & (MLX5_ACTION_VXLAN_ENCAP + | MLX5_ACTION_VXLAN_DECAP)) + return rte_flow_error_set + (error, ENOTSUP, + RTE_FLOW_ERROR_TYPE_ACTION, actions, + "can't have multiple vxlan actions"); + ret = flow_tcf_validate_vxlan_encap(actions, error); + if (ret < 0) + return ret; + action_flags |= MLX5_ACTION_VXLAN_ENCAP; + break; + case RTE_FLOW_ACTION_TYPE_VXLAN_DECAP: + if (action_flags & (MLX5_ACTION_VXLAN_ENCAP + | MLX5_ACTION_VXLAN_DECAP)) + return
rte_flow_error_set + (error, ENOTSUP, + RTE_FLOW_ERROR_TYPE_ACTION, actions, + "can't have multiple vxlan actions"); + ret = flow_tcf_validate_vxlan_decap(item_flags, + actions, + ipv4, ipv6, udp, + error); + if (ret < 0) + return ret; + action_flags |= MLX5_ACTION_VXLAN_DECAP; + break; default: return rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION, @@ -864,6 +1567,12 @@ struct flow_tcf_ptoi { "action not supported"); } } + if ((item_flags & MLX5_FLOW_LAYER_VXLAN) && + !(action_flags & MLX5_ACTION_VXLAN_DECAP)) + return rte_flow_error_set(error, ENOTSUP, + RTE_FLOW_ERROR_TYPE_ACTION, NULL, + "VNI pattern should be followed" + " by VXLAN_DECAP action"); return 0; }

From patchwork Tue Oct 2 06:30:40 2018
X-Patchwork-Submitter: Slava Ovsiienko
X-Patchwork-Id: 45803
X-Patchwork-Delegate: shahafs@mellanox.com
From: Slava Ovsiienko
To: "dev@dpdk.org"
CC: Shahaf Shuler, Slava Ovsiienko
Date: Tue, 2 Oct 2018 06:30:40 +0000
Message-ID: <1538461807-37507-4-git-send-email-viacheslavo@mellanox.com>
References: <1538461807-37507-1-git-send-email-viacheslavo@mellanox.com>
In-Reply-To: <1538461807-37507-1-git-send-email-viacheslavo@mellanox.com>
Subject: [dpdk-dev] [PATCH 4/5] net/mlx5: e-switch VXLAN flow translation routine

This part of the patchset adds support for VXLAN-related items and
actions to the flow translation routine. If any of them is specified
in a rule, extra space for the tunnel description structure is
allocated. Other tunnel types besides VXLAN (e.g. GRE) can be added
later. No VTEP devices are created at this point; the flow rule is
only translated, not yet applied. An illustrative sketch of the
size-estimation pass is shown below.
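For illustration only (not part of the patch), the size-estimation pass
the translation relies on. The SZ_NLATTR_TYPE_OF_UINT* expansions below
are assumptions; the patch only shows their call sites, the pre-existing
SZ_NLATTR_* helpers come from mlx5_flow_tcf.c:

    /* Assumed helpers: aligned size of one Netlink attribute of a type. */
    #define SZ_NLATTR_HDR            MNL_ALIGN(sizeof(struct nlattr))
    #define SZ_NLATTR_DATA_OF(len)   MNL_ALIGN(SZ_NLATTR_HDR + (len))
    #define SZ_NLATTR_TYPE_OF_UINT32 SZ_NLATTR_DATA_OF(sizeof(uint32_t))

    /* An IPv4 flower match sums the ether type, the IP proto and the
     * dst/src addresses with their masks; a VXLAN item only adds one
     * 32-bit attribute for the VNI key. */
    size += SZ_NLATTR_TYPE_OF_UINT16 +    /* Ether type. */
            SZ_NLATTR_TYPE_OF_UINT8 +     /* IP proto. */
            SZ_NLATTR_TYPE_OF_UINT32 * 4; /* dst/src IP addr and mask. */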
Subject: [dpdk-dev] [PATCH 4/5] net/mlx5: e-switch VXLAN flow translation routine

This part of the patchset adds support for the VXLAN-related items and
actions to the flow translation routine. If any of them is specified in
a rule, extra space for the tunnel description structure is allocated.
Other tunnel types (GRE, for example) can be added later. No VTEP
devices are created at this point; the flow rule is only translated,
not yet applied.

Suggested-by: Adrien Mazarguil
Signed-off-by: Viacheslav Ovsiienko
---
 drivers/net/mlx5/mlx5_flow_tcf.c | 671 ++++++++++++++++++++++++++++++++++-----
 1 file changed, 591 insertions(+), 80 deletions(-)
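To make the input of this translation concrete, here is a minimal sketch
of the kind of VXLAN_ENCAP action the new code consumes. It is not part
of the patch; the variable names, addresses and VNI value are made up
for illustration only:

#include <rte_flow.h>
#include <rte_byteorder.h>

/* Hypothetical encapsulation header description for the example. */
static struct rte_flow_item_eth enc_eth = {
	.dst.addr_bytes = { 0x00, 0x16, 0x3e, 0x00, 0x00, 0x01 },
};
static struct rte_flow_item_ipv4 enc_ipv4 = {
	.hdr.src_addr = RTE_BE32(0xc0a80101), /* 192.168.1.1 */
	.hdr.dst_addr = RTE_BE32(0xc0a80102), /* 192.168.1.2 */
};
static struct rte_flow_item_udp enc_udp = {
	.hdr.dst_port = RTE_BE16(4789), /* IANA-assigned VXLAN port. */
};
static struct rte_flow_item_vxlan enc_vxlan = {
	.vni = { 0x00, 0x12, 0x34 },
};
/* Item list describing the outer headers, terminated by an END item. */
static struct rte_flow_item enc_pattern[] = {
	{ .type = RTE_FLOW_ITEM_TYPE_ETH, .spec = &enc_eth },
	{ .type = RTE_FLOW_ITEM_TYPE_IPV4, .spec = &enc_ipv4 },
	{ .type = RTE_FLOW_ITEM_TYPE_UDP, .spec = &enc_udp },
	{ .type = RTE_FLOW_ITEM_TYPE_VXLAN, .spec = &enc_vxlan },
	{ .type = RTE_FLOW_ITEM_TYPE_END },
};
static const struct rte_flow_action_vxlan_encap enc_conf = {
	.definition = enc_pattern,
};
static const struct rte_flow_action enc_action = {
	.type = RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP,
	.conf = &enc_conf,
};

The flow_tcf_parse_vxlan_encap_*() helpers added below walk exactly such
an item list and gather the spec fields into one consolidated structure.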
diff --git a/drivers/net/mlx5/mlx5_flow_tcf.c b/drivers/net/mlx5/mlx5_flow_tcf.c
index 97451bd..dfffc50 100644
--- a/drivers/net/mlx5/mlx5_flow_tcf.c
+++ b/drivers/net/mlx5/mlx5_flow_tcf.c
@@ -1597,7 +1597,7 @@ struct flow_tcf_ptoi {
 	size += SZ_NLATTR_STRZ_OF("flower") +
 		SZ_NLATTR_NEST + /* TCA_OPTIONS. */
-		SZ_NLATTR_TYPE_OF(uint32_t); /* TCA_CLS_FLAGS_SKIP_SW. */
+		SZ_NLATTR_TYPE_OF_UINT32; /* TCA_CLS_FLAGS_SKIP_SW. */
 	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
 		switch (items->type) {
 		case RTE_FLOW_ITEM_TYPE_VOID:
@@ -1605,45 +1605,49 @@ struct flow_tcf_ptoi {
 		case RTE_FLOW_ITEM_TYPE_PORT_ID:
 			break;
 		case RTE_FLOW_ITEM_TYPE_ETH:
-			size += SZ_NLATTR_TYPE_OF(uint16_t) + /* Ether type. */
+			size += SZ_NLATTR_TYPE_OF_UINT16 + /* Ether type. */
 				SZ_NLATTR_DATA_OF(ETHER_ADDR_LEN) * 4;
 				/* dst/src MAC addr and mask. */
 			flags |= MLX5_FLOW_LAYER_OUTER_L2;
 			break;
 		case RTE_FLOW_ITEM_TYPE_VLAN:
-			size += SZ_NLATTR_TYPE_OF(uint16_t) + /* Ether type. */
-				SZ_NLATTR_TYPE_OF(uint16_t) +
+			size += SZ_NLATTR_TYPE_OF_UINT16 + /* Ether type. */
+				SZ_NLATTR_TYPE_OF_UINT16 +
 				/* VLAN Ether type. */
-				SZ_NLATTR_TYPE_OF(uint8_t) + /* VLAN prio. */
-				SZ_NLATTR_TYPE_OF(uint16_t); /* VLAN ID. */
+				SZ_NLATTR_TYPE_OF_UINT8 + /* VLAN prio. */
+				SZ_NLATTR_TYPE_OF_UINT16; /* VLAN ID. */
 			flags |= MLX5_FLOW_LAYER_OUTER_VLAN;
 			break;
 		case RTE_FLOW_ITEM_TYPE_IPV4:
-			size += SZ_NLATTR_TYPE_OF(uint16_t) + /* Ether type. */
-				SZ_NLATTR_TYPE_OF(uint8_t) + /* IP proto. */
-				SZ_NLATTR_TYPE_OF(uint32_t) * 4;
+			size += SZ_NLATTR_TYPE_OF_UINT16 + /* Ether type. */
+				SZ_NLATTR_TYPE_OF_UINT8 + /* IP proto. */
+				SZ_NLATTR_TYPE_OF_UINT32 * 4;
 				/* dst/src IP addr and mask. */
 			flags |= MLX5_FLOW_LAYER_OUTER_L3_IPV4;
 			break;
 		case RTE_FLOW_ITEM_TYPE_IPV6:
-			size += SZ_NLATTR_TYPE_OF(uint16_t) + /* Ether type. */
-				SZ_NLATTR_TYPE_OF(uint8_t) + /* IP proto. */
+			size += SZ_NLATTR_TYPE_OF_UINT16 + /* Ether type. */
+				SZ_NLATTR_TYPE_OF_UINT8 + /* IP proto. */
 				SZ_NLATTR_TYPE_OF(IPV6_ADDR_LEN) * 4;
 				/* dst/src IP addr and mask. */
 			flags |= MLX5_FLOW_LAYER_OUTER_L3_IPV6;
 			break;
 		case RTE_FLOW_ITEM_TYPE_UDP:
-			size += SZ_NLATTR_TYPE_OF(uint8_t) + /* IP proto. */
-				SZ_NLATTR_TYPE_OF(uint16_t) * 4;
+			size += SZ_NLATTR_TYPE_OF_UINT8 + /* IP proto. */
+				SZ_NLATTR_TYPE_OF_UINT16 * 4;
 				/* dst/src port and mask. */
 			flags |= MLX5_FLOW_LAYER_OUTER_L4_UDP;
 			break;
 		case RTE_FLOW_ITEM_TYPE_TCP:
-			size += SZ_NLATTR_TYPE_OF(uint8_t) + /* IP proto. */
-				SZ_NLATTR_TYPE_OF(uint16_t) * 4;
+			size += SZ_NLATTR_TYPE_OF_UINT8 + /* IP proto. */
+				SZ_NLATTR_TYPE_OF_UINT16 * 4;
 				/* dst/src port and mask. */
 			flags |= MLX5_FLOW_LAYER_OUTER_L4_TCP;
 			break;
+		case RTE_FLOW_ITEM_TYPE_VXLAN:
+			size += SZ_NLATTR_TYPE_OF_UINT32;
+			flags |= MLX5_FLOW_LAYER_VXLAN;
+			break;
 		default:
 			DRV_LOG(WARNING,
 				"unsupported item %p type %d,"
@@ -1657,6 +1661,265 @@ struct flow_tcf_ptoi {
 }
 
 /**
+ * Helper function to process RTE_FLOW_ITEM_TYPE_ETH entry in configuration
+ * of action RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP. Fills the MAC address fields
+ * in the encapsulation parameters structure. The item must be prevalidated;
+ * the function performs no validation checks of its own.
+ *
+ * @param[in] spec
+ *   RTE_FLOW_ITEM_TYPE_ETH entry specification.
+ * @param[in] mask
+ *   RTE_FLOW_ITEM_TYPE_ETH entry mask.
+ * @param[out] encap
+ *   Structure to fill the gathered MAC address data.
+ *
+ * @return
+ *   The size needed in the Netlink message tunnel_key
+ *   parameter buffer to store the item attributes.
+ */
+static int
+flow_tcf_parse_vxlan_encap_eth(const struct rte_flow_item_eth *spec,
+			       const struct rte_flow_item_eth *mask,
+			       struct mlx5_flow_tcf_vxlan_encap *encap)
+{
+	/* Item must be validated before. No redundant checks. */
+	assert(spec);
+	if (!mask || !memcmp(&mask->dst,
+			     &rte_flow_item_eth_mask.dst,
+			     sizeof(rte_flow_item_eth_mask.dst))) {
+		/*
+		 * Ethernet addresses are not supported by
+		 * tc as tunnel_key parameters. Destination
+		 * address is needed to form encap packet
+		 * header and retrieved by kernel from
+		 * implicit sources (ARP table, etc.),
+		 * address masks are not supported at all.
+		 */
+		encap->eth.dst = spec->dst;
+		encap->mask |= MLX5_FLOW_TCF_ENCAP_ETH_DST;
+	}
+	if (!mask || !memcmp(&mask->src,
+			     &rte_flow_item_eth_mask.src,
+			     sizeof(rte_flow_item_eth_mask.src))) {
+		/*
+		 * Ethernet addresses are not supported by
+		 * tc as tunnel_key parameters. Source ethernet
+		 * address is ignored anyway.
+		 */
+		encap->eth.src = spec->src;
+		encap->mask |= MLX5_FLOW_TCF_ENCAP_ETH_SRC;
+	}
+	/*
+	 * No space allocated for ethernet addresses within Netlink
+	 * message tunnel_key record - these ones are not
+	 * supported by tc.
+	 */
+	return 0;
+}
+
+/**
+ * Helper function to process RTE_FLOW_ITEM_TYPE_IPV4 entry in configuration
+ * of action RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP. Fills the IPV4 address fields
+ * in the encapsulation parameters structure. The item must be prevalidated;
+ * the function performs no validation checks of its own.
+ *
+ * @param[in] spec
+ *   RTE_FLOW_ITEM_TYPE_IPV4 entry specification.
+ * @param[out] encap
+ *   Structure to fill the gathered IPV4 address data.
+ *
+ * @return
+ *   The size needed in the Netlink message tunnel_key
+ *   parameter buffer to store the item attributes.
+ */
+static int
+flow_tcf_parse_vxlan_encap_ipv4(const struct rte_flow_item_ipv4 *spec,
+				struct mlx5_flow_tcf_vxlan_encap *encap)
+{
+	/* Item must be validated before. No redundant checks. */
+	assert(spec);
+	encap->ipv4.dst = spec->hdr.dst_addr;
+	encap->ipv4.src = spec->hdr.src_addr;
+	encap->mask |= MLX5_FLOW_TCF_ENCAP_IPV4_SRC |
+		       MLX5_FLOW_TCF_ENCAP_IPV4_DST;
+	return SZ_NLATTR_TYPE_OF_UINT32 * 2;
+}
+
+/**
+ * Helper function to process RTE_FLOW_ITEM_TYPE_IPV6 entry in configuration
+ * of action RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP. Fills the IPV6 address fields
+ * in the encapsulation parameters structure. The item must be prevalidated;
+ * the function performs no validation checks of its own.
+ *
+ * @param[in] spec
+ *   RTE_FLOW_ITEM_TYPE_IPV6 entry specification.
+ * @param[out] encap
+ *   Structure to fill the gathered IPV6 address data.
+ *
+ * @return
+ *   The size needed in the Netlink message tunnel_key
+ *   parameter buffer to store the item attributes.
+ */
+static int
+flow_tcf_parse_vxlan_encap_ipv6(const struct rte_flow_item_ipv6 *spec,
+				struct mlx5_flow_tcf_vxlan_encap *encap)
+{
+	/* Item must be validated before. No redundant checks. */
+	assert(spec);
+	memcpy(encap->ipv6.dst, spec->hdr.dst_addr, sizeof(encap->ipv6.dst));
+	memcpy(encap->ipv6.src, spec->hdr.src_addr, sizeof(encap->ipv6.src));
+	encap->mask |= MLX5_FLOW_TCF_ENCAP_IPV6_SRC |
+		       MLX5_FLOW_TCF_ENCAP_IPV6_DST;
+	return SZ_NLATTR_TYPE_OF(IPV6_ADDR_LEN) * 2;
+}
+
+/**
+ * Helper function to process RTE_FLOW_ITEM_TYPE_UDP entry in configuration
+ * of action RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP. Fills the UDP port fields
+ * in the encapsulation parameters structure. The item must be prevalidated;
+ * the function performs no validation checks of its own.
+ *
+ * @param[in] spec
+ *   RTE_FLOW_ITEM_TYPE_UDP entry specification.
+ * @param[in] mask
+ *   RTE_FLOW_ITEM_TYPE_UDP entry mask.
+ * @param[out] encap
+ *   Structure to fill the gathered UDP port data.
+ *
+ * @return
+ *   The size needed in the Netlink message tunnel_key
+ *   parameter buffer to store the item attributes.
+ */
+static int
+flow_tcf_parse_vxlan_encap_udp(const struct rte_flow_item_udp *spec,
+			       const struct rte_flow_item_udp *mask,
+			       struct mlx5_flow_tcf_vxlan_encap *encap)
+{
+	int size = SZ_NLATTR_TYPE_OF_UINT16;
+
+	assert(spec);
+	encap->udp.dst = spec->hdr.dst_port;
+	encap->mask |= MLX5_FLOW_TCF_ENCAP_UDP_DST;
+	if (!mask || mask->hdr.src_port != RTE_BE16(0x0000)) {
+		encap->udp.src = spec->hdr.src_port;
+		size += SZ_NLATTR_TYPE_OF_UINT16;
+		encap->mask |= MLX5_FLOW_TCF_ENCAP_UDP_SRC;
+	}
+	return size;
+}
+
+/**
+ * Helper function to process RTE_FLOW_ITEM_TYPE_VXLAN entry in configuration
+ * of action RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP. Fills the VNI fields
+ * in the encapsulation parameters structure. The item must be prevalidated;
+ * the function performs no validation checks of its own.
+ *
+ * @param[in] spec
+ *   RTE_FLOW_ITEM_TYPE_VXLAN entry specification.
+ * @param[out] encap
+ *   Structure to fill the gathered VNI address data.
+ *
+ * @return
+ *   The size needed in the Netlink message tunnel_key
+ *   parameter buffer to store the item attributes.
+ */
+static int
+flow_tcf_parse_vxlan_encap_vni(const struct rte_flow_item_vxlan *spec,
+			       struct mlx5_flow_tcf_vxlan_encap *encap)
+{
+	/* Item must be validated before. No redundant checks. */
+	assert(spec);
+	memcpy(encap->vxlan.vni, spec->vni, sizeof(encap->vxlan.vni));
+	encap->mask |= MLX5_FLOW_TCF_ENCAP_VXLAN_VNI;
+	return SZ_NLATTR_TYPE_OF_UINT32;
+}
+
+/**
+ * Populate consolidated encapsulation object from list of pattern items.
+ *
+ * Helper function to process configuration of action such as
+ * RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP. The item list should be
+ * validated; there is no way to return a meaningful error.
+ *
+ * @param[in] action
+ *   RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP action object.
+ *   List of pattern items to gather data from.
+ * @param[out] encap
+ *   Structure to fill gathered data.
+ *
+ * @return
+ *   The size of the part of the Netlink message buffer to store the
+ *   item attributes on success, zero otherwise. The mask field in
+ *   the result structure reflects correctly parsed items.
+ */ +static int +flow_tcf_vxlan_encap_parse(const struct rte_flow_action *action, + struct mlx5_flow_tcf_vxlan_encap *encap) +{ + union { + const struct rte_flow_item_eth *eth; + const struct rte_flow_item_ipv4 *ipv4; + const struct rte_flow_item_ipv6 *ipv6; + const struct rte_flow_item_udp *udp; + const struct rte_flow_item_vxlan *vxlan; + } spec, mask; + const struct rte_flow_item *items; + int size = 0; + + assert(action->type == RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP); + assert(action->conf); + + items = ((const struct rte_flow_action_vxlan_encap *) + action->conf)->definition; + assert(items); + for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) { + switch (items->type) { + case RTE_FLOW_ITEM_TYPE_VOID: + break; + case RTE_FLOW_ITEM_TYPE_ETH: + mask.eth = items->mask; + spec.eth = items->spec; + size += flow_tcf_parse_vxlan_encap_eth(spec.eth, + mask.eth, + encap); + break; + case RTE_FLOW_ITEM_TYPE_IPV4: + spec.ipv4 = items->spec; + size += flow_tcf_parse_vxlan_encap_ipv4(spec.ipv4, + encap); + break; + case RTE_FLOW_ITEM_TYPE_IPV6: + spec.ipv6 = items->spec; + size += flow_tcf_parse_vxlan_encap_ipv6(spec.ipv6, + encap); + break; + case RTE_FLOW_ITEM_TYPE_UDP: + mask.udp = items->mask; + spec.udp = items->spec; + size += flow_tcf_parse_vxlan_encap_udp(spec.udp, + mask.udp, + encap); + break; + case RTE_FLOW_ITEM_TYPE_VXLAN: + spec.vxlan = items->spec; + size += flow_tcf_parse_vxlan_encap_vni(spec.vxlan, + encap); + break; + default: + assert(false); + DRV_LOG(WARNING, + "unsupported item %p type %d," + " items must be validated" + " before flow creation", + (const void *)items, items->type); + encap->mask = 0; + return 0; + } + } + return size; +} + +/** * Calculate maximum size of memory for flow actions of Linux TC flower and * extract specified actions. * @@ -1664,13 +1927,16 @@ struct flow_tcf_ptoi { * Pointer to the list of actions. * @param[out] action_flags * Pointer to the detected actions. + * @param[out] tunnel + * Pointer to tunnel encapsulation parameters structure to fill. * * @return * Maximum size of memory for actions. */ static int flow_tcf_get_actions_and_size(const struct rte_flow_action actions[], - uint64_t *action_flags) + uint64_t *action_flags, + void *tunnel) { int size = 0; uint64_t flags = 0; @@ -1684,14 +1950,14 @@ struct flow_tcf_ptoi { size += SZ_NLATTR_NEST + /* na_act_index. */ SZ_NLATTR_STRZ_OF("mirred") + SZ_NLATTR_NEST + /* TCA_ACT_OPTIONS. */ - SZ_NLATTR_TYPE_OF(struct tc_mirred); + SZ_NLATTR_TYPE_OF_STRUCT(tc_mirred); flags |= MLX5_ACTION_PORT_ID; break; case RTE_FLOW_ACTION_TYPE_DROP: size += SZ_NLATTR_NEST + /* na_act_index. */ SZ_NLATTR_STRZ_OF("gact") + SZ_NLATTR_NEST + /* TCA_ACT_OPTIONS. */ - SZ_NLATTR_TYPE_OF(struct tc_gact); + SZ_NLATTR_TYPE_OF_STRUCT(tc_gact); flags |= MLX5_ACTION_DROP; break; case RTE_FLOW_ACTION_TYPE_OF_POP_VLAN: @@ -1710,11 +1976,34 @@ struct flow_tcf_ptoi { size += SZ_NLATTR_NEST + /* na_act_index. */ SZ_NLATTR_STRZ_OF("vlan") + SZ_NLATTR_NEST + /* TCA_ACT_OPTIONS. */ - SZ_NLATTR_TYPE_OF(struct tc_vlan) + - SZ_NLATTR_TYPE_OF(uint16_t) + + SZ_NLATTR_TYPE_OF_STRUCT(tc_vlan) + + SZ_NLATTR_TYPE_OF_UINT16 + /* VLAN protocol. */ - SZ_NLATTR_TYPE_OF(uint16_t) + /* VLAN ID. */ - SZ_NLATTR_TYPE_OF(uint8_t); /* VLAN prio. */ + SZ_NLATTR_TYPE_OF_UINT16 + /* VLAN ID. */ + SZ_NLATTR_TYPE_OF_UINT8; /* VLAN prio. */ + break; + case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP: + size += SZ_NLATTR_NEST + /* na_act_index. */ + SZ_NLATTR_STRZ_OF("tunnel_key") + + SZ_NLATTR_NEST + /* TCA_ACT_OPTIONS. 
*/ + SZ_NLATTR_TYPE_OF_UINT8 + /* no UDP sum */ + SZ_NLATTR_TYPE_OF_STRUCT(tc_tunnel_key) + + flow_tcf_vxlan_encap_parse(actions, tunnel) + + RTE_ALIGN_CEIL /* preceding encap params. */ + (sizeof(struct mlx5_flow_tcf_vxlan_encap), + MNL_ALIGNTO); + flags |= MLX5_ACTION_VXLAN_ENCAP; + break; + case RTE_FLOW_ACTION_TYPE_VXLAN_DECAP: + size += SZ_NLATTR_NEST + /* na_act_index. */ + SZ_NLATTR_STRZ_OF("tunnel_key") + + SZ_NLATTR_NEST + /* TCA_ACT_OPTIONS. */ + SZ_NLATTR_TYPE_OF_UINT8 + /* no UDP sum */ + SZ_NLATTR_TYPE_OF_STRUCT(tc_tunnel_key) + + RTE_ALIGN_CEIL /* preceding decap params. */ + (sizeof(struct mlx5_flow_tcf_vxlan_decap), + MNL_ALIGNTO); + flags |= MLX5_ACTION_VXLAN_DECAP; break; default: DRV_LOG(WARNING, @@ -1750,6 +2039,26 @@ struct flow_tcf_ptoi { } /** + * Convert VXLAN VNI to 32-bit integer. + * + * @param[in] vni + * VXLAN VNI in 24-bit wire format. + * + * @return + * VXLAN VNI as a 32-bit integer value in network endian. + */ +static rte_be32_t +vxlan_vni_as_be32(const uint8_t vni[3]) +{ + rte_be32_t ret; + + ret = vni[0]; + ret = (ret << 8) | vni[1]; + ret = (ret << 8) | vni[2]; + return RTE_BE32(ret); +} + +/** * Prepare a flow object for Linux TC flower. It calculates the maximum size of * memory required, allocates the memory, initializes Netlink message headers * and set unique TC message handle. @@ -1784,22 +2093,54 @@ struct flow_tcf_ptoi { struct mlx5_flow *dev_flow; struct nlmsghdr *nlh; struct tcmsg *tcm; + struct mlx5_flow_tcf_vxlan_encap encap = {.mask = 0}; + uint8_t *sp, *tun = NULL; size += flow_tcf_get_items_and_size(items, item_flags); - size += flow_tcf_get_actions_and_size(actions, action_flags); - dev_flow = rte_zmalloc(__func__, size, MNL_ALIGNTO); + size += flow_tcf_get_actions_and_size(actions, action_flags, &encap); + dev_flow = rte_zmalloc(__func__, size, + RTE_MAX(alignof(struct mlx5_flow_tcf_tunnel_hdr), + (size_t)MNL_ALIGNTO)); if (!dev_flow) { rte_flow_error_set(error, ENOMEM, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL, "not enough memory to create E-Switch flow"); return NULL; } - nlh = mnl_nlmsg_put_header((void *)(dev_flow + 1)); + sp = (uint8_t *)(dev_flow + 1); + if (*action_flags & MLX5_ACTION_VXLAN_ENCAP) { + tun = sp; + sp += RTE_ALIGN_CEIL + (sizeof(struct mlx5_flow_tcf_vxlan_encap), + MNL_ALIGNTO); + size -= RTE_ALIGN_CEIL + (sizeof(struct mlx5_flow_tcf_vxlan_encap), + MNL_ALIGNTO); + encap.hdr.type = MLX5_FLOW_TCF_TUNACT_VXLAN_ENCAP; + memcpy(tun, &encap, + sizeof(struct mlx5_flow_tcf_vxlan_encap)); + } else if (*action_flags & MLX5_ACTION_VXLAN_DECAP) { + tun = sp; + sp += RTE_ALIGN_CEIL + (sizeof(struct mlx5_flow_tcf_vxlan_decap), + MNL_ALIGNTO); + size -= RTE_ALIGN_CEIL + (sizeof(struct mlx5_flow_tcf_vxlan_decap), + MNL_ALIGNTO); + encap.hdr.type = MLX5_FLOW_TCF_TUNACT_VXLAN_DECAP; + memcpy(tun, &encap, + sizeof(struct mlx5_flow_tcf_vxlan_decap)); + } + nlh = mnl_nlmsg_put_header(sp); tcm = mnl_nlmsg_put_extra_header(nlh, sizeof(*tcm)); *dev_flow = (struct mlx5_flow){ .tcf = (struct mlx5_flow_tcf){ + .nlsize = size, .nlh = nlh, .tcm = tcm, + .tunnel = (struct mlx5_flow_tcf_tunnel_hdr *)tun, + .item_flags = *item_flags, + .action_flags = *action_flags, }, }; /* @@ -1853,6 +2194,7 @@ struct flow_tcf_ptoi { const struct rte_flow_item_ipv6 *ipv6; const struct rte_flow_item_tcp *tcp; const struct rte_flow_item_udp *udp; + const struct rte_flow_item_vxlan *vxlan; } spec, mask; union { const struct rte_flow_action_port_id *port_id; @@ -1862,6 +2204,14 @@ struct flow_tcf_ptoi { const struct rte_flow_action_of_set_vlan_pcp * of_set_vlan_pcp; } 
conf; + union { + struct mlx5_flow_tcf_tunnel_hdr *hdr; + struct mlx5_flow_tcf_vxlan_decap *vxlan; + } decap; + union { + struct mlx5_flow_tcf_tunnel_hdr *hdr; + struct mlx5_flow_tcf_vxlan_encap *vxlan; + } encap; struct flow_tcf_ptoi ptoi[PTOI_TABLE_SZ_MAX(dev)]; struct nlmsghdr *nlh = dev_flow->tcf.nlh; struct tcmsg *tcm = dev_flow->tcf.tcm; @@ -1877,6 +2227,12 @@ struct flow_tcf_ptoi { claim_nonzero(flow_tcf_build_ptoi_table(dev, ptoi, PTOI_TABLE_SZ_MAX(dev))); + encap.hdr = NULL; + decap.hdr = NULL; + if (dev_flow->tcf.action_flags & MLX5_ACTION_VXLAN_ENCAP) + encap.vxlan = dev_flow->tcf.vxlan_encap; + if (dev_flow->tcf.action_flags & MLX5_ACTION_VXLAN_DECAP) + decap.vxlan = dev_flow->tcf.vxlan_decap; nlh = dev_flow->tcf.nlh; tcm = dev_flow->tcf.tcm; /* Prepare API must have been called beforehand. */ @@ -1892,7 +2248,6 @@ struct flow_tcf_ptoi { RTE_BE16(ETH_P_ALL)); mnl_attr_put_strz(nlh, TCA_KIND, "flower"); na_flower = mnl_attr_nest_start(nlh, TCA_OPTIONS); - mnl_attr_put_u32(nlh, TCA_FLOWER_FLAGS, TCA_CLS_FLAGS_SKIP_SW); for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) { unsigned int i; @@ -1935,6 +2290,12 @@ struct flow_tcf_ptoi { spec.eth->type); eth_type_set = 1; } + /* + * Send L2 addresses/masks anyway, including + * VXLAN encap/decap cases, sometimes kernel + * returns an error if no L2 address provided + * and skip_sw flag is set + */ if (!is_zero_ether_addr(&mask.eth->dst)) { mnl_attr_put(nlh, TCA_FLOWER_KEY_ETH_DST, ETHER_ADDR_LEN, @@ -1951,8 +2312,19 @@ struct flow_tcf_ptoi { ETHER_ADDR_LEN, mask.eth->src.addr_bytes); } + if (decap.hdr) { + DRV_LOG(WARNING, + "ethernet addresses are treated " + "as inner ones for tunnel decapsulation"); + } + assert(dev_flow->tcf.nlsize >= nlh->nlmsg_len); break; case RTE_FLOW_ITEM_TYPE_VLAN: + if (encap.hdr || decap.hdr) + return rte_flow_error_set(error, ENOTSUP, + RTE_FLOW_ERROR_TYPE_ITEM, NULL, + "outer VLAN is not " + "supported for tunnels"); mask.vlan = flow_tcf_item_mask (items, &rte_flow_item_vlan_mask, &flow_tcf_mask_supported.vlan, @@ -1983,6 +2355,7 @@ struct flow_tcf_ptoi { rte_be_to_cpu_16 (spec.vlan->tci & RTE_BE16(0x0fff))); + assert(dev_flow->tcf.nlsize >= nlh->nlmsg_len); break; case RTE_FLOW_ITEM_TYPE_IPV4: mask.ipv4 = flow_tcf_item_mask @@ -1992,36 +2365,53 @@ struct flow_tcf_ptoi { sizeof(flow_tcf_mask_supported.ipv4), error); assert(mask.ipv4); - if (!eth_type_set || !vlan_eth_type_set) - mnl_attr_put_u16(nlh, - vlan_present ? - TCA_FLOWER_KEY_VLAN_ETH_TYPE : - TCA_FLOWER_KEY_ETH_TYPE, - RTE_BE16(ETH_P_IP)); - eth_type_set = 1; - vlan_eth_type_set = 1; - if (mask.ipv4 == &flow_tcf_mask_empty.ipv4) - break; spec.ipv4 = items->spec; - if (mask.ipv4->hdr.next_proto_id) { - mnl_attr_put_u8(nlh, TCA_FLOWER_KEY_IP_PROTO, + if (!decap.vxlan) { + if (!eth_type_set || !vlan_eth_type_set) { + mnl_attr_put_u16(nlh, + vlan_present ? + TCA_FLOWER_KEY_VLAN_ETH_TYPE : + TCA_FLOWER_KEY_ETH_TYPE, + RTE_BE16(ETH_P_IP)); + } + eth_type_set = 1; + vlan_eth_type_set = 1; + if (mask.ipv4 == &flow_tcf_mask_empty.ipv4) + break; + if (mask.ipv4->hdr.next_proto_id) { + mnl_attr_put_u8 + (nlh, TCA_FLOWER_KEY_IP_PROTO, spec.ipv4->hdr.next_proto_id); - ip_proto_set = 1; + ip_proto_set = 1; + } + } else { + assert(mask.ipv4 != &flow_tcf_mask_empty.ipv4); } if (mask.ipv4->hdr.src_addr) { - mnl_attr_put_u32(nlh, TCA_FLOWER_KEY_IPV4_SRC, - spec.ipv4->hdr.src_addr); - mnl_attr_put_u32(nlh, - TCA_FLOWER_KEY_IPV4_SRC_MASK, - mask.ipv4->hdr.src_addr); + mnl_attr_put_u32 + (nlh, decap.vxlan ? 
+ TCA_FLOWER_KEY_ENC_IPV4_SRC : + TCA_FLOWER_KEY_IPV4_SRC, + spec.ipv4->hdr.src_addr); + mnl_attr_put_u32 + (nlh, decap.vxlan ? + TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK : + TCA_FLOWER_KEY_IPV4_SRC_MASK, + mask.ipv4->hdr.src_addr); } if (mask.ipv4->hdr.dst_addr) { - mnl_attr_put_u32(nlh, TCA_FLOWER_KEY_IPV4_DST, - spec.ipv4->hdr.dst_addr); - mnl_attr_put_u32(nlh, - TCA_FLOWER_KEY_IPV4_DST_MASK, - mask.ipv4->hdr.dst_addr); + mnl_attr_put_u32 + (nlh, decap.vxlan ? + TCA_FLOWER_KEY_ENC_IPV4_DST : + TCA_FLOWER_KEY_IPV4_DST, + spec.ipv4->hdr.dst_addr); + mnl_attr_put_u32 + (nlh, decap.vxlan ? + TCA_FLOWER_KEY_ENC_IPV4_DST_MASK : + TCA_FLOWER_KEY_IPV4_DST_MASK, + mask.ipv4->hdr.dst_addr); } + assert(dev_flow->tcf.nlsize >= nlh->nlmsg_len); break; case RTE_FLOW_ITEM_TYPE_IPV6: mask.ipv6 = flow_tcf_item_mask @@ -2031,38 +2421,53 @@ struct flow_tcf_ptoi { sizeof(flow_tcf_mask_supported.ipv6), error); assert(mask.ipv6); - if (!eth_type_set || !vlan_eth_type_set) - mnl_attr_put_u16(nlh, + spec.ipv6 = items->spec; + if (!decap.vxlan) { + if (!eth_type_set || !vlan_eth_type_set) { + mnl_attr_put_u16(nlh, vlan_present ? TCA_FLOWER_KEY_VLAN_ETH_TYPE : TCA_FLOWER_KEY_ETH_TYPE, RTE_BE16(ETH_P_IPV6)); - eth_type_set = 1; - vlan_eth_type_set = 1; - if (mask.ipv6 == &flow_tcf_mask_empty.ipv6) - break; - spec.ipv6 = items->spec; - if (mask.ipv6->hdr.proto) { - mnl_attr_put_u8(nlh, TCA_FLOWER_KEY_IP_PROTO, - spec.ipv6->hdr.proto); - ip_proto_set = 1; + } + eth_type_set = 1; + vlan_eth_type_set = 1; + if (mask.ipv6 == &flow_tcf_mask_empty.ipv6) + break; + if (mask.ipv6->hdr.proto) { + mnl_attr_put_u8 + (nlh, TCA_FLOWER_KEY_IP_PROTO, + spec.ipv6->hdr.proto); + ip_proto_set = 1; + } + } else { + assert(mask.ipv6 != &flow_tcf_mask_empty.ipv6); } if (!IN6_IS_ADDR_UNSPECIFIED(mask.ipv6->hdr.src_addr)) { - mnl_attr_put(nlh, TCA_FLOWER_KEY_IPV6_SRC, + mnl_attr_put(nlh, decap.vxlan ? + TCA_FLOWER_KEY_ENC_IPV6_SRC : + TCA_FLOWER_KEY_IPV6_SRC, sizeof(spec.ipv6->hdr.src_addr), spec.ipv6->hdr.src_addr); - mnl_attr_put(nlh, TCA_FLOWER_KEY_IPV6_SRC_MASK, + mnl_attr_put(nlh, decap.vxlan ? + TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK : + TCA_FLOWER_KEY_IPV6_SRC_MASK, sizeof(mask.ipv6->hdr.src_addr), mask.ipv6->hdr.src_addr); } if (!IN6_IS_ADDR_UNSPECIFIED(mask.ipv6->hdr.dst_addr)) { - mnl_attr_put(nlh, TCA_FLOWER_KEY_IPV6_DST, + mnl_attr_put(nlh, decap.vxlan ? + TCA_FLOWER_KEY_ENC_IPV6_DST : + TCA_FLOWER_KEY_IPV6_DST, sizeof(spec.ipv6->hdr.dst_addr), spec.ipv6->hdr.dst_addr); - mnl_attr_put(nlh, TCA_FLOWER_KEY_IPV6_DST_MASK, + mnl_attr_put(nlh, decap.vxlan ? 
+ TCA_FLOWER_KEY_ENC_IPV6_DST_MASK : + TCA_FLOWER_KEY_IPV6_DST_MASK, sizeof(mask.ipv6->hdr.dst_addr), mask.ipv6->hdr.dst_addr); } + assert(dev_flow->tcf.nlsize >= nlh->nlmsg_len); break; case RTE_FLOW_ITEM_TYPE_UDP: mask.udp = flow_tcf_item_mask @@ -2072,27 +2477,45 @@ struct flow_tcf_ptoi { sizeof(flow_tcf_mask_supported.udp), error); assert(mask.udp); - if (!ip_proto_set) - mnl_attr_put_u8(nlh, TCA_FLOWER_KEY_IP_PROTO, - IPPROTO_UDP); - if (mask.udp == &flow_tcf_mask_empty.udp) - break; spec.udp = items->spec; + if (!decap.vxlan) { + if (!ip_proto_set) + mnl_attr_put_u8 + (nlh, TCA_FLOWER_KEY_IP_PROTO, + IPPROTO_UDP); + if (mask.udp == &flow_tcf_mask_empty.udp) + break; + } else { + assert(mask.udp != &flow_tcf_mask_empty.udp); + decap.vxlan->udp_port + = RTE_BE16(spec.udp->hdr.dst_port); + } if (mask.udp->hdr.src_port) { - mnl_attr_put_u16(nlh, TCA_FLOWER_KEY_UDP_SRC, - spec.udp->hdr.src_port); - mnl_attr_put_u16(nlh, - TCA_FLOWER_KEY_UDP_SRC_MASK, - mask.udp->hdr.src_port); + mnl_attr_put_u16 + (nlh, decap.vxlan ? + TCA_FLOWER_KEY_ENC_UDP_SRC_PORT : + TCA_FLOWER_KEY_UDP_SRC, + spec.udp->hdr.src_port); + mnl_attr_put_u16 + (nlh, decap.vxlan ? + TCA_FLOWER_KEY_ENC_UDP_SRC_PORT_MASK : + TCA_FLOWER_KEY_UDP_SRC_MASK, + mask.udp->hdr.src_port); } if (mask.udp->hdr.dst_port) { - mnl_attr_put_u16(nlh, TCA_FLOWER_KEY_UDP_DST, - spec.udp->hdr.dst_port); - mnl_attr_put_u16(nlh, - TCA_FLOWER_KEY_UDP_DST_MASK, - mask.udp->hdr.dst_port); + mnl_attr_put_u16 + (nlh, decap.vxlan ? + TCA_FLOWER_KEY_ENC_UDP_DST_PORT : + TCA_FLOWER_KEY_UDP_DST, + spec.udp->hdr.dst_port); + mnl_attr_put_u16 + (nlh, decap.vxlan ? + TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK : + TCA_FLOWER_KEY_UDP_DST_MASK, + mask.udp->hdr.dst_port); } - break; + assert(dev_flow->tcf.nlsize >= nlh->nlmsg_len); + break; case RTE_FLOW_ITEM_TYPE_TCP: mask.tcp = flow_tcf_item_mask (items, &rte_flow_item_tcp_mask, @@ -2121,6 +2544,15 @@ struct flow_tcf_ptoi { TCA_FLOWER_KEY_TCP_DST_MASK, mask.tcp->hdr.dst_port); } + assert(dev_flow->tcf.nlsize >= nlh->nlmsg_len); + break; + case RTE_FLOW_ITEM_TYPE_VXLAN: + assert(decap.vxlan); + spec.vxlan = items->spec; + mnl_attr_put_u32(nlh, + TCA_FLOWER_KEY_ENC_KEY_ID, + vxlan_vni_as_be32(spec.vxlan->vni)); + assert(dev_flow->tcf.nlsize >= nlh->nlmsg_len); break; default: return rte_flow_error_set(error, ENOTSUP, @@ -2154,6 +2586,14 @@ struct flow_tcf_ptoi { mnl_attr_put_strz(nlh, TCA_ACT_KIND, "mirred"); na_act = mnl_attr_nest_start(nlh, TCA_ACT_OPTIONS); assert(na_act); + if (encap.hdr) { + assert(dev_flow->tcf.tunnel); + dev_flow->tcf.tunnel->ifindex_ptr = + &((struct tc_mirred *) + mnl_attr_get_payload + (mnl_nlmsg_get_payload_tail + (nlh)))->ifindex; + } mnl_attr_put(nlh, TCA_MIRRED_PARMS, sizeof(struct tc_mirred), &(struct tc_mirred){ @@ -2163,6 +2603,7 @@ struct flow_tcf_ptoi { }); mnl_attr_nest_end(nlh, na_act); mnl_attr_nest_end(nlh, na_act_index); + assert(dev_flow->tcf.nlsize >= nlh->nlmsg_len); break; case RTE_FLOW_ACTION_TYPE_DROP: na_act_index = @@ -2243,6 +2684,74 @@ struct flow_tcf_ptoi { (na_vlan_priority) = conf.of_set_vlan_pcp->vlan_pcp; } + assert(dev_flow->tcf.nlsize >= nlh->nlmsg_len); + break; + case RTE_FLOW_ACTION_TYPE_VXLAN_DECAP: + assert(decap.vxlan); + assert(dev_flow->tcf.tunnel); + dev_flow->tcf.tunnel->ifindex_ptr + = (unsigned int *)&tcm->tcm_ifindex; + na_act_index = + mnl_attr_nest_start(nlh, na_act_index_cur++); + assert(na_act_index); + mnl_attr_put_strz(nlh, TCA_ACT_KIND, "tunnel_key"); + na_act = mnl_attr_nest_start(nlh, TCA_ACT_OPTIONS); + assert(na_act); + mnl_attr_put(nlh, 
TCA_TUNNEL_KEY_PARMS,
+				     sizeof(struct tc_tunnel_key),
+				     &(struct tc_tunnel_key){
+					.action = TC_ACT_PIPE,
+					.t_action = TCA_TUNNEL_KEY_ACT_RELEASE,
+				     });
+			mnl_attr_nest_end(nlh, na_act);
+			mnl_attr_nest_end(nlh, na_act_index);
+			assert(dev_flow->tcf.nlsize >= nlh->nlmsg_len);
+			break;
+		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
+			assert(encap.vxlan);
+			na_act_index =
+				mnl_attr_nest_start(nlh, na_act_index_cur++);
+			assert(na_act_index);
+			mnl_attr_put_strz(nlh, TCA_ACT_KIND, "tunnel_key");
+			na_act = mnl_attr_nest_start(nlh, TCA_ACT_OPTIONS);
+			assert(na_act);
+			mnl_attr_put(nlh, TCA_TUNNEL_KEY_PARMS,
+				     sizeof(struct tc_tunnel_key),
+				     &(struct tc_tunnel_key){
+					.action = TC_ACT_PIPE,
+					.t_action = TCA_TUNNEL_KEY_ACT_SET,
+				     });
+			if (encap.vxlan->mask & MLX5_FLOW_TCF_ENCAP_UDP_DST)
+				mnl_attr_put_u16(nlh,
+						 TCA_TUNNEL_KEY_ENC_DST_PORT,
+						 encap.vxlan->udp.dst);
+			if (encap.vxlan->mask & MLX5_FLOW_TCF_ENCAP_IPV4_SRC)
+				mnl_attr_put_u32(nlh,
+						 TCA_TUNNEL_KEY_ENC_IPV4_SRC,
+						 encap.vxlan->ipv4.src);
+			if (encap.vxlan->mask & MLX5_FLOW_TCF_ENCAP_IPV4_DST)
+				mnl_attr_put_u32(nlh,
+						 TCA_TUNNEL_KEY_ENC_IPV4_DST,
+						 encap.vxlan->ipv4.dst);
+			if (encap.vxlan->mask & MLX5_FLOW_TCF_ENCAP_IPV6_SRC)
+				mnl_attr_put(nlh,
+					     TCA_TUNNEL_KEY_ENC_IPV6_SRC,
+					     sizeof(encap.vxlan->ipv6.src),
+					     &encap.vxlan->ipv6.src);
+			if (encap.vxlan->mask & MLX5_FLOW_TCF_ENCAP_IPV6_DST)
+				mnl_attr_put(nlh,
+					     TCA_TUNNEL_KEY_ENC_IPV6_DST,
+					     sizeof(encap.vxlan->ipv6.dst),
+					     &encap.vxlan->ipv6.dst);
+			if (encap.vxlan->mask & MLX5_FLOW_TCF_ENCAP_VXLAN_VNI)
+				mnl_attr_put_u32(nlh,
+						 TCA_TUNNEL_KEY_ENC_KEY_ID,
+						 vxlan_vni_as_be32
+						 (encap.vxlan->vxlan.vni));
+			mnl_attr_put_u8(nlh, TCA_TUNNEL_KEY_NO_CSUM, 0);
+			mnl_attr_nest_end(nlh, na_act);
+			mnl_attr_nest_end(nlh, na_act_index);
+			assert(dev_flow->tcf.nlsize >= nlh->nlmsg_len);
 			break;
 		default:
 			return rte_flow_error_set(error, ENOTSUP,
@@ -2254,7 +2763,9 @@ struct flow_tcf_ptoi {
 	assert(na_flower);
 	assert(na_flower_act);
 	mnl_attr_nest_end(nlh, na_flower_act);
+	mnl_attr_put_u32(nlh, TCA_FLOWER_FLAGS, TCA_CLS_FLAGS_SKIP_SW);
 	mnl_attr_nest_end(nlh, na_flower);
+	assert(dev_flow->tcf.nlsize >= nlh->nlmsg_len);
 	return 0;
 }
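A quick worked example of the VNI conversion introduced above: for the
24-bit wire-format VNI { 0x12, 0x34, 0x56 }, vxlan_vni_as_be32() yields
0x00123456 in network byte order. A standalone sketch of the same
arithmetic, with htonl() standing in for the DPDK byte-order helper so
the example compiles on its own:

#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h> /* htonl(), used here instead of RTE_BE32(). */

/* Same arithmetic as vxlan_vni_as_be32() in the patch above. */
static uint32_t
vni_as_be32(const uint8_t vni[3])
{
	return htonl(((uint32_t)vni[0] << 16) | (vni[1] << 8) | vni[2]);
}

int
main(void)
{
	const uint8_t vni[3] = { 0x12, 0x34, 0x56 };

	assert(vni_as_be32(vni) == htonl(0x123456));
	return 0;
}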
From patchwork Tue Oct 2 06:30:42 2018
X-Patchwork-Submitter: Slava Ovsiienko
X-Patchwork-Id: 45804
X-Patchwork-Delegate: shahafs@mellanox.com
From: Slava Ovsiienko
To: "dev@dpdk.org"
CC: Shahaf Shuler, Slava Ovsiienko
Date: Tue, 2 Oct 2018 06:30:42 +0000
Message-ID: <1538461807-37507-5-git-send-email-viacheslavo@mellanox.com>
In-Reply-To: <1538461807-37507-1-git-send-email-viacheslavo@mellanox.com>
Subject: [dpdk-dev] [PATCH 5/5] net/mlx5: e-switch VXLAN tunnel devices management

VXLAN interfaces are dynamically created for each local UDP port of
outer networks and are then used as targets for TC "flower" filters in
order to perform encapsulation. These VXLAN interfaces are system-wide:
only one device with a given UDP port can exist in the system (an
attempt to create another device with the same local UDP port returns
EEXIST), so the PMD should maintain a database of device instances
shared between PMD instances. These implicitly created VXLAN devices
are called VTEPs (Virtual Tunnel End Points).

A VTEP is created at the moment a rule is applied. The link is set up
and the root ingress qdisc is initialized as well. One VTEP is shared
by all encapsulation rules in the DPDK application instance. For
decapsulation, one VTEP is created per unique local UDP port to accept
tunnel traffic. The name of a created VTEP consists of the prefix
"vmlx_" and the UDP port number in decimal digits without leading
zeros (e.g. vmlx_4789). A VTEP can also be created in the system
beforehand, before the application is launched, which allows UDP ports
to be shared between primary and secondary processes.
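The naming rule above is simple enough to sketch. The helper below is
illustrative only; the patch itself builds the name from
MLX5_VXLAN_DEVICE_PFX inside flow_tcf_create_iface():

#include <stdio.h>
#include <net/if.h> /* IFNAMSIZ */

/* Illustration of the VTEP naming convention described above:
 * "vmlx_" prefix plus the UDP port in decimal, e.g. "vmlx_4789". */
static void
vtep_name(char name[IFNAMSIZ], unsigned int udp_port)
{
	snprintf(name, IFNAMSIZ, "vmlx_%u", udp_port);
}

Such a device may also be created beforehand with iproute2, e.g.
something like "ip link add vmlx_4789 type vxlan external dstport 4789"
(exact syntax depends on the iproute2 version), which is what enables
the primary/secondary UDP port sharing mentioned above.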
Suggested-by: Adrien Mazarguil Signed-off-by: Viacheslav Ovsiienko --- drivers/net/mlx5/mlx5_flow_tcf.c | 344 ++++++++++++++++++++++++++++++++++++++- 1 file changed, 343 insertions(+), 1 deletion(-) diff --git a/drivers/net/mlx5/mlx5_flow_tcf.c b/drivers/net/mlx5/mlx5_flow_tcf.c index dfffc50..0e62fe9 100644 --- a/drivers/net/mlx5/mlx5_flow_tcf.c +++ b/drivers/net/mlx5/mlx5_flow_tcf.c @@ -1482,7 +1482,7 @@ struct flow_tcf_ptoi { default: return rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM, - NULL, "item not supported"); + items, "item not supported"); } } for (; actions->type != RTE_FLOW_ACTION_TYPE_END; actions++) { @@ -2886,6 +2886,291 @@ struct flow_tcf_ptoi { return 0; } +/* VTEP device list is shared between PMD port instances. */ +static LIST_HEAD(, mlx5_flow_tcf_vtep) + vtep_list_vxlan = LIST_HEAD_INITIALIZER(); +static pthread_mutex_t vtep_list_mutex = PTHREAD_MUTEX_INITIALIZER; +static struct mlx5_flow_tcf_vtep *vtep_encap; + +/** + * Deletes VTEP network device. + * + * @param[in] tcf + * Context object initialized by mlx5_flow_tcf_socket_open(). + * @param[in] vtep + * Flow tcf object with tunnel device structure to delete. + */ +static void +flow_tcf_delete_iface(struct mlx5_tcf_socket *tcf, + struct mlx5_flow_tcf_vtep *vtep) +{ + struct nlmsghdr *nlh; + struct ifinfomsg *ifm; + alignas(struct nlmsghdr) + uint8_t buf[mnl_nlmsg_size(MNL_ALIGN(sizeof(*ifm))) + 8]; + int ret; + + DRV_LOG(NOTICE, "VTEP delete (%d)", vtep->port); + nlh = mnl_nlmsg_put_header(buf); + nlh->nlmsg_type = RTM_DELLINK; + nlh->nlmsg_flags = NLM_F_REQUEST; + ifm = mnl_nlmsg_put_extra_header(nlh, sizeof(*ifm)); + ifm->ifi_family = AF_UNSPEC; + ifm->ifi_index = vtep->ifindex; + ret = flow_tcf_nl_ack(tcf, nlh); + if (ret) + DRV_LOG(DEBUG, "error deleting VXLAN encap/decap ifindex %u", + ifm->ifi_index); +} + +/** + * Creates VTEP network device. + * + * @param[in] tcf + * Context object initialized by mlx5_flow_tcf_socket_open(). + * @param[in] port + * UDP port of created VTEP device. + * @param[out] error + * Perform verbose error reporting if not NULL. + * + * @return + * Pointer to created device structure on success, NULL otherwise + * and rte_errno is set. 
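+ * If a VXLAN device bound to the same UDP port already exists, the
+ * create request may fail with EEXIST; the function then assumes the
+ * device was pre-created (see the commit log) and tries to reuse it.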
+ */
+static struct mlx5_flow_tcf_vtep*
+flow_tcf_create_iface(struct mlx5_tcf_socket *tcf, uint16_t port,
+		      struct rte_flow_error *error)
+{
+	struct mlx5_flow_tcf_vtep *vtep;
+	struct nlmsghdr *nlh;
+	struct ifinfomsg *ifm;
+	char name[sizeof(MLX5_VXLAN_DEVICE_PFX) + 24];
+	alignas(struct nlmsghdr)
+	uint8_t buf[mnl_nlmsg_size(sizeof(*ifm)) +
+		    SZ_NLATTR_DATA_OF(sizeof(name)) +
+		    SZ_NLATTR_NEST * 2 +
+		    SZ_NLATTR_STRZ_OF("vxlan") +
+		    SZ_NLATTR_TYPE_OF_UINT32 +
+		    SZ_NLATTR_TYPE_OF_UINT16 +
+		    SZ_NLATTR_TYPE_OF_UINT8 + 128];
+	struct nlattr *na_info;
+	struct nlattr *na_vxlan;
+	rte_be16_t vxlan_port = RTE_BE16(port);
+	int ret;
+
+	vtep = rte_zmalloc(__func__, sizeof(*vtep),
+			   alignof(struct mlx5_flow_tcf_vtep));
+	if (!vtep) {
+		rte_flow_error_set
+			(error, ENOMEM, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+			 NULL, "unable to allocate memory for VTEP desc");
+		return NULL;
+	}
+	*vtep = (struct mlx5_flow_tcf_vtep){
+			.refcnt = 0,
+			.port = port,
+			.notcreated = 0,
+	};
+	memset(buf, 0, sizeof(buf));
+	nlh = mnl_nlmsg_put_header(buf);
+	nlh->nlmsg_type = RTM_NEWLINK;
+	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL;
+	ifm = mnl_nlmsg_put_extra_header(nlh, sizeof(*ifm));
+	ifm->ifi_family = AF_UNSPEC;
+	ifm->ifi_type = 0;
+	ifm->ifi_index = 0;
+	ifm->ifi_flags = IFF_UP;
+	ifm->ifi_change = 0xffffffff;
+	snprintf(name, sizeof(name), "%s%u", MLX5_VXLAN_DEVICE_PFX, port);
+	mnl_attr_put_strz(nlh, IFLA_IFNAME, name);
+	na_info = mnl_attr_nest_start(nlh, IFLA_LINKINFO);
+	assert(na_info);
+	mnl_attr_put_strz(nlh, IFLA_INFO_KIND, "vxlan");
+	na_vxlan = mnl_attr_nest_start(nlh, IFLA_INFO_DATA);
+	assert(na_vxlan);
+	mnl_attr_put_u8(nlh, IFLA_VXLAN_COLLECT_METADATA, 1);
+	mnl_attr_put_u8(nlh, IFLA_VXLAN_UDP_ZERO_CSUM6_RX, 1);
+	mnl_attr_put_u8(nlh, IFLA_VXLAN_LEARNING, 0);
+	mnl_attr_put_u16(nlh, IFLA_VXLAN_PORT, vxlan_port);
+	mnl_attr_nest_end(nlh, na_vxlan);
+	mnl_attr_nest_end(nlh, na_info);
+	assert(sizeof(buf) >= nlh->nlmsg_len);
+	ret = flow_tcf_nl_ack(tcf, nlh);
+	if (ret) {
+		DRV_LOG(WARNING,
+			"VTEP %s create failure (%d)",
+			name, rte_errno);
+		vtep->notcreated = 1; /* Assume the device exists. */
+	}
+	ret = if_nametoindex(name);
+	if (ret) {
+		vtep->ifindex = ret;
+		memset(buf, 0, sizeof(buf));
+		nlh = mnl_nlmsg_put_header(buf);
+		nlh->nlmsg_type = RTM_NEWLINK;
+		nlh->nlmsg_flags = NLM_F_REQUEST;
+		ifm = mnl_nlmsg_put_extra_header(nlh, sizeof(*ifm));
+		ifm->ifi_family = AF_UNSPEC;
+		ifm->ifi_type = 0;
+		ifm->ifi_index = vtep->ifindex;
+		ifm->ifi_flags = IFF_UP;
+		ifm->ifi_change = IFF_UP;
+		ret = flow_tcf_nl_ack(tcf, nlh);
+		if (ret) {
+			DRV_LOG(WARNING,
+				"VTEP %s set link up failure (%d)",
+				name, rte_errno);
+			rte_flow_error_set
+				(error, rte_errno,
+				 RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				 "netlink: failed to set VTEP link up");
+		} else {
+			ret = mlx5_flow_tcf_ifindex_init(tcf,
+							 vtep->ifindex, error);
+			if (ret)
+				DRV_LOG(WARNING,
+					"VTEP %s init failure (%d)",
+					name, rte_errno);
+		}
+	} else {
+		DRV_LOG(WARNING,
+			"VTEP %s failed to get index (%d)", name, errno);
+		rte_flow_error_set
+			(error, errno,
+			 RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+			 vtep->notcreated ? "netlink: failed to create VTEP" :
+			 "netlink: failed to retrieve VTEP ifindex");
+		ret = 1;
+	}
+	if (ret) {
+		if (!vtep->notcreated && vtep->ifindex)
+			flow_tcf_delete_iface(tcf, vtep);
+		rte_free(vtep);
+		vtep = NULL;
+	}
+	DRV_LOG(NOTICE, "VTEP create (%d, %s)", port, vtep ? "OK" : "error");
+	return vtep;
+}
+
+/**
+ * Creates target interface index for tunneling.
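+ * The VTEP is created on first use and shared between flow rules via
+ * the reference-counted, mutex-protected list declared above.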
+ *
+ * @param tcf
+ *   Context object initialized by mlx5_flow_tcf_socket_open().
+ * @param[in] dev_flow
+ *   Flow tcf object with tunnel structure pointer set.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   Interface index on success, zero otherwise and rte_errno is set.
+ */
+static unsigned int
+flow_tcf_tunnel_vtep_create(struct mlx5_tcf_socket *tcf,
+			    struct mlx5_flow *dev_flow,
+			    struct rte_flow_error *error)
+{
+	unsigned int ret;
+
+	assert(dev_flow->tcf.tunnel);
+	pthread_mutex_lock(&vtep_list_mutex);
+	switch (dev_flow->tcf.tunnel->type) {
+	case MLX5_FLOW_TCF_TUNACT_VXLAN_ENCAP:
+		if (!vtep_encap) {
+			vtep_encap = flow_tcf_create_iface(tcf,
+					MLX5_VXLAN_DEFAULT_PORT, error);
+			if (!vtep_encap) {
+				ret = 0;
+				break;
+			}
+			LIST_INSERT_HEAD(&vtep_list_vxlan, vtep_encap, next);
+		}
+		vtep_encap->refcnt++;
+		ret = vtep_encap->ifindex;
+		assert(ret);
+		break;
+	case MLX5_FLOW_TCF_TUNACT_VXLAN_DECAP: {
+		struct mlx5_flow_tcf_vtep *vtep;
+		uint16_t port = dev_flow->tcf.vxlan_decap->udp_port;
+
+		LIST_FOREACH(vtep, &vtep_list_vxlan, next) {
+			if (vtep->port == port)
+				break;
+		}
+		if (!vtep) {
+			vtep = flow_tcf_create_iface(tcf, port, error);
+			if (!vtep) {
+				ret = 0;
+				break;
+			}
+			LIST_INSERT_HEAD(&vtep_list_vxlan, vtep, next);
+		}
+		vtep->refcnt++;
+		ret = vtep->ifindex;
+		assert(ret);
+		break;
+	}
+	default:
+		rte_flow_error_set(error, ENOTSUP,
+				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				   "unsupported tunnel type");
+		ret = 0;
+		break;
+	}
+	pthread_mutex_unlock(&vtep_list_mutex);
+	return ret;
+}
+
+/**
+ * Deletes tunneling interface by UDP port.
+ *
+ * @param tcf
+ *   Context object initialized by mlx5_flow_tcf_socket_open().
+ * @param[in] dev_flow
+ *   Flow tcf object with tunnel structure pointer set.
+ */
+static void
+flow_tcf_tunnel_vtep_delete(struct mlx5_tcf_socket *tcf,
+			    struct mlx5_flow *dev_flow)
+{
+	struct mlx5_flow_tcf_vtep *vtep;
+	uint16_t port = MLX5_VXLAN_DEFAULT_PORT;
+
+	assert(dev_flow->tcf.tunnel);
+	pthread_mutex_lock(&vtep_list_mutex);
+	switch (dev_flow->tcf.tunnel->type) {
+	case MLX5_FLOW_TCF_TUNACT_VXLAN_DECAP:
+		port = dev_flow->tcf.vxlan_decap->udp_port;
+		/* Fall through to the shared lookup, no break intentionally. */
+	case MLX5_FLOW_TCF_TUNACT_VXLAN_ENCAP:
+		LIST_FOREACH(vtep, &vtep_list_vxlan, next) {
+			if (vtep->port == port)
+				break;
+		}
+		if (!vtep) {
+			DRV_LOG(WARNING,
+				"No VTEP device found in the list");
+			break;
+		}
+		assert(dev_flow->tcf.tunnel->ifindex_tun == vtep->ifindex);
+		assert(vtep->refcnt);
+		if (vtep->refcnt && --vtep->refcnt)
+			break;
+		if (!vtep->notcreated)
+			flow_tcf_delete_iface(tcf, vtep);
+		LIST_REMOVE(vtep, next);
+		if (vtep_encap == vtep)
+			vtep_encap = NULL;
+		rte_free(vtep);
+		break;
+	default:
+		assert(false);
+		DRV_LOG(WARNING, "Unsupported tunnel type");
+		break;
+	}
+	pthread_mutex_unlock(&vtep_list_mutex);
+}
+
 /**
  * Apply flow to E-Switch by sending Netlink message.
  *
@@ -2917,12 +3202,45 @@ struct flow_tcf_ptoi {
 	nlh = dev_flow->tcf.nlh;
 	nlh->nlmsg_type = RTM_NEWTFILTER;
 	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL;
+	if (dev_flow->tcf.tunnel) {
+		/*
+		 * Replace the interface index: target for
+		 * encapsulation, source for decapsulation.
+		 */
+		assert(!dev_flow->tcf.tunnel->ifindex_tun);
+		assert(dev_flow->tcf.tunnel->ifindex_ptr);
+		/* Create the actual VTEP device when the rule is applied. */
+		dev_flow->tcf.tunnel->ifindex_tun
+			= flow_tcf_tunnel_vtep_create(&priv->tcf_socket,
+						      dev_flow, error);
+		DRV_LOG(INFO, "Replace ifindex: %d->%d",
+			dev_flow->tcf.tunnel->ifindex_tun,
+			*dev_flow->tcf.tunnel->ifindex_ptr);
+		if (!dev_flow->tcf.tunnel->ifindex_tun)
+			return -rte_errno;
+		dev_flow->tcf.tunnel->ifindex_org
+			= *dev_flow->tcf.tunnel->ifindex_ptr;
+		*dev_flow->tcf.tunnel->ifindex_ptr
+			= dev_flow->tcf.tunnel->ifindex_tun;
+	}
 	ret = flow_tcf_nl_ack(tcf, nlh);
+	if (dev_flow->tcf.tunnel) {
+		DRV_LOG(INFO, "Restore ifindex: %d->%d",
+			dev_flow->tcf.tunnel->ifindex_org,
+			*dev_flow->tcf.tunnel->ifindex_ptr);
+		*dev_flow->tcf.tunnel->ifindex_ptr
+			= dev_flow->tcf.tunnel->ifindex_org;
+		dev_flow->tcf.tunnel->ifindex_org = 0;
+	}
 	if (!ret) {
 		dev_flow->tcf.applied = 1;
 		return 0;
 	}
 	DRV_LOG(WARNING, "Failed to create TC rule (%d)", rte_errno);
+	if (dev_flow->tcf.tunnel && dev_flow->tcf.tunnel->ifindex_tun) {
+		flow_tcf_tunnel_vtep_delete(&priv->tcf_socket, dev_flow);
+		dev_flow->tcf.tunnel->ifindex_tun = 0;
+	}
 	return rte_flow_error_set(error, rte_errno,
 				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
 				  "netlink: failed to create TC flow rule");
@@ -2951,10 +3269,34 @@ struct flow_tcf_ptoi {
 		return;
 	/* E-Switch flow can't be expanded. */
 	assert(!LIST_NEXT(dev_flow, next));
+	if (!dev_flow->tcf.applied)
+		return;
+	if (dev_flow->tcf.tunnel) {
+		/*
+		 * Replace the interface index: target for
+		 * encapsulation, source for decapsulation.
+		 */
+		assert(dev_flow->tcf.tunnel->ifindex_tun);
+		assert(dev_flow->tcf.tunnel->ifindex_ptr);
+		dev_flow->tcf.tunnel->ifindex_org
+			= *dev_flow->tcf.tunnel->ifindex_ptr;
+		*dev_flow->tcf.tunnel->ifindex_ptr
+			= dev_flow->tcf.tunnel->ifindex_tun;
+	}
 	nlh = dev_flow->tcf.nlh;
 	nlh->nlmsg_type = RTM_DELTFILTER;
 	nlh->nlmsg_flags = NLM_F_REQUEST;
 	flow_tcf_nl_ack(tcf, nlh);
+	if (dev_flow->tcf.tunnel) {
+		*dev_flow->tcf.tunnel->ifindex_ptr
+			= dev_flow->tcf.tunnel->ifindex_org;
+		dev_flow->tcf.tunnel->ifindex_org = 0;
+		if (dev_flow->tcf.tunnel->ifindex_tun) {
+			flow_tcf_tunnel_vtep_delete(&priv->tcf_socket,
+						    dev_flow);
+			dev_flow->tcf.tunnel->ifindex_tun = 0;
+		}
+	}
 	dev_flow->tcf.applied = 0;
 }
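To summarize the ifindex juggling shared by flow_tcf_apply() and
flow_tcf_remove() above, the pattern reduces to the sketch below. The
types are trimmed to the fields involved and the send callback is a
stand-in for the RTM_NEWTFILTER/RTM_DELTFILTER exchange; this is an
illustration, not code from the patch:

/* Sketch of the save/replace/restore pattern around the Netlink send:
 * the message is prebuilt with the representor ifindex, temporarily
 * rewritten in place to point at the VTEP, then restored afterwards. */
struct tunnel_hdr {
	unsigned int ifindex_org;  /* Saved original interface index. */
	unsigned int *ifindex_ptr; /* Index location inside the message. */
	unsigned int ifindex_tun;  /* VTEP interface index. */
};

static int
send_with_vtep_ifindex(struct tunnel_hdr *tun,
		       int (*send_msg)(void *ctx), void *ctx)
{
	int ret;

	tun->ifindex_org = *tun->ifindex_ptr;	/* Save. */
	*tun->ifindex_ptr = tun->ifindex_tun;	/* Replace. */
	ret = send_msg(ctx);			/* Netlink exchange. */
	*tun->ifindex_ptr = tun->ifindex_org;	/* Restore. */
	tun->ifindex_org = 0;
	return ret;
}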