[v8,2/3] net/af_xdp: fix multi interface support for K8s

Message ID 20240214160716.2391286-3-mtahhan@redhat.com (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit
Headers
Series net/af_xdp: fix multi interface support for K8s |

Checks

Context Check Description
ci/checkpatch warning coding style issues

Commit Message

Maryam Tahhan Feb. 14, 2024, 4:06 p.m. UTC
The original 'use_cni' implementation, was added
to enable support for the AF_XDP PMD in a K8s env
without any escalated privileges.
However 'use_cni' used a hardcoded socket rather
than a configurable one. If a DPDK pod is requesting
multiple net devices and these devices are from
different pools, then the AF_XDP PMD attempts to
mount all the netdev UDSes in the pod as /tmp/afxdp.sock.
Which means that at best only 1 netdev will handshake
correctly with the AF_XDP DP. This patch addresses
this by making the socket parameter configurable using
a new vdev param called 'dp_path' alongside the
original 'use_cni' param. If the 'dp_path' parameter
is not set alongside the 'use_cni' parameter, then
it's configured inside the AF_XDP PMD (transparently
to the user). This change has been tested
with the AF_XDP DP PR 81[1], with both single and
multiple interfaces.

Fixes: 7fc6ae50369d ("net/af_xdp: support CNI Integration")
Cc: stable@dpdk.org

Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
---
 doc/guides/howto/af_xdp_dp.rst         | 43 ++++++++++--
 doc/guides/nics/af_xdp.rst             | 14 ++++
 doc/guides/rel_notes/release_24_03.rst |  7 ++
 drivers/net/af_xdp/rte_eth_af_xdp.c    | 92 ++++++++++++++++----------
 4 files changed, 115 insertions(+), 41 deletions(-)
  

Patch

diff --git a/doc/guides/howto/af_xdp_dp.rst b/doc/guides/howto/af_xdp_dp.rst
index 657fc8d52c..8a64ec5599 100644
--- a/doc/guides/howto/af_xdp_dp.rst
+++ b/doc/guides/howto/af_xdp_dp.rst
@@ -52,13 +52,18 @@  should be used when creating the socket
 to instruct libbpf not to load the default libbpf program on the netdev.
 Instead the loading is handled by the AF_XDP Device Plugin.
 
+The EAL vdev argument ``dp_path`` is used alongside the ``use_cni`` argument
+to explicitly tell the AF_XDP PMD where to find the UDS to interact with the
+AF_XDP Device Plugin. If this argument is not passed alongside the ``use_cni``
+argument then the AF_XDP PMD configures it internally.
+
 Limitations
 -----------
 
 For DPDK versions <= v23.11 the Unix Domain Socket file path appears in
 the pod at "/tmp/afxdp.sock". The handshake implementation in the AF_XDP PMD
-is only compatible with the AF_XDP Device Plugin up to commit id `38317c2`_
-and the pod is limited to a single netdev.
+is only compatible with the `AF_XDP Device Plugin for Kubernetes`_  up to
+commit id `38317c2`_ and the pod is limited to a single netdev.
 
 .. note::
 
@@ -75,6 +80,14 @@  in the PMD alongside the `use_cni` parameter.
 
 .. _38317c2: https://github.com/intel/afxdp-plugins-for-kubernetes/commit/38317c256b5c7dfb39e013a0f76010c2ded03669
 
+.. note::
+
+    The introduction of the ``dp_path`` EAL vdev argument fixes the limitation above. If a
+    user doesn't explicitly set the ``dp_path``parameter when using ``use_cni`` then that
+    path is transparently configured in the AF_XDP PMD to the default
+    `AF_XDP Device Plugin for Kubernetes`_ mount point path. This is compatible with the latest
+    AF_XDP Device Plugin. For backwards compatibility with versions of the AF_XDP DP <= commit
+    id `38317c2`_ please explicitly set ``dp_path`` to ``/tmp/afxdp.sock``.
 
 Prerequisites
 -------------
@@ -105,10 +118,10 @@  Device Plugin and DPDK container prerequisites:
 
   .. code-block:: console
 
-     cat << EOF | sudo tee /etc/systemd/system/containerd.service.d/limits.conf
-     [Service]
-     LimitMEMLOCK=infinity
-     EOF
+    cat << EOF | sudo tee /etc/systemd/system/containerd.service.d/limits.conf
+    [Service]
+    LimitMEMLOCK=infinity
+    EOF
 
 * dpdk-testpmd application should have AF_XDP feature enabled.
 
@@ -284,7 +297,7 @@  Run dpdk-testpmd with the AF_XDP Device Plugin + CNI
         emptyDir:
           medium: HugePages
 
-  For further reference please use the `pod.yaml`_
+  For further reference please see the `pod.yaml`_
 
   .. _pod.yaml: https://github.com/intel/afxdp-plugins-for-kubernetes/blob/main/examples/pod-spec.yaml
 
@@ -297,3 +310,19 @@  Run dpdk-testpmd with the AF_XDP Device Plugin + CNI
            --vdev=net_af_xdp0,use_cni=1,iface=<interface name> \
            --no-mlockall --in-memory \
            -- -i --a --nb-cores=2 --rxq=1 --txq=1 --forward-mode=macswap;
+
+  Or
+
+  .. code-block:: console
+
+     kubectl exec -i <Pod name> --container <containers name> -- \
+           /<Path>/dpdk-testpmd -l 0,1 --no-pci \
+           --vdev=net_af_xdp0,use_cni=1,iface=<interface name>,dp_path="/tmp/afxdp_dp/<interface name>/afxdp.sock" \
+           --no-mlockall --in-memory \
+           -- -i --a --nb-cores=2 --rxq=1 --txq=1 --forward-mode=macswap;
+
+.. note::
+
+    If the ``dp_path`` parameter isn't explicitly set (like the example above)
+    the AF_XDP PMD will set the parameter value to
+    ``/tmp/afxdp_dp/<<interface name>>/afxdp.sock``.
diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
index 4dd9c73742..7f8651beda 100644
--- a/doc/guides/nics/af_xdp.rst
+++ b/doc/guides/nics/af_xdp.rst
@@ -171,6 +171,20 @@  enable the `AF_XDP Device Plugin for Kubernetes`_ with a DPDK application/pod.
    so enabling and disabling of the promiscuous mode through the DPDK application
    is also not supported.
 
+dp_path
+~~~~~~~
+
+The EAL vdev argument ``dp_path`` is used alongside the ``use_cni`` argument
+to explicitly tell the AF_XDP PMD where to find the UDS to interact with the
+`AF_XDP Device Plugin for Kubernetes`_. If this argument is not passed
+alongside the ``use_cni`` argument then the AF_XDP PMD configures it internally.
+
+.. _AF_XDP Device Plugin for Kubernetes: https://github.com/intel/afxdp-plugins-for-kubernetes
+
+.. code-block:: console
+
+   --vdev=net_af_xdp0,use_cni=1,dp_path="/tmp/afxdp_dp/<<interface name>>/afxdp.sock"
+
 Limitations
 -----------
 
diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
index 6f8ad27808..7f3653ad6f 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -55,6 +55,13 @@  New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Enabled AF_XDP PMD multi interface (UDS) support with AF_XDP Device Plugin**.
+
+  The EAL vdev argument for the AF_XDP PMD ``use_cni`` previously limited
+  a pod to using only a single netdev/interface. The latest changes (adding
+  the ``dp_path`` parameter) remove this limitation and maintain backward
+  compatibility for any applications already using the ``use_cni`` vdev
+  argument with the AF_XDP Device Plugin.
 
 Removed Items
 -------------
diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index 353c8688ec..6be8badf54 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -83,12 +83,13 @@  RTE_LOG_REGISTER_DEFAULT(af_xdp_logtype, NOTICE);
 
 #define ETH_AF_XDP_MP_KEY "afxdp_mp_send_fds"
 
+#define DP_BASE_PATH			"/tmp/afxdp_dp"
+#define DP_UDS_SOCK             "afxdp.sock"
 #define MAX_LONG_OPT_SZ			64
 #define UDS_MAX_FD_NUM			2
 #define UDS_MAX_CMD_LEN			64
 #define UDS_MAX_CMD_RESP		128
 #define UDS_XSK_MAP_FD_MSG		"/xsk_map_fd"
-#define UDS_SOCK			"/tmp/afxdp.sock"
 #define UDS_CONNECT_MSG			"/connect"
 #define UDS_HOST_OK_MSG			"/host_ok"
 #define UDS_HOST_NAK_MSG		"/host_nak"
@@ -171,6 +172,7 @@  struct pmd_internals {
 	bool custom_prog_configured;
 	bool force_copy;
 	bool use_cni;
+	char dp_path[PATH_MAX];
 	struct bpf_map *map;
 
 	struct rte_ether_addr eth_addr;
@@ -191,6 +193,7 @@  struct pmd_process_private {
 #define ETH_AF_XDP_BUDGET_ARG			"busy_budget"
 #define ETH_AF_XDP_FORCE_COPY_ARG		"force_copy"
 #define ETH_AF_XDP_USE_CNI_ARG			"use_cni"
+#define ETH_AF_XDP_DP_PATH_ARG			"dp_path"
 
 static const char * const valid_arguments[] = {
 	ETH_AF_XDP_IFACE_ARG,
@@ -201,6 +204,7 @@  static const char * const valid_arguments[] = {
 	ETH_AF_XDP_BUDGET_ARG,
 	ETH_AF_XDP_FORCE_COPY_ARG,
 	ETH_AF_XDP_USE_CNI_ARG,
+	ETH_AF_XDP_DP_PATH_ARG,
 	NULL
 };
 
@@ -1351,7 +1355,7 @@  configure_preferred_busy_poll(struct pkt_rx_queue *rxq)
 }
 
 static int
-init_uds_sock(struct sockaddr_un *server)
+init_uds_sock(struct sockaddr_un *server, const char *dp_path)
 {
 	int sock;
 
@@ -1362,7 +1366,7 @@  init_uds_sock(struct sockaddr_un *server)
 	}
 
 	server->sun_family = AF_UNIX;
-	strlcpy(server->sun_path, UDS_SOCK, sizeof(server->sun_path));
+	strlcpy(server->sun_path, dp_path, sizeof(server->sun_path));
 
 	if (connect(sock, (struct sockaddr *)server, sizeof(struct sockaddr_un)) < 0) {
 		close(sock);
@@ -1382,7 +1386,7 @@  struct msg_internal {
 };
 
 static int
-send_msg(int sock, char *request, int *fd)
+send_msg(int sock, char *request, int *fd, const char *dp_path)
 {
 	int snd;
 	struct iovec iov;
@@ -1393,7 +1397,7 @@  send_msg(int sock, char *request, int *fd)
 
 	memset(&dst, 0, sizeof(dst));
 	dst.sun_family = AF_UNIX;
-	strlcpy(dst.sun_path, UDS_SOCK, sizeof(dst.sun_path));
+	strlcpy(dst.sun_path, dp_path, sizeof(dst.sun_path));
 
 	/* Initialize message header structure */
 	memset(&msgh, 0, sizeof(msgh));
@@ -1470,8 +1474,8 @@  read_msg(int sock, char *response, struct sockaddr_un *s, int *fd)
 }
 
 static int
-make_request_cni(int sock, struct sockaddr_un *server, char *request,
-		 int *req_fd, char *response, int *out_fd)
+make_request_dp(int sock, struct sockaddr_un *server, char *request,
+		 int *req_fd, char *response, int *out_fd, const char *dp_path)
 {
 	int rval;
 
@@ -1483,7 +1487,7 @@  make_request_cni(int sock, struct sockaddr_un *server, char *request,
 	if (req_fd == NULL)
 		rval = write(sock, request, strlen(request));
 	else
-		rval = send_msg(sock, request, req_fd);
+		rval = send_msg(sock, request, req_fd, dp_path);
 
 	if (rval < 0) {
 		AF_XDP_LOG(ERR, "Write error %s\n", strerror(errno));
@@ -1507,7 +1511,7 @@  check_response(char *response, char *exp_resp, long size)
 }
 
 static int
-get_cni_fd(char *if_name)
+uds_get_xskmap_fd(char *if_name, const char *dp_path)
 {
 	char request[UDS_MAX_CMD_LEN], response[UDS_MAX_CMD_RESP];
 	char hostname[MAX_LONG_OPT_SZ], exp_resp[UDS_MAX_CMD_RESP];
@@ -1520,14 +1524,14 @@  get_cni_fd(char *if_name)
 		return -1;
 
 	memset(&server, 0, sizeof(server));
-	sock = init_uds_sock(&server);
+	sock = init_uds_sock(&server, dp_path);
 	if (sock < 0)
 		return -1;
 
-	/* Initiates handshake to CNI send: /connect,hostname */
+	/* Initiates handshake to the AF_XDP Device Plugin send: /connect,hostname */
 	snprintf(request, sizeof(request), "%s,%s", UDS_CONNECT_MSG, hostname);
 	memset(response, 0, sizeof(response));
-	if (make_request_cni(sock, &server, request, NULL, response, &out_fd) < 0) {
+	if (make_request_dp(sock, &server, request, NULL, response, &out_fd, dp_path) < 0) {
 		AF_XDP_LOG(ERR, "Error in processing cmd [%s]\n", request);
 		goto err_close;
 	}
@@ -1541,7 +1545,7 @@  get_cni_fd(char *if_name)
 	/* Request for "/version" */
 	strlcpy(request, UDS_VERSION_MSG, UDS_MAX_CMD_LEN);
 	memset(response, 0, sizeof(response));
-	if (make_request_cni(sock, &server, request, NULL, response, &out_fd) < 0) {
+	if (make_request_dp(sock, &server, request, NULL, response, &out_fd, dp_path) < 0) {
 		AF_XDP_LOG(ERR, "Error in processing cmd [%s]\n", request);
 		goto err_close;
 	}
@@ -1549,7 +1553,7 @@  get_cni_fd(char *if_name)
 	/* Request for file descriptor for netdev name*/
 	snprintf(request, sizeof(request), "%s,%s", UDS_XSK_MAP_FD_MSG, if_name);
 	memset(response, 0, sizeof(response));
-	if (make_request_cni(sock, &server, request, NULL, response, &out_fd) < 0) {
+	if (make_request_dp(sock, &server, request, NULL, response, &out_fd, dp_path) < 0) {
 		AF_XDP_LOG(ERR, "Error in processing cmd [%s]\n", request);
 		goto err_close;
 	}
@@ -1571,7 +1575,7 @@  get_cni_fd(char *if_name)
 	/* Initiate close connection */
 	strlcpy(request, UDS_FIN_MSG, UDS_MAX_CMD_LEN);
 	memset(response, 0, sizeof(response));
-	if (make_request_cni(sock, &server, request, NULL, response, &out_fd) < 0) {
+	if (make_request_dp(sock, &server, request, NULL, response, &out_fd, dp_path) < 0) {
 		AF_XDP_LOG(ERR, "Error in processing cmd [%s]\n", request);
 		goto err_close;
 	}
@@ -1695,17 +1699,16 @@  xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
 	}
 
 	if (internals->use_cni) {
-		int err, fd, map_fd;
+		int err, map_fd;
 
-		/* get socket fd from CNI plugin */
-		map_fd = get_cni_fd(internals->if_name);
+		/* get socket fd from AF_XDP Device Plugin */
+		map_fd = uds_get_xskmap_fd(internals->if_name, internals->dp_path);
 		if (map_fd < 0) {
-			AF_XDP_LOG(ERR, "Failed to receive CNI plugin fd\n");
+			AF_XDP_LOG(ERR, "Failed to receive xskmap fd from AF_XDP Device Plugin\n");
 			goto out_xsk;
 		}
-		/* get socket fd */
-		fd = xsk_socket__fd(rxq->xsk);
-		err = bpf_map_update_elem(map_fd, &rxq->xsk_queue_idx, &fd, 0);
+
+		err = xsk_socket__update_xskmap(rxq->xsk, map_fd);
 		if (err) {
 			AF_XDP_LOG(ERR, "Failed to insert unprivileged xsk in map.\n");
 			goto out_xsk;
@@ -1881,13 +1884,13 @@  static const struct eth_dev_ops ops = {
 	.get_monitor_addr = eth_get_monitor_addr,
 };
 
-/* CNI option works in unprivileged container environment
- * and ethernet device functionality will be reduced. So
- * additional customiszed eth_dev_ops struct is needed
- * for cni. Promiscuous enable and disable functionality
- * is removed.
+/* AF_XDP Device Plugin option works in unprivileged
+ * container environments and ethernet device functionality
+ * will be reduced. So additional customised eth_dev_ops
+ * struct is needed for the Device Plugin. Promiscuous
+ * enable and disable functionality is removed.
  **/
-static const struct eth_dev_ops ops_cni = {
+static const struct eth_dev_ops ops_afxdp_dp = {
 	.dev_start = eth_dev_start,
 	.dev_stop = eth_dev_stop,
 	.dev_close = eth_dev_close,
@@ -2023,7 +2026,8 @@  xdp_get_channels_info(const char *if_name, int *max_queues,
 static int
 parse_parameters(struct rte_kvargs *kvlist, char *if_name, int *start_queue,
 		 int *queue_cnt, int *shared_umem, char *prog_path,
-		 int *busy_budget, int *force_copy, int *use_cni)
+		 int *busy_budget, int *force_copy, int *use_cni,
+		 char *dp_path)
 {
 	int ret;
 
@@ -2069,6 +2073,11 @@  parse_parameters(struct rte_kvargs *kvlist, char *if_name, int *start_queue,
 	if (ret < 0)
 		goto free_kvlist;
 
+	ret = rte_kvargs_process(kvlist, ETH_AF_XDP_DP_PATH_ARG,
+				 &parse_prog_arg, dp_path);
+	if (ret < 0)
+		goto free_kvlist;
+
 free_kvlist:
 	rte_kvargs_free(kvlist);
 	return ret;
@@ -2108,7 +2117,7 @@  static struct rte_eth_dev *
 init_internals(struct rte_vdev_device *dev, const char *if_name,
 	       int start_queue_idx, int queue_cnt, int shared_umem,
 	       const char *prog_path, int busy_budget, int force_copy,
-	       int use_cni)
+	       int use_cni, const char *dp_path)
 {
 	const char *name = rte_vdev_device_name(dev);
 	const unsigned int numa_node = dev->device.numa_node;
@@ -2138,6 +2147,7 @@  init_internals(struct rte_vdev_device *dev, const char *if_name,
 	internals->shared_umem = shared_umem;
 	internals->force_copy = force_copy;
 	internals->use_cni = use_cni;
+    strlcpy(internals->dp_path, dp_path, PATH_MAX);
 
 	if (xdp_get_channels_info(if_name, &internals->max_queue_cnt,
 				  &internals->combined_queue_cnt)) {
@@ -2199,7 +2209,7 @@  init_internals(struct rte_vdev_device *dev, const char *if_name,
 	if (!internals->use_cni)
 		eth_dev->dev_ops = &ops;
 	else
-		eth_dev->dev_ops = &ops_cni;
+		eth_dev->dev_ops = &ops_afxdp_dp;
 
 	eth_dev->rx_pkt_burst = eth_af_xdp_rx;
 	eth_dev->tx_pkt_burst = eth_af_xdp_tx;
@@ -2328,6 +2338,7 @@  rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
 	int busy_budget = -1, ret;
 	int force_copy = 0;
 	int use_cni = 0;
+    char dp_path[PATH_MAX] = {'\0'};
 	struct rte_eth_dev *eth_dev = NULL;
 	const char *name = rte_vdev_device_name(dev);
 
@@ -2370,7 +2381,7 @@  rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
 
 	if (parse_parameters(kvlist, if_name, &xsk_start_queue_idx,
 			     &xsk_queue_cnt, &shared_umem, prog_path,
-			     &busy_budget, &force_copy, &use_cni) < 0) {
+			     &busy_budget, &force_copy, &use_cni, dp_path) < 0) {
 		AF_XDP_LOG(ERR, "Invalid kvargs value\n");
 		return -EINVAL;
 	}
@@ -2387,6 +2398,18 @@  rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
 			return -EINVAL;
 	}
 
+	if (use_cni && !strnlen(dp_path, PATH_MAX)) {
+		snprintf(dp_path, sizeof(dp_path), "%s/%s/%s", DP_BASE_PATH, if_name, DP_UDS_SOCK);
+		AF_XDP_LOG(INFO, "'%s' parameter not provided, setting value to '%s'\n",
+			ETH_AF_XDP_DP_PATH_ARG, dp_path);
+	}
+
+    if (!use_cni && strnlen(dp_path, PATH_MAX)) {
+		AF_XDP_LOG(ERR, "'%s' parameter is set, but '%s' was not enabled\n",
+			ETH_AF_XDP_DP_PATH_ARG, ETH_AF_XDP_USE_CNI_ARG);
+			return -EINVAL;
+	}
+
 	if (strlen(if_name) == 0) {
 		AF_XDP_LOG(ERR, "Network interface must be specified\n");
 		return -EINVAL;
@@ -2410,7 +2433,7 @@  rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
 
 	eth_dev = init_internals(dev, if_name, xsk_start_queue_idx,
 				 xsk_queue_cnt, shared_umem, prog_path,
-				 busy_budget, force_copy, use_cni);
+				 busy_budget, force_copy, use_cni, dp_path);
 	if (eth_dev == NULL) {
 		AF_XDP_LOG(ERR, "Failed to init internals\n");
 		return -1;
@@ -2471,4 +2494,5 @@  RTE_PMD_REGISTER_PARAM_STRING(net_af_xdp,
 			      "xdp_prog=<string> "
 			      "busy_budget=<int> "
 			      "force_copy=<int> "
-			      "use_cni=<int> ");
+			      "use_cni=<int> "
+			      "dp_path=<string> ");