[v11,3/3] net/af_xdp: support AF_XDP DP pinned maps

Message ID 20240229132129.656166-4-mtahhan@redhat.com (mailing list archive)
State Changes Requested, archived
Delegated to: Ferruh Yigit
Headers
Series net/af_xdp: fix multi interface support for K8s |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/github-robot: build success github build: passed
ci/intel-Functional success Functional PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-abi-testing success Testing PASS
ci/iol-compile-amd64-testing success Testing PASS
ci/iol-unit-arm64-testing success Testing PASS
ci/iol-compile-arm64-testing success Testing PASS
ci/iol-unit-amd64-testing success Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-sample-apps-testing success Testing PASS

Commit Message

Maryam Tahhan Feb. 29, 2024, 1:21 p.m. UTC
Enable the AF_XDP PMD to retrieve the xskmap
from a pinned eBPF map. This map is expected
to be pinned by an external entity like the
AF_XDP Device Plugin. This enabled unprivileged
pods to create and use AF_XDP sockets.

Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
---
 doc/guides/howto/af_xdp_dp.rst         | 35 ++++++++--
 doc/guides/nics/af_xdp.rst             | 34 ++++++++--
 doc/guides/rel_notes/release_24_03.rst | 10 +++
 drivers/net/af_xdp/rte_eth_af_xdp.c    | 93 ++++++++++++++++++++------
 4 files changed, 141 insertions(+), 31 deletions(-)
  

Patch

diff --git a/doc/guides/howto/af_xdp_dp.rst b/doc/guides/howto/af_xdp_dp.rst
index 4aa6b5499f..8b9b5ebbad 100644
--- a/doc/guides/howto/af_xdp_dp.rst
+++ b/doc/guides/howto/af_xdp_dp.rst
@@ -52,10 +52,21 @@  should be used when creating the socket
 to instruct libbpf not to load the default libbpf program on the netdev.
 Instead the loading is handled by the AF_XDP Device Plugin.
 
-The EAL vdev argument ``dp_path`` is used alongside the ``use_cni`` argument
-to explicitly tell the AF_XDP PMD where to find the UDS to interact with the
-AF_XDP Device Plugin. If this argument is not passed alongside the ``use_cni``
-argument then the AF_XDP PMD configures it internally.
+The EAL vdev argument ``use_pinned_map`` is used indicate to the AF_XDP PMD to
+retrieve the XSKMAP fd from a pinned eBPF map. This map is expected to be pinned
+by an external entity like the AF_XDP Device Plugin. This enabled unprivileged pods
+to create and use AF_XDP sockets. When this flag is set, the
+``XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD`` libbpf flag is used by the AF_XDP PMD when
+creating the AF_XDP socket.
+
+The EAL vdev argument ``dp_path`` is used alongside the ``use_cni`` or ``use_pinned_map``
+arguments to explicitly tell the AF_XDP PMD where to find either:
+
+1. The UDS to interact with the AF_XDP Device Plugin. OR
+2. The pinned xskmap to use when creating AF_XDP sockets.
+
+If this argument is not passed alongside the ``use_cni`` or ``use_pinned_map`` arguments then
+the AF_XDP PMD configures it internally to the `AF_XDP Device Plugin for Kubernetes`_.
 
 .. note::
 
@@ -312,8 +323,18 @@  Run dpdk-testpmd with the AF_XDP Device Plugin + CNI
            --no-mlockall --in-memory \
            -- -i --a --nb-cores=2 --rxq=1 --txq=1 --forward-mode=macswap;
 
+  Or
+
+  .. code-block:: console
+
+     kubectl exec -i <Pod name> --container <containers name> -- \
+           /<Path>/dpdk-testpmd -l 0,1 --no-pci \
+           --vdev=net_af_xdp0,use_pinned_map=1,iface=<interface name>,dp_path="/tmp/afxdp_dp/<interface name>/xsks_map" \
+           --no-mlockall --in-memory \
+           -- -i --a --nb-cores=2 --rxq=1 --txq=1 --forward-mode=macswap;
+
 .. note::
 
-    If the ``dp_path`` parameter isn't explicitly set (like the example above)
-    the AF_XDP PMD will set the parameter value to
-    ``/tmp/afxdp_dp/<<interface name>>/afxdp.sock``.
+    If the ``dp_path`` parameter isn't explicitly set with ``use_cni`` or ``use_pinned_map``
+    the AF_XDP PMD will set the parameter values to the `AF_XDP Device Plugin for Kubernetes`_
+    defaults.
diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
index 7f8651beda..940bbf60f2 100644
--- a/doc/guides/nics/af_xdp.rst
+++ b/doc/guides/nics/af_xdp.rst
@@ -171,13 +171,35 @@  enable the `AF_XDP Device Plugin for Kubernetes`_ with a DPDK application/pod.
    so enabling and disabling of the promiscuous mode through the DPDK application
    is also not supported.
 
+use_pinned_map
+~~~~~~~~~~~~~~
+
+The EAL vdev argument ``use_pinned_map`` is used to indicate that the user wishes to
+load a pinned xskmap mounted by `AF_XDP Device Plugin for Kubernetes`_ in the DPDK
+application/pod.
+
+.. _AF_XDP Device Plugin for Kubernetes: https://github.com/intel/afxdp-plugins-for-kubernetes
+
+.. code-block:: console
+
+   --vdev=net_af_xdp0,use_pinned_map=1
+
+.. note::
+
+    This feature can also be used with any external entity that can pin an eBPF map, not just
+    the `AF_XDP Device Plugin for Kubernetes`_.
+
 dp_path
 ~~~~~~~
 
-The EAL vdev argument ``dp_path`` is used alongside the ``use_cni`` argument
-to explicitly tell the AF_XDP PMD where to find the UDS to interact with the
-`AF_XDP Device Plugin for Kubernetes`_. If this argument is not passed
-alongside the ``use_cni`` argument then the AF_XDP PMD configures it internally.
+The EAL vdev argument ``dp_path`` is used alongside the ``use_cni`` or ``use_pinned_map``
+arguments to explicitly tell the AF_XDP PMD where to find either:
+
+1. The UDS to interact with the AF_XDP Device Plugin. OR
+2. The pinned xskmap to use when creating AF_XDP sockets.
+
+If this argument is not passed alongside the ``use_cni`` or ``use_pinned_map`` arguments then
+the AF_XDP PMD configures it internally to the `AF_XDP Device Plugin for Kubernetes`_.
 
 .. _AF_XDP Device Plugin for Kubernetes: https://github.com/intel/afxdp-plugins-for-kubernetes
 
@@ -185,6 +207,10 @@  alongside the ``use_cni`` argument then the AF_XDP PMD configures it internally.
 
    --vdev=net_af_xdp0,use_cni=1,dp_path="/tmp/afxdp_dp/<<interface name>>/afxdp.sock"
 
+.. code-block:: console
+
+   --vdev=net_af_xdp0,use_pinned_map=1,dp_path="/tmp/afxdp_dp/<<interface name>>/xsks_map"
+
 Limitations
 -----------
 
diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
index b2b1f2566f..95d9a0f842 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -146,6 +146,16 @@  New Features
   compatibility for any applications already using the ``use_cni`` vdev
   argument with the AF_XDP Device Plugin.
 
+* **Integrated AF_XDP PMD with AF_XDP Device Plugin eBPF map pinning support**.
+
+  The EAL vdev argument for the AF_XDP PMD ``use_map_pinning`` was added
+  to allow Kubernetes Pods to use AF_XDP with DPDK, and run  with limited
+  privileges, without having to do a full handshake over a Unix Domain
+  Socket with the Device Plugin. This flag indicates that the AF_XDP PMD
+  will be used in unprivileged mode and will obtain the XSKMAP FD by calling
+  ``bpf_obj_get()`` for an xskmap pinned (by the AF_XDP DP) inside the
+  container.
+
 Removed Items
 -------------
 
diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index 3fb0c6a3b9..f13bdb9017 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -85,6 +85,7 @@  RTE_LOG_REGISTER_DEFAULT(af_xdp_logtype, NOTICE);
 
 #define DP_BASE_PATH			"/tmp/afxdp_dp"
 #define DP_UDS_SOCK             "afxdp.sock"
+#define DP_XSK_MAP				"xsks_map"
 #define MAX_LONG_OPT_SZ			64
 #define UDS_MAX_FD_NUM			2
 #define UDS_MAX_CMD_LEN			64
@@ -172,6 +173,7 @@  struct pmd_internals {
 	bool custom_prog_configured;
 	bool force_copy;
 	bool use_cni;
+	bool use_pinned_map;
 	char dp_path[PATH_MAX];
 	struct bpf_map *map;
 
@@ -193,6 +195,7 @@  struct pmd_process_private {
 #define ETH_AF_XDP_BUDGET_ARG			"busy_budget"
 #define ETH_AF_XDP_FORCE_COPY_ARG		"force_copy"
 #define ETH_AF_XDP_USE_CNI_ARG			"use_cni"
+#define ETH_AF_XDP_USE_PINNED_MAP_ARG	"use_pinned_map"
 #define ETH_AF_XDP_DP_PATH_ARG			"dp_path"
 
 static const char * const valid_arguments[] = {
@@ -204,6 +207,7 @@  static const char * const valid_arguments[] = {
 	ETH_AF_XDP_BUDGET_ARG,
 	ETH_AF_XDP_FORCE_COPY_ARG,
 	ETH_AF_XDP_USE_CNI_ARG,
+	ETH_AF_XDP_USE_PINNED_MAP_ARG,
 	ETH_AF_XDP_DP_PATH_ARG,
 	NULL
 };
@@ -1258,6 +1262,21 @@  xsk_umem_info *xdp_umem_configure(struct pmd_internals *internals,
 }
 #endif
 
+static int
+get_pinned_map(const char *dp_path, int *map_fd)
+{
+	*map_fd  = bpf_obj_get(dp_path);
+	if (!*map_fd) {
+		AF_XDP_LOG(ERR, "Failed to find xsks_map in %s\n", dp_path);
+		return -1;
+	}
+
+	AF_XDP_LOG(INFO, "Successfully retrieved map %s with fd %d\n",
+				dp_path, *map_fd);
+
+	return 0;
+}
+
 static int
 load_custom_xdp_prog(const char *prog_path, int if_index, struct bpf_map **map)
 {
@@ -1644,7 +1663,7 @@  xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
 #endif
 
 	/* Disable libbpf from loading XDP program */
-	if (internals->use_cni)
+	if (internals->use_cni || internals->use_pinned_map)
 		cfg.libbpf_flags |= XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD;
 
 	if (strnlen(internals->prog_path, PATH_MAX)) {
@@ -1698,14 +1717,23 @@  xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
 		}
 	}
 
-	if (internals->use_cni) {
+	if (internals->use_cni || internals->use_pinned_map) {
 		int err, map_fd;
 
-		/* get socket fd from AF_XDP Device Plugin */
-		map_fd = uds_get_xskmap_fd(internals->if_name, internals->dp_path);
-		if (map_fd < 0) {
-			AF_XDP_LOG(ERR, "Failed to receive xskmap fd from AF_XDP Device Plugin\n");
-			goto out_xsk;
+		if (internals->use_cni) {
+			/* get socket fd from AF_XDP Device Plugin */
+			map_fd = uds_get_xskmap_fd(internals->if_name, internals->dp_path);
+			if (map_fd < 0) {
+				AF_XDP_LOG(ERR, "Failed to receive xskmap fd from AF_XDP Device Plugin\n");
+				goto out_xsk;
+			}
+		} else {
+			/* get socket fd from AF_XDP plugin */
+			err = get_pinned_map(internals->dp_path, &map_fd);
+			if (err < 0 || map_fd < 0) {
+				AF_XDP_LOG(ERR, "Failed to retrieve pinned map fd\n");
+				goto out_xsk;
+			}
 		}
 
 		err = xsk_socket__update_xskmap(rxq->xsk, map_fd);
@@ -2027,7 +2055,7 @@  static int
 parse_parameters(struct rte_kvargs *kvlist, char *if_name, int *start_queue,
 		 int *queue_cnt, int *shared_umem, char *prog_path,
 		 int *busy_budget, int *force_copy, int *use_cni,
-		 char *dp_path)
+		 int *use_pinned_map, char *dp_path)
 {
 	int ret;
 
@@ -2073,6 +2101,11 @@  parse_parameters(struct rte_kvargs *kvlist, char *if_name, int *start_queue,
 	if (ret < 0)
 		goto free_kvlist;
 
+	ret = rte_kvargs_process(kvlist, ETH_AF_XDP_USE_PINNED_MAP_ARG,
+				 &parse_integer_arg, use_pinned_map);
+	if (ret < 0)
+		goto free_kvlist;
+
 	ret = rte_kvargs_process(kvlist, ETH_AF_XDP_DP_PATH_ARG,
 				 &parse_prog_arg, dp_path);
 	if (ret < 0)
@@ -2117,7 +2150,7 @@  static struct rte_eth_dev *
 init_internals(struct rte_vdev_device *dev, const char *if_name,
 	       int start_queue_idx, int queue_cnt, int shared_umem,
 	       const char *prog_path, int busy_budget, int force_copy,
-	       int use_cni, const char *dp_path)
+	       int use_cni, int use_pinned_map, const char *dp_path)
 {
 	const char *name = rte_vdev_device_name(dev);
 	const unsigned int numa_node = dev->device.numa_node;
@@ -2147,6 +2180,7 @@  init_internals(struct rte_vdev_device *dev, const char *if_name,
 	internals->shared_umem = shared_umem;
 	internals->force_copy = force_copy;
 	internals->use_cni = use_cni;
+	internals->use_pinned_map = use_pinned_map;
 	strlcpy(internals->dp_path, dp_path, PATH_MAX);
 
 	if (xdp_get_channels_info(if_name, &internals->max_queue_cnt,
@@ -2206,7 +2240,7 @@  init_internals(struct rte_vdev_device *dev, const char *if_name,
 	eth_dev->data->dev_link = pmd_link;
 	eth_dev->data->mac_addrs = &internals->eth_addr;
 	eth_dev->data->dev_flags |= RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS;
-	if (!internals->use_cni)
+	if (!internals->use_cni && !internals->use_pinned_map)
 		eth_dev->dev_ops = &ops;
 	else
 		eth_dev->dev_ops = &ops_afxdp_dp;
@@ -2338,6 +2372,7 @@  rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
 	int busy_budget = -1, ret;
 	int force_copy = 0;
 	int use_cni = 0;
+	int use_pinned_map = 0;
 	char dp_path[PATH_MAX] = {'\0'};
 	struct rte_eth_dev *eth_dev = NULL;
 	const char *name = rte_vdev_device_name(dev);
@@ -2381,20 +2416,29 @@  rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
 
 	if (parse_parameters(kvlist, if_name, &xsk_start_queue_idx,
 			     &xsk_queue_cnt, &shared_umem, prog_path,
-			     &busy_budget, &force_copy, &use_cni, dp_path) < 0) {
+			     &busy_budget, &force_copy, &use_cni, &use_pinned_map,
+			     dp_path) < 0) {
 		AF_XDP_LOG(ERR, "Invalid kvargs value\n");
 		return -EINVAL;
 	}
 
-	if (use_cni && busy_budget > 0) {
+	if (use_cni && use_pinned_map) {
 		AF_XDP_LOG(ERR, "When '%s' parameter is used, '%s' parameter is not valid\n",
-			ETH_AF_XDP_USE_CNI_ARG, ETH_AF_XDP_BUDGET_ARG);
+			ETH_AF_XDP_USE_CNI_ARG, ETH_AF_XDP_USE_PINNED_MAP_ARG);
 		return -EINVAL;
 	}
 
-	if (use_cni && strnlen(prog_path, PATH_MAX)) {
-		AF_XDP_LOG(ERR, "When '%s' parameter is used, '%s' parameter is not valid\n",
-			ETH_AF_XDP_USE_CNI_ARG, ETH_AF_XDP_PROG_ARG);
+	if ((use_cni || use_pinned_map) && busy_budget > 0) {
+		AF_XDP_LOG(ERR, "When '%s' or '%s' parameter is used, '%s' parameter is not valid\n",
+			ETH_AF_XDP_USE_CNI_ARG, ETH_AF_XDP_USE_PINNED_MAP_ARG,
+			ETH_AF_XDP_BUDGET_ARG);
+		return -EINVAL;
+	}
+
+	if ((use_cni || use_pinned_map) && strnlen(prog_path, PATH_MAX)) {
+		AF_XDP_LOG(ERR, "When '%s' or '%s' parameter is used, '%s' parameter is not valid\n",
+			ETH_AF_XDP_USE_CNI_ARG, ETH_AF_XDP_USE_PINNED_MAP_ARG,
+			ETH_AF_XDP_PROG_ARG);
 		return -EINVAL;
 	}
 
@@ -2404,9 +2448,16 @@  rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
 			ETH_AF_XDP_DP_PATH_ARG, dp_path);
 	}
 
-	if (!use_cni && strnlen(dp_path, PATH_MAX)) {
-		AF_XDP_LOG(ERR, "'%s' parameter is set, but '%s' was not enabled\n",
-			ETH_AF_XDP_DP_PATH_ARG, ETH_AF_XDP_USE_CNI_ARG);
+	if (use_pinned_map && !strnlen(dp_path, PATH_MAX)) {
+		snprintf(dp_path, sizeof(dp_path), "%s/%s/%s", DP_BASE_PATH, if_name, DP_XSK_MAP);
+		AF_XDP_LOG(INFO, "'%s' parameter not provided, setting value to '%s'\n",
+			ETH_AF_XDP_DP_PATH_ARG, dp_path);
+	}
+
+	if ((!use_cni && !use_pinned_map) && strnlen(dp_path, PATH_MAX)) {
+		AF_XDP_LOG(ERR, "'%s' parameter is set, but '%s' or '%s' were not enabled\n",
+			ETH_AF_XDP_DP_PATH_ARG, ETH_AF_XDP_USE_CNI_ARG,
+			ETH_AF_XDP_USE_PINNED_MAP_ARG);
 		return -EINVAL;
 	}
 
@@ -2433,7 +2484,8 @@  rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
 
 	eth_dev = init_internals(dev, if_name, xsk_start_queue_idx,
 				 xsk_queue_cnt, shared_umem, prog_path,
-				 busy_budget, force_copy, use_cni, dp_path);
+				 busy_budget, force_copy, use_cni, use_pinned_map,
+				 dp_path);
 	if (eth_dev == NULL) {
 		AF_XDP_LOG(ERR, "Failed to init internals\n");
 		return -1;
@@ -2495,4 +2547,5 @@  RTE_PMD_REGISTER_PARAM_STRING(net_af_xdp,
 			      "busy_budget=<int> "
 			      "force_copy=<int> "
 			      "use_cni=<int> "
+			      "use_pinned_map=<int> "
 			      "dp_path=<string> ");