[v10,2/3] net/af_xdp: fix multi interface support for K8s
Commit Message
The original 'use_cni' implementation was added
to enable support for the AF_XDP PMD in a K8s env
without any escalated privileges.
However, 'use_cni' used a hardcoded socket path rather
than a configurable one. If a DPDK pod requests
multiple net devices and these devices are from
different pools, then the AF_XDP PMD attempts to
mount all the netdev UDSes in the pod as /tmp/afxdp.sock,
which means that at best only one netdev will handshake
correctly with the AF_XDP DP. This patch addresses
this by making the socket path configurable using
a new vdev param called 'dp_path' alongside the
original 'use_cni' param. If the 'dp_path' parameter
is not set alongside the 'use_cni' parameter, then
it's configured inside the AF_XDP PMD (transparently
to the user). This change has been tested
with the AF_XDP DP PR 81 [1], with both single and
multiple interfaces.
[1] https://github.com/intel/afxdp-plugins-for-kubernetes/pull/81
Fixes: 7fc6ae50369d ("net/af_xdp: support CNI Integration")
Cc: stable@dpdk.org
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
---
doc/guides/howto/af_xdp_dp.rst | 62 +++++++++++------
doc/guides/nics/af_xdp.rst | 14 ++++
doc/guides/rel_notes/release_24_03.rst | 7 ++
drivers/net/af_xdp/rte_eth_af_xdp.c | 94 ++++++++++++++++----------
4 files changed, 121 insertions(+), 56 deletions(-)
Comments
On 29/02/2024 13:01, Maryam Tahhan wrote:
> The original 'use_cni' implementation, was added
> to enable support for the AF_XDP PMD in a K8s env
> without any escalated privileges.
> However 'use_cni' used a hardcoded socket rather
> than a configurable one. If a DPDK pod is requesting
> multiple net devices and these devices are from
> different pools, then the AF_XDP PMD attempts to
> mount all the netdev UDSes in the pod as /tmp/afxdp.sock.
> Which means that at best only 1 netdev will handshake
> correctly with the AF_XDP DP. This patch addresses
> this by making the socket parameter configurable using
> a new vdev param called 'dp_path' alongside the
> original 'use_cni' param. If the 'dp_path' parameter
> is not set alongside the 'use_cni' parameter, then
> it's configured inside the AF_XDP PMD (transparently
> to the user). This change has been tested
> with the AF_XDP DP PR 81[1], with both single and
> multiple interfaces.
>
> [1] https://github.com/intel/afxdp-plugins-for-kubernetes/pull/81
>
> Fixes: 7fc6ae50369d ("net/af_xdp: support CNI Integration")
> Cc: stable@dpdk.org
>
> Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
> ---
> doc/guides/howto/af_xdp_dp.rst | 62 +++++++++++------
> doc/guides/nics/af_xdp.rst | 14 ++++
> doc/guides/rel_notes/release_24_03.rst | 7 ++
> drivers/net/af_xdp/rte_eth_af_xdp.c | 94 ++++++++++++++++----------
> 4 files changed, 121 insertions(+), 56 deletions(-)
>
> diff --git a/doc/guides/howto/af_xdp_dp.rst b/doc/guides/howto/af_xdp_dp.rst
> index 7166d904bd..ec348c3b82 100644
> --- a/doc/guides/howto/af_xdp_dp.rst
> +++ b/doc/guides/howto/af_xdp_dp.rst
> @@ -52,29 +52,33 @@ should be used when creating the socket
> to instruct libbpf not to load the default libbpf program on the netdev.
> Instead the loading is handled by the AF_XDP Device Plugin.
>
> -Limitations
> ------------
> +The EAL vdev argument ``dp_path`` is used alongside the ``use_cni`` argument
> +to explicitly tell the AF_XDP PMD where to find the UDS to interact with the
> +AF_XDP Device Plugin. If this argument is not passed alongside the ``use_cni``
> +argument then the AF_XDP PMD configures it internally.
>
> -For DPDK versions <= v23.11 the Unix Domain Socket file path appears in
> -the pod at "/tmp/afxdp.sock". The handshake implementation in the AF_XDP PMD
> -is only compatible with the AF_XDP Device Plugin up to commit id `38317c2`_
> -and the pod is limited to a single netdev.
> +.. note::
> +
> + DPDK AF_XDP PMD <= v23.11 will only work with the AF_XDP Device Plugin
> + <= commit id `38317c2`_.
>
> .. note::
>
> - DPDK AF_XDP PMD <= v23.11 will not work with the latest version of the
> - AF_XDP Device Plugin.
> + DPDK AF_XDP PMD > v23.11 will work with the latest version of the
> + AF_XDP Device Plugin through a combination of the ``dp_path`` and/or
> + the ``use_cni`` parameter. In these versions of the PMD if a user doesn't
> + explicitly set the ``dp_path``parameter when using ``use_cni`` then that
I see the typo - will respin - sorry, it's been a long day already
> + path is transparently configured in the AF_XDP PMD to the default
> + `AF_XDP Device Plugin for Kubernetes`_ mount point path. The path can
> + be overridden by explicitly setting the ``dp_path`` param.
>
> -The issue is if a single pod requests different devices from different pools it
> -results in multiple UDS servers serving the pod with the container using only a
> -single mount point for their UDS as ``/tmp/afxdp.sock``. This means that at best one
> -device might be able to complete the handshake. This has been fixed in the AF_XDP
> -Device Plugin so that the mount point in the pods for the UDS appear at
> -``/tmp/afxdp_dp/<netdev>/afxdp.sock``. Later versions of DPDK fix this hardcoded path
> -in the PMD alongside the ``use_cni`` parameter.
> +.. note::
>
> -.. _38317c2: https://github.com/intel/afxdp-plugins-for-kubernetes/commit/38317c256b5c7dfb39e013a0f76010c2ded03669
> + DPDK AF_XDP PMD > v23.11 is backwards compatible with (older) versions
> + of the AF_XDP DP <= commit id `38317c2`_ by explicitly setting ``dp_path`` to
> + ``/tmp/afxdp.sock``.
>
> +.. _38317c2: https://github.com/intel/afxdp-plugins-for-kubernetes/commit/38317c256b5c7dfb39e013a0f76010c2ded03669
>
> Prerequisites
> -------------
> @@ -105,10 +109,10 @@ Device Plugin and DPDK container prerequisites:
>
> .. code-block:: console
>
> - cat << EOF | sudo tee /etc/systemd/system/containerd.service.d/limits.conf
> - [Service]
> - LimitMEMLOCK=infinity
> - EOF
> + cat << EOF | sudo tee /etc/systemd/system/containerd.service.d/limits.conf
> + [Service]
> + LimitMEMLOCK=infinity
> + EOF
>
> * dpdk-testpmd application should have AF_XDP feature enabled.
>
> @@ -284,7 +288,7 @@ Run dpdk-testpmd with the AF_XDP Device Plugin + CNI
> emptyDir:
> medium: HugePages
>
> - For further reference please use the `pod.yaml`_
> + For further reference please see the `pod.yaml`_
>
> .. _pod.yaml: https://github.com/intel/afxdp-plugins-for-kubernetes/blob/main/examples/pod-spec.yaml
>
> @@ -297,3 +301,19 @@ Run dpdk-testpmd with the AF_XDP Device Plugin + CNI
> --vdev=net_af_xdp0,use_cni=1,iface=<interface name> \
> --no-mlockall --in-memory \
> -- -i --a --nb-cores=2 --rxq=1 --txq=1 --forward-mode=macswap;
> +
> + Or
> +
> + .. code-block:: console
> +
> + kubectl exec -i <Pod name> --container <containers name> -- \
> + /<Path>/dpdk-testpmd -l 0,1 --no-pci \
> + --vdev=net_af_xdp0,use_cni=1,iface=<interface name>,dp_path="/tmp/afxdp_dp/<interface name>/afxdp.sock" \
> + --no-mlockall --in-memory \
> + -- -i --a --nb-cores=2 --rxq=1 --txq=1 --forward-mode=macswap;
> +
> +.. note::
> +
> + If the ``dp_path`` parameter isn't explicitly set (as it is in the example above),
> + the AF_XDP PMD will set the parameter value to
> + ``/tmp/afxdp_dp/<interface name>/afxdp.sock``.
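For the multi-interface case this series targets, each netdev would get its own vdev with its own ``dp_path``. A minimal sketch of composing those arguments follows. `format_af_xdp_vdev` is a hypothetical helper, not part of the PMD or the Device Plugin, and it assumes the default mount layout `/tmp/afxdp_dp/<interface name>/afxdp.sock` described in the note above.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of DPDK): format one --vdev argument for
 * the AF_XDP PMD, pointing dp_path at the per-interface UDS.
 * Assumes the Device Plugin's default mount layout.
 * Returns 0 on success, -1 if the result would not fit in buf.
 */
static int
format_af_xdp_vdev(char *buf, size_t len, unsigned int idx, const char *if_name)
{
	int n = snprintf(buf, len,
		"--vdev=net_af_xdp%u,use_cni=1,iface=%s,"
		"dp_path=/tmp/afxdp_dp/%s/afxdp.sock",
		idx, if_name, if_name);

	/* snprintf reports the length it wanted; treat truncation as failure. */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Calling it once per interface (for example index 0 with `eth0` and index 1 with `eth1`) would yield two independent vdev arguments, each handshaking over its own socket.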
> diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
> index 4dd9c73742..7f8651beda 100644
> --- a/doc/guides/nics/af_xdp.rst
> +++ b/doc/guides/nics/af_xdp.rst
> @@ -171,6 +171,20 @@ enable the `AF_XDP Device Plugin for Kubernetes`_ with a DPDK application/pod.
> so enabling and disabling of the promiscuous mode through the DPDK application
> is also not supported.
>
> +dp_path
> +~~~~~~~
> +
> +The EAL vdev argument ``dp_path`` is used alongside the ``use_cni`` argument
> +to explicitly tell the AF_XDP PMD where to find the UDS to interact with the
> +`AF_XDP Device Plugin for Kubernetes`_. If this argument is not passed
> +alongside the ``use_cni`` argument then the AF_XDP PMD configures it internally.
> +
> +.. _AF_XDP Device Plugin for Kubernetes: https://github.com/intel/afxdp-plugins-for-kubernetes
> +
> +.. code-block:: console
> +
> + --vdev=net_af_xdp0,use_cni=1,dp_path="/tmp/afxdp_dp/<<interface name>>/afxdp.sock"
> +
> Limitations
> -----------
>
> diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
> index 879bb4944c..b2b1f2566f 100644
> --- a/doc/guides/rel_notes/release_24_03.rst
> +++ b/doc/guides/rel_notes/release_24_03.rst
> @@ -138,6 +138,13 @@ New Features
> to support TLS v1.2, TLS v1.3 and DTLS v1.2.
> * Added PMD API to allow raw submission of instructions to CPT.
>
> +* **Enabled AF_XDP PMD multi interface (UDS) support with AF_XDP Device Plugin**.
> +
> + The EAL vdev argument for the AF_XDP PMD ``use_cni`` previously limited
> + a pod to using only a single netdev/interface. The latest changes (adding
> + the ``dp_path`` parameter) remove this limitation and maintain backward
> + compatibility for any applications already using the ``use_cni`` vdev
> + argument with the AF_XDP Device Plugin.
>
> Removed Items
> -------------
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index 2d151e45c7..3fb0c6a3b9 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -83,12 +83,13 @@ RTE_LOG_REGISTER_DEFAULT(af_xdp_logtype, NOTICE);
>
> #define ETH_AF_XDP_MP_KEY "afxdp_mp_send_fds"
>
> +#define DP_BASE_PATH "/tmp/afxdp_dp"
> +#define DP_UDS_SOCK "afxdp.sock"
> #define MAX_LONG_OPT_SZ 64
> #define UDS_MAX_FD_NUM 2
> #define UDS_MAX_CMD_LEN 64
> #define UDS_MAX_CMD_RESP 128
> #define UDS_XSK_MAP_FD_MSG "/xsk_map_fd"
> -#define UDS_SOCK "/tmp/afxdp.sock"
> #define UDS_CONNECT_MSG "/connect"
> #define UDS_HOST_OK_MSG "/host_ok"
> #define UDS_HOST_NAK_MSG "/host_nak"
> @@ -171,6 +172,7 @@ struct pmd_internals {
> bool custom_prog_configured;
> bool force_copy;
> bool use_cni;
> + char dp_path[PATH_MAX];
> struct bpf_map *map;
>
> struct rte_ether_addr eth_addr;
> @@ -191,6 +193,7 @@ struct pmd_process_private {
> #define ETH_AF_XDP_BUDGET_ARG "busy_budget"
> #define ETH_AF_XDP_FORCE_COPY_ARG "force_copy"
> #define ETH_AF_XDP_USE_CNI_ARG "use_cni"
> +#define ETH_AF_XDP_DP_PATH_ARG "dp_path"
>
> static const char * const valid_arguments[] = {
> ETH_AF_XDP_IFACE_ARG,
> @@ -201,6 +204,7 @@ static const char * const valid_arguments[] = {
> ETH_AF_XDP_BUDGET_ARG,
> ETH_AF_XDP_FORCE_COPY_ARG,
> ETH_AF_XDP_USE_CNI_ARG,
> + ETH_AF_XDP_DP_PATH_ARG,
> NULL
> };
>
> @@ -1351,7 +1355,7 @@ configure_preferred_busy_poll(struct pkt_rx_queue *rxq)
> }
>
> static int
> -init_uds_sock(struct sockaddr_un *server)
> +init_uds_sock(struct sockaddr_un *server, const char *dp_path)
> {
> int sock;
>
> @@ -1362,7 +1366,7 @@ init_uds_sock(struct sockaddr_un *server)
> }
>
> server->sun_family = AF_UNIX;
> - strlcpy(server->sun_path, UDS_SOCK, sizeof(server->sun_path));
> + strlcpy(server->sun_path, dp_path, sizeof(server->sun_path));
>
> if (connect(sock, (struct sockaddr *)server, sizeof(struct sockaddr_un)) < 0) {
> close(sock);
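One thing to watch in this hunk: dp_path is a PATH_MAX buffer, while sun_path in struct sockaddr_un is much smaller (108 bytes on Linux), so strlcpy() would silently truncate a long path and connect() would then target a different name. A minimal sketch of an explicit length check follows. `fill_uds_addr` is a hypothetical helper and is not in the patch.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/*
 * Hypothetical helper (not in the patch): populate a sockaddr_un from
 * dp_path, rejecting empty paths and paths that do not fit in sun_path
 * instead of truncating them.
 * Returns 0 on success, -1 on an unusable path.
 */
static int
fill_uds_addr(struct sockaddr_un *addr, const char *dp_path)
{
	size_t len = strlen(dp_path);

	/* Leave room for the terminating NUL. */
	if (len == 0 || len >= sizeof(addr->sun_path))
		return -1;

	memset(addr, 0, sizeof(*addr));
	addr->sun_family = AF_UNIX;
	memcpy(addr->sun_path, dp_path, len + 1);
	return 0;
}
```

Whether a failure here should be reported at probe time or at connect time is a design choice left to the driver.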
> @@ -1382,7 +1386,7 @@ struct msg_internal {
> };
>
> static int
> -send_msg(int sock, char *request, int *fd)
> +send_msg(int sock, char *request, int *fd, const char *dp_path)
> {
> int snd;
> struct iovec iov;
> @@ -1393,7 +1397,7 @@ send_msg(int sock, char *request, int *fd)
>
> memset(&dst, 0, sizeof(dst));
> dst.sun_family = AF_UNIX;
> - strlcpy(dst.sun_path, UDS_SOCK, sizeof(dst.sun_path));
> + strlcpy(dst.sun_path, dp_path, sizeof(dst.sun_path));
>
> /* Initialize message header structure */
> memset(&msgh, 0, sizeof(msgh));
> @@ -1470,8 +1474,8 @@ read_msg(int sock, char *response, struct sockaddr_un *s, int *fd)
> }
>
> static int
> -make_request_cni(int sock, struct sockaddr_un *server, char *request,
> - int *req_fd, char *response, int *out_fd)
> +make_request_dp(int sock, struct sockaddr_un *server, char *request,
> + int *req_fd, char *response, int *out_fd, const char *dp_path)
> {
> int rval;
>
> @@ -1483,7 +1487,7 @@ make_request_cni(int sock, struct sockaddr_un *server, char *request,
> if (req_fd == NULL)
> rval = write(sock, request, strlen(request));
> else
> - rval = send_msg(sock, request, req_fd);
> + rval = send_msg(sock, request, req_fd, dp_path);
>
> if (rval < 0) {
> AF_XDP_LOG(ERR, "Write error %s\n", strerror(errno));
> @@ -1507,7 +1511,7 @@ check_response(char *response, char *exp_resp, long size)
> }
>
> static int
> -get_cni_fd(char *if_name)
> +uds_get_xskmap_fd(char *if_name, const char *dp_path)
> {
> char request[UDS_MAX_CMD_LEN], response[UDS_MAX_CMD_RESP];
> char hostname[MAX_LONG_OPT_SZ], exp_resp[UDS_MAX_CMD_RESP];
> @@ -1520,14 +1524,14 @@ get_cni_fd(char *if_name)
> return -1;
>
> memset(&server, 0, sizeof(server));
> - sock = init_uds_sock(&server);
> + sock = init_uds_sock(&server, dp_path);
> if (sock < 0)
> return -1;
>
> - /* Initiates handshake to CNI send: /connect,hostname */
> + /* Initiates handshake to the AF_XDP Device Plugin send: /connect,hostname */
> snprintf(request, sizeof(request), "%s,%s", UDS_CONNECT_MSG, hostname);
> memset(response, 0, sizeof(response));
> - if (make_request_cni(sock, &server, request, NULL, response, &out_fd) < 0) {
> + if (make_request_dp(sock, &server, request, NULL, response, &out_fd, dp_path) < 0) {
> AF_XDP_LOG(ERR, "Error in processing cmd [%s]\n", request);
> goto err_close;
> }
> @@ -1541,7 +1545,7 @@ get_cni_fd(char *if_name)
> /* Request for "/version" */
> strlcpy(request, UDS_VERSION_MSG, UDS_MAX_CMD_LEN);
> memset(response, 0, sizeof(response));
> - if (make_request_cni(sock, &server, request, NULL, response, &out_fd) < 0) {
> + if (make_request_dp(sock, &server, request, NULL, response, &out_fd, dp_path) < 0) {
> AF_XDP_LOG(ERR, "Error in processing cmd [%s]\n", request);
> goto err_close;
> }
> @@ -1549,7 +1553,7 @@ get_cni_fd(char *if_name)
> /* Request for file descriptor for netdev name*/
> snprintf(request, sizeof(request), "%s,%s", UDS_XSK_MAP_FD_MSG, if_name);
> memset(response, 0, sizeof(response));
> - if (make_request_cni(sock, &server, request, NULL, response, &out_fd) < 0) {
> + if (make_request_dp(sock, &server, request, NULL, response, &out_fd, dp_path) < 0) {
> AF_XDP_LOG(ERR, "Error in processing cmd [%s]\n", request);
> goto err_close;
> }
> @@ -1571,7 +1575,7 @@ get_cni_fd(char *if_name)
> /* Initiate close connection */
> strlcpy(request, UDS_FIN_MSG, UDS_MAX_CMD_LEN);
> memset(response, 0, sizeof(response));
> - if (make_request_cni(sock, &server, request, NULL, response, &out_fd) < 0) {
> + if (make_request_dp(sock, &server, request, NULL, response, &out_fd, dp_path) < 0) {
> AF_XDP_LOG(ERR, "Error in processing cmd [%s]\n", request);
> goto err_close;
> }
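The handshake requests in these hunks are built as "<command>,<argument>" strings into a UDS_MAX_CMD_LEN (64 byte) buffer, and the hostname buffer is itself up to 64 bytes, so a long hostname could be truncated without notice. The sketch below shows the same formatting with the truncation made visible. `format_uds_request` is a hypothetical helper, and only the two message macros whose values appear in this patch are used.

```c
#include <stdio.h>

#define UDS_MAX_CMD_LEN    64
#define UDS_CONNECT_MSG    "/connect"
#define UDS_XSK_MAP_FD_MSG "/xsk_map_fd"

/*
 * Hypothetical helper (not in the patch): build "<cmd>,<arg>" as the
 * handshake does, but report truncation instead of sending a cut-off
 * request.
 * Returns 0 on success, -1 if the request does not fit.
 */
static int
format_uds_request(char *req, size_t len, const char *cmd, const char *arg)
{
	int n = snprintf(req, len, "%s,%s", cmd, arg);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For example, UDS_CONNECT_MSG with a hostname of `pod-a` produces `/connect,pod-a`, and UDS_XSK_MAP_FD_MSG with `eth0` produces `/xsk_map_fd,eth0`.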
> @@ -1695,17 +1699,16 @@ xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
> }
>
> if (internals->use_cni) {
> - int err, fd, map_fd;
> + int err, map_fd;
>
> - /* get socket fd from CNI plugin */
> - map_fd = get_cni_fd(internals->if_name);
> + /* get xskmap fd from AF_XDP Device Plugin */
> + map_fd = uds_get_xskmap_fd(internals->if_name, internals->dp_path);
> if (map_fd < 0) {
> - AF_XDP_LOG(ERR, "Failed to receive CNI plugin fd\n");
> + AF_XDP_LOG(ERR, "Failed to receive xskmap fd from AF_XDP Device Plugin\n");
> goto out_xsk;
> }
> - /* get socket fd */
> - fd = xsk_socket__fd(rxq->xsk);
> - err = bpf_map_update_elem(map_fd, &rxq->xsk_queue_idx, &fd, 0);
> +
> + err = xsk_socket__update_xskmap(rxq->xsk, map_fd);
> if (err) {
> AF_XDP_LOG(ERR, "Failed to insert unprivileged xsk in map.\n");
> goto out_xsk;
> @@ -1881,13 +1884,13 @@ static const struct eth_dev_ops ops = {
> .get_monitor_addr = eth_get_monitor_addr,
> };
>
> -/* CNI option works in unprivileged container environment
> - * and ethernet device functionality will be reduced. So
> - * additional customiszed eth_dev_ops struct is needed
> - * for cni. Promiscuous enable and disable functionality
> - * is removed.
> +/* AF_XDP Device Plugin option works in unprivileged
> + * container environments and ethernet device functionality
> + * will be reduced. So additional customised eth_dev_ops
> + * struct is needed for the Device Plugin. Promiscuous
> + * enable and disable functionality is removed.
> **/
> -static const struct eth_dev_ops ops_cni = {
> +static const struct eth_dev_ops ops_afxdp_dp = {
> .dev_start = eth_dev_start,
> .dev_stop = eth_dev_stop,
> .dev_close = eth_dev_close,
> @@ -2023,7 +2026,8 @@ xdp_get_channels_info(const char *if_name, int *max_queues,
> static int
> parse_parameters(struct rte_kvargs *kvlist, char *if_name, int *start_queue,
> int *queue_cnt, int *shared_umem, char *prog_path,
> - int *busy_budget, int *force_copy, int *use_cni)
> + int *busy_budget, int *force_copy, int *use_cni,
> + char *dp_path)
> {
> int ret;
>
> @@ -2069,6 +2073,11 @@ parse_parameters(struct rte_kvargs *kvlist, char *if_name, int *start_queue,
> if (ret < 0)
> goto free_kvlist;
>
> + ret = rte_kvargs_process(kvlist, ETH_AF_XDP_DP_PATH_ARG,
> + &parse_prog_arg, dp_path);
> + if (ret < 0)
> + goto free_kvlist;
> +
> free_kvlist:
> rte_kvargs_free(kvlist);
> return ret;
> @@ -2108,7 +2117,7 @@ static struct rte_eth_dev *
> init_internals(struct rte_vdev_device *dev, const char *if_name,
> int start_queue_idx, int queue_cnt, int shared_umem,
> const char *prog_path, int busy_budget, int force_copy,
> - int use_cni)
> + int use_cni, const char *dp_path)
> {
> const char *name = rte_vdev_device_name(dev);
> const unsigned int numa_node = dev->device.numa_node;
> @@ -2138,6 +2147,7 @@ init_internals(struct rte_vdev_device *dev, const char *if_name,
> internals->shared_umem = shared_umem;
> internals->force_copy = force_copy;
> internals->use_cni = use_cni;
> + strlcpy(internals->dp_path, dp_path, PATH_MAX);
>
> if (xdp_get_channels_info(if_name, &internals->max_queue_cnt,
> &internals->combined_queue_cnt)) {
> @@ -2199,7 +2209,7 @@ init_internals(struct rte_vdev_device *dev, const char *if_name,
> if (!internals->use_cni)
> eth_dev->dev_ops = &ops;
> else
> - eth_dev->dev_ops = &ops_cni;
> + eth_dev->dev_ops = &ops_afxdp_dp;
>
> eth_dev->rx_pkt_burst = eth_af_xdp_rx;
> eth_dev->tx_pkt_burst = eth_af_xdp_tx;
> @@ -2328,6 +2338,7 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
> int busy_budget = -1, ret;
> int force_copy = 0;
> int use_cni = 0;
> + char dp_path[PATH_MAX] = {'\0'};
> struct rte_eth_dev *eth_dev = NULL;
> const char *name = rte_vdev_device_name(dev);
>
> @@ -2370,7 +2381,7 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
>
> if (parse_parameters(kvlist, if_name, &xsk_start_queue_idx,
> &xsk_queue_cnt, &shared_umem, prog_path,
> - &busy_budget, &force_copy, &use_cni) < 0) {
> + &busy_budget, &force_copy, &use_cni, dp_path) < 0) {
> AF_XDP_LOG(ERR, "Invalid kvargs value\n");
> return -EINVAL;
> }
> @@ -2384,7 +2395,19 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
> if (use_cni && strnlen(prog_path, PATH_MAX)) {
> AF_XDP_LOG(ERR, "When '%s' parameter is used, '%s' parameter is not valid\n",
> ETH_AF_XDP_USE_CNI_ARG, ETH_AF_XDP_PROG_ARG);
> - return -EINVAL;
> + return -EINVAL;
> + }
> +
> + if (use_cni && !strnlen(dp_path, PATH_MAX)) {
> + snprintf(dp_path, sizeof(dp_path), "%s/%s/%s", DP_BASE_PATH, if_name, DP_UDS_SOCK);
> + AF_XDP_LOG(INFO, "'%s' parameter not provided, setting value to '%s'\n",
> + ETH_AF_XDP_DP_PATH_ARG, dp_path);
> + }
> +
> + if (!use_cni && strnlen(dp_path, PATH_MAX)) {
> + AF_XDP_LOG(ERR, "'%s' parameter is set, but '%s' was not enabled\n",
> + ETH_AF_XDP_DP_PATH_ARG, ETH_AF_XDP_USE_CNI_ARG);
> + return -EINVAL;
> }
>
> if (strlen(if_name) == 0) {
> @@ -2410,7 +2433,7 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
>
> eth_dev = init_internals(dev, if_name, xsk_start_queue_idx,
> xsk_queue_cnt, shared_umem, prog_path,
> - busy_budget, force_copy, use_cni);
> + busy_budget, force_copy, use_cni, dp_path);
> if (eth_dev == NULL) {
> AF_XDP_LOG(ERR, "Failed to init internals\n");
> return -1;
> @@ -2471,4 +2494,5 @@ RTE_PMD_REGISTER_PARAM_STRING(net_af_xdp,
> "xdp_prog=<string> "
> "busy_budget=<int> "
> "force_copy=<int> "
> - "use_cni=<int> ");
> + "use_cni=<int> "
> + "dp_path=<string> ");