[v3,1/3] net/bonding: support Tx prepare

Message ID 20220917041537.6821-2-fengchengwen@huawei.com (mailing list archive)
State Superseded, archived
Delegated to: Andrew Rybchenko
Series: add Tx prepare support for bonding driver

Checks

ci/checkpatch: warning (coding style issues)

Commit Message

Chengwen Feng Sept. 17, 2022, 4:15 a.m. UTC
  Normally, to use the HW offload capabilities (e.g. checksum and TSO)
in the Tx direction, the application needs to call rte_eth_tx_prepare()
to make some adjustments to the packets before sending them (e.g.
processing pseudo headers when Tx checksum offload is enabled). But the
tx_prepare callback of the bonding driver is not implemented, so the
sent packets may have errors (e.g. checksum errors).
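
For reference, a minimal sketch of the usual application-side sequence
(port_id, queue_id, pkts and nb_pkts are assumed to be set up already;
error handling omitted):

    uint16_t nb_prep, nb_sent;

    /* Adjust packets for the offloads requested in mbuf->ol_flags
     * (e.g. fill in pseudo-header checksums), then transmit only the
     * packets that passed the prepare stage.
     */
    nb_prep = rte_eth_tx_prepare(port_id, queue_id, pkts, nb_pkts);
    nb_sent = rte_eth_tx_burst(port_id, queue_id, pkts, nb_prep);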

However, it is difficult to design a tx_prepare callback for the
bonding driver, because when a bonded device sends packets, it
distributes them to its slave devices based on the real-time link
status and the bonding mode. That is, it is very hard for the bonded
device to determine which slave device's prepare function should be
invoked.

So, in this patch, the tx_prepare callback of the bonding driver is not
implemented. Instead, rte_eth_tx_prepare() is called for all the fast
path packets in modes 0, 1, 2, 4, 5 and 6 (mode 3 is excluded, see
[1]). In this way, all Tx offloads can be processed correctly for all
NIC devices in these modes.

As previously discussed (see [2]), for user input mbufs, if tx_prepare
fails, the bonding driver frees the corresponding packets internally,
and only the packets that pass tx_prepare are transmitted.

To minimize the performance impact, this patch adds a new
'tx_prepare_enabled' field, along with the corresponding control and
query APIs: rte_eth_bond_tx_prepare_set() and
rte_eth_bond_tx_prepare_get().
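
For example, a sketch of the intended usage (bonded_port_id is assumed
to be an already created bonded device):

    /* Enable Tx prepare on the bonded port (rejected in mode 3). */
    int ret = rte_eth_bond_tx_prepare_set(bonded_port_id, true);
    if (ret != 0)
        printf("cannot enable Tx prepare: %d\n", ret);

    /* Query the current state: 0 - disabled, 1 - enabled. */
    int enabled = rte_eth_bond_tx_prepare_get(bonded_port_id);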

[1]: In bond mode 3 (broadcast), a packet needs to be sent by all slave
ports. Different slave PMDs process packets differently in tx_prepare.
If tx_prepare were called before each slave port sends, the sent
packets might be incorrect.
[2]: https://inbox.dpdk.org/dev/1618571071-5927-2-git-send-email-tangchengchang@huawei.com/

Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Reviewed-by: Min Hu (Connor) <humin29@huawei.com>
---
 .../link_bonding_poll_mode_drv_lib.rst        |   5 +
 doc/guides/rel_notes/release_22_11.rst        |   6 ++
 drivers/net/bonding/eth_bond_private.h        |   6 ++
 drivers/net/bonding/rte_eth_bond.h            |  24 +++++
 drivers/net/bonding/rte_eth_bond_8023ad.c     |   8 +-
 drivers/net/bonding/rte_eth_bond_api.c        |  32 ++++++
 drivers/net/bonding/rte_eth_bond_pmd.c        | 101 +++++++++++++++---
 drivers/net/bonding/version.map               |   5 +
 8 files changed, 169 insertions(+), 18 deletions(-)
  

Patch

diff --git a/doc/guides/prog_guide/link_bonding_poll_mode_drv_lib.rst b/doc/guides/prog_guide/link_bonding_poll_mode_drv_lib.rst
index 9510368103..a3d91b2091 100644
--- a/doc/guides/prog_guide/link_bonding_poll_mode_drv_lib.rst
+++ b/doc/guides/prog_guide/link_bonding_poll_mode_drv_lib.rst
@@ -359,6 +359,11 @@  The link status of a bonded device is dictated by that of its slaves, if all
 slave device link status are down or if all slaves are removed from the link
 bonding device then the link status of the bonding device will go down.
 
+Unlike normal PMD drivers, Tx prepare for the bonding driver is controlled
+by ``rte_eth_bond_tx_prepare_set`` (all bonding modes except mode 3
+(broadcast) are supported). ``rte_eth_bond_tx_prepare_get`` is provided to
+query whether Tx prepare is enabled.
+
 It is also possible to configure / query the configuration of the control
 parameters of a bonded device using the provided APIs
 ``rte_eth_bond_mode_set/ get``, ``rte_eth_bond_primary_set/get``,
diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 8c021cf050..6e28a6c0af 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -55,6 +55,12 @@  New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Added Tx prepare for bonding driver.**
+
+  * Added ``rte_eth_bond_tx_prepare_set`` to enable or disable Tx prepare for a bonded port.
+    All bonding modes except mode 3 (broadcast) are supported.
+  * Added ``rte_eth_bond_tx_prepare_get`` to query whether Tx prepare is enabled for a bonded port.
+
 
 Removed Items
 -------------
diff --git a/drivers/net/bonding/eth_bond_private.h b/drivers/net/bonding/eth_bond_private.h
index 8222e3cd38..976163b06b 100644
--- a/drivers/net/bonding/eth_bond_private.h
+++ b/drivers/net/bonding/eth_bond_private.h
@@ -117,6 +117,7 @@  struct bond_dev_private {
 	uint16_t user_defined_primary_port;
 	/**< Flag for whether primary port is user defined or not */
 
+	uint8_t tx_prepare_enabled;
 	uint8_t balance_xmit_policy;
 	/**< Transmit policy - l2 / l23 / l34 for operation in balance mode */
 	burst_xmit_hash_t burst_xmit_hash;
@@ -258,6 +259,11 @@  void
 slave_add(struct bond_dev_private *internals,
 		struct rte_eth_dev *slave_eth_dev);
 
+uint16_t
+bond_ethdev_tx_ctrl_wrap(struct bond_dev_private *internals,
+			uint16_t slave_port_id, uint16_t queue_id,
+			struct rte_mbuf **tx_pkts, uint16_t nb_pkts);
+
 void
 burst_xmit_l2_hash(struct rte_mbuf **buf, uint16_t nb_pkts,
 		uint16_t slave_count, uint16_t *slaves);
diff --git a/drivers/net/bonding/rte_eth_bond.h b/drivers/net/bonding/rte_eth_bond.h
index 874aa91a5f..deae2dd9ad 100644
--- a/drivers/net/bonding/rte_eth_bond.h
+++ b/drivers/net/bonding/rte_eth_bond.h
@@ -343,6 +343,30 @@  rte_eth_bond_link_up_prop_delay_set(uint16_t bonded_port_id,
 int
 rte_eth_bond_link_up_prop_delay_get(uint16_t bonded_port_id);
 
+/**
+ * Enable or disable Tx-prepare (rte_eth_tx_prepare) for a bonded port
+ *
+ * @param bonded_port_id      Bonded device id
+ * @param en                  Enable flag
+ *
+ * @return
+ *   0 on success, negative value otherwise.
+ */
+__rte_experimental
+int
+rte_eth_bond_tx_prepare_set(uint16_t bonded_port_id, bool en);
+
+/**
+ * Get whether Tx-prepare (rte_eth_tx_prepare) is enabled for a bonded port
+ *
+ * @param bonded_port_id      Bonded device id
+ *
+ * @return
+ *   0-disabled, 1-enabled, negative value otherwise.
+ */
+__rte_experimental
+int
+rte_eth_bond_tx_prepare_get(uint16_t bonded_port_id);
 
 #ifdef __cplusplus
 }
diff --git a/drivers/net/bonding/rte_eth_bond_8023ad.c b/drivers/net/bonding/rte_eth_bond_8023ad.c
index b3cddd8a20..d5951d684e 100644
--- a/drivers/net/bonding/rte_eth_bond_8023ad.c
+++ b/drivers/net/bonding/rte_eth_bond_8023ad.c
@@ -636,8 +636,8 @@  tx_machine(struct bond_dev_private *internals, uint16_t slave_id)
 			return;
 		}
 	} else {
-		uint16_t pkts_sent = rte_eth_tx_burst(slave_id,
-				internals->mode4.dedicated_queues.tx_qid,
+		uint16_t pkts_sent = bond_ethdev_tx_ctrl_wrap(internals,
+				slave_id, internals->mode4.dedicated_queues.tx_qid,
 				&lacp_pkt, 1);
 		if (pkts_sent != 1) {
 			rte_pktmbuf_free(lacp_pkt);
@@ -1371,8 +1371,8 @@  bond_mode_8023ad_handle_slow_pkt(struct bond_dev_private *internals,
 			}
 		} else {
 			/* Send packet directly to the slow queue */
-			uint16_t tx_count = rte_eth_tx_burst(slave_id,
-					internals->mode4.dedicated_queues.tx_qid,
+			uint16_t tx_count = bond_ethdev_tx_ctrl_wrap(internals,
+					slave_id, internals->mode4.dedicated_queues.tx_qid,
 					&pkt, 1);
 			if (tx_count != 1) {
 				/* reset timer */
diff --git a/drivers/net/bonding/rte_eth_bond_api.c b/drivers/net/bonding/rte_eth_bond_api.c
index 4ac191c468..47841289f4 100644
--- a/drivers/net/bonding/rte_eth_bond_api.c
+++ b/drivers/net/bonding/rte_eth_bond_api.c
@@ -1070,3 +1070,35 @@  rte_eth_bond_link_up_prop_delay_get(uint16_t bonded_port_id)
 
 	return internals->link_up_delay_ms;
 }
+
+int
+rte_eth_bond_tx_prepare_set(uint16_t bonded_port_id, bool en)
+{
+	struct bond_dev_private *internals;
+
+	if (valid_bonded_port_id(bonded_port_id) != 0)
+		return -1;
+
+	internals = rte_eth_devices[bonded_port_id].data->dev_private;
+	if (internals->mode == BONDING_MODE_BROADCAST) {
+		RTE_BOND_LOG(ERR, "Tx-prepare is not supported in broadcast mode");
+		return -ENOTSUP;
+	}
+
+	internals->tx_prepare_enabled = en ? 1 : 0;
+
+	return 0;
+}
+
+int
+rte_eth_bond_tx_prepare_get(uint16_t bonded_port_id)
+{
+	struct bond_dev_private *internals;
+
+	if (valid_bonded_port_id(bonded_port_id) != 0)
+		return -1;
+
+	internals = rte_eth_devices[bonded_port_id].data->dev_private;
+
+	return internals->tx_prepare_enabled;
+}
diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
index 73e6972035..ec9d7d7bab 100644
--- a/drivers/net/bonding/rte_eth_bond_pmd.c
+++ b/drivers/net/bonding/rte_eth_bond_pmd.c
@@ -559,6 +559,76 @@  bond_ethdev_rx_burst_alb(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 	return nb_recv_pkts;
 }
 
+/** This function is used to transmit internally generated mbufs. */
+uint16_t
+bond_ethdev_tx_ctrl_wrap(struct bond_dev_private *internals,
+			 uint16_t slave_port_id, uint16_t queue_id,
+			 struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	uint16_t nb_pre = nb_pkts;
+
+	if (internals->tx_prepare_enabled)
+		nb_pre = rte_eth_tx_prepare(slave_port_id, queue_id, tx_pkts,
+					    nb_pkts);
+
+	return rte_eth_tx_burst(slave_port_id, queue_id, tx_pkts, nb_pre);
+}
+
+/**
+ * This function is used to transmit the mbufs input by the user.
+ * If Tx-prepare is enabled, mbufs that fail Tx-prepare will be
+ * freed internally.
+ */
+static inline uint16_t
+bond_ethdev_tx_user_wrap(struct bond_tx_queue *bd_tx_q, uint16_t slave_port_id,
+		    struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	struct bond_dev_private *internals = bd_tx_q->dev_private;
+	uint16_t queue_id = bd_tx_q->queue_id;
+	struct rte_mbuf *fail_pkts[nb_pkts];
+	uint8_t fail_mark[nb_pkts];
+	uint16_t nb_pre, index;
+	uint16_t fail_cnt = 0;
+	int i;
+
+	if (!internals->tx_prepare_enabled)
+		goto tx_burst;
+
+	nb_pre = rte_eth_tx_prepare(slave_port_id, queue_id, tx_pkts, nb_pkts);
+	if (nb_pre == nb_pkts)
+		goto tx_burst;
+
+	fail_pkts[fail_cnt++] = tx_pkts[nb_pre];
+	memset(fail_mark, 0, sizeof(fail_mark));
+	fail_mark[nb_pre] = 1;
+	for (i = nb_pre + 1; i < nb_pkts; /* update in inner loop */) {
+		nb_pre = rte_eth_tx_prepare(slave_port_id, queue_id,
+					    tx_pkts + i, nb_pkts - i);
+		if (nb_pre == nb_pkts - i)
+			break;
+		fail_pkts[fail_cnt++] = tx_pkts[i + nb_pre];
+		fail_mark[i + nb_pre] = 1;
+		i += nb_pre + 1;
+	}
+
+	/* move mbufs that passed tx-prepare to the end */
+	for (i = index = nb_pkts - 1; i >= 0; i--) {
+		if (!fail_mark[i])
+			tx_pkts[index--] = tx_pkts[i];
+	}
+	/* move mbufs that failed tx-prepare to the beginning, and free them */
+	for (i = 0; i < fail_cnt; i++) {
+		tx_pkts[i] = fail_pkts[i];
+		rte_pktmbuf_free(fail_pkts[i]);
+	}
+
+	if (fail_cnt == nb_pkts)
+		return nb_pkts;
+tx_burst:
+	return fail_cnt + rte_eth_tx_burst(slave_port_id, queue_id,
+				tx_pkts + fail_cnt, nb_pkts - fail_cnt);
+}
+
 static uint16_t
 bond_ethdev_tx_burst_round_robin(void *queue, struct rte_mbuf **bufs,
 		uint16_t nb_pkts)
@@ -602,8 +672,9 @@  bond_ethdev_tx_burst_round_robin(void *queue, struct rte_mbuf **bufs,
 	/* Send packet burst on each slave device */
 	for (i = 0; i < num_of_slaves; i++) {
 		if (slave_nb_pkts[i] > 0) {
-			num_tx_slave = rte_eth_tx_burst(slaves[i], bd_tx_q->queue_id,
-					slave_bufs[i], slave_nb_pkts[i]);
+			num_tx_slave = bond_ethdev_tx_user_wrap(bd_tx_q,
+					slaves[i], slave_bufs[i],
+					slave_nb_pkts[i]);
 
 			/* if tx burst fails move packets to end of bufs */
 			if (unlikely(num_tx_slave < slave_nb_pkts[i])) {
@@ -635,8 +706,8 @@  bond_ethdev_tx_burst_active_backup(void *queue,
 	if (internals->active_slave_count < 1)
 		return 0;
 
-	return rte_eth_tx_burst(internals->current_primary_port, bd_tx_q->queue_id,
-			bufs, nb_pkts);
+	return bond_ethdev_tx_user_wrap(bd_tx_q,
+			internals->current_primary_port, bufs, nb_pkts);
 }
 
 static inline uint16_t
@@ -951,8 +1022,8 @@  bond_ethdev_tx_burst_tlb(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 #endif
 		}
 
-		num_tx_total += rte_eth_tx_burst(slaves[i], bd_tx_q->queue_id,
-				bufs + num_tx_total, nb_pkts - num_tx_total);
+		num_tx_total += bond_ethdev_tx_user_wrap(bd_tx_q, slaves[i],
+					bufs + num_tx_total, nb_pkts - num_tx_total);
 
 		if (num_tx_total == nb_pkts)
 			break;
@@ -1064,8 +1135,9 @@  bond_ethdev_tx_burst_alb(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 	/* Send ARP packets on proper slaves */
 	for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
 		if (slave_bufs_pkts[i] > 0) {
-			num_send = rte_eth_tx_burst(i, bd_tx_q->queue_id,
-					slave_bufs[i], slave_bufs_pkts[i]);
+			num_send = bond_ethdev_tx_ctrl_wrap(internals, i,
+					bd_tx_q->queue_id, slave_bufs[i],
+					slave_bufs_pkts[i]);
 			for (j = 0; j < slave_bufs_pkts[i] - num_send; j++) {
 				bufs[nb_pkts - 1 - num_not_send - j] =
 						slave_bufs[i][nb_pkts - 1 - j];
@@ -1088,7 +1160,8 @@  bond_ethdev_tx_burst_alb(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 	/* Send update packets on proper slaves */
 	for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
 		if (update_bufs_pkts[i] > 0) {
-			num_send = rte_eth_tx_burst(i, bd_tx_q->queue_id, update_bufs[i],
+			num_send = bond_ethdev_tx_ctrl_wrap(internals, i,
+					bd_tx_q->queue_id, update_bufs[i],
 					update_bufs_pkts[i]);
 			for (j = num_send; j < update_bufs_pkts[i]; j++) {
 				rte_pktmbuf_free(update_bufs[i][j]);
@@ -1158,9 +1231,8 @@  tx_burst_balance(void *queue, struct rte_mbuf **bufs, uint16_t nb_bufs,
 		if (slave_nb_bufs[i] == 0)
 			continue;
 
-		slave_tx_count = rte_eth_tx_burst(slave_port_ids[i],
-				bd_tx_q->queue_id, slave_bufs[i],
-				slave_nb_bufs[i]);
+		slave_tx_count = bond_ethdev_tx_user_wrap(bd_tx_q,
+			slave_port_ids[i], slave_bufs[i], slave_nb_bufs[i]);
 
 		total_tx_count += slave_tx_count;
 
@@ -1243,8 +1315,9 @@  tx_burst_8023ad(void *queue, struct rte_mbuf **bufs, uint16_t nb_bufs,
 
 		if (rte_ring_dequeue(port->tx_ring,
 				     (void **)&ctrl_pkt) != -ENOENT) {
-			slave_tx_count = rte_eth_tx_burst(slave_port_ids[i],
-					bd_tx_q->queue_id, &ctrl_pkt, 1);
+			slave_tx_count = bond_ethdev_tx_ctrl_wrap(internals,
+					slave_port_ids[i], bd_tx_q->queue_id,
+					&ctrl_pkt, 1);
 			/*
 			 * re-enqueue LAG control plane packets to buffering
 			 * ring if transmission fails so the packet isn't lost.
diff --git a/drivers/net/bonding/version.map b/drivers/net/bonding/version.map
index 9333923b4e..2c121f2559 100644
--- a/drivers/net/bonding/version.map
+++ b/drivers/net/bonding/version.map
@@ -31,3 +31,8 @@  DPDK_23 {
 
 	local: *;
 };
+
+EXPERIMENTAL {
+	rte_eth_bond_tx_prepare_get;
+	rte_eth_bond_tx_prepare_set;
+};
\ No newline at end of file