examples/eventdev: fix segment fault with generic pipeline

Message ID: 20240801111120.5380-1-fengchengwen@huawei.com (mailing list archive)
State: New
Delegated to: Jerin Jacob
Series: examples/eventdev: fix segment fault with generic pipeline

Checks

Context                       Check    Description
ci/loongarch-compilation      success  Compilation OK
ci/checkpatch                 warning  coding style issues
ci/loongarch-unit-testing     success  Unit Testing PASS
ci/github-robot: build        success  github build: passed
ci/iol-mellanox-Performance   success  Performance Testing PASS
ci/iol-marvell-Functional     success  Functional Testing PASS
ci/iol-abi-testing            success  Testing PASS
ci/iol-broadcom-Performance   success  Performance Testing PASS
ci/iol-broadcom-Functional    success  Functional Testing PASS
ci/iol-unit-arm64-testing     success  Testing PASS
ci/iol-unit-amd64-testing     success  Testing PASS
ci/iol-compile-amd64-testing  success  Testing PASS
ci/iol-compile-arm64-testing  success  Testing PASS
ci/Intel-compilation          success  Compilation OK
ci/intel-Testing              success  Testing PASS
ci/intel-Functional           success  Functional PASS
ci/iol-intel-Performance      success  Performance Testing PASS
ci/iol-intel-Functional       success  Functional Testing PASS
ci/iol-sample-apps-testing    success  Testing PASS

Commit Message

fengchengwen Aug. 1, 2024, 11:11 a.m. UTC
A segmentation fault occurred when running eventdev_pipeline with
command [1] on a ConnectX-5 NIC:

0x000000000079208c in rte_eth_tx_buffer (tx_pkt=0x16f8ed300, buffer=0x100, queue_id=11, port_id=0) at ../lib/ethdev/rte_ethdev.h:6636
txa_service_tx (txa=0x17b19d080, ev=0xffffffffe500, n=4) at ../lib/eventdev/rte_event_eth_tx_adapter.c:631
0x0000000000792234 in txa_service_func (args=0x17b19d080) at ../lib/eventdev/rte_event_eth_tx_adapter.c:666
0x00000000008b0784 in service_runner_do_callback (s=0x17fffe100, cs=0x17ffb5f80, service_idx=2) at ../lib/eal/common/rte_service.c:405
0x00000000008b0ad8 in service_run (i=2, cs=0x17ffb5f80, service_mask=18446744073709551615, s=0x17fffe100, serialize_mt_unsafe=0)
    at ../lib/eal/common/rte_service.c:441
0x00000000008b0c68 in rte_service_run_iter_on_app_lcore (id=2, serialize_mt_unsafe=0) at ../lib/eal/common/rte_service.c:477
0x000000000057bcc4 in schedule_devices (lcore_id=0) at ../examples/eventdev_pipeline/pipeline_common.h:138
0x000000000057ca94 in worker_generic_burst (arg=0x17b131e80) at ../examples/eventdev_pipeline/pipeline_worker_generic.c:83
0x00000000005794a8 in main (argc=11, argv=0xfffffffff470) at ../examples/eventdev_pipeline/main.c:449

The root cause is that the queue_id (11) is invalid. The queue_id comes
from mbuf.hash.txadapter.txq, which may already have been written by the
NIC driver when receiving packets (e.g. when the driver writes the
mbuf.hash.fdir.hi field, which shares storage with it).
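
The aliasing can be illustrated with a minimal sketch. The union below is a
hypothetical, simplified stand-in for the mbuf hash union (model_hash and both
helper names are invented for illustration, and only the members relevant here
are modeled). The value 0x000B000B is likewise an assumption, chosen so that
both 16-bit halves equal the invalid queue_id (11) from the backtrace
regardless of byte order; the value the driver really wrote is not known from
this report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of the mbuf hash union.
 * All members share the same storage. */
union model_hash {
	uint32_t rss;
	struct {
		uint32_t lo;
		uint32_t hi;        /* may be written on the Rx path */
	} fdir;
	struct {
		uint32_t reserved1;
		uint16_t reserved2;
		uint16_t txq;       /* read later on the Tx path */
	} txadapter;
};

/* In this model, txq lies entirely inside the bytes of fdir.hi. */
static_assert(offsetof(union model_hash, txadapter.txq) >=
	      offsetof(union model_hash, fdir.hi),
	      "txq starts inside fdir.hi");
static_assert(offsetof(union model_hash, txadapter.txq) + sizeof(uint16_t) <=
	      offsetof(union model_hash, fdir.hi) + sizeof(uint32_t),
	      "txq ends inside fdir.hi");

/* Simulate an Rx path storing a value in fdir.hi. */
static void model_rx_write_fdir_hi(union model_hash *h, uint32_t value)
{
	h->fdir.hi = value;
}

/* Read the Tx queue index through the txadapter view of the same bytes. */
static uint16_t model_txq_get(const union model_hash *h)
{
	return h->txadapter.txq;
}
```

Calling model_rx_write_fdir_hi(&h, 0x000B000B) and then model_txq_get(&h)
yields 11 in this model, even though nothing ever set txq explicitly.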

Because this example enables only one ethdev queue, fix it by resetting
txq to zero in the first worker stage.
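
As a rough sketch of the idea (not the patched code itself), the first-stage
handling can be modeled as below. model_mbuf, model_event, MODEL_NB_TXQ and
both functions are hypothetical names; in the real worker the assignment to
txq is done by rte_event_eth_tx_adapter_txq_set(mbuf, 0).

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: one ethdev Tx queue, as stated for this example. */
#define MODEL_NB_TXQ 1

/* Hypothetical, reduced stand-ins for the mbuf and the event. */
struct model_mbuf {
	uint32_t rss;   /* RSS hash filled in on Rx */
	uint16_t txq;   /* Tx queue index consumed by the Tx adapter */
};

struct model_event {
	uint8_t queue_id;
	uint32_t flow_id;
	struct model_mbuf *mbuf;
};

/* First worker stage: classify the flow and overwrite any stale txq. */
static void model_first_stage(struct model_event *ev, uint8_t first_qid,
			      uint32_t num_fids)
{
	assert(num_fids != 0); /* avoid a modulo by zero */

	if (ev->queue_id == first_qid) {
		ev->flow_id = ev->mbuf->rss % num_fids;
		/* Stands in for rte_event_eth_tx_adapter_txq_set(mbuf, 0). */
		ev->mbuf->txq = 0;
	}
}

/* A Tx queue index is usable only if it is below the configured count. */
static int model_txq_is_valid(const struct model_mbuf *m)
{
	return m->txq < MODEL_NB_TXQ;
}
```

With a stale txq of 11, model_txq_is_valid() fails before the first stage
runs and succeeds afterwards, which is the property the fix relies on.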

[1] dpdk-eventdev_pipeline -l 0-48 --vdev event_sw0 -- -r1 -t1 -e1 -w ff0 -s5 -n0 -c32 -W1000 -D
When launch eventdev_pipeline with command [1],  event_sw

Fixes: 81fb40f95c82 ("examples/eventdev: add generic worker pipeline")
Cc: stable@dpdk.org

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Reported-by: Chenxingyu Wang <wangchenxingyu@huawei.com>
---
 examples/eventdev_pipeline/pipeline_worker_generic.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)
  

Comments

Van Haaren, Harry Aug. 1, 2024, 12:43 p.m. UTC | #1
> From: Chengwen Feng <fengchengwen@huawei.com>
> Sent: Thursday, August 1, 2024 12:11 PM
> To: thomas@monjalon.net <thomas@monjalon.net>; dev@dpdk.org <dev@dpdk.org>
> Cc: Van Haaren, Harry <harry.van.haaren@intel.com>; wangchenxingyu@huawei.com <wangchenxingyu@huawei.com>
> Subject: [PATCH] examples/eventdev: fix segment fault with generic pipeline
>
> There was a segmentation fault when executing eventdev_pipeline with
> command [1] with ConnectX-5 NIC card:
>
> 0x000000000079208c in rte_eth_tx_buffer (tx_pkt=0x16f8ed300, buffer=0x100, queue_id=11, port_id=0) at ../lib/ethdev/rte_ethdev.h:6636
> txa_service_tx (txa=0x17b19d080, ev=0xffffffffe500, n=4) at ../lib/eventdev/rte_event_eth_tx_adapter.c:631
> 0x0000000000792234 in txa_service_func (args=0x17b19d080) at ../lib/eventdev/rte_event_eth_tx_adapter.c:666
> 0x00000000008b0784 in service_runner_do_callback (s=0x17fffe100, cs=0x17ffb5f80, service_idx=2) at ../lib/eal/common/rte_service.c:405
> 0x00000000008b0ad8 in service_run (i=2, cs=0x17ffb5f80, service_mask=18446744073709551615, s=0x17fffe100, serialize_mt_unsafe=0)
>     at ../lib/eal/common/rte_service.c:441
> 0x00000000008b0c68 in rte_service_run_iter_on_app_lcore (id=2, serialize_mt_unsafe=0) at ../lib/eal/common/rte_service.c:477
> 0x000000000057bcc4 in schedule_devices (lcore_id=0) at ../examples/eventdev_pipeline/pipeline_common.h:138
> 0x000000000057ca94 in worker_generic_burst (arg=0x17b131e80) at ../examples/eventdev_pipeline/pipeline_worker_generic.c:83
> 0x00000000005794a8 in main (argc=11, argv=0xfffffffff470) at ../examples/eventdev_pipeline/main.c:449
>
> The root cause is that the queue_id (11) is invalid, the queue_id comes
> from mbuf.hash.txadapter.txq which may pre-write by NIC driver when
> receiving packets (e.g. pre-write mbuf.hash.fdir.hi field).

Good bug report, thanks for the detailed info on hash.fdir.hi union-ed with txadapter fields.
I don't have the specific HW to test, so code review only.

I don't recall the TXQ quantities etc (been a number of years since I worked on this code...!)
so I'll +CC Pavan who reworked the logic around generic workers & eventdev stages, and might recall?

> Because this example only enabled one ethdev queue, so fixes it by reset
> txq to zero in the first worker stage.
>
> [1] dpdk-eventdev_pipeline -l 0-48 --vdev event_sw0 -- -r1 -t1 -e1 -w ff0 -s5 -n0 -c32 -W1000 -D
> When launch eventdev_pipeline with command [1],  event_sw
>
> Fixes: 81fb40f95c82 ("examples/eventdev: add generic worker pipeline")
> Cc: stable@dpdk.org
>
> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
> Reported-by: Chenxingyu Wang <wangchenxingyu@huawei.com>

Generally the change looks fine - I'll wait a few days for Pavan's input, and otherwise review & Ack assuming no issues found.

Thanks for the patch! -Harry
  
Pavan Nikhilesh Bhagavatula Aug. 18, 2024, 8:34 a.m. UTC | #2
> -----Original Message-----
> From: Van Haaren, Harry <harry.van.haaren@intel.com>
> Sent: Thursday, August 1, 2024 6:14 PM
> To: Chengwen Feng <fengchengwen@huawei.com>; thomas@monjalon.net;
> dev@dpdk.org
> Cc: wangchenxingyu@huawei.com; Pavan Nikhilesh Bhagavatula
> <pbhagavatula@marvell.com>
> Subject: [EXTERNAL] Re: [PATCH] examples/eventdev: fix segment fault with
> generic pipeline
> 
> > From: Chengwen Feng <fengchengwen@huawei.com>
> > Sent: Thursday, August 1, 2024 12:11 PM
> > To: thomas@monjalon.net <thomas@monjalon.net>; dev@dpdk.org
> <dev@dpdk.org>
> > Cc: Van Haaren, Harry <harry.van.haaren@intel.com>;
> wangchenxingyu@huawei.com <wangchenxingyu@huawei.com>
> > Subject: [PATCH] examples/eventdev: fix segment fault with generic pipeline
> >
> > There was a segmentation fault when executing eventdev_pipeline with
> > command [1] with ConnectX-5 NIC card:
> >
> > 0x000000000079208c in rte_eth_tx_buffer (tx_pkt=0x16f8ed300,
> buffer=0x100, queue_id=11, port_id=0) at ../lib/ethdev/rte_ethdev.h:6636
> > txa_service_tx (txa=0x17b19d080, ev=0xffffffffe500, n=4) at
> ../lib/eventdev/rte_event_eth_tx_adapter.c:631
> > 0x0000000000792234 in txa_service_func (args=0x17b19d080) at
> ../lib/eventdev/rte_event_eth_tx_adapter.c:666
> > 0x00000000008b0784 in service_runner_do_callback (s=0x17fffe100,
> cs=0x17ffb5f80, service_idx=2) at ../lib/eal/common/rte_service.c:405
> > 0x00000000008b0ad8 in service_run (i=2, cs=0x17ffb5f80,
> service_mask=18446744073709551615, s=0x17fffe100,
> serialize_mt_unsafe=0)
> >     at ../lib/eal/common/rte_service.c:441
> > 0x00000000008b0c68 in rte_service_run_iter_on_app_lcore (id=2,
> serialize_mt_unsafe=0) at ../lib/eal/common/rte_service.c:477
> > 0x000000000057bcc4 in schedule_devices (lcore_id=0) at
> ../examples/eventdev_pipeline/pipeline_common.h:138
> > 0x000000000057ca94 in worker_generic_burst (arg=0x17b131e80) at
> ../examples/eventdev_pipeline/pipeline_worker_generic.c:83
> > 0x00000000005794a8 in main (argc=11, argv=0xfffffffff470) at
> ../examples/eventdev_pipeline/main.c:449
> >
> > The root cause is that the queue_id (11) is invalid, the queue_id comes
> > from mbuf.hash.txadapter.txq which may pre-write by NIC driver when
> > receiving packets (e.g. pre-write mbuf.hash.fdir.hi field).
> 
> Good bug report, thanks for the detailed info on hash.fdir.hi union-ed with
> txadapter fields.
> I don't have the specific HW to test, so code review only.
> 
> I don't recall the TXQ quantities etc (been a number of years since I worked on
> this code...!)
> so I'll +CC Pavan who reworked the logic around generic workers & eventdev
> stages, and might recall?
> 
> > Because this example only enabled one ethdev queue, so fixes it by reset
> > txq to zero in the first worker stage.
> >
> > [1] dpdk-eventdev_pipeline -l 0-48 --vdev event_sw0 -- -r1 -t1 -e1 -w ff0 -s5
> -n0 -c32 -W1000 -D
> > When launch eventdev_pipeline with command [1],  event_sw
> >
> > Fixes: 81fb40f95c82 ("examples/eventdev: add generic worker pipeline")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
> > Reported-by: Chenxingyu Wang <wangchenxingyu@huawei.com>
> 
> Generally the change looks fine - I'll wait a few days for Pavan's input, and
> otherwise review & Ack assuming no issues found.

Harry, 
The fix looks correct to me.

Acked-by: Pavan Nikhilesh <pbhagavatula@marvell.com>

> 
> Thanks for the patch! -Harry
  

Patch

diff --git a/examples/eventdev_pipeline/pipeline_worker_generic.c b/examples/eventdev_pipeline/pipeline_worker_generic.c
index 783f68c91e..831d7fd53d 100644
--- a/examples/eventdev_pipeline/pipeline_worker_generic.c
+++ b/examples/eventdev_pipeline/pipeline_worker_generic.c
@@ -38,10 +38,12 @@  worker_generic(void *arg)
 		}
 		received++;
 
-		/* The first worker stage does classification */
-		if (ev.queue_id == cdata.qid[0])
+		/* The first worker stage does classification and sets txq. */
+		if (ev.queue_id == cdata.qid[0]) {
 			ev.flow_id = ev.mbuf->hash.rss
 						% cdata.num_fids;
+			rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
+		}
 
 		ev.queue_id = cdata.next_qid[ev.queue_id];
 		ev.op = RTE_EVENT_OP_FORWARD;
@@ -96,10 +98,12 @@  worker_generic_burst(void *arg)
 
 		for (i = 0; i < nb_rx; i++) {
 
-			/* The first worker stage does classification */
-			if (events[i].queue_id == cdata.qid[0])
+			/* The first worker stage does classification and sets txq. */
+			if (events[i].queue_id == cdata.qid[0]) {
 				events[i].flow_id = events[i].mbuf->hash.rss
 							% cdata.num_fids;
+				rte_event_eth_tx_adapter_txq_set(events[i].mbuf, 0);
+			}
 
 			events[i].queue_id = cdata.next_qid[events[i].queue_id];
 			events[i].op = RTE_EVENT_OP_FORWARD;