examples/eventdev: fix segment fault with generic pipeline
Commit Message
A segmentation fault occurred when running eventdev_pipeline with
command [1] on a ConnectX-5 NIC:
0x000000000079208c in rte_eth_tx_buffer (tx_pkt=0x16f8ed300, buffer=0x100, queue_id=11, port_id=0) at ../lib/ethdev/rte_ethdev.h:6636
txa_service_tx (txa=0x17b19d080, ev=0xffffffffe500, n=4) at ../lib/eventdev/rte_event_eth_tx_adapter.c:631
0x0000000000792234 in txa_service_func (args=0x17b19d080) at ../lib/eventdev/rte_event_eth_tx_adapter.c:666
0x00000000008b0784 in service_runner_do_callback (s=0x17fffe100, cs=0x17ffb5f80, service_idx=2) at ../lib/eal/common/rte_service.c:405
0x00000000008b0ad8 in service_run (i=2, cs=0x17ffb5f80, service_mask=18446744073709551615, s=0x17fffe100, serialize_mt_unsafe=0) at ../lib/eal/common/rte_service.c:441
0x00000000008b0c68 in rte_service_run_iter_on_app_lcore (id=2, serialize_mt_unsafe=0) at ../lib/eal/common/rte_service.c:477
0x000000000057bcc4 in schedule_devices (lcore_id=0) at ../examples/eventdev_pipeline/pipeline_common.h:138
0x000000000057ca94 in worker_generic_burst (arg=0x17b131e80) at ../examples/eventdev_pipeline/pipeline_worker_generic.c:83
0x00000000005794a8 in main (argc=11, argv=0xfffffffff470) at ../examples/eventdev_pipeline/main.c:449
The root cause is an invalid queue_id (11). The queue_id is read from
mbuf.hash.txadapter.txq, which shares storage with other hash fields and
may already have been written by the NIC driver when receiving packets
(e.g. when it fills the mbuf.hash.fdir.hi field).
Because this example enables only one ethdev queue, fix it by resetting
txq to zero in the first worker stage.
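The aliasing described above can be illustrated without DPDK. The following is a minimal sketch, assuming a simplified layout in which txadapter.txq occupies the last two bytes of the storage used by fdir.hi; struct demo_mbuf and the demo_* helpers are hypothetical stand-ins, not the real rte_mbuf definition or API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the hash union in struct rte_mbuf. */
struct demo_mbuf {
	union {
		uint32_t rss;
		struct {
			uint32_t lo;
			uint32_t hi;        /* may be written by the Rx path */
		} fdir;
		struct {
			uint32_t reserved1;
			uint16_t reserved2;
			uint16_t txq;       /* read later by the Tx adapter */
		} txadapter;
	} hash;
};

/* Assumed layout: txq lives inside the four bytes used by fdir.hi. */
static_assert(offsetof(struct demo_mbuf, hash.txadapter.txq) ==
	      offsetof(struct demo_mbuf, hash.fdir.hi) + 2,
	      "txq is expected to alias the second half of fdir.hi");

/* Simulates a driver storing flow-director data on receive. */
static inline void demo_rx_fill(struct demo_mbuf *m, uint32_t fdir_hi)
{
	m->hash.fdir.hi = fdir_hi;
}

static inline uint16_t demo_txq_get(const struct demo_mbuf *m)
{
	return m->hash.txadapter.txq;
}

/* Mirrors the idea of the fix: overwrite whatever Rx left behind. */
static inline void demo_txq_set(struct demo_mbuf *m, uint16_t queue)
{
	m->hash.txadapter.txq = queue;
}

/* A queue index is usable only if it is below the configured count. */
static inline int demo_txq_is_valid(const struct demo_mbuf *m,
				    uint16_t nb_tx_queues)
{
	return demo_txq_get(m) < nb_tx_queues;
}
```

With one Tx queue configured, a value such as 0x000b000b left in fdir.hi reads back as txq 11 on either byte order, which is out of range until demo_txq_set(m, 0) is called.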
[1] dpdk-eventdev_pipeline -l 0-48 --vdev event_sw0 -- -r1 -t1 -e1 -w ff0 -s5 -n0 -c32 -W1000 -D
When launch eventdev_pipeline with command [1], event_sw
Fixes: 81fb40f95c82 ("examples/eventdev: add generic worker pipeline")
Cc: stable@dpdk.org
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Reported-by: Chenxingyu Wang <wangchenxingyu@huawei.com>
---
examples/eventdev_pipeline/pipeline_worker_generic.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
Comments
> From: Chengwen Feng <fengchengwen@huawei.com>
> Sent: Thursday, August 1, 2024 12:11 PM
> To: thomas@monjalon.net <thomas@monjalon.net>; dev@dpdk.org <dev@dpdk.org>
> Cc: Van Haaren, Harry <harry.van.haaren@intel.com>; wangchenxingyu@huawei.com <wangchenxingyu@huawei.com>
> Subject: [PATCH] examples/eventdev: fix segment fault with generic pipeline
>
> There was a segmentation fault when executing eventdev_pipeline with
> command [1] with ConnectX-5 NIC card:
>
> 0x000000000079208c in rte_eth_tx_buffer (tx_pkt=0x16f8ed300, buffer=0x100, queue_id=11, port_id=0) at ../lib/ethdev/rte_ethdev.h:6636
> txa_service_tx (txa=0x17b19d080, ev=0xffffffffe500, n=4) at ../lib/eventdev/rte_event_eth_tx_adapter.c:631
> 0x0000000000792234 in txa_service_func (args=0x17b19d080) at ../lib/eventdev/rte_event_eth_tx_adapter.c:666
> 0x00000000008b0784 in service_runner_do_callback (s=0x17fffe100, cs=0x17ffb5f80, service_idx=2) at ../lib/eal/common/rte_service.c:405
> 0x00000000008b0ad8 in service_run (i=2, cs=0x17ffb5f80, service_mask=18446744073709551615, s=0x17fffe100, serialize_mt_unsafe=0)
> at ../lib/eal/common/rte_service.c:441
> 0x00000000008b0c68 in rte_service_run_iter_on_app_lcore (id=2, serialize_mt_unsafe=0) at ../lib/eal/common/rte_service.c:477
> 0x000000000057bcc4 in schedule_devices (lcore_id=0) at ../examples/eventdev_pipeline/pipeline_common.h:138
> 0x000000000057ca94 in worker_generic_burst (arg=0x17b131e80) at ../examples/eventdev_pipeline/pipeline_worker_generic.c:83
> 0x00000000005794a8 in main (argc=11, argv=0xfffffffff470) at ../examples/eventdev_pipeline/main.c:449
>
> The root cause is that the queue_id (11) is invalid, the queue_id comes
> from mbuf.hash.txadapter.txq which may pre-write by NIC driver when
> receiving packets (e.g. pre-write mbuf.hash.fdir.hi field).
Good bug report, thanks for the detailed info on hash.fdir.hi union-ed with txadapter fields.
I don't have the specific HW to test, so code review only.
I don't recall the TXQ quantities etc (been a number of years since I worked on this code...!)
so I'll +CC Pavan who reworked the logic around generic workers & eventdev stages, and might recall?
> Because this example only enabled one ethdev queue, so fixes it by reset
> txq to zero in the first worker stage.
>
> [1] dpdk-eventdev_pipeline -l 0-48 --vdev event_sw0 -- -r1 -t1 -e1 -w ff0 -s5 -n0 -c32 -W1000 -D
> When launch eventdev_pipeline with command [1], event_sw
>
> Fixes: 81fb40f95c82 ("examples/eventdev: add generic worker pipeline")
> Cc: stable@dpdk.org
>
> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
> Reported-by: Chenxingyu Wang <wangchenxingyu@huawei.com>
Generally the change looks fine - I'll wait a few days for Pavan's input, and otherwise review & Ack assuming no issues found.
Thanks for the patch! -Harry
> -----Original Message-----
> From: Van Haaren, Harry <harry.van.haaren@intel.com>
> Sent: Thursday, August 1, 2024 6:14 PM
> To: Chengwen Feng <fengchengwen@huawei.com>; thomas@monjalon.net;
> dev@dpdk.org
> Cc: wangchenxingyu@huawei.com; Pavan Nikhilesh Bhagavatula
> <pbhagavatula@marvell.com>
> Subject: [EXTERNAL] Re: [PATCH] examples/eventdev: fix segment fault with
> generic pipeline
>
> > From: Chengwen Feng <fengchengwen@huawei.com>
> > Sent: Thursday, August 1, 2024 12:11 PM
> > To: thomas@monjalon.net <thomas@monjalon.net>; dev@dpdk.org <dev@dpdk.org>
> > Cc: Van Haaren, Harry <harry.van.haaren@intel.com>; wangchenxingyu@huawei.com <wangchenxingyu@huawei.com>
> > Subject: [PATCH] examples/eventdev: fix segment fault with generic pipeline
> >
> > There was a segmentation fault when executing eventdev_pipeline with
> > command [1] with ConnectX-5 NIC card:
> >
> > 0x000000000079208c in rte_eth_tx_buffer (tx_pkt=0x16f8ed300, buffer=0x100, queue_id=11, port_id=0) at ../lib/ethdev/rte_ethdev.h:6636
> > txa_service_tx (txa=0x17b19d080, ev=0xffffffffe500, n=4) at ../lib/eventdev/rte_event_eth_tx_adapter.c:631
> > 0x0000000000792234 in txa_service_func (args=0x17b19d080) at ../lib/eventdev/rte_event_eth_tx_adapter.c:666
> > 0x00000000008b0784 in service_runner_do_callback (s=0x17fffe100, cs=0x17ffb5f80, service_idx=2) at ../lib/eal/common/rte_service.c:405
> > 0x00000000008b0ad8 in service_run (i=2, cs=0x17ffb5f80, service_mask=18446744073709551615, s=0x17fffe100, serialize_mt_unsafe=0)
> > at ../lib/eal/common/rte_service.c:441
> > 0x00000000008b0c68 in rte_service_run_iter_on_app_lcore (id=2, serialize_mt_unsafe=0) at ../lib/eal/common/rte_service.c:477
> > 0x000000000057bcc4 in schedule_devices (lcore_id=0) at ../examples/eventdev_pipeline/pipeline_common.h:138
> > 0x000000000057ca94 in worker_generic_burst (arg=0x17b131e80) at ../examples/eventdev_pipeline/pipeline_worker_generic.c:83
> > 0x00000000005794a8 in main (argc=11, argv=0xfffffffff470) at ../examples/eventdev_pipeline/main.c:449
> >
> > The root cause is that the queue_id (11) is invalid, the queue_id comes
> > from mbuf.hash.txadapter.txq which may pre-write by NIC driver when
> > receiving packets (e.g. pre-write mbuf.hash.fdir.hi field).
>
> Good bug report, thanks for the detailed info on hash.fdir.hi union-ed with
> txadapter fields.
> I don't have the specific HW to test, so code review only.
>
> I don't recall the TXQ quantities etc (been a number of years since I worked on
> this code...!)
> so I'll +CC Pavan who reworked the logic around generic workers & eventdev
> stages, and might recall?
>
> > Because this example only enabled one ethdev queue, so fixes it by reset
> > txq to zero in the first worker stage.
> >
> > [1] dpdk-eventdev_pipeline -l 0-48 --vdev event_sw0 -- -r1 -t1 -e1 -w ff0 -s5 -n0 -c32 -W1000 -D
> > When launch eventdev_pipeline with command [1], event_sw
> >
> > Fixes: 81fb40f95c82 ("examples/eventdev: add generic worker pipeline")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
> > Reported-by: Chenxingyu Wang <wangchenxingyu@huawei.com>
>
> Generally the change looks fine - I'll wait a few days for Pavan's input, and
> otherwise review & Ack assuming no issues found.
Harry,
The fix looks correct to me.
Acked-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
>
> Thanks for the patch! -Harry
@@ -38,10 +38,12 @@ worker_generic(void *arg)
 		}
 		received++;
-		/* The first worker stage does classification */
-		if (ev.queue_id == cdata.qid[0])
+		/* The first worker stage does classification and sets txq. */
+		if (ev.queue_id == cdata.qid[0]) {
 			ev.flow_id = ev.mbuf->hash.rss
 						% cdata.num_fids;
+			rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
+		}
 		ev.queue_id = cdata.next_qid[ev.queue_id];
 		ev.op = RTE_EVENT_OP_FORWARD;
@@ -96,10 +98,12 @@ worker_generic_burst(void *arg)
 		for (i = 0; i < nb_rx; i++) {
-			/* The first worker stage does classification */
-			if (events[i].queue_id == cdata.qid[0])
+			/* The first worker stage does classification and sets txq. */
+			if (events[i].queue_id == cdata.qid[0]) {
 				events[i].flow_id = events[i].mbuf->hash.rss
 							% cdata.num_fids;
+				rte_event_eth_tx_adapter_txq_set(events[i].mbuf, 0);
+			}
 			events[i].queue_id = cdata.next_qid[events[i].queue_id];
 			events[i].op = RTE_EVENT_OP_FORWARD;
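
For readers without a DPDK build at hand, the per-event logic of the patched worker can be sketched in isolation. This is an illustrative model only: struct demo_event, struct demo_cfg and demo_worker_stage() are hypothetical names, txq is kept as a plain field rather than inside the mbuf, and the real code calls rte_event_eth_tx_adapter_txq_set() instead of assigning directly.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_QUEUES 4

/* Hypothetical stand-in for an event plus the mbuf fields it uses. */
struct demo_event {
	uint8_t queue_id;   /* event queue the event arrived on */
	uint32_t flow_id;   /* assigned by the classification stage */
	uint32_t rss;       /* stand-in for mbuf->hash.rss */
	uint16_t txq;       /* stand-in for mbuf->hash.txadapter.txq */
};

/* Hypothetical stand-in for the cdata fields used by the worker. */
struct demo_cfg {
	uint8_t qid[DEMO_MAX_QUEUES];       /* qid[0] is the first stage */
	uint8_t next_qid[DEMO_MAX_QUEUES];  /* queue to forward to */
	uint32_t num_fids;                  /* number of flow IDs */
};

/* One pass of a worker over a single event, after the fix. */
static inline void demo_worker_stage(const struct demo_cfg *cfg,
				     struct demo_event *ev)
{
	assert(ev->queue_id < DEMO_MAX_QUEUES);

	/* The first stage classifies and resets the Tx queue exactly once. */
	if (ev->queue_id == cfg->qid[0]) {
		ev->flow_id = ev->rss % cfg->num_fids;
		ev->txq = 0;
	}

	/* Every stage forwards the event to the next queue. */
	ev->queue_id = cfg->next_qid[ev->queue_id];
}
```

Resetting in the first stage means that later stages leave txq untouched, so the value the Tx adapter eventually reads is the one set here and not whatever the receive path left behind.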