[v4] app/testpmd: fix primary process not polling all queues

Message ID 20230609090340.3942-1-haijie1@huawei.com (mailing list archive)
State Accepted, archived
Delegated to: Ferruh Yigit
Series [v4] app/testpmd: fix primary process not polling all queues

Checks

Context                         Check    Description
ci/checkpatch                   success  coding style OK
ci/loongarch-compilation        success  Compilation OK
ci/loongarch-unit-testing       success  Unit Testing PASS
ci/iol-mellanox-Performance     success  Performance Testing PASS
ci/Intel-compilation            success  Compilation OK
ci/iol-abi-testing              success  Testing PASS
ci/iol-aarch-unit-testing       success  Testing PASS
ci/github-robot: build          success  github build: passed
ci/iol-x86_64-compile-testing   success  Testing PASS
ci/iol-unit-testing             success  Testing PASS
ci/intel-Functional             success  Functional PASS
ci/iol-testing                  success  Testing PASS
ci/iol-x86_64-unit-testing      success  Testing PASS
ci/iol-intel-Performance        success  Performance Testing PASS
ci/iol-broadcom-Functional      success  Functional Testing PASS
ci/iol-intel-Functional         success  Functional Testing PASS
ci/iol-broadcom-Performance     success  Performance Testing PASS
ci/iol-aarch64-compile-testing  success  Testing PASS
ci/intel-Testing                success  Testing PASS

Commit Message

Jie Hai June 9, 2023, 9:03 a.m. UTC
  Here's how the problem arises.
step1: Start the app.
    dpdk-testpmd -a 0000:35:00.0 -l 0-3 -- -i --rxq=10 --txq=10

step2: Perform the following steps and send traffic. As expected,
queue 7 does not send or receive packets, and other queues do.
    port 0 rxq 7 stop
    port 0 txq 7 stop
    set fwd mac
    start

step3: Perform the following steps and send traffic. All queues
are expected to send and receive packets normally, but that's not
the case for queue 7.
    stop
    port stop all
    port start all
    start
    show port xstats all

In fact, for queue 7 only rx_q7_packets is non-zero, which means
queue 7 is enabled in the driver but is not involved in packet
receiving and forwarding by software. If we check the queue state
with the commands 'show rxq info 0 7' and 'show txq info 0 7',
we see that queue 7 is started, as the other queues are.
    Rx queue state: started
    Tx queue state: started
Queue 7 is started but cannot forward. That's the problem.

Each stream has a read-only "disabled" field that controls
whether the stream is used for forwarding. This field depends
on the testpmd local queue state; see
commit 3c4426db54fc ("app/testpmd: do not poll stopped queues").
The DPDK framework maintains the ethdev queue state reported by
drivers, which indicates the real state of the queues.

Some commands, such as 'port X rxq|txq start|stop', update both
kinds of queue state. But these operations take effect only
within one stop-start round. In the following stop-start round,
the preceding operations no longer apply. However, only
the ethdev queue state is updated at that point, so the testpmd
and ethdev state information diverge, causing unexpected side
effects such as the problem above.
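
The divergence described above can be illustrated with a small, self-contained model. This is only a sketch: every name below (model_port, model_queue_stop, model_port_restart, model_stream_disabled) is hypothetical and is not taken from testpmd or ethdev, and the restart step assumes that the driver starts all queues on port start (no deferred start).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NB_QUEUES 10 /* mirrors --rxq=10 --txq=10 in the steps above */

/* Hypothetical stand-ins; these are not testpmd or ethdev definitions. */
enum model_queue_state { MODEL_QUEUE_STOPPED, MODEL_QUEUE_STARTED };

struct model_port {
	enum model_queue_state ethdev_state[MODEL_NB_QUEUES]; /* driver-reported */
	enum model_queue_state local_state[MODEL_NB_QUEUES];  /* testpmd copy */
};

/* Initial condition: every queue is started in both copies. */
void model_port_init(struct model_port *p)
{
	for (uint16_t q = 0; q < MODEL_NB_QUEUES; q++) {
		p->ethdev_state[q] = MODEL_QUEUE_STARTED;
		p->local_state[q] = MODEL_QUEUE_STARTED;
	}
}

/* Models 'port X rxq N stop': both copies are updated together. */
void model_queue_stop(struct model_port *p, uint16_t q)
{
	assert(q < MODEL_NB_QUEUES);
	p->ethdev_state[q] = MODEL_QUEUE_STOPPED;
	p->local_state[q] = MODEL_QUEUE_STOPPED;
}

/*
 * Models 'port stop all' followed by 'port start all' before the fix,
 * assuming no deferred start: the driver brings every queue back up,
 * while the testpmd copy is left untouched.
 */
void model_port_restart(struct model_port *p)
{
	for (uint16_t q = 0; q < MODEL_NB_QUEUES; q++)
		p->ethdev_state[q] = MODEL_QUEUE_STARTED;
}

/* A stream is skipped by forwarding when the local copy says stopped. */
bool model_stream_disabled(const struct model_port *p, uint16_t q)
{
	assert(q < MODEL_NB_QUEUES);
	return p->local_state[q] == MODEL_QUEUE_STOPPED;
}
```

In this model, stopping queue 7 and then restarting the port leaves ethdev_state[7] started while model_stream_disabled() still returns true for queue 7, which corresponds to the "started but cannot forward" symptom.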

There was a similar problem for the secondary process; see
commit 5028f207a4fa ("app/testpmd: fix secondary process packet
forwarding").

This patch applies that commit's workaround, with some differences,
to the primary process. Not all PMDs implement
rte_eth_rx_queue_info_get and rte_eth_tx_queue_info_get, yet they
may still support deferred_start in the primary process. To avoid
breaking their behavior, the original testpmd local queue state is
retained for those PMDs.

Fixes: 3c4426db54fc ("app/testpmd: do not poll stopped queues")
Cc: stable@dpdk.org

Signed-off-by: Jie Hai <haijie1@huawei.com>
---
v1->v2:
1. Fix misspelled word 'deferred'.
2. Fix incorrect format of reference to commits.

v2->v3:
1. Fix incorrect format of reference to commits.

v3->v4:
1. Remove deferred_start change.
2. Modify commit log.
---
 app/test-pmd/testpmd.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)
  

Comments

Ferruh Yigit June 9, 2023, 11:10 a.m. UTC | #1
Patch looks good to me, but since it has potential side effects,
can someone from the test team verify the following before we continue:
a) Secondary testpmd
b) Deferred Queue

Thanks,
Ferruh
  
Jie Hai June 20, 2023, 10:07 a.m. UTC | #2
Hi Ferruh,

I tested them with the hns3 driver. The results are the same before and
after the patch is applied:

case1: Secondary testpmd
	Action: The secondary testpmd stops a queue and the primary testpmd starts the queue.
	Result: The queue can forward in both processes.

case2: Deferred queue
	Action: Set deferred_start on for a queue in the primary process.
	Result: The queue cannot forward until deferred_start is off.

Thanks,
Jie Hai
  
Ferruh Yigit June 20, 2023, 10:57 a.m. UTC | #3
Thanks Jie, I will continue to process the patch.
  
Ferruh Yigit June 20, 2023, 5:05 p.m. UTC | #4
Acked-by: Ferruh Yigit <ferruh.yigit@amd.com>

Applied to dpdk-next-net/main, thanks.
  
Ali Alnubani June 22, 2023, 4:40 p.m. UTC | #5
Hi Jie,

With this patch, I see the error below when starting a representor port after reattaching it. Is it expected?

$ sudo ./build/app/dpdk-testpmd -n 4  -a 0000:08:00.0,dv_esw_en=1,representor=vf0-1  -a auxiliary: -a 00:00.0 --iova-mode="va" -- -i
[..]
testpmd> port stop all
testpmd> port close 0
testpmd> device detach 0000:08:00.0
testpmd> port attach 0000:08:00.0,dv_esw_en=1,representor=0-1
testpmd> port start 1
Configuring Port 1 (socket 0)
Port 1: FA:9E:D8:5F:D7:D8
Invalid Rx queue_id=0
testpmd: Failed to get rx queue info
Invalid Tx queue_id=0
testpmd: Failed to get tx queue info

Regards,
Ali
  
Jie Hai June 26, 2023, 9:30 a.m. UTC | #6
Hi Ali,
Thanks for your feedback.

When update_queue_state is called, the state of all queues on all
ports is updated.
The number of queues is nb_rxq|nb_txq, which is stored locally by the
testpmd process.
All ports in the same process share the same nb_rxq|nb_txq.

After the detach and attach, the number of queues of port 0 is 0.
It changes only when the port is reconfigured by testpmd,
that is, when port 0 is started.

If we start port 1 first, update_queue_state updates the state of
nb_rxq|nb_txq queues of port 0, which is invalid because that port
has zero queues.

Without this patch, the same problem occurs in the multi-process
scenario when the secondary process detaches and attaches the port
and then starts it.

I will submit a patch to fix this problem: when a port starts, update
the queue state based on the number of queues reported by the driver.

Thanks,
Jie Hai
  
Ferruh Yigit June 27, 2023, 11:05 a.m. UTC | #7
Hi Ali,

How big a blocker is this issue? Should the fix be part of -rc2?
  
Ali Alnubani July 3, 2023, 1:40 p.m. UTC | #8
> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Tuesday, June 27, 2023 2:05 PM
> To: Jie Hai <haijie1@huawei.com>; Ali Alnubani <alialnu@nvidia.com>; Aman
> Singh <aman.deep.singh@intel.com>; Yuying Zhang
> <yuying.zhang@intel.com>; Anatoly Burakov <anatoly.burakov@intel.com>;
> Matan Azrad <matan@nvidia.com>; Dmitry Kozlyuk
> <dmitry.kozliuk@gmail.com>
> Cc: dev@dpdk.org; liudongdong3@huawei.com; shiyangx.he@intel.com;
> Raslan Darawsheh <rasland@nvidia.com>; NBU-Contact-Thomas Monjalon
> (EXTERNAL) <thomas@monjalon.net>
> Subject: Re: [PATCH v4] app/testpmd: fix primary process not polling all
> queues
> 
> On 6/26/2023 10:30 AM, Jie Hai wrote:
> > On 2023/6/23 0:40, Ali Alnubani wrote:
> >>> -----Original Message-----
> >>> From: Jie Hai <haijie1@huawei.com>
> >>> Sent: Friday, June 9, 2023 12:04 PM
> >>> To: Aman Singh <aman.deep.singh@intel.com>; Yuying Zhang
> >>> <yuying.zhang@intel.com>; Anatoly Burakov
> <anatoly.burakov@intel.com>;
> >>> Matan Azrad <matan@nvidia.com>; Dmitry Kozlyuk
> >>> <dmitry.kozliuk@gmail.com>
> >>> Cc: dev@dpdk.org; liudongdong3@huawei.com; shiyangx.he@intel.com;
> >>> ferruh.yigit@amd.com
> >>> Subject: [PATCH v4] app/testpmd: fix primary process not polling all
> >>> queues
> >>>
> >>> Here's how the problem arises.
> >>> step1: Start the app.
> >>>      dpdk-testpmd -a 0000:35:00.0 -l 0-3 -- -i --rxq=10 --txq=10
> >>>
> >>> step2: Perform the following steps and send traffic. As expected,
> >>> queue 7 does not send or receive packets, and other queues do.
> >>>      port 0 rxq 7 stop
> >>>      port 0 txq 7 stop
> >>>      set fwd mac
> >>>      start
> >>>
> >>> step3: Perform the following steps and send traffic. All queues
> >>> are expected to send and receive packets normally, but that's not
> >>> the case for queue 7.
> >>>      stop
> >>>      port stop all
> >>>      port start all
> >>>      start
> >>>      show port xstats all
> >>>
> >>> In fact, only the value of rx_q7_packets for queue 7 is not zero,
> >>> which means queue 7 is enabled for the driver but is not involved
> >>> in packet receiving and forwarding by software. If we check queue
> >>> state by command 'show rxq info 0 7' and 'show txq info 0 7',
> >>> we see queue 7 is started as other queues are.
> >>>      Rx queue state: started
> >>>      Tx queue state: started
> >>> The queue 7 is started but cannot forward. That's the problem.
> >>>
> >>> We know that each stream has a read-only "disabled" field that
> >>> control if this stream should be used to forward. This field
> >>> depends on testpmd local queue state, please see
> >>> commit 3c4426db54fc ("app/testpmd: do not poll stopped queues").
> >>> DPDK framework maintains ethdev queue state that drivers reported,
> >>> which indicates the real state of queues.
> >>>
> >>> There are commands that update these two kind queue state such as
> >>> 'port X rxq|txq start|stop'. But these operations take effect only
> >>> in one stop-start round. In the following stop-start round, the
> >>> preceding operations do not take effect anymore. However, only
> >>> the ethdev queue state is updated, causing the testpmd and ethdev
> >>> state information to diverge and causing unexpected side effects
> >>> as above problem.
> >>>
> >>> There was a similar problem for the secondary process, please see
> >>> commit 5028f207a4fa ("app/testpmd: fix secondary process packet
> >>> forwarding").
> >>>
> >>> This patch applies its workaround with some difference to the
> >>> primary process. Not all PMDs implement rte_eth_rx_queue_info_get and
> >>> rte_eth_tx_queue_info_get, however they may support deferred_start
> >>> with primary process. To not break their behavior, retain the original
> >>> testpmd local queue state for those PMDs.
> >>>
> >>> Fixes: 3c4426db54fc ("app/testpmd: do not poll stopped queues")
> >>> Cc: stable@dpdk.org
> >>>
> >>> Signed-off-by: Jie Hai <haijie1@huawei.com>
> >>> ---
> >>
> >> Hi Jie,
> >>
> >> I see the error below when starting a representor port after
> >> reattaching it with this patch, is it expected?
> >>
> >> $ sudo ./build/app/dpdk-testpmd -n 4  -a
> >> 0000:08:00.0,dv_esw_en=1,representor=vf0-1  -a auxiliary: -a 00:00.0
> >> --iova-mode="va" -- -i
> >> [..]
> >> testpmd> port stop all
> >> testpmd> port close 0
> >> testpmd> device detach 0000:08:00.0
> >> testpmd> port attach 0000:08:00.0,dv_esw_en=1,representor=0-1
> >> testpmd> port start 1
> >> Configuring Port 1 (socket 0)
> >> Port 1: FA:9E:D8:5F:D7:D8
> >> Invalid Rx queue_id=0
> >> testpmd: Failed to get rx queue info
> >> Invalid Tx queue_id=0
> >> testpmd: Failed to get tx queue info
> >>
> >> Regards,
> >> Ali
> > Hi Ali,
> > Thanks for your feedback.
> >
> > When update_queue_state is called, the status of all queues on all ports
> > are updated.
> > The number of queues is nb_rxq|nb_txq which is stored locally by testpmd
> > process.
> > All ports on the same process shares the same nb_rxq|nb_txq.
> >
> > After detached and attached, the number of queues of port 0 is 0.
> > And it changes only when the port is reconfigured by testpmd,
> > which is when port 0 is started.
> >
> > If we start port 1 first, update_queue_state will update nb_rxq|nb_txq
> > queues state of port 0, and that's invalid because there's zero queues.
> >
> > If this patch is not applied, the same problem occurs when the secondary
> > process detaches and attaches the port, and then starts the port in the
> > multi-process scenario.
> >
> > I will submit a patch to fix this problem. When port starts, update
> > queue state based on the number of queues reported by the driver.
> >
> 
> Hi Ali,
> 
> How big a blocker is this issue, should the fix be part of -rc2?

Hi Ferruh,

I missed your email, sorry about that.
Jie already sent a patch and it resolved it for me.

Thanks,
Ali
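
Note on the failure discussed in this message: Jie's explanation is that the queue state refresh walks every port using one process-wide queue count (nb_rxq|nb_txq), while a freshly re-attached port that has not been reconfigured yet has zero queues. The sketch below models only that mismatch. It is a simplified illustration, not testpmd code: struct mock_port, mock_rx_queue_info_get and both counting helpers are hypothetical names, and the real follow-up patch may be structured differently.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-port queue count a driver reports. */
struct mock_port {
	uint16_t nb_rx_queues; /* Rx queues currently configured on the device */
};

/* Hypothetical queue info query: fails for queue ids the port does not have. */
static int
mock_rx_queue_info_get(const struct mock_port *port, uint16_t queue_id)
{
	return queue_id < port->nb_rx_queues ? 0 : -1;
}

/* Refresh every port using one process-wide queue count; count failed queries. */
static int
count_failures_global(const struct mock_port *ports, int nb_ports,
		      uint16_t nb_rxq)
{
	int failures = 0;

	for (int p = 0; p < nb_ports; p++)
		for (uint16_t q = 0; q < nb_rxq; q++)
			if (mock_rx_queue_info_get(&ports[p], q) != 0)
				failures++;
	return failures;
}

/* Refresh every port using its own reported queue count; count failed queries. */
static int
count_failures_per_port(const struct mock_port *ports, int nb_ports)
{
	int failures = 0;

	for (int p = 0; p < nb_ports; p++)
		for (uint16_t q = 0; q < ports[p].nb_rx_queues; q++)
			if (mock_rx_queue_info_get(&ports[p], q) != 0)
				failures++;
	return failures;
}
```

With a port 0 that has zero queues and a port 1 that has two, the first helper queries queues that do not exist on port 0, while the second never does. That matches the direction Jie describes for the follow-up: bound the loop by the number of queues the driver reports for each port.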
  

Patch

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index c6ad9b18bf03..1fc70650e0a4 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -2424,6 +2424,13 @@  update_rx_queue_state(uint16_t port_id, uint16_t queue_id)
 		ports[port_id].rxq[queue_id].state =
 			rx_qinfo.queue_state;
 	} else if (rc == -ENOTSUP) {
+		/*
+		 * Do not change the rxq state for primary process
+		 * to ensure that the PMDs do not implement
+		 * rte_eth_rx_queue_info_get can forward as before.
+		 */
+		if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+			return;
 		/*
 		 * Set the rxq state to RTE_ETH_QUEUE_STATE_STARTED
 		 * to ensure that the PMDs do not implement
@@ -2449,6 +2456,13 @@  update_tx_queue_state(uint16_t port_id, uint16_t queue_id)
 		ports[port_id].txq[queue_id].state =
 			tx_qinfo.queue_state;
 	} else if (rc == -ENOTSUP) {
+		/*
+		 * Do not change the txq state for primary process
+		 * to ensure that the PMDs do not implement
+		 * rte_eth_tx_queue_info_get can forward as before.
+		 */
+		if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+			return;
 		/*
 		 * Set the txq state to RTE_ETH_QUEUE_STATE_STARTED
 		 * to ensure that the PMDs do not implement
@@ -2516,8 +2530,7 @@  start_packet_forwarding(int with_tx_first)
 		return;
 
 	if (stream_init != NULL) {
-		if (rte_eal_process_type() == RTE_PROC_SECONDARY)
-			update_queue_state();
+		update_queue_state();
 		for (i = 0; i < cur_fwd_config.nb_fwd_streams; i++)
 			stream_init(fwd_streams[i]);
 	}
@@ -3280,8 +3293,7 @@  start_port(portid_t pid)
 		pl[cfg_pi++] = pi;
 	}
 
-	if (rte_eal_process_type() == RTE_PROC_SECONDARY)
-		update_queue_state();
+	update_queue_state();
 
 	if (at_least_one_port_successfully_started && !no_link_check)
 		check_all_ports_link_status(RTE_PORT_ALL);
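
The decision that the two update_*_queue_state hunks above implement can be summarised as a small pure function. This is a hedged sketch for readers, not the testpmd implementation: refresh_queue_state and the MOCK_* names are hypothetical, the enum values are illustrative rather than the RTE_ETH_QUEUE_STATE_* constants, and the handling of errors other than -ENOTSUP is an assumption because that branch is not visible in the diff.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the ethdev queue states (values illustrative). */
enum mock_queue_state { MOCK_QUEUE_STOPPED = 0, MOCK_QUEUE_STARTED = 1 };

/*
 * Return the testpmd-local queue state after a refresh.
 *   rc         result of the queue info query (0 or a negative errno)
 *   reported   state reported by the driver, meaningful only when rc == 0
 *   is_primary true when the calling process is the primary process
 *   local      testpmd-local state before the refresh
 */
static enum mock_queue_state
refresh_queue_state(int rc, enum mock_queue_state reported, bool is_primary,
		    enum mock_queue_state local)
{
	if (rc == 0)
		return reported; /* driver state wins when it is available */

	if (rc == -ENOTSUP) {
		/* Primary: keep the local state so deferred start still works. */
		if (is_primary)
			return local;
		/* Secondary: assume started so forwarding is not blocked. */
		return MOCK_QUEUE_STARTED;
	}

	/* Assumption: any other error leaves the local state untouched. */
	return local;
}
```

Seen this way, the patch changes two things: the refresh now runs for the primary process as well, and a PMD without queue info support keeps whatever local state the primary process already had instead of being forced to "started".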