[v3] net/ixgbe: add proper memory barriers for some Rx functions

Message ID 20230506102359.243462-1-zhoumin@loongson.cn (mailing list archive)
State Superseded, archived
Delegated to: Qi Zhang
Series [v3] net/ixgbe: add proper memory barriers for some Rx functions

Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/Intel-compilation success Compilation OK
ci/github-robot: build success github build: passed
ci/intel-Testing success Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-aarch64-unit-testing success Testing PASS
ci/iol-unit-testing success Testing PASS
ci/iol-abi-testing success Testing PASS
ci/iol-x86_64-compile-testing success Testing PASS
ci/iol-aarch64-compile-testing success Testing PASS
ci/iol-testing success Testing PASS
ci/iol-x86_64-unit-testing success Testing PASS
ci/intel-Functional success Functional PASS

Commit Message

zhoumin May 6, 2023, 10:23 a.m. UTC
  A segmentation fault has been observed while running the
ixgbe_recv_pkts_lro() function to receive packets on the Loongson 3C5000
processor, which has 64 cores and 4 NUMA nodes.

In the ixgbe_recv_pkts_lro() function, we found that as long as the first
packet has the EOP bit set and its length is less than or equal to
rxq->crc_len, the segmentation fault will definitely happen, even on other
platforms. For example, if we force the first packet that has the EOP bit
set to have zero length, the segmentation fault also happens on x86.

When the first packet is processed, first_seg->next will be NULL. If, at
the same time, this packet has the EOP bit set and its length is less than
or equal to rxq->crc_len, the following loop will be executed:

    for (lp = first_seg; lp->next != rxm; lp = lp->next)
        ;

We know that first_seg->next will be NULL under this condition, so
evaluating lp->next->next will cause the segmentation fault.
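
The failure path can be reproduced outside the driver with a minimal,
self-contained sketch (hypothetical types, not the ixgbe code): when
rxm == first_seg and first_seg->next == NULL, the loop condition itself
dereferences a NULL pointer on its second evaluation.

    #include <stddef.h>
    #include <stdio.h>

    struct seg {
        struct seg *next;
    };

    int main(void)
    {
        struct seg s = { .next = NULL };
        struct seg *first_seg = &s; /* first packet with EOP set ... */
        struct seg *rxm = &s;       /* ... is also the last segment   */
        struct seg *lp;

        /* Same shape as the CRC-trim loop above: after lp becomes NULL,
         * the loop condition reads lp->next and crashes. */
        for (lp = first_seg; lp->next != rxm; lp = lp->next)
            ;

        printf("not reached\n");
        return 0;
    }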

Normally, the length of the first packet with the EOP bit set will be
greater than rxq->crc_len. However, out-of-order execution by the CPU may
make the read ordering of the status and the rest of the descriptor fields
in this function incorrect. The related code is as follows:

        rxdp = &rx_ring[rx_id];
 #1     staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error);

        if (!(staterr & IXGBE_RXDADV_STAT_DD))
            break;

 #2     rxd = *rxdp;

Statement #2 may be executed before statement #1, which can make the ready
packet appear to have zero length. If that packet is the first packet and
has the EOP bit set, the segmentation fault described above will happen.
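
One possible interleaving, spelled out as an illustration (not driver
code):

    /*
     *   CPU (loads reordered)             NIC (descriptor write-back)
     *   ---------------------             ---------------------------
     *   #2  rxd = *rxdp                   (descriptor words still stale)
     *                                     writes length, ptype, ...
     *                                     writes status_error with DD set
     *   #1  staterr = ...status_error     DD is observed as set
     *
     * staterr says the descriptor is done, but rxd was captured before the
     * NIC filled it in, so its length field can still read as zero.
     */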

So we should add a proper memory barrier to ensure the read ordering is
correct. We also did the same in the ixgbe_recv_pkts() function to make the
rxd data valid, even though we did not observe a segmentation fault in that
function.

Fixes: 8eecb3295ae ("ixgbe: add LRO support")
Cc: stable@dpdk.org

Signed-off-by: Min Zhou <zhoumin@loongson.cn>
---
v3:
- Use rte_smp_rmb() as the proper memory barrier instead of rte_rmb()
---
v2:
- Make the calling of rte_rmb() for all platforms
---
 drivers/net/ixgbe/ixgbe_rxtx.c | 39 ++++++++++++----------------------
 1 file changed, 13 insertions(+), 26 deletions(-)
  

Comments

Ruifeng Wang May 8, 2023, 6:03 a.m. UTC | #1
> -----Original Message-----
> From: Min Zhou <zhoumin@loongson.cn>
> Sent: Saturday, May 6, 2023 6:24 PM
> To: qi.z.zhang@intel.com; mb@smartsharesystems.com; konstantin.v.ananyev@yandex.ru;
> qiming.yang@intel.com; wenjun1.wu@intel.com; zhoumin@loongson.cn
> Cc: Ruifeng Wang <Ruifeng.Wang@arm.com>; drc@linux.vnet.ibm.com;
> roretzla@linux.microsoft.com; dev@dpdk.org; stable@dpdk.org; maobibo@loongson.cn
> Subject: [PATCH v3] net/ixgbe: add proper memory barriers for some Rx functions
> 
> Segmentation fault has been observed while running the
> ixgbe_recv_pkts_lro() function to receive packets on the Loongson 3C5000 processor which
> has 64 cores and 4 NUMA nodes.
> 
> From the ixgbe_recv_pkts_lro() function, we found that as long as the first packet has the
> EOP bit set, and the length of this packet is less than or equal to rxq->crc_len, the
> segmentation fault will definitely happen even though on the other platforms. For example,
> if we made the first packet which had the EOP bit set had a zero length by force, the
> segmentation fault would happen on X86.
> 
> Because when processd the first packet the first_seg->next will be NULL, if at the same
> time this packet has the EOP bit set and its length is less than or equal to rxq->crc_len,
> the following loop will be executed:
> 
>     for (lp = first_seg; lp->next != rxm; lp = lp->next)
>         ;
> 
> We know that the first_seg->next will be NULL under this condition. So the expression of
> lp->next->next will cause the segmentation fault.
> 
> Normally, the length of the first packet with EOP bit set will be greater than rxq-
> >crc_len. However, the out-of-order execution of CPU may make the read ordering of the
> status and the rest of the descriptor fields in this function not be correct. The related
> codes are as following:
> 
>         rxdp = &rx_ring[rx_id];
>  #1     staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error);
> 
>         if (!(staterr & IXGBE_RXDADV_STAT_DD))
>             break;
> 
>  #2     rxd = *rxdp;
> 
> The sentence #2 may be executed before sentence #1. This action is likely to make the
> ready packet zero length. If the packet is the first packet and has the EOP bit set, the
> above segmentation fault will happen.
> 
> So, we should add a proper memory barrier to ensure the read ordering be correct. We also
> did the same thing in the ixgbe_recv_pkts() function to make the rxd data be valid even
> though we did not find segmentation fault in this function.
> 
> Fixes: 8eecb3295ae ("ixgbe: add LRO support")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Min Zhou <zhoumin@loongson.cn>
> ---
> v3:
> - Use rte_smp_rmb() as the proper memory barrier instead of rte_rmb()
> ---
> v2:
> - Make the calling of rte_rmb() for all platforms
> ---
>  drivers/net/ixgbe/ixgbe_rxtx.c | 39 ++++++++++++----------------------
>  1 file changed, 13 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c index
> 6b3d3a4d1a..80bcaef093 100644
> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
> @@ -1823,6 +1823,12 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
>  		staterr = rxdp->wb.upper.status_error;
>  		if (!(staterr & rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD)))
>  			break;
> +
> +		/*
> +		 * This barrier is to ensure that status_error which includes DD
> +		 * bit is loaded before loading of other descriptor words.
> +		 */
> +		rte_smp_rmb();
>  		rxd = *rxdp;
> 
>  		/*
> @@ -2089,32 +2095,8 @@ ixgbe_recv_pkts_lro(void *rx_queue, struct rte_mbuf **rx_pkts,
> uint16_t nb_pkts,
> 
>  next_desc:
>  		/*
> -		 * The code in this whole file uses the volatile pointer to
> -		 * ensure the read ordering of the status and the rest of the
> -		 * descriptor fields (on the compiler level only!!!). This is so
> -		 * UGLY - why not to just use the compiler barrier instead? DPDK
> -		 * even has the rte_compiler_barrier() for that.
> -		 *
> -		 * But most importantly this is just wrong because this doesn't
> -		 * ensure memory ordering in a general case at all. For
> -		 * instance, DPDK is supposed to work on Power CPUs where
> -		 * compiler barrier may just not be enough!
> -		 *
> -		 * I tried to write only this function properly to have a
> -		 * starting point (as a part of an LRO/RSC series) but the
> -		 * compiler cursed at me when I tried to cast away the
> -		 * "volatile" from rx_ring (yes, it's volatile too!!!). So, I'm
> -		 * keeping it the way it is for now.
> -		 *
> -		 * The code in this file is broken in so many other places and
> -		 * will just not work on a big endian CPU anyway therefore the
> -		 * lines below will have to be revisited together with the rest
> -		 * of the ixgbe PMD.
> -		 *
> -		 * TODO:
> -		 *    - Get rid of "volatile" and let the compiler do its job.
> -		 *    - Use the proper memory barrier (rte_rmb()) to ensure the
> -		 *      memory ordering below.
> +		 * It is necessary to use a proper memory barrier to ensure the
> +		 * memory ordering below.
>  		 */
>  		rxdp = &rx_ring[rx_id];
>  		staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error);
> @@ -2122,6 +2104,11 @@ ixgbe_recv_pkts_lro(void *rx_queue, struct rte_mbuf **rx_pkts,
> uint16_t nb_pkts,
>  		if (!(staterr & IXGBE_RXDADV_STAT_DD))
>  			break;
> 
> +		/*
> +		 * This barrier is to ensure that status_error which includes DD
> +		 * bit is loaded before loading of other descriptor words.
> +		 */
> +		rte_smp_rmb();
>  		rxd = *rxdp;
> 
>  		PMD_RX_LOG(DEBUG, "port_id=%u queue_id=%u rx_id=%u "
> --
> 2.31.1
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
  
Qi Zhang May 15, 2023, 2:10 a.m. UTC | #2
> -----Original Message-----
> From: Ruifeng Wang <Ruifeng.Wang@arm.com>
> Sent: Monday, May 8, 2023 2:03 PM
> To: Min Zhou <zhoumin@loongson.cn>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> mb@smartsharesystems.com; konstantin.v.ananyev@yandex.ru; Yang,
> Qiming <qiming.yang@intel.com>; Wu, Wenjun1 <wenjun1.wu@intel.com>
> Cc: drc@linux.vnet.ibm.com; roretzla@linux.microsoft.com; dev@dpdk.org;
> stable@dpdk.org; maobibo@loongson.cn; nd <nd@arm.com>
> Subject: RE: [PATCH v3] net/ixgbe: add proper memory barriers for some Rx
> functions
> 
> > -----Original Message-----
> > From: Min Zhou <zhoumin@loongson.cn>
> > Sent: Saturday, May 6, 2023 6:24 PM
> > To: qi.z.zhang@intel.com; mb@smartsharesystems.com;
> > konstantin.v.ananyev@yandex.ru; qiming.yang@intel.com;
> > wenjun1.wu@intel.com; zhoumin@loongson.cn
> > Cc: Ruifeng Wang <Ruifeng.Wang@arm.com>; drc@linux.vnet.ibm.com;
> > roretzla@linux.microsoft.com; dev@dpdk.org; stable@dpdk.org;
> > maobibo@loongson.cn
> > Subject: [PATCH v3] net/ixgbe: add proper memory barriers for some Rx
> > functions
> >
> > Segmentation fault has been observed while running the
> > ixgbe_recv_pkts_lro() function to receive packets on the Loongson
> > 3C5000 processor which has 64 cores and 4 NUMA nodes.
> >
> > From the ixgbe_recv_pkts_lro() function, we found that as long as the
> > first packet has the EOP bit set, and the length of this packet is
> > less than or equal to rxq->crc_len, the segmentation fault will
> > definitely happen even though on the other platforms. For example, if
> > we made the first packet which had the EOP bit set had a zero length by
> force, the segmentation fault would happen on X86.
> >
> > Because when processd the first packet the first_seg->next will be
> > NULL, if at the same time this packet has the EOP bit set and its
> > length is less than or equal to rxq->crc_len, the following loop will be
> executed:
> >
> >     for (lp = first_seg; lp->next != rxm; lp = lp->next)
> >         ;
> >
> > We know that the first_seg->next will be NULL under this condition. So
> > the expression of
> > lp->next->next will cause the segmentation fault.
> >
> > Normally, the length of the first packet with EOP bit set will be
> > greater than rxq-
> > >crc_len. However, the out-of-order execution of CPU may make the read
> > >ordering of the
> > status and the rest of the descriptor fields in this function not be
> > correct. The related codes are as following:
> >
> >         rxdp = &rx_ring[rx_id];
> >  #1     staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error);
> >
> >         if (!(staterr & IXGBE_RXDADV_STAT_DD))
> >             break;
> >
> >  #2     rxd = *rxdp;
> >
> > The sentence #2 may be executed before sentence #1. This action is
> > likely to make the ready packet zero length. If the packet is the
> > first packet and has the EOP bit set, the above segmentation fault will
> happen.
> >
> > So, we should add a proper memory barrier to ensure the read ordering
> > be correct. We also did the same thing in the ixgbe_recv_pkts()
> > function to make the rxd data be valid even though we did not find
> segmentation fault in this function.
> >
> > Fixes: 8eecb3295ae ("ixgbe: add LRO support")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Min Zhou <zhoumin@loongson.cn>
> > ---
> > v3:
> > - Use rte_smp_rmb() as the proper memory barrier instead of rte_rmb()
> > ---
> > v2:
> > - Make the calling of rte_rmb() for all platforms
> > ---
> >  drivers/net/ixgbe/ixgbe_rxtx.c | 39
> > ++++++++++++----------------------
> >  1 file changed, 13 insertions(+), 26 deletions(-)
> >
> > diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c
> > b/drivers/net/ixgbe/ixgbe_rxtx.c index
> > 6b3d3a4d1a..80bcaef093 100644
> > --- a/drivers/net/ixgbe/ixgbe_rxtx.c
> > +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
> > @@ -1823,6 +1823,12 @@ ixgbe_recv_pkts(void *rx_queue, struct
> rte_mbuf **rx_pkts,
> >  		staterr = rxdp->wb.upper.status_error;
> >  		if (!(staterr & rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD)))
> >  			break;
> > +
> > +		/*
> > +		 * This barrier is to ensure that status_error which includes
> DD
> > +		 * bit is loaded before loading of other descriptor words.
> > +		 */
> > +		rte_smp_rmb();
> >  		rxd = *rxdp;
> >
> >  		/*
> > @@ -2089,32 +2095,8 @@ ixgbe_recv_pkts_lro(void *rx_queue, struct
> > rte_mbuf **rx_pkts, uint16_t nb_pkts,
> >
> >  next_desc:
> >  		/*
> > -		 * The code in this whole file uses the volatile pointer to
> > -		 * ensure the read ordering of the status and the rest of the
> > -		 * descriptor fields (on the compiler level only!!!). This is so
> > -		 * UGLY - why not to just use the compiler barrier instead?
> DPDK
> > -		 * even has the rte_compiler_barrier() for that.
> > -		 *
> > -		 * But most importantly this is just wrong because this
> doesn't
> > -		 * ensure memory ordering in a general case at all. For
> > -		 * instance, DPDK is supposed to work on Power CPUs where
> > -		 * compiler barrier may just not be enough!
> > -		 *
> > -		 * I tried to write only this function properly to have a
> > -		 * starting point (as a part of an LRO/RSC series) but the
> > -		 * compiler cursed at me when I tried to cast away the
> > -		 * "volatile" from rx_ring (yes, it's volatile too!!!). So, I'm
> > -		 * keeping it the way it is for now.
> > -		 *
> > -		 * The code in this file is broken in so many other places and
> > -		 * will just not work on a big endian CPU anyway therefore
> the
> > -		 * lines below will have to be revisited together with the rest
> > -		 * of the ixgbe PMD.
> > -		 *
> > -		 * TODO:
> > -		 *    - Get rid of "volatile" and let the compiler do its job.
> > -		 *    - Use the proper memory barrier (rte_rmb()) to ensure
> the
> > -		 *      memory ordering below.
> > +		 * It is necessary to use a proper memory barrier to ensure
> the
> > +		 * memory ordering below.
> >  		 */
> >  		rxdp = &rx_ring[rx_id];
> >  		staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error);
> > @@ -2122,6 +2104,11 @@ ixgbe_recv_pkts_lro(void *rx_queue, struct
> > rte_mbuf **rx_pkts, uint16_t nb_pkts,
> >  		if (!(staterr & IXGBE_RXDADV_STAT_DD))
> >  			break;
> >
> > +		/*
> > +		 * This barrier is to ensure that status_error which includes
> DD
> > +		 * bit is loaded before loading of other descriptor words.
> > +		 */
> > +		rte_smp_rmb();
> >  		rxd = *rxdp;
> >
> >  		PMD_RX_LOG(DEBUG, "port_id=%u queue_id=%u rx_id=%u "
> > --
> > 2.31.1
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>

Applied to dpdk-next-net-intel.

Thanks
Qi
  
Thomas Monjalon June 12, 2023, 10:26 a.m. UTC | #3
15/05/2023 04:10, Zhang, Qi Z:
> From: Ruifeng Wang <Ruifeng.Wang@arm.com>
> > From: Min Zhou <zhoumin@loongson.cn>
> > >
> > > Segmentation fault has been observed while running the
> > > ixgbe_recv_pkts_lro() function to receive packets on the Loongson
> > > 3C5000 processor which has 64 cores and 4 NUMA nodes.
> > >
> > > From the ixgbe_recv_pkts_lro() function, we found that as long as the
> > > first packet has the EOP bit set, and the length of this packet is
> > > less than or equal to rxq->crc_len, the segmentation fault will
> > > definitely happen even though on the other platforms. For example, if
> > > we made the first packet which had the EOP bit set had a zero length by
> > force, the segmentation fault would happen on X86.
> > >
> > > Because when processd the first packet the first_seg->next will be
> > > NULL, if at the same time this packet has the EOP bit set and its
> > > length is less than or equal to rxq->crc_len, the following loop will be
> > executed:
> > >
> > >     for (lp = first_seg; lp->next != rxm; lp = lp->next)
> > >         ;
> > >
> > > We know that the first_seg->next will be NULL under this condition. So
> > > the expression of
> > > lp->next->next will cause the segmentation fault.
> > >
> > > Normally, the length of the first packet with EOP bit set will be
> > > greater than rxq-
> > > >crc_len. However, the out-of-order execution of CPU may make the read
> > > >ordering of the
> > > status and the rest of the descriptor fields in this function not be
> > > correct. The related codes are as following:
> > >
> > >         rxdp = &rx_ring[rx_id];
> > >  #1     staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error);
> > >
> > >         if (!(staterr & IXGBE_RXDADV_STAT_DD))
> > >             break;
> > >
> > >  #2     rxd = *rxdp;
> > >
> > > The sentence #2 may be executed before sentence #1. This action is
> > > likely to make the ready packet zero length. If the packet is the
> > > first packet and has the EOP bit set, the above segmentation fault will
> > happen.
> > >
> > > So, we should add a proper memory barrier to ensure the read ordering
> > > be correct. We also did the same thing in the ixgbe_recv_pkts()
> > > function to make the rxd data be valid even though we did not find
> > segmentation fault in this function.
> > >
> > > Fixes: 8eecb3295ae ("ixgbe: add LRO support")
> > > Cc: stable@dpdk.org
> > >
> > > Signed-off-by: Min Zhou <zhoumin@loongson.cn>
> > > ---
> > > v3:
> > > - Use rte_smp_rmb() as the proper memory barrier instead of rte_rmb()
> > > ---
> > > v2:
> > > - Make the calling of rte_rmb() for all platforms
> > > ---
[...]
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> 
> Applied to dpdk-next-net-intel.
> 
> Thanks
> Qi
> 

Why ignoring checkpatch?
It is saying:
"
Warning in drivers/net/ixgbe/ixgbe_rxtx.c:
Using rte_smp_[r/w]mb
"

Ruifeng proposed "rte_atomic_thread_fence(__ATOMIC_ACQUIRE)"
in a comment on the v2.
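
For reference, a rough sketch of what that suggestion would look like in
the LRO Rx loop (same surrounding variables as in the patch; a sketch of
the proposal, not the applied code):

    staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error);

    if (!(staterr & IXGBE_RXDADV_STAT_DD))
        break;

    /*
     * Acquire fence: status_error (with the DD bit) must be observed
     * before the other descriptor words are loaded.
     */
    rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
    rxd = *rxdp;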

I will drop this patch from the pull of next-net-intel branch.
  
zhoumin June 12, 2023, 11:58 a.m. UTC | #4
Hi Thomas,

On Mon, June 12, 2023 at 6:26PM, Thomas Monjalon wrote:
> 15/05/2023 04:10, Zhang, Qi Z:
>> From: Ruifeng Wang <Ruifeng.Wang@arm.com>
>>> From: Min Zhou <zhoumin@loongson.cn>
>>>> Segmentation fault has been observed while running the
>>>> ixgbe_recv_pkts_lro() function to receive packets on the Loongson
>>>> 3C5000 processor which has 64 cores and 4 NUMA nodes.
>>>>
>>>>  From the ixgbe_recv_pkts_lro() function, we found that as long as the
>>>> first packet has the EOP bit set, and the length of this packet is
>>>> less than or equal to rxq->crc_len, the segmentation fault will
>>>> definitely happen even though on the other platforms. For example, if
>>>> we made the first packet which had the EOP bit set had a zero length by
>>> force, the segmentation fault would happen on X86.
>>>> Because when processd the first packet the first_seg->next will be
>>>> NULL, if at the same time this packet has the EOP bit set and its
>>>> length is less than or equal to rxq->crc_len, the following loop will be
>>> executed:
>>>>      for (lp = first_seg; lp->next != rxm; lp = lp->next)
>>>>          ;
>>>>
>>>> We know that the first_seg->next will be NULL under this condition. So
>>>> the expression of
>>>> lp->next->next will cause the segmentation fault.
>>>>
>>>> Normally, the length of the first packet with EOP bit set will be
>>>> greater than rxq-
>>>>> crc_len. However, the out-of-order execution of CPU may make the read
>>>>> ordering of the
>>>> status and the rest of the descriptor fields in this function not be
>>>> correct. The related codes are as following:
>>>>
>>>>          rxdp = &rx_ring[rx_id];
>>>>   #1     staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error);
>>>>
>>>>          if (!(staterr & IXGBE_RXDADV_STAT_DD))
>>>>              break;
>>>>
>>>>   #2     rxd = *rxdp;
>>>>
>>>> The sentence #2 may be executed before sentence #1. This action is
>>>> likely to make the ready packet zero length. If the packet is the
>>>> first packet and has the EOP bit set, the above segmentation fault will
>>> happen.
>>>> So, we should add a proper memory barrier to ensure the read ordering
>>>> be correct. We also did the same thing in the ixgbe_recv_pkts()
>>>> function to make the rxd data be valid even though we did not find
>>> segmentation fault in this function.
>>>> Fixes: 8eecb3295ae ("ixgbe: add LRO support")
>>>> Cc: stable@dpdk.org
>>>>
>>>> Signed-off-by: Min Zhou <zhoumin@loongson.cn>
>>>> ---
>>>> v3:
>>>> - Use rte_smp_rmb() as the proper memory barrier instead of rte_rmb()
>>>> ---
>>>> v2:
>>>> - Make the calling of rte_rmb() for all platforms
>>>> ---
> [...]
>>> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
>> Applied to dpdk-next-net-intel.
>>
>> Thanks
>> Qi
>>
> Why ignoring checkpatch?
> It is saying:
> "
> Warning in drivers/net/ixgbe/ixgbe_rxtx.c:
> Using rte_smp_[r/w]mb
> "


I'm sorry. Should we never use rte_smp_[r/w]mb in the driver's code?


> Ruifeng proposed "rte_atomic_thread_fence(__ATOMIC_ACQUIRE)"
> in a comment on the v2.


Thanks, I see. I think I can also use rte_atomic_thread_fence() to solve
this problem. I will send the v4 patch.


>
> I will drop this patch from the pull of next-net-intel branch.
>
  
Thomas Monjalon June 12, 2023, 12:44 p.m. UTC | #5
12/06/2023 13:58, zhoumin:
> On Mon, June 12, 2023 at 6:26PM, Thomas Monjalon wrote:
> > 15/05/2023 04:10, Zhang, Qi Z:
> >> From: Ruifeng Wang <Ruifeng.Wang@arm.com>
> >>> From: Min Zhou <zhoumin@loongson.cn>
> >>>> ---
> >>>> v3:
> >>>> - Use rte_smp_rmb() as the proper memory barrier instead of rte_rmb()
> >>>> ---
> >>>> v2:
> >>>> - Make the calling of rte_rmb() for all platforms
> >>>> ---
> > [...]
> >>> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> >> Applied to dpdk-next-net-intel.
> >>
> >> Thanks
> >> Qi
> >>
> > Why ignoring checkpatch?
> > It is saying:
> > "
> > Warning in drivers/net/ixgbe/ixgbe_rxtx.c:
> > Using rte_smp_[r/w]mb
> > "
> 
> 
> I'm sorry. Should we never use rte_smp_[r/w]mb in the driver's code?

No, we should avoid it.
It has been decided to slowly replace such barriers.
By the way, I think it is not documented well enough.
You can find an explanation in doc/guides/rel_notes/deprecation.rst

I think we should also add some notes to
lib/eal/include/generic/rte_atomic.h
Tyler, Honnappa, Ruifeng, Konstantin, what do you think?
  
zhoumin June 13, 2023, 1:42 a.m. UTC | #6
Hi Thomas,

On Mon, June 12, 2023 at 8:44PM, Thomas Monjalon wrote:
> 12/06/2023 13:58, zhoumin:
>> On Mon, June 12, 2023 at 6:26PM, Thomas Monjalon wrote:
>>> 15/05/2023 04:10, Zhang, Qi Z:
>>>> From: Ruifeng Wang <Ruifeng.Wang@arm.com>
>>>>> From: Min Zhou <zhoumin@loongson.cn>
>>>>>> ---
>>>>>> v3:
>>>>>> - Use rte_smp_rmb() as the proper memory barrier instead of rte_rmb()
>>>>>> ---
>>>>>> v2:
>>>>>> - Make the calling of rte_rmb() for all platforms
>>>>>> ---
>>> [...]
>>>>> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
>>>> Applied to dpdk-next-net-intel.
>>>>
>>>> Thanks
>>>> Qi
>>>>
>>> Why ignoring checkpatch?
>>> It is saying:
>>> "
>>> Warning in drivers/net/ixgbe/ixgbe_rxtx.c:
>>> Using rte_smp_[r/w]mb
>>> "
>>
>> I'm sorry. Should we never use rte_smp_[r/w]mb in the driver's code?
> No we should avoid.
> It has been decided to slowly replace such barriers.
> By the way, I think it is not enough documented.
> You can find an explanation in doc/guides/rel_notes/deprecation.rst
Thank you for providing the reference document. I have read this file.
The explanation is clear and I get it.
> I think we should also add some notes to
> lib/eal/include/generic/rte_atomic.h
Yes, I think so too. Adding the notes at the definitions of
rte_smp_[r/w]mb would be better.
> Tyler, Honnappa, Ruifeng, Konstantin, what do you think?
>
  
Jiawen Wu June 13, 2023, 3:30 a.m. UTC | #7
On Tuesday, June 13, 2023 9:43 AM, zhoumin wrote:
> On Mon, June 12, 2023 at 8:44PM, Thomas Monjalon wrote:
> > 12/06/2023 13:58, zhoumin:
> >> On Mon, June 12, 2023 at 6:26PM, Thomas Monjalon wrote:
> >>> 15/05/2023 04:10, Zhang, Qi Z:
> >>>> From: Ruifeng Wang <Ruifeng.Wang@arm.com>
> >>>>> From: Min Zhou <zhoumin@loongson.cn>
> >>>>>> ---
> >>>>>> v3:
> >>>>>> - Use rte_smp_rmb() as the proper memory barrier instead of rte_rmb()
> >>>>>> ---
> >>>>>> v2:
> >>>>>> - Make the calling of rte_rmb() for all platforms
> >>>>>> ---

Hi zhoumin,

I recently learned that Loongson is doing tests with Wangxun NICs on the 3C5000 and has also
found this problem on Wangxun NICs. I'm wondering if it should also be fixed in txgbe/ngbe.

Thanks.
  
Ruifeng Wang June 13, 2023, 9:25 a.m. UTC | #8
> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Monday, June 12, 2023 8:45 PM
> To: Ruifeng Wang <Ruifeng.Wang@arm.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; zhoumin
> <zhoumin@loongson.cn>
> Cc: dev@dpdk.org; mb@smartsharesystems.com; konstantin.v.ananyev@yandex.ru; Yang, Qiming
> <qiming.yang@intel.com>; Wu, Wenjun1 <wenjun1.wu@intel.com>; drc@linux.vnet.ibm.com;
> roretzla@linux.microsoft.com; stable@dpdk.org; maobibo@loongson.cn; nd <nd@arm.com>;
> david.marchand@redhat.com; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Tyler
> Retzlaff <roretzla@microsoft.com>; konstantin.ananyev@huawei.com
> Subject: Re: [PATCH v3] net/ixgbe: add proper memory barriers for some Rx functions
> 
> 12/06/2023 13:58, zhoumin:
> > On Mon, June 12, 2023 at 6:26PM, Thomas Monjalon wrote:
> > > 15/05/2023 04:10, Zhang, Qi Z:
> > >> From: Ruifeng Wang <Ruifeng.Wang@arm.com>
> > >>> From: Min Zhou <zhoumin@loongson.cn>
> > >>>> ---
> > >>>> v3:
> > >>>> - Use rte_smp_rmb() as the proper memory barrier instead of
> > >>>> rte_rmb()
> > >>>> ---
> > >>>> v2:
> > >>>> - Make the calling of rte_rmb() for all platforms
> > >>>> ---
> > > [...]
> > >>> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > >> Applied to dpdk-next-net-intel.
> > >>
> > >> Thanks
> > >> Qi
> > >>
> > > Why ignoring checkpatch?
> > > It is saying:
> > > "
> > > Warning in drivers/net/ixgbe/ixgbe_rxtx.c:
> > > Using rte_smp_[r/w]mb
> > > "
> >
> >
> > I'm sorry. Should we never use rte_smp_[r/w]mb in the driver's code?
> 
> No we should avoid.
> It has been decided to slowly replace such barriers.
> By the way, I think it is not enough documented.
> You can find an explanation in doc/guides/rel_notes/deprecation.rst
> 
> I think we should also add some notes to lib/eal/include/generic/rte_atomic.h
> Tyler, Honnappa, Ruifeng, Konstantin, what do you think?
> 

Agree that we should add notes to rte_atomic.h.
The notes were not there for the sake of avoiding warnings on existing occurrences. 
With Tyler's rte_atomic series merged, rte_atomicNN_xx can be marked as __rte_deprecated.
rte_smp_*mb can be marked as __rte_deprecated after existing occurrences are converted.
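
For illustration, such a marking could look roughly like this in
lib/eal/include/generic/rte_atomic.h (a hypothetical sketch, not an actual
upstream change):

    /**
     * General memory barrier between lcores.
     *
     * @deprecated
     *   Use rte_atomic_thread_fence() with an explicit memory order
     *   instead.
     */
    __rte_deprecated
    static inline void rte_smp_mb(void);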
  
zhoumin June 13, 2023, 10:12 a.m. UTC | #9
Hi Jiawen,

On Tues, June 13, 2023 at 11:30PM, Jiawen Wu wrote:
> On Tuesday, June 13, 2023 9:43 AM, zhoumin wrote:
>> On Mon, June 12, 2023 at 8:44PM, Thomas Monjalon wrote:
>>> 12/06/2023 13:58, zhoumin:
>>>> On Mon, June 12, 2023 at 6:26PM, Thomas Monjalon wrote:
>>>>> 15/05/2023 04:10, Zhang, Qi Z:
>>>>>> From: Ruifeng Wang <Ruifeng.Wang@arm.com>
>>>>>>> From: Min Zhou <zhoumin@loongson.cn>
>>>>>>>> ---
>>>>>>>> v3:
>>>>>>>> - Use rte_smp_rmb() as the proper memory barrier instead of rte_rmb()
>>>>>>>> ---
>>>>>>>> v2:
>>>>>>>> - Make the calling of rte_rmb() for all platforms
>>>>>>>> ---
> Hi zhoumin,
>
> I recently learned that Loongson is doing tests with Wangxun NICs on 3C5000, and also
> found this problem on Wangxun NICs. I'm wondering if it would also be fixed on txgbe/ngbe.
I'm sorry. I have not tested the Wangxun NICs in LRO receiving mode. The
previous test results for Wangxun NICs were normal. I will do additional
tests on Wangxun NICs to verify this problem.
> Thanks.
  
Thomas Monjalon June 20, 2023, 3:52 p.m. UTC | #10
13/06/2023 11:25, Ruifeng Wang:
> From: Thomas Monjalon <thomas@monjalon.net>
> > 12/06/2023 13:58, zhoumin:
> > > On Mon, June 12, 2023 at 6:26PM, Thomas Monjalon wrote:
> > > > 15/05/2023 04:10, Zhang, Qi Z:
> > > >> From: Ruifeng Wang <Ruifeng.Wang@arm.com>
> > > >>> From: Min Zhou <zhoumin@loongson.cn>
> > > >>>> ---
> > > >>>> v3:
> > > >>>> - Use rte_smp_rmb() as the proper memory barrier instead of
> > > >>>> rte_rmb()
> > > >>>> ---
> > > >>>> v2:
> > > >>>> - Make the calling of rte_rmb() for all platforms
> > > >>>> ---
> > > > [...]
> > > >>> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > >> Applied to dpdk-next-net-intel.
> > > >>
> > > >> Thanks
> > > >> Qi
> > > >>
> > > > Why ignoring checkpatch?
> > > > It is saying:
> > > > "
> > > > Warning in drivers/net/ixgbe/ixgbe_rxtx.c:
> > > > Using rte_smp_[r/w]mb
> > > > "
> > >
> > >
> > > I'm sorry. Should we never use rte_smp_[r/w]mb in the driver's code?
> > 
> > No we should avoid.
> > It has been decided to slowly replace such barriers.
> > By the way, I think it is not enough documented.
> > You can find an explanation in doc/guides/rel_notes/deprecation.rst
> > 
> > I think we should also add some notes to lib/eal/include/generic/rte_atomic.h
> > Tyler, Honnappa, Ruifeng, Konstantin, what do you think?
> > 
> 
> Agree that we should add notes to rte_atomic.h.
> The notes were not there for the sake of avoiding warnings on existing occurrences. 
> With Tyler's rte_atomic series merged, rte_atomicNN_xx can be marked as __rte_deprecated.
> rte_smp_*mb can be marked as __rte_deprecated after existing occurrences are converted.

Would you like to add some function comments to explain why it is deprecated?
  
Ruifeng Wang June 21, 2023, 6:50 a.m. UTC | #11
> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Tuesday, June 20, 2023 11:53 PM
> To: Zhang, Qi Z <qi.z.zhang@intel.com>; zhoumin <zhoumin@loongson.cn>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>
> Cc: dev@dpdk.org; mb@smartsharesystems.com; konstantin.v.ananyev@yandex.ru; Yang, Qiming
> <qiming.yang@intel.com>; Wu, Wenjun1 <wenjun1.wu@intel.com>; drc@linux.vnet.ibm.com;
> roretzla@linux.microsoft.com; stable@dpdk.org; maobibo@loongson.cn; nd <nd@arm.com>;
> david.marchand@redhat.com; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Tyler
> Retzlaff <roretzla@microsoft.com>; konstantin.ananyev@huawei.com; nd <nd@arm.com>
> Subject: Re: [PATCH v3] net/ixgbe: add proper memory barriers for some Rx functions
> 
> 13/06/2023 11:25, Ruifeng Wang:
> > From: Thomas Monjalon <thomas@monjalon.net>
> > > 12/06/2023 13:58, zhoumin:
> > > > On Mon, June 12, 2023 at 6:26PM, Thomas Monjalon wrote:
> > > > > 15/05/2023 04:10, Zhang, Qi Z:
> > > > >> From: Ruifeng Wang <Ruifeng.Wang@arm.com>
> > > > >>> From: Min Zhou <zhoumin@loongson.cn>
> > > > >>>> ---
> > > > >>>> v3:
> > > > >>>> - Use rte_smp_rmb() as the proper memory barrier instead of
> > > > >>>> rte_rmb()
> > > > >>>> ---
> > > > >>>> v2:
> > > > >>>> - Make the calling of rte_rmb() for all platforms
> > > > >>>> ---
> > > > > [...]
> > > > >>> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > >> Applied to dpdk-next-net-intel.
> > > > >>
> > > > >> Thanks
> > > > >> Qi
> > > > >>
> > > > > Why ignoring checkpatch?
> > > > > It is saying:
> > > > > "
> > > > > Warning in drivers/net/ixgbe/ixgbe_rxtx.c:
> > > > > Using rte_smp_[r/w]mb
> > > > > "
> > > >
> > > >
> > > > I'm sorry. Should we never use rte_smp_[r/w]mb in the driver's code?
> > >
> > > No we should avoid.
> > > It has been decided to slowly replace such barriers.
> > > By the way, I think it is not enough documented.
> > > You can find an explanation in doc/guides/rel_notes/deprecation.rst
> > >
> > > I think we should also add some notes to lib/eal/include/generic/rte_atomic.h
> > > Tyler, Honnappa, Ruifeng, Konstantin, what do you think?
> > >
> >
> > Agree that we should add notes to rte_atomic.h.
> > The notes were not there for the sake of avoiding warnings on existing occurrences.
> > With Tyler's rte_atomic series merged, rte_atomicNN_xx can be marked as __rte_deprecated.
> > rte_smp_*mb can be marked as __rte_deprecated after existing occurrences are converted.
> 
> Would you like to add some function comments to explain why it is deprecated?
> 
Sure. Added notes in patch:
http://patches.dpdk.org/project/dpdk/patch/20230621064420.163931-1-ruifeng.wang@arm.com/
  

Patch

diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 6b3d3a4d1a..80bcaef093 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -1823,6 +1823,12 @@  ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		staterr = rxdp->wb.upper.status_error;
 		if (!(staterr & rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD)))
 			break;
+
+		/*
+		 * This barrier is to ensure that status_error which includes DD
+		 * bit is loaded before loading of other descriptor words.
+		 */
+		rte_smp_rmb();
 		rxd = *rxdp;
 
 		/*
@@ -2089,32 +2095,8 @@  ixgbe_recv_pkts_lro(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts,
 
 next_desc:
 		/*
-		 * The code in this whole file uses the volatile pointer to
-		 * ensure the read ordering of the status and the rest of the
-		 * descriptor fields (on the compiler level only!!!). This is so
-		 * UGLY - why not to just use the compiler barrier instead? DPDK
-		 * even has the rte_compiler_barrier() for that.
-		 *
-		 * But most importantly this is just wrong because this doesn't
-		 * ensure memory ordering in a general case at all. For
-		 * instance, DPDK is supposed to work on Power CPUs where
-		 * compiler barrier may just not be enough!
-		 *
-		 * I tried to write only this function properly to have a
-		 * starting point (as a part of an LRO/RSC series) but the
-		 * compiler cursed at me when I tried to cast away the
-		 * "volatile" from rx_ring (yes, it's volatile too!!!). So, I'm
-		 * keeping it the way it is for now.
-		 *
-		 * The code in this file is broken in so many other places and
-		 * will just not work on a big endian CPU anyway therefore the
-		 * lines below will have to be revisited together with the rest
-		 * of the ixgbe PMD.
-		 *
-		 * TODO:
-		 *    - Get rid of "volatile" and let the compiler do its job.
-		 *    - Use the proper memory barrier (rte_rmb()) to ensure the
-		 *      memory ordering below.
+		 * It is necessary to use a proper memory barrier to ensure the
+		 * memory ordering below.
 		 */
 		rxdp = &rx_ring[rx_id];
 		staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error);
@@ -2122,6 +2104,11 @@  ixgbe_recv_pkts_lro(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts,
 		if (!(staterr & IXGBE_RXDADV_STAT_DD))
 			break;
 
+		/*
+		 * This barrier is to ensure that status_error which includes DD
+		 * bit is loaded before loading of other descriptor words.
+		 */
+		rte_smp_rmb();
 		rxd = *rxdp;
 
 		PMD_RX_LOG(DEBUG, "port_id=%u queue_id=%u rx_id=%u "