net/mlx5: fix instruction hotspot on replenishing Rx buffer
Checks
Commit Message
On replenishing Rx buffers for vectorized Rx, mbuf->buf_addr isn't needed
to be accessed as it is static and easily calculated from the mbuf address.
Accessing the mbuf content causes unnecessary load stall and it is worsened
on ARM.
Fixes: 545b884b1da3 ("net/mlx5: fix buffer address posting in SSE Rx")
Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
---
drivers/net/mlx5/mlx5_rxtx_vec.h | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
Comments
On Wed, Jan 9, 2019 at 9:54 AM Yongseok Koh <yskoh@mellanox.com> wrote:
> On replenishing Rx buffers for vectorized Rx, mbuf->buf_addr isn't needed
> to be accessed as it is static and easily calculated from the mbuf address.
> Accessing the mbuf content causes unnecessary load stall and it is worsened
> on ARM.
>
> Fixes: 545b884b1da3 ("net/mlx5: fix buffer address posting in SSE Rx")
> Cc: stable@dpdk.org
>
> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
> ---
> drivers/net/mlx5/mlx5_rxtx_vec.h | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/mlx5/mlx5_rxtx_vec.h
> b/drivers/net/mlx5/mlx5_rxtx_vec.h
> index fda7004e2d..ced5547307 100644
> --- a/drivers/net/mlx5/mlx5_rxtx_vec.h
> +++ b/drivers/net/mlx5/mlx5_rxtx_vec.h
> @@ -102,8 +102,12 @@ mlx5_rx_replenish_bulk_mbuf(struct mlx5_rxq_data
> *rxq, uint16_t n)
> return;
> }
> for (i = 0; i < n; ++i) {
> - wq[i].addr = rte_cpu_to_be_64((uintptr_t)elts[i]->buf_addr
> +
> - RTE_PKTMBUF_HEADROOM);
> + uintptr_t buf_addr =
> + (uintptr_t)elts[i] + sizeof(struct rte_mbuf) +
> + rte_pktmbuf_priv_size(rxq->mp) +
> RTE_PKTMBUF_HEADROOM;
> +
> + assert(buf_addr == (uintptr_t)elts[i]->buf_addr);
> + wq[i].addr = rte_cpu_to_be_64(buf_addr);
> /* If there's only one MR, no need to replace LKey in WQE.
> */
> if (unlikely(mlx5_mr_btree_len(&rxq->mr_ctrl.cache_bh) >
> 1))
> wq[i].lkey = mlx5_rx_mb2mr(rxq, elts[i]);
> --
> 2.11.0
>
>
How about having a macro / inline in the mbuf api to get this information
in a consistent/unique way ?
I can see we have this calculation at least in rte_pktmbuf_init() and
rte_pktmbuf_detach().
On Wed, Jan 09, 2019 at 10:38:07AM +0100, David Marchand wrote:
> On Wed, Jan 9, 2019 at 9:54 AM Yongseok Koh <yskoh@mellanox.com> wrote:
>
> > On replenishing Rx buffers for vectorized Rx, mbuf->buf_addr isn't needed
> > to be accessed as it is static and easily calculated from the mbuf address.
> > Accessing the mbuf content causes unnecessary load stall and it is worsened
> > on ARM.
> >
> > Fixes: 545b884b1da3 ("net/mlx5: fix buffer address posting in SSE Rx")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
> > ---
> > drivers/net/mlx5/mlx5_rxtx_vec.h | 8 ++++++--
> > 1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/mlx5/mlx5_rxtx_vec.h
> > b/drivers/net/mlx5/mlx5_rxtx_vec.h
> > index fda7004e2d..ced5547307 100644
> > --- a/drivers/net/mlx5/mlx5_rxtx_vec.h
> > +++ b/drivers/net/mlx5/mlx5_rxtx_vec.h
> > @@ -102,8 +102,12 @@ mlx5_rx_replenish_bulk_mbuf(struct mlx5_rxq_data
> > *rxq, uint16_t n)
> > return;
> > }
> > for (i = 0; i < n; ++i) {
> > - wq[i].addr = rte_cpu_to_be_64((uintptr_t)elts[i]->buf_addr
> > +
> > - RTE_PKTMBUF_HEADROOM);
> > + uintptr_t buf_addr =
> > + (uintptr_t)elts[i] + sizeof(struct rte_mbuf) +
> > + rte_pktmbuf_priv_size(rxq->mp) +
> > RTE_PKTMBUF_HEADROOM;
> > +
> > + assert(buf_addr == (uintptr_t)elts[i]->buf_addr);
> > + wq[i].addr = rte_cpu_to_be_64(buf_addr);
> > /* If there's only one MR, no need to replace LKey in WQE.
> > */
> > if (unlikely(mlx5_mr_btree_len(&rxq->mr_ctrl.cache_bh) >
> > 1))
> > wq[i].lkey = mlx5_rx_mb2mr(rxq, elts[i]);
> > --
> > 2.11.0
> >
> >
> How about having a macro / inline in the mbuf api to get this information
> in a consistent/unique way ?
> I can see we have this calculation at least in rte_pktmbuf_init() and
> rte_pktmbuf_detach().
Agree. Maybe rte_mbuf_default_buf_addr(m) ?
Side note, is the assert() correct in the patch? I'd say there's a
difference of RTE_PKTMBUF_HEADROOM between the 2 values.
Olivier
> On Jan 9, 2019, at 1:52 AM, Olivier Matz <olivier.matz@6wind.com> wrote:
>
> On Wed, Jan 09, 2019 at 10:38:07AM +0100, David Marchand wrote:
>> On Wed, Jan 9, 2019 at 9:54 AM Yongseok Koh <yskoh@mellanox.com> wrote:
>>
>>> On replenishing Rx buffers for vectorized Rx, mbuf->buf_addr isn't needed
>>> to be accessed as it is static and easily calculated from the mbuf address.
>>> Accessing the mbuf content causes unnecessary load stall and it is worsened
>>> on ARM.
>>>
>>> Fixes: 545b884b1da3 ("net/mlx5: fix buffer address posting in SSE Rx")
>>> Cc: stable@dpdk.org
>>>
>>> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
>>> ---
>>> drivers/net/mlx5/mlx5_rxtx_vec.h | 8 ++++++--
>>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/net/mlx5/mlx5_rxtx_vec.h
>>> b/drivers/net/mlx5/mlx5_rxtx_vec.h
>>> index fda7004e2d..ced5547307 100644
>>> --- a/drivers/net/mlx5/mlx5_rxtx_vec.h
>>> +++ b/drivers/net/mlx5/mlx5_rxtx_vec.h
>>> @@ -102,8 +102,12 @@ mlx5_rx_replenish_bulk_mbuf(struct mlx5_rxq_data
>>> *rxq, uint16_t n)
>>> return;
>>> }
>>> for (i = 0; i < n; ++i) {
>>> - wq[i].addr = rte_cpu_to_be_64((uintptr_t)elts[i]->buf_addr
>>> +
>>> - RTE_PKTMBUF_HEADROOM);
>>> + uintptr_t buf_addr =
>>> + (uintptr_t)elts[i] + sizeof(struct rte_mbuf) +
>>> + rte_pktmbuf_priv_size(rxq->mp) +
>>> RTE_PKTMBUF_HEADROOM;
>>> +
>>> + assert(buf_addr == (uintptr_t)elts[i]->buf_addr);
>>> + wq[i].addr = rte_cpu_to_be_64(buf_addr);
>>> /* If there's only one MR, no need to replace LKey in WQE.
>>> */
>>> if (unlikely(mlx5_mr_btree_len(&rxq->mr_ctrl.cache_bh) >
>>> 1))
>>> wq[i].lkey = mlx5_rx_mb2mr(rxq, elts[i]);
>>> --
>>> 2.11.0
>>>
>>>
>> How about having a macro / inline in the mbuf api to get this information
>> in a consistent/unique way ?
>> I can see we have this calculation at least in rte_pktmbuf_init() and
>> rte_pktmbuf_detach().
>
> Agree. Maybe rte_mbuf_default_buf_addr(m) ?
I'm also okay to add. Will come up with a new patch.
> Side note, is the assert() correct in the patch? I'd say there's a
> difference of RTE_PKTMBUF_HEADROOM between the 2 values.
Oops, my fault. Thanks for the catch, you saved a crash. :-)
Thanks,
Yongseok
On Wed, Jan 9, 2019 at 10:56 AM Yongseok Koh <yskoh@mellanox.com> wrote:
>
> > On Jan 9, 2019, at 1:52 AM, Olivier Matz <olivier.matz@6wind.com> wrote:
> >
> > On Wed, Jan 09, 2019 at 10:38:07AM +0100, David Marchand wrote:
> >> On Wed, Jan 9, 2019 at 9:54 AM Yongseok Koh <yskoh@mellanox.com> wrote:
> >>
> >>> On replenishing Rx buffers for vectorized Rx, mbuf->buf_addr isn't
> needed
> >>> to be accessed as it is static and easily calculated from the mbuf
> address.
> >>> Accessing the mbuf content causes unnecessary load stall and it is
> worsened
> >>> on ARM.
> >>>
> >>> Fixes: 545b884b1da3 ("net/mlx5: fix buffer address posting in SSE Rx")
> >>> Cc: stable@dpdk.org
> >>>
> >>> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
> >>> ---
> >>> drivers/net/mlx5/mlx5_rxtx_vec.h | 8 ++++++--
> >>> 1 file changed, 6 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/net/mlx5/mlx5_rxtx_vec.h
> >>> b/drivers/net/mlx5/mlx5_rxtx_vec.h
> >>> index fda7004e2d..ced5547307 100644
> >>> --- a/drivers/net/mlx5/mlx5_rxtx_vec.h
> >>> +++ b/drivers/net/mlx5/mlx5_rxtx_vec.h
> >>> @@ -102,8 +102,12 @@ mlx5_rx_replenish_bulk_mbuf(struct mlx5_rxq_data
> >>> *rxq, uint16_t n)
> >>> return;
> >>> }
> >>> for (i = 0; i < n; ++i) {
> >>> - wq[i].addr =
> rte_cpu_to_be_64((uintptr_t)elts[i]->buf_addr
> >>> +
> >>> - RTE_PKTMBUF_HEADROOM);
> >>> + uintptr_t buf_addr =
> >>> + (uintptr_t)elts[i] + sizeof(struct rte_mbuf) +
> >>> + rte_pktmbuf_priv_size(rxq->mp) +
> >>> RTE_PKTMBUF_HEADROOM;
> >>> +
> >>> + assert(buf_addr == (uintptr_t)elts[i]->buf_addr);
> >>> + wq[i].addr = rte_cpu_to_be_64(buf_addr);
> >>> /* If there's only one MR, no need to replace LKey in
> WQE.
> >>> */
> >>> if (unlikely(mlx5_mr_btree_len(&rxq->mr_ctrl.cache_bh) >
> >>> 1))
> >>> wq[i].lkey = mlx5_rx_mb2mr(rxq, elts[i]);
> >>> --
> >>> 2.11.0
> >>>
> >>>
> >> How about having a macro / inline in the mbuf api to get this
> information
> >> in a consistent/unique way ?
> >> I can see we have this calculation at least in rte_pktmbuf_init() and
> >> rte_pktmbuf_detach().
> >
> > Agree. Maybe rte_mbuf_default_buf_addr(m) ?
>
> I'm also okay to add. Will come up with a new patch.
>
> > Side note, is the assert() correct in the patch? I'd say there's a
> > difference of RTE_PKTMBUF_HEADROOM between the 2 values.
>
> Oops, my fault. Thanks for the catch, you saved a crash. :-)
>
Is this assert really necessary if we have a common macro ?
I was under the impression that this assert is there to catch misalignement
between the mbuf api and the driver.
> On Jan 9, 2019, at 2:05 AM, David Marchand <david.marchand@redhat.com> wrote:
>
>
>
> On Wed, Jan 9, 2019 at 10:56 AM Yongseok Koh <yskoh@mellanox.com> wrote:
>
> > On Jan 9, 2019, at 1:52 AM, Olivier Matz <olivier.matz@6wind.com> wrote:
> >
> > On Wed, Jan 09, 2019 at 10:38:07AM +0100, David Marchand wrote:
> >> On Wed, Jan 9, 2019 at 9:54 AM Yongseok Koh <yskoh@mellanox.com> wrote:
> >>
> >>> On replenishing Rx buffers for vectorized Rx, mbuf->buf_addr isn't needed
> >>> to be accessed as it is static and easily calculated from the mbuf address.
> >>> Accessing the mbuf content causes unnecessary load stall and it is worsened
> >>> on ARM.
> >>>
> >>> Fixes: 545b884b1da3 ("net/mlx5: fix buffer address posting in SSE Rx")
> >>> Cc: stable@dpdk.org
> >>>
> >>> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
> >>> ---
> >>> drivers/net/mlx5/mlx5_rxtx_vec.h | 8 ++++++--
> >>> 1 file changed, 6 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/net/mlx5/mlx5_rxtx_vec.h
> >>> b/drivers/net/mlx5/mlx5_rxtx_vec.h
> >>> index fda7004e2d..ced5547307 100644
> >>> --- a/drivers/net/mlx5/mlx5_rxtx_vec.h
> >>> +++ b/drivers/net/mlx5/mlx5_rxtx_vec.h
> >>> @@ -102,8 +102,12 @@ mlx5_rx_replenish_bulk_mbuf(struct mlx5_rxq_data
> >>> *rxq, uint16_t n)
> >>> return;
> >>> }
> >>> for (i = 0; i < n; ++i) {
> >>> - wq[i].addr = rte_cpu_to_be_64((uintptr_t)elts[i]->buf_addr
> >>> +
> >>> - RTE_PKTMBUF_HEADROOM);
> >>> + uintptr_t buf_addr =
> >>> + (uintptr_t)elts[i] + sizeof(struct rte_mbuf) +
> >>> + rte_pktmbuf_priv_size(rxq->mp) +
> >>> RTE_PKTMBUF_HEADROOM;
> >>> +
> >>> + assert(buf_addr == (uintptr_t)elts[i]->buf_addr);
> >>> + wq[i].addr = rte_cpu_to_be_64(buf_addr);
> >>> /* If there's only one MR, no need to replace LKey in WQE.
> >>> */
> >>> if (unlikely(mlx5_mr_btree_len(&rxq->mr_ctrl.cache_bh) >
> >>> 1))
> >>> wq[i].lkey = mlx5_rx_mb2mr(rxq, elts[i]);
> >>> --
> >>> 2.11.0
> >>>
> >>>
> >> How about having a macro / inline in the mbuf api to get this information
> >> in a consistent/unique way ?
> >> I can see we have this calculation at least in rte_pktmbuf_init() and
> >> rte_pktmbuf_detach().
> >
> > Agree. Maybe rte_mbuf_default_buf_addr(m) ?
>
> I'm also okay to add. Will come up with a new patch.
>
> > Side note, is the assert() correct in the patch? I'd say there's a
> > difference of RTE_PKTMBUF_HEADROOM between the 2 values.
>
> Oops, my fault. Thanks for the catch, you saved a crash. :-)
>
> Is this assert really necessary if we have a common macro ?
> I was under the impression that this assert is there to catch misalignement between the mbuf api and the driver.
It is still good to have. This can catch corruption of mbuf content which sometimes
happens due to wrong mbuf handling in PMD or potential HW memory corruption.
Thanks,
Yongseok
@@ -102,8 +102,12 @@ mlx5_rx_replenish_bulk_mbuf(struct mlx5_rxq_data *rxq, uint16_t n)
return;
}
for (i = 0; i < n; ++i) {
- wq[i].addr = rte_cpu_to_be_64((uintptr_t)elts[i]->buf_addr +
- RTE_PKTMBUF_HEADROOM);
+ uintptr_t buf_addr =
+ (uintptr_t)elts[i] + sizeof(struct rte_mbuf) +
+ rte_pktmbuf_priv_size(rxq->mp) + RTE_PKTMBUF_HEADROOM;
+
+ assert(buf_addr == (uintptr_t)elts[i]->buf_addr);
+ wq[i].addr = rte_cpu_to_be_64(buf_addr);
/* If there's only one MR, no need to replace LKey in WQE. */
if (unlikely(mlx5_mr_btree_len(&rxq->mr_ctrl.cache_bh) > 1))
wq[i].lkey = mlx5_rx_mb2mr(rxq, elts[i]);