rte_memcpy: fix off by one for size 16 and 32
Checks
Commit Message
The rte_memcpy code would do extra instructions for size 16
and 32 which potentially could reference past end of data.
For size of 16, only single mov16 is needed.
same for size of 32, only single mov32.
Fixes: f5472703c0bd ("eal: optimize aligned memcpy on x86")
Fixes: d35cc1fe6a7a ("eal/x86: revert select optimized memcpy at run-time")
Suggested-by: Morten Brørup <mb@smartsharesystems.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
lib/eal/x86/include/rte_memcpy.h | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
Comments
On Sat, 2 Mar 2024 12:49:23 -0800
Stephen Hemminger <stephen@networkplumber.org> wrote:
> The rte_memcpy code would do extra instructions for size 16
> and 32 which potentially could reference past end of data.
>
> For size of 16, only single mov16 is needed.
> same for size of 32, only single mov32.
>
> Fixes: f5472703c0bd ("eal: optimize aligned memcpy on x86")
> Fixes: d35cc1fe6a7a ("eal/x86: revert select optimized memcpy at run-time")
>
> Suggested-by: Morten Brørup <mb@smartsharesystems.com>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Self-NAK, more is needed here.
The code has lots of pre-existing bugs where it will reference past the end
of the data in some cases.
I'm also working on a fix.
Med venlig hilsen / Kind regards,
-Morten Brørup
> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Saturday, 2 March 2024 21.57
> To: dev@dpdk.org
> Cc: Morten Brørup; Bruce Richardson; Konstantin Ananyev; Zhihong Wang;
> Yuanhan Liu; Xiaoyun Li
> Subject: Re: [PATCH] rte_memcpy: fix off by one for size 16 and 32
>
> On Sat, 2 Mar 2024 12:49:23 -0800
> Stephen Hemminger <stephen@networkplumber.org> wrote:
>
> > The rte_memcpy code would do extra instructions for size 16
> > and 32 which potentially could reference past end of data.
> >
> > For size of 16, only single mov16 is needed.
> > same for size of 32, only single mov32.
> >
> > Fixes: f5472703c0bd ("eal: optimize aligned memcpy on x86")
> > Fixes: d35cc1fe6a7a ("eal/x86: revert select optimized memcpy at run-
> time")
> >
> > Suggested-by: Morten Brørup <mb@smartsharesystems.com>
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
>
> Self-NAK, more is needed here.
>
> The code has lots of pre-existing bugs where it will reference past the
> end
> of the data in some cases.
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Saturday, 2 March 2024 21.49
>
> The rte_memcpy code would do extra instructions for size 16
> and 32 which potentially could reference past end of data.
It's a somewhat weird concept, but they don't reference past end of data.
They reference data in chunks of 16.
E.g. when copying 17 bytes: first copy [0..15] and then copy [1..16] (as "-16 + n" in the code).
By referencing an address "-16" in a block of 16 bytes, they are not referencing past end of data.
>
> For size of 16, only single mov16 is needed.
> same for size of 32, only single mov32.
I fixed the duplicate copies with my patch [1]. Please review.
[1]: https://inbox.dpdk.org/dev/20240302234812.9137-1-mb@smartsharesystems.com/T/#u
On 2024-03-02 21:56, Stephen Hemminger wrote:
> On Sat, 2 Mar 2024 12:49:23 -0800
> Stephen Hemminger <stephen@networkplumber.org> wrote:
>
>> The rte_memcpy code would do extra instructions for size 16
>> and 32 which potentially could reference past end of data.
>>
>> For size of 16, only single mov16 is needed.
>> same for size of 32, only single mov32.
>>
>> Fixes: f5472703c0bd ("eal: optimize aligned memcpy on x86")
>> Fixes: d35cc1fe6a7a ("eal/x86: revert select optimized memcpy at run-time")
>>
>> Suggested-by: Morten Brørup <mb@smartsharesystems.com>
>> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
>
> Self-NAK, more is needed here.
>
> The code has lots of pre-existing bugs where it will reference past the end
> of the data in some cases.
Memory beyond the buffer is not accessed in this case. The rte_mov16()
copies just overlap.
A colleague pointed out the same "bug" to me a couple of years ago. We
didn't realize what code would be generated in the n == 16 case though.
That seems very much worth fixing.
Maybe it's worth adding a comment regarding the overlap.
On 2024-03-02 21:49, Stephen Hemminger wrote:
> The rte_memcpy code would do extra instructions for size 16
> and 32 which potentially could reference past end of data.
>
> For size of 16, only single mov16 is needed.
> same for size of 32, only single mov32.
>
> Fixes: f5472703c0bd ("eal: optimize aligned memcpy on x86")
> Fixes: d35cc1fe6a7a ("eal/x86: revert select optimized memcpy at run-time")
>
> Suggested-by: Morten Brørup <mb@smartsharesystems.com>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
> lib/eal/x86/include/rte_memcpy.h | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/lib/eal/x86/include/rte_memcpy.h b/lib/eal/x86/include/rte_memcpy.h
> index 72a92290e05d..00479a4bfbe6 100644
> --- a/lib/eal/x86/include/rte_memcpy.h
> +++ b/lib/eal/x86/include/rte_memcpy.h
> @@ -233,13 +233,15 @@ rte_memcpy_generic(void *dst, const void *src, size_t n)
> */
> if (n <= 32) {
> rte_mov16((uint8_t *)dst, (const uint8_t *)src);
> - rte_mov16((uint8_t *)dst - 16 + n,
> + if (n > 16)
> + rte_mov16((uint8_t *)dst - 16 + n,
> (const uint8_t *)src - 16 + n);
> return ret;
> }
> if (n <= 64) {
> rte_mov32((uint8_t *)dst, (const uint8_t *)src);
> - rte_mov32((uint8_t *)dst - 32 + n,
> + if (n > 32)
n is always > 32 here.
> + rte_mov32((uint8_t *)dst - 32 + n,
> (const uint8_t *)src - 32 + n);
> return ret;
> }
@@ -233,13 +233,15 @@ rte_memcpy_generic(void *dst, const void *src, size_t n)
*/
if (n <= 32) {
rte_mov16((uint8_t *)dst, (const uint8_t *)src);
- rte_mov16((uint8_t *)dst - 16 + n,
+ if (n > 16)
+ rte_mov16((uint8_t *)dst - 16 + n,
(const uint8_t *)src - 16 + n);
return ret;
}
if (n <= 64) {
rte_mov32((uint8_t *)dst, (const uint8_t *)src);
- rte_mov32((uint8_t *)dst - 32 + n,
+ if (n > 32)
+ rte_mov32((uint8_t *)dst - 32 + n,
(const uint8_t *)src - 32 + n);
return ret;
}