[v1,1/2] ethdev: fix null pointer dereference

Message ID 20230223123029.2117781-2-adwivedi@marvell.com (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit
Headers
Series bug fix in ethdev trace |

Checks

Context Check Description
ci/checkpatch success coding style OK

Commit Message

Ankur Dwivedi Feb. 23, 2023, 12:30 p.m. UTC
  The speed_fec_capa pointer can be null. So dereferencing the pointer is
removed and only the pointer is captured in trace function.
Fixed few more trace functions in which null pointer can be dereferenced.

Coverity issue: 383238
Bugzilla ID: 1162
Fixes: 6679cf21d608 ("ethdev: add trace points")
Fixes: ed04fd4072e9 ("ethdev: add trace points for flow")

Signed-off-by: Ankur Dwivedi <adwivedi@marvell.com>
---
 lib/ethdev/ethdev_trace.h | 29 +++++++----------------------
 1 file changed, 7 insertions(+), 22 deletions(-)
  

Comments

Ferruh Yigit Feb. 28, 2023, 11:04 a.m. UTC | #1
On 2/23/2023 12:30 PM, Ankur Dwivedi wrote:
> The speed_fec_capa pointer can be null. So dereferencing the pointer is
> removed and only the pointer is captured in trace function.
> Fixed few more trace functions in which null pointer can be dereferenced.
> 
> Coverity issue: 383238
> Bugzilla ID: 1162
> Fixes: 6679cf21d608 ("ethdev: add trace points")
> Fixes: ed04fd4072e9 ("ethdev: add trace points for flow")
> 
> Signed-off-by: Ankur Dwivedi <adwivedi@marvell.com>

Hi Ankur,

There is another bug report: https://bugs.dpdk.org/show_bug.cgi?id=1167


As far as I can see that is caused by '__rte_trace_point_register()' is
calling 'register_fn()' [1].

At registering trace point stage, most of the pointers can be invalid,
and this can crash other locations too.

Why 'register_fn()' called withing the trace point register? Can we
remove it?





[1]
#define RTE_TRACE_POINT_REGISTER(trace, name)
	RTE_INIT(trace##_init)
		__rte_trace_point_register(..., (void (*)(void)) trace);

__rte_trace_point_register(handle, name, void (*register_fn)(void)) {
	...
	register_fn();
	...
}
  
David Marchand Feb. 28, 2023, 11:29 a.m. UTC | #2
On Tue, Feb 28, 2023 at 12:05 PM Ferruh Yigit <ferruh.yigit@amd.com> wrote:
>
> On 2/23/2023 12:30 PM, Ankur Dwivedi wrote:
> > The speed_fec_capa pointer can be null. So dereferencing the pointer is
> > removed and only the pointer is captured in trace function.
> > Fixed few more trace functions in which null pointer can be dereferenced.
> >
> > Coverity issue: 383238
> > Bugzilla ID: 1162
> > Fixes: 6679cf21d608 ("ethdev: add trace points")
> > Fixes: ed04fd4072e9 ("ethdev: add trace points for flow")
> >
> > Signed-off-by: Ankur Dwivedi <adwivedi@marvell.com>
>
> Hi Ankur,
>
> There is another bug report: https://bugs.dpdk.org/show_bug.cgi?id=1167
>
>
> As far as I can see that is caused by '__rte_trace_point_register()' is
> calling 'register_fn()' [1].
>
> At registering trace point stage, most of the pointers can be invalid,
> and this can crash other locations too.

I remember hitting this issue when running with UBsan.

>
> Why 'register_fn()' called withing the trace point register? Can we
> remove it?

IIRC, this is used to evaluate the size of the trace point event.
  
Ferruh Yigit Feb. 28, 2023, 12:46 p.m. UTC | #3
On 2/28/2023 11:29 AM, David Marchand wrote:
> On Tue, Feb 28, 2023 at 12:05 PM Ferruh Yigit <ferruh.yigit@amd.com> wrote:
>>
>> On 2/23/2023 12:30 PM, Ankur Dwivedi wrote:
>>> The speed_fec_capa pointer can be null. So dereferencing the pointer is
>>> removed and only the pointer is captured in trace function.
>>> Fixed few more trace functions in which null pointer can be dereferenced.
>>>
>>> Coverity issue: 383238
>>> Bugzilla ID: 1162
>>> Fixes: 6679cf21d608 ("ethdev: add trace points")
>>> Fixes: ed04fd4072e9 ("ethdev: add trace points for flow")
>>>
>>> Signed-off-by: Ankur Dwivedi <adwivedi@marvell.com>
>>
>> Hi Ankur,
>>
>> There is another bug report: https://bugs.dpdk.org/show_bug.cgi?id=1167
>>
>>
>> As far as I can see that is caused by '__rte_trace_point_register()' is
>> calling 'register_fn()' [1].
>>
>> At registering trace point stage, most of the pointers can be invalid,
>> and this can crash other locations too.
> 
> I remember hitting this issue when running with UBsan.
> 
>>
>> Why 'register_fn()' called withing the trace point register? Can we
>> remove it?
> 
> IIRC, this is used to evaluate the size of the trace point event.
> 
> 

Yes, as checked with Jerin, it is used to evaluate size and some sanity
checks fro size.

We need either find a way to calculate size without really reading the
pointer content during register phase, all convert all pointer tracing
to emit_ptr().

I prefer first option if we can.
  
Jerin Jacob Feb. 28, 2023, 1:17 p.m. UTC | #4
On Tue, Feb 28, 2023 at 6:16 PM Ferruh Yigit <ferruh.yigit@amd.com> wrote:
>
> On 2/28/2023 11:29 AM, David Marchand wrote:
> > On Tue, Feb 28, 2023 at 12:05 PM Ferruh Yigit <ferruh.yigit@amd.com> wrote:
> >>
> >> On 2/23/2023 12:30 PM, Ankur Dwivedi wrote:
> >>> The speed_fec_capa pointer can be null. So dereferencing the pointer is
> >>> removed and only the pointer is captured in trace function.
> >>> Fixed few more trace functions in which null pointer can be dereferenced.
> >>>
> >>> Coverity issue: 383238
> >>> Bugzilla ID: 1162
> >>> Fixes: 6679cf21d608 ("ethdev: add trace points")
> >>> Fixes: ed04fd4072e9 ("ethdev: add trace points for flow")
> >>>
> >>> Signed-off-by: Ankur Dwivedi <adwivedi@marvell.com>
> >>
> >> Hi Ankur,
> >>
> >> There is another bug report: https://bugs.dpdk.org/show_bug.cgi?id=1167
> >>
> >>
> >> As far as I can see that is caused by '__rte_trace_point_register()' is
> >> calling 'register_fn()' [1].
> >>
> >> At registering trace point stage, most of the pointers can be invalid,
> >> and this can crash other locations too.
> >
> > I remember hitting this issue when running with UBsan.
> >
> >>
> >> Why 'register_fn()' called withing the trace point register? Can we
> >> remove it?
> >
> > IIRC, this is used to evaluate the size of the trace point event.
> >
> >
>
> Yes, as checked with Jerin, it is used to evaluate size and some sanity
> checks fro size.
>
> We need either find a way to calculate size without really reading the
> pointer content during register phase, all convert all pointer tracing
> to emit_ptr().
>
> I prefer first option if we can.

Looking at the root of the issue.

rte_trace_point_emit_* has two variant
a)trace point version - Which will emit the trace
b)trace register version - Which will find the size of trace
automatically with single definition of trace point to make life easy
for defining the trace point

In this case, conf value is junk, as we are referencing the value at
registration time.  Looks like in PPC arch, the stack content comes as
junk at this point and got this issue.
And other arch or other environment that adders is OK and since we're
just _reading_ the value. It is not making any issue. But it is a bug.

RTE_TRACE_POINT was designed to just use
rte_trace_point_emit_u16(conf->my_value) so that in registration scope
"conf->my_value" will be omitted by compiler.
But there as issue in using bitfiled[1] as trace is not supporting
bitfield.  Ankur added a local variable to work around the bitfiled
tracing.

So couple of options, I can think of

1)Remove the bitfiled trace point( There are only some trace point
uses that, Just we need to remove bitfield items from that)
2)It is possible to have anonymous union of type like u16, u32 for
tracing the the bitfields[2], but that would need change in public
structure
3) Another option is to have a #define to define the scope of
registration. Something like [3] which is noisy.

I think, we can just do 1 now for rc2 and (2) or (3) or some other
ideas we can think in next release.


[1]
../lib/ethdev/ethdev_trace.h: In function
‘rte_eth_trace_tx_hairpin_queue_setup’:
../lib/eal/include/rte_trace_point.h:378:21: error: cannot take
address of bit-field ‘peer_count’
  378 |         memcpy(mem, &(in), sizeof(in)); \
      |                     ^


[2]
struct abc {
union {
          uint32_t val;
          struct {
                     val_5_bit:5

        }
}
}

[3]

[main]dell[dpdk.org] $ git diff
diff --git a/lib/eal/include/rte_trace_point_register.h
b/lib/eal/include/rte_trace_point_register.h
index a9682d3f22..266350b29c 100644
--- a/lib/eal/include/rte_trace_point_register.h
+++ b/lib/eal/include/rte_trace_point_register.h
@@ -16,6 +16,8 @@ extern "C" {
 #include <rte_per_lcore.h>
 #include <rte_trace_point.h>

+#define RTE_TRACE_SCOPE_IS_REGISTRATION
+
 RTE_DECLARE_PER_LCORE(volatile int, trace_point_sz);

 #define RTE_TRACE_POINT_REGISTER(trace, name) \
diff --git a/lib/ethdev/ethdev_trace.h b/lib/ethdev/ethdev_trace.h
index 53d1a71ff0..ba42c3d10a 100644
--- a/lib/ethdev/ethdev_trace.h
+++ b/lib/ethdev/ethdev_trace.h
@@ -237,13 +237,23 @@ RTE_TRACE_POINT(
        RTE_TRACE_POINT_ARGS(uint16_t port_id, uint16_t rx_queue_id,
                uint16_t nb_rx_desc, const struct rte_eth_hairpin_conf *conf,
                int ret),
-       uint16_t peer_count = conf->peer_count;
-       uint8_t tx_explicit = conf->tx_explicit;
-       uint8_t manual_bind = conf->manual_bind;
-       uint8_t use_locked_device_memory = conf->use_locked_device_memory;
-       uint8_t use_rte_memory = conf->use_rte_memory;
-       uint8_t force_memory = conf->force_memory;

+       uint16_t peer_count;
+       uint8_t tx_explicit;
+       uint8_t manual_bind;
+       uint8_t use_locked_device_memory;
+       uint8_t use_rte_memory;
+       uint8_t force_memory;
+#ifdef RTE_TRACE_SCOPE_IS_REGISTRATION
+       RTE_SET_USED(conf);
+#else
+       peer_count = conf->peer_count;
+       tx_explicit = conf->tx_explicit;
+       manual_bind = conf->manual_bind;
+       use_locked_device_memory = conf->use_locked_device_memory;
+       use_rte_memory = conf->use_rte_memory;
+       force_memory = conf->force_memory;
+#endif
        rte_trace_point_emit_u16(port_id);
        rte_trace_point_emit_u16(rx_queue_id);
        rte_trace_point_emit_u16(nb_rx_desc);
[main]dell[dpdk.org] $
  
Ferruh Yigit Feb. 28, 2023, 1:39 p.m. UTC | #5
On 2/28/2023 1:17 PM, Jerin Jacob wrote:
> On Tue, Feb 28, 2023 at 6:16 PM Ferruh Yigit <ferruh.yigit@amd.com> wrote:
>>
>> On 2/28/2023 11:29 AM, David Marchand wrote:
>>> On Tue, Feb 28, 2023 at 12:05 PM Ferruh Yigit <ferruh.yigit@amd.com> wrote:
>>>>
>>>> On 2/23/2023 12:30 PM, Ankur Dwivedi wrote:
>>>>> The speed_fec_capa pointer can be null. So dereferencing the pointer is
>>>>> removed and only the pointer is captured in trace function.
>>>>> Fixed few more trace functions in which null pointer can be dereferenced.
>>>>>
>>>>> Coverity issue: 383238
>>>>> Bugzilla ID: 1162
>>>>> Fixes: 6679cf21d608 ("ethdev: add trace points")
>>>>> Fixes: ed04fd4072e9 ("ethdev: add trace points for flow")
>>>>>
>>>>> Signed-off-by: Ankur Dwivedi <adwivedi@marvell.com>
>>>>
>>>> Hi Ankur,
>>>>
>>>> There is another bug report: https://bugs.dpdk.org/show_bug.cgi?id=1167
>>>>
>>>>
>>>> As far as I can see that is caused by '__rte_trace_point_register()' is
>>>> calling 'register_fn()' [1].
>>>>
>>>> At registering trace point stage, most of the pointers can be invalid,
>>>> and this can crash other locations too.
>>>
>>> I remember hitting this issue when running with UBsan.
>>>
>>>>
>>>> Why 'register_fn()' called withing the trace point register? Can we
>>>> remove it?
>>>
>>> IIRC, this is used to evaluate the size of the trace point event.
>>>
>>>
>>
>> Yes, as checked with Jerin, it is used to evaluate size and some sanity
>> checks fro size.
>>
>> We need either find a way to calculate size without really reading the
>> pointer content during register phase, all convert all pointer tracing
>> to emit_ptr().
>>
>> I prefer first option if we can.
> 
> Looking at the root of the issue.
> 
> rte_trace_point_emit_* has two variant
> a)trace point version - Which will emit the trace
> b)trace register version - Which will find the size of trace
> automatically with single definition of trace point to make life easy
> for defining the trace point
> 
> In this case, conf value is junk, as we are referencing the value at
> registration time.  Looks like in PPC arch, the stack content comes as
> junk at this point and got this issue.
> And other arch or other environment that adders is OK and since we're
> just _reading_ the value. It is not making any issue. But it is a bug.
> 
> RTE_TRACE_POINT was designed to just use
> rte_trace_point_emit_u16(conf->my_value) so that in registration scope
> "conf->my_value" will be omitted by compiler.
> But there as issue in using bitfiled[1] as trace is not supporting
> bitfield.  Ankur added a local variable to work around the bitfiled
> tracing.
> 
> So couple of options, I can think of
> 
> 1)Remove the bitfiled trace point( There are only some trace point
> uses that, Just we need to remove bitfield items from that)
> 2)It is possible to have anonymous union of type like u16, u32 for
> tracing the the bitfields[2], but that would need change in public
> structure
> 3) Another option is to have a #define to define the scope of
> registration. Something like [3] which is noisy.
> 
> I think, we can just do 1 now for rc2 and (2) or (3) or some other
> ideas we can think in next release.
> 


+1 to go for option 1, specially at this phase of the release.

Only limited number of traces are affected by this bitfield related
issue anyway.


btw, this is for the Bug 1167,
I thought 1167 & 1162 share same root cause but they don't,
so 1162 is still open.

> 
> [1]
> ../lib/ethdev/ethdev_trace.h: In function
> ‘rte_eth_trace_tx_hairpin_queue_setup’:
> ../lib/eal/include/rte_trace_point.h:378:21: error: cannot take
> address of bit-field ‘peer_count’
>   378 |         memcpy(mem, &(in), sizeof(in)); \
>       |                     ^
> 
> 
> [2]
> struct abc {
> union {
>           uint32_t val;
>           struct {
>                      val_5_bit:5
> 
>         }
> }
> }
> 
> [3]
> 
> [main]dell[dpdk.org] $ git diff
> diff --git a/lib/eal/include/rte_trace_point_register.h
> b/lib/eal/include/rte_trace_point_register.h
> index a9682d3f22..266350b29c 100644
> --- a/lib/eal/include/rte_trace_point_register.h
> +++ b/lib/eal/include/rte_trace_point_register.h
> @@ -16,6 +16,8 @@ extern "C" {
>  #include <rte_per_lcore.h>
>  #include <rte_trace_point.h>
> 
> +#define RTE_TRACE_SCOPE_IS_REGISTRATION
> +
>  RTE_DECLARE_PER_LCORE(volatile int, trace_point_sz);
> 
>  #define RTE_TRACE_POINT_REGISTER(trace, name) \
> diff --git a/lib/ethdev/ethdev_trace.h b/lib/ethdev/ethdev_trace.h
> index 53d1a71ff0..ba42c3d10a 100644
> --- a/lib/ethdev/ethdev_trace.h
> +++ b/lib/ethdev/ethdev_trace.h
> @@ -237,13 +237,23 @@ RTE_TRACE_POINT(
>         RTE_TRACE_POINT_ARGS(uint16_t port_id, uint16_t rx_queue_id,
>                 uint16_t nb_rx_desc, const struct rte_eth_hairpin_conf *conf,
>                 int ret),
> -       uint16_t peer_count = conf->peer_count;
> -       uint8_t tx_explicit = conf->tx_explicit;
> -       uint8_t manual_bind = conf->manual_bind;
> -       uint8_t use_locked_device_memory = conf->use_locked_device_memory;
> -       uint8_t use_rte_memory = conf->use_rte_memory;
> -       uint8_t force_memory = conf->force_memory;
> 
> +       uint16_t peer_count;
> +       uint8_t tx_explicit;
> +       uint8_t manual_bind;
> +       uint8_t use_locked_device_memory;
> +       uint8_t use_rte_memory;
> +       uint8_t force_memory;
> +#ifdef RTE_TRACE_SCOPE_IS_REGISTRATION
> +       RTE_SET_USED(conf);
> +#else
> +       peer_count = conf->peer_count;
> +       tx_explicit = conf->tx_explicit;
> +       manual_bind = conf->manual_bind;
> +       use_locked_device_memory = conf->use_locked_device_memory;
> +       use_rte_memory = conf->use_rte_memory;
> +       force_memory = conf->force_memory;
> +#endif
>         rte_trace_point_emit_u16(port_id);
>         rte_trace_point_emit_u16(rx_queue_id);
>         rte_trace_point_emit_u16(nb_rx_desc);
> [main]dell[dpdk.org] $
  
Ferruh Yigit Feb. 28, 2023, 3:01 p.m. UTC | #6
On 2/23/2023 12:30 PM, Ankur Dwivedi wrote:
> The speed_fec_capa pointer can be null. So dereferencing the pointer is
> removed and only the pointer is captured in trace function.
> Fixed few more trace functions in which null pointer can be dereferenced.
> 
> Coverity issue: 383238
> Bugzilla ID: 1162
> Fixes: 6679cf21d608 ("ethdev: add trace points")
> Fixes: ed04fd4072e9 ("ethdev: add trace points for flow")
> 

In below changes, pointers can be NULL at runtime, so agree on to the
change, with minor comments please see below.


But I don't think this is a fix for Bug 1162, which is an ASan reported
error, can you please drop that tag unless it is verified.

<...>

> @@ -2308,8 +2300,7 @@ RTE_TRACE_POINT_FP(
>  		int ret),
>  	rte_trace_point_emit_u16(port_id);
>  	rte_trace_point_emit_ptr(flow);
> -	rte_trace_point_emit_int(action->type);
> -	rte_trace_point_emit_ptr(action->conf);
> +	rte_trace_point_emit_ptr(action);
>  	rte_trace_point_emit_ptr(data);
>  	rte_trace_point_emit_int(ret);

I think 'rte_flow_trace_create()' is missed, rest looks OK.

Can you please double check 'rte_flow_trace_create()' too?

>  )
> @@ -2349,14 +2340,8 @@ RTE_TRACE_POINT_FP(
>  		const struct rte_flow_indir_action_conf *conf,
>  		const struct rte_flow_action *action,
>  		const struct rte_flow_action_handle *handle),
> -	uint8_t ingress = conf->ingress;
> -	uint8_t egress = conf->egress;
> -	uint8_t transfer = conf->transfer;
> -
>  	rte_trace_point_emit_u16(port_id);
> -	rte_trace_point_emit_u8(ingress);
> -	rte_trace_point_emit_u8(egress);
> -	rte_trace_point_emit_u8(transfer);
> +	rte_trace_point_emit_ptr(conf);
>  	rte_trace_point_emit_ptr(action);
>  	rte_trace_point_emit_ptr(handle);
>  )

This change is different than others, this is removing bitfield related
local variable assignment.

According to bug 1167 that is causing a crash. So we need a separate
patch to either remove or fix bitfield related issues, for now I am OK
to remove (as done above).

But can you please make another patch for bitfield issue and move above
change to that patch?

Thanks,
ferruh
  
Ankur Dwivedi Feb. 28, 2023, 3:40 p.m. UTC | #7
>----------------------------------------------------------------------
>On 2/23/2023 12:30 PM, Ankur Dwivedi wrote:
>> The speed_fec_capa pointer can be null. So dereferencing the pointer
>> is removed and only the pointer is captured in trace function.
>> Fixed few more trace functions in which null pointer can be dereferenced.
>>
>> Coverity issue: 383238
>> Bugzilla ID: 1162
>> Fixes: 6679cf21d608 ("ethdev: add trace points")
>> Fixes: ed04fd4072e9 ("ethdev: add trace points for flow")
>>
>
>In below changes, pointers can be NULL at runtime, so agree on to the change,
>with minor comments please see below.
>
>
>But I don't think this is a fix for Bug 1162, which is an ASan reported error, can
>you please drop that tag unless it is verified.
The asan error reported in 1162 was because of capturing rte_trace_point_emit_string(parent->bus_info); in
rte_eth_trace_find_next_of. I could not find the exact reason for the asan error with parent->bus_info. 

But I think parent pointer can be NULL in rte_eth_find_next_of, so I removed the pointer reference. This resolved the asan error in 1162 as a side effect.

-	rte_trace_point_emit_string(parent->name);
-	rte_trace_point_emit_string(parent->bus_info);
-	rte_trace_point_emit_int(parent->numa_node);
+	rte_trace_point_emit_ptr(parent);

I will check if I can add an if condition to check if a pointer is NULL inside the trace function. If that works I will update this patch.
>
><...>
>
>> @@ -2308,8 +2300,7 @@ RTE_TRACE_POINT_FP(
>>  		int ret),
>>  	rte_trace_point_emit_u16(port_id);
>>  	rte_trace_point_emit_ptr(flow);
>> -	rte_trace_point_emit_int(action->type);
>> -	rte_trace_point_emit_ptr(action->conf);
>> +	rte_trace_point_emit_ptr(action);
>>  	rte_trace_point_emit_ptr(data);
>>  	rte_trace_point_emit_int(ret);
>
>I think 'rte_flow_trace_create()' is missed, rest looks OK.
>
>Can you please double check 'rte_flow_trace_create()' too?
Yes. Will add rte_flow_trace_create.
>
>>  )
>> @@ -2349,14 +2340,8 @@ RTE_TRACE_POINT_FP(
>>  		const struct rte_flow_indir_action_conf *conf,
>>  		const struct rte_flow_action *action,
>>  		const struct rte_flow_action_handle *handle),
>> -	uint8_t ingress = conf->ingress;
>> -	uint8_t egress = conf->egress;
>> -	uint8_t transfer = conf->transfer;
>> -
>>  	rte_trace_point_emit_u16(port_id);
>> -	rte_trace_point_emit_u8(ingress);
>> -	rte_trace_point_emit_u8(egress);
>> -	rte_trace_point_emit_u8(transfer);
>> +	rte_trace_point_emit_ptr(conf);
>>  	rte_trace_point_emit_ptr(action);
>>  	rte_trace_point_emit_ptr(handle);
>>  )
>
>This change is different than others, this is removing bitfield related local
>variable assignment.
>
>According to bug 1167 that is causing a crash. So we need a separate patch to
>either remove or fix bitfield related issues, for now I am OK to remove (as
>done above).
>
>But can you please make another patch for bitfield issue and move above
>change to that patch?

Yes, will move this change to fix bitfield patch.
>
>Thanks,
>ferruh
  
Ferruh Yigit Feb. 28, 2023, 4:08 p.m. UTC | #8
On 2/28/2023 3:40 PM, Ankur Dwivedi wrote:
>> ----------------------------------------------------------------------
>> On 2/23/2023 12:30 PM, Ankur Dwivedi wrote:
>>> The speed_fec_capa pointer can be null. So dereferencing the pointer
>>> is removed and only the pointer is captured in trace function.
>>> Fixed few more trace functions in which null pointer can be dereferenced.
>>>
>>> Coverity issue: 383238
>>> Bugzilla ID: 1162
>>> Fixes: 6679cf21d608 ("ethdev: add trace points")
>>> Fixes: ed04fd4072e9 ("ethdev: add trace points for flow")
>>>
>>
>> In below changes, pointers can be NULL at runtime, so agree on to the change,
>> with minor comments please see below.
>>
>>
>> But I don't think this is a fix for Bug 1162, which is an ASan reported error, can
>> you please drop that tag unless it is verified.
> The asan error reported in 1162 was because of capturing rte_trace_point_emit_string(parent->bus_info); in
> rte_eth_trace_find_next_of. I could not find the exact reason for the asan error with parent->bus_info. 
> 

Yeah, I also looked at it but not able to identify why it is complaining.

> But I think parent pointer can be NULL in rte_eth_find_next_of, so I removed the pointer reference. This resolved the asan error in 1162 as a side effect.
> 

Is it confirmed that patch fixing asan issue, I have not seen it in the
bugzilla comments. If it is confirmed, OK to keep Bugzilla ID tag.

And aside from the asan issue, OK to have this patch, because of reason
you described.

> -	rte_trace_point_emit_string(parent->name);
> -	rte_trace_point_emit_string(parent->bus_info);
> -	rte_trace_point_emit_int(parent->numa_node);
> +	rte_trace_point_emit_ptr(parent);
> 
> I will check if I can add an if condition to check if a pointer is NULL inside the trace function. If that works I will update this patch.
>>
>> <...>
>>
>>> @@ -2308,8 +2300,7 @@ RTE_TRACE_POINT_FP(
>>>  		int ret),
>>>  	rte_trace_point_emit_u16(port_id);
>>>  	rte_trace_point_emit_ptr(flow);
>>> -	rte_trace_point_emit_int(action->type);
>>> -	rte_trace_point_emit_ptr(action->conf);
>>> +	rte_trace_point_emit_ptr(action);
>>>  	rte_trace_point_emit_ptr(data);
>>>  	rte_trace_point_emit_int(ret);
>>
>> I think 'rte_flow_trace_create()' is missed, rest looks OK.
>>
>> Can you please double check 'rte_flow_trace_create()' too?
> Yes. Will add rte_flow_trace_create.
>>
>>>  )
>>> @@ -2349,14 +2340,8 @@ RTE_TRACE_POINT_FP(
>>>  		const struct rte_flow_indir_action_conf *conf,
>>>  		const struct rte_flow_action *action,
>>>  		const struct rte_flow_action_handle *handle),
>>> -	uint8_t ingress = conf->ingress;
>>> -	uint8_t egress = conf->egress;
>>> -	uint8_t transfer = conf->transfer;
>>> -
>>>  	rte_trace_point_emit_u16(port_id);
>>> -	rte_trace_point_emit_u8(ingress);
>>> -	rte_trace_point_emit_u8(egress);
>>> -	rte_trace_point_emit_u8(transfer);
>>> +	rte_trace_point_emit_ptr(conf);
>>>  	rte_trace_point_emit_ptr(action);
>>>  	rte_trace_point_emit_ptr(handle);
>>>  )
>>
>> This change is different than others, this is removing bitfield related local
>> variable assignment.
>>
>> According to bug 1167 that is causing a crash. So we need a separate patch to
>> either remove or fix bitfield related issues, for now I am OK to remove (as
>> done above).
>>
>> But can you please make another patch for bitfield issue and move above
>> change to that patch?
> 
> Yes, will move this change to fix bitfield patch.
>>
>> Thanks,
>> ferruh
>
  
Ferruh Yigit Feb. 28, 2023, 4:27 p.m. UTC | #9
On 2/28/2023 3:40 PM, Ankur Dwivedi wrote:
>> ----------------------------------------------------------------------
>> On 2/23/2023 12:30 PM, Ankur Dwivedi wrote:
>>> The speed_fec_capa pointer can be null. So dereferencing the pointer
>>> is removed and only the pointer is captured in trace function.
>>> Fixed few more trace functions in which null pointer can be dereferenced.
>>>
>>> Coverity issue: 383238
>>> Bugzilla ID: 1162
>>> Fixes: 6679cf21d608 ("ethdev: add trace points")
>>> Fixes: ed04fd4072e9 ("ethdev: add trace points for flow")
>>>
>>
>> In below changes, pointers can be NULL at runtime, so agree on to the change,
>> with minor comments please see below.
>>
>>
>> But I don't think this is a fix for Bug 1162, which is an ASan reported error, can
>> you please drop that tag unless it is verified.
> The asan error reported in 1162 was because of capturing rte_trace_point_emit_string(parent->bus_info); in
> rte_eth_trace_find_next_of. I could not find the exact reason for the asan error with parent->bus_info. 
> 
> But I think parent pointer can be NULL in rte_eth_find_next_of, so I removed the pointer reference. This resolved the asan error in 1162 as a side effect.
> 
> -	rte_trace_point_emit_string(parent->name);
> -	rte_trace_point_emit_string(parent->bus_info);
> -	rte_trace_point_emit_int(parent->numa_node);
> +	rte_trace_point_emit_ptr(parent);
> 
> I will check if I can add an if condition to check if a pointer is NULL inside the trace function. If that works I will update this patch.
>>
>> <...>
>>
>>> @@ -2308,8 +2300,7 @@ RTE_TRACE_POINT_FP(
>>>  		int ret),
>>>  	rte_trace_point_emit_u16(port_id);
>>>  	rte_trace_point_emit_ptr(flow);
>>> -	rte_trace_point_emit_int(action->type);
>>> -	rte_trace_point_emit_ptr(action->conf);
>>> +	rte_trace_point_emit_ptr(action);
>>>  	rte_trace_point_emit_ptr(data);
>>>  	rte_trace_point_emit_int(ret);
>>
>> I think 'rte_flow_trace_create()' is missed, rest looks OK.
>>
>> Can you please double check 'rte_flow_trace_create()' too?
> Yes. Will add rte_flow_trace_create.


Can you please check 'rte_eth_trace_read_clock()' too,
it has:
RTE_TRACE_POINT_ARGS(..., const uint64_t *clk, ...)
uint64_t clk_v = *clk;

>>
>>>  )
>>> @@ -2349,14 +2340,8 @@ RTE_TRACE_POINT_FP(
>>>  		const struct rte_flow_indir_action_conf *conf,
>>>  		const struct rte_flow_action *action,
>>>  		const struct rte_flow_action_handle *handle),
>>> -	uint8_t ingress = conf->ingress;
>>> -	uint8_t egress = conf->egress;
>>> -	uint8_t transfer = conf->transfer;
>>> -
>>>  	rte_trace_point_emit_u16(port_id);
>>> -	rte_trace_point_emit_u8(ingress);
>>> -	rte_trace_point_emit_u8(egress);
>>> -	rte_trace_point_emit_u8(transfer);
>>> +	rte_trace_point_emit_ptr(conf);
>>>  	rte_trace_point_emit_ptr(action);
>>>  	rte_trace_point_emit_ptr(handle);
>>>  )
>>
>> This change is different than others, this is removing bitfield related local
>> variable assignment.
>>
>> According to bug 1167 that is causing a crash. So we need a separate patch to
>> either remove or fix bitfield related issues, for now I am OK to remove (as
>> done above).
>>
>> But can you please make another patch for bitfield issue and move above
>> change to that patch?
> 
> Yes, will move this change to fix bitfield patch.
>>
>> Thanks,
>> ferruh
>
  
Ferruh Yigit March 2, 2023, 9:58 a.m. UTC | #10
On 2/28/2023 4:27 PM, Ferruh Yigit wrote:
> On 2/28/2023 3:40 PM, Ankur Dwivedi wrote:
>>> ----------------------------------------------------------------------
>>> On 2/23/2023 12:30 PM, Ankur Dwivedi wrote:
>>>> The speed_fec_capa pointer can be null. So dereferencing the pointer
>>>> is removed and only the pointer is captured in trace function.
>>>> Fixed few more trace functions in which null pointer can be dereferenced.
>>>>
>>>> Coverity issue: 383238
>>>> Bugzilla ID: 1162
>>>> Fixes: 6679cf21d608 ("ethdev: add trace points")
>>>> Fixes: ed04fd4072e9 ("ethdev: add trace points for flow")
>>>>
>>>
>>> In below changes, pointers can be NULL at runtime, so agree on to the change,
>>> with minor comments please see below.
>>>
>>>
>>> But I don't think this is a fix for Bug 1162, which is an ASan reported error, can
>>> you please drop that tag unless it is verified.
>> The asan error reported in 1162 was because of capturing rte_trace_point_emit_string(parent->bus_info); in
>> rte_eth_trace_find_next_of. I could not find the exact reason for the asan error with parent->bus_info. 
>>
>> But I think parent pointer can be NULL in rte_eth_find_next_of, so I removed the pointer reference. This resolved the asan error in 1162 as a side effect.
>>
>> -	rte_trace_point_emit_string(parent->name);
>> -	rte_trace_point_emit_string(parent->bus_info);
>> -	rte_trace_point_emit_int(parent->numa_node);
>> +	rte_trace_point_emit_ptr(parent);
>>
>> I will check if I can add an if condition to check if a pointer is NULL inside the trace function. If that works I will update this patch.
>>>
>>> <...>
>>>
>>>> @@ -2308,8 +2300,7 @@ RTE_TRACE_POINT_FP(
>>>>  		int ret),
>>>>  	rte_trace_point_emit_u16(port_id);
>>>>  	rte_trace_point_emit_ptr(flow);
>>>> -	rte_trace_point_emit_int(action->type);
>>>> -	rte_trace_point_emit_ptr(action->conf);
>>>> +	rte_trace_point_emit_ptr(action);
>>>>  	rte_trace_point_emit_ptr(data);
>>>>  	rte_trace_point_emit_int(ret);
>>>
>>> I think 'rte_flow_trace_create()' is missed, rest looks OK.
>>>
>>> Can you please double check 'rte_flow_trace_create()' too?
>> Yes. Will add rte_flow_trace_create.
> 
> 
> Can you please check 'rte_eth_trace_read_clock()' too,
> it has:
> RTE_TRACE_POINT_ARGS(..., const uint64_t *clk, ...)
> uint64_t clk_v = *clk;
> 

Hi Ankur,

It seems bug 1162 is verified with this patch, can you please send a v2
so we can close the defect.

Thanks,
ferruh

>>>
>>>>  )
>>>> @@ -2349,14 +2340,8 @@ RTE_TRACE_POINT_FP(
>>>>  		const struct rte_flow_indir_action_conf *conf,
>>>>  		const struct rte_flow_action *action,
>>>>  		const struct rte_flow_action_handle *handle),
>>>> -	uint8_t ingress = conf->ingress;
>>>> -	uint8_t egress = conf->egress;
>>>> -	uint8_t transfer = conf->transfer;
>>>> -
>>>>  	rte_trace_point_emit_u16(port_id);
>>>> -	rte_trace_point_emit_u8(ingress);
>>>> -	rte_trace_point_emit_u8(egress);
>>>> -	rte_trace_point_emit_u8(transfer);
>>>> +	rte_trace_point_emit_ptr(conf);
>>>>  	rte_trace_point_emit_ptr(action);
>>>>  	rte_trace_point_emit_ptr(handle);
>>>>  )
>>>
>>> This change is different than others, this is removing bitfield related local
>>> variable assignment.
>>>
>>> According to bug 1167 that is causing a crash. So we need a separate patch to
>>> either remove or fix bitfield related issues, for now I am OK to remove (as
>>> done above).
>>>
>>> But can you please make another patch for bitfield issue and move above
>>> change to that patch?
>>
>> Yes, will move this change to fix bitfield patch.
>>>
>>> Thanks,
>>> ferruh
>>
>
  
Ankur Dwivedi March 2, 2023, 4:49 p.m. UTC | #11
>On 2/28/2023 4:27 PM, Ferruh Yigit wrote:
>> On 2/28/2023 3:40 PM, Ankur Dwivedi wrote:
>>>> --------------------------------------------------------------------
>>>> -- On 2/23/2023 12:30 PM, Ankur Dwivedi wrote:
>>>>> The speed_fec_capa pointer can be null. So dereferencing the
>>>>> pointer is removed and only the pointer is captured in trace function.
>>>>> Fixed few more trace functions in which null pointer can be
>dereferenced.
>>>>>
>>>>> Coverity issue: 383238
>>>>> Bugzilla ID: 1162
>>>>> Fixes: 6679cf21d608 ("ethdev: add trace points")
>>>>> Fixes: ed04fd4072e9 ("ethdev: add trace points for flow")
>>>>>
>>>>
>>>> In below changes, pointers can be NULL at runtime, so agree on to
>>>> the change, with minor comments please see below.
>>>>
>>>>
>>>> But I don't think this is a fix for Bug 1162, which is an ASan
>>>> reported error, can you please drop that tag unless it is verified.
>>> The asan error reported in 1162 was because of capturing
>>> rte_trace_point_emit_string(parent->bus_info); in
>rte_eth_trace_find_next_of. I could not find the exact reason for the asan
>error with parent->bus_info.
>>>
>>> But I think parent pointer can be NULL in rte_eth_find_next_of, so I
>removed the pointer reference. This resolved the asan error in 1162 as a side
>effect.
>>>
>>> -	rte_trace_point_emit_string(parent->name);
>>> -	rte_trace_point_emit_string(parent->bus_info);
>>> -	rte_trace_point_emit_int(parent->numa_node);
>>> +	rte_trace_point_emit_ptr(parent);
>>>
>>> I will check if I can add an if condition to check if a pointer is NULL inside the
>trace function. If that works I will update this patch.
>>>>
>>>> <...>
>>>>
>>>>> @@ -2308,8 +2300,7 @@ RTE_TRACE_POINT_FP(
>>>>>  		int ret),
>>>>>  	rte_trace_point_emit_u16(port_id);
>>>>>  	rte_trace_point_emit_ptr(flow);
>>>>> -	rte_trace_point_emit_int(action->type);
>>>>> -	rte_trace_point_emit_ptr(action->conf);
>>>>> +	rte_trace_point_emit_ptr(action);
>>>>>  	rte_trace_point_emit_ptr(data);
>>>>>  	rte_trace_point_emit_int(ret);
>>>>
>>>> I think 'rte_flow_trace_create()' is missed, rest looks OK.
>>>>
>>>> Can you please double check 'rte_flow_trace_create()' too?
>>> Yes. Will add rte_flow_trace_create.
>>
>>
>> Can you please check 'rte_eth_trace_read_clock()' too, it has:
>> RTE_TRACE_POINT_ARGS(..., const uint64_t *clk, ...) uint64_t clk_v =
>> *clk;
>>
>
>Hi Ankur,
>
>It seems bug 1162 is verified with this patch, can you please send a v2
>so we can close the defect.

Sure, will send a v2.
>
>Thanks,
>ferruh
>
>>>>
>>>>>  )
>>>>> @@ -2349,14 +2340,8 @@ RTE_TRACE_POINT_FP(
>>>>>  		const struct rte_flow_indir_action_conf *conf,
>>>>>  		const struct rte_flow_action *action,
>>>>>  		const struct rte_flow_action_handle *handle),
>>>>> -	uint8_t ingress = conf->ingress;
>>>>> -	uint8_t egress = conf->egress;
>>>>> -	uint8_t transfer = conf->transfer;
>>>>> -
>>>>>  	rte_trace_point_emit_u16(port_id);
>>>>> -	rte_trace_point_emit_u8(ingress);
>>>>> -	rte_trace_point_emit_u8(egress);
>>>>> -	rte_trace_point_emit_u8(transfer);
>>>>> +	rte_trace_point_emit_ptr(conf);
>>>>>  	rte_trace_point_emit_ptr(action);
>>>>>  	rte_trace_point_emit_ptr(handle);
>>>>>  )
>>>>
>>>> This change is different than others, this is removing bitfield related local
>>>> variable assignment.
>>>>
>>>> According to bug 1167 that is causing a crash. So we need a separate
>patch to
>>>> either remove or fix bitfield related issues, for now I am OK to remove (as
>>>> done above).
>>>>
>>>> But can you please make another patch for bitfield issue and move above
>>>> change to that patch?
>>>
>>> Yes, will move this change to fix bitfield patch.
>>>>
>>>> Thanks,
>>>> ferruh
>>>
>>
  

Patch

diff --git a/lib/ethdev/ethdev_trace.h b/lib/ethdev/ethdev_trace.h
index 53d1a71ff0..a13e33fe64 100644
--- a/lib/ethdev/ethdev_trace.h
+++ b/lib/ethdev/ethdev_trace.h
@@ -123,8 +123,7 @@  RTE_TRACE_POINT(
 	RTE_TRACE_POINT_ARGS(uint16_t port_id,
 		const struct rte_eth_dev_owner *owner, int ret),
 	rte_trace_point_emit_u16(port_id);
-	rte_trace_point_emit_u64(owner->id);
-	rte_trace_point_emit_string(owner->name);
+	rte_trace_point_emit_ptr(owner);
 	rte_trace_point_emit_int(ret);
 )
 
@@ -375,9 +374,7 @@  RTE_TRACE_POINT(
 	rte_eth_trace_find_next_of,
 	RTE_TRACE_POINT_ARGS(uint16_t port_id, const struct rte_device *parent),
 	rte_trace_point_emit_u16(port_id);
-	rte_trace_point_emit_string(parent->name);
-	rte_trace_point_emit_string(parent->bus_info);
-	rte_trace_point_emit_int(parent->numa_node);
+	rte_trace_point_emit_ptr(parent);
 )
 
 RTE_TRACE_POINT(
@@ -869,8 +866,7 @@  RTE_TRACE_POINT(
 		const struct rte_eth_fec_capa *speed_fec_capa,
 		unsigned int num, int ret),
 	rte_trace_point_emit_u16(port_id);
-	rte_trace_point_emit_u32(speed_fec_capa->speed);
-	rte_trace_point_emit_u32(speed_fec_capa->capa);
+	rte_trace_point_emit_ptr(speed_fec_capa);
 	rte_trace_point_emit_u32(num);
 	rte_trace_point_emit_int(ret);
 )
@@ -1416,8 +1412,7 @@  RTE_TRACE_POINT(
 		const struct rte_flow_item *pattern,
 		const struct rte_flow_action *actions, int ret),
 	rte_trace_point_emit_u16(port_id);
-	rte_trace_point_emit_u32(attr->group);
-	rte_trace_point_emit_u32(attr->priority);
+	rte_trace_point_emit_ptr(attr);
 	rte_trace_point_emit_ptr(pattern);
 	rte_trace_point_emit_ptr(actions);
 	rte_trace_point_emit_int(ret);
@@ -1510,10 +1505,7 @@  RTE_TRACE_POINT(
 		const struct rte_flow_item_flex_conf *conf,
 		const struct rte_flow_item_flex_handle *handle),
 	rte_trace_point_emit_u16(port_id);
-	rte_trace_point_emit_int(conf->tunnel);
-	rte_trace_point_emit_int(conf->nb_samples);
-	rte_trace_point_emit_int(conf->nb_inputs);
-	rte_trace_point_emit_int(conf->nb_outputs);
+	rte_trace_point_emit_ptr(conf);
 	rte_trace_point_emit_ptr(handle);
 )
 
@@ -2308,8 +2300,7 @@  RTE_TRACE_POINT_FP(
 		int ret),
 	rte_trace_point_emit_u16(port_id);
 	rte_trace_point_emit_ptr(flow);
-	rte_trace_point_emit_int(action->type);
-	rte_trace_point_emit_ptr(action->conf);
+	rte_trace_point_emit_ptr(action);
 	rte_trace_point_emit_ptr(data);
 	rte_trace_point_emit_int(ret);
 )
@@ -2349,14 +2340,8 @@  RTE_TRACE_POINT_FP(
 		const struct rte_flow_indir_action_conf *conf,
 		const struct rte_flow_action *action,
 		const struct rte_flow_action_handle *handle),
-	uint8_t ingress = conf->ingress;
-	uint8_t egress = conf->egress;
-	uint8_t transfer = conf->transfer;
-
 	rte_trace_point_emit_u16(port_id);
-	rte_trace_point_emit_u8(ingress);
-	rte_trace_point_emit_u8(egress);
-	rte_trace_point_emit_u8(transfer);
+	rte_trace_point_emit_ptr(conf);
 	rte_trace_point_emit_ptr(action);
 	rte_trace_point_emit_ptr(handle);
 )