[dpdk-dev,25/25] rte_eal_init: add info about rte_errno codes

Message ID 1485529023-5486-26-git-send-email-aconole@redhat.com (mailing list archive)
State Superseded, archived
Checks

Context               Check    Description
ci/checkpatch         success  coding style OK
ci/Intel compilation  fail     Compilation issues

Commit Message

Aaron Conole Jan. 27, 2017, 2:57 p.m. UTC
  The rte_eal_init function will now pass failure reason hints to the
application.  To help app developers decipher this, add some brief
information about what the codes are indicating.

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 lib/librte_eal/common/include/rte_eal.h | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)
  

Comments

Stephen Hemminger Jan. 27, 2017, 4:33 p.m. UTC | #1
On Fri, 27 Jan 2017 09:57:03 -0500
Aaron Conole <aconole@redhat.com> wrote:

> diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
> index 03fee50..46e427f 100644
> --- a/lib/librte_eal/common/include/rte_eal.h
> +++ b/lib/librte_eal/common/include/rte_eal.h
> @@ -159,7 +159,29 @@ int rte_eal_iopl_init(void);
>   *     function call and should not be further interpreted by the
>   *     application.  The EAL does not take any ownership of the memory used
>   *     for either the argv array, or its members.
> - *   - On failure, a negative error value.
> + *   - On failure, -1 and rte_errno is set to a value indicating the cause
> + *     for failure.
> + *
> + *   The error codes returned via rte_errno:
> + *     EACCES indicates a permissions issue.
> + *
> + *     EAGAIN indicates either a bus or system resource was not available,
> + *            try again.
> + *
> + *     EALREADY indicates that the rte_eal_init function has already been
> + *              called, and cannot be called again.
> + *
> + *     EINVAL indicates invalid parameters were passed as argv/argc.
> + *
> + *     EIO indicates failure to setup the logging handlers.  This is usually
> + *         caused by an out-of-memory condition.
> + *
> + *     ENODEV indicates memory setup issues.
> + *
> + *     ENOTSUP indicates that the EAL cannot initialize on this system.
> + *
> + *     EUNATCH indicates that the PCI bus is either not present, or is not
> + *             readable by the eal.
>   */
>  int rte_eal_init(int argc, char **argv);

Why use rte_errno?
Most DPDK calls just return negative value on error which corresponds to error number.
Are you trying to keep ABI compatibility? Doesn't make sense because before all these
errors were panics, no working application is going to care.
  
Bruce Richardson Jan. 27, 2017, 4:47 p.m. UTC | #2
On Fri, Jan 27, 2017 at 08:33:46AM -0800, Stephen Hemminger wrote:
> On Fri, 27 Jan 2017 09:57:03 -0500
> Aaron Conole <aconole@redhat.com> wrote:
> 
> > diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
> > index 03fee50..46e427f 100644
> > --- a/lib/librte_eal/common/include/rte_eal.h
> > +++ b/lib/librte_eal/common/include/rte_eal.h
> > @@ -159,7 +159,29 @@ int rte_eal_iopl_init(void);
> >   *     function call and should not be further interpreted by the
> >   *     application.  The EAL does not take any ownership of the memory used
> >   *     for either the argv array, or its members.
> > - *   - On failure, a negative error value.
> > + *   - On failure, -1 and rte_errno is set to a value indicating the cause
> > + *     for failure.
> > + *
> > + *   The error codes returned via rte_errno:
> > + *     EACCES indicates a permissions issue.
> > + *
> > + *     EAGAIN indicates either a bus or system resource was not available,
> > + *            try again.
> > + *
> > + *     EALREADY indicates that the rte_eal_init function has already been
> > + *              called, and cannot be called again.
> > + *
> > + *     EINVAL indicates invalid parameters were passed as argv/argc.
> > + *
> > + *     EIO indicates failure to setup the logging handlers.  This is usually
> > + *         caused by an out-of-memory condition.
> > + *
> > + *     ENODEV indicates memory setup issues.
> > + *
> > + *     ENOTSUP indicates that the EAL cannot initialize on this system.
> > + *
> > + *     EUNATCH indicates that the PCI bus is either not present, or is not
> > + *             readable by the eal.
> >   */
> >  int rte_eal_init(int argc, char **argv);
> 
> Why use rte_errno?
> Most DPDK calls just return negative value on error which corresponds to error number.
> Are you trying to keep ABI compatibility? Doesn't make sense because before all these
> errors were panics, no working application is going to care.

Either will work, but I actually prefer this way. I view using rte_errno
to be better as it can work in just about all cases, including with
functions which return pointers. This allows you to have a standard
method across all functions for returning error codes, and it only
requires a single sentinel value to indicate error, rather than using a
whole range of values.
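
As an illustration of the pointer-return case, here is a minimal sketch in C.
It does not use the DPDK API: demo_errno, struct demo_pool,
demo_pool_create() and demo_pool_resize() are hypothetical names, with
demo_errno standing in for rte_errno. Both functions signal failure with a
single sentinel (NULL or -1) and report the cause separately.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for rte_errno: one error slot per thread (C11). */
static _Thread_local int demo_errno;

struct demo_pool {
    size_t n_elems;
};

/*
 * Pointer-returning constructor: NULL is the only sentinel needed,
 * and the reason for the failure is reported through demo_errno.
 */
static struct demo_pool *demo_pool_create(size_t n_elems)
{
    struct demo_pool *p;

    if (n_elems == 0) {
        demo_errno = EINVAL;
        return NULL;
    }
    p = malloc(sizeof(*p));
    if (p == NULL) {
        demo_errno = ENOMEM;
        return NULL;
    }
    p->n_elems = n_elems;
    return p;
}

/* Integer-returning function using the same convention: -1 plus demo_errno. */
static int demo_pool_resize(struct demo_pool *p, size_t n_elems)
{
    assert(p != NULL); /* a NULL pool is a caller bug, not a runtime error */
    if (n_elems == 0) {
        demo_errno = EINVAL;
        return -1;
    }
    p->n_elems = n_elems;
    return 0;
}
```

A caller checks only for NULL or -1 and then reads demo_errno, so the same
error-handling code works regardless of the function's return type.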

/Bruce
  
Stephen Hemminger Jan. 27, 2017, 5:37 p.m. UTC | #3
On Fri, 27 Jan 2017 16:47:40 +0000
Bruce Richardson <bruce.richardson@intel.com> wrote:

> On Fri, Jan 27, 2017 at 08:33:46AM -0800, Stephen Hemminger wrote:
> > On Fri, 27 Jan 2017 09:57:03 -0500
> > Aaron Conole <aconole@redhat.com> wrote:
> >   
> > > diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
> > > index 03fee50..46e427f 100644
> > > --- a/lib/librte_eal/common/include/rte_eal.h
> > > +++ b/lib/librte_eal/common/include/rte_eal.h
> > > @@ -159,7 +159,29 @@ int rte_eal_iopl_init(void);
> > >   *     function call and should not be further interpreted by the
> > >   *     application.  The EAL does not take any ownership of the memory used
> > >   *     for either the argv array, or its members.
> > > - *   - On failure, a negative error value.
> > > + *   - On failure, -1 and rte_errno is set to a value indicating the cause
> > > + *     for failure.
> > > + *
> > > + *   The error codes returned via rte_errno:
> > > + *     EACCES indicates a permissions issue.
> > > + *
> > > + *     EAGAIN indicates either a bus or system resource was not available,
> > > + *            try again.
> > > + *
> > > + *     EALREADY indicates that the rte_eal_init function has already been
> > > + *              called, and cannot be called again.
> > > + *
> > > + *     EINVAL indicates invalid parameters were passed as argv/argc.
> > > + *
> > > + *     EIO indicates failure to setup the logging handlers.  This is usually
> > > + *         caused by an out-of-memory condition.
> > > + *
> > > + *     ENODEV indicates memory setup issues.
> > > + *
> > > + *     ENOTSUP indicates that the EAL cannot initialize on this system.
> > > + *
> > > + *     EUNATCH indicates that the PCI bus is either not present, or is not
> > > + *             readable by the eal.
> > >   */
> > >  int rte_eal_init(int argc, char **argv);  
> > 
> > Why use rte_errno?
> > Most DPDK calls just return negative value on error which corresponds to error number.
> > Are you trying to keep ABI compatibility? Doesn't make sense because before all these
> > errors were panics, no working application is going to care.
> 
> Either will work, but I actually prefer this way. I view using rte_errno
> to be better as it can work in just about all cases, including with
> functions which return pointers. This allows you to have a standard
> method across all functions for returning error codes, and it only
> requires a single sentinel value to indicate error, rather than using a
> whole range of values.

The problem is DPDK is getting more inconsistent on how this is done.
As long as error returns are always same as kernel/glibc errno's it really doesn't
matter much which way the value is returned from a technical point of view
but the inconsistency is sure to be a usability problem and source of errors.
  
Aaron Conole Jan. 30, 2017, 6:38 p.m. UTC | #4
Stephen Hemminger <stephen@networkplumber.org> writes:

> On Fri, 27 Jan 2017 16:47:40 +0000
> Bruce Richardson <bruce.richardson@intel.com> wrote:
>
>> On Fri, Jan 27, 2017 at 08:33:46AM -0800, Stephen Hemminger wrote:
>> > On Fri, 27 Jan 2017 09:57:03 -0500
>> > Aaron Conole <aconole@redhat.com> wrote:
>> >   
>> > > diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
>> > > index 03fee50..46e427f 100644
>> > > --- a/lib/librte_eal/common/include/rte_eal.h
>> > > +++ b/lib/librte_eal/common/include/rte_eal.h
>> > > @@ -159,7 +159,29 @@ int rte_eal_iopl_init(void);
>> > >   *     function call and should not be further interpreted by the
>> > >   *     application.  The EAL does not take any ownership of the memory used
>> > >   *     for either the argv array, or its members.
>> > > - *   - On failure, a negative error value.
>> > > + *   - On failure, -1 and rte_errno is set to a value indicating the cause
>> > > + *     for failure.
>> > > + *
>> > > + *   The error codes returned via rte_errno:
>> > > + *     EACCES indicates a permissions issue.
>> > > + *
>> > > + *     EAGAIN indicates either a bus or system resource was not available,
>> > > + *            try again.
>> > > + *
>> > > + *     EALREADY indicates that the rte_eal_init function has already been
>> > > + *              called, and cannot be called again.
>> > > + *
>> > > + *     EINVAL indicates invalid parameters were passed as argv/argc.
>> > > + *
>> > > + *     EIO indicates failure to setup the logging handlers.  This is usually
>> > > + *         caused by an out-of-memory condition.
>> > > + *
>> > > + *     ENODEV indicates memory setup issues.
>> > > + *
>> > > + *     ENOTSUP indicates that the EAL cannot initialize on this system.
>> > > + *
>> > > + *     EUNATCH indicates that the PCI bus is either not present, or is not
>> > > + *             readable by the eal.
>> > >   */
>> > >  int rte_eal_init(int argc, char **argv);  
>> > 
>> > Why use rte_errno?
>> > Most DPDK calls just return negative value on error which
>> > corresponds to error number.
>> > Are you trying to keep ABI compatibility? Doesn't make sense
>> > because before all these
>> > errors were panics, no working application is going to care.
>> 
>> Either will work, but I actually prefer this way. I view using rte_errno
>> to be better as it can work in just about all cases, including with
>> functions which return pointers. This allows you to have a standard
>> method across all functions for returning error codes, and it only
>> requires a single sentinel value to indicate error, rather than using a
>> whole range of values.
>
> The problem is DPDK is getting more inconsistent on how this is done.
> As long as error returns are always same as kernel/glibc errno's it really doesn't
> matter much which way the value is returned from a technical point of view
> but the inconsistency is sure to be a usability problem and source of errors.

I am using rte_errno here because I assumed it was the preferred
method.  In fact, looking at some recently contributed modules (for
instance pdump), it seems that folks are using it.

I'm not really sure the purpose of having rte_errno if it isn't used, so
it'd be helpful to know if there's some consensus on reflecting errors
via this variable, or on returning error codes.  Whichever is the more
consistent with the way the DPDK project does things, I'm game :).

Thanks for the thoughts, and review.
  
Thomas Monjalon Jan. 30, 2017, 8:19 p.m. UTC | #5
2017-01-30 13:38, Aaron Conole:
> Stephen Hemminger <stephen@networkplumber.org> writes:
> > Bruce Richardson <bruce.richardson@intel.com> wrote:
> >> On Fri, Jan 27, 2017 at 08:33:46AM -0800, Stephen Hemminger wrote:
> >> > Why use rte_errno?
> >> > Most DPDK calls just return negative value on error which
> >> > corresponds to error number.
> >> > Are you trying to keep ABI compatibility? Doesn't make sense
> >> > because before all these
> >> > errors were panics, no working application is going to care.
> >> 
> >> Either will work, but I actually prefer this way. I view using rte_errno
> >> to be better as it can work in just about all cases, including with
> >> functions which return pointers. This allows you to have a standard
> >> method across all functions for returning error codes, and it only
> >> requires a single sentinel value to indicate error, rather than using a
> >> whole range of values.
> >
> > The problem is DPDK is getting more inconsistent on how this is done.
> > As long as error returns are always same as kernel/glibc errno's it really doesn't
> > matter much which way the value is returned from a technical point of view
> > but the inconsistency is sure to be a usability problem and source of errors.
> 
> I am using rte_errno here because I assumed it was the preferred
> method.  In fact, looking at some recently contributed modules (for
> instance pdump), it seems that folks are using it.
> 
> I'm not really sure the purpose of having rte_errno if it isn't used, so
> it'd be helpful to know if there's some consensus on reflecting errors
> via this variable, or on returning error codes.  Whichever is the more
> consistent with the way the DPDK project does things, I'm game :).

I think we can use both return value and rte_errno.
We could try to enforce rte_errno as mandatory everywhere.

Adrien did the recent rte_flow API.
Please Adrien, could you give your thought?
  
Bruce Richardson Jan. 31, 2017, 9:33 a.m. UTC | #6
On Mon, Jan 30, 2017 at 01:38:00PM -0500, Aaron Conole wrote:
> Stephen Hemminger <stephen@networkplumber.org> writes:
> 
> > On Fri, 27 Jan 2017 16:47:40 +0000
> > Bruce Richardson <bruce.richardson@intel.com> wrote:
> >
> >> On Fri, Jan 27, 2017 at 08:33:46AM -0800, Stephen Hemminger wrote:
> >> > On Fri, 27 Jan 2017 09:57:03 -0500
> >> > Aaron Conole <aconole@redhat.com> wrote:
> >> >   
> >> > > diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
> >> > > index 03fee50..46e427f 100644
> >> > > --- a/lib/librte_eal/common/include/rte_eal.h
> >> > > +++ b/lib/librte_eal/common/include/rte_eal.h
> >> > > @@ -159,7 +159,29 @@ int rte_eal_iopl_init(void);
> >> > >   *     function call and should not be further interpreted by the
> >> > >   *     application.  The EAL does not take any ownership of the memory used
> >> > >   *     for either the argv array, or its members.
> >> > > - *   - On failure, a negative error value.
> >> > > + *   - On failure, -1 and rte_errno is set to a value indicating the cause
> >> > > + *     for failure.
> >> > > + *
> >> > > + *   The error codes returned via rte_errno:
> >> > > + *     EACCES indicates a permissions issue.
> >> > > + *
> >> > > + *     EAGAIN indicates either a bus or system resource was not available,
> >> > > + *            try again.
> >> > > + *
> >> > > + *     EALREADY indicates that the rte_eal_init function has already been
> >> > > + *              called, and cannot be called again.
> >> > > + *
> >> > > + *     EINVAL indicates invalid parameters were passed as argv/argc.
> >> > > + *
> >> > > + *     EIO indicates failure to setup the logging handlers.  This is usually
> >> > > + *         caused by an out-of-memory condition.
> >> > > + *
> >> > > + *     ENODEV indicates memory setup issues.
> >> > > + *
> >> > > + *     ENOTSUP indicates that the EAL cannot initialize on this system.
> >> > > + *
> >> > > + *     EUNATCH indicates that the PCI bus is either not present, or is not
> >> > > + *             readable by the eal.
> >> > >   */
> >> > >  int rte_eal_init(int argc, char **argv);  
> >> > 
> >> > Why use rte_errno?
> >> > Most DPDK calls just return negative value on error which
> >> > corresponds to error number.
> >> > Are you trying to keep ABI compatibility? Doesn't make sense
> >> > because before all these
> >> > errors were panics, no working application is going to care.
> >> 
> >> Either will work, but I actually prefer this way. I view using rte_errno
> >> to be better as it can work in just about all cases, including with
> >> functions which return pointers. This allows you to have a standard
> >> method across all functions for returning error codes, and it only
> >> requires a single sentinel value to indicate error, rather than using a
> >> whole range of values.
> >
> > The problem is DPDK is getting more inconsistent on how this is done.
> > As long as error returns are always same as kernel/glibc errno's it really doesn't
> > matter much which way the value is returned from a technical point of view
> > but the inconsistency is sure to be a usability problem and source of errors.
> 
> I am using rte_errno here because I assumed it was the preferred
> method.  In fact, looking at some recently contributed modules (for
> instance pdump), it seems that folks are using it.
> 
> I'm not really sure the purpose of having rte_errno if it isn't used, so
> it'd be helpful to know if there's some consensus on reflecting errors
> via this variable, or on returning error codes.  Whichever is the more
> consistent with the way the DPDK project does things, I'm game :).
> 
Unfortunately, this is one area where DPDK is inconsistent, and both
schemes are widely used. I much prefer using the rte_errno method, but
returning error codes directly is also common in DPDK.

/Bruce
  
Stephen Hemminger Jan. 31, 2017, 4:56 p.m. UTC | #7
On Tue, 31 Jan 2017 09:33:45 +0000
Bruce Richardson <bruce.richardson@intel.com> wrote:

> On Mon, Jan 30, 2017 at 01:38:00PM -0500, Aaron Conole wrote:
> > Stephen Hemminger <stephen@networkplumber.org> writes:
> >   
> > > On Fri, 27 Jan 2017 16:47:40 +0000
> > > Bruce Richardson <bruce.richardson@intel.com> wrote:
> > >  
> > >> On Fri, Jan 27, 2017 at 08:33:46AM -0800, Stephen Hemminger wrote:  
> > >> > On Fri, 27 Jan 2017 09:57:03 -0500
> > >> > Aaron Conole <aconole@redhat.com> wrote:
> > >> >     
> > >> > > diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
> > >> > > index 03fee50..46e427f 100644
> > >> > > --- a/lib/librte_eal/common/include/rte_eal.h
> > >> > > +++ b/lib/librte_eal/common/include/rte_eal.h
> > >> > > @@ -159,7 +159,29 @@ int rte_eal_iopl_init(void);
> > >> > >   *     function call and should not be further interpreted by the
> > >> > >   *     application.  The EAL does not take any ownership of the memory used
> > >> > >   *     for either the argv array, or its members.
> > >> > > - *   - On failure, a negative error value.
> > >> > > + *   - On failure, -1 and rte_errno is set to a value indicating the cause
> > >> > > + *     for failure.
> > >> > > + *
> > >> > > + *   The error codes returned via rte_errno:
> > >> > > + *     EACCES indicates a permissions issue.
> > >> > > + *
> > >> > > + *     EAGAIN indicates either a bus or system resource was not available,
> > >> > > + *            try again.
> > >> > > + *
> > >> > > + *     EALREADY indicates that the rte_eal_init function has already been
> > >> > > + *              called, and cannot be called again.
> > >> > > + *
> > >> > > + *     EINVAL indicates invalid parameters were passed as argv/argc.
> > >> > > + *
> > >> > > + *     EIO indicates failure to setup the logging handlers.  This is usually
> > >> > > + *         caused by an out-of-memory condition.
> > >> > > + *
> > >> > > + *     ENODEV indicates memory setup issues.
> > >> > > + *
> > >> > > + *     ENOTSUP indicates that the EAL cannot initialize on this system.
> > >> > > + *
> > >> > > + *     EUNATCH indicates that the PCI bus is either not present, or is not
> > >> > > + *             readable by the eal.
> > >> > >   */
> > >> > >  int rte_eal_init(int argc, char **argv);    
> > >> > 
> > >> > Why use rte_errno?
> > >> > Most DPDK calls just return negative value on error which
> > >> > corresponds to error number.
> > >> > Are you trying to keep ABI compatibility? Doesn't make sense
> > >> > because before all these
> > >> > errors were panics, no working application is going to care.
> > >> 
> > >> Either will work, but I actually prefer this way. I view using rte_errno
> > >> to be better as it can work in just about all cases, including with
> > >> functions which return pointers. This allows you to have a standard
> > >> method across all functions for returning error codes, and it only
> > >> requires a single sentinel value to indicate error, rather than using a
> > >> whole range of values.  
> > >
> > > The problem is DPDK is getting more inconsistent on how this is done.
> > > As long as error returns are always same as kernel/glibc errno's it really doesn't
> > > matter much which way the value is returned from a technical point of view
> > > but the inconsistency is sure to be a usability problem and source of errors.  
> > 
> > I am using rte_errno here because I assumed it was the preferred
> > method.  In fact, looking at some recently contributed modules (for
> > instance pdump), it seems that folks are using it.
> > 
> > I'm not really sure the purpose of having rte_errno if it isn't used, so
> > it'd be helpful to know if there's some consensus on reflecting errors
> > via this variable, or on returning error codes.  Whichever is the more
> > consistent with the way the DPDK project does things, I'm game :).
> >   
> Unfortunately, this is one area where DPDK is inconsistent, and both
> schemes are widely used. I much prefer using the rte_errno method, but
> returning error codes directly is also common in DPDK.

One argument in favor of returning error codes directly, is that it makes
it safer in application when one user function is returning an error code 
back through its internal call tree.
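
A minimal sketch of that propagation pattern, assuming the negative-errno
convention. All demo_* functions and the four-queue limit are hypothetical
and not part of DPDK:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical leaf operation: fails with a negative errno value. */
static int demo_open_queue(int queue_id)
{
    if (queue_id < 0)
        return -EINVAL;
    if (queue_id >= 4)
        return -ENODEV; /* pretend that only queues 0..3 exist */
    return 0;
}

/* Mid-level helper: hands the callee's error code up unchanged. */
static int demo_setup_port(int first_queue, int n_queues)
{
    int i, ret;

    assert(n_queues >= 0); /* a negative count is a caller bug */
    for (i = 0; i < n_queues; i++) {
        ret = demo_open_queue(first_queue + i);
        if (ret < 0)
            return ret;
    }
    return 0;
}

/*
 * Top-level entry point: the code travels in the return value, so no
 * intermediate call can overwrite it before the caller reads it.
 */
static int demo_app_init(int first_queue, int n_queues)
{
    int ret = demo_setup_port(first_queue, n_queues);

    if (ret < 0)
        return ret;
    return 0;
}
```

With a shared error variable, any function called between the failure and
the final check could overwrite the value unless each level saves it.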

Also, the API does not really do a good job of distinguishing between normal
(no data present) and exceptional (NIC has died).  At least it doesn't depend
on something like Structured Exception handling...

Feel free to clean the stables on this one.
  
Adrien Mazarguil Feb. 1, 2017, 10:54 a.m. UTC | #8
On Mon, Jan 30, 2017 at 09:19:29PM +0100, Thomas Monjalon wrote:
> 2017-01-30 13:38, Aaron Conole:
> > Stephen Hemminger <stephen@networkplumber.org> writes:
> > > Bruce Richardson <bruce.richardson@intel.com> wrote:
> > >> On Fri, Jan 27, 2017 at 08:33:46AM -0800, Stephen Hemminger wrote:
> > >> > Why use rte_errno?
> > >> > Most DPDK calls just return negative value on error which
> > >> > corresponds to error number.
> > >> > Are you trying to keep ABI compatibility? Doesn't make sense
> > >> > because before all these
> > >> > errors were panics, no working application is going to care.
> > >> 
> > >> Either will work, but I actually prefer this way. I view using rte_errno
> > >> to be better as it can work in just about all cases, including with
> > >> functions which return pointers. This allows you to have a standard
> > >> method across all functions for returning error codes, and it only
> > >> requires a single sentinel value to indicate error, rather than using a
> > >> whole range of values.
> > >
> > > The problem is DPDK is getting more inconsistent on how this is done.
> > > As long as error returns are always same as kernel/glibc errno's it really doesn't
> > > matter much which way the value is returned from a technical point of view
> > > but the inconsistency is sure to be a usability problem and source of errors.
> > 
> > I am using rte_errno here because I assumed it was the preferred
> > method.  In fact, looking at some recently contributed modules (for
> > instance pdump), it seems that folks are using it.
> > 
> > I'm not really sure the purpose of having rte_errno if it isn't used, so
> > it'd be helpful to know if there's some consensus on reflecting errors
> > via this variable, or on returning error codes.  Whichever is the more
> > consistent with the way the DPDK project does things, I'm game :).
> 
> I think we can use both return value and rte_errno.
> We could try to enforce rte_errno as mandatory everywhere.
> 
> Adrien did the recent rte_flow API.
> Please Adrien, could you give your thought?

Sure, actually as already pointed out in this thread, both approaches have
pros and cons depending on the use-case.

Through return value:

Pros
----

- Most common approach used in DPDK today.
- Used internally by the Linux kernel (negative errno) and in the pthreads
  library (positive errno).
- Avoids the need to access an external, global variable requiring its own
  thread-local storage.
- Inherently thread-safe and reentrant (i.e. safe with signal handlers).
- Returned value is also the error code, two facts reported at once.

Cons
----

- Difficult to use with functions returning anything other than signed
  integers with negative values having no other meaning.
- The returned value must be assigned to a local variable in order not to
  discard it and process it later most of the time.
- All function calls must be tested for errors.

Through rte_errno:

Pros
----

- errno-like, well known behavior defined by the C standard and used
  everywhere in the C library.
- Testing return values is not mandatory, e.g. rte_errno can be initialized
  to zero before calling a group of functions and checking its value
  afterward (rte_errno is only updated in case of error).
- Assigning a local variable to store its value is not necessary as long as
  another function that may affect rte_errno is not called.

Cons
----

- Not fully reentrant, thread-safety is fine for most purposes but signal
  handlers affecting it still cause undefined behavior (they must at least
  save and restore its value in case they modify it).
- Accessing non-local storage may affect CPU cycle-sensitive functions such
  as TX/RX burst.
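
The save-and-restore rule for signal handlers can be sketched with the C
library's errno, used here only as an analogy for rte_errno. The demo_*
names are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t demo_got_signal;

/*
 * The handler calls write(2), which may overwrite errno. Saving the value
 * on entry and restoring it on exit keeps the interrupted code's view of
 * errno intact.
 */
static void demo_handler(int signo)
{
    int saved_errno = errno;
    ssize_t rc;

    (void)signo;
    demo_got_signal = 1;
    /* Writing to an invalid descriptor fails and sets errno to EBADF. */
    rc = write(-1, "x", 1);
    (void)rc;
    errno = saved_errno;
}

/* Install demo_handler for SIGUSR1; returns 0 on success, -1 on failure. */
static int demo_install_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = demo_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}
```

Without the two saved_errno lines, code interrupted between a failing call
and its errno check could observe EBADF instead of the real cause.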

My opinion is that rte_errno is best for control path operations while using
the return value makes more sense in the data path. The major issue is
that functions returning anything other than int (e.g. TX/RX burst) cannot
describe any kind of error to the application.

I went with both in rte_flow (return + rte_errno) mostly due to the return
type of a few functions (e.g. rte_flow_create()) and wanted to keep the API
consistent while maintaining compatibility with other DPDK APIs. Note there
is little overhead for API functions to set rte_errno _and_ return its
value, it's mostly free.

I think using both is best also because it leaves applications the choice of
error-handling method, however if I had to pick one I'd go with rte_errno
and standardize on -1 as the default error value (as in the C library).

Below are a bunch of use-case examples to illustrate how rte_errno could
be convenient to applications.

Easily creating many flow rules during init in an all-or-nothing fashion:

 rte_errno = 0;
 for (i = 0; i != num; ++i)
     rule[i] = rte_flow_create(port, ...);
 if (unlikely(rte_errno)) {
     rte_flow_flush(port);
     return -1;
 }

Complete TX packet burst failure with explanation (could also detect partial
failures by initializing rte_errno to 0):

 sent = rte_eth_tx_burst(...);
 if (unlikely(!sent)) {
     switch (rte_errno) {
         case E2BIG:
             // too many packets in burst
         ...
         case EMSGSIZE:
             // first packet is too large
         ...
         case ENOBUFS:
             // TX queue is full
         ...
     }
 }
 
TX burst functions in PMDs could be modified as follows with minimal impact
on their performance and no ABI change:

     uint16_t sent = 0;
     int error; // new variable
 
     [process burst]
     if (unlikely([something went wrong])) { // this check already exists
         error = EPROBLEM; // new assignment
         goto error; // instead of "return sent"
     }
     [process burst]
     return sent;
 error:
     rte_errno = error;
     return sent;
  
Jan Blunck Feb. 1, 2017, 12:06 p.m. UTC | #9
On Wed, Feb 1, 2017 at 11:54 AM, Adrien Mazarguil
<adrien.mazarguil@6wind.com> wrote:
> On Mon, Jan 30, 2017 at 09:19:29PM +0100, Thomas Monjalon wrote:
>> 2017-01-30 13:38, Aaron Conole:
>> > Stephen Hemminger <stephen@networkplumber.org> writes:
>> > > Bruce Richardson <bruce.richardson@intel.com> wrote:
>> > >> On Fri, Jan 27, 2017 at 08:33:46AM -0800, Stephen Hemminger wrote:
>> > >> > Why use rte_errno?
>> > >> > Most DPDK calls just return negative value on error which
>> > >> > corresponds to error number.
>> > >> > Are you trying to keep ABI compatibility? Doesn't make sense
>> > >> > because before all these
>> > >> > errors were panics, no working application is going to care.
>> > >>
>> > >> Either will work, but I actually prefer this way. I view using rte_errno
>> > >> to be better as it can work in just about all cases, including with
>> > >> functions which return pointers. This allows you to have a standard
>> > >> method across all functions for returning error codes, and it only
>> > >> requires a single sentinel value to indicate error, rather than using a
>> > >> whole range of values.
>> > >
>> > > The problem is DPDK is getting more inconsistent on how this is done.
>> > > As long as error returns are always same as kernel/glibc errno's it really doesn't
>> > > matter much which way the value is returned from a technical point of view
>> > > but the inconsistency is sure to be a usability problem and source of errors.
>> >
>> > I am using rte_errno here because I assumed it was the preferred
>> > method.  In fact, looking at some recently contributed modules (for
>> > instance pdump), it seems that folks are using it.
>> >
>> > I'm not really sure the purpose of having rte_errno if it isn't used, so
>> > it'd be helpful to know if there's some consensus on reflecting errors
>> > via this variable, or on returning error codes.  Whichever is the more
>> > consistent with the way the DPDK project does things, I'm game :).
>>
>> I think we can use both return value and rte_errno.
>> We could try to enforce rte_errno as mandatory everywhere.
>>
>> Adrien did the recent rte_flow API.
>> Please Adrien, could you give your thought?
>
> Sure, actually as already pointed out in this thread, both approaches have
> pros and cons depending on the use-case.
>
> Through return value:
>
> Pros
> ----
>
> - Most common approach used in DPDK today.
> - Used internally by the Linux kernel (negative errno) and in the pthreads
>   library (positive errno).
> - Avoids the need to access an external, global variable requiring its own
>   thread-local storage.
> - Inherently thread-safe and reentrant (i.e. safe with signal handlers).
> - Returned value is also the error code, two facts reported at once.

Caller can decide to ignore return value if no error handling is wanted.

>
> Cons
> ----
>
> - Difficult to use with functions returning anything other than signed
>   integers with negative values having no other meaning.
> - The returned value must be assigned to a local variable in order not to
>   discard it and process it later most of the time.

I believe this is a pro, since the rte_errno approach also needs to assign
to a thread-local variable.

> - All function calls must be tested for errors.

The rte_errno approach needs to do this too, to decide whether a value
must be assigned to rte_errno.

>
> Through rte_errno:
>
> Pros
> ----
>
> - errno-like, well known behavior defined by the C standard and used
>   everywhere in the C library.
> - Testing return values is not mandatory, e.g. rte_errno can be initialized
>   to zero before calling a group of functions and checking its value
>   afterward (rte_errno is only updated in case of error).
> - Assigning a local variable to store its value is not necessary as long as
>   another function that may affect rte_errno is not called.
>
> Cons
> ----
>
> - Not fully reentrant, thread-safety is fine for most purposes but signal
>   handlers affecting it still cause undefined behavior (they must at least
>   save and restore its value in case they modify it).
> - Accessing non-local storage may affect CPU cycle-sensitive functions such
>   as TX/RX burst.

Actually, testing for errors means you also have to reset the rte_errno
variable beforehand. That also means you have to access thread-local
storage twice.

Besides that the problem of rte_errno is that you do error handling
twice because the implementation still needs to check for the error
condition before assigning a meaningful error value to rte_errno.
After that again the user code needs to check for the return value to
decide if looking at rte_errno makes any sense.


> My opinion is that rte_errno is best for control path operations while using
> the return value makes more sense in the data path. The major issue is
> that functions returning anything other than int (e.g. TX/RX burst) cannot
> describe any kind of error to the application.
>
> I went with both in rte_flow (return + rte_errno) mostly due to the return
> type of a few functions (e.g. rte_flow_create()) and wanted to keep the API
> consistent while maintaining compatibility with other DPDK APIs. Note there
> is little overhead for API functions to set rte_errno _and_ return its
> value, it's mostly free.
>
> I think using both is best also because it leaves applications the choice of
> error-handling method, however if I had to pick one I'd go with rte_errno
> and standardize on -1 as the default error value (as in the C library).
>
> Below are a bunch of use-case examples to illustrate how rte_errno could
> be convenient to applications.
>
> Easily creating many flow rules during init in an all-or-nothing fashion:
>
>  rte_errno = 0;
>  for (i = 0; i != num; ++i)
>      rule[i] = rte_flow_create(port, ...);
>  if (unlikely(rte_errno)) {
>      rte_flow_flush(port);
>      return -1;
>  }
>
> Complete TX packet burst failure with explanation (could also detect partial
> failures by initializing rte_errno to 0):
>
>  sent = rte_eth_tx_burst(...);
>  if (unlikely(!sent)) {
>      switch (rte_errno) {
>          case E2BIG:
>              // too many packets in burst
>          ...
>          case EMSGSIZE:
>              // first packet is too large
>          ...
>          case ENOBUFS:
>              // TX queue is full
>          ...
>      }
>  }
>
> TX burst functions in PMDs could be modified as follows with minimal impact
> on their performance and no ABI change:
>
>      uint16_t sent = 0;
>      int error; // new variable
>
>      [process burst]
>      if (unlikely([something went wrong])) { // this check already exists
>          error = EPROBLEM; // new assignment
>          goto error; // instead of "return sent"
>      }
>      [process burst]
>      return sent;
>  error:
>      rte_errno = error;
>      return sent;
>
> --
> Adrien Mazarguil
> 6WIND
  
Bruce Richardson Feb. 1, 2017, 2:18 p.m. UTC | #10
On Wed, Feb 01, 2017 at 01:06:03PM +0100, Jan Blunck wrote:
> On Wed, Feb 1, 2017 at 11:54 AM, Adrien Mazarguil
> <adrien.mazarguil@6wind.com> wrote:
> > On Mon, Jan 30, 2017 at 09:19:29PM +0100, Thomas Monjalon wrote:
> >> 2017-01-30 13:38, Aaron Conole:
> >> > Stephen Hemminger <stephen@networkplumber.org> writes:
> >> > > Bruce Richardson <bruce.richardson@intel.com> wrote:
> >> > >> On Fri, Jan 27, 2017 at 08:33:46AM -0800, Stephen Hemminger wrote:
> >> > >> > Why use rte_errno?
> >> > >> > Most DPDK calls just return negative value on error which
> >> > >> > corresponds to error number.
> >> > >> > Are you trying to keep ABI compatibility? That doesn't make
> >> > >> > sense: before, all these errors were panics, so no working
> >> > >> > application is going to care.
> >> > >>
> >> > >> Either will work, but I actually prefer this way. I view using rte_errno
> >> > >> to be better as it can work in just about all cases, including with
> >> > >> functions which return pointers. This allows you to have a standard
> >> > >> method across all functions for returning error codes, and it only
> >> > >> requires a single sentinel value to indicate error, rather than using a
> >> > >> whole range of values.
> >> > >
> >> > > The problem is DPDK is getting more inconsistent on how this is done.
> >> > > As long as error returns are always same as kernel/glibc errno's it really doesn't
> >> > > matter much which way the value is returned from a technical point of view
> >> > > but the inconsistency is sure to be a usability problem and source of errors.
> >> >
> >> > I am using rte_errno here because I assumed it was the preferred
> >> > method.  In fact, looking at some recently contributed modules (for
> >> > instance pdump), it seems that folks are using it.
> >> >
> >> > I'm not really sure of the purpose of having rte_errno if it isn't used, so
> >> > it'd be helpful to know if there's some consensus on reflecting errors
> >> > via this variable, or on returning error codes.  Whichever is the more
> >> > consistent with the way the DPDK project does things, I'm game :).
> >>
> >> I think we can use both return value and rte_errno.
> >> We could try to enforce rte_errno as mandatory everywhere.
> >>
> >> Adrien did the recent rte_flow API.
> >> Please Adrien, could you give your thought?
> >
> > Sure, actually as already pointed out in this thread, both approaches have
> > pros and cons depending on the use-case.
> >
> > Through return value:
> >
> > Pros
> > ----
> >
> > > - Most common approach used in DPDK today.
> > - Used internally by the Linux kernel (negative errno) and in the pthreads
> >   library (positive errno).
> > - Avoids the need to access an external, global variable requiring its own
> >   thread-local storage.
> > - Inherently thread-safe and reentrant (i.e. safe with signal handlers).
> > - Returned value is also the error code, two facts reported at once.
> 
> Caller can decide to ignore return value if no error handling is wanted.
>
Not always the case. With an rx or tx burst call, a negative error value
must be checked for, or in some cases reset to zero, for other logic in
the path to work sanely, e.g. updating an array of stats using the
return value.

> >
> > Cons
> > ----
> >
> > - Difficult to use with functions returning anything other than signed
> >   integers with negative values having no other meaning.
> > - The returned value must be assigned to a local variable in order not to
> >   discard it and process it later most of the time.
> 
> I believe this is a pro, since with rte_errno the value has to be
> assigned to a thread-local variable anyway.

No, it's a con, since for errno the value will be preserved in the
absence of other errors. The application can delay handling the error as
long as it wants, provided nothing causes a subsequent error.
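
A minimal sketch of that deferred check, assuming a thread-local variable
standing in for rte_errno. The names batch_errno, batch_op() and
batch_run() are hypothetical and are not DPDK APIs:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for rte_errno: one value per thread. */
static _Thread_local int batch_errno;

/* Hypothetical operation: fails on negative input, recording the reason. */
static int batch_op(int arg)
{
    if (arg < 0) {
        batch_errno = ERANGE;
        return -1;
    }
    return 0;
}

/* Run every operation, ignore individual return values, check once.
 * Returns 0 if all succeeded, otherwise the last recorded error code. */
static int batch_run(const int *args, size_t n)
{
    batch_errno = 0; /* reset so a stale value is not misread */
    for (size_t i = 0; i < n; i++)
        (void)batch_op(args[i]);
    return batch_errno; /* still set, since no later call overwrote it */
}
```

This only holds as long as the successful calls leave the variable alone,
which is the errno convention being discussed.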

> 
> > - All function calls must be tested for errors.
> 
> The rte_errno approach needs this too, since the implementation must
> test for the error to decide whether to assign a value to rte_errno.
> 
That's inside the called function, not the application. See my earlier
comment above about having to check your return value is in the valid
"logical range" expected from the call. Having a negative number of
packets received does not make logical sense, so you have to check the
return value when updating stats etc.
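
To make the stats point concrete, here is a small sketch comparing the two
conventions. Everything in it (fake_rx_neg(), fake_rx_zero(), stats_errno,
rx_total) is a hypothetical stand-in and not a DPDK API:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for rte_errno. */
static _Thread_local int stats_errno;

/* Running packet counter updated from the burst return value. */
static uint64_t rx_total;

/* Convention 1: negative errno on failure, packet count otherwise. */
static int fake_rx_neg(int fail)
{
    return fail ? -EIO : 4;
}

/* Convention 2: 0 on failure with the reason stored separately. */
static uint16_t fake_rx_zero(int fail)
{
    if (fail) {
        stats_errno = EIO;
        return 0;
    }
    return 4;
}

static void poll_neg(int fail)
{
    int n = fake_rx_neg(fail);

    if (n < 0) /* mandatory: a negative count would corrupt the stats */
        n = 0;
    rx_total += (uint64_t)n;
}

static void poll_zero(int fail)
{
    /* No range check needed: the error case already reads as 0 packets. */
    rx_total += fake_rx_zero(fail);
}
```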


> >
> > Through rte_errno:
> >
> > Pros
> > ----
> >
> > - errno-like, well known behavior defined by the C standard and used
> >   everywhere in the C library.
> > - Testing return values is not mandatory, e.g. rte_errno can be initialized
> >   to zero before calling a group of functions and checking its value
> >   afterward (rte_errno is only updated in case of error).
> > - Assigning a local variable to store its value is not necessary as long as
> >   another function that may affect rte_errno is not called.
> >
> > Cons
> > ----
> >
> > - Not fully reentrant, thread-safety is fine for most purposes but signal
> >   handlers affecting it still cause undefined behavior (they must at least
> >   save and restore its value in case they modify it).
> > - Accessing non-local storage may affect CPU cycle-sensitive functions such
> >   as TX/RX burst.
> 
> Actually, testing for errors means you also have to reset the rte_errno
> variable beforehand. That also means you have to access thread-local
> storage twice.
> 
Not true. Your return value still indicates an error via a single
sentinel value. Only in that case do you (the app) access the global value,
to find out the exact error reason.

> Besides that the problem of rte_errno is that you do error handling
> twice because the implementation still needs to check for the error
> condition before assigning a meaningful error value to rte_errno.
> After that again the user code needs to check for the return value to
> decide if looking at rte_errno makes any sense.
> 
Yes, in the case of an error occurring there will be an extra write to a
global variable, and a subsequent read from that value (which should not
be a problem, as the write will have occurred in the same thread).
However, this is irrelevant to normal path processing. Error should be
the exception not the rule.

> 
> > My opinion is that rte_errno is best for control path operations while using
> > the return value makes more sense in the data path. The major issue is
> > that functions returning anything other than int (e.g. TX/RX burst) cannot
> > describe any kind of error to the application.
> >
> > I went with both in rte_flow (return + rte_errno) mostly due to the return
> > type of a few functions (e.g. rte_flow_create()) and wanted to keep the API
> > consistent while maintaining compatibility with other DPDK APIs. Note there
> > is little overhead for API functions to set rte_errno _and_ return its
> > value, it's mostly free
+1, and error cases should be rare, even if there is a small cost.
.
> >
> > I think using both is best also because it leaves applications the choice of
> > error-handling method, however if I had to pick one I'd go with rte_errno
> > and standardize on -1 as the default error value (as in the C library).
> >
+1
though I think the sentinel value will vary depending on each case. I would
look to keep the standard packet rx/tx functions and ones like them
returning a zero on any error, to simplify programming logic, and also
because in many cases the only real cause of error they can produce is
bad parameters.
Functions returning pointers obviously will use NULL as error value.
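
A rough sketch of the pointer case, where NULL is the single sentinel and
the reason is read only after a failure. The names ptr_errno,
struct demo_rule, demo_rule_create() and demo_try_create() are hypothetical
and not part of DPDK:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for rte_errno. */
static _Thread_local int ptr_errno;

struct demo_rule {
    int id;
};

/* Tiny fixed pool so the example can run out of space. */
static struct demo_rule demo_pool[2];
static size_t demo_used;

/* Returns a rule, or NULL with ptr_errno set to the cause. */
static struct demo_rule *demo_rule_create(int id)
{
    if (id < 0) {
        ptr_errno = EINVAL;
        return NULL;
    }
    if (demo_used == 2) {
        ptr_errno = ENOMEM;
        return NULL;
    }
    demo_pool[demo_used].id = id;
    return &demo_pool[demo_used++];
}

/* Caller side: test the sentinel first, touch ptr_errno only on failure. */
static int demo_try_create(int id)
{
    if (demo_rule_create(id) != NULL)
        return 0;
    return ptr_errno;
}
```

On the success path the thread-local variable is never read or written,
which is the point made above about normal path processing.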


> > Below are a bunch of use-case examples to illustrate how rte_errno could
> > be convenient to applications.
> >
> > > Easily creating many flow rules during init in an all-or-nothing fashion:
> >
> >  rte_errno = 0;
> >  for (i = 0; i != num; ++i)
> >      rule[i] = rte_flow_create(port, ...);
> >  if (unlikely(rte_errno)) {
> >      rte_flow_flush(port);
> >      return -1;
> >  }
> >
> > Complete TX packet burst failure with explanation (could also detect partial
> > failures by initializing rte_errno to 0):
> >
> >  sent = rte_eth_tx_burst(...);
> >  if (unlikely(!sent)) {
> >      switch (rte_errno) {
> >          case E2BIG:
> >              // too many packets in burst
> >          ...
> >          case EMSGSIZE:
> >              // first packet is too large
> >          ...
> >          case ENOBUFS:
> >              // TX queue is full
> >          ...
> >      }
> >  }
> >
> > TX burst functions in PMDs could be modified as follows with minimal impact
> > on their performance and no ABI change:
> >
> >      uint16_t sent = 0;
> >      int error; // new variable
> >
> >      [process burst]
> >      if (unlikely([something went wrong])) { // this check already exists
> >          error = EPROBLEM; // new assignment
> >          goto error; // instead of "return sent"
> >      }
> >      [process burst]
> >      return sent;
> >  error:
> >      rte_errno = error;
> >      return sent;
> >
> > --
> > Adrien Mazarguil
> > 6WIND
  

Patch

diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
index 03fee50..46e427f 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -159,7 +159,29 @@  int rte_eal_iopl_init(void);
  *     function call and should not be further interpreted by the
  *     application.  The EAL does not take any ownership of the memory used
  *     for either the argv array, or its members.
- *   - On failure, a negative error value.
+ *   - On failure, -1 and rte_errno is set to a value indicating the cause
+ *     for failure.
+ *
+ *   The error codes returned via rte_errno:
+ *     EACCES indicates a permissions issue.
+ *
+ *     EAGAIN indicates either a bus or system resource was not available,
+ *            try again.
+ *
+ *     EALREADY indicates that the rte_eal_init function has already been
+ *              called, and cannot be called again.
+ *
+ *     EINVAL indicates invalid parameters were passed as argv/argc.
+ *
+ *     EIO indicates failure to setup the logging handlers.  This is usually
+ *         caused by an out-of-memory condition.
+ *
+ *     ENODEV indicates memory setup issues.
+ *
+ *     ENOTSUP indicates that the EAL cannot initialize on this system.
+ *
+ *     EUNATCH indicates that the PCI bus is either not present, or is not
+ *             readable by the eal.
  */
 int rte_eal_init(int argc, char **argv);
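
For illustration only, an application could turn the codes documented in
this version of the patch into readable hints along these lines. The helper
name eal_init_hint() is hypothetical, the strings paraphrase the comment
above, and the final set of codes may differ from what is listed here:

```c
#include <errno.h>

/* Hypothetical helper: map a code documented in the comment above to a
 * short hint. Not part of DPDK. */
static const char *eal_init_hint(int err)
{
    switch (err) {
    case EACCES:
        return "permissions issue";
    case EAGAIN:
        return "bus or system resource unavailable, try again";
    case EALREADY:
        return "rte_eal_init() was already called";
    case EINVAL:
        return "invalid argc/argv parameters";
    case EIO:
        return "logging setup failed, usually out of memory";
    case ENODEV:
        return "memory setup issue";
    case ENOTSUP:
        return "EAL cannot initialize on this system";
#ifdef EUNATCH /* not defined on every platform */
    case EUNATCH:
        return "PCI bus not present or not readable";
#endif
    default:
        return "unknown failure";
    }
}
```

A caller would check for a negative return from rte_eal_init() first and
only then pass rte_errno to such a helper.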