[dpdk-dev] eal: register rte_panic user callback

Message ID 1520360928-9375-1-git-send-email-arnon@qwilt.com (mailing list archive)
State Superseded, archived
Headers

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Arnon Warshavsky March 6, 2018, 6:28 p.m. UTC
  The use case addressed here is dpdk environment init
aborting the process due to panic,
preventing the calling process from running its own tear-down actions.
A preferred, though ABI breaking solution would be
to have the environment init always return a value
rather than abort upon distress.

This patch defines a couple of callback registration functions,
one for panic and one for exit
in case one wishes to distinguish between these events.
Once a callback is set and panic takes place,
it will be called prior to calling abort.

Maiden voyage patch for Qwilt and myself.

Signed-off-by: Arnon Warshavsky <arnon@qwilt.com>
---
 lib/librte_eal/bsdapp/eal/eal_debug.c     | 37 ++++++++++++++++++++++++++++++
 lib/librte_eal/common/include/rte_debug.h | 24 +++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_debug.c   | 38 +++++++++++++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map        |  2 ++
 4 files changed, 101 insertions(+)
  

Comments

Thomas Monjalon March 7, 2018, 8:32 a.m. UTC | #1
Hi,

06/03/2018 19:28, Arnon Warshavsky:
> The use case addressed here is dpdk environment init
> aborting the process due to panic,
> preventing the calling process from running its own tear-down actions.

Thank you for working on this long standing issue.

> A preferred, though ABI breaking solution would be
> to have the environment init always return a value
> rather than abort upon distress.

Yes, it is the preferred solution.
We should not use exit (panic & co) inside a library.
It is important enough to break the API.
I would be in favor of accepting such breakage in 18.05.

> This patch defines a couple of callback registration functions,
> one for panic and one for exit
> in case one wishes to distinguish between these events.
> Once a callback is set and panic takes place,
> it will be called prior to calling abort.
> 
> Maiden voyage patch for Qwilt and myself.

Are you OK to visit the other side of the solution?
  
Arnon Warshavsky March 7, 2018, 8:57 a.m. UTC | #2
>
>
> Are you OK to visit the other side of the solution?
>
>
Sure. If no one is emotionally attached to those panic aborts,
this patch can be discarded and I will create a new one with the license to
break.
  
Anatoly Burakov March 7, 2018, 9:05 a.m. UTC | #3
On 07-Mar-18 8:32 AM, Thomas Monjalon wrote:
> Hi,
> 
> 06/03/2018 19:28, Arnon Warshavsky:
>> The use case addressed here is dpdk environment init
>> aborting the process due to panic,
>> preventing the calling process from running its own tear-down actions.
> 
> Thank you for working on this long standing issue.
> 
>> A preferred, though ABI breaking solution would be
>> to have the environment init always return a value
>> rather than abort upon distress.
> 
> Yes, it is the preferred solution.
> We should not use exit (panic & co) inside a library.
> It is important enough to break the API.

+1, panic exists mostly for historical reasons AFAIK. it's a pity i 
didn't think of it at the time of submitting the memory hotplug RFC, 
because i now hit the same issue with the v1 - we might panic while 
holding a lock, and didn't realize that it was an API break to change 
this behavior.

Can this really go into current release without deprecation notices?
  
Thomas Monjalon March 7, 2018, 9:59 a.m. UTC | #4
07/03/2018 10:05, Burakov, Anatoly:
> On 07-Mar-18 8:32 AM, Thomas Monjalon wrote:
> > Hi,
> > 
> > 06/03/2018 19:28, Arnon Warshavsky:
> >> The use case addressed here is dpdk environment init
> >> aborting the process due to panic,
> >> preventing the calling process from running its own tear-down actions.
> > 
> > Thank you for working on this long standing issue.
> > 
> >> A preferred, though ABI breaking solution would be
> >> to have the environment init always return a value
> >> rather than abort upon distress.
> > 
> > Yes, it is the preferred solution.
> > We should not use exit (panic & co) inside a library.
> > It is important enough to break the API.
> 
> +1, panic exists mostly for historical reasons AFAIK. it's a pity i 
> didn't think of it at the time of submitting the memory hotplug RFC, 
> because i now hit the same issue with the v1 - we might panic while 
> holding a lock, and didn't realize that it was an API break to change 
> this behavior.
> 
> Can this really go into current release without deprecation notices?

If such an exception is done, it must be approved by the technical board.
We need to check few criterias:
	- which functions need to be changed
	- how the application is impacted
	- what is the urgency

If a panic is removed and the application is not already checking some
error code, the execution will continue without considering the error.

Some rte_panic could be probably removed without any impact on applications.
Some rte_panic could wait for 18.08 with a notice in 18.05.
If some rte_panic cannot wait, it must be discussed specifically.
  
Arnon Warshavsky March 7, 2018, 11:02 a.m. UTC | #5
> > Can this really go into current release without deprecation notices?
>
> If such an exception is done, it must be approved by the technical board.
> We need to check few criterias:
>         - which functions need to be changed
>         - how the application is impacted
>         - what is the urgency
>
> If a panic is removed and the application is not already checking some
> error code, the execution will continue without considering the error.
>

> Some rte_panic could be probably removed without any impact on
> applications.
> Some rte_panic could wait for 18.08 with a notice in 18.05.
> If some rte_panic cannot wait, it must be discussed specifically.
>

Every panic removal must be handled all the way up in all call paths.
If not all instances can be removed at once in 18.05 (which seems to be the
case)
maybe we should keep the callback patch until all the remains are gone.
  
Anatoly Burakov March 7, 2018, 11:29 a.m. UTC | #6
On 07-Mar-18 9:59 AM, Thomas Monjalon wrote:
> 07/03/2018 10:05, Burakov, Anatoly:
>> On 07-Mar-18 8:32 AM, Thomas Monjalon wrote:
>>> Hi,
>>>
>>> 06/03/2018 19:28, Arnon Warshavsky:
>>>> The use case addressed here is dpdk environment init
>>>> aborting the process due to panic,
>>>> preventing the calling process from running its own tear-down actions.
>>>
>>> Thank you for working on this long standing issue.
>>>
>>>> A preferred, though ABI breaking solution would be
>>>> to have the environment init always return a value
>>>> rather than abort upon distress.
>>>
>>> Yes, it is the preferred solution.
>>> We should not use exit (panic & co) inside a library.
>>> It is important enough to break the API.
>>
>> +1, panic exists mostly for historical reasons AFAIK. it's a pity i
>> didn't think of it at the time of submitting the memory hotplug RFC,
>> because i now hit the same issue with the v1 - we might panic while
>> holding a lock, and didn't realize that it was an API break to change
>> this behavior.
>>
>> Can this really go into current release without deprecation notices?
> 
> If such an exception is done, it must be approved by the technical board.
> We need to check few criterias:
> 	- which functions need to be changed
> 	- how the application is impacted
> 	- what is the urgency
> 
> If a panic is removed and the application is not already checking some
> error code, the execution will continue without considering the error.
> 
> Some rte_panic could be probably removed without any impact on applications.
> Some rte_panic could wait for 18.08 with a notice in 18.05.
> If some rte_panic cannot wait, it must be discussed specifically.
> 

Can we add a compile warning for adding new rte_panic's into code? It's 
a nice tool while debugging, but it probably shouldn't be in any new 
production code.
  
Arnon Warshavsky March 7, 2018, 1:23 p.m. UTC | #7
>
> Can we add a compile warning for adding new rte_panic's into code? It's a
> nice tool while debugging, but it probably shouldn't be in any new
> production code.
>

I thought about renaming the current function and calls to something like
deprecated_rte_panic()
, and keep the old API with __rte_deprecated.
Is this kind of API break acceptable?
  
Thomas Monjalon March 7, 2018, 3:04 p.m. UTC | #8
07/03/2018 12:02, Arnon Warshavsky:
> > > Can this really go into current release without deprecation notices?
> >
> > If such an exception is done, it must be approved by the technical board.
> > We need to check few criterias:
> >         - which functions need to be changed
> >         - how the application is impacted
> >         - what is the urgency
> >
> > If a panic is removed and the application is not already checking some
> > error code, the execution will continue without considering the error.
> >
> 
> > Some rte_panic could be probably removed without any impact on
> > applications.
> > Some rte_panic could wait for 18.08 with a notice in 18.05.
> > If some rte_panic cannot wait, it must be discussed specifically.
> >
> 
> Every panic removal must be handled all the way up in all call paths.
> If not all instances can be removed at once in 18.05 (which seems to be the
> case)
> maybe we should keep the callback patch until all the remains are gone.

Why introducing a new API for a temporary solution?
It has always been like that, so the remaining occurences could wait
one more release, isn't it?
  
Thomas Monjalon March 7, 2018, 3:06 p.m. UTC | #9
07/03/2018 14:23, Arnon Warshavsky:
> >
> > Can we add a compile warning for adding new rte_panic's into code? It's a
> > nice tool while debugging, but it probably shouldn't be in any new
> > production code.

Yes could be nice to automatically detect it in drivers/ or lib/ directories.


> I thought about renaming the current function and calls to something like
> deprecated_rte_panic()
> , and keep the old API with __rte_deprecated.
> Is this kind of API break acceptable?

No, rte_panic can be used in applications.
  
Arnon Warshavsky March 7, 2018, 4:26 p.m. UTC | #10
> > maybe we should keep the callback patch until all the remains are gone.
>
> Why introducing a new API for a temporary solution?
> It has always been like that, so the remaining occurences could wait
> one more release, isn't it?
>
> Yes.
I guess I am over excited to get rid of my local changes faster :)
  
Arnon Warshavsky March 7, 2018, 4:31 p.m. UTC | #11
> > Can we add a compile warning for adding new rte_panic's into code? It's
a

> > > nice tool while debugging, but it probably shouldn't be in any new
> > > production code.
>
> Yes could be nice to automatically detect it in drivers/ or lib/
> directories.
>

How do we apply a warning only to new code? via checkpatch?
  
Thomas Monjalon March 7, 2018, 5:17 p.m. UTC | #12
07/03/2018 17:31, Arnon Warshavsky:
> > > Can we add a compile warning for adding new rte_panic's into code? It's
> a
> 
> > > > nice tool while debugging, but it probably shouldn't be in any new
> > > > production code.
> >
> > Yes could be nice to automatically detect it in drivers/ or lib/
> > directories.
> >
> 
> How do we apply a warning only to new code? via checkpatch?

Yes in devtools/checkpatches.sh
  

Patch

diff --git a/lib/librte_eal/bsdapp/eal/eal_debug.c b/lib/librte_eal/bsdapp/eal/eal_debug.c
index 5d92500..010859d 100644
--- a/lib/librte_eal/bsdapp/eal/eal_debug.c
+++ b/lib/librte_eal/bsdapp/eal/eal_debug.c
@@ -18,6 +18,39 @@ 
 
 #define BACKTRACE_SIZE 256
 
+/*
+ * user function pointers that when assigned, gets to be called
+ * during ret_exit()
+ */
+static rte_user_abort_callback_t *exit_user_callback;
+
+/*
+ * user function pointers that when assigned, gets to be called
+ * during ret_panic()
+ */
+static rte_user_abort_callback_t *panic_user_callback;
+
+/**
+ * Register user callback function to be called during rte_panic()
+ * Deregisteration is by passing NULL as the parameter
+ */
+void __rte_experimental
+rte_panic_user_callback_register(rte_user_abort_callback_t *cb)
+{
+	panic_user_callback = cb;
+}
+
+/**
+ * Register user callback function to be called during rte_exit()
+ * Deregisteration is by passing NULL as the parameter
+ */
+void __rte_experimental
+rte_exit_user_callback_register(rte_user_abort_callback_t *cb)
+{
+	exit_user_callback = cb;
+}
+
+
 /* dump the stack of the calling core */
 void rte_dump_stack(void)
 {
@@ -59,6 +92,8 @@  void __rte_panic(const char *funcname, const char *format, ...)
 	va_end(ap);
 	rte_dump_stack();
 	rte_dump_registers();
+	if (panic_user_callback)
+		(*panic_user_callback)();
 	abort();
 }
 
@@ -78,6 +113,8 @@  rte_exit(int exit_code, const char *format, ...)
 	va_start(ap, format);
 	rte_vlog(RTE_LOG_CRIT, RTE_LOGTYPE_EAL, format, ap);
 	va_end(ap);
+	if (exit_user_callback)
+		(*exit_user_callback)();
 
 #ifndef RTE_EAL_ALWAYS_PANIC_ON_ERROR
 	if (rte_eal_cleanup() != 0)
diff --git a/lib/librte_eal/common/include/rte_debug.h b/lib/librte_eal/common/include/rte_debug.h
index 272df49..7e3d0a2 100644
--- a/lib/librte_eal/common/include/rte_debug.h
+++ b/lib/librte_eal/common/include/rte_debug.h
@@ -16,11 +16,35 @@ 
 
 #include "rte_log.h"
 #include "rte_branch_prediction.h"
+#include <rte_compat.h>
 
 #ifdef __cplusplus
 extern "C" {
 #endif
 
+
+/*
+ * Definition of user function pointer type to be called during
+ * the execution of rte_panic
+ */
+
+typedef void  (*rte_user_abort_callback_t)(void);
+/**< @internal Ethernet device configuration. */
+
+/**
+ * Register user callback function to be called during rte_panic()
+ * Deregisteration is by passing NULL as the parameter
+ */
+void __rte_experimental
+rte_panic_user_callback_register(rte_user_abort_callback_t *cb);
+
+/**
+ * Register user callback function to be called during rte_exit()
+ * Deregisteration is by passing NULL as the parameter
+ */
+void __rte_experimental
+rte_exit_user_callback_register(rte_user_abort_callback_t *cb);
+
 /**
  * Dump the stack of the calling core to the console.
  */
diff --git a/lib/librte_eal/linuxapp/eal/eal_debug.c b/lib/librte_eal/linuxapp/eal/eal_debug.c
index 5d92500..b1748b8 100644
--- a/lib/librte_eal/linuxapp/eal/eal_debug.c
+++ b/lib/librte_eal/linuxapp/eal/eal_debug.c
@@ -16,8 +16,42 @@ 
 #include <rte_common.h>
 #include <rte_eal.h>
 
+
 #define BACKTRACE_SIZE 256
 
+/*
+ * user function pointers that when assigned, gets to be called
+ * during ret_exit()
+ */
+static rte_user_abort_callback_t *exit_user_callback;
+
+/*
+ * user function pointers that when assigned, gets to be called
+ * during ret_panic()
+ */
+static rte_user_abort_callback_t *panic_user_callback;
+
+/**
+ * Register user callback function to be called during rte_panic()
+ * Deregisteration is by passing NULL as the parameter
+ */
+void __rte_experimental
+rte_panic_user_callback_register(rte_user_abort_callback_t *cb)
+{
+	panic_user_callback = cb;
+}
+
+/**
+ * Register user callback function to be called during rte_exit()
+ * Deregisteration is by passing NULL as the parameter
+ */
+void __rte_experimental
+rte_exit_user_callback_register(rte_user_abort_callback_t *cb)
+{
+	exit_user_callback = cb;
+}
+
+
 /* dump the stack of the calling core */
 void rte_dump_stack(void)
 {
@@ -59,6 +93,8 @@  void __rte_panic(const char *funcname, const char *format, ...)
 	va_end(ap);
 	rte_dump_stack();
 	rte_dump_registers();
+	if (panic_user_callback)
+		(*panic_user_callback)();
 	abort();
 }
 
@@ -78,6 +114,8 @@  rte_exit(int exit_code, const char *format, ...)
 	va_start(ap, format);
 	rte_vlog(RTE_LOG_CRIT, RTE_LOGTYPE_EAL, format, ap);
 	va_end(ap);
+	if (exit_user_callback)
+		(*exit_user_callback)();
 
 #ifndef RTE_EAL_ALWAYS_PANIC_ON_ERROR
 	if (rte_eal_cleanup() != 0)
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index d123602..7b8f55d 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -221,11 +221,13 @@  EXPERIMENTAL {
 	rte_eal_hotplug_add;
 	rte_eal_hotplug_remove;
 	rte_eal_mbuf_user_pool_ops;
+	rte_exit_user_callback_register;
 	rte_mp_action_register;
 	rte_mp_action_unregister;
 	rte_mp_sendmsg;
 	rte_mp_request;
 	rte_mp_reply;
+	rte_panic_user_callback_register;
 	rte_service_attr_get;
 	rte_service_attr_reset_all;
 	rte_service_component_register;