mk: using initial-exec model for thread local variable
Commit Message
When building shared libraries, the thread-local storage (TLS) model
changes to global-dynamic, which adds cost to every read of a
thread-local variable. On the other hand, dynamically loading a shared
library that uses static TLS requires an additional DTV slot, and the
loader limits how many are available. For now, only librte_pmd_eal.so
contains thread-local variables, so the TLS model can be switched back
to initial-exec, as in static libraries, for better performance.
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Comments
05/07/2018 16:13, Marvin Liu:
> When building shared libraries, the thread-local storage (TLS) model
> changes to global-dynamic, which adds cost to every read of a
> thread-local variable. On the other hand, dynamically loading a shared
> library that uses static TLS requires an additional DTV slot, and the
> loader limits how many are available. For now, only librte_pmd_eal.so
> contains thread-local variables, so the TLS model can be switched back
> to initial-exec, as in static libraries, for better performance.
>
> Signed-off-by: Marvin Liu <yong.liu@intel.com>
>
> diff --git a/mk/toolchain/gcc/rte.vars.mk b/mk/toolchain/gcc/rte.vars.mk
> index 7e4531bab..19d5e11ef 100644
> --- a/mk/toolchain/gcc/rte.vars.mk
> +++ b/mk/toolchain/gcc/rte.vars.mk
Is it only for GCC? Not clang?
> +# Initial execution TLS model has better performane compared to dynamic
> +# global. But this model require for addtional slot on DTV when dlopen
> +# object with thread local variable.
Few typos in this comment.
> +ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)
> +TOOLCHAIN_CFLAGS += -ftls-model=initial-exec
> +endif
We really need more testing or review of this patch.
Cc techboard: do we take the risk of getting it into RC1
without review? It has been waiting for a long time.
>
> When building shared libraries, the thread-local storage (TLS) model changes to
> global-dynamic, which adds cost to every read of a thread-local variable.
> On the other hand, dynamically loading a shared library that uses static TLS
> requires an additional DTV slot, and the loader limits how many are available.
> For now, only librte_pmd_eal.so contains thread-local variables, so the TLS model
> can be switched back to initial-exec, as in static libraries, for better performance.
>
> Signed-off-by: Marvin Liu <yong.liu@intel.com>
>
> diff --git a/mk/toolchain/gcc/rte.vars.mk b/mk/toolchain/gcc/rte.vars.mk
> index 7e4531bab..19d5e11ef 100644
> --- a/mk/toolchain/gcc/rte.vars.mk
> +++ b/mk/toolchain/gcc/rte.vars.mk
> @@ -43,6 +43,13 @@ ifeq (,$(findstring -O0,$(EXTRA_CFLAGS)))
> endif
> endif
>
> +# Initial execution TLS model has better performane compared to dynamic
> +# global. But this model require for addtional slot on DTV when dlopen
> +# object with thread local variable.
> +ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)
> +TOOLCHAIN_CFLAGS += -ftls-model=initial-exec
> +endif
> +
[Sachin Saxena] Using the initial-exec model for a shared object is not recommended. If you link a shared object containing IE-model code, the object will have the DF_STATIC_TLS flag set. By the spec, this means that dlopen() might refuse to load it if its TLS usage is greater than the available static TLS space.
This is what happened when I tried to validate this change on an ARM64-based NXP platform with the VPP-DPDK solution. VPP initialization fails with the following error:
"load_one_plugin:145: /usr/lib/vpp_plugins/dpdk_plugin.so: cannot allocate memory in static TLS block"
Note that the DPDK dpaa2 driver and VPP both use TLS variables quite heavily. When the DPDK shared object is forced to the initial-exec model, the VPP static TLS space gets exhausted and dlopen() returns an error while trying to load the DPDK object.
For the same reason, when we use "-fPIC" the default TLS model changes from "initial-exec" to "global-dynamic".
In my opinion, this change should not be merged as it breaks basic functionality.
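
The failure mode described above can be made visible in any application that loads plugins at run time. The following is a minimal sketch, not VPP's or DPDK's actual loader code: try_load_plugin() is a hypothetical helper, and the flags passed to dlopen() are an assumption. It only shows that when dlopen() refuses an object, for example because the static TLS block is exhausted, the loader's message is available through dlerror().

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: try to load a shared object and, on failure,
 * copy the loader's diagnostic into err.
 * Returns 0 on success and -1 on failure.
 */
int try_load_plugin(const char *path, char *err, size_t err_len)
{
	void *handle;
	const char *msg;

	dlerror(); /* clear any stale error state */
	handle = dlopen(path, RTLD_NOW | RTLD_GLOBAL);
	if (handle == NULL) {
		/* dlerror() describes why the load was refused */
		msg = dlerror();
		snprintf(err, err_len, "%s",
			 msg != NULL ? msg : "unknown dlopen error");
		return -1;
	}

	if (err_len > 0)
		err[0] = '\0';
	dlclose(handle);
	return 0;
}
```

A caller would log the contents of err when the helper returns -1. On older glibc versions the program may need to be linked with -ldl.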
> WERROR_FLAGS := -W -Wall -Wstrict-prototypes -Wmissing-prototypes
> WERROR_FLAGS += -Wmissing-declarations -Wold-style-definition -Wpointer-arith
> WERROR_FLAGS += -Wcast-align -Wnested-externs -Wcast-qual
> --
> 2.17.0
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Sachin Saxena
> Sent: Thursday, July 05, 2018 10:46 PM
> To: Liu, Yong <yong.liu@intel.com>; Yang, Zhiyong <zhiyong.yang@intel.com>;
> thomas@monjalon.net; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] mk: using initial-exec model for thread
> local variable
Thanks for your opinion, Sachin.
The IE model may cause problems when a shared object is opened with dlopen(). On the other hand, it can benefit performance.
It would be better to keep the current working setting; users may change it themselves.
Regards,
Marvin
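
One way a user could opt in without changing the build-wide flag is to select the TLS model per variable. The sketch below assumes a GCC- or clang-compatible compiler, both of which accept the tls_model variable attribute. The variable and function names are hypothetical and are not DPDK symbols.

```c
/*
 * Hypothetical per-thread counter. The attribute requests the
 * initial-exec model for this variable only; every other thread-local
 * variable keeps the compiler's default model.
 */
static __thread unsigned int per_thread_hits
	__attribute__((tls_model("initial-exec")));

/* Increment the calling thread's counter and return the new value. */
unsigned int record_hit(void)
{
	return ++per_thread_hits;
}

/* Read the calling thread's counter without modifying it. */
unsigned int current_hits(void)
{
	return per_thread_hits;
}
```

A variable marked this way in a shared object still consumes static TLS space, so the dlopen() limitation raised in this thread applies to it as well. The attribute only narrows the trade-off to the variables where the faster access matters.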
On Fri, Jul 06, 2018 at 02:22:14AM +0000, Liu, Yong wrote:
>
> Thanks for your opinion, Sachin.
> The IE model may cause problems when a shared object is opened with dlopen(). On the other hand, it can benefit performance.
> It would be better to keep the current working setting; users may change it themselves.
>
What is the performance delta, and where is it most seen? I suggest for
future patches like this, that the commit message itself should give a
rough/approx indication of the perf impacts.
/Bruce
@@ -43,6 +43,13 @@ ifeq (,$(findstring -O0,$(EXTRA_CFLAGS)))
endif
endif
+# Initial execution TLS model has better performane compared to dynamic
+# global. But this model require for addtional slot on DTV when dlopen
+# object with thread local variable.
+ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)
+TOOLCHAIN_CFLAGS += -ftls-model=initial-exec
+endif
+
WERROR_FLAGS := -W -Wall -Wstrict-prototypes -Wmissing-prototypes
WERROR_FLAGS += -Wmissing-declarations -Wold-style-definition -Wpointer-arith
WERROR_FLAGS += -Wcast-align -Wnested-externs -Wcast-qual