From patchwork Mon Oct 2 00:12:25 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Li, Xiaoyun" X-Patchwork-Id: 29464 Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 173A91B1B9; Mon, 2 Oct 2017 02:12:32 +0200 (CEST) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by dpdk.org (Postfix) with ESMTP id 42FA61B1B1 for ; Mon, 2 Oct 2017 02:12:30 +0200 (CEST) Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 01 Oct 2017 17:12:29 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos; i="5.42,466,1500966000"; d="scan'208"; a="1177598932" Received: from fmsmsx103.amr.corp.intel.com ([10.18.124.201]) by orsmga001.jf.intel.com with ESMTP; 01 Oct 2017 17:12:28 -0700 Received: from fmsmsx152.amr.corp.intel.com (10.18.125.5) by FMSMSX103.amr.corp.intel.com (10.18.124.201) with Microsoft SMTP Server (TLS) id 14.3.319.2; Sun, 1 Oct 2017 17:12:28 -0700 Received: from shsmsx152.ccr.corp.intel.com (10.239.6.52) by FMSMSX152.amr.corp.intel.com (10.18.125.5) with Microsoft SMTP Server (TLS) id 14.3.319.2; Sun, 1 Oct 2017 17:12:28 -0700 Received: from shsmsx101.ccr.corp.intel.com ([169.254.1.159]) by SHSMSX152.ccr.corp.intel.com ([169.254.6.93]) with mapi id 14.03.0319.002; Mon, 2 Oct 2017 08:12:26 +0800 From: "Li, Xiaoyun" To: "Ananyev, Konstantin" , "Richardson, Bruce" CC: "Lu, Wenzhuo" , "Zhang, Helin" , "dev@dpdk.org" Thread-Topic: [PATCH v3 1/3] eal/x86: run-time dispatch over memcpy Thread-Index: AQHTNps4d1S4LinEtUS+0otIwfUeF6LPKmYAgACLkxA= Date: Mon, 2 Oct 2017 00:12:25 +0000 Message-ID: References: <1506411689-94690-1-git-send-email-xiaoyun.li@intel.com> <1506411689-94690-2-git-send-email-xiaoyun.li@intel.com> <2601191342CEEE43887BDE71AB9772585FAA2BD2@IRSMSX103.ger.corp.intel.com> In-Reply-To: <2601191342CEEE43887BDE71AB9772585FAA2BD2@IRSMSX103.ger.corp.intel.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: dlp-product: dlpe-windows dlp-version: 11.0.0.116 dlp-reaction: no-action x-ctpclassification: CTP_IC x-titus-metadata-40: eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsMyIsImlkIjoiYWFiOTY0ZGMtZDVmYS00MzI1LTk0YTctMWEzODU4ZmY5MDE0IiwicHJvcHMiOlt7Im4iOiJDVFBDbGFzc2lmaWNhdGlvbiIsInZhbHMiOlt7InZhbHVlIjoiQ1RQX0lDIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE2LjUuOS4zIiwiVHJ1c3RlZExhYmVsSGFzaCI6IndyakxKcE9ZeVdJejdSNFBlOFc0blRuWDRTdkJWdkUrN28zS2h2NmFENTg9In0= x-originating-ip: [10.239.127.40] MIME-Version: 1.0 Subject: Re: [dpdk-dev] [PATCH v3 1/3] eal/x86: run-time dispatch over memcpy X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Hi > That means that each file with '#include will have its own > copy > of that function: > $ objdump -d x86_64-native-linuxapp-gcc/app/testpmd | grep > ':' | sort -u | wc -l > 233 > Same story for rte_memcpy_ptr and rte_memcpy_DEFAULT, etc... > Obviously we need (and want) only one copy of that stuff per binary. > > > +#ifdef CC_SUPPORT_AVX2 > > Why do you assume this macro will be defined? > By whom? > There is no such macro with gcc: > $ gcc -march=native -dM -E - &1 | grep AVX2 > #define __AVX2__ 1 > , and you don't define it yourself. > When building with '-march=native' on BDW only rte_memcpy_DEFAULT get > compiled. > I defined it myself. But when I sort the patch, I forgot to modify the file in this version. Sorry about that. It should be like this. To check whether the compiler supports AVX2 or AVX512. > To summarize: as I understand the goal of that patch was > (assuming that our current rte_memcpy() implementation is good in terms of > both performance and functionality): > 1. Based on current rte_memcpy() implementation define 3 x86 arch specific > rte_memcpy flavors: > a) rte_memcpy_SSE > b) rte_memcpy_AVX2 > c) rte_memcpy_AVX512 > 2. Select appropriate flavor based on current HW at runtime, > i.e. both 3 flavors should be present in the binary and selection should be > made > at program startup. > > As I can see none of the goals was achieved with the current patch, > instead a lot of redundant code was introduced. > So I think it is NACK for the current version. > What I think need to be done instead: > > 1. mv lib/librte_eal/common/include/arch/x86/rte_memcpy.h > lib/librte_eal/common/include/arch/x86/rte_memcpy_internal.h > 2. inside rte_memcpy_internal.h rename rte_memcpy() into > rte_memcpy_internal(). > 3. create 3 files: > rte_memcpy_sse.c > rte_memcpy_avx2.c > rte_memcpy_avx512.c > > Inside each of these files we define corresponding rte_memcpy_xxx() > function. > I.E: > rte_memcpy_avx2.c: > .... > #ifndef RTE_MACHINE_CPUFLAG_AVX2 > #error "no avx2 support" > endif > > #include "rte_memcpy_internal.h" > ... > > void * > rte_memcpy_avx2(void *dst, const void *src, size_t n) > { > return rte_memcpy_internal(dst, src, n); > } > > 4. Make changes inside lib/librte_eal/Makefile to ensure that each of > rte_memcpy_xxx() > get build with appropriate -march flags (I.E: avx2 with -mavx2, etc.) > You can use librte_acl/Makefile as a reference. > > 5. Create rte_memcpy.c and put rte_memcpy_ptr/rte_memcpy_init() > definitions in that file. > 6. Create new rte_memcpy.h and define rte_memcpy() in it: > > ... > #include > ... > > +#define RTE_X86_MEMCPY_THRESH 128 > static inline void * > rte_memcpy(void *dst, const void *src, size_t n) > { > if (n <= RTE_X86_MEMCPY_THRESH) > return rte_memcpy_internal(dst, src, n); > else > return (*rte_memcpy_ptr)(dst, src, n); > } > > 7. Test it properly - i.e. build dpdk with default target and make sure each of > 3 flavors > could be selected properly at runtime based on underlying arch. > > 8. As a possible future improvement - with such changes we don't need a > generic inline > implementation. We can think about creating a faster version that need to > copy > <= 128B. > > Konstantin > Will modify it in next version. Thanks. diff --git a/mk/rte.cpuflags.mk b/mk/rte.cpuflags.mk index a813c91..92399ec 100644 --- a/mk/rte.cpuflags.mk +++ b/mk/rte.cpuflags.mk @@ -141,3 +141,17 @@ space:= $(empty) $(empty) CPUFLAGSTMP1 := $(addprefix RTE_CPUFLAG_,$(CPUFLAGS)) CPUFLAGSTMP2 := $(subst $(space),$(comma),$(CPUFLAGSTMP1)) CPUFLAGS_LIST := -DRTE_COMPILE_TIME_CPUFLAGS=$(CPUFLAGSTMP2) + +# Check if the compiler supports AVX512. +CC_SUPPORT_AVX512 := $(shell $(CC) -march=skylake-avx512 -dM -E - < /dev/null 2>&1 | grep -q AVX512 && echo 1) +ifeq ($(CC_SUPPORT_AVX512),1) +ifeq ($(CONFIG_RTE_ENABLE_AVX512),y) +MACHINE_CFLAGS += -DCC_SUPPORT_AVX512 +endif +endif + +# Check if the compiler supports AVX2. +CC_SUPPORT_AVX2 := $(shell $(CC) -march=core-avx2 -dM -E - < /dev/null 2>&1 | grep -q AVX2 && echo 1) +ifeq ($(CC_SUPPORT_AVX2),1) +MACHINE_CFLAGS += -DCC_SUPPORT_AVX2 +endif