common/mlx5: adjust fork call with the new kernel API

Message ID 20230524120140.416144-1-erezf@nvidia.com (mailing list archive)
State Accepted, archived
Delegated to: Raslan Darawsheh
Headers
Series common/mlx5: adjust fork call with the new kernel API |

Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/github-robot: build success github build: passed
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-abi-testing success Testing PASS
ci/iol-testing success Testing PASS
ci/iol-unit-testing success Testing PASS
ci/iol-x86_64-unit-testing success Testing PASS
ci/iol-x86_64-compile-testing success Testing PASS
ci/iol-aarch64-unit-testing success Testing PASS
ci/iol-aarch64-compile-testing success Testing PASS
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/intel-Functional success Functional PASS

Commit Message

erezf@nvidia.com May 24, 2023, 12:01 p.m. UTC
From: Erez Ferber <erezf@nvidia.com>

While doing process fork() the operating system remaps all the parent
process's memory to the address space of the child process and activates
the Copy-on-Write mechanics - it duplicates physical pages once memory
writing happens in the child process. Sometimes memory duplication is
not allowed - for example, if the page contains hardware queue
descriptors. To handle similar issues the rdma-core library should be
prepared for forking.

The ibv_fork_init() prepares the library to track all the related memory
and prevent it from forking using madvise() system API. This approach
allows fork, but not all the memory is forked to the child process and,
application should care not to touch pages where the parent application
allocated the rdma-core objects.

The newer kernels propose an option of copy-on-fork for DMA pages and
tracking all the memory and disabling it for the forking is no longer
needed. The new API routine ibv_is_fork_initialized() should be involved
to decide if library initialization for forking is required.

Fixes: 0e83b8e536 ("net/mlx5: move rdma-core calls to separate file")
Cc: stable@dpdk.org
Signed-off-by: Erez Ferber <erezf@nvidia.com>
---
 drivers/common/mlx5/linux/meson.build | 2 ++
 drivers/common/mlx5/linux/mlx5_glue.c | 4 ++++
 2 files changed, 6 insertions(+)
  

Comments

Stephen Hemminger May 24, 2023, 2:50 p.m. UTC | #1
On Wed, 24 May 2023 15:01:40 +0300
<erezf@nvidia.com> wrote:

> From: Erez Ferber <erezf@nvidia.com>
> 
> While doing process fork() the operating system remaps all the parent
> process's memory to the address space of the child process and activates
> the Copy-on-Write mechanics - it duplicates physical pages once memory
> writing happens in the child process. Sometimes memory duplication is
> not allowed - for example, if the page contains hardware queue
> descriptors. To handle similar issues the rdma-core library should be
> prepared for forking.
> 
> The ibv_fork_init() prepares the library to track all the related memory
> and prevent it from forking using madvise() system API. This approach
> allows fork, but not all the memory is forked to the child process and,
> application should care not to touch pages where the parent application
> allocated the rdma-core objects.
> 
> The newer kernels propose an option of copy-on-fork for DMA pages and
> tracking all the memory and disabling it for the forking is no longer
> needed. The new API routine ibv_is_fork_initialized() should be involved
> to decide if library initialization for forking is required.
> 
> Fixes: 0e83b8e536 ("net/mlx5: move rdma-core calls to separate file")
> Cc: stable@dpdk.org
> Signed-off-by: Erez Ferber <erezf@nvidia.com>

I don't think DPDK applications should fork(), and lots other
parts of the shared huge pages will break if an application does this.
  
Slava Ovsiienko May 24, 2023, 4:05 p.m. UTC | #2
> -----Original Message-----
> From: Erez Ferber <erezf@nvidia.com>
> Sent: Wednesday, May 24, 2023 3:02 PM
> To: dev@dpdk.org
> Cc: Slava Ovsiienko <viacheslavo@nvidia.com>; Matan Azrad
> <matan@nvidia.com>; Raslan Darawsheh <rasland@nvidia.com>; Erez Ferber
> <erezf@nvidia.com>; stable@dpdk.org
> Subject: [PATCH] common/mlx5: adjust fork call with the new kernel API
> 
> From: Erez Ferber <erezf@nvidia.com>
> 
> While doing process fork() the operating system remaps all the parent
> process's memory to the address space of the child process and activates the
> Copy-on-Write mechanics - it duplicates physical pages once memory writing
> happens in the child process. Sometimes memory duplication is not allowed -
> for example, if the page contains hardware queue descriptors. To handle
> similar issues the rdma-core library should be prepared for forking.
> 
> The ibv_fork_init() prepares the library to track all the related memory and
> prevent it from forking using madvise() system API. This approach allows fork,
> but not all the memory is forked to the child process and, application should
> care not to touch pages where the parent application allocated the rdma-core
> objects.
> 
> The newer kernels propose an option of copy-on-fork for DMA pages and
> tracking all the memory and disabling it for the forking is no longer needed.
> The new API routine ibv_is_fork_initialized() should be involved to decide if
> library initialization for forking is required.
> 
> Fixes: 0e83b8e536 ("net/mlx5: move rdma-core calls to separate file")
> Cc: stable@dpdk.org
> Signed-off-by: Erez Ferber <erezf@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
  
Slava Ovsiienko May 25, 2023, 8:10 a.m. UTC | #3
> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Wednesday, May 24, 2023 5:50 PM
> To: Erez Ferber <erezf@nvidia.com>
> Cc: dev@dpdk.org; Slava Ovsiienko <viacheslavo@nvidia.com>; Matan Azrad
> <matan@nvidia.com>; Raslan Darawsheh <rasland@nvidia.com>;
> stable@dpdk.org
> Subject: Re: [PATCH] common/mlx5: adjust fork call with the new kernel API
> 
> On Wed, 24 May 2023 15:01:40 +0300
> <erezf@nvidia.com> wrote:
> 
> > From: Erez Ferber <erezf@nvidia.com>
> >
> > While doing process fork() the operating system remaps all the parent
> > process's memory to the address space of the child process and
> > activates the Copy-on-Write mechanics - it duplicates physical pages
> > once memory writing happens in the child process. Sometimes memory
> > duplication is not allowed - for example, if the page contains
> > hardware queue descriptors. To handle similar issues the rdma-core
> > library should be prepared for forking.
> >
> > The ibv_fork_init() prepares the library to track all the related
> > memory and prevent it from forking using madvise() system API. This
> > approach allows fork, but not all the memory is forked to the child
> > process and, application should care not to touch pages where the
> > parent application allocated the rdma-core objects.
> >
> > The newer kernels propose an option of copy-on-fork for DMA pages and
> > tracking all the memory and disabling it for the forking is no longer
> > needed. The new API routine ibv_is_fork_initialized() should be
> > involved to decide if library initialization for forking is required.
> >
> > Fixes: 0e83b8e536 ("net/mlx5: move rdma-core calls to separate file")
> > Cc: stable@dpdk.org
> > Signed-off-by: Erez Ferber <erezf@nvidia.com>
> 
Hi,

> I don't think DPDK applications should fork(), and lots other parts of the
> shared huge pages will break if an application does this.

I agree - application should not, we have the secondary/primary processes approach.
Nonetheless, we have the real use case - DPDK application does fork() and works well.
Without mlx5 PMD 😊. With mlx5 it ran into some troubles. Now we have the solution.

With best regards,
Slava
  
Stephen Hemminger May 25, 2023, 3:27 p.m. UTC | #4
On Thu, 25 May 2023 08:10:03 +0000
Slava Ovsiienko <viacheslavo@nvidia.com> wrote:

> > -----Original Message-----
> > From: Stephen Hemminger <stephen@networkplumber.org>
> > Sent: Wednesday, May 24, 2023 5:50 PM
> > To: Erez Ferber <erezf@nvidia.com>
> > Cc: dev@dpdk.org; Slava Ovsiienko <viacheslavo@nvidia.com>; Matan Azrad
> > <matan@nvidia.com>; Raslan Darawsheh <rasland@nvidia.com>;
> > stable@dpdk.org
> > Subject: Re: [PATCH] common/mlx5: adjust fork call with the new kernel API
> > 
> > On Wed, 24 May 2023 15:01:40 +0300
> > <erezf@nvidia.com> wrote:
> >   
> > > From: Erez Ferber <erezf@nvidia.com>
> > >
> > > While doing process fork() the operating system remaps all the parent
> > > process's memory to the address space of the child process and
> > > activates the Copy-on-Write mechanics - it duplicates physical pages
> > > once memory writing happens in the child process. Sometimes memory
> > > duplication is not allowed - for example, if the page contains
> > > hardware queue descriptors. To handle similar issues the rdma-core
> > > library should be prepared for forking.
> > >
> > > The ibv_fork_init() prepares the library to track all the related
> > > memory and prevent it from forking using madvise() system API. This
> > > approach allows fork, but not all the memory is forked to the child
> > > process and, application should care not to touch pages where the
> > > parent application allocated the rdma-core objects.
> > >
> > > The newer kernels propose an option of copy-on-fork for DMA pages and
> > > tracking all the memory and disabling it for the forking is no longer
> > > needed. The new API routine ibv_is_fork_initialized() should be
> > > involved to decide if library initialization for forking is required.
> > >
> > > Fixes: 0e83b8e536 ("net/mlx5: move rdma-core calls to separate file")
> > > Cc: stable@dpdk.org
> > > Signed-off-by: Erez Ferber <erezf@nvidia.com>  
> >   
> Hi,
> 
> > I don't think DPDK applications should fork(), and lots other parts of the
> > shared huge pages will break if an application does this.  
> 
> I agree - application should not, we have the secondary/primary processes approach.
> Nonetheless, we have the real use case - DPDK application does fork() and works well.
> Without mlx5 PMD 😊. With mlx5 it ran into some troubles. Now we have the solution.

The problem is you are allowing fork(). And many other libraries may break.
Imagine a DPDK library which has some local mutex and hugepages.
If two forked processes use it then the locks won't work and hugepage data
will be corrupted.
  
Slava Ovsiienko May 26, 2023, 8:05 a.m. UTC | #5
> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Thursday, May 25, 2023 6:28 PM
> To: Slava Ovsiienko <viacheslavo@nvidia.com>
> Cc: Erez Ferber <erezf@nvidia.com>; dev@dpdk.org; Matan Azrad
> <matan@nvidia.com>; Raslan Darawsheh <rasland@nvidia.com>;
> stable@dpdk.org
> Subject: Re: [PATCH] common/mlx5: adjust fork call with the new kernel API
> 
> On Thu, 25 May 2023 08:10:03 +0000
> Slava Ovsiienko <viacheslavo@nvidia.com> wrote:
> 
> > > -----Original Message-----
> > > From: Stephen Hemminger <stephen@networkplumber.org>
> > > Sent: Wednesday, May 24, 2023 5:50 PM
> > > To: Erez Ferber <erezf@nvidia.com>
> > > Cc: dev@dpdk.org; Slava Ovsiienko <viacheslavo@nvidia.com>; Matan
> > > Azrad <matan@nvidia.com>; Raslan Darawsheh <rasland@nvidia.com>;
> > > stable@dpdk.org
> > > Subject: Re: [PATCH] common/mlx5: adjust fork call with the new
> > > kernel API
> > >
> > > On Wed, 24 May 2023 15:01:40 +0300
> > > <erezf@nvidia.com> wrote:
> > >
> > > > From: Erez Ferber <erezf@nvidia.com>
> > > >
> > > > While doing process fork() the operating system remaps all the
> > > > parent process's memory to the address space of the child process
> > > > and activates the Copy-on-Write mechanics - it duplicates physical
> > > > pages once memory writing happens in the child process. Sometimes
> > > > memory duplication is not allowed - for example, if the page
> > > > contains hardware queue descriptors. To handle similar issues the
> > > > rdma-core library should be prepared for forking.
> > > >
> > > > The ibv_fork_init() prepares the library to track all the related
> > > > memory and prevent it from forking using madvise() system API.
> > > > This approach allows fork, but not all the memory is forked to the
> > > > child process and, application should care not to touch pages
> > > > where the parent application allocated the rdma-core objects.
> > > >
> > > > The newer kernels propose an option of copy-on-fork for DMA pages
> > > > and tracking all the memory and disabling it for the forking is no
> > > > longer needed. The new API routine ibv_is_fork_initialized()
> > > > should be involved to decide if library initialization for forking is
> required.
> > > >
> > > > Fixes: 0e83b8e536 ("net/mlx5: move rdma-core calls to separate
> > > > file")
> > > > Cc: stable@dpdk.org
> > > > Signed-off-by: Erez Ferber <erezf@nvidia.com>
> > >
> > Hi,
> >
> > > I don't think DPDK applications should fork(), and lots other parts
> > > of the shared huge pages will break if an application does this.
> >
> > I agree - application should not, we have the secondary/primary processes
> approach.
> > Nonetheless, we have the real use case - DPDK application does fork() and
> works well.
> > Without mlx5 PMD 😊. With mlx5 it ran into some troubles. Now we have
> the solution.
> 
> The problem is you are allowing fork(). And many other libraries may break.
> Imagine a DPDK library which has some local mutex and hugepages.
> If two forked processes use it then the locks won't work and hugepage data
> will be corrupted.

IMO, this problem is not fork() specific - applications/libraries supporting 
secondary/primary process approach should be developed with the similar
precautions.

And, IIRC, fork() is neither disallowed nor discouraged in DPDK documentation.
Moreover, some library unit tests ensure fork() works. No wonder users
may develop applications using fork(). They do, and app works well, beside mlx5 PMD.
Also, the recommendations to re-design with pri/sec approach was given.

With best regards,
Slava
  
Raslan Darawsheh June 22, 2023, 12:19 p.m. UTC | #6
Hi,

> -----Original Message-----
> From: Erez Ferber <erezf@nvidia.com>
> Sent: Wednesday, May 24, 2023 3:02 PM
> To: dev@dpdk.org
> Cc: Slava Ovsiienko <viacheslavo@nvidia.com>; Matan Azrad
> <matan@nvidia.com>; Raslan Darawsheh <rasland@nvidia.com>; Erez Ferber
> <erezf@nvidia.com>; stable@dpdk.org
> Subject: [PATCH] common/mlx5: adjust fork call with the new kernel API
> 
> From: Erez Ferber <erezf@nvidia.com>
> 
> While doing process fork() the operating system remaps all the parent
> process's memory to the address space of the child process and activates
> the Copy-on-Write mechanics - it duplicates physical pages once memory
> writing happens in the child process. Sometimes memory duplication is
> not allowed - for example, if the page contains hardware queue
> descriptors. To handle similar issues the rdma-core library should be
> prepared for forking.
> 
> The ibv_fork_init() prepares the library to track all the related memory
> and prevent it from forking using madvise() system API. This approach
> allows fork, but not all the memory is forked to the child process and,
> application should care not to touch pages where the parent application
> allocated the rdma-core objects.
> 
> The newer kernels propose an option of copy-on-fork for DMA pages and
> tracking all the memory and disabling it for the forking is no longer
> needed. The new API routine ibv_is_fork_initialized() should be involved
> to decide if library initialization for forking is required.
> 
> Fixes: 0e83b8e536 ("net/mlx5: move rdma-core calls to separate file")
> Cc: stable@dpdk.org
> Signed-off-by: Erez Ferber <erezf@nvidia.com>

Patch applied to next-net-mlx,

Kindest regards,
Raslan Darawsheh
  

Patch

diff --git a/drivers/common/mlx5/linux/meson.build b/drivers/common/mlx5/linux/meson.build
index 96a6c6c9be..c1fcb36fdd 100644
--- a/drivers/common/mlx5/linux/meson.build
+++ b/drivers/common/mlx5/linux/meson.build
@@ -219,6 +219,8 @@  has_sym_args = [
             'ibv_import_device' ],
         [ 'HAVE_MLX5DV_DR_ACTION_CREATE_DEST_ROOT_TABLE', 'infiniband/mlx5dv.h',
             'mlx5dv_dr_action_create_dest_root_table' ],
+	[ 'HAVE_IBV_FORK_UNNEEDED', 'infiniband/verbs.h',
+		'ibv_is_fork_initialized'],
 ]
 if  libmtcr_ul_found
     has_sym_args += [
diff --git a/drivers/common/mlx5/linux/mlx5_glue.c b/drivers/common/mlx5/linux/mlx5_glue.c
index 702eb36b62..88b99fe029 100644
--- a/drivers/common/mlx5/linux/mlx5_glue.c
+++ b/drivers/common/mlx5/linux/mlx5_glue.c
@@ -19,6 +19,10 @@ 
 static int
 mlx5_glue_fork_init(void)
 {
+#ifdef HAVE_IBV_FORK_UNNEEDED
+	if (ibv_is_fork_initialized() == IBV_FORK_UNNEEDED)
+		return 0; /* ibv_fork_init() not needed */
+#endif
 	return ibv_fork_init();
 }