test/eal: do not scan PCI devices for memory tests

Message ID 1564662465-2925-1-git-send-email-david.marchand@redhat.com (mailing list archive)
State Changes Requested, archived
Delegated to: Thomas Monjalon
Headers
Series test/eal: do not scan PCI devices for memory tests |

Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/iol-Compile-Testing success Compile Testing PASS
ci/intel-Performance-Testing success Performance Testing PASS
ci/Intel-compilation success Compilation OK
ci/mellanox-Performance-Testing success Performance Testing PASS

Commit Message

David Marchand Aug. 1, 2019, 12:27 p.m. UTC
  The memory tests currently check that, for normal mode (not legacy mode),
there is no memory left behind when exiting.

The problem is that if a ethdev port is allocated when scanning pci
devices (even if the driver probe fails like when you have a virtio
management interface attached to the kernel), on exit, dpdk won't free
the associated memory since ethdev never frees the ethdev memzone.

Workaround this by disabling pci scan.

Fixes: 651cc78f83b5 ("test: fix hugepage file handling in EAL flags autotest")
Fixes: 690fd3577e90 ("test/eal: add cases for in-memory and single-file-segments")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
 app/test/test_eal_flags.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
  

Comments

David Marchand Aug. 1, 2019, 12:29 p.m. UTC | #1
On Thu, Aug 1, 2019 at 2:28 PM David Marchand <david.marchand@redhat.com> wrote:
>
> The memory tests currently check that, for normal mode (not legacy mode),
> there is no memory left behind when exiting.
>
> The problem is that if a ethdev port is allocated when scanning pci
> devices (even if the driver probe fails like when you have a virtio
> management interface attached to the kernel), on exit, dpdk won't free
> the associated memory since ethdev never frees the ethdev memzone.
>
> Workaround this by disabling pci scan.

Not entirely happy with this patch.
I am open to suggestions :-)

>
> Fixes: 651cc78f83b5 ("test: fix hugepage file handling in EAL flags autotest")
> Fixes: 690fd3577e90 ("test/eal: add cases for in-memory and single-file-segments")
> Cc: stable@dpdk.org

And we might want to drop stable.
  
Gaëtan Rivet Aug. 1, 2019, 1:22 p.m. UTC | #2
On Thu, Aug 01, 2019 at 02:29:21PM +0200, David Marchand wrote:
> On Thu, Aug 1, 2019 at 2:28 PM David Marchand <david.marchand@redhat.com> wrote:
> >
> > The memory tests currently check that, for normal mode (not legacy mode),
> > there is no memory left behind when exiting.
> >
> > The problem is that if a ethdev port is allocated when scanning pci
> > devices (even if the driver probe fails like when you have a virtio
> > management interface attached to the kernel), on exit, dpdk won't free
> > the associated memory since ethdev never frees the ethdev memzone.
> >
> > Workaround this by disabling pci scan.
> 
> Not entirely happy with this patch.
> I am open to suggestions :-)
> 

Hello David,

Why not cleanup on .fini the ethdev subsystem?

There is RTE_UNREGISTER_CLASS() that was added just for that. I
think I remember someone (maybe Thomas) not liking it. Why are we
keeping the memzones allocated on exit? Debug ? Primary/secondary snafu?

I would take this opportunity to troll once again about the default
blacklist-mode of the PCI bus, which is still an issue for downstream
consumers of DPDK and seems to be kept only to avoid changing the CI
infrastructure of upstream contributors. Unfortunately the root-cause
might be elsewhere.

> >
> > Fixes: 651cc78f83b5 ("test: fix hugepage file handling in EAL flags autotest")
> > Fixes: 690fd3577e90 ("test/eal: add cases for in-memory and single-file-segments")
> > Cc: stable@dpdk.org
> 
> And we might want to drop stable.
> 
> 
> -- 
> David Marchand
  
David Marchand Aug. 1, 2019, 1:24 p.m. UTC | #3
Hey Gaëtan,

On Thu, Aug 1, 2019 at 3:22 PM Gaëtan Rivet <gaetan.rivet@6wind.com> wrote:
>
> On Thu, Aug 01, 2019 at 02:29:21PM +0200, David Marchand wrote:
> > On Thu, Aug 1, 2019 at 2:28 PM David Marchand <david.marchand@redhat.com> wrote:
> > >
> > > The memory tests currently check that, for normal mode (not legacy mode),
> > > there is no memory left behind when exiting.
> > >
> > > The problem is that if a ethdev port is allocated when scanning pci
> > > devices (even if the driver probe fails like when you have a virtio
> > > management interface attached to the kernel), on exit, dpdk won't free
> > > the associated memory since ethdev never frees the ethdev memzone.
> > >
> > > Workaround this by disabling pci scan.
> >
> > Not entirely happy with this patch.
> > I am open to suggestions :-)
> >
>> Why not cleanup on .fini the ethdev subsystem?

I had this in mind, tried it quickly, but it still failed.
So I suppose .fini is executed after rte_eal_cleanup (where the
freeing of the hugepages happens).
  
David Marchand Aug. 1, 2019, 1:26 p.m. UTC | #4
On Thu, Aug 1, 2019 at 3:24 PM David Marchand <david.marchand@redhat.com> wrote:
>
> Hey Gaëtan,
>
> On Thu, Aug 1, 2019 at 3:22 PM Gaëtan Rivet <gaetan.rivet@6wind.com> wrote:
> >
> > On Thu, Aug 01, 2019 at 02:29:21PM +0200, David Marchand wrote:
> > > On Thu, Aug 1, 2019 at 2:28 PM David Marchand <david.marchand@redhat.com> wrote:
> > > >
> > > > The memory tests currently check that, for normal mode (not legacy mode),
> > > > there is no memory left behind when exiting.
> > > >
> > > > The problem is that if a ethdev port is allocated when scanning pci
> > > > devices (even if the driver probe fails like when you have a virtio
> > > > management interface attached to the kernel), on exit, dpdk won't free
> > > > the associated memory since ethdev never frees the ethdev memzone.
> > > >
> > > > Workaround this by disabling pci scan.
> > >
> > > Not entirely happy with this patch.
> > > I am open to suggestions :-)
> > >
> >> Why not cleanup on .fini the ethdev subsystem?
>
> I had this in mind, tried it quickly, but it still failed.
> So I suppose .fini is executed after rte_eal_cleanup (where the
> freeing of the hugepages happens).

Or we move rte_eal_cleanup in a .fini and play with priorities... too
dangerous for 19.08, from my pov.
  
David Marchand Aug. 2, 2019, 11:13 a.m. UTC | #5
On Thu, Aug 1, 2019 at 2:29 PM David Marchand <david.marchand@redhat.com> wrote:
>
> On Thu, Aug 1, 2019 at 2:28 PM David Marchand <david.marchand@redhat.com> wrote:
> >
> > The memory tests currently check that, for normal mode (not legacy mode),
> > there is no memory left behind when exiting.
> >
> > The problem is that if a ethdev port is allocated when scanning pci
> > devices (even if the driver probe fails like when you have a virtio
> > management interface attached to the kernel), on exit, dpdk won't free
> > the associated memory since ethdev never frees the ethdev memzone.
> >
> > Workaround this by disabling pci scan.
>
> Not entirely happy with this patch.
> I am open to suggestions :-)

Note: this problem won't be seen in shared builds as long as the pci
drivers for net devices are not loaded.
  
Aaron Conole Aug. 2, 2019, 1:37 p.m. UTC | #6
David Marchand <david.marchand@redhat.com> writes:

> On Thu, Aug 1, 2019 at 2:28 PM David Marchand <david.marchand@redhat.com> wrote:
>>
>> The memory tests currently check that, for normal mode (not legacy mode),
>> there is no memory left behind when exiting.
>>
>> The problem is that if a ethdev port is allocated when scanning pci
>> devices (even if the driver probe fails like when you have a virtio
>> management interface attached to the kernel), on exit, dpdk won't free
>> the associated memory since ethdev never frees the ethdev memzone.
>>
>> Workaround this by disabling pci scan.
>
> Not entirely happy with this patch.
> I am open to suggestions :-)

Seems like an order of allocation / free issue.  Is it possible to
change the order to be consistent?  IE: we only allocate something after
we know there's good reason to do so and then we can be sure to always
free?  I don't know the code in this area well enough yet to comment any
more than that.

>>
>> Fixes: 651cc78f83b5 ("test: fix hugepage file handling in EAL flags autotest")
>> Fixes: 690fd3577e90 ("test/eal: add cases for in-memory and single-file-segments")
>> Cc: stable@dpdk.org
>
> And we might want to drop stable.
  
David Marchand Aug. 2, 2019, 1:45 p.m. UTC | #7
On Fri, Aug 2, 2019 at 3:37 PM Aaron Conole <aconole@redhat.com> wrote:
>
> David Marchand <david.marchand@redhat.com> writes:
>
> > On Thu, Aug 1, 2019 at 2:28 PM David Marchand <david.marchand@redhat.com> wrote:
> >>
> >> The memory tests currently check that, for normal mode (not legacy mode),
> >> there is no memory left behind when exiting.
> >>
> >> The problem is that if a ethdev port is allocated when scanning pci
> >> devices (even if the driver probe fails like when you have a virtio
> >> management interface attached to the kernel), on exit, dpdk won't free
> >> the associated memory since ethdev never frees the ethdev memzone.
> >>
> >> Workaround this by disabling pci scan.
> >
> > Not entirely happy with this patch.
> > I am open to suggestions :-)
>
> Seems like an order of allocation / free issue.  Is it possible to
> change the order to be consistent?  IE: we only allocate something after
> we know there's good reason to do so and then we can be sure to always
> free?  I don't know the code in this area well enough yet to comment any
> more than that.

Err, this looks hard to achieve.
The test is quite basic, in that it expects all hugepage files to be
removed on dpdk exit (meaning all allocations freed).
  
Thomas Monjalon Aug. 2, 2019, 8:57 p.m. UTC | #8
01/08/2019 14:27, David Marchand:
> The memory tests currently check that, for normal mode (not legacy mode),
> there is no memory left behind when exiting.

I think this is the real bug:
we are checking a behaviour that we cannot achieve currently.

> The problem is that if a ethdev port is allocated when scanning pci
> devices (even if the driver probe fails like when you have a virtio
> management interface attached to the kernel), on exit, dpdk won't free
> the associated memory since ethdev never frees the ethdev memzone.

As you said in this thread, we could think about how to free it properly
in a future release.
For 19.08, I would suggest to disable the test with a comment
explaining the reason.

> Workaround this by disabling pci scan.

If you choose the --no-pci workaround, please add a comment
about the reason.
  
David Marchand Aug. 3, 2019, 9:51 a.m. UTC | #9
On Fri, Aug 2, 2019 at 10:57 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 01/08/2019 14:27, David Marchand:
> > The memory tests currently check that, for normal mode (not legacy mode),
> > there is no memory left behind when exiting.
>
> I think this is the real bug:
> we are checking a behaviour that we cannot achieve currently.
>
> > The problem is that if a ethdev port is allocated when scanning pci
> > devices (even if the driver probe fails like when you have a virtio
> > management interface attached to the kernel), on exit, dpdk won't free
> > the associated memory since ethdev never frees the ethdev memzone.
>
> As you said in this thread, we could think about how to free it properly
> in a future release.
> For 19.08, I would suggest to disable the test with a comment
> explaining the reason.

For 19.08, as long as we test shared builds in the CI, then it just
"works", because the net drivers are not loaded.
No net driver, no ethdev leak ;-)
  
Thomas Monjalon Aug. 5, 2019, 10:19 a.m. UTC | #10
03/08/2019 11:51, David Marchand:
> On Fri, Aug 2, 2019 at 10:57 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> >
> > 01/08/2019 14:27, David Marchand:
> > > The memory tests currently check that, for normal mode (not legacy mode),
> > > there is no memory left behind when exiting.
> >
> > I think this is the real bug:
> > we are checking a behaviour that we cannot achieve currently.
> >
> > > The problem is that if a ethdev port is allocated when scanning pci
> > > devices (even if the driver probe fails like when you have a virtio
> > > management interface attached to the kernel), on exit, dpdk won't free
> > > the associated memory since ethdev never frees the ethdev memzone.
> >
> > As you said in this thread, we could think about how to free it properly
> > in a future release.
> > For 19.08, I would suggest to disable the test with a comment
> > explaining the reason.
> 
> For 19.08, as long as we test shared builds in the CI, then it just
> "works", because the net drivers are not loaded.
> No net driver, no ethdev leak ;-)

So we keep the bug with the unit test not running with a static build
for 19.08, and we'll try to fix it in 19.11?
  
David Marchand Aug. 8, 2019, 11:23 a.m. UTC | #11
On Mon, Aug 5, 2019 at 12:20 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> 03/08/2019 11:51, David Marchand:
> > On Fri, Aug 2, 2019 at 10:57 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > >
> > > 01/08/2019 14:27, David Marchand:
> > > > The memory tests currently check that, for normal mode (not legacy mode),
> > > > there is no memory left behind when exiting.
> > >
> > > I think this is the real bug:
> > > we are checking a behaviour that we cannot achieve currently.
> > >
> > > > The problem is that if a ethdev port is allocated when scanning pci
> > > > devices (even if the driver probe fails like when you have a virtio
> > > > management interface attached to the kernel), on exit, dpdk won't free
> > > > the associated memory since ethdev never frees the ethdev memzone.
> > >
> > > As you said in this thread, we could think about how to free it properly
> > > in a future release.
> > > For 19.08, I would suggest to disable the test with a comment
> > > explaining the reason.
> >
> > For 19.08, as long as we test shared builds in the CI, then it just
> > "works", because the net drivers are not loaded.
> > No net driver, no ethdev leak ;-)
>
> So we keep the bug with the unit test not running with a static build
> for 19.08, and we'll try to fix it in 19.11?

Seems the more pragmatic yes.
  

Patch

diff --git a/app/test/test_eal_flags.c b/app/test/test_eal_flags.c
index 5b2c0f5..6aaa4a3 100644
--- a/app/test/test_eal_flags.c
+++ b/app/test/test_eal_flags.c
@@ -1044,7 +1044,7 @@  test_file_prefix(void)
 			DEFAULT_MEM_SIZE, "--file-prefix=" memtest };
 
 	/* primary process with memtest1 and default mem mode */
-	const char *argv1[] = {prgname, "-m",
+	const char *argv1[] = {prgname, "--no-pci", "-m",
 			DEFAULT_MEM_SIZE, "--file-prefix=" memtest1 };
 
 	/* primary process with memtest1 and legacy mem mode */
@@ -1058,7 +1058,7 @@  test_file_prefix(void)
 			"--legacy-mem" };
 
 	/* primary process with memtest2 and default mem mode */
-	const char *argv4[] = {prgname, "-m",
+	const char *argv4[] = {prgname, "--no-pci", "-m",
 			DEFAULT_MEM_SIZE, "--file-prefix=" memtest2 };
 
 	/* primary process with --in-memory mode */
@@ -1075,7 +1075,7 @@  test_file_prefix(void)
 		DEFAULT_MEM_SIZE, "--in-memory", "--file-prefix", prefix };
 
 	/* primary process with memtest1 and --single-file-segments mode */
-	const char * const argv8[] = {prgname, "-m",
+	const char * const argv8[] = {prgname, "--no-pci", "-m",
 		DEFAULT_MEM_SIZE, "--single-file-segments",
 		"--file-prefix=" memtest1 };