[v2,6/6] test/dmadev: add tests for stopping and restarting dev

Message ID 20230116173738.562322-7-bruce.richardson@intel.com (mailing list archive)
State Superseded, archived
Delegated to: David Marchand
Headers
Series dma/ioat: fix issues with stopping and restarting device |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/Intel-compilation success Compilation OK
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/intel-Testing success Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/github-robot: build fail github build: failed
ci/iol-aarch64-unit-testing success Testing PASS
ci/iol-testing success Testing PASS
ci/iol-x86_64-unit-testing success Testing PASS
ci/iol-x86_64-compile-testing success Testing PASS
ci/iol-abi-testing success Testing PASS
ci/iol-aarch64-compile-testing success Testing PASS

Commit Message

Bruce Richardson Jan. 16, 2023, 5:37 p.m. UTC
  Validate device operation when a device is stopped or restarted.

The only complication - and gap in the dmadev ABI specification - is
what happens to the job ids on restart. Some drivers reset them to 0,
while others continue where things left off. Take account of both
possibilities in the test case.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 app/test/test_dmadev.c | 46 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)
  

Comments

Kevin Laatz Feb. 14, 2023, 4:04 p.m. UTC | #1
On 16/01/2023 17:37, Bruce Richardson wrote:
> Validate device operation when a device is stopped or restarted.
>
> The only complication - and gap in the dmadev ABI specification - is
> what happens to the job ids on restart. Some drivers reset them to 0,
> while others continue where things left off. Take account of both
> possibilities in the test case.
>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>   app/test/test_dmadev.c | 46 ++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 46 insertions(+)
>
Acked-by: Kevin Laatz <kevin.laatz@intel.com>
  
fengchengwen Feb. 15, 2023, 1:59 a.m. UTC | #2
On 2023/1/17 1:37, Bruce Richardson wrote:
> Validate device operation when a device is stopped or restarted.
> 
> The only complication - and gap in the dmadev ABI specification - is
> what happens to the job ids on restart. Some drivers reset them to 0,
> while others continue where things left off. Take account of both
> possibilities in the test case.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>  app/test/test_dmadev.c | 46 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 46 insertions(+)
> 
> diff --git a/app/test/test_dmadev.c b/app/test/test_dmadev.c
> index de787c14e2..8fb73a41e2 100644
> --- a/app/test/test_dmadev.c
> +++ b/app/test/test_dmadev.c
> @@ -304,6 +304,48 @@ test_enqueue_copies(int16_t dev_id, uint16_t vchan)
>  			|| do_multi_copies(dev_id, vchan, 0, 0, 1);
>  }
>  
> +static int
> +test_stop_start(int16_t dev_id, uint16_t vchan)
> +{
> +	/* device is already started on input, should be (re)started on output */
> +
> +	uint16_t id = 0;
> +	enum rte_dma_status_code status = RTE_DMA_STATUS_SUCCESSFUL;
> +
> +	/* - test stopping a device works ok,
> +	 * - then do a start-stop without doing a copy
> +	 * - finally restart the device
> +	 * checking for errors at each stage, and validating we can still copy at the end.
> +	 */
> +	if (rte_dma_stop(dev_id) < 0)
> +		ERR_RETURN("Error stopping device\n");
> +
> +	if (rte_dma_start(dev_id) < 0)
> +		ERR_RETURN("Error restarting device\n");
> +	if (rte_dma_stop(dev_id) < 0)
> +		ERR_RETURN("Error stopping device after restart (no jobs executed)\n");
> +
> +	if (rte_dma_start(dev_id) < 0)
> +		ERR_RETURN("Error restarting device after multiple stop-starts\n");
> +
> +	/* before doing a copy, we need to know what the next id will be it should
> +	 * either be:
> +	 * - the last completed job before start if driver does not reset id on stop
> +	 * - or -1 i.e. next job is 0, if driver does reset the job ids on stop
> +	 */
> +	if (rte_dma_completed_status(dev_id, vchan, 1, &id, &status) != 0)
> +		ERR_RETURN("Error with rte_dma_completed_status when no job done\n");
> +	id += 1; /* id_count is next job id */
> +	if (id != id_count && id != 0)
> +		ERR_RETURN("Unexpected next id from device after stop-start. Got %u, expected %u or 0\n",
> +				id, id_count);

Hi Bruce,

Suggest add a warn LOG to identify the id was not reset zero.
So that new driver could detect break ABI specification.

Thanks.
  
Bruce Richardson Feb. 15, 2023, 11:57 a.m. UTC | #3
On Wed, Feb 15, 2023 at 09:59:06AM +0800, fengchengwen wrote:
> On 2023/1/17 1:37, Bruce Richardson wrote:
> > Validate device operation when a device is stopped or restarted.
> > 
> > The only complication - and gap in the dmadev ABI specification - is
> > what happens to the job ids on restart. Some drivers reset them to 0,
> > while others continue where things left off. Take account of both
> > possibilities in the test case.
> > 
> > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> ---
> > app/test/test_dmadev.c | 46 ++++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 46 insertions(+)
> > 
> > diff --git a/app/test/test_dmadev.c b/app/test/test_dmadev.c index
> > de787c14e2..8fb73a41e2 100644 --- a/app/test/test_dmadev.c +++
> > b/app/test/test_dmadev.c @@ -304,6 +304,48 @@
> > test_enqueue_copies(int16_t dev_id, uint16_t vchan) ||
> > do_multi_copies(dev_id, vchan, 0, 0, 1); }
> >  
> > +static int +test_stop_start(int16_t dev_id, uint16_t vchan) +{ +	/*
> > device is already started on input, should be (re)started on output */
> > + +	uint16_t id = 0; +	enum rte_dma_status_code status =
> > RTE_DMA_STATUS_SUCCESSFUL; + +	/* - test stopping a device works
> > ok, +	 * - then do a start-stop without doing a copy +	 *
> > - finally restart the device +	 * checking for errors at each
> > stage, and validating we can still copy at the end.  +	 */ +	if
> > (rte_dma_stop(dev_id) < 0) +		ERR_RETURN("Error stopping
> > device\n"); + +	if (rte_dma_start(dev_id) < 0) +
> > ERR_RETURN("Error restarting device\n"); +	if (rte_dma_stop(dev_id) <
> > 0) +		ERR_RETURN("Error stopping device after restart (no
> > jobs executed)\n"); + +	if (rte_dma_start(dev_id) < 0) +
> > ERR_RETURN("Error restarting device after multiple stop-starts\n"); + +
> > /* before doing a copy, we need to know what the next id will be it
> > should +	 * either be: +	 * - the last completed job before start if
> > driver does not reset id on stop +	 * - or -1 i.e. next job is 0, if
> > driver does reset the job ids on stop +	 */ +	if
> > (rte_dma_completed_status(dev_id, vchan, 1, &id, &status) != 0) +
> > ERR_RETURN("Error with rte_dma_completed_status when no job done\n"); +
> > id += 1; /* id_count is next job id */ +	if (id != id_count && id !=
> > 0) +		ERR_RETURN("Unexpected next id from device after
> > stop-start. Got %u, expected %u or 0\n", +				id,
> > id_count);
> 
> Hi Bruce,
> 
> Suggest add a warn LOG to identify the id was not reset zero.  So that
> new driver could detect break ABI specification.
> 
Not sure that that is necessary. The actual ABI, nor the doxygen docs,
doesn't specify what happens to the values on doing stop and then start. My
thinking was that it should continue numbering as it would be equivalent to
suspend and resume, but other drivers appear to treat it as a "reset". I
suspect there are advantages and disadvantages to both schemes. Until we
decide on what the correct behaviour should be - or decide to allow both -
I don't think warning is the right thing to do here.

/Bruce
  
fengchengwen Feb. 16, 2023, 1:24 a.m. UTC | #4
On 2023/2/15 19:57, Bruce Richardson wrote:
> On Wed, Feb 15, 2023 at 09:59:06AM +0800, fengchengwen wrote:
>> On 2023/1/17 1:37, Bruce Richardson wrote:
>>> Validate device operation when a device is stopped or restarted.
>>>
>>> The only complication - and gap in the dmadev ABI specification - is
>>> what happens to the job ids on restart. Some drivers reset them to 0,
>>> while others continue where things left off. Take account of both
>>> possibilities in the test case.
>>>
>>> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> ---
>>> app/test/test_dmadev.c | 46 ++++++++++++++++++++++++++++++++++++++++++
>>> 1 file changed, 46 insertions(+)
>>>
>>> diff --git a/app/test/test_dmadev.c b/app/test/test_dmadev.c index
>>> de787c14e2..8fb73a41e2 100644 --- a/app/test/test_dmadev.c +++
>>> b/app/test/test_dmadev.c @@ -304,6 +304,48 @@
>>> test_enqueue_copies(int16_t dev_id, uint16_t vchan) ||
>>> do_multi_copies(dev_id, vchan, 0, 0, 1); }
>>>  
>>> +static int +test_stop_start(int16_t dev_id, uint16_t vchan) +{ +	/*
>>> device is already started on input, should be (re)started on output */
>>> + +	uint16_t id = 0; +	enum rte_dma_status_code status =
>>> RTE_DMA_STATUS_SUCCESSFUL; + +	/* - test stopping a device works
>>> ok, +	 * - then do a start-stop without doing a copy +	 *
>>> - finally restart the device +	 * checking for errors at each
>>> stage, and validating we can still copy at the end.  +	 */ +	if
>>> (rte_dma_stop(dev_id) < 0) +		ERR_RETURN("Error stopping
>>> device\n"); + +	if (rte_dma_start(dev_id) < 0) +
>>> ERR_RETURN("Error restarting device\n"); +	if (rte_dma_stop(dev_id) <
>>> 0) +		ERR_RETURN("Error stopping device after restart (no
>>> jobs executed)\n"); + +	if (rte_dma_start(dev_id) < 0) +
>>> ERR_RETURN("Error restarting device after multiple stop-starts\n"); + +
>>> /* before doing a copy, we need to know what the next id will be it
>>> should +	 * either be: +	 * - the last completed job before start if
>>> driver does not reset id on stop +	 * - or -1 i.e. next job is 0, if
>>> driver does reset the job ids on stop +	 */ +	if
>>> (rte_dma_completed_status(dev_id, vchan, 1, &id, &status) != 0) +
>>> ERR_RETURN("Error with rte_dma_completed_status when no job done\n"); +
>>> id += 1; /* id_count is next job id */ +	if (id != id_count && id !=
>>> 0) +		ERR_RETURN("Unexpected next id from device after
>>> stop-start. Got %u, expected %u or 0\n", +				id,
>>> id_count);
>>
>> Hi Bruce,
>>
>> Suggest add a warn LOG to identify the id was not reset zero.  So that
>> new driver could detect break ABI specification.
>>
> Not sure that that is necessary. The actual ABI, nor the doxygen docs,
> doesn't specify what happens to the values on doing stop and then start. My
> thinking was that it should continue numbering as it would be equivalent to
> suspend and resume, but other drivers appear to treat it as a "reset". I
> suspect there are advantages and disadvantages to both schemes. Until we
> decide on what the correct behaviour should be - or decide to allow both -
> I don't think warning is the right thing to do here.

In this point, agree to upstream this patch first, and then discuss the correct
behavior should be for restart scenario.

> 
> /Bruce
> 
> .
>
  
Bruce Richardson Feb. 16, 2023, 9:24 a.m. UTC | #5
On Thu, Feb 16, 2023 at 09:24:38AM +0800, fengchengwen wrote:
> On 2023/2/15 19:57, Bruce Richardson wrote:
> > On Wed, Feb 15, 2023 at 09:59:06AM +0800, fengchengwen wrote:
> >> On 2023/1/17 1:37, Bruce Richardson wrote:
> >>> Validate device operation when a device is stopped or restarted.
> >>>
> >>> The only complication - and gap in the dmadev ABI specification - is
> >>> what happens to the job ids on restart. Some drivers reset them to 0,
> >>> while others continue where things left off. Take account of both
> >>> possibilities in the test case.
> >>>
> >>> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> ---
> >>> app/test/test_dmadev.c | 46 ++++++++++++++++++++++++++++++++++++++++++
> >>> 1 file changed, 46 insertions(+)
> >>>
> >>> diff --git a/app/test/test_dmadev.c b/app/test/test_dmadev.c index
> >>> de787c14e2..8fb73a41e2 100644 --- a/app/test/test_dmadev.c +++
> >>> b/app/test/test_dmadev.c @@ -304,6 +304,48 @@
> >>> test_enqueue_copies(int16_t dev_id, uint16_t vchan) ||
> >>> do_multi_copies(dev_id, vchan, 0, 0, 1); }
> >>>  
> >>> +static int +test_stop_start(int16_t dev_id, uint16_t vchan) +{ +	/*
> >>> device is already started on input, should be (re)started on output */
> >>> + +	uint16_t id = 0; +	enum rte_dma_status_code status =
> >>> RTE_DMA_STATUS_SUCCESSFUL; + +	/* - test stopping a device works
> >>> ok, +	 * - then do a start-stop without doing a copy +	 *
> >>> - finally restart the device +	 * checking for errors at each
> >>> stage, and validating we can still copy at the end.  +	 */ +	if
> >>> (rte_dma_stop(dev_id) < 0) +		ERR_RETURN("Error stopping
> >>> device\n"); + +	if (rte_dma_start(dev_id) < 0) +
> >>> ERR_RETURN("Error restarting device\n"); +	if (rte_dma_stop(dev_id) <
> >>> 0) +		ERR_RETURN("Error stopping device after restart (no
> >>> jobs executed)\n"); + +	if (rte_dma_start(dev_id) < 0) +
> >>> ERR_RETURN("Error restarting device after multiple stop-starts\n"); + +
> >>> /* before doing a copy, we need to know what the next id will be it
> >>> should +	 * either be: +	 * - the last completed job before start if
> >>> driver does not reset id on stop +	 * - or -1 i.e. next job is 0, if
> >>> driver does reset the job ids on stop +	 */ +	if
> >>> (rte_dma_completed_status(dev_id, vchan, 1, &id, &status) != 0) +
> >>> ERR_RETURN("Error with rte_dma_completed_status when no job done\n"); +
> >>> id += 1; /* id_count is next job id */ +	if (id != id_count && id !=
> >>> 0) +		ERR_RETURN("Unexpected next id from device after
> >>> stop-start. Got %u, expected %u or 0\n", +				id,
> >>> id_count);
> >>
> >> Hi Bruce,
> >>
> >> Suggest add a warn LOG to identify the id was not reset zero.  So that
> >> new driver could detect break ABI specification.
> >>
> > Not sure that that is necessary. The actual ABI, nor the doxygen docs,
> > doesn't specify what happens to the values on doing stop and then start. My
> > thinking was that it should continue numbering as it would be equivalent to
> > suspend and resume, but other drivers appear to treat it as a "reset". I
> > suspect there are advantages and disadvantages to both schemes. Until we
> > decide on what the correct behaviour should be - or decide to allow both -
> > I don't think warning is the right thing to do here.
> 
> In this point, agree to upstream this patch first, and then discuss the correct
> behavior should be for restart scenario.
> 
+1. Thanks.

With this patch in place we will also be better able to help drivers
enforce the correct behaviour once we define it.

I'll do v3 keeping this as-is for now.

/Bruce
  

Patch

diff --git a/app/test/test_dmadev.c b/app/test/test_dmadev.c
index de787c14e2..8fb73a41e2 100644
--- a/app/test/test_dmadev.c
+++ b/app/test/test_dmadev.c
@@ -304,6 +304,48 @@  test_enqueue_copies(int16_t dev_id, uint16_t vchan)
 			|| do_multi_copies(dev_id, vchan, 0, 0, 1);
 }
 
+static int
+test_stop_start(int16_t dev_id, uint16_t vchan)
+{
+	/* device is already started on input, should be (re)started on output */
+
+	uint16_t id = 0;
+	enum rte_dma_status_code status = RTE_DMA_STATUS_SUCCESSFUL;
+
+	/* - test stopping a device works ok,
+	 * - then do a start-stop without doing a copy
+	 * - finally restart the device
+	 * checking for errors at each stage, and validating we can still copy at the end.
+	 */
+	if (rte_dma_stop(dev_id) < 0)
+		ERR_RETURN("Error stopping device\n");
+
+	if (rte_dma_start(dev_id) < 0)
+		ERR_RETURN("Error restarting device\n");
+	if (rte_dma_stop(dev_id) < 0)
+		ERR_RETURN("Error stopping device after restart (no jobs executed)\n");
+
+	if (rte_dma_start(dev_id) < 0)
+		ERR_RETURN("Error restarting device after multiple stop-starts\n");
+
+	/* before doing a copy, we need to know what the next id will be it should
+	 * either be:
+	 * - the last completed job before start if driver does not reset id on stop
+	 * - or -1 i.e. next job is 0, if driver does reset the job ids on stop
+	 */
+	if (rte_dma_completed_status(dev_id, vchan, 1, &id, &status) != 0)
+		ERR_RETURN("Error with rte_dma_completed_status when no job done\n");
+	id += 1; /* id_count is next job id */
+	if (id != id_count && id != 0)
+		ERR_RETURN("Unexpected next id from device after stop-start. Got %u, expected %u or 0\n",
+				id, id_count);
+
+	id_count = id;
+	if (test_single_copy(dev_id, vchan) < 0)
+		ERR_RETURN("Error performing copy after device restart\n");
+	return 0;
+}
+
 /* Failure handling test cases - global macros and variables for those tests*/
 #define COMP_BURST_SZ	16
 #define OPT_FENCE(idx) ((fence && idx == 8) ? RTE_DMA_OP_FLAG_FENCE : 0)
@@ -819,6 +861,10 @@  test_dmadev_instance(int16_t dev_id)
 	if (runtest("copy", test_enqueue_copies, 640, dev_id, vchan, CHECK_ERRS) < 0)
 		goto err;
 
+	/* run tests stopping/starting devices and check jobs still work after restart */
+	if (runtest("stop-start", test_stop_start, 1, dev_id, vchan, CHECK_ERRS) < 0)
+		goto err;
+
 	/* run some burst capacity tests */
 	if (rte_dma_burst_capacity(dev_id, vchan) < 64)
 		printf("DMA Dev %u: insufficient burst capacity (64 required), skipping tests\n",