[v7,2/2] bus/pci: support MMIO in PCI ioport accessors

Message ID 1614014118-91150-3-git-send-email-huawei.xhw@alibaba-inc.com (mailing list archive)
State Superseded, archived
Delegated to: David Marchand
Headers
Series support both PIO and MMIO BAR for legacy device in virtio PMD |

Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/intel-Testing success Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-mellanox-Functional success Functional Testing PASS
ci/iol-testing fail Testing issues

Commit Message

谢华伟(此时此刻) Feb. 22, 2021, 5:15 p.m. UTC
From: "huawei.xhw" <huawei.xhw@alibaba-inc.com>

With IO BAR, we get PIO(programmed IO) address.
With MMIO BAR, we get mapped virtual address.
We distinguish PIO(Programmed IO) and MMIO(memory mapped IO) by their address like how kernel does.
ioread/write8/16/32 is provided to access PIO/MMIO.
By the way, for virtio on arch other than x86, BAR flag indicates PIO but is mapped.

Signed-off-by: huawei xie <huawei.xhw@alibaba-inc.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 drivers/bus/pci/linux/pci.c     |   4 --
 drivers/bus/pci/linux/pci_uio.c | 156 +++++++++++++++++++++++++++++-----------
 2 files changed, 113 insertions(+), 47 deletions(-)
  

Comments

Ferruh Yigit Feb. 22, 2021, 5:25 p.m. UTC | #1
On 2/22/2021 5:15 PM, 谢华伟(此时此刻) wrote:
> From: "huawei.xhw" <huawei.xhw@alibaba-inc.com>
> 
> With IO BAR, we get PIO(programmed IO) address.
> With MMIO BAR, we get mapped virtual address.
> We distinguish PIO(Programmed IO) and MMIO(memory mapped IO) by their address like how kernel does.
> ioread/write8/16/32 is provided to access PIO/MMIO.
> By the way, for virtio on arch other than x86, BAR flag indicates PIO but is mapped.
> 
> Signed-off-by: huawei xie <huawei.xhw@alibaba-inc.com>
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

<...>

> +
> +static inline void iowrite8(uint8_t val, void *addr)
> +{
> +	(uint64_t)(uintptr_t)addr >= PIO_MAX ?
> +		*(volatile uint8_t *)addr = val :
> +		outb(val, (unsigned long)addr);

//copying question from previous version:

Is the 'outb_p' to 'outb' conversion intentional? And if so why?

Same of the all 'outb_p', 'outw_p', 'outl_p'.
  
谢华伟(此时此刻) Feb. 23, 2021, 2:20 p.m. UTC | #2
On 2021/2/23 1:25, Ferruh Yigit wrote:
> On 2/22/2021 5:15 PM, 谢华伟(此时此刻) wrote:
>> From: "huawei.xhw" <huawei.xhw@alibaba-inc.com>
>>
>> With IO BAR, we get PIO(programmed IO) address.
>> With MMIO BAR, we get mapped virtual address.
>> We distinguish PIO(Programmed IO) and MMIO(memory mapped IO) by their 
>> address like how kernel does.
>> ioread/write8/16/32 is provided to access PIO/MMIO.
>> By the way, for virtio on arch other than x86, BAR flag indicates PIO 
>> but is mapped.
>>
>> Signed-off-by: huawei xie <huawei.xhw@alibaba-inc.com>
>> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>
> <...>
>
>> +
>> +static inline void iowrite8(uint8_t val, void *addr)
>> +{
>> +    (uint64_t)(uintptr_t)addr >= PIO_MAX ?
>> +        *(volatile uint8_t *)addr = val :
>> +        outb(val, (unsigned long)addr);
>
> //copying question from previous version:
>
> Is the 'outb_p' to 'outb' conversion intentional? And if so why?
>
> Same of the all 'outb_p', 'outw_p', 'outl_p'.

There is no need to delay for virtio device, as we can see in virtio 
legacy driver.

IMO, the delay is for ugly old device. The device itself should assure 
the previous IO completes when the subsequent IO instruction arrives.
  
Ferruh Yigit Feb. 24, 2021, 3:45 p.m. UTC | #3
On 2/23/2021 2:20 PM, 谢华伟(此时此刻) wrote:
> 
> On 2021/2/23 1:25, Ferruh Yigit wrote:
>> On 2/22/2021 5:15 PM, 谢华伟(此时此刻) wrote:
>>> From: "huawei.xhw" <huawei.xhw@alibaba-inc.com>
>>>
>>> With IO BAR, we get PIO(programmed IO) address.
>>> With MMIO BAR, we get mapped virtual address.
>>> We distinguish PIO(Programmed IO) and MMIO(memory mapped IO) by their address 
>>> like how kernel does.
>>> ioread/write8/16/32 is provided to access PIO/MMIO.
>>> By the way, for virtio on arch other than x86, BAR flag indicates PIO but is 
>>> mapped.
>>>
>>> Signed-off-by: huawei xie <huawei.xhw@alibaba-inc.com>
>>> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>>
>> <...>
>>
>>> +
>>> +static inline void iowrite8(uint8_t val, void *addr)
>>> +{
>>> +    (uint64_t)(uintptr_t)addr >= PIO_MAX ?
>>> +        *(volatile uint8_t *)addr = val :
>>> +        outb(val, (unsigned long)addr);
>>
>> //copying question from previous version:
>>
>> Is the 'outb_p' to 'outb' conversion intentional? And if so why?
>>
>> Same of the all 'outb_p', 'outw_p', 'outl_p'.
> 
> There is no need to delay for virtio device, as we can see in virtio legacy driver.
> 
> IMO, the delay is for ugly old device. The device itself should assure the 
> previous IO completes when the subsequent IO instruction arrives.
> 

Can there be any virtio legacy device needing this?

What is the downside of using "pause until the I/O completes" versions?
  
谢华伟(此时此刻) Feb. 25, 2021, 3:59 a.m. UTC | #4
On 2021/2/24 23:45, Ferruh Yigit wrote:
> On 2/23/2021 2:20 PM, 谢华伟(此时此刻) wrote:
>>
>> On 2021/2/23 1:25, Ferruh Yigit wrote:
>>> On 2/22/2021 5:15 PM, 谢华伟(此时此刻) wrote:
>>>> From: "huawei.xhw" <huawei.xhw@alibaba-inc.com>
>>>>
>>>> With IO BAR, we get PIO(programmed IO) address.
>>>> With MMIO BAR, we get mapped virtual address.
>>>> We distinguish PIO(Programmed IO) and MMIO(memory mapped IO) by 
>>>> their address like how kernel does.
>>>> ioread/write8/16/32 is provided to access PIO/MMIO.
>>>> By the way, for virtio on arch other than x86, BAR flag indicates 
>>>> PIO but is mapped.
>>>>
>>>> Signed-off-by: huawei xie <huawei.xhw@alibaba-inc.com>
>>>> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>
>>> <...>
>>>
>>>> +
>>>> +static inline void iowrite8(uint8_t val, void *addr)
>>>> +{
>>>> +    (uint64_t)(uintptr_t)addr >= PIO_MAX ?
>>>> +        *(volatile uint8_t *)addr = val :
>>>> +        outb(val, (unsigned long)addr);
>>>
>>> //copying question from previous version:
>>>
>>> Is the 'outb_p' to 'outb' conversion intentional? And if so why?
>>>
>>> Same of the all 'outb_p', 'outw_p', 'outl_p'.
>>
>> There is no need to delay for virtio device, as we can see in virtio 
>> legacy driver.
>>
>> IMO, the delay is for ugly old device. The device itself should 
>> assure the previous IO completes when the subsequent IO instruction 
>> arrives.
>>
>
> Can there be any virtio legacy device needing this?

The pause version delays sometime by writing to 0x80 debug port. virtio 
doesn't need this. virtio legacy PMD driver doens't use this.

Any device relying on this i think is buggy. How could the device rely 
on some uncertain cpu cycles to behave correct?

>
> What is the downside of using "pause until the I/O completes" versions?

The downside in virtio PMD is a small performance penalty when we use it 
to notify backend. CPU executes unnecessary serializing IO instruction.

I check kernel code, io wrapper for in/out doesn't use p version.
  
David Marchand Feb. 25, 2021, 9:52 a.m. UTC | #5
On Thu, Feb 25, 2021 at 5:00 AM 谢华伟(此时此刻) <huawei.xhw@alibaba-inc.com> wrote:
> >>> Is the 'outb_p' to 'outb' conversion intentional? And if so why?
> >>>
> >>> Same of the all 'outb_p', 'outw_p', 'outl_p'.
> >>
> >> There is no need to delay for virtio device, as we can see in virtio
> >> legacy driver.
> >>
> >> IMO, the delay is for ugly old device. The device itself should
> >> assure the previous IO completes when the subsequent IO instruction
> >> arrives.
> >>
> >
> > Can there be any virtio legacy device needing this?
>
> The pause version delays sometime by writing to 0x80 debug port. virtio
> doesn't need this. virtio legacy PMD driver doens't use this.
>
> Any device relying on this i think is buggy. How could the device rely
> on some uncertain cpu cycles to behave correct?
>
> >
> > What is the downside of using "pause until the I/O completes" versions?
>
> The downside in virtio PMD is a small performance penalty when we use it
> to notify backend. CPU executes unnecessary serializing IO instruction.
>
> I check kernel code, io wrapper for in/out doesn't use p version.

This change is a fix/optimisation.
This is a separate topic from adding MMIO support with x86 ioport.
I would split as a separate patch.
  
谢华伟(此时此刻) March 1, 2021, 3:43 p.m. UTC | #6
On 2021/2/25 17:52, David Marchand wrote:
> On Thu, Feb 25, 2021 at 5:00 AM 谢华伟(此时此刻) <huawei.xhw@alibaba-inc.com> wrote:
>>>>> Is the 'outb_p' to 'outb' conversion intentional? And if so why?
>>>>>
>>>>> Same of the all 'outb_p', 'outw_p', 'outl_p'.
>>>> There is no need to delay for virtio device, as we can see in virtio
>>>> legacy driver.
>>>>
>>>> IMO, the delay is for ugly old device. The device itself should
>>>> assure the previous IO completes when the subsequent IO instruction
>>>> arrives.
>>>>
>>> Can there be any virtio legacy device needing this?
>> The pause version delays sometime by writing to 0x80 debug port. virtio
>> doesn't need this. virtio legacy PMD driver doens't use this.
>>
>> Any device relying on this i think is buggy. How could the device rely
>> on some uncertain cpu cycles to behave correct?
>>
>>> What is the downside of using "pause until the I/O completes" versions?
>> The downside in virtio PMD is a small performance penalty when we use it
>> to notify backend. CPU executes unnecessary serializing IO instruction.
>>
>> I check kernel code, io wrapper for in/out doesn't use p version.
> This change is a fix/optimisation.
> This is a separate topic from adding MMIO support with x86 ioport.
> I would split as a separate patch.

Hi David:

Maybe there is confuse? There is no change. The out/in is added. I don't 
remove _p on purpose.

>
>
  
David Marchand March 2, 2021, 1:14 p.m. UTC | #7
On Mon, Mar 1, 2021 at 4:44 PM 谢华伟(此时此刻) <huawei.xhw@alibaba-inc.com> wrote:
> >>> What is the downside of using "pause until the I/O completes" versions?
> >> The downside in virtio PMD is a small performance penalty when we use it
> >> to notify backend. CPU executes unnecessary serializing IO instruction.
> >>
> >> I check kernel code, io wrapper for in/out doesn't use p version.
> > This change is a fix/optimisation.
> > This is a separate topic from adding MMIO support with x86 ioport.
> > I would split as a separate patch.
>
> Hi David:
>
> Maybe there is confuse? There is no change. The out/in is added. I don't
> remove _p on purpose.

Looking at v8 and repeating previous mails:

+#if defined(RTE_ARCH_X86)
...
+static inline void iowrite8(uint8_t val, void *addr)
+{
+    (uint64_t)(uintptr_t)addr >= PIO_MAX ?
+        *(volatile uint8_t *)addr = val :
+        outb(val, (unsigned long)addr); <======
+}

[...]


-#if defined(RTE_ARCH_X86)
-            outb_p(*s, reg); <======
-#else
-            *(volatile uint8_t *)reg = *s;
-#endif
+            iowrite8(*s, (void *)reg);


This almost went unnoticed (thanks Ferruh for spotting).

Do we _need_ this change on outX_p -> outX?

I am not comfortable at touching such low level internal routines that
have been in dpdk since v1.5.0.

If there is a good reason, it has nothing to do with adding MMIO
support and must be split in a separate patch.
If there is no reason, please restore outX_p, since the safest is not to touch.
  
谢华伟(此时此刻) March 3, 2021, 7:56 a.m. UTC | #8
On 2021/3/2 21:14, David Marchand wrote:
>>> This change is a fix/optimisation.
>>> This is a separate topic from adding MMIO support with x86 ioport.
>>> I would split as a separate patch.
>> Hi David:
>>
>> Maybe there is confuse? There is no change. The out/in is added. I don't
>> remove _p on purpose.
> Looking at v8 and repeating previous mails:
>
> +#if defined(RTE_ARCH_X86)
> ...
> +static inline void iowrite8(uint8_t val, void *addr)
> +{
> +    (uint64_t)(uintptr_t)addr >= PIO_MAX ?
> +        *(volatile uint8_t *)addr = val :
> +        outb(val, (unsigned long)addr); <======
> +}
>
> [...]
>
>
> -#if defined(RTE_ARCH_X86)
> -            outb_p(*s, reg); <======
> -#else
> -            *(volatile uint8_t *)reg = *s;
> -#endif
> +            iowrite8(*s, (void *)reg);
>
>
> This almost went unnoticed (thanks Ferruh for spotting).
>
> Do we_need_  this change on outX_p -> outX?

I understand where the confuse comes from. In the previous 
implementation, it is _p version, however i think _p is not needed.

In the initial DPDK virtio PMD, there is no _p, if my memory is correct. 
Don't know who added it, if any reason.

Anyway, i will send v9 with _p.

> I am not comfortable at touching such low level internal routines that
> have been in dpdk since v1.5.0.
>
> If there is a good reason, it has nothing to do with adding MMIO
> support and must be split in a separate patch.
> If there is no reason, please restore outX_p, since the safest is not to touch.
  

Patch

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 0f38abf..0dc99e9 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -715,8 +715,6 @@  int rte_pci_write_config(const struct rte_pci_device *device,
 		break;
 #endif
 	case RTE_PCI_KDRV_IGB_UIO:
-		pci_uio_ioport_read(p, data, len, offset);
-		break;
 	case RTE_PCI_KDRV_UIO_GENERIC:
 		pci_uio_ioport_read(p, data, len, offset);
 		break;
@@ -736,8 +734,6 @@  int rte_pci_write_config(const struct rte_pci_device *device,
 		break;
 #endif
 	case RTE_PCI_KDRV_IGB_UIO:
-		pci_uio_ioport_write(p, data, len, offset);
-		break;
 	case RTE_PCI_KDRV_UIO_GENERIC:
 		pci_uio_ioport_write(p, data, len, offset);
 		break;
diff --git a/drivers/bus/pci/linux/pci_uio.c b/drivers/bus/pci/linux/pci_uio.c
index 01f2a40..f566953 100644
--- a/drivers/bus/pci/linux/pci_uio.c
+++ b/drivers/bus/pci/linux/pci_uio.c
@@ -368,6 +368,8 @@ 
 	return -1;
 }
 
+#define PIO_MAX 0x10000
+
 #if defined(RTE_ARCH_X86)
 int
 pci_uio_ioport_map(struct rte_pci_device *dev, int bar,
@@ -381,12 +383,6 @@ 
 	unsigned long base;
 	int i;
 
-	if (rte_eal_iopl_init() != 0) {
-		RTE_LOG(ERR, EAL, "%s(): insufficient ioport permissions for PCI device %s\n",
-			__func__, dev->name);
-		return -1;
-	}
-
 	/* open and read addresses of the corresponding resource in sysfs */
 	snprintf(filename, sizeof(filename), "%s/" PCI_PRI_FMT "/resource",
 		rte_pci_get_sysfs_path(), dev->addr.domain, dev->addr.bus,
@@ -408,15 +404,27 @@ 
 		&end_addr, &flags) < 0)
 		goto error;
 
-	if (!(flags & IORESOURCE_IO)) {
-		RTE_LOG(ERR, EAL, "%s(): bar resource other than IO is not supported\n", __func__);
-		goto error;
-	}
-	base = (unsigned long)phys_addr;
-	RTE_LOG(INFO, EAL, "%s(): PIO BAR %08lx detected\n", __func__, base);
+	if (flags & IORESOURCE_IO) {
+		if (rte_eal_iopl_init()) {
+			RTE_LOG(ERR, EAL, "%s(): insufficient ioport permissions for PCI device %s\n",
+				__func__, dev->name);
+			goto error;
+		}
 
-	if (base > UINT16_MAX)
+		base = (unsigned long)phys_addr;
+		if (base > PIO_MAX) {
+			RTE_LOG(ERR, EAL, "%s(): %08lx too large PIO resource\n", __func__, base);
+			goto error;
+		}
+
+		RTE_LOG(DEBUG, EAL, "%s(): PIO BAR %08lx detected\n", __func__, base);
+	} else if (flags & IORESOURCE_MEM) {
+		base = (unsigned long)dev->mem_resource[bar].addr;
+		RTE_LOG(DEBUG, EAL, "%s(): MMIO BAR %08lx detected\n", __func__, base);
+	} else {
+		RTE_LOG(ERR, EAL, "%s(): unknown BAR type\n", __func__);
 		goto error;
+	}
 
 	/* FIXME only for primary process ? */
 	if (dev->intr_handle.type == RTE_INTR_HANDLE_UNKNOWN) {
@@ -517,6 +525,92 @@ 
 }
 #endif
 
+#if defined(RTE_ARCH_X86)
+static inline uint8_t ioread8(void *addr)
+{
+	uint8_t val;
+
+	val = (uint64_t)(uintptr_t)addr >= PIO_MAX ?
+		*(volatile uint8_t *)addr :
+		inb((unsigned long)addr);
+
+	return val;
+}
+
+static inline uint16_t ioread16(void *addr)
+{
+	uint16_t val;
+
+	val = (uint64_t)(uintptr_t)addr >= PIO_MAX ?
+		*(volatile uint16_t *)addr :
+		inw((unsigned long)addr);
+
+	return val;
+}
+
+static inline uint32_t ioread32(void *addr)
+{
+	uint32_t val;
+
+	val = (uint64_t)(uintptr_t)addr >= PIO_MAX ?
+		*(volatile uint32_t *)addr :
+		inl((unsigned long)addr);
+
+	return val;
+}
+
+static inline void iowrite8(uint8_t val, void *addr)
+{
+	(uint64_t)(uintptr_t)addr >= PIO_MAX ?
+		*(volatile uint8_t *)addr = val :
+		outb(val, (unsigned long)addr);
+}
+
+static inline void iowrite16(uint16_t val, void *addr)
+{
+	(uint64_t)(uintptr_t)addr >= PIO_MAX ?
+		*(volatile uint16_t *)addr = val :
+		outw(val, (unsigned long)addr);
+}
+
+static inline void iowrite32(uint32_t val, void *addr)
+{
+	(uint64_t)(uintptr_t)addr >= PIO_MAX ?
+		*(volatile uint32_t *)addr = val :
+		outl(val, (unsigned long)addr);
+}
+#else
+static inline uint8_t ioread8(void *addr)
+{
+	return *(volatile uint8_t *)addr;
+}
+
+static inline uint16_t ioread16(void *addr)
+{
+	return *(volatile uint16_t *)addr;
+}
+
+static inline uint32_t ioread32(void *addr)
+{
+	return *(volatile uint32_t *)addr;
+}
+
+static inline void iowrite8(uint8_t val, void *addr)
+{
+	*(volatile uint8_t *)addr = val;
+}
+
+static inline void iowrite16(uint16_t val, void *addr)
+{
+	*(volatile uint16_t *)addr = val;
+}
+
+static inline void iowrite32(uint32_t val, void *addr)
+{
+	*(volatile uint32_t *)addr = val;
+}
+#endif
+
 void
 pci_uio_ioport_read(struct rte_pci_ioport *p,
 		    void *data, size_t len, off_t offset)
@@ -528,25 +622,13 @@ 
 	for (d = data; len > 0; d += size, reg += size, len -= size) {
 		if (len >= 4) {
 			size = 4;
-#if defined(RTE_ARCH_X86)
-			*(uint32_t *)d = inl(reg);
-#else
-			*(uint32_t *)d = *(volatile uint32_t *)reg;
-#endif
+			*(uint32_t *)d = ioread32((void *)reg);
 		} else if (len >= 2) {
 			size = 2;
-#if defined(RTE_ARCH_X86)
-			*(uint16_t *)d = inw(reg);
-#else
-			*(uint16_t *)d = *(volatile uint16_t *)reg;
-#endif
+			*(uint16_t *)d = ioread16((void *)reg);
 		} else {
 			size = 1;
-#if defined(RTE_ARCH_X86)
-			*d = inb(reg);
-#else
-			*d = *(volatile uint8_t *)reg;
-#endif
+			*d = ioread8((void *)reg);
 		}
 	}
 }
@@ -562,25 +644,13 @@ 
 	for (s = data; len > 0; s += size, reg += size, len -= size) {
 		if (len >= 4) {
 			size = 4;
-#if defined(RTE_ARCH_X86)
-			outl_p(*(const uint32_t *)s, reg);
-#else
-			*(volatile uint32_t *)reg = *(const uint32_t *)s;
-#endif
+			iowrite32(*(const uint32_t *)s, (void *)reg);
 		} else if (len >= 2) {
 			size = 2;
-#if defined(RTE_ARCH_X86)
-			outw_p(*(const uint16_t *)s, reg);
-#else
-			*(volatile uint16_t *)reg = *(const uint16_t *)s;
-#endif
+			iowrite16(*(const uint16_t *)s, (void *)reg);
 		} else {
 			size = 1;
-#if defined(RTE_ARCH_X86)
-			outb_p(*s, reg);
-#else
-			*(volatile uint8_t *)reg = *s;
-#endif
+			iowrite8(*s, (void *)reg);
 		}
 	}
 }