[v3] kni: fix possible kernel crash with va2pa

Message ID 20190625150414.11332-1-zhouyates@gmail.com (mailing list archive)
State Accepted, archived
Delegated to: Thomas Monjalon
Series: [v3] kni: fix possible kernel crash with va2pa

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/intel-Performance-Testing success Performance Testing PASS
ci/mellanox-Performance-Testing success Performance Testing PASS

Commit Message

Matt June 25, 2019, 3:04 p.m. UTC
va2pa depends on the offset between the physical and virtual addresses
of the current mbuf. It may compute a wrong physical address for the
next mbuf, which may have been allocated in another hugepage segment.

In rte_mempool_populate_default(), the attempt to allocate a whole
block of contiguous memory may fail. Memory is then reserved in several
memzones that have different physical-to-virtual address offsets.
rte_mempool_populate_default() is used by rte_pktmbuf_pool_create().

Fixes: 8451269e6d7b ("kni: remove continuous memory restriction")

Signed-off-by: Yangchao Zhou <zhouyates@gmail.com>
---
 .../prog_guide/kernel_nic_interface.rst       |  6 ++-
 kernel/linux/kni/kni_net.c                    | 51 ++++++++++++-------
 .../linux/eal/include/rte_kni_common.h        |  2 +-
 lib/librte_kni/rte_kni.c                      | 16 +++++-
 lib/librte_kni/rte_kni_fifo.h                 | 12 +++++
 5 files changed, 65 insertions(+), 22 deletions(-)
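
For illustration, here is a minimal sketch of the failure mode described in the
commit message. The va2pa() helper shown is the kernel-side function this patch
removes (see the kni_net.c hunk below); the surrounding comments are editorial,
not DPDK source:

	/* Old kernel-side helper, removed by this patch: it derives a
	 * physical address from a virtual one using the VA/PA offset of
	 * the *current* mbuf only.
	 */
	static void *
	va2pa(void *va, struct rte_kni_mbuf *m)
	{
		void *pa;

		pa = (void *)((unsigned long)va -
				((unsigned long)m->buf_addr -
				 (unsigned long)m->buf_physaddr));
		return pa;
	}

	/* The crash scenario: when m->next was allocated from a memzone
	 * with a different VA/PA offset, the old chain walk
	 *
	 *	kva = pa2kva(va2pa(kva->next, kva));
	 *
	 * produces a bogus kernel address and dereferences unrelated
	 * memory. The fix stores physical addresses in the next pointers
	 * on enqueue (va2pa_all() in librte_kni) and rewrites them back
	 * to virtual addresses with pa2va() before the mbufs return
	 * through free_q.
	 */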
  

Comments

yoursunny July 2, 2019, 8:07 p.m. UTC | #1
I am battling a related problem as reported on
https://bugs.dpdk.org/show_bug.cgi?id=183 and this patch seems
relevant, so I applied this patch on 196a46fab6eeb3ce2039e3bcaca80f8ba43ffc8d

However, this patch does not work for me:
with CONFIG_RTE_LIBRTE_MBUF_DEBUG enabled, kni_free_mbufs's invocation of
rte_pktmbuf_free throws a "bad mbuf pool" error.

While all mbufs and segments in kni->rx_q now have physical addresses,
the mbufs and segments placed back in kni->free_q still carry (mis-)calculated
virtual addresses. The pa2va function is not working properly.

Consequently, the userspace side is passing a wrong pointer to rte_pktmbuf_free,
so the application crashes with CONFIG_RTE_LIBRTE_MBUF_DEBUG enabled.
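
For context, the kernel-side pa2va() at issue has essentially this shape
(assumed from the in-tree helper; it is not changed by this patch). It relies
on buf_addr/buf_physaddr describing the memzone that holds the mbuf header
itself; when they do not, the recovered virtual address is wrong, as the thread
below narrows down for indirect mbufs:

	/* Assumed shape of the kernel-side helper (kni_net.c): rebuild a
	 * virtual address from a physical one using the VA/PA offset taken
	 * from the mbuf's own buffer fields.
	 */
	static void *
	pa2va(void *pa, struct rte_kni_mbuf *m)
	{
		return (void *)((unsigned long)pa +
				((unsigned long)m->buf_addr -
				 (unsigned long)m->buf_physaddr));
	}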
  
Ferruh Yigit July 10, 2019, 8:09 p.m. UTC | #2
On 6/25/2019 4:04 PM, Yangchao Zhou wrote:
> va2pa depends on the offset between the physical and virtual addresses
> of the current mbuf. It may compute a wrong physical address for the
> next mbuf, which may have been allocated in another hugepage segment.
> 
> In rte_mempool_populate_default(), the attempt to allocate a whole
> block of contiguous memory may fail. Memory is then reserved in several
> memzones that have different physical-to-virtual address offsets.
> rte_mempool_populate_default() is used by rte_pktmbuf_pool_create().
> 
> Fixes: 8451269e6d7b ("kni: remove continuous memory restriction")
> 
> Signed-off-by: Yangchao Zhou <zhouyates@gmail.com>

Overall looks good to me. Not from this patch, but can you please check the
comment below too.
Also there is a comment from Junxiao; let's clear it before the ack.

<...>

> @@ -396,7 +401,7 @@ kni_net_rx_lo_fifo(struct kni_dev *kni)
>  	uint32_t ret;
>  	uint32_t len;
>  	uint32_t i, num, num_rq, num_tq, num_aq, num_fq;
> -	struct rte_kni_mbuf *kva;
> +	struct rte_kni_mbuf *kva, *next_kva;
>  	void *data_kva;
>  	struct rte_kni_mbuf *alloc_kva;
>  	void *alloc_data_kva;
> @@ -439,6 +444,13 @@ kni_net_rx_lo_fifo(struct kni_dev *kni)
>  			data_kva = kva2data_kva(kva);
>  			kni->va[i] = pa2va(kni->pa[i], kva);
>  
> +			while (kva->next) {
> +				next_kva = pa2kva(kva->next);
> +				/* Convert physical address to virtual address */
> +				kva->next = pa2va(kva->next, next_kva);
> +				kva = next_kva;
> +			}

Not done in this patch, but in 'kni_net_rx_lo_fifo()' the len is calculated as
'len = kva->pkt_len;'

But while copying 'data' to 'alloc_data' the segmentation is not taken into
account and 'len' is used as-is:
memcpy(alloc_data_kva, data_kva, len);

This may overflow 'alloc_data_kva' for some 'pkt_len' values.
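
A hedged sketch of the segment-aware copy being suggested (the actual follow-up
patch may differ): fold the copy into the loop added by this patch, bounding
each memcpy by the segment's data_len so a multi-segment pkt_len cannot run
past the source segment. A bounds check against the destination buffer size,
elided here, would also be needed:

	len = 0;
	while (kva) {
		/* copy only this segment's bytes, not the whole pkt_len */
		memcpy((char *)alloc_data_kva + len, kva2data_kva(kva),
		       kva->data_len);
		len += kva->data_len;
		if (!kva->next)
			break;
		next_kva = pa2kva(kva->next);
		/* Convert physical address to virtual address */
		kva->next = pa2va(kva->next, next_kva);
		kva = next_kva;
	}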
  
Ferruh Yigit July 10, 2019, 8:11 p.m. UTC | #3
On 7/2/2019 9:07 PM, Junxiao Shi wrote:
> I am battling a related problem as reported on
> https://bugs.dpdk.org/show_bug.cgi?id=183 and this patch seems
> relevant, so I applied this patch on 196a46fab6eeb3ce2039e3bcaca80f8ba43ffc8d
> 
> However, this patch does not work for me:
> with CONFIG_RTE_LIBRTE_MBUF_DEBUG enabled, kni_free_mbufs's invocation of
> rte_pktmbuf_free throws a "bad mbuf pool" error.
> 
> While all mbufs and segments in kni->rx_q now have physical addresses,
> the mbufs and segments placed back in kni->free_q still carry (mis-)calculated
> virtual addresses. The pa2va function is not working properly.
> 
> Consequently, the userspace side is passing a wrong pointer to rte_pktmbuf_free,
> so the application crashes with CONFIG_RTE_LIBRTE_MBUF_DEBUG enabled.
> 

Hi Junxiao,

I don't see any issue in the code, and as far as I have tested I am not seeing
any issue either.

Can you please provide more information on how to reproduce the issue? Is it
only a matter of enabling the "CONFIG_RTE_LIBRTE_MBUF_DEBUG" config?

Thanks,
ferruh
  
yoursunny July 10, 2019, 8:40 p.m. UTC | #4
Hi Ferruh

I've uploaded my test case to Bug 183.
You can determine whether it is a bug that should be covered by this
patch, or whether it should be handled separately.

Yours, Junxiao

  
Ferruh Yigit July 10, 2019, 9:23 p.m. UTC | #5
On 7/10/2019 9:40 PM, yoursunny wrote:
> Hi Ferruh
> 
> I've uploaded my test case to Bug 183.
> You can determine whether it is a bug that should be covered by this
> patch, or it should be handled separately.

Thanks for the clarification.
Your use case is about indirect mbufs; I can see from your explanation in the
bug report why it is not working.
I think previously, when all mbufs were coming from the same physically
contiguous memory, indirect mbufs would work. Right now I don't know how to
support them.

This patch fixes an issue for multi-segment packets, caused by wrong address
calculation for the next segments.

So I am for continuing with this patch, independent of the indirect mbufs issue.

Do you have any other issues, apart from the indirect mbuf one, in your other
KNI use cases?


  
yoursunny July 10, 2019, 11:52 p.m. UTC | #6
Hi Ferruh

I have no objection to merging the current commit.
I do have some ideas on how to support indirect mbufs. Please review
at https://bugs.dpdk.org/show_bug.cgi?id=183#c4

Yours, Junxiao

On Wed, Jul 10, 2019 at 5:23 PM Ferruh Yigit <ferruh.yigit@intel.com> wrote:
>
> Your use case is about indirect mbufs; I can see from your explanation in the
> bug report why it is not working.
> I think previously, when all mbufs were coming from the same physically
> contiguous memory, indirect mbufs would work. Right now I don't know how to
> support them.
  
Ferruh Yigit July 11, 2019, 7:46 a.m. UTC | #7
On 7/10/2019 9:09 PM, Ferruh Yigit wrote:
> On 6/25/2019 4:04 PM, Yangchao Zhou wrote:
>> va2pa depends on the offset between the physical and virtual addresses
>> of the current mbuf. It may compute a wrong physical address for the
>> next mbuf, which may have been allocated in another hugepage segment.
>>
>> In rte_mempool_populate_default(), the attempt to allocate a whole
>> block of contiguous memory may fail. Memory is then reserved in several
>> memzones that have different physical-to-virtual address offsets.
>> rte_mempool_populate_default() is used by rte_pktmbuf_pool_create().
>>
>> Fixes: 8451269e6d7b ("kni: remove continuous memory restriction")
>>
>> Signed-off-by: Yangchao Zhou <zhouyates@gmail.com>
> 
> Overall looks good to me. Not from this patch, but can you please check the
> comment below too.
> Also there is a comment from Junxiao; let's clear it before the ack.
> 

Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>

> <...>
> 
>> @@ -396,7 +401,7 @@ kni_net_rx_lo_fifo(struct kni_dev *kni)
>>  	uint32_t ret;
>>  	uint32_t len;
>>  	uint32_t i, num, num_rq, num_tq, num_aq, num_fq;
>> -	struct rte_kni_mbuf *kva;
>> +	struct rte_kni_mbuf *kva, *next_kva;
>>  	void *data_kva;
>>  	struct rte_kni_mbuf *alloc_kva;
>>  	void *alloc_data_kva;
>> @@ -439,6 +444,13 @@ kni_net_rx_lo_fifo(struct kni_dev *kni)
>>  			data_kva = kva2data_kva(kva);
>>  			kni->va[i] = pa2va(kni->pa[i], kva);
>>  
>> +			while (kva->next) {
>> +				next_kva = pa2kva(kva->next);
>> +				/* Convert physical address to virtual address */
>> +				kva->next = pa2va(kva->next, next_kva);
>> +				kva = next_kva;
>> +			}
> 
> Not done in this patch, but in 'kni_net_rx_lo_fifo()' the len is calculated as
> 'len = kva->pkt_len;'
> 
> But while copying 'data' to 'alloc_data' the segmentation is not taken into
> account and 'len' is used as-is:
> memcpy(alloc_data_kva, data_kva, len);
> 
> This may overflow 'alloc_data_kva' for some 'pkt_len' values.
> 

I will send a separate patch for this.
  
Thomas Monjalon July 15, 2019, 8:50 p.m. UTC | #8
11/07/2019 09:46, Ferruh Yigit:
> On 7/10/2019 9:09 PM, Ferruh Yigit wrote:
> > On 6/25/2019 4:04 PM, Yangchao Zhou wrote:
> >> va2pa depends on the offset between the physical and virtual addresses
> >> of the current mbuf. It may compute a wrong physical address for the
> >> next mbuf, which may have been allocated in another hugepage segment.
> >>
> >> In rte_mempool_populate_default(), the attempt to allocate a whole
> >> block of contiguous memory may fail. Memory is then reserved in several
> >> memzones that have different physical-to-virtual address offsets.
> >> rte_mempool_populate_default() is used by rte_pktmbuf_pool_create().
> >>
> >> Fixes: 8451269e6d7b ("kni: remove continuous memory restriction")
> >>
> >> Signed-off-by: Yangchao Zhou <zhouyates@gmail.com>
> > 
> > Overall looks good to me. Not from this patch, but can you please check the
> > comment below too.
> > Also there is a comment from Junxiao; let's clear it before the ack.
> > 
> 
> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>

The commit log does not really explain either the use case
or the solution. But as you acked it...
Applied, thanks
  

Patch

diff --git a/doc/guides/prog_guide/kernel_nic_interface.rst b/doc/guides/prog_guide/kernel_nic_interface.rst
index daf87f4a8..2fd3b7983 100644
--- a/doc/guides/prog_guide/kernel_nic_interface.rst
+++ b/doc/guides/prog_guide/kernel_nic_interface.rst
@@ -268,12 +268,14 @@  Use Case: Ingress
 -----------------
 
 On the DPDK RX side, the mbuf is allocated by the PMD in the RX thread context.
-This thread will enqueue the mbuf in the rx_q FIFO.
+This thread will enqueue the mbuf in the rx_q FIFO, and the next pointers in the mbuf chain are converted to physical addresses.
 The KNI thread will poll all KNI active devices for the rx_q.
 If an mbuf is dequeued, it will be converted to a sk_buff and sent to the net stack via netif_rx().
-The dequeued mbuf must be freed, so the same pointer is sent back in the free_q FIFO.
+The dequeued mbuf must be freed, so the same pointer is sent back in the free_q FIFO,
+and any next pointers must be converted back to virtual addresses before the mbuf is put in the free_q FIFO.
 
 The RX thread, in the same main loop, polls this FIFO and frees the mbuf after dequeuing it.
+The address conversion of the next pointers prevents chained mbufs that span different hugepage segments from causing a kernel crash.
 
 Use Case: Egress
 ----------------
diff --git a/kernel/linux/kni/kni_net.c b/kernel/linux/kni/kni_net.c
index ad8365877..f65233aaa 100644
--- a/kernel/linux/kni/kni_net.c
+++ b/kernel/linux/kni/kni_net.c
@@ -61,18 +61,6 @@  kva2data_kva(struct rte_kni_mbuf *m)
 	return phys_to_virt(m->buf_physaddr + m->data_off);
 }
 
-/* virtual address to physical address */
-static void *
-va2pa(void *va, struct rte_kni_mbuf *m)
-{
-	void *pa;
-
-	pa = (void *)((unsigned long)va -
-			((unsigned long)m->buf_addr -
-			 (unsigned long)m->buf_physaddr));
-	return pa;
-}
-
 /*
  * It can be called to process the request.
  */
@@ -173,7 +161,10 @@  kni_fifo_trans_pa2va(struct kni_dev *kni,
 	struct rte_kni_fifo *src_pa, struct rte_kni_fifo *dst_va)
 {
 	uint32_t ret, i, num_dst, num_rx;
-	void *kva;
+	struct rte_kni_mbuf *kva, *prev_kva;
+	int nb_segs;
+	int kva_nb_segs;
+
 	do {
 		num_dst = kni_fifo_free_count(dst_va);
 		if (num_dst == 0)
@@ -188,6 +179,17 @@  kni_fifo_trans_pa2va(struct kni_dev *kni,
 		for (i = 0; i < num_rx; i++) {
 			kva = pa2kva(kni->pa[i]);
 			kni->va[i] = pa2va(kni->pa[i], kva);
+
+			kva_nb_segs = kva->nb_segs;
+			for (nb_segs = 0; nb_segs < kva_nb_segs; nb_segs++) {
+				if (!kva->next)
+					break;
+
+				prev_kva = kva;
+				kva = pa2kva(kva->next);
+				/* Convert physical address to virtual address */
+				prev_kva->next = pa2va(prev_kva->next, kva);
+			}
 		}
 
 		ret = kni_fifo_put(dst_va, kni->va, num_rx);
@@ -313,7 +315,7 @@  kni_net_rx_normal(struct kni_dev *kni)
 	uint32_t ret;
 	uint32_t len;
 	uint32_t i, num_rx, num_fq;
-	struct rte_kni_mbuf *kva;
+	struct rte_kni_mbuf *kva, *prev_kva;
 	void *data_kva;
 	struct sk_buff *skb;
 	struct net_device *dev = kni->net_dev;
@@ -363,8 +365,11 @@  kni_net_rx_normal(struct kni_dev *kni)
 				if (!kva->next)
 					break;
 
-				kva = pa2kva(va2pa(kva->next, kva));
+				prev_kva = kva;
+				kva = pa2kva(kva->next);
 				data_kva = kva2data_kva(kva);
+				/* Convert physical address to virtual address */
+				prev_kva->next = pa2va(prev_kva->next, kva);
 			}
 		}
 
@@ -396,7 +401,7 @@  kni_net_rx_lo_fifo(struct kni_dev *kni)
 	uint32_t ret;
 	uint32_t len;
 	uint32_t i, num, num_rq, num_tq, num_aq, num_fq;
-	struct rte_kni_mbuf *kva;
+	struct rte_kni_mbuf *kva, *next_kva;
 	void *data_kva;
 	struct rte_kni_mbuf *alloc_kva;
 	void *alloc_data_kva;
@@ -439,6 +444,13 @@  kni_net_rx_lo_fifo(struct kni_dev *kni)
 			data_kva = kva2data_kva(kva);
 			kni->va[i] = pa2va(kni->pa[i], kva);
 
+			while (kva->next) {
+				next_kva = pa2kva(kva->next);
+				/* Convert physical address to virtual address */
+				kva->next = pa2va(kva->next, next_kva);
+				kva = next_kva;
+			}
+
 			alloc_kva = pa2kva(kni->alloc_pa[i]);
 			alloc_data_kva = kva2data_kva(alloc_kva);
 			kni->alloc_va[i] = pa2va(kni->alloc_pa[i], alloc_kva);
@@ -481,7 +493,7 @@  kni_net_rx_lo_fifo_skb(struct kni_dev *kni)
 	uint32_t ret;
 	uint32_t len;
 	uint32_t i, num_rq, num_fq, num;
-	struct rte_kni_mbuf *kva;
+	struct rte_kni_mbuf *kva, *prev_kva;
 	void *data_kva;
 	struct sk_buff *skb;
 	struct net_device *dev = kni->net_dev;
@@ -545,8 +557,11 @@  kni_net_rx_lo_fifo_skb(struct kni_dev *kni)
 				if (!kva->next)
 					break;
 
-				kva = pa2kva(va2pa(kva->next, kva));
+				prev_kva = kva;
+				kva = pa2kva(kva->next);
 				data_kva = kva2data_kva(kva);
+				/* Convert physical address to virtual address */
+				prev_kva->next = pa2va(prev_kva->next, kva);
 			}
 		}
 
diff --git a/lib/librte_eal/linux/eal/include/rte_kni_common.h b/lib/librte_eal/linux/eal/include/rte_kni_common.h
index 91a1c1408..37d9ee8f0 100644
--- a/lib/librte_eal/linux/eal/include/rte_kni_common.h
+++ b/lib/librte_eal/linux/eal/include/rte_kni_common.h
@@ -86,7 +86,7 @@  struct rte_kni_mbuf {
 	/* fields on second cache line */
 	char pad3[8] __attribute__((__aligned__(RTE_CACHE_LINE_MIN_SIZE)));
 	void *pool;
-	void *next;
+	void *next;             /**< Physical address of next mbuf in kernel. */
 };
 
 /*
diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c
index e29d0cc7d..6fc36f744 100644
--- a/lib/librte_kni/rte_kni.c
+++ b/lib/librte_kni/rte_kni.c
@@ -344,6 +344,19 @@  va2pa(struct rte_mbuf *m)
 			 (unsigned long)m->buf_iova));
 }
 
+static void *
+va2pa_all(struct rte_mbuf *mbuf)
+{
+	void *phy_mbuf = va2pa(mbuf);
+	struct rte_mbuf *next = mbuf->next;
+	while (next) {
+		mbuf->next = va2pa(next);
+		mbuf = next;
+		next = mbuf->next;
+	}
+	return phy_mbuf;
+}
+
 static void
 obj_free(struct rte_mempool *mp __rte_unused, void *opaque, void *obj,
 		unsigned obj_idx __rte_unused)
@@ -536,12 +549,13 @@  rte_kni_handle_request(struct rte_kni *kni)
 unsigned
 rte_kni_tx_burst(struct rte_kni *kni, struct rte_mbuf **mbufs, unsigned num)
 {
+	num = RTE_MIN(kni_fifo_free_count(kni->rx_q), num);
 	void *phy_mbufs[num];
 	unsigned int ret;
 	unsigned int i;
 
 	for (i = 0; i < num; i++)
-		phy_mbufs[i] = va2pa(mbufs[i]);
+		phy_mbufs[i] = va2pa_all(mbufs[i]);
 
 	ret = kni_fifo_put(kni->rx_q, phy_mbufs, num);
 
diff --git a/lib/librte_kni/rte_kni_fifo.h b/lib/librte_kni/rte_kni_fifo.h
index 287d7deb2..11d7fd6bd 100644
--- a/lib/librte_kni/rte_kni_fifo.h
+++ b/lib/librte_kni/rte_kni_fifo.h
@@ -104,3 +104,15 @@  kni_fifo_count(struct rte_kni_fifo *fifo)
 	unsigned fifo_read = __KNI_LOAD_ACQUIRE(&fifo->read);
 	return (fifo->len + fifo_write - fifo_read) & (fifo->len - 1);
 }
+
+/**
+ * Get the num of available elements in the fifo
+ */
+static inline uint32_t
+kni_fifo_free_count(struct rte_kni_fifo *fifo)
+{
+	uint32_t fifo_write = __KNI_LOAD_ACQUIRE(&fifo->write);
+	uint32_t fifo_read = __KNI_LOAD_ACQUIRE(&fifo->read);
+	return (fifo_read - fifo_write - 1) & (fifo->len - 1);
+}
+
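
A note on the two userspace changes above: kni_fifo_free_count() mirrors
kni_fifo_count() for the writer side. The ring length is a power of two and one
slot is kept empty to distinguish full from empty, so at most len - 1 elements
fit. Clamping num in rte_kni_tx_burst() matters because va2pa_all() rewrites
the next pointers of every mbuf it touches; without the clamp, mbufs rejected
by a full rx_q would be handed back to the caller with physical next pointers.
A worked example of the ring arithmetic, plus hedged caller-side usage with
hypothetical variable names:

	/* Ring arithmetic, len = 8 (power of two), write = 6, read = 2:
	 *   used = (len + write - read) & (len - 1) = (8 + 6 - 2) & 7 = 4
	 *   free = (read - write - 1)  & (len - 1) = (2 - 6 - 1) & 7 = 3
	 * used + free = 7 = len - 1: one slot stays empty by design.
	 */

	unsigned int sent = rte_kni_tx_burst(kni, mbufs, nb);

	/* mbufs[sent..nb-1] were not enqueued; their next pointers are
	 * still virtual, so they can be freed (or retried) as usual.
	 */
	while (sent < nb)
		rte_pktmbuf_free(mbufs[sent++]);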