[dpdk-dev,v3,1/4] vmxnet3: restore tx data ring support
Commit Message
Tx data ring support was removed in a previous change
that added multi-seg transmit. This change adds it back.
According to the original commit (2e849373), the 64B pkt
rate with l2fwd improved by ~20% on an Ivy Bridge
server, at which point we start to hit a bottleneck
on the rx side.
I also redid the same test on a different setup (Haswell
processor, ~2.3GHz clock rate) on top of master
and still observed ~17% performance gains.
Fixes: 7ba5de417e3c ("vmxnet3: support multi-segment transmit")
Signed-off-by: Yong Wang <yongwang@vmware.com>
---
doc/guides/rel_notes/release_2_3.rst | 5 +++++
drivers/net/vmxnet3/vmxnet3_rxtx.c | 17 ++++++++++++++++-
2 files changed, 21 insertions(+), 1 deletion(-)
Comments
On Tue, 5 Jan 2016 16:12:55 -0800
Yong Wang <yongwang@vmware.com> wrote:
> @@ -365,6 +366,14 @@ vmxnet3_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
> break;
> }
>
> + if (rte_pktmbuf_pkt_len(txm) <= VMXNET3_HDR_COPY_SIZE) {
> + struct Vmxnet3_TxDataDesc *tdd;
> +
> + tdd = txq->data_ring.base + txq->cmd_ring.next2fill;
> + copy_size = rte_pktmbuf_pkt_len(txm);
> + rte_memcpy(tdd->data, rte_pktmbuf_mtod(txm, char *), copy_size);
> + }
Good idea to use a local region, which optimizes the copy in the host,
but this implementation needs to be more general.
As written it is broken for multi-segment packets. A multi-segment
packet will have a pktlen >= datalen, as in:
m -> nb_segs=3, pktlen=1200, datalen=200
  -> datalen=900
  -> datalen=100
There are two ways to fix this: you could test for nb_segs == 1,
or, better yet, optimize each segment; it might be that the first
segment (or tail segment) would fit in the available data area.
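For illustration, the per-segment variant suggested here might look
roughly like this (an untested sketch, not part of any posted patch;
it redirects only the head segment to the data ring and leaves the
remaining segments on the normal gather path):

    /* Sketch: copy just the head segment into the per-slot data area
     * when it fits; chained segments still get gather descriptors. */
    if (rte_pktmbuf_data_len(txm) <= VMXNET3_HDR_COPY_SIZE) {
            struct Vmxnet3_TxDataDesc *tdd;

            tdd = txq->data_ring.base + txq->cmd_ring.next2fill;
            copy_size = rte_pktmbuf_data_len(txm);
            rte_memcpy(tdd->data, rte_pktmbuf_mtod(txm, char *),
                       copy_size);
    }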
On 1/5/16, 4:48 PM, "Stephen Hemminger" <stephen@networkplumber.org> wrote:
>[...]
>There are two ways to fix this: you could test for nb_segs == 1,
>or, better yet, optimize each segment; it might be that the first
>segment (or tail segment) would fit in the available data area.
Currently the vmxnet3 backend limits the data area to 128B, so the
copy path still works for the multi-segmented pkt shown above (its
pktlen of 1200 exceeds the 128B threshold, so no copy is attempted).
But I agree it does not work for all multi-segmented packets; the
following packet is one such example.
m -> nb_segs=3, pktlen=128, datalen=64
  -> datalen=32
  -> datalen=32
It’s unclear if/how we might get into such a multi-segmented pkt,
but I agree we should handle this case. Patch updated taking the
simple approach (checking for nb_segs == 1). I’ll leave the
optimization as a future patch.
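Based on that description, the updated check in v4 presumably looks
roughly like this (a sketch of the stated nb_segs == 1 approach, not
the actual v4 hunk):

    /* Only single-segment packets take the copy path, so pkt_len
     * equals the head segment's data_len and one memcpy suffices. */
    if (txm->nb_segs == 1 &&
        rte_pktmbuf_pkt_len(txm) <= VMXNET3_HDR_COPY_SIZE) {
            struct Vmxnet3_TxDataDesc *tdd;

            tdd = txq->data_ring.base + txq->cmd_ring.next2fill;
            copy_size = rte_pktmbuf_pkt_len(txm);
            rte_memcpy(tdd->data, rte_pktmbuf_mtod(txm, char *),
                       copy_size);
    }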
On Wed, 13 Jan 2016 02:20:01 +0000
Yong Wang <yongwang@vmware.com> wrote:
> [...]
> m -> nb_segs=3, pktlen=128, datalen=64
>   -> datalen=32
>   -> datalen=32
>
> It’s unclear if/how we might get into such a multi-segmented pkt,
> but I agree we should handle this case. [...]
Such a packet can happen when adding a tunnel header such as VXLAN
and the underlying packet is shared (refcnt > 1) or does not have
enough headroom for the tunnel header.
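For illustration, such a chain could come from an encapsulation helper
along these lines (a hypothetical sketch; encap_shared, pool, and
hdr_len are illustrative names, not from the patch or from DPDK):

    #include <rte_mbuf.h>
    #include <rte_memcpy.h>

    /* Prepend a tunnel header to a payload mbuf that cannot be written
     * in place (shared, or lacking headroom) by allocating a new head
     * segment and chaining the payload behind it. */
    static struct rte_mbuf *
    encap_shared(struct rte_mempool *pool, struct rte_mbuf *payload,
                 const void *hdr, uint16_t hdr_len)
    {
            struct rte_mbuf *head = rte_pktmbuf_alloc(pool);
            char *dst;

            if (head == NULL)
                    return NULL;
            dst = rte_pktmbuf_append(head, hdr_len);
            if (dst == NULL) {
                    rte_pktmbuf_free(head);
                    return NULL;
            }
            rte_memcpy(dst, hdr, hdr_len);
            /* The result is multi-segment with a small head data_len:
             * e.g. 50B of outer headers + a 78B payload gives
             * pkt_len=128 but a head datalen of only 50, similar to
             * the case above. */
            if (rte_pktmbuf_chain(head, payload) < 0) {
                    rte_pktmbuf_free(head);
                    return NULL;
            }
            return head;
    }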
@@ -15,6 +15,11 @@ EAL
Drivers
~~~~~~~
+* **vmxnet3: restore tx data ring.**
+
+  Tx data ring has been shown to improve small packet forwarding performance
+  in vSphere environments.
+
Libraries
~~~~~~~~~
@@ -348,6 +348,7 @@ vmxnet3_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
uint32_t first2fill, avail, dw2;
struct rte_mbuf *txm = tx_pkts[nb_tx];
struct rte_mbuf *m_seg = txm;
+ int copy_size = 0;
/* Is this packet excessively fragmented, then drop */
if (unlikely(txm->nb_segs > VMXNET3_MAX_TXD_PER_PKT)) {
@@ -365,6 +366,14 @@ vmxnet3_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
break;
}
+ if (rte_pktmbuf_pkt_len(txm) <= VMXNET3_HDR_COPY_SIZE) {
+ struct Vmxnet3_TxDataDesc *tdd;
+
+ tdd = txq->data_ring.base + txq->cmd_ring.next2fill;
+ copy_size = rte_pktmbuf_pkt_len(txm);
+ rte_memcpy(tdd->data, rte_pktmbuf_mtod(txm, char *), copy_size);
+ }
+
/* use the previous gen bit for the SOP desc */
dw2 = (txq->cmd_ring.gen ^ 0x1) << VMXNET3_TXD_GEN_SHIFT;
first2fill = txq->cmd_ring.next2fill;
@@ -377,7 +386,13 @@ vmxnet3_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
transmit buffer size (16K) is greater than
maximum sizeof mbuf segment size. */
gdesc = txq->cmd_ring.base + txq->cmd_ring.next2fill;
- gdesc->txd.addr = RTE_MBUF_DATA_DMA_ADDR(m_seg);
+ if (copy_size)
+ gdesc->txd.addr = rte_cpu_to_le_64(txq->data_ring.basePA +
+ txq->cmd_ring.next2fill *
+ sizeof(struct Vmxnet3_TxDataDesc));
+ else
+ gdesc->txd.addr = RTE_MBUF_DATA_DMA_ADDR(m_seg);
+
gdesc->dword[2] = dw2 | m_seg->data_len;
gdesc->dword[3] = 0;