[dpdk-dev,v2,4/7] vhost: do not use rte_memcpy for virtio_hdr copy
Commit Message
First of all, rte_memcpy() is mostly useful for coping big packets
by leveraging hardware advanced instructions like AVX. But for virtio
net hdr, which is 12 bytes at most, invoking rte_memcpy() will not
introduce any performance boost.
And, to my suprise, rte_memcpy() is VERY huge. Since rte_memcpy()
is inlined, it increases the binary code size linearly every time
we call it at a different place. Replacing the two rte_memcpy()
with directly copy saves nearly 12K bytes of code size!
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
---
lib/librte_vhost/vhost_rxtx.c | 17 +++++++++++++----
1 file changed, 13 insertions(+), 4 deletions(-)
Comments
On 2/18/2016 9:48 PM, Yuanhan Liu wrote:
> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
On Thu, 18 Feb 2016 21:49:09 +0800
Yuanhan Liu <yuanhan.liu@linux.intel.com> wrote:
> +static inline void
> +copy_virtio_net_hdr(struct vhost_virtqueue *vq, uint64_t desc_addr,
> + struct virtio_net_hdr_mrg_rxbuf hdr)
> +{
> + if (vq->vhost_hlen == sizeof(struct virtio_net_hdr_mrg_rxbuf)) {
> + *(struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr = hdr;
> + } else {
> + *(struct virtio_net_hdr *)(uintptr_t)desc_addr = hdr.hdr;
> + }
> +}
> +
Don't use {} around single statements.
Since you are doing all this casting, why not just use regular old memcpy
which will be inlined by Gcc into same instructions anyway.
And since are always casting the desc_addr, why not pass a type that
doesn't need the additional cast (like void *)
On 3/7/2016 12:20 PM, Stephen Hemminger wrote:
> On Thu, 18 Feb 2016 21:49:09 +0800
> Yuanhan Liu <yuanhan.liu@linux.intel.com> wrote:
>
>> +static inline void
>> +copy_virtio_net_hdr(struct vhost_virtqueue *vq, uint64_t desc_addr,
>> + struct virtio_net_hdr_mrg_rxbuf hdr)
>> +{
>> + if (vq->vhost_hlen == sizeof(struct virtio_net_hdr_mrg_rxbuf)) {
>> + *(struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr = hdr;
>> + } else {
>> + *(struct virtio_net_hdr *)(uintptr_t)desc_addr = hdr.hdr;
>> + }
>> +}
>> +
> Don't use {} around single statements.
There are other cs issues, like
used_idx = vq->last_used_idx & (vq->size -1);
^ space needed
Please run checkpatch against your patch.
> Since you are doing all this casting, why not just use regular old memcpy
> which will be inlined by Gcc into same instructions anyway.
> And since are always casting the desc_addr, why not pass a type that
> doesn't need the additional cast (like void *)
>
On Sun, Mar 06, 2016 at 08:20:00PM -0800, Stephen Hemminger wrote:
> On Thu, 18 Feb 2016 21:49:09 +0800
> Yuanhan Liu <yuanhan.liu@linux.intel.com> wrote:
>
> > +static inline void
> > +copy_virtio_net_hdr(struct vhost_virtqueue *vq, uint64_t desc_addr,
> > + struct virtio_net_hdr_mrg_rxbuf hdr)
> > +{
> > + if (vq->vhost_hlen == sizeof(struct virtio_net_hdr_mrg_rxbuf)) {
> > + *(struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr = hdr;
> > + } else {
> > + *(struct virtio_net_hdr *)(uintptr_t)desc_addr = hdr.hdr;
> > + }
> > +}
> > +
>
> Don't use {} around single statements.
Oh, I was thinking that it's a personal preference. Okay, I will remove
them.
> Since you are doing all this casting, why not just use regular old memcpy
> which will be inlined by Gcc into same instructions anyway.
I thought there are some (tiny) differences: memcpy() is not an inlined
function. And I was thinking it generates some slightly more complicated
instructions.
> And since are always casting the desc_addr, why not pass a type that
> doesn't need the additional cast (like void *)
You have to cast it from "uint64_t" to "void *" as well while call it.
So, that makes no difference.
--yliu
@@ -92,6 +92,17 @@ virtio_enqueue_offload(struct rte_mbuf *m_buf, struct virtio_net_hdr *net_hdr)
return;
}
+static inline void
+copy_virtio_net_hdr(struct vhost_virtqueue *vq, uint64_t desc_addr,
+ struct virtio_net_hdr_mrg_rxbuf hdr)
+{
+ if (vq->vhost_hlen == sizeof(struct virtio_net_hdr_mrg_rxbuf)) {
+ *(struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr = hdr;
+ } else {
+ *(struct virtio_net_hdr *)(uintptr_t)desc_addr = hdr.hdr;
+ }
+}
+
static inline int __attribute__((always_inline))
copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
struct rte_mbuf *m, uint16_t desc_idx, uint32_t *copied)
@@ -108,8 +119,7 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
rte_prefetch0((void *)(uintptr_t)desc_addr);
virtio_enqueue_offload(m, &virtio_hdr.hdr);
- rte_memcpy((void *)(uintptr_t)desc_addr,
- (const void *)&virtio_hdr, vq->vhost_hlen);
+ copy_virtio_net_hdr(vq, desc_addr, virtio_hdr);
PRINT_PACKET(dev, (uintptr_t)desc_addr, vq->vhost_hlen, 0);
desc_offset = vq->vhost_hlen;
@@ -404,8 +414,7 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq,
dev->device_fh, virtio_hdr.num_buffers);
virtio_enqueue_offload(m, &virtio_hdr.hdr);
- rte_memcpy((void *)(uintptr_t)desc_addr,
- (const void *)&virtio_hdr, vq->vhost_hlen);
+ copy_virtio_net_hdr(vq, desc_addr, virtio_hdr);
PRINT_PACKET(dev, (uintptr_t)desc_addr, vq->vhost_hlen, 0);
desc_avail = vq->buf_vec[vec_idx].buf_len - vq->vhost_hlen;