From patchwork Wed Nov 28 09:46:00 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Xiao Wang
X-Patchwork-Id: 48372
X-Patchwork-Delegate: maxime.coquelin@redhat.com
From: Xiao Wang
To: tiwei.bie@intel.com, maxime.coquelin@redhat.com
Cc: dev@dpdk.org, zhihong.wang@intel.com, xiaolong.ye@intel.com, Xiao Wang
Date: Wed, 28 Nov 2018 17:46:00 +0800
Message-Id: <20181128094607.106173-3-xiao.w.wang@intel.com>
In-Reply-To: <20181128094607.106173-1-xiao.w.wang@intel.com>
References: <20181128094607.106173-1-xiao.w.wang@intel.com>
Subject: [dpdk-dev] [PATCH 2/9] vhost: provide helpers for virtio ring relay

This patch provides two helpers that let a vDPA device driver relay
entries between the guest virtio ring and a mediated virtio ring.

The available ring relay synchronizes the available entries and
checks descriptor validity.
The used ring relay synchronizes the used entries from the mediated
ring to the guest ring and logs dirty pages for live migration.

The next patch will leverage these two helpers.

Signed-off-by: Xiao Wang
---
 lib/librte_vhost/rte_vdpa.h            |  38 ++++++++
 lib/librte_vhost/rte_vhost_version.map |   2 +
 lib/librte_vhost/vdpa.c                | 173 +++++++++++++++++++++++++++++++++
 lib/librte_vhost/vhost.h               |  40 ++++++++
 lib/librte_vhost/virtio_net.c          |  39 --------
 5 files changed, 253 insertions(+), 39 deletions(-)

diff --git a/lib/librte_vhost/rte_vdpa.h b/lib/librte_vhost/rte_vdpa.h
index 89c5bb6b3..0c44b9080 100644
--- a/lib/librte_vhost/rte_vdpa.h
+++ b/lib/librte_vhost/rte_vdpa.h
@@ -173,4 +173,42 @@ rte_vdpa_get_device_num(void);
  */
 int __rte_experimental
 rte_vhost_host_notifier_ctrl(int vid, bool enable);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Synchronize the available ring from guest to mediated ring, and help
+ * check descriptor validity to protect against a malicious guest driver.
+ *
+ * @param vid
+ *  vhost device id
+ * @param qid
+ *  vhost queue id
+ * @param m_vring
+ *  mediated virtio ring pointer
+ * @return
+ *  number of synced available entries on success, -1 on failure
+ */
+int __rte_experimental
+rte_vdpa_relay_avail_ring(int vid, int qid, struct vring *m_vring);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Synchronize the used ring from mediated ring to guest, and log dirty
+ * pages for each Rx buffer used.
+ *
+ * @param vid
+ *  vhost device id
+ * @param qid
+ *  vhost queue id
+ * @param m_vring
+ *  mediated virtio ring pointer
+ * @return
+ *  number of synced used entries on success, -1 on failure
+ */
+int __rte_experimental
+rte_vdpa_relay_used_ring(int vid, int qid, struct vring *m_vring);
 #endif /* _RTE_VDPA_H_ */
diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map
index 22302e972..0ad0fbea2 100644
--- a/lib/librte_vhost/rte_vhost_version.map
+++ b/lib/librte_vhost/rte_vhost_version.map
@@ -84,4 +84,6 @@ EXPERIMENTAL {
 	rte_vhost_crypto_set_zero_copy;
 	rte_vhost_va_from_guest_pa;
 	rte_vhost_host_notifier_ctrl;
+	rte_vdpa_relay_avail_ring;
+	rte_vdpa_relay_used_ring;
 };
diff --git a/lib/librte_vhost/vdpa.c b/lib/librte_vhost/vdpa.c
index e7d849ee0..e41117776 100644
--- a/lib/librte_vhost/vdpa.c
+++ b/lib/librte_vhost/vdpa.c
@@ -122,3 +122,176 @@ rte_vdpa_get_device_num(void)
 {
 	return vdpa_device_num;
 }
+
+static int
+invalid_desc_check(struct virtio_net *dev, struct vhost_virtqueue *vq,
+		uint64_t desc_iova, uint64_t desc_len, uint8_t perm)
+{
+	uint64_t desc_addr, desc_chunck_len;
+
+	while (desc_len) {
+		desc_chunck_len = desc_len;
+		desc_addr = vhost_iova_to_vva(dev, vq,
+					      desc_iova,
+					      &desc_chunck_len,
+					      perm);
+
+		if (!desc_addr)
+			return -1;
+
+		desc_len -= desc_chunck_len;
+		desc_iova += desc_chunck_len;
+	}
+
+	return 0;
+}
+
+int
+rte_vdpa_relay_avail_ring(int vid, int qid, struct vring *m_vring)
+{
+	struct virtio_net *dev = get_device(vid);
+	uint16_t idx, idx_m, desc_id;
+	struct vring_desc desc;
+	struct vhost_virtqueue *vq;
+	struct vring_desc *desc_ring;
+	struct vring_desc *idesc = NULL;
+	uint64_t dlen;
+	int ret;
+
+	if (!dev)
+		return -1;
+
+	vq = dev->virtqueue[qid];
+	idx = vq->avail->idx;
+	idx_m = m_vring->avail->idx;
+	ret = (uint16_t)(idx - idx_m);
+
+	while (idx_m != idx) {
+		/* avail entry copy */
+		desc_id = vq->avail->ring[idx_m % vq->size];
+		m_vring->avail->ring[idx_m % vq->size] = desc_id;
+		desc_ring = vq->desc;
+
+		if (vq->desc[desc_id].flags & VRING_DESC_F_INDIRECT) {
+			dlen = vq->desc[desc_id].len;
+			desc_ring = (struct vring_desc *)(uintptr_t)
+			vhost_iova_to_vva(dev, vq, vq->desc[desc_id].addr,
+						&dlen,
+						VHOST_ACCESS_RO);
+			if (unlikely(!desc_ring))
+				return -1;
+
+			if (unlikely(dlen < vq->desc[desc_id].len)) {
+				idesc = alloc_copy_ind_table(dev, vq,
+					vq->desc[desc_id].addr, vq->desc[desc_id].len);
+				if (unlikely(!idesc))
+					return -1;
+
+				desc_ring = idesc;
+			}
+
+			desc_id = 0;
+		}
+
+		/* check if the buf addr is within the guest memory */
+		do {
+			desc = desc_ring[desc_id];
+			if (invalid_desc_check(dev, vq, desc.addr, desc.len,
+						VHOST_ACCESS_RW))
+				return -1;
+			desc_id = desc.next;
+		} while (desc.flags & VRING_DESC_F_NEXT);
+
+		if (unlikely(!!idesc)) {
+			free_ind_table(idesc);
+			idesc = NULL;
+		}
+
+		idx_m++;
+	}
+
+	m_vring->avail->idx = idx;
+
+	if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))
+		vhost_avail_event(vq) = vq->avail->idx;
+
+	return ret;
+}
+
+int
+rte_vdpa_relay_used_ring(int vid, int qid, struct vring *m_vring)
+{
+	struct virtio_net *dev = get_device(vid);
+	uint16_t idx, idx_m, desc_id;
+	struct vhost_virtqueue *vq;
+	struct vring_desc desc;
+	struct vring_desc *desc_ring;
+	struct vring_desc *idesc = NULL;
+	uint64_t dlen;
+	int ret;
+
+	if (!dev)
+		return -1;
+
+	vq = dev->virtqueue[qid];
+	idx = vq->used->idx;
+	idx_m = m_vring->used->idx;
+	ret = (uint16_t)(idx_m - idx);
+
+	while (idx != idx_m) {
+		/* copy used entry, used ring logging is not covered here */
+		vq->used->ring[idx % vq->size] =
+			m_vring->used->ring[idx % vq->size];
+
+		/* dirty page logging for used ring */
+		vhost_log_used_vring(dev, vq,
+			offsetof(struct vring_used, ring[idx % vq->size]),
+			sizeof(struct vring_used_elem));
+
+		desc_id = vq->used->ring[idx % vq->size].id;
+		desc_ring = vq->desc;
+
+		if (vq->desc[desc_id].flags & VRING_DESC_F_INDIRECT) {
+			dlen = vq->desc[desc_id].len;
+			desc_ring = (struct vring_desc *)(uintptr_t)
+			vhost_iova_to_vva(dev, vq, vq->desc[desc_id].addr,
+						&dlen,
+						VHOST_ACCESS_RO);
+			if (unlikely(!desc_ring))
+				return -1;
+
+			if (unlikely(dlen < vq->desc[desc_id].len)) {
+				idesc = alloc_copy_ind_table(dev, vq,
+					vq->desc[desc_id].addr, vq->desc[desc_id].len);
+				if (unlikely(!idesc))
+					return -1;
+
+				desc_ring = idesc;
+			}
+
+			desc_id = 0;
+		}
+
+		/* dirty page logging for Rx buffer */
+		do {
+			desc = desc_ring[desc_id];
+			if (desc.flags & VRING_DESC_F_WRITE)
+				vhost_log_write(dev, desc.addr, desc.len);
+			desc_id = desc.next;
+		} while (desc.flags & VRING_DESC_F_NEXT);
+
+		if (unlikely(!!idesc)) {
+			free_ind_table(idesc);
+			idesc = NULL;
+		}
+
+		idx++;
+	}
+
+	vq->used->idx = idx_m;
+
+	if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))
+		vring_used_event(m_vring) = m_vring->used->idx;
+
+	return ret;
+}
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 5218f1b12..2164cd6d9 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -18,6 +18,7 @@
 #include
 #include
 #include
+#include
 
 #include "rte_vhost.h"
 #include "rte_vdpa.h"
@@ -753,4 +754,43 @@ vhost_vring_call_packed(struct virtio_net *dev, struct vhost_virtqueue *vq)
 		eventfd_write(vq->callfd, (eventfd_t)1);
 }
 
+static __rte_always_inline void *
+alloc_copy_ind_table(struct virtio_net *dev, struct vhost_virtqueue *vq,
+					 uint64_t desc_addr, uint64_t desc_len)
+{
+	void *idesc;
+	uint64_t src, dst;
+	uint64_t len, remain = desc_len;
+
+	idesc = rte_malloc(__func__, desc_len, 0);
+	if (unlikely(!idesc))
+		return 0;
+
+	dst = (uint64_t)(uintptr_t)idesc;
+
+	while (remain) {
+		len = remain;
+		src = vhost_iova_to_vva(dev, vq, desc_addr, &len,
+				VHOST_ACCESS_RO);
+		if (unlikely(!src || !len)) {
+			rte_free(idesc);
+			return 0;
+		}
+
+		rte_memcpy((void *)(uintptr_t)dst, (void *)(uintptr_t)src, len);
+
+		remain -= len;
+		dst += len;
+		desc_addr += len;
+	}
+
+	return idesc;
+}
+
+static __rte_always_inline void
+free_ind_table(void *idesc)
+{
+	rte_free(idesc);
+}
+
 #endif /* _VHOST_NET_CDEV_H_ */
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 5e1a1a727..8c657a101 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -37,45 +37,6 @@ is_valid_virt_queue_idx(uint32_t idx, int is_tx, uint32_t nr_vring)
 	return (is_tx ^ (idx & 1)) == 0 && idx < nr_vring;
 }
 
-static __rte_always_inline void *
-alloc_copy_ind_table(struct virtio_net *dev, struct vhost_virtqueue *vq,
-					 uint64_t desc_addr, uint64_t desc_len)
-{
-	void *idesc;
-	uint64_t src, dst;
-	uint64_t len, remain = desc_len;
-
-	idesc = rte_malloc(__func__, desc_len, 0);
-	if (unlikely(!idesc))
-		return 0;
-
-	dst = (uint64_t)(uintptr_t)idesc;
-
-	while (remain) {
-		len = remain;
-		src = vhost_iova_to_vva(dev, vq, desc_addr, &len,
-				VHOST_ACCESS_RO);
-		if (unlikely(!src || !len)) {
-			rte_free(idesc);
-			return 0;
-		}
-
-		rte_memcpy((void *)(uintptr_t)dst, (void *)(uintptr_t)src, len);
-
-		remain -= len;
-		dst += len;
-		desc_addr += len;
-	}
-
-	return idesc;
-}
-
-static __rte_always_inline void
-free_ind_table(void *idesc)
-{
-	rte_free(idesc);
-}
-
 static __rte_always_inline void
 do_flush_shadow_used_ring_split(struct virtio_net *dev,
 			struct vhost_virtqueue *vq,
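
Both relay helpers walk free-running 16-bit ring indices and reduce them modulo the ring size. The following is a minimal standalone sketch of the available-ring copy loop only, not DPDK code: `struct toy_avail`, `toy_relay_avail` and `TOY_RING_SIZE` are hypothetical names made up for illustration, and descriptor validation, indirect tables and the event index are left out. It is meant to show that the count of relayed entries stays correct when the guest index has wrapped past 65535, provided the difference is truncated to 16 bits.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_RING_SIZE 8u /* hypothetical size; split rings use a power of two */

/* Toy stand-in for the available part of a split virtio ring. */
struct toy_avail {
	uint16_t idx;                 /* free-running producer index */
	uint16_t ring[TOY_RING_SIZE]; /* descriptor head ids */
};

/*
 * Copy the entries the guest has published but the mediated ring has not
 * seen yet, then publish the new index on the mediated ring.
 * Returns the number of entries copied.
 */
static int
toy_relay_avail(const struct toy_avail *guest, struct toy_avail *mediated)
{
	uint16_t idx = guest->idx;
	uint16_t idx_m = mediated->idx;
	/* Truncating to 16 bits keeps the count right when idx has wrapped. */
	uint16_t pending = (uint16_t)(idx - idx_m);

	/* A well-behaved producer never publishes more than one ring's worth. */
	assert(pending <= TOY_RING_SIZE);

	while (idx_m != idx) {
		mediated->ring[idx_m % TOY_RING_SIZE] =
			guest->ring[idx_m % TOY_RING_SIZE];
		idx_m++; /* uint16_t wraps from 65535 back to 0 */
	}

	mediated->idx = idx;
	return pending;
}
```

With a guest index of 2 and a mediated index of 65534, the loop copies slots 6, 7, 0 and 1 and the function returns 4. A plain `int` subtraction of the two promoted values would give -65532 instead, which a caller could not distinguish from a nonsensical count.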