From patchwork Wed Aug 12 08:02:43 2015
X-Patchwork-Submitter: Ouyang Changchun
X-Patchwork-Id: 6744
From: Ouyang Changchun
To: dev@dpdk.org
Date: Wed, 12 Aug 2015 16:02:43 +0800
Message-Id: <1439366567-3402-9-git-send-email-changchun.ouyang@intel.com>
X-Mailer: git-send-email 1.7.12.2
In-Reply-To: <1439366567-3402-1-git-send-email-changchun.ouyang@intel.com>
References: <1434355006-30583-1-git-send-email-changchun.ouyang@intel.com>
 <1439366567-3402-1-git-send-email-changchun.ouyang@intel.com>
Subject: [dpdk-dev] [PATCH v4 08/12] vhost: support multiple queues
List-Id: patches and discussions about DPDK

The vhost sample leverages VMDq+RSS in hardware to receive packets and
distribute them into different queues in the pool according to their
5-tuple, and it enables multiple-queue mode in the vhost/virtio layer.

The number of HW queues in each pool exactly matches the queue number in
the virtio device: e.g. with rxq = 4 there are 4 HW queues in each VMDq
pool and 4 queues in each virtio device/port, mapped one to one.

=========================================
==================|   |==================|
       vport0     |   |      vport1      |
---  ---  ---  ---|   |---  ---  ---  ---|
q0 | q1 | q2 | q3 |   |q0 | q1 | q2 | q3 |
/\= =/\= =/\= =/\=|   |/\= =/\= =/\= =/\=|
||   ||   ||   ||      ||   ||   ||   ||
||   ||   ||   ||      ||   ||   ||   ||
||= =||= =||= =||=|   =||== ||== ||== ||=|
q0 | q1 | q2 | q3 |   |q0 | q1 | q2 | q3 |
------------------|   |------------------|
     VMDq pool0   |   |    VMDq pool1    |
==================|   |==================|

On the RX side, the sample first polls each queue of the pool, fetches
the packets from it, and enqueues them into the corresponding queue of
the virtio device/port. On the TX side, it dequeues packets from each
queue of the virtio device/port and sends them to either a physical port
or another virtio device, according to the destination MAC address.

Signed-off-by: Changchun Ouyang
---
Changes in v4:
  - address comments and refine variable names
  - support the FVL NIC
  - fix checkpatch issues

Changes in v2:
  - check the queue number per pool in VMDq and the queue pair number per vhost device
  - remove the unnecessary call to the q_num_set API
  - fix checkpatch errors
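A note on ring indexing (illustration only, not part of the patch): the
sample relies on the usual virtio ring layout from rte_virtio_net.h
(VIRTIO_RXQ == 0, VIRTIO_TXQ == 1, VIRTIO_QNUM == 2), so queue pair i of
a device maps to ring VIRTIO_RXQ + i * VIRTIO_QNUM for RX and ring
VIRTIO_TXQ + i * VIRTIO_QNUM for TX. A minimal standalone sketch:

  #include <stdio.h>

  /* Values as defined in rte_virtio_net.h. */
  #define VIRTIO_RXQ  0
  #define VIRTIO_TXQ  1
  #define VIRTIO_QNUM 2

  int main(void)
  {
          unsigned int rxq = 4;   /* queue pairs per virtio device */
          unsigned int i;

          /* Print the ring index each queue pair uses in
           * rte_vhost_enqueue_burst()/rte_vhost_dequeue_burst(). */
          for (i = 0; i < rxq; i++)
                  printf("pair %u: rx ring %u, tx ring %u\n", i,
                          VIRTIO_RXQ + i * VIRTIO_QNUM,
                          VIRTIO_TXQ + i * VIRTIO_QNUM);
          return 0;
  }

With rxq = 4 this prints ring pairs (0,1), (2,3), (4,5), (6,7), which is
exactly the arithmetic the patch below applies.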
 examples/vhost/main.c | 190 ++++++++++++++++++++++++++++++++------------------
 1 file changed, 124 insertions(+), 66 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 5b811af..683a300 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -368,6 +368,37 @@ validate_num_devices(uint32_t max_nb_devices)
 	return 0;
 }
 
+static int
+get_dev_nb_for_82599(struct rte_eth_dev_info dev_info)
+{
+	int dev_nb = -1;
+	switch (rxq) {
+	case 1:
+	case 2:
+		/*
+		 * for 82599, dev_info.max_vmdq_pools is always 64 regardless of rx mode.
+		 */
+		dev_nb = (int)dev_info.max_vmdq_pools;
+		break;
+	case 4:
+		dev_nb = (int)dev_info.max_vmdq_pools / 2;
+		break;
+	default:
+		RTE_LOG(ERR, VHOST_CONFIG, "rxq invalid for VMDq.\n");
+	}
+	return dev_nb;
+}
+
+static int
+get_dev_nb_for_fvl(struct rte_eth_dev_info dev_info)
+{
+	/*
+	 * for FVL, dev_info.max_vmdq_pools is calculated according to
+	 * the configured value: CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_VM.
+	 */
+	return (int)dev_info.max_vmdq_pools;
+}
+
 /*
  * Initialises a given port using global settings and with the rx buffers
  * coming from the mbuf_pool passed as parameter
@@ -412,17 +443,14 @@ port_init(uint8_t port)
 	}
 
 	/* Configure the virtio devices num based on VMDQ limits */
-	switch (rxq) {
-	case 1:
-	case 2:
-		num_devices = dev_info.max_vmdq_pools;
-		break;
-	case 4:
-		num_devices = dev_info.max_vmdq_pools / 2;
-		break;
-	default:
-		RTE_LOG(ERR, VHOST_CONFIG, "rxq invalid for VMDq.\n");
-		return -1;
+	if (dev_info.max_vmdq_pools == ETH_64_POOLS) {
+		num_devices = (uint32_t)get_dev_nb_for_82599(dev_info);
+		if (num_devices == (uint32_t)-1)
+			return -1;
+	} else {
+		num_devices = (uint32_t)get_dev_nb_for_fvl(dev_info);
+		if (num_devices == (uint32_t)-1)
+			return -1;
 	}
 
 	if (zero_copy) {
@@ -1001,8 +1029,9 @@ link_vmdq(struct vhost_dev *vdev, struct rte_mbuf *m)
 
 	/* Enable stripping of the vlan tag as we handle routing. */
 	if (vlan_strip)
-		rte_eth_dev_set_vlan_strip_on_queue(ports[0],
-			(uint16_t)vdev->vmdq_rx_q, 1);
+		for (i = 0; i < (int)rxq; i++)
+			rte_eth_dev_set_vlan_strip_on_queue(ports[0],
+				(uint16_t)(vdev->vmdq_rx_q + i), 1);
 
 	/* Set device as ready for RX. */
 	vdev->ready = DEVICE_RX;
 }
 
@@ -1017,7 +1046,7 @@ link_vmdq(struct vhost_dev *vdev, struct rte_mbuf *m)
 static inline void
 unlink_vmdq(struct vhost_dev *vdev)
 {
-	unsigned i = 0;
+	unsigned i = 0, j = 0;
 	unsigned rx_count;
 	struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
 
@@ -1030,15 +1059,19 @@ unlink_vmdq(struct vhost_dev *vdev)
 		vdev->vlan_tag = 0;
 
 		/*Clear out the receive buffers*/
-		rx_count = rte_eth_rx_burst(ports[0],
-			(uint16_t)vdev->vmdq_rx_q, pkts_burst, MAX_PKT_BURST);
+		for (i = 0; i < rxq; i++) {
+			rx_count = rte_eth_rx_burst(ports[0],
+					(uint16_t)vdev->vmdq_rx_q + i,
+					pkts_burst, MAX_PKT_BURST);
 
-		while (rx_count) {
-			for (i = 0; i < rx_count; i++)
-				rte_pktmbuf_free(pkts_burst[i]);
+			while (rx_count) {
+				for (j = 0; j < rx_count; j++)
+					rte_pktmbuf_free(pkts_burst[j]);
 
-			rx_count = rte_eth_rx_burst(ports[0],
-				(uint16_t)vdev->vmdq_rx_q, pkts_burst, MAX_PKT_BURST);
+				rx_count = rte_eth_rx_burst(ports[0],
+					(uint16_t)vdev->vmdq_rx_q + i,
+					pkts_burst, MAX_PKT_BURST);
+			}
 		}
 
 		vdev->ready = DEVICE_MAC_LEARNING;
@@ -1050,7 +1083,7 @@ unlink_vmdq(struct vhost_dev *vdev)
  * the packet on that devices RX queue. If not then return.
  */
 static inline int __attribute__((always_inline))
-virtio_tx_local(struct vhost_dev *vdev, struct rte_mbuf *m)
+virtio_tx_local(struct vhost_dev *vdev, struct rte_mbuf *m, uint32_t q_idx)
 {
 	struct virtio_net_data_ll *dev_ll;
 	struct ether_hdr *pkt_hdr;
@@ -1065,7 +1098,7 @@ virtio_tx_local(struct vhost_dev *vdev, struct rte_mbuf *m)
 	while (dev_ll != NULL) {
 		if ((dev_ll->vdev->ready == DEVICE_RX) &&
 			ether_addr_cmp(&(pkt_hdr->d_addr),
-				&dev_ll->vdev->mac_address)) {
+			&dev_ll->vdev->mac_address)) {
 			/* Drop the packet if the TX packet is destined for the TX device. */
 			if (dev_ll->vdev->dev->device_fh == dev->device_fh) {
@@ -1083,7 +1116,9 @@ virtio_tx_local(struct vhost_dev *vdev, struct rte_mbuf *m)
 				LOG_DEBUG(VHOST_DATA, "(%"PRIu64") Device is marked for removal\n", tdev->device_fh);
 			} else {
 				/*send the packet to the local virtio device*/
-				ret = rte_vhost_enqueue_burst(tdev, VIRTIO_RXQ, &m, 1);
+				ret = rte_vhost_enqueue_burst(tdev,
+					VIRTIO_RXQ + q_idx * VIRTIO_QNUM,
+					&m, 1);
 				if (enable_stats) {
 					rte_atomic64_add(
 					&dev_statistics[tdev->device_fh].rx_total_atomic,
@@ -1160,7 +1195,8 @@ find_local_dest(struct virtio_net *dev, struct rte_mbuf *m,
  * or the physical port.
  */
 static inline void __attribute__((always_inline))
-virtio_tx_route(struct vhost_dev *vdev, struct rte_mbuf *m, uint16_t vlan_tag)
+virtio_tx_route(struct vhost_dev *vdev, struct rte_mbuf *m,
+	uint16_t vlan_tag, uint32_t q_idx)
 {
 	struct mbuf_table *tx_q;
 	struct rte_mbuf **m_table;
@@ -1170,7 +1206,8 @@ virtio_tx_route(struct vhost_dev *vdev, struct rte_mbuf *m, uint16_t vlan_tag)
 	struct ether_hdr *nh;
 
 	/*check if destination is local VM*/
-	if ((vm2vm_mode == VM2VM_SOFTWARE) && (virtio_tx_local(vdev, m) == 0)) {
+	if ((vm2vm_mode == VM2VM_SOFTWARE) &&
+		(virtio_tx_local(vdev, m, q_idx) == 0)) {
 		rte_pktmbuf_free(m);
 		return;
 	}
@@ -1334,49 +1371,60 @@ switch_worker(__attribute__((unused)) void *arg)
 			}
 			if (likely(vdev->ready == DEVICE_RX)) {
 				/*Handle guest RX*/
-				rx_count = rte_eth_rx_burst(ports[0],
-					vdev->vmdq_rx_q, pkts_burst, MAX_PKT_BURST);
+				for (i = 0; i < rxq; i++) {
+					rx_count = rte_eth_rx_burst(ports[0],
+						vdev->vmdq_rx_q + i, pkts_burst, MAX_PKT_BURST);
 
-				if (rx_count) {
-					/*
-					 * Retry is enabled and the queue is full then we wait and retry to avoid packet loss
-					 * Here MAX_PKT_BURST must be less than virtio queue size
-					 */
-					if (enable_retry && unlikely(rx_count > rte_vring_available_entries(dev, VIRTIO_RXQ))) {
-						for (retry = 0; retry < burst_rx_retry_num; retry++) {
-							rte_delay_us(burst_rx_delay_time);
-							if (rx_count <= rte_vring_available_entries(dev, VIRTIO_RXQ))
-								break;
+					if (rx_count) {
+						/*
+						 * Retry is enabled and the queue is full then we wait and retry to avoid packet loss
+						 * Here MAX_PKT_BURST must be less than virtio queue size
+						 */
+						if (enable_retry && unlikely(rx_count > rte_vring_available_entries(dev,
+							VIRTIO_RXQ + i * VIRTIO_QNUM))) {
+							for (retry = 0; retry < burst_rx_retry_num; retry++) {
+								rte_delay_us(burst_rx_delay_time);
+								if (rx_count <= rte_vring_available_entries(dev,
+									VIRTIO_RXQ + i * VIRTIO_QNUM))
+									break;
+							}
+						}
+						ret_count = rte_vhost_enqueue_burst(dev, VIRTIO_RXQ + i * VIRTIO_QNUM,
+							pkts_burst, rx_count);
+						if (enable_stats) {
+							rte_atomic64_add(
+							&dev_statistics[dev_ll->vdev->dev->device_fh].rx_total_atomic,
+							rx_count);
+							rte_atomic64_add(
+							&dev_statistics[dev_ll->vdev->dev->device_fh].rx_atomic, ret_count);
+						}
+						while (likely(rx_count)) {
+							rx_count--;
+							rte_pktmbuf_free(pkts_burst[rx_count]);
 						}
 					}
-					ret_count = rte_vhost_enqueue_burst(dev, VIRTIO_RXQ, pkts_burst, rx_count);
-					if (enable_stats) {
-						rte_atomic64_add(
-						&dev_statistics[dev_ll->vdev->dev->device_fh].rx_total_atomic,
-						rx_count);
-						rte_atomic64_add(
-						&dev_statistics[dev_ll->vdev->dev->device_fh].rx_atomic, ret_count);
-					}
-					while (likely(rx_count)) {
-						rx_count--;
-						rte_pktmbuf_free(pkts_burst[rx_count]);
-					}
-
 				}
 			}
 
 			if (likely(!vdev->remove)) {
 				/* Handle guest TX*/
-				tx_count = rte_vhost_dequeue_burst(dev, VIRTIO_TXQ, mbuf_pool, pkts_burst, MAX_PKT_BURST);
-				/* If this is the first received packet we need to learn the MAC and setup VMDQ */
-				if (unlikely(vdev->ready == DEVICE_MAC_LEARNING) && tx_count) {
-					if (vdev->remove || (link_vmdq(vdev, pkts_burst[0]) == -1)) {
-						while (tx_count)
-							rte_pktmbuf_free(pkts_burst[--tx_count]);
+				for (i = 0; i < rxq; i++) {
+					tx_count = rte_vhost_dequeue_burst(dev, VIRTIO_TXQ + i * VIRTIO_QNUM,
+						mbuf_pool, pkts_burst, MAX_PKT_BURST);
+					/*
+					 * If this is the first received packet we need to learn
+					 * the MAC and setup VMDQ
+					 */
+					if (unlikely(vdev->ready == DEVICE_MAC_LEARNING) && tx_count) {
+						if (vdev->remove || (link_vmdq(vdev, pkts_burst[0]) == -1)) {
+							while (tx_count)
+								rte_pktmbuf_free(pkts_burst[--tx_count]);
+						}
 					}
+					while (tx_count)
+						virtio_tx_route(vdev, pkts_burst[--tx_count],
+							(uint16_t)dev->device_fh, i);
 				}
-				while (tx_count)
-					virtio_tx_route(vdev, pkts_burst[--tx_count], (uint16_t)dev->device_fh);
 			}
 
 			/*move to the next device in the list*/
@@ -2634,6 +2682,14 @@ new_device (struct virtio_net *dev)
 	uint32_t device_num_min = num_devices;
 	struct vhost_dev *vdev;
 	uint32_t regionidx;
+	uint32_t i;
+
+	if ((rxq > 1) && (dev->virt_qp_nb != rxq)) {
+		RTE_LOG(ERR, VHOST_DATA, "(%"PRIu64") queue num in VMDq pool:"
+			"%d != queue pair num in vhost dev:%d\n",
+			dev->device_fh, rxq, dev->virt_qp_nb);
+		return -1;
+	}
 
 	vdev = rte_zmalloc("vhost device", sizeof(*vdev), RTE_CACHE_LINE_SIZE);
 	if (vdev == NULL) {
@@ -2680,12 +2736,12 @@ new_device (struct virtio_net *dev)
 		}
 	}
 
-
 	/* Add device to main ll */
 	ll_dev = get_data_ll_free_entry(&ll_root_free);
 	if (ll_dev == NULL) {
-		RTE_LOG(INFO, VHOST_DATA, "(%"PRIu64") No free entry found in linked list. Device limit "
-			"of %d devices per core has been reached\n",
+		RTE_LOG(INFO, VHOST_DATA,
+			"(%"PRIu64") No free entry found in linked list. "
+			"Device limit of %d devices per core has been reached\n",
 			dev->device_fh, num_devices);
 		if (vdev->regions_hpa)
 			rte_free(vdev->regions_hpa);
@@ -2694,8 +2750,7 @@ new_device (struct virtio_net *dev)
 	}
 	ll_dev->vdev = vdev;
 	add_data_ll_entry(&ll_root_used, ll_dev);
-	vdev->vmdq_rx_q
-		= dev->device_fh * queues_per_pool + vmdq_queue_base;
+	vdev->vmdq_rx_q = dev->device_fh * rxq + vmdq_queue_base;
 
 	if (zero_copy) {
 		uint32_t index = vdev->vmdq_rx_q;
@@ -2801,8 +2856,11 @@ new_device (struct virtio_net *dev)
 	memset(&dev_statistics[dev->device_fh], 0, sizeof(struct device_statistics));
 
 	/* Disable notifications. */
-	rte_vhost_enable_guest_notification(dev, VIRTIO_RXQ, 0);
-	rte_vhost_enable_guest_notification(dev, VIRTIO_TXQ, 0);
+	for (i = 0; i < rxq; i++) {
+		rte_vhost_enable_guest_notification(dev, i * VIRTIO_QNUM + VIRTIO_RXQ, 0);
+		rte_vhost_enable_guest_notification(dev, i * VIRTIO_QNUM + VIRTIO_TXQ, 0);
+	}
+
 	lcore_info[vdev->coreid].lcore_ll->device_num++;
 	dev->flags |= VIRTIO_DEV_RUNNING;
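
For reference, the pool/queue arithmetic used above in one standalone
sketch (illustration only, not part of the patch; vmdq_queue_base is set
to a hypothetical 0 here, while the sample obtains the real base queue
from the PMD's dev_info at runtime):

  #include <stdio.h>

  #define ETH_64_POOLS 64 /* 82599 always reports 64 VMDq pools */

  /* Mirrors get_dev_nb_for_82599(): usable pools per rxq setting. */
  static int pools_for_82599(unsigned int rxq)
  {
          switch (rxq) {
          case 1:
          case 2:
                  return ETH_64_POOLS;     /* one pool per vhost device */
          case 4:
                  return ETH_64_POOLS / 2; /* 4-queue pools: half as many */
          default:
                  return -1;               /* rxq invalid for VMDq */
          }
  }

  int main(void)
  {
          unsigned int rxq = 4;
          unsigned int vmdq_queue_base = 0; /* hypothetical value */
          unsigned long device_fh;

          printf("num_devices = %d\n", pools_for_82599(rxq));
          /* Each device owns rxq consecutive HW queues in its pool:
           * vdev->vmdq_rx_q = device_fh * rxq + vmdq_queue_base. */
          for (device_fh = 0; device_fh < 3; device_fh++)
                  printf("device %lu: vmdq_rx_q = %lu\n", device_fh,
                          device_fh * rxq + vmdq_queue_base);
          return 0;
  }

With rxq = 4 this yields 32 devices with base RX queues 0, 4, 8, ...,
matching the one-to-one mapping described in the commit message.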