From patchwork Fri Aug 18 18:41:26 2023
X-Patchwork-Submitter: Mykola Kostenok
X-Patchwork-Id: 130543
X-Patchwork-Delegate: thomas@monjalon.net
From: Mykola Kostenok
To: dev@dpdk.org, mko-plv@napatech.com
Cc: ckm@napatech.com
Subject: [PATCH v4 7/8] net/ntnic: adds ethdev and makes PMD available
Date: Fri, 18 Aug 2023 20:41:26 +0200
Message-Id: <20230818184127.422574-7-mko-plv@napatech.com>
X-Mailer: git-send-email 2.39.3
In-Reply-To: <20230818184127.422574-1-mko-plv@napatech.com>
References: <20230816132552.2483752-1-mko-plv@napatech.com>
 <20230818184127.422574-1-mko-plv@napatech.com>
MIME-Version: 1.0
List-Id: DPDK patches and discussions

From: Christian Koue Muf

Hooks into the DPDK API and makes the PMD available to use.
Also adds documentation as .rst and .ini files.
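A minimal sketch of how an application consumes the ports this PMD exposes, using only generic ethdev calls (the port number, queue and descriptor counts and the mempool sizing are placeholders, not taken from this patch):

    #include <rte_eal.h>
    #include <rte_ethdev.h>
    #include <rte_lcore.h>
    #include <rte_mbuf.h>

    int main(int argc, char **argv)
    {
            struct rte_eth_conf port_conf = {0};
            struct rte_mempool *mp;
            uint16_t port_id = 0;   /* first port probed by the PMD */

            if (rte_eal_init(argc, argv) < 0)
                    return -1;

            mp = rte_pktmbuf_pool_create("mbuf_pool", 8192, 256, 0,
                                         RTE_MBUF_DEFAULT_BUF_SIZE,
                                         rte_socket_id());
            if (mp == NULL)
                    return -1;

            /* One Rx and one Tx queue, i.e. the documented rxqs/txqs default. */
            if (rte_eth_dev_configure(port_id, 1, 1, &port_conf) != 0 ||
                rte_eth_rx_queue_setup(port_id, 0, 512,
                                       rte_eth_dev_socket_id(port_id), NULL, mp) != 0 ||
                rte_eth_tx_queue_setup(port_id, 0, 512,
                                       rte_eth_dev_socket_id(port_id), NULL) != 0)
                    return -1;

            return rte_eth_dev_start(port_id) == 0 ? 0 : -1;
    }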
Signed-off-by: Christian Koue Muf Reviewed-by: Mykola Kostenok --- v2: * Fixed WARNING:TYPO_SPELLING * Fix supported platform list v3: * Fix Fedora 38 compilation issues --- .mailmap | 2 + MAINTAINERS | 7 + doc/guides/nics/features/ntnic.ini | 50 + doc/guides/nics/ntnic.rst | 235 + drivers/net/ntnic/include/ntdrv_4ga.h | 23 + drivers/net/ntnic/include/ntos_system.h | 23 + drivers/net/ntnic/meson.build | 13 + drivers/net/ntnic/ntnic_dbsconfig.c | 1670 +++++++ drivers/net/ntnic/ntnic_dbsconfig.h | 251 + drivers/net/ntnic/ntnic_ethdev.c | 4256 +++++++++++++++++ drivers/net/ntnic/ntnic_ethdev.h | 357 ++ .../net/ntnic/ntnic_filter/create_elements.h | 1190 +++++ drivers/net/ntnic/ntnic_filter/ntnic_filter.c | 656 +++ drivers/net/ntnic/ntnic_filter/ntnic_filter.h | 14 + drivers/net/ntnic/ntnic_hshconfig.c | 102 + drivers/net/ntnic/ntnic_hshconfig.h | 9 + drivers/net/ntnic/ntnic_meter.c | 811 ++++ drivers/net/ntnic/ntnic_meter.h | 10 + drivers/net/ntnic/ntnic_vdpa.c | 365 ++ drivers/net/ntnic/ntnic_vdpa.h | 21 + drivers/net/ntnic/ntnic_vf.c | 83 + drivers/net/ntnic/ntnic_vf.h | 17 + drivers/net/ntnic/ntnic_vf_vdpa.c | 1246 +++++ drivers/net/ntnic/ntnic_vf_vdpa.h | 25 + drivers/net/ntnic/ntnic_vfio.c | 321 ++ drivers/net/ntnic/ntnic_vfio.h | 31 + drivers/net/ntnic/ntnic_xstats.c | 703 +++ drivers/net/ntnic/ntnic_xstats.h | 22 + 28 files changed, 12513 insertions(+) create mode 100644 doc/guides/nics/features/ntnic.ini create mode 100644 doc/guides/nics/ntnic.rst create mode 100644 drivers/net/ntnic/include/ntdrv_4ga.h create mode 100644 drivers/net/ntnic/include/ntos_system.h create mode 100644 drivers/net/ntnic/ntnic_dbsconfig.c create mode 100644 drivers/net/ntnic/ntnic_dbsconfig.h create mode 100644 drivers/net/ntnic/ntnic_ethdev.c create mode 100644 drivers/net/ntnic/ntnic_ethdev.h create mode 100644 drivers/net/ntnic/ntnic_filter/create_elements.h create mode 100644 drivers/net/ntnic/ntnic_filter/ntnic_filter.c create mode 100644 drivers/net/ntnic/ntnic_filter/ntnic_filter.h create mode 100644 drivers/net/ntnic/ntnic_hshconfig.c create mode 100644 drivers/net/ntnic/ntnic_hshconfig.h create mode 100644 drivers/net/ntnic/ntnic_meter.c create mode 100644 drivers/net/ntnic/ntnic_meter.h create mode 100644 drivers/net/ntnic/ntnic_vdpa.c create mode 100644 drivers/net/ntnic/ntnic_vdpa.h create mode 100644 drivers/net/ntnic/ntnic_vf.c create mode 100644 drivers/net/ntnic/ntnic_vf.h create mode 100644 drivers/net/ntnic/ntnic_vf_vdpa.c create mode 100644 drivers/net/ntnic/ntnic_vf_vdpa.h create mode 100644 drivers/net/ntnic/ntnic_vfio.c create mode 100644 drivers/net/ntnic/ntnic_vfio.h create mode 100644 drivers/net/ntnic/ntnic_xstats.c create mode 100644 drivers/net/ntnic/ntnic_xstats.h diff --git a/.mailmap b/.mailmap index 864d33ee46..be8880971d 100644 --- a/.mailmap +++ b/.mailmap @@ -227,6 +227,7 @@ Chintu Hetam Choonho Son Chris Metcalf Christian Ehrhardt +Christian Koue Muf Christian Maciocco Christophe Fontaine Christophe Grosse @@ -967,6 +968,7 @@ Mukesh Dua Murphy Yang Murthy NSSR Muthurajan Jayakumar +Mykola Kostenok Nachiketa Prachanda Nagadheeraj Rottela Naga Harish K S V diff --git a/MAINTAINERS b/MAINTAINERS index 8c3f2c993f..02aca74173 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1053,6 +1053,13 @@ F: drivers/net/memif/ F: doc/guides/nics/memif.rst F: doc/guides/nics/features/memif.ini +NTNIC PMD +M: Mykola Kostenok +M: Christiam Muf +F: drivers/net/ntnic/ +F: doc/guides/nics/ntnic.rst +F: doc/guides/nics/features/ntnic.ini + Crypto Drivers -------------- diff --git 
a/doc/guides/nics/features/ntnic.ini b/doc/guides/nics/features/ntnic.ini
new file mode 100644
index 0000000000..2583e12b1f
--- /dev/null
+++ b/doc/guides/nics/features/ntnic.ini
@@ -0,0 +1,50 @@
+;
+; Supported features of the 'ntnic' network poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Speed capabilities = Y
+Link status = Y
+Queue start/stop = Y
+Shared Rx queue = Y
+MTU update = Y
+Promiscuous mode = Y
+Unicast MAC filter = Y
+Multicast MAC filter = Y
+RSS hash = Y
+RSS key update = Y
+Inner RSS = Y
+CRC offload = Y
+L3 checksum offload = Y
+L4 checksum offload = Y
+Inner L3 checksum = Y
+Inner L4 checksum = Y
+Basic stats = Y
+Extended stats = Y
+FW version = Y
+Linux = Y
+x86-64 = Y
+
+[rte_flow items]
+any = Y
+eth = Y
+gtp = Y
+ipv4 = Y
+ipv6 = Y
+port_id = Y
+sctp = Y
+tcp = Y
+udp = Y
+vlan = Y
+
+[rte_flow actions]
+drop = Y
+jump = Y
+meter = Y
+modify_field = Y
+port_id = Y
+queue = Y
+raw_decap = Y
+raw_encap = Y
+rss = Y
diff --git a/doc/guides/nics/ntnic.rst b/doc/guides/nics/ntnic.rst
new file mode 100644
index 0000000000..85c58543dd
--- /dev/null
+++ b/doc/guides/nics/ntnic.rst
@@ -0,0 +1,235 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+   Copyright(c) 2023 Napatech A/S
+
+NTNIC Poll Mode Driver
+======================
+
+The NTNIC PMD provides poll mode driver support for Napatech smartNICs.
+
+
+Design
+------
+
+The NTNIC PMD is designed as a pure user-space driver, and requires no special
+Napatech kernel modules.
+
+The Napatech smartNIC presents one control PCI device (PF0). The NTNIC PMD accesses
+smartNIC PF0 via the vfio-pci kernel driver. Access to PF0 for all purposes is
+exclusive, so only one process should access it. The physical ports are located
+behind PF0 as DPDK ports 0 and 1. These ports can be configured with one or more
+TX and RX queues each.
+
+Virtual ports can be added by creating VFs via SR-IOV. The vfio-pci kernel
+driver is bound to the VFs. The VFs implement the virtio data plane only and the VF
+configuration is done by the NTNIC PMD through PF0. Each VF can be configured with
+one or more TX and RX queue pairs. The VFs are numbered starting from VF 4.
+The number of VFs is limited by the number of queues supported by the FPGA,
+and the number of queue pairs allocated for each VF. The current FPGA supports 128
+queues in each TX and RX direction. A maximum of 63 VFs is supported (VF4-VF66).
+
+As the Napatech smartNICs support sensors and monitoring beyond what is
+available in the DPDK API, the PMD includes the ntconnect socket interface.
+ntconnect additionally allows Napatech to implement specific customer requests
+that are not supported by the DPDK API.
+
+
+Supported NICs
+--------------
+
+- NT200A02 2x100G SmartNIC
+
+  - FPGA ID 9563 (Inline Flow Management)
+
+
+Features
+--------
+
+- Multiple TX and RX queues.
+- Scatter and gather for TX and RX.
+- RSS based on VLAN or 5-tuple.
+- RSS using different combinations of fields: L3 only, L4 only or both, and
+  source only, destination only or both.
+- Several RSS hash keys, one for each flow type.
+- Default RSS operation with no hash key specification.
+- VLAN filtering.
+- RX VLAN stripping via raw decap.
+- TX VLAN insertion via raw encap.
+- Hairpin.
+- HW checksum offload of RX and hairpin.
+- Promiscuous mode on PF and VF.
+- Flow API.
+- Multiple processes.
+- Tunnel types: GTP.
+- Tunnel HW offload: Packet type, inner/outer RSS, IP and UDP checksum
+  verification.
+- Support for multiple rte_flow groups.
+- Encapsulation and decapsulation of GTP data.
+- Packet modification: NAT, TTL decrement, DSCP tagging.
+- Traffic mirroring.
+- Jumbo frame support.
+- Port and queue statistics.
+- RMON statistics in extended stats.
+- Flow metering, including meter policy API.
+- Link state information.
+- CAM and TCAM based matching.
+- Exact match of 140 million flows and policies.
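The "Flow API" entry above, together with the rte_flow items and actions listed in ntnic.ini, is driven through the generic rte_flow calls. As a purely illustrative sketch (port id, queue index and the fixed eth/ipv4/tcp pattern are placeholders, not taken from this patch), steering IPv4/TCP traffic to a single Rx queue could look like::

    #include <rte_flow.h>

    static int
    steer_tcp_to_queue(uint16_t port_id, uint16_t rx_queue)
    {
            struct rte_flow_attr attr = { .ingress = 1 };
            /* Items without a spec match any Ethernet/IPv4/TCP packet. */
            struct rte_flow_item pattern[] = {
                    { .type = RTE_FLOW_ITEM_TYPE_ETH },
                    { .type = RTE_FLOW_ITEM_TYPE_IPV4 },
                    { .type = RTE_FLOW_ITEM_TYPE_TCP },
                    { .type = RTE_FLOW_ITEM_TYPE_END },
            };
            struct rte_flow_action_queue queue = { .index = rx_queue };
            struct rte_flow_action actions[] = {
                    { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
                    { .type = RTE_FLOW_ACTION_TYPE_END },
            };
            struct rte_flow_error err;

            /* Ask the PMD whether it can offload the rule, then create it. */
            if (rte_flow_validate(port_id, &attr, pattern, actions, &err) != 0)
                    return -1;
            return rte_flow_create(port_id, &attr, pattern, actions, &err) ? 0 : -1;
    }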
+
+
+Limitations
+~~~~~~~~~~~
+
+Kernel versions before 5.7 are not supported. Kernel version 5.7 added vfio-pci
+support for creating VFs from the PF, which is required for the PMD to use
+vfio-pci on the PF. This support has been back-ported to older Linux
+distributions and they are also supported. If vfio-pci is not required, kernel
+version 4.18 is supported.
+
+The current NTNIC PMD implementation supports only one active adapter.
+
+
+Configuration
+-------------
+
+Command line arguments
+~~~~~~~~~~~~~~~~~~~~~~
+
+The following standard DPDK command line arguments are used by the PMD:
+
+  -a: Used to specifically define the NT adapter by PCI ID.
+  --iova-mode: Must be set to ‘pa’ for Physical Address mode.
+
+NTNIC specific arguments can be passed to the PMD in the PCI device parameter list::
+
+  ... -a 0000:03:00.0[{,}]
+
+The NTNIC specific argument format is::
+
+  .=[:]
+
+Multiple arguments for the same device are separated by a ‘,’ comma.
+ can be a single value or a range.
+
+
+- ``rxqs`` parameter [int]
+
+  Specify the number of RX queues to use.
+
+  To specify the number of RX queues::
+
+    -a ::00.0,rxqs=4,txqs=4
+
+  By default, the value is set to 1.
+
+- ``txqs`` parameter [int]
+
+  Specify the number of TX queues to use.
+
+  To specify the number of TX queues::
+
+    -a ::00.0,rxqs=4,txqs=4
+
+  By default, the value is set to 1.
+
+- ``exception_path`` parameter [int]
+
+  Enable the exception path for unmatched packets to go through queue 0.
+
+  To enable exception_path::
+
+    -a ::00.0,exception_path=1
+
+  By default, the value is set to 0.
+
+- ``port.link_speed`` parameter [list]
+
+  This parameter is used to set the link speed on physical ports in the format::
+
+    port.link_speed=:
+
+  To set up link speeds::
+
+    -a ::00.0,port.link_speed=0:10000,port.link_speed=1:25000
+
+  By default, set to the maximum corresponding to the NIM bit rate.
+
+- ``supported-fpgas`` parameter [str]
+
+  List the supported FPGAs for a compiled NTNIC DPDK driver.
+
+  This parameter has two options::
+
+    - list.
+    - verbose.
+
+  Example usages::
+
+    -a ::00.0,supported-fpgas=list
+    -a ::00.0,supported-fpgas=verbose
+
+- ``help`` parameter [none]
+
+  List all available NTNIC PMD parameters.
+
+
+Build options
+~~~~~~~~~~~~~
+
+- ``NT_TOOLS``
+
+  Define that enables the PMD ntconnect source code.
+
+  Default: Enabled.
+
+- ``NT_VF_VDPA``
+
+  Define that enables the PMD VF VDPA source code.
+
+  Default: Enabled.
+
+- ``NT_RELAY_CORE``
+
+  Define that enables the PMD relay core source code. The relay core is used
+  by Napatech's vSwitch PMD profile in an OVS environment.
+
+  Default: Disabled.
+
+
+Logging and Debugging
+---------------------
+
+NTNIC supports several groups of logging that can be enabled with the ``log-level``
+parameter:
+
+- ETHDEV.
+
+  Logging info from the main PMD code, i.e. code that is related to DPDK::
+
+    --log-level=ntnic.ethdev,8
+
+- NTHW.
+
+  Logging info from NTHW, i.e. code that is related to the FPGA and the Adapter::
+
+    --log-level=ntnic.nthw,8
+
+- vDPA.
+
+  Logging info from vDPA, i.e. code that is related to VFIO and vDPA::
+
+    --log-level=ntnic.vdpa,8
+
+- FILTER.
+
+  Logging info from filter, i.e.
code that is related to the binary filter:: + + --log-level=ntnic.filter,8 + +- FPGA. + + Logging related to FPGA:: + + --log-level=ntnic.fpga,8 + +To enable logging on all levels use wildcard in the following way:: + + --log-level=ntnic.*,8 diff --git a/drivers/net/ntnic/include/ntdrv_4ga.h b/drivers/net/ntnic/include/ntdrv_4ga.h new file mode 100644 index 0000000000..e9c38fc330 --- /dev/null +++ b/drivers/net/ntnic/include/ntdrv_4ga.h @@ -0,0 +1,23 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2023 Napatech A/S + */ + +#ifndef __NTDRV_4GA_H__ +#define __NTDRV_4GA_H__ + +#include "nthw_drv.h" +#include "nt4ga_adapter.h" +#include "nthw_platform_drv.h" + +typedef struct ntdrv_4ga_s { + uint32_t pciident; + struct adapter_info_s adapter_info; + char *p_drv_name; + + volatile bool b_shutdown; + pthread_mutex_t stat_lck; + pthread_t stat_thread; + pthread_t flm_thread; +} ntdrv_4ga_t; + +#endif /* __NTDRV_4GA_H__ */ diff --git a/drivers/net/ntnic/include/ntos_system.h b/drivers/net/ntnic/include/ntos_system.h new file mode 100644 index 0000000000..0adfe86cc3 --- /dev/null +++ b/drivers/net/ntnic/include/ntos_system.h @@ -0,0 +1,23 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2023 Napatech A/S + */ + +#ifndef __NTOS_SYSTEM_H__ +#define __NTOS_SYSTEM_H__ + +#include "include/ntdrv_4ga.h" + +/* + * struct drv_s for DPDK (clone of kernel struct) + * keep it as close as possible to original kernel struct + */ +struct drv_s { + int adapter_no; + struct rte_pci_device *p_dev; + struct ntdrv_4ga_s ntdrv; + + int n_eth_dev_init_count; + int probe_finished; +}; + +#endif /* __NTOS_SYSTEM_H__ */ diff --git a/drivers/net/ntnic/meson.build b/drivers/net/ntnic/meson.build index fde385d929..40ab25899e 100644 --- a/drivers/net/ntnic/meson.build +++ b/drivers/net/ntnic/meson.build @@ -21,6 +21,9 @@ includes = [ include_directories('sensors/ntavr'), ] +# deps +deps += 'vhost' + # all sources sources = files( 'adapter/nt4ga_adapter.c', @@ -107,6 +110,16 @@ sources = files( 'nthw/nthw_stat.c', 'nthw/supported/nthw_fpga_9563_055_024_0000.c', 'ntlog/ntlog.c', + 'ntnic_dbsconfig.c', + 'ntnic_ethdev.c', + 'ntnic_filter/ntnic_filter.c', + 'ntnic_hshconfig.c', + 'ntnic_meter.c', + 'ntnic_vdpa.c', + 'ntnic_vf.c', + 'ntnic_vfio.c', + 'ntnic_vf_vdpa.c', + 'ntnic_xstats.c', 'ntutil/nt_util.c', 'sensors/avr_sensors/avr_sensors.c', 'sensors/board_sensors/board_sensors.c', diff --git a/drivers/net/ntnic/ntnic_dbsconfig.c b/drivers/net/ntnic/ntnic_dbsconfig.c new file mode 100644 index 0000000000..2217c163ad --- /dev/null +++ b/drivers/net/ntnic/ntnic_dbsconfig.c @@ -0,0 +1,1670 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2023 Napatech A/S + */ + +#include +#include +#include +#include + +#include "ntdrv_4ga.h" +#include "nt_util.h" +#include "ntnic_dbsconfig.h" +#include "ntnic_ethdev.h" +#include "ntlog.h" + +#define STRUCT_ALIGNMENT (4 * 1024LU) +#define MAX_VIRT_QUEUES 128 + +#define LAST_QUEUE 127 +#define DISABLE 0 +#define ENABLE 1 +#define RX_AM_DISABLE DISABLE +#define RX_AM_ENABLE ENABLE +#define RX_UW_DISABLE DISABLE +#define RX_UW_ENABLE ENABLE +#define RX_Q_DISABLE DISABLE +#define RX_Q_ENABLE ENABLE +#define RX_AM_POLL_SPEED 5 +#define RX_UW_POLL_SPEED 9 +#define HOST_ID 0 +#define INIT_QUEUE 1 + +#define TX_AM_DISABLE DISABLE +#define TX_AM_ENABLE ENABLE +#define TX_UW_DISABLE DISABLE +#define TX_UW_ENABLE ENABLE +#define TX_Q_DISABLE DISABLE +#define TX_Q_ENABLE ENABLE +#define TX_AM_POLL_SPEED 5 +#define TX_UW_POLL_SPEED 8 + 
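+/*
+ * Note (editorial assumption, not from the original patch): judging from the
+ * calls further down in this file, "AM" denotes the block that is given the
+ * avail-ring address via set_rx_am_data()/set_tx_am_data(), while "UW" denotes
+ * the block that is given the used-ring (or packed descriptor-ring) address
+ * via set_rx_uw_data()/set_tx_uw_data(). The *_POLL_SPEED values appear to be
+ * FPGA polling-rate settings; they are passed unchanged to set_rx_control()
+ * and set_tx_control().
+ */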
+/**************************************************************************/ + +#define le64 uint64_t +#define le32 uint32_t +#define le16 uint16_t + +/**************************************************************************/ + +#define VIRTQ_AVAIL_F_NO_INTERRUPT 1 +#pragma pack(1) +struct virtq_avail { + le16 flags; + le16 idx; + le16 ring[]; /* Queue size */ +}; + +#pragma pack() +/**************************************************************************/ + +/* le32 is used here for ids for padding reasons. */ +#pragma pack(1) +struct virtq_used_elem { + /* Index of start of used descriptor chain. */ + le32 id; + /* Total length of the descriptor chain which was used (written to) */ + le32 len; +}; + +#pragma pack() + +#define VIRTQ_USED_F_NO_NOTIFY 1 + +#pragma pack(1) +struct virtq_used { + le16 flags; + le16 idx; + struct virtq_used_elem ring[]; /* Queue size */ +}; + +#pragma pack() + +struct virtq_struct_layout_s { + size_t used_offset; + size_t desc_offset; +}; + +enum nthw_virt_queue_usage { UNUSED = 0, UNMANAGED, MANAGED }; + +#define PACKED(vq_type) ((vq_type) == PACKED_RING ? 1 : 0) + +struct nthw_virt_queue { + /* Pointers to virt-queue structs */ + union { + struct { + /* SPLIT virtqueue */ + struct virtq_avail *p_avail; + struct virtq_used *p_used; + struct virtq_desc *p_desc; + /* Control variables for virt-queue structs */ + le16 am_idx; + le16 used_idx; + le16 cached_idx; + le16 tx_descr_avail_idx; + }; + struct { + /* PACKED virtqueue */ + struct pvirtq_event_suppress *driver_event; + struct pvirtq_event_suppress *device_event; + struct pvirtq_desc *desc; + struct { + le16 next; + le16 num; + } outs; + /* + * when in-order release used Tx packets from FPGA it may collapse + * into a batch. When getting new Tx buffers we may only need + * partial + */ + le16 next_avail; + le16 next_used; + le16 avail_wrap_count; + le16 used_wrap_count; + }; + }; + + /* Array with packet buffers */ + struct nthw_memory_descriptor *p_virtual_addr; + + /* Queue configuration info */ + enum nthw_virt_queue_usage usage; + uint16_t vq_type; + uint16_t in_order; + int irq_vector; + + nthw_dbs_t *mp_nthw_dbs; + uint32_t index; + le16 queue_size; + uint32_t am_enable; + uint32_t host_id; + uint32_t port; /* Only used by TX queues */ + uint32_t virtual_port; /* Only used by TX queues */ + uint32_t header; + /* + * Only used by TX queues: + * 0: VirtIO-Net header (12 bytes). + * 1: Napatech DVIO0 descriptor (12 bytes). 
+ */ + void *avail_struct_phys_addr; + void *used_struct_phys_addr; + void *desc_struct_phys_addr; +}; + +struct pvirtq_struct_layout_s { + size_t driver_event_offset; + size_t device_event_offset; +}; + +static struct nthw_virt_queue rxvq[MAX_VIRT_QUEUES]; +static struct nthw_virt_queue txvq[MAX_VIRT_QUEUES]; + +static void dbs_init_rx_queue(nthw_dbs_t *p_nthw_dbs, uint32_t queue, + uint32_t start_idx, uint32_t start_ptr) +{ + uint32_t busy; + uint32_t init; + uint32_t dummy; + + do { + get_rx_init(p_nthw_dbs, &init, &dummy, &busy); + } while (busy != 0); + + set_rx_init(p_nthw_dbs, start_idx, start_ptr, INIT_QUEUE, queue); + + do { + get_rx_init(p_nthw_dbs, &init, &dummy, &busy); + } while (busy != 0); +} + +static void dbs_init_tx_queue(nthw_dbs_t *p_nthw_dbs, uint32_t queue, + uint32_t start_idx, uint32_t start_ptr) +{ + uint32_t busy; + uint32_t init; + uint32_t dummy; + + do { + get_tx_init(p_nthw_dbs, &init, &dummy, &busy); + } while (busy != 0); + + set_tx_init(p_nthw_dbs, start_idx, start_ptr, INIT_QUEUE, queue); + + do { + get_tx_init(p_nthw_dbs, &init, &dummy, &busy); + } while (busy != 0); +} + +int nthw_virt_queue_init(struct fpga_info_s *p_fpga_info) +{ + assert(p_fpga_info); + + nt_fpga_t *const p_fpga = p_fpga_info->mp_fpga; + nthw_dbs_t *p_nthw_dbs; + int res = 0; + uint32_t i; + + p_fpga_info->mp_nthw_dbs = NULL; + + p_nthw_dbs = nthw_dbs_new(); + if (p_nthw_dbs == NULL) + return -1; + + res = dbs_init(NULL, p_fpga, 0); /* Check that DBS exists in FPGA */ + if (res) { + free(p_nthw_dbs); + return res; + } + + res = dbs_init(p_nthw_dbs, p_fpga, 0); /* Create DBS module */ + if (res) { + free(p_nthw_dbs); + return res; + } + + p_fpga_info->mp_nthw_dbs = p_nthw_dbs; + + for (i = 0; i < MAX_VIRT_QUEUES; ++i) { + rxvq[i].usage = UNUSED; + txvq[i].usage = UNUSED; + } + + dbs_reset(p_nthw_dbs); + + for (i = 0; i < NT_DBS_RX_QUEUES_MAX; ++i) + dbs_init_rx_queue(p_nthw_dbs, i, 0, 0); + + for (i = 0; i < NT_DBS_TX_QUEUES_MAX; ++i) + dbs_init_tx_queue(p_nthw_dbs, i, 0, 0); + + set_rx_control(p_nthw_dbs, LAST_QUEUE, RX_AM_DISABLE, RX_AM_POLL_SPEED, + RX_UW_DISABLE, RX_UW_POLL_SPEED, RX_Q_DISABLE); + set_rx_control(p_nthw_dbs, LAST_QUEUE, RX_AM_ENABLE, RX_AM_POLL_SPEED, + RX_UW_ENABLE, RX_UW_POLL_SPEED, RX_Q_DISABLE); + set_rx_control(p_nthw_dbs, LAST_QUEUE, RX_AM_ENABLE, RX_AM_POLL_SPEED, + RX_UW_ENABLE, RX_UW_POLL_SPEED, RX_Q_ENABLE); + + set_tx_control(p_nthw_dbs, LAST_QUEUE, TX_AM_DISABLE, TX_AM_POLL_SPEED, + TX_UW_DISABLE, TX_UW_POLL_SPEED, TX_Q_DISABLE); + set_tx_control(p_nthw_dbs, LAST_QUEUE, TX_AM_ENABLE, TX_AM_POLL_SPEED, + TX_UW_ENABLE, TX_UW_POLL_SPEED, TX_Q_DISABLE); + set_tx_control(p_nthw_dbs, LAST_QUEUE, TX_AM_ENABLE, TX_AM_POLL_SPEED, + TX_UW_ENABLE, TX_UW_POLL_SPEED, TX_Q_ENABLE); + + return 0; +} + +static struct virtq_struct_layout_s dbs_calc_struct_layout(uint32_t queue_size) +{ + size_t avail_mem = + sizeof(struct virtq_avail) + + queue_size * + sizeof(le16); /* + sizeof(le16); ("avail->used_event" is not used) */ + size_t avail_mem_aligned = + ((avail_mem % STRUCT_ALIGNMENT) == 0) ? + avail_mem : + STRUCT_ALIGNMENT * (avail_mem / STRUCT_ALIGNMENT + 1); + + /* + sizeof(le16); ("used->avail_event" is not used) */ + size_t used_mem = + sizeof(struct virtq_used) + + queue_size * + sizeof(struct virtq_used_elem); + size_t used_mem_aligned = + ((used_mem % STRUCT_ALIGNMENT) == 0) ? 
+ used_mem : + STRUCT_ALIGNMENT * (used_mem / STRUCT_ALIGNMENT + 1); + + struct virtq_struct_layout_s virtq_layout; + + virtq_layout.used_offset = avail_mem_aligned; + virtq_layout.desc_offset = avail_mem_aligned + used_mem_aligned; + + return virtq_layout; +} + +static void dbs_initialize_avail_struct(void *addr, uint16_t queue_size, + uint16_t initial_avail_idx) +{ + uint16_t i; + struct virtq_avail *p_avail = (struct virtq_avail *)addr; + + p_avail->flags = VIRTQ_AVAIL_F_NO_INTERRUPT; + p_avail->idx = initial_avail_idx; + for (i = 0; i < queue_size; ++i) + p_avail->ring[i] = i; +} + +static void dbs_initialize_used_struct(void *addr, uint16_t queue_size) +{ + int i; + struct virtq_used *p_used = (struct virtq_used *)addr; + + p_used->flags = 1; + p_used->idx = 0; + for (i = 0; i < queue_size; ++i) { + p_used->ring[i].id = 0; + p_used->ring[i].len = 0; + } +} + +static void dbs_initialize_descriptor_struct(void *addr, + struct nthw_memory_descriptor *packet_buffer_descriptors, + uint16_t queue_size, ule16 flgs) +{ + if (packet_buffer_descriptors) { + int i; + struct virtq_desc *p_desc = (struct virtq_desc *)addr; + + for (i = 0; i < queue_size; ++i) { + p_desc[i].addr = + (uint64_t)packet_buffer_descriptors[i].phys_addr; + p_desc[i].len = packet_buffer_descriptors[i].len; + p_desc[i].flags = flgs; + p_desc[i].next = 0; + } + } +} + +static void dbs_initialize_virt_queue_structs(void *avail_struct_addr, + void *used_struct_addr, void *desc_struct_addr, + struct nthw_memory_descriptor *packet_buffer_descriptors, + uint16_t queue_size, uint16_t initial_avail_idx, ule16 flgs) +{ + dbs_initialize_avail_struct(avail_struct_addr, queue_size, + initial_avail_idx); + dbs_initialize_used_struct(used_struct_addr, queue_size); + dbs_initialize_descriptor_struct(desc_struct_addr, + packet_buffer_descriptors, + queue_size, flgs); +} + +static le16 dbs_qsize_log2(le16 qsize) +{ + uint32_t qs = 0; + + while (qsize) { + qsize = qsize >> 1; + ++qs; + } + --qs; + return qs; +} + +struct nthw_virt_queue *nthw_setup_rx_virt_queue(nthw_dbs_t *p_nthw_dbs, + uint32_t index, uint16_t start_idx, + uint16_t start_ptr, void *avail_struct_phys_addr, void *used_struct_phys_addr, + void *desc_struct_phys_addr, uint16_t queue_size, uint32_t host_id, + uint32_t header, uint32_t vq_type, int irq_vector) +{ + uint32_t qs = dbs_qsize_log2(queue_size); + uint32_t int_enable; + uint32_t vec; + uint32_t istk; + + /* + * Setup DBS module - DSF00094 + * 3. Configure the DBS.RX_DR_DATA memory; good idea to initialize all + * DBS_RX_QUEUES entries. + */ + if (set_rx_dr_data(p_nthw_dbs, index, (uint64_t)desc_struct_phys_addr, + host_id, qs, header, PACKED(vq_type)) != 0) + return NULL; + + /* + * 4. Configure the DBS.RX_UW_DATA memory; good idea to initialize all + * DBS_RX_QUEUES entries. + * Notice: We always start out with interrupts disabled (by setting the + * "irq_vector" argument to -1). Queues that require interrupts will have + * it enabled at a later time (after we have enabled vfio interrupts in + * the kernel). + */ + int_enable = 0; + vec = 0; + istk = 0; + NT_LOG(DBG, ETHDEV, "%s: set_rx_uw_data int=0 irq_vector=%u\n", + __func__, irq_vector); + if (set_rx_uw_data(p_nthw_dbs, index, + vq_type == PACKED_RING ? (uint64_t)desc_struct_phys_addr : + (uint64_t)used_struct_phys_addr, + host_id, qs, PACKED(vq_type), int_enable, vec, istk) != 0) + return NULL; + + /* + * 2. Configure the DBS.RX_AM_DATA memory and enable the queues you plan to use; + * good idea to initialize all DBS_RX_QUEUES entries. 
+ * Notice: We do this only for queues that don't require interrupts (i.e. if + * irq_vector < 0). Queues that require interrupts will have RX_AM_DATA enabled + * at a later time (after we have enabled vfio interrupts in the kernel). + */ + if (irq_vector < 0) { + if (set_rx_am_data(p_nthw_dbs, index, + (uint64_t)avail_struct_phys_addr, RX_AM_DISABLE, + host_id, PACKED(vq_type), + irq_vector >= 0 ? 1 : 0) != 0) + return NULL; + } + + /* + * 5. Initialize all RX queues (all DBS_RX_QUEUES of them) using the + * DBS.RX_INIT register. + */ + dbs_init_rx_queue(p_nthw_dbs, index, start_idx, start_ptr); + + /* + * 2. Configure the DBS.RX_AM_DATA memory and enable the queues you plan to use; + * good idea to initialize all DBS_RX_QUEUES entries. + */ + if (set_rx_am_data(p_nthw_dbs, index, (uint64_t)avail_struct_phys_addr, + RX_AM_ENABLE, host_id, PACKED(vq_type), + irq_vector >= 0 ? 1 : 0) != 0) + return NULL; + + /* Save queue state */ + rxvq[index].usage = UNMANAGED; + rxvq[index].mp_nthw_dbs = p_nthw_dbs; + rxvq[index].index = index; + rxvq[index].queue_size = queue_size; + rxvq[index].am_enable = (irq_vector < 0) ? RX_AM_ENABLE : RX_AM_DISABLE; + rxvq[index].host_id = host_id; + rxvq[index].avail_struct_phys_addr = avail_struct_phys_addr; + rxvq[index].used_struct_phys_addr = used_struct_phys_addr; + rxvq[index].desc_struct_phys_addr = desc_struct_phys_addr; + rxvq[index].vq_type = vq_type; + rxvq[index].in_order = 0; /* not used */ + rxvq[index].irq_vector = irq_vector; + + /* Return queue handle */ + return &rxvq[index]; +} + +static int dbs_wait_hw_queue_shutdown(struct nthw_virt_queue *vq, int rx); + +int nthw_disable_rx_virt_queue(struct nthw_virt_queue *rx_vq) +{ + if (!rx_vq) { + NT_LOG(ERR, ETHDEV, "%s: Invalid queue\n", __func__); + return -1; + } + + nthw_dbs_t *p_nthw_dbs = rx_vq->mp_nthw_dbs; + + if (rx_vq->index >= MAX_VIRT_QUEUES) + return -1; + + if (rx_vq->usage != UNMANAGED) + return -1; + + uint32_t qs = dbs_qsize_log2(rx_vq->queue_size); + + /* If ISTK is set, make sure to unset it */ + if (set_rx_uw_data(p_nthw_dbs, rx_vq->index, + rx_vq->vq_type == PACKED_RING ? + (uint64_t)rx_vq->desc_struct_phys_addr : + (uint64_t)rx_vq->used_struct_phys_addr, + rx_vq->host_id, qs, PACKED(rx_vq->vq_type), 0, 0, + 0) != 0) + return -1; + + /* Disable AM */ + rx_vq->am_enable = RX_AM_DISABLE; + if (set_rx_am_data(p_nthw_dbs, rx_vq->index, + (uint64_t)rx_vq->avail_struct_phys_addr, + rx_vq->am_enable, rx_vq->host_id, + PACKED(rx_vq->vq_type), 0) != 0) + return -1; + + /* let the FPGA finish packet processing */ + if (dbs_wait_hw_queue_shutdown(rx_vq, 1) != 0) + return -1; + + return 0; +} + +int nthw_enable_rx_virt_queue(struct nthw_virt_queue *rx_vq) +{ + uint32_t int_enable; + uint32_t vec; + uint32_t istk; + + if (!rx_vq) { + NT_LOG(ERR, ETHDEV, "%s: Invalid queue\n", __func__); + return -1; + } + + nthw_dbs_t *p_nthw_dbs = rx_vq->mp_nthw_dbs; + + if (rx_vq->index >= MAX_VIRT_QUEUES) + return -1; + + if (rx_vq->usage != UNMANAGED) + return -1; + + uint32_t qs = dbs_qsize_log2(rx_vq->queue_size); + + /* Set ISTK if */ + if (rx_vq->irq_vector >= 0 && + rx_vq->irq_vector < MAX_MSIX_VECTORS_PR_VF) { + int_enable = 1; + vec = rx_vq->irq_vector; + istk = 1; + } else { + int_enable = 0; + vec = 0; + istk = 0; + } + NT_LOG(DBG, ETHDEV, "%s: set_rx_uw_data irq_vector=%u\n", __func__, + rx_vq->irq_vector); + if (set_rx_uw_data(p_nthw_dbs, rx_vq->index, + rx_vq->vq_type == PACKED_RING ? 
+ (uint64_t)rx_vq->desc_struct_phys_addr : + (uint64_t)rx_vq->used_struct_phys_addr, + rx_vq->host_id, qs, PACKED(rx_vq->vq_type), + int_enable, vec, istk) != 0) + return -1; + + /* Enable AM */ + rx_vq->am_enable = RX_AM_ENABLE; + if (set_rx_am_data(p_nthw_dbs, rx_vq->index, + (uint64_t)rx_vq->avail_struct_phys_addr, + rx_vq->am_enable, rx_vq->host_id, + PACKED(rx_vq->vq_type), + rx_vq->irq_vector >= 0 ? 1 : 0) != 0) + return -1; + + return 0; +} + +int nthw_disable_tx_virt_queue(struct nthw_virt_queue *tx_vq) +{ + if (!tx_vq) { + NT_LOG(ERR, ETHDEV, "%s: Invalid queue\n", __func__); + return -1; + } + + nthw_dbs_t *p_nthw_dbs = tx_vq->mp_nthw_dbs; + + if (tx_vq->index >= MAX_VIRT_QUEUES) + return -1; + + if (tx_vq->usage != UNMANAGED) + return -1; + + uint32_t qs = dbs_qsize_log2(tx_vq->queue_size); + + /* If ISTK is set, make sure to unset it */ + if (set_tx_uw_data(p_nthw_dbs, tx_vq->index, + tx_vq->vq_type == PACKED_RING ? + (uint64_t)tx_vq->desc_struct_phys_addr : + (uint64_t)tx_vq->used_struct_phys_addr, + tx_vq->host_id, qs, PACKED(tx_vq->vq_type), 0, 0, 0, + tx_vq->in_order) != 0) + return -1; + + /* Disable AM */ + tx_vq->am_enable = TX_AM_DISABLE; + if (set_tx_am_data(p_nthw_dbs, tx_vq->index, + (uint64_t)tx_vq->avail_struct_phys_addr, + tx_vq->am_enable, tx_vq->host_id, + PACKED(tx_vq->vq_type), 0) != 0) + return -1; + + /* let the FPGA finish packet processing */ + if (dbs_wait_hw_queue_shutdown(tx_vq, 0) != 0) + return -1; + + return 0; +} + +int nthw_enable_tx_virt_queue(struct nthw_virt_queue *tx_vq) +{ + uint32_t int_enable; + uint32_t vec; + uint32_t istk; + + if (!tx_vq) { + NT_LOG(ERR, ETHDEV, "%s: Invalid queue\n", __func__); + return -1; + } + + nthw_dbs_t *p_nthw_dbs = tx_vq->mp_nthw_dbs; + + if (tx_vq->index >= MAX_VIRT_QUEUES) + return -1; + + if (tx_vq->usage != UNMANAGED) + return -1; + + uint32_t qs = dbs_qsize_log2(tx_vq->queue_size); + + /* Set ISTK if irq_vector is used */ + if (tx_vq->irq_vector >= 0 && + tx_vq->irq_vector < MAX_MSIX_VECTORS_PR_VF) { + int_enable = 1; + vec = tx_vq->irq_vector; + istk = 1; /* Use sticky interrupt */ + } else { + int_enable = 0; + vec = 0; + istk = 0; + } + if (set_tx_uw_data(p_nthw_dbs, tx_vq->index, + tx_vq->vq_type == PACKED_RING ? + (uint64_t)tx_vq->desc_struct_phys_addr : + (uint64_t)tx_vq->used_struct_phys_addr, + tx_vq->host_id, qs, PACKED(tx_vq->vq_type), + int_enable, vec, istk, tx_vq->in_order) != 0) + return -1; + + /* Enable AM */ + tx_vq->am_enable = TX_AM_ENABLE; + if (set_tx_am_data(p_nthw_dbs, tx_vq->index, + (uint64_t)tx_vq->avail_struct_phys_addr, + tx_vq->am_enable, tx_vq->host_id, + PACKED(tx_vq->vq_type), + tx_vq->irq_vector >= 0 ? 
1 : 0) != 0) + return -1; + + return 0; +} + +int nthw_enable_and_change_port_tx_virt_queue(struct nthw_virt_queue *tx_vq, + uint32_t outport) +{ + nthw_dbs_t *p_nthw_dbs = tx_vq->mp_nthw_dbs; + uint32_t qs = dbs_qsize_log2(tx_vq->queue_size); + + if (set_tx_dr_data(p_nthw_dbs, tx_vq->index, + (uint64_t)tx_vq->desc_struct_phys_addr, tx_vq->host_id, + qs, outport, 0, PACKED(tx_vq->vq_type)) != 0) + return -1; + return nthw_enable_tx_virt_queue(tx_vq); +} + +int nthw_set_tx_qos_config(nthw_dbs_t *p_nthw_dbs, uint32_t port, uint32_t enable, + uint32_t ir, uint32_t bs) +{ + return set_tx_qos_data(p_nthw_dbs, port, enable, ir, bs); +} + +int nthw_set_tx_qos_rate_global(nthw_dbs_t *p_nthw_dbs, uint32_t multiplier, + uint32_t divider) +{ + return set_tx_qos_rate(p_nthw_dbs, multiplier, divider); +} + +#define INDEX_PTR_NOT_VALID 0x80000000 +static int dbs_get_rx_ptr(nthw_dbs_t *p_nthw_dbs, uint32_t *p_index) +{ + uint32_t ptr; + uint32_t queue; + uint32_t valid; + + const int status = get_rx_ptr(p_nthw_dbs, &ptr, &queue, &valid); + + if (status == 0) { + if (valid) + *p_index = ptr; + else + *p_index = INDEX_PTR_NOT_VALID; + } + return status; +} + +static int dbs_get_tx_ptr(nthw_dbs_t *p_nthw_dbs, uint32_t *p_index) +{ + uint32_t ptr; + uint32_t queue; + uint32_t valid; + + const int status = get_tx_ptr(p_nthw_dbs, &ptr, &queue, &valid); + + if (status == 0) { + if (valid) + *p_index = ptr; + else + *p_index = INDEX_PTR_NOT_VALID; + } + return status; +} + +static int dbs_initialize_get_rx_ptr(nthw_dbs_t *p_nthw_dbs, uint32_t queue) +{ + return set_rx_ptr_queue(p_nthw_dbs, queue); +} + +static int dbs_initialize_get_tx_ptr(nthw_dbs_t *p_nthw_dbs, uint32_t queue) +{ + return set_tx_ptr_queue(p_nthw_dbs, queue); +} + +static int dbs_wait_on_busy(struct nthw_virt_queue *vq, uint32_t *idle, int rx) +{ + uint32_t busy; + uint32_t queue; + int err = 0; + nthw_dbs_t *p_nthw_dbs = vq->mp_nthw_dbs; + + do { + if (rx) + err = get_rx_idle(p_nthw_dbs, idle, &queue, &busy); + else + err = get_tx_idle(p_nthw_dbs, idle, &queue, &busy); + } while (!err && busy); + + return err; +} + +static int dbs_wait_hw_queue_shutdown(struct nthw_virt_queue *vq, int rx) +{ + int err = 0; + uint32_t idle = 0; + nthw_dbs_t *p_nthw_dbs = vq->mp_nthw_dbs; + + err = dbs_wait_on_busy(vq, &idle, rx); + if (err) { + if (err == -ENOTSUP) { + NT_OS_WAIT_USEC(200000); + return 0; + } + return -1; + } + + do { + if (rx) + err = set_rx_idle(p_nthw_dbs, 1, vq->index); + else + err = set_tx_idle(p_nthw_dbs, 1, vq->index); + + if (err) + return -1; + + if (dbs_wait_on_busy(vq, &idle, rx) != 0) + return -1; + + } while (idle == 0); + + return 0; +} + +static int dbs_internal_release_rx_virt_queue(struct nthw_virt_queue *rxvq) +{ + nthw_dbs_t *p_nthw_dbs = rxvq->mp_nthw_dbs; + + if (rxvq == NULL) + return -1; + + /* Clear UW */ + rxvq->used_struct_phys_addr = NULL; + if (set_rx_uw_data(p_nthw_dbs, rxvq->index, + (uint64_t)rxvq->used_struct_phys_addr, rxvq->host_id, 0, + PACKED(rxvq->vq_type), 0, 0, 0) != 0) + return -1; + + /* Disable AM */ + rxvq->am_enable = RX_AM_DISABLE; + if (set_rx_am_data(p_nthw_dbs, rxvq->index, + (uint64_t)rxvq->avail_struct_phys_addr, rxvq->am_enable, + rxvq->host_id, PACKED(rxvq->vq_type), 0) != 0) + return -1; + + /* Let the FPGA finish packet processing */ + if (dbs_wait_hw_queue_shutdown(rxvq, 1) != 0) + return -1; + + /* Clear rest of AM */ + rxvq->avail_struct_phys_addr = NULL; + rxvq->host_id = 0; + if (set_rx_am_data(p_nthw_dbs, rxvq->index, + (uint64_t)rxvq->avail_struct_phys_addr, rxvq->am_enable, + 
rxvq->host_id, PACKED(rxvq->vq_type), 0) != 0) + return -1; + + /* Clear DR */ + rxvq->desc_struct_phys_addr = NULL; + if (set_rx_dr_data(p_nthw_dbs, rxvq->index, + (uint64_t)rxvq->desc_struct_phys_addr, rxvq->host_id, 0, + rxvq->header, PACKED(rxvq->vq_type)) != 0) + return -1; + + /* Initialize queue */ + dbs_init_rx_queue(p_nthw_dbs, rxvq->index, 0, 0); + + /* Reset queue state */ + rxvq->usage = UNUSED; + rxvq->mp_nthw_dbs = p_nthw_dbs; + rxvq->index = 0; + rxvq->queue_size = 0; + + return 0; +} + +int nthw_release_rx_virt_queue(struct nthw_virt_queue *rxvq) +{ + if (rxvq == NULL || rxvq->usage != UNMANAGED) + return -1; + + return dbs_internal_release_rx_virt_queue(rxvq); +} + +int nthw_release_managed_rx_virt_queue(struct nthw_virt_queue *rxvq) +{ + if (rxvq == NULL || rxvq->usage != MANAGED) + return -1; + + if (rxvq->p_virtual_addr) { + free(rxvq->p_virtual_addr); + rxvq->p_virtual_addr = NULL; + } + + return dbs_internal_release_rx_virt_queue(rxvq); +} + +static int dbs_internal_release_tx_virt_queue(struct nthw_virt_queue *txvq) +{ + nthw_dbs_t *p_nthw_dbs = txvq->mp_nthw_dbs; + + if (txvq == NULL) + return -1; + + /* Clear UW */ + txvq->used_struct_phys_addr = NULL; + if (set_tx_uw_data(p_nthw_dbs, txvq->index, + (uint64_t)txvq->used_struct_phys_addr, txvq->host_id, 0, + PACKED(txvq->vq_type), 0, 0, 0, + txvq->in_order) != 0) + return -1; + + /* Disable AM */ + txvq->am_enable = TX_AM_DISABLE; + if (set_tx_am_data(p_nthw_dbs, txvq->index, + (uint64_t)txvq->avail_struct_phys_addr, txvq->am_enable, + txvq->host_id, PACKED(txvq->vq_type), 0) != 0) + return -1; + + /* Let the FPGA finish packet processing */ + if (dbs_wait_hw_queue_shutdown(txvq, 0) != 0) + return -1; + + /* Clear rest of AM */ + txvq->avail_struct_phys_addr = NULL; + txvq->host_id = 0; + if (set_tx_am_data(p_nthw_dbs, txvq->index, + (uint64_t)txvq->avail_struct_phys_addr, txvq->am_enable, + txvq->host_id, PACKED(txvq->vq_type), 0) != 0) + return -1; + + /* Clear DR */ + txvq->desc_struct_phys_addr = NULL; + txvq->port = 0; + txvq->header = 0; + if (set_tx_dr_data(p_nthw_dbs, txvq->index, + (uint64_t)txvq->desc_struct_phys_addr, txvq->host_id, 0, + txvq->port, txvq->header, + PACKED(txvq->vq_type)) != 0) + return -1; + + /* Clear QP */ + txvq->virtual_port = 0; + if (nthw_dbs_set_tx_qp_data(p_nthw_dbs, txvq->index, txvq->virtual_port) != + 0) + return -1; + + /* Initialize queue */ + dbs_init_tx_queue(p_nthw_dbs, txvq->index, 0, 0); + + /* Reset queue state */ + txvq->usage = UNUSED; + txvq->mp_nthw_dbs = p_nthw_dbs; + txvq->index = 0; + txvq->queue_size = 0; + + return 0; +} + +int nthw_release_tx_virt_queue(struct nthw_virt_queue *txvq) +{ + if (txvq == NULL || txvq->usage != UNMANAGED) + return -1; + + return dbs_internal_release_tx_virt_queue(txvq); +} + +int nthw_release_managed_tx_virt_queue(struct nthw_virt_queue *txvq) +{ + if (txvq == NULL || txvq->usage != MANAGED) + return -1; + + if (txvq->p_virtual_addr) { + free(txvq->p_virtual_addr); + txvq->p_virtual_addr = NULL; + } + + return dbs_internal_release_tx_virt_queue(txvq); +} + +struct nthw_virt_queue *nthw_setup_tx_virt_queue(nthw_dbs_t *p_nthw_dbs, + uint32_t index, uint16_t start_idx, + uint16_t start_ptr, void *avail_struct_phys_addr, void *used_struct_phys_addr, + void *desc_struct_phys_addr, uint16_t queue_size, uint32_t host_id, + uint32_t port, uint32_t virtual_port, uint32_t header, uint32_t vq_type, + int irq_vector, uint32_t in_order) +{ + uint32_t int_enable; + uint32_t vec; + uint32_t istk; + uint32_t qs = dbs_qsize_log2(queue_size); + + /* 
+ * Setup DBS module - DSF00094 + * 3. Configure the DBS.TX_DR_DATA memory; good idea to initialize all + * DBS_TX_QUEUES entries. + */ + if (set_tx_dr_data(p_nthw_dbs, index, (uint64_t)desc_struct_phys_addr, + host_id, qs, port, header, PACKED(vq_type)) != 0) + return NULL; + + /* + * 4. Configure the DBS.TX_UW_DATA memory; good idea to initialize all + * DBS_TX_QUEUES entries. + * Notice: We always start out with interrupts disabled (by setting the + * "irq_vector" argument to -1). Queues that require interrupts will have + * it enabled at a later time (after we have enabled vfio interrupts in the + * kernel). + */ + int_enable = 0; + vec = 0; + istk = 0; + + if (set_tx_uw_data(p_nthw_dbs, index, + vq_type == PACKED_RING ? + (uint64_t)desc_struct_phys_addr : + (uint64_t)used_struct_phys_addr, + host_id, qs, PACKED(vq_type), int_enable, vec, istk, + in_order) != 0) + return NULL; + + /* + * 2. Configure the DBS.TX_AM_DATA memory and enable the queues you plan to use; + * good idea to initialize all DBS_TX_QUEUES entries. + */ + if (set_tx_am_data(p_nthw_dbs, index, (uint64_t)avail_struct_phys_addr, + TX_AM_DISABLE, host_id, PACKED(vq_type), + irq_vector >= 0 ? 1 : 0) != 0) + return NULL; + + /* + * 5. Initialize all TX queues (all DBS_TX_QUEUES of them) using the + * DBS.TX_INIT register. + */ + dbs_init_tx_queue(p_nthw_dbs, index, start_idx, start_ptr); + + if (nthw_dbs_set_tx_qp_data(p_nthw_dbs, index, virtual_port) != 0) + return NULL; + + /* + * 2. Configure the DBS.TX_AM_DATA memory and enable the queues you plan to use; + * good idea to initialize all DBS_TX_QUEUES entries. + * Notice: We do this only for queues that don't require interrupts (i.e. if + * irq_vector < 0). Queues that require interrupts will have TX_AM_DATA + * enabled at a later time (after we have enabled vfio interrupts in the + * kernel). + */ + if (irq_vector < 0) { + if (set_tx_am_data(p_nthw_dbs, index, + (uint64_t)avail_struct_phys_addr, TX_AM_ENABLE, + host_id, PACKED(vq_type), + irq_vector >= 0 ? 1 : 0) != 0) + return NULL; + } + + /* Save queue state */ + txvq[index].usage = UNMANAGED; + txvq[index].mp_nthw_dbs = p_nthw_dbs; + txvq[index].index = index; + txvq[index].queue_size = queue_size; + txvq[index].am_enable = (irq_vector < 0) ? TX_AM_ENABLE : TX_AM_DISABLE; + txvq[index].host_id = host_id; + txvq[index].port = port; + txvq[index].virtual_port = virtual_port; + txvq[index].header = header; + txvq[index].avail_struct_phys_addr = avail_struct_phys_addr; + txvq[index].used_struct_phys_addr = used_struct_phys_addr; + txvq[index].desc_struct_phys_addr = desc_struct_phys_addr; + txvq[index].vq_type = vq_type; + txvq[index].in_order = in_order; + txvq[index].irq_vector = irq_vector; + + /* Return queue handle */ + return &txvq[index]; +} + +static struct nthw_virt_queue *nthw_setup_managed_rx_virt_queue_split(nthw_dbs_t *p_nthw_dbs, + uint32_t index, uint32_t queue_size, + uint32_t host_id, uint32_t header, + struct nthw_memory_descriptor *p_virt_struct_area, + struct nthw_memory_descriptor *p_packet_buffers, int irq_vector) +{ + struct virtq_struct_layout_s virtq_struct_layout = + dbs_calc_struct_layout(queue_size); + + dbs_initialize_virt_queue_structs(p_virt_struct_area->virt_addr, + (char *)p_virt_struct_area->virt_addr + + virtq_struct_layout.used_offset, + (char *)p_virt_struct_area->virt_addr + + virtq_struct_layout.desc_offset, + p_packet_buffers, (uint16_t)queue_size, + p_packet_buffers ? 
(uint16_t)queue_size : 0, + VIRTQ_DESC_F_WRITE /* Rx */); + + rxvq[index].p_avail = p_virt_struct_area->virt_addr; + rxvq[index].p_used = (void *)((char *)p_virt_struct_area->virt_addr + + virtq_struct_layout.used_offset); + rxvq[index].p_desc = (void *)((char *)p_virt_struct_area->virt_addr + + virtq_struct_layout.desc_offset); + + rxvq[index].am_idx = p_packet_buffers ? (uint16_t)queue_size : 0; + rxvq[index].used_idx = 0; + rxvq[index].cached_idx = 0; + rxvq[index].p_virtual_addr = NULL; + + if (p_packet_buffers) { + rxvq[index].p_virtual_addr = + malloc(queue_size * sizeof(*p_packet_buffers)); + memcpy(rxvq[index].p_virtual_addr, p_packet_buffers, + queue_size * sizeof(*p_packet_buffers)); + } + + nthw_setup_rx_virt_queue(p_nthw_dbs, index, 0, 0, + (void *)p_virt_struct_area->phys_addr, + (char *)p_virt_struct_area->phys_addr + + virtq_struct_layout.used_offset, + (char *)p_virt_struct_area->phys_addr + + virtq_struct_layout.desc_offset, + (uint16_t)queue_size, host_id, header, + SPLIT_RING, irq_vector); + + rxvq[index].usage = MANAGED; + + return &rxvq[index]; +} + +static struct nthw_virt_queue *nthw_setup_managed_tx_virt_queue_split(nthw_dbs_t *p_nthw_dbs, + uint32_t index, uint32_t queue_size, + uint32_t host_id, uint32_t port, uint32_t virtual_port, uint32_t header, + int irq_vector, uint32_t in_order, + struct nthw_memory_descriptor *p_virt_struct_area, + struct nthw_memory_descriptor *p_packet_buffers) +{ + struct virtq_struct_layout_s virtq_struct_layout = + dbs_calc_struct_layout(queue_size); + + dbs_initialize_virt_queue_structs(p_virt_struct_area->virt_addr, + (char *)p_virt_struct_area->virt_addr + + virtq_struct_layout.used_offset, + (char *)p_virt_struct_area->virt_addr + + virtq_struct_layout.desc_offset, + p_packet_buffers, (uint16_t)queue_size, 0, 0 /* Tx */); + + txvq[index].p_avail = p_virt_struct_area->virt_addr; + txvq[index].p_used = (void *)((char *)p_virt_struct_area->virt_addr + + virtq_struct_layout.used_offset); + txvq[index].p_desc = (void *)((char *)p_virt_struct_area->virt_addr + + virtq_struct_layout.desc_offset); + txvq[index].queue_size = (le16)queue_size; + txvq[index].am_idx = 0; + txvq[index].used_idx = 0; + txvq[index].cached_idx = 0; + txvq[index].p_virtual_addr = NULL; + + txvq[index].tx_descr_avail_idx = 0; + + if (p_packet_buffers) { + txvq[index].p_virtual_addr = + malloc(queue_size * sizeof(*p_packet_buffers)); + memcpy(txvq[index].p_virtual_addr, p_packet_buffers, + queue_size * sizeof(*p_packet_buffers)); + } + + nthw_setup_tx_virt_queue(p_nthw_dbs, index, 0, 0, + (void *)p_virt_struct_area->phys_addr, + (char *)p_virt_struct_area->phys_addr + + virtq_struct_layout.used_offset, + (char *)p_virt_struct_area->phys_addr + + virtq_struct_layout.desc_offset, + (uint16_t)queue_size, host_id, port, virtual_port, + header, SPLIT_RING, irq_vector, in_order); + + txvq[index].usage = MANAGED; + + return &txvq[index]; +} + +/* + * Packed Ring + */ +static int nthw_setup_managed_virt_queue_packed(struct nthw_virt_queue *vq, + struct pvirtq_struct_layout_s *pvirtq_layout, + struct nthw_memory_descriptor *p_virt_struct_area, + struct nthw_memory_descriptor *p_packet_buffers, ule16 flags, int rx) +{ + /* page aligned */ + assert(((uintptr_t)p_virt_struct_area->phys_addr & 0xfff) == 0); + assert(p_packet_buffers); + + /* clean canvas */ + memset(p_virt_struct_area->virt_addr, 0, + sizeof(struct pvirtq_desc) * vq->queue_size + + sizeof(struct pvirtq_event_suppress) * 2 + + sizeof(int) * vq->queue_size); + + pvirtq_layout->device_event_offset = + sizeof(struct 
pvirtq_desc) * vq->queue_size; + pvirtq_layout->driver_event_offset = + pvirtq_layout->device_event_offset + + sizeof(struct pvirtq_event_suppress); + + vq->desc = p_virt_struct_area->virt_addr; + vq->device_event = (void *)((uintptr_t)vq->desc + + pvirtq_layout->device_event_offset); + vq->driver_event = (void *)((uintptr_t)vq->desc + + pvirtq_layout->driver_event_offset); + + vq->next_avail = 0; + vq->next_used = 0; + vq->avail_wrap_count = 1; + vq->used_wrap_count = 1; + + /* + * Only possible if FPGA always delivers in-order + * Buffer ID used is the index in the pPacketBuffers array + */ + unsigned int i; + struct pvirtq_desc *p_desc = vq->desc; + + for (i = 0; i < vq->queue_size; i++) { + if (rx) { + p_desc[i].addr = (uint64_t)p_packet_buffers[i].phys_addr; + p_desc[i].len = p_packet_buffers[i].len; + } + p_desc[i].id = i; + p_desc[i].flags = flags; + } + + if (rx) + vq->avail_wrap_count ^= + 1; /* filled up available buffers for Rx */ + else + vq->used_wrap_count ^= 1; /* pre-fill free buffer IDs */ + + if (vq->queue_size == 0) + return -1; /* don't allocate memory with size of 0 bytes */ + vq->p_virtual_addr = malloc(vq->queue_size * sizeof(*p_packet_buffers)); + if (vq->p_virtual_addr == NULL) + return -1; + + memcpy(vq->p_virtual_addr, p_packet_buffers, + vq->queue_size * sizeof(*p_packet_buffers)); + + /* Not used yet by FPGA - make sure we disable */ + vq->device_event->flags = RING_EVENT_FLAGS_DISABLE; + + return 0; +} + +static struct nthw_virt_queue *nthw_setup_managed_rx_virt_queue_packed(nthw_dbs_t *p_nthw_dbs, + uint32_t index, uint32_t queue_size, + uint32_t host_id, uint32_t header, + struct nthw_memory_descriptor *p_virt_struct_area, + struct nthw_memory_descriptor *p_packet_buffers, int irq_vector) +{ + struct pvirtq_struct_layout_s pvirtq_layout; + struct nthw_virt_queue *vq = &rxvq[index]; + /* Set size and setup packed vq ring */ + vq->queue_size = queue_size; + /* Use Avail flag bit == 1 because wrap bit is initially set to 1 - and Used is inverse */ + if (nthw_setup_managed_virt_queue_packed(vq, &pvirtq_layout, + p_virt_struct_area, p_packet_buffers, + VIRTQ_DESC_F_WRITE | VIRTQ_DESC_F_AVAIL, 1) != 0) + return NULL; + + nthw_setup_rx_virt_queue(p_nthw_dbs, index, 0x8000, + 0, /* start wrap ring counter as 1 */ + (void *)((uintptr_t)p_virt_struct_area->phys_addr + + pvirtq_layout.driver_event_offset), + (void *)((uintptr_t)p_virt_struct_area->phys_addr + + pvirtq_layout.device_event_offset), + p_virt_struct_area->phys_addr, (uint16_t)queue_size, + host_id, header, PACKED_RING, irq_vector); + + vq->usage = MANAGED; + return vq; +} + +static struct nthw_virt_queue *nthw_setup_managed_tx_virt_queue_packed(nthw_dbs_t *p_nthw_dbs, + uint32_t index, uint32_t queue_size, + uint32_t host_id, uint32_t port, uint32_t virtual_port, uint32_t header, + int irq_vector, uint32_t in_order, + struct nthw_memory_descriptor *p_virt_struct_area, + struct nthw_memory_descriptor *p_packet_buffers) +{ + struct pvirtq_struct_layout_s pvirtq_layout; + struct nthw_virt_queue *vq = &txvq[index]; + /* Set size and setup packed vq ring */ + vq->queue_size = queue_size; + if (nthw_setup_managed_virt_queue_packed(vq, &pvirtq_layout, + p_virt_struct_area, + p_packet_buffers, 0, 0) != 0) + return NULL; + + nthw_setup_tx_virt_queue(p_nthw_dbs, index, 0x8000, + 0, /* start wrap ring counter as 1 */ + (void *)((uintptr_t)p_virt_struct_area->phys_addr + + pvirtq_layout.driver_event_offset), + (void *)((uintptr_t)p_virt_struct_area->phys_addr + + pvirtq_layout.device_event_offset), + 
p_virt_struct_area->phys_addr, (uint16_t)queue_size, + host_id, port, virtual_port, header, PACKED_RING, + irq_vector, in_order); + + vq->usage = MANAGED; + return vq; +} + +/* + * Create a Managed Rx Virt Queue + * + * p_virt_struct_area - Memory that can be used for virtQueue structs + * p_packet_buffers - Memory that can be used for packet buffers. Array must have queue_size entries + * + * Notice: The queue will be created with interrupts disabled. + * If interrupts are required, make sure to call nthw_enable_rx_virt_queue() + * afterwards. + */ +struct nthw_virt_queue * +nthw_setup_managed_rx_virt_queue(nthw_dbs_t *p_nthw_dbs, uint32_t index, + uint32_t queue_size, uint32_t host_id, + uint32_t header, + struct nthw_memory_descriptor *p_virt_struct_area, + struct nthw_memory_descriptor *p_packet_buffers, + uint32_t vq_type, int irq_vector) +{ + switch (vq_type) { + case SPLIT_RING: + return nthw_setup_managed_rx_virt_queue_split(p_nthw_dbs, + index, queue_size, host_id, header, + p_virt_struct_area, p_packet_buffers, irq_vector); + case PACKED_RING: + return nthw_setup_managed_rx_virt_queue_packed(p_nthw_dbs, + index, queue_size, host_id, header, + p_virt_struct_area, p_packet_buffers, irq_vector); + default: + break; + } + return NULL; +} + +/* + * Create a Managed Tx Virt Queue + * + * p_virt_struct_area - Memory that can be used for virtQueue structs + * p_packet_buffers - Memory that can be used for packet buffers. Array must have queue_size entries + * + * Notice: The queue will be created with interrupts disabled. + * If interrupts are required, make sure to call nthw_enable_tx_virt_queue() + * afterwards. + */ +struct nthw_virt_queue *nthw_setup_managed_tx_virt_queue(nthw_dbs_t *p_nthw_dbs, + uint32_t index, uint32_t queue_size, + uint32_t host_id, uint32_t port, uint32_t virtual_port, uint32_t header, + struct nthw_memory_descriptor *p_virt_struct_area, + struct nthw_memory_descriptor *p_packet_buffers, uint32_t vq_type, + int irq_vector, uint32_t in_order) +{ + switch (vq_type) { + case SPLIT_RING: + return nthw_setup_managed_tx_virt_queue_split(p_nthw_dbs, index, + queue_size, host_id, port, virtual_port, + header, irq_vector, in_order, p_virt_struct_area, + p_packet_buffers); + case PACKED_RING: + return nthw_setup_managed_tx_virt_queue_packed(p_nthw_dbs, index, + queue_size, host_id, port, virtual_port, + header, irq_vector, in_order, p_virt_struct_area, + p_packet_buffers); + default: + break; + } + return NULL; +} + +/* + * Packed Ring helper macros + */ +#define avail_flag(vq) ((vq)->avail_wrap_count ? VIRTQ_DESC_F_AVAIL : 0) +#define used_flag_inv(vq) ((vq)->avail_wrap_count ? 
0 : VIRTQ_DESC_F_USED) + +#define inc_avail(_vq, _num) \ + do { \ + __typeof__(_vq) (vq) = (_vq); \ + __typeof__(_num) (num) = (_num); \ + (vq)->next_avail += num; \ + if ((vq)->next_avail >= (vq)->queue_size) { \ + (vq)->next_avail -= (vq)->queue_size; \ + (vq)->avail_wrap_count ^= 1; \ + } \ + } while (0) + +#define inc_used(_vq, _num) \ + do { \ + __typeof__(_vq) (vq) = (_vq); \ + __typeof__(_num) (num) = (_num); \ + (vq)->next_used += num; \ + if ((vq)->next_used >= (vq)->queue_size) { \ + (vq)->next_used -= (vq)->queue_size; \ + (vq)->used_wrap_count ^= 1; \ + } \ + } while (0) + +uint16_t nthw_get_rx_packets(struct nthw_virt_queue *rxvq, uint16_t n, + struct nthw_received_packets *rp, uint16_t *nb_pkts) +{ + le16 segs = 0; + uint16_t pkts = 0; + + if (rxvq->vq_type == SPLIT_RING) { + le16 i; + le16 entries_ready = (le16)(rxvq->cached_idx - rxvq->used_idx); + + if (entries_ready < n) { + /* Look for more packets */ + rxvq->cached_idx = rxvq->p_used->idx; + entries_ready = (le16)(rxvq->cached_idx - rxvq->used_idx); + if (entries_ready == 0) { + *nb_pkts = 0; + return 0; + } + + if (n > entries_ready) + n = entries_ready; + } + + /* Give packets - make sure all packets are whole packets. + * Valid because queue_size is always 2^n + */ + const le16 queue_mask = (le16)(rxvq->queue_size - 1); + const ule32 buf_len = rxvq->p_desc[0].len; + + le16 used = rxvq->used_idx; + + for (i = 0; i < n; ++i) { + le32 id = rxvq->p_used->ring[used & queue_mask].id; + + rp[i].addr = rxvq->p_virtual_addr[id].virt_addr; + rp[i].len = rxvq->p_used->ring[used & queue_mask].len; + + uint32_t pkt_len = + ((struct _pkt_hdr_rx *)rp[i].addr)->cap_len; + + if (pkt_len > buf_len) { + /* segmented */ + int nbsegs = (pkt_len + buf_len - 1) / buf_len; + + if (((int)i + nbsegs) > n) { + /* don't have enough segments - break out */ + break; + } + + int ii; + + for (ii = 1; ii < nbsegs; ii++) { + ++i; + id = rxvq->p_used + ->ring[(used + ii) & + queue_mask] + .id; + rp[i].addr = + rxvq->p_virtual_addr[id].virt_addr; + rp[i].len = rxvq->p_used + ->ring[(used + ii) & + queue_mask] + .len; + } + used += nbsegs; + } else { + ++used; + } + + pkts++; + segs = i + 1; + } + + rxvq->used_idx = used; + } else if (rxvq->vq_type == PACKED_RING) { + /* This requires in-order behavior from FPGA */ + int i; + + for (i = 0; i < n; i++) { + struct pvirtq_desc *desc = &rxvq->desc[rxvq->next_used]; + + ule16 flags = desc->flags; + uint8_t avail = !!(flags & VIRTQ_DESC_F_AVAIL); + uint8_t used = !!(flags & VIRTQ_DESC_F_USED); + + if (avail != rxvq->used_wrap_count || + used != rxvq->used_wrap_count) + break; + + rp[pkts].addr = rxvq->p_virtual_addr[desc->id].virt_addr; + rp[pkts].len = desc->len; + pkts++; + + inc_used(rxvq, 1); + } + + segs = pkts; + } + + *nb_pkts = pkts; + return segs; +} + +/* + * Put buffers back into Avail Ring + */ +void nthw_release_rx_packets(struct nthw_virt_queue *rxvq, le16 n) +{ + if (rxvq->vq_type == SPLIT_RING) { + rxvq->am_idx = (le16)(rxvq->am_idx + n); + rxvq->p_avail->idx = rxvq->am_idx; + } else if (rxvq->vq_type == PACKED_RING) { + int i; + /* + * Defer flags update on first segment - due to serialization towards HW and + * when jumbo segments are added + */ + + ule16 first_flags = VIRTQ_DESC_F_WRITE | avail_flag(rxvq) | + used_flag_inv(rxvq); + struct pvirtq_desc *first_desc = &rxvq->desc[rxvq->next_avail]; + + uint32_t len = rxvq->p_virtual_addr[0].len; /* all same size */ + + /* Optimization point: use in-order release */ + + for (i = 0; i < n; i++) { + struct pvirtq_desc *desc = + 
&rxvq->desc[rxvq->next_avail]; + + desc->id = rxvq->next_avail; + desc->addr = + (ule64)rxvq->p_virtual_addr[desc->id].phys_addr; + desc->len = len; + if (i) + desc->flags = VIRTQ_DESC_F_WRITE | + avail_flag(rxvq) | + used_flag_inv(rxvq); + + inc_avail(rxvq, 1); + } + rte_rmb(); + first_desc->flags = first_flags; + } +} + +#define vq_log_arg(vq, format, ...) + +uint16_t nthw_get_tx_buffers(struct nthw_virt_queue *txvq, uint16_t n, + uint16_t *first_idx, struct nthw_cvirtq_desc *cvq, + struct nthw_memory_descriptor **p_virt_addr) +{ + int m = 0; + le16 queue_mask = (le16)(txvq->queue_size - + 1); /* Valid because queue_size is always 2^n */ + *p_virt_addr = txvq->p_virtual_addr; + + if (txvq->vq_type == SPLIT_RING) { + cvq->s = txvq->p_desc; + cvq->vq_type = SPLIT_RING; + + *first_idx = txvq->tx_descr_avail_idx; + + le16 entries_used = + (le16)((txvq->tx_descr_avail_idx - txvq->cached_idx) & + queue_mask); + le16 entries_ready = (le16)(txvq->queue_size - 1 - entries_used); + + vq_log_arg(txvq, + "ask %i: descrAvail %i, cachedidx %i, used: %i, ready %i used->idx %i\n", + n, txvq->tx_descr_avail_idx, txvq->cached_idx, entries_used, + entries_ready, txvq->p_used->idx); + + if (entries_ready < n) { + /* + * Look for more packets. + * Using the used_idx in the avail ring since they are held synchronous + * because of in-order + */ + txvq->cached_idx = + txvq->p_avail->ring[(txvq->p_used->idx - 1) & + queue_mask]; + + vq_log_arg(txvq, + "_update: get cachedidx %i (used_idx-1 %i)\n", + txvq->cached_idx, + (txvq->p_used->idx - 1) & queue_mask); + entries_used = (le16)((txvq->tx_descr_avail_idx - + txvq->cached_idx) & + queue_mask); + entries_ready = + (le16)(txvq->queue_size - 1 - entries_used); + vq_log_arg(txvq, "new used: %i, ready %i\n", + entries_used, entries_ready); + if (n > entries_ready) + n = entries_ready; + } + } else if (txvq->vq_type == PACKED_RING) { + int i; + + cvq->p = txvq->desc; + cvq->vq_type = PACKED_RING; + + if (txvq->outs.num) { + *first_idx = txvq->outs.next; + uint16_t num = RTE_MIN(n, txvq->outs.num); + + txvq->outs.next = (txvq->outs.next + num) & queue_mask; + txvq->outs.num -= num; + + if (n == num) + return n; + + m = num; + n -= num; + } else { + *first_idx = txvq->next_used; + } + /* iterate the ring - this requires in-order behavior from FPGA */ + for (i = 0; i < n; i++) { + struct pvirtq_desc *desc = &txvq->desc[txvq->next_used]; + + ule16 flags = desc->flags; + uint8_t avail = !!(flags & VIRTQ_DESC_F_AVAIL); + uint8_t used = !!(flags & VIRTQ_DESC_F_USED); + + if (avail != txvq->used_wrap_count || + used != txvq->used_wrap_count) { + n = i; + break; + } + + le16 incr = (desc->id - txvq->next_used) & queue_mask; + + i += incr; + inc_used(txvq, incr + 1); + } + + if (i > n) { + int outs_num = i - n; + + txvq->outs.next = (txvq->next_used - outs_num) & + queue_mask; + txvq->outs.num = outs_num; + } + + } else { + return 0; + } + return m + n; +} + +void nthw_release_tx_buffers(struct nthw_virt_queue *txvq, uint16_t n, + uint16_t n_segs[]) +{ + int i; + + if (txvq->vq_type == SPLIT_RING) { + /* Valid because queue_size is always 2^n */ + le16 queue_mask = (le16)(txvq->queue_size - 1); + + vq_log_arg(txvq, "pkts %i, avail idx %i, start at %i\n", n, + txvq->am_idx, txvq->tx_descr_avail_idx); + for (i = 0; i < n; i++) { + int idx = txvq->am_idx & queue_mask; + + txvq->p_avail->ring[idx] = txvq->tx_descr_avail_idx; + txvq->tx_descr_avail_idx = + (txvq->tx_descr_avail_idx + n_segs[i]) & queue_mask; + txvq->am_idx++; + } + /* Make sure the ring has been updated before HW 
reads index update */ + rte_mb(); + txvq->p_avail->idx = txvq->am_idx; + vq_log_arg(txvq, "new avail idx %i, descr_idx %i\n", + txvq->p_avail->idx, txvq->tx_descr_avail_idx); + + } else if (txvq->vq_type == PACKED_RING) { + /* + * Defer flags update on first segment - due to serialization towards HW and + * when jumbo segments are added + */ + + ule16 first_flags = avail_flag(txvq) | used_flag_inv(txvq); + struct pvirtq_desc *first_desc = &txvq->desc[txvq->next_avail]; + + for (i = 0; i < n; i++) { + struct pvirtq_desc *desc = + &txvq->desc[txvq->next_avail]; + + desc->id = txvq->next_avail; + desc->addr = + (ule64)txvq->p_virtual_addr[desc->id].phys_addr; + + if (i) + /* bitwise-or here because next flags may already have been setup */ + desc->flags |= avail_flag(txvq) | + used_flag_inv(txvq); + + inc_avail(txvq, 1); + } + /* Proper read barrier before FPGA may see first flags */ + rte_rmb(); + first_desc->flags = first_flags; + } +} + +int nthw_get_rx_queue_ptr(struct nthw_virt_queue *rxvq, uint16_t *index) +{ + uint32_t rx_ptr; + uint32_t loops = 100000; + + dbs_initialize_get_rx_ptr(rxvq->mp_nthw_dbs, rxvq->index); + do { + if (dbs_get_rx_ptr(rxvq->mp_nthw_dbs, &rx_ptr) != 0) + return -1; + if (--loops == 0) + return -1; + usleep(10); + } while (rx_ptr == INDEX_PTR_NOT_VALID); + + *index = (uint16_t)(rx_ptr & 0xffff); + return 0; +} + +int nthw_get_tx_queue_ptr(struct nthw_virt_queue *txvq, uint16_t *index) +{ + uint32_t tx_ptr; + uint32_t loops = 100000; + + dbs_initialize_get_tx_ptr(txvq->mp_nthw_dbs, txvq->index); + do { + if (dbs_get_tx_ptr(txvq->mp_nthw_dbs, &tx_ptr) != 0) + return -1; + if (--loops == 0) + return -1; + usleep(10); + } while (tx_ptr == INDEX_PTR_NOT_VALID); + + *index = (uint16_t)(tx_ptr & 0xffff); + return 0; +} diff --git a/drivers/net/ntnic/ntnic_dbsconfig.h b/drivers/net/ntnic/ntnic_dbsconfig.h new file mode 100644 index 0000000000..ceae535741 --- /dev/null +++ b/drivers/net/ntnic/ntnic_dbsconfig.h @@ -0,0 +1,251 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2023 Napatech A/S + */ + +#ifndef NTNIC_DBS_CONFIG_H +#define NTNIC_DBS_CONFIG_H + +#include +#include "nthw_drv.h" + +struct nthw_virt_queue; + +struct nthw_memory_descriptor { + void *phys_addr; + void *virt_addr; + uint32_t len; +}; + +#define ule64 uint64_t +#define ule32 uint32_t +#define ule16 uint16_t + +#define MAX_MSIX_VECTORS_PR_VF 8 + +#define SPLIT_RING 0 +#define PACKED_RING 1 +#define IN_ORDER 1 +#define NO_ORDER_REQUIRED 0 + +/* + * SPLIT : This marks a buffer as continuing via the next field. + * PACKED: This marks a buffer as continuing. (packed does not have a next field, so must be + * contiguous) In Used descriptors it must be ignored + */ +#define VIRTQ_DESC_F_NEXT 1 +/* + * SPLIT : This marks a buffer as device write-only (otherwise device read-only). + * PACKED: This marks a descriptor as device write-only (otherwise device read-only). + * PACKED: In a used descriptor, this bit is used to specify whether any data has been written by + * the device into any parts of the buffer. + */ +#define VIRTQ_DESC_F_WRITE 2 +/* + * SPLIT : This means the buffer contains a list of buffer descriptors. + * PACKED: This means the element contains a table of descriptors. + */ +#define VIRTQ_DESC_F_INDIRECT 4 + +/* + * Split Ring virtq Descriptor + */ +#pragma pack(1) +struct virtq_desc { + /* Address (guest-physical). */ + ule64 addr; + /* Length. */ + ule32 len; + /* The flags as indicated above. 
*/ + ule16 flags; + /* Next field if flags & NEXT */ + ule16 next; +}; + +#pragma pack() + +/* + * Packed Ring special structures and defines + * + */ + +#define MAX_PACKED_RING_ELEMENTS (1 << 15) /* 32768 */ + +/* additional packed ring flags */ +#define VIRTQ_DESC_F_AVAIL (1 << 7) +#define VIRTQ_DESC_F_USED (1 << 15) + +/* descr phys address must be 16 byte aligned */ +#pragma pack(push, 16) +struct pvirtq_desc { + /* Buffer Address. */ + ule64 addr; + /* Buffer Length. */ + ule32 len; + /* Buffer ID. */ + ule16 id; + /* The flags depending on descriptor type. */ + ule16 flags; +}; + +#pragma pack(pop) + +/* Enable events */ +#define RING_EVENT_FLAGS_ENABLE 0x0 +/* Disable events */ +#define RING_EVENT_FLAGS_DISABLE 0x1 +/* + * Enable events for a specific descriptor + * (as specified by Descriptor Ring Change Event offset/Wrap Counter). + * Only valid if VIRTIO_F_RING_EVENT_IDX has been negotiated. + */ +#define RING_EVENT_FLAGS_DESC 0x2 +/* The value 0x3 is reserved */ + +struct pvirtq_event_suppress { + union { + struct { + ule16 desc_event_off : 15; /* Descriptor Ring Change Event offset */ + ule16 desc_event_wrap : 1; /* Descriptor Ring Change Event Wrap Counter */ + }; + ule16 desc; /* If desc_event_flags set to RING_EVENT_FLAGS_DESC */ + }; + + /* phys address must be 4 byte aligned */ +#pragma pack(push, 16) + union { + struct { + ule16 desc_event_flags : 2; /* Descriptor Ring Change Event Flags */ + ule16 reserved : 14; /* Reserved, set to 0 */ + }; + ule16 flags; + }; +}; + +#pragma pack(pop) + +/* + * Common virtq descr + */ +#define vq_set_next(_vq, index, nxt) \ + do { \ + __typeof__(_vq) (vq) = (_vq); \ + if ((vq)->vq_type == SPLIT_RING) \ + (vq)->s[index].next = nxt; \ + } while (0) +#define vq_add_flags(_vq, _index, _flgs) \ + do { \ + __typeof__(_vq) (vq) = (_vq); \ + __typeof__(_index) (index) = (_index); \ + __typeof__(_flgs) (flgs) = (_flgs); \ + if ((vq)->vq_type == SPLIT_RING) \ + (vq)->s[index].flags |= flgs; \ + else if ((vq)->vq_type == PACKED_RING) \ + (vq)->p[index].flags |= flgs; \ + } while (0) +#define vq_set_flags(_vq, _index, _flgs) \ + do { \ + __typeof__(_vq) (vq) = (_vq); \ + __typeof__(_index) (index) = (_index); \ + __typeof__(_flgs) (flgs) = (_flgs); \ + if ((vq)->vq_type == SPLIT_RING) \ + (vq)->s[index].flags = flgs; \ + else if ((vq)->vq_type == PACKED_RING) \ + (vq)->p[index].flags = flgs; \ + } while (0) + +struct nthw_virtq_desc_buf { + /* Address (guest-physical). */ + ule64 addr; + /* Length. 
*/ + ule32 len; +} __rte_aligned(16); + +struct nthw_cvirtq_desc { + union { + struct nthw_virtq_desc_buf *b; /* buffer part as is common */ + struct virtq_desc *s; /* SPLIT */ + struct pvirtq_desc *p; /* PACKED */ + }; + uint16_t vq_type; +}; + +/* Setup a virt_queue for a VM */ +struct nthw_virt_queue *nthw_setup_rx_virt_queue(nthw_dbs_t *p_nthw_dbs, + uint32_t index, uint16_t start_idx, + uint16_t start_ptr, void *avail_struct_phys_addr, void *used_struct_phys_addr, + void *desc_struct_phys_addr, uint16_t queue_size, uint32_t host_id, + uint32_t header, uint32_t vq_type, int irq_vector); + +int nthw_enable_rx_virt_queue(struct nthw_virt_queue *rx_vq); +int nthw_disable_rx_virt_queue(struct nthw_virt_queue *rx_vq); +int nthw_release_rx_virt_queue(struct nthw_virt_queue *rxvq); + +struct nthw_virt_queue *nthw_setup_tx_virt_queue(nthw_dbs_t *p_nthw_dbs, + uint32_t index, uint16_t start_idx, + uint16_t start_ptr, void *avail_struct_phys_addr, void *used_struct_phys_addr, + void *desc_struct_phys_addr, uint16_t queue_size, uint32_t host_id, + uint32_t port, uint32_t virtual_port, uint32_t header, uint32_t vq_type, + int irq_vector, uint32_t in_order); + +int nthw_enable_tx_virt_queue(struct nthw_virt_queue *tx_vq); +int nthw_disable_tx_virt_queue(struct nthw_virt_queue *tx_vq); +int nthw_release_tx_virt_queue(struct nthw_virt_queue *txvq); +int nthw_enable_and_change_port_tx_virt_queue(struct nthw_virt_queue *tx_vq, + uint32_t outport); + +struct nthw_virt_queue *nthw_setup_managed_rx_virt_queue(nthw_dbs_t *p_nthw_dbs, + uint32_t index, uint32_t queue_size, + uint32_t host_id, uint32_t header, + struct nthw_memory_descriptor * + p_virt_struct_area, + struct nthw_memory_descriptor * + p_packet_buffers, + uint32_t vq_type, int irq_vector); + +int nthw_release_managed_rx_virt_queue(struct nthw_virt_queue *rxvq); + +struct nthw_virt_queue *nthw_setup_managed_tx_virt_queue(nthw_dbs_t *p_nthw_dbs, + uint32_t index, uint32_t queue_size, + uint32_t host_id, uint32_t port, uint32_t virtual_port, uint32_t header, + struct nthw_memory_descriptor * + p_virt_struct_area, + struct nthw_memory_descriptor * + p_packet_buffers, + uint32_t vq_type, int irq_vector, uint32_t in_order); + +int nthw_release_managed_tx_virt_queue(struct nthw_virt_queue *txvq); + +int nthw_set_tx_qos_config(nthw_dbs_t *p_nthw_dbs, uint32_t port, uint32_t enable, + uint32_t ir, uint32_t bs); + +int nthw_set_tx_qos_rate_global(nthw_dbs_t *p_nthw_dbs, uint32_t multiplier, + uint32_t divider); + +struct nthw_received_packets { + void *addr; + uint32_t len; +}; + +/* + * These functions handles both Split and Packed including merged buffers (jumbo) + */ +uint16_t nthw_get_rx_packets(struct nthw_virt_queue *rxvq, uint16_t n, + struct nthw_received_packets *rp, + uint16_t *nb_pkts); + +void nthw_release_rx_packets(struct nthw_virt_queue *rxvq, uint16_t n); + +uint16_t nthw_get_tx_buffers(struct nthw_virt_queue *txvq, uint16_t n, + uint16_t *first_idx, struct nthw_cvirtq_desc *cvq, + struct nthw_memory_descriptor **p_virt_addr); + +void nthw_release_tx_buffers(struct nthw_virt_queue *txvq, uint16_t n, + uint16_t n_segs[]); + +int nthw_get_rx_queue_ptr(struct nthw_virt_queue *rxvq, uint16_t *index); + +int nthw_get_tx_queue_ptr(struct nthw_virt_queue *txvq, uint16_t *index); + +int nthw_virt_queue_init(struct fpga_info_s *p_fpga_info); + +#endif diff --git a/drivers/net/ntnic/ntnic_ethdev.c b/drivers/net/ntnic/ntnic_ethdev.c new file mode 100644 index 0000000000..ce07d5a8cd --- /dev/null +++ b/drivers/net/ntnic/ntnic_ethdev.c @@ -0,0 
+1,4256 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2023 Napatech A/S + */ + +#include /* sleep() */ +#include +#include +#include +#include +#include +#include + +#include "ntdrv_4ga.h" + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "ntlog.h" + +#include "stream_binary_flow_api.h" +#include "nthw_fpga.h" +#include "ntnic_xstats.h" +#include "ntnic_hshconfig.h" +#include "ntnic_ethdev.h" +#include "ntnic_vdpa.h" +#include "ntnic_vf.h" +#include "ntnic_vfio.h" +#include "ntnic_meter.h" + +#include "flow_api.h" + +#ifdef NT_TOOLS +#include "ntconnect.h" +#include "ntconnect_api.h" +#include "ntconnect_modules/ntconn_modules.h" +#endif + +/* Defines: */ + +#define HW_MAX_PKT_LEN (10000) +#define MAX_MTU (HW_MAX_PKT_LEN - RTE_ETHER_HDR_LEN - RTE_ETHER_CRC_LEN) +#define MIN_MTU 46 +#define MIN_MTU_INLINE 512 + +#include "ntnic_dbsconfig.h" + +#define EXCEPTION_PATH_HID 0 + +#define MAX_TOTAL_QUEUES 128 + +#define ONE_G_SIZE 0x40000000 +#define ONE_G_MASK (ONE_G_SIZE - 1) + +#define VIRTUAL_TUNNEL_PORT_OFFSET 72 + +int lag_active; + +static struct { + struct nthw_virt_queue *vq; + int managed; + int rx; +} rel_virt_queue[MAX_REL_VQS]; + +#define MAX_RX_PACKETS 128 +#define MAX_TX_PACKETS 128 + +#if defined(RX_SRC_DUMP_PKTS_DEBUG) || defined(RX_DST_DUMP_PKTS_DEBUG) || \ + defined(TX_SRC_DUMP_PKTS_DEBUG) || defined(TX_DST_DUMP_PKTS_DEBUG) +static void dump_packet_seg(const char *text, uint8_t *data, int len) +{ + int x; + + if (text) + printf("%s (%p, len %i)", text, data, len); + for (x = 0; x < len; x++) { + if (!(x % 16)) + printf("\n%04X:", x); + printf(" %02X", *(data + x)); + } + printf("\n"); +} +#endif + +/* Global statistics: */ +extern const struct rte_flow_ops _dev_flow_ops; +struct pmd_internals *pmd_intern_base; +uint64_t rte_tsc_freq; + +/*------- Tables to store DPDK EAL log levels for nt log modules----------*/ +static int nt_log_module_logtype[NT_LOG_MODULE_COUNT] = { -1 }; +/*Register the custom module binding to EAL --log-level option here*/ +static const char *nt_log_module_eal_name[NT_LOG_MODULE_COUNT] = { + [NT_LOG_MODULE_INDEX(NT_LOG_MODULE_GENERAL)] = "pmd.net.ntnic.general", + [NT_LOG_MODULE_INDEX(NT_LOG_MODULE_NTHW)] = "pmd.net.ntnic.nthw", + [NT_LOG_MODULE_INDEX(NT_LOG_MODULE_FILTER)] = "pmd.net.ntnic.filter", + [NT_LOG_MODULE_INDEX(NT_LOG_MODULE_VDPA)] = "pmd.net.ntnic.vdpa", + [NT_LOG_MODULE_INDEX(NT_LOG_MODULE_FPGA)] = "pmd.net.ntnic.fpga", + [NT_LOG_MODULE_INDEX(NT_LOG_MODULE_NTCONNECT)] = + "pmd.net.ntnic.ntconnect", + [NT_LOG_MODULE_INDEX(NT_LOG_MODULE_ETHDEV)] = "pmd.net.ntnic.ethdev" +}; + +/*--------------------------------------------------------------------------*/ + +rte_spinlock_t hwlock = RTE_SPINLOCK_INITIALIZER; + +static void *lag_management(void *arg); +static void (*previous_handler)(int sig); +static pthread_t shutdown_tid; +int kill_pmd; + +#define ETH_DEV_NTNIC_HELP_ARG "help" +#define ETH_DEV_NTHW_PORTMASK_ARG "portmask" +#define ETH_DEV_NTHW_RXQUEUES_ARG "rxqs" +#define ETH_DEV_NTHW_TXQUEUES_ARG "txqs" +#define ETH_DEV_NTHW_PORTQUEUES_ARG "portqueues" +#define ETH_DEV_NTHW_REPRESENTOR_ARG "representor" +#define ETH_DEV_NTHW_EXCEPTION_PATH_ARG "exception_path" +#define ETH_NTNIC_LAG_PRIMARY_ARG "primary" +#define ETH_NTNIC_LAG_BACKUP_ARG "backup" +#define ETH_NTNIC_LAG_MODE_ARG "mode" +#define ETH_DEV_NTHW_LINK_SPEED_ARG "port.link_speed" +#define ETH_DEV_NTNIC_SUPPORTED_FPGAS_ARG 
"supported-fpgas" + +#define DVIO_VHOST_DIR_NAME "/usr/local/var/run/" + +static const char *const valid_arguments[] = { + ETH_DEV_NTNIC_HELP_ARG, + ETH_DEV_NTHW_PORTMASK_ARG, + ETH_DEV_NTHW_RXQUEUES_ARG, + ETH_DEV_NTHW_TXQUEUES_ARG, + ETH_DEV_NTHW_PORTQUEUES_ARG, + ETH_DEV_NTHW_REPRESENTOR_ARG, + ETH_DEV_NTHW_EXCEPTION_PATH_ARG, + ETH_NTNIC_LAG_PRIMARY_ARG, + ETH_NTNIC_LAG_BACKUP_ARG, + ETH_NTNIC_LAG_MODE_ARG, + ETH_DEV_NTHW_LINK_SPEED_ARG, + ETH_DEV_NTNIC_SUPPORTED_FPGAS_ARG, + NULL, +}; + +static struct rte_ether_addr eth_addr_vp[MAX_FPGA_VIRTUAL_PORTS_SUPPORTED]; + +/* Functions: */ + +/* + * The set of PCI devices this driver supports + */ +static const struct rte_pci_id nthw_pci_id_map[] = { + { RTE_PCI_DEVICE(NT_HW_PCI_VENDOR_ID, NT_HW_PCI_DEVICE_ID_NT200A02) }, + { RTE_PCI_DEVICE(NT_HW_PCI_VENDOR_ID, NT_HW_PCI_DEVICE_ID_NT50B01) }, + { + .vendor_id = 0, + }, /* sentinel */ +}; + +/* + * Store and get adapter info + */ + +static struct drv_s *g_p_drv[NUM_ADAPTER_MAX] = { NULL }; + +static void store_pdrv(struct drv_s *p_drv) +{ + if (p_drv->adapter_no > NUM_ADAPTER_MAX) { + NT_LOG(ERR, ETHDEV, + "Internal error adapter number %u out of range. Max number of adapters: %u\n", + p_drv->adapter_no, NUM_ADAPTER_MAX); + return; + } + if (g_p_drv[p_drv->adapter_no] != 0) { + NT_LOG(WRN, ETHDEV, + "Overwriting adapter structure for PCI " PCIIDENT_PRINT_STR + " with adapter structure for PCI " PCIIDENT_PRINT_STR + "\n", + PCIIDENT_TO_DOMAIN(g_p_drv[p_drv->adapter_no]->ntdrv.pciident), + PCIIDENT_TO_BUSNR(g_p_drv[p_drv->adapter_no]->ntdrv.pciident), + PCIIDENT_TO_DEVNR(g_p_drv[p_drv->adapter_no]->ntdrv.pciident), + PCIIDENT_TO_FUNCNR(g_p_drv[p_drv->adapter_no]->ntdrv.pciident), + PCIIDENT_TO_DOMAIN(p_drv->ntdrv.pciident), + PCIIDENT_TO_BUSNR(p_drv->ntdrv.pciident), + PCIIDENT_TO_DEVNR(p_drv->ntdrv.pciident), + PCIIDENT_TO_FUNCNR(p_drv->ntdrv.pciident)); + } + rte_spinlock_lock(&hwlock); + g_p_drv[p_drv->adapter_no] = p_drv; + rte_spinlock_unlock(&hwlock); +} + +static struct drv_s *get_pdrv_from_pci(struct rte_pci_addr addr) +{ + int i; + struct drv_s *p_drv = NULL; + + rte_spinlock_lock(&hwlock); + for (i = 0; i < NUM_ADAPTER_MAX; i++) { + if (g_p_drv[i]) { + if (PCIIDENT_TO_DOMAIN(g_p_drv[i]->ntdrv.pciident) == + addr.domain && + PCIIDENT_TO_BUSNR(g_p_drv[i]->ntdrv.pciident) == + addr.bus) { + p_drv = g_p_drv[i]; + break; + } + } + } + rte_spinlock_unlock(&hwlock); + return p_drv; +} + +static struct drv_s *get_pdrv_from_pciident(uint32_t pciident) +{ + struct rte_pci_addr addr; + + addr.domain = PCIIDENT_TO_DOMAIN(pciident); + addr.bus = PCIIDENT_TO_BUSNR(pciident); + addr.devid = PCIIDENT_TO_DEVNR(pciident); + addr.function = PCIIDENT_TO_FUNCNR(pciident); + return get_pdrv_from_pci(addr); +} + +int debug_adapter_show_info(uint32_t pciident, FILE *pfh) +{ + struct drv_s *p_drv = get_pdrv_from_pciident(pciident); + + return nt4ga_adapter_show_info(&p_drv->ntdrv.adapter_info, pfh); +} + +nthw_dbs_t *get_pdbs_from_pci(struct rte_pci_addr pci_addr) +{ + nthw_dbs_t *p_nthw_dbs = NULL; + struct drv_s *p_drv; + + p_drv = get_pdrv_from_pci(pci_addr); + if (p_drv) { + p_nthw_dbs = p_drv->ntdrv.adapter_info.fpga_info.mp_nthw_dbs; + } else { + NT_LOG(ERR, ETHDEV, + "Adapter DBS %p (p_drv=%p) info for adapter with PCI " PCIIDENT_PRINT_STR + " is not found\n", + p_nthw_dbs, p_drv, pci_addr.domain, pci_addr.bus, pci_addr.devid, + pci_addr.function); + } + return p_nthw_dbs; +} + +enum fpga_info_profile get_fpga_profile_from_pci(struct rte_pci_addr pci_addr) +{ + enum fpga_info_profile fpga_profile = 
FPGA_INFO_PROFILE_UNKNOWN; + struct drv_s *p_drv; + + p_drv = get_pdrv_from_pci(pci_addr); + if (p_drv) { + fpga_profile = p_drv->ntdrv.adapter_info.fpga_info.profile; + } else { + NT_LOG(ERR, ETHDEV, + "FPGA profile (p_drv=%p) for adapter with PCI " PCIIDENT_PRINT_STR + " is not found\n", + p_drv, pci_addr.domain, pci_addr.bus, pci_addr.devid, pci_addr.function); + } + return fpga_profile; +} + +static int string_to_u32(const char *key_str __rte_unused, + const char *value_str, void *extra_args) +{ + if (!value_str || !extra_args) + return -1; + const uint32_t value = strtol(value_str, NULL, 0); + *(uint32_t *)extra_args = value; + return 0; +} + +struct port_link_speed { + int port_id; + int link_speed; +}; + +/* Parse :, e.g 1:10000 */ +static int string_to_port_link_speed(const char *key_str __rte_unused, + const char *value_str, void *extra_args) +{ + if (!value_str || !extra_args) + return -1; + char *semicol; + const uint32_t pid = strtol(value_str, &semicol, 10); + + if (*semicol != ':') + return -1; + const uint32_t lspeed = strtol(++semicol, NULL, 10); + struct port_link_speed *pls = *(struct port_link_speed **)extra_args; + + pls->port_id = pid; + pls->link_speed = lspeed; + ++(*((struct port_link_speed **)(extra_args))); + return 0; +} + +static int dpdk_stats_collect(struct pmd_internals *internals, + struct rte_eth_stats *stats) +{ + unsigned int i; + struct drv_s *p_drv = internals->p_drv; + struct ntdrv_4ga_s *p_nt_drv = &p_drv->ntdrv; + nt4ga_stat_t *p_nt4ga_stat = &p_nt_drv->adapter_info.nt4ga_stat; + nthw_stat_t *p_nthw_stat = p_nt4ga_stat->mp_nthw_stat; + const int if_index = internals->if_index; + uint64_t rx_total = 0; + uint64_t rx_total_b = 0; + uint64_t tx_total = 0; + uint64_t tx_total_b = 0; + uint64_t tx_err_total = 0; + + if (!p_nthw_stat || !p_nt4ga_stat || !stats || if_index < 0 || + if_index > NUM_ADAPTER_PORTS_MAX) { + NT_LOG(WRN, ETHDEV, "%s - error exit\n", __func__); + return -1; + } + + /* + * Pull the latest port statistic numbers (Rx/Tx pkts and bytes) + * Return values are in the "internals->rxq_scg[]" and "internals->txq_scg[]" arrays + */ + poll_statistics(internals); + + memset(stats, 0, sizeof(*stats)); + for (i = 0; + i < RTE_ETHDEV_QUEUE_STAT_CNTRS && i < internals->nb_rx_queues; + i++) { + stats->q_ipackets[i] = internals->rxq_scg[i].rx_pkts; + stats->q_ibytes[i] = internals->rxq_scg[i].rx_bytes; + rx_total += stats->q_ipackets[i]; + rx_total_b += stats->q_ibytes[i]; + } + + for (i = 0; + i < RTE_ETHDEV_QUEUE_STAT_CNTRS && i < internals->nb_tx_queues; + i++) { + stats->q_opackets[i] = internals->txq_scg[i].tx_pkts; + stats->q_obytes[i] = internals->txq_scg[i].tx_bytes; + stats->q_errors[i] = internals->txq_scg[i].err_pkts; + tx_total += stats->q_opackets[i]; + tx_total_b += stats->q_obytes[i]; + tx_err_total += stats->q_errors[i]; + } + + stats->imissed = internals->rx_missed; + stats->ipackets = rx_total; + stats->ibytes = rx_total_b; + stats->opackets = tx_total; + stats->obytes = tx_total_b; + stats->oerrors = tx_err_total; + + return 0; +} + +static int dpdk_stats_reset(struct pmd_internals *internals, + struct ntdrv_4ga_s *p_nt_drv, int n_intf_no) +{ + nt4ga_stat_t *p_nt4ga_stat = &p_nt_drv->adapter_info.nt4ga_stat; + nthw_stat_t *p_nthw_stat = p_nt4ga_stat->mp_nthw_stat; + unsigned int i; + + if (!p_nthw_stat || !p_nt4ga_stat || n_intf_no < 0 || + n_intf_no > NUM_ADAPTER_PORTS_MAX) + return -1; + + pthread_mutex_lock(&p_nt_drv->stat_lck); + + /* Rx */ + for (i = 0; i < internals->nb_rx_queues; i++) { + internals->rxq_scg[i].rx_pkts = 0; 
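[Editorial note] The collect path above mirrors the per-queue counters into rte_eth_stats and sums them into the port totals, with the per-queue arrays capped at RTE_ETHDEV_QUEUE_STAT_CNTRS. A condensed sketch of that pattern, using a hypothetical q_counters struct in place of the driver's rxq_scg[] entries:

#include <stdint.h>
#include <rte_ethdev.h>

/* Hypothetical per-queue counter pair standing in for rxq_scg[i].rx_pkts/rx_bytes. */
struct q_counters {
	uint64_t pkts;
	uint64_t bytes;
};

/* Sketch of the Rx half of dpdk_stats_collect(): per-queue view plus port totals. */
static void collect_rx_stats(struct rte_eth_stats *stats,
			     const struct q_counters *rxq,
			     unsigned int nb_rx_queues)
{
	for (unsigned int i = 0;
	     i < RTE_ETHDEV_QUEUE_STAT_CNTRS && i < nb_rx_queues; i++) {
		stats->q_ipackets[i] = rxq[i].pkts;
		stats->q_ibytes[i] = rxq[i].bytes;
		stats->ipackets += rxq[i].pkts;
		stats->ibytes += rxq[i].bytes;
	}
}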
+ internals->rxq_scg[i].rx_bytes = 0; + internals->rxq_scg[i].err_pkts = 0; + } + + internals->rx_missed = 0; + + /* Tx */ + for (i = 0; i < internals->nb_tx_queues; i++) { + internals->txq_scg[i].tx_pkts = 0; + internals->txq_scg[i].tx_bytes = 0; + internals->txq_scg[i].err_pkts = 0; + } + + p_nt4ga_stat->n_totals_reset_timestamp = time(NULL); + + pthread_mutex_unlock(&p_nt_drv->stat_lck); + + return 0; +} + +/* NOTE: please note the difference between ETH_SPEED_NUM_xxx and ETH_LINK_SPEED_xxx */ +static int nt_link_speed_to_eth_speed_num(enum nt_link_speed_e nt_link_speed) +{ + int eth_speed_num = ETH_SPEED_NUM_NONE; + + switch (nt_link_speed) { + case NT_LINK_SPEED_10M: + eth_speed_num = ETH_SPEED_NUM_10M; + break; + case NT_LINK_SPEED_100M: + eth_speed_num = ETH_SPEED_NUM_100M; + break; + case NT_LINK_SPEED_1G: + eth_speed_num = ETH_SPEED_NUM_1G; + break; + case NT_LINK_SPEED_10G: + eth_speed_num = ETH_SPEED_NUM_10G; + break; + case NT_LINK_SPEED_25G: + eth_speed_num = ETH_SPEED_NUM_25G; + break; + case NT_LINK_SPEED_40G: + eth_speed_num = ETH_SPEED_NUM_40G; + break; + case NT_LINK_SPEED_50G: + eth_speed_num = ETH_SPEED_NUM_50G; + break; + case NT_LINK_SPEED_100G: + eth_speed_num = ETH_SPEED_NUM_100G; + break; + default: + eth_speed_num = ETH_SPEED_NUM_NONE; + break; + } + + return eth_speed_num; +} + +static int nt_link_duplex_to_eth_duplex(enum nt_link_duplex_e nt_link_duplex) +{ + int eth_link_duplex = 0; + + switch (nt_link_duplex) { + case NT_LINK_DUPLEX_FULL: + eth_link_duplex = ETH_LINK_FULL_DUPLEX; + break; + case NT_LINK_DUPLEX_HALF: + eth_link_duplex = ETH_LINK_HALF_DUPLEX; + break; + case NT_LINK_DUPLEX_UNKNOWN: /* fall-through */ + default: + break; + } + return eth_link_duplex; +} + +static int eth_link_update(struct rte_eth_dev *eth_dev, + int wait_to_complete __rte_unused) +{ + struct pmd_internals *internals = + (struct pmd_internals *)eth_dev->data->dev_private; + const int n_intf_no = internals->if_index; + struct adapter_info_s *p_adapter_info = + &internals->p_drv->ntdrv.adapter_info; + + if (eth_dev->data->dev_started) { + if (internals->type == PORT_TYPE_VIRTUAL || + internals->type == PORT_TYPE_OVERRIDE) { + eth_dev->data->dev_link.link_status = + ((internals->vport_comm == + VIRT_PORT_NEGOTIATED_NONE) ? + ETH_LINK_DOWN : + ETH_LINK_UP); + eth_dev->data->dev_link.link_speed = ETH_SPEED_NUM_NONE; + eth_dev->data->dev_link.link_duplex = + ETH_LINK_FULL_DUPLEX; + return 0; + } + + const bool port_link_status = + nt4ga_port_get_link_status(p_adapter_info, n_intf_no); + eth_dev->data->dev_link.link_status = + port_link_status ? 
ETH_LINK_UP : ETH_LINK_DOWN; + + nt_link_speed_t port_link_speed = + nt4ga_port_get_link_speed(p_adapter_info, n_intf_no); + eth_dev->data->dev_link.link_speed = + nt_link_speed_to_eth_speed_num(port_link_speed); + + nt_link_duplex_t nt_link_duplex = + nt4ga_port_get_link_duplex(p_adapter_info, n_intf_no); + eth_dev->data->dev_link.link_duplex = + nt_link_duplex_to_eth_duplex(nt_link_duplex); + } else { + eth_dev->data->dev_link.link_status = ETH_LINK_DOWN; + eth_dev->data->dev_link.link_speed = ETH_SPEED_NUM_NONE; + eth_dev->data->dev_link.link_duplex = ETH_LINK_FULL_DUPLEX; + } + return 0; +} + +static int eth_stats_get(struct rte_eth_dev *eth_dev, + struct rte_eth_stats *stats) +{ + struct pmd_internals *internals = + (struct pmd_internals *)eth_dev->data->dev_private; + dpdk_stats_collect(internals, stats); + return 0; +} + +static int eth_stats_reset(struct rte_eth_dev *eth_dev) +{ + struct pmd_internals *internals = + (struct pmd_internals *)eth_dev->data->dev_private; + struct drv_s *p_drv = internals->p_drv; + struct ntdrv_4ga_s *p_nt_drv = &p_drv->ntdrv; + const int if_index = internals->if_index; + + dpdk_stats_reset(internals, p_nt_drv, if_index); + return 0; +} + +static uint32_t nt_link_speed_capa_to_eth_speed_capa(int nt_link_speed_capa) +{ + uint32_t eth_speed_capa = 0; + + if (nt_link_speed_capa & NT_LINK_SPEED_10M) + eth_speed_capa |= ETH_LINK_SPEED_10M; + if (nt_link_speed_capa & NT_LINK_SPEED_100M) + eth_speed_capa |= ETH_LINK_SPEED_100M; + if (nt_link_speed_capa & NT_LINK_SPEED_1G) + eth_speed_capa |= ETH_LINK_SPEED_1G; + if (nt_link_speed_capa & NT_LINK_SPEED_10G) + eth_speed_capa |= ETH_LINK_SPEED_10G; + if (nt_link_speed_capa & NT_LINK_SPEED_25G) + eth_speed_capa |= ETH_LINK_SPEED_25G; + if (nt_link_speed_capa & NT_LINK_SPEED_40G) + eth_speed_capa |= ETH_LINK_SPEED_40G; + if (nt_link_speed_capa & NT_LINK_SPEED_50G) + eth_speed_capa |= ETH_LINK_SPEED_50G; + if (nt_link_speed_capa & NT_LINK_SPEED_100G) + eth_speed_capa |= ETH_LINK_SPEED_100G; + + return eth_speed_capa; +} + +#define RTE_RSS_5TUPLE (ETH_RSS_IP | ETH_RSS_TCP | ETH_RSS_UDP) + +static int eth_dev_infos_get(struct rte_eth_dev *eth_dev, + struct rte_eth_dev_info *dev_info) +{ + struct pmd_internals *internals = + (struct pmd_internals *)eth_dev->data->dev_private; + const int n_intf_no = internals->if_index; + struct adapter_info_s *p_adapter_info = + &internals->p_drv->ntdrv.adapter_info; + + dev_info->if_index = internals->if_index; + dev_info->driver_name = internals->name; + dev_info->max_mac_addrs = NUM_MAC_ADDRS_PER_PORT; + dev_info->max_rx_pktlen = HW_MAX_PKT_LEN; + dev_info->max_mtu = MAX_MTU; + if (p_adapter_info->fpga_info.profile == FPGA_INFO_PROFILE_INLINE) + dev_info->min_mtu = MIN_MTU_INLINE; + + else + dev_info->min_mtu = MIN_MTU; + + if (internals->p_drv) { + dev_info->max_rx_queues = internals->nb_rx_queues; + dev_info->max_tx_queues = internals->nb_tx_queues; + + dev_info->min_rx_bufsize = 64; + + const uint32_t nt_port_speed_capa = + nt4ga_port_get_link_speed_capabilities(p_adapter_info, + n_intf_no); + dev_info->speed_capa = nt_link_speed_capa_to_eth_speed_capa(nt_port_speed_capa); + } + + dev_info->flow_type_rss_offloads = + RTE_RSS_5TUPLE | RTE_ETH_RSS_C_VLAN | + RTE_ETH_RSS_LEVEL_INNERMOST | RTE_ETH_RSS_L3_SRC_ONLY | + RTE_ETH_RSS_LEVEL_OUTERMOST | RTE_ETH_RSS_L3_DST_ONLY; + /* + * NT hashing algorithm doesn't use key, so it is just a fake key length to + * feet testpmd requirements. 
+ */ + dev_info->hash_key_size = 1; + + return 0; +} + +static __rte_always_inline int +copy_virtqueue_to_mbuf(struct rte_mbuf *mbuf, struct rte_mempool *mb_pool, + struct nthw_received_packets *hw_recv, int max_segs, + uint16_t data_len) +{ + int src_pkt = 0; + /* + * 1. virtqueue packets may be segmented + * 2. the mbuf size may be too small and may need to be segmented + */ + char *data = (char *)hw_recv->addr + SG_HDR_SIZE; + char *dst = (char *)mbuf->buf_addr + RTE_PKTMBUF_HEADROOM; + + /* set packet length */ + mbuf->pkt_len = data_len - SG_HDR_SIZE; + +#ifdef RX_MERGE_SEGMENT_DEBUG + void *dbg_src_start = hw_recv->addr; + void *dbg_dst_start = dst; +#endif + + int remain = mbuf->pkt_len; + /* First cpy_size is without header */ + int cpy_size = (data_len > SG_HW_RX_PKT_BUFFER_SIZE) ? + SG_HW_RX_PKT_BUFFER_SIZE - SG_HDR_SIZE : + remain; + + struct rte_mbuf *m = mbuf; /* if mbuf segmentation is needed */ + + while (++src_pkt <= max_segs) { + /* keep track of space in dst */ + int cpto_size = rte_pktmbuf_tailroom(m); + +#ifdef RX_MERGE_SEGMENT_DEBUG + printf("src copy size %i\n", cpy_size); +#endif + + if (cpy_size > cpto_size) { + int new_cpy_size = cpto_size; + +#ifdef RX_MERGE_SEGMENT_DEBUG + printf("Seg %i: mbuf first cpy src off 0x%" PRIX64 ", dst off 0x%" PRIX64 ", size %i\n", + mbuf->nb_segs - 1, + (uint64_t)data - (uint64_t)dbg_src_start, + (uint64_t)dst - (uint64_t)dbg_dst_start, + new_cpy_size); +#endif + rte_memcpy((void *)dst, (void *)data, new_cpy_size); + m->data_len += new_cpy_size; + remain -= new_cpy_size; + cpy_size -= new_cpy_size; + + data += new_cpy_size; + + /* + * Loop if remaining data from this virtqueue seg cannot fit in one extra + * mbuf + */ + do { + m->next = rte_pktmbuf_alloc(mb_pool); + if (unlikely(!m->next)) + return -1; + m = m->next; + + /* Headroom is not needed in chained mbufs */ + rte_pktmbuf_prepend(m, rte_pktmbuf_headroom(m)); + dst = (char *)m->buf_addr; + m->data_len = 0; + m->pkt_len = 0; + +#ifdef RX_MERGE_SEGMENT_DEBUG + dbg_dst_start = dst; +#endif + cpto_size = rte_pktmbuf_tailroom(m); + + int actual_cpy_size = (cpy_size > cpto_size) ? 
+ cpto_size : + cpy_size; +#ifdef RX_MERGE_SEGMENT_DEBUG + printf("new dst mbuf seg - size %i\n", + cpto_size); + printf("Seg %i: mbuf cpy src off 0x%" PRIX64 ", dst off 0x%" PRIX64 ", size %i\n", + mbuf->nb_segs, + (uint64_t)data - (uint64_t)dbg_src_start, + (uint64_t)dst - (uint64_t)dbg_dst_start, + actual_cpy_size); +#endif + + rte_memcpy((void *)dst, (void *)data, + actual_cpy_size); + m->pkt_len += actual_cpy_size; + m->data_len += actual_cpy_size; + + remain -= actual_cpy_size; + cpy_size -= actual_cpy_size; + + data += actual_cpy_size; + + mbuf->nb_segs++; + + } while (cpy_size && remain); + + } else { + /* all data from this virtqueue segment can fit in current mbuf */ +#ifdef RX_MERGE_SEGMENT_DEBUG + printf("Copy all into Seg %i: %i bytes, src off 0x%" PRIX64 + ", dst off 0x%" PRIX64 "\n", + mbuf->nb_segs - 1, cpy_size, + (uint64_t)data - (uint64_t)dbg_src_start, + (uint64_t)dst - (uint64_t)dbg_dst_start); +#endif + rte_memcpy((void *)dst, (void *)data, cpy_size); + m->data_len += cpy_size; + if (mbuf->nb_segs > 1) + m->pkt_len += cpy_size; + remain -= cpy_size; + } + + /* packet complete - all data from current virtqueue packet has been copied */ + if (remain == 0) + break; + /* increment dst to data end */ + dst = rte_pktmbuf_mtod_offset(m, char *, m->data_len); + /* prepare for next virtqueue segment */ + data = (char *)hw_recv[src_pkt] + .addr; /* following packets are full data */ + +#ifdef RX_MERGE_SEGMENT_DEBUG + dbg_src_start = data; +#endif + cpy_size = (remain > SG_HW_RX_PKT_BUFFER_SIZE) ? + SG_HW_RX_PKT_BUFFER_SIZE : + remain; +#ifdef RX_MERGE_SEGMENT_DEBUG + printf("next src buf\n"); +#endif + }; + + if (src_pkt > max_segs) { + NT_LOG(ERR, ETHDEV, + "Did not receive correct number of segment for a whole packet"); + return -1; + } + + return src_pkt; +} + +static uint16_t eth_dev_rx_scg(void *queue, struct rte_mbuf **bufs, + uint16_t nb_pkts) +{ + unsigned int i; + struct rte_mbuf *mbuf; + struct ntnic_rx_queue *rx_q = queue; + uint16_t num_rx = 0; + + struct nthw_received_packets hw_recv[MAX_RX_PACKETS]; + + if (kill_pmd) + return 0; + +#ifdef DEBUG_PRINT_APPROX_CPU_LOAD + dbg_print_approx_cpu_load_rx_called(rx_q->port); +#endif + + if (unlikely(nb_pkts == 0)) + return 0; + + if (nb_pkts > MAX_RX_PACKETS) + nb_pkts = MAX_RX_PACKETS; + + uint16_t whole_pkts; + uint16_t hw_recv_pkt_segs = + nthw_get_rx_packets(rx_q->vq, nb_pkts, hw_recv, &whole_pkts); + + if (!hw_recv_pkt_segs) { +#ifdef DEBUG_PRINT_APPROX_CPU_LOAD + dbg_print_approx_cpu_load_rx_done(rx_q->port, 0); +#endif + + return 0; + } + +#ifdef NT_DEBUG_STAT + dbg_rx_queue(rx_q, + hw_recv_pkt_segs); /* _update debug statistics with new rx packet count */ +#endif + + nb_pkts = whole_pkts; + +#ifdef RX_MERGE_SEGMENT_DEBUG + printf("\n---------- DPDK Rx ------------\n"); + printf("[Port %i] Pkts recv %i on hw queue index %i: tot segs %i, " + "vq buf %i, vq header size %i\n", + rx_q->port, nb_pkts, rx_q->queue.hw_id, hw_recv_pkt_segs, + SG_HW_RX_PKT_BUFFER_SIZE, SG_HDR_SIZE); +#endif + + int src_pkt = 0; /* from 0 to hw_recv_pkt_segs */ + + for (i = 0; i < nb_pkts; i++) { + bufs[i] = rte_pktmbuf_alloc(rx_q->mb_pool); + if (!bufs[i]) { + printf("ERROR - no more buffers mbuf in mempool\n"); + goto err_exit; + } + mbuf = bufs[i]; + + struct _pkt_hdr_rx *phdr = + (struct _pkt_hdr_rx *)hw_recv[src_pkt].addr; + +#ifdef RX_MERGE_SEGMENT_DEBUG + printf("\nRx pkt #%i: vq pkt len %i, segs %i -> mbuf size %i, headroom size %i\n", + i, phdr->cap_len - SG_HDR_SIZE, + (phdr->cap_len + SG_HW_RX_PKT_BUFFER_SIZE - 1) / + 
SG_HW_RX_PKT_BUFFER_SIZE, + rte_pktmbuf_tailroom(mbuf), rte_pktmbuf_headroom(mbuf)); +#endif + +#ifdef RX_SRC_DUMP_PKTS_DEBUG + { + int d, _segs = (phdr->cap_len + + SG_HW_RX_PKT_BUFFER_SIZE - 1) / + SG_HW_RX_PKT_BUFFER_SIZE; + int _size = phdr->cap_len; + + printf("Rx packet dump: pkt #%i hdr rx port %i, pkt len %i, segs %i\n", + i, phdr->port, phdr->cap_len - SG_HDR_SIZE, + _segs); + for (d = 0; d < _segs; d++) { + printf("Dump seg %i:\n", d); + dump_packet_seg("Vq seg:", hw_recv[src_pkt + d].addr, + _size > SG_HW_RX_PKT_BUFFER_SIZE ? + SG_HW_RX_PKT_BUFFER_SIZE : + _size); + _size -= SG_HW_RX_PKT_BUFFER_SIZE; + } + } +#endif + + if (phdr->cap_len < SG_HDR_SIZE) { + printf("Pkt len of zero received. No header!! - dropping packets\n"); + rte_pktmbuf_free(mbuf); + goto err_exit; + } + + { + if (phdr->cap_len <= SG_HW_RX_PKT_BUFFER_SIZE && + (phdr->cap_len - SG_HDR_SIZE) <= + rte_pktmbuf_tailroom(mbuf)) { +#ifdef RX_MERGE_SEGMENT_DEBUG + printf("Simple copy vq -> mbuf %p size %i\n", + rte_pktmbuf_mtod(mbuf, void *), + phdr->cap_len); +#endif + mbuf->data_len = phdr->cap_len - SG_HDR_SIZE; + rte_memcpy(rte_pktmbuf_mtod(mbuf, char *), + (char *)hw_recv[src_pkt].addr + + SG_HDR_SIZE, + mbuf->data_len); + + mbuf->pkt_len = mbuf->data_len; + src_pkt++; + } else { + int cpy_segs = copy_virtqueue_to_mbuf(mbuf, rx_q->mb_pool, + &hw_recv[src_pkt], + hw_recv_pkt_segs - src_pkt, + phdr->cap_len); + if (cpy_segs < 0) { + /* Error */ + rte_pktmbuf_free(mbuf); + goto err_exit; + } + src_pkt += cpy_segs; + } + +#ifdef RX_DST_DUMP_PKTS_DEBUG + { + struct rte_mbuf *m = mbuf; + + printf("\nRx final mbuf:\n"); + for (int ii = 0; m && ii < m->nb_segs; ii++) { + printf(" seg %i len %i\n", ii, + m->data_len); + printf(" seg dump:\n"); + dump_packet_seg("mbuf seg:", + rte_pktmbuf_mtod(m, uint8_t *), + m->data_len); + m = m->next; + } + } +#endif + + num_rx++; + + mbuf->ol_flags &= + ~(RTE_MBUF_F_RX_FDIR_ID | RTE_MBUF_F_RX_FDIR); + mbuf->port = (uint16_t)-1; + + if (phdr->color_type == 0) { + if (phdr->port >= VIRTUAL_TUNNEL_PORT_OFFSET && + ((phdr->color >> 24) == 0x02)) { + /* VNI in color of descriptor add port as well */ + mbuf->hash.fdir.hi = + ((uint32_t)phdr->color & + 0xffffff) | + ((uint32_t)phdr->port + << 24); + mbuf->hash.fdir.lo = + (uint32_t)phdr->fid; + mbuf->ol_flags |= RTE_MBUF_F_RX_FDIR_ID; + + NT_LOG(DBG, ETHDEV, + "POP'ed packet received that missed on inner match. color = %08x, port %i, tunnel-match flow stat id %i", + phdr->color, phdr->port, + phdr->fid); + } + + } else { + if (phdr->color) { + mbuf->hash.fdir.hi = + phdr->color & + (NT_MAX_COLOR_FLOW_STATS - 1); + mbuf->ol_flags |= + RTE_MBUF_F_RX_FDIR_ID | + RTE_MBUF_F_RX_FDIR; + } + } + } + } + +err_exit: + nthw_release_rx_packets(rx_q->vq, hw_recv_pkt_segs); + +#ifdef DEBUG_PRINT_APPROX_CPU_LOAD + dbg_print_approx_cpu_load_rx_done(rx_q->port, num_rx); +#endif + +#ifdef RX_MERGE_SEGMENT_DEBUG + /* + * When the application double frees a mbuf, it will become a doublet in the memory pool + * This is obvious a bug in application, but can be verified here to some extend at least + */ + uint64_t addr = (uint64_t)bufs[0]->buf_addr; + + for (int i = 1; i < num_rx; i++) { + if (bufs[i]->buf_addr == addr) { + printf("Duplicate packet addresses! 
num_rx %i\n", + num_rx); + for (int ii = 0; ii < num_rx; ii++) { + printf("bufs[%i]->buf_addr %p\n", ii, + bufs[ii]->buf_addr); + } + } + } +#endif + + return num_rx; +} + +int copy_mbuf_to_virtqueue(struct nthw_cvirtq_desc *cvq_desc, + uint16_t vq_descr_idx, + struct nthw_memory_descriptor *vq_bufs, int max_segs, + struct rte_mbuf *mbuf) +{ + /* + * 1. mbuf packet may be segmented + * 2. the virtqueue buffer size may be too small and may need to be segmented + */ + + char *data = rte_pktmbuf_mtod(mbuf, char *); + char *dst = (char *)vq_bufs[vq_descr_idx].virt_addr + SG_HDR_SIZE; + + int remain = mbuf->pkt_len; + int cpy_size = mbuf->data_len; + +#ifdef CPY_MBUF_TO_VQUEUE_DEBUG + printf("src copy size %i\n", cpy_size); +#endif + + struct rte_mbuf *m = mbuf; + int cpto_size = SG_HW_TX_PKT_BUFFER_SIZE - SG_HDR_SIZE; + + cvq_desc->b[vq_descr_idx].len = SG_HDR_SIZE; + + int cur_seg_num = 0; /* start from 0 */ + + while (m) { + /* Can all data in current src segment be in current dest segment */ + if (cpy_size > cpto_size) { + int new_cpy_size = cpto_size; + +#ifdef CPY_MBUF_TO_VQUEUE_DEBUG + printf("Seg %i: virtq buf first cpy src offs %u, dst offs 0x%" PRIX64 ", size %i\n", + cur_seg_num, + (uint64_t)data - rte_pktmbuf_mtod(m, uint64_t), + (uint64_t)dst - + (uint64_t)vq_bufs[vq_descr_idx].virt_addr, + new_cpy_size); +#endif + rte_memcpy((void *)dst, (void *)data, new_cpy_size); + + cvq_desc->b[vq_descr_idx].len += new_cpy_size; + + remain -= new_cpy_size; + cpy_size -= new_cpy_size; + + data += new_cpy_size; + + /* + * Loop if remaining data from this virtqueue seg cannot fit in one extra + * mbuf + */ + do { + vq_add_flags(cvq_desc, vq_descr_idx, + VIRTQ_DESC_F_NEXT); + + int next_vq_descr_idx = + VIRTQ_DESCR_IDX_NEXT(vq_descr_idx); + + vq_set_next(cvq_desc, vq_descr_idx, + next_vq_descr_idx); + + vq_descr_idx = next_vq_descr_idx; + + vq_set_flags(cvq_desc, vq_descr_idx, 0); + vq_set_next(cvq_desc, vq_descr_idx, 0); + + if (++cur_seg_num > max_segs) + break; + + dst = (char *)vq_bufs[vq_descr_idx].virt_addr; + cpto_size = SG_HW_TX_PKT_BUFFER_SIZE; + + int actual_cpy_size = (cpy_size > cpto_size) ? 
+ cpto_size : + cpy_size; +#ifdef CPY_MBUF_TO_VQUEUE_DEBUG + printf("Tx vq buf seg %i: virtq cpy %i - offset 0x%" PRIX64 "\n", + cur_seg_num, actual_cpy_size, + (uint64_t)dst - + (uint64_t)vq_bufs[vq_descr_idx] + .virt_addr); +#endif + rte_memcpy((void *)dst, (void *)data, + actual_cpy_size); + + cvq_desc->b[vq_descr_idx].len = actual_cpy_size; + + remain -= actual_cpy_size; + cpy_size -= actual_cpy_size; + cpto_size -= actual_cpy_size; + + data += actual_cpy_size; + + } while (cpy_size && remain); + + } else { + /* All data from this segment can fit in current virtqueue buffer */ +#ifdef CPY_MBUF_TO_VQUEUE_DEBUG + printf("Tx vq buf seg %i: Copy %i bytes - offset %u\n", + cur_seg_num, cpy_size, + (uint64_t)dst - + (uint64_t)vq_bufs[vq_descr_idx].virt_addr); +#endif + + rte_memcpy((void *)dst, (void *)data, cpy_size); + + cvq_desc->b[vq_descr_idx].len += cpy_size; + + remain -= cpy_size; + cpto_size -= cpy_size; + } + + /* Packet complete - all segments from current mbuf has been copied */ + if (remain == 0) + break; + /* increment dst to data end */ + dst = (char *)vq_bufs[vq_descr_idx].virt_addr + + cvq_desc->b[vq_descr_idx].len; + + m = m->next; + if (!m) { + NT_LOG(ERR, ETHDEV, "ERROR: invalid packet size\n"); + break; + } + + /* Prepare for next mbuf segment */ + data = rte_pktmbuf_mtod(m, char *); + cpy_size = m->data_len; + }; + + cur_seg_num++; + if (cur_seg_num > max_segs) { + NT_LOG(ERR, ETHDEV, + "Did not receive correct number of segment for a whole packet"); + return -1; + } + + return cur_seg_num; +} + +static uint16_t eth_dev_tx_scg(void *queue, struct rte_mbuf **bufs, + uint16_t nb_pkts) +{ + uint16_t pkt; + uint16_t first_vq_descr_idx = 0; + + struct nthw_cvirtq_desc cvq_desc; + + struct nthw_memory_descriptor *vq_bufs; + + struct ntnic_tx_queue *tx_q = queue; + + int nb_segs = 0, i; + int pkts_sent = 0; + uint16_t nb_segs_arr[MAX_TX_PACKETS]; + + if (kill_pmd) + return 0; + + if (nb_pkts > MAX_TX_PACKETS) + nb_pkts = MAX_TX_PACKETS; + +#ifdef TX_CHAINING_DEBUG + printf("\n---------- DPDK Tx ------------\n"); +#endif + + /* + * count all segments needed to contain all packets in vq buffers + */ + for (i = 0; i < nb_pkts; i++) { + if (bufs[i]->pkt_len < 60) { + bufs[i]->pkt_len = 60; + bufs[i]->data_len = 60; + } + + /* build the num segments array for segmentation control and release function */ + int vq_segs = NUM_VQ_SEGS(bufs[i]->pkt_len); + + nb_segs_arr[i] = vq_segs; + nb_segs += vq_segs; + } + if (!nb_segs) + goto exit_out; + +#ifdef TX_CHAINING_DEBUG + printf("[Port %i] Mbufs for Tx: tot segs %i, packets %i, mbuf size %i, headroom size %i\n", + tx_q->port, nb_segs, nb_pkts, + bufs[0]->buf_len - rte_pktmbuf_headroom(bufs[0]), + rte_pktmbuf_headroom(bufs[0])); +#endif + + int got_nb_segs = + nthw_get_tx_buffers(tx_q->vq, nb_segs, &first_vq_descr_idx, + &cvq_desc /*&vq_descr,*/, &vq_bufs); + if (!got_nb_segs) { +#ifdef TX_CHAINING_DEBUG + printf("Zero segments got - back pressure from HW\n"); +#endif + goto exit_out; + } + + /* + * we may get less vq buffers than we have asked for + * calculate last whole packet that can fit into what + * we have got + */ + while (got_nb_segs < nb_segs) { + if (!--nb_pkts) + goto exit_out; + nb_segs -= NUM_VQ_SEGS(bufs[nb_pkts]->pkt_len); + if (nb_segs <= 0) + goto exit_out; + } + + /* + * nb_pkts & nb_segs, got it all, ready to copy + */ + int seg_idx = 0; + int last_seg_idx = seg_idx; + + for (pkt = 0; pkt < nb_pkts; ++pkt) { + uint16_t vq_descr_idx = VIRTQ_DESCR_IDX(seg_idx); + + vq_set_flags(&cvq_desc, vq_descr_idx, 0); + 
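[Editorial note] eth_dev_tx_scg() first converts each mbuf into a count of Tx virtqueue buffers (NUM_VQ_SEGS) so it can request exactly that many descriptors from nthw_get_tx_buffers(). The macro itself is defined elsewhere in the driver; a plausible sketch of the calculation, assuming the first buffer also carries the SG_HDR_SIZE header and every buffer holds SG_HW_TX_PKT_BUFFER_SIZE bytes (both values assumed here):

#include <stdint.h>

/* Assumed geometry -- the real values come from the driver's headers. */
#define SG_HDR_SIZE              16U
#define SG_HW_TX_PKT_BUFFER_SIZE 4096U

/* Sketch of a NUM_VQ_SEGS-style calculation: header plus payload, rounded up
 * to whole Tx virtqueue buffers (always at least one buffer per packet).
 */
static inline uint16_t num_vq_segs(uint32_t pkt_len)
{
	return (uint16_t)((pkt_len + SG_HDR_SIZE + SG_HW_TX_PKT_BUFFER_SIZE - 1) /
			  SG_HW_TX_PKT_BUFFER_SIZE);
}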
vq_set_next(&cvq_desc, vq_descr_idx, 0); + + struct _pkt_hdr_tx *hdr_tx = + (struct _pkt_hdr_tx *)vq_bufs[vq_descr_idx].virt_addr; + /* Set the header to all zeros */ + memset(hdr_tx, 0, SG_HDR_SIZE); + + /* + * Set the NT DVIO0 header fields + * + * Applicable for Vswitch only. + * For other product types the header values are "don't care" and we leave them as + * all zeros. + */ + if (tx_q->profile == FPGA_INFO_PROFILE_VSWITCH) { + hdr_tx->bypass_port = tx_q->target_id; + + /* set packet length */ + hdr_tx->cap_len = bufs[pkt]->pkt_len + SG_HDR_SIZE; + } + +#ifdef TX_CHAINING_DEBUG + printf("\nTx pkt #%i: pkt segs %i, pkt len %i -> vq buf size %i, vq header size %i\n", + pkt, bufs[pkt]->nb_segs, bufs[pkt]->pkt_len, + SG_HW_TX_PKT_BUFFER_SIZE, SG_HDR_SIZE); + +#ifdef TX_SRC_DUMP_PKTS_DEBUG + { + struct rte_mbuf *m = bufs[pkt]; + int ii; + + printf("Dump src mbuf:\n"); + for (ii = 0; ii < bufs[pkt]->nb_segs; ii++) { + printf(" seg %i len %i\n", ii, m->data_len); + printf(" seg dump:\n"); + dump_packet_seg("mbuf seg:", + rte_pktmbuf_mtod(m, uint8_t *), + m->data_len); + m = m->next; + } + } +#endif + +#endif + + if (bufs[pkt]->nb_segs == 1 && nb_segs_arr[pkt] == 1) { +#ifdef TX_CHAINING_DEBUG + printf("Simple copy %i bytes - mbuf -> vq\n", + bufs[pkt]->pkt_len); +#endif + rte_memcpy((void *)((char *)vq_bufs[vq_descr_idx].virt_addr + + SG_HDR_SIZE), + rte_pktmbuf_mtod(bufs[pkt], void *), + bufs[pkt]->pkt_len); + + cvq_desc.b[vq_descr_idx].len = + bufs[pkt]->pkt_len + SG_HDR_SIZE; + + seg_idx++; + } else { + int cpy_segs = copy_mbuf_to_virtqueue(&cvq_desc, + vq_descr_idx, vq_bufs, + nb_segs - last_seg_idx, bufs[pkt]); + if (cpy_segs < 0) + break; + seg_idx += cpy_segs; + } + +#ifdef TX_DST_DUMP_PKTS_DEBUG + int d, tot_size = 0; + + for (d = last_seg_idx; d < seg_idx; d++) + tot_size += cvq_desc.b[VIRTQ_DESCR_IDX(d)].len; + printf("\nDump final Tx vq pkt %i, size %i, tx port %i, bypass id %i, using hw queue index %i\n", + pkt, tot_size, tx_q->port, hdr_tx->bypass_port, + tx_q->queue.hw_id); + for (d = last_seg_idx; d < seg_idx; d++) { + char str[32]; + + sprintf(str, "Vq seg %i:", d - last_seg_idx); + dump_packet_seg(str, + vq_bufs[VIRTQ_DESCR_IDX(d)].virt_addr, + cvq_desc.b[VIRTQ_DESCR_IDX(d)].len); + } +#endif + + last_seg_idx = seg_idx; + rte_pktmbuf_free(bufs[pkt]); + pkts_sent++; + } + +#ifdef TX_CHAINING_DEBUG + printf("\nTx final vq setup:\n"); + for (int i = 0; i < nb_segs; i++) { + int idx = VIRTQ_DESCR_IDX(i); + + if (cvq_desc.vq_type == SPLIT_RING) { + printf("virtq descr %i, len %i, flags %04x, next %i\n", + idx, cvq_desc.b[idx].len, cvq_desc.s[idx].flags, + cvq_desc.s[idx].next); + } + } +#endif + +exit_out: + + if (pkts_sent) { +#ifdef TX_CHAINING_DEBUG + printf("Release virtq segs %i\n", nb_segs); +#endif + nthw_release_tx_buffers(tx_q->vq, pkts_sent, nb_segs_arr); + } + return pkts_sent; +} + +static int allocate_hw_virtio_queues(struct rte_eth_dev *eth_dev, int vf_num, + struct hwq_s *hwq, int num_descr, + int buf_size) +{ + int i, res; + uint32_t size; + uint64_t iova_addr; + + NT_LOG(DBG, ETHDEV, + "***** Configure IOMMU for HW queues on VF %i *****\n", vf_num); + + /* Just allocate 1MB to hold all combined descr rings */ + uint64_t tot_alloc_size = 0x100000 + buf_size * num_descr; + + void *virt = rte_malloc_socket("VirtQDescr", tot_alloc_size, + ALIGN_SIZE(tot_alloc_size), + eth_dev->data->numa_node); + if (!virt) + return -1; + + uint64_t gp_offset = (uint64_t)virt & ONE_G_MASK; + rte_iova_t hpa = rte_malloc_virt2iova(virt); + + NT_LOG(DBG, ETHDEV, + "Allocated virtio 
descr rings : virt %p [0x%" PRIX64 + "], hpa %p [0x%" PRIX64 "]\n", + virt, gp_offset, hpa, hpa & ONE_G_MASK); + + /* + * Same offset on both HPA and IOVA + * Make sure 1G boundary is never crossed + */ + if (((hpa & ONE_G_MASK) != gp_offset) || + (((uint64_t)virt + tot_alloc_size) & ~ONE_G_MASK) != + ((uint64_t)virt & ~ONE_G_MASK)) { + NT_LOG(ERR, ETHDEV, + "*********************************************************\n"); + NT_LOG(ERR, ETHDEV, + "ERROR, no optimal IOMMU mapping available hpa : %016lx (%016lx), gp_offset : %016lx size %u\n", + hpa, hpa & ONE_G_MASK, gp_offset, tot_alloc_size); + NT_LOG(ERR, ETHDEV, + "*********************************************************\n"); + + rte_free(virt); + + /* Just allocate 1MB to hold all combined descr rings */ + size = 0x100000; + void *virt = rte_malloc_socket("VirtQDescr", size, 4096, + eth_dev->data->numa_node); + if (!virt) + return -1; + + res = nt_vfio_dma_map(vf_num, virt, &iova_addr, size); + + NT_LOG(DBG, ETHDEV, "VFIO MMAP res %i, vf_num %i\n", res, + vf_num); + if (res != 0) + return -1; + + hwq->vf_num = vf_num; + hwq->virt_queues_ctrl.virt_addr = virt; + hwq->virt_queues_ctrl.phys_addr = (void *)iova_addr; + hwq->virt_queues_ctrl.len = size; + + NT_LOG(DBG, ETHDEV, + "Allocated for virtio descr rings combined 1MB : %p, IOVA %016lx\n", + virt, iova_addr); + + size = num_descr * sizeof(struct nthw_memory_descriptor); + hwq->pkt_buffers = rte_zmalloc_socket("rx_pkt_buffers", size, + 64, eth_dev->data->numa_node); + if (!hwq->pkt_buffers) { + NT_LOG(ERR, ETHDEV, + "Failed to allocated buffer array for hw-queue %p, " + "total size %i, elements %i\n", + hwq->pkt_buffers, size, num_descr); + rte_free(virt); + return -1; + } + + size = buf_size * num_descr; + void *virt_addr = rte_malloc_socket("pkt_buffer_pkts", size, + 4096, + eth_dev->data->numa_node); + if (!virt_addr) { + NT_LOG(ERR, ETHDEV, + "Failed allocate packet buffers for hw-queue %p, " + "buf size %i, elements %i\n", + hwq->pkt_buffers, buf_size, num_descr); + rte_free(hwq->pkt_buffers); + rte_free(virt); + return -1; + } + + res = nt_vfio_dma_map(vf_num, virt_addr, &iova_addr, size); + + NT_LOG(DBG, ETHDEV, + "VFIO MMAP res %i, virt %p, iova %016lx, vf_num %i, num " + "pkt bufs %i, tot size %i\n", + res, virt_addr, iova_addr, vf_num, num_descr, size); + + if (res != 0) + return -1; + + for (i = 0; i < num_descr; i++) { + hwq->pkt_buffers[i].virt_addr = + (void *)((char *)virt_addr + + ((uint64_t)(i) * buf_size)); + hwq->pkt_buffers[i].phys_addr = + (void *)(iova_addr + ((uint64_t)(i) * buf_size)); + hwq->pkt_buffers[i].len = buf_size; + } + + return 0; + } /* End of: no optimal IOMMU mapping available */ + + res = nt_vfio_dma_map(vf_num, virt, &iova_addr, ONE_G_SIZE); + if (res != 0) { + NT_LOG(ERR, ETHDEV, "VFIO MMAP FAILED! 
res %i, vf_num %i\n", + res, vf_num); + return -1; + } + + hwq->vf_num = vf_num; + hwq->virt_queues_ctrl.virt_addr = virt; + hwq->virt_queues_ctrl.phys_addr = (void *)(iova_addr); + hwq->virt_queues_ctrl.len = 0x100000; + iova_addr += 0x100000; + + NT_LOG(DBG, ETHDEV, + "VFIO MMAP: virt_addr=%" PRIX64 " phys_addr=%" PRIX64 + " size=%" PRIX64 " hpa=%" PRIX64 "\n", + hwq->virt_queues_ctrl.virt_addr, hwq->virt_queues_ctrl.phys_addr, + hwq->virt_queues_ctrl.len, + rte_malloc_virt2iova(hwq->virt_queues_ctrl.virt_addr)); + + size = num_descr * sizeof(struct nthw_memory_descriptor); + hwq->pkt_buffers = rte_zmalloc_socket("rx_pkt_buffers", size, 64, + eth_dev->data->numa_node); + if (!hwq->pkt_buffers) { + NT_LOG(ERR, ETHDEV, + "Failed to allocated buffer array for hw-queue %p, total size %i, elements %i\n", + hwq->pkt_buffers, size, num_descr); + rte_free(virt); + return -1; + } + + void *virt_addr = (void *)((uint64_t)virt + 0x100000); + + for (i = 0; i < num_descr; i++) { + hwq->pkt_buffers[i].virt_addr = + (void *)((char *)virt_addr + ((uint64_t)(i) * buf_size)); + hwq->pkt_buffers[i].phys_addr = + (void *)(iova_addr + ((uint64_t)(i) * buf_size)); + hwq->pkt_buffers[i].len = buf_size; + } + return 0; +} + +static void release_hw_virtio_queues(struct hwq_s *hwq) +{ + if (!hwq || hwq->vf_num == 0) + return; + hwq->vf_num = 0; +} + +static int deallocate_hw_virtio_queues(struct hwq_s *hwq) +{ + int vf_num = hwq->vf_num; + + void *virt = hwq->virt_queues_ctrl.virt_addr; + + int res = nt_vfio_dma_unmap(vf_num, hwq->virt_queues_ctrl.virt_addr, + (uint64_t)hwq->virt_queues_ctrl.phys_addr, + ONE_G_SIZE); + if (res != 0) { + NT_LOG(ERR, ETHDEV, "VFIO UNMMAP FAILED! res %i, vf_num %i\n", + res, vf_num); + return -1; + } + + release_hw_virtio_queues(hwq); + rte_free(hwq->pkt_buffers); + rte_free(virt); + return 0; +} + +static void eth_tx_queue_release(struct rte_eth_dev *dev, uint16_t queue_id) +{ + struct pmd_internals *internals = dev->data->dev_private; + struct ntnic_tx_queue *tx_q = &internals->txq_scg[queue_id]; + + deallocate_hw_virtio_queues(&tx_q->hwq); + NT_LOG(DBG, ETHDEV, "NTNIC: %s\n", __func__); +} + +static void eth_rx_queue_release(struct rte_eth_dev *dev, uint16_t queue_id) +{ + struct pmd_internals *internals = dev->data->dev_private; + struct ntnic_rx_queue *rx_q = &internals->rxq_scg[queue_id]; + + deallocate_hw_virtio_queues(&rx_q->hwq); + NT_LOG(DBG, ETHDEV, "NTNIC: %s\n", __func__); +} + +static int num_queues_allocated; + +/* Returns num queue starting at returned queue num or -1 on fail */ +static int allocate_queue(int num) +{ + int next_free = num_queues_allocated; + + NT_LOG(DBG, ETHDEV, + "%s: num_queues_allocated=%u, New queues=%u, Max queues=%u\n", + __func__, num_queues_allocated, num, MAX_TOTAL_QUEUES); + if (num_queues_allocated + num > MAX_TOTAL_QUEUES) + return -1; + num_queues_allocated += num; + return next_free; +} + +static int +eth_rx_scg_queue_setup(struct rte_eth_dev *eth_dev, uint16_t rx_queue_id, + uint16_t nb_rx_desc __rte_unused, + unsigned int socket_id __rte_unused, + const struct rte_eth_rxconf *rx_conf __rte_unused, + struct rte_mempool *mb_pool) +{ + NT_LOG(DBG, ETHDEV, "%s: [%s:%u]\n", __func__, __func__, __LINE__); + struct rte_pktmbuf_pool_private *mbp_priv; + struct pmd_internals *internals = eth_dev->data->dev_private; + struct ntnic_rx_queue *rx_q = &internals->rxq_scg[rx_queue_id]; + struct drv_s *p_drv = internals->p_drv; + struct ntdrv_4ga_s *p_nt_drv = &p_drv->ntdrv; + + if (internals->type == PORT_TYPE_OVERRIDE) { + rx_q->mb_pool = mb_pool; 
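[Editorial note] allocate_hw_virtio_queues() above depends on the combined control area and packet buffers being reachable through a single 1 GB IOMMU mapping: the physical address must sit at the same offset inside its 1 GB window as the virtual address, and the allocation must not cross into the next window, otherwise the code falls back to two separate VFIO mappings. A condensed sketch of that test, reusing the driver's ONE_G_SIZE/ONE_G_MASK definitions (helper name is illustrative only):

#include <stdbool.h>
#include <stdint.h>

#define ONE_G_SIZE 0x40000000ULL
#define ONE_G_MASK (ONE_G_SIZE - 1)

/* Sketch of the "optimal IOMMU mapping" check: virtual and physical addresses
 * share the same offset within a 1 GB window, and the allocation does not
 * cross a 1 GB boundary.
 */
static bool mapping_is_optimal(uint64_t virt, uint64_t hpa, uint64_t size)
{
	const uint64_t gp_offset = virt & ONE_G_MASK;

	if ((hpa & ONE_G_MASK) != gp_offset)
		return false;	/* offsets differ */
	if (((virt + size) & ~ONE_G_MASK) != (virt & ~ONE_G_MASK))
		return false;	/* allocation crosses a 1 GB boundary */
	return true;
}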
+ eth_dev->data->rx_queues[rx_queue_id] = rx_q; + mbp_priv = rte_mempool_get_priv(rx_q->mb_pool); + rx_q->buf_size = (uint16_t)(mbp_priv->mbuf_data_room_size - + RTE_PKTMBUF_HEADROOM); + rx_q->enabled = 1; + return 0; + } + + NT_LOG(DBG, ETHDEV, + "(%i) NTNIC RX OVS-SW queue setup: queue id %i, hw queue index %i\n", + internals->port, rx_queue_id, rx_q->queue.hw_id); + + rx_q->mb_pool = mb_pool; + + eth_dev->data->rx_queues[rx_queue_id] = rx_q; + + mbp_priv = rte_mempool_get_priv(rx_q->mb_pool); + rx_q->buf_size = (uint16_t)(mbp_priv->mbuf_data_room_size - + RTE_PKTMBUF_HEADROOM); + rx_q->enabled = 1; + + if (allocate_hw_virtio_queues(eth_dev, EXCEPTION_PATH_HID, &rx_q->hwq, + SG_NB_HW_RX_DESCRIPTORS, + SG_HW_RX_PKT_BUFFER_SIZE) < 0) + return -1; + + rx_q->nb_hw_rx_descr = SG_NB_HW_RX_DESCRIPTORS; + + rx_q->profile = p_drv->ntdrv.adapter_info.fpga_info.profile; + + rx_q->vq = nthw_setup_managed_rx_virt_queue(p_nt_drv->adapter_info.fpga_info.mp_nthw_dbs, + rx_q->queue.hw_id, /* index */ + rx_q->nb_hw_rx_descr, EXCEPTION_PATH_HID, /* host_id */ + 1, /* header NT DVIO header for exception path */ + &rx_q->hwq.virt_queues_ctrl, rx_q->hwq.pkt_buffers, SPLIT_RING, -1); + + NT_LOG(DBG, ETHDEV, "(%i) NTNIC RX OVS-SW queues successfully setup\n", + internals->port); + + return 0; +} + +static int +eth_tx_scg_queue_setup(struct rte_eth_dev *eth_dev, uint16_t tx_queue_id, + uint16_t nb_tx_desc __rte_unused, + unsigned int socket_id __rte_unused, + const struct rte_eth_txconf *tx_conf __rte_unused) +{ + NT_LOG(DBG, ETHDEV, "%s: [%s:%u]\n", __func__, __func__, __LINE__); + struct pmd_internals *internals = eth_dev->data->dev_private; + struct drv_s *p_drv = internals->p_drv; + struct ntdrv_4ga_s *p_nt_drv = &p_drv->ntdrv; + struct ntnic_tx_queue *tx_q = &internals->txq_scg[tx_queue_id]; + + if (internals->type == PORT_TYPE_OVERRIDE) { + eth_dev->data->tx_queues[tx_queue_id] = tx_q; + return 0; + } + + NT_LOG(DBG, ETHDEV, + "(%i) NTNIC TX OVS-SW queue setup: queue id %i, hw queue index %i\n", + tx_q->port, tx_queue_id, tx_q->queue.hw_id); + + if (tx_queue_id > internals->nb_tx_queues) { + printf("Error invalid tx queue id\n"); + return -1; + } + + eth_dev->data->tx_queues[tx_queue_id] = tx_q; + + /* Calculate target ID for HW - to be used in NTDVIO0 header bypass_port */ + if (tx_q->rss_target_id >= 0) { + /* bypass to a multiqueue port - qsl-hsh index */ + tx_q->target_id = tx_q->rss_target_id + 0x90; + } else { + if (internals->vpq[tx_queue_id].hw_id > -1) { + /* virtual port - queue index */ + tx_q->target_id = internals->vpq[tx_queue_id].hw_id; + } else { + /* Phy port - phy port identifier */ + if (lag_active) { + /* If in LAG mode use bypass 0x90 mode */ + tx_q->target_id = 0x90; + } else { + /* output/bypass to MAC */ + tx_q->target_id = (int)(tx_q->port + 0x80); + } + } + } + + if (allocate_hw_virtio_queues(eth_dev, EXCEPTION_PATH_HID, &tx_q->hwq, + SG_NB_HW_TX_DESCRIPTORS, + SG_HW_TX_PKT_BUFFER_SIZE) < 0) + return -1; + + tx_q->nb_hw_tx_descr = SG_NB_HW_TX_DESCRIPTORS; + + tx_q->profile = p_drv->ntdrv.adapter_info.fpga_info.profile; + + uint32_t port, header; + + if (tx_q->profile == FPGA_INFO_PROFILE_VSWITCH) { + /* transmit port - not used in vswitch enabled mode - using bypass */ + port = 0; + header = 1; /* header type DVIO0 Always for exception path */ + } else { + port = tx_q->port; /* transmit port */ + header = 0; /* header type VirtIO-Net */ + } + /* + * in_port - in vswitch mode has to move tx port from OVS excep. Away + * from VM tx port, because of QoS is matched by port id! 
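+	 * That is why the managed tx virt queue below is set up with
+	 * tx_q->port + 128 rather than the plain tx port.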
+ */ + tx_q->vq = nthw_setup_managed_tx_virt_queue(p_nt_drv->adapter_info.fpga_info.mp_nthw_dbs, + tx_q->queue.hw_id, /* index */ + tx_q->nb_hw_tx_descr, /* queue size */ + EXCEPTION_PATH_HID, /* host_id always VF4 */ + port, + tx_q->port + + 128, + header, &tx_q->hwq.virt_queues_ctrl, tx_q->hwq.pkt_buffers, + SPLIT_RING, -1, IN_ORDER); + + tx_q->enabled = 1; + for (uint32_t i = 0; i < internals->vpq_nb_vq; i++) { + nthw_epp_set_queue_to_vport(p_nt_drv->adapter_info.fpga_info.mp_nthw_epp, + internals->vpq[i].hw_id, tx_q->port); + } + + NT_LOG(DBG, ETHDEV, "(%i) NTNIC TX OVS-SW queues successfully setup\n", + internals->port); + + if (internals->type == PORT_TYPE_PHYSICAL) { + struct adapter_info_s *p_adapter_info = + &internals->p_drv->ntdrv.adapter_info; + NT_LOG(DBG, ETHDEV, "Port %i is ready for data. Enable port\n", + internals->if_index); + nt4ga_port_set_adm_state(p_adapter_info, internals->if_index, + true); + if (lag_active && internals->if_index == 0) { + /* + * Special case for link aggregation where the second phy interface (port 1) + * is "hidden" from DPDK and therefore doesn't get enabled through normal + * interface probing + */ + NT_LOG(DBG, ETHDEV, "LAG: Enable port %i\n", + internals->if_index + 1); + nt4ga_port_set_adm_state(p_adapter_info, + internals->if_index + 1, true); + } + } + + return 0; +} + +static int dev_set_mtu_inline(struct rte_eth_dev *dev, uint16_t mtu) +{ + struct pmd_internals *internals = + (struct pmd_internals *)dev->data->dev_private; + struct flow_eth_dev *flw_dev = internals->flw_dev; + int ret = -1; + + if (internals->type == PORT_TYPE_PHYSICAL && mtu >= MIN_MTU_INLINE && + mtu <= MAX_MTU) + ret = flow_set_mtu_inline(flw_dev, internals->port, mtu); + return ret ? -EINVAL : 0; +} + +static int dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu) +{ + struct pmd_internals *internals = dev->data->dev_private; + /*struct ntnic_tx_queue *tx_q = internals->txq; */ + struct drv_s *p_drv = internals->p_drv; + struct ntdrv_4ga_s *p_nt_drv = &p_drv->ntdrv; + fpga_info_t *fpga_info = &p_nt_drv->adapter_info.fpga_info; + int retval = EINVAL; + + if (mtu < MIN_MTU || mtu > MAX_MTU) + return -EINVAL; + + if (internals->type == PORT_TYPE_VIRTUAL) { + /* set MTU on exception to MAX_MTU */ + retval = nthw_epp_set_mtu(fpga_info->mp_nthw_epp, + internals->rxq_scg[0] + .queue + .hw_id, /* exception tx queue hw_id to OVS */ + MAX_MTU, /* max number of bytes allowed for a given port. */ + internals->type); /* port type */ + + if (retval) + return retval; + + uint i; + + for (i = 0; i < internals->vpq_nb_vq; i++) { + retval = nthw_epp_set_mtu(fpga_info->mp_nthw_epp, + internals->vpq[i].hw_id, /* tx queue hw_id */ + mtu, /* max number of bytes allowed for a given port. */ + internals->type); /* port type */ + if (retval) + return retval; + + NT_LOG(DBG, ETHDEV, "SET MTU SIZE %d queue hw_id %d\n", + mtu, internals->vpq[i].hw_id); + } + } else if (internals->type == PORT_TYPE_PHYSICAL) { + /* set MTU on exception to MAX_MTU */ + retval = nthw_epp_set_mtu(fpga_info->mp_nthw_epp, + internals->rxq_scg[0] + .queue + .hw_id, /* exception tx queue hw_id to OVS */ + MAX_MTU, /* max number of bytes allowed for a given port. */ + PORT_TYPE_VIRTUAL); /* port type */ + if (retval) + return retval; + + retval = nthw_epp_set_mtu(fpga_info->mp_nthw_epp, + internals->port, /* port number */ + mtu, /* max number of bytes allowed for a given port. 
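+					 The exception queue above keeps MAX_MTU;
+					 only the physical port itself is set to the requested value.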
*/ + internals->type); /* port type */ + + NT_LOG(DBG, ETHDEV, "SET MTU SIZE %d port %d\n", mtu, + internals->port); + } else { + NT_LOG(DBG, ETHDEV, + "COULD NOT SET MTU SIZE %d port %d type %d\n", mtu, + internals->port, internals->type); + retval = -EINVAL; + } + return retval; +} + +static int eth_rx_queue_start(struct rte_eth_dev *eth_dev, uint16_t rx_queue_id) +{ + NT_LOG(DBG, ETHDEV, "%s: [%s:%u]\n", __func__, __func__, __LINE__); + eth_dev->data->rx_queue_state[rx_queue_id] = + RTE_ETH_QUEUE_STATE_STARTED; + return 0; +} + +static int eth_rx_queue_stop(struct rte_eth_dev *eth_dev, uint16_t rx_queue_id) +{ + NT_LOG(DBG, ETHDEV, "%s: [%s:%u]\n", __func__, __func__, __LINE__); + eth_dev->data->rx_queue_state[rx_queue_id] = + RTE_ETH_QUEUE_STATE_STOPPED; + return 0; +} + +static int eth_tx_queue_start(struct rte_eth_dev *eth_dev, uint16_t rx_queue_id) +{ + NT_LOG(DBG, ETHDEV, "%s: [%s:%u]\n", __func__, __func__, __LINE__); + eth_dev->data->tx_queue_state[rx_queue_id] = + RTE_ETH_QUEUE_STATE_STARTED; + return 0; +} + +static int eth_tx_queue_stop(struct rte_eth_dev *eth_dev, uint16_t rx_queue_id) +{ + NT_LOG(DBG, ETHDEV, "%s: [%s:%u]\n", __func__, __func__, __LINE__); + eth_dev->data->tx_queue_state[rx_queue_id] = + RTE_ETH_QUEUE_STATE_STOPPED; + return 0; +} + +static void eth_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index) +{ + struct rte_ether_addr *const eth_addrs = dev->data->mac_addrs; + + assert(index < NUM_MAC_ADDRS_PER_PORT); + + if (index >= NUM_MAC_ADDRS_PER_PORT) { + const struct pmd_internals *const internals = + dev->data->dev_private; + NT_LOG(ERR, ETHDEV, + "%s: [%s:%i]: Port %i: illegal index %u (>= %u)\n", + __FILE__, __func__, __LINE__, internals->if_index, index, + NUM_MAC_ADDRS_PER_PORT); + return; + } + (void)memset(ð_addrs[index], 0, sizeof(eth_addrs[index])); +} + +static int eth_mac_addr_add(struct rte_eth_dev *dev, + struct rte_ether_addr *mac_addr, uint32_t index, + uint32_t vmdq __rte_unused) +{ + struct rte_ether_addr *const eth_addrs = dev->data->mac_addrs; + + assert(index < NUM_MAC_ADDRS_PER_PORT); + + if (index >= NUM_MAC_ADDRS_PER_PORT) { + const struct pmd_internals *const internals = + dev->data->dev_private; + NT_LOG(ERR, ETHDEV, + "%s: [%s:%i]: Port %i: illegal index %u (>= %u)\n", + __FILE__, __func__, __LINE__, internals->if_index, index, + NUM_MAC_ADDRS_PER_PORT); + return -1; + } + + eth_addrs[index] = *mac_addr; + + return 0; +} + +static int eth_mac_addr_set(struct rte_eth_dev *dev, + struct rte_ether_addr *mac_addr) +{ + struct rte_ether_addr *const eth_addrs = dev->data->mac_addrs; + + eth_addrs[0U] = *mac_addr; + + return 0; +} + +static int eth_set_mc_addr_list(struct rte_eth_dev *dev, + struct rte_ether_addr *mc_addr_set, + uint32_t nb_mc_addr) +{ + struct pmd_internals *const internals = dev->data->dev_private; + struct rte_ether_addr *const mc_addrs = internals->mc_addrs; + size_t i; + + if (nb_mc_addr >= NUM_MULTICAST_ADDRS_PER_PORT) { + NT_LOG(ERR, ETHDEV, + "%s: [%s:%i]: Port %i: too many multicast addresses %u (>= %u)\n", + __FILE__, __func__, __LINE__, internals->if_index, + nb_mc_addr, NUM_MULTICAST_ADDRS_PER_PORT); + return -1; + } + + for (i = 0U; i < NUM_MULTICAST_ADDRS_PER_PORT; i++) { + if (i < nb_mc_addr) + mc_addrs[i] = mc_addr_set[i]; + + else + (void)memset(&mc_addrs[i], 0, sizeof(mc_addrs[i])); + } + + return 0; +} + +static int eth_dev_configure(struct rte_eth_dev *eth_dev) +{ + struct pmd_internals *internals = + (struct pmd_internals *)eth_dev->data->dev_private; + struct drv_s *p_drv = internals->p_drv; + + 
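+	/*
+	 * Configuration is pure bookkeeping: mark the probe as finished and
+	 * force the promiscuous flag non-zero, as the device always runs in
+	 * promiscuous mode.
+	 */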
NT_LOG(DBG, ETHDEV, "%s: [%s:%u] Called for eth_dev %p\n", __func__, + __func__, __LINE__, eth_dev); + + p_drv->probe_finished = 1; + + /* The device is ALWAYS running promiscuous mode. */ + eth_dev->data->promiscuous ^= ~eth_dev->data->promiscuous; + return 0; +} + +static int eth_dev_start(struct rte_eth_dev *eth_dev) +{ + struct pmd_internals *internals = + (struct pmd_internals *)eth_dev->data->dev_private; + const int n_intf_no = internals->if_index; + struct adapter_info_s *p_adapter_info = + &internals->p_drv->ntdrv.adapter_info; + + NT_LOG(DBG, ETHDEV, "%s: [%s:%u] - Port %u, %u\n", __func__, __func__, + __LINE__, internals->n_intf_no, internals->if_index); + + if (internals->type == PORT_TYPE_VIRTUAL || + internals->type == PORT_TYPE_OVERRIDE) { + eth_dev->data->dev_link.link_status = ETH_LINK_UP; + } else { + /* + * wait for link on port + * If application starts sending too soon before FPGA port is ready, garbage is + * produced + */ + int loop = 0; + + while (nt4ga_port_get_link_status(p_adapter_info, n_intf_no) == + ETH_LINK_DOWN) { + /* break out after 5 sec */ + if (++loop >= 50) { + NT_LOG(DBG, ETHDEV, + "%s: TIMEOUT No link on port %i (5sec timeout)\n", + __func__, internals->n_intf_no); + break; + } + usleep(100000); + } + assert(internals->n_intf_no == + internals->if_index); /* Sanity check */ + if (internals->lpbk_mode) { + if (internals->lpbk_mode & 1 << 0) { + nt4ga_port_set_loopback_mode(p_adapter_info, + n_intf_no, + NT_LINK_LOOPBACK_HOST); + } + if (internals->lpbk_mode & 1 << 1) { + nt4ga_port_set_loopback_mode(p_adapter_info, + n_intf_no, + NT_LINK_LOOPBACK_LINE); + } + } + } + return 0; +} + +static int eth_dev_stop(struct rte_eth_dev *eth_dev) +{ + struct pmd_internals *internals = + (struct pmd_internals *)eth_dev->data->dev_private; + const int n_intf_no = internals->if_index; + struct adapter_info_s *p_adapter_info = + &internals->p_drv->ntdrv.adapter_info; + + NT_LOG(DBG, ETHDEV, "%s: [%s:%u] - Port %u, %u, type %u\n", __func__, + __func__, __LINE__, internals->n_intf_no, internals->if_index, + internals->type); + + if (internals->type != PORT_TYPE_VIRTUAL) { + struct ntnic_rx_queue *rx_q = internals->rxq_scg; + struct ntnic_tx_queue *tx_q = internals->txq_scg; + + uint q; + + for (q = 0; q < internals->nb_rx_queues; q++) + nthw_release_managed_rx_virt_queue(rx_q[q].vq); + + for (q = 0; q < internals->nb_tx_queues; q++) + nthw_release_managed_tx_virt_queue(tx_q[q].vq); + + nt4ga_port_set_adm_state(p_adapter_info, n_intf_no, 0); + nt4ga_port_set_link_status(p_adapter_info, n_intf_no, 0); + nt4ga_port_set_link_speed(p_adapter_info, n_intf_no, + NT_LINK_SPEED_UNKNOWN); + nt4ga_port_set_loopback_mode(p_adapter_info, n_intf_no, + NT_LINK_LOOPBACK_OFF); + } + + eth_dev->data->dev_link.link_status = ETH_LINK_DOWN; + return 0; +} + +static int eth_dev_set_link_up(struct rte_eth_dev *dev) +{ + struct pmd_internals *const internals = dev->data->dev_private; + struct adapter_info_s *p_adapter_info = + &internals->p_drv->ntdrv.adapter_info; + const int port = internals->if_index; + + if (internals->type == PORT_TYPE_VIRTUAL || + internals->type == PORT_TYPE_OVERRIDE) + return 0; + + assert(port >= 0 && port < NUM_ADAPTER_PORTS_MAX); + assert(port == internals->n_intf_no); + + nt4ga_port_set_adm_state(p_adapter_info, port, true); + + return 0; +} + +static int eth_dev_set_link_down(struct rte_eth_dev *dev) +{ + struct pmd_internals *const internals = dev->data->dev_private; + struct adapter_info_s *p_adapter_info = + &internals->p_drv->ntdrv.adapter_info; + const int 
port = internals->if_index; + + if (internals->type == PORT_TYPE_VIRTUAL || + internals->type == PORT_TYPE_OVERRIDE) + return 0; + + assert(port >= 0 && port < NUM_ADAPTER_PORTS_MAX); + assert(port == internals->n_intf_no); + + nt4ga_port_set_link_status(p_adapter_info, port, false); + + return 0; +} + +static int eth_dev_close(struct rte_eth_dev *eth_dev) +{ + struct pmd_internals *internals = + (struct pmd_internals *)eth_dev->data->dev_private; + struct drv_s *p_drv = internals->p_drv; + ntdrv_4ga_t *p_nt_drv = &p_drv->ntdrv; + fpga_info_t *fpga_info = &p_nt_drv->adapter_info.fpga_info; + struct rte_pci_device *pci_dev = RTE_ETH_DEV_TO_PCI(eth_dev); + (void)pci_dev; /* UNUSED */ + + NT_LOG(DBG, ETHDEV, "%s: enter [%s:%u]\n", __func__, __func__, + __LINE__); + + internals->p_drv = NULL; + + /* LAG cleanup */ + if (internals->lag_config) { + if (internals->lag_config->lag_tid) { + internals->lag_config->lag_thread_active = 0; + pthread_join(internals->lag_config->lag_tid, NULL); + } + lag_active = 0; + rte_free(internals->lag_config); + } + + /* free */ + rte_free(internals); + internals = NULL; + + eth_dev->data->dev_private = NULL; + eth_dev->data->mac_addrs = NULL; + + /* release */ + rte_eth_dev_release_port(eth_dev); + + NT_LOG(DBG, ETHDEV, "%s: %d [%s:%u]\n", __func__, + p_drv->n_eth_dev_init_count, __func__, __LINE__); + p_drv->n_eth_dev_init_count--; + + /* + * rte_pci_dev has no private member for p_drv + * wait until all rte_eth_dev's are closed - then close adapters via p_drv + */ + if (!p_drv->n_eth_dev_init_count && p_drv) { + NT_LOG(DBG, ETHDEV, "%s: %d [%s:%u]\n", __func__, + p_drv->n_eth_dev_init_count, __func__, __LINE__); + p_drv->ntdrv.b_shutdown = true; + void *p_ret_val = NULL; + + pthread_join(p_nt_drv->stat_thread, &p_ret_val); + if (fpga_info->profile == FPGA_INFO_PROFILE_INLINE) { + p_ret_val = NULL; + pthread_join(p_nt_drv->flm_thread, &p_ret_val); + } + nt4ga_adapter_deinit(&p_nt_drv->adapter_info); + rte_free(p_drv); + } + NT_LOG(DBG, ETHDEV, "%s: leave [%s:%u]\n", __func__, __func__, + __LINE__); + return 0; +} + +static int eth_fw_version_get(struct rte_eth_dev *eth_dev, char *fw_version, + size_t fw_size) +{ + struct pmd_internals *internals = + (struct pmd_internals *)eth_dev->data->dev_private; + + if (internals->type == PORT_TYPE_VIRTUAL || + internals->type == PORT_TYPE_OVERRIDE) + return 0; + + fpga_info_t *fpga_info = &internals->p_drv->ntdrv.adapter_info.fpga_info; + const int length = + snprintf(fw_version, fw_size, "%03d-%04d-%02d-%02d", + fpga_info->n_fpga_type_id, fpga_info->n_fpga_prod_id, + fpga_info->n_fpga_ver_id, fpga_info->n_fpga_rev_id); + if ((size_t)length < fw_size) { + /* We have space for the version string */ + return 0; + } + /* We do not have space for the version string -return the needed space */ + return length + 1; +} + +static int eth_xstats_get(struct rte_eth_dev *eth_dev, + struct rte_eth_xstat *stats, unsigned int n) +{ + struct pmd_internals *internals = + (struct pmd_internals *)eth_dev->data->dev_private; + struct drv_s *p_drv = internals->p_drv; + ntdrv_4ga_t *p_nt_drv = &p_drv->ntdrv; + nt4ga_stat_t *p_nt4ga_stat = &p_nt_drv->adapter_info.nt4ga_stat; + nthw_stat_t *p_nthw_stat = p_nt4ga_stat->mp_nthw_stat; + int if_index = internals->if_index; + int nb_xstats; + + pthread_mutex_lock(&p_nt_drv->stat_lck); + nb_xstats = nthw_xstats_get(p_nt4ga_stat, stats, n, + p_nthw_stat->mb_is_vswitch, if_index); + pthread_mutex_unlock(&p_nt_drv->stat_lck); + return nb_xstats; +} + +static int eth_xstats_get_by_id(struct rte_eth_dev 
*eth_dev, + const uint64_t *ids, uint64_t *values, + unsigned int n) +{ + struct pmd_internals *internals = + (struct pmd_internals *)eth_dev->data->dev_private; + struct drv_s *p_drv = internals->p_drv; + ntdrv_4ga_t *p_nt_drv = &p_drv->ntdrv; + nt4ga_stat_t *p_nt4ga_stat = &p_nt_drv->adapter_info.nt4ga_stat; + nthw_stat_t *p_nthw_stat = p_nt4ga_stat->mp_nthw_stat; + int if_index = internals->if_index; + int nb_xstats; + + pthread_mutex_lock(&p_nt_drv->stat_lck); + nb_xstats = nthw_xstats_get_by_id(p_nt4ga_stat, ids, values, n, + p_nthw_stat->mb_is_vswitch, if_index); + pthread_mutex_unlock(&p_nt_drv->stat_lck); + return nb_xstats; +} + +static int eth_xstats_reset(struct rte_eth_dev *eth_dev) +{ + struct pmd_internals *internals = + (struct pmd_internals *)eth_dev->data->dev_private; + struct drv_s *p_drv = internals->p_drv; + ntdrv_4ga_t *p_nt_drv = &p_drv->ntdrv; + nt4ga_stat_t *p_nt4ga_stat = &p_nt_drv->adapter_info.nt4ga_stat; + nthw_stat_t *p_nthw_stat = p_nt4ga_stat->mp_nthw_stat; + int if_index = internals->if_index; + + pthread_mutex_lock(&p_nt_drv->stat_lck); + nthw_xstats_reset(p_nt4ga_stat, p_nthw_stat->mb_is_vswitch, if_index); + pthread_mutex_unlock(&p_nt_drv->stat_lck); + return dpdk_stats_reset(internals, p_nt_drv, if_index); +} + +static int eth_xstats_get_names(struct rte_eth_dev *eth_dev __rte_unused, + struct rte_eth_xstat_name *xstats_names, + unsigned int size) +{ + struct pmd_internals *internals = + (struct pmd_internals *)eth_dev->data->dev_private; + struct drv_s *p_drv = internals->p_drv; + ntdrv_4ga_t *p_nt_drv = &p_drv->ntdrv; + nt4ga_stat_t *p_nt4ga_stat = &p_nt_drv->adapter_info.nt4ga_stat; + nthw_stat_t *p_nthw_stat = p_nt4ga_stat->mp_nthw_stat; + + return nthw_xstats_get_names(p_nt4ga_stat, xstats_names, size, + p_nthw_stat->mb_is_vswitch); +} + +static int eth_xstats_get_names_by_id(struct rte_eth_dev *eth_dev, + const uint64_t *ids, + struct rte_eth_xstat_name *xstats_names, + unsigned int size) +{ + struct pmd_internals *internals = + (struct pmd_internals *)eth_dev->data->dev_private; + struct drv_s *p_drv = internals->p_drv; + ntdrv_4ga_t *p_nt_drv = &p_drv->ntdrv; + nt4ga_stat_t *p_nt4ga_stat = &p_nt_drv->adapter_info.nt4ga_stat; + nthw_stat_t *p_nthw_stat = p_nt4ga_stat->mp_nthw_stat; + + return nthw_xstats_get_names_by_id(p_nt4ga_stat, xstats_names, ids, size, + p_nthw_stat->mb_is_vswitch); +} + +static int _dev_flow_ops_get(struct rte_eth_dev *dev __rte_unused, + const struct rte_flow_ops **ops) +{ + *ops = &_dev_flow_ops; + return 0; +} + +static int promiscuous_enable(struct rte_eth_dev __rte_unused * dev) +{ + NT_LOG(DBG, NTHW, "The device always run promiscuous mode."); + return 0; +} + +static int eth_dev_rss_hash_update(struct rte_eth_dev *eth_dev, + struct rte_eth_rss_conf *rss_conf) +{ + struct pmd_internals *internals = + (struct pmd_internals *)eth_dev->data->dev_private; + struct flow_eth_dev *fedev = internals->flw_dev; + struct flow_nic_dev *ndev = fedev->ndev; + const int hsh_idx = + 0; /* hsh index 0 means the default receipt in HSH module */ + int res = flow_nic_set_hasher_fields(ndev, hsh_idx, + nt_rss_hash_field_from_dpdk(rss_conf->rss_hf)); + res |= hw_mod_hsh_rcp_flush(&ndev->be, hsh_idx, 1); + return res; +} + +static int rss_hash_conf_get(struct rte_eth_dev *eth_dev, + struct rte_eth_rss_conf *rss_conf) +{ + struct pmd_internals *internals = + (struct pmd_internals *)eth_dev->data->dev_private; + struct flow_eth_dev *fedev = internals->flw_dev; + struct flow_nic_dev *ndev = fedev->ndev; + + rss_conf->rss_key = NULL; + 
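+	/*
+	 * No RSS key is exposed; only the hash-function flags, translated
+	 * back from the NT hash configuration, are reported.
+	 */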
rss_conf->rss_key_len = 0; + rss_conf->rss_hf |= + dpdk_rss_hash_define_from_nt_rss(ndev->rss_hash_config); + return 0; +} + +static struct eth_dev_ops nthw_eth_dev_ops = { + .dev_configure = eth_dev_configure, + .dev_start = eth_dev_start, + .dev_stop = eth_dev_stop, + .dev_set_link_up = eth_dev_set_link_up, + .dev_set_link_down = eth_dev_set_link_down, + .dev_close = eth_dev_close, + .link_update = eth_link_update, + .stats_get = eth_stats_get, + .stats_reset = eth_stats_reset, + .dev_infos_get = eth_dev_infos_get, + .fw_version_get = eth_fw_version_get, + .rx_queue_setup = eth_rx_scg_queue_setup, + .rx_queue_start = eth_rx_queue_start, + .rx_queue_stop = eth_rx_queue_stop, + .rx_queue_release = eth_rx_queue_release, + .tx_queue_setup = eth_tx_scg_queue_setup, + .tx_queue_start = eth_tx_queue_start, + .tx_queue_stop = eth_tx_queue_stop, + .tx_queue_release = eth_tx_queue_release, + .mac_addr_remove = eth_mac_addr_remove, + .mac_addr_add = eth_mac_addr_add, + .mac_addr_set = eth_mac_addr_set, + .set_mc_addr_list = eth_set_mc_addr_list, + .xstats_get = eth_xstats_get, + .xstats_get_names = eth_xstats_get_names, + .xstats_reset = eth_xstats_reset, + .xstats_get_by_id = eth_xstats_get_by_id, + .xstats_get_names_by_id = eth_xstats_get_names_by_id, + .mtu_set = NULL, + .mtr_ops_get = eth_mtr_ops_get, + .flow_ops_get = _dev_flow_ops_get, + .promiscuous_disable = NULL, + .promiscuous_enable = promiscuous_enable, + .rss_hash_update = eth_dev_rss_hash_update, + .rss_hash_conf_get = rss_hash_conf_get, +}; + +/* Converts link speed provided in Mbps to NT specific definitions.*/ +static nt_link_speed_t convert_link_speed(int link_speed_mbps) +{ + switch (link_speed_mbps) { + case 10: + return NT_LINK_SPEED_10M; + case 100: + return NT_LINK_SPEED_100M; + case 1000: + return NT_LINK_SPEED_1G; + case 10000: + return NT_LINK_SPEED_10G; + case 40000: + return NT_LINK_SPEED_40G; + case 100000: + return NT_LINK_SPEED_100G; + case 50000: + return NT_LINK_SPEED_50G; + case 25000: + return NT_LINK_SPEED_25G; + default: + return NT_LINK_SPEED_UNKNOWN; + } +} + +/* + * Adapter flm stat thread + */ +static void *adapter_flm_thread_fn(void *context) +{ + struct drv_s *p_drv = context; + struct ntdrv_4ga_s *p_nt_drv = &p_drv->ntdrv; + struct adapter_info_s *p_adapter_info = &p_nt_drv->adapter_info; + struct nt4ga_filter_s *p_nt4ga_filter = &p_adapter_info->nt4ga_filter; + struct flow_nic_dev *p_flow_nic_dev = p_nt4ga_filter->mp_flow_device; + + NT_LOG(DBG, ETHDEV, "%s: %s: waiting for port configuration\n", + p_adapter_info->mp_adapter_id_str, __func__); + + while (p_flow_nic_dev->eth_base == NULL) + usleep(1000000); + struct flow_eth_dev *dev = p_flow_nic_dev->eth_base; + + NT_LOG(DBG, ETHDEV, "%s: %s: begin\n", p_adapter_info->mp_adapter_id_str, + __func__); + + while (!p_drv->ntdrv.b_shutdown) { + if (flm_mtr_update_stats(dev) == 0) + usleep(10); + } + + NT_LOG(DBG, ETHDEV, "%s: %s: end\n", p_adapter_info->mp_adapter_id_str, + __func__); + + return NULL; +} + +/* + * Adapter stat thread + */ +static void *adapter_stat_thread_fn(void *context) +{ + struct drv_s *p_drv = context; + ntdrv_4ga_t *p_nt_drv = &p_drv->ntdrv; + nt4ga_stat_t *p_nt4ga_stat = &p_nt_drv->adapter_info.nt4ga_stat; + nthw_stat_t *p_nthw_stat = p_nt4ga_stat->mp_nthw_stat; + + const char *const p_adapter_id_str _unused = + p_nt_drv->adapter_info.mp_adapter_id_str; + + NT_LOG(DBG, ETHDEV, "%s: %s: begin\n", p_adapter_id_str, __func__); + + assert(p_nthw_stat); + + while (!p_drv->ntdrv.b_shutdown) { + usleep(100 * 100); + + 
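+		/*
+		 * Poll roughly every 10 ms: trigger a statistics DMA, wait for
+		 * the timestamp to leave its -1 reset value (dumping RMC status
+		 * if it stays frozen), then collect counters under the stat lock.
+		 */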
nthw_stat_trigger(p_nthw_stat); + + uint32_t loop = 0; + + while ((!p_drv->ntdrv.b_shutdown) && + (*p_nthw_stat->mp_timestamp == (uint64_t)-1)) { + usleep(1 * 100); + + if (nt_log_is_debug(NT_LOG_MODULE_ETHDEV) && + (++loop & 0x3fff) == 0) { + uint32_t sf_ram_of = + nthw_rmc_get_status_sf_ram_of(p_nt4ga_stat->mp_nthw_rmc); + uint32_t descr_fifo_of = + nthw_rmc_get_status_descr_fifo_of(p_nt4ga_stat->mp_nthw_rmc); + + uint32_t dbg_merge = + nthw_rmc_get_dbg_merge(p_nt4ga_stat->mp_nthw_rmc); + uint32_t mac_if_err = + nthw_rmc_get_mac_if_err(p_nt4ga_stat->mp_nthw_rmc); + + NT_LOG(ERR, ETHDEV, "Statistics DMA frozen\n"); + NT_LOG(ERR, ETHDEV, + "SF RAM Overflow : %08x\n", + sf_ram_of); + NT_LOG(ERR, ETHDEV, + "Descr Fifo Overflow : %08x\n", + descr_fifo_of); + NT_LOG(ERR, ETHDEV, + "DBG Merge : %08x\n", + dbg_merge); + NT_LOG(ERR, ETHDEV, + "MAC If Errors : %08x\n", + mac_if_err); + } + } + + /* Check then collect */ + { + pthread_mutex_lock(&p_nt_drv->stat_lck); + nt4ga_stat_collect(&p_nt_drv->adapter_info, p_nt4ga_stat); + pthread_mutex_unlock(&p_nt_drv->stat_lck); + } + } + + NT_LOG(DBG, ETHDEV, "%s: %s: end\n", p_adapter_id_str, __func__); + + return NULL; +} + +static struct { + struct rte_pci_device *vpf_dev; + struct rte_eth_devargs eth_da; + int portqueues[MAX_FPGA_VIRTUAL_PORTS_SUPPORTED]; + uint16_t pf_backer_port_id; +} rep; + +static int nthw_pci_dev_init(struct rte_pci_device *pci_dev) +{ + int res; + struct drv_s *p_drv; + ntdrv_4ga_t *p_nt_drv; + fpga_info_t *fpga_info; + + hw_info_t *p_hw_info _unused; + uint32_t n_port_mask = -1; /* All ports enabled by default */ + uint32_t nb_rx_queues = 1; + uint32_t nb_tx_queues = 1; + uint32_t exception_path = 0; + struct flow_queue_id_s queue_ids[FLOW_MAX_QUEUES]; + lag_config_t *lag_config = NULL; + int n_phy_ports; + struct port_link_speed pls_mbps[NUM_ADAPTER_PORTS_MAX] = { 0 }; + int num_port_speeds = 0; + enum flow_eth_dev_profile profile; + + NT_LOG(DBG, ETHDEV, "%s: [%s:%u] enter\n", __func__, __FILE__, __LINE__); + NT_LOG(DBG, ETHDEV, "Dev %s PF #%i Init : %02x:%02x:%i\n", + pci_dev->name, pci_dev->addr.function, pci_dev->addr.bus, + pci_dev->addr.devid, pci_dev->addr.function); + + /* + * Process options/arguments + */ + if (pci_dev->device.devargs && pci_dev->device.devargs->args) { + int kvargs_count; + struct rte_kvargs *kvlist = rte_kvargs_parse(pci_dev->device.devargs->args, + valid_arguments); + if (kvlist == NULL) + return -1; + + /* + * Argument: help + * NOTE: this argument/option check should be the first as it will stop + * execution after producing its output + */ + { + if (rte_kvargs_get(kvlist, ETH_DEV_NTNIC_HELP_ARG)) { + size_t i; + + printf("NTNIC supported arguments:\n\n"); + for (i = 0; i < RTE_DIM(valid_arguments); i++) { + if (valid_arguments[i] == NULL) + break; + printf(" %s\n", valid_arguments[i]); + } + printf("\n"); + exit(0); + } + } + + /* + * Argument: supported-fpgas=list|verbose + * NOTE: this argument/option check should be the first as it will stop + * execution after producing its output + */ + { + const char *val_str; + + val_str = rte_kvargs_get(kvlist, + ETH_DEV_NTNIC_SUPPORTED_FPGAS_ARG); + if (val_str) { + int detail_level = 0; + nt_fpga_mgr_t *p_fpga_mgr = NULL; + + if (strcmp(val_str, "list") == 0) { + detail_level = 0; + } else if (strcmp(val_str, "verbose") == 0) { + detail_level = 1; + } else { + NT_LOG(ERR, ETHDEV, + "%s: argument '%s': '%s': unsupported value\n", + __func__, + ETH_DEV_NTNIC_SUPPORTED_FPGAS_ARG, + val_str); + exit(1); + } + /* Produce fpgamgr output and exit hard */ + 
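+				/*
+				 * For example, a devargs value such as
+				 * "supported-fpgas=verbose" (illustrative syntax) dumps
+				 * the FPGA inventory via the fpga_mgr and exits instead
+				 * of probing the device.
+				 */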
p_fpga_mgr = fpga_mgr_new(); + if (p_fpga_mgr) { + fpga_mgr_init(p_fpga_mgr); + fpga_mgr_show(p_fpga_mgr, stdout, + detail_level); + fpga_mgr_delete(p_fpga_mgr); + p_fpga_mgr = NULL; + } else { + NT_LOG(ERR, ETHDEV, + "%s: %s cannot complete\n", + __func__, + ETH_DEV_NTNIC_SUPPORTED_FPGAS_ARG); + exit(1); + } + exit(0); + } + } + + /* link_speed options/argument only applicable for physical ports. */ + num_port_speeds = + rte_kvargs_count(kvlist, ETH_DEV_NTHW_LINK_SPEED_ARG); + if (num_port_speeds) { + assert(num_port_speeds <= NUM_ADAPTER_PORTS_MAX); + void *pls_mbps_ptr = &pls_mbps[0]; + + res = rte_kvargs_process(kvlist, + ETH_DEV_NTHW_LINK_SPEED_ARG, + &string_to_port_link_speed, + &pls_mbps_ptr); + if (res < 0) { + NT_LOG(ERR, ETHDEV, + "%s: problem with port link speed command " + "line arguments: res=%d\n", + __func__, res); + return -1; + } + for (int i = 0; i < num_port_speeds; ++i) { + int pid = pls_mbps[i].port_id; + + int lspeed _unused = pls_mbps[i].link_speed; + + NT_LOG(DBG, ETHDEV, "%s: devargs: %s=%d.%d\n", + __func__, ETH_DEV_NTHW_LINK_SPEED_ARG, + pid, lspeed); + if (pls_mbps[i].port_id >= + NUM_ADAPTER_PORTS_MAX) { + NT_LOG(ERR, ETHDEV, + "%s: problem with port link speed command line " + "arguments: port id should be 0 to %d, got %d\n", + __func__, NUM_ADAPTER_PORTS_MAX, + pid); + return -1; + } + } + } + + /* + * portmask option/argument + * It is intentional that portmask is only used to decide if DPDK eth_dev + * should be created for testing we would still keep the nthw subsystems + * running for all interfaces + */ + kvargs_count = + rte_kvargs_count(kvlist, ETH_DEV_NTHW_PORTMASK_ARG); + if (kvargs_count) { + assert(kvargs_count == 1); + res = rte_kvargs_process(kvlist, + ETH_DEV_NTHW_PORTMASK_ARG, + &string_to_u32, &n_port_mask); + if (res < 0) { + NT_LOG(ERR, ETHDEV, + "%s: problem with command line arguments: res=%d\n", + __func__, res); + return -1; + } + NT_LOG(DBG, ETHDEV, "%s: devargs: %s=%u\n", __func__, + ETH_DEV_NTHW_PORTMASK_ARG, n_port_mask); + } + + /* + * rxq option/argument + * The number of rxq (hostbuffers) allocated in memory. + * Default is 32 RX Hostbuffers + */ + kvargs_count = + rte_kvargs_count(kvlist, ETH_DEV_NTHW_RXQUEUES_ARG); + if (kvargs_count) { + assert(kvargs_count == 1); + res = rte_kvargs_process(kvlist, + ETH_DEV_NTHW_RXQUEUES_ARG, + &string_to_u32, &nb_rx_queues); + if (res < 0) { + NT_LOG(ERR, ETHDEV, + "%s: problem with command line arguments: res=%d\n", + __func__, res); + return -1; + } + NT_LOG(DBG, ETHDEV, "%s: devargs: %s=%u\n", __func__, + ETH_DEV_NTHW_RXQUEUES_ARG, nb_rx_queues); + } + + /* + * txq option/argument + * The number of txq (hostbuffers) allocated in memory. 
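+		 * The value is stored as adapter_info.n_tx_host_buffers and
+		 * becomes nb_tx_queues for every port created below.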
+ * Default is 32 TX Hostbuffers + */ + kvargs_count = + rte_kvargs_count(kvlist, ETH_DEV_NTHW_TXQUEUES_ARG); + if (kvargs_count) { + assert(kvargs_count == 1); + res = rte_kvargs_process(kvlist, + ETH_DEV_NTHW_TXQUEUES_ARG, + &string_to_u32, &nb_tx_queues); + if (res < 0) { + NT_LOG(ERR, ETHDEV, + "%s: problem with command line arguments: res=%d\n", + __func__, res); + return -1; + } + NT_LOG(DBG, ETHDEV, "%s: devargs: %s=%u\n", __func__, + ETH_DEV_NTHW_TXQUEUES_ARG, nb_tx_queues); + } + + kvargs_count = rte_kvargs_count(kvlist, ETH_NTNIC_LAG_MODE_ARG); + if (kvargs_count) { + lag_config = (lag_config_t *)rte_zmalloc(NULL, sizeof(lag_config_t), 0); + if (lag_config == NULL) { + NT_LOG(ERR, ETHDEV, + "Failed to alloc lag_config data\n"); + return -1; + } + assert(kvargs_count == 1); + res = rte_kvargs_process(kvlist, ETH_NTNIC_LAG_MODE_ARG, + &string_to_u32, + &lag_config->mode); + if (res < 0) { + NT_LOG(ERR, ETHDEV, + "%s: problem with command line arguments: res=%d\n", + __func__, res); + return -1; + } + NT_LOG(DBG, ETHDEV, "%s: devargs: %s=%u\n", __func__, + ETH_NTNIC_LAG_MODE_ARG, nb_tx_queues); + lag_active = 1; + } + + kvargs_count = rte_kvargs_count(kvlist, + ETH_DEV_NTHW_EXCEPTION_PATH_ARG); + if (kvargs_count) { + assert(kvargs_count == 1); + res = rte_kvargs_process(kvlist, + ETH_DEV_NTHW_EXCEPTION_PATH_ARG, + &string_to_u32, &exception_path); + if (res < 0) { + NT_LOG(ERR, ETHDEV, + "%s: problem with command line arguments: res=%d\n", + __func__, res); + return -1; + } + NT_LOG(DBG, ETHDEV, "%s: devargs: %s=%u\n", __func__, + ETH_DEV_NTHW_EXCEPTION_PATH_ARG, exception_path); + } + + if (lag_active && lag_config) { + switch (lag_config->mode) { + case BONDING_MODE_ACTIVE_BACKUP: + NT_LOG(DBG, ETHDEV, + "Active / Backup LAG mode\n"); + kvargs_count = rte_kvargs_count(kvlist, + ETH_NTNIC_LAG_PRIMARY_ARG); + if (kvargs_count) { + assert(kvargs_count == 1); + res = rte_kvargs_process(kvlist, + ETH_NTNIC_LAG_PRIMARY_ARG, + &string_to_u32, + &lag_config->primary_port); + if (res < 0) { + NT_LOG(ERR, ETHDEV, + "%s: problem with command line " + "arguments: res=%d\n", + __func__, res); + return -1; + } + NT_LOG(DBG, ETHDEV, + "%s: devargs: %s=%u\n", __func__, + ETH_NTNIC_LAG_MODE_ARG, + nb_tx_queues); + } else { + NT_LOG(ERR, ETHDEV, + "LAG must define a primary port\n"); + return -1; + } + + kvargs_count = rte_kvargs_count(kvlist, + ETH_NTNIC_LAG_BACKUP_ARG); + if (kvargs_count) { + assert(kvargs_count == 1); + res = rte_kvargs_process(kvlist, + ETH_NTNIC_LAG_BACKUP_ARG, + &string_to_u32, + &lag_config->backup_port); + if (res != 0) { + NT_LOG(ERR, ETHDEV, + "%s: problem with command line " + "arguments: res=%d\n", + __func__, res); + return -1; + } + NT_LOG(DBG, ETHDEV, + "%s: devargs: %s=%u\n", __func__, + ETH_NTNIC_LAG_MODE_ARG, + nb_tx_queues); + } else { + NT_LOG(ERR, ETHDEV, + "LAG must define a backup port\n"); + return -1; + } + break; + + case BONDING_MODE_8023AD: + NT_LOG(DBG, ETHDEV, + "Active / Active LAG mode\n"); + lag_config->primary_port = 0; + lag_config->backup_port = 0; + break; + + default: + NT_LOG(ERR, ETHDEV, "Unsupported LAG mode\n"); + return -1; + } + } + + rte_kvargs_free(kvlist); + } + + /* parse representor args */ + if (setup_virtual_pf_representor_base(pci_dev) == -1) { + NT_LOG(ERR, ETHDEV, + "%s: setup_virtual_pf_representor_base error %d (%s:%u)\n", + (pci_dev->name[0] ? 
pci_dev->name : "NA"), -1, __func__, + __LINE__); + return -1; + } + + /* alloc */ + p_drv = rte_zmalloc_socket(pci_dev->name, sizeof(struct drv_s), + RTE_CACHE_LINE_SIZE, + pci_dev->device.numa_node); + if (!p_drv) { + NT_LOG(ERR, ETHDEV, "%s: error %d (%s:%u)\n", + (pci_dev->name[0] ? pci_dev->name : "NA"), -1, __func__, + __LINE__); + return -1; + } + + /* Setup VFIO context */ + int vfio = nt_vfio_setup(pci_dev); + + if (vfio < 0) { + NT_LOG(ERR, ETHDEV, "%s: vfio_setup error %d (%s:%u)\n", + (pci_dev->name[0] ? pci_dev->name : "NA"), -1, __func__, + __LINE__); + rte_free(p_drv); + return -1; + } + + p_drv->probe_finished = 0; + /* context */ + p_nt_drv = &p_drv->ntdrv; + fpga_info = &p_nt_drv->adapter_info.fpga_info; + p_hw_info = &p_nt_drv->adapter_info.hw_info; + + p_drv->p_dev = pci_dev; + + /* Set context for NtDrv */ + p_nt_drv->pciident = + BDF_TO_PCIIDENT(pci_dev->addr.domain, pci_dev->addr.bus, + pci_dev->addr.devid, pci_dev->addr.function); + p_nt_drv->adapter_info.n_rx_host_buffers = nb_rx_queues; + p_nt_drv->adapter_info.n_tx_host_buffers = nb_tx_queues; + + fpga_info->bar0_addr = (void *)pci_dev->mem_resource[0].addr; + fpga_info->bar0_size = pci_dev->mem_resource[0].len; + NT_LOG(DBG, ETHDEV, "bar0=0x%" PRIX64 " len=%d\n", fpga_info->bar0_addr, + fpga_info->bar0_size); + fpga_info->numa_node = pci_dev->device.numa_node; + fpga_info->pciident = p_nt_drv->pciident; + fpga_info->adapter_no = p_drv->adapter_no; + + p_nt_drv->adapter_info.hw_info.pci_class_id = pci_dev->id.class_id; + p_nt_drv->adapter_info.hw_info.pci_vendor_id = pci_dev->id.vendor_id; + p_nt_drv->adapter_info.hw_info.pci_device_id = pci_dev->id.device_id; + p_nt_drv->adapter_info.hw_info.pci_sub_vendor_id = + pci_dev->id.subsystem_vendor_id; + p_nt_drv->adapter_info.hw_info.pci_sub_device_id = + pci_dev->id.subsystem_device_id; + + NT_LOG(DBG, ETHDEV, + "%s: " PCIIDENT_PRINT_STR " %04X:%04X: %04X:%04X:\n", + p_nt_drv->adapter_info.mp_adapter_id_str, + PCIIDENT_TO_DOMAIN(p_nt_drv->pciident), + PCIIDENT_TO_BUSNR(p_nt_drv->pciident), + PCIIDENT_TO_DEVNR(p_nt_drv->pciident), + PCIIDENT_TO_FUNCNR(p_nt_drv->pciident), + p_nt_drv->adapter_info.hw_info.pci_vendor_id, + p_nt_drv->adapter_info.hw_info.pci_device_id, + p_nt_drv->adapter_info.hw_info.pci_sub_vendor_id, + p_nt_drv->adapter_info.hw_info.pci_sub_device_id); + + p_nt_drv->b_shutdown = false; + p_nt_drv->adapter_info.pb_shutdown = &p_nt_drv->b_shutdown; + + for (int i = 0; i < num_port_speeds; ++i) { + struct adapter_info_s *p_adapter_info = &p_nt_drv->adapter_info; + nt_link_speed_t link_speed = + convert_link_speed(pls_mbps[i].link_speed); + nt4ga_port_set_link_speed(p_adapter_info, i, link_speed); + } + + /* store context */ + store_pdrv(p_drv); + + /* initialize nt4ga nthw fpga module instance in drv */ + int err = nt4ga_adapter_init(&p_nt_drv->adapter_info); + + if (err != 0) { + NT_LOG(ERR, ETHDEV, + "%s: Cannot initialize the adapter instance\n", + p_nt_drv->adapter_info.mp_adapter_id_str); + return -1; + } + + if (fpga_info->mp_nthw_epp != NULL) + nthw_eth_dev_ops.mtu_set = dev_set_mtu; + + /* Initialize the queue system */ + if (err == 0) { + err = nthw_virt_queue_init(fpga_info); + if (err != 0) { + NT_LOG(ERR, ETHDEV, + "%s: Cannot initialize scatter-gather queues\n", + p_nt_drv->adapter_info.mp_adapter_id_str); + } else { + NT_LOG(DBG, ETHDEV, + "%s: Initialized scatter-gather queues\n", + p_nt_drv->adapter_info.mp_adapter_id_str); + } + } + + switch (fpga_info->profile) { + case FPGA_INFO_PROFILE_VSWITCH: + profile = 
FLOW_ETH_DEV_PROFILE_VSWITCH; + break; + case FPGA_INFO_PROFILE_INLINE: + profile = FLOW_ETH_DEV_PROFILE_INLINE; + break; + case FPGA_INFO_PROFILE_UNKNOWN: + /* fallthrough */ + case FPGA_INFO_PROFILE_CAPTURE: + /* fallthrough */ + default: + NT_LOG(ERR, ETHDEV, "%s: fpga profile not supported [%s:%u]\n", + (pci_dev->name[0] ? pci_dev->name : "NA"), __func__, + __LINE__); + return -1; + } + + if (err == 0) { + /* mp_adapter_id_str is initialized after nt4ga_adapter_init(p_nt_drv) */ + const char *const p_adapter_id_str _unused = + p_nt_drv->adapter_info.mp_adapter_id_str; + NT_LOG(DBG, ETHDEV, + "%s: %s: AdapterPCI=" PCIIDENT_PRINT_STR + " Hw=0x%02X_rev%d PhyPorts=%d\n", + (pci_dev->name[0] ? pci_dev->name : "NA"), p_adapter_id_str, + PCIIDENT_TO_DOMAIN(p_nt_drv->adapter_info.fpga_info.pciident), + PCIIDENT_TO_BUSNR(p_nt_drv->adapter_info.fpga_info.pciident), + PCIIDENT_TO_DEVNR(p_nt_drv->adapter_info.fpga_info.pciident), + PCIIDENT_TO_FUNCNR(p_nt_drv->adapter_info.fpga_info.pciident), + p_hw_info->hw_platform_id, fpga_info->nthw_hw_info.hw_id, + fpga_info->n_phy_ports); + } else { + NT_LOG(ERR, ETHDEV, "%s: error=%d [%s:%u]\n", + (pci_dev->name[0] ? pci_dev->name : "NA"), err, __func__, + __LINE__); + return -1; + } + + pthread_mutex_init(&p_nt_drv->stat_lck, NULL); + res = rte_ctrl_thread_create(&p_nt_drv->stat_thread, "nt4ga_stat_thr", + NULL, adapter_stat_thread_fn, + (void *)p_drv); + if (res) { + NT_LOG(ERR, ETHDEV, "%s: error=%d [%s:%u]\n", + (pci_dev->name[0] ? pci_dev->name : "NA"), res, __func__, + __LINE__); + return -1; + } + + if (fpga_info->profile == FPGA_INFO_PROFILE_INLINE) { + res = rte_ctrl_thread_create(&p_nt_drv->flm_thread, + "nt_flm_stat_thr", NULL, + adapter_flm_thread_fn, + (void *)p_drv); + if (res) { + NT_LOG(ERR, ETHDEV, "%s: error=%d [%s:%u]\n", + (pci_dev->name[0] ? pci_dev->name : "NA"), res, + __func__, __LINE__); + return -1; + } + } + + if (lag_config) { + /* LAG is activated, so only use port 0 */ + n_phy_ports = 1; + } else { + n_phy_ports = fpga_info->n_phy_ports; + } + for (int n_intf_no = 0; n_intf_no < n_phy_ports; n_intf_no++) { + const char *const p_port_id_str _unused = + p_nt_drv->adapter_info.mp_port_id_str[n_intf_no]; + struct pmd_internals *internals = NULL; + struct rte_eth_dev *eth_dev; + char name[32]; + int i; + + if ((1 << n_intf_no) & ~n_port_mask) { + NT_LOG(DBG, ETHDEV, + "%s: %s: interface #%d: skipping due to portmask 0x%02X\n", + __func__, p_port_id_str, n_intf_no, n_port_mask); + continue; + } + + snprintf(name, sizeof(name), "ntnic%d", n_intf_no); + NT_LOG(DBG, ETHDEV, "%s: %s: interface #%d: %s: '%s'\n", + __func__, p_port_id_str, n_intf_no, + (pci_dev->name[0] ? pci_dev->name : "NA"), name); + + internals = rte_zmalloc_socket(name, + sizeof(struct pmd_internals), + RTE_CACHE_LINE_SIZE, + pci_dev->device.numa_node); + if (!internals) { + NT_LOG(ERR, ETHDEV, "%s: %s: error=%d [%s:%u]\n", + (pci_dev->name[0] ? 
pci_dev->name : "NA"), name, + -1, __func__, __LINE__); + return -1; + } + + internals->pci_dev = pci_dev; + internals->n_intf_no = n_intf_no; + internals->if_index = n_intf_no; + internals->min_tx_pkt_size = 64; + internals->max_tx_pkt_size = 10000; + internals->type = PORT_TYPE_PHYSICAL; + internals->vhid = -1; + internals->port = n_intf_no; + internals->nb_rx_queues = nb_rx_queues; + internals->nb_tx_queues = nb_tx_queues; + + /* Not used queue index as dest port in bypass - use 0x80 + port nr */ + for (i = 0; i < MAX_QUEUES; i++) + internals->vpq[i].hw_id = -1; + + /* Setup queue_ids */ + if (nb_rx_queues > 1) { + NT_LOG(DBG, ETHDEV, + "(%i) NTNIC configured with Rx multi queues. %i queues\n", + 0 /*port*/, nb_rx_queues); + } + + if (nb_tx_queues > 1) { + NT_LOG(DBG, ETHDEV, + "(%i) NTNIC configured with Tx multi queues. %i queues\n", + 0 /*port*/, nb_tx_queues); + } + + int max_num_queues = (nb_rx_queues > nb_tx_queues) ? + nb_rx_queues : + nb_tx_queues; + int start_queue = allocate_queue(max_num_queues); + + if (start_queue < 0) + return -1; + + for (i = 0; i < (int)max_num_queues; i++) { + queue_ids[i].id = start_queue + i; + queue_ids[i].hw_id = queue_ids[i].id; + + internals->rxq_scg[i].queue = queue_ids[i]; + /* use same index in Rx and Tx rings */ + internals->txq_scg[i].queue = queue_ids[i]; + internals->rxq_scg[i].enabled = 0; + internals->txq_scg[i].type = internals->type; + internals->rxq_scg[i].type = internals->type; + internals->rxq_scg[i].port = internals->port; + } + + /* no tx queues - tx data goes out on phy */ + internals->vpq_nb_vq = 0; + + for (i = 0; i < (int)nb_tx_queues; i++) { + internals->txq_scg[i].port = internals->port; + internals->txq_scg[i].enabled = 0; + } + + /* Set MAC address (but only if the MAC address is permitted) */ + if (n_intf_no < fpga_info->nthw_hw_info.vpd_info.mn_mac_addr_count) { + const uint64_t mac = + fpga_info->nthw_hw_info.vpd_info.mn_mac_addr_value + + n_intf_no; + internals->eth_addrs[0].addr_bytes[0] = (mac >> 40) & + 0xFFu; + internals->eth_addrs[0].addr_bytes[1] = (mac >> 32) & + 0xFFu; + internals->eth_addrs[0].addr_bytes[2] = (mac >> 24) & + 0xFFu; + internals->eth_addrs[0].addr_bytes[3] = (mac >> 16) & + 0xFFu; + internals->eth_addrs[0].addr_bytes[4] = (mac >> 8) & + 0xFFu; + internals->eth_addrs[0].addr_bytes[5] = (mac >> 0) & + 0xFFu; + } + + eth_dev = rte_eth_dev_allocate(name); + if (!eth_dev) { + NT_LOG(ERR, ETHDEV, "%s: %s: error=%d [%s:%u]\n", + (pci_dev->name[0] ? pci_dev->name : "NA"), name, + -1, __func__, __LINE__); + return -1; + } + + internals->flw_dev = flow_get_eth_dev(0, n_intf_no, + eth_dev->data->port_id, + nb_rx_queues, + queue_ids, + &internals->txq_scg[0].rss_target_id, + profile, exception_path); + if (!internals->flw_dev) { + NT_LOG(ERR, VDPA, + "Error creating port. 
Resource exhaustion in HW\n"); + return -1; + } + + NT_LOG(DBG, ETHDEV, + "%s: [%s:%u] eth_dev %p, port_id %u, if_index %u\n", + __func__, __func__, __LINE__, eth_dev, + eth_dev->data->port_id, n_intf_no); + + /* connect structs */ + internals->p_drv = p_drv; + eth_dev->data->dev_private = internals; + eth_dev->data->mac_addrs = internals->eth_addrs; + + internals->port_id = eth_dev->data->port_id; + + /* + * if representor ports defined on this PF set the assigned port_id as the + * backer_port_id for the VFs + */ + if (rep.vpf_dev == pci_dev) + rep.pf_backer_port_id = eth_dev->data->port_id; + NT_LOG(DBG, ETHDEV, + "%s: [%s:%u] Setting up RX functions for SCG\n", + __func__, __func__, __LINE__); + eth_dev->rx_pkt_burst = eth_dev_rx_scg; + eth_dev->tx_pkt_burst = eth_dev_tx_scg; + eth_dev->tx_pkt_prepare = NULL; + + struct rte_eth_link pmd_link; + + pmd_link.link_speed = ETH_SPEED_NUM_NONE; + pmd_link.link_duplex = ETH_LINK_FULL_DUPLEX; + pmd_link.link_status = ETH_LINK_DOWN; + pmd_link.link_autoneg = ETH_LINK_AUTONEG; + + eth_dev->device = &pci_dev->device; + eth_dev->data->dev_link = pmd_link; + eth_dev->data->numa_node = pci_dev->device.numa_node; + eth_dev->dev_ops = &nthw_eth_dev_ops; + eth_dev->state = RTE_ETH_DEV_ATTACHED; + + rte_eth_copy_pci_info(eth_dev, pci_dev); + eth_dev_pci_specific_init(eth_dev, + pci_dev); /* performs rte_eth_copy_pci_info() */ + + p_drv->n_eth_dev_init_count++; + + if (lag_config) { + internals->lag_config = lag_config; + lag_config->internals = internals; + + /* Always merge port 0 and port 1 on a LAG bond */ + lag_set_port_group(0, (uint32_t)0x01); + lag_config->lag_thread_active = 1; + pthread_create(&lag_config->lag_tid, NULL, + lag_management, lag_config); + } + + if (fpga_info->profile == FPGA_INFO_PROFILE_INLINE && + internals->flw_dev->ndev->be.tpe.ver >= 2) { + assert(nthw_eth_dev_ops.mtu_set == + dev_set_mtu_inline || + nthw_eth_dev_ops.mtu_set == NULL); + nthw_eth_dev_ops.mtu_set = dev_set_mtu_inline; + dev_set_mtu_inline(eth_dev, MTUINITVAL); + NT_LOG(DBG, ETHDEV, + "%s INLINE MTU supported, tpe version %d\n", + __func__, internals->flw_dev->ndev->be.tpe.ver); + } else { + NT_LOG(DBG, ETHDEV, "INLINE MTU not supported"); + } + } + + NT_LOG(DBG, ETHDEV, "%s: [%s:%u] leave\n", __func__, __FILE__, __LINE__); + +#ifdef NT_TOOLS + /* + * If NtConnect interface must be started for external tools + */ + ntconn_adap_register(p_drv); + ntconn_stat_register(p_drv); + + /* Determine CPU used by the DPDK */ + cpu_set_t cpuset; + unsigned int lcore_id; + + CPU_ZERO(&cpuset); + for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) { + if (rte_lcore_has_role(lcore_id, ROLE_OFF)) + continue; + rte_cpuset_t lcore_cpuset = rte_lcore_cpuset(lcore_id); + + RTE_CPU_OR(&cpuset, &cpuset, &lcore_cpuset); + } + /* Set available CPU for ntconnect */ + RTE_CPU_NOT(&cpuset, &cpuset); + + ntconn_flow_register(p_drv); + ntconn_meter_register(p_drv); +#ifdef NTCONNECT_TEST + ntconn_test_register(p_drv); +#endif + ntconnect_init(NTCONNECT_SOCKET, cpuset); +#endif + + return 0; +} + +static int nthw_pci_dev_deinit(struct rte_eth_dev *eth_dev __rte_unused) +{ + int i; + + NT_LOG(DBG, ETHDEV, "%s: [%s:%u] start\n", __func__, __FILE__, __LINE__); + + struct pmd_internals *internals = pmd_intern_base; + + sleep(1); /* let running threads end Rx and Tx activity */ + + while (internals) { + for (i = internals->nb_tx_queues - 1; i >= 0; i--) { + nthw_release_managed_tx_virt_queue(internals->txq_scg[i].vq); + release_hw_virtio_queues(&internals->txq_scg[i].hwq); + } + + for (i = 
internals->nb_rx_queues - 1; i >= 0; i--) { + nthw_release_managed_rx_virt_queue(internals->rxq_scg[i].vq); + release_hw_virtio_queues(&internals->rxq_scg[i].hwq); + } + internals = internals->next; + } + + for (i = 0; i < MAX_REL_VQS; i++) { + if (rel_virt_queue[i].vq != NULL) { + if (rel_virt_queue[i].rx) { + if (rel_virt_queue[i].managed) + nthw_release_managed_rx_virt_queue(rel_virt_queue[i].vq); + else + nthw_release_rx_virt_queue(rel_virt_queue[i].vq); + } else { + if (rel_virt_queue[i].managed) + nthw_release_managed_tx_virt_queue(rel_virt_queue[i].vq); + else + nthw_release_tx_virt_queue(rel_virt_queue[i].vq); + } + rel_virt_queue[i].vq = NULL; + } + } + + nt_vfio_remove(EXCEPTION_PATH_HID); + + return 0; +} + +static void signal_handler_func_int(int sig) +{ + if (sig != SIGINT) { + signal(sig, previous_handler); + raise(sig); + return; + } + kill_pmd = 1; +} + +static void *shutdown_thread(void *arg __rte_unused) +{ + struct rte_eth_dev dummy; + + while (!kill_pmd) + usleep(100000); + + NT_LOG(DBG, ETHDEV, "%s: Shutting down because of ctrl+C\n", __func__); + nthw_pci_dev_deinit(&dummy); + + signal(SIGINT, previous_handler); + raise(SIGINT); + + return NULL; +} + +static int init_shutdown(void) +{ + NT_LOG(DBG, ETHDEV, "%s: Starting shutdown handler\n", __func__); + kill_pmd = 0; + previous_handler = signal(SIGINT, signal_handler_func_int); + pthread_create(&shutdown_tid, NULL, shutdown_thread, NULL); + + /* + * 1 time calculation of 1 sec stat update rtc cycles to prevent stat poll + * flooding by OVS from multiple virtual port threads - no need to be precise + */ + uint64_t now_rtc = rte_get_tsc_cycles(); + + usleep(10000); + rte_tsc_freq = 100 * (rte_get_tsc_cycles() - now_rtc); + + return 0; +} + +static int nthw_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, + struct rte_pci_device *pci_dev) +{ + int res; + + NT_LOG(DBG, ETHDEV, "%s: [%s:%u] start\n", __func__, __FILE__, __LINE__); + +#if defined(DEBUG) + NT_LOG(DBG, NTHW, "Testing NTHW %u [%s:%u]\n", + nt_log_module_logtype[NT_LOG_MODULE_INDEX(NT_LOG_MODULE_NTHW)], + __func__, __LINE__); +#endif + + NT_LOG(DBG, ETHDEV, "%s: pcidev: name: '%s'\n", __func__, + pci_dev->name); + NT_LOG(DBG, ETHDEV, "%s: devargs: name: '%s'\n", __func__, + pci_dev->device.name); + if (pci_dev->device.devargs) { + NT_LOG(DBG, ETHDEV, "%s: devargs: args: '%s'\n", __func__, + (pci_dev->device.devargs->args ? + pci_dev->device.devargs->args : + "NULL")); + NT_LOG(DBG, ETHDEV, "%s: devargs: data: '%s'\n", __func__, + (pci_dev->device.devargs->data ? 
+ pci_dev->device.devargs->data : + "NULL")); + } + + const int n_rte_has_pci = rte_eal_has_pci(); + + NT_LOG(DBG, ETHDEV, "has_pci=%d\n", n_rte_has_pci); + if (n_rte_has_pci == 0) { + NT_LOG(ERR, ETHDEV, "has_pci=%d: this PMD needs hugepages\n", + n_rte_has_pci); + return -1; + } + + const int n_rte_vfio_no_io_mmu_enabled = rte_vfio_noiommu_is_enabled(); + + NT_LOG(DBG, ETHDEV, "vfio_no_iommu_enabled=%d\n", + n_rte_vfio_no_io_mmu_enabled); + if (n_rte_vfio_no_io_mmu_enabled) { + NT_LOG(ERR, ETHDEV, + "vfio_no_iommu_enabled=%d: this PMD needs VFIO IOMMU\n", + n_rte_vfio_no_io_mmu_enabled); + return -1; + } + + const enum rte_iova_mode n_rte_io_va_mode = rte_eal_iova_mode(); + + NT_LOG(DBG, ETHDEV, "iova mode=%d\n", n_rte_io_va_mode); + if (n_rte_io_va_mode != RTE_IOVA_PA) { + NT_LOG(WRN, ETHDEV, + "iova mode (%d) should be PA for performance reasons\n", + n_rte_io_va_mode); + } + + const int n_rte_has_huge_pages = rte_eal_has_hugepages(); + + NT_LOG(DBG, ETHDEV, "has_hugepages=%d\n", n_rte_has_huge_pages); + if (n_rte_has_huge_pages == 0) { + NT_LOG(ERR, ETHDEV, + "has_hugepages=%d: this PMD needs hugepages\n", + n_rte_has_huge_pages); + return -1; + } + + NT_LOG(DBG, ETHDEV, + "busid=" PCI_PRI_FMT + " pciid=%04x:%04x_%04x:%04x locstr=%s @ numanode=%d: drv=%s drvalias=%s\n", + pci_dev->addr.domain, pci_dev->addr.bus, pci_dev->addr.devid, + pci_dev->addr.function, pci_dev->id.vendor_id, + pci_dev->id.device_id, pci_dev->id.subsystem_vendor_id, + pci_dev->id.subsystem_device_id, + pci_dev->name[0] ? pci_dev->name : "NA", /* locstr */ + pci_dev->device.numa_node, + pci_dev->driver->driver.name ? pci_dev->driver->driver.name : + "NA", + pci_dev->driver->driver.alias ? pci_dev->driver->driver.alias : + "NA"); + + if (pci_dev->id.vendor_id == NT_HW_PCI_VENDOR_ID) { + if (pci_dev->id.device_id == NT_HW_PCI_DEVICE_ID_NT200A01 || + pci_dev->id.device_id == NT_HW_PCI_DEVICE_ID_NT50B01) { + if (pci_dev->id.subsystem_device_id != 0x01) { + NT_LOG(DBG, ETHDEV, + "%s: PCIe bifurcation - secondary endpoint " + "found - leaving probe\n", + __func__); + return -1; + } + } + } + + res = nthw_pci_dev_init(pci_dev); + + init_shutdown(); + + NT_LOG(DBG, ETHDEV, "%s: leave: res=%d\n", __func__, res); + return res; +} + +static int nthw_pci_remove(struct rte_pci_device *pci_dev) +{ + NT_LOG(DBG, ETHDEV, "%s: [%s:%u]\n", __func__, __func__, __LINE__); + + return rte_eth_dev_pci_generic_remove(pci_dev, nthw_pci_dev_deinit); +} + +static int nt_log_init_impl(void) +{ + rte_log_set_global_level(RTE_LOG_DEBUG); + + NT_LOG(DBG, ETHDEV, "%s: [%s:%u]\n", __func__, __func__, __LINE__); + + for (int i = NT_LOG_MODULE_GENERAL; i < NT_LOG_MODULE_END; ++i) { + int index = NT_LOG_MODULE_INDEX(i); + + nt_log_module_logtype[index] = + rte_log_register_type_and_pick_level(nt_log_module_eal_name[index], + RTE_LOG_INFO); + } + + NT_LOG(DBG, ETHDEV, "%s: [%s:%u]\n", __func__, __func__, __LINE__); + + return 0; +} + +__rte_format_printf(3, 0) +static int nt_log_log_impl(enum nt_log_level level, uint32_t module, + const char *format, va_list args) +{ + uint32_t rte_level = 0; + uint32_t rte_module = 0; + + switch (level) { + case NT_LOG_ERR: + rte_level = RTE_LOG_ERR; + break; + case NT_LOG_WRN: + rte_level = RTE_LOG_WARNING; + break; + case NT_LOG_INF: + rte_level = RTE_LOG_INFO; + break; + default: + rte_level = RTE_LOG_DEBUG; + } + + rte_module = + (module >= NT_LOG_MODULE_GENERAL && + module < NT_LOG_MODULE_END) ? 
+ (uint32_t)nt_log_module_logtype[NT_LOG_MODULE_INDEX(module)] : module; + + return (int)rte_vlog(rte_level, rte_module, format, args); +} + +static int nt_log_is_debug_impl(uint32_t module) +{ + if (module < NT_LOG_MODULE_GENERAL || module >= NT_LOG_MODULE_END) + return -1; + int index = NT_LOG_MODULE_INDEX(module); + + return rte_log_get_level(nt_log_module_logtype[index]) == RTE_LOG_DEBUG; +} + +RTE_INIT(ntnic_rte_init); /* must go before function */ + +static void ntnic_rte_init(void) +{ + static struct nt_log_impl impl = { .init = &nt_log_init_impl, + .log = &nt_log_log_impl, + .is_debug = &nt_log_is_debug_impl + }; + + nt_log_init(&impl); +} + +static struct rte_pci_driver rte_nthw_pmd = { + .driver = { + .name = "net_ntnic", + }, + + .id_table = nthw_pci_id_map, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING, + .probe = nthw_pci_probe, + .remove = nthw_pci_remove, +}; + +RTE_PMD_REGISTER_PCI(net_ntnic, rte_nthw_pmd); +RTE_PMD_REGISTER_PCI_TABLE(net_ntnic, nthw_pci_id_map); +RTE_PMD_REGISTER_KMOD_DEP(net_ntnic, "* vfio-pci"); + +/* + * VF and VDPA code + */ +int register_release_virtqueue_info(struct nthw_virt_queue *vq, int rx, + int managed) +{ + int i; + + for (i = 0; i < MAX_REL_VQS; i++) { + if (rel_virt_queue[i].vq == NULL) { + rel_virt_queue[i].vq = vq; + rel_virt_queue[i].rx = rx; + rel_virt_queue[i].managed = managed; + return 0; + } + } + return -1; +} + +int de_register_release_virtqueue_info(struct nthw_virt_queue *vq) +{ + int i; + + for (i = 0; i < MAX_REL_VQS; i++) { + if (rel_virt_queue[i].vq == vq) { + rel_virt_queue[i].vq = NULL; + return 0; + } + } + return -1; +} + +struct pmd_internals *vp_vhid_instance_ready(int vhid) +{ + struct pmd_internals *intern = pmd_intern_base; + + while (intern) { + if (intern->vhid == vhid) + return intern; + intern = intern->next; + } + return NULL; +} + +struct pmd_internals *vp_path_instance_ready(const char *path) +{ + struct pmd_internals *intern = pmd_intern_base; + + while (intern) { + printf("Searching for path: \"%s\" == \"%s\" (%d)\n", + intern->vhost_path, path, + strcmp(intern->vhost_path, path)); + if (strcmp(intern->vhost_path, path) == 0) + return intern; + intern = intern->next; + } + return NULL; +} + +static void read_port_queues_mapping(char *str, int *portq) +{ + int len; + char *tok; + + while (*str != '[' && *str != '\0') + str++; + + if (*str == '\0') + return; + str++; + len = strlen(str); + char *str_e = &str[len]; + + while (*str_e != ']' && str_e != str) + str_e--; + if (*str_e != ']') + return; + *str_e = '\0'; + + tok = strtok(str, ",;"); + while (tok) { + char *ch = strchr(tok, ':'); + + if (ch) { + *ch = '\0'; + int port = atoi(tok); + int nvq = atoi(ch + 1); + + if (port >= 0 && + port < MAX_FPGA_VIRTUAL_PORTS_SUPPORTED && + nvq > 0 && nvq < MAX_QUEUES) + portq[port] = nvq; + } + + tok = strtok(NULL, ",;"); + } +} + +int setup_virtual_pf_representor_base(struct rte_pci_device *dev) +{ + struct rte_eth_devargs eth_da; + + eth_da.nb_representor_ports = 0U; + if (dev->device.devargs && dev->device.devargs->args) { + char *ch = strstr(dev->device.devargs->args, "portqueues"); + + if (ch) { + read_port_queues_mapping(ch, rep.portqueues); + /* + * Remove this extension. 
DPDK cannot read representor=[x] if added + * parameter to the end + */ + *ch = '\0'; + } + + int err = rte_eth_devargs_parse(dev->device.devargs->args, + ð_da); + if (err) { + rte_errno = -err; + NT_LOG(ERR, VDPA, + "failed to process device arguments: %s", + strerror(rte_errno)); + return -1; + } + + if (eth_da.nb_representor_ports) { + rep.vpf_dev = dev; + rep.eth_da = eth_da; + } + } + /* Will be set later when assigned to this PF */ + rep.pf_backer_port_id = RTE_MAX_ETHPORTS; + return eth_da.nb_representor_ports; +} + +static inline struct rte_eth_dev * +rte_eth_vdev_allocate(struct rte_pci_device *dev, const char *name, + size_t private_data_size, int *n_vq) +{ + static int next_rep_p; + struct rte_eth_dev *eth_dev = NULL; + + eth_dev = rte_eth_dev_allocate(name); + if (!eth_dev) + return NULL; + + NT_LOG(DBG, VDPA, "%s: [%s:%u] eth_dev %p, port_id %u\n", __func__, + __func__, __LINE__, eth_dev, eth_dev->data->port_id); + + if (private_data_size) { + eth_dev->data->dev_private = rte_zmalloc_socket(name, private_data_size, + RTE_CACHE_LINE_SIZE, + dev->device.numa_node); + if (!eth_dev->data->dev_private) { + rte_eth_dev_release_port(eth_dev); + return NULL; + } + } + + eth_dev->intr_handle = NULL; + eth_dev->data->numa_node = dev->device.numa_node; + eth_dev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR; + + if (rep.vpf_dev && rep.eth_da.nb_representor_ports > next_rep_p) { + eth_dev->data->representor_id = + rep.eth_da.representor_ports[next_rep_p++]; + eth_dev->device = &rep.vpf_dev->device; + eth_dev->data->backer_port_id = rep.pf_backer_port_id; + } else { + eth_dev->data->representor_id = nt_vfio_vf_num(dev); + eth_dev->device = &dev->device; + } + + if (rep.portqueues[eth_dev->data->representor_id]) + *n_vq = rep.portqueues[eth_dev->data->representor_id]; + + else + *n_vq = 1; + return eth_dev; +} + +static inline const char * +rte_vdev_device_name(const struct rte_pci_device *dev) +{ + if (dev && dev->device.name) + return dev->device.name; + return NULL; +} + +static const char *const valid_args[] = { +#define VP_VLAN_ID "vlan" + VP_VLAN_ID, +#define VP_SEPARATE_SOCKET "sep" + VP_SEPARATE_SOCKET, NULL +}; + +static int rte_pmd_vp_init_internals(struct rte_pci_device *vdev, + struct rte_eth_dev **eth_dev) +{ + struct pmd_internals *internals = NULL; + struct rte_eth_dev_data *data = NULL; + int i; + struct rte_eth_link pmd_link; + int numa_node = vdev->device.numa_node; + const char *name; + int n_vq; + int num_queues; + uint8_t port; + uint32_t vlan = 0; + uint32_t separate_socket = 0; + + enum fpga_info_profile fpga_profile = + get_fpga_profile_from_pci(vdev->addr); + + name = rte_vdev_device_name(vdev); + + /* + * Now do all data allocation - for eth_dev structure + * and internal (private) data + */ + + if (vdev && vdev->device.devargs) { + struct rte_kvargs *kvlist = NULL; + + kvlist = rte_kvargs_parse(vdev->device.devargs->args, + valid_args); + if (!kvlist) { + NT_LOG(ERR, VDPA, "error when parsing param"); + goto error; + } + + if (rte_kvargs_count(kvlist, VP_VLAN_ID) == 1) { + if (rte_kvargs_process(kvlist, VP_VLAN_ID, + &string_to_u32, &vlan) < 0) { + NT_LOG(ERR, VDPA, "error to parse %s", + VP_VLAN_ID); + goto error; + } + } + + if (rte_kvargs_count(kvlist, VP_SEPARATE_SOCKET) == 1) { + if (rte_kvargs_process(kvlist, VP_SEPARATE_SOCKET, + &string_to_u32, + &separate_socket) < 0) { + NT_LOG(ERR, VDPA, "error to parse %s", + VP_SEPARATE_SOCKET); + goto error; + } + } + } + + n_vq = 0; + *eth_dev = + rte_eth_vdev_allocate(vdev, name, sizeof(*internals), &n_vq); + if 
(*eth_dev == NULL) + goto error; + + data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node); + if (data == NULL) + goto error; + + NT_LOG(DBG, VDPA, "%s: [%s:%u] eth_dev %p, port_id %u, if_index %u\n", + __func__, __func__, __LINE__, *eth_dev, + (*eth_dev)->data->port_id, (*eth_dev)->data->representor_id); + + port = (*eth_dev)->data->representor_id; + + if (port < MAX_NTNIC_PORTS || port >= VIRTUAL_TUNNEL_PORT_OFFSET) { + NT_LOG(ERR, VDPA, + "(%i) Creating ntvp-backend ethdev on numa socket %i has invalid representor port\n", + port, numa_node); + return -1; + } + NT_LOG(DBG, VDPA, + "(%i) Creating ntnic-backend ethdev on numa socket %i\n", port, + numa_node); + + /* Build up private dev data */ + internals = (*eth_dev)->data->dev_private; + internals->pci_dev = vdev; + if (fpga_profile == FPGA_INFO_PROFILE_VSWITCH) { + internals->type = PORT_TYPE_VIRTUAL; + internals->nb_rx_queues = 1; + internals->nb_tx_queues = 1; + } else { + internals->type = PORT_TYPE_OVERRIDE; + internals->nb_rx_queues = n_vq; + internals->nb_tx_queues = n_vq; + } + internals->p_drv = get_pdrv_from_pci(vdev->addr); + + if (n_vq > MAX_QUEUES) { + NT_LOG(ERR, VDPA, + "Error creating virtual port. Too many rx or tx queues. Max is %i\n", + MAX_QUEUES); + goto error; + } + + if (n_vq > FLOW_MAX_QUEUES) { + NT_LOG(ERR, VDPA, + "Error creating virtual port. Too many rx or tx queues for NIC. Max reported %i\n", + FLOW_MAX_QUEUES); + goto error; + } + + /* Initialize HB output dest to none */ + for (i = 0; i < MAX_QUEUES; i++) + internals->vpq[i].hw_id = -1; + + internals->vhid = -1; + internals->port = port; + internals->if_index = port; + internals->port_id = (*eth_dev)->data->port_id; + internals->vlan = vlan; + + /* + * Create first time all queues in HW + */ + struct flow_queue_id_s queue_ids[FLOW_MAX_QUEUES + 1]; + + if (fpga_profile == FPGA_INFO_PROFILE_VSWITCH) + num_queues = n_vq + 1; /* add 1: 0th for exception */ + else + num_queues = n_vq; + + int start_queue = allocate_queue(num_queues); + + if (start_queue < 0) { + NT_LOG(ERR, VDPA, + "Error creating virtual port. Too many rx queues. 
Could not allocate %i\n", + num_queues); + goto error; + } + + int vhid = -1; + + for (i = 0; i < num_queues; i++) { + queue_ids[i].id = start_queue + i; /* 0th is exception queue */ + queue_ids[i].hw_id = start_queue + i; + } + + if (fpga_profile == FPGA_INFO_PROFILE_VSWITCH) { + internals->txq_scg[0].rss_target_id = -1; + internals->flw_dev = flow_get_eth_dev(0, internals->port, + internals->port_id, num_queues, + queue_ids, + &internals->txq_scg[0].rss_target_id, + FLOW_ETH_DEV_PROFILE_VSWITCH, 0); + } else { + uint16_t in_port = internals->port & 1; + char name[RTE_ETH_NAME_MAX_LEN]; + struct pmd_internals *main_internals; + struct rte_eth_dev *eth_dev; + int i; + int status; + + /* Get name of in_port */ + status = rte_eth_dev_get_name_by_port(in_port, name); + if (status != 0) { + NT_LOG(ERR, VDPA, "Name of port not found"); + goto error; + } + NT_LOG(DBG, VDPA, "Name of port %u = %s\n", in_port, name); + + /* Get ether device for in_port */ + eth_dev = rte_eth_dev_get_by_name(name); + if (eth_dev == NULL) { + NT_LOG(ERR, VDPA, "Failed to get eth device"); + goto error; + } + + /* Get internals for in_port */ + main_internals = + (struct pmd_internals *)eth_dev->data->dev_private; + NT_LOG(DBG, VDPA, "internals port %u\n\n", + main_internals->port); + if (main_internals->port != in_port) { + NT_LOG(ERR, VDPA, "Port did not match"); + goto error; + } + + /* Get flow device for in_port */ + internals->flw_dev = main_internals->flw_dev; + + for (i = 0; i < num_queues && i < MAX_QUEUES; i++) { + NT_LOG(DBG, VDPA, "Queue: %u\n", + queue_ids[i].id); + NT_LOG(DBG, VDPA, "HW ID: %u\n", + queue_ids[i].hw_id); + if (flow_eth_dev_add_queue(main_internals->flw_dev, + &queue_ids[i])) { + NT_LOG(ERR, VDPA, "Could not add queue"); + goto error; + } + } + } + + if (!internals->flw_dev) { + NT_LOG(ERR, VDPA, + "Error creating virtual port. 
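In the non-vswitch branch above, a representor does not own a flow device; it resolves the physical in-port back to the main PMD instance and reuses its flw_dev. A condensed sketch of that lookup follows; it assumes the driver's pmd_internals/flow_eth_dev types are in scope, and the ethdev_driver.h header name applies to recent DPDK releases (older releases use rte_ethdev_driver.h).

#include <rte_ethdev.h>
#include <ethdev_driver.h>	/* rte_eth_dev_get_by_name() */

/* Assumes the driver's struct pmd_internals / struct flow_eth_dev */
static struct flow_eth_dev *borrow_pf_flow_dev(uint16_t in_port)
{
	char name[RTE_ETH_NAME_MAX_LEN];
	struct rte_eth_dev *pf_dev;
	struct pmd_internals *pf;

	/* physical in-port -> ethdev name -> ethdev -> private data */
	if (rte_eth_dev_get_name_by_port(in_port, name) != 0)
		return NULL;

	pf_dev = rte_eth_dev_get_by_name(name);
	if (!pf_dev)
		return NULL;

	pf = pf_dev->data->dev_private;
	if (pf->port != in_port)
		return NULL;	/* same sanity check as above */

	return pf->flw_dev;	/* shared with the representor */
}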
Resource exhaustion in HW\n"); + goto error; + } + + char path[128]; + + if (!separate_socket) { + sprintf(path, "%sstdvio%i", DVIO_VHOST_DIR_NAME, port); + } else { + sprintf(path, "%sstdvio%i/stdvio%i", DVIO_VHOST_DIR_NAME, port, + port); + } + + internals->vpq_nb_vq = n_vq; + if (fpga_profile == FPGA_INFO_PROFILE_VSWITCH) { + if (nthw_vdpa_init(vdev, (*eth_dev)->device->name, path, + queue_ids[1].hw_id, n_vq, n_vq, + internals->port, &vhid)) { + NT_LOG(ERR, VDPA, + "*********** ERROR *********** vDPA RELAY INIT\n"); + goto error; + } + for (i = 0; i < n_vq; i++) { + internals->vpq[i] = + queue_ids[i + 1]; /* queue 0 is for exception */ + } + } else { + if (nthw_vdpa_init(vdev, (*eth_dev)->device->name, path, + queue_ids[0].hw_id, n_vq, n_vq, + internals->port, &vhid)) { + NT_LOG(ERR, VDPA, + "*********** ERROR *********** vDPA RELAY INIT\n"); + goto error; + } + for (i = 0; i < n_vq; i++) + internals->vpq[i] = queue_ids[i]; + } + + /* + * Exception queue for OVS SW path + */ + internals->rxq_scg[0].queue = queue_ids[0]; + internals->txq_scg[0].queue = + queue_ids[0]; /* use same index in Rx and Tx rings */ + internals->rxq_scg[0].enabled = 0; + internals->txq_scg[0].port = port; + + internals->txq_scg[0].type = internals->type; + internals->rxq_scg[0].type = internals->type; + internals->rxq_scg[0].port = internals->port; + + /* Setup pmd_link info */ + pmd_link.link_speed = ETH_SPEED_NUM_NONE; + pmd_link.link_duplex = ETH_LINK_FULL_DUPLEX; + pmd_link.link_status = ETH_LINK_DOWN; + + rte_memcpy(data, (*eth_dev)->data, sizeof(*data)); + data->dev_private = internals; + data->port_id = (*eth_dev)->data->port_id; + + data->nb_rx_queues = 1; /* this is exception */ + data->nb_tx_queues = 1; + + data->dev_link = pmd_link; + data->mac_addrs = ð_addr_vp[port - MAX_NTNIC_PORTS]; + data->numa_node = numa_node; + + (*eth_dev)->data = data; + (*eth_dev)->dev_ops = &nthw_eth_dev_ops; + + if (pmd_intern_base) { + struct pmd_internals *intern = pmd_intern_base; + + while (intern->next) + intern = intern->next; + intern->next = internals; + } else { + pmd_intern_base = internals; + } + internals->next = NULL; + + atomic_store(&internals->vhid, vhid); + + LIST_INIT(&internals->mtr_profiles); + LIST_INIT(&internals->mtrs); + return 0; + +error: + if (data) + rte_free(data); + if (internals) + rte_free(internals); + return -1; +} + +/* + * PORT_TYPE_OVERRIDE cannot receive data through SCG as the queues + * are going to VF/vDPA + */ +static uint16_t eth_dev_rx_scg_dummy(void *queue __rte_unused, + struct rte_mbuf **bufs __rte_unused, + uint16_t nb_pkts __rte_unused) +{ + return 0; +} + +/* + * PORT_TYPE_OVERRIDE cannot transmit data through SCG as the queues + * are coming from VF/vDPA + */ +static uint16_t eth_dev_tx_scg_dummy(void *queue __rte_unused, + struct rte_mbuf **bufs __rte_unused, + uint16_t nb_pkts __rte_unused) +{ + return 0; +} + +int nthw_create_vf_interface_dpdk(struct rte_pci_device *pci_dev) +{ + struct pmd_internals *internals; + struct rte_eth_dev *eth_dev; + + /* Create virtual function DPDK PCI devices.*/ + if (rte_pmd_vp_init_internals(pci_dev, ð_dev) < 0) + return -1; + + internals = (struct pmd_internals *)eth_dev->data->dev_private; + + if (internals->type == PORT_TYPE_OVERRIDE) { + eth_dev->rx_pkt_burst = eth_dev_rx_scg_dummy; + eth_dev->tx_pkt_burst = eth_dev_tx_scg_dummy; + } else { + eth_dev->rx_pkt_burst = eth_dev_rx_scg; + eth_dev->tx_pkt_burst = eth_dev_tx_scg; + } + + rte_eth_dev_probing_finish(eth_dev); + + return 0; +} + +int nthw_remove_vf_interface_dpdk(struct 
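All virtual-port instances are kept on a simple singly-linked list rooted at pmd_intern_base, which is what vp_vhid_instance_ready()/vp_path_instance_ready() walk. The tail-append used above can be factored as in the sketch below; intern_list_append() is illustrative only and assumes the driver's struct pmd_internals with its 'next' pointer.

static void intern_list_append(struct pmd_internals **base,
			       struct pmd_internals *internals)
{
	internals->next = NULL;

	if (*base == NULL) {
		*base = internals;	/* first entry becomes the list head */
		return;
	}

	struct pmd_internals *it = *base;

	while (it->next)		/* walk to the tail */
		it = it->next;
	it->next = internals;
}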
rte_pci_device *pci_dev) +{ + struct rte_eth_dev *eth_dev = NULL; + + NT_LOG(DBG, VDPA, "Closing ntvp pmd on numa socket %u\n", + rte_socket_id()); + + if (!pci_dev) + return -1; + + /* Clean up all vDPA devices */ + nthw_vdpa_close(); + + /* reserve an ethdev entry */ + eth_dev = rte_eth_dev_allocated(rte_vdev_device_name(pci_dev)); + if (eth_dev == NULL) + return -1; + + rte_free(eth_dev->data->dev_private); + rte_free(eth_dev->data); + + rte_eth_dev_release_port(eth_dev); + + return 0; +} + +/* + * LAG + */ + +#define LAG_PORT0_ONLY (100) +#define LAG_BALANCED_50_50 (50) +#define LAG_PORT1_ONLY (0) + +#define LAG_NO_TX (0) +#define LAG_PORT0_INDEX (1) +#define LAG_PORT1_INDEX (2) +#define LAG_HASH_INDEX (3) + +static int lag_nop(lag_config_t *config __rte_unused) +{ + return 0; +} + +static int lag_balance(lag_config_t *config __rte_unused) +{ + NT_LOG(DBG, ETHDEV, "AA LAG: balanced output\n"); + return lag_set_config(0, FLOW_LAG_SET_BALANCE, 0, LAG_BALANCED_50_50); +} + +static int lag_port0_active(lag_config_t *config __rte_unused) +{ + NT_LOG(DBG, ETHDEV, "AA LAG: port 0 output only\n"); + return lag_set_config(0, FLOW_LAG_SET_BALANCE, 0, LAG_PORT0_ONLY); +} + +static int lag_port1_active(lag_config_t *config __rte_unused) +{ + NT_LOG(DBG, ETHDEV, "AA LAG: port 1 output only\n"); + return lag_set_config(0, FLOW_LAG_SET_BALANCE, 0, LAG_PORT1_ONLY); +} + +static int lag_notx(lag_config_t *config __rte_unused) +{ + NT_LOG(DBG, ETHDEV, "AA LAG: no link\n"); + + int retval = 0; + + retval += + lag_set_config(0, FLOW_LAG_SET_ALL, LAG_PORT0_INDEX, LAG_NO_TX); + retval += + lag_set_config(0, FLOW_LAG_SET_ALL, LAG_HASH_INDEX, LAG_NO_TX); + return retval; +} + +static bool lag_get_link_status(lag_config_t *lag_config, uint8_t port) +{ + struct adapter_info_s *p_adapter_info = + &lag_config->internals->p_drv->ntdrv.adapter_info; + const bool link_up = nt4ga_port_get_link_status(p_adapter_info, port); + + NT_LOG(DBG, ETHDEV, "port %d status: %d\n", port, link_up); + return link_up; +} + +static int lag_get_status(lag_config_t *config) +{ + uint8_t port0 = lag_get_link_status(config, 0); + + uint8_t port1 = lag_get_link_status(config, 1); + + uint8_t status = (port1 << 1 | port0); + return status; +} + +static int lag_activate_primary(lag_config_t *config) +{ + int retval; + + uint8_t port_0_distribution; + uint8_t blocked_port; + + if (config->primary_port == 0) { + /* If port 0 is the active primary, then it take 100% of the hash distribution. */ + port_0_distribution = 100; + blocked_port = LAG_PORT1_INDEX; + } else { + /* If port 1 is the active primary, then port 0 take 0% of the hash distribution. */ + port_0_distribution = 0; + blocked_port = LAG_PORT0_INDEX; + } + + retval = + lag_set_config(0, FLOW_LAG_SET_BALANCE, 0, port_0_distribution); + + /* Block Rx on the backup port */ + retval += lag_set_port_block(0, blocked_port); + + return retval; +} + +static int lag_activate_backup(lag_config_t *config) +{ + int retval; + + uint8_t port_0_distribution; + uint8_t blocked_port; + + if (config->backup_port == 0) { + /* If port 0 is the active backup, then it take 100% of the hash distribution. */ + port_0_distribution = 100; + blocked_port = LAG_PORT1_INDEX; + } else { + /* If port 1 is the active backup, then port 0 take 0% of the hash distribution. 
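lag_get_status() above packs the two member links into a two-bit value with port 0 in the LSB and port 1 in the MSB, so the result can be compared directly against enum lag_state_e further down. A minimal sketch of the encoding, with pack_lag_status() as an illustrative name:

#include <stdbool.h>
#include <stdint.h>

static uint8_t pack_lag_status(bool port0_up, bool port1_up)
{
	/* port 0 -> bit 0, port 1 -> bit 1 */
	return (uint8_t)(((uint8_t)port1_up << 1) | (uint8_t)port0_up);
}

/* pack_lag_status(true, false) == 0x1 == P0UP_P1DOWN */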
*/ + port_0_distribution = 0; + blocked_port = LAG_PORT0_INDEX; + } + + /* Tx only on the backup port */ + retval = + lag_set_config(0, FLOW_LAG_SET_BALANCE, 0, port_0_distribution); + + /* Block Rx on the primary port */ + retval += lag_set_port_block(0, blocked_port); + + return retval; +} + +static int lag_active_backup(lag_config_t *config) +{ + uint8_t backup_port_active = 0; + + /* Initialize with the primary port active */ + lag_activate_primary(config); + + while (config->lag_thread_active) { + usleep(500 * + 1000); /* 500 ms sleep between testing the link status. */ + + bool primary_port_status = + lag_get_link_status(config, config->primary_port); + + if (!primary_port_status) { + bool backup_port_status = + lag_get_link_status(config, config->backup_port); + /* If the backup port has been activated, no need to do more. */ + if (backup_port_active) + continue; + + /* If the backup port is up, flip to it. */ + if (backup_port_status) { + NT_LOG(DBG, ETHDEV, + "LAG: primary port down => swapping to backup port\n"); + lag_activate_backup(config); + backup_port_active = 1; + } + } else { + /* If using the backup port and primary come back. */ + if (backup_port_active) { + NT_LOG(DBG, ETHDEV, + "LAG: primary port restored => swapping to primary port\n"); + lag_activate_primary(config); + backup_port_active = 0; + } /* Backup is active, while primary is restored. */ + } /* Primary port status */ + } + + return 0; +} + +typedef int (*lag_aa_action)(lag_config_t *config); + +/* port 0 is LSB and port 1 is MSB */ +enum lag_state_e { + P0DOWN_P1DOWN = 0b00, + P0UP_P1DOWN = 0b01, + P0DOWN_P1UP = 0b10, + P0UP_P1UP = 0b11 +}; + +struct lag_action_s { + enum lag_state_e src_state; + enum lag_state_e dst_state; + lag_aa_action action; +}; + +struct lag_action_s actions[] = { + /* No action in same state */ + { P0UP_P1UP, P0UP_P1UP, lag_nop }, + { P0UP_P1DOWN, P0UP_P1DOWN, lag_nop }, + { P0DOWN_P1UP, P0DOWN_P1UP, lag_nop }, + { P0DOWN_P1DOWN, P0DOWN_P1DOWN, lag_nop }, + + /* UU start */ + { P0UP_P1UP, P0UP_P1DOWN, lag_port0_active }, + { P0UP_P1UP, P0DOWN_P1UP, lag_port1_active }, + { P0UP_P1UP, P0DOWN_P1DOWN, lag_notx }, + + /* UD start */ + { P0UP_P1DOWN, P0DOWN_P1DOWN, lag_notx }, + { P0UP_P1DOWN, P0DOWN_P1UP, lag_port1_active }, + { P0UP_P1DOWN, P0UP_P1UP, lag_balance }, + + /* DU start */ + { P0DOWN_P1UP, P0DOWN_P1DOWN, lag_notx }, + { P0DOWN_P1UP, P0UP_P1DOWN, lag_port0_active }, + { P0DOWN_P1UP, P0UP_P1UP, lag_balance }, + + /* DD start */ + { P0DOWN_P1DOWN, P0DOWN_P1UP, lag_port1_active }, + { P0DOWN_P1DOWN, P0UP_P1DOWN, lag_port0_active }, + { P0DOWN_P1DOWN, P0UP_P1UP, lag_balance }, +}; + +static lag_aa_action lookup_action(enum lag_state_e current_state, + enum lag_state_e new_state) +{ + uint32_t i; + + for (i = 0; i < sizeof(actions) / sizeof(struct lag_action_s); i++) { + if (actions[i].src_state == current_state && + actions[i].dst_state == new_state) + return actions[i].action; + } + return NULL; +} + +static int lag_active_active(lag_config_t *config) +{ + enum lag_state_e ports_status; + + /* Set the initial state to 50/50% */ + enum lag_state_e current_state = P0UP_P1UP; + + lag_balance(config); + /* No ports are blocked in active/active */ + lag_set_port_block(0, 0); + + lag_aa_action action; + + while (config->lag_thread_active) { + /* 500 ms sleep between testing the link status. 
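The active/active handler further down is a small table-driven state machine: every (previous, current) pair of the four link states has an entry in the actions table, so the lookup cannot return NULL as long as that table stays complete. A stripped-down sketch of the same pattern, with illustrative names:

#include <stddef.h>

/* Illustrative mirror of lag_state_e / lag_action_s / lookup_action() */
enum link_state { DOWN_DOWN, UP_DOWN, DOWN_UP, UP_UP };

typedef int (*transition_fn)(void *ctx);

struct transition {
	enum link_state from;
	enum link_state to;
	transition_fn fn;
};

static transition_fn find_transition(const struct transition *tbl, size_t n,
				     enum link_state from, enum link_state to)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].from == from && tbl[i].to == to)
			return tbl[i].fn;
	}
	return NULL;	/* only reachable if the table is incomplete */
}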
*/ + usleep(500 * 1000); + + ports_status = lag_get_status(config); + + action = lookup_action(current_state, ports_status); + action(config); + + current_state = ports_status; + } + + return 0; +} + +static void *lag_management(void *arg) +{ + lag_config_t *config = (lag_config_t *)arg; + + switch (config->mode) { + case BONDING_MODE_ACTIVE_BACKUP: + lag_active_backup(config); + break; + + case BONDING_MODE_8023AD: + lag_active_active(config); + break; + + default: + fprintf(stderr, "Unsupported NTbond mode\n"); + return NULL; + } + + return NULL; +} diff --git a/drivers/net/ntnic/ntnic_ethdev.h b/drivers/net/ntnic/ntnic_ethdev.h new file mode 100644 index 0000000000..a82027cbe7 --- /dev/null +++ b/drivers/net/ntnic/ntnic_ethdev.h @@ -0,0 +1,357 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2023 Napatech A/S + */ + +#ifndef __NTNIC_ETHDEV_H__ +#define __NTNIC_ETHDEV_H__ + +#include + +#include +#include /* RTE_VERSION, RTE_VERSION_NUM */ +#include +#include +#include +#include + +#include "ntos_system.h" +#include "ntnic_dbsconfig.h" +#include "stream_binary_flow_api.h" + +#if (RTE_VERSION_NUM(22, 07, 0, 0) <= RTE_VERSION) +#undef ETH_LINK_HALF_DUPLEX +#undef ETH_LINK_FULL_DUPLEX +#undef ETH_LINK_DOWN +#undef ETH_LINK_UP +#undef ETH_LINK_FIXED +#undef ETH_LINK_AUTONEG +#undef ETH_SPEED_NUM_NONE +#undef ETH_SPEED_NUM_10M +#undef ETH_SPEED_NUM_100M +#undef ETH_SPEED_NUM_1G +#undef ETH_SPEED_NUM_2_5G +#undef ETH_SPEED_NUM_5G +#undef ETH_SPEED_NUM_10G +#undef ETH_SPEED_NUM_20G +#undef ETH_SPEED_NUM_25G +#undef ETH_SPEED_NUM_40G +#undef ETH_SPEED_NUM_50G +#undef ETH_SPEED_NUM_56G +#undef ETH_SPEED_NUM_100G +#undef ETH_SPEED_NUM_200G +#undef ETH_SPEED_NUM_UNKNOWN +#undef ETH_LINK_SPEED_AUTONEG +#undef ETH_LINK_SPEED_FIXED +#undef ETH_LINK_SPEED_10M_HD +#undef ETH_LINK_SPEED_10M +#undef ETH_LINK_SPEED_100M_HD +#undef ETH_LINK_SPEED_100M +#undef ETH_LINK_SPEED_1G +#undef ETH_LINK_SPEED_2_5G +#undef ETH_LINK_SPEED_5G +#undef ETH_LINK_SPEED_10G +#undef ETH_LINK_SPEED_20G +#undef ETH_LINK_SPEED_25G +#undef ETH_LINK_SPEED_40G +#undef ETH_LINK_SPEED_50G +#undef ETH_LINK_SPEED_56G +#undef ETH_LINK_SPEED_100G +#undef ETH_LINK_SPEED_200G +#undef ETH_RSS_IP +#undef ETH_RSS_UDP +#undef ETH_RSS_TCP +#undef ETH_RSS_SCTP +#define ETH_LINK_HALF_DUPLEX RTE_ETH_LINK_HALF_DUPLEX +#define ETH_LINK_FULL_DUPLEX RTE_ETH_LINK_FULL_DUPLEX +#define ETH_LINK_DOWN RTE_ETH_LINK_DOWN +#define ETH_LINK_UP RTE_ETH_LINK_UP +#define ETH_LINK_FIXED RTE_ETH_LINK_FIXED +#define ETH_LINK_AUTONEG RTE_ETH_LINK_AUTONEG +#define ETH_SPEED_NUM_NONE RTE_ETH_SPEED_NUM_NONE +#define ETH_SPEED_NUM_10M RTE_ETH_SPEED_NUM_10M +#define ETH_SPEED_NUM_100M RTE_ETH_SPEED_NUM_100M +#define ETH_SPEED_NUM_1G RTE_ETH_SPEED_NUM_1G +#define ETH_SPEED_NUM_2_5G RTE_ETH_SPEED_NUM_2_5G +#define ETH_SPEED_NUM_5G RTE_ETH_SPEED_NUM_5G +#define ETH_SPEED_NUM_10G RTE_ETH_SPEED_NUM_10G +#define ETH_SPEED_NUM_20G RTE_ETH_SPEED_NUM_20G +#define ETH_SPEED_NUM_25G RTE_ETH_SPEED_NUM_25G +#define ETH_SPEED_NUM_40G RTE_ETH_SPEED_NUM_40G +#define ETH_SPEED_NUM_50G RTE_ETH_SPEED_NUM_50G +#define ETH_SPEED_NUM_56G RTE_ETH_SPEED_NUM_56G +#define ETH_SPEED_NUM_100G RTE_ETH_SPEED_NUM_100G +#define ETH_SPEED_NUM_200G RTE_ETH_SPEED_NUM_200G +#define ETH_SPEED_NUM_UNKNOWN RTE_ETH_SPEED_NUM_UNKNOWN +#define ETH_LINK_SPEED_AUTONEG RTE_ETH_LINK_SPEED_AUTONEG +#define ETH_LINK_SPEED_FIXED RTE_ETH_LINK_SPEED_FIXED +#define ETH_LINK_SPEED_10M_HD RTE_ETH_LINK_SPEED_10M_HD +#define ETH_LINK_SPEED_10M RTE_ETH_LINK_SPEED_10M +#define ETH_LINK_SPEED_100M_HD 
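The long block of #undef/#define pairs in this header is a compatibility shim for the ETH_* to RTE_ETH_* symbol rename in newer DPDK, gated on RTE_VERSION. The pattern in isolation looks like the sketch below (one representative macro only, purely illustrative).

#include <rte_version.h>
#include <rte_ethdev.h>

#if RTE_VERSION >= RTE_VERSION_NUM(22, 7, 0, 0)
#undef  ETH_LINK_DOWN
#define ETH_LINK_DOWN RTE_ETH_LINK_DOWN
#endif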
RTE_ETH_LINK_SPEED_100M_HD +#define ETH_LINK_SPEED_100M RTE_ETH_LINK_SPEED_100M +#define ETH_LINK_SPEED_1G RTE_ETH_LINK_SPEED_1G +#define ETH_LINK_SPEED_2_5G RTE_ETH_LINK_SPEED_2_5G +#define ETH_LINK_SPEED_5G RTE_ETH_LINK_SPEED_5G +#define ETH_LINK_SPEED_10G RTE_ETH_LINK_SPEED_10G +#define ETH_LINK_SPEED_20G RTE_ETH_LINK_SPEED_20G +#define ETH_LINK_SPEED_25G RTE_ETH_LINK_SPEED_25G +#define ETH_LINK_SPEED_40G RTE_ETH_LINK_SPEED_40G +#define ETH_LINK_SPEED_50G RTE_ETH_LINK_SPEED_50G +#define ETH_LINK_SPEED_56G RTE_ETH_LINK_SPEED_56G +#define ETH_LINK_SPEED_100G RTE_ETH_LINK_SPEED_100G +#define ETH_LINK_SPEED_200G RTE_ETH_LINK_SPEED_200G +#define ETH_RSS_IP RTE_ETH_RSS_IP +#define ETH_RSS_UDP RTE_ETH_RSS_UDP +#define ETH_RSS_TCP RTE_ETH_RSS_TCP +#define ETH_RSS_SCTP RTE_ETH_RSS_SCTP +#define ETH_RSS_IPV4 RTE_ETH_RSS_IPV4 +#define ETH_RSS_FRAG_IPV4 RTE_ETH_RSS_FRAG_IPV4 +#define ETH_RSS_NONFRAG_IPV4_OTHER RTE_ETH_RSS_NONFRAG_IPV4_OTHER +#define ETH_RSS_IPV6 RTE_ETH_RSS_IPV6 +#define ETH_RSS_FRAG_IPV6 RTE_ETH_RSS_FRAG_IPV6 +#define ETH_RSS_NONFRAG_IPV6_OTHER RTE_ETH_RSS_NONFRAG_IPV6_OTHER +#define ETH_RSS_IPV6_EX RTE_ETH_RSS_IPV6_EX +#define ETH_RSS_C_VLAN RTE_ETH_RSS_C_VLAN +#define ETH_RSS_L3_DST_ONLY RTE_ETH_RSS_L3_DST_ONLY +#define ETH_RSS_L3_SRC_ONLY RTE_ETH_RSS_L3_SRC_ONLY +#endif + +#define NUM_MAC_ADDRS_PER_PORT (16U) +#define NUM_MULTICAST_ADDRS_PER_PORT (16U) + +#define MAX_FPGA_VIRTUAL_PORTS_SUPPORTED 256 + +/* Total max ports per NT NFV NIC */ +#define MAX_NTNIC_PORTS 2 + +/* Max RSS queues */ +#define MAX_QUEUES 125 + +#define SG_NB_HW_RX_DESCRIPTORS 1024 +#define SG_NB_HW_TX_DESCRIPTORS 1024 +#define SG_HW_RX_PKT_BUFFER_SIZE (1024 << 1) +#define SG_HW_TX_PKT_BUFFER_SIZE (1024 << 1) + +#define SG_HDR_SIZE 12 + +/* VQ buffers needed to fit all data in packet + header */ +#define NUM_VQ_SEGS(_data_size_) \ + ({ \ + size_t _size = (_data_size_); \ + size_t _segment_count = ((_size + SG_HDR_SIZE) > SG_HW_TX_PKT_BUFFER_SIZE) ? 
\ + (((_size + SG_HDR_SIZE) + SG_HW_TX_PKT_BUFFER_SIZE - 1) / \ + SG_HW_TX_PKT_BUFFER_SIZE) : 1; \ + _segment_count; \ + }) + + +#define VIRTQ_DESCR_IDX(_tx_pkt_idx_) \ + (((_tx_pkt_idx_) + first_vq_descr_idx) % SG_NB_HW_TX_DESCRIPTORS) + +#define VIRTQ_DESCR_IDX_NEXT(_vq_descr_idx_) \ + (((_vq_descr_idx_) + 1) % SG_NB_HW_TX_DESCRIPTORS) + +#define MAX_REL_VQS 128 + +/* Functions: */ +struct pmd_internals *vp_vhid_instance_ready(int vhid); +struct pmd_internals *vp_path_instance_ready(const char *path); +int setup_virtual_pf_representor_base(struct rte_pci_device *dev); +int nthw_create_vf_interface_dpdk(struct rte_pci_device *pci_dev); +int nthw_remove_vf_interface_dpdk(struct rte_pci_device *pci_dev); +nthw_dbs_t *get_pdbs_from_pci(struct rte_pci_addr pci_addr); +enum fpga_info_profile get_fpga_profile_from_pci(struct rte_pci_addr pci_addr); +int register_release_virtqueue_info(struct nthw_virt_queue *vq, int rx, + int managed); +int de_register_release_virtqueue_info(struct nthw_virt_queue *vq); +int copy_mbuf_to_virtqueue(struct nthw_cvirtq_desc *cvq_desc, + uint16_t vq_descr_idx, + struct nthw_memory_descriptor *vq_bufs, int max_segs, + struct rte_mbuf *mbuf); + +extern int lag_active; +extern uint64_t rte_tsc_freq; +extern rte_spinlock_t hwlock; + +/* Structs: */ + +#define SG_HDR_SIZE 12 + +struct _pkt_hdr_rx { + uint32_t cap_len : 14; + uint32_t fid : 10; + uint32_t ofs1 : 8; + uint32_t ip_prot : 8; + uint32_t port : 13; + uint32_t descr : 8; + uint32_t descr_12b : 1; + uint32_t color_type : 2; + uint32_t color : 32; +}; + +struct _pkt_hdr_tx { + uint32_t cap_len : 14; + uint32_t lso_cso0 : 9; + uint32_t lso_cso1 : 9; + uint32_t lso_cso2 : 8; + /* all 1's : use implicit in-port. 0-127 queue index. 0x80 + phy-port to phy */ + uint32_t bypass_port : 13; + uint32_t descr : 8; + uint32_t descr_12b : 1; + uint32_t color_type : 2; + uint32_t color : 32; +}; + +/* Compile time verification of scatter gather header size. */ +typedef char check_sg_pkt_rx_hdr_size +[(sizeof(struct _pkt_hdr_rx) == SG_HDR_SIZE) ? 1 : -1]; +typedef char check_sg_pkt_tx_hdr_size +[(sizeof(struct _pkt_hdr_tx) == SG_HDR_SIZE) ? 
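NUM_VQ_SEGS() above is a ceiling division of (payload + 12-byte SG header) by the 2 KiB per-buffer size, with a floor of one segment. The worked example below checks a few values with an equivalent plain function; vq_segs() is illustrative and uses the same constants.

#include <assert.h>
#include <stddef.h>

#define HDR 12		/* SG_HDR_SIZE */
#define BUF 2048	/* SG_HW_TX_PKT_BUFFER_SIZE */

/* Plain-function equivalent of NUM_VQ_SEGS(), for illustration */
static size_t vq_segs(size_t data)
{
	return (data + HDR) > BUF ? (data + HDR + BUF - 1) / BUF : 1;
}

int main(void)
{
	assert(vq_segs(1000) == 1);	/* 1012 bytes fit in one buffer     */
	assert(vq_segs(2036) == 1);	/* exactly 2048 bytes, still one    */
	assert(vq_segs(2037) == 2);	/* 2049 bytes -> ceil(2049/2048)    */
	assert(vq_segs(6000) == 3);	/* 6012 bytes -> three 2 KiB chunks */
	return 0;
}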
1 : -1]; + +typedef void *handle_t; + +struct hwq_s { + int vf_num; + struct nthw_memory_descriptor virt_queues_ctrl; + struct nthw_memory_descriptor *pkt_buffers; +}; + +struct ntnic_rx_queue { + struct flow_queue_id_s + queue; /* queue info - user id and hw queue index */ + + struct rte_mempool *mb_pool; /* mbuf memory pool */ + uint16_t buf_size; /* size of data area in mbuf */ + unsigned long rx_pkts; /* Rx packet statistics */ + unsigned long rx_bytes; /* Rx bytes statistics */ + unsigned long err_pkts; /* Rx error packet statistics */ + int enabled; /* Enabling/disabling of this queue */ + + struct hwq_s hwq; + struct nthw_virt_queue *vq; + int nb_hw_rx_descr; + nt_meta_port_type_t type; + uint32_t port; /* Rx port for this queue */ + enum fpga_info_profile profile; /* Vswitch / Inline / Capture */ + +} __rte_cache_aligned; + +struct ntnic_tx_queue { + struct flow_queue_id_s + queue; /* queue info - user id and hw queue index */ + struct hwq_s hwq; + struct nthw_virt_queue *vq; + int nb_hw_tx_descr; + /* Used for bypass in NTDVIO0 header on Tx - pre calculated */ + int target_id; + nt_meta_port_type_t type; + /* only used for exception tx queue from OVS SW switching */ + int rss_target_id; + + uint32_t port; /* Tx port for this queue */ + unsigned long tx_pkts; /* Tx packet statistics */ + unsigned long tx_bytes; /* Tx bytes statistics */ + unsigned long err_pkts; /* Tx error packet stat */ + int enabled; /* Enabling/disabling of this queue */ + enum fpga_info_profile profile; /* Vswitch / Inline / Capture */ +} __rte_cache_aligned; + +#define MAX_ARRAY_ENTRIES MAX_QUEUES +struct array_s { + uint32_t value[MAX_ARRAY_ENTRIES]; + int count; +}; + +/* Configuerations related to LAG management */ +typedef struct { + uint8_t mode; + + int8_t primary_port; + int8_t backup_port; + + uint32_t ntpl_rx_id; + + pthread_t lag_tid; + uint8_t lag_thread_active; + + struct pmd_internals *internals; +} lag_config_t; + +#define BONDING_MODE_ACTIVE_BACKUP (1) +#define BONDING_MODE_8023AD (4) +struct nt_mtr_profile { + LIST_ENTRY(nt_mtr_profile) next; + uint32_t profile_id; + struct rte_mtr_meter_profile profile; +}; + +struct nt_mtr { + LIST_ENTRY(nt_mtr) next; + uint32_t mtr_id; + int shared; + struct nt_mtr_profile *profile; +}; + +enum virt_port_comm { + VIRT_PORT_NEGOTIATED_NONE, + VIRT_PORT_NEGOTIATED_SPLIT, + VIRT_PORT_NEGOTIATED_PACKED, + VIRT_PORT_USE_RELAY +}; + +#define MAX_PATH_LEN 128 + +struct pmd_internals { + const struct rte_pci_device *pci_dev; + + struct flow_eth_dev *flw_dev; + + char name[20]; + char vhost_path[MAX_PATH_LEN]; + + int n_intf_no; + int if_index; + + int lpbk_mode; + + uint8_t nb_ports_on_adapter; + uint8_t ts_multiplier; + uint16_t min_tx_pkt_size; + uint16_t max_tx_pkt_size; + + unsigned int nb_rx_queues; /* Number of Rx queues configured */ + unsigned int nb_tx_queues; /* Number of Tx queues configured */ + uint32_t port; + uint8_t port_id; + + nt_meta_port_type_t type; + struct flow_queue_id_s vpq[MAX_QUEUES]; + unsigned int vpq_nb_vq; + volatile atomic_int vhid; /* if a virtual port type - the vhid */ + enum virt_port_comm vport_comm; /* link and how split,packed,relay */ + uint32_t vlan; + + lag_config_t *lag_config; + + struct ntnic_rx_queue rxq_scg[MAX_QUEUES]; /* Array of Rx queues */ + struct ntnic_tx_queue txq_scg[MAX_QUEUES]; /* Array of Tx queues */ + + struct drv_s *p_drv; + /* Ethernet (MAC) addresses. Element number zero denotes default address. */ + struct rte_ether_addr eth_addrs[NUM_MAC_ADDRS_PER_PORT]; + /* Multicast ethernet (MAC) addresses. 
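The check_sg_pkt_*_hdr_size typedefs above use the negative-array-size trick: if either header struct ever drifts from the 12-byte SG header, the array size evaluates to -1 and compilation fails. On a C11 toolchain the same guarantee could be written as below; this is only an illustration assuming it sits in this same header with the struct definitions and SG_HDR_SIZE in scope, not a proposed change.

#include <assert.h>	/* static_assert on C11 */

static_assert(sizeof(struct _pkt_hdr_rx) == SG_HDR_SIZE,
	      "RX scatter-gather header must be exactly 12 bytes");
static_assert(sizeof(struct _pkt_hdr_tx) == SG_HDR_SIZE,
	      "TX scatter-gather header must be exactly 12 bytes");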
*/ + struct rte_ether_addr mc_addrs[NUM_MULTICAST_ADDRS_PER_PORT]; + + LIST_HEAD(_profiles, nt_mtr_profile) mtr_profiles; + LIST_HEAD(_mtrs, nt_mtr) mtrs; + + uint64_t last_stat_rtc; + uint64_t rx_missed; + + struct pmd_internals *next; +}; + +void cleanup_flows(struct pmd_internals *internals); +int poll_statistics(struct pmd_internals *internals); +int debug_adapter_show_info(uint32_t pciident, FILE *pfh); + +#endif /* __NTNIC_ETHDEV_H__ */ diff --git a/drivers/net/ntnic/ntnic_filter/create_elements.h b/drivers/net/ntnic/ntnic_filter/create_elements.h new file mode 100644 index 0000000000..e90643ec6b --- /dev/null +++ b/drivers/net/ntnic/ntnic_filter/create_elements.h @@ -0,0 +1,1190 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2023 Napatech A/S + */ + +#ifndef __CREATE_ELEMENTS_H__ +#define __CREATE_ELEMENTS_H__ + +#include "stream_binary_flow_api.h" + +#define MAX_ELEMENTS 64 +#define MAX_ACTIONS 32 + +#define MAX_COLOR_FLOW_STATS 0x400 + +struct cnv_match_s { + struct flow_elem flow_elem[MAX_ELEMENTS]; +}; + +struct tun_def_s { + struct flow_elem *tun_definition; + struct cnv_match_s match; +}; + +struct cnv_attr_s { + struct cnv_match_s match; + struct flow_attr attr; +}; + +struct cnv_action_s { + struct flow_action flow_actions[MAX_ACTIONS]; + struct tun_def_s tun_def; + struct flow_action_rss flow_rss; + struct rte_flow_action_mark mark; + struct flow_action_raw_encap encap; + struct flow_action_raw_decap decap; + struct flow_action_queue queue; +}; + +/* + * Only needed because it eases the use of statistics through NTAPI + * for faster integration into NTAPI version of driver + * Therefore, this is only a good idea when running on a temporary NTAPI + * The query() functionality must go to flow engine, when moved to Open Source driver + */ + +struct rte_flow { + void *flw_hdl; + int used; + uint32_t flow_stat_id; + + uint64_t stat_pkts; + uint64_t stat_bytes; + uint8_t stat_tcp_flags; +}; + +enum nt_rte_flow_item_type { + NT_RTE_FLOW_ITEM_TYPE_END = INT_MIN, + NT_RTE_FLOW_ITEM_TYPE_TAG, + NT_RTE_FLOW_ITEM_TYPE_TUNNEL, +}; + +enum nt_rte_flow_action_type { + NT_RTE_FLOW_ACTION_TYPE_END = INT_MIN, + NT_RTE_FLOW_ACTION_TYPE_TAG, + NT_RTE_FLOW_ACTION_TYPE_TUNNEL_SET, + NT_RTE_FLOW_ACTION_TYPE_JUMP, +}; + +static int convert_tables_initialized; + +#define MAX_RTE_ENUM_INDEX 127 + +static int elem_list[MAX_RTE_ENUM_INDEX + 1]; +static int action_list[MAX_RTE_ENUM_INDEX + 1]; + +#ifdef RTE_FLOW_DEBUG +static const char *elem_list_str[MAX_RTE_ENUM_INDEX + 1]; +static const char *action_list_str[MAX_RTE_ENUM_INDEX + 1]; +#endif + +#define CNV_TO_ELEM(item) \ + ({ \ + int _temp_item = (item); \ + ((_temp_item >= 0 && _temp_item <= MAX_RTE_ENUM_INDEX) ? \ + elem_list[_temp_item] : -1); \ + }) + + +#define CNV_TO_ACTION(action) \ + ({ \ + int _temp_action = (action); \ + (_temp_action >= 0 && _temp_action <= MAX_RTE_ENUM_INDEX) ? 
\ + action_list[_temp_action] : -1; \ + }) + + +static uint32_t flow_stat_id_map[MAX_COLOR_FLOW_STATS]; +static rte_spinlock_t flow_lock = RTE_SPINLOCK_INITIALIZER; + +static int convert_error(struct rte_flow_error *error, + struct flow_error *flow_error) +{ + if (error) { + error->cause = NULL; + error->message = flow_error->message; + + if (flow_error->type == FLOW_ERROR_NONE || + flow_error->type == FLOW_ERROR_SUCCESS) + error->type = RTE_FLOW_ERROR_TYPE_NONE; + + else + error->type = RTE_FLOW_ERROR_TYPE_UNSPECIFIED; + } + return 0; +} + +/* + * Map Flow MARK to flow stat id + */ +static uint32_t create_flow_stat_id_locked(uint32_t mark) +{ + uint32_t flow_stat_id = mark & (MAX_COLOR_FLOW_STATS - 1); + + while (flow_stat_id_map[flow_stat_id]) + flow_stat_id = (flow_stat_id + 1) & (MAX_COLOR_FLOW_STATS - 1); + + flow_stat_id_map[flow_stat_id] = mark + 1; + return flow_stat_id; +} + +static uint32_t create_flow_stat_id(uint32_t mark) +{ + rte_spinlock_lock(&flow_lock); + uint32_t ret = create_flow_stat_id_locked(mark); + + rte_spinlock_unlock(&flow_lock); + return ret; +} + +static void delete_flow_stat_id_locked(uint32_t flow_stat_id) +{ + if (flow_stat_id < MAX_COLOR_FLOW_STATS) + flow_stat_id_map[flow_stat_id] = 0; +} + +static void initialize_global_cnv_tables(void) +{ + if (convert_tables_initialized) + return; + + memset(elem_list, -1, sizeof(elem_list)); + elem_list[RTE_FLOW_ITEM_TYPE_END] = FLOW_ELEM_TYPE_END; + elem_list[RTE_FLOW_ITEM_TYPE_ANY] = FLOW_ELEM_TYPE_ANY; + elem_list[RTE_FLOW_ITEM_TYPE_ETH] = FLOW_ELEM_TYPE_ETH; + elem_list[RTE_FLOW_ITEM_TYPE_VLAN] = FLOW_ELEM_TYPE_VLAN; + elem_list[RTE_FLOW_ITEM_TYPE_IPV4] = FLOW_ELEM_TYPE_IPV4; + elem_list[RTE_FLOW_ITEM_TYPE_IPV6] = FLOW_ELEM_TYPE_IPV6; + elem_list[RTE_FLOW_ITEM_TYPE_UDP] = FLOW_ELEM_TYPE_UDP; + elem_list[RTE_FLOW_ITEM_TYPE_SCTP] = FLOW_ELEM_TYPE_SCTP; + elem_list[RTE_FLOW_ITEM_TYPE_TCP] = FLOW_ELEM_TYPE_TCP; + elem_list[RTE_FLOW_ITEM_TYPE_ICMP] = FLOW_ELEM_TYPE_ICMP; + elem_list[RTE_FLOW_ITEM_TYPE_VXLAN] = FLOW_ELEM_TYPE_VXLAN; + elem_list[RTE_FLOW_ITEM_TYPE_GTP] = FLOW_ELEM_TYPE_GTP; + elem_list[RTE_FLOW_ITEM_TYPE_PORT_ID] = FLOW_ELEM_TYPE_PORT_ID; + elem_list[RTE_FLOW_ITEM_TYPE_TAG] = FLOW_ELEM_TYPE_TAG; + elem_list[RTE_FLOW_ITEM_TYPE_VOID] = FLOW_ELEM_TYPE_VOID; + +#ifdef RTE_FLOW_DEBUG + elem_list_str[RTE_FLOW_ITEM_TYPE_END] = "FLOW_ELEM_TYPE_END"; + elem_list_str[RTE_FLOW_ITEM_TYPE_ANY] = "FLOW_ELEM_TYPE_ANY"; + elem_list_str[RTE_FLOW_ITEM_TYPE_ETH] = "FLOW_ELEM_TYPE_ETH"; + elem_list_str[RTE_FLOW_ITEM_TYPE_VLAN] = "FLOW_ELEM_TYPE_VLAN"; + elem_list_str[RTE_FLOW_ITEM_TYPE_IPV4] = "FLOW_ELEM_TYPE_IPV4"; + elem_list_str[RTE_FLOW_ITEM_TYPE_IPV6] = "FLOW_ELEM_TYPE_IPV6"; + elem_list_str[RTE_FLOW_ITEM_TYPE_UDP] = "FLOW_ELEM_TYPE_UDP"; + elem_list_str[RTE_FLOW_ITEM_TYPE_SCTP] = "FLOW_ELEM_TYPE_SCTP"; + elem_list_str[RTE_FLOW_ITEM_TYPE_TCP] = "FLOW_ELEM_TYPE_TCP"; + elem_list_str[RTE_FLOW_ITEM_TYPE_ICMP] = "FLOW_ELEM_TYPE_ICMP"; + elem_list_str[RTE_FLOW_ITEM_TYPE_VXLAN] = "FLOW_ELEM_TYPE_VXLAN"; + elem_list_str[RTE_FLOW_ITEM_TYPE_GTP] = "FLOW_ELEM_TYPE_GTP"; + elem_list_str[RTE_FLOW_ITEM_TYPE_PORT_ID] = "FLOW_ELEM_TYPE_PORT_ID"; + elem_list_str[RTE_FLOW_ITEM_TYPE_TAG] = "FLOW_ELEM_TYPE_TAG"; + elem_list_str[RTE_FLOW_ITEM_TYPE_VOID] = "FLOW_ELEM_TYPE_VOID"; +#endif + + memset(action_list, -1, sizeof(action_list)); + action_list[RTE_FLOW_ACTION_TYPE_END] = FLOW_ACTION_TYPE_END; + action_list[RTE_FLOW_ACTION_TYPE_MARK] = FLOW_ACTION_TYPE_MARK; + action_list[RTE_FLOW_ACTION_TYPE_SET_TAG] = FLOW_ACTION_TYPE_SET_TAG; + 
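create_flow_stat_id_locked() above maps an rte_flow MARK value into a bounded flow-statistics slot: the mark is masked into a power-of-two table, collisions are resolved by linear probing, and the stored value is mark+1 so that 0 can mean "free". A self-contained sketch of the same allocation; alloc_slot() and SLOTS are illustrative names, and, as in the driver, the caller is expected to keep the table from filling up.

#include <stdint.h>

#define SLOTS 0x400	/* MAX_COLOR_FLOW_STATS; must stay a power of two */

static uint32_t slot_map[SLOTS];	/* 0 means free, otherwise mark + 1 */

static uint32_t alloc_slot(uint32_t mark)
{
	uint32_t slot = mark & (SLOTS - 1);	/* hash = low bits of the mark */

	while (slot_map[slot])			/* occupied -> linear probe */
		slot = (slot + 1) & (SLOTS - 1);

	slot_map[slot] = mark + 1;
	return slot;
}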
action_list[RTE_FLOW_ACTION_TYPE_DROP] = FLOW_ACTION_TYPE_DROP; + action_list[RTE_FLOW_ACTION_TYPE_COUNT] = FLOW_ACTION_TYPE_COUNT; + action_list[RTE_FLOW_ACTION_TYPE_RSS] = FLOW_ACTION_TYPE_RSS; + action_list[RTE_FLOW_ACTION_TYPE_PORT_ID] = FLOW_ACTION_TYPE_PORT_ID; + action_list[RTE_FLOW_ACTION_TYPE_QUEUE] = FLOW_ACTION_TYPE_QUEUE; + action_list[RTE_FLOW_ACTION_TYPE_JUMP] = FLOW_ACTION_TYPE_JUMP; + action_list[RTE_FLOW_ACTION_TYPE_METER] = FLOW_ACTION_TYPE_METER; + action_list[RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP] = + FLOW_ACTION_TYPE_VXLAN_ENCAP; + action_list[RTE_FLOW_ACTION_TYPE_VXLAN_DECAP] = + FLOW_ACTION_TYPE_VXLAN_DECAP; + action_list[RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN] = + FLOW_ACTION_TYPE_PUSH_VLAN; + action_list[RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID] = + FLOW_ACTION_TYPE_SET_VLAN_VID; + action_list[RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_PCP] = + FLOW_ACTION_TYPE_SET_VLAN_PCP; + action_list[RTE_FLOW_ACTION_TYPE_OF_POP_VLAN] = + FLOW_ACTION_TYPE_POP_VLAN; + action_list[RTE_FLOW_ACTION_TYPE_RAW_ENCAP] = + FLOW_ACTION_TYPE_RAW_ENCAP; + action_list[RTE_FLOW_ACTION_TYPE_RAW_DECAP] = + FLOW_ACTION_TYPE_RAW_DECAP; + action_list[RTE_FLOW_ACTION_TYPE_MODIFY_FIELD] = + FLOW_ACTION_TYPE_MODIFY_FIELD; + +#ifdef RTE_FLOW_DEBUG + action_list_str[RTE_FLOW_ACTION_TYPE_END] = "FLOW_ACTION_TYPE_END"; + action_list_str[RTE_FLOW_ACTION_TYPE_MARK] = "FLOW_ACTION_TYPE_MARK"; + action_list_str[RTE_FLOW_ACTION_TYPE_SET_TAG] = + "FLOW_ACTION_TYPE_SET_TAG"; + action_list_str[RTE_FLOW_ACTION_TYPE_DROP] = "FLOW_ACTION_TYPE_DROP"; + action_list_str[RTE_FLOW_ACTION_TYPE_COUNT] = "FLOW_ACTION_TYPE_COUNT"; + action_list_str[RTE_FLOW_ACTION_TYPE_RSS] = "FLOW_ACTION_TYPE_RSS"; + action_list_str[RTE_FLOW_ACTION_TYPE_PORT_ID] = + "FLOW_ACTION_TYPE_PORT_ID"; + action_list_str[RTE_FLOW_ACTION_TYPE_QUEUE] = "FLOW_ACTION_TYPE_QUEUE"; + action_list_str[RTE_FLOW_ACTION_TYPE_JUMP] = "FLOW_ACTION_TYPE_JUMP"; + action_list_str[RTE_FLOW_ACTION_TYPE_METER] = "FLOW_ACTION_TYPE_METER"; + action_list_str[RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP] = + "FLOW_ACTION_TYPE_VXLAN_ENCAP"; + action_list_str[RTE_FLOW_ACTION_TYPE_VXLAN_DECAP] = + "FLOW_ACTION_TYPE_VXLAN_DECAP"; + action_list_str[RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN] = + "FLOW_ACTION_TYPE_PUSH_VLAN"; + action_list_str[RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID] = + "FLOW_ACTION_TYPE_SET_VLAN_VID"; + action_list_str[RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_PCP] = + "FLOW_ACTION_TYPE_SET_VLAN_PCP"; + action_list_str[RTE_FLOW_ACTION_TYPE_OF_POP_VLAN] = + "FLOW_ACTION_TYPE_POP_VLAN"; + action_list_str[RTE_FLOW_ACTION_TYPE_RAW_ENCAP] = + "FLOW_ACTION_TYPE_RAW_ENCAP"; + action_list_str[RTE_FLOW_ACTION_TYPE_RAW_DECAP] = + "FLOW_ACTION_TYPE_RAW_DECAP"; + action_list_str[RTE_FLOW_ACTION_TYPE_MODIFY_FIELD] = + "FLOW_ACTION_TYPE_MODIFY_FIELD"; +#endif + + convert_tables_initialized = 1; +} + +static int interpret_raw_data(uint8_t *data, uint8_t *preserve, int size, + struct flow_elem *out) +{ + int hdri = 0; + int pkti = 0; + + /* Ethernet */ + if (size - pkti == 0) + goto interpret_end; + if (size - pkti < (int)sizeof(struct rte_ether_hdr)) + return -1; + + out[hdri].type = FLOW_ELEM_TYPE_ETH; + out[hdri].spec = &data[pkti]; + out[hdri].mask = (preserve != NULL) ? 
&preserve[pkti] : NULL; + + rte_be16_t ether_type = + ((struct rte_ether_hdr *)&data[pkti])->ether_type; + + hdri += 1; + pkti += sizeof(struct rte_ether_hdr); + + if (size - pkti == 0) + goto interpret_end; + + /* VLAN */ + while (ether_type == rte_cpu_to_be_16(RTE_ETHER_TYPE_VLAN) || + ether_type == rte_cpu_to_be_16(RTE_ETHER_TYPE_QINQ) || + ether_type == rte_cpu_to_be_16(RTE_ETHER_TYPE_QINQ1)) { + if (size - pkti == 0) + goto interpret_end; + if (size - pkti < (int)sizeof(struct rte_vlan_hdr)) + return -1; + + out[hdri].type = FLOW_ELEM_TYPE_VLAN; + out[hdri].spec = &data[pkti]; + out[hdri].mask = (preserve != NULL) ? &preserve[pkti] : NULL; + + ether_type = ((struct rte_vlan_hdr *)&data[pkti])->eth_proto; + + hdri += 1; + pkti += sizeof(struct rte_vlan_hdr); + } + + if (size - pkti == 0) + goto interpret_end; + + /* Layer 3 */ + uint8_t next_header = 0; + + if (ether_type == rte_cpu_to_be_16(RTE_ETHER_TYPE_IPV4) && + (data[pkti] & 0xF0) == 0x40) { + if (size - pkti < (int)sizeof(struct rte_ipv4_hdr)) + return -1; + + out[hdri].type = FLOW_ELEM_TYPE_IPV4; + out[hdri].spec = &data[pkti]; + out[hdri].mask = (preserve != NULL) ? &preserve[pkti] : NULL; + + next_header = data[pkti + 9]; + + hdri += 1; + pkti += sizeof(struct rte_ipv4_hdr); + } else if (ether_type == rte_cpu_to_be_16(RTE_ETHER_TYPE_IPV6) && + (data[pkti] & 0xF0) == 0x60) { + if (size - pkti < (int)sizeof(struct rte_ipv6_hdr)) + return -1; + + out[hdri].type = FLOW_ELEM_TYPE_IPV6; + out[hdri].spec = &data[pkti]; + out[hdri].mask = (preserve != NULL) ? &preserve[pkti] : NULL; + + next_header = data[pkti + 6]; + + hdri += 1; + pkti += sizeof(struct rte_ipv6_hdr); + + } else { + return -1; + } + + if (size - pkti == 0) + goto interpret_end; + + /* Layer 4 */ + int gtpu_encap = 0; + + if (next_header == 1) { /* ICMP */ + if (size - pkti < (int)sizeof(struct rte_icmp_hdr)) + return -1; + + out[hdri].type = FLOW_ELEM_TYPE_ICMP; + out[hdri].spec = &data[pkti]; + out[hdri].mask = (preserve != NULL) ? &preserve[pkti] : NULL; + + hdri += 1; + pkti += sizeof(struct rte_icmp_hdr); + } else if (next_header == 6) { /* TCP */ + if (size - pkti < (int)sizeof(struct rte_tcp_hdr)) + return -1; + + out[hdri].type = FLOW_ELEM_TYPE_TCP; + out[hdri].spec = &data[pkti]; + out[hdri].mask = (preserve != NULL) ? &preserve[pkti] : NULL; + + hdri += 1; + pkti += sizeof(struct rte_tcp_hdr); + } else if (next_header == 17) { /* UDP */ + if (size - pkti < (int)sizeof(struct rte_udp_hdr)) + return -1; + + out[hdri].type = FLOW_ELEM_TYPE_UDP; + out[hdri].spec = &data[pkti]; + out[hdri].mask = (preserve != NULL) ? &preserve[pkti] : NULL; + + gtpu_encap = ((struct rte_udp_hdr *)&data[pkti])->dst_port == + rte_cpu_to_be_16(RTE_GTPU_UDP_PORT); + + hdri += 1; + pkti += sizeof(struct rte_udp_hdr); + } else if (next_header == 132) { /* SCTP */ + if (size - pkti < (int)sizeof(struct rte_sctp_hdr)) + return -1; + + out[hdri].type = FLOW_ELEM_TYPE_SCTP; + out[hdri].spec = &data[pkti]; + out[hdri].mask = (preserve != NULL) ? &preserve[pkti] : NULL; + + hdri += 1; + pkti += sizeof(struct rte_sctp_hdr); + } else { + return -1; + } + + if (size - pkti == 0) + goto interpret_end; + + /* GTPv1-U */ + if (gtpu_encap) { + if (size - pkti < (int)sizeof(struct rte_gtp_hdr)) + return -1; + + out[hdri].type = FLOW_ELEM_TYPE_GTP; + out[hdri].spec = &data[pkti]; + out[hdri].mask = (preserve != NULL) ? 
&preserve[pkti] : NULL; + + int extension_present_bit = + ((struct rte_gtp_hdr *)&data[pkti])->e; + + hdri += 1; + pkti += sizeof(struct rte_gtp_hdr); + + if (extension_present_bit) { + if (size - pkti < + (int)sizeof(struct rte_gtp_hdr_ext_word)) + return -1; + + out[hdri].type = FLOW_ELEM_TYPE_GTP; + out[hdri].spec = &data[pkti]; + out[hdri].mask = (preserve != NULL) ? &preserve[pkti] : + NULL; + + uint8_t next_ext = + ((struct rte_gtp_hdr_ext_word *)&data[pkti]) + ->next_ext; + + hdri += 1; + pkti += sizeof(struct rte_gtp_hdr_ext_word); + + while (next_ext) { + size_t ext_len = data[pkti] * 4; + + if (size - pkti < (int)ext_len) + return -1; + + out[hdri].type = FLOW_ELEM_TYPE_GTP; + out[hdri].spec = &data[pkti]; + out[hdri].mask = (preserve != NULL) ? + &preserve[pkti] : + NULL; + + next_ext = data[pkti + ext_len - 1]; + + hdri += 1; + pkti += ext_len; + } + } + } + + if (size - pkti != 0) + return -1; + +interpret_end: + out[hdri].type = FLOW_ELEM_TYPE_END; + out[hdri].spec = NULL; + out[hdri].mask = NULL; + + return hdri + 1; +} + +static int create_attr(struct cnv_attr_s *attribute, + const struct rte_flow_attr *attr) +{ + memset(&attribute->attr, 0x0, sizeof(struct flow_attr)); + if (attr) { + attribute->attr.group = attr->group; + attribute->attr.priority = attr->priority; + } + return 0; +} + +static int create_match_elements(struct cnv_match_s *match, + const struct rte_flow_item items[], + int max_elem) +{ + int eidx = 0; + int iter_idx = 0; + int type = -1; + + if (!items) { + NT_LOG(ERR, FILTER, "ERROR no items to iterate!\n"); + return -1; + } + + if (!convert_tables_initialized) + initialize_global_cnv_tables(); + + do { + type = CNV_TO_ELEM(items[iter_idx].type); + if (type < 0) { + if ((int)items[iter_idx].type == + NT_RTE_FLOW_ITEM_TYPE_TUNNEL) { + type = FLOW_ELEM_TYPE_TUNNEL; + } else { + NT_LOG(ERR, FILTER, + "ERROR unknown item type received!\n"); + return -1; + } + } + + if (type >= 0) { + if (items[iter_idx].last) { + /* Ranges are not supported yet */ + NT_LOG(ERR, FILTER, + "ERROR ITEM-RANGE SETUP - NOT SUPPORTED!\n"); + return -1; + } + + if (eidx == max_elem) { + NT_LOG(ERR, FILTER, + "ERROR TOO MANY ELEMENTS ENCOUNTERED!\n"); + return -1; + } + +#ifdef RTE_FLOW_DEBUG + NT_LOG(INF, FILTER, + "RTE ITEM -> FILTER FLOW ELEM - %i -> %i - %s\n", + items[iter_idx].type, type, + ((int)items[iter_idx].type >= 0) ? 
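interpret_raw_data() above walks a raw header blob, as supplied with RAW_ENCAP/RAW_DECAP, and emits one flow_elem per recognised layer (Ethernet, VLANs, IPv4/IPv6, L4, optional GTP-U), failing on anything it cannot classify. A hypothetical caller, assuming it lives next to the helpers in this header, could look like this:

static int decompose_encap_blob(uint8_t *buf, uint8_t *preserve, int len)
{
	struct flow_elem elems[MAX_ELEMENTS];
	int count = interpret_raw_data(buf, preserve, len, elems);

	if (count < 0)
		return -1;	/* truncated header or unsupported layer */

	/* count includes the terminating FLOW_ELEM_TYPE_END element */
	for (int i = 0; i < count; i++)
		NT_LOG(DBG, FILTER, "layer %d: elem type %d\n", i,
		       elems[i].type);
	return count;
}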
+ elem_list_str[items[iter_idx].type] : + "FLOW_ELEM_TYPE_TUNNEL"); + + switch (type) { + case FLOW_ELEM_TYPE_ETH: + if (items[iter_idx].spec) { + const struct flow_elem_eth *eth = + items[iter_idx].spec; + NT_LOG(DBG, FILTER, + "FLOW_ELEM_TYPE_ETH SPEC: dst=%02X:%02X:%02X:%02X:%02X:%02X\n", + eth->d_addr.addr_b[0] & 0xFF, + eth->d_addr.addr_b[1] & 0xFF, + eth->d_addr.addr_b[2] & 0xFF, + eth->d_addr.addr_b[3] & 0xFF, + eth->d_addr.addr_b[4] & 0xFF, + eth->d_addr.addr_b[5] & 0xFF); + NT_LOG(DBG, FILTER, + " src=%02X:%02X:%02X:%02X:%02X:%02X\n", + eth->s_addr.addr_b[0] & 0xFF, + eth->s_addr.addr_b[1] & 0xFF, + eth->s_addr.addr_b[2] & 0xFF, + eth->s_addr.addr_b[3] & 0xFF, + eth->s_addr.addr_b[4] & 0xFF, + eth->s_addr.addr_b[5] & 0xFF); + NT_LOG(DBG, FILTER, + " type=%04x\n", + htons(eth->ether_type)); + } + if (items[iter_idx].mask) { + const struct flow_elem_eth *eth = + items[iter_idx].mask; + NT_LOG(DBG, FILTER, + "FLOW_ELEM_TYPE_ETH MASK: dst=%02X:%02X:%02X:%02X:%02X:%02X\n", + eth->d_addr.addr_b[0] & 0xFF, + eth->d_addr.addr_b[1] & 0xFF, + eth->d_addr.addr_b[2] & 0xFF, + eth->d_addr.addr_b[3] & 0xFF, + eth->d_addr.addr_b[4] & 0xFF, + eth->d_addr.addr_b[5] & 0xFF); + NT_LOG(DBG, FILTER, + " src=%02X:%02X:%02X:%02X:%02X:%02X\n", + eth->s_addr.addr_b[0] & 0xFF, + eth->s_addr.addr_b[1] & 0xFF, + eth->s_addr.addr_b[2] & 0xFF, + eth->s_addr.addr_b[3] & 0xFF, + eth->s_addr.addr_b[4] & 0xFF, + eth->s_addr.addr_b[5] & 0xFF); + NT_LOG(DBG, FILTER, + " type=%04x\n", + htons(eth->ether_type)); + } + break; + case FLOW_ELEM_TYPE_VLAN: + if (items[iter_idx].spec) { + const struct flow_elem_vlan *vlan = + (const struct flow_elem_vlan *) + items[iter_idx] + .spec; + NT_LOG(DBG, FILTER, + "FLOW_ELEM_TYPE_VLAN SPEC: tci=%04x\n", + htons(vlan->tci)); + NT_LOG(DBG, FILTER, + " inner type=%04x\n", + htons(vlan->inner_type)); + } + if (items[iter_idx].mask) { + const struct flow_elem_vlan *vlan = + (const struct flow_elem_vlan *) + items[iter_idx] + .mask; + NT_LOG(DBG, FILTER, + "FLOW_ELEM_TYPE_VLAN MASK: tci=%04x\n", + htons(vlan->tci)); + NT_LOG(DBG, FILTER, + " inner type=%04x\n", + htons(vlan->inner_type)); + } + break; + case FLOW_ELEM_TYPE_IPV4: + if (items[iter_idx].spec) { + const struct flow_elem_ipv4 *ip = + items[iter_idx].spec; + NT_LOG(DBG, FILTER, + "FLOW_ELEM_TYPE_IPV4 SPEC: dst=%d.%d.%d.%d\n", + ((const char *)&ip->hdr.dst_ip)[0] & 0xFF, + ((const char *)&ip->hdr.dst_ip)[1] & 0xFF, + ((const char *)&ip->hdr.dst_ip)[2] & 0xFF, + ((const char *)&ip->hdr.dst_ip)[3] & 0xFF); + NT_LOG(DBG, FILTER, + " src=%d.%d.%d.%d\n", + ((const char *)&ip->hdr.src_ip)[0] & 0xFF, + ((const char *)&ip->hdr.src_ip)[1] & 0xFF, + ((const char *)&ip->hdr.src_ip)[2] & 0xFF, + ((const char *)&ip->hdr.src_ip)[3] & 0xFF); + NT_LOG(DBG, FILTER, + " fragment_offset=%u\n", + ip->hdr.frag_offset); + NT_LOG(DBG, FILTER, + " next_proto_id=%u\n", + ip->hdr.next_proto_id); + NT_LOG(DBG, FILTER, + " packet_id=%u\n", + ip->hdr.id); + NT_LOG(DBG, FILTER, + " time_to_live=%u\n", + ip->hdr.ttl); + NT_LOG(DBG, FILTER, + " type_of_service=%u\n", + ip->hdr.tos); + NT_LOG(DBG, FILTER, + " version_ihl=%u\n", + ip->hdr.version_ihl); + NT_LOG(DBG, FILTER, + " total_length=%u\n", + ip->hdr.length); + } + if (items[iter_idx].mask) { + const struct flow_elem_ipv4 *ip = + items[iter_idx].mask; + NT_LOG(DBG, FILTER, + "FLOW_ELEM_TYPE_IPV4 MASK: dst=%d.%d.%d.%d\n", + ((const char *)&ip->hdr.dst_ip)[0] & 0xFF, + ((const char *)&ip->hdr.dst_ip)[1] & 0xFF, + ((const char *)&ip->hdr.dst_ip)[2] & 0xFF, + ((const char *)&ip->hdr.dst_ip)[3] & 0xFF); + 
NT_LOG(DBG, FILTER, + " src=%d.%d.%d.%d\n", + ((const char *)&ip->hdr.src_ip)[0] & 0xFF, + ((const char *)&ip->hdr.src_ip)[1] & 0xFF, + ((const char *)&ip->hdr.src_ip)[2] & 0xFF, + ((const char *)&ip->hdr.src_ip)[3] & 0xFF); + NT_LOG(DBG, FILTER, + " fragment_offset=%x\n", + ip->hdr.frag_offset); + NT_LOG(DBG, FILTER, + " next_proto_id=%x\n", + ip->hdr.next_proto_id); + NT_LOG(DBG, FILTER, + " packet_id=%x\n", + ip->hdr.id); + NT_LOG(DBG, FILTER, + " time_to_live=%x\n", + ip->hdr.ttl); + NT_LOG(DBG, FILTER, + " type_of_service=%x\n", + ip->hdr.tos); + NT_LOG(DBG, FILTER, + " version_ihl=%x\n", + ip->hdr.version_ihl); + NT_LOG(DBG, FILTER, + " total_length=%x\n", + ip->hdr.length); + } + break; + case FLOW_ELEM_TYPE_UDP: + if (items[iter_idx].spec) { + const struct flow_elem_udp *udp = + (const struct flow_elem_udp *) + items[iter_idx] + .spec; + NT_LOG(DBG, FILTER, + "FLOW_ELEM_TYPE_UDP SPEC: src port=%04x\n", + htons(udp->hdr.src_port)); + NT_LOG(DBG, FILTER, + " dst port=%04x\n", + htons(udp->hdr.dst_port)); + } + if (items[iter_idx].mask) { + const struct flow_elem_udp *udp = + (const struct flow_elem_udp *) + items[iter_idx] + .mask; + NT_LOG(DBG, FILTER, + "FLOW_ELEM_TYPE_UDP MASK: src port=%04x\n", + htons(udp->hdr.src_port)); + NT_LOG(DBG, FILTER, + " dst port=%04x\n", + htons(udp->hdr.dst_port)); + } + break; + case FLOW_ELEM_TYPE_TAG: + if (items[iter_idx].spec) { + const struct flow_elem_tag *tag = + (const struct flow_elem_tag *) + items[iter_idx] + .spec; + NT_LOG(DBG, FILTER, + "FLOW_ELEM_TYPE_TAG SPEC: data=%u\n", + tag->data); + NT_LOG(DBG, FILTER, + " index=%u\n", + tag->index); + } + if (items[iter_idx].mask) { + const struct flow_elem_tag *tag = + (const struct flow_elem_tag *) + items[iter_idx] + .mask; + NT_LOG(DBG, FILTER, + "FLOW_ELEM_TYPE_TAG MASK: data=%u\n", + tag->data); + NT_LOG(DBG, FILTER, + " index=%u\n", + tag->index); + } + break; + case FLOW_ELEM_TYPE_VXLAN: { + const struct flow_elem_vxlan *vxlan = + (const struct flow_elem_vxlan *) + items[iter_idx] + .spec; + const struct flow_elem_vxlan *mask = + (const struct flow_elem_vxlan *) + items[iter_idx] + .mask; + + uint32_t vni = + (uint32_t)(((uint32_t)vxlan->vni[0] + << 16) | + ((uint32_t)vxlan->vni[1] + << 8) | + ((uint32_t)vxlan->vni[2])); + uint32_t vni_mask = + (uint32_t)(((uint32_t)mask->vni[0] + << 16) | + ((uint32_t)mask->vni[1] + << 8) | + ((uint32_t)mask->vni[2])); + + NT_LOG(INF, FILTER, "VNI: %08x / %08x\n", vni, + vni_mask); + } + break; + } +#endif + + match->flow_elem[eidx].type = type; + match->flow_elem[eidx].spec = items[iter_idx].spec; + match->flow_elem[eidx].mask = items[iter_idx].mask; + + eidx++; + iter_idx++; + } + + } while (type >= 0 && type != FLOW_ELEM_TYPE_END); + + return (type >= 0) ? 
0 : -1; +} + +static int +create_action_elements_vswitch(struct cnv_action_s *action, + const struct rte_flow_action actions[], + int max_elem, uint32_t *flow_stat_id) +{ + int aidx = 0; + int iter_idx = 0; + int type = -1; + + if (!actions) + return -1; + + if (!convert_tables_initialized) + initialize_global_cnv_tables(); + + *flow_stat_id = MAX_COLOR_FLOW_STATS; + do { + type = CNV_TO_ACTION(actions[iter_idx].type); + if (type < 0) { + if ((int)actions[iter_idx].type == + NT_RTE_FLOW_ACTION_TYPE_TUNNEL_SET) { + type = FLOW_ACTION_TYPE_TUNNEL_SET; + } else { + NT_LOG(ERR, FILTER, + "ERROR unknown action type received!\n"); + return -1; + } + } + +#ifdef RTE_FLOW_DEBUG + NT_LOG(INF, FILTER, + "RTE ACTION -> FILTER FLOW ACTION - %i -> %i - %s\n", + actions[iter_idx].type, type, + ((int)actions[iter_idx].type >= 0) ? + action_list_str[actions[iter_idx].type] : + "FLOW_ACTION_TYPE_TUNNEL_SET"); +#endif + + if (type >= 0) { + action->flow_actions[aidx].type = type; + + /* + * Non-compatible actions handled here + */ + switch (type) { + case -1: +#ifdef RTE_FLOW_DEBUG + NT_LOG(INF, FILTER, + "RTE ACTION UNSUPPORTED %i\n", + actions[iter_idx].type); +#endif + return -1; + + case FLOW_ACTION_TYPE_RSS: { + const struct rte_flow_action_rss *rss = + (const struct rte_flow_action_rss *) + actions[iter_idx] + .conf; + action->flow_rss.func = + FLOW_HASH_FUNCTION_DEFAULT; + + if (rss->func != + RTE_ETH_HASH_FUNCTION_DEFAULT) + return -1; + action->flow_rss.level = rss->level; + action->flow_rss.types = rss->types; + action->flow_rss.key_len = rss->key_len; + action->flow_rss.queue_num = rss->queue_num; + action->flow_rss.key = rss->key; + action->flow_rss.queue = rss->queue; +#ifdef RTE_FLOW_DEBUG + NT_LOG(DBG, FILTER, + "FLOW_ACTION_TYPE_RSS: rss->level = %u\n", + rss->level); + NT_LOG(DBG, FILTER, + " rss->types = 0x%" PRIX64 "\n", + (unsigned long long)rss->types); + NT_LOG(DBG, FILTER, + " rss->key_len = %u\n", + rss->key_len); + NT_LOG(DBG, FILTER, + " rss->queue_num = %u\n", + rss->queue_num); + NT_LOG(DBG, FILTER, + " rss->key = %p\n", + rss->key); + unsigned int i; + + for (i = 0; i < rss->queue_num; i++) { + NT_LOG(DBG, FILTER, + " rss->queue[%u] = %u\n", + i, rss->queue[i]); + } +#endif + action->flow_actions[aidx].conf = + &action->flow_rss; + break; + } + + case FLOW_ACTION_TYPE_VXLAN_ENCAP: { + const struct rte_flow_action_vxlan_encap *tun = + (const struct rte_flow_action_vxlan_encap + *)actions[iter_idx] + .conf; + if (!tun || create_match_elements(&action->tun_def.match, + tun->definition, + MAX_ELEMENTS) != 0) + return -1; + action->tun_def.tun_definition = + action->tun_def.match.flow_elem; + action->flow_actions[aidx].conf = + &action->tun_def; + } + break; + + case FLOW_ACTION_TYPE_MARK: { + const struct rte_flow_action_mark *mark_id = + (const struct rte_flow_action_mark *) + actions[iter_idx] + .conf; + if (mark_id) { +#ifdef RTE_FLOW_DEBUG + NT_LOG(DBG, FILTER, "Mark ID=%u\n", + mark_id->id); +#endif + *flow_stat_id = create_flow_stat_id(mark_id->id); + action->mark.id = *flow_stat_id; + action->flow_actions[aidx].conf = + &action->mark; + + } else { + action->flow_actions[aidx].conf = + actions[iter_idx].conf; + } + } + break; + + default: + /* Compatible */ + + /* + * OVS Full offload does not add mark in RTE Flow + * We need one in FPGA to control flow(color) statistics + */ + if (type == FLOW_ACTION_TYPE_END && + *flow_stat_id == MAX_COLOR_FLOW_STATS) { + /* We need to insert a mark for our FPGA */ + *flow_stat_id = create_flow_stat_id(0); + action->mark.id = *flow_stat_id; 
+ + action->flow_actions[aidx].type = + FLOW_ACTION_TYPE_MARK; + action->flow_actions[aidx].conf = + &action->mark; + aidx++; + + /* Move end type */ + action->flow_actions[aidx].type = + FLOW_ACTION_TYPE_END; + } + +#ifdef RTE_FLOW_DEBUG + switch (type) { + case FLOW_ACTION_TYPE_PORT_ID: + NT_LOG(DBG, FILTER, + "Port ID=%u, Original=%u\n", + ((const struct rte_flow_action_port_id + *)actions[iter_idx] + .conf) + ->id, + ((const struct rte_flow_action_port_id + *)actions[iter_idx] + .conf) + ->original); + break; + case FLOW_ACTION_TYPE_COUNT: + NT_LOG(DBG, FILTER, "Count ID=%u\n", + ((const struct rte_flow_action_count + *)actions[iter_idx] + .conf) + ->id); + break; + case FLOW_ACTION_TYPE_SET_TAG: + NT_LOG(DBG, FILTER, + "FLOW_ACTION_TYPE_SET_TAG: data=%u\n", + ((const struct flow_action_tag *) + actions[iter_idx] + .conf) + ->data); + NT_LOG(DBG, FILTER, + " mask=%u\n", + ((const struct flow_action_tag *) + actions[iter_idx] + .conf) + ->mask); + NT_LOG(DBG, FILTER, + " index=%u\n", + ((const struct flow_action_tag *) + actions[iter_idx] + .conf) + ->index); + break; + } +#endif + + action->flow_actions[aidx].conf = + actions[iter_idx].conf; + break; + } + + aidx++; + if (aidx == max_elem) + return -1; + iter_idx++; + } + + } while (type >= 0 && type != FLOW_ACTION_TYPE_END); + + return (type >= 0) ? 0 : -1; +} + +static int create_action_elements_inline(struct cnv_action_s *action, + const struct rte_flow_action actions[], + int max_elem, uint32_t queue_offset) +{ + int aidx = 0; + int type = -1; + + do { + type = CNV_TO_ACTION(actions[aidx].type); + +#ifdef RTE_FLOW_DEBUG + NT_LOG(INF, FILTER, + "RTE ACTION -> FILTER FLOW ACTION - %i -> %i - %s\n", + actions[aidx].type, type, + ((int)actions[aidx].type >= 0) ? + action_list_str[actions[aidx].type] : + "FLOW_ACTION_TYPE_TUNNEL_SET"); +#endif + + if (type >= 0) { + action->flow_actions[aidx].type = type; + + /* + * Non-compatible actions handled here + */ + switch (type) { + case FLOW_ACTION_TYPE_RSS: { + const struct rte_flow_action_rss *rss = + (const struct rte_flow_action_rss *) + actions[aidx] + .conf; + action->flow_rss.func = + FLOW_HASH_FUNCTION_DEFAULT; + + if (rss->func != + RTE_ETH_HASH_FUNCTION_DEFAULT) + return -1; + action->flow_rss.level = rss->level; + action->flow_rss.types = rss->types; + action->flow_rss.key_len = rss->key_len; + action->flow_rss.queue_num = rss->queue_num; + action->flow_rss.key = rss->key; + action->flow_rss.queue = rss->queue; + action->flow_actions[aidx].conf = + &action->flow_rss; +#ifdef RTE_FLOW_DEBUG + NT_LOG(DBG, FILTER, + "FLOW_ACTION_TYPE_RSS: rss->level = %u\n", + rss->level); + NT_LOG(DBG, FILTER, + " rss->types = 0x%" PRIX64 "\n", + (unsigned long long)rss->types); + NT_LOG(DBG, FILTER, + " rss->key_len = %u\n", + rss->key_len); + NT_LOG(DBG, FILTER, + " rss->queue_num = %u\n", + rss->queue_num); + NT_LOG(DBG, FILTER, + " rss->key = %p\n", + rss->key); + unsigned int i; + + for (i = 0; i < rss->queue_num; i++) { + NT_LOG(DBG, FILTER, + " rss->queue[%u] = %u\n", + i, rss->queue[i]); + } +#endif + } + break; + + case FLOW_ACTION_TYPE_RAW_DECAP: { + const struct rte_flow_action_raw_decap *decap = + (const struct rte_flow_action_raw_decap + *)actions[aidx] + .conf; + int item_count = interpret_raw_data(decap->data, + NULL, decap->size, + action->decap.items); + if (item_count < 0) + return item_count; +#ifdef RTE_FLOW_DEBUG + NT_LOG(DBG, FILTER, + "FLOW_ACTION_TYPE_RAW_DECAP: size = %u\n", + decap->size); + NT_LOG(DBG, FILTER, + "FLOW_ACTION_TYPE_RAW_DECAP: item_count = %u\n", + 
item_count); + for (int i = 0; i < item_count; i++) { + NT_LOG(DBG, FILTER, + "FLOW_ACTION_TYPE_RAW_DECAP: item = %u\n", + action->decap.items[i].type); + } +#endif + action->decap.data = decap->data; + action->decap.size = decap->size; + action->decap.item_count = item_count; + action->flow_actions[aidx].conf = + &action->decap; + } + break; + + case FLOW_ACTION_TYPE_RAW_ENCAP: { + const struct rte_flow_action_raw_encap *encap = + (const struct rte_flow_action_raw_encap + *)actions[aidx] + .conf; + int item_count = interpret_raw_data(encap->data, + encap->preserve, + encap->size, + action->encap.items); + if (item_count < 0) + return item_count; +#ifdef RTE_FLOW_DEBUG + NT_LOG(DBG, FILTER, + "FLOW_ACTION_TYPE_RAW_ENCAP: size = %u\n", + encap->size); + NT_LOG(DBG, FILTER, + "FLOW_ACTION_TYPE_RAW_ENCAP: item_count = %u\n", + item_count); +#endif + action->encap.data = encap->data; + action->encap.preserve = encap->preserve; + action->encap.size = encap->size; + action->encap.item_count = item_count; + action->flow_actions[aidx].conf = + &action->encap; + } + break; + + case FLOW_ACTION_TYPE_QUEUE: { + const struct rte_flow_action_queue *queue = + (const struct rte_flow_action_queue *) + actions[aidx] + .conf; + action->queue.index = + queue->index + queue_offset; + action->flow_actions[aidx].conf = + &action->queue; +#ifdef RTE_FLOW_DEBUG + NT_LOG(DBG, FILTER, + "FLOW_ACTION_TYPE_QUEUE: queue = %u\n", + action->queue.index); +#endif + } + break; + + default: { + action->flow_actions[aidx].conf = + actions[aidx].conf; + +#ifdef RTE_FLOW_DEBUG + switch (type) { + case FLOW_ACTION_TYPE_PORT_ID: + NT_LOG(DBG, FILTER, + "Port ID=%u, Original=%u\n", + ((const struct rte_flow_action_port_id + *)actions[aidx] + .conf) + ->id, + ((const struct rte_flow_action_port_id + *)actions[aidx] + .conf) + ->original); + break; + case FLOW_ACTION_TYPE_COUNT: + NT_LOG(DBG, FILTER, "Count ID=%u\n", + ((const struct rte_flow_action_count + *)actions[aidx] + .conf) + ->id); + break; + case FLOW_ACTION_TYPE_SET_TAG: + NT_LOG(DBG, FILTER, + "FLOW_ACTION_TYPE_SET_TAG: data=%u\n", + ((const struct flow_action_tag *) + actions[aidx] + .conf) + ->data); + NT_LOG(DBG, FILTER, + " mask=%u\n", + ((const struct flow_action_tag *) + actions[aidx] + .conf) + ->mask); + NT_LOG(DBG, FILTER, + " index=%u\n", + ((const struct flow_action_tag *) + actions[aidx] + .conf) + ->index); + break; + } +#endif + } + break; + } + + aidx++; + if (aidx == max_elem) + return -1; + } + + } while (type >= 0 && type != FLOW_ACTION_TYPE_END); + + return (type >= 0) ? 0 : -1; +} + +#endif /* __CREATE_ELEMENTS_H__ */ diff --git a/drivers/net/ntnic/ntnic_filter/ntnic_filter.c b/drivers/net/ntnic/ntnic_filter/ntnic_filter.c new file mode 100644 index 0000000000..b1cc4d2959 --- /dev/null +++ b/drivers/net/ntnic/ntnic_filter/ntnic_filter.c @@ -0,0 +1,656 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2023 Napatech A/S + */ + +#include +#include +#include + +#include "ntdrv_4ga.h" +#include +#include +#include "ntnic_ethdev.h" + +#include "ntlog.h" +#include "nt_util.h" +#include "create_elements.h" +#include "ntnic_filter.h" + +#define MAX_RTE_FLOWS 8192 +#define MAX_PORTIDS 64 + +#if (MAX_COLOR_FLOW_STATS != NT_MAX_COLOR_FLOW_STATS) +#error Difference in COLOR_FLOW_STATS. Please synchronize the defines. 
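The FLOW_ACTION_TYPE_QUEUE case above rebases the queue index: an application addresses queues relative to the representor port (0..n), while the hardware needs the absolute virtqueue index allocated to that port, hence the queue_offset addition (taken from vpq[0].id in convert_flow() below). The numbers in this tiny sketch are illustrative only.

#include <stdint.h>

/* If the representor was assigned hw queues 8..11 (vpq[0].id == 8), an
 * RTE_FLOW "queue 2" action must target hw queue 10.
 */
static uint32_t rebase_queue_index(uint32_t rte_queue_index,
				   uint32_t queue_offset)
{
	return rte_queue_index + queue_offset;
}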
+#endif + +struct rte_flow nt_flows[MAX_RTE_FLOWS]; + +static int is_flow_handle_typecast(struct rte_flow *flow) +{ + const void *first_element = &nt_flows[0]; + const void *last_element = &nt_flows[MAX_RTE_FLOWS - 1]; + + return (void *)flow < first_element || (void *)flow > last_element; +} + +static int convert_flow(struct rte_eth_dev *eth_dev, + const struct rte_flow_attr *attr, + const struct rte_flow_item items[], + const struct rte_flow_action actions[], + struct cnv_attr_s *attribute, struct cnv_match_s *match, + struct cnv_action_s *action, + struct rte_flow_error *error, uint32_t *flow_stat_id) +{ + struct pmd_internals *dev = eth_dev->data->dev_private; + struct fpga_info_s *fpga_info = &dev->p_drv->ntdrv.adapter_info.fpga_info; + + static struct flow_error flow_error = { .type = FLOW_ERROR_NONE, + .message = "none" + }; + uint32_t queue_offset = 0; + +#ifdef RTE_FLOW_DEBUG + NT_LOG(DBG, FILTER, "ntnic_flow_create port_id %u - %s\n", + eth_dev->data->port_id, eth_dev->data->name); +#endif + + if (dev->type == PORT_TYPE_OVERRIDE && dev->vpq_nb_vq > 0) { + /* + * The queues coming from the main PMD will always start from 0 + * When the port is a the VF/vDPA port the queues must be changed + * to match the queues allocated for VF/vDPA. + */ + queue_offset = dev->vpq[0].id; + } + + /* Set initial error */ + convert_error(error, &flow_error); + + if (!dev) { + rte_flow_error_set(error, EINVAL, + RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL, + "Missing eth_dev"); + return -1; + } + + if (create_attr(attribute, attr) != 0) { + rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ATTR, + NULL, "Error in attr"); + return -1; + } + if (create_match_elements(match, items, MAX_ELEMENTS) != 0) { + rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ITEM, + NULL, "Error in items"); + return -1; + } + if (fpga_info->profile == FPGA_INFO_PROFILE_INLINE) { + if (create_action_elements_inline(action, actions, MAX_ACTIONS, + queue_offset) != 0) { + rte_flow_error_set(error, EINVAL, + RTE_FLOW_ERROR_TYPE_ACTION, NULL, + "Error in actions"); + return -1; + } + if (attribute->attr.group > 0) + return 0; + } else if (fpga_info->profile == FPGA_INFO_PROFILE_VSWITCH) { + if (create_action_elements_vswitch(action, actions, MAX_ACTIONS, + flow_stat_id) != 0) { + rte_flow_error_set(error, EINVAL, + RTE_FLOW_ERROR_TYPE_ACTION, NULL, + "Error in actions"); + return -1; + } + } else { + rte_flow_error_set(error, EPERM, + RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL, + "Unsupported adapter profile"); + return -1; + } + return 0; +} + +static int eth_flow_destroy(struct rte_eth_dev *eth_dev, struct rte_flow *flow, + struct rte_flow_error *error) +{ + struct pmd_internals *dev = eth_dev->data->dev_private; + static struct flow_error flow_error = { .type = FLOW_ERROR_NONE, + .message = "none" + }; + + int res = 0; + + /* Set initial error */ + convert_error(error, &flow_error); + + if (!flow) + return 0; + + if (is_flow_handle_typecast(flow)) { + res = flow_destroy(dev->flw_dev, (void *)flow, &flow_error); + convert_error(error, &flow_error); + } else { + res = flow_destroy(dev->flw_dev, flow->flw_hdl, &flow_error); + convert_error(error, &flow_error); + + rte_spinlock_lock(&flow_lock); + delete_flow_stat_id_locked(flow->flow_stat_id); + flow->used = 0; + rte_spinlock_unlock(&flow_lock); + } + + /* Clear the flow statistics if successfully destroyed */ + if (res == 0) { + flow->stat_pkts = 0UL; + flow->stat_bytes = 0UL; + flow->stat_tcp_flags = 0; + } + + return res; +} + +static int eth_flow_validate(struct rte_eth_dev 
*eth_dev, + const struct rte_flow_attr *attr, + const struct rte_flow_item items[], + const struct rte_flow_action actions[], + struct rte_flow_error *error) +{ + static struct flow_error flow_error = { .type = FLOW_ERROR_NONE, + .message = "none" + }; + struct pmd_internals *dev = eth_dev->data->dev_private; + struct cnv_attr_s attribute; + struct cnv_match_s match; + struct cnv_action_s action; + uint32_t flow_stat_id = 0; + int res; + + if (convert_flow(eth_dev, attr, items, actions, &attribute, &match, + &action, error, &flow_stat_id) < 0) + return -EINVAL; + + res = flow_validate(dev->flw_dev, match.flow_elem, action.flow_actions, + &flow_error); + + if (res < 0) + convert_error(error, &flow_error); + + return res; +} + +static struct rte_flow *eth_flow_create(struct rte_eth_dev *eth_dev, + const struct rte_flow_attr *attr, + const struct rte_flow_item items[], + const struct rte_flow_action actions[], + struct rte_flow_error *error) +{ + struct pmd_internals *dev = eth_dev->data->dev_private; + struct fpga_info_s *fpga_info = &dev->p_drv->ntdrv.adapter_info.fpga_info; + + struct cnv_attr_s attribute; + struct cnv_match_s match; + struct cnv_action_s action; + + static struct flow_error flow_error = { .type = FLOW_ERROR_NONE, + .message = "none" + }; + uint32_t flow_stat_id = 0; + +#ifdef RTE_FLOW_DEBUG + NT_LOG(DBG, FILTER, "ntnic_flow_create port_id %u - %s\n", + eth_dev->data->port_id, eth_dev->data->name); +#endif + + if (convert_flow(eth_dev, attr, items, actions, &attribute, &match, + &action, error, &flow_stat_id) < 0) + return NULL; + + if (fpga_info->profile == FPGA_INFO_PROFILE_INLINE && + attribute.attr.group > 0) { + void *flw_hdl = flow_create(dev->flw_dev, &attribute.attr, + match.flow_elem, + action.flow_actions, &flow_error); + convert_error(error, &flow_error); + return (struct rte_flow *)flw_hdl; + } + + struct rte_flow *flow = NULL; + + rte_spinlock_lock(&flow_lock); + int i; + + for (i = 0; i < MAX_RTE_FLOWS; i++) { + if (!nt_flows[i].used) { + nt_flows[i].flow_stat_id = flow_stat_id; + if (nt_flows[i].flow_stat_id < + NT_MAX_COLOR_FLOW_STATS) { + nt_flows[i].used = 1; + flow = &nt_flows[i]; + } + break; + } + } + rte_spinlock_unlock(&flow_lock); + if (flow) { + flow->flw_hdl = flow_create(dev->flw_dev, &attribute.attr, + match.flow_elem, + action.flow_actions, &flow_error); + convert_error(error, &flow_error); + if (!flow->flw_hdl) { + rte_spinlock_lock(&flow_lock); + delete_flow_stat_id_locked(flow->flow_stat_id); + flow->used = 0; + flow = NULL; + rte_spinlock_unlock(&flow_lock); + } else { +#ifdef RTE_FLOW_DEBUG + NT_LOG(INF, FILTER, "Create Flow %p using stat_id %i\n", + flow, flow->flow_stat_id); +#endif + } + } + return flow; +} + +uint64_t last_stat_rtc; + +int poll_statistics(struct pmd_internals *internals) +{ + int flow; + struct drv_s *p_drv = internals->p_drv; + struct ntdrv_4ga_s *p_nt_drv = &p_drv->ntdrv; + nt4ga_stat_t *p_nt4ga_stat = &p_nt_drv->adapter_info.nt4ga_stat; + const int if_index = internals->if_index; + + if (!p_nt4ga_stat || if_index < 0 || if_index > NUM_ADAPTER_PORTS_MAX) + return -1; + + assert(rte_tsc_freq > 0); + + rte_spinlock_lock(&hwlock); + + uint64_t now_rtc = rte_get_tsc_cycles(); + + /* + * Check per port max once a second + * if more than a second since last stat read, do a new one + */ + if ((now_rtc - internals->last_stat_rtc) < rte_tsc_freq) { + rte_spinlock_unlock(&hwlock); + return 0; + } + + internals->last_stat_rtc = now_rtc; + + pthread_mutex_lock(&p_nt_drv->stat_lck); + + /* + * Add the RX statistics increments 
since last time we polled. + * (No difference if physical or virtual port) + */ + internals->rxq_scg[0].rx_pkts += + p_nt4ga_stat->a_port_rx_packets_total[if_index] - + p_nt4ga_stat->a_port_rx_packets_base[if_index]; + internals->rxq_scg[0].rx_bytes += + p_nt4ga_stat->a_port_rx_octets_total[if_index] - + p_nt4ga_stat->a_port_rx_octets_base[if_index]; + internals->rxq_scg[0].err_pkts += 0; + internals->rx_missed += p_nt4ga_stat->a_port_rx_drops_total[if_index] - + p_nt4ga_stat->a_port_rx_drops_base[if_index]; + + /* _update the increment bases */ + p_nt4ga_stat->a_port_rx_packets_base[if_index] = + p_nt4ga_stat->a_port_rx_packets_total[if_index]; + p_nt4ga_stat->a_port_rx_octets_base[if_index] = + p_nt4ga_stat->a_port_rx_octets_total[if_index]; + p_nt4ga_stat->a_port_rx_drops_base[if_index] = + p_nt4ga_stat->a_port_rx_drops_total[if_index]; + + /* Tx (here we must distinguish between physical and virtual ports) */ + if (internals->type == PORT_TYPE_PHYSICAL) { + /* LAG management of Tx stats. */ + if (lag_active && if_index == 0) { + unsigned int i; + /* + * Collect all LAG ports Tx stat into this one. Simplified to only collect + * from port 0 and 1. + */ + for (i = 0; i < 2; i++) { + /* Add the statistics increments since last time we polled */ + internals->txq_scg[0].tx_pkts += + p_nt4ga_stat->a_port_tx_packets_total[i] - + p_nt4ga_stat->a_port_tx_packets_base[i]; + internals->txq_scg[0].tx_bytes += + p_nt4ga_stat->a_port_tx_octets_total[i] - + p_nt4ga_stat->a_port_tx_octets_base[i]; + internals->txq_scg[0].err_pkts += 0; + + /* _update the increment bases */ + p_nt4ga_stat->a_port_tx_packets_base[i] = + p_nt4ga_stat->a_port_tx_packets_total[i]; + p_nt4ga_stat->a_port_tx_octets_base[i] = + p_nt4ga_stat->a_port_tx_octets_total[i]; + } + } else { + /* Add the statistics increments since last time we polled */ + internals->txq_scg[0].tx_pkts += + p_nt4ga_stat->a_port_tx_packets_total[if_index] - + p_nt4ga_stat->a_port_tx_packets_base[if_index]; + internals->txq_scg[0].tx_bytes += + p_nt4ga_stat->a_port_tx_octets_total[if_index] - + p_nt4ga_stat->a_port_tx_octets_base[if_index]; + internals->txq_scg[0].err_pkts += 0; + + /* _update the increment bases */ + p_nt4ga_stat->a_port_tx_packets_base[if_index] = + p_nt4ga_stat->a_port_tx_packets_total[if_index]; + p_nt4ga_stat->a_port_tx_octets_base[if_index] = + p_nt4ga_stat->a_port_tx_octets_total[if_index]; + } + } + if (internals->type == PORT_TYPE_VIRTUAL) { + /* _update TX counters from HB queue counter */ + unsigned int i; + struct host_buffer_counters *const p_hb_counters = + p_nt4ga_stat->mp_stat_structs_hb; + uint64_t v_port_packets_total = 0, v_port_octets_total = 0; + + /* + * This is a bit odd. But typically nb_tx_queues must be only 1 since it denotes + * the number of exception queues which must be 1 - for now. The code is kept if we + * want it in future, but it will not be likely. + * Therefore adding all vPorts queue tx counters into Tx[0] is ok for now. + * + * Only use the vPort Tx counter to update OVS, since these are the real ones. 
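+	 * Each virtqueue's host-buffer forward counters are summed below and the
+	 * port Tx base counters are then re-based to those totals.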
+ * The rep port into OVS that represents this port will always replicate the traffic + * here, also when no offload occurs + */ + for (i = 0; i < internals->vpq_nb_vq; ++i) { + v_port_packets_total += + p_hb_counters[internals->vpq[i].id].fwd_packets; + v_port_octets_total += + p_hb_counters[internals->vpq[i].id].fwd_bytes; + } + /* Add the statistics increments since last time we polled */ + internals->txq_scg[0].tx_pkts += + v_port_packets_total - + p_nt4ga_stat->a_port_tx_packets_base[if_index]; + internals->txq_scg[0].tx_bytes += + v_port_octets_total - + p_nt4ga_stat->a_port_tx_octets_base[if_index]; + internals->txq_scg[0].err_pkts += 0; /* What to user here ?? */ + + /* _update the increment bases */ + p_nt4ga_stat->a_port_tx_packets_base[if_index] = v_port_packets_total; + p_nt4ga_stat->a_port_tx_octets_base[if_index] = v_port_octets_total; + } + + /* Globally only once a second */ + if ((now_rtc - last_stat_rtc) < rte_tsc_freq) { + rte_spinlock_unlock(&hwlock); + pthread_mutex_unlock(&p_nt_drv->stat_lck); + return 0; + } + + last_stat_rtc = now_rtc; + + /* All color counter are global, therefore only 1 pmd must update them */ + const struct color_counters *p_color_counters = + p_nt4ga_stat->mp_stat_structs_color; + struct color_counters *p_color_counters_base = + p_nt4ga_stat->a_stat_structs_color_base; + uint64_t color_packets_accumulated, color_bytes_accumulated; + + for (flow = 0; flow < MAX_RTE_FLOWS; flow++) { + if (nt_flows[flow].used) { + unsigned int color = nt_flows[flow].flow_stat_id; + + if (color < NT_MAX_COLOR_FLOW_STATS) { + color_packets_accumulated = + p_color_counters[color].color_packets; + nt_flows[flow].stat_pkts += + (color_packets_accumulated - + p_color_counters_base[color].color_packets); + + nt_flows[flow].stat_tcp_flags |= + p_color_counters[color].tcp_flags; + + color_bytes_accumulated = + p_color_counters[color].color_bytes; + nt_flows[flow].stat_bytes += + (color_bytes_accumulated - + p_color_counters_base[color].color_bytes); + + /* _update the counter bases */ + p_color_counters_base[color].color_packets = + color_packets_accumulated; + p_color_counters_base[color].color_bytes = + color_bytes_accumulated; + } + } + } + + rte_spinlock_unlock(&hwlock); + pthread_mutex_unlock(&p_nt_drv->stat_lck); + + return 0; +} + +static int eth_flow_query(struct rte_eth_dev *dev, struct rte_flow *flow, + const struct rte_flow_action *action, void *data, + struct rte_flow_error *err) +{ + struct pmd_internals *internals = dev->data->dev_private; + + err->cause = NULL; + err->message = NULL; + + if (is_flow_handle_typecast(flow)) { + rte_flow_error_set(err, EFAULT, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, + NULL, "Error in flow handle"); + return -1; + } + + poll_statistics(internals); + + if (action->type == RTE_FLOW_ACTION_TYPE_COUNT) { + struct rte_flow_query_count *qcnt = + (struct rte_flow_query_count *)data; + if (qcnt) { + if (flow) { + qcnt->hits = flow->stat_pkts; + qcnt->hits_set = 1; + qcnt->bytes = flow->stat_bytes; + qcnt->bytes_set = 1; + + if (qcnt->reset) { + flow->stat_pkts = 0UL; + flow->stat_bytes = 0UL; + flow->stat_tcp_flags = 0; + } + } else { + qcnt->hits_set = 0; + qcnt->bytes_set = 0; + } + } + } else { + rte_flow_error_set(err, EINVAL, RTE_FLOW_ERROR_TYPE_ACTION, + NULL, "Unsupported query"); + return -1; + } + rte_flow_error_set(err, 0, RTE_FLOW_ERROR_TYPE_NONE, NULL, "Success"); + return 0; +} + +#ifdef DEBUGGING + +static void _print_tunnel(struct rte_flow_tunnel *tunnel) +{ + struct in_addr addr; + + NT_LOG(DBG, FILTER, " tun type: %i\n", 
tunnel->type); + NT_LOG(DBG, FILTER, " tun ID: %016lx\n", tunnel->tun_id); + addr.s_addr = tunnel->ipv4.src_addr; + NT_LOG(DBG, FILTER, " tun src IP: %s\n", inet_ntoa(addr)); + addr.s_addr = tunnel->ipv4.dst_addr; + NT_LOG(DBG, FILTER, " tun dst IP: %s\n", inet_ntoa(addr)); + NT_LOG(DBG, FILTER, " tun tp_src: %i\n", htons(tunnel->tp_src)); + NT_LOG(DBG, FILTER, " tun tp_dst: %i\n", htons(tunnel->tp_dst)); + NT_LOG(DBG, FILTER, " tun flags: %i\n", tunnel->tun_flags); + NT_LOG(DBG, FILTER, " tun ipv6: %i\n", tunnel->is_ipv6); + + NT_LOG(DBG, FILTER, " tun tos: %i\n", tunnel->tos); + NT_LOG(DBG, FILTER, " tun ttl: %i\n", tunnel->ttl); +} +#endif + +static struct rte_flow_action _pmd_actions[] = { + { .type = (enum rte_flow_action_type)NT_RTE_FLOW_ACTION_TYPE_TUNNEL_SET, + .conf = NULL + }, + { .type = 0, .conf = NULL } +}; + +static int ntnic_tunnel_decap_set(struct rte_eth_dev *dev _unused, + struct rte_flow_tunnel *tunnel, + struct rte_flow_action **pmd_actions, + uint32_t *num_of_actions, + struct rte_flow_error *err _unused) +{ +#ifdef DEBUGGING + NT_LOG(DBG, FILTER, "%s: [%s:%u] start\n", __func__, __FILE__, __LINE__); +#endif + + if (tunnel->type == RTE_FLOW_ITEM_TYPE_VXLAN) + _pmd_actions[1].type = RTE_FLOW_ACTION_TYPE_VXLAN_DECAP; + else + return -ENOTSUP; + + *pmd_actions = _pmd_actions; + *num_of_actions = 2; + + return 0; +} + +static struct rte_flow_item _pmd_items = { + .type = (enum rte_flow_item_type)NT_RTE_FLOW_ITEM_TYPE_TUNNEL, + .spec = NULL, + .last = NULL, + .mask = NULL +}; + +static int ntnic_tunnel_match(struct rte_eth_dev *dev _unused, + struct rte_flow_tunnel *tunnel _unused, + struct rte_flow_item **pmd_items, + uint32_t *num_of_items, + struct rte_flow_error *err _unused) +{ +#ifdef DEBUGGING + NT_LOG(DBG, FILTER, "%s: [%s:%u] start\n", __func__, __FILE__, __LINE__); +#endif + + *pmd_items = &_pmd_items; + *num_of_items = 1; + return 0; +} + +/* + * Restoration API support + */ +static int ntnic_get_restore_info(struct rte_eth_dev *dev _unused, + struct rte_mbuf *m, + struct rte_flow_restore_info *info, + struct rte_flow_error *err _unused) +{ +#ifdef DEBUGGING + NT_LOG(DBG, FILTER, "%s: [%s:%u]\n", __func__, __FILE__, __LINE__); + NT_LOG(DBG, FILTER, "dev name: %s - port_id %i\n", dev->data->name, dev->data->port_id); + NT_LOG(DBG, FILTER, "dpdk tunnel mark %08x\n", m->hash.fdir.hi); +#endif + + if ((m->ol_flags & RTE_MBUF_F_RX_FDIR_ID) && m->hash.fdir.hi) { + uint8_t port_id = (m->hash.fdir.hi >> 24) & 0xff; + uint32_t stat_id = m->hash.fdir.lo & 0xffffff; + + struct tunnel_cfg_s tuncfg; + int ret = flow_get_tunnel_definition(&tuncfg, stat_id, port_id); + + if (ret) + return -EINVAL; + + if (tuncfg.ipversion == 4) { + info->tunnel.ipv4.dst_addr = tuncfg.v4.dst_ip; + info->tunnel.ipv4.src_addr = tuncfg.v4.src_ip; + info->tunnel.is_ipv6 = 0; + } else { + /* IPv6 */ + for (int i = 0; i < 16; i++) { + info->tunnel.ipv6.src_addr[i] = + tuncfg.v6.src_ip[i]; + info->tunnel.ipv6.dst_addr[i] = + tuncfg.v6.dst_ip[i]; + } + info->tunnel.is_ipv6 = 1; + } + + info->tunnel.tp_dst = tuncfg.d_port; + info->tunnel.tp_src = tuncfg.s_port; + + info->tunnel.ttl = 64; + info->tunnel.tos = 0; + + /* FLOW_TNL_F_KEY | FLOW_TNL_F_DONT_FRAGMENT */ + info->tunnel.tun_flags = (1 << 3) | (1 << 1); + + info->tunnel.type = RTE_FLOW_ITEM_TYPE_VXLAN; + info->tunnel.tun_id = m->hash.fdir.hi & 0xffffff; + + info->flags = RTE_FLOW_RESTORE_INFO_TUNNEL; + /* | RTE_FLOW_RESTORE_INFO_ENCAPSULATED; if restored packet is sent back */ + info->group_id = 0; + +#ifdef DEBUGGING + 
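+	/*
+	 * The restore info above is rebuilt from the FDIR mark on the mbuf: the
+	 * top byte of hash.fdir.hi identifies the port, the low 24 bits of
+	 * hash.fdir.lo give the flow stat id used to look up the tunnel
+	 * definition, and the low 24 bits of hash.fdir.hi are reported as tun_id.
+	 */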
_print_tunnel(&info->tunnel); +#endif + + return 0; + } + return -EINVAL; /* Supported, but no hit found */ +} + +static int +ntnic_tunnel_action_decap_release(struct rte_eth_dev *dev _unused, + struct rte_flow_action *pmd_actions _unused, + uint32_t num_of_actions _unused, + struct rte_flow_error *err _unused) +{ +#ifdef DEBUGGING + NT_LOG(DBG, FILTER, "%s: [%s:%u] start\n", __func__, __FILE__, __LINE__); +#endif + return 0; +} + +static int ntnic_tunnel_item_release(struct rte_eth_dev *dev _unused, + struct rte_flow_item *pmd_items _unused, + uint32_t num_of_items _unused, + struct rte_flow_error *err _unused) +{ +#ifdef DEBUGGING + NT_LOG(DBG, FILTER, "%s: [%s:%u] start\n", __func__, __FILE__, __LINE__); +#endif + return 0; +} + +const struct rte_flow_ops _dev_flow_ops = { + .validate = eth_flow_validate, + .create = eth_flow_create, + .destroy = eth_flow_destroy, + .flush = NULL, + .query = eth_flow_query, + .tunnel_decap_set = ntnic_tunnel_decap_set, + .tunnel_match = ntnic_tunnel_match, + .get_restore_info = ntnic_get_restore_info, + .tunnel_action_decap_release = ntnic_tunnel_action_decap_release, + .tunnel_item_release = ntnic_tunnel_item_release + +}; diff --git a/drivers/net/ntnic/ntnic_filter/ntnic_filter.h b/drivers/net/ntnic/ntnic_filter/ntnic_filter.h new file mode 100644 index 0000000000..cf4207e5de --- /dev/null +++ b/drivers/net/ntnic/ntnic_filter/ntnic_filter.h @@ -0,0 +1,14 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2023 Napatech A/S + */ + +#ifndef __NTNIC_FILTER_H__ +#define __NTNIC_FILTER_H__ + +struct rte_flow * +client_flow_create(struct flow_eth_dev *flw_dev, enum fpga_info_profile profile, + struct cnv_attr_s *attribute, struct cnv_match_s *match, + struct cnv_action_s *action, uint32_t flow_stat_id, + struct rte_flow_error *error); + +#endif /* __NTNIC_FILTER_H__ */ diff --git a/drivers/net/ntnic/ntnic_hshconfig.c b/drivers/net/ntnic/ntnic_hshconfig.c new file mode 100644 index 0000000000..a8eff76528 --- /dev/null +++ b/drivers/net/ntnic/ntnic_hshconfig.c @@ -0,0 +1,102 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2023 Napatech A/S + */ + +#include +#include + +#include "ntnic_hshconfig.h" + +#include +#include + +struct pair_uint64_t { + uint64_t first; + uint64_t second; +}; + +#define PAIR_NT(name) \ + { \ + RTE_##name, NT_##name \ + } + +struct pair_uint64_t rte_eth_rss_to_nt[] = { + PAIR_NT(ETH_RSS_IPV4), + PAIR_NT(ETH_RSS_FRAG_IPV4), + PAIR_NT(ETH_RSS_NONFRAG_IPV4_OTHER), + PAIR_NT(ETH_RSS_IPV6), + PAIR_NT(ETH_RSS_FRAG_IPV6), + PAIR_NT(ETH_RSS_NONFRAG_IPV6_OTHER), + PAIR_NT(ETH_RSS_IPV6_EX), + PAIR_NT(ETH_RSS_C_VLAN), + PAIR_NT(ETH_RSS_L3_DST_ONLY), + PAIR_NT(ETH_RSS_L3_SRC_ONLY), + PAIR_NT(ETH_RSS_LEVEL_OUTERMOST), + PAIR_NT(ETH_RSS_LEVEL_INNERMOST), +}; + +static const uint64_t *rte_to_nt_rss_flag(const uint64_t rte_flag) +{ + const struct pair_uint64_t *start = rte_eth_rss_to_nt; + + for (const struct pair_uint64_t *p = start; + p != start + ARRAY_SIZE(rte_eth_rss_to_nt); ++p) { + if (p->first == rte_flag) + return &p->second; + } + return NULL; /* NOT found */ +} + +static const uint64_t *nt_to_rte_rss_flag(const uint64_t nt_flag) +{ + const struct pair_uint64_t *start = rte_eth_rss_to_nt; + + for (const struct pair_uint64_t *p = start; + p != start + ARRAY_SIZE(rte_eth_rss_to_nt); ++p) { + if (p->second == nt_flag) + return &p->first; + } + return NULL; /* NOT found */ +} + +struct nt_eth_rss nt_rss_hash_field_from_dpdk(uint64_t rte_hash_bits) +{ + struct nt_eth_rss res = { 0 }; + + for (uint i = 0; i < 
sizeof(rte_hash_bits) * CHAR_BIT; ++i) { + uint64_t rte_bit = (UINT64_C(1) << i); + + if (rte_hash_bits & rte_bit) { + const uint64_t *nt_bit_p = rte_to_nt_rss_flag(rte_bit); + + if (!nt_bit_p) { + NT_LOG(ERR, ETHDEV, + "RSS hash function field number %d is not supported. Only supported fields will be used in RSS hash function.", + i); + } else { + res.fields |= *nt_bit_p; + } + } + } + + return res; +} + +uint64_t dpdk_rss_hash_define_from_nt_rss(struct nt_eth_rss nt_hsh) +{ + uint64_t res = 0; + + for (uint i = 0; i < sizeof(nt_hsh.fields) * CHAR_BIT; ++i) { + uint64_t nt_bit = (UINT64_C(1) << i); + + if (nt_hsh.fields & nt_bit) { + const uint64_t *rte_bit_p = nt_to_rte_rss_flag(nt_bit); + + assert(rte_bit_p && + "All nt rss bit flags should be mapped to rte rss bit fields, as nt rss is a subset of rte options"); + res |= *rte_bit_p; + } + } + + return res; +} diff --git a/drivers/net/ntnic/ntnic_hshconfig.h b/drivers/net/ntnic/ntnic_hshconfig.h new file mode 100644 index 0000000000..d4d7337d23 --- /dev/null +++ b/drivers/net/ntnic/ntnic_hshconfig.h @@ -0,0 +1,9 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2023 Napatech A/S + */ + +#include + +/* Mapping from dpdk rss hash defines to nt hash defines */ +struct nt_eth_rss nt_rss_hash_field_from_dpdk(uint64_t rte_hash_bits); +uint64_t dpdk_rss_hash_define_from_nt_rss(struct nt_eth_rss nt_hsh); diff --git a/drivers/net/ntnic/ntnic_meter.c b/drivers/net/ntnic/ntnic_meter.c new file mode 100644 index 0000000000..027ae073dd --- /dev/null +++ b/drivers/net/ntnic/ntnic_meter.c @@ -0,0 +1,811 @@ +/* + * SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2023 Napatech A/S + */ + +#include + +#include +#include +#include +#include + +#include "ntdrv_4ga.h" +#include "nthw_fpga.h" +#include "ntnic_ethdev.h" +#include "ntnic_meter.h" +#include "ntlog.h" + +/* + ******************************************************************************* + * Vswitch metering + ******************************************************************************* + */ + +static const uint32_t highest_bit_mask = (~(~0u >> 1)); + +static struct nt_mtr_profile * +nt_mtr_profile_find(struct pmd_internals *dev_priv, uint32_t meter_profile_id) +{ + struct nt_mtr_profile *profile = NULL; + + LIST_FOREACH(profile, &dev_priv->mtr_profiles, next) + if (profile->profile_id == meter_profile_id) + break; + + return profile; +} + +static int eth_meter_profile_add(struct rte_eth_dev *dev, + uint32_t meter_profile_id, + struct rte_mtr_meter_profile *profile, + struct rte_mtr_error *error) +{ + struct pmd_internals *dev_priv = dev->data->dev_private; + + NT_LOG(DBG, NTHW, "%s: [%s:%u] adapter: " PCIIDENT_PRINT_STR "\n", + __func__, __func__, __LINE__, + PCIIDENT_TO_DOMAIN(dev_priv->p_drv->ntdrv.pciident), + PCIIDENT_TO_BUSNR(dev_priv->p_drv->ntdrv.pciident), + PCIIDENT_TO_DEVNR(dev_priv->p_drv->ntdrv.pciident), + PCIIDENT_TO_FUNCNR(dev_priv->p_drv->ntdrv.pciident)); + + const bool is_egress = meter_profile_id & highest_bit_mask; + + if (dev_priv->type == PORT_TYPE_VIRTUAL || is_egress) { + struct nt_mtr_profile *prof; + + prof = nt_mtr_profile_find(dev_priv, meter_profile_id); + if (prof) + return -rte_mtr_error_set(error, EEXIST, + RTE_MTR_ERROR_TYPE_METER_PROFILE_ID, + NULL, + "Profile id already exists\n"); + + prof = rte_zmalloc(NULL, sizeof(*prof), 0); + if (!prof) { + return -rte_mtr_error_set(error, + ENOMEM, RTE_MTR_ERROR_TYPE_UNSPECIFIED, + NULL, NULL); + } + + prof->profile_id = meter_profile_id; + memcpy(&prof->profile, profile, + sizeof(struct 
rte_mtr_meter_profile)); + + LIST_INSERT_HEAD(&dev_priv->mtr_profiles, prof, next); + + return 0; + } + /* Ingress is not possible yet on phy ports */ + return -rte_mtr_error_set(error, EINVAL, + RTE_MTR_ERROR_TYPE_UNSPECIFIED, NULL, + "Traffic ingress metering/policing is not supported on physical ports\n"); +} + +static int eth_meter_profile_delete(struct rte_eth_dev *dev, + uint32_t meter_profile_id, + struct rte_mtr_error *error) +{ + struct pmd_internals *dev_priv = dev->data->dev_private; + struct nt_mtr_profile *profile; + + NT_LOG(DBG, NTHW, "%s: [%s:%u] adapter: " PCIIDENT_PRINT_STR "\n", + __func__, __func__, __LINE__, + PCIIDENT_TO_DOMAIN(dev_priv->p_drv->ntdrv.pciident), + PCIIDENT_TO_BUSNR(dev_priv->p_drv->ntdrv.pciident), + PCIIDENT_TO_DEVNR(dev_priv->p_drv->ntdrv.pciident), + PCIIDENT_TO_FUNCNR(dev_priv->p_drv->ntdrv.pciident)); + + profile = nt_mtr_profile_find(dev_priv, meter_profile_id); + if (!profile) + return -rte_mtr_error_set(error, ENODEV, + RTE_MTR_ERROR_TYPE_METER_PROFILE_ID, + NULL, "Profile id does not exist\n"); + + LIST_REMOVE(profile, next); + rte_free(profile); + return 0; +} + +static struct nt_mtr *nt_mtr_find(struct pmd_internals *dev_priv, + uint32_t mtr_id) +{ + struct nt_mtr *mtr = NULL; + + LIST_FOREACH(mtr, &dev_priv->mtrs, next) + if (mtr->mtr_id == mtr_id) + break; + + return mtr; +} + +struct qos_integer_fractional { + uint32_t integer; + uint32_t fractional; /* 1/1024 */ +}; + +/* + * Converts byte/s to byte/period if form of integer + 1/1024*fractional + * the period depends on the clock friquency and other parameters which + * being combined give multiplier. The resulting formula is: + * f[bytes/period] = x[byte/s] * period_ps / 10^-12 + */ +static struct qos_integer_fractional +byte_per_second_to_qo_s_ri(uint64_t byte_per_second, uint64_t period_ps) +{ + struct qos_integer_fractional res; + const uint64_t dividend = byte_per_second * period_ps; + const uint64_t divisor = 1000000000000ull; /*10^12 pico second*/ + + res.integer = dividend / divisor; + const uint64_t reminder = dividend % divisor; + + res.fractional = 1024ull * reminder / divisor; + return res; +} + +static struct qos_integer_fractional +byte_per_second_to_physical_qo_s_ri(uint64_t byte_per_second) +{ + return byte_per_second_to_qo_s_ri(byte_per_second, 8 * 3333ul); +} + +static struct qos_integer_fractional +byte_per_second_to_virtual_qo_s_ri(uint64_t byte_per_second) +{ + return byte_per_second_to_qo_s_ri(byte_per_second, 512 * 3333ul); +} + +static int eth_meter_enable(struct rte_eth_dev *dev, uint32_t mtr_id, + struct rte_mtr_error *error) +{ + struct pmd_internals *dev_priv = dev->data->dev_private; + int res; + static int ingress_initial; + + NT_LOG(DBG, NTHW, "%s: [%s:%u] adapter: " PCIIDENT_PRINT_STR "\n", + __func__, __func__, __LINE__, + PCIIDENT_TO_DOMAIN(dev_priv->p_drv->ntdrv.pciident), + PCIIDENT_TO_BUSNR(dev_priv->p_drv->ntdrv.pciident), + PCIIDENT_TO_DEVNR(dev_priv->p_drv->ntdrv.pciident), + PCIIDENT_TO_FUNCNR(dev_priv->p_drv->ntdrv.pciident)); + + nthw_dbs_t *p_nthw_dbs = + dev_priv->p_drv->ntdrv.adapter_info.fpga_info.mp_nthw_dbs; + nthw_epp_t *p_nthw_epp = + dev_priv->p_drv->ntdrv.adapter_info.fpga_info.mp_nthw_epp; + + /* + * FPGA is based on FRC 4115 so CIR,EIR and CBS/EBS are used + * rfc4115.cir = rfc2697.cir + * rfc4115.eir = rfc2697.cir + * rfc4115.cbs = rfc2697.cbs + * rfc4115.ebs = rfc2697.ebs + */ + struct nt_mtr *mtr = nt_mtr_find(dev_priv, mtr_id); + + if (!mtr) { + return -rte_mtr_error_set(error, EINVAL, + RTE_MTR_ERROR_TYPE_MTR_ID, NULL, + "Meter 
id not found\n"); + } + + if (!mtr->profile) { + return -rte_mtr_error_set(error, EINVAL, + RTE_MTR_ERROR_TYPE_METER_PROFILE_ID, + NULL, "Meter profile id not found\n"); + } + + const uint32_t profile_id = mtr->profile->profile_id; + const bool is_egress = profile_id & highest_bit_mask; + uint32_t burst = mtr->profile->profile.srtcm_rfc2697.cbs; + + if (is_egress) { + const bool is_virtual = (dev_priv->type == PORT_TYPE_VIRTUAL); + struct qos_integer_fractional cir = { 0 }; + + if (is_virtual) { + cir = + byte_per_second_to_virtual_qo_s_ri(mtr->profile->profile.srtcm_rfc2697.cir); + if (cir.integer == 0 && cir.fractional == 0) + cir.fractional = 1; + res = nthw_epp_set_vport_qos(p_nthw_epp, dev_priv->port, + cir.integer, cir.fractional, + burst); + } else { + cir = + byte_per_second_to_physical_qo_s_ri(mtr->profile->profile + .srtcm_rfc2697.cir); + if (cir.integer == 0 && cir.fractional == 0) + cir.fractional = 1; + res = nthw_epp_set_txp_qos(p_nthw_epp, dev_priv->port, + cir.integer, cir.fractional, + burst); + } + if (res) { + return -rte_mtr_error_set(error, EINVAL, + RTE_MTR_ERROR_TYPE_UNSPECIFIED, + NULL, + "Applying meter profile for setting egress policy failed\n"); + } + } else { + if (!ingress_initial) { + /* + * based on a 250Mhz FPGA + * _update refresh rate interval calculation: + * multiplier / (divider * 4ns) + * 1 / (2000 * 4ns) = 8,000*10-6 => refresh rate interval = 8000ns + * + * results in resolution of IR is 1Mbps + */ + res = nthw_set_tx_qos_rate_global(p_nthw_dbs, 1, 2000); + + if (res) { + return -rte_mtr_error_set(error, EINVAL, + RTE_MTR_ERROR_TYPE_UNSPECIFIED, NULL, + "Applying meter profile for setting ingress " + "global QoS rate failed\n"); + } + ingress_initial = 1; + } + + if (mtr->profile->profile.srtcm_rfc2697.cbs >= (1 << 27)) { + /* max burst 1,074Mb (27 bits) */ + mtr->profile->profile.srtcm_rfc2697.cbs = (1 << 27) - 1; + } + /* IR - fill x bytes each 8000ns -> 1B/8000ns => 1000Kbps => 125000Bps / x */ + res = nthw_set_tx_qos_config(p_nthw_dbs, dev_priv->port, /* vport */ + 1, /* enable */ + mtr->profile->profile.srtcm_rfc2697.cir / + 125000, + mtr->profile->profile.srtcm_rfc2697 + .cbs); /* BS - burst size in Bytes */ + if (res) { + return -rte_mtr_error_set(error, EINVAL, + RTE_MTR_ERROR_TYPE_UNSPECIFIED, + NULL, "Applying meter profile failed\n"); + } + } + return 0; +} + +static void disable(struct pmd_internals *dev_priv) +{ + NT_LOG(DBG, NTHW, "%s: [%s:%u] adapter: " PCIIDENT_PRINT_STR "\n", + __func__, __func__, __LINE__, + PCIIDENT_TO_DOMAIN(dev_priv->p_drv->ntdrv.pciident), + PCIIDENT_TO_BUSNR(dev_priv->p_drv->ntdrv.pciident), + PCIIDENT_TO_DEVNR(dev_priv->p_drv->ntdrv.pciident), + PCIIDENT_TO_FUNCNR(dev_priv->p_drv->ntdrv.pciident)); + + nthw_dbs_t *p_nthw_dbs = + dev_priv->p_drv->ntdrv.adapter_info.fpga_info.mp_nthw_dbs; + nthw_set_tx_qos_config(p_nthw_dbs, dev_priv->port, /* vport */ + 0, /* disable */ + 0, /* IR */ + 0); /* BS */ +} + +static int eth_meter_disable(struct rte_eth_dev *dev, uint32_t mtr_id, + struct rte_mtr_error *error) +{ + struct pmd_internals *dev_priv = dev->data->dev_private; + struct nt_mtr *mtr = nt_mtr_find(dev_priv, mtr_id); + + NT_LOG(DBG, NTHW, "%s: [%s:%u] adapter: " PCIIDENT_PRINT_STR "\n", + __func__, __func__, __LINE__, + PCIIDENT_TO_DOMAIN(dev_priv->p_drv->ntdrv.pciident), + PCIIDENT_TO_BUSNR(dev_priv->p_drv->ntdrv.pciident), + PCIIDENT_TO_DEVNR(dev_priv->p_drv->ntdrv.pciident), + PCIIDENT_TO_FUNCNR(dev_priv->p_drv->ntdrv.pciident)); + + nthw_epp_t *p_nthw_epp = + 
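+	/*
+	 * Disabling mirrors the enable path: egress meters are cleared by writing
+	 * zero rate/burst to the EPP (vport or TXP depending on the port type),
+	 * while ingress meters clear the per-port DBS Tx QoS configuration.
+	 */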
dev_priv->p_drv->ntdrv.adapter_info.fpga_info.mp_nthw_epp; + + if (!mtr) { + return -rte_mtr_error_set(error, EINVAL, + RTE_MTR_ERROR_TYPE_MTR_ID, NULL, + "Meter id not found\n"); + } + + const bool is_egress = mtr_id & highest_bit_mask; + + if (is_egress) { + const bool is_virtual = (dev_priv->type == PORT_TYPE_VIRTUAL); + + if (is_virtual) + nthw_epp_set_vport_qos(p_nthw_epp, dev_priv->port, 0, 0, 0); + else + nthw_epp_set_txp_qos(p_nthw_epp, dev_priv->port, 0, 0, 0); + } else { + disable(dev_priv); + } + return 0; +} + +/* MTR object create */ +static int eth_mtr_create(struct rte_eth_dev *dev, uint32_t mtr_id, + struct rte_mtr_params *params, int shared, + struct rte_mtr_error *error) +{ + struct pmd_internals *dev_priv = dev->data->dev_private; + struct nt_mtr *mtr = NULL; + struct nt_mtr_profile *profile; + + NT_LOG(DBG, NTHW, "%s: [%s:%u] adapter: " PCIIDENT_PRINT_STR "\n", + __func__, __func__, __LINE__, + PCIIDENT_TO_DOMAIN(dev_priv->p_drv->ntdrv.pciident), + PCIIDENT_TO_BUSNR(dev_priv->p_drv->ntdrv.pciident), + PCIIDENT_TO_DEVNR(dev_priv->p_drv->ntdrv.pciident), + PCIIDENT_TO_FUNCNR(dev_priv->p_drv->ntdrv.pciident)); + + const bool is_egress = mtr_id & highest_bit_mask; + + if (dev_priv->type == PORT_TYPE_PHYSICAL && !is_egress) { + NT_LOG(ERR, NTHW, + "ERROR try to create ingress meter object on a phy port. Not supported\n"); + + return -rte_mtr_error_set(error, EINVAL, RTE_MTR_ERROR_TYPE_UNSPECIFIED, NULL, + "Traffic ingress metering/policing is not supported on physical ports\n"); + } + + mtr = nt_mtr_find(dev_priv, mtr_id); + if (mtr) + return -rte_mtr_error_set(error, EEXIST, + RTE_MTR_ERROR_TYPE_MTR_ID, NULL, + "Meter id already exists\n"); + + profile = nt_mtr_profile_find(dev_priv, params->meter_profile_id); + if (!profile) { + return -rte_mtr_error_set(error, EINVAL, + RTE_MTR_ERROR_TYPE_METER_PROFILE_ID, + NULL, "Profile id does not exist\n"); + } + + mtr = rte_zmalloc(NULL, sizeof(struct nt_mtr), 0); + if (!mtr) + return -rte_mtr_error_set(error, ENOMEM, + RTE_MTR_ERROR_TYPE_UNSPECIFIED, NULL, + NULL); + + mtr->shared = shared; + mtr->mtr_id = mtr_id; + mtr->profile = profile; + LIST_INSERT_HEAD(&dev_priv->mtrs, mtr, next); + + if (params->meter_enable) + return eth_meter_enable(dev, mtr_id, error); + + return 0; +} + +/* MTR object destroy */ +static int eth_mtr_destroy(struct rte_eth_dev *dev, uint32_t mtr_id, + struct rte_mtr_error *error) +{ + struct pmd_internals *dev_priv = dev->data->dev_private; + struct nt_mtr *mtr; + + NT_LOG(DBG, NTHW, "%s: [%s:%u] adapter: " PCIIDENT_PRINT_STR "\n", + __func__, __func__, __LINE__, + PCIIDENT_TO_DOMAIN(dev_priv->p_drv->ntdrv.pciident), + PCIIDENT_TO_BUSNR(dev_priv->p_drv->ntdrv.pciident), + PCIIDENT_TO_DEVNR(dev_priv->p_drv->ntdrv.pciident), + PCIIDENT_TO_FUNCNR(dev_priv->p_drv->ntdrv.pciident)); + + nthw_epp_t *p_nthw_epp = + dev_priv->p_drv->ntdrv.adapter_info.fpga_info.mp_nthw_epp; + + mtr = nt_mtr_find(dev_priv, mtr_id); + if (!mtr) + return -rte_mtr_error_set(error, EEXIST, + RTE_MTR_ERROR_TYPE_MTR_ID, NULL, + "Meter id does not exist\n"); + + const bool is_egress = mtr_id & highest_bit_mask; + + if (is_egress) { + const bool is_virtual = (dev_priv->type == PORT_TYPE_VIRTUAL); + + if (is_virtual) + nthw_epp_set_vport_qos(p_nthw_epp, dev_priv->port, 0, 0, 0); + else + nthw_epp_set_txp_qos(p_nthw_epp, dev_priv->port, 0, 0, 0); + } else { + disable(dev_priv); + } + LIST_REMOVE(mtr, next); + rte_free(mtr); + return 0; +} + +/* + ******************************************************************************* + * Inline FLM 
metering + ******************************************************************************* + */ + +static int eth_mtr_capabilities_get_inline(struct rte_eth_dev *dev, + struct rte_mtr_capabilities *cap, + struct rte_mtr_error *error) +{ + struct pmd_internals *dev_priv = dev->data->dev_private; + + if (!flow_mtr_supported(dev_priv->flw_dev)) { + return -rte_mtr_error_set(error, EINVAL, + RTE_MTR_ERROR_TYPE_UNSPECIFIED, NULL, + "Ethernet device does not support metering\n"); + } + + memset(cap, 0x0, sizeof(struct rte_mtr_capabilities)); + + /* MBR records use 28-bit integers */ + cap->n_max = flow_mtr_meters_supported(); + cap->n_shared_max = cap->n_max; + + cap->identical = 0; + cap->shared_identical = 0; + + cap->shared_n_flows_per_mtr_max = UINT32_MAX; + + /* Limited by number of MBR record ids per FLM learn record */ + cap->chaining_n_mtrs_per_flow_max = 4; + + cap->chaining_use_prev_mtr_color_supported = 0; + cap->chaining_use_prev_mtr_color_enforced = 0; + + cap->meter_rate_max = (uint64_t)(0xfff << 0xf) * 1099; + + cap->stats_mask = RTE_MTR_STATS_N_PKTS_GREEN | + RTE_MTR_STATS_N_BYTES_GREEN; + + /* Only color-blind mode is supported */ + cap->color_aware_srtcm_rfc2697_supported = 0; + cap->color_aware_trtcm_rfc2698_supported = 0; + cap->color_aware_trtcm_rfc4115_supported = 0; + + /* Focused on RFC2698 for now */ + cap->meter_srtcm_rfc2697_n_max = 0; + cap->meter_trtcm_rfc2698_n_max = cap->n_max; + cap->meter_trtcm_rfc4115_n_max = 0; + + cap->meter_policy_n_max = flow_mtr_meter_policy_n_max(); + + /* Byte mode is supported */ + cap->srtcm_rfc2697_byte_mode_supported = 0; + cap->trtcm_rfc2698_byte_mode_supported = 1; + cap->trtcm_rfc4115_byte_mode_supported = 0; + + /* Packet mode not supported */ + cap->srtcm_rfc2697_packet_mode_supported = 0; + cap->trtcm_rfc2698_packet_mode_supported = 0; + cap->trtcm_rfc4115_packet_mode_supported = 0; + + return 0; +} + +static int +eth_mtr_meter_profile_add_inline(struct rte_eth_dev *dev, + uint32_t meter_profile_id, + struct rte_mtr_meter_profile *profile, + struct rte_mtr_error *error __rte_unused) +{ + struct pmd_internals *dev_priv = dev->data->dev_private; + + if (meter_profile_id >= flow_mtr_meter_policy_n_max()) + return -rte_mtr_error_set(error, EINVAL, + RTE_MTR_ERROR_TYPE_METER_PROFILE_ID, + NULL, "Profile id out of range\n"); + + if (profile->packet_mode != 0) { + return -rte_mtr_error_set(error, EINVAL, + RTE_MTR_ERROR_TYPE_METER_PROFILE_PACKET_MODE, NULL, + "Profile packet mode not supported\n"); + } + + if (profile->alg == RTE_MTR_SRTCM_RFC2697) { + return -rte_mtr_error_set(error, EINVAL, + RTE_MTR_ERROR_TYPE_METER_PROFILE, + NULL, "RFC 2697 not supported\n"); + } + + if (profile->alg == RTE_MTR_TRTCM_RFC4115) { + return -rte_mtr_error_set(error, EINVAL, + RTE_MTR_ERROR_TYPE_METER_PROFILE, + NULL, "RFC 4115 not supported\n"); + } + + if (profile->trtcm_rfc2698.cir != profile->trtcm_rfc2698.pir || + profile->trtcm_rfc2698.cbs != profile->trtcm_rfc2698.pbs) { + return -rte_mtr_error_set(error, EINVAL, RTE_MTR_ERROR_TYPE_METER_PROFILE, NULL, + "Profile committed and peak rates must be equal\n"); + } + + int res = flow_mtr_set_profile(dev_priv->flw_dev, meter_profile_id, + profile->trtcm_rfc2698.cir, + profile->trtcm_rfc2698.cbs, 0, 0); + + if (res) { + return -rte_mtr_error_set(error, EINVAL, + RTE_MTR_ERROR_TYPE_METER_PROFILE, + NULL, + "Profile could not be added.\n"); + } + + return 0; +} + +static int +eth_mtr_meter_profile_delete_inline(struct rte_eth_dev *dev __rte_unused, + uint32_t meter_profile_id __rte_unused, + struct 
rte_mtr_error *error __rte_unused) +{ + struct pmd_internals *dev_priv = dev->data->dev_private; + + if (meter_profile_id >= flow_mtr_meter_policy_n_max()) + return -rte_mtr_error_set(error, EINVAL, + RTE_MTR_ERROR_TYPE_METER_PROFILE_ID, + NULL, "Profile id out of range\n"); + + flow_mtr_set_profile(dev_priv->flw_dev, meter_profile_id, 0, 0, 0, 0); + + return 0; +} + +static int +eth_mtr_meter_policy_add_inline(struct rte_eth_dev *dev, uint32_t policy_id, + struct rte_mtr_meter_policy_params *policy, + struct rte_mtr_error *error) +{ + struct pmd_internals *dev_priv = dev->data->dev_private; + + if (policy_id >= flow_mtr_meter_policy_n_max()) + return -rte_mtr_error_set(error, EINVAL, + RTE_MTR_ERROR_TYPE_METER_POLICY_ID, + NULL, "Policy id out of range\n"); + + const struct rte_flow_action *actions = + policy->actions[RTE_COLOR_GREEN]; + int green_action_supported = + (actions[0].type == RTE_FLOW_ACTION_TYPE_END) || + (actions[0].type == RTE_FLOW_ACTION_TYPE_VOID && + actions[1].type == RTE_FLOW_ACTION_TYPE_END) || + (actions[0].type == RTE_FLOW_ACTION_TYPE_PASSTHRU && + actions[1].type == RTE_FLOW_ACTION_TYPE_END); + + actions = policy->actions[RTE_COLOR_YELLOW]; + int yellow_action_supported = + actions[0].type == RTE_FLOW_ACTION_TYPE_DROP && + actions[1].type == RTE_FLOW_ACTION_TYPE_END; + + actions = policy->actions[RTE_COLOR_RED]; + int red_action_supported = actions[0].type == + RTE_FLOW_ACTION_TYPE_DROP && + actions[1].type == RTE_FLOW_ACTION_TYPE_END; + + if (green_action_supported == 0 || yellow_action_supported == 0 || + red_action_supported == 0) { + return -rte_mtr_error_set(error, EINVAL, + RTE_MTR_ERROR_TYPE_METER_POLICY, NULL, + "Unsupported meter policy actions\n"); + } + + if (flow_mtr_set_policy(dev_priv->flw_dev, policy_id, 1)) { + return -rte_mtr_error_set(error, EINVAL, + RTE_MTR_ERROR_TYPE_METER_POLICY, NULL, + "Policy could not be added\n"); + } + + return 0; +} + +static int +eth_mtr_meter_policy_delete_inline(struct rte_eth_dev *dev __rte_unused, + uint32_t policy_id __rte_unused, + struct rte_mtr_error *error __rte_unused) +{ + if (policy_id >= flow_mtr_meter_policy_n_max()) + return -rte_mtr_error_set(error, EINVAL, + RTE_MTR_ERROR_TYPE_METER_POLICY_ID, + NULL, "Policy id out of range\n"); + + return 0; +} + +static int eth_mtr_create_inline(struct rte_eth_dev *dev, uint32_t mtr_id, + struct rte_mtr_params *params, int shared, + struct rte_mtr_error *error) +{ + struct pmd_internals *dev_priv = dev->data->dev_private; + + if (params->use_prev_mtr_color != 0 || params->dscp_table != NULL) { + return -rte_mtr_error_set(error, EINVAL, RTE_MTR_ERROR_TYPE_MTR_PARAMS, NULL, + "Only color blind mode is supported\n"); + } + + uint64_t allowed_stats_mask = RTE_MTR_STATS_N_PKTS_GREEN | + RTE_MTR_STATS_N_BYTES_GREEN; + if ((params->stats_mask & ~allowed_stats_mask) != 0) { + return -rte_mtr_error_set(error, EINVAL, RTE_MTR_ERROR_TYPE_MTR_PARAMS, NULL, + "Requested color stats not supported\n"); + } + + if (params->meter_enable == 0) { + return -rte_mtr_error_set(error, EINVAL, + RTE_MTR_ERROR_TYPE_MTR_PARAMS, NULL, + "Disabled meters not supported\n"); + } + + if (shared == 0) { + return -rte_mtr_error_set(error, EINVAL, + RTE_MTR_ERROR_TYPE_MTR_PARAMS, NULL, + "Only shared mtrs are supported\n"); + } + + if (params->meter_profile_id >= flow_mtr_meter_policy_n_max()) + return -rte_mtr_error_set(error, EINVAL, + RTE_MTR_ERROR_TYPE_METER_PROFILE_ID, + NULL, "Profile id out of range\n"); + + if (params->meter_policy_id >= flow_mtr_meter_policy_n_max()) + return 
-rte_mtr_error_set(error, EINVAL, + RTE_MTR_ERROR_TYPE_METER_POLICY_ID, + NULL, "Policy id out of range\n"); + + if (mtr_id >= flow_mtr_meters_supported()) { + return -rte_mtr_error_set(error, EINVAL, + RTE_MTR_ERROR_TYPE_MTR_PARAMS, NULL, + "MTR id is out of range\n"); + } + + int res = flow_mtr_create_meter(dev_priv->flw_dev, mtr_id, + params->meter_profile_id, + params->meter_policy_id, + params->stats_mask); + + if (res) { + return -rte_mtr_error_set(error, EINVAL, + RTE_MTR_ERROR_TYPE_UNSPECIFIED, NULL, + "Failed to offload to hardware\n"); + } + + return 0; +} + +static int eth_mtr_destroy_inline(struct rte_eth_dev *dev, uint32_t mtr_id, + struct rte_mtr_error *error __rte_unused) +{ + struct pmd_internals *dev_priv = dev->data->dev_private; + + if (mtr_id >= flow_mtr_meters_supported()) { + return -rte_mtr_error_set(error, EINVAL, + RTE_MTR_ERROR_TYPE_MTR_PARAMS, NULL, + "MTR id is out of range\n"); + } + + if (flow_mtr_destroy_meter(dev_priv->flw_dev, mtr_id)) { + return -rte_mtr_error_set(error, EINVAL, + RTE_MTR_ERROR_TYPE_UNSPECIFIED, NULL, + "Failed to offload to hardware\n"); + } + + return 0; +} + +static int eth_mtr_stats_adjust_inline(struct rte_eth_dev *dev, uint32_t mtr_id, + uint64_t adjust_value, + struct rte_mtr_error *error) +{ + const uint64_t adjust_bit = 1ULL << 63; + struct pmd_internals *dev_priv = dev->data->dev_private; + + if (mtr_id >= flow_mtr_meters_supported()) { + return -rte_mtr_error_set(error, EINVAL, + RTE_MTR_ERROR_TYPE_MTR_PARAMS, NULL, + "MTR id is out of range\n"); + } + + if ((adjust_value & adjust_bit) == 0) { + return -rte_mtr_error_set(error, EINVAL, RTE_MTR_ERROR_TYPE_MTR_PARAMS, NULL, + "To adjust a MTR bucket value, bit 63 of \"stats_mask\" must be 1\n"); + } + + adjust_value &= adjust_bit - 1; + + if (adjust_value > (uint64_t)UINT32_MAX) { + return -rte_mtr_error_set(error, EINVAL, + RTE_MTR_ERROR_TYPE_MTR_PARAMS, NULL, + "Adjust value is out of range\n"); + } + + if (flm_mtr_adjust_stats(dev_priv->flw_dev, mtr_id, + (uint32_t)adjust_value)) { + return -rte_mtr_error_set(error, EINVAL, + RTE_MTR_ERROR_TYPE_UNSPECIFIED, NULL, + "Failed to adjust offloaded MTR\n"); + } + + return 0; +} + +static int eth_mtr_stats_read_inline(struct rte_eth_dev *dev, uint32_t mtr_id, + struct rte_mtr_stats *stats, + uint64_t *stats_mask, int clear, + struct rte_mtr_error *error) +{ + struct pmd_internals *dev_priv = dev->data->dev_private; + + if (mtr_id >= flow_mtr_meters_supported()) { + return -rte_mtr_error_set(error, EINVAL, + RTE_MTR_ERROR_TYPE_MTR_PARAMS, NULL, + "MTR id is out of range\n"); + } + + memset(stats, 0x0, sizeof(struct rte_mtr_stats)); + flm_mtr_read_stats(dev_priv->flw_dev, mtr_id, stats_mask, + &stats->n_pkts[RTE_COLOR_GREEN], + &stats->n_bytes[RTE_COLOR_GREEN], clear); + + return 0; +} + +/* + ******************************************************************************* + * Ops setup + ******************************************************************************* + */ + +static const struct rte_mtr_ops mtr_ops_vswitch = { + .meter_profile_add = eth_meter_profile_add, + .meter_profile_delete = eth_meter_profile_delete, + .create = eth_mtr_create, + .destroy = eth_mtr_destroy, + .meter_enable = eth_meter_enable, + .meter_disable = eth_meter_disable, +}; + +static const struct rte_mtr_ops mtr_ops_inline = { + .capabilities_get = eth_mtr_capabilities_get_inline, + .meter_profile_add = eth_mtr_meter_profile_add_inline, + .meter_profile_delete = eth_mtr_meter_profile_delete_inline, + .create = eth_mtr_create_inline, + .destroy = 
eth_mtr_destroy_inline, + .meter_policy_add = eth_mtr_meter_policy_add_inline, + .meter_policy_delete = eth_mtr_meter_policy_delete_inline, + .stats_update = eth_mtr_stats_adjust_inline, + .stats_read = eth_mtr_stats_read_inline, +}; + +int eth_mtr_ops_get(struct rte_eth_dev *dev, void *ops) +{ + struct pmd_internals *internals = + (struct pmd_internals *)dev->data->dev_private; + ntdrv_4ga_t *p_nt_drv = &internals->p_drv->ntdrv; + enum fpga_info_profile profile = p_nt_drv->adapter_info.fpga_info.profile; + + switch (profile) { + case FPGA_INFO_PROFILE_VSWITCH: + *(const struct rte_mtr_ops **)ops = &mtr_ops_vswitch; + break; + case FPGA_INFO_PROFILE_INLINE: + *(const struct rte_mtr_ops **)ops = &mtr_ops_inline; + break; + case FPGA_INFO_PROFILE_UNKNOWN: + /* fallthrough */ + case FPGA_INFO_PROFILE_CAPTURE: + /* fallthrough */ + default: + NT_LOG(ERR, NTHW, + "" PCIIDENT_PRINT_STR + ": fpga profile not supported [%s:%u]\n", + PCIIDENT_TO_DOMAIN(p_nt_drv->pciident), + PCIIDENT_TO_BUSNR(p_nt_drv->pciident), + PCIIDENT_TO_DEVNR(p_nt_drv->pciident), + PCIIDENT_TO_FUNCNR(p_nt_drv->pciident), + __func__, __LINE__); + return -1; + } + + return 0; +} diff --git a/drivers/net/ntnic/ntnic_meter.h b/drivers/net/ntnic/ntnic_meter.h new file mode 100644 index 0000000000..9484c9ee20 --- /dev/null +++ b/drivers/net/ntnic/ntnic_meter.h @@ -0,0 +1,10 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2023 Napatech A/S + */ + +#ifndef __NTNIC_METER_H__ +#define __NTNIC_METER_H__ + +int eth_mtr_ops_get(struct rte_eth_dev *dev __rte_unused, void *ops); + +#endif /* __NTNIC_METER_H__ */ diff --git a/drivers/net/ntnic/ntnic_vdpa.c b/drivers/net/ntnic/ntnic_vdpa.c new file mode 100644 index 0000000000..6372514527 --- /dev/null +++ b/drivers/net/ntnic/ntnic_vdpa.c @@ -0,0 +1,365 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2023 Napatech A/S + */ + +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "ntnic_vf_vdpa.h" +#include "ntnic_vdpa.h" +#include "ntnic_ethdev.h" +#include "nt_util.h" +#include "ntlog.h" +#include "ntnic_vfio.h" + +#define MAX_PATH_LEN 128 +#define MAX_VDPA_PORTS 128UL + +struct vdpa_port { + char ifname[MAX_PATH_LEN]; + struct rte_vdpa_device *vdev; + int vid; + uint32_t index; + uint32_t host_id; + uint32_t rep_port; + int rxqs; + int txqs; + uint64_t flags; + struct rte_pci_addr addr; +}; + +static struct vdpa_port vport[MAX_VDPA_PORTS]; +static uint32_t nb_vpda_devcnt; + +static int nthw_vdpa_start(struct vdpa_port *vport); + +int nthw_vdpa_get_queue_id_info(struct rte_vdpa_device *vdpa_dev, int rx, + int queue_id, uint32_t *hw_index, + uint32_t *host_id, uint32_t *rep_port) +{ + uint32_t i; + + for (i = 0; i < nb_vpda_devcnt; i++) { + if (vport[i].vdev == vdpa_dev) { + if (rx) { + if (queue_id >= vport[i].rxqs) { + NT_LOG(ERR, VDPA, + "Failed: %s: Queue ID not configured. vDPA dev %p, rx queue_id %i, rxqs %i\n", + __func__, vdpa_dev, queue_id, + vport[i].rxqs); + return -1; + } + *hw_index = vport[i].index + queue_id; + } else { + if (queue_id >= vport[i].txqs) { + NT_LOG(ERR, VDPA, + "Failed: %s: Queue ID not configured. 
vDPA dev %p, tx queue_id %i, rxqs %i\n", + __func__, vdpa_dev, queue_id, + vport[i].rxqs); + return -1; + } + *hw_index = vport[i].index + queue_id; + } + + *host_id = vport[i].host_id; + *rep_port = vport[i].rep_port; + return 0; + } + } + + NT_LOG(ERR, VDPA, + "Failed: %s: Ask on vDPA dev %p, queue_id %i, nb_vpda_devcnt %i\n", + __func__, vdpa_dev, queue_id, nb_vpda_devcnt); + return -1; +} + +int nthw_vdpa_init(const struct rte_pci_device *vdev, + const char *backing_devname _unused, const char *socket_path, + uint32_t index, int rxqs, int txqs, uint32_t rep_port, + int *vhid) +{ + int ret; + uint32_t host_id = nt_vfio_vf_num(vdev); + + struct rte_vdpa_device *vdpa_dev = + rte_vdpa_find_device_by_name(vdev->name); + if (!vdpa_dev) { + NT_LOG(ERR, VDPA, "vDPA device with name %s - not found\n", + vdev->name); + return -1; + } + + vport[nb_vpda_devcnt].vdev = vdpa_dev; + vport[nb_vpda_devcnt].host_id = host_id; /* VF # */ + vport[nb_vpda_devcnt].index = index; /* HW ring index */ + vport[nb_vpda_devcnt].rep_port = rep_port; /* in port override on Tx */ + vport[nb_vpda_devcnt].rxqs = rxqs; + vport[nb_vpda_devcnt].txqs = txqs; + vport[nb_vpda_devcnt].addr = vdev->addr; + + vport[nb_vpda_devcnt].flags = RTE_VHOST_USER_CLIENT; + strlcpy(vport[nb_vpda_devcnt].ifname, socket_path, MAX_PATH_LEN); + + NT_LOG(INF, VDPA, + "vDPA%u: device %s (host_id %u), backing device %s, index %u, queues %i, rep port %u, ifname %s\n", + nb_vpda_devcnt, vdev->name, host_id, backing_devname, index, + rxqs, rep_port, vport[nb_vpda_devcnt].ifname); + + ret = nthw_vdpa_start(&vport[nb_vpda_devcnt]); + + *vhid = nb_vpda_devcnt; + nb_vpda_devcnt++; + return ret; +} + +void nthw_vdpa_close(void) +{ + uint32_t i; + + for (i = 0; i < MAX_VDPA_PORTS; i++) { + if (vport[i].ifname[0] != '\0') { + int ret; + char *socket_path = vport[i].ifname; + + ret = rte_vhost_driver_detach_vdpa_device(socket_path); + if (ret != 0) { + NT_LOG(ERR, VDPA, + "detach vdpa device failed: %s\n", + socket_path); + } + + ret = rte_vhost_driver_unregister(socket_path); + if (ret != 0) { + NT_LOG(ERR, VDPA, + "Fail to unregister vhost driver for %s.\n", + socket_path); + } + + vport[i].ifname[0] = '\0'; + return; + } + } +} + +#ifdef DUMP_VIRTIO_FEATURES +#define VIRTIO_F_NOTIFICATION_DATA 38 +#define NUM_FEATURES 40 +struct { + uint64_t id; + const char *name; +} virt_features[NUM_FEATURES] = { + { VIRTIO_NET_F_CSUM, "VIRTIO_NET_F_CSUM" }, + { VIRTIO_NET_F_GUEST_CSUM, "VIRTIO_NET_F_GUEST_CSUM" }, + { VIRTIO_NET_F_CTRL_GUEST_OFFLOADS, + " VIRTIO_NET_F_CTRL_GUEST_OFFLOADS" + }, + { VIRTIO_NET_F_MTU, " VIRTIO_NET_F_MTU" }, + { VIRTIO_NET_F_MAC, " VIRTIO_NET_F_MAC" }, + { VIRTIO_NET_F_GSO, " VIRTIO_NET_F_GSO" }, + { VIRTIO_NET_F_GUEST_TSO4, " VIRTIO_NET_F_GUEST_TSO4" }, + { VIRTIO_NET_F_GUEST_TSO6, " VIRTIO_NET_F_GUEST_TSO6" }, + { VIRTIO_NET_F_GUEST_ECN, " VIRTIO_NET_F_GUEST_ECN" }, + { VIRTIO_NET_F_GUEST_UFO, " VIRTIO_NET_F_GUEST_UFO" }, + { VIRTIO_NET_F_HOST_TSO4, " VIRTIO_NET_F_HOST_TSO4" }, + { VIRTIO_NET_F_HOST_TSO6, " VIRTIO_NET_F_HOST_TSO6" }, + { VIRTIO_NET_F_HOST_ECN, " VIRTIO_NET_F_HOST_ECN" }, + { VIRTIO_NET_F_HOST_UFO, " VIRTIO_NET_F_HOST_UFO" }, + { VIRTIO_NET_F_MRG_RXBUF, " VIRTIO_NET_F_MRG_RXBUF" }, + { VIRTIO_NET_F_STATUS, " VIRTIO_NET_F_STATUS" }, + { VIRTIO_NET_F_CTRL_VQ, " VIRTIO_NET_F_CTRL_VQ" }, + { VIRTIO_NET_F_CTRL_RX, " VIRTIO_NET_F_CTRL_RX" }, + { VIRTIO_NET_F_CTRL_VLAN, " VIRTIO_NET_F_CTRL_VLAN" }, + { VIRTIO_NET_F_CTRL_RX_EXTRA, " VIRTIO_NET_F_CTRL_RX_EXTRA" }, + { VIRTIO_NET_F_GUEST_ANNOUNCE, " VIRTIO_NET_F_GUEST_ANNOUNCE" 
}, + { VIRTIO_NET_F_MQ, " VIRTIO_NET_F_MQ" }, + { VIRTIO_NET_F_CTRL_MAC_ADDR, " VIRTIO_NET_F_CTRL_MAC_ADDR" }, + { VIRTIO_NET_F_HASH_REPORT, " VIRTIO_NET_F_HASH_REPORT" }, + { VIRTIO_NET_F_RSS, " VIRTIO_NET_F_RSS" }, + { VIRTIO_NET_F_RSC_EXT, " VIRTIO_NET_F_RSC_EXT" }, + { VIRTIO_NET_F_STANDBY, " VIRTIO_NET_F_STANDBY" }, + { VIRTIO_NET_F_SPEED_DUPLEX, " VIRTIO_NET_F_SPEED_DUPLEX" }, + { VIRTIO_F_NOTIFY_ON_EMPTY, " VIRTIO_F_NOTIFY_ON_EMPTY" }, + { VIRTIO_F_ANY_LAYOUT, " VIRTIO_F_ANY_LAYOUT" }, + { VIRTIO_RING_F_INDIRECT_DESC, " VIRTIO_RING_F_INDIRECT_DESC" }, + { VIRTIO_F_VERSION_1, " VIRTIO_F_VERSION_1" }, + { VIRTIO_F_IOMMU_PLATFORM, " VIRTIO_F_IOMMU_PLATFORM" }, + { VIRTIO_F_RING_PACKED, " VIRTIO_F_RING_PACKED" }, + { VIRTIO_TRANSPORT_F_START, " VIRTIO_TRANSPORT_F_START" }, + { VIRTIO_TRANSPORT_F_END, " VIRTIO_TRANSPORT_F_END" }, + { VIRTIO_F_IN_ORDER, " VIRTIO_F_IN_ORDER" }, + { VIRTIO_F_ORDER_PLATFORM, " VIRTIO_F_ORDER_PLATFORM" }, + { VIRTIO_F_NOTIFICATION_DATA, " VIRTIO_F_NOTIFICATION_DATA" }, +}; + +static void dump_virtio_features(uint64_t features) +{ + int i; + + for (i = 0; i < NUM_FEATURES; i++) { + if ((1ULL << virt_features[i].id) == + (features & (1ULL << virt_features[i].id))) + printf("Virtio feature: %s\n", virt_features[i].name); + } +} +#endif + +static int nthw_vdpa_new_device(int vid) +{ + char ifname[MAX_PATH_LEN]; + uint64_t negotiated_features = 0; + unsigned int vhid = -1; + + rte_vhost_get_ifname(vid, ifname, sizeof(ifname)); + + for (vhid = 0; vhid < MAX_VDPA_PORTS; vhid++) { + if (strncmp(ifname, vport[vhid].ifname, MAX_PATH_LEN) == 0) { + vport[vhid].vid = vid; + break; + } + } + + if (vhid >= MAX_VDPA_PORTS) + return -1; + + int max_loops = 2000; + struct pmd_internals *intern; + + while ((intern = vp_vhid_instance_ready(vhid)) == NULL) { + usleep(1000); + if (--max_loops == 0) { + NT_LOG(INF, VDPA, + "FAILED CREATING (vhost could not get ready) New port %s, vDPA dev: %s\n", + ifname, vport[vhid].vdev->device->name); + return -1; + } + } + + /* set link up on virtual port */ + intern->vport_comm = VIRT_PORT_NEGOTIATED_NONE; + + /* Store ifname (vhost_path) */ + strlcpy(intern->vhost_path, ifname, MAX_PATH_LEN); + + NT_LOG(INF, VDPA, "New port %s, vDPA dev: %s\n", ifname, + vport[vhid].vdev->device->name); + rte_vhost_get_negotiated_features(vid, &negotiated_features); + NT_LOG(INF, VDPA, "Virtio Negotiated features %016lx\n", + negotiated_features); + +#ifdef DUMP_VIRTIO_FEATURES + dump_virtio_features(negotiated_features); +#endif + + if ((((negotiated_features & (1ULL << VIRTIO_F_IN_ORDER))) || + ((negotiated_features & (1ULL << VIRTIO_F_RING_PACKED))))) { + /* IN_ORDER negotiated - we can run HW-virtio directly (vDPA) */ + NT_LOG(INF, VDPA, "Running virtio in vDPA mode : %s %s\n", + (negotiated_features & (1ULL << VIRTIO_F_RING_PACKED)) ? + "\"Packed-Ring\"" : + "\"Split-Ring\"", + (negotiated_features & (1ULL << VIRTIO_F_IN_ORDER)) ? + "\"In-Order\"" : + "\"No In-Order Requested\""); + + intern->vport_comm = + (negotiated_features & (1ULL << VIRTIO_F_RING_PACKED)) ? 
+ VIRT_PORT_NEGOTIATED_PACKED : + VIRT_PORT_NEGOTIATED_SPLIT; + } else { + NT_LOG(ERR, VDPA, "Incompatible virtio negotiated features.\n"); + return -1; + } + return 0; +} + +static void nthw_vdpa_destroy_device(int vid) +{ + char ifname[MAX_PATH_LEN]; + uint32_t i; + unsigned int vhid; + + rte_vhost_get_ifname(vid, ifname, sizeof(ifname)); + for (i = 0; i < MAX_VDPA_PORTS; i++) { + if (strcmp(ifname, vport[i].ifname) == 0) { + NT_LOG(INF, VDPA, "\ndestroy port %s, vDPA dev: %s\n", + ifname, vport[i].vdev->device->name); + break; + } + } + + struct pmd_internals *intern; + + /* set link down on virtual port */ + for (vhid = 0; vhid < MAX_VDPA_PORTS; vhid++) { + if (strncmp(ifname, vport[vhid].ifname, MAX_PATH_LEN) == 0) { + intern = vp_vhid_instance_ready(vhid); + if (intern) + intern->vport_comm = VIRT_PORT_NEGOTIATED_NONE; + break; + } + } +} + +static const struct rte_vhost_device_ops vdpa_devops = { + .new_device = nthw_vdpa_new_device, + .destroy_device = nthw_vdpa_destroy_device, +}; + +static int nthw_vdpa_start(struct vdpa_port *vport) +{ + int ret; + char *socket_path = vport->ifname; + + ret = rte_vhost_driver_register(socket_path, vport->flags); + if (ret != 0) { + NT_LOG(ERR, VDPA, "register driver failed: %s\n", socket_path); + return -1; + } + + ret = rte_vhost_driver_callback_register(socket_path, &vdpa_devops); + if (ret != 0) { + NT_LOG(ERR, VDPA, "register driver ops failed: %s\n", + socket_path); + return -1; + } + + ret = rte_vhost_driver_disable_features(socket_path, (1ULL << VIRTIO_NET_F_HOST_TSO4) | + (1ULL << VIRTIO_NET_F_HOST_TSO6) | + (1ULL << VIRTIO_NET_F_CSUM) | + (1ULL << VIRTIO_RING_F_EVENT_IDX) | + (1ULL << VIRTIO_RING_F_INDIRECT_DESC) | + (1ULL << VIRTIO_NET_F_HOST_UFO) | + (1ULL << VIRTIO_NET_F_HOST_ECN) | + (1ULL << VIRTIO_NET_F_GUEST_CSUM) | + (1ULL << VIRTIO_NET_F_GUEST_TSO4) | + (1ULL << VIRTIO_NET_F_GUEST_TSO6) | + (1ULL << VIRTIO_NET_F_GUEST_UFO) | + (1ULL << VIRTIO_NET_F_GUEST_ECN) | + (1ULL << VIRTIO_NET_F_CTRL_VQ) | + (1ULL << VIRTIO_NET_F_CTRL_RX) | + (1ULL << VIRTIO_NET_F_GSO) | + (1ULL << VIRTIO_NET_F_MTU)); + + if (ret != 0) { + NT_LOG(INF, VDPA, + "rte_vhost_driver_disable_features failed for vhost user client port: %s\n", + socket_path); + return -1; + } + + if (rte_vhost_driver_start(socket_path) < 0) { + NT_LOG(ERR, VDPA, "start vhost driver failed: %s\n", + socket_path); + return -1; + } + return 0; +} diff --git a/drivers/net/ntnic/ntnic_vdpa.h b/drivers/net/ntnic/ntnic_vdpa.h new file mode 100644 index 0000000000..7acc2c8e4b --- /dev/null +++ b/drivers/net/ntnic/ntnic_vdpa.h @@ -0,0 +1,21 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2023 Napatech A/S + */ + +#ifndef _NTNIC_VDPA_H_ +#define _NTNIC_VDPA_H_ + +#include + +int nthw_vdpa_get_queue_id_info(struct rte_vdpa_device *vdpa_dev, int rx, + int queue_id, uint32_t *hw_index, + uint32_t *host_id, uint32_t *rep_port); + +int nthw_vdpa_init(const struct rte_pci_device *vdev, + const char *backing_devname, const char *socket_path, + uint32_t index, int rxqs, int txqs, uint32_t rep_port, + int *vhid); + +void nthw_vdpa_close(void); + +#endif /* _NTNIC_VDPA_H_ */ diff --git a/drivers/net/ntnic/ntnic_vf.c b/drivers/net/ntnic/ntnic_vf.c new file mode 100644 index 0000000000..0724b040c3 --- /dev/null +++ b/drivers/net/ntnic/ntnic_vf.c @@ -0,0 +1,83 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2023 Napatech A/S + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include "ntnic_ethdev.h" +#include "ntnic_vf.h" +#include 
"ntnic_vf_vdpa.h" +#include "nt_util.h" +#include "ntlog.h" + +#define NT_HW_NAPATECH_PCI_VENDOR_ID (0x18F4) +#define NT_HW_NAPATECH_PCI_DEVICE_ID_NT200A02_VF (0x051A) +#define NT_HW_NAPATECH_PCI_DEVICE_ID_NT50B01_VF (0x051B) + +static const char *get_adapter_name(struct rte_pci_device *pci_dev) +{ + switch (pci_dev->id.vendor_id) { + case NT_HW_NAPATECH_PCI_VENDOR_ID: + switch (pci_dev->id.device_id) { + case NT_HW_NAPATECH_PCI_DEVICE_ID_NT200A02_VF: + return "NT200A02"; + case NT_HW_NAPATECH_PCI_DEVICE_ID_NT50B01_VF: + return "NT50B01"; + } + break; + } + + return "Unknown"; +} + +int nt_vf_pci_probe(struct rte_pci_driver *pci_drv, + struct rte_pci_device *pci_dev) +{ + const char *adapter_name _unused = get_adapter_name(pci_dev); + + NT_LOG(INF, VDPA, "Probe %s VF : %02x:%02x:%i\n", adapter_name, + pci_dev->addr.bus, pci_dev->addr.devid, pci_dev->addr.function); + + /* Create vDPA device for the virtual function interface.*/ + + if (ntvf_vdpa_pci_probe(pci_drv, pci_dev) != 0) + return -1; + + return nthw_create_vf_interface_dpdk(pci_dev); +} + +int nt_vf_pci_remove(struct rte_pci_device *pci_dev) +{ + if (ntvf_vdpa_pci_remove(pci_dev) != 0) + return -1; + + return nthw_remove_vf_interface_dpdk(pci_dev); +} + +static const struct rte_pci_id pci_id_nt_vf_map[] = { + { RTE_PCI_DEVICE(NT_HW_NAPATECH_PCI_VENDOR_ID, + NT_HW_NAPATECH_PCI_DEVICE_ID_NT200A02_VF) + }, + { RTE_PCI_DEVICE(NT_HW_NAPATECH_PCI_VENDOR_ID, + NT_HW_NAPATECH_PCI_DEVICE_ID_NT50B01_VF) + }, + { .vendor_id = 0, /* sentinel */ }, +}; + +static struct rte_pci_driver rte_nt_vf = { + .id_table = pci_id_nt_vf_map, + .drv_flags = 0, + .probe = nt_vf_pci_probe, + .remove = nt_vf_pci_remove, +}; + +RTE_PMD_REGISTER_PCI(net_nt_vf, rte_nt_vf); +RTE_PMD_REGISTER_PCI_TABLE(net_nt_vf, pci_id_nt_vf_map); +RTE_PMD_REGISTER_KMOD_DEP(net_nt_vf, "* vfio-pci"); diff --git a/drivers/net/ntnic/ntnic_vf.h b/drivers/net/ntnic/ntnic_vf.h new file mode 100644 index 0000000000..84be3bd71f --- /dev/null +++ b/drivers/net/ntnic/ntnic_vf.h @@ -0,0 +1,17 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2023 Napatech A/S + */ + +#ifndef _NTNIC_VF_H_ +#define _NTNIC_VF_H_ + +#include "rte_bus_pci.h" + +int nt_vf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, + struct rte_pci_device *pci_dev); +int nt_vf_pci_remove(struct rte_pci_device *pci_dev __rte_unused); + +int get_container_fd(int vf_num); +int close_vf_mem_mapping(int vf_num); + +#endif /* _NTNIC_VF_H_ */ diff --git a/drivers/net/ntnic/ntnic_vf_vdpa.c b/drivers/net/ntnic/ntnic_vf_vdpa.c new file mode 100644 index 0000000000..c520a43c44 --- /dev/null +++ b/drivers/net/ntnic/ntnic_vf_vdpa.c @@ -0,0 +1,1246 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2018 Intel Corporation + * Copyright(c) 2023 Napatech A/S + */ + +#include +#include +#include + +#include +#include +#include +#include + +#include +#include + +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#include "ntdrv_4ga.h" +#include "ntnic_ethdev.h" +#include "ntnic_vdpa.h" +#include "ntnic_vf_vdpa.h" +#include "ntnic_vf.h" +#include "ntnic_vfio.h" +#include "ntnic_dbsconfig.h" +#include "ntlog.h" + +#define NTVF_VDPA_MAX_QUEUES (MAX_QUEUES) +#define NTVF_VDPA_MAX_INTR_VECTORS 8 + +#if RTE_VERSION_NUM(23, 3, 0, 99) > RTE_VERSION +#define NTVF_VDPA_SUPPORTED_PROTOCOL_FEATURES \ + ((1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK) | \ + (1ULL << VHOST_USER_PROTOCOL_F_SLAVE_REQ) | \ + (1ULL << VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD) | \ + (1ULL << 
VHOST_USER_PROTOCOL_F_HOST_NOTIFIER) | \ + (1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD) | \ + (1ULL << VHOST_USER_PROTOCOL_F_MQ)) +#else +#define NTVF_VDPA_SUPPORTED_PROTOCOL_FEATURES \ + ((1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK) | \ + (1ULL << VHOST_USER_PROTOCOL_F_BACKEND_REQ) | \ + (1ULL << VHOST_USER_PROTOCOL_F_BACKEND_SEND_FD) | \ + (1ULL << VHOST_USER_PROTOCOL_F_HOST_NOTIFIER) | \ + (1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD) | \ + (1ULL << VHOST_USER_PROTOCOL_F_MQ)) +#endif + +#define NTVF_VIRTIO_NET_SUPPORTED_FEATURES \ + ((1ULL << VIRTIO_F_ANY_LAYOUT) | (1ULL << VIRTIO_F_VERSION_1) | \ + (1ULL << VHOST_F_LOG_ALL) | (1ULL << VIRTIO_NET_F_MRG_RXBUF) | \ + (1ULL << VIRTIO_F_IOMMU_PLATFORM) | (1ULL << VIRTIO_F_IN_ORDER) | \ + (1ULL << VIRTIO_F_RING_PACKED) | \ + (1ULL << VIRTIO_NET_F_GUEST_ANNOUNCE) | \ + (1ULL << VHOST_USER_F_PROTOCOL_FEATURES)) + +static int ntvf_vdpa_set_vring_state(int vid, int vring, int state); + +struct vring_info { + uint64_t desc; + uint64_t avail; + uint64_t used; + uint16_t size; + + uint16_t last_avail_idx; + uint16_t last_used_idx; + + int vq_type; + struct nthw_virt_queue *p_vq; + + int enable; +}; + +struct ntvf_vdpa_hw { + uint64_t negotiated_features; + + uint8_t nr_vring; + + struct vring_info vring[NTVF_VDPA_MAX_QUEUES * 2]; +}; + +struct ntvf_vdpa_internal { + struct rte_pci_device *pdev; + struct rte_vdpa_device *vdev; + + int vfio_container_fd; + int vfio_group_fd; + int vfio_dev_fd; + + int vid; + + uint32_t outport; + + uint16_t max_queues; + + uint64_t features; + + struct ntvf_vdpa_hw hw; + + volatile atomic_int_fast32_t started; + volatile atomic_int_fast32_t dev_attached; + volatile atomic_int_fast32_t running; + + rte_spinlock_t lock; + + volatile atomic_int_fast32_t dma_mapped; + volatile atomic_int_fast32_t intr_enabled; +}; + +#ifndef PAGE_SIZE +#define PAGE_SIZE 4096 +#endif + +#define NTVF_USED_RING_LEN(size) \ + ((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3) + +#define NTVF_MEDIATED_VRING 0x210000000000 + +struct internal_list { + TAILQ_ENTRY(internal_list) next; + struct ntvf_vdpa_internal *internal; +}; + +TAILQ_HEAD(internal_list_head, internal_list); + +static struct internal_list_head internal_list = + TAILQ_HEAD_INITIALIZER(internal_list); + +static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER; + +int ntvf_vdpa_logtype; + +static struct internal_list * +find_internal_resource_by_vdev(struct rte_vdpa_device *vdev) +{ + int found = 0; + struct internal_list *list; + + NT_LOG(DBG, VDPA, "%s: vDPA dev=%p\n", __func__, vdev); + + pthread_mutex_lock(&internal_list_lock); + + TAILQ_FOREACH(list, &internal_list, next) + { + if (vdev == list->internal->vdev) { + found = 1; + break; + } + } + + pthread_mutex_unlock(&internal_list_lock); + + if (!found) + return NULL; + + return list; +} + +static struct internal_list * +ntvf_vdpa_find_internal_resource_by_dev(const struct rte_pci_device *pdev) +{ + int found = 0; + struct internal_list *list; + + NT_LOG(DBG, VDPA, "%s: [%s:%u]\n", __func__, __FILE__, __LINE__); + + pthread_mutex_lock(&internal_list_lock); + + TAILQ_FOREACH(list, &internal_list, next) + { + if (pdev == list->internal->pdev) { + found = 1; + break; + } + } + + pthread_mutex_unlock(&internal_list_lock); + + if (!found) + return NULL; + + return list; +} + +static int ntvf_vdpa_vfio_setup(struct ntvf_vdpa_internal *internal) +{ + int vfio; + + LOG_FUNC_ENTER(); + + internal->vfio_dev_fd = -1; + internal->vfio_group_fd = -1; + internal->vfio_container_fd = -1; + + vfio = 
nt_vfio_setup(internal->pdev); + if (vfio == -1) { + NT_LOG(ERR, VDPA, "%s: [%s:%u]\n", __func__, __FILE__, __LINE__); + return -1; + } + internal->vfio_container_fd = nt_vfio_get_container_fd(vfio); + internal->vfio_group_fd = nt_vfio_get_group_fd(vfio); + internal->vfio_dev_fd = nt_vfio_get_dev_fd(vfio); + return 0; +} + +static int ntvf_vdpa_dma_map(struct ntvf_vdpa_internal *internal, int do_map) +{ + uint32_t i; + int ret = 0; + struct rte_vhost_memory *mem = NULL; + int vf_num = nt_vfio_vf_num(internal->pdev); + + LOG_FUNC_ENTER(); + + NT_LOG(DBG, VDPA, "%s: vid=%d vDPA dev=%p\n", __func__, internal->vid, + internal->vdev); + + if ((do_map && atomic_load(&internal->dma_mapped)) || + (!do_map && !atomic_load(&internal->dma_mapped))) { + ret = -1; + goto exit; + } + ret = rte_vhost_get_mem_table(internal->vid, &mem); + if (ret < 0) { + NT_LOG(ERR, VDPA, "failed to get VM memory layout.\n"); + goto exit; + } + + for (i = 0; i < mem->nregions; i++) { + struct rte_vhost_mem_region *reg = &mem->regions[i]; + + NT_LOG(INF, VDPA, + "%s, region %u: HVA 0x%" PRIX64 ", GPA 0x%" PRIX64 ", size 0x%" PRIX64 ".\n", + (do_map ? "DMA map" : "DMA unmap"), i, + reg->host_user_addr, reg->guest_phys_addr, reg->size); + + if (do_map) { + ret = nt_vfio_dma_map_vdpa(vf_num, reg->host_user_addr, + reg->guest_phys_addr, + reg->size); + if (ret < 0) { + NT_LOG(ERR, VDPA, "%s: DMA map failed.\n", + __func__); + goto exit; + } + atomic_store(&internal->dma_mapped, 1); + } else { + ret = nt_vfio_dma_unmap_vdpa(vf_num, + reg->host_user_addr, + reg->guest_phys_addr, + reg->size); + if (ret < 0) { + NT_LOG(ERR, VDPA, "%s: DMA unmap failed.\n", __func__); + goto exit; + } + atomic_store(&internal->dma_mapped, 0); + } + } + +exit: + if (mem) + free(mem); + + LOG_FUNC_LEAVE(); + return ret; +} + +static uint64_t _hva_to_gpa(int vid, uint64_t hva) +{ + struct rte_vhost_memory *mem = NULL; + struct rte_vhost_mem_region *reg; + uint64_t gpa = 0; + uint32_t i; + + if (rte_vhost_get_mem_table(vid, &mem) < 0) + goto exit; + + for (i = 0; i < mem->nregions; i++) { + reg = &mem->regions[i]; + if (hva >= reg->host_user_addr && + hva < reg->host_user_addr + reg->size) { + gpa = hva - reg->host_user_addr + reg->guest_phys_addr; + break; + } + } + +exit: + if (mem) + free(mem); + + return gpa; +} + +static int ntvf_vdpa_create_vring(struct ntvf_vdpa_internal *internal, + int vring) +{ + struct ntvf_vdpa_hw *hw = &internal->hw; + struct rte_vhost_vring vq; + int vid = internal->vid; + uint64_t gpa; + + rte_vhost_get_vhost_vring(vid, vring, &vq); + + NT_LOG(INF, VDPA, "%s: idx=%d: vq.desc %p\n", __func__, vring, vq.desc); + + gpa = _hva_to_gpa(vid, (uint64_t)(uintptr_t)vq.desc); + if (gpa == 0) { + NT_LOG(ERR, VDPA, + "%s: idx=%d: failed to get GPA for descriptor ring: vq.desc %p\n", + __func__, vring, vq.desc); + return -1; + } + hw->vring[vring].desc = gpa; + + gpa = _hva_to_gpa(vid, (uint64_t)(uintptr_t)vq.avail); + if (gpa == 0) { + NT_LOG(ERR, VDPA, + "%s: idx=%d: failed to get GPA for available ring\n", + __func__, vring); + return -1; + } + hw->vring[vring].avail = gpa; + + gpa = _hva_to_gpa(vid, (uint64_t)(uintptr_t)vq.used); + if (gpa == 0) { + NT_LOG(ERR, VDPA, "%s: idx=%d: failed to get GPA for used ring\n", + __func__, vring); + return -1; + } + + hw->vring[vring].used = gpa; + hw->vring[vring].size = vq.size; + + rte_vhost_get_vring_base(vid, vring, &hw->vring[vring].last_avail_idx, + &hw->vring[vring].last_used_idx); + + /* Prevent multiple creations */ + { + const int index = vring; + uint32_t hw_index = 0; + uint32_t host_id = 
0; + const uint32_t header = 0; /* 0=VirtIO hdr, 1=NT virtio hdr */ + uint32_t vport = 0; + uint32_t port = internal->outport; + struct vring_info *p_vr_inf = &hw->vring[vring]; + nthw_dbs_t *p_nthw_dbs = get_pdbs_from_pci(internal->pdev->addr); + + int res = nthw_vdpa_get_queue_id_info(internal->vdev, + !(vring & 1), vring >> 1, + &hw_index, &host_id, + &vport); + if (res) { + NT_LOG(ERR, VDPA, "HW info received failed\n"); + p_vr_inf->p_vq = NULL; /* Failed to create the vring */ + return res; + } + + if (!(vring & 1)) { + NT_LOG(DBG, VDPA, + "Rx: idx %u, host_id %u, vport %u, queue %i\n", + hw_index, host_id, vport, vring >> 1); + } else { + NT_LOG(DBG, VDPA, + "Tx: idx %u, host_id %u, vport %u, queue %i\n", + hw_index, host_id, vport, vring >> 1); + } + NT_LOG(DBG, VDPA, + "%s: idx=%d: avail=%p used=%p desc=%p: %X: %d %d %d\n", + __func__, index, (void *)p_vr_inf->avail, + (void *)p_vr_inf->used, (void *)p_vr_inf->desc, + p_vr_inf->size, host_id, port, header); + + if ((hw->negotiated_features & (1ULL << VIRTIO_F_IN_ORDER)) || + (hw->negotiated_features & + (1ULL << VIRTIO_F_RING_PACKED))) { + int res; + + NT_LOG(DBG, VDPA, + "%s: idx=%d: feature VIRTIO_F_IN_ORDER is set: 0x%016lX\n", + __func__, index, hw->negotiated_features); + + if (!(vring & 1)) { + struct nthw_virt_queue *rx_vq; + + uint16_t start_idx = + hw->vring[vring].last_avail_idx; + uint16_t next_ptr = + (start_idx & 0x7fff) % vq.size; + + /* disable doorbell not needed by FPGA */ + ((struct pvirtq_event_suppress *)vq.used) + ->flags = RING_EVENT_FLAGS_DISABLE; + rte_wmb(); + if (hw->negotiated_features & + (1ULL << VIRTIO_F_RING_PACKED)) { + NT_LOG(DBG, VDPA, + "Rx: hw_index %u, host_id %u, start_idx %u, header %u, vring %u, vport %u\n", + hw_index, host_id, start_idx, + header, vring, vport); + /* irq_vector 1,3,5... for Rx we support max 8 pr VF */ + rx_vq = nthw_setup_rx_virt_queue(p_nthw_dbs, + hw_index, start_idx, + next_ptr, + (void *)p_vr_inf + ->avail, /* -> driver_event */ + (void *)p_vr_inf + ->used, /* -> device_event */ + (void *)p_vr_inf->desc, + p_vr_inf->size, host_id, header, + PACKED_RING, + vring + 1); + + } else { + rx_vq = nthw_setup_rx_virt_queue(p_nthw_dbs, + hw_index, start_idx, + next_ptr, + (void *)p_vr_inf->avail, + (void *)p_vr_inf->used, + (void *)p_vr_inf->desc, + p_vr_inf->size, host_id, header, + SPLIT_RING, + -1); /* no interrupt enabled */ + } + + p_vr_inf->p_vq = rx_vq; + p_vr_inf->vq_type = 0; + res = (rx_vq ? 0 : -1); + if (res == 0) + register_release_virtqueue_info(rx_vq, + 1, 0); + + NT_LOG(DBG, VDPA, "[%i] Rx Queue size %i\n", + hw_index, p_vr_inf->size); + } else if (vring & 1) { + /* + * transmit virt queue + */ + struct nthw_virt_queue *tx_vq; + uint16_t start_idx = + hw->vring[vring].last_avail_idx; + uint16_t next_ptr; + + if (hw->negotiated_features & + (1ULL << VIRTIO_F_RING_PACKED)) { + next_ptr = + (start_idx & 0x7fff) % vq.size; + + /* disable doorbell needs from FPGA */ + ((struct pvirtq_event_suppress *)vq.used) + ->flags = + RING_EVENT_FLAGS_DISABLE; + rte_wmb(); + tx_vq = nthw_setup_tx_virt_queue(p_nthw_dbs, + hw_index, start_idx, + next_ptr, + (void *)p_vr_inf->avail, /* driver_event */ + (void *)p_vr_inf->used, /* device_event */ + (void *)p_vr_inf->desc, + p_vr_inf->size, host_id, port, + vport, header, PACKED_RING, + vring + 1, /* interrupt 2,4,6... 
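for Tx; Rx queues use the odd vectors 1,3,5... and vector 0 carries the device/config interrupt (see ntvf_vdpa_enable_vfio_intr)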
*/ + !!(hw->negotiated_features & + (1ULL << VIRTIO_F_IN_ORDER))); + + } else { + /* + * In Live Migration restart scenario: + * This only works if no jumbo packets have been sent from the VM + * on the LM source side. This pointer points to the next + * free descriptor and may be pushed ahead by the next flag; if + * so, this pointer calculation is incorrect + * + * NOTE: THEREFORE, THIS DOES NOT WORK WITH JUMBO PACKETS + * SUPPORT IN VM + */ + next_ptr = + (start_idx & 0x7fff) % vq.size; + tx_vq = nthw_setup_tx_virt_queue(p_nthw_dbs, + hw_index, start_idx, + next_ptr, + (void *)p_vr_inf->avail, + (void *)p_vr_inf->used, + (void *)p_vr_inf->desc, + p_vr_inf->size, host_id, port, + vport, header, SPLIT_RING, + -1, /* no interrupt enabled */ + IN_ORDER); + } + + p_vr_inf->p_vq = tx_vq; + p_vr_inf->vq_type = 1; + res = (tx_vq ? 0 : -1); + if (res == 0) + register_release_virtqueue_info(tx_vq, + 0, 0); + + NT_LOG(DBG, VDPA, "[%i] Tx Queue size %i\n", + hw_index, p_vr_inf->size); + } else { + NT_LOG(ERR, VDPA, + "%s: idx=%d: unexpected index: %d\n", + __func__, index, vring); + res = -1; + } + if (res != 0) { + NT_LOG(ERR, VDPA, + "%s: idx=%d: vring error: res=%d\n", + __func__, index, res); + } + + } else { + NT_LOG(WRN, VDPA, + "%s: idx=%d: for SPLIT RING: feature VIRTIO_F_IN_ORDER is *NOT* set: 0x%016lX\n", + __func__, index, hw->negotiated_features); + return 0; + } + } + + return 0; +} + +static int ntvf_vdpa_start(struct ntvf_vdpa_internal *internal) +{ + enum fpga_info_profile fpga_profile = + get_fpga_profile_from_pci(internal->pdev->addr); + struct ntvf_vdpa_hw *hw = &internal->hw; + int vid; + + LOG_FUNC_ENTER(); + + vid = internal->vid; + hw->nr_vring = rte_vhost_get_vring_num(vid); + rte_vhost_get_negotiated_features(vid, &hw->negotiated_features); + + if (fpga_profile == FPGA_INFO_PROFILE_INLINE) { + NT_LOG(INF, VDPA, "%s: Number of VRINGs=%u\n", __func__, + hw->nr_vring); + + for (int i = 0; i < hw->nr_vring && i < 2; i++) { + if (!hw->vring[i].enable) { + ntvf_vdpa_dma_map(internal, 1); + ntvf_vdpa_create_vring(internal, i); + if (hw->vring[i].desc && hw->vring[i].p_vq) { + if (hw->vring[i].vq_type == 0) + nthw_enable_rx_virt_queue(hw->vring[i].p_vq); + else + nthw_enable_tx_virt_queue(hw->vring[i].p_vq); + hw->vring[i].enable = 1; + } + } + } + } else { + /* + * Initially vring 0 must be enabled/created here - it is not later + * enabled in vring state + */ + if (!hw->vring[0].enable) { + ntvf_vdpa_dma_map(internal, 1); + ntvf_vdpa_create_vring(internal, 0); + hw->vring[0].enable = 1; + } + } + + LOG_FUNC_LEAVE(); + return 0; +} + +static int ntvf_vdpa_stop(struct ntvf_vdpa_internal *internal) +{ + struct ntvf_vdpa_hw *hw = &internal->hw; + uint64_t features; + uint32_t i; + int vid; + int res; + + LOG_FUNC_ENTER(); + + vid = internal->vid; + + for (i = 0; i < hw->nr_vring; i++) { + rte_vhost_set_vring_base(vid, i, hw->vring[i].last_avail_idx, + hw->vring[i].last_used_idx); + } + + rte_vhost_get_negotiated_features(vid, &features); + + for (i = 0; i < hw->nr_vring; i++) { + struct vring_info *p_vr_inf = &hw->vring[i]; + + if ((hw->negotiated_features & (1ULL << VIRTIO_F_IN_ORDER)) || + (hw->negotiated_features & + (1ULL << VIRTIO_F_RING_PACKED))) { + NT_LOG(DBG, VDPA, + "%s: feature VIRTIO_F_IN_ORDER is set: 0x%016lX\n", + __func__, hw->negotiated_features); + if (p_vr_inf->vq_type == 0) { + de_register_release_virtqueue_info(p_vr_inf->p_vq); + res = nthw_release_rx_virt_queue(p_vr_inf->p_vq); + } else if (p_vr_inf->vq_type == 1) { + de_register_release_virtqueue_info(p_vr_inf->p_vq); + 
res = nthw_release_tx_virt_queue(p_vr_inf->p_vq); + } else { + NT_LOG(ERR, VDPA, + "%s: vring #%d: unknown type %d\n", + __func__, i, p_vr_inf->vq_type); + res = -1; + } + if (res != 0) { + NT_LOG(ERR, VDPA, "%s: vring #%d: res=%d\n", + __func__, i, res); + } + } else { + NT_LOG(WRN, VDPA, + "%s: feature VIRTIO_F_IN_ORDER is *NOT* set: 0x%016lX\n", + __func__, hw->negotiated_features); + } + p_vr_inf->desc = 0UL; + } + + if (RTE_VHOST_NEED_LOG(features)) { + NT_LOG(WRN, VDPA, + "%s: vid %d: vhost logging feature needed - currently not supported\n", + __func__, vid); + } + + LOG_FUNC_LEAVE(); + return 0; +} + +#define MSIX_IRQ_SET_BUF_LEN \ + (sizeof(struct vfio_irq_set) + \ + sizeof(int) * NTVF_VDPA_MAX_QUEUES * 2 + 1) + +static int ntvf_vdpa_enable_vfio_intr(struct ntvf_vdpa_internal *internal) +{ + int ret; + uint32_t i, nr_vring; + char irq_set_buf[MSIX_IRQ_SET_BUF_LEN]; + struct vfio_irq_set *irq_set; + int *fd_ptr; + struct rte_vhost_vring vring; + + if (atomic_load(&internal->intr_enabled)) + return 0; + + LOG_FUNC_ENTER(); + vring.callfd = -1; + + nr_vring = rte_vhost_get_vring_num(internal->vid); + + NT_LOG(INF, VDPA, + "Enable VFIO interrupt MSI-X num rings %i on VID %i (%02x:%02x.%x)\n", + nr_vring, internal->vid, internal->pdev->addr.bus, + internal->pdev->addr.devid, internal->pdev->addr.function); + + if (nr_vring + 1 > NTVF_VDPA_MAX_INTR_VECTORS) { + NT_LOG(WRN, VDPA, + "Can't enable MSI interrupts. Too many vectors requested: " + "%i (max: %i) only poll mode drivers will work", + nr_vring + 1, NTVF_VDPA_MAX_INTR_VECTORS); + /* + * Return success, because polling drivers in VM still works without + * interrupts (i.e. DPDK PMDs) + */ + return 0; + } + + irq_set = (struct vfio_irq_set *)irq_set_buf; + irq_set->argsz = sizeof(irq_set_buf); + irq_set->count = nr_vring + 1; + irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | + VFIO_IRQ_SET_ACTION_TRIGGER; + irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX; + irq_set->start = 0; + fd_ptr = (int *)&irq_set->data; + + fd_ptr[RTE_INTR_VEC_ZERO_OFFSET] = internal->pdev->intr_handle->fd; + + for (i = 0; i < nr_vring; i += 2) { + rte_vhost_get_vhost_vring(internal->vid, i, &vring); + fd_ptr[RTE_INTR_VEC_RXTX_OFFSET + i] = vring.callfd; + + rte_vhost_get_vhost_vring(internal->vid, i + 1, &vring); + fd_ptr[RTE_INTR_VEC_RXTX_OFFSET + i + 1] = vring.callfd; + } + + ret = ioctl(internal->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set); + if (ret) { + NT_LOG(ERR, VDPA, "Error enabling MSI-X interrupts: %s", + strerror(errno)); + return -1; + } + + atomic_store(&internal->intr_enabled, 1); + + LOG_FUNC_LEAVE(); + return 0; +} + +static int ntvf_vdpa_disable_vfio_intr(struct ntvf_vdpa_internal *internal) +{ + int ret; + char irq_set_buf[MSIX_IRQ_SET_BUF_LEN]; + struct vfio_irq_set *irq_set; + int len; + + if (!atomic_load(&internal->intr_enabled)) + return 0; + LOG_FUNC_ENTER(); + + NT_LOG(INF, VDPA, "Disable VFIO interrupt on VID %i (%02x:%02x.%x)\n", + internal->vid, internal->pdev->addr.bus, + internal->pdev->addr.devid, internal->pdev->addr.function); + + len = sizeof(struct vfio_irq_set); + irq_set = (struct vfio_irq_set *)irq_set_buf; + irq_set->argsz = len; + irq_set->count = 0; + irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER; + irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX; + irq_set->start = 0; + + ret = ioctl(internal->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set); + if (ret) { + NT_LOG(ERR, VDPA, "Error disabling MSI-X interrupts: %s", + strerror(errno)); + return -1; + } + + atomic_store(&internal->intr_enabled, 0); + + 
LOG_FUNC_LEAVE(); + return 0; +} + +static int ntvf_vdpa_update_datapath(struct ntvf_vdpa_internal *internal) +{ + int ret; + + LOG_FUNC_ENTER(); + + rte_spinlock_lock(&internal->lock); + + if (!atomic_load(&internal->running) && + (atomic_load(&internal->started) && + atomic_load(&internal->dev_attached))) { + NT_LOG(DBG, VDPA, "%s: [%s:%u] start\n", __func__, __FILE__, + __LINE__); + + ret = ntvf_vdpa_start(internal); + if (ret) { + NT_LOG(ERR, VDPA, "%s: [%s:%u]\n", __func__, __FILE__, + __LINE__); + goto err; + } + + atomic_store(&internal->running, 1); + } else if (atomic_load(&internal->running) && + (!atomic_load(&internal->started) || + !atomic_load(&internal->dev_attached))) { + NT_LOG(DBG, VDPA, "%s: stop\n", __func__); + + ret = ntvf_vdpa_stop(internal); + if (ret) { + NT_LOG(ERR, VDPA, "%s: [%s:%u]\n", __func__, __FILE__, + __LINE__); + goto err; + } + + ret = ntvf_vdpa_disable_vfio_intr(internal); + if (ret) { + NT_LOG(ERR, VDPA, "%s: [%s:%u]\n", __func__, __FILE__, + __LINE__); + goto err; + } + + ret = ntvf_vdpa_dma_map(internal, 0); + if (ret) { + NT_LOG(ERR, VDPA, "%s: [%s:%u]\n", __func__, __FILE__, + __LINE__); + goto err; + } + + atomic_store(&internal->running, 0); + } else { + NT_LOG(INF, VDPA, "%s: unhandled state [%s:%u]\n", __func__, + __FILE__, __LINE__); + } + + rte_spinlock_unlock(&internal->lock); + LOG_FUNC_LEAVE(); + return 0; + +err: + rte_spinlock_unlock(&internal->lock); + NT_LOG(ERR, VDPA, "%s: leave [%s:%u]\n", __func__, __FILE__, __LINE__); + return ret; +} + +static int ntvf_vdpa_dev_config(int vid) +{ + struct rte_vdpa_device *vdev; + struct internal_list *list; + struct ntvf_vdpa_internal *internal; + + LOG_FUNC_ENTER(); + + vdev = rte_vhost_get_vdpa_device(vid); + list = find_internal_resource_by_vdev(vdev); + if (list == NULL) { + NT_LOG(ERR, VDPA, "Invalid vDPA device: %p", vdev); + return -1; + } + + internal = list->internal; + internal->vid = vid; + + atomic_store(&internal->dev_attached, 1); + + ntvf_vdpa_update_datapath(internal); + + LOG_FUNC_LEAVE(); + return 0; +} + +static int ntvf_vdpa_dev_close(int vid) +{ + struct rte_vdpa_device *vdev; + struct internal_list *list; + struct ntvf_vdpa_internal *internal; + + LOG_FUNC_ENTER(); + + vdev = rte_vhost_get_vdpa_device(vid); + list = find_internal_resource_by_vdev(vdev); + if (list == NULL) { + NT_LOG(ERR, VDPA, "Invalid vDPA device: %p", vdev); + return -1; + } + + internal = list->internal; + + atomic_store(&internal->dev_attached, 0); + ntvf_vdpa_update_datapath(internal); + + /* Invalidate the virt queue pointers */ + uint32_t i; + struct ntvf_vdpa_hw *hw = &internal->hw; + + for (i = 0; i < hw->nr_vring; i++) + hw->vring[i].p_vq = NULL; + + LOG_FUNC_LEAVE(); + return 0; +} + +static int ntvf_vdpa_set_features(int vid) +{ + uint64_t features; + struct rte_vdpa_device *vdev; + struct internal_list *list; + + LOG_FUNC_ENTER(); + + vdev = rte_vhost_get_vdpa_device(vid); + list = find_internal_resource_by_vdev(vdev); + if (list == NULL) { + NT_LOG(ERR, VDPA, "Invalid vDPA device: %p", vdev); + return -1; + } + + rte_vhost_get_negotiated_features(vid, &features); + NT_LOG(DBG, VDPA, "%s: vid %d: vDPA dev %p: features=0x%016lX\n", + __func__, vid, vdev, features); + + if (!RTE_VHOST_NEED_LOG(features)) + return 0; + + NT_LOG(INF, VDPA, + "%s: Starting Live Migration for vid=%d vDPA dev=%p\n", __func__, + vid, vdev); + + /* Relay core feature not present. We cannot do live migration then. */ + NT_LOG(ERR, VDPA, + "%s: Live Migration not possible. 
Relay core feature required.\n", + __func__); + return -1; +} + +static int ntvf_vdpa_get_vfio_group_fd(int vid) +{ + struct rte_vdpa_device *vdev; + struct internal_list *list; + + LOG_FUNC_ENTER(); + + vdev = rte_vhost_get_vdpa_device(vid); + list = find_internal_resource_by_vdev(vdev); + if (list == NULL) { + NT_LOG(ERR, VDPA, "Invalid vDPA device: %p", vdev); + return -1; + } + + LOG_FUNC_LEAVE(); + return list->internal->vfio_group_fd; +} + +static int ntvf_vdpa_get_vfio_device_fd(int vid) +{ + struct rte_vdpa_device *vdev; + struct internal_list *list; + + LOG_FUNC_ENTER(); + + vdev = rte_vhost_get_vdpa_device(vid); + list = find_internal_resource_by_vdev(vdev); + if (list == NULL) { + NT_LOG(ERR, VDPA, "Invalid vDPA device: %p", vdev); + return -1; + } + + LOG_FUNC_LEAVE(); + return list->internal->vfio_dev_fd; +} + +static int ntvf_vdpa_get_queue_num(struct rte_vdpa_device *vdev, + uint32_t *queue_num) +{ + struct internal_list *list; + + LOG_FUNC_ENTER(); + + list = find_internal_resource_by_vdev(vdev); + if (list == NULL) { + NT_LOG(ERR, VDPA, "%s: Invalid device : %p\n", __func__, vdev); + return -1; + } + *queue_num = list->internal->max_queues; + NT_LOG(DBG, VDPA, "%s: vDPA dev=%p queue_num=%d\n", __func__, vdev, + *queue_num); + + LOG_FUNC_LEAVE(); + return 0; +} + +static int ntvf_vdpa_get_vdpa_features(struct rte_vdpa_device *vdev, + uint64_t *features) +{ + struct internal_list *list; + + LOG_FUNC_ENTER(); + + list = find_internal_resource_by_vdev(vdev); + if (list == NULL) { + NT_LOG(ERR, VDPA, "%s: Invalid device : %p\n", __func__, vdev); + return -1; + } + + if (!features) { + NT_LOG(ERR, VDPA, "%s: vDPA dev=%p: no ptr to feature\n", + __func__, vdev); + return -1; + } + + *features = list->internal->features; + NT_LOG(DBG, VDPA, "%s: vDPA dev=%p: features=0x%016lX\n", __func__, + vdev, *features); + + LOG_FUNC_LEAVE(); + return 0; +} + +static int +ntvf_vdpa_get_protocol_features(struct rte_vdpa_device *vdev __rte_unused, + uint64_t *features) +{ + LOG_FUNC_ENTER(); + + if (!features) { + NT_LOG(ERR, VDPA, "%s: vDPA dev=%p: no ptr to feature\n", + __func__, vdev); + return -1; + } + + *features = NTVF_VDPA_SUPPORTED_PROTOCOL_FEATURES; + NT_LOG(DBG, VDPA, "%s: vDPA dev=%p: features=0x%016lX\n", __func__, + vdev, *features); + + LOG_FUNC_LEAVE(); + return 0; +} + +static int ntvf_vdpa_configure_queue(struct ntvf_vdpa_hw *hw, + struct ntvf_vdpa_internal *internal) +{ + int ret = 0; + + ret = ntvf_vdpa_enable_vfio_intr(internal); + if (ret) { + printf("ERROR - ENABLE INTERRUPT via VFIO\n"); + return ret; + } + /* Enable Rx and Tx for all vrings */ + for (int i = 0; i < hw->nr_vring; i++) { + if (i & 1) + nthw_enable_tx_virt_queue(hw->vring[i].p_vq); + else + nthw_enable_rx_virt_queue(hw->vring[i].p_vq); + } + return ret; +} +static int ntvf_vdpa_set_vring_state(int vid, int vring, int state) +{ + struct rte_vdpa_device *vdev; + struct internal_list *list; + + struct ntvf_vdpa_internal *internal; + struct ntvf_vdpa_hw *hw; + int ret = 0; + + LOG_FUNC_ENTER(); + + vdev = rte_vhost_get_vdpa_device(vid); + list = find_internal_resource_by_vdev(vdev); + if (list == NULL) { + NT_LOG(ERR, VDPA, "Invalid vDPA device: %p", vdev); + return -1; + } + + internal = list->internal; + if (vring < 0 || vring >= internal->max_queues * 2) { + NT_LOG(ERR, VDPA, "Vring index %d not correct", vring); + return -1; + } + + hw = &internal->hw; + enum fpga_info_profile fpga_profile = + get_fpga_profile_from_pci(internal->pdev->addr); + + if (!state && hw->vring[vring].enable) { + /* Disable vring 
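- stop FPGA servicing of an already created queue; the ring itself is kept, so a later enable only has to re-enable it instead of re-creating it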
*/ + if (hw->vring[vring].desc && hw->vring[vring].p_vq) { + if (hw->vring[vring].vq_type == 0) + nthw_disable_rx_virt_queue(hw->vring[vring].p_vq); + else + nthw_disable_tx_virt_queue(hw->vring[vring].p_vq); + } + } + + if (state && !hw->vring[vring].enable) { + /* Enable/Create vring */ + if (hw->vring[vring].desc && hw->vring[vring].p_vq) { + if (hw->vring[vring].vq_type == 0) + nthw_enable_rx_virt_queue(hw->vring[vring].p_vq); + else + nthw_enable_tx_virt_queue(hw->vring[vring].p_vq); + } else { + ntvf_vdpa_dma_map(internal, 1); + ntvf_vdpa_create_vring(internal, vring); + + if (fpga_profile != FPGA_INFO_PROFILE_INLINE) { + /* + * After last vq enable VFIO interrupt IOMMU re-mapping and enable + * FPGA Rx/Tx + */ + if (vring == hw->nr_vring - 1) { + ret = ntvf_vdpa_configure_queue(hw, internal); + if (ret) + return ret; + } + } + } + } + + if (fpga_profile == FPGA_INFO_PROFILE_INLINE) { + hw->vring[vring].enable = !!state; + /* after last vq enable VFIO interrupt IOMMU re-mapping */ + if (hw->vring[vring].enable && vring == hw->nr_vring - 1) { + ret = ntvf_vdpa_configure_queue(hw, internal); + if (ret) + return ret; + } + } else { + hw->vring[vring].enable = !!state; + } + LOG_FUNC_LEAVE(); + return 0; +} + +static struct rte_vdpa_dev_ops ntvf_vdpa_vdpa_ops = { + .get_queue_num = ntvf_vdpa_get_queue_num, + .get_features = ntvf_vdpa_get_vdpa_features, + .get_protocol_features = ntvf_vdpa_get_protocol_features, + .dev_conf = ntvf_vdpa_dev_config, + .dev_close = ntvf_vdpa_dev_close, + .set_vring_state = ntvf_vdpa_set_vring_state, + .set_features = ntvf_vdpa_set_features, + .migration_done = NULL, + .get_vfio_group_fd = ntvf_vdpa_get_vfio_group_fd, + .get_vfio_device_fd = ntvf_vdpa_get_vfio_device_fd, + .get_notify_area = NULL, +}; + +int ntvf_vdpa_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, + struct rte_pci_device *pci_dev) +{ + struct ntvf_vdpa_internal *internal = NULL; + struct internal_list *list = NULL; + enum fpga_info_profile fpga_profile; + + LOG_FUNC_ENTER(); + + NT_LOG(INF, VDPA, "%s: [%s:%u] %04x:%02x:%02x.%x\n", __func__, __FILE__, + __LINE__, pci_dev->addr.domain, pci_dev->addr.bus, + pci_dev->addr.devid, pci_dev->addr.function); + list = rte_zmalloc("ntvf_vdpa", sizeof(*list), 0); + if (list == NULL) { + NT_LOG(ERR, VDPA, "%s: [%s:%u]\n", __func__, __FILE__, + __LINE__); + goto error; + } + + internal = rte_zmalloc("ntvf_vdpa", sizeof(*internal), 0); + if (internal == NULL) { + NT_LOG(ERR, VDPA, "%s: [%s:%u]\n", __func__, __FILE__, + __LINE__); + goto error; + } + + internal->pdev = pci_dev; + rte_spinlock_init(&internal->lock); + + if (ntvf_vdpa_vfio_setup(internal) < 0) { + NT_LOG(ERR, VDPA, "%s: [%s:%u]\n", __func__, __FILE__, + __LINE__); + return -1; + } + + internal->max_queues = NTVF_VDPA_MAX_QUEUES; + + internal->features = NTVF_VIRTIO_NET_SUPPORTED_FEATURES; + + NT_LOG(DBG, VDPA, "%s: masked features=0x%016lX [%s:%u]\n", __func__, + internal->features, __FILE__, __LINE__); + + fpga_profile = get_fpga_profile_from_pci(internal->pdev->addr); + if (fpga_profile == FPGA_INFO_PROFILE_VSWITCH) { + internal->outport = 0; + } else { + /* VF4 output port 0, VF5 output port 1, VF6 output port 0, ....... 
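i.e. for non-vswitch profiles the output port simply alternates with the PCI function number (function & 1)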
*/ + internal->outport = internal->pdev->addr.function & 1; + } + + list->internal = internal; + + internal->vdev = + rte_vdpa_register_device(&pci_dev->device, &ntvf_vdpa_vdpa_ops); + NT_LOG(DBG, VDPA, "%s: vDPA dev=%p\n", __func__, internal->vdev); + + if (!internal->vdev) { + NT_LOG(ERR, VDPA, "%s: [%s:%u] Register vDPA device failed\n", + __func__, __FILE__, __LINE__); + goto error; + } + + pthread_mutex_lock(&internal_list_lock); + TAILQ_INSERT_TAIL(&internal_list, list, next); + pthread_mutex_unlock(&internal_list_lock); + + atomic_store(&internal->started, 1); + + ntvf_vdpa_update_datapath(internal); + + LOG_FUNC_LEAVE(); + return 0; + +error: + rte_free(list); + rte_free(internal); + return -1; +} + +int ntvf_vdpa_pci_remove(struct rte_pci_device *pci_dev) +{ + struct ntvf_vdpa_internal *internal; + struct internal_list *list; + int vf_num = nt_vfio_vf_num(pci_dev); + + LOG_FUNC_ENTER(); + list = ntvf_vdpa_find_internal_resource_by_dev(pci_dev); + if (list == NULL) { + NT_LOG(ERR, VDPA, "%s: Invalid device: %s", __func__, + pci_dev->name); + return -1; + } + + internal = list->internal; + atomic_store(&internal->started, 0); + + ntvf_vdpa_update_datapath(internal); + + rte_pci_unmap_device(internal->pdev); + nt_vfio_remove(vf_num); + rte_vdpa_unregister_device(internal->vdev); + + pthread_mutex_lock(&internal_list_lock); + TAILQ_REMOVE(&internal_list, list, next); + pthread_mutex_unlock(&internal_list_lock); + + rte_free(list); + rte_free(internal); + + LOG_FUNC_LEAVE(); + return 0; +} + +static const struct rte_pci_id pci_id_ntvf_vdpa_map[] = { + { + .vendor_id = 0, + }, +}; + +static struct rte_pci_driver rte_ntvf_vdpa = { + .id_table = pci_id_ntvf_vdpa_map, + .drv_flags = 0, + .probe = ntvf_vdpa_pci_probe, + .remove = ntvf_vdpa_pci_remove, +}; + +RTE_PMD_REGISTER_PCI(net_ntvf_vdpa, rte_ntvf_vdpa); +RTE_PMD_REGISTER_PCI_TABLE(net_ntvf_vdpa, pci_id_ntvf_vdpa_map); +RTE_PMD_REGISTER_KMOD_DEP(net_ntvf_vdpa, "* vfio-pci"); + diff --git a/drivers/net/ntnic/ntnic_vf_vdpa.h b/drivers/net/ntnic/ntnic_vf_vdpa.h new file mode 100644 index 0000000000..561e3bf7cf --- /dev/null +++ b/drivers/net/ntnic/ntnic_vf_vdpa.h @@ -0,0 +1,25 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2023 Napatech A/S + */ + +#ifndef __NTNIC_VF_VDPA_H__ +#define __NTNIC_VF_VDPA_H__ + +extern int ntvf_vdpa_logtype; + +#define LOG_FUNC_TRACE +#ifdef LOG_FUNC_TRACE +#define LOG_FUNC_ENTER() NT_LOG(DBG, VDPA, "%s: enter\n", __func__) +#define LOG_FUNC_LEAVE() NT_LOG(DBG, VDPA, "%s: leave\n", __func__) +#else +#define LOG_FUNC_ENTER() +#define LOG_FUNC_LEAVE() +#endif + +int ntvf_vdpa_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, + struct rte_pci_device *pci_dev); +int ntvf_vdpa_pci_remove(struct rte_pci_device *pci_dev); + +void ntvf_vdpa_reset_hw(int vid); + +#endif /* __NTNIC_VF_VDPA_H__ */ diff --git a/drivers/net/ntnic/ntnic_vfio.c b/drivers/net/ntnic/ntnic_vfio.c new file mode 100644 index 0000000000..1390383c55 --- /dev/null +++ b/drivers/net/ntnic/ntnic_vfio.c @@ -0,0 +1,321 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2023 Napatech A/S + */ + +#include + +#include +#include +#include +#include +#include +#include + +#include +#include +#include "ntnic_vfio.h" + +#define ONE_G_SIZE 0x40000000 +#define ONE_G_MASK (ONE_G_SIZE - 1) +#define START_VF_IOVA 0x220000000000 + +int nt_vfio_vf_num(const struct rte_pci_device *pdev) +{ + return ((pdev->addr.devid & 0x1f) << 3) + ((pdev->addr.function) & 0x7); +} + +/* Internal API */ +struct vfio_dev { + int container_fd; + int 
group_fd; + int dev_fd; + uint64_t iova_addr; +}; + +static struct vfio_dev vfio_list[256]; + +static struct vfio_dev *vfio_get(int vf_num) +{ + if (vf_num < 0 || vf_num > 255) + return NULL; + return &vfio_list[vf_num]; +} + +/* External API */ +int nt_vfio_setup(struct rte_pci_device *dev) +{ + char devname[RTE_DEV_NAME_MAX_LEN] = { 0 }; + int iommu_group_num; + int vf_num; + struct vfio_dev *vfio; + + NT_LOG(INF, ETHDEV, "NT VFIO device setup %s\n", dev->name); + + vf_num = nt_vfio_vf_num(dev); + + vfio = vfio_get(vf_num); + if (vfio == NULL) { + NT_LOG(ERR, ETHDEV, + "VFIO device setup failed. Illegal device id\n"); + return -1; + } + + vfio->dev_fd = -1; + vfio->group_fd = -1; + vfio->container_fd = -1; + vfio->iova_addr = START_VF_IOVA; + + rte_pci_device_name(&dev->addr, devname, RTE_DEV_NAME_MAX_LEN); + rte_vfio_get_group_num(rte_pci_get_sysfs_path(), devname, + &iommu_group_num); + + if (vf_num == 0) { + /* use default container for pf0 */ + vfio->container_fd = RTE_VFIO_DEFAULT_CONTAINER_FD; + } else { + vfio->container_fd = rte_vfio_container_create(); + if (vfio->container_fd < 0) { + NT_LOG(ERR, ETHDEV, + "VFIO device setup failed. VFIO container creation failed.\n"); + return -1; + } + } + + vfio->group_fd = rte_vfio_container_group_bind(vfio->container_fd, + iommu_group_num); + if (vfio->group_fd < 0) { + NT_LOG(ERR, ETHDEV, + "VFIO device setup failed. VFIO container group bind failed.\n"); + goto err; + } + + if (vf_num > 0) { + if (rte_pci_map_device(dev)) { + NT_LOG(ERR, ETHDEV, + "Map VFIO device failed. is the vfio-pci driver loaded?\n"); + goto err; + } + } + + vfio->dev_fd = rte_intr_dev_fd_get(dev->intr_handle); + + NT_LOG(DBG, ETHDEV, + "%s: VFIO id=%d, dev_fd=%d, container_fd=%d, group_fd=%d, iommu_group_num=%d\n", + dev->name, vf_num, vfio->dev_fd, vfio->container_fd, + vfio->group_fd, iommu_group_num); + + return vf_num; + +err: + if (vfio->container_fd != RTE_VFIO_DEFAULT_CONTAINER_FD) + rte_vfio_container_destroy(vfio->container_fd); + return -1; +} + +int nt_vfio_remove(int vf_num) +{ + struct vfio_dev *vfio; + + NT_LOG(DBG, ETHDEV, "NT VFIO device remove VF=%d\n", vf_num); + + vfio = vfio_get(vf_num); + if (!vfio) { + NT_LOG(ERR, ETHDEV, + "VFIO device remove failed. 
Illegal device id\n"); + return -1; + } + + rte_vfio_container_destroy(vfio->container_fd); + return 0; +} + +int nt_vfio_dma_map(int vf_num, void *virt_addr, uint64_t *iova_addr, + uint64_t size) +{ + uint64_t gp_virt_base; + uint64_t gp_offset; + + if (size == ONE_G_SIZE) { + gp_virt_base = (uint64_t)virt_addr & ~ONE_G_MASK; + gp_offset = (uint64_t)virt_addr & ONE_G_MASK; + } else { + gp_virt_base = (uint64_t)virt_addr; + gp_offset = 0; + } + + struct vfio_dev *vfio; + + vfio = vfio_get(vf_num); + if (vfio == NULL) { + NT_LOG(ERR, ETHDEV, "VFIO MAP: VF number %d invalid\n", vf_num); + return -1; + } + + NT_LOG(DBG, ETHDEV, + "VFIO MMAP VF=%d VirtAddr=%" PRIX64 " HPA=%" PRIX64 + " VirtBase=%" PRIX64 " IOVA Addr=%" PRIX64 " size=%d\n", + vf_num, virt_addr, rte_malloc_virt2iova(virt_addr), gp_virt_base, + vfio->iova_addr, size); + + int res = rte_vfio_container_dma_map(vfio->container_fd, gp_virt_base, + vfio->iova_addr, size); + + NT_LOG(DBG, ETHDEV, "VFIO MMAP res %i, container_fd %i, vf_num %i\n", + res, vfio->container_fd, vf_num); + if (res) { + NT_LOG(ERR, ETHDEV, + "rte_vfio_container_dma_map failed: res %d\n", res); + return -1; + } + + *iova_addr = vfio->iova_addr + gp_offset; + + vfio->iova_addr += ONE_G_SIZE; + + return 0; +} + +int nt_vfio_dma_unmap(int vf_num, void *virt_addr, uint64_t iova_addr, + uint64_t size) +{ + uint64_t gp_virt_base; + struct vfio_dev *vfio; + + if (size == ONE_G_SIZE) { + uint64_t gp_offset; + + gp_virt_base = (uint64_t)virt_addr & ~ONE_G_MASK; + gp_offset = (uint64_t)virt_addr & ONE_G_MASK; + iova_addr -= gp_offset; + } else { + gp_virt_base = (uint64_t)virt_addr; + } + + vfio = vfio_get(vf_num); + + if (vfio == NULL) { + NT_LOG(ERR, ETHDEV, "VFIO UNMAP: VF number %d invalid\n", + vf_num); + return -1; + } + + if (vfio->container_fd == -1) + return 0; + + int res = rte_vfio_container_dma_unmap(vfio->container_fd, gp_virt_base, + iova_addr, size); + if (res != 0) { + NT_LOG(ERR, ETHDEV, + "VFIO UNMMAP FAILED! res %i, container_fd %i, vf_num %i, virt_base=%" PRIX64 + ", IOVA=%" PRIX64 ", size=%i\n", + res, vfio->container_fd, vf_num, gp_virt_base, iova_addr, + (int)size); + return -1; + } + + return 0; +} + +/* vDPA mapping with Guest Phy addresses as IOVA */ +int nt_vfio_dma_map_vdpa(int vf_num, uint64_t virt_addr, uint64_t iova_addr, + uint64_t size) +{ + struct vfio_dev *vfio = vfio_get(vf_num); + + if (vfio == NULL) { + NT_LOG(ERR, ETHDEV, "VFIO MAP: VF number %d invalid\n", vf_num); + return -1; + } + + NT_LOG(DBG, ETHDEV, + "VFIO vDPA MMAP VF=%d VirtAddr=%" PRIX64 " IOVA Addr=%" PRIX64 + " size=%d\n", + vf_num, virt_addr, iova_addr, size); + + int res = rte_vfio_container_dma_map(vfio->container_fd, virt_addr, + iova_addr, size); + + NT_LOG(DBG, ETHDEV, + "VFIO vDPA MMAP res %i, container_fd %i, vf_num %i\n", res, + vfio->container_fd, vf_num); + if (res) { + NT_LOG(ERR, ETHDEV, + "rte_vfio_container_dma_map failed: res %d\n", res); + return -1; + } + + return 0; +} + +int nt_vfio_dma_unmap_vdpa(int vf_num, uint64_t virt_addr, uint64_t iova_addr, + uint64_t size) +{ + struct vfio_dev *vfio = vfio_get(vf_num); + + if (vfio == NULL) { + NT_LOG(ERR, ETHDEV, "VFIO vDPA UNMAP: VF number %d invalid\n", + vf_num); + return -1; + } + int res = rte_vfio_container_dma_unmap(vfio->container_fd, virt_addr, + iova_addr, size); + if (res != 0) { + NT_LOG(ERR, ETHDEV, + "VFIO vDPA UNMMAP FAILED! 
res %i, container_fd %i, vf_num %i\n", + res, vfio->container_fd, vf_num); + return -1; + } + + return 0; +} + +int nt_vfio_get_container_fd(int vf_num) +{ + struct vfio_dev *vfio; + + vfio = vfio_get(vf_num); + if (!vfio) { + NT_LOG(ERR, ETHDEV, + "VFIO device remove failed. Illegal device id\n"); + return -1; + } + return vfio->container_fd; +} + +int nt_vfio_get_group_fd(int vf_num) +{ + struct vfio_dev *vfio; + + vfio = vfio_get(vf_num); + if (!vfio) { + NT_LOG(ERR, ETHDEV, + "VFIO device remove failed. Illegal device id\n"); + return -1; + } + return vfio->group_fd; +} + +int nt_vfio_get_dev_fd(int vf_num) +{ + struct vfio_dev *vfio; + + vfio = vfio_get(vf_num); + if (!vfio) { + NT_LOG(ERR, ETHDEV, + "VFIO device remove failed. Illegal device id\n"); + return -1; + } + return vfio->dev_fd; +} + +/* Internal init */ + +RTE_INIT(nt_vfio_init); + +static void nt_vfio_init(void) +{ + struct nt_util_vfio_impl s = { .vfio_dma_map = nt_vfio_dma_map, + .vfio_dma_unmap = nt_vfio_dma_unmap + }; + nt_util_vfio_init(&s); +} diff --git a/drivers/net/ntnic/ntnic_vfio.h b/drivers/net/ntnic/ntnic_vfio.h new file mode 100644 index 0000000000..5d8a63d364 --- /dev/null +++ b/drivers/net/ntnic/ntnic_vfio.h @@ -0,0 +1,31 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2023 Napatech A/S + */ + +#ifndef _NTNIC_VFIO_H_ +#define _NTNIC_VFIO_H_ + +#include +#include +#include + +int nt_vfio_setup(struct rte_pci_device *dev); +int nt_vfio_remove(int vf_num); + +int nt_vfio_get_container_fd(int vf_num); +int nt_vfio_get_group_fd(int vf_num); +int nt_vfio_get_dev_fd(int vf_num); + +int nt_vfio_dma_map(int vf_num, void *virt_addr, uint64_t *iova_addr, + uint64_t size); +int nt_vfio_dma_unmap(int vf_num, void *virt_addr, uint64_t iova_addr, + uint64_t size); + +int nt_vfio_dma_map_vdpa(int vf_num, uint64_t virt_addr, uint64_t iova_addr, + uint64_t size); +int nt_vfio_dma_unmap_vdpa(int vf_num, uint64_t virt_addr, uint64_t iova_addr, + uint64_t size); + +/* Find device (PF/VF) number from device address */ +int nt_vfio_vf_num(const struct rte_pci_device *dev); +#endif /* _NTNIC_VFIO_H_ */ diff --git a/drivers/net/ntnic/ntnic_xstats.c b/drivers/net/ntnic/ntnic_xstats.c new file mode 100644 index 0000000000..e034e33c89 --- /dev/null +++ b/drivers/net/ntnic/ntnic_xstats.c @@ -0,0 +1,703 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2023 Napatech A/S + */ + +#include +#include + +#include "ntdrv_4ga.h" +#include "ntlog.h" +#include "nthw_drv.h" +#include "nthw_fpga.h" +#include "ntnic_xstats.h" + +#define UNUSED __rte_unused + +struct rte_nthw_xstats_names_s { + char name[RTE_ETH_XSTATS_NAME_SIZE]; + uint8_t source; + unsigned int offset; +}; + +/* + * Extended stat for VSwitch + */ +static struct rte_nthw_xstats_names_s nthw_virt_xstats_names[] = { + { "rx_octets", 1, offsetof(struct port_counters_vswitch_v1, octets) }, + { "rx_packets", 1, offsetof(struct port_counters_vswitch_v1, pkts) }, + { "rx_dropped_packets", 1, + offsetof(struct port_counters_vswitch_v1, drop_events) + }, + { "rx_qos_dropped_bytes", 1, + offsetof(struct port_counters_vswitch_v1, qos_drop_octets) + }, + { "rx_qos_dropped_packets", 1, + offsetof(struct port_counters_vswitch_v1, qos_drop_pkts) + }, + { "tx_octets", 2, offsetof(struct port_counters_vswitch_v1, octets) }, + { "tx_packets", 2, offsetof(struct port_counters_vswitch_v1, pkts) }, + { "tx_dropped_packets", 2, + offsetof(struct port_counters_vswitch_v1, drop_events) + }, + { "tx_qos_dropped_bytes", 2, + offsetof(struct port_counters_vswitch_v1, 
qos_drop_octets) + }, + { "tx_qos_dropped_packets", 2, + offsetof(struct port_counters_vswitch_v1, qos_drop_pkts) + }, +}; + +#define NTHW_VIRT_XSTATS_NAMES RTE_DIM(nthw_virt_xstats_names) + +/* + * Extended stat for Capture/Inline - implements RMON + * FLM 0.17 + */ +static struct rte_nthw_xstats_names_s nthw_cap_xstats_names_v1[] = { + { "rx_drop_events", 1, offsetof(struct port_counters_v2, drop_events) }, + { "rx_octets", 1, offsetof(struct port_counters_v2, octets) }, + { "rx_packets", 1, offsetof(struct port_counters_v2, pkts) }, + { "rx_broadcast_packets", 1, + offsetof(struct port_counters_v2, broadcast_pkts) + }, + { "rx_multicast_packets", 1, + offsetof(struct port_counters_v2, multicast_pkts) + }, + { "rx_unicast_packets", 1, + offsetof(struct port_counters_v2, unicast_pkts) + }, + { "rx_align_errors", 1, + offsetof(struct port_counters_v2, pkts_alignment) + }, + { "rx_code_violation_errors", 1, + offsetof(struct port_counters_v2, pkts_code_violation) + }, + { "rx_crc_errors", 1, offsetof(struct port_counters_v2, pkts_crc) }, + { "rx_undersize_packets", 1, + offsetof(struct port_counters_v2, undersize_pkts) + }, + { "rx_oversize_packets", 1, + offsetof(struct port_counters_v2, oversize_pkts) + }, + { "rx_fragments", 1, offsetof(struct port_counters_v2, fragments) }, + { "rx_jabbers_not_truncated", 1, + offsetof(struct port_counters_v2, jabbers_not_truncated) + }, + { "rx_jabbers_truncated", 1, + offsetof(struct port_counters_v2, jabbers_truncated) + }, + { "rx_size_64_packets", 1, + offsetof(struct port_counters_v2, pkts_64_octets) + }, + { "rx_size_65_to_127_packets", 1, + offsetof(struct port_counters_v2, pkts_65_to_127_octets) + }, + { "rx_size_128_to_255_packets", 1, + offsetof(struct port_counters_v2, pkts_128_to_255_octets) + }, + { "rx_size_256_to_511_packets", 1, + offsetof(struct port_counters_v2, pkts_256_to_511_octets) + }, + { "rx_size_512_to_1023_packets", 1, + offsetof(struct port_counters_v2, pkts_512_to_1023_octets) + }, + { "rx_size_1024_to_1518_packets", 1, + offsetof(struct port_counters_v2, pkts_1024_to_1518_octets) + }, + { "rx_size_1519_to_2047_packets", 1, + offsetof(struct port_counters_v2, pkts_1519_to_2047_octets) + }, + { "rx_size_2048_to_4095_packets", 1, + offsetof(struct port_counters_v2, pkts_2048_to_4095_octets) + }, + { "rx_size_4096_to_8191_packets", 1, + offsetof(struct port_counters_v2, pkts_4096_to_8191_octets) + }, + { "rx_size_8192_to_max_packets", 1, + offsetof(struct port_counters_v2, pkts_8192_to_max_octets) + }, + { "rx_ip_checksum_error", 1, + offsetof(struct port_counters_v2, pkts_ip_chksum_error) + }, + { "rx_udp_checksum_error", 1, + offsetof(struct port_counters_v2, pkts_udp_chksum_error) + }, + { "rx_tcp_checksum_error", 1, + offsetof(struct port_counters_v2, pkts_tcp_chksum_error) + }, + + { "tx_drop_events", 2, offsetof(struct port_counters_v2, drop_events) }, + { "tx_octets", 2, offsetof(struct port_counters_v2, octets) }, + { "tx_packets", 2, offsetof(struct port_counters_v2, pkts) }, + { "tx_broadcast_packets", 2, + offsetof(struct port_counters_v2, broadcast_pkts) + }, + { "tx_multicast_packets", 2, + offsetof(struct port_counters_v2, multicast_pkts) + }, + { "tx_unicast_packets", 2, + offsetof(struct port_counters_v2, unicast_pkts) + }, + { "tx_align_errors", 2, + offsetof(struct port_counters_v2, pkts_alignment) + }, + { "tx_code_violation_errors", 2, + offsetof(struct port_counters_v2, pkts_code_violation) + }, + { "tx_crc_errors", 2, offsetof(struct port_counters_v2, pkts_crc) }, + { "tx_undersize_packets", 2, + 
offsetof(struct port_counters_v2, undersize_pkts) + }, + { "tx_oversize_packets", 2, + offsetof(struct port_counters_v2, oversize_pkts) + }, + { "tx_fragments", 2, offsetof(struct port_counters_v2, fragments) }, + { "tx_jabbers_not_truncated", 2, + offsetof(struct port_counters_v2, jabbers_not_truncated) + }, + { "tx_jabbers_truncated", 2, + offsetof(struct port_counters_v2, jabbers_truncated) + }, + { "tx_size_64_packets", 2, + offsetof(struct port_counters_v2, pkts_64_octets) + }, + { "tx_size_65_to_127_packets", 2, + offsetof(struct port_counters_v2, pkts_65_to_127_octets) + }, + { "tx_size_128_to_255_packets", 2, + offsetof(struct port_counters_v2, pkts_128_to_255_octets) + }, + { "tx_size_256_to_511_packets", 2, + offsetof(struct port_counters_v2, pkts_256_to_511_octets) + }, + { "tx_size_512_to_1023_packets", 2, + offsetof(struct port_counters_v2, pkts_512_to_1023_octets) + }, + { "tx_size_1024_to_1518_packets", 2, + offsetof(struct port_counters_v2, pkts_1024_to_1518_octets) + }, + { "tx_size_1519_to_2047_packets", 2, + offsetof(struct port_counters_v2, pkts_1519_to_2047_octets) + }, + { "tx_size_2048_to_4095_packets", 2, + offsetof(struct port_counters_v2, pkts_2048_to_4095_octets) + }, + { "tx_size_4096_to_8191_packets", 2, + offsetof(struct port_counters_v2, pkts_4096_to_8191_octets) + }, + { "tx_size_8192_to_max_packets", 2, + offsetof(struct port_counters_v2, pkts_8192_to_max_octets) + }, + + /* FLM 0.17 */ + { "flm_count_current", 3, offsetof(struct flm_counters_v1, current) }, + { "flm_count_learn_done", 3, + offsetof(struct flm_counters_v1, learn_done) + }, + { "flm_count_learn_ignore", 3, + offsetof(struct flm_counters_v1, learn_ignore) + }, + { "flm_count_learn_fail", 3, + offsetof(struct flm_counters_v1, learn_fail) + }, + { "flm_count_unlearn_done", 3, + offsetof(struct flm_counters_v1, unlearn_done) + }, + { "flm_count_unlearn_ignore", 3, + offsetof(struct flm_counters_v1, unlearn_ignore) + }, + { "flm_count_auto_unlearn_done", 3, + offsetof(struct flm_counters_v1, auto_unlearn_done) + }, + { "flm_count_auto_unlearn_ignore", 3, + offsetof(struct flm_counters_v1, auto_unlearn_ignore) + }, + { "flm_count_auto_unlearn_fail", 3, + offsetof(struct flm_counters_v1, auto_unlearn_fail) + }, + { "flm_count_timeout_unlearn_done", 3, + offsetof(struct flm_counters_v1, timeout_unlearn_done) + }, + { "flm_count_rel_done", 3, offsetof(struct flm_counters_v1, rel_done) }, + { "flm_count_rel_ignore", 3, + offsetof(struct flm_counters_v1, rel_ignore) + }, + { "flm_count_prb_done", 3, offsetof(struct flm_counters_v1, prb_done) }, + { "flm_count_prb_ignore", 3, + offsetof(struct flm_counters_v1, prb_ignore) + }, +}; + +/* + * Extended stat for Capture/Inline - implements RMON + * FLM 0.18 + */ +static struct rte_nthw_xstats_names_s nthw_cap_xstats_names_v2[] = { + { "rx_drop_events", 1, offsetof(struct port_counters_v2, drop_events) }, + { "rx_octets", 1, offsetof(struct port_counters_v2, octets) }, + { "rx_packets", 1, offsetof(struct port_counters_v2, pkts) }, + { "rx_broadcast_packets", 1, + offsetof(struct port_counters_v2, broadcast_pkts) + }, + { "rx_multicast_packets", 1, + offsetof(struct port_counters_v2, multicast_pkts) + }, + { "rx_unicast_packets", 1, + offsetof(struct port_counters_v2, unicast_pkts) + }, + { "rx_align_errors", 1, + offsetof(struct port_counters_v2, pkts_alignment) + }, + { "rx_code_violation_errors", 1, + offsetof(struct port_counters_v2, pkts_code_violation) + }, + { "rx_crc_errors", 1, offsetof(struct port_counters_v2, pkts_crc) }, + { 
"rx_undersize_packets", 1, + offsetof(struct port_counters_v2, undersize_pkts) + }, + { "rx_oversize_packets", 1, + offsetof(struct port_counters_v2, oversize_pkts) + }, + { "rx_fragments", 1, offsetof(struct port_counters_v2, fragments) }, + { "rx_jabbers_not_truncated", 1, + offsetof(struct port_counters_v2, jabbers_not_truncated) + }, + { "rx_jabbers_truncated", 1, + offsetof(struct port_counters_v2, jabbers_truncated) + }, + { "rx_size_64_packets", 1, + offsetof(struct port_counters_v2, pkts_64_octets) + }, + { "rx_size_65_to_127_packets", 1, + offsetof(struct port_counters_v2, pkts_65_to_127_octets) + }, + { "rx_size_128_to_255_packets", 1, + offsetof(struct port_counters_v2, pkts_128_to_255_octets) + }, + { "rx_size_256_to_511_packets", 1, + offsetof(struct port_counters_v2, pkts_256_to_511_octets) + }, + { "rx_size_512_to_1023_packets", 1, + offsetof(struct port_counters_v2, pkts_512_to_1023_octets) + }, + { "rx_size_1024_to_1518_packets", 1, + offsetof(struct port_counters_v2, pkts_1024_to_1518_octets) + }, + { "rx_size_1519_to_2047_packets", 1, + offsetof(struct port_counters_v2, pkts_1519_to_2047_octets) + }, + { "rx_size_2048_to_4095_packets", 1, + offsetof(struct port_counters_v2, pkts_2048_to_4095_octets) + }, + { "rx_size_4096_to_8191_packets", 1, + offsetof(struct port_counters_v2, pkts_4096_to_8191_octets) + }, + { "rx_size_8192_to_max_packets", 1, + offsetof(struct port_counters_v2, pkts_8192_to_max_octets) + }, + { "rx_ip_checksum_error", 1, + offsetof(struct port_counters_v2, pkts_ip_chksum_error) + }, + { "rx_udp_checksum_error", 1, + offsetof(struct port_counters_v2, pkts_udp_chksum_error) + }, + { "rx_tcp_checksum_error", 1, + offsetof(struct port_counters_v2, pkts_tcp_chksum_error) + }, + + { "tx_drop_events", 2, offsetof(struct port_counters_v2, drop_events) }, + { "tx_octets", 2, offsetof(struct port_counters_v2, octets) }, + { "tx_packets", 2, offsetof(struct port_counters_v2, pkts) }, + { "tx_broadcast_packets", 2, + offsetof(struct port_counters_v2, broadcast_pkts) + }, + { "tx_multicast_packets", 2, + offsetof(struct port_counters_v2, multicast_pkts) + }, + { "tx_unicast_packets", 2, + offsetof(struct port_counters_v2, unicast_pkts) + }, + { "tx_align_errors", 2, + offsetof(struct port_counters_v2, pkts_alignment) + }, + { "tx_code_violation_errors", 2, + offsetof(struct port_counters_v2, pkts_code_violation) + }, + { "tx_crc_errors", 2, offsetof(struct port_counters_v2, pkts_crc) }, + { "tx_undersize_packets", 2, + offsetof(struct port_counters_v2, undersize_pkts) + }, + { "tx_oversize_packets", 2, + offsetof(struct port_counters_v2, oversize_pkts) + }, + { "tx_fragments", 2, offsetof(struct port_counters_v2, fragments) }, + { "tx_jabbers_not_truncated", 2, + offsetof(struct port_counters_v2, jabbers_not_truncated) + }, + { "tx_jabbers_truncated", 2, + offsetof(struct port_counters_v2, jabbers_truncated) + }, + { "tx_size_64_packets", 2, + offsetof(struct port_counters_v2, pkts_64_octets) + }, + { "tx_size_65_to_127_packets", 2, + offsetof(struct port_counters_v2, pkts_65_to_127_octets) + }, + { "tx_size_128_to_255_packets", 2, + offsetof(struct port_counters_v2, pkts_128_to_255_octets) + }, + { "tx_size_256_to_511_packets", 2, + offsetof(struct port_counters_v2, pkts_256_to_511_octets) + }, + { "tx_size_512_to_1023_packets", 2, + offsetof(struct port_counters_v2, pkts_512_to_1023_octets) + }, + { "tx_size_1024_to_1518_packets", 2, + offsetof(struct port_counters_v2, pkts_1024_to_1518_octets) + }, + { "tx_size_1519_to_2047_packets", 2, + offsetof(struct 
port_counters_v2, pkts_1519_to_2047_octets) + }, + { "tx_size_2048_to_4095_packets", 2, + offsetof(struct port_counters_v2, pkts_2048_to_4095_octets) + }, + { "tx_size_4096_to_8191_packets", 2, + offsetof(struct port_counters_v2, pkts_4096_to_8191_octets) + }, + { "tx_size_8192_to_max_packets", 2, + offsetof(struct port_counters_v2, pkts_8192_to_max_octets) + }, + + /* FLM 0.17 */ + { "flm_count_current", 3, offsetof(struct flm_counters_v1, current) }, + { "flm_count_learn_done", 3, + offsetof(struct flm_counters_v1, learn_done) + }, + { "flm_count_learn_ignore", 3, + offsetof(struct flm_counters_v1, learn_ignore) + }, + { "flm_count_learn_fail", 3, + offsetof(struct flm_counters_v1, learn_fail) + }, + { "flm_count_unlearn_done", 3, + offsetof(struct flm_counters_v1, unlearn_done) + }, + { "flm_count_unlearn_ignore", 3, + offsetof(struct flm_counters_v1, unlearn_ignore) + }, + { "flm_count_auto_unlearn_done", 3, + offsetof(struct flm_counters_v1, auto_unlearn_done) + }, + { "flm_count_auto_unlearn_ignore", 3, + offsetof(struct flm_counters_v1, auto_unlearn_ignore) + }, + { "flm_count_auto_unlearn_fail", 3, + offsetof(struct flm_counters_v1, auto_unlearn_fail) + }, + { "flm_count_timeout_unlearn_done", 3, + offsetof(struct flm_counters_v1, timeout_unlearn_done) + }, + { "flm_count_rel_done", 3, offsetof(struct flm_counters_v1, rel_done) }, + { "flm_count_rel_ignore", 3, + offsetof(struct flm_counters_v1, rel_ignore) + }, + { "flm_count_prb_done", 3, offsetof(struct flm_counters_v1, prb_done) }, + { "flm_count_prb_ignore", 3, + offsetof(struct flm_counters_v1, prb_ignore) + }, + + /* FLM 0.20 */ + { "flm_count_sta_done", 3, offsetof(struct flm_counters_v1, sta_done) }, + { "flm_count_inf_done", 3, offsetof(struct flm_counters_v1, inf_done) }, + { "flm_count_inf_skip", 3, offsetof(struct flm_counters_v1, inf_skip) }, + { "flm_count_pck_hit", 3, offsetof(struct flm_counters_v1, pck_hit) }, + { "flm_count_pck_miss", 3, offsetof(struct flm_counters_v1, pck_miss) }, + { "flm_count_pck_unh", 3, offsetof(struct flm_counters_v1, pck_unh) }, + { "flm_count_pck_dis", 3, offsetof(struct flm_counters_v1, pck_dis) }, + { "flm_count_csh_hit", 3, offsetof(struct flm_counters_v1, csh_hit) }, + { "flm_count_csh_miss", 3, offsetof(struct flm_counters_v1, csh_miss) }, + { "flm_count_csh_unh", 3, offsetof(struct flm_counters_v1, csh_unh) }, + { "flm_count_cuc_start", 3, + offsetof(struct flm_counters_v1, cuc_start) + }, + { "flm_count_cuc_move", 3, offsetof(struct flm_counters_v1, cuc_move) }, +}; + +#define NTHW_CAP_XSTATS_NAMES_V1 RTE_DIM(nthw_cap_xstats_names_v1) +#define NTHW_CAP_XSTATS_NAMES_V2 RTE_DIM(nthw_cap_xstats_names_v2) + +/* + * Container for the reset values + */ +#define NTHW_XSTATS_SIZE ((NTHW_VIRT_XSTATS_NAMES < NTHW_CAP_XSTATS_NAMES_V2) ? 
+
+uint64_t nthw_xstats_reset_val[NUM_ADAPTER_PORTS_MAX][NTHW_XSTATS_SIZE] = { 0 };
+
+
+/*
+ * These functions must only be called with stat mutex locked
+ */
+int nthw_xstats_get(nt4ga_stat_t *p_nt4ga_stat, struct rte_eth_xstat *stats,
+		    unsigned int n, bool is_vswitch, uint8_t port)
+{
+	unsigned int i;
+	uint8_t *flm_ptr;
+	uint8_t *rx_ptr;
+	uint8_t *tx_ptr;
+	uint32_t nb_names;
+	struct rte_nthw_xstats_names_s *names;
+
+	if (is_vswitch) {
+		flm_ptr = NULL;
+		rx_ptr = (uint8_t *)&p_nt4ga_stat->virt.mp_stat_structs_port_rx[port];
+		tx_ptr = (uint8_t *)&p_nt4ga_stat->virt.mp_stat_structs_port_tx[port];
+		names = nthw_virt_xstats_names;
+		nb_names = NTHW_VIRT_XSTATS_NAMES;
+	} else {
+		flm_ptr = (uint8_t *)p_nt4ga_stat->mp_stat_structs_flm;
+		rx_ptr = (uint8_t *)&p_nt4ga_stat->cap.mp_stat_structs_port_rx[port];
+		tx_ptr = (uint8_t *)&p_nt4ga_stat->cap.mp_stat_structs_port_tx[port];
+		if (p_nt4ga_stat->flm_stat_ver < 18) {
+			names = nthw_cap_xstats_names_v1;
+			nb_names = NTHW_CAP_XSTATS_NAMES_V1;
+		} else {
+			names = nthw_cap_xstats_names_v2;
+			nb_names = NTHW_CAP_XSTATS_NAMES_V2;
+		}
+	}
+
+	for (i = 0; i < n && i < nb_names; i++) {
+		stats[i].id = i;
+		switch (names[i].source) {
+		case 1:
+			/* RX stat */
+			stats[i].value =
+				*((uint64_t *)&rx_ptr[names[i].offset]) -
+				nthw_xstats_reset_val[port][i];
+			break;
+		case 2:
+			/* TX stat */
+			stats[i].value =
+				*((uint64_t *)&tx_ptr[names[i].offset]) -
+				nthw_xstats_reset_val[port][i];
+			break;
+		case 3:
+			/* FLM stat */
+			if (flm_ptr) {
+				stats[i].value =
+					*((uint64_t *)&flm_ptr[names[i].offset]) -
+					nthw_xstats_reset_val[0][i];
+			} else {
+				stats[i].value = 0;
+			}
+			break;
+		default:
+			stats[i].value = 0;
+			break;
+		}
+	}
+
+	return i;
+}
+
+int nthw_xstats_get_by_id(nt4ga_stat_t *p_nt4ga_stat, const uint64_t *ids,
+			  uint64_t *values, unsigned int n, bool is_vswitch,
+			  uint8_t port)
+{
+	unsigned int i;
+	uint8_t *flm_ptr;
+	uint8_t *rx_ptr;
+	uint8_t *tx_ptr;
+	uint32_t nb_names;
+	struct rte_nthw_xstats_names_s *names;
+	int count = 0;
+
+	if (is_vswitch) {
+		flm_ptr = NULL;
+		rx_ptr = (uint8_t *)&p_nt4ga_stat->virt.mp_stat_structs_port_rx[port];
+		tx_ptr = (uint8_t *)&p_nt4ga_stat->virt.mp_stat_structs_port_tx[port];
+		names = nthw_virt_xstats_names;
+		nb_names = NTHW_VIRT_XSTATS_NAMES;
+	} else {
+		flm_ptr = (uint8_t *)p_nt4ga_stat->mp_stat_structs_flm;
+		rx_ptr = (uint8_t *)&p_nt4ga_stat->cap.mp_stat_structs_port_rx[port];
+		tx_ptr = (uint8_t *)&p_nt4ga_stat->cap.mp_stat_structs_port_tx[port];
+		if (p_nt4ga_stat->flm_stat_ver < 18) {
+			names = nthw_cap_xstats_names_v1;
+			nb_names = NTHW_CAP_XSTATS_NAMES_V1;
+		} else {
+			names = nthw_cap_xstats_names_v2;
+			nb_names = NTHW_CAP_XSTATS_NAMES_V2;
+		}
+	}
+
+	for (i = 0; i < n; i++) {
+		if (ids[i] < nb_names) {
+			switch (names[ids[i]].source) {
+			case 1:
+				/* RX stat */
+				values[i] =
+					*((uint64_t *)&rx_ptr[names[ids[i]].offset]) -
+					nthw_xstats_reset_val[port][ids[i]];
+				break;
+			case 2:
+				/* TX stat */
+				values[i] =
+					*((uint64_t *)&tx_ptr[names[ids[i]].offset]) -
+					nthw_xstats_reset_val[port][ids[i]];
+				break;
+			case 3:
+				/* FLM stat */
+				if (flm_ptr) {
+					values[i] =
+						*((uint64_t *)&flm_ptr[names[ids[i]].offset]) -
+						nthw_xstats_reset_val[0][ids[i]];
+				} else {
+					values[i] = 0;
+				}
+				break;
+			default:
+				values[i] = 0;
+				break;
+			}
+			count++;
+		}
+	}
+
+	return count;
+}
+
+void nthw_xstats_reset(nt4ga_stat_t *p_nt4ga_stat, bool is_vswitch, uint8_t port)
+{
+	unsigned int i;
+	uint8_t *flm_ptr;
+	uint8_t *rx_ptr;
+	uint8_t *tx_ptr;
+	uint32_t nb_names;
+	struct rte_nthw_xstats_names_s *names;
+
+	if (is_vswitch) {
+		flm_ptr = NULL;
+		rx_ptr = (uint8_t *)&p_nt4ga_stat->virt.mp_stat_structs_port_rx[port];
+		tx_ptr = (uint8_t *)&p_nt4ga_stat->virt.mp_stat_structs_port_tx[port];
+		names = nthw_virt_xstats_names;
+		nb_names = NTHW_VIRT_XSTATS_NAMES;
+	} else {
+		flm_ptr = (uint8_t *)p_nt4ga_stat->mp_stat_structs_flm;
+		rx_ptr = (uint8_t *)&p_nt4ga_stat->cap.mp_stat_structs_port_rx[port];
+		tx_ptr = (uint8_t *)&p_nt4ga_stat->cap.mp_stat_structs_port_tx[port];
+		if (p_nt4ga_stat->flm_stat_ver < 18) {
+			names = nthw_cap_xstats_names_v1;
+			nb_names = NTHW_CAP_XSTATS_NAMES_V1;
+		} else {
+			names = nthw_cap_xstats_names_v2;
+			nb_names = NTHW_CAP_XSTATS_NAMES_V2;
+		}
+	}
+
+	for (i = 0; i < nb_names; i++) {
+		switch (names[i].source) {
+		case 1:
+			/* RX stat */
+			nthw_xstats_reset_val[port][i] =
+				*((uint64_t *)&rx_ptr[names[i].offset]);
+			break;
+		case 2:
+			/* TX stat */
+			nthw_xstats_reset_val[port][i] =
+				*((uint64_t *)&tx_ptr[names[i].offset]);
+			break;
+		case 3:
+			/*
+			 * FLM stat
+			 * Reset makes no sense for flm_count_current
+			 */
+			if (flm_ptr && strcmp(names[i].name, "flm_count_current") != 0) {
+				nthw_xstats_reset_val[0][i] =
+					*((uint64_t *)&flm_ptr[names[i].offset]);
+			}
+			break;
+		default:
+			break;
+		}
+	}
+}
+
+/*
+ * These functions do not require the stat mutex to be locked
+ */
+int nthw_xstats_get_names(nt4ga_stat_t *p_nt4ga_stat,
+			  struct rte_eth_xstat_name *xstats_names,
+			  unsigned int size, bool is_vswitch)
+{
+	int count = 0;
+	unsigned int i;
+	uint32_t nb_names;
+	struct rte_nthw_xstats_names_s *names;
+
+	if (is_vswitch) {
+		names = nthw_virt_xstats_names;
+		nb_names = NTHW_VIRT_XSTATS_NAMES;
+	} else {
+		if (p_nt4ga_stat->flm_stat_ver < 18) {
+			names = nthw_cap_xstats_names_v1;
+			nb_names = NTHW_CAP_XSTATS_NAMES_V1;
+		} else {
+			names = nthw_cap_xstats_names_v2;
+			nb_names = NTHW_CAP_XSTATS_NAMES_V2;
+		}
+	}
+
+	if (!xstats_names)
+		return nb_names;
+
+	for (i = 0; i < size && i < nb_names; i++) {
+		strlcpy(xstats_names[i].name, names[i].name,
+			sizeof(xstats_names[i].name));
+		count++;
+	}
+
+	return count;
+}
+
+int nthw_xstats_get_names_by_id(nt4ga_stat_t *p_nt4ga_stat,
+				struct rte_eth_xstat_name *xstats_names,
+				const uint64_t *ids, unsigned int size,
+				bool is_vswitch)
+{
+	int count = 0;
+	unsigned int i;
+
+	uint32_t nb_names;
+	struct rte_nthw_xstats_names_s *names;
+
+	if (is_vswitch) {
+		names = nthw_virt_xstats_names;
+		nb_names = NTHW_VIRT_XSTATS_NAMES;
+	} else {
+		if (p_nt4ga_stat->flm_stat_ver < 18) {
+			names = nthw_cap_xstats_names_v1;
+			nb_names = NTHW_CAP_XSTATS_NAMES_V1;
+		} else {
+			names = nthw_cap_xstats_names_v2;
+			nb_names = NTHW_CAP_XSTATS_NAMES_V2;
+		}
+	}
+
+	if (!xstats_names)
+		return nb_names;
+
+	for (i = 0; i < size; i++) {
+		if (ids[i] < nb_names) {
+			strlcpy(xstats_names[i].name, names[ids[i]].name,
+				RTE_ETH_XSTATS_NAME_SIZE);
+		}
+		count++;
+	}
+
+	return count;
+}
diff --git a/drivers/net/ntnic/ntnic_xstats.h b/drivers/net/ntnic/ntnic_xstats.h
new file mode 100644
index 0000000000..0a82a1a677
--- /dev/null
+++ b/drivers/net/ntnic/ntnic_xstats.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2023 Napatech A/S
+ */
+
+#ifndef NTNIC_XSTATS_H_
+#define NTNIC_XSTATS_H_
+
+int nthw_xstats_get_names(nt4ga_stat_t *p_nt4ga_stat,
+			  struct rte_eth_xstat_name *xstats_names,
+			  unsigned int size, bool is_vswitch);
+int nthw_xstats_get(nt4ga_stat_t *p_nt4ga_stat, struct rte_eth_xstat *stats,
+		    unsigned int n, bool is_vswitch, uint8_t port);
+void nthw_xstats_reset(nt4ga_stat_t *p_nt4ga_stat, bool is_vswitch, uint8_t port);
+int nthw_xstats_get_names_by_id(nt4ga_stat_t *p_nt4ga_stat,
+				struct rte_eth_xstat_name *xstats_names,
+				const uint64_t *ids, unsigned int size,
+				bool is_vswitch);
+int nthw_xstats_get_by_id(nt4ga_stat_t *p_nt4ga_stat, const uint64_t *ids,
+			  uint64_t *values, unsigned int n, bool is_vswitch,
+			  uint8_t port);
+
+#endif /* NTNIC_XSTATS_H_ */
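
For reference only (not part of the diff above): once the PMD wires these helpers into its xstats callbacks in the ethdev part of this patch, applications read the counters defined in the nthw_*_xstats_names tables through the generic ethdev xstats API. The sketch below is an application-side example under that assumption; the helper name print_ntnic_xstats() and the use of port_id 0 are illustrative only, not part of the driver.

#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
#include <rte_ethdev.h>

/* Illustrative helper: dump all extended statistics of one port. */
static void print_ntnic_xstats(uint16_t port_id)
{
	int nb, i;
	struct rte_eth_xstat_name *names = NULL;
	struct rte_eth_xstat *xstats = NULL;

	/* A NULL/0 query returns the number of counters the PMD exposes. */
	nb = rte_eth_xstats_get_names(port_id, NULL, 0);
	if (nb <= 0)
		return;

	names = calloc(nb, sizeof(*names));
	xstats = calloc(nb, sizeof(*xstats));
	if (!names || !xstats)
		goto out;

	if (rte_eth_xstats_get_names(port_id, names, nb) < 0 ||
	    rte_eth_xstats_get(port_id, xstats, nb) < 0)
		goto out;

	/* Counter names come from the v1/v2 capability tables (or the virt table). */
	for (i = 0; i < nb; i++)
		printf("%s: %" PRIu64 "\n",
		       names[xstats[i].id].name, xstats[i].value);

out:
	free(names);
	free(xstats);
}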