[v2,3/4] doc: give specific instructions for running as non-root

Message ID: 20220617112508.3823291-4-dkozlyuk@nvidia.com
State: Superseded, archived
Delegated to: Thomas Monjalon
Series: Improve documentation for running as non-root

Checks

ci/checkpatch: warning (coding style issues)

Commit Message

Dmitry Kozlyuk June 17, 2022, 11:25 a.m. UTC
The guide to run DPDK applications as non-root in Linux
did not provide specific instructions to configure the required access
and did not explain why each bit is needed.
The latter is important because running as non-root
is one of the ways to tighten security and grant minimal permissions.

Cc: stable@dpdk.org

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
---
 doc/guides/linux_gsg/enable_func.rst          | 67 +++++++++++++++++--
 .../prog_guide/env_abstraction_layer.rst      |  2 +
 2 files changed, 63 insertions(+), 6 deletions(-)
  

Comments

Bruce Richardson June 17, 2022, 4:38 p.m. UTC | #1
On Fri, Jun 17, 2022 at 02:25:07PM +0300, Dmitry Kozlyuk wrote:
> The guide to run DPDK applications as non-root in Linux
> did not provide specific instructions to configure the required access
> and did not explain why each bit is needed.
> The latter is important because running as non-root
> is one of the ways to tighten security and grant minimal permissions.
> 
> Cc: stable@dpdk.org
> 
> Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>

Thanks for this, some good changes here. Comments inline below.

/Bruce

> ---
>  doc/guides/linux_gsg/enable_func.rst          | 67 +++++++++++++++++--
>  .../prog_guide/env_abstraction_layer.rst      |  2 +
>  2 files changed, 63 insertions(+), 6 deletions(-)
> 
> diff --git a/doc/guides/linux_gsg/enable_func.rst b/doc/guides/linux_gsg/enable_func.rst
> index 1df3ab0255..2f908e8b70 100644
> --- a/doc/guides/linux_gsg/enable_func.rst
> +++ b/doc/guides/linux_gsg/enable_func.rst
> @@ -13,13 +13,58 @@ Enabling Additional Functionality
>  Running DPDK Applications Without Root Privileges
>  -------------------------------------------------
>  
> -In order to run DPDK as non-root, the following Linux filesystem objects'
> -permissions should be adjusted to ensure that the Linux account being used to
> -run the DPDK application has access to them:
> +The following sections describe generic requirements and configuration
> +for running DPDK applications as non-root.
> +There may be additional requirements documented for some drivers.
>  
> -*   All directories which serve as hugepage mount points, for example, ``/dev/hugepages``
> +Hugepages
> +~~~~~~~~~
>  
> -*   If the HPET is to be used,  ``/dev/hpet``
> +Hugepages must be reserved as root before runing the application as non-root,
> +for example::
> +
> +  sudo dpdk-hugepages.py --reserve 1G
> +
> +If multi-process is not required, running with ``--in-memory``
> +bypasses the need to access hugepage mount point and files within it.
> +Otherwise, hugepage directory must be made accessible
> +for writing to the unprivileged user.
> +A good way for managing multiple applications using hugepages
> +is to mount the filesystem with group permissions
> +and add a supplementary group to each application or container.
> +
> +One option is to use the script provided by this project::
> +
> +  export HUGEDIR=$HOME/huge-1G
> +  mkdir -p $HUGEDIR
> +  sudo dpdk-hugepages.py --mount --directory $HUGEDIR --owner `id -u`:`id -g`
> +
> +In production environment, the OS can manage mount points
> +(`systemd example <https://github.com/systemd/systemd/blob/main/units/dev-hugepages.mount>`_).
> +
> +The ``hugetlb`` filesystem has additional options to guarantee or limit
> +the amount of memory that is possible to allocate using the mount point.
> +Refer to the `documentation <https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt>`_.
> +
> +If the driver requires using physical addresses (PA),
> +the executable file must be granted additional capabilities:
> +
> +* ``SYS_ADMIN`` to read ``/proc/self/pagemaps``
> +* ``IPC_LOCK`` to lock hugepages in memory

Are either of these necessary if using vfio-pci and VA mode? I have seen it
previously reported that IPC_LOCK is necessary for IOMMU memory mapping for
DMA - at least for docker containers - so I'd like it confirmed that we
don't need them in the in-memory case running on the host. If I get the
chance I'll try double-checking by testing myself.
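One way to test this empirically is to check which capabilities a running process actually holds. The following is a minimal POSIX shell sketch. It assumes a Linux ``/proc`` filesystem, the helper names are hypothetical, and the bit numbers are the ``CAP_*`` values from ``linux/capability.h``. It decodes the ``CapEff`` mask from ``/proc/<pid>/status``:

```shell
#!/bin/sh
# Capability bit numbers from linux/capability.h.
CAP_IPC_LOCK=14
CAP_SYS_RAWIO=17
CAP_SYS_ADMIN=21

# has_cap MASK BIT: succeeds if BIT is set in the hexadecimal MASK
# (MASK as printed in /proc/<pid>/status, without a 0x prefix).
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# effective_caps PID: prints the CapEff mask of a running process.
effective_caps() {
    awk '/^CapEff:/ { print $2 }' "/proc/$1/status"
}

# report_caps PID: prints, for each capability discussed here,
# whether it is in the process's effective set.
report_caps() {
    mask=$(effective_caps "$1") || return 1
    [ -n "$mask" ] || return 1
    for pair in IPC_LOCK:$CAP_IPC_LOCK SYS_ADMIN:$CAP_SYS_ADMIN SYS_RAWIO:$CAP_SYS_RAWIO; do
        name=${pair%%:*}
        bit=${pair#*:}
        if has_cap "$mask" "$bit"; then
            echo "CAP_$name: yes"
        else
            echo "CAP_$name: no"
        fi
    done
}
```

For example, run ``report_caps "$(pgrep -n dpdk-testpmd)"`` while the application is running, once with and once without the ``setcap`` step. The application name is only illustrative.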

> +
> +.. code-block:: console
> +
> +   setcap cap_ipc_lock,cap_sys_admin+ep <executable>
> +
> +If physical addresses are not accessible,
> +the following message will appear during EAL initialization::
> +
> +  EAL: rte_mem_virt2phy(): cannot open /proc/self/pagemap: Permission denied
> +
> +It is harmless in case PA are not needed.
> +

While this is probably worth having in the doc, I think we should really
include a note here about using vfio-pci rather than uio and therefore not
needing physical addresses.
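Whether physical addresses can be avoided this way depends on the host having an IOMMU enabled. The following is a minimal shell sketch for that check. It assumes the standard ``/sys/kernel/iommu_groups`` directory, and the helper names are hypothetical. When groups exist, EAL can typically be run with ``--iova-mode=va`` on vfio-pci:

```shell
#!/bin/sh
# iommu_groups_count [DIR]: prints the number of IOMMU groups in DIR
# (default /sys/kernel/iommu_groups); prints 0 if DIR does not exist.
iommu_groups_count() {
    dir=${1:-/sys/kernel/iommu_groups}
    n=0
    if [ -d "$dir" ]; then
        for entry in "$dir"/*; do
            # An unmatched glob stays literal, so skip non-existent entries.
            if [ -e "$entry" ]; then
                n=$((n + 1))
            fi
        done
    fi
    echo "$n"
}

# iommu_available [DIR]: succeeds if at least one IOMMU group exists.
iommu_available() {
    [ "$(iommu_groups_count "${1:-}")" -gt 0 ]
}
```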

> +Resource Limits
> +~~~~~~~~~~~~~~~
>  
>  When running as non-root user, there may be some additional resource limits
>  that are imposed by the system. Specifically, the following resource limits may
> @@ -34,7 +79,15 @@ need to be adjusted in order to ensure normal DPDK operation:
>  The above limits can usually be adjusted by editing
>  ``/etc/security/limits.conf`` file, and rebooting.
>  
> -Additionally, depending on which kernel driver is in use, the relevant
> +See `Hugepage Mapping <hugepage_mapping>`_
> +secton to learn how these limits affect EAL.

Typo: s/secton/section/
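You can compare the limits that actually apply to the account running the application, before and after editing ``/etc/security/limits.conf``. The open-file and locked-memory limits are one example. The following read-only shell sketch does this. The helper names are hypothetical, and the minimum to compare against is deployment-specific:

```shell
#!/bin/sh
# limit_at_least VALUE MIN: succeeds if VALUE is "unlimited" or >= MIN.
limit_at_least() {
    [ "$1" = unlimited ] || [ "$1" -ge "$2" ]
}

# show_limits: prints the soft limits of the current shell for
# open files and locked memory (the latter in KiB).
show_limits() {
    echo "open files: $(ulimit -n)"
    echo "locked memory (KiB): $(ulimit -l)"
}
```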

> +
> +Device Control
> +~~~~~~~~~~~~~~
> +
> +If the HPET is to be used, ``/dev/hpet`` permissions must be adjusted.
> +

Given that HPET has been off by default for years, I think we can probably
remove this line. Anyone still using it likely already knows this.
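Whichever paths end up listed, the same check applies to each of them: the hugepage mount point, ``/dev/hpet`` if used, and the driver's device files. The following is a minimal shell sketch; the helper name is hypothetical. It relies on the shell's ``-r``, ``-w`` and ``-x`` tests, so it reflects the permissions of the calling user. Running it as root will report access almost everywhere:

```shell
#!/bin/sh
# check_rw PATH...: prints "<path>: ok", "<path>: no access" or "<path>: missing"
# for each argument, and fails if any path is not usable.
# Directories (e.g. a hugepage mount point) also need search (x) permission.
check_rw() {
    rc=0
    for path in "$@"; do
        if [ ! -e "$path" ]; then
            echo "$path: missing"
            rc=1
        elif [ -r "$path" ] && [ -w "$path" ] && { [ ! -d "$path" ] || [ -x "$path" ]; }; then
            echo "$path: ok"
        else
            echo "$path: no access"
            rc=1
        fi
    done
    return "$rc"
}
```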

> +Depending on which kernel driver is in use, the relevant
>  resources also should be accessible by the user running the DPDK application.
>  
>  For ``vfio-pci`` kernel driver, the following Linux file system objects'
> @@ -64,6 +117,8 @@ system objects' permissions should be adjusted:
>         /sys/class/uio/uio0/device/config
>         /sys/class/uio/uio0/device/resource*
>  

I think our minimum supported kernel version is now >4.0 so I believe this
uio section should be removed as it's only applicable for earlier kernel
versions.
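With vfio-pci, the device-specific object the unprivileged user needs access to is the VFIO group node. The following shell sketch assumes the usual sysfs layout, where ``/sys/bus/pci/devices/<BDF>/iommu_group`` is a symlink to the group directory. The helper name is hypothetical:

```shell
#!/bin/sh
# vfio_group_node BDF [SYSFS_ROOT]: prints the /dev/vfio/<group> node
# for a PCI device, based on its iommu_group symlink in sysfs.
vfio_group_node() {
    link="${2:-/sys}/bus/pci/devices/$1/iommu_group"
    if [ ! -L "$link" ]; then
        echo "no IOMMU group for $1" >&2
        return 1
    fi
    # The symlink target ends with the group number, e.g. .../iommu_groups/15
    echo "/dev/vfio/$(basename "$(readlink "$link")")"
}
```

For example, run ``sudo chown "$(id -u)" "$(vfio_group_node 0000:01:00.0)"``, where the PCI address is only a placeholder. Ownership set this way is lost when the device node is recreated, so a udev rule is usually the more durable choice.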

> +For ``virtio`` PMD in legacy mode, ``SYS_RAWIO`` capability is required
> +for ``iopl()`` call to enable access to PCI IO ports.
>  

How "legacy" is legacy-mode? Is it still likely in widespread use that we
need this?
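Readers can check whether their device offers the legacy interface at all by looking at its PCI IDs. The following shell sketch is based on the ID ranges in the virtio 1.x specification (section 4.1.2, PCI Device Discovery). The helper names are hypothetical. A ``legacy-or-transitional`` result only means the legacy interface is available, not that the PMD will use it:

```shell
#!/bin/sh
# virtio_kind VENDOR DEVICE: classifies PCI IDs (e.g. "0x1af4" "0x1000")
# using the virtio 1.x ranges: 0x1000-0x103f legacy or transitional,
# 0x1040-0x107f modern-only.
virtio_kind() {
    vendor=$(( $1 ))
    device=$(( $2 ))
    if [ "$vendor" -ne $(( 0x1af4 )) ]; then
        echo not-virtio
    elif [ "$device" -ge $(( 0x1000 )) ] && [ "$device" -le $(( 0x103f )) ]; then
        echo legacy-or-transitional
    elif [ "$device" -ge $(( 0x1040 )) ] && [ "$device" -le $(( 0x107f )) ]; then
        echo modern
    else
        echo not-virtio
    fi
}

# virtio_kind_of BDF [SYSFS_ROOT]: reads the IDs from sysfs and classifies them.
virtio_kind_of() {
    dev="${2:-/sys}/bus/pci/devices/$1"
    virtio_kind "$(cat "$dev/vendor")" "$(cat "$dev/device")"
}
```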

>  Power Management and Power Saving Functionality
>  -----------------------------------------------
> diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
> index 5f0748fba1..70fa099d30 100644
> --- a/doc/guides/prog_guide/env_abstraction_layer.rst
> +++ b/doc/guides/prog_guide/env_abstraction_layer.rst
> @@ -228,6 +228,8 @@ Normally, these options do not need to be changed.
>      can later be mapped into that preallocated VA space (if dynamic memory mode
>      is enabled), and can optionally be mapped into it at startup.
>  
> +.. _hugepage_mapping:
> +
>  Hugepage Mapping
>  ^^^^^^^^^^^^^^^^
>  
> -- 
> 2.25.1
>
  
Dmitry Kozlyuk June 20, 2022, 6:10 a.m. UTC | #2
> From: Bruce Richardson <bruce.richardson@intel.com>
> Sent: Friday, June 17, 2022 7:38 PM
> > [...]
> > +If the driver requires using physical addresses (PA),
> > +the executable file must be granted additional capabilities:
> > +
> > +* ``SYS_ADMIN`` to read ``/proc/self/pagemaps``
> > +* ``IPC_LOCK`` to lock hugepages in memory
> 
> Are either of these necessary if using vfio-pci and VA mode? I have
> seen it previously reported that IPC_LOCK is necessary for IOMMU
> memory mapping for DMA - at least for docker containers - so I'd
> like it confirmed that we don't need them in the in-memory case
> running on the host. If I get the chance I'll try double-checking
> by testing myself.

Sorry, I don't have a physical device using vfio-pci to check.
MLX5 that I have tested doesn't need these capabilities,
but it locks memory from the kernel side.
Note that --in-memory doesn't imply --iova-mode=va.
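Independently of ``--in-memory`` and ``--iova-mode``, root still has to reserve hugepages before the unprivileged run. The following minimal shell sketch checks that from the unprivileged account; the helper name is hypothetical. Note that ``/proc/meminfo`` only reports the default hugepage size, while per-size counters are under ``/sys/kernel/mm/hugepages/``:

```shell
#!/bin/sh
# free_hugepage_kib [FILE]: prints HugePages_Free * Hugepagesize (in KiB)
# from a meminfo-style file (default /proc/meminfo); prints 0 if either
# field is missing.
free_hugepage_kib() {
    awk '/^HugePages_Free:/ { free = $2 }
         /^Hugepagesize:/   { size = $2 }
         END                { print free * size }' "${1:-/proc/meminfo}"
}
```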

> 
> > +
> > +.. code-block:: console
> > +
> > +   setcap cap_ipc_lock,cap_sys_admin+ep <executable>
> > +
> > +If physical addresses are not accessible,
> > +the following message will appear during EAL initialization::
> > +
> > +  EAL: rte_mem_virt2phy(): cannot open /proc/self/pagemap: Permission denied
> > +
> > +It is harmless in case PA are not needed.
> > +
> 
> While this is probably worth having in the doc, I think we should really
> include a note here about using vfio-pci rather than uio and therefore not
> needing physical addresses.

A note won't harm. There are also non-PCI devices, though.

> > +For ``virtio`` PMD in legacy mode, ``SYS_RAWIO`` capability is required
> > +for ``iopl()`` call to enable access to PCI IO ports.
> >
> 
> How "legacy" is legacy-mode? Is it still likely in widespread use that
> we need this?

I don't really know.
The spec says that legacy support is optional
(2.2.3 Legacy Interface: A Note on Feature Bits) and it aims
to reduce the chance of a legacy driver attempting to drive the device
(4.1.2.1 Device Requirements: PCI Device Discovery).
OTOH, DPDK supports it and requirements must be documented.
I can add a line suggesting to use modern virtio,
but also don't mind removing this.

I'll address skipped comments in v3, thanks.
  
Bruce Richardson June 20, 2022, 8:37 a.m. UTC | #3
On Mon, Jun 20, 2022 at 06:10:37AM +0000, Dmitry Kozlyuk wrote:
> > From: Bruce Richardson <bruce.richardson@intel.com>
> > Sent: Friday, June 17, 2022 7:38 PM
> > > [...]
> > > +If the driver requires using physical addresses (PA),
> > > +the executable file must be granted additional capabilities:
> > > +
> > > +* ``SYS_ADMIN`` to read ``/proc/self/pagemaps``
> > > +* ``IPC_LOCK`` to lock hugepages in memory
> > 
> > Are either of these necessary if using vfio-pci and VA mode? I have
> > seen it previously reported that IPC_LOCK is necessary for IOMMU
> > memory mapping for DMA - at least for docker containers - so I'd
> > like it confirmed that we don't need them in the in-memory case
> > running on the host. If I get the chance I'll try double-checking
> > by testing myself.
> 
> Sorry, I don't have a physical device using vfio-pci to check.
> MLX5 that I have tested doesn't need these capabilities,
> but it locks memory from the kernel side.
> Note that --in-memory doesn't imply --iova-mode=va.
> 
> > 
> > > +
> > > +.. code-block:: console
> > > +
> > > +   setcap cap_ipc_lock,cap_sys_admin+ep <executable>
> > > +
> > > +If physical addresses are not accessible,
> > > +the following message will appear during EAL initialization::
> > > +
> > > +  EAL: rte_mem_virt2phy(): cannot open /proc/self/pagemap: Permission denied
> > > +
> > > +It is harmless in case PA are not needed.
> > > +
> > 
> > While this is probably worth having in the doc, I think we should really
> > include a note here about using vfio-pci rather than uio and therefore not
> > needing physical addresses.
> 
> A note won't harm. There are also non-PCI devices, though.
> 
> > > +For ``virtio`` PMD in legacy mode, ``SYS_RAWIO`` capability is required
> > > +for ``iopl()`` call to enable access to PCI IO ports.
> > >
> > 
> > How "legacy" is legacy-mode? Is it still likely in widespread use that
> > we need this?
> 
> I don't really know.
> The spec says that legacy support is optional
> (2.2.3 Legacy Interface: A Note on Feature Bits) and it aims
> to reduce the chance of a legacy driver attempting to drive the device
> (4.1.2.1 Device Requirements: PCI Device Discovery).
> OTOH, DPDK supports it and requirements must be documented.
> I can add a line suggesting to use modern virtio,
> but also don't mind removing this.
>

I suppose the main question for this legacy virtio bit is where it should
be documented, more than if it should be. Given this is a GSG, we should
try and avoid getting too deep into driver-specific issues, so I think we
should omit legacy virtio here, but have it documented in the relevant
virtio-specific doc. Does that seem reasonable?
  
Dmitry Kozlyuk June 24, 2022, 8:49 a.m. UTC | #4
> From: Bruce Richardson <bruce.richardson@intel.com>
> Sent: Monday, June 20, 2022 11:38 AM
> [...]
> > > > +For ``virtio`` PMD in legacy mode, ``SYS_RAWIO`` capability is required
> > > > +for ``iopl()`` call to enable access to PCI IO ports.
> > > >
> > >
> > > How "legacy" is legacy-mode? Is it still likely in widespread use that
> > > we need this?
> >
> > I don't really know.
> > The spec says that legacy support is optional
> > (2.2.3 Legacy Interface: A Note on Feature Bits) and it aims
> > to reduce the chance of a legacy driver attempting to drive the device
> > (4.1.2.1 Device Requirements: PCI Device Discovery).
> > OTOH, DPDK supports it and requirements must be documented.
> > I can add a line suggesting to use modern virtio,
> > but also don't mind removing this.
> >
> 
> I suppose the main question for this legacy virtio bit
> is where it should be documented, more than if it should be.
> Given this is a GSG, we should try and avoid getting too deep
> into driver-specific issues, so I think we should omit legacy virtio here,
> but have it documented in the relevant virtio-specific doc.
> Does that seem reasonable?

Yes, moved to the virtio doc (it looks like it could use an update BTW).

I also chose to keep HPET line because there's an entire section on HPET
below in the document.
  

Patch

diff --git a/doc/guides/linux_gsg/enable_func.rst b/doc/guides/linux_gsg/enable_func.rst
index 1df3ab0255..2f908e8b70 100644
--- a/doc/guides/linux_gsg/enable_func.rst
+++ b/doc/guides/linux_gsg/enable_func.rst
@@ -13,13 +13,58 @@  Enabling Additional Functionality
 Running DPDK Applications Without Root Privileges
 -------------------------------------------------
 
-In order to run DPDK as non-root, the following Linux filesystem objects'
-permissions should be adjusted to ensure that the Linux account being used to
-run the DPDK application has access to them:
+The following sections describe generic requirements and configuration
+for running DPDK applications as non-root.
+Some drivers may have additional requirements, documented separately.
 
-*   All directories which serve as hugepage mount points, for example, ``/dev/hugepages``
+Hugepages
+~~~~~~~~~
 
-*   If the HPET is to be used,  ``/dev/hpet``
+Hugepages must be reserved as root before running the application as non-root,
+for example::
+
+  sudo dpdk-hugepages.py --reserve 1G
+
+If multi-process is not required, running with ``--in-memory``
+bypasses the need to access the hugepage mount point and the files within it.
+Otherwise, the hugepage directory must be made writable
+by the unprivileged user.
+A good way to manage multiple applications using hugepages
+is to mount the filesystem with group permissions
+and add a supplementary group to each application or container.
+
+One option is to use the ``dpdk-hugepages.py`` script provided by DPDK::
+
+  export HUGEDIR="$HOME/huge-1G"
+  mkdir -p "$HUGEDIR"
+  sudo dpdk-hugepages.py --mount --directory "$HUGEDIR" --owner "$(id -u):$(id -g)"
+
+In a production environment, the OS can manage mount points
+(`systemd example <https://github.com/systemd/systemd/blob/main/units/dev-hugepages.mount>`_).
+
+The ``hugetlbfs`` filesystem has additional options (``min_size`` and ``size``)
+to guarantee or limit the amount of memory that can be allocated through the mount point.
+Refer to the `documentation <https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt>`_.
+
+If the driver requires using physical addresses (PA),
+the executable file must be granted additional capabilities:
+
+* ``SYS_ADMIN`` to read ``/proc/self/pagemap``
+* ``IPC_LOCK`` to lock hugepages in memory
+
+.. code-block:: console
+
+   sudo setcap cap_ipc_lock,cap_sys_admin+ep <executable>
+
+If physical addresses are not accessible,
+the following message will appear during EAL initialization::
+
+  EAL: rte_mem_virt2phy(): cannot open /proc/self/pagemap: Permission denied
+
+This message is harmless if PA are not needed.
+
+Resource Limits
+~~~~~~~~~~~~~~~
 
 When running as non-root user, there may be some additional resource limits
 that are imposed by the system. Specifically, the following resource limits may
@@ -34,7 +79,15 @@  need to be adjusted in order to ensure normal DPDK operation:
 The above limits can usually be adjusted by editing
 ``/etc/security/limits.conf`` file, and rebooting.
 
-Additionally, depending on which kernel driver is in use, the relevant
+See the :ref:`Hugepage Mapping <hugepage_mapping>`
+section to learn how these limits affect EAL.
+
+Device Control
+~~~~~~~~~~~~~~
+
+If the HPET is to be used, ``/dev/hpet`` permissions must be adjusted.
+
+Depending on which kernel driver is in use, the relevant
 resources also should be accessible by the user running the DPDK application.
 
 For ``vfio-pci`` kernel driver, the following Linux file system objects'
@@ -64,6 +117,8 @@  system objects' permissions should be adjusted:
        /sys/class/uio/uio0/device/config
        /sys/class/uio/uio0/device/resource*
 
+For the ``virtio`` PMD in legacy mode, the ``SYS_RAWIO`` capability is required
+for the ``iopl()`` call that enables access to PCI I/O ports.
 
 Power Management and Power Saving Functionality
 -----------------------------------------------
diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index 5f0748fba1..70fa099d30 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -228,6 +228,8 @@  Normally, these options do not need to be changed.
     can later be mapped into that preallocated VA space (if dynamic memory mode
     is enabled), and can optionally be mapped into it at startup.
 
+.. _hugepage_mapping:
+
 Hugepage Mapping
 ^^^^^^^^^^^^^^^^
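The group-permission approach recommended in the Hugepages section above can also be implemented with plain ``mount``. The following shell sketch assumes a hypothetical group ``dpdk`` and mount point ``/mnt/huge-1G``. It uses only the ``pagesize``, ``gid``, ``mode`` and ``size`` options described in the kernel hugetlbfs documentation:

```shell
#!/bin/sh
# hugetlbfs_opts PAGESIZE GID [SIZE]: prints hugetlbfs mount options that make
# the mount point's root directory group-owned by GID with mode 0770,
# optionally capping the mount at SIZE.
hugetlbfs_opts() {
    opts="pagesize=$1,gid=$2,mode=0770"
    if [ -n "${3:-}" ]; then
        opts="$opts,size=$3"
    fi
    echo "$opts"
}

# Example use as root (group name and mount point are hypothetical):
#   groupadd dpdk
#   mkdir -p /mnt/huge-1G
#   mount -t hugetlbfs -o "$(hugetlbfs_opts 1G "$(getent group dpdk | cut -d: -f3)")" nodev /mnt/huge-1G
#   usermod -aG dpdk <user>
```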