[v2,3/4] doc: give specific instructions for running as non-root
Checks
Commit Message
The guide to run DPDK applications as non-root in Linux
did not provide specific instructions to configure the required access
and did not explain why each bit is needed.
The latter is important because running as non-root
is one of the ways to tighten security and grant minimal permissions.
Cc: stable@dpdk.org
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
---
doc/guides/linux_gsg/enable_func.rst | 67 +++++++++++++++++--
.../prog_guide/env_abstraction_layer.rst | 2 +
2 files changed, 63 insertions(+), 6 deletions(-)
Comments
On Fri, Jun 17, 2022 at 02:25:07PM +0300, Dmitry Kozlyuk wrote:
> The guide to run DPDK applications as non-root in Linux
> did not provide specific instructions to configure the required access
> and did not explain why each bit is needed.
> The latter is important because running as non-root
> is one of the ways to tighten security and grant minimal permissions.
>
> Cc: stable@dpdk.org
>
> Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Thanks for this, some good changes here. Comments inline below.
/Bruce
> ---
> doc/guides/linux_gsg/enable_func.rst | 67 +++++++++++++++++--
> .../prog_guide/env_abstraction_layer.rst | 2 +
> 2 files changed, 63 insertions(+), 6 deletions(-)
>
> diff --git a/doc/guides/linux_gsg/enable_func.rst b/doc/guides/linux_gsg/enable_func.rst
> index 1df3ab0255..2f908e8b70 100644
> --- a/doc/guides/linux_gsg/enable_func.rst
> +++ b/doc/guides/linux_gsg/enable_func.rst
> @@ -13,13 +13,58 @@ Enabling Additional Functionality
> Running DPDK Applications Without Root Privileges
> -------------------------------------------------
>
> -In order to run DPDK as non-root, the following Linux filesystem objects'
> -permissions should be adjusted to ensure that the Linux account being used to
> -run the DPDK application has access to them:
> +The following sections describe generic requirements and configuration
> +for running DPDK applications as non-root.
> +There may be additional requirements documented for some drivers.
>
> -* All directories which serve as hugepage mount points, for example, ``/dev/hugepages``
> +Hugepages
> +~~~~~~~~~
>
> -* If the HPET is to be used, ``/dev/hpet``
> +Hugepages must be reserved as root before runing the application as non-root,
> +for example::
> +
> + sudo dpdk-hugepages.py --reserve 1G
> +
> +If multi-process is not required, running with ``--in-memory``
> +bypasses the need to access hugepage mount point and files within it.
> +Otherwise, hugepage directory must be made accessible
> +for writing to the unprivileged user.
> +A good way for managing multiple applications using hugepages
> +is to mount the filesystem with group permissions
> +and add a supplementary group to each application or container.
> +
> +One option is to use the script provided by this project::
> +
> + export HUGEDIR=$HOME/huge-1G
> + mkdir -p $HUGEDIR
> + sudo dpdk-hugepages.py --mount --directory $HUGEDIR --owner `id -u`:`id -g`
> +
> +In production environment, the OS can manage mount points
> +(`systemd example <https://github.com/systemd/systemd/blob/main/units/dev-hugepages.mount>`_).
> +
> +The ``hugetlb`` filesystem has additional options to guarantee or limit
> +the amount of memory that is possible to allocate using the mount point.
> +Refer to the `documentation <https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt>`_.
> +
> +If the driver requires using physical addresses (PA),
> +the executable file must be granted additional capabilities:
> +
> +* ``SYS_ADMIN`` to read ``/proc/self/pagemaps``
> +* ``IPC_LOCK`` to lock hugepages in memory
Are either of these necessary if using vfio-pci and VA mode? I have seen it
previously reported that IPC_LOCK is necessary for IOMMU memory mapping for
DMA - at least for docker containers - so I'd like it confirmed that we
don't need them in the in-memory case running on the host. If I get the
chance I'll try double-checking by testing myself.
> +
> +.. code-block:: console
> +
> + setcap cap_ipc_lock,cap_sys_admin+ep <executable>
> +
> +If physical addresses are not accessible,
> +the following message will appear during EAL initialization::
> +
> + EAL: rte_mem_virt2phy(): cannot open /proc/self/pagemap: Permission denied
> +
> +It is harmless in case PA are not needed.
> +
While this is probably worth having in the doc, I think we should really
include a note here about using vfio-pci rather than uio and therefore not
needing physical addresses.
> +Resource Limits
> +~~~~~~~~~~~~~~~
>
> When running as non-root user, there may be some additional resource limits
> that are imposed by the system. Specifically, the following resource limits may
> @@ -34,7 +79,15 @@ need to be adjusted in order to ensure normal DPDK operation:
> The above limits can usually be adjusted by editing
> ``/etc/security/limits.conf`` file, and rebooting.
>
> -Additionally, depending on which kernel driver is in use, the relevant
> +See `Hugepage Mapping <hugepage_mapping>`_
> +secton to learn how these limits affect EAL.
Typo: s/secton/section/
> +
> +Device Control
> +~~~~~~~~~~~~~~
> +
> +If the HPET is to be used, ``/dev/hpet`` permissions must be adjusted.
> +
Given that HPET has been off by default for years, I think we can probably
remove this line. Anyone still using it likely already knows this.
> +Depending on which kernel driver is in use, the relevant
> resources also should be accessible by the user running the DPDK application.
>
> For ``vfio-pci`` kernel driver, the following Linux file system objects'
> @@ -64,6 +117,8 @@ system objects' permissions should be adjusted:
> /sys/class/uio/uio0/device/config
> /sys/class/uio/uio0/device/resource*
>
I think our minimum supported kernel version is now >4.0 so I believe this
uio section should be removed as it's only applicable for earlier kernel
versions.
> +For ``virtio`` PMD in legacy mode, ``SYS_RAWIO`` capability is required
> +for ``iopl()`` call to enable access to PCI IO ports.
>
How "legacy" is legacy-mode? Is it still likely in widespread use that we
need this?
> Power Management and Power Saving Functionality
> -----------------------------------------------
> diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
> index 5f0748fba1..70fa099d30 100644
> --- a/doc/guides/prog_guide/env_abstraction_layer.rst
> +++ b/doc/guides/prog_guide/env_abstraction_layer.rst
> @@ -228,6 +228,8 @@ Normally, these options do not need to be changed.
> can later be mapped into that preallocated VA space (if dynamic memory mode
> is enabled), and can optionally be mapped into it at startup.
>
> +.. _hugepage_mapping:
> +
> Hugepage Mapping
> ^^^^^^^^^^^^^^^^
>
> --
> 2.25.1
>
> From: Bruce Richardson <bruce.richardson@intel.com>
> Sent: Friday, June 17, 2022 7:38 PM
> > [...]
> > +If the driver requires using physical addresses (PA),
> > +the executable file must be granted additional capabilities:
> > +
> > +* ``SYS_ADMIN`` to read ``/proc/self/pagemaps``
> > +* ``IPC_LOCK`` to lock hugepages in memory
>
> Are either of these necessary if using vfio-pci and VA mode? I have
> seen it previously reported that IPC_LOCK is necessary for IOMMU
> memory mapping for DMA - at least for docker containers - so I'd
> like it confirmed that we don't need them in the in-memory case
> running on the host. If I get the chance I'll try double-checking
> by testing myself.
Sorry, I don't have a physical device using vfio-pci to check.
MLX5 that I have tested doesn't need these capabilities,
but it locks memory from the kernel side.
Note that --in-memory doesn't imply --iova-mode=va.
>
> > +
> > +.. code-block:: console
> > +
> > + setcap cap_ipc_lock,cap_sys_admin+ep <executable>
> > +
> > +If physical addresses are not accessible,
> > +the following message will appear during EAL initialization::
> > +
> > + EAL: rte_mem_virt2phy(): cannot open /proc/self/pagemap:
> Permission denied
> > +
> > +It is harmless in case PA are not needed.
> > +
>
> While this is probably worth having in the doc, I think we should
> really
> include a note here about using vfio-pci rather than uio and therefore
> not
> needing physical addresses.
A note won't harm. There are also non-PCI devices, though.
> > +For ``virtio`` PMD in legacy mode, ``SYS_RAWIO`` capability is
> required
> > +for ``iopl()`` call to enable access to PCI IO ports.
> >
>
> How "legacy" is legacy-mode? Is it still likely in widespread use that
> we need this?
I don't really know.
The spec says that legacy support is optional
(2.2.3 Legacy Interface: A Note on Feature Bits) and it aims
to reduce the chance of a legacy driver attempting to drive the device
(4.1.2.1 Device Requirements: PCI Device Discovery).
OTOH, DPDK supports it and requirements must be documented.
I can add a line suggesting to use modern virtio,
but also don't mind removing this.
I'll address skipped comments in v3, thanks.
On Mon, Jun 20, 2022 at 06:10:37AM +0000, Dmitry Kozlyuk wrote:
> > From: Bruce Richardson <bruce.richardson@intel.com>
> > Sent: Friday, June 17, 2022 7:38 PM
> > > [...]
> > > +If the driver requires using physical addresses (PA),
> > > +the executable file must be granted additional capabilities:
> > > +
> > > +* ``SYS_ADMIN`` to read ``/proc/self/pagemaps``
> > > +* ``IPC_LOCK`` to lock hugepages in memory
> >
> > Are either of these necessary if using vfio-pci and VA mode? I have
> > seen it previously reported that IPC_LOCK is necessary for IOMMU
> > memory mapping for DMA - at least for docker containers - so I'd
> > like it confirmed that we don't need them in the in-memory case
> > running on the host. If I get the chance I'll try double-checking
> > by testing myself.
>
> Sorry, I don't have a physical device using vfio-pci to check.
> MLX5 that I have tested doesn't need these capabilities,
> but it locks memory from the kernel side.
> Note that --in-memory doesn't imply --iova-mode=va.
>
> >
> > > +
> > > +.. code-block:: console
> > > +
> > > + setcap cap_ipc_lock,cap_sys_admin+ep <executable>
> > > +
> > > +If physical addresses are not accessible,
> > > +the following message will appear during EAL initialization::
> > > +
> > > + EAL: rte_mem_virt2phy(): cannot open /proc/self/pagemap:
> > Permission denied
> > > +
> > > +It is harmless in case PA are not needed.
> > > +
> >
> > While this is probably worth having in the doc, I think we should
> > really
> > include a note here about using vfio-pci rather than uio and therefore
> > not
> > needing physical addresses.
>
> A note won't harm. There are also non-PCI devices, though.
>
> > > +For ``virtio`` PMD in legacy mode, ``SYS_RAWIO`` capability is
> > required
> > > +for ``iopl()`` call to enable access to PCI IO ports.
> > >
> >
> > How "legacy" is legacy-mode? Is it still likely in widespread use that
> > we need this?
>
> I don't really know.
> The spec says that legacy support is optional
> (2.2.3 Legacy Interface: A Note on Feature Bits) and it aims
> to reduce the chance of a legacy driver attempting to drive the device
> (4.1.2.1 Device Requirements: PCI Device Discovery).
> OTOH, DPDK supports it and requirements must be documented.
> I can add a line suggesting to use modern virtio,
> but also don't mind removing this.
>
I suppose the main question for this legacy virtio bit is where it should
be documented, more than if it should be. Given this is a GSG, we should
try and avoid getting too deep into driver-specific issues, so I think we
should omit legacy virtio here, but have it docuemented in the relevant
virtio-specific doc. Does that seem reasonable?
> From: Bruce Richardson <bruce.richardson@intel.com>
> Sent: Monday, June 20, 2022 11:38 AM
> [...]
> > > > +For ``virtio`` PMD in legacy mode, ``SYS_RAWIO`` capability is
> > > required
> > > > +for ``iopl()`` call to enable access to PCI IO ports.
> > > >
> > >
> > > How "legacy" is legacy-mode? Is it still likely in widespread use
> that
> > > we need this?
> >
> > I don't really know.
> > The spec says that legacy support is optional
> > (2.2.3 Legacy Interface: A Note on Feature Bits) and it aims
> > to reduce the chance of a legacy driver attempting to drive the
> device
> > (4.1.2.1 Device Requirements: PCI Device Discovery).
> > OTOH, DPDK supports it and requirements must be documented.
> > I can add a line suggesting to use modern virtio,
> > but also don't mind removing this.
> >
>
> I suppose the main question for this legacy virtio bit
> is where it should be documented, more than if it should be.
> Given this is a GSG, we should try and avoid getting too deep
> into driver-specific issues, so I think we should omit legacy virtio here,
> but have it docuemented in the relevant virtio-specific doc.
> Does that seem reasonable?
Yes, moved to the virtio doc (it looks like it could use an update BTW).
I also chose to keep HPET line because there's an entire section on HPET
below in the document.
@@ -13,13 +13,58 @@ Enabling Additional Functionality
Running DPDK Applications Without Root Privileges
-------------------------------------------------
-In order to run DPDK as non-root, the following Linux filesystem objects'
-permissions should be adjusted to ensure that the Linux account being used to
-run the DPDK application has access to them:
+The following sections describe generic requirements and configuration
+for running DPDK applications as non-root.
+There may be additional requirements documented for some drivers.
-* All directories which serve as hugepage mount points, for example, ``/dev/hugepages``
+Hugepages
+~~~~~~~~~
-* If the HPET is to be used, ``/dev/hpet``
+Hugepages must be reserved as root before runing the application as non-root,
+for example::
+
+ sudo dpdk-hugepages.py --reserve 1G
+
+If multi-process is not required, running with ``--in-memory``
+bypasses the need to access hugepage mount point and files within it.
+Otherwise, hugepage directory must be made accessible
+for writing to the unprivileged user.
+A good way for managing multiple applications using hugepages
+is to mount the filesystem with group permissions
+and add a supplementary group to each application or container.
+
+One option is to use the script provided by this project::
+
+ export HUGEDIR=$HOME/huge-1G
+ mkdir -p $HUGEDIR
+ sudo dpdk-hugepages.py --mount --directory $HUGEDIR --owner `id -u`:`id -g`
+
+In production environment, the OS can manage mount points
+(`systemd example <https://github.com/systemd/systemd/blob/main/units/dev-hugepages.mount>`_).
+
+The ``hugetlb`` filesystem has additional options to guarantee or limit
+the amount of memory that is possible to allocate using the mount point.
+Refer to the `documentation <https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt>`_.
+
+If the driver requires using physical addresses (PA),
+the executable file must be granted additional capabilities:
+
+* ``SYS_ADMIN`` to read ``/proc/self/pagemaps``
+* ``IPC_LOCK`` to lock hugepages in memory
+
+.. code-block:: console
+
+ setcap cap_ipc_lock,cap_sys_admin+ep <executable>
+
+If physical addresses are not accessible,
+the following message will appear during EAL initialization::
+
+ EAL: rte_mem_virt2phy(): cannot open /proc/self/pagemap: Permission denied
+
+It is harmless in case PA are not needed.
+
+Resource Limits
+~~~~~~~~~~~~~~~
When running as non-root user, there may be some additional resource limits
that are imposed by the system. Specifically, the following resource limits may
@@ -34,7 +79,15 @@ need to be adjusted in order to ensure normal DPDK operation:
The above limits can usually be adjusted by editing
``/etc/security/limits.conf`` file, and rebooting.
-Additionally, depending on which kernel driver is in use, the relevant
+See `Hugepage Mapping <hugepage_mapping>`_
+secton to learn how these limits affect EAL.
+
+Device Control
+~~~~~~~~~~~~~~
+
+If the HPET is to be used, ``/dev/hpet`` permissions must be adjusted.
+
+Depending on which kernel driver is in use, the relevant
resources also should be accessible by the user running the DPDK application.
For ``vfio-pci`` kernel driver, the following Linux file system objects'
@@ -64,6 +117,8 @@ system objects' permissions should be adjusted:
/sys/class/uio/uio0/device/config
/sys/class/uio/uio0/device/resource*
+For ``virtio`` PMD in legacy mode, ``SYS_RAWIO`` capability is required
+for ``iopl()`` call to enable access to PCI IO ports.
Power Management and Power Saving Functionality
-----------------------------------------------
@@ -228,6 +228,8 @@ Normally, these options do not need to be changed.
can later be mapped into that preallocated VA space (if dynamic memory mode
is enabled), and can optionally be mapped into it at startup.
+.. _hugepage_mapping:
+
Hugepage Mapping
^^^^^^^^^^^^^^^^