[dpdk-dev,RFC] doc: refactored figure numbers into references

Message ID 1427972658-2472-1-git-send-email-john.mcnamara@intel.com (mailing list archive)
State RFC, archived

Commit Message

Mcnamara, John April 2, 2015, 11:04 a.m. UTC
This is an RFC patch demonstrating automatic figure references
in the documentation. The figures in the generated HTML and
PDF docs will be numbered automatically based on section.
Requires Sphinx >= 1.3.

The patch makes the following changes.

* Changes image:: tag to figure:: and moves image caption
  to the figure.

* Adds captions to figures that didn't previously have any.

* Un-templates the |image-name| substitution definitions
  into explicit figure:: tags. They weren't used more
  than once anyway and Sphinx doesn't support them
  for figure.

* Adds a target to each image that didn't previously
  have one so that they can be cross-referenced.

* Renames existing image targets to match the image
  name for consistency.

* Replaces the Figures lists with automatic :numref:
  :ref: entries to generate automatic numbering
  and captions.

* Replaces "Figure" references with automatic :numref:
  references.
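
As a rough illustration of the conversion described above
(taken from the intel_vf.rst hunk in this patch, and assuming
numfig = True is set in conf.py as the patch does), a figure
with a target, a caption and a reference to it would look
something like this:

```rst
.. _figure_single_port_nic:

.. figure:: img/single_port_nic.*

   Virtualization for a Single Port NIC in SR-IOV Mode

Refer to :numref:`figure_single_port_nic`.
```

In the Figures lists, the same target is used twice per entry,
once with :numref: for the number and once with :ref: for the
caption text:

```rst
:numref:`figure_single_port_nic` :ref:`figure_single_port_nic`
```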

Note: a V2 patch would be required to do the same for
      tables.

Signed-off-by: John McNamara <john.mcnamara@intel.com>
---
 doc/guides/conf.py                                 |   2 +
 doc/guides/nics/index.rst                          |  18 ++-
 doc/guides/nics/intel_vf.rst                       |  37 ++---
 doc/guides/nics/virtio.rst                         |  18 ++-
 doc/guides/nics/vmxnet3.rst                        |  18 ++-
 doc/guides/prog_guide/env_abstraction_layer.rst    |   8 +-
 doc/guides/prog_guide/index.rst                    |  92 +++++++-----
 doc/guides/prog_guide/ivshmem_lib.rst              |   8 +-
 doc/guides/prog_guide/kernel_nic_interface.rst     |  40 ++---
 .../prog_guide/link_bonding_poll_mode_drv_lib.rst  |  43 ++++--
 doc/guides/prog_guide/lpm6_lib.rst                 |   8 +-
 doc/guides/prog_guide/lpm_lib.rst                  |   8 +-
 doc/guides/prog_guide/malloc_lib.rst               |   9 +-
 doc/guides/prog_guide/mbuf_lib.rst                 |  20 +--
 doc/guides/prog_guide/mempool_lib.rst              |  32 ++--
 doc/guides/prog_guide/multi_proc_support.rst       |   9 +-
 doc/guides/prog_guide/overview.rst                 |   9 +-
 doc/guides/prog_guide/packet_distrib_lib.rst       |  15 +-
 doc/guides/prog_guide/packet_framework.rst         |  81 +++++-----
 doc/guides/prog_guide/qos_framework.rst            | 163 +++++++--------------
 doc/guides/prog_guide/ring_lib.rst                 | 159 +++++++++++---------
 doc/guides/sample_app_ug/dist_app.rst              |  20 ++-
 doc/guides/sample_app_ug/exception_path.rst        |   8 +-
 doc/guides/sample_app_ug/index.rst                 |  58 ++++----
 doc/guides/sample_app_ug/intel_quickassist.rst     |  11 +-
 doc/guides/sample_app_ug/kernel_nic_interface.rst  |   9 +-
 doc/guides/sample_app_ug/l2_forward_job_stats.rst  |  23 +--
 .../sample_app_ug/l2_forward_real_virtual.rst      |  22 +--
 .../sample_app_ug/l3_forward_access_ctrl.rst       |  21 ++-
 doc/guides/sample_app_ug/load_balancer.rst         |   9 +-
 doc/guides/sample_app_ug/multi_process.rst         |  36 ++---
 doc/guides/sample_app_ug/qos_scheduler.rst         |   9 +-
 doc/guides/sample_app_ug/quota_watermark.rst       |  36 ++---
 doc/guides/sample_app_ug/test_pipeline.rst         |   9 +-
 doc/guides/sample_app_ug/vhost.rst                 |  45 ++----
 doc/guides/sample_app_ug/vm_power_management.rst   |  18 +--
 doc/guides/sample_app_ug/vmdq_dcb_forwarding.rst   |  11 +-
 doc/guides/xen/pkt_switch.rst                      |  30 ++--
 38 files changed, 539 insertions(+), 633 deletions(-)
  

Patch

diff --git a/doc/guides/conf.py b/doc/guides/conf.py
index b1ef323..1bc031f 100644
--- a/doc/guides/conf.py
+++ b/doc/guides/conf.py
@@ -41,6 +41,8 @@  release = version
 
 master_doc = 'index'
 
+numfig = True
+
 latex_documents = [
     ('index',
      'doc.tex',
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index aadbae3..1ee67fa 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -50,14 +50,20 @@  Network Interface Controller Drivers
 
 **Figures**
 
-:ref:`Figure 1. Virtualization for a Single Port NIC in SR-IOV Mode <nic_figure_1>`
+:numref:`figure_single_port_nic` :ref:`figure_single_port_nic`
 
-:ref:`Figure 2. SR-IOV Performance Benchmark Setup <nic_figure_2>`
+:numref:`figure_perf_benchmark` :ref:`figure_perf_benchmark`
 
-:ref:`Figure 3. Fast Host-based Packet Processing <nic_figure_3>`
+:numref:`figure_fast_pkt_proc` :ref:`figure_fast_pkt_proc`
 
-:ref:`Figure 4. SR-IOV Inter-VM Communication <nic_figure_4>`
+:numref:`figure_inter_vm_comms` :ref:`figure_inter_vm_comms`
 
-:ref:`Figure 5. Virtio Host2VM Communication Example Using KNI vhost Back End <nic_figure_5>`
+:numref:`figure_host_vm_comms` :ref:`figure_host_vm_comms`
 
-:ref:`Figure 6. Virtio Host2VM Communication Example Using Qemu vhost Back End <nic_figure_6>`
+:numref:`figure_host_vm_comms_qemu` :ref:`figure_host_vm_comms_qemu`
+
+:numref:`figure_vmxnet3_int` :ref:`figure_vmxnet3_int`
+
+:numref:`figure_vswitch_vm` :ref:`figure_vswitch_vm`
+
+:numref:`figure_vm_vm_comms` :ref:`figure_vm_vm_comms`
diff --git a/doc/guides/nics/intel_vf.rst b/doc/guides/nics/intel_vf.rst
index e773627..17e83a2 100644
--- a/doc/guides/nics/intel_vf.rst
+++ b/doc/guides/nics/intel_vf.rst
@@ -49,9 +49,9 @@  SR-IOV Mode Utilization in a DPDK Environment
 The DPDK uses the SR-IOV feature for hardware-based I/O sharing in IOV mode.
 Therefore, it is possible to partition SR-IOV capability on Ethernet controller NIC resources logically and
 expose them to a virtual machine as a separate PCI function called a "Virtual Function".
-Refer to Figure 10.
+Refer to :numref:`figure_single_port_nic`.
 
-Therefore, a NIC is logically distributed among multiple virtual machines (as shown in Figure 10),
+Therefore, a NIC is logically distributed among multiple virtual machines (as shown in :numref:`figure_single_port_nic`),
 while still having global data in common to share with the Physical Function and other Virtual Functions.
 The DPDK fm10kvf, i40evf, igbvf or ixgbevf as a Poll Mode Driver (PMD) serves for the Intel® 82576 Gigabit Ethernet Controller,
 Intel® Ethernet Controller I350 family, Intel® 82599 10 Gigabit Ethernet Controller NIC,
@@ -72,11 +72,12 @@  For more detail on SR-IOV, please refer to the following documents:
 
 *   `Scalable I/O Virtualized Servers <http://www.intel.com/content/www/us/en/virtualization/server-virtualization/scalable-i-o-virtualized-servers-paper.html>`_
 
-.. _nic_figure_1:
+.. _figure_single_port_nic:
 
-**Figure 1. Virtualization for a Single Port NIC in SR-IOV Mode**
+.. figure:: img/single_port_nic.*
+
+   Virtualization for a Single Port NIC in SR-IOV Mode
 
-.. image:: img/single_port_nic.*
 
 Physical and Virtual Function Infrastructure
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -548,13 +549,14 @@  The setup procedure is as follows:
         can also be used to bind and unbind devices to a virtual machine in Ubuntu.
         If this option is used, step 6 in the instructions provided will be different.
 
-    *   The Virtual Machine Monitor (see Figure 11) is equivalent to a Host OS with KVM installed as described in the instructions.
+    *   The Virtual Machine Monitor (see :numref:`figure_perf_benchmark`) is equivalent to a Host OS with KVM installed as described in the instructions.
+
+.. _figure_perf_benchmark:
 
-.. _nic_figure_2:
+.. figure:: img/perf_benchmark.*
 
-**Figure 2. Performance Benchmark Setup**
+   Performance Benchmark Setup
 
-.. image:: img/perf_benchmark.*
 
 DPDK SR-IOV PMD PF/VF Driver Usage Model
 ----------------------------------------
@@ -569,14 +571,15 @@  the DPDK VF PMD driver performs the same throughput result as a non-VT native en
 With such host instance fast packet processing, lots of services such as filtering, QoS,
 DPI can be offloaded on the host fast path.
 
-Figure 12 shows the scenario where some VMs directly communicate externally via a VFs,
+:numref:`figure_fast_pkt_proc` shows the scenario where some VMs directly communicate externally via a VFs,
 while others connect to a virtual switch and share the same uplink bandwidth.
 
-.. _nic_figure_3:
+.. _figure_fast_pkt_proc:
+
+.. figure:: img/fast_pkt_proc.*
 
-**Figure 3. Fast Host-based Packet Processing**
+   Fast Host-based Packet Processing
 
-.. image:: img/fast_pkt_proc.*
 
 SR-IOV (PF/VF) Approach for Inter-VM Communication
 --------------------------------------------------
@@ -587,7 +590,7 @@  So VF-to-VF traffic within the same physical port (VM0<->VM1) have hardware acce
 However, when VF crosses physical ports (VM0<->VM2), there is no such hardware bridge.
 In this case, the DPDK PMD PF driver provides host forwarding between such VMs.
 
-Figure 13 shows an example.
+:numref:`figure_inter_vm_comms` shows an example.
 In this case an update of the MAC address lookup tables in both the NIC and host DPDK application is required.
 
 In the NIC, writing the destination of a MAC address belongs to another cross device VM to the PF specific pool.
@@ -598,8 +601,8 @@  that is, the packet is forwarded to the correct PF pool.
 The SR-IOV NIC switch forwards the packet to a specific VM according to the MAC destination address
 which belongs to the destination VF on the VM.
 
-.. _nic_figure_4:
+.. _figure_inter_vm_comms:
 
-**Figure 4. Inter-VM Communication**
+.. figure:: img/inter_vm_comms.*
 
-.. image:: img/inter_vm_comms.*
+   Inter-VM Communication
diff --git a/doc/guides/nics/virtio.rst b/doc/guides/nics/virtio.rst
index 073d980..9f18b3a 100644
--- a/doc/guides/nics/virtio.rst
+++ b/doc/guides/nics/virtio.rst
@@ -106,11 +106,12 @@  Virtio with kni vhost Back End
 
 This section demonstrates kni vhost back end example setup for Phy-VM Communication.
 
-.. _nic_figure_5:
+.. _figure_host_vm_comms:
 
-**Figure 5. Host2VM Communication Example Using kni vhost Back End**
+.. figure:: img/host_vm_comms.*
+
+   Host2VM Communication Example Using kni vhost Back End
 
-.. image:: img/host_vm_comms.*
 
 Host2VM communication example
 
@@ -174,7 +175,9 @@  Host2VM communication example
 
     We use testpmd as the forwarding application in this example.
 
-    .. image:: img/console.*
+    .. figure:: img/console.*
+
+       Running testpmd
 
 #.  Use IXIA packet generator to inject a packet stream into the KNI physical port.
 
@@ -185,11 +188,12 @@  Host2VM communication example
 Virtio with qemu virtio Back End
 --------------------------------
 
-.. _nic_figure_6:
+.. _figure_host_vm_comms_qemu:
+
+.. figure:: img/host_vm_comms_qemu.*
 
-**Figure 6. Host2VM Communication Example Using qemu vhost Back End**
+   Host2VM Communication Example Using qemu vhost Back End
 
-.. image:: img/host_vm_comms_qemu.*
 
 .. code-block:: console
 
diff --git a/doc/guides/nics/vmxnet3.rst b/doc/guides/nics/vmxnet3.rst
index 3aa5b40..fe32a41 100644
--- a/doc/guides/nics/vmxnet3.rst
+++ b/doc/guides/nics/vmxnet3.rst
@@ -121,7 +121,11 @@  The following prerequisites apply:
 *   Before starting a VM, a VMXNET3 interface to a VM through VMware vSphere Client must be assigned.
     This is shown in the figure below.
 
-.. image:: img/vmxnet3_int.*
+.. _figure_vmxnet3_int:
+
+.. figure:: img/vmxnet3_int.*
+
+   Assigning a VMXNET3 interface to a VM using VMware vSphere Client
 
 .. note::
 
@@ -142,7 +146,11 @@  VMXNET3 with a Native NIC Connected to a vSwitch
 
 This section describes an example setup for Phy-vSwitch-VM-Phy communication.
 
-.. image:: img/vswitch_vm.*
+.. _figure_vswitch_vm:
+
+.. figure:: img/vswitch_vm.*
+
+   VMXNET3 with a Native NIC Connected to a vSwitch
 
 .. note::
 
@@ -159,7 +167,11 @@  VMXNET3 Chaining VMs Connected to a vSwitch
 
 The following figure shows an example VM-to-VM communication over a Phy-VM-vSwitch-VM-Phy communication channel.
 
-.. image:: img/vm_vm_comms.*
+.. _figure_vm_vm_comms:
+
+.. figure:: img/vm_vm_comms.*
+
+   VMXNET3 Chaining VMs Connected to a vSwitch
 
 .. note::
 
diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index 1b531e2..3656ca6 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -85,13 +85,12 @@  A check is also performed at initialization time to ensure that the micro archit
 Then, the main() function is called. The core initialization and launch is done in rte_eal_init() (see the API documentation).
 It consist of calls to the pthread library (more specifically, pthread_self(), pthread_create(), and pthread_setaffinity_np()).
 
-.. _pg_figure_2:
+.. _figure_linuxapp_launch:
 
-**Figure 2. EAL Initialization in a Linux Application Environment**
+.. figure:: img/linuxapp_launch.*
 
-.. image3_png has been replaced
+   EAL Initialization in a Linux Application Environment
 
-|linuxapp_launch|
 
 .. note::
 
@@ -367,4 +366,3 @@  We expect only 50% of CPU spend on packet IO.
     echo  50000 > pkt_io/cpu.cfs_quota_us
 
 
-.. |linuxapp_launch| image:: img/linuxapp_launch.*
diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index a9966a0..84a657e 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -80,71 +80,97 @@  Programmer's Guide
 
 **Figures**
 
-:ref:`Figure 1. Core Components Architecture <pg_figure_1>`
+:numref:`figure_architecture-overview` :ref:`figure_architecture-overview`
 
-:ref:`Figure 2. EAL Initialization in a Linux Application Environment <pg_figure_2>`
+:numref:`figure_linuxapp_launch` :ref:`figure_linuxapp_launch`
 
-:ref:`Figure 3. Example of a malloc heap and malloc elements within the malloc library <pg_figure_3>`
+:numref:`figure_malloc_heap` :ref:`figure_malloc_heap`
 
-:ref:`Figure 4. Ring Structure <pg_figure_4>`
+:numref:`figure_ring1` :ref:`figure_ring1`
 
-:ref:`Figure 5. Two Channels and Quad-ranked DIMM Example <pg_figure_5>`
+:numref:`figure_ring-enqueue1` :ref:`figure_ring-enqueue1`
 
-:ref:`Figure 6. Three Channels and Two Dual-ranked DIMM Example <pg_figure_6>`
+:numref:`figure_ring-enqueue2` :ref:`figure_ring-enqueue2`
 
-:ref:`Figure 7. A mempool in Memory with its Associated Ring <pg_figure_7>`
+:numref:`figure_ring-enqueue3` :ref:`figure_ring-enqueue3`
 
-:ref:`Figure 8. An mbuf with One Segment <pg_figure_8>`
+:numref:`figure_ring-dequeue1` :ref:`figure_ring-dequeue1`
 
-:ref:`Figure 9. An mbuf with Three Segments <pg_figure_9>`
+:numref:`figure_ring-dequeue2` :ref:`figure_ring-dequeue2`
 
-:ref:`Figure 16. Memory Sharing inthe Intel® DPDK Multi-process Sample Application <pg_figure_16>`
+:numref:`figure_ring-dequeue3` :ref:`figure_ring-dequeue3`
 
-:ref:`Figure 17. Components of an Intel® DPDK KNI Application <pg_figure_17>`
+:numref:`figure_ring-mp-enqueue1` :ref:`figure_ring-mp-enqueue1`
 
-:ref:`Figure 18. Packet Flow via mbufs in the Intel DPDK® KNI <pg_figure_18>`
+:numref:`figure_ring-mp-enqueue2` :ref:`figure_ring-mp-enqueue2`
 
-:ref:`Figure 19. vHost-net Architecture Overview <pg_figure_19>`
+:numref:`figure_ring-mp-enqueue3` :ref:`figure_ring-mp-enqueue3`
 
-:ref:`Figure 20. KNI Traffic Flow <pg_figure_20>`
+:numref:`figure_ring-mp-enqueue4` :ref:`figure_ring-mp-enqueue4`
 
-:ref:`Figure 21. Complex Packet Processing Pipeline with QoS Support <pg_figure_21>`
+:numref:`figure_ring-mp-enqueue5` :ref:`figure_ring-mp-enqueue5`
 
-:ref:`Figure 22. Hierarchical Scheduler Block Internal Diagram <pg_figure_22>`
+:numref:`figure_ring-modulo1` :ref:`figure_ring-modulo1`
 
-:ref:`Figure 23. Scheduling Hierarchy per Port <pg_figure_23>`
+:numref:`figure_ring-modulo2` :ref:`figure_ring-modulo2`
 
-:ref:`Figure 24. Internal Data Structures per Port <pg_figure_24>`
+:numref:`figure_memory-management` :ref:`figure_memory-management`
 
-:ref:`Figure 25. Prefetch Pipeline for the Hierarchical Scheduler Enqueue Operation <pg_figure_25>`
+:numref:`figure_memory-management2` :ref:`figure_memory-management2`
 
-:ref:`Figure 26. Pipe Prefetch State Machine for the Hierarchical Scheduler Dequeue Operation <pg_figure_26>`
+:numref:`figure_mempool` :ref:`figure_mempool`
 
-:ref:`Figure 27. High-level Block Diagram of the Intel® DPDK Dropper <pg_figure_27>`
+:numref:`figure_mbuf1` :ref:`figure_mbuf1`
 
-:ref:`Figure 28. Flow Through the Dropper <pg_figure_28>`
+:numref:`figure_mbuf2` :ref:`figure_mbuf2`
 
-:ref:`Figure 29. Example Data Flow Through Dropper <pg_figure_29>`
+:numref:`figure_multi_process_memory` :ref:`figure_multi_process_memory`
 
-:ref:`Figure 30. Packet Drop Probability for a Given RED Configuration <pg_figure_30>`
+:numref:`figure_kernel_nic_intf` :ref:`figure_kernel_nic_intf`
 
-:ref:`Figure 31. Initial Drop Probability (pb), Actual Drop probability (pa) Computed Using a Factor 1 (Blue Curve) and a Factor 2 (Red Curve) <pg_figure_31>`
+:numref:`figure_pkt_flow_kni` :ref:`figure_pkt_flow_kni`
 
-:ref:`Figure 32. Example of packet processing pipeline. The input ports 0 and 1 are connected with the output ports 0, 1 and 2 through tables 0 and 1. <pg_figure_32>`
+:numref:`figure_vhost_net_arch2` :ref:`figure_vhost_net_arch2`
 
-:ref:`Figure 33. Sequence of steps for hash table operations in packet processing context <pg_figure_33>`
+:numref:`figure_kni_traffic_flow` :ref:`figure_kni_traffic_flow`
 
-:ref:`Figure 34. Data structures for configurable key size hash tables <pg_figure_34>`
 
-:ref:`Figure 35. Bucket search pipeline for key lookup operation (configurable key size hash tables) <pg_figure_35>`
+:numref:`figure_pkt_proc_pipeline_qos` :ref:`figure_pkt_proc_pipeline_qos`
 
-:ref:`Figure 36. Pseudo-code for match, match_many and match_pos <pg_figure_36>`
+:numref:`figure_hier_sched_blk` :ref:`figure_hier_sched_blk`
 
-:ref:`Figure 37. Data structures for 8-byte key hash tables <pg_figure_37>`
+:numref:`figure_sched_hier_per_port` :ref:`figure_sched_hier_per_port`
 
-:ref:`Figure 38. Data structures for 16-byte key hash tables <pg_figure_38>`
+:numref:`figure_data_struct_per_port` :ref:`figure_data_struct_per_port`
+
+:numref:`figure_prefetch_pipeline` :ref:`figure_prefetch_pipeline`
+
+:numref:`figure_pipe_prefetch_sm` :ref:`figure_pipe_prefetch_sm`
+
+:numref:`figure_blk_diag_dropper` :ref:`figure_blk_diag_dropper`
+
+:numref:`figure_flow_tru_droppper` :ref:`figure_flow_tru_droppper`
+
+:numref:`figure_ex_data_flow_tru_dropper` :ref:`figure_ex_data_flow_tru_dropper`
+
+:numref:`figure_pkt_drop_probability` :ref:`figure_pkt_drop_probability`
+
+:numref:`figure_drop_probability_graph` :ref:`figure_drop_probability_graph`
+
+:numref:`figure_figure32` :ref:`figure_figure32`
+
+:numref:`figure_figure33` :ref:`figure_figure33`
+
+:numref:`figure_figure34` :ref:`figure_figure34`
+
+:numref:`figure_figure35` :ref:`figure_figure35`
+
+:numref:`figure_figure37` :ref:`figure_figure37`
+
+:numref:`figure_figure38` :ref:`figure_figure38`
+
+:numref:`figure_figure39` :ref:`figure_figure39`
 
-:ref:`Figure 39. Bucket search pipeline for key lookup operation (single key size hash tables) <pg_figure_39>`
 
 **Tables**
 
diff --git a/doc/guides/prog_guide/ivshmem_lib.rst b/doc/guides/prog_guide/ivshmem_lib.rst
index c76d2b3..af4c7a9 100644
--- a/doc/guides/prog_guide/ivshmem_lib.rst
+++ b/doc/guides/prog_guide/ivshmem_lib.rst
@@ -43,9 +43,11 @@  they are automatically recognized by the DPDK Environment Abstraction Layer (EAL
 
 A typical DPDK IVSHMEM use case looks like the following.
 
-.. image28_png has been renamed
 
-|ivshmem|
+.. figure:: img/ivshmem.*
+
+   Typical Ivshmem use case
+
 
 The same could work with several virtual machines, providing host-to-VM or VM-to-VM communication.
 The maximum number of metadata files is 32 (by default) and each metadata file can contain different (or even the same) hugepages.
@@ -154,5 +156,3 @@  It is important to note that once QEMU is started, it holds on to the hugepages
 As a result, if the user wishes to shut down or restart the IVSHMEM host application,
 it is not enough to simply shut the application down.
 The virtual machine must also be shut down (if not, it will hold onto outdated host data).
-
-.. |ivshmem| image:: img/ivshmem.*
diff --git a/doc/guides/prog_guide/kernel_nic_interface.rst b/doc/guides/prog_guide/kernel_nic_interface.rst
index bac2215..3402fd2 100644
--- a/doc/guides/prog_guide/kernel_nic_interface.rst
+++ b/doc/guides/prog_guide/kernel_nic_interface.rst
@@ -42,15 +42,14 @@  The benefits of using the DPDK KNI are:
 
 *   Allows an interface with the kernel network stack.
 
-The components of an application using the DPDK Kernel NIC Interface are shown in Figure 17.
+The components of an application using the DPDK Kernel NIC Interface are shown in :numref:`figure_kernel_nic_intf`.
 
-.. _pg_figure_17:
+.. _figure_kernel_nic_intf:
 
-**Figure 17. Components of a DPDK KNI Application**
+.. figure:: img/kernel_nic_intf.*
 
-.. image43_png has been renamed
+   Components of a DPDK KNI Application
 
-|kernel_nic_intf|
 
 The DPDK KNI Kernel Module
 --------------------------
@@ -114,15 +113,14 @@  To minimize the amount of DPDK code running in kernel space, the mbuf mempool is
 The kernel module will be aware of mbufs,
 but all mbuf allocation and free operations will be handled by the DPDK application only.
 
-Figure 18 shows a typical scenario with packets sent in both directions.
+:numref:`figure_pkt_flow_kni` shows a typical scenario with packets sent in both directions.
 
-.. _pg_figure_18:
+.. _figure_pkt_flow_kni:
 
-**Figure 18. Packet Flow via mbufs in the DPDK KNI**
+.. figure:: img/pkt_flow_kni.*
 
-.. image44_png has been renamed
+   Packet Flow via mbufs in the DPDK KNI
 
-|pkt_flow_kni|
 
 Use Case: Ingress
 -----------------
@@ -189,13 +187,12 @@  it naturally supports both legacy virtio -net and the DPDK PMD virtio.
 There is a little penalty that comes from the non-polling mode of vhost.
 However, it scales throughput well when using KNI in multi-thread mode.
 
-.. _pg_figure_19:
+.. _figure_vhost_net_arch2:
 
-**Figure 19. vHost-net Architecture Overview**
+.. figure:: img/vhost_net_arch.*
 
-.. image45_png has been renamed
+   vHost-net Architecture Overview
 
-|vhost_net_arch|
 
 Packet Flow
 ~~~~~~~~~~~
@@ -208,13 +205,12 @@  All the packet copying, irrespective of whether it is on the transmit or receive
 happens in the context of vhost kthread.
 Every vhost-net device is exposed to a front end virtio device in the guest.
 
-.. _pg_figure_20:
+.. _figure_kni_traffic_flow:
 
-**Figure 20. KNI Traffic Flow**
+.. figure:: img/kni_traffic_flow.*
 
-.. image46_png  has been renamed
+   KNI Traffic Flow
 
-|kni_traffic_flow|
 
 Sample Usage
 ~~~~~~~~~~~~
@@ -280,11 +276,3 @@  since the kni-vhost does not yet support those features.
 Even if the option is turned on, kni-vhost will ignore the information that the header contains.
 When working with legacy virtio on the guest, it is better to turn off unsupported offload features using ethtool -K.
 Otherwise, there may be problems such as an incorrect L4 checksum error.
-
-.. |kni_traffic_flow| image:: img/kni_traffic_flow.*
-
-.. |vhost_net_arch| image:: img/vhost_net_arch.*
-
-.. |pkt_flow_kni| image:: img/pkt_flow_kni.*
-
-.. |kernel_nic_intf| image:: img/kernel_nic_intf.*
diff --git a/doc/guides/prog_guide/link_bonding_poll_mode_drv_lib.rst b/doc/guides/prog_guide/link_bonding_poll_mode_drv_lib.rst
index 24a1a36..fd3ac5e 100644
--- a/doc/guides/prog_guide/link_bonding_poll_mode_drv_lib.rst
+++ b/doc/guides/prog_guide/link_bonding_poll_mode_drv_lib.rst
@@ -35,7 +35,10 @@  In addition to Poll Mode Drivers (PMDs) for physical and virtual hardware,
 DPDK also includes a pure-software library that
 allows physical PMD's to be bonded together to create a single logical PMD.
 
-|bond-overview|
+.. figure:: img/bond-overview.*
+
+   Bonded PMDs
+
 
 The Link Bonding PMD library(librte_pmd_bond) supports bonding of groups of
 ``rte_eth_dev`` ports of the same speed and duplex to provide
@@ -62,7 +65,10 @@  Currently the Link Bonding PMD library supports 4 modes of operation:
 
 *   **Round-Robin (Mode 0):**
 
-|bond-mode-0|
+.. figure:: img/bond-mode-0.*
+
+   Round-Robin (Mode 0)
+
 
     This mode provides load balancing and fault tolerance by transmission of
     packets in sequential order from the first available slave device through
@@ -72,7 +78,10 @@  Currently the Link Bonding PMD library supports 4 modes of operation:
 
 *   **Active Backup (Mode 1):**
 
-|bond-mode-1|
+.. figure:: img/bond-mode-1.*
+
+   Active Backup (Mode 1)
+
 
     In this mode only one slave in the bond is active at any time, a different
     slave becomes active if, and only if, the primary active slave fails,
@@ -82,7 +91,10 @@  Currently the Link Bonding PMD library supports 4 modes of operation:
 
 *   **Balance XOR (Mode 2):**
 
-|bond-mode-2|
+.. figure:: img/bond-mode-2.*
+
+   Balance XOR (Mode 2)
+
 
     This mode provides transmit load balancing (based on the selected
     transmission policy) and fault tolerance. The default policy (layer2) uses
@@ -101,14 +113,20 @@  Currently the Link Bonding PMD library supports 4 modes of operation:
 
 *   **Broadcast (Mode 3):**
 
-|bond-mode-3|
+.. figure:: img/bond-mode-3.*
+
+   Broadcast (Mode 3)
+
 
     This mode provides fault tolerance by transmission of packets on all slave
     ports.
 
 *   **Link Aggregation 802.3AD (Mode 4):**
 
-|bond-mode-4|
+.. figure:: img/bond-mode-4.*
+
+   Link Aggregation 802.3AD (Mode 4)
+
 
     This mode provides dynamic link aggregation according to the 802.3ad
     specification. It negotiates and monitors aggregation groups that share the
@@ -128,7 +146,10 @@  Currently the Link Bonding PMD library supports 4 modes of operation:
 
 *   **Transmit Load Balancing (Mode 5):**
 
-|bond-mode-5|
+.. figure:: img/bond-mode-5.*
+
+   Transmit Load Balancing (Mode 5)
+
 
     This mode provides an adaptive transmit load balancing. It dynamically
     changes the transmitting slave, according to the computed load. Statistics
@@ -433,11 +454,3 @@  Create a bonded device in balance mode with two slaves specified by their PCI ad
 .. code-block:: console
 
     $RTE_TARGET/app/testpmd -c '0xf' -n 4 --vdev 'eth_bond0,mode=2, slave=0000:00a:00.01,slave=0000:004:00.00,xmit_policy=l34' -- --port-topology=chained
-
-.. |bond-overview| image:: img/bond-overview.*
-.. |bond-mode-0| image:: img/bond-mode-0.*
-.. |bond-mode-1| image:: img/bond-mode-1.*
-.. |bond-mode-2| image:: img/bond-mode-2.*
-.. |bond-mode-3| image:: img/bond-mode-3.*
-.. |bond-mode-4| image:: img/bond-mode-4.*
-.. |bond-mode-5| image:: img/bond-mode-5.*
diff --git a/doc/guides/prog_guide/lpm6_lib.rst b/doc/guides/prog_guide/lpm6_lib.rst
index abc5adb..87f5066 100644
--- a/doc/guides/prog_guide/lpm6_lib.rst
+++ b/doc/guides/prog_guide/lpm6_lib.rst
@@ -108,9 +108,11 @@  This is not feasible due to resource restrictions.
 By splitting the process in different tables/levels and limiting the number of tbl8s,
 we can greatly reduce memory consumption while maintaining a very good lookup speed (one memory access per level).
 
-.. image40_png has been renamed
 
-|tbl24_tbl8_tbl8|
+.. figure:: img/tbl24_tbl8_tbl8.*
+
+   Table split into different levels
+
 
 An entry in a table contains the following fields:
 
@@ -231,5 +233,3 @@  Use Case: IPv6 Forwarding
 -------------------------
 
 The LPM algorithm is used to implement the Classless Inter-Domain Routing (CIDR) strategy used by routers implementing IP forwarding.
-
-.. |tbl24_tbl8_tbl8| image:: img/tbl24_tbl8_tbl8.*
diff --git a/doc/guides/prog_guide/lpm_lib.rst b/doc/guides/prog_guide/lpm_lib.rst
index 692e37f..c33e469 100644
--- a/doc/guides/prog_guide/lpm_lib.rst
+++ b/doc/guides/prog_guide/lpm_lib.rst
@@ -90,9 +90,11 @@  Instead, this approach takes advantage of the fact that rules longer than 24 bit
 By splitting the process in two different tables/levels and limiting the number of tbl8s,
 we can greatly reduce memory consumption while maintaining a very good lookup speed (one memory access, most of the times).
 
-.. image39 has been renamed
 
-|tbl24_tbl8|
+.. figure:: img/tbl24_tbl8.*
+
+   Table split into different levels
+
 
 An entry in tbl24 contains the following fields:
 
@@ -219,5 +221,3 @@  References
 
 *   Pankaj Gupta, Algorithms for Routing Lookups and Packet Classification, PhD Thesis, Stanford University,
     2000  (`http://klamath.stanford.edu/~pankaj/thesis/ thesis_1sided.pdf <http://klamath.stanford.edu/~pankaj/thesis/%20thesis_1sided.pdf>`_ )
-
-.. |tbl24_tbl8| image:: img/tbl24_tbl8.*
diff --git a/doc/guides/prog_guide/malloc_lib.rst b/doc/guides/prog_guide/malloc_lib.rst
index b9298f8..6418fab 100644
--- a/doc/guides/prog_guide/malloc_lib.rst
+++ b/doc/guides/prog_guide/malloc_lib.rst
@@ -117,13 +117,12 @@  The key fields of the heap structure and their function are described below (see
     since these are never touched except when they are to be freed again -
     at which point the pointer to the block is an input to the free() function.
 
-.. _pg_figure_3:
+.. _figure_malloc_heap:
 
-**Figure 3. Example of a malloc heap and malloc elements within the malloc library**
+.. figure:: img/malloc_heap.*
 
-.. image4_png has been renamed
+   Example of a malloc heap and malloc elements within the malloc library
 
-|malloc_heap|
 
 Structure: malloc_elem
 ^^^^^^^^^^^^^^^^^^^^^^
@@ -232,5 +231,3 @@  These next and previous elements are then checked to see if they too are free,
 and if so, they are merged with the current elements.
 This means that we can never have two free memory blocks adjacent to one another,
 they are always merged into a single block.
-
-.. |malloc_heap| image:: img/malloc_heap.*
diff --git a/doc/guides/prog_guide/mbuf_lib.rst b/doc/guides/prog_guide/mbuf_lib.rst
index 8f546e0..8845039 100644
--- a/doc/guides/prog_guide/mbuf_lib.rst
+++ b/doc/guides/prog_guide/mbuf_lib.rst
@@ -71,23 +71,21 @@  Message buffers may be used to carry control information, packets, events,
 and so on between different entities in the system.
 Message buffers may also use their buffer pointers to point to other message buffer data sections or other structures.
 
-Figure 8 and Figure 9 show some of these scenarios.
+:numref:`figure_mbuf1` and :numref:`figure_mbuf2` show some of these scenarios.
 
-.. _pg_figure_8:
+.. _figure_mbuf1:
 
-**Figure 8. An mbuf with One Segment**
+.. figure:: img/mbuf1.*
 
-.. image22_png  has been replaced
+   An mbuf with One Segment
 
-|mbuf1|
 
-.. _pg_figure_9:
+.. _figure_mbuf2:
 
-**Figure 9. An mbuf with Three Segments**
+.. figure:: img/mbuf2.*
 
-.. image23_png has been replaced
+   An mbuf with Three Segments
 
-|mbuf2|
 
 The Buffer Manager implements a fairly standard set of buffer access functions to manipulate network packets.
 
@@ -277,7 +275,3 @@  Use Cases
 ---------
 
 All networking application should use mbufs to transport network packets.
-
-.. |mbuf1| image:: img/mbuf1.*
-
-.. |mbuf2| image:: img/mbuf2.*
diff --git a/doc/guides/prog_guide/mempool_lib.rst b/doc/guides/prog_guide/mempool_lib.rst
index f9b7cfe..f0ca06f 100644
--- a/doc/guides/prog_guide/mempool_lib.rst
+++ b/doc/guides/prog_guide/mempool_lib.rst
@@ -74,28 +74,27 @@  When running an application, the EAL command line options provide the ability to
 
     The command line must always have the number of memory channels specified for the processor.
 
-Examples of alignment for different DIMM architectures are shown in Figure 5 and Figure 6.
+Examples of alignment for different DIMM architectures are shown in
+:numref:`figure_memory-management` and :numref:`figure_memory-management2`.
 
-.. _pg_figure_5:
+.. _figure_memory-management:
 
-**Figure 5. Two Channels and Quad-ranked DIMM Example**
+.. figure:: img/memory-management.*
 
-.. image19_png has been replaced
+   Two Channels and Quad-ranked DIMM Example
 
-|memory-management|
 
 In this case, the assumption is that a packet is 16 blocks of 64 bytes, which is not true.
 
 The Intel® 5520 chipset has three channels, so in most cases,
 no padding is required between objects (except for objects whose size are n x 3 x 64 bytes blocks).
 
-.. _pg_figure_6:
+.. _figure_memory-management2:
 
-**Figure 6. Three Channels and Two Dual-ranked DIMM Example**
+.. figure:: img/memory-management2.*
 
-.. image20_png has been replaced
+   Three Channels and Two Dual-ranked DIMM Example
 
-|memory-management2|
 
 When creating a new pool, the user can specify to use this feature or not.
 
@@ -119,15 +118,14 @@  This cache can be enabled or disabled at creation of the pool.
 
 The maximum size of the cache is static and is defined at compilation time (CONFIG_RTE_MEMPOOL_CACHE_MAX_SIZE).
 
-Figure 7 shows a cache in operation.
+:numref:`figure_mempool` shows a cache in operation.
 
-.. _pg_figure_7:
+.. _figure_mempool:
 
-**Figure 7. A mempool in Memory with its Associated Ring**
+.. figure:: img/mempool.*
 
-.. image21_png has been replaced
+   A mempool in Memory with its Associated Ring
 
-|mempool|
 
 Use Cases
 ---------
@@ -140,9 +138,3 @@  Below are some examples:
 *   :ref:`Environment Abstraction Layer <Environment_Abstraction_Layer>` , for logging service
 
 *   Any application that needs to allocate fixed-sized objects in the data plane and that will be continuously utilized by the system.
-
-.. |memory-management| image:: img/memory-management.*
-
-.. |memory-management2| image:: img/memory-management2.*
-
-.. |mempool| image:: img/mempool.*
diff --git a/doc/guides/prog_guide/multi_proc_support.rst b/doc/guides/prog_guide/multi_proc_support.rst
index 25a6056..6562f0d 100644
--- a/doc/guides/prog_guide/multi_proc_support.rst
+++ b/doc/guides/prog_guide/multi_proc_support.rst
@@ -83,13 +83,12 @@  and point to the same objects, in both processes.
     Refer to Section 23.3 "Multi-process Limitations" for details of
     how Linux kernel Address-Space Layout Randomization (ASLR) can affect memory sharing.
 
-.. _pg_figure_16:
+.. _figure_multi_process_memory:
 
-**Figure 16. Memory Sharing in the DPDK Multi-process Sample Application**
+.. figure:: img/multi_process_memory.*
 
-.. image42_png has been replaced
+   Memory Sharing in the DPDK Multi-process Sample Application
 
-|multi_process_memory|
 
 The EAL also supports an auto-detection mode (set by EAL --proc-type=auto flag ),
 whereby an DPDK process is started as a secondary instance if a primary instance is already running.
@@ -199,5 +198,3 @@  instead of the functions which do the hashing internally, such as rte_hash_add()
     which means that only the first, primary DPDK process instance can open and mmap  /dev/hpet.
     If the number of required DPDK processes exceeds that of the number of available HPET comparators,
     the TSC (which is the default timer in this release) must be used as a time source across all processes instead of the HPET.
-
-.. |multi_process_memory| image:: img/multi_process_memory.*
diff --git a/doc/guides/prog_guide/overview.rst b/doc/guides/prog_guide/overview.rst
index 062d923..cef6ca7 100644
--- a/doc/guides/prog_guide/overview.rst
+++ b/doc/guides/prog_guide/overview.rst
@@ -120,13 +120,12 @@  Core Components
 The *core components* are a set of libraries that provide all the elements needed
 for high-performance packet processing applications.
 
-.. _pg_figure_1:
+.. _figure_architecture-overview:
 
-**Figure 1. Core Components Architecture**
+.. figure:: img/architecture-overview.*
 
-.. image2_png has been replaced
+   Core Components Architecture
 
-|architecture-overview|
 
 Memory Manager (librte_malloc)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -203,5 +202,3 @@  librte_net
 The librte_net library is a collection of IP protocol definitions and convenience macros.
 It is based on code from the FreeBSD* IP stack and contains protocol numbers (for use in IP headers),
 IP-related macros, IPv4/IPv6 header structures and TCP, UDP and SCTP header structures.
-
-.. |architecture-overview| image:: img/architecture-overview.*
diff --git a/doc/guides/prog_guide/packet_distrib_lib.rst b/doc/guides/prog_guide/packet_distrib_lib.rst
index 767accc..b5bdabb 100644
--- a/doc/guides/prog_guide/packet_distrib_lib.rst
+++ b/doc/guides/prog_guide/packet_distrib_lib.rst
@@ -38,7 +38,10 @@  which is responsible for load balancing or distributing packets,
 and a set of worker lcores which are responsible for receiving the packets from the distributor and operating on them.
 The model of operation is shown in the diagram below.
 
-|packet_distributor1|
+.. figure:: img/packet_distributor1.*
+
+   Packet Distributor mode of operation
+
 
 Distributor Core Operation
 --------------------------
@@ -91,9 +94,11 @@  No packet ordering guarantees are made about packets which do not share a common
 Using the process and returned_pkts API, the following application workflow can be used,
 while allowing packet order within a packet flow -- identified by a tag -- to be maintained.
 
-.. image41_png has been renamed
 
-|packet_distributor2|
+.. figure:: img/packet_distributor2.*
+
+   Application workflow
+
 
 The flush and clear_returns API calls, mentioned previously,
 are likely of less use that the process and returned_pkts APIS, and are principally provided to aid in unit testing of the library.
@@ -110,7 +115,3 @@  Since it may be desirable to vary the number of worker cores, depending on the t
 i.e. to save power at times of lighter load,
 it is possible to have a worker stop processing packets by calling "rte_distributor_return_pkt()" to indicate that
 it has finished the current packet and does not want a new one.
-
-.. |packet_distributor1| image:: img/packet_distributor1.*
-
-.. |packet_distributor2| image:: img/packet_distributor2.*
diff --git a/doc/guides/prog_guide/packet_framework.rst b/doc/guides/prog_guide/packet_framework.rst
index 8e8e32f..42bbbaa 100644
--- a/doc/guides/prog_guide/packet_framework.rst
+++ b/doc/guides/prog_guide/packet_framework.rst
@@ -66,15 +66,15 @@  one of the table entries (on lookup hit) or the default table entry (on lookup m
 provides the set of actions to be applied on the current packet,
 as well as the next hop for the packet, which can be either another table, an output port or packet drop.
 
-An example of packet processing pipeline is presented in Figure 32:
+An example of packet processing pipeline is presented in :numref:`figure_figure32`:
 
-.. _pg_figure_32:
+.. _figure_figure32:
 
-**Figure 32 Example of Packet Processing Pipeline where Input Ports 0 and 1 are Connected with Output Ports 0, 1 and 2 through Tables 0 and 1**
+.. figure:: img/figure32.*
 
-.. Object_1_png has been renamed
+   Example of Packet Processing Pipeline where Input Ports 0 and 1
+   are Connected with Output Ports 0, 1 and 2 through Tables 0 and 1
 
-|figure32|
 
 Port Library Design
 -------------------
@@ -344,13 +344,14 @@  considering *n_bits* as the number of bits set in *bucket_mask = n_buckets - 1*,
 this means that all the keys that end up in the same hash table bucket have the lower *n_bits* of their signature identical.
 In order to reduce the number of keys in the same bucket (collisions), the number of hash table buckets needs to be increased.
 
-In packet processing context, the sequence of operations involved in hash table operations is described in Figure 33:
+In packet processing context, the sequence of operations involved in hash table operations is described in :numref:`figure_figure33`:
 
-.. _pg_figure_33:
+.. _figure_figure33:
 
-**Figure 33 Sequence of Steps for Hash Table Operations in a Packet Processing Context**
+.. figure:: img/figure33.*
+
+   Sequence of Steps for Hash Table Operations in a Packet Processing Context
 
-|figure33|
 
 
 Hash Table Use Cases
@@ -553,16 +554,15 @@  This avoids the important cost associated with flushing the CPU core execution p
 Configurable Key Size Hash Table
 """"""""""""""""""""""""""""""""
 
-Figure 34, Table 25 and Table 26 detail the main data structures used to implement configurable key size hash tables (either LRU or extendable bucket,
+:numref:`figure_figure34`, Table 25 and Table 26 detail the main data structures used to implement configurable key size hash tables (either LRU or extendable bucket,
 either with pre-computed signature or "do-sig").
 
-.. _pg_figure_34:
+.. _figure_figure34:
 
-**Figure 34 Data Structures for Configurable Key Size Hash Tables**
+.. figure:: img/figure34.*
 
-.. image65_png has been renamed
+   Data Structures for Configurable Key Size Hash Tables
 
-|figure34|
 
 .. _pg_table_25:
 
@@ -627,15 +627,17 @@  either with pre-computed signature or "do-sig").
 +---+------------------+--------------------+------------------------------------------------------------------+
 
 
-Figure 35 and Table 27 detail the bucket search pipeline stages (either LRU or extendable bucket,
+:numref:`figure_figure35` and Table 27 detail the bucket search pipeline stages (either LRU or extendable bucket,
 either with pre-computed signature or "do-sig").
 For each pipeline stage, the described operations are applied to each of the two packets handled by that stage.
 
-.. _pg_figure_35:
+.. _figure_figure35:
+
+.. figure:: img/figure35.*
 
-**Figure 35 Bucket Search Pipeline for Key Lookup Operation (Configurable Key Size Hash Tables)**
+   Bucket Search Pipeline for Key Lookup Operation (Configurable Key Size Hash
+   Tables)
 
-|figure35|
 
 .. _pg_table_27:
 
@@ -814,11 +816,8 @@  Given the input *mask*, the values for *match*, *match_many* and *match_pos* can
 |            |                                          |                   |
 +------------+------------------------------------------+-------------------+
 
-The pseudo-code is displayed in Figure 36.
-
-.. _pg_figure_36:
 
-**Figure 36 Pseudo-code for match, match_many and match_pos**
+The pseudo-code for match, match_many and match_pos is::
 
     match = (0xFFFELLU >> mask) & 1;
 
@@ -829,24 +828,22 @@  The pseudo-code is displayed in Figure 36.
 Single Key Size Hash Tables
 """""""""""""""""""""""""""
 
-Figure 37, Figure 38, Table 30 and 31 detail the main data structures used to implement 8-byte and 16-byte key hash tables
+:numref:`figure_figure37`, :numref:`figure_figure38`, Table 30 and 31 detail the main data structures used to implement 8-byte and 16-byte key hash tables
 (either LRU or extendable bucket, either with pre-computed signature or "do-sig").
 
-.. _pg_figure_37:
+.. _figure_figure37:
 
-**Figure 37 Data Structures for 8-byte Key Hash Tables**
+.. figure:: img/figure37.*
 
-.. image66_png has been renamed
+   Data Structures for 8-byte Key Hash Tables
 
-|figure37|
 
-.. _pg_figure_38:
+.. _figure_figure38:
 
-**Figure 38 Data Structures for 16-byte Key Hash Tables**
+.. figure:: img/figure38.*
 
-.. image67_png has been renamed
+   Data Structures for 16-byte Key Hash Tables
 
-|figure38|
 
 .. _pg_table_30:
 
@@ -914,11 +911,13 @@  and detail the bucket search pipeline used to implement 8-byte and 16-byte key h
 either with pre-computed signature or "do-sig").
 For each pipeline stage, the described operations are applied to each of the two packets handled by that stage.
 
-.. _pg_figure_39:
+.. _figure_figure39:
 
-**Figure 39 Bucket Search Pipeline for Key Lookup Operation (Single Key Size Hash Tables)**
+.. figure:: img/figure39.*
+
+   Bucket Search Pipeline for Key Lookup Operation (Single Key Size Hash
+   Tables)
 
-|figure39|
 
 .. _pg_table_32:
 
@@ -1167,17 +1166,3 @@  Usually, to support a specific functional block, specific implementation of Pack
 with all the implementations sharing the same API: pure SW implementation (no acceleration), implementation using accelerator A, implementation using accelerator B, etc.
 The selection between these implementations could be done at build time or at run-time (recommended), based on which accelerators are present in the system,
 with no application changes required.
-
-.. |figure33| image:: img/figure33.*
-
-.. |figure35| image:: img/figure35.*
-
-.. |figure39| image:: img/figure39.*
-
-.. |figure34| image:: img/figure34.*
-
-.. |figure32| image:: img/figure32.*
-
-.. |figure37| image:: img/figure37.*
-
-.. |figure38| image:: img/figure38.*
diff --git a/doc/guides/prog_guide/qos_framework.rst b/doc/guides/prog_guide/qos_framework.rst
index b609841..59f7fb3 100644
--- a/doc/guides/prog_guide/qos_framework.rst
+++ b/doc/guides/prog_guide/qos_framework.rst
@@ -38,13 +38,12 @@  Packet Pipeline with QoS Support
 
 An example of a complex packet processing pipeline with QoS support is shown in the following figure.
 
-.. _pg_figure_21:
+.. _figure_pkt_proc_pipeline_qos:
 
-**Figure 21. Complex Packet Processing Pipeline with QoS Support**
+.. figure:: img/pkt_proc_pipeline_qos.*
 
-.. image47_png has been renamed
+   Complex Packet Processing Pipeline with QoS Support
 
-|pkt_proc_pipeline_qos|
 
 This pipeline can be built using reusable DPDK software libraries.
 The main blocks implementing QoS in this pipeline are: the policer, the dropper and the scheduler.
@@ -139,13 +138,12 @@  It typically acts like a buffer that is able to temporarily store a large number
 as the NIC TX is requesting more packets for transmission,
 these packets are later on removed and handed over to the NIC TX with the packet selection logic observing the predefined SLAs (dequeue operation).
 
-.. _pg_figure_22:
+.. _figure_hier_sched_blk:
 
-**Figure 22. Hierarchical Scheduler Block Internal Diagram**
+.. figure:: img/hier_sched_blk.*
 
-.. image48_png has been renamed
+   Hierarchical Scheduler Block Internal Diagram
 
-|hier_sched_blk|
 
 The hierarchical scheduler is optimized for a large number of packet queues.
 When only a small number of queues are needed, message passing queues should be used instead of this block.
@@ -154,7 +152,7 @@  See Section 26.2.5 "Worst Case Scenarios for Performance" for a more detailed di
 Scheduling Hierarchy
 ~~~~~~~~~~~~~~~~~~~~
 
-The scheduling hierarchy is shown in Figure 23.
+The scheduling hierarchy is shown in :numref:`figure_sched_hier_per_port`.
 The first level of the hierarchy is the Ethernet TX port 1/10/40 GbE,
 with subsequent hierarchy levels defined as subport, pipe, traffic class and queue.
 
@@ -163,13 +161,12 @@  Each traffic class is the representation of a different traffic type with specif
 delay and jitter requirements, such as voice, video or data transfers.
 Each queue hosts packets from one or multiple connections of the same type belonging to the same user.
 
-.. _pg_figure_23:
+.. _figure_sched_hier_per_port:
 
-**Figure 23. Scheduling Hierarchy per Port**
+.. figure:: img/sched_hier_per_port.*
 
-.. image49_png has been renamed
+   Scheduling Hierarchy per Port
 
-|sched_hier_per_port|
 
 The functionality of each hierarchical level is detailed in the following table.
 
@@ -293,13 +290,12 @@  Internal Data Structures per Port
 
 A schematic of the internal data structures in shown in with details in.
 
-.. _pg_figure_24:
+.. _figure_data_struct_per_port:
 
-**Figure 24. Internal Data Structures per Port**
+.. figure:: img/data_struct_per_port.*
 
-.. image50_png has been renamed
+   Internal Data Structures per Port
 
-|data_struct_per_port|
 
 .. _pg_table_4:
 
@@ -434,16 +430,15 @@  the processor should not attempt to access the data structure currently under pr
 The only other work available is to execute different stages of the enqueue sequence of operations on other input packets,
 thus resulting in a pipelined implementation for the enqueue operation.
 
-Figure 25 illustrates a pipelined implementation for the enqueue operation with 4 pipeline stages and each stage executing 2 different input packets.
+:numref:`figure_prefetch_pipeline` illustrates a pipelined implementation for the enqueue operation with 4 pipeline stages and each stage executing 2 different input packets.
 No input packet can be part of more than one pipeline stage at a given time.
 
-.. _pg_figure_25:
+.. _figure_prefetch_pipeline:
 
-**Figure 25. Prefetch Pipeline for the Hierarchical Scheduler Enqueue Operation**
+.. figure:: img/prefetch_pipeline.*
 
-.. image51 has been renamed
+   Prefetch Pipeline for the Hierarchical Scheduler Enqueue Operation
 
-|prefetch_pipeline|
 
 The congestion management scheme implemented by the enqueue pipeline described above is very basic:
 packets are enqueued until a specific queue becomes full,
@@ -478,13 +473,13 @@  The dequeue pipe state machine exploits the data presence into the processor cac
 therefore it tries to send as many packets from the same pipe TC and pipe as possible (up to the available packets and credits) before
 moving to the next active TC from the same pipe (if any) or to another active pipe.
 
-.. _pg_figure_26:
+.. _figure_pipe_prefetch_sm:
 
-**Figure 26. Pipe Prefetch State Machine for the Hierarchical Scheduler Dequeue Operation**
+.. figure:: img/pipe_prefetch_sm.*
 
-.. image52 has been renamed
+   Pipe Prefetch State Machine for the Hierarchical Scheduler Dequeue
+   Operation
 
-|pipe_prefetch_sm|
 
 Timing and Synchronization
 ^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -1173,17 +1168,16 @@  Dropper
 The purpose of the DPDK dropper is to drop packets arriving at a packet scheduler to avoid congestion.
 The dropper supports the Random Early Detection (RED),
 Weighted Random Early Detection (WRED) and tail drop algorithms.
-Figure 1 illustrates how the dropper integrates with the scheduler.
+:numref:`figure_blk_diag_dropper` illustrates how the dropper integrates with the scheduler.
 The DPDK currently does not support congestion management
 so the dropper provides the only method for congestion avoidance.
 
-.. _pg_figure_27:
+.. _figure_blk_diag_dropper:
 
-**Figure 27. High-level Block Diagram of the DPDK Dropper**
+.. figure:: img/blk_diag_dropper.*
 
-.. image53_png has been renamed
+   High-level Block Diagram of the DPDK Dropper
 
-|blk_diag_dropper|
 
 The dropper uses the Random Early Detection (RED) congestion avoidance algorithm as documented in the reference publication.
 The purpose of the RED algorithm is to monitor a packet queue,
@@ -1202,16 +1196,15 @@  In the case of severe congestion, the dropper resorts to tail drop.
 This occurs when a packet queue has reached maximum capacity and cannot store any more packets.
 In this situation, all arriving packets are dropped.
 
-The flow through the dropper is illustrated in Figure 28.
+The flow through the dropper is illustrated in :numref:`figure_flow_tru_droppper`.
 The RED/WRED algorithm is exercised first and tail drop second.
 
-.. _pg_figure_28:
+.. _figure_flow_tru_droppper:
 
-**Figure 28. Flow Through the Dropper**
+.. figure:: img/flow_tru_droppper.*
 
-..  image54_png has been renamed
+   Flow Through the Dropper
 
-|flow_tru_droppper|
 
 The use cases supported by the dropper are:
 
@@ -1270,17 +1263,16 @@  for example, a filter weight parameter value of 9 corresponds to a filter weight
 Enqueue Operation
 ~~~~~~~~~~~~~~~~~
 
-In the example shown in Figure 29, q (actual queue size) is the input value,
+In the example shown in :numref:`figure_ex_data_flow_tru_dropper`, q (actual queue size) is the input value,
 avg (average queue size) and count (number of packets since the last drop) are run-time values,
 decision is the output value and the remaining values are configuration parameters.
 
-.. _pg_figure_29:
+.. _figure_ex_data_flow_tru_dropper:
 
-**Figure 29. Example Data Flow Through Dropper**
+.. figure:: img/ex_data_flow_tru_dropper.*
 
-.. image55_png has been renamed
+   Example Data Flow Through Dropper
 
-|ex_data_flow_tru_dropper|
 
 EWMA Filter Microblock
 ^^^^^^^^^^^^^^^^^^^^^^
@@ -1298,11 +1290,7 @@  Average Queue Size Calculation when the Queue is not Empty
 
 The definition of the EWMA filter is given in the following equation.
 
-**Equation 1.**
-
-.. image56_png has been renamed
-
-|ewma_filter_eq_1|
+.. image:: img/ewma_filter_eq_1.*
 
 Where:
 
@@ -1326,11 +1314,7 @@  When the queue becomes empty, average queue size should decay gradually to zero
 or remaining stagnant at the last computed value.
 When a packet is enqueued on an empty queue, the average queue size is computed using the following formula:
 
-**Equation 2.**
-
-.. image57_png has been renamed
-
-|ewma_filter_eq_2|
+.. image:: img/ewma_filter_eq_2.*
 
 Where:
 
@@ -1338,9 +1322,7 @@  Where:
 
 In the dropper module, *m* is defined as:
 
-.. image58_png has been renamed
-
-|m_definition|
+.. image:: img/m_definition.*
 
 Where:
 
@@ -1374,15 +1356,13 @@  A numerical method is used to compute the factor (1-wq)^m that appears in Equati
 
 This method is based on the following identity:
 
-.. image59_png has been renamed
+.. image:: img/eq2_factor.*
 
-|eq2_factor|
 
 This allows us to express the following:
 
-.. image60_png has been renamed
+.. image:: img/eq2_expression.*
 
-|eq2_expression|
 
 In the dropper module, a look-up table is used to compute log2(1-wq) for each value of wq supported by the dropper module.
 The factor (1-wq)^m can then be obtained by multiplying the table value by *m* and applying shift operations.
@@ -1465,11 +1445,7 @@  Initial Packet Drop Probability
 
 The initial drop probability is calculated using the following equation.
 
-**Equation 3.**
-
-.. image61_png has been renamed
-
-|drop_probability_eq3|
+.. image:: img/drop_probability_eq3.*
 
 Where:
 
@@ -1481,19 +1457,18 @@  Where:
 
 *   *maxth*  = maximum threshold
 
-The calculation of the packet drop probability using Equation 3 is illustrated in Figure 30.
+The calculation of the packet drop probability using Equation 3 is illustrated in :numref:`figure_pkt_drop_probability`.
 If the average queue size is below the minimum threshold, an arriving packet is enqueued.
 If the average queue size is at or above the maximum threshold, an arriving packet is dropped.
 If the average queue size is between the minimum and maximum thresholds,
 a drop probability is calculated to determine if the packet should be enqueued or dropped.
 
-.. _pg_figure_30:
+.. _figure_pkt_drop_probability:
 
-**Figure 30. Packet Drop Probability for a Given RED Configuration**
+.. figure:: img/pkt_drop_probability.*
 
-.. image62_png has been renamed
+   Packet Drop Probability for a Given RED Configuration
 
-|pkt_drop_probability|
 
 Actual Drop Probability
 """""""""""""""""""""""
@@ -1501,11 +1476,7 @@  Actual Drop Probability
 If the average queue size is between the minimum and maximum thresholds,
 then the actual drop probability is calculated from the following equation.
 
-**Equation 4.**
-
-.. image63_png has been renamed
-
-|drop_probability_eq4|
+.. image:: img/drop_probability_eq4.*
 
 Where:
 
@@ -1518,7 +1489,7 @@  given in the reference document where a value of 1 is used instead.
 It should be noted that the value pa computed from can be negative or greater than 1.
 If this is the case, then a value of 1 should be used instead.
 
-The initial and actual drop probabilities are shown in Figure 31.
+The initial and actual drop probabilities are shown in :numref:`figure_drop_probability_graph`.
 The actual drop probability is shown for the case where
 the formula given in the reference document1 is used (blue curve)
 and also for the case where the formula implemented in the dropper module,
@@ -1528,13 +1499,13 @@  compared to the mark probability configuration parameter specified by the user.
 The choice to deviate from the reference document is simply a design decision and
 one that has been taken by other RED implementations, for example, FreeBSD* ALTQ RED.
 
-.. _pg_figure_31:
+.. _figure_drop_probability_graph:
 
-**Figure 31. Initial Drop Probability (pb), Actual Drop probability (pa) Computed Using a Factor 1 (Blue Curve) and a Factor 2 (Red Curve)**
+.. figure:: img/drop_probability_graph.*
 
-.. image64_png has been renamed
+   Initial Drop Probability (pb), Actual Drop probability (pa) Computed Using
+   a Factor 1 (Blue Curve) and a Factor 2 (Red Curve)
 
-|drop_probability_graph|
 
 .. _Queue_Empty_Operation:
 
@@ -1727,39 +1698,3 @@  For each input packet, the steps for the srTCM / trTCM algorithms are:
     the input color of the packet is also considered.
     When the output color is not red, a number of tokens equal to the length of the IP packet are
     subtracted from the C or E /P or both buckets, depending on the algorithm and the output color of the packet.
-
-.. |flow_tru_droppper| image:: img/flow_tru_droppper.*
-
-.. |drop_probability_graph| image:: img/drop_probability_graph.*
-
-.. |drop_probability_eq3| image:: img/drop_probability_eq3.*
-
-.. |eq2_expression| image:: img/eq2_expression.*
-
-.. |drop_probability_eq4| image:: img/drop_probability_eq4.*
-
-.. |pkt_drop_probability| image:: img/pkt_drop_probability.*
-
-.. |pkt_proc_pipeline_qos| image:: img/pkt_proc_pipeline_qos.*
-
-.. |ex_data_flow_tru_dropper| image:: img/ex_data_flow_tru_dropper.*
-
-.. |ewma_filter_eq_1| image:: img/ewma_filter_eq_1.*
-
-.. |ewma_filter_eq_2| image:: img/ewma_filter_eq_2.*
-
-.. |data_struct_per_port| image:: img/data_struct_per_port.*
-
-.. |prefetch_pipeline| image:: img/prefetch_pipeline.*
-
-.. |pipe_prefetch_sm| image:: img/pipe_prefetch_sm.*
-
-.. |blk_diag_dropper| image:: img/blk_diag_dropper.*
-
-.. |m_definition| image:: img/m_definition.*
-
-.. |eq2_factor| image:: img/eq2_factor.*
-
-.. |sched_hier_per_port| image:: img/sched_hier_per_port.*
-
-.. |hier_sched_blk| image:: img/hier_sched_blk.*
diff --git a/doc/guides/prog_guide/ring_lib.rst b/doc/guides/prog_guide/ring_lib.rst
index 8547b38..3b92a8f 100644
--- a/doc/guides/prog_guide/ring_lib.rst
+++ b/doc/guides/prog_guide/ring_lib.rst
@@ -72,13 +72,12 @@  The disadvantages:
 
 A simplified representation of a Ring is shown in with consumer and producer head and tail pointers to objects stored in the data structure.
 
-.. _pg_figure_4:
+.. _figure_ring1:
 
-**Figure 4. Ring Structure**
+.. figure:: img/ring1.*
 
-.. image5_png has been replaced
+   Ring Structure
 
-|ring1|
 
 References for Ring Implementation in FreeBSD*
 ----------------------------------------------
@@ -155,9 +154,13 @@  The prod_next local variable points to the next element of the table, or several
 
 If there is not enough room in the ring (this is detected by checking cons_tail), it returns an error.
 
-.. image6_png has been replaced
 
-|ring-enqueue1|
+.. _figure_ring-enqueue1:
+
+.. figure:: img/ring-enqueue1.*
+
+   Enqueue first step
+
 
 Enqueue Second Step
 ^^^^^^^^^^^^^^^^^^^
@@ -166,9 +169,13 @@  The second step is to modify *ring->prod_head* in ring structure to point to the
 
 A pointer to the added object is copied in the ring (obj4).
 
-.. image7_png has been replaced
 
-|ring-enqueue2|
+.. _figure_ring-enqueue2:
+
+.. figure:: img/ring-enqueue2.*
+
+   Enqueue second step
+
 
 Enqueue Last Step
 ^^^^^^^^^^^^^^^^^
@@ -176,9 +183,13 @@  Enqueue Last Step
 Once the object is added in the ring, ring->prod_tail in the ring structure is modified to point to the same location as *ring->prod_head*.
 The enqueue operation is finished.
 
-.. image8_png has been replaced
 
-|ring-enqueue3|
+.. _figure_ring-enqueue3:
+
+.. figure:: img/ring-enqueue3.*
+
+   Enqueue last step
+
 
 Single Consumer Dequeue
 ~~~~~~~~~~~~~~~~~~~~~~~
@@ -196,9 +207,13 @@  The cons_next local variable points to the next element of the table, or several
 
 If there are not enough objects in the ring (this is detected by checking prod_tail), it returns an error.
 
-.. image9_png has been replaced
 
-|ring-dequeue1|
+.. _figure_ring-dequeue1:
+
+.. figure:: img/ring-dequeue1.*
+
+   Dequeue first step
+
 
 Dequeue Second Step
 ^^^^^^^^^^^^^^^^^^^
@@ -207,9 +222,13 @@  The second step is to modify ring->cons_head in the ring structure to point to t
 
 The pointer to the dequeued object (obj1) is copied in the pointer given by the user.
 
-.. image10_png has been replaced
 
-|ring-dequeue2|
+.. _figure_ring-dequeue2:
+
+.. figure:: img/ring-dequeue2.*
+
+   Dequeue second step
+
 
 Dequeue Last Step
 ^^^^^^^^^^^^^^^^^
@@ -217,9 +236,13 @@  Dequeue Last Step
 Finally, ring->cons_tail in the ring structure is modified to point to the same location as ring->cons_head.
 The dequeue operation is finished.
 
-.. image11_png has been replaced
 
-|ring-dequeue3|
+.. _figure_ring-dequeue3:
+
+.. figure:: img/ring-dequeue3.*
+
+   Dequeue last step
+
 
 Multiple Producers Enqueue
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -229,8 +252,8 @@  In this example, only the producer head and tail (prod_head and prod_tail) are m
 
 The initial state is to have a prod_head and prod_tail pointing at the same location.
 
-MC Enqueue First Step
-^^^^^^^^^^^^^^^^^^^^^
+Multiple Producer Enqueue First Step
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 On both cores, *ring->prod_head* and ring->cons_tail are copied in local variables.
 The prod_next local variable points to the next element of the table,
@@ -238,12 +261,16 @@  or several elements after in the case of bulk enqueue.
 
 If there is not enough room in the ring (this is detected by checking cons_tail), it returns an error.
 
-.. image12_png has been replaced
 
-|ring-mp-enqueue1|
+.. _figure_ring-mp-enqueue1:
+
+.. figure:: img/ring-mp-enqueue1.*
+
+   Multiple producer enqueue first step
+
 
-MC Enqueue Second Step
-^^^^^^^^^^^^^^^^^^^^^^
+Multiple Producer Enqueue Second Step
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 The second step is to modify ring->prod_head in the ring structure to point to the same location as prod_next.
 This operation is done using a Compare And Swap (CAS) instruction, which does the following operations atomically:
@@ -256,41 +283,57 @@  This operation is done using a Compare And Swap (CAS) instruction, which does th
 
 In the figure, the operation succeeded on core 1, and step one restarted on core 2.
 
-.. image13_png has been replaced
 
-|ring-mp-enqueue2|
+.. _figure_ring-mp-enqueue2:
 
-MC Enqueue Third Step
-^^^^^^^^^^^^^^^^^^^^^
+.. figure:: img/ring-mp-enqueue2.*
+
+   Multiple producer enqueue second step
+
+
+Multiple Producer Enqueue Third Step
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 The CAS operation is retried on core 2 with success.
 
 The core 1 updates one element of the ring(obj4), and the core 2 updates another one (obj5).
 
-.. image14_png has been replaced
 
-|ring-mp-enqueue3|
+.. _figure_ring-mp-enqueue3:
+
+.. figure:: img/ring-mp-enqueue3.*
+
+   Multiple producer enqueue third step
+
 
-MC Enqueue Fourth Step
-^^^^^^^^^^^^^^^^^^^^^^
+Multiple Producer Enqueue Fourth Step
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 Each core now wants to update ring->prod_tail.
 A core can only update it if ring->prod_tail is equal to the prod_head local variable.
 This is only true on core 1. The operation is finished on core 1.
 
-.. image15_png has been replaced
 
-|ring-mp-enqueue4|
+.. _figure_ring-mp-enqueue4:
 
-MC Enqueue Last Step
-^^^^^^^^^^^^^^^^^^^^
+.. figure:: img/ring-mp-enqueue4.*
+
+   Multiple producer enqueue fourth step
+
+
+Multiple Producer Enqueue Last Step
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 Once ring->prod_tail is updated by core 1, core 2 is allowed to update it too.
 The operation is also finished on core 2.
 
-.. image16_png has been replaced
 
-|ring-mp-enqueue5|
+.. _figure_ring-mp-enqueue5:
+
+.. figure:: img/ring-mp-enqueue5.*
+
+   Multiple producer enqueue last step
+
 
 Modulo 32-bit Indexes
 ~~~~~~~~~~~~~~~~~~~~~
@@ -309,15 +352,23 @@  The following are two examples that help to explain how indexes are used in a ri
     In addition, the four indexes are defined as unsigned 16-bit integers,
     as opposed to unsigned 32-bit integers in the more realistic case.
 
-.. image17_png has been replaced
 
-|ring-modulo1|
+.. _figure_ring-modulo1:
+
+.. figure:: img/ring-modulo1.*
+
+   Modulo 32-bit indexes - Example 1
+
 
 This ring contains 11000 entries.
 
-.. image18_png has been replaced
 
-|ring-modulo2|
+.. _figure_ring-modulo2:
+
+.. figure:: img/ring-modulo2.*
+
+   Modulo 32-bit indexes - Example 2
+
 
 This ring contains 12536 entries.
 
@@ -346,31 +397,3 @@  References
     *   `bufring.c in FreeBSD <http://svn.freebsd.org/viewvc/base/release/8.0.0/sys/kern/subr_bufring.c?revision=199625&amp;view=markup>`_ (version 8)
 
     *   `Linux Lockless Ring Buffer Design <http://lwn.net/Articles/340400/>`_
-
-.. |ring1| image:: img/ring1.*
-
-.. |ring-enqueue1| image:: img/ring-enqueue1.*
-
-.. |ring-enqueue2| image:: img/ring-enqueue2.*
-
-.. |ring-enqueue3| image:: img/ring-enqueue3.*
-
-.. |ring-dequeue1| image:: img/ring-dequeue1.*
-
-.. |ring-dequeue2| image:: img/ring-dequeue2.*
-
-.. |ring-dequeue3| image:: img/ring-dequeue3.*
-
-.. |ring-mp-enqueue1| image:: img/ring-mp-enqueue1.*
-
-.. |ring-mp-enqueue2| image:: img/ring-mp-enqueue2.*
-
-.. |ring-mp-enqueue3| image:: img/ring-mp-enqueue3.*
-
-.. |ring-mp-enqueue4| image:: img/ring-mp-enqueue4.*
-
-.. |ring-mp-enqueue5| image:: img/ring-mp-enqueue5.*
-
-.. |ring-modulo1| image:: img/ring-modulo1.*
-
-.. |ring-modulo2| image:: img/ring-modulo2.*
diff --git a/doc/guides/sample_app_ug/dist_app.rst b/doc/guides/sample_app_ug/dist_app.rst
index bcff0dd..25844e0 100644
--- a/doc/guides/sample_app_ug/dist_app.rst
+++ b/doc/guides/sample_app_ug/dist_app.rst
@@ -47,11 +47,12 @@  into each other.
 This application can be used to benchmark performance using the traffic
 generator as shown in the figure below.
 
-.. _figure_22:
+.. _figure_dist_perf:
 
-**Figure 22. Performance Benchmarking Setup (Basic Environment)**
+.. figure:: img/dist_perf.*
+
+   Performance Benchmarking Setup (Basic Environment)
 
-|dist_perf|
 
 Compiling the Application
 -------------------------
@@ -106,7 +107,7 @@  Explanation
 The distributor application consists of three types of threads: a receive
 thread (lcore_rx()), a set of worker threads(locre_worker())
 and a transmit thread(lcore_tx()). How these threads work together is shown
-in Fig2 below. The main() function launches  threads of these three types.
+in :numref:`figure_dist_app` below. The main() function launches threads of these three types.
 Each thread has a while loop which will be doing processing and which is
 terminated only upon SIGINT or ctrl+C. The receive and transmit threads
 communicate using a software ring (rte_ring structure).
@@ -136,11 +137,12 @@  Users who wish to terminate the running of the application have to press ctrl+C
 in the application will terminate all running threads gracefully and print
 final statistics to the user.
 
-.. _figure_23:
+.. _figure_dist_app:
+
+.. figure:: img/dist_app.*
 
-**Figure 23. Distributor Sample Application Layout**
+   Distributor Sample Application Layout
 
-|dist_app|
 
 Debug Logging Support
 ---------------------
@@ -171,7 +173,3 @@  Sample Application. See Section 9.4.4, "RX Queue Initialization".
 
 TX queue initialization is done in the same way as it is done in the L2 Forwarding
 Sample Application. See Section 9.4.5, "TX Queue Initialization".
-
-.. |dist_perf| image:: img/dist_perf.*
-
-.. |dist_app| image:: img/dist_app.*
diff --git a/doc/guides/sample_app_ug/exception_path.rst b/doc/guides/sample_app_ug/exception_path.rst
index 6c06959..3cc7cbe 100644
--- a/doc/guides/sample_app_ug/exception_path.rst
+++ b/doc/guides/sample_app_ug/exception_path.rst
@@ -46,13 +46,12 @@  The second thread reads from a TAP interface and writes the data unmodified to t
 
 The packet flow through the exception path application is as shown in the following figure.
 
-.. _figure_1:
+.. _figure_exception_path_example:
 
-**Figure 1. Packet Flow**
+.. figure:: img/exception_path_example.*
 
-.. image2_png has been replaced
+   Packet Flow
 
-|exception_path_example|
 
 To make throughput measurements, kernel bridges must be setup to forward data between the bridges appropriately.
 
@@ -327,4 +326,3 @@  To remove bridges and persistent TAP interfaces, the following commands are used
     brctl delbr br0
     openvpn --rmtun --dev tap_dpdk_00
 
-.. |exception_path_example| image:: img/exception_path_example.*
diff --git a/doc/guides/sample_app_ug/index.rst b/doc/guides/sample_app_ug/index.rst
index aaa95ef..745a7ac 100644
--- a/doc/guides/sample_app_ug/index.rst
+++ b/doc/guides/sample_app_ug/index.rst
@@ -74,57 +74,63 @@  Sample Applications User Guide
 
 **Figures**
 
-:ref:`Figure 1.Packet Flow <figure_1>`
+:numref:`figure_exception_path_example` :ref:`figure_exception_path_example`
 
-:ref:`Figure 2.Kernel NIC Application Packet Flow <figure_2>`
+:numref:`figure_kernel_nic` :ref:`figure_kernel_nic`
 
-:ref:`Figure 3.Performance Benchmark Setup (Basic Environment) <figure_3>`
+:numref:`figure_l2_fwd_benchmark_setup_jobstats` :ref:`figure_l2_fwd_benchmark_setup_jobstats`
 
-:ref:`Figure 4.Performance Benchmark Setup (Virtualized Environment) <figure_4>`
+:numref:`figure_l2_fwd_virtenv_benchmark_setup_jobstats` :ref:`figure_l2_fwd_virtenv_benchmark_setup_jobstats`
 
-:ref:`Figure 5.Load Balancer Application Architecture <figure_5>`
+:numref:`figure_l2_fwd_benchmark_setup` :ref:`figure_l2_fwd_benchmark_setup`
 
-:ref:`Figure 5.Example Rules File <figure_5_1>`
+:numref:`figure_l2_fwd_virtenv_benchmark_setup` :ref:`figure_l2_fwd_virtenv_benchmark_setup`
 
-:ref:`Figure 6.Example Data Flow in a Symmetric Multi-process Application <figure_6>`
+:numref:`figure_ipv4_acl_rule` :ref:`figure_ipv4_acl_rule`
 
-:ref:`Figure 7.Example Data Flow in a Client-Server Symmetric Multi-process Application <figure_7>`
+:numref:`figure_example_rules` :ref:`figure_example_rules`
 
-:ref:`Figure 8.Master-slave Process Workflow <figure_8>`
+:numref:`figure_load_bal_app_arch` :ref:`figure_load_bal_app_arch`
 
-:ref:`Figure 9.Slave Process Recovery Process Flow <figure_9>`
+:numref:`figure_sym_multi_proc_app` :ref:`figure_sym_multi_proc_app`
 
-:ref:`Figure 10.QoS Scheduler Application Architecture <figure_10>`
+:numref:`figure_client_svr_sym_multi_proc_app` :ref:`figure_client_svr_sym_multi_proc_app`
 
-:ref:`Figure 11.Intel®QuickAssist Technology Application Block Diagram <figure_11>`
+:numref:`figure_master_slave_proc` :ref:`figure_master_slave_proc`
 
-:ref:`Figure 12.Pipeline Overview <figure_12>`
+:numref:`figure_slave_proc_recov` :ref:`figure_slave_proc_recov`
 
-:ref:`Figure 13.Ring-based Processing Pipeline Performance Setup <figure_13>`
+:numref:`figure_qos_sched_app_arch` :ref:`figure_qos_sched_app_arch`
 
-:ref:`Figure 14.Threads and Pipelines <figure_14>`
+:numref:`figure_quickassist_block_diagram` :ref:`figure_quickassist_block_diagram`
 
-:ref:`Figure 15.Packet Flow Through the VMDQ and DCB Sample Application <figure_15>`
+:numref:`figure_pipeline_overview` :ref:`figure_pipeline_overview`
 
-:ref:`Figure 16.QEMU Virtio-net (prior to vhost-net) <figure_16>`
+:numref:`figure_ring_pipeline_perf_setup` :ref:`figure_ring_pipeline_perf_setup`
 
-:ref:`Figure 17.Virtio with Linux* Kernel Vhost <figure_17>`
+:numref:`figure_threads_pipelines` :ref:`figure_threads_pipelines`
 
-:ref:`Figure 18.Vhost-net Architectural Overview <figure_18>`
+:numref:`figure_vmdq_dcb_example` :ref:`figure_vmdq_dcb_example`
 
-:ref:`Figure 19.Packet Flow Through the vhost-net Sample Application <figure_19>`
+:numref:`figure_qemu_virtio_net` :ref:`figure_qemu_virtio_net`
 
-:ref:`Figure 20.Packet Flow on TX in DPDK-testpmd <figure_20>`
+:numref:`figure_virtio_linux_vhost` :ref:`figure_virtio_linux_vhost`
 
-:ref:`Figure 21.Test Pipeline Application <figure_21>`
+:numref:`figure_vhost_net_arch` :ref:`figure_vhost_net_arch`
 
-:ref:`Figure 22.Performance Benchmarking Setup (Basic Environment) <figure_22>`
+:numref:`figure_vhost_net_sample_app` :ref:`figure_vhost_net_sample_app`
 
-:ref:`Figure 23.Distributor Sample Application Layout <figure_23>`
+:numref:`figure_tx_dpdk_testpmd` :ref:`figure_tx_dpdk_testpmd`
 
-:ref:`Figure 24.High level Solution <figure_24>`
+:numref:`figure_test_pipeline_app` :ref:`figure_test_pipeline_app`
 
-:ref:`Figure 25.VM request to scale frequency <figure_25>`
+:numref:`figure_dist_perf` :ref:`figure_dist_perf`
+
+:numref:`figure_dist_app` :ref:`figure_dist_app`
+
+:numref:`figure_vm_power_mgr_highlevel` :ref:`figure_vm_power_mgr_highlevel`
+
+:numref:`figure_vm_power_mgr_vm_request_seq` :ref:`figure_vm_power_mgr_vm_request_seq`
 
 **Tables**
 
diff --git a/doc/guides/sample_app_ug/intel_quickassist.rst b/doc/guides/sample_app_ug/intel_quickassist.rst
index 7f55282..a80d4ca 100644
--- a/doc/guides/sample_app_ug/intel_quickassist.rst
+++ b/doc/guides/sample_app_ug/intel_quickassist.rst
@@ -46,17 +46,16 @@  For this sample application, there is a dependency on either of:
 Overview
 --------
 
-An overview of the application is provided in Figure 11.
+An overview of the application is provided in :numref:`figure_quickassist_block_diagram`.
 For simplicity, only two NIC ports and one Intel® QuickAssist Technology device are shown in this diagram,
 although the number of NIC ports and Intel® QuickAssist Technology devices can be different.
 
-.. _figure_11:
+.. _figure_quickassist_block_diagram:
 
-**Figure 11. Intel® QuickAssist Technology Application Block Diagram**
+.. figure:: img/quickassist_block_diagram.*
 
-.. image14_png has been renamed
+   Intel® QuickAssist Technology Application Block Diagram
 
-|quickassist_block_diagram|
 
 The application allows the configuration of the following items:
 
@@ -220,5 +219,3 @@  performing AES-CBC-128 encryption with AES-XCBC-MAC-96 hash, the following setti
 
 Refer to the *DPDK Test Report* for more examples of traffic generator setup and the application startup command lines.
 If no errors are generated in response to the startup commands, the application is running correctly.
-
-.. |quickassist_block_diagram| image:: img/quickassist_block_diagram.*
diff --git a/doc/guides/sample_app_ug/kernel_nic_interface.rst b/doc/guides/sample_app_ug/kernel_nic_interface.rst
index d6876e2..02dde59 100644
--- a/doc/guides/sample_app_ug/kernel_nic_interface.rst
+++ b/doc/guides/sample_app_ug/kernel_nic_interface.rst
@@ -71,13 +71,12 @@  it is just for performance testing, or it can work together with VMDq support in
 
 The packet flow through the Kernel NIC Interface application is as shown in the following figure.
 
-.. _figure_2:
+.. _figure_kernel_nic:
 
-**Figure 2. Kernel NIC Application Packet Flow**
+.. figure:: img/kernel_nic.*
 
-.. image3_png has been renamed to kernel_nic.*
+   Kernel NIC Application Packet Flow
 
-|kernel_nic|
 
 Compiling the Application
 -------------------------
@@ -616,5 +615,3 @@  Currently, setting a new MTU and configuring the network interface (up/ down) ar
             RTE_LOG(ERR, APP, "Failed to start port %d\n", port_id);
         return ret;
     }
-
-.. |kernel_nic| image:: img/kernel_nic.*
diff --git a/doc/guides/sample_app_ug/l2_forward_job_stats.rst b/doc/guides/sample_app_ug/l2_forward_job_stats.rst
index eafb8df..b588faa 100644
--- a/doc/guides/sample_app_ug/l2_forward_job_stats.rst
+++ b/doc/guides/sample_app_ug/l2_forward_job_stats.rst
@@ -55,27 +55,24 @@  Also, the MAC addresses are affected as follows:
 
 *   The destination MAC address is replaced by  02:00:00:00:00:TX_PORT_ID
 
-This application can be used to benchmark performance using a traffic-generator, as shown in the Figure 3.
+This application can be used to benchmark performance using a traffic generator, as shown in :numref:`figure_l2_fwd_benchmark_setup_jobstats`.
 
-The application can also be used in a virtualized environment as shown in Figure 4.
+The application can also be used in a virtualized environment as shown in :numref:`figure_l2_fwd_virtenv_benchmark_setup_jobstats`.
 
 The L2 Forwarding application can also be used as a starting point for developing a new application based on the DPDK.
 
-.. _figure_3:
+.. _figure_l2_fwd_benchmark_setup_jobstats:
 
-**Figure 3. Performance Benchmark Setup (Basic Environment)**
+.. figure:: img/l2_fwd_benchmark_setup.*
 
-.. image4_png has been replaced
+   Performance Benchmark Setup (Basic Environment)
 
-|l2_fwd_benchmark_setup|
+.. _figure_l2_fwd_virtenv_benchmark_setup_jobstats:
 
-.. _figure_4:
+.. figure:: img/l2_fwd_virtenv_benchmark_setup.*
 
-**Figure 4. Performance Benchmark Setup (Virtualized Environment)**
+   Performance Benchmark Setup (Virtualized Environment)
 
-.. image5_png has been renamed
-
-|l2_fwd_virtenv_benchmark_setup|
 
 Virtual Function Setup Instructions
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -631,7 +628,3 @@  however it improves performance:
          * in which it was called. */
         rte_jobstats_finish(&qconf->flush_job, qconf->flush_job.target);
     }
-
-.. |l2_fwd_benchmark_setup| image:: img/l2_fwd_benchmark_setup.*
-
-.. |l2_fwd_virtenv_benchmark_setup| image:: img/l2_fwd_virtenv_benchmark_setup.*
diff --git a/doc/guides/sample_app_ug/l2_forward_real_virtual.rst b/doc/guides/sample_app_ug/l2_forward_real_virtual.rst
index 234d71d..9334e75 100644
--- a/doc/guides/sample_app_ug/l2_forward_real_virtual.rst
+++ b/doc/guides/sample_app_ug/l2_forward_real_virtual.rst
@@ -54,27 +54,25 @@  Also, the MAC addresses are affected as follows:
 
 *   The destination MAC address is replaced by  02:00:00:00:00:TX_PORT_ID
 
-This application can be used to benchmark performance using a traffic-generator, as shown in the Figure 3.
+This application can be used to benchmark performance using a traffic generator, as shown in :numref:`figure_l2_fwd_benchmark_setup`.
 
-The application can also be used in a virtualized environment as shown in Figure 4.
+The application can also be used in a virtualized environment as shown in :numref:`figure_l2_fwd_virtenv_benchmark_setup`.
 
 The L2 Forwarding application can also be used as a starting point for developing a new application based on the DPDK.
 
-.. _figure_3:
+.. _figure_l2_fwd_benchmark_setup:
 
-**Figure 3. Performance Benchmark Setup (Basic Environment)**
+.. figure:: img/l2_fwd_benchmark_setup.*
 
-.. image4_png has been replaced
+   Performance Benchmark Setup (Basic Environment)
 
-|l2_fwd_benchmark_setup|
 
-.. _figure_4:
+.. _figure_l2_fwd_virtenv_benchmark_setup:
 
-**Figure 4. Performance Benchmark Setup (Virtualized Environment)**
+.. figure:: img/l2_fwd_virtenv_benchmark_setup.*
 
-.. image5_png has been renamed
+   Performance Benchmark Setup (Virtualized Environment)
 
-|l2_fwd_virtenv_benchmark_setup|
 
 Virtual Function Setup Instructions
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -526,7 +524,3 @@  however it improves performance:
 
         prev_tsc = cur_tsc;
     }
-
-.. |l2_fwd_benchmark_setup| image:: img/l2_fwd_benchmark_setup.*
-
-.. |l2_fwd_virtenv_benchmark_setup| image:: img/l2_fwd_virtenv_benchmark_setup.*
diff --git a/doc/guides/sample_app_ug/l3_forward_access_ctrl.rst b/doc/guides/sample_app_ug/l3_forward_access_ctrl.rst
index 73fa4df..dbf47c7 100644
--- a/doc/guides/sample_app_ug/l3_forward_access_ctrl.rst
+++ b/doc/guides/sample_app_ug/l3_forward_access_ctrl.rst
@@ -142,9 +142,13 @@  Other lines types are considered invalid.
 
 *   A typical IPv4 ACL rule line should have a format as shown below:
 
-.. image6_png has been renamed
 
-|ipv4_acl_rule|
+.. _figure_ipv4_acl_rule:
+
+.. figure:: img/ipv4_acl_rule.*
+
+   A typical IPv4 ACL rule
+
 
 IPv4 addresses are specified in CIDR format as specified in RFC 4632.
 They consist of the dot notation for the address and a prefix length separated by '/'.
@@ -164,15 +168,12 @@  For example: 6/0xfe matches protocol values 6 and 7.
 Rules File Example
 ~~~~~~~~~~~~~~~~~~
 
-.. _figure_5_1:
+.. _figure_example_rules:
 
-Figure 5 is an example of a rules file. This file has three rules, one for ACL and two for route information.
+.. figure:: img/example_rules.*
 
-**Figure 5.Example Rules File**
+   Rules example
 
-.. image7_png has been renamed
-
-|example_rules|
 
 Each rule is explained as follows:
 
@@ -397,7 +398,3 @@  Finally, the application creates contexts handler from the ACL library,
 adds rules parsed from the file into the database and build an ACL trie.
 It is important to note that the application creates an independent copy of each database for each socket CPU
 involved in the task to reduce the time for remote memory access.
-
-.. |ipv4_acl_rule| image:: img/ipv4_acl_rule.*
-
-.. |example_rules| image:: img/example_rules.*
diff --git a/doc/guides/sample_app_ug/load_balancer.rst b/doc/guides/sample_app_ug/load_balancer.rst
index 6237633..857eb8a 100644
--- a/doc/guides/sample_app_ug/load_balancer.rst
+++ b/doc/guides/sample_app_ug/load_balancer.rst
@@ -44,13 +44,12 @@  Overview
 
 The architecture of the Load Balance application is presented in the following figure.
 
-.. _figure_5:
+.. _figure_load_bal_app_arch:
 
-**Figure 5. Load Balancer Application Architecture**
+.. figure:: img/load_bal_app_arch.*
 
-.. image8_png has been renamed
+   Load Balancer Application Architecture
 
-|load_bal_app_arch|
 
 For the sake of simplicity, the diagram illustrates a specific case of two I/O RX and two I/O TX lcores off loading the packet I/O
 overhead incurred by four NIC ports from four worker cores, with each I/O lcore handling RX/TX for two NIC ports.
@@ -241,5 +240,3 @@  are on the same or different CPU sockets, the following run-time scenarios are p
 #.  ABC: The packet is received on socket A, it is processed by an lcore on socket B,
     then it has to be transmitted out by a NIC connected to socket C.
     The performance price for crossing the CPU socket boundary is paid twice for this packet.
-
-.. |load_bal_app_arch| image:: img/load_bal_app_arch.*
diff --git a/doc/guides/sample_app_ug/multi_process.rst b/doc/guides/sample_app_ug/multi_process.rst
index 7ca71ca..f42cb9a 100644
--- a/doc/guides/sample_app_ug/multi_process.rst
+++ b/doc/guides/sample_app_ug/multi_process.rst
@@ -190,13 +190,12 @@  such as a client-server mode of operation seen in the next example,
 where different processes perform different tasks, yet co-operate to form a packet-processing system.)
 The following diagram shows the data-flow through the application, using two processes.
 
-.. _figure_6:
+.. _figure_sym_multi_proc_app:
 
-**Figure 6. Example Data Flow in a Symmetric Multi-process Application**
+.. figure:: img/sym_multi_proc_app.*
 
-.. image9_png has been renamed
+   Example Data Flow in a Symmetric Multi-process Application
 
-|sym_multi_proc_app|
 
 As the diagram shows, each process reads packets from each of the network ports in use.
 RSS is used to distribute incoming packets on each port to different hardware RX queues.
@@ -296,13 +295,12 @@  In this case, the client applications just perform level-2 forwarding of packets
 
 The following diagram shows the data-flow through the application, using two client processes.
 
-.. _figure_7:
+.. _figure_client_svr_sym_multi_proc_app:
 
-**Figure 7. Example Data Flow in a Client-Server Symmetric Multi-process Application**
+.. figure:: img/client_svr_sym_multi_proc_app.*
 
-.. image10_png has been renamed
+   Example Data Flow in a Client-Server Symmetric Multi-process Application
 
-|client_svr_sym_multi_proc_app|
 
 Running the Application
 ^^^^^^^^^^^^^^^^^^^^^^^
@@ -395,13 +393,12 @@  Once the master process begins to run, it tries to initialize all the resources
 memory, CPU cores, driver, ports, and so on, as the other examples do.
 Thereafter, it creates slave processes, as shown in the following figure.
 
-.. _figure_8:
+.. _figure_master_slave_proc:
 
-**Figure 8. Master-slave Process Workflow**
+.. figure:: img/master_slave_proc.*
 
-.. image11_png has been renamed
+   Master-slave Process Workflow
 
-|master_slave_proc|
 
 The master process calls the rte_eal_mp_remote_launch() EAL function to launch an application function for each pinned thread through the pipe.
 Then, it waits to check if any slave processes have exited.
@@ -475,13 +472,12 @@  Therefore, to provide the capability to resume the new slave instance if the pre
 
 The following diagram describes slave process recovery.
 
-.. _figure_9:
+.. _figure_slave_proc_recov:
 
-**Figure 9. Slave Process Recovery Process Flow**
+.. figure:: img/slave_proc_recov.*
 
-.. image12_png has been renamed
+   Slave Process Recovery Process Flow
 
-|slave_proc_recov|
 
 Floating Process Support
 ^^^^^^^^^^^^^^^^^^^^^^^^
@@ -774,11 +770,3 @@  so it remaps the resource to the new core ID slot.
         }
         return 0;
     }
-
-.. |sym_multi_proc_app| image:: img/sym_multi_proc_app.*
-
-.. |client_svr_sym_multi_proc_app| image:: img/client_svr_sym_multi_proc_app.*
-
-.. |master_slave_proc| image:: img/master_slave_proc.*
-
-.. |slave_proc_recov| image:: img/slave_proc_recov.*
diff --git a/doc/guides/sample_app_ug/qos_scheduler.rst b/doc/guides/sample_app_ug/qos_scheduler.rst
index 56326df..66c261c 100644
--- a/doc/guides/sample_app_ug/qos_scheduler.rst
+++ b/doc/guides/sample_app_ug/qos_scheduler.rst
@@ -38,13 +38,12 @@  Overview
 
 The architecture of the QoS scheduler application is shown in the following figure.
 
-.. _figure_10:
+.. _figure_qos_sched_app_arch:
 
-**Figure 10. QoS Scheduler Application Architecture**
+.. figure:: img/qos_sched_app_arch.*
 
-.. image13_png has been renamed
+   QoS Scheduler Application Architecture
 
-|qos_sched_app_arch|
 
 There are two flavors of the runtime execution for this application,
 with two or three threads per each packet flow configuration being used.
@@ -347,5 +346,3 @@  This application classifies based on the QinQ double VLAN tags and the IP destin
 +----------------+-------------------------+--------------------------------------------------+----------------------------------+
 
 Please refer to the "QoS Scheduler" chapter in the *DPDK Programmer's Guide* for more information about these parameters.
-
-.. |qos_sched_app_arch| image:: img/qos_sched_app_arch.*
diff --git a/doc/guides/sample_app_ug/quota_watermark.rst b/doc/guides/sample_app_ug/quota_watermark.rst
index e091ad9..de9e118 100644
--- a/doc/guides/sample_app_ug/quota_watermark.rst
+++ b/doc/guides/sample_app_ug/quota_watermark.rst
@@ -54,15 +54,14 @@  and ports 2 and 3 forward into each other.
 The MAC addresses of the forwarded Ethernet frames are not affected.
 
 Internally, packets are pulled from the ports by the master logical core and put on a variable length processing pipeline,
-each stage of which being connected by rings, as shown in Figure 12.
+with each stage connected by rings, as shown in :numref:`figure_pipeline_overview`.
 
-.. _figure_12:
+.. _figure_pipeline_overview:
 
-**Figure 12. Pipeline Overview**
+.. figure:: img/pipeline_overview.*
 
-.. image15_png has been renamed
+   Pipeline Overview
 
-|pipeline_overview|
 
 An adjustable quota value controls how many packets are being moved through the pipeline per enqueue and dequeue.
 Adjustable watermark values associated with the rings control a back-off mechanism that
@@ -79,15 +78,14 @@  eventually lead to an Ethernet flow control frame being send to the source.
 
 On top of serving as an example of quota and watermark usage,
 this application can be used to benchmark ring based processing pipelines performance using a traffic- generator,
-as shown in Figure 13.
+as shown in :numref:`figure_ring_pipeline_perf_setup`.
 
-.. _figure_13:
+.. _figure_ring_pipeline_perf_setup:
 
-**Figure 13. Ring-based Processing Pipeline Performance Setup**
+.. figure:: img/ring_pipeline_perf_setup.*
 
-.. image16_png has been renamed
+   Ring-based Processing Pipeline Performance Setup
 
-|ring_pipeline_perf_setup|
 
 Compiling the Application
 -------------------------
@@ -311,7 +309,7 @@  Logical Cores Assignment
 The application uses the master logical core to poll all the ports for new packets and enqueue them on a ring associated with the port.
 
 Each logical core except the last runs pipeline_stage() after a ring for each used port is initialized on that core.
-pipeline_stage() on core X dequeues packets from core X-1's rings and enqueue them on its own rings. See Figure 14.
+pipeline_stage() on core X dequeues packets from core X-1's rings and enqueues them on its own rings. See :numref:`figure_threads_pipelines`.
 
 .. code-block:: c
 
@@ -340,16 +338,12 @@  sending them out on the destination port setup by pair_ports().
 Receive, Process and Transmit Packets
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-.. _figure_14:
+.. _figure_threads_pipelines:
 
-Figure 14 shows where each thread in the pipeline is.
-It should be used as a reference while reading the rest of this section.
+.. figure:: img/threads_pipelines.*
 
-**Figure 14. Threads and Pipelines**
+   Threads and Pipelines
 
-.. image17_png has been renamed
-
-|threads_pipelines|
 
 In the receive_stage() function running on the master logical core,
 the main task is to read ingress packets from the RX ports and enqueue them
@@ -498,9 +492,3 @@  low_watermark from the rte_memzone previously created by qw.
 
         low_watermark = (unsigned int *) qw_memzone->addr + sizeof(int);
     }
-
-.. |pipeline_overview| image:: img/pipeline_overview.*
-
-.. |ring_pipeline_perf_setup| image:: img/ring_pipeline_perf_setup.*
-
-.. |threads_pipelines| image:: img/threads_pipelines.*
diff --git a/doc/guides/sample_app_ug/test_pipeline.rst b/doc/guides/sample_app_ug/test_pipeline.rst
index 0432942..fbc290c 100644
--- a/doc/guides/sample_app_ug/test_pipeline.rst
+++ b/doc/guides/sample_app_ug/test_pipeline.rst
@@ -49,13 +49,12 @@  The application uses three CPU cores:
 
 *   Core C ("TX core") receives traffic from core B through software queues and sends it to the NIC ports for transmission.
 
-.. _figure_21:
+.. _figure_test_pipeline_app:
 
-**Figure 21.Test Pipeline Application**
+.. figure:: img/test_pipeline_app.*
 
-.. image24_png has been renamed
+   Test Pipeline Application
 
-|test_pipeline_app|
 
 Compiling the Application
 -------------------------
@@ -281,5 +280,3 @@  The profile for input traffic is TCP/IPv4 packets with:
 *   destination TCP port fixed to 0
 
 *   source TCP port fixed to 0
-
-.. |test_pipeline_app| image:: img/test_pipeline_app.*
diff --git a/doc/guides/sample_app_ug/vhost.rst b/doc/guides/sample_app_ug/vhost.rst
index cd9b232..5c4b79d 100644
--- a/doc/guides/sample_app_ug/vhost.rst
+++ b/doc/guides/sample_app_ug/vhost.rst
@@ -48,13 +48,12 @@  between host and guest.
 It was found that virtio-net performance was poor due to context switching and packet copying between host, guest, and QEMU.
 The following figure shows the system architecture for a virtio-based networking (virtio-net).
 
-.. _figure_16:
+.. _figure_qemu_virtio_net:
 
-**Figure16. QEMU Virtio-net (prior to vhost-net)**
+.. figure:: img/qemu_virtio_net.*
 
-.. image19_png has been renamed
+   System Architecture for Virtio-based Networking (virtio-net)
 
-|qemu_virtio_net|
 
 The Linux* Kernel vhost-net module was developed as an offload mechanism for virtio-net.
 The vhost-net module enables KVM (QEMU) to offload the servicing of virtio-net devices to the vhost-net kernel module,
@@ -76,13 +75,12 @@  This is achieved by QEMU sharing the following information with the vhost-net mo
 
 The following figure shows the system architecture for virtio-net networking with vhost-net offload.
 
-.. _figure_17:
+.. _figure_virtio_linux_vhost:
 
-**Figure 17. Virtio with Linux* Kernel Vhost**
+.. figure:: img/virtio_linux_vhost.*
 
-.. image20_png has been renamed
+   Virtio with Linux* Kernel Vhost
 
-|virtio_linux_vhost|
 
 Sample Code Overview
 --------------------
@@ -119,23 +117,21 @@  The vhost sample code application is a simple packet switching application with
 
 The following figure shows the architecture of the Vhost sample application based on vhost-cuse.
 
-.. _figure_18:
+.. _figure_vhost_net_arch:
 
-**Figure 18. Vhost-net Architectural Overview**
+.. figure:: img/vhost_net_arch.*
 
-.. image21_png has been renamed
+   Vhost-net Architectural Overview
 
-|vhost_net_arch|
 
 The following figure shows the flow of packets through the vhost-net sample application.
 
-.. _figure_19:
+.. _figure_vhost_net_sample_app:
 
-**Figure 19. Packet Flow Through the vhost-net Sample Application**
+.. figure:: img/vhost_net_sample_app.*
 
-.. image22_png  has been renamed
+   Packet Flow Through the vhost-net Sample Application
 
-|vhost_net_sample_app|
 
 Supported Distributions
 -----------------------
@@ -794,13 +790,12 @@  In the "wait and retry" mode if the virtqueue is found to be full, then testpmd
 The "wait and retry" algorithm is implemented in DPDK testpmd as a forwarding method call "mac_retry".
 The following sequence diagram describes the algorithm in detail.
 
-.. _figure_20:
+.. _figure_tx_dpdk_testpmd:
 
-**Figure 20. Packet Flow on TX in DPDK-testpmd**
+.. figure:: img/tx_dpdk_testpmd.*
 
-.. image23_png has been renamed
+   Packet Flow on TX in DPDK-testpmd
 
-|tx_dpdk_testpmd|
 
 Running Testpmd
 ~~~~~~~~~~~~~~~
@@ -861,13 +856,3 @@  For example:
 The above message indicates that device 0 has been registered with MAC address cc:bb:bb:bb:bb:bb and VLAN tag 1000.
 Any packets received on the NIC with these values is placed on the devices receive queue.
 When a virtio-net device transmits packets, the VLAN tag is added to the packet by the DPDK vhost sample code.
-
-.. |vhost_net_arch| image:: img/vhost_net_arch.*
-
-.. |qemu_virtio_net| image:: img/qemu_virtio_net.*
-
-.. |tx_dpdk_testpmd| image:: img/tx_dpdk_testpmd.*
-
-.. |vhost_net_sample_app| image:: img/vhost_net_sample_app.*
-
-.. |virtio_linux_vhost| image:: img/virtio_linux_vhost.*
diff --git a/doc/guides/sample_app_ug/vm_power_management.rst b/doc/guides/sample_app_ug/vm_power_management.rst
index 2a923d8..81db6ad 100644
--- a/doc/guides/sample_app_ug/vm_power_management.rst
+++ b/doc/guides/sample_app_ug/vm_power_management.rst
@@ -74,11 +74,12 @@  The solution is comprised of two high-level components:
    The l3fwd-power application will use this implementation when deployed on a VM
    (see Chapter 11 "L3 Forwarding with Power Management Application").
 
-.. _figure_24:
+.. _figure_vm_power_mgr_highlevel:
 
-**Figure 24. Highlevel Solution**
+.. figure:: img/vm_power_mgr_highlevel.*
+
+   High-level Solution
 
-|vm_power_mgr_highlevel|
 
 Overview
 --------
@@ -105,11 +106,12 @@  at runtime based on the environment.
 Upon receiving a request, the host translates the vCPU to a pCPU via
 the libvirt API before forwarding to the host librte_power.
 
-.. _figure_25:
+.. _figure_vm_power_mgr_vm_request_seq:
+
+.. figure:: img/vm_power_mgr_vm_request_seq.*
 
-**Figure 25. VM request to scale frequency**
+   VM request to scale frequency
 
-|vm_power_mgr_vm_request_seq|
 
 Performance Considerations
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -355,7 +357,3 @@  Where {core_num} is the lcore and channel to change frequency by scaling up/down
 .. code-block:: console
 
   set_cpu_freq {core_num} up|down|min|max
-
-.. |vm_power_mgr_highlevel| image:: img/vm_power_mgr_highlevel.*
-
-.. |vm_power_mgr_vm_request_seq| image:: img/vm_power_mgr_vm_request_seq.*
diff --git a/doc/guides/sample_app_ug/vmdq_dcb_forwarding.rst b/doc/guides/sample_app_ug/vmdq_dcb_forwarding.rst
index e5d34e1..49ec6ce 100644
--- a/doc/guides/sample_app_ug/vmdq_dcb_forwarding.rst
+++ b/doc/guides/sample_app_ug/vmdq_dcb_forwarding.rst
@@ -53,7 +53,7 @@  All traffic is read from a single incoming port (port 0) and output on port 1, w
 The traffic is split into 128 queues on input, where each thread of the application reads from multiple queues.
 For example, when run with 8 threads, that is, with the -c FF option, each thread receives and forwards packets from 16 queues.
 
-As supplied, the sample application configures the VMDQ feature to have 16 pools with 8 queues each as indicated in Figure 15.
+As supplied, the sample application configures the VMDQ feature to have 16 pools with 8 queues each as indicated in :numref:`figure_vmdq_dcb_example`.
 The Intel® 82599 10 Gigabit Ethernet Controller NIC also supports the splitting of traffic into 32 pools of 4 queues each and
 this can be used by changing the NUM_POOLS parameter in the supplied code.
 The NUM_POOLS parameter can be passed on the command line, after the EAL parameters:
@@ -64,13 +64,12 @@  The NUM_POOLS parameter can be passed on the command line, after the EAL paramet
 
 where, NP can be 16 or 32.
 
-.. _figure_15:
+.. _figure_vmdq_dcb_example:
 
-**Figure 15. Packet Flow Through the VMDQ and DCB Sample Application**
+.. figure:: img/vmdq_dcb_example.*
 
-.. image18_png has been replaced
+   Packet Flow Through the VMDQ and DCB Sample Application
 
-|vmdq_dcb_example|
 
 In Linux* user space, the application can display statistics with the number of packets received on each queue.
 To have the application display the statistics, send a SIGHUP signal to the running application process, as follows:
@@ -247,5 +246,3 @@  To generate the statistics output, use the following command:
 
 Please note that the statistics output will appear on the terminal where the vmdq_dcb_app is running,
 rather than the terminal from which the HUP signal was sent.
-
-.. |vmdq_dcb_example| image:: img/vmdq_dcb_example.*
diff --git a/doc/guides/xen/pkt_switch.rst b/doc/guides/xen/pkt_switch.rst
index a9eca52..3a6fc47 100644
--- a/doc/guides/xen/pkt_switch.rst
+++ b/doc/guides/xen/pkt_switch.rst
@@ -52,9 +52,13 @@  The switching back end maps those grant table references and creates shared ring
 
 The following diagram describes the functionality of the DPDK Xen Packet- Switching Solution.
 
-.. image35_png has been renamed
 
-|dpdk_xen_pkt_switch|
+.. _figure_dpdk_xen_pkt_switch:
+
+.. figure:: img/dpdk_xen_pkt_switch.*
+
+   Functionality of the DPDK Xen Packet Switching Solution
+
 
 Note 1 The Xen hypervisor uses a mechanism called a Grant Table to share memory between domains
 (`http://wiki.xen.org/wiki/Grant Table <http://wiki.xen.org/wiki/Grant%20Table>`_).
@@ -62,9 +66,13 @@  Note 1 The Xen hypervisor uses a mechanism called a Grant Table to share memory
 A diagram of the design is shown below, where "gva" is the Guest Virtual Address,
 which is the data pointer of the mbuf, and "hva" is the Host Virtual Address:
 
-.. image36_png has been renamed
 
-|grant_table|
+.. _figure_grant_table:
+
+.. figure:: img/grant_table.*
+
+   DPDK Xen Layout
+
 
 In this design, a Virtio ring is used as a para-virtualized interface for better performance over a Xen private ring
 when packet switching to and from a VM.
@@ -139,9 +147,13 @@  Take idx#_mempool_gref node for example, the host maps those Grant references to
 The real Grant reference information is stored in this virtual address space,
 where (gref, pfn) pairs follow each other with -1 as the terminator.
 
-.. image37_pnng has been renamed
 
-|grant_refs|
+.. _figure_grant_refs:
+
+.. figure:: img/grant_refs.*
+
+   Mapping Grant references to a continuous virtual address space
+
 
 After all gref# IDs are retrieved, the host maps them to a continuous virtual address space.
 With the guest mempool virtual address, the host establishes 1:1 address mapping.
@@ -456,9 +468,3 @@  then sent out through hardware with destination MAC address 00:00:00:00:00:33.
 The packet flow is:
 
 packet generator->Virtio in guest VM1->switching backend->Virtio in guest VM2->switching backend->wire
-
-.. |grant_table| image:: img/grant_table.*
-
-.. |grant_refs| image:: img/grant_refs.*
-
-.. |dpdk_xen_pkt_switch| image:: img/dpdk_xen_pkt_switch.*