[RFC,19.11,1/2] ethdev: make DPDK core functions non-inline

Message ID 20190730124950.1293-2-marcinx.a.zapolski@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit
Series Hide DPDK internal struct from public API

Checks

Context               Check    Description
ci/Intel-compilation  success  Compilation OK
ci/checkpatch         warning  coding style issues

Commit Message

Marcin Zapolski July 30, 2019, 12:49 p.m. UTC
  Make rte_eth_rx_burst, rte_eth_tx_burst and other static inline ethdev
functions not inline. They are referencing DPDK internal structures and
inlining forces those structures to be exposed to user applications.

In internal testing with i40e NICs a performance drop of about 2% was
observed with testpmd.

Signed-off-by: Marcin Zapolski <marcinx.a.zapolski@intel.com>
---
 lib/librte_ethdev/rte_ethdev.c           | 168 +++++++++++++++++++++++
 lib/librte_ethdev/rte_ethdev.h           | 166 ++--------------------
 lib/librte_ethdev/rte_ethdev_version.map |  12 ++
 3 files changed, 195 insertions(+), 151 deletions(-)
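
To illustrate the trade-off described in the commit message, here is a minimal sketch that is not taken from the patch. It assumes hypothetical names (demo_dev, demo_rx_burst, demo_dev_attach) and a stand-in driver. The point is only that once the wrapper is an ordinary exported function, the application sees a forward declaration and a prototype, and the structure layout can stay private to the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ---- What a public header could contain (hypothetical names) ---- */
struct demo_dev;	/* opaque: the layout is not visible to applications */
uint16_t demo_rx_burst(uint16_t port_id, void **pkts, uint16_t nb_pkts);

/* ---- What stays inside the library's .c file ---- */
typedef uint16_t (*demo_rx_fn)(void *rxq, void **pkts, uint16_t nb_pkts);

struct demo_dev {
	demo_rx_fn rx_pkt_burst;	/* driver receive function */
	void *rxq;			/* driver queue state */
};

#define DEMO_MAX_PORTS 4
static struct demo_dev demo_devices[DEMO_MAX_PORTS];

/* Stand-in driver: reports that it filled every slot it was offered. */
static uint16_t
demo_null_rx(void *rxq, void **pkts, uint16_t nb_pkts)
{
	(void)rxq;
	(void)pkts;
	return nb_pkts;
}

/* Setup-time helper that attaches the stand-in driver to a port. */
void
demo_dev_attach(uint16_t port_id)
{
	assert(port_id < DEMO_MAX_PORTS);
	demo_devices[port_id].rx_pkt_burst = demo_null_rx;
	demo_devices[port_id].rxq = NULL;
}

/* Non-inline wrapper: the only code that touches struct demo_dev. */
uint16_t
demo_rx_burst(uint16_t port_id, void **pkts, uint16_t nb_pkts)
{
	struct demo_dev *dev = &demo_devices[port_id];

	return dev->rx_pkt_burst(dev->rxq, pkts, nb_pkts);
}
```

The cost is the extra call per burst that the thread goes on to discuss: the compiler can no longer inline the wrapper into the application's receive loop.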
  

Comments

Jerin Jacob Kollanukkaran July 30, 2019, 3:01 p.m. UTC | #1
> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Marcin Zapolski
> Sent: Tuesday, July 30, 2019 6:20 PM
> To: dev@dpdk.org
> Cc: Marcin Zapolski <marcinx.a.zapolski@intel.com>
> Subject: [dpdk-dev] [RFC 19.11 1/2] ethdev: make DPDK core functions non-inline
> 
> Make rte_eth_rx_burst, rte_eth_tx_burst and other static inline ethdev
> functions not inline. They are referencing DPDK internal structures and
> inlining forces those structures to be exposed to user applications.
> 
> In internal testing with i40e NICs a performance drop of about 2% was
> observed with testpmd.

I tested on two classes of arm64 machines (high-end and low-end): one shows a 1.4% drop
and the other a 3.6% drop.

I second not exposing internal data structures, to avoid ABI breaks.

IMO, this patch has a performance issue because it fixes the problem in the simplest way.

It is not worth paying function call overhead just to call the driver function.
Some thoughts below on reducing the performance impact without exposing internal
structures.

I also think we need to follow a similar mechanism for cryptodev, eventdev, rawdev,
etc., so a common scheme to address this would be useful.

> 
> Signed-off-by: Marcin Zapolski <marcinx.a.zapolski@intel.com>
> ---
>  lib/librte_ethdev/rte_ethdev.c           | 168 +++++++++++++++++++++++
>  lib/librte_ethdev/rte_ethdev.h           | 166 ++--------------------
>  lib/librte_ethdev/rte_ethdev_version.map |  12 ++
>  3 files changed, 195 insertions(+), 151 deletions(-)
> 
> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> index 17d183e1f..31432a956 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -749,6 +749,174 @@ rte_eth_dev_get_sec_ctx(uint16_t port_id)
>  	return rte_eth_devices[port_id].security_ctx;
>  }
> 
> +uint16_t
> +rte_eth_rx_burst(uint16_t port_id, uint16_t queue_id,
> +		 struct rte_mbuf **rx_pkts, const uint16_t nb_pkts) {
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	uint16_t nb_rx;

I think we only need to store 3 function pointers per port.
IMO, let's have a structure for that,

i.e. split the struct rte_eth_dev content into public and private parts.
I think we need only the following elements exposed from rte_eth_dev:
struct rte_eth_dev_fns {
        eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
        eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */
        eth_tx_prep_t tx_pkt_prepare; /**< Pointer to PMD transmit prepare function. */
};
struct rte_eth_dev {
	/* Keeping this as the first member allows a cast from struct rte_eth_dev to struct rte_eth_dev_fns. */
	struct rte_eth_dev_fns fns;
	/* ... private members ... */
};
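
A minimal, self-contained sketch of that public/private split follows. The sketch_ names and the simplified burst signature are assumptions made for illustration, not the real ethdev types. It relies on the C rule that a pointer to a structure, suitably converted, points to its first member, which is what makes the cast in both directions valid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the eth_rx_burst_t style of function pointer. */
typedef uint16_t (*sketch_burst_t)(void *queue, void **pkts, uint16_t nb_pkts);

/* Public part: only the fast-path entry points. */
struct sketch_eth_dev_fns {
	sketch_burst_t rx_pkt_burst;
	sketch_burst_t tx_pkt_burst;
	sketch_burst_t tx_pkt_prepare;
};

/* Private part: the full device. fns must remain the first member. */
struct sketch_eth_dev {
	struct sketch_eth_dev_fns fns;
	void *rx_queue;
	void *tx_queue;
};

/* Fail the build if someone moves fns away from offset 0. */
_Static_assert(offsetof(struct sketch_eth_dev, fns) == 0,
	       "fns must be the first member");

/* Library side: hand the application the public view of a device. */
struct sketch_eth_dev_fns *
sketch_public_view(struct sketch_eth_dev *dev)
{
	return (struct sketch_eth_dev_fns *)dev;
}

/* Library or driver side: recover the full device from the public view. */
struct sketch_eth_dev *
sketch_private_view(struct sketch_eth_dev_fns *fns)
{
	return (struct sketch_eth_dev *)fns;
}

/* Stand-in PMD receive function: accepts at most 4 packets per call. */
uint16_t
sketch_pmd_rx(void *queue, void **pkts, uint16_t nb_pkts)
{
	(void)queue;
	(void)pkts;
	return nb_pkts < 4 ? nb_pkts : 4;
}
```

With this layout an inline wrapper in the public header would need only struct sketch_eth_dev_fns, so the private members could change without affecting applications.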


> +
> +#ifdef RTE_LIBRTE_ETHDEV_DEBUG
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0);
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->rx_pkt_burst, 0);
> +
> +	if (queue_id >= dev->data->nb_rx_queues) {
> +		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", queue_id);
> +		return 0;
> +	}
> +#endif
> +	nb_rx = (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],

I think if we make the driver functions take the device, i.e. (*dev->rx_pkt_burst)(dev, rx_pkts, nb_pkts),
then there is no need to dereference data from the inline function.
Let's expose a helper function from the driver layer and let the PMD use it to access queue memory.
There is no need to expose that helper to the user application.
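
As a rough sketch of this idea, assuming hypothetical sketch_ names and a queue_id argument that the one-line signature above does not spell out, the driver function receives the device and resolves its own queue through a driver-layer helper:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_dev;	/* applications would only ever see this declaration */

/* Driver entry point takes the device, not a pre-resolved queue pointer. */
typedef uint16_t (*sketch_rx_t)(struct sketch_dev *dev, uint16_t queue_id,
				void **pkts, uint16_t nb_pkts);

#define SKETCH_MAX_QUEUES 4

/* Private definition, visible to the library and drivers only. */
struct sketch_dev {
	sketch_rx_t rx_pkt_burst;
	void *rx_queues[SKETCH_MAX_QUEUES];
};

/* Driver-layer helper (hypothetical): not exported to applications. */
void *
sketch_drv_rx_queue(struct sketch_dev *dev, uint16_t queue_id)
{
	return queue_id < SKETCH_MAX_QUEUES ? dev->rx_queues[queue_id] : NULL;
}

/* Stand-in PMD queue state: just a count of packets waiting. */
struct sketch_rxq {
	uint16_t available;
};

/* Stand-in PMD receive function that looks up its own queue. */
uint16_t
sketch_pmd_rx(struct sketch_dev *dev, uint16_t queue_id,
	      void **pkts, uint16_t nb_pkts)
{
	struct sketch_rxq *rxq = sketch_drv_rx_queue(dev, queue_id);
	uint16_t n;

	(void)pkts;
	if (rxq == NULL)
		return 0;
	n = nb_pkts < rxq->available ? nb_pkts : rxq->available;
	rxq->available -= n;
	return n;
}

/* Application-facing wrapper: one load of the pointer, one indirect call. */
uint16_t
sketch_rx_burst(struct sketch_dev *dev, uint16_t queue_id,
		void **pkts, uint16_t nb_pkts)
{
	assert(dev != NULL);
	return dev->rx_pkt_burst(dev, queue_id, pkts, nb_pkts);
}
```

The wrapper no longer touches dev->data, so only the function pointer would have to be reachable from an inline function. The queue lookup moves into the driver, which is why this approach would touch every PMD.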

> +				     rx_pkts, nb_pkts);
> +
> +#ifdef RTE_ETHDEV_RXTX_CALLBACKS

# If we have an ethdev driver helper function for the same, and the PMD can call it as well, there is no need
to call this inline function.
# I think it makes sense to add this as RX_OFFLOAD_FLAGS, so that it is included in the
fast path only when the application needs it.

# Lastly, if we are not exposing rte_eth_dev to the application, then I think we can
remove rte_ from the name.


> +	if (unlikely(dev->post_rx_burst_cbs[queue_id] != NULL)) {
> +		struct rte_eth_rxtx_callback *cb =
> +				dev->post_rx_burst_cbs[queue_id];
> +
> +		do {
> +			nb_rx = cb->fn.rx(port_id, queue_id, rx_pkts, nb_rx,
> +						nb_pkts, cb->param);
> +			cb = cb->next;
> +		} while (cb != NULL);
> +	}
> +#endif
> +
> +	return nb_rx;
> +}
> +
>
  
Stephen Hemminger July 30, 2019, 3:25 p.m. UTC | #2
On Tue, 30 Jul 2019 14:49:49 +0200
Marcin Zapolski <marcinx.a.zapolski@intel.com> wrote:

> Make rte_eth_rx_burst, rte_eth_tx_burst and other static inline ethdev
> functions not inline. They are referencing DPDK internal structures and
> inlining forces those structures to be exposed to user applications.
> 
> In internal testing with i40e NICs a performance drop of about 2% was
> observed with testpmd.
> 
> Signed-off-by: Marcin Zapolski <marcinx.a.zapolski@intel.com>

Sorry 2% matters.
  
Bruce Richardson July 30, 2019, 3:32 p.m. UTC | #3
On Tue, Jul 30, 2019 at 03:01:00PM +0000, Jerin Jacob Kollanukkaran wrote:
> > -----Original Message-----
> > From: dev <dev-bounces@dpdk.org> On Behalf Of Marcin Zapolski
> > Sent: Tuesday, July 30, 2019 6:20 PM
> > To: dev@dpdk.org
> > Cc: Marcin Zapolski <marcinx.a.zapolski@intel.com>
> > Subject: [dpdk-dev] [RFC 19.11 1/2] ethdev: make DPDK core functions non-inline
> > 
> > Make rte_eth_rx_burst, rte_eth_tx_burst and other static inline ethdev
> > functions not inline. They are referencing DPDK internal structures and
> > inlining forces those structures to be exposed to user applications.
> > 
> > In internal testing with i40e NICs a performance drop of about 2% was
> > observed with testpmd.
> 
> I tested on two classes of arm64 machines (high-end and low-end): one shows a 1.4%
> drop and the other a 3.6% drop.
>
This is with testpmd only, right? I'd just point out that we need to
remember that these numbers need to be scaled down appropriately for a
real-world app where IO is only a (hopefully small) proportion of the packet
processing budget. For example, I would expect the ~2% drop we saw in
testpmd to correspond to <0.5% drop in something like OVS.
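
The scaling argument can be made concrete with a small back-of-envelope model. This is only a sketch under stated assumptions: the benchmark is treated as spending essentially all of its per-packet cycles in ethdev IO, the extra cost per packet is taken to be the same in a real application, and the IO share passed in is a hypothetical input rather than a measured figure for OVS or any other application.

```c
#include <assert.h>

/*
 * Estimate the throughput drop seen by an application, given the drop
 * measured in an IO-only benchmark.
 *
 * bench_drop: fractional throughput drop in the benchmark (0.02 = 2%)
 * io_share:   fraction of the application's per-packet cycles spent in
 *             ethdev IO (hypothetical input, must be in (0, 1])
 */
double
sketch_app_drop(double bench_drop, double io_share)
{
	double extra_cycles;	/* added per-packet cost, in units of the
				 * benchmark's baseline per-packet cost */
	double app_cycles;	/* application's baseline per-packet cost,
				 * in the same units */

	assert(bench_drop >= 0.0 && bench_drop < 1.0);
	assert(io_share > 0.0 && io_share <= 1.0);

	/* Throughput is inversely proportional to cycles per packet. */
	extra_cycles = 1.0 / (1.0 - bench_drop) - 1.0;
	app_cycles = 1.0 / io_share;

	return 1.0 - app_cycles / (app_cycles + extra_cycles);
}
```

Under this model a 2% benchmark drop becomes roughly 0.5% when IO is a quarter of the per-packet budget, and about 0.4% when it is a fifth, which is consistent with the expectation above as long as IO stays a small part of the budget.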
 
> I second not exposing internal data structures, to avoid ABI breaks.
> 
> IMO, this patch has a performance issue because it fixes the problem in the
> simplest way.
> 
> It is not worth paying function call overhead just to call the driver
> function.  Some thoughts below on reducing the performance impact without
> exposing internal structures.
> 
The big concern I have with what you propose is that it would involve changing
each and every ethdev driver in DPDK! I'd prefer to make sure that the
impact of this change is actually felt in real-world apps before we start
looking to make such updates across the DPDK codebase.

> I also think we need to follow a similar mechanism for cryptodev, eventdev, rawdev,
> etc., so a common scheme to address this would be useful.
> 
Agreed.

Regards,
/Bruce
  
Bruce Richardson July 30, 2019, 3:33 p.m. UTC | #4
On Tue, Jul 30, 2019 at 08:25:34AM -0700, Stephen Hemminger wrote:
> On Tue, 30 Jul 2019 14:49:49 +0200
> Marcin Zapolski <marcinx.a.zapolski@intel.com> wrote:
> 
> > Make rte_eth_rx_burst, rte_eth_tx_burst and other static inline ethdev
> > functions not inline. They are referencing DPDK internal structures and
> > inlining forces those structures to be exposed to user applications.
> > 
> > In internal testing with i40e NICs a performance drop of about 2% was
> > observed with testpmd.
> > 
> > Signed-off-by: Marcin Zapolski <marcinx.a.zapolski@intel.com>
> 
> Sorry 2% matters.

Note that this is with testpmd. Are there many apps out there where a 2%
drop in IO cost would be noticeable?
  
Stephen Hemminger July 30, 2019, 3:54 p.m. UTC | #5
On Tue, 30 Jul 2019 16:33:55 +0100
Bruce Richardson <bruce.richardson@intel.com> wrote:

> On Tue, Jul 30, 2019 at 08:25:34AM -0700, Stephen Hemminger wrote:
> > On Tue, 30 Jul 2019 14:49:49 +0200
> > Marcin Zapolski <marcinx.a.zapolski@intel.com> wrote:
> >   
> > > Make rte_eth_rx_burst, rte_eth_tx_burst and other static inline ethdev
> > > functions not inline. They are referencing DPDK internal structures and
> > > inlining forces those structures to be exposed to user applications.
> > > 
> > > In internal testing with i40e NICs a performance drop of about 2% was
> > > observed with testpmd.
> > > 
> > > Signed-off-by: Marcin Zapolski <marcinx.a.zapolski@intel.com>  
> > 
> > Sorry 2% matters.  
> 
> Note that this is with testpmd. Are there many apps out there where a 2%
> drop in IO cost would be noticeable?

Why not find a way to get the 2% back elsewhere? Maybe analyzing the code/cache
in more detail. Perhaps some prefetching could help, or getting rid of
indirect calls elsewhere in the code.  At the extreme, maybe implementing
something like the kernel static branches (self-modifying code) would
get a lot back.
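
One way to read "getting rid of indirect calls" is a guarded direct call: compare the function pointer against the most likely target and call that target directly, falling back to the indirect call otherwise. This is a different and much simpler technique than kernel static branches, and the sketch below uses hypothetical sketch_ names and the GCC/Clang __builtin_expect builtin. It is an illustration of the idea only, not something proposed in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t (*sketch_rx_t)(void *rxq, void **pkts, uint16_t nb_pkts);

/* Stand-in for the receive function most ports are expected to use. */
uint16_t
sketch_common_rx(void *rxq, void **pkts, uint16_t nb_pkts)
{
	(void)rxq;
	(void)pkts;
	return nb_pkts;
}

/* Stand-in for a less common receive function. */
uint16_t
sketch_other_rx(void *rxq, void **pkts, uint16_t nb_pkts)
{
	(void)rxq;
	(void)pkts;
	(void)nb_pkts;
	return 0;
}

/*
 * Guarded direct call: when fn is the expected target, the compiler sees
 * a direct call (which it may inline); otherwise the original indirect
 * call is made, so behaviour is unchanged for every other driver.
 */
static inline uint16_t
sketch_call_rx(sketch_rx_t fn, void *rxq, void **pkts, uint16_t nb_pkts)
{
	assert(fn != NULL);
	if (__builtin_expect(fn == sketch_common_rx, 1))
		return sketch_common_rx(rxq, pkts, nb_pkts);
	return fn(rxq, pkts, nb_pkts);
}
```

Whether this recovers anything measurable depends on the workload and on how predictable the target is, so it would need the same testpmd-style measurement as the patch itself.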
  
Wiles, Keith July 30, 2019, 4:04 p.m. UTC | #6
> On Jul 30, 2019, at 10:54 AM, Stephen Hemminger <stephen@networkplumber.org> wrote:
> 
> On Tue, 30 Jul 2019 16:33:55 +0100
> Bruce Richardson <bruce.richardson@intel.com> wrote:
> 
>> On Tue, Jul 30, 2019 at 08:25:34AM -0700, Stephen Hemminger wrote:
>>> On Tue, 30 Jul 2019 14:49:49 +0200
>>> Marcin Zapolski <marcinx.a.zapolski@intel.com> wrote:
>>> 
>>>> Make rte_eth_rx_burst, rte_eth_tx_burst and other static inline ethdev
>>>> functions not inline. They are referencing DPDK internal structures and
>>>> inlining forces those structures to be exposed to user applications.
>>>> 
>>>> In internal testing with i40e NICs a performance drop of about 2% was
>>>> observed with testpmd.
>>>> 
>>>> Signed-off-by: Marcin Zapolski <marcinx.a.zapolski@intel.com>  
>>> 
>>> Sorry 2% matters.  
>> 
>> Note that this is with testpmd. Are there many apps out there where a 2%
>> drop in IO cost would be noticeable?
> 
> Why not find a way to get the 2% back elsewhere? Maybe analyzing the code/cache
> in more detail. Perhaps some prefetching could help, or getting rid of
> indirect calls elsewhere in the code.  At the extreme, maybe implementing
> something like the kernel static branches (self-modifying code) would
> get a lot back.

+1. I discussed something very similar internally. At the very least, let's not reduce DPDK performance by 2%; let's find a different way.

Regards,
Keith
  
Bruce Richardson July 30, 2019, 4:11 p.m. UTC | #7
On Tue, Jul 30, 2019 at 08:54:13AM -0700, Stephen Hemminger wrote:
> On Tue, 30 Jul 2019 16:33:55 +0100
> Bruce Richardson <bruce.richardson@intel.com> wrote:
> 
> > On Tue, Jul 30, 2019 at 08:25:34AM -0700, Stephen Hemminger wrote:
> > > On Tue, 30 Jul 2019 14:49:49 +0200
> > > Marcin Zapolski <marcinx.a.zapolski@intel.com> wrote:
> > >   
> > > > Make rte_eth_rx_burst, rte_eth_tx_burst and other static inline ethdev
> > > > functions not inline. They are referencing DPDK internal structures and
> > > > inlining forces those structures to be exposed to user applications.
> > > > 
> > > > In internal testing with i40e NICs a performance drop of about 2% was
> > > > observed with testpmd.
> > > > 
> > > > Signed-off-by: Marcin Zapolski <marcinx.a.zapolski@intel.com>  
> > > 
> > > Sorry 2% matters.  
> > 
> > Note that this is with testpmd. Are there many apps out there where a 2%
> > drop in IO cost would be noticeable?
> 
> Why not find a way to get the 2% back elsewhere? Maybe analyzing the code/cache
> in more detail. Perhaps some prefetching could help, or getting rid of
> indirect calls elsewhere in the code.  At the extreme, maybe implementing
> something like the kernel static branches (self-modifying code) would
> get a lot back.

I'm all for getting it back, but the most likely place is in individual
drivers themselves. Do you have a link on static branches that the rest
of us could read up on? I, for one, am not familiar with the term.
  
Stephen Hemminger July 30, 2019, 4:23 p.m. UTC | #8
On Tue, 30 Jul 2019 17:11:31 +0100
Bruce Richardson <bruce.richardson@intel.com> wrote:

> On Tue, Jul 30, 2019 at 08:54:13AM -0700, Stephen Hemminger wrote:
> > On Tue, 30 Jul 2019 16:33:55 +0100
> > Bruce Richardson <bruce.richardson@intel.com> wrote:
> >   
> > > On Tue, Jul 30, 2019 at 08:25:34AM -0700, Stephen Hemminger wrote:  
> > > > On Tue, 30 Jul 2019 14:49:49 +0200
> > > > Marcin Zapolski <marcinx.a.zapolski@intel.com> wrote:
> > > >     
> > > > > Make rte_eth_rx_burst, rte_eth_tx_burst and other static inline ethdev
> > > > > functions not inline. They are referencing DPDK internal structures and
> > > > > inlining forces those structures to be exposed to user applications.
> > > > > 
> > > > > In internal testing with i40e NICs a performance drop of about 2% was
> > > > > observed with testpmd.
> > > > > 
> > > > > Signed-off-by: Marcin Zapolski <marcinx.a.zapolski@intel.com>    
> > > > 
> > > > Sorry 2% matters.    
> > > 
> > > Note that this is with testpmd. Are there many apps out there where a 2%
> > > drop in IO cost would be noticeable?
> > 
> > Why not find a way to get the 2% back elsewhere? Maybe analyzing the code/cache
> > in more detail. Perhaps some prefetching could help, or getting rid of
> > indirect calls elsewhere in the code.  At the extreme, maybe implementing
> > something like the kernel static branches (self-modifying code) would
> > get a lot back.  
> 
> I'm all for getting it back, but the most likely place is in individual
> drivers themselves. Do you have a link on the static branches that the rest
> of us could read up on, since I, for one, am not familiar with the term.

https://www.kernel.org/doc/Documentation/static-keys.txt
  

Patch

diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 17d183e1f..31432a956 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -749,6 +749,174 @@  rte_eth_dev_get_sec_ctx(uint16_t port_id)
 	return rte_eth_devices[port_id].security_ctx;
 }
 
+uint16_t
+rte_eth_rx_burst(uint16_t port_id, uint16_t queue_id,
+		 struct rte_mbuf **rx_pkts, const uint16_t nb_pkts)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	uint16_t nb_rx;
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0);
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->rx_pkt_burst, 0);
+
+	if (queue_id >= dev->data->nb_rx_queues) {
+		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", queue_id);
+		return 0;
+	}
+#endif
+	nb_rx = (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
+				     rx_pkts, nb_pkts);
+
+#ifdef RTE_ETHDEV_RXTX_CALLBACKS
+	if (unlikely(dev->post_rx_burst_cbs[queue_id] != NULL)) {
+		struct rte_eth_rxtx_callback *cb =
+				dev->post_rx_burst_cbs[queue_id];
+
+		do {
+			nb_rx = cb->fn.rx(port_id, queue_id, rx_pkts, nb_rx,
+						nb_pkts, cb->param);
+			cb = cb->next;
+		} while (cb != NULL);
+	}
+#endif
+
+	return nb_rx;
+}
+
+int
+rte_eth_rx_queue_count(uint16_t port_id, uint16_t queue_id)
+{
+	struct rte_eth_dev *dev;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+	dev = &rte_eth_devices[port_id];
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_count, -ENOTSUP);
+	if (queue_id >= dev->data->nb_rx_queues)
+		return -EINVAL;
+
+	return (int)(*dev->dev_ops->rx_queue_count)(dev, queue_id);
+}
+
+int
+rte_eth_rx_descriptor_done(uint16_t port_id, uint16_t queue_id, uint16_t offset)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_descriptor_done, -ENOTSUP);
+	return (*dev->dev_ops->rx_descriptor_done)(
+		dev->data->rx_queues[queue_id], offset);
+}
+
+int
+rte_eth_rx_descriptor_status(uint16_t port_id, uint16_t queue_id,
+	uint16_t offset)
+{
+	struct rte_eth_dev *dev;
+	void *rxq;
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+#endif
+	dev = &rte_eth_devices[port_id];
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	if (queue_id >= dev->data->nb_rx_queues)
+		return -ENODEV;
+#endif
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_descriptor_status, -ENOTSUP);
+	rxq = dev->data->rx_queues[queue_id];
+
+	return (*dev->dev_ops->rx_descriptor_status)(rxq, offset);
+}
+
+int
+rte_eth_tx_descriptor_status(uint16_t port_id,
+	uint16_t queue_id, uint16_t offset)
+{
+	struct rte_eth_dev *dev;
+	void *txq;
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+#endif
+	dev = &rte_eth_devices[port_id];
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	if (queue_id >= dev->data->nb_tx_queues)
+		return -ENODEV;
+#endif
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_descriptor_status, -ENOTSUP);
+	txq = dev->data->tx_queues[queue_id];
+
+	return (*dev->dev_ops->tx_descriptor_status)(txq, offset);
+}
+
+uint16_t
+rte_eth_tx_burst(uint16_t port_id, uint16_t queue_id,
+		 struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0);
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->tx_pkt_burst, 0);
+
+	if (queue_id >= dev->data->nb_tx_queues) {
+		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n", queue_id);
+		return 0;
+	}
+#endif
+
+#ifdef RTE_ETHDEV_RXTX_CALLBACKS
+	struct rte_eth_rxtx_callback *cb = dev->pre_tx_burst_cbs[queue_id];
+
+	if (unlikely(cb != NULL)) {
+		do {
+			nb_pkts = cb->fn.tx(port_id, queue_id, tx_pkts, nb_pkts,
+					cb->param);
+			cb = cb->next;
+		} while (cb != NULL);
+	}
+#endif
+
+	return (*dev->tx_pkt_burst)(dev->data->tx_queues[queue_id],
+		tx_pkts, nb_pkts);
+}
+
+#ifndef RTE_ETHDEV_TX_PREPARE_NOOP
+
+uint16_t
+rte_eth_tx_prepare(uint16_t port_id, uint16_t queue_id,
+		struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	struct rte_eth_dev *dev;
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		RTE_ETHDEV_LOG(ERR, "Invalid TX port_id=%u\n", port_id);
+		rte_errno = EINVAL;
+		return 0;
+	}
+#endif
+
+	dev = &rte_eth_devices[port_id];
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	if (queue_id >= dev->data->nb_tx_queues) {
+		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n", queue_id);
+		rte_errno = EINVAL;
+		return 0;
+	}
+#endif
+
+	if (!dev->tx_pkt_prepare)
+		return nb_pkts;
+
+	return (*dev->tx_pkt_prepare)(dev->data->tx_queues[queue_id],
+			tx_pkts, nb_pkts);
+}
+
+#endif
+
 uint16_t
 rte_eth_dev_count(void)
 {
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index dc6596bc9..3438cb681 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -4078,40 +4078,9 @@  rte_eth_dev_get_sec_ctx(uint16_t port_id);
  *   of pointers to *rte_mbuf* structures effectively supplied to the
  *   *rx_pkts* array.
  */
-static inline uint16_t
+uint16_t
 rte_eth_rx_burst(uint16_t port_id, uint16_t queue_id,
-		 struct rte_mbuf **rx_pkts, const uint16_t nb_pkts)
-{
-	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
-	uint16_t nb_rx;
-
-#ifdef RTE_LIBRTE_ETHDEV_DEBUG
-	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0);
-	RTE_FUNC_PTR_OR_ERR_RET(*dev->rx_pkt_burst, 0);
-
-	if (queue_id >= dev->data->nb_rx_queues) {
-		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", queue_id);
-		return 0;
-	}
-#endif
-	nb_rx = (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
-				     rx_pkts, nb_pkts);
-
-#ifdef RTE_ETHDEV_RXTX_CALLBACKS
-	if (unlikely(dev->post_rx_burst_cbs[queue_id] != NULL)) {
-		struct rte_eth_rxtx_callback *cb =
-				dev->post_rx_burst_cbs[queue_id];
-
-		do {
-			nb_rx = cb->fn.rx(port_id, queue_id, rx_pkts, nb_rx,
-						nb_pkts, cb->param);
-			cb = cb->next;
-		} while (cb != NULL);
-	}
-#endif
-
-	return nb_rx;
-}
+		 struct rte_mbuf **rx_pkts, const uint16_t nb_pkts);
 
 /**
  * Get the number of used descriptors of a rx queue
@@ -4125,19 +4094,8 @@  rte_eth_rx_burst(uint16_t port_id, uint16_t queue_id,
  *     (-EINVAL) if *port_id* or *queue_id* is invalid
  *     (-ENOTSUP) if the device does not support this function
  */
-static inline int
-rte_eth_rx_queue_count(uint16_t port_id, uint16_t queue_id)
-{
-	struct rte_eth_dev *dev;
-
-	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
-	dev = &rte_eth_devices[port_id];
-	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_count, -ENOTSUP);
-	if (queue_id >= dev->data->nb_rx_queues)
-		return -EINVAL;
-
-	return (int)(*dev->dev_ops->rx_queue_count)(dev, queue_id);
-}
+int
+rte_eth_rx_queue_count(uint16_t port_id, uint16_t queue_id);
 
 /**
  * Check if the DD bit of the specific RX descriptor in the queue has been set
@@ -4154,15 +4112,9 @@  rte_eth_rx_queue_count(uint16_t port_id, uint16_t queue_id)
  *  - (-ENODEV) if *port_id* invalid.
  *  - (-ENOTSUP) if the device does not support this function
  */
-static inline int
-rte_eth_rx_descriptor_done(uint16_t port_id, uint16_t queue_id, uint16_t offset)
-{
-	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
-	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
-	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_descriptor_done, -ENOTSUP);
-	return (*dev->dev_ops->rx_descriptor_done)( \
-		dev->data->rx_queues[queue_id], offset);
-}
+int
+rte_eth_rx_descriptor_done(uint16_t port_id, uint16_t queue_id,
+	uint16_t offset);
 
 #define RTE_ETH_RX_DESC_AVAIL    0 /**< Desc available for hw. */
 #define RTE_ETH_RX_DESC_DONE     1 /**< Desc done, filled by hw. */
@@ -4201,26 +4153,9 @@  rte_eth_rx_descriptor_done(uint16_t port_id, uint16_t queue_id, uint16_t offset)
  *  - (-ENOTSUP) if the device does not support this function.
  *  - (-ENODEV) bad port or queue (only if compiled with debug).
  */
-static inline int
+int
 rte_eth_rx_descriptor_status(uint16_t port_id, uint16_t queue_id,
-	uint16_t offset)
-{
-	struct rte_eth_dev *dev;
-	void *rxq;
-
-#ifdef RTE_LIBRTE_ETHDEV_DEBUG
-	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
-#endif
-	dev = &rte_eth_devices[port_id];
-#ifdef RTE_LIBRTE_ETHDEV_DEBUG
-	if (queue_id >= dev->data->nb_rx_queues)
-		return -ENODEV;
-#endif
-	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_descriptor_status, -ENOTSUP);
-	rxq = dev->data->rx_queues[queue_id];
-
-	return (*dev->dev_ops->rx_descriptor_status)(rxq, offset);
-}
+	uint16_t offset);
 
 #define RTE_ETH_TX_DESC_FULL    0 /**< Desc filled for hw, waiting xmit. */
 #define RTE_ETH_TX_DESC_DONE    1 /**< Desc done, packet is transmitted. */
@@ -4259,25 +4194,8 @@  rte_eth_rx_descriptor_status(uint16_t port_id, uint16_t queue_id,
  *  - (-ENOTSUP) if the device does not support this function.
  *  - (-ENODEV) bad port or queue (only if compiled with debug).
  */
-static inline int rte_eth_tx_descriptor_status(uint16_t port_id,
-	uint16_t queue_id, uint16_t offset)
-{
-	struct rte_eth_dev *dev;
-	void *txq;
-
-#ifdef RTE_LIBRTE_ETHDEV_DEBUG
-	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
-#endif
-	dev = &rte_eth_devices[port_id];
-#ifdef RTE_LIBRTE_ETHDEV_DEBUG
-	if (queue_id >= dev->data->nb_tx_queues)
-		return -ENODEV;
-#endif
-	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_descriptor_status, -ENOTSUP);
-	txq = dev->data->tx_queues[queue_id];
-
-	return (*dev->dev_ops->tx_descriptor_status)(txq, offset);
-}
+int rte_eth_tx_descriptor_status(uint16_t port_id,
+	uint16_t queue_id, uint16_t offset);
 
 /**
  * Send a burst of output packets on a transmit queue of an Ethernet device.
@@ -4345,36 +4263,9 @@  static inline int rte_eth_tx_descriptor_status(uint16_t port_id,
  *   the transmit ring. The return value can be less than the value of the
  *   *tx_pkts* parameter when the transmit ring is full or has been filled up.
  */
-static inline uint16_t
+uint16_t
 rte_eth_tx_burst(uint16_t port_id, uint16_t queue_id,
-		 struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
-{
-	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
-
-#ifdef RTE_LIBRTE_ETHDEV_DEBUG
-	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0);
-	RTE_FUNC_PTR_OR_ERR_RET(*dev->tx_pkt_burst, 0);
-
-	if (queue_id >= dev->data->nb_tx_queues) {
-		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n", queue_id);
-		return 0;
-	}
-#endif
-
-#ifdef RTE_ETHDEV_RXTX_CALLBACKS
-	struct rte_eth_rxtx_callback *cb = dev->pre_tx_burst_cbs[queue_id];
-
-	if (unlikely(cb != NULL)) {
-		do {
-			nb_pkts = cb->fn.tx(port_id, queue_id, tx_pkts, nb_pkts,
-					cb->param);
-			cb = cb->next;
-		} while (cb != NULL);
-	}
-#endif
-
-	return (*dev->tx_pkt_burst)(dev->data->tx_queues[queue_id], tx_pkts, nb_pkts);
-}
+		 struct rte_mbuf **tx_pkts, uint16_t nb_pkts);
 
 /**
  * Process a burst of output packets on a transmit queue of an Ethernet device.
@@ -4431,36 +4322,9 @@  rte_eth_tx_burst(uint16_t port_id, uint16_t queue_id,
 
 #ifndef RTE_ETHDEV_TX_PREPARE_NOOP
 
-static inline uint16_t
+uint16_t
 rte_eth_tx_prepare(uint16_t port_id, uint16_t queue_id,
-		struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
-{
-	struct rte_eth_dev *dev;
-
-#ifdef RTE_LIBRTE_ETHDEV_DEBUG
-	if (!rte_eth_dev_is_valid_port(port_id)) {
-		RTE_ETHDEV_LOG(ERR, "Invalid TX port_id=%u\n", port_id);
-		rte_errno = EINVAL;
-		return 0;
-	}
-#endif
-
-	dev = &rte_eth_devices[port_id];
-
-#ifdef RTE_LIBRTE_ETHDEV_DEBUG
-	if (queue_id >= dev->data->nb_tx_queues) {
-		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n", queue_id);
-		rte_errno = EINVAL;
-		return 0;
-	}
-#endif
-
-	if (!dev->tx_pkt_prepare)
-		return nb_pkts;
-
-	return (*dev->tx_pkt_prepare)(dev->data->tx_queues[queue_id],
-			tx_pkts, nb_pkts);
-}
+		struct rte_mbuf **tx_pkts, uint16_t nb_pkts);
 
 #else
 
diff --git a/lib/librte_ethdev/rte_ethdev_version.map b/lib/librte_ethdev/rte_ethdev_version.map
index df9141825..ab590bb71 100644
--- a/lib/librte_ethdev/rte_ethdev_version.map
+++ b/lib/librte_ethdev/rte_ethdev_version.map
@@ -236,6 +236,18 @@  DPDK_19.05 {
 
 } DPDK_18.11;
 
+DPDK_19.11 {
+	global:
+
+	rte_eth_rx_burst;
+	rte_eth_rx_descriptor_done;
+	rte_eth_rx_descriptor_status;
+	rte_eth_rx_queue_count;
+	rte_eth_tx_burst;
+	rte_eth_tx_descriptor_status;
+	rte_eth_tx_prepare;
+} DPDK_19.05;
+
 EXPERIMENTAL {
 	global: