[RFC,1/2] ethdev: introduce internal rxq/txq stats API

Message ID: 1552576396-19906-1-git-send-email-david.marchand@redhat.com (mailing list archive)
State: Changes Requested, archived
Delegated to: Ferruh Yigit
Series: [RFC,1/2] ethdev: introduce internal rxq/txq stats API

Checks

Context | Check | Description
ci/Intel-compilation | success | Compilation OK

Commit Message

David Marchand March 14, 2019, 3:13 p.m. UTC
  Introduce a new API to retrieve per-queue statistics from the drivers.
The API objectives:
- easily add some common per-queue statistics and have them exposed
  through the user xstats API while the user stats API is left untouched
- remove the limitations on the per-queue statistics count (inherited
  from ixgbe) and avoid recurrent bugs on stats array overflow

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
 lib/librte_ethdev/rte_ethdev.c        | 191 ++++++++++++++++++++++++++++------
 lib/librte_ethdev/rte_ethdev_core.h   |  13 +++
 lib/librte_ethdev/rte_ethdev_driver.h |  18 ++++
 3 files changed, 192 insertions(+), 30 deletions(-)
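
For illustration, a driver plugs into this interface roughly as follows (a
minimal sketch: the mydrv_* names and counter layout are hypothetical, while
the callback signature and struct pmd_eth_rxq_stats come from the patch below):

#include <rte_ethdev_driver.h>

/* Hypothetical per-queue counters a PMD already maintains on its datapath. */
struct mydrv_rxq {
	uint64_t pkts;
	uint64_t bytes;
	uint64_t errors;
};

static int
mydrv_rxq_stats_get(struct rte_eth_dev *dev, uint16_t rx_queue_id,
		struct pmd_eth_rxq_stats *stats)
{
	struct mydrv_rxq *rxq = dev->data->rx_queues[rx_queue_id];

	stats->packets = rxq->pkts;
	stats->bytes = rxq->bytes;
	stats->errors = rxq->errors;
	return 0;
}

static const struct eth_dev_ops mydrv_dev_ops = {
	/* ... other callbacks ... */
	.rxq_stats_get = mydrv_rxq_stats_get,
};

ethdev then derives both the legacy rte_eth_stats per-queue arrays (capped at
RTE_ETHDEV_QUEUE_STAT_CNTRS) and the uncapped "rx_qN*" xstats entries from
this single callback.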
  

Comments

David Marchand March 15, 2019, 1:30 p.m. UTC | #1
On Thu, Mar 14, 2019 at 4:13 PM David Marchand <david.marchand@redhat.com>
wrote:

> Introduce a new api to retrieve per queue statistics from the drivers.
> The api objectives:
> - easily add some common per queue statistics and have it exposed
>   through the user xstats api while the user stats api is left untouched
> - remove the limitations on the per queue statistics count (inherited
>   from ixgbe) and avoid recurrent bugs on stats array overflow
>

I have looked at more drivers.
I will try to handle the case where drivers only keep track of per-queue
stats and aggregate them when called by .stats_get.
The drivers with hw counters would need both .stats_get and the per-queue
functions, so I need to accommodate this.
v2 RFC for next week unless someone totally disagrees with the approach
before then :-)
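
The aggregation pattern mentioned above looks roughly like this (hypothetical
mydrv_* names, rx side only; it assumes *stats arrives zeroed, as ethdev
memsets it before invoking the op):

static int
mydrv_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
{
	uint16_t qid;

	for (qid = 0; qid < dev->data->nb_rx_queues; qid++) {
		struct mydrv_rxq *rxq = dev->data->rx_queues[qid];

		/* Aggregate the per-queue counters into device totals. */
		stats->ipackets += rxq->pkts;
		stats->ibytes += rxq->bytes;
		stats->ierrors += rxq->errors;
		if (qid < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
			stats->q_ipackets[qid] = rxq->pkts;
			stats->q_ibytes[qid] = rxq->bytes;
			stats->q_errors[qid] = rxq->errors;
		}
	}
	return 0;
}

Such a driver can serve the new rxq_stats_get callback from the very same
counters, with no extra bookkeeping.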
  
Ferruh Yigit March 19, 2019, 5:18 p.m. UTC | #2
On 3/14/2019 3:13 PM, David Marchand wrote:
> Introduce a new api to retrieve per queue statistics from the drivers.
> The api objectives:
> - easily add some common per queue statistics and have it exposed
>   through the user xstats api while the user stats api is left untouched
> - remove the limitations on the per queue statistics count (inherited
>   from ixgbe) and avoid recurrent bugs on stats array overflow

The patch is adding two new dev_ops, 'rxq_stats_get' & 'txq_stats_get'; my
concern is whether it is overkill to have three dev_ops to get stats,
and I feel it is making the xstats code more complex.

Would it be simpler to add 'q_ierrors' & 'q_oerrors' to 'struct rte_eth_stats'?

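That is, roughly (abridged sketch of the struct, scalar counters omitted; the
last two arrays are the hypothetical additions):

struct rte_eth_stats {
	uint64_t ipackets;
	/* ... other device-level counters ... */
	uint64_t q_ipackets[RTE_ETHDEV_QUEUE_STAT_CNTRS];
	uint64_t q_opackets[RTE_ETHDEV_QUEUE_STAT_CNTRS];
	uint64_t q_ibytes[RTE_ETHDEV_QUEUE_STAT_CNTRS];
	uint64_t q_obytes[RTE_ETHDEV_QUEUE_STAT_CNTRS];
	uint64_t q_errors[RTE_ETHDEV_QUEUE_STAT_CNTRS];
	uint64_t q_ierrors[RTE_ETHDEV_QUEUE_STAT_CNTRS]; /* hypothetical */
	uint64_t q_oerrors[RTE_ETHDEV_QUEUE_STAT_CNTRS]; /* hypothetical */
};
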
And perhaps we can do the 'fix rxq q_errors' patchset [1] after this change, so
the fix can be done with fewer changes, although it will push the fix into the
next release because of the ABI break.
OR, since the ethdev ABI is already broken this release because of max_mtu,
perhaps we can squeeze this in.

Overall I would like to get more comments on this. Andrew, Thomas?

> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>

<...>
  
Stephen Hemminger March 19, 2019, 5:54 p.m. UTC | #3
On Tue, 19 Mar 2019 17:18:08 +0000
Ferruh Yigit <ferruh.yigit@intel.com> wrote:

> On 3/14/2019 3:13 PM, David Marchand wrote:
> > Introduce a new api to retrieve per queue statistics from the drivers.
> > The api objectives:
> > - easily add some common per queue statistics and have it exposed
> >   through the user xstats api while the user stats api is left untouched
> > - remove the limitations on the per queue statistics count (inherited
> >   from ixgbe) and avoid recurrent bugs on stats array overflow  
> 
> The patch is adding two new dev_ops 'rxq_stats_get' & 'txq_stats_get', my
> concern is if it is overkill to have three dev_ops to get stats
> and I am feeling that is making xstat code more complex.
> 
> Would it be simpler to add 'q_ierrors' & 'q_oerrors' to 'struct rte_eth_stats'?
> 
> And perhaps we can do the 'fix rxq q_errors' patchset [1] after this change, so
> fix can be done with less changes, although it will push the fix into next
> release because of the ABI break.
> OR ethdev will be broken this release, because of max_mtu, since ABI is already
> broken perhaps we can squeeze this in.
> 
> Overall I would like to get more comment on this, Andrew, Thomas?
> 
> > 
> > Signed-off-by: David Marchand <david.marchand@redhat.com>  
> 
> <...>
> 

My preference would be:
  1. Make all DPDK drivers consistent in their usage of the current statistics values.
  2. Propose an enhancement to have new ethdev statistics match some pre-existing
     standard like SNMP or other RFC.
  3. Reduce custom (xstats) values by using #2. Leave xstats for driver-specific stuff.

I.e.: don't modify the existing API at all, make a new one.

PS: ethdev is one of those structures that needs to get removed/hidden from
application headers. It should be possible to add/remove stuff from ethdev
internals, device and bus without breaking API/ABI.
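
To make #2 concrete, the RFC 2863 (IF-MIB) counters would map onto the basic
stats roughly like this (illustrative sketch only; the struct name is
hypothetical and the correspondences are approximate):

struct eth_if_mib_stats {
	uint64_t ifHCInOctets;     /* ~ rte_eth_stats.ibytes */
	uint64_t ifHCInUcastPkts;  /* ~ rte_eth_stats.ipackets (approx.) */
	uint64_t ifInDiscards;     /* ~ rte_eth_stats.imissed */
	uint64_t ifInErrors;       /* ~ rte_eth_stats.ierrors */
	uint64_t ifHCOutOctets;    /* ~ rte_eth_stats.obytes */
	uint64_t ifHCOutUcastPkts; /* ~ rte_eth_stats.opackets (approx.) */
	uint64_t ifOutErrors;      /* ~ rte_eth_stats.oerrors */
};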
  
David Marchand March 26, 2019, 9:29 a.m. UTC | #4
On Tue, Mar 19, 2019 at 6:18 PM Ferruh Yigit <ferruh.yigit@intel.com> wrote:

> On 3/14/2019 3:13 PM, David Marchand wrote:
> > Introduce a new api to retrieve per queue statistics from the drivers.
> > The api objectives:
> > - easily add some common per queue statistics and have it exposed
> >   through the user xstats api while the user stats api is left untouched
> > - remove the limitations on the per queue statistics count (inherited
> >   from ixgbe) and avoid recurrent bugs on stats array overflow
>
> The patch is adding two new dev_ops 'rxq_stats_get' & 'txq_stats_get', my
> concern is if it is overkill to have three dev_ops to get stats
> and I am feeling that is making xstat code more complex.
>

Having these new (meant to be) internal dev_ops has the advantage of
separating the statistics reported by the drivers from the exported API.
This is also why I did not prefix the structure names with rte_.

The "complex" part is in a single place, ethdev, when translating from an
internal representation to the bits exposed in the public APIs.


> Would it be simpler to add 'q_ierrors' & 'q_oerrors' to 'struct
> rte_eth_stats'?

It does not solve the problem of drivers that are buggy because of the
limit on RTE_ETHDEV_QUEUE_STAT_CNTRS.
All drivers need to be aware of this limitation of the rte_eth_stats
structure.

The drivers manipulate queues in the rx/tx_queue_setup dev_ops, for example,
so having rxq/txq_stats_get dev_ops is more consistent to me.


> And perhaps we can do the 'fix rxq q_errors' patchset [1] after this
> change, so
> fix can be done with less changes, although it will push the fix into next
> release because of the ABI break.
>

I am fine with merging this together, we don't want to backport this
anyway, right?

But so far, I don't feel the need to break the rte_eth_stats_get API; if we
want to go that way, we can define an entirely new API to expose
standardized bits and still use the rxq/txq_stats_get internal dev_ops I
propose.


Thomas, Andrew, can you provide feedback, please?
rc1 is coming.
  
Thomas Monjalon April 12, 2019, 1:18 p.m. UTC | #5
19/03/2019 18:54, Stephen Hemminger:
> My preference would be:
>   1. Make all DPDK drivers consistent in usage of current statistic values.
>   2. Propose an enhancement to have new ethdev statistics match some pre-existing
>      standard like SNMP or other RFC.

This patch is about basic stats.
What would be different if we looked at SNMP or another standard?

>   3. Reduce custom (xstats) values by using #2. Leave it for driver specific stuff.
> 
> I.e: don't modify existing API at all, make a new one.

This patch does not modify the API.
  
Thomas Monjalon April 12, 2019, 1:29 p.m. UTC | #6
26/03/2019 10:29, David Marchand:
> On Tue, Mar 19, 2019 at 6:18 PM Ferruh Yigit <ferruh.yigit@intel.com> wrote:
> 
> > On 3/14/2019 3:13 PM, David Marchand wrote:
> > > Introduce a new api to retrieve per queue statistics from the drivers.
> > > The api objectives:
> > > - easily add some common per queue statistics and have it exposed
> > >   through the user xstats api while the user stats api is left untouched
> > > - remove the limitations on the per queue statistics count (inherited
> > >   from ixgbe) and avoid recurrent bugs on stats array overflow

First comment: I think it would be easier to read if renaming the legacy
basic stats interface were in a separate patch.

> > The patch is adding two new dev_ops 'rxq_stats_get' & 'txq_stats_get', my
> > concern is if it is overkill to have three dev_ops to get stats
> > and I am feeling that is making xstat code more complex.
> 
> Having these new (meant to be) internal dev_ops has the avantage of
> separating the statistics reported from the drivers from the exported api.
> This is also why I did not prefix the structure names with rte_.

Yes, and to make it clear, please do not talk about API,
as it is only a driver interface.

> The "complex" part is in a single place, ethdev and this is when
> translating from an internal representation to the exposed bits in the
> public apis.
> 
> Would it be simpler to add 'q_ierrors' & 'q_oerrors' to 'struct
> > rte_eth_stats'?
> >
> 
> It does not solve the problem of drivers that are buggy because of the
> limit on RTE_ETHDEV_QUEUE_STAT_CNTRS.
> All drivers need to be aware of this limitation of the rte_eth_stats
> structure.

Yes, this limitation should be dropped.
I would like to see the functions rte_eth_dev_set_?x_queue_stats_mapping()
deprecated, as they were a bad abstraction of an ixgbe limitation.

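That is, eventually something like (sketch only; attribute placement
illustrative):

__rte_deprecated
int rte_eth_dev_set_rx_queue_stats_mapping(uint16_t port_id,
		uint16_t rx_queue_id, uint8_t stat_idx);
__rte_deprecated
int rte_eth_dev_set_tx_queue_stats_mapping(uint16_t port_id,
		uint16_t tx_queue_id, uint8_t stat_idx);
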
> The drivers manipulate queues in rx/tx_queue_setup dev_ops for example,
> having rxq/txq_stats_get dev_ops is more consistent to me.

+1

> > And perhaps we can do the 'fix rxq q_errors' patchset [1] after this
> > change, so
> > fix can be done with less changes, although it will push the fix into next
> > release because of the ABI break.
> 
> I am fine with merging this together, we don't want to backport this
> anyway, right?

No, it would change some behaviours in stable releases,
so better not to backport it and keep the buggy behaviour in old branches.

> But so far, I don't feel the need to break the rte_eth_stats_get API, if we
> want to go to this, we can define an entirely new api to expose
> standardized bits and still use the rxq/txq_stats_get internal dev_ops I
> propose.

Good
  
David Marchand April 12, 2019, 2:32 p.m. UTC | #7
On Fri, Apr 12, 2019 at 3:29 PM Thomas Monjalon <thomas@monjalon.net> wrote:

> 26/03/2019 10:29, David Marchand:
> > On Tue, Mar 19, 2019 at 6:18 PM Ferruh Yigit <ferruh.yigit@intel.com>
> wrote:
> >
> > > On 3/14/2019 3:13 PM, David Marchand wrote:
> > > > Introduce a new api to retrieve per queue statistics from the
> drivers.
> > > > The api objectives:
> > > > - easily add some common per queue statistics and have it exposed
> > > >   through the user xstats api while the user stats api is left
> untouched
> > > > - remove the limitations on the per queue statistics count (inherited
> > > >   from ixgbe) and avoid recurrent bugs on stats array overflow
>
> First comment, I think it would be easier to read if renaming the legacy
> basic stats interface was in a separate patch.
>

It will be quite artificial, but I can do this, yes.


> > > The patch is adding two new dev_ops 'rxq_stats_get' & 'txq_stats_get',
> my
> > > concern is if it is overkill to have three dev_ops to get stats
> > > and I am feeling that is making xstat code more complex.
> >
> > Having these new (meant to be) internal dev_ops has the avantage of
> > separating the statistics reported from the drivers from the exported
> api.
> > This is also why I did not prefix the structure names with rte_.
>
> Yes, and to make it clear, please do not talk about API,
> as it is only a driver interface.
>

Ok, so I will describe this as a "driver interface" update.



> > The "complex" part is in a single place, ethdev and this is when
> > translating from an internal representation to the exposed bits in the
> > public apis.
> >
> > Would it be simpler to add 'q_ierrors' & 'q_oerrors' to 'struct
> > > rte_eth_stats'?
> > >
> >
> > It does not solve the problem of drivers that are buggy because of the
> > limit on RTE_ETHDEV_QUEUE_STAT_CNTRS.
> > All drivers need to be aware of this limitation of the rte_eth_stats
> > structure.
>
> Yes, this limitation should be dropped.
> I would like to see the functions rte_eth_dev_set_?x_queue_stats_mapping()
> deprecated as they were a bad abstraction of ixgbe limitation.
>

That's a different topic from my pov, but yes, this mapping stuff should go
away, later.


> > And perhaps we can do the 'fix rxq q_errors' patchset [1] after this
> > > change, so
> > > fix can be done with less changes, although it will push the fix into
> next
> > > release because of the ABI break.
> >
> > I am fine with merging this together, we don't want to backport this
> > anyway, right?
>
> No, it would make some behaviours changing in stable releases,
> so better to not backport it and keep the buggy behaviour in old branches.
>

Since posting this RFC, I have worked on an RFC v2; I will post it next
week, with the drivers I found time to convert.
We will have to decide what goes into -rc2 between this and the
q_errors[] patchset.
  
Stephen Hemminger April 12, 2019, 4:05 p.m. UTC | #8
On Fri, 12 Apr 2019 16:32:01 +0200
David Marchand <david.marchand@redhat.com> wrote:

> [snip]
>
> Since posting this RFC, I have worked on an RFC v2; I will post it next
> week, with the drivers I found time to convert.
> We will have to decide what goes into -rc2 between this and the
> q_errors[] patchset.

It looks like this is all about maintaining source compatibility with older
or out-of-tree drivers. This is not something DPDK has to worry about.
Why not just do a mondo patch that fixes all the drivers to use the new stats API?
You do need to keep the same ethdev interface for applications, but the driver
API can change.
  

Patch

diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 85c1794..058fbd1 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -88,21 +88,30 @@  struct rte_eth_xstats_name_off {
 
 #define RTE_NB_STATS (sizeof(rte_stats_strings) / sizeof(rte_stats_strings[0]))
 
-static const struct rte_eth_xstats_name_off rte_rxq_stats_strings[] = {
+static const struct rte_eth_xstats_name_off legacy_rxq_stats_map[] = {
 	{"packets", offsetof(struct rte_eth_stats, q_ipackets)},
 	{"bytes", offsetof(struct rte_eth_stats, q_ibytes)},
 	{"errors", offsetof(struct rte_eth_stats, q_errors)},
 };
+#define RTE_NB_LEGACY_RXQ_STATS RTE_DIM(legacy_rxq_stats_map)
+static const struct rte_eth_xstats_name_off rxq_stats_map[] = {
+	{"packets", offsetof(struct pmd_eth_rxq_stats, packets)},
+	{"bytes", offsetof(struct pmd_eth_rxq_stats, bytes)},
+	{"errors", offsetof(struct pmd_eth_rxq_stats, errors)},
+};
+#define RTE_NB_RXQ_STATS RTE_DIM(rxq_stats_map)
 
-#define RTE_NB_RXQ_STATS (sizeof(rte_rxq_stats_strings) /	\
-		sizeof(rte_rxq_stats_strings[0]))
-
-static const struct rte_eth_xstats_name_off rte_txq_stats_strings[] = {
+static const struct rte_eth_xstats_name_off legacy_txq_stats_map[] = {
 	{"packets", offsetof(struct rte_eth_stats, q_opackets)},
 	{"bytes", offsetof(struct rte_eth_stats, q_obytes)},
 };
-#define RTE_NB_TXQ_STATS (sizeof(rte_txq_stats_strings) /	\
-		sizeof(rte_txq_stats_strings[0]))
+#define RTE_NB_LEGACY_TXQ_STATS RTE_DIM(legacy_txq_stats_map)
+static const struct rte_eth_xstats_name_off txq_stats_map[] = {
+	{"packets", offsetof(struct pmd_eth_txq_stats, packets)},
+	{"bytes", offsetof(struct pmd_eth_txq_stats, bytes)},
+	{"errors", offsetof(struct pmd_eth_txq_stats, errors)},
+};
+#define RTE_NB_TXQ_STATS RTE_DIM(txq_stats_map)
 
 #define RTE_RX_OFFLOAD_BIT2STR(_name)	\
 	{ DEV_RX_OFFLOAD_##_name, #_name }
@@ -1937,6 +1946,10 @@  struct rte_eth_dev *
 rte_eth_stats_get(uint16_t port_id, struct rte_eth_stats *stats)
 {
 	struct rte_eth_dev *dev;
+	unsigned int nb_rxqs;
+	unsigned int nb_txqs;
+	unsigned int qid;
+	int ret;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
 
@@ -1945,7 +1958,44 @@  struct rte_eth_dev *
 
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->stats_get, -ENOTSUP);
 	stats->rx_nombuf = dev->data->rx_mbuf_alloc_failed;
-	return eth_err(port_id, (*dev->dev_ops->stats_get)(dev, stats));
+	ret = eth_err(port_id, (*dev->dev_ops->stats_get)(dev, stats));
+	if (ret)
+		goto out;
+
+	if (!dev->dev_ops->rxq_stats_get)
+		goto skip_rxq;
+	nb_rxqs = RTE_MIN(dev->data->nb_rx_queues,
+			  RTE_ETHDEV_QUEUE_STAT_CNTRS);
+	for (qid = 0; qid < nb_rxqs; qid++) {
+		struct pmd_eth_rxq_stats rxq_stats;
+
+		memset(&rxq_stats, 0, sizeof(rxq_stats));
+		if (dev->dev_ops->rxq_stats_get(dev, qid, &rxq_stats))
+			continue;
+
+		stats->q_ipackets[qid] = rxq_stats.packets;
+		stats->q_ibytes[qid] = rxq_stats.bytes;
+		stats->q_errors[qid] = rxq_stats.errors;
+	}
+
+skip_rxq:
+	if (!dev->dev_ops->txq_stats_get)
+		goto out;
+	nb_txqs = RTE_MIN(dev->data->nb_tx_queues,
+			  RTE_ETHDEV_QUEUE_STAT_CNTRS);
+	for (qid = 0; qid < nb_txqs; qid++) {
+		struct pmd_eth_txq_stats txq_stats;
+
+		memset(&txq_stats, 0, sizeof(txq_stats));
+		if (dev->dev_ops->txq_stats_get(dev, qid, &txq_stats))
+			continue;
+
+		stats->q_opackets[qid] = txq_stats.packets;
+		stats->q_obytes[qid] = txq_stats.bytes;
+	}
+
+out:
+	return ret;
 }
 
 int
@@ -1969,12 +2019,24 @@  struct rte_eth_dev *
 	uint16_t nb_rxqs, nb_txqs;
 	int count;
 
-	nb_rxqs = RTE_MIN(dev->data->nb_rx_queues, RTE_ETHDEV_QUEUE_STAT_CNTRS);
-	nb_txqs = RTE_MIN(dev->data->nb_tx_queues, RTE_ETHDEV_QUEUE_STAT_CNTRS);
-
 	count = RTE_NB_STATS;
-	count += nb_rxqs * RTE_NB_RXQ_STATS;
-	count += nb_txqs * RTE_NB_TXQ_STATS;
+	if (dev->dev_ops->rxq_stats_get) {
+		nb_rxqs = dev->data->nb_rx_queues;
+		count += nb_rxqs * RTE_NB_RXQ_STATS;
+	} else {
+		nb_rxqs = RTE_MIN(dev->data->nb_rx_queues,
+				  RTE_ETHDEV_QUEUE_STAT_CNTRS);
+		count += nb_rxqs * RTE_NB_LEGACY_RXQ_STATS;
+	}
+
+	if (dev->dev_ops->txq_stats_get) {
+		nb_txqs = dev->data->nb_tx_queues;
+		count += nb_txqs * RTE_NB_TXQ_STATS;
+	} else {
+		nb_txqs = RTE_MIN(dev->data->nb_tx_queues,
+				  RTE_ETHDEV_QUEUE_STAT_CNTRS);
+		count += nb_txqs * RTE_NB_LEGACY_TXQ_STATS;
+	}
 
 	return count;
 }
@@ -2065,27 +2127,59 @@  struct rte_eth_dev *
 			"%s", rte_stats_strings[idx].name);
 		cnt_used_entries++;
 	}
+
+	if (dev->dev_ops->rxq_stats_get) {
+		num_q = dev->data->nb_rx_queues;
+		for (id_queue = 0; id_queue < num_q; id_queue++) {
+			for (idx = 0; idx < RTE_NB_RXQ_STATS; idx++) {
+				snprintf(xstats_names[cnt_used_entries].name,
+					sizeof(xstats_names[0].name),
+					"rx_q%u%s",
+					id_queue, rxq_stats_map[idx].name);
+				cnt_used_entries++;
+			}
+		}
+		goto skip_legacy_rxq;
+	}
+
 	num_q = RTE_MIN(dev->data->nb_rx_queues, RTE_ETHDEV_QUEUE_STAT_CNTRS);
 	for (id_queue = 0; id_queue < num_q; id_queue++) {
-		for (idx = 0; idx < RTE_NB_RXQ_STATS; idx++) {
+		for (idx = 0; idx < RTE_NB_LEGACY_RXQ_STATS; idx++) {
 			snprintf(xstats_names[cnt_used_entries].name,
 				sizeof(xstats_names[0].name),
 				"rx_q%u%s",
-				id_queue, rte_rxq_stats_strings[idx].name);
+				id_queue, legacy_rxq_stats_map[idx].name);
 			cnt_used_entries++;
 		}
+	}
 
+skip_legacy_rxq:
+	if (dev->dev_ops->txq_stats_get) {
+		num_q = dev->data->nb_tx_queues;
+		for (id_queue = 0; id_queue < num_q; id_queue++) {
+			for (idx = 0; idx < RTE_NB_TXQ_STATS; idx++) {
+				snprintf(xstats_names[cnt_used_entries].name,
+					sizeof(xstats_names[0].name),
+					"tx_q%u%s",
+					id_queue, txq_stats_map[idx].name);
+				cnt_used_entries++;
+			}
+		}
+		goto skip_legacy_txq;
 	}
+
 	num_q = RTE_MIN(dev->data->nb_tx_queues, RTE_ETHDEV_QUEUE_STAT_CNTRS);
 	for (id_queue = 0; id_queue < num_q; id_queue++) {
-		for (idx = 0; idx < RTE_NB_TXQ_STATS; idx++) {
+		for (idx = 0; idx < RTE_NB_LEGACY_TXQ_STATS; idx++) {
 			snprintf(xstats_names[cnt_used_entries].name,
 				sizeof(xstats_names[0].name),
 				"tx_q%u%s",
-				id_queue, rte_txq_stats_strings[idx].name);
+				id_queue, legacy_txq_stats_map[idx].name);
 			cnt_used_entries++;
 		}
 	}
+
+skip_legacy_txq:
 	return cnt_used_entries;
 }
 
@@ -2252,9 +2346,6 @@  struct rte_eth_dev *
 
 	dev = &rte_eth_devices[port_id];
 
-	nb_rxqs = RTE_MIN(dev->data->nb_rx_queues, RTE_ETHDEV_QUEUE_STAT_CNTRS);
-	nb_txqs = RTE_MIN(dev->data->nb_tx_queues, RTE_ETHDEV_QUEUE_STAT_CNTRS);
-
 	/* global stats */
 	for (i = 0; i < RTE_NB_STATS; i++) {
 		stats_ptr = RTE_PTR_ADD(&eth_stats,
@@ -2264,26 +2355,71 @@  struct rte_eth_dev *
 	}
 
 	/* per-rxq stats */
+	if (dev->dev_ops->rxq_stats_get) {
+		nb_rxqs = dev->data->nb_rx_queues;
+		for (q = 0; q < nb_rxqs; q++) {
+			struct pmd_eth_rxq_stats rxq_stats;
+
+			memset(&rxq_stats, 0, sizeof(rxq_stats));
+			if (dev->dev_ops->rxq_stats_get(dev, q, &rxq_stats)) {
+				count += RTE_NB_RXQ_STATS;
+				continue;
+			}
+			for (i = 0; i < RTE_NB_RXQ_STATS; i++) {
+				stats_ptr = RTE_PTR_ADD(&rxq_stats,
+						rxq_stats_map[i].offset);
+				val = *stats_ptr;
+				xstats[count++].value = val;
+			}
+		}
+		goto skip_legacy_rxq;
+	}
+
+	nb_rxqs = RTE_MIN(dev->data->nb_rx_queues, RTE_ETHDEV_QUEUE_STAT_CNTRS);
 	for (q = 0; q < nb_rxqs; q++) {
-		for (i = 0; i < RTE_NB_RXQ_STATS; i++) {
+		for (i = 0; i < RTE_NB_LEGACY_RXQ_STATS; i++) {
 			stats_ptr = RTE_PTR_ADD(&eth_stats,
-					rte_rxq_stats_strings[i].offset +
+					legacy_rxq_stats_map[i].offset +
 					q * sizeof(uint64_t));
 			val = *stats_ptr;
 			xstats[count++].value = val;
 		}
 	}
 
+skip_legacy_rxq:
 	/* per-txq stats */
+	if (dev->dev_ops->txq_stats_get) {
+		nb_txqs = dev->data->nb_tx_queues;
+		for (q = 0; q < nb_txqs; q++) {
+			struct pmd_eth_txq_stats txq_stats;
+
+			memset(&txq_stats, 0, sizeof(txq_stats));
+			if (dev->dev_ops->txq_stats_get(dev, q, &txq_stats)) {
+				count += RTE_NB_TXQ_STATS;
+				continue;
+			}
+			for (i = 0; i < RTE_NB_TXQ_STATS; i++) {
+				stats_ptr = RTE_PTR_ADD(&txq_stats,
+						txq_stats_map[i].offset);
+				val = *stats_ptr;
+				xstats[count++].value = val;
+			}
+		}
+		goto skip_legacy_txq;
+	}
+
+	nb_txqs = RTE_MIN(dev->data->nb_tx_queues, RTE_ETHDEV_QUEUE_STAT_CNTRS);
 	for (q = 0; q < nb_txqs; q++) {
-		for (i = 0; i < RTE_NB_TXQ_STATS; i++) {
+		for (i = 0; i < RTE_NB_LEGACY_TXQ_STATS; i++) {
 			stats_ptr = RTE_PTR_ADD(&eth_stats,
-					rte_txq_stats_strings[i].offset +
+					legacy_txq_stats_map[i].offset +
 					q * sizeof(uint64_t));
 			val = *stats_ptr;
 			xstats[count++].value = val;
 		}
 	}
+
+skip_legacy_txq:
 	return count;
 }
 
@@ -2387,19 +2523,14 @@  struct rte_eth_dev *
 	struct rte_eth_dev *dev;
 	unsigned int count = 0, i;
 	signed int xcount = 0;
-	uint16_t nb_rxqs, nb_txqs;
 	int ret;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
 
 	dev = &rte_eth_devices[port_id];
 
-	nb_rxqs = RTE_MIN(dev->data->nb_rx_queues, RTE_ETHDEV_QUEUE_STAT_CNTRS);
-	nb_txqs = RTE_MIN(dev->data->nb_tx_queues, RTE_ETHDEV_QUEUE_STAT_CNTRS);
-
 	/* Return generic statistics */
-	count = RTE_NB_STATS + (nb_rxqs * RTE_NB_RXQ_STATS) +
-		(nb_txqs * RTE_NB_TXQ_STATS);
+	count = get_xstats_basic_count(dev);
 
 	/* implemented by the driver */
 	if (dev->dev_ops->xstats_get != NULL) {
diff --git a/lib/librte_ethdev/rte_ethdev_core.h b/lib/librte_ethdev/rte_ethdev_core.h
index 8f03f83..63375fe 100644
--- a/lib/librte_ethdev/rte_ethdev_core.h
+++ b/lib/librte_ethdev/rte_ethdev_core.h
@@ -97,6 +97,16 @@  typedef int (*eth_xstats_get_names_by_id_t)(struct rte_eth_dev *dev,
 	unsigned int size);
 /**< @internal Get names of extended stats of an Ethernet device. */
 
+struct pmd_eth_rxq_stats;
+typedef int (*eth_rxq_stats_get)(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+	struct pmd_eth_rxq_stats *rx_queue_stats);
+/**< @internal Get statistics for a rx queue of an Ethernet device. */
+
+struct pmd_eth_txq_stats;
+typedef int (*eth_txq_stats_get)(struct rte_eth_dev *dev, uint16_t tx_queue_id,
+	struct pmd_eth_txq_stats *tx_queue_stats);
+/**< @internal Get statistics for a tx queue of an Ethernet device. */
+
 typedef int (*eth_queue_stats_mapping_set_t)(struct rte_eth_dev *dev,
 					     uint16_t queue_id,
 					     uint8_t stat_idx,
@@ -501,6 +511,9 @@  struct eth_dev_ops {
 	eth_xstats_get_names_by_id_t xstats_get_names_by_id;
 	/**< Get name of extended device statistics by ID. */
 
+	eth_rxq_stats_get rxq_stats_get; /**< Stats per rxq */
+	eth_txq_stats_get txq_stats_get; /**< Stats per txq */
+
 	eth_tm_ops_get_t tm_ops_get;
 	/**< Get Traffic Management (TM) operations. */
 
diff --git a/lib/librte_ethdev/rte_ethdev_driver.h b/lib/librte_ethdev/rte_ethdev_driver.h
index c2ac263..33a4b22 100644
--- a/lib/librte_ethdev/rte_ethdev_driver.h
+++ b/lib/librte_ethdev/rte_ethdev_driver.h
@@ -331,6 +331,24 @@  typedef int (*ethdev_bus_specific_init)(struct rte_eth_dev *ethdev,
 int __rte_experimental
 rte_eth_dev_destroy(struct rte_eth_dev *ethdev, ethdev_uninit_t ethdev_uninit);
 
+/**
+ * @internal
+ *
+ * Internal structures used by PMDs to provide per rx/tx queue stats to the
+ * ethdev layer.
+ */
+struct pmd_eth_rxq_stats {
+	uint64_t packets;
+	uint64_t bytes;
+	uint64_t errors;
+};
+
+struct pmd_eth_txq_stats {
+	uint64_t packets;
+	uint64_t bytes;
+	uint64_t errors;
+};
+
 #ifdef __cplusplus
 }
 #endif
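
For completeness, the application side is unchanged: the new per-queue entries
simply appear through the existing xstats API. A minimal consumer sketch
(error handling abridged):

#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <rte_ethdev.h>

/* Print the per-queue xstats ("rx_q0packets", "tx_q1bytes", ...) of a port. */
static void
dump_queue_xstats(uint16_t port_id)
{
	struct rte_eth_xstat_name *names;
	struct rte_eth_xstat *values;
	int cnt, i;

	cnt = rte_eth_xstats_get_names(port_id, NULL, 0);
	if (cnt <= 0)
		return;
	names = calloc(cnt, sizeof(*names));
	values = calloc(cnt, sizeof(*values));
	if (names == NULL || values == NULL)
		goto out;
	if (rte_eth_xstats_get_names(port_id, names, cnt) != cnt ||
	    rte_eth_xstats_get(port_id, values, cnt) != cnt)
		goto out;
	for (i = 0; i < cnt; i++) {
		if (strncmp(names[i].name, "rx_q", 4) != 0 &&
		    strncmp(names[i].name, "tx_q", 4) != 0)
			continue;
		printf("%s: %" PRIu64 "\n", names[i].name, values[i].value);
	}
out:
	free(names);
	free(values);
}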