[24/25] net/ena: fix direct access to shared memory config
Checks
Commit Message
The ENA driver calculates a ring's NUMA node affinity by directly
accessing the memzone list. Fix it to do it through the public
API's instead.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
drivers/net/ena/ena_ethdev.c | 18 +++---------------
1 file changed, 3 insertions(+), 15 deletions(-)
Comments
On 29.05.2019 18:31, Anatoly Burakov wrote:
> The ENA driver calculates a ring's NUMA node affinity by directly
> accessing the memzone list. Fix it to do it through the public
> API's instead.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
> drivers/net/ena/ena_ethdev.c | 18 +++---------------
> 1 file changed, 3 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
> index b6651fc0f..e745e9e92 100644
> --- a/drivers/net/ena/ena_ethdev.c
> +++ b/drivers/net/ena/ena_ethdev.c
> @@ -274,20 +274,6 @@ static const struct eth_dev_ops ena_dev_ops = {
>
> #define NUMA_NO_NODE SOCKET_ID_ANY
>
> -static inline int ena_cpu_to_node(int cpu)
> -{
> - struct rte_config *config = rte_eal_get_configuration();
> - struct rte_fbarray *arr = &config->mem_config->memzones;
> - const struct rte_memzone *mz;
> -
> - if (unlikely(cpu >= RTE_MAX_MEMZONE))
> - return NUMA_NO_NODE;
> -
> - mz = rte_fbarray_get(arr, cpu);
> -
> - return mz->socket_id;
> -}
> -
> static inline void ena_rx_mbuf_prepare(struct rte_mbuf *mbuf,
> struct ena_com_rx_ctx *ena_rx_ctx)
> {
> @@ -1099,6 +1085,7 @@ static int ena_create_io_queue(struct ena_ring *ring)
> {
> struct ena_adapter *adapter;
> struct ena_com_dev *ena_dev;
> + struct rte_memseg_list *msl;
> struct ena_com_create_io_ctx ctx =
> /* policy set to _HOST just to satisfy icc compiler */
> { ENA_ADMIN_PLACEMENT_POLICY_HOST,
> @@ -1126,7 +1113,8 @@ static int ena_create_io_queue(struct ena_ring *ring)
> }
> ctx.qid = ena_qid;
> ctx.msix_vector = -1; /* interrupts not used */
> - ctx.numa_node = ena_cpu_to_node(ring->id);
> + msl = rte_mem_virt2memseg_list(ring);
> + ctx.numa_node = msl->socket_id;
>
> rc = ena_com_create_io_queue(ena_dev, &ctx);
> if (rc) {
>
Hi Anatoly,
I'm not sure why the previous maintainers implemented this that way, I
can only guess. I think that they were assuming, that each queue will be
assigned to the lcore which is equal to ring id. They probably also
misunderstood how the memzones are working and they thought that each
lcore is having assigned only one memzone which is being mapped 1 to 1.
They wanted to prevent cross NUMA data acces, when the CPU is operating
in the different NUMA zone and the IO queues memory resides in the
other. I think that above solution won't prevent that neither, as you
are using ring address, which is being allocated together with
struct ena_adapter (it is just an array), so it will probably reside in
the single numa zone.
I'm currently thinking on solution that could help us to determine on
which numa zone the queue descriptors will be allocated and on which the
lcore assigned to the queue will be working, but have no any ideas for
now :)
Anyway, your fix won't break anything, as the previous solution wasn't
working as it was supposed to work, so before I will fix that, we can
keep that patch to prevent direct usage of the memzone.
Thanks,
Michal
On 03.06.2019 09:33, Michał Krawczyk wrote:
> On 29.05.2019 18:31, Anatoly Burakov wrote:
>> The ENA driver calculates a ring's NUMA node affinity by directly
>> accessing the memzone list. Fix it to do it through the public
>> API's instead.
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>> ---
>> drivers/net/ena/ena_ethdev.c | 18 +++---------------
>> 1 file changed, 3 insertions(+), 15 deletions(-)
>>
>> diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
>> index b6651fc0f..e745e9e92 100644
>> --- a/drivers/net/ena/ena_ethdev.c
>> +++ b/drivers/net/ena/ena_ethdev.c
>> @@ -274,20 +274,6 @@ static const struct eth_dev_ops ena_dev_ops = {
>> #define NUMA_NO_NODE SOCKET_ID_ANY
>> -static inline int ena_cpu_to_node(int cpu)
>> -{
>> - struct rte_config *config = rte_eal_get_configuration();
>> - struct rte_fbarray *arr = &config->mem_config->memzones;
>> - const struct rte_memzone *mz;
>> -
>> - if (unlikely(cpu >= RTE_MAX_MEMZONE))
>> - return NUMA_NO_NODE;
>> -
>> - mz = rte_fbarray_get(arr, cpu);
>> -
>> - return mz->socket_id;
>> -}
>> -
>> static inline void ena_rx_mbuf_prepare(struct rte_mbuf *mbuf,
>> struct ena_com_rx_ctx *ena_rx_ctx)
>> {
>> @@ -1099,6 +1085,7 @@ static int ena_create_io_queue(struct ena_ring
>> *ring)
>> {
>> struct ena_adapter *adapter;
>> struct ena_com_dev *ena_dev;
>> + struct rte_memseg_list *msl;
>> struct ena_com_create_io_ctx ctx =
>> /* policy set to _HOST just to satisfy icc compiler */
>> { ENA_ADMIN_PLACEMENT_POLICY_HOST,
>> @@ -1126,7 +1113,8 @@ static int ena_create_io_queue(struct ena_ring
>> *ring)
>> }
>> ctx.qid = ena_qid;
>> ctx.msix_vector = -1; /* interrupts not used */
>> - ctx.numa_node = ena_cpu_to_node(ring->id);
>> + msl = rte_mem_virt2memseg_list(ring);
>> + ctx.numa_node = msl->socket_id;
>> rc = ena_com_create_io_queue(ena_dev, &ctx);
>> if (rc) {
>>
>
> Hi Anatoly,
>
> I'm not sure why the previous maintainers implemented this that way, I
> can only guess. I think that they were assuming, that each queue will be
> assigned to the lcore which is equal to ring id. They probably also
> misunderstood how the memzones are working and they thought that each
> lcore is having assigned only one memzone which is being mapped 1 to 1.
>
> They wanted to prevent cross NUMA data acces, when the CPU is operating
> in the different NUMA zone and the IO queues memory resides in the
> other. I think that above solution won't prevent that neither, as you
> are using ring address, which is being allocated together with
> struct ena_adapter (it is just an array), so it will probably reside in
> the single numa zone.
>
> I'm currently thinking on solution that could help us to determine on
> which numa zone the queue descriptors will be allocated and on which the
> lcore assigned to the queue will be working, but have no any ideas for
> now :)
>
> Anyway, your fix won't break anything, as the previous solution wasn't
> working as it was supposed to work, so before I will fix that, we can
> keep that patch to prevent direct usage of the memzone.
>
> Thanks,
> Michal
After investigation I think that we should use socket_id provided by the
tx/rx queue setup functions.
Could you, please, abandon this patch? I will send the proper fix soon.
Thanks,
Michal
On 03-Jun-19 2:36 PM, Michał Krawczyk wrote:
> On 03.06.2019 09:33, Michał Krawczyk wrote:
>> On 29.05.2019 18:31, Anatoly Burakov wrote:
>>> The ENA driver calculates a ring's NUMA node affinity by directly
>>> accessing the memzone list. Fix it to do it through the public
>>> API's instead.
>>>
>>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>>> ---
>>> drivers/net/ena/ena_ethdev.c | 18 +++---------------
>>> 1 file changed, 3 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
>>> index b6651fc0f..e745e9e92 100644
>>> --- a/drivers/net/ena/ena_ethdev.c
>>> +++ b/drivers/net/ena/ena_ethdev.c
>>> @@ -274,20 +274,6 @@ static const struct eth_dev_ops ena_dev_ops = {
>>> #define NUMA_NO_NODE SOCKET_ID_ANY
>>> -static inline int ena_cpu_to_node(int cpu)
>>> -{
>>> - struct rte_config *config = rte_eal_get_configuration();
>>> - struct rte_fbarray *arr = &config->mem_config->memzones;
>>> - const struct rte_memzone *mz;
>>> -
>>> - if (unlikely(cpu >= RTE_MAX_MEMZONE))
>>> - return NUMA_NO_NODE;
>>> -
>>> - mz = rte_fbarray_get(arr, cpu);
>>> -
>>> - return mz->socket_id;
>>> -}
>>> -
>>> static inline void ena_rx_mbuf_prepare(struct rte_mbuf *mbuf,
>>> struct ena_com_rx_ctx *ena_rx_ctx)
>>> {
>>> @@ -1099,6 +1085,7 @@ static int ena_create_io_queue(struct ena_ring
>>> *ring)
>>> {
>>> struct ena_adapter *adapter;
>>> struct ena_com_dev *ena_dev;
>>> + struct rte_memseg_list *msl;
>>> struct ena_com_create_io_ctx ctx =
>>> /* policy set to _HOST just to satisfy icc compiler */
>>> { ENA_ADMIN_PLACEMENT_POLICY_HOST,
>>> @@ -1126,7 +1113,8 @@ static int ena_create_io_queue(struct ena_ring
>>> *ring)
>>> }
>>> ctx.qid = ena_qid;
>>> ctx.msix_vector = -1; /* interrupts not used */
>>> - ctx.numa_node = ena_cpu_to_node(ring->id);
>>> + msl = rte_mem_virt2memseg_list(ring);
>>> + ctx.numa_node = msl->socket_id;
>>> rc = ena_com_create_io_queue(ena_dev, &ctx);
>>> if (rc) {
>>>
>>
>> Hi Anatoly,
>>
>> I'm not sure why the previous maintainers implemented this that way, I
>> can only guess. I think that they were assuming, that each queue will
>> be assigned to the lcore which is equal to ring id. They probably also
>> misunderstood how the memzones are working and they thought that each
>> lcore is having assigned only one memzone which is being mapped 1 to 1.
>>
>> They wanted to prevent cross NUMA data acces, when the CPU is
>> operating in the different NUMA zone and the IO queues memory resides
>> in the other. I think that above solution won't prevent that neither,
>> as you are using ring address, which is being allocated together with
>> struct ena_adapter (it is just an array), so it will probably reside
>> in the single numa zone.
>>
>> I'm currently thinking on solution that could help us to determine on
>> which numa zone the queue descriptors will be allocated and on which
>> the lcore assigned to the queue will be working, but have no any ideas
>> for now :)
>>
>> Anyway, your fix won't break anything, as the previous solution wasn't
>> working as it was supposed to work, so before I will fix that, we can
>> keep that patch to prevent direct usage of the memzone.
>>
>> Thanks,
>> Michal
>
> After investigation I think that we should use socket_id provided by the
> tx/rx queue setup functions.
> Could you, please, abandon this patch? I will send the proper fix soon.
>
I can't really "abandon" it as it will break ENA compilation once the
structure is hidden in the last patch. What i can do is wait for you to
submit your patch, and either rebase my patchset on top of it, or
(better) include it in the patchset itself.
> Thanks,
> Michal
>
wt., 4 cze 2019 o 12:28 Burakov, Anatoly <anatoly.burakov@intel.com> napisał(a):
>
> On 03-Jun-19 2:36 PM, Michał Krawczyk wrote:
> > On 03.06.2019 09:33, Michał Krawczyk wrote:
> >> On 29.05.2019 18:31, Anatoly Burakov wrote:
> >>> The ENA driver calculates a ring's NUMA node affinity by directly
> >>> accessing the memzone list. Fix it to do it through the public
> >>> API's instead.
> >>>
> >>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> >>> ---
> >>> drivers/net/ena/ena_ethdev.c | 18 +++---------------
> >>> 1 file changed, 3 insertions(+), 15 deletions(-)
> >>>
> >>> diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
> >>> index b6651fc0f..e745e9e92 100644
> >>> --- a/drivers/net/ena/ena_ethdev.c
> >>> +++ b/drivers/net/ena/ena_ethdev.c
> >>> @@ -274,20 +274,6 @@ static const struct eth_dev_ops ena_dev_ops = {
> >>> #define NUMA_NO_NODE SOCKET_ID_ANY
> >>> -static inline int ena_cpu_to_node(int cpu)
> >>> -{
> >>> - struct rte_config *config = rte_eal_get_configuration();
> >>> - struct rte_fbarray *arr = &config->mem_config->memzones;
> >>> - const struct rte_memzone *mz;
> >>> -
> >>> - if (unlikely(cpu >= RTE_MAX_MEMZONE))
> >>> - return NUMA_NO_NODE;
> >>> -
> >>> - mz = rte_fbarray_get(arr, cpu);
> >>> -
> >>> - return mz->socket_id;
> >>> -}
> >>> -
> >>> static inline void ena_rx_mbuf_prepare(struct rte_mbuf *mbuf,
> >>> struct ena_com_rx_ctx *ena_rx_ctx)
> >>> {
> >>> @@ -1099,6 +1085,7 @@ static int ena_create_io_queue(struct ena_ring
> >>> *ring)
> >>> {
> >>> struct ena_adapter *adapter;
> >>> struct ena_com_dev *ena_dev;
> >>> + struct rte_memseg_list *msl;
> >>> struct ena_com_create_io_ctx ctx =
> >>> /* policy set to _HOST just to satisfy icc compiler */
> >>> { ENA_ADMIN_PLACEMENT_POLICY_HOST,
> >>> @@ -1126,7 +1113,8 @@ static int ena_create_io_queue(struct ena_ring
> >>> *ring)
> >>> }
> >>> ctx.qid = ena_qid;
> >>> ctx.msix_vector = -1; /* interrupts not used */
> >>> - ctx.numa_node = ena_cpu_to_node(ring->id);
> >>> + msl = rte_mem_virt2memseg_list(ring);
> >>> + ctx.numa_node = msl->socket_id;
> >>> rc = ena_com_create_io_queue(ena_dev, &ctx);
> >>> if (rc) {
> >>>
> >>
> >> Hi Anatoly,
> >>
> >> I'm not sure why the previous maintainers implemented this that way, I
> >> can only guess. I think that they were assuming, that each queue will
> >> be assigned to the lcore which is equal to ring id. They probably also
> >> misunderstood how the memzones are working and they thought that each
> >> lcore is having assigned only one memzone which is being mapped 1 to 1.
> >>
> >> They wanted to prevent cross NUMA data acces, when the CPU is
> >> operating in the different NUMA zone and the IO queues memory resides
> >> in the other. I think that above solution won't prevent that neither,
> >> as you are using ring address, which is being allocated together with
> >> struct ena_adapter (it is just an array), so it will probably reside
> >> in the single numa zone.
> >>
> >> I'm currently thinking on solution that could help us to determine on
> >> which numa zone the queue descriptors will be allocated and on which
> >> the lcore assigned to the queue will be working, but have no any ideas
> >> for now :)
> >>
> >> Anyway, your fix won't break anything, as the previous solution wasn't
> >> working as it was supposed to work, so before I will fix that, we can
> >> keep that patch to prevent direct usage of the memzone.
> >>
> >> Thanks,
> >> Michal
> >
> > After investigation I think that we should use socket_id provided by the
> > tx/rx queue setup functions.
> > Could you, please, abandon this patch? I will send the proper fix soon.
> >
>
> I can't really "abandon" it as it will break ENA compilation once the
> structure is hidden in the last patch. What i can do is wait for you to
> submit your patch, and either rebase my patchset on top of it, or
> (better) include it in the patchset itself.
Ok, I've just uploaded the patch (second version has fixed commit
log), you can find it below
https://patches.dpdk.org/patch/54352/
I'm fine with including the patch into this patchset.
>
> > Thanks,
> > Michal
> >
>
>
> --
> Thanks,
> Anatoly
On 04-Jun-19 11:45 AM, Michał Krawczyk wrote:
> wt., 4 cze 2019 o 12:28 Burakov, Anatoly <anatoly.burakov@intel.com> napisał(a):
>>
>> On 03-Jun-19 2:36 PM, Michał Krawczyk wrote:
>>> On 03.06.2019 09:33, Michał Krawczyk wrote:
>>>> On 29.05.2019 18:31, Anatoly Burakov wrote:
>>>>> The ENA driver calculates a ring's NUMA node affinity by directly
>>>>> accessing the memzone list. Fix it to do it through the public
>>>>> API's instead.
>>>>>
>>>>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>>>>> ---
>>>>> drivers/net/ena/ena_ethdev.c | 18 +++---------------
>>>>> 1 file changed, 3 insertions(+), 15 deletions(-)
>>>>>
>>>>> diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
>>>>> index b6651fc0f..e745e9e92 100644
>>>>> --- a/drivers/net/ena/ena_ethdev.c
>>>>> +++ b/drivers/net/ena/ena_ethdev.c
>>>>> @@ -274,20 +274,6 @@ static const struct eth_dev_ops ena_dev_ops = {
>>>>> #define NUMA_NO_NODE SOCKET_ID_ANY
>>>>> -static inline int ena_cpu_to_node(int cpu)
>>>>> -{
>>>>> - struct rte_config *config = rte_eal_get_configuration();
>>>>> - struct rte_fbarray *arr = &config->mem_config->memzones;
>>>>> - const struct rte_memzone *mz;
>>>>> -
>>>>> - if (unlikely(cpu >= RTE_MAX_MEMZONE))
>>>>> - return NUMA_NO_NODE;
>>>>> -
>>>>> - mz = rte_fbarray_get(arr, cpu);
>>>>> -
>>>>> - return mz->socket_id;
>>>>> -}
>>>>> -
>>>>> static inline void ena_rx_mbuf_prepare(struct rte_mbuf *mbuf,
>>>>> struct ena_com_rx_ctx *ena_rx_ctx)
>>>>> {
>>>>> @@ -1099,6 +1085,7 @@ static int ena_create_io_queue(struct ena_ring
>>>>> *ring)
>>>>> {
>>>>> struct ena_adapter *adapter;
>>>>> struct ena_com_dev *ena_dev;
>>>>> + struct rte_memseg_list *msl;
>>>>> struct ena_com_create_io_ctx ctx =
>>>>> /* policy set to _HOST just to satisfy icc compiler */
>>>>> { ENA_ADMIN_PLACEMENT_POLICY_HOST,
>>>>> @@ -1126,7 +1113,8 @@ static int ena_create_io_queue(struct ena_ring
>>>>> *ring)
>>>>> }
>>>>> ctx.qid = ena_qid;
>>>>> ctx.msix_vector = -1; /* interrupts not used */
>>>>> - ctx.numa_node = ena_cpu_to_node(ring->id);
>>>>> + msl = rte_mem_virt2memseg_list(ring);
>>>>> + ctx.numa_node = msl->socket_id;
>>>>> rc = ena_com_create_io_queue(ena_dev, &ctx);
>>>>> if (rc) {
>>>>>
>>>>
>>>> Hi Anatoly,
>>>>
>>>> I'm not sure why the previous maintainers implemented this that way, I
>>>> can only guess. I think that they were assuming, that each queue will
>>>> be assigned to the lcore which is equal to ring id. They probably also
>>>> misunderstood how the memzones are working and they thought that each
>>>> lcore is having assigned only one memzone which is being mapped 1 to 1.
>>>>
>>>> They wanted to prevent cross NUMA data acces, when the CPU is
>>>> operating in the different NUMA zone and the IO queues memory resides
>>>> in the other. I think that above solution won't prevent that neither,
>>>> as you are using ring address, which is being allocated together with
>>>> struct ena_adapter (it is just an array), so it will probably reside
>>>> in the single numa zone.
>>>>
>>>> I'm currently thinking on solution that could help us to determine on
>>>> which numa zone the queue descriptors will be allocated and on which
>>>> the lcore assigned to the queue will be working, but have no any ideas
>>>> for now :)
>>>>
>>>> Anyway, your fix won't break anything, as the previous solution wasn't
>>>> working as it was supposed to work, so before I will fix that, we can
>>>> keep that patch to prevent direct usage of the memzone.
>>>>
>>>> Thanks,
>>>> Michal
>>>
>>> After investigation I think that we should use socket_id provided by the
>>> tx/rx queue setup functions.
>>> Could you, please, abandon this patch? I will send the proper fix soon.
>>>
>>
>> I can't really "abandon" it as it will break ENA compilation once the
>> structure is hidden in the last patch. What i can do is wait for you to
>> submit your patch, and either rebase my patchset on top of it, or
>> (better) include it in the patchset itself.
>
> Ok, I've just uploaded the patch (second version has fixed commit
> log), you can find it below
> https://patches.dpdk.org/patch/54352/
>
> I'm fine with including the patch into this patchset.
>
Thanks for your quick implementation!
I'll include it in the inevitable v2.
>>
>>> Thanks,
>>> Michal
>>>
>>
>>
>> --
>> Thanks,
>> Anatoly
>
@@ -274,20 +274,6 @@ static const struct eth_dev_ops ena_dev_ops = {
#define NUMA_NO_NODE SOCKET_ID_ANY
-static inline int ena_cpu_to_node(int cpu)
-{
- struct rte_config *config = rte_eal_get_configuration();
- struct rte_fbarray *arr = &config->mem_config->memzones;
- const struct rte_memzone *mz;
-
- if (unlikely(cpu >= RTE_MAX_MEMZONE))
- return NUMA_NO_NODE;
-
- mz = rte_fbarray_get(arr, cpu);
-
- return mz->socket_id;
-}
-
static inline void ena_rx_mbuf_prepare(struct rte_mbuf *mbuf,
struct ena_com_rx_ctx *ena_rx_ctx)
{
@@ -1099,6 +1085,7 @@ static int ena_create_io_queue(struct ena_ring *ring)
{
struct ena_adapter *adapter;
struct ena_com_dev *ena_dev;
+ struct rte_memseg_list *msl;
struct ena_com_create_io_ctx ctx =
/* policy set to _HOST just to satisfy icc compiler */
{ ENA_ADMIN_PLACEMENT_POLICY_HOST,
@@ -1126,7 +1113,8 @@ static int ena_create_io_queue(struct ena_ring *ring)
}
ctx.qid = ena_qid;
ctx.msix_vector = -1; /* interrupts not used */
- ctx.numa_node = ena_cpu_to_node(ring->id);
+ msl = rte_mem_virt2memseg_list(ring);
+ ctx.numa_node = msl->socket_id;
rc = ena_com_create_io_queue(ena_dev, &ctx);
if (rc) {