[dpdk-dev] vhost: Fix retrieval of numa information in PMD

Message ID 1459872587-11655-1-git-send-email-ciara.loftus@intel.com (mailing list archive)
State Accepted, archived

Commit Message

Loftus, Ciara April 5, 2016, 4:09 p.m. UTC
  After some testing, it was found that retrieving numa information
about a vhost device via a call to get_mempolicy is more
accurate when performed during the new_device callback versus
the vring_state_changed callback, in particular upon initial boot
of the VM.  Performing this check during new_device is also
potentially more efficient as this callback is only triggered once
during device initialisation, compared with vring_state_changed
which may be called multiple times depending on the number of
queues assigned to the device.

Reorganise the code to perform this check and assign the correct
socket_id to the device during the new_device callback.

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
---
 drivers/net/vhost/rte_eth_vhost.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)
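
For reference, the numa lookup that this patch moves into new_device() boils down to the following pattern (a minimal standalone sketch for illustration only; addr_to_numa_node() is a hypothetical helper, not part of the patch):

#include <numaif.h>	/* get_mempolicy(); link with -lnuma */

/* Return the NUMA node backing the page that holds 'addr', or -1 on error. */
static int
addr_to_numa_node(void *addr)
{
	int node;

	if (get_mempolicy(&node, NULL, 0, addr, MPOL_F_NODE | MPOL_F_ADDR) < 0)
		return -1;
	return node;
}

As the discussion below notes, the vhost library's numa_realloc() runs before new_device(), so by the time new_device() queries the node, the virtio_net struct should already sit on its final socket.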
  

Comments

Yuanhan Liu April 6, 2016, 5:03 a.m. UTC | #1
On Tue, Apr 05, 2016 at 05:09:47PM +0100, Ciara Loftus wrote:
> After some testing, it was found that retrieving numa information
> about a vhost device via a call to get_mempolicy is more
> accurate when performed during the new_device callback versus
> the vring_state_changed callback, in particular upon initial boot
> of the VM.  Performing this check during new_device is also
> potentially more efficient as this callback is only triggered once
> during device initialisation, compared with vring_state_changed
> which may be called multiple times depending on the number of
> queues assigned to the device.
> 
> Reorganise the code to perform this check and assign the correct
> socket_id to the device during the new_device callback.
> 
> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> ---

Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

	--yliu
  
Jianfeng Tan April 6, 2016, 5:32 a.m. UTC | #2
Hi,

Just out of interest: it seems that the message handling thread which
runs new_device() is created via pthread_create() from the thread
which calls dev_start(), usually the master thread, right? But it is
not necessarily the master thread that polls pkts from this vhost
port, right? So what is the significance of recording the numa_node
information of the message handling thread here? Shall we make the
numa_realloc decision based on the final PMD thread that is
responsible for polling this vhost port?

It's not related to this patch itself, and the patch looks good to me.


Thanks,
Jianfeng



On 4/6/2016 12:09 AM, Ciara Loftus wrote:
> After some testing, it was found that retrieving numa information
> about a vhost device via a call to get_mempolicy is more
> accurate when performed during the new_device callback versus
> the vring_state_changed callback, in particular upon initial boot
> of the VM.  Performing this check during new_device is also
> potentially more efficient as this callback is only triggered once
> during device initialisation, compared with vring_state_changed
> which may be called multiple times depending on the number of
> queues assigned to the device.
>
> Reorganise the code to perform this check and assign the correct
> socket_id to the device during the new_device callback.
>
> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> ---
>   drivers/net/vhost/rte_eth_vhost.c | 28 ++++++++++++++--------------
>   1 file changed, 14 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
> index 4cc6bec..b1eb082 100644
> --- a/drivers/net/vhost/rte_eth_vhost.c
> +++ b/drivers/net/vhost/rte_eth_vhost.c
> @@ -229,6 +229,9 @@ new_device(struct virtio_net *dev)
>   	struct pmd_internal *internal;
>   	struct vhost_queue *vq;
>   	unsigned i;
> +#ifdef RTE_LIBRTE_VHOST_NUMA
> +	int newnode, ret;
> +#endif
>   
>   	if (dev == NULL) {
>   		RTE_LOG(INFO, PMD, "Invalid argument\n");
> @@ -244,6 +247,17 @@ new_device(struct virtio_net *dev)
>   	eth_dev = list->eth_dev;
>   	internal = eth_dev->data->dev_private;
>   
> +#ifdef RTE_LIBRTE_VHOST_NUMA
> +	ret  = get_mempolicy(&newnode, NULL, 0, dev,
> +			MPOL_F_NODE | MPOL_F_ADDR);
> +	if (ret < 0) {
> +		RTE_LOG(ERR, PMD, "Unknown numa node\n");
> +		return -1;
> +	}
> +
> +	eth_dev->data->numa_node = newnode;
> +#endif
> +
>   	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
>   		vq = eth_dev->data->rx_queues[i];
>   		if (vq == NULL)
> @@ -352,9 +366,6 @@ vring_state_changed(struct virtio_net *dev, uint16_t vring, int enable)
>   	struct rte_vhost_vring_state *state;
>   	struct rte_eth_dev *eth_dev;
>   	struct internal_list *list;
> -#ifdef RTE_LIBRTE_VHOST_NUMA
> -	int newnode, ret;
> -#endif
>   
>   	if (dev == NULL) {
>   		RTE_LOG(ERR, PMD, "Invalid argument\n");
> @@ -370,17 +381,6 @@ vring_state_changed(struct virtio_net *dev, uint16_t vring, int enable)
>   	eth_dev = list->eth_dev;
>   	/* won't be NULL */
>   	state = vring_states[eth_dev->data->port_id];
> -
> -#ifdef RTE_LIBRTE_VHOST_NUMA
> -	ret  = get_mempolicy(&newnode, NULL, 0, dev,
> -			MPOL_F_NODE | MPOL_F_ADDR);
> -	if (ret < 0) {
> -		RTE_LOG(ERR, PMD, "Unknown numa node\n");
> -		return -1;
> -	}
> -
> -	eth_dev->data->numa_node = newnode;
> -#endif
>   	rte_spinlock_lock(&state->lock);
>   	state->cur[vring] = enable;
>   	state->max_vring = RTE_MAX(vring, state->max_vring);
  
Yuanhan Liu April 6, 2016, 5:44 a.m. UTC | #3
On Wed, Apr 06, 2016 at 01:32:12PM +0800, Tan, Jianfeng wrote:
> Hi,
> 
> Just out of interest: it seems that the message handling thread which
> runs new_device() is created via pthread_create() from the thread
> which calls dev_start(), usually the master thread, right? But it is
> not necessarily the master thread that polls pkts from this vhost
> port, right? So what is the significance of recording the numa_node
> information of the message handling thread here? Shall we make the
> numa_realloc decision based on the final PMD thread that is
> responsible for polling this vhost port?

It doesn't matter on which core we make the decision: the result
would be the same, since we are querying the numa node info of the
virtio_net dev struct.

	--yliu
> 
> It's not related to this patch itself, and the patch looks good to me.
> 
> 
> Thanks,
> Jianfeng
> 
> 
> 
> On 4/6/2016 12:09 AM, Ciara Loftus wrote:
> >After some testing, it was found that retrieving numa information
> >about a vhost device via a call to get_mempolicy is more
> >accurate when performed during the new_device callback versus
> >the vring_state_changed callback, in particular upon initial boot
> >of the VM.  Performing this check during new_device is also
> >potentially more efficient as this callback is only triggered once
> >during device initialisation, compared with vring_state_changed
> >which may be called multiple times depending on the number of
> >queues assigned to the device.
> >
> >Reorganise the code to perform this check and assign the correct
> >socket_id to the device during the new_device callback.
> >
> >Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> >---
> >  drivers/net/vhost/rte_eth_vhost.c | 28 ++++++++++++++--------------
> >  1 file changed, 14 insertions(+), 14 deletions(-)
> >
> >diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
> >index 4cc6bec..b1eb082 100644
> >--- a/drivers/net/vhost/rte_eth_vhost.c
> >+++ b/drivers/net/vhost/rte_eth_vhost.c
> >@@ -229,6 +229,9 @@ new_device(struct virtio_net *dev)
> >  	struct pmd_internal *internal;
> >  	struct vhost_queue *vq;
> >  	unsigned i;
> >+#ifdef RTE_LIBRTE_VHOST_NUMA
> >+	int newnode, ret;
> >+#endif
> >  	if (dev == NULL) {
> >  		RTE_LOG(INFO, PMD, "Invalid argument\n");
> >@@ -244,6 +247,17 @@ new_device(struct virtio_net *dev)
> >  	eth_dev = list->eth_dev;
> >  	internal = eth_dev->data->dev_private;
> >+#ifdef RTE_LIBRTE_VHOST_NUMA
> >+	ret  = get_mempolicy(&newnode, NULL, 0, dev,
> >+			MPOL_F_NODE | MPOL_F_ADDR);
> >+	if (ret < 0) {
> >+		RTE_LOG(ERR, PMD, "Unknown numa node\n");
> >+		return -1;
> >+	}
> >+
> >+	eth_dev->data->numa_node = newnode;
> >+#endif
> >+
> >  	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
> >  		vq = eth_dev->data->rx_queues[i];
> >  		if (vq == NULL)
> >@@ -352,9 +366,6 @@ vring_state_changed(struct virtio_net *dev, uint16_t vring, int enable)
> >  	struct rte_vhost_vring_state *state;
> >  	struct rte_eth_dev *eth_dev;
> >  	struct internal_list *list;
> >-#ifdef RTE_LIBRTE_VHOST_NUMA
> >-	int newnode, ret;
> >-#endif
> >  	if (dev == NULL) {
> >  		RTE_LOG(ERR, PMD, "Invalid argument\n");
> >@@ -370,17 +381,6 @@ vring_state_changed(struct virtio_net *dev, uint16_t vring, int enable)
> >  	eth_dev = list->eth_dev;
> >  	/* won't be NULL */
> >  	state = vring_states[eth_dev->data->port_id];
> >-
> >-#ifdef RTE_LIBRTE_VHOST_NUMA
> >-	ret  = get_mempolicy(&newnode, NULL, 0, dev,
> >-			MPOL_F_NODE | MPOL_F_ADDR);
> >-	if (ret < 0) {
> >-		RTE_LOG(ERR, PMD, "Unknown numa node\n");
> >-		return -1;
> >-	}
> >-
> >-	eth_dev->data->numa_node = newnode;
> >-#endif
> >  	rte_spinlock_lock(&state->lock);
> >  	state->cur[vring] = enable;
> >  	state->max_vring = RTE_MAX(vring, state->max_vring);
  
Jianfeng Tan April 6, 2016, 6:05 a.m. UTC | #4
On 4/6/2016 1:44 PM, Yuanhan Liu wrote:
> On Wed, Apr 06, 2016 at 01:32:12PM +0800, Tan, Jianfeng wrote:
>> Hi,
>>
>> Just out of interest: it seems that the message handling thread which
>> runs new_device() is created via pthread_create() from the thread
>> which calls dev_start(), usually the master thread, right? But it is
>> not necessarily the master thread that polls pkts from this vhost
>> port, right? So what is the significance of recording the numa_node
>> information of the message handling thread here? Shall we make the
>> numa_realloc decision based on the final PMD thread that is
>> responsible for polling this vhost port?
> It doesn't matter on which core we make the decision: the result
> would be the same, since we are querying the numa node info of the
> virtio_net dev struct.

OK, following your hint I read numa_realloc() again: it is meant to
keep *dev (struct virtio_net) and *dev->virtqueue[] on the same numa
socket as the memory shared with virtio?

And numa_realloc() is called in vhost_set_vring_addr(), which happens
after new_device()?


>
> 	--yliu
>> It's not related to this patch itself, and the patch looks good to me.
>>
>>
>> Thanks,
>> Jianfeng
>>
>>
>>
>> On 4/6/2016 12:09 AM, Ciara Loftus wrote:
>>> After some testing, it was found that retrieving numa information
>>> about a vhost device via a call to get_mempolicy is more
>>> accurate when performed during the new_device callback versus
>>> the vring_state_changed callback, in particular upon initial boot
>>> of the VM.  Performing this check during new_device is also
>>> potentially more efficient as this callback is only triggered once
>>> during device initialisation, compared with vring_state_changed
>>> which may be called multiple times depending on the number of
>>> queues assigned to the device.
>>>
>>> Reorganise the code to perform this check and assign the correct
>>> socket_id to the device during the new_device callback.
>>>
>>> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
>>> ---
>>>   drivers/net/vhost/rte_eth_vhost.c | 28 ++++++++++++++--------------
>>>   1 file changed, 14 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
>>> index 4cc6bec..b1eb082 100644
>>> --- a/drivers/net/vhost/rte_eth_vhost.c
>>> +++ b/drivers/net/vhost/rte_eth_vhost.c
>>> @@ -229,6 +229,9 @@ new_device(struct virtio_net *dev)
>>>   	struct pmd_internal *internal;
>>>   	struct vhost_queue *vq;
>>>   	unsigned i;
>>> +#ifdef RTE_LIBRTE_VHOST_NUMA
>>> +	int newnode, ret;
>>> +#endif
>>>   	if (dev == NULL) {
>>>   		RTE_LOG(INFO, PMD, "Invalid argument\n");
>>> @@ -244,6 +247,17 @@ new_device(struct virtio_net *dev)
>>>   	eth_dev = list->eth_dev;
>>>   	internal = eth_dev->data->dev_private;
>>> +#ifdef RTE_LIBRTE_VHOST_NUMA
>>> +	ret  = get_mempolicy(&newnode, NULL, 0, dev,
>>> +			MPOL_F_NODE | MPOL_F_ADDR);
>>> +	if (ret < 0) {
>>> +		RTE_LOG(ERR, PMD, "Unknown numa node\n");
>>> +		return -1;
>>> +	}
>>> +
>>> +	eth_dev->data->numa_node = newnode;

If the above analysis is correct, could the dev seen here be reallocated by numa_realloc() later?

Thanks,
Jianfeng


>>> +#endif
>>> +
>>>   	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
>>>   		vq = eth_dev->data->rx_queues[i];
>>>   		if (vq == NULL)
>>> @@ -352,9 +366,6 @@ vring_state_changed(struct virtio_net *dev, uint16_t vring, int enable)
>>>   	struct rte_vhost_vring_state *state;
>>>   	struct rte_eth_dev *eth_dev;
>>>   	struct internal_list *list;
>>> -#ifdef RTE_LIBRTE_VHOST_NUMA
>>> -	int newnode, ret;
>>> -#endif
>>>   	if (dev == NULL) {
>>>   		RTE_LOG(ERR, PMD, "Invalid argument\n");
>>> @@ -370,17 +381,6 @@ vring_state_changed(struct virtio_net *dev, uint16_t vring, int enable)
>>>   	eth_dev = list->eth_dev;
>>>   	/* won't be NULL */
>>>   	state = vring_states[eth_dev->data->port_id];
>>> -
>>> -#ifdef RTE_LIBRTE_VHOST_NUMA
>>> -	ret  = get_mempolicy(&newnode, NULL, 0, dev,
>>> -			MPOL_F_NODE | MPOL_F_ADDR);
>>> -	if (ret < 0) {
>>> -		RTE_LOG(ERR, PMD, "Unknown numa node\n");
>>> -		return -1;
>>> -	}
>>> -
>>> -	eth_dev->data->numa_node = newnode;
>>> -#endif
>>>   	rte_spinlock_lock(&state->lock);
>>>   	state->cur[vring] = enable;
>>>   	state->max_vring = RTE_MAX(vring, state->max_vring);
  
Yuanhan Liu April 6, 2016, 6:17 a.m. UTC | #5
On Wed, Apr 06, 2016 at 02:05:51PM +0800, Tan, Jianfeng wrote:
> 
> 
> On 4/6/2016 1:44 PM, Yuanhan Liu wrote:
> >On Wed, Apr 06, 2016 at 01:32:12PM +0800, Tan, Jianfeng wrote:
> >>Hi,
> >>
> >>Just out of interest: it seems that the message handling thread which
> >>runs new_device() is created via pthread_create() from the thread
> >>which calls dev_start(), usually the master thread, right? But it is
> >>not necessarily the master thread that polls pkts from this vhost
> >>port, right? So what is the significance of recording the numa_node
> >>information of the message handling thread here? Shall we make the
> >>numa_realloc decision based on the final PMD thread that is
> >>responsible for polling this vhost port?
> >It doesn't matter on which core we make the decision: the result
> >would be the same, since we are querying the numa node info of the
> >virtio_net dev struct.
> 
> OK, following your hint I read numa_realloc() again: it is meant to
> keep *dev (struct virtio_net) and *dev->virtqueue[] on the same numa
> socket as the memory shared with virtio?

Sort of, and I'm guessing that the comment has already made it clear?

    /*
     * Reallocate virtio_dev and vhost_virtqueue data structure to make them on the
     * same numa node as the memory of vring descriptor.
     */

> 
> And numa_realloc() is called in vhost_set_vring_addr(), which happens
> after new_device()?

It's actually before new_device().
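
For reference, the reallocation amounts to roughly the following (a simplified, hand-written sketch of the logic, not the actual library code, which also reallocates the vhost_virtqueue structures and handles more error paths):

#include <string.h>
#include <numaif.h>
#include <rte_malloc.h>
#include <rte_virtio_net.h>

static struct virtio_net *
numa_realloc_sketch(struct virtio_net *dev, int index)
{
	struct virtio_net *new_dev;
	int oldnode, newnode;

	/* node backing the vring descriptors set via VHOST_SET_VRING_ADDR */
	if (get_mempolicy(&newnode, NULL, 0, dev->virtqueue[index]->desc,
			MPOL_F_NODE | MPOL_F_ADDR) < 0)
		return dev;
	/* node currently backing the virtio_net struct itself */
	if (get_mempolicy(&oldnode, NULL, 0, dev,
			MPOL_F_NODE | MPOL_F_ADDR) < 0)
		return dev;

	if (oldnode == newnode)
		return dev;

	/* move the struct next to the vring memory */
	new_dev = rte_malloc_socket(NULL, sizeof(*dev), 0, newnode);
	if (new_dev == NULL)
		return dev;
	memcpy(new_dev, dev, sizeof(*dev));
	rte_free(dev);
	return new_dev;
}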

	--yliu
  
Jianfeng Tan April 6, 2016, 6:32 a.m. UTC | #6
On 4/6/2016 2:17 PM, Yuanhan Liu wrote:
> On Wed, Apr 06, 2016 at 02:05:51PM +0800, Tan, Jianfeng wrote:
>>
>> On 4/6/2016 1:44 PM, Yuanhan Liu wrote:
>>> On Wed, Apr 06, 2016 at 01:32:12PM +0800, Tan, Jianfeng wrote:
>>>> Hi,
>>>>
>>>> Just out of interest: it seems that the message handling thread which
>>>> runs new_device() is created via pthread_create() from the thread
>>>> which calls dev_start(), usually the master thread, right? But it is
>>>> not necessarily the master thread that polls pkts from this vhost
>>>> port, right? So what is the significance of recording the numa_node
>>>> information of the message handling thread here? Shall we make the
>>>> numa_realloc decision based on the final PMD thread that is
>>>> responsible for polling this vhost port?
>>> It doesn't matter on which core we make the decision: the result
>>> would be the same, since we are querying the numa node info of the
>>> virtio_net dev struct.
>> OK, following your hint I read numa_realloc() again: it is meant to
>> keep *dev (struct virtio_net) and *dev->virtqueue[] on the same numa
>> socket as the memory shared with virtio?
> Sort of, and I'm guessing that the comment has already made it clear?
>
>      /*
>       * Reallocate virtio_dev and vhost_virtqueue data structure to make them on the
>       * same numa node as the memory of vring descriptor.
>       */
>
>> And numa_realloc() is called in vhost_set_vring_addr(), which happens
>> after new_device()?
> It's actually before new_device().
>
> 	--yliu

Yes, exactly as you said. Thanks for the explanation!

Jianfeng
  
Tetsuya Mukawa April 6, 2016, 6:49 a.m. UTC | #7
On 2016/04/06 1:09, Ciara Loftus wrote:
> After some testing, it was found that retrieving numa information
> about a vhost device via a call to get_mempolicy is more
> accurate when performed during the new_device callback versus
> the vring_state_changed callback, in particular upon initial boot
> of the VM.  Performing this check during new_device is also
> potentially more efficient as this callback is only triggered once
> during device initialisation, compared with vring_state_changed
> which may be called multiple times depending on the number of
> queues assigned to the device.
>
> Reorganise the code to perform this check and assign the correct
> socket_id to the device during the new_device callback.
>
> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> ---
>  drivers/net/vhost/rte_eth_vhost.c | 28 ++++++++++++++--------------
>  1 file changed, 14 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
> index 4cc6bec..b1eb082 100644
> --- a/drivers/net/vhost/rte_eth_vhost.c
> +++ b/drivers/net/vhost/rte_eth_vhost.c
>

Hi,

I appreciate fixing it.
Just one worry is that the state changed event may occur before the
new device event.
Users should not call rte_eth_dev_socket_id() until the new device
event comes, even if they catch queue state events.
Otherwise, they will get a wrong socket id when calling
rte_eth_rx/tx_queue_setup().
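
For example (a hedged sketch of the hazardous usage; setup_rxq_on_port_node() and its parameters are hypothetical):

#include <rte_ethdev.h>
#include <rte_mempool.h>

/* If this runs after a queue state event but before the new device
 * event, rte_eth_dev_socket_id() may still report a provisional node
 * and the rx queue/mempool land on the wrong socket. */
static int
setup_rxq_on_port_node(uint8_t port_id, uint16_t nb_rxd,
		struct rte_mempool *mp)
{
	int sid = rte_eth_dev_socket_id(port_id);

	return rte_eth_rx_queue_setup(port_id, 0, nb_rxd, sid, NULL, mp);
}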

So how about commenting it in 'rte_eth_vhost.h'?

Thanks,
Tetsuya
  
Yuanhan Liu April 6, 2016, 7:17 a.m. UTC | #8
On Wed, Apr 06, 2016 at 03:49:25PM +0900, Tetsuya Mukawa wrote:
> On 2016/04/06 1:09, Ciara Loftus wrote:
> > After some testing, it was found that retrieving numa information
> > about a vhost device via a call to get_mempolicy is more
> > accurate when performed during the new_device callback versus
> > the vring_state_changed callback, in particular upon initial boot
> > of the VM.  Performing this check during new_device is also
> > potentially more efficient as this callback is only triggered once
> > during device initialisation, compared with vring_state_changed
> > which may be called multiple times depending on the number of
> > queues assigned to the device.
> >
> > Reorganise the code to perform this check and assign the correct
> > socket_id to the device during the new_device callback.
> >
> > Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> > ---
> >  drivers/net/vhost/rte_eth_vhost.c | 28 ++++++++++++++--------------
> >  1 file changed, 14 insertions(+), 14 deletions(-)
> >
> > diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
> > index 4cc6bec..b1eb082 100644
> > --- a/drivers/net/vhost/rte_eth_vhost.c
> > +++ b/drivers/net/vhost/rte_eth_vhost.c
> >
> 
> Hi,
> 
> I appreciate fixing it.
> Just one worry is that the state changed event may occur before the
> new device event.
> Users should not call rte_eth_dev_socket_id() until the new device
> event comes, even if they catch queue state events.
> Otherwise, they will get a wrong socket id when calling
> rte_eth_rx/tx_queue_setup().

There is no way to guarantee that the socket id stuff would work
perfectly in vhost, right? I mean, it's likely that a virtio device
would allocate memory from 2 or more sockets.

So, it doesn't matter too much whether it's set perfectly right
or not. Instead, we should assign it a saner value instead of an
obviously wrong one when new_device() has not been invoked yet. So,
I'd suggest making an assignment first based on the vhost_dev (or
whatever) struct, and then making it "right" at the new_device()
callback?

> So how about commenting it in 'rte_eth_vhost.h'?

It asks a different usage than other PMDs, which I don't think
it's a good idea.

	--yliu
  
Tetsuya Mukawa April 6, 2016, 7:28 a.m. UTC | #9
On 2016/04/06 16:17, Yuanhan Liu wrote:
> On Wed, Apr 06, 2016 at 03:49:25PM +0900, Tetsuya Mukawa wrote:
>> On 2016/04/06 1:09, Ciara Loftus wrote:
>>> After some testing, it was found that retrieving numa information
>>> about a vhost device via a call to get_mempolicy is more
>>> accurate when performed during the new_device callback versus
>>> the vring_state_changed callback, in particular upon initial boot
>>> of the VM.  Performing this check during new_device is also
>>> potentially more efficient as this callback is only triggered once
>>> during device initialisation, compared with vring_state_changed
>>> which may be called multiple times depending on the number of
>>> queues assigned to the device.
>>>
>>> Reorganise the code to perform this check and assign the correct
>>> socket_id to the device during the new_device callback.
>>>
>>> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
>>> ---
>>>  drivers/net/vhost/rte_eth_vhost.c | 28 ++++++++++++++--------------
>>>  1 file changed, 14 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
>>> index 4cc6bec..b1eb082 100644
>>> --- a/drivers/net/vhost/rte_eth_vhost.c
>>> +++ b/drivers/net/vhost/rte_eth_vhost.c
>>>
>> Hi,
>>
>> I appreciate fixing it.
>> Just one worry is that the state changed event may occur before the
>> new device event.
>> Users should not call rte_eth_dev_socket_id() until the new device
>> event comes, even if they catch queue state events.
>> Otherwise, they will get a wrong socket id when calling
>> rte_eth_rx/tx_queue_setup().
> There is no way to guarantee that the socket id stuff would work
> perfectly in vhost, right? I mean, it's likely that a virtio device
> would allocate memory from 2 or more sockets.
> 
> So, it doesn't matter too much whether it's set perfectly right
> or not. Instead, we should assign it a saner value instead of an
> obviously wrong one when new_device() has not been invoked yet. So,
> I'd suggest making an assignment first based on the vhost_dev (or
> whatever) struct, and then making it "right" at the new_device()
> callback?

Yes, I agree with you idea.

Thanks,
Tetsuya

>> So how about commenting it in 'rte_eth_vhost.h'?
> It asks a different usage than other PMDs, which I don't think
> it's a good idea.
>
> 	--yliu
  
Loftus, Ciara April 6, 2016, 9:37 a.m. UTC | #10
> 
> On Wed, Apr 06, 2016 at 03:49:25PM +0900, Tetsuya Mukawa wrote:
> > On 2016/04/06 1:09, Ciara Loftus wrote:
> > > After some testing, it was found that retrieving numa information
> > > about a vhost device via a call to get_mempolicy is more
> > > accurate when performed during the new_device callback versus
> > > the vring_state_changed callback, in particular upon initial boot
> > > of the VM.  Performing this check during new_device is also
> > > potentially more efficient as this callback is only triggered once
> > > during device initialisation, compared with vring_state_changed
> > > which may be called multiple times depending on the number of
> > > queues assigned to the device.
> > >
> > > Reorganise the code to perform this check and assign the correct
> > > socket_id to the device during the new_device callback.
> > >
> > > Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> > > ---
> > >  drivers/net/vhost/rte_eth_vhost.c | 28 ++++++++++++++--------------
> > >  1 file changed, 14 insertions(+), 14 deletions(-)
> > >
> > > diff --git a/drivers/net/vhost/rte_eth_vhost.c
> b/drivers/net/vhost/rte_eth_vhost.c
> > > index 4cc6bec..b1eb082 100644
> > > --- a/drivers/net/vhost/rte_eth_vhost.c
> > > +++ b/drivers/net/vhost/rte_eth_vhost.c
> > >
> >
> > Hi,
> >
> > I appreciate fixing it.
> > Just one worry is that the state changed event may occur before the
> > new device event.
> > Users should not call rte_eth_dev_socket_id() until the new device
> > event comes, even if they catch queue state events.
> > Otherwise, they will get a wrong socket id when calling
> > rte_eth_rx/tx_queue_setup().
> 
> There is no way to guarantee that the socket id stuff would work
> perfectly in vhost, right? I mean, it's likely that a virtio device
> would allocate memory from 2 or more sockets.
> 
> So, it doesn't matter too much whether it's set perfectly right
> or not. Instead, we should assign it a saner value instead of an
> obviously wrong one when new_device() has not been invoked yet. So,
> I'd suggest making an assignment first based on the vhost_dev (or
> whatever) struct, and then making it "right" at the new_device()
> callback?

Thanks for the feedback.
At the moment, with this patch, numa_node is initially set to rte_socket_id() during PMD init and then updated to the correct value during new_device.
Are you suggesting we set it again in between these two steps ("based on vhost_dev")? If so, where do you think would be a good place?
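
I.e. the flow is roughly (a simplified sketch of the two assignment points, not verbatim driver code):

/* 1. at PMD init (eth_dev_vhost_create): best-effort default,
 *    the socket of the thread doing the init */
data->numa_node = rte_socket_id();

/* 2. later, in new_device(): refined to the node actually backing
 *    the virtio_net struct, as per this patch */
eth_dev->data->numa_node = newnode;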

Thanks,
Ciara

> 
> > So how about commenting it in 'rte_eth_vhost.h'?
> 
> It asks a different usage than other PMDs, which I don't think
> it's a good idea.
> 
> 	--yliu
  
Yuanhan Liu April 6, 2016, 4:09 p.m. UTC | #11
On Wed, Apr 06, 2016 at 09:37:53AM +0000, Loftus, Ciara wrote:
> > 
> > On Wed, Apr 06, 2016 at 03:49:25PM +0900, Tetsuya Mukawa wrote:
> > > On 2016/04/06 1:09, Ciara Loftus wrote:
> > > > After some testing, it was found that retrieving numa information
> > > > about a vhost device via a call to get_mempolicy is more
> > > > accurate when performed during the new_device callback versus
> > > > the vring_state_changed callback, in particular upon initial boot
> > > > of the VM.  Performing this check during new_device is also
> > > > potentially more efficient as this callback is only triggered once
> > > > during device initialisation, compared with vring_state_changed
> > > > which may be called multiple times depending on the number of
> > > > queues assigned to the device.
> > > >
> > > > Reorganise the code to perform this check and assign the correct
> > > > socket_id to the device during the new_device callback.
> > > >
> > > > Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> > > > ---
> > > >  drivers/net/vhost/rte_eth_vhost.c | 28 ++++++++++++++--------------
> > > >  1 file changed, 14 insertions(+), 14 deletions(-)
> > > >
> > > > diff --git a/drivers/net/vhost/rte_eth_vhost.c
> > b/drivers/net/vhost/rte_eth_vhost.c
> > > > index 4cc6bec..b1eb082 100644
> > > > --- a/drivers/net/vhost/rte_eth_vhost.c
> > > > +++ b/drivers/net/vhost/rte_eth_vhost.c
> > > >
> > >
> > > Hi,
> > >
> > > I appreciate fixing it.
> > > Just one worry is that the state changed event may occur before the
> > > new device event.
> > > Users should not call rte_eth_dev_socket_id() until the new device
> > > event comes, even if they catch queue state events.
> > > Otherwise, they will get a wrong socket id when calling
> > > rte_eth_rx/tx_queue_setup().
> > 
> > There is no way to guarantee that the socket id stuff would work
> > perfectly in vhost, right? I mean, it's likely that a virtio device
> > would allocate memory from 2 or more sockets.
> > 
> > So, it doesn't matter too much whether it's set perfectly right
> > or not. Instead, we should assign it a saner value instead of an
> > obviously wrong one when new_device() has not been invoked yet. So,
> > I'd suggest making an assignment first based on the vhost_dev (or
> > whatever) struct, and then making it "right" at the new_device()
> > callback?
> 
> Thanks for the feedback.
> At the moment, with this patch, numa_node is initially set to rte_socket_id() during PMD init and then updated to the correct value during new_device.
> Are you suggesting we set it again in between these two steps ("based on vhost_dev")? If so, where do you think would be a good place?

Oh, I was not aware of that. Then I think we are fine here.

	--yliu
  
Thomas Monjalon April 6, 2016, 4:12 p.m. UTC | #12
2016-04-07 00:09, Yuanhan Liu:
> On Wed, Apr 06, 2016 at 09:37:53AM +0000, Loftus, Ciara wrote:
> > > On Wed, Apr 06, 2016 at 03:49:25PM +0900, Tetsuya Mukawa wrote:
> > > > Hi,
> > > >
> > > > I appreciate fixing it.
> > > > Just one worry is that the state changed event may occur before the
> > > > new device event.
> > > > Users should not call rte_eth_dev_socket_id() until the new device
> > > > event comes, even if they catch queue state events.
> > > > Otherwise, they will get a wrong socket id when calling
> > > > rte_eth_rx/tx_queue_setup().
> > > 
> > > There is no way to guarantee that the socket id stuff would work
> > > perfectly in vhost, right? I mean, it's likely that a virtio device
> > > would allocate memory from 2 or more sockets.
> > > 
> > > So, it doesn't matter too much whether it's set perfectly right
> > > or not. Instead, we should assign it a saner value instead of an
> > > obviously wrong one when new_device() has not been invoked yet. So,
> > > I'd suggest making an assignment first based on the vhost_dev (or
> > > whatever) struct, and then making it "right" at the new_device()
> > > callback?
> > 
> > Thanks for the feedback.
> > At the moment, with this patch, numa_node is initially set to rte_socket_id() during PMD init and then updated to the correct value during new_device.
> > Are you suggesting we set it again in between these two steps ("based on vhost_dev")? If so, where do you think would be a good place?
> 
> Oh, I was not aware of that. Then I think we are fine here.

Please Yuanhan, could you be more explicit?
  
Yuanhan Liu April 6, 2016, 4:43 p.m. UTC | #13
On Wed, Apr 06, 2016 at 06:12:07PM +0200, Thomas Monjalon wrote:
> 2016-04-07 00:09, Yuanhan Liu:
> > On Wed, Apr 06, 2016 at 09:37:53AM +0000, Loftus, Ciara wrote:
> > > > On Wed, Apr 06, 2016 at 03:49:25PM +0900, Tetsuya Mukawa wrote:
> > > > > Hi,
> > > > >
> > > > > I appreciate fixing it.
> > > > > Just one worry is that the state changed event may occur before the
> > > > > new device event.
> > > > > Users should not call rte_eth_dev_socket_id() until the new device
> > > > > event comes, even if they catch queue state events.
> > > > > Otherwise, they will get a wrong socket id when calling
> > > > > rte_eth_rx/tx_queue_setup().
> > > > 
> > > > There is no way to guarantee that the socket id stuff would work
> > > > perfectly in vhost, right? I mean, it's likely that a virtio device
> > > > would allocate memory from 2 or more sockets.
> > > > 
> > > > So, it doesn't matter too much whether it's set perfectly right
> > > > or not. Instead, we should assign it a saner value instead of an
> > > > obviously wrong one when new_device() has not been invoked yet. So,
> > > > I'd suggest making an assignment first based on the vhost_dev (or
> > > > whatever) struct, and then making it "right" at the new_device()
> > > > callback?
> > > 
> > > Thanks for the feedback.
> > > At the moment, with this patch, numa_node is initially set to rte_socket_id() during PMD init and then updated to the correct value during new_device.
> > > Are you suggesting we set it again in between these two steps ("based on vhost_dev")? If so, where do you think would be a good place?
> > 
> > Oh, I was not aware of that. Then I think we are fine here.
> 
> Please Yuanhan, could you be more explicit?

Oh, sorry, I replied in the wrong place. My reply was against the
following statement:

  > At the moment, with this patch, numa_node is initially set to
  > rte_socket_id() during PMD init and then updated to the correct
  > value during new_device.

So, we don't need to do another initial numa_node assignment. Put
simply, this patch is fine.

	--yliu
  
Thomas Monjalon April 6, 2016, 4:45 p.m. UTC | #14
2016-04-06 13:03, Yuanhan Liu:
> On Tue, Apr 05, 2016 at 05:09:47PM +0100, Ciara Loftus wrote:
> > After some testing, it was found that retrieving numa information
> > about a vhost device via a call to get_mempolicy is more
> > accurate when performed during the new_device callback versus
> > the vring_state_changed callback, in particular upon initial boot
> > of the VM.  Performing this check during new_device is also
> > potentially more efficient as this callback is only triggered once
> > during device initialisation, compared with vring_state_changed
> > which may be called multiple times depending on the number of
> > queues assigned to the device.
> > 
> > Reorganise the code to perform this check and assign the correct
> > socket_id to the device during the new_device callback.
> > 
> > Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> 
> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

Fixes: ee584e9710b9 ("vhost: add driver on top of the library")

Applied, thanks
  

Patch

diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
index 4cc6bec..b1eb082 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -229,6 +229,9 @@  new_device(struct virtio_net *dev)
 	struct pmd_internal *internal;
 	struct vhost_queue *vq;
 	unsigned i;
+#ifdef RTE_LIBRTE_VHOST_NUMA
+	int newnode, ret;
+#endif
 
 	if (dev == NULL) {
 		RTE_LOG(INFO, PMD, "Invalid argument\n");
@@ -244,6 +247,17 @@  new_device(struct virtio_net *dev)
 	eth_dev = list->eth_dev;
 	internal = eth_dev->data->dev_private;
 
+#ifdef RTE_LIBRTE_VHOST_NUMA
+	ret  = get_mempolicy(&newnode, NULL, 0, dev,
+			MPOL_F_NODE | MPOL_F_ADDR);
+	if (ret < 0) {
+		RTE_LOG(ERR, PMD, "Unknown numa node\n");
+		return -1;
+	}
+
+	eth_dev->data->numa_node = newnode;
+#endif
+
 	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
 		vq = eth_dev->data->rx_queues[i];
 		if (vq == NULL)
@@ -352,9 +366,6 @@  vring_state_changed(struct virtio_net *dev, uint16_t vring, int enable)
 	struct rte_vhost_vring_state *state;
 	struct rte_eth_dev *eth_dev;
 	struct internal_list *list;
-#ifdef RTE_LIBRTE_VHOST_NUMA
-	int newnode, ret;
-#endif
 
 	if (dev == NULL) {
 		RTE_LOG(ERR, PMD, "Invalid argument\n");
@@ -370,17 +381,6 @@  vring_state_changed(struct virtio_net *dev, uint16_t vring, int enable)
 	eth_dev = list->eth_dev;
 	/* won't be NULL */
 	state = vring_states[eth_dev->data->port_id];
-
-#ifdef RTE_LIBRTE_VHOST_NUMA
-	ret  = get_mempolicy(&newnode, NULL, 0, dev,
-			MPOL_F_NODE | MPOL_F_ADDR);
-	if (ret < 0) {
-		RTE_LOG(ERR, PMD, "Unknown numa node\n");
-		return -1;
-	}
-
-	eth_dev->data->numa_node = newnode;
-#endif
 	rte_spinlock_lock(&state->lock);
 	state->cur[vring] = enable;
 	state->max_vring = RTE_MAX(vring, state->max_vring);