[v4,1/4] hash: fix race condition in iterate

Message ID 1538155426-145177-2-git-send-email-yipeng1.wang@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series hash: add extendable bucket and partial key hashing |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Wang, Yipeng1 Sept. 28, 2018, 5:23 p.m. UTC
  In rte_hash_iterate, the reader lock did not protect the
while loop which checks empty entry. This created a race
condition that the entry may become empty when enters
the lock, then a wrong key data value would be read out.

This commit extends the protected region.

Fixes: f2e3001b53ec ("hash: support read/write concurrency")
Cc: stable@dpdk.org

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Reported-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 lib/librte_hash/rte_cuckoo_hash.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
  

Comments

Honnappa Nagarahalli Oct. 1, 2018, 8:23 p.m. UTC | #1
> 
> In rte_hash_iterate, the reader lock did not protect the while loop which
> checks empty entry. This created a race condition that the entry may become
> empty when enters the lock, then a wrong key data value would be read out.
> 
> This commit extends the protected region.
> 
> Fixes: f2e3001b53ec ("hash: support read/write concurrency")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
> Reported-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> ---
>  lib/librte_hash/rte_cuckoo_hash.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/librte_hash/rte_cuckoo_hash.c
> b/lib/librte_hash/rte_cuckoo_hash.c
> index f7b86c8..eba13e9 100644
> --- a/lib/librte_hash/rte_cuckoo_hash.c
> +++ b/lib/librte_hash/rte_cuckoo_hash.c
> @@ -1317,16 +1317,19 @@ rte_hash_iterate(const struct rte_hash *h, const
> void **key, void **data, uint32
>  	bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
>  	idx = *next % RTE_HASH_BUCKET_ENTRIES;
> 
> +	__hash_rw_reader_lock(h);
This does not work well with the lock-less changes I am making.  We should leave the lock in its original position. Instead change the while loop as follows:

while ((position = h->buckets[bucket_idx].key_idx[idx]) == EMPTY_SLOT)

>  	/* If current position is empty, go to the next one */
>  	while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
>  		(*next)++;
>  		/* End of table */
> -		if (*next == total_entries)
> +		if (*next == total_entries) {
> +			__hash_rw_reader_unlock(h);
>  			return -ENOENT;
> +		}
>  		bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
>  		idx = *next % RTE_HASH_BUCKET_ENTRIES;
>  	}
> -	__hash_rw_reader_lock(h);
> +
>  	/* Get position of entry in key table */
>  	position = h->buckets[bucket_idx].key_idx[idx];
If we change the while loop as I suggested as above, we can remove this line.

>  	next_key = (struct rte_hash_key *) ((char *)h->key_store +
> --
> 2.7.4
  
Wang, Yipeng1 Oct. 2, 2018, 12:17 a.m. UTC | #2
>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
>> @@ -1317,16 +1317,19 @@ rte_hash_iterate(const struct rte_hash *h, const
>> void **key, void **data, uint32
>>  	bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
>>  	idx = *next % RTE_HASH_BUCKET_ENTRIES;
>>
>> +	__hash_rw_reader_lock(h);
>This does not work well with the lock-less changes I am making.  We should leave the lock in its original position. Instead change the
>while loop as follows:
>
>while ((position = h->buckets[bucket_idx].key_idx[idx]) == EMPTY_SLOT)
>
>>  	/* If current position is empty, go to the next one */
>>  	while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
>>  		(*next)++;
>>  		/* End of table */
>> -		if (*next == total_entries)
>> +		if (*next == total_entries) {
>> +			__hash_rw_reader_unlock(h);
>>  			return -ENOENT;
>> +		}
>>  		bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
>>  		idx = *next % RTE_HASH_BUCKET_ENTRIES;
>>  	}
>> -	__hash_rw_reader_lock(h);
>> +
>>  	/* Get position of entry in key table */
>>  	position = h->buckets[bucket_idx].key_idx[idx];
>If we change the while loop as I suggested as above, we can remove this line.
>
>>  	next_key = (struct rte_hash_key *) ((char *)h->key_store +

[Wang, Yipeng] Sorry that I did not realize you already have it in your patch set and I agree.
Do you want to export it as a bug fix in your patch set? I will remove my change.

For the lock free, do we need to protect it with version counter? Imagine the following corner case:
While the iterator read the key and data, there is a writer deleted, removed, and recycled the key-data pair,
and write a new key and data into it. While the writer are writing, will the reader reads out wrong key/data, or
mismatched key/data?
  
Honnappa Nagarahalli Oct. 2, 2018, 4:26 a.m. UTC | #3
> 
> >-----Original Message-----
> >From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> >> @@ -1317,16 +1317,19 @@ rte_hash_iterate(const struct rte_hash *h,
> >> const void **key, void **data, uint32
> >>  	bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
> >>  	idx = *next % RTE_HASH_BUCKET_ENTRIES;
> >>
> >> +	__hash_rw_reader_lock(h);
> >This does not work well with the lock-less changes I am making.  We
> >should leave the lock in its original position. Instead change the while loop as
> follows:
> >
> >while ((position = h->buckets[bucket_idx].key_idx[idx]) == EMPTY_SLOT)
> >
> >>  	/* If current position is empty, go to the next one */
> >>  	while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
> >>  		(*next)++;
> >>  		/* End of table */
> >> -		if (*next == total_entries)
> >> +		if (*next == total_entries) {
> >> +			__hash_rw_reader_unlock(h);
> >>  			return -ENOENT;
> >> +		}
> >>  		bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
> >>  		idx = *next % RTE_HASH_BUCKET_ENTRIES;
> >>  	}
> >> -	__hash_rw_reader_lock(h);
> >> +
> >>  	/* Get position of entry in key table */
> >>  	position = h->buckets[bucket_idx].key_idx[idx];
> >If we change the while loop as I suggested as above, we can remove this line.
> >
> >>  	next_key = (struct rte_hash_key *) ((char *)h->key_store +
> 
> [Wang, Yipeng] Sorry that I did not realize you already have it in your patch
> set and I agree.
> Do you want to export it as a bug fix in your patch set? I will remove my
> change.
Sure, I will make a separate commit for this.

> 
> For the lock free, do we need to protect it with version counter? Imagine the
> following corner case:
> While the iterator read the key and data, there is a writer deleted, removed,
> and recycled the key-data pair, and write a new key and data into it. While the
> writer are writing, will the reader reads out wrong key/data, or mismatched
> key/data?
> 
In the lock-free algorithm, the key-data is not 'freed' until the readers have completed all their references to the 'deleted' key-data. Hence, the writers will not be able to allocate the same key store index till the readers have stopped referring to the 'deleted' key-data.
I re-checked my ladder diagrams [1] and I could not find any issues.

[1] https://dpdkuserspace2018.sched.com/event/G44w/lock-free-read-write-concurrency-in-rtehash (PPT)
  
Wang, Yipeng1 Oct. 2, 2018, 11:53 p.m. UTC | #4
>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
>> >>  	/* Get position of entry in key table */
>> >>  	position = h->buckets[bucket_idx].key_idx[idx];
>> >If we change the while loop as I suggested as above, we can remove this line.
>> >
>> >>  	next_key = (struct rte_hash_key *) ((char *)h->key_store +
>>
>> [Wang, Yipeng] Sorry that I did not realize you already have it in your patch
>> set and I agree.
>> Do you want to export it as a bug fix in your patch set? I will remove my
>> change.
>Sure, I will make a separate commit for this.
[Wang, Yipeng] I fixed the issue you mentioned and put it in V5 and I saw you acked it. You don't need to change you patch now.
>>
>> For the lock free, do we need to protect it with version counter? Imagine the
>> following corner case:
>> While the iterator read the key and data, there is a writer deleted, removed,
>> and recycled the key-data pair, and write a new key and data into it. While the
>> writer are writing, will the reader reads out wrong key/data, or mismatched
>> key/data?
>>
>In the lock-free algorithm, the key-data is not 'freed' until the readers have completed all their references to the 'deleted' key-data.
>Hence, the writers will not be able to allocate the same key store index till the readers have stopped referring to the 'deleted' key-
>data.
>I re-checked my ladder diagrams [1] and I could not find any issues.
>
>[1] https://dpdkuserspace2018.sched.com/event/G44w/lock-free-read-write-concurrency-in-rtehash (PPT)
[Wang, Yipeng]
After checking your slides, I agree. It works with upper level application which should have RCU-like mechanisms to ensure
the grace period has finished before recycling. In this case, the logic is good! 

If you haven't done so, could you be more specific in doc and the API comment to indicate that the key-data pair recycle function should only be called
after reader finished (or maybe specifically indicate RCU type of mechanism is needed?). I feel that users not familiar with this could easily
get it wrong. 

Besides this, I don't have other concern.
  

Patch

diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index f7b86c8..eba13e9 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -1317,16 +1317,19 @@  rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
 	bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
 	idx = *next % RTE_HASH_BUCKET_ENTRIES;
 
+	__hash_rw_reader_lock(h);
 	/* If current position is empty, go to the next one */
 	while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
 		(*next)++;
 		/* End of table */
-		if (*next == total_entries)
+		if (*next == total_entries) {
+			__hash_rw_reader_unlock(h);
 			return -ENOENT;
+		}
 		bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
 		idx = *next % RTE_HASH_BUCKET_ENTRIES;
 	}
-	__hash_rw_reader_lock(h);
+
 	/* Get position of entry in key table */
 	position = h->buckets[bucket_idx].key_idx[idx];
 	next_key = (struct rte_hash_key *) ((char *)h->key_store +