[dpdk-dev,01/19] devtools: add simple script to find duplicate includes
Checks
Commit Message
This is just a simple check script to find obvious duplications.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
devtools/dup_include.pl | 64 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 64 insertions(+)
create mode 100755 devtools/dup_include.pl
Comments
Hi Stephen,
11/07/2017 20:55, Stephen Hemminger:
> This is just a simple check script to find obvious duplications.
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
> devtools/dup_include.pl | 64 +++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 64 insertions(+)
> create mode 100755 devtools/dup_include.pl
Thank you for this script, but... it is written in Perl!
I don't think it is a good idea to add yet another language to DPDK.
We already have shell and python scripts.
And I am not sure a lot of (young) people are able to parse it ;)
I would like to propose this shell script:
dirs='app buildtools drivers examples lib test'
pattern='^[[:space:]]*#include[[:space:]]*[<"](.*)[>"].*'
for file in $(git ls $dirs) ; do
dups=$(sed -rn "s,$pattern,\1,p" $file | sort | uniq -d)
[ -n "$dups" ] || continue
echo "$file"
echo "$dups" | sed 's,^,\t,'
done
On Tue, 11 Jul 2017 22:33:55 +0200
Thomas Monjalon <thomas@monjalon.net> wrote:
> Hi Stephen,
>
> 11/07/2017 20:55, Stephen Hemminger:
> > This is just a simple check script to find obvious duplications.
> >
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > ---
> > devtools/dup_include.pl | 64 +++++++++++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 64 insertions(+)
> > create mode 100755 devtools/dup_include.pl
>
> Thank you for this script, but... it is written in Perl!
> I don't think it is a good idea to add yet another language to DPDK.
> We already have shell and python scripts.
> And I am not sure a lot of (young) people are able to parse it ;)
>
> I would like to propose this shell script:
>
> dirs='app buildtools drivers examples lib test'
> pattern='^[[:space:]]*#include[[:space:]]*[<"](.*)[>"].*'
>
> for file in $(git ls $dirs) ; do
> dups=$(sed -rn "s,$pattern,\1,p" $file | sort | uniq -d)
> [ -n "$dups" ] || continue
> echo "$file"
> echo "$dups" | sed 's,^,\t,'
> done
Sorry, it is quick and easy. After all it is optional, just like
coccinelle and not part of the build. Plus checkpatch is in Perl.
12/07/2017 01:05, Stephen Hemminger:
> On Tue, 11 Jul 2017 22:33:55 +0200
> Thomas Monjalon <thomas@monjalon.net> wrote:
>
> > Hi Stephen,
> >
> > 11/07/2017 20:55, Stephen Hemminger:
> > > This is just a simple check script to find obvious duplications.
> > >
> > > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > > ---
> > > devtools/dup_include.pl | 64 +++++++++++++++++++++++++++++++++++++++++++++++++
> > > 1 file changed, 64 insertions(+)
> > > create mode 100755 devtools/dup_include.pl
> >
> > Thank you for this script, but... it is written in Perl!
> > I don't think it is a good idea to add yet another language to DPDK.
> > We already have shell and python scripts.
> > And I am not sure a lot of (young) people are able to parse it ;)
> >
> > I would like to propose this shell script:
> >
> > dirs='app buildtools drivers examples lib test'
> > pattern='^[[:space:]]*#include[[:space:]]*[<"](.*)[>"].*'
> >
> > for file in $(git ls $dirs) ; do
> > dups=$(sed -rn "s,$pattern,\1,p" $file | sort | uniq -d)
> > [ -n "$dups" ] || continue
> > echo "$file"
> > echo "$dups" | sed 's,^,\t,'
> > done
>
> Sorry, it is quick and easy. After all it is optional, just like
> coccinelle and not part of the build. Plus checkpatch is in Perl.
checkpatch is not in the repository ;)
I prefer my shell script because
- I am able to maintain it
- the regexp is more tolerant with #include line
- it checks the whole repository
On Tue, 11 Jul 2017 22:33:55 +0200
Thomas Monjalon <thomas@monjalon.net> wrote:
> Thank you for this script, but... it is written in Perl!
> I don't think it is a good idea to add yet another language to DPDK.
> We already have shell and python scripts.
> And I am not sure a lot of (young) people are able to parse it ;)
>
> I would like to propose this shell script:
>
> dirs='app buildtools drivers examples lib test'
> pattern='^[[:space:]]*#include[[:space:]]*[<"](.*)[>"].*'
>
> for file in $(git ls $dirs) ; do
> dups=$(sed -rn "s,$pattern,\1,p" $file | sort | uniq -d)
> [ -n "$dups" ] || continue
> echo "$file"
> echo "$dups" | sed 's,^,\t,'
> done
There is no "git ls" command in current version,
Using find instead works.
plus shell is 7x slower.
$ time bash -c "find . -name '*.c' | xargs /tmp/dupinc.sh"
real 0m0.765s
user 0m1.220s
sys 0m0.155s
$time bash -c "find . -name '*.c' | xargs ~/bin/dup_inc.pl"
real 0m0.131s
user 0m0.118s
sys 0m0.014s
How about some python code.
12/07/2017 23:59, Stephen Hemminger:
> On Tue, 11 Jul 2017 22:33:55 +0200
> Thomas Monjalon <thomas@monjalon.net> wrote:
>
> > Thank you for this script, but... it is written in Perl!
> > I don't think it is a good idea to add yet another language to DPDK.
> > We already have shell and python scripts.
> > And I am not sure a lot of (young) people are able to parse it ;)
> >
> > I would like to propose this shell script:
> >
> > dirs='app buildtools drivers examples lib test'
> > pattern='^[[:space:]]*#include[[:space:]]*[<"](.*)[>"].*'
> >
> > for file in $(git ls $dirs) ; do
> > dups=$(sed -rn "s,$pattern,\1,p" $file | sort | uniq -d)
> > [ -n "$dups" ] || continue
> > echo "$file"
> > echo "$dups" | sed 's,^,\t,'
> > done
>
> There is no "git ls" command in current version,
>
> Using find instead works.
Yes, both work if specifying source code directories as above.
> plus shell is 7x slower.
>
> $ time bash -c "find . -name '*.c' | xargs /tmp/dupinc.sh"
> real 0m0.765s
> user 0m1.220s
> sys 0m0.155s
> $time bash -c "find . -name '*.c' | xargs ~/bin/dup_inc.pl"
> real 0m0.131s
> user 0m0.118s
> sys 0m0.014s
I don't think speed is really relevant here :)
> How about some python code.
I don't really care between shell or python.
I thought a shell script would be really concise
and without Python 2/3 compat issues.
> On Jul 13, 2017, at 1:56 AM, Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 12/07/2017 23:59, Stephen Hemminger:
>> On Tue, 11 Jul 2017 22:33:55 +0200
>> Thomas Monjalon <thomas@monjalon.net> wrote:
>>
>>> Thank you for this script, but... it is written in Perl!
>>> I don't think it is a good idea to add yet another language to DPDK.
>>> We already have shell and python scripts.
>>> And I am not sure a lot of (young) people are able to parse it ;)
>>>
>>> I would like to propose this shell script:
>>>
>>> dirs='app buildtools drivers examples lib test'
>>> pattern='^[[:space:]]*#include[[:space:]]*[<"](.*)[>"].*'
>>>
>>> for file in $(git ls $dirs) ; do
>>> dups=$(sed -rn "s,$pattern,\1,p" $file | sort | uniq -d)
>>> [ -n "$dups" ] || continue
>>> echo "$file"
>>> echo "$dups" | sed 's,^,\t,'
>>> done
>>
>> There is no "git ls" command in current version,
>>
>> Using find instead works.
>
> Yes, both work if specifying source code directories as above.
‘git ls’ I had to change it to ‘git ls-files’ to make it work. I think you have a git alias setup for ls-files.
Can this be added to the patch testing?
>
>> plus shell is 7x slower.
>>
>> $ time bash -c "find . -name '*.c' | xargs /tmp/dupinc.sh"
>> real 0m0.765s
>> user 0m1.220s
>> sys 0m0.155s
>> $time bash -c "find . -name '*.c' | xargs ~/bin/dup_inc.pl"
>> real 0m0.131s
>> user 0m0.118s
>> sys 0m0.014s
>
> I don't think speed is really relevant here :)
>
>> How about some python code.
>
> I don't really care between shell or python.
> I thought a shell script would be really concise
> and without Python 2/3 compat issues.
Regards,
Keith
13/07/2017 14:19, Wiles, Keith:
>
> > On Jul 13, 2017, at 1:56 AM, Thomas Monjalon <thomas@monjalon.net> wrote:
> >
> > 12/07/2017 23:59, Stephen Hemminger:
> >> On Tue, 11 Jul 2017 22:33:55 +0200
> >> Thomas Monjalon <thomas@monjalon.net> wrote:
> >>
> >>> Thank you for this script, but... it is written in Perl!
> >>> I don't think it is a good idea to add yet another language to DPDK.
> >>> We already have shell and python scripts.
> >>> And I am not sure a lot of (young) people are able to parse it ;)
> >>>
> >>> I would like to propose this shell script:
> >>>
> >>> dirs='app buildtools drivers examples lib test'
> >>> pattern='^[[:space:]]*#include[[:space:]]*[<"](.*)[>"].*'
> >>>
> >>> for file in $(git ls $dirs) ; do
> >>> dups=$(sed -rn "s,$pattern,\1,p" $file | sort | uniq -d)
> >>> [ -n "$dups" ] || continue
> >>> echo "$file"
> >>> echo "$dups" | sed 's,^,\t,'
> >>> done
> >>
> >> There is no "git ls" command in current version,
> >>
> >> Using find instead works.
> >
> > Yes, both work if specifying source code directories as above.
>
> ‘git ls’ I had to change it to ‘git ls-files’ to make it work. I think you have a git alias setup for ls-files.
Ah yes, you're right!
Thanks, I has not well understood the original comment :)
> Can this be added to the patch testing?
Yes sure, I will do.
13/07/2017 08:56, Thomas Monjalon:
> 12/07/2017 23:59, Stephen Hemminger:
> > On Tue, 11 Jul 2017 22:33:55 +0200
> > Thomas Monjalon <thomas@monjalon.net> wrote:
> >
> > > Thank you for this script, but... it is written in Perl!
> > > I don't think it is a good idea to add yet another language to DPDK.
> > > We already have shell and python scripts.
> > > And I am not sure a lot of (young) people are able to parse it ;)
> > >
> > > I would like to propose this shell script:
[...]
>
> > plus shell is 7x slower.
> >
> > $ time bash -c "find . -name '*.c' | xargs /tmp/dupinc.sh"
> > real 0m0.765s
> > user 0m1.220s
> > sys 0m0.155s
> > $time bash -c "find . -name '*.c' | xargs ~/bin/dup_inc.pl"
> > real 0m0.131s
> > user 0m0.118s
> > sys 0m0.014s
>
> I don't think speed is really relevant here :)
I did my own benchmark (recreation time):
# time sh -c 'for file in $(git ls-files app buildtools drivers examples lib test) ; do devtools/dup_include.pl $file ; done'
4,41s user 1,32s system 101% cpu 5,667 total
# time devtools/check-duplicate-includes.sh
5,48s user 1,00s system 153% cpu 4,222 total
The shell version is reported as faster on my computer!
It is faster when filtering only .c and .h files:
for file in $(git ls-files '*.[ch]') ; do
dups=$(sed -rn "s,$pattern,\1,p" $file | sort | uniq -d)
[ -z "$dups" ] || echo "$dups" | sed "s,^,$file: duplicated include: ,"
done
# time sh -c 'for file in $(git ls-files "*.[ch]") ; do devtools/dup_include.pl $file ; done'
3,65s user 1,05s system 100% cpu 4,668 total
# time devtools/check-duplicate-includes.sh
4,72s user 0,80s system 153% cpu 3,603 total
I prefer this version using only pipes, which is well parallelized:
for file in $(git ls-files '*.[ch]') ; do
sed -rn "s,$pattern,\1,p" $file | sort | uniq -d |
sed "s,^,$file: duplicated include: ,"
done
7,40s user 1,49s system 231% cpu 3,847 total
14/07/2017 17:39, Thomas Monjalon:
> 13/07/2017 08:56, Thomas Monjalon:
> > 12/07/2017 23:59, Stephen Hemminger:
> > > On Tue, 11 Jul 2017 22:33:55 +0200
> > > Thomas Monjalon <thomas@monjalon.net> wrote:
> > >
> > > > Thank you for this script, but... it is written in Perl!
> > > > I don't think it is a good idea to add yet another language to DPDK.
> > > > We already have shell and python scripts.
> > > > And I am not sure a lot of (young) people are able to parse it ;)
> > > >
> > > > I would like to propose this shell script:
> [...]
> >
> > > plus shell is 7x slower.
> > >
> > > $ time bash -c "find . -name '*.c' | xargs /tmp/dupinc.sh"
> > > real 0m0.765s
> > > user 0m1.220s
> > > sys 0m0.155s
> > > $time bash -c "find . -name '*.c' | xargs ~/bin/dup_inc.pl"
> > > real 0m0.131s
> > > user 0m0.118s
> > > sys 0m0.014s
> >
> > I don't think speed is really relevant here :)
>
> I did my own benchmark (recreation time):
>
> # time sh -c 'for file in $(git ls-files app buildtools drivers examples lib test) ; do devtools/dup_include.pl $file ; done'
> 4,41s user 1,32s system 101% cpu 5,667 total
> # time devtools/check-duplicate-includes.sh
> 5,48s user 1,00s system 153% cpu 4,222 total
>
> The shell version is reported as faster on my computer!
>
> It is faster when filtering only .c and .h files:
>
> for file in $(git ls-files '*.[ch]') ; do
> dups=$(sed -rn "s,$pattern,\1,p" $file | sort | uniq -d)
> [ -z "$dups" ] || echo "$dups" | sed "s,^,$file: duplicated include: ,"
> done
>
> # time sh -c 'for file in $(git ls-files "*.[ch]") ; do devtools/dup_include.pl $file ; done'
> 3,65s user 1,05s system 100% cpu 4,668 total
> # time devtools/check-duplicate-includes.sh
> 4,72s user 0,80s system 153% cpu 3,603 total
>
> I prefer this version using only pipes, which is well parallelized:
>
> for file in $(git ls-files '*.[ch]') ; do
> sed -rn "s,$pattern,\1,p" $file | sort | uniq -d |
> sed "s,^,$file: duplicated include: ,"
> done
>
> 7,40s user 1,49s system 231% cpu 3,847 total
And now, the big shell optimization:
export LC_ALL=C
Result is impressive:
2,99s user 0,72s system 258% cpu 1,436 total
I'm sure you will agree to integrate my version now :)
> On Jul 14, 2017, at 10:54 AM, Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 14/07/2017 17:39, Thomas Monjalon:
>> 13/07/2017 08:56, Thomas Monjalon:
>>> 12/07/2017 23:59, Stephen Hemminger:
>>>> On Tue, 11 Jul 2017 22:33:55 +0200
>>>> Thomas Monjalon <thomas@monjalon.net> wrote:
>>>>
>>>>> Thank you for this script, but... it is written in Perl!
>>>>> I don't think it is a good idea to add yet another language to DPDK.
>>>>> We already have shell and python scripts.
>>>>> And I am not sure a lot of (young) people are able to parse it ;)
>>>>>
>>>>> I would like to propose this shell script:
>> [...]
>>>
>>>> plus shell is 7x slower.
>>>>
>>>> $ time bash -c "find . -name '*.c' | xargs /tmp/dupinc.sh"
>>>> real 0m0.765s
>>>> user 0m1.220s
>>>> sys 0m0.155s
>>>> $time bash -c "find . -name '*.c' | xargs ~/bin/dup_inc.pl"
>>>> real 0m0.131s
>>>> user 0m0.118s
>>>> sys 0m0.014s
>>>
>>> I don't think speed is really relevant here :)
>>
>> I did my own benchmark (recreation time):
>>
>> # time sh -c 'for file in $(git ls-files app buildtools drivers examples lib test) ; do devtools/dup_include.pl $file ; done'
>> 4,41s user 1,32s system 101% cpu 5,667 total
>> # time devtools/check-duplicate-includes.sh
>> 5,48s user 1,00s system 153% cpu 4,222 total
>>
>> The shell version is reported as faster on my computer!
>>
>> It is faster when filtering only .c and .h files:
>>
>> for file in $(git ls-files '*.[ch]') ; do
>> dups=$(sed -rn "s,$pattern,\1,p" $file | sort | uniq -d)
>> [ -z "$dups" ] || echo "$dups" | sed "s,^,$file: duplicated include: ,"
>> done
>>
>> # time sh -c 'for file in $(git ls-files "*.[ch]") ; do devtools/dup_include.pl $file ; done'
>> 3,65s user 1,05s system 100% cpu 4,668 total
>> # time devtools/check-duplicate-includes.sh
>> 4,72s user 0,80s system 153% cpu 3,603 total
>>
>> I prefer this version using only pipes, which is well parallelized:
>>
>> for file in $(git ls-files '*.[ch]') ; do
>> sed -rn "s,$pattern,\1,p" $file | sort | uniq -d |
>> sed "s,^,$file: duplicated include: ,"
>> done
>>
>> 7,40s user 1,49s system 231% cpu 3,847 total
>
> And now, the big shell optimization:
> export LC_ALL=C
> Result is impressive:
> 2,99s user 0,72s system 258% cpu 1,436 total
>
> I'm sure you will agree to integrate my version now :)
>
hands down winner, where is the patch? :-)
>
Regards,
Keith
new file mode 100755
@@ -0,0 +1,64 @@
+#! /usr/bin/env perl
+
+# BSD LICENSE
+#
+# Copyright 2016 Stephen Hemminger <stephen@networkplumber.org>
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+#
+# * Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in
+# the documentation and/or other materials provided with the
+# distribution.
+# * Neither the name of 6WIND S.A. nor the names of its
+# contributors may be used to endorse or promote products derived
+# from this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# Examine files looking for duplicate references to the
+# same include file.
+#
+# Usage: dup_include.pl foo.c bar.c
+#
+
+use strict;
+use warnings;
+
+die "no input files'n"
+ if ($#ARGV < 0);
+
+foreach my $filename (@ARGV) {
+ open my $f, '<', $filename
+ or die "$filename: $!\n";
+
+ my %included;
+ my $lineno = 0;
+
+ while (<$f>) {
+ ++$lineno;
+ next unless /^#include\s+<(.+)>$/;
+
+ my $h = $1;
+ warn "$filename:$lineno duplicate include $h\n"
+ if $included{$h};
+
+ $included{$h} = 1;
+ }
+
+ close $f;
+}