[3/5] dts: add parsing utility module

Message ID 20240412111136.3470304-4-luca.vizzarro@arm.com (mailing list archive)
State New
Delegated to: Thomas Monjalon
Headers
Series dts: testpmd show port info/stats |

Checks

Context Check Description
ci/checkpatch success coding style OK

Commit Message

Luca Vizzarro April 12, 2024, 11:11 a.m. UTC
  Adds parsing text into a custom data structure. It provides a new
`TextParser` dataclass to be inherited. This implements the `parse`
method, which combined with the parser functions, it can automatically
parse the value for each field.

Signed-off-by: Luca Vizzarro <luca.vizzarro@arm.com>
Reviewed-by: Paul Szczepanek <paul.szczepanek@arm.com>
---
 dts/framework/parser.py | 147 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 147 insertions(+)
 create mode 100644 dts/framework/parser.py
  

Comments

Juraj Linkeš April 16, 2024, 8:59 a.m. UTC | #1
On Fri, Apr 12, 2024 at 1:11 PM Luca Vizzarro <luca.vizzarro@arm.com> wrote:
>
> Adds parsing text into a custom data structure. It provides a new
> `TextParser` dataclass to be inherited. This implements the `parse`
> method, which combined with the parser functions, it can automatically
> parse the value for each field.
>

From this commit message, I don't know why we're adding the module.
What are we going to use it for? Since you mentioned you'll send a v2,
I'll wait with review after that as I think it'll make it a bit easier
to review.
  
Luca Vizzarro April 16, 2024, 12:16 p.m. UTC | #2
On 16/04/2024 09:59, Juraj Linkeš wrote:
>  From this commit message, I don't know why we're adding the module.
> What are we going to use it for? Since you mentioned you'll send a v2,
> I'll wait with review after that as I think it'll make it a bit easier
> to review.

Ack. Will rewrite the commit body.
  
Jeremy Spewock April 29, 2024, 4:15 p.m. UTC | #3
On Fri, Apr 12, 2024 at 7:11 AM Luca Vizzarro <luca.vizzarro@arm.com> wrote:
>
<snip>
> @@ -0,0 +1,147 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2024 Arm Limited
> +
> +"""Parsing utility module.
> +
> +This module provides :class:`~TextParser` which can be used to model any data structure
> +that can parse a block of text.
> +"""
> +

It would be helpful if this top level docstring explained more of how
to use the text parser and some examples of using a small dataclass
that chains some of these methods together. At first glance it wasn't
clear to me why things were done the way they were or what these
methods really provided, but looking at how you use it in the testpmd
shell made more sense.

> +from dataclasses import dataclass, fields, MISSING
> +import re
> +from typing import TypeVar
> +from typing_extensions import Self
> +
> +T = TypeVar("T")

This is declared but I don't see where it is used. There aren't many
typehints in this file, since these types can essentially be anything,
maybe using this could make some things slightly more clear (like, for
example, in eq v1 and v2 are the same type) but in most cases it's
implied so I'm not sure how beneficial this would be regardless.

> +
> +
> +META_PARSERS = "parsers"
> +
> +
> +def chain(parser, metadata):
> +    """Chain a parser function.
> +
> +    The parser function can take and return a single argument of any type. It is
> +    up to the user to ensure that the chained functions have compatible types.
> +
> +    Args:
> +        parser: the parser function pointer
> +        metadata: pre-existing metadata to chain if any
> +    """
> +    parsers = metadata.get(META_PARSERS) or []
> +    parsers.append(parser)
> +    return {**metadata, META_PARSERS: parsers}
> +
> +
> +def to_int(metadata={}, base=0):

Is it simpler to default this to base 10? I assume that's what it'll
be most of the time so we might as well allow users to skip this
parameter.

> +    """Converts a string to an integer.
> +
> +    Args:
> +        metadata: pre-existing metadata to chain if any
> +        base: argument passed to the constructor of ``int``
> +    """
> +    return chain(lambda v: int(v, base), metadata)
> +
<snip>
> +@dataclass
> +class TextParser:
> +    """Helper abstract dataclass that parses a text according to the fields' rules.
> +
> +    This class is accompanied by a selection of parser functions and a generic chaining function,
> +    that are to be set to the fields' metadata, to enable parsing. If a field metadata is not set with
> +    any parser function, this is skipped.
> +    """
> +
> +    @classmethod
> +    def parse(cls, text: str) -> Self:
> +        """The parsing class method.
> +
> +        This function loops through every field that has any parser function associated with it and runs
> +        each parser chain to the supplied text. If a parser function returns None, it expects that parsing
> +        has failed and continues to the next field.
> +
> +        Args:
> +            text: the text to parse
> +        Raises:
> +            RuntimeError: if the parser did not find a match and the field does not have a default value
> +                          or default factory.
> +        """
> +        fields_values = {}
> +        for field in fields(cls):
> +            parsers = field.metadata.get(META_PARSERS)
> +            if parsers is None:
> +                continue
> +
> +            field_value = text
> +            for parser_fn in parsers:
> +                field_value = parser_fn(field_value)
> +                if field_value is None:
> +                    # nothing was actually parsed, move on
> +                    break
> +
> +            if field_value is None:
> +                if field.default is MISSING and field.default_factory is MISSING:
> +                    raise RuntimeError(
> +                        f"parsers for field {field.name} returned None, but the field has no default"
> +                    )

If we just skip instead of raising an exception here, would this solve
the issues caused by the first and last line in the testpmd output?
The check to see if the first line is an invalid port would obviously
still not work, but would it solve the problem of the trailing prompt?

> +            else:
> +                fields_values[field.name] = field_value
> +
> +        return cls(**fields_values)
> --
> 2.34.1
>
  

Patch

diff --git a/dts/framework/parser.py b/dts/framework/parser.py
new file mode 100644
index 0000000000..5a2ba0c93a
--- /dev/null
+++ b/dts/framework/parser.py
@@ -0,0 +1,147 @@ 
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2024 Arm Limited
+
+"""Parsing utility module.
+
+This module provides :class:`~TextParser` which can be used to model any data structure
+that can parse a block of text.
+"""
+
+from dataclasses import dataclass, fields, MISSING
+import re
+from typing import TypeVar
+from typing_extensions import Self
+
+T = TypeVar("T")
+
+
+META_PARSERS = "parsers"
+
+
+def chain(parser, metadata):
+    """Chain a parser function.
+
+    The parser function can take and return a single argument of any type. It is
+    up to the user to ensure that the chained functions have compatible types.
+
+    Args:
+        parser: the parser function pointer
+        metadata: pre-existing metadata to chain if any
+    """
+    parsers = metadata.get(META_PARSERS) or []
+    parsers.append(parser)
+    return {**metadata, META_PARSERS: parsers}
+
+
+def to_int(metadata={}, base=0):
+    """Converts a string to an integer.
+
+    Args:
+        metadata: pre-existing metadata to chain if any
+        base: argument passed to the constructor of ``int``
+    """
+    return chain(lambda v: int(v, base), metadata)
+
+
+def eq(v2, metadata={}):
+    """Compares two values and returns a boolean.
+
+    Args:
+        v2: value to compare with the incoming value
+        metadata: pre-existing metadata to chain if any
+    """
+    return chain(lambda v1: v1 == v2, metadata)
+
+
+def to_bool(metadata={}):
+    """Evaluates a string into a boolean.
+
+    The following case-insensitive words yield ``True``: on, yes, enabled, true.
+
+    Args:
+        metadata: pre-existing metadata to chain if any
+    """
+    return chain(lambda s: s.lower() in ["on", "yes", "enabled", "true"], metadata)
+
+
+def regex(
+    pattern: str | re.Pattern[str],
+    flags: re.RegexFlag = re.RegexFlag(0),
+    named: bool = False,
+    metadata={},
+):
+    """Searches for a regular expression in a text.
+
+    If there is only one capture group, its value is returned, otherwise a tuple containing all the
+    capture groups values is returned instead.
+
+    Args:
+        pattern: the regular expression pattern
+        flags: the regular expression flags
+        named: if set to True only the named capture groups will be returned as a dictionary
+        metadata: pre-existing metadata to chain if any
+    """
+    pattern = re.compile(pattern, flags)
+
+    def regex_parser(text: str):
+        m = pattern.search(text)
+        if m is None:
+            return m
+
+        if named:
+            return m.groupdict()
+
+        matches = m.groups()
+        if len(matches) == 1:
+            return matches[0]
+
+        return matches
+
+    return chain(regex_parser, metadata)
+
+
+@dataclass
+class TextParser:
+    """Helper abstract dataclass that parses a text according to the fields' rules.
+
+    This class is accompanied by a selection of parser functions and a generic chaining function,
+    that are to be set to the fields' metadata, to enable parsing. If a field metadata is not set with
+    any parser function, this is skipped.
+    """
+
+    @classmethod
+    def parse(cls, text: str) -> Self:
+        """The parsing class method.
+
+        This function loops through every field that has any parser function associated with it and runs
+        each parser chain to the supplied text. If a parser function returns None, it expects that parsing
+        has failed and continues to the next field.
+
+        Args:
+            text: the text to parse
+        Raises:
+            RuntimeError: if the parser did not find a match and the field does not have a default value
+                          or default factory.
+        """
+        fields_values = {}
+        for field in fields(cls):
+            parsers = field.metadata.get(META_PARSERS)
+            if parsers is None:
+                continue
+
+            field_value = text
+            for parser_fn in parsers:
+                field_value = parser_fn(field_value)
+                if field_value is None:
+                    # nothing was actually parsed, move on
+                    break
+
+            if field_value is None:
+                if field.default is MISSING and field.default_factory is MISSING:
+                    raise RuntimeError(
+                        f"parsers for field {field.name} returned None, but the field has no default"
+                    )
+            else:
+                fields_values[field.name] = field_value
+
+        return cls(**fields_values)