[v2,3/5] dts: add parsing utility module

Message ID 20240509112635.1170557-4-luca.vizzarro@arm.com (mailing list archive)
State Superseded
Delegated to: Thomas Monjalon
Headers
Series dts: testpmd show port info/stats |

Checks

Context Check Description
ci/checkpatch success coding style OK

Commit Message

Luca Vizzarro May 9, 2024, 11:26 a.m. UTC
  Adds parsing text into a custom dataclass. It provides a new
`TextParser` dataclass to be inherited. This implements the `parse`
method, which combined with the parser functions, it can automatically
parse the value for each field.

This new utility will facilitate and simplify the parsing of complex
command outputs, while ensuring that the codebase does not get bloated
and stays flexible.

Signed-off-by: Luca Vizzarro <luca.vizzarro@arm.com>
Reviewed-by: Paul Szczepanek <paul.szczepanek@arm.com>
---
 dts/framework/parser.py | 199 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 199 insertions(+)
 create mode 100644 dts/framework/parser.py
  

Comments

Jeremy Spewock May 31, 2024, 9:06 p.m. UTC | #1
On Thu, May 9, 2024 at 7:26 AM Luca Vizzarro <luca.vizzarro@arm.com> wrote:
>
> Adds parsing text into a custom dataclass. It provides a new
> `TextParser` dataclass to be inherited. This implements the `parse`
> method, which combined with the parser functions, it can automatically
> parse the value for each field.
>
> This new utility will facilitate and simplify the parsing of complex
> command outputs, while ensuring that the codebase does not get bloated
> and stays flexible.
>
> Signed-off-by: Luca Vizzarro <luca.vizzarro@arm.com>
> Reviewed-by: Paul Szczepanek <paul.szczepanek@arm.com>
> ---
Reviewed-by: Jeremy Spewock <jspewock@iol.unh.edu>
  
Juraj Linkeš June 4, 2024, 3:13 p.m. UTC | #2
There are mostly documentation and basically inconsequential minor comments.

On 9. 5. 2024 13:26, Luca Vizzarro wrote:
> Adds parsing text into a custom dataclass. It provides a new
> `TextParser` dataclass to be inherited. This implements the `parse`
> method, which combined with the parser functions, it can automatically
> parse the value for each field.
> 
> This new utility will facilitate and simplify the parsing of complex
> command outputs, while ensuring that the codebase does not get bloated
> and stays flexible.
> 
> Signed-off-by: Luca Vizzarro <luca.vizzarro@arm.com>
> Reviewed-by: Paul Szczepanek <paul.szczepanek@arm.com>
> ---
>   dts/framework/parser.py | 199 ++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 199 insertions(+)
>   create mode 100644 dts/framework/parser.py
> 
> diff --git a/dts/framework/parser.py b/dts/framework/parser.py
> new file mode 100644
> index 0000000000..5b4acddead
> --- /dev/null
> +++ b/dts/framework/parser.py
> @@ -0,0 +1,199 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2024 Arm Limited
> +
> +r"""Parsing utility module.
> +
> +This module provides :class:`~TextParser` which can be used to model any dataclass to a block of
> +text.
> +
> +Usage example::

I'd like to see a high level explanation of what the key pieces of the 
parsing are and how they're tied to the implementation:
1. The text we're about to parse, passed to the instance of a subclass
2. What we're parsing, the fields of the subclass
3. How we're parsing the fields, the functions of TextParser

This could be part of the example or mentioned before the example or in 
the class. I had to study the code to understand the API, so that should 
be better documented.

> +..code:: python
> +
> +    from dataclasses import dataclass, field
> +    from enum import Enum
> +    from framework.parser import TextParser
> +
> +    class Colour(Enum):
> +        BLACK = 1
> +        WHITE = 2
> +
> +        @classmethod
> +        def from_str(cls, text: str):
> +            match text:
> +                case "black":
> +                    return cls.BLACK
> +                case "white":
> +                    return cls.WHITE
> +                case _:
> +                    return None # unsupported colour
> +
> +        @classmethod
> +        def make_parser(cls):
> +            # make a parser function that finds a match and
> +            # then makes it a Colour object through Colour.from_str
> +            return TextParser.compose(cls.from_str, TextParser.find(r"is a (\w+)"))
> +
> +    @dataclass
> +    class Animal(TextParser):
> +        kind: str = field(metadata=TextParser.find(r"is a \w+ (\w+)"))
> +        name: str = field(metadata=TextParser.find(r"^(\w+)"))
> +        colour: Colour = field(metadata=Colour.make_parser())
> +        age: int = field(metadata=TextParser.find_int(r"aged (\d+)"))
> +
> +    steph = Animal.parse("Stephanie is a white cat aged 10")
> +    print(steph) # Animal(kind='cat', name='Stephanie', colour=<Colour.WHITE: 2>, age=10)
> +"""
> +
> +import re
> +from abc import ABC
> +from dataclasses import MISSING, dataclass, fields
> +from functools import partial
> +from typing import Any, Callable, TypedDict, cast
> +
> +from typing_extensions import Self
> +
> +
> +class ParserFn(TypedDict):
> +    """Parser function in a dict compatible with the :func:`dataclasses.field` metadata param."""
> +
> +    #:
> +    TextParser_fn: Callable[[str], Any]
> +
> +
> +@dataclass
> +class TextParser(ABC):
> +    """Helper abstract dataclass that parses a text according to the fields' rules.
> +
> +    This class provides a selection of parser functions and a function to compose generic functions
> +    with parser functions. Parser functions are designed to be passed to the fields' metadata param
> +    to enable parsing.
> +    """

We should add that this class should be subclassed; basically what I've 
laid out above so that devs know how to actually use the class. I'm not 
sure which is the better place to put it, but probably here.

> +
> +    """============ BEGIN PARSER FUNCTIONS ============"""
> +
> +    @staticmethod
> +    def compose(f: Callable, parser_fn: ParserFn) -> ParserFn:
> +        """Makes a composite parser function.
> +
> +        The parser function is run and if a non-None value was returned, f is called with it.

Let's use `parser_fn` instead of "The parser function". Let's also put f 
into backticks. It's also peculiar that we're passing the functions in 
the reverse order - first the second one is applied and then the first 
one is applied.

> +        Otherwise the function returns early with None.
> +
> +        Metadata modifier for :func:`dataclasses.field`.

This sentence is all alone here and I don't understand what it's saying 
without more context.

> +
> +        Args:
> +            f: the function to apply to the parser's result

This now refers to just parser's results instead of parser functions's 
result, which is confusing. But let's also use `parser_fn` here.

Also, the arg descriptions should end with a dot. And start with a 
capital letter.

https://www.sphinx-doc.org/en/master/usage/extensions/example_google.html#example-google

> +            parser_fn: the dictionary storing the parser function
> +        """

The docstring is missing a Returns: section.

> +        g = parser_fn["TextParser_fn"]

This is a matter of preference, but I like more descriptive variable 
names, something like initial_func. And then I'd change f to 
subsequent_func, which would make the function signature a bit more 
readable.

> +
> +        def _composite_parser_fn(text: str) -> Any:
> +            intermediate_value = g(text)
> +            if intermediate_value is None:
> +                return None
> +            return f(intermediate_value)
> +
> +        return ParserFn(TextParser_fn=_composite_parser_fn)
> +
> +    @staticmethod
> +    def find(
> +        pattern: str | re.Pattern[str],
> +        flags: re.RegexFlag = re.RegexFlag(0),
> +        named: bool = False,
> +    ) -> ParserFn:
> +        """Makes a parser function that finds a regular expression match in the text.
> +
> +        If the pattern has capturing groups, it returns None if no match was found. If the pattern
> +        has only one capturing group and a match was found, its value is returned. If the pattern
> +        has no capturing groups then either True or False is returned if the pattern had a match or
> +        not.
> +

This description would be a better fit in a Returns: section. The body 
could explain the various scenarios in which the function can be used. 
At least I imagine there are various scenarios based on the different 
things the function returns, but I don't really want to go around 
digging in the code to verify that. :-)

Also a note: what does the function return if the pattern has capturing 
groups and a match is found?

> +        Metadata modifier for :func:`dataclasses.field`.
> +
> +        Args:
> +            pattern: the regular expression pattern
> +            flags: the regular expression flags. Not used if the given pattern is already compiled
> +            named: if set to True only the named capture groups will be returned as a dictionary
> +        """

This is also missing the Returns: sections.

> +        if isinstance(pattern, str):
> +            pattern = re.compile(pattern, flags)
> +
> +        def _find(text: str) -> Any:
> +            m = pattern.search(text)
> +            if m is None:
> +                return None if pattern.groups > 0 else False
> +
> +            if pattern.groups == 0:
> +                return True
> +
> +            if named:
> +                return m.groupdict()
> +
> +            matches = m.groups()
> +            if len(matches) == 1:
> +                return matches[0]
> +
> +            return matches
> +
> +        return ParserFn(TextParser_fn=_find)
> +
> +    @classmethod

Is there a reason why find_int() is a classmethod while the rest are 
staticmethods? Looks like it could also be a staticmethod.

> +    def find_int(
> +        cls,
> +        pattern: str | re.Pattern[str],
> +        flags: re.RegexFlag = re.RegexFlag(0),
> +        int_base: int = 0,
> +    ) -> ParserFn:
> +        """Makes a parser function that converts the match of :meth:`~find` to int.
> +
> +        This function is compatible only with a pattern containing one capturing group.
> +
> +        Metadata modifier for :func:`dataclasses.field`.
> +
> +        Args:
> +            pattern: the regular expression pattern
> +            flags: the regular expression flags
> +            int_base: the base of the number to convert from
> +        Raises:
> +            RuntimeError: if the pattern does not have exactly one capturing group
> +        """
> +        if isinstance(pattern, str):
> +            pattern = re.compile(pattern, flags)
> +
> +        if pattern.groups != 1:
> +            raise RuntimeError("only one capturing group is allowed with this parser function")
> +

Have you considered using a subclass of DTSError so that we can add a 
severity to this error? The severity is there to "rank" DTS errors from 
worst to least worst.

> +        return cls.compose(partial(int, base=int_base), cls.find(pattern))
> +
> +    """============ END PARSER FUNCTIONS ============"""
> +
> +    @classmethod
> +    def parse(cls, text: str) -> Self:

Would converting this into __init__(self, text: str) work? Sounds like 
we could just use "for field in fields(self)" and then setattr() to 
populate the variables.

> +        """Creates a new instance of the class from the given text.
> +
> +        A new class instance is created with all the fields that have a parser function in their
> +        metadata. Fields without one are ignored and are expected to have a default value, otherwise
> +        the class initialization will fail.
> +
> +        A field is populated with the value returned by its corresponding parser function.
> +
> +        Args:
> +            text: the text to parse
> +        Raises:
> +            RuntimeError: if the parser did not find a match and the field does not have a default
> +                          value or default factory.
> +        """
> +        fields_values = {}
> +        for field in fields(cls):
> +            parse = cast(ParserFn, field.metadata).get("TextParser_fn")
> +            if parse is None:
> +                continue
> +
> +            value = parse(text)
> +            if value is not None:
> +                fields_values[field.name] = value

If we convert the method into a constructor, we would just use setattr() 
here.

> +            elif field.default is MISSING and field.default_factory is MISSING:
> +                raise RuntimeError(
> +                    f"parser for field {field.name} returned None, but the field has no default"
> +                )
> +
> +        return cls(**fields_values)
  
Luca Vizzarro June 5, 2024, 10:35 a.m. UTC | #3
On 04/06/2024 16:13, Juraj Linkeš wrote:
> I'd like to see a high level explanation of what the key pieces of the 
> parsing are and how they're tied to the implementation:
> 1. The text we're about to parse, passed to the instance of a subclass
> 2. What we're parsing, the fields of the subclass
> 3. How we're parsing the fields, the functions of TextParser
> 
> This could be part of the example or mentioned before the example or in 
> the class. I had to study the code to understand the API, so that should 
> be better documented.
Ack.

>> +@dataclass
>> +class TextParser(ABC):
>> +    """Helper abstract dataclass that parses a text according to the 
>> fields' rules.
>> +
>> +    This class provides a selection of parser functions and a 
>> function to compose generic functions
>> +    with parser functions. Parser functions are designed to be passed 
>> to the fields' metadata param
>> +    to enable parsing.
>> +    """
> 
> We should add that this class should be subclassed; basically what I've 
> laid out above so that devs know how to actually use the class. I'm not 
> sure which is the better place to put it, but probably here.
Ack.
>> +
>> +    """============ BEGIN PARSER FUNCTIONS ============"""
>> +
>> +    @staticmethod
>> +    def compose(f: Callable, parser_fn: ParserFn) -> ParserFn:
>> +        """Makes a composite parser function.
>> +
>> +        The parser function is run and if a non-None value was 
>> returned, f is called with it.
> 
> Let's use `parser_fn` instead of "The parser function". Let's also put f 
> into backticks. It's also peculiar that we're passing the functions in 
> the reverse order - first the second one is applied and then the first 
> one is applied.
> 

Ack.

compose(f, g) is equivalent of f(g(..)). It follows a syntactical order, 
rather than a calling one. I don't have a preference on ordering either 
way, so anything is fine by me.

>> +        Otherwise the function returns early with None.
>> +
>> +        Metadata modifier for :func:`dataclasses.field`.
> 
> This sentence is all alone here and I don't understand what it's saying 
> without more context.

Similarly to what was previously done with the Params class, this is 
just to signify that this function is modifies the metadata argument of 
dataclasses.field. It's just a note. I guess this could actually be in 
the Returns section, and it'd be clearer that way.

>> +
>> +        Args:
>> +            f: the function to apply to the parser's result
> 
> This now refers to just parser's results instead of parser functions's 
> result, which is confusing. But let's also use `parser_fn` here.
> 
> Also, the arg descriptions should end with a dot. And start with a 
> capital letter.
> 
> https://www.sphinx-doc.org/en/master/usage/extensions/example_google.html#example-google
> 

Ack.

>> +            parser_fn: the dictionary storing the parser function
>> +        """
> 
> The docstring is missing a Returns: section.

Ack.

>> +        g = parser_fn["TextParser_fn"]
> 
> This is a matter of preference, but I like more descriptive variable 
> names, something like initial_func. And then I'd change f to 
> subsequent_func, which would make the function signature a bit more 
> readable.

Ack.

>> +    @staticmethod
>> +    def find(
>> +        pattern: str | re.Pattern[str],
>> +        flags: re.RegexFlag = re.RegexFlag(0),
>> +        named: bool = False,
>> +    ) -> ParserFn:
>> +        """Makes a parser function that finds a regular expression 
>> match in the text.
>> +
>> +        If the pattern has capturing groups, it returns None if no 
>> match was found. If the pattern
>> +        has only one capturing group and a match was found, its value 
>> is returned. If the pattern
>> +        has no capturing groups then either True or False is returned 
>> if the pattern had a match or
>> +        not.
>> +
> 
> This description would be a better fit in a Returns: section. The body 
> could explain the various scenarios in which the function can be used. 
> At least I imagine there are various scenarios based on the different 
> things the function returns, but I don't really want to go around 
> digging in the code to verify that. :-)

Returns is about what the `find` returns, but it's not what it actually 
returns. `find` returns a dictionary compatible with 
`dataclasses.field`'s metadata argument, which contains the actual _find 
function. Hence why I wrote that segment in the body. I think it would 
be confusing to put this description in the returns. I agree that this 
should be clarified though.

> Also a note: what does the function return if the pattern has capturing 
> groups and a match is found?

Ha! The implicit answer is the captured groups, but yes it should be 
added. Nice catch.

>> +        Metadata modifier for :func:`dataclasses.field`.
>> +
>> +        Args:
>> +            pattern: the regular expression pattern
>> +            flags: the regular expression flags. Not used if the 
>> given pattern is already compiled
>> +            named: if set to True only the named capture groups will 
>> be returned as a dictionary
>> +        """
> 
> This is also missing the Returns: sections.

Ack.

>> +        if isinstance(pattern, str):
>> +            pattern = re.compile(pattern, flags)
>> +
>> +        def _find(text: str) -> Any:
>> +            m = pattern.search(text)
>> +            if m is None:
>> +                return None if pattern.groups > 0 else False
>> +
>> +            if pattern.groups == 0:
>> +                return True
>> +
>> +            if named:
>> +                return m.groupdict()
>> +
>> +            matches = m.groups()
>> +            if len(matches) == 1:
>> +                return matches[0]
>> +
>> +            return matches
>> +
>> +        return ParserFn(TextParser_fn=_find)
>> +
>> +    @classmethod
> 
> Is there a reason why find_int() is a classmethod while the rest are 
> staticmethods? Looks like it could also be a staticmethod.

Class methods are pretty much static methods that receive a reference to 
the class. In this case it's a class method as we are calling another 
class method: cls.find(..). The @staticmethod equivalent would be 
explicitly specifying the class and the method we want to use.

>> +    def find_int(
>> +        cls,
>> +        pattern: str | re.Pattern[str],
>> +        flags: re.RegexFlag = re.RegexFlag(0),
>> +        int_base: int = 0,
>> +    ) -> ParserFn:
>> +        """Makes a parser function that converts the match of 
>> :meth:`~find` to int.
>> +
>> +        This function is compatible only with a pattern containing 
>> one capturing group.
>> +
>> +        Metadata modifier for :func:`dataclasses.field`.
>> +
>> +        Args:
>> +            pattern: the regular expression pattern
>> +            flags: the regular expression flags
>> +            int_base: the base of the number to convert from
>> +        Raises:
>> +            RuntimeError: if the pattern does not have exactly one 
>> capturing group
>> +        """
>> +        if isinstance(pattern, str):
>> +            pattern = re.compile(pattern, flags)
>> +
>> +        if pattern.groups != 1:
>> +            raise RuntimeError("only one capturing group is allowed 
>> with this parser function")
>> +
> 
> Have you considered using a subclass of DTSError so that we can add a 
> severity to this error? The severity is there to "rank" DTS errors from 
> worst to least worst.

No, I actually didn't think of it. This error should only occur out of 
development error, and it should never be triggered in practice. Do you 
think it should be used?

>> +        return cls.compose(partial(int, base=int_base), 
>> cls.find(pattern))
>> +
>> +    """============ END PARSER FUNCTIONS ============"""
>> +
>> +    @classmethod
>> +    def parse(cls, text: str) -> Self:
> 
> Would converting this into __init__(self, text: str) work? Sounds like 
> we could just use "for field in fields(self)" and then setattr() to 
> populate the variables.

I am not in favour of overriding the constructor, as I see the parsing 
ability as an extension to the class. Nonetheless, this would make sense 
if we think that this would be used exclusively for objects that need to 
be parsed in order to be constructed. I purposefully left the 
flexibility open, but if we don't think it'll ever be needed, then it's 
not a game changer to me.

>> +        for field in fields(cls):
>> +            parse = cast(ParserFn, field.metadata).get("TextParser_fn")
>> +            if parse is None:
>> +                continue
>> +
>> +            value = parse(text)
>> +            if value is not None:
>> +                fields_values[field.name] = value
> 
> If we convert the method into a constructor, we would just use setattr() 
> here.

Ack.
  
Juraj Linkeš June 5, 2024, 12:19 p.m. UTC | #4
>>> +
>>> +    """============ BEGIN PARSER FUNCTIONS ============"""
>>> +
>>> +    @staticmethod
>>> +    def compose(f: Callable, parser_fn: ParserFn) -> ParserFn:
>>> +        """Makes a composite parser function.
>>> +
>>> +        The parser function is run and if a non-None value was 
>>> returned, f is called with it.
>>
>> Let's use `parser_fn` instead of "The parser function". Let's also put 
>> f into backticks. It's also peculiar that we're passing the functions 
>> in the reverse order - first the second one is applied and then the 
>> first one is applied.
>>
> 
> Ack.
> 
> compose(f, g) is equivalent of f(g(..)). It follows a syntactical order, 
> rather than a calling one. I don't have a preference on ordering either 
> way, so anything is fine by me.
> 

I'd prefer to have the calling order. You could also include that the 
resulting function is f(g()).

>>> +        Otherwise the function returns early with None.
>>> +
>>> +        Metadata modifier for :func:`dataclasses.field`.
>>
>> This sentence is all alone here and I don't understand what it's 
>> saying without more context.
> 
> Similarly to what was previously done with the Params class, this is 
> just to signify that this function is modifies the metadata argument of 
> dataclasses.field. It's just a note. I guess this could actually be in 
> the Returns section, and it'd be clearer that way.
> 

Ok, let's see how that reads.


>>> +    @staticmethod
>>> +    def find(
>>> +        pattern: str | re.Pattern[str],
>>> +        flags: re.RegexFlag = re.RegexFlag(0),
>>> +        named: bool = False,
>>> +    ) -> ParserFn:
>>> +        """Makes a parser function that finds a regular expression 
>>> match in the text.
>>> +
>>> +        If the pattern has capturing groups, it returns None if no 
>>> match was found. If the pattern
>>> +        has only one capturing group and a match was found, its 
>>> value is returned. If the pattern
>>> +        has no capturing groups then either True or False is 
>>> returned if the pattern had a match or
>>> +        not.
>>> +
>>
>> This description would be a better fit in a Returns: section. The body 
>> could explain the various scenarios in which the function can be used. 
>> At least I imagine there are various scenarios based on the different 
>> things the function returns, but I don't really want to go around 
>> digging in the code to verify that. :-)
> 
> Returns is about what the `find` returns, but it's not what it actually 
> returns. `find` returns a dictionary compatible with 
> `dataclasses.field`'s metadata argument, which contains the actual _find 
> function. Hence why I wrote that segment in the body. I think it would 
> be confusing to put this description in the returns. I agree that this 
> should be clarified though.
> 

They way it's phrased it looks like you're describing what find() 
returns, but I get that it returns a function that returns what you 
described. It could be easily clarified in the returns section though:

Returns:
     A function that returns <the whole description>.

>>> +    @classmethod
>>
>> Is there a reason why find_int() is a classmethod while the rest are 
>> staticmethods? Looks like it could also be a staticmethod.
> 
> Class methods are pretty much static methods that receive a reference to 
> the class. In this case it's a class method as we are calling another 
> class method: cls.find(..). The @staticmethod equivalent would be 
> explicitly specifying the class and the method we want to use.
> 

I was wondering about this because it doesn't need to be a classmethod, 
as we're calling a static method from within and as you mention, we can 
just specify the class with the method (i.e. TextParser.find()), so my 
thoughts were why is find() deviating from the rest of the methods when 
it doesn't need to.

This doesn't really matter though, it's going to be called the same from 
outside the class, so I'll leave this up to you. :-)

>>> +    def find_int(
>>> +        cls,
>>> +        pattern: str | re.Pattern[str],
>>> +        flags: re.RegexFlag = re.RegexFlag(0),
>>> +        int_base: int = 0,
>>> +    ) -> ParserFn:
>>> +        """Makes a parser function that converts the match of 
>>> :meth:`~find` to int.
>>> +
>>> +        This function is compatible only with a pattern containing 
>>> one capturing group.
>>> +
>>> +        Metadata modifier for :func:`dataclasses.field`.
>>> +
>>> +        Args:
>>> +            pattern: the regular expression pattern
>>> +            flags: the regular expression flags
>>> +            int_base: the base of the number to convert from
>>> +        Raises:
>>> +            RuntimeError: if the pattern does not have exactly one 
>>> capturing group
>>> +        """
>>> +        if isinstance(pattern, str):
>>> +            pattern = re.compile(pattern, flags)
>>> +
>>> +        if pattern.groups != 1:
>>> +            raise RuntimeError("only one capturing group is allowed 
>>> with this parser function")
>>> +
>>
>> Have you considered using a subclass of DTSError so that we can add a 
>> severity to this error? The severity is there to "rank" DTS errors 
>> from worst to least worst.
> 
> No, I actually didn't think of it. This error should only occur out of 
> development error, and it should never be triggered in practice. Do you 
> think it should be used?
> 

I err on the side of using DTSErrors, as it gives automated tools a 
possibly useful way of telling the severity of the error (from the 
return code). A development error like this could be easily identifiable 
in CI then and we could maybe produce a helpful message based on that.

But maybe we can talk about this Patrick and his team (how usable the 
return code is and what the severities should be).

>>> +        return cls.compose(partial(int, base=int_base), 
>>> cls.find(pattern))
>>> +
>>> +    """============ END PARSER FUNCTIONS ============"""
>>> +
>>> +    @classmethod
>>> +    def parse(cls, text: str) -> Self:
>>
>> Would converting this into __init__(self, text: str) work? Sounds like 
>> we could just use "for field in fields(self)" and then setattr() to 
>> populate the variables.
> 
> I am not in favour of overriding the constructor, as I see the parsing 
> ability as an extension to the class.

Well, the class should primarily do the parsing (based on its name), so 
everything else is an extension in my mind.

> Nonetheless, this would make sense 
> if we think that this would be used exclusively for objects that need to 
> be parsed in order to be constructed. I purposefully left the 
> flexibility open, but if we don't think it'll ever be needed, then it's 
> not a game changer to me.
> 

Everything about the class indicates (or outright says) that parsing 
should be the only purpose. I can't really imagine what else we could 
add (in this state, an instance before calling parse() would be just the 
text with fields containing function in the metadata needed for parsing 
- what else is there to do with this data?). Can you elaborate if you 
can think of something?

And again, that said, it doesn't matter much, but I like the constructor 
as it really doesn't look like the class could (or should) do anything 
else than parsing text.
  
Luca Vizzarro June 5, 2024, 1 p.m. UTC | #5
On 05/06/2024 13:19, Juraj Linkeš wrote:
>>>> +        return cls.compose(partial(int, base=int_base), 
>>>> cls.find(pattern))
>>>> +
>>>> +    """============ END PARSER FUNCTIONS ============"""
>>>> +
>>>> +    @classmethod
>>>> +    def parse(cls, text: str) -> Self:
>>>
>>> Would converting this into __init__(self, text: str) work? Sounds 
>>> like we could just use "for field in fields(self)" and then setattr() 
>>> to populate the variables.
>>
>> I am not in favour of overriding the constructor, as I see the parsing 
>> ability as an extension to the class.
> 
> Well, the class should primarily do the parsing (based on its name), so 
> everything else is an extension in my mind >> Nonetheless, this would make sense if we think that this would be used
>> exclusively for objects that need to be parsed in order to be 
>> constructed. I purposefully left the flexibility open, but if we don't 
>> think it'll ever be needed, then it's not a game changer to me.
> 
> Everything about the class indicates (or outright says) that parsing 
> should be the only purpose. I can't really imagine what else we could 
> add (in this state, an instance before calling parse() would be just the 
> text with fields containing function in the metadata needed for parsing 
> - what else is there to do with this data?). Can you elaborate if you 
> can think of something?
> 
> And again, that said, it doesn't matter much, but I like the constructor 
> as it really doesn't look like the class could (or should) do anything 
> else than parsing text.

I think there is some confusion going on. The class is any data object 
like TestPmdPort. TextParser is an implementation of a parsing 
functionality, a Protocol/interface with a pre-implemented 
functionality, which can be added as an extension.
The only way this would make sense is if these data objects are actually 
"text representation to object adapters". If we are exclusively 
interpreting them as such, then yes, it would totally make sense to 
bring a custom constructor. But if we will need to re-use the objects 
for other reasons aside from parsing, then we may have to revert to the 
current implementation.
  

Patch

diff --git a/dts/framework/parser.py b/dts/framework/parser.py
new file mode 100644
index 0000000000..5b4acddead
--- /dev/null
+++ b/dts/framework/parser.py
@@ -0,0 +1,199 @@ 
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2024 Arm Limited
+
+r"""Parsing utility module.
+
+This module provides :class:`~TextParser` which can be used to model any dataclass to a block of
+text.
+
+Usage example::
+..code:: python
+
+    from dataclasses import dataclass, field
+    from enum import Enum
+    from framework.parser import TextParser
+
+    class Colour(Enum):
+        BLACK = 1
+        WHITE = 2
+
+        @classmethod
+        def from_str(cls, text: str):
+            match text:
+                case "black":
+                    return cls.BLACK
+                case "white":
+                    return cls.WHITE
+                case _:
+                    return None # unsupported colour
+
+        @classmethod
+        def make_parser(cls):
+            # make a parser function that finds a match and
+            # then makes it a Colour object through Colour.from_str
+            return TextParser.compose(cls.from_str, TextParser.find(r"is a (\w+)"))
+
+    @dataclass
+    class Animal(TextParser):
+        kind: str = field(metadata=TextParser.find(r"is a \w+ (\w+)"))
+        name: str = field(metadata=TextParser.find(r"^(\w+)"))
+        colour: Colour = field(metadata=Colour.make_parser())
+        age: int = field(metadata=TextParser.find_int(r"aged (\d+)"))
+
+    steph = Animal.parse("Stephanie is a white cat aged 10")
+    print(steph) # Animal(kind='cat', name='Stephanie', colour=<Colour.WHITE: 2>, age=10)
+"""
+
+import re
+from abc import ABC
+from dataclasses import MISSING, dataclass, fields
+from functools import partial
+from typing import Any, Callable, TypedDict, cast
+
+from typing_extensions import Self
+
+
+class ParserFn(TypedDict):
+    """Parser function in a dict compatible with the :func:`dataclasses.field` metadata param."""
+
+    #:
+    TextParser_fn: Callable[[str], Any]
+
+
+@dataclass
+class TextParser(ABC):
+    """Helper abstract dataclass that parses a text according to the fields' rules.
+
+    This class provides a selection of parser functions and a function to compose generic functions
+    with parser functions. Parser functions are designed to be passed to the fields' metadata param
+    to enable parsing.
+    """
+
+    """============ BEGIN PARSER FUNCTIONS ============"""
+
+    @staticmethod
+    def compose(f: Callable, parser_fn: ParserFn) -> ParserFn:
+        """Makes a composite parser function.
+
+        The parser function is run and if a non-None value was returned, f is called with it.
+        Otherwise the function returns early with None.
+
+        Metadata modifier for :func:`dataclasses.field`.
+
+        Args:
+            f: the function to apply to the parser's result
+            parser_fn: the dictionary storing the parser function
+        """
+        g = parser_fn["TextParser_fn"]
+
+        def _composite_parser_fn(text: str) -> Any:
+            intermediate_value = g(text)
+            if intermediate_value is None:
+                return None
+            return f(intermediate_value)
+
+        return ParserFn(TextParser_fn=_composite_parser_fn)
+
+    @staticmethod
+    def find(
+        pattern: str | re.Pattern[str],
+        flags: re.RegexFlag = re.RegexFlag(0),
+        named: bool = False,
+    ) -> ParserFn:
+        """Makes a parser function that finds a regular expression match in the text.
+
+        If the pattern has capturing groups, it returns None if no match was found. If the pattern
+        has only one capturing group and a match was found, its value is returned. If the pattern
+        has no capturing groups then either True or False is returned if the pattern had a match or
+        not.
+
+        Metadata modifier for :func:`dataclasses.field`.
+
+        Args:
+            pattern: the regular expression pattern
+            flags: the regular expression flags. Not used if the given pattern is already compiled
+            named: if set to True only the named capture groups will be returned as a dictionary
+        """
+        if isinstance(pattern, str):
+            pattern = re.compile(pattern, flags)
+
+        def _find(text: str) -> Any:
+            m = pattern.search(text)
+            if m is None:
+                return None if pattern.groups > 0 else False
+
+            if pattern.groups == 0:
+                return True
+
+            if named:
+                return m.groupdict()
+
+            matches = m.groups()
+            if len(matches) == 1:
+                return matches[0]
+
+            return matches
+
+        return ParserFn(TextParser_fn=_find)
+
+    @classmethod
+    def find_int(
+        cls,
+        pattern: str | re.Pattern[str],
+        flags: re.RegexFlag = re.RegexFlag(0),
+        int_base: int = 0,
+    ) -> ParserFn:
+        """Makes a parser function that converts the match of :meth:`~find` to int.
+
+        This function is compatible only with a pattern containing one capturing group.
+
+        Metadata modifier for :func:`dataclasses.field`.
+
+        Args:
+            pattern: the regular expression pattern
+            flags: the regular expression flags
+            int_base: the base of the number to convert from
+        Raises:
+            RuntimeError: if the pattern does not have exactly one capturing group
+        """
+        if isinstance(pattern, str):
+            pattern = re.compile(pattern, flags)
+
+        if pattern.groups != 1:
+            raise RuntimeError("only one capturing group is allowed with this parser function")
+
+        return cls.compose(partial(int, base=int_base), cls.find(pattern))
+
+    """============ END PARSER FUNCTIONS ============"""
+
+    @classmethod
+    def parse(cls, text: str) -> Self:
+        """Creates a new instance of the class from the given text.
+
+        A new class instance is created with all the fields that have a parser function in their
+        metadata. Fields without one are ignored and are expected to have a default value, otherwise
+        the class initialization will fail.
+
+        A field is populated with the value returned by its corresponding parser function.
+
+        Args:
+            text: the text to parse
+        Raises:
+            RuntimeError: if the parser did not find a match and the field does not have a default
+                          value or default factory.
+        """
+        fields_values = {}
+        for field in fields(cls):
+            parse = cast(ParserFn, field.metadata).get("TextParser_fn")
+            if parse is None:
+                continue
+
+            value = parse(text)
+            if value is not None:
+                fields_values[field.name] = value
+            elif field.default is MISSING and field.default_factory is MISSING:
+                raise RuntimeError(
+                    f"parser for field {field.name} returned None, but the field has no default"
+                )
+
+        return cls(**fields_values)