Transformations

This module offers a canonical interface that aims to make the re-use of transformation algorithms easier.

Let’s look at it with examples:

from delb.transform import Transformation


class ResolveCopyOf(Transformation):
    def transform(self):
        for node in self.root.css_select(
            "*[copyOf]", namespaces={None: "http://www.tei-c.org/ns/1.0"}
        ):
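            # the code assumes that copyOf holds a local pointer such as "#some-id"
            source_id = node["copyOf"]
            # strip the leading "#" and look up the referenced node by its xml:id
            source_node = self.origin_document.xpath(
                f'//*[@xml:id="{source_id[1:]}"]',
                namespaces={}
            ).first
            # replace the referencing node with a deep copy that carries no id
            cloned_node = source_node.clone(deep=True)
            cloned_node.id = None
            node.replace_with(cloned_node)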

Instances of transformations defined this way can be called with a (sub-)tree and, optionally, the document that the tree originates from:

resolve_copy_of = ResolveCopyOf()
tree = resolve_copy_of(tree, origin_document=document)

Options for transformations are defined as subclasses of typing.NamedTuple:

from typing import Final, NamedTuple, TypedDict


class NamespacesKWArgs(TypedDict):
    namespaces: dict[str | None, str]


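# the TEI namespace URI, as used in the first example
TEI_NAMESPACE: Final = "http://www.tei-c.org/ns/1.0"
TEI: Final[NamespacesKWArgs] = {"namespaces": {None: TEI_NAMESPACE}}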


class ResolveChoiceOptions(NamedTuple):
    corr: bool = True
    reg: bool = True


class ResolveChoice(Transformation):
    options_class = ResolveChoiceOptions

    def __init__(self, options=None):
        super().__init__(options)
        self.keep_selector = ",".join(
            (
                "corr" if self.options.corr else "sic",
                "reg" if self.options.reg else "orig"
            )
        )
        self.drop_selector = ",".join(
            (
                "sic" if self.options.corr else "corr",
                "orig" if self.options.reg else "reg"
            )
        )

    def transform(self):
        for choice_node in self.root.css_select("choice", **TEI):
            node_to_drop = choice_node.css_select(self.drop_selector, **TEI).first
            node_to_drop.detach()

            node_to_keep = choice_node.css_select(self.keep_selector, **TEI).first
            node_to_keep.detach(retain_child_nodes=True)

            choice_node.detach(retain_child_nodes=True)

A transformation class that defines an options_class attribute can then be used either with its defaults or with alternate options:

resolve_choice = ResolveChoice()
tree = resolve_choice(tree)

resolve_choice = ResolveChoice(ResolveChoiceOptions(reg=False))
tree = resolve_choice(tree)
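
To see what the two options change, the selector logic from ResolveChoice.__init__ can be isolated in a plain function. The following is a minimal sketch that runs without delb; the helper name choice_selectors is hypothetical and not part of the library.

```python
from typing import NamedTuple


class ResolveChoiceOptions(NamedTuple):
    corr: bool = True
    reg: bool = True


def choice_selectors(options: ResolveChoiceOptions) -> tuple[str, str]:
    """Return the (keep, drop) CSS selectors for the given options.

    Hypothetical helper that mirrors the expressions in ResolveChoice.__init__.
    """
    keep = ",".join(
        ("corr" if options.corr else "sic", "reg" if options.reg else "orig")
    )
    drop = ",".join(
        ("sic" if options.corr else "corr", "orig" if options.reg else "reg")
    )
    return keep, drop


# NamedTuple fields fall back to their declared defaults ...
defaults = ResolveChoiceOptions()
# ... and _replace derives a variant without mutating the original.
keep_original_spelling = defaults._replace(reg=False)

print(choice_selectors(defaults))
print(choice_selectors(keep_original_spelling))
```

Because a NamedTuple is immutable, an options object can be shared between several transformation instances without one of them changing the behaviour of another.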

Finally, concrete transformations can be chained, either as classes or as instances. Sequences can themselves be combined into larger sequences:

from delb.transform import TransformationSequence

tidy_up = TransformationSequence(ResolveCopyOf, resolve_choice)
tree = tidy_up(tree)

Attention

This is an experimental feature. It might change significantly in the future or be removed altogether.

class delb.transform.Transformation(options: NamedTuple | None = None)

This is a base class for any transformation algorithm.

abstract transform()

This method must implement the transformation logic. When it is called, two attributes have been set on the instance from the call:

root is the node that the transformation was called with. origin_document is the document that was passed as the second argument, if any.
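
The call protocol described here can be pictured with a small stand-in class. This is a sketch under stated assumptions, not delb's actual implementation: SketchTransformation and UppercaseItems are hypothetical names, the "tree" is a plain list, and the fallback to the defaults of options_class is inferred from the usage shown earlier.

```python
from abc import ABC, abstractmethod
from typing import Any, NamedTuple, Optional


class SketchTransformation(ABC):
    """Stand-in for the base class; not delb's actual implementation."""

    options_class: Optional[type] = None

    def __init__(self, options: Optional[NamedTuple] = None):
        # Assumption: without explicit options, the defaults of options_class apply.
        if options is None and self.options_class is not None:
            options = self.options_class()
        self.options = options
        self.root: Any = None
        self.origin_document: Any = None

    def __call__(self, root: Any, origin_document: Any = None) -> Any:
        # Both attributes are assigned before transform() runs.
        self.root = root
        self.origin_document = origin_document
        self.transform()
        return self.root

    @abstractmethod
    def transform(self) -> None:
        ...


class UppercaseItems(SketchTransformation):
    """Toy transformation that modifies the list in self.root in place."""

    def transform(self) -> None:
        self.root[:] = [item.upper() for item in self.root]


tree = ["corr", "reg"]
result = UppercaseItems()(tree)
print(result)
```

A subclass only has to provide transform(); everything it needs is available through self.root, self.origin_document and self.options.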

class delb.transform.TransformationSequence(*transformations: TransformationBase | type[TransformationBase])

A transformation sequence combines any number of Transformation subclasses (provided as a class or as an instance created with options) and other TransformationSequence instances or classes.
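
How a sequence might treat classes, instances and nested sequences alike can be sketched without delb. The names SketchSequence, StripWhitespace and DropEmpty are hypothetical, the "tree" is a plain list of strings, and instantiating classes with their defaults is an assumption about the behaviour described above rather than the library's code.

```python
from typing import Any, Callable, Union

Step = Callable[..., Any]


class SketchSequence:
    """Stand-in for a transformation sequence; not delb's actual implementation."""

    def __init__(self, *transformations: Union[Step, type]):
        # Assumption: classes are instantiated with their defaults,
        # instances are used as given.
        self.steps = [t() if isinstance(t, type) else t for t in transformations]

    def __call__(self, root: Any, origin_document: Any = None) -> Any:
        # Each step receives the result of the previous one.
        for step in self.steps:
            root = step(root, origin_document)
        return root


class StripWhitespace:
    def __call__(self, root, origin_document=None):
        return [item.strip() for item in root]


class DropEmpty:
    def __call__(self, root, origin_document=None):
        return [item for item in root if item]


# A class and an instance are mixed in one sequence.
clean = SketchSequence(StripWhitespace, DropEmpty())
# Sequences nest; an empty sequence passed as a class is a no-op step.
pipeline = SketchSequence(clean, SketchSequence)
result = pipeline([" a ", "  ", "b"])
print(result)
```

Since a sequence exposes the same call signature as a single step, callers do not need to know whether they hold one transformation or a whole chain.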