API Documentation

Note

Two packages are installed with delb: delb and _delb. As the underscore indicates, the latter exposes the private parts of the API, while the former re-exposes what is deemed public from it, plus additional contents. As a rule of thumb, use the public API in applications and the private API in delb extensions. Doing so avoids circular dependencies if your extension (or other code that it depends on) uses contents from the _delb package.

Documents

class delb.Document(source, collapse_whitespace=None, parser=None, parser_options=None, klass=None, **config_options)[source]¶

This class is the entry point to obtain a representation of an XML encoded text document. Any object can be passed for instantiation, provided that a suitable loader is available for the given source. See Document loaders for the default loaders that come with this package. Plugins can alter the available loaders, see Extending delb.

Nodes can be tested for membership in a document:

>>> document = Document("<root>text</root>")
>>> text_node = document.root[0]
>>> text_node in document
True
>>> text_node.clone() in document
False

The string coercion of a document yields an XML encoded stream as string. Its appearance can be configured via DefaultStringOptions.

>>> document = Document("<root/>")
>>> str(document)
"<?xml version='1.0' encoding='UTF-8'?><root/>"

Parameters:

source – Anything that the configured loaders can make sense of to return a parsed document tree.
collapse_whitespace – Deprecated. Use the argument with the same name on the parser_options object.
parser – Deprecated.
parser_options – A delb.ParserOptions instance to configure the used parser.
klass – Explicitly define the initialized class. This can be useful for applications that have default document subclasses in use.
config_options – Additional keyword arguments for the configuration of extension classes.

Properties

`config`	Besides the used `parser` and `collapsed_whitespace` option, this property contains the namespaced data that extension classes and loaders may have stored.
`head_nodes`	A list-like accessor to the nodes that precede the document's root node.
`namespaces`	The namespace mapping of the document's `root` node.
`root`	The root node of a document tree.
`source_url`	The source URL where a loader obtained the document's contents or `None`.
`tail_nodes`	A list-like accessor to the nodes that follow the document's root node.

Uncategorized methods

`clone`()	Returns another instance with the duplicated contents.
`collapse_whitespace`()	Collapses whitespace as described here: https://wiki.tei-c.org/index.php/XML_Whitespace#Recommendations
`css_select`(expression[, namespaces])	This method proxies to the `TagNode.css_select()` method of the document's `root` node.
`merge_text_nodes`()	This method proxies to the `TagNode.merge_text_nodes()` method of the document's `root` node.
`new_tag_node`(local_name[, attributes, namespace])	This method proxies to the `TagNode.new_tag_node()` method of the document's root node.
`save`(path[, pretty, encoding, ...])	Parameter `path`: The filesystem path to the target file.
`write`(buffer[, pretty, encoding, ...])	Parameter `buffer`: A file-like object that the document is written to.
`xpath`(expression[, namespaces])	This method proxies to the `TagNode.xpath()` method of the document's `root` node.
`xslt`(transformation)	Parameter `transformation`: An `lxml.etree.XSLT` instance that shall be applied to the document.

clone() → Document[source]¶

Returns: Another instance with the duplicated contents.

collapse_whitespace()[source]¶

Collapses whitespace as described here: https://wiki.tei-c.org/index.php/XML_Whitespace#Recommendations

Implicitly merges all neighbouring text nodes.
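
The core step of whitespace collapsing can be sketched with the standard library alone. This is a simplified illustration, not delb's implementation: the helper `collapse_runs` is a hypothetical name, and the linked recommendations contain further rules that are not reproduced here.

```python
import re

# XML defines whitespace as space, tab, carriage return and line feed.
_XML_WHITESPACE_RUN = re.compile(r"[ \t\r\n]+")


def collapse_runs(text: str) -> str:
    """Replace every run of XML whitespace with a single space."""
    return _XML_WHITESPACE_RUN.sub(" ", text)


sample = "A  line\n\tbroken   by the editor."
collapsed = collapse_runs(sample)
print(collapsed)
```

The character class is spelled out on purpose: Python's `\s` also matches Unicode whitespace that XML does not treat as whitespace.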

config: SimpleNamespace¶: Besides the used parser and collapsed_whitespace option, this property contains the namespaced data that extension classes and loaders may have stored.

css_select(expression: str, namespaces: Optional[NamespaceDeclarations] = None) → QueryResults[source]¶: This method proxies to the TagNode.css_select() method of the document’s root node.

head_nodes¶: A list-like accessor to the nodes that precede the document’s root node. Note that nodes can’t be removed or replaced.

merge_text_nodes()[source]¶: This method proxies to the TagNode.merge_text_nodes() method of the document’s root node.

property namespaces: Namespaces¶: The namespace mapping of the document’s root node.

new_tag_node(local_name: str, attributes: Optional[dict[str, str]] = None, namespace: Optional[str] = None) → TagNode[source]¶: This method proxies to the TagNode.new_tag_node() method of the document’s root node.

property root: TagNode¶: The root node of a document tree.

save(path: Path, pretty: Optional[bool] = None, *, encoding: str = 'utf-8', align_attributes: bool = False, indentation: str = '', namespaces: Optional[NamespaceDeclarations] = None, newline: None | str = None, text_width: int = 0)[source]¶

Parameters:

path – The filesystem path to the target file.
pretty – Deprecated. Adds indentation for human consumers when True.
encoding – The desired text encoding.
align_attributes – Determines whether attributes’ names and values line up sharply around vertically aligned equal signs.
indentation – This string prefixes descending nodes’ contents one time per depth level. A non-empty string implies line-breaks between nodes as well.
namespaces – A mapping of prefixes to namespaces. These override possible declarations from a parsed serialization that the document instance stems from. Prefixes for undeclared namespaces are enumerated with the prefix ns.
newline – See io.TextIOWrapper for a detailed explanation of the parameter with the same name.
text_width – A positive value indicates that text nodes shall get wrapped at this character position. Indentations are not considered as part of text. This parameter is intended to define reasonable widths for text displays that can be scrolled horizontally.
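
How `indentation` and `text_width` relate can be sketched with the standard library's `textwrap` module. This is a hedged illustration only: `wrap_text_node` and its `depth` argument are hypothetical names, and delb's actual wrapping may differ in detail. The point shown is that the width applies to the text alone, so the indentation is added on top of it.

```python
import textwrap


def wrap_text_node(text: str, text_width: int, indentation: str, depth: int) -> str:
    """Wrap text at text_width, then prefix each line with the indentation."""
    prefix = indentation * depth
    if text_width <= 0:
        # In this sketch a non-positive width means that nothing is wrapped.
        return prefix + text
    lines = textwrap.wrap(text, width=text_width)
    return "\n".join(prefix + line for line in lines)


wrapped = wrap_text_node("one two three four", text_width=9, indentation="  ", depth=2)
print(wrapped)
```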

source_url: Optional[str]¶: The source URL where a loader obtained the document’s contents or None.

tail_nodes¶: A list-like accessor to the nodes that follow the document’s root node. Note that nodes can’t be removed or replaced.

write(buffer: BinaryIO, pretty: Optional[bool] = None, *, encoding: str = 'utf-8', align_attributes: bool = False, indentation: str = '', namespaces: Optional[NamespaceDeclarations] = None, newline: None | str, text_width: int = 0)[source]¶

Parameters:

buffer – A file-like object that the document is written to.
pretty – Deprecated. Adds indentation for human consumers when True.
encoding – The desired text encoding.
align_attributes – Determines whether attributes’ names and values line up sharply around vertically aligned equal signs.
indentation – This string prefixes descending nodes’ contents one time per depth level. A non-empty string implies line-breaks between nodes as well.
namespaces – A mapping of prefixes to namespaces. These override possible declarations from a parsed serialization that the document instance stems from. Prefixes for undeclared namespaces are enumerated with the prefix ns.
newline – See io.TextIOWrapper for a detailed explanation of the parameter with the same name.
text_width – A positive value indicates that text nodes shall get wrapped at this character position. Indentations are not considered as part of text. This parameter is intended to define reasonable widths for text displays that can be scrolled horizontally.

xpath(expression: str, namespaces: Optional[NamespaceDeclarations] = None) → QueryResults[source]¶: This method proxies to the TagNode.xpath() method of the document’s root node.

xslt(transformation: XSLT) → Document[source]¶

Parameters: transformation – An lxml.etree.XSLT instance that shall be applied to the document.
Returns: A new instance with the transformation’s result.

Document loaders

To change which loaders are available or the order in which they are attempted, modify the delb.plugins.plugin_manager.plugins.loaders object, which is a list. Its state applies to your whole application. Please refer to this issue when you require finer control over these aspects.
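
Why the order of that list matters can be shown with a small stand-in that uses no delb code. Everything here is hypothetical: the loader functions, the `load` helper, and the convention that a loader returns a string to explain why it declined a source are assumptions made for this sketch.

```python
from pathlib import Path
from typing import Any, Callable, List, Union

# Assumption for this sketch: a loader returns a result, or a str that
# explains why it could not handle the source.
Loader = Callable[[Any], Union[dict, str]]


def path_loader(data: Any) -> Union[dict, str]:
    if isinstance(data, Path):
        return {"loaded_by": "path_loader", "source": str(data)}
    return "The input is not a pathlib.Path."


def text_loader(data: Any) -> Union[dict, str]:
    if isinstance(data, str):
        return {"loaded_by": "text_loader", "source": data}
    return "The input is not a string."


def fallback_loader(data: Any) -> Union[dict, str]:
    # Accepts anything, so its position in the list decides a lot.
    return {"loaded_by": "fallback_loader", "source": repr(data)}


def load(source: Any, loaders: List[Loader]) -> dict:
    """Return the result of the first loader that accepts the source."""
    failures = []
    for loader in loaders:
        result = loader(source)
        if not isinstance(result, str):
            return result
        failures.append(f"{loader.__name__}: {result}")
    raise ValueError("No loader succeeded. " + " ".join(failures))


loaders = [path_loader, text_loader, fallback_loader]
first = load("<root/>", loaders)["loaded_by"]

# Moving the catch-all loader to the front shadows every other loader.
loaders.insert(0, loaders.pop())
second = load("<root/>", loaders)["loaded_by"]
print(first, second)
```

Because the list is shared state, a reordering like the one above would affect every later call in the same process.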

Core

The core_loaders module provides a set of loaders to retrieve documents from various data sources.

_delb.plugins.core_loaders.buffer_loader(data: Any, config: SimpleNamespace) → LoaderResult[source]¶: This loader loads a document from a file-like object.

_delb.plugins.core_loaders.etree_loader(data: Any, config: SimpleNamespace) → LoaderResult[source]¶: This loader processes lxml.etree._Element and lxml.etree._ElementTree instances.

_delb.plugins.core_loaders.ftp_loader(data: Any, config: SimpleNamespace) → LoaderResult[source]¶: Loads a document from a URL with the ftp scheme. The URL will be bound to source_url on the document’s Document.config attribute.

_delb.plugins.core_loaders.path_loader(data: Any, config: SimpleNamespace) → LoaderResult[source]¶: This loader loads from a file that is pointed at with a pathlib.Path instance. That instance will be bound to source_path on the document’s Document.config attribute.

_delb.plugins.core_loaders.tag_node_loader(data: Any, config: SimpleNamespace) → LoaderResult[source]¶: This loader loads, or rather clones, a delb.TagNode instance and its descendant nodes.

_delb.plugins.core_loaders.text_loader(data: Any, config: SimpleNamespace) → LoaderResult[source]¶: Parses a string containing a full document.

Extra

If delb is installed with https-loader as extra, the required dependencies for this loader are installed as well. See Installation.

_delb.plugins.https_loader.https_loader(data: Any, config: SimpleNamespace, client: httpx.Client = <httpx.Client object>) → LoaderResult[source]¶

This loader loads a document from a URL with the http or https scheme. The default httpx-client follows redirects and can partially be configured with environment variables. The URL will be bound to the name source_url on the document’s Document.config attribute.

Loaders with specifically configured httpx-clients can build on this loader like so:

import httpx
from _delb.plugins import plugin_manager
from _delb.plugins.https_loader import https_loader


client = httpx.Client(follow_redirects=False, trust_env=False)

@plugin_manager.register_loader(before=https_loader)
def custom_https_loader(data, config):
    return https_loader(data, config, client=client)

Parser options

class delb.ParserOptions(collapse_whitespace: bool = False, remove_comments: bool = False, remove_processing_instructions: bool = False, resolve_entities: bool = True, unplugged: bool = False)[source]¶

The configuration options that define an XML parser’s behaviour.

Parameters:

collapse_whitespace – Collapse the content's whitespace.
remove_comments – Ignore comments.
remove_processing_instructions – Don’t include processing instructions in the parsed tree.
resolve_entities – Resolve entities.
unplugged – Don’t load referenced resources over network.

Nodes

Comment

class delb.CommentNode(etree_element: _Element)[source]¶

The instances of this class represent comment nodes of a tree.

To instantiate new nodes use new_comment_node().

Properties

`content`	The comment's text.
`depth`	The depth (or level) of the node in its tree.
`document`	The `Document` instance that the node is associated with or `None`.
`first_child`
`full_text`	The concatenated contents of all text node descendants in document order.
`index`	The node's index within the parent's collection of child nodes or `None` when the node has no parent.
`last_child`
`last_descendant`
`namespaces`	The prefix to namespace mapping of the node.
`parent`	The node's parent or `None`.

Fetching a single relative node

`fetch_following`(*filter)	Parameter `filter`: Any number of filters.
`fetch_following_sibling`(*filter)	Parameter `filter`: Any number of filters.
`fetch_preceding`(*filter)	Parameter `filter`: Any number of filters.
`fetch_preceding_sibling`(*filter)	Parameter `filter`: Any number of filters.

Iterating over relative nodes

`iterate_ancestors`(*filter)	Parameter `filter`: Any number of filters that a node must match to be yielded.
`iterate_children`(*filter[, recurse])	A generator iterator that yields nothing.
`iterate_descendants`(*filter)	Parameter `filter`: Any number of filters that a node must match to be yielded.
`iterate_following`(*filter)	Parameter `filter`: Any number of filters that a node must match to be yielded.
`iterate_following_siblings`(*filter)	Parameter `filter`: Any number of filters that a node must match to be yielded.
`iterate_preceding`(*filter)	Parameter `filter`: Any number of filters that a node must match to be yielded.
`iterate_preceding_siblings`(*filter)	Parameter `filter`: Any number of filters that a node must match to be yielded.

Querying nodes

xpath(expression[, namespaces])

See Queries with XPath & CSS for details on the extent of the XPath implementation.

Adding nodes

`add_following_siblings`(*node[, clone])	Adds one or more nodes to the right of the node this method is called on.
`add_preceding_siblings`(*node[, clone])	Adds one or more nodes to the left of the node this method is called on.

Removing a node from its tree

`detach`([retain_child_nodes])	Removes the node from its tree.
`replace_with`(node[, clone])	Removes the node and places the given one in its tree location.

Uncategorized methods

clone([deep, quick_and_unsafe])

Parameter deep: Clones the whole subtree if True.

new_tag_node(local_name[, attributes, ...])

Creates a new TagNode instance in the node's context.

serialize(*[, align_attributes, ...])

Returns a string that contains the serialization of the node.

add_following_siblings(*node: NodeSource, clone: bool = False)¶

Adds one or more nodes to the right of the node this method is called on.

The nodes can be concrete instances of any node type or more abstract descriptions in the form of strings or objects returned from the tag() function, from which TextNode and TagNode instances are derived respectively.

Parameters:

node – The node(s) to be added.
clone – Clones the concrete nodes before adding if True.

add_preceding_siblings(*node: NodeSource, clone: bool = False)¶

Adds one or more nodes to the left of the node this method is called on.

The nodes can be concrete instances of any node type or more abstract descriptions in the form of strings or objects returned from the tag() function, from which TextNode and TagNode instances are derived respectively.

Parameters:

node – The node(s) to be added.
clone – Clones the concrete nodes before adding if True.

clone(deep: bool = False, quick_and_unsafe: bool = False) → _ElementWrappingNode¶

Parameters:

deep – Clones the whole subtree if True.
quick_and_unsafe – Creates a deep clone in a quicker manner where text nodes may get lost. It should be safe with trees that don’t contain subsequent text nodes, e.g. freshly parsed, unaltered documents or after TagNode.merge_text_nodes() has been applied.

Returns:

A copy of the node.

property content: str¶: The comment’s text.

property depth: int¶: The depth (or level) of the node in its tree.

detach(retain_child_nodes: bool = False) → _ElementWrappingNode¶

Removes the node from its tree.

Parameters: retain_child_nodes – Keeps the node’s descendants in the originating tree if True.
Returns: The removed node.

property document: Optional[Document]¶: The Document instance that the node is associated with or None.

fetch_following(*filter: _delb.typing.Filter) → Optional[NodeBase]¶

Parameters: filter – Any number of filters.
Returns: The next node in document order that matches all filters or None.

fetch_following_sibling(*filter: _delb.typing.Filter) → Optional[NodeBase]¶

Parameters: filter – Any number of filters.
Returns: The next sibling to the right that matches all filters or None.

fetch_preceding(*filter: _delb.typing.Filter) → Optional[NodeBase]¶

Parameters: filter – Any number of filters.
Returns: The previous node in document order that matches all filters or None.

fetch_preceding_sibling(*filter: _delb.typing.Filter) → Optional[NodeBase]¶

Parameters: filter – Any number of filters.
Returns: The next sibling to the left that matches all filters or None.
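
The "matches all filters or None" contract shared by the fetch methods can be sketched generically. The sketch assumes that a filter is a callable that takes a node and returns a boolean; `fetch_first` is a hypothetical helper that works on any iterable and is not part of delb.

```python
from typing import Any, Callable, Iterable, Optional

# Assumption: a filter is a predicate over a single candidate.
Filter = Callable[[Any], bool]


def fetch_first(candidates: Iterable[Any], *filters: Filter) -> Optional[Any]:
    """Return the first candidate that satisfies every filter, else None."""
    for candidate in candidates:
        if all(f(candidate) for f in filters):
            return candidate
    return None


names = ["a", "bb", "cc", "ddd"]
match = fetch_first(names, lambda n: len(n) > 1, lambda n: n.startswith("c"))
unfiltered = fetch_first(names)
missing = fetch_first(names, lambda n: n.startswith("x"))
print(match, unfiltered, missing)
```

With no filters given, `all()` over an empty sequence is true, so the first candidate is returned.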

first_child = None¶

property full_text: str¶: The concatenated contents of all text node descendants in document order.

property index: Optional[int]¶: The node’s index within the parent’s collection of child nodes or None when the node has no parent.

iterate_ancestors(*filter: Filter) → Iterator[TagNode]¶

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the ancestor nodes from bottom to top.
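
"From bottom to top" means that the closest ancestor comes first. A minimal sketch with a hypothetical `Node` class that only knows its parent (not delb's node types) shows the traversal and how filters narrow the yielded nodes:

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Optional


@dataclass
class Node:
    """A hypothetical node that only stores a name and a parent reference."""

    name: str
    parent: Optional["Node"] = None


def iterate_ancestors(node: Node, *filters: Callable[[Node], bool]) -> Iterator[Node]:
    """Yield the ancestors that match all filters, nearest ancestor first."""
    current = node.parent
    while current is not None:
        if all(f(current) for f in filters):
            yield current
        current = current.parent


root = Node("root")
section = Node("section", parent=root)
paragraph = Node("p", parent=section)

names = [n.name for n in iterate_ancestors(paragraph)]
filtered = [n.name for n in iterate_ancestors(paragraph, lambda n: n.name != "section")]
print(names, filtered)
```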

iterate_children(*filter: Filter, recurse: bool = False) → Iterator[NodeBase]¶

A generator iterator that yields nothing.

iterate_descendants(*filter: Filter) → Iterator[NodeBase]¶

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the descending nodes of the node.

iterate_following(*filter: Filter) → Iterator[NodeBase]¶

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the following nodes in document order.

iterate_following_siblings(*filter: Filter) → Iterator[NodeBase]¶

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the siblings to the node’s right.

iterate_preceding(*filter: Filter) → Iterator[NodeBase]¶

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the previous nodes in document order.

iterate_preceding_siblings(*filter: Filter) → Iterator[NodeBase]¶

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the siblings to the node’s left.

last_child = None¶

last_descendant = None¶

property namespaces: Namespaces¶: The prefix to namespace mapping of the node.

new_tag_node(local_name: str, attributes: Optional[dict[str, str]] = None, namespace: Optional[str] = None, children: Sequence[str | NodeBase | _TagDefinition] = ()) → TagNode¶

Creates a new TagNode instance in the node’s context.

Parameters:

local_name – The tag name.
attributes – Optional attributes that are assigned to the new node.
namespace – An optional tag namespace. If none is provided, the context node’s namespace is inherited.
children – An optional sequence of objects that will be appended as child nodes. This can be existing nodes, strings that will be inserted as text nodes and in-place definitions of TagNode instances from tag(). The latter will be assigned to the same namespace.

Returns:

The newly created tag node.

property parent: Optional[TagNode]¶: The node’s parent or None.

replace_with(node: NodeSource, clone: bool = False) → NodeBase¶

Removes the node and places the given one in its tree location.

The node can be a concrete instance of any node type or a more abstract description in the form of a string or an object returned from the tag() function, from which a TextNode or a TagNode instance is derived respectively.

Parameters:

node – The replacing node.
clone – A concrete, replacing node is cloned if True.

Returns:

The removed node.

serialize(*, align_attributes: bool = False, indentation: str = '', namespaces: Optional[NamespaceDeclarations] = None, newline: Optional[str] = None, text_width: int = 0)¶

Returns a string that contains the serialization of the node.

Parameters:

align_attributes – Determines whether attributes’ names and values line up sharply around vertically aligned equal signs.
indentation – This string prefixes descending nodes’ contents one time per depth level. A non-empty string implies line-breaks between nodes as well.
namespaces – A mapping of prefixes to namespaces. These override possible declarations from a parsed serialization that the document instance stems from. Prefixes for undeclared namespaces are enumerated with the prefix ns.
newline – See io.TextIOWrapper for a detailed explanation of the parameter with the same name.
text_width – A positive value indicates that text nodes shall get wrapped at this character position. Indentations are not considered as part of text. This parameter is intended to define reasonable widths for text displays that can be scrolled horizontally.

xpath(expression: str, namespaces: Optional[NamespaceDeclarations] = None) → QueryResults¶

See Queries with XPath & CSS for details on the extent of the XPath implementation.

Parameters:

expression – A supported XPath 1.0 expression that contains one or more location paths.
namespaces – A mapping of prefixes that are used in the expression to namespaces. The declarations that were used in a document’s source serialization serve as fallback.

Returns:

All nodes that match the evaluation of the provided XPath expression.
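
The role of a prefix-to-namespace mapping in a query can be tried out with the standard library's `xml.etree.ElementTree`, used here as a stand-in. It supports only a limited XPath subset and is not delb's implementation; in particular, the fallback to the source's declarations described above is not part of this sketch.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
root = ET.fromstring(
    f'<text xmlns="{TEI}"><body><p>one</p><p>two</p></body></text>'
)

# The prefix "t" exists only in this mapping, not in the document itself.
namespaces = {"t": TEI}
paragraphs = root.findall(".//t:p", namespaces)
texts = [p.text for p in paragraphs]

# Without a prefix the name test refers to elements in no namespace.
no_match = root.findall(".//p")
print(texts, no_match)
```

The prefix used in an expression is independent of the one used in the serialized document; only the namespace it maps to has to agree.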

Processing instruction

class delb.ProcessingInstructionNode(etree_element: _Element)[source]¶

The instances of this class represent processing instruction nodes of a tree.

To instantiate new nodes use new_processing_instruction_node().

Properties

`content`	The processing instruction's text.
`depth`	The depth (or level) of the node in its tree.
`document`	The `Document` instance that the node is associated with or `None`.
`first_child`
`full_text`	The concatenated contents of all text node descendants in document order.
`index`	The node's index within the parent's collection of child nodes or `None` when the node has no parent.
`last_child`
`last_descendant`
`namespaces`	The prefix to namespace mapping of the node.
`parent`	The node's parent or `None`.
`target`	The processing instruction's target.

Fetching a single relative node

`fetch_following`(*filter)	Parameter `filter`: Any number of filters.
`fetch_following_sibling`(*filter)	Parameter `filter`: Any number of filters.
`fetch_preceding`(*filter)	Parameter `filter`: Any number of filters.
`fetch_preceding_sibling`(*filter)	Parameter `filter`: Any number of filters.

Iterating over relative nodes

`iterate_ancestors`(*filter)	Parameter `filter`: Any number of filters that a node must match to be yielded.
`iterate_children`(*filter[, recurse])	A generator iterator that yields nothing.
`iterate_descendants`(*filter)	Parameter `filter`: Any number of filters that a node must match to be yielded.
`iterate_following`(*filter)	Parameter `filter`: Any number of filters that a node must match to be yielded.
`iterate_following_siblings`(*filter)	Parameter `filter`: Any number of filters that a node must match to be yielded.
`iterate_preceding`(*filter)	Parameter `filter`: Any number of filters that a node must match to be yielded.
`iterate_preceding_siblings`(*filter)	Parameter `filter`: Any number of filters that a node must match to be yielded.

Querying nodes

xpath(expression[, namespaces])

See Queries with XPath & CSS for details on the extent of the XPath implementation.

Adding nodes

`add_following_siblings`(*node[, clone])	Adds one or more nodes to the right of the node this method is called on.
`add_preceding_siblings`(*node[, clone])	Adds one or more nodes to the left of the node this method is called on.

Removing a node from its tree

`detach`([retain_child_nodes])	Removes the node from its tree.
`replace_with`(node[, clone])	Removes the node and places the given one in its tree location.

Uncategorized methods

clone([deep, quick_and_unsafe])

Parameter deep: Clones the whole subtree if True.

new_tag_node(local_name[, attributes, ...])

Creates a new TagNode instance in the node's context.

serialize(*[, align_attributes, ...])

Returns a string that contains the serialization of the node.

add_following_siblings(*node: NodeSource, clone: bool = False)¶

Adds one or more nodes to the right of the node this method is called on.

The nodes can be concrete instances of any node type or more abstract descriptions in the form of strings or objects returned from the tag() function, from which TextNode and TagNode instances are derived respectively.

Parameters:

node – The node(s) to be added.
clone – Clones the concrete nodes before adding if True.

add_preceding_siblings(*node: NodeSource, clone: bool = False)¶

Adds one or more nodes to the left of the node this method is called on.

The nodes can be concrete instances of any node type or more abstract descriptions in the form of strings or objects returned from the tag() function, from which TextNode and TagNode instances are derived respectively.

Parameters:

node – The node(s) to be added.
clone – Clones the concrete nodes before adding if True.

clone(deep: bool = False, quick_and_unsafe: bool = False) → _ElementWrappingNode¶

Parameters:

deep – Clones the whole subtree if True.
quick_and_unsafe – Creates a deep clone in a quicker manner where text nodes may get lost. It should be safe with trees that don’t contain subsequent text nodes, e.g. freshly parsed, unaltered documents or after TagNode.merge_text_nodes() has been applied.

Returns:

A copy of the node.

property content: str¶: The processing instruction’s text.

property depth: int¶: The depth (or level) of the node in its tree.

detach(retain_child_nodes: bool = False) → _ElementWrappingNode¶

Removes the node from its tree.

Parameters: retain_child_nodes – Keeps the node’s descendants in the originating tree if True.
Returns: The removed node.

property document: Optional[Document]¶: The Document instance that the node is associated with or None.

fetch_following(*filter: _delb.typing.Filter) → Optional[NodeBase]¶

Parameters: filter – Any number of filters.
Returns: The next node in document order that matches all filters or None.

fetch_following_sibling(*filter: _delb.typing.Filter) → Optional[NodeBase]¶

Parameters: filter – Any number of filters.
Returns: The next sibling to the right that matches all filters or None.

fetch_preceding(*filter: _delb.typing.Filter) → Optional[NodeBase]¶

Parameters: filter – Any number of filters.
Returns: The previous node in document order that matches all filters or None.

fetch_preceding_sibling(*filter: _delb.typing.Filter) → Optional[NodeBase]¶

Parameters: filter – Any number of filters.
Returns: The next sibling to the left that matches all filters or None.

first_child = None¶

property full_text: str¶: The concatenated contents of all text node descendants in document order.

property index: Optional[int]¶: The node’s index within the parent’s collection of child nodes or None when the node has no parent.

iterate_ancestors(*filter: Filter) → Iterator[TagNode]¶

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the ancestor nodes from bottom to top.

iterate_children(*filter: Filter, recurse: bool = False) → Iterator[NodeBase]¶

A generator iterator that yields nothing.

iterate_descendants(*filter: Filter) → Iterator[NodeBase]¶

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the descending nodes of the node.

iterate_following(*filter: Filter) → Iterator[NodeBase]¶

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the following nodes in document order.

iterate_following_siblings(*filter: Filter) → Iterator[NodeBase]¶

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the siblings to the node’s right.

iterate_preceding(*filter: Filter) → Iterator[NodeBase]¶

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the previous nodes in document order.

iterate_preceding_siblings(*filter: Filter) → Iterator[NodeBase]¶

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the siblings to the node’s left.

last_child = None¶

last_descendant = None¶

property namespaces: Namespaces¶: The prefix to namespace mapping of the node.

new_tag_node(local_name: str, attributes: Optional[dict[str, str]] = None, namespace: Optional[str] = None, children: Sequence[str | NodeBase | _TagDefinition] = ()) → TagNode¶

Creates a new TagNode instance in the node’s context.

Parameters:

local_name – The tag name.
attributes – Optional attributes that are assigned to the new node.
namespace – An optional tag namespace. If none is provided, the context node’s namespace is inherited.
children – An optional sequence of objects that will be appended as child nodes. This can be existing nodes, strings that will be inserted as text nodes and in-place definitions of TagNode instances from tag(). The latter will be assigned to the same namespace.

Returns:

The newly created tag node.

property parent: Optional[TagNode]¶: The node’s parent or None.

replace_with(node: NodeSource, clone: bool = False) → NodeBase¶

Removes the node and places the given one in its tree location.

The node can be a concrete instance of any node type or a more abstract description in the form of a string or an object returned from the tag() function, from which a TextNode or a TagNode instance is derived respectively.

Parameters:

node – The replacing node.
clone – A concrete, replacing node is cloned if True.

Returns:

The removed node.

serialize(*, align_attributes: bool = False, indentation: str = '', namespaces: Optional[NamespaceDeclarations] = None, newline: Optional[str] = None, text_width: int = 0)¶

Returns a string that contains the serialization of the node.

Parameters:

align_attributes – Determines whether attributes’ names and values line up sharply around vertically aligned equal signs.
indentation – This string prefixes descending nodes’ contents one time per depth level. A non-empty string implies line-breaks between nodes as well.
namespaces – A mapping of prefixes to namespaces. These override possible declarations from a parsed serialization that the document instance stems from. Prefixes for undeclared namespaces are enumerated with the prefix ns.
newline – See io.TextIOWrapper for a detailed explanation of the parameter with the same name.
text_width – A positive value indicates that text nodes shall get wrapped at this character position. Indentations are not considered as part of text. This parameter is intended to define reasonable widths for text displays that can be scrolled horizontally.

property target: str¶: The processing instruction’s target.

xpath(expression: str, namespaces: Optional[NamespaceDeclarations] = None) → QueryResults¶

See Queries with XPath & CSS for details on the extent of the XPath implementation.

Parameters:

expression – A supported XPath 1.0 expression that contains one or more location paths.
namespaces – A mapping of prefixes that are used in the expression to namespaces. The declarations that were used in a document’s source serialization serve as fallback.

Returns:

All nodes that match the evaluation of the provided XPath expression.

Tag

class delb.TagNode(etree_element: _Element)[source]¶

The instances of this class represent tag nodes of a tree, the equivalent of DOM’s elements.

To instantiate new nodes use Document.new_tag_node, TagNode.new_tag_node, TextNode.new_tag_node or new_tag_node().

Some syntactic sugar is baked in:

Attributes and nodes can be tested for membership in a node.

>>> root = Document('<root ham="spam"><child/></root>').root
>>> child = root.first_child
>>> "ham" in root
True
>>> child in root
True

Nodes can be copied. Note that this relies on TagNode.clone().

>>> from copy import copy, deepcopy
>>> root = Document("<root>Content</root>").root
>>> print(copy(root))
<root/>
>>> print(deepcopy(root))
<root>Content</root>
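
How the `copy` module can be made to rely on a `clone()` method is sketched below with a hypothetical `SketchNode` class. It is not delb's TagNode; it only mirrors the behaviour seen above, where a shallow copy drops the contents and a deep copy keeps them.

```python
from copy import copy, deepcopy
from typing import List, Optional


class SketchNode:
    """A hypothetical node type that routes copying through clone()."""

    def __init__(self, name: str, children: Optional[List["SketchNode"]] = None) -> None:
        self.name = name
        self.children = list(children or [])

    def clone(self, deep: bool = False) -> "SketchNode":
        # A shallow clone has no children, a deep one clones the subtree.
        children = [child.clone(deep=True) for child in self.children] if deep else []
        return SketchNode(self.name, children)

    def __copy__(self) -> "SketchNode":
        return self.clone(deep=False)

    def __deepcopy__(self, memo: dict) -> "SketchNode":
        return self.clone(deep=True)


root = SketchNode("root", [SketchNode("child")])
shallow = copy(root)
deep = deepcopy(root)
print(len(shallow.children), len(deep.children))
```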

Nodes can be tested for equality regarding their qualified name and attributes.

>>> root = Document('<root><foo x="0"/><foo x="0"/><bar x="0"/></root>').root
>>> root[0] == root[1]
True
>>> root[0] == root[2]
False

Attribute values and child nodes can be obtained with the subscript notation.

>>> root = Document('<root x="0"><child_1/>child_2<child_3/></root>').root
>>> root["x"]
'0'
>>> print(root[0])
<child_1/>
>>> print(root[-1])
<child_3/>
>>> print([str(x) for x in root[1::-1]])
['child_2', '<child_1/>']

How many child nodes does this node have anyway?

>>> root = Document("<root><child_1/><child_2/></root>").root
>>> len(root)
2
>>> len(root[0])
0

As seen in the examples above, a tag node’s string representation yields a serialized XML representation of a (sub-)tree.
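The remark that copying relies on TagNode.clone() can be pictured with a stand-in class. This is only a sketch: SketchNode and its fields are hypothetical and not delb's implementation. It merely shows how copy() and deepcopy() may delegate to a clone method.

```python
from copy import copy, deepcopy


class SketchNode:
    """Hypothetical stand-in for a tag node; not delb's TagNode."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def clone(self, deep=False):
        # A shallow clone keeps only the node itself, a deep clone
        # also clones the whole subtree.
        children = [c.clone(deep=True) for c in self.children] if deep else []
        return SketchNode(self.name, children)

    def __copy__(self):
        # copy.copy() ends up here.
        return self.clone(deep=False)

    def __deepcopy__(self, memo):
        # copy.deepcopy() ends up here; memo is unused in this sketch.
        return self.clone(deep=True)


root = SketchNode("root", [SketchNode("child")])
shallow = copy(root)
deep = deepcopy(root)
```

The shallow copy comes back without children, which mirrors the `<root/>` output of the copy() example, while the deep copy carries clones of the whole subtree.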

Properties

`attributes`	A mapping that can be used to query and alter the node's attributes.
`depth`	The depth (or level) of the node in its tree.
`document`	The `Document` instance that the node is associated with or `None`.
`first_child`	The node's first child node.
`full_text`	The concatenated contents of all text node descendants in document order.
`id`	This is a shortcut to retrieve and set the `id` attribute in the XML namespace.
`index`	The node's index within the parent's collection of child nodes or `None` when the node has no parent.
`last_child`	The node's last child node.
`last_descendant`	The node's last descendant.
`local_name`	The node's name.
`location_path`	An unambiguous XPath location path that points to this node from its tree root.
`namespace`	The node's namespace.
`namespaces`	The prefix to namespace mapping of the node.
`parent`	The node's parent or `None`.
`prefix`	The prefix that the node's namespace is currently mapped to.
`universal_name`	The node's qualified name in Clark notation.

Fetching a single relative node

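`fetch_following`(*filter)	Returns the next node in document order that matches all filters or `None`.
`fetch_following_sibling`(*filter)	Returns the next sibling to the right that matches all filters or `None`.
`fetch_preceding`(*filter)	Returns the previous node in document order that matches all filters or `None`.
`fetch_preceding_sibling`(*filter)	Returns the next sibling to the left that matches all filters or `None`.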

Iterating over relative nodes

`iterate_ancestors`(*filter)	Yields the ancestor nodes that match all filters, from bottom to top.
`iterate_children`(*filter[, recurse])	Yields the child nodes that match all filters.
`iterate_descendants`(*filter)	Yields the descendants that match all filters.
`iterate_following`(*filter)	Yields the following nodes that match all filters, in document order.
`iterate_following_siblings`(*filter)	Yields the siblings to the node's right that match all filters.
`iterate_preceding`(*filter)	Yields the previous nodes that match all filters, in document order.
`iterate_preceding_siblings`(*filter)	Yields the siblings to the node's left that match all filters.

Querying nodes

`css_select`(expression[, namespaces])	See Queries with XPath & CSS regarding the extent of the supported grammar.
`fetch_or_create_by_xpath`(expression[, ...])	Fetches a single node that is locatable by the provided XPath expression.
`xpath`(expression[, namespaces])	See Queries with XPath & CSS for details on the extent of the XPath implementation.

Adding nodes

`add_following_siblings`(*node[, clone])	Adds one or more nodes to the right of the node this method is called on.
`add_preceding_siblings`(*node[, clone])	Adds one or more nodes to the left of the node this method is called on.
`append_children`(*node[, clone])	Adds one or more nodes as child nodes after any existing child nodes of the node this method is called on.
`insert_children`(index, *node[, clone])	Inserts one or more child nodes.
`prepend_children`(*node[, clone])	Adds one or more nodes as child nodes before any existing child nodes of the node this method is called on.

Removing a node from its tree

`detach`([retain_child_nodes])	Removes the node from its tree.
`replace_with`(node[, clone])	Removes the node and places the given one in its tree location.

Uncategorized methods

`clone`([deep, quick_and_unsafe])	Returns a copy of the node.
`merge_text_nodes`()	Merges all consecutive text nodes in the subtree into one.
`new_tag_node`(local_name[, attributes, ...])	Creates a new `TagNode` instance in the node's context.
`parse`(text[, parser, parser_options, ...])	Parses the given string or bytes sequence into a new tree.
`serialize`(*[, align_attributes, ...])	Returns a string that contains the serialization of the node.

add_following_siblings(*node: NodeSource, clone: bool = False)¶

Adds one or more nodes to the right of the node this method is called on.

The nodes can be concrete instances of any node type or abstract descriptions: strings, from which TextNode instances are derived, or objects returned by the tag() function, from which TagNode instances are derived.

Parameters:

node – The node(s) to be added.
clone – Clones the concrete nodes before adding if True.

add_preceding_siblings(*node: NodeSource, clone: bool = False)¶

Adds one or more nodes to the left of the node this method is called on.

The nodes can be concrete instances of any node type or abstract descriptions: strings, from which TextNode instances are derived, or objects returned by the tag() function, from which TagNode instances are derived.

Parameters:

node – The node(s) to be added.
clone – Clones the concrete nodes before adding if True.

append_children(*node: NodeSource, clone: bool = False)[source]¶

Adds one or more nodes as child nodes after any existing child nodes of the node this method is called on.

The nodes can be concrete instances of any node type or abstract descriptions: strings, from which TextNode instances are derived, or objects returned by the tag() function, from which TagNode instances are derived.

Parameters:

node – The node(s) to be added.
clone – Clones the concrete nodes before adding if True.

property attributes: TagAttributes¶

A mapping that can be used to query and alter the node’s attributes.

>>> node = new_tag_node("node", attributes={"foo": "0", "bar": "0"})
>>> node.attributes
{'foo': '0', 'bar': '0'}
>>> node.attributes.pop("bar")
'0'
>>> node.attributes["foo"] = "1"
>>> node.attributes["peng"] = "1"
>>> print(node)
<node foo="1" peng="1"/>
>>> node.attributes.update({"foo": "2", "zong": "2"})
>>> print(node)
<node foo="2" peng="1" zong="2"/>

Namespaced attributes can be accessed by using Python’s slice notation. A default namespace can optionally be given explicitly, but the attribute is also found without it.

>>> node = new_tag_node("node", {})
>>> node.attributes["http://namespace":"foo"] = "0"
>>> print(node)
<node xmlns:ns0="http://namespace" ns0:foo="0"/>
>>> node = Document('<node xmlns="default" foo="0"/>').root
>>> node.attributes["default":"foo"] is node.attributes["foo"]
True

Attributes behave like strings, but also expose namespace, local name and value for manipulation.

>>> node = new_tag_node("node")
>>> node.attributes["foo"] = "0"
>>> node.attributes["foo"].local_name = "bar"
>>> node.attributes["bar"].namespace = "http://namespace"
>>> node.attributes["http://namespace":"bar"].value = "1"
>>> print(node)
<node xmlns:ns0="http://namespace" ns0:bar="1"/>

Unlike with typical Python mappings, requesting a non-existing attribute doesn’t raise a KeyError; None is returned instead.
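How a mapping can accept the slice notation shown above may be sketched as follows. SketchAttributes is a hypothetical stand-in, not delb's TagAttributes, and it leaves out the resolution of default namespaces. It only illustrates that Python hands a slice object to the item methods and that a lookup can answer with None instead of raising.

```python
class SketchAttributes:
    """Hypothetical mapping keyed by (namespace, local name) pairs."""

    def __init__(self):
        self._data = {}

    @staticmethod
    def _key(item):
        # attributes["http://namespace":"foo"] arrives as
        # slice("http://namespace", "foo", None).
        if isinstance(item, slice):
            return (item.start, item.stop)
        # A plain string carries no namespace in this sketch.
        return (None, item)

    def __setitem__(self, item, value):
        self._data[self._key(item)] = value

    def __getitem__(self, item):
        # Missing attributes yield None rather than raising a KeyError.
        return self._data.get(self._key(item))


attributes = SketchAttributes()
attributes["http://namespace":"foo"] = "0"
attributes["bar"] = "1"
```

With this in place, `attributes["missing"]` evaluates to None, and the namespaced entry is only found when the namespace is part of the key.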

clone(deep: bool = False, quick_and_unsafe: bool = False) → TagNode[source]¶

Parameters:

deep – Clones the whole subtree if True.
quick_and_unsafe – Creates a deep clone in a quicker manner where text nodes may get lost. It should be safe with trees that don’t contain subsequent text nodes, e.g. freshly parsed, unaltered documents or after TagNode.merge_text_nodes() has been applied.

Returns:

A copy of the node.

css_select(expression: str, namespaces: Optional[NamespaceDeclarations] = None) → QueryResults[source]¶

See Queries with XPath & CSS regarding the extent of the supported grammar.

Namespace prefixes are delimited with a | before a name test. For example, div svg|metadata selects all nodes that are named metadata, whose namespace is mapped to the svg prefix, and that are descendants of nodes named div which belong to the default namespace or have no namespace.

Parameters:

expression – A CSS selector expression.
namespaces – A mapping of prefixes that are used in the expression to namespaces. If omitted, the node’s definition is used.

Returns:

All nodes that match the evaluation of the provided CSS selector expression.

property depth: int¶: The depth (or level) of the node in its tree.

detach(retain_child_nodes: bool = False) → _ElementWrappingNode[source]¶

Removes the node from its tree.

Parameters: retain_child_nodes – Keeps the node’s descendants in the originating tree if True.
Returns: The removed node.

property document: Optional[Document]¶: The Document instance that the node is associated with or None.

fetch_following(*filter: _delb.typing.Filter) → Optional[NodeBase]¶

Parameters: filter – Any number of filters.
Returns: The next node in document order that matches all filters or None.
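The phrase "matches all filters" can be illustrated with a small sketch. It assumes that a filter is a callable that takes a node and returns a boolean. The helper fetch_first and both example filters are hypothetical names, and plain Python objects stand in for nodes.

```python
from typing import Callable, Iterable, Optional

# Assumption: a filter is a callable that takes a node and returns a bool.
Filter = Callable[[object], bool]


def fetch_first(candidates: Iterable[object], *filters: Filter) -> Optional[object]:
    """Returns the first candidate that matches all filters, or None."""
    for candidate in candidates:
        if all(f(candidate) for f in filters):
            return candidate
    return None


def is_string(node: object) -> bool:
    return isinstance(node, str)


def is_not_blank(node: object) -> bool:
    return bool(str(node).strip())


# Plain Python objects stand in for nodes in document order.
following = [1, "  ", "text", "more"]
```

Calling `fetch_first(following, is_string, is_not_blank)` skips the integer and the blank string. Without any filter the very first candidate is returned.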

fetch_following_sibling(*filter: _delb.typing.Filter) → Optional[NodeBase]¶

Parameters: filter – Any number of filters.
Returns: The next sibling to the right that matches all filters or None.

fetch_or_create_by_xpath(expression: str, namespaces: Optional[NamespaceDeclarations] = None) → TagNode[source]¶

Fetches a single node that is locatable by the provided XPath expression. If the node doesn’t exist, the missing branch is created. The expression must follow these rules:

All location steps must use the child axis.
Each step needs to provide a name test.
Attribute comparisons against literals are the only allowed predicates.
Multiple attribute comparisons must be joined with the and operator and/or be contained in more than one predicate expression.
The logical validity of multiple attribute comparisons isn’t checked. For example, one could provide foo[@p="her"][@p="him"], but the resulting behaviour is undefined.

>>> root = Document("<root/>").root
>>> grandchild = root.fetch_or_create_by_xpath(
...     "child[@a='b']/grandchild"
... )
>>> grandchild is root.fetch_or_create_by_xpath(
...     "child[@a='b']/grandchild"
... )
True
>>> str(root)
'<root><child a="b"><grandchild/></child></root>'

Parameters:

expression – An XPath expression that can unambiguously locate a descendant node in a tree of any state.
namespaces – An optional mapping of prefixes to namespaces. The declarations that were used in a document’s source serialization serve as fallback.

Returns:

The existing or freshly created node described by expression.
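The rules for such expressions can be made tangible with a sketch that is built on the standard library's ElementTree rather than on delb. The function fetch_or_create and both regular expressions are hypothetical, namespaces are ignored, and attribute values that contain a slash are not handled. It is an illustration of the idea, not delb's algorithm.

```python
import re
import xml.etree.ElementTree as ET

# A step is a name test followed by any number of predicates.
STEP = re.compile(r"^(?P<name>[\w.-]+)(?P<predicates>(\[[^\]]*\])*)$")
# A predicate holds attribute comparisons against string literals.
COMPARISON = re.compile(
    r"@(?P<name>[\w.-]+)\s*=\s*(?P<quote>['\"])(?P<value>.*?)(?P=quote)"
)


def fetch_or_create(root: ET.Element, expression: str) -> ET.Element:
    """Walks child-axis steps and creates whatever is missing."""
    node = root
    for step in expression.split("/"):
        match = STEP.match(step)
        if match is None:
            raise ValueError(f"Unsupported location step: {step!r}")
        name = match["name"]
        # Collects the attribute comparisons of all predicates of the step.
        wanted = {
            m["name"]: m["value"]
            for m in COMPARISON.finditer(match["predicates"])
        }
        for child in node:
            if child.tag == name and all(
                child.get(k) == v for k, v in wanted.items()
            ):
                node = child
                break
        else:
            # No matching child exists, so the branch is extended.
            node = ET.SubElement(node, name, wanted)
    return node


root = ET.Element("root")
grandchild = fetch_or_create(root, "child[@a='b']/grandchild")
same = fetch_or_create(root, "child[@a='b']/grandchild")
```

The second call finds the branch that the first call created and returns the very same element. Because each step carries a name test and only attribute comparisons, the sketch always knows which element to create when a step has no match.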

fetch_preceding(*filter: _delb.typing.Filter) → Optional[NodeBase]¶

Parameters: filter – Any number of filters.
Returns: The previous node in document order that matches all filters or None.

fetch_preceding_sibling(*filter: _delb.typing.Filter) → Optional[NodeBase]¶

Parameters: filter – Any number of filters.
Returns: The next sibling to the left that matches all filters or None.

property first_child: Optional[NodeBase]¶: The node’s first child node.

property full_text: str¶: The concatenated contents of all text node descendants in document order.

property id: Optional[str]¶: This is a shortcut to retrieve and set the id attribute in the XML namespace. Client code is responsible for passing well-formed id names.

property index: Optional[int]¶: The node’s index within the parent’s collection of child nodes or None when the node has no parent.

insert_children(index: int, *node: NodeSource, clone: bool = False)[source]¶

Inserts one or more child nodes.

The nodes can be concrete instances of any node type or abstract descriptions: strings, from which TextNode instances are derived, or objects returned by the tag() function, from which TagNode instances are derived.

Parameters:

index – The index at which the first of the given nodes will be inserted; the remaining nodes follow in the given order.
node – The node(s) to be added.
clone – Clones the concrete nodes before adding if True.

iterate_ancestors(*filter: Filter) → Iterator[TagNode]¶

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the ancestor nodes from bottom to top.
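The order "from bottom to top" can be pictured with a minimal parent-linked structure. ParentLinkedNode is a hypothetical class, and the assumption that a root node has a depth of 0 is made for this sketch only.

```python
from typing import Iterator, Optional


class ParentLinkedNode:
    """Hypothetical node that only knows its parent."""

    def __init__(self, name: str, parent: Optional["ParentLinkedNode"] = None):
        self.name = name
        self.parent = parent

    def iterate_ancestors(self) -> Iterator["ParentLinkedNode"]:
        # Walks from the closest ancestor up to the root.
        node = self.parent
        while node is not None:
            yield node
            node = node.parent

    @property
    def depth(self) -> int:
        # Assumption: the root has depth 0.
        return sum(1 for _ in self.iterate_ancestors())


root = ParentLinkedNode("root")
section = ParentLinkedNode("section", parent=root)
paragraph = ParentLinkedNode("paragraph", parent=section)
names = [node.name for node in paragraph.iterate_ancestors()]
```

Here the parent is yielded first and the tree's root last, and the depth simply counts the ancestors.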

iterate_children(*filter: Filter, recurse: bool = False) → Iterator[NodeBase][source]¶

Parameters:

filter – Any number of filters that a node must match to be yielded.
recurse – Deprecated. Use NodeBase.iterate_descendants().

Returns:

A generator iterator that yields the child nodes of the node.

iterate_descendants(*filter: Filter) → Iterator[NodeBase][source]¶

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the descendants of the node.

iterate_following(*filter: Filter) → Iterator[NodeBase]¶

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the following nodes in document order.

iterate_following_siblings(*filter: Filter) → Iterator[NodeBase]¶

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the siblings to the node’s right.

iterate_preceding(*filter: Filter) → Iterator[NodeBase]¶

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the previous nodes in document order.

iterate_preceding_siblings(*filter: Filter) → Iterator[NodeBase]¶

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the siblings to the node’s left.

property last_child: Optional[NodeBase]¶: The node’s last child node.

property last_descendant: Optional[NodeBase]¶: The node’s last descendant.

property local_name: str¶: The node’s name.

property location_path: str¶: An unambiguous XPath location path that points to this node from its tree root.

merge_text_nodes()[source]¶: Merges all consecutive text nodes in the subtree into one.
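What merging consecutive text nodes amounts to can be sketched on a flat list of children. The function merge_consecutive_text is hypothetical, strings stand in for text nodes and tuples for tag nodes, and the recursion into a subtree is left out.

```python
from itertools import groupby


def merge_consecutive_text(children: list) -> list:
    """Joins each run of adjacent strings into a single string."""
    merged = []
    for is_text, run in groupby(children, key=lambda c: isinstance(c, str)):
        if is_text:
            merged.append("".join(run))
        else:
            merged.extend(run)
    return merged


# Strings stand in for text nodes, tuples for tag nodes.
children = ["Hello", ", ", "world", ("br",), "!", ("hr",), ("hr",)]
result = merge_consecutive_text(children)
```

Only adjacent strings are joined; the text after the first tuple stays a separate item because a non-text item sits in between.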

property namespace: Optional[str]¶: The node’s namespace. Be aware that while this property can be set to None, serializations will continue to render a previous default namespace declaration if the node had one.

property namespaces: Namespaces¶: The prefix to namespace mapping of the node.

new_tag_node(local_name: str, attributes: Optional[dict[str, str]] = None, namespace: Optional[str] = None, children: Sequence[str | NodeBase | _TagDefinition] = ()) → TagNode[source]¶

Creates a new TagNode instance in the node’s context.

Parameters:

local_name – The tag name.
attributes – Optional attributes that are assigned to the new node.
namespace – An optional tag namespace. If none is provided, the context node’s namespace is inherited.
children – An optional sequence of objects that will be appended as child nodes. This can be existing nodes, strings that will be inserted as text nodes and in-place definitions of TagNode instances from tag(). The latter will be assigned to the same namespace.

Returns:

The newly created tag node.

property parent: Optional[TagNode]¶: The node’s parent or None.

static parse(text: AnyStr, parser: Optional[XMLParser] = None, parser_options: Optional[ParserOptions] = None, collapse_whitespace: Optional[bool] = None) → TagNode[source]¶

Parses the given string or bytes sequence into a new tree.

Parameters:

text – A serialized XML tree.
parser – Deprecated.
parser_options – A delb.ParserOptions instance to configure the used parser.
collapse_whitespace – Deprecated. Use the argument with the same name on the parser_options object.

property prefix: Optional[str]¶: The prefix that the node’s namespace is currently mapped to.

prepend_children(*node: NodeBase, clone: bool = False) → None[source]¶

Adds one or more nodes as child nodes before any existing child nodes of the node this method is called on.

The nodes can be concrete instances of any node type or abstract descriptions: strings, from which TextNode instances are derived, or objects returned by the tag() function, from which TagNode instances are derived.

Parameters:

node – The node(s) to be added.
clone – Clones the concrete nodes before adding if True.

replace_with(node: NodeSource, clone: bool = False) → NodeBase¶

Removes the node and places the given one in its tree location.

The node can be a concrete instance of any node type or an abstract description: a string, from which a TextNode instance is derived, or an object returned by the tag() function, from which a TagNode instance is derived.

Parameters:

node – The replacing node.
clone – A concrete, replacing node is cloned if True.

Returns:

The removed node.

serialize(*, align_attributes: bool = False, indentation: str = '', namespaces: Optional[NamespaceDeclarations] = None, newline: Optional[str] = None, text_width: int = 0)[source]¶

Returns a string that contains the serialization of the node.

Parameters:

align_attributes – Determines whether attributes’ names and values line up sharply around vertically aligned equal signs.
indentation – This string prefixes descendant nodes’ contents once per depth level. A non-empty string implies line-breaks between nodes as well.
namespaces – A mapping of prefixes to namespaces. These override any declarations from the parsed serialization that the document instance stems from. Prefixes for undeclared namespaces are enumerated with the prefix ns.
newline – See io.TextIOWrapper for a detailed explanation of the parameter with the same name.
text_width – A positive value indicates that text nodes shall get wrapped at this character position. Indentations are not considered as part of text. This parameter is intended to define reasonable widths for text displays that can be scrolled horizontally.
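The statement that indentations are not considered as part of text can be pictured with the standard library's textwrap module. This is a sketch under assumptions: render_text is a hypothetical helper and delb's serializer may well work differently. It shows one reading of the parameters, in which the text is wrapped first and the indentation is added afterwards.

```python
import textwrap


def render_text(text: str, text_width: int, indentation: str, depth: int) -> str:
    """Wraps text first, then prefixes each line with the indentation."""
    prefix = indentation * depth
    if text_width > 0:
        lines = textwrap.wrap(text, width=text_width)
    else:
        # Zero disables wrapping in this sketch.
        lines = [text]
    return "\n".join(prefix + line for line in lines)


rendered = render_text(
    "The quick brown fox jumps over the lazy dog.",
    text_width=20,
    indentation="  ",
    depth=2,
)
```

Each resulting line consists of four spaces of indentation plus at most twenty characters of text, so deeply nested text keeps the same wrapping width as shallow text.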

property universal_name: str¶: The node’s qualified name in Clark notation.
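Clark notation writes a qualified name as the namespace in curly braces followed by the local name. The helpers to_clark and from_clark below are hypothetical and only sketch the notation itself. How delb renders names without a namespace is an assumption here.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def to_clark(namespace: Optional[str], local_name: str) -> str:
    """Builds a name in Clark notation: {namespace}local_name."""
    return f"{{{namespace}}}{local_name}" if namespace else local_name


def from_clark(name: str) -> tuple:
    """Splits a Clark notation name into (namespace, local_name)."""
    if name.startswith("{"):
        namespace, _, local_name = name[1:].partition("}")
        return namespace, local_name
    return None, name


# ElementTree from the standard library uses the same notation for tags.
element = ET.fromstring('<metadata xmlns="http://www.w3.org/2000/svg"/>')
```

The tag of the parsed element reads `{http://www.w3.org/2000/svg}metadata`, which is what to_clark produces for the same namespace and local name.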

xpath(expression: str, namespaces: Optional[NamespaceDeclarations] = None) → QueryResults[source]¶

See Queries with XPath & CSS for details on the extent of the XPath implementation.

Parameters:

expression – A supported XPath 1.0 expression that contains one or more location paths.
namespaces – A mapping of prefixes that are used in the expression to namespaces. The declarations that were used in a document’s source serialization serve as fallback.

Returns:

All nodes that match the evaluation of the provided XPath expression.
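The role of the namespaces mapping can be sketched with ElementTree from the standard library, which accepts a comparable argument. This is a stand-in for the concept and not delb's behaviour. In particular, ElementTree has no fallback to the declarations of the source document.

```python
import xml.etree.ElementTree as ET

SVG = "http://www.w3.org/2000/svg"
root = ET.fromstring(
    f'<root xmlns:s="{SVG}"><s:metadata/><metadata/></root>'
)

# The prefix in the expression is resolved via the mapping, not via the
# prefix that the source document happened to use ("s").
matches = root.findall("svg:metadata", namespaces={"svg": SVG})
```

Only the element in the SVG namespace is matched, although the expression uses a different prefix than the document. The prefix is just a local shorthand for the namespace.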

Tag attribute¶

class delb.nodes.Attribute(attributes: TagAttributes, key: str)[source]¶

Attribute objects represent a tag node’s attributes. See the delb.TagNode.attributes documentation for their capabilities.

property local_name: str¶: The attribute’s local name.

property namespace: Optional[str]¶: The attribute’s namespace.

property universal_name: str¶: The attribute’s namespace and local name in Clark notation.

property value: str¶: The attribute’s value.

Text¶

class delb.TextNode(reference_or_text: _Element | str | TextNode, position: int = 0)[source]¶

TextNodes contain the textual data of a document. The class should not be instantiated by client code; just throw strings into the trees.

Instances expose all methods of str except str.index():

>>> node = TextNode("Show us the way to the next whisky bar.")
>>> node.split()
['Show', 'us', 'the', 'way', 'to', 'the', 'next', 'whisky', 'bar.']

Instances can be tested for equality with other text nodes and strings:

>>> TextNode("ham") == TextNode("spam")
False
>>> TextNode("Patsy") == "Patsy"
True

And they can be tested for substrings:

>>> "Sir" in TextNode("Sir Bedevere the Wise")
True

Attributes that relate to child nodes yield nothing or None, respectively.

Properties

`content`	The node's text content.
`depth`	The depth (or level) of the node in its tree.
`document`	The `Document` instance that the node is associated with or `None`.
`first_child`	Always `None`.
`full_text`	The concatenated contents of all text node descendants in document order.
`index`	The node's index within the parent's collection of child nodes or `None` when the node has no parent.
`last_child`	Always `None`.
`last_descendant`	Always `None`.
`namespaces`	The prefix to namespace mapping of the node.
`parent`	The node's parent or `None`.

Fetching a single relative node

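`fetch_following`(*filter)	Returns the next node in document order that matches all filters or `None`.
`fetch_following_sibling`(*filter)	Returns the next sibling to the right that matches all filters or `None`.
`fetch_preceding`(*filter)	Returns the previous node in document order that matches all filters or `None`.
`fetch_preceding_sibling`(*filter)	Returns the next sibling to the left that matches all filters or `None`.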

Iterating over relative nodes

`iterate_ancestors`(*filter)	Yields the ancestor nodes that match all filters, from bottom to top.
`iterate_children`(*filter[, recurse])	A generator iterator that yields nothing.
`iterate_descendants`(*filter)	Yields the descendants that match all filters.
`iterate_following`(*filter)	Yields the following nodes that match all filters, in document order.
`iterate_following_siblings`(*filter)	Yields the siblings to the node's right that match all filters.
`iterate_preceding`(*filter)	Yields the previous nodes that match all filters, in document order.
`iterate_preceding_siblings`(*filter)	Yields the siblings to the node's left that match all filters.

Querying nodes

`xpath`(expression[, namespaces])	See Queries with XPath & CSS for details on the extent of the XPath implementation.

Adding nodes

`add_following_siblings`(*node[, clone])	Adds one or more nodes to the right of the node this method is called on.
`add_preceding_siblings`(*node[, clone])	Adds one or more nodes to the left of the node this method is called on.

Removing a node from its tree

`detach`([retain_child_nodes])	Removes the node from its tree.
`replace_with`(node[, clone])	Removes the node and places the given one in its tree location.

Uncategorized methods

`clone`([deep, quick_and_unsafe])	Returns a copy of the node.
`new_tag_node`(local_name[, attributes, ...])	Creates a new `TagNode` instance in the node's context.
`serialize`(*[, align_attributes, ...])	Returns a string that contains the serialization of the node.

add_following_siblings(*node: NodeSource, clone: bool = False)¶

Adds one or more nodes to the right of the node this method is called on.

The nodes can be concrete instances of any node type or abstract descriptions: strings, from which TextNode instances are derived, or objects returned by the tag() function, from which TagNode instances are derived.

Parameters:

node – The node(s) to be added.
clone – Clones the concrete nodes before adding if True.

add_preceding_siblings(*node: NodeSource, clone: bool = False)¶

Adds one or more nodes to the left of the node this method is called on.

The nodes can be concrete instances of any node type or abstract descriptions: strings, from which TextNode instances are derived, or objects returned by the tag() function, from which TagNode instances are derived.

Parameters:

node – The node(s) to be added.
clone – Clones the concrete nodes before adding if True.

clone(deep: bool = False, quick_and_unsafe: bool = False) → NodeBase[source]¶

Parameters:

deep – Clones the whole subtree if True.
quick_and_unsafe – Creates a deep clone in a quicker manner where text nodes may get lost. It should be safe with trees that don’t contain subsequent text nodes, e.g. freshly parsed, unaltered documents or after TagNode.merge_text_nodes() has been applied.

Returns:

A copy of the node.

property content: str¶: The node’s text content.

property depth: int¶: The depth (or level) of the node in its tree.

detach(retain_child_nodes: bool = False) → TextNode[source]¶

Removes the node from its tree.

Parameters: retain_child_nodes – Keeps the node’s descendants in the originating tree if True.
Returns: The removed node.

property document: Optional[Document]¶: The Document instance that the node is associated with or None.

fetch_following(*filter: _delb.typing.Filter) → Optional[NodeBase]¶

Parameters: filter – Any number of filters.
Returns: The next node in document order that matches all filters or None.

fetch_following_sibling(*filter: _delb.typing.Filter) → Optional[NodeBase]¶

Parameters: filter – Any number of filters.
Returns: The next sibling to the right that matches all filters or None.

fetch_preceding(*filter: _delb.typing.Filter) → Optional[NodeBase]¶

Parameters: filter – Any number of filters.
Returns: The previous node in document order that matches all filters or None.

fetch_preceding_sibling(*filter: _delb.typing.Filter) → Optional[NodeBase][source]¶

Parameters: filter – Any number of filters.
Returns: The next sibling to the left that matches all filters or None.

first_child = None¶

property full_text: str¶: The concatenated contents of all text node descendants in document order.

property index: Optional[int]¶: The node’s index within the parent’s collection of child nodes or None when the node has no parent.

iterate_ancestors(*filter: Filter) → Iterator[TagNode]¶

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the ancestor nodes from bottom to top.

iterate_children(*filter: Filter, recurse: bool = False) → Iterator[NodeBase]¶

A generator iterator that yields nothing.

iterate_descendants(*filter: Filter) → Iterator[NodeBase]¶

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the descendants of the node.

iterate_following(*filter: Filter) → Iterator[NodeBase]¶

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the following nodes in document order.

iterate_following_siblings(*filter: Filter) → Iterator[NodeBase]¶

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the siblings to the node’s right.

iterate_preceding(*filter: Filter) → Iterator[NodeBase]¶

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the previous nodes in document order.

iterate_preceding_siblings(*filter: Filter) → Iterator[NodeBase]¶

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the siblings to the node’s left.

last_child = None¶

last_descendant = None¶

property namespaces¶: The prefix to namespace mapping of the node.

new_tag_node(local_name: str, attributes: Optional[dict[str, str]] = None, namespace: Optional[str] = None, children: Sequence[str | NodeBase | _TagDefinition] = ()) → TagNode¶

Creates a new TagNode instance in the node’s context.

Parameters:

local_name – The tag name.
attributes – Optional attributes that are assigned to the new node.
namespace – An optional tag namespace. If none is provided, the context node’s namespace is inherited.
children – An optional sequence of objects that will be appended as child nodes. This can be existing nodes, strings that will be inserted as text nodes and in-place definitions of TagNode instances from tag(). The latter will be assigned to the same namespace.

Returns:

The newly created tag node.

property parent: Optional[TagNode]¶: The node’s parent or None.

replace_with(node: NodeSource, clone: bool = False) → NodeBase¶

Removes the node and places the given one in its tree location.

The node can be a concrete instance of any node type or an abstract description: a string, from which a TextNode instance is derived, or an object returned by the tag() function, from which a TagNode instance is derived.

Parameters:

node – The replacing node.
clone – A concrete, replacing node is cloned if True.

Returns:

The removed node.

serialize(*, align_attributes: bool = False, indentation: str = '', namespaces: Optional[NamespaceDeclarations] = None, newline: Optional[str] = None, text_width: int = 0)¶

Returns a string that contains the serialization of the node.

Parameters:

align_attributes – Determines whether attributes’ names and values line up sharply around vertically aligned equal signs.
indentation – This string prefixes descendant nodes’ contents once per depth level. A non-empty string implies line-breaks between nodes as well.
namespaces – A mapping of prefixes to namespaces. These override any declarations from the parsed serialization that the document instance stems from. Prefixes for undeclared namespaces are enumerated with the prefix ns.
newline – See io.TextIOWrapper for a detailed explanation of the parameter with the same name.
text_width – A positive value indicates that text nodes shall get wrapped at this character position. Indentations are not considered as part of text. This parameter is intended to define reasonable widths for text displays that can be scrolled horizontally.

xpath(expression: str, namespaces: Optional[NamespaceDeclarations] = None) → QueryResults¶

See Queries with XPath & CSS for details on the extent of the XPath implementation.

Parameters:

expression – A supported XPath 1.0 expression that contains one or more location paths.
namespaces – A mapping of prefixes that are used in the expression to namespaces. The declarations that were used in a document’s source serialization serve as fallback.

Returns:

All nodes that match the evaluation of the provided XPath expression.

Node constructors¶

delb.new_comment_node(content: str) → CommentNode[source]¶

Creates a new CommentNode.

Parameters: content – The comment’s content a.k.a. text.
Returns: The newly created comment node.

delb.new_processing_instruction_node(target: str, content: str) → ProcessingInstructionNode[source]¶

Creates a new ProcessingInstructionNode.

Parameters:

target – The processing instruction’s target name.
content – The processing instruction’s text.

Returns:

The newly created processing instruction node.

delb.new_tag_node(local_name: str, attributes: Optional[dict[str, str]] = None, namespace: Optional[str] = None, children: Sequence[NodeSource] = ()) → TagNode[source]¶

Creates a new TagNode instance outside any context. It is preferable to use the method new_tag_node on instances of documents and nodes where the namespace is inherited.

Parameters:

local_name – The tag name.
attributes – Optional attributes that are assigned to the new node.
namespace – An optional tag namespace.
children – An optional sequence of objects that will be appended as child nodes. This can be existing nodes, strings that will be inserted as text nodes and in-place definitions of TagNode instances from tag(). The latter will be assigned to the same namespace.

Returns:

The newly created tag node.

Queries with XPath & CSS¶

delb allows querying of nodes with CSS selector and XPath expressions. CSS selectors are converted to XPath expressions with a third-party library before evaluation and they are only supported as far as their computed XPath equivalents are supported by delb’s very own XPath implementation.

This implementation is not fully compliant with any of the W3C’s XPath specifications. It mostly covers the XPath 1.0 specs, but focuses on querying via path expressions with simple constraints and omits a broad employment of computations (that’s what programming languages are for). It therefore has these intended deviations from that standard:

Default namespaces can be addressed in node and attribute names, by simply using no prefix.
The attribute and namespace axes are not supported in location steps (see also below).
In predicates only the attribute axis can be used in its abbreviated form (@name).
Path evaluations within predicates are not available.
Only these predicate functions are provided and tested:
- boolean
- concat
- contains
- last
- not
- position
- starts-with
- text
  
  Behaves as if deployed as a single step location path that only tests for the node type text. Hence it returns the contents of the context node’s first child node that is a text node or an empty string when there is none.
Please refrain from extension requests without a proper, concrete implementation proposal.
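The behaviour described for the text function above can be illustrated without delb. The following is a minimal sketch that uses the standard library's xml.dom.minidom as a stand-in; first_text is a hypothetical helper name and not part of delb's API.

```python
from xml.dom import minidom


def first_text(node) -> str:
    """Return the data of the first child that is a text node, else ''."""
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            return child.data
    return ""


# The first text child is "tail"; "bold" belongs to the <b> element.
with_text = minidom.parseString(
    "<p><b>bold</b>tail<i/>more</p>"
).documentElement
# Here the context node has no text child at all.
without_text = minidom.parseString("<p><b>bold</b></p>").documentElement

print(repr(first_text(with_text)))
print(repr(first_text(without_text)))
```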

If you’re accustomed to retrieving attribute values with XPath expressions, use the facilities of the surrounding programming language like this:

>>> [x.attributes["target"] for x in root.xpath(".//foo")
...  if "target" in x.attributes]

Instead of:

>>> root.xpath(".//foo/@target")  
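The same pattern, selecting elements with a path expression and reading attributes in Python, can be tried with the standard library alone. This sketch uses xml.etree.ElementTree as a stand-in for delb, so findall() and get() are ElementTree calls and not delb's API; the sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<root><foo target="a"/><bar><foo/><foo target="b"/></bar></root>'
)

# Select the elements with a path expression, then read the attribute in
# Python and skip the elements that do not carry it.
targets = [
    x.get("target") for x in root.findall(".//foo") if "target" in x.attrib
]
print(targets)
```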

See _delb.plugins.PluginManager.register_xpath_function() regarding the use of custom functions.

class _delb.xpath.EvaluationContext(node: NodeBase, position: int, size: int, namespaces: Namespaces)[source]¶

Instances of this type are passed to XPath functions in order to provide contextual information.

count(value, /)¶: Return number of occurrences of value.

index(value, start=0, stop=9223372036854775807, /)¶

Return first index of value.

Raises ValueError if the value is not present.

property namespaces¶: A mapping of prefixes to namespaces that is used in the whole evaluation.

property node¶: The node that is evaluated.

property position¶: The node’s position within all nodes that matched a location step’s node test in order of the step’s axis’ direction. The first position is 1.

property size¶: The number of all nodes that matched a location step’s node test.
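How position and size relate to the matched nodes can be sketched in plain Python. The Context class and the contexts_for helper below are hypothetical stand-ins, not delb's implementation. Modelling the context as a NamedTuple is an assumption, suggested by the tuple-like count and index members listed above.

```python
from typing import Any, Iterator, NamedTuple, Sequence


class Context(NamedTuple):
    """Hypothetical stand-in for an evaluation context."""

    node: Any
    position: int
    size: int


def contexts_for(matched: Sequence[Any]) -> Iterator[Context]:
    """Yield one context per matched node; positions start at 1."""
    size = len(matched)
    for position, node in enumerate(matched, start=1):
        yield Context(node, position, size)


def is_last(context: Context) -> bool:
    """Mimics what a last()-like predicate would test."""
    return context.position == context.size


contexts = list(contexts_for(["a", "b", "c"]))
print([c.node for c in contexts if is_last(c)])
```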

class _delb.xpath.QueryResults(results: Iterable[NodeBase])[source]¶

A container that includes the results of a CSS selector or XPath query with some helpers for more readable Python expressions.

as_list() → list[NodeBase][source]¶: The contained nodes as a new list.

property as_tuple: tuple[NodeBase, ...]¶: The contained nodes in a tuple.

count(value)¶: Return the number of occurrences of value.

filtered_by(*filters: _delb.typing.Filter) → QueryResults[source]¶: Returns another QueryResults instance that contains all nodes filtered by the provided filters.

property first: Optional[NodeBase]¶: The first node from the results or None if there are none.

in_document_order() → QueryResults[source]¶: Returns another QueryResults instance where the contained nodes are sorted in document order.

index(value[, start[, stop]])¶

Return the first index of value.

Raises ValueError if the value is not present.

Supporting start and stop arguments is optional, but recommended.

property last: Optional[NodeBase]¶: The last node from the results or None if there are none.

property size: int¶: The number of contained nodes.
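The helpers listed above follow a common container pattern. The sketch below rebuilds a few of them over plain strings to show how they combine; Results is a hypothetical class and says nothing about how QueryResults is implemented.

```python
from typing import Callable, Iterable, Optional


class Results:
    """Hypothetical stand-in for a query results container."""

    def __init__(self, results: Iterable[str]):
        self._items = tuple(results)

    def filtered_by(self, *filters: Callable[[str], bool]) -> "Results":
        # Keep only the items that pass every given filter.
        return Results(x for x in self._items if all(f(x) for f in filters))

    @property
    def first(self) -> Optional[str]:
        return self._items[0] if self._items else None

    @property
    def last(self) -> Optional[str]:
        return self._items[-1] if self._items else None

    @property
    def size(self) -> int:
        return len(self._items)


results = Results(["div", "p", "span", "pre"])
starting_with_p = results.filtered_by(lambda x: x.startswith("p"))
print(starting_with_p.first, starting_with_p.last, starting_with_p.size)
```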

Filters¶

Default filters¶

delb.altered_default_filters(*filter: _delb.typing.Filter, extend: bool = False)[source]¶

This function can be used either as a context manager or as a decorator to define a set of default_filters for the encapsulated code block or callable. These are then applied in all operations that allow node filtering, like TagNode.next_node(). Mind that they also affect a node’s index property and indexed access to child nodes.

>>> root = Document(
...     '<root xmlns="foo"><a/><!--x--><b/><!--y--><c/></root>'
... ).root
>>> with altered_default_filters(is_comment_node):
...     print([x.content for x in root.iterate_children()])
['x', 'y']

As the default filters shadow comments and processing instructions, call the function without arguments to unset this and access all types of nodes.

Parameters:

filter – The filters to set or append.
extend – Extends the currently active filters with the given ones instead of replacing them.

Contributed filters¶

delb.any_of(*filter: _delb.typing.Filter) → _delb.typing.Filter[source]¶: A node filter wrapper that matches when any of the given filters is matching, like a boolean or.

delb.is_comment_node(node: NodeBase) → bool[source]¶: A node filter that matches CommentNode instances.

delb.is_processing_instruction_node(node: NodeBase) → bool[source]¶: A node filter that matches ProcessingInstructionNode instances.

delb.is_tag_node(node: NodeBase) → bool[source]¶: A node filter that matches TagNode instances.

delb.is_text_node(node: NodeBase) → bool[source]¶: A node filter that matches TextNode instances.

delb.not_(*filter: _delb.typing.Filter) → _delb.typing.Filter[source]¶: A node filter wrapper that matches when the given filter is not matching, like a boolean not.
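Filter wrappers of this kind are ordinary higher-order functions. The following sketch re-implements the idea over plain strings, so the functions only mirror the names above and are not delb's code. That not_ negates the conjunction of several filters is an assumption made for the multi-filter case.

```python
from typing import Callable

Filter = Callable[[str], bool]


def any_of(*filters: Filter) -> Filter:
    """Match when at least one of the given filters matches."""

    def wrapper(node: str) -> bool:
        return any(f(node) for f in filters)

    return wrapper


def not_(*filters: Filter) -> Filter:
    """Match when the given filters do not all match (assumed semantics)."""

    def wrapper(node: str) -> bool:
        return not all(f(node) for f in filters)

    return wrapper


is_a_or_b = any_of(lambda x: x == "a", lambda x: x == "b")
matching = [x for x in "abc" if is_a_or_b(x)]
not_matching = [x for x in "abc" if not_(is_a_or_b)(x)]
print(matching, not_matching)
```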

Transformations¶

This module offers a canonical interface with the aim of making the re-use of transformation algorithms easier.

Let’s look at it with examples:

from delb.transform import Transformation


class ResolveCopyOf(Transformation):
    def transform(self):
        for node in self.root.css_select("*[copyOf]"):
            source_id = node["copyOf"]
            source_node = self.origin_document.xpath(
                f'//*[@xml:id="{source_id[1:]}"]'
            ).first
            cloned_node = source_node.clone(deep=True)
            cloned_node.id = None
            node.replace_with(cloned_node)

Instances of transformations defined this way can be called with a (sub-)tree and an optional document that the tree originates from:

resolve_copy_of = ResolveCopyOf()
tree = resolve_copy_of(tree)  # where tree is an instance of TagNode

typing.NamedTuple subclasses are used to define options for transformations:

from typing import NamedTuple


class ResolveChoiceOptions(NamedTuple):
    corr: bool = True
    reg: bool = True


class ResolveChoice(Transformation):
    options_class = ResolveChoiceOptions

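    def __init__(self, options=None):
        # options defaults to None so that ResolveChoice() can be
        # instantiated with the defaults of the options class
        super().__init__(options)
        self.keep_selector = ",".join(
            (
                "corr" if self.options.corr else "sic",
                "reg" if self.options.reg else "orig"
            )
        )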
        self.drop_selector = ",".join(
            (
                "sic" if self.options.corr else "corr",
                "orig" if self.options.reg else "reg"
            )
        )

    def transform(self):
        for choice_node in self.root.css_select("choice"):
            node_to_drop = choice_node.css_select(self.drop_selector).first
            node_to_drop.detach()

            node_to_keep = choice_node.css_select(self.keep_selector).first
            node_to_keep.detach(retain_child_nodes=True)

            choice_node.detach(retain_child_nodes=True)

A transformation class that defines an options_class property can then be used either with its defaults or with alternate options:

resolve_choice = ResolveChoice()
tree = resolve_choice(tree)

resolve_choice = ResolveChoice(ResolveChoiceOptions(reg=False))
tree = resolve_choice(tree)

Finally, concrete transformations can be chained, either as classes or as instances. The interface also allows chaining multiple chains:

from delb.transform import TransformationSequence

tidy_up = TransformationSequence(ResolveCopyOf, resolve_choice)
tree = tidy_up(tree)
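The chaining behaviour can be modelled in a few lines. The sketch below uses string transformations and a hypothetical Chain class, so it is not delb's TransformationSequence. It only shows one way to accept classes, instances and nested chains alike.

```python
class Strip:
    def __call__(self, text: str) -> str:
        return text.strip()


class Upper:
    def __call__(self, text: str) -> str:
        return text.upper()


class Chain:
    """Hypothetical sequence of callables, given as classes or instances."""

    def __init__(self, *transformations):
        # Classes are instantiated without arguments, instances are kept.
        self.transformations = tuple(
            t() if isinstance(t, type) else t for t in transformations
        )

    def __call__(self, value):
        # Feed the result of each step into the next one.
        for transformation in self.transformations:
            value = transformation(value)
        return value


tidy_up = Chain(Strip, Upper())
nested = Chain(Chain(Strip), Upper)
print(tidy_up("  hello "), nested("  hello "))
```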

Attention

This is an experimental feature. It might change significantly in the future or be removed altogether.

class delb.transform.Transformation(options: Optional[NamedTuple] = None)[source]¶

This is a base class for any transformation algorithm.

abstract transform()[source]¶

This method needs to implement the transformation logic. When it is called, the instance has two attributes assigned from its call:

root is the node that the transformation was called to transform.
origin_document is the document that was possibly passed as second argument.
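The calling convention described here can be pictured as a small base class. This is a sketch under assumptions: SketchTransformation is hypothetical, it operates on xml.etree.ElementTree elements instead of delb nodes, and that the call returns the root is inferred from the usage examples above.

```python
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod


class SketchTransformation(ABC):
    """Hypothetical base class that mirrors the described call protocol."""

    def __call__(self, root, document=None):
        # Both attributes are assigned before transform() runs.
        self.root = root
        self.origin_document = document
        self.transform()
        return self.root

    @abstractmethod
    def transform(self):
        ...


class BoldToStrong(SketchTransformation):
    def transform(self):
        # Rename every <b> element within the given tree.
        for node in self.root.iter("b"):
            node.tag = "strong"


bold_to_strong = BoldToStrong()
tree = bold_to_strong(ET.fromstring("<p><b>x</b></p>"))
print(tree[0].tag)
```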

class delb.transform.TransformationSequence(*transformations: TransformationBase | type[TransformationBase])[source]¶: A transformation sequence can be used to combine any number of both Transformation (provided as class or instantiated with options) and other TransformationSequence instances or classes.

String serialization¶

class delb.DefaultStringOptions[source]¶

This object’s class variables are used to configure the serialization parameters that are applied when nodes are coerced to str objects. Hence it also applies when node objects are fed to the print() function and in other cases where objects are implicitly cast to strings.

⚠️ Use this once to define behaviour on application level. For thread-safe serializations of nodes with diverging parameters use NodeBase.serialize()! Think thrice whether you want to use this facility in a library.

align_attributes: ClassWar[bool] = False¶: Determines whether attributes’ names and values line up sharply around vertically aligned equal signs.

indentation: ClassWar[str] = ''¶: This string prefixes descending nodes’ contents one time per depth level. A non-empty string implies line-breaks between nodes as well.

namespaces: ClassWar[None | NamespaceDeclarations] = None¶: A mapping of prefixes to namespaces. These override possible declarations from the parsed serialization that the document instance stems from. Prefixes for undeclared namespaces are enumerated with the prefix ns.

newline: ClassWar[None | str] = None¶: See io.TextIOWrapper for a detailed explanation of the parameter with the same name.

classmethod reset_defaults()[source]¶: Restores the factory settings.

text_width: ClassWar[int] = 0¶: A positive value indicates that text nodes shall get wrapped at this character position. Indentations are not considered as part of text. This parameter is intended to define reasonable widths for text displays that can be scrolled horizontally.
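Why such class-level settings act globally, and why the warning above recommends setting them once, can be seen in a reduced model. StringOptions and render are hypothetical names; the sketch only imitates an indentation setting and is not delb's serializer.

```python
from typing import ClassVar


class StringOptions:
    """Hypothetical holder of process-wide serialization settings."""

    indentation: ClassVar[str] = ""

    @classmethod
    def reset_defaults(cls) -> None:
        cls.indentation = ""


def render(tag: str, children: list) -> str:
    # The class variable is read at call time, so every caller in the
    # process is affected by a change.
    indent = StringOptions.indentation
    if not indent:
        return f"<{tag}>" + "".join(f"<{c}/>" for c in children) + f"</{tag}>"
    lines = [f"<{tag}>", *(f"{indent}<{c}/>" for c in children), f"</{tag}>"]
    return "\n".join(lines)


compact = render("root", ["a", "b"])
StringOptions.indentation = "  "
indented = render("root", ["a", "b"])
StringOptions.reset_defaults()
restored = render("root", ["a", "b"])
print(compact, indented, restored, sep="\n")
```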

Various helpers¶

delb.first(iterable: Iterable) → Optional[Any][source]¶: Returns the first item of the given iterable or None if it’s empty. Note that the first item is consumed when the iterable is an iterator.

delb.get_traverser(from_left=True, depth_first=True, from_top=True)[source]¶

Returns a function that can be used to traverse a (sub)tree with the given node as root. While traversing, the given root node is yielded at some point.

The returned functions have this signature:

def traverser(root: NodeBase, *filters: Filter) -> Iterator[NodeBase]:
    ...

Parameters:

from_left – The traverser yields sibling nodes from left to right if True, or starting from the right if False.
depth_first – A traverser yields the child nodes or the parent node, respectively, before the siblings of a node if True. Siblings are favored if False.
from_top – The traverser starts yielding nodes with the lowest depth if True. When False, again, the opposite is in effect.
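To make the parameters more tangible, here is a sketch of one of the possible combinations: depth first, starting from the top, with a switchable sibling direction. It works on xml.etree.ElementTree elements, and make_traverser is a hypothetical name. That descendants of a filtered-out node are still visited is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator

Filter = Callable[[ET.Element], bool]


def make_traverser(from_left: bool = True):
    """Return a depth-first, top-down traverser (a hypothetical sketch)."""

    def traverser(root: ET.Element, *filters: Filter) -> Iterator[ET.Element]:
        # The root is yielded first, then each child's subtree in turn.
        if all(f(root) for f in filters):
            yield root
        children = list(root) if from_left else list(root)[::-1]
        for child in children:
            yield from traverser(child, *filters)

    return traverser


root = ET.fromstring("<a><b><c/></b><d/></a>")
left_to_right = [n.tag for n in make_traverser()(root)]
right_to_left = [n.tag for n in make_traverser(from_left=False)(root)]
without_b = [n.tag for n in make_traverser()(root, lambda n: n.tag != "b")]
print(left_to_right, right_to_left, without_b)
```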

delb.last(iterable: Iterable) → Optional[Any][source]¶: Returns the last item of the given iterable or None if it’s empty. Note that the whole iterator is consumed when such is given.
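The notes on consumed iterators for first() and last() can be reproduced with equivalent helpers built from the standard library. first_item and last_item are hypothetical re-implementations for illustration, not delb's source.

```python
from collections import deque
from typing import Any, Iterable, Optional


def first_item(iterable: Iterable) -> Optional[Any]:
    # Pulls at most one item from the iterable.
    return next(iter(iterable), None)


def last_item(iterable: Iterable) -> Optional[Any]:
    # A deque with maxlen=1 runs through everything and keeps the tail.
    tail = deque(iterable, maxlen=1)
    return tail[0] if tail else None


numbers = iter([1, 2, 3])
head = first_item(numbers)  # consumes the first item of the iterator
remaining = list(numbers)  # only the rest is left

letters = iter("abc")
tail = last_item(letters)  # consumes the whole iterator
left_over = list(letters)

print(head, remaining, tail, left_over)
```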

delb.tag(local_name: str)[source]¶

delb.tag(local_name: str, attributes: Mapping[str, str])

delb.tag(local_name: str, child: NodeSource)

delb.tag(local_name: str, children: Sequence[NodeSource])

delb.tag(local_name: str, attributes: Mapping[str, str], child: NodeSource)

delb.tag(local_name: str, attributes: Mapping[str, str], children: Sequence[NodeSource])

This function can be used for in-place creation (or call it templating if you want to) of TagNode instances as:

node argument to methods that add nodes to a tree
items in the children argument of new_tag_node() and NodeBase.new_tag_node()

The first argument to the function is always the local name of the tag node. Optionally, the second argument can be a mapping that specifies attributes for that node. The optional last argument is either a single object that will be appended as child node or a sequence of such objects. These objects can be node instances of any type, strings (for derived TextNode instances) or other definitions from this function (for derived TagNode instances).

The actual nodes that are constructed always inherit the namespace of the context node they are created in.

>>> root = new_tag_node('root', children=[
...     tag("head", {"lvl": "1"}, "Hello!"),
...     tag("items", (
...         tag("item1"),
...         tag("item2"),
...         )
...     )
... ])
>>> str(root)
'<root><head lvl="1">Hello!</head><items><item1/><item2/></items></root>'
>>> root.append_children(tag("addendum"))
>>> str(root)[-26:]
'</items><addendum/></root>'
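The way the positional arguments are interpreted (name, optional attribute mapping, optional child or children) can be sketched as a small dispatch function. sketch_tag is hypothetical and builds xml.etree.ElementTree elements right away, whereas tag() produces definitions that are resolved in the context they are used in. Namespace inheritance is not modelled.

```python
import xml.etree.ElementTree as ET
from collections.abc import Mapping


def sketch_tag(local_name, *args):
    """Build an element from (name[, attributes][, child or children])."""
    attributes = {}
    if args and isinstance(args[0], Mapping):
        attributes, args = dict(args[0]), args[1:]
    children = ()
    if args:
        (rest,) = args  # at most one further argument is accepted
        children = rest if isinstance(rest, (list, tuple)) else (rest,)

    element = ET.Element(local_name, attributes)
    for child in children:
        if isinstance(child, str):
            # ElementTree keeps text on the parent or on the previous
            # child's tail instead of using separate text nodes.
            if len(element):
                element[-1].tail = (element[-1].tail or "") + child
            else:
                element.text = (element.text or "") + child
        else:
            element.append(child)
    return element


root = sketch_tag("root", [
    sketch_tag("head", {"lvl": "1"}, "Hello!"),
    sketch_tag("items", (sketch_tag("item1"), sketch_tag("item2"))),
])
print(ET.tostring(root, encoding="unicode"))
```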

Exceptions¶

exception delb.exceptions.AmbiguousTreeError(message: str)[source]¶

Raised when a single node shall be fetched or created by an XPath expression in a tree where the target position can’t be clearly determined.

with_traceback()¶: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception delb.exceptions.DelbBaseException[source]¶

with_traceback()¶: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception delb.exceptions.FailedDocumentLoading(source: Any, excuses: dict[Loader, str | Exception])[source]¶

with_traceback()¶: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception delb.exceptions.InvalidCodePath[source]¶

Raised when a code path that is not expected to be executed is reached.

with_traceback()¶: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception delb.exceptions.InvalidOperation[source]¶

Raised when an invalid operation is attempted by the client code.

with_traceback()¶: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception delb.exceptions.XPathEvaluationError(message: str)[source]¶

with_traceback()¶: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception delb.exceptions.XPathParsingError(expression: Optional[str] = None, position: Optional[int] = None, message: Optional[str] = None)[source]¶

Raised when an XPath expression can’t be parsed.

with_traceback()¶: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception delb.exceptions.XPathUnsupportedStandardFeature(position: int, feature_description: str)[source]¶

Raised when an unsupported XPath expression feature is recognized.

with_traceback()¶: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
