API Documentation

Note

There are actually two packages that are installed with delb: delb and _delb. As the underscore indicates, the latter exposes the private parts of the API, while the former re-exposes what is deemed public from it, along with additional contents. As a rule of thumb, use the public API in applications and the private API in delb extensions. By doing so, you can avoid circular dependencies if your extension (or other code that it depends on) uses contents from the _delb package.

Documents

class delb.Document(source, collapse_whitespace=None, parser=None, parser_options=None, klass=None, **config)

This class is the entry point to obtain a representation of an XML-encoded text document. Any object can be passed as source, provided that a suitable loader is available for it. See Document loaders for the default loaders that come with this package. Plugins can alter the available loaders, see Extending delb.

Nodes can be tested for membership in a document:

>>> document = Document("<root>text</root>")
>>> text_node = document.root[0]
>>> text_node in document
True
>>> text_node.clone() in document
False

The string coercion of a document yields an XML-encoded string but, unlike Document.save() and Document.write(), without an XML declaration:

>>> document = Document("<root/>")
>>> str(document)
'<root/>'
Parameters
  • source – Anything that the configured loaders can make sense of to return a parsed document tree.

  • collapse_whitespace – Deprecated. Use the argument with the same name on the parser_options object.

  • parser – Deprecated.

  • parser_options – A delb.ParserOptions instance to configure the parser.

  • klass – Explicitly defines the class to instantiate. This can be useful for applications that have default document subclasses in use.

  • config – Additional keyword arguments for the configuration of extension classes.

Properties

config

Besides the parser and the collapse_whitespace option in use, this property contains the namespaced data that extension classes and loaders may have stored.

head_nodes

A list-like accessor to the nodes that precede the document's root node.

namespaces

The namespace mapping of the document's root node.

root

The root node of a document tree.

source_url

The source URL where a loader obtained the document's contents or None.

tail_nodes

A list-like accessor to the nodes that follow the document's root node.

Uncategorized methods

cleanup_namespaces([namespaces, retain_prefixes])

Consolidates the namespace declarations in the document by removing unused and redundant ones.

clone()

Returns

Another instance with the duplicated contents.

collapse_whitespace()

Collapses whitespace as described here: https://wiki.tei-c.org/index.php/XML_Whitespace#Recommendations

css_select(expression[, namespaces])

This method proxies to the TagNode.css_select() method of the document's root node.

merge_text_nodes()

This method proxies to the TagNode.merge_text_nodes() method of the document's root node.

new_tag_node(local_name[, attributes, namespace])

This method proxies to the TagNode.new_tag_node() method of the document's root node.

save(path[, pretty])

param path

The path where the document shall be saved.

write(buffer[, pretty])

param buffer

A file-like object that the document is written to.

xpath(expression[, namespaces])

This method proxies to the TagNode.xpath() method of the document's root node.

xslt(transformation)

param transformation

An lxml.etree.XSLT instance that shall be applied to the document.


cleanup_namespaces(namespaces: Optional[Mapping[Optional[str], str]] = None, retain_prefixes: Optional[Iterable[str]] = None)

Consolidates the namespace declarations in the document by removing unused and redundant ones.

There are currently some caveats due to lxml/libxml2’s implementations:
  • prefixes cannot be set for the default namespace

  • a namespace cannot be declared as default after a node’s creation (where a namespace was specified that had been registered for a prefix with register_namespace())

  • there’s no way to unregister a prefix for a namespace

  • if there are other namespaces used as default namespaces (where a namespace was specified that had not been registered for a prefix) in the descendants of the root, their declarations are lost when this method is used

To ensure clean serializations, one should:
  • register prefixes for all namespaces except the default one at the start of an application

  • use only one default namespace within a document

Parameters
  • namespaces – An optional mapping of prefixes (keys) to namespaces (values) that will be declared at the root element.

  • retain_prefixes – An optional iterable that contains prefixes whose declarations shall be kept despite not being used.

clone() Document
Returns

Another instance with the duplicated contents.

collapse_whitespace()

Collapses whitespace as described here: https://wiki.tei-c.org/index.php/XML_Whitespace#Recommendations

Implicitly merges all neighbouring text nodes.

config: SimpleNamespace

Besides the parser and the collapse_whitespace option in use, this property contains the namespaced data that extension classes and loaders may have stored.

css_select(expression: str, namespaces: Optional[Namespaces] = None) QueryResults

This method proxies to the TagNode.css_select() method of the document’s root node.

head_nodes

A list-like accessor to the nodes that precede the document’s root node. Note that nodes can’t be removed or replaced.

merge_text_nodes()

This method proxies to the TagNode.merge_text_nodes() method of the document’s root node.

property namespaces: Namespaces

The namespace mapping of the document’s root node.

new_tag_node(local_name: str, attributes: Optional[Dict[str, str]] = None, namespace: Optional[str] = None) TagNode

This method proxies to the TagNode.new_tag_node() method of the document’s root node.

property root: TagNode

The root node of a document tree.

save(path: Path, pretty: bool = False, **cleanup_namespaces_args)
Parameters
  • path – The path where the document shall be saved.

  • pretty – Adds indentation for human consumers when True.

  • cleanup_namespaces_args – Arguments that are passed to Document.cleanup_namespaces() before saving.

source_url: Optional[str]

The source URL where a loader obtained the document’s contents or None.

tail_nodes

A list-like accessor to the nodes that follow the document’s root node. Note that nodes can’t be removed or replaced.

write(buffer: IO, pretty: bool = False, **cleanup_namespaces_args)
Parameters
  • buffer – A file-like object that the document is written to.

  • pretty – Adds indentation for human consumers when True.

  • cleanup_namespaces_args – Arguments that are passed to Document.cleanup_namespaces() before writing.

xpath(expression: str, namespaces: Optional[Namespaces] = None) QueryResults

This method proxies to the TagNode.xpath() method of the document’s root node.

xslt(transformation: XSLT) Document
Parameters

transformation – An lxml.etree.XSLT instance that shall be applied to the document.

Returns

A new instance with the transformation’s result.

Document loaders

If you want or need to change which loaders are available or the order in which they are attempted, you can modify the delb.plugins.plugin_manager.plugins.loaders object, which is a list. Changes to its state are reflected in your whole application. Please refer to this issue when you require finer control over these aspects.

Core

The core_loaders module provides a set of loaders to retrieve documents from various data sources.

_delb.plugins.core_loaders.buffer_loader(data: Any, config: SimpleNamespace) _delb.typing.LoaderResult

This loader loads a document from a file-like object.

_delb.plugins.core_loaders.etree_loader(data: Any, config: SimpleNamespace) _delb.typing.LoaderResult

This loader processes lxml.etree._Element and lxml.etree._ElementTree instances.

_delb.plugins.core_loaders.ftp_loader(data: Any, config: SimpleNamespace) _delb.typing.LoaderResult

Loads a document from a URL with the ftp scheme. The URL will be bound to source_url on the document’s Document.config attribute.

_delb.plugins.core_loaders.path_loader(data: Any, config: SimpleNamespace) _delb.typing.LoaderResult

This loader loads from a file that is pointed at with a pathlib.Path instance. That instance will be bound to source_path on the document’s Document.config attribute.

_delb.plugins.core_loaders.tag_node_loader(data: Any, config: SimpleNamespace) _delb.typing.LoaderResult

This loader loads, or rather clones, a delb.TagNode instance and its descendant nodes.

_delb.plugins.core_loaders.text_loader(data: Any, config: SimpleNamespace) _delb.typing.LoaderResult

Parses a string containing a full document.

Extra

If delb is installed with https-loader as extra, the required dependencies for this loader are installed as well. See Installation.

_delb.plugins.https_loader.https_loader(data: ~typing.Any, config: ~types.SimpleNamespace, client: ~httpx.Client = <httpx.Client object>) _delb.typing.LoaderResult

This loader loads a document from a URL with the http or https scheme. The default httpx client follows redirects and can partially be configured with environment variables. The URL will be bound to the name source_url on the document’s Document.config attribute.

Loaders with specifically configured httpx-clients can build on this loader like so:

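import httpx
from _delb.plugins import plugin_manager
from _delb.plugins.https_loader import https_loader


# A client that doesn't follow redirects and ignores environment settings.
client = httpx.Client(follow_redirects=False, trust_env=False)


# Registered so that it is attempted before the default https_loader.
@plugin_manager.register_loader(before=https_loader)
def custom_https_loader(data, config):
    # Delegate to the default loader, but with the custom client.
    return https_loader(data, config, client=client)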

Parser options

class delb.ParserOptions(cleanup_namespaces: bool = False, collapse_whitespace: bool = False, remove_comments: bool = False, remove_processing_instructions: bool = False, resolve_entities: bool = True, unplugged: bool = False)

The configuration options that define an XML parser’s behaviour.

Parameters
  • cleanup_namespaces – Consolidate XML namespace declarations.

  • collapse_whitespace – Collapse the content’s whitespace.

  • remove_comments – Ignore comments.

  • remove_processing_instructions – Don’t include processing instructions in the parsed tree.

  • resolve_entities – Resolve entities.

  • unplugged – Don’t load referenced resources over network.

Nodes

Comment

class delb.CommentNode(etree_element: _Element)

The instances of this class represent comment nodes of a tree.

To instantiate new nodes use new_comment_node().

Properties

content

The comment's text.

depth

The depth (or level) of the node in its tree.

document

The Document instance that the node is associated with or None.

first_child

full_text

The concatenated contents of all text node descendants in document order.

index

The node's index within the parent's collection of child nodes or None when the node has no parent.

last_child

last_descendant

namespaces

The prefix to namespace mapping of the node.

parent

The node's parent or None.

Fetching a single relative node

fetch_following(*filter)

param filter

Any number of filters.

fetch_following_sibling(*filter)

param filter

Any number of filters.

fetch_preceding(*filter)

param filter

Any number of filters.

fetch_preceding_sibling(*filter)

param filter

Any number of filters.

Iterating over relative nodes

iterate_ancestors(*filter)

param filter

Any number of filters that a node must match to be yielded.

iterate_children(*filter[, recurse])

A generator iterator that yields nothing.

iterate_descendants(*filter)

param filter

Any number of filters that a node must match to be yielded.

iterate_following(*filter)

param filter

Any number of filters that a node must match to be yielded.

iterate_following_siblings(*filter)

param filter

Any number of filters that a node must match to be yielded.

iterate_preceding(*filter)

param filter

Any number of filters that a node must match to be yielded.

iterate_preceding_siblings(*filter)

param filter

Any number of filters that a node must match to be yielded.

Querying nodes

xpath(expression[, namespaces])

See Queries with XPath & CSS for details on the extent of the XPath implementation.

Adding nodes

add_following_siblings(*node[, clone])

Adds one or more nodes to the right of the node this method is called on.

add_preceding_siblings(*node[, clone])

Adds one or more nodes to the left of the node this method is called on.

Removing a node from its tree

detach([retain_child_nodes])

Removes the node from its tree.

replace_with(node[, clone])

Removes the node and places the given one in its tree location.

Uncategorized methods

clone([deep, quick_and_unsafe])

param deep

Clones the whole subtree if True.

new_tag_node(local_name[, attributes, ...])

Creates a new TagNode instance in the node's context.


add_following_siblings(*node: Union[str, NodeBase, _TagDefinition], clone: bool = False)

Adds one or more nodes to the right of the node this method is called on.

The nodes can be concrete instances of any node type or rather abstract descriptions in the form of strings or objects returned from the tag() function, from which TextNode or TagNode instances, respectively, are derived.

Parameters
  • node – The node(s) to be added.

  • clone – Clones the concrete nodes before adding if True.

add_preceding_siblings(*node: Union[str, NodeBase, _TagDefinition], clone: bool = False)

Adds one or more nodes to the left of the node this method is called on.

The nodes can be concrete instances of any node type or rather abstract descriptions in the form of strings or objects returned from the tag() function, from which TextNode or TagNode instances, respectively, are derived.

Parameters
  • node – The node(s) to be added.

  • clone – Clones the concrete nodes before adding if True.

clone(deep: bool = False, quick_and_unsafe: bool = False) _ElementWrappingNode
Parameters
  • deep – Clones the whole subtree if True.

  • quick_and_unsafe – Creates a deep clone in a quicker manner where text nodes may get lost. It should be safe with trees that don’t contain consecutive text nodes, e.g. freshly parsed, unaltered documents or after TagNode.merge_text_nodes() has been applied.

Returns

A copy of the node.

property content: str

The comment’s text.

property depth: int

The depth (or level) of the node in its tree.

detach(retain_child_nodes: bool = False) _ElementWrappingNode

Removes the node from its tree.

Parameters

retain_child_nodes – Keeps the node’s descendants in the originating tree if True.

Returns

The removed node.

property document: Optional[Document]

The Document instance that the node is associated with or None.

fetch_following(*filter: _delb.typing.Filter) Optional[NodeBase]
Parameters

filter – Any number of filters.

Returns

The next node in document order that matches all filters or None.

fetch_following_sibling(*filter: _delb.typing.Filter) Optional[NodeBase]
Parameters

filter – Any number of filters.

Returns

The next sibling to the right that matches all filters or None.

fetch_preceding(*filter: _delb.typing.Filter) Optional[NodeBase]
Parameters

filter – Any number of filters.

Returns

The previous node in document order that matches all filters or None.

fetch_preceding_sibling(*filter: _delb.typing.Filter) Optional[NodeBase]
Parameters

filter – Any number of filters.

Returns

The next sibling to the left that matches all filters or None.

first_child = None
property full_text: str

The concatenated contents of all text node descendants in document order.

property index: Optional[int]

The node’s index within the parent’s collection of child nodes or None when the node has no parent.

iterate_ancestors(*filter: _delb.typing.Filter) Iterator[TagNode]
Parameters

filter – Any number of filters that a node must match to be yielded.

Returns

A generator iterator that yields the ancestor nodes from bottom to top.

iterate_children(*filter: _delb.typing.Filter, recurse: bool = False) Iterator[NodeBase]

A generator iterator that yields nothing.

iterate_descendants(*filter: _delb.typing.Filter) Iterator[NodeBase]
Parameters

filter – Any number of filters that a node must match to be yielded.

Returns

A generator iterator that yields the descendant nodes of the node.

iterate_following(*filter: _delb.typing.Filter) Iterator[NodeBase]
Parameters

filter – Any number of filters that a node must match to be yielded.

Returns

A generator iterator that yields the following nodes in document order.

iterate_following_siblings(*filter: _delb.typing.Filter) Iterator[NodeBase]
Parameters

filter – Any number of filters that a node must match to be yielded.

Returns

A generator iterator that yields the siblings to the node’s right.

iterate_preceding(*filter: _delb.typing.Filter) Iterator[NodeBase]
Parameters

filter – Any number of filters that a node must match to be yielded.

Returns

A generator iterator that yields the previous nodes in document order.

iterate_preceding_siblings(*filter: _delb.typing.Filter) Iterator[NodeBase]
Parameters

filter – Any number of filters that a node must match to be yielded.

Returns

A generator iterator that yields the siblings to the node’s left.

last_child = None
last_descendant = None
property namespaces: Namespaces

The prefix to namespace mapping of the node.

new_tag_node(local_name: str, attributes: Optional[Dict[str, str]] = None, namespace: Optional[str] = None, children: Sequence[Union[str, NodeBase, _TagDefinition]] = ()) TagNode

Creates a new TagNode instance in the node’s context.

Parameters
  • local_name – The tag name.

  • attributes – Optional attributes that are assigned to the new node.

  • namespace – An optional tag namespace. If none is provided, the context node’s namespace is inherited.

  • children – An optional sequence of objects that will be appended as child nodes. This can be existing nodes, strings that will be inserted as text nodes and in-place definitions of TagNode instances from tag(). The latter will be assigned to the same namespace.

Returns

The newly created tag node.

property parent: Optional[TagNode]

The node’s parent or None.

replace_with(node: Union[str, NodeBase, _TagDefinition], clone: bool = False) NodeBase

Removes the node and places the given one in its tree location.

The node can be a concrete instance of any node type or a rather abstract description in the form of a string or an object returned from the tag() function, from which a TextNode or TagNode instance, respectively, is derived.

Parameters
  • node – The replacing node.

  • clone – A concrete, replacing node is cloned if True.

Returns

The removed node.

xpath(expression: str, namespaces: Optional[Namespaces] = None) QueryResults

See Queries with XPath & CSS for details on the extent of the XPath implementation.

Parameters
  • expression – A supported XPath 1.0 expression that contains one or more location paths.

  • namespaces – A mapping of prefixes that are used in the expression to namespaces. If omitted, the node’s definition is used.

Returns

All nodes that match the evaluation of the provided XPath expression.

Processing instruction

class delb.ProcessingInstructionNode(etree_element: _Element)

The instances of this class represent processing instruction nodes of a tree.

To instantiate new nodes use new_processing_instruction_node().

Properties

content

The processing instruction's text.

depth

The depth (or level) of the node in its tree.

document

The Document instance that the node is associated with or None.

first_child

full_text

The concatenated contents of all text node descendants in document order.

index

The node's index within the parent's collection of child nodes or None when the node has no parent.

last_child

last_descendant

namespaces

The prefix to namespace mapping of the node.

parent

The node's parent or None.

target

The processing instruction's target.

Fetching a single relative node

fetch_following(*filter)

param filter

Any number of filters.

fetch_following_sibling(*filter)

param filter

Any number of filters.

fetch_preceding(*filter)

param filter

Any number of filters.

fetch_preceding_sibling(*filter)

param filter

Any number of filters.

Iterating over relative nodes

iterate_ancestors(*filter)

param filter

Any number of filters that a node must match to be yielded.

iterate_children(*filter[, recurse])

A generator iterator that yields nothing.

iterate_descendants(*filter)

param filter

Any number of filters that a node must match to be yielded.

iterate_following(*filter)

param filter

Any number of filters that a node must match to be yielded.

iterate_following_siblings(*filter)

param filter

Any number of filters that a node must match to be yielded.

iterate_preceding(*filter)

param filter

Any number of filters that a node must match to be yielded.

iterate_preceding_siblings(*filter)

param filter

Any number of filters that a node must match to be yielded.

Querying nodes

xpath(expression[, namespaces])

See Queries with XPath & CSS for details on the extent of the XPath implementation.

Adding nodes

add_following_siblings(*node[, clone])

Adds one or more nodes to the right of the node this method is called on.

add_preceding_siblings(*node[, clone])

Adds one or more nodes to the left of the node this method is called on.

Removing a node from its tree

detach([retain_child_nodes])

Removes the node from its tree.

replace_with(node[, clone])

Removes the node and places the given one in its tree location.

Uncategorized methods

clone([deep, quick_and_unsafe])

param deep

Clones the whole subtree if True.

new_tag_node(local_name[, attributes, ...])

Creates a new TagNode instance in the node's context.


add_following_siblings(*node: Union[str, NodeBase, _TagDefinition], clone: bool = False)

Adds one or more nodes to the right of the node this method is called on.

The nodes can be concrete instances of any node type or rather abstract descriptions in the form of strings or objects returned from the tag() function, from which TextNode or TagNode instances, respectively, are derived.

Parameters
  • node – The node(s) to be added.

  • clone – Clones the concrete nodes before adding if True.

add_preceding_siblings(*node: Union[str, NodeBase, _TagDefinition], clone: bool = False)

Adds one or more nodes to the left of the node this method is called on.

The nodes can be concrete instances of any node type or rather abstract descriptions in the form of strings or objects returned from the tag() function, from which TextNode or TagNode instances, respectively, are derived.

Parameters
  • node – The node(s) to be added.

  • clone – Clones the concrete nodes before adding if True.

clone(deep: bool = False, quick_and_unsafe: bool = False) _ElementWrappingNode
Parameters
  • deep – Clones the whole subtree if True.

  • quick_and_unsafe – Creates a deep clone in a quicker manner where text nodes may get lost. It should be safe with trees that don’t contain consecutive text nodes, e.g. freshly parsed, unaltered documents or after TagNode.merge_text_nodes() has been applied.

Returns

A copy of the node.

property content: str

The processing instruction’s text.

property depth: int

The depth (or level) of the node in its tree.

detach(retain_child_nodes: bool = False) _ElementWrappingNode

Removes the node from its tree.

Parameters

retain_child_nodes – Keeps the node’s descendants in the originating tree if True.

Returns

The removed node.

property document: Optional[Document]

The Document instance that the node is associated with or None.

fetch_following(*filter: _delb.typing.Filter) Optional[NodeBase]
Parameters

filter – Any number of filters.

Returns

The next node in document order that matches all filters or None.

fetch_following_sibling(*filter: _delb.typing.Filter) Optional[NodeBase]
Parameters

filter – Any number of filters.

Returns

The next sibling to the right that matches all filters or None.

fetch_preceding(*filter: _delb.typing.Filter) Optional[NodeBase]
Parameters

filter – Any number of filters.

Returns

The previous node in document order that matches all filters or None.

fetch_preceding_sibling(*filter: _delb.typing.Filter) Optional[NodeBase]
Parameters

filter – Any number of filters.

Returns

The next sibling to the left that matches all filters or None.

first_child = None
property full_text: str

The concatenated contents of all text node descendants in document order.

property index: Optional[int]

The node’s index within the parent’s collection of child nodes or None when the node has no parent.

iterate_ancestors(*filter: _delb.typing.Filter) Iterator[TagNode]
Parameters

filter – Any number of filters that a node must match to be yielded.

Returns

A generator iterator that yields the ancestor nodes from bottom to top.

iterate_children(*filter: _delb.typing.Filter, recurse: bool = False) Iterator[NodeBase]

A generator iterator that yields nothing.

iterate_descendants(*filter: _delb.typing.Filter) Iterator[NodeBase]
Parameters

filter – Any number of filters that a node must match to be yielded.

Returns

A generator iterator that yields the descendant nodes of the node.

iterate_following(*filter: _delb.typing.Filter) Iterator[NodeBase]
Parameters

filter – Any number of filters that a node must match to be yielded.

Returns

A generator iterator that yields the following nodes in document order.

iterate_following_siblings(*filter: _delb.typing.Filter) Iterator[NodeBase]
Parameters

filter – Any number of filters that a node must match to be yielded.

Returns

A generator iterator that yields the siblings to the node’s right.

iterate_preceding(*filter: _delb.typing.Filter) Iterator[NodeBase]
Parameters

filter – Any number of filters that a node must match to be yielded.

Returns

A generator iterator that yields the previous nodes in document order.

iterate_preceding_siblings(*filter: _delb.typing.Filter) Iterator[NodeBase]
Parameters

filter – Any number of filters that a node must match to be yielded.

Returns

A generator iterator that yields the siblings to the node’s left.

last_child = None
last_descendant = None
property namespaces: Namespaces

The prefix to namespace mapping of the node.

new_tag_node(local_name: str, attributes: Optional[Dict[str, str]] = None, namespace: Optional[str] = None, children: Sequence[Union[str, NodeBase, _TagDefinition]] = ()) TagNode

Creates a new TagNode instance in the node’s context.

Parameters
  • local_name – The tag name.

  • attributes – Optional attributes that are assigned to the new node.

  • namespace – An optional tag namespace. If none is provided, the context node’s namespace is inherited.

  • children – An optional sequence of objects that will be appended as child nodes. This can be existing nodes, strings that will be inserted as text nodes and in-place definitions of TagNode instances from tag(). The latter will be assigned to the same namespace.

Returns

The newly created tag node.

property parent: Optional[TagNode]

The node’s parent or None.

replace_with(node: Union[str, NodeBase, _TagDefinition], clone: bool = False) NodeBase

Removes the node and places the given one in its tree location.

The node can be a concrete instance of any node type or a rather abstract description in the form of a string or an object returned from the tag() function, from which a TextNode or TagNode instance, respectively, is derived.

Parameters
  • node – The replacing node.

  • clone – A concrete, replacing node is cloned if True.

Returns

The removed node.

property target: str

The processing instruction’s target.

xpath(expression: str, namespaces: Optional[Namespaces] = None) QueryResults

See Queries with XPath & CSS for details on the extent of the XPath implementation.

Parameters
  • expression – A supported XPath 1.0 expression that contains one or more location paths.

  • namespaces – A mapping of prefixes that are used in the expression to namespaces. If omitted, the node’s definition is used.

Returns

All nodes that match the evaluation of the provided XPath expression.

Tag

class delb.TagNode(etree_element: _Element)

The instances of this class represent tag nodes of a tree, the equivalent of DOM’s elements.

To instantiate new nodes use Document.new_tag_node, TagNode.new_tag_node, TextNode.new_tag_node or new_tag_node().

Some syntactic sugar is baked in:

Attributes and nodes can be tested for membership in a node.

>>> root = Document('<root ham="spam"><child/></root>').root
>>> child = root.first_child
>>> "ham" in root
True
>>> child in root
True

Nodes can be copied. Note that this relies on TagNode.clone().

>>> from copy import copy, deepcopy
>>> root = Document("<root>Content</root>").root
>>> print(copy(root))
<root/>
>>> print(deepcopy(root))
<root>Content</root>

Nodes can be tested for equality regarding their qualified name and attributes.

>>> root = Document('<root><foo x="0"/><foo x="0"/><bar x="0"/></root>').root
>>> root[0] == root[1]
True
>>> root[0] == root[2]
False

Attribute values and child nodes can be obtained with the subscript notation.

>>> root = Document('<root x="0"><child_1/>child_2<child_3/></root>').root
>>> root["x"]
'0'
>>> print(root[0])
<child_1/>
>>> print(root[-1])
<child_3/>
>>> print([str(x) for x in root[1::-1]])
['child_2', '<child_1/>']

How many child nodes does this node have anyway?

>>> root = Document("<root><child_1/><child_2/></root>").root
>>> len(root)
2
>>> len(root[0])
0

As seen in the examples above, a tag node’s string representation yields a serialized XML representation of a sub-/tree.

Properties

attributes

A mapping that can be used to query and alter the node's attributes.

depth

The depth (or level) of the node in its tree.

document

The Document instance that the node is associated with or None.

first_child

The node's first child node.

full_text

The concatenated contents of all text node descendants in document order.

id

This is a shortcut to retrieve and set the id attribute in the XML namespace.

index

The node's index within the parent's collection of child nodes or None when the node has no parent.

last_child

The node's last child node.

last_descendant

The node's last descendant.

local_name

The node's name.

location_path

An unambiguous XPath location path that points to this node from its tree root.

namespace

The node's namespace.

namespaces

The prefix to namespace mapping of the node.

parent

The node's parent or None.

prefix

The prefix that the node's namespace is currently mapped to.

universal_name

The node's qualified name in Clark notation.

Fetching a single relative node

fetch_following(*filter)

param filter

Any number of filters.

fetch_following_sibling(*filter)

param filter

Any number of filters.

fetch_preceding(*filter)

param filter

Any number of filters.

fetch_preceding_sibling(*filter)

param filter

Any number of filters.

Iterating over relative nodes

iterate_ancestors(*filter)

param filter

Any number of filters that a node must match to be yielded.

iterate_children(*filter[, recurse])

param filter

Any number of filters that a node must match to be yielded.

iterate_descendants(*filter)

param filter

Any number of filters that a node must match to be yielded.

iterate_following(*filter)

param filter

Any number of filters that a node must match to be yielded.

iterate_following_siblings(*filter)

param filter

Any number of filters that a node must match to be yielded.

iterate_preceding(*filter)

param filter

Any number of filters that a node must match to be yielded.

iterate_preceding_siblings(*filter)

param filter

Any number of filters that a node must match to be yielded.

Querying nodes

css_select(expression[, namespaces])

See Queries with XPath & CSS regarding the extent of the supported grammar.

fetch_or_create_by_xpath(expression[, ...])

Fetches a single node that is locatable by the provided XPath expression.

xpath(expression[, namespaces])

See Queries with XPath & CSS for details on the extent of the XPath implementation.

Adding nodes

add_following_siblings(*node[, clone])

Adds one or more nodes to the right of the node this method is called on.

add_preceding_siblings(*node[, clone])

Adds one or more nodes to the left of the node this method is called on.

append_children(*node[, clone])

Adds one or more nodes as child nodes after any existing child nodes of the node this method is called on.

insert_children(index, *node[, clone])

Inserts one or more child nodes.

prepend_children(*node[, clone])

Adds one or more nodes as child nodes before any existing child nodes of the node this method is called on.

Removing a node from its tree

detach([retain_child_nodes])

Removes the node from its tree.

replace_with(node[, clone])

Removes the node and places the given one in its tree location.

Uncategorized methods

clone([deep, quick_and_unsafe])

param deep

Clones the whole subtree if True.

merge_text_nodes()

Merges all consecutive text nodes in the subtree into one.

new_tag_node(local_name[, attributes, ...])

Creates a new TagNode instance in the node's context.

parse(text[, parser, parser_options, ...])

Parses the given string or bytes sequence into a new tree.


add_following_siblings(*node: Union[str, NodeBase, _TagDefinition], clone: bool = False)

Adds one or more nodes to the right of the node this method is called on.

The nodes can be concrete instances of any node type or abstract descriptions in the form of strings or objects returned from the tag() function, from which TextNode or TagNode instances, respectively, are derived.

Parameters
  • node – The node(s) to be added.

  • clone – Clones the concrete nodes before adding if True.

add_preceding_siblings(*node: Union[str, NodeBase, _TagDefinition], clone: bool = False)

Adds one or more nodes to the left of the node this method is called on.

The nodes can be concrete instances of any node type or abstract descriptions in the form of strings or objects returned from the tag() function, from which TextNode or TagNode instances, respectively, are derived.

Parameters
  • node – The node(s) to be added.

  • clone – Clones the concrete nodes before adding if True.

append_children(*node: Union[str, NodeBase, _TagDefinition], clone: bool = False)[source]

Adds one or more nodes as child nodes after any existing child nodes of the node this method is called on.

The nodes can be concrete instances of any node type or abstract descriptions in the form of strings or objects returned from the tag() function, from which TextNode or TagNode instances, respectively, are derived.

Parameters
  • node – The node(s) to be added.

  • clone – Clones the concrete nodes before adding if True.

property attributes: TagAttributes

A mapping that can be used to query and alter the node’s attributes.

>>> node = new_tag_node("node", attributes={"foo": "0", "bar": "0"})
>>> node.attributes
{'foo': '0', 'bar': '0'}
>>> node.attributes.pop("bar")
'0'
>>> node.attributes["foo"] = "1"
>>> node.attributes["peng"] = "1"
>>> print(node)
<node foo="1" peng="1"/>
>>> node.attributes.update({"foo": "2", "zong": "2"})
>>> print(node)
<node foo="2" peng="1" zong="2"/>

Namespaced attributes can be accessed by using Python’s slice notation. Attributes in a default namespace can be addressed with or without that namespace.

>>> node = new_tag_node("node", {})
>>> node.attributes["http://namespace":"foo"] = "0"
>>> print(node)
<node xmlns:ns0="http://namespace" ns0:foo="0"/>
>>> node = Document('<node xmlns="default" foo="0"/>').root
>>> node.attributes["default":"foo"] is node.attributes["foo"]
True

Attributes behave like strings, but also expose namespace, local name and value for manipulation.

>>> node = new_tag_node("node")
>>> node.attributes["foo"] = "0"
>>> node.attributes["foo"].local_name = "bar"
>>> node.attributes["bar"].namespace = "http://namespace"
>>> node.attributes["http://namespace":"bar"].value = "1"
>>> print(node)
<node xmlns:ns0="http://namespace" ns0:bar="1"/>

Unlike with typical Python mappings, requesting a non-existing attribute doesn’t raise a KeyError; None is returned instead.
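The slice notation used above is plain Python: an expression like attributes["ns":"name"] passes slice("ns", "name", None) to __getitem__. The following toy mapping is a hypothetical sketch, not delb’s TagAttributes class. It only shows how namespaced keys, a default namespace and the None-for-missing behaviour can be modelled on top of that mechanism, assuming attributes are stored under (namespace, local name) pairs.

```python
from typing import Dict, Optional, Tuple, Union

Key = Union[str, slice]


class ToyAttributes:
    """A hypothetical, simplified attribute mapping; not delb's TagAttributes."""

    def __init__(self, default_namespace: Optional[str] = None) -> None:
        self.default_namespace = default_namespace
        self._data: Dict[Tuple[Optional[str], str], str] = {}

    def _resolve(self, key: Key) -> Tuple[Optional[str], str]:
        # attrs["ns":"name"] reaches __getitem__ as slice("ns", "name", None).
        if isinstance(key, slice):
            return key.start, key.stop
        # Unqualified names are resolved against the default namespace.
        return self.default_namespace, key

    def __setitem__(self, key: Key, value: str) -> None:
        self._data[self._resolve(key)] = value

    def __getitem__(self, key: Key) -> Optional[str]:
        # Unlike dict, a missing key yields None instead of raising KeyError.
        return self._data.get(self._resolve(key))
```

With ToyAttributes(default_namespace="default"), assigning attrs["foo"] makes the value retrievable as both attrs["foo"] and attrs["default":"foo"].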

clone(deep: bool = False, quick_and_unsafe: bool = False) TagNode[source]
Parameters
  • deep – Clones the whole subtree if True.

  • quick_and_unsafe – Creates a deep clone in a quicker manner where text nodes may get lost. It should be safe with trees that don’t contain consecutive text nodes, e.g. freshly parsed, unaltered documents or trees after TagNode.merge_text_nodes() has been applied.

Returns

A copy of the node.

css_select(expression: str, namespaces: Optional[Namespaces] = None) QueryResults[source]

See Queries with XPath & CSS regarding the extent of the supported grammar.

Namespace prefixes are delimited with a | before a name test. For example, div svg|metadata selects all nodes named metadata whose namespace is mapped to the svg prefix and that are descendants of nodes named div which belong to the default namespace or have no namespace.

Parameters
  • expression – A CSS selector expression.

  • namespaces – A mapping of prefixes that are used in the expression to namespaces. If omitted, the node’s definition is used.

Returns

All nodes that match the evaluation of the provided CSS selector expression.

property depth: int

The depth (or level) of the node in its tree.

detach(retain_child_nodes: bool = False) _ElementWrappingNode[source]

Removes the node from its tree.

Parameters

retain_child_nodes – Keeps the node’s descendants in the originating tree if True.

Returns

The removed node.
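Conceptually, detaching with retain_child_nodes=True splices the node’s children into its parent at the node’s former position. The sketch below models that with plain Python lists on a hypothetical ToyNode class; it is an illustration under that simplification, not delb’s implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity comparison, so list.index() finds this exact node
class ToyNode:
    """A hypothetical minimal tree node used only for this sketch."""

    name: str
    children: List["ToyNode"] = field(default_factory=list)
    parent: Optional["ToyNode"] = field(default=None, repr=False)

    def append(self, *nodes: "ToyNode") -> None:
        for node in nodes:
            node.parent = self
            self.children.append(node)

    def detach(self, retain_child_nodes: bool = False) -> "ToyNode":
        parent = self.parent
        if parent is None:
            return self
        index = parent.children.index(self)
        replacement: List[ToyNode] = []
        if retain_child_nodes:
            # The children take over the detached node's position in the parent.
            replacement = self.children
            self.children = []
            for child in replacement:
                child.parent = parent
        # Replace the node's slot with its children (or with nothing).
        parent.children[index:index + 1] = replacement
        self.parent = None
        return self
```

Unwrapping an element this way leaves its content in place, which is what the ResolveChoice transformation in the Transformations section relies on.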

property document: Optional[Document]

The Document instance that the node is associated with, or None.

fetch_following(*filter: _delb.typing.Filter) Optional[NodeBase]
Parameters

filter – Any number of filters.

Returns

The next node in document order that matches all filters or None.

fetch_following_sibling(*filter: _delb.typing.Filter) Optional[NodeBase]
Parameters

filter – Any number of filters.

Returns

The next sibling to the right that matches all filters or None.

fetch_or_create_by_xpath(expression: str, namespaces: Union[Namespaces, None, Mapping[Optional[str], str]] = None) TagNode[source]

Fetches a single node that is locatable by the provided XPath expression. If the node doesn’t exist, the non-existing branch will be created. The expression must adhere to these rules:

  • All location steps must use the child axis.

  • Each step needs to provide a name test.

  • Attributes must be compared against a literal.

  • Multiple attribute comparisons must be joined with the and operator and/or be spread over multiple predicate expressions.

  • The logical validity of multiple attribute comparisons isn’t checked. E.g. one could provide foo[@p="her"][@p="him"], but should then expect undefined behaviour.

  • Other contents in predicate expressions are invalid.

>>> document = Document("<root/>")
>>> grandchild = document.root.fetch_or_create_by_xpath(
...     "child[@a='b']/grandchild"
... )
>>> grandchild is document.root.fetch_or_create_by_xpath(
...     "child[@a='b']/grandchild"
... )
True
>>> str(document)
'<root><child a="b"><grandchild/></child></root>'
Parameters
  • expression – An XPath expression that can unambiguously locate a descendant node in a tree, whatever state that tree is in.

  • namespaces – An optional mapping of prefixes to namespaces. By default, the node’s namespaces are used.

Returns

The existing or freshly created node described by expression.
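To make the rules above concrete, here is a deliberately simplified sketch of the fetch-or-create strategy on a hypothetical ToyTag class. It only understands child steps with a name test and predicates that compare attributes with quoted literals, ignores namespaces, and naively assumes that literals contain neither "/" nor "]". It is an illustration, not delb’s implementation.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# A step is a name test followed by any number of [...] predicates.
STEP = re.compile(r"^(?P<name>[\w.-]+)(?P<predicates>(?:\[[^\]]*\])*)$")
PREDICATE = re.compile(r"\[([^\]]*)\]")
# A single comparison of an attribute with a quoted literal.
COMPARISON = re.compile(r"""^\s*@(?P<attr>[\w.-]+)\s*=\s*(?P<q>['"])(?P<value>.*?)(?P=q)\s*$""")


@dataclass(eq=False)
class ToyTag:
    """A hypothetical element with a name, attributes and child elements."""

    name: str
    attributes: Dict[str, str] = field(default_factory=dict)
    children: List["ToyTag"] = field(default_factory=list)


def parse_step(step: str) -> Tuple[str, Dict[str, str]]:
    match = STEP.match(step)
    if match is None:  # e.g. an empty step from "//", "*", "..", "@x"
        raise ValueError(f"Unsupported location step: {step!r}")
    required: Dict[str, str] = {}
    for predicate in PREDICATE.findall(match.group("predicates")):
        for comparison in predicate.split(" and "):
            parsed = COMPARISON.match(comparison)
            if parsed is None:
                raise ValueError(f"Unsupported predicate: {predicate!r}")
            # Conflicting comparisons simply overwrite each other here.
            required[parsed.group("attr")] = parsed.group("value")
    return match.group("name"), required


def fetch_or_create(node: ToyTag, expression: str) -> ToyTag:
    for step in expression.split("/"):  # child axis only
        name, required = parse_step(step)
        for child in node.children:
            if child.name == name and all(
                child.attributes.get(k) == v for k, v in required.items()
            ):
                node = child  # an existing child satisfies the step
                break
        else:
            # No match: create the missing branch with the required attributes.
            new_child = ToyTag(name, dict(required))
            node.children.append(new_child)
            node = new_child
    return node
```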

fetch_preceding(*filter: _delb.typing.Filter) Optional[NodeBase]
Parameters

filter – Any number of filters.

Returns

The previous node in document order that matches all filters or None.

fetch_preceding_sibling(*filter: _delb.typing.Filter) Optional[NodeBase]
Parameters

filter – Any number of filters.

Returns

The closest sibling to the left that matches all filters or None.

property first_child: Optional[NodeBase]

The node’s first child node.

property full_text: str

The concatenated contents of all text node descendants in document order.

property id: Optional[str]

This is a shortcut to retrieve and set the id attribute in the XML namespace. The client code is responsible for passing properly formed id values.

property index: Optional[int]

The node’s index within the parent’s collection of child nodes or None when the node has no parent.

insert_children(index: int, *node: Union[str, NodeBase, _TagDefinition], clone: bool = False)[source]

Inserts one or more child nodes.

The nodes can be concrete instances of any node type or abstract descriptions in the form of strings or objects returned from the tag() function, from which TextNode or TagNode instances, respectively, are derived.

Parameters
  • index – The index at which the first of the given nodes will be inserted, the remaining nodes are added afterwards in the given order.

  • node – The node(s) to be added.

  • clone – Clones the concrete nodes before adding if True.
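The positional semantics of insert_children(), append_children() and prepend_children() can be modelled with Python’s slice assignment on a plain list. The functions below are hypothetical stand-ins that operate on strings; delb’s methods additionally accept tag() definitions and handle cloning.

```python
from typing import List


def insert_children(children: List[str], index: int, *nodes: str) -> None:
    # The first node lands at `index`; the others follow in the given order.
    children[index:index] = nodes


def append_children(children: List[str], *nodes: str) -> None:
    # Same as inserting after all existing children.
    insert_children(children, len(children), *nodes)


def prepend_children(children: List[str], *nodes: str) -> None:
    # Same as inserting before all existing children.
    insert_children(children, 0, *nodes)
```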

iterate_ancestors(*filter: _delb.typing.Filter) Iterator[TagNode]
Parameters

filter – Any number of filters that a node must match to be yielded.

Returns

A generator iterator that yields the ancestor nodes from bottom to top.
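The iteration methods share one filtering rule: a node is only yielded if it matches all given filters. The sketch below applies that rule to a bottom-to-top walk over parent references on a hypothetical ToyNode class; it illustrates the described behaviour and is not delb’s source.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, Optional


@dataclass(eq=False)
class ToyNode:
    """A hypothetical node that only knows its name and parent."""

    name: str
    parent: Optional["ToyNode"] = field(default=None, repr=False)


Filter = Callable[[ToyNode], bool]


def iterate_ancestors(node: ToyNode, *filter: Filter) -> Iterator[ToyNode]:
    # Walk the parent references from bottom to top.
    current = node.parent
    while current is not None:
        # Yield the ancestor only if it matches all given filters.
        if all(f(current) for f in filter):
            yield current
        current = current.parent
```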

iterate_children(*filter: _delb.typing.Filter, recurse: bool = False) Iterator[NodeBase][source]
Parameters
  • filter – Any number of filters that a node must match to be yielded.

  • recurse – Deprecated. Use NodeBase.iterate_descendants().

Returns

A generator iterator that yields the child nodes of the node.

iterate_descendants(*filter: _delb.typing.Filter) Iterator[NodeBase][source]
Parameters

filter – Any number of filters that a node must match to be yielded.

Returns

A generator iterator that yields the descendant nodes of the node.

iterate_following(*filter: _delb.typing.Filter) Iterator[NodeBase]
Parameters

filter – Any number of filters that a node must match to be yielded.

Returns

A generator iterator that yields the following nodes in document order.

iterate_following_siblings(*filter: _delb.typing.Filter) Iterator[NodeBase]
Parameters

filter – Any number of filters that a node must match to be yielded.

Returns

A generator iterator that yields the siblings to the node’s right.

iterate_preceding(*filter: _delb.typing.Filter) Iterator[NodeBase]
Parameters

filter – Any number of filters that a node must match to be yielded.

Returns

A generator iterator that yields the previous nodes in document order.

iterate_preceding_siblings(*filter: _delb.typing.Filter) Iterator[NodeBase]
Parameters

filter – Any number of filters that a node must match to be yielded.

Returns

A generator iterator that yields the siblings to the node’s left.

property last_child: Optional[NodeBase]

The node’s last child node.

property last_descendant: Optional[NodeBase]

The node’s last descendant.

property local_name: str

The node’s name.

property location_path: str

An unambiguous XPath location path that points to this node from its tree root.

merge_text_nodes()[source]

Merges all consecutive text nodes in the subtree into one.
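As a rough model of what merging consecutive text nodes means, the sketch below represents a subtree as a hypothetical list in which strings stand for text nodes and dicts with a "children" key stand for tag nodes. After such a merge, a tree no longer contains consecutive text nodes, which is the condition under which clone(quick_and_unsafe=True) is described as safe. This is not delb’s implementation.

```python
from typing import List, Union

# Hypothetical child sequence: str = text node, dict = tag node.
Child = Union[str, dict]


def merge_text_nodes(children: List[Child]) -> List[Child]:
    merged: List[Child] = []
    for child in children:
        if isinstance(child, str) and merged and isinstance(merged[-1], str):
            # A text node directly after another one is concatenated to it.
            merged[-1] += child
        else:
            merged.append(child)
    for child in merged:
        if isinstance(child, dict):
            # Recurse, because the merge applies to the whole subtree.
            child["children"] = merge_text_nodes(child.get("children", []))
    return merged
```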

property namespace: Optional[str]

The node’s namespace. Be aware that while this property can be set to None, serializations will continue to render a previous default namespace declaration if the node had one.

property namespaces: Namespaces

The prefix to namespace mapping of the node.

new_tag_node(local_name: str, attributes: Optional[Dict[str, str]] = None, namespace: Optional[str] = None, children: Sequence[Union[str, NodeBase, _TagDefinition]] = ()) TagNode[source]

Creates a new TagNode instance in the node’s context.

Parameters
  • local_name – The tag name.

  • attributes – Optional attributes that are assigned to the new node.

  • namespace – An optional tag namespace. If none is provided, the context node’s namespace is inherited.

  • children – An optional sequence of objects that will be appended as child nodes. These can be existing nodes, strings that will be inserted as text nodes, and in-place definitions of TagNode instances from tag(). The latter will be assigned to the same namespace.

Returns

The newly created tag node.

property parent: Optional[TagNode]

The node’s parent or None.

static parse(text: AnyStr, parser: Optional[XMLParser] = None, parser_options: Optional[ParserOptions] = None, collapse_whitespace: Optional[bool] = None) TagNode[source]

Parses the given string or bytes sequence into a new tree.

Parameters
  • text – A serialized XML tree.

  • parser – Deprecated.

  • parser_options – A delb.ParserOptions instance to configure the used parser.

  • collapse_whitespace – Deprecated. Use the argument with the same name on the parser_options object.

property prefix: Optional[str]

The prefix that the node’s namespace is currently mapped to.

prepend_children(*node: NodeBase, clone: bool = False) None[source]

Adds one or more nodes as child nodes before any existing child nodes of the node this method is called on.

The nodes can be concrete instances of any node type or abstract descriptions in the form of strings or objects returned from the tag() function, from which TextNode or TagNode instances, respectively, are derived.

Parameters
  • node – The node(s) to be added.

  • clone – Clones the concrete nodes before adding if True.

replace_with(node: Union[str, NodeBase, _TagDefinition], clone: bool = False) NodeBase

Removes the node and places the given one in its tree location.

The node can be a concrete instance of any node type or an abstract description in the form of a string or an object returned from the tag() function, from which a TextNode or TagNode instance, respectively, is derived.

Parameters
  • node – The replacing node.

  • clone – A concrete, replacing node is cloned if True.

Returns

The removed node.

property universal_name: str

The node’s qualified name in Clark notation.
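Clark notation, as also used by Python’s ElementTree and lxml, writes a namespaced name as {namespace}local_name, while a name without a namespace is just its local name. The two helpers below are hypothetical and only illustrate the format; they are not part of delb.

```python
from typing import Optional, Tuple


def to_clark(namespace: Optional[str], local_name: str) -> str:
    # Wrap the namespace URI in braces and prepend it to the local name.
    return f"{{{namespace}}}{local_name}" if namespace else local_name


def from_clark(universal_name: str) -> Tuple[Optional[str], str]:
    # Split "{namespace}local_name" back into its two parts.
    if universal_name.startswith("{"):
        namespace, _, local_name = universal_name[1:].partition("}")
        return namespace, local_name
    return None, universal_name
```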

xpath(expression: str, namespaces: Optional[Namespaces] = None) QueryResults[source]

See Queries with XPath & CSS for details on the extent of the XPath implementation.

Parameters
  • expression – A supported XPath 1.0 expression that contains one or more location paths.

  • namespaces – A mapping of prefixes that are used in the expression to namespaces. If omitted, the node’s definition is used.

Returns

All nodes that match the evaluation of the provided XPath expression.

Tag attribute

class delb.nodes.Attribute(attributes: TagAttributes, key: str)[source]

Attribute objects represent a tag node’s attributes. See the documentation of delb.TagNode.attributes for their capabilities.

property local_name: str

The attribute’s local name.

property namespace: Optional[str]

The attribute’s namespace.

property universal_name: str

The attribute’s namespace and local name in Clark notation.

property value: str

The attribute’s value.

Text

class delb.TextNode(reference_or_text: Union[_Element, str, TextNode], position: int = 0)[source]

TextNodes contain the textual data of a document. The class shouldn’t be instantiated by client code; just add strings to the trees instead.

Instances expose all methods of str except str.index():

>>> node = TextNode("Show us the way to the next whisky bar.")
>>> node.split()
['Show', 'us', 'the', 'way', 'to', 'the', 'next', 'whisky', 'bar.']

Instances can be tested for equality with other text nodes and strings:

>>> TextNode("ham") == TextNode("spam")
False
>>> TextNode("Patsy") == "Patsy"
True

And they can be tested for substrings:

>>> "Sir" in TextNode("Sir Bedevere the Wise")
True

Attributes that relate to child nodes yield nothing or None, respectively.

Properties

content

The node's text content.

depth

The depth (or level) of the node in its tree.

document

The Document instance that the node is associated with, or None.

first_child

full_text

The concatenated contents of all text node descendants in document order.

index

The node's index within the parent's collection of child nodes or None when the node has no parent.

last_child

last_descendant

namespaces

The prefix to namespace mapping of the node.

parent

The node's parent or None.

Fetching a single relative node

fetch_following(*filter)

param filter

Any number of filters.

fetch_following_sibling(*filter)

param filter

Any number of filters.

fetch_preceding(*filter)

param filter

Any number of filters.

fetch_preceding_sibling(*filter)

param filter

Any number of filters.

Iterating over relative nodes

iterate_ancestors(*filter)

param filter

Any number of filters that a node must match to be yielded.

iterate_children(*filter[, recurse])

A generator iterator that yields nothing.

iterate_descendants(*filter)

param filter

Any number of filters that a node must match to be yielded.

iterate_following(*filter)

param filter

Any number of filters that a node must match to be yielded.

iterate_following_siblings(*filter)

param filter

Any number of filters that a node must match to be yielded.

iterate_preceding(*filter)

param filter

Any number of filters that a node must match to be yielded.

iterate_preceding_siblings(*filter)

param filter

Any number of filters that a node must match to be yielded.

Querying nodes

xpath(expression[, namespaces])

See Queries with XPath & CSS for details on the extent of the XPath implementation.

Adding nodes

add_following_siblings(*node[, clone])

Adds one or more nodes to the right of the node this method is called on.

add_preceding_siblings(*node[, clone])

Adds one or more nodes to the left of the node this method is called on.

Removing a node from its tree

detach([retain_child_nodes])

Removes the node from its tree.

replace_with(node[, clone])

Removes the node and places the given one in its tree location.

Uncategorized methods

clone([deep, quick_and_unsafe])

param deep

Clones the whole subtree if True.

new_tag_node(local_name[, attributes, ...])

Creates a new TagNode instance in the node's context.


add_following_siblings(*node: Union[str, NodeBase, _TagDefinition], clone: bool = False)

Adds one or more nodes to the right of the node this method is called on.

The nodes can be concrete instances of any node type or abstract descriptions in the form of strings or objects returned from the tag() function, from which TextNode or TagNode instances, respectively, are derived.

Parameters
  • node – The node(s) to be added.

  • clone – Clones the concrete nodes before adding if True.

add_preceding_siblings(*node: Union[str, NodeBase, _TagDefinition], clone: bool = False)

Adds one or more nodes to the left of the node this method is called on.

The nodes can be concrete instances of any node type or abstract descriptions in the form of strings or objects returned from the tag() function, from which TextNode or TagNode instances, respectively, are derived.

Parameters
  • node – The node(s) to be added.

  • clone – Clones the concrete nodes before adding if True.

clone(deep: bool = False, quick_and_unsafe: bool = False) NodeBase[source]
Parameters
  • deep – Clones the whole subtree if True.

  • quick_and_unsafe – Creates a deep clone in a quicker manner where text nodes may get lost. It should be safe with trees that don’t contain consecutive text nodes, e.g. freshly parsed, unaltered documents or trees after TagNode.merge_text_nodes() has been applied.

Returns

A copy of the node.

property content: str

The node’s text content.

property depth: int

The depth (or level) of the node in its tree.

detach(retain_child_nodes: bool = False) TextNode[source]

Removes the node from its tree.

Parameters

retain_child_nodes – Keeps the node’s descendants in the originating tree if True.

Returns

The removed node.

property document: Optional[Document]

The Document instance that the node is associated with, or None.

fetch_following(*filter: _delb.typing.Filter) Optional[NodeBase]
Parameters

filter – Any number of filters.

Returns

The next node in document order that matches all filters or None.

fetch_following_sibling(*filter: _delb.typing.Filter) Optional[NodeBase]
Parameters

filter – Any number of filters.

Returns

The next sibling to the right that matches all filters or None.

fetch_preceding(*filter: _delb.typing.Filter) Optional[NodeBase]
Parameters

filter – Any number of filters.

Returns

The previous node in document order that matches all filters or None.

fetch_preceding_sibling(*filter: _delb.typing.Filter) Optional[NodeBase][source]
Parameters

filter – Any number of filters.

Returns

The closest sibling to the left that matches all filters or None.

first_child = None
property full_text: str

The concatenated contents of all text node descendants in document order.

property index: Optional[int]

The node’s index within the parent’s collection of child nodes or None when the node has no parent.

iterate_ancestors(*filter: _delb.typing.Filter) Iterator[TagNode]
Parameters

filter – Any number of filters that a node must match to be yielded.

Returns

A generator iterator that yields the ancestor nodes from bottom to top.

iterate_children(*filter: _delb.typing.Filter, recurse: bool = False) Iterator[NodeBase]

A generator iterator that yields nothing.

iterate_descendants(*filter: _delb.typing.Filter) Iterator[NodeBase]
Parameters

filter – Any number of filters that a node must match to be yielded.

Returns

A generator iterator that yields the descendant nodes of the node.

iterate_following(*filter: _delb.typing.Filter) Iterator[NodeBase]
Parameters

filter – Any number of filters that a node must match to be yielded.

Returns

A generator iterator that yields the following nodes in document order.

iterate_following_siblings(*filter: _delb.typing.Filter) Iterator[NodeBase]
Parameters

filter – Any number of filters that a node must match to be yielded.

Returns

A generator iterator that yields the siblings to the node’s right.

iterate_preceding(*filter: _delb.typing.Filter) Iterator[NodeBase]
Parameters

filter – Any number of filters that a node must match to be yielded.

Returns

A generator iterator that yields the previous nodes in document order.

iterate_preceding_siblings(*filter: _delb.typing.Filter) Iterator[NodeBase]
Parameters

filter – Any number of filters that a node must match to be yielded.

Returns

A generator iterator that yields the siblings to the node’s left.

last_child = None
last_descendant = None
property namespaces

The prefix to namespace mapping of the node.

new_tag_node(local_name: str, attributes: Optional[Dict[str, str]] = None, namespace: Optional[str] = None, children: Sequence[Union[str, NodeBase, _TagDefinition]] = ()) TagNode

Creates a new TagNode instance in the node’s context.

Parameters
  • local_name – The tag name.

  • attributes – Optional attributes that are assigned to the new node.

  • namespace – An optional tag namespace. If none is provided, the context node’s namespace is inherited.

  • children – An optional sequence of objects that will be appended as child nodes. These can be existing nodes, strings that will be inserted as text nodes, and in-place definitions of TagNode instances from tag(). The latter will be assigned to the same namespace.

Returns

The newly created tag node.

property parent: Optional[TagNode]

The node’s parent or None.

replace_with(node: Union[str, NodeBase, _TagDefinition], clone: bool = False) NodeBase

Removes the node and places the given one in its tree location.

The node can be a concrete instance of any node type or an abstract description in the form of a string or an object returned from the tag() function, from which a TextNode or TagNode instance, respectively, is derived.

Parameters
  • node – The replacing node.

  • clone – A concrete, replacing node is cloned if True.

Returns

The removed node.

xpath(expression: str, namespaces: Optional[Namespaces] = None) QueryResults

See Queries with XPath & CSS for details on the extent of the XPath implementation.

Parameters
  • expression – A supported XPath 1.0 expression that contains one or more location paths.

  • namespaces – A mapping of prefixes that are used in the expression to namespaces. If omitted, the node’s definition is used.

Returns

All nodes that match the evaluation of the provided XPath expression.

Node constructors

delb.new_comment_node(content: str) CommentNode[source]

Creates a new CommentNode.

Parameters

content – The comment’s content, a.k.a. its text.

Returns

The newly created comment node.

delb.new_processing_instruction_node(target: str, content: str) ProcessingInstructionNode[source]

Creates a new ProcessingInstructionNode.

Parameters
  • target – The processing instruction’s target name.

  • content – The processing instruction’s text.

Returns

The newly created processing instruction node.

delb.new_tag_node(local_name: str, attributes: Optional[Dict[str, str]] = None, namespace: Optional[str] = None, children: Sequence[Union[str, NodeBase, _TagDefinition]] = ()) TagNode[source]

Creates a new TagNode instance outside any context. Where possible, prefer the new_tag_node() method of documents and nodes, which uses that instance as the creation context.

Parameters
  • local_name – The tag name.

  • attributes – Optional attributes that are assigned to the new node.

  • namespace – An optional tag namespace.

  • children – An optional sequence of objects that will be appended as child nodes. These can be existing nodes, strings that will be inserted as text nodes, and in-place definitions of TagNode instances from tag(). The latter will be assigned to the same namespace.

Returns

The newly created tag node.

Queries with XPath & CSS

delb allows querying of nodes with CSS selector and XPath expressions. CSS selectors are converted to XPath expressions with a third-party library before evaluation and they are only supported as far as their computed XPath equivalents are supported by delb’s very own XPath implementation.

This implementation is not fully compliant with any of the W3C’s XPath specifications. It mostly covers the XPath 1.0 specification, but focuses on querying via path expressions with simple constraints and omits most computational features (that’s what programming languages are for). Hence it has these intended deviations from that standard:

  • Default namespaces can be addressed in node and attribute names, by simply using no prefix.

  • The attribute and namespace axes are not supported in location steps (see also below).

  • In predicates only the attribute axis can be used in its abbreviated form (@name).

  • Path evaluations within predicates are not available.

  • Only these predicate functions are provided and tested:
    • boolean

    • concat

    • contains

    • last

    • not

    • position

    • starts-with

    • text
      • Behaves as if used as a single-step location path that only tests for the node type text. Hence it returns the contents of the context node’s first child node that is a text node, or an empty string when there is none.

    • Please refrain from extension requests without a proper, concrete implementation proposal.

If you’re accustomed to retrieving attribute values with XPath expressions, employ the functionality of the programming language at hand instead, like this:

>>> [x.attributes["target"] for x in root.xpath(".//foo")
...  if "target" in x.attributes]

Instead of:

>>> root.xpath(".//foo/@target")  

See _delb.plugins.PluginManager.register_xpath_function() regarding the use of custom functions.

class _delb.xpath.EvaluationContext(node: NodeBase, position: int, size: int, namespaces: Namespaces)[source]

Instances of this type are passed to XPath functions in order to pass contextual information.

count(value, /)

Return number of occurrences of value.

index(value, start=0, stop=9223372036854775807, /)

Return first index of value.

Raises ValueError if the value is not present.

property namespaces

A mapping of prefixes to namespaces that is used in the whole evaluation.

property node

The node that is evaluated.

property position

The node’s position within all nodes that matched a location step’s node test in order of the step’s axis’ direction. The first position is 1.

property size

The number of all nodes that matched a location step’s node test.
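In XPath 1.0, the position() and last() functions return the context position and the context size, which correspond to the position and size properties above. The sketch below uses a hypothetical ToyContext named tuple (namespaces omitted) to show how a predicate sees those values while it filters the nodes a step matched; it is not delb’s evaluation code.

```python
from typing import Callable, List, NamedTuple


class ToyContext(NamedTuple):
    """A hypothetical stand-in for an evaluation context (namespaces omitted)."""

    node: str
    position: int  # 1-based, in the order of the step's axis
    size: int  # number of nodes that matched the step's node test


def apply_predicate(
    matches: List[str], predicate: Callable[[ToyContext], bool]
) -> List[str]:
    size = len(matches)
    # Each candidate is evaluated with its own position and the shared size.
    return [
        node
        for position, node in enumerate(matches, start=1)
        if predicate(ToyContext(node, position, size))
    ]
```

For the matches ["a", "b", "c"], the equivalent of the predicate [last()] is lambda ctx: ctx.position == ctx.size, which keeps only "c".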

class _delb.xpath.QueryResults(results: Iterable[NodeBase])[source]

A container that includes the results of a CSS selector or XPath query with some helpers for better readable Python expressions.

as_list() List[NodeBase][source]

The contained nodes as a new list.

property as_tuple: Tuple[NodeBase, ...]

The contained nodes in a tuple.

count(value) integer -- return number of occurrences of value
filtered_by(*filters: _delb.typing.Filter) QueryResults[source]

Returns another QueryResults instance that contains all nodes filtered by the provided filters.

property first: Optional[NodeBase]

The first node from the results or None if there are none.

in_document_order() QueryResults[source]

Returns another QueryResults instance where the contained nodes are sorted in document order.

index(value[, start[, stop]]) integer -- return first index of value.

Raises ValueError if the value is not present.

Supporting start and stop arguments is optional, but recommended.

property last: Optional[NodeBase]

The last node from the results or None if there are none.

property size: int

The number of contained nodes.
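The count() and index() entries above carry the docstrings of the mixin methods of Python’s collections.abc.Sequence, which suggests that QueryResults builds on that abstract base class. Assuming so, the simplified, hypothetical container below shows how such helpers can sit on top of a Sequence; it is not delb’s class and leaves out in_document_order().

```python
from collections.abc import Sequence
from typing import Callable, Iterable, List, Optional, Tuple

Filter = Callable[[object], bool]


class ToyQueryResults(Sequence):
    """A hypothetical, simplified results container; not delb's class."""

    def __init__(self, results: Iterable[object]) -> None:
        self._items = tuple(results)

    # Sequence only requires these two; count() and index() come for free.
    def __getitem__(self, index):
        return self._items[index]

    def __len__(self) -> int:
        return len(self._items)

    @property
    def size(self) -> int:
        return len(self._items)

    @property
    def first(self) -> Optional[object]:
        return self._items[0] if self._items else None

    @property
    def last(self) -> Optional[object]:
        return self._items[-1] if self._items else None

    @property
    def as_tuple(self) -> Tuple[object, ...]:
        return self._items

    def as_list(self) -> List[object]:
        return list(self._items)

    def filtered_by(self, *filters: Filter) -> "ToyQueryResults":
        # Keep only the items that match all given filters.
        return ToyQueryResults(x for x in self._items if all(f(x) for f in filters))
```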

Filters

Default filters

delb.altered_default_filters(*filter: _delb.typing.Filter, extend: bool = False)[source]

This function can be used either as a context manager or as a decorator to define a set of default_filters for the enclosed code block or callable. These are then applied in all operations that allow node filtering, like TagNode.fetch_following(). Mind that they also affect a node’s index property and indexed access to child nodes.

>>> root = Document(
...     '<root xmlns="foo"><a/><!--x--><b/><!--y--><c/></root>'
... ).root
>>> with altered_default_filters(is_comment_node):
...     print([x.content for x in root.iterate_children()])
['x', 'y']

As the default filters hide comments and processing instructions, call this function without arguments to unset them in order to access all types of nodes.

Parameters

extend – Extends the currently active filters with the given ones instead of replacing them.
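The described behaviour, a set of filters that is active for a code block or callable and restored afterwards, maps onto standard-library tools. The sketch below is a hypothetical re-creation with contextlib.contextmanager, whose results also work as decorators, and a contextvars.ContextVar. The real function’s internals may differ, and starting with no default filters is a simplification, since delb hides comments and processing instructions by default.

```python
from contextlib import contextmanager
from contextvars import ContextVar
from typing import Callable, Iterable, Iterator, List, Tuple

Filter = Callable[[object], bool]

# Hypothetical state standing in for the library's active default filters.
_default_filters: ContextVar = ContextVar("default_filters", default=())


@contextmanager
def altered_default_filters(*filter: Filter, extend: bool = False) -> Iterator[None]:
    current: Tuple[Filter, ...] = _default_filters.get()
    # Either extend or replace the active defaults for the enclosed code.
    token = _default_filters.set(current + filter if extend else filter)
    try:
        yield
    finally:
        # Restore the previous defaults, even if the enclosed code raised.
        _default_filters.reset(token)


def iterate(items: Iterable[object], *filter: Filter) -> List[object]:
    # Operations apply the active defaults in addition to explicit filters.
    active = _default_filters.get() + filter
    return [x for x in items if all(f(x) for f in active)]
```

Used as a decorator, e.g. @altered_default_filters(lambda x: x < 3), the filters are active only while the decorated callable runs.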

Contributed filters

delb.any_of(*filter: _delb.typing.Filter) _delb.typing.Filter[source]

A node filter wrapper that matches when any of the given filters is matching, like a boolean or.

delb.is_comment_node(node: NodeBase) bool[source]

A node filter that matches CommentNode instances.

delb.is_processing_instruction_node(node: NodeBase) bool[source]

A node filter that matches ProcessingInstructionNode instances.

delb.is_tag_node(node: NodeBase) bool[source]

A node filter that matches TagNode instances.

delb.is_text_node(node: NodeBase) bool[source]

A node filter that matches TextNode instances.

delb.not_(*filter: _delb.typing.Filter) _delb.typing.Filter[source]

A node filter wrapper that matches when the given filter is not matching, like a boolean not.
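As the signatures above show, filters are plain callables that take a node and return a bool, so wrappers such as any_of() and not_() are higher-order functions. The following sketch re-creates that pattern with hypothetical stand-ins that operate on plain Python values instead of nodes; it illustrates the combinator idea and is not delb’s source. How not_() treats several filters at once is an assumption here.

```python
from typing import Callable, Iterable, List

Filter = Callable[[object], bool]


def is_text(node: object) -> bool:
    # Hypothetical stand-in for is_text_node(): strings play text nodes.
    return isinstance(node, str)


def any_of(*filter: Filter) -> Filter:
    def wrapper(node: object) -> bool:
        # Matches as soon as one wrapped filter matches (boolean or).
        return any(f(node) for f in filter)
    return wrapper


def not_(*filter: Filter) -> Filter:
    def wrapper(node: object) -> bool:
        # Assumption: with several filters, their conjunction is negated.
        return not all(f(node) for f in filter)
    return wrapper


def select(nodes: Iterable[object], *filter: Filter) -> List[object]:
    # Like the iteration methods, a node must match all given filters.
    return [n for n in nodes if all(f(n) for f in filter)]
```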

Transformations

This module offers a canonical interface with the aim to make re-use of transforming algorithms easier.

Let’s look at it with examples:

from delb.transform import Transformation


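class ResolveCopyOf(Transformation):
    def transform(self):
        # Replace every element that carries a copyOf attribute ...
        for node in self.root.css_select("*[copyOf]"):
            # ... with a deep copy of the element it points to. The value is
            # expected to be a local "#id" reference, hence the stripped "#".
            source_id = node["copyOf"]
            source_node = self.origin_document.xpath(
                f'//*[@xml:id="{source_id[1:]}"]'
            ).first
            cloned_node = source_node.clone(deep=True)
            # Drop the copied id so that ids stay unique in the document.
            cloned_node.id = None
            node.replace_with(cloned_node)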

Instances of transformations defined like this can be called with a (sub-)tree and, optionally, the document that this tree originates from:

resolve_copy_of = ResolveCopyOf()
tree = resolve_copy_of(tree)  # where tree is an instance of TagNode

typing.NamedTuple subclasses are used to define options for transformations:

from typing import NamedTuple


class ResolveChoiceOptions(NamedTuple):
    corr: bool = True
    reg: bool = True


class ResolveChoice(Transformation):
    options_class = ResolveChoiceOptions

    def __init__(self, options=None):
        # options may be omitted to use the defaults of options_class
        super().__init__(options)
        self.keep_selector = ",".join(
            (
                "corr" if self.options.corr else "sic",
                "reg" if self.options.reg else "orig"
            )
        )
        self.drop_selector = ",".join(
            (
                "sic" if self.options.corr else "corr",
                "orig" if self.options.reg else "reg"
            )
        )

    def transform(self):
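        for choice_node in self.root.css_select("choice"):
            # remove the rejected alternative together with its content
            node_to_drop = choice_node.css_select(self.drop_selector).first
            node_to_drop.detach()

            # unwrap the kept alternative, its child nodes stay in place
            node_to_keep = choice_node.css_select(self.keep_selector).first
            node_to_keep.detach(retain_child_nodes=True)

            # finally unwrap the choice element itself
            choice_node.detach(retain_child_nodes=True)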

A transformation class that defines an options_class attribute can then be used either with its default options or with alternate ones:

resolve_choice = ResolveChoice()
tree = resolve_choice(tree)

resolve_choice = ResolveChoice(ResolveChoiceOptions(reg=False))
tree = resolve_choice(tree)
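How a missing options argument can fall back to the defaults of options_class is shown in this minimal sketch. It is an assumption about the mechanism, not delb's code: OptionsAwareBase and ResolveChoiceSketch are hypothetical, and only the selector computation of the example above is reproduced; no tree is transformed.

```python
from typing import NamedTuple, Optional, Tuple


class ResolveChoiceOptions(NamedTuple):
    corr: bool = True
    reg: bool = True


class OptionsAwareBase:
    """Hypothetical stand-in; only the options handling is modelled."""

    options_class: Optional[type] = None

    def __init__(self, options: Optional[NamedTuple] = None):
        # Without explicit options, instantiate options_class with its defaults.
        if options is None and self.options_class is not None:
            options = self.options_class()
        self.options = options


class ResolveChoiceSketch(OptionsAwareBase):
    options_class = ResolveChoiceOptions

    def selectors(self) -> Tuple[str, str]:
        # Same selector logic as in the ResolveChoice example.
        keep = ",".join(
            ("corr" if self.options.corr else "sic",
             "reg" if self.options.reg else "orig")
        )
        drop = ",".join(
            ("sic" if self.options.corr else "corr",
             "orig" if self.options.reg else "reg")
        )
        return keep, drop


print(ResolveChoiceSketch().selectors())  # ('corr,reg', 'sic,orig')
print(ResolveChoiceSketch(ResolveChoiceOptions(reg=False)).selectors())  # ('corr,orig', 'sic,reg')
```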

Finally, concrete transformations can be chained, either as classes or as instances. The interface also allows chaining multiple chains:

from delb.transform import TransformationSequence

tidy_up = TransformationSequence(ResolveCopyOf, resolve_choice)
tree = tidy_up(tree)

Attention

This is an experimental feature. It might change significantly in the future or be removed altogether.

class delb.transform.Transformation(options: Optional[NamedTuple] = None)

This is a base class for any transformation algorithm.

abstract transform()

This method needs to implement the transformation logic. When it is called, the instance has two attributes assigned from that call:

  • root is the node that the transformation was called with.

  • origin_document is the document that was optionally passed as the second argument.
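The calling convention can be modelled with a small abstract base class. This is a sketch under assumptions: CallableTransformation and AppendMarker are hypothetical, a plain list stands in for a TagNode, and returning self.root from the call is inferred from the tree = resolve_copy_of(tree) usage above.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class CallableTransformation(ABC):
    """Hypothetical stand-in modelling the documented calling convention."""

    def __call__(self, root: Any, document: Optional[Any] = None) -> Any:
        # Expose the call arguments as attributes before transform() runs.
        self.root = root
        self.origin_document = document
        self.transform()
        return self.root

    @abstractmethod
    def transform(self) -> None:
        ...


class AppendMarker(CallableTransformation):
    # Toy transformation that mutates a plain list instead of a TagNode.
    def transform(self) -> None:
        self.root.append(("from", self.origin_document))


tree = ["a", "b"]
result = AppendMarker()(tree, "doc.xml")
print(result)  # ['a', 'b', ('from', 'doc.xml')]
```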

class delb.transform.TransformationBase

This base class defines the calling interface of transformations.

class delb.transform.TransformationSequence(*transformations: Union[TransformationBase, Type[TransformationBase]])

A transformation sequence can be used to combine any number of both Transformation (provided as class or instantiated with options) and other TransformationSequence instances or classes.
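The chaining semantics can be sketched as follows, assuming that classes are instantiated with their default options and that each step receives the result of the previous one. SequenceSketch and the two toy steps are hypothetical and operate on integers instead of trees:

```python
from typing import Any, Callable, List, Optional, Union


class SequenceSketch:
    """Hypothetical stand-in for TransformationSequence."""

    def __init__(self, *transformations: Union[type, Callable[..., Any]]):
        self.steps: List[Callable[..., Any]] = []
        for transformation in transformations:
            # Classes are instantiated without arguments, i.e. with defaults.
            if isinstance(transformation, type):
                transformation = transformation()
            self.steps.append(transformation)

    def __call__(self, root: Any, document: Optional[Any] = None) -> Any:
        # Nested sequences work because a sequence has the same call signature.
        for step in self.steps:
            root = step(root, document)
        return root


class Double:
    def __call__(self, root: int, document: Optional[Any] = None) -> int:
        return root * 2


class Add:
    def __init__(self, amount: int = 1):
        self.amount = amount

    def __call__(self, root: int, document: Optional[Any] = None) -> int:
        return root + self.amount


inner = SequenceSketch(Double, Add(10))
outer = SequenceSketch(Add, inner)
print(outer(1))  # (1 + 1) * 2 + 10 == 14
```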

Various helpers

delb.first(iterable: Iterable) -> Optional[Any]

Returns the first item of the given iterable or None if it’s empty. Note that the first item is consumed when the iterable is an iterator.
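A plain-Python equivalent of the documented behaviour, offered as a sketch rather than delb's actual source, shows why an iterator loses its first item while a list does not:

```python
from typing import Any, Iterable, Optional


def first(iterable: Iterable) -> Optional[Any]:
    # Sketch only: next() with a default returns None instead of raising
    # StopIteration when the iterable is empty.
    return next(iter(iterable), None)


numbers = iter([1, 2, 3])
print(first(numbers))  # 1
print(list(numbers))  # [2, 3], the first item was consumed
print(first([]))  # None
```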

delb.get_traverser(from_left=True, depth_first=True, from_top=True)

Returns a function that can be used to traverse a (sub)tree with the given node as root. While traversing, the given root node itself is yielded at some point.

The returned functions have this signature:

def traverser(root: NodeBase, *filters: Filter) -> Iterator[NodeBase]:
    ...
Parameters
  • from_left – The traverser yields sibling nodes from left to right if True, or starting from the right if False.

  • depth_first – If True, a traverser yields the child nodes of a node (or, respectively, its parent node) before that node's siblings. If False, siblings are favored.

  • from_top – If True, the traverser starts yielding nodes with the lowest depth. If False, it starts with the nodes of the highest depth.
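For the default combination (from_left=True, depth_first=True, from_top=True) the description amounts to a pre-order traversal. The following sketch covers that case and its mirrored from_left=False variant only; ToyNode and get_top_down_traverser are hypothetical, and the bottom-up and breadth-first combinations are left out:

```python
from typing import Callable, Iterator, List, Optional


class ToyNode:
    """Hypothetical minimal node with ordered children."""

    def __init__(self, name: str, children: Optional[List["ToyNode"]] = None):
        self.name = name
        self.children = children or []


Filter = Callable[[ToyNode], bool]


def get_top_down_traverser(from_left: bool = True) -> Callable[..., Iterator[ToyNode]]:
    def traverser(root: ToyNode, *filters: Filter) -> Iterator[ToyNode]:
        # A node is yielded before its descendants, and all descendants come
        # before the node's following siblings (pre-order).
        if all(f(root) for f in filters):
            yield root
        children = root.children if from_left else reversed(root.children)
        for child in children:
            # Filters only decide what is yielded; subtrees are still visited.
            yield from traverser(child, *filters)

    return traverser


tree = ToyNode("root", [ToyNode("a", [ToyNode("a1"), ToyNode("a2")]), ToyNode("b")])
left = get_top_down_traverser(from_left=True)
right = get_top_down_traverser(from_left=False)
print([n.name for n in left(tree)])  # ['root', 'a', 'a1', 'a2', 'b']
print([n.name for n in right(tree)])  # ['root', 'b', 'a', 'a2', 'a1']
print([n.name for n in left(tree, lambda n: not n.children)])  # ['a1', 'a2', 'b']
```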

delb.last(iterable: Iterable) -> Optional[Any]

Returns the last item of the given iterable or None if it’s empty. Note that an iterator, when one is given, is consumed entirely.
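Similarly, a sketch of last() (again not delb's source) has to exhaust the iterable while keeping only the final item; collections.deque bounded with maxlen=1 does exactly that:

```python
from collections import deque
from typing import Any, Iterable, Optional


def last(iterable: Iterable) -> Optional[Any]:
    # Sketch only: a deque bounded to one element retains just the most
    # recent item while the whole iterable is consumed.
    tail = deque(iterable, maxlen=1)
    return tail[0] if tail else None


numbers = iter([1, 2, 3])
print(last(numbers))  # 3
print(list(numbers))  # [], the iterator is exhausted
print(last([]))  # None
```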

delb.register_namespace(prefix: str, namespace: str)

Registers a namespace prefix that newly created TagNode instances in that namespace will use in serializations.

The registry is global, and any existing mapping for either the given prefix or the namespace URI will be removed. It has, however, no effect on the serialization of existing nodes; see Document.cleanup_namespaces() for that.

Parameters
  • prefix – The prefix to register.

  • namespace – The targeted namespace.

delb.tag(local_name: str)
delb.tag(local_name: str, attributes: Mapping[str, str])
delb.tag(local_name: str, child: Union[str, NodeBase, _TagDefinition])
delb.tag(local_name: str, children: Sequence[Union[str, NodeBase, _TagDefinition]])
delb.tag(local_name: str, attributes: Mapping[str, str], child: Union[str, NodeBase, _TagDefinition])
delb.tag(local_name: str, attributes: Mapping[str, str], children: Sequence[Union[str, NodeBase, _TagDefinition]])

This function can be used for in-place creation (or templating, if you will) of TagNode instances as:

  • node argument to methods that add nodes to a tree

  • items in the children argument of new_tag_node() and NodeBase.new_tag_node()

The first argument to the function is always the local name of the tag node. Optionally, the second argument can be a mapping that specifies attributes for that node. The optional last argument is either a single object that will be appended as a child node or a sequence of such objects. These can be node instances of any type, strings (for derived TextNode instances) or other definitions from this function (for derived TagNode instances).

The actual nodes that are constructed always inherit the namespace of the context node they are created in.

>>> root = new_tag_node('root', children=[
...     tag("head", {"lvl": "1"}, "Hello!"),
...     tag("items", (
...         tag("item1"),
...         tag("item2"),
...         )
...     )
... ])
>>> str(root)
'<root><head lvl="1">Hello!</head><items><item1/><item2/></items></root>'
>>> root.append_children(tag("addendum"))
>>> str(root)[-26:]
'</items><addendum/></root>'
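The argument handling described above can be illustrated with a toy re-implementation that builds plain definition tuples and renders them to XML. It is a sketch only: TagDefinition, tag() and render() here are hypothetical, ignore namespaces and do not create delb nodes. Note that a definition is itself a tuple and therefore has to be checked before a sequence of children is assumed:

```python
from collections.abc import Mapping
from typing import NamedTuple, Tuple, Union
from xml.sax.saxutils import escape, quoteattr


class TagDefinition(NamedTuple):
    """Hypothetical counterpart of the private _TagDefinition type."""

    local_name: str
    attributes: Tuple[Tuple[str, str], ...]
    children: Tuple[Union[str, "TagDefinition"], ...]


def tag(local_name: str, *args: object) -> TagDefinition:
    attributes: Mapping = {}
    children: Tuple[object, ...] = ()
    rest = list(args)
    # A mapping directly after the local name specifies the attributes.
    if rest and isinstance(rest[0], Mapping):
        attributes = rest.pop(0)
    if rest:
        content = rest[0]
        # A single string or definition becomes a one-item child sequence.
        if isinstance(content, (str, TagDefinition)):
            children = (content,)
        else:
            children = tuple(content)
    return TagDefinition(local_name, tuple(attributes.items()), children)


def render(item: Union[str, TagDefinition]) -> str:
    # Strings become escaped text, definitions become elements.
    if isinstance(item, str):
        return escape(item)
    attrs = "".join(f" {name}={quoteattr(value)}" for name, value in item.attributes)
    if not item.children:
        return f"<{item.local_name}{attrs}/>"
    inner = "".join(render(child) for child in item.children)
    return f"<{item.local_name}{attrs}>{inner}</{item.local_name}>"


root = tag("root", [
    tag("head", {"lvl": "1"}, "Hello!"),
    tag("items", (tag("item1"), tag("item2"))),
])
print(render(root))
# <root><head lvl="1">Hello!</head><items><item1/><item2/></items></root>
```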

Exceptions

exception delb.exceptions.AmbiguousTreeError(message: str)

Raised when a single node shall be fetched or created by an XPath expression in a tree where the target position can’t be clearly determined.

with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception delb.exceptions.DelbBaseException

exception delb.exceptions.FailedDocumentLoading(source: Any, excuses: Dict[Callable[[Any, SimpleNamespace], Union[_ElementTree, str]], Union[str, Exception]])

exception delb.exceptions.InvalidCodePath

Raised when a code path that is not expected to be executed is reached.

exception delb.exceptions.InvalidOperation

Raised when an invalid operation is attempted by the client code.

exception delb.exceptions.XPathEvaluationError(message: str)

exception delb.exceptions.XPathParsingError(expression: Optional[str] = None, position: Optional[int] = None, message: Optional[str] = None)

Raised when an XPath expression can’t be parsed.

exception delb.exceptions.XPathUnsupportedStandardFeature(position: int, feature_description: str)

Raised when an unsupported XPath expression feature is recognized.
