API Documentation

Note

Installing delb provides two packages: delb and _delb. As the leading underscore indicates, _delb exposes the private parts of the API, while delb re-exposes what is deemed public from it, along with additional contents. As a rule of thumb, use the public API in applications and the private API in delb extensions. This way, you can avoid circular dependencies if your extension (or other code that it depends on) uses contents from the _delb package.

Documents

class delb.Document(source, collapse_whitespace=None, parser=None, parser_options=None, klass=None, **config)

This class is the entry point for obtaining a representation of an XML-encoded text document. Any object can be passed for instantiation, provided that a suitable loader is available for it. See Document loaders for the default loaders that come with this package. Plugins can alter the available loaders; see Extending delb.

Nodes can be tested for membership in a document:

>>> document = Document("<root>text</root>")
>>> text_node = document.root[0]
>>> text_node in document
True
>>> text_node.clone() in document
False

Coercing a document to a string yields its XML serialization, but, unlike Document.save() and Document.write(), without an XML declaration:

>>> document = Document("<root/>")
>>> str(document)
'<root/>'

Parameters

source – Anything that the configured loaders can make sense of to return a parsed document tree.
collapse_whitespace – Deprecated. Use the argument with the same name on the parser_options object.
parser – Deprecated.
parser_options – A delb.ParserOptions instance to configure the parser.
klass – Explicitly defines the class that is instantiated. This can be useful for applications that use default document subclasses.
config – Additional keyword arguments for the configuration of extension classes.

Properties

`config`	Besides the used `parser` and `collapsed_whitespace` options, this property contains the namespaced data that extension classes and loaders may have stored.
`head_nodes`	A list-like accessor to the nodes that precede the document's root node.
`namespaces`	The namespace mapping of the document's `root` node.
`root`	The root node of a document tree.
`source_url`	The source URL where a loader obtained the document's contents or `None`.
`tail_nodes`	A list-like accessor to the nodes that follow the document's root node.

Uncategorized methods

`cleanup_namespaces`([namespaces, retain_prefixes])	Consolidates the namespace declarations in the document by removing unused and redundant ones.
`clone`()	Returns another instance with the duplicated contents.
`collapse_whitespace`()	Collapses whitespace as described here: https://wiki.tei-c.org/index.php/XML_Whitespace#Recommendations
`css_select`(expression[, namespaces])	This method proxies to the `TagNode.css_select()` method of the document's `root` node.
`merge_text_nodes`()	This method proxies to the `TagNode.merge_text_nodes()` method of the document's `root` node.
`new_tag_node`(local_name[, attributes, namespace])	This method proxies to the `TagNode.new_tag_node()` method of the document's root node.
`save`(path[, pretty])	Saves the document to the given path.
`write`(buffer[, pretty])	Writes the document to a file-like object.
`xpath`(expression[, namespaces])	This method proxies to the `TagNode.xpath()` method of the document's `root` node.
`xslt`(transformation)	Applies an `lxml.etree.XSLT` transformation and returns a new instance with its result.

cleanup_namespaces(namespaces: Optional[Mapping[Optional[str], str]] = None, retain_prefixes: Optional[Iterable[str]] = None)

Consolidates the namespace declarations in the document by removing unused and redundant ones.

There are currently some caveats due to lxml/libxml2’s implementations:

prefixes cannot be set for the default namespace
a namespace cannot be declared as default after a node’s creation (where a namespace was specified that had been registered for a prefix with register_namespace())
there’s no way to unregister a prefix for a namespace
if there are other namespaces used as default namespaces (where a namespace was specified that had not been registered for a prefix) in the descendants of the root, their declarations are lost when this method is used

To ensure clean serializations, one should:

register prefixes for all namespaces except the default one at the start of an application
use only one default namespace within a document

Parameters

namespaces – An optional mapping of prefixes (keys) to namespaces (values) that will be declared at the root element.
retain_prefixes – An optional iterable that contains prefixes whose declarations shall be kept despite not being used.

clone() → Document

Returns: Another instance with the duplicated contents.

collapse_whitespace()

Collapses whitespace as described here: https://wiki.tei-c.org/index.php/XML_Whitespace#Recommendations

Implicitly merges all neighbouring text nodes.

config: SimpleNamespace

Besides the used parser and collapsed_whitespace options, this property contains the namespaced data that extension classes and loaders may have stored.

css_select(expression: str, namespaces: Optional[Namespaces] = None) → QueryResults

This method proxies to the TagNode.css_select() method of the document’s root node.

head_nodes

A list-like accessor to the nodes that precede the document’s root node. Note that nodes can’t be removed or replaced.

merge_text_nodes()

This method proxies to the TagNode.merge_text_nodes() method of the document’s root node.

property namespaces: Namespaces

The namespace mapping of the document’s root node.

new_tag_node(local_name: str, attributes: Optional[Dict[str, str]] = None, namespace: Optional[str] = None) → TagNode

This method proxies to the TagNode.new_tag_node() method of the document’s root node.

property root: TagNode

The root node of a document tree.

save(path: Path, pretty: bool = False, **cleanup_namespaces_args)

Parameters

path – The path where the document shall be saved.
pretty – Adds indentation for human consumers when True.
cleanup_namespaces_args – Arguments that are passed to Document.cleanup_namespaces() before saving.

source_url: Optional[str]

The source URL where a loader obtained the document’s contents or None.

tail_nodes

A list-like accessor to the nodes that follow the document’s root node. Note that nodes can’t be removed or replaced.

write(buffer: IO, pretty: bool = False, **cleanup_namespaces_args)

Parameters

buffer – A file-like object that the document is written to.
pretty – Adds indentation for human consumers when True.
cleanup_namespaces_args – Arguments that are passed to Document.cleanup_namespaces() before writing.

xpath(expression: str, namespaces: Optional[Namespaces] = None) → QueryResults

This method proxies to the TagNode.xpath() method of the document’s root node.

xslt(transformation: XSLT) → Document

Parameters: transformation – A lxml.etree.XSLT instance that shall be applied to the document.
Returns: A new instance with the transformation’s result.

Document loaders

If you want or need to change which loaders are available or the order in which they are attempted, you can modify the delb.plugins.plugin_manager.plugins.loaders object, which is a list. Changes to it affect your whole application. Please refer to this issue if you require finer control over these aspects.

Core

The core_loaders module provides a set of loaders to retrieve documents from various data sources.

_delb.plugins.core_loaders.buffer_loader(data: Any, config: SimpleNamespace) → _delb.typing.LoaderResult

This loader loads a document from a file-like object.

_delb.plugins.core_loaders.etree_loader(data: Any, config: SimpleNamespace) → _delb.typing.LoaderResult

This loader processes lxml.etree._Element and lxml.etree._ElementTree instances.

_delb.plugins.core_loaders.ftp_loader(data: Any, config: SimpleNamespace) → _delb.typing.LoaderResult

Loads a document from a URL with the ftp scheme. The URL will be bound to source_url on the document’s Document.config attribute.

_delb.plugins.core_loaders.path_loader(data: Any, config: SimpleNamespace) → _delb.typing.LoaderResult

This loader loads a document from a file that a pathlib.Path instance points to. That instance will be bound to source_path on the document’s Document.config attribute.

_delb.plugins.core_loaders.tag_node_loader(data: Any, config: SimpleNamespace) → _delb.typing.LoaderResult

This loader loads, or rather clones, a delb.TagNode instance and its descendant nodes.

_delb.plugins.core_loaders.text_loader(data: Any, config: SimpleNamespace) → _delb.typing.LoaderResult

Parses a string containing a full document.

Extra

If delb is installed with the https-loader extra, the dependencies required by this loader are installed as well. See Installation.

_delb.plugins.https_loader.https_loader(data: Any, config: SimpleNamespace, client: httpx.Client = <httpx.Client object>) → _delb.typing.LoaderResult

This loader loads a document from a URL with the http or https scheme. The default httpx client follows redirects and can partially be configured with environment variables. The URL will be bound to the name source_url on the document’s Document.config attribute.

Loaders with specifically configured httpx-clients can build on this loader like so:

import httpx
from _delb.plugins import plugin_manager
from _delb.plugins.https_loader import https_loader


client = httpx.Client(follow_redirects=False, trust_env=False)

@plugin_manager.register_loader(before=https_loader)
def custom_https_loader(data, config):
    return https_loader(data, config, client=client)

Parser options

class delb.ParserOptions(cleanup_namespaces: bool = False, collapse_whitespace: bool = False, remove_comments: bool = False, remove_processing_instructions: bool = False, resolve_entities: bool = True, unplugged: bool = False)

The configuration options that define an XML parser’s behaviour.

Parameters

cleanup_namespaces – Consolidate XML namespace declarations.
collapse_whitespace – Collapse the content's whitespace.
remove_comments – Ignore comments.
remove_processing_instructions – Don’t include processing instructions in the parsed tree.
resolve_entities – Resolve entities.
unplugged – Don’t load referenced resources over network.

Nodes

Comment

class delb.CommentNode(etree_element: _Element)

The instances of this class represent comment nodes of a tree.

To instantiate new nodes use new_comment_node().

Properties

`content`	The comment's text.
`depth`	The depth (or level) of the node in its tree.
`document`	The `Document` instance that the node is associated with or `None`.
`first_child`	Always `None`, as comment nodes have no child nodes.
`full_text`	The concatenated contents of all text node descendants in document order.
`index`	The node's index within the parent's collection of child nodes or `None` when the node has no parent.
`last_child`	Always `None`, as comment nodes have no child nodes.
`last_descendant`	Always `None`, as comment nodes have no descendants.
`namespaces`	The prefix to namespace mapping of the node.
`parent`	The node's parent or `None`.

Fetching a single relative node

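`fetch_following`(*filter)	Returns the next node in document order that matches all filters or `None`.
`fetch_following_sibling`(*filter)	Returns the next sibling to the right that matches all filters or `None`.
`fetch_preceding`(*filter)	Returns the previous node in document order that matches all filters or `None`.
`fetch_preceding_sibling`(*filter)	Returns the previous sibling to the left that matches all filters or `None`.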

Iterating over relative nodes

`iterate_ancestors`(*filter)	Returns a generator iterator that yields the ancestor nodes from bottom to top.
`iterate_children`(*filter[, recurse])	A generator iterator that yields nothing.
`iterate_descendants`(*filter)	Returns a generator iterator that yields the node's descendants.
`iterate_following`(*filter)	Returns a generator iterator that yields the following nodes in document order.
`iterate_following_siblings`(*filter)	Returns a generator iterator that yields the siblings to the node's right.
`iterate_preceding`(*filter)	Returns a generator iterator that yields the previous nodes in document order.
`iterate_preceding_siblings`(*filter)	Returns a generator iterator that yields the siblings to the node's left.

Querying nodes

`xpath`(expression[, namespaces])	See Queries with XPath & CSS for details on the extent of the XPath implementation.

Adding nodes

`add_following_siblings`(*node[, clone])	Adds one or more nodes to the right of the node this method is called on.
`add_preceding_siblings`(*node[, clone])	Adds one or more nodes to the left of the node this method is called on.

Removing a node from its tree

`detach`([retain_child_nodes])	Removes the node from its tree.
`replace_with`(node[, clone])	Removes the node and places the given one in its tree location.

Uncategorized methods

`clone`([deep, quick_and_unsafe])	Returns a copy of the node; clones the whole subtree if `deep` is `True`.
`new_tag_node`(local_name[, attributes, ...])	Creates a new `TagNode` instance in the node's context.

add_following_siblings(*node: Union[str, NodeBase, _TagDefinition], clone: bool = False)

Adds one or more nodes to the right of the node this method is called on.

The nodes can be concrete instances of any node type or abstract descriptions in the form of strings or objects returned by the tag() function, from which TextNode or TagNode instances, respectively, are derived.

Parameters

node – The node(s) to be added.
clone – Clones the concrete nodes before adding if True.

add_preceding_siblings(*node: Union[str, NodeBase, _TagDefinition], clone: bool = False)

Adds one or more nodes to the left of the node this method is called on.

The nodes can be concrete instances of any node type or abstract descriptions in the form of strings or objects returned by the tag() function, from which TextNode or TagNode instances, respectively, are derived.

Parameters

node – The node(s) to be added.
clone – Clones the concrete nodes before adding if True.

clone(deep: bool = False, quick_and_unsafe: bool = False) → _ElementWrappingNode

Parameters

deep – Clones the whole subtree if True.
quick_and_unsafe – Creates a deep clone in a quicker manner where text nodes may get lost. It should be safe with trees that don’t contain consecutive text nodes, e.g. freshly parsed, unaltered documents or documents after TagNode.merge_text_nodes() has been applied.

Returns

A copy of the node.

property content: str

The comment’s text.

property depth: int

The depth (or level) of the node in its tree.

detach(retain_child_nodes: bool = False) → _ElementWrappingNode

Removes the node from its tree.

Parameters: retain_child_nodes – Keeps the node’s descendants in the originating tree if True.
Returns: The removed node.

property document: Optional[Document]

The Document instance that the node is associated with or None.

fetch_following(*filter: _delb.typing.Filter) → Optional[NodeBase]

Parameters: filter – Any number of filters.
Returns: The next node in document order that matches all filters or None.

fetch_following_sibling(*filter: _delb.typing.Filter) → Optional[NodeBase]

Parameters: filter – Any number of filters.
Returns: The next sibling to the right that matches all filters or None.

fetch_preceding(*filter: _delb.typing.Filter) → Optional[NodeBase]

Parameters: filter – Any number of filters.
Returns: The previous node in document order that matches all filters or None.

fetch_preceding_sibling(*filter: _delb.typing.Filter) → Optional[NodeBase]

Parameters: filter – Any number of filters.
Returns: The previous sibling to the left that matches all filters or None.

first_child = None

property full_text: str

The concatenated contents of all text node descendants in document order.

property index: Optional[int]

The node’s index within the parent’s collection of child nodes or None when the node has no parent.

iterate_ancestors(*filter: _delb.typing.Filter) → Iterator[TagNode]

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the ancestor nodes from bottom to top.

iterate_children(*filter: _delb.typing.Filter, recurse: bool = False) → Iterator[NodeBase]

A generator iterator that yields nothing.

iterate_descendants(*filter: _delb.typing.Filter) → Iterator[NodeBase]

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the node’s descendants.

iterate_following(*filter: _delb.typing.Filter) → Iterator[NodeBase]

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the following nodes in document order.

iterate_following_siblings(*filter: _delb.typing.Filter) → Iterator[NodeBase]

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the siblings to the node’s right.

iterate_preceding(*filter: _delb.typing.Filter) → Iterator[NodeBase]

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the previous nodes in document order.

iterate_preceding_siblings(*filter: _delb.typing.Filter) → Iterator[NodeBase]

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the siblings to the node’s left.

last_child = None

last_descendant = None

property namespaces: Namespaces

The prefix to namespace mapping of the node.

new_tag_node(local_name: str, attributes: Optional[Dict[str, str]] = None, namespace: Optional[str] = None, children: Sequence[Union[str, NodeBase, _TagDefinition]] = ()) → TagNode

Creates a new TagNode instance in the node’s context.

Parameters

local_name – The tag name.
attributes – Optional attributes that are assigned to the new node.
namespace – An optional tag namespace. If none is provided, the context node’s namespace is inherited.
children – An optional sequence of objects that will be appended as child nodes. These can be existing nodes, strings that will be inserted as text nodes, and in-place definitions of TagNode instances from tag(). The latter will be assigned to the same namespace.

Returns

The newly created tag node.

property parent: Optional[TagNode]

The node’s parent or None.

replace_with(node: Union[str, NodeBase, _TagDefinition], clone: bool = False) → NodeBase

Removes the node and places the given one in its tree location.

The node can be a concrete instance of any node type or an abstract description in the form of a string or an object returned by the tag() function, from which a TextNode or TagNode instance, respectively, is derived.

Parameters

node – The replacing node.
clone – A concrete, replacing node is cloned if True.

Returns

The removed node.

xpath(expression: str, namespaces: Optional[Namespaces] = None) → QueryResults

See Queries with XPath & CSS for details on the extent of the XPath implementation.

Parameters

expression – A supported XPath 1.0 expression that contains one or more location paths.
namespaces – A mapping of prefixes that are used in the expression to namespaces. If omitted, the node’s definition is used.

Returns

All nodes that match the evaluation of the provided XPath expression.

Processing instruction

class delb.ProcessingInstructionNode(etree_element: _Element)

The instances of this class represent processing instruction nodes of a tree.

To instantiate new nodes use new_processing_instruction_node().

Properties

`content`	The processing instruction's text.
`depth`	The depth (or level) of the node in its tree.
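`document`	The `Document` instance that the node is associated with or `None`.
`first_child`	Always `None`, as processing instruction nodes have no child nodes.
`full_text`	The concatenated contents of all text node descendants in document order.
`index`	The node's index within the parent's collection of child nodes or `None` when the node has no parent.
`last_child`	Always `None`, as processing instruction nodes have no child nodes.
`last_descendant`	Always `None`, as processing instruction nodes have no descendants.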
`namespaces`	The prefix to namespace mapping of the node.
`parent`	The node's parent or `None`.
`target`	The processing instruction's target.

Fetching a single relative node

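`fetch_following`(*filter)	Returns the next node in document order that matches all filters or `None`.
`fetch_following_sibling`(*filter)	Returns the next sibling to the right that matches all filters or `None`.
`fetch_preceding`(*filter)	Returns the previous node in document order that matches all filters or `None`.
`fetch_preceding_sibling`(*filter)	Returns the previous sibling to the left that matches all filters or `None`.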

Iterating over relative nodes

`iterate_ancestors`(*filter)	Returns a generator iterator that yields the ancestor nodes from bottom to top.
`iterate_children`(*filter[, recurse])	A generator iterator that yields nothing.
`iterate_descendants`(*filter)	Returns a generator iterator that yields the node's descendants.
`iterate_following`(*filter)	Returns a generator iterator that yields the following nodes in document order.
`iterate_following_siblings`(*filter)	Returns a generator iterator that yields the siblings to the node's right.
`iterate_preceding`(*filter)	Returns a generator iterator that yields the previous nodes in document order.
`iterate_preceding_siblings`(*filter)	Returns a generator iterator that yields the siblings to the node's left.

Querying nodes

`xpath`(expression[, namespaces])	See Queries with XPath & CSS for details on the extent of the XPath implementation.

Adding nodes

`add_following_siblings`(*node[, clone])	Adds one or more nodes to the right of the node this method is called on.
`add_preceding_siblings`(*node[, clone])	Adds one or more nodes to the left of the node this method is called on.

Removing a node from its tree

`detach`([retain_child_nodes])	Removes the node from its tree.
`replace_with`(node[, clone])	Removes the node and places the given one in its tree location.

Uncategorized methods

`clone`([deep, quick_and_unsafe])	Returns a copy of the node; clones the whole subtree if `deep` is `True`.
`new_tag_node`(local_name[, attributes, ...])	Creates a new `TagNode` instance in the node's context.

add_following_siblings(*node: Union[str, NodeBase, _TagDefinition], clone: bool = False)

Adds one or more nodes to the right of the node this method is called on.

The nodes can be concrete instances of any node type or abstract descriptions in the form of strings or objects returned by the tag() function, from which TextNode or TagNode instances, respectively, are derived.

Parameters

node – The node(s) to be added.
clone – Clones the concrete nodes before adding if True.

add_preceding_siblings(*node: Union[str, NodeBase, _TagDefinition], clone: bool = False)

Adds one or more nodes to the left of the node this method is called on.

The nodes can be concrete instances of any node type or abstract descriptions in the form of strings or objects returned by the tag() function, from which TextNode or TagNode instances, respectively, are derived.

Parameters

node – The node(s) to be added.
clone – Clones the concrete nodes before adding if True.

clone(deep: bool = False, quick_and_unsafe: bool = False) → _ElementWrappingNode

Parameters

deep – Clones the whole subtree if True.
quick_and_unsafe – Creates a deep clone in a quicker manner where text nodes may get lost. It should be safe with trees that don’t contain consecutive text nodes, e.g. freshly parsed, unaltered documents or documents after TagNode.merge_text_nodes() has been applied.

Returns

A copy of the node.

property content: str

The processing instruction’s text.

property depth: int

The depth (or level) of the node in its tree.

detach(retain_child_nodes: bool = False) → _ElementWrappingNode

Removes the node from its tree.

Parameters: retain_child_nodes – Keeps the node’s descendants in the originating tree if True.
Returns: The removed node.

property document: Optional[Document]

The Document instance that the node is associated with or None.

fetch_following(*filter: _delb.typing.Filter) → Optional[NodeBase]

Parameters: filter – Any number of filters.
Returns: The next node in document order that matches all filters or None.

fetch_following_sibling(*filter: _delb.typing.Filter) → Optional[NodeBase]

Parameters: filter – Any number of filters.
Returns: The next sibling to the right that matches all filters or None.

fetch_preceding(*filter: _delb.typing.Filter) → Optional[NodeBase]

Parameters: filter – Any number of filters.
Returns: The previous node in document order that matches all filters or None.

fetch_preceding_sibling(*filter: _delb.typing.Filter) → Optional[NodeBase]

Parameters: filter – Any number of filters.
Returns: The previous sibling to the left that matches all filters or None.

first_child = None

property full_text: str

The concatenated contents of all text node descendants in document order.

property index: Optional[int]

The node’s index within the parent’s collection of child nodes or None when the node has no parent.

iterate_ancestors(*filter: _delb.typing.Filter) → Iterator[TagNode]

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the ancestor nodes from bottom to top.

iterate_children(*filter: _delb.typing.Filter, recurse: bool = False) → Iterator[NodeBase]

A generator iterator that yields nothing.

iterate_descendants(*filter: _delb.typing.Filter) → Iterator[NodeBase]

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the node’s descendants.

iterate_following(*filter: _delb.typing.Filter) → Iterator[NodeBase]

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the following nodes in document order.

iterate_following_siblings(*filter: _delb.typing.Filter) → Iterator[NodeBase]

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the siblings to the node’s right.

iterate_preceding(*filter: _delb.typing.Filter) → Iterator[NodeBase]

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the previous nodes in document order.

iterate_preceding_siblings(*filter: _delb.typing.Filter) → Iterator[NodeBase]

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the siblings to the node’s left.

last_child = None

last_descendant = None

property namespaces: Namespaces

The prefix to namespace mapping of the node.

new_tag_node(local_name: str, attributes: Optional[Dict[str, str]] = None, namespace: Optional[str] = None, children: Sequence[Union[str, NodeBase, _TagDefinition]] = ()) → TagNode

Creates a new TagNode instance in the node’s context.

Parameters

local_name – The tag name.
attributes – Optional attributes that are assigned to the new node.
namespace – An optional tag namespace. If none is provided, the context node’s namespace is inherited.
children – An optional sequence of objects that will be appended as child nodes. These can be existing nodes, strings that will be inserted as text nodes, and in-place definitions of TagNode instances from tag(). The latter will be assigned to the same namespace.

Returns

The newly created tag node.

property parent: Optional[TagNode]

The node’s parent or None.

replace_with(node: Union[str, NodeBase, _TagDefinition], clone: bool = False) → NodeBase

Removes the node and places the given one in its tree location.

The node can be a concrete instance of any node type or an abstract description in the form of a string or an object returned by the tag() function, from which a TextNode or TagNode instance, respectively, is derived.

Parameters

node – The replacing node.
clone – A concrete, replacing node is cloned if True.

Returns

The removed node.

property target: str

The processing instruction’s target.

xpath(expression: str, namespaces: Optional[Namespaces] = None) → QueryResults

See Queries with XPath & CSS for details on the extent of the XPath implementation.

Parameters

expression – A supported XPath 1.0 expression that contains one or more location paths.
namespaces – A mapping of prefixes that are used in the expression to namespaces. If omitted, the node’s definition is used.

Returns

All nodes that match the evaluation of the provided XPath expression.

Tag

class delb.TagNode(etree_element: _Element)

The instances of this class represent tag nodes of a tree, the equivalent of DOM’s elements.

To instantiate new nodes use Document.new_tag_node, TagNode.new_tag_node, TextNode.new_tag_node or new_tag_node().

Some syntactic sugar is baked in:

Attributes and nodes can be tested for membership in a node.

>>> root = Document('<root ham="spam"><child/></root>').root
>>> child = root.first_child
>>> "ham" in root
True
>>> child in root
True

Nodes can be copied. Note that this relies on TagNode.clone().

>>> from copy import copy, deepcopy
>>> root = Document("<root>Content</root>").root
>>> print(copy(root))
<root/>
>>> print(deepcopy(root))
<root>Content</root>

Nodes can be tested for equality regarding their qualified name and attributes.

>>> root = Document('<root><foo x="0"/><foo x="0"/><bar x="0"/></root>').root
>>> root[0] == root[1]
True
>>> root[0] == root[2]
False

Attribute values and child nodes can be obtained with the subscript notation.

>>> root = Document('<root x="0"><child_1/>child_2<child_3/></root>').root
>>> root["x"]
'0'
>>> print(root[0])
<child_1/>
>>> print(root[-1])
<child_3/>
>>> print([str(x) for x in root[1::-1]])
['child_2', '<child_1/>']

How many child nodes does this node have anyway?

>>> root = Document("<root><child_1/><child_2/></root>").root
>>> len(root)
2
>>> len(root[0])
0

As seen in the examples above, a tag node’s string representation yields a serialized XML representation of its subtree.

Properties

`attributes`	A mapping that can be used to query and alter the node's attributes.
`depth`	The depth (or level) of the node in its tree.
`document`	The `Document` instance that the node is associated with or `None`.
`first_child`	The node's first child node.
`full_text`	The concatenated contents of all text node descendants in document order.
`id`	This is a shortcut to retrieve and set the `id` attribute in the XML namespace.
`index`	The node's index within the parent's collection of child nodes or `None` when the node has no parent.
`last_child`	The node's last child node.
`last_descendant`	The node's last descendant.
`local_name`	The node's name.
`location_path`	An unambiguous XPath location path that points to this node from its tree root.
`namespace`	The node's namespace.
`namespaces`	The prefix to namespace mapping of the node.
`parent`	The node's parent or `None`.
`prefix`	The prefix that the node's namespace is currently mapped to.
`universal_name`	The node's qualified name in Clark notation.

Fetching a single relative node

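`fetch_following`(*filter)	Returns the next node in document order that matches all filters or `None`.
`fetch_following_sibling`(*filter)	Returns the next sibling to the right that matches all filters or `None`.
`fetch_preceding`(*filter)	Returns the previous node in document order that matches all filters or `None`.
`fetch_preceding_sibling`(*filter)	Returns the previous sibling to the left that matches all filters or `None`.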

Iterating over relative nodes

`iterate_ancestors`(*filter)	Returns a generator iterator that yields the ancestor nodes from bottom to top.
`iterate_children`(*filter[, recurse])	Returns a generator iterator that yields the node's child nodes.
`iterate_descendants`(*filter)	Returns a generator iterator that yields the node's descendants.
`iterate_following`(*filter)	Returns a generator iterator that yields the following nodes in document order.
`iterate_following_siblings`(*filter)	Returns a generator iterator that yields the siblings to the node's right.
`iterate_preceding`(*filter)	Returns a generator iterator that yields the previous nodes in document order.
`iterate_preceding_siblings`(*filter)	Returns a generator iterator that yields the siblings to the node's left.

Querying nodes

`css_select`(expression[, namespaces])	See Queries with XPath & CSS regarding the extent of the supported grammar.
`fetch_or_create_by_xpath`(expression[, ...])	Fetches a single node that is locatable by the provided XPath expression.
`xpath`(expression[, namespaces])	See Queries with XPath & CSS for details on the extent of the XPath implementation.

Adding nodes

`add_following_siblings`(*node[, clone])	Adds one or more nodes to the right of the node this method is called on.
`add_preceding_siblings`(*node[, clone])	Adds one or more nodes to the left of the node this method is called on.
`append_children`(*node[, clone])	Adds one or more nodes as child nodes after any existing child nodes of the node this method is called on.
`insert_children`(index, *node[, clone])	Inserts one or more child nodes.
`prepend_children`(*node[, clone])	Adds one or more nodes as child nodes before any existing child nodes of the node this method is called on.

Removing a node from its tree

`detach`([retain_child_nodes])	Removes the node from its tree.
`replace_with`(node[, clone])	Removes the node and places the given one in its tree location.

Uncategorized methods

`clone`([deep, quick_and_unsafe])	Returns a copy of the node.
`merge_text_nodes`()	Merges all consecutive text nodes in the subtree into one.
`new_tag_node`(local_name[, attributes, ...])	Creates a new `TagNode` instance in the node's context.
`parse`(text[, parser, parser_options, ...])	Parses the given string or bytes sequence into a new tree.

add_following_siblings(*node: Union[str, NodeBase, _TagDefinition], clone: bool = False)

Adds one or more nodes to the right of the node this method is called on.

The nodes can be concrete instances of any node type or abstract descriptions from which TextNode or TagNode instances are derived: strings yield text nodes and objects returned from the tag() function yield tag nodes.

Parameters

node – The node(s) to be added.
clone – Clones the concrete nodes before adding if True.

add_preceding_siblings(*node: Union[str, NodeBase, _TagDefinition], clone: bool = False)

Adds one or more nodes to the left of the node this method is called on.

The nodes can be concrete instances of any node type or abstract descriptions from which TextNode or TagNode instances are derived: strings yield text nodes and objects returned from the tag() function yield tag nodes.

Parameters

node – The node(s) to be added.
clone – Clones the concrete nodes before adding if True.

append_children(*node: Union[str, NodeBase, _TagDefinition], clone: bool = False)

Adds one or more nodes as child nodes after any existing child nodes of the node this method is called on.

The nodes can be concrete instances of any node type or abstract descriptions from which TextNode or TagNode instances are derived: strings yield text nodes and objects returned from the tag() function yield tag nodes.

Parameters

node – The node(s) to be added.
clone – Clones the concrete nodes before adding if True.

property attributes: TagAttributes

A mapping that can be used to query and alter the node’s attributes.

>>> node = new_tag_node("node", attributes={"foo": "0", "bar": "0"})
>>> node.attributes
{'foo': '0', 'bar': '0'}
>>> node.attributes.pop("bar")
'0'
>>> node.attributes["foo"] = "1"
>>> node.attributes["peng"] = "1"
>>> print(node)
<node foo="1" peng="1"/>
>>> node.attributes.update({"foo": "2", "zong": "2"})
>>> print(node)
<node foo="2" peng="1" zong="2"/>

Namespaced attributes can be accessed using Python’s slice notation, with the namespace before and the local name after the colon. An attribute in the default namespace can be addressed with or without that namespace.

>>> node = new_tag_node("node", {})
>>> node.attributes["http://namespace":"foo"] = "0"
>>> print(node)
<node xmlns:ns0="http://namespace" ns0:foo="0"/>
>>> node = Document('<node xmlns="default" foo="0"/>').root
>>> node.attributes["default":"foo"] is node.attributes["foo"]
True

Attributes behave like strings, but also expose namespace, local name and value for manipulation.

>>> node = new_tag_node("node")
>>> node.attributes["foo"] = "0"
>>> node.attributes["foo"].local_name = "bar"
>>> node.attributes["bar"].namespace = "http://namespace"
>>> node.attributes["http://namespace":"bar"].value = "1"
>>> print(node)
<node xmlns:ns0="http://namespace" ns0:bar="1"/>

Unlike with typical Python mappings, requesting a non-existing attribute doesn’t raise a KeyError; None is returned instead.
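The slice-notation access and the None-for-missing behaviour can be mimicked with a small wrapper around the standard library's xml.etree.ElementTree. This is a minimal sketch that assumes nothing about delb's internals: the AttributeView class is hypothetical and only illustrates how Python hands a subscript like "namespace":"name" to __getitem__ as a slice object.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Union


class AttributeView:
    """Hypothetical, minimal attribute view; not delb's TagAttributes."""

    def __init__(self, element: ET.Element) -> None:
        self._element = element

    @staticmethod
    def _key(key: Union[str, slice]) -> str:
        # attributes["ns":"name"] arrives here as slice("ns", "name"),
        # which is turned into ElementTree's Clark notation "{ns}name"
        if isinstance(key, slice):
            return f"{{{key.start}}}{key.stop}"
        return key

    def __getitem__(self, key: Union[str, slice]) -> Optional[str]:
        # unlike a dict, a missing key yields None instead of raising KeyError
        return self._element.get(self._key(key))

    def __setitem__(self, key: Union[str, slice], value: str) -> None:
        self._element.set(self._key(key), value)


element = ET.Element("node")
attributes = AttributeView(element)
attributes["http://namespace":"foo"] = "0"
print(attributes["http://namespace":"foo"])  # 0
print(attributes["missing"])  # None
```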

clone(deep: bool = False, quick_and_unsafe: bool = False) → TagNode

Parameters

deep – Clones the whole subtree if True.
quick_and_unsafe – Creates a deep clone in a quicker manner where text nodes may get lost. It should be safe with trees that don’t contain consecutive text nodes, e.g. freshly parsed, unaltered documents or trees after TagNode.merge_text_nodes() has been applied.

Returns

A copy of the node.

css_select(expression: str, namespaces: Optional[Namespaces] = None) → QueryResults

See Queries with XPath & CSS regarding the extent of the supported grammar.

Namespace prefixes are delimited with a | before a name test. For example, div svg|metadata selects all nodes named metadata whose namespace is mapped to the svg prefix and that are descendants of nodes named div which belong to the default namespace or have no namespace.

Parameters

expression – A CSS selector expression.
namespaces – A mapping of prefixes that are used in the expression to namespaces. If omitted, the node’s definition is used.

Returns

All nodes that match the evaluation of the provided CSS selector expression.
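What such a prefix mapping expresses can be tried out without delb, using the limited XPath support of the standard library's xml.etree.ElementTree. This is only an analogy, not delb's implementation: the query mirrors the svg|metadata part of the example above and shows that only a metadata element in the namespace mapped to svg matches, while an un-namespaced metadata element does not.

```python
import xml.etree.ElementTree as ET

SVG = "http://www.w3.org/2000/svg"
root = ET.fromstring(
    f'<div><svg xmlns="{SVG}"><metadata/></svg><metadata/></div>'
)
# the "svg" prefix is resolved through this mapping, comparable to the
# namespaces argument of css_select(); the un-namespaced <metadata/> is skipped
matches = root.findall(".//svg:metadata", namespaces={"svg": SVG})
print(len(matches))  # 1
print(matches[0].tag)  # {http://www.w3.org/2000/svg}metadata
```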

property depth: int

The depth (or level) of the node in its tree.

detach(retain_child_nodes: bool = False) → _ElementWrappingNode

Removes the node from its tree.

Parameters: retain_child_nodes – Keeps the node’s descendants in the originating tree if True.
Returns: The removed node.

property document: Optional[Document]

The Document instance that the node is associated with or None.

fetch_following(*filter: _delb.typing.Filter) → Optional[NodeBase]

Parameters: filter – Any number of filters.
Returns: The next node in document order that matches all filters or None.

fetch_following_sibling(*filter: _delb.typing.Filter) → Optional[NodeBase]

Parameters: filter – Any number of filters.
Returns: The next sibling to the right that matches all filters or None.

fetch_or_create_by_xpath(expression: str, namespaces: Union[Namespaces, None, Mapping[Optional[str], str]] = None) → TagNode

Fetches a single node that is locatable by the provided XPath expression. If the node doesn’t exist, the missing branch is created. The expression must obey these rules:

All location steps must use the child axis.
Each step needs to provide a name test.
Attributes must be compared against a literal.
Multiple attribute comparisons must be joined with the and operator and/or be spread over more than one predicate expression.
The logical validity of multiple attribute comparisons isn’t checked. E.g. one could provide foo[@p="her"][@p="him"], but the resulting behaviour is undefined.
Other contents in predicate expressions are invalid.

>>> document = Document("<root/>")
>>> grandchild = document.root.fetch_or_create_by_xpath(
...     "child[@a='b']/grandchild"
... )
>>> grandchild is document.root.fetch_or_create_by_xpath(
...     "child[@a='b']/grandchild"
... )
True
>>> str(document)
'<root><child a="b"><grandchild/></child></root>'

Parameters

expression – An XPath expression that can unambiguously locate a descendant node in a tree, regardless of the tree’s current state.
namespaces – An optional mapping of prefixes to namespaces. By default, the node’s namespaces are used.

Returns

The existing or freshly created node described with expression.
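The fetch-or-create rules above can be illustrated with a simplified stand-in built on xml.etree.ElementTree. This sketch is hypothetical and not delb's implementation: fetch_or_create, STEP and PREDICATE are illustrative names, namespaces are ignored, and only child-axis steps with a name test plus attribute predicates written as separate [@name='literal'] expressions (no and operator) are understood.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical patterns: a step is a name test followed by zero or more
# [@attribute='literal'] predicates; anything else is rejected
STEP = re.compile(r"(?P<name>[\w.-]+)(?P<predicates>(?:\[@[\w.-]+='[^']*'\])*)")
PREDICATE = re.compile(r"\[@([\w.-]+)='([^']*)'\]")


def fetch_or_create(root: ET.Element, expression: str) -> ET.Element:
    node = root
    for step in expression.split("/"):  # every step uses the child axis
        match = STEP.fullmatch(step)
        if match is None:
            raise ValueError(f"Unsupported location step: {step!r}")
        name = match["name"]
        attributes = dict(PREDICATE.findall(match["predicates"]))
        for child in node:
            if child.tag == name and all(
                child.get(key) == value for key, value in attributes.items()
            ):
                node = child  # an existing child satisfies the step
                break
        else:
            # no matching child exists, so the missing branch is created
            node = ET.SubElement(node, name, attributes)
    return node


root = ET.Element("root")
grandchild = fetch_or_create(root, "child[@a='b']/grandchild")
print(grandchild is fetch_or_create(root, "child[@a='b']/grandchild"))  # True
print(ET.tostring(root, encoding="unicode"))
# <root><child a="b"><grandchild /></child></root>
```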

fetch_preceding(*filter: _delb.typing.Filter) → Optional[NodeBase]

Parameters: filter – Any number of filters.
Returns: The previous node in document order that matches all filters or None.

fetch_preceding_sibling(*filter: _delb.typing.Filter) → Optional[NodeBase]

Parameters: filter – Any number of filters.
Returns: The previous sibling to the left that matches all filters or None.

property first_child: Optional[NodeBase]

The node’s first child node.

property full_text: str

The concatenated contents of all text node descendants in document order.

property id: Optional[str]

This is a shortcut to retrieve and set the id attribute in the XML namespace. The client code is responsible for passing properly formed id names.

property index: Optional[int]

The node’s index within the parent’s collection of child nodes or None when the node has no parent.

insert_children(index: int, *node: Union[str, NodeBase, _TagDefinition], clone: bool = False)

Inserts one or more child nodes.

The nodes can be concrete instances of any node type or abstract descriptions from which TextNode or TagNode instances are derived: strings yield text nodes and objects returned from the tag() function yield tag nodes.

Parameters

index – The index at which the first of the given nodes will be inserted, the remaining nodes are added afterwards in the given order.
node – The node(s) to be added.
clone – Clones the concrete nodes before adding if True.

iterate_ancestors(*filter: _delb.typing.Filter) → Iterator[TagNode]

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the ancestor nodes from bottom to top.
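The bottom-to-top order can be pictured with the standard library's xml.etree.ElementTree, whose elements don't reference their parents. In this hypothetical sketch (not delb's implementation), a child-to-parent lookup is built once, and the iterate_ancestors function then walks upwards from a node, yielding its parent first and the tree's root last.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterator

root = ET.fromstring("<a><b><c><d/></c></b></a>")
# ElementTree elements don't know their parent, so build a lookup first
parents: Dict[ET.Element, ET.Element] = {
    child: parent for parent in root.iter() for child in parent
}


def iterate_ancestors(node: ET.Element) -> Iterator[ET.Element]:
    # walk upwards: the parent comes first, the tree's root last
    while node in parents:
        node = parents[node]
        yield node


d = root.find(".//d")
print([n.tag for n in iterate_ancestors(d)])  # ['c', 'b', 'a']
```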

iterate_children(*filter: _delb.typing.Filter, recurse: bool = False) → Iterator[NodeBase]

Parameters

filter – Any number of filters that a node must match to be yielded.
recurse – Deprecated. Use NodeBase.iterate_descendants().

Returns

A generator iterator that yields the child nodes of the node.

iterate_descendants(*filter: _delb.typing.Filter) → Iterator[NodeBase]

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the descending nodes of the node.

iterate_following(*filter: _delb.typing.Filter) → Iterator[NodeBase]

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the following nodes in document order.

iterate_following_siblings(*filter: _delb.typing.Filter) → Iterator[NodeBase]

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the siblings to the node’s right.

iterate_preceding(*filter: _delb.typing.Filter) → Iterator[NodeBase]

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the previous nodes in document order.

iterate_preceding_siblings(*filter: _delb.typing.Filter) → Iterator[NodeBase]

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the siblings to the node’s left.

property last_child: Optional[NodeBase]

The node’s last child node.

property last_descendant: Optional[NodeBase]

The node’s last descendant.

property local_name: str

The node’s name.

property location_path: str

An unambiguous XPath location path that points to this node from its tree root.

merge_text_nodes()

Merges all consecutive text nodes in the subtree into one.
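To make the effect of merging tangible, the sketch below applies the same idea to a deliberately simplified, hypothetical tree model in which a tag's children are plain strings (text nodes) or nested lists (tag nodes). It is not delb's implementation; it merely joins each run of adjacent strings found with itertools.groupby and recurses into nested lists so that the whole subtree is covered.

```python
from itertools import groupby
from typing import List, Union

# Hypothetical tree model: str stands for a text node, list for a tag node
Child = Union[str, list]


def merge_text_nodes(children: List[Child]) -> List[Child]:
    merged: List[Child] = []
    # group runs of adjacent children by whether they are text
    for is_text, run in groupby(children, key=lambda c: isinstance(c, str)):
        if is_text:
            merged.append("".join(run))
        else:
            # recurse into tag nodes so that the whole subtree gets merged
            merged.extend(merge_text_nodes(child) for child in run)
    return merged


print(merge_text_nodes(["a", "b", ["c", "d"], "e"]))  # ['ab', ['cd'], 'e']
```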

property namespace: Optional[str]

The node’s namespace. Be aware that while this property can be set to None, serializations will continue to render a previous default namespace declaration if the node had one.

property namespaces: Namespaces

The prefix to namespace mapping of the node.

new_tag_node(local_name: str, attributes: Optional[Dict[str, str]] = None, namespace: Optional[str] = None, children: Sequence[Union[str, NodeBase, _TagDefinition]] = ()) → TagNode

Creates a new TagNode instance in the node’s context.

Parameters

local_name – The tag name.
attributes – Optional attributes that are assigned to the new node.
namespace – An optional tag namespace. If none is provided, the context node’s namespace is inherited.
children – An optional sequence of objects that will be appended as child nodes. These can be existing nodes, strings that will be inserted as text nodes, and in-place definitions of TagNode instances from tag(). The latter are assigned to the same namespace.

Returns

The newly created tag node.

property parent: Optional[TagNode]

The node’s parent or None.

static parse(text: AnyStr, parser: Optional[XMLParser] = None, parser_options: Optional[ParserOptions] = None, collapse_whitespace: Optional[bool] = None) → TagNode

Parses the given string or bytes sequence into a new tree.

Parameters

text – A serialized XML tree.
parser – Deprecated.
parser_options – A delb.ParserOptions instance to configure the used parser.
collapse_whitespace – Deprecated. Use the argument with the same name on the parser_options object.

property prefix: Optional[str]

The prefix that the node’s namespace is currently mapped to.

prepend_children(*node: NodeBase, clone: bool = False) → None

Adds one or more nodes as child nodes before any existing child nodes of the node this method is called on.

The nodes can be concrete instances of any node type or abstract descriptions from which TextNode or TagNode instances are derived: strings yield text nodes and objects returned from the tag() function yield tag nodes.

Parameters

node – The node(s) to be added.
clone – Clones the concrete nodes before adding if True.

replace_with(node: Union[str, NodeBase, _TagDefinition], clone: bool = False) → NodeBase

Removes the node and places the given one in its tree location.

The node can be a concrete instance of any node type or an abstract description from which a TextNode or TagNode instance is derived: a string yields a text node and an object returned from the tag() function yields a tag node.

Parameters

node – The replacing node.
clone – A concrete, replacing node is cloned if True.

Returns

The removed node.

property universal_name: str

The node’s qualified name in Clark notation.
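Clark notation writes a name as {namespace}local_name, or just local_name when there is no namespace. The standard library's xml.etree.ElementTree uses the same notation for its tag names, which makes it handy for a quick illustration; the split_clark helper is a hypothetical sketch, not part of delb's API.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def split_clark(universal_name: str) -> Tuple[Optional[str], str]:
    # "{namespace}local" -> ("namespace", "local"); "local" -> (None, "local")
    if universal_name.startswith("{"):
        namespace, local_name = universal_name[1:].split("}", 1)
        return namespace, local_name
    return None, universal_name


element = ET.fromstring('<x:root xmlns:x="http://example.org/ns"/>')
print(element.tag)  # {http://example.org/ns}root
print(split_clark(element.tag))  # ('http://example.org/ns', 'root')
```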

xpath(expression: str, namespaces: Optional[Namespaces] = None) → QueryResults

See Queries with XPath & CSS for details on the extent of the XPath implementation.

Parameters

expression – A supported XPath 1.0 expression that contains one or more location paths.
namespaces – A mapping of prefixes that are used in the expression to namespaces. If omitted, the node’s definition is used.

Returns

All nodes that match the evaluation of the provided XPath expression.

Tag attribute

class delb.nodes.Attribute(attributes: TagAttributes, key: str)

Attribute objects represent a tag node’s attributes. See the delb.TagNode.attributes documentation for their capabilities.

property local_name: str

The attribute’s local name.

property namespace: Optional[str]

The attribute’s namespace.

property universal_name: str

The attribute’s namespace and local name in Clark notation.

property value: str

The attribute’s value.

Text

class delb.TextNode(reference_or_text: Union[_Element, str, TextNode], position: int = 0)

TextNodes contain the textual data of a document. Client code shouldn’t instantiate this class directly; instead, strings can simply be added to trees.

Instances expose all methods of str except str.index():

>>> node = TextNode("Show us the way to the next whisky bar.")
>>> node.split()
['Show', 'us', 'the', 'way', 'to', 'the', 'next', 'whisky', 'bar.']

Instances can be tested for equality with other text nodes and strings:

>>> TextNode("ham") == TextNode("spam")
False
>>> TextNode("Patsy") == "Patsy"
True

And they can be tested for substrings:

>>> "Sir" in TextNode("Sir Bedevere the Wise")
True

Attributes that relate to child nodes are None, and methods that iterate over child nodes yield nothing.
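Why str.index() is the one exception becomes plausible with a sketch: a node's index property (its position among its siblings) presumably occupies that name. The SketchTextNode class below is hypothetical and not how delb implements TextNode; it builds on collections.UserString, which forwards str methods, membership tests and comparisons to the wrapped string, and then shadows index with a property.

```python
from collections import UserString
from typing import List, Optional


class SketchTextNode(UserString):
    """Hypothetical str-like node; not delb's TextNode."""

    def __init__(self, content: str, siblings: Optional[List["SketchTextNode"]] = None):
        super().__init__(content)
        self._siblings = siblings

    @property
    def index(self) -> Optional[int]:
        # the node's position among its siblings shadows str.index()
        if self._siblings is None:
            return None
        return next(i for i, s in enumerate(self._siblings) if s is self)


node = SketchTextNode("Sir Bedevere the Wise")
print("Sir" in node)  # True
print(node == "Sir Bedevere the Wise")  # True
print(node.split())  # ['Sir', 'Bedevere', 'the', 'Wise']
print(node.index)  # None

siblings: List[SketchTextNode] = []
ham, spam = SketchTextNode("ham", siblings), SketchTextNode("spam", siblings)
siblings.extend([ham, spam])
print(spam.index)  # 1
```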

Properties

`content`	The node's text content.
`depth`	The depth (or level) of the node in its tree.
`document`	The `Document` instance that the node is associated with or `None`.
`first_child`	Always `None`, as text nodes have no child nodes.
`full_text`	The concatenated contents of all text node descendants in document order.
`index`	The node's index within the parent's collection of child nodes or `None` when the node has no parent.
`last_child`	Always `None`, as text nodes have no child nodes.
`last_descendant`	Always `None`, as text nodes have no descendants.
`namespaces`	The prefix to namespace mapping of the node.
`parent`	The node's parent or `None`.

Fetching a single relative node

`fetch_following`(*filter)	The next node in document order that matches all filters or `None`.
`fetch_following_sibling`(*filter)	The next sibling to the right that matches all filters or `None`.
`fetch_preceding`(*filter)	The previous node in document order that matches all filters or `None`.
`fetch_preceding_sibling`(*filter)	The previous sibling to the left that matches all filters or `None`.

Iterating over relative nodes

`iterate_ancestors`(*filter)	A generator iterator that yields the ancestor nodes from bottom to top.
`iterate_children`(*filter[, recurse])	A generator iterator that yields nothing.
`iterate_descendants`(*filter)	A generator iterator that yields the descending nodes of the node.
`iterate_following`(*filter)	A generator iterator that yields the following nodes in document order.
`iterate_following_siblings`(*filter)	A generator iterator that yields the siblings to the node's right.
`iterate_preceding`(*filter)	A generator iterator that yields the previous nodes in document order.
`iterate_preceding_siblings`(*filter)	A generator iterator that yields the siblings to the node's left.

Querying nodes

`xpath`(expression[, namespaces])	See Queries with XPath & CSS for details on the extent of the XPath implementation.

Adding nodes

`add_following_siblings`(*node[, clone])	Adds one or more nodes to the right of the node this method is called on.
`add_preceding_siblings`(*node[, clone])	Adds one or more nodes to the left of the node this method is called on.

Removing a node from its tree

`detach`([retain_child_nodes])	Removes the node from its tree.
`replace_with`(node[, clone])	Removes the node and places the given one in its tree location.

Uncategorized methods

`clone`([deep, quick_and_unsafe])	Returns a copy of the node.
`new_tag_node`(local_name[, attributes, ...])	Creates a new `TagNode` instance in the node's context.

add_following_siblings(*node: Union[str, NodeBase, _TagDefinition], clone: bool = False)

Adds one or more nodes to the right of the node this method is called on.

The nodes can be concrete instances of any node type or abstract descriptions from which TextNode or TagNode instances are derived: strings yield text nodes and objects returned from the tag() function yield tag nodes.

Parameters

node – The node(s) to be added.
clone – Clones the concrete nodes before adding if True.

add_preceding_siblings(*node: Union[str, NodeBase, _TagDefinition], clone: bool = False)

Adds one or more nodes to the left of the node this method is called on.

The nodes can be concrete instances of any node type or abstract descriptions from which TextNode or TagNode instances are derived: strings yield text nodes and objects returned from the tag() function yield tag nodes.

Parameters

node – The node(s) to be added.
clone – Clones the concrete nodes before adding if True.

clone(deep: bool = False, quick_and_unsafe: bool = False) → NodeBase

Parameters

deep – Clones the whole subtree if True.
quick_and_unsafe – Creates a deep clone in a quicker manner where text nodes may get lost. It should be safe with trees that don’t contain consecutive text nodes, e.g. freshly parsed, unaltered documents or trees after TagNode.merge_text_nodes() has been applied.

Returns

A copy of the node.

property content: str

The node’s text content.

property depth: int

The depth (or level) of the node in its tree.

detach(retain_child_nodes: bool = False) → TextNode

Removes the node from its tree.

Parameters: retain_child_nodes – Keeps the node’s descendants in the originating tree if True.
Returns: The removed node.

property document: Optional[Document]

The Document instance that the node is associated with or None.

fetch_following(*filter: _delb.typing.Filter) → Optional[NodeBase]

Parameters: filter – Any number of filters.
Returns: The next node in document order that matches all filters or None.

fetch_following_sibling(*filter: _delb.typing.Filter) → Optional[NodeBase]

Parameters: filter – Any number of filters.
Returns: The next sibling to the right that matches all filters or None.

fetch_preceding(*filter: _delb.typing.Filter) → Optional[NodeBase]

Parameters: filter – Any number of filters.
Returns: The previous node in document order that matches all filters or None.

fetch_preceding_sibling(*filter: _delb.typing.Filter) → Optional[NodeBase]

Parameters: filter – Any number of filters.
Returns: The previous sibling to the left that matches all filters or None.

first_child = None

property full_text: str

The concatenated contents of all text node descendants in document order.

property index: Optional[int]

The node’s index within the parent’s collection of child nodes or None when the node has no parent.

iterate_ancestors(*filter: _delb.typing.Filter) → Iterator[TagNode]

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the ancestor nodes from bottom to top.

iterate_children(*filter: _delb.typing.Filter, recurse: bool = False) → Iterator[NodeBase]

A generator iterator that yields nothing.

iterate_descendants(*filter: _delb.typing.Filter) → Iterator[NodeBase]

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the descending nodes of the node.

iterate_following(*filter: _delb.typing.Filter) → Iterator[NodeBase]

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the following nodes in document order.

iterate_following_siblings(*filter: _delb.typing.Filter) → Iterator[NodeBase]

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the siblings to the node’s right.

iterate_preceding(*filter: _delb.typing.Filter) → Iterator[NodeBase]

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the previous nodes in document order.

iterate_preceding_siblings(*filter: _delb.typing.Filter) → Iterator[NodeBase]

Parameters: filter – Any number of filters that a node must match to be yielded.
Returns: A generator iterator that yields the siblings to the node’s left.

last_child = None

last_descendant = None

property namespaces

The prefix to namespace mapping of the node.

new_tag_node(local_name: str, attributes: Optional[Dict[str, str]] = None, namespace: Optional[str] = None, children: Sequence[Union[str, NodeBase, _TagDefinition]] = ()) → TagNode

Creates a new TagNode instance in the node’s context.

Parameters

local_name – The tag name.
attributes – Optional attributes that are assigned to the new node.
namespace – An optional tag namespace. If none is provided, the context node’s namespace is inherited.
children – An optional sequence of objects that will be appended as child nodes. These can be existing nodes, strings that will be inserted as text nodes, and in-place definitions of TagNode instances from tag(). The latter are assigned to the same namespace.

Returns

The newly created tag node.

property parent: Optional[TagNode]

The node’s parent or None.

replace_with(node: Union[str, NodeBase, _TagDefinition], clone: bool = False) → NodeBase

Removes the node and places the given one in its tree location.

The node can be a concrete instance of any node type or an abstract description from which a TextNode or TagNode instance is derived: a string yields a text node and an object returned from the tag() function yields a tag node.

Parameters

node – The replacing node.
clone – A concrete, replacing node is cloned if True.

Returns

The removed node.

xpath(expression: str, namespaces: Optional[Namespaces] = None) → QueryResults

See Queries with XPath & CSS for details on the extent of the XPath implementation.

Parameters

expression – A supported XPath 1.0 expression that contains one or more location paths.
namespaces – A mapping of prefixes that are used in the expression to namespaces. If omitted, the node’s definition is used.

Returns

All nodes that match the evaluation of the provided XPath expression.

Node constructors

delb.new_comment_node(content: str) → CommentNode

Creates a new CommentNode.

Parameters: content – The comment’s content, a.k.a. its text.
Returns: The newly created comment node.

delb.new_processing_instruction_node(target: str, content: str) → ProcessingInstructionNode

Creates a new ProcessingInstructionNode.

Parameters

target – The processing instruction’s target name.
content – The processing instruction’s text.

Returns

The newly created processing instruction node.

delb.new_tag_node(local_name: str, attributes: Optional[Dict[str, str]] = None, namespace: Optional[str] = None, children: Sequence[Union[str, NodeBase, _TagDefinition]] = ()) → TagNode

Creates a new TagNode instance outside of any context. Where possible, prefer the new_tag_node() methods of document and node instances, which use the respective instance as the creation context.

Parameters

local_name – The tag name.
attributes – Optional attributes that are assigned to the new node.
namespace – An optional tag namespace.
children – An optional sequence of objects that will be appended as child nodes. These can be existing nodes, strings that will be inserted as text nodes, and in-place definitions of TagNode instances from tag(). The latter are assigned to the same namespace.

Returns

The newly created tag node.

Queries with XPath & CSS

delb allows querying of nodes with CSS selector and XPath expressions. CSS selectors are converted to XPath expressions with a third-party library before evaluation and they are only supported as far as their computed XPath equivalents are supported by delb’s very own XPath implementation.

This implementation is not fully compliant with any of the W3C’s XPath specifications. It mostly covers the XPath 1.0 specs, but focuses on querying via path expressions with simple constraints while omitting a broad employment of computations (that’s what programming languages are for). Hence it has these intended deviations from that standard:

Default namespaces can be addressed in node and attribute names, by simply using no prefix.
The attribute and namespace axes are not supported in location steps (see also below).
In predicates only the attribute axis can be used in its abbreviated form (@name).
Path evaluations within predicates are not available.
Only these predicate functions are provided and tested:
- boolean
- concat
- contains
- last
- not
- position
- starts-with
- text
  Behaves as if deployed as a single step location path that only tests for the node type text. Hence it returns the contents of the context node’s first child node that is a text node, or an empty string when there is none.

Please refrain from extension requests without a proper, concrete implementation proposal.
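The semantics of text() described above can be approximated with the standard library's xml.etree.ElementTree, which stores the text before an element's first child in .text and the text following each child in that child's .tail. The first_text_child function is a hypothetical sketch, not part of delb; it visits those text slots in document order and returns the first non-empty one, or an empty string.

```python
import xml.etree.ElementTree as ET


def first_text_child(element: ET.Element) -> str:
    # .text precedes the first child, each child's .tail follows that child,
    # so this list holds the element's direct text children in document order
    candidates = [element.text] + [child.tail for child in element]
    return next((text for text in candidates if text), "")


print(repr(first_text_child(ET.fromstring("<p>a<b/>c</p>"))))  # 'a'
print(repr(first_text_child(ET.fromstring("<p><b/>c</p>"))))  # 'c'
print(repr(first_text_child(ET.fromstring("<p><b>x</b></p>"))))  # ''
```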

If you’re accustomed to retrieving attribute values with XPath expressions, employ the functionality of the higher programming language at hand like this:

>>> [x.attributes["target"] for x in root.xpath(".//foo")
...  if "target" in x.attributes]

Instead of:

>>> root.xpath(".//foo/@target")

See _delb.plugins.PluginManager.register_xpath_function() regarding the use of custom functions.

class _delb.xpath.EvaluationContext(node: NodeBase, position: int, size: int, namespaces: Namespaces)

Instances of this type are passed to XPath functions to provide contextual information.

count(value, /)

Return number of occurrences of value.

index(value, start=0, stop=9223372036854775807, /)

Return first index of value.

Raises ValueError if the value is not present.

property namespaces

A mapping of prefixes to namespaces that is used in the whole evaluation.

property node

The node that is evaluated.

property position

The node’s position within all nodes that matched a location step’s node test in order of the step’s axis’ direction. The first position is 1.

property size

The number of all nodes that matched a location step’s node test.

class _delb.xpath.QueryResults(results: Iterable[NodeBase])

A container that includes the results of a CSS selector or XPath query with some helpers for more readable Python expressions.

as_list() → List[NodeBase]

The contained nodes as a new list.

property as_tuple: Tuple[NodeBase, ...]

The contained nodes in a tuple.

count(value) → integer

Return the number of occurrences of value.

filtered_by(*filters: _delb.typing.Filter) → QueryResults

Returns another QueryResults instance that contains all nodes filtered by the provided filters.

property first: Optional[NodeBase]

The first node from the results or None if there are none.

in_document_order() → QueryResults

Returns another QueryResults instance where the contained nodes are sorted in document order.

index(value[, start[, stop]]) → integer

Return the first index of value.

Raises ValueError if the value is not present.

Supporting start and stop arguments is optional, but recommended.

property last: Optional[NodeBase]

The last node from the results or None if there are none.

property size: int

The number of contained nodes.
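The count() and index() entries above appear to be the mixin methods that collections.abc.Sequence provides. The following hypothetical sketch (SketchResults is not delb's QueryResults) shows how a container built on that base class gets those methods from __getitem__ and __len__ alone, and how helpers like first, last and filtered_by keep call sites readable.

```python
from collections.abc import Sequence
from typing import Any, Callable, Iterable, Optional


class SketchResults(Sequence):
    """Hypothetical, simplified stand-in for a query result container."""

    def __init__(self, results: Iterable[Any]) -> None:
        self._items = tuple(results)

    def __getitem__(self, index):
        return self._items[index]

    def __len__(self) -> int:
        return len(self._items)

    @property
    def first(self) -> Optional[Any]:
        return self._items[0] if self._items else None

    @property
    def last(self) -> Optional[Any]:
        return self._items[-1] if self._items else None

    def filtered_by(self, *filters: Callable[[Any], bool]) -> "SketchResults":
        # an item is kept only if it passes every filter, like a boolean and
        return SketchResults(x for x in self._items if all(f(x) for f in filters))


results = SketchResults(["a", "bb", "ccc"])
print(results.filtered_by(lambda s: len(s) > 1).first)  # bb
# count() and index() are inherited from Sequence
print(results.count("a"), results.index("ccc"))  # 1 2
print(SketchResults([]).last)  # None
```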

Filters

Default filters

delb.altered_default_filters(*filter: _delb.typing.Filter, extend: bool = False)

This function can be used either as a context manager or as a decorator to define a set of default filters for the enclosed code block or callable. These are then applied in all operations that allow node filtering, like TagNode.fetch_following(). Note that they also affect a node’s index property and indexed access to child nodes.

>>> root = Document(
...     '<root xmlns="foo"><a/><!--x--><b/><!--y--><c/></root>'
... ).root
>>> with altered_default_filters(is_comment_node):
...     print([x.content for x in root.iterate_children()])
['x', 'y']

As the default filters hide comments and processing instructions, call this function without arguments to lift that restriction and access all types of nodes.

Parameters: extend – Extends the currently active filters with the given ones instead of replacing them.

Contributed filters

delb.any_of(*filter: _delb.typing.Filter) → _delb.typing.Filter

A node filter wrapper that matches when any of the given filters is matching, like a boolean or.

delb.is_comment_node(node: NodeBase) → bool

A node filter that matches CommentNode instances.

delb.is_processing_instruction_node(node: NodeBase) → bool

A node filter that matches ProcessingInstructionNode instances.

delb.is_tag_node(node: NodeBase) → bool

A node filter that matches TagNode instances.

delb.is_text_node(node: NodeBase) → bool

A node filter that matches TextNode instances.

delb.not_(*filter: _delb.typing.Filter) → _delb.typing.Filter

A node filter wrapper that matches when the given filter is not matching, like a boolean not.
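Filters are plain callables that take a node and return a bool, so wrappers like any_of and not_ can be expressed as closures. This is a minimal sketch, not delb's source: the functions below are stand-ins that operate on arbitrary Python objects, is_str and is_int are hypothetical filters, and this not_ only accepts a single filter.

```python
from typing import Callable

# Hypothetical stand-in for _delb.typing.Filter
Filter = Callable[[object], bool]


def any_of(*filters: Filter) -> Filter:
    def wrapper(node: object) -> bool:
        # like a boolean or: one matching filter suffices
        return any(f(node) for f in filters)

    return wrapper


def not_(filter_: Filter) -> Filter:
    def wrapper(node: object) -> bool:
        # like a boolean not: invert the wrapped filter's verdict
        return not filter_(node)

    return wrapper


def is_str(node: object) -> bool:
    return isinstance(node, str)


def is_int(node: object) -> bool:
    return isinstance(node, int)


values = ["a", 1, 1.5]
print([v for v in values if any_of(is_str, is_int)(v)])  # ['a', 1]
print([v for v in values if not_(is_str)(v)])  # [1, 1.5]
```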

Transformations

This module offers a canonical interface with the aim to make re-use of transforming algorithms easier.

Let’s look at it with examples:

from delb.transform import Transformation


class ResolveCopyOf(Transformation):
    def transform(self):
        for node in self.root.css_select("*[copyOf]"):
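            # copyOf points to an xml:id like "#some-id", hence the [1:] below
            source_id = node.attributes["copyOf"].value
            source_node = self.origin_document.xpath(
                f'//*[@xml:id="{source_id[1:]}"]'
            ).first
            if source_node is None:
                continue  # leave unresolvable references untouched
            cloned_node = source_node.clone(deep=True)
            cloned_node.id = None
            node.replace_with(cloned_node)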

Instances of such transformations can be called with a (sub-)tree and, optionally, the document that tree originates from:

resolve_copy_of = ResolveCopyOf()
tree = resolve_copy_of(tree)  # where tree is an instance of TagNode

Subclasses of typing.NamedTuple are used to define options for transformations:

from typing import NamedTuple


class ResolveChoiceOptions(NamedTuple):
    corr: bool = True
    reg: bool = True


class ResolveChoice(Transformation):
    options_class = ResolveChoiceOptions

    def __init__(self, options=None):
        # options is optional so that ResolveChoice() works with the defaults.
        super().__init__(options)
        self.keep_selector = ",".join(
            (
                "corr" if self.options.corr else "sic",
                "reg" if self.options.reg else "orig"
            )
        )
        self.drop_selector = ",".join(
            (
                "sic" if self.options.corr else "corr",
                "orig" if self.options.reg else "reg"
            )
        )

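    def transform(self):
        for choice_node in self.root.css_select("choice"):
            # Remove the unwanted alternative together with its content.
            node_to_drop = choice_node.css_select(self.drop_selector).first
            node_to_drop.detach()

            # Unwrap the wanted alternative, keeping its child nodes in place.
            node_to_keep = choice_node.css_select(self.keep_selector).first
            node_to_keep.detach(retain_child_nodes=True)

            # Finally, unwrap the choice element itself.
            choice_node.detach(retain_child_nodes=True)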

A transformation class that defines an options_class attribute can then be used either with its defaults or with alternate options:

resolve_choice = ResolveChoice()
tree = resolve_choice(tree)

resolve_choice = ResolveChoice(ResolveChoiceOptions(reg=False))
tree = resolve_choice(tree)
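The options mechanism relies on ordinary NamedTuple behavior: calling the options class without arguments yields its defaults, and keyword arguments override individual fields. This standalone sketch reproduces only the selector logic of the ResolveChoice example above (the helper name selectors_for is hypothetical) to show which elements are kept and which are dropped for a given set of options.

```python
from typing import NamedTuple, Tuple


class ResolveChoiceOptions(NamedTuple):
    corr: bool = True
    reg: bool = True


def selectors_for(options: ResolveChoiceOptions) -> Tuple[str, str]:
    # Mirrors the keep/drop selector construction in ResolveChoice.__init__.
    keep = ",".join(
        ("corr" if options.corr else "sic", "reg" if options.reg else "orig")
    )
    drop = ",".join(
        ("sic" if options.corr else "corr", "orig" if options.reg else "reg")
    )
    return keep, drop


print(selectors_for(ResolveChoiceOptions()))           # ('corr,reg', 'sic,orig')
print(selectors_for(ResolveChoiceOptions(reg=False)))  # ('corr,orig', 'sic,reg')
```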

Finally, concrete transformations can be chained, given either as classes or as instances. The interface also allows chaining multiple chains:

from delb.transform import TransformationSequence

tidy_up = TransformationSequence(ResolveCopyOf, resolve_choice)
tree = tidy_up(tree)

Attention

This is an experimental feature. It might change significantly in the future or be removed altogether.

class delb.transform.Transformation(options: Optional[NamedTuple] = None)

This is a base class for any transformation algorithm.

abstract transform()

This method must implement the transformation logic. When it is called, the instance has two attributes assigned from that call:

root is the node that the transformation was called with. origin_document is the document that was optionally passed as the second argument.

class delb.transform.TransformationBase: This base class defines the calling interface of transformations.

class delb.transform.TransformationSequence(*transformations: Union[TransformationBase, Type[TransformationBase]]): A transformation sequence can combine any number of Transformation subclasses (provided as classes or as instances configured with options) and other TransformationSequence instances or classes.
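The calling interface described above can be modeled in a few lines. In this self-contained sketch, calling an instance stores root and origin_document as attributes, runs transform() and returns the root, while a sequence instantiates any classes it receives with their default options and threads the result through each step. All *Sketch names and the toy list-of-strings "tree" are hypothetical; delb's actual classes operate on TagNode trees and may differ in detail.

```python
from typing import Any, NamedTuple, Optional


class TransformationBaseSketch:
    # Hypothetical stand-in for the calling interface: instances are callables.
    def __call__(self, root: Any, origin_document: Optional[Any] = None) -> Any:
        raise NotImplementedError


class TransformationSketch(TransformationBaseSketch):
    options_class: Optional[type] = None

    def __init__(self, options: Optional[tuple] = None):
        # Fall back to the defaults of options_class when no options are given.
        if options is None and self.options_class is not None:
            options = self.options_class()
        self.options = options

    def __call__(self, root: Any, origin_document: Optional[Any] = None) -> Any:
        # Expose the call arguments as attributes, then run the hook.
        self.root = root
        self.origin_document = origin_document
        self.transform()
        return self.root

    def transform(self) -> None:
        raise NotImplementedError


class TransformationSequenceSketch(TransformationBaseSketch):
    def __init__(self, *transformations: Any):
        # Classes are instantiated with default options; instances are kept as-is.
        self.transformations = [
            t() if isinstance(t, type) else t for t in transformations
        ]

    def __call__(self, root: Any, origin_document: Optional[Any] = None) -> Any:
        # Each step receives the result of the previous one.
        for transformation in self.transformations:
            root = transformation(root, origin_document)
        return root


# Toy transformations that operate on a list of strings instead of a TagNode.
class Upper(TransformationSketch):
    def transform(self) -> None:
        self.root[:] = [s.upper() for s in self.root]


class SuffixOptions(NamedTuple):
    suffix: str = "!"


class AddSuffix(TransformationSketch):
    options_class = SuffixOptions

    def transform(self) -> None:
        self.root[:] = [s + self.options.suffix for s in self.root]


inner = TransformationSequenceSketch(Upper, AddSuffix(SuffixOptions("?")))
outer = TransformationSequenceSketch(inner, AddSuffix)
print(outer(["a", "b"]))  # ['A?!', 'B?!']
```

Instantiating classes when the sequence is constructed is one way to honor the "classes or instances" wording; whether delb does this eagerly or per call is not specified here.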

Various helpers

delb.first(iterable: Iterable) → Optional[Any]: Returns the first item of the given iterable or None if it’s empty. Note that the first item is consumed when the iterable is an iterator.
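A plain-Python equivalent of these semantics shows why an iterator loses its first item while a list does not. first_sketch is a hypothetical name, not delb's code.

```python
from typing import Any, Iterable, Optional


def first_sketch(iterable: Iterable) -> Optional[Any]:
    # next() with a default returns None for an empty iterable.
    return next(iter(iterable), None)


numbers = iter([1, 2, 3])
print(first_sketch(numbers))  # 1
print(list(numbers))          # [2, 3]
print(first_sketch([]))       # None
```

Calling iter() on a list creates a fresh iterator, so the list itself is left untouched.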

delb.get_traverser(from_left=True, depth_first=True, from_top=True)

Returns a function that can be used to traverse a (sub)tree with the given node as root. While traversing, the given root node itself is also yielded at some point.

The returned functions have this signature:

def traverser(root: NodeBase, *filters: Filter) -> Iterator[NodeBase]:
    ...

Parameters

from_left – If True, the traverser yields sibling nodes from left to right; if False, it starts from the right.
depth_first – If True, the traverser yields the child nodes (respectively, the parent node) of a node before its siblings. If False, siblings are favored.
from_top – If True, the traverser starts by yielding the nodes with the lowest depth. If False, the opposite is in effect and it starts with the deepest nodes.
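To make the effect of the three flags concrete, here is a self-contained sketch over a hypothetical ToyNode class. It uses a depth-first pre-order walk for from_top=True, a post-order walk for from_top=False, and a level-order walk when depth_first is False. This is one plausible reading of the parameter descriptions, not delb's implementation; delb's exact ordering for some flag combinations may differ.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List


@dataclass
class ToyNode:
    # Hypothetical minimal node: a name and an ordered list of children.
    name: str
    children: List["ToyNode"] = field(default_factory=list)


Filter = Callable[[ToyNode], bool]


def get_toy_traverser(from_left=True, depth_first=True, from_top=True):
    def ordered(children: List[ToyNode]) -> List[ToyNode]:
        # Sibling order depends on from_left.
        return children if from_left else children[::-1]

    def walk_depth_first(node: ToyNode) -> Iterator[ToyNode]:
        # Pre-order when starting from the top, post-order otherwise.
        if from_top:
            yield node
        for child in ordered(node.children):
            yield from walk_depth_first(child)
        if not from_top:
            yield node

    def walk_breadth_first(root: ToyNode) -> Iterator[ToyNode]:
        # Collect the tree level by level, then emit shallowest or deepest first.
        levels = []
        level = [root]
        while level:
            levels.append(level)
            level = [child for node in level for child in ordered(node.children)]
        for current in (levels if from_top else reversed(levels)):
            yield from current

    def traverser(root: ToyNode, *filters: Filter) -> Iterator[ToyNode]:
        walk = walk_depth_first if depth_first else walk_breadth_first
        for node in walk(root):
            # Assumed here: a node is yielded only if all filters match.
            if all(f(node) for f in filters):
                yield node

    return traverser


tree = ToyNode("a", [ToyNode("b", [ToyNode("d"), ToyNode("e")]), ToyNode("c")])


def names(traverser, *filters):
    return [node.name for node in traverser(tree, *filters)]


print(names(get_toy_traverser()))                    # ['a', 'b', 'd', 'e', 'c']
print(names(get_toy_traverser(from_left=False)))     # ['a', 'c', 'b', 'e', 'd']
print(names(get_toy_traverser(from_top=False)))      # ['d', 'e', 'b', 'c', 'a']
print(names(get_toy_traverser(depth_first=False)))   # ['a', 'b', 'c', 'd', 'e']
```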

delb.last(iterable: Iterable) → Optional[Any]: Returns the last item of the given iterable or None if it’s empty. Note that an iterator, if given, is consumed entirely.
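A plain-Python equivalent illustrates why the whole iterator is consumed: the last item is only known once iteration has finished. last_sketch is a hypothetical name, not delb's code.

```python
from collections import deque
from typing import Any, Iterable, Optional


def last_sketch(iterable: Iterable) -> Optional[Any]:
    # A deque with maxlen=1 keeps only the most recent item while iterating.
    tail = deque(iterable, maxlen=1)
    return tail[0] if tail else None


numbers = iter([1, 2, 3])
print(last_sketch(numbers))  # 3
print(list(numbers))         # []
print(last_sketch([]))       # None
```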

delb.register_namespace(prefix: str, namespace: str)

Registers a namespace prefix that newly created TagNode instances in that namespace will use in serializations.

The registry is global, and any existing mapping for either the given prefix or the given namespace URI is removed. It has, however, no effect on the serialization of existing nodes; see Document.cleanup_namespace() for that.

Parameters

prefix – The prefix to register.
namespace – The targeted namespace.

delb.tag(local_name: str)

delb.tag(local_name: str, attributes: Mapping[str, str])

delb.tag(local_name: str, child: Union[str, NodeBase, _TagDefinition])

delb.tag(local_name: str, children: Sequence[Union[str, NodeBase, _TagDefinition]])

delb.tag(local_name: str, attributes: Mapping[str, str], child: Union[str, NodeBase, _TagDefinition])

delb.tag(local_name: str, attributes: Mapping[str, str], children: Sequence[Union[str, NodeBase, _TagDefinition]])

This function can be used for the in-place creation (or templating, if you will) of TagNode instances for use as:

node argument to methods that add nodes to a tree
items in the children argument of new_tag_node() and NodeBase.new_tag_node()

The first argument to the function is always the local name of the tag node. Optionally, the second argument can be a mapping that specifies attributes for that node. The optional last argument is either a single object that will be appended as a child node or a sequence of such objects. These objects can be node instances of any type, strings (from which TextNode instances are derived) or other definitions from this function (from which TagNode instances are derived).

The actual nodes that are constructed always inherit the namespace of the context node they are created in.

>>> root = new_tag_node('root', children=[
...     tag("head", {"lvl": "1"}, "Hello!"),
...     tag("items", (
...         tag("item1"),
...         tag("item2"),
...         )
...     )
... ])
>>> str(root)
'<root><head lvl="1">Hello!</head><items><item1/><item2/></items></root>'
>>> root.append_children(tag("addendum"))
>>> str(root)[-26:]
'</items><addendum/></root>'
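The overloads listed above differ only in which optional arguments are present. The following sketch shows how such a call could be normalized into a single definition record. TagDefinitionSketch and tag_sketch are hypothetical names, and it is an assumption that delb distinguishes the cases by type in a similar way; here a mapping in second position holds attributes, a list or tuple holds several children, and anything else (including a string) is one child.

```python
from collections.abc import Mapping
from typing import Any, NamedTuple, Tuple


class TagDefinitionSketch(NamedTuple):
    # Hypothetical counterpart of delb's private _TagDefinition.
    local_name: str
    attributes: Tuple[Tuple[str, str], ...]
    children: Tuple[Any, ...]


def tag_sketch(local_name: str, *args: Any) -> TagDefinitionSketch:
    attributes: Mapping = {}
    if args and isinstance(args[0], Mapping):
        # An optional mapping in second position holds the attributes.
        attributes, args = args[0], args[1:]
    if len(args) > 1:
        raise TypeError("expected at most one child or sequence of children")
    children: Tuple[Any, ...] = ()
    if args:
        child_or_children = args[0]
        # Lists and tuples hold several children; a definition is itself a
        # tuple, so it is checked explicitly and treated as a single child.
        if isinstance(child_or_children, (list, tuple)) and not isinstance(
            child_or_children, TagDefinitionSketch
        ):
            children = tuple(child_or_children)
        else:
            children = (child_or_children,)
    return TagDefinitionSketch(local_name, tuple(attributes.items()), children)


print(tag_sketch("head", {"lvl": "1"}, "Hello!"))
# TagDefinitionSketch(local_name='head', attributes=(('lvl', '1'),), children=('Hello!',))
```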

Exceptions

exception delb.exceptions.AmbiguousTreeError(message: str)

Raised when a single node is to be fetched or created by an XPath expression in a tree where the target position can’t be determined unambiguously.

with_traceback(): Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception delb.exceptions.DelbBaseException

exception delb.exceptions.FailedDocumentLoading(source: Any, excuses: Dict[Callable[[Any, SimpleNamespace], Union[_ElementTree, str]], Union[str, Exception]])

exception delb.exceptions.InvalidCodePath

Raised when a code path that is not expected to be executed is reached.

exception delb.exceptions.InvalidOperation

Raised when an invalid operation is attempted by the client code.

exception delb.exceptions.XPathEvaluationError(message: str)

exception delb.exceptions.XPathParsingError(expression: Optional[str] = None, position: Optional[int] = None, message: Optional[str] = None)

Raised when an XPath expression can’t be parsed.

exception delb.exceptions.XPathUnsupportedStandardFeature(position: int, feature_description: str)

Raised when an unsupported XPath expression feature is recognized.