API Documentation
Note
There are actually two packages that are installed with delb: delb and _delb. As the underscore indicates, the latter exposes the private parts of the API, while the former re-exposes what is deemed public from it, plus additional contents.
As a rule of thumb, use the public API in applications and the private API in delb extensions. By doing so, you can avoid circular dependencies if your extension (or other code that it depends on) uses contents from the _delb package.
Documents
- class delb.Document(source, collapse_whitespace=None, parser=None, parser_options=None, klass=None, **config_options)[source]¶
This class is the entry point for obtaining a representation of an XML encoded text document. Any object can be passed for instantiation, provided that a suitable loader is available for the given source. See Document loaders for the default loaders that come with this package. Plugins can alter the available loaders, see Extending delb.
Nodes can be tested for membership in a document:
>>> document = Document("<root>text</root>")
>>> text_node = document.root[0]
>>> text_node in document
True
>>> text_node.clone() in document
False
The string coercion of a document yields an XML encoded stream as string. Its appearance can be configured via DefaultStringOptions.
>>> document = Document("<root/>")
>>> str(document)
"<?xml version='1.0' encoding='UTF-8'?><root/>"
- Parameters:
source – Anything that the configured loaders can make sense of to return a parsed document tree.
collapse_whitespace – Deprecated. Use the argument with the same name on the parser_options object.
parser – Deprecated.
parser_options – A delb.ParserOptions instance to configure the used parser.
klass – Explicitly define the initialized class. This can be useful for applications that have default document subclasses in use.
config_options – Additional keyword arguments for the configuration of extension classes.
Properties
config – Beside the used parser and collapsed_whitespace option, this property contains the namespaced data that extension classes and loaders may have stored.
head_nodes – A list-like accessor to the nodes that precede the document's root node.
namespaces – The namespace mapping of the document's root node.
root – The root node of a document tree.
source_url – The source URL where a loader obtained the document's contents or None.
tail_nodes – A list-like accessor to the nodes that follow the document's root node.
Uncategorized methods
clone() – Returns another instance with the duplicated contents.
collapse_whitespace() – Collapses whitespace as described here: https://wiki.tei-c.org/index.php/XML_Whitespace#Recommendations
css_select(expression[, namespaces]) – This method proxies to the TagNode.css_select() method of the document's root node.
merge_text_nodes() – This method proxies to the TagNode.merge_text_nodes() method of the document's root node.
new_tag_node(local_name[, attributes, namespace]) – This method proxies to the TagNode.new_tag_node() method of the document's root node.
save(path[, pretty, encoding, ...]) – path: The filesystem path to the target file.
write(buffer[, pretty, encoding, ...]) – buffer: A file-like object that the document is written to.
xpath(expression[, namespaces]) – This method proxies to the TagNode.xpath() method of the document's root node.
xslt(transformation) – transformation: An lxml.etree.XSLT instance that shall be …
- collapse_whitespace()[source]¶
Collapses whitespace as described here: https://wiki.tei-c.org/index.php/XML_Whitespace#Recommendations
Implicitly merges all neighbouring text nodes.
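As a rough illustration of what collapsing means, the following sketch merges adjacent text chunks and then reduces every run of XML whitespace characters to a single space. It is not delb's implementation: the names collapse_whitespace_runs and collapse_text_chunks are hypothetical, and the trimming rules at element boundaries from the linked recommendations are left out.

```python
import re

# The four characters that XML treats as whitespace: space, tab, CR and LF.
_XML_WHITESPACE_RUN = re.compile(r"[ \t\r\n]+")


def collapse_whitespace_runs(text):
    """Replace every run of XML whitespace with a single space."""
    return _XML_WHITESPACE_RUN.sub(" ", text)


def collapse_text_chunks(chunks):
    """Merge neighbouring text chunks first, then collapse the merged text.

    Merging before collapsing matters: a run of whitespace that spans
    two chunks must end up as one space, not two.
    """
    return collapse_whitespace_runs("".join(chunks))


sample = "  A line\n    broken   over\tseveral lines.  "
print(repr(collapse_whitespace_runs(sample)))
print(repr(collapse_text_chunks(["Hello ", "  world\n"])))
```

Merging first mirrors the note above that neighbouring text nodes are merged implicitly.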
- config: SimpleNamespace¶
Beside the used parser and collapsed_whitespace option, this property contains the namespaced data that extension classes and loaders may have stored.
- css_select(expression: str, namespaces: Optional[NamespaceDeclarations] = None) QueryResults [source]¶
This method proxies to the TagNode.css_select() method of the document’s root node.
- head_nodes¶
A list-like accessor to the nodes that precede the document’s root node. Note that nodes can’t be removed or replaced.
- merge_text_nodes()[source]¶
This method proxies to the TagNode.merge_text_nodes() method of the document’s root node.
- new_tag_node(local_name: str, attributes: Optional[dict[str, str]] = None, namespace: Optional[str] = None) TagNode [source]¶
This method proxies to the TagNode.new_tag_node() method of the document’s root node.
- save(path: Path, pretty: Optional[bool] = None, *, encoding: str = 'utf-8', align_attributes: bool = False, indentation: str = '', namespaces: Optional[NamespaceDeclarations] = None, newline: None | str = None, text_width: int = 0)[source]¶
- Parameters:
path – The filesystem path to the target file.
pretty – Deprecated. Adds indentation for human consumers when True.
encoding – The desired text encoding.
align_attributes – Determines whether attributes’ names and values line up sharply around vertically aligned equal signs.
indentation – This string prefixes descending nodes’ contents one time per depth level. A non-empty string implies line-breaks between nodes as well.
namespaces – A mapping of prefixes to namespaces. These override possible declarations from the parsed serialization that the document instance stems from. Prefixes for undeclared namespaces are enumerated with the prefix ns.
newline – See io.TextIOWrapper for a detailed explanation of the parameter with the same name.
text_width – A positive value indicates that text nodes shall get wrapped at this character position. Indentations are not considered as part of text. This parameter is intended to define reasonable widths for text displays that can be scrolled horizontally.
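The effect of an indentation string can be pictured with the standard library's xml.etree.ElementTree, which offers a comparable option in ET.indent (Python 3.9 or later). This is only an analogy and does not call delb; delb's serializer may differ in detail, for instance in how empty elements are written.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<root><a><b/></a><c>text</c></root>")

# Without indentation everything stays on a single line.
compact = ET.tostring(root, encoding="unicode")

# A non-empty indentation string is repeated once per depth level and
# also introduces line breaks between the nodes.
ET.indent(root, space="  ")
indented = ET.tostring(root, encoding="unicode")

print(compact)
print(indented)
```

The second print shows each nesting level shifted by one more copy of the two-space string.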
- tail_nodes¶
A list-like accessor to the nodes that follow the document’s root node. Note that nodes can’t be removed or replaced.
- write(buffer: BinaryIO, pretty: Optional[bool] = None, *, encoding: str = 'utf-8', align_attributes: bool = False, indentation: str = '', namespaces: Optional[NamespaceDeclarations] = None, newline: None | str, text_width: int = 0)[source]¶
- Parameters:
buffer – A file-like object that the document is written to.
pretty – Deprecated. Adds indentation for human consumers when True.
encoding – The desired text encoding.
align_attributes – Determines whether attributes’ names and values line up sharply around vertically aligned equal signs.
indentation – This string prefixes descending nodes’ contents one time per depth level. A non-empty string implies line-breaks between nodes as well.
namespaces – A mapping of prefixes to namespaces. These override possible declarations from the parsed serialization that the document instance stems from. Prefixes for undeclared namespaces are enumerated with the prefix ns.
newline – See io.TextIOWrapper for a detailed explanation of the parameter with the same name.
text_width – A positive value indicates that text nodes shall get wrapped at this character position. Indentations are not considered as part of text. This parameter is intended to define reasonable widths for text displays that can be scrolled horizontally.
- xpath(expression: str, namespaces: Optional[NamespaceDeclarations] = None) QueryResults [source]¶
This method proxies to the TagNode.xpath() method of the document’s root node.
Document loaders
If you want or need to manipulate the availability of loaders or the order in which they are attempted, you can change the delb.plugins.plugin_manager.plugins.loaders object, which is a list. Its state is reflected in your whole application. Please refer to this issue when you require finer controls over these aspects.
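Why the order of that list matters can be sketched with a generic loader chain. Everything below is hypothetical and independent of delb: the loader functions, the convention that a loader returns a string to decline an input, and the load helper are assumptions made for illustration only.

```python
from pathlib import Path


def demo_path_loader(data):
    """Hypothetical loader: accepts pathlib.Path instances only."""
    if isinstance(data, Path):
        return {"kind": "path", "value": data}
    return "The input is not a pathlib.Path instance."


def demo_text_loader(data):
    """Hypothetical loader: accepts strings only."""
    if isinstance(data, str):
        return {"kind": "text", "value": data}
    return "The input is not a string."


# A plain list: reordering or removing items changes what is attempted.
loaders = [demo_path_loader, demo_text_loader]


def load(data):
    """Try each loader in list order and return the first successful result."""
    failures = []
    for loader in loaders:
        result = loader(data)
        if not isinstance(result, str):  # assumed: a string means "declined"
            return result
        failures.append(f"{loader.__name__}: {result}")
    raise ValueError("No loader accepted the input:\n" + "\n".join(failures))


print(load("<root/>")["kind"])
print(load(Path("document.xml"))["kind"])
```

Because the list is shared module state in this sketch, a change made in one place affects every later call to load, which is the same caveat the paragraph above states for the real loaders list.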
Core
The core_loaders module provides a set of loaders to retrieve documents from various data sources.
- _delb.plugins.core_loaders.buffer_loader(data: Any, config: SimpleNamespace) LoaderResult [source]¶
This loader loads a document from a file-like object.
- _delb.plugins.core_loaders.etree_loader(data: Any, config: SimpleNamespace) LoaderResult [source]¶
This loader processes lxml.etree._Element and lxml.etree._ElementTree instances.
- _delb.plugins.core_loaders.ftp_loader(data: Any, config: SimpleNamespace) LoaderResult [source]¶
Loads a document from a URL with the ftp scheme. The URL will be bound to source_url on the document’s Document.config attribute.
- _delb.plugins.core_loaders.path_loader(data: Any, config: SimpleNamespace) LoaderResult [source]¶
This loader loads from a file that is pointed at with a pathlib.Path instance. That instance will be bound to source_path on the document’s Document.config attribute.
- _delb.plugins.core_loaders.tag_node_loader(data: Any, config: SimpleNamespace) LoaderResult [source]¶
This loader loads, or rather clones, a delb.TagNode instance and its descendant nodes.
Extra
If delb is installed with https-loader as extra, the required dependencies for this loader are installed as well. See Installation.
- _delb.plugins.https_loader.https_loader(data: Any, config: SimpleNamespace, client: httpx.Client = <httpx.Client object>) LoaderResult [source]¶
This loader loads a document from a URL with the http and https schemes. The default httpx-client follows redirects and can partially be configured with environment variables. The URL will be bound to the name source_url on the document’s Document.config attribute.
Loaders with specifically configured httpx-clients can build on this loader like so:
    import httpx
    from _delb.plugins import plugin_manager
    from _delb.plugins.https_loader import https_loader

    client = httpx.Client(follow_redirects=False, trust_env=False)

    @plugin_manager.register_loader(before=https_loader)
    def custom_https_loader(data, config):
        return https_loader(data, config, client=client)
Parser options
- class delb.ParserOptions(collapse_whitespace: bool = False, remove_comments: bool = False, remove_processing_instructions: bool = False, resolve_entities: bool = True, unplugged: bool = False)[source]¶
The configuration options that define an XML parser’s behaviour.
- Parameters:
collapse_whitespace – Collapse the content's whitespace.
remove_comments – Ignore comments.
remove_processing_instructions – Don’t include processing instructions in the parsed tree.
resolve_entities – Resolve entities.
unplugged – Don’t load referenced resources over network.
Nodes
Comment
- class delb.CommentNode(etree_element: _Element)[source]¶
The instances of this class represent comment nodes of a tree.
To instantiate new nodes use new_comment_node().
Properties
content – The comment's text.
depth – The depth (or level) of the node in its tree.
document – The Document instance that the node is associated with or None.
first_child
full_text – The concatenated contents of all text node descendants in document order.
index – The node's index within the parent's collection of child nodes or None when the node has no parent.
last_child
last_descendant
namespaces – The prefix to namespace mapping of the node.
parent – The node's parent or None.
Fetching a single relative node
fetch_following(*filter) – filter: Any number of filters.
fetch_following_sibling(*filter) – filter: Any number of filters.
fetch_preceding(*filter) – filter: Any number of filters.
fetch_preceding_sibling(*filter) – filter: Any number of filters.
Iterating over relative nodes
iterate_ancestors(*filter) – filter: Any number of filters that a node must match to be yielded.
iterate_children(*filter[, recurse]) – A generator iterator that yields nothing.
iterate_descendants(*filter) – filter: Any number of filters that a node must match to be yielded.
iterate_following(*filter) – filter: Any number of filters that a node must match to be yielded.
iterate_following_siblings(*filter) – filter: Any number of filters that a node must match to be yielded.
iterate_preceding(*filter) – filter: Any number of filters that a node must match to be yielded.
iterate_preceding_siblings(*filter) – filter: Any number of filters that a node must match to be yielded.
Querying nodes
xpath(expression[, namespaces]) – See Queries with XPath & CSS for details on the extent of the XPath implementation.
Adding nodes
add_following_siblings(*node[, clone]) – Adds one or more nodes to the right of the node this method is called on.
add_preceding_siblings(*node[, clone]) – Adds one or more nodes to the left of the node this method is called on.
Removing a node from its tree
detach([retain_child_nodes]) – Removes the node from its tree.
replace_with(node[, clone]) – Removes the node and places the given one in its tree location.
Uncategorized methods
clone([deep, quick_and_unsafe]) – deep: Clones the whole subtree if True.
new_tag_node(local_name[, attributes, ...]) – Creates a new TagNode instance in the node's context.
serialize(*[, align_attributes, ...]) – Returns a string that contains the serialization of the node.
- add_following_siblings(*node: NodeSource, clone: bool = False)¶
Adds one or more nodes to the right of the node this method is called on.
The nodes can be concrete instances of any node type or rather abstract descriptions in the form of strings or objects returned from the tag() function, from which TextNode and TagNode instances are derived, respectively.
- Parameters:
node – The node(s) to be added.
clone – Clones the concrete nodes before adding if True.
- add_preceding_siblings(*node: NodeSource, clone: bool = False)¶
Adds one or more nodes to the left of the node this method is called on.
The nodes can be concrete instances of any node type or rather abstract descriptions in the form of strings or objects returned from the tag() function, from which TextNode and TagNode instances are derived, respectively.
- Parameters:
node – The node(s) to be added.
clone – Clones the concrete nodes before adding if True.
- clone(deep: bool = False, quick_and_unsafe: bool = False) _ElementWrappingNode ¶
- Parameters:
deep – Clones the whole subtree if True.
quick_and_unsafe – Creates a deep clone in a quicker manner where text nodes may get lost. It should be safe with trees that don’t contain subsequent text nodes, e.g. freshly parsed, unaltered documents or after TagNode.merge_text_nodes() has been applied.
- Returns:
A copy of the node.
- detach(retain_child_nodes: bool = False) _ElementWrappingNode ¶
Removes the node from its tree.
- Parameters:
retain_child_nodes – Keeps the node’s descendants in the originating tree if True.
- Returns:
The removed node.
- property document: Optional[Document]¶
The Document instance that the node is associated with or None.
- first_child = None¶
- property index: Optional[int]¶
The node’s index within the parent’s collection of child nodes or None when the node has no parent.
- iterate_ancestors(*filter: Filter) Iterator[TagNode] ¶
- Parameters:
filter – Any number of filters that a node must match to be yielded.
- Returns:
A generator iterator that yields the ancestor nodes from bottom to top.
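The bottom-to-top order can be pictured with a small sketch built on the standard library's xml.etree.ElementTree instead of delb. The helper ancestors_bottom_up and the parent map are hypothetical, and treating filters as plain predicates that must all return True is an assumption made for this illustration.

```python
import xml.etree.ElementTree as ET


def ancestors_bottom_up(node, parent_of, *filters):
    """Yield the ancestors of ``node``, nearest first, that pass all filters."""
    current = parent_of.get(node)
    while current is not None:
        # Assumption: a filter is a callable that takes a node and returns a bool.
        if all(matches(current) for matches in filters):
            yield current
        current = parent_of.get(current)


root = ET.fromstring("<text><body><div><p>Hello</p></div></body></text>")
# ElementTree keeps no parent references, so a child-to-parent map is built once.
parent_of = {child: parent for parent in root.iter() for child in parent}
paragraph = root.find(".//p")

all_names = [node.tag for node in ancestors_bottom_up(paragraph, parent_of)]
without_body = [
    node.tag
    for node in ancestors_bottom_up(paragraph, parent_of, lambda n: n.tag != "body")
]
print(all_names)
print(without_body)
```

A filtered-out ancestor is skipped, but the walk continues past it towards the root.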
- iterate_children(*filter: Filter, recurse: bool = False) Iterator[NodeBase] ¶
A generator iterator that yields nothing.
- iterate_descendants(*filter: Filter) Iterator[NodeBase] ¶
- Parameters:
filter – Any number of filters that a node must match to be yielded.
- Returns:
A generator iterator that yields the node’s descendants.
- iterate_following(*filter: Filter) Iterator[NodeBase] ¶
- Parameters:
filter – Any number of filters that a node must match to be yielded.
- Returns:
A generator iterator that yields the following nodes in document order.
- iterate_following_siblings(*filter: Filter) Iterator[NodeBase] ¶
- Parameters:
filter – Any number of filters that a node must match to be yielded.
- Returns:
A generator iterator that yields the siblings to the node’s right.
- iterate_preceding(*filter: Filter) Iterator[NodeBase] ¶
- Parameters:
filter – Any number of filters that a node must match to be yielded.
- Returns:
A generator iterator that yields the previous nodes in document order.
- iterate_preceding_siblings(*filter: Filter) Iterator[NodeBase] ¶
- Parameters:
filter – Any number of filters that a node must match to be yielded.
- Returns:
A generator iterator that yields the siblings to the node’s left.
- last_child = None¶
- last_descendant = None¶
- new_tag_node(local_name: str, attributes: Optional[dict[str, str]] = None, namespace: Optional[str] = None, children: Sequence[str | NodeBase | _TagDefinition] = ()) TagNode ¶
Creates a new TagNode instance in the node’s context.
- Parameters:
local_name – The tag name.
attributes – Optional attributes that are assigned to the new node.
namespace – An optional tag namespace. If none is provided, the context node’s namespace is inherited.
children – An optional sequence of objects that will be appended as child nodes. This can be existing nodes, strings that will be inserted as text nodes and in-place definitions of TagNode instances from tag(). The latter will be assigned to the same namespace.
- Returns:
The newly created tag node.
- replace_with(node: NodeSource, clone: bool = False) NodeBase ¶
Removes the node and places the given one in its tree location.
The node can be a concrete instance of any node type or a rather abstract description in the form of a string or an object returned from the tag() function, from which a TextNode or a TagNode instance is derived, respectively.
- Parameters:
node – The replacing node.
clone – A concrete, replacing node is cloned if True.
- Returns:
The removed node.
- serialize(*, align_attributes: bool = False, indentation: str = '', namespaces: Optional[NamespaceDeclarations] = None, newline: Optional[str] = None, text_width: int = 0)¶
Returns a string that contains the serialization of the node.
- Parameters:
align_attributes – Determines whether attributes’ names and values line up sharply around vertically aligned equal signs.
indentation – This string prefixes descending nodes’ contents one time per depth level. A non-empty string implies line-breaks between nodes as well.
namespaces – A mapping of prefixes to namespaces. These override possible declarations from the parsed serialization that the document instance stems from. Prefixes for undeclared namespaces are enumerated with the prefix ns.
newline – See io.TextIOWrapper for a detailed explanation of the parameter with the same name.
text_width – A positive value indicates that text nodes shall get wrapped at this character position. Indentations are not considered as part of text. This parameter is intended to define reasonable widths for text displays that can be scrolled horizontally.
- xpath(expression: str, namespaces: Optional[NamespaceDeclarations] = None) QueryResults ¶
See Queries with XPath & CSS for details on the extent of the XPath implementation.
- Parameters:
expression – A supported XPath 1.0 expression that contains one or more location paths.
namespaces – A mapping of prefixes that are used in the expression to namespaces. The declarations that were used in a document’s source serialization serve as fallback.
- Returns:
All nodes that match the evaluation of the provided XPath expression.
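How a prefix-to-namespace mapping takes part in evaluating an expression can be shown with the standard library's xml.etree.ElementTree, which accepts a similar mapping. This is an analogy only: it does not use delb, ElementTree supports just a subset of XPath, and the TEI sample document is made up for the illustration.

```python
import xml.etree.ElementTree as ET

TEI_NAMESPACE = "http://www.tei-c.org/ns/1.0"
source = f'<TEI xmlns="{TEI_NAMESPACE}"><text><p>one</p><p>two</p></text></TEI>'
root = ET.fromstring(source)

# The prefix is chosen freely by the caller; only the namespace it maps
# to has to match the one declared in the document.
namespaces = {"tei": TEI_NAMESPACE}
paragraphs = root.findall(".//tei:p", namespaces)

texts = [paragraph.text for paragraph in paragraphs]
print(texts)
```

The document itself declares the namespace without a prefix, so the prefix in the expression exists purely for the query.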
Processing instruction
- class delb.ProcessingInstructionNode(etree_element: _Element)[source]¶
The instances of this class represent processing instruction nodes of a tree.
To instantiate new nodes use new_processing_instruction_node().
Properties
content – The processing instruction's text.
depth – The depth (or level) of the node in its tree.
document – The Document instance that the node is associated with or None.
first_child
full_text – The concatenated contents of all text node descendants in document order.
index – The node's index within the parent's collection of child nodes or None when the node has no parent.
last_child
last_descendant
namespaces – The prefix to namespace mapping of the node.
parent – The node's parent or None.
target – The processing instruction's target.
Fetching a single relative node
fetch_following(*filter) – filter: Any number of filters.
fetch_following_sibling(*filter) – filter: Any number of filters.
fetch_preceding(*filter) – filter: Any number of filters.
fetch_preceding_sibling(*filter) – filter: Any number of filters.
Iterating over relative nodes
iterate_ancestors(*filter) – filter: Any number of filters that a node must match to be yielded.
iterate_children(*filter[, recurse]) – A generator iterator that yields nothing.
iterate_descendants(*filter) – filter: Any number of filters that a node must match to be yielded.
iterate_following(*filter) – filter: Any number of filters that a node must match to be yielded.
iterate_following_siblings(*filter) – filter: Any number of filters that a node must match to be yielded.
iterate_preceding(*filter) – filter: Any number of filters that a node must match to be yielded.
iterate_preceding_siblings(*filter) – filter: Any number of filters that a node must match to be yielded.
Querying nodes
xpath(expression[, namespaces]) – See Queries with XPath & CSS for details on the extent of the XPath implementation.
Adding nodes
add_following_siblings(*node[, clone]) – Adds one or more nodes to the right of the node this method is called on.
add_preceding_siblings(*node[, clone]) – Adds one or more nodes to the left of the node this method is called on.
Removing a node from its tree
detach([retain_child_nodes]) – Removes the node from its tree.
replace_with(node[, clone]) – Removes the node and places the given one in its tree location.
Uncategorized methods
clone([deep, quick_and_unsafe]) – deep: Clones the whole subtree if True.
new_tag_node(local_name[, attributes, ...]) – Creates a new TagNode instance in the node's context.
serialize(*[, align_attributes, ...]) – Returns a string that contains the serialization of the node.
- add_following_siblings(*node: NodeSource, clone: bool = False)¶
Adds one or more nodes to the right of the node this method is called on.
The nodes can be concrete instances of any node type or rather abstract descriptions in the form of strings or objects returned from the tag() function, from which TextNode and TagNode instances are derived, respectively.
- Parameters:
node – The node(s) to be added.
clone – Clones the concrete nodes before adding if True.
- add_preceding_siblings(*node: NodeSource, clone: bool = False)¶
Adds one or more nodes to the left of the node this method is called on.
The nodes can be concrete instances of any node type or rather abstract descriptions in the form of strings or objects returned from the tag() function, from which TextNode and TagNode instances are derived, respectively.
- Parameters:
node – The node(s) to be added.
clone – Clones the concrete nodes before adding if True.
- clone(deep: bool = False, quick_and_unsafe: bool = False) _ElementWrappingNode ¶
- Parameters:
deep – Clones the whole subtree if True.
quick_and_unsafe – Creates a deep clone in a quicker manner where text nodes may get lost. It should be safe with trees that don’t contain subsequent text nodes, e.g. freshly parsed, unaltered documents or after TagNode.merge_text_nodes() has been applied.
- Returns:
A copy of the node.
- detach(retain_child_nodes: bool = False) _ElementWrappingNode ¶
Removes the node from its tree.
- Parameters:
retain_child_nodes – Keeps the node’s descendants in the originating tree if True.
- Returns:
The removed node.
- property document: Optional[Document]¶
The Document instance that the node is associated with or None.
- first_child = None¶
- property index: Optional[int]¶
The node’s index within the parent’s collection of child nodes or None when the node has no parent.
- iterate_ancestors(*filter: Filter) Iterator[TagNode] ¶
- Parameters:
filter – Any number of filters that a node must match to be yielded.
- Returns:
A generator iterator that yields the ancestor nodes from bottom to top.
- iterate_children(*filter: Filter, recurse: bool = False) Iterator[NodeBase] ¶
A generator iterator that yields nothing.
- iterate_descendants(*filter: Filter) Iterator[NodeBase] ¶
- Parameters:
filter – Any number of filters that a node must match to be yielded.
- Returns:
A generator iterator that yields the node’s descendants.
- iterate_following(*filter: Filter) Iterator[NodeBase] ¶
- Parameters:
filter – Any number of filters that a node must match to be yielded.
- Returns:
A generator iterator that yields the following nodes in document order.
- iterate_following_siblings(*filter: Filter) Iterator[NodeBase] ¶
- Parameters:
filter – Any number of filters that a node must match to be yielded.
- Returns:
A generator iterator that yields the siblings to the node’s right.
- iterate_preceding(*filter: Filter) Iterator[NodeBase] ¶
- Parameters:
filter – Any number of filters that a node must match to be yielded.
- Returns:
A generator iterator that yields the previous nodes in document order.
- iterate_preceding_siblings(*filter: Filter) Iterator[NodeBase] ¶
- Parameters:
filter – Any number of filters that a node must match to be yielded.
- Returns:
A generator iterator that yields the siblings to the node’s left.
- last_child = None¶
- last_descendant = None¶
- new_tag_node(local_name: str, attributes: Optional[dict[str, str]] = None, namespace: Optional[str] = None, children: Sequence[str | NodeBase | _TagDefinition] = ()) TagNode ¶
Creates a new TagNode instance in the node’s context.
- Parameters:
local_name – The tag name.
attributes – Optional attributes that are assigned to the new node.
namespace – An optional tag namespace. If none is provided, the context node’s namespace is inherited.
children – An optional sequence of objects that will be appended as child nodes. This can be existing nodes, strings that will be inserted as text nodes and in-place definitions of TagNode instances from tag(). The latter will be assigned to the same namespace.
- Returns:
The newly created tag node.
- replace_with(node: NodeSource, clone: bool = False) NodeBase ¶
Removes the node and places the given one in its tree location.
The node can be a concrete instance of any node type or a rather abstract description in the form of a string or an object returned from the tag() function, from which a TextNode or a TagNode instance is derived, respectively.
- Parameters:
node – The replacing node.
clone – A concrete, replacing node is cloned if True.
- Returns:
The removed node.
- serialize(*, align_attributes: bool = False, indentation: str = '', namespaces: Optional[NamespaceDeclarations] = None, newline: Optional[str] = None, text_width: int = 0)¶
Returns a string that contains the serialization of the node.
- Parameters:
align_attributes – Determines whether attributes’ names and values line up sharply around vertically aligned equal signs.
indentation – This string prefixes descending nodes’ contents one time per depth level. A non-empty string implies line-breaks between nodes as well.
namespaces – A mapping of prefixes to namespaces. These override possible declarations from the parsed serialization that the document instance stems from. Prefixes for undeclared namespaces are enumerated with the prefix ns.
newline – See io.TextIOWrapper for a detailed explanation of the parameter with the same name.
text_width – A positive value indicates that text nodes shall get wrapped at this character position. Indentations are not considered as part of text. This parameter is intended to define reasonable widths for text displays that can be scrolled horizontally.
- xpath(expression: str, namespaces: Optional[NamespaceDeclarations] = None) QueryResults ¶
See Queries with XPath & CSS for details on the extent of the XPath implementation.
- Parameters:
expression – A supported XPath 1.0 expression that contains one or more location paths.
namespaces – A mapping of prefixes that are used in the expression to namespaces. The declarations that were used in a document’s source serialization serve as fallback.
- Returns:
All nodes that match the evaluation of the provided XPath expression.
Tag
- class delb.TagNode(etree_element: _Element)[source]¶
The instances of this class represent tag nodes of a tree, the equivalent of DOM’s elements.
To instantiate new nodes use Document.new_tag_node, TagNode.new_tag_node, TextNode.new_tag_node or new_tag_node().
Some syntactic sugar is baked in:
Attributes and nodes can be tested for membership in a node.
>>> root = Document('<root ham="spam"><child/></root>').root
>>> child = root.first_child
>>> "ham" in root
True
>>> child in root
True
Nodes can be copied. Note that this relies on TagNode.clone().
>>> from copy import copy, deepcopy
>>> root = Document("<root>Content</root>").root
>>> print(copy(root))
<root/>
>>> print(deepcopy(root))
<root>Content</root>
Nodes can be tested for equality regarding their qualified name and attributes.
>>> root = Document('<root><foo x="0"/><foo x="0"/><bar x="0"/></root>').root
>>> root[0] == root[1]
True
>>> root[0] == root[2]
False
Attribute values and child nodes can be obtained with the subscript notation.
>>> root = Document('<root x="0"><child_1/>child_2<child_3/></root>').root
>>> root["x"]
'0'
>>> print(root[0])
<child_1/>
>>> print(root[-1])
<child_3/>
>>> print([str(x) for x in root[1::-1]])
['child_2', '<child_1/>']
How many child nodes does this node have anyway?
>>> root = Document("<root><child_1/><child_2/></root>").root
>>> len(root)
2
>>> len(root[0])
0
As seen in the examples above, a tag node's string representation yields a serialized XML representation of a sub-/tree.
Properties
attributes – A mapping that can be used to query and alter the node's attributes.
depth – The depth (or level) of the node in its tree.
document – The Document instance that the node is associated with or None.
first_child – The node's first child node.
full_text – The concatenated contents of all text node descendants in document order.
id – This is a shortcut to retrieve and set the id attribute in the XML namespace.
index – The node's index within the parent's collection of child nodes or None when the node has no parent.
last_child – The node's last child node.
last_descendant – The node's last descendant.
local_name – The node's name.
location_path – An unambiguous XPath location path that points to this node from its tree root.
namespace – The node's namespace.
namespaces – The prefix to namespace mapping of the node.
parent – The node's parent or None.
prefix – The prefix that the node's namespace is currently mapped to.
universal_name – The node's qualified name in Clark notation.
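Clark notation writes a qualified name as the namespace in curly braces followed by the local name. The two helpers below are a hypothetical sketch of that convention, not part of delb; they only show how such a name relates to the namespace and local_name properties listed above.

```python
def to_clark(namespace, local_name):
    """Build a name in Clark notation; names without a namespace stay bare."""
    if namespace:
        return f"{{{namespace}}}{local_name}"
    return local_name


def from_clark(name):
    """Split a name in Clark notation into (namespace, local_name)."""
    if name.startswith("{"):
        namespace, _, local_name = name[1:].partition("}")
        return namespace, local_name
    return None, name


qualified = to_clark("http://www.tei-c.org/ns/1.0", "p")
print(qualified)
print(from_clark(qualified))
```

The notation is independent of prefixes, which is why two names can be compared without consulting any prefix mapping.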
Fetching a single relative node
fetch_following(*filter) – filter: Any number of filters.
fetch_following_sibling(*filter) – filter: Any number of filters.
fetch_preceding(*filter) – filter: Any number of filters.
fetch_preceding_sibling(*filter) – filter: Any number of filters.
Iterating over relative nodes
iterate_ancestors(*filter) – filter: Any number of filters that a node must match to be yielded.
iterate_children(*filter[, recurse]) – filter: Any number of filters that a node must match to be yielded.
iterate_descendants(*filter) – filter: Any number of filters that a node must match to be yielded.
iterate_following(*filter) – filter: Any number of filters that a node must match to be yielded.
iterate_following_siblings(*filter) – filter: Any number of filters that a node must match to be yielded.
iterate_preceding(*filter) – filter: Any number of filters that a node must match to be yielded.
iterate_preceding_siblings(*filter) – filter: Any number of filters that a node must match to be yielded.
Querying nodes
css_select(expression[, namespaces]) – See Queries with XPath & CSS regarding the extent of the supported grammar.
fetch_or_create_by_xpath(expression[, ...]) – Fetches a single node that is locatable by the provided XPath expression.
xpath(expression[, namespaces]) – See Queries with XPath & CSS for details on the extent of the XPath implementation.
Adding nodes
add_following_siblings(*node[, clone]) – Adds one or more nodes to the right of the node this method is called on.
add_preceding_siblings(*node[, clone]) – Adds one or more nodes to the left of the node this method is called on.
append_children(*node[, clone]) – Adds one or more nodes as child nodes after any existing child nodes of the node this method is called on.
insert_children(index, *node[, clone]) – Inserts one or more child nodes.
prepend_children(*node[, clone]) – Adds one or more nodes as child nodes before any existing child nodes of the node this method is called on.
Removing a node from its tree
detach([retain_child_nodes]) – Removes the node from its tree.
replace_with(node[, clone]) – Removes the node and places the given one in its tree location.
Uncategorized methods
clone([deep, quick_and_unsafe]) – deep: Clones the whole subtree if True.
merge_text_nodes() – Merges all consecutive text nodes in the subtree into one.
new_tag_node(local_name[, attributes, ...]) – Creates a new TagNode instance in the node's context.
parse(text[, parser, parser_options, ...]) – Parses the given string or bytes sequence into a new tree.
serialize(*[, align_attributes, ...]) – Returns a string that contains the serialization of the node.
- add_following_siblings(*node: NodeSource, clone: bool = False)¶
Adds one or more nodes to the right of the node this method is called on.
The nodes can be concrete instances of any node type or abstract descriptions in the form of strings or objects returned from the tag() function, from which TextNode or TagNode instances are derived, respectively.
- Parameters:
node – The node(s) to be added.
clone – Clones the concrete nodes before adding if True.
- add_preceding_siblings(*node: NodeSource, clone: bool = False)¶
Adds one or more nodes to the left of the node this method is called on.
The nodes can be concrete instances of any node type or abstract descriptions in the form of strings or objects returned from the tag() function, from which TextNode or TagNode instances are derived, respectively.
- Parameters:
node – The node(s) to be added.
clone – Clones the concrete nodes before adding if True.
- append_children(*node: NodeSource, clone: bool = False)[source]¶
Adds one or more nodes as child nodes after any existing to the child nodes of the node this method is called on.
The nodes can be concrete instances of any node type or abstract descriptions in the form of strings or objects returned from the tag() function, from which TextNode or TagNode instances are derived, respectively.
- Parameters:
node – The node(s) to be added.
clone – Clones the concrete nodes before adding if True.
- property attributes: TagAttributes¶
A mapping that can be used to query and alter the node’s attributes.
>>> node = new_tag_node("node", attributes={"foo": "0", "bar": "0"})
>>> node.attributes
{'foo': '0', 'bar': '0'}
>>> node.attributes.pop("bar")
'0'
>>> node.attributes["foo"] = "1"
>>> node.attributes["peng"] = "1"
>>> print(node)
<node foo="1" peng="1"/>
>>> node.attributes.update({"foo": "2", "zong": "2"})
>>> print(node)
<node foo="2" peng="1" zong="2"/>
Namespaced attributes can be accessed by using Python’s slice notation. The default namespace can optionally be given explicitly, but an attribute in it is also found without it.
>>> node = new_tag_node("node", {})
>>> node.attributes["http://namespace":"foo"] = "0"
>>> print(node)
<node xmlns:ns0="http://namespace" ns0:foo="0"/>
>>> node = Document('<node xmlns="default" foo="0"/>').root
>>> node.attributes["default":"foo"] is node.attributes["foo"]
True
Attributes behave like strings, but also expose namespace, local name and value for manipulation.
>>> node = new_tag_node("node")
>>> node.attributes["foo"] = "0"
>>> node.attributes["foo"].local_name = "bar"
>>> node.attributes["bar"].namespace = "http://namespace"
>>> node.attributes["http://namespace":"bar"].value = "1"
>>> print(node)
<node xmlns:ns0="http://namespace" ns0:bar="1"/>
Unlike with typical Python mappings, requesting a non-existing attribute doesn’t raise a KeyError; instead None is returned.
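The slice notation works because Python hands a slice object to a mapping's item access methods. The following is a minimal sketch of that mechanism using only the standard library; the class NamespacedMapping and its storage layout are hypothetical and are not delb's implementation. It also mimics the documented behaviour of returning None for missing keys.

```python
class NamespacedMapping:
    """Hypothetical mapping keyed by (namespace, local name) pairs."""

    def __init__(self):
        self._data = {}

    @staticmethod
    def _key(item):
        # mapping["ns":"name"] arrives here as slice("ns", "name", None)
        if isinstance(item, slice):
            return (item.start, item.stop)
        # a plain string addresses a name without an explicit namespace
        return (None, item)

    def __setitem__(self, item, value):
        self._data[self._key(item)] = value

    def __getitem__(self, item):
        # missing keys yield None instead of raising a KeyError
        return self._data.get(self._key(item))


attributes = NamespacedMapping()
attributes["http://namespace":"foo"] = "0"
attributes["bar"] = "1"
```

Converting the slice to a tuple keeps the key hashable on all Python versions.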
- clone(deep: bool = False, quick_and_unsafe: bool = False) TagNode [source]¶
- Parameters:
deep – Clones the whole subtree if True.
quick_and_unsafe – Creates a deep clone in a quicker manner where text nodes may get lost. It should be safe with trees that don’t contain consecutive text nodes, e.g. freshly parsed, unaltered documents or after TagNode.merge_text_nodes() has been applied.
- Returns:
A copy of the node.
- css_select(expression: str, namespaces: Optional[NamespaceDeclarations] = None) QueryResults [source]¶
See Queries with XPath & CSS regarding the extent of the supported grammar.
Namespace prefixes are delimited with a | before a name test. For example, div svg|metadata selects all descendants of nodes named div, which belong to the default namespace or have no namespace, whose name is metadata and whose namespace is mapped to the svg prefix.
- Parameters:
expression – A CSS selector expression.
namespaces – A mapping of prefixes that are used in the expression to namespaces. If omitted, the node’s definition is used.
- Returns:
All nodes that match the evaluation of the provided CSS selector expression.
- detach(retain_child_nodes: bool = False) _ElementWrappingNode [source]¶
Removes the node from its tree.
- Parameters:
retain_child_nodes – Keeps the node’s descendants in the originating tree if True.
- Returns:
The removed node.
- property document: Optional[Document]¶
The Document instance that the node is associated with or None.
- fetch_or_create_by_xpath(expression: str, namespaces: Optional[NamespaceDeclarations] = None) TagNode [source]¶
Fetches a single node that is locatable by the provided XPath expression. If the node doesn’t exist, the non-existing branch will be created. The expression must obey these rules:
All location steps must use the child axis.
Each step needs to provide a name test.
Attribute comparisons against literals are the only allowed predicates.
Multiple attribute comparisons must be joined with the and operator and / or contained in more than one predicate expression.
The logical validity of multiple attribute comparisons isn’t checked. E.g. one could provide foo[@p="her"][@p="him"], but should expect undefined behaviour.
>>> root = Document("<root/>").root
>>> grandchild = root.fetch_or_create_by_xpath(
...     "child[@a='b']/grandchild"
... )
>>> grandchild is root.fetch_or_create_by_xpath(
...     "child[@a='b']/grandchild"
... )
True
>>> str(root)
'<root><child a="b"><grandchild/></child></root>'
- Parameters:
expression – An XPath expression that can unambiguously locate a descendant node in a tree that has any state.
namespaces – An optional mapping of prefixes to namespaces. The declarations that were used in a document’s serialized source serve as fallback.
- Returns:
The existing or freshly created node described with expression.
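To illustrate why the rules above make a fetch-or-create operation well defined, here is a rough sketch on top of the standard library's xml.etree.ElementTree. The function fetch_or_create and the PREDICATE pattern are hypothetical helpers, not delb's code. The sketch assumes steps are separated by / and that attribute values contain neither slashes nor quotes, and it ignores namespaces and the and operator.

```python
import re
import xml.etree.ElementTree as ET

# matches predicates such as [@a='b'] or [@a="b"]
PREDICATE = re.compile(r"""\[@([\w.-]+)=(['"])(.*?)\2\]""")


def fetch_or_create(root, path):
    """Walks child-axis steps and creates every step that is missing."""
    node = root
    for step in path.split("/"):
        name = step.split("[", 1)[0]
        attrs = {key: value for key, _, value in PREDICATE.findall(step)}
        for child in node:
            if child.tag == name and all(
                child.get(key) == value for key, value in attrs.items()
            ):
                node = child
                break
        else:
            # name test and attribute literals fully describe the new node
            node = ET.SubElement(node, name, attrs)
    return node


root = ET.fromstring("<root/>")
grandchild = fetch_or_create(root, "child[@a='b']/grandchild")
same = fetch_or_create(root, "child[@a='b']/grandchild")
```

Because every step carries a name and only literal attribute comparisons, a missing step can be created without guessing anything about it.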
- property id: Optional[str]¶
This is a shortcut to retrieve and set the id attribute in the XML namespace. The client code is responsible for passing properly formed id names.
- property index: Optional[int]¶
The node’s index within the parent’s collection of child nodes or
None
when the node has no parent.
- insert_children(index: int, *node: NodeSource, clone: bool = False)[source]¶
Inserts one or more child nodes.
The nodes can be concrete instances of any node type or abstract descriptions in the form of strings or objects returned from the tag() function, from which TextNode or TagNode instances are derived, respectively.
- Parameters:
index – The index at which the first of the given nodes will be inserted, the remaining nodes are added afterwards in the given order.
node – The node(s) to be added.
clone – Clones the concrete nodes before adding if True.
- iterate_ancestors(*filter: Filter) Iterator[TagNode] ¶
- Parameters:
filter – Any number of filters that a node must match to be yielded.
- Returns:
A generator iterator that yields the ancestor nodes from bottom to top.
- iterate_children(*filter: Filter, recurse: bool = False) Iterator[NodeBase] [source]¶
- Parameters:
filter – Any number of filters that a node must match to be yielded.
recurse – Deprecated. Use NodeBase.iterate_descendants().
- Returns:
A generator iterator that yields the child nodes of the node.
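As the contributed filters further below suggest, a filter is a callable that takes a node and returns a boolean, and a node is yielded only when it matches all given filters. A small sketch of that contract with xml.etree.ElementTree follows; the function iterate_children_sketch and both filters are hypothetical and only model the behaviour described here.

```python
import xml.etree.ElementTree as ET


def iterate_children_sketch(node, *filters):
    """Yields the children of node that match every given filter."""
    for child in node:
        if all(matches(child) for matches in filters):
            yield child


def is_paragraph(node):
    return node.tag == "p"


def has_id(node):
    return "id" in node.attrib


root = ET.fromstring('<root><p id="a"/><p/><div id="b"/></root>')
matching_ids = [
    child.get("id")
    for child in iterate_children_sketch(root, is_paragraph, has_id)
]
```

Without any filter, all() is true for every child, so all children are yielded.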
- iterate_descendants(*filter: Filter) Iterator[NodeBase] [source]¶
- Parameters:
filter – Any number of filters that a node must match to be yielded.
- Returns:
A generator iterator that yields the descending nodes of the node.
- iterate_following(*filter: Filter) Iterator[NodeBase] ¶
- Parameters:
filter – Any number of filters that a node must match to be yielded.
- Returns:
A generator iterator that yields the following nodes in document order.
- iterate_following_siblings(*filter: Filter) Iterator[NodeBase] ¶
- Parameters:
filter – Any number of filters that a node must match to be yielded.
- Returns:
A generator iterator that yields the siblings to the node’s right.
- iterate_preceding(*filter: Filter) Iterator[NodeBase] ¶
- Parameters:
filter – Any number of filters that a node must match to be yielded.
- Returns:
A generator iterator that yields the previous nodes in document order.
- iterate_preceding_siblings(*filter: Filter) Iterator[NodeBase] ¶
- Parameters:
filter – Any number of filters that a node must match to be yielded.
- Returns:
A generator iterator that yields the siblings to the node’s left.
- property location_path: str¶
An unambiguous XPath location path that points to this node from its tree root.
- property namespace: Optional[str]¶
The node’s namespace. Be aware, that while this property can be set to
None
, serializations will continue to render a previous default namespace declaration if the node had such.
- new_tag_node(local_name: str, attributes: Optional[dict[str, str]] = None, namespace: Optional[str] = None, children: Sequence[str | NodeBase | _TagDefinition] = ()) TagNode [source]¶
Creates a new TagNode instance in the node’s context.
- Parameters:
local_name – The tag name.
attributes – Optional attributes that are assigned to the new node.
namespace – An optional tag namespace. If none is provided, the context node’s namespace is inherited.
children – An optional sequence of objects that will be appended as child nodes. This can be existing nodes, strings that will be inserted as text nodes and in-place definitions of TagNode instances from tag(). The latter will be assigned to the same namespace.
- Returns:
The newly created tag node.
- static parse(text: AnyStr, parser: Optional[XMLParser] = None, parser_options: Optional[ParserOptions] = None, collapse_whitespace: Optional[bool] = None) TagNode [source]¶
Parses the given string or bytes sequence into a new tree.
- Parameters:
text – A serialized XML tree.
parser – Deprecated.
parser_options – A delb.ParserOptions instance to configure the used parser.
collapse_whitespace – Deprecated. Use the argument with the same name on the parser_options object.
- prepend_children(*node: NodeBase, clone: bool = False) None [source]¶
Adds one or more nodes as child nodes before any existing to the child nodes of the node this method is called on.
The nodes can be concrete instances of any node type or abstract descriptions in the form of strings or objects returned from the tag() function, from which TextNode or TagNode instances are derived, respectively.
- Parameters:
node – The node(s) to be added.
clone – Clones the concrete nodes before adding if True.
- replace_with(node: NodeSource, clone: bool = False) NodeBase ¶
Removes the node and places the given one in its tree location.
The node can be a concrete instance of any node type or an abstract description in the form of a string or an object returned from the tag() function, from which a TextNode or TagNode instance is derived, respectively.
- Parameters:
node – The replacing node.
clone – A concrete, replacing node is cloned if True.
- Returns:
The removed node.
- serialize(*, align_attributes: bool = False, indentation: str = '', namespaces: Optional[NamespaceDeclarations] = None, newline: Optional[str] = None, text_width: int = 0)[source]¶
Returns a string that contains the serialization of the node.
- Parameters:
align_attributes – Determines whether attributes’ names and values line up sharply around vertically aligned equal signs.
indentation – This string prefixes descending nodes’ contents one time per depth level. A non-empty string implies line-breaks between nodes as well.
namespaces – A mapping of prefixes to namespaces. These override possible declarations from the parsed serialization that the document instance stems from. Prefixes for undeclared namespaces are enumerated with the prefix ns.
newline – See io.TextIOWrapper for a detailed explanation of the parameter with the same name.
text_width – A positive value indicates that text nodes shall get wrapped at this character position. Indentations are not considered as part of text. This parameter is intended to define reasonable widths for text displays that can be scrolled horizontally.
- property universal_name: str¶
The node’s qualified name in Clark notation.
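Clark notation writes a qualified name as the namespace in curly braces followed by the local name. The standard library's xml.etree.ElementTree uses the same notation for tags, which the following sketch uses for comparison; the helpers clark and split_clark are hypothetical names introduced only for illustration.

```python
import xml.etree.ElementTree as ET


def clark(namespace, local_name):
    """Builds a name in Clark notation: {namespace}local_name."""
    return f"{{{namespace}}}{local_name}" if namespace else local_name


def split_clark(name):
    """Splits a name in Clark notation into (namespace, local_name)."""
    if name.startswith("{"):
        namespace, _, local_name = name[1:].partition("}")
        return namespace, local_name
    return None, name


element = ET.fromstring('<root xmlns="http://example.org/ns"/>')
universal_name = clark("http://example.org/ns", "root")
```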
- xpath(expression: str, namespaces: Optional[NamespaceDeclarations] = None) QueryResults [source]¶
See Queries with XPath & CSS for details on the extent of the XPath implementation.
- Parameters:
expression – A supported XPath 1.0 expression that contains one or more location paths.
namespaces – A mapping of prefixes that are used in the expression to namespaces. The declarations that were used in a document’s serialized source serve as fallback.
- Returns:
All nodes that match the evaluation of the provided XPath expression.
Tag attribute¶
- class delb.nodes.Attribute(attributes: TagAttributes, key: str)[source]¶
Attribute objects represent the attributes of tag nodes. See the delb.TagNode.attributes() documentation for capabilities.
- property universal_name: str¶
The attribute’s namespace and local name in Clark notation.
Text¶
- class delb.TextNode(reference_or_text: _Element | str | TextNode, position: int = 0)[source]¶
TextNodes contain the textual data of a document. The class shall not be initialized by client code, just throw strings into the trees.
Instances expose all methods of str except str.index():
>>> node = TextNode("Show us the way to the next whisky bar.")
>>> node.split()
['Show', 'us', 'the', 'way', 'to', 'the', 'next', 'whisky', 'bar.']
Instances can be tested for equality with other text nodes and strings:
>>> TextNode("ham") == TextNode("spam")
False
>>> TextNode("Patsy") == "Patsy"
True
And they can be tested for substrings:
>>> "Sir" in TextNode("Sir Bedevere the Wise")
True
Attributes that relate to child nodes yield nothing or None, respectively.
Properties
content
The node's text content.
depth
The depth (or level) of the node in its tree.
document
The Document instance that the node is associated with or None.
first_child
full_text
The concatenated contents of all text node descendants in document order.
index
The node's index within the parent's collection of child nodes or None when the node has no parent.
last_child
last_descendant
namespaces
The prefix to namespace mapping of the node.
parent
The node's parent or None.
Fetching a single relative node
fetch_following(*filter)
filter – Any number of filters.
fetch_following_sibling(*filter)
filter – Any number of filters.
fetch_preceding(*filter)
filter – Any number of filters.
fetch_preceding_sibling(*filter)
filter – Any number of filters.
Iterating over relative nodes
iterate_ancestors(*filter)
filter – Any number of filters that a node must match to be yielded.
iterate_children(*filter[, recurse])
A generator iterator that yields nothing.
iterate_descendants(*filter)
filter – Any number of filters that a node must match to be yielded.
iterate_following(*filter)
filter – Any number of filters that a node must match to be yielded.
iterate_following_siblings(*filter)
filter – Any number of filters that a node must match to be yielded.
iterate_preceding(*filter)
filter – Any number of filters that a node must match to be yielded.
iterate_preceding_siblings(*filter)
filter – Any number of filters that a node must match to be yielded.
Querying nodes
xpath(expression[, namespaces])
See Queries with XPath & CSS for details on the extent of the XPath implementation.
Adding nodes
add_following_siblings(*node[, clone])
Adds one or more nodes to the right of the node this method is called on.
add_preceding_siblings(*node[, clone])
Adds one or more nodes to the left of the node this method is called on.
Removing a node from its tree
detach([retain_child_nodes])
Removes the node from its tree.
replace_with(node[, clone])
Removes the node and places the given one in its tree location.
Uncategorized methods
clone([deep, quick_and_unsafe])
deep – Clones the whole subtree if True.
new_tag_node(local_name[, attributes, ...])
Creates a new TagNode instance in the node's context.
serialize(*[, align_attributes, ...])
Returns a string that contains the serialization of the node.
- add_following_siblings(*node: NodeSource, clone: bool = False)¶
Adds one or more nodes to the right of the node this method is called on.
The nodes can be concrete instances of any node type or abstract descriptions in the form of strings or objects returned from the tag() function, from which TextNode or TagNode instances are derived, respectively.
- Parameters:
node – The node(s) to be added.
clone – Clones the concrete nodes before adding if True.
- add_preceding_siblings(*node: NodeSource, clone: bool = False)¶
Adds one or more nodes to the left of the node this method is called on.
The nodes can be concrete instances of any node type or abstract descriptions in the form of strings or objects returned from the tag() function, from which TextNode or TagNode instances are derived, respectively.
- Parameters:
node – The node(s) to be added.
clone – Clones the concrete nodes before adding if True.
- clone(deep: bool = False, quick_and_unsafe: bool = False) NodeBase [source]¶
- Parameters:
deep – Clones the whole subtree if True.
quick_and_unsafe – Creates a deep clone in a quicker manner where text nodes may get lost. It should be safe with trees that don’t contain consecutive text nodes, e.g. freshly parsed, unaltered documents or after TagNode.merge_text_nodes() has been applied.
- Returns:
A copy of the node.
- detach(retain_child_nodes: bool = False) TextNode [source]¶
Removes the node from its tree.
- Parameters:
retain_child_nodes – Keeps the node’s descendants in the originating tree if True.
- Returns:
The removed node.
- property document: Optional[Document]¶
The Document instance that the node is associated with or None.
- first_child = None¶
- property index: Optional[int]¶
The node’s index within the parent’s collection of child nodes or
None
when the node has no parent.
- iterate_ancestors(*filter: Filter) Iterator[TagNode] ¶
- Parameters:
filter – Any number of filters that a node must match to be yielded.
- Returns:
A generator iterator that yields the ancestor nodes from bottom to top.
- iterate_children(*filter: Filter, recurse: bool = False) Iterator[NodeBase] ¶
A generator iterator that yields nothing.
- iterate_descendants(*filter: Filter) Iterator[NodeBase] ¶
- Parameters:
filter – Any number of filters that a node must match to be yielded.
- Returns:
A generator iterator that yields the descending nodes of the node.
- iterate_following(*filter: Filter) Iterator[NodeBase] ¶
- Parameters:
filter – Any number of filters that a node must match to be yielded.
- Returns:
A generator iterator that yields the following nodes in document order.
- iterate_following_siblings(*filter: Filter) Iterator[NodeBase] ¶
- Parameters:
filter – Any number of filters that a node must match to be yielded.
- Returns:
A generator iterator that yields the siblings to the node’s right.
- iterate_preceding(*filter: Filter) Iterator[NodeBase] ¶
- Parameters:
filter – Any number of filters that a node must match to be yielded.
- Returns:
A generator iterator that yields the previous nodes in document order.
- iterate_preceding_siblings(*filter: Filter) Iterator[NodeBase] ¶
- Parameters:
filter – Any number of filters that a node must match to be yielded.
- Returns:
A generator iterator that yields the siblings to the node’s left.
- last_child = None¶
- last_descendant = None¶
- new_tag_node(local_name: str, attributes: Optional[dict[str, str]] = None, namespace: Optional[str] = None, children: Sequence[str | NodeBase | _TagDefinition] = ()) TagNode ¶
Creates a new TagNode instance in the node’s context.
- Parameters:
local_name – The tag name.
attributes – Optional attributes that are assigned to the new node.
namespace – An optional tag namespace. If none is provided, the context node’s namespace is inherited.
children – An optional sequence of objects that will be appended as child nodes. This can be existing nodes, strings that will be inserted as text nodes and in-place definitions of TagNode instances from tag(). The latter will be assigned to the same namespace.
- Returns:
The newly created tag node.
- replace_with(node: NodeSource, clone: bool = False) NodeBase ¶
Removes the node and places the given one in its tree location.
The node can be a concrete instance of any node type or an abstract description in the form of a string or an object returned from the tag() function, from which a TextNode or TagNode instance is derived, respectively.
- Parameters:
node – The replacing node.
clone – A concrete, replacing node is cloned if True.
- Returns:
The removed node.
- serialize(*, align_attributes: bool = False, indentation: str = '', namespaces: Optional[NamespaceDeclarations] = None, newline: Optional[str] = None, text_width: int = 0)¶
Returns a string that contains the serialization of the node.
- Parameters:
align_attributes – Determines whether attributes’ names and values line up sharply around vertically aligned equal signs.
indentation – This string prefixes descending nodes’ contents one time per depth level. A non-empty string implies line-breaks between nodes as well.
namespaces – A mapping of prefixes to namespaces. These override possible declarations from the parsed serialization that the document instance stems from. Prefixes for undeclared namespaces are enumerated with the prefix ns.
newline – See io.TextIOWrapper for a detailed explanation of the parameter with the same name.
text_width – A positive value indicates that text nodes shall get wrapped at this character position. Indentations are not considered as part of text. This parameter is intended to define reasonable widths for text displays that can be scrolled horizontally.
- xpath(expression: str, namespaces: Optional[NamespaceDeclarations] = None) QueryResults ¶
See Queries with XPath & CSS for details on the extent of the XPath implementation.
- Parameters:
expression – A supported XPath 1.0 expression that contains one or more location paths.
namespaces – A mapping of prefixes that are used in the expression to namespaces. The declarations that were used in a document’s serialized source serve as fallback.
- Returns:
All nodes that match the evaluation of the provided XPath expression.
Node constructors¶
- delb.new_comment_node(content: str) CommentNode [source]¶
Creates a new CommentNode.
- Parameters:
content – The comment’s content a.k.a. text.
- Returns:
The newly created comment node.
- delb.new_processing_instruction_node(target: str, content: str) ProcessingInstructionNode [source]¶
Creates a new ProcessingInstructionNode.
- Parameters:
target – The processing instruction’s target name.
content – The processing instruction’s text.
- Returns:
The newly created processing instruction node.
- delb.new_tag_node(local_name: str, attributes: Optional[dict[str, str]] = None, namespace: Optional[str] = None, children: Sequence[NodeSource] = ()) TagNode [source]¶
Creates a new TagNode instance outside any context. It is preferable to use the method new_tag_node on instances of documents and nodes where the namespace is inherited.
- Parameters:
local_name – The tag name.
attributes – Optional attributes that are assigned to the new node.
namespace – An optional tag namespace.
children – An optional sequence of objects that will be appended as child nodes. This can be existing nodes, strings that will be inserted as text nodes and in-place definitions of TagNode instances from tag(). The latter will be assigned to the same namespace.
- Returns:
The newly created tag node.
Queries with XPath & CSS¶
delb allows querying of nodes with CSS selector and XPath expressions. CSS selectors are converted to XPath expressions with a third-party library before evaluation and they are only supported as far as their computed XPath equivalents are supported by delb’s very own XPath implementation.
This implementation is not fully compliant with any of the W3C’s XPath specifications. It mostly covers the XPath 1.0 specification, but focuses on querying via path expressions with simple constraints and omits a broad employment of computations (that’s what programming languages are for). It therefore has these intended deviations from that standard:
Default namespaces can be addressed in node and attribute names by simply using no prefix.
The attribute and namespace axes are not supported in location steps (see also below).
In predicates only the attribute axis can be used in its abbreviated form (@name).
Path evaluations within predicates are not available.
- Only these predicate functions are provided and tested:
boolean
concat
contains
last
not
position
starts-with
text
Behaves as if deployed as a single step location path that only tests for the node type text. Hence it returns the contents of the context node’s first child node that is a text node or an empty string when there is none.
Please refrain from extension requests without a proper, concrete implementation proposal.
If you’re accustomed to retrieving attribute values with XPath expressions, employ the functionality of the higher-level programming language at hand like this:
>>> [x.attributes["target"] for x in root.xpath(".//foo")
... if "target" in x.attributes ]
Instead of:
>>> root.xpath(".//foo/@target")
See _delb.plugins.PluginManager.register_xpath_function() regarding the use of custom functions.
- class _delb.xpath.EvaluationContext(node: NodeBase, position: int, size: int, namespaces: Namespaces)[source]¶
Instances of this type are passed to XPath functions in order to pass contextual information.
- count(value, /)¶
Return number of occurrences of value.
- index(value, start=0, stop=9223372036854775807, /)¶
Return first index of value.
Raises ValueError if the value is not present.
- property namespaces¶
A mapping of prefixes to namespaces that is used in the whole evaluation.
- property node¶
The node that is evaluated.
- property position¶
The node’s position within all nodes that matched a location step’s node test in order of the step’s axis’ direction. The first position is 1.
- property size¶
The number of all nodes that matched a location step’s node test.
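How position and size relate can be sketched without delb. The Context class below is a hypothetical model of the four documented fields as a named tuple (the tuple methods count and index listed above hint at such a design, but do not confirm it), and contexts_for and is_last are illustrative helpers only.

```python
from typing import Any, Mapping, NamedTuple


class Context(NamedTuple):
    """Hypothetical model of the documented context fields."""

    node: Any
    position: int
    size: int
    namespaces: Mapping[str, str]


def contexts_for(matched_nodes, namespaces):
    """Pairs each matched node with its 1-based position and the match count."""
    size = len(matched_nodes)
    return [
        Context(node, position, size, namespaces)
        for position, node in enumerate(matched_nodes, start=1)
    ]


def is_last(context):
    """What a predicate like [position() = last()] boils down to."""
    return context.position == context.size


contexts = contexts_for(["a", "b", "c"], {})
last_nodes = [context.node for context in contexts if is_last(context)]
```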
- class _delb.xpath.QueryResults(results: Iterable[NodeBase])[source]¶
A container that includes the results of a CSS selector or XPath query with some helpers for better readable Python expressions.
- count(value) integer -- return number of occurrences of value ¶
- filtered_by(*filters: _delb.typing.Filter) QueryResults [source]¶
Returns another
QueryResults
instance that contains all nodes filtered by the provided filters.
- in_document_order() QueryResults [source]¶
Returns another
QueryResults
instance where the contained nodes are sorted in document order.
- index(value[, start[, stop]]) integer -- return first index of value. ¶
Raises ValueError if the value is not present.
Supporting start and stop arguments is optional, but recommended.
Filters¶
Default filters¶
- delb.altered_default_filters(*filter: _delb.typing.Filter, extend: bool = False)[source]¶
This function can be used either as a context manager or as a decorator to define a set of default_filters for the enclosed code block or callable. These are then applied in all operations that allow node filtering, like TagNode.next_node(). Mind that they also affect a node’s index property and indexed access to child nodes.
>>> root = Document(
...     '<root xmlns="foo"><a/><!--x--><b/><!--y--><c/></root>'
... ).root
>>> with altered_default_filters(is_comment_node):
...     print([x.content for x in root.iterate_children()])
['x', 'y']
Because the default filters hide comments and processing instructions, call the function without arguments to unset them and access all types of nodes.
- Parameters:
filter – The filters to set or append.
extend – Extends the currently active filters with the given ones instead of replacing them.
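The dual use as context manager and decorator, and the effect of extend, can be modelled with contextlib from the standard library. This is a sketch under assumptions: the module-level stack, active_filters and altered_filters_sketch are hypothetical names, thread safety is ignored, and nothing here is taken from delb's implementation.

```python
from contextlib import contextmanager

# hypothetical stack of filter tuples; the bottom entry is the initial default
_filter_stack = [()]


def active_filters():
    return _filter_stack[-1]


@contextmanager
def altered_filters_sketch(*filters, extend=False):
    """Replaces or extends the active filters for the enclosed code."""
    new_filters = active_filters() + filters if extend else filters
    _filter_stack.append(new_filters)
    try:
        yield
    finally:
        # restore the previous filters even if the enclosed code raises
        _filter_stack.pop()


def is_word(node):
    return node.isalpha()


# objects created by contextmanager can also decorate callables
@altered_filters_sketch(is_word)
def filters_seen_inside():
    return active_filters()
```

Calling the function without filters pushes an empty tuple, which corresponds to unsetting all filters for the enclosed code.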
Contributed filters¶
- delb.any_of(*filter: _delb.typing.Filter) _delb.typing.Filter [source]¶
A node filter wrapper that matches when any of the given filters is matching, like a boolean
or
.
- delb.is_comment_node(node: NodeBase) bool [source]¶
A node filter that matches
CommentNode
instances.
- delb.is_processing_instruction_node(node: NodeBase) bool [source]¶
A node filter that matches
ProcessingInstructionNode
instances.
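Since filters are plain callables that return a boolean, a wrapper like any_of amounts to combining them with the built-in any(). A minimal sketch with the hypothetical name any_of_sketch, operating on strings instead of nodes to stay independent of delb:

```python
def any_of_sketch(*filters):
    """Returns a filter that matches when at least one given filter matches."""

    def any_of_filter(node):
        return any(matches(node) for matches in filters)

    return any_of_filter


def is_short(text):
    return len(text) < 4


def is_upper(text):
    return text.isupper()


short_or_upper = any_of_sketch(is_short, is_upper)
selected = [word for word in ["ham", "SPAM", "bacon"] if short_or_upper(word)]
```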
Transformations¶
This module offers a canonical interface with the aim to make re-use of transforming algorithms easier.
Let’s look at it with examples:
from delb.transform import Transformation


class ResolveCopyOf(Transformation):
    def transform(self):
        # replace every node that carries a copyOf attribute
        for node in self.root.css_select("*[copyOf]"):
            source_id = node["copyOf"]
            # the first character of the value is stripped before the lookup
            source_node = self.origin_document.xpath(
                f'//*[@xml:id="{source_id[1:]}"]'
            ).first
            cloned_node = source_node.clone(deep=True)
            cloned_node.id = None
            node.replace_with(cloned_node)
Instances of transformations defined like this can be called with a (sub-)tree and an optional document that the tree originates from:
resolve_copy_of = ResolveCopyOf()
tree = resolve_copy_of(tree) # where tree is an instance of TagNode
typing.NamedTuple classes are used to define options for transformations:
from typing import NamedTuple


class ResolveChoiceOptions(NamedTuple):
    corr: bool = True
    reg: bool = True


class ResolveChoice(Transformation):
    options_class = ResolveChoiceOptions

    # options defaults to None so that the class can be instantiated without arguments
    def __init__(self, options=None):
        super().__init__(options)
        self.keep_selector = ",".join(
            (
                "corr" if self.options.corr else "sic",
                "reg" if self.options.reg else "orig"
            )
        )
        self.drop_selector = ",".join(
            (
                "sic" if self.options.corr else "corr",
                "orig" if self.options.reg else "reg"
            )
        )

    def transform(self):
        for choice_node in self.root.css_select("choice"):
            node_to_drop = choice_node.css_select(self.drop_selector).first
            node_to_drop.detach()

            node_to_keep = choice_node.css_select(self.keep_selector).first
            node_to_keep.detach(retain_child_nodes=True)

            choice_node.detach(retain_child_nodes=True)
A transformation class that defines an options_class attribute can then either be used with its defaults or with alternate options:
resolve_choice = ResolveChoice()
tree = resolve_choice(tree)
resolve_choice = ResolveChoice(ResolveChoiceOptions(reg=False))
tree = resolve_choice(tree)
Finally, concrete transformations can be chained, either as classes or as instances. The interface also allows chaining multiple chains:
from delb.transform import TransformationSequence
tidy_up = TransformationSequence(ResolveCopyOf, resolve_choice)
tree = tidy_up(tree)
Attention
This is an experimental feature. It might change significantly in the future or be removed altogether.
- class delb.transform.Transformation(options: Optional[NamedTuple] = None)[source]¶
This is a base class for any transformation algorithm.
- abstract transform()[source]¶
This method needs to implement the transformation logic. When it is called, the instance has two attributes assigned from its call:
root is the node that the transformation was called to transform.
origin_document is the document that was possibly passed as second argument.
- class delb.transform.TransformationSequence(*transformations: TransformationBase | type[TransformationBase])[source]¶
A transformation sequence can be used to combine any number of both Transformation (provided as class or instantiated with options) and other TransformationSequence instances or classes.
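The idea of accepting both classes and instances, and of nesting sequences, can be shown with a small stand-alone model. The classes Step, Strip, Upper and SequenceSketch below are hypothetical and operate on lists of strings rather than on delb trees; how delb itself instantiates classes passed to a sequence is an assumption here.

```python
import inspect


class Step:
    """Hypothetical minimal transformation: a callable that returns a tree."""

    def __call__(self, tree):
        raise NotImplementedError


class Strip(Step):
    def __call__(self, tree):
        return [item.strip() for item in tree]


class Upper(Step):
    def __call__(self, tree):
        return [item.upper() for item in tree]


class SequenceSketch(Step):
    def __init__(self, *steps):
        # classes are instantiated with their defaults, instances are used as given
        self.steps = [step() if inspect.isclass(step) else step for step in steps]

    def __call__(self, tree):
        for step in self.steps:
            tree = step(tree)
        return tree


tidy_up = SequenceSketch(Strip, Upper())
# a sequence is itself a step, so sequences can be nested
nested = SequenceSketch(tidy_up, SequenceSketch)
result = nested([" a ", "b "])
```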
String serialization¶
- class delb.DefaultStringOptions[source]¶
This object’s class variables are used to configure the serialization parameters that are applied when nodes are coerced to str objects. Hence it also applies when node objects are fed to the print() function and in other cases where objects are implicitly cast to strings.
⚠️ Use this once to define behaviour on application level. For thread-safe serializations of nodes with diverging parameters use NodeBase.serialize()! Think thrice whether you want to use this facility in a library.
- align_attributes: ClassWar[bool] = False¶
Determines whether attributes’ names and values line up sharply around vertically aligned equal signs.
- indentation: ClassWar[str] = ''¶
This string prefixes descending nodes’ contents one time per depth level. A non-empty string implies line-breaks between nodes as well.
- namespaces: ClassWar[None | NamespaceDeclarations] = None¶
A mapping of prefixes to namespaces. These override any declarations from the parsed serialization that the document instance stems from. Prefixes for undeclared namespaces are enumerated with the prefix ns.
- newline: ClassWar[None | str] = None¶
See io.TextIOWrapper for a detailed explanation of the parameter with the same name.
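The warning about application-level use follows from how class variables behave: they are shared by every caller in the process. The sketch below illustrates that difference with hypothetical stand-ins (`SketchStringOptions`, `SketchNode`); it does not use or reproduce delb's serializer.

```python
from typing import ClassVar


class SketchStringOptions:
    # Hypothetical stand-in for a class-level options holder.
    indentation: ClassVar[str] = ""


class SketchNode:
    def __init__(self, name: str, *children: "SketchNode"):
        self.name = name
        self.children = children

    def serialize(self, indentation: str = "", _level: int = 0) -> str:
        # Explicit parameters: no shared state is read or modified.
        prefix = indentation * _level
        if not self.children:
            return f"{prefix}<{self.name}/>"
        separator = "\n" if indentation else ""
        inner = separator.join(
            child.serialize(indentation, _level + 1) for child in self.children
        )
        return f"{prefix}<{self.name}>{separator}{inner}{separator}{prefix}</{self.name}>"

    def __str__(self) -> str:
        # Implicit coercion falls back to the shared class variable.
        return self.serialize(indentation=SketchStringOptions.indentation)


node = SketchNode("root", SketchNode("a"), SketchNode("b"))
compact = str(node)
explicit = node.serialize(indentation="  ")

# Changing the class variable affects every later str() call in the process.
SketchStringOptions.indentation = "\t"
global_effect = str(node)
```

A library that changed the shared setting would silently alter the output of the application that uses it, which is why the explicit call is the safer choice there.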
Various helpers¶
- delb.first(iterable: Iterable) Optional[Any] [source]¶
Returns the first item of the given iterable or None if it’s empty. Note that the first item is consumed when the iterable is an iterator.
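The note on consumption can be made concrete with a small stand-in. `first_sketch` is a hypothetical re-implementation of the described behaviour for illustration; it is not delb's code.

```python
from typing import Any, Iterable, Optional


def first_sketch(iterable: Iterable) -> Optional[Any]:
    # Hypothetical re-implementation of the described behaviour.
    return next(iter(iterable), None)


numbers = iter([1, 2, 3])
head = first_sketch(numbers)
rest = list(numbers)  # the iterator has advanced past the first item

items = [1, 2, 3]
head_of_list = first_sketch(items)  # a list is left untouched

empty = first_sketch([])
```

When the remaining items still matter, passing a list or another re-iterable container avoids losing the first one.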
- delb.get_traverser(from_left=True, depth_first=True, from_top=True)[source]¶
Returns a function that can be used to traverse a (sub)tree with the given node as root. While traversing, the given root node is yielded at some point.
The returned functions have this signature:
def traverser(root: NodeBase, *filters: Filter) -> Iterator[NodeBase]: ...
- Parameters:
from_left – The traverser yields sibling nodes from left to right if True, or starting from the right if False.
depth_first – The child nodes resp. the parent node are yielded before the siblings of a node by a traverser if True. Siblings are favored if False.
from_top – The traverser starts yielding nodes with the lowest depth if True. When False, again, the opposite is in effect.
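To illustrate the terminology, here is a sketch of two top-down traversal orders on a tiny tree. It assumes a hypothetical `Node` dataclass and hand-written generators; it does not call `get_traverser` and makes no claim about which parameter combinations delb implements.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    # Hypothetical stand-in for a tree node; not a delb class.
    name: str
    children: List["Node"] = field(default_factory=list)


def depth_first_top_down(root: Node, from_left: bool = True) -> Iterator[Node]:
    # A node is yielded before its descendants, and its descendants
    # are yielded before its following siblings.
    yield root
    children = root.children if from_left else reversed(root.children)
    for child in children:
        yield from depth_first_top_down(child, from_left)


def breadth_first_top_down(root: Node, from_left: bool = True) -> Iterator[Node]:
    # All nodes of one depth level are yielded before any node of the next.
    queue = deque([root])
    while queue:
        node = queue.popleft()
        yield node
        queue.extend(node.children if from_left else reversed(node.children))


# a
# ├── b
# │   └── d
# └── c
tree = Node("a", [Node("b", [Node("d")]), Node("c")])

df_order = [n.name for n in depth_first_top_down(tree)]
bf_order = [n.name for n in breadth_first_top_down(tree)]
rtl_order = [n.name for n in depth_first_top_down(tree, from_left=False)]
```

In both generators the root comes first, which corresponds to starting with the lowest depth; the two differ only in whether children or siblings are preferred next.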
- delb.last(iterable: Iterable) Optional[Any] [source]¶
Returns the last item of the given iterable or None if it’s empty. Note that the whole iterator is consumed when an iterator is given.
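Finding the last item requires walking the whole iterable, which is why an iterator ends up exhausted. The stand-in below shows one way to do that with constant memory; `last_sketch` is hypothetical and not delb's implementation.

```python
from collections import deque
from typing import Any, Iterable, Optional


def last_sketch(iterable: Iterable) -> Optional[Any]:
    # A deque with maxlen=1 discards everything but the most recent item.
    items = deque(iterable, maxlen=1)
    return items[0] if items else None


numbers = iter([1, 2, 3])
tail = last_sketch(numbers)
leftover = list(numbers)  # nothing remains in the iterator

empty = last_sketch(iter([]))
```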
- delb.tag(local_name: str)[source]¶
- delb.tag(local_name: str, attributes: Mapping[str, str])
- delb.tag(local_name: str, child: NodeSource)
- delb.tag(local_name: str, children: Sequence[NodeSource])
- delb.tag(local_name: str, attributes: Mapping[str, str], child: NodeSource)
- delb.tag(local_name: str, attributes: Mapping[str, str], children: Sequence[NodeSource])
This function can be used for in-place creation (or call it templating if you want to) of TagNode instances as:
- node argument to methods that add nodes to a tree
- items in the children argument of new_tag_node() and NodeBase.new_tag_node()
The first argument to the function is always the local name of the tag node. Optionally, the second argument can be a mapping that specifies attributes for that node. The optional last argument is either a single object that will be appended as child node or a sequence of such, these objects can be node instances of any type, strings (for derived TextNode instances) or other definitions from this function (for derived TagNode instances).
The actual nodes that are constructed always inherit the namespace of the context node they are created in.
>>> root = new_tag_node('root', children=[
...     tag("head", {"lvl": "1"}, "Hello!"),
...     tag("items", (
...         tag("item1"),
...         tag("item2"),
...         )
...     )
... ])
>>> str(root)
'<root><head lvl="1">Hello!</head><items><item1/><item2/></items></root>'
>>> root.append_children(tag("addendum"))
>>> str(root)[-26:]
'</items><addendum/></root>'
Exceptions¶
- exception delb.exceptions.AmbiguousTreeError(message: str)[source]¶
Raised when a single node shall be fetched or created by an XPath expression in a tree where the target position can’t be clearly determined.
- with_traceback()¶
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- exception delb.exceptions.DelbBaseException[source]¶
- with_traceback()¶
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- exception delb.exceptions.FailedDocumentLoading(source: Any, excuses: dict[Loader, str | Exception])[source]¶
- with_traceback()¶
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- exception delb.exceptions.InvalidCodePath[source]¶
Raised when a code path that is not expected to be executed is reached.
- with_traceback()¶
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- exception delb.exceptions.InvalidOperation[source]¶
Raised when an invalid operation is attempted by the client code.
- with_traceback()¶
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- exception delb.exceptions.XPathEvaluationError(message: str)[source]¶
- with_traceback()¶
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.