API Documentation
Note
Two packages are installed with delb: delb and _delb. As the underscore indicates, the latter exposes the private parts of the API, while the former re-exposes what is deemed public from it, plus additional contents.
As a rule of thumb, use the public API in applications and the private API in delb extensions. By doing so, you can avoid circular dependencies if your extension (or other code that it depends on) uses contents from the _delb package.
Documents
- class delb.Document(source, collapse_whitespace=None, parser=None, parser_options=None, klass=None, **config)
This class is the entry point for obtaining a representation of an XML encoded text document. Any object can be passed for instantiation, provided that a suitable loader is available for the given source. See Document loaders for the default loaders that come with this package. Plugins can alter the available loaders, see Extending delb.
Nodes can be tested for membership in a document:
>>> document = Document("<root>text</root>")
>>> text_node = document.root[0]
>>> text_node in document
True
>>> text_node.clone() in document
False
The string coercion of a document yields an XML encoded stream, but unlike Document.save() and Document.write(), without an XML declaration:
>>> document = Document("<root/>")
>>> str(document)
'<root/>'
- Parameters
source – Anything that the configured loaders can make sense of to return a parsed document tree.
collapse_whitespace – Deprecated. Use the argument with the same name on the parser_options object.
parser – Deprecated.
parser_options – A delb.ParserOptions instance to configure the used parser.
klass – Explicitly define the initialized class. This can be useful for applications that have default document subclasses in use.
config – Additional keyword arguments for the configuration of extension classes.
Properties
Beside the used parser and collapsed_whitespace option, this property contains the namespaced data that extension classes and loaders may have stored.
A list-like accessor to the nodes that precede the document's root node.
The namespace mapping of the document's root node.
The root node of a document tree.
The source URL where a loader obtained the document's contents or None.
A list-like accessor to the nodes that follow the document's root node.
Uncategorized methods
cleanup_namespaces([namespaces, retain_prefixes]) – Consolidates the namespace declarations in the document by removing unused and redundant ones.
clone() – Returns another instance with the duplicated contents.
collapse_whitespace() – Collapses whitespace as described here: https://wiki.tei-c.org/index.php/XML_Whitespace#Recommendations
css_select(expression[, namespaces]) – This method proxies to the TagNode.css_select() method of the document's root node.
merge_text_nodes() – This method proxies to the TagNode.merge_text_nodes() method of the document's root node.
new_tag_node(local_name[, attributes, namespace]) – This method proxies to the TagNode.new_tag_node() method of the document's root node.
save(path[, pretty]) – path: The path where the document shall be saved.
write(buffer[, pretty]) – buffer: A file-like object that the document is written to.
xpath(expression[, namespaces]) – This method proxies to the TagNode.xpath() method of the document's root node.
xslt(transformation) – transformation: A lxml.etree.XSLT instance that shall be …
- cleanup_namespaces(namespaces: Optional[Mapping[Optional[str], str]] = None, retain_prefixes: Optional[Iterable[str]] = None)
Consolidates the namespace declarations in the document by removing unused and redundant ones.
There are currently some caveats due to lxml/libxml2’s implementations:
- prefixes cannot be set for the default namespace
- a namespace cannot be declared as default after a node’s creation (where a namespace was specified that had been registered for a prefix with register_namespace())
- there’s no way to unregister a prefix for a namespace
- if there are other namespaces used as default namespaces (where a namespace was specified that had not been registered for a prefix) in the descendants of the root, their declarations are lost when this method is used
To ensure clean serializations, one should:
- register prefixes for all namespaces except the default one at the start of an application
- use only one default namespace within a document
- Parameters
namespaces – An optional mapping of prefixes (keys) to namespaces (values) that will be declared at the root element.
retain_prefixes – An optional iterable that contains prefixes whose declarations shall be kept despite not being used.
- collapse_whitespace()
Collapses whitespace as described here: https://wiki.tei-c.org/index.php/XML_Whitespace#Recommendations
Implicitly merges all neighbouring text nodes.
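To give an idea of what collapsing means at the level of a single string, here is a minimal sketch that only replaces each run of XML whitespace characters with one space. It is not delb's implementation and it ignores the context-dependent parts of the linked recommendations (such as the handling of whitespace at element boundaries); the helper name collapse_runs is made up for this illustration.

```python
import re

# The four characters that XML treats as whitespace: space, tab, CR, LF.
_WHITESPACE_RUN = re.compile(r"[ \t\r\n]+")


def collapse_runs(text: str) -> str:
    """Replace every run of XML whitespace with a single space.

    Hypothetical helper for illustration only; leading and trailing
    whitespace is reduced to one space, not removed.
    """
    return _WHITESPACE_RUN.sub(" ", text)


print(repr(collapse_runs("a  \n\t b\n")))  # prints 'a b '
```

Whether a leading or trailing space should survive depends on the surrounding nodes, which is why a real implementation has to work on the tree rather than on isolated strings.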
- config: SimpleNamespace
Beside the used parser and collapsed_whitespace option, this property contains the namespaced data that extension classes and loaders may have stored.
- css_select(expression: str, namespaces: Optional[Namespaces] = None) -> QueryResults
This method proxies to the TagNode.css_select() method of the document’s root node.
- head_nodes
A list-like accessor to the nodes that precede the document’s root node. Note that nodes can’t be removed or replaced.
- merge_text_nodes()
This method proxies to the TagNode.merge_text_nodes() method of the document’s root node.
- new_tag_node(local_name: str, attributes: Optional[Dict[str, str]] = None, namespace: Optional[str] = None) -> TagNode
This method proxies to the TagNode.new_tag_node() method of the document’s root node.
- save(path: Path, pretty: bool = False, **cleanup_namespaces_args)
- Parameters
path – The path where the document shall be saved.
pretty – Adds indentation for human consumers when True.
cleanup_namespaces_args – Arguments that are passed to Document.cleanup_namespaces() before saving.
- tail_nodes
A list-like accessor to the nodes that follow the document’s root node. Note that nodes can’t be removed or replaced.
- write(buffer: IO, pretty: bool = False, **cleanup_namespaces_args)
- Parameters
buffer – A file-like object that the document is written to.
pretty – Adds indentation for human consumers when True.
cleanup_namespaces_args – Arguments that are passed to Document.cleanup_namespaces() before writing.
- xpath(expression: str, namespaces: Optional[Namespaces] = None) -> QueryResults
This method proxies to the TagNode.xpath() method of the document’s root node.
Document loaders
If you want or need to change which loaders are available or the order in which they are attempted, you can modify the delb.plugins.plugin_manager.plugins.loaders object, which is a list. Its state is reflected in your whole application. Please refer to this issue when you require finer control over these aspects.
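The effect of that order can be pictured with a small stand-in that does not use delb at all. It assumes that loaders are tried one after another and that a loader signals that it cannot handle a source by returning a string; the dispatch function load and the two toy loaders are hypothetical and only mirror the (data, config) call signature that loaders have in this documentation.

```python
from types import SimpleNamespace
from typing import Any, Callable, List, Union

# Assumption for this sketch: a dict stands in for a parsed tree,
# a str carries the reason why a loader declined the source.
LoaderResult = Union[dict, str]
Loader = Callable[[Any, SimpleNamespace], LoaderResult]


def plain_loader(data: Any, config: SimpleNamespace) -> LoaderResult:
    if isinstance(data, str):
        return {"loaded_by": "plain_loader", "data": data}
    return "The input is not a string."


def shouting_loader(data: Any, config: SimpleNamespace) -> LoaderResult:
    if isinstance(data, str):
        return {"loaded_by": "shouting_loader", "data": data.upper()}
    return "The input is not a string."


def load(source: Any, loaders: List[Loader]) -> dict:
    """Return the result of the first loader that accepts the source."""
    config = SimpleNamespace()
    failures = []
    for loader in loaders:
        result = loader(source, config)
        if not isinstance(result, str):
            return result
        failures.append(f"{loader.__name__}: {result}")
    raise ValueError("No loader accepted the source: " + "; ".join(failures))


loaders = [plain_loader, shouting_loader]
first = load("<root/>", loaders)

# Moving the last loader to the front changes which one wins.
loaders.insert(0, loaders.pop())
second = load("<root/>", loaders)

print(first["loaded_by"], second["loaded_by"])  # prints plain_loader shouting_loader
```

Because the list is shared state, a reordering like the one above affects every later load in the same process, which matches the remark that the state is reflected in the whole application.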
Core
The core_loaders module provides a set of loaders to retrieve documents from various data sources.
- _delb.plugins.core_loaders.buffer_loader(data: Any, config: SimpleNamespace) -> _delb.typing.LoaderResult
This loader loads a document from a file-like object.
- _delb.plugins.core_loaders.etree_loader(data: Any, config: SimpleNamespace) -> _delb.typing.LoaderResult
This loader processes lxml.etree._Element and lxml.etree._ElementTree instances.
- _delb.plugins.core_loaders.ftp_loader(data: Any, config: SimpleNamespace) -> _delb.typing.LoaderResult
Loads a document from a URL with the ftp scheme. The URL will be bound to source_url on the document’s Document.config attribute.
- _delb.plugins.core_loaders.path_loader(data: Any, config: SimpleNamespace) -> _delb.typing.LoaderResult
This loader loads from a file that is pointed at with a pathlib.Path instance. That instance will be bound to source_path on the document’s Document.config attribute.
- _delb.plugins.core_loaders.tag_node_loader(data: Any, config: SimpleNamespace) -> _delb.typing.LoaderResult
This loader loads, or rather clones, a delb.TagNode instance and its descendant nodes.
- _delb.plugins.core_loaders.text_loader(data: Any, config: SimpleNamespace) -> _delb.typing.LoaderResult
Parses a string containing a full document.
Extra
If delb is installed with https-loader as extra, the required dependencies for this loader are installed as well. See Installation.
- _delb.plugins.https_loader.https_loader(data: Any, config: SimpleNamespace, client: httpx.Client = <httpx.Client object>) -> _delb.typing.LoaderResult
This loader loads a document from a URL with the http or https scheme. The default httpx client follows redirects and can partially be configured with environment variables. The URL will be bound to the name source_url on the document’s Document.config attribute.
Loaders with specifically configured httpx clients can build on this loader like so:
    import httpx
    from _delb.plugins import plugin_manager
    from _delb.plugins.https_loader import https_loader

    client = httpx.Client(follow_redirects=False, trust_env=False)

    @plugin_manager.register_loader(before=https_loader)
    def custom_https_loader(data, config):
        return https_loader(data, config, client=client)
Parser options
- class delb.ParserOptions(cleanup_namespaces: bool = False, collapse_whitespace: bool = False, remove_comments: bool = False, remove_processing_instructions: bool = False, resolve_entities: bool = True, unplugged: bool = False)
The configuration options that define an XML parser’s behaviour.
- Parameters
cleanup_namespaces – Consolidate XML namespace declarations.
collapse_whitespace – Collapse the content's whitespace.
remove_comments – Ignore comments.
remove_processing_instructions – Don’t include processing instructions in the parsed tree.
resolve_entities – Resolve entities.
unplugged – Don’t load referenced resources over network.
Nodes
Comment
- class delb.CommentNode(etree_element: _Element)
The instances of this class represent comment nodes of a tree.
To instantiate new nodes use new_comment_node().
Properties
The comment's text.
The depth (or level) of the node in its tree.
The Document instance that the node is associated with or None.
The concatenated contents of all text node descendants in document order.
The node's index within the parent's collection of child nodes or None when the node has no parent.
The prefix to namespace mapping of the node.
The node's parent or None.
Fetching a single relative node
fetch_following(*filter) – filter: Any number of filters.
fetch_following_sibling(*filter) – filter: Any number of filters.
fetch_preceding(*filter) – filter: Any number of filters.
fetch_preceding_sibling(*filter) – filter: Any number of filters.
Iterating over relative nodes
iterate_ancestors(*filter) – filter: Any number of filters that a node must match to be yielded.
iterate_children(*filter[, recurse]) – A generator iterator that yields nothing.
iterate_descendants(*filter) – filter: Any number of filters that a node must match to be yielded.
iterate_following(*filter) – filter: Any number of filters that a node must match to be yielded.
iterate_following_siblings(*filter) – filter: Any number of filters that a node must match to be yielded.
iterate_preceding(*filter) – filter: Any number of filters that a node must match to be yielded.
iterate_preceding_siblings(*filter) – filter: Any number of filters that a node must match to be yielded.
Querying nodes
xpath(expression[, namespaces]) – See Queries with XPath & CSS for details on the extent of the XPath implementation.
Adding nodes
add_following_siblings(*node[, clone]) – Adds one or more nodes to the right of the node this method is called on.
add_preceding_siblings(*node[, clone]) – Adds one or more nodes to the left of the node this method is called on.
Removing a node from its tree
detach([retain_child_nodes]) – Removes the node from its tree.
replace_with(node[, clone]) – Removes the node and places the given one in its tree location.
Uncategorized methods
clone([deep, quick_and_unsafe]) – deep: Clones the whole subtree if True.
new_tag_node(local_name[, attributes, ...]) – Creates a new TagNode instance in the node's context.
- add_following_siblings(*node: Union[str, NodeBase, _TagDefinition], clone: bool = False)
Adds one or more nodes to the right of the node this method is called on.
The nodes can be concrete instances of any node type or rather abstract descriptions in the form of strings or objects returned from the tag() function, from which TextNode or TagNode instances are derived, respectively.
- Parameters
node – The node(s) to be added.
clone – Clones the concrete nodes before adding if True.
- add_preceding_siblings(*node: Union[str, NodeBase, _TagDefinition], clone: bool = False)
Adds one or more nodes to the left of the node this method is called on.
The nodes can be concrete instances of any node type or rather abstract descriptions in the form of strings or objects returned from the tag() function, from which TextNode or TagNode instances are derived, respectively.
- Parameters
node – The node(s) to be added.
clone – Clones the concrete nodes before adding if True.
- clone(deep: bool = False, quick_and_unsafe: bool = False) -> _ElementWrappingNode
- Parameters
deep – Clones the whole subtree if True.
quick_and_unsafe – Creates a deep clone in a quicker manner where text nodes may get lost. It should be safe with trees that don’t contain subsequent text nodes, e.g. freshly parsed, unaltered documents or after TagNode.merge_text_nodes() has been applied.
- Returns
A copy of the node.
- detach(retain_child_nodes: bool = False) -> _ElementWrappingNode
Removes the node from its tree.
- Parameters
retain_child_nodes – Keeps the node’s descendants in the originating tree if True.
- Returns
The removed node.
- property document: Optional[Document]
The Document instance that the node is associated with or None.
- fetch_following(*filter: _delb.typing.Filter) -> Optional[NodeBase]
- Parameters
filter – Any number of filters.
- Returns
The next node in document order that matches all filters or None.
- fetch_following_sibling(*filter: _delb.typing.Filter) -> Optional[NodeBase]
- Parameters
filter – Any number of filters.
- Returns
The next sibling to the right that matches all filters or None.
- fetch_preceding(*filter: _delb.typing.Filter) -> Optional[NodeBase]
- Parameters
filter – Any number of filters.
- Returns
The previous node in document order that matches all filters or None.
- fetch_preceding_sibling(*filter: _delb.typing.Filter) -> Optional[NodeBase]
- Parameters
filter – Any number of filters.
- Returns
The next sibling to the left that matches all filters or None.
- first_child = None
- property index: Optional[int]
The node’s index within the parent’s collection of child nodes or None when the node has no parent.
- iterate_ancestors(*filter: _delb.typing.Filter) -> Iterator[TagNode]
- Parameters
filter – Any number of filters that a node must match to be yielded.
- Returns
A generator iterator that yields the ancestor nodes from bottom to top.
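How filters and the bottom-to-top order interact can be sketched with a toy tree. The Node class below is a hypothetical stand-in, not a delb class, and it assumes that a filter is a callable that takes a node and returns a boolean.

```python
from typing import Callable, Iterator, Optional


class Node:
    """A toy node that only knows its name and its parent (illustration only)."""

    def __init__(self, name: str, parent: Optional["Node"] = None) -> None:
        self.name = name
        self.parent = parent

    def iterate_ancestors(
        self, *filter: Callable[["Node"], bool]
    ) -> Iterator["Node"]:
        # Walk upwards from the direct parent to the root and yield
        # only those ancestors that satisfy every given filter.
        node = self.parent
        while node is not None:
            if all(f(node) for f in filter):
                yield node
            node = node.parent


root = Node("root")
section = Node("section", parent=root)
paragraph = Node("p", parent=section)
comment = Node("#comment", parent=paragraph)

names = [n.name for n in comment.iterate_ancestors()]
print(names)  # prints ['p', 'section', 'root']

filtered = [
    n.name
    for n in comment.iterate_ancestors(
        lambda n: n.name != "p",      # drops the paragraph
        lambda n: len(n.name) > 4,    # drops the root
    )
]
print(filtered)  # prints ['section']
```

With no filter at all, all() over an empty sequence is true, so every ancestor is yielded.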
- iterate_children(*filter: _delb.typing.Filter, recurse: bool = False) -> Iterator[NodeBase]
A generator iterator that yields nothing.
- iterate_descendants(*filter: _delb.typing.Filter) -> Iterator[NodeBase]
- Parameters
filter – Any number of filters that a node must match to be yielded.
- Returns
A generator iterator that yields the descending nodes of the node.
- iterate_following(*filter: _delb.typing.Filter) -> Iterator[NodeBase]
- Parameters
filter – Any number of filters that a node must match to be yielded.
- Returns
A generator iterator that yields the following nodes in document order.
- iterate_following_siblings(*filter: _delb.typing.Filter) -> Iterator[NodeBase]
- Parameters
filter – Any number of filters that a node must match to be yielded.
- Returns
A generator iterator that yields the siblings to the node’s right.
- iterate_preceding(*filter: _delb.typing.Filter) -> Iterator[NodeBase]
- Parameters
filter – Any number of filters that a node must match to be yielded.
- Returns
A generator iterator that yields the previous nodes in document order.
- iterate_preceding_siblings(*filter: _delb.typing.Filter) -> Iterator[NodeBase]
- Parameters
filter – Any number of filters that a node must match to be yielded.
- Returns
A generator iterator that yields the siblings to the node’s left.
- last_child = None
- last_descendant = None
- new_tag_node(local_name: str, attributes: Optional[Dict[str, str]] = None, namespace: Optional[str] = None, children: Sequence[Union[str, NodeBase, _TagDefinition]] = ()) -> TagNode
Creates a new TagNode instance in the node’s context.
- Parameters
local_name – The tag name.
attributes – Optional attributes that are assigned to the new node.
namespace – An optional tag namespace. If none is provided, the context node’s namespace is inherited.
children – An optional sequence of objects that will be appended as child nodes. This can be existing nodes, strings that will be inserted as text nodes and in-place definitions of TagNode instances from tag(). The latter will be assigned to the same namespace.
- Returns
The newly created tag node.
- replace_with(node: Union[str, NodeBase, _TagDefinition], clone: bool = False) -> NodeBase
Removes the node and places the given one in its tree location.
The node can be a concrete instance of any node type or a rather abstract description in the form of a string or an object returned from the tag() function, from which a TextNode or TagNode instance is derived, respectively.
- Parameters
node – The replacing node.
clone – A concrete, replacing node is cloned if True.
- Returns
The removed node.
- xpath(expression: str, namespaces: Optional[Namespaces] = None) -> QueryResults
See Queries with XPath & CSS for details on the extent of the XPath implementation.
- Parameters
expression – A supported XPath 1.0 expression that contains one or more location paths.
namespaces – A mapping of prefixes that are used in the expression to namespaces. If omitted, the node’s definition is used.
- Returns
All nodes that match the evaluation of the provided XPath expression.
Processing instruction
- class delb.ProcessingInstructionNode(etree_element: _Element)
The instances of this class represent processing instruction nodes of a tree.
To instantiate new nodes use new_processing_instruction_node().
Properties
The processing instruction's text.
The depth (or level) of the node in its tree.
The Document instance that the node is associated with or None.
The concatenated contents of all text node descendants in document order.
The node's index within the parent's collection of child nodes or None when the node has no parent.
The prefix to namespace mapping of the node.
The node's parent or None.
The processing instruction's target.
Fetching a single relative node
fetch_following(*filter) – filter: Any number of filters.
fetch_following_sibling(*filter) – filter: Any number of filters.
fetch_preceding(*filter) – filter: Any number of filters.
fetch_preceding_sibling(*filter) – filter: Any number of filters.
Iterating over relative nodes
iterate_ancestors(*filter) – filter: Any number of filters that a node must match to be yielded.
iterate_children(*filter[, recurse]) – A generator iterator that yields nothing.
iterate_descendants(*filter) – filter: Any number of filters that a node must match to be yielded.
iterate_following(*filter) – filter: Any number of filters that a node must match to be yielded.
iterate_following_siblings(*filter) – filter: Any number of filters that a node must match to be yielded.
iterate_preceding(*filter) – filter: Any number of filters that a node must match to be yielded.
iterate_preceding_siblings(*filter) – filter: Any number of filters that a node must match to be yielded.
Querying nodes
xpath(expression[, namespaces]) – See Queries with XPath & CSS for details on the extent of the XPath implementation.
Adding nodes
add_following_siblings(*node[, clone]) – Adds one or more nodes to the right of the node this method is called on.
add_preceding_siblings(*node[, clone]) – Adds one or more nodes to the left of the node this method is called on.
Removing a node from its tree
detach([retain_child_nodes]) – Removes the node from its tree.
replace_with(node[, clone]) – Removes the node and places the given one in its tree location.
Uncategorized methods
clone([deep, quick_and_unsafe]) – deep: Clones the whole subtree if True.
new_tag_node(local_name[, attributes, ...]) – Creates a new TagNode instance in the node's context.
- add_following_siblings(*node: Union[str, NodeBase, _TagDefinition], clone: bool = False)
Adds one or more nodes to the right of the node this method is called on.
The nodes can be concrete instances of any node type or rather abstract descriptions in the form of strings or objects returned from the tag() function, from which TextNode or TagNode instances are derived, respectively.
- Parameters
node – The node(s) to be added.
clone – Clones the concrete nodes before adding if True.
- add_preceding_siblings(*node: Union[str, NodeBase, _TagDefinition], clone: bool = False)
Adds one or more nodes to the left of the node this method is called on.
The nodes can be concrete instances of any node type or rather abstract descriptions in the form of strings or objects returned from the tag() function, from which TextNode or TagNode instances are derived, respectively.
- Parameters
node – The node(s) to be added.
clone – Clones the concrete nodes before adding if True.
- clone(deep: bool = False, quick_and_unsafe: bool = False) -> _ElementWrappingNode
- Parameters
deep – Clones the whole subtree if True.
quick_and_unsafe – Creates a deep clone in a quicker manner where text nodes may get lost. It should be safe with trees that don’t contain subsequent text nodes, e.g. freshly parsed, unaltered documents or after TagNode.merge_text_nodes() has been applied.
- Returns
A copy of the node.
- detach(retain_child_nodes: bool = False) -> _ElementWrappingNode
Removes the node from its tree.
- Parameters
retain_child_nodes – Keeps the node’s descendants in the originating tree if True.
- Returns
The removed node.
- property document: Optional[Document]
The Document instance that the node is associated with or None.
- fetch_following(*filter: _delb.typing.Filter) -> Optional[NodeBase]
- Parameters
filter – Any number of filters.
- Returns
The next node in document order that matches all filters or None.
- fetch_following_sibling(*filter: _delb.typing.Filter) -> Optional[NodeBase]
- Parameters
filter – Any number of filters.
- Returns
The next sibling to the right that matches all filters or None.
- fetch_preceding(*filter: _delb.typing.Filter) -> Optional[NodeBase]
- Parameters
filter – Any number of filters.
- Returns
The previous node in document order that matches all filters or None.
- fetch_preceding_sibling(*filter: _delb.typing.Filter) -> Optional[NodeBase]
- Parameters
filter – Any number of filters.
- Returns
The next sibling to the left that matches all filters or None.
- first_child = None
- property index: Optional[int]
The node’s index within the parent’s collection of child nodes or None when the node has no parent.
- iterate_ancestors(*filter: _delb.typing.Filter) -> Iterator[TagNode]
- Parameters
filter – Any number of filters that a node must match to be yielded.
- Returns
A generator iterator that yields the ancestor nodes from bottom to top.
- iterate_children(*filter: _delb.typing.Filter, recurse: bool = False) -> Iterator[NodeBase]
A generator iterator that yields nothing.
- iterate_descendants(*filter: _delb.typing.Filter) -> Iterator[NodeBase]
- Parameters
filter – Any number of filters that a node must match to be yielded.
- Returns
A generator iterator that yields the descending nodes of the node.
- iterate_following(*filter: _delb.typing.Filter) -> Iterator[NodeBase]
- Parameters
filter – Any number of filters that a node must match to be yielded.
- Returns
A generator iterator that yields the following nodes in document order.
- iterate_following_siblings(*filter: _delb.typing.Filter) -> Iterator[NodeBase]
- Parameters
filter – Any number of filters that a node must match to be yielded.
- Returns
A generator iterator that yields the siblings to the node’s right.
- iterate_preceding(*filter: _delb.typing.Filter) -> Iterator[NodeBase]
- Parameters
filter – Any number of filters that a node must match to be yielded.
- Returns
A generator iterator that yields the previous nodes in document order.
- iterate_preceding_siblings(*filter: _delb.typing.Filter) -> Iterator[NodeBase]
- Parameters
filter – Any number of filters that a node must match to be yielded.
- Returns
A generator iterator that yields the siblings to the node’s left.
- last_child = None
- last_descendant = None
- new_tag_node(local_name: str, attributes: Optional[Dict[str, str]] = None, namespace: Optional[str] = None, children: Sequence[Union[str, NodeBase, _TagDefinition]] = ()) -> TagNode
Creates a new TagNode instance in the node’s context.
- Parameters
local_name – The tag name.
attributes – Optional attributes that are assigned to the new node.
namespace – An optional tag namespace. If none is provided, the context node’s namespace is inherited.
children – An optional sequence of objects that will be appended as child nodes. This can be existing nodes, strings that will be inserted as text nodes and in-place definitions of TagNode instances from tag(). The latter will be assigned to the same namespace.
- Returns
The newly created tag node.
- replace_with(node: Union[str, NodeBase, _TagDefinition], clone: bool = False) -> NodeBase
Removes the node and places the given one in its tree location.
The node can be a concrete instance of any node type or a rather abstract description in the form of a string or an object returned from the tag() function, from which a TextNode or TagNode instance is derived, respectively.
- Parameters
node – The replacing node.
clone – A concrete, replacing node is cloned if True.
- Returns
The removed node.
- xpath(expression: str, namespaces: Optional[Namespaces] = None) -> QueryResults
See Queries with XPath & CSS for details on the extent of the XPath implementation.
- Parameters
expression – A supported XPath 1.0 expression that contains one or more location paths.
namespaces – A mapping of prefixes that are used in the expression to namespaces. If omitted, the node’s definition is used.
- Returns
All nodes that match the evaluation of the provided XPath expression.
Tag
- class delb.TagNode(etree_element: _Element)
The instances of this class represent tag nodes of a tree, the equivalent of DOM’s elements.
To instantiate new nodes use Document.new_tag_node, TagNode.new_tag_node, TextNode.new_tag_node or new_tag_node().
Some syntactic sugar is baked in:
Attributes and nodes can be tested for membership in a node.
>>> root = Document('<root ham="spam"><child/></root>').root
>>> child = root.first_child
>>> "ham" in root
True
>>> child in root
True
Nodes can be copied. Note that this relies on TagNode.clone().
>>> from copy import copy, deepcopy
>>> root = Document("<root>Content</root>").root
>>> print(copy(root))
<root/>
>>> print(deepcopy(root))
<root>Content</root>
Nodes can be tested for equality regarding their qualified name and attributes.
>>> root = Document('<root><foo x="0"/><foo x="0"/><bar x="0"/></root>').root
>>> root[0] == root[1]
True
>>> root[0] == root[2]
False
Attribute values and child nodes can be obtained with the subscript notation.
>>> root = Document('<root x="0"><child_1/>child_2<child_3/></root>').root
>>> root["x"]
'0'
>>> print(root[0])
<child_1/>
>>> print(root[-1])
<child_3/>
>>> print([str(x) for x in root[1::-1]])
['child_2', '<child_1/>']
How many child nodes does this node have anyway?
>>> root = Document("<root><child_1/><child_2/></root>").root
>>> len(root)
2
>>> len(root[0])
0
As seen in the examples above, a tag node’s string representation yields a serialized XML representation of a sub-/tree.
Properties
A mapping that can be used to query and alter the node's attributes.
The depth (or level) of the node in its tree.
The Document instance that the node is associated with or None.
The node's first child node.
The concatenated contents of all text node descendants in document order.
This is a shortcut to retrieve and set the id attribute in the XML namespace.
The node's index within the parent's collection of child nodes or None when the node has no parent.
The node's last child node.
The node's last descendant.
The node's name.
An unambiguous XPath location path that points to this node from its tree root.
The node's namespace.
The prefix to namespace mapping of the node.
The node's parent or None.
The prefix that the node's namespace is currently mapped to.
The node's qualified name in Clark notation.
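Clark notation writes a qualified name as {namespace}local_name, with the braces omitted when there is no namespace. The following helpers, whose names are made up for this illustration and are not part of delb, show how such a string can be taken apart and put together again.

```python
from typing import Optional, Tuple


def split_clark(name: str) -> Tuple[Optional[str], str]:
    """Split a name in Clark notation into (namespace, local name)."""
    if name.startswith("{"):
        namespace, closing, local_name = name[1:].partition("}")
        if not closing:
            raise ValueError(f"Missing closing brace in {name!r}")
        return namespace, local_name
    # No braces means that the name is not in a namespace.
    return None, name


def join_clark(namespace: Optional[str], local_name: str) -> str:
    """Build a name in Clark notation from its two parts."""
    return f"{{{namespace}}}{local_name}" if namespace else local_name


print(split_clark("{http://www.tei-c.org/ns/1.0}p"))
# prints ('http://www.tei-c.org/ns/1.0', 'p')
print(join_clark(None, "p"))  # prints p
```

The notation is independent of any prefix, which is why it stays unambiguous when prefixes are remapped.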
Fetching a single relative node
fetch_following(*filter) – filter: Any number of filters.
fetch_following_sibling(*filter) – filter: Any number of filters.
fetch_preceding(*filter) – filter: Any number of filters.
fetch_preceding_sibling(*filter) – filter: Any number of filters.
Iterating over relative nodes
iterate_ancestors(*filter) – filter: Any number of filters that a node must match to be yielded.
iterate_children(*filter[, recurse]) – filter: Any number of filters that a node must match to be yielded.
iterate_descendants(*filter) – filter: Any number of filters that a node must match to be yielded.
iterate_following(*filter) – filter: Any number of filters that a node must match to be yielded.
iterate_following_siblings(*filter) – filter: Any number of filters that a node must match to be yielded.
iterate_preceding(*filter) – filter: Any number of filters that a node must match to be yielded.
iterate_preceding_siblings(*filter) – filter: Any number of filters that a node must match to be yielded.
Querying nodes
css_select(expression[, namespaces]) – See Queries with XPath & CSS regarding the extent of the supported grammar.
fetch_or_create_by_xpath(expression[, ...]) – Fetches a single node that is locatable by the provided XPath expression.
xpath(expression[, namespaces]) – See Queries with XPath & CSS for details on the extent of the XPath implementation.
Adding nodes
add_following_siblings(*node[, clone]) – Adds one or more nodes to the right of the node this method is called on.
add_preceding_siblings(*node[, clone]) – Adds one or more nodes to the left of the node this method is called on.
append_children(*node[, clone]) – Adds one or more nodes as child nodes after any existing child nodes of the node this method is called on.
insert_children(index, *node[, clone]) – Inserts one or more child nodes.
prepend_children(*node[, clone]) – Adds one or more nodes as child nodes before any existing child nodes of the node this method is called on.
Removing a node from its tree
detach([retain_child_nodes]) – Removes the node from its tree.
replace_with(node[, clone]) – Removes the node and places the given one in its tree location.
Uncategorized methods
clone([deep, quick_and_unsafe]) – deep: Clones the whole subtree if True.
merge_text_nodes() – Merges all consecutive text nodes in the subtree into one.
new_tag_node(local_name[, attributes, ...]) – Creates a new TagNode instance in the node's context.
parse(text[, parser, parser_options, ...]) – Parses the given string or bytes sequence into a new tree.
- add_following_siblings(*node: Union[str, NodeBase, _TagDefinition], clone: bool = False)¶
Adds one or more nodes to the right of the node this method is called on.
The nodes can be concrete instances of any node type or rather abstract descriptions in the form of strings or objects returned from the
tag()
function that are used to deriveTextNode
respectivelyTagNode
instances from.- Parameters
node – The node(s) to be added.
clone – Clones the concrete nodes before adding if
True
.
- add_preceding_siblings(*node: Union[str, NodeBase, _TagDefinition], clone: bool = False)¶
Adds one or more nodes to the left of the node this method is called on.
The nodes can be concrete instances of any node type or abstract descriptions in the form of strings or objects returned from the tag() function, from which TextNode and TagNode instances are derived, respectively.
- Parameters
node – The node(s) to be added.
clone – Clones the concrete nodes before adding if True.
- append_children(*node: Union[str, NodeBase, _TagDefinition], clone: bool = False)[source]¶
Adds one or more nodes as child nodes after any existing child nodes of the node this method is called on.
The nodes can be concrete instances of any node type or abstract descriptions in the form of strings or objects returned from the tag() function, from which TextNode and TagNode instances are derived, respectively.
- Parameters
node – The node(s) to be added.
clone – Clones the concrete nodes before adding if True.
- property attributes: TagAttributes¶
A mapping that can be used to query and alter the node’s attributes.
>>> node = new_tag_node("node", attributes={"foo": "0", "bar": "0"})
>>> node.attributes
{'foo': '0', 'bar': '0'}
>>> node.attributes.pop("bar")
'0'
>>> node.attributes["foo"] = "1"
>>> node.attributes["peng"] = "1"
>>> print(node)
<node foo="1" peng="1"/>
>>> node.attributes.update({"foo": "2", "zong": "2"})
>>> print(node)
<node foo="2" peng="1" zong="2"/>
Namespaced attributes can be accessed by using Python’s slice notation. A default namespace can optionally be provided, but the attribute is also found without it.
>>> node = new_tag_node("node", {})
>>> node.attributes["http://namespace":"foo"] = "0"
>>> print(node)
<node xmlns:ns0="http://namespace" ns0:foo="0"/>
>>> node = Document('<node xmlns="default" foo="0"/>').root
>>> node.attributes["default":"foo"] is node.attributes["foo"]
True
Attributes behave like strings, but also expose namespace, local name and value for manipulation.
>>> node = new_tag_node("node")
>>> node.attributes["foo"] = "0"
>>> node.attributes["foo"].local_name = "bar"
>>> node.attributes["bar"].namespace = "http://namespace"
>>> node.attributes["http://namespace":"bar"].value = "1"
>>> print(node)
<node xmlns:ns0="http://namespace" ns0:bar="1"/>
Unlike with typical Python mappings, requesting a non-existing attribute doesn’t raise a KeyError; instead, None is returned.
- clone(deep: bool = False, quick_and_unsafe: bool = False) TagNode [source]¶
- Parameters
deep – Clones the whole subtree if True.
quick_and_unsafe – Creates a deep clone in a quicker manner where text nodes may get lost. It should be safe with trees that don’t contain consecutive text nodes, e.g. freshly parsed, unaltered documents or after TagNode.merge_text_nodes() has been applied.
- Returns
A copy of the node.
- css_select(expression: str, namespaces: Optional[Namespaces] = None) QueryResults [source]¶
See Queries with XPath & CSS regarding the extent of the supported grammar.
Namespace prefixes are delimited with a | before a name test. For example, div svg|metadata selects all descendants of nodes named div (that belong to the default namespace or have no namespace) whose name is metadata and whose namespace is mapped to the svg prefix.
- Parameters
expression – A CSS selector expression.
namespaces – A mapping of prefixes that are used in the expression to namespaces. If omitted, the node’s definition is used.
- Returns
All nodes that match the evaluation of the provided CSS selector expression.
- detach(retain_child_nodes: bool = False) _ElementWrappingNode [source]¶
Removes the node from its tree.
- Parameters
retain_child_nodes – Keeps the node’s descendants in the originating tree if True.
- Returns
The removed node.
- property document: Optional[Document]¶
The Document instance that the node is associated with or None.
- fetch_following(*filter: _delb.typing.Filter) Optional[NodeBase] ¶
- Parameters
filter – Any number of filters.
- Returns
The next node in document order that matches all filters or None.
- fetch_following_sibling(*filter: _delb.typing.Filter) Optional[NodeBase] ¶
- Parameters
filter – Any number of filters.
- Returns
The next sibling to the right that matches all filters or None.
- fetch_or_create_by_xpath(expression: str, namespaces: Union[Namespaces, None, Mapping[Optional[str], str]] = None) TagNode [source]¶
Fetches a single node that is locatable by the provided XPath expression. If the node doesn’t exist, the missing branch will be created. The expression must obey these rules:
All location steps must use the child axis.
Each step needs to provide a name test.
Attributes must be compared against a literal.
Multiple attribute comparisons must be joined with the and operator and / or more than one predicate expression.
The logical validity of multiple attribute comparisons isn’t checked. E.g. one could provide foo[@p="her"][@p="him"], but should expect undefined behaviour.
Other contents in predicate expressions are invalid.
>>> document = Document("<root/>")
>>> grandchild = document.root.fetch_or_create_by_xpath(
...     "child[@a='b']/grandchild"
... )
>>> grandchild is document.root.fetch_or_create_by_xpath(
...     "child[@a='b']/grandchild"
... )
True
>>> str(document)
'<root><child a="b"><grandchild/></child></root>'
- Parameters
expression – An XPath expression that can unambiguously locate a descendant node in a tree of any state.
namespaces – An optional mapping of prefixes to namespaces. By default, the node’s mapping is used.
- Returns
The existing or freshly created node described by expression.
- fetch_preceding(*filter: _delb.typing.Filter) Optional[NodeBase] ¶
- Parameters
filter – Any number of filters.
- Returns
The previous node in document order that matches all filters or None.
- fetch_preceding_sibling(*filter: _delb.typing.Filter) Optional[NodeBase] ¶
- Parameters
filter – Any number of filters.
- Returns
The next sibling to the left that matches all filters or None.
- property id: Optional[str]¶
This is a shortcut to retrieve and set the id attribute in the XML namespace. The client code is responsible for passing properly formed id names.
- property index: Optional[int]¶
The node’s index within the parent’s collection of child nodes or None when the node has no parent.
- insert_children(index: int, *node: Union[str, NodeBase, _TagDefinition], clone: bool = False)[source]¶
Inserts one or more child nodes.
The nodes can be concrete instances of any node type or abstract descriptions in the form of strings or objects returned from the tag() function, from which TextNode and TagNode instances are derived, respectively.
- Parameters
index – The index at which the first of the given nodes will be inserted; the remaining nodes are added afterwards in the given order.
node – The node(s) to be added.
clone – Clones the concrete nodes before adding if True.
- iterate_ancestors(*filter: _delb.typing.Filter) Iterator[TagNode] ¶
- Parameters
filter – Any number of filters that a node must match to be yielded.
- Returns
A generator iterator that yields the ancestor nodes from bottom to top.
- iterate_children(*filter: _delb.typing.Filter, recurse: bool = False) Iterator[NodeBase] [source]¶
- Parameters
filter – Any number of filters that a node must match to be yielded.
recurse – Deprecated. Use NodeBase.iterate_descendants().
- Returns
A generator iterator that yields the child nodes of the node.
- iterate_descendants(*filter: _delb.typing.Filter) Iterator[NodeBase] [source]¶
- Parameters
filter – Any number of filters that a node must match to be yielded.
- Returns
A generator iterator that yields the descendant nodes of the node.
- iterate_following(*filter: _delb.typing.Filter) Iterator[NodeBase] ¶
- Parameters
filter – Any number of filters that a node must match to be yielded.
- Returns
A generator iterator that yields the following nodes in document order.
- iterate_following_siblings(*filter: _delb.typing.Filter) Iterator[NodeBase] ¶
- Parameters
filter – Any number of filters that a node must match to be yielded.
- Returns
A generator iterator that yields the siblings to the node’s right.
- iterate_preceding(*filter: _delb.typing.Filter) Iterator[NodeBase] ¶
- Parameters
filter – Any number of filters that a node must match to be yielded.
- Returns
A generator iterator that yields the previous nodes in document order.
- iterate_preceding_siblings(*filter: _delb.typing.Filter) Iterator[NodeBase] ¶
- Parameters
filter – Any number of filters that a node must match to be yielded.
- Returns
A generator iterator that yields the siblings to the node’s left.
- property location_path: str¶
An unambiguous XPath location path that points to this node from its tree root.
- property namespace: Optional[str]¶
The node’s namespace. Be aware that while this property can be set to None, serializations will continue to render a previous default namespace declaration if the node had one.
- new_tag_node(local_name: str, attributes: Optional[Dict[str, str]] = None, namespace: Optional[str] = None, children: Sequence[Union[str, NodeBase, _TagDefinition]] = ()) TagNode [source]¶
Creates a new TagNode instance in the node’s context.
- Parameters
local_name – The tag name.
attributes – Optional attributes that are assigned to the new node.
namespace – An optional tag namespace. If none is provided, the context node’s namespace is inherited.
children – An optional sequence of objects that will be appended as child nodes. These can be existing nodes, strings that will be inserted as text nodes, and in-place definitions of TagNode instances from tag(). The latter will be assigned to the same namespace.
- Returns
The newly created tag node.
- static parse(text: AnyStr, parser: Optional[XMLParser] = None, parser_options: Optional[ParserOptions] = None, collapse_whitespace: Optional[bool] = None) TagNode [source]¶
Parses the given string or bytes sequence into a new tree.
- Parameters
text – A serialized XML tree.
parser – Deprecated.
parser_options – A delb.ParserOptions instance to configure the used parser.
collapse_whitespace – Deprecated. Use the argument with the same name on the parser_options object.
- prepend_children(*node: NodeBase, clone: bool = False) None [source]¶
Adds one or more nodes as child nodes before any existing child nodes of the node this method is called on.
The nodes can be concrete instances of any node type or abstract descriptions in the form of strings or objects returned from the tag() function, from which TextNode and TagNode instances are derived, respectively.
- Parameters
node – The node(s) to be added.
clone – Clones the concrete nodes before adding if True.
- replace_with(node: Union[str, NodeBase, _TagDefinition], clone: bool = False) NodeBase ¶
Removes the node and places the given one in its tree location.
The node can be a concrete instance of any node type or an abstract description in the form of a string or an object returned from the tag() function, from which a TextNode or TagNode instance is derived, respectively.
- Parameters
node – The replacing node.
clone – A concrete, replacing node is cloned if True.
- Returns
The removed node.
- property universal_name: str¶
The node’s qualified name in Clark notation.
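Clark notation writes a qualified name as the namespace URI in curly braces followed by the local name. The following is a minimal sketch of that format only, not delb's implementation; the helper names to_clark_notation and from_clark_notation are hypothetical.

```python
from typing import Optional, Tuple


def to_clark_notation(namespace: Optional[str], local_name: str) -> str:
    """Joins a namespace and a local name as ``{namespace}local_name``."""
    if namespace:
        return "{" + namespace + "}" + local_name
    # Without a namespace the name consists of the local name only.
    return local_name


def from_clark_notation(name: str) -> Tuple[Optional[str], str]:
    """Splits a name in Clark notation into namespace and local name."""
    if name.startswith("{"):
        namespace, _, local_name = name[1:].partition("}")
        return namespace, local_name
    return None, name


XML_NAMESPACE = "http://www.w3.org/XML/1998/namespace"
print(to_clark_notation(XML_NAMESPACE, "id"))
# prints: {http://www.w3.org/XML/1998/namespace}id
```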
- xpath(expression: str, namespaces: Optional[Namespaces] = None) QueryResults [source]¶
See Queries with XPath & CSS for details on the extent of the XPath implementation.
- Parameters
expression – A supported XPath 1.0 expression that contains one or more location paths.
namespaces – A mapping of prefixes that are used in the expression to namespaces. If omitted, the node’s definition is used.
- Returns
All nodes that match the evaluation of the provided XPath expression.
Tag attribute¶
- class delb.nodes.Attribute(attributes: TagAttributes, key: str)[source]¶
Attribute objects represent a tag node’s attributes. See the delb.TagNode.attributes() documentation for capabilities.
- property universal_name: str¶
The attribute’s namespace and local name in Clark notation.
Text¶
- class delb.TextNode(reference_or_text: Union[_Element, str, TextNode], position: int = 0)[source]¶
TextNodes contain the textual data of a document. The class should not be instantiated by client code; simply add strings to the trees.
Instances expose all methods of str except str.index():
>>> node = TextNode("Show us the way to the next whisky bar.")
>>> node.split()
['Show', 'us', 'the', 'way', 'to', 'the', 'next', 'whisky', 'bar.']
Instances can be tested for equality with other text nodes and strings:
>>> TextNode("ham") == TextNode("spam")
False
>>> TextNode("Patsy") == "Patsy"
True
And they can be tested for substrings:
>>> "Sir" in TextNode("Sir Bedevere the Wise")
True
Attributes that relate to child nodes yield nothing or None, respectively.
Properties
The node's text content.
The depth (or level) of the node in its tree.
The Document instance that the node is associated with or None.
The concatenated contents of all text node descendants in document order.
The node's index within the parent's collection of child nodes or None when the node has no parent.
The prefix to namespace mapping of the node.
The node's parent or None.
Fetching a single relative node
fetch_following
(*filter) Returns the next node in document order that matches all filters or None.
fetch_following_sibling
(*filter) Returns the next sibling to the right that matches all filters or None.
fetch_preceding
(*filter) Returns the previous node in document order that matches all filters or None.
fetch_preceding_sibling
(*filter) Returns the next sibling to the left that matches all filters or None.
Iterating over relative nodes
iterate_ancestors
(*filter) A generator iterator that yields the ancestor nodes from bottom to top.
iterate_children
(*filter[, recurse]) A generator iterator that yields nothing.
iterate_descendants
(*filter) A generator iterator that yields the descendant nodes of the node.
iterate_following
(*filter) A generator iterator that yields the following nodes in document order.
iterate_following_siblings
(*filter) A generator iterator that yields the siblings to the node's right.
iterate_preceding
(*filter) A generator iterator that yields the previous nodes in document order.
iterate_preceding_siblings
(*filter) A generator iterator that yields the siblings to the node's left.
Querying nodes
xpath
(expression[, namespaces]) See Queries with XPath & CSS for details on the extent of the XPath implementation.
Adding nodes
add_following_siblings
(*node[, clone]) Adds one or more nodes to the right of the node this method is called on.
add_preceding_siblings
(*node[, clone]) Adds one or more nodes to the left of the node this method is called on.
Removing a node from its tree
detach
([retain_child_nodes]) Removes the node from its tree.
replace_with
(node[, clone]) Removes the node and places the given one in its tree location.
Uncategorized methods
clone
([deep, quick_and_unsafe]) Creates a copy of the node; the whole subtree is cloned if deep is True.
new_tag_node
(local_name[, attributes, ...]) Creates a new TagNode instance in the node's context.
- add_following_siblings(*node: Union[str, NodeBase, _TagDefinition], clone: bool = False)¶
Adds one or more nodes to the right of the node this method is called on.
The nodes can be concrete instances of any node type or abstract descriptions in the form of strings or objects returned from the tag() function, from which TextNode and TagNode instances are derived, respectively.
- Parameters
node – The node(s) to be added.
clone – Clones the concrete nodes before adding if True.
- add_preceding_siblings(*node: Union[str, NodeBase, _TagDefinition], clone: bool = False)¶
Adds one or more nodes to the left of the node this method is called on.
The nodes can be concrete instances of any node type or abstract descriptions in the form of strings or objects returned from the tag() function, from which TextNode and TagNode instances are derived, respectively.
- Parameters
node – The node(s) to be added.
clone – Clones the concrete nodes before adding if True.
- clone(deep: bool = False, quick_and_unsafe: bool = False) NodeBase [source]¶
- Parameters
deep – Clones the whole subtree if True.
quick_and_unsafe – Creates a deep clone in a quicker manner where text nodes may get lost. It should be safe with trees that don’t contain consecutive text nodes, e.g. freshly parsed, unaltered documents or after TagNode.merge_text_nodes() has been applied.
- Returns
A copy of the node.
- detach(retain_child_nodes: bool = False) TextNode [source]¶
Removes the node from its tree.
- Parameters
retain_child_nodes – Keeps the node’s descendants in the originating tree if True.
- Returns
The removed node.
- property document: Optional[Document]¶
The Document instance that the node is associated with or None.
- fetch_following(*filter: _delb.typing.Filter) Optional[NodeBase] ¶
- Parameters
filter – Any number of filters.
- Returns
The next node in document order that matches all filters or None.
- fetch_following_sibling(*filter: _delb.typing.Filter) Optional[NodeBase] ¶
- Parameters
filter – Any number of filters.
- Returns
The next sibling to the right that matches all filters or None.
- fetch_preceding(*filter: _delb.typing.Filter) Optional[NodeBase] ¶
- Parameters
filter – Any number of filters.
- Returns
The previous node in document order that matches all filters or None.
- fetch_preceding_sibling(*filter: _delb.typing.Filter) Optional[NodeBase] [source]¶
- Parameters
filter – Any number of filters.
- Returns
The next sibling to the left that matches all filters or None.
- first_child = None¶
- property index: Optional[int]¶
The node’s index within the parent’s collection of child nodes or None when the node has no parent.
- iterate_ancestors(*filter: _delb.typing.Filter) Iterator[TagNode] ¶
- Parameters
filter – Any number of filters that a node must match to be yielded.
- Returns
A generator iterator that yields the ancestor nodes from bottom to top.
- iterate_children(*filter: _delb.typing.Filter, recurse: bool = False) Iterator[NodeBase] ¶
A generator iterator that yields nothing.
- iterate_descendants(*filter: _delb.typing.Filter) Iterator[NodeBase] ¶
- Parameters
filter – Any number of filters that a node must match to be yielded.
- Returns
A generator iterator that yields the descendant nodes of the node.
- iterate_following(*filter: _delb.typing.Filter) Iterator[NodeBase] ¶
- Parameters
filter – Any number of filters that a node must match to be yielded.
- Returns
A generator iterator that yields the following nodes in document order.
- iterate_following_siblings(*filter: _delb.typing.Filter) Iterator[NodeBase] ¶
- Parameters
filter – Any number of filters that a node must match to be yielded.
- Returns
A generator iterator that yields the siblings to the node’s right.
- iterate_preceding(*filter: _delb.typing.Filter) Iterator[NodeBase] ¶
- Parameters
filter – Any number of filters that a node must match to be yielded.
- Returns
A generator iterator that yields the previous nodes in document order.
- iterate_preceding_siblings(*filter: _delb.typing.Filter) Iterator[NodeBase] ¶
- Parameters
filter – Any number of filters that a node must match to be yielded.
- Returns
A generator iterator that yields the siblings to the node’s left.
- last_child = None¶
- last_descendant = None¶
- new_tag_node(local_name: str, attributes: Optional[Dict[str, str]] = None, namespace: Optional[str] = None, children: Sequence[Union[str, NodeBase, _TagDefinition]] = ()) TagNode ¶
Creates a new TagNode instance in the node’s context.
- Parameters
local_name – The tag name.
attributes – Optional attributes that are assigned to the new node.
namespace – An optional tag namespace. If none is provided, the context node’s namespace is inherited.
children – An optional sequence of objects that will be appended as child nodes. These can be existing nodes, strings that will be inserted as text nodes, and in-place definitions of TagNode instances from tag(). The latter will be assigned to the same namespace.
- Returns
The newly created tag node.
- replace_with(node: Union[str, NodeBase, _TagDefinition], clone: bool = False) NodeBase ¶
Removes the node and places the given one in its tree location.
The node can be a concrete instance of any node type or an abstract description in the form of a string or an object returned from the tag() function, from which a TextNode or TagNode instance is derived, respectively.
- Parameters
node – The replacing node.
clone – A concrete, replacing node is cloned if True.
- Returns
The removed node.
- xpath(expression: str, namespaces: Optional[Namespaces] = None) QueryResults ¶
See Queries with XPath & CSS for details on the extent of the XPath implementation.
- Parameters
expression – A supported XPath 1.0 expression that contains one or more location paths.
namespaces – A mapping of prefixes that are used in the expression to namespaces. If omitted, the node’s definition is used.
- Returns
All nodes that match the evaluation of the provided XPath expression.
Node constructors¶
- delb.new_comment_node(content: str) CommentNode [source]¶
Creates a new CommentNode.
- Parameters
content – The comment’s content, a.k.a. its text.
- Returns
The newly created comment node.
- delb.new_processing_instruction_node(target: str, content: str) ProcessingInstructionNode [source]¶
Creates a new ProcessingInstructionNode.
- Parameters
target – The processing instruction’s target name.
content – The processing instruction’s text.
- Returns
The newly created processing instruction node.
- delb.new_tag_node(local_name: str, attributes: Optional[Dict[str, str]] = None, namespace: Optional[str] = None, children: Sequence[Union[str, NodeBase, _TagDefinition]] = ()) TagNode [source]¶
Creates a new TagNode instance outside any context. It is preferable to use new_tag_node() on instances of documents and nodes, where the instance is the creation context.
- Parameters
local_name – The tag name.
attributes – Optional attributes that are assigned to the new node.
namespace – An optional tag namespace.
children – An optional sequence of objects that will be appended as child nodes. These can be existing nodes, strings that will be inserted as text nodes, and in-place definitions of TagNode instances from tag(). The latter will be assigned to the same namespace.
- Returns
The newly created tag node.
Queries with XPath & CSS¶
delb allows querying of nodes with CSS selector and XPath expressions. CSS selectors are converted to XPath expressions with a third-party library before evaluation and they are only supported as far as their computed XPath equivalents are supported by delb’s very own XPath implementation.
This implementation is not fully compliant with any of the W3C’s XPath specifications. It mostly covers the XPath 1.0 specs, but focuses on querying via path expressions with simple constraints while omitting a broad employment of computations (that’s what programming languages are for). It therefore has these intended deviations from that standard:
Default namespaces can be addressed in node and attribute names by simply using no prefix.
The attribute and namespace axes are not supported in location steps (see also below).
In predicates only the attribute axis can be used in its abbreviated form (@name).
Path evaluations within predicates are not available.
- Only these predicate functions are provided and tested:
boolean
concat
contains
last
not
position
starts-with
text
Behaves as if deployed as a single step location path that only tests for the node type text. Hence it returns the contents of the context node’s first child node that is a text node or an empty string when there is none.
Please refrain from extension requests without a proper, concrete implementation proposal.
If you’re accustomed to retrieving attribute values with XPath expressions, employ the functionality of the higher programming language at hand like this:
>>> [x.attributes["target"] for x in root.xpath(".//foo")
... if "target" in x.attributes ]
Instead of:
>>> root.xpath(".//foo/@target")
See _delb.plugins.PluginManager.register_xpath_function() regarding the use of custom functions.
- class _delb.xpath.EvaluationContext(node: NodeBase, position: int, size: int, namespaces: Namespaces)[source]¶
Instances of this type are passed to XPath functions in order to pass contextual information.
- count(value, /)¶
Return number of occurrences of value.
- index(value, start=0, stop=9223372036854775807, /)¶
Return first index of value.
Raises ValueError if the value is not present.
- property namespaces¶
A mapping of prefixes to namespaces that is used in the whole evaluation.
- property node¶
The node that is evaluated.
- property position¶
The node’s position within all nodes that matched a location step’s node test in order of the step’s axis’ direction. The first position is 1.
- property size¶
The number of all nodes that matched a location step’s node test.
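To make the meaning of position and size concrete, here is a small sketch that assigns both values to a list of matched items. It assumes nothing about delb's internals; ContextSketch and iterate_contexts are hypothetical names, and plain strings stand in for nodes.

```python
from typing import Any, Iterator, NamedTuple, Sequence


class ContextSketch(NamedTuple):
    """Hypothetical stand-in that carries a node with its position and size."""

    node: Any
    position: int
    size: int


def iterate_contexts(matched_nodes: Sequence[Any]) -> Iterator[ContextSketch]:
    # The size is the number of all nodes that matched the node test.
    size = len(matched_nodes)
    # Positions are counted from 1, in the order the nodes were matched.
    for position, node in enumerate(matched_nodes, start=1):
        yield ContextSketch(node=node, position=position, size=size)


contexts = list(iterate_contexts(["a", "b", "c"]))
# Only the last context has a position that equals the size.
is_last = [context.position == context.size for context in contexts]
```

In XPath 1.0 terms, position() and last() correspond to the position and size of such a context.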
- class _delb.xpath.QueryResults(results: Iterable[NodeBase])[source]¶
A container that includes the results of a CSS selector or XPath query with some helpers for better readable Python expressions.
- count(value) integer -- return number of occurrences of value ¶
- filtered_by(*filters: _delb.typing.Filter) QueryResults [source]¶
Returns another QueryResults instance that contains all nodes filtered by the provided filters.
- property first: Optional[NodeBase]¶
The first node from the results or None if there are none.
- in_document_order() QueryResults [source]¶
Returns another QueryResults instance where the contained nodes are sorted in document order.
- index(value[, start[, stop]]) integer -- return first index of value. ¶
Raises ValueError if the value is not present.
Supporting start and stop arguments is optional, but recommended.
- property last: Optional[NodeBase]¶
The last node from the results or None if there are none.
Filters¶
Default filters¶
- delb.altered_default_filters(*filter: _delb.typing.Filter, extend: bool = False)[source]¶
This function can be used either as a context manager or as a decorator to define a set of default_filters for the encapsulated code block or callable. These are then applied in all operations that allow node filtering, like TagNode.next_node(). Mind that they also affect a node’s index property and indexed access to child nodes.
>>> root = Document(
...     '<root xmlns="foo"><a/><!--x--><b/><!--y--><c/></root>'
... ).root
>>> with altered_default_filters(is_comment_node):
...     print([x.content for x in root.iterate_children()])
['x', 'y']
As the default filters shadow comments and processing instructions by default, use no argument to unset this in order to access all types of nodes.
- Parameters
extend – Extends the currently active filters with the given ones instead of replacing them.
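How a helper can serve as both context manager and decorator, and how extending differs from replacing, can be sketched with the standard library alone. This is an illustration under assumptions and not delb's implementation; altered_filters_sketch, active_filters and the module-level stack are hypothetical.

```python
from contextlib import contextmanager

# Hypothetical stack of filter tuples; the last entry is the active set.
_filter_stack = [()]


def active_filters():
    return _filter_stack[-1]


@contextmanager
def altered_filters_sketch(*filters, extend=False):
    # Either extend the currently active filters or replace them.
    new_filters = active_filters() + filters if extend else filters
    _filter_stack.append(new_filters)
    try:
        yield
    finally:
        # Restore the previous filters, also when an exception occurred.
        _filter_stack.pop()


def is_even(number):
    return number % 2 == 0


def is_positive(number):
    return number > 0


with altered_filters_sketch(is_even):
    with altered_filters_sketch(is_positive, extend=True):
        combined = active_filters()  # (is_even, is_positive)


# Functions made with contextlib.contextmanager also work as decorators.
@altered_filters_sketch(is_positive)
def filters_inside_call():
    return active_filters()
```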
Contributed filters¶
- delb.any_of(*filter: _delb.typing.Filter) _delb.typing.Filter [source]¶
A node filter wrapper that matches when any of the given filters matches, like a boolean or.
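Since a filter is a callable that takes a node and returns a boolean, such a wrapper can be pictured as a simple closure. The sketch below only illustrates the idea with plain values as stand-ins for nodes; any_of_sketch is a hypothetical name and not delb's code.

```python
from typing import Any, Callable

Filter = Callable[[Any], bool]


def any_of_sketch(*filters: Filter) -> Filter:
    """Returns a filter that matches if at least one given filter matches."""

    def wrapper(node: Any) -> bool:
        # any() stops at the first filter that matches.
        return any(a_filter(node) for a_filter in filters)

    return wrapper


def is_string(value: Any) -> bool:
    return isinstance(value, str)


def is_negative(value: Any) -> bool:
    return isinstance(value, int) and value < 0


string_or_negative = any_of_sketch(is_string, is_negative)
matches = [x for x in ("a", 1, -1, 2.5) if string_or_negative(x)]
# matches is ['a', -1]
```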
- delb.is_comment_node(node: NodeBase) bool [source]¶
A node filter that matches CommentNode instances.
- delb.is_processing_instruction_node(node: NodeBase) bool [source]¶
A node filter that matches ProcessingInstructionNode instances.
Transformations¶
This module offers a canonical interface with the aim to make re-use of transforming algorithms easier.
Let’s look at it with examples:
from delb.transform import Transformation
class ResolveCopyOf(Transformation):
    def transform(self):
        for node in self.root.css_select("*[copyOf]"):
            source_id = node["copyOf"]
            source_node = self.origin_document.xpath(
                f'//*[@xml:id="{source_id[1:]}"]'
            ).first
            cloned_node = source_node.clone(deep=True)
            cloned_node.id = None
            node.replace_with(cloned_node)
Instances of transformations defined this way can be called with a (sub-)tree and an optional document that the tree originates from:
resolve_copy_of = ResolveCopyOf()
tree = resolve_copy_of(tree) # where tree is an instance of TagNode
typing.NamedTuple classes are used to define options for transformations:
from typing import NamedTuple
class ResolveChoiceOptions(NamedTuple):
    corr: bool = True
    reg: bool = True
class ResolveChoice(Transformation):
    options_class = ResolveChoiceOptions
    # options defaults to None so that the transformation can be
    # instantiated without arguments, as shown further below
    def __init__(self, options=None):
        super().__init__(options)
        self.keep_selector = ",".join(
            (
                "corr" if self.options.corr else "sic",
                "reg" if self.options.reg else "orig"
            )
        )
        self.drop_selector = ",".join(
            (
                "sic" if self.options.corr else "corr",
                "orig" if self.options.reg else "reg"
            )
        )
    def transform(self):
        for choice_node in self.root.css_select("choice"):
            node_to_drop = choice_node.css_select(self.drop_selector).first
            node_to_drop.detach()
            node_to_keep = choice_node.css_select(self.keep_selector).first
            node_to_keep.detach(retain_child_nodes=True)
            choice_node.detach(retain_child_nodes=True)
A transformation class that defines an options_class property can then be used either with its defaults or with alternate options:
resolve_choice = ResolveChoice()
tree = resolve_choice(tree)
resolve_choice = ResolveChoice(ResolveChoiceOptions(reg=False))
tree = resolve_choice(tree)
Finally, concrete transformations can be chained, either as classes or as instances. The interface also allows chaining multiple chains:
from delb.transform import TransformationSequence
tidy_up = TransformationSequence(ResolveCopyOf, resolve_choice)
tree = tidy_up(tree)
Attention
This is an experimental feature. It might change significantly in the future or be removed altogether.
- class delb.transform.Transformation(options: Optional[NamedTuple] = None)[source]¶
This is a base class for any transformation algorithm.
- abstract transform()[source]¶
This method needs to implement the transformation logic. When it is called, the instance has two attributes assigned from its call:
root is the node that the transformation was called to transform.
origin_document is the document that was possibly passed as second argument.
- class delb.transform.TransformationBase[source]¶
This base class defines the calling interface of transformations.
- class delb.transform.TransformationSequence(*transformations: Union[TransformationBase, Type[TransformationBase]])[source]¶
A transformation sequence can be used to combine any number of both Transformation (provided as class or instantiated with options) and other TransformationSequence instances or classes.
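The idea of a sequence that accepts classes as well as instances, and that can itself be nested in another sequence, can be sketched independently of delb. All names below (StepBase, Strip, Upper, StepSequence) are hypothetical, and strings stand in for trees.

```python
from typing import Type, Union


class StepBase:
    """Hypothetical stand-in for a callable transformation."""

    def __call__(self, text: str) -> str:
        raise NotImplementedError


class Strip(StepBase):
    def __call__(self, text: str) -> str:
        return text.strip()


class Upper(StepBase):
    def __call__(self, text: str) -> str:
        return text.upper()


class StepSequence(StepBase):
    def __init__(self, *steps: Union[StepBase, Type[StepBase]]):
        # Classes are instantiated with their defaults,
        # instances are used as they are given.
        self.steps = [step() if isinstance(step, type) else step for step in steps]

    def __call__(self, text: str) -> str:
        # Each step receives the result of the previous one.
        for step in self.steps:
            text = step(text)
        return text


tidy_up = StepSequence(Strip, Upper())
nested = StepSequence(tidy_up, Strip)  # a sequence is a valid step, too
```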
Various helpers¶
- delb.first(iterable: Iterable) Optional[Any] [source]¶
Returns the first item of the given iterable or None if it’s empty. Note that the first item is consumed when the iterable is an iterator.
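The consumption note can be demonstrated with a standard-library equivalent. first_sketch is a hypothetical stand-in that is assumed to behave as described above; it is not delb's code.

```python
from typing import Any, Iterable, Optional


def first_sketch(iterable: Iterable) -> Optional[Any]:
    """Returns the first item or None for an empty iterable."""
    return next(iter(iterable), None)


numbers = iter([1, 2, 3])
head = first_sketch(numbers)  # takes 1 from the iterator
remaining = list(numbers)     # the iterator continues after the first item

a_list = [1, 2, 3]
first_sketch(a_list)          # a list is not altered by the call
```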
- delb.get_traverser(from_left=True, depth_first=True, from_top=True)[source]¶
Returns a function that can be used to traverse a (sub)tree with the given node as root. While traversing, the given root node is yielded at some point.
The returned functions have this signature:
def traverser(root: NodeBase, *filters: Filter) -> Iterator[NodeBase]: ...
- Parameters
from_left – The traverser yields sibling nodes from left to right if True, or starting from the right if False.
depth_first – The child nodes or the parent node, respectively, are yielded before the siblings of a node by a traverser if True. Siblings are favored if False.
from_top – The traverser starts yielding nodes with the lowest depth if True. When False, again, the opposite is in effect.
- delb.last(iterable: Iterable) Optional[Any] [source]¶
Returns the last item of the given iterable or None if it’s empty. Note that the whole iterator is consumed when such is given.
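A standard-library sketch shows why the whole iterator is consumed: the last item can only be known after everything before it has been read. last_sketch is a hypothetical stand-in, assumed to behave as described above.

```python
from collections import deque
from typing import Any, Iterable, Optional


def last_sketch(iterable: Iterable) -> Optional[Any]:
    """Returns the last item or None for an empty iterable."""
    # A deque with maxlen=1 keeps only the most recent item while it
    # reads through the whole iterable.
    tail = deque(iterable, maxlen=1)
    return tail[0] if tail else None


letters = iter("abc")
final = last_sketch(letters)  # reads "a", "b" and "c"
leftover = list(letters)      # nothing is left in the iterator
```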
- delb.register_namespace(prefix: str, namespace: str)[source]¶
Registers a namespace prefix that newly created TagNode instances in that namespace will use in serializations.
The registry is global, and any existing mapping for either the given prefix or the namespace URI will be removed. It has, however, no effect on the serialization of existing nodes; see Document.cleanup_namespace() for that.
- Parameters
prefix – The prefix to register.
namespace – The targeted namespace.
- delb.tag(local_name: str)[source]¶
- delb.tag(local_name: str, attributes: Mapping[str, str])
- delb.tag(local_name: str, child: Union[str, NodeBase, _TagDefinition])
- delb.tag(local_name: str, children: Sequence[Union[str, NodeBase, _TagDefinition]])
- delb.tag(local_name: str, attributes: Mapping[str, str], child: Union[str, NodeBase, _TagDefinition])
- delb.tag(local_name: str, attributes: Mapping[str, str], children: Sequence[Union[str, NodeBase, _TagDefinition]])
This function can be used for in-place creation (or call it templating if you want to) of TagNode instances as:
- node argument to methods that add nodes to a tree
- items in the children argument of new_tag_node() and NodeBase.new_tag_node()
The first argument to the function is always the local name of the tag node. Optionally, the second argument can be a mapping that specifies attributes for that node. The optional last argument is either a single object that will be appended as child node or a sequence of such; these objects can be node instances of any type, strings (for derived TextNode instances) or other definitions from this function (for derived TagNode instances).
The actual nodes that are constructed always inherit the namespace of the context node they are created in.
>>> root = new_tag_node('root', children=[
...     tag("head", {"lvl": "1"}, "Hello!"),
...     tag("items", (
...         tag("item1"),
...         tag("item2"),
...         )
...     )
... ])
>>> str(root)
'<root><head lvl="1">Hello!</head><items><item1/><item2/></items></root>'
>>> root.append_children(tag("addendum"))
>>> str(root)[-26:]
'</items><addendum/></root>'
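The argument rules described above (optional attribute mapping second, optional child or sequence of children last) could be resolved roughly as follows. ToyTagDefinition and toy_tag are hypothetical names for illustration; this is an assumption about one possible dispatch, not the code behind delb.tag.

```python
from collections.abc import Mapping, Sequence
from typing import Any, Dict, NamedTuple, Tuple


class ToyTagDefinition(NamedTuple):
    """Hypothetical record of what a tag definition has to remember."""
    local_name: str
    attributes: Dict[str, str]
    children: Tuple[Any, ...]


def toy_tag(local_name: str, *args: Any) -> ToyTagDefinition:
    rest = list(args)
    attributes: Dict[str, str] = {}
    children: Tuple[Any, ...] = ()
    # A mapping in second position is taken as the attributes.
    if rest and isinstance(rest[0], Mapping):
        attributes = dict(rest.pop(0))
    if rest:
        last = rest.pop(0)
        # Strings and definitions are sequences too, but count as one child.
        if isinstance(last, Sequence) and not isinstance(last, (str, ToyTagDefinition)):
            children = tuple(last)
        else:
            children = (last,)
    if rest:
        raise TypeError("unexpected additional arguments")
    return ToyTagDefinition(local_name, attributes, children)


head = toy_tag("head", {"lvl": "1"}, "Hello!")
items = toy_tag("items", (toy_tag("item1"), toy_tag("item2")))
bare = toy_tag("addendum")
```

The explicit exclusion of strings matters because a string is itself a sequence of characters, and would otherwise be split into one child per character.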
Exceptions¶
- exception delb.exceptions.AmbiguousTreeError(message: str)[source]¶
Raised when a single node shall be fetched or created by an XPath expression in a tree where the target position can’t be clearly determined.
- with_traceback()¶
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- exception delb.exceptions.DelbBaseException[source]¶
- with_traceback()¶
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- exception delb.exceptions.FailedDocumentLoading(source: Any, excuses: Dict[Callable[[Any, SimpleNamespace], Union[_ElementTree, str]], Union[str, Exception]])[source]¶
- with_traceback()¶
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- exception delb.exceptions.InvalidCodePath[source]¶
Raised when a code path that is not expected to be executed is reached.
- with_traceback()¶
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- exception delb.exceptions.InvalidOperation[source]¶
Raised when an invalid operation is attempted by the client code.
- with_traceback()¶
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- exception delb.exceptions.XPathEvaluationError(message: str)[source]¶
- with_traceback()¶
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.