API Documentation

Note

Two packages are installed with delb: delb and _delb. As the underscore indicates, the latter exposes private parts of the API, while the former re-exposes what is deemed public from it, plus additional contents. As a rule of thumb, use the public API in applications and the private API in delb extensions. Doing so avoids circular dependencies if your extension (or other code that it depends on) uses contents from the _delb package.

Documents

Document loaders

If you want or need to manipulate the availability of loaders or the order in which they are attempted, you can change the delb.plugins.plugin_manager.plugins.loaders object, which is a list. Its state is reflected in your whole application. Please refer to this issue when you require finer controls over these aspects.
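The following is a minimal sketch of what reordering and removing entries in such a list means for loading. It does not import delb: the list `loaders`, the two loader functions, the `load` helper and the convention that a string result signals a failed attempt are all hypothetical stand-ins, chosen only to illustrate the idea.

```python
from types import SimpleNamespace
from typing import Any, Callable, List, Union

# Assumption: a loader returns either a loaded document (a dict here)
# or a string explaining why it could not handle the input.
LoaderResult = Union[dict, str]
Loader = Callable[[Any, SimpleNamespace], LoaderResult]


def text_loader(data: Any, config: SimpleNamespace) -> LoaderResult:
    if isinstance(data, str):
        return {"loaded_by": "text_loader", "content": data}
    return "The input value is not a string."


def bytes_loader(data: Any, config: SimpleNamespace) -> LoaderResult:
    if isinstance(data, bytes):
        return {"loaded_by": "bytes_loader", "content": data.decode()}
    return "The input value is not a bytes object."


# Hypothetical stand-in for delb.plugins.plugin_manager.plugins.loaders.
loaders: List[Loader] = [text_loader, bytes_loader]


def load(data: Any, candidates: List[Loader]) -> dict:
    """Try each loader in list order and return the first successful result."""
    config = SimpleNamespace()
    failures = {}
    for loader in candidates:
        result = loader(data, config)
        if not isinstance(result, str):
            return result
        failures[loader.__name__] = result
    raise ValueError(f"No loader accepted the input: {failures}")


# Reorder: move a loader to the front so that it is attempted first.
loaders.insert(0, loaders.pop(loaders.index(bytes_loader)))

# Restrict availability: remove a loader entirely.
loaders.remove(text_loader)
```

Because the list is mutated in place, every later call that consults it sees the change, which is the application-wide effect described above.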

Core

The core_loaders module provides a set of loaders to retrieve documents from various data sources.

_delb.plugins.core_loaders.buffer_loader(data: Any, config: SimpleNamespace) -> _delb.typing.LoaderResult

This loader loads a document from a file-like object.

_delb.plugins.core_loaders.etree_loader(data: Any, config: SimpleNamespace) -> _delb.typing.LoaderResult

This loader processes lxml.etree._Element and lxml.etree._ElementTree instances.

_delb.plugins.core_loaders.ftp_loader(data: Any, config: SimpleNamespace) -> _delb.typing.LoaderResult

Loads a document from a URL with the ftp scheme. The URL will be bound to source_url on the document’s Document.config attribute.

_delb.plugins.core_loaders.path_loader(data: Any, config: SimpleNamespace) -> _delb.typing.LoaderResult

This loader loads a document from a file that a pathlib.Path instance points to. That instance will be bound to source_path on the document’s Document.config attribute.
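To illustrate the (data, config) signature and the binding of source_path, here is a sketch of a path-based loader written against the standard library only. The function `sketch_path_loader` is hypothetical and is not delb's implementation; it parses with xml.etree.ElementTree, and the string return value for unsuitable input is an assumption.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path
from types import SimpleNamespace
from typing import Any, Union


def sketch_path_loader(
    data: Any, config: SimpleNamespace
) -> Union[ET.ElementTree, str]:
    """Hypothetical loader that accepts only pathlib.Path instances."""
    if not isinstance(data, Path):
        # Assumption: a string signals that this loader does not apply.
        return "The input value is not a pathlib.Path instance."
    tree = ET.parse(data)
    # Record where the document came from on the shared configuration.
    config.source_path = data
    return tree


with tempfile.TemporaryDirectory() as directory:
    path = Path(directory) / "example.xml"
    path.write_text("<root><item/></root>", encoding="utf-8")

    config = SimpleNamespace()
    result = sketch_path_loader(path, config)
    rejected = sketch_path_loader("<root/>", SimpleNamespace())
```

After the call, `config.source_path` holds the same Path object that was passed in, so later code can tell where the document originated.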

_delb.plugins.core_loaders.tag_node_loader(data: Any, config: SimpleNamespace) -> _delb.typing.LoaderResult

This loader loads, or rather clones, a delb.TagNode instance and its descendant nodes.

_delb.plugins.core_loaders.text_loader(data: Any, config: SimpleNamespace) -> _delb.typing.LoaderResult

Parses a string containing a full document.

Extra

Parser options

Nodes

Comment

Processing instruction

Tag

Tag attribute

Text

Node constructors

Queries with XPath & CSS

delb allows querying of nodes with CSS selectors and XPath expressions. CSS selectors are converted to XPath expressions with a third-party library before evaluation, and they are only supported as far as their computed XPath equivalents are supported by delb’s own XPath implementation.

This implementation is not fully compliant with any of the W3C’s XPath specifications. It mostly covers the XPath 1.0 specification, but focuses on querying via path expressions with simple constraints and omits most computations (that’s what programming languages are for). It therefore has these intended deviations from that standard:

  • Default namespaces can be addressed in node and attribute names, by simply using no prefix.

  • The attribute and namespace axes are not supported in location steps (see also below).

  • In predicates only the attribute axis can be used in its abbreviated form (@name).

  • Path evaluations within predicates are not available.

  • Only these predicate functions are provided and tested:
    • boolean

    • concat

    • contains

    • last

    • not

    • position

    • starts-with

    • text
      • Behaves as if deployed as a single step location path that only tests for the node type text. Hence it returns the contents of the context node’s first child node that is a text node or an empty string when there is none.

    • Please refrain from extension requests without a proper, concrete implementation proposal.
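The described behaviour of text() can be sketched with the standard library's xml.dom.minidom. The helper `first_text_child` is a hypothetical illustration, not part of delb: it returns the content of the first child node that is a text node, or an empty string when there is none.

```python
from xml.dom import Node, minidom


def first_text_child(element) -> str:
    """Return the content of the first child text node, else an empty string."""
    for child in element.childNodes:
        if child.nodeType == Node.TEXT_NODE:
            return child.data
    return ""


document = minidom.parseString("<p><b>bold</b>tail<i/>more</p>")
paragraph = document.documentElement
bold = paragraph.firstChild
italic = paragraph.getElementsByTagName("i")[0]

paragraph_text = first_text_child(paragraph)  # skips <b>, stops at "tail"
bold_text = first_text_child(bold)
italic_text = first_text_child(italic)  # <i/> has no text child
```

Note that only direct children are considered, so the text inside the b element does not contribute to the result for the p element, and later text nodes such as "more" are ignored.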

If you’re accustomed to retrieving attribute values with XPath expressions, employ the functionality of the higher-level programming language at hand like this:

>>> [x.attributes["target"] for x in root.xpath(".//foo")
...  if "target" in x.attributes]

Instead of:

>>> root.xpath(".//foo/@target")  

See _delb.plugins.PluginManager.register_xpath_function() regarding the use of custom functions.

class _delb.xpath.EvaluationContext(node: NodeBase, position: int, size: int, namespaces: Namespaces)

Instances of this type are passed to XPath functions in order to pass contextual information.

count(value, /)

Return number of occurrences of value.

index(value, start=0, stop=9223372036854775807, /)

Return first index of value.

Raises ValueError if the value is not present.

property namespaces

A mapping of prefixes to namespaces that is used in the whole evaluation.

property node

The node that is evaluated.

property position

The node’s position within all nodes that matched a location step’s node test in order of the step’s axis’ direction. The first position is 1.

property size

The number of all nodes that matched a location step’s node test.
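As a rough illustration of how a function might use these fields, the sketch below defines a stand-in context type and a predicate-like function. `SketchContext` and `is_last` are hypothetical names; the named-tuple shape is an inference from the tuple-style count and index members listed above, not a statement about delb's internals.

```python
from typing import Any, Mapping, NamedTuple


class SketchContext(NamedTuple):
    """Hypothetical stand-in that mirrors the four documented fields."""

    node: Any
    position: int
    size: int
    namespaces: Mapping[str, str]


def is_last(context: SketchContext) -> bool:
    """Hypothetical XPath-style function: true for the final matched node."""
    return context.position == context.size


matched = ["a", "b", "c"]  # placeholders for nodes that passed a node test
contexts = [
    SketchContext(node=node, position=position, size=len(matched), namespaces={})
    for position, node in enumerate(matched, start=1)  # positions start at 1
]
selected = [context.node for context in contexts if is_last(context)]
```

The comparison of position and size is the same idea that an expression such as position() = last() expresses in XPath 1.0.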

class _delb.xpath.QueryResults(results: Iterable[NodeBase])

A container that includes the results of a CSS selector or XPath query with some helpers for better readable Python expressions.

as_list() -> List[NodeBase]

The contained nodes as a new list.

property as_tuple: Tuple[NodeBase, ...]

The contained nodes in a tuple.

count(value)

Returns the number of occurrences of value.

filtered_by(*filters: _delb.typing.Filter) -> QueryResults

Returns another QueryResults instance that contains all nodes filtered by the provided filters.

property first: Optional[NodeBase]

The first node from the results or None if there are none.

in_document_order() -> QueryResults

Returns another QueryResults instance where the contained nodes are sorted in document order.

index(value[, start[, stop]])

Returns the first index of value.

Raises ValueError if the value is not present.

Supporting start and stop arguments is optional, but recommended.

property last: Optional[NodeBase]

The last node from the results or None if there are none.

property size: int

The number of contained nodes.
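The helpers above can be pictured with a small stand-in container. `SketchResults` is hypothetical and holds strings instead of nodes; it is not delb's QueryResults. Two assumptions are made: the container behaves like a read-only sequence (which would explain the count and index members), and filtered_by keeps only the items that every given filter accepts.

```python
from collections.abc import Sequence
from typing import Callable, Iterable, Optional, Tuple


class SketchResults(Sequence):
    """Hypothetical stand-in for a query result container."""

    def __init__(self, results: Iterable[str]) -> None:
        self._items: Tuple[str, ...] = tuple(results)

    def __getitem__(self, index):
        return self._items[index]

    def __len__(self) -> int:
        return len(self._items)

    @property
    def first(self) -> Optional[str]:
        return self._items[0] if self._items else None

    @property
    def last(self) -> Optional[str]:
        return self._items[-1] if self._items else None

    @property
    def size(self) -> int:
        return len(self._items)

    def filtered_by(self, *filters: Callable[[str], bool]) -> "SketchResults":
        """Keep only the items that every given filter accepts (assumption)."""
        return SketchResults(
            item for item in self._items if all(f(item) for f in filters)
        )


results = SketchResults(["head", "body", "header", "footer"])
headers = results.filtered_by(lambda name: name.startswith("head"))
empty = results.filtered_by(lambda name: name == "missing")
```

Returning None from first and last for an empty container lets calling code test a query result without catching an IndexError.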

Filters

Default filters

Contributed filters

Transformations

Various helpers

Exceptions