API Documentation¶
Note
There are actually two packages that are installed with delb: delb and _delb. As the underscore indicates, the latter exposes the private parts of the API, while the former re-exposes what is deemed to be public from it along with additional contents.
As a rule of thumb, use the public API in applications and the private API in delb extensions. By doing so, you can avoid circular dependencies if your extension (or other code that it depends on) uses contents from the _delb package.
Documents¶
Document loaders¶
If you want or need to manipulate which loaders are available, or the order in which they are attempted, you can change the delb.plugins.plugin_manager.plugins.loaders object, which is a list. Its state is reflected in your whole application. Please refer to this issue when you require finer controls over these aspects.
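As a minimal sketch of such a manipulation, the following works on a plain list of stand-in functions rather than on delb itself. The loader names only mirror the core loaders listed below, and disable and prefer are hypothetical helpers that are not part of delb. In an application, the list to change would be delb.plugins.plugin_manager.plugins.loaders, assuming it holds the loader callables.

```python
from typing import Callable, List


# Stand-ins for loader callables; they do nothing and only carry a name.
def buffer_loader(data, config): ...
def ftp_loader(data, config): ...
def path_loader(data, config): ...
def text_loader(data, config): ...


# Stand-in for delb.plugins.plugin_manager.plugins.loaders (assumption).
loaders: List[Callable] = [buffer_loader, ftp_loader, path_loader, text_loader]


def disable(loaders: List[Callable], name: str) -> None:
    """Remove the named loader in place (hypothetical helper)."""
    loaders[:] = [x for x in loaders if x.__name__ != name]


def prefer(loaders: List[Callable], name: str) -> None:
    """Move the named loader to the front so it is attempted first
    (hypothetical helper)."""
    for index, loader in enumerate(loaders):
        if loader.__name__ == name:
            loaders.insert(0, loaders.pop(index))
            return
    raise ValueError(f"no loader named {name!r}")


disable(loaders, "ftp_loader")
prefer(loaders, "text_loader")
order = [x.__name__ for x in loaders]
print(order)
```

Both helpers mutate the list in place instead of rebinding the name. Since the list's state is shared across the whole application, that is what makes a change visible everywhere.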
Core¶
The core_loaders module provides a set of loaders to retrieve documents from various data sources.
- _delb.plugins.core_loaders.buffer_loader(data: Any, config: SimpleNamespace) _delb.typing.LoaderResult [source]¶
This loader loads a document from a file-like object.
- _delb.plugins.core_loaders.etree_loader(data: Any, config: SimpleNamespace) _delb.typing.LoaderResult [source]¶
This loader processes lxml.etree._Element and lxml.etree._ElementTree instances.
- _delb.plugins.core_loaders.ftp_loader(data: Any, config: SimpleNamespace) _delb.typing.LoaderResult [source]¶
Loads a document from a URL with the ftp scheme. The URL will be bound to source_url on the document’s Document.config attribute.
- _delb.plugins.core_loaders.path_loader(data: Any, config: SimpleNamespace) _delb.typing.LoaderResult [source]¶
This loader loads from a file that is pointed at with a pathlib.Path instance. That instance will be bound to source_path on the document’s Document.config attribute.
- _delb.plugins.core_loaders.tag_node_loader(data: Any, config: SimpleNamespace) _delb.typing.LoaderResult [source]¶
This loader loads, or rather clones, a delb.TagNode instance and its descendant nodes.
- _delb.plugins.core_loaders.text_loader(data: Any, config: SimpleNamespace) _delb.typing.LoaderResult [source]¶
Parses a string containing a full document.
Extra¶
Parser options¶
Nodes¶
Comment¶
Processing instruction¶
Tag¶
Tag attribute¶
Text¶
Node constructors¶
Queries with XPath & CSS¶
delb allows querying of nodes with CSS selector and XPath expressions. CSS selectors are converted to XPath expressions with a third-party library before evaluation and they are only supported as far as their computed XPath equivalents are supported by delb’s very own XPath implementation.
This implementation is not fully compliant with any of the W3C’s XPath specifications. It mostly covers the XPath 1.0 specification, but focuses on querying via path expressions with simple constraints and omits broad support for computations (that’s what programming languages are for). It therefore has these intended deviations from that standard:
- Default namespaces can be addressed in node and attribute names, by simply using no prefix.
- The attribute and namespace axes are not supported in location steps (see also below).
- In predicates only the attribute axis can be used in its abbreviated form (@name).
- Path evaluations within predicates are not available.
- Only these predicate functions are provided and tested:
boolean
concat
contains
last
not
position
starts-with
text
Behaves as if deployed as a single step location path that only tests for the node type text. Hence it returns the contents of the context node’s first child node that is a text node or an empty string when there is none.
Please refrain from extension requests without a proper, concrete implementation proposal.
If you’re accustomed to retrieving attribute values with XPath expressions, employ the functionality of the higher-level programming language at hand like this:
>>> [x.attributes["target"] for x in root.xpath(".//foo")
... if "target" in x.attributes ]
Instead of:
>>> root.xpath(".//foo/@target")
See _delb.plugins.PluginManager.register_xpath_function() regarding the use of custom functions.
- class _delb.xpath.EvaluationContext(node: NodeBase, position: int, size: int, namespaces: Namespaces)[source]¶
Instances of this type are passed to XPath functions in order to pass contextual information.
- count(value, /)¶
Return number of occurrences of value.
- index(value, start=0, stop=9223372036854775807, /)¶
Return first index of value.
Raises ValueError if the value is not present.
- property namespaces¶
A mapping of prefixes to namespaces that is used in the whole evaluation.
- property node¶
The node that is evaluated.
- property position¶
The node’s position within all nodes that matched a location step’s node test in order of the step’s axis’ direction. The first position is 1.
- property size¶
The number of all nodes that matched a location step’s node test.
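To illustrate how a function might use these properties, here is a minimal sketch built on a stand-in type, not on delb's actual class. The count and index methods listed above suggest a tuple-like type, so the stand-in is a NamedTuple; that is an assumption. is_last is a hypothetical function, and its registration via register_xpath_function() is left out because that signature is not documented in this section.

```python
from typing import Any, Mapping, NamedTuple


class EvaluationContext(NamedTuple):
    """Stand-in with the four documented fields; not delb's actual class."""

    node: Any
    position: int
    size: int
    namespaces: Mapping


def is_last(context: EvaluationContext) -> bool:
    """True when the node is the last one that matched the location step
    (hypothetical function)."""
    return context.position == context.size


# Three stand-in nodes matched by one location step; positions start at 1.
matched = ["a", "b", "c"]
contexts = [
    EvaluationContext(node, position, len(matched), {})
    for position, node in enumerate(matched, start=1)
]
flags = [is_last(context) for context in contexts]
print(flags)
```

Because position is 1-based, comparing it directly with size identifies the last matched node without any offset arithmetic.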
- class _delb.xpath.QueryResults(results: Iterable[NodeBase])[source]¶
A container that includes the results of a CSS selector or XPath query with some helpers for better readable Python expressions.
- count(value) integer -- return number of occurrences of value ¶
- filtered_by(*filters: _delb.typing.Filter) QueryResults [source]¶
Returns another QueryResults instance that contains all nodes filtered by the provided filters.
- property first: Optional[NodeBase]¶
The first node from the results or None if there are none.
- in_document_order() QueryResults [source]¶
Returns another QueryResults instance where the contained nodes are sorted in document order.
- index(value[, start[, stop]]) integer -- return first index of value. ¶
Raises ValueError if the value is not present.
Supporting start and stop arguments is optional, but recommended.
- property last: Optional[NodeBase]¶
The last node from the results or None if there are none.
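The helpers documented for QueryResults can be pictured with a small stand-in container. This is a sketch under assumptions, not delb's implementation: Results is a hypothetical class, plain strings stand in for nodes, a filter is assumed to be a callable that takes a node and returns a boolean, and a node is assumed to be kept only when every filter accepts it.

```python
from typing import Callable, Iterable, Optional, Tuple

# Assumption: a filter is a predicate over a node (strings stand in for nodes).
Filter = Callable[[str], bool]


class Results:
    """Hypothetical stand-in that mirrors the documented helpers."""

    def __init__(self, results: Iterable[str]) -> None:
        self._items: Tuple[str, ...] = tuple(results)

    def __len__(self) -> int:
        return len(self._items)

    def filtered_by(self, *filters: Filter) -> "Results":
        """Return a new container with the nodes that pass every filter."""
        return Results(
            item for item in self._items if all(f(item) for f in filters)
        )

    @property
    def first(self) -> Optional[str]:
        """The first node, or None if the container is empty."""
        return self._items[0] if self._items else None

    @property
    def last(self) -> Optional[str]:
        """The last node, or None if the container is empty."""
        return self._items[-1] if self._items else None


results = Results(["head", "body", "div", "p"])
selected = results.filtered_by(lambda x: "d" in x, lambda x: len(x) == 4)
empty = results.filtered_by(lambda x: False)
print(selected.first, selected.last, empty.first)
```

Returning None from first and last for an empty result lets calling code test for a match with a simple "is None" check instead of catching an IndexError.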