Extending delb
Note
There are actually two packages that are installed with delb: delb and _delb. As the underscore indicates, the latter exposes the private parts of the API, while the former re-exposes what is deemed public from it, along with additional contents.
As a rule of thumb, use the public API in applications and the private API in delb extensions. By doing so, you can avoid circular dependencies if your extension (or other code that it depends on) uses contents from the _delb package.
delb offers a plugin system that allows Python packages to extend a few of its mechanics.
A package that extends its functionality must provide entrypoint metadata for an entrypoint group named delb that points to the modules that contain extensions. Some extensions have to be decorated with specific methods of the plugin manager object. Authors are encouraged to prefix their package names with delb- in order to increase discoverability.
These extension types are currently available:
document loaders
document mixin classes
document subclasses
XPath functions
Loaders are functions that try to make sense of any given input value, and if they can they return a parsed document.
Mixin classes add functionality / attributes to the delb.Document class (instead of inheriting from it). That allows applications to rely optionally on the availability of plugins and to combine various extensions.
Subclasses can be used to provide distinct models of arbitrary aspects for contents that are represented by a specific encoding. They can optionally implement a test method to qualify them as default class for recognized contents.
The designated means of communication between extensions is the config object: loaders receive it as their config argument, and document instances expose it as the property of the same name.
Warning
A module that contains plugins, and any module it explicitly or implicitly imports, must not import anything from the delb module itself, because that would initiate the collection of plugin implementations, and these wouldn’t have been completely registered at that point. Import from the _delb module instead.
Caution
Remember to re-install a package that is under development whenever its entrypoint specification has changed.
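To check whether a re-installed package's entrypoints are visible, one option is to list what the installed distributions declare for the delb group. The following is a minimal sketch that relies only on the standard library's importlib.metadata; the helper name delb_plugin_entry_points is made up for this illustration and is not part of delb.

```python
from importlib.metadata import entry_points


def delb_plugin_entry_points() -> list:
    """Return the entry points that installed packages declare for the "delb" group."""
    all_entry_points = entry_points()
    if hasattr(all_entry_points, "select"):
        # Python 3.10 and later offer a selection interface
        return list(all_entry_points.select(group="delb"))
    # Python 3.8 and 3.9 return a mapping of group names to entry points
    return list(all_entry_points.get("delb", []))


for entry_point in delb_plugin_entry_points():
    # each value names a module that is expected to contain extensions
    print(f"{entry_point.name} -> {entry_point.value}")
```

If an expected plugin module is missing from the listing, the package metadata is probably stale and the package should be installed again.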
There’s a repository that outlines the mechanics as developer reference: https://github.com/delb-xml/delb-py-reference-plugins
There’s also the snakesist project that implements the loader and document mixin plugin types to interact with eXist-db as storage.
Document loaders
Loaders are registered with this decorator:
_delb.plugins.plugin_manager.register_loader(before: Optional[Union[Callable[[Any, SimpleNamespace], Union[_ElementTree, str]], Iterable[Callable[[Any, SimpleNamespace], Union[_ElementTree, str]]]]] = None, after: Optional[Union[Callable[[Any, SimpleNamespace], Union[_ElementTree, str]], Iterable[Callable[[Any, SimpleNamespace], Union[_ElementTree, str]]]]] = None) -> Callable
Registers a document loader.
An example module that is specified as a delb plugin for an IPFS loader might look like this:

    from os import getenv
    from types import SimpleNamespace
    from typing import Any

    from _delb.plugins import plugin_manager
    from _delb.plugins.https_loader import https_loader
    from _delb.typing import LoaderResult


    IPFS_GATEWAY = getenv("IPFS_GATEWAY_PREFIX", "https://ipfs.io/ipfs/")


    @plugin_manager.register_loader()
    def ipfs_loader(source: Any, config: SimpleNamespace) -> LoaderResult:
        if isinstance(source, str) and source.startswith("ipfs://"):
            config.source_url = source
            config.ipfs_gateway_source_url = IPFS_GATEWAY + source[7:]
            return https_loader(config.ipfs_gateway_source_url, config)

        # return an indication why this loader didn't attempt to load in order
        # to support debugging
        return "The input value is not an URL with the ipfs scheme."

The source argument is what a Document instance is initialized with as input data.
Note that the config argument that is passed to a loader function contains configuration data; it is the delb.Document.config property after _init_config has been processed.
Loaders that retrieve a document from a URL should add the origin as a string to the config object as source_url.
You might want to specify that a loader is to be considered before or after another one. Let’s assume a loader shall figure out what to load from a remote XML resource that contains a reference to the actual document. That one would have to be considered before the one that loads XML documents from a URL with the https scheme:

    from _delb.plugins import plugin_manager
    from _delb.plugins.https_loader import https_loader
    from _delb.typing import LoaderResult  # needed for the return annotation


    @plugin_manager.register_loader(before=https_loader)
    def mets_loader(source, config) -> LoaderResult:
        # loading logic here
        pass

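The interplay of loader precedence and the "return a reason as string" convention can be pictured with a toy registry. This is only a sketch under stated assumptions, not delb's implementation: loaders, register_loader, load and both example loaders are hypothetical stand-ins, and xml.etree.ElementTree takes the place of the real parser so that the code runs with the standard library alone.

```python
import xml.etree.ElementTree as ET
from types import SimpleNamespace
from typing import Any, Callable, List, Optional, Union

# stand-in for a loader's result: a parsed tree or a string explaining the refusal
Result = Union[ET.Element, str]
Loader = Callable[[Any, SimpleNamespace], Result]

loaders: List[Loader] = []  # hypothetical registry, ordered by precedence


def register_loader(before: Optional[Loader] = None) -> Callable[[Loader], Loader]:
    def decorator(loader: Loader) -> Loader:
        if before is None:
            loaders.append(loader)
        else:
            # place the new loader right ahead of the one it must precede
            loaders.insert(loaders.index(before), loader)
        return loader

    return decorator


@register_loader()
def text_loader(source: Any, config: SimpleNamespace) -> Result:
    if isinstance(source, str) and source.lstrip().startswith("<"):
        return ET.fromstring(source)
    return "The input value is not a string that contains markup."


@register_loader(before=text_loader)
def bytes_loader(source: Any, config: SimpleNamespace) -> Result:
    if isinstance(source, bytes):
        return ET.fromstring(source.decode("utf-8"))
    return "The input value is not a bytes object."


def load(source: Any, config: SimpleNamespace) -> ET.Element:
    reasons = {}
    for loader in loaders:
        result = loader(source, config)
        if not isinstance(result, str):
            return result  # the first loader that succeeds wins
        reasons[loader.__name__] = result  # keep the refusals for debugging
    raise ValueError(f"No loader could handle the input: {reasons}")


print([loader.__name__ for loader in loaders])  # bytes_loader comes first
print(load("<root/>", SimpleNamespace()).tag)
```

Collecting the refusal messages is what makes the string return value useful: when nothing matches, the error can report why each loader declined.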
Document extensions
Document mixin classes are registered by subclassing them from this base class:
class _delb.plugins.DocumentMixinBase
By deriving a subclass from this one, a document extension class is registered as plugin. These are supposed to add additional attributes to a document, e.g. derived data or methods to interact with storage systems. All attributes of an extension should share a common prefix that terminates with an underscore, e.g. storage_load, storage_save, etc.
This base class also acts as termination for methods that can be implemented by mixin classes. Any implementation of a method must call a base class’ one, e.g.:

    from types import SimpleNamespace

    from _delb.plugins import DocumentMixinBase

    from magic_wonderland import play_disco


    class MyExtension(DocumentMixinBase):

        # this method can be implemented by any extension class
        @classmethod
        def _init_config(cls, config, kwargs):
            config.my_extension = SimpleNamespace(
                conf=kwargs.pop("my_extension_conf")
            )
            super()._init_config(config, kwargs)

        # this method is specific to this extension
        def my_extension_makes_magic(self):
            play_disco()

classmethod _init_config(config: SimpleNamespace, kwargs: Dict[str, Any])
The kwargs argument contains the additional keyword arguments that a Document instance is called with. Extension classes that expect configuration data must process their specific arguments by clearing them from the kwargs dictionary, e.g. with dict.pop(), and preferably storing the final configuration data in a types.SimpleNamespace and adding it to the types.SimpleNamespace passed as config with the extension’s name. The initially mentioned keyword arguments should be prefixed with that name as well. This method is called before the loaders try to read and parse the given source for a document.
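Why every implementation has to call the base class' method becomes visible when two extensions are combined. The sketch below imitates the pattern with plain Python classes; MixinBase, StorageExtension, CacheExtension and the keyword names are hypothetical and only illustrate cooperative super() calls, they are not taken from delb.

```python
from types import SimpleNamespace
from typing import Any, Dict


class MixinBase:
    """Hypothetical stand-in that terminates the chain of _init_config calls."""

    @classmethod
    def _init_config(cls, config: SimpleNamespace, kwargs: Dict[str, Any]) -> None:
        pass


class StorageExtension(MixinBase):
    @classmethod
    def _init_config(cls, config: SimpleNamespace, kwargs: Dict[str, Any]) -> None:
        # consume the prefixed keyword argument and store it under the extension's name
        config.storage = SimpleNamespace(url=kwargs.pop("storage_url", None))
        super()._init_config(config, kwargs)


class CacheExtension(MixinBase):
    @classmethod
    def _init_config(cls, config: SimpleNamespace, kwargs: Dict[str, Any]) -> None:
        config.cache = SimpleNamespace(size=kwargs.pop("cache_size", 0))
        super()._init_config(config, kwargs)


class CombinedDocument(StorageExtension, CacheExtension):
    """Combines both extensions; the method resolution order links their hooks."""


config = SimpleNamespace()
kwargs = {"storage_url": "https://example.org/db", "cache_size": 16, "unrelated": True}
CombinedDocument._init_config(config, kwargs)

print(config.storage.url, config.cache.size)
print(kwargs)  # only the arguments that no extension claimed are left
```

If StorageExtension omitted its super() call, CacheExtension would never see the arguments and cache_size would remain in kwargs.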
Document subclasses
Of course one can simply subclass delb.Document to add functionality.
Besides using a subclass directly, you can let delb.Document figure out which subclass is an appropriate representation of the content. Subclasses can claim that by implementing a staticmethod named __class_test__ that takes the document’s root node and the configuration and returns a boolean that indicates whether the subclass is suited. The first class to return a True value will immediately be chosen, so be aware of the possible ambiguity in complex setups. It is only ensured that subclasses are considered before others that they derive from.
Subclasses are registered by importing them into an application; they must not be pointed to by entrypoint definitions.
Here’s an example:

    import types

    from delb import Document, TagNode


    class TEIDocument(Document):
        def __init__(self, *args, **kwargs):
            # enforce whitespace collapsing for all TEI documents
            super().__init__(*args, **{**kwargs, "collapse_whitespace": True})

        @staticmethod
        def __class_test__(root: TagNode, config: types.SimpleNamespace) -> bool:
            return root.universal_name == "{http://www.tei-c.org/ns/1.0}TEI"

        @property
        def title(self) -> str:
            return self.css_select('titleStmt title[type="main"]').first.full_text


    document = Document("""\
    <?xml version="1.0" encoding="UTF-8"?>
    <TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader><fileDesc><titleStmt>
    <title type="main">The Document's Title</title>
    </titleStmt></fileDesc></teiHeader></TEI>
    """)

    if isinstance(document, TEIDocument):
        print(document.title)
    else:
        print("Sorry, I don't know how to retrieve the document's title.")

The Document's Title
The recommendations laid out for DocumentMixinBase._init_config also apply to subclasses that process configuration arguments in their __init__ method before calling the super class’ one.
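The selection rule described in this section (the first matching test wins, and subclasses are asked before the classes they derive from) can be imitated with a few lines of plain Python. This is a sketch under stated assumptions rather than delb's actual lookup: BaseDocument, TEIDraftDocument, candidates and choose_class are hypothetical, and xml.etree.ElementTree elements stand in for delb's nodes.

```python
import xml.etree.ElementTree as ET
from types import SimpleNamespace
from typing import Iterator

TEI_ROOT = "{http://www.tei-c.org/ns/1.0}TEI"


class BaseDocument:
    """Hypothetical stand-in for a document base class."""


class TEIDocument(BaseDocument):
    @staticmethod
    def __class_test__(root: ET.Element, config: SimpleNamespace) -> bool:
        return root.tag == TEI_ROOT


class TEIDraftDocument(TEIDocument):
    @staticmethod
    def __class_test__(root: ET.Element, config: SimpleNamespace) -> bool:
        return root.tag == TEI_ROOT and root.get("type") == "draft"


def candidates(cls: type) -> Iterator[type]:
    for subclass in cls.__subclasses__():
        yield from candidates(subclass)  # more specific classes come first
        yield subclass


def choose_class(root: ET.Element, config: SimpleNamespace) -> type:
    for candidate in candidates(BaseDocument):
        # only consult classes that define their own test
        if "__class_test__" in vars(candidate) and candidate.__class_test__(root, config):
            return candidate
    return BaseDocument  # nothing claimed the content


draft = ET.fromstring('<TEI xmlns="http://www.tei-c.org/ns/1.0" type="draft"/>')
print(choose_class(draft, SimpleNamespace()).__name__)
```

Both TEI classes would accept the draft document, which is exactly the ambiguity mentioned above; only the subclass-first ordering makes the more specific class win.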
XPath functions
Custom XPath functions are registered with this decorator:
_delb.plugins.PluginManager.register_xpath_function(self, arg: Union[Callable, str]) -> Callable
Custom XPath functions can be defined as shown in the following example. The first argument to a function is always an instance of _delb.xpath.EvaluationContext followed by the expression’s arguments.

    from delb import Document
    from _delb.plugins import plugin_manager
    from _delb.xpath import EvaluationContext


    @plugin_manager.register_xpath_function("is-last")
    def is_last(context: EvaluationContext) -> bool:
        return context.position == context.size


    @plugin_manager.register_xpath_function
    def lowercase(_, string: str) -> str:
        return string.lower()


    document = Document("<root><node/><node foo='BAR'/></root>")
    print(document.xpath("/*[is-last() and lowercase(@foo)='bar']").first)

<node foo="BAR"/>
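The decorator in the example is used in two ways: called with a name ("is-last") and applied bare (lowercase). A decorator that accepts either a string or a callable could be built as sketched here. This is an assumption about the general pattern, not a copy of delb's code; the xpath_functions registry and this standalone register_xpath_function are hypothetical.

```python
from types import SimpleNamespace
from typing import Callable, Dict, Union

xpath_functions: Dict[str, Callable] = {}  # hypothetical registry: name -> function


def register_xpath_function(arg: Union[Callable, str]) -> Callable:
    if isinstance(arg, str):
        # used as @register_xpath_function("name"): return the actual decorator
        def decorator(func: Callable) -> Callable:
            xpath_functions[arg] = func
            return func

        return decorator

    if callable(arg):
        # used as bare @register_xpath_function: register under the function's name
        xpath_functions[arg.__name__] = arg
        return arg

    raise TypeError("Expected a function name or a callable.")


@register_xpath_function("is-last")
def is_last(context) -> bool:
    return context.position == context.size


@register_xpath_function
def lowercase(_, string: str) -> str:
    return string.lower()


# a SimpleNamespace stands in for an evaluation context here
print(xpath_functions["is-last"](SimpleNamespace(position=2, size=2)))
print(xpath_functions["lowercase"](None, "BAR"))
```

The explicit name is needed for is-last because a hyphen is not allowed in a Python identifier, while it is perfectly fine in an XPath function name.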