Schematron validation

Lino can generate the XML of a Peppol document but is currently not able to validate it. Here is why.

All code snippets on this page (lines starting with >>>) are being tested as part of our development workflow. The following snippet initializes a demo project to use throughout this page.

>>> from lino_book.projects.cosi1.startup import *

Let’s get our latest sales invoice. (The following snippet is taken from Outbound documents; here we focus on validation.)

>>> ar = rt.login()
>>> qs = trading.VatProductInvoice.objects.filter(journal__ref="SLS")
>>> obj = qs.order_by("accounting_period__year", "number").last()
>>> obj
VatProductInvoice #177 ('SLS 15/2015')

We have an invoice and now we can call its XMLMaker.make_xml_file() method to render its XML file:

>>> xmlfile = obj.make_xml_file(ar)
Make .../cosi1/media/xml/2015/SLS-2015-15.xml from SLS 15/2015 ...

The jinja.XmlMaker.xml_validator_file attribute points to the file PEPPOL-EN16931-UBL.sch, which is an unmodified copy from https://docs.peppol.eu/poacc/billing/3.0/

>>> obj.xml_validator_file
PosixPath('.../lino_xl/lib/vat/XSD/PEPPOL-EN16931-UBL.sch')

Right now the jinja.XmlMaker.make_xml_file() method does nothing when the validator file ends with “.sch”, because we haven’t yet found a way to run Schematron validation under Python. If you look at the code, you can see that we tried lxml and saxon.

The third and most promising approach, Robbert Harms’ pyschematron package, is tested in the following snippet.

The tests in this document are skipped unless you have pyschematron installed.

>>> from importlib.util import find_spec
>>> import pytest
>>> if not find_spec('pyschematron'):
...     pytest.skip('this doctest requires pyschematron')
>>> from pyschematron import validate_document
>>> from lxml import etree
>>> result = validate_document(xmlfile, obj.xml_validator_file)
>>> result.is_valid()
True
>>> print(etree.tostring(result.get_svrl(), pretty_print=True).decode(), end='')
...
<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl" xmlns:sch="http://purl.oclc.org/dsdl/schematron" xmlns:xs="http://www.w3.org/2001/XMLSchema" schemaVersion="iso">
  <svrl:metadata xmlns:dct="http://purl.org/dc/terms/" xmlns:skos="http://www.w3.org/2004/02/skos/core#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:pysch="https://github.com/robbert-harms/pyschematron">
    <dct:creator>
      <dct:agent>
        <skos:prefLabel>PySchematron 1.1.6</skos:prefLabel>
      </dct:agent>
    </dct:creator>
    <dct:created>...</dct:created>
    <dct:source>
      <rdf:Description>
        <dct:creator>
          <dct:Agent>
            <skos:prefLabel>PySchematron 1.1.6</skos:prefLabel>
          </dct:Agent>
        </dct:creator>
        <dct:created>...</dct:created>
      </rdf:Description>
    </dct:source>
  </svrl:metadata>
</svrl:schematron-output>
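
The report above consists of a metadata block only: it contains no svrl:active-pattern, svrl:fired-rule or svrl:failed-assert elements. The following is a minimal sketch, using only the standard library, of how one might count these elements to tell "no assertion failed" apart from "no rule fired at all". The helper name summarize_svrl and both sample reports are made up for illustration and were not produced by pyschematron; only the element names come from the SVRL vocabulary.

```python
import xml.etree.ElementTree as ET

SVRL_NS = "http://purl.oclc.org/dsdl/svrl"

# SVRL elements that tell us what a validator actually did.
TRACKED = ("active-pattern", "fired-rule", "failed-assert", "successful-report")


def summarize_svrl(svrl_xml):
    """Return a dict mapping each tracked SVRL element name to its count.

    Hypothetical helper, not part of Lino or pyschematron.
    Accepts the serialized report as str or bytes.
    """
    root = ET.fromstring(svrl_xml)
    return {
        name: len(root.findall(f".//{{{SVRL_NS}}}{name}"))
        for name in TRACKED
    }


# Hand-written sample: one rule fired and one assertion failed.
SAMPLE = """<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl">
  <svrl:active-pattern id="demo"/>
  <svrl:fired-rule context="/Invoice"/>
  <svrl:failed-assert test="ID" location="/Invoice">
    <svrl:text>An invoice must have an ID.</svrl:text>
  </svrl:failed-assert>
</svrl:schematron-output>"""

# Hand-written sample: a report in which nothing fired.
EMPTY = """<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl"/>"""

print(summarize_svrl(SAMPLE))
print(summarize_svrl(EMPTY))
```

The bytes returned by etree.tostring(result.get_svrl()) in the snippet above could be passed to such a helper. A report with zero fired rules has no failed assertions either, so a positive validity result alone does not show that the Peppol rules were evaluated.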

We agree with Robbert when he writes in his README file: “In the future we hope to expand this library with an XSLT transformation based processing. Unfortunately XSLT transformations require an XSLT processor, which is currently not available in Python for XSLT >= 2.0.”
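
Whether a given Schematron file depends on XSLT 2.0 or later can be read from the queryBinding attribute of its root element. The following is a minimal sketch, using only the standard library. The function names and the sample schema are hypothetical; the sample is not taken from the Peppol rules, and which binding PEPPOL-EN16931-UBL.sch declares is something to check by running the function on that file.

```python
import xml.etree.ElementTree as ET

SCH_NS = "http://purl.oclc.org/dsdl/schematron"


def query_binding(schema_xml):
    """Return the query language binding declared by a Schematron schema.

    Hypothetical helper. ISO Schematron defaults to "xslt" (XSLT 1.0)
    when the attribute is omitted.
    """
    root = ET.fromstring(schema_xml)
    if root.tag != f"{{{SCH_NS}}}schema":
        raise ValueError("not an ISO Schematron schema")
    return root.get("queryBinding", "xslt").lower()


def needs_xslt2_or_later(schema_xml):
    """True if the schema asks for an XSLT 2.0 or 3.0 processor."""
    return query_binding(schema_xml) in {"xslt2", "xslt3"}


# Made-up schema using matches(), an XPath 2.0 function.
SAMPLE = """<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <pattern>
    <rule context="/Invoice">
      <assert test="matches(ID, '^[A-Z]+ [0-9]+$')">ID must look like 'SLS 15'.</assert>
    </rule>
  </pattern>
</schema>"""

# Made-up schema without a queryBinding attribute.
PLAIN = """<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern>
    <rule context="/Invoice">
      <assert test="ID">An invoice must have an ID.</assert>
    </rule>
  </pattern>
</schema>"""

print(query_binding(SAMPLE), needs_xslt2_or_later(SAMPLE))
print(query_binding(PLAIN), needs_xslt2_or_later(PLAIN))
```

To try this on a real file, one would pass the result of Path(...).read_bytes() for that file.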

We are not the only ones who would like to validate XML using Schematron in Python without needing a Java runtime.