
Schematron validation

Lino can generate the XML of a Peppol document but is currently not able to validate it. Here is why.

Note

Code snippets in this document (lines starting with >>>) get tested as part of our development workflow. The following initialization snippet tells you which demo project is being used.

>>> from lino_book.projects.cosi1.startup import *

Let’s get our latest sales invoice and call XMLMaker.make_xml_file() on it. (The following snippet is taken from Outbound documents, but here we focus on validation.)

>>> ar = rt.login()
>>> qs = trading.VatProductInvoice.objects.filter(journal__ref="SLS")
>>> obj = qs.order_by("accounting_period__year", "number").last()
>>> obj
VatProductInvoice #177 ('SLS 15/2015')

Now that we have an invoice, we can call its XMLMaker.make_xml_file() method to render its XML file and then validate it:

>>> xmlfile, url = obj.make_xml_file(ar)
Make .../cosi1/media/xml/2015/SLS-177.xml from SLS 15/2015 ...
Validate SLS-177.xml against .../lino_xl/lib/vat/XSD/PEPPOL-EN16931-UBL.sch ...

We can see that the jinja.XmlMaker.xml_validator_file() uses the file PEPPOL-EN16931-UBL.sch, which is an unmodified copy from https://docs.peppol.eu/poacc/billing/3.0/

The logger message is a bit misleading: right now the jinja.XmlMaker.make_xml_file() method does nothing when the validator file ends with “.sch”. This is because we have not yet found a way to run Schematron validation under Python. If you look at the code, you can see that we tried lxml and saxon.

The third and most promising method is tested in the following snippet. It is Robbert Harms’ pyschematron package.

>>> from importlib.util import find_spec
>>> if not find_spec('pyschematron'):
...     pytest.skip('this doctest requires pyschematron')
>>> from pyschematron import validate_document
>>> from lxml import etree
>>> result = validate_document(xmlfile, obj.xml_validator_file)
>>> result.is_valid()
True
>>> print(etree.tostring(result.get_svrl(), pretty_print=True).decode(), end='')
...
<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl" xmlns:sch="http://purl.oclc.org/dsdl/schematron" xmlns:xs="http://www.w3.org/2001/XMLSchema" schemaVersion="iso">
  <svrl:metadata xmlns:dct="http://purl.org/dc/terms/" xmlns:skos="http://www.w3.org/2004/02/skos/core#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:pysch="https://github.com/robbert-harms/pyschematron">
    <dct:creator>
      <dct:agent>
        <skos:prefLabel>PySchematron 1.1.6</skos:prefLabel>
      </dct:agent>
    </dct:creator>
    <dct:created>...</dct:created>
    <dct:source>
      <rdf:Description>
        <dct:creator>
          <dct:Agent>
            <skos:prefLabel>PySchematron 1.1.6</skos:prefLabel>
          </dct:Agent>
        </dct:creator>
        <dct:created>...</dct:created>
      </rdf:Description>
    </dct:source>
  </svrl:metadata>
</svrl:schematron-output>
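
An SVRL report lists rule violations as svrl:failed-assert elements, each with a svrl:text child holding the message. The following is a minimal sketch, using only the standard library, of how such a report could be summarized. The helper name failed_assertions and the sample report are made up for illustration; they are not part of Lino or pyschematron.

```python
import xml.etree.ElementTree as ET

# Namespace of the Schematron Validation Report Language (SVRL).
SVRL_NS = {"svrl": "http://purl.oclc.org/dsdl/svrl"}

# Hypothetical sample report, hand-written for this sketch.
SAMPLE_SVRL = """\
<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl">
  <svrl:failed-assert id="EXAMPLE-01" location="/Invoice" test="cbc:ID">
    <svrl:text>An invoice must have a number.</svrl:text>
  </svrl:failed-assert>
</svrl:schematron-output>
"""


def failed_assertions(svrl_text):
    """Return one (id, location, message) tuple per failed assertion."""
    root = ET.fromstring(svrl_text)
    found = []
    for fa in root.iterfind(".//svrl:failed-assert", SVRL_NS):
        message = fa.findtext("svrl:text", default="", namespaces=SVRL_NS)
        # Collapse whitespace so that multi-line messages fit on one line.
        found.append((fa.get("id"), fa.get("location"), " ".join(message.split())))
    return found


for rule_id, location, message in failed_assertions(SAMPLE_SVRL):
    print(f"{rule_id} at {location}: {message}")
```

The function takes serialized XML (text or bytes), so the output of etree.tostring() in the snippet above could be passed to it directly.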

We agree with Robbert when he writes in his README file: “In the future we hope to expand this library with an XSLT transformation based processing. Unfortunately XSLT transformations require an XSLT processor, which is currently not available in Python for XSLT >= 2.0.”
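
A Schematron schema declares the query language it relies on in the queryBinding attribute of its root element, so reading that attribute shows whether an XSLT 1.0 processor could be enough. The following is a minimal sketch using only the standard library. The helper name query_binding and the sample schema are assumptions made for illustration, not part of Lino or pyschematron.

```python
import xml.etree.ElementTree as ET

# Namespace of ISO Schematron schemas.
SCH_NS = "http://purl.oclc.org/dsdl/schematron"

# Hypothetical sample schema, hand-written for this sketch.
SAMPLE_SCHEMA = """\
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <pattern>
    <rule context="/*">
      <assert test="true()">Always passes.</assert>
    </rule>
  </pattern>
</schema>
"""


def query_binding(schema_text):
    """Return the query binding declared by a Schematron schema, in lowercase."""
    root = ET.fromstring(schema_text)
    if root.tag != f"{{{SCH_NS}}}schema":
        raise ValueError("not an ISO Schematron schema")
    # Assumption: a missing queryBinding attribute is read as "xslt" (XSLT 1.0),
    # which is our understanding of the ISO Schematron default.
    return root.get("queryBinding", "xslt").lower()


print(query_binding(SAMPLE_SCHEMA))
```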

There are other people who would like to validate XML using Schematron in Python without Java.