Meta-Programming in Extensible Documents
Vivien Kraus
Literate Programming is a technique for writing programs in such
a way that the code supports the ideas developed in a
human language. Successful programs written with that
technique are easy to understand, because you can get all the
important ideas by reading the document cover to cover.
It is tempting to add code evaluation to the literate
programming technique. With this common addition, the
technique becomes a meta-programming technique, as it lets
programmers use programs to write programs, possibly in other
programming languages.
While many tools bring such meta-programming capabilities to
literate programming, it remains fairly uncommon to
see them applied to extensible documents in the XML
ecosystem. This book provides a new extension to Docbook to
support meta-programming.
Meta-Programming
Literate Programming
XML
2022
Vivien Kraus
I have not made a decision about the license of the program.
What this book is trying to do
As developers, we like to broaden our ideas about how
programming should be done. Repeating the same design process of
computer programs is boring, to the point that it seems robots
would do it better than us. After all, writing programs is very
far from a purely scientific or engineering task: a large part
of the writing process is deciding how to lay out the idea on
the medium. Choices of style, or of technologies to write the
program, are always more a question of personal preference than
of objective development cost.
Lawyers worldwide seem to have noticed that, too, which is why
it has been decided that computer programs would be governed by
copyright law: there are different ways to express an idea, none
of which is inherently better than the others, so the law
controls the expression of these ideas, not the ideas
themselves.
Among the different ways to write a program, literate
programming must be one of the most appealing. Write a book,
develop your ideas, and support them with code. I like the idea!
Let’s do it. What do we need?
First, we need to write a book. A very special book, that is: it
must feature text, programs, and the documentation of these
programs. This is not very typical of a book, so we want the
authoring process to be extensible: it should
let authors add elements to their books without modifying
the process they use to write them. Books are typically
written in a markup language: text is divided into elements that
carry some intrinsic semantics, such as chapters. We want an
extensible markup language, so that we can create new semantic
elements without changing the language. I know of two classes of
extensible markup languages: those where extensions are code
plug-ins for an editor of that markup language (this is how you
add features to org-mode through Emacs plug-ins, for instance),
and XML. For the present task, I want to use XML.
We also need to write a program. Thus, our markup language
should be able to gather the pieces of code scattered around the
document and assemble them into a program. While it is possible
to write a program that
would parse the document and extract the source code, I find it
far more elegant to leverage XSLT, the stylesheet and
transformation language for markup languages based on XML.
Finally, we need to combine everything into a printable
document. XSLT is the right tool for that, too.
The work presented here uses its own namespace:
https://labo.planete-kraus.eu/mped.git, which we will
abbreviate as “mped” from now on.
Tangling pieces of code from the document
One of the most iconic features of literate programming is its
ability to extract source code blocks and put them in files.
One source block to one file
The document contains program listings that support the
development of ideas. These are usually written in elements that
are siblings of paragraphs; in Docbook, they are
<programlisting> elements. Their most important attribute,
“language”, identifies the programming language.
However, there is no attribute in Docbook that tells the
tangling program where each piece of code should end up. This is
why we introduce our first extension: the “mped:tangle-to”
attribute.
To tangle a document, we define an XSLT stylesheet. It reads a
Docbook document and outputs a shell script that writes each
piece of code to the correct file. The key
template for this task is:
mkdir -p $(dirname "
")
>
<< "_MPED_EOF"
_MPED_EOF
]]>
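To make the shape of the generated script concrete, here is a minimal sketch of the same tangling step written in Python rather than XSLT. It is not the stylesheet of this book: the DocBook 5 namespace, the sample document, and the function name tangle_script are assumptions made for illustration, and the formatter is hard-wired to cat.

```python
import shlex
import xml.etree.ElementTree as ET

# Assumption: the document uses the DocBook 5 namespace.
DOCBOOK_NS = "http://docbook.org/ns/docbook"
MPED_NS = "https://labo.planete-kraus.eu/mped.git"

# Hypothetical sample document with one tangled listing.
SAMPLE = f"""<book xmlns="{DOCBOOK_NS}" xmlns:mped="{MPED_NS}">
  <para>Prose is ignored by the tangler.</para>
  <programlisting language="sh" mped:tangle-to="bin/hello.sh">echo hello
</programlisting>
</book>"""


def tangle_script(document: str) -> str:
    """Return a shell script that writes each tangled listing to its file."""
    root = ET.fromstring(document)
    parts = []
    for listing in root.iter(f"{{{DOCBOOK_NS}}}programlisting"):
        target = listing.get(f"{{{MPED_NS}}}tangle-to")
        if target is None:
            continue  # listings without mped:tangle-to are not tangled
        quoted = shlex.quote(target)
        # Nested elements such as mped:copy are not resolved in this sketch;
        # their text content is simply concatenated.
        code = "".join(listing.itertext())
        if not code.endswith("\n"):
            code += "\n"
        # The quoted heredoc delimiter prevents shell expansion in the code.
        parts.append(
            f'mkdir -p "$(dirname {quoted})"\n'
            f'cat > {quoted} << "_MPED_EOF"\n'
            f"{code}_MPED_EOF\n"
        )
    return "".join(parts)


print(tangle_script(SAMPLE))
```

As with any heredoc, a listing that contains a line consisting only of _MPED_EOF would end the block early, so that marker must not appear on a line of its own in the tangled code.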
This template starts by creating the directory where the file
should go, then fills the file with the source code. XML has
precise rules for whitespace preservation, but a document
written with its whitespace output in mind is not always the
prettiest. So, the code output is frequently not indented
correctly, and has too many empty lines. To counter this
effect, we use a source code formatter, a
program that reads source code and indents it correctly. For
XSLT, we can use xmllint from
libxml2. The important thing about
the formatter is that it should take its input from standard
input and write the formatted code to standard output.
xmllint --format -
]]>
If no formatter applies, we can fall back to
cat, which copies its input unchanged.
cat
]]>
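The choice between a real formatter and the cat fallback can be pictured as a lookup keyed on the “language” attribute. The sketch below is only an illustration in Python: the table FORMATTERS, the language values "xml" and "xslt", and the function names are hypothetical, and only the two command lines themselves come from the text above.

```python
import shlex
from typing import List, Optional

# Hypothetical table: value of the "language" attribute -> formatter command.
# Every command must read standard input and write standard output.
FORMATTERS = {
    "xml": ["xmllint", "--format", "-"],
    "xslt": ["xmllint", "--format", "-"],
}

# Fallback when no formatter applies: copy the code unchanged.
FALLBACK = ["cat"]


def formatter_for(language: Optional[str]) -> List[str]:
    """Return the argument list of the formatter for the given language."""
    return FORMATTERS.get((language or "").lower(), FALLBACK)


def formatter_command(language: Optional[str]) -> str:
    """Return the formatter as a shell command line, safely quoted."""
    return shlex.join(formatter_for(language))


for lang in ("xslt", "python", None):
    print(lang, "->", formatter_command(lang))
```

Requiring every formatter to work as a standard-input-to-standard-output filter is what makes the fallback trivial: any entry in the table can be swapped for cat without changing the rest of the generated script.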
We also need to specify how the source code is copied.
]]>
Tangling should never touch anything else, so prose text must
not be copied to the output.
]]>
Paste other listings in place
Literate programming requires the author to be able to discuss
bits of code in isolation, and then insert each bit into a
larger one. Mped provides this operation with a new element,
“mped:copy”. It has a “linkend” attribute that resolves to a
program listing anywhere in the document. When source code is
copied, matching this element inserts the linked listing
in its place.
There is no listing with ID '
'.
There are multiple listings with ID '
'.
]]>
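The resolution of “mped:copy” can be sketched outside XSLT as well. The Python below is an illustration under stated assumptions, not the stylesheet of this book: it assumes the DocBook 5 namespace and that listings are identified by xml:id, and the names find_listing and expand are hypothetical. It mirrors the two error cases of the template: no listing, or several listings, with the requested ID.

```python
import xml.etree.ElementTree as ET

# Assumptions: DocBook 5 namespace, listings identified by xml:id.
DOCBOOK_NS = "http://docbook.org/ns/docbook"
MPED_NS = "https://labo.planete-kraus.eu/mped.git"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
LISTING = f"{{{DOCBOOK_NS}}}programlisting"

# Hypothetical sample: one listing pasted into another.
SAMPLE = f"""<book xmlns="{DOCBOOK_NS}" xmlns:mped="{MPED_NS}">
  <programlisting xml:id="greeting" language="sh">echo hello</programlisting>
  <programlisting language="sh" mped:tangle-to="hello.sh">#!/bin/sh
<mped:copy linkend="greeting"/>
</programlisting>
</book>"""


def find_listing(root, linkend):
    """Return the unique program listing whose ID is linkend."""
    matches = [l for l in root.iter(LISTING) if l.get(XML_ID) == linkend]
    if not matches:
        raise LookupError(f"There is no listing with ID '{linkend}'.")
    if len(matches) > 1:
        raise LookupError(f"There are multiple listings with ID '{linkend}'.")
    return matches[0]


def expand(root, listing):
    """Return the listing's code with every mped:copy replaced in place."""
    pieces = [listing.text or ""]
    for child in listing:
        if child.tag == f"{{{MPED_NS}}}copy":
            # Recursive, so copied listings may themselves contain copies.
            # This sketch does not detect a listing that copies itself.
            linked = find_listing(root, child.get("linkend"))
            pieces.append(expand(root, linked))
        else:
            pieces.append("".join(child.itertext()))
        pieces.append(child.tail or "")
    return "".join(pieces)


root = ET.fromstring(SAMPLE)
target = next(
    l for l in root.iter(LISTING) if l.get(f"{{{MPED_NS}}}tangle-to")
)
print(expand(root, target))
```

Failing loudly on a missing or ambiguous ID, instead of silently pasting nothing or the first match, keeps a typo in “linkend” from producing a program that tangles but is incomplete.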
Putting it all together
The collection of all these templates gives the following:
]]>
]]>