Meta-Programming in Extensible Documents
Vivien Kraus

Literate programming is a technique for writing programs in which the code supports ideas developed in a human language. Successful programs written with this technique are easy to understand, because you can grasp all the important ideas by reading the document cover to cover. It is tempting to add code evaluation to literate programming. With this common addition, the technique becomes a meta-programming technique: it lets programmers use programs to write programs, possibly in other programming languages. While many tools bring such meta-programming capabilities to literate programming, they are still rarely applied to extensible documents in the XML ecosystem. This book provides a new extension to Docbook to support meta-programming.

Keywords: Meta-Programming, Literate Programming, XML

2022 Vivien Kraus

I have not made a decision about the license of the program.

What this book is trying to do

As developers, we like to broaden our ideas about how programming should be done. Repeating the same process to design every program is boring, to the point that it seems robots would do it better than us. After all, writing programs is far from a purely scientific or engineering task: a large part of the writing process is deciding how to lay out the idea on the medium. The choice of a style, or of the technologies used to write the program, is always more a question of personal preference than of objective development cost. Lawyers worldwide seem to have noticed this too, which is why computer programs are governed by copyright law: there are different ways to express an idea, none of which is inherently better than the others, so the law controls the expression of these ideas, not the ideas themselves. Among the different ways to write a program, literate programming must be one of the most appealing.
Write a book, develop your ideas, and support them with code. I like the idea! Let’s do it. What do we need?

First, we need to write a book. A very special book, that is: it must feature text, programs, and the documentation of these programs. This is unusual for a book, so we want an extensible authoring process, one that lets authors add new elements to their books without changing the way they write them. Books are typically written in a markup language: the text is divided into elements that carry some intrinsic semantics, such as chapters. We want an extensible markup language, so that we can create new semantic elements without changing the language itself. I know of two kinds of extensible markup languages: those extended through code plug-ins for their editors, which is how you add features to Org mode through Emacs packages, for instance; and XML. For the present task, I want to use XML.

We also need to write a program. Thus, our tools should be able to collect the pieces of code scattered through the document and assemble them into a program. While it is possible to write a program that parses the document and extracts the source code, I find it much more elegant to leverage XSLT, the transformation language for XML-based markup languages.

Finally, we need to combine everything into a printable document. XSLT is the tool for that job too.

The work presented here uses its own namespace, https://labo.planete-kraus.eu/mped.git, which we will abbreviate as “mped” from now on.

Tangling pieces of code from the document

One of the most iconic features of literate programming is its ability to extract source code blocks and write them to files.
One source block to one file

The document contains program listings that support the development of its ideas. In Docbook, these are <programlisting> elements, siblings of paragraphs. Their most important attribute, “language”, identifies the programming language. However, no Docbook attribute tells the tangling program where each piece of code should end up. This is why we introduce our first extension: the “mped:tangle-to” attribute, whose value is the name of the file the listing is written to.

To tangle a document, we define an XSLT stylesheet. It reads a Docbook document and outputs a shell script that writes each piece of code to the correct file. The key template for this task is shown in the figure “Copy a specific programlisting to disk”.
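To make the “mped:tangle-to” attribute concrete, here is a minimal Python sketch of the first thing a tangler does: find every listing that carries the attribute. It is not the author’s XSLT stylesheet. It uses the standard xml.etree.ElementTree module and assumes that listings live in the DocBook 5 namespace (http://docbook.org/ns/docbook); the function name tangle_targets, the sample document and its file names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The mped URI comes from this book; the DocBook 5 URI is an assumption.
NS = {
    "db": "http://docbook.org/ns/docbook",
    "mped": "https://labo.planete-kraus.eu/mped.git",
}
TANGLE_TO = "{%s}tangle-to" % NS["mped"]  # Clark notation for mped:tangle-to


def tangle_targets(root: ET.Element) -> list[tuple[str, str]]:
    """Return (file name, language) for every listing that has mped:tangle-to."""
    targets = []
    for listing in root.iter("{%s}programlisting" % NS["db"]):
        path = listing.get(TANGLE_TO)
        if path is not None:  # listings without the attribute are not tangled
            targets.append((path, listing.get("language", "")))
    return targets


# Hypothetical sample document.
SAMPLE = """<article xmlns="http://docbook.org/ns/docbook"
         xmlns:mped="https://labo.planete-kraus.eu/mped.git">
  <para>Some prose.</para>
  <programlisting language="xml" mped:tangle-to="tangle.xsl">...</programlisting>
  <programlisting language="sh">echo not tangled</programlisting>
</article>"""

print(tangle_targets(ET.fromstring(SAMPLE)))
```

Only the first listing is reported: the second one has no “mped:tangle-to” attribute, so it stays in the book and never reaches the disk.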
Copy a specific programlisting to disk
mkdir -p $(dirname " ")
> << "_MPED_EOF"
_MPED_EOF
This template starts by creating the directory where the file should go, then fills the file with the source code. XML has precise rules for whitespace preservation, but they do not produce pretty output when we write the stylesheet with the output whitespace in mind. As a result, the tangled code is frequently badly indented and contains too many empty lines. To counter this, we pipe the code through a source formatter: a program that reads source code and indents it correctly. For XSLT, we can use xmllint from libxml2, as shown in the figure “Use xmllint as a formatter for XML languages”. The only requirement on the formatter is that it reads its input from standard input and writes the formatted code to standard output.
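The shape of the generated shell script can be sketched in Python. This illustrates the mkdir and here-document pattern of the template, not the stylesheet itself; the function name shell_for_listing is hypothetical, and using shlex.quote for the file name is an addition of this sketch. Quoting the delimiter (“_MPED_EOF”) makes the shell copy the body literally, without expanding $variables or command substitutions, which is what we want for source code.

```python
import shlex


def shell_for_listing(path: str, code: str, formatter: str = "cat") -> str:
    """Build the shell commands that write one listing to `path`.

    The here-document delimiter is quoted, so the shell copies `code`
    literally (no parameter or command substitution). This sketch assumes
    that `code` never contains a line consisting only of _MPED_EOF.
    """
    target = shlex.quote(path)  # protect spaces and shell metacharacters
    if not code.endswith("\n"):
        code += "\n"  # the closing delimiter must start on its own line
    return (
        f'mkdir -p "$(dirname {target})"\n'
        f'{formatter} > {target} << "_MPED_EOF"\n'
        f"{code}"
        "_MPED_EOF\n"
    )


print(shell_for_listing("src/hello.sh", 'echo "$HOME"'))
```

Because the formatter sits in front of the redirection, swapping cat for another formatter only changes one word of the generated command.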
Use xmllint as a formatter for XML languages
xmllint --format -
Unfortunately, xmllint rejects an XML declaration that is not at the very start of its input, so for XML listings you still need to put the listing text immediately after the <programlisting> start tag, with no whitespace in between. If no formatter applies, we fall back to cat, as shown in the figure “Use cat as the default formatter for any language”.
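The restriction is not specific to xmllint: the XML specification only allows the XML declaration at the very beginning of a document. Python’s expat-based parser shows the same failure with a single leading newline. This sketch only illustrates the failure mode; the helper name parses is hypothetical and nothing here is part of mped.

```python
import xml.etree.ElementTree as ET

LISTING = '<?xml version="1.0"?>\n<stylesheet/>'


def parses(text: str) -> bool:
    """Return True if `text` is a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(parses(LISTING))         # True: the declaration is at the very start
print(parses("\n" + LISTING))  # False: a newline precedes the declaration
```

A newline right after <programlisting> ends up at the start of the tangled file, which is exactly the second case.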
Use cat as the default formatter for any language
cat
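Choosing a formatter amounts to a lookup on the listing’s “language” attribute, with cat as the fallback. Here is a minimal Python sketch of that dispatch. The table, the function name formatter_for, and the case-insensitive matching are hypothetical; in particular, which language values (here “xml” and “xslt”) should map to xmllint is an assumption.

```python
from typing import Optional

# Hypothetical table: value of the "language" attribute -> formatter command.
# Each command reads code on standard input and writes it to standard output.
FORMATTERS = {
    "xml": "xmllint --format -",
    "xslt": "xmllint --format -",
}
DEFAULT_FORMATTER = "cat"  # copies its input unchanged


def formatter_for(language: Optional[str]) -> str:
    """Return the shell command used to format a listing in `language`."""
    if language is None:
        return DEFAULT_FORMATTER
    return FORMATTERS.get(language.lower(), DEFAULT_FORMATTER)


for lang in ("xml", "XSLT", "scheme", None):
    print(lang, "->", formatter_for(lang))
```

Supporting a new language then means adding one entry to the table, as long as its formatter follows the standard input/standard output convention.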
We also need to specify how the source code is copied. This is very simple: the template in the following figure copies the text verbatim.
Copy the source code as text
Tangling must never output anything else, so text outside the program listings must not be copied. Since XSLT copies text nodes by default, we disable text matching with the template in the following figure.
Ignore text by default when tangling
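The difference between XSLT’s default behavior and what the tangler needs can be shown with a small Python sketch: joining all the text of the document leaks the prose, while collecting text only from the listing keeps the source code verbatim. The document, the helper name listing_text, and the omission of namespaces are simplifications for the example.

```python
import xml.etree.ElementTree as ET

DOC = ET.fromstring(
    "<article>"
    "<para>Prose that must not be tangled.</para>"
    "<programlisting>x = 1\n</programlisting>"
    "</article>"
)

# What XSLT's built-in template rules would do: every text node is output.
everything = "".join(DOC.itertext())


def listing_text(listing: ET.Element) -> str:
    """Concatenate the text of a listing exactly as written."""
    return "".join(listing.itertext())


# What the tangler does: ignore text by default, copy listing text verbatim.
code = listing_text(DOC.find("programlisting"))
print(repr(everything))
print(repr(code))
```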
Paste other listings in place

Literate programming requires the author to be able to discuss bits of code in isolation, and then insert each bit into a larger one. Mped provides this operation with a new tag, “mped:copy”. Its “linkend” attribute resolves to a program listing anywhere in the document. When the tangler copies source code and meets this element, it inserts the linked listing in its place. This is done by the template in the following figure. More precisely, the template checks that exactly one program listing is a direct child of the figure with the given ID, and reports an error otherwise. This way, we refer to a listing through the figure it appears in, which makes cross-referencing easier.
Insert a listing literally into another when tangling
There are no listings directly within a figure with ID ' '.
There are multiple listings directly within a figure with ID ' '.
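The resolution rule (exactly one listing directly inside the figure with the given ID) and the recursive insertion can be sketched in Python with ElementTree. This is an illustration under assumptions, not the stylesheet: Docbook namespaces are omitted, figure IDs are read from a plain “id” attribute (DocBook 5 would use xml:id), and the names find_listing and tangle are hypothetical. The error messages mirror the ones in the stylesheet.

```python
import xml.etree.ElementTree as ET

MPED = "https://labo.planete-kraus.eu/mped.git"
COPY = "{%s}copy" % MPED


def find_listing(root: ET.Element, linkend: str) -> ET.Element:
    """Return the single programlisting directly inside the figure `linkend`."""
    listings = [
        child
        for figure in root.iter("figure")
        if figure.get("id") == linkend
        for child in figure
        if child.tag == "programlisting"
    ]
    if not listings:
        raise ValueError(f"There are no listings directly within a figure with ID '{linkend}'.")
    if len(listings) > 1:
        raise ValueError(f"There are multiple listings directly within a figure with ID '{linkend}'.")
    return listings[0]


def tangle(root: ET.Element, listing: ET.Element) -> str:
    """Return the text of `listing`, replacing each mped:copy by its target.

    A cycle of mped:copy references would recurse forever (RecursionError).
    """
    parts = [listing.text or ""]
    for child in listing:
        if child.tag == COPY:
            parts.append(tangle(root, find_listing(root, child.get("linkend"))))
        else:
            parts.append(tangle(root, child))  # other markup: keep its text
        parts.append(child.tail or "")  # text that follows the child element
    return "".join(parts)


DOC = ET.fromstring(f"""<article xmlns:mped="{MPED}">
<figure id="main"><title>Main</title><programlisting>begin
<mped:copy linkend="body"/>
end
</programlisting></figure>
<figure id="body"><title>Body</title><programlisting>  work</programlisting></figure>
</article>""")

print(tangle(DOC, find_listing(DOC, "main")))
```

The linked listing lands exactly where the mped:copy element stood, so the surrounding lines keep their order.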
Putting it all together

Together, these templates form the full stylesheet shown in the following figure.
The full stylesheet for tangling
Displaying the source code: back to Docbook!

Once the program has been tangled, the mped tags are no longer needed. We need to remove them and express their meaning in plain Docbook. For instance, the <mped:copy> tag that links to another listing must be replaced with a comment telling the reader to insert the corresponding listing at this location. To achieve this, we write a second stylesheet that runs through the document and uses a set of templates to remove the mped elements.
Clean program listings

Within program listings, each <mped:copy> tag is replaced with a comment holding the title of the linked listing. The first of the following two figures writes that comment in XML syntax; the second covers the default case, where the comment syntax of the language is unknown and no comment is inserted.
Insert a call-out comment in XML
<!-- -->
Do not insert a call-out if the language comment syntax is unknown
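The call-out rule can be sketched in Python as a function from the listing language and the linked figure’s title to the comment text: only XML gets a comment, everything else gets nothing. One detail any implementation has to handle is that an XML comment may not contain “--”; replacing it with “- -” is an assumption of this sketch, not something the stylesheet is documented to do. The function name callout is hypothetical.

```python
from typing import Optional


def callout(language: Optional[str], title: str) -> str:
    """Return the comment that replaces an mped:copy, or "" if none applies."""
    if language == "xml":
        # "--" is not allowed inside an XML comment; split it up.
        while "--" in title:
            title = title.replace("--", "- -")
        return f"<!-- {title} -->"
    return ""  # unknown comment syntax: insert nothing


print(callout("xml", "Copy the source code as text"))
print(repr(callout("scheme", "Anything")))
```

Supporting another language would mean adding a branch with that language’s comment syntax.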
Finally, everything else must be copied unchanged, which is the job of the catch-all template in the following figure.
Copy everything else without modification
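Seen as a whole, the cleaning pass is an identity transformation with one exception. Here is a minimal Python sketch of that idea with ElementTree, under the same kind of assumptions as a real stylesheet would need to settle: namespaces other than mped are omitted, figures are found by a plain “id” attribute, and the call-out is stored as listing text, so the serializer escapes it and the reader sees it in the rendered listing. The names figure_title and clean are hypothetical.

```python
import xml.etree.ElementTree as ET

MPED = "https://labo.planete-kraus.eu/mped.git"
COPY = "{%s}copy" % MPED


def figure_title(root: ET.Element, linkend: str) -> str:
    """Return the title text of the figure whose id is `linkend`."""
    for figure in root.iter("figure"):
        if figure.get("id") == linkend:
            return "".join(figure.find("title").itertext())
    raise ValueError(f"no figure with ID '{linkend}'")


def clean(root: ET.Element) -> None:
    """Replace every mped:copy inside a listing with a call-out, in place."""
    for listing in list(root.iter("programlisting")):
        for ref in [child for child in listing if child.tag == COPY]:
            title = figure_title(root, ref.get("linkend"))
            # XML listings get a comment; unknown languages get nothing.
            callout = f"<!-- {title} -->" if listing.get("language") == "xml" else ""
            text = callout + (ref.tail or "")
            # Move the replacement text onto whatever precedes the element.
            index = list(listing).index(ref)
            if index == 0:
                listing.text = (listing.text or "") + text
            else:
                previous = listing[index - 1]
                previous.tail = (previous.tail or "") + text
            listing.remove(ref)
    # Everything else is left exactly as it was: the identity part.


DOC = ET.fromstring(f"""<article xmlns:mped="{MPED}">
<figure id="part"><title>A part</title><programlisting language="xml">&lt;x/&gt;</programlisting></figure>
<figure id="whole"><title>The whole</title><programlisting language="xml">&lt;root&gt;
<mped:copy linkend="part"/>
&lt;/root&gt;</programlisting></figure>
</article>""")

clean(DOC)
print(DOC.find(".//figure[@id='whole']/programlisting").text)
```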
Putting it all together

Together, these templates form the full stylesheet shown in the following figure.
The full stylesheet to apply mped markup
Starting from this file, mped.xml, and the bootstrap tangling stylesheet, tangle-bootstrap.xsl, you can obtain a Docbook file without any mped markup as shown in the following figure.
How to convert this document
Future work

There is still a lot to do on this subject.
I want more languages supported out of the box, both for inserting comments and for formatting.
I want to be able to give listings a shebang line that is used only when tangling, and to make the tangled file executable.
I want to be able to evaluate code blocks. Evaluation results would be XML, to be included in the document. Code blocks would accept parameters as linkends, each bound to the string content of the linked element.
I want a tag similar to mped:copy, named mped:evaluate, that inserts the result of the evaluation.
I want a stylesheet that removes empty lines at the beginning and end of each listing. Care is needed so that this still works when a program listing mixes text nodes and other elements.
I want a stylesheet that runs on a whole document, performs syntax highlighting, and adds the colored syntax annotations.
I want to export the result to Texinfo, where listings should be called “listings”, not “figures”.
I want translations of this document.
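As a possible starting point for the empty-line stylesheet, here is one approach sketched in Python rather than XSLT. Only the text before the first child element and the text after the last one are trimmed, so mixed content in the middle of a listing is left alone. The function name trim_blank_lines is hypothetical, and dropping the final newline along with the trailing empty lines is a choice of this sketch.

```python
import re
import xml.etree.ElementTree as ET

LEADING_BLANK_LINES = re.compile(r"\A(?:[ \t]*\n)+")
TRAILING_BLANK_LINES = re.compile(r"(?:\n[ \t]*)+\Z")


def trim_blank_lines(listing: ET.Element) -> None:
    """Remove empty lines at the start and end of a listing, in place.

    Only the text before the first child and the text after the last
    child are modified, so elements in the middle are left untouched.
    """
    if listing.text:
        listing.text = LEADING_BLANK_LINES.sub("", listing.text)
    children = list(listing)
    if children:
        last = children[-1]
        if last.tail:
            last.tail = TRAILING_BLANK_LINES.sub("", last.tail)
    elif listing.text:
        listing.text = TRAILING_BLANK_LINES.sub("", listing.text)


listing = ET.fromstring("<programlisting>\n\n  a = 1\n<copy/>\n  b = 2\n\n\n</programlisting>")
trim_blank_lines(listing)
print(repr(listing.text), repr(listing[0].tail))
```

Because only complete blank lines are removed at the start, the indentation of the first real line (or of a child element on its own line) survives.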