<?xml version="1.0" encoding="utf-8"?>
<book xmlns="http://docbook.org/ns/docbook"
      xmlns:xi="http://www.w3.org/2001/XInclude"
      xmlns:mped="https://labo.planete-kraus.eu/mped.git" version="5.0">
  <info>
    <title>Meta-Programming in Extensible Documents</title>
    <author>
      <personname>Vivien Kraus</personname>
    </author>
    <abstract>
      <para>
        Literate Programming is a technique for writing programs in
        which the code supports ideas developed in a human
        language. Successful programs written with that technique are
        easy to understand, because reading the document cover to cover
        conveys all the important ideas.
      </para>
      <para>
        It is tempting to add code evaluation to the literate
        programming technique. With this common addition, the
        technique becomes a meta-programming technique, as it lets
        programmers use programs to write programs, possibly in other
        programming languages.
      </para>
      <para>
        While many tools bring such meta-programming capabilities to
        literate programming, it remains fairly uncommon to apply them
        to extensible documents in the XML ecosystem. This book
        provides a new extension to Docbook to support
        meta-programming.
      </para>
    </abstract>
    <keywordset>
      <keyword>Meta-Programming</keyword>
      <keyword>Literate Programming</keyword>
      <keyword>XML</keyword>
    </keywordset>
    <copyright>
      <year>2022</year>
      <holder>Vivien Kraus</holder>
    </copyright>
    <legalnotice>
      <para>
        I have not made a decision about the license of the program.
      </para>
    </legalnotice>
  </info>
  <preface>
    <title>What this book is trying to do</title>
    <para>
      As developers, we like to broaden our ideas about how
      programming should be done. Repeating the same design process
      for computer programs over and over is boring, to the point that
      it seems robots would do it better than us. After all, writing
      programs is far from a purely scientific or engineering task: a
      large part of the writing process is deciding how to lay out the
      idea on the medium. Choices of style, or of the technologies used
      to write the program, are always more a matter of personal
      preference than of objective development cost.
    </para>
    <para>
      Lawyers worldwide seem to have noticed this too, which is why
      computer programs fall under copyright law: there are different
      ways to express an idea, none of which is inherently better than
      the others, so the law protects the expression of these ideas,
      not the ideas themselves.
    </para>
    <para>
      Among the different ways to write a program, literate
      programming must be one of the most appealing. Write a book,
      develop your ideas, and support them with code. I like the idea!
      Let’s do it. What do we need?
    </para>
    <para>
      First, we need to write a book. A very special book, that is: it
      must feature text, programs, and the documentation of these
      programs. This is not typical of a book, so we want the
      authoring process to be <emphasis>extensible</emphasis>: it
      should let authors add new kinds of elements to their books
      without modifying the process they use to write them. Books are
      typically written in a markup language, where text is divided
      into elements that carry some intrinsic semantics, such as
      chapters. We want an extensible markup language, so that we can
      create new semantic elements without changing the language
      itself. I know of two classes of extensible markup languages:
      those whose extensions are code plug-ins for their editors, which
      is how org-mode gains new features through Emacs packages, for
      instance, and XML. For the present task, I want to use XML.
    </para>
    <para>
      We also need to write a program. Thus, our markup language
      should let us gather the pieces of code scattered around the
      document and assemble them into a program. While it is possible
      to write a program that parses the document and extracts the
      source code, I find it far more elegant to leverage XSLT, the
      stylesheet and transformation language for XML-based markup
      languages.
    </para>
    <para>
      Finally, we need to combine everything into a printable
      document. XSLT is the tool for this step too.
    </para>
    <para>
      The work presented here uses its own namespace,
      <uri>https://labo.planete-kraus.eu/mped.git</uri>, which we will
      abbreviate as “mped” from now on.
    </para>
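    <para>
      For instance, a document that uses these extensions declares the
      namespace on its root element, as this book does. The prefix
      itself is only a convention: the stylesheets match on the
      namespace URI, so any prefix bound to the same URI would do. A
      minimal sketch of such a root element:
    </para>
    <programlisting language="xml">
      <![CDATA[
<book xmlns="http://docbook.org/ns/docbook"
      xmlns:mped="https://labo.planete-kraus.eu/mped.git"
      version="5.0">
  <!-- Chapters, sections and program listings go here. -->
</book>
      ]]>
    </programlisting>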
  </preface>
  <chapter>
    <title>Tangling pieces of code from the document</title>
    <para>
      One of the most iconic features of literate programming is its
      ability to extract source code blocks and put them in files.
    </para>
    <section>
      <title>One source block to one file</title>
      <para>
        The document contains program listings that support the
        development of ideas. In Docbook, these are
        &lt;programlisting&gt; elements, siblings of paragraphs. Their
        most important attribute, “language”, identifies the
        programming language.
      </para>
      <para>
        However, there is no attribute in Docbook that tells the
        tangling program where each piece of code should end up. This is
        why we introduce our first extension: the “mped:tangle-to”
        attribute.
      </para>
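      <para>
        As an illustration, the following hypothetical listing asks for
        its content to be written to “scripts/hello.sh”. The path is
        interpreted relative to the directory in which the generated
        shell script runs, and the “mped” prefix is assumed to be bound
        as in this book:
      </para>
      <programlisting language="xml">
        <![CDATA[
<programlisting language="sh" mped:tangle-to="scripts/hello.sh">
echo "Hello, world!"
</programlisting>
        ]]>
      </programlisting>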
      <para>
        To tangle a document, we define an XSLT stylesheet. It reads a
        Docbook document and outputs a shell script that writes the
        correct pieces of code to the correct file names. The key
        template for this task is:
      </para>
      <programlisting language="xml" xml:id="tangle-programlisting">
        <![CDATA[
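<xsl:template match="docbook:programlisting[@mped:tangle-to]">
  <!-- Create the parent directory of the target file.  Both the
       command substitution and the file name are quoted, so that
       file names containing spaces survive word splitting. -->
  <xsl:text>mkdir -p "$(dirname "</xsl:text>
  <xsl:value-of select="@mped:tangle-to" />
  <xsl:text>")"&#xA;</xsl:text>

  <!-- Append the listing to the file through a quoted here-document,
       so that the shell performs no expansion on the code. -->
  <xsl:text>cat >> "</xsl:text>
  <xsl:value-of select="@mped:tangle-to" />
  <xsl:text>" &lt;&lt; "_MPED_EOF"&#xA;</xsl:text>

  <xsl:apply-templates mode="copy-source-code" />

  <xsl:text>&#xA;_MPED_EOF&#xA;</xsl:text>
</xsl:template>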
        ]]>
      </programlisting>
      <para>
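        This template starts by creating the directory where the file
        should go, then appends the source code to the file through a
        quoted here-document, so that the shell does not expand
        anything in the code. Because the file is opened with “>>”,
        several listings may contribute to the same file, in document
        order. For this to work, we need to do two things to the text
        of the program listing: remove the leading empty lines and the
        trailing empty lines of the content, while preserving the
        indentation of the first non-empty line. These empty lines come
        from the markup itself: the code usually starts on the line
        after the opening tag and ends before the line holding the
        closing tag.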
      </para>
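      <para>
        To see why both trims matter, consider this hypothetical
        listing, written the way listings usually are: the code starts
        after an empty line, and is followed by another empty line and
        by the indentation of the closing tag.
      </para>
      <programlisting language="xml">
        <![CDATA[
      <programlisting language="sh" mped:tangle-to="scripts/greet.sh">

  echo "Hello,"
  echo "world!"

      </programlisting>
        ]]>
      </programlisting>
      <para>
        Its text node starts with two newlines and ends with two
        newlines followed by six spaces. After trimming, only the two
        “echo” lines should reach the file, each still indented by two
        spaces, followed by the newline that the template writes before
        “_MPED_EOF”.
      </para>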
      <para>
        Let us start with removing the leading and trailing empty
        lines. The leading empty lines are the harder part, because the
        indentation of the first non-empty line must be kept.
      </para>
      <programlisting language="xml"
                      xml:id="mped-private-leading-empty-lines">
        <![CDATA[
<xsl:template name="mped-private-leading-empty-lines">
  <xsl:param name="indentation" select="''" />
  <xsl:param name="text" select="." />
  <xsl:choose>
    <xsl:when test="substring($text, 1, 1) = '&#xA;'">
      <xsl:call-template name="mped-private-leading-empty-lines">
        <xsl:with-param name="indentation" select="''" />
        <xsl:with-param name="text"
             select="substring($text, 2)" />
      </xsl:call-template>
    </xsl:when>

    <xsl:when test="(
        substring($text, 1, 1) = ' '
        or substring($text, 1, 1) = '&#9;'
        or substring($text, 1, 1) = '&#13;')">
      <xsl:call-template name="mped-private-leading-empty-lines">
        <xsl:with-param name="indentation" 
             select="concat($indentation, substring($text, 1, 1))" />
        <xsl:with-param name="text"
             select="substring($text, 2)" />
      </xsl:call-template>
    </xsl:when>

    <xsl:otherwise>
      <xsl:value-of select="$indentation" />
      <xsl:value-of select="$text" />
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
        ]]>
      </programlisting>
      <para>
        There are three different cases. If the text starts with a
        newline, discard the indentation that we carried and the
        newline. If the text starts with whitespace, carry it and look
        at the next character. Otherwise, the whitespace that we carried
        is indentation, so print it, and print the text.
      </para>
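      <para>
        As a worked example, here is a sketch of a call, to be placed
        inside any template of a stylesheet that contains the template
        above. Its text is made of an empty line, a line holding two
        spaces, then a line indented by four spaces. The first newline
        is dropped; the two spaces are carried, then dropped with the
        second newline; the four spaces are carried and, on reaching
        “i”, printed back with the rest of the text.
      </para>
      <programlisting language="xml">
        <![CDATA[
<xsl:call-template name="mped-private-leading-empty-lines">
  <xsl:with-param name="indentation" select="''" />
  <xsl:with-param name="text"
                  select="'&#xA;  &#xA;    indented&#xA;'" />
</xsl:call-template>
<!-- Expected output: four spaces, "indented", then a newline. -->
        ]]>
      </programlisting>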
      <para>
        To avoid exposing the carried indentation as a parameter, the
        “mped-private-” prefix marks this template as internal, and a
        public template wraps it.
      </para>
      <programlisting language="xml" xml:id="remove-leading-empty-lines">
        <![CDATA[
<xsl:template name="remove-leading-empty-lines">
  <xsl:param name="text" select="." />
  <xsl:call-template name="mped-private-leading-empty-lines">
    <xsl:with-param name="indentation" select="''" />
    <xsl:with-param name="text" select="$text" />
  </xsl:call-template>
</xsl:template>
        ]]>
      </programlisting>
      <para>
        To remove trailing empty lines, the solution is easier since
        there is no indentation to keep around: just discard all the
        trailing whitespace.
      </para>
      <programlisting language="xml" xml:id="remove-trailing-whitespace">
        <![CDATA[
<xsl:template name="remove-trailing-whitespace">
  <xsl:param name="text" select="." />
  <xsl:choose>
    <xsl:when test="$text = ''">
      <xsl:value-of select="$text" />
    </xsl:when>
    <xsl:otherwise>
      <xsl:variable name="last" select="substring($text, string-length ($text), 1)" />
      <xsl:variable name="before" select="substring($text, 1, string-length ($text) - 1)" />
      <xsl:choose>
        <xsl:when test="$last = ' ' or $last = '&#9;'
                        or $last = '&#10;' or $last = '&#13;'">
          <xsl:call-template name="remove-trailing-whitespace">
            <xsl:with-param name="text" select="$before" />
          </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
          <xsl:value-of select="$text" />
        </xsl:otherwise>
      </xsl:choose>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
        ]]>
      </programlisting>
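      <para>
        To check the two trims together, a small stylesheet can import
        the tangled stylesheet and apply them in the same order as the
        first “copy-source-code” template does. This is a sketch: the
        file names “tangle.xsl” (produced by tangling this book) and
        “check-trim.xsl” are assumptions, and the input document does
        not matter, since only its root is matched.
      </para>
      <programlisting language="xml">
        <![CDATA[
<?xml version="1.0"?>
<!-- Hypothetical check-trim.xsl: assumes tangle.xsl, tangled from
     this book, sits in the same directory. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:import href="tangle.xsl" />
  <xsl:output method="text" />

  <!-- Match the root of any input document and trim a sample text:
       leading empty lines first, then trailing whitespace.  The
       brackets make the kept spaces visible. -->
  <xsl:template match="/">
    <xsl:text>[</xsl:text>
    <xsl:call-template name="remove-trailing-whitespace">
      <xsl:with-param name="text">
        <xsl:call-template name="remove-leading-empty-lines">
          <xsl:with-param name="text"
                          select="'&#xA;&#xA;  echo hi&#xA;&#xA;  '" />
        </xsl:call-template>
      </xsl:with-param>
    </xsl:call-template>
    <xsl:text>]&#xA;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
        ]]>
      </programlisting>
      <para>
        With xsltproc, for instance, running “xsltproc check-trim.xsl
        check-trim.xsl” should print “[  echo hi]”: the two leading
        spaces are kept, and nothing follows “hi”.
      </para>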
      <para>
        Using these templates, we can process the program listing code
        in the “copy-source-code” mode. If the listing holds a single
        text node, remove both its leading empty lines and its trailing
        whitespace. Otherwise, remove the leading empty lines from the
        first text node and the trailing whitespace from the last text
        node. By “first” (respectively, “last”) text node, I mean the
        text node that has no preceding (respectively, following)
        siblings. There may be no such text node, for instance when the
        listing starts or ends with an element; nothing is trimmed at
        that end then. Text nodes in the middle are copied unchanged.
      </para>
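      <para>
        For instance, the following hypothetical listing uses the
        “mped:copy” element, introduced in the next section, as its
        first child: the whitespace-only text before it is dropped by
        the “xsl:strip-space” declaration of the complete stylesheet.
      </para>
      <programlisting language="xml">
        <![CDATA[
<programlisting language="sh" mped:tangle-to="scripts/main.sh">
<mped:copy linkend="greet-function" />
greet "world"
</programlisting>
        ]]>
      </programlisting>
      <para>
        This listing has no “first” text node. Its only text node has a
        preceding sibling and no following one, so it is the “last”
        text node: its trailing whitespace is removed, but its leading
        newline is kept, and still separates the call from the copied
        code.
      </para>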
      <programlisting language="xml" xml:id="copy-source-code-text">
        <![CDATA[
<!-- The only child: trim both ends. -->
<xsl:template match="text()[not(preceding-sibling::node())
                            and not(following-sibling::node())]"
              mode="copy-source-code">
  <xsl:call-template name="remove-trailing-whitespace">
    <xsl:with-param name="text">
      <xsl:call-template name="remove-leading-empty-lines">
        <xsl:with-param name="text" select="." />
      </xsl:call-template>
    </xsl:with-param>
  </xsl:call-template>
</xsl:template>

<!-- The first child, followed by other nodes: trim the start only. -->
<xsl:template match="text()[not(preceding-sibling::node())
                            and following-sibling::node()]"
              mode="copy-source-code">
  <xsl:call-template name="remove-leading-empty-lines">
    <xsl:with-param name="text" select="." />
  </xsl:call-template>
</xsl:template>

<!-- The last child, preceded by other nodes: trim the end only.
     Text nodes in the middle match no template in this mode, so the
     built-in rule copies them unchanged. -->
<xsl:template match="text()[preceding-sibling::node()
                            and not(following-sibling::node())]"
              mode="copy-source-code">
  <xsl:call-template name="remove-trailing-whitespace">
    <xsl:with-param name="text" select="." />
  </xsl:call-template>
</xsl:template>
        ]]>
      </programlisting>
      <para>
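        Tangling must not touch anything else, so text outside the
        tangled listings must not be copied to the output; otherwise,
        the built-in template rules would copy the text of every
        paragraph into the shell script.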
      </para>
      <programlisting language="xml" xml:id="ignore-text-other-than-source">
        <![CDATA[
<xsl:template match="text()" />
        ]]>
      </programlisting>
    </section>
    <section>
      <title>Paste other listings in place</title>
      <para>
        Literate programming requires the author to be able to discuss
        bits of code in isolation, and then insert each bit into a
        larger one. Mped provides this operation with a new element,
        “mped:copy”. Its “linkend” attribute refers to the identifier
        (“xml:id”) of a program listing anywhere in the document. When
        source code is copied, this element is replaced by the content
        of the linked listing. If no listing, or more than one, carries
        that identifier, tangling stops with an error.
      </para>
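      <para>
        A sketch of typical usage, with hypothetical identifiers and
        file names: a listing that defines a shell function carries an
        “xml:id” but no “mped:tangle-to” attribute, so it is not
        written to any file on its own. Another listing pulls it in at
        the right place.
      </para>
      <programlisting language="xml">
        <![CDATA[
<programlisting language="sh" xml:id="greet-function">
greet () {
  echo "Hello, $1!"
}
</programlisting>

<programlisting language="sh" mped:tangle-to="scripts/greet.sh">
#!/bin/sh
<mped:copy linkend="greet-function" />
greet "world"
</programlisting>
        ]]>
      </programlisting>
      <para>
        The resulting “scripts/greet.sh” should contain the shebang
        line, the function definition and the call, in that order.
      </para>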
      <programlisting language="xml" xml:id="tangle-mped-copy">
        <![CDATA[
<xsl:template match="mped:copy" mode="copy-source-code">
  <xsl:variable name="ref" select="@linkend" />
  <xsl:variable name="candidates"
                select="count(//docbook:programlisting[@xml:id = $ref])" />
  <xsl:if test="$candidates = 0">
    <xsl:message terminate="yes">
      <xsl:text>There is no listing with ID '</xsl:text>
      <xsl:value-of select="$ref" />
      <xsl:text>'.&#xA;</xsl:text>
    </xsl:message>
  </xsl:if>
  <xsl:if test="$candidates > 1">
    <xsl:message terminate="yes">
      <xsl:text>There are multiple listings with ID '</xsl:text>
      <xsl:value-of select="$ref" />
      <xsl:text>'.&#xA;</xsl:text>
    </xsl:message>
  </xsl:if>
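  <!-- Process the linked listing in the same mode, so that its text
       is trimmed and nested mped:copy elements are expanded too. -->
  <xsl:apply-templates select="//docbook:programlisting[@xml:id = $ref]"
                       mode="copy-source-code" />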
</xsl:template>
        ]]>
      </programlisting>
    </section>
    <section>
      <title>Putting it all together</title>
      <para>
        Collecting all these templates gives the complete stylesheet,
        which is itself tangled to “tangle.xsl”:
      </para>
      <programlisting language="xml" xml:id="whole-tangling-stylesheet"
                      mped:tangle-to="tangle.xsl">
        <![CDATA[
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:mped="https://labo.planete-kraus.eu/mped.git"
                xmlns:docbook="http://docbook.org/ns/docbook">

  <xsl:output method="text" indent="no" />
  <xsl:strip-space elements="*" />
        ]]>
        <mped:copy linkend="tangle-programlisting" />
        <mped:copy linkend="mped-private-leading-empty-lines" />
        <mped:copy linkend="remove-leading-empty-lines" />
        <mped:copy linkend="remove-trailing-whitespace" />
        <mped:copy linkend="copy-source-code-text" />
        <mped:copy linkend="ignore-text-other-than-source" />
        <mped:copy linkend="tangle-mped-copy" />
        <![CDATA[
</xsl:stylesheet>
        ]]>
      </programlisting>
    </section>
  </chapter>
</book>