Literate Programming using Sphinx and Haskell
Posted: October 17, 2011

When working on new projects, we try to write down any ideas we have in documents, for future reference. After a while, some of these documents become design documents.
Sometimes it’s useful to provide some code samples in these documents, to clarify some things, e.g. provide a basic implementation of some algorithm.
These samples should be in the form of pseudocode, so they don’t grow too large and don’t expose too many details. Using some fake programming language has a serious downside though: there’s no automated way to validate consistency, check types, and so on.
For the documents in our most recent project we use Sphinx, which allows one to write documents in reStructuredText and compile them into HTML, LaTeX/PDF or other formats, with useful features like syntax highlighting of code blocks.
When writing code in these documents, I tend to use Haskell as the pseudocode language. Compact notation, type safety and fake implementations using undefined allow for easy prototyping and interpretation, even for readers not familiar with the language.
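As an illustration of this prototyping style, here is a minimal sketch. The types Key, Value and Store and all three functions are hypothetical names invented for this example; they are not taken from any real project. The point is only that signatures can be written down and type-checked while the bodies stay undefined.

```haskell
{-# LANGUAGE EmptyDataDecls #-}
module StoreDesign where

-- Hypothetical types for a small key-value store design.
newtype Key = Key String
newtype Value = Value String

-- The representation is deliberately left open at design time.
data Store

-- Signatures document the intended API; the bodies are placeholders.
lookupKey :: Key -> Store -> Maybe Value
lookupKey = undefined

insertKey :: Key -> Value -> Store -> Store
insertKey = undefined

-- A composite operation can already be written and type-checked
-- against the placeholder API, long before anything can be run.
insertIfMissing :: Key -> Value -> Store -> Store
insertIfMissing k v s =
    case lookupKey k s of
        Just _  -> s
        Nothing -> insertKey k v s
```

The compiler accepts this module as-is, so a mismatch between the prose and the signatures shows up as a type error rather than going unnoticed. Evaluating any of the placeholder functions would of course fail at run time.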
Whilst GHC, the major Haskell compiler, has native support for so-called Literate Programming (source code interleaved with other text in a specific format, an idea introduced by Donald Knuth; see Wikipedia), its literate format is not compatible with Sphinx syntax. Luckily, the GHC developers allow users to specify a custom literate preprocessor on the command line. This removes the need for extra preprocessing build steps, and allows one to use custom literate files in the ghci REPL as-is.
Today I implemented such a preprocessor as a simple Python script. The source is available at https://gist.github.com/1292596. The script extracts all blocks marked as code-block. It does not filter the blocks, so if a document contains code-blocks with non-Haskell code, compilation will fail. The script contains support code for filtering, but this is not exposed (as of now).
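To make the idea concrete, here is a minimal sketch of such a preprocessor, written in Haskell rather than Python. It is not the script from the gist, and it rests on a few assumptions that should be checked before relying on it: that GHC passes the input and output file paths as the last two command-line arguments, that indentation uses spaces only, and that code-block directives carry no option lines (such as :linenos:). Every non-code line is replaced by an empty line, so line numbers in the output match those of the original document.

```haskell
module Main (main) where

import Data.Char (isSpace)
import Data.List (isPrefixOf)
import Data.Maybe (fromMaybe)
import System.Environment (getArgs)

-- | Scanner state while walking through the document line by line.
data State
    = Outside                  -- ordinary prose
    | InBlock Int (Maybe Int)  -- directive indentation, body indentation once known

indentOf :: String -> Int
indentOf = length . takeWhile (== ' ')

isBlank :: String -> Bool
isBlank = all isSpace

isDirective :: String -> Bool
isDirective l = ".. code-block::" `isPrefixOf` dropWhile (== ' ') l

-- | Keep the (dedented) bodies of all code-block directives and blank out
-- every other line, so that each output line keeps its original number.
extractCode :: String -> String
extractCode = unlines . go Outside . lines
  where
    go _ [] = []
    go Outside (l : ls)
        | isDirective l = "" : go (InBlock (indentOf l) Nothing) ls
        | otherwise     = "" : go Outside ls
    go st@(InBlock dirIndent bodyIndent) (l : ls)
        | isBlank l               = "" : go st ls
        -- A line indented no deeper than the directive ends the block;
        -- hand it back to the Outside case.
        | indentOf l <= dirIndent = go Outside (l : ls)
        | otherwise =
            let n = fromMaybe (indentOf l) bodyIndent
            in drop (min n (indentOf l)) l
                   : go (InBlock dirIndent (Just n)) ls

main :: IO ()
main = do
    args <- getArgs
    -- Assumption: the last two arguments are the input and output paths.
    case reverse args of
        (outFile : inFile : _) ->
            readFile inFile >>= writeFile outFile . extractCode
        _ -> ioError (userError "expected: [options] <input> <output>")
```

Since this is a compiled program rather than a script, it would have to be built first and the resulting executable passed to -pgmL. Like the Python script, the sketch does not filter blocks by language.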
Here’s a quick walkthrough. Imagine you wrote a Sphinx source document, hello_world.rst, with the following content:
    Hello World
    ===========
    To be able to print "Hello world" to the screen, we need to define this string:

    .. code-block:: haskell

        helloWorld :: String
        helloWorld = "Hello world"

    Printing a string to the screen is an IO operation, so we should perform this
    action inside the IO monad. The action won't return any useful result, so we'll
    return *()*:

    .. code-block:: haskell

        printHelloWorld :: IO ()
        printHelloWorld = putStrLn helloWorld

    Finally, we want to make a real application, so we need a main action:

    .. code-block:: haskell

        main :: IO ()
        main = printHelloWorld

To use this module in ghci, we should enable our preprocessor, and tell GHC the .rst file is actually Literate Haskell (.lhs), so it will perform all required compilation steps. Assume the preprocessor script is stored as spp.py in the current working directory, and has executable permissions:

    $ ghci -pgmL "./spp.py" -x lhs hello_world.rst
    GHCi, version 7.0.2: http://www.haskell.org/ghc/  :? for help
    Loading package ghc-prim ... linking ... done.
    Loading package integer-gmp ... linking ... done.
    Loading package base ... linking ... done.
    [1 of 1] Compiling Main             ( hello_world.rst, interpreted )
    Ok, modules loaded: Main.
    *Main> main
    Hello world

Note that the source location is reported correctly:

    *Main> :info main
    main :: IO ()   -- Defined at hello_world.rst:24:1-4

You can use ghc with the same arguments to compile Sphinx documents.
Using this approach, it should be possible to write complete applications as a tree of Sphinx documents, containing design documents, documentation as well as the actual implementation.
The same system could of course be used with other toolchains and languages as well (OCaml comes to mind, e.g. by integrating the preprocessor as an ocamlbuild rule).

Andrew Tridgell famously advocated that one “value (one’s) junk code” – http://junkcode.samba.org. Scratch documents have a way of becoming design documents, and scratch code has a way of becoming the core of an application.