Thursday, 16 April 2015

Organise collections of LaTeX documents with BSD Owl Scripts

Let us discuss how to handle collections of LaTeX documents with the build system BSD Owl Scripts. In our example we pretend that we are preparing an electronic journal and want to distribute each article of the journal as a separate electronic document.

Organisation on the file-system

We use the following simple organisation at the file-system level:
  1. We prepare a directory holding each issue of our journal, for instance ~/journal.
  2. Each issue of the journal is represented by a subdirectory.
  3. Each article of the journal is represented by a subdirectory of the directory corresponding to the issue it belongs to.
Assume we already have several articles, as demonstrated by the following command output:
% find ./journal -name '*.tex'
./journal/issue-2013-1/01-galdal/article.tex
./journal/issue-2013-1/02-arathlor/article.tex
./journal/issue-2013-2/01-mirmilothor/article.tex
./journal/issue-2013-2/02-eoron/article.tex
./journal/issue-2013-2/03-echalad/article.tex
Names like galdal and arathlor are the names of fictional authors of articles of our journal. Each submission has a directory containing the text of the article, article.tex.

Typeset each single article

We rely on BSD Owl Scripts to transform each article into a PDF file. We therefore add a Makefile to each directory corresponding to an article.
% find ./journal -name 'Makefile'
./journal/issue-2013-1/01-galdal/Makefile
./journal/issue-2013-1/02-arathlor/Makefile
./journal/issue-2013-2/01-mirmilothor/Makefile
./journal/issue-2013-2/02-eoron/Makefile
./journal/issue-2013-2/03-echalad/Makefile
Each of these Makefiles can actually be as simple as
DOCUMENT=       article.tex
.include "latex.doc.mk"
These Makefiles can also define file-system locations where TeX will look up common assets, define rules to automatically build some tables or figures, or use any of the more advanced techniques described in the documentation. Since we want to keep the focus on the organisational features of BSD Owl Scripts we will stick to that minimalistic Makefile.
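Writing the same two-line Makefile in every article directory can be scripted. The following is only a sketch: the helper write_article_makefile and the scratch tree standing in for ~/journal are assumptions made for the illustration, not part of BSD Owl Scripts.

```shell
# Sketch: drop the minimal Makefile into every article directory.
# The scratch tree below stands in for ~/journal (assumption).
journal_root="$(mktemp -d)/journal"
mkdir -p "${journal_root}/issue-2013-1/01-galdal" \
         "${journal_root}/issue-2013-1/02-arathlor"

# write_article_makefile is a hypothetical helper.
# $1 is the directory of one article.
write_article_makefile()
{
    printf 'DOCUMENT=\tarticle.tex\n.include "latex.doc.mk"\n' > "$1/Makefile"
}

# Visit every article directory of every issue.
for article_dir in "${journal_root}"/issue-*/*/; do
    write_article_makefile "${article_dir%/}"
done

# Show which Makefiles were created.
find "${journal_root}" -name Makefile | sort
```

Pointing journal_root at the real journal directory instead of the scratch tree would write the Makefiles in place.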

Bundle the articles together

To orchestrate the preparation of all our articles with BSD Owl Scripts we just need to write additional Makefiles.
./journal/Makefile
./journal/issue-2013-1/Makefile
./journal/issue-2013-2/Makefile
./journal/issue-2013-3/Makefile
Each Makefile basically contains the list of subdirectories where make should descend to actually build, install or clean. Readers fond of design patterns will recognise aggregates implementing a delegate pattern.
The file ./journal/Makefile should contain:
PACKAGE=        journal

SUBDIR=         issue-2013-1
SUBDIR+=        issue-2013-2
SUBDIR+=        issue-2013-3
.include "bps.subdir.mk"
The file ./journal/issue-2013-1/Makefile should contain:
SUBDIR=         01-galdal
SUBDIR+=        02-arathlor
.include "bps.subdir.mk"
The remaining files ./journal/issue-2013-2/Makefile and ./journal/issue-2013-3/Makefile can be similarly prepared. With these settings, the targets all, build, clean, distclean, realclean and install are delegated to Makefiles found in the subdirectories listed by SUBDIR.
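Since these aggregate Makefiles only list subdirectories, they can be derived from the directory tree itself. Here is a minimal sketch, assuming a hypothetical helper subdir_makefile (not part of BSD Owl Scripts) and a scratch directory standing in for an issue. It uses SUBDIR+= on every line, which for a variable that is not yet defined behaves like a plain assignment.

```shell
# Sketch: print an aggregate Makefile listing every subdirectory of $1.
# subdir_makefile is a hypothetical helper, not part of BSD Owl Scripts.
subdir_makefile()
{
    (
        cd "$1" || exit 1
        for dir in */; do
            # Skip the unexpanded pattern when there is no subdirectory.
            [ -d "${dir}" ] || continue
            printf 'SUBDIR+=\t%s\n' "${dir%/}"
        done
        printf '.include "bps.subdir.mk"\n'
    )
}

# Demonstration on a scratch copy of issue-2013-1.
issue_dir="$(mktemp -d)/issue-2013-1"
mkdir -p "${issue_dir}/01-galdal" "${issue_dir}/02-arathlor"
subdir_makefile "${issue_dir}"
```

Redirecting the output of subdir_makefile to the Makefile of the issue would install the generated list.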
The variable SUBDIR_PREFIX can be used to define a customised installation path for each article, so that the Makefile building a document could be
DOCUMENT=       article.tex
DOCDIR=         ${HOME}/publish/journal${SUBDIR_PREFIX}
.include "latex.doc.mk"
With this setting, the document ./journal/issue-2013-1/01-galdal/article.pdf will be installed as ${HOME}/publish/journal/issue-2013-1/01-galdal/article.pdf and so on. This can be tweaked to use arbitrary naming schemes for installed articles, for instance ${HOME}/publish/journal/issue-2013-1/01-galdal.pdf or whatever we fancy.

Declare locations of file assets

We can elaborate on our basic setup to handle the case where our documents share assets, for instance a logo for our journal or some custom LaTeX packages. In BSD Owl Scripts we can use the TEXINPUTS variable to declare one or more such locations. For instance the declaration
TEXINPUTS=      ${HOME}/share/texmf/tex/latex/journal
arranges for TeX to find all files in ${HOME}/share/texmf/tex/latex/journal when it needs them. This statement can be added to individual Makefiles responsible for the preparation of an article, or it can be added to ./journal/Makefile.inc. The latter file is read by make every time it processes a Makefile based on BSD Owl Scripts. Adding that declaration to ./journal/Makefile.inc is therefore similar to adding it to each single Makefile in the project.

Monday, 13 April 2015

Testing complex shell programs without installing them

A simple shell script fitting in one file can easily be tested from the command line. Complex scripts relying on several shell subroutine libraries and other file assets are a bit more complicated to test, because the file assets used by the script lie at different locations on the file system when the script is developed and when the script is installed. Let us see together how we can use parameter expansion modifiers to easily test our shell scripts without installing them, and how this is actually used in a small project, anvil.

Test the scripts in a specially crafted environment

We can use environment variables to reconcile the shell script's idea of where to find its file assets when it is installed and when it is tested as part of its development cycle. More precisely, we can take advantage of the Assign Default Values parameter expansion modifier. This modifier is described as follows in sh(1) on FreeBSD:
${parameter:=word}
Assign Default Values. If parameter is unset or null, the expansion of word is assigned to parameter. In all cases, the final value of parameter is substituted. Quoting inside word does not prevent field splitting or pathname expansion. Only variables, not positional parameters or special parameters, can be assigned in this way.
Assume that our software package is called anvil and its source repository contains a collection of function libraries found in the folder subr, which are copied to /usr/local/share/anvil/subr when the package is installed. If we write
: ${subrdir:=/usr/local/share/anvil/subr}
near the top of our shell program, we can use ${subrdir} to access our subroutine libraries, for instance
. "${subrdir}/common.sh"
In order to test our script as part of its development cycle, we must ensure that it reads the subroutines found in its source repository, and not the libraries found in /usr/local/share/anvil/subr copied by the installation of an older version of the software package. For this, it is enough to run the script in an environment where the variable subrdir has been set to ./subr or to the absolute path to that directory.
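The effect of the modifier can be observed without any installed file. The sketch below wraps the assignment in a hypothetical function resolve_subrdir, which stands in for the top of a real script, and evaluates it once with subrdir unset and once with subrdir pointing at the source repository.

```shell
# Sketch: resolve_subrdir is a hypothetical stand-in for the top of a script.
resolve_subrdir()
{
    # Keep the caller's value of subrdir if any, else use the installed path.
    : "${subrdir:=/usr/local/share/anvil/subr}"
    printf '%s\n' "${subrdir}"
}

# Installed behaviour: subrdir is unset, so the default applies.
installed_path="$(unset subrdir; resolve_subrdir)"

# Development behaviour: the environment points at the source repository.
development_path="$(subrdir=./subr; resolve_subrdir)"

printf 'installed:   %s\n' "${installed_path}"
printf 'development: %s\n' "${development_path}"
```

For a real script the development case amounts to a command line such as subrdir=./subr sh ./myscript.sh, where myscript.sh is a placeholder name.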

Configure installation paths

Instead of hard-wiring the installation paths in the sources, we may want to use configuration parameters for this. It is easy to use autoconf for this, but in order to avoid repeatedly running the slow ./configure script it generates, we may prefer a solution based on autoconf and Makefiles, so that the ./configure script is run once and for all and make is used to edit configuration parameters in the scripts. This is the approach used by anvil, which can be studied as an example of this technique.
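The editing step that make performs can be pictured as a plain sed substitution. The sketch below does not reproduce the actual rules used by anvil: the template name hello.sh.in, the @subrdir@ placeholder written in the autoconf style and the variable configured_subrdir are all assumptions made for the illustration.

```shell
# Sketch of the substitution step; all names here are hypothetical.
workdir="$(mktemp -d)"

# A script template with an autoconf-style placeholder.
cat > "${workdir}/hello.sh.in" <<'EOF'
: ${subrdir:=@subrdir@}
printf '%s\n' "${subrdir}"
EOF

# The value that ./configure would have recorded once and for all.
configured_subrdir='/usr/local/share/anvil/subr'

# The command a Makefile rule could run to produce the final script.
sed -e "s|@subrdir@|${configured_subrdir}|g" \
    "${workdir}/hello.sh.in" > "${workdir}/hello.sh"

# The generated script uses the configured path by default...
(unset subrdir; sh "${workdir}/hello.sh")
# ...and still honours an override from the environment.
subrdir=./subr sh "${workdir}/hello.sh"
```

Because the placeholder sits inside the Assign Default Values expansion, the generated script remains testable without being installed.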

Sunday, 12 April 2015

Delegating complex treatments to filters in shell programs

Novice shell programmers tend to reproduce procedural structures they learnt from classical procedural languages like Pascal or C¹. While it produces results, this approach is catastrophic and complex treatments should be delegated to filters. I will first convince you that catastrophic is not as much of a hyperbole as it may seem and discuss a simple concrete example from a code review I recently made.

Example presentation

I reviewed a submission for opam, the package manager recently adopted by the OCaml community, and spent quite some time commenting on a code snippet which compares the list of compilers supported by the system with a list of compiler versions that are to be kept, the purpose of the script being to remove the remaining ones.
Each supported compiler is represented on the filesystem by a directory, whose path relative to ${OPAMROOT} — the path to data owned by opam — has the following structure:
compilers/${SERIES}/${VERSION}
So for instance, the directory compilers/4.02.1/4.02.1+PIC corresponds to the compiler 4.02.1+PIC in the 4.02.1 series.
The snippet I reviewed performs a rather straightforward treatment: given a list of compiler versions held by the variable COMPILER_VERSIONS, it removes from the file-system the compilers whose version is not listed in COMPILER_VERSIONS.

Mimicking the classic procedural approach

The classic procedural approach to solving this problem can be worded “consider each compiler, if it is not in my little list, then delete it.” This can be implemented like this in the shell:
is_in_compiler_versions()
{
  local version
  for version in ${COMPILER_VERSIONS}; do
      if [ "${version}" = "$1" ]; then return 0; fi
  done
  return 1
}

for compiler in compilers/*/*; do
    if ! is_in_compiler_versions "${compiler##compilers/*/}"; then
        rm -r -f "${compiler}"
    fi
done
There is nothing terribly surprising here and it definitely works, so why should we consider this approach catastrophic? Here are a few reasons:
  1. The code is hard to read: there is no function name advertising the purpose of the main treatment and this purpose is well hidden in a conditional in the body of a for-loop.
  2. The code is hard to debug, since of the three important data sets involved in this treatment, only the list COMPILER_VERSIONS can be easily examined by the maintenance programmer. The list of supported compilers and the list of compilers to remove from the file-system exist only in an evanescent manner in this code and cannot be easily examined.
  3. The code is hard to reuse, because the enumeration of the compilers to remove and the actual removing are tightly bound together.
To put this in a few words, mimicking the classical procedural approach led to code which is hard to read, hard to debug and hard to reuse. Maybe labeling this catastrophic was not an exaggeration, after all. And we did not even consider execution speed: the shell being rather slow, that kind of code performs poorly when it has to handle a lot of data.
The example itself is of course really innocent but things get worse when we consider more complicated treatments, and larger programs. Now, what can we do about this? We can opt for

Delegating complex treatments to filters

Understanding that complex treatments should not be performed by the shell itself but delegated to filters is probably the most important perspective shift required to program the shell properly. Here is how we can rewrite the previous snippet using filters.
find_compilers()
{
  find "compilers" -type d -depth 2
}

select_not_in()
{
    awk -F '/' -v filter_out_list="$1" '
BEGIN {
  split(filter_out_list, s, " ")
  for(i in s){
    filter_out[s[i]]
  }
}
!($3 in filter_out) {print}
'
}

find_compilers\
  | select_not_in "${COMPILER_VERSIONS}"\
  | xargs rm -r -f
This solves all the problems found before. The code is easy to read because the function names make their purpose obvious. We do not need to understand awk to guess what the filter select_not_in does, since it is pretty clear from its name. Using awk here is essentially irrelevant: any language can be used to perform this selection step. It is very easy to scan the code down to the end of the pipeline to see that its purpose is to remove some files. The code is also easy to debug because the maintenance programmer can break the pipe sequence anywhere to examine the output of the program at that point, insert tees, insert a filter to pause between each line or mock the input of the filter. This code is easy to reuse because each of the three steps is independent. Last, it is much faster than the previous program and starts far fewer processes.
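The debugging techniques just mentioned can be tried on the filter alone. In the sketch below select_not_in is copied from the listing so that the block runs by itself, while mock_find_compilers, the trace file and the dry-run ending in xargs echo are assumptions added for the demonstration.

```shell
# select_not_in is copied from the listing above so this block runs alone.
select_not_in()
{
    awk -F '/' -v filter_out_list="$1" '
BEGIN {
  split(filter_out_list, s, " ")
  for(i in s){
    filter_out[s[i]]
  }
}
!($3 in filter_out) {print}
'
}

# Hypothetical mock replacing find_compilers with a fixed list.
mock_find_compilers()
{
    printf '%s\n' \
        compilers/4.01.0/4.01.0 \
        compilers/4.02.1/4.02.1 \
        compilers/4.02.1/4.02.1+PIC
}

COMPILER_VERSIONS='4.01.0 4.02.1'
trace="$(mktemp)"

# tee records what enters the filter; "xargs echo" prints the rm command
# instead of running it, so nothing is removed.
mock_find_compilers \
  | tee "${trace}" \
  | select_not_in "${COMPILER_VERSIONS}" \
  | xargs echo rm -r -f
```

With this input only 4.02.1+PIC is absent from COMPILER_VERSIONS, so it is the only path handed to the final command, and the trace file keeps the three candidates for later inspection.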

Consequences for code organisation

Once we have understood the benefits of delegating complex treatments to filters, we can draw a few consequences for the organisation of our programs and how we should shape our competences.
  1. As a rule of thumb, shell variables should not contain any complex data² and should contain only values from the Unix world, that is, paths in the filesystem and PIDs. Everything else is stored in files or flows from one process to the other through a pipe.
  2. It is crucial to know well a tool which we can use to quickly write these filters. I am very pleased with sed and awk as the versions in BSD systems are quite lightweight in comparison to some others, but there are many reasonable choices here.
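To illustrate both points, here is a sketch of the compiler selection where each data set lives in a file and the filters are sed and grep. The file names keep.txt, installed.txt and remove.txt and the scratch directory are assumptions chosen for the example.

```shell
# Sketch: every data set is a file that can be examined at any time.
workdir="$(mktemp -d)"

# Versions to keep, one per line, instead of a shell variable.
printf '%s\n' 4.02.1 4.01.0 > "${workdir}/keep.txt"

# Mocked list of installed compilers, as find would report them.
printf '%s\n' \
    compilers/4.01.0/4.01.0 \
    compilers/4.02.1/4.02.1 \
    compilers/4.02.1/4.02.1+PIC \
    > "${workdir}/installed.txt"

# Keep only the version component, then drop every line that is
# exactly equal to one of the versions listed in keep.txt.
sed -e 's|^compilers/[^/]*/||' "${workdir}/installed.txt" \
    | grep -v -x -F -f "${workdir}/keep.txt" \
    > "${workdir}/remove.txt"

cat "${workdir}/remove.txt"
```

The three lists can be inspected with cat or compared with diff, which is exactly what the procedural version made difficult.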


¹ Yes I wrote Pascal or C. This is a bit old-school, I know.
² Are base64-encoded files complex data?

Saturday, 11 April 2015

Drawing METAPOST pictures with BSD Owl Scripts



METAPOST, a program by John Hobby, is a powerful language for creating technical drawings and it is found in most if not all TeX distributions. While most LaTeX compilation assistants do not pay much attention to METAPOST, it is very well integrated in BSD Owl Scripts so that preparing a LaTeX document containing beautiful METAPOST pictures is achieved by a Makefile as simple as
DOCUMENT=        galley.tex
SRCS+=           figures.mp
.include "latex.doc.mk"
It is also possible to produce pictures on their own, using a Makefile similar to
DOCUMENT=        figures.mp
MPDEVICE=        eps pdf png svg
.include "mpost.doc.mk"
It will produce EPS, PDF, PNG and SVG versions of the figures.

If you do not know METAPOST, here are a few figures drawn with it:
A performance comparison chart
A timeline
A UML diagram
These pictures are examples found in my Blueprint project, a library of METAPOST definitions. This project also illustrates the use of BSD Owl Scripts to produce METAPOST pictures.

See also: Producing LaTeX documents (BSD Owl Scripts documentation), TeX Users Group page dedicated to METAPOST, André Heck's METAPOST tutorial.

Friday, 10 April 2015

Debian and Ubuntu packaging for BSD Owl Scripts users

I recently wrote Debian and Ubuntu packages for anvil, a small software package using BSD Owl Scripts as build system. I documented my work in the form of a short document and of a series of commits in a dedicated branch of the anvil repository.

You can take advantage of this documentation if you want to write Debian or Ubuntu packages for your git-hosted software built with BSD Owl Scripts. Note that this documentation is focused on the technical preparation of a package.

If you are considering submitting your software for inclusion in Debian repositories, you should get in touch with a mentor who will help you to implement all the best practices described in the Debian New Maintainers' Guide.

In contrast, you can set up a so-called personal package archive (PPA) to let Ubuntu users easily install your package within minutes. Nevertheless, Debian guidelines and processes guarantee the consistency of this distribution, which publication in a personal package archive does not.