Told to blog

it wasn't me, they made me do it

The FSFE.org buildscript 29th June 2015 Tags: fsfe

At the start of this month I deployed the new build script on the running test instance of fsfe.org.
I'd like to give an overview over its features and limitations. In the end you should be able to understand the build logs on our web server and to test web site changes on your own computer.

General Concept

The new build script (let's call it the 2015 revision) emulates the basic behaviour of the old (around 2002ish revision) of the build script. The rough idea is that the web site starts as a collection of xhtml files, which get turned into html files.

The Main build

A xhtml file on the input side contains a page text, usually a news article or informational text. When it is turned into its corresponding html output, it will be enriched with menu headers, the site footer, tag based cross-links, etc. In essence however it will still be the same article and one xhtml input file normally corresponds to one html output file. The rules for the transition are described using the xslt language. The build script finds the transition rules for each xhtml file in a xsl file. Each xsl file will normally provide rules for a number of pages.

Some xhtml files contain special references which will cause the output to include data from other xhtml and xml files. I.e. the news page will contain headings from all news articles and the front page has some quotes rolling through, which are loaded from a different file.

The build script coordinates the tools which perform the build process. It selects xsl rules for each file, handles different language versions, and the fallback for non-existing translations, collects external files for inclusion into a page, and calls the XSLT processor, RSS generator, etc.

Files like PNG images and PDF documents simply get copied to the output tree.

The pre build

Aside from commiting images, changing XML/XHTML code, and altering inclusion rules, authors have the option to have dynamic content generated at build time. This is mostly used for our PDF leaflets but occasionally comes in handy for other things as well. At different places the source directory contains some files called Makefile. Those are instruction files for the GNU make program, a system used for running compilers and converters to generate output files from source code. A key feature of make is, that it regenerates output files only if their source code has changed since the last generator run.
GNU make is called by the build script prior to its own build run. This goes for both, the 2002 and 2015 revision of the build script. Make itself runs some xslt-based conversions and PDF generators to set up news items for later processing and to build PDF leaflets. The output goes to the websites source tree for later processing by the build script. When building locally you must be careful, not to commit generated files to the SVN repository.

Build times

My development machine "Vulcan" uses relatively lightweighted hardware by contemporary standards: an Intel Celeron 2955U with two Haswell CPU Cores at 1.4 GHz and a SSD for mass storage.
I measured the time for some build runs on this machine, however our web server "Ekeberg" despite running older hardware seems to perform slightly faster. Our future web server "Claus" which isn't yet productively deployed seems to work a little slower. The script performs most tasks multi threaded and can profit greatly from multiple CPU cores.

Pre build

The above mentioned pre build will take a long time when it is first run. However once its output files are set up, they will hardly be ever altered.

Initial pre build on Vulcan:	~38 minutes
Subsequent pre build on Vulcan:	< 1 minute

2002 Build script

When the build script is called it first runs the pre build. All timing tests listed here where performed after an initial pre build. This way, as in normal operation, the time required for the pre build has an almost negligible impact on the total build time.

Page rebuild on Vulcan:	~17 minutes
Page rebuild on Ekeberg:	~13 minutes

2015 Build script

The 2015 revision of the build script is written in Shell Script while the 2002 implementation was in Perl. The Perl script used to call the XSLT processor as a library and passed a pre parsed XML tree into the converter. This way it was able to keep a pre parsed version of all source files in cache which was advantageous as it saved reparsing a file which would be included repeatedly. For example news collections are included in differnet places on the site and missing language versions of an article are usually all filled with the same english text version while retaining menu links and page footers in their respective translation.
The shell script does not ever parse an XML tree itself, instead it uses quicker shortcuts for the few XML modifications it has to perform. This means however, that it has to pass raw XML input to the XSLT program, which then has to perform the parsing over and over again. On the plus site this makes operations more atomic from the scripts point of view, and aids in implementing a dependency based build which can save it completely from rebuilding most of the files.

For performing a build, the shell script first calculates a dependency vector in the form of a set of make rules. It then uses make to perform the build. This dependency based build is the basic mode of operation for the 2015 build script.
This can still be tweaked: When the build script updates the source tree from our version control system it can use the list of changes to update the dependency rules generated in a previous build run. In this differential build even the dependency calculation is limited to a minimum with the resulting build time beeing mostly dependent on the actual content changes.

timings taken on the development machine *Vulcan*
Dependency build, initial run:	60+ minutes
Dependency build, subsequent run:	~9 to ~40 minutes
Differential build:	~2 to ~40 minutes

Local builds

In the simplest case you check out the web page from subversion, choose a target directory for the output and and build from the source directory directly into the target. Note that in the process the build script will create some additional files in the source directory. Ideally all of those files should be ignored by SVN, so they cannot be accidentally commited.

There are two options that will result in additional directories being set up beside the output.

You can set up a status directory, where log files for the build will be placed. In case you are running a full build this is recommendable because it allows you to introspect the build process. The status directory is also required to run the differential builds. If you do not provide a status directory some temporary files will be created in your /tmp directory. The differential builds will then behave identical to the regular dependency builds.
You can set up a stage directory. You will not normally need this feature unless you are building a live web site. When you specify a stage directory, updates to the website will first be generated there and only after the full build the stage directory will be synchronised into the target folder. This way you avoid having a website online that is half deprecated and half updated. Note that even the choosen synchronisation method (rsync) is not fully atomic.

Full build

The full build is best tested with a local web server. You can easily set one up using lighttpd. Set up a config file, e.g. ~/lighttpd.conf:

server.modules = ( "mod_access" )
$HTTP["remoteip"] !~ "127.0.0.1" {
  url.access-deny = ("")    # prevent hosting to the network
}

# change port and document-root accordingly
server.port                     = 5080
server.document-root            = "/home/fsfe/fsfe.org"
server.errorlog                 = "/dev/stdout"
server.dir-listing              = "enable"
dir-listing.encoding            = "utf-8"
index-file.names                = ("index.html", "index.en.html")

include_shell "/usr/share/lighttpd/create-mime.assign.pl"

Start the server (I like to run it in foreground mode, so I can watch the error output):

/usr/sbin/lighttpd -Df ~/lighttpd.conf

...and point your browser to http://localhost:5080

Of course you can configure the server to run on any other port. Unless you want to use a port number below 1024 (e.g. the standard for HTTP is port 80) you do not need to start the server as root user.

Finally build the site:

~/fsfe-trunk/build/build_main.sh -statusdir ~/status/ build_into /home/fsfe/fsfe.org/

Testing single pages

Unless you are interested in browsing the entire FSFE website locally, there is a much quicker way, to test changes you make to one particular page, or even to .xsl files. You can build each page individually, exaclty as it would be generated during the complete site update:

~/fsfe-trunk/build/build_main.sh process_file ~/fsfe-trunk/some-document.en.xhtml > ~/some-document.html

The resulting file can of course be opened directly. However since it will contain references to images and style sheets, it may be useful to test it on a local web server providing the referenced files (that is mostly the look/ and graphics/ directory).

The status directory

There is no elaborate status page yet. Instead we log different parts of the build output to different files. This log output is visible on http://status.fsfe.org.

File	Description
Make_copy	The file is part of the generated make rules, it all rules for files that are just copied as they are. The file may be reused in the differential build.
Make_globs	Part of the generated make rules. The file contains rules for preprocessing XML file inclusions. It may be reused during the differential build.
Make_sourcecopy	Part of the generated make rules. Responsible for copying xhtml files to the `source/` directory of the website. May be reused in differential build runs.
Make_xhtml	Part of the generated make rules. Contains the main rules for XHTML to HTML transitions. May be reused in differential builds.
Make_xslt	Part of generated make rules. Contains rules for tracking interdependencies between XSL files. May be reused in differential builds.
Makefile	All make rules. This file is a concatenation of all rule files above. While the differential build regenerates the other Make_-files selectively, this one gets always assembled from the input files which may or may not have been reused in the process. The `make` program which builds the site uses this file. Note the time stamp: this file is the last one to be written into, before `make` takes over the build.
SVNchanges	List of changes pulled in with the latest SVN update. Unfortunately it gets overwritten with every non-successful update attempt (normally every minute).
SVNerrors	SVN error output. Should not contain anything ;-)
buildlog	Output of the `make` program performing the build. Possibly the most valuable source of information when investigating a build failure. The last file to be written to during the `make` run.
debug	Some debugging output of the build system, not too informative because it is used very sparsely.
lasterror	Part of the error output. Gets overwritten with every run attempt of the build script.
manifest	List of all files which should be contained in the output directory. Gets regenerated for every build along with the Makefile. The list is used for removing obsolete files from the output tree.
premake	Output of the `make`-based pre build. Useful for investigating issues that come up during this stage.
removed	List of files that were removed from the output after the last run. That is, files that have been part of a previous website revision but do no longer appear in the manifest.
stagesync	Output of `rsync` when copying from a stage directory to the http root. Basically contains a list of all updated, added, and removed files.

Roadmap

Roughly in that order:

move *.sources inclusions from xslt logic to build logic
split up translation files
- both steps will shrink the dependency network and give build times a more favourable tendency
deploy on productive site
improve status output
auto detect build requirements on startup (to aid local use)
add support for markdown language in documents
add sensible support for other, more distinct, language codes (e.g. pt and pt-br)
deploy on DFD site
enable the script to remove obsolete directories (not only files)