Told to blog

it wasn't me, they made me do it

Website Cleanup Report - Part 4 20th April 2015 Tags: fsfe pdfreaders web_build

This post was originally written as an email to the mailing list of the FSFE web team. In it I describe my progress in rewriting the build system and integrating campaign sites. I also give an overview of the road map and some pointers on how to run a local web site build.

Introducing: The build scripts

This mail is particularly long. Unfortunately I'm not really the blogger type, so I'll have to use your email inbox to keep you posted about what happens on the technical side. Today I'm going to post some command line examples that will help you perform local builds and test your changes to the web site. This email will also provide technical background about what happens behind the scenes when the build is performed.

But first things first...

External site integration

As announced in the last report we moved pdfreaders.org and drm.info to the main web server where the sites are served as part of the main web site.
Did you know? Instead of accessing the domains pdfreaders.org or drm.info, you can also go to http://fsfe.org/pdfreaders or http://fsfe.org/drm.info/.
This means the sites are also integrated into the fsfe test builds, and parts of the fsfe web site, e.g. a list of news, can be included in their pages. The build process can be observed together with the main build at http://status.fsfe.org.

Production/Test script unification

We used to have different scripts for building the test branch and the production branch of the website. The four files build.sh, build.pl, build-test.sh, and build-test.pl were each present in both the test branch and the trunk. As a result, there was not only a logical difference between the build.* and build-test.* scripts, but also between the copies of each script in test and in trunk. We actually had to consider four sets of build scripts.

Obviously, keeping the scripts in different branches should be enough to represent all version differences, and should be enough for testing too.
Since the only intended difference between the scripts was a handful of hard coded file system paths, I added support for some call parameters to select the different build flavours. I was able to unify the scripts, so that the test and production builds now work in identical ways.
Paths are also no longer hard coded, which allows for local builds. For example, to perform a site build in your home directory you can issue a command like:

~/fsfe-web/trunk/tools/build.sh -f -d ~/www_site -s ~/buildstatus

NOTE: Don't trust the process to be free of side effects! I recommend you set up a distinct user in your system to run the build!

Another note: you will need some Perl libraries for the build and some LaTeX packages for building the PDF files included in the site. This applies to the old build. I aim to document the exact dependencies for my reimplementation of the build process.

Reimplementing the build scripts

Our goals here are

Of course it is practical to start from a point where we don't have to fundamentally change the way our site is set up. That's why I'd like the new build process to work with all the existing data. Basically it should build the site in a very similar way to the scripts we use today.

Which brings us to:

Understanding the build

The first part of the build system mentioned above is a shell script (tools/build.sh) that triggers the build and does some setup and job control. The actual workhorse is the Perl script, tools/build.pl, which is called by the shell script. On our web server the shell script is executed every few minutes by cron and tries to update the website source tree from SVN. If there are updates, the build is started. Unconditional builds are run only once a night.
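The decision described above could be sketched roughly like this. It is a minimal illustration and not the content of tools/build.sh: the function name needs_build is hypothetical, and it assumes that `svn update` prints "Updated to revision N." when it pulled changes and "At revision N." when there was nothing new.

```shell
# Hypothetical helper, not taken from tools/build.sh.
# $1: captured output of `svn update`
# $2: "force" to request an unconditional (nightly) build
needs_build() {
  [ "${2:-}" = "force" ] && return 0
  # Assumption: this line only appears when the working copy changed.
  printf '%s\n' "$1" | grep -q '^Updated to revision'
}

# Example use (commented out so the sketch has no side effects):
# out=$(svn update ~/fsfe-web/trunk) && needs_build "$out" && echo "start build"
```

The nightly run would pass "force" as the second argument, so that it builds regardless of what SVN reported.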

The first stage of the site build is performed by make, which is also called from the shell script. You can find makefiles all over the SVN tree of our website, and it is easy to add new make rules to them. Existing make rules call the LaTeX interpreter to generate PDF documents from some of our pages, assemble parts of the menu from different pages, build includable XML items from news articles, etc. The initial make run on a freshly checked out source tree will take a long time, easily more than an hour. Subsequent runs will only affect updated files and finish very quickly.
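The reason subsequent runs are fast is that make compares modification times and skips targets that are newer than their sources. The same test can be written in plain shell. This is only an illustration of the principle; the function name is_stale is hypothetical and nothing like it is claimed to exist in our makefiles.

```shell
# Hypothetical illustration of the timestamp check make performs.
# $1: target file (e.g. a generated PDF), $2: the source it is built from
is_stale() {
  # A target that does not exist yet always has to be built.
  [ -e "$1" ] || return 0
  # find prints the source only if it was modified after the target.
  [ -n "$(find "$2" -newer "$1")" ]
}

# Example use:
# is_stale page.pdf page.tex && echo "page.pdf needs a rebuild"
```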

The Perl script then traverses the source tree of our web site. Its goal is to generate an HTML file for each xhtml file in the source. Since not every source document is available in every language, there will be more HTML files on the output side than there are xhtml files in the source. Missing translations are padded with a file that contains the navigation menu and a translation warning in the respective language, and the document text in English.
Each HTML file is generated from more than just its corresponding xhtml source! The basic processing mechanism is XSLT, and build.pl automatically selects the XSL processing rules for each document depending on its location and file name. The input to the XSL processor is assembled from various files, including the original xhtml file, translation files, menu files and other referenced files like XML news items.

Assembling this input tree for the xslt processor is the most complex task performed by the perl script.
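The fallback for missing translations mentioned above can be illustrated with a few lines of shell. This is a sketch under assumptions, not code from build.pl: the function name select_source is hypothetical, and it only relies on the file naming scheme (name.language.xhtml) used in our source tree.

```shell
# Hypothetical helper: pick the source file for one page and language.
# $1: path without language and extension, e.g. "about/index"
# $2: language code, e.g. "it"
select_source() {
  if [ -f "$1.$2.xhtml" ]; then
    printf '%s\n' "$1.$2.xhtml"
  elif [ -f "$1.en.xhtml" ]; then
    # Missing translation: fall back to the English text.
    printf '%s\n' "$1.en.xhtml"
  else
    return 1
  fi
}

# Example use:
# select_source index it    # prints index.it.xhtml, or index.en.xhtml as fallback
```

The real build additionally wraps the English text with the translated menu and a translation warning, which this sketch leaves out.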

The rewrite

The reimplementation of the build script drops the Perl dependency and is written entirely in shell script. It uses xsltproc as its main XSLT processor. The program is already used during the make stage of the build and replaces the Perl module which was previously required. The new shell script makes extensive use of shell functions and is much more modular than the old builder. I started by rewriting the functions that assemble the processor input, hence the name of the shell script, build/xmltree.sh. At some point I will choose a more expressive name.

While the Perl script went out of its way to glue XML snippets onto each other, a much larger portion of the rewrite is committed to resolving interdependencies. With each site update we want to process as little source material as possible, rebuilding only those files that are actually affected by the change. The make program provides a good basis for resolving those requirements. The new script is built around generating make rules to build the entire site. Make can then perform an efficient differential update, running jobs in parallel wherever possible. The backend functions called by make are provided by the same shell script.
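To give an idea of what generating make rules from a shell script means, here is a heavily simplified sketch. The function name make_rule is hypothetical, and the rule only lists the xhtml source as a dependency, whereas a real rule also has to name the menu, translation and news files the page depends on. The recipe reuses the process_file call of build/xmltree.sh.

```shell
# Hypothetical sketch: print one make rule for a source file.
# $1: source file, e.g. "index.it.xhtml"
# $2: output directory, e.g. "$HOME/www_site"
make_rule() {
  target="$2/${1%.xhtml}.html"
  # The target depends on its source; the recipe line must start with a tab.
  printf '%s: %s\n\t./build/xmltree.sh process_file %s >%s\n' \
    "$target" "$1" "$1" "$target"
}

# Example use:
# make_rule index.it.xhtml ~/www_site
```

Rules printed this way can be collected into a makefile or piped to make, which then works out by itself which targets are out of date.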

To perform a site build in your home directory using the new script you can run:

~/fsfe-web/trunk/build/xmltree.sh -s ~/buildstatus build_into ~/www_site/

Try building only a single page for review:

cd ~/fsfe-web/trunk
./build/xmltree.sh process_file index.it.xhtml >~/index.it.html

Note, when viewing the result in a browser, that some resources are referenced live from fsfe.org, which makes it hard to review e.g. changes to CSS rules. This is a property of the XSL rules, not of the build program. You can set up a local web server and point fsfe.org to your local machine via /etc/hosts if you want to perform more extensive testing.

xmltree.sh provides some additional functions. Run build/xmltree.sh --help to get a more complete overview. If you are bold enough you can even source the script into your interactive shell and use functions from it directly. This might help you debug some more advanced problems.

ToDo

The new script still has some smaller issues regarding compatibility with the current build process.

While make drastically limits the processing work required for a page rebuild, the generation of the make rules alone still takes a lot of time. I intend to alleviate this problem by making use of the SVN log. SVN update provides clear information about the changes made to the source tree, making it possible to limit even the rule generation to a very small subset.
In addition, it may be possible to refine the dependency model further, resulting in even fewer updates.
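Extracting the changed files from the update output could look roughly like this. It is a sketch and not part of xmltree.sh: the function name changed_paths is hypothetical, and it assumes that `svn update` reports each changed item as a status code (A, D, U, C, G or E) followed by whitespace and the path. Paths containing spaces are not handled.

```shell
# Hypothetical filter: read `svn update` output on stdin, print changed paths.
changed_paths() {
  # Lines such as "Updating '.':" or "Updated to revision N." do not match,
  # because their first field is not made up of status letters only.
  awk '$1 ~ /^[ADUCGE]+$/ && NF >= 2 { print $2 }'
}

# Example use:
# svn update ~/fsfe-web/trunk | changed_paths
```

The resulting list could then be used to regenerate only the rules for pages that depend on one of those files.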

Future development

Some files have an unduly large part of the site depending on them. Even a minor update to the file "tools/texts-en.xhtml", for example, would result in a rebuild of almost every page, which in the new build script is actually slower than before. For this, if not for a hundred other reasons, this file should be split into local sections, which in turn requires a slight adaptation of the build logic and possibly of some XSLT files.

documentfreedom.org never benefited much from the development on the main web site. The changes to the XSL files, as well as to the build script, should be applied to the DFD site as well. This should happen before the start of next year's DFD campaign.

Currently our language support is restricted to the space provided by ISO 639-1 language tags. During the most recent DFD campaign we came to perceive this as a limitation when working with concurrent Portuguese and Brazilian Portuguese translations. Adding support for the more specific RFC 5646 (IETF) language tags requires some smaller adaptations to the build script. The IETF language tags extend the currently used two letter codes by adding refinement options while maintaining compatibility.
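As an illustration of why the extension is compatible, the following sketch extracts the language tag from a file name and reduces it to its two letter primary code. The function names lang_tag and primary_lang are hypothetical, and a file name like index.pt-BR.xhtml is an assumption about how such translations might be named, not an existing convention.

```shell
# Hypothetical helpers for file names like "index.pt-BR.xhtml".
lang_tag() {
  name=${1%.xhtml}               # strip the extension: "index.pt-BR"
  printf '%s\n' "${name##*.}"    # keep the text after the last dot: "pt-BR"
}

primary_lang() {
  tag=$(lang_tag "$1")
  printf '%s\n' "${tag%%-*}"     # drop refinements: "pt-BR" becomes "pt"
}

# Example use:
# lang_tag index.pt-BR.xhtml       # pt-BR
# primary_lang index.pt-BR.xhtml   # pt
```

Existing two letter file names pass through both functions unchanged, so code that only understands the primary code keeps working.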

To help contributors set up local builds, the build script should check its own dependencies at startup and give hints about which packages to install.
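Such a check could be as simple as the following sketch. The function name check_deps is hypothetical and the list of commands in the usage line is only an example based on the tools mentioned in this mail. The exact dependencies still need to be documented.

```shell
# Hypothetical dependency check: report every command that is not installed.
# Arguments: the command names the build relies on.
check_deps() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "Missing command: $cmd (please install the package providing it)" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example use at the top of a build script:
# check_deps xsltproc make svn || exit 1
```

Mapping command names to package names differs between distributions, so the hint is kept generic here.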
