The FSFE.org buildscript 29th June 2015 Tags: fsfe

At the start of this month I deployed the new build script on the running test instance of fsfe.org.
I'd like to give an overview of its features and limitations. By the end you should be able to understand the build logs on our web server and to test web site changes on your own computer.

General Concept

The new build script (let's call it the 2015 revision) emulates the basic behaviour of the old build script (the revision from around 2002). The rough idea is that the web site starts as a collection of xhtml files, which get turned into html files.

The Main build

An xhtml file on the input side contains a page text, usually a news article or informational text. When it is turned into its corresponding html output, it is enriched with menu headers, the site footer, tag-based cross-links, etc. In essence, however, it is still the same article, and one xhtml input file normally corresponds to one html output file. The rules for the transition are described in the XSLT language. The build script finds the transition rules for each xhtml file in an xsl file. Each xsl file normally provides rules for a number of pages.

Some xhtml files contain special references which cause the output to include data from other xhtml and xml files. For example, the news page contains headings from all news articles, and the front page has some quotes rolling through, which are loaded from a different file.

The build script coordinates the tools which perform the build process. It selects xsl rules for each file, handles different language versions and the fallback for non-existing translations, collects external files for inclusion into a page, and calls the XSLT processor, RSS generator, etc.

Files like PNG images and PDF documents simply get copied to the output tree.
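
The fallback for missing translations can be pictured roughly like this. This is only a minimal sketch and not code from the build script: the function name select_source is made up for illustration, and it assumes the file naming pattern <page>.<language>.xhtml.

```shell
#!/bin/sh
# Hypothetical helper, not taken from the FSFE build script.
# Prints the source file to use for page $1 in language $2,
# falling back to the English version if no translation exists.
select_source() {
  base="$1"
  lang="$2"
  if [ -f "${base}.${lang}.xhtml" ]; then
    printf '%s\n' "${base}.${lang}.xhtml"
  elif [ -f "${base}.en.xhtml" ]; then
    printf '%s\n' "${base}.en.xhtml"
  else
    return 1   # neither a translation nor an English original exists
  fi
}
```

Calling select_source index de in a directory that only holds index.en.xhtml and index.it.xhtml would print index.en.xhtml.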

The pre build

Aside from committing images, changing XML/XHTML code, and altering inclusion rules, authors have the option to have dynamic content generated at build time. This is mostly used for our PDF leaflets but occasionally comes in handy for other things as well. In various places the source directory contains files called Makefile. Those are instruction files for the GNU make program, a system used for running compilers and converters to generate output files from source code. A key feature of make is that it regenerates output files only if their source code has changed since the last generator run.
GNU make is called by the build script prior to its own build run. This goes for both the 2002 and the 2015 revision of the build script. Make itself runs some XSLT-based conversions and PDF generators to set up news items for later processing and to build PDF leaflets. The output goes to the website's source tree for later processing by the build script. When building locally you must be careful not to commit generated files to the SVN repository.
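
The timestamp rule that make applies can be illustrated in a few lines of shell. This is a sketch of the principle only, not part of the build script; the function name needs_rebuild is hypothetical, and the -nt comparison is an extension that common shells such as bash and dash provide.

```shell
#!/bin/sh
# Illustration of make's timestamp rule; not part of the build script.
# Returns success (0) if target $1 must be regenerated from source $2.
needs_rebuild() {
  target="$1"
  source_file="$2"
  # Rebuild if the target is missing or the source is newer than it.
  [ ! -e "$target" ] || [ "$source_file" -nt "$target" ]
}
```

A generator would then only be started when needs_rebuild reports success for its output file, which is why a second pre build run has so little to do.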

Build times

My development machine "Vulcan" uses relatively lightweight hardware by contemporary standards: an Intel Celeron 2955U with two Haswell CPU cores at 1.4 GHz and an SSD for mass storage.
I measured the time for some build runs on this machine. Our web server "Ekeberg", despite running older hardware, seems to perform slightly faster. Our future web server "Claus", which is not yet deployed in production, seems to work a little slower. The script performs most tasks multi-threaded and can profit greatly from multiple CPU cores.

Pre build

The pre build described above takes a long time when it is first run. However, once its output files are set up, they will hardly ever be altered.

Initial pre build on Vulcan:	~38 minutes
Subsequent pre build on Vulcan:	< 1 minute

2002 Build script

When the build script is called it first runs the pre build. All timing tests listed here were performed after an initial pre build. This way, as in normal operation, the time required for the pre build has an almost negligible impact on the total build time.

Page rebuild on Vulcan:	~17 minutes
Page rebuild on Ekeberg:	~13 minutes

2015 Build script

The 2015 revision of the build script is written in shell script while the 2002 implementation was in Perl. The Perl script used to call the XSLT processor as a library and passed a pre-parsed XML tree into the converter. This way it was able to keep a pre-parsed version of all source files in cache, which was advantageous as it saved reparsing a file that would be included repeatedly. For example, news collections are included in different places on the site, and missing language versions of an article are usually all filled with the same English text version while retaining menu links and page footers in their respective translation.
The shell script never parses an XML tree itself; instead it uses quicker shortcuts for the few XML modifications it has to perform. This means, however, that it has to pass raw XML input to the XSLT program, which then has to perform the parsing over and over again. On the plus side this makes operations more atomic from the script's point of view, and aids in implementing a dependency-based build which can save it completely from rebuilding most of the files.

For performing a build, the shell script first calculates a dependency vector in the form of a set of make rules. It then uses make to perform the build. This dependency based build is the basic mode of operation for the 2015 build script.
This can still be tweaked: when the build script updates the source tree from our version control system, it can use the list of changes to update the dependency rules generated in a previous build run. In this differential build even the dependency calculation is limited to a minimum, with the resulting build time being mostly dependent on the actual content changes.
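
To give an idea of what generating make rules from a shell script can look like, here is a minimal sketch. The rule format written by the real script is not shown in this post, so the function emit_rule, the rule layout and the file name default.xsl are assumptions for illustration; only the process_file call is borrowed from the single page example further down.

```shell
#!/bin/sh
# Hypothetical sketch: print one make rule for a page.
# $1 = output html file, $2 = xhtml source, $3 = xsl rule file.
emit_rule() {
  out="$1"
  src="$2"
  xsl="$3"
  # Target and prerequisites first, then a tab-indented recipe line.
  printf '%s: %s %s\n' "$out" "$src" "$xsl"
  printf '\t%s\n' "build_main.sh process_file $src > $out"
}
```

Running emit_rule out/index.en.html index.en.xhtml default.xsl prints a rule that tells make to regenerate the html file whenever the xhtml source or the xsl file is newer than the output. A differential build would only need to call such a function again for the files named in the list of changes.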

Timings taken on the development machine Vulcan:
Dependency build, initial run:	60+ minutes
Dependency build, subsequent run:	~9 to ~40 minutes
Differential build:	~2 to ~40 minutes

Local builds

In the simplest case you check out the web page from Subversion, choose a target directory for the output, and build from the source directory directly into the target. Note that in the process the build script will create some additional files in the source directory. Ideally all of those files should be ignored by SVN, so they cannot be accidentally committed.

There are two options that will result in additional directories being set up beside the output.

You can set up a status directory, where log files for the build will be placed. If you are running a full build this is recommended because it allows you to inspect the build process. The status directory is also required to run differential builds. If you do not provide a status directory, some temporary files will be created in your /tmp directory. The differential builds will then behave identically to the regular dependency builds.
You can set up a stage directory. You will not normally need this feature unless you are building a live web site. When you specify a stage directory, updates to the website are first generated there, and only after the full build is the stage directory synchronised into the target folder. This way you avoid having a website online that is half outdated and half updated. Note that even the chosen synchronisation method (rsync) is not fully atomic.

Full build

The full build is best tested with a local web server. You can easily set one up using lighttpd. Set up a config file, e.g. ~/lighttpd.conf:

server.modules = ( "mod_access" )
$HTTP["remoteip"] !~ "127.0.0.1" {
  url.access-deny = ("")    # prevent hosting to the network
}

# change port and document-root accordingly
server.port                     = 5080
server.document-root            = "/home/fsfe/fsfe.org"
server.errorlog                 = "/dev/stdout"
server.dir-listing              = "enable"
dir-listing.encoding            = "utf-8"
index-file.names                = ("index.html", "index.en.html")

include_shell "/usr/share/lighttpd/create-mime.assign.pl"

Start the server (I like to run it in foreground mode, so I can watch the error output):

/usr/sbin/lighttpd -Df ~/lighttpd.conf

...and point your browser to http://localhost:5080

Of course you can configure the server to run on any other port. Unless you want to use a port number below 1024 (e.g. the standard for HTTP is port 80) you do not need to start the server as root user.

Finally build the site:

~/fsfe-trunk/build/build_main.sh -statusdir ~/status/ build_into /home/fsfe/fsfe.org/

Testing single pages

Unless you are interested in browsing the entire FSFE website locally, there is a much quicker way to test changes you make to one particular page, or even to .xsl files. You can build each page individually, exactly as it would be generated during the complete site update:

~/fsfe-trunk/build/build_main.sh process_file ~/fsfe-trunk/some-document.en.xhtml > ~/some-document.html

The resulting file can of course be opened directly. However, since it will contain references to images and style sheets, it may be useful to test it on a local web server providing the referenced files (that is mostly the look/ and graphics/ directories).

The status directory

There is no elaborate status page yet. Instead we log different parts of the build output to different files. This log output is visible on http://status.fsfe.org.

File	Description
Make_copy	Part of the generated make rules. The file contains all rules for files that are just copied as they are. It may be reused in the differential build.
Make_globs	Part of the generated make rules. The file contains rules for preprocessing XML file inclusions. It may be reused during the differential build.
Make_sourcecopy	Part of the generated make rules. Responsible for copying xhtml files to the `source/` directory of the website. May be reused in differential build runs.
Make_xhtml	Part of the generated make rules. Contains the main rules for XHTML to HTML transitions. May be reused in differential builds.
Make_xslt	Part of generated make rules. Contains rules for tracking interdependencies between XSL files. May be reused in differential builds.
Makefile	All make rules. This file is a concatenation of all rule files above. While the differential build regenerates the other Make_-files selectively, this one always gets assembled from the input files, which may or may not have been reused in the process. The `make` program which builds the site uses this file. Note the time stamp: this file is the last one to be written to before `make` takes over the build.
SVNchanges	List of changes pulled in with the latest SVN update. Unfortunately it gets overwritten with every non-successful update attempt (normally every minute).
SVNerrors	SVN error output. Should not contain anything ;-)
buildlog	Output of the `make` program performing the build. Possibly the most valuable source of information when investigating a build failure. The last file to be written to during the `make` run.
debug	Some debugging output of the build system, not too informative because it is used very sparsely.
lasterror	Part of the error output. Gets overwritten with every run attempt of the build script.
manifest	List of all files which should be contained in the output directory. Gets regenerated for every build along with the Makefile. The list is used for removing obsolete files from the output tree.
premake	Output of the `make`-based pre build. Useful for investigating issues that come up during this stage.
removed	List of files that were removed from the output after the last run. That is, files that have been part of a previous website revision but no longer appear in the manifest.
stagesync	Output of `rsync` when copying from a stage directory to the http root. Basically contains a list of all updated, added, and removed files.
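
How the manifest can be used to find obsolete files may be sketched as follows. This is an illustration under assumptions, not the code used by the build script: the function name list_obsolete is made up, and both manifests are assumed to contain one path per line.

```shell
#!/bin/sh
# Hypothetical sketch: list files named in the old manifest
# that no longer appear in the new one.
# $1 = previous manifest, $2 = current manifest (one path per line).
list_obsolete() {
  old_sorted=$(mktemp)
  new_sorted=$(mktemp)
  sort "$1" > "$old_sorted"
  sort "$2" > "$new_sorted"
  # comm -23 keeps only the lines that occur in the first input alone.
  comm -23 "$old_sorted" "$new_sorted"
  rm -f "$old_sorted" "$new_sorted"
}
```

The output of such a comparison corresponds to what ends up in the removed file: everything that was built in an earlier run but is not expected in the output tree any more.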

Roadmap

Roughly in that order:

move *.sources inclusions from xslt logic to build logic
split up translation files
- both steps will shrink the dependency network and give build times a more favourable tendency
deploy on productive site
improve status output
auto detect build requirements on startup (to aid local use)
add support for markdown language in documents
add sensible support for other, more distinct, language codes (e.g. pt and pt-br)
deploy on DFD site
enable the script to remove obsolete directories (not only files)


Website Cleanup Report - Part 4 20th April 2015 Tags: fsfe pdfreaders web_build

This post was originally written as an email to the mailing list of the FSFE web team. In the text I describe my progress in rewriting the build system and integrating campaign sites. I also provide an overview of the road map and give some leads on how to run a local web site build.

Introducing: The build scripts

This mail is particularly long. Unfortunately I'm not really the blogger type so I'll have to use your email inbox to keep you posted about what happens on the technical side. Today I'm going to post some command line examples that will help you perform local builds and test your changes to the web site. This email will also provide technical background about what's happening behind the scenes when the build is performed.

But first things first...

External site integration

As announced in the last report we moved pdfreaders.org and drm.info to the main web server where the sites are served as part of the main web site.
Did you know? Instead of accessing the domains pdfreaders.org or drm.info, you can also go to http://fsfe.org/pdfreaders or http://fsfe.org/drm.info/.
This means the sites are also integrated in the fsfe test builds, and parts of the fsfe web site, such as a list of news, can be integrated into the pages. The build process can be observed together with the main build under http://status.fsfe.org.

Production/Test script unification

We used to have different scripts for building the test branch and the productive branch of the website. This means the four files build.sh, build.pl, build-test.sh, and build-test.pl were each present in the test branch and the trunk. As a result, there was not only a logical difference between the build.* and build-test.* scripts, but also between the build.* scripts in the test and trunk section and the build-test.* scripts in the test and trunk section - we actually had to consider four sets of build scripts.

Obviously the fact that the scripts exist in different branches should represent all version differences and should be enough for testing too.
Since the only intended difference between the scripts was some hard coded file system paths, I added support for some call parameters to select the different build flavours. I was able to unify the scripts, causing the test and productive build to work in identical ways.
Paths are also no longer hard coded, allowing for local builds. For example, to perform a site build in your home directory you can issue a command like:

~/fsfe-web/trunk/tools/build.sh -f -d ~/www_site -s ~/buildstatus

NOTE: Don't trust the process to be free of side effects! I recommend you set up a distinct user in your system to run the build!

Another note: you will require some perl libraries for the build and some latex packages for building the pdf files included in the site. This is the old build. I aim to document the exact dependencies for my reimplementation of the build process.

Reimplementing the build scripts

Our goals here are

to be able to extend the build scripts by other functions
to speed up the build
to simplify local builds

Of course it is practical to start from a point where we don't have to fundamentally change the way our site is set up. That's why I'd like the new build process to work with all the existing data. Basically it should build the site in a very similar way to the scripts we use today.

Which brings us to:

Understanding the build

The first part of the build system mentioned above is a shell script (tools/build.sh) that triggers the build and does some set up and job control. The actual workhorse is the perl script, tools/build.pl, which is called by the shell script. On our web server the shell program is executed every few minutes by cron and tries to update the website source tree from SVN. If there are updates the build is started. Unconditional builds are run only once a night.

The first stage of the site build is performed by make, also called from the shell script. You can find makefiles all over the svn tree of our website and it is easy to add new make rules to those. Existing make rules call the latex interpreter to generate pdf documents from some of our pages, assemble parts of the menu from different pages, build includable xml items from news articles, etc. The initial make run on a freshly checked out source tree will require a long time, easily more than an hour. Subsequent runs will only affect updated files and run through very quickly.

The perl script then traverses the source tree of our web site. Its goal is to generate an html file for each xhtml file in the source. Since not every source document is available in every language, there will be more html files on the output side than there are xhtml files in the source. Missing translations are padded with a file that contains the navigation menu and a translation warning in the respective language and the document text in English.
Each HTML file is generated from more than just its corresponding xhtml source! The basic processing mechanic is xslt and build.pl automatically selects the xsl processing rules for each document depending on its location and file name. The input to the XSL processor is assembled from various files, including the original xhtml file, translation files, menu files and other referenced files like xml news items.
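
Selecting rules by location and file name could work along these lines. Note that the exact lookup order used by build.pl is not described in this mail, so the scheme below (a page specific <name>.xsl first, then a default.xsl in the same or a parent directory) as well as the function name select_xsl are assumptions made for the sake of the example.

```shell
#!/bin/sh
# Hypothetical sketch of rule selection by location and file name.
# $1 = path to an xhtml source such as news/index.en.xhtml
select_xsl() {
  dir=$(dirname "$1")
  # Strip the ".<lang>.xhtml" suffix: index.en.xhtml -> index
  name=$(basename "$1" .xhtml)
  name=${name%.*}
  if [ -f "$dir/$name.xsl" ]; then
    printf '%s\n' "$dir/$name.xsl"
    return 0
  fi
  # Otherwise walk up the tree looking for a default.xsl (assumed name).
  while :; do
    if [ -f "$dir/default.xsl" ]; then
      printf '%s\n' "$dir/default.xsl"
      return 0
    fi
    parent=$(dirname "$dir")
    [ "$parent" = "$dir" ] && return 1   # reached the top, nothing found
    dir=$parent
  done
}
```

With such a scheme a single default.xsl can serve a whole directory tree, while individual pages can still bring their own rules.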

Assembling this input tree for the xslt processor is the most complex task performed by the perl script.

The rewrite

The reimplementation of the build script drops the perl dependency and happens entirely in shell script. It uses xsltproc as its main XSLT processor. The program is already used during the make stage of the build and replaces the perl module which was previously required. The new shell script makes extensive use of shell functions and is much more modular than the old builder. I started by rewriting functions to assemble the processor input, hence the name of the shell script is build/xmltree.sh. At some point I will choose a more expressive name.

While the perl script went out of its way to glue xml snippets onto each other, a much larger portion of the rewrite is devoted to resolving interdependencies. With each site update we want to process as little source material as possible, rebuilding only those files that are actually affected by the change. The make program provides a good basis for resolving those requirements. The new script is built around generating make rules to build the entire site. Make can then perform an efficient differential update, setting up parallel processing threads wherever possible. Backend functions are provided to make, again by the same shell script.

To perform a site build in your home directory using the new script you can run:

~/fsfe-web/trunk/build/xmltree.sh -s ~/buildstatus build_into ~/www_site/

Try building only a single page for review:

cd ~/fsfe-web/trunk
./build/xmltree.sh process_file index.it.xhtml >~/index.it.html

Note, when viewing the result in a browser, that some resources are referenced live from fsfe.org, making it hard to review e.g. changes to CSS rules. This is a property of the xsl rules, not the build program. You can set up a local web server and point fsfe.org to your local machine via /etc/hosts if you want to perform more extensive testing.

xmltree.sh provides some additional functions. Run build/xmltree.sh --help to get a more complete overview. If you are bold enough you can even source the script into your interactive shell and use functions from it directly. This might help you debug some more advanced problems.

ToDo

The new script still has some smaller issues with regard to compatibility with the current build process.

source files are not yet copied to the destination tree (the old script does this to help translators)
The XSL rule files for generating RSS and ICS output are not yet executed, resulting in RSS and ICS files not being built automatically. (manual build is possible)
Documents not present in English are not yet built at all.

While make limits the required processing work for a page rebuild drastically, the generation of the make rules alone still takes a lot of time. I intend to relieve this problem by making use of the SVN log. SVN update provides clear information about the changes made to the source tree, making it possible to limit even the rule generation to a very small subset.
In addition it may be possible to refine the dependency model further, resulting in even fewer updates.

Future development

Some files have an unduly large part of the site depending on them. Even a minor update to the file "tools/texts-en.xhtml" for example would result in a rebuild of almost every page, which in the new build script is actually slower than before. For this, if not for a hundred other reasons, this file should be split into local sections, which in turn requires a slight adaptation to the build logic and possibly some xslt files.

documentfreedom.org never benefited much from the development on the main web site. The changes to xsl files, as well as to the build script, should be applied to the DFD site as well. This should happen before the start of next year's DFD campaign.

Currently our language support is restricted to the space provided by ISO 639-1 language tags. During the most recent DFD campaign we came to perceive this as a limitation when working with concurrent Portuguese and Brazilian translations. Adding support for the more distinct RFC 5646 (IETF) language tags requires some smaller adaptations to the build script. The IETF language codes extend the currently used two letter codes by adding refinement options while maintaining compatibility.
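
One way to picture the compatibility between the two kinds of tags is a lookup chain that drops one refinement at a time. The following is only a sketch of the idea; the function name lang_chain and the choice to end every chain with English are assumptions, not existing behaviour of the build script.

```shell
#!/bin/sh
# Hypothetical sketch: print the lookup order for a language tag,
# most specific first, ending with the English fallback.
lang_chain() {
  tag="$1"
  while :; do
    printf '%s\n' "$tag"
    case "$tag" in
      *-*) tag=${tag%-*} ;;   # drop the last subtag: pt-br -> pt
      *)   break ;;
    esac
  done
  [ "$tag" = "en" ] || printf 'en\n'
}
```

For pt-br this prints pt-br, pt and en on separate lines, so a page without a Brazilian version could still use the Portuguese one before falling back to English. Plain two letter codes keep working unchanged.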

To help contributors in setting up local builds the build script should check its own dependencies at the start and give hints which packages to install.


Website Cleanup Report - Part 3 4th September 2014 Tags: fsfe pdfreaders xslt

This post was originally written as an email to the mailing list of the FSFE web team. In the text I describe my progress in tidying up our xsl rules and integrating campaign sites.

Hello List,

it's been a while. Upcoming tasks aside, my project focus is still the web site.

1. Modified XSLT code

The biggest obstacle for us to deploy sub pages with styles different from the main page was that the way of including xsl rules seemed very unclear. I had a really hard time analysing the code and making sense of it.
As XHTML goes, the two highest level sections in a document are <head> and <body>. To describe both sections we use extensive style rules. I was pretty confused by the fact that the two rule sets are applied in completely different ways. I don't think there was a reason for that; the original code must just have been written by different people or copied from different tutorials.
Anyway, I normalised the method and the template names for the sections. To completely override a page style it is now enough to define a template named "page-head" and one named "page-body" and execute (include) the file build/xslt/fsfe_document.xsl.

Fun fact: the most useful documentation for me when working with XSLT code was the one in the Microsoft developers network.
http://msdn.microsoft.com/en-us/library/ms256069(v=vs.110).aspx

2. Started to move pdfreaders

Finally being able to properly override page styles I started to move pdfreaders.org to our current build system. I am already playing around with code to have the readers defined in separate files. I use code from the news page to compile a page from tags in different files. News files, being listed in a row, ordered by the news date became reader definition files listed by priority.

See http://test.fsfe.org/pdfreaders

X. What's coming up

Moving drm.info and theydontwantyou.to over to the build system should now be rather simple too. The static build systems of those pages are outdated and don't run on our machines anymore. To update translations, we will need to either update the build systems, provided they are still maintained, or, better yet, do the transfer.

For the time ahead I will not do too much on the xsl files. I am going to focus on the perl and shell scripts, which can make a real difference regarding build speed and features.


Website Cleanup Report - Part 2 21st July 2014 Tags: fsfe xslt

This post was originally written as an email to the mailing list of the FSFE web team. In the text I describe my progress in smoothing out the web server setup and tidying up our collection of xsl rules.

Hello List,

despite me being sick last week (I feel well again), the work on the web site continues with undiminished effort. Here is a summary of the changes I have merged into the trunk today.

1. Enabled .htaccess rules

Well, this was not actually done in the repo, but on the web server. The page http://fsfe.org/tools is protected by an .htaccess rule denying access to the directory. This was supposed to be the case for the last four and a half years. However, instead of replying with a grumpy 403 when someone tried to access the directory, the web server used to fail with a 500 "Internal server error" whenever someone hit that folder. This is because access rules in .htaccess files had not been allowed in the web server configuration.
Admittedly in this case the effect was very similar to what was supposed to be achieved, but http://fsfe.org/campaigns/browserbundling/ did also fail with an error 500 because the .htaccess file tried to override the directory index without success.

2. Removed unneeded libraries

I believe the oldest occurrence of date-time.xsl was the one in the news folder, where it had been added in 2009 with the saucy svn comment "I hate XSLT 1.0."
The 1447 line file offers three features which we use on the web site:

getting the English name of a month (for a month number)
getting the English name of a weekday
calculating the weekday for any given date.

The first two functions are not actually used in the news section - they are trivial and are reimplemented directly in news.rss.xsl. I didn't bother with that though - some day we are going to replace this code with a localised function anyway and date-time.xsl will not be of any help there.
The third function - the one that was actually used - is a 7 line implementation of a variation of Zeller's algorithm. That makes 1440 lines of wasted code.
Wait! The weekday is actually only needed for the pubDate field of the RSS data. The RSS standard requires the pubDate field to adhere to the date format specified in RFC-822 and RFC-822 in turn explicitly specifies the week day as being optional in date statements.
So we do not actually need this function either.
I removed the file and changed the format of the pubDate field in our RSS feeds accordingly.
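
To show what a pubDate without the weekday looks like, here is a small illustration. On the site this formatting is done in XSLT; the shell function below is only a stand-in with a made-up name. The time of day, the +0000 zone and the four digit year are assumptions of this sketch (RFC 822 itself specifies a two digit year).

```shell
#!/bin/sh
# Hypothetical sketch: turn an ISO date (YYYY-MM-DD) into an
# RFC 822 style date without the optional weekday.
rfc822_date() {
  year=${1%%-*}
  rest=${1#*-}
  month=${rest%%-*}
  day=${rest#*-}
  case "$month" in
    01) mon=Jan ;; 02) mon=Feb ;; 03) mon=Mar ;; 04) mon=Apr ;;
    05) mon=May ;; 06) mon=Jun ;; 07) mon=Jul ;; 08) mon=Aug ;;
    09) mon=Sep ;; 10) mon=Oct ;; 11) mon=Nov ;; 12) mon=Dec ;;
    *) return 1 ;;   # not a valid month number
  esac
  printf '%s %s %s 00:00:00 +0000\n' "$day" "$mon" "$year"
}
```

No weekday calculation is involved at any point, which is the whole reason the date-time library is not needed for the feeds.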

The function get-month-name is still used somewhere on the site, so we retain a copy of the date-time library at some other point. I will remove this copy in time and either switch to a truly localised or purely numeric date display for event items.

Thus questioning the usefulness of some of the library files, I started to track down their usage on the site. In the end I removed most files from the "XSLT Standard Library". By the way, despite this stately name the lib has nothing to do with the xslt standard. It is just a project name and the library hasn't been updated in almost 10 years. I have it in for this monster, bet I can throw it out completely.

All in all I removed some thousand lines of code from the build process. Surely I am not finished yet ;-)


Website Cleanup Report - Part 1 10th July 2014 Tags: fsfe

This post was originally written as an email to the mailing list of the FSFE web team. In the text I describe my progress in tidying up our collection of build scripts and xsl rules.

Hello List,

in the spirit of opensslrampage.org I am going to talk a little about what I found out and what I did when cleaning up the website build scripts. I will try to save the worst phrasings for the svn log.

This will be the first mail, there will be more.

1. Removed build-test.pl

Currently we use a bunch of build scripts for building fsfe.org, test.fsfe.org, documentfreedom.org, test.documentfreedom.org, as well as the analogous translation logs and build status pages.

You guessed it, all those scripts look the same. Or at least at some point they were supposed to. What started out as minor forks with some file system paths altered evolved into a diverse and colourful ecosystem.

So last week I merged build-test.sh and build.sh into one script and got rid of build-test.pl which showed only a very minor difference to the original build script.
I threw away the patch files which recorded the difference between the scripts at the same time - they had been hurting the feelings of svn diff for long enough.

The same is to be done for the translation log files and some others.

2. Split up fsfe.xsl

This file was suffering a bad form of feature cancer. Each time we wanted to make something look different on the page we put some more code in there. That was fine because it worked. Only at some time it became harder and harder to grasp why it did so, harder to find the section of code which we would need to change to have a particular effect, and impossible to judge what code could be removed.

So I made a new folder, called it build/, put a bunch of files in there and ripped fsfe.xsl into pieces. Now there is still superfluous code, but at least we can see where definitions for the html footer start and where they end.

I put this into practice already. Some xsl rules for example are only used in the supporters section. Until recently however they were processed for the build of every single page on the website. Having the rules in separate files allowed me to include them only for processing of the supporter pages.

I am sure there lurk many undiscovered processing rules, which are only applied to parts of the page, or even not at all.

Being able to select which rules will be loaded for processing a page does also enable us to introduce different page headers, footers and other sections for campaign pages and similar.


Told to blog

it wasn't me, they made me do it

Entries tagged "fsfe".
