PostgreSQL Buildfarm Howto

From PostgreSQL wiki
Revision as of 14:12, 28 May 2010 by Adunstan (import of buildfarm howto. not yet complete)

DRAFT, still editing!

This HOWTO is for PostgreSQL Build Farm clients.

The PostgreSQL Build Farm is a distributed build system designed to detect build failures on a large collection of platforms and configurations. This software is written in Perl. If you're not comfortable with Perl then you might not want to run this, even though the only adjustment you should ever need to make is to the config file (which is also Perl).

Get the software. Download it from pgFoundry, unpack it and put it somewhere. You can put the config file in a different place from the run_build.pl script if you want to (see later), but the simplest thing is to put it in the same place. Decide which user you will run the script as: it must be a user who can run postgres server programs (on Unix that means it must *not* be root). Do everything else here as that user.
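A minimal sketch of the user check, assuming a POSIX shell and the standard `id` utility. It refuses to continue when the effective user is root, since that user could not run the postgres server programs. The function name `check_not_root` is hypothetical and not part of the buildfarm client.

```shell
#!/bin/sh
# Sketch: fail if the effective user is root, because the buildfarm user
# must be able to run postgres server programs (which refuse to run as root).
# check_not_root is a hypothetical helper, not part of the buildfarm client.
check_not_root() {
    if [ "$(id -u)" -eq 0 ]; then
        echo "refusing to continue: do not run the buildfarm as root" >&2
        return 1
    fi
    echo "running as $(id -un), OK"
    return 0
}
```

You might call it as `check_not_root || exit 1` at the top of any wrapper script you write around run_build.pl.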

Set up a base git mirror that all your branches will pull from. Most buildfarm members run on more than one branch, and if yours does, it's good practice to set up a mirror on the buildfarm machine and then just clone that for each branch. As of this writing there are two suitable public git repositories:

The first is the community repository, and is kept very up to date. Unfortunately it is broken on branches earlier than REL8_3_STABLE. The second repository is similar to the first but is not broken on any of the live branches (as of the current writing). When the PostgreSQL project finally moves to Git, currently expected to happen around August 2010, both of these will become redundant. To set up a mirror, do something like this:

 git clone --mirror git://github.com/oicu/pg-cvs-mirror.git pgsql-base.git

When that is done, add an entry to your crontab to keep it up to date, something like:

 20,50 * * * * cd /path/to/pgsql-base.git && git fetch -q
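Once the clone (and later the cron fetch) has run, one quick way to confirm the mirror is populated is to list its branches. This also shows the branch names you might later pass to run_build.pl. The sketch below uses `git for-each-ref`. The helper name `mirror_branches` is hypothetical, and the path is whatever you used for your mirror.

```shell
#!/bin/sh
# Sketch: list the branch names held in a bare mirror made with
# "git clone --mirror", e.g. to check that REL8_4_STABLE is present.
# mirror_branches is a hypothetical helper, not part of the buildfarm client.
mirror_branches() {
    git --git-dir="$1" for-each-ref --format='%(refname:short)' refs/heads/
}
```

For example: `mirror_branches /path/to/pgsql-base.git`.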

Create a directory where builds will run. This should be dedicated to the use of the build farm. Make sure there's plenty of space: on my machine each branch can use up to about 700 MB during a build. You can use the directory where the script lives, a subdirectory of it, or a completely different directory.
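The space estimate is simple arithmetic: at up to about 700 MB per branch, N branches need roughly N × 700 MB (for example, 3 branches need about 2100 MB). A sketch that creates the build directory and compares that estimate with the free space reported by `df -Pk` is below. The 700 MB figure comes from the text above and may differ on your machine. The helper names are hypothetical.

```shell
#!/bin/sh
# Sketch only: space_needed_mb and prepare_build_root are hypothetical
# helpers. 700 MB per branch is the figure quoted in this howto.
MB_PER_BRANCH=700

# Estimated space in MB for a given number of branches.
space_needed_mb() {
    echo $(( $1 * MB_PER_BRANCH ))
}

# Create the build directory if needed and warn when free space looks short.
# Usage: prepare_build_root DIR NBRANCHES
prepare_build_root() {
    dir=$1
    need_mb=$(space_needed_mb "$2")
    mkdir -p "$dir" || return 1
    # df -P prints one line per filesystem; field 4 is available 1K blocks.
    avail_mb=$(( $(df -Pk "$dir" | awk 'NR == 2 { print $4 }') / 1024 ))
    echo "$dir: need about $need_mb MB, $avail_mb MB available"
    if [ "$avail_mb" -lt "$need_mb" ]; then
        echo "warning: $dir may be too small" >&2
        return 1
    fi
    return 0
}
```

For example, `prepare_build_root "$HOME/buildfarm/root" 4` for four branches, where the path is only an example.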

Edit the build-farm.conf file and put the location of the directory you just created in the config variable "build_root". Set "scm" to "git" and "scmrepo" to the path of your git mirror. If you are not using the community git repo, set "scm_url" to point to where a given git commit can be found on the web. For example, for my mirror on github, scm_url should be http://github.com/oicu/pg-cvs-mirror/commit/ (don't forget the trailing "/"). Adjust the config variables "make", "config_opts", and (if you don't use ccache) "config_env" to suit your environment, and to choose which optional postgres configuration options you want to build with. You should not need to adjust any other variables. Check that you didn't screw things up by running "perl -cw build-farm.conf".
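A sketch of sanity checks for the settings just described, assuming git and perl are on your PATH. The first check confirms that the scmrepo path is a bare repository, which is what `git clone --mirror` produces. The second confirms that scm_url ends with "/". The third is the `perl -cw` check from the text. All three helper names are hypothetical.

```shell
#!/bin/sh
# Sketch of checks for the config values above. The helper names are
# hypothetical; only "perl -cw build-farm.conf" comes from this howto.

# scmrepo should point at the mirror made with "git clone --mirror",
# which is a bare repository.
check_scmrepo() {
    [ "$(git --git-dir="$1" rev-parse --is-bare-repository 2>/dev/null)" = "true" ]
}

# scm_url needs the trailing "/" this howto insists on.
check_scm_url() {
    case $1 in
        */) return 0 ;;
        *)  echo "scm_url should end with /: $1" >&2; return 1 ;;
    esac
}

# Syntax-check the config file (it is Perl).
check_conf() {
    perl -cw "$1"
}
```

For example: `check_scmrepo /path/to/pgsql-base.git && check_scm_url http://github.com/oicu/pg-cvs-mirror/commit/ && check_conf build-farm.conf`.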

Change the shebang line in the run_build.pl script. If the path to your perl installation isn't "/usr/bin/perl", edit the #! line in run_build.pl so it is correct. This is the ONLY line in that file you should ever need to edit.
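If you would rather not edit the file by hand, the sketch below rewrites only line 1. By default it uses the perl found on your PATH via `command -v perl`, and it keeps any options that followed the old interpreter path. The helper name `fix_shebang` is hypothetical.

```shell
#!/bin/sh
# Sketch: point the #! line of a script at a given perl (default: the one
# found on PATH). Only line 1 changes; options after the old interpreter
# path (e.g. " -w") are kept. fix_shebang is a hypothetical helper.
fix_shebang() {
    script=$1
    perl_path=${2:-$(command -v perl)}
    if [ -z "$perl_path" ]; then
        echo "perl not found on PATH" >&2
        return 1
    fi
    # Whatever followed the old interpreter path on line 1.
    rest=$(head -n 1 "$script" | sed 's/^#![^ ]*//')
    tmp="$script.tmp.$$"
    {
        printf '#!%s%s\n' "$perl_path" "$rest"
        tail -n +2 "$script"
    } > "$tmp" && mv "$tmp" "$script" && chmod +x "$script"
}
```

Run `fix_shebang run_build.pl`, then `head -n 1 run_build.pl` to confirm. Pass a second argument to force a specific perl, e.g. `fix_shebang run_build.pl /usr/local/bin/perl`.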

Check that the required Perl modules are present. Run "perl -cw run_build.pl". If you get errors about missing Perl modules you will need to install them. Most of the required modules are standard modules in any Perl distribution. The rest are all standard CPAN modules, available either from CPAN or from your OS distribution. When you no longer get an error, run the same test on run_web_txn.pl. When all is clear you are ready to start testing.
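Both checks can be run in one go. The sketch below runs `perl -cw` on each script in turn and stops at the first failure. With `perl -c`, a missing module typically shows up as a "Can't locate ... in @INC" message. The helper name `check_scripts` is hypothetical.

```shell
#!/bin/sh
# Sketch: compile-check each script in turn and stop at the first failure,
# so missing modules are reported one script at a time.
# check_scripts is a hypothetical helper; run it from the client directory.
check_scripts() {
    for f in "$@"; do
        echo "checking $f"
        if ! perl -cw "$f"; then
            echo "$f: fix the errors above (e.g. install missing modules) and re-run" >&2
            return 1
        fi
    done
    echo "all clear"
}
```

For example: `check_scripts run_build.pl run_web_txn.pl`.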

Run in test mode. With a PATH that matches what you will have when running from cron, run the script in no-send, no-status, verbose mode. Something like this:

 PATH=/usr/bin:/bin ./run_build.pl --nosend --nostatus --verbose

and watch the fun begin. If this results in failures because it can't find some executables (especially gmake and git), you might need to change the config file again, this time adding a setting to "build_env" something like:

 PATH => "/usr/local/bin:$ENV{PATH}",

Also, if you put the config file somewhere else, you will need to use the --config=/path/to/build-farm.conf option.
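Before editing "build_env", you can check which programs a given PATH actually finds. The sketch below looks each name up with `command -v` inside a subshell, so your own PATH is left alone. The helper name `find_tools` is hypothetical.

```shell
#!/bin/sh
# Sketch: report which of the named programs can be found using a given
# PATH, e.g. the minimal PATH cron provides. The lookup runs in a subshell
# so the caller's PATH is unchanged. find_tools is a hypothetical helper.
find_tools() {
    search_path=$1
    shift
    missing=0
    for tool in "$@"; do
        if loc=$(PATH=$search_path; command -v "$tool"); then
            echo "$tool: $loc"
        else
            echo "$tool: NOT FOUND in $search_path" >&2
            missing=1
        fi
    done
    return $missing
}
```

For example, compare `find_tools /usr/bin:/bin gmake git` with `find_tools /usr/local/bin:/usr/bin:/bin gmake git` to see which directories need to go into the PATH setting.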

Test running from cron. When you have that running, it's time to try with cron. Put a line in your crontab that looks something like this:

 43 * * * * cd /location/of/run_build.pl/ && ./run_build.pl --nosend --verbose

Again, add the --config option if needed. Notice that this time we didn't specify --nostatus. That means that (after the first run) the script won't do any build work unless the repository has changed. Check that your cron job runs (it should email you the results, unless you tell it to send them elsewhere).
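If you prefer to install cron lines from the shell rather than with `crontab -e`, this sketch appends a line to an existing crontab read from standard input. It only appends the line if an identical one is not already present, so re-running it is harmless. The helper name `add_cron_line` is hypothetical.

```shell
#!/bin/sh
# Sketch: print the crontab read from stdin with LINE appended, unless an
# identical line is already there, so re-running it is harmless.
# add_cron_line is a hypothetical helper, not part of the buildfarm client.
add_cron_line() {
    line=$1
    existing=$(cat)
    if [ -n "$existing" ]; then
        printf '%s\n' "$existing"
    fi
    if ! printf '%s\n' "$existing" | grep -qxF -- "$line"; then
        printf '%s\n' "$line"
    fi
}
```

On most cron implementations you can then run `crontab -l 2>/dev/null | add_cron_line '43 * * * * cd /location/of/run_build.pl/ && ./run_build.pl --nosend --verbose' | crontab -`. Check the result afterwards with `crontab -l`.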

Choose which branches you want to build. By default run_build.pl builds the HEAD branch. If you want to build other branches, you can do so by specifying the name on the command line, e.g.

 run_build.pl REL8_4_STABLE

So, once you have HEAD working, remove the --verbose flag from your crontab, and add extra cron lines for each branch you want to build regularly. You could have something like this:

 6 * * * * cd /home/andrew/buildfarm && ./run_build.pl --nosend
 30 4 * * * cd /home/andrew/buildfarm && ./run_build.pl --nosend REL8_1_STABLE
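With several branches, crontab lines in the style of the example above can be generated rather than typed. In this sketch, HEAD runs hourly with no branch argument, since that is the default. Each other branch runs daily at :30, one hour apart. That schedule is an illustrative assumption, and the helper name `cron_lines_for` is hypothetical.

```shell
#!/bin/sh
# Sketch: print one crontab line per branch, in the style of the example
# above. HEAD gets no branch argument (it is the default). The hourly HEAD
# and daily, hour-staggered back branches are illustrative assumptions.
# cron_lines_for is a hypothetical helper.
cron_lines_for() {
    dir=$1
    shift
    hour=1
    for branch in "$@"; do
        if [ "$branch" = HEAD ]; then
            echo "6 * * * * cd $dir && ./run_build.pl --nosend"
        else
            echo "30 $hour * * * cd $dir && ./run_build.pl --nosend $branch"
            hour=$(( (hour + 1) % 24 ))
        fi
    done
}
```

For example, `cron_lines_for /home/andrew/buildfarm HEAD REL8_4_STABLE REL8_3_STABLE` prints three lines you can review and paste into your crontab.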

9. Once this is all running happily, you can register to upload your results to the central server. Registration can be done on the buildfarm server at http://www.pgbuildfarm.org/register.html . When you receive your approval by email, you will edit two lines in your config file, remove the --nosend flags, and you are done.
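Removing the --nosend flags can be done as a filter over your crontab. The sketch below drops ` --nosend` only from lines that mention run_build.pl and passes every other line through untouched. The two config-file lines still have to be edited by hand. The helper name `drop_nosend` is hypothetical.

```shell
#!/bin/sh
# Sketch: filter a crontab on stdin, removing the --nosend flag from
# run_build.pl lines only; all other lines pass through unchanged.
# drop_nosend is a hypothetical helper, not part of the buildfarm client.
drop_nosend() {
    sed '/run_build\.pl/s/ --nosend//g'
}
```

Preview with `crontab -l | drop_nosend`, then install with `crontab -l | drop_nosend | crontab -` once the output looks right.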

10. Resource use. Using the 'update' CVS method (see the config file) results in significantly lower bandwidth use, on both your server and the main PostgreSQL CVS server, than using method 'export'. The price is that cvs update is occasionally less reliable, and you have slightly higher disk usage (about 70 MB more for the HEAD branch). Eventually I'd like to migrate the load entirely off the PostgreSQL CVS server by implementing an 'rsync' method, but that's for another day. When you use the 'update' method, run_build.pl works on a temporary copy of the repo, never inside the repo (hence the extra disk usage). Use of ccache is highly recommended, especially on slow machines: on my two machines, runs that took an hour without ccache take about 15 minutes with ccache, with significantly lower processor load. Finally, bandwidth use from uploading results should be very light, except when there are failures, in which case the transaction can be quite large. But it's only one transaction, and we don't expect lots of failures, do we? ;-)
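To see whether ccache is available and actually being used, a small sketch is below. If ccache is on your PATH, it shows the cache statistics with `ccache -s`. A rising hit count across buildfarm runs suggests the cache is doing its job. The helper name `report_ccache` is hypothetical.

```shell
#!/bin/sh
# Sketch: report whether ccache is on PATH and, if so, print its cache
# statistics with "ccache -s". report_ccache is a hypothetical helper.
report_ccache() {
    if command -v ccache >/dev/null 2>&1; then
        echo "ccache found at $(command -v ccache)"
        ccache -s
    else
        echo "ccache not found on PATH; builds will run without it" >&2
        return 1
    fi
}
```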

11. Almost all the bandwidth issues disappear if you use a local CVS repository instead of the one at postgresql.org. The way to do this (or at least the way I did it) is using CVSup. Since building CVSup is non-trivial, the best way to start is to get a binary package for some system it will run on. In my case this was a Linux system running Fedora Core 1. After a few false starts, I got it working, replicating the entire repo at postgresql.org, including the CVSROOT directory. Then I commented out the entries in CVSROOT/loginfo and CVSROOT/commitinfo, and set up the LockDir directive as I wanted it in CVSROOT/config. Then I checked out the CVSROOT module, did all that over again, and checked the module back in. Then, to make sure CVSup didn't overwrite those files, I made entries for them in <mirror-home>/sup/repository/refuse. With that done, I was able to change the build config on that machine so that the config variable "cvsrepo" was just the name of the mirror root directory. Everything worked fine. After that I set up an anonymous CVS pserver against the mirror, so that my other machine could also get the source from there instead of from postgresql.org. I did a "cvs login", changed the "cvsrepo" config variable on that machine, and it worked happily too. Finally, I set up a cron job on the mirror machine to update the mirror. The anonymous repository is only updated from the master once every hour, so there is no point in running the cron job more often than that. This should not be too big a deal, as CVSup is extremely efficient, and even doing this so frequently should not incur a lot of bandwidth use.
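The step of commenting out the entries in CVSROOT/loginfo and CVSROOT/commitinfo can be scripted. The sketch below prefixes every non-blank line that does not already start with "#" with a "#", and keeps a .orig copy of each file. It assumes you run it on your checked-out copy of the CVSROOT module before committing. The helper name `comment_out_entries` is hypothetical.

```shell
#!/bin/sh
# Sketch: comment out every active (non-blank, non-comment) line of CVS
# administrative files such as CVSROOT/loginfo and CVSROOT/commitinfo.
# Intended for a checked-out copy of CVSROOT; commit afterwards.
# comment_out_entries is a hypothetical helper; each file is kept as FILE.orig.
comment_out_entries() {
    for f in "$@"; do
        cp "$f" "$f.orig" || return 1
        # Any line whose first character is not "#" gets a "#" prepended;
        # empty lines do not match and are left alone.
        sed 's/^\([^#]\)/#\1/' "$f.orig" > "$f" || return 1
    done
}
```

For example: `cd CVSROOT && comment_out_entries loginfo commitinfo`, then review the diff and `cvs commit`.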

12. CVSup is not universally available. For example, it does not seem to be available any longer in Fedora Extras, and there are platforms for which it has never been available. However, a similar procedure to the above can be done with rsync, which is pretty universally available. Here is what I did. First I created a directory for the repository and got an initial copy:

 mkdir -p /home/cvsmirror/pg
 rsync -avzCH --delete anoncvs.postgresql.org::pgsql-cvs /home/cvsmirror/pg

Then remove the sup directory and set up an rsync exclude file:

 rm -rf /home/cvsmirror/pg/sup
 echo /sup/ > /home/cvsmirror/pg-exclude
 echo '/CVSROOT/loginfo*' >> /home/cvsmirror/pg-exclude
 echo '/CVSROOT/commitinfo*' >> /home/cvsmirror/pg-exclude
 echo '/CVSROOT/config*' >> /home/cvsmirror/pg-exclude

Then edit the CVSROOT as in step 11. Then add a job to cron, something like this:

 43 * * * * rsync -avzCH --delete --exclude-from=/home/cvsmirror/pg-exclude anoncvs.postgresql.org::pgsql-cvs /home/cvsmirror/pg

Finally, add a pserver if other local buildfarm member machines need access.

13. Please file bug reports on the tracker at:

 http://pgfoundry.org/tracker/?atid=238&group_id=1000040&func=browse