Working with CVS

From PostgreSQL wiki




Warning: The PostgreSQL master repository has now switched to using Git instead of CVS. While it's still possible to obtain a copy of the repository via CVS, development work and generation of patches should be done with Git instead; see Working with Git.



The Importance of Local CVS Repositories

One of the keys to hacking on PostgreSQL is dealing with CVS. CVS keeps track of two things: source code and revision history. The way it's structured, the revision information is stored in directories named CVS among the files themselves. When you make your own complete local copy of the PostgreSQL CVS repository using a tool like rsync or CVSup, the main goal is usually to be able to do operations like a diff to see local changes without having to rely on the master repository. But you also get the ability to reproduce the development process that went into building the code. You'll have all the revision history needed to do things like examine individual patches that were applied or revert to earlier versions of the PostgreSQL code.

The problem with this is that it's not completely obvious what to do about your changes to the code. You can't just put your revision history into your copy of the master repository, because your local changes will be wiped out the next time you synchronize--replaced by the changes made to the official source code. And you can't just ignore the master repository. By the time you finish your modifications, the code it touches may have moved underneath you, forcing a reconciliation ("merge") of the two change sets before your patch can be used by others.

Developers use a variety of techniques to do their own local development while keeping up with changes made to the master repository. This page collects some of the popular approaches and recipes for how to successfully manage this issue.

Complicated local commits

The fundamental problem faced by projects with a central repository is how to cope with complicated work done on a developer's system. Local work done by a developer results in a diff file that can be applied as a patch. If the code is fairly straightforward, self-contained, and complete, that development model works fine. There are two areas where that approach breaks down. The first is when the patch conflicts with other work being done, so there must be some merging of the two sets of code. The other difficult case is where only part of a patch gets applied, perhaps because there are good ideas in it but other parts still need work. Sometimes, it's obvious the new development breaks into logical pieces that way from the beginning, but it's not so easy to build it that way.

To pick an actual example from some recent development, let's say you have a piece of code that instruments a part of the PostgreSQL internals, then exposes that information in one of the pg_statXXX views. This code might be fairly complicated, but it's standalone and straightforward to apply. On top of that, you then want to build a controversial tuning feature that uses the data collected to do something that is much less likely to be accepted (or even work!).

The actual flow of coding here would look like this:

  • HEAD -> diff A -> instrumentation feature -> diff B -> tuning feature

It's relatively straightforward to do this work and end up with one patch that includes diff A and a second patch that includes diff A+B. But getting just diff B is hard to do unless you have a way to stop there as a commit point. While there are some simple workarounds to do that within the bounds of the PostgreSQL CVS model, you can still get into trouble if you discover there's a bug in diff A that has to be fixed only after you're well into B. You can easily end up in a situation where the only straightforward way to cope with that is to save the two diffs, roll back to the original repository, re-apply diff A, fix the issue, make a new diff A-2, re-apply B, merge conflicts with A-2, then produce a new B-2.

As you can imagine, this is more complicated than it should be, but such scenarios are common to CVS-based development.
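One of the simple workarounds for getting just diff B is to keep a plain directory snapshot at the point where stage A is finished and diff against that. The following is only a minimal sketch of that idea: it uses a scratch directory in place of a real checkout, and every file and directory name in it (tree, head, stage_a, instrument.c, tuning.c) is made up for illustration.

```shell
# Scratch tree standing in for a checkout; all names here are illustrative.
work=$(mktemp -d)
mkdir "$work/tree"
printf 'int core;\n' > "$work/tree/core.c"
cp -R "$work/tree" "$work/head"        # pristine copy of HEAD

# Stage A: the instrumentation feature.
printf 'int stats;\n' > "$work/tree/instrument.c"
cp -R "$work/tree" "$work/stage_a"     # snapshot acts as a manual commit point

# Stage B: the tuning feature, built on top of A.
printf 'int tuning;\n' > "$work/tree/tuning.c"

# diff exits with status 1 when it finds differences; that is expected here.
(cd "$work" && diff -crN head stage_a > diff_a.patch)
(cd "$work" && diff -crN stage_a tree > diff_b.patch)

# diff_b.patch now holds only the stage B changes.
cat "$work/diff_b.patch"
```

The snapshot costs disk space and has to be refreshed by hand whenever stage A changes, which is exactly the bookkeeping the rest of this page tries to reduce.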

General CVS Information

A big manual and a free book about CVS are available:

For a quicker start, read one of these pages instead:

CVS+rsync Solutions

Having a local CVS repository is a must if you want to be able to generate diffs for code changes quickly. It also allows working off-line.

Full repository via rsync, local changes in checkout area

If you want a full copy of the PostgreSQL repository with revision history, in order to develop or test new code you'll need to checkout a copy and then work in that area. This is a popular method of working for developers who are involved in the PostgreSQL commit process, because it makes it easy to monitor the work other people are doing to the master tree.

Initial setup

Copying the repository can be done with a single-line rsync command. Here we assume that $HOME is your home directory, and that it contains a directory named pgrepo in which you want to store your local copy of the full repository:

export CVSROOT=$HOME/pgrepo
rsync --progress -avzCH --delete anoncvs.postgresql.org::pgsql-cvs $CVSROOT

When starting on a new piece of code, checkout a new copy of the directory tree and work from there. The suggested structure here is to make a new directory named for each project; we'll assume here that name is just "project". Here is a complete sample of how to checkout a copy of the code for that project into a useful build environment, then compile, build, and use that copy. (Note: it assumes that $CVSROOT is defined as above.)

$ cd ~
~/$ mkdir project
~/$ cd project
~/project$ cvs co pgsql  # could do "cvs co -d pgsql.project pgsql" instead of these two lines
~/project$ mv pgsql pgsql.project   
~/project$ cd pgsql.project
~/project/pgsql.project$ cvs update -dP
~/project/pgsql.project$ ./configure --prefix=$HOME/project --enable-depend --enable-cassert --enable-debug
~/project/pgsql.project$ make
~/project/pgsql.project$ make install
~/project/pgsql.project$ export PGDATA=<your database directory>
~/project/pgsql.project$ export PATH=$HOME/project/bin:$PATH
~/project/pgsql.project$ initdb
~/project/pgsql.project$ pg_ctl start 
~/project/pgsql.project$ psql

Note that it's possible to simplify the syntax of the cvs update done here on the first checkout, but it's a good habit to use the exact same line you'll ultimately be using to keep the tree up to date anyway. The reason for renaming pgsql to pgsql.project is so that you can tell which project a random diff patch you come across came from; if you ever get multiple projects going at once you'll realize how important that is.

  • Warning: The --enable-cassert and --enable-debug flags will help you find issues with your code, but a copy of PostgreSQL built using these parameters will be substantially slower than one built without them. If you're working on something performance-related, be sure to build without these flags before testing execution speed. You can turn off the assert testing, the larger of the slowdowns, at server start time by putting debug_assertions = false in your postgresql.conf. See Developer Options for more details about that setting; it defaults to true in builds done with --enable-cassert.
  • If you update a tree in which an earlier version has already been built, there are some conditions under which you won't get something that builds quite right. The recommended practice is to remove all build artifacts by running "make distclean" in your project directory before resynchronizing against an updated repository with "cvs update".
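The clean-before-update habit can be wrapped in a small helper. This is a minimal sketch, not an official script: the function name needs_distclean is hypothetical, and it assumes that the presence of config.status (the file written by configure) is a good enough sign that the tree has been configured and may hold build artifacts.

```shell
# needs_distclean DIR: succeed if DIR looks like a configured build tree.
# Assumption: config.status, which configure generates, marks such a tree.
needs_distclean() {
    [ -f "$1/config.status" ]
}

# Typical use inside a checkout (left commented out because it needs make,
# cvs, and a real source tree):
#   if needs_distclean .; then make distclean; fi
#   cvs update -dP
```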

One way to make CVS default to the correct behavior is to set up a .cvsrc file in your home directory. Here's a sample one with good defaults for this work. You may not want the "-z3" if you're only using CVS locally, as would be the case if you're maintaining a local copy with rsync:

cvs -z3
update -d -P
checkout -P
diff -c

Using this project environment

Once this environment is set up, when you log back in again you can switch to using it with a subset of the commands above. One good technique here is to put those lines in a script in the root of the directory for each project. Here is an example session showing a script that does that and how to use it:

~/project/pgsql.project$ cat setup
#!/bin/bash
export CVSROOT=$HOME/pgrepo
export PATH=$HOME/project/bin:$PATH
export PGDATA=<your database directory>

if [ ! -f "$PGDATA/postmaster.pid" ] ; then
  pg_ctl start
fi
~/project/pgsql.project $ source ./setup
server starting
~/project/pgsql.project $

Tracking local changes

For primitive version control, you can make a diff after any significant changes:

~/project/pgsql.project$ cvs diff -cN > project-1.patch

The number at the end of the file name is a revision number; increment it by one after each diff.
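Picking the next revision number by hand is easy to get wrong, so it can be automated. The helper below is only a sketch: the name next_patch_name is hypothetical, and it assumes the patches live in the current directory and follow the PREFIX-N.patch naming used above.

```shell
# next_patch_name PREFIX: print the first PREFIX-N.patch name that is not
# yet taken, counting N upward from 1.
next_patch_name() {
    prefix=$1
    n=1
    while [ -e "${prefix}-${n}.patch" ]; do
        n=$((n + 1))
    done
    printf '%s-%s.patch\n' "$prefix" "$n"
}

# Typical use inside a checkout (commented out because it needs cvs):
#   cvs diff -cN > "$(next_patch_name project)"
```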

Since you're operating on your own personal copy of the master repository here, you can actually commit these changes to your local tree (something that's impossible if you just check out a copy of the tree rather than using rsync). This makes it easier to produce diffs for complicated work that is done in stages; it's hard to get an incremental diff when working on code in stages without doing something like this. You can even use the tagging features of CVS to make this easier to manage.

The problem here is that all of your local commit, tag, or branch information will be blown away the minute you resync against the master repository. You can only consider these commits you make temporary, and it's important that you save copies of the actual diff information manually if you want to keep your changes.

For longer running projects, what some people who like this approach do is save snapshots of the incremental diffs for changes they made, resync against the master repository, and then "replay" those diffs again using the patch utility, repeating commits after each patch and possibly tagging at points. This is a manual way to recreate some of the features addressed by the "3rd party sources" approach below while still having a straight, complete copy of the master repository. This works best for people who don't want or need to synchronize against the PostgreSQL HEAD often.
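The replay step described above might look roughly like the following. It is a sketch under several assumptions: the function names run and replay_patches are hypothetical, the saved diffs are named PREFIX-1.patch, PREFIX-2.patch, and so on, they were produced with "cvs diff" from the top of the checkout (hence patch -p0), and the function is run from the top of a freshly updated checkout. Setting DRY_RUN prints the commands instead of executing them.

```shell
# run: print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# replay_patches PREFIX: apply PREFIX-1.patch, PREFIX-2.patch, ... in
# numeric order, committing to the local repository after each one.
# Stops at the first patch or commit that fails.
replay_patches() {
    prefix=$1
    n=1
    while [ -f "${prefix}-${n}.patch" ]; do
        run patch -p0 -i "${prefix}-${n}.patch" || return 1
        run cvs commit -m "replay ${prefix}-${n}.patch" || return 1
        n=$((n + 1))
    done
}

# Example: DRY_RUN=1; replay_patches ../patches/project
```

Counting upward rather than globbing keeps project-10.patch from sorting ahead of project-2.patch. Files that a patch creates would still need a manual "cvs add" before the commit picks them up, and a patch that no longer applies cleanly has to be merged by hand before continuing.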

Updating the repository and project copies

~/$ rsync --progress -avzCH --delete anoncvs.postgresql.org::pgsql-cvs $CVSROOT
~/$ cd $HOME/project/pgsql.project
~/project/pgsql.project$ cvs update -dP 

Just source code from the repository via rsync, local changes into branches

If you really don't care about exactly what's done to the master repository, and just want to keep current with it so your own patches always work against the current CVS HEAD, there is an alternate way to use CVS and rsync together. In this approach, your own CVS repository becomes the primary working area. Rather than copying the whole master repository, the PostgreSQL CVS HEAD is copied as the trunk for your own CVS tree, with your own changes all happening in branches.

Here you would periodically duplicate just the source code from the master repository (specifically ignoring its revision history in preference for your own), occasionally checking those changes into the trunk.

The main advantage of this approach is that it gives you very strong revision control on the changes you make yourself. With the other rsync-based approach, the process of generating and managing patch diffs is very much a manual process. It's particularly unsuitable for the relatively common task of developing patches that apply in sections.

If local coding is your primary focus, there is a set of techniques for doing that while keeping in sync with the master repository. It's called managing or tracking third party sources. Here the PostgreSQL repository would be considered "vendor" code that you wanted to track while also customizing.

Links for more information:

Use another version control system, sync entire repository

Expanding on the previous section, once you've made the leap to the idea that your checkout of the PostgreSQL source code is just updating the trunk of your local development tree, there's no reason you then need to use CVS to handle that local development at all. For example, you could easily host all your local code with Subversion instead, organizing with the standard Subversion trunk/branch/tags directory structure and operating like this:

  • Checkout a copy of your local trunk into a working directory
  • Use rsync to update it to match the current PostgreSQL repository. You don't even have to ignore the CVS files because Subversion doesn't operate on them
  • Run svn status or some other process to cope with files that were added
  • svn commit this update to the trunk
  • svn update your other copies and merge changes as needed
  • svn checkout a branch you use for your own development, hack away on it. Create copies of the tree into the tags directory structure at useful section breakpoints.
  • svn diff --diff-cmd /usr/bin/diff -x '-cN' to generate diffs in the right format when you're done.
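For the "svn status" step, one possible approach is to filter its output for files Subversion does not know about yet and for files that have disappeared from the working copy. The two filters below are a sketch only; the function names are hypothetical, and they assume the usual "svn status" layout in which the first column is "?" for an unversioned path and "!" for a missing one.

```shell
# unversioned_paths: read `svn status` output on stdin and print the paths
# flagged "?" (present on disk but not under version control).
unversioned_paths() {
    sed -n 's/^?[[:space:]]*//p'
}

# missing_paths: same, for paths flagged "!" (versioned but gone from disk).
missing_paths() {
    sed -n 's/^![[:space:]]*//p'
}

# Typical use in the trunk working copy (commented out because it needs svn):
#   svn status | unversioned_paths | while IFS= read -r f; do svn add "$f"; done
#   svn status | missing_paths | while IFS= read -r f; do svn delete "$f"; done
```

Review the lists before acting on them; build artifacts and editor backups show up as unversioned files too.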

With this approach, you don't even lose the CVS information needed to track repository history; just point CVSROOT to a local checkout of the trunk and you can run cvs to look at it.

There is nothing special about Subversion; other systems could work just as well. For example, you could use Git for your local repository, utilizing all its support for local branching, and then just sync the currently active branch with the PostgreSQL CVS repository using cg-update.

rsync notes from the pgbuildfarm HOWTO

This section should be removed once all useful insight from it has been merged into other sections.

First I made a repo location, and got an initial repo copy:

  mkdir -p /home/cvsmirror/pg
  rsync -avzCH --delete anoncvs.postgresql.org::pgsql-cvs /home/cvsmirror/pg

Then remove the sup directory and set up an rsync exclude file:

  rm -rf /home/cvsmirror/pg/sup
  echo /sup/ > /home/cvsmirror/pg-exclude
  echo '/CVSROOT/loginfo*' >> /home/cvsmirror/pg-exclude
  echo '/CVSROOT/commitinfo*' >> /home/cvsmirror/pg-exclude
  echo '/CVSROOT/config*' >> /home/cvsmirror/pg-exclude

Switch to using this repository by setting CVSROOT.

Then add a cron job something like this:

  43 * * * * rsync -avzCH --delete --exclude-from=/home/cvsmirror/pg-exclude anoncvs.postgresql.org::pgsql-cvs /home/cvsmirror/pg

Other rsync+CVS resources

For one interesting approach for keeping the local tree in sync, see the script at How to make a CVS mirror of an rsync repository

CVSup Solutions

Pros and Cons of CVSup

Working on the PostgreSQL code using CVSup has some unique aspects to it that are worth talking about, so it's clear what situations it might be the appropriate tool for. The main compelling feature of CVSup in the context of PostgreSQL source code control is how it supports Local Modifications in your CVS Repository. This uses a technique that keeps your own, local CVS branches distinct from those in the original archive. It relies on setting a CVS environment variable called CVS_LOCAL_BRANCH_NUM to keep the revision numbers from conflicting. When you update against the master, CVSup helps reconcile repository changes against the local ones. A good sample workflow for operating with CVSup in this fashion is at Maintaining and Synchronizing Local CVS Repository.

While this all seems ideal for PostgreSQL hacking, there are several hurdles to using it:

  • CVSup is written in Modula-3, which takes a considerable amount of work to install on most operating systems--and may not compile/install at all on others.
  • Getting the cvsupfile configuration correct is difficult, and because this software isn't very popular it's hard to get help if you have issues.
  • Complicated CVSup configurations are more fragile than their rsync-based equivalents, both because the code isn't used as widely and because the configuration is so complicated. It's not unusual to hear reports of a perfectly functional CVSup configuration that just stops working after some seemingly trivial update, and troubleshooting these problems can be very frustrating.
  • The CVS_LOCAL_BRANCH_NUM feature started as a hack to the FreeBSD version of CVS, but it has made its way into the current CVS 1.12 available from GNU CVS. However, the CVS bundled with most operating systems at this point is still CVS 1.11, so you'll likely have to upgrade that package yourself.
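For reference, setting the CVS_LOCAL_BRANCH_NUM variable mentioned above is a one-line change to the environment. This is a sketch, not a tested CVSup recipe: the value 1000 is only an example (it is the one the CVS manual uses when describing the variable), and the branch name in the comment is hypothetical.

```shell
# Ask CVS to use 1000 as the branch number for branches created locally, so
# their revision numbers stay clear of those assigned by the master.
export CVS_LOCAL_BRANCH_NUM=1000

# A local branch would then be created from inside a checkout with something
# like the following (commented out because it needs cvs and a repository):
#   cvs tag -b my_local_branch
```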

CVSup notes from the pgbuildfarm HOWTO

This section should be removed once all useful insight from it has been merged into other sections.

After a few false starts, I got it working replicating the entire repo at postgresql.org, including the CVSROOT directory. Then I commented out the entries in CVSROOT/loginfo and CVSROOT/commitinfo, and set up the LockDir directive as I wanted it in CVSROOT/config. Then I checked out the CVSROOT module and did that all over again, and checked the module back in. Then to make sure CVSup didn't overwrite those files, I made entries for them in <mirror-home>/sup/repository/refuse. With that done, I was able to change the build config on that machine so that the config variable "cvsrepo" was just the name of the mirror root directory. Everything worked fine. After that I set up an anonymous cvs pserver against the mirror, so that my other machine could also get the source from there instead of from postgresql.org. I did a "cvs login", changed the "cvsrepo" config variable on that machine, and it worked happily too. Finally, I set up a cron job on the mirror machine to update the mirror. The anonymous repository is only updated from the master once every hour, so there is no point in running the cron job more often than that. This should not be too big a deal, as CVSup is extremely efficient, and even doing this so frequently should not incur a lot of bandwidth use.

Other versions of the PostgreSQL Repository

There are several projects investigating alternatives to CVS that are converting the existing repository to a new format. They're listed here by geographic location because these sites aren't necessarily set up with a large amount of global bandwidth.

Subversion

There is a straight conversion of the CVS repository into Subversion available: USA

Git

The main Git conversion of the PostgreSQL repository is git.postgresql.org, hosted in France and updated every 30 minutes. See Other Git Repositories for additional ones. Working with Git covers similar material to this document but using the Git toolchain.

Kernel.org and this blog talk about converting between CVS and Git.

Mercurial

  • Chile: Updated once an hour

Other Version Control Tools

cvsutils

To create patches which would otherwise require that you have write access to the CVS repository, for example ones that add or remove files, you can use cvsutils. The cvsutils toolchain is packaged for many operating systems and available in source form.

Tailor

There is a tool called Tailor which helps migrate changesets among ArX, Bazaar, Bazaar-NG, CVS, Codeville, Darcs, Git, Mercurial, Monotone, Subversion and Tla repositories.

Credits

The information in this document came from many sources. Contributors to the initial documentation include Heikki Linnakangas, Pavan Deolasee, Florian Pflug, Jim Nasby, David Fetter, and Greg Smith.