Committing with Git

From PostgreSQL wiki

This document is intended for PostgreSQL project committers. Regular project contributors should see the instructions for Submitting a Patch. This is not a complete tutorial on using git; for that, see Working with Git or the git documentation. Committers may also wish to review the Committing checklist page.

Common setup

1. To connect to the gitmaster repository, you will need to have your PostgreSQL SSH key loaded into your SSH agent, or available somewhere that SSH knows to look for it, such as ~/.ssh/id_rsa. (Note that ~/.ssh/authorized_keys is not such a place: it lists public keys accepted for incoming logins.)

2. Since merge commits may not be pushed, it is a good idea to set up your repository to rebase rather than merging. Without this, if someone else pushes a commit after you pull and before you push, your local repository will contain a merge commit that you'll need to manually remove before you can push.

cd postgresql
git config branch.master.rebase true
git config branch.autosetuprebase always

The last of these commands ensures that any subsequent tracking branches that you create will have branch.<name>.rebase configured automatically.

3. Any commits you push must have matching author and committer tags, and your name and email address must match those configured on the server. So, if you are Foo Bar <fbar@postgresql.org>, do:

git config user.name "Foo Bar"
git config user.email fbar@postgresql.org

If you use the same email address for all of the git repositories where you commit, you can configure it globally instead (you may already have done so); this updates ~/.gitconfig rather than the .git/config of the current repository:

git config --global user.name "Foo Bar"
git config --global user.email fbar@postgresql.org

4. Always run "git push --dry-run" and check its output before the real push!
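The setup steps above can be tried out against a scratch repository before touching gitmaster. The following is only a sketch under stated assumptions: the directories seed, origin.git and postgresql are throwaway stand-ins created under a temporary directory, the helper name mismatched_commits is hypothetical, and the check it performs is a local approximation, not the server's actual rule. It applies the configuration from steps 2 and 3, lists unpushed commits whose author and committer differ, and shows that a dry run leaves the remote unchanged.

```shell
#!/bin/sh
# Scratch demonstration; a local bare repository stands in for gitmaster.
set -e
scratch=$(mktemp -d)
cd "$scratch"

# Build a tiny stand-in "server" with master and one back branch.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
git config user.name "Foo Bar"
git config user.email fbar@postgresql.org
echo base > file.txt
git add file.txt
git commit -q -m "Initial commit"
git branch REL9_0_STABLE
cd ..
git clone -q --bare seed origin.git

# Steps 2 and 3: rebase instead of merging, and a fixed identity.
git clone -q origin.git postgresql
cd postgresql
git config branch.master.rebase true
git config branch.autosetuprebase always
git config user.name "Foo Bar"
git config user.email fbar@postgresql.org

# A tracking branch created from now on gets branch.<name>.rebase set.
git checkout -q REL9_0_STABLE
rebase_setting=$(git config --get branch.REL9_0_STABLE.rebase)
git checkout -q master

# Hypothetical helper: print unpushed commits whose author and committer differ.
mismatched_commits() {   # usage: mismatched_commits <upstream> <branch>
    git log --format='%h%x09%an <%ae>%x09%cn <%ce>' "$1..$2" |
        awk -F'\t' '$2 != $3 { print $1 }'
}

echo good >> file.txt
git commit -q -a -m "Commit with matching author and committer"
echo bad >> file.txt
git commit -q -a --author="Someone Else <else@example.org>" -m "Mismatched author"
mismatch_count=$(mismatched_commits origin/master master | wc -l | tr -d ' ')

# Step 4: a dry run reports what would be pushed without updating the remote.
remote_before=$(git --git-dir="$scratch/origin.git" rev-parse master)
git push --dry-run origin master
remote_after=$(git --git-dir="$scratch/origin.git" rev-parse master)

echo "rebase=$rebase_setting mismatches=$mismatch_count"
```

In a real clone, running mismatched_commits origin/master master before pushing should print nothing; any commit it lists would need its author fixed first.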

Committing Using a Single Clone

1. Clone the master repository. Note that this is not the same as the public repository; however, changes from the master repository are regularly pushed to the public repository.

git clone ssh://git@gitmaster.postgresql.org/postgresql.git

2. To commit a patch to the master branch (the equivalent of CVS HEAD), you can use any of the normal git commands. For example, if you've manually modified files or applied a patch from the mailing list that modifies existing files but does not create any new ones, you can just do:

git commit -a

If you've added new files, you must git add them first.

git add file1 file2 file3
git commit -a

Or, if the changes you want to commit are on a local branch, you can collapse the commits on the branch into a single commit on the tracking branch using:

git merge --squash branchname

Make sure to use --squash, or you'll end up with a merge commit.

3. To back-patch, you can check out the appropriate branch; a local tracking branch will automatically be created. For example:

git checkout REL9_0_STABLE
...hack, hack...
git commit -a

4. Finally, you must push your changes back to the server.

git push

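With Git versions before 2.0, this pushes changes in all branches you've updated, but only branches that also exist on the remote side; thus, you can have local working branches that won't be pushed. (Git 2.0 and later default to push.default = simple, which pushes only the current branch.) For the avoidance of error, you can configure your repository to push only the current branch to the branch it tracks. The value is spelled "upstream"; "tracking" is a deprecated synonym that older Git releases require:

git config push.default upstream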

5. To pull down changes others have committed, you can of course use:

git pull

If you have unpushed changes to any of the branches that have changed on the server, then (1) git will automatically attempt to rebase the currently checked-out branch (because of the configuration you did in step #2) and (2) each other branch that needs to be rebased will be out-of-sync with the server. The easiest way to fix this is to just check out the offending branch and re-pull, e.g.

git checkout REL9_0_STABLE
git pull

6. If one of your tracking branches gets messed up somehow (e.g. you accidentally merge into it, or commit something with the wrong author name/tag) and you can't figure out how to fix it, you can just snap it back to the state in which it exists on the master, throwing away your local changes, e.g.

git checkout master
git reset --hard origin/master

Make sure to use the same branch name in both commands.
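Steps 2 and 6 above can be seen working together in a scratch repository. The sketch below is illustrative only: the seed and postgresql directories are throwaway stand-ins, and --no-ff is used merely to force the merge commit that a plain merge would create whenever master has moved on. It makes the mistake of a real merge, counts the resulting merge commit, snaps master back with git reset --hard, and then redoes the work with --squash.

```shell
#!/bin/sh
# Scratch demonstration of "merge by mistake, reset, then squash".
set -e
scratch=$(mktemp -d)
cd "$scratch"

git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
git config user.name "Foo Bar"
git config user.email fbar@postgresql.org
echo base > file.txt
git add file.txt
git commit -q -m "Initial commit"
cd ..
git clone -q seed postgresql
cd postgresql
git config user.name "Foo Bar"
git config user.email fbar@postgresql.org

# Two work-in-progress commits on a local branch.
git checkout -q -b branchname
echo one >> file.txt
git commit -q -a -m "WIP 1"
echo two >> file.txt
git commit -q -a -m "WIP 2"
git checkout -q master

# The mistake: a real merge, which leaves a merge commit on master.
git merge -q --no-ff -m "Accidental merge" branchname
merges_after_mistake=$(git rev-list --merges --count origin/master..master)

# Step 6: snap master back to the state it has on the server.
git reset -q --hard origin/master
ahead_after_reset=$(git rev-list --count origin/master..master)

# Step 2: --squash stages the combined changes without committing them.
git merge -q --squash branchname
git commit -q -m "Add feature as a single commit"
ahead_after_squash=$(git rev-list --count origin/master..master)
merges_after_squash=$(git rev-list --merges --count origin/master..master)

echo "merges: $merges_after_mistake -> $merges_after_squash, ahead: $ahead_after_squash"
```

The command git rev-list --merges origin/master..master is a quick way to confirm that nothing unpushed is a merge commit.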

Committing Using Multiple Clones

When applying a patch to many branches, it can become tedious to keep switching branches; it can be nice to be able to see all the different versions side by side. Furthermore, the PostgreSQL build system isn't smart enough to do the right thing if you switch major releases without cleaning out all the intermediates, so if you do switch branches you'll need to do a complete rebuild each time. (git clean -dfx is a useful way to clean all the cruft out of your repository, but be careful that you don't have any untracked files there that you meant to keep.)
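Because git clean -dfx deletes untracked files irrevocably, it may help to preview it first. The sketch below, which uses a throwaway repository and made-up file names, adds -n so that git only lists what it would remove, and then runs the real command.

```shell
#!/bin/sh
# Scratch demonstration: preview "git clean -dfx" with -n before running it.
set -e
scratch=$(mktemp -d)
cd "$scratch"
git init -q demo
cd demo
git config user.name "Foo Bar"
git config user.email fbar@postgresql.org
printf '*.o\n' > .gitignore
echo 'int main(void) { return 0; }' > main.c
git add .gitignore main.c
git commit -q -m "Initial commit"

# One ignored build product and one untracked file you might want to keep.
touch main.o notes.txt

# -n (dry run) lists what would be deleted and removes nothing.
preview=$(git clean -dfxn)
echo "$preview"
files_after_preview=$(ls | wc -l | tr -d ' ')

# The real thing: both untracked files are gone afterwards.
git clean -dfx -q
files_after_clean=$(ls)
```

If the preview lists something you meant to keep, move it out of the tree or commit it before running the command without -n.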

The use of git clone --reference is not recommended except for short-term, throw-away copies, because a subsequent git gc can result in data loss. Instead, use one of the techniques described below.

Independent Clone per Branch

The simplest way to commit using multiple clones is to create them as described above for a single clone, and keep a different branch checked out in each copy. You can configure each clone to pull only the branch you care about for that clone. For example, for the REL9_0_STABLE branch:

git clone ssh://git@gitmaster.postgresql.org/postgresql.git REL9_0_STABLE 
cd REL9_0_STABLE
git checkout REL9_0_STABLE
git config branch.REL9_0_STABLE.rebase true
git config user.name "Foo Bar"
git config user.email fbar@postgresql.org
git config remote.origin.fetch '+refs/heads/REL9_0_STABLE:refs/remotes/origin/REL9_0_STABLE'
git branch -D master

One disadvantage of this approach is that you will use more disk space: the .git directory for each repository, as of this writing, is a bit more than 220MB. If this is a concern, use one of the methods described below.
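The effect of the restricted remote.origin.fetch setting can be checked in a scratch setup. In the sketch below, seed and server.git are hypothetical stand-ins for another committer's clone and for gitmaster. After both branches advance on the stand-in server, a pull in the per-branch clone updates REL9_0_STABLE, while its stale origin/master reference stays where it was.

```shell
#!/bin/sh
# Scratch demonstration of a clone that fetches only one branch.
set -e
scratch=$(mktemp -d)
cd "$scratch"

git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
git config user.name "Foo Bar"
git config user.email fbar@postgresql.org
echo base > file.txt
git add file.txt
git commit -q -m "Initial commit"
git branch REL9_0_STABLE
cd ..
git clone -q --bare seed server.git     # stand-in for gitmaster

# The recipe from above, pointed at the stand-in server.
git clone -q server.git REL9_0_STABLE
cd REL9_0_STABLE
git checkout -q REL9_0_STABLE
git config branch.REL9_0_STABLE.rebase true
git config user.name "Foo Bar"
git config user.email fbar@postgresql.org
git config remote.origin.fetch '+refs/heads/REL9_0_STABLE:refs/remotes/origin/REL9_0_STABLE'
git branch -q -D master
master_seen_before=$(git rev-parse origin/master)

# Someone else advances both branches on the server.
cd ../seed
echo more >> file.txt
git commit -q -a -m "Change on master"
git checkout -q REL9_0_STABLE
echo fix >> file.txt
git commit -q -a -m "Change on REL9_0_STABLE"
git push -q ../server.git master REL9_0_STABLE

# Only REL9_0_STABLE is fetched into the per-branch clone.
cd ../REL9_0_STABLE
git pull -q
master_seen_after=$(git rev-parse origin/master)
master_on_server=$(git --git-dir="$scratch/server.git" rev-parse master)
stable_local=$(git rev-parse REL9_0_STABLE)
stable_on_server=$(git --git-dir="$scratch/server.git" rev-parse REL9_0_STABLE)
```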

Dependent Clone per Branch, Pushing and Pulling From a Local Repository

Git will automatically use hard links when cloning a repository stored on the local machine. So, you could do this:

git clone --bare --mirror ssh://git@gitmaster.postgresql.org/postgresql.git
git clone postgresql.git REL9_0_STABLE
cd REL9_0_STABLE
git checkout REL9_0_STABLE
git config branch.REL9_0_STABLE.rebase true
git config user.name "Foo Bar"
git config user.email fbar@postgresql.org
git config remote.origin.fetch '+refs/heads/REL9_0_STABLE:refs/remotes/origin/REL9_0_STABLE'
git branch -D master

All of these steps except the first should be repeated for each branch for which you wish to maintain a clone. (If your user name and/or email are configured globally, you need not configure them again for each new repository.)

With this approach, the REL9_0_STABLE repository will pull from and push to the local postgresql.git mirror, which in turn will pull from and push to the master server. Thus, you must do this to update (supposing both repositories are in your home directory):

cd ~/postgresql.git
git fetch
cd ~/REL9_0_STABLE
git pull

And to push, you must do this:

cd ~/REL9_0_STABLE
git push
cd ~/postgresql.git
git push

It would probably be wise to script this, if you plan to do it regularly and with multiple branches.
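One possible shape for such a script is sketched below. The function names pg_update and pg_push and their argument conventions are hypothetical; each takes the mirror directory followed by one or more per-branch clones and runs the same commands as above in the same order. The second half only exercises the functions against throwaway repositories, with server.git standing in for gitmaster.

```shell
#!/bin/sh
# Hypothetical helpers wrapping the two-step update and push shown above.
pg_update() {   # usage: pg_update <mirror-dir> <clone-dir>...
    mirror=$1
    shift
    git --git-dir="$mirror" fetch -q || return 1
    for clone in "$@"; do
        (cd "$clone" && git pull -q) || return 1
    done
}

pg_push() {     # usage: pg_push <mirror-dir> <clone-dir>...
    mirror=$1
    shift
    for clone in "$@"; do
        (cd "$clone" && git push -q) || return 1
    done
    git --git-dir="$mirror" push -q
}

# Scratch demonstration.
set -e
scratch=$(mktemp -d)
cd "$scratch"
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
git config user.name "Foo Bar"
git config user.email fbar@postgresql.org
echo base > file.txt
git add file.txt
git commit -q -m "Initial commit"
git branch REL9_0_STABLE
cd ..
git clone -q --bare seed server.git            # stand-in for gitmaster
git clone -q --mirror server.git mirror.git    # local mirror
git clone -q mirror.git REL9_0_STABLE          # per-branch clone
cd REL9_0_STABLE
git checkout -q REL9_0_STABLE
git config branch.REL9_0_STABLE.rebase true
git config user.name "Foo Bar"
git config user.email fbar@postgresql.org
cd ..

# A commit arrives on the server from elsewhere; fetch, then pull.
cd seed
git checkout -q REL9_0_STABLE
echo fix >> file.txt
git commit -q -a -m "Fix from another committer"
git push -q ../server.git REL9_0_STABLE
cd ..
pg_update "$scratch/mirror.git" "$scratch/REL9_0_STABLE"
pulled=$(git -C REL9_0_STABLE log -1 --format=%s)

# A local commit travels clone -> mirror -> server.
cd REL9_0_STABLE
echo local >> file.txt
git commit -q -a -m "Local fix"
cd ..
pg_push "$scratch/mirror.git" "$scratch/REL9_0_STABLE"
on_server=$(git --git-dir=server.git log -1 --format=%s REL9_0_STABLE)
```

A real script would probably also run git push --dry-run at each stage first, as recommended in the common setup.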

Clone Locally, Repoint Origin

The solution described in the previous section can be inconvenient, since it requires pushing and pulling each commit twice. One possible way to avoid this is to create a single clone from the master, then clone it multiple times locally, then repoint the origin server for each such local clone at the master. The history existing at the time of the initial clone will be shared among all the local clones (using hard links), but any new history fetched after the initial setup will consume separate storage in each local clone. This still represents a substantial savings in disk space, while avoiding the inconvenience of pushing and pulling twice.

To do this, set up each clone as described in the previous section and then perform the following additional steps:

git remote set-url origin ssh://git@gitmaster.postgresql.org/postgresql.git
git remote update
git remote prune origin 

Dependent Clone per Branch, Pulling From a Local Repository and Pushing to the Remote Repository

This method is like the "Dependent Clone per Branch, Pushing and Pulling From a Local Repository" recipe, but instead of pushing back to the local repository, you set each clone up to push directly to the remote repository. In each clone of your local mirror, do:

git remote set-url --push origin ssh://git@gitmaster.postgresql.org/postgresql.git

This requires a fairly modern version of git.

You can also avoid having to fetch into the mirror and then pull into the clone in separate steps by using a shell function. Here is one that works with bash to combine these steps:

pgpull ()
{
  # Fetch into the mirror in a subshell, so the current directory is untouched;
  # pull into the current clone only if the fetch succeeded.
  (cd /path/to/mirror && git fetch) && git pull
}

Committing Using a Single Clone and Multiple Workdirs

Note: Git 2.5 and later provide the "git worktree" command, which can be used to accomplish this in a more officially supported way than git-new-workdir.
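As a rough illustration of the git worktree alternative, the sketch below builds a throwaway repository (the directory names mirror those used later in this section, but nothing here touches gitmaster) and attaches a second working tree for a back branch. It assumes a Git version recent enough to have both git worktree add and git worktree list.

```shell
#!/bin/sh
# Scratch demonstration of "git worktree" as an alternative to git-new-workdir.
set -e
scratch=$(mktemp -d)
mkdir "$scratch/pgsql-git"
cd "$scratch/pgsql-git"

git init -q postgresql
cd postgresql
git symbolic-ref HEAD refs/heads/master
git config user.name "Foo Bar"
git config user.email fbar@postgresql.org
echo base > file.txt
git add file.txt
git commit -q -m "Initial commit"
git branch REL9_0_STABLE

# One extra working tree per back branch, all sharing postgresql/.git.
git worktree add ../90stable REL9_0_STABLE

main_branch=$(git rev-parse --abbrev-ref HEAD)
stable_branch=$(git -C ../90stable rev-parse --abbrev-ref HEAD)
worktree_count=$(git worktree list | wc -l | tr -d ' ')
git worktree list
```

In a real clone, where the back branch exists only as origin/REL9_0_STABLE, the equivalent command would be git worktree add -b REL9_0_STABLE ../90stable origin/REL9_0_STABLE.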

This method is similar to the method with single clone, but you can keep each active branch checked out all the time.

1. Create a directory to hold the cloned repository and all the workdirs (this makes it easier to remember that they're all linked to the same clone).

mkdir pgsql-git; cd pgsql-git

2. Clone the master repository. Note that this is not the same as the public repository; however, changes from the master repository are regularly pushed to the public repository.

git clone ssh://git@gitmaster.postgresql.org/postgresql.git

This creates a directory called "postgresql", and the master branch is automatically checked out into that working directory. The cloned git repository is in "postgresql/.git", which is shared with all the other workdirs we create later.

3. Prevent automatic git garbage collection.

git --git-dir=postgresql/.git/ config gc.auto 0

Rationale: With this method, you have multiple checkouts from a single repository, but "git gc" does not know about the other working directories. That is not a problem in general, but if you run "git gc" while you have staged but uncommitted changes in a workdir other than the master one, those changes can be lost. This is a known limitation of git-new-workdir; see [1]. Run "git gc" only when all the back-branch checkouts are in a clean state, and you should be safe.

4. Create workdirs for all active back-branches. The git-new-workdir tool is in git's contrib directory; the path in the example below is for Debian:

sh /usr/share/doc/git/contrib/workdir/git-new-workdir postgresql/.git/ 90stable
cd 90stable
git checkout -b REL9_0_STABLE origin/REL9_0_STABLE
cd ..
 <repeat for every active back-branch>

5. You can now work separately on the checkouts as you would on CVS. All the local branches, tracking the corresponding remote branches, are in the repository shared by all workdirs. To commit a patch to any of the branches, you can use any of the normal git commands. For example, if you've manually modified files or applied a patch from the mailing list that modifies existing files but does not create any new ones, you can just do:

git commit -a

If you've added new files, you must git add them first.

git add file1 file2 file3
git commit -a

Or, if the changes you want to commit are on a local branch, you can collapse the commits on the branch into a single commit on the tracking branch using:

git merge --squash branchname

Make sure to use --squash, or you'll end up with a merge commit.

6. Finally, you must push your changes back to the server. With push.default set to "matching" (the default before Git 2.0), this will push changes in *all* branches you've committed to:

git push --dry-run

This will show what's being pushed without doing anything yet. Double-check the changes, using "git log" and "git diff" with the commit ID ranges printed out. Once you're satisfied, push them for real:

git push

7. To pull down changes others have committed, you can of course use:

git pull

If you have unpushed changes to any of the branches that have changed on the server, then (1) git will automatically attempt to rebase the currently checked-out branch (because of the configuration you did earlier) and (2) each other branch that needs to be rebased will be out-of-sync with the server. The easiest way to fix this is to just check out the offending branch and re-pull, e.g.

git checkout REL9_0_STABLE
git pull

8. If one of your tracking branches gets messed up somehow (e.g. you accidentally merge into it, or commit something with the wrong author name/tag) and you can't figure out how to fix it, you can just snap it back to the state in which it exists on the master, throwing away your local changes, e.g.

git checkout master
git reset --hard origin/master

In both commands, make sure to use the branch name matching the workdir you're in.
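Step 3 above advises running "git gc" only when every workdir is clean. A check along those lines could be scripted as in the following sketch. The helper name all_clean is hypothetical, and it is stricter than strictly necessary, since git status --porcelain also reports unstaged and untracked files. The demonstration uses two independent throwaway repositories as stand-ins for workdirs.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every listed working directory has
# no staged, unstaged, or untracked changes.
all_clean() {   # usage: all_clean <workdir>...
    for dir in "$@"; do
        if [ -n "$(git -C "$dir" status --porcelain)" ]; then
            echo "uncommitted changes in $dir" >&2
            return 1
        fi
    done
    return 0
}

# Scratch demonstration; two separate repositories stand in for workdirs.
set -e
scratch=$(mktemp -d)
cd "$scratch"
for name in postgresql 90stable; do
    git init -q "$name"
    git -C "$name" config user.name "Foo Bar"
    git -C "$name" config user.email fbar@postgresql.org
    echo base > "$name/file.txt"
    git -C "$name" add file.txt
    git -C "$name" commit -q -m "Initial commit"
done

if all_clean postgresql 90stable; then before=clean; else before=dirty; fi

# Stage a change in one workdir without committing it.
echo change >> 90stable/file.txt
git -C 90stable add file.txt
if all_clean postgresql 90stable 2>/dev/null; then after=clean; else after=dirty; fi

echo "before=$before after=$after"
```

With the real layout, this could guard garbage collection as: all_clean postgresql 90stable && git --git-dir=postgresql/.git gc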

Making a new release branch on origin

To create a new branch in the gitmaster repo starting from the current tip of master, do this:

git pull           # be sure you have the latest "master"
git push origin master:refs/heads/"new-branch-name"

For example:

git push origin master:refs/heads/REL_12_STABLE

After this, check out the branch locally following whichever of the previous arrangements you are using.

By convention, only release branches should be pushed to the gitmaster repo; don't push experimental or feature branches there.
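The branch-creation push described in this section can be rehearsed against a scratch repository first. In the sketch below, origin.git is a local bare repository standing in for gitmaster and seed is a throwaway repository used to populate it; both names are assumptions made for the demonstration. It pushes master to a new REL_12_STABLE ref, then checks the branch out locally and confirms that it tracks the new remote branch.

```shell
#!/bin/sh
# Scratch demonstration of creating a release branch on the remote.
set -e
scratch=$(mktemp -d)
cd "$scratch"

git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
git config user.name "Foo Bar"
git config user.email fbar@postgresql.org
echo base > file.txt
git add file.txt
git commit -q -m "Initial commit"
cd ..
git clone -q --bare seed origin.git     # stand-in for gitmaster
git clone -q origin.git postgresql
cd postgresql
git config branch.master.rebase true
git config user.name "Foo Bar"
git config user.email fbar@postgresql.org

git pull -q                              # be sure you have the latest "master"
git push -q origin master:refs/heads/REL_12_STABLE

branch_tip=$(git --git-dir="$scratch/origin.git" rev-parse REL_12_STABLE)
master_tip=$(git rev-parse master)

# Check the new branch out locally; it is set up to track the remote one.
git fetch -q origin
git checkout -q REL_12_STABLE
current_branch=$(git rev-parse --abbrev-ref HEAD)
upstream=$(git rev-parse --abbrev-ref 'REL_12_STABLE@{upstream}')
echo "$current_branch tracks $upstream"
```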