Developer FAQ/zh

From PostgreSQL wiki

Revision as of 10:35, 1 December 2013 by Jack (Talk | contribs)


Getting Involved

How do I get involved in PostgreSQL development?

Download the code and have a look around. See downloading the source tree.

Subscribe to and read the pgsql-hackers mailing list (often termed "hackers"). This is where the major contributors and core members of the project discuss development.

How do I download/update the current source tree?

There are several ways to obtain the source tree. Occasional developers can just get the most recent source tree snapshot from ftp://ftp.postgresql.org/pub/snapshot/.

Regular developers might want to take advantage of anonymous access to our source code management system. The source tree is currently hosted in Git. For details of how to obtain the source from Git, see http://developer.postgresql.org/pgdocs/postgres/git.html and Working with Git.

What development environment is required to develop code?

PostgreSQL is developed mostly in the C programming language. The source code is targeted at most of the popular Unix platforms and the Windows environment (XP, Windows 2000, and up).

Most developers run a Unix-like operating system and use an open source tool chain with GCC, GNU Make, GDB, Autoconf, and so on. If you have contributed to open source software before, you will probably be familiar with these tools. Developers using this tool chain on Windows make use of MinGW, though most development on Windows is currently done with the Microsoft Visual Studio 2005 (version 8) development environment and associated tools.

The complete list of required software to build PostgreSQL can be found in the installation instructions.

Developers who regularly rebuild the source often pass the --enable-depend flag to configure. The result is that if you make a modification to a C header file, all files that depend on that file are also rebuilt.

src/Makefile.custom can be used to set environment variables, like CUSTOM_COPT, that are used for every compile.

What areas need work?

Outstanding features are detailed in Todo.

You can learn more about these features by consulting the archives, the SQL standards and the recommended texts (see books for developers).

How do I get involved in PostgreSQL web site development?

PostgreSQL website development is discussed on the pgsql-www mailing list and organized by the Infrastructure team. Source code for the postgresql.org web site is stored in a Git repository.

Development Tools and Help

How is the source code organized?

If you point your browser at How PostgreSQL Processes a Query, you will see a few paragraphs describing the data flow, the backend components in a flow chart, and a description of the shared memory area. You can click on any flowchart box to see a description. If you then click on the directory name, you will be taken to the source directory, to browse the actual source code behind it. We also have several README files in some source directories to describe the function of the module. The browser will display these when you enter the directory.

Other than documentation in the source tree itself, you can find some papers/presentations discussing the code at http://www.postgresql.org/developer/coding. An excellent presentation is at http://neilconway.org/talks/hacking/.

What tools are available for developers?

First, all the files in the src/tools directory are designed for developers.

   RELEASE_CHANGES changes we have to make for each release
   ccsym           find standard defines made by your compiler
   copyright       fixes copyright notices
   entab           converts spaces to tabs, used by pgindent
   find_static     finds functions that could be made static
   find_typedef    finds typedefs in the source code
   find_badmacros  finds macros that use braces incorrectly
   fsync           a script to provide information about the cost of cache
                    syncing system calls
   make_ctags      make vi 'tags' file in each directory
   make_diff       make *.orig and diffs of source
   make_etags      make emacs 'etags' files
   make_keywords   make comparison of our keywords and SQL'92
   make_mkid       make mkid ID files
   git_changelog   used to generate a list of changes for each release
   pginclude       scripts for adding/removing include files
   pgindent        indents source files
   pgtest          a semi-automated build system
   thread          a thread testing script

In src/include/catalog:

   unused_oids     a script that finds unused OIDs for use in system catalogs
   duplicate_oids  finds duplicate OIDs in system catalog definitions

tools/backend was already described in the question-and-answer above.

Second, you really should have an editor that can handle tags, so you can tag a function call to see the function definition, and then tag inside that function to see an even lower-level function, and then back out twice to return to the original function. Most editors support this via tags or etags files.

Third, you need to get id-utils from ftp://ftp.gnu.org/gnu/idutils/. By running tools/make_mkid, an archive of source symbols can be created that can be rapidly queried.

Some developers make use of cscope, which can be found at http://cscope.sf.net/. Others use glimpse, which can be found at http://webglimpse.net/.

tools/make_diff has tools to create patch diff files that can be applied to the distribution. This produces context diffs for easier readability.

pgindent is used to fix the source code style to conform to our standards, and is normally run at the end of each development cycle; see this question for more information on our style.

pginclude contains scripts used to add needed #include's to include files, and remove unneeded #include's.

When adding built-in objects such as types or functions, you will need to assign OIDs to them. Our convention is that all hand-assigned OIDs are distinct values in the range 1-9999. (It would work mechanically for them to be unique within individual system catalogs, but for clarity we require them to be unique across the whole system.) There is a script called unused_oids in src/include/catalog that shows the currently unused OIDs. To assign a new OID, pick one that is free according to unused_oids, and for bonus points pick one that is nearby to related existing objects. See also the duplicate_oids script, which will complain if you made a mistake.

What's the formatting style used in PostgreSQL source code?

Our standard format is BSD style, with each level of code indented one tab, where each tab is four spaces. You will need to set your editor or file viewer to display tabs as four spaces.

For vi (in .exrc or .vimrc):

set tabstop=4 shiftwidth=4 noexpandtab

For less or more, specify -x4 to get the correct indentation.

The tools/editors directory of the latest sources contains sample settings that can be used with the emacs, xemacs and vim editors, to assist in keeping to PostgreSQL coding standards.

pgindent will format the code by specifying flags to your operating system's utility indent. pgindent is run on all source files just before each beta test period. It auto-formats all source files to make them consistent. Comment blocks that need specific line breaks should be formatted as block comments, where the comment starts as /*------. These comments will not be reformatted in any way.

See also the Formatting section in the documentation. This posting talks about our naming of variable and function names.

If you're wondering why we bother with this, this article describes the value of a consistent coding style.

Is there a diagram of the system catalogs available?

Yes, we have at least one (SVG version).

What books are good for developers?

There are five good books:

  • An Introduction to Database Systems, by C.J. Date, Addison-Wesley
  • A Guide to the SQL Standard, by C.J. Date, et al., Addison-Wesley
  • Fundamentals of Database Systems, by Elmasri and Navathe
  • Transaction Processing, by Jim Gray and Andreas Reuter, Morgan Kaufmann
  • Transactional Information Systems, by Gerhard Weikum and Gottfried Vossen, Morgan Kaufmann

What is configure all about?

The files configure and configure.in are part of the GNU autoconf package. Configure allows us to test for various capabilities of the OS, and to set variables that can then be tested in C programs and Makefiles. Autoconf is installed on the PostgreSQL main server. To add options to configure, edit configure.in, and then run autoconf to generate configure.

When configure is run by the user, it tests various OS capabilities, stores those in config.status and config.cache, and modifies a list of *.in files. For example, if there exists a Makefile.in, configure generates a Makefile that contains substitutions for all @var@ parameters found by configure.

When you need to edit files, make sure you don't waste time modifying files generated by configure. Edit the *.in file, and re-run configure to recreate the needed file. If you run make distclean from the top-level source directory, all files derived by configure are removed, so you see only the files contained in the source distribution.

How do I add a new port?

There are a variety of places that need to be modified to add a new port. First, start in the src/template directory. Add an appropriate entry for your OS. Also, use src/config.guess to add your OS to src/template/.similar. You shouldn't match the OS version exactly. The configure test will look for an exact OS version number, and if not found, find a match without the version number. Edit src/configure.in to add your new OS. (See the configure item above.) You will need to run autoconf, or patch src/configure too.

Then, check src/include/port and add your new OS file, with appropriate values. Hopefully, there is already locking code in src/include/storage/s_lock.h for your CPU. There is also a src/makefiles directory for port-specific Makefile handling. There is a backend/port directory if you need special files for your OS.

Why don't you use threads, raw devices, async-I/O, <insert your favorite wizz-bang feature here>?

There is always a temptation to use the newest operating system features as soon as they arrive. We resist that temptation.

First, we support 15+ operating systems, so any new feature has to be well established before we will consider it. Second, most new wizz-bang features don't provide dramatic improvements. Third, they usually have some downside, such as decreased reliability or additional code required. Therefore, we don't rush to use new features but rather wait for the feature to be established, then ask for testing to show that a measurable improvement is possible.

As an example, threads are not currently used instead of multiple processes for backends because:

  • Historically, threads were poorly supported and buggy.
  • An error in one backend can corrupt other backends if they're threads within a single process.
  • Speed improvements using threads are small compared to the remaining backend startup time.
  • The backend code would be more complex.
  • Terminating backend processes allows the OS to cleanly and quickly free all resources, protecting against memory and file descriptor leaks and making backend shutdown cheaper and faster.
  • Debugging threaded programs is much harder than debugging worker processes, and core dumps are much less useful.
  • Sharing of read-only executable mappings and the use of shared_buffers means processes, like threads, are very memory efficient.
  • Regular creation and destruction of processes helps protect against memory fragmentation, which can be hard to manage in long-running processes.

(Whether individual backend processes should use multiple threads to make use of multiple cores for single queries is a separate question not covered here).

So, we are not ignorant of new features. It is just that we are cautious about their adoption. The TODO list often contains links to discussions showing our reasoning in these areas.

How are branches managed?

See Using Back Branches and Committing with Git for information about how branches and backporting are handled.

Where can I get a copy of the SQL standards?

You are supposed to buy them from ISO or ANSI. Search for ISO/ANSI 9075. ANSI's offer is less expensive, but the contents of the documents are the same between the two organizations.

Since buying an official copy of the standard is quite expensive, most developers rely on one of the various draft versions available on the Internet. Some of these are:

The PostgreSQL documentation contains information about PostgreSQL and SQL conformance.

Some further web pages about the SQL standard are:

Note that having access to a copy of the SQL standard is not necessary to become a useful contributor to PostgreSQL development. Interpreting the standard is difficult and needs years of experience. And most features in PostgreSQL are not specified in the standard anyway.

Where can I get technical assistance?

Many technical questions held by those new to the code have been answered on the pgsql-hackers mailing list - the archives of which can be found at http://archives.postgresql.org/pgsql-hackers/.

If you cannot find discussion of your particular question, feel free to put it to the list.

Major contributors also answer technical questions, including questions about development of new features, on IRC at irc.freenode.net in the #postgresql channel.

Why haven't you replaced CVS with SVN, Git, Monotone, VSS, <insert your favorite SCMS here>?

The project switched to Git in September 2010.

Development Process

What do I do after choosing an item to work on?

Send an email to pgsql-hackers with a proposal for what you want to do (assuming your contribution is not trivial). Working in isolation is not advisable because others might be working on the same TODO item, or you might have misunderstood the TODO item. In the email, discuss both the internal implementation method you plan to use, and any user-visible changes (new syntax, etc). For complex patches, it is important to get community feedback on your proposal before starting work. Failure to do so might mean your patch is rejected. If your work is being sponsored by a company, read this article for tips on being more effective.

Our queue of patches to be reviewed is maintained via a custom CommitFest web application at http://commitfest.postgresql.org.

How do I test my changes?

Basic system testing

The easiest way to test your code is to ensure that it builds against the latest version of the code and that it does not generate compiler warnings.

It is advisable to pass --enable-cassert to configure. This will turn on assertions within the source, which will often make bugs more visible because they cause data corruption or segmentation violations. This generally makes debugging much easier.

Then, perform run time testing via psql.

Runtime environment

To test your modified version of PostgreSQL, it's convenient to install PostgreSQL into a local directory (in your home directory, for instance) to avoid conflicting with a system wide installation. Use the --prefix= option to configure to specify an installation location; --with-pgport to specify a non-standard default port is helpful as well. To run this instance, you will need to make sure that the correct binaries are used; depending on your operating system, environment variables like PATH and LD_LIBRARY_PATH (on most Linux/Unix-like systems) need to be set. Setting PGDATA will also be useful.

To avoid having to set this environment up manually, you may want to use Greg Smith's peg scripts, or the scripts that are used on the buildfarm.

Regression test suite

The next step is to test your changes against the existing regression test suite. To do this, issue "make check" in the root directory of the source tree. If any tests fail, investigate.

If you've deliberately changed existing behavior, this change might cause a regression test failure but not any actual regression. If so, you should also patch the regression test suite.

Other run time testing

Some developers make use of tools such as valgrind (http://valgrind.kde.org) for memory testing, gprof (which comes with the GNU binutils suite) and oprofile (http://oprofile.sourceforge.net/) for profiling and other related tools.

What about unit testing, static analysis, model checking...?

There have been a number of discussions about other testing frameworks and some developers are exploring these ideas.

Keep in mind the Makefiles do not have the proper dependencies for include files. You have to do a make clean and then another make. If you are using GCC you can use the --enable-depend option of configure to have the compiler compute the dependencies automatically.

I have developed a patch, what next?

You will need to submit the patch to pgsql-hackers@postgresql.org. To help ensure your patch is reviewed and committed in a timely fashion, please try to follow the guidelines at Submitting a Patch.

What happens to my patch once it is submitted?

It will be reviewed by other contributors to the project and will be either accepted or sent back for further work. The process is explained in more detail at Submitting a Patch.

How do I help with reviewing patches?

If you would like to contribute by reviewing a patch in the CommitFest queue, you are most welcome to do so. Please read the guide at Reviewing a Patch for more information.

Do I need to sign a copyright assignment?

No, contributors keep their copyright (as is the case for most European countries anyway). They simply consider themselves to be part of the Postgres Global Development Group. (It's not even possible to assign copyright to PGDG, as it's not a legal entity.) This is the same way that the Linux Kernel and many other Open Source projects work.

May I add my own copyright notice where appropriate?

No, please don't. We like to keep the legal information short and crisp. Additionally, we've heard that it could possibly pose problems for corporate users.

Doesn't the PostgreSQL license itself require keeping the copyright notice intact?

Yes, it does. And it is, because the PostgreSQL Global Development Group covers all copyright holders. Also note that US law doesn't require any copyright notice for getting the copyright granted, just like most European laws.

Technical Questions

How do I efficiently access information in system catalogs from the backend code?

You first need to find the tuples (rows) you are interested in. There are two ways. First, SearchSysCache() and related functions allow you to query the system catalogs using predefined indexes on the catalogs. This is the preferred way to access system tables, because the first call to the cache loads the needed rows, and future requests can return the results without accessing the base table. A list of available caches is located in src/backend/utils/cache/syscache.c. src/backend/utils/cache/lsyscache.c contains many column-specific cache lookup functions.

The rows returned are cache-owned versions of the heap rows. Therefore, you must not modify or delete the tuple returned by SearchSysCache(). What you should do is release it with ReleaseSysCache() when you are done using it; this informs the cache that it can discard that tuple if necessary. If you neglect to call ReleaseSysCache(), then the cache entry will remain locked in the cache until end of transaction, which is tolerable during development but not considered acceptable for release-worthy code.

If you can't use the system cache, you will need to retrieve the data directly from the heap table, using the buffer cache that is shared by all backends. The backend automatically takes care of loading the rows into the buffer cache. To do this, open the table with heap_open(). You can then start a table scan with heap_beginscan(), then use heap_getnext() and continue as long as HeapTupleIsValid() returns true. Then do a heap_endscan(). Keys can be assigned to the scan. No indexes are used, so all rows are going to be compared to the keys, and only the valid rows returned.

You can also use heap_fetch() to fetch rows by block number/offset. While scans automatically lock/unlock rows from the buffer cache, with heap_fetch(), you must pass a Buffer pointer, and ReleaseBuffer() it when completed.

Once you have the row, you can get data that is common to all tuples, like t_self and t_oid, by merely accessing the HeapTuple structure entries. If you need a table-specific column, you should take the HeapTuple pointer, and use the GETSTRUCT() macro to access the table-specific start of the tuple. You then cast the pointer, for example as a Form_pg_proc pointer if you are accessing the pg_proc table, or Form_pg_type if you are accessing pg_type. You can then access fields of the tuple by using the structure pointer:

((Form_pg_class) GETSTRUCT(tuple))->relnatts

Note however that this only works for columns that are fixed-width and never null, and only when all earlier columns are likewise fixed-width and never null. Otherwise the column's location is variable and you must use heap_getattr() or related functions to extract it from the tuple.

Also, avoid storing directly into struct fields as a means of changing live tuples. The best way is to use heap_modifytuple() and pass it your original tuple, plus the values you want changed. It returns a palloc'ed tuple, which you pass to heap_update(). You can delete tuples by passing the tuple's t_self to heap_delete(). You use t_self for heap_update() too. Remember, tuples can be either system cache copies, which might go away after you call ReleaseSysCache(), or read directly from disk buffers, which go away when you heap_getnext(), heap_endscan(), or ReleaseBuffer(), in the heap_fetch() case. Or it may be a palloc'ed tuple, that you must pfree() when finished.

Why are table, column, type, function, view names sometimes referenced as Name or NameData, and sometimes as char *?

Table, column, type, function, and view names are stored in system tables in columns of type Name. Name is a fixed-length, null-terminated type of NAMEDATALEN bytes. (The default value for NAMEDATALEN is 64 bytes.)

   typedef struct nameData
   {
       char        data[NAMEDATALEN];
   } NameData;
   typedef NameData *Name;

Table, column, type, function, and view names that come into the backend via user queries are stored as variable-length, null-terminated character strings.

Many functions are called with both types of names, e.g. heap_open(). Because the Name type is null-terminated, it is safe to pass it to a function expecting a char *. Because there are many cases where on-disk names (Name) are compared to user-supplied names (char *), there are many cases where Name and char * are used interchangeably.

Why do we use Node and List to make data structures?

We do this because this allows a consistent way to pass data inside the backend in a flexible way. Every node has a NodeTag which specifies what type of data is inside the Node. Lists are groups of Nodes chained together as a forward-linked list. The ordering of the list elements might or might not be significant, depending on the usage of the particular list.

Here are some of the List manipulation commands:

lfirst(i)
lfirst_int(i)
lfirst_oid(i)
return the data (a pointer, integer or OID respectively) of list cell i.
lnext(i)
return the next list cell after i.
foreach(i, list)
loop through list, assigning each list cell to i.

It is important to note that i is a ListCell *, not the data in the List cell. You need to use one of the lfirst variants to get at the cell's data.

Here is a typical code snippet that loops through a List containing Var * cells and processes each one:

           List        *list;
           ListCell    *i;
           ...
           foreach(i, list)
           {
               Var *var = (Var *) lfirst(i);
               ...
               /* process var here */
           }
lcons(node, list)
add node to the front of list, or create a new list with node if list is NIL.
lappend(list, node)
add node to the end of list.
list_concat(list1, list2)
Concatenate list2 on to the end of list1.
list_length(list)
return the length of the list.
list_nth(list, i)
return the i'th element in list, counting from zero.
lcons_int, ...
There are integer versions of these: lcons_int, lappend_int, etc. Also versions for OID lists: lcons_oid, lappend_oid, etc.

You can print nodes easily inside gdb. First, to disable output truncation when you use the gdb print command:

(gdb) set print elements 0

Instead of printing values in gdb format, you can use the next two commands to print out List, Node, and structure contents in a verbose format that is easier to understand. Lists are unrolled into nodes, and nodes are printed in detail. The first prints in a short format, and the second in a long format:

(gdb) call print(any_pointer)
(gdb) call pprint(any_pointer)

The output appears in the server log file, or on your screen if you are running a backend directly without a postmaster.

I just added a field to a structure. What else should I do?

The structures passed around in the parser, rewriter, optimizer, and executor require quite a bit of support. Most structures have support routines in src/backend/nodes used to create, copy, read, and output those structures -- in particular, most node types need support in the files copyfuncs.c and equalfuncs.c, and some need support in outfuncs.c and possibly readfuncs.c. Make sure you add support for your new field to these files. Find any other places the structure might need code for your new field -- searching for references to existing fields of the struct is a good way to do that. mkid is helpful with this (see available tools).

Why do we use palloc() and pfree() to allocate memory?

palloc() and pfree() are used in place of malloc() and free() because we find it easier to automatically free all memory allocated when a query completes. This assures us that all memory that was allocated gets freed even if we have lost track of where we allocated it. There are special non-query contexts that memory can be allocated in. These affect when the allocated memory is freed by the backend.

What is ereport()?

ereport() is used to send messages to the front-end, and optionally terminate the current query being processed. See here for more details on how to use it.

What is CommandCounterIncrement()?

Normally, statements can not see the rows they modify. This allows UPDATE foo SET x = x + 1 to work correctly.

However, there are cases where a transaction needs to see rows affected in previous parts of the transaction. This is accomplished using a Command Counter. Incrementing the counter allows transactions to be broken into pieces so each piece can see rows modified by previous pieces. CommandCounterIncrement() increments the Command Counter, creating a new part of the transaction.

I need to make some changes to query parsing. Can you succinctly explain the parser files?

The parser files live in the 'src/backend/parser' directory.

scan.l defines the lexer, i.e. the algorithm that splits a string (containing an SQL statement) into a stream of tokens. A token is usually a single word (i.e., doesn't contain spaces but is delimited by spaces), but can also be a whole single or double-quoted string for example. The lexer is basically defined in terms of regular expressions which describe the different token types.

gram.y defines the grammar (the syntactical structure) of SQL statements, using the tokens generated by the lexer as basic building blocks. The grammar is defined in BNF notation. BNF resembles regular expressions but works on the level of tokens, not characters. Also, patterns (called rules or productions in BNF) are named, and may be recursive, i.e. use themselves as sub-patterns.

The actual lexer is generated from scan.l by a tool called flex. You can find the manual at http://flex.sourceforge.net/manual/

The actual parser is generated from gram.y by a tool called bison. You can find the manual at http://www.gnu.org/s/bison/.

Beware, though, that you'll have a rather steep learning curve ahead of you if you've never used flex or bison before.

I get a shift/reduce conflict that I don't know how to deal with

See Fixing_shift/reduce_conflicts_in_Bison

What debugging features are available?

First, if you are developing new C code you should ALWAYS work in a build configured with the --enable-cassert and --enable-debug options. Enabling asserts turns on many sanity checking options. Enabling debug symbols supports use of debuggers (such as gdb) to trace through misbehaving code. When compiling on gcc, the additional cflags -ggdb -Og are also useful, as they insert the maximum amount of debugging info detail; you can pass them to configure with something like:

   ./configure --enable-cassert --enable-debug CFLAGS="-ggdb -Og"

The postgres server has a -d option that allows detailed information to be logged (elog or ereport DEBUGn printouts). The -d option takes a number that specifies the debug level. Be warned that high debug level values generate large log files. This option isn't available when starting the server via pg_ctl, but you can use -o log_min_messages=debug4 or similar instead.

If the postmaster is running, start psql in one window, then find the PID of the postgres process used by psql using SELECT pg_backend_pid(). Use a debugger to attach to the postgres PID - gdb -p 1234 or, within a running gdb, attach 1234. You can set breakpoints in the debugger and then issue queries from the psql session.

If you are looking to find the location that is generating an error or log message, set a breakpoint at errfinish. This will trap on all elog and ereport calls for enabled log levels, so it may be triggered a lot. If you're only interested in ERROR/FATAL/PANIC, use a gdb conditional breakpoint for errordata[errordata_stack_depth].elevel >= 20, or set a source-line breakpoint within the cases for PANIC, FATAL, and ERROR in errfinish. Note that not all errors go through errfinish; in particular, permissions checks are thrown separately. If your breakpoint doesn't trigger, git grep for the error text and see where it's thrown from.

If you are debugging something that happens during session startup, you can set PGOPTIONS="-W n", then start psql. This will cause startup to delay for n seconds so you can attach to the process with the debugger, set appropriate breakpoints, then continue through the startup sequence.

If the postmaster is not running, you can actually run the postgres backend from the command line, and type your SQL statement directly. This is almost always a bad way to do things, however, since the usage environment isn't nearly as friendly as psql (no command history for instance) and there's no chance to study concurrent behavior. You might have to use this method if you broke initdb, but otherwise it has nothing to recommend it.

You can also compile with profiling to see which functions consume execution time --- configuring with --enable-profiling is the recommended way to set this up. (You usually shouldn't use --enable-cassert or -Og when studying performance issues, since the checks that cassert enables are not always cheap, but --enable-debug is fine with gcc.) Profile files from server processes will be deposited in the pgsql/data directory. Profile files from clients such as psql will be put in the client's current directory.
