Switching PostgreSQL documentation from SGML to XML

From PostgreSQL wiki

For about the last 10 years, there have been occasional discussions about switching the PostgreSQL documentation from SGML to XML (whatever that might mean). Here are some points to consider.

Technical points

There is a distinction between changing the source format from SGML to XML and switching the toolchain from DSSSL to XSLT. Even though DSSSL is typically associated with SGML and XSLT with XML, in fact all combinations work more or less equally well.

Arguments on SGML vs XML

  • DocBook 5, the next major version, will no longer publish an SGML DTD. But DocBook 5 has been in the works for years and it might be many more years until it is the standard version and all previous versions have disappeared.
  • SGML is easier to edit than XML, because of tag minimization and some other simplifications.
  • SGML supports conditional sections (used to build the INSTALL file, for example). That needs a different solution for XML.
  • There might be more editing tools that support XML, but this needs to be substantiated.
  • Translation teams are using XML.
  • We'd probably want to rename *.sgml -> *.xml. Easier now with Git, but needs some planning.
  • With xmllint, we could run a sort of pgindent on the documentation.
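
As a rough illustration of the "different solution" that conditional sections would need in XML, one common approach is attribute-based filtering: elements are tagged with a condition, and a preprocessing step drops the ones that do not apply to the current build. The sketch below is only an assumption about how that could look. The `condition` attribute is a standard DocBook attribute, but the values `standalone` and `full`, the `profile` function, and the sample text are hypothetical and not taken from the PostgreSQL sources. It treats the attribute as a `;`-separated list.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one paragraph for the standalone INSTALL file,
# one for the full documentation, and one that is unconditional.
SOURCE = """<chapter>
  <para condition="standalone">See the file INSTALL.</para>
  <para condition="full">See the installation chapter.</para>
  <para>Run configure.</para>
</chapter>"""


def _remove_keeping_tail(parent, child):
    """Remove child from parent without losing the text that follows it."""
    tail = child.tail or ""
    index = list(parent).index(child)
    if index == 0:
        parent.text = (parent.text or "") + tail
    else:
        previous = parent[index - 1]
        previous.tail = (previous.tail or "") + tail
    parent.remove(child)


def profile(root, active_conditions):
    """Drop every element whose 'condition' attribute names no active condition.

    Elements without a 'condition' attribute are always kept.
    """
    # Snapshot the tree first so that removals do not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            condition = child.get("condition")
            if condition is None:
                continue
            wanted = {c.strip() for c in condition.split(";")}
            if wanted.isdisjoint(active_conditions):
                _remove_keeping_tail(parent, child)
    return root
```

Calling `profile(ET.fromstring(SOURCE), {"standalone"})` keeps the first and third paragraphs and drops the second. In practice the same filtering would more likely be done by an XSLT profiling step than by a separate script; the Python form is used here only to make the idea concrete.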

Arguments on DSSSL vs XSLT

  • The DSSSL tools have been unmaintained for years.
  • Small question marks on FOP (XSLT to PDF tool): Is it maintained, is there a stable version (as opposed to endless development pre-releases), can it run on an entirely free Java installation? Should dblatex be used instead of FOP?
  • The customizations that we have done to the DSSSL stylesheets would have to be ported to XSLT. This work could start already.
  • The XSLT build is currently a lot slower than the DSSSL build.
  • XSLT might offer more flexibility such as partial builds.
  • There are more tools that work with XSLT in various ways.
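
To make the "partial builds" point concrete, here is a minimal sketch of the underlying check: rebuild only those outputs that are missing or older than their source. This is the same test `make` performs, not a description of any existing PostgreSQL build rule, and the function name and the idea of a one-to-one source/output mapping are assumptions. Cross-references between chapters would make a real partial build more involved.

```python
import os


def stale_outputs(pairs):
    """Return the (source, output) pairs whose output needs rebuilding.

    An output is considered stale if it does not exist or if its
    modification time is older than that of its source file.
    """
    stale = []
    for source, output in pairs:
        if not os.path.exists(output):
            stale.append((source, output))
        elif os.path.getmtime(output) < os.path.getmtime(source):
            stale.append((source, output))
    return stale
```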

Formats We Generate

A complete solution needs to be able to generate all of the following output forms:

  • HTML
  • Manual pages
  • RTF
  • PDF and PostScript
    • A4 format
    • US format
  • Texinfo
  • ePub

Toolchain Components

In order to do a switchover, it will be necessary to replace all of the components. It is worth enumerating these components, so that nobody makes the error of thinking that some partial solution to one part represents a complete solution.

  • A set of scripts that transform DocBook into useful output forms (HTML, PDF, RTF, man pages).
  • The DSSSL stylesheets used to render DocBook into the various output forms. They might be replaced by XSLT stylesheets.
  • A DSSSL engine.
  • An alternative DSSSL engine.
  • A DSSSL engine that generates TeX-based output.