
Re: Managing database change: improvements needed

Well, other than the standard Perl steps "copy to blib" and "manify POD", we don't really have build steps of our own. I'm indeed proposing that we have a build step. I envision it to be something like this:

* Create PostgreSQL database
* Play Sqitch scripts
* Dump schema from the database

and then consider the dumped schema a build artifact which doesn't need version control. This step would be executed before creating a distribution archive.
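A minimal sketch of that pre-distribution step, assuming standard PostgreSQL client tools and a `db:pg:` Sqitch target (database and file names here are invented):

```shell
createdb lsmb_build                    # 1. create a scratch PostgreSQL database
sqitch deploy db:pg:lsmb_build         # 2. play the Sqitch scripts against it
pg_dump --schema-only lsmb_build > sql/ledgersmb-schema.sql   # 3. dump the schema
dropdb lsmb_build                      # the dump is the build artifact; the db is discarded
```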

The pre-distribution step should - if we go with Sqitch - also include compiling a Sqitch bundle, so we don't distribute development progress, branches, or whatever else might be in the Sqitch repository (which is, as far as I understand it, stored in the development tree).
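Sqitch has a `bundle` command for exactly this; something like the following (the destination directory is an assumption) would copy only the plan and the deploy/revert/verify scripts, leaving development clutter behind:

```shell
sqitch bundle --dest dist/sqitch-bundle
```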

I was assuming we would ship the schema.  So for a new db, no need for the end user to build from sqitch.  We could also use sqitch to generate migration files etc.  That way *we* need sqitch to get things going, but users don't need it.

Ok. My idea was (and still is) that we ship the schema for *new* databases, but we'd ship a sqitch bundle for upgrades. How do you think we should handle upgrades, if we don't require users to have at least part of sqitch installed?
Oh, one downside to Sqitch, reading the docs: it requires union merge on the sqitch.plan file, so we may want to wait a little for this to be supported in Mercurial. That would also provide an easily checkable intermediary step.

Agreed, but union merge has now been merged into Mercurial; AFAICT, it'll be part of the 3.6 release which will be some time around October -- much earlier than we'll be changing over to use Sqitch, I'd imagine (thinking we'd start using Sqitch at 1.5.0).
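Once that lands, enabling it would presumably be a one-line hgrc entry mapping the plan file to Mercurial's internal union merge tool:

```ini
[merge-patterns]
sqitch.plan = internal:union
```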

PG Extensions are definitely in the "reuse work of others" category but the system solves a different problem than sqitch.

This is an interesting difference in views and we should explore it a bit.
Indeed the two together could be a very powerful combination.

PG Extensions would allow us to ship db components (table sets, etc.) in a versioned way. There is likely to be a learning curve here, and it adds a major build dependency for people installing from our source packages. But it provides some really nice features, including versioning and dependency tracking/enforcement.

What PG Extensions would allow us to do would be to effectively ship versioned pieces of our db schema and have an orderly way of upgrading them.

Ok. So far, I'm not sure I see the difference with Sqitch, apart from the fact that Sqitch is something we'd have to depend on while extensions are built into PostgreSQL. One place where I can see a difference - though my current estimation is that we wouldn't be using it - is managing *external* dependencies. But with Sqitch we can manage internal dependencies, so if that's what we're using extensions for, what would be your envisioned difference?
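For reference, internal dependencies in Sqitch are declared in the plan file by listing required changes in square brackets; a hypothetical fragment (change names invented for illustration):

```
%syntax-version=1.0.0
%project=ledgersmb

base_schema 2015-08-01T10:00:00Z A Developer <dev@example.com> # core tables
accounts [base_schema] 2015-08-02T10:00:00Z A Developer <dev@example.com> # requires base_schema
```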

There are a few differences.  First, reloading the db stored procedures can be as simple as "drop extension... create extension...." across each of the processing modules.

Depending on whether we'd design the modules to also support the "ledgersmb extension" use-case - as Budgetting was for 1.3 - extensions should also contain tables, I think (is this even possible?). Dropping the extension becomes quite problematic in that case. My proposed use of Sqitch is entirely directed at versioning the core LedgerSMB database schema, though, so if we restrict the use of extensions to that, I guess you're right.

However, I think John's point is also very important: if we want to support cloud-hosted databases (and I think we do?), then how would that interoperate with extensions? I know how it would interoperate with Sqitch: seamlessly, because everything will be run over normal database connections.
The second thing is we can track internal dependencies at install time without pushing another versioning system onto users (they need the pg build tools installed, but those will be present for any source build and are only a small package away for Debian users).

I'm not sure what you mean here: in my vision users wouldn't need to learn about it; we'd simply add App::Sqitch as a dependency of LedgerSMB, meaning it'll be loaded from CPAN or from the packaging system. Much like Starman and - in 1.6 (?) - Dancer.

Then, in setup.pl, we can invoke Sqitch's APIs to run the database upgrades required. No need for users to see anything about Sqitch.
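A first cut of setup.pl might not even need the API and could simply shell out; a hedged sketch (paths and connection details are assumptions):

```shell
cd /usr/share/ledgersmb/sql && \
  sqitch deploy db:pg://lsmb_dbadmin@localhost/company1
```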

Secondly, we can make our version tracking useful for external modules.

While I do see value in offering version tracking to add-ins and other extensions, I think we have way too little infrastructure in place for people to start writing add-ins at all: currently, it's impossible to add a tab to the contacts screen without patching a truckload of files. And that's even just an extremely basic use case. If we want to offer an extension system (which we probably do, in time), we'd need to look long and hard at Drupal, Joomla and others to copy their best practices. For now, I'm thinking that providing a stable webservice API is much more important.
For things like stored procedure sections, we could always drop the extension and load a new version. But it would make it very clean. And we could even break up parts of the db into modules that could be independently versioned.

Getting this tooling to work right is not a minor project, however. It definitely cannot be done in a minor release because of the additional dependencies, and I think we'd need a policy against adding extensions to core in minor versions. However, it would also provide a nice platform for third-party integration and reuse.

Each of our modules could be an extension and separately versioned.

Ok. I like the technical side of that, but I'm not sure we need that at this point. I *do* see value for this in terms of optional modules. Say the way we used to have Budgetting as an extension in 1.3 (although it solves only 20% of the problem, because hooking up Budgetting into the web-application was actually a bigger problem than getting the tables installed).

Right.  This is in part for the benefit of optional modules and third party applications.

Ok. But do you mean that it is to support the 20% of the problem, leaving the 80% unsolved? Combined with John's point about cloud-hosted databases not having our extensions installed on the server (thus disallowing "CREATE EXTENSION ..." for our extensions), I'm very much leaning toward using Sqitch over extensions.

For branching and merging, the extensions would have a number specific to development head and could be re-numbered during the build process.  For modules all we really need to do is drop extension/create extension if the version number has changed.

We'd probably want to reserve a number series for "next in version series."  I am thinking that using 999 would be a good one, so 1.4.999 would be "whatever next comes in 1.4" and this would allow us to install and run out of the development tree.
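In extension terms that would live in each module's control file; a hypothetical example (module name and metadata invented):

```ini
# ledgersmb_accounts.control
comment = 'LedgerSMB accounts processing module'
default_version = '1.4.999'    # "whatever next comes in 1.4"
relocatable = false
requires = 'ledgersmb_base'
```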

Hmmm. Sqitch - like version control systems - does this by itself (in this case by generating a hash, like Git and other DVCSes).

Right. I am proposing a difference between what we *ship* and what we develop for storage schemas. I am also saying I don't think that is needed for the stored proc modules.

I too am proposing we make a difference between what we ship and what we develop :-) However, my difference comes from using a development/release machine to build, on every release, a clean DB schema for the creation of new databases and a Sqitch bundle file for upgrading existing databases. Neither exists during development; both have to be generated upon release.

BTW, what I found out is that Sqitch generates a hash chain ("blockchain") of the changes rolled out to the database, not of the changes in the plan file. Using that chain in the database, it validates that changes are applied to the database in the correct order. So, contrary to my original thought, changes can (usually) be merged between branches without big issues.

If we're going to go with Sqitch, I don't think it's a good idea to switch just then and there, but test it to build up some experience and then choose a well defined point in time to start using it.

Agreed.  Maybe start on plugins?

Actually, I was thinking of starting with 1.5 *after* it's been upgraded from 1.4? However, that would mean that *everybody* would need to migrate through 1.5.0, because that's the moment we'll move to Sqitch? (Or we could choose 1.6 or ...?)

That places a lot of responsibility on a .0 release. I would suggest, if we go that route, that we still maintain migration tools for 1.4 and 1.3 and ship db schemas rather than Sqitch plans. Maybe after we have experience with it, we can switch to Sqitch plans in a future migration branch?

At first I didn't understand what you meant here at all. I think I do now. You think I was saying we do not provide upgrade scripts. (Although I don't see how else we'd get at 1.5.0.)

What I meant was (and is) this process: when upgrading from 1.3 or 1.4, the upgrade scripts upgrade the install from the respective 1.3/1.4 version to the database schema that corresponds with 1.5.0. At that point, the schema gets "Sqitch initialized" and from that point on, all upgrades are going to be run through the Sqitch bundle of a given release.

Given that all upgrades in Fixes.sql always need to be run anyway, why is this too big a responsibility for Sqitch or the 1.5.0 release?

So, there are 2 proposals here. What do you say?

I don't see how we can get away from doing some things ourselves, so I guess the question is what and where to reuse other work. 

So my preference would be sqitch and pg extensions.  In my mind this would require a two-step build process for dbs:

1.  Build the schema extensions using sqitch
2.  Install all pg extensions using make.

Ok. Is this on releases only, or with our day-to-day work on the tree?

For db storage schema changes, on release only.  We can always play sqitch on dev databases. 

This far, we're on the same page for new installs. But what about upgrades? (e.g. 1.5.0->1.5.2)
The second step, though, does impose some user dependencies, in that pg_config and other PostgreSQL development tools must be installed before the build can be done. Of course, for those installing debs, this would happen when building the debs rather than when installing them.

If we use Sqitch only, we also need it as a dependency, both for developers and for end users. However, we probably don't need to teach our users about it if we simply use it as a library.

True, but we have to support it.  And we have to depend on it, make sure it is installed, etc.

True, but there's a *huge* list of items we currently depend on (TeX is hundreds of MBs!); I do see a small risk here, but I find our dependency on TeX for PDF output much more thorny. Maybe we should consult Jame for his opinion about the extra dependency? I mean, he's maintaining packages for LedgerSMB. (I could see how we'd make sure there are packages available for Ubuntu/Debian for Sqitch, either through apt.ledgersmb.org or otherwise.)

Also, there are a bunch of areas I am not 100% sure about in this regard (which is why I think extensions are cleaner).
For example we support a bunch of different versions of PostgreSQL.  Some, but not all, will throw an error if you do something like:

create or replace function foo (bar int) returns bool language sql as $$ select true; $$;
create or replace function foo (baz int) returns bool language sql as $$ select true; $$;

The reason is that more recent versions of Pg tightened up the handling of named arguments, so a change in argument name now requires dropping the function before the new one can be created.
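On those versions an upgrade script therefore has to drop the old signature first; a sketch:

```sql
-- The argument rename (bar -> baz) can't be done with CREATE OR REPLACE:
DROP FUNCTION IF EXISTS foo(int);
CREATE FUNCTION foo (baz int) RETURNS bool LANGUAGE sql AS $$ SELECT true; $$;
```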

Right. We are lacking upgrade tests in our test suite, but this is definitely one of the things to be added to our test suite, because upgrades have been broken over exactly this point. (I just added https://github.com/ledgersmb/LedgerSMB/issues/814  .)
I don't know how sophisticated sqitch is in this regard, and how easy it will be to mess things up so a rebuild ends up with dropped functions that are never reloaded.

As we have it right now, storage tables are separate from data processing routines so with the extension system rebuilding is as simple as:
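(The commands were left implicit in the original message; presumably something along these lines, per processing module, with the extension name invented for illustration:)

```sql
DROP EXTENSION ledgersmb_invoice;
CREATE EXTENSION ledgersmb_invoice;   -- reloads the module at the packaged version
```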


If something goes wrong, the extensions are exactly where they were before the failed effort.  In other words, our entire db rebuild/upgrade process would be entirely transactional (while right now it is one transaction per module), and it would be more predictable, clearer, simpler, and with probably less code.

Right. Sqitch runs each change in its own transaction, not the entire list of changes in one. However, since every change runs in a separate transaction, the database remains in a well-defined state at all times: it can be upgraded from the "current" state to any later state with a well-defined list of scripts.



http://efficito.com -- Hosted accounting and ERP.
Robust and Flexible. No vendor lock-in.
Ledger-smb-devel mailing list