Re: Managing database change: improvements needed

On Sat, Sep 12, 2015 at 4:27 PM, R. Ransbottom <..hidden..> wrote:

On Fri, Sep 11, 2015 at 02:35:58PM +0200, Erik Huelsmann wrote:

> > For myself, I am leaning toward setting up a local repository.
> > I would have liked to avoid adding a larger version control
> > requirement for my eventual replacement.

> sure, can you describe how you think you will track the main repository and
> how you will manage the database schema changes?

As a user, I'd have the trunk, branches for local production and local
development. I would look to have infrequent upgrades--roughly annually.

Beyond that: a branch to give back.

Most local changes are simplistic, storing and displaying extra data fields.
This was the territory I thought lsmb might have some protocols or
conventions to help manage or just prevent name collisions. Annually,
checking them and rebuilding those would not be too onerous; that without
a VCS.

Ok. That's pretty straight forward indeed.

> > For minor local changes, managing the sql model is relatively easy,
> > it is the user view that tends to grow into the odd cracks.

>
> Could you elaborate what you mean by this statement? If the sql model of
> your repository starts to diverge from the sql model in the main
> repository, how is that relatively easy to manage?

I meant that _small_ changes to the SQL, like above, are _relatively_
easy to find/track without a VCS. That is about the nature of data
definition languages. If the SQL code is spread around, it is still
findable (\dS tablename, pg_dump). From 1.4.0 to 1.4.15, there have not
been that many changes to the database.

Right. That's the intent, really: the tables shouldn't change at all (preferably) during a minor release series. The stored procedures would change in so far as to fix bugs (preferably without changing the functional API's). However, sometimes that doesn't fully work...

(So far the dojo code is pretty obscure to me, but I expect it to
have other concerns when it is creating tables--anyway I will do Dojo 101.)

I don't have any special wisdom in dealing with database change.
But with a core business transaction application, I do have caution.

Ok. That's sane. With the small scripts that Sqitch provides, you'd be handed the differences between the various schema versions by us in the form of a list of script names (the Sqitch plan) and a number of scripts.

> > > >>> We have a main schema file for the table definitions and a number of
> > > >>> modules with stored procedures grouped by "subject".
> >
> > > >>> Next to that, there's a "Fixes" file, which contains all incremental
> > > >>> updates to the schema since some version.
> >
> > > >>> When I want to change the schema as part of our development, I need
> > to
> > > >>> change the schema definition files *and* I need to add the schema
> > changes
> > > >>> to the Fixes.sql file.

I would use make to generate Pg-database.sql from Fixes.sql.
Pg-database.sql being refactored and Fixes.sql destroyed at appropriate
milestones--stopping stones for user upgrades.

Exactly. The current plan is to generate a Pg-database.sql for *every* release. People can then quickly set up new databases from scratch. People with pre-existing databases can have the scripts applied that the releases between their version and the newly installed release comes with. Sqitch will help that process by determining which ones that are.

> Yes, the little scripts *will* be written over time, because we have small

Point taken.

> changes to the database schema every now and then. Currently these small
> scripts are all "lumped" into a single large file called Fixes.sql. The

I feel there is a qualitative difference to adding to a script versus
repeatedly starting on a blank page--context, style, less naming.

> that they're in will fail, leading to humongous numbers of errors in the
> database creation/upgrade logs. As a result, it's very hard for people to
> report problems to our mailing list and even worse, users may have doubts
> about the quality of the software, with these huge numbers of errors.

I'd try to write a script with good error catching.

It's not impossible, but a *lot* more work than tracking which changes have been applied using a program like Sqitch. As in: so far as I currently know, we'd have to query the schema catalog on every change and only execute the schema change if the schema catalog indicates it's not already been applied. There are 2 ways I know how to do this:

1) Number/name the scripts and keep a catalog of the names/numbers, which can be queried by the update program

2) Query PostgreSQL's own schema catalog for specific characteristics and conditionally execute the DDL change commands

(2) would AFAIK require coding a specific stored procedure for each change, while (1) is exactly the direction I'm proposing (by using Sqitch to do it for us, that is).

> Wondering what you mean here. Are you planning to copy over all data from
> one version to another every time you upgrade say from 1.4.x -> 1.4.y

Yes. What else?

LedgerSMB uses a different approach where we modify the existing schema by adding columns, foreign keys, other constraints and stored procedures (or removing them, where appropriate). You could use the same approach. If we were to go with the use of Sqitch, you could even develop your "add my columns" scripts, add them to Sqitch, execute the Sqitch changes and then, upon update to 1.4.y, we'd provide you with the scripts to update your schema from 1.4.(x') to 1.4.(y'). Where (x') is the schema including your changes and (y') is the schema for the new version, including your changes (as long as there are no conflicts, of course).

This way, you get the benefit of re-using the efforts from the project and a full re-analysis of the schema wouldn't be necessary.

> (x<y)? If so, are you planning to develop scripts every time?

Yes, though I expect much reuse in the typical cases.

Agreed. That's very much to be expected.

> One reason why I like the way Sqitch works is that you'll be able to put
> your schema changes in sqitch files and deploy those on your LedgerSMB
> version. Then, when a new release comes out, you simply add those to the
> end of the list of sqitch files and run the standard LedgerSMB upgrade
> procedure. LedgerSMB will then simply upgrade the database with your
> modifications in it.

This sounds very much like how it is done now: q/[Ss]qitch//g.
Is writing three scripts making things less difficult?

No :-) But Sqitch does a number of things we currently don't do:

1) Check deployment consistency before deployment of new changes

2) Allow verification of the deployment afterwards

3) Prevent duplicate application of changes,

--> making the deployment logs clean of errors

--> no longer frightening users who take a peek in this log

4) Provide a downgrade path

http://stories.iovation.com/why-we-sqitch seems like an example where
sqitch is worth the effort.

Right. I see what you mean and I basically agree. However, we look at this on a different scale, I think: I'm looking at it from the perspective of the software that the LedgerSMB project distributes. This means thousands of databases across the world in way more than 2 data centers (sometimes just at home), completely out of the control of the project, where deployments should "simply work", upgrading from anywhere from 1.2.<latest> or SQL Ledger 2.6/2.8 to the version being upgraded to.

I'd like the switch to Sqitch to help people with local schema modifications to maintain their modified database and enjoy the regular updates of the project. How exactly this should pan out is something we can design now, but I think we'll find out more as we go too. I sure would like to use this opportunity to formulate and formalize the advice you were looking for but not finding in exactly this matter.

Please note that I'm very much valueing your feedback, although the above might not come across that way (at least I'm not sure it does). I've tried to explain my line(s) of reasoning more than trying to persuade anybody to do anything.

Bye,

Erik.

http://efficito.com -- Hosted accounting and ERP.

Robust and Flexible. No vendor lock-in.