[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Database updates



On 07/15/2011 12:04 PM, Chris Travers wrote:

> Now that the menus should be pretty stable for the duration of 1.3
> (absent bugs which you'd probably want flagged), what would you expect
> such an output to look like that you can't get from a diff on various
> versions of Pg-database?   I mean, what specifically is missing?
> Table names?  Something else?  Or is it a matter of noise from
> comparing dumps?

It's a matter of comparing the actual schema in the database with the
codebase. I'm looking for a quick way to tell if the failed upgrade
scripts have left my database schema in the old state or was just noisy
about success, without having to go table-by-table to compare manually
against Pg-database.sql.

Sure, it's easy to compare different versions of Pg-database.sql. But I
recently skipped forward 2 weeks on trunk, ran all the new upgrade
scripts, many reported errors, and there were several hundred lines of
diffs in Pg_database.sql. I really do not want to spend hours trying to
get my schema to match, figuring out which upgrades failed, and which
schema changes were not even in an upgrade script -- I'd much sooner
spend those hours building a tool that will tell me at any point in the
future where my schema does not match the code.

I can't be alone in this...

>> So I'm suggesting:
>>
>> 1. Schema index file, parsed by Perl/Bash/whatever
>> 2. base schema sql file, auto-generated from a clean install by a script
>> that parses #1
>> 3. base sql data file, to populate menu tables, etc.
> So how would this work exactly?  Are you thinking one DDL file per
> table?  I guess I am wondering if this would create more maintenance
> problems than it would solve.  I mean you have to store the DDL
> somewhere, or you have to generate it from some other format which is
> semantically reducible to the SQL DDL as supported by PostgreSQL.
> Quite frankly if we are going that route, we should be using SQL DDL
> instead.
>
>

No, I'm thinking one file for all of the tables, in a pre-determined
order. I want a quick way to tell if the schema I have in the database
matches what LSMB expects. If I have extra tables in the database, what
I have in mind would simply ignore them -- making it so you can have
add-ons without breaking this...

I'm looking for an automated way of creating a Pg-database.sql file so I
can compare it at any time against any (future) version of LSMB, and see
what schema alterations are necessary. I already can compare two
different versions of Pg-database.sql -- but I don't know how to quickly
tell how my installed schema matches up, without going table-by-table.

I'm assuming by DDL you mean the SQL statements to create a table? Or
some other format? I understand SQL, so that would be my preference, if
you're talking about some other format of DDL...


>> Hmm. Is there any way in Postgres to define column ordering via an alter
>> statement? This is something MySQL supports ;-)
> Nope:
> http://wiki.postgresql.org/wiki/Alter_column_position
>
> Apparently nobody wants this badly enough to actually do the work required.
>
>> This might be an issue for developers, but for regular users, I think we
>> just need to make sure we don't change the update scripts after creating
>> them -- as long as people follow the upgrade procedures, wouldn't new
>> columns get added to the end of the table? Or does this vary by Pg version?
> Do we want to be limited to always adding new fields to the end of the
> table?  I would rather have the base DDL files focused on clarity for
> developers if given the choice because it means fewer errors and
> omissions.

Ok, well, as a developer, I can handle it if diff shows me a column
definition deleted in one line and added to another. I think we should
take care to make upgrade scripts going forward work for less technical
users, so that column sequence either matches and doesn't trigger a
diff, or we provide a more sophisticated tool that understands the DDL.
(Personally I'd suggest let's start with diff and just add columns to
the end of the table for minor releases, and then perhaps refactor
column orders at major releases).


> Another issue that we'd run into has to do with self-joins.  In
> general the backup/restore processes from PostgreSQL are only
> guaranteed to work in the event you dump both the schema and data
> together in one file.  This is an issue for tables like menu_node.
> It's certainly possible to get around this problem (by dropping the
> fkey, loading the data, recreating it) but it means additional effort
> and complexity and I think we'd have to weigh the benefit of doing so
> against the benefit of using well tested tools to dump data in a
> reloadable format rather than maintaining that by hand.

Ok. I've been using pg_dump --format=c/pg_restore -c for backups -- but
that's a different problem than comparing the schema...


> So I guess I think we should start from a few more basic questions:
>
> Now that the menus should be pretty stable for the duration of 1.3
> (absent bugs which you'd probably want flagged), what would you expect
> such an output to look like that you can't get from a diff on various
> versions of Pg-database?   I mean, what specifically is missing?
> Table names?  Something else?  Or is it a matter of noise from
> comparing dumps?

Looking to solve the following problems with as little pain as possible:

1. Initial database load
2. Update database schema when it's altered
3. Compare installed database schema with the current release/version


#1 works fine now, the upgrade scripts (when they are complete and don't
fail) mostly address #2, but #3 is a pain in the butt. Unless there's
some trick I don't know about?

So I'm proposing that the release contains something like the following,
for managing the database schema:

1. base_schema_list.txt, in a load order that works
2. schema_dump.pl, script that generates #3 from an installed database,
using #1 to specify the dump order
3. base_schema.sql, data schema that can be automatically generated from
#1 and #2
4. base_data.sql, data that only ever gets loaded once, when the company
db is first created
5. modules/loadorder.txt , might as well script the install of functions
while we're at it
6. modules/xxx.sql , individual module stored functions, guaranteed to
be possible to re-run at any time without breaking things, perhaps with
the caveat that roles need to be reloaded
7. modules/<dbname>_roles.sql , basically what we do now, possibly with
a helper script to assist
8. upgrades/<svn-commitlevel>-description.sql , manually-built scripts
that alter the database to make running #2 with an updated
release/version generate a file identical to #3 at that commitlevel, as
we're currently doing.


Does that all make sense, or am I missing something/overthinking this?

Cheers,
-- 
John Locke
http://www.freelock.com