[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Database updates

On Fri, Jul 15, 2011 at 3:18 PM, John Locke <..hidden..> wrote:

>>> So I'm suggesting:
>>> 1. Schema index file, parsed by Perl/Bash/whatever
>>> 2. base schema sql file, auto-generated from a clean install by a script
>>> that parses #1
>>> 3. base sql data file, to populate menu tables, etc.
>> So how would this work exactly?  Are you thinking one DDL file per
>> table?  I guess I am wondering if this would create more maintenance
>> problems than it would solve.  I mean you have to store the DDL
>> somewhere, or you have to generate it from some other format which is
>> semantically reducible to the SQL DDL as supported by PostgreSQL.
>> Quite frankly if we are going that route, we should be using SQL DDL
>> instead.
> No, I'm thinking one file for all of the tables, in a pre-determined
> order. I want a quick way to tell if the schema I have in the database
> matches what LSMB expects. If I have extra tables in the database, what
> I have in mind would simply ignore them -- making it so you can have
> add-ons without breaking this...

Ok, so if one file for all tables, wouldn't the index have to be
generated from the base table script and not the other way around?  If
so, what's wrong with generating it from the Pg-database.sql?  I can't
imagine the index file has enough info to generate the schema file by
itself, or you wouldn't need an index.

There may be something else available right now that can help.  The
current test rig contains a large number of database tests which do
not run by default but are production safe.

If I run:

PGUSER=postgres PGPASSWORD='foo' PGDATABASE='lsmb13test6'
LSMB_TEST_DB=1 make test

against an out of date test db, I get:

Test Summary Report
t/43-dbtest.t               (Wstat: 1024 Tests: 148 Failed: 4)
  Failed tests:  54, 97, 104-105
  Non-zero exit status: 4
Files=18, Tests=2563, 20 wallclock secs ( 0.54 usr  0.05 sys + 13.31
cusr  0.93 csys = 14.83 CPU)
Result: FAIL
Failed 1/18 test programs. 4/2563 subtests failed.
make: *** [test_dynamic] Error 255

Just a note here on test safety:  All tests are safe on production
with the exception of the LWP test rig which has additional safeties
that must be disabled before they can be run (i.e. doesn't run unless
you really really really tell it to).  We have layers of safety here
to ensure test data doesn't pollute a production database including
the fact that all scripts run in transactions and they never commit,
meaning that all inserts etc. roll back.  These run mock-live tests
against a production database and flag misbehaviors or problems found.
 In addition to looking at schemas, they also look at, for example,
stored procedure overloading, table permissions (albeit in a cursory
way), etc.  More information about how to run test cases is found in
the README.tests.

My own preference would be to try to build additional tests into the
database testing routines if there is a need to do so.  After all, if
we are looking for a way to test,we should probably test
automatically.  Of course this involves highly recommending test rigs
as a support tool.

If I were trying to do this what I would do is:
1)  Do a clean load into a designated test database environment.
2)  Connect to both databases using psql, list all tables and types in
the public schema
3)  List schema for each table, parsing the psql output and sorting
consistently, and then comparing.
4)  Each comparison would be a test case.  Tables only in production
db would be ignored.  Tables only in the test base db would be
5)  Clean up the test db afterwards.

1 and 5 could be accomplished by adding an interface to the relevant
test scripts for db creation and teardown.  The others would have to
involve another test script of some sort.  This is a bit of work.  So
my first question is:  are the test rigs themselves (which test stored
procedure behavior, function overloading, and some some permissions
issues) sufficiently inadequate to justify putting other stuff on hold
while coding something like this.  One caveat about the test framework
is that some things will likely not have test cases written until they
are nearing completion for the reason that desired behavior in the
lower levels of a program can change a fair bit between first sitting
down and starting to work on something to get it ready to put into

> I'm looking for an automated way of creating a Pg-database.sql file so I
> can compare it at any time against any (future) version of LSMB, and see
> what schema alterations are necessary. I already can compare two
> different versions of Pg-database.sql -- but I don't know how to quickly
> tell how my installed schema matches up, without going table-by-table.

pg_dump -s [database_name] will get you most of the way there (it's a
backup of the database structure).  Note however, you'd want to
compare against a similar dump taken from a new installation for the
reason that it will have a lot of stuff from contrib that's not in the
original file.

> I'm assuming by DDL you mean the SQL statements to create a table? Or
> some other format? I understand SQL, so that would be my preference, if
> you're talking about some other format of DDL...

Yes.  DDL (Data Definition Language) is the subset of SQL used to
define data structure, like tables, etc.

>>> Hmm. Is there any way in Postgres to define column ordering via an alter
>>> statement? This is something MySQL supports ;-)
>> Nope:
>> http://wiki.postgresql.org/wiki/Alter_column_position
>> Apparently nobody wants this badly enough to actually do the work required.
>>> This might be an issue for developers, but for regular users, I think we
>>> just need to make sure we don't change the update scripts after creating
>>> them -- as long as people follow the upgrade procedures, wouldn't new
>>> columns get added to the end of the table? Or does this vary by Pg version?
>> Do we want to be limited to always adding new fields to the end of the
>> table?  I would rather have the base DDL files focused on clarity for
>> developers if given the choice because it means fewer errors and
>> omissions.
> Ok, well, as a developer, I can handle it if diff shows me a column
> definition deleted in one line and added to another. I think we should
> take care to make upgrade scripts going forward work for less technical
> users, so that column sequence either matches and doesn't trigger a
> diff, or we provide a more sophisticated tool that understands the DDL.
> (Personally I'd suggest let's start with diff and just add columns to
> the end of the table for minor releases, and then perhaps refactor
> column orders at major releases).
>> Another issue that we'd run into has to do with self-joins.  In
>> general the backup/restore processes from PostgreSQL are only
>> guaranteed to work in the event you dump both the schema and data
>> together in one file.  This is an issue for tables like menu_node.
>> It's certainly possible to get around this problem (by dropping the
>> fkey, loading the data, recreating it) but it means additional effort
>> and complexity and I think we'd have to weigh the benefit of doing so
>> against the benefit of using well tested tools to dump data in a
>> reloadable format rather than maintaining that by hand.
> Ok. I've been using pg_dump --format=c/pg_restore -c for backups -- but
> that's a different problem than comparing the schema...

use pg_dump -s for dumping schema only.
>> So I guess I think we should start from a few more basic questions:
>> Now that the menus should be pretty stable for the duration of 1.3
>> (absent bugs which you'd probably want flagged), what would you expect
>> such an output to look like that you can't get from a diff on various
>> versions of Pg-database?   I mean, what specifically is missing?
>> Table names?  Something else?  Or is it a matter of noise from
>> comparing dumps?
> Looking to solve the following problems with as little pain as possible:
> 1. Initial database load
> 2. Update database schema when it's altered
> 3. Compare installed database schema with the current release/version
> #1 works fine now, the upgrade scripts (when they are complete and don't
> fail) mostly address #2, but #3 is a pain in the butt. Unless there's
> some trick I don't know about?
> So I'm proposing that the release contains something like the following,
> for managing the database schema:
> 1. base_schema_list.txt, in a load order that works
> 2. schema_dump.pl, script that generates #3 from an installed database,
> using #1 to specify the dump order
> 3. base_schema.sql, data schema that can be automatically generated from
> #1 and #2
> 4. base_data.sql, data that only ever gets loaded once, when the company
> db is first created
> 5. modules/loadorder.txt , might as well script the install of functions
> while we're at it
> 6. modules/xxx.sql , individual module stored functions, guaranteed to
> be possible to re-run at any time without breaking things, perhaps with
> the caveat that roles need to be reloaded
> 7. modules/<dbname>_roles.sql , basically what we do now, possibly with
> a helper script to assist
> 8. upgrades/<svn-commitlevel>-description.sql , manually-built scripts
> that alter the database to make running #2 with an updated
> release/version generate a file identical to #3 at that commitlevel, as
> we're currently doing.
> Does that all make sense, or am I missing something/overthinking this?

Makes some sense.  I would suggest that if you aren't doing so, using
the test rigs as part of the update process.  You can run a test rig
against a production database using the example above which also uses
some tricks to run test cases against the application itself (hitting
the db through a commit filter, so nothing can be committed).  Whether
instead of or in addition to anything we decide to add, those test
cases are something I have used and found extremely helpful.

There are also test cases against new databases, but these are really
build-level tests, and so I wouldn't worry about those.

Best Wishes,
Chris Travers