[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

pg_dump, statement logging and revision control of a company database



> > Putting the plaintext backup of the company database under version
> > control. At any time I can and do make snapshots, then examine the
> > diff to see how each LedgerSMB operation affects the database tables.
>
> BTW, a better idea would be to turn on statement logging and revision
> control your logs :-)  database backups can be problematic for revision
> control for a number of reasons including the dump order of the rows.
>
> If anyone needs to know how to do this, let me know :-)

I'll bite. Let's document each method here for the benefit of the list
readers. I think they may be complementary.

I have been using pg_dump, via two convenience scripts (per company)

backup.sh:
pg_dump --format=p -c -U postgres mycompany --file=mycompany.sql

restoredb.sh:
psql -U postgres -d ledgercdog -f ledgercdog.sql

with mycompany.sql under revision control. This allows one to:

- get a birds-eye view of the whole changed row(s), including defaults,
  empty strings and NULLs.

- easily go back to any previous snapshot

The limitation Chris mentioned, that pg_dump tables are not sorted, is
exactly right. Usually pg_dump generates the same or similar sort
between runs, but on occasion it does limit the usefulness of the diff
between revisions.

If it became annoying enough, a well-tested post-processing script could
sort the rows between a COPY statement and the closing "\n\\.\n"

Off topic, but I'd really like to see table sort order implemented in
pg_dump. From what I understand, by the time the database gets big enough
that sorting the table would be a performance problem, you probably aren't
using pg_dump anyway.

I didn't investigate statement logging because I didn't expect that
information alone would let me (easily) roll back to a previous revision.

Personally, after some life-lesson experiences with MSSQL and Exchange
backups, I'll never be without a plaintext database backup again.

I can see that collecting the SQL statements from the log would be
helpful for generating scripts which repetitively operate on the database,
after observing how the UI submits its changes to the database.

Chris, can you jump in to show the recommended way to turn on and off
statement logging, including a persistent setting if possible. What can
the developer can do with that information during the typical data-munging
process of loading a customer's data into a new LedgerSMB instance?

Thanks,
Jeff