
Re: upgrades and database integrity.



Hi,

-------- Original Message --------
Subject: Re: [Ledger-smb-users] upgrades and database integrity.
From: Chris Travers <..hidden..>
To: ..hidden..
Date: Fri 08 Jan 2010 01:30:23 PM PST
>
>
> On Fri, Jan 8, 2010 at 12:52 PM, Alvin Starr <..hidden..> wrote:
>
>
>
>     The database dump is 115K lines. There is about 6 years of data in the
>     system.
>
>     I would argue that all database-driven applications should have a
>     simple flat-file export/import that disentangles the funny database
>     issues on export, tangles them back up on import, and does an
>     integrity check. It also means that you can make huge database
>     structure changes and not have to restrict changes to the level of
>     what can be fixed with a few SQL statements.
>
>
> Well, even if that were supported it wouldn't help in this case.  The
> issue is that SQL-Ledger (and hence earlier versions of LedgerSMB) has
> inadequate constraints, and consequently you get really nasty issues
> with data ambiguity.  At least when fixing it manually you can
> identify the duplicate IDs relatively quickly.  When trying to
> disentangle automatically, I would be concerned about automating fixes
> that don't solve the problem but make it harder to track down.
>

Hmm. Digging into this without being close to the code, I think this
might be a really good approach for upgrading to 1.3. Two scripts: a
dump and a restore. Dump each of the core tables to one big CSV file,
perhaps with each part of a transaction taking a single ginormous row,
with acc_trans joined to ap, ar, gl, and invoice, and indexed in the
transactions table? And then proceed to dump orders, vendors/customers,
parts, projects, etc. to other files.
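
To make that concrete, here's roughly what I have in mind for the dump
side--only a sketch in Python/psycopg2, showing just the
acc_trans/transactions join; the column names (and the table_name
column) are guesses until I've actually read the schema, and ap, ar,
gl, and invoice would be joined in the same way:

import csv
import psycopg2

# Rough sketch only: assumes acc_trans.trans_id points at the shared id
# in the transactions table (roughly the SQL-Ledger layout); the column
# list is a placeholder until the real schema is checked.
DUMP_QUERY = """
    SELECT t.id AS trans_id,
           t.table_name,
           ac.chart_id,
           ac.amount,
           ac.transdate
      FROM transactions t
      JOIN acc_trans ac ON ac.trans_id = t.id
     ORDER BY t.id
"""

def dump_transactions(dsn, outfile="transactions.csv"):
    conn = psycopg2.connect(dsn)
    try:
        cur = conn.cursor()
        cur.execute(DUMP_QUERY)
        with open(outfile, "w", newline="") as fh:
            writer = csv.writer(fh)
            writer.writerow([col[0] for col in cur.description])  # header row
            writer.writerows(cur)  # stream rows straight into the CSV
    finally:
        conn.close()

if __name__ == "__main__":
    dump_transactions("dbname=ledgersmb user=ledgersmb")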

I'm with Alvin on the upgrade--upgrading in place gives me the
willies... I'd prefer to migrate my data into a clean install.

Have the dump script do some integrity checking, and log any problems it
finds with duplicated transaction IDs and other collisions--after
running each dump, group by something that should be unique and log any
IDs that show up more than once.
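
Something like this for the duplicate-ID check--again just a sketch,
and the ar/ap/gl union is my guess at where the collisions live:

import psycopg2

# The classic SQL-Ledger collision is an id used by more than one of
# ar, ap and gl, so union their id columns and keep anything that shows
# up twice.  The log format is just something easy to read next to a
# spreadsheet.
DUP_ID_QUERY = """
    SELECT id, count(*) AS uses
      FROM (SELECT id FROM ar
            UNION ALL SELECT id FROM ap
            UNION ALL SELECT id FROM gl) AS all_ids
  GROUP BY id
    HAVING count(*) > 1
     ORDER BY id
"""

def log_duplicate_ids(dsn, logfile="dump-problems.log"):
    conn = psycopg2.connect(dsn)
    try:
        cur = conn.cursor()
        cur.execute(DUP_ID_QUERY)
        with open(logfile, "a") as log:
            for trans_id, uses in cur.fetchall():
                log.write("id %s appears in %s of ar/ap/gl\n" % (trans_id, uses))
    finally:
        conn.close()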

Then the user can edit the raw data in a spreadsheet, guided by problems
found in the log. Don't try to automatically repair any problems--just
identify them where we can.

And then we can write an import script or stored functions to put these
rows into the right tables.
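
The restore could then be as simple as COPYing the edited CSV into a
staging table and handing off to a stored function--the staging table
and function names below are made up, just to show the shape:

import psycopg2

def restore_transactions(dsn, infile="transactions.csv"):
    # Equally rough: "lsmb_staging" and "import_transactions()" are
    # placeholders for pieces that would still have to be written.
    conn = psycopg2.connect(dsn)
    try:
        cur = conn.cursor()
        with open(infile) as fh:
            cur.copy_expert("COPY lsmb_staging FROM STDIN WITH CSV HEADER", fh)
        # the stored function would validate each row and insert it into
        # acc_trans / ar / ap / gl with fresh, non-colliding ids
        cur.execute("SELECT import_transactions()")
        conn.commit()
    finally:
        conn.close()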

If I get a chance to work on any of these soon, I'll push them up to our
public git repo and let the list know... And yes, some funding might
help that happen quicker ;-)


Cheers,
John Locke
http://www.freelock.com