Re: The plan from here to 1.5.0-rc1 and forward

On Wed, May 11, 2016 at 9:27 PM, Michael Richardson <..hidden..> wrote:

Erik Huelsmann <..hidden..> wrote:
> * Before we release RC1, we create the 1.5 branch
> * When we create the branch, 'master' becomes 1.6.0-dev
> * The 1.5 branch will still report 1.5.0-dev, until we release 1.5.0 at
> which time it switches to 1.5.1-dev, exactly as we do now on 1.4
> * All fixes need to be committed on 'master' before being ported to 1.5
> -- at least as long as the code bases haven't diverged much
> * Since there's more than enough infrastructure for testing on 1.5, PRs
> (fixes) which come with tests (where appropriate) will be handled with
> priority (at least by me)

All this seems reasonable enough.
The issue is how people manage to test new things in the field enough that
people feel comfortable moving forward at times other than after doing their
final report for the year.

Agreed. Actually, that's not only true for new things, but also for existing functionality that gets changed due to rewrites or even fixes. That's why we have started implementing all the tests that we have. And the level of testing we're currently executing is only the start. There's a *lot* more to come. That doesn't mean there won't be bugs. But it does mean that the obvious things aren't going to be broken anymore.

Moving from 1.3 to 1.4 was very painful, enough
that I haven't done it for all the ledgers I take care of.

I know you have had a lot of problems. It'd greatly help me (and maybe others) if you could summarize your problems once you're completely up and running so we can work on the issues you've had.

There's one thing here though. Both 1.2 -> 1.3 and 1.3 -> 1.4 have been full-schema upgrades. This caused a lot of problems for many users, including you (in both migrations?). The upgrade 1.4 -> 1.5 will not be a full-schema upgrade anymore, nor will any other future upgrade if I can help it. What I mean is this: when you upgrade from 1.2 -> 1.3 or 1.3 -> 1.4, all data is copied from your existing schema to a new schema with (mostly) the same tables. While only the new and/or changed tables need complex queries to be filled, the fact that all data is copied introduces risks for data that otherwise would not have been touched. By contrast, the migration from 1.4 to 1.5 doesn't change much in the database schema and as such doesn't require this extensive copying. We have also added a different (better) way of loading schema changes than the current Fixes.sql "patchwork": a change that has already been applied to the schema won't be applied again; this should remove all the (expected) warnings and errors when upgrading from one version to another.
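
To illustrate the "apply each schema change only once" idea, here is a minimal sketch. It is not the actual LedgerSMB implementation: the bookkeeping table `applied_changes`, the function `apply_changes` and the example change are all made up for illustration, and SQLite stands in for PostgreSQL so the example runs on its own.

```python
import sqlite3

def apply_changes(conn, changes):
    """Apply each (name, sql) change at most once; return the names applied now."""
    # Hypothetical bookkeeping table recording which changes already ran.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS applied_changes (name TEXT PRIMARY KEY)"
    )
    already = {row[0] for row in conn.execute("SELECT name FROM applied_changes")}
    applied_now = []
    for name, sql in changes:
        if name in already:
            continue  # recorded on an earlier run, so skip it
        conn.execute(sql)
        conn.execute("INSERT INTO applied_changes (name) VALUES (?)", (name,))
        conn.commit()  # record each change as soon as it has run
        applied_now.append(name)
    return applied_now

# In-memory database with a made-up table, standing in for a company database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY)")

changes = [
    ("add-account-description", "ALTER TABLE account ADD COLUMN description TEXT"),
]

first_run = apply_changes(conn, changes)   # applies the change
second_run = apply_changes(conn, changes)  # finds it recorded and skips it
```

Without the bookkeeping, the second run would try to add the same column again and fail, which is the kind of (expected) error the old approach produced on every upgrade.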

There is a lack of a virtuous cycle of improvements that are easy to deploy
which spark new ideas and new improvements.

Can you detail a bit more what you mean here? Efficito deployed nearly all 1.4 versions ever released. Some of the changes in later 1.4.x releases come from feedback from the customers who were using these new releases. For those who are still running 1.3, it's true, they can't join and enjoy (or hate) this cycle of change. However, there are those who would rather keep things as they are, as long as they are not really broken and they know how to work around the partially broken stuff.

I'm just at the point where I can move things from 1.3 to 1.4, and yet I'd
like to get onto 1.5 as soon as possible so that fixes I might do get in.

While I think we can't support upgrades from 1.2 to 1.5 (they'll have to go through 1.4), it's my desire to support upgrades from 1.3 to 1.5. However, there's a problem with developing upgrades that run smoothly: the data in the wild is *always* dirtier than you hope, and never in the ways you expect. So, in order to develop a great upgrade, we need loads and loads of real-world 1.3 and 1.4 data dumps. Those dumps could be anonymized, randomized, etc., but the database relations should be preserved, broken in exactly the same ways as in the source data.

Do you think it would help if we were to build a data-anonymizer so it would be easier to submit data to the project as migration testing data sets?
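
To make the question a bit more concrete, here is a rough sketch of what such an anonymizer could do. No such tool exists in the project yet; the table layout, column names and the functions `pseudonym` and `anonymize` are invented for the example and do not reflect the real LedgerSMB schema. The point it shows: sensitive text is replaced by stable, meaningless tokens, while keys and references (including broken ones) are left exactly as they are.

```python
import hashlib

def pseudonym(value, salt):
    """Map a value to a stable, meaningless token (same input, same output)."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return "anon-" + digest[:10]

def anonymize(rows, sensitive_columns, salt):
    """Return copies of the rows with sensitive columns replaced; keys untouched."""
    result = []
    for row in rows:
        clean = dict(row)
        for column in sensitive_columns:
            if clean.get(column) is not None:  # missing values stay missing
                clean[column] = pseudonym(str(clean[column]), salt)
        result.append(clean)
    return result

# Made-up sample data, including the kind of dirt a real dump may contain.
customers = [
    {"id": 1, "name": "Example Customer Ltd"},
    {"id": 2, "name": None},
]
invoices = [
    {"id": 10, "customer_id": 1, "notes": "call before delivery"},
    {"id": 11, "customer_id": 99, "notes": "orphaned row"},  # no such customer
]

anon_customers = anonymize(customers, ["name"], salt="per-dump-secret")
anon_invoices = anonymize(invoices, ["notes"], salt="per-dump-secret")
```

Amounts and dates would need their own treatment (randomizing, for example), which this sketch leaves out.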

As I see it there are three things encoded in the version number.
1) database schema information.
2) stored procedure functions and interfaces.
3) Perl code.

The relationship between these three is rather intricate and I think that
this is part of what is breaking the virtuous circle.

Ok. From the way you select your three points, I conclude that the table layout is what you call the database schema information. The version number stored in the database - the number that's checked and reported if it mismatches - is directly linked to the combination of the table layout and the stored procedures. You're probably aware of this, but there's really very little Perl code (in "new code"); the Perl code is mostly just enough to glue the stored procedures and the output templates together. So going back and forth really isn't a matter of replacing the Perl routines and being done.

However, I have successfully backdated databases in the past by reloading the old stored procedures into the database and replacing the Perl layer: simply replace the Perl layer and run the upgrade procedure of 'setup.pl' on the database. Now, that may not *always* work, but it sure works most of the time. During the 1.4 cycle we have had only very few actual schema changes, most of which meant adding tables or columns, not replacing or dropping. Almost always, you would have been able to go back...

If we could number the
database schema as "1.5", rather than "1.5.x", that would mean that we could
move the other things forward *and backwards* more easily.

Since you're providing some really valuable user feedback here, I'm really interested; I hope I'm not sounding defensive. From the above and from my memory, you had a lot of problems getting from 1.3 to 1.4. I can't seem to remember loads of problems (occasional ones maybe) within the 1.3 series. Is my memory correct? Or were there a lot of other problems which would prevent you from doing upgrades (or maybe undermine your confidence to do so)?

I could for instance, try newer perl code against an existing database.

While this makes sense for old code, I don't think this makes sense for new code as all it does is pass data from the web interface to queries and query results to the UI templates. At least, this is from my perception of the development and fixing process. If we want to be sure, we should investigate the fixes which happened over some time and see if changes that were committed, can be decomposed like this.

I could do this with a virtual host, or another VM (or docker image).
(perhaps with read-only permissions).

Well, if you only want to "play" with the new version, I think there's an easier solution to this. If you go into the 'setup.pl' admin area, you can copy the company database. Then, you could run the upgrade on the second copy, play with it and either decide to upgrade the original database, or to report problems to the project and ditch the effort (for that version).

That would be a big boon to development on the perl side.

Realistically, we probably have to number the stored procedures (2), along
with the schema. So, my suggestion for 1.5 is:

"at least as long as the code bases haven't diverged much"

I can see how that would be useful. Chris wanted to go that way, by creating PostgreSQL extensions for the various module files. Then each module (set of routines) gets its own version number. The question is: what does it buy us? I mean, Perl offers the option (actually: really *wants*) to have separate version numbers for each of the files we have in the LedgerSMB/ directory. But we never distribute any of these separately. And since we don't distribute them separately, what's the chance of them getting run separately? Moreover, isn't it so that if we do that, we'll have to provide *guarantees* that it'll work? And if we have to provide guarantees, then won't it be a lot of work to ensure we meet these guarantees? But if it ends up being a lot of work, then will a good percentage of our user base be taking advantage of that effort?

Or should we direct our efforts at providing better upgrade paths where hopefully your experience with the 1.4 upgrade will soon be just a bad memory?

that this be defined by "the interface between 2/3 hasn't been forced to change".
I'd like to say that if it has to change that it's called "1.6", but we may
need to do bug fixes in 1.5 that require changes.

Well, if 1.4 is indicative of 1.5, then I expect that we'll do exactly as you suggest: During the 1.4 release cycle (I just browsed the history of Pg-database.sql to see what happened in the schema), there were only 10 or so actual changes to the schema (the rest of the changes to Pg-database.sql were version number updates). Of these 10-or-so changes, there were a number (3?) which added 'on delete' clauses to foreign keys. There were another 3-or-4 which changed menus (deleted, mostly). And the rest were actual schema changes, although some only changed views.

All in all, the schema was pretty stable, I'd say. I'm hoping and expecting to achieve the same thing for 1.5.

Now, the tone of the above to me feels like I'm defending myself. I don't want to do that. I want to learn from your feedback and I'm thinking about what I can do this week to make your life as a user and our cooperation as a project better. At a few points in the response, I've said "you could have". Maybe you would have done those things, if only you had been aware of the possibility? Some things which I have done as a developer have not been communicated the way you need them to be as a user.

So:

* Would it help to have an anonymizer tool so people (you) can submit their datasets to the project and we as a project can use those data sets to test our migration routines?

* Does it help to know that you can create a copy and simply upgrade that to play with?

* Do we need to have a more extensive FAQ with questions that you still have (or had) which would have been well answered there?

* What more information do you miss to be a good admin of your LedgerSMB instance?

Regards,

Bye,

Erik.

http://efficito.com -- Hosted accounting and ERP.

Robust and Flexible. No vendor lock-in.