Re: [ledgersmb-devel] Setting a policy for our dependencies

On Sun, Aug 27, 2017 at 11:52 PM, Erik Huelsmann <..hidden..> wrote:

Hi,

Thanks for the feedback.

On Sun, Aug 27, 2017 at 12:56 PM, Chris Travers <..hidden..> wrote:
As a couple notes, I would add the following expansions to this.

"Allow the widest possible version range on any module, disallowing individual malfunctioning versions to further extend the range"

I think we are better off documenting bugs in misbehaving versions of modules and perhaps offering a module check utility rather than disallowing certain versions of modules.

How do you envisage the disallowing taking place? I was actually not thinking of actively disallowing specific versions, but rather of documenting the fact that a module version doesn't work with LedgerSMB by putting a version exclusion in our cpanfile (the canonical location to declare version dependencies).

Right, and my point is that this is under-inclusive, since it is merely an install-time check and moreover lacks any forward knowledge of possible later breakage. It might also be over-inclusive: a package distributed under Debian or Gentoo might carry a backported fix, so there may be patched versions with the same version number which lack the misbehaviour.

I say this for two reasons:

1. Managing this list is annoying and deciding when to block a specific version of a module is going to be an annoying and political decision, and

Well, so far I've explicitly excluded versions which were known to be buggy. E.g. there's a version of PGObject-Type-Bytestring which is known not to work with LedgerSMB, because I've fixed the bug that was exposed with you. My plan wasn't to go "hunt" for non-working versions.

What I would prefer would be a range of versions where the software was intended to work together, and not to make bugs in dependencies the responsibility of the LedgerSMB cpanfile at all. The reason again is that the bugs in the dependencies could conceivably be addressed in a number of different ways (patching the dependencies, possibly by a packager, upgrading the dependencies, etc.) and we have no knowledge of that.

2. A block on a module version will not prevent someone who has an older version of the module from upgrading to a blocked version since it is an install-time check and another piece of software could require a newer version and thus break the install-time check.

Yup. But it does help packagers and the cpanm installer when doing their work.

How does this affect back-porting of patches to otherwise stable dependencies?

Suppose for example there is a bug with some version of a common central dependency (say, something central like the Plack/CGI adaptors) that causes us grief, but Debian maintainers back port a fix for it. Do we disallow the patched version? Do we say "you cannot install from our source in this case?" I don't know what the answer is here. I don't even know if we want to make it our problem aside from support.

Maybe we could say that we only blacklist module versions in our direct control (like the PGObject stuff?)

So I would suggest deciding required versions by first-supported version and up, and providing a tool or just documentation for checking for other problems in the meantime.

To what extent would that tool be different than 'cpanfile' and our dependencies test "xt/01.2-deps.t"? (It's in xt/ which could be a reason to develop an additional test for the actually installed versions...)
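One way such a tool could differ from the install-time check in 'cpanfile' is by looking at the versions that are actually installed at the moment it runs. The following is only a minimal sketch, assuming a reasonably recent Perl with the core modules CPAN::Meta::Requirements and Module::Metadata; the check_installed helper is hypothetical, and the range is the one from the commented-out cpanfile line quoted further down in this thread:

```perl
use strict;
use warnings;
use CPAN::Meta::Requirements;
use Module::Metadata;

# Hypothetical helper: takes module => version-range pairs and returns a
# list of messages for installed modules that fall outside their range.
# Modules that are not installed at all are skipped.
sub check_installed {
    my (%ranges) = @_;

    my $req = CPAN::Meta::Requirements->new;
    $req->add_string_requirement($_, $ranges{$_}) for keys %ranges;

    my @problems;
    for my $module (sort keys %ranges) {
        # Read the version from the file on disk without loading the module.
        my $meta = Module::Metadata->new_from_module($module) or next;
        my $installed = $meta->version($module);
        next unless defined $installed;

        push @problems, "$module $installed is outside '$ranges{$module}'"
            unless $req->accepts_module($module, "$installed");
    }
    return @problems;
}

# Range taken from the commented-out cpanfile line in this thread.
print "$_\n" for check_installed(
    'PGObject::Simple' => '>= 2.0.0, != 3.0.0, != 3.0.1',
);
```

Because a distribution may ship a patched build under an unchanged version number, the output of a check like this is probably best treated as a warning rather than a hard failure.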

Secondly, I would like to suggest a focus on moving things to CPAN and breaking the application into components for better testability.

I'm all for breaking it up into components. However, I'm thinking that moving those components to CPAN is really only useful when the component has a function outside of LedgerSMB (i.e. is generic enough).

Certainly our reporting engine would be. Also a lot of the tooling around LedgerSMB (such as what goes in setup.pl) might be something we want to break apart and have as separate management apps supporting many stable versions. Our templating and mailer modules might be. And then the question becomes whether something like contact management is general enough to be modularised and spun off.

But my larger point here is that this generates some work for downstream packagers and we should also think about how to make this manageable there. I think we also want to think about the issues of spinning things off as a part of dependency management because at that point we are generating our own dependencies.

"Require as few as possible modules in the expanded dependency tree (prefer modules as direct dependencies which are already depended on implicitly)"

Additionally I think that when we spin things off we should shoot for modules which are simple in interface, within a year or two will likely be pretty close to bug free and unlikely to require dramatic modifications, and therefore keep spin-offs throttled so they do not cause undue chaos for packagers.

The reason to put this point into the policy is actually not mainly for packagers (although, as you note, it does help). The main reason to put it there is to support those installing from source. The current dependency list for LedgerSMB is huge. This doesn't mean it shouldn't grow new dependencies, but my point here is that we can, sometimes even significantly, reduce the dependency tree by being careful about which immediate dependencies we choose.

This actually brings me to a reason we may want to be thinking about pushing as much off (eventually) into external dependencies as we can.

Right now we have a huge direct dependency tree. This is a problem for a number of reasons. It makes it hard for people to install from source. It also creates a whole lot of extra work for packagers. For example when creating the Gentoo overlay I am expecting the vast majority of my time to be managing and testing our dependency tree. Packaging the dependencies is no problem. Making sure I have all of them and that they all work as expected. That's the problem.

But now, imagine that we could take the reporting and template libraries and spin them off. How many dependencies could we push to indirect by doing so? How much easier would it be to test installation of optional capabilities? By my count we could reduce direct dependencies by at least 2 and optional dependencies by 4 or 6 depending on how we do it. If we spin off the reporting engine as a whole we could perhaps reduce dependencies long-term by at least 10 (4 standard and 6 optional).

" Not depend directly on modules which have overlapping functionalities"

Big caveat here is that some modules that are used *because* they are overlapping might be in a different position. Moo/Moose is a good example and both are widely used enough that there is no harm in using Moo instead of Moose in spun-off modules. However we still have some old paradigms in the code that need to be cleaned up and we should also focus on this.

Well, the point about "depend directly" is that our tree either depends on Moose *or* on Moo, but not both. Whether or not our dependencies themselves depend on the other of the two, is up to our dependencies, of course.

If we incubate modules we are spinning off, we may want to depend on Moo and Moose temporarily because the justification for adding Moose to a spun-off dependency is going to be a bit higher than for Moo.

"Not require modules which only provide nice-to-have functionality"

Also, while I am happy to see optional modules used to provide nice-to-have functionality, it is worth noting that there are two things that need to be guarded against here. The first is extra dependencies. The second is the extra code complexity that comes with adding such options. I would say that both need to be justified before we should allow them, and where we do allow them we should think about how to set up proper interfaces so we don't have lots of conditional module handling everywhere.

Ok. This is a broader point than our dependency management. I think this is a point to be put in the category "what should we consider when thinking about adding new features/functionalities?"

"Group feature dependencies into their respective features as much as possible (so as to not require them for a more basic installation)"

A corollary here might be that long-run a lot of the application today should be relegated to optional features as we go forward. If someone doesn't need inventory tracking, why install it? If one doesn't need more than GL, customers, vendors, and basic reporting, why install more than that?

Yup. That does work. Personally I think that installing inventory tracking (if it doesn't add many new dependencies) shouldn't be a big problem; we're in the terabyte era and installing one such feature is just a few (hundred) kilobytes. Also, having the feature installed might actually reduce code complexity. However, being able *not* to install the feature would seem to require some infrastructure to create, install and register components. Which might work out great for (third-party) components which are yet to be written.
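For the dependency side of the quoted policy point, cpanfile already offers a "feature" block that keeps a feature's modules out of the basic installation. A sketch only: the feature identifier, its description and the Example::Export::Module name are made up for illustration, and only the PGObject minimum comes from this thread:

```perl
# Basic installation: always required.
requires 'PGObject', '1.403.2';

# Optional feature (hypothetical name and module): its dependencies are
# only installed when the feature is explicitly requested.
feature 'example-export', 'Example optional export format' => sub {
    requires 'Example::Export::Module', '1.0';
};
```

With cpanm, a feature declared this way can be requested with something like "cpanm --installdeps --with-feature=example-export ."; without that option only the basic requirements are installed.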

Of course we aren't there yet, but I would suggest that in a forward-thinking view of dependencies we think not only about the dependency problems we encounter in maintaining and working on the current code, but also about how modularised and loosely coupled we may want the code to be in the future.

My approach here has been to treat the problem of dependency declaration in a system with loosely coupled components as part of the design of such a module system; that is, for now, I haven't been looking too far into the future and have been trying to make the best of the current situation for current package managers and installers-from-source. The immediate trigger for my previous mail was a discussion on #ledgersmb with Yves and David about the fact that Yves declared the minimum version of PGObject to be 2.0.2. I'm against requiring such a new version as the bare minimum, since the code base supports anything at or newer than 1.403.2. Hence my idea about requiring a policy with solid reasoning for why we do things the way we do. Then we also have a good foundation to evaluate the minimum PGObject version requirement.
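In cpanfile terms, the position argued here (declare the first supported version, not the newest one) would come down to a single line. This is a sketch using the version numbers mentioned above, not a copy of the actual LedgerSMB cpanfile:

```perl
# A bare version is a minimum: any PGObject at or above 1.403.2 satisfies
# this, including 2.0.2 and later.
requires 'PGObject', '1.403.2';
```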

Agreed. I am not even sure we should blacklist versions just because they are broken on CPAN because there could be patched versions packaged in distros. Mere warnings in logs are not enough to bump a version.

Looking at the cpanfile, however, bumping the minimum version appears to be the common practice, apparently because blacklisting individual versions doesn't work. See the following lines:

# cpanm doesn't handle our true dependency declaration correctly:
# PGObject::Simple 3.0.1 breaks our file uploads
#requires 'PGObject::Simple', '>=2.0.0, !=3.0.0, !=3.0.1';
#requires 'PGObject::Simple::Role', '1.13.2';
# so we use:
requires 'PGObject::Simple', '3.0.2';
requires 'PGObject::Simple::Role', '2.0.0';

I think we would do better to require PGObject::Simple 2.0.0 and up and leave it to packagers to deal with the 3.0.0/3.0.1 breakage for example.
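As a sketch of what that could look like in the cpanfile, assuming the breakage is recorded only as a comment for packagers and not enforced as an exclusion (the wording of the comment is illustrative):

```perl
# Known issue, documented but not enforced: the upstream 3.0.0 and 3.0.1
# releases were excluded in the original declaration (3.0.1 breaks file
# uploads). Distribution packages carrying a fix under the same version
# number are unaffected by this note.
requires 'PGObject::Simple', '2.0.0';
```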

Best Wishes,
Chris Travers

Regards,

--
Bye,

Erik.

http://efficito.com -- Hosted accounting and ERP.
Robust and Flexible. No vendor lock-in.

Best Wishes,

Chris Travers

Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor lock-in.

http://www.efficito.com/learn_more