Re: [ledgersmb-devel] Setting a policy for our dependencies

As a couple notes, I would add the following expansions to this.

"Allow the widest possible version range on any module, disallowing individual malfunctioning versions to further extend the range"

I think we are better off documenting bugs in misbehaving versions of modules and perhaps offering a module check utility rather than disallowing certain versions of modules.

How do you envisage the disallowing taking place? I was actually not thinking of actively disallowing specific versions, but rather document the fact that the module doesn't work with LedgerSMB by putting a version-exclusion in our cpanfile (the canonical location to declare version dependency).
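
To make the idea concrete, a version-exclusion in a cpanfile could look like the sketch below. The module name and both version numbers are made up for illustration; only the range syntax (a comma-separated list of version conditions) is the point here.

```perl
# Hypothetical entry: Some::Module and the versions are placeholders.
# Accept 1.0 or newer, but refuse the one release known to misbehave.
requires 'Some::Module', '>= 1.0, != 1.5';
```

With an entry like this, an installer such as `cpanm` would treat an installed 1.5 as not satisfying the requirement and fetch another release instead.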

Right, and my point is that this is under-inclusive, since it is merely an install-time check and moreover lacks any forward knowledge of later possible breakage. I can also see cases where a package distributed under Debian or Gentoo has a backported fix, so it might be over-inclusive as well: there might be patched versions with the same version number which lack the misbehaviour.

If a patched version retains exactly the same version number, then I'd consider that to be a bug in the patch.
There should be at least a suffix indicating it's patched, unless the patch is simply a change to conform with distro policy, such as a different path, or change in a library name etc.
As Erik says in his response, I don't think we need to concern ourselves too much with dependencies on packaged installs; that's up to the packager (with our support as needed during packaging).

Two points:
1. Please correct me if I am wrong, but I don't think Debian packages change $VERSION when they backport fixes.

That's very well possible, yes.

2. This is a specific problem when installing LedgerSMB from source and packages from the distribution. I assume we want to support that.

Well, yes and no. Ideally, we don't need to support this combination, because we provide packages (at least for the most popular of distributions). I see installing from source with packages from the dist as a workaround for packages not being available or sufficiently up to date.

Now assuming I am right on point 1, an important question is whether we want to take on the responsibility in the cpanfile for knowing what every distro we support has done in that area.

The answer to this point - from me at least - is a firm No. People can install most packages from their distro, but if some have been blocked by us due to version mismatch, they'll have to accept that, unless they install the remaining packages manually, `cpanm` is going to upgrade the packages which have mismatched versions. As a consequence they'll need something like "local::lib".

I would argue that maybe we don't, and therefore maybe the cpanfile should presume that patched versions of broken dependencies exist and we should just say "not our problem" aside from user support.

Ah. But the "not our problem" can be taken two directions: over-inclusive at install-time (your solution, as I understand it), which results in non-working installations or under-inclusive at install-time (what I advocate) which results in unnecessary upgrades but is much more likely to result in working installs. My interpretation of "not our problem" is: it's not our problem that packagers don't change their version numbers (be it for good reasons or not) -- which means that we have to reject packages which *might* have been acceptable.

How the user and the user's source of the package go about fixing the bug then becomes something we don't have to know or worry about.

Right. We agree there completely. But if we can't distinguish the originally broken version from the patched working version, then only packagers can solve that in the package-creation process -- which brings me to my first point: installing a mix of packages-from-dist and ledgersmb-from-source is a workaround which we should treat as such.

[ snip ]

"Require as few as possible modules in the expanded dependency tree (prefer modules as direct dependencies which are already depended on implicitly)"

Additionally I think that when we spin things off we should shoot for modules which are simple in interface, within a year or two will likely be pretty close to bug free and unlikely to require dramatic modifications, and therefore keep spin-offs throttled so they do not cause undue chaos for packagers.

The reason to put this point into the policy is actually not mainly for packagers (although, as you note, it does help). The main reason to put it there is to support those installing from source. The current dependency list for LedgerSMB is huge. This doesn't mean it shouldn't grow new dependencies, but my point here is that we can - sometimes even significantly - reduce the dependency tree by being careful about which immediate dependencies we choose.

This actually brings me to a reason we may want to be thinking about pushing as much off (eventually) into external dependencies as we can.

Right now we have a huge direct dependency tree. This is a problem for a number of reasons. It makes it hard for people to install from source. It also creates a whole lot of extra work for packagers. For example when creating the Gentoo overlay I am expecting the vast majority of my time to be managing and testing our dependency tree. Packaging the dependencies is no problem. Making sure I have all of them and that they all work as expected. That's the problem.

But now, imagine that we could take the reporting and template libraries and spin them off. How many dependencies could we push to indirect by doing so? How much easier would it be to test installation of optional capabilities? By my count we could reduce direct dependencies by at least 2 and optional dependencies by 4 or 6 depending on how we do it. If we spin off the reporting engine as a whole we could perhaps reduce dependencies long-term by at least 10 (4 standard and 6 optional).

Honestly, I don't see how spinning parts of our code out into separate cpan modules will help the dependency tree at all.
We still have the same tree, it's then just distributed among many modules, which will each have an overlapping tree.
Worse, is the fact that the tree that needs to be met is now harder to discover, as you need to check each of the modules for what's required.

It's not the same tree though. It is more modular and easy to test, and the responsibility for change is more clearly defined. My view is that smaller pieces of software, with stable contracts between them, are easier to manage, test, and so forth than bigger pieces of software with more fluid internal contracts. Put another way (and this may be a philosophical difference here): small, well-defined responsibilities are easier to keep than big, nebulous ones.

Agreed. However, for a packager who starts from scratch, that doesn't matter much: that packager must do all the dependencies as well as the LedgerSMB software itself. So, it doesn't reduce the work he has to do. Moreover, if things work the way they do with Debian, you have more processing time: a package needs to transition into a specific stage before you can build packages that depend on it. Which in this case means you need to build more levels of dependent packages and thus have more moments where you have to wait for the package to transition.

Pieces of software should have small, well defined responsibilities in my view. The thing is, indirect dependencies are not our responsibility so pushing more things to indirect dependencies makes our responsibility narrower, better defined, and ultimately easier to keep.

The idea of spinning off the templates is actually an important case in point.

While I agree with the general point of small dependency trees, I'd like to add a subtle refinement to it:

Pieces of software should have small dependency trees *compared to the functionalities they offer*.

The reason I'm saying that is because I think LedgerSMB offers a huge deal of functionalities. By comparison, we can probably slightly reduce the dependency tree, but can't get it much smaller. The tree won't get smaller by separating out; it might not get much bigger either.

As long as we have to maintain these new separated modules ourselves too (as well as our packagers), then there seems to be little benefit to dependency management as a whole. Using modules like Template Toolkit makes a great deal of sense: we outsource maintenance of the software *and* dependency tree to other developers in the community, sharing the development and test efforts. Also, Template Toolkit probably was packaged before Jame ever started his work. So, there too is a benefit.

As long as we're the only user (in Debian) of our factored-out libraries, the burden will fall on Jame, though.

The specific modules I would suggest we look at in the near future are the templates and the reporting engine (and actually I have immediate uses for these outside of LedgerSMB but inside the LedgerSMB ecosystem).

Now if we were to have the template system outside, then we have two required modules (the successor to LedgerSMB::Template and, probably, the successor to LedgerSMB::Template::HTML). Assuming that this successor has a good plugin framework and we can auto-detect any other template library we put in, we can effectively say that the LedgerSMB application no longer has any responsibility for the PDF, OpenOffice, CSV, etc. generation. If someone wants to generate spreadsheets, they do:

cpan App::LedgerSMB::Template::Plugin::OpenOffice;

But we already do (for a limited set of plugins):

https://github.com/ledgersmb/LedgerSMB/blob/master/cpanfile#L61-L91

where the syntax of the command to enable PDF support is slightly different.

And away they go. We no longer worry about what dependencies this has.

$ cpanm --with-feature=latex-pdf --installdeps .

does exactly the same. We only specify which modules we use to make it happen: the same modules which would otherwise have been in LedgerSMB::Template::Plugin::PDF. Both would be maintained by us, right?
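
For readers who haven't looked at the linked section of the cpanfile: an optional feature is declared roughly as in the sketch below. The feature name is the one used in the command above; the description text and the module names inside the block are placeholders, not a copy of what our cpanfile actually lists.

```perl
# Sketch of an optional feature in a cpanfile.
# The description and both module names are illustrative assumptions.
feature 'latex-pdf', 'PDF output through LaTeX' => sub {
    requires 'Example::LaTeX::Driver';      # hypothetical module
    requires 'Example::Template::Latex';    # hypothetical module
};
```

Anything listed inside the block is only pulled in when the installer is asked for that feature, which is what `--with-feature` does.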

It is not LedgerSMB's (the software) responsibility at all. The template engine would have to have a plugin structure that probably couldn't change much, and an interface for interacting with plugins that could not change. But this is not actually a big difficulty in that case.

For pre packaged modules that's only a problem for the packager, but for git or tgz installs it's a problem for the installer.
The installer of a src version then also needs to download and install many individual modules.
Development installs will become harder, especially if different versions of a module are needed for different branches being worked on.

Finally, it will add additional load to our packagers.
Right now Jame has quite a large number of perl packages that he deals with simply to support LedgerSMB being packaged.
That can lead to delays in packaging a new LedgerSMB release if there are also changes to its dependencies.
Having even more separate dependencies will only make that worse.

I think this is actually a very valid concern. There are two very important ways we can help with that:

1. we don't release a lot of new spin-offs until the old ones are really stable and we are guaranteeing compatibility for a while. If Debian has an old version of PGObject, as long as we support that interface...
2. we make sure we have stable interfaces before we spin things off.

I think the only way to spin off is when things are well-designed and stable indeed. Probably by having the to-be-separated-out code in exactly that state in the main code base.

[ snip ]

"Group feature dependencies into their respective features as much as possible (so as to not require them for a more basic installation)"

A corollary here might be that long-run a lot of the application today should be relegated to optional features as we go forward. If someone doesn't need inventory tracking, why install it? If one doesn't need more than GL, customers, vendors, and basic reporting, why install more than that?

Yup. That does work. Personally I think that installing inventory tracking (if it doesn't add many new dependencies) shouldn't be a big problem; we're in the terabyte era and installing one such feature is just a few (hundred) kilobytes. Also, having the feature installed might actually reduce code complexity. However, being able *not* to install the feature would seem to require some infrastructure to create, install and register components. Which might work out great for (third-party) components which are yet to be written.

Of course we aren't there yet, but I would suggest that, for a forward-thinking view of dependencies, we think not only about the dependency problems we encounter in maintaining and working on the current code, but also about how modularised and loosely coupled we want the code to be in the future.

My approach here has been that the problem of dependency declaration in a system with loosely coupled components is to be part of the design of such a module system; that is, for now, I haven't been looking too far into the future and have been trying to make the best of the current situation for current package managers and installers-from-source. The immediate trigger for my previous mail was a discussion on #ledgersmb with Yves and David about the fact that Yves declared the minimum version of PGObject to be 2.0.2. I'm against requiring such a new version as the bare minimum since the code base supports anything at or newer than 1.403.2. Hence my idea about requiring a policy with a solid reasoning of why we do things the way we do. Then we also have a good foundation to evaluate the minimum PGObject version requirement.
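
In cpanfile terms the disagreement comes down to a single line. A sketch, assuming the minimum is declared with a plain `requires` (I'm not quoting the actual file here):

```perl
# Lowest PGObject release the code base is said to support.
# Declaring '2.0.2' here instead would make installers upgrade
# every system that still has an older, working release.
requires 'PGObject', '1.403.2';
```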

Agreed. I am not even sure we should blacklist versions just because they are broken on CPAN, because there could be patched versions packaged in distros. Mere warnings in logs are not enough to bump a version.

As said previously, I don't think we need to concern ourselves with patched (same version number) versions, as it shouldn't happen (the version number should gain a suffix) and it's up to the packagers to handle version compatibility anyway. They are certainly better positioned to know if an available package has been patched.

But that is the package suffix, not the $VERSION, right? On Debian-based systems I have worked on I have *never* seen a suffix on internal version numbers, only on package versions. Maybe I am missing something?

Probably not, but see above: I don't think *that* is our problem. (We'll simply overlay the fixed-and-working patched version with a fixed-and-working clean version from CPAN.)

Bye,

Erik.

http://efficito.com -- Hosted accounting and ERP.

Robust and Flexible. No vendor lock-in.