Re: [ledgersmb-devel] Setting a policy for our dependencies

What I still don't understand is whether you think we should just let people struggle and, if not, what other mechanism you think we should use to "encode" our knowledge.

2. A block on a module version will not prevent someone who has an older version of the module from upgrading to a blocked version since it is an install-time check and another piece of software could require a newer version and thus break the install-time check.

Yup. But it does help packagers and the cpanm installer when doing their work.

How does this affect back-porting of patches to otherwise stable dependencies?

Suppose, for example, there is a bug in some version of a common dependency (say, something central like the Plack/CGI adaptors) that causes us grief, but Debian maintainers backport a fix for it. Do we disallow the patched version?

No, we disallow the original. If the person applying the patch does not change the version number, then there's no way we can know the difference, can we? But more importantly, the packager of *our* package can inspect the dependent packages and decide *not* to follow our dependency advice - assuming we include sufficient documentation as to why specific versions have been excluded.

Do we say "you cannot install from our source in this case?" I don't know what the answer is here. I don't even know if we want to make it our problem aside from support.

Maybe we could say that we only blacklist module versions in our direct control (like the PGObject stuff?)

Why is that any better than the "general module case"? Do we control packages in Debian not getting patched for PGObject?

So I would suggest deciding required versions by first-supported version and up, and providing a tool or just documentation for checking for other problems in the meantime.

To what extent would that tool be different than 'cpanfile' and our dependencies test "xt/01.2-deps.t"? (It's in xt/ which could be a reason to develop an additional test for the actually installed versions...)

Secondly, I would like to suggest a focus on moving things to CPAN and breaking the application into components for better testability.

I'm all for breaking it up into components. However, I'm thinking that moving those components to CPAN is really only useful when the component has a function outside of LedgerSMB (i.e. is generic enough).

Certainly our reporting engine would be.

The formats, you mean, I take it? Or do you mean a broader scope? If it's the formats, I really wonder what they add on CPAN: as an example, after my refactoring on master, the LaTeX format is no more than 30 lines (less the escaping routine). But there's another thread waiting for that discussion. I'll respond in that thread.

Also, a lot of the tooling around LedgerSMB (such as what goes in setup.pl) might be something we want to break apart and have as separate management apps supporting many stable versions. Our templating and mailer modules might be. And then the question becomes whether something like contact management is general enough to be modularised and spun off.

But my larger point here is that this generates some work for downstream packagers and we should also think about how to make this manageable there. I think we also want to think about the issues of spinning things off as a part of dependency management because at that point we are generating our own dependencies.

Yes, we have spun off our own dependency with PGObject as well. So, yes, I think that's a good thing to think about. My approach was that we think about it when we spin off the code that can be spun off; possibly we need to change the dependency declaration "rules" when we do. Currently though we don't have one and we're not spinning off more code today (are we?). So, my idea was to solve the problem that we're having today and when we want to make things more complex tomorrow, then we solve the added complexity in its full breadth (meaning including dependency management).

"Require as few a possible modules in the expanded dependency tree (prefer modules as direct dependencies which are already depended on implicitly)"

Additionally I think that when we spin things off we should shoot for modules which are simple in interface, within a year or two will likely be pretty close to bug free and unlikely to require dramatic modifications, and therefore keep spin-offs throttled so they do not cause undue chaos for packagers.

The reason to put this point into the policy is actually not mainly for packagers (although, as you note, it does help). The main reason to put it there is to support those installing from source. The current dependency list for LedgerSMB is huge. This doesn't mean it shouldn't grow new dependencies, but my point here is that we can - sometimes even significantly - reduce the dependency tree by being careful about which immediate dependencies we choose.

This actually brings me to a reason we may want to be thinking about pushing as much off (eventually) into external dependencies as we can.

Right now we have a huge direct dependency tree. This is a problem for a number of reasons. It makes it hard for people to install from source. It also creates a whole lot of extra work for packagers.

Is our dependency tree in general a problem? Or the fact that we have a huge *direct* dependency tree?

For example when creating the Gentoo overlay I am expecting the vast majority of my time to be managing and testing our dependency tree. Packaging the dependencies is no problem. Making sure I have all of them and that they all work as expected. That's the problem.

But doesn't that problem remain when you make the dependency tree a little bit more indirect by splitting off the mailer? I mean, the mailer still has the original dependencies, we now depend on the new mailer, so, all in all we added 1 more dependency to manage, right?

But now, imagine that we could take the reporting and template libraries and spin them off. How many dependencies could we push to indirect by doing so? How much easier would it be to test installation of optional capabilities? By my count we could reduce direct dependencies by at least 2 and optional dependencies by 4 or 6 depending on how we do it. If we spin off the reporting engine as a whole we could perhaps reduce dependencies long-term by at least 10 (4 standard and 6 optional).

I'm less optimistic there, because I'm assuming that at least for a number of years to come, we'll be the only ones depending on those intermediate dependencies, which means we basically just added complexity to the mix.

" Not depend directly on modules which have overlapping functionalities"

Big caveat here is that some modules that are used *because* they are overlapping might be in a different position. Moo/Moose is a good example and both are widely used enough that there is no harm in using Moo instead of Moose in spun-off modules. However we still have some old paradigms in the code that need to be cleaned up and we should also focus on this.

Well, the point about "depend directly" is that our tree either depends on Moose *or* on Moo, but not both. Whether or not our dependencies themselves depend on the other of the two, is up to our dependencies, of course.

If we incubate modules we are spinning off, we may want to depend on Moo and Moose temporarily because the justification for adding Moose to a spun-off dependency is going to be a bit higher than Moo.

Fair enough. While the policy in my opinion should be taken seriously, I also think that we should still allow ourselves to be reasonable. This is a case where I think the policy should probably be suspended for a certain period of time. I'm thinking that we'd need to discuss that here on a case-by-case basis in order not to become too hand-wavy with the policy.

"Not require modules which only provide nice-to-have functionality"

Also, while I am happy to see optional modules used to provide nice-to-have functionality, it is worth noting that there are two things that need to be guarded against here. The first is extra dependencies. The second is the extra code complexity that comes with adding such options. I would say that both need to be justified before we allow them, and where we do allow them we should think about how to set up proper interfaces so we don't have lots of conditional module handling everywhere.

Ok. This is a broader point than our dependency management. I think this is a point to be put in the category "what should we consider when thinking about adding new features/functionalities?"

"Group feature dependencies into their respective features as much as possible (so as to not require them for a more basic installation)"

A corollary here might be that long-run a lot of the application today should be relegated to optional features as we go forward. If someone doesn't need inventory tracking, why install it? If one doesn't need more than GL, customers, vendors, and basic reporting, why install more than that?

Yup. That does work. Personally I think that installing inventory tracking (if it doesn't add many new dependencies) shouldn't be a big problem; we're in the terabyte era and installing one such feature takes just a few (hundred) kilobytes. Also, having the feature installed might actually reduce code complexity. However, being able *not* to install the feature would seem to require some infrastructure to create, install and register components. Which might work out great for (third-party) components which are yet to be written.
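A minimal sketch of what grouping feature dependencies can look like in a cpanfile, using its `feature` keyword. The feature identifier, its description and the choice of modules are illustrative assumptions, not LedgerSMB's actual declarations:

```perl
# Hypothetical cpanfile fragment; names are for illustration only.

# Needed by every installation.
requires 'DBI';

# Only needed by sites that want LaTeX-based output. The identifier
# 'latex-output' is made up for this sketch.
feature 'latex-output', 'PDF/PostScript output through LaTeX' => sub {
    requires 'LaTeX::Driver';
    requires 'Template::Plugin::Latex';
};
```

With a declaration like this, `cpanm --installdeps .` installs only the basic set, while `cpanm --installdeps --with-feature=latex-output .` adds the dependencies of that one feature.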

Of course we aren't there yet, but I would suggest that, taking a forward-thinking view of dependencies, we might want to think about it not only in terms of the dependency problems we encounter in maintaining and working on the current code, but also in terms of how modularised and loosely coupled we may want the code to be in the future.

My approach here has been to treat the problem of dependency declaration in a system with loosely coupled components as part of the design of such a module system; that is, for now, I haven't been looking too far into the future and have been trying to make the best of the current situation for current package managers and installers-from-source. The immediate trigger for my previous mail was a discussion on #ledgersmb with Yves and David about the fact that Yves declared the minimum version of PGObject to be 2.0.2. I'm against requiring such a new version as the bare minimum since the code base supports anything at or newer than 1.403.2. Hence my idea about requiring a policy with solid reasoning for why we do things the way we do. Then we also have a good foundation to evaluate the minimum PGObject version requirement.

Agreed. I am not even sure we should blacklist versions just because they are broken on CPAN, because there could be patched versions packaged in distros. Mere warnings in logs are not enough to bump a version.

Right. I'm not saying we should be checking the version on each request or even on each Starman start-up. I'm restricting it completely to installation time and package-time documentation (but machine-readable).

Looking at the cpanfile, it appears to be the common practice, however, because apparently blacklisting individual versions doesn't work. See the following lines:

# cpanm doesn't handle our true dependency declaration correctly:
# PGObject::Simple 3.0.1 breaks our file uploads
#requires 'PGObject::Simple', '>=2.0.0, !=3.0.0, !=3.0.1';
#requires 'PGObject::Simple::Role', '1.13.2';
# so we use:
requires 'PGObject::Simple', '3.0.2';
requires 'PGObject::Simple::Role', '2.0.0';

I think we would do better to require PGObject::Simple 2.0.0 and up and leave it to packagers to deal with the 3.0.0/3.0.1 breakage for example.
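As a rough sketch of that suggestion (an illustration only, not the project's actual cpanfile), the declaration would state just the first-supported version and record the known-bad releases as comments for packagers to act on:

```perl
# Hypothetical cpanfile fragment: minimum version only.
# In a cpanfile, a bare version string means "this version or newer".
requires 'PGObject::Simple', '2.0.0';

# Known problems, documented for packagers rather than enforced here:
#   3.0.1 breaks our file uploads
#   (3.0.0 is also excluded in the original range declaration)
```

The trade-off is that nothing stops an installer from picking up 3.0.0 or 3.0.1; the comments only help someone who reads them.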

The reason that's there the way it is, is because we test with Travis CI and we let Travis CI handle the dependency tree setup completely by way of cpanm. If we were to force-install 3.0.2 before running the rest of the dependency installation, or if we managed the complete dependency tree ourselves - as a packager would - then we could return to the commented-out requires statements. In other words, it's the fact that the cpanfile is overloaded in the CI process which causes the commented lines to be commented. Ideally, we manage the installation of dependencies on Travis CI ourselves (at least partially); then I can remove the work-around lines and uncomment the original lines.