Re: [ledgersmb-devel] RFC: Formalize templates interface and split out onto CPAN

On Thu, Aug 31, 2017 at 11:26 PM, Erik Huelsmann <..hidden..> wrote:

Ok. What is the problem with the render_to_psgi() functions when you throw away the success/fail indicator and convert that to a 'die()' when the value isn't equal to HTTP_OK ? I mean, that amounts to the same thing as the pair you're proposing? The body is one of two things: an array of strings or a filehandle. That should be easy to convert to a string (as you want it?).

That's one too many options for the body in my view. It means unneeded complexity given that different formats will probably return differently here and you always have to handle both.

Yes. That's exactly the point: you *have* to handle both. In any reasonable context, you can't know what the size of the returned document is and there should not be any assumptions as to whether the document can or needs to be loaded into memory in its entirety.

That's a good point and especially applicable to reports.

One nice thing about file handles in Perl though is that they are an abstraction which means you don't have to know whether it has been loaded into memory or not.

I would like to see, ideally, one response format only. A file handle would be good if we standardised on that.

But from a performance perspective, it's not always desirable to wrap content in a wrapper object. And from a DDoS perspective, it's not always desirable to load the entire response in memory. To that extent I think that the PSGI interface has been designed pretty well.

LaTeX->PDF documents are already on disk and can be returned using a file handle. HTML documents are (usually) already in-memory and can be returned using scalar references.

In all Perl versions we support you can open a file handle to a scalar reference to a string. See the following:

Chriss-MBP:.ssh christravers$ perl

use strict;

use warnings;

use 5.010;

my $string = "This is\na long\nstring across\nmultiple lines";

open my $fh, '<', \$string;

print "line: $_" for <$fh>;

say " and we are done";

Prints:

line: This is

line: a long

line: string across

line: multiple lines and we are done

So I guess with that the question becomes: Are there any cases where file handles introduce problems in this regard?

Here's a case in point: Suppose I want a cron job which emails a spreadsheet of overdue invoices (ageing detail report). I think this should be a fairly easy thing to do even with the global state requirements, but on 1.4 it goes something like "fork our reporting engine" and the problem still is not fully fixed in master (worse, it is not *obvious* that the problem is not fully fixed in master because of assumptions in the templating engine).

I'm afraid I'm missing the point. In my mind you'd need to instantiate a LedgerSMB::Report::Aging, call its "render_to_psgi()" method and take the body out of the returned triple. Then you do with that body whatever you want.

I'm not sure I see what the problem is that isn't completely fixed in master.

Quick: does the csv and OOO adaptors return bodies with the same interface? Or do I have to find out what kind of reference the body is and handle it?

You always have to do the same thing, which means: Use Plack::Util to grab the returned body.

And why does the template library have special knowledge of HTTP status?

Because we never settled on a good error handling model. Some parts of the code are returning undef, others are dying in combination with Try::Tiny (try/catch). Until we decide about a fitting error handling model, I don't see other options than to simply be pragmatic and apply the examples from the library we're using (e.g. Plack/PSGI in this case).

Ok so aside from the file handle/array of strings issue we need to come up with a good error handling model. Our error handling model is worse than just that problem in that often our errors are depending on a locale in global state in order to work and that is something that makes it impossible to, for example, build a list of GL transactions out of LedgerSMB from a third party module and have some approximation of half-way sane error handling at all. Maybe we should bring that up in a separate thread.

But this kind of gets at what I think is important in general in software engineering, which is that we build as much as possible in small, generally useful pieces that aren't going to change much and then we have a stable environment on which the rest of the project can move faster once some of those smaller pieces are off the table. To be honest the error handling is an even higher priority for me.

(Note that I'm avoiding to say that we have resolved the problem that none of our modules behave as "good libraries" and make lots of assumptions about the context in which they're being called, like accessing and/or assuming global state having been set up.)

Agreed. One of the point of trying to formalise things that can be broken off is to make good libraries and a smaller footprint. But this is about two things. The first is general interface and having an interface which avoids assuming that every component is being accessed from the whole framework.

Yes. We completely agree there.

WAIT! While going over the response above, I'm suddenly realising your use-case is with the aging report!

Then it's my time now to say that you're confusing one problem for another. Our LedgerSMB::Template is conflating two completely unrelated (although both "template" related) problems: LedgerSMB::Template generates our UI *and* LedgerSMB::Template generates our templatized documents.

Yep. And the UI portion will probably eventually go away.

If you mean "go away from LedgerSMB::Template (into LedgerSMB::UI)", then I can follow. However, I'm not so optimistic about LedgerSMB::UI going away. I'm thinking it might switch template processors, for speed purposes, but I'm not (unfortunately) seeing it be replaced by Dojo completely.

Ok. Initial thought is that maybe we need a LedgerSMB::Output namespace where we put mailers, send to ui, etc and maybe have a LedgerSMB::Template::UI class which we then use in many of these cases for simplicity purpose.

While doing the latter, it even mistakenly sends them as "UI responses" (HTTP responses).

I've long wanted to separate the "UI templates processor" from the "document templates processor". Having that separation resolves 99% of the cases where people may want to run reports from outside the context of a web request.

Exactly what I am trying to get to.

Ok.

I'm not convinced that separating "render_to_psgi()" out from the "UI templates processor" actually brings us anything: Dancer's "template()" function is very tightly nit to the response being sent too.

Right but Dancer's template engine assumes you are going to use it with Dancer, to generate web pages with HTML and jquery. I guess I am questioning that assumption for an accounting package.

Ah. But you're looking at it wrong. We're using LedgerSMB::Template as part of the "LedgerSMB web glue layer". It has nothing to do with the accounting package LedgerSMB, but much more with LedgerSMB's infrastructure. We've been writing all our components into the single application because these frameworks didn't exist yet. And while we're moving to Plack quickly, we're not completely rewriting the application to use Dancer (yet). So, we still need our custom glue layer for a while.

If we accept that it is always the web glue layer then, the implication of this is that reporting is a problem in a non-web environment right?

I think these two are closely connected and global state is a part of the second.

The case that again immediately occurred to me here was the fact that LedgerSMB::Report->render sends the result in CGI format, despite the fact that this is a module introduced in 1.4.

We didn't "do" PSGI-native-responses until 1.6, so, both 1.4 and 1.5 return CGI responses which are being converted to PSGI responses on the request-dispatch boundary.

And even in master, you have a choice of PSGI or CGI only (because for historical reasons we output to cgi by default). Yes, we can fix it in the reporting engine by adding a line of code but I guess I see this as a single responsibility problem because the templating engine takes on the responsibility of output which I think is a problem.

The problem is that we have a single templating engine to generate our UI output as well as our "documents" (invoices, reports,...) output. (And because they were conflated, we stored the report templates in the UI/ tree, instead of in the templates/ tree (!), with the huge downside that they're not translatable or per-locale modifiable.)

Right. So suggesting we a shim for old code and a clean separation of concerns.

I'd like to avoid using shims: they obfuscate the execution flow (generally) and tend to stick around way longer than planned and desirable. Can this same effect be achieved by *not* using a shim and simply coding "the right thing"?

Given the way we abuse the template engine in the old code I think an old code-compatible version of things will need to be available. Maybe we change the name of the package but part of the reason our concerns are still not clear or separate in the template engine is because the old code uses templates to, say, print to a printer which means our template engine knows about our configured printers and how to send files there.

(But if it doesn't take on the responsibility to output then I don't think it should take on PSGI responsibility either.)

There is a related question here which is where should the HTTP status messages come from but I think that's an additional discussion. I think they should only come from the workflow scripts or from PSGI.pm but open to other ideas.

Now we're getting back to the beginning of the discussion: if the UI templating engine doesn't take care of generating the actual output, then what role *does* it have? I mean, expanding the template is a function already performed by Template Toolkit...

I guess that's the question, isn't it?

Here's what I would like to see (and I think this goes well with the efforts to move to a more service-oriented front-end): I would like to see the template infrastructure be a report and document generator which abstracts generation of documents in a variety of formats for the programmer. That's one responsibility. It is a narrow one. We can build something that then works for the future by ignoring what happens after we generate the document.

Ok. I like that. As long as the definition of "document" here excludes the HTML being generated as part of the HTTP response generation. Because that's an entirely different league of "documents" than the documents I think the engine should focus on: the user-visible, possibly translatable or adjustable envelopes, invoices, and loads of reports.

I am not sure they are entirely different, in that there is currently some overlap. One of the things that we do that SQL-Ledger did also because it works well from a user workflow perspective is allow someone to go through a report and do things to lines. A good example here would be the unapproved batches report where one can drill down into a batch (showing vouchers in a batch) but also select batches and post or delete them. So here is a document which exists in both leagues. It is a report (and is useful as a report) and it also provides a user interface. Now if you request the report in PDF format you obviously lose these (and so that is only in the reporting league).

Long-run we could dojoize it, and build a UI around the report document in a different way, but currently there is not.a way to say that these represent documents with non-overlapping use cases.

Put another way, what I see the module doing is providing an abstraction around "I want a report" and "btw, I want it in pdf format" or "I want it as a csv." Or "I want an invoice" and "Please make it a PDF."

Ok. If you want it maximally flexible for the future, I think we want "I want an invoice. Please use <this> template. Please make it a PDF" (because then we can maintain documents which are PDF output from LaTeX while others can be PDF from HTML).

That's a good point.

Now to do this it has to do two things:

1. Maintain knowledge of available formats on the system (and possibly be able to do so on a per-template basis, something we are currently missing), and
2. It has to provide a stable interface for format providers to use to communicate with it and to the program as a whole.

This means that what we have now for Template.pm in my proposal is basically a point of dependency injection, which maintains the responsibility of being aware of what dependencies have used it (PDF, HTML, etc). The Template::LaTeX, Template::HTML, etc. then take on the responsibility of plugging in to allow document generation in a singularly consistent API.

Ok. But I found Template Toolkit to be extremely flexible (we implemented it to read from the database instead of from a filesystem, completely plugin-based!). So, I'm still wondering if this can't be achieved with Template Toolkit and a few wrappers and plugins.

A whole lot of it can be, actually. And actually I think there is a whole lot of room for improvement on every level of this area.

There is a coordination layer, however, that cannot be. In reporting this manifests as the list of formats you can select.

That sounds small and narrow and it is. But think about what that enables as well:

1. Dynamic format detection (What formats do I know about? PDF, HTML, PS, OOO, and Gnumeric*)
2. Dynamic template detection (What formats can give me an invoice with my current template list?)

* hypothetical new format we just installed

Note we have to provide a shim for old code.

Why? I mean, old code (as in code in old/) typically invokes the template processor to generate a true document (the HTML responses are "composed" inline). The difficulty might be existing code in lib/. Although there you could probably search for 'render_to_psgi' and try to establish if there's a document or a UI being generated in that specific invocation. (All 'render()' invocations probably are real documents being generated.)

The problem for the old code is in the printing logic. I think we need a good replacement for that problem too as we move forward. However, I also don't want to disturb all the old code. My thought is to have a LedgerSMB::Template::Old class that replaces the LedgerSMB::Template class in these cases.

The template engine was a *major* step forward for 1.3 and the importance cannot be understated. But because of the way the old code handles printing, separating certain concerns there hasn't really been possible.

But I think the stabilisation and generalisation here would be helpful.

More tangibly, it would allow someone to write an HTML -> PDF template engine third party, and just have it plugged in if that was still desired by anyone, or for someone who wants to be able to generate OOO doc invoices to be able to do so without demanding that everyone else support the functionality.

At any rate that's what I had in mind.

Thanks. That's much more detailed and as such better understood (by me) as to where you're headed. So, to reiterate: I think splitting UI and "document" template engine wrappers into two categories sounds like a major, good step forward. I'd like us to try really hard to stay away from shims as something that's more than just a proof-of-concept; in the sense that it can be a way of hooking something in, but I'd say that the refactoring is only done when the shim is gone.

Well, in this case I think it should be old-code-only and only to avoid breaking things which are known-brittle. The refactoring will be done when the old code is gone.

Another option is we build something else and depreciate LedgerSMB::Template. That might actually be better.

Maybe we call it LedgerSMB::Document? Or maybe we build it on CPAN first, and then decide if we want to use it?

Regards,

--
Bye,

Erik.

http://efficito.com -- Hosted accounting and ERP.
Robust and Flexible. No vendor lock-in.

Best Wishes,

Chris Travers

Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor lock-in.

http://www.efficito.com/learn_more