Re: [DESIGN] Proposed structure for LedgerSMB web services

Hi John,

Posted a cleaned-up, expanded v2 draft to the list to start clean discussion about what would need adding in the document. Responses to your comments, concerns and ideas below.

Very cool! Following some links from there led me here: http://www.restdoc.org/

Incorporated that reference in the design doc.

As to what you mean by "required", I don't know. If you mean "required to read before using the API", then, no, it's not required. If you mean "required when implementing a new service", then I think the answer should be "yes, it's required". Every service requires documentation and if this is the only documentation, then I'm quite happy to accept the service.

Agreed, sounds good.

Ok. Changed wording to make it more clear that it's a requirement on the provider, not the consumer.

For development/debugging, I really like having an API observe query parameters in addition to Range headers. I would suggest we support both, and pick one to win...

Hmm. I'm not really in favor of having duplicate functionality. People seem to be using cURL to test their services; it should be pretty easy to add the Range header (or others, for that matter) to a cURL request. What method do you use?
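
For anyone testing from a script rather than cURL, setting the header is just as short. This is a minimal sketch in Python; the endpoint URL and the "items" range unit are assumptions for illustration, since the design doc doesn't fix either yet.

```python
import urllib.request

# Hypothetical endpoint; the design doc does not define the URL space yet.
URL = "http://localhost/api/invoices"


def build_ranged_request(url, first, last):
    """Build a GET request asking for items first..last (inclusive)."""
    req = urllib.request.Request(url, method="GET")
    # "items" as the range unit is an assumption, not something we agreed on.
    req.add_header("Range", f"items={first}-{last}")
    return req


# Ask for the first ten items; nothing is sent until urlopen() is called.
req = build_ranged_request(URL, 0, 9)
print(req.get_header("Range"))
```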

Ha. For a while I was using a home-grown Dojo single-page app for testing out APIs, have played around with quite a bit, but it's been a while since I've done a major API project. I've seen some decent browser extensions for some of these kinds of things...

The other thing I'm thinking of here is for more light-weight, reporting types of uses. I'm not sure how much control you can get over headers when doing a cross-domain request from a browser -- I'm thinking a lightweight JS app that might want to grab the last 10 sales invoices for a dashboard, or something like that -- with an iframe, for example, you can't necessarily set browser headers but you can easily add a GET parameter.

Not a big deal these days, there's so many decent tools for doing it right with a toolkit that we may not need the "lightweight" GET-only approach, but I do think there may be scenarios where it might prove useful...

Ok. Would really love to have a single API access method; let's help people find their way to the development tools available today and see if that limits us too much. As soon as it does, we extend the API with multiple options. (My experience with multiple options is that we have to support all options endlessly, but people use only one option anyway...)

The API itself should be responsible for doing this conversion -- and should allow the consuming client to send whichever of these it wants. The API can then convert to a Perl data object of some kind to pass off to the internal code.

Ok. You're saying there's *always* going to exist a mapping from application/form-data to application/json? I mean, I can imagine a mapping like that for non-nested structures, but what about nested objects in arrays? I mean, in the new multi currency branch we have form fields named debit_1 and debit_fx_1; how do those map to a JSON/JavaScript object?

Don't get me wrong; I'd like to delegate this to the request consumer too.

I think we just define a convention, and describe it. Perhaps make it simply mirror the JSON structure with _ separated parts? e.g. debit_1_value=234&debit_1_fx=222&debit_2_value=444&debit_2_fx=400 maps to JSON as: (intentionally swapping the index to the 2nd position)
[{
"debit":{
    "value":234,
    "fx":222
}
},
{
"debit":{
    "value":444,
    "fx":400
}
}]

... I mean we already do this for form posts now, we need to convert it to some sort of data object internally anyway, why not build a library that does this for us, regardless of what format it receives in the request? Might need to change some of the current form field names...
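
A minimal sketch of such a conversion, in Python rather than the Perl the application would actually use. It assumes every field name has exactly three _ separated parts (name, 1-based index, part) and that all values are integers; the function name is made up for illustration. The current debit_1 / debit_fx_1 names would not fit this convention without renaming.

```python
from urllib.parse import parse_qsl


def fields_to_records(query):
    """Map name_index_part=value fields to a list of nested records.

    Assumes exactly three "_"-separated parts per field name and
    integer values; anything else raises ValueError.
    """
    rows = {}
    for key, value in parse_qsl(query):
        name, index, part = key.split("_")
        rows.setdefault(int(index), {}).setdefault(name, {})[part] = int(value)
    # Order the records by the index taken from the field names.
    return [rows[i] for i in sorted(rows)]


query = "debit_1_value=234&debit_1_fx=222&debit_2_value=444&debit_2_fx=400"
records = fields_to_records(query)
```

With the query string from the example, records holds the same two-element structure as the JSON shown above.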

Ok. Added a note to the doc that each format should document such a key/fieldname mapping.

This and the previous note do bring up something missing here: response format. Like the Range header, there's the "Accept" header the client can send, and I've also found it useful for very quick browser debugging to allow overriding that with a GET parameter.

So we should discuss the formats we support for the response:

application/json
application/xml
text/yaml
text/csv
text/html
application/x-latex
application/pdf

... and of course how we handle these. JSON, XML, CSV are pretty straightforward (hey, are there any industry-specific XML formats we should leverage/offer?) -- for nested data in CSV I've typically seen JSON used...

For those last 3, clearly there's a need for templates for each kind of object...

Added the list (and the question which ones we need to support initially); my thinking is that initially we simply need application/json just to be able to get started with the exchangerate upload/download.

If we've done a good job on the API, we should be able to plug in request formatters and response formatters easily -- so we could add text/yaml by writing a new plugin for both response and request handling...
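
To illustrate the plugin idea, here is a rough sketch of a response-formatter registry keyed by media type, with an optional override that wins over the Accept header. It is Python, not Dancer/Perl, and every name in it is hypothetical. The Accept handling is deliberately simplified: it ignores q-values and wildcards.

```python
import csv
import io
import json

FORMATTERS = {}


def formatter(media_type):
    """Register a response formatter for one media type."""
    def register(func):
        FORMATTERS[media_type] = func
        return func
    return register


@formatter("application/json")
def to_json(rows):
    return json.dumps(rows)


@formatter("text/csv")
def to_csv(rows):
    # Assumes a non-empty list of flat dicts that share the same keys.
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def render(rows, accept="application/json", override=None):
    """Return (media_type, body); an override wins over Accept."""
    if override:
        wanted = [override]
    else:
        # Simplification: plain comma-separated list, parameters dropped.
        wanted = [t.split(";")[0].strip() for t in accept.split(",")]
    for media_type in wanted:
        if media_type in FORMATTERS:
            return media_type, FORMATTERS[media_type](rows)
    raise LookupError("no formatter for: " + ", ".join(wanted))


rows = [{"curr": "EUR", "rate": "1.10"}]
```

Adding text/yaml would then mean registering one more function, without touching render().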

Right. This and the response types made me put out the question in the document about how much we need to get started with Dancer *right now* (and how much we can delegate to later versions).

If the answer is that we do need dancer, then the question also becomes how much of the current URL space needs to be rewritten *right now*...

Ah, yes, and that's exactly why I think we need to support a GET parameter in addition to Range: header -- then you can simply generate a URL to get a CSV or HTML report of the most recent 10 payments from client X.

Is PUT to be added to this list? I would expect PUT to update values of an existing object, and needs to contain all new values for the object. Obviously since we're doing financial transactions, this probably can only modify drafts and not anything posted (in a financial sense). But for drafts, reconciliation, batches, etc. this seems useful.

POST or PATCH can be used for modifying just a field on an object, or handling things like payments on an object?

Ah. Good point. POST(with an rpc endpoint) would be for adding a payment to an open item. PATCH would be to change the values of an existing object which is still editable.
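
The difference between the two update verbs, as I read the discussion above, can be sketched like this. It is an illustration in Python only; the "status" field and its "draft" value are assumptions, not the actual LedgerSMB data model.

```python
def ensure_editable(stored):
    """Refuse changes to anything that is no longer a draft."""
    if stored.get("status") != "draft":
        raise PermissionError("posted objects cannot be modified")


def apply_put(stored, body):
    """PUT: the body is the complete new representation of the object."""
    ensure_editable(stored)
    return dict(body)


def apply_patch(stored, body):
    """PATCH: only the fields present in the body change."""
    ensure_editable(stored)
    return {**stored, **body}


draft = {"status": "draft", "reference": "R-1", "notes": "first try"}
patched = apply_patch(draft, {"notes": "fixed"})
replaced = apply_put(draft, {"status": "draft", "reference": "R-2"})
```

Note that the PUT result has lost the "notes" field because the body didn't carry it, while the PATCH result keeps everything it wasn't told to change.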

Ok. That all sounds fine to me...

[ snip batches ]

Actually, thinking about it, I can see how to put it all into one transaction. However, if that works, it depends on what you expect on subsequent calls within the same batch. Do you expect any queries to return the new values while they have not been fully committed to the database? Or do you just expect to send loads of modifications? Do you expect to be returned new IDs?

Good questions, and this gets beyond my experience -- I haven't actually done that much transactional programming to know the best practices here...

I would think we would expect subsequent calls to have the new values, and I do know that Dojo stores have supported "placeholder" ids that can be replaced with permanent ones after the data is committed, so I would tend to think that pattern should work, a "placeholder" that is returned while the batch is "open", and when the batch is "approved" a set of replacements get returned so the client can update with final IDs.

Should we be considering UUIDs here?

My basic idea was to batch up all RPC calls and delay them until the final "COMMIT" comes in and executing all the batched commands inside a single database transaction.
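
A rough sketch of that idea combined with the placeholder ids mentioned above. It uses Python with an in-memory SQLite database as a stand-in for PostgreSQL; the table, the class and the "tmp-N" placeholder format are all made up for illustration.

```python
import sqlite3


class Batch:
    """Queue inserts and run them in one transaction on commit."""

    def __init__(self, conn):
        self.conn = conn
        self.pending = []  # (placeholder, description) pairs

    def add(self, description):
        """Queue one insert and hand back a temporary id."""
        placeholder = f"tmp-{len(self.pending) + 1}"
        self.pending.append((placeholder, description))
        return placeholder

    def commit(self):
        """Run all queued inserts atomically; map placeholders to final ids."""
        replacements = {}
        with self.conn:  # commits on success, rolls back on any exception
            for placeholder, description in self.pending:
                cur = self.conn.execute(
                    "INSERT INTO journal (description) VALUES (?)",
                    (description,),
                )
                replacements[placeholder] = cur.lastrowid
        self.pending.clear()
        return replacements


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE journal (id INTEGER PRIMARY KEY, description TEXT)")
batch = Batch(conn)
first = batch.add("rent")
second = batch.add("utilities")
replacements = batch.commit()
```

With this approach nothing reaches the database before commit(), so queries made while the batch is open would not see the new values.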

Hmm. Goes against REST, but then we are talking about financial systems, practically the definition of transactional logic. It feels like we are reinventing SOAP!

Ok. As per your suggestion below, putting this on hold for now. We don't want to reinvent SOAP and we want things to stay simple for now so we finally get the API off the ground...

I'm thinking about the scenarios here, and the one that comes to mind is "shipping" some products on a sales order. We use this all the time --skipping the shipping screen, we just put in a value in the "ship" box and "Create Invoice." The current LSMB adjusts the sales order line items/totals, and commits that, and then takes you to a create invoice form that is completely open, unsaved, and in my opinion really should be in a transaction -- the sales order qtys shouldn't get updated until the invoice is posted (or at least saved as a draft).

That's a scenario I think the current app should do in a transaction, and doesn't.

Right. But it does sound like a bug/problem in the current application; not something we should fix client-side in a "heavy client" by moving to "web transactions".

I am also thinking about how you do transactions in a database, that you generally have to start a transaction with a "BEGIN" and otherwise it's not in a transaction. I'm thinking we just model the API the same way, that it's not in a transaction unless it's explicitly called for.

I also think this entire transaction functionality can be deferred until a later version, as long as you're thinking about it with the current version so it's something that can be added later...

Right. Added that to the doc.

Regardless of whether the response generated by the server is a failure or a success, the session cookies should be updated on each request. The client must respect cookie updates regardless of the type of response.

Hmm. What if the same client is running multiple, parallel transactions? How would we handle race conditions here? Is it possible for the same session to have multiple sequences?

Good point, but it seems to work for PHP, RoR, ... I'll look around and try to find how others solve it. Maybe by opening a second session?

I think the general approach is a token sent in the body, not a cookie. The browser will send all cookies in any session... You can probably go to some extra steps to isolate sessions with curl, but I mostly just use the "cookiejar" in curl that makes it act like a browser here...

Actually, what I'm thinking of is this: http://guides.rubyonrails.org/security.html#replay-attacks-for-cookiestore-sessions ; the nonce there may prevent session replay. The token you're talking about seems to be one to prevent session hijacking. GitLab allows sending the token as a header or as a query parameter. I like that approach, because it's separate from the payload.
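
On the server side, accepting the token from either place could look roughly like the sketch below (Python, for illustration only). The PRIVATE-TOKEN header and private_token parameter are the names GitLab uses; whether we would adopt the same names, and letting the header win when both are present, are assumptions on my part.

```python
from urllib.parse import parse_qs, urlsplit


def extract_token(url, headers):
    """Return the token from the header if present, else from the query."""
    # HTTP header names are case-insensitive.
    for name, value in headers.items():
        if name.lower() == "private-token":
            return value
    values = parse_qs(urlsplit(url).query).get("private_token")
    return values[0] if values else None


from_query = extract_token("http://localhost/api/x?private_token=abc", {})
from_header = extract_token(
    "http://localhost/api/x?private_token=abc", {"Private-Token": "xyz"}
)
```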

IIRC, the Drupal Form API sets a "form-build-id" and a "form-token". The build-id essentially is the session/form sent to the browser, and the token is used to validate and detect replays.

Ok. But in a webservice situation, there's no form that's being sent from the application first.

[ snip ]

Yes it can... the dojo/date functionality works both ways -- I would suggest we deserialize to a JavaScript object in the store functions themselves, this works pretty well.

Well, agreed that at least it *used* to do it: in the dojo/data docs there's mention of *serializing* (but I couldn't find any mention of deserializing). In the new dojo/(d)store, there's nothing in the documentation that I could find. But, indeed, the only correct place to deserialize dates into date objects does seem to be in the stores.

http://dojotoolkit.org/reference-guide/1.10/dojo/date/locale/parse.html

Right. My point is not that it can parse, but that the stores *automatically* instantiate Date objects from their serialized form. It's that functionality that I'm looking for. If we need to write our own JSON store which takes advantage of the request return schema description to find the date fields and parse them into date objects, that's great and completely fine by me. I had hoped it already exists though.
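
The kind of schema-driven deserialization I mean, sketched in Python instead of as a Dojo store just to show the shape of it. The schema is reduced to a plain set of date field names and the "transdate" field in the example is illustrative; a real store would take this from the returned schema description.

```python
import json
from datetime import date


def load_with_dates(payload, date_fields):
    """Parse JSON and turn the named ISO-8601 date strings into dates."""
    def revive(obj):
        # Called by json.loads for every decoded JSON object (dict).
        for field in date_fields:
            if isinstance(obj.get(field), str):
                obj[field] = date.fromisoformat(obj[field])
        return obj

    return json.loads(payload, object_hook=revive)


record = load_with_dates('{"id": 7, "transdate": "2015-03-01"}', {"transdate"})
```

Fields that are not listed in the schema stay untouched, so a string that merely looks like a date is never converted by accident.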

NESTING OF RESOURCES
=====================

When obtaining a resource from the server, the serving webservice may include embedded in its response objects that it refers to; e.g. the server may decide to include address data included in a response to a query for a customer. The server isn't required to include more than just the key by which the resource can be queried out of the resource collection.

Nested resources in the URL space (such as the GitLab example with team members in a project [2]).

*** Nested resources like the GitLab example pollute the namespace, because there's a two way correspondence: users-in-project and projects-in-user. *** How to handle this in the way that creates the least complexity??? *** Presumably, we want things to be layered, building complex resources on simple ones; so it's problematic in the gitlab example to make the user aware of the projects... ***

We should support and default to "obvious" nested resources. e.g. line items on an invoice, payment lines, etc.

Do you mean that these nested resources should be made available at the URL level? Or simply *always* be embedded in the response object? Basically, I wasn't thinking of the journal lines as individual resources. I think the *journal* is the individual resource, with a number of lines "inside" it. Would it make more sense to you to make the individual lines into resources too? [I can see reasoning for that too, because it allows running queries on the journal-line resource and filter out all lines on e.g. a single account...]

Well... yes. I think this boils down to a question of "document database" or "relational database". Obviously, we're built on a relational database, and I've never truly warmed up to pure document/object storage, the "NoSQL" movement... At the same time, the structure of an invoice in LSMB is pretty well-defined, and doesn't vary much, so we can present the entire thing as a "document", even though the lines are themselves first-class objects.

Maybe this is just force of habit for me, and there may well not be any actual need for it, but I would think that pretty much anything that can be a line on a report or an invoice should be directly addressable. But maybe that's overkill?

I can see how that's useful for GL lines. I can see how that's useful for inventory movements. But I can't really see (yet) how that's useful for invoice lines which are already part of the GL lines information as far as the accounting impact is concerned and already part of the inventory movement information as far as the inventory impact is concerned. However, maybe it's better to just disclose all the information along the same design and not have a lot of variance so it's easy to write the services for them.

I've built one very complex system from scratch, and with that one I just made each level of the hierarchy extend the base data object class, and so I did essentially get the basic CRUD APIs for this for free, once I mapped my API layer to the data object -- about the only thing that needed attention at each level were the fields available for index queries -- and then the nesting issues we're discussing. I guess I didn't think that much about whether we *needed* that level of access (though it certainly helped when debugging).

Ok. That strategy works for me, especially if it reduces coding in the overall application.

I do think we should plan to allow the client to request what data to nest, perhaps either a custom header or a parameter (or both)? This would be one area that needs to be self-documenting, what resources can be excluded/included/expanded in which requests, and what is included by default.

I like that. I'll think about how we can model this.

As I think about it, I really only see two levels here: expanded, or condensed. Expanded, for an invoice, the response would include the customer record, each line item detail, each payment line. Condensed, it would only contain references to these other records, which would have to be retrieved separately if they don't exist.

How much deeper is useful to go?

Would we ever want to load the product from the line item? Perhaps, and then need to look up a pricegroup for a customer for the product... not exactly sure how this is currently modeled. But that really seems as complicated as this system gets. Oh, I guess there's entity/eca/contact method.

Right. So, if we talk about condensed versus expanded, we'd be talking about the first-level expansion only. I think that should work for most use-cases. If we need more, then we should probably be looking to solve that on a specific case-by-case basis. (Allowing N-level expansion would probably mean building a graph of interdependencies and teaching the service code to walk them... That sounds like an interesting problem, but not a way to achieve a solid webservice any time soon...)
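
To make "first-level expansion only" concrete, a small sketch in Python. The in-memory dictionaries stand in for the real resource collections, and all field names and values are invented for illustration.

```python
# Stand-ins for the customer and invoice-line resource collections.
CUSTOMERS = {12: {"id": 12, "name": "ACME"}}
LINES = {
    1: {"id": 1, "part": 55, "qty": 2},
    2: {"id": 2, "part": 56, "qty": 1},
}

# Stored form: the invoice only holds keys of the resources it refers to.
INVOICE = {"id": 100, "customer": 12, "lines": [1, 2]}


def render_invoice(invoice, expand=False):
    """Return the condensed invoice, or expand its direct references."""
    result = dict(invoice)
    if expand:
        # First level only: the "part" key inside each line stays a key.
        result["customer"] = dict(CUSTOMERS[invoice["customer"]])
        result["lines"] = [dict(LINES[key]) for key in invoice["lines"]]
    return result


condensed = render_invoice(INVOICE)
expanded = render_invoice(INVOICE, expand=True)
```

A client that needs the product behind a line would still fetch it separately by its key.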

Bye,

Erik.

http://efficito.com -- Hosted accounting and ERP.

Robust and Flexible. No vendor lock-in.