[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Proposal for file attachment API

Subject: Re: Proposal for file attachment API
From: Luke <..hidden..>
Date: Mon, 4 Jul 2011 15:33:19 -0400 (EDT)

On Mon, 4 Jul 2011, Chris Travers wrote:

On Sun, Jul 3, 2011 at 11:28 PM, Luke <..hidden..> wrote:

I was expecting that, yes.  However, I shouldn't.  My recent experience is
with reasonably disciplined corporate users, who either get files from
sources with likely to be unique names (some form of the vendor name and
vendor's ID), create files for customers/vendors with names of the same
type, or are good at storing files with rather long, descriptive, and
accidentally unique names.


Are we only worried about corporate users here?  What about small


Not at all.  I said I was wrong to expect that.:)

However, if we combine our two ways of looking at this, I think we have
the solution.

If you store files by ID, and internally reference them by ID at all
times, they can all be called "foobar.pdf" and it doesn't matter.


Ok, so which foobar.pdf attached to a sales order is which one?  i'd
rather have some sort of unique constraint on the file name even if it
isn't global.  My current thinking is a uniqueness between the primary
attachment point and the file name.  So you can only have one
"contract.doc" attached to a given sales order.  However, if you have

Exactly. Enforce name uniqueness at the document level. "there isalready a file of that name attached to this document."

You're talking about doing it in the database, which takes us back to somestructure involving a table for each document class. The way I wasthinking about (assuming the routine logic would take care of managing it)is probably too similar to how 1.2 would have done it, but your suggestionbelow *seems* at first look to get the job done.

a secondary link there (to be implemented in the future) to the sales
invoice, the user could see "contract.doc" linked from "sales order
12345."  There could only be one "contract.doc" linked from that sales
order to that invoice though there could be additional contract.doc's
from other sales orders onto the same invoice.

When a new file is uploaded, compare its CRC/checksum to the index of
stored files.  If there's no match, it's a new file, gets a new ID, and we
save it. If the checksum matches, compare the contents to the N files with
the match.  If one of them matches, create a link to the existing copy, by
inserting a referential entry in whatever table is tracking where files
are attached. If none of them matches, it's a new file.


I am still worried about handling naming collisions in a way that is
user-transparent.


Here's what I'm envisioning...

There are two public facing database routines which insert files, and onewhich associates them with documents.


The first, inserts a file, and associates it with a document:

* If the name of the uploaded file is the same as one associated with thisdocument, throw an error.* It does the checksumish comparison that I spoke of above, withoutreference to name. If there's a match, create a link instead of storing;if not, store the file.


The second, stores the file with no document:

* It does the checksumish comparison that I spoke of above, withoutreference to name. If there's a match, create a link instead of storing;if not, store the file.


Association function:
* Verify document exists.
* Verify file exists.

* If the name of the pre-uploaded file is the same as one associated withthe passed in document reference, throw an error, or let the user renamethat particular link.


Obviously, there's crossover and so private function opportunity in that.
However, it's the basic form of what I was thinking.

I'm pushing this, because I think it's more extendable, and it also leads
directly to what Erik wanted.

If you divorce the storage of files, and the way they are tracked, from
the documents to which they are attached, you get a true virtual
filesystem.


If you are going to do a true virtual filesystem I think you'd have to
store a lot more information about each file to avoid the file
collision issue.  I personally would get cranky if I had to look
through 34 foobar.pdf's to figure out which one was the one I wanted.

If most files are attached to documents [assumption], you won't have tolook through 34: look for the one associated with the document associatedwith your customer of interest, or similar. It becomes a reporting issue.

If you're looking through unattached uploades, maybe you can narrow byupload date, sort in some meaningful way (newest to oldist), or in someother way prioritize to help you find it quickly.


Also, one hopes that the descriptions would help.

Yes, it *could* be a problem, but with reasonable common sense behavior,it probably won't be in most cases.


 > Traditional filesystems get around this problem by requiring unique

paths.  We could do the same here but either the paths end up
meaningless or they end up impossible to enforce against different
database relations.


I don't think any path-like construct will work well.

The only thing I see as needing to be unique, is file contents. Namesonly matter at the document level. If users don't use good descriptions,they bring confusion upon themselves; and most files attached to documentswill probably be self-explanatory anyway (I hope that's not toooptimistic).

Hmm, if you did have to use paths, the vendor or customer name might be agood left hand portion, although I still don't like paths.

So the question I have is that if we go this way, how do we handle
that?  Over what domains do we require unique file names, and how do
we expose this to the user?

Well, because of what Erik wants (and I can see value in it), we willeventually need some sort of way to look at all files in overview--a filemanager. It will have to display the name, the description, and theassociated document(s), minimally.

Thinking about that, I am picturing a files item off the main menu. Itwould have a generic "add file" option, and a "reports" option.

The report could show the data I spoke of for files selected in some way(by upload date, by size, by type, by customer, by documents attached to,whatever).It would return a list of files, similar to a list of orders, including(usually/by default), the file name, description, and document(s) to whichit's attached.

A file object, once drilled down to, could have the ability to assign itto new/other documents, or remove existing assignments, if the user'spermissions were high enough.

If it's assigned to nothing, it could be deleted entirely.

Any document can point to any file(s), and any file can be pointed to by
one/some/no documents.

Associations can be re-mapped after file storage (this assumes a file
management UI at some point), which is necessary for Erik's suggestion.


The second addition I was thinking about a different way of attaching
secondary linking.

CREATE TABLE secondary_file_mapping AS (
file_id int not null,
ref_key int not null
attach_class int not null references file_class(id)
);

CREATE TABLE attach_tx_file_to_oe AS (
FOREIGN KEY file_id REFERENCES file_transaction(id),
FOREIGN KEY ref_key REFERENCES oe(id),
CHECK (attach_class = 2)
) INHERITS (secondary_file_mapping);

CREATE TABLE attach_oe_file_to_tx AS (
FOREIGN KEY file_id REFERENCES file_order(id),
FOREIGN KEY ref_key REFERENCES transaction(id),
CHECK(attach_class = 1)
) INHERITS (secondary_file_mapping);

CREATE TABLE attach_oe_file_to_oe AS (
FOREIGN KEY file_id REFERENCES file_order(id),
FOREIGN KEY ref_key REFERENCES oe(id),
CHECK (attach_class = 2)
) INHERITS (secondary_file_mapping); -- for order consolidation, etc

This gets more complicated than your solution, Luke and that's a
legitimate question of tradeoffs.  But I am wondering if a real

Yeah. I see what you're doing. I don't know enough about advancedPostGreSQL to know if it has to be done that way or not. It seems thatthis assumes some modifications (restriction removal) to your tables fromthe other day--it might be useful to see the whole file with these added.

virtual filesystem is desired if it would be better to use a mature

I may have thrown around the term a bit too loosely. This is a limitedvirtual FS--virtual only in the sense that file contents is not linked toname or location [association], and the file can be pointed to by anynumber of equally valid link points.


 > document management solution instead and use the proposed x-url

I started trying to look at those, when I saw your original proposal to meand Alvin Starr for doing the original form of this. That's why I didn'trespond right away: I wanted to see if I could find some document managerwhich would fit the bill. To date, I haven't. User management is a bigsticking point.

Note here you can get all files by "select  * from file_attachment"
and all secondary links by "select * from secondary_file_mapping."

I don't currently see a way around the naming collision issues across
wide ranges of users which strike me as very manageable, but I'd be
interested in hearing suggestions.


The best I have right now is above.

Luke

Follow-Ups:
- Re: Proposal for file attachment API
  - From: Chris Travers

References:
- Proposal for file attachment API
  - From: Chris Travers
- Re: Proposal for file attachment API
  - From: Erik Huelsmann
- Re: Proposal for file attachment API
  - From: Chris Travers
- Re: Proposal for file attachment API
  - From: Luke
- Re: Proposal for file attachment API
  - From: Chris Travers
- Re: Proposal for file attachment API
  - From: Luke
- Re: Proposal for file attachment API
  - From: Chris Travers

Prev by Date: LedgerSMB 1.2.23 Available
Next by Date: New bugs filed
Previous by thread: Re: Proposal for file attachment API
Next by thread: Re: Proposal for file attachment API
Index(es):
- Date
- Thread