[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Proposal for file attachment API



On Sun, Jul 3, 2011 at 11:28 PM, Luke <..hidden..> wrote:

> I was expecting that, yes.  However, I shouldn't.  My recent experience is
> with reasonably disciplined corporate users, who either get files from
> sources with likely to be unique names (some form of the vendor name and
> vendor's ID), create files for customers/vendors with names of the same
> type, or are good at storing files with rather long, descriptive, and
> accidentally unique names.

Are we only worried about corporate users here?  What about small
business owners?
>
> However, if we combine our two ways of looking at this, I think we have
> the solution.
>
> If you store files by ID, and internally reference them by ID at all
> times, they can all be called "foobar.pdf" and it doesn't matter.

Ok, so which foobar.pdf attached to a sales order is which one?  i'd
rather have some sort of unique constraint on the file name even if it
isn't global.  My current thinking is a uniqueness between the primary
attachment point and the file name.  So you can only have one
"contract.doc" attached to a given sales order.  However, if you have
a secondary link there (to be implemented in the future) to the sales
invoice, the user could see "contract.doc" linked from "sales order
12345."  There could only be one "contract.doc" linked from that sales
order to that invoice though there could be additional contract.doc's
from other sales orders onto the same invoice.

>
> When a new file is uploaded, compare its CRC/checksum to the index of
> stored files.  If there's no match, it's a new file, gets a new ID, and we
> save it. If the checksum matches, compare the contents to the N files with
> the match.  If one of them matches, create a link to the existing copy, by
> inserting a referential entry in whatever table is tracking where files
> are attached. If none of them matches, it's a new file.

I am still worried about handling naming collisions in a way that is
user-transparent.
>
> I'm pushing this, because I think it's more extendable, and it also leads
> directly to what Erik wanted.
>
> If you divorce the storage of files, and the way they are tracked, from
> the documents to which they are attached, you get a true virtual
> filesystem.

If you are going to do a true virtual filesystem I think you'd have to
store a lot more information about each file to avoid the file
collision issue.  I personally would get cranky if I had to look
through 34 foobar.pdf's to figure out which one was the one I wanted.
Traditional filesystems get around this problem by requiring unique
paths.  We could do the same here but either the paths end up
meaningless or they end up impossible to enforce against different
database relations.

So the question I have is that if we go this way, how do we handle
that?  Over what domains do we require unique file names, and how do
we expose this to the user?

> Any document can point to any file(s), and any file can be pointed to by
> one/some/no documents.
>
> Associations can be re-mapped after file storage (this assumes a file
> management UI at some point), which is necessary for Erik's suggestion.

The second addition I was thinking about a different way of attaching
secondary linking.

CREATE TABLE secondary_file_mapping AS (
file_id int not null,
ref_key int not null
attach_class int not null references file_class(id)
);

CREATE TABLE attach_tx_file_to_oe AS (
FOREIGN KEY file_id REFERENCES file_transaction(id),
FOREIGN KEY ref_key REFERENCES oe(id),
CHECK (attach_class = 2)
) INHERITS (secondary_file_mapping);


CREATE TABLE attach_oe_file_to_tx AS (
FOREIGN KEY file_id REFERENCES file_order(id),
FOREIGN KEY ref_key REFERENCES transaction(id),
CHECK(attach_class = 1)
) INHERITS (secondary_file_mapping);

CREATE TABLE attach_oe_file_to_oe AS (
FOREIGN KEY file_id REFERENCES file_order(id),
FOREIGN KEY ref_key REFERENCES oe(id),
CHECK (attach_class = 2)
) INHERITS (secondary_file_mapping); -- for order consolidation, etc

This gets more complicated than your solution, Luke and that's a
legitimate question of tradeoffs.  But I am wondering if a real
virtual filesystem is desired if it would be better to use a mature
document management solution instead and use the proposed x-url
approach.

Note here you can get all files by "select  * from file_attachment"
and all secondary links by "select * from secondary_file_mapping."

I don't currently see a way around the naming collision issues across
wide ranges of users which strike me as very manageable, but I'd be
interested in hearing suggestions.

Best Wishes,
Chris Travers