Re: Proposal for file attachment API
- Subject: Re: Proposal for file attachment API
- From: Luke <..hidden..>
- Date: Wed, 6 Jul 2011 01:45:48 -0400 (EDT)
On Tue, 5 Jul 2011, Chris Travers wrote:
> I guess there is one thing that's bothering me about this discussion,
The thing that bothers me (slightly) is that you and I seem to be the only
people with opinions on this.
> and it's worth bringing up. I am not aware of a single filesystem
> which attempts to enforce uniqueness on file data. I would think if
> it was a significant problem, it would have been tackled there first.
My reasoning was this:
If contents are not unique, or at least probably unique, you will
probably end up with many copies of a single document being attached to,
for example, a quote, an order, an invoice, possibly a purchase order, and
maybe a payment (just for a most extended case).
I thought that was bad, initially, from a data storage perspective.
Storing files in a database used to be considered a generally bad idea, but
if you're going to do it, it seems wise not to use the storage
wastefully: waste has performance effects, and possibly complicates
cleanup, data recovery, repairs, etc.
The second reason for jumping to uniqueness of contents under the
multiple-links system was so that a user could upload a file without
knowing whether it was already on-system. If the contents were already in
the repository, a link would be created instead, and the upload thrown
away.
As a user myself, I can attest that users are lazy. It may be easier to
re-upload a file ten times than to go hunting for the already uploaded
copy ten times.
That doesn't mean the software should have to maintain ten copies.
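The upload-and-discard behaviour described above can be sketched roughly as
follows. This is a minimal in-memory illustration of content-based
deduplication; the class and method names (FileStore, attach) are mine, not
part of the proposal:

```python
# Sketch of content-based deduplication: identical uploads are stored
# once, and each attachment is just a link to the stored contents.
import hashlib

class FileStore:
    def __init__(self):
        self.blobs = {}   # sha256 hex digest -> file contents
        self.links = {}   # (document_id, filename) -> digest

    def attach(self, document_id, filename, contents):
        """Store contents once; re-uploads of identical data become links."""
        digest = hashlib.sha256(contents).hexdigest()
        if digest not in self.blobs:
            self.blobs[digest] = contents           # first upload: keep the bytes
        self.links[(document_id, filename)] = digest  # always record the link
        return digest

store = FileStore()
d1 = store.attach("quote-1", "contract.pdf", b"standard contract text")
d2 = store.attach("invoice-7", "contract.pdf", b"standard contract text")
assert d1 == d2                 # same contents, same fingerprint
assert len(store.blobs) == 1    # only one physical copy kept
assert len(store.links) == 2    # but two attachments
```

The user re-uploads without checking, the software notices the duplicate
and throws the second copy away.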
The third reason why I don't like multiple copies of the same document,
although this is probably more of an argument for the linking system, is
the case of contract law.
Hypothetical:
Manufacturing makes an agreement with a customer, and attaches the
contract to a quote.
They email it to Joe in accounting for his approval.
He suggests some changes by altering the document and sending it back, but
they tell him to go ahead as-is.
He puts through an order and invoice to the customer, attaching the file.
Only what he attaches is, accidentally, his munged version.
That'll probably become the legally binding version of the agreement.
I imagine iterations on that kind of case, and my thought for how to
limit it in software is to let the order Joe creates link back to the
original contract attached to the quote.
That in and of itself does not require unique contents, but it does
require the linking scheme. If you're going to do the linking scheme, it
seems a small step to make it global, which leads to probably unique
contents.
(I suppose if you really want to limit that case, we should have file
name uniqueness at the customer or vendor level, but...oy.)
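The global linking scheme amounts to two relations: one row per unique file
body, any number of attachment rows pointing at it. Here is a rough
relational sketch, using sqlite3 only so the example runs anywhere; the
table and column names are hypothetical, not taken from any proposal:

```python
# One row per unique file body (fingerprinted), many attachment rows
# linking documents of any type to that body.
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE file_content (
        content_id INTEGER PRIMARY KEY,
        sha256     TEXT NOT NULL UNIQUE,  -- fingerprint enforces uniqueness
        body       BLOB NOT NULL
    );
    CREATE TABLE file_attachment (
        document_type TEXT NOT NULL,      -- 'quote', 'order', 'invoice', ...
        document_id   INTEGER NOT NULL,
        filename      TEXT NOT NULL,
        content_id    INTEGER NOT NULL REFERENCES file_content(content_id),
        PRIMARY KEY (document_type, document_id, filename)
    );
""")

body = b"standard contract text"
digest = hashlib.sha256(body).hexdigest()
conn.execute("INSERT INTO file_content (sha256, body) VALUES (?, ?)",
             (digest, body))
cid = conn.execute("SELECT content_id FROM file_content WHERE sha256 = ?",
                   (digest,)).fetchone()[0]

# The same contract, attached along the whole accounting chain:
for doc_type, doc_id in [("quote", 1), ("order", 5), ("invoice", 9)]:
    conn.execute("INSERT INTO file_attachment VALUES (?, ?, ?, ?)",
                 (doc_type, doc_id, "contract.pdf", cid))

assert conn.execute("SELECT count(*) FROM file_content").fetchone()[0] == 1
assert conn.execute("SELECT count(*) FROM file_attachment").fetchone()[0] == 3
```

Three attachments, one stored copy; Joe's order can link back to the exact
bytes attached to the quote.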
But, let's consider no unique contents, and no linking.
You may end up with at least three copies of each file on the
system. With lots of files, the storage requirements for that are going
to get absurd.
I, at least, run LSMB in virtual machines. I don't always grant them a
huge amount of space, or a huge amount of memory.
I conserve where I can, and if I don't have to duplicate every 1-5 meg
proposal or file 2-3 times per customer, I don't want to.
Consider a company which attaches a standard contract to every order, or a
standard SomethingOrOther.
This probably shouldn't be used for that, but there's a good chance it
will be.
(Realistically, if that was the system in use, I would avoid passing
the attachment along the accounting chain like that--I'd put it in the
first document (quote or order), and refer back to it when I had to. But
I'm not everyone.)
N.B. File systems do not require this kind of uniqueness, but the ones
which assume a level of intelligence in their users do make it possible,
via various kinds of links.
If you're really very uncomfortable with it, I'm certainly not going to
insist upon it, but I do think it makes for a better system in the long
run, if we try to minimize the number of copies of files as much as
possible, but maximize the number of documents they can be attached to.
It's the virtual names (i.e. multiple linking) I most wanted, and the
uniqueness of contents was a by-product idea that seemed good in
retrospect from a storage perspective.
I do like your source document reference plan though.
I suppose I am viewing files as their own documents in all this,
attachable to anything that supports it.
> wondering if the relational model is really well suited for this
To my mind, it's the only model that's perfect for it, perhaps sans the
primary key issue.
The first virtual FS I ever worked with was a PostgreSQL-backed one,
although it used non-DB storage.
In it, any number of paths and names could point to each file.
> problem. After all we are talking about a huge natural primary key so
> however we go about enforcing that, there will be a substantial
> performance cost.
I really wish we could find a way not to use that primary key, or to
derive a unique short form, so we don't have that problem.
You have a point about checksums, but there ought to be a way to
fingerprint a file and do comparisons on that basis.
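One way to sketch that short form: derive a fingerprint from the length
plus a cryptographic hash, compare on the fingerprint first, and only fall
back to a full byte comparison when the fingerprints match. This is a
minimal illustration under that assumption, not anything from the proposal:

```python
# Derive a short unique form of file contents so comparisons (and a
# primary key) need not touch the full body.
import hashlib

def fingerprint(data: bytes) -> tuple:
    """Short form of the contents: (length, SHA-256 hex digest)."""
    return (len(data), hashlib.sha256(data).hexdigest())

def same_contents(a: bytes, b: bytes) -> bool:
    # Cheap fingerprint comparison first; the full byte comparison runs
    # only on a fingerprint match, guarding against hash collisions.
    return fingerprint(a) == fingerprint(b) and a == b

assert same_contents(b"proposal v1", b"proposal v1")
assert not same_contents(b"proposal v1", b"proposal v2")
```

The fingerprint is a few dozen bytes regardless of file size, so indexing
on it avoids the performance cost of a huge natural key.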
> I agree with that, but if we have to... Do we have other tables in 1.3
> where that's the case?
Yes.
note (abstract table, no access)
entity_note (notes for companies and people)
eca_note (notes for customer/vendor credit agreements).
All could conceivably be queried together with:
select * from note, but to insert you must insert into the appropriate
subclassed table.
There is no attempt to enforce uniqueness of note contents.
Haha!
(Although, if notes were likely to be several hundred K to a few meg each,
someone would probably suggest it.:))
Luke