Categorization Data in Documents; a Basic Processing Need? (No replies)
I remember discussions back in the early 2000's around whether document metadata should be held in the EDM system or go with the document. We went back and forth about this item many times. Since EDM systems already held this information, holding the data in the system became the norm.
Since then it has become apparent to me that there is a need to hold this data in both places – not just one. Capturing standard document metadata (eg. categorization, version, approved date, author, language) need only occur once in the originating system. If standard metadata (like say, the EDM Submission Reference Model) are passed with the document, then re-entering that data no longer becomes necessary. The potential performance gains from using embedded data in documents, simply for metadata and categorizing the documents, can be quite significant.
The typical situation, for submission documents, is when an EDM system is used to author submission documents, then an electronic Publishing system is used to generate the submissions onto a file share, and then these submitted documents are placed back into an EDM system. It is seldom the case that the EDM authoring system, the Publishing system, and the submission archive system are all from the same vendor and/or all use the same categorization of documents.
Typically today, only the authoring side appropriately categorizes documents. This is typically not done on the Submission output side (sometimes called Submission Archive). This means that a person looking for submitted information usually has to rely on the submission structure to find documents and not a standard search (which would presume the documents are categorized).
It would seem that the EDM Submission Reference Model should be appropriate for both source and submitted documents.
From a business process perspective, we should be able to locate a document, using the same search metadata and categorization, at any stage of the document’s lifecycle. I would be interested to know:
- Who, today, can use the same categorization search criteria to find a source and submitted document (would this categorization be the EDM Submission Reference Model?).
- Do vendors support this situation?
- Should this be something industry should consider and/or push for?
There are a number of projects to add structured data into the content of documents, however a more basic need might be to embed standard document metadata (especially categorization data) into each document. This would provide the means to pass the document from one process step to another keeping the metadata with the document, allowing that data to be automatically utilized and/or added to a different system.