Is there a way in Documentum (6.5 or later) to find documents with similar content? For example, two documents that differ in only a few sentences or terms, or only in their non-textual parts (pictures/charts). There are two tasks that require this:
- when new content is ready for check-in, quickly find similar documents in the repository and inform the user.
- a periodic cleanup process that runs through the docbase and reports groups of documents with a given degree of similarity.
I know that most full-text indexers provide something like this. There is a "MoreLikeThis" function in the Lucene API, and MS FAST reportedly builds something called a "Document Vector" for every document. So I was hoping there might be a way to get this kind of information from FTDQL?
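
To make "degree of similarity" concrete, here is a rough sketch of the kind of comparison I have in mind, using word shingles and Jaccard similarity in plain Python. This is not Documentum or FTDQL API; the function names, the shingle size and the 0.8 threshold are all made up for illustration, and it assumes the text of each document has already been extracted.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased, punctuation ignored)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Text shorter than k words: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0  # two empty texts are considered identical
    return len(a & b) / len(a | b)


def similarity(text_a, text_b, k=3):
    """Similarity in [0, 1] between two extracted document texts."""
    return jaccard(shingles(text_a, k), shingles(text_b, k))


def is_near_duplicate(text_a, text_b, threshold=0.8, k=3):
    """True if the two texts meet the (hypothetical) similarity threshold."""
    return similarity(text_a, text_b, k) >= threshold
```

Comparing every pair this way would obviously not scale to a whole docbase, which is why I would prefer to get the answer from the full-text index rather than compute it myself.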