Metatagger or Lucene
System
Firstly, I know they do different things :-)
From my understanding of these products, Lucene can generate a summary of the page using keywords. This can be improved by using a MetaTagger file to supplement its index; the value add comes from the taxonomy and other rules MetaTagger employs.
What I would like to know is: why wouldn't you use MetaTagger to generate keywords and summaries and place them within the HTML document, e.g. within the <meta> tags? Lucene could then index the complete HTML document without having to integrate the additional MetaTagger index.
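To make the idea concrete, here is a minimal sketch in plain Java of what "placing them within the <meta> tags" could look like. The class and method names (MetaTagEmbedder, embed) are made up for illustration, the keywords and summary are assumed to have already been produced by MetaTagger, and using name="description" for the summary is my assumption, not something either product requires.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: not part of MetaTagger, TeamSite, or Lucene.
final class MetaTagEmbedder {
    // Matches an opening <head> tag (with or without attributes), but not <header>.
    private static final Pattern HEAD_OPEN =
            Pattern.compile("<head(\\s[^>]*)?>", Pattern.CASE_INSENSITIVE);

    // Escapes characters that would break a double-quoted HTML attribute value.
    static String escapeAttr(String s) {
        return s.replace("&", "&amp;")
                .replace("\"", "&quot;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }

    // Returns the HTML with keyword and summary <meta> tags inserted right
    // after the opening <head> tag. HTML without a <head> is returned unchanged.
    static String embed(String html, List<String> keywords, String summary) {
        String tags = "\n<meta name=\"keywords\" content=\""
                + escapeAttr(String.join(", ", keywords)) + "\">"
                + "\n<meta name=\"description\" content=\""
                + escapeAttr(summary) + "\">";
        Matcher m = HEAD_OPEN.matcher(html);
        if (!m.find()) {
            return html;
        }
        return html.substring(0, m.end()) + tags + html.substring(m.end());
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Demo</title></head><body>Hello</body></html>";
        System.out.println(embed(html, Arrays.asList("cms", "search"), "A short demo page."));
    }
}
```

Lucene would then only see this extra information if the HTML document handler in use actually extracts <meta> tags into fields.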
Thanks,
Alex
Comments
Migrateduser
Skaj,
We had a DevNet webcast on this last week - you can get the recording from the site:
http://devnet.interwoven.com/site.fcgi/webcasts/data/005/11-ContentSearch-MetaTagger/DevNet%20MetaTagger%20Search%20V2.pdf
It really depends on which org.apache.lucene.document implementation you are using, since that defines which fields are extracted and indexed, and how. For the DevNet webcast we focused on a more radical approach: index the metadata record, not the HTML. That way you can do fielded searches with Lucene and have fine-grained control over what gets indexed. To summarize, the process was:
HTML --> MT (using kw, summary, and 2 recognizers for topic and thesaurus) --> XSL --> Lucene.
Lucene hits would then point back to the original HTML docs.
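As a rough illustration of the "index the metadata record, not the HTML" idea, here is a toy fielded index in plain Java. It is a stand-in for Lucene, not the Lucene API and not the code from the webcast: the class name MetadataIndexSketch, the file paths, and the sample values are all made up, and the field names simply mirror the MT outputs listed above (keywords, summary, topic, thesaurus). Each record is indexed per field, and every hit is the path of the original HTML document.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy stand-in for a fielded Lucene index; all names here are hypothetical.
final class MetadataIndexSketch {
    // field name -> term -> paths of the HTML docs whose metadata record contains that term
    private final Map<String, Map<String, Set<String>>> index = new HashMap<>();

    // Indexes one metadata record (field name -> field text) for the given HTML document.
    void add(String htmlPath, Map<String, String> record) {
        for (Map.Entry<String, String> field : record.entrySet()) {
            Map<String, Set<String>> terms =
                    index.computeIfAbsent(field.getKey(), k -> new HashMap<>());
            // Lowercase and split on anything that is not a letter or digit.
            String[] tokens = field.getValue().toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+");
            for (String term : tokens) {
                if (!term.isEmpty()) {
                    terms.computeIfAbsent(term, k -> new TreeSet<>()).add(htmlPath);
                }
            }
        }
    }

    // Fielded search: returns the HTML paths whose record has the term in that field.
    Set<String> search(String field, String term) {
        return index.getOrDefault(field, Collections.emptyMap())
                    .getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        MetadataIndexSketch idx = new MetadataIndexSketch();

        Map<String, String> record = new HashMap<>();
        record.put("keywords", "content management, search");
        record.put("summary", "How metadata records can drive search.");
        record.put("topic", "Search");
        idx.add("/site/docs/overview.html", record); // hypothetical path

        // A search restricted to the "topic" field; the hit is the original HTML doc.
        System.out.println(idx.search("topic", "search"));
    }
}
```

With real Lucene, the XSL step would produce the input for whatever Document builder you use, and the HTML path would be stored as a field on each document so hits can link back to it.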
Regards,
Clark
Edited by lissa on 03/18/05 11:59 AM (server time).