keyword extraction from a portion of html doc
cls
I'm very new to both Metatagger and TeamSite, but I have searched both forums for this question and studied the MT documentation, so please be kind. We're using TeamSite 6.1.0 SP2 and Metatagger 4.01. In the TeamSite Tag screen, we invoke Metatagger to extract a certain number of keywords from the document. If the document is an HTML document, we need Metatagger to examine only a portion of it, which is set off by comments (e.g. <!-- main body content --> or something similar). I think we need to write a CLT preprocessor, but we're not sure whether this is the best way. Is there a different mechanism? Has anyone done this?
Thank you for your input.
Comments
Migrateduser
Yes, you will need to write a preprocessor.
Capture the contents of the text you want to summarize (keyword extraction is a summarizer project) and write that targeted content back out to the -crackedText file, so that the summarizer sees only that section. If you have other processes also running on the document, move them to a second Content Processor for HTML so they are not limited to the targeted summarization text.
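For what it's worth, here is a rough sketch of the "capture the marked section" step in Python. It is not Metatagger code: the function names, the end marker ("end main body content"), and the way the input and output paths are passed in are all assumptions made for illustration. How Metatagger actually hands the -crackedText file to a preprocessor depends on your Content Processor configuration, so treat `preprocess` as a placeholder for that wiring.

```python
import re
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects visible character data, dropping tags, scripts and styles."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Join chunks with spaces so words in adjacent elements do not fuse,
        # then collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def extract_marked_text(html,
                        start_label="main body content",
                        end_label="end main body content"):
    """Return the plain text between the two HTML comments, or None.

    Both labels are assumed marker names; use whatever comments your
    templates really emit.
    """
    pattern = re.compile(
        r"<!--\s*" + re.escape(start_label) + r"\s*-->"
        r"(.*?)"
        r"<!--\s*" + re.escape(end_label) + r"\s*-->",
        re.DOTALL | re.IGNORECASE,
    )
    sections = pattern.findall(html)
    if not sections:
        return None
    collector = _TextCollector()
    collector.feed("\n".join(sections))
    collector.close()
    return collector.text()


def preprocess(html_path, cracked_text_path):
    """Hypothetical entry point: overwrite the cracked text with the section.

    Returns False and leaves the cracked text untouched when no markers
    are found, so unmarked documents are summarized as before.
    """
    with open(html_path, encoding="utf-8", errors="replace") as f:
        text = extract_marked_text(f.read())
    if text is None:
        return False
    with open(cracked_text_path, "w", encoding="utf-8") as f:
        f.write(text)
    return True
```

Leaving the cracked text alone when the markers are missing is just one possible choice; you may prefer to write an empty file so that unmarked pages get no keywords at all.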