Manually Adding Keywords
seeDerekNow
Is there a way to enter keywords manually through the MetaTagger GUI? From what I've seen, you can only select keywords that are based off of your vocabulary. Any suggestions?
Comments
Ottawa_IWOV
Not sure if I completely understand. Keywords are generated from content that is passed to the MetaTagger engine. Keywords are not pulled from controlled vocabularies.
Lucas Cochrane
lcochrane@deloitte.ca
Adam Stoller
MetaTagger provides for both automated and manual entry fields. The manual entry fields are just like regular-old DCT fields (e.g. text, textbox, etc.) - the automated fields for things like categorization tend to be large selection lists.
I don't remember, but I don't think that MT allowed you to intermix manual and automated text entry in the same field (I could be wrong though, I only played with MT once for about a month)
--fish
Senior Consultant, Quotient Inc.
http://www.quotient-inc.com
seeDerekNow
Is there a way to switch to a manual entry mode for keywords? I was unable to find out through the MT documentation. Can you direct me to some available resources?
Migrateduser
The terms you are referring to are terms from controlled vocabularies, which represent the "official" forms of terms and their variations. For instance, a product for your company may have an official name but also a shorter name and a former name. When you assign product tags to your document, you want to know that any one of these forms refers to the same product. As an example, MetaTagger and MT are two separate ways to refer to the product MetaTagger, and they both occur in Interwoven literature. We can put both of these forms of the name in a controlled vocabulary which defines all of the products for Interwoven. If we were to assign product names to Interwoven documents, when the string "MT" occurred, we would want to assign the category "MetaTagger". This is accomplished by building a Recognizer with that Interwoven Product Names vocabulary.
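To make the idea concrete, here is a minimal Python sketch of mapping name variants to one official term. It is only an illustration of the concept: it is not how a MetaTagger Recognizer is implemented, and the `VOCABULARY` dictionary and function names are made up for this example.

```python
import re

# Hypothetical vocabulary: each official term maps to its accepted variants.
# Only the MetaTagger/MT pair comes from the post above.
VOCABULARY = {
    "MetaTagger": ["MetaTagger", "MT"],
    "TeamSite": ["TeamSite"],
}


def build_lookup(vocabulary):
    """Invert the vocabulary so each variant points at its official term."""
    lookup = {}
    for official, variants in vocabulary.items():
        for variant in variants:
            lookup[variant] = official
    return lookup


def assign_categories(text, vocabulary):
    """Return the official terms whose variants occur in the text (case-sensitive)."""
    lookup = build_lookup(vocabulary)
    found = []
    for token in re.findall(r"[A-Za-z0-9]+", text):
        official = lookup.get(token)
        if official is not None and official not in found:
            found.append(official)
    return found


print(assign_categories("MT works with TeamSite", VOCABULARY))
```

Whichever variant appears in the text, the document ends up tagged with the same official term.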
When a user gets a list of terms suggested by MetaTagger, s/he can either accept the list, delete some or all of the terms, or navigate through the controlled vocabulary and add one or more terms from that controlled vocabulary. You cannot add your own terms to a list that is generated from a controlled vocabulary.
However, if you are generating "keywords" (which are significant words found in the text) using MetaTagger, you can manually add additional keywords into that list. MetaTagger generates keywords by identifying the important words that actually occur in the document, using an indexed document collection as a reference. This is in contrast to assigning categories from a controlled vocabulary. You can get details on how to implement this in the MetaTagger Admin Guide, in the chapter "Building a Summarizer".
seeDerekNow
Thanks. I did some reading up on the summarizer in the MT Admin Guide. However, I was not able to find anything on how to generate keywords. Can anyone point me to some good material on how to do this?
Migrateduser
Hi,
MetaTagger ships with an out-of-the-box keyword/keyphrase generator. In the Admin UI, click on "Configuration", then click on "Show Summary." Scroll down and you should see a "Keywords" category. If you are using MetaTagger with TeamSite, it is possible that your datacapture.cfg file has not had a parallel item created to take advantage of this. Look at metatagger/examples/configs/datacapture.cfg.mt.example for an <item/> element that you can cut and paste into your datacapture.cfg. You will have to run iwreset -ui after making such a change.
To check to see if this is working before making a change in your datacapture file, have metatagger generate metadata for a document outside of the TeamSite context, as follows:
iwgenmetadata -save -suggest MyDocument.doc
where MyDocument.doc is an example document in the current working directory for which you would want keywords generated. A resulting "MyDocument.imd" file will be created in that same directory. Open it up in a text editor. It will be in XML format. Look for something like this:
<attribute>
<name>keywords</name>
<value>eurocontrol, prnewswire, air, traffic control, air traffic control, air navigation</value>
</attribute>
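If you would rather check the result from a script than by eye, something like the following Python sketch could pull the keyword list out of that XML. It assumes only the `<attribute>`/`<name>`/`<value>` structure shown above; the rest of the .imd layout is not known here, so the sample string stands in for the real file and the function name is made up.

```python
import xml.etree.ElementTree as ET

# Stand-in for the relevant part of a .imd file, copied from the snippet above.
SAMPLE = """<attribute>
<name>keywords</name>
<value>eurocontrol, prnewswire, air, traffic control, air traffic control, air navigation</value>
</attribute>"""


def extract_keywords(xml_text, attribute_name="keywords"):
    """Return the comma-separated values of the named <attribute> as a list."""
    root = ET.fromstring(xml_text)
    # iter() also yields the root itself, so this works whether <attribute>
    # is the top-level element or nested inside a larger document.
    for attribute in root.iter("attribute"):
        if attribute.findtext("name", default="").strip() == attribute_name:
            value = attribute.findtext("value", default="")
            # Collapse stray whitespace or line wraps inside each keyword.
            keywords = [" ".join(part.split()) for part in value.split(",")]
            return [keyword for keyword in keywords if keyword]
    return []


print(extract_keywords(SAMPLE))
```

An empty list means no keywords attribute was found, which would suggest the Keywords category is not configured.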
MetaTagger suggests keywords and keyphrases for an inputted document by referencing a previously indexed document collection. The Summarization chapter describes how to do this, but here is a brief set of instructions.
If for some reason you are missing the Keywords category in your Admin GUI, look to see if the summidx* db files are in metatagger/conf. If they are, you can add the keywords category back into metatagger.cfg and simply use the out-of-the-box index to generate your keywords. In the Admin GUI, add a new Index entry. It should be of type "TABLE_ENTRY". The lexicon should be "summidx" and the script should be as follows:
tokenize -stopwords english_stopwords.txt -ignorenumbers -ignorepunctuation; splitter; phrase; tagger; summarize -keyphrases 3 -keywords 3
Make sure there are no line breaks in the script.
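If the script picked up line breaks when you copied it (from an email or a wrapped terminal, say), a small Python helper like this sketch can flatten it back to one line before you paste it into the Admin GUI. The helper name is made up; the script text is the one given above.

```python
def to_single_line(script):
    """Collapse all runs of whitespace, including line breaks, into single spaces."""
    return " ".join(script.split())


# The same script as above, but wrapped across several lines.
wrapped = """tokenize -stopwords english_stopwords.txt -ignorenumbers
-ignorepunctuation; splitter; phrase; tagger;
summarize -keyphrases 3 -keywords 3"""

flattened = to_single_line(wrapped)
print(flattened)
```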
If you don't have the summidx* db files anymore, follow the steps below to create your own index:
Step 1: Create an indexed document collection
1.1 Gather together a collection of documents that are similar to the documents you want to generate keywords for.
1.2 Assuming that you want keyphrases (multiple tokens in a string, such as "The White House") in addition to keywords, you will need to run two indexing passes. The first is for phrase training, the second is for creating the final index. You will need a separate configuration file for each pass. You can find examples in metatagger/examples/configs. Make a copy of create_phrase_index.cfg.example and edit it so that it contains the proper path to your documents and includes the file extensions of your documents. Also make sure that you change the value of the <lexicon> element to be the root name you want for your index (e.g. pressreleases or marketingDocs).
1.3 Run your first indexing pass with the following command, using your edited create_phrase_index.cfg file:
iwgenindex -config edited_create_phrase_index.cfg -verbose
You should see the list of documents scrolling by. Make sure you have indexed the documents by running the following command:
iwqueryindex -db YOUR_LEXICON_NAME
This will give you information about the number of documents indexed.
1.4 Prepare the second configuration file. Make a copy of metatagger/examples/configs/trainphrases.cfg.example. Edit the <lexicon> element so that it has the same value as the <lexicon> element in step 1.2. Change the path statements and file extensions so that they are exactly the same as in the previous configuration file that you edited.
1.5 Run your second indexing pass with the following command, using your edited trainphrases.cfg file:
iwgenindex -config edited_trainphrases.cfg -verbose
1.6 Check to make sure you have a good index:
iwqueryindex -db YOUR_LEXICON_NAME
Step 2. Log in to the Admin GUI and add a "Keywords" category entry. Do this by selecting "Configuration", then "Modify Index Entries", then "Add". Give it the tag and title you want. Select "TABLE_ENTRY" as the type. The lexicon should be the root name of the index files you created in steps 1.3 and 1.5. The rack script should be the one referenced above.
Step 3. Check to make sure it's working by running iwgenmetadata -save -suggest MYDOCUMENT.DOC
Step 4. Add a matching <item> element to datacapture.cfg. Note that the "name" attribute of the datacapture.cfg <item> element must *exactly* match the value of the "tag" element that you entered in the Admin GUI.
Step 5. Run iwreset -ui.
You should be good to go.
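Since the exact-match requirement in Step 4 is easy to get wrong (case counts), here is a rough Python sketch that checks whether an <item> with the expected name exists. The sample fragment and its `<config>` wrapper element are invented for illustration and are not a real datacapture.cfg; only the `name` attribute on `<item>` comes from the steps above.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: a real datacapture.cfg has more structure than this.
SAMPLE_CONFIG = """<config>
  <item name="Keywords"/>
  <item name="Title"/>
</config>"""


def item_names(datacapture_xml):
    """Collect the name attribute of every <item> element in the XML text."""
    root = ET.fromstring(datacapture_xml)
    return [item.get("name") for item in root.iter("item") if item.get("name") is not None]


def has_matching_item(datacapture_xml, admin_tag):
    """True only if some <item> name is exactly equal to the Admin GUI tag."""
    return admin_tag in item_names(datacapture_xml)


print(has_matching_item(SAMPLE_CONFIG, "Keywords"))
print(has_matching_item(SAMPLE_CONFIG, "keywords"))
```

The second call comes back false because the comparison is case-sensitive, which is the kind of mismatch Step 4 warns about.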
Migrateduser
I neglected to say in the previous post that the instructions I detailed there are also documented in the MetaTagger 3.6 Advanced User's Guide, Chapter 4 "Tagging with Keywords and Summaries."