Using Metatagger.pm
jaswal
This issue concerns Metatagger.pm. I am using it in a Perl script to get suggested metadata from MetaTagger. Everything works as expected for HTML files, but it does not return suggested values for PDF and XLS files.
Any pointers on what could be going wrong?
This is the code I am trying:
use strict;
use warnings;
use Metatagger;

my $mtagger = Metatagger->new();
$mtagger->setServerHost("localhost", 9090);
$mtagger->loadDocument("my file name");   # placeholder for the path of the document to tag
$mtagger->suggest();
print $mtagger->getMetadata();
From the MetaTagger GUI, everything works perfectly for all file types.
I also tried the loadDocumentByExt() method, passing the extension name, but it still did not work.
Thanks
Comments
Migrateduser
Can you attach your metadata-rules.cfg and your metatagger.cfg?
jaswal
This is my metadata-rules.cfg:
<?xml version="1.0" encoding="UTF-8" ?>
<metadata-rules>
<cond vpath-regex=".*">
<rule name="DLRDOC Metadata" />
</cond>
</metadata-rules>
And here is my metatagger.cfg:
<config>
<fileType>
<label>HTML input</label>
<extension>html</extension>
<preprocessor>HTMLTitlepreprocessor.ipr</preprocessor>
<htmlMode>true</htmlMode>
</fileType>
<fileType>
<label>HTML input</label>
<extension>htm</extension>
<preprocessor>HTMLTitlepreprocessor.ipr</preprocessor>
<htmlMode>true</htmlMode>
</fileType>
<fileType>
<label>Clear text</label>
<extension>txt</extension>
<preprocessor>TxtPreprocessor.ipr</preprocessor>
</fileType>
<fileType>
<label>Excel Document</label>
<extension>xls</extension>
<htmlMode>true</htmlMode>
<converter>xlhtml "%FILE%"</converter>
</fileType>
<fileType>
<label>PowerPoint Document</label>
<extension>ppt</extension>
<htmlMode>true</htmlMode>
<converter>ppthtml "%FILE%"</converter>
</fileType>
<fileType>
<label>Word Document</label>
<extension>doc</extension>
<converter>catdoc "%FILE%"</converter>
</fileType>
<fileType>
<label>PDF Document</label>
<extension>pdf</extension>
<converter>pdftotext "%FILE%" </converter>
</fileType>
<category>
<categoryType>TABLE_ENTRY</categoryType>
<title>Keywords</title>
<tag>Keywords</tag>
<implicitAdd>false</implicitAdd>
<SRscript>tokenize -ignoretags -html -punctuation +=!#$^\38*()-|\\"'?/.\37\60\62,]{}[~`:\221\222\223\224\231 -stopwords english_stopwords.txt -ignorenumbers; splitter; tagger; extract -config summ_phrase.cfg; summarize -ignoretags -keywords 5 -keyphrases 5</SRscript>
<lexicon>keywords-idx</lexicon>
<pass>1</pass>
</category>
<category>
<categoryType>DESCRIPTOR_LIST</categoryType>
<title>products</title>
<tag>Products</tag>
<dbName>products_lookup</dbName>
<lexicon>products</lexicon>
<netTimeout>300</netTimeout>
<implicitAdd>false</implicitAdd>
<SRscript>tokenize -ignoretags -affixstem english.cfg; recognize; resolve</SRscript>
</category>
</config>
Again, everything works fine using the MetaTagger GUI.
Thanks.
Migrateduser
Just for kicks, try setServerHost("localhost", 9095);
9090 is the admin server; 9095 is for tagging documents.
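Before changing ports, it can help to confirm what is actually listening on each one. Below is a minimal sketch using the core IO::Socket::INET module; port_is_open is a hypothetical helper, not part of Metatagger.pm, and a successful connection only shows that some service accepts TCP connections on that port, not that it is the MetaTagger tagging service.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical helper: returns 1 if a TCP connection to $host:$port
# succeeds within $timeout seconds (default 3), otherwise 0.
# It does not check which service is answering on that port.
sub port_is_open {
    my ($host, $port, $timeout) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => defined $timeout ? $timeout : 3,
    );
    return 0 unless $sock;
    close($sock);
    return 1;
}

# 9090 (admin) and 9095 (tagging) are the ports named in this thread.
for my $port (9090, 9095) {
    printf "localhost:%d %s\n", $port,
        port_is_open('localhost', $port) ? 'open' : 'closed';
}
```

The same check accepts an IP address in place of localhost, which may be useful when setServerHost is given a remote server address.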
jaswal
Thanks for the suggestion, but this doesn't help either. The XML still doesn't get loaded for files other than HTML and TXT files.
Migrateduser
OK, I reproduced the problem. It is a bug in how Metatagger.pm reads binary files on Windows (that's your OS, right?). You should file a bug against Metatagger.pm; please do this to get an "official" fix.
It is a one-line fix, which you could make yourself to
<iw-home>/iw-perl/site/lib/MSWin32-x86/Metatagger.pm
(make a backup first, because making this fix yourself turns it into unsupported custom code).
In the subroutine _AsciiEncoder, replace these lines:
open(FH, $filename) or die "Can't open file $filename", $!;
while (<FH>) {
$content .= $_;
}
close(FH);
with:
open(FH, $filename) or die "Can't open file $filename", $!;
binmode(FH);
while (<FH>) {
$content .= $_;
}
close(FH);
That is, insert the line binmode(FH);
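Why binmode matters: on Windows, Perl opens files in text mode through the :crlf layer, which turns every CR LF byte pair into a single LF on read, so binary content such as PDF or XLS is silently altered before it is sent. binmode turns that translation off. The sketch below reproduces the effect on any OS by forcing the :crlf layer explicitly; read_binary and read_as_text_crlf are hypothetical helper names, not functions in Metatagger.pm, and the sample bytes are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: slurp a file as raw bytes. Same effect as the
# _AsciiEncoder loop after adding binmode(FH), written with a lexical
# handle and three-argument open.
sub read_binary {
    my ($filename) = @_;
    open(my $fh, '<', $filename) or die "Can't open file $filename: $!";
    binmode($fh);    # no CRLF translation, as in the one-line fix
    local $/;        # slurp mode: read the whole file in one go
    my $content = <$fh>;
    close($fh);
    return defined $content ? $content : '';
}

# Hypothetical helper: read through the :crlf layer, which is what a
# text-mode handle uses on Windows. Forcing it shows the corruption anywhere.
sub read_as_text_crlf {
    my ($filename) = @_;
    open(my $fh, '<:crlf', $filename) or die "Can't open file $filename: $!";
    local $/;
    my $content = <$fh>;
    close($fh);
    return defined $content ? $content : '';
}

# Made-up sample bytes containing CR LF pairs, as binary files often do.
my $bytes = "%PDF-1.3\r\n\x00\xFF\r\nbinary";

my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
binmode($tmp_fh);
print {$tmp_fh} $bytes;
close($tmp_fh);

my $raw  = read_binary($tmp_name);
my $text = read_as_text_crlf($tmp_name);

printf "original: %d bytes\n", length $bytes;
printf "binmode:  %d bytes (%s)\n", length $raw,  $raw  eq $bytes ? 'identical' : 'changed';
printf "crlf:     %d bytes (%s)\n", length $text, $text eq $bytes ? 'identical' : 'changed';
```

With the sample above, the binmode read returns all 20 bytes unchanged, while the :crlf read drops both CR bytes.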
Edited by jpierre on 06/16/03 04:44 PM (server time).
jaswal
This works... great, thanks!!
Ottawa_IWOV
Does this module work with MetaTagger 4.0? If so, can I pass an IP address and port to setServerHost?
Lucas Cochrane
lucas.cochrane@baucanada.com
Ottawa_IWOV
I keep getting XML parser errors when I call $mtagger->suggest();
I don't think I am even connecting to MetaTagger properly.
I am passing an IP address and 9096 as the port.
I am using MT 4.0.
Ottawa_IWOV
Got it figured out.