Adding a new post-processor
JessicaD
MT4.0
TS6.7.1
MT is integrated with TS. There is a task in our workflows that requires content developers to review the metadata generated by MT.
I am trying to add a post-processor (written in Perl) that will remove the date and year from the list of keywords generated by Metatagger. We currently use a Perl script to remove unusual characters, so I built my new file based on that one (I'm new to Perl). I have tested my regex statements to make sure they are correct, the .pl file is in the correct directory, and the runtime.cfg file is in place. I stopped the MT Server, followed the process for adding the new post-processor according to the documentation, and then started the MT Server. When I re-tag my files in TeamSite, I am still seeing the same keywords as I had before adding the post-processor. Am I missing a step? This is what I have in the Perl script:
#!C:\iw-home\iw-perl\bin\iwperl
# Remove Year Post Processor written in perl.
#use strict;
use Getopt::Long;
#variables
my $crackedText=(); #crackedText stream
my $original=(); #original, uncracked file
my $mdrecord=();
#Must have a variable for both crackedText AND originalText even if both variables aren't used.
GetOptions ( "crackedText=s" => \$crackedText, "originalText=s" =>\$original); # -crackedText and -originalText
while (<STDIN>){
$mdrecord .= $_;
}
$mdrecord = process($mdrecord);
print $mdrecord;
sub process {
my $md = shift;
$md =~ s/(20\d\d)//gsm;
$md =~ s/(-\d\d-(20\d\d))//gsm;
$md =~ s/(\d\d-\d\d-(20\d\d))//gsm;
return $md;
}
Comments
Migrateduser
There are a few things to consider.
1. The TaggerGUI is configured to keep items that are deltas of the current summarization. This protects any keywords someone may have added to the field by hand. If the documents that you are tagging in TS have been tagged previously with the date and saved on the TS server, then the date is now part of the extended attributes for that record on the TS side, which MetaTagger does not control. When you retag, TS sends the current values to MT; MT generates its metadata record, overwrites any duplicates, adds any items not present, and keeps all items that come from TS but do not match its set. I assume you are still seeing the dates in the TaggerGUI. You can do a quick test to see whether your script is in fact removing dates and this is the behavior you are seeing.
a. Pick a document that has a date in the keywords field.
b. Manually delete all the keywords in the field, including the date.
c. Save the metadata.
d. Retag All.
e. Is the date back? If so, your script is not working.
2. If you need to debug further try this:
Look at the metadata record that MT is producing to send to TS. Use the MTAdmin log viewer if you do not have access to the server. First make sure your logging levels are at max, just for the duration of your testing. I would look at the mtserver_9095 log. It will be easier to debug if you bounce the server to clear the logs first. Tag just your one file and immediately look at the log. If the dates are not there, then your removal script is working.
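To make point 1 concrete, here is a rough sketch of the merge as described: keywords already saved on the TS side survive a retag even when MT no longer generates them. This is a simplified model, not MetaTagger's actual code. The helper name merge_keywords and the sample keywords are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Simplified model of the merge described in point 1.
# merge_keywords is a hypothetical helper, not part of MetaTagger or TeamSite.
sub merge_keywords {
    my ($from_ts, $from_mt) = @_;
    my (%seen, @merged);
    # MT's freshly generated keywords go in first; duplicates collapse.
    # Anything stored on the TS side that MT did not generate is kept.
    for my $kw (@$from_mt, @$from_ts) {
        push @merged, $kw unless $seen{$kw}++;
    }
    return @merged;
}

# Hypothetical data: the dates were saved in TS before the post-processor existed.
my @stored_in_ts = ('budget', '2009', '03-15-2009');
# Hypothetical data: MT output after the post-processor stripped the dates.
my @new_from_mt  = ('budget', 'forecast');

print join(', ', merge_keywords(\@stored_in_ts, \@new_from_mt)), "\n";
# prints: budget, forecast, 2009, 03-15-2009
```

Under this model the dates reappear even though the post-processor removed them, which is why steps a through e clear the field before retagging.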
JessicaD
Thanks for the info. This explains why I'm still seeing the dates in the Tagger GUI in TS. I followed your instructions and the debugging steps, and I'm still seeing the dates, so it's got to be something in the Perl script. I tested the regex statements at a couple of sites that I know of, and they all work on the metadata if I copy and paste it into them. Can you tell me if the rest of the script that I pasted in my original post looks right?
Thanks!
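For anyone testing something similar: the substitutions can also be run outside MT as a sequence, since an online regex tester usually checks each pattern on its own. Below is a minimal sketch under stated assumptions. The sample string and the pipe separator are made up (the real metadata record format may differ), and process_reordered is only an illustrative variant. This is not necessarily what fixed the issue in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The three substitutions from the original post, in the same order.
sub process_original {
    my $md = shift;
    $md =~ s/(20\d\d)//gsm;
    $md =~ s/(-\d\d-(20\d\d))//gsm;
    $md =~ s/(\d\d-\d\d-(20\d\d))//gsm;
    return $md;
}

# Illustrative variant: longest pattern first, so the year is still
# present when the full-date patterns run.
sub process_reordered {
    my $md = shift;
    $md =~ s/\d\d-\d\d-20\d\d//g;
    $md =~ s/-\d\d-20\d\d//g;
    $md =~ s/20\d\d//g;
    return $md;
}

# Hypothetical keyword string; the separator is an assumption.
my $sample = "budget|03-15-2009|forecast";

print process_original($sample), "\n";   # prints: budget|03-15-|forecast
print process_reordered($sample), "\n";  # prints: budget||forecast
```

In the original order the bare-year pattern runs first and strips "2009", so the two longer patterns no longer find a year to match and "03-15-" is left behind. Each pattern still passes when tested by itself.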
JessicaD
I got it to work - thanks for your help, Morgan!