Using Filter Regexes to choose content processors

Hello All,

The mt.411.user.pdf document gives the following as a reference solution on page 30:

"Create two content processors associated with the PDF file type. The first processor removes unwanted bits from memos (for example, the “To:” and “From:” lines), and the second processor removes tables of numbers. Create two preprocessor groups, associating each content processor with a different processor group. Add a filter regular expression (regex) field that matches on the path to the memos to the content processor for handling memos. Next, add a filter regex field that matches on the path to the papers with tables to remove to the content processor for handling those papers. Both content processors share the same set of fields, but each provides optimal results because the documents are properly preprocessed."

I do not understand how filter regexes can be used to choose content processors.

Could somebody please clarify or elaborate on the approach (that is, how to go about doing this)?

Thanks,
Aryan.

Comments

Migrateduser

Consider two different document types, memo and report, with different prefixes in the file name (memo-23.doc and report-41.doc). Using a filter regex in each content processor definition (.*memo-.*\.doc for one and .*report-.*\.doc for the other) means each file is handled only by the processor whose regex matches its path, which would allow you to use different models to analyze the different types of files.
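
To make the selection idea concrete, here is a rough Python sketch of path-based routing. It is not the product's API or configuration format: the processor names, the choose_processor helper, and the sample paths are all made up for illustration, and whether the product matches the whole path or only part of it is an assumption (whole-path matching is used here). The dot before "doc" is escaped so that it matches a literal period.

```python
import re

# Hypothetical processor names paired with their filter regexes.
# These names are illustrative only and do not come from the product.
PROCESSORS = [
    ("memo_processor", re.compile(r".*memo-.*\.doc")),
    ("report_processor", re.compile(r".*report-.*\.doc")),
]


def choose_processor(path):
    """Return the first processor whose filter regex matches the whole path.

    Returns None when no filter regex matches.
    """
    for name, pattern in PROCESSORS:
        if pattern.fullmatch(path):
            return name
    return None


if __name__ == "__main__":
    for path in ["/docs/memo-23.doc", "/docs/report-41.doc", "/docs/notes.txt"]:
        print(path, "->", choose_processor(path))
```

In this sketch a file that matches neither regex is simply not assigned to either processor; how the real product treats such files would need to be checked in its documentation.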

Aryan

Thanks a lot.

Aryan