OCRed PDFs have the text layer embedded over top of the original image layer.
When Livelink indexes the content of the PDFs, it indexes the text layer so the words are searchable.
I’m not sure what image format you’re using, but the only way to get the OCRed data to Livelink without customizing Livelink is to either embed it as a layer within the image or to archive the image together with a text/html/xml file containing the OCRed data in a ZIP file.
However, the data will appear in Livelink as content, not metadata.
To capture the data within custom metadata regions, however, you will need to customize Livelink.
I would recommend walking through the OScript code during a normal extraction run of a document to see how the metadata and content are collected, and from there how you could implement a customization in that area.
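As a rough illustration of the ZIP option mentioned above, here is a minimal Python sketch that packs a scanned image together with a plain-text file holding the OCRed data. The function name, the file names, and the convention of naming the text file after the image are assumptions made for the example; nothing here is Livelink or Enterprise Scan API, and how the resulting ZIP gets archived is left out.

```python
import zipfile
from pathlib import Path


def bundle_image_with_ocr(image_path, ocr_text, zip_path):
    """Write the image and a sidecar .txt with the OCR text into one ZIP.

    Hypothetical helper: the sidecar naming (image stem + ".txt") is an
    assumption for illustration, not a Livelink requirement.
    """
    image_path = Path(image_path)
    sidecar_name = image_path.stem + ".txt"
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Store the image under its bare file name, without directories.
        archive.write(image_path, arcname=image_path.name)
        # writestr encodes a str as UTF-8 before storing it.
        archive.writestr(sidecar_name, ocr_text)
    return [image_path.name, sidecar_name]
```

Calling `bundle_image_with_ocr("scan0001.tif", recognized_text, "scan0001.zip")` would produce a ZIP containing `scan0001.tif` and `scan0001.txt`. As noted above, text delivered this way would be indexed as content, not as metadata.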
_________________________________________
Kyle Swidrovich
Principal Search / SDK Product Specialist
Livelink Escalations Support Team
Open Text Corporation
275 Frank Tompa Dr.
Waterloo, ON, CANADA
Phone: +1-800-540-7292
Online: http://support.opentext.com
_________________________________________
From: eLink Discussion: Development Discussion [mailto:development@elinkkc.opentext.com]
Sent: Tuesday, March 17, 2009 1:38 AM
To: eLink Recipient
Subject: OCR issue in Enterprise Scan
OCR issue in Enterprise Scan
Posted by eu0022854 (unknown, unknown) on 2009/03/17 01:35
Can someone advise how to pass an OCR value to an indexing field? If there is any documentation about this, that would be great. Our current situation: we can get characters recognized from scanned documents, but we cannot pass that value to an indexing field for metadata.
Unfortunately, I’m not personally familiar with anyone implementing something like this, so I’m not able to provide any examples.
As far as OScript review goes, you’ll need to have the Livelink SDK.
It will provide you with the Builder application, which will allow you to set breakpoints in OScript code and walk through each function during an extraction attempt (index.update request).
Builder will also allow you to customize Livelink by writing your own modules or patches to existing code to enhance, override or extend Livelink functionality.
If you do not currently have the Livelink SDK, please contact your sales rep for more information.
Regards,
Kyle
From: eLink Discussion: Development Discussion [mailto:development@elinkkc.opentext.com]
Sent: Tuesday, March 17, 2009 11:40 PM
To: eLink Recipient
Subject: Further query
Further query
Posted by eu0022854 (unknown, unknown) on 2009/03/17 23:36
In reply to: RE OCR issue in Enterprise Scan
Posted by kswidrov (Swidrovich, Kyle) on 2009/03/17 09:02
Thanks for your reply. We are using TIFF images; when an image is archived, it is converted to PDF format. We are able to pass the barcode number to metadata. From your reply, my understanding is that it is not easy to pass OCRed data to metadata. Is there any successful case of this? Also, how do we go about "walking through the OScript code during a normal extraction run of a document"?
Thank you in advance,
Vincent