RiskGuard file size limitations

Chris Jensen · 2022-10-31T18:15:23+00:00

I'm trying to figure out what the file size limitations are for the RiskGuard API.

I uploaded a 10 MB file and the upload appeared to be successful, but I got the following in the response JSON:

Status(message=SKIPPED, description=Extracted text size exceeded maximum value of 1048576.)
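For anyone who wants to detect this case in client code, here is a minimal sketch that parses a status string in the exact form quoted above. It assumes the status arrives as that literal text; the post does not show which JSON field carries it, so the helper name `parse_status` and the returned dictionary keys are hypothetical.

```python
import re

# Matches the status text exactly as quoted in this post.
# Assumption: the service always uses this "Status(message=..., description=...)" shape.
STATUS_RE = re.compile(
    r"Status\(message=(?P<message>\w+), description=(?P<description>.*)\)\s*$"
)


def parse_status(status: str) -> dict:
    """Split a status string into message, description, and numeric limit (if any)."""
    match = STATUS_RE.match(status.strip())
    if match is None:
        raise ValueError(f"Unrecognized status format: {status!r}")
    result = {
        "message": match["message"],
        "description": match["description"],
        "limit": None,
    }
    # Pull out the numeric limit when the description mentions one.
    limit = re.search(r"maximum value of (\d+)", result["description"])
    if limit:
        result["limit"] = int(limit.group(1))
    return result


parsed = parse_status(
    "Status(message=SKIPPED, description=Extracted text size exceeded "
    "maximum value of 1048576.)"
)
print(parsed["message"], parsed["limit"])
```

A caller could then treat `message == "SKIPPED"` as "uploaded but not analyzed" rather than as a successful analysis.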

Comments

Robert Kapitan

Hi,

By default, that service processes at most 1 MB. There are ways to reconfigure the maximum size, but this is not something we have made available.

Robert

Chris Jensen

Thank you Robert. Perhaps that is something that should be added to the API documentation.

Specifically, the processing limit on maximum extracted text, plus a note about the difference between file size and extracted text size. For example, it successfully processed a 4.7 MB PDF because much of that file was images embedded in the document.

Robert Kapitan

Yes, the limit is 1 MB of extracted text, not 1 MB of file size. Thanks.
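As a rough pre-check before uploading, a client could measure the size of text it has already extracted and compare it with the 1048576 value from the SKIPPED message. This is only a sketch: the thread does not say whether the service counts bytes or characters, or which encoding it uses, so UTF-8 bytes are an assumption here, and the function names are hypothetical. It also does not extract text from PDFs or other formats; it works on text you already have.

```python
# Limit taken from the SKIPPED message in this thread (1 MiB).
MAX_EXTRACTED_TEXT = 1048576


def text_size_bytes(text: str, encoding: str = "utf-8") -> int:
    """Size of the text once encoded. Assumption: the service measures UTF-8 bytes."""
    return len(text.encode(encoding))


def likely_skipped(text: str, limit: int = MAX_EXTRACTED_TEXT) -> bool:
    """True if the text is larger than the limit and would probably be skipped."""
    return text_size_bytes(text) > limit


print(likely_skipped("a" * 1048576))  # exactly at the limit
print(likely_skipped("a" * 1048577))  # one byte over the limit
```

The comparison uses "greater than" because the message says the size "exceeded" the maximum; whether text of exactly 1048576 bytes is accepted is not confirmed in this thread.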

David Henshaw

Hi Robert,

Ironically, I was talking with some of the OT team at OpenText World a few weeks back about RiskGuard.

Are you able to clarify the following?

1) My understanding of the OT2 service is that you pass a document and the service extracts the text content. Am I correct from the above that if the extracted text exceeds 1 MB, the service responds that the maximum size was exceeded, rather than providing a response based on the first 1 MB extracted plus supplementary information that the size was exceeded?

2) Does the same limitation apply if an on-prem or private cloud deployment is used?

3) Is there a means to pass a document and other textual data for review? For example, if I have some additional metadata about the document (e.g. categories from Content Server), can it be passed with the document so that RiskGuard reviews both? Or can you pass a JSON or XML data structure directly for assessment?

Regards,

David

Robert Kapitan

Hi David,

Please see my answers below:

1. My understanding of the OT2 service is that you pass a document and the service extracts the text content. Am I correct from the above that if the extracted text exceeds 1 MB, the service responds that the maximum size was exceeded, rather than providing a response based on the first 1 MB extracted plus supplementary information that the size was exceeded?

RK: Correct, our cloud service will respond that the maximum size was exceeded.

2. Does the same limitation apply if an on-prem or private cloud deployment is used?

RK: There are configuration settings that can be adjusted for on-prem and private cloud deployments if there is a need to process larger files. Users can also change the configuration so that the engine processes only a portion of the file. Please keep in mind that processing a very large file is not that efficient, as the user might end up with a list of, let's say, 7000 entities, 20000 concepts and 65 classifications that are all relevant.

3. Is there a means to pass a document and other textual data for review? For example, if I have some additional metadata about the document (e.g. categories from Content Server), can it be passed with the document so that RiskGuard reviews both? Or can you pass a JSON or XML data structure directly for assessment?

RK: The service processes files. If a given file includes a list of metadata, it will be processed like any other text.

Thanks,

Robert
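Since the cloud service skips oversized text rather than analyzing the first 1 MB, one possible client-side workaround for plain-text input is to truncate the text before writing it to the file you upload. The sketch below is not the on-prem configuration option Robert mentions; it is a hypothetical helper (`truncate_to_bytes`) that assumes the limit is measured in UTF-8 bytes, which this thread does not confirm.

```python
def truncate_to_bytes(text: str, limit: int = 1048576) -> str:
    """Return the longest prefix of text whose UTF-8 encoding fits in `limit` bytes."""
    encoded = text.encode("utf-8")
    if len(encoded) <= limit:
        return text
    # Slicing bytes can cut a multi-byte character in half;
    # errors="ignore" drops that incomplete trailing character.
    return encoded[:limit].decode("utf-8", errors="ignore")


sample = "é" * 10  # each "é" is 2 bytes in UTF-8
shortened = truncate_to_bytes(sample, limit=5)
print(len(shortened.encode("utf-8")))
```

Truncating this way means anything past the cut is never analyzed, so it only makes sense when a partial review of the document is acceptable.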