RiskGuard file size limitations

Chris Jensen
edited November 4, 2022 in Thrust Services #1

Hi

I'm trying to figure out what the file size limitations are for the RiskGuard API.

I uploaded a 10MB file and the upload appeared to be successful, but I got the following in the response JSON:

Status(message=SKIPPED, description=Extracted text size exceeded maximum value of 1048576.)
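
For anyone who wants to detect this case in client code, here is a minimal sketch in Python. It assumes the status reaches you as the exact string shown above; the real response JSON may expose message and description as separate fields, in which case the regex is unnecessary. The names `parse_status` and `was_skipped_for_size` are hypothetical helpers, not part of any RiskGuard client library.

```python
import re
from typing import NamedTuple, Optional


class Status(NamedTuple):
    message: str
    description: str


# Matches the textual form quoted in this thread:
#   Status(message=SKIPPED, description=...)
# Assumption: the status is available as this string.
_STATUS_RE = re.compile(
    r"Status\(message=(?P<message>\w+), description=(?P<description>.*)\)",
    re.DOTALL,
)


def parse_status(raw: str) -> Optional[Status]:
    """Return the parsed status, or None if the text has a different shape."""
    match = _STATUS_RE.search(raw)
    if match is None:
        return None
    return Status(match.group("message"), match.group("description"))


def was_skipped_for_size(status: Status) -> bool:
    """True when the document was skipped because the extracted text was too large."""
    return (
        status.message == "SKIPPED"
        and "exceeded maximum value" in status.description
    )
```

Checking for SKIPPED matters because, as in the case above, the upload itself can appear successful even though the document was not processed.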

Comments

  • Robert Kapitan
    edited November 1, 2022 #2

    Hi,

    By default, 1MB is what gets processed by that service. There are ways to reconfigure the maximum size, but this is not something we have made available.

    Robert

  • Thank you Robert. Perhaps that is something that should be added to the API documentation.

    Specifically, the processing limit on maximum extracted text, plus a note about the difference between file size and extracted text size. For example, it was able to successfully process a 4.7MB PDF because much of that was images embedded in the document.

  • Yes, it is 1MB of text, not 1MB as file size. Thanks.

    Hi Robert,

    Ironically, I was talking with some of the OT team at OpenText World a few weeks back about RiskGuard.

    Are you able to clarify the following?

    1) My understanding of the OT2 service is that you pass a document and the service extracts the text content. Am I correct from the above that if the extracted text exceeds 1MB, the service will respond that the maximum size is exceeded, rather than providing a response based on the first 1MB extracted plus supplementary information that the size was exceeded?

    2) Does the same limitation apply if an on-prem or private cloud deployment is used?

    3) Is there a means to pass a document and other textual data for review? For example, if I have some additional metadata about the document (e.g. categories from Content Server), can these be passed with the document so that RiskGuard reviews both? Or can you pass a JSON or XML data structure directly for assessment?

    Regards,

    David

  • Hi David,

    Please see my answers below:

    1. My understanding of the OT2 service is that you pass a document and the service extracts the text content. Am I correct from the above that if the extracted text exceeds 1MB, the service will respond that the maximum size is exceeded, rather than providing a response based on the first 1MB extracted plus supplementary information that the size was exceeded?

    RK: Correct, our Cloud service will respond with maximum size exceeded.

    2. Does the same limitation apply if an on-prem or private cloud deployment is used?

    RK: There are configuration settings that can be adjusted for on-prem and private cloud deployments if there is a need to process larger files. Users can also change the configuration so that the engine processes only a portion of the file. Please keep in mind that processing a very large file is not very useful, as the user might end up with a list of, let's say, 7000 entities, 20000 concepts and 65 classifications that are all relevant.

    3. Is there a means to pass a document and other textual data for review? For example, if I have some additional metadata about the document (e.g. categories from Content Server), can these be passed with the document so that RiskGuard reviews both? Or can you pass a JSON or XML data structure directly for assessment?

    RK: The service processes files. If a given file includes a list of metadata, this will be processed like any other text.

    Thanks,

    Robert
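
Since the cloud service skips a document outright once its extracted text passes 1048576 (1024 x 1024), one client-side option when you already have the plain text is to measure or trim it before submitting it as a text file. The sketch below is only an illustration in Python: it assumes the limit is counted in UTF-8 bytes, which the thread does not state, and `text_size_bytes` and `truncate_to_limit` are hypothetical helper names. It does not reproduce the service's own text extraction, so sizes for PDFs and other formats may differ.

```python
# Value quoted in the SKIPPED status earlier in the thread.
# Assumption: the service counts UTF-8 bytes; it may count characters instead.
MAX_TEXT_BYTES = 1_048_576


def text_size_bytes(text: str) -> int:
    """Size of the text when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def truncate_to_limit(text: str, limit: int = MAX_TEXT_BYTES) -> str:
    """Return the longest prefix of text that fits within limit UTF-8 bytes."""
    data = text.encode("utf-8")
    if len(data) <= limit:
        return text
    # Cutting at a byte offset can split a multi-byte character;
    # errors="ignore" drops that incomplete trailing character.
    return data[:limit].decode("utf-8", errors="ignore")
```

Note that trimming changes what gets analysed: anything past the cut is never seen by the service, so this only suits cases where the first portion of the text is representative.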