Magellan Risk Guard Service

Robert Kapitan
edited November 1, 2022 in Blog #1

1.  What is Risk Guard Service?

OpenText Magellan Risk Guard is an AI service that lets you uncover corporate compliance risks in image, video, audio and text content. It includes AI-powered content analytics capabilities that scan content and flag it for further human review, and it can be integrated into existing applications, processes and workflows.

Risk Guard enhances your ability to detect potentially offensive or unwanted images and video through prebuilt AI models for categories such as weapons, alcohol and drug use or even adult material. 

Risk Guard also checks for Personally Identifiable Information (PII) such as person names, credit card numbers or banking information. Additionally, Risk Guard includes AI models for Personally Sensitive Information (PSI) that identify, for example, hate speech or content with gender or racial bias.

These prebuilt AI models help your business make sure that content or information that violates corporate compliance initiatives is not used, stored or communicated.

The rich AI capabilities are easy to use and to integrate into your existing applications, processes and workflows, delivering the power of AI with the simplicity of an API call.

2.  Why would an organization need Risk Guard Service?

Security, Privacy, Records Management and IT teams, as well as Content Analysts and Data Scientists, are struggling with unstructured content!

The questions and concerns they have are:

  • How can we efficiently identify risky files?
  • Extreme volumes require an automated approach: is REGEX enough? Or should we rather look at AI/NLP/NLU and computer vision techniques?
  • Do we need to build AI/NLP/NLU models? How long might this take?
  • Do we have the technologies and skills to identify risky files?
  • Do we need to buy several licenses to run risk assessment projects on our content?
  • Is our approach flexible enough? What’s risky for me is not risky for you!
  • Can we run projects on ‘any type of content’, not just on XML or PDF files?
  • What is our user experience? Do we have to track how different scripts get executed, or is there a simple endpoint we can call from our repository?
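
To make the REGEX question concrete, here is a minimal Python sketch of a regex-only scan. The patterns, the luhn_ok helper, the regex_scan function and the sample text are illustrative assumptions written for this post; they do not describe how Risk Guard works internally.

```python
import re

# Illustrative patterns only; real-world PII formats vary far more than this.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")


def luhn_ok(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def regex_scan(text: str) -> list:
    """Return (label, matched text) pairs found by the two patterns."""
    hits = [("SSN", m.group()) for m in SSN_RE.finditer(text)]
    hits += [
        ("Credit Card", m.group())
        for m in CARD_RE.finditer(text)
        if luhn_ok(m.group())  # discard digit runs that fail the checksum
    ]
    return hits


sample = "Robert paid with 4111 1111 1111 1111; his SSN is 078-05-1120."
print(regex_scan(sample))
```

The scan finds the card number and the SSN, but it has no way to recognise 'Robert' as a person name, to spot a controversial topic, or to judge an image. Those are the cases where NLP/NLU and computer vision models are needed.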


Risk Guard Service addresses the concerns listed above!

The service can process image and text files and identify risky content according to:

  • A number of image threat categories (e.g. Gambling, Violence, Adult Materials)
  • Standard PII types (e.g. person name, address, credit card number, SSN, etc.)
  • Sensitive information such as Tax and Financial Information, Gender, Religious Beliefs, Homophobia, etc.

Risk Guard can process the text file formats that the Apache Tika conversion service can handle: https://tika.apache.org/0.9/formats.html

-------------------------------------------------------------------------------

Within the Risk Guard Service output, users can pick out what is risky for them and simply ignore the other risk assessment results.

Some example use cases:

  • I would like to identify the documents in an ECM folder that might contain credentials. I can look for <ip address>, <username> and <password> values and the <Credentials> category in the Risk Guard output.
  • Controversial topics should not be discussed in corporate emails. I can look for the <Religious Beliefs>, <Gender and ****>, <Racial Origin> and <Sexual Orientation> categories in the Risk Guard output.
  • Are there any images showing violence or weapons in a collection of files in a corporate repository? I can look for the <Graphic Violence>, <Gore>, <Weapons> and <Extremism> classification values.
  • Let's make sure that the set of documents we are sending to a customer doesn't contain any PII such as <SSN>, <Credit Card>, <Bank Account Numbers>, <Drivers Licenses>, etc.
  • I have to return my laptop to IT and need to verify that none of the files on it mention my name. I can look for <PN> values matching my name in the Risk Guard output.
  • I need to check customer files that were sent to me ahead of a pilot project and verify that they contain the expected URL links. I can look for <URL> and <shortURL> values.
  • I lost some of my customers' addresses and should be able to retrieve them from archived files. I can look for <Organization Name>, <Person Name> and <Address> values in the Risk Guard output.
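
All of these use cases follow the same pattern: run the files through the service, then keep only the categories that matter to you. Here is a minimal Python sketch of that filtering step. It assumes the output has already been parsed into a list of dictionaries; the category and value keys, the sample findings and the keep_relevant helper are hypothetical, because the actual output schema is not described in this post.

```python
# Hypothetical, simplified stand-in for a parsed Risk Guard result.
# The key names and sample values are assumptions, not the real schema.
findings = [
    {"category": "Credentials", "value": "admin"},
    {"category": "Credit Card", "value": "4111 1111 1111 1111"},
    {"category": "Religious Beliefs", "value": "sample sentence"},
    {"category": "URL", "value": "https://example.com"},
]


def keep_relevant(findings, wanted):
    """Keep only the findings whose category is in `wanted` (case-insensitive)."""
    wanted_lower = {w.lower() for w in wanted}
    return [f for f in findings if f["category"].lower() in wanted_lower]


# "Does the set of documents we send to a customer contain any PII?"
pii_hits = keep_relevant(
    findings,
    {"SSN", "Credit Card", "Bank Account Numbers", "Drivers Licenses"},
)
print(pii_hits)
```

Everything outside the requested categories is simply ignored, so the same output can serve a privacy review, a credentials check or an email policy check just by changing the set of category names.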

3.  What kind of input files can Risk Guard Service process?

Currently, Risk Guard can process text files and images; support for video frames and audio files will be added soon.

Files that need OCR will also be supported in future versions of the service.


4.  What to expect from Risk Guard Service output?

Risk Guard output is not only about PII or PSI models, classifications and named entities. The service also provides a lot of information around the risk assessment values:

Output element: what is the value of it?

  • Confidence Score: How confident the model is that the entity is the right one in a given context. For example, 'Gucci' could be a person name or an organization name; the confidence score tells the user how convinced the engine is that 'Gucci' belongs to a specific entity type.
  • Relevancy Score: How important the entity or classification is for a file. For example, a 'phone number' found 3 times in a document is more relevant than a number found once, and a classification with relevancy score 1 describes the topics of the document better than a classification with relevancy score 3.
  • Position: Where in a file the specific data point comes from, e.g. a passport number was found at position '1542'.
  • Frequency: How many occurrences of a single data point were found.
  • Weight: How important the classification is to a document, i.e. how many concepts and rules of the Text Classification model were recognized to confirm that this specific classification is strong for the document.
  • Candidate: Whether the entity comes from a dictionary (value 1) or was inferred by a Machine Learning backed deduction process.
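
To show how these output elements could be used together, here is a minimal Python sketch that routes low-confidence entities to human review. The EntityHit class, its field names, the 0.0 to 1.0 confidence scale, the value 0 for ML-inferred candidates, the 0.8 threshold and the sample values are all assumptions made for illustration; only the meaning of each element comes from the list above.

```python
from dataclasses import dataclass


@dataclass
class EntityHit:
    """Hypothetical container for one entity found in a file."""
    entity_type: str
    value: str
    confidence: float  # assumed to be on a 0.0-1.0 scale
    position: int      # where the data point was found in the file
    frequency: int     # number of occurrences of this data point
    candidate: int     # 1 = from a dictionary; 0 assumed here for ML-inferred


def triage(hits, min_confidence=0.8):
    """Split hits into accepted ones and ones that need human review."""
    accepted = [h for h in hits if h.confidence >= min_confidence]
    review = [h for h in hits if h.confidence < min_confidence]
    # Show the most frequently occurring data points first.
    accepted.sort(key=lambda h: h.frequency, reverse=True)
    return accepted, review


hits = [
    EntityHit("Organization Name", "Gucci", 0.55, 120, 1, 0),
    EntityHit("Passport Number", "X1234567", 0.93, 1542, 1, 0),
    EntityHit("Phone Number", "555-0100", 0.90, 87, 3, 0),
]
accepted, review = triage(hits)
print([h.entity_type for h in accepted])
print([h.entity_type for h in review])
```

In this sketch the ambiguous 'Gucci' hit ends up in the review list, while the phone number that occurs three times is listed ahead of the passport number found once.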


5. Step-by-step guide & Additional Information

Magellan Risk Guard Service Overview