Magellan Chrome Extension to Summarize Websites
originally posted July 2019 | 🕐️ 9 minute read
Overview
The purpose of this post is really two-fold.
- It's a relatively simple use case for Magellan's NLP capabilities and shows how easy they are to consume.
- The end tool is actually pretty useful - if you just want to skim read different sites.
The full source code for the extension can be found here:
https://github.com/garethhutchins/Magellan-Chrome-Summarizer
The end result is a Chrome extension that, when clicked, summarizes the current page to the number of sentences you specify and reports the page's overall tone and sentiment.
Here's what it looks like:
Magellan's Text Mining Engine
I came over to OpenText as part of the Dell ECD acquisition, where I was the Global Functional Lead for Capture Solutions. Whilst at ECD, I became a little obsessed with how NLP technologies could complement traditional Capture processes - more on this in later blog posts. The problem was, ECD didn't have an NLP engine - so after the acquisition, I went straight to the Text Mining Engine to see how we could implement it in some of our customer use cases. I've been very impressed with its capabilities. Here's what I think sets it apart:
- The microservices are very easy to consume. I managed to integrate them into a traditional capture process before lunch one day.
- The engine is not a black box. You can use machine learning techniques to enhance its vocabulary, or you can use a user interface to make an immediate change and browse relationships. You are not confined to the usual entities of People, Locations & Organizations like a lot of other NLP engines. You can create new entities for specific verticals like healthcare, or 90's alternative rock bands if you like.
- It works for a number of different languages, not just English.
Understanding The TME Service
The Text Mining service is easy to consume and understand. The service accepts an XML post which contains the text you want to analyse as well as what types of things you want to look for. I won't talk about all of the features here, just some highlights I often use.
To start with, if you wanted to return the tone and sentiment of a piece of text, your request would look like this:
<?xml version="1.0" ?>
<Nserver>
  <NSTEIN_Text>[Your text goes here]</NSTEIN_Text>
  <Methods>
    <NSentiment/>
  </Methods>
</Nserver>
This returns the sentiment and tone of each sentence in the text, as well as of the text as a whole (the document level). The NSentiment command in the Methods section of the request is what asks for it.
So, if you took the following command:
<?xml version="1.0" ?>
<Nserver>
  <NSTEIN_Text>
    This is a story about Gareth Hutchins. He lives in Farnham in Surrey and works for Opentext in Reading. He's pretty cool.
  </NSTEIN_Text>
  <Methods>
    <NSentiment/>
  </Methods>
</Nserver>
You would get the following back:
<?xml version="1.0"?>
<Nserver Version="3.0">
  <ErrorID Type="Status">0</ErrorID>
  <ErrorDescription>Success</ErrorDescription>
  <Results>
    <NSentiment>
      <SentenceLevel>
        <Sentence>
          <Text begin="6" end="44">This is a story about Gareth Hutchins.</Text>
          <Subjectivity score="10.0075">fact</Subjectivity>
          <Tone>neutral</Tone>
        </Sentence>
        <Sentence>
          <Text begin="45" end="109">He lives in Farnham in Surrey and works for Opentext in Reading.</Text>
          <Subjectivity score="9.7272">fact</Subjectivity>
          <Tone>neutral</Tone>
        </Sentence>
        <Sentence>
          <Text begin="110" end="127">He's pretty cool.</Text>
          <Subjectivity score="79.8701">opinion</Subjectivity>
          <PositiveTone score="38.471"/>
          <NegativeTone score="24.893"/>
          <Tone>positive</Tone>
        </Sentence>
      </SentenceLevel>
      <DocumentLevel>
        <Subjectivity score="75.0036" distribution="17.2043">opinion</Subjectivity>
        <PositiveTone score="25.3561" distribution="17.2043"/>
        <NegativeTone score="16.4069" distribution="0.0"/>
        <Tone>positive</Tone>
      </DocumentLevel>
    </NSentiment>
  </Results>
</Nserver>
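To make the shape of that response a bit more concrete, here's a minimal sketch that pulls the per-sentence results out of it. It's only an illustration: parseSentences is a hypothetical helper (it is not part of the extension or of the TME API), it assumes the response follows the sample above, and it uses regular expressions purely so that it can run outside a browser. Inside the extension, DOMParser is the more natural choice.

```javascript
// Hypothetical helper: extracts sentence-level sentiment from a TME response string.
// Assumes the <Sentence> structure shown in the sample response above.
function parseSentences(xml) {
  const sentences = [];
  const sentenceRe = /<Sentence>([\s\S]*?)<\/Sentence>/g;
  for (const match of xml.matchAll(sentenceRe)) {
    const block = match[1];
    // Return the text content of the first <tag ...>...</tag> inside this sentence
    const pick = (tag) => {
      const m = block.match(new RegExp("<" + tag + "[^>]*>([\\s\\S]*?)</" + tag + ">"));
      return m ? m[1] : null;
    };
    sentences.push({
      text: pick("Text"),
      subjectivity: pick("Subjectivity"),
      tone: pick("Tone"),
    });
  }
  return sentences;
}

// One sentence copied from the sample response
const sample =
  '<Sentence><Text begin="110" end="127">He\'s pretty cool.</Text>' +
  '<Subjectivity score="79.8701">opinion</Subjectivity>' +
  '<PositiveTone score="38.471"/><NegativeTone score="24.893"/>' +
  '<Tone>positive</Tone></Sentence>';

console.log(parseSentences(sample));
```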
You can also summarize the text to a percentage, by category or to a number of sentences. I've chosen to use the number of sentences. To do that, you just need to add another command to the Methods section, including a KBid element that names the taxonomy to use. I'm using IPTC, which is the standard for the news industry. The engine ships with a number of other taxonomies, or you can create your own.
<Methods>
  <nsummarizer>
    <NbSentences>1</NbSentences>
    <KBid>IPTC</KBid>
  </nsummarizer>
</Methods>
You will then get a structure like this back in the Results section:
<Results>
  <nsummarizer>
    <Summary>[A Summary of your text]</Summary>
  </nsummarizer>
</Results>
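Putting the two Methods entries shown so far together, a request builder could look something like the sketch below. The function names (escapeXml, buildRequest) are made up for illustration, and the XML declaration is just the standard minimal one, so treat it as an assumption rather than something the service requires. The element names (Nserver, NSTEIN_Text, nsummarizer, NbSentences, KBid, NSentiment) are the ones from the requests above.

```javascript
// Hypothetical helpers, not part of the TME API.

// Escape the five XML special characters in a single pass.
function escapeXml(text) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&apos;" };
  return text.replace(/[&<>"']/g, (ch) => entities[ch]);
}

// Build a request asking for an N-sentence summary plus sentiment.
function buildRequest(text, nbSentences) {
  // Fall back to 1 if the value is missing or not a number
  const n = Math.max(1, parseInt(nbSentences, 10) || 1);
  return (
    '<?xml version="1.0" ?>' +
    "<Nserver>" +
    "<NSTEIN_Text>" + escapeXml(text) + "</NSTEIN_Text>" +
    "<Methods>" +
    "<nsummarizer>" +
    "<NbSentences>" + n + "</NbSentences>" +
    "<KBid>IPTC</KBid>" +
    "</nsummarizer>" +
    "<NSentiment/>" +
    "</Methods>" +
    "</Nserver>"
  );
}

console.log(buildRequest("Fish & chips", 2));
```

Escaping inside the builder means the caller can't forget it, which matters because a stray & or < in page text would make the request invalid XML.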
Again, for the purpose of this extension, I'm only using some of the features of the Text Mining Engine. If you wanted to return entities, then you would add the following methods to the call:
<Methods>
  <nfinder>
    <nfExtract>
      <Cartridges>
        <Cartridge>ON</Cartridge>
        <Cartridge>PN</Cartridge>
        <Cartridge>GL</Cartridge>
      </Cartridges>
      <Hierarchy />
    </nfExtract>
  </nfinder>
</Methods>
This would return all Organizations, People and Locations from the text. However, as I said, you are not limited to just these entities with the Text Mining Engine. We also provide Events, Drugs, Symptoms & Date Times out of the box, plus you can create your own. This would also return related entities such as a town's borough, county, country, continent etc., as well as the stock symbols of organizations.
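If you wanted to vary which entity types are requested, the Methods block above could be generated from a list of cartridge codes. This is only a sketch: buildEntityMethods is a hypothetical helper, and the only codes used here are the three from the example (ON, PN and GL).

```javascript
// Hypothetical helper: builds the <Methods> block for entity extraction
// from a list of cartridge codes such as ["ON", "PN", "GL"].
function buildEntityMethods(cartridges) {
  const items = cartridges
    .map((code) => "<Cartridge>" + code + "</Cartridge>")
    .join("");
  return (
    "<Methods><nfinder><nfExtract>" +
    "<Cartridges>" + items + "</Cartridges>" +
    "<Hierarchy />" +
    "</nfExtract></nfinder></Methods>"
  );
}

console.log(buildEntityMethods(["ON", "PN", "GL"]));
```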
Creating a Popup
I decided to use a Chrome Extension popup to display the summarization results and let the user specify the number of sentences to return. Here's what the popup looks like:
The popup is just a simple HTML page. I've added elements for the tone, subjectivity & summarization. I also added a slide container to specify the number of sentences. These elements are then referenced from the JavaScript for the popup.
I've excluded all of the style sheet parts from this section, but you can see them in the full GitHub project.
<body>
  <img src="logo.png" width="100%">
  <p>Drag the slider to change the number of sentences</p>
  <div class="slidecontainer">
    <input type="range" min="1" max="20" value="2" class="slider" id="myRange">
    <p>Number of Sentences: <span id="demo"></span></p>
    <p><font size="3">Subjectivity: <span id="subjectivity"></span></font></p>
    <p><font size="3">Tone: <span id="tone"></span></font></p>
    <p><font size="3"><span id="summary"></span></font></p>
  </div>
</body>
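As an illustration of how the slider and its label might be wired together, here's a small sketch. The element ids (myRange and demo) come from the HTML above. The bindSlider helper and its onChange callback are hypothetical names, not taken from the extension's source.

```javascript
// Hypothetical helper: keeps a label in sync with a range input and
// reports each new value to an optional callback.
function bindSlider(slider, label, onChange) {
  label.textContent = slider.value; // show the initial value
  slider.oninput = function () {
    label.textContent = slider.value;
    if (typeof onChange === "function") {
      onChange(Number(slider.value));
    }
  };
}

// In the popup, look the elements up by the ids used in the HTML.
// The guard lets the same file load outside a browser without errors.
if (typeof document !== "undefined") {
  const slider = document.getElementById("myRange");
  const label = document.getElementById("demo");
  if (slider && label) {
    bindSlider(slider, label, function (n) {
      console.log("Sentences requested: " + n);
    });
  }
}
```

In practice the callback is where a fresh summary request would be kicked off with the new sentence count.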
Cleaning the HTML
When you access the Document Object of a webpage, you'll notice that it's full of junk you wouldn't otherwise know was there. You get all sorts of nonsense about cookies that will spoil the results of the Text Mining Engine. Therefore, before I send the text from a page to the service, I do some cleaning. First, I only keep text that is displayed. I do this by looping through all of the div sections of the document and removing them from the body if they're hidden, like so:
// Copy the live collection first so that removing nodes doesn't skip elements
var divs = Array.from(document.getElementsByTagName("div"));
for (var divx of divs) {
  // Only inline display styles are checked here
  if (divx.style.display === 'none' && divx.parentNode !== null) {
    divx.parentNode.removeChild(divx);
  }
}
I then loop through what's left and look for any paragraph elements, again checking to see if they're visible. If so, I take the text and add it to the text I'm going to pass to the service, like so:
var allPs = document.getElementsByTagName("p");
var rText = "";
for (var val of allPs) {
  // Keep a paragraph only if it is neither display:none nor marked hidden
  if (val.style.display !== 'none' && val.hidden !== true) {
    rText += val.innerText + '. ';
  }
}
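One caveat with this check is that element.style.display only reflects inline styles, so a paragraph hidden by a stylesheet rule on the paragraph itself would still be picked up. A possible refinement, sketched below, is to consult window.getComputedStyle instead. The collectVisibleText helper is hypothetical, and the style lookup is passed in as a parameter purely so the function can be tried outside a browser.

```javascript
// Hypothetical helper: joins the text of visible paragraphs.
// getStyle is injected so the function can be exercised without a browser.
function collectVisibleText(paragraphs, getStyle) {
  const parts = [];
  for (const p of paragraphs) {
    const style = getStyle(p);
    const hidden =
      p.hidden === true ||
      style.display === "none" ||
      style.visibility === "hidden";
    const text = (p.innerText || "").trim();
    // Skip hidden paragraphs and ones with no text
    if (!hidden && text !== "") {
      parts.push(text);
    }
  }
  return parts.join(". ");
}

// In a page context, use the browser's computed styles.
if (typeof document !== "undefined") {
  const pageText = collectVisibleText(
    document.getElementsByTagName("p"),
    (el) => window.getComputedStyle(el)
  );
  console.log(pageText);
}
```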
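Finally, I replace some characters in the collected text that can cause the service problems:
// Collapse line breaks into single spaces
text = text.replace(/[\n\r]+/g, ' ');
// Escape XML special characters; & must come first so the other entities aren't double-escaped
text = text.replace(/&/g, "&amp;");
text = text.replace(/</g, "&lt;");
text = text.replace(/>/g, "&gt;");
text = text.replace(/"/g, "&quot;");
text = text.replace(/'/g, "&apos;");
// Strip bracketed reference markers such as [12]
text = text.replace(/\[\d*\]/g, ' ');
// Put exactly one space after a full stop, unless it is followed by a digit
text = text.replace(/\.(?=[^\d])[ ]*/g, '. ');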
Calling The Service From JavaScript
Once you've cleaned up all of the text from the page, you can then call the service from the Chrome extension using JavaScript. I used the following code to pass the text, call the service and then populate the results back to the popup:
// ranger, summary, Subjectivity and tone are assumed to be references to the popup elements, defined elsewhere in popup.js
var command = "<?xml version=\"1.0\" ?><Nserver><NSTEIN_Text>";
command = command + text;
// Say we're looking for # sentences
command = command + "</NSTEIN_Text><Methods><nsummarizer><NbSentences>" + ranger.value + "</NbSentences><KBid>IPTC</KBid></nsummarizer><NSentiment></NSentiment></Methods></Nserver>";
// Now do the post
var URL = 'http://[Your TME URL]:[port]/rs/';
fetch(URL, {
  method: "POST",
  body: command,
  headers: {"Content-Type": "application/xml"}
})
.then(function(res) {
  if (!res.ok) { // ok if status is 2xx
    throw new Error('Request failed. Returned status of ' + res.status);
  }
  console.log('OK ' + res.statusText);
  return res.text();
})
.then(function(text) {
  var parser = new DOMParser();
  var xmlDoc = parser.parseFromString(text, "text/xml");
  var result = xmlDoc.getElementsByTagName("Summary")[0].textContent;
  console.log(result);
  summary.innerText = result;
  var DocLevels = xmlDoc.getElementsByTagName("DocumentLevel")[0];
  Subjectivity.innerText = DocLevels.getElementsByTagName("Subjectivity")[0].textContent;
  tone.innerText = DocLevels.getElementsByTagName("Tone")[0].textContent;
  console.log('Subjectivity ' + Subjectivity.innerText);
  console.log('Tone ' + tone.innerText);
  return result;
})
.catch(function(err) {
  // Covers network errors, non-2xx responses and unexpected response shapes
  console.log(err.message);
});
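The sample response earlier carries an ErrorID of 0 and an ErrorDescription of Success, so it may be worth checking those before reading the Summary element. The sketch below assumes that any non-zero ErrorID signals a failure, which is a guess based on that one sample and not documented behaviour. assertTmeSuccess is a hypothetical helper name.

```javascript
// Hypothetical helper: throws if the TME response does not report ErrorID 0.
// Assumption: a non-zero ErrorID means the request failed.
function assertTmeSuccess(xml) {
  const id = xml.match(/<ErrorID[^>]*>(-?\d+)<\/ErrorID>/);
  const desc = xml.match(/<ErrorDescription>([\s\S]*?)<\/ErrorDescription>/);
  if (!id) {
    throw new Error("No ErrorID found in response");
  }
  if (id[1] !== "0") {
    throw new Error("TME error " + id[1] + ": " + (desc ? desc[1] : "unknown"));
  }
  return xml; // pass the response through unchanged on success
}

const okResponse =
  '<Nserver Version="3.0"><ErrorID Type="Status">0</ErrorID>' +
  "<ErrorDescription>Success</ErrorDescription><Results></Results></Nserver>";

console.log(assertTmeSuccess(okResponse) === okResponse);
```

Because it returns the response text unchanged on success, it could sit in the promise chain between res.text() and the parsing step, for example as .then(assertTmeSuccess).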
Loading the Extension in Chrome
Before you're able to load the extension in Chrome, you need to create a manifest file that includes version information, extension icon and permissions. Mine looks like this:
{
  "name": "Magellan",
  "version": "1.0",
  "manifest_version": 2,
  "description": "Summarize pages for a popup",
  "browser_action": {
    "default_icon": "icon.png",
    "default_popup": "popup.html"
  },
  "permissions": ["tabs", "<all_urls>"]
}
You'll then need to change the URL in popup.js to point to your instance of the Text Mining Engine service.
To load your extension, enter the following in Chrome's address bar:
chrome://extensions/
That will bring up a screen like the following:
Select the Load Unpacked option from the top and browse to the location of the extension files on your local machine.
And there you have it, the extension is loaded.
Summary
So, that's it. This was a walk-through of how to call Magellan's Text Mining microservice from a Chrome extension to summarize web pages. Please feel free to contact me with any comments or questions.
Ironically, this post has been longer than I thought it would be…