Imagine mining ANY HTML page on the web as a potential BIRT Data Source. Using this sample report as a guide, now you can! The included sample is a live snapshot for a given stock (the stock ticker is supplied as a report parameter). The report then drills into a sub-report detailing the trading range for the same stock over a given period of time.<br />
<br />
This strategy requires a creative process for accessing and parsing data. BIRT does not have a native HTML data source, but it does have an XML data source. I figured that HTML is structurally similar enough to XML that processing it with the XML data source might not be a stretch. The last link in the chain was getting the HTML snippet I needed embedded in a well-formed XML stream. That is where the Yahoo Query Language (YQL) comes into play. YQL allows SQL-like queries to be executed against web-based content (more on YQL
here). The results of those queries can be returned as XML. BIRT takes it from there.<br />
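<br />
To make the YQL step concrete, here is a minimal Python sketch of how such a URL could be assembled for a given ticker. The endpoint address, the "html" table query text, and the default table XPath are assumptions for illustration; they are not copied from the sample's Data Source, so check the report itself for the URL it really uses.<br />

```python
from urllib.parse import urlencode

# Assumed YQL public endpoint; the sample's Data Source URL may differ.
YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"


def build_yql_url(ticker, table_xpath="//table[1]"):
    """Build a YQL URL that asks for one HTML table wrapped in XML.

    table_xpath is a hypothetical placeholder for whatever expression
    isolates the quote table on the target page.
    """
    # The page being mined, with the ticker substituted in.
    page_url = "http://finance.yahoo.com/q?" + urlencode({"s": ticker})
    # SQL-like YQL statement (assumed syntax for the "html" table).
    query = 'select * from html where url="{}" and xpath="{}"'.format(
        page_url, table_xpath
    )
    # format=xml requests an XML response, which the XML data source can read.
    return YQL_ENDPOINT + "?" + urlencode({"q": query, "format": "xml"})


url = build_yql_url("GOOG")
print(url)
```

Only the ticker changes between runs, which is the same substitution the report performs through its parameter binding.<br />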
<br />
Once we have crafted the Yahoo query to isolate the data we need (most likely a table object on a web page), we can use XPath queries inside the XML data source to parse the core HTML and isolate the data we want to use. Using XPath array notation we can flatten the HTML data into distinct rows and columns to use in BIRT.<br />
<br />
In the example “LiveStockDetails”, we parse the HTML table detailing the current price of the stock located at this URL (the URL is accessed via YQL, not directly):
http://finance.yahoo.com/q?s=GOOG (note that the ticker symbol is parameterized and can be changed each time the report is run). Using XPath array notation we are able to isolate distinct data elements within the table DOM element (the table is all that is returned by YQL). Looking at the table itself, the following query will isolate the “Last Trade” time:
/tr[2]/td/span. This selects the “span” tag within the first table data element of the table's second row. Have a look at the column mapping of the Data Set to see each of the XPath statements used. You may also want to look at the Data Source to see the YQL URL that is executed (execute the URL manually
here). The parameter bindings on the Data Source handle the dynamic substitution of the stock ticker into the YQL URL.<br />
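<br />
The column-mapping idea can be tried outside BIRT with a short Python sketch. The XML below is a made-up stand-in for a YQL response (the element names "query" and "results", the row labels, and the values are all hypothetical), and the mapping names are invented for the example. Python's ElementTree only accepts relative paths, so each XPath is written with a leading "./" instead of "/".<br />

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the XML that wraps the HTML table.
SAMPLE_XML = """<query>
  <results>
    <table>
      <tr><th>Last Trade:</th><td><span>100.00</span></td></tr>
      <tr><th>Trade Time:</th><td><span>4:00PM ET</span></td></tr>
      <tr><th>Change:</th><td><span>1.25</span></td></tr>
    </table>
  </results>
</query>"""

# One XPath per output column, using array notation to pick a table row.
COLUMN_MAPPINGS = {
    "last_trade": "./tr[1]/td/span",
    "trade_time": "./tr[2]/td/span",
    "change": "./tr[3]/td/span",
}


def flatten_table(xml_text, row_path="./results/table", mappings=COLUMN_MAPPINGS):
    """Return one dict per matched table, with one key per column mapping."""
    root = ET.fromstring(xml_text)
    rows = []
    for table in root.findall(row_path):
        # findtext returns the text of the first match, or None if absent.
        rows.append({name: table.findtext(path) for name, path in mappings.items()})
    return rows


rows = flatten_table(SAMPLE_XML)
print(rows[0]["trade_time"])  # prints "4:00PM ET"
```

A mapping that returns None is a quick sign that the underlying HTML no longer matches the XPath.<br />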
<br />
This approach requires a detailed understanding of the underlying HTML for processing. Once the individual column mappings are complete, the Data Set is stable and this will not have to be revisited (unless the underlying HTML changes!).<br />
<br />
To review:<br />
YQL --> XML w/ Embedded HTML Table Element --> BIRT Data Set (via XPath Parsing)<br />
<br />
Scrolling Ticker<br />
This sample also features a scrolling news ticker detailing the latest headlines for the target stock. These headlines are gathered just as the stock data is: through a YQL-fed XML Data Set. In this case, the headlines are fetched and placed in the Data Set. This processing is closer to native XML, using a standard YQL query, so the column mappings are very straightforward. The data are placed in a hidden table on the report (it is at the top of the sample). The table is there simply to force the Data Set to build out. The onFetch scripting event on the Data Set is used to build a Persistent Global Variable called “linkBlock”. This variable contains HTML markup detailing the headlines; it is embedded in a DIV at the center of the report and in turn processed by the JavaScript at the bottom of the report. Have a look at the fairly simple script for more details on how this feature works.<br />
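<br />
The per-row accumulation that builds “linkBlock” can be sketched as follows. This is Python for illustration only: in the report the same logic lives in the Data Set's onFetch script and the result is stored in a Persistent Global Variable rather than on an object. The class name, the "title" and "link" column names, and the anchor markup are assumptions, not taken from the sample.<br />

```python
from html import escape


class HeadlineAccumulator:
    """Collects one HTML anchor per fetched headline row."""

    def __init__(self):
        # Plays the role of the "linkBlock" variable in the report.
        self.link_block = ""

    def on_fetch(self, row):
        # Called once per row, as a data set's fetch event would be.
        title = escape(row["title"])
        link = escape(row["link"], quote=True)
        self.link_block += '<a href="{}">{}</a> '.format(link, title)


acc = HeadlineAccumulator()
acc.on_fetch({"title": "Example headline one", "link": "http://example.com/1"})
acc.on_fetch({"title": "Example headline two", "link": "http://example.com/2"})
print(acc.link_block)
```

Building the markup row by row is why the hidden table matters: without something consuming the Data Set, no rows are fetched and the variable stays empty.<br />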
<br />
These samples are fully functional for BIRT 2.3 & 2.5. No external dependencies at all.<br />
Good Luck!