XML Data Source Performance

We are looking for a "default" or "universal" schema for use with XML data sets. When BIRT requests XML data and no schema is supplied, it fetches the entire dataset twice: once to read all of the data and build its own default schema, and a second time to read the data and validate it against that schema. This second call of course always validates perfectly, since the schema was just built from the same data.

Before you ask, yes, we know that BIRT actually makes two calls for the entire dataset if a schema is not supplied. Because the data is supplied by a server-side agent (as opposed to being read from, say, an .xml file on the file system), we have a logging mechanism that lets us see both calls when no schema is supplied, and only one call when we do supply a schema.

In our case, we don't need detailed validation of the XML data at the report level, because the server agent that supplies the data has already performed the validation. What we need is a "universal" schema that BIRT can use to "validate" an XML file only at its highest level: a root tag and a tag for each "row" of data (such a schema would be useless for actual detailed data validation, but that's OK here).

Alternatively, does anyone know how to suppress BIRT's desire to always make that first pull of the data to create its own schema?

Another BIRT user posted the following issue with XML Data Sources, but we are not sure if this is related to our situation unless our server agent call could be likened to their InputStream.

Bug 293726 - report creation takes > 30 minutes for 10.000 XML records: https://bugs.eclipse.org/bugs/show_bug.cgi?id=293726

Comments

thuston

I don't understand. 
<pre class='_prettyXprint _lang-auto _linenums:0'><book>
<title>IMA Title</title>
<chapter>1 stuff</chapter>
<chapter>2 things</chapter>
</book></pre>
How can you expect it to find <chapter> if it has no schema and has not pre-read the document to find it? 
The ODA has to be sure the new document you give it matches the schema the DataSet was defined against. If it just makes assumptions, you get corrupt data.

jjfeigal

I understand your comment, but in my original post I tried to emphasize that I don't need the ODA to validate the detailed document. The server agent that generates the XML is responsible for generating valid XML in the form that the BIRT report was originally defined against. 
 
I'm hoping that there is a way to provide a high-level "stripped down" schema which the XML document will satisfy without validating every XML element contained in the document. For example, my XML document will be in a form as follows: 
 
<view><viewEntry>...</viewEntry><viewEntry>...</viewEntry></view> 
 
where <view> is the root with one or more <viewEntry> elements. Each <viewEntry> element contains additional XML elements, but I don't need the ODA to validate them because I know that the server agent has built the required XML for each <viewEntry> for the XPath defined in the BIRT report. 
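
For illustration, a "stripped down" schema of this kind could be sketched in XSD roughly as follows. It declares only <view> and <viewEntry> and uses a wildcard so that whatever is inside each <viewEntry> is skipped by a validator. This is only a sketch under assumptions: it assumes the elements are not in a namespace, and it has not been tried against BIRT's XML data source, so whether BIRT accepts it (or can still resolve the column XPaths when the child elements are not declared) is an open question.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical minimal schema: only the root and the "row" element are declared. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="view">
    <xs:complexType>
      <xs:sequence>
        <!-- One or more rows -->
        <xs:element name="viewEntry" minOccurs="1" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <!-- Any child elements are allowed and are not validated -->
              <xs:any processContents="skip" minOccurs="0" maxOccurs="unbounded"/>
            </xs:sequence>
            <!-- Any attributes on viewEntry are allowed and are not validated -->
            <xs:anyAttribute processContents="skip"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

With processContents="skip", a standard XSD validator only checks that the contents of each <viewEntry> are well-formed XML, which matches the "root plus row" level of checking described above.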
 
What I'm asking is if there is a way to bypass the detailed validation that you refer to. As you imply, the answer could very well be "No - a detailed schema completely matching the XML document is required".