We are looking for a "default" or "universal" schema for use with XML data sets. When BIRT makes a call for XML data and no schema is supplied, it requests the entire dataset twice: once to get all of the data and build its own default schema, and a second time to get the data and validate it against that schema. The second call of course always validates perfectly, since the schema was just built from this same data.
Before you ask: yes, we are sure that BIRT actually makes two calls for the entire dataset if a schema is not supplied. Because the data is supplied by a server-side agent (as opposed to being read from, say, a .xml file on the file system), we have a logging mechanism that lets us see both calls when no schema is supplied, and only one call when we do supply a schema.
In our case, we don't need detailed validation of the XML data at the report level, because the server agent that supplies the data has already validated it. What we need is a "universal" schema that BIRT will use to "validate" an XML file only at its highest level: a root tag and a tag for each "row" of data. Such a schema would be useless for actual detailed data validation, but that's OK here.
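To illustrate the kind of schema meant here, below is a minimal sketch in Java (BIRT's own language) that checks such a permissive schema with the standard javax.xml.validation API, outside of BIRT. The element names root and row, the class name PermissiveSchemaSketch, and the sample documents are placeholders made up for illustration. Whether BIRT's XML data source would accept a wildcard schema like this one and skip its first pull is an assumption that has not been verified.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

// Hypothetical helper: validates XML against a "root / row / anything" schema.
final class PermissiveSchemaSketch {

    // Placeholder element names: "root" wraps zero or more "row" elements.
    // Inside a row, xs:any with processContents='skip' accepts any child
    // elements without checking them; xs:anyAttribute does the same for
    // attributes on the row itself.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='root'>"
      + "<xs:complexType><xs:sequence>"
      + "<xs:element name='row' minOccurs='0' maxOccurs='unbounded'>"
      + "<xs:complexType>"
      + "<xs:sequence>"
      + "<xs:any processContents='skip' minOccurs='0' maxOccurs='unbounded'/>"
      + "</xs:sequence>"
      + "<xs:anyAttribute processContents='skip'/>"
      + "</xs:complexType>"
      + "</xs:element>"
      + "</xs:sequence></xs:complexType>"
      + "</xs:element>"
      + "</xs:schema>";

    // Returns true if the XML passes the permissive schema, false otherwise
    // (including when the XML is not well-formed).
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Made-up sample data; the column names are not from any real feed.
        String sample = "<root><row><id>1</id><name>a</name></row><row/></root>";
        System.out.println("sample accepted: " + isValid(sample));
    }
}
```

The XSD string is the part that would be handed to BIRT as the schema file; the Java around it only shows that the schema accepts any row content while still rejecting a document whose root or row tags do not match.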
Alternatively, does anyone know how to suppress BIRT's desire to always make that first pull of the data to create its own schema?
Another BIRT user posted the following issue with XML Data Sources, but we are not sure whether it is related to our situation, unless our server agent call could be likened to their InputStream.
Bug 293726 - report creation takes > 30 minutes for 10.000 XML records
https://bugs.eclipse.org/bugs/show_bug.cgi?id=293726