Not sure what can be done here - hopefully y'all can help me out.
We're using BIRT in an enterprise application where the data against which it reports can often be quite large. In some cases users are able to provide combinations of filtering criteria to our reports that result in 100,000s, sometimes millions of rows being reported on. For those rows, often there are detail subqueries, so we're talking lots and lots of pages!
(Anecdotally, one of my production operations guys told me he found a 27 GB report file sitting in a temp folder on the server! We've since made some changes to preclude such things...)
OK, so the problem is that we're seeing a large memory footprint for reports that are rather simple but that result in many pages. We're not storing the output - we stream it directly to the browser or to a file - but watching the heap during the execution of these reports shows that it takes a huge amount of memory to process the report. (In my test case, for instance, I am working against 1.7 million rows for a report that has just a few lines of text per page - it's a certificate-of-completion report for a course. I have never had it finish, even with a 6 GB heap; it had generated about 600 MB of output before it ran out of heap memory.)
This feels wrong to me. If we're simply generating a report and streaming the results directly to a file or browser, I would expect the memory footprint not to correlate with the number of pages, but to quickly plateau into a steady state. Whether we do PDF or HTML, however, it is clear that more pages = more memory. (And for HTML, a "page" is just the same content that would have been on a page in PDF.)
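For reference, our streaming path is roughly the following (a sketch using the standard BIRT engine API; `engine` and `design` stand in for our already-initialized report engine and opened report design, and the method name is just illustrative):

```java
import java.io.OutputStream;
import org.eclipse.birt.report.engine.api.HTMLRenderOption;
import org.eclipse.birt.report.engine.api.IReportEngine;
import org.eclipse.birt.report.engine.api.IReportRunnable;
import org.eclipse.birt.report.engine.api.IRunAndRenderTask;

// Sketch: run-and-render straight to the servlet/file OutputStream,
// so the finished report is never held server-side in full. Despite
// this, heap usage still grows with the page count.
void streamReport(IReportEngine engine, IReportRunnable design,
                  OutputStream out) throws Exception {
    IRunAndRenderTask task = engine.createRunAndRenderTask(design);
    HTMLRenderOption options = new HTMLRenderOption();
    options.setOutputFormat("html");  // or "pdf" via PDFRenderOption
    options.setOutputStream(out);     // stream directly, no temp copy
    task.setRenderOption(options);
    try {
        task.run();
    } finally {
        task.close();
    }
}
```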
What we've done so far is to at least set DataEngine.MEMORY_BUFFER_SIZE in the AppContext to a reasonable number (10 MB is where I settled, but maybe that is too high?). Prior to this, 1.7 million rows would not even result in any output - the result-set caching that happens before the report is generated would itself blow the heap.
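Concretely, the setting looks like this (a sketch; `task` stands for our run-and-render task, and the 10 MB value is just where we landed, not a recommendation):

```java
import java.util.HashMap;
import java.util.Map;
import org.eclipse.birt.data.engine.api.DataEngine;

// Cap the data engine's in-memory result-set buffer so that rows
// beyond the cap spill to disk instead of accumulating on the heap.
Map<String, Object> appContext = new HashMap<>();
appContext.put(DataEngine.MEMORY_BUFFER_SIZE, 10); // interpreted as MB in our setup
task.setAppContext(appContext); // task is an IRunAndRenderTask (or IRunTask)
```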
So the question is: is this just normal BIRT behavior? Are there other configuration options we can set that would restrict report memory? (And limiting the number of rows the queries can process is not an option - that would break the analytic reports, for instance...)
What can we do to get this under control? Or is BIRT not going to scale to situations like this? (Keep in mind that one of our output formats is CSV, and so a million-row CSV report, while cumbersome, is still in the realm of real-world reporting.)