Data Cube: Looking for string concatenation in the summary field
BirtWidi
<p>I have a data cube and I wish to summarize over string data types.</p>
<p>So far, the only summary functions I have found that work for fields of type STRING are MIN, MAX, FIRST, and LAST.</p>
<p> </p>
<p>I would like a summary function that concatenates all values using a given separator.</p>
<p>Is there an extension point where I can add such a summary function?</p>
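<p>The accumulation logic itself would be trivial; what I am missing is the official place to plug it in. A rough sketch of what I have in mind, in plain Java (deliberately not tied to any concrete BIRT interface - the exact SPI would have to be checked against the 4.4.2 SDK):</p>
<pre>
// Minimal sketch of the per-group logic a "CONCATENATE" summary function
// would need. This is NOT the BIRT aggregation SPI itself, only the idea.
import java.util.StringJoiner;

public class ConcatenateAccumulator {
    private final String separator;
    private StringJoiner joiner;

    public ConcatenateAccumulator(String separator) {
        this.separator = separator;
    }

    /** Called once before the rows of a group are processed. */
    public void start() {
        joiner = new StringJoiner(separator);
    }

    /** Called for every detail row that falls into the group. */
    public void onRow(Object value) {
        if (value != null) {
            joiner.add(value.toString());
        }
    }

    /** Returns the summary value for the group. */
    public String getValue() {
        return joiner.toString();
    }

    // Tiny demo of the accumulate-per-group pattern.
    public static void main(String[] args) {
        ConcatenateAccumulator acc = new ConcatenateAccumulator(", ");
        acc.start();
        acc.onRow("2004-01-02");
        acc.onRow("2004-01-15");
        acc.onRow("2004-01-29");
        System.out.println(acc.getValue()); // 2004-01-02, 2004-01-15, 2004-01-29
    }
}
</pre>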
<p> </p>
<p>A while ago, someone suggested using a grouped table instead of a data cube and adding sub-reports in the table cells.</p>
<p>That works for small data sets - but on larger data sets the runtime grows to more than 5 minutes.</p>
<p>I believe this is because each sub-report is a join of three data sets, and it looks like BIRT performs that join again for every single sub-report.</p>
<p> </p>
<p>Can anyone give me a hint?</p>
<p> </p>
<p>Thanks<br>
Widi</p>
Comments
JFreeman
<p>What version of BIRT are you using?</p>
<p> </p>
<p>Can you provide details on your use case?</p>
BirtWidi
<p>Jesse,</p>
<p> </p>
<p>first, I use BIRT 4.4.2 with Zend Studio 12 as my development environment and Tomcat 6 in production.</p>
<p> </p>
<p>And second - more details: I have attached a sample report based on the SampleDB.</p>
<p>My manager wants to see which orders each of his sales employees has taken, distributed over the months.</p>
<p> </p>
<p>My first attempt was to build a data cube with dimensions on the EMPLOYEENUMBER and ORDERDATE columns. But there I am missing a summary function that lists all ORDERDATE values in one summary field.</p>
<p> </p>
<p>My second attempt was to build a grouped table (grouped on EMPLOYEENUMBER), add 12 columns for the months, and add 12 sub-reports, each filtered by the EMPLOYEENUMBER and the MONTH of its column.</p>
<p>This gives exactly the result I want - but it runs for a long time, sometimes longer than the HTTP request timeout.</p>
<p>You can see the difference: the whole crosstab needs approx. 130 ms, while the grouped table needs 4400 ms.</p>
<p> </p>
<p>My real database contains large tables that are fetched from an external server via REST. That system cannot provide joined views on the server side, so I have to join the data sets inside the BIRT report.</p>
<p>As a result, preparing the data set for one sub-report takes about 5 seconds - that is, about 1 minute per table row (12 sub-reports per row).</p>
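<p>The expensive part is repeating the join for every sub-report. If the joined data could be fetched once and aggregated in memory, each cell would only need a map lookup. A rough sketch of that idea in plain Java (the Row class, the field names, and the key format are made up for illustration - in the report this would still have to be wired in, e.g. via a scripted data set):</p>
<pre>
// Sketch only: do the expensive REST fetch/join ONCE, aggregate in memory,
// then each table or crosstab cell becomes a cheap map lookup.
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

public class OrderDatesByEmployeeMonth {

    /** One joined row: an order taken by an employee (illustrative shape). */
    static class Row {
        final int employeeNumber;
        final Date orderDate;
        Row(int employeeNumber, Date orderDate) {
            this.employeeNumber = employeeNumber;
            this.orderDate = orderDate;
        }
    }

    /** Single pass over the joined rows: "EMPLOYEENUMBER|MM" -> "date1, date2, ...". */
    static Map&lt;String, String&gt; summarize(List&lt;Row&gt; rows) {
        SimpleDateFormat monthFmt = new SimpleDateFormat("MM");
        SimpleDateFormat dayFmt = new SimpleDateFormat("yyyy-MM-dd");
        Map&lt;String, StringJoiner&gt; acc = new LinkedHashMap&lt;&gt;();
        for (Row r : rows) {
            String key = r.employeeNumber + "|" + monthFmt.format(r.orderDate);
            acc.computeIfAbsent(key, k -&gt; new StringJoiner(", "))
               .add(dayFmt.format(r.orderDate));
        }
        Map&lt;String, String&gt; summary = new LinkedHashMap&lt;&gt;();
        acc.forEach((key, joined) -&gt; summary.put(key, joined.toString()));
        return summary;
    }
}
</pre>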
<p> </p>
<p>And now ... any suggestions?</p>
<p> </p>
<p>Thanks</p>
<p>Widi</p>
JFreeman
<p>So you want to concatenate all of the dates together in order to display all of the applicable dates in a single cell of the crosstab, like you have it in the table?</p>
<p> </p>
<p>Joined data sets can definitely cause a performance impact depending on the situation.</p>
<p>I would typically try to avoid a joined data set if possible.</p>
BirtWidi
<blockquote class="ipsBlockquote" data-author="JFreeman" data-cid="136924" data-time="1433867831">
<div>
<p>So you want to concatenate all of the dates together in order to display all of the applicable dates in a single cell of the crosstab, like you have it in the table?</p>
</div>
</blockquote>
<p> </p>
<p>Yes, that's exactly what I need!</p>
<p>You can convert the dates to strings and concatenate the strings - that doesn't matter. For that, a computed column could be added to the data set (in the BIRT report - not in my external and rather limited database system).</p>
<p>The result I need is a list of dates in each summary cell of the crosstab - just like in my table.</p>
<p> </p>
<p>Regards,</p>
<p>Friedbert</p>
JFreeman
<p>I've been working on this quite a few different ways, and between your data source/set setup and the way aggregations work in crosstabs, I'm not finding a good way to create the string concatenation you want within the crosstab.</p>
<p> </p>
<p>I think the best option is to go the table route and instead focus on improving the setup and performance of your data source/sets. </p>
BirtWidi
<p>Jesse,</p>
<p> </p>
<p>thanks for your work.</p>
<p>I have also come to the conclusion that this problem is difficult to solve in the given environment.</p>
<p>I think I have to look for other solutions. One possibility would be to copy the data into a separate SQL database; then I would have the full power of SQL and could run the joins efficiently on the database server.</p>
<p>But so far I have had no desire to maintain a redundant copy of the data.</p>
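<p>If I did go that route, the concatenation itself would become trivial, because the database can do it in the query. A rough sketch with JDBC (the table and column names are only placeholders, and GROUP_CONCAT is MySQL syntax - Oracle/DB2 offer LISTAGG, PostgreSQL STRING_AGG):</p>
<pre>
// Sketch: once the data lives in an SQL database, the per-employee/per-month
// summary is a single query. Table/column names and MySQL's GROUP_CONCAT are
// assumptions for illustration only.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class OrderDatesQuery {
    public static void main(String[] args) throws SQLException {
        String sql =
            "SELECT EMPLOYEENUMBER, MONTH(ORDERDATE) AS M, " +
            "       GROUP_CONCAT(ORDERDATE ORDER BY ORDERDATE SEPARATOR ', ') AS DATES " +
            "FROM ORDERS " +
            "GROUP BY EMPLOYEENUMBER, MONTH(ORDERDATE)";
        try (Connection con = DriverManager.getConnection(
                 "jdbc:mysql://localhost/reporting", "user", "password");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            while (rs.next()) {
                System.out.printf("%s %s: %s%n",
                    rs.getString("EMPLOYEENUMBER"),
                    rs.getString("M"),
                    rs.getString("DATES"));
            }
        }
    }
}
</pre>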
<p> </p>
<p>Friedbert</p>
JFreeman
<p>Yeah, handling redundant data is less than ideal.</p>
<p> </p>
<p>You could investigate some other options to optimize how the data is pulled in from your existing data source setup. However, based on your earlier description, I think you will keep running into performance issues unless the data sources themselves are changed.</p>