How do you make a histogram in BIRT?

newbie321
edited February 11, 2022 in Analytics #1

I cannot seem to figure out how to make a histogram in BIRT. My hunch is that a bar chart is the one to use, and that is what a couple of other threads suggest, but what do I group on, and how?

Let's say I have a column vector of normally distributed values. What do I put in:
1) the X-Category field in the chart
2) the Y-Series field in the chart

What grouping do I need to perform?
Thanks!

Best Answer

  • newbie321
    edited June 1, 2018 #2 Answer ✓

    @jfranken said:
    I've looked into this issue and I don't see a simple way to create a histogram. There could be options I haven't identified. It seems like creating a histogram should be straightforward, but custom processing is required to group the data by adjustable size bins and do counts on the data points in each bin, then plot the bins vs. the bin counts.

    If you are running Analytics Designer Professional, the chart editor supports Highcharts and there is a Highcharts plugin for histograms. The histogram plugin is not bundled with the product but I believe it could be added. For the open source designer, the only option I can think of is to use a Scripted data set. I tried using a table with group aggregations as the data binding for the chart. I could get the correct aggregations, but I couldn't get the table aggregations to bind with the chart axes. Using a Scripted data set, you can loop through the data in code creating the bins as well as the counts for each bin. Use those values to make a two column data set. Then you can simply create a bar chart with the "bin" column data as the X axis and the "bin count" column data as the Y series.

    I hope this helps.

    Thanks a lot for the quick response confirming that histograms are not possible within BIRT using the standard chart tools. I needed a second opinion to make sure I was not missing anything, as I wanted to write the minimum amount of code.
    In the meantime, I did my histogram calculations outside of BIRT (in Java, using the Apache Commons library) and then just pushed the x,y pairs into the BIRT bar chart. This is exactly what you suggest at the end of your email. The BIRT bar chart handled this method very well.

    On a slightly different note, if you have a chance, could you please look at my other posts, including the one about scatter plots where two points have the same x-coordinate?

    Thanks again for your help!!

Answers

  • jfranken
    edited May 25, 2018 #3

    A bar chart or area chart would work for a histogram. The x-axis series is typically date/time data. The y-axis series data will be your vector data. The default y series aggregation is "SUM", which is probably what you want, although you could change it to "Average" or another type of aggregation. There's an icon next to the x-axis series box on the Select Data tab of the chart editor that opens the Group & Sorting editor. Typically a time interval is chosen to group by. Each bar in the chart will show an aggregation of the data for one time period. If you have 10 years' worth of data, group by years. If you have 12 days' worth of data, group by days. In other words, create a group that will result in a readable chart.

  • Thanks for a quick reply.

    OK, I have a returns series (1 x 200 in dimension), and I want 'percent' on the x-axis and 'count' on the y-axis (the number of observations that fall within each bin). Let's say my range along the x-axis is -20% to +20%, with a tick step size = 8. There is nothing to average or to sum. The machine must be able to count the number of entries that fall within a category.

    I think your suggestion does not address generating a histogram. It only performs an aggregation, and that aggregation does not lead to a histogram.

    Thanks!
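
To make the counting requirement concrete, here is a minimal Python sketch (not BIRT script) of counting observations against fixed bin edges. The function name `count_in_bins`, the eight 5-point bins, and the sample returns are all hypothetical choices for illustration; the post does not specify the exact bin layout. Values outside the range, including a value equal to the top edge, are skipped in this sketch.

```python
from bisect import bisect_right

def count_in_bins(values, edges):
    """Count values in half-open bins [edges[i], edges[i+1]).

    `edges` must be sorted ascending. Values below the first edge or at or
    above the last edge are skipped.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        # Index of the bin whose left edge is the largest edge <= v.
        i = bisect_right(edges, v) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts

# Hypothetical edges in percent: eight 5-point bins from -20 to +20.
edges = [-20, -15, -10, -5, 0, 5, 10, 15, 20]
# Hypothetical returns in percent (made-up numbers, not from the thread).
returns = [-18.0, -3.2, -0.5, 0.0, 1.1, 4.9, 5.0, 12.3, 25.0]

print(count_in_bins(returns, edges))
```

Nothing is summed or averaged here: each observation only increments the counter of the bin it lands in, which is the step a SUM or Average aggregation does not perform.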

  • Sorry, I wasn't thinking about histogram data. For histograms, the x-axis is usually a range of values and the y-axis shows the percentage of values that fall in each part of that range. For example, a histogram on a camera shows the percentage of pixels at each brightness level from lowest brightness to highest. In your description, I'm still not sure what you mean by (1 x 200 in dimension). Also, the -20% to +20% on the x-axis is confusing. If you can provide an example of what you want (not necessarily created in the Designer), that would be helpful. I will be away until Tuesday, so hopefully someone else will respond in my absence.
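
A percentage histogram and a count histogram differ only by a normalization step. The following is a small Python sketch of that step, assuming the per-bin counts are already available; the function name `to_percentages` is made up for this example.

```python
def to_percentages(counts):
    """Convert raw bin counts to percentages of the total number of points."""
    total = sum(counts)
    if total == 0:
        # No data points: report 0% for every bin instead of dividing by zero.
        return [0.0] * len(counts)
    return [100.0 * c / total for c in counts]

print(to_percentages([6, 3, 1]))
```

The bar heights change scale but the shape of the chart stays the same, so either form can be plotted as a bar chart.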

  • newbie321
    edited May 27, 2018 #6

    Hi @jfranken,

    Yes, a general description of the histogram I am trying to generate can be found here: https://en.wikipedia.org/wiki/Histogram
    A 1x200 data vector means you have 1 row and 200 columns, or 200 rows and 1 column. Either way the input is fundamentally the same: a vector of data.

    Apologies, I should have included an explicit example from the start. I am doing that now. The attached Excel file contains the single column vector of data, and the histogram is included as an image. The y-axis of the histogram contains the count of data points that fall within each bin.
    I tried to upload the file, but uploading does not work. So instead, use Excel's =RAND() function to generate random numbers and build a histogram with the Data Analysis->Histogram option.
    The units on the y-axis are COUNT, and the units on the x-axis are the same as the units of the original input column vector.

    Coming back to my original question:
    How does one create a histogram using BIRT? I know it should be a bar chart, but how do we group the data to achieve a histogram effect? The shape of the histogram depends on the number of bins, so your final chart might differ from what I provided in the .xlsx file, but the principle is the same.

    Thanks,
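
The point about the number of bins can be tried out with a short Python sketch. It uses a seeded `random.Random` as a stand-in for Excel's =RAND() (both produce uniform values in [0, 1)); the function name `unit_interval_counts`, the seed, and the bin counts 4 and 10 are arbitrary choices for illustration.

```python
import random
from collections import Counter

def unit_interval_counts(values, bin_count):
    """Count values from [0, 1) in bin_count equal-width bins."""
    # int(v * bin_count) is the bin index; min() guards the top edge.
    tally = Counter(min(int(v * bin_count), bin_count - 1) for v in values)
    return [tally.get(i, 0) for i in range(bin_count)]

rng = random.Random(0)  # fixed seed so repeated runs give the same sample
sample = [rng.random() for _ in range(200)]  # 200 uniform values

for k in (4, 10):
    print(f"{k} bins: {unit_interval_counts(sample, k)}")
```

The same 200 values produce differently shaped bar charts for each bin count, while the counts always add up to 200.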

  • @jfranken
    Any ideas on how to proceed with the histogram? I provided the requested information in my other reply. Thanks!

  • I've looked into this issue and I don't see a simple way to create a histogram. There could be options I haven't identified. It seems like creating a histogram should be straightforward, but custom processing is required to group the data by adjustable size bins and do counts on the data points in each bin, then plot the bins vs. the bin counts.

    If you are running Analytics Designer Professional, the chart editor supports Highcharts and there is a Highcharts plugin for histograms. The histogram plugin is not bundled with the product but I believe it could be added. For the open source designer, the only option I can think of is to use a Scripted data set. I tried using a table with group aggregations as the data binding for the chart. I could get the correct aggregations, but I couldn't get the table aggregations to bind with the chart axes. Using a Scripted data set, you can loop through the data in code creating the bins as well as the counts for each bin. Use those values to make a two column data set. Then you can simply create a bar chart with the "bin" column data as the X axis and the "bin count" column data as the Y series.

    I hope this helps.
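
As a rough illustration of the binning loop described above, here is a minimal Python sketch (not BIRT script). The function name `histogram_rows` and the label format are made up for this example, and it assumes equal-width bins spanning the minimum to the maximum of the data. It returns the two-column "bin" / "bin count" rows that a bar chart could then plot; in a Scripted data set the same loop would have to be written in the data set's own script.

```python
def histogram_rows(values, bin_count):
    """Return (bin_label, count) rows for equal-width bins spanning the data."""
    if not values or bin_count < 1:
        return []
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when all values are equal (hi == lo).
    width = (hi - lo) / bin_count or 1.0
    counts = [0] * bin_count
    for v in values:
        # The maximum value is placed in the last bin instead of a new one.
        index = min(int((v - lo) / width), bin_count - 1)
        counts[index] += 1
    rows = []
    for i, count in enumerate(counts):
        left = lo + i * width
        rows.append((f"{left:.2f} to {left + width:.2f}", count))
    return rows

# Made-up sample data: first column is the bin, second is the bin count.
for label, count in histogram_rows([1, 2, 2, 3, 3, 3, 4, 4, 5, 10], 3):
    print(label, count)
```

Each returned row maps directly onto the chart: the label goes to the X axis and the count to the Y series.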
