Simple XML data source failing with & character in XML file
warwick.baker@gmail.com
<p>Hi,</p>
<p> </p>
<p>I have an XML file as a simple XML data source. It contains a line like this:</p>
<p> </p>
<div>
<pre class="_prettyXprint _lang-xml">
<lineitem><item>AB Allen & Bradley</item></lineitem>
</pre>
</div>
<p>When I try to set up row mappings for my data set in the designer, it fails, saying the source file is invalid.</p>
<p> </p>
<p>If I change the & character to 'and', for example, or remove it completely, everything works.</p>
<p> </p>
<p>How can I handle these characters in the XML file? Is there some way to escape them?</p>
<p> </p>
<p>Thanks, </p>
Comments
warwick.baker@gmail.com
<p>I guess most people will tell me to wrap the content of the tag in <![CDATA[AB Allen & Bradley]]>. But what if I don't control the XML and just pull it from a feed?</p>
<p> </p>
<p>As a follow-on from this initial problem, could anyone tell me whether there is a way to preserve the whitespace inside XML tags as is?</p>
<p> </p>
<p>Thanks.</p>
Clement Wong
<p>Per the XML spec (<a data-ipb='nomediaparse' href='http://www.w3.org/TR/REC-xml/#syntax'>http://www.w3.org/TR/REC-xml/#syntax</a>):</p>
<p style="margin-left:40px;"><em>"The ampersand character (&) and the left angle bracket (<) <strong><em>MUST NOT</em></strong> appear in their literal form, except when used as markup delimiters, or within a <a data-ipb='nomediaparse' href='http://www.w3.org/TR/REC-xml/#dt-comment' title="Comment">comment</a>, a <a data-ipb='nomediaparse' href='http://www.w3.org/TR/REC-xml/#dt-pi' title="Processing instruction">processing instruction</a>, or a <a data-ipb='nomediaparse' href='http://www.w3.org/TR/REC-xml/#dt-cdsection' title="CDATA Section">CDATA section</a>. If they are needed elsewhere, they <em>MUST</em> be <a data-ipb='nomediaparse' href='http://www.w3.org/TR/REC-xml/#dt-escape' title="escape">escaped</a> using either <a data-ipb='nomediaparse' href='http://www.w3.org/TR/REC-xml/#dt-charref' title="Character Reference">numeric character references</a> or the strings "&amp;" and "&lt;" respectively. The right angle bracket (>) may be represented using the string "&gt;", and <em>MUST</em>, <a data-ipb='nomediaparse' href='http://www.w3.org/TR/REC-xml/#dt-compat' title="For Compatibility">for compatibility</a>, be escaped using either "&gt;" or a character reference when it appears in the string "]]>" in content, when that string is not marking the end of a <a data-ipb='nomediaparse' href='http://www.w3.org/TR/REC-xml/#dt-cdsection' title="CDATA Section">CDATA section</a>."</em></p>
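<p>When you do control how the XML is produced, the rule quoted above translates into a small escaping step applied to every text value. A minimal sketch in plain JavaScript (the language used for the BIRT script later in this thread); the function name escapeXmlText is hypothetical and not part of BIRT or E4X.</p>

```javascript
// Minimal sketch: escape a text value so it can sit inside an XML element.
// "&" must be replaced first; otherwise the "&" introduced by "&lt;" and
// "&gt;" would be escaped a second time.
function escapeXmlText(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;"); // always escaping ">" also covers the "]]>" case
}

console.log("<item>" + escapeXmlText("AB Allen & Bradley") + "</item>");
// prints <item>AB Allen &amp; Bradley</item>
```

<p>A numeric character reference such as &#38; would be equally valid for the ampersand; the named form is simply easier to read.</p>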
<p style="margin-left:40px;"> </p>
<p> </p>
<p>Since the incoming data is not well-formed XML, you should read the stream in and then "fix" it by adding CDATA sections where needed.</p>
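<p>A different technique from the CDATA wrapping described in this thread is to escape only the bare ampersands in the incoming stream before parsing. A sketch, assuming unescaped & characters are the feed's only problem; the regex is a simplification of XML's name rules, a stray < cannot be repaired this reliably, and escapeBareAmpersands is a hypothetical name.</p>

```javascript
// Sketch: escape "&" characters that do not already start an entity or
// character reference (&amp;, &lt;, &#38;, &#x26;, ...).
// Assumption: entity names use ASCII letters, digits, "_", "-" and ".",
// a simplification of XML's full Name rule.
var BARE_AMPERSAND = /&(?![A-Za-z_][A-Za-z0-9_.-]*;|#[0-9]+;|#x[0-9A-Fa-f]+;)/g;

function escapeBareAmpersands(xml) {
  return xml.replace(BARE_AMPERSAND, "&amp;");
}

var feed = "<lineitem><item>AB Allen & Bradley</item><item>Tom &amp; Jerry</item></lineitem>";
console.log(escapeBareAmpersands(feed));
// prints <lineitem><item>AB Allen &amp; Bradley</item><item>Tom &amp; Jerry</item></lineitem>
```

<p>Unlike CDATA, this changes the raw text rather than the structure, and an undeclared entity such as &nbsp; is left alone and will still fail to parse.</p>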
<p> </p>
<p>Check out the DevShare I recently wrote about parsing XML from an RSS feed with E4X: <a data-ipb='nomediaparse' href='http://developer.actuate.com/community/forum/index.php?/files/file/1122-parsing-xml-in-birt-with-e4x/'>http://developer.actuate.com/community/forum/index.php?/files/file/1122-parsing-xml-in-birt-with-e4x/</a></p>
<p> </p>
<p>After the report design has read in the XML feed, you can use a regex search and replace to add the CDATA sections. This will also keep your whitespace intact.</p>
<p> </p>
<p>This example takes a simple XML string like yours and "fixes" it. The key part is this line; everything else is variable initialization and debug statements:</p>
<pre class="_prettyXprint">
myXML2 = myXML2.replace(/\<item\>/g, "<item><![CDATA[").replace(/\<\/item\>/g, "]]></item>");</pre>
<p>Demo:</p>
<pre class="_prettyXprint">
//The logger only works in commercial BIRT and shows its output in Eclipse's Error Log in the IDE
logger = java.util.logging.Logger.getLogger("birt.report.logger");
myXML = "<feed><lineitem><item><![CDATA[AB Allen & Bradley]]></item><item><![CDATA[CW Clement & Wong]]></item></lineitem></feed>";
myXML2 = "<feed><lineitem><item>AB Allen & Bradley</item><item>CW Clement & Wong</item></lineitem></feed>";
//& is an invalid character without CDATA
//
//If you don't search/replace, you will get the following error:
// TypeError: The entity name must immediately follow the '&' in the entity reference. (/report/method[@name="beforeFactory"]#14)
//
myXML2 = myXML2.replace(/\<item\>/g, "<item><![CDATA[").replace(/\<\/item\>/g, "]]></item>");
logger.warning (myXML); // Show the stream that was already well-formed (CDATA in the source)
logger.warning (myXML2); // Show the stream after the CDATA search/replace
rss = new XML(myXML); // Convert to XML literal using E4X
rss2 = new XML (myXML2); // Convert to XML literal using E4X
totalItems = rss.lineitem.item.length(); // Easy E4X dot notation
totalItems2 = rss2.lineitem.item.length();
logger.warning (totalItems); // Shows 2 <item>
logger.warning (totalItems2);
logger.warning (rss2.lineitem.item[0]); // Shows each <item> element
logger.warning (rss2.lineitem.item[1]);
</pre>
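<p>The same search and replace can be packaged as a standalone function that runs in any JavaScript engine, with or without E4X. A sketch under stated assumptions: the target elements have no attributes, are not nested, and are not already wrapped in CDATA. wrapInCdata and escapeRegExp are hypothetical helper names. It also splits any literal "]]>" in the content, which would otherwise end the CDATA section early.</p>

```javascript
// Sketch: wrap the text of every <tagName>...</tagName> in a CDATA section.
// Assumes no attributes, no nesting, and no existing CDATA in those elements.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function wrapInCdata(xml, tagName) {
  var t = escapeRegExp(tagName);
  var element = new RegExp("<" + t + ">([\\s\\S]*?)</" + t + ">", "g");
  return xml.replace(element, function (match, content) {
    // "]]>" would close the section early, so split it across two sections.
    var safe = content.split("]]>").join("]]]]><![CDATA[>");
    return "<" + tagName + "><![CDATA[" + safe + "]]></" + tagName + ">";
  });
}

var raw = "<feed><lineitem><item>AB Allen & Bradley</item><item>CW Clement & Wong</item></lineitem></feed>";
console.log(wrapInCdata(raw, "item"));
// prints <feed><lineitem><item><![CDATA[AB Allen & Bradley]]></item><item><![CDATA[CW Clement & Wong]]></item></lineitem></feed>
```

<p>Using a replacement function rather than a replacement string means a "$" inside the feed text is never interpreted as a special replacement pattern.</p>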
warwick.baker@gmail.com
<p>Thanks Clement, this is great help, I'll give it a try and see how I go.</p>
<p> </p>
<p>Regards,</p>
<p>Warwick Baker.</p>
warwick.baker@gmail.com
<p>Hello Clement,</p>
<p> </p>
<p>I tried wrapping things in a bit of Java code as you suggested. For example, after manipulating the input XML feed, I ended up with:</p>
<p> </p>
<div>
<pre class="_prettyXprint">
<reportbody>
<lineitem><item><![CDATA[Brand Description]]></item></lineitem>
<lineitem><item><![CDATA[
]]></item></lineitem>
<lineitem><item><![CDATA[AB Pickle & Smitherns]]></item></lineitem>
<lineitem><item><![CDATA[HANS Hansen]]></item></lineitem>
<lineitem><item><![CDATA[ 17 Brand Codes Listed.]]></item></lineitem>
</reportbody>
</pre>
<p>However, when I feed this into the report engine, the & is handled fine, but the leading whitespace on the last line is not preserved in the generated report. Am I missing something?</p>
<p> </p>
<p>Thanks,</p>
<p>Warwick Baker.</p>
</div>
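<p>For comparison, the manual wrapping described above could be sketched in JavaScript as follows. The helper names toCdata and buildReportBody are hypothetical, and the output format simply mirrors the reportbody listing. The point of the sketch is that the leading space is still present in the generated XML text, so if it is missing from the report, it is presumably being dropped later, either when the XML is parsed or when the report item renders the value.</p>

```javascript
// Hypothetical helpers mirroring the manual wrapping described above:
// each value becomes <lineitem><item><![CDATA[value]]></item></lineitem>.
function toCdata(text) {
  // Split any "]]>" so it cannot close the section early.
  return "<![CDATA[" + String(text).split("]]>").join("]]]]><![CDATA[>") + "]]>";
}

function buildReportBody(values) {
  var rows = values.map(function (v) {
    return "<lineitem><item>" + toCdata(v) + "</item></lineitem>";
  });
  return "<reportbody>\n" + rows.join("\n") + "\n</reportbody>";
}

var xml = buildReportBody(["AB Pickle & Smitherns", " 17 Brand Codes Listed."]);
console.log(xml); // the leading space before "17" is kept verbatim
```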
Clement Wong
<p>Two things are needed to keep the leading whitespace:</p>
<p> </p>
<p>E4X setting: <em>add this before <span style="font-family:'courier new', courier, monospace;">rss = new XML(myXML);</span></em></p>
<pre class="_prettyXprint">
XML.ignoreWhitespace = false;</pre>
<p>BIRT setting for the text report item:</p>
<p> </p>
<p style="margin-left:40px;"><em>Properties > Format String > Custom > Custom settings > Preserve white spaces</em></p>
<p> </p>
<p> </p>
<p>See attached for an example.</p>
warwick.baker@gmail.com
<p>Thanks Clement. I'm staying away from the E4X approach and doing the tweaking manually in Java code before passing the XML to the BIRT engine.</p>
<p> </p>
<p>I'll keep mucking about with it ...</p>