TS 16.4.1: Can I turn off Tika XML Parsing in SOLR?

I confirmed that I can still search for content in a file that was flagged by one of these errors - I found another file that had clearer content outside of its HTML attributes to search on and it returned the correct result. However, I still want to turn off the Tika Parsing.

UPDATE: I lied - I was searching against the wrong file. When I attempted to search for keywords/phrases in the file that the parser flagged as malformed, I could not get search to return that file.

Hi there, this is a bit too technical for me, so just checking in first - did you resolve this with your update to the second post? BTW, thanks for upgrading to the latest version of TeamSite!

Hi Jacqui. No, I did not resolve this. It’s a real concern for us as far as continuing with TeamSite. I received confirmation from our Support Engineer that if the Tika XML Parser fails a file because it thinks the file is malformed, the contents of that file will not be indexed. I had a few hundred files that failed the Parser. However, those files are formatted exactly how we want to format them. We can’t be expected to change our content to please the Parser.

I’m told there is no way to turn the Parser off, which is very disappointing. It’s not the Search Indexer’s job to parse XML. Just index the files. I’m awaiting further analysis from Support, but this could be a showstopper for us.

Yeah, you are pretty much correct. The full text search is pretty much useless as implemented.

I certainly don't get it. The previous search was marginal at best but worked better than this implementation.

Agreed - I didn't think I'd ever say "IDOL was better than this" but it's true at this point. I am getting all kinds of strange results when I attempt to search on Extended Attributes as well. It doesn't seem to be integrated properly or something.

I will rant some more on the use of this silly parser during Indexing. We utilize Server Side Includes in our HTML fragments that we produce from many of our templates via PTs. The parser doesn't like Server Side Includes. We're certainly not going to change THAT! How can OpenText expect their customers to create content to appease their Indexer because it is parsing the content prior to Indexing? That's utterly ridiculous. I'll repeat my earlier comment: It is not the Indexer's job to parse my content. Leave that to me. The Indexer just needs to index my content.
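For illustration only, the behavior being asked for above (index the text even when a strict XML parse rejects the file) can be sketched in plain Python. This is not TeamSite, Solr, or Tika code, and it says nothing about how the shipped indexer is actually wired. The names `extract_text` and `_TextCollector` are hypothetical, and the sample fragment is made up: it has an SSI directive, two top-level elements, and an unclosed `<br>`, which is enough for a strict XML parser to call it malformed.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text nodes; tags and comments (SSI directives included) are ignored."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Keep only non-empty text so whitespace between tags is dropped.
        if data.strip():
            self.chunks.append(data.strip())


def extract_text(content):
    """Return (mode, text): try a strict XML parse, fall back to lenient extraction."""
    try:
        root = ET.fromstring(content)
        text = " ".join(t.strip() for t in root.itertext() if t.strip())
        return "xml", text
    except ET.ParseError:
        # The strict parser rejected the content; still recover the visible text.
        collector = _TextCollector()
        collector.feed(content)
        collector.close()
        return "fallback", " ".join(collector.chunks)


# Hypothetical HTML fragment of the kind a template might emit.
fragment = (
    '<!--#include virtual="/inc/header.html" -->\n'
    "<h2>Store hours</h2>\n"
    "<p>Open daily<br>9 to 5</p>"
)
mode, text = extract_text(fragment)
print(mode, "|", text)
```

With a fallback like this, a fragment that fails the strict parse would still yield searchable text instead of being skipped.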

First, thank you for bringing this to our attention! I will forward all of your comments to the technical experts and see what I can find out for you.

@David Smith said:
Agreed - I didn't think I'd ever say "IDOL was better than this" but it's true at this point. I am getting all kinds of strange results when I attempt to search on Extended Attributes as well. It doesn't seem to be integrated properly or something.

And the sad thing is that the parser really has little to do with IDOL/SOLR. There are so many general-purpose string parsers out there. That being said, IDOL had KeyView, which was a performance hog but functioned pretty well. Tika is well known and supposedly can detect and extract metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).

So there is no reason that we should have issues except that Tika is not being used properly.

@Jacqui_N said:
First, thank you for bringing this to our attention! I will forward all of your comments to the technical experts and see what I can find out for you.

I certainly hope none of this is a surprise, as both Smitty and I have been working with support on issues like this for months now.

I'm the marketer, so I don't get technical inquiries, but I'm always willing to help you find an answer. I just found out about this forum yesterday! I am sure it's not a surprise to those "in the know". I reached out to engineering earlier this morning... just waiting on a response.

Hi there! I heard back from the engineer and he said, "Support is the right channel they should approach for technical assistance." I saw that you have been working with support on this. Is there anything further I can do to help?

Well, reportedly there is a patch (Smitty received it), so we shall see if it works.

I will bring this forward!

I received a patch this morning and am currently re-indexing my branch to find out if it works. Fingers crossed.

Are there emojis on this forum? I want to show the fingers crossed emoji....picture it, if you will.

There aren't many formatting tools or emojis on this forum, Jacqui. I appreciate your help!

The patch I received appears to be working well. I'm a much happier camper.