How can we retrieve the dftxml file?

lementree Member
edited August 15, 2023 in Documentum #1

Hi,

When we upload a document to Content Server, it is converted to a dftxml file for indexing. We need to get the dftxml file of the document so that we can get the text content of PDF, Word, and similar documents.

Could you please help us get this done?

Regards,

Sagar

Comments

  • gvicari, Community Moderator
    edited August 10, 2023 #2

    Hi @lementree, sorry for the late response. We are submitting this to the team and will get back to you as soon as possible.

  • Karen Weir, Community Administrator

    @lementree, I have moved your question to our Documentum community area so you have access to the experts.

  • The index agent is responsible for generating the dftxml. You can retrieve it via the xPlore Admin console, but I'm not sure what you plan to do with it. Other than for troubleshooting, it will not help you much.

  • Hi,

    We are trying to integrate with OpenAI. For OpenAI we need to extract the text from PDF/Word files and submit it. Since the extracted text is already in the dftxml files, we would like to use that instead of extracting it again. We would also like to show results based on permissions. Is there an API to which we can pass a list of object_ids and a username, so that we can filter out the documents the user does not have permission to see?

    Regards,

  • Hi @lementree,

    Did you find a way to extract text from dftxml files? We have the same use case, so please let us know if you found a solution.

    Thanks,

    Karthik

  • As I wrote, this information is stored in xPlore, more specifically in the xDB database, so you would have to use the xDB API to retrieve it. ACL information is also replicated into xDB, so you can get it from there as well.

  • Karthik S, Member
    edited November 12 #8

    @Hicham Bahi, our use case is in Content Server, not in Documentum. I wanted to check whether @lementree was able to extract content from dftxml files, since CS and Documentum follow the same process for indexing.

  • Michael McCollough, Community Moderator

    In the xPlore Admin and Developer guide you will find:
    OpenText™ Documentum™ xPlore: Administration and Development Guide (EDCSRC220100-AGD-EN-02)

    This is the appendix on the dftxml. It shows how to get the dftxml via the admin console. However, there are also developer libraries that should allow you to do this lookup programmatically.

    As of Documentum 24.4 (about to be released end of November/beginning of December), xPlore is supported only for the duration of the migration to Documentum Search. At one time there were some more advanced articles that explained how to tie into the index agent to do custom indexing/formatting, which (if you can find them) may give some insight into a more optimal approach. You might also check the xPlore distribution sub-directories; it seems there was a Tika example there that could give you some insight as well.

    Sorry I don't have much more to give on this, but I hope my tips help you find more and find a path forward.

  • Michael McCollough, Community Moderator

    Just caught your comment about using OT CS vs. DCTM CS here. Text extraction is generally very lightweight and takes very little time/overhead, as it does not have to process the complexities of the document formats. The difference in time saved here may not be worth the effort (not saying it isn't, I just have a feeling reusing the extracted text is not going to save you much). Most of the time spent in indexing comes after text extraction, in the linguistic analysis, tokenizing, lemmatization, etc. that make full-text indexing so powerful.

    Assuming you are migrating from one to the other, sure they may use similar approaches but they are different technologies/approaches. There are options in text extraction that may be used by one but not the other. I have not delved into OT CS extraction/index options myself. I know we tend to have more capability around ("this" and "that") and ("here" or "there") advanced queries. Good luck to you and I hope this is all helpful.
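
For anyone landing on this thread with the same question: once a dftxml file has been exported (for example from the xPlore Admin console, as described above), pulling the text out of it is ordinary XML parsing. The sketch below is only an illustration, not the xDB API. It assumes the extracted text sits in elements named `dmftcontentref`, which should be checked against the dftxml appendix in the guide and against a real export; the helper names `local_name` and `extract_text` and the sample document are made up for this example.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def extract_text(dftxml_string, content_tag="dmftcontentref"):
    """Collect the text of every element whose local name is content_tag.

    content_tag defaults to 'dmftcontentref', which is an assumption about
    the dftxml layout; pass a different name if your export differs.
    """
    root = ET.fromstring(dftxml_string)
    chunks = []
    for elem in root.iter():
        if isinstance(elem.tag, str) and local_name(elem.tag) == content_tag:
            # itertext() also picks up text nested in child elements.
            text = "".join(elem.itertext()).strip()
            if text:
                chunks.append(text)
    return "\n".join(chunks)


# Made-up sample for illustration only; a real dftxml carries far more metadata.
SAMPLE = """<dmftdoc>
  <dmftmetadata>
    <dm_document><object_name>demo.pdf</object_name></dm_document>
  </dmftmetadata>
  <dmftcontents>
    <dmftcontent>
      <dmftcontentref>Hello from the PDF body.</dmftcontentref>
    </dmftcontent>
  </dmftcontents>
</dmftdoc>"""

if __name__ == "__main__":
    print(extract_text(SAMPLE))
```

Metadata such as `object_name` is deliberately ignored here; only the content elements are collected, since that is the part that would be sent on for further processing.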