Hello,
Wanted to see if others have had to deal with this type of issue.
We are working on a DFC program to apply relationships between documents and another object type. This program will have to process every document in all of our repositories. One repository in particular has 3-4 million documents, so this process needs to be efficient and needs to run in batch mode - where it will process a set number of docs, stop, then run again on a new set of documents.
The one issue I see is how the program will batch up the object ids to process and how to avoid getting the same set of documents in the query. So basically, the program will do the following:
1. Get fixed set of documents to process for a particular run
2. Create the relation object linking the child and parent ids (see the sketch after this list).
3. Complete processing the fixed set and get the next set of documents that have not been processed
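For step 2, I'm picturing something like the sketch below. This is just a minimal example, not our actual code - the relation_name value "custom_doc_relation" is a placeholder for whatever name we end up registering, and I'm assuming a plain dm_relation object rather than a custom subtype:

import com.documentum.fc.client.IDfPersistentObject;
import com.documentum.fc.client.IDfSession;
import com.documentum.fc.common.DfException;
import com.documentum.fc.common.DfId;

public class RelationHelper {
    // Creates a dm_relation object tying a parent object to a child object.
    // "custom_doc_relation" is a placeholder relation_name for illustration.
    public static void createRelation(IDfSession session,
                                      String parentId,
                                      String childId) throws DfException {
        IDfPersistentObject rel = session.newObject("dm_relation");
        rel.setString("relation_name", "custom_doc_relation");
        rel.setId("parent_id", new DfId(parentId));
        rel.setId("child_id", new DfId(childId));
        rel.save();
    }
}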
I've been considering using the r_object_id to order the document list. So I would order the documents by r_object_id and only get the first N documents, then store the last r_object_id processed. On the next run, get the next N documents where r_object_id > the last r_object_id processed.
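The per-batch query would look something like this sketch. The batch size of 500 and the starting id of all zeros are just assumptions, and I know the RETURN_TOP hint behaves differently depending on the underlying database, so take that part with a grain of salt:

import com.documentum.fc.client.DfQuery;
import com.documentum.fc.client.IDfCollection;
import com.documentum.fc.client.IDfQuery;
import com.documentum.fc.client.IDfSession;
import com.documentum.fc.common.DfException;

public class BatchFetcher {
    private static final int BATCH_SIZE = 500; // assumed batch size

    // Processes one batch and returns the last r_object_id seen, so the
    // next run can resume from there. Returns null when no rows come back,
    // i.e. all documents have been processed.
    public static String processBatch(IDfSession session, String lastId)
            throws DfException {
        String dql = "SELECT r_object_id FROM dm_document"
                + " WHERE r_object_id > '" + lastId + "'"
                + " ORDER BY r_object_id"
                + " ENABLE (RETURN_TOP " + BATCH_SIZE + ")";
        IDfQuery query = new DfQuery();
        query.setDQL(dql);
        IDfCollection results = query.execute(session, IDfQuery.DF_READ_QUERY);
        String newLastId = null;
        try {
            while (results.next()) {
                newLastId = results.getString("r_object_id");
                // ... create the relation for this document here ...
            }
        } finally {
            results.close(); // always release the collection
        }
        return newLastId;
    }
}

The first run would pass in a lastId of "0000000000000000". My thinking is that since r_object_ids are fixed-length hex strings, the string comparison and ORDER BY should give a stable, repeatable ordering across runs.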
Has anyone else used the r_object_id in this way? Will using the r_object_id burn us in any way?
Any help would be appreciated.