Moving 83 Million Documents from Various File Systems to Documentum

DocumentumNewbie
edited July 3, 2014 in Documentum #1

We are planning to move 83 million documents (7 TB in size) from various file systems into Documentum as a single object type. The metadata for these files is stored in a legacy SQL Server database.

We are planning to write Java/DFC code to pick the files up from the file system and read the metadata from the legacy database.
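Roughly, the loop we have in mind looks like the sketch below, before adding batching, multi-threading and restart logic. The object type, attribute names, folder path, table/column names and credentials are placeholders, not our real schema:

```java
import com.documentum.com.DfClientX;
import com.documentum.fc.client.IDfSession;
import com.documentum.fc.client.IDfSessionManager;
import com.documentum.fc.client.IDfSysObject;
import com.documentum.fc.common.DfLoginInfo;
import com.documentum.fc.common.IDfLoginInfo;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class LegacyImportSketch {

    public static void main(String[] args) throws Exception {
        // 1. Documentum session (placeholder repository name and credentials)
        IDfLoginInfo login = new DfLoginInfo();
        login.setUser("dmadmin");
        login.setPassword("secret");
        IDfSessionManager sessionManager = new DfClientX().getLocalClient().newSessionManager();
        sessionManager.setIdentity("target_repo", login);
        IDfSession session = sessionManager.getSession("target_repo");

        try (Connection sql = DriverManager.getConnection(
                     "jdbc:sqlserver://legacyhost;databaseName=legacy_db", "user", "pwd");
             PreparedStatement ps = sql.prepareStatement(
                     "SELECT doc_id, title, file_path FROM legacy_docs")) {

            // 2. Read the legacy metadata row by row (placeholder table and columns)
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // 3. One new repository object per legacy row
                    IDfSysObject doc = (IDfSysObject) session.newObject("my_legacy_doc");
                    doc.setObjectName(rs.getString("title"));
                    doc.setString("legacy_id", rs.getString("doc_id")); // custom attribute
                    doc.setContentType("pdf");                   // real format per file type needed
                    doc.setFile(rs.getString("file_path"));      // content file from the file system
                    doc.link("/Legacy Import");                  // target folder
                    doc.save();
                }
            }
        } finally {
            sessionManager.release(session);
        }
    }
}
```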

My questions, based on your experience:

  • Roughly how long will it take to load 83 million documents, assuming SQL Server is indexed and reads are fast?
  • Any best practices, catches, or things to watch out for?
  • Any alternative tools or other approaches?

Comments

  • Koen_Verheyen
    Koen_Verheyen Member
    edited June 30, 2014 #2

    If you plan to traverse the folders to look for the documents, and you happen upon a directory with a lot of files, then I'd suggest using a shell command instead of the File class to get a listing of files.

    It will speed up the process considerably.

    This migration might take some time, so it's best to make the load logic restartable (in case of issues) and give it the ability to stop and start.
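    A minimal sketch of that idea in Java, assuming a Unix host (the directory path is just an example; on Windows you would swap in a different command):

    ```java
    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    public class ShellDirListing {

        public static void main(String[] args) throws Exception {
            // Stream the listing line by line instead of materialising a huge File[] array
            ProcessBuilder pb = new ProcessBuilder("ls", "-1", "/data/legacy/batch_0001");
            pb.redirectErrorStream(true);
            Process proc = pb.start();

            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(proc.getInputStream()))) {
                String name;
                while ((name = reader.readLine()) != null) {
                    // hand each file name to the import / checkpoint logic here
                    System.out.println(name);
                }
            }
            proc.waitFor();
        }
    }
    ```

    On Java 7+, Files.newDirectoryStream gives similar streaming behaviour without shelling out, if you prefer to stay in the JDK.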

  • EDMS98
    EDMS98 Member
    edited July 2, 2014 #3

    You might want to research external stores. With an external store you can leave the legacy files on their existing file system and still create the new objects in the repository. Then, later, you can move the content to one of the internal file stores in batches using the MIGRATE_CONTENT administrative method (a rough sketch follows at the bottom of this post). Essentially this lets you do the migration in two stages: (1) create the objects in the target repository, and (2) move the actual files in batches.

    But if you do elect to create the objects and move the content all in one step, my guess would be that you'll be able to process 5 to 10 documents per second per thread/process. Please note that this is just a guess!
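    For stage (2), the batch move could look something like the sketch below, run here through DFC. The store names and sizes are made up, and the exact MIGRATE_CONTENT parameter list should be double-checked against the DQL/administration reference for your Content Server version before running anything:

    ```java
    import com.documentum.fc.client.DfQuery;
    import com.documentum.fc.client.IDfCollection;
    import com.documentum.fc.client.IDfQuery;
    import com.documentum.fc.client.IDfSession;

    public class MigrateContentBatch {

        public static void runBatch(IDfSession session) throws Exception {
            IDfQuery query = new DfQuery();
            query.setDQL(
                "EXECUTE migrate_content WITH " +
                "source_store = 'legacy_external_store', " +   // assumed external store name
                "target_store = 'filestore_01', " +            // assumed internal store name
                "batch_size = 500, " +                         // assumed parameter -- verify
                "max_migrate_count = 10000");                  // assumed parameter -- verify

            IDfCollection results = query.execute(session, IDfQuery.DF_EXEC_QUERY);
            int rows = 0;
            try {
                while (results.next()) {
                    rows++;   // detailed output is best checked in the server log
                }
            } finally {
                results.close();
            }
            System.out.println("migrate_content returned " + rows + " row(s)");
        }
    }
    ```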

  • rstein
    rstein Member
    edited July 2, 2014 #4

    Hi, you may also want to take a look at the benefits of High Volume Server and whether it could help you here.

    If you take the external store approach, be aware that versioning objects whose content sits in the external store may fail, because by default Documentum tries to store the new version's content in the same store. In my experience I had to move the content of the most recent versions to a regular filestore first (there's a quick check sketched at the bottom of this post).

    You should pay special attention to the size of the Documentum database and its rate of growth. At this volume you may want to consider database partitioning, and HVS can help there as well.
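    A quick way to see how much content would be affected before you allow new versions, assuming the external store and object type names below (both made up). Without the (ALL) keyword DQL returns only current versions, which is exactly the set that has to move first:

    ```java
    import com.documentum.fc.client.DfQuery;
    import com.documentum.fc.client.IDfCollection;
    import com.documentum.fc.client.IDfQuery;
    import com.documentum.fc.client.IDfSession;

    public class ExternalStoreCheck {

        public static void countCurrentInExternalStore(IDfSession session) throws Exception {
            IDfQuery query = new DfQuery();
            query.setDQL("SELECT count(*) AS cnt FROM my_legacy_doc "
                    + "WHERE a_storage_type = 'legacy_external_store'");

            IDfCollection results = query.execute(session, IDfQuery.DF_READ_QUERY);
            try {
                if (results.next()) {
                    System.out.println("Current versions still in the external store: "
                            + results.getString("cnt"));
                }
            } finally {
                results.close();
            }
        }
    }
    ```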

  • Benoit_MITTAU
    Benoit_MITTAU Member
    edited July 2, 2014 #5

    Hi

    For such a volume of data, I don't think anyone can give you even a rough idea of how long this migration will take. There are too many variables: how the data is organized in the source DB, how it will be stored in the target repository, the average/max size of the content files, network and disk I/O, Documentum tuning, the migration tool used, and so on.

    In terms of tools, here is a quick list of the main solutions (from my point of view):

    Benoit

  • EDMS98
    EDMS98 Member
    edited July 3, 2014 #6

    I've seen repositories loaded using AWK scripts, Visual Basic programs, Java programs, Documentum's old Import Manager, and a few third-party utilities. I've never seen anything run faster than 20 objects per second per process or Java thread, and more typical performance would be 10 objects per second. Note that these all used Documentum method calls to do the work of creating the new objects.

    Of course there are a bunch of variables, but in my experience the limiting factor was always the I/O associated with moving the content files, especially if the files are large. One idea is to test how long the file transfer will take by using rsync to copy a batch of files from host to host over the network (assuming the platform is Unix; a rough Java equivalent of that test is sketched in the PS below). If that runs at 20 files per second, it is highly unlikely the migration will run any faster, given the overhead of reading the metadata from the legacy database plus the time it takes to make the Documentum method calls to create the new objects.

    My intention here is just to set some realistic expectations for what a custom process can do. I can't speak to the performance of FME's Migration Center or Documentum's EMA.

    Cheers
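
    PS: If rsync isn't convenient on your platform, a rough Java equivalent of the same timing test is below; the source and target paths are placeholders and a flat sample directory of files is assumed. Whatever files/sec figure it reports is only an upper bound on the real migration rate, since object creation adds overhead on top of the raw copy:

    ```java
    import java.io.IOException;
    import java.nio.file.DirectoryStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardCopyOption;

    public class CopyThroughputTest {

        public static void main(String[] args) throws IOException {
            Path source = Paths.get("/mnt/legacy/sample_batch");   // placeholder
            Path target = Paths.get("/mnt/contentstore/test");     // placeholder
            Files.createDirectories(target);

            long start = System.currentTimeMillis();
            int copied = 0;
            try (DirectoryStream<Path> files = Files.newDirectoryStream(source)) {
                for (Path file : files) {
                    Files.copy(file, target.resolve(file.getFileName()),
                            StandardCopyOption.REPLACE_EXISTING);
                    copied++;
                }
            }
            double seconds = (System.currentTimeMillis() - start) / 1000.0;
            System.out.printf("%d files in %.1f s = %.1f files/sec%n",
                    copied, seconds, copied / seconds);
        }
    }
    ```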