Moving 83 Million Documents from Various File Systems to Documentum
We are planning to move 83 million documents (7 TB total) from various file systems into Documentum as a single object type. The metadata for these files is stored in a legacy SQL Server database.
We plan to write Java/DFC code to pick up the files from the file system and read the metadata from that database.
My questions, based on your experience:
- Roughly how long will it take to load 83 million documents, assuming the SQL Server tables are indexed and reads are fast?
- Any best practices, catches, or things to watch out for?
- Are there alternative tools or approaches worth considering?
Comments
-
If you plan to traverse the folders to look for the documents and you hit a directory with a very large number of files, I'd suggest using a shell command instead of the Java File class to get the listing of files.
It will speed up the process considerably.
A migration of this size will take a long time, so make the logic restartable (in case of issues) and allow it to be stopped and resumed; a rough sketch of both ideas is below.
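For example, something along these lines (a rough sketch only: the /data/legacy path, the checkpoint file name, and the processOne() helper are placeholders, and it assumes a Unix-style find command is available):

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.HashSet;
import java.util.Set;

public class RestartableScan {
    public static void main(String[] args) throws Exception {
        // Checkpoint file listing every path already imported, so a rerun skips them.
        Path checkpoint = Paths.get("processed-files.log");
        Set<String> done = new HashSet<>();
        if (Files.exists(checkpoint)) {
            done.addAll(Files.readAllLines(checkpoint, StandardCharsets.UTF_8));
        }

        // Stream the listing from a shell command instead of File.listFiles(),
        // which materialises the whole directory contents in memory.
        Process p = new ProcessBuilder("find", "/data/legacy", "-type", "f").start();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8));
             BufferedWriter log = Files.newBufferedWriter(checkpoint,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            String path;
            while ((path = reader.readLine()) != null) {
                if (done.contains(path)) {
                    continue; // already imported in a previous run
                }
                processOne(path); // placeholder: look up metadata and import via DFC
                log.write(path);
                log.newLine();
                log.flush(); // a crash then loses at most the file in progress
            }
        }
        p.waitFor();
    }

    private static void processOne(String path) {
        // metadata lookup and DFC import of one file goes here
    }
}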
-
You might want to research external stores. With an external store you can leave the legacy files on their existing file system and still create the new object in the repository. Then later, you can move the content to one of the internal file stores in batches using the MIGRATE_CONTENT administrative method. Essentially this would allow you to do the migration in two stages: (1) create the objects in the target repository and (2) move the actual files in batches.
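For stage (2), MIGRATE_CONTENT is run through DQL. A rough sketch of invoking it from DFC is below; the store names are made up, and the parameter names (target_store, query, max_migrate_count) are from memory, so verify them against the Content Server Administration Guide for your version:

import com.documentum.fc.client.DfQuery;
import com.documentum.fc.client.IDfCollection;
import com.documentum.fc.client.IDfQuery;
import com.documentum.fc.client.IDfSession;

public class MigrateContentBatch {
    // Moves a batch of content from the external store into an internal filestore.
    // Store names and parameter names are illustrative only.
    public static void migrateBatch(IDfSession session) throws Exception {
        IDfQuery query = new DfQuery();
        query.setDQL("EXECUTE migrate_content WITH target_store = 'filestore_01', "
                + "max_migrate_count = 10000, "
                + "query = 'a_storage_type = ''ext_store_01'''");
        IDfCollection results = query.execute(session, IDfQuery.DF_EXEC_QUERY);
        try {
            while (results.next()) {
                // the name of the returned status attribute may differ by version
                System.out.println(results.getString("result"));
            }
        } finally {
            results.close();
        }
    }
}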
But if you do elect to create the object and move the content all in one step, my guess would be that you'll be able to process 5 to 10 documents per second per thread/process. Please note that this is just a guess!
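For reference, the one-step create-and-import for a single document looks roughly like this in DFC (the type name, folder path, format, and attribute names are placeholders, and it assumes you already have an IDfSession):

import com.documentum.fc.client.IDfSession;
import com.documentum.fc.client.IDfSysObject;

public class SingleStepImport {
    // Creates one object of the target type and imports its content file.
    public static void importOne(IDfSession session, String filePath,
                                 String docName, String legacyId) throws Exception {
        IDfSysObject doc = (IDfSysObject) session.newObject("my_custom_type");
        doc.setObjectName(docName);
        doc.setContentType("pdf");             // must match the actual file format
        doc.setString("legacy_id", legacyId);  // example custom attribute from SQL Server
        doc.setFile(filePath);                 // reads the content from the file system
        doc.link("/Legacy Import/Batch 001");  // target folder must already exist
        doc.save();
    }
}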
-
Hi, you may also want to take a look at the benefits of High Volume Server and whether it could help you here.
If you take the external store approach, be aware that versioning objects whose content lives in the external store can fail because, by default, Documentum tries to store the new version's content in the same store. In my experience I had to move the most recent version's content to an internal filestore first.
Pay special attention to the size of the Documentum database and its rate of growth. You may want to consider database partitioning, and HVS may also help you there.
-
Hi
For such a volume of data, I don't think anyone can give you even a rough idea of how long this migration will take. There are too many variables: how the data is organized in the source database, how it will be stored in the target repository, the average/maximum size of the content files, network and disk I/O, Documentum tuning, the migration tool used, etc.
In terms of tools, here is a quick list of the main solutions (from my point of view):
- FME Migration Center (http://www.migration-center.com/): probably the most powerful solution for DMS migration, for Documentum but not only (there is a free version if you want to test it first)
- Documentum Enterprise Migration Appliance (EMA): https://www.emc.com/collateral/service-overview/h11784-svo-pdf-emc-migration-appliance.pdf
- TSG OpenMigrate (Technology Services Group | OpenMigrate Product Overview): an open-source configurable framework proposed by Technology Services Group
- Crown Partner Buldoser
- or pure custom development
Benoit
-
I've seen repositories loaded using AWK scripts, Visual Basic programs, Java programs, Documentum's old Import Manager, and a few third party utilities. I've never seen anything run faster than 20 objects per second per process or Java thread, and a more typical performance would be 10 objects per second. Note that these all used Documentum method calls to do the work of creating the new objects.
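To put those rates in context for 83 million documents: at 10 objects per second, a single thread or process needs about 8.3 million seconds, roughly 96 days of continuous running, and even at 20 objects per second it is still about 48 days. Ten parallel loaders averaging 10 objects per second each would bring that down to roughly 10 days of wall-clock time, ignoring contention, so plan on running many threads or processes in parallel.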
Of course there are a bunch of variables, but in my experience the limiting factor was always the I/O associated with moving the content files, especially if the files are large. One idea is to test how long the file transfer will take by using RSYNC to copy a batch of files from host to host over the network (assuming the platform is Unix). If that runs at 20 files per second, it is highly unlikely the migration will run any faster, given the overhead of reading the metadata from the legacy database plus the time it takes to make the Documentum method calls to create the new objects.
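If rsync isn't convenient, a rough Java-only variant of the same timing test is to copy a sample batch of files to a mount near the target storage and measure the rate; the paths and the 1,000-file sample size below are placeholders:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CopyThroughputTest {
    public static void main(String[] args) throws IOException {
        Path source = Paths.get("/data/legacy/sample");    // sample of legacy files
        Path target = Paths.get("/mnt/docbase/test-copy"); // mount near the target filestore
        Files.createDirectories(target);

        // Take a fixed-size sample of regular files from the source tree.
        List<Path> files;
        try (Stream<Path> s = Files.walk(source)) {
            files = s.filter(Files::isRegularFile).limit(1000).collect(Collectors.toList());
        }

        long start = System.nanoTime();
        for (Path f : files) {
            Files.copy(f, target.resolve(f.getFileName()), StandardCopyOption.REPLACE_EXISTING);
        }
        double seconds = (System.nanoTime() - start) / 1_000_000_000.0;
        System.out.printf("Copied %d files in %.1f s (%.1f files/s)%n",
                files.size(), seconds, files.size() / seconds);
    }
}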
My intention here is to try to set some realistic expectations on what to expect with a custom process. I can't speak to the performance of FME's Migration Center or Documentum's EMA.
Cheers