Migration from one S3 object storage to another

Hi,

We're in the process of replacing our existing S3 object storage with a different one. We have a few billion objects in the current S3 bucket and use an Oracle database with Documentum. All our objects are WORM protected and are never modified.

Due to the large number of objects, I was thinking about the following procedure:

1. Create bucket A in the new S3 storage and configure it to be used by Documentum for all future writes.
2. The old S3 bucket B becomes read only; no new writes go to the old S3 storage.
3. Create bucket C in the new S3 storage and copy all data from the old bucket B.
4. Update Documentum/Oracle with the new location information for the migrated objects (bucket B -> bucket C). I'm not sure how to do this; I assume it could be done with a script or something similar.

Is something like this doable? We were told that migrate_content is not really suitable for such a huge number of objects. We need to minimize the potential downtime.

As you probably figured out already, I'm a storage guy and I don't know too much about Documentum yet :)


Comments

  • DCTM_Guru
    edited April 9, 2020
    Since objects stored in S3 are accessed via URL, you should be able to copy the contents to another bucket without issue (this is what the migrate_content job would do). Before you actually do the copy, I would read the Documentum Administrator (DA) guide on how to configure a new S3 store object from within DA. It will ask you for all the relevant information about bucket C. After you copy the objects over to bucket C, you will need to update a_storage_type for all the documents that point to bucket B so that they point to bucket C. Note that this attribute stores the name of the storage object created in DA (not necessarily the name of bucket C). A rough DQL sketch of that update is at the end of this post.

    Curious though, why are you creating a new bucket to begin with? You should be able to change the S3 storage type on the existing bucket.
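
    For the a_storage_type update mentioned above, the DQL would be something along these lines. The store names below are made up, so use whatever your dm_store objects are actually called, and with a few billion documents I would run the update in batches (for example by creation-date range) rather than in one shot. First check the store names and how many documents still point to the old store:

        SELECT name FROM dm_store
        SELECT count(*) FROM dm_sysobject WHERE a_storage_type = 's3_store_b'

    Then flip them over to the new store, adding whatever extra WHERE clause you use for batching:

        UPDATE dm_sysobject OBJECTS
          SET a_storage_type = 's3_store_c'
        WHERE a_storage_type = 's3_store_b'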
  • @dipdap's idea is to update the existing dm_s3_store object to point to the new bucket, so there is no need to update a_storage_type for all documents.

    I have never done that with S3 but I have done it with regular filestores. I don't see why it shouldn't work.
  • Good point @bacham3 - if he doesn't need access to the original bucket (B), he can just remap the existing store to bucket C and not update a_storage_type. Typically, what we have done in the past is to leave the existing storage as is and create a new store for new objects.

    I am curious to hear the OP's response to my last question, because this entire process may not be necessary.
  • Thanks for the answers, I'm really grateful for all the tips!

    Our issue is that we're switching to a different storage vendor, so we need to migrate the data between the two storage arrays.

    The issue is that we're receiving a few million new documents each day into our existing bucket, and that data needs to stay accessible for reads during the migration. That's basically why I started working on the idea of using buckets A, B and C in this migration.

    So the process would be something like this (I added some things based on your feedback):

    1. First, create a new dm_s3_store object pointing to the new bucket A in the new S3 array. All new objects will be written to the new storage array and bucket A.
    2. Then I would have plenty of time to copy all of the existing data from the old bucket B to the new bucket C in the new S3 array without additional downtime. During this migration Documentum would still read the existing data from the old S3 bucket B. My main reason for using two different buckets in the new array is the WORM protection we're using: if I used only one new bucket and something went wrong during the migration, we could end up with corrupted objects in our environment which we cannot delete. Using two buckets would allow us to simply fix the migration error and continue (WORM would be enabled only after a successful migration).
    3. Now I have all the data in the new S3 array, but Documentum does not know about the objects in bucket C. In this step we would need to point a new dm_s3_store to bucket C and modify a_storage_type for all the objects that were migrated outside Documentum. After this step we could decommission the old S3 array.

    How to efficiently modify a_storage_type for a few billion objects is still a bit of unknown territory for us.

    I hope that explains it a bit better, and thanks again for the feedback :)

  • In step 3, instead of creating a new dm_s3_store pointing to bucket C, you simply update the dm_s3_store object which points to bucket B and make it point to bucket C. That's only one object to update. I would restart the docbase immediately after to be on the safe side.
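    I do not have a dm_s3_store dump in front of me, so treat the attribute name below as a placeholder and take the real one from the dump output, but the IAPI session would look roughly like this:

        retrieve,c,dm_s3_store where name = 's3_store_b'
        dump,c,l
        set,c,l,<attribute holding the bucket/endpoint, as shown by the dump>
        <value for bucket C on the new array>
        save,c,l

    Then restart the docbase as mentioned. Afterwards I would take a handful of documents from that store and re-fetch their content to confirm it is now served out of bucket C, e.g.

        SELECT r_object_id, object_name, a_storage_type FROM dm_document WHERE a_storage_type = 's3_store_b' ENABLE (RETURN_TOP 5)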
  • That would definitely make this migration much easier and more manageable! I will need to test this next week. Thanks for the advice :smile:
  • You can do this migration directly, quickly, and efficiently with migration tools like ShareGate and Gs Richcopy 360; both can save you time and effort.

    On the other hand, in this case, your proposed procedure is doable and can be an effective way to migrate your data to the new S3 storage with minimal downtime. Here are some additional details and considerations to keep in mind:

    1. When creating bucket A, make sure to configure the necessary permissions and access policies to allow Documentum to write to it. You should also consider setting up versioning and lifecycle policies to manage the data in the bucket over time.
    2. When creating bucket C, you can use the S3 Transfer Acceleration feature to speed up the data transfer from the old bucket to the new one. You can also use the AWS CLI or a third-party tool like CloudBerry Explorer to perform the copy operation (a rough sketch of a cross-endpoint copy is at the end of this comment).
    3. To update the location information for the migrated objects in Documentum/Oracle, you will need to run a script or query that updates the relevant records in the database. The exact steps will depend on the specific configuration of your Documentum system and the schema of your Oracle database. You may want to consult with a Documentum expert or DBA to ensure that the updates are performed correctly.
    4. To minimize downtime, you can perform the migration in stages or batches, rather than copying all the data at once. For example, you can migrate data for a particular set of users or applications first, and then gradually migrate the remaining data over time. This can help you avoid any potential performance or availability issues during the migration process.
    5. You should also consider testing the migration process in a non-production environment before performing it on your live data. This can help you identify any potential issues or challenges and refine your procedures accordingly.
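
    On point 2: as far as I know a single aws s3 sync invocation only talks to one endpoint, and here the source and target are two different S3 arrays, so any copy has to read from one endpoint and write to the other. Below is a minimal single-threaded boto3 sketch of that pattern; the endpoints, credentials and bucket names are placeholders, and for a few billion objects you would parallelise it per key prefix, add retries, and verify sizes or checksums before enabling WORM on bucket C. Tools such as rclone implement the same pattern with parallelism and retries built in.

        # Rough sketch only: endpoints, keys and bucket names are placeholders.
        import boto3

        src = boto3.client(
            "s3",
            endpoint_url="https://old-array.example.com",   # old vendor's S3 endpoint (placeholder)
            aws_access_key_id="OLD_KEY",
            aws_secret_access_key="OLD_SECRET",
        )
        dst = boto3.client(
            "s3",
            endpoint_url="https://new-array.example.com",   # new vendor's S3 endpoint (placeholder)
            aws_access_key_id="NEW_KEY",
            aws_secret_access_key="NEW_SECRET",
        )

        SRC_BUCKET = "bucket-b"   # old, read-only bucket
        DST_BUCKET = "bucket-c"   # new bucket; WORM stays off until the copy is verified

        # List the source bucket page by page; a Prefix argument can be added
        # to paginate() to split the work into independent batches.
        paginator = src.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=SRC_BUCKET):
            for obj in page.get("Contents", []):
                key = obj["Key"]
                body = src.get_object(Bucket=SRC_BUCKET, Key=key)["Body"]
                # upload_fileobj streams the body, so large objects are not
                # buffered fully in memory.
                dst.upload_fileobj(body, DST_BUCKET, key)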