Bulk Rename of Folders & Documents

aflowers001
edited February 3, 2009 in Documentum #1

We need to rename a large set of folders within a repository so that we can pass it to a third party for development purposes without them being able to work out any confidential information.

For the documents we're OK doing that via direct Oracle SQL calls, but for the folders the r_folder_path recalculation looks a little tricky. Has anyone done anything like this in the past and, if so, are they willing to share any hints & tips?

We're looking to do it directly via Oracle as the number of folders to be updated is somewhat large.

cheers,

Comments

  • sbickle
    edited January 15, 2009 #2

    You really should avoid mass renaming of folder objects at the database level. The simplest approach would be to write an API script to rename the folders; the API call takes care of all the referential integrity with folder paths and operates relatively quickly. To make it really simple, you could create a list of folder object IDs using a DQL query, then use a decent text editor to create a script with the following lines for each ID:

    fetch,c,<id>
    set,c,l,object_name
    Folder-<id>
    save,c,l

    This would name each folder with 'Folder-' followed by the string representing the folder ID.
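
As a rough sketch of the text-editor step above, the script could also be generated programmatically. The function name `build_rename_script` and the sample IDs are made up for illustration; the four script lines per folder are the ones quoted in the post.

```python
def build_rename_script(folder_ids, prefix="Folder-"):
    """Return API script text that renames each folder to prefix + its ID.

    Emits, per folder, the four lines quoted in the post:
    fetch, set object_name, the new name, save.
    """
    lines = []
    for folder_id in folder_ids:
        lines.append(f"fetch,c,{folder_id}")
        lines.append("set,c,l,object_name")
        lines.append(f"{prefix}{folder_id}")  # value line read by the set command
        lines.append("save,c,l")
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Hypothetical object IDs; in practice these would come from a DQL query.
    sample_ids = ["0b0000018000a001", "0b0000018000a002"]
    print(build_rename_script(sample_ids), end="")
```

The output could be written to a file and fed to the API tool in batches.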

    Steve

  • aflowers001
    edited January 15, 2009 #3

    Hi,

    Thanks for this. We've already looked into doing it via the API but given the number of folders [let's just say way too many] it would take too long.

    We are after something that will run in less than a business day and we have a feeling that running via DQL/API it would take many days, if not weeks.

    We have come up with a basic plan for it and have now engaged with our RDBMS team and if we get something useful I will feed it back here.

  • sbickle
    edited January 15, 2009 #4

    Any SQL you craft to make the change will, as you say, have to address all the changes to all the related r_folder_path entries. First you will need a query to determine the entries to be changed, then it will require a number of separate updates bound by a transaction to make each change. You may well be able to do this within a piece of custom code running directly against the database.

    It would be simpler to write a small piece of Java code to run on the content server using the DFC API to achieve the same. The underlying queries run by DFC to achieve a folder name change should be expected to be optimised to run efficiently. What does slow down the API is all the ACL security checking. If you run your code on the content server as the docbase installer, or another superuser, fewer checks are carried out and the code will run much faster. So it may be worth considering giving this a try to see if you get a workable speed.

  • aflowers001
    edited January 15, 2009 #5

    We looked at all DQL/DFC based ways but the number of folders to be renamed is in the hundreds of thousands, actually over half a million, and there are other folders in between and below these that will need r_folder_path updated which adds up to a horrific number, and it's a number that DQL/DFC will not be able to cope with in a reasonable time frame.

    cheers,

  • sbickle
    edited January 15, 2009 #6

    I guess that it depends upon what you consider a viable time frame. To change 500k folders you only need to average 6 per second to do it all in under 24 hours. What did you achieve?

  • aflowers001
    edited January 15, 2009 #7

    Is the maths right?

    500K * 6 = 3,000,000 seconds

    3,000,000 / 60 = 50,000 minutes

    50,000 / 60 = 833.33 hrs

    833.33 / 24 = 34.7 days

    We'd have to do each folder in .17 seconds assuming only 500k folders and 1 r_folder_path per folder.

    As it is we have over 3 million folders in total and over 3 million distinct r_folder_path entries.

    These numbers make DQL/DFC a non-starter no matter where we run it.

  • sbickle
    edited January 15, 2009 #8

    This was my maths:

    24 * 60 * 60 = 86400 seconds per day

    500,000 folders in 86,400 seconds = 5.79 per second

    So you would need to do about 6 per second to cover 500k in a day. Which might well be achievable.
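
The arithmetic above can be checked, and rerun for other folder counts, with a few lines of Python. The helper names `required_rate` and `days_needed` are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def required_rate(folders, days=1.0):
    """Folders per second needed to finish `folders` renames in `days` days."""
    return folders / (days * SECONDS_PER_DAY)


def days_needed(folders, rate_per_second):
    """Days needed to rename `folders` folders at a sustained rate."""
    return folders / rate_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    rate = required_rate(500_000)
    print(f"500k folders in one day: {rate:.2f} per second")
    print(f"time budget per folder: {1 / rate:.2f} seconds")
    print(f"3 million folders at 6 per second: {days_needed(3_000_000, 6):.2f} days")
```

This only models a single sustained rate; it says nothing about how many r_folder_path entries each rename touches.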

    Is it possible that not all the folders need a name change? Most folder structures have their own internal structural folders that do not distinguish anything client specific, such as year or state. Perhaps a more conservative list of folder IDs to be changed could be derived by running some code with business logic against the folder structure, just picking out those that may be sensitive.

  • aflowers001
    edited January 15, 2009 #9

    Apologies on the maths, I misread your post.

    As for the business logic unfortunately there is nothing that can slim the list down. For our industry everything is sensitive, they are names of companies and the like that cannot be revealed to those outside, and even to many inside, the company, and whilst we do have many folders that don't identify much in and of themselves, i.e. child folders, these do identify their parent by the r_folder_path which is why they have to be updated.

    Anyhow, we're pretty close to a solution that looks like it will work.

    thanks.

  • lgrayson
    edited January 15, 2009 #10

    While doing your math, you may also consider breaking the list up and using multiple sessions. There is a trade-off between the number of open sessions and degraded performance; you will find a decent balance at around 4 to 6 sessions, maybe more, since I was importing files, not just renaming objects. This will reduce the total number of days.
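
A minimal sketch of breaking the ID list up for several sessions, assuming each batch would then be turned into its own script and run in a separate session. The function name `split_round_robin` is made up for illustration.

```python
def split_round_robin(items, sessions):
    """Split `items` into `sessions` batches of near-equal size.

    Item i goes to batch i % sessions, so batch sizes differ by at most one.
    """
    if sessions < 1:
        raise ValueError("sessions must be at least 1")
    return [items[i::sessions] for i in range(sessions)]


if __name__ == "__main__":
    folder_ids = [f"id{n:02d}" for n in range(10)]  # placeholder IDs
    for number, batch in enumerate(split_round_robin(folder_ids, 4), start=1):
        print(f"session {number}: {batch}")
```

Round-robin keeps the batches even; whether folders from the same subtree contend with each other when handled by different sessions would need testing.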

  • lgrayson
    edited January 15, 2009 #11

    You will also want to consider that going directly to the database to make these changes can void support from EMC if anything should go wrong later. They can't always help you if you make changes outside the provided DQL/APIs. So taking a little longer now through the APIs may save you headaches later on.

  • Yoni
    edited January 28, 2009 #12

    I must mention that I also discourage direct intervention in the DB, especially when you deal with folders.

    Now regarding your solution: do you have a hierarchical structure of folders? If so, have you considered the case where you change a child folder's name before changing its parent folder's name? In that case the r_folder_path of the child will contain the PREVIOUS name of its parent.
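
To illustrate the ordering concern, here is a small sketch that derives every path from the parent links and the new names, so that a child's path can never contain a stale parent name. It assumes a simplified model in which each folder has exactly one parent (a real repository may link a folder into several places), and the names `rebuild_paths`, `parents` and `new_names` are hypothetical.

```python
def rebuild_paths(parents, new_names):
    """Compute the full path of every folder from its ancestors' NEW names.

    parents:   folder id -> parent folder id, or None for a top-level folder
    new_names: folder id -> new object name
    """
    cache = {}

    def path_of(folder_id):
        # Resolve the parent first so its new name is always used.
        if folder_id not in cache:
            parent = parents[folder_id]
            prefix = "" if parent is None else path_of(parent)
            cache[folder_id] = prefix + "/" + new_names[folder_id]
        return cache[folder_id]

    return {folder_id: path_of(folder_id) for folder_id in parents}


if __name__ == "__main__":
    parents = {"c": "b", "b": "a", "a": None}  # child listed before its parents
    new_names = {fid: f"Folder-{fid}" for fid in parents}
    for fid, path in sorted(rebuild_paths(parents, new_names).items()):
        print(fid, path)
```

Because paths are computed top-down from the new names, the order in which folders appear in the input does not matter.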

    Best of luck in finding a suitable solution ASAP.

    Yoni

  • ldallas
    edited January 28, 2009 #13

    as silly as this sounds - it might be worth timing the following

    perform the process in 4 steps

    1) create the complete new taxonomy first

    2) update statistics - this many new objects might cause perf problems if you don't

    3) move the content - time both the MOVE DQL and successive link,unlink.

    4) delete the empties - consider creating queue items for folders to be deleted to make it easier to ID them and process in batch

    This is a little tricky, but if you are not dealing with multiple versions it eliminates a significant amount of existence checking and blocking and will give you more discrete control of batch sizing. It makes it easier to multi-thread the workload too. The transactional nature of a rename is very expensive. A lot of versions, though, could make this method more difficult than it's worth.

    I would still do this with DQL instead of DFC operations. DFC might result in excessive local caching; you would have to try it to be sure.

  • lgrayson
    edited February 3, 2009 #14

    Lee has a valid point in regards to the local caching.

    For API you want to avoid using fetch.  Fetch pulls down all of the information for the object, and if you are not going to use it all, don't use it.  Since you already have the r_object_id, go ahead and use the id within the API commands like below:

    set,c,<id>,object_name,<New Folder Name>

    save,c,<id>

  • lgrayson
    edited February 3, 2009 #15

    You can also use DQL; however, I would suggest using individual DQL statements, like the individual API statements, to make the changes instead of one DQL statement. The fully encompassing DQL is likely to fail (for various database space, table sync, network, and server issues), and the query will rightly roll back all unsaved data, wasting a ton of time and leaving you very frustrated when you return to find the failure.

    So, since you already fetched the object ids to change, you would want a list of queries like below:

    UPDATE "dm_folder" OBJECTS SET "object_name" = '<New Folder Name>' WHERE "r_object_id" = '<id>'

    go

    You may then use the file created and launch using a batch file, calling idql32 instead of iapi32.
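
A minimal sketch of generating such a file, one UPDATE plus `go` per folder, using the statement form quoted above. The function names are made up, and doubling single quotes inside string literals is an assumption about DQL escaping that should be verified before use.

```python
def dql_quote(value):
    """Wrap a value in single quotes, doubling any embedded single quote.

    Assumption: DQL string literals escape ' by doubling it.
    """
    return "'" + value.replace("'", "''") + "'"


def build_dql_script(renames):
    """Return script text with one UPDATE statement and `go` per folder.

    renames: iterable of (r_object_id, new_name) pairs.
    """
    parts = []
    for object_id, new_name in renames:
        parts.append(
            f'UPDATE "dm_folder" OBJECTS SET "object_name" = {dql_quote(new_name)} '
            f'WHERE "r_object_id" = {dql_quote(object_id)}\n'
            "go\n"
        )
    return "".join(parts)


if __name__ == "__main__":
    # Hypothetical object ID and name, for illustration only.
    print(build_dql_script([("0b0000018000a001", "Folder-0b0000018000a001")]), end="")
```

Keeping each rename in its own statement means a failure part-way through loses only the statement in flight, which matches the reasoning in the post.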

  • Thanks for the hint. This post is from 2009 and it is now 2021; is there a better solution these days, such as a REST API call that can rename folders given a couple of IDs?

    The issue I see is that using REST means we need the object ID in the destination path. That means I could prepare a Node.js application which sends 100 calls, each including the node ID.

    Is there an example ?

    cheers