Clearing old editions and file histories
Bryan_K
Is there a setting in TeamSite that allows automatic deletion of editions or file histories after a certain number accrue? Say delete old editions after 50 have been published and just keep the latest 50? I have read some articles on performance problems with deleting editions and the relatively little disk space saved. If we have clients publishing 200 editions a day, is this effort worth it to save disk space?
Migrateduser
I wrote an edition maintenance script that does precisely what you mentioned - only allows X number of editions per branch (X can be different for different branches) and runs as a trigger script for the iwatpub event. So basically, anytime anyone publishes an edition, it checks to see if more than X editions exist and removes the oldest one(s) to keep the total at X. The script also provides a naming mechanism for people to tag their editions to save no matter what by adding the word "save" to the edition name. Then it will never delete that one. It ended up killing us in TS 4.2.1 because of the way TS handled edition deletions - if you delete too many editions at once, TS barfed. However we're considering turning it back on in 5.5.2 because edition deletion is much more efficient. I can post the script if anyone cares to see it.
As a side note, we have found you definitely can recover vast amounts of disk space by deleting editions.
Dave Smith
Sr. Software Engineer
Nike, Inc.
(503) 671-4238
DavidH.Smith@nike.com
Edited by Smitty77 on 11/21/02 12:23 PM (server time).
Bryan_K
I'd definitely be interested. This was one of the things we were considering implementing in the next few days...
Bryan_K
One question: does the script wait until the number of editions exceeds the limit by some margin and then delete a batch all at once, or does it check each time, effectively deleting 1 edition each time a publish is run in a branch that is already at the maximum #?
Migrateduser
It checks each time an edition is published. So if you have a branch with a limit of 50 editions and there are currently 48, it will do nothing until you publish 3 more editions, then it will delete the oldest one. However, keep in mind that if you set an edition limit on a branch for 50 editions and the branch currently has 200 editions, the next publish event will cause this script to try and delete 150 editions all at once - probably not a good idea. So I would recommend cleaning up your editions manually to a number you want to use as your limit before running this.
To execute this script as a trigger on every publish, edit your .../iw-home/local/iwlocal.cfg file and add the line:
iwatpub root /interwoven/iw-home/local/publish.cgi '$IW_STAGINGAREA'
then you have to start the command triggers by typing:
/etc/init.d/iw.local start
Of course I believe you need root or sudo to do this.
In the publish.cgi script, modify the %maxEditions hash to indicate which branches you want maintained by this script - it will only do this for branches listed in this hash. You'll probably have questions - ask away.
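The script itself is not reproduced in this thread, so the following is only a rough Python sketch of the selection logic described above, not the actual publish.cgi. The branch path, the `MAX_EDITIONS` table (standing in for the %maxEditions hash), the case-insensitive match on "save", and the `max_per_run` cap are all assumptions made for illustration. The cap is an added safeguard against the 200-editions-on-first-run situation and is not something the original script is known to have.

```python
from typing import Dict, List

# Hypothetical per-branch limits, standing in for the %maxEditions hash.
MAX_EDITIONS: Dict[str, int] = {"/default/main/www": 50}


def editions_to_delete(branch: str, editions: List[str],
                       max_per_run: int = 5) -> List[str]:
    """Pick the oldest unprotected editions that put a branch over its limit.

    `editions` is assumed to be ordered oldest first.
    """
    limit = MAX_EDITIONS.get(branch)
    if limit is None:
        # Branch is not listed, so it is not maintained at all.
        return []
    excess = len(editions) - limit
    if excess <= 0:
        return []
    # Editions with "save" in the name are never candidates for deletion.
    candidates = [e for e in editions if "save" not in e.lower()]
    # Cap the batch so a large backlog is worked off over several publishes.
    return candidates[:min(excess, max_per_run)]


# 51 editions on a branch limited to 50; the oldest one is tagged "save".
names = [f"ed_{i:04d}" for i in range(1, 52)]
names[0] = "ed_0001_save"
print(editions_to_delete("/default/main/www", names))  # ['ed_0002']
```

In this sketch saved editions still count toward the total; whether the real script counts them that way is not stated in the thread.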
Dave Smith
Sr. Software Engineer
Nike, Inc.
(503) 671-4238
DavidH.Smith@nike.com
Bryan_K
Thx, I'll let you know how it turns out if we choose to implement.
Migrateduser
Smitty says:
As a side note, we have found you definitely can recover vast amounts of disk space by deleting editions.
Bear in mind the above will be true only if the "vast amount of space" is represented by file versions that are unique to a deleted edition. TeamSite only deletes a file version in two situations:
- The branch the version is in is deleted
- The version is only in one edition, and that edition is deleted
The second situation is the more common, and the one Smitty is leveraging. When you delete an edition, you're only deleting a small DB record as far as TS is concerned--i.e., the edition itself is trivial. However, TS will then go on to find and destroy file versions that are unique to that edition. If the edition happens to have large numbers of versions that were unique to it, then yes, a large amount of space will be freed up. However, it is very possible that a given edition contains *no* unique versions, and thus only a handful of bytes will be freed up. Thus, your mileage may vary when you delete any given edition.
OTOH, if you _really_ want to do some cleanup, create a new branch, and copy-to-area the latest edition from an old branch to the new one. Then delete the old branch and rename the new one so that it has the same name as the old one. This can be scripted (several versions exist), including migration of the workareas as well. This "pruning" will definitely free up lots of disk space.
bw
Bob Walden [bob.walden@interwoven.com]
Interwoven Education Group
IM: Yahoo, MSN bob_walden
Bryan_K
If this is the case, I have to ask the follow up question. Is there a way/is it beneficial to delete old file versions? Same as before - say only keep the last 20 file versions...
Adam Stoller
To put a different perspective on Bob's answer - using Smitty's process (thanks for providing it) - the incremental savings you see when publishing *an* edition will likely not be that big - but the difference *over time* will likely be greater.
The savings are aggregated over time - some of your editions will have few changes of small files, some will have few changes of large files, some will have many changes to small files, some will have many changes to large files (etc.).
Removing any one edition will only result in the savings resulting from files / file versions unique to that edition.
So, for example, say you have two *.wav files (they tend to be fairly large - say 80 MB each for now). Both existed in the first edition [1] on that branch; one of them got changed in the next edition [2] and then got removed in the edition after that [3], and the other file never changed.
Deleting the first edition results in about 80 MB of savings because instead of having two 80 MB versions of that file, you only have one (so you're still storing 80 MB, just not 160 MB for that particular file).
Next, deleting the second edition results in another savings of 80 MB.
Throughout all of this, you still have 80 MB devoted to the *other* wav file - but only 80 MB - not 160 or 240 - because since it didn't change between editions, there is only one physical copy in the backing store.
Over time - consider the number and size of files that change and/or get removed between editions, and you will realize a savings more in terms of lack-of-growth than in terms of retrieval of disk-space.
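The wav example can be checked with a small Python model. This is a toy sketch, not how the backing store is implemented: it assumes each edition is just a set of (file, version) pairs, ignores workareas and the staging area, and uses made-up file names `a.wav` and `b.wav`.

```python
from typing import Dict, Set, Tuple

Version = Tuple[str, int]  # (file name, version number)

# Sizes in MB for each stored version (hypothetical files).
SIZE_MB: Dict[Version, int] = {
    ("a.wav", 1): 80,
    ("a.wav", 2): 80,
    ("b.wav", 1): 80,
}

# a.wav changes in ed2 and is removed in ed3; b.wav never changes.
editions: Dict[str, Set[Version]] = {
    "ed1": {("a.wav", 1), ("b.wav", 1)},
    "ed2": {("a.wav", 2), ("b.wav", 1)},
    "ed3": {("b.wav", 1)},
}


def freed_by_deleting(name: str, eds: Dict[str, Set[Version]]) -> int:
    """MB freed if edition `name` is deleted: only versions no other edition holds."""
    others = set().union(*(v for k, v in eds.items() if k != name))
    return sum(SIZE_MB[v] for v in eds[name] - others)


print(freed_by_deleting("ed1", editions))   # 80
remaining = {k: v for k, v in editions.items() if k != "ed1"}
print(freed_by_deleting("ed2", remaining))  # 80
print(freed_by_deleting("ed3", editions))   # 0
```

The last line shows the "handful of bytes" case: ed3 holds nothing that another edition does not also hold, so deleting it frees no file data in this model.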
I don't think we support deleting individual versions of files separate from deleting editions - because if we did, that would possibly invalidate an edition - and that would be *wrong*
--fish
(Interwoven Senior Technical Consultant)
Bryan_K
Okay sounds great, just want to make sure I understand.
- Editions store pointers to versions of files that exist in TS. If an edition is deleted and it references an old version of a file, that old version of the file will be deleted, so long as no other editions point to that particular version. So by cleaning up editions you are essentially cleaning up old versions of files in the backing store, thus saving space.
- Does this mean that deleting an edition might delete a version of a file, which will then no longer be available to roll back to by using the individual file's history?
james1
> Editions store pointers to versions of files that exist in TS.
This is true of all *areas*, not just editions.
> If an edition is deleted and it references an old version of a
> file, that old version of the file will be deleted, so long as no
> other editions point to that particular version.
If you replace "edition" with "area", then the above sentence is correct.
For example, if edition 16 and workarea "james" are the only areas in TeamSite referencing a particular version of a file, and if edition 16 is deleted, then that version of that file won't be deleted, because the "james" workarea still holds a reference to it. If you want to reclaim backing store disk space, then you should check for workareas with really old content and either delete them or get latest on them.
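As a rough illustration of the edition 16 / workarea "james" case, here is a Python sketch that lists which areas still hold a given file version. The area names, the `file@vN` labels, and both helper functions are hypothetical; this models the rule described above and does not call any TeamSite tool.

```python
from typing import Dict, List, Set

# Each area (edition or workarea) maps to the file versions it references.
areas: Dict[str, Set[str]] = {
    "edition/ed16": {"index.html@v3"},
    "workarea/james": {"index.html@v3"},
    "edition/ed17": {"index.html@v4"},
}


def holders(version: str) -> List[str]:
    """Names of all areas that still reference `version`, sorted."""
    return sorted(name for name, versions in areas.items() if version in versions)


def freed_when_deleted(area: str, version: str) -> bool:
    """True only if `area` is the sole holder of `version`."""
    return holders(version) == [area]


print(holders("index.html@v3"))                             # ['edition/ed16', 'workarea/james']
print(freed_when_deleted("edition/ed16", "index.html@v3"))  # False
print(freed_when_deleted("edition/ed17", "index.html@v4"))  # True
```

A listing like the first line is what you would want before a cleanup: it points at the stale workarea that is keeping an old version alive.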
> So by cleaning up editions you are essentially cleaning up
> old version of files in the backing store, thus saving space.
Yes.
> Do this mean that deleting an edition might delete a version
> of a file, which will then no longer be available to roll back
> to by using the individual file's history?
You bet. One of the great features of TeamSite is the version history of your content, but you limit that feature when you delete old editions. Be careful.
It's worth talking about intermediate versions of files here too.
If I submit a file, version 1, and then publish an edition (ed1), and then change and submit the file 3 more times, versions 2-4, and then publish another edition (ed2), then versions 2 and 3 will have no edition containing them. And, for the sake of argument, we'll say that no other area in any other branch anywhere else references these versions of those files.
If you delete ed1 (or is it ed2? ... I forget), then those intermediate orphan versions of the file (versions 2 and 3) will get deleted as well.
If you are so inclined, this behavior would be easy to see for yourself.
-- James
--
James H Koh
Interwoven Engineering
Bryan_K
What can we expect in TS 5.5.2 with the new and improved backing store if we choose to run a script such as the one Smitty provided? Will performance be an issue?
james1
> What can we expect in TS 5.5.2 with the new and improved
> backing store if we choose to run a script such as the one
> Smitty provided?
I'm not sure I understand this question...
> Will performance be an issue?
I'm not sure what your thresholds for performance are, but I would expect the removal of an edition to be faster in TS5.5.2 than in TS5.0.x or earlier, and therefore I would expect Smitty's script to run faster as well. At the very least, I would be surprised if it ran any slower.
Hope this helps.
-- James
--
James H Koh
Interwoven Engineering
Migrateduser
For what it's worth, I have been deleting editions in bulk manually (not running the script but using File->Delete on 40-60 editions at once) which would have brought TeamSite to its knees in the older versions. In 5.5.2 it returns control back to me within a matter of a few seconds and I see no degradation in performance while in the background the edition deletion process chugs along. We have noticed that our larger editions take just as long (or so it seems - some take almost an hour each) for the background batch process to run to completion on a deletion as they did before. What's improved is the effect this has on TeamSite performance, and the UI gives you back control right away. In the older versions, the UI would be stuck for a really long time. This doesn't mean that if you try to delete 100 editions all at once you won't have problems, but I think you can expect a significant improvement on your performance in 5.5.2.
Dave Smith
Sr. Software Engineer
Nike, Inc.
(503) 671-4238
DavidH.Smith@nike.com
Adam Stoller
-Editions store pointers ...
Basically, yes.
- Do this mean that deleting an edition might delete a version of a file, which will then no longer be available to roll back to by using the individual file's history?
Actually, I meant it the other way around - if you could arbitrarily delete specific versions of files, you would very likely "break" one or more editions - and your ability to use those editions for rollback purposes would then not work...
I *believe* that the history for the individual file is correctly cleaned up should a specific version(s) of a file be deleted due to no longer being referenced by any edition or other active TeamSite area.
--fish
(Interwoven Senior Technical Consultant)
Bryan_K
"I believe that the history for the individual file is correctly cleaned up should a specific version(s) of a file be deleted due to no longer being referenced by any edition or other active TeamSite area."
-------------------------------------------------------------------
If a file is copied from one branch to another via OpenDeploy, I assume TeamSite will consider this a new file, even though it is the same data? Meaning if I delete an edition in 1 branch which causes the file to be deleted in that branch, it will not affect the other branch, correct?
Migrateduser
correct.
bw
Bob Walden [bob.walden@interwoven.com]
Interwoven Education Group
IM: Yahoo, MSN bob_walden
Bryan_K
Smitty,
I think we are going to go ahead and implement your script. It looks great and accomplishes all we would like it to.
I'm wanting to set it up to run at each publish as you suggested, and we are running TS 5.5.2 (no SP) on W2k.
I have seen some stuff in the manual about setting this up as a service, but it seems to be different than the method you suggested. What is the iw-home/local/iwlocal.cfg file? I don't have this file (nor have I seen it). Is that something that can be created as a text file from scratch?
Bryan
Migrateduser
iwlocal.cfg is the config file you use to map trigger events to custom scripts you create. This is where I set up the script I posted to run by adding this line to iwlocal.cfg:
iwatpub root /interwoven/iw-home/local/publish.cgi '$IW_STAGINGAREA'
Once you add or remove something from iwlocal.cfg you have to stop and start iw.local (I don't know where that is on an NT installation but it's in /etc/init.d on Solaris) in order for your changes to take effect. This is all documented pretty well in the Command Line Tools document in the Command Triggers section.
Dave Smith
Sr. Software Engineer
Nike, Inc.
(503) 671-4238
DavidH.Smith@nike.com
Bryan_K
I'm thinking it may be quite different in NT. The documentation talks about setting up a service that runs in the background. What I don't know is how to get that service to run your script each time someone publishes.
Anyone that can help with this?
Adam Stoller
iwlocal.cfg is a Unix-ism.
For Windows - you would either have to use iwatpub (setting it up as a service as described in the manuals) or use the more generic iwat trigger (also documented in the manuals).
The advantage of using the latter is that it does not require the creation of a separate service using srvany and gets recorded in the backing store so that it continues to work after reboots.
--fish
(Interwoven Senior Technical Consultant)