Web CMS (TeamSite)
Maybe someone can get a better approach than mine.
nipper
Here is the situation:
I have ~15,000 files in TS, most of which are referenced by a DCR.
DCR1 -> file1
DCR2 -> file2
file3
DCR4 -> file4
So what I want to do is find all the files not referenced by DCRs.
First thought: read each DCR, set an EA on the file it points to, and when done, search through the structure for files without the EA.
Reasonably quick. HOWEVER, I would then have 15,000 modified files, since adding an EA counts as a file modification.
My current thought:
Do an ls -R to load the file list into an array, then read each DCR and find the file it references in the array. Very slow and CPU-intensive. Other ideas? I was thinking about a sort and a binary search.
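One way to avoid scanning an array for every DCR is to hold the file list in a hash set, so each lookup is constant time and the unreferenced files fall out as a set difference. A minimal sketch in Python, assuming the referenced paths have already been extracted from the DCRs (how to do that depends on the DCR layout, which is not covered here); the function names `list_files` and `unreferenced` are made up for illustration:

```python
import os


def list_files(root):
    """Return every file under root as a '/'-separated path relative to root,
    roughly what ls -R would enumerate."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root).replace(os.sep, "/"))
    return found


def unreferenced(all_files, referenced):
    """Return the files that no DCR points to, in sorted order."""
    return sorted(set(all_files) - set(referenced))


if __name__ == "__main__":
    # Hypothetical data mirroring the example: file3 has no DCR.
    files = ["file1", "file2", "file3", "file4"]
    refs = ["file1", "file2", "file4"]
    print(unreferenced(files, refs))
```

For roughly 15,000 paths the set should fit comfortably in memory, and nothing in the workarea is modified, so no files end up flagged as changed.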
TIA
Andy
Comments
james1
I wouldn't store 15,000 items in an in-memory array. This could consume a lot of memory, and if your program has a problem in the middle, you'd have to start over from the beginning.
I'd look for a persistent store so that you can resume your job if you need to. One way is to create a new workarea, then for each file referenced by a DCR, delete that file in the workarea. When you're done, the files you're left with are the unreferenced files.
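The same delete-as-you-go idea can be sketched with a small on-disk database standing in for the scratch workarea. This swaps in SQLite, which is not part of the suggestion above, and the table and function names are made up for illustration. Each DCR is recorded once it has been handled, so an interrupted run can pick up where it stopped:

```python
import sqlite3


def open_store(db_path):
    """Open (or create) the persistent store."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS candidates (path TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE IF NOT EXISTS done_dcrs (dcr TEXT PRIMARY KEY)")
    conn.commit()
    return conn


def load_candidates(conn, paths):
    """Seed the store with every file. Call once, at the start of a fresh run."""
    conn.executemany(
        "INSERT OR IGNORE INTO candidates (path) VALUES (?)",
        ((p,) for p in paths),
    )
    conn.commit()


def process_dcr(conn, dcr, referenced_paths):
    """Delete the files one DCR references. Returns False if the DCR was
    already handled in an earlier (possibly interrupted) run."""
    if conn.execute("SELECT 1 FROM done_dcrs WHERE dcr = ?", (dcr,)).fetchone():
        return False
    with conn:  # one transaction per DCR: both statements commit or neither does
        conn.executemany(
            "DELETE FROM candidates WHERE path = ?",
            ((p,) for p in referenced_paths),
        )
        conn.execute("INSERT INTO done_dcrs (dcr) VALUES (?)", (dcr,))
    return True


def remaining(conn):
    """Whatever is left after all DCRs are processed is unreferenced."""
    return [row[0] for row in conn.execute("SELECT path FROM candidates ORDER BY path")]
```

Unlike deleting files in a real workarea, this never touches the files themselves; it only tracks their paths.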
Hope this helps.
-- James
--
James H Koh
Interwoven Engineering
mogoo
Andy-
What is it you're ultimately trying to accomplish here? OOTB, files generated via DCRs already have EAs that flag them as such. We search through the HTML files for a PrimaryDCR EA -- if this value is equal to null, it's assumed to be a hardcoded file.
-maureen
nipper
Maureen,
The files are referenced by DCRs but not generated (via tpl) by the DCRs. Thus there is no EA from templating. The file is uploaded and needs to be pointed to by a DCR.
Andy
mogoo
Oh, gotcha. Maybe automatically maintaining 1 big list of all the files referenced in the DCRs would help cut down on processing??? No idea if that would help, just a thought...
maureen
Adam Stoller
Building on what I think James was saying ...
Have your PTs record the referenced files somehow (either as EAs, flat-file, or DB calls)
Have your process utilize the information recorded by the PTs.
I think using EAs with DataDeploy, or doing your own inserts/updates into a DB is probably more efficient than using a flat-file (since you want to make sure the information is updated if/when the references within the DCR change). Both are likely to incur a bit of performance overhead during page generation - but would make the process of checking file references afterwards far more efficient than parsing each and every file....
--fish
(Interwoven Senior Technical Consultant)
nipper
>Have your PT's record the referenced files somehow (either as EAs, flat-file, or DB calls)
Not using EAs. Think of this as a catalog system: a DCR and some downloadable specs (usually PDF). They need to be kept in sync (with a great deal of custom coding).
>I think using EAs with DataDeploy, or doing your own inserts/updates into a DB
I wish EAs were possible. When IW has the flexibility that templating has for the datacapture window, then I will drop DCRs. FormAPI, inline callouts, and much more granular invocation (category/type) are things we use heavily.
Andy
Adam Stoller
When you say a PDF (or other asset) is referenced by a DCR - is that reference within a specific field of the DCR or is it an arbitrary reference within a text area?
Do the DCRs ever get processed by a PT? Or are you using DCRs as wrappers only for setting metadata about / on the referenced document(s)?
It sounds a bit like you're talking about the latter - but perhaps you could use a PT that generates a mock file and takes care of sending all the associated referenced documents to a DB (as previously mentioned) - and then you could probably take a file list and a dump of referenced files from the DB and compare them to get a list of non-referenced files?
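The final comparison step (file list versus dump of referenced files) can be done in one pass if both lists are sorted the same way, without loading either fully into memory. A rough sketch in Python, assuming both inputs are already in ascending order with one path per entry; the function name `sorted_difference` is made up for illustration:

```python
def sorted_difference(all_sorted, referenced_sorted):
    """Yield each entry of all_sorted that does not appear in referenced_sorted.

    Both inputs must be iterables in ascending order under the same comparison
    (for example, both produced by Python's sorted()).
    """
    ref_iter = iter(referenced_sorted)
    ref = next(ref_iter, None)
    for path in all_sorted:
        # Skip referenced entries that sort before the current file.
        while ref is not None and ref < path:
            ref = next(ref_iter, None)
        if ref is None or ref != path:
            yield path


if __name__ == "__main__":
    # Hypothetical lists; in practice these could be lines read from two files.
    all_files = sorted(["file1", "file2", "file3", "file4"])
    referenced = sorted(["file1", "file2", "file4"])
    for path in sorted_difference(all_files, referenced):
        print(path)
```

Because it only walks each list once, this works equally well on two open text files (strip the trailing newline from each line first).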
(there might be an even better way, but I'm a bit short on sleep right now)
Hmm - I just thought of another possibility - perhaps you can set up DAS so that the DCRs (and their respective fields) get automatically populated into a DB - so no PT would be necessary...
--fish
(Interwoven Senior Technical Consultant)
Edited by ghoti on 08/26/03 07:14 PM (server time).