Calculate number of pages on website
alanhill
Hi,
Nothing much to do with Interwoven, this one, but does anyone know of any utilities you can use (either downloaded or via a browser) that will systematically go through your website and bring back the number of pages?
With our CMS in place and users creating and deleting pages and links, it's often difficult to work out accurately how many navigable pages there are on our sites at any one point in time.
Many thanks
Alan
Teamsite Template Developer
Comments
akshathp
What type of result are you expecting in this case? Would it be a simple count of files on your site, perhaps with some detail on file types too? I mean a total count of files, broken down into HTML, CGI, GIF/JPEG, DOC, PDF, etc.
Or are you looking for something more detailed: not just a count, but also a follow-through on dependents and precedents, so that you can find orphan files, files with broken links, and so on?
If it's the first case, that should be pretty simple to do in Perl (or any such scripting language). You would traverse the root directory of your site recursively, incrementing a total file counter and per-file-type counters as you go.
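A rough sketch of that idea (the root path here is just a placeholder; point it at your own site root and adjust the extension handling to suit):

use File::Find;

my $root = 'd:/path/to/root';   # placeholder -- replace with your site root
my %counts;                     # per-extension file counters
my $total = 0;                  # total file counter

find(sub {
    return unless -f;           # $_ is the current file; skip directories
    my ($ext) = /\.([^.\/\\]+)$/;
    $ext = defined $ext ? lc $ext : '(none)';
    $counts{$ext}++;
    $total++;
}, $root);

print "Total files: $total\n";
print "  $_: $counts{$_}\n" for sort keys %counts;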
Maybe more details on your requirements will help.
Akshat Pramod Sharma
dazzlad
Sounds like a job for File::Find.....
Something similar to the example below.
use File::Find;

my @files;    # collected page paths

find(\&wanted, 'd:/path/to/root');

print scalar(@files), "\n";

sub wanted {
    my $name = $File::Find::name;
    # count .asp, .htm and .html pages; extend the pattern as needed
    if ($name =~ m/\.(asp|html?)$/) {
        push @files, $name;
    }
}
dazzlad
Replying to my own posts again...
I just re-read the question, and I guess you are more interested in counting the number of pages that are currently used rather than all the pages in a folder.
I'd use something like http://www.httrack.com/ to crawl your site, and then use a Perl script like the one in my previous post to analyse the contents of the copied site.
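For example (a hypothetical invocation; the URL and output folder are placeholders):

httrack "http://www.yoursite.com/" -O "d:/mirror/yoursite"

That gives you a local copy of the navigable pages, which the script can then walk.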
The problem here is that if the site is within TeamSite, I am not sure whether you will be able to get the software to authenticate.
Hope this helps.
Darren.
alanhill
Hi, thanks for the advice, that HTTrack site looks useful. I'll be 'scraping' the live site if I use the tool, so I should have no problems getting a copy of it locally. I could then look at writing a Perl routine to scan the folders and pick out/report on certain page types, etc.
Cheers
Alan
Teamsite Template Developer
jed
wget in spider mode could also do the job.
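Roughly something like this (the URL is a placeholder, and wget's log format varies a little between versions, so treat the count as approximate):

wget --spider -r http://www.yoursite.com/ 2>&1 | grep -o 'http://[^ ]*' | sort -u | wc -l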
--
Jed Michnowicz
jedm@sun.com
Content Management Engineering
Sun Microsystems
DJ_At_Work
BlackWidow will also report the number of pages and the structure of a website.
http://www.softbytelabs.com/Frames.html?f1=Banner.html&f2=BlackWidow/index.html
or if you need a more robust analysis/reporting tool, check out WebTrends.
HIH
-Drew
dazzlad
Alan, I forgot that there is a very good Perl script called w3mir that does the same as the software I recommended before (you can find it using Google).
It may be better for automating analysis etc.
Darren.
Tbag
Best freeware ever --------------> Xenu LinkSleuth <----------------
I can't recommend it enough. As an added bonus, it's anti-Scientology.
It'll get you the page counts you're looking for in aggregate, and also break them out by file type, etc.