Is it possible to get access to the entire HTML generated from a page in SitePublisher?
What I am trying to achieve is to run a document markup processor (e.g. Markdown) over the entire HTML.
Well, you could build a workflow that generates the HTML and then have a subsequent task process the output.
Or were you asking about hijacking the output on the LSDS runtime?
Yes, I was looking at doing this on LSDS.
The only place I have found so far that seems to have access to the raw HTML is a function called replaceForm() inside the BaseRequestContext class.
Overriding this class would be a hack, so I am looking for any other way to do this.
A hack will be the only way. The LSDS engine writes directly to the output stream, so there is no hook for post-processing. What you could do instead is create another service (or perhaps even a servlet within the same container) that makes an HTTP request to the LSDS engine and then post-processes the response before handing it off to the HTTP client. It adds more overhead, but it doesn't involve any hackery.
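
For what it's worth, here is a rough sketch of that fetch-then-post-process idea in plain Java (11 or later). It is not an LSDS API: the class name LsdsPostProcessor, its methods, and the BOLD_ONLY processor are all made up for illustration, and BOLD_ONLY is only a stand-in for whatever real markup processor you would plug in.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.function.UnaryOperator;

// Hypothetical helper: fetches a rendered page over HTTP and post-processes it.
class LsdsPostProcessor {

    // Stand-in for a real markup processor (assumption, for illustration only):
    // it converts **text** to <strong>text</strong> and nothing else.
    static final UnaryOperator<String> BOLD_ONLY =
            html -> html.replaceAll("\\*\\*(.+?)\\*\\*", "<strong>$1</strong>");

    private final HttpClient client = HttpClient.newHttpClient();
    private final UnaryOperator<String> processor;

    LsdsPostProcessor(UnaryOperator<String> processor) {
        this.processor = processor;
    }

    // Requests the fully rendered page from the engine, like any HTTP client would.
    String fetch(String pageUrl) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(pageUrl)).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected status: " + response.statusCode());
        }
        return response.body();
    }

    // Runs the configured processor over the complete HTML document.
    String apply(String html) {
        return processor.apply(html);
    }

    // Fetch first, then post-process, before handing the result to the real client.
    String fetchAndProcess(String pageUrl) throws IOException, InterruptedException {
        return apply(fetch(pageUrl));
    }
}
```

A servlet in front of the engine could call fetchAndProcess() with the URL of the page being requested and write the returned string to its own response. Headers, cookies, caching, and character encoding would all need to be passed through properly in a real setup; the sketch ignores them.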