TS High Availability
bpg
Good afternoon,
We are looking to implement a 24/7 environment for TeamSite (currently on 5.5.2 SP 6, but soon upgrading to TS 6.x) and we are struggling to find a decent and robust solution. We have had IWOV consultants in for four days, and it looked like we had a solution until they told us that, actually, the proposal wouldn't work. Something about having to do an iwfschk if the primary node fails over? This would not only be time- and resource-consuming but would also leave us vulnerable during the check.
Anyway, I was wondering if anyone knows of any successful in-situ implementations of HA for TeamSite (not necessarily their proprietary watchdog software) on a Windows platform? Ideally MS Cluster Service or something similar, or maybe something like NSI's Double-Take software?
I look forward to your input.
Brendan
Comments
TCavanaugh
We used the HA modules for over a year in a Windows 2000 cluster, and recently converted the servers back to standalone.
There are two approaches to TeamSite clustering on Windows 2000:
1) Create a series of generic resources with appropriate dependencies.
2) Purchase and install the HA module, and create one resource of type TeamSite.
The second option is slightly easier to set up and maintain. Both solutions will detect and trigger failover in the event of a service crash. Neither will detect problems that do not involve a service crash, i.e. if the application fails in a way that leaves the service running, no failover will happen automatically.
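For what it's worth, here is roughly what option 1 looks like from the command line with cluster.exe. The group, resource, disk, and service names below are placeholders for illustration only - substitute whatever your installation actually registers.
rem Create a Generic Service resource for the TeamSite service in an existing cluster group
cluster resource "TeamSite Service" /create /group:"TeamSite Group" /type:"Generic Service"
rem Point the resource at the Windows service it should monitor (service name is a placeholder)
cluster resource "TeamSite Service" /priv ServiceName=iwserver
rem Make TeamSite depend on the shared disk that holds iw-store and the entity DB
cluster resource "TeamSite Service" /adddep:"Disk S:"
rem Bring the resource online
cluster resource "TeamSite Service" /online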
We found little benefit from using clustering. Following is a list of items to consider:
1) OpenDeploy is not cluster-friendly as a receiver. In our environment, in some instances we deploy from one TeamSite cluster to another. This required us to install and configure OpenDeploy to be able to receive connections while operating as a cluster. This would only work when installing into an Active/Passive cluster. We were told that clustering would be supported by OpenDeploy 6.0.
2) The iwreset command is not cluster-aware. It should be smart enough to communicate with the clustering resources, if installed, but it is not. Executing an iwreset -ui will initiate a cluster failover, as the clustering environment will detect a service failure. (A possible workaround is sketched after this list.)
3) The standby isn't really a standby. A failover to the secondary server would take about 5 minutes - the time necessary to transfer the resources and launch TeamSite. We have a reasonably large domain; the bulk of the launch time is downloading information from one domain. The amount of time to restart a single server is actually only slightly longer, approximately 7 minutes. We would on occasion run into problems with the cluster itself (not a TeamSite issue) which would actually extend our outage.
4) You will want to move your entity database and your iw-store to a shared drive. To move the entity database, put the following into your iw.cfg:
[locations]
## Store entity DB on shared drive
entities=s:\iw-entitydb
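To physically relocate the existing entity database, the move is just a copy while TeamSite is down. Something along these lines should do it - both paths are examples only (the source reflects a typical default install location), so adjust them to your environment and stop/start the TeamSite services however you normally do:
rem Stop the TeamSite services first, using your normal shutdown procedure
rem Copy the existing entity database to the shared drive (paths are examples only)
xcopy /E /I /K "C:\Interwoven\TeamSite\iw-entitydb" "S:\iw-entitydb"
rem Update iw.cfg as shown above, restart TeamSite, and verify before removing the original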
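Regarding the iwreset problem in item 2: one possible workaround, which we never actually tried, is to temporarily tell the cluster to restart the resource on failure without failing the group over, run the reset, and then put the policy back. RestartAction is a standard cluster resource property (0 = do not restart, 1 = restart but do not affect the group, 2 = restart and fail the group over); the resource name below is a placeholder.
rem List the current properties so the restart policy can be restored afterwards
cluster resource "TeamSite Service" /prop
rem Restart on failure, but do not fail the group over while the UI is reset
cluster resource "TeamSite Service" /prop RestartAction=1
rem The reset should now not trigger a failover
iwreset -ui
rem Restore the default behaviour (restart and fail over the group)
cluster resource "TeamSite Service" /prop RestartAction=2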
I don't believe there are very many Windows TeamSite clustering environments out there; support was not often familiar with the issues involving clusters, and the benefits were minimal. Clusters are great if you are trying to take a 98% available solution to 100%. But if you currently have a 95% available solution, they tend to help you get to a 90% available solution...
-TC
Migrateduser
I first noticed the effect of your point #2 yesterday when rebuilding my UI on a 6.1 cluster with HA. The weird thing is that the one build appears to have resulted in it being installed on both nodes. When I tried to build on the other node, it said it was already up to date. One of these days I'll explore that further and figure out exactly what happened.
Migrateduser
I just thought of another potential gotcha. If you're like me, you probably tend to write any logs for external tasks somewhere within iw-home (e.g. iw-home/local/logs, iw-home/tmp, iw-home/custom/logs, or whatever). Since the iw-home dir is not a shared resource, your logs will not all be in one place. I'm thinking about putting those sorts of things on a shared drive, but since I tend to use them only for debugging problems it might be best to keep them with the node where they executed.
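If I do end up moving them, I'll probably just wrap each external task with something like the following, so the logs land on the shared drive but stay separated per node (the drive letter, directory, and task name are only examples):
rem Build a per-node log directory on the shared drive
set LOGDIR=S:\iw-logs\%COMPUTERNAME%
if not exist "%LOGDIR%" mkdir "%LOGDIR%"
rem Run the external task and capture its output in the node-specific log
call my_external_task.bat >> "%LOGDIR%\my_external_task.log" 2>&1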