TeamSite Redundancy
ChicagoRich
This is more of an architecture question than a development question, but here goes ... I am doing research on redundancy for our TeamSite server. I know that TeamSite High Availability is there on the high end, and recovery from backup is there on the low end. Is anyone using anything in between those two approaches? I would appreciate comments and suggestions about what to research.
Thanks,
Rich
Comments
tvaughan
Hey Rich,
I'm looking into a middle-ground solution, but I'm having trouble getting it rolling because of bureaucratic issues.
At first, I was gonna do the whole "clustered" thing with TS HA, but then I realized it was kinda overkill for what we were doing, and pretty expensive (each HA license is around $50K).
I'm running TS 5.5.2 on Solaris 8 and heard about this Sun StorEdge suite of products that includes a real-time data synchronization application that can mirror filesystems across a network.
What I'm going to try to do is shut down the iw process running on our backup server and real-time mirror the backing store filesystem running on our primary IW server over to a filesystem hosted on the backup IW server. If and when a primary failure occurs, I'll just boot up the backup server and it **should** load a backing store very similar to the primary store.
The dangers I see with this are two-fold: (1) if the failure on primary prod is due to some backing store corruption, then the backup IW box is just going to inherit that corruption. (2) there's a Heisenberg effect going on here in that by poking around in the sensitive primary backing store, StorEdge might actually end up causing a primary failure that it is designed to mitigate.
I'm thinking about alternating the name of the backup filesystem to which StorEdge writes its mirrored copy every other day. Thus, if there's a Thursday crash that sends the backup IW server a bogus backing store copy, I can always bring up the Wednesday backing store on the backup IW server.
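A minimal sketch of the alternating-target idea, assuming two hypothetical mount points (`/mirror/bstore_a` and `/mirror/bstore_b`); the actual mirroring would be done by the StorEdge software, so only the target-selection logic is shown:

```shell
#!/bin/sh
# Hedged sketch: pick the mirror target by day-of-year parity, so
# yesterday's copy always survives an overnight corruption. Mount
# points are made-up names, not from any real setup.

day=`date +%j`                      # day of year, e.g. 047
if [ `expr "$day" % 2` -eq 0 ]; then
    target=/mirror/bstore_a         # even days
else
    target=/mirror/bstore_b         # odd days
fi
echo "mirroring backing store to $target"
```

The same parity trick generalizes to a deeper rotation (mod 7 for a week of copies) at the cost of more disk.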
I'll keep you posted as to how this goes.
Tom
iwovGraduate
Just a few additional things to think about...
- You will have to replicate all the configurations as well.
- You cannot directly "copy" over some of the config files, since they have host-specific information, e.g. iw.cfg.
- When you fail over to the backup TS server, you might lose some data. Important implication: your web server (OD target) might already have data that is not in your backup TS.
- Consider failback: you will have to first bring the primary TS in sync with the changes made during the downtime before activating it.
This might be something that IWOV PROD MGMT should look at. After all, IWOV is all about "ENTERPRISE" content mgmt, right?
I have seen quite a few "enterprise" customers who need redundancy not only for data but for applications as well. In an ideal world there would be an automatic switch triggered by an SNMP trap or something like that. The same thing applies to OD receivers, etc.
mmb
Hi
Are there any other things that you can think of that may cause problems when doing a failover of the application between two servers?
I have an example where I am failing over a shared disk between two servers; the TS app resides on the shared disk. When TeamSite starts on the second server, I am unable to log in with a known and configured user. Is there something in the user entity database that ties users to servers?
I have made as many server-specific changes to iw.cfg, openapi.cfg and the webserver config as I can think of. Each server has its own license key. Is there anything else I should think of?
Thanks,
- mark
tvaughan
The TeamSite user must be a user known to unix. If your user doesn't exist in the passwd/nis/ldap authentication source that your backup TeamSite server uses, they won't be able to log in.
Can you telnet to your backup server as one of your team site users?
If you can, try running the "iwuser username" command and make sure that the numeric uid that TeamSite thinks your username represents is identical to the uid that Unix supplies for the user.
If the uids (or gids, for that matter) don't jibe, you can either sync the passwd file from your primary server to your backup server (and don't forget the shadow file), or you can hack your Entity files and manually edit the uids in the various entity entries.
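The cross-check could be sketched like this, for a hypothetical user "wwwpub". The iwuser output parsing is a guess, so check your version's actual format; the Unix side uses the standard id(1) command.

```shell
#!/bin/sh
# Hedged sketch: compare the uid TeamSite has recorded for a user
# against the uid the OS authentication source supplies.
user=wwwpub                                               # hypothetical user

ts_uid=`iwuser "$user" | awk '/uid/ {print $NF}'`         # assumed output format
os_uid=`id "$user" | sed 's/^uid=\([0-9]*\).*/\1/'`       # parse "uid=NNN(...)"

if [ "$ts_uid" = "$os_uid" ]; then
    echo "uids match ($os_uid)"
else
    echo "MISMATCH: TeamSite=$ts_uid Unix=$os_uid"
fi
```

Looping this over every TeamSite user before a failover test would catch mismatches while the primary is still up.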
Tom
mmb
Thanks Tom,
Yes the users are known to Unix, and are set up in the *.uid files.
I think that you are right about the uid & gid's I will double check and try your suggestion.
Thanks,
- mark
gwen1
My company requires us to have off-site failover for our production servers. We currently have two different methods of replicating our backingstores to the failover TeamSite instances. In both cases, uids and gids are the same, and any custom code is deployed to both servers when implemented.
On our 4.2.1 TeamSite installation, we have a script that runs every half hour which compares files in specified workareas and captures the extended attributes in a script to set them after deployment. For a large directory, though, this would not be practical, as the comparison takes too long. Following a failover, someone submits the files to staging, but there is some loss of workflow tasks, etc. We also tar and compress the production backingstore on a nightly basis. This tar.Z file is backed up through ABS backup every night. Once per week, we replace the failover backingstore with a copy of the production one. This catches anything that may have been missed during the intraday replications: failures, unreported new WAs, etc. These processes could be adapted and beefed up for other versions.
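The nightly archive step might look roughly like the sketch below; the backing-store path and destination are made up. The post produces a .tar.Z with compress(1), whereas gzip is shown here only because it is more widely available today. TeamSite should be stopped (or the store frozen) before archiving a live backingstore.

```shell
#!/bin/sh
# Hedged sketch: tar and compress the backing store into a
# date-stamped archive. Paths are assumptions, not from the post.
BSTORE=${BSTORE:-/iwstore}     # assumed backing-store path
DEST=${DEST:-/backup}          # assumed archive directory
stamp=`date +%Y%m%d`

tar cf - "$BSTORE" 2>/dev/null | gzip -c > "$DEST/bstore_$stamp.tar.gz"
echo "$DEST/bstore_$stamp.tar.gz"
```

Keeping a week of date-stamped archives gives the same fall-back-to-an-older-copy safety net described elsewhere in this thread.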
On our 5.0.1 TeamSite installation, our backingstore is on EMC. We use SRDF replication to our remote disk. When we fail over, that link is severed, the disk is mounted to our failover box, and we start TeamSite. This has been successful in our testing and the one time we had some hardware problems. One benefit is that the workflow is maintained (we may lose a task or two if someone was in the middle of something). We also make a backup copy of the disk nightly. If the failure was due to a corruption in the backingstore, we would then be able to fail back to the previous night's copy at least.
Hope this aids the research!