Constantly restarting TeamSite and OpenDeploy

We are experiencing a problem with our TeamSite and OpenDeploy servers whereby we have to constantly restart the server services. I am most familiar with the TeamSite situation, so I will describe that one:

Periodically, between one and three times a week, TeamSite will all of a sudden start acting oddly. If you have a session already open, you can browse through branches, but any attempt to edit a file or a DCR does nothing. No windows pop up; there is no response at all, except that you can still move around and see the files in branches. If you have the launch page up and click the "start new session" button, a window pops up with a message along the lines of "Could not contact servlet engine." If you try to LOAD the TeamSite launch page, it just hangs with the little IE globe spinning and spinning.

This is very frustrating, as it constantly requires IT attention to restart the TS services and also impedes critical business access to the tool. We are running TeamSite 5.6 SP3 and OpenDeploy 5.6 on Solaris. Any help would be greatly appreciated! Thanks!


Comments

rwinterpacht

We are also on Solaris, running OD 5.5 and TeamSite 5.0.1. There haven't been many problems with TeamSite, but with OpenDeploy we would see the processes die, and we determined it was related to the process being started without the "nohup" option. This was supposed to be fixed in OD 5.6 SP1, but we haven't downloaded or tested it yet. We had a case opened, and were advised to upgrade OD to that version, but that we could use the nohup workaround:

Also, we were advised to start OD like this:
$ ssh -l <account name> <OD server name> "/etc/init.d/iwod start"

Hope this helps,
Raf Winterpacht
Household International

Migrateduser

What you're seeing sounds like the effect of your web server crashing or being killed. Your version is later than what I have administered, and I've hardly had to touch my installation since it was upgraded to TS 5.5.2. Even that was only to add a means for testing backups. The web server isn't (or at least didn't used to be) a TS service per se; it was just Apache*. So it sounds like something is killing 'httpd'/Apache. When the web server isn't running, you can't get source code rendered, but TS will still navigate its branch structure. At any rate, it is not normal for Apache to crash. You should talk to technical support.

* When I bought TS in '99 you could choose Netscape's web server; but I've only used Apache.

BENMAY

I would assume that you are correct about the Apache problem, except that the way I solve this issue is by issuing:

/etc/init.d/iw.server stop
/etc/init.d/iw.server start

After which everything works. Now, correct me if I'm wrong, but iw.server doesn't touch the Apache service, does it?


herald10

Have you tried installing OpenDeploy 5.6 Service Pack 1? I had the same problem, and installing the service pack solved it.

BENMAY

We did try to install OD SP1, but it created problems of its own: we couldn't view any deployment logs. Plus, we have dozens of servers we deploy to, all owned by a service provider, so installing the OD SP is no small task...


BENMAY

Did you also have this problem with teamsite?

herald10

If that's the case, I guess uninstalling OpenDeploy and then reinstalling it without the admin server may solve the problem.

gwen1

You should probably open a support case for this. Also, have your Solaris system admins been involved in looking at the system as a whole (memory, CPU, etc.)? We have been on TeamSite 5.5.2, Solaris 8, since May and haven't had any cases where the servlet, proxy or web server died or hung.

Migrateduser

Well now that depends. I'd expect TS start-up to make service requests of 'httpd'. An 'httpd' installation on Unix might well have 'inetd' starting it up on a service request if 'httpd' isn't running.

Next time this happens, you might try '<path>/httpd start' and see if that solves it (temporarily) too. Either result would be useful, I'd think, to tech support in directing you to the solution.

Migrateduser

I am not aware of this specific situation, but you might want to investigate how you set up the OD Admin application. It can either share the web server with TeamSite, or have its own instance. From what you are describing it could be a good place to look. I believe most OD customers set up a separate web server.
Regards,

lissa

BENMAY

I have already opened a ticket with Interwoven for this problem, and am awaiting a customer service response...

I believe we have our own instance of apache running. This morning the system was in the state I described above, and here is some system info which might be helpful:

usgcu024-/$ps -aef |grep http
nobody 1133 1117 0 Aug 18 ? 0:33 /usr/apache/bin/httpd
nobody 5412 1117 0 Aug 18 ? 0:32 /usr/apache/bin/httpd
root 1117 1 0 Aug 18 ? 0:00 /usr/apache/bin/httpd
nobody 5413 1117 0 Aug 18 ? 0:31 /usr/apache/bin/httpd
nobody 1132 1117 0 Aug 18 ? 0:33 /usr/apache/bin/httpd
nobody 1134 1117 0 Aug 18 ? 0:30 /usr/apache/bin/httpd
nobody 1135 1117 0 Aug 18 ? 0:32 /usr/apache/bin/httpd
nobody 1136 1117 0 Aug 18 ? 0:31 /usr/apache/bin/httpd
nobody 28478 1117 0 Sep 03 ? 0:16 /usr/apache/bin/httpd
nobody 25027 1117 0 Aug 19 ? 0:29 /usr/apache/bin/httpd

usgcu024-/$ps -aef |grep iw |sort
iwui 16232 16230 0 Sep 23 ? 0:25 /DBA/IW/iw-home/iw-webd/bin/iwwebd -DUNIX -d /DBA/IW/iw-home/iw-webd -DSSL
iwui 16233 16230 0 Sep 23 ? 0:27 /DBA/IW/iw-home/iw-webd/bin/iwwebd -DUNIX -d /DBA/IW/iw-home/iw-webd -DSSL
iwui 16234 16230 0 Sep 23 ? 0:25 /DBA/IW/iw-home/iw-webd/bin/iwwebd -DUNIX -d /DBA/IW/iw-home/iw-webd -DSSL
iwui 16239 16230 0 Sep 23 ? 0:29 /DBA/IW/iw-home/iw-webd/bin/iwwebd -DUNIX -d /DBA/IW/iw-home/iw-webd -DSSL
iwui 16243 16230 0 Sep 23 ? 0:26 /DBA/IW/iw-home/iw-webd/bin/iwwebd -DUNIX -d /DBA/IW/iw-home/iw-webd -DSSL
iwui 16247 1 0 Sep 23 ? 25:47 /DBA/IW/iw-home/tools/java1.3/bin/../bin/sparc/native_threads/java -server -Xba
iwui 16412 16230 0 Sep 23 ? 0:27 /DBA/IW/iw-home/iw-webd/bin/iwwebd -DUNIX -d /DBA/IW/iw-home/iw-webd -DSSL
iwui 16413 16230 0 Sep 23 ? 0:28 /DBA/IW/iw-home/iw-webd/bin/iwwebd -DUNIX -d /DBA/IW/iw-home/iw-webd -DSSL
iwui 16423 16230 0 Sep 23 ? 0:26 /DBA/IW/iw-home/iw-webd/bin/iwwebd -DUNIX -d /DBA/IW/iw-home/iw-webd -DSSL
iwui 16528 16230 0 Sep 23 ? 0:25 /DBA/IW/iw-home/iw-webd/bin/iwwebd -DUNIX -d /DBA/IW/iw-home/iw-webd -DSSL
iwui 16530 16230 0 Sep 23 ? 0:26 /DBA/IW/iw-home/iw-webd/bin/iwwebd -DUNIX -d /DBA/IW/iw-home/iw-webd -DSSL
root 16122 1 0 Sep 23 ? 0:00 /bin/sh /DBA/IW/iw-home/private/bin/iwprocdstart
root 16123 1 0 Sep 23 ? 0:00 /bin/sh /DBA/IW/iw-home/private/bin/iwstart /PBSW/IW/iw-store
root 16127 16123 0 Sep 23 ? 30:38 iwserver.2.8 -e /DBA/IW/logs/iwevents.log /PBSW/IW/iw-store
root 16141 16122 0 Sep 23 ? 0:00 /DBA/IW/iw-home/bin/iwprocessd
root 16196 1 0 Sep 23 ? 0:02 /DBA/IW/iw-home/tools/db/sqlanywhere-7.0.1/SYBSsa7/bin/dbeng7 -c 16M -mf /DBA/I
root 16230 1 0 Sep 23 ? 0:00 /DBA/IW/iw-home/iw-webd/bin/iwwebd -DUNIX -d /DBA/IW/iw-home/iw-webd -DSSL
root 16236 1 0 Sep 23 ? 0:00 /DBA/IW/iw-home/bin/iwproxy
root 16240 16236 0 Sep 23 ? 0:39 /DBA/IW/iw-home/bin/iwproxy

Now, I am very curious about what would cause there to be 10 instances of iw-webd running, two of iwproxy, and 10 of apache. Could this be the root of the problem?

Interestingly, if I go to: http://10.202.4.32:81 I get the apache test html page, which to me says that the root of the problem does not lie with the apache http daemon... After issuing the following command:

$cd /etc/init.d/
$./iw.server stop ; ./iwtcboot stop ; ./iwod stop ; ./iw.server start ; ./iwtcboot start ; ./iwod start

The state of the httpd and iw processes was:

usgcu024-/etc/init.d$ps -aef |grep iw |sort
iwui 20337 20335 0 09:02:30 ? 0:00 /DBA/IW/iw-home/iw-webd/bin/iwwebd -DUNIX -d /DBA/IW/iw-home/iw-webd -DSSL
iwui 20338 20335 0 09:02:30 ? 0:00 /DBA/IW/iw-home/iw-webd/bin/iwwebd -DUNIX -d /DBA/IW/iw-home/iw-webd -DSSL
iwui 20339 20335 0 09:02:30 ? 0:00 /DBA/IW/iw-home/iw-webd/bin/iwwebd -DUNIX -d /DBA/IW/iw-home/iw-webd -DSSL
iwui 20340 20335 0 09:02:30 ? 0:00 /DBA/IW/iw-home/iw-webd/bin/iwwebd -DUNIX -d /DBA/IW/iw-home/iw-webd -DSSL
iwui 20341 20335 0 09:02:30 ? 0:00 /DBA/IW/iw-home/iw-webd/bin/iwwebd -DUNIX -d /DBA/IW/iw-home/iw-webd -DSSL
iwui 20352 1 0 09:02:31 pts/11 0:33 /DBA/IW/iw-home/tools/java1.3/bin/../bin/sparc/native_threads/java -server -Xba
iwui 20434 20352 0 09:02:48 pts/11 0:01 /DBA/IW/iw-home/tools/java1.3/bin/../bin/sparc/native_threads/rmiregistry -J-Xr
root 20224 1 0 09:02:21 pts/11 0:00 /bin/sh /DBA/IW/iw-home/private/bin/iwprocdstart
root 20225 1 0 09:02:21 pts/11 0:00 /bin/sh /DBA/IW/iw-home/private/bin/iwstart /PBSW/IW/iw-store
root 20229 20225 0 09:02:21 ? 0:06 iwserver.2.8 -e /DBA/IW/logs/iwevents.log /PBSW/IW/iw-store
root 20243 20224 0 09:02:21 pts/11 0:00 /DBA/IW/iw-home/bin/iwprocessd
root 20300 1 0 09:02:26 ? 0:01 /DBA/IW/iw-home/tools/db/sqlanywhere-7.0.1/SYBSsa7/bin/dbeng7 -c 16M -mf /DBA/I
root 20335 1 0 09:02:29 ? 0:00 /DBA/IW/iw-home/iw-webd/bin/iwwebd -DUNIX -d /DBA/IW/iw-home/iw-webd -DSSL
root 20345 1 0 09:02:30 ? 0:00 /DBA/IW/iw-home/bin/iwproxy
root 20346 20345 0 09:02:30 ? 0:00 /DBA/IW/iw-home/bin/iwproxy

usgcu024-/usr/apache/bin$ps -aef |grep http
nobody 1133 1117 0 Aug 18 ? 0:33 /usr/apache/bin/httpd
nobody 5412 1117 0 Aug 18 ? 0:32 /usr/apache/bin/httpd
root 1117 1 0 Aug 18 ? 0:00 /usr/apache/bin/httpd
nobody 5413 1117 0 Aug 18 ? 0:31 /usr/apache/bin/httpd
nobody 1132 1117 0 Aug 18 ? 0:33 /usr/apache/bin/httpd
nobody 1134 1117 0 Aug 18 ? 0:30 /usr/apache/bin/httpd
nobody 1135 1117 0 Aug 18 ? 0:32 /usr/apache/bin/httpd
nobody 1136 1117 0 Aug 18 ? 0:31 /usr/apache/bin/httpd
nobody 28478 1117 0 Sep 03 ? 0:16 /usr/apache/bin/httpd
nobody 25027 1117 0 Aug 19 ? 0:29 /usr/apache/bin/httpd
root 21142 19531 0 09:06:24 pts/11 0:00 grep http

Which looks about the same. Hrm.... Any more ideas?

gwen1

The number of web processes is not necessarily the issue. Two iwproxy processes are expected, as are up to 10 processes per Apache instance: Apache servers usually start up with 5 and have a configured maximum, which I believe is ten out of the box.

When this happens again, run iwstat -c: are there processes stacking up? Also, when I have had hung processes, it helped to check for CLOSE_WAIT connections with netstat -an; you can also grep the output for certain ports.
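A rough sketch of the CLOSE_WAIT check, assuming a POSIX shell. The function name count_close_wait is made up for illustration; it only counts matching lines of netstat -an output read from standard input, so it works on saved output as well as live output.

```shell
# Hypothetical helper: count CLOSE_WAIT lines in `netstat -an` output (stdin).
# $1 is optional: an extra pattern (for example a port number) to narrow the count.
count_close_wait() {
    grep CLOSE_WAIT | grep -- "${1:-.}" | wc -l | tr -d ' '
}

# Example (not run here):
#   netstat -an | count_close_wait
#   netstat -an | count_close_wait 1099
```

A count that keeps climbing between runs would be worth passing along to support; a handful of CLOSE_WAIT sockets on its own is not necessarily a problem.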

Do you see any errors in the servlet logs?

Gwen

PaulD

Ben

This sounds very similar to a problem we have on AIX.

When the problem starts, we can browse, but no one can log in, edit files, run java apps, etc. Eventually, our ability to browse will disappear.

Examining the server, we noticed our NFS mounts were "gone" and we had no idea why. When we restarted TeamSite, everything was fine. As an aside, if OD was in the middle of a deployment when the mount points to /iwmnt disappeared, it would "spin" like crazy, chewing up memory and eventually hogging AIX to the point where we couldn't log in to kill it.

What we found was that TeamSite was filling up our paging space and automatically freezing the backing store when paging space was low, which in effect shut down our NFS mount points. It's a good feature now that we understand it. We suspect the paging space gets filled because iwserver.aix5 isn't releasing memory. Now we watch it grow on a daily basis and we routinely recycle the server to avoid problems. We've seen the largest jumps in memory usage when we are moving around large sets of files in /iwmnt or /.iwmnt via the command line. But we are still gathering evidence.
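One way to watch that growth, as a minimal sketch rather than anything Interwoven supplies: sum the virtual size of the matching processes from ps output. The helper name sum_vsz_kb is hypothetical, and the sketch assumes your ps accepts the POSIX -o vsz,args option and reports vsz in kilobytes, which is worth checking on your platform.

```shell
# Hypothetical helper: sum the first column (vsz) of `ps -eo vsz,args` output
# read from stdin, for lines matching the pattern given as $1.
# Lines mentioning grep are dropped so the filter does not count itself.
sum_vsz_kb() {
    grep -- "$1" | grep -v grep | awk '{ s += $1 } END { print s + 0 }'
}

# Example (not run here), e.g. from cron, appending to a log of your choosing:
#   echo "`date` `ps -eo vsz,args | sum_vsz_kb iwserver`" >> /tmp/iwserver-vsz.log
```

Plotting or simply eyeballing that log over a few days should show whether the server process grows steadily or jumps after particular operations.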

hope this helps
Paul

BENMAY

I will check the netstat and iwstat stuff next time we experience the problem. As far as the logs/iwui/servletd.log file goes, we get about 3 entries a second which look like:

Retrying OpenAPI call IWService.locate(rmi://localhost:1099/IWUIService)
Retrying OpenAPI call IWService.locate(rmi://localhost:1099/IWUIService)
Retrying OpenAPI call IWService.locate(rmi://localhost:1099/IWUIService)

We have yet to find the root of this, and I have no idea if it's related to the restart issue.

BENMAY

Aha! I think we may have found the culprit! This morning I came in to a flood of user emails saying TeamSite is down. The behavior currently is a little different from its normal "broken" state, in that you can get a TeamSite session open, but none of the branches are visible under the server name in the left panel.

And in iwtrace.log I see lots of entries like:

Your system is running low on virtual memory (186 Mb is available)
[Fri Sep 26 09:53:18 2003] Server will deactivate all stores if available memory falls below 100 Mb
[Fri Sep 26 09:53:34 2003]

Hrm, maybe our TeamSite problems ARE being caused by memory issues...
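To see how close the server has been getting to the 100 Mb cutoff, the available-memory figures can be pulled out of the warnings. This is only a sketch: the helper name low_memory_mb is made up, and it assumes the warning text always has the exact form shown in the entries above.

```shell
# Hypothetical helper: read iwtrace.log on stdin and print the "Mb available"
# figure from each low-virtual-memory warning, one number per line.
low_memory_mb() {
    sed -n 's/.*running low on virtual memory (\([0-9][0-9]*\) Mb is available).*/\1/p'
}

# Example (not run here); substitute the real location of iwtrace.log:
#   low_memory_mb < <path>/iwtrace.log | sort -n | head -1
```

The lowest figure printed shows how near the system came to the point where the server deactivates its stores.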

I have attached a zip file containing the output from iwstat and netstat both before and after restarting teamsite and opendeploy using the command:

usgcu024-/etc/init.d$./iw.server stop ; ./iw.server start ; ./iwtcboot stop ; ./iwtcboot start ; ./iwod stop ; ./iwod start

BENMAY

More news: Teamsite was again down this morning. Here's the end of the iwtrace.log file:

Your system is running low on virtual memory (119 Mb is available)
[Wed Oct 1 04:55:29 2003] Server will deactivate all stores if available memory falls below 100 Mb
[Wed Oct 1 05:57:31 2003] TEnvironment::Thaw
[Wed Oct 1 05:57:31 2003] TEnvironment::Thaw
[Wed Oct 1 09:25:20 2003] TEnvironment::Flush flushed 0, unlinked 0
[Wed Oct 1 09:25:20 2003] TEnvironment::Flush flushed 0, unlinked 0
[Wed Oct 1 09:25:20 2003] TEnvironment::Freeze freeze until Thu Oct 2 09:25:20 2003
[Wed Oct 1 09:25:21 2003] TEnvironment::Freeze freeze until Thu Oct 2 09:25:20 2003
WARNING: Received ABORT message from wfs request.
WARNING: Received ABORT message from wfs request.
WARNING: Received ABORT message from wfs request.
WARNING: Received ABORT message from wfs request.
[Wed Oct 1 09:27:30 2003] TEnvironment::dtor found 0 dirty.
[Wed Oct 1 09:27:31 2003] TEnvironment::dtor found 0 dirty.
EXITED: Wed Oct 1 09:27:42 EDT 2003

and the results of iwstat -c:

usgcu024-/$ iwstat -c
ERROR:00930: RPC connection failure

herald10

As I have already told you, this is the problem I faced some time back. If you have too many servers to install the latest service pack on all of them, at least do it on the base server. Hopefully that will resolve the issue.

Adam Stoller

You're clearly having problems with memory on your TeamSite server - which should be addressed as soon as possible.

Whether or not this was also affecting your OpenDeploy server (likely but not necessarily) remains to be seen, but until you take care of the memory problems you're not going to be able to tell much.

--fish
Strolling Prime Minister of no fixed address

Migrateduser

I think it's plausible that, if you are having memory issues, you will have trouble with any Java program that needs a JVM. This would include TeamSite and the OD Admin GUI.

I would definitely focus on the memory problems first.

lissa