Can't say any solutions come to mind, but I would ask these follow-up questions:
1 - Is this only happening on one box? Do you have multiple boxes with the same setup?
2 - During times of UI slowness, which process is running hot? iwserver, jboss, or something else?
3 - Is turning off Samba for a couple of days an option? You could try that and see if the issue persists.
4 - How big/old is your content store? Have you ever done any cleanup, such as removing ancient editions?
5 - Have you run an iwfsck?
6 - Have you done any performance tuning in iw.cfg, whether or not with help from support?
Did you get clarification on what versions of TS are affected, whether this is resolved in 7.4.1(.1), and what conditions trigger the problem? I've got TS 7.3.2 running on Linux and haven't come across those issues.
An update on this. The build 360 version of iwserver.linux did not solve this. Nor did a subsequent install of UVFS 2.0.7 so that we could use its driver files, which was also recommended by support for better performance.
However, when the Unix team turned off snapvault for one night, there were no problems - the TeamSite mounts stayed up and I could work in the TS UI fine. Go figure.
Has anyone else seen TeamSite having an issue with snapvault?
I am now involved in a war of words with our particularly sensitive Unix team on whose problem this is - TeamSite or the keepers of the servers. It would help greatly if anyone else has had some experience of this.
Cheers
HF
Hi there,
We're running TeamSite 7.3.2 pretty stably over here. We're using Red Hat 6.5 64-bit with the latest uvfs drivers, connected to an external Oracle DB for the TeamSite messaging. IMPORTANT: TeamSite Search is installed on a separate server. Make sure you do that! Red Hat is installed on a VMware instance; the OS runs on a virtual disk, but the content store is on an ext3 raw device connected with fibre to a SAN.
The important things to get right here are:
- make sure TeamSite Search is not installed on the same server
- make sure enough inodes are assigned to the filesystem where the store is located
- make sure you have the latest uvfs drivers, properly compiled for the OS
- make sure throughput to the disk where the store is located is as high as possible (quick checks for this and the inode point are sketched below)
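For the inode and throughput points, these are quick spot checks (the mount point is a placeholder - point the commands at wherever your store's filesystem is mounted):

    # inode usage on the store's filesystem - IUse% near 100 means trouble
    df -i /path/to/store

    # rough sequential write throughput to the same disk (writes and removes a 1 GB test file)
    dd if=/dev/zero of=/path/to/store/ddtest bs=1M count=1024 conv=fsync
    rm -f /path/to/store/ddtest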
What you can do for now (and how we discovered latency problems in the past):
Create a small shell script that measures the time it takes to create a few thousand files and delete a few thousand files, and let it run in a recurring loop. You can use 'time' to see how long each run of the script takes. If the result changes according to whatever change you make in your infrastructure, you will have caught the culprit.
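A minimal sketch of such a script (the test directory, file count and interval are placeholders - point it at the filesystem that holds your store):

    #!/bin/bash
    # Rough latency probe: repeatedly time creating and then deleting a few
    # thousand small files, so runs can be compared before/after infrastructure changes.
    TESTDIR=/path/to/store/latency-test   # placeholder - use a directory on the store's filesystem
    COUNT=2000
    mkdir -p "$TESTDIR"
    while true; do
        echo "=== $(date) ==="
        time for i in $(seq 1 $COUNT); do touch "$TESTDIR/f$i"; done
        time rm -f "$TESTDIR"/f*
        sleep 60
    done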
Hope this helps.
Greetings,
De Smet Koen - Belgacom
Hi Koen,
it's me, Peter!
We have tried moving the data store to another NetApp filer.
I have uninstalled TS Search from the TS server, and I will install it on another server I have set-up, once we get through these problems.
I've put in uvfs2.0.7 and we are now using its drivers, and I also tried the sync_content_writes switch in iw.cfg (and took it out again when it didn't work).
The latest change is that we have moved the iw-store mount definition from autofs to fstab, so we will see how that goes.
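For the record, the change looks roughly like this - the filer name and export path below are placeholders rather than our real ones, and your NFS options may differ:

    # before: indirect autofs map entry (in the map referenced from auto.master), now removed
    #   iw-store   -rw,hard,intr   filer01:/vol/iwstore

    # after: static mount in /etc/fstab
    filer01:/vol/iwstore   /iw-store   nfs   rw,hard,intr,bg   0 0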
Yes I think I do need to check general server response times. I had been concentrating on TeamSite or mount responses such as iwstat or df -k, because I had allowed myself to be convinced that was where the problem lies. What I notice is that the first sign of a slowdown is when iwstat or df -k are slow to return, followed by a buildup of TeamSite processes, followed by an increasing load (iwstat and top). I need to prove or disprove that the Linux server as a whole has slowed.
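One way to check that would be to log timings of a TeamSite command and a plain OS command side by side, something like the sketch below (assumes GNU time at /usr/bin/time; the log path and interval are arbitrary):

    #!/bin/bash
    # Log how long iwstat and a plain df -k take once a minute, so a TeamSite-only
    # slowdown can be told apart from the whole server slowing down.
    LOG=/var/tmp/ts-response.log
    while true; do
        {
            echo "=== $(date) ==="
            echo "--- iwstat ---"; /usr/bin/time -p iwstat > /dev/null
            echo "--- df -k ---";  /usr/bin/time -p df -k  > /dev/null
        } >> "$LOG" 2>&1
        sleep 60
    done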
PJ
A further update and query.
Since we have moved the mount definition for iw-store from autofs to a hard static mount via fstab, we seem to be having fewer performance issues - fewer slowdowns or freezes.
I don't see anything in the documentation on autofs versus fstab, so I was wondering: Has anybody else found they needed to change from autofs to fstab (or vice versa)?
Hey Peter,
Good to hear from you. Thanks again for your recommendation on Link.
Good to hear you kind of sorted it.
AutoFS is usually a bad idea. As a matter of fact, every sentence that starts with 'automatic' is. Pronouncing the word should set off an alarm. What I usually call it is 'automatic catastrophe'.
Do check these too:
- make sure you're on an ext4 filesystem: ext2/3 has a limit on subdirectories (around 32,000 per directory)
- do check you've got enough inodes (see http://unix.stackexchange.com/questions/26598/how-can-i-increase-the-number-of-inodes-in-an-ext4-filesystem) - both of these points can be checked with the commands sketched below this list
- test for yourself how much time it takes to write a few thousand files, and compare it between systems with different specs => this is how we discovered a problem with a fibre card cable in the past.
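Quick ways to check those first two points (the mount point and device below are placeholders):

    # which filesystem type is the store actually on?
    df -T /path/to/store

    # total vs. free inodes on the underlying device
    tune2fs -l /dev/sdX1 | grep -i inode

    # note: on ext3/ext4 the inode count is fixed when the filesystem is created,
    # so if you are short the fix is a rebuild with a smaller bytes-per-inode ratio,
    # e.g. mkfs.ext4 -i 8192 /dev/sdX1 (destructive - only on a new/empty device)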
Hope that helps.
Koen
I missed your response coming in unfortunately.
Changing to fstab certainly reduced the length and magnitude of the slowdowns, which became more momentary but still occurred - in particular during three nightly timeslots at 6:10, 9:10 and 10:10.
However, I think I have fixed these, and the cause/solution is a little bizarre, so I will post it here in case it helps anybody, or in case anyone can shed more light:
We have users (people and non-people) accessing TeamSite via Samba and via other mounts of TeamSite on remote servers as well. At times of slowdown I would often see users like '801' or '820' with iwstat or iwrecentusers. I then got a tip from HP-Aut Support that pstacks of iwserver.linux taken during slowdowns showed users not being found in the local cache, so a lookup was being performed. When I looked, I saw getpwuid calls in the pstacks when it was slow, and not when it wasn't.
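If anyone wants to do the same spot check, it is roughly this (assumes pstack, which ships with gdb on Red Hat, is installed):

    # grab a stack snapshot of iwserver.linux while things are slow
    PID=$(pidof iwserver.linux | awk '{print $1}')
    pstack "$PID" > /var/tmp/iwserver.pstack.$(date +%H%M%S)

    # count stack frames sitting in a user lookup - for us this was non-zero only during slowdowns
    grep -c getpwuid /var/tmp/iwserver.pstack.*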
So I made fake TeamSite users out of all the phantom remote users I saw when running iwstat and iwrecentusers. Some were in the NIS passwd, and others were just local users on remote Linux servers that mounted TS. But I added them all to the TS server's passwd file and then made them TS users, so that they would be in the TS user/group cache and not need to be looked up with the getpwuid function. In effect that is tricking the system, but it worked. Specifically, the nightly slowdowns at 6:10, 9:10 & 10:10 stopped, and I could almost tell which users were responsible for those individual events ('cfusion' and 'httpd' users on remote servers).
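The passwd half of that, sketched for one phantom user (the name/UID pairing here is made up for illustration - substitute whatever you actually see in iwstat/iwrecentusers):

    # add a local account for the phantom user if it isn't already in /etc/passwd
    grep -q '^cfusion:' /etc/passwd || useradd -u 801 -M -s /sbin/nologin cfusion

    # then add the same account as a TeamSite user through the normal user admin step,
    # so it lands in the TS user/group cache and getpwuid never has to go off-box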
We are still scratching our heads as to why lookups of non-local users should have such a drastic effect on the system, but it appears this is what was happening and it was backing up processes and increasing the load so that TS froze for a while each time.
Touch wood, but hopefully our performance issues are over.