Check Pending Context took 21 minutes?

TS->LSCSAT 7.2.1 (with the mega patch) on Linux

The import of 3 page files (and about 15 dependencies) took over 20 minutes. From the LSCS log you can see the time was spent looking for a pending context.

03-08-13 @ 13:09:20 [INFO ] UtilCommandRequestHandler - Get pending context for a project
03-08-13 @ 13:40:24 [INFO ] UtilCommandRequestHandler - No pending context found for project….

What the heck is it doing then? Nothing else was in process on the server, and it has not happened since. Do I assume it won't happen again, or is this something I should be concerned about? I went through all the LSCS logs (and Idol logs) and nothing else shows anything. None of the Idol logs show an update until 13:40, after the "No pending context found" message.
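
For reference, the gap between the two lscs.log lines quoted above can be worked out with a short script. This is only a sketch: the `MM-DD-YY @ HH:MM:SS` layout is an assumption read off those two lines, and the helper names are made up for illustration.

```python
from datetime import datetime

# Timestamp layout as it appears in the quoted lscs.log lines.
# The month-day-year order is an assumption; it does not affect the
# elapsed time here because both lines carry the same date.
LSCS_TS_FORMAT = "%m-%d-%y @ %H:%M:%S"


def parse_lscs_timestamp(line):
    """Return the datetime found at the start of an lscs.log line."""
    # Everything before the first " [" (the log level) is the timestamp.
    stamp = line.split(" [", 1)[0].strip()
    return datetime.strptime(stamp, LSCS_TS_FORMAT)


def gap_seconds(first_line, second_line):
    """Seconds elapsed between two lscs.log lines."""
    delta = parse_lscs_timestamp(second_line) - parse_lscs_timestamp(first_line)
    return delta.total_seconds()


start = "03-08-13 @ 13:09:20 [INFO ] UtilCommandRequestHandler - Get pending context for a project"
end = "03-08-13 @ 13:40:24 [INFO ] UtilCommandRequestHandler - No pending context found for project"

elapsed = gap_seconds(start, end)
print(f"{int(elapsed // 60)} min {int(elapsed % 60)} s")  # prints "31 min 4 s"
```

Running the same helper over a whole lscs.log would be one way to spot other long pauses between a "Get pending context" line and its result.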

Comments

Bowker

On occasion we have long delays (not that long) in our deployments. However, we've never seen anything like this.

Where did you see that log information? (Which server and what path?)

nipper

That was out of the lscs.log.

Nothing else was updated, as far as I can tell.

Adam Stoller

Another tidbit of information - in the ODHOME/od.log - we're seeing a lot of messages like this:

2013-03-08 15:41:30.015 GMT-0500 odLog [ReThread-0] - IWDeploySock::ReadLineWithLength ERROR: Failed to read length of data buffer from socket.
2013-03-08 15:41:30.015 GMT-0500 odLog [ReThread-0] - IWDeploySock::ReadLine ERROR: Failed to read (-1) bytes of data from socket.
2013-03-08 15:41:30.016 GMT-0500 odLog [ReThread-0] - IWDeploySock::ReadLongLine ERROR: No response data from remote.

Deployments do seem to be going through, but I cannot say I like seeing those messages, and they might be tied to the length of time it seems to take to do the odAdapter deployment for LSCS.

Rick Poulin

What the heck is it doing then ?

For some DB vendors (namely Oracle), and depending largely on version and configuration, a query will sometimes block if another session is currently updating that table. So it's conceivable that another deployment was running that wrote to that table but didn't commit for a full 30 minutes while it was doing whatever it does, or that the connection dropped altogether and your DB's session timeout is 30 minutes.

Another tidbit of information - in the ODHOME/od.log - we're seeing a lot of messages like this.

In my experience, that's often symptomatic of a firewall/LB heartbeat on port 20014 (or a user checking connectivity via telnet), when the input stream is completely garbage or empty, so I highly doubt it has anything to do with your LSCS issues...

Adam Stoller

[OD errors in od.log]
In my experience, that's often symptomatic of a firewall/LB heartbeat on port 20014 (or a user checking connectivity via telnet), when the input stream is completely garbage or empty, so I highly doubt it has anything to do with your LSCS issues...

This is definitely possible. I wrote a simple CGI script to check the status of all of our LSCS servers as well as all of our OpenDeploy senders/receivers; for the latter it does a simple telnet to port 20014 to make sure the connection doesn't error out. I thought I was careful about making sure that it terminated the telnet probe gracefully, but I'll have to take another look at that. If it already does and the messages still appear, I may have to do something else (or just remove it, which would be a shame because it's really nice having a single URL to check the basic status of all the servers).

Rick Poulin

careful about making sure that it terminated the telnet probe gracefully

Whether you terminate it gracefully or not, OD will still complain about having received an empty input stream. Still, it's just a warning in the logs, so your script might be worth keeping so long as everybody understands that the message is almost certainly innocuous.
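
For anyone wanting a similar status check, the kind of probe discussed here can be sketched as a plain TCP connect, which is roughly what a telnet to port 20014 does. This is not the CGI script from the thread; the function name and the timeout default are assumptions for illustration. As noted above, a connect that sends nothing will likely still produce the "Failed to read" lines in od.log.

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout.

    The socket is closed cleanly as soon as the connection succeeds;
    no data is sent, so the receiving side sees an empty input stream.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, DNS failure, timeout.
        return False


# Example use (hypothetical host name):
#   port_is_open("od-receiver.example.com", 20014)
```

A status page could call this once per sender/receiver and report only the True/False result.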

Prashanth Shetty

Did you find out what was causing this delay? We are experiencing this issue now.

nipper

Yes. I've seen this multiple times at different clients.

LSCS keeps the DB connection open continually. Some newer firewalls, routers, or other network toys kill the connection after hours of inactivity. LSCS then tries to write on this now-closed connection and never gets a reply. After 20 minutes, it reconnects and runs.

I get around it by adding activity. I have a script that runs every 10 minutes that runs curl and gets the context. The DB connection never times out and no delays happen.
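
A rough Python equivalent of that curl keep-alive might look like the sketch below. The thread does not give the exact URL that "gets the context", so the URL is left to the caller, and the function name is hypothetical.

```python
import urllib.error
import urllib.request


def touch_lscs(url, timeout=30.0):
    """Issue one GET so LSCS exercises its DB connection.

    Returns the HTTP status code, or None if the server could not be
    reached at all (refused connection, DNS failure, timeout).
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # drain the body so the request fully completes
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, just not with a 2xx status.
        return exc.code
    except (urllib.error.URLError, OSError):
        return None


# Example use (placeholder URL, not a real LSCS endpoint):
#   touch_lscs("http://lscs.example.com/...")
```

Scheduling it every 10 minutes, for example from cron with `*/10 * * * *`, keeps the interval well under an idle timeout measured in hours.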

Prashanth Shetty

Thanks Andy, I see what you're saying. I've seen similar issues with network settings before.

Prashanth Shetty

Looks like the firewall was terminating the connection.

Support provided a solution to this issue.
1. Open the /runtime/webapps/lscs/WEB-INF/context/persistence-context.xml file for editing.
2. Find the "dataSource" bean and add the following property values (the full section is in the attached conf.txt):

5
true
600000
600000

3. Restart LSCS.

This seems to be working: we are now removing idle connections before the firewall terminates them.

Prashanth Shetty

attached config section

conf.txt

nipper

Cool. When this first happened in 7.2.1 I didn't have the ability to remove the connections; glad they added it.