Event Manager doesn't process eWait records

We're setting up several new environments for our first live deployment, and we're running into a frustrating issue in several of them.  It shows up in both version 9.2.0.0 and 9.2.1.1.  These are completely fresh environment installations, with split engine and web servers.

Everything works fine (deployment, user actions, loading custom lists and forms, etc.) except for eWait table processing.  After the engine arrives at a stage with conditional actions, the eWait table is populated with records whose timestamps are in 1970.  There is then a delay of anywhere from 30 seconds to 168 seconds before the eWait table is queried and the process moves forward.  "168 seconds" is not a random number; the majority of the time, that is exactly how long it takes between eWait records being added and eWait events being processed.  When the events do eventually process, they process correctly.

These conditional actions may or may not have conditions on them, but the conditions are always simple comparisons against the database.  The only folders in progress on these servers come from a test project that does some very basic branching and conditional actions with no conditions (which are also delayed).

Tracing the database calls doesn't show any long-running SQL calls.  The behavior is the same whether I configure the engine as on-demand or with 1-second polling (and eActiveEngine confirms the setting changes as normal).  The behavior is also the same if we point multiple engines at the database.

This hasn't happened in all the environments we've configured, but we can't find any major differences between them.  It's the designated production environment that's giving us trouble.

Help?


Comments

  • I have seen this behaviour in environments where MSDTC has not been configured correctly. Check that DTC is working between your engine and DB platforms (DTCPing can be a useful tool). With Windows Server 2008 you may need to add firewall rules if you haven't already.
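    Besides DTCPing, a quick way to check from the engine box is to probe the ports directly. Below is a rough sketch (the host name is a placeholder for your own DB server, and note DTC also negotiates a dynamic RPC port range beyond 135, which this doesn't cover):

```python
import socket

# Placeholder host name; substitute your actual database server.
DB_HOST = "sqlserver01"

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# DTC needs the RPC endpoint mapper (TCP 135) plus dynamic RPC ports;
# SQL Server itself only needs 1433, which is why plain queries still work
# even when DTC enlistment is timing out.
for port in (135, 1433):
    state = "open" if port_open(DB_HOST, port) else "blocked/closed"
    print(f"{DB_HOST}:{port} -> {state}")
```

    If 1433 is open but 135 is blocked, that matches the symptom here: ordinary SQL traffic succeeds while DTC enlistment stalls until it times out.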

  • Thanks.  It appears this is indeed the case: we did not open firewall ports for DTC (in our network firewall, as opposed to the Windows firewall), and the DTC logs show timeout errors at exactly the intervals at which we're seeing delays in the environment.

    What's confusing us now is that we have another group of servers that works fine and ostensibly has the same firewall rules.  When we traced the connections from the engine to the database on the failing environment, we saw blocked attempts to port 135, which was expected, as that's the RPC endpoint mapper port that DTC uses.

    However, when we traced the working environment, we didn't see any calls to 135, or indeed to any ports other than 1433 for the SQL Server.  So things worked, but only because the connections didn't seem to enlist in DTC transactions.

    Any ideas there?