eRaisedFlag causing performance issues. HELP!

Hi all,

 

I'm after some advice on flagged data.

I have two processes linked together by Flag actions...

 

* The end user updates one process which in turn raises a flag in a separate process.

The flag is applied via a loopback on a common action applied to multiple stages, with the "Any Folder" option.

 

This impacts the web server massively. I understand that using "Any Folder" will do this, but I've tried "This Folder" and it appears to have a similar effect on the server. Please can somebody help?

 

Whilst testing these processes on a development server I have no issues (lack of data).

Comments

  • This is one of my favourite bug-bears in Design. It is poorly, if at all, addressed by Metastorm's documentation or Training. We devote a special section to it in our own training.

     

    You can start with our free Metastorm BPM Developer Course (free as it is now past its sell-by date), but this is also documented in our book, the Metastorm BPM Developer's Guide, and originally in our 'Optimising e-work' guide written in 2001 (online somewhere, no idea where right now).

  • BTW, solving it when already in place requires some artful SQL. If you get it wrong, a mess will result.

     

    We can provide that service, PM or email me if you need it.

  • Hi Jerome,

    As always, many thanks for your response. I will take a look at the links provided; hopefully this can be sorted.

    Thankfully the process is in live yet is still in beta testing by myself.

     

    I'm trying to use "This Folder" but the flag fails to process.

     

    Once eRaisedFlag starts to bottleneck, do you recommend any method to resolve it? Our software supplier just restarted the BPM Engine.

     

     

  • There is not much point restarting, as it all just goes off again. To stop it going you can delete relevant eRaisedFlag entries, but that will stop the execution of the action.

    There is another problem, and it is a major bug in the product IMO. This is that if you have many folders waiting on one of these flagged actions (responding to 'any folder'), and ONE of these Folders fails to trigger because it is locked by a user, the flag is queued for re-triggering. Unfortunately this triggers EVERY Folder yet again!

    This was discovered as a major source of performance hit in a system we built. We only discovered this by accident. Seriously flawed functionality, I think.

    The solution in all cases is to get the list of folder ids, and raise the flag for each folder AS THAT FOLDER, and have the flagged action respond to 'this folder'.

  • Jerome is accurate in how this is working...

     

    You have to get your head around the difference between raising a “flag” and a “flagged action” being performed.

     

    You can raise a flag at any time, and if nothing is waiting for it then nothing happens. The “flag” does not know that this is the case; it either gets successfully raised or not, and that is the end of the flag's story...

     

    “Flagged actions” can be waiting in one of three modes, "Any Folder", linked to a specific "Folder ID" or for an "initial action" (thus not tied to a specific folder).

     

    In the "Any Folder" mode all the system cares about is the flag name. When a flag is raised, any folders waiting for that same flag name with the "Any Folder" setting will react. They will check that their “Only Start Action If” conditions return true and then try to complete the action for each folder.

     

    When linked to a specific folder ID then the flagged action is effectively listening for both a specific flag name and a specific folder id to be passed with any raised “flag”. Only when both conditions match will any specific instance of a flag action react and attempt the action.

     

    Only when you have a large number of folders resting at a stage, or the same flag applied to multiple stages, can you see the difference between these two methods. If you have 1000 folders waiting for a flagged action in the "Any Folder" mode, then when you raise the flag all 1000 folders react, check their conditions and attempt to complete the action. If you have 1000 folders waiting in the “linked to unique Folder ID” mode, then when you raise a flag with a specific flag name and folder ID only one folder reacts.
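To make the difference concrete, here is a minimal sketch of the matching rule described above. It is a toy model, not the engine's code: the `Wait` class, the `reacting_waits` function, the flag name "Refresh" and the folder IDs are all made up for illustration, and the targeted case assumes each folder waits on a flag raised as itself ("This Folder").

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Wait:
    """One waiting flagged action (hypothetical model, not the real eWait schema)."""
    folder_id: str              # folder resting at the stage
    flag_name: str              # flag the action listens for
    flag_folder: Optional[str]  # None models "Any Folder"


def reacting_waits(waits, flag_name, raised_folder_id):
    """Return the waits that react to a flag raised with this name and folder ID."""
    return [
        w for w in waits
        if w.flag_name == flag_name
        and (w.flag_folder is None or w.flag_folder == raised_folder_id)
    ]


folders = [f"F{i:04d}" for i in range(1000)]
any_folder = [Wait(f, "Refresh", None) for f in folders]  # "Any Folder" mode
targeted = [Wait(f, "Refresh", f) for f in folders]       # linked to a folder ID

print(len(reacting_waits(any_folder, "Refresh", "F0042")))  # 1000
print(len(reacting_waits(targeted, "Refresh", "F0042")))    # 1
```

In the first case every wait still has to evaluate its "Only Start Action If" condition; in the second only one wait is touched at all.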

     

    This becomes a problem when you have the flagged action in the first configuration, the "Any Folder" mode, but only want one folder to move when the flag is raised. You would likely configure an "Only Start Action If" condition on the action to prevent the other folders from moving. This seems logical, but it makes the engine do a lot of extra work, as it has to evaluate the "Only Start Action If" condition for all 1000 folders every time the relevant flag is raised. Depending on the complexity of the statement, evaluating the condition takes a finite amount of time, and as a single thread is processing the flag it does the evaluations in series. The time taken will therefore be "finite time" x 1000 in this case, and it is not hard to create a condition complex enough that the system can only deal with one flagged action every 30 seconds. 1000 x 30 seconds = 8 hours and 20 minutes to complete!!
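The arithmetic is easy to check. A small sketch, assuming strictly serial evaluation on one thread and a fixed cost per condition (the function name is just for illustration):

```python
from datetime import timedelta


def serial_evaluation_time(waiting_folders, seconds_per_evaluation):
    """Total time for one thread to evaluate every waiting folder in series."""
    return timedelta(seconds=waiting_folders * seconds_per_evaluation)


print(serial_evaluation_time(1000, 30))  # 8:20:00  ("Any Folder", 1000 folders react)
print(serial_evaluation_time(1, 30))     # 0:00:30  (targeted, one folder reacts)
```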

     

    Configuring the Event Manager to run with multiple threads can help with this, as it allows the system to process other system events (flags and timers) in parallel.

     

    However, then we run into the problem Jerome mentions. There was a time when, if a folder was locked for another action (a user action, for example) and a flag was raised, the flagged action would fail and the flag would simply be thrown away as if it had never happened. Popular demand insisted this was changed, and a mechanism was put in place where, if a specific folder is already locked when a flagged action is to be performed against it, the system does not try the action but instead arranges for the flag to be raised again after a period of time based on the eFolderLockTimeOut value (60 minutes by default). This new flag will have the same name as before but will be given a specific folder ID targeting the folder that was locked. The idea is that if the folder was locked and the user cancelled or abandoned the action (or it was a loopback action), then the flagged action would still be performed at the later time. If the folder has been moved and the flag is no longer relevant, then the raised flag would pass quietly with no impact.

     

    Going back to flagged actions configured with "Any Folder": they don't care about the folder ID passed with the flag and only listen for the flag name. So the flag being triggered again (because of the above) will trigger the evaluation of all those flagged actions again (even though there may be no folder to move). You then have the added issue that the more folders are evaluated as a result, the more likely it is that one or two will be locked for some other action, which creates more delayed instances of the flag. You can see how this could get out of hand, particularly as each flag instance could be picked up by a different Event Manager thread.
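A rough way to see how this gets out of hand is to count condition evaluations per retry round. The sketch below is a deliberately simplified model, not engine behaviour: it assumes every raised flag makes all "Any Folder" waits evaluate, that the same number of folders happens to be locked on every pass, and that each locked folder queues exactly one delayed copy of the flag. The function name is hypothetical.

```python
def simulate_requeue(waiting_folders, locked_per_pass, rounds):
    """Return the number of condition evaluations in each retry round.

    Simplifying assumptions: every flag instance triggers all waiting
    folders, and each locked folder met during a pass queues one more
    delayed instance of the flag for the next round.
    """
    flags_due = 1  # the original raise
    evaluations = []
    for _ in range(rounds):
        evaluations.append(flags_due * waiting_folders)
        flags_due = flags_due * locked_per_pass
    return evaluations


print(simulate_requeue(1000, 0, 4))  # [1000, 0, 0, 0]          nothing locked
print(simulate_requeue(1000, 1, 4))  # [1000, 1000, 1000, 1000] never dies out
print(simulate_requeue(1000, 2, 4))  # [1000, 2000, 4000, 8000] doubles each round
```

Even one locked folder per pass keeps the full evaluation repeating every timeout period; two or more makes the load grow.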

     

    The basic rule, if you are dealing with a large number of folders waiting on a flag, is not to use the "Any Folder" mode unless you really intend to action all or the majority of the waiting folders. You should find a mechanism where the flag is raised with a targeted folder ID and create your flagged actions linked to “folder Id”.  This will cause flags to be evaluated efficiently.

     

    If you have a system that was originally set up for the first example, “Any Folder”, and have then altered it to the “specific folder ID” style, it is very important to remember that existing folders will still have the old-style actions configured until the next time they are actioned/moved after the process has been updated. If you are replacing an "Only Start Action If" mechanism with one where you are passing in a targeted folder ID, then leave the "Only Start Action If" statement in place until the older folders have filtered through the flagged action. Otherwise, the next time the flag is raised any folders still in the "Any Folder" mode will react and, as there is now no condition, they will take the flagged action in question. More than once I have been asked how to move 10,000 folders back to the previous stage... :smileyhappy:

     

    How do you tell if a given flagged action is in the "Any Folder" mode? Look in the eWait table. The flagged actions are the ones with a value in the eFlagName column. If those rows have a blank entry in the eFlagFolder column they are in the "Any Folder" mode.

     

    However, this takes us on to the last flagged action type I mentioned at the start, the initial action type.  Please be aware that if you have any processes that are started by a flagged action there will be one flagged action listed in the eWait table for each of these maps.  This flagged action will have a blank eFlagFolder, but can be identified because it will also have a blank eFolderID column. These are supposed to be like this; they are entered into the eWait table when the map is published and remain there permanently.  It will break the start of your process if you change them.
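The check can be sketched as a read-only query. The example below runs against a tiny in-memory SQLite stand-in, not a real Metastorm database: only the table name eWait and the columns eFolderID, eFlagName and eFlagFolder come from the description above, the sample rows and flag names are invented, and since "blank" might be stored as NULL or as an empty string the query treats both the same. Adapt the SQL to your own database before relying on it.

```python
import sqlite3

# Toy stand-in for eWait; the real table has more columns than these three.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE eWait (eFolderID TEXT, eFlagName TEXT, eFlagFolder TEXT)")
conn.executemany(
    "INSERT INTO eWait VALUES (?, ?, ?)",
    [
        ("F0001", "Refresh", ""),       # flagged action in "Any Folder" mode
        ("F0002", "Refresh", "F0002"),  # flagged action linked to a folder ID
        ("", "StartOrder", None),       # initial flagged action: leave alone
        ("F0003", "", ""),              # not a flagged action (no flag name)
    ],
)

# Flagged actions waiting in "Any Folder" mode, excluding initial actions.
ANY_FOLDER_WAITS = """
SELECT eFolderID, eFlagName
FROM eWait
WHERE COALESCE(eFlagName, '') <> ''
  AND COALESCE(eFlagFolder, '') = ''
  AND COALESCE(eFolderID, '') <> ''
ORDER BY eFolderID
"""

rows = conn.execute(ANY_FOLDER_WAITS).fetchall()
print(rows)  # [('F0001', 'Refresh')]
```

Grouping the same rows by eFlagName would show which flags have the most folders waiting on them, which is usually where to start looking.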

     

    I would further add that this topic area is covered in the newer versions of our Foundation Training course.

  • Hi Pdenton.

     

    I appreciate your detailed responses..

    I’m still working on resolving this issue (luckily in a development environment).

     

    The reason I have got into this mess is that I’ve used the "Any Folder" setting with an "Only Start Action If" condition, also applied to multiple stages. I can see this being resolved by using "This Folder" where the flag originates from a parent process.

     

    I have a question regarding the below statement.

    "Basic rule is if you are dealing with large number of folders and flags, don't use "Any Folder" (unless you really intend to action all the folders).  Find a mechanism where the flag is raised with a targeted folder ID and created your flagged actions with folder IDs and things should perform fine."

     

    How do you avoid "Any Folder" and specify the eFolderID if the flag does not originate from the parent process, but instead comes from the child process back to the parent?

     

    I have tried this and the flagged action fails.

     

    Hope this makes sense.

    Dan

  • Have the flag being raised use "This folder" under Folder which must raise the flag....

    Then your child process does a %RaiseFlag(,%Parent,).

     

    This will have the child call the flag only for its parent process.  If you wanted to target another specific folder, you would just replace %Parent with the folder id of the target folder.

  • I assume this particular issue applies to V9 version as well?

    An age-old problem I first pointed out in my 'Optimising e-work' publication over ten years ago. And yes, it is still a problem in version 9. It is the first thing we look for when coming across new poorly-performing systems. It is also the most common mistake made.

     

    Also discussed in our v7 (and v9) training course:

    http://processmapping.com.au/freestuff/metastormbpm7developercourse/FlaggedActions.html

  • Jerome,

     

    I can confirm that this issue exists in V9 as we were able to see duplicate entries in the eRaisedFlag.

  • The problem is actually much worse than it first seems. This is because of another bug we have reported. I'll detail that first:

     

    When a flag is raised against many folders, and one folder fails to respond because it is locked, the flag is queued to be raised for all folders once again in the future. This is because the original eWait entry is used to queue the flag, not a modified entry that refers only to the locked folder. So if I have 5,000 folders to 'refresh' every night, and one is locked, the flag is raised for those same 5,000 folders again about 30 minutes later. If, when it is raised again, one folder is also locked, it gets queued again, and so on ad nauseam.

     

    We had fun finding this one, as it caused a massive performance hit on one of our customers' installations. The reason it is hard to find is that the flag raising is not really recorded anywhere.

     

    OK, so why is this a problem here? Well, if you have a flag to raise against 1,000 folders, but only one is supposed to respond, if ANY of those 1,000 folders is locked, the flag will be re-queued. Regardless of whether the flag is triggered for the correct folder, the flag will be raised again. And again. And again....

     

    What happens is that your design flaw is feeding on Metastorm's design flaw (or really a bug in my book) and creating a monster. I predict that 30% of all Metastorm performance issues may be related to one or other, or even both, of these issues. I am trying to address the second issue (although it still is not in Metastorm's training or documentation AFAIK, even 10 years after I pointed it out in version 5). I hope Metastorm will address the first.