News Release from: MatrikonOPC | Subject: Alarm management
Edited by the Processingtalk Editorial Team on 18 October 2007
Manage the process, not the alarm
A paper about process alarm systems, alarm management techniques, and the Matrikon approach, from Mark McTavish, Director of Alarm Solutions and Global Training at Matrikon in Canada
The information age has provided the means to get the right information at the right time. However, data has often been provided in the absence of knowledge, which has resulted in the proliferation of alarms in the control room.
This article was originally published on Processingtalk on 12 May 2008 at 8.00am (UK)
Context has been removed from events and alarms, creating an environment whereby alarms dictate the operator's daily routine.
The human capability is often exceeded, resulting in a poor operating environment with costly results.
Solutions are available to regain control of the alarm system, returning the alarm system to its place as a management-by-exception tool.
Leaders in the power industry are applying alarm management principles to better their operations.
The key is to manage the process and not the alarm, through alarm management.
* What is the Problem?
There once was a time when the operator monitored the process, watched key gauges and trends, and made small adjustments, maintaining production and preventing alarms from occurring.
When an alarm did occur due to an abnormal situation, the operator would jump out of their seat and walk up to the board, knowing what part of the process the alarm occurred in before even getting to the panel as the alarms were contextually positioned on the pneumatic panel.
If the operator had a history of responding to the alarm, he/she knew what the alarm was before even reaching the acknowledge button as the alarm always occurred in the same physical spot.
Upon reaching the panel the operator had the key indicators and historical trends available at a glance, controllers were in close proximity, and after the implementation of a corrective action the operator could watch not only the impact on the process variable in alarm, but the impact to the overall process as well.
All was good.
But could it be even better? With the implementation of microprocessor-based control systems or the DCS, we brought huge leaps in our ability to control processes, model, evaluate and improve efficiencies of our facilities, provide diagnostic capabilities for upsets, perform predictive maintenance, and reduce the cost of alarming.
Why yes, it could be better.
So why do we have pages of unacknowledged alarms, pages of standing alarms, missed alarms, sequence-of-event logs that are almost impossible to analyze, and messages so cryptic that no one understands them - messages like "Stator Temp Decrease High", whatever that really means?
Why have our operators turned off the alarm summary and permanently silenced the horns? Why do some operators no longer monitor the process, making adjustments to prevent alarms from occurring? Why have we come to the point where we are managed by the alarm and we do not manage the process? How can the alarm system support the Operational Incident Manager (OIM) during a crisis?
An even larger question: How do we recover from this situation and start to manage the process again?
Let us first look at how we got into this mess.
Understanding that helps us evaluate our recovery from what may appear to be the great alarm abyss.
* Good Intention.
Looking back, we can remember certain key phrases and trends.
The computer, the PC, the DCS, the information age, client-server architectures, mainframes, Fortran.
It was a new vocabulary and that new vocabulary required a new education.
We had new young electrical engineers and computer scientists suddenly writing and implementing control logic.
Few of these individuals had worked in process facilities and even fewer had operational experience.
They were ambitious, talented, ready to prove to the world both their own worth and the worth of the new age, the information age.
With hard work and great determination, these individuals implemented great systems that allowed us to historise our process variables, without the ink running dry.
They implemented control strategies that would fill this room with pneumatic tubing.
They created graphics that mimicked the physical layout of the facility.
They were great systems.
Great, for everyone but the operator.
In many cases the operator was not involved in the design or implementation, and when they were involved it was so strange and different the operators did not even know how to respond to the enthusiasm of the young programmers.
So with good intention these young developers gave the operators every piece of information they could ever want.
And they alarmed it while they were at it.
The alarms were free, how could the operator not want them? Here begins the road to too many alarms.
Inadvertently, we made a couple of other mistakes.
Some due to our inexperience; some due to technical limitations of the DCS systems of the day.
We removed the geographical location of alarms.
Literally we took all the alarms we ever had, multiplied them, and threw them into a bucket.
To the operator it appeared like they occurred at random.
Due to limitations of the graphical user interface, we truncated the messages into strings of text nearly impossible to decipher.
The youth of today are proud of their text messaging abilities.
They are neophytes when compared to the early DCS programmers.
We took the diagnostic information away from the operator.
At least it seemed that way, for the gauges and trends were no longer located near the alarm and so the operator lost the context of the alarm.
With good intention, we instantly made the life of the operator a life of....
Well, you fill in your own thought.
What was the result? Within a short period of time the operator began spending their day acknowledging alarms and not monitoring the process.
A decade later, we had a flurry of activity documenting best or standard operating practices; we went for ISO certification, etc, and documented the very things that got us into trouble in the first place.
Enter a time of downsizing, right sizing, reengineering, early retirements and all that is left are new engineers following our standard practices.
The problem lives on.
Enter some Human Factors Research, some major incidents that catch worldwide attention, and suddenly all this good intention is a problem - the very thing the operators said the day we installed the system.
Then comes EEMUA 191 and now we have awareness.
Plant managers begin to acknowledge that problems do exist and want to know how their facilities measure up.
* Fixing the problem.
The very first thing about fixing the problem is acknowledging that we actually have a problem.
This is not a technical thing, it is a personal thing; if you do not believe you are overweight, why would you start to diet?
Understanding the problem is the second thing to do.
When one looks at how we got into an alarm management problem, one begins to understand that the solution is bigger than just inhibiting a bunch of alarms; it enters the entire realm of the operator and how monitoring and diagnostic information is presented to the operator - the operator interface.
It involves colour selection on graphics, graphical layout meant for operators rather than a duplication of the Process and Instrument Drawing.
It involves management processes and can literally involve changing the culture of an organization.
To understand the solution, let's look at what leading organizations are doing in the world of alarm management.
* The Path of Redemption Enlightenment.
At Matrikon we have observed many successful - and unfortunately some unsuccessful - alarm management endeavours.
The successful efforts have had some common characteristics: First, the organizations did not view alarm management as a one-time project, but as a life cycle.
This view is very similar to quality management and safety management efforts.
A one-time effort will not put the systems in place to ensure long-term success of alarm management, and one-time efforts seldom go beyond the reach of the team assigned to the task.
Therefore success is ultimately doomed as team members leave and the alarm system returns to its old norms.
Secondly, alarm management is all about the operator and the operational environment.
Successful alarm management is seldom achieved without operator participation.
Successful organizations assign knowledgeable operators throughout the life cycle, even though the operator is the hardest person to free up.
Thirdly, clear roles and responsibilities are understood by the successful organization.
The unsuccessful organization tends to charge the DCS engineer with the solution.
After all, he or his predecessor created the problem; let them clean it up.
Alarms are part of the safe work environment and as such an alarm management programme becomes very akin to a safety programme.
Successful organizations acknowledge and understand this, and as such, put into place the management systems to ensure a good alarm management system.
* Managing Expectations.
Once you have the traits for being successful, you need to be clear on the expectations of what can be achieved.
Managing expectations also depends on where you are starting.
Assuming that a facility has too many alarms from the perspective of all the key performance indicators (KPIs) in EEMUA 191, let us look at what is conceptually possible.
By simply examining an alarm and event archive, a 5% reduction in the alarms the operator sees on an hourly basis might be achievable.
These are the alarms that chatter and jump right out at you when you look through the archive.
If you store the alarm and event data and have analysis tools such as the Matrikon analysis software, then you can obtain about a 50% reduction in the total number of alarms the operator sees without performing rationalisation.
That is relatively good news.
With little effort, chattering alarms and redundant alarms can be identified and fixed.
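As a rough illustration of how chattering alarms can be picked out of an archive, the following Python sketch flags any tag that activates repeatedly within a short window. The event format, the tag names, and the threshold of three activations in one minute are assumptions made for the example, not values taken from any Matrikon product or guideline.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def find_chattering_tags(events, window=timedelta(minutes=1), min_repeats=3):
    """Return tags that raised at least `min_repeats` alarms within `window`.

    `events` is an iterable of (timestamp, tag) pairs from the archive.
    The default thresholds are illustrative assumptions only.
    """
    times_by_tag = defaultdict(list)
    for timestamp, tag in events:
        times_by_tag[tag].append(timestamp)

    chattering = set()
    for tag, times in times_by_tag.items():
        times.sort()
        # Slide over the sorted activations looking for a dense burst.
        for i in range(len(times) - min_repeats + 1):
            if times[i + min_repeats - 1] - times[i] <= window:
                chattering.add(tag)
                break
    return chattering

# Hypothetical archive extract.
t0 = datetime(2007, 10, 18, 8, 0, 0)
archive = [
    (t0, "TI-101.HI"),
    (t0 + timedelta(seconds=10), "TI-101.HI"),
    (t0 + timedelta(seconds=25), "TI-101.HI"),
    (t0 + timedelta(minutes=5), "PI-200.LO"),
    (t0 + timedelta(minutes=40), "PI-200.LO"),
]
print(find_chattering_tags(archive))
```

A filter of this kind only finds the noisy tags so that deadbands, delays, or instrument faults can be fixed.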
However, this does not address the problem of whether an alarm should even exist in the first place.
To address the inherent value of each alarm, you must systematically re-engineer each alarmable tag.
There may be tags in the DCS that do not have alarms configured, yet these may be the devices that would give the operator a better indication of the problem.
This systematic review is commonly referred to as alarm rationalisation or documentation and review.
In performing rationalisation you first apply the definition of an alarm: 'An audible or visible means of indicating to the operator an equipment or process malfunction or abnormal condition requiring a response'.
If the alarm passes the test of requiring an operator action and being unexpected, then the tag is further re-engineered to design the correct alarm threshold, priorities, etc.
It is here that a Master Alarm Database becomes a necessity.
The audit trail, tracking the implementation of rationalised tags and comparison to DCS configured values, becomes very difficult without a properly designed database.
At this point the law of diminishing returns starts to come into play, yet the next step to dynamic alarm management is where a properly designed alarm system starts to shine.
This is where the relevance of the alarm to the operating state of the plant and/or equipment is examined and the alarm is treated differently depending on that state.
One of the major problems for the power industry is burst rate of alarms.
It is not unusual to see thousands of alarms occur during a unit trip.
In dynamic alarm management you start by suppressing the alarms that one expects during an equipment trip by using logic in the DCS.
Then you enable new alarms for events that would be abnormal during the equipment trip.
Predefine the plant states such as offline, online, startup and shutdown, and then configure alarms for each of those states.
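The idea of predefined plant states, each with its own set of relevant alarms, can be sketched as a simple lookup. This is only a conceptual illustration in Python; in practice the logic lives in the DCS, and the alarm names and state assignments below are hypothetical.

```python
from enum import Enum

class PlantState(Enum):
    OFFLINE = "offline"
    ONLINE = "online"
    STARTUP = "startup"
    SHUTDOWN = "shutdown"

# Hypothetical configuration: which alarms are enabled in each state.
ENABLED_ALARMS = {
    PlantState.ONLINE: {"LUBE_OIL_PRESS_LO", "STATOR_TEMP_HI", "FLAME_LOSS"},
    PlantState.STARTUP: {"LUBE_OIL_PRESS_LO", "FLAME_LOSS"},
    PlantState.SHUTDOWN: {"LUBE_OIL_PRESS_LO", "TURNING_GEAR_FAIL"},
    PlantState.OFFLINE: {"TURNING_GEAR_FAIL"},
}

def should_annunciate(alarm, state):
    """True if `alarm` is relevant to the operator in the given plant state."""
    return alarm in ENABLED_ALARMS[state]

# A flame-loss alarm matters while running but is expected when offline.
print(should_annunciate("FLAME_LOSS", PlantState.ONLINE))
print(should_annunciate("FLAME_LOSS", PlantState.OFFLINE))
```

Keeping the per-state sets in one table makes it easy to review, state by state, which alarms are suppressed and which new ones are enabled.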
Software such as the Honeywell "Alarm on Demand" module, the Ovation Deluxe Points and Yamatake transitional software provides means for the operator to select planned plant transitions and reconfigure the alarm system for the new state.
At the same time, intelligent alarming starts to be employed to combine several alarms and process variable states that are not yet in an alarm state to produce an alarm that provides diagnostic support to the operator regarding the cause of the abnormal condition, rather than mere feedback on symptoms of the situation.
Finally predictive alarming looks at multivariate analysis and trends process variables forward to determine what will be happening.
In this way a single alarm can be generated allowing the operator to intervene prior to the unit trip and associated alarm flood.
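To make the idea of trending a variable forward concrete, here is a deliberately simplified Python sketch. It fits a straight line to a single process variable and estimates how long it will take to reach a trip limit; real predictive alarming is multivariate, so this single-variable version, the sample data, and the function name `time_to_limit` are assumptions for illustration only.

```python
def time_to_limit(samples, limit):
    """Estimate seconds until a rising variable reaches `limit`.

    `samples` is a list of (seconds, value) pairs. A straight line is fitted
    by least squares and projected forward. Returns None if the trend is
    flat or falling, and 0.0 if the latest value is already at the limit.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in samples)
    if sxx == 0:
        return None
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / sxx
    last_t, last_v = samples[-1]
    if last_v >= limit:
        return 0.0
    if slope <= 0:
        return None
    intercept = mean_v - slope * mean_t
    crossing = (limit - intercept) / slope
    return max(crossing - last_t, 0.0)

# Hypothetical bearing temperature rising 2 degrees per minute, trip at 120.
samples = [(0, 100.0), (60, 102.0), (120, 104.0), (180, 106.0)]
remaining = time_to_limit(samples, 120.0)
if remaining is not None and remaining < 600:
    print("Predictive alarm: trip limit expected in about", round(remaining), "s")
```

The 600-second lead time in the usage example is arbitrary; the point is that one early alarm replaces the burst that would follow the trip.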
* Alarm Management Steps.
Upon determining targets and establishing expectations, successful organizations have consciously followed certain steps in their alarm management lifecycles.
These are:
1) Alarm Philosophy: The alarm philosophy, or technical requirements, is typically done at the plant site level, although some organizations develop a corporate philosophy with appendices for the specifics of each individual plant.
The alarm philosophy documents the management process encompassing the alarm management programme as well as the actual alarm creation guidelines.
Typically the alarm philosophy will cover the topics of the alarm management review process, the management of change control, roles and responsibilities, DCS specifics, alarm setting guidelines, alarm handling criteria, system monitoring, testing, documentation control, and training.
This document will define the methodology for setting alarm priorities and alarm thresholds.
It becomes the cornerstone of the alarm management programme as it develops into a guideline, rule book, interpretation guide, and the centre for rationalisation.
Successful organizations find that the process of creating the alarm philosophy can be as valuable as the document itself, as it brings all the views, opinions, and methodologies of the employee resources into a common framework.
2) Benchmark and performance audit: The benchmark and performance audit may be done for several reasons.
Successful alarm management organizations find that performing the benchmark and audit yields the following benefits:
* An understanding of where you are relative to best-practices guidelines, and a comparison of your site to industry best practices.
* An analysis of your current system performance to determine problem areas.
* Creation of an informed path forward.
* Justification of the business case.
* Demonstration of due diligence.
The benchmark and performance audit involves an audit of all the management processes as well as of actual alarm performance.
During the alarm performance audit, each of the KPIs defined in the alarm philosophy, which may match those of EEMUA 191 where they are applicable, is calculated and interpreted.
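As an example of what calculating a KPI can look like, the Python sketch below derives two simple rate figures from a list of alarm timestamps: the average number of alarms per 10-minute interval and the peak in any one interval. The choice of these two KPIs, the 10-minute bucket, and the sample data are assumptions; the KPIs and targets that apply are the ones defined in the site's alarm philosophy.

```python
from collections import Counter

def alarm_rate_kpis(alarm_times, period_start, period_end, bucket_seconds=600):
    """Compute simple rate KPIs from alarm timestamps (in seconds).

    Returns (average alarms per bucket, peak alarms in any one bucket).
    Buckets are fixed, non-overlapping intervals of `bucket_seconds`.
    """
    duration = period_end - period_start
    if duration <= 0:
        raise ValueError("period_end must be after period_start")
    counts = Counter(
        int((t - period_start) // bucket_seconds)
        for t in alarm_times
        if period_start <= t < period_end
    )
    n_buckets = -(-duration // bucket_seconds)  # ceiling division
    total = sum(counts.values())
    average = total / n_buckets
    peak = max(counts.values(), default=0)
    return average, peak

# Hypothetical one-hour extract: seven alarms, four of them in one burst.
alarm_times = [30, 45, 700, 710, 720, 730, 3000]
average, peak = alarm_rate_kpis(alarm_times, 0, 3600)
print(f"average per 10 min: {average:.2f}, peak in 10 min: {peak}")
```

Comparing the average with the peak is useful because a plant can look acceptable on average while still overloading the operator during upsets.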
The representation of the data may take many forms, such as the tree map, where the alarm performance of an entire plant may be visualized on one screen.
Alternatively the audit may be summarised by classifying the facility as DC Campbell Brown of British Petroleum proposed in his paper "Horses For Courses - A Vision For Alarm Management".
3) Rationalization: Bad actor cleanup.
During a benchmark and performance audit, bad acting tags will make their presence known.
At the onset of rationalisation it is good to clean up these tags, which could contribute up to 50% of the operators' daily alarm load.
Quick success is helpful in gaining operational support for the alarm management endeavour.
These bad actors are the alarms that chatter, that are totally redundant with other alarms, that have no operational action associated with them, or whose limit is in the normal operating range.
Do this and your operators will thank you.
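A simple way to see which tags dominate the alarm load is to rank them by count and track the cumulative share. The Python sketch below assumes the archive has been reduced to a plain list of tag names, one entry per alarm; the tag names are hypothetical.

```python
from collections import Counter

def bad_actors(alarm_tags, top_n=10):
    """Return [(tag, count, cumulative_share)] for the `top_n` noisiest tags."""
    counts = Counter(alarm_tags)
    total = sum(counts.values())
    ranked = []
    running = 0
    for tag, count in counts.most_common(top_n):
        running += count
        ranked.append((tag, count, running / total))
    return ranked

# Hypothetical day of alarms: ten in total, dominated by two tags.
log = ["FI-10"] * 6 + ["TI-22"] * 3 + ["LI-5"]
for tag, count, share in bad_actors(log, top_n=2):
    print(f"{tag}: {count} alarms, cumulative {share:.0%}")
```

The cumulative column shows how few tags need attention to remove a large part of the daily load.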
* Rationalization.
The classic method of rationalization involves a team of people from operations and engineering and an impartial facilitator sitting in a room together, methodically reviewing alarm settings on each DCS tag.
The key to successful rationalisation is preparation.
Preparing for an alarm rationalisation can be onerous and time-consuming.
It requires the following actions:
1) Understand the alarm capabilities of the DCS.
Specifically, one must be able to answer: What parameters drive alarm generation? What priority values are available? Which priorities are audible / visible? What message / alert capabilities exist? How do deviation alarms work? How would I count the number of alarms configured on a tag? EEMUA 191 has guidelines on configuration numbers.
2) Get a DCS database extraction.
Ensure you understand how to categorize all tags into a plant hierarchy (plant/area/unit).
3) Create a Master Alarm Database.
4) Ensure the necessary personnel will be participating.
Generally speaking, the following resources are required: Experienced panel operator and/or shift supervisor; Process engineer; Meeting facilitator; Control engineer for the first few days, especially if you are not experienced with the DCS brand.
5) Ensure a room is booked that is separate from the plant environment.
If panel operators or site personnel can easily find the people in the rationalisation meeting, then many disruptions will occur.
6) Book two projectors.
One computer should display the master alarm database, the other should trend historical process values and access operating procedures and HAZOPs.
7) Ensure copies of the P&IDs and HMIs are available.
Highlight tags on the P&IDs as they are reviewed.
Once the preparation is complete, review the "rules of engagement": What conditions allow for an alarm? How are severities classified and determined? What are the time-to-respond categories? How, then, are priorities determined? What alarm limits are the operators allowed to alter (e.g. priority "3" only)? Are alerts and/or messages available?
With the preparation complete, it is time to start the rationalisation.
Review each of the alarmable tags on a loop basis determining: time to respond, severity, limit, causes, initial responses, and corrective actions.
Based on these decisions, set priorities and alarm limits.
Document results and record follow-up items and logic opportunities (dynamic alarming / cut-outs).
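One common way to turn the severity and time-to-respond decisions into a priority is a lookup matrix agreed in the alarm philosophy. The Python sketch below shows the mechanism only; the category names and the matrix values are hypothetical and would be replaced by whatever the site's philosophy defines.

```python
SEVERITIES = ("minor", "major", "severe")
RESPONSE_TIMES = ("over_30_min", "10_to_30_min", "under_10_min")

# Hypothetical matrix: rows are severity, columns are time to respond.
# 1 is the highest priority, 3 the lowest.
PRIORITY_MATRIX = (
    (3, 3, 2),  # minor
    (3, 2, 1),  # major
    (2, 1, 1),  # severe
)

def alarm_priority(severity, time_to_respond):
    """Look up the priority for one rationalised alarm."""
    row = SEVERITIES.index(severity)
    col = RESPONSE_TIMES.index(time_to_respond)
    return PRIORITY_MATRIX[row][col]

print(alarm_priority("major", "under_10_min"))
print(alarm_priority("minor", "over_30_min"))
```

Fixing the matrix before the meetings start keeps the team discussing consequences and response times rather than arguing about priority numbers tag by tag.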
* Dynamic and state-based alarming.
With classic rationalisation complete, dynamic and state-based alarming can be investigated.
This is not a simple activity and takes a solid understanding of the operational and control philosophies of the plant during all relevant states.
During this stage you: Identify time of alarm floods; Determine how to sense the initiating event; Identify tags that alarmed during the flood; Determine what is normal during the event; List problems; List available tags; Engineer new alarms; Build new alarms; Remove old alarms; Suppress non-relevant alarms.
Care must be taken to test the logic prior to implementation to ensure unsafe conditions are not generated as a result of alarm suppression.
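The first steps, identifying when floods occurred and which tags alarmed during them, can be sketched in Python as follows. The flood criterion used here (more than 10 alarms inside a sliding 10-minute window) is an assumed default for illustration; the event format and tag names are likewise hypothetical.

```python
def find_floods(events, window=600, threshold=10):
    """Return [(start, end, tags)] for each alarm flood in `events`.

    `events` is a list of (seconds, tag) pairs. A flood is any run of alarms
    in which more than `threshold` alarms fall inside a sliding `window`;
    overlapping runs are merged into one flood.
    """
    events = sorted(events)
    floods = []
    left = 0
    current = None  # [first_index, last_index] of the flood being built
    for right, (t, _) in enumerate(events):
        # Shrink the window so it spans less than `window` seconds.
        while t - events[left][0] >= window:
            left += 1
        if right - left + 1 > threshold:
            if current is not None and left <= current[1]:
                current[1] = right  # extend the flood in progress
            else:
                if current is not None:
                    floods.append(current)
                current = [left, right]
    if current is not None:
        floods.append(current)
    return [
        (events[a][0], events[b][0], sorted({tag for _, tag in events[a:b + 1]}))
        for a, b in floods
    ]

# Hypothetical archive: two isolated alarms and a burst of twelve.
events = (
    [(0, "X1")]
    + [(1000 + i, f"T{i % 3}") for i in range(12)]
    + [(5000, "X2")]
)
for start, end, tags in find_floods(events):
    print(f"flood from {start}s to {end}s involving {tags}")
```

The list of tags for each flood is the starting point for deciding which alarms are normal during the event and which should remain.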
* Implementation: Implementation is the practice of changing the control system configuration to match the decisions made during alarm rationalisation.
This is best implemented in two phases: first, implement the static settings as documented in the Master Alarm Database; second, implement the suggestions for control system logic to consolidate alarms.
Run a discrepancy report.
If the list is manageable, simply implement the DCS settings that do not match the engineered settings.
As the changes are made in the DCS, the discrepancy list will get smaller and smaller until it is complete.
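A discrepancy report is essentially a comparison of two sets of settings. The Python sketch below assumes both the Master Alarm Database and the DCS extraction can be reduced to dictionaries keyed by tag; the tag names, setting names, and values are hypothetical.

```python
def discrepancy_report(master, dcs):
    """Compare engineered settings with the live DCS configuration.

    Both arguments map tag -> {setting_name: value}. Returns a sorted list
    of (tag, setting, engineered_value, dcs_value) for every mismatch.
    A setting missing on either side is reported as None.
    """
    report = []
    for tag in sorted(set(master) | set(dcs)):
        engineered = master.get(tag, {})
        actual = dcs.get(tag, {})
        for setting in sorted(set(engineered) | set(actual)):
            want = engineered.get(setting)
            have = actual.get(setting)
            if want != have:
                report.append((tag, setting, want, have))
    return report

# Hypothetical extracts from the Master Alarm Database and the DCS.
master = {
    "TI-101": {"HI": 85.0, "priority": 2},
    "PI-200": {"LO": 1.5, "priority": 3},
}
dcs = {
    "TI-101": {"HI": 90.0, "priority": 2},
    "PI-200": {"LO": 1.5, "priority": 3},
    "FI-300": {"HI": 40.0},
}
for row in discrepancy_report(master, dcs):
    print(row)
```

Rerunning the comparison after each batch of changes gives a direct measure of how much implementation work remains.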
* Maintenance: Organizations successful at alarm management integrate the practices into the plant workflow to optimise plant performance over the long term.
Good maintenance ensures problems that arise are identified, resolved, and documented in a timely fashion, providing: Audit trail of all changes to the DCS; Documentation of reasons for changes; Sustained and improved alarm system performance; Approval of all alarm changes; A knowledge base from experienced personnel.
* Continuous Improvement: Continuous performance monitoring helps identify new opportunities to optimize the alarm system, such as dynamic alarm strategies.
Each alarm flood, in turn, provides an opportunity to refine the dynamic suppression strategies.
* The Benefits.
An organization will see diverse benefits with the improvement to the operators' environment, and these can all be equated back to a financial benefit.
Some of the benefits successful organizations have seen include: Reduction in unplanned shutdowns; Reduced equipment damage, as protection-related alarms are not missed; Reduced insurance premiums; Improved safety and due diligence compliance in auditing the alarm system; Reduction in near-miss reports; Operator environment improvement.
Giving back the alarm system to the operator is a major accomplishment in ensuring safe and reliable operation of any process facility.
Giving the operator the tools to identify and diagnose abnormal situations, and providing the operator with the time window to implement a corrective action prior to escalation is what alarm management is all about.
Providing manageable alarm rates, removing nuisance alarms, separating out messages and alerts, providing diagnostic messaging and intelligent alarms, providing the operator with access to the cause, consequences, corrective actions, and initial responses to alarms from the master alarm database, providing alarm handling tools such as alarm shelving for times of equipment maintenance - these are the actions necessary in order to return to managing the process and not the alarm.
Free from the never-ending onslaught of nuisance alarms, operators can learn to manage their plants better, optimising resources and becoming better and happier operators.
"There are more things to ALARM us than to HARM us, and we suffer more often in apprehension than reality": Lucius Annaeus Seneca.
Mark McTavish, the author of this paper, is Director of Alarm Solutions and Global Training at Matrikon, and has over 25 years of engineering and operations experience.
Mark has been actively involved in mechanical, process, and control system design, safety and alarm systems, emergency and disaster response, incident investigation and Canadian Standards Association (CSA) code development.
Mark's objectives are to ensure that plants are maintained in a safe, reliable working condition so they can produce quality products.
His industrial experience has included utilities, oil and gas, pipelining, mining, cement manufacturing, district heating, shipping, power generation, and other process industries.
Matrikon has just celebrated its 20th year in the alarm management business.
Matrikon supplies solutions for the collection, analysis, monitoring and presentation of alarms, as well as integrated Master Alarm Databases to provide assistance to the operator.
We work with our clients, training their staff through workshops, developing alarm philosophies and benchmark and performance audits, and providing rationalisation services.
Matrikon's roots are in the control and automation field and we still provide those services for alarm management when required.
At a higher level, Matrikon develops early event-detection through intelligent systems to identify and prevent future abnormal events.


