An Escalation System for Handling and Analyzing Production Disturbances

In a manufacturing environment, the effectiveness of internal problem solving is a key success factor in the service level offered to customers. Broken internal processes need to be fixed immediately, or at least within a very short time, in order to meet the committed delivery dates. Companies apply several methodologies for internal problem solving. This article presents a solution worked out for a plant operating in a high-mix low-volume (HMLV) production environment prone to both internal and external disruptions and disturbances. The principles, architecture and information flow of our digitized disruption handling system, the so-called escalation system, are briefly discussed. Lessons learned over a six-year period of using the system are also summarized. The recorded data confirm that with the introduction of the escalation system the capability of the plant to adapt to changing circumstances and disruptions greatly improved.


Introduction
Production companies invest intensive efforts into planning their activities, but their overall performance hinges largely on whether they can execute their plans under changing circumstances, facing unexpected disruptions. It was realized early on that the management of changes, disturbances and disruptions is a key to business success in manufacturing [1], and with the advent of cyber-physical production systems [2] we have a broad set of tools and techniques to assess the deviation between the planned and actual operation of a production system, to mitigate its impact and to adapt the operation of the system to the changes. Indeed, this is what is expected from resilient systems in many domains of operations [3].
So-called high-mix low-volume (HMLV) production systems are particularly prone to both internal and external disruptions and disturbances. Here, in certain situations, deviations from the planned course of production are the rule rather than the exception [4]. Still, the overall business goals need to be attained, while running operations as continuously as possible [3] and keeping the necessary changes as local as possible [5]. In the past two decades many models have been developed to handle production disturbances; e.g., [6] presents and compares almost sixty of them, all dedicated to handling external (supply chain) disruptions. Researchers and practitioners agree that human operators (or first-line managers) have a vital role in handling deviations: in anticipating, monitoring, responding, delegating and learning [7,8]. They are the most flexible integrators precisely when unexpected events need to be handled [9,10]. On the other hand, in this activity they need support that is as systematic and digitalized as possible to record all relevant information, to find and configure appropriate resources for problem solving [3], and to escalate issues to higher levels of management whenever a shortage of resources or time requires it [11,12]. The automotive industry is a pioneer in the development of such systems (focusing on network issues) [13], and we have found a disruption management system supporting the production of pressure diecasting cells [14]. However, none of these related works operates in an HMLV production environment.
This paper presents a digitalized escalation (ESC) system designed for and deployed in our HMLV plant [4,15]. Section 2 discusses this industrial background and our motivation, while Section 3 shows the main deviation types and the typical expert groups responsible for their resolution. Section 4 describes the ESC system, whose impact on our production is analyzed in Section 5 using data records covering a six-year horizon. Finally, Section 6 concludes the paper and gives directions for future developments.

Industrial background and motivation
The manufacturing plant where the escalation system was developed and implemented is a typical HMLV environment. The plant belongs to a leading international company producing a broad variety of pneumatics products and components. The company is growing continuously and manages more than twenty-seven thousand standard product codes, most of which can be configured, using more than thirty thousand components from more than eight hundred suppliers. Close to one thousand five hundred production orders enter production daily, and production batch sizes are below one hundred pieces in ninety-five percent of the cases.
The digitization level of the plant has traditionally been high (it was the Factory of the Year 2016 in Hungary and won Hungary's Best Prepared Industry 4.0 Plant award in 2017). As we reported in earlier papers, much emphasis was put on building a series of discrete-event simulation models of the plant, deploying an advanced manufacturing execution system (MES), developing an automatic scheduler for selected production lines, and developing a decision support system to improve key performance indicators (KPIs), most importantly our customer service level [4,15]. These latest developments improved our planning performance.
In parallel, much effort was invested into increasing our adaptivity and responsiveness in the face of the highly volatile conditions characteristic of HMLV production. Our main motivation was that, however good our planning processes are, the plant can operate successfully only if we continuously trace the execution of plans, detect deviations between planned and executed operations, and take the appropriate recovery actions as early as possible. In the past decade we also digitalized these processes, which are incorporated in our escalation system. It is important to emphasize that this system serves the same goals as our planning activities: maintaining a high customer service level by stabilizing production lead times at minimal costs and buffer levels (or work-in-process, WIP). While the primary objective of the ESC is to support the smooth running of the plant by detecting and managing deviations, by recording deviations and responses it can also provide valuable feedback to planning and form a loop of learning. An efficient ESC system supporting the above objectives should fulfill the following criteria. It should
• assess the situation in real time and identify deviations;
• facilitate the generation of an immediate response, at the right level of the management hierarchy;
• mitigate the proliferation of disturbances across our complex production environment; and
• systematically record all escalation data, making it amenable to further analysis and learning.

Production deviations and problem-solving groups
In our definition, deviations can happen at the time and point of plan execution, when for some reason the execution of a planned operation is hindered. Hence, when the production operator cannot execute the production plan at the given pace and speed, we speak of a production deviation. According to their cause, the following main categories of deviations have been identified in our production environment (see also Fig. 1):
1. Raw material shortage on the line. The expected arrival of the raw material did not happen as planned, because (1) it is on the way to production and will arrive soon, (2) the filled Kanban box has not yet arrived at production, (3) the raw material is still under incoming quality inspection, or (4) there is a serious deviation in the supply chain that needs to be clarified by the material planner with the supply chain team.
2. CNC machine programming or network issues. At the time of execution, there is no executable CNC program for machining the raw part.
3. Equipment breakdown. Resources needed for executing an operation, like CNC machines, testers, automatic screwing machines, washing machines, cutting machines, robots, pressing machines, transportation belts, etc., are unavailable due to technical failure.
4. Information and communication technology (ICT) issues. Applications are not running properly, production plans, drawings and documentation are not reachable on the network, there are part identification (label printing) issues, or user permissions are not correctly activated.
5. Production technology issues. The operator cannot assemble the product based on the available documentation, improper jigs and devices are used, or improper machines or equipment are recommended in the documentation.
6. Quality related issues. A high failure rate is observed at the production line. This is the responsibility of the manufacturing quality expert. If he or she cannot find a prompt solution, then a cross-functional team is set up to work out a solution in the shortest possible time.
7. Other (administrative) issues. Workforce assigned to the line is missing (e.g., due to a yet unreported health problem), or the workforce does not have the necessary skill or skill level.
The operator can assess the actual issue, but his/her knowledge of the possible root causes and resolutions is quite limited. Hence, the issue has to be escalated for solution to a specialized, so-called primary solver group. It is assumed that the operator has the right knowledge regarding the internal processes, hence it is the operator's responsibility to assess the type of the deviation and to assign it to the appropriate expert in the primary solver group. This assignment happens by sending an e-ticket to a selected expert, who tries to find the root cause of the deviation and to generate a prompt solution. In case the primary solver group cannot find an answer within a given, rather short time frame, the issue is elevated to the secondary solver group. The detailed timing of the escalation process is presented in Section 4.2. In the ESC system, the following experts make decisions in the primary solver groups:
• Manufacturing logistics coordinator. This internal logistics position handles cases when the raw material is missing from the line at the moment of launching the next order, the raw material has quality problems, the material is damaged during the assembly operation, or fewer pieces were brought to production than originally specified.
• CNC manufacturing engineering. This expert receives an escalation in case there are issues with CNC programs or the intranet network.
• Maintenance. In case of a machine breakdown, an e-ticket arrives at the maintenance team from the machine operator. They can start repairing the broken-down machine right away. If there are obstacles to repairing the given machine, such as a missing spare part that needs to be ordered, or in case of a very serious machine breakdown, an e-ticket is escalated to a secondary problem-solving group. The problem-solving group will add all the necessary information to the e-ticket if needed. It may be that IT support, special equipment engineering expertise, or external special support is required.
• Information and communication technology (ICT). The operator escalates to the ICT expert in case any application supporting the production processes is not running properly.
• Product engineering. In case the documentation or drawings do not match the parts/components, or the process is not clear to the operator (typically, in case of configured products), the operator initiates an escalation to get the right support from the product engineers or production support engineers.
• Manufacturing quality assistance. In case the end of line (EOL) rate or the internal failure rate is higher than the accepted level, an e-ticket is sent to the quality assistant expert, who will try to find the root cause of the deviation. There can be several reasons behind it, such as assembly process issues, raw material issues, or equipment issues. This expert then decides who should be contacted to eliminate the deviation, like the assembly operator coordinator, the incoming inspection colleagues, or the maintenance staff. As a last resort, the escalation can be forwarded to one of the secondary solver groups.
• Production coordinator. If an operator does not show up for production, a person gets sick on the line, or the person delegated to the line does not have the right training for that product, then the production coordinator is informed with an e-ticket.
When there is no timely solution at the secondary solver group level, the escalation is automatically delegated to the next level, where it is handled by a Task Force (see Section 4). This body of decision makers is set up by the plant manager from department leaders and cross-functional experts. Within the Task Force, temporary teams are formed to handle the unresolved issue. The head of such a team always comes from the department which is mainly responsible for the problem to be solved. E.g., in case of a quality issue, the responsible person comes from the quality department. Specifically, members are selected from the following departments: Manufacturing, Quality, Materials Management, Supply Chain, Manufacturing Engineering, R&D Engineering, Human Resources, and IT. Fig. 2 shows the composition of the various problem-solving groups. Note that such a primary and secondary solver group structure can be set up by any manufacturing plant, and the system can be shaped flexibly to the specific processes and demands of that facility.
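The assignment of a deviation to a primary solver group described in this section can be sketched as a simple lookup. This is a minimal illustration only: the names `Deviation`, `PRIMARY_SOLVER` and `primary_solver_for` are hypothetical, and the one-to-one pairing of the seven deviation categories with the seven primary solver groups is our reading of the two lists above, not a documented interface of the ESC system.

```python
from enum import Enum


class Deviation(Enum):
    """The seven deviation categories listed in this section."""
    RAW_MATERIAL_SHORTAGE = 1
    CNC_PROGRAM_OR_NETWORK = 2
    EQUIPMENT_BREAKDOWN = 3
    ICT = 4
    PRODUCTION_TECHNOLOGY = 5
    QUALITY = 6
    OTHER_ADMINISTRATIVE = 7


# Assumed one-to-one pairing of categories and primary solver groups.
PRIMARY_SOLVER = {
    Deviation.RAW_MATERIAL_SHORTAGE: "Manufacturing logistics coordinator",
    Deviation.CNC_PROGRAM_OR_NETWORK: "CNC manufacturing engineering",
    Deviation.EQUIPMENT_BREAKDOWN: "Maintenance",
    Deviation.ICT: "Information and communication technology (ICT)",
    Deviation.PRODUCTION_TECHNOLOGY: "Product engineering",
    Deviation.QUALITY: "Manufacturing quality assistance",
    Deviation.OTHER_ADMINISTRATIVE: "Production coordinator",
}


def primary_solver_for(deviation: Deviation) -> str:
    """Return the primary solver group that receives the e-ticket."""
    return PRIMARY_SOLVER[deviation]
```

In this sketch the operator only has to classify the deviation; the addressee of the e-ticket follows from the table, which can be adapted to the organization of another plant.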

Escalation system: architecture and information flow

Principles of the system architecture
The escalation system is hierarchical and time-controlled, following a generic design principle: if all the resources and know-how are available for solving an issue, then the expert handles it locally. However, if the issue cannot be solved at a level within a given time frame (like one or two hours), then it is transferred to the next higher level. This is the case when the first level is unable to find a solution for the escalation received from the operators. Then they have to contact the right internal experts at the secondary level who could solve the issue.
Hence, the hierarchical solution levels are traversed bottom-up, one after another trying to generate mitigation actions for a deviation. As time passes, first after one hour, then after two and four hours, a still unresolved escalation is automatically transferred to the next higher level. The top is Level#5, where the issue needs to be handled directly by the plant manager (see Fig. 3). Issues at this level are resolved in more than 8 hours.
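The time-controlled traversal of levels can be illustrated with a short sketch. The thresholds of 1, 2 and 4 hours are taken from the text; the 8-hour threshold for reaching Level#5 is inferred from the statement that Level#5 issues take more than 8 hours. The function name `escalation_level` and the choice to escalate exactly at the threshold are assumptions made for illustration, not a description of the deployed system.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

# Hours after which a still unresolved e-ticket moves up one level.
# 1, 2 and 4 h are stated in the text; 8 h for Level#5 is an assumption.
LEVEL_THRESHOLDS_H = (1, 2, 4, 8)


def escalation_level(opened_at: datetime, now: datetime) -> int:
    """Return the level (1..5) at which an open e-ticket currently sits."""
    elapsed_h = (now - opened_at).total_seconds() / 3600
    if elapsed_h < 0:
        raise ValueError("'now' must not precede 'opened_at'")
    # Count how many thresholds have already been reached or passed.
    return bisect_right(LEVEL_THRESHOLDS_H, elapsed_h) + 1


if __name__ == "__main__":
    opened = datetime(2024, 1, 1, 8, 0)  # hypothetical time stamp
    print(escalation_level(opened, opened + timedelta(hours=3)))  # prints 3
```

A scheduler could evaluate such a function periodically for every open e-ticket and notify the next solver group whenever the returned level increases.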
As for mitigating actions, we take an essentially conservative approach: we generate mitigation actions with the least possible changes to the original production plan and try to make the least possible impact on the subsequent stages of production. The search for a mitigating action starts at the lowest suitable escalation level, where the e-tickets are generated, and is only passed upward to a higher level if no feasible mitigating action can be found within a pre-defined time period. Note that every escalation level may need different human and computational resources to handle the assigned issues appropriately. This policy has a number of managerial purposes:
• Reaction to every detected deviation is immediate.
• The escalation is initiated by the operator who first experiences a deviation and is aware of the situation.
• The changes to the original production plan, together with their ramifications, remain as local as possible.
• Higher-level management is burdened only with the most severe issues.

Flow of escalation information
In a plant the escalation usually starts either with an Andon lamp or with a sound signal. This is a very useful method in case of mass production, where the processes can easily be supervised. However, in case of HMLV production, so as to facilitate responsiveness and avoid bias or loss of information, the firefighting team should also be linked digitally with the problem spot. Hence, the operator launches an electronic escalation ticket together with a time stamp. The e-ticket is received by the addressee in the form of a text message, an email, or on a direct message screen in the office area. In HMLV production, typically several product families are running on the same line. In case of a product related deviation, the line can possibly continue running with another product until the issue with the problematic product is fixed. Since the plant is highly digitized, all relevant documentation, NC programs, etc. are readily available to support such a decision.
Digital information transfer is easy when the operators have access to a personal computer or digital device (like a smartphone) close to their working area. From these devices they can launch the escalations, and the next escalation level (maintenance team, group leaders, quality staff, etc.) will receive the messages on similar devices.
The ESC system has a number of advantages:
• Escalations are automatically generated by the system and sent to a higher hierarchical level if the production deviation is not eliminated or closed within the time frame defined in the system.
• Documents and pictures can be attached to any e-ticket.
• Priorities can be set up among different production areas, in case several escalations arrive at the same time.
• E-tickets can be linked to the production scheduling sheet as an explanation for not delivering the plan.
• Every escalation can be recorded and traced back, and all historic information related to past escalations is available in the system.
• Statistics can be generated about the types of problems, reaction times, length of problem solving, frequency of repetitive deviations, etc. Decisions can be made on actions to improve equipment reliability, increase capacity, etc. Overall, the quality and efficiency of planning can be increased.
• Similar escalations can be analyzed for common patterns, a possible lesson to be learned. Historical data records can be the subject of root cause analysis, too.
• Special reports can be generated related to deviations and waiting times, listing problems still to be tackled.
• A personalized daily/weekly/monthly report system can be set up related to a person, group, production line, machines, etc. This facilitates reliable performance evaluation and providing incentives for efficient problem solving at every level of the management hierarchy, including the operators.
• Every organizational change can be easily transferred to this system, so that no escalation gets lost when changes happen.
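The kind of statistics mentioned in the list above can be derived from time-stamped e-ticket records. The following is a minimal sketch under stated assumptions: the record layout (`ETicket` with `category`, `line`, `opened_at`, `closed_at`) and the two helper functions are hypothetical and do not reproduce the data model of the actual ESC system.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from statistics import mean
from typing import Dict, Iterable


@dataclass(frozen=True)
class ETicket:
    """Hypothetical minimal record of one closed escalation."""
    category: str        # e.g. "equipment breakdown"
    line: str            # production line that raised the ticket
    opened_at: datetime  # time stamp set when the operator launched it
    closed_at: datetime  # time stamp set when the deviation was resolved

    @property
    def solution_hours(self) -> float:
        return (self.closed_at - self.opened_at).total_seconds() / 3600


def average_solution_hours(tickets: Iterable[ETicket]) -> Dict[str, float]:
    """Average solution time in hours, grouped by deviation category."""
    by_category = defaultdict(list)
    for ticket in tickets:
        by_category[ticket.category].append(ticket.solution_hours)
    return {cat: mean(hours) for cat, hours in by_category.items()}


def share_solved_within(tickets: Iterable[ETicket], hours: float) -> float:
    """Fraction of tickets whose solution time did not exceed `hours`."""
    tickets = list(tickets)
    if not tickets:
        raise ValueError("no tickets to evaluate")
    return sum(t.solution_hours <= hours for t in tickets) / len(tickets)
```

Grouping by `line` or by solver group instead of `category` would give the per-line or per-group reports mentioned above in the same way.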

Impact of the escalation system in light of historical records
The ESC system was designed and deployed in our HMLV plant a decade ago. In this section we present some characteristic aggregated data over a six-year time horizon, based on the recorded problem-solving results. From the six escalation categories depicted in Fig. 1, here we focus only on two: escalations related to equipment breakdown and to raw material availability on the lines. These have complete histories, while the handling of the other four escalation types was implemented only from the fourth year. From the business perspective, the first four years were stable as far as both the product portfolio and the sales volumes are concerned. From the fourth year, when the product portfolio started to grow, the other four types of escalations were also implemented. Over the investigated period, the sales volume and the product portfolio grew simultaneously.
The equipment breakdown issues were managed by the maintenance group, while the raw material availability issues were mostly managed by the incoming warehouse. We analyzed in parallel the evolution of production, the number of escalations, and the efficiency of the escalations. The efficiency of the escalations was measured by the average solution time. During the six-year period the net sales increased by 35%, the number of product types increased by 30% from 120 to 165 product families (each of them with a variety of 2 to 8 subtypes), and the number of escalations increased from 4,000 to 8,000 per year (see Fig. 4).
With the use of the ESC system the plant has learned how to handle deviations: as Fig. 4 shows, in the period when the number of product families was stable but the sold volume increased (YEAR#1 to YEAR#4), the number of escalations per unit of sold value decreased. In the period when new product families were introduced and four new escalation categories were handled in the system (YEAR#5 to YEAR#6), the number of escalations grew proportionally with the sold value.
Indeed, YEAR#5 to YEAR#6 was a transition period in the life of the plant, when we had to adapt to a swiftly changing demand pattern by developing and implementing our specific HMLV production policy (see details in [15]). In this period, the diversity of the product portfolio grew more rapidly than the actual sales volumes. Hence, the number of new product introductions (NPIs) also grew rapidly, which came with more issues, deviations, and escalations, both in terms of categories and ticket numbers. This explains why, after an initial period of learning, the number of ESC tickets per unit of sales started to rise. However, we are convinced that without the ESC system, and without the four preparatory years of its use and fine-tuning, the plant would have faced significantly greater challenges in this hard transition period.
The next figure (Fig. 5) shows the number and distribution of escalations related to equipment breakdown among the five levels over the selected six-year horizon. Here, the level is directly related to the time needed to solve an escalated problem. Under stable conditions (YEAR#1 to YEAR#2) the distribution of this type of escalation changed definitely for the better: more and more issues were solved within one hour (Level#1), while fewer and fewer were left to be solved over 8 hours (Level#5). In the last year of the period (YEAR#6), 75% of the escalations were solved within 2 hours, and only 2% of them took more than 8 hours. This also means that higher-level management was increasingly relieved from handling minute adaptation tasks. We assume that in the meantime planning quality improved, too.
Finally, Fig. 6 shows the number and distribution of escalations related to raw material availability in the analyzed period. As it seems, such issues required the involvement of higher-level management more frequently than the handling of equipment problems. However, with the introduction of the ESC system the distribution of escalation cases changed definitely for the better, even though with the introduction of new product families (YEAR#5 and YEAR#6) the number of issues rose steeply. In the last year of the investigated period (YEAR#6), 65% of the escalations were solved within 1 hour, and only 5% of them needed a solution time above 8 hours.
All in all, the above data confirm that with the introduction and extension of our ESC system the capability of the plant to adapt to changing circumstances and disruptions improved in a number of respects. In an HMLV production environment this ability was particularly important when the plant had to be managed under more and more volatile and unexpected conditions.

Conclusions and future work
In this paper we have presented an escalation system which was designed to adapt the routine operation of an HMLV production plant to changing circumstances as swiftly and as locally as possible. As our brief analysis has shown, the ESC system, which has been operational for about a decade, contributed considerably to the high overall performance of the plant, even under the harshest market conditions, assigning mitigation tasks to the management mostly at the right time and at the right level. While the system was tailored to the actual needs of our plant, the generic lessons relevant for production informatics and management are the following:
• An important role of the ESC system is to help the escalated topic get to the right solver group in the shortest possible time.
• As the variety in a manufacturing system increases, priorities for problem elimination should be defined in advance for critical products and critical technologies.
• The implemented digitized ESC system helped the manufacturing staff direct the escalation to the right responsible group. As problem solving was strictly measured, the problem-solving attitude changed in a positive manner.
• When the escalation is done in a digitized way, the problem-solving efficiency is significantly higher, and the accumulated experience can be analyzed later. The records show that after the implementation of this escalation system, the reaction and solution times became shorter and shorter.
• Indeed, in an HMLV system, where speed and flexibility are the most essential factors in serving the customers, such a digitized problem-solving method is indispensable.
Now, our escalation system works as a decision support system supporting the daily work of the management. Hence, it is basically aimed at mitigating the cognitive load on the management, at every level of the management hierarchy, by automatically triggering decisions, channeling information to the appropriate levels and recording the history of events and mitigating actions.
Regarding the future, the combination of artificial intelligence (AI), data analytics and simulation techniques would certainly help improve the efficiency and effectiveness of our disruption handling workflow. Given the records of deviations accumulated over many years of operation of our plant, certain deviations could be predicted from actual observations by standard machine learning methods. Then, the reaction to a predicted disruption could also be planned well ahead of time. The efficient simulation techniques we routinely use to support our normal production [15] can be applied to test the consequences of various decision options and select the most promising (or cheapest) one. Thereby, a so-called daydreaming factory [16] could be realized, where a self-learning system would offer better and better solutions for the deviation cases appearing in the future.

Fig. 1 Main categories of deviations and experts and teams for handling them

Fig. 2 The architecture and the information flow of the escalation system

Fig. 4 Sales volumes vs. number of e-tickets over a six-year horizon

Fig. 5 Number and distribution of ESC tickets related to equipment breakdown