Over-optimized

The Boeing 737 MAX crisis is a crisis of over-optimization.

"Over-optimization" is a curious term: "Optimal" means the best possible state of being. So how can something be over-optimized? You wouldn't say someone is "over-healthy," and much less would you expect that person to be ill. Yet over-optimization killed two plane-loads of people and grounded a plane that cost billions to develop, idling billions of dollars more in finished goods sitting on aprons, runways, and parking lots.

Boeing made a safe and very efficient airliner. Then, in pursuit of even greater efficiency, they made it less safe. Boeing over-optimized.

There are other elements to this tragedy, especially in that Boeing could have chosen to mitigate some of the risks they created in over-optimizing by adding redundant sensors, better warnings of failures, etc.

Boeing could also have done, months earlier, what they announced in January of 2020: recommend that 737 MAX pilots receive simulator training before flying the plane. Boeing created a very complex problem for themselves in multiple dimensions of how to manage this crisis. But at the center of this problem is an over-optimization.

There are also a lot of things that did not contribute to this crisis: That the 737 MAX is an evolution of an old design did not contribute to the 737 MAX crisis. The 737 design evolution has kept the plane modern; otherwise it would not be a candidate for further development, and it would not be competitive with newer designs.

The position of the new larger engines did not make the 737 MAX "unstable." The new engines change the flight characteristics of the plane. Test pilots suggested aerodynamic changes that would improve these characteristics. But even those improvements would still have caused current 737 pilots to have to be trained, in simulators and/or with flight instructors, to know how to safely fly the 737 MAX.

The way the 737 MAX evolved from previous designs forms the context of Boeing's fatal decisions. Boeing could have made different decisions. In theory, they could have designed a new plane. But the decisions they made to evolve the design and to put new engines on the plane did not themselves make the plane less safe.

Airliners have to be efficient. If a new airliner costs too much to develop, it will have to be priced higher to make back the development costs. If a new airliner burns too much fuel because the engines are outdated, it won't sell because fuel costs are a significant part of airlines' total costs. Had it not been for the crisis stemming from two crashes, and Boeing's subsequent response, the 737 MAX would be a case study in efficient product development strategy.

But then Boeing took optimization one step further: Instead of training pilots to fly a 737 with somewhat different flight characteristics, Boeing decided to eliminate the need for pilot training entirely.

By now, the acronym MCAS has been much in the news. It stands for Maneuvering Characteristics Augmentation System. It is the system Boeing used to make the 737 MAX fly like the previous generation 737 by automatically adjusting the angle of the horizontal stabilizer, also referred to as "trim." MCAS is the proximate cause of the 737 MAX crashes. If the sensor input to the MCAS system is faulty, MCAS will move the horizontal stabilizer to point the nose of the plane down when it should not do so. If this happens at low altitude, pilots have only seconds to intervene manually to prevent a crash.

Because the purpose of the MCAS system is to make the 737 MAX fly just like the previous generation of 737, it was not mentioned in the 737 flight manual. This may seem like a terrible omission, but previous generations of 737 have automatic trim adjustment systems, too. Pilots are trained to fall back on electrically assisted or manual trim adjustment if needed; the manual controls are literally a pair of cranks in the cockpit that operate the horizontal stabilizer trim by means of cables and pulleys.

Over the course of developing the 737 MAX, the extent to which the MCAS system adjusted trim was increased fourfold, making recovering from a failure more difficult. Unlike a similar system used on another Boeing aircraft, the 737 MAX MCAS system reads input from only one sensor, instead of a redundant pair of sensors, making a failure more likely. The 737 MAX cockpit controls, which are more complex than on most current airliners due to the age of the design, also make recovery from an MCAS failure more difficult. Boeing made optional an indicator light that would inform pilots of a sensor failure and only 20% of the 737 MAX fleet was fitted with that option. But, when working correctly, MCAS did the trick: The 737 MAX flies like the plane it replaces. No pilot retraining needed.
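To make the single-sensor point concrete, here is a minimal sketch in Python of the difference between trusting one angle-of-attack reading and cross-checking a pair. It is an illustration only, not Boeing's actual logic: the function names, the trigger threshold, and the disagreement limit are all hypothetical values made up for this example.

```python
from typing import Optional

# Hypothetical values chosen for illustration only; not Boeing's actual parameters.
AOA_TRIGGER_DEG = 15.0      # angle of attack above which nose-down trim is commanded
MAX_DISAGREEMENT_DEG = 5.0  # largest difference tolerated between the two sensors


def single_sensor_trim(aoa_deg: float) -> bool:
    """Command nose-down trim based on one sensor, trusted unconditionally."""
    return aoa_deg > AOA_TRIGGER_DEG


def cross_checked_trim(aoa_left_deg: float, aoa_right_deg: float) -> Optional[bool]:
    """Command trim only when two sensors agree; return None if they disagree."""
    if abs(aoa_left_deg - aoa_right_deg) > MAX_DISAGREEMENT_DEG:
        return None  # disagreement: take no automatic action, alert the crew instead
    # Use the more conservative (lower) reading when the sensors agree.
    return min(aoa_left_deg, aoa_right_deg) > AOA_TRIGGER_DEG


# Level flight with a true angle of attack near 2 degrees; one sensor is stuck at 40.
print(single_sensor_trim(40.0))       # True: trim is commanded on bad data
print(cross_checked_trim(40.0, 2.0))  # None: the fault is detected, no trim commanded
print(cross_checked_trim(2.5, 2.0))   # False: sensors agree, no trim needed
```

With one sensor, a single bad reading is indistinguishable from a real high angle of attack. With two, the same fault shows up as a disagreement that can be reported to the pilots rather than acted upon.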

The sequence of decisions leading to the crashes is easy to see in hindsight. But neither Boeing management nor the FAA saw it at the time. This is not to say that all the engineers and test pilots at Boeing were uncritical of these decisions. In an email uncovered during the investigation of the 737 MAX development process, a Boeing employee stated:
"This airplane is designed by clowns who in turn are supervised by monkeys."
A sample of the cynicism revealed so far can be found in the emails archived here: https://int.nyt.com/data/documenthelper/6653-internal-boeing-communications/606e3fda752a935bc0df/optimized/full.pdf

If you were to read this kind of correspondence among quality assurance engineers at an internet dating site, your understandable response would be to think "par for the course." But, of course, a failure in a dating algorithm can only result in a bad date.

The internal correspondence reveals a symptom of the disorder that affects many high-stakes engineering endeavors before a failure: "normalization of deviance." This phrase originated with a sociologist studying the Challenger Space Shuttle disaster. That Boeing's processes deteriorated due to perhaps the most famous management pathology in history is further saddening.

Even after the crashes, Boeing's response appeared to be to shift blame and avoid having to reengineer MCAS, add a redundant sensor, or retrain pilots. Billions of dollars were at stake. Hundreds of unsuspecting people, on new modern airliners, died.

Should the 737 MAX be sent to the scrapyard? No. It will be safe when it returns to flight. The 737 NG, the previous model of 737, has a safety record that is at the forefront of the airliner industry. This is due in part to how well the design is understood over its long evolution. No airliner is without flaws, and FAA communiques about these flaws, called Airworthiness Directives, or "ADs," are issued frequently, sometimes weekly, even for aircraft where the development and qualification processes were a picture of harmony and collegiality.

Boeing does have a problem, though. The evidence thus far is that Boeing management is doing the opposite of what is needed. They have directed opprobrium at the people who wrote the now widely publicized embarrassing emails.



