When my kids asked me “What do engineers do?” I would answer, “We figure out how to make things work.” I’ve spent a few years doing just that: making things work.
But I spent many more years, stretching into decades, understanding how things don’t work. I’ve studied how things fail, why things fail, where faults arise, and how to ensure failures won’t occur. The study of failure (and how to avoid or prevent it) is its own unique science.
With AI now on the scene, we need to extend the science of failure to AI. Where to begin? It’s a messy space, fraught with changing terminology and inconsistent treatment. Just as one man’s ceiling is another man’s floor, one framework’s “risk” is another framework’s “hazard” (or “malfunction” or “error,” and so on).
I don’t want to wade into academic discussions of terminology here. Instead, I want to review sources of information, related to wrong or bad behaviors of AI, that might be useful for managing AI risks. The idea is to take an exploratory step into the world of failure and risk analysis, with a long-term objective of defining fault models and cause-effect relationships in AI. (That’s a big project! … remember, this is just a first literature survey.) With that objective in mind, here are a few sources and methods for AI risk managers to consider.
The Big Picture: MIT AI Risk Repository
MIT maintains a repository of AI risks drawn from a deep-dive into the AI risk literature. Their listing includes a “causal taxonomy” for AI risks. That seems like a decent place to start. (You can find the repository here: https://airisk.mit.edu/). The main risk repository includes over 1,000 entries, curated from 50+ source papers, describing what-can-go-wrong for AI. Spoiler alert: it’s a depressing read! So many fearful and hazardous outcomes, all listed out in one place. Taken as an entire list, the repository reads as a grisly hall-of-horrors in a dystopian AI-driven future.
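For risk managers who want to do more than read the repository, it is downloadable and can be sliced programmatically. Below is a minimal sketch of that kind of triage in Python with pandas, assuming the repository has been exported to a CSV file; the file name and the column names (“Entity”, “Intent”, “Timing”, “Title”, “Description”) are my guesses at the causal-taxonomy fields and should be checked against the actual download.

```python
# Minimal sketch: triaging an export of the MIT AI Risk Repository.
# Assumptions: the repository has been downloaded/exported as
# "ai_risk_repository.csv", and the causal-taxonomy columns are named
# "Entity", "Intent", and "Timing". Verify names against the real export.
import pandas as pd

risks = pd.read_csv("ai_risk_repository.csv")

# How are the 1,000+ entries distributed across the causal taxonomy?
print(risks.groupby(["Entity", "Intent", "Timing"]).size())

# Pull out the slice closest to classic failure-analysis territory:
# unintentional, AI-caused, post-deployment risks.
subset = risks[
    (risks["Entity"] == "AI")
    & (risks["Intent"] == "Unintentional")
    & (risks["Timing"] == "Post-deployment")
]
print(subset[["Title", "Description"]].head())
```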
Other than being a downer, the main impression left by MIT’s list is that it’s very high-level. I would call it societally focused. There’s nothing wrong with that; especially if you are a policy-maker grappling with the impact of AI on a community or a society, a list of societal harms is useful to have. So, for example, here are a few of the risks listed in MIT’s repository (sampled directly from MIT’s June 2024 release of the repository at the link shared above):
“‘Diffusion of Responsibility’: Societal-scale harm can arise from AI built by a diffuse collection of creators, where no one is uniquely accountable for the technology’s creation or use, as in a classic ‘tragedy of the commons.’”
“‘(Lack of) AI Jurisprudence’: When considering legal frameworks, we note that at present no such framework has been identified in literature which would apply blame and responsibility to an autonomous agent for its actions. (Though we do suggest that the recent establishment of laws regarding autonomous vehicles may provide some early frameworks that can be evaluated for efficacy and gaps in future research.) Frequently the literature refers to existing liability and negligence laws which might apply to the manufacturer or operator of a device.”
“‘Human trust in systems, institutions, and people represented by system outputs evolves as generative AI systems are increasingly embedded in daily life’: With the increased ease of access to creating machine generated content, which produce misinformation as a product, distinguishing between human and machine generated content, verified and misinformation, will become increasingly difficult and poses a series of threats to trust in media and what we can experience with our own hearing and vision.”
MIT’s list is impressively documented and structured to enable deeper diving. Again, it’s over 1,000 lines long. I like the effort. But reading it, I felt the same feeling I get when doomscrolling the news. I can’t directly impact or improve these risks. And I wonder: how much do these risks represent real, observable outcomes, and how much do they reflect the imaginative fears of the authors? It’s hard to tell without more experience. (I suppose that experience is coming soon to a screen, or a robot, near you. We’ll know more soon enough.)
MIT’s list helps me think. But it doesn’t help me to engineer. For example, it doesn’t enable me to perform a failure analysis or a risk review. In safety-speak, it’s a list of end effects… stuff that can fail in the big-big picture, but not the causes of failure. To get engineering answers, we need engineering sources.
The Engineer’s Handbook: ISO 8800
The ISO 8800 standard was just released in December 2024, as a complement to the automotive safety standards ISO 26262 and ISO 21448. All those ISO digits probably put the average reader to sleep, but the main idea is that these standards together define safety practice for automotive electronics and software, including both traditionally defined functional safety (ISO 26262) and safety of the intended functionality for self-driving features like ADAS and autonomous vehicles (ISO 21448). ISO 8800 works as a kind of parallel addendum to the other two standards, giving guidance on AI safety and how to implement safety within a product development process.
The ISO 8800 standard is automotive by its nature, and that limits its use in some ways. But remember that automotive self-driving is a key early application of safety-critical AI. So the autonomous-vehicle (AV) view of AI risks and failure modes is better developed than in many other sectors.
ISO 8800 doesn’t provide one specific list or “fault model” for AI. But while there’s no master list, there is a wealth of practical checklists and references throughout the standard that provide great guidance to failure analysts. Fault models can absolutely be extracted from ISO 8800, and can even be made specific to given categories of root cause.
For example, Table 11-2 is titled ‘Examples of Dataset Insufficiencies.’ As the name promises, this table lists data insufficiency examples, organized by the positive “properties” of AI which we wish to achieve. The basic idea is obvious yet powerful: we seek a given property of correctness, and we can list examples of problems (in this case the generalized problem of dataset insufficiencies) which would violate or challenge that property.
The list below is a slightly adapted sampling from ISO 8800 Table 11-2, to show how it works; a sketch of how entries like these could be captured as a working fault model follows the excerpt. (Note that a primary use of AI in self-driving is the detection and classification of objects; the list is therefore focused on image processing and object detection generally):
‘Property: Accuracy
Dataset Insufficiency Example: The resolution of camera images is not sufficient according to expected AI model inputs for object detection.
Dataset Insufficiency Example: An AI system operates with sensors detecting certain types of obstacles at a given distance range and high speed, but the camera used is not adapted for the range and speed, yielding blurry images.
Dataset Insufficiency Example: The mesh used in lidar imaging of objects is not fine enough (number of points, spacing) to properly detect target obstacles.
Property: Completeness
Dataset Insufficiency Example: Few images have obstacles close to the camera in a dataset for obstacle detection.
Dataset Insufficiency Example: No night images are in the dataset even though the input space includes night time.
Dataset Insufficiency Example: Perturbations like noise, brightening, darkening, vibration, rotation, turbulence, blurring, booming, smear and interference are not reflected in the dataset.
Dataset Insufficiency Example: An AI system for traffic signal identification is trained with a dataset that does not contain data elements that have all the possible variations of traffic signal shape, height, positions, etc., outputted by the AI system.
Dataset Insufficiency Example: Missing information on the location of captured data does not allow one to analyze the geographical distribution of data and can cause undetected bias.
Property: Correctness or Fidelity
Dataset Insufficiency Example: Annotators manually create bounding boxes around objects inconsistently, which leads to object size per scenario being calculated differently.
Dataset Insufficiency Example: No distinction has been made between a motorcycle and its rider in an image label, though this is relevant for the driving task that the system performs.
….’ (list truncated here, but it continues in the standard)
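One way to make a table like this usable in day-to-day failure analysis is to capture it as a simple property-to-insufficiency mapping, which reviewers can walk through as a checklist. Here is a minimal sketch built from the sampled entries above; the property names and paraphrased examples come from the excerpt, while the data structure and helper function are my own illustration, not anything defined in ISO 8800.

```python
# Sketch of a tiny "fault model" derived from the ISO 8800 Table 11-2 excerpt.
# Contents paraphrase the sampled entries above; the structure itself is
# illustrative, not part of the standard.

DATASET_INSUFFICIENCIES = {
    "Accuracy": [
        "Camera resolution insufficient for expected AI model inputs",
        "Camera not adapted to required detection range/speed (blurry images)",
        "Lidar mesh too coarse (points, spacing) to detect target obstacles",
    ],
    "Completeness": [
        "Few images with obstacles close to the camera",
        "No night images although the input space includes night time",
        "Perturbations (noise, blur, rotation, smear, ...) not represented",
        "Traffic signal variations (shape, height, position) not covered",
        "Missing location metadata hides geographical bias",
    ],
    "Correctness or Fidelity": [
        "Inconsistent manual bounding boxes skew computed object sizes",
        "Motorcycle and rider not distinguished in image labels",
    ],
}

def review_checklist(prop: str) -> list[str]:
    """Return the 'what could violate this property' items for a dataset review."""
    return DATASET_INSUFFICIENCIES.get(prop, [])

if __name__ == "__main__":
    for item in review_checklist("Completeness"):
        print(f"- {item}")
```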
This single table from ISO 8800 is four pages long. The standard itself is 167 pages long, and is filled with practical fault-relevant lists for practicing engineers and developers. Some lists, like the example above, are “negative space” lists, describing what can be wrong. Other lists are done in the “positive space,” providing examples of what it looks like to succeed, which can be inverted to generate terms for failures and/or bad behaviors.
Overall ISO 8800 is a great source of practical thinking where AI fault models are concerned. It’s a fine tool for autonomous vehicle development. And it’s easily extensible into related technology fields like autonomous robotics, AI-enabled machinery and operations, etc. But it’s not always generalizable to the AI that we think about every day (specifically, LLMs and their numerous cousins with whom we interact online). For that, something simpler and more universal would be useful.
The Generalist’s Short-List: NIST AI Risk Management Framework
The NIST AI Risk Management Framework (or ‘RMF’ for short) provides a generalized framework for managing AI risks. I have some quibbles with the details of the RMF and don’t consider it flawless. But it has an important upside: it’s built as a general framework which can be extended into any AI situation. In this it differs from the automotive-only scope of ISO 8800.
We’re all imagining the future of AI, and we all imagine differently. My personal imagination is that AI will look a lot different than today’s stand-alone LLMs. I think it’s only a matter of time (likely months instead of years) before LLM-like capabilities are combined with agent-like AIs who act on our behalf, coupled with corporate and government agents who also act on behalf of their ‘owners’. Throw some robotics in for good measure, stir in pervasive AI in video and social media, and things get strange pretty fast. In a world of rapid change, adaptable frameworks are the most workable frameworks. NIST’s AI RMF is such a framework.
The NIST AI RMF lists seven high-level characteristics of trustworthy AI. It is thought, though not extensively proven, that ‘trustworthiness’ gets to the heart of what we’re seeking in good AI, and what we’re lacking in bad AI. (I wonder aloud if that is completely correct… but let’s leave it for another article.) The seven characteristics of trustworthy AI according to the NIST AI RMF are as follows (with some limited text selected from the framework itself):
Valid and Reliable: ‘… Validity and reliability for deployed AI systems are often assessed by ongoing testing or monitoring that confirms a system is performing as intended. Measurement of validity, accuracy, robustness, and reliability contribute to trustworthiness and should take into consideration that certain types of failures can cause greater harm…’
Safe: ‘AI systems should “not under defined conditions, lead to a state in which human life, health, property, or the environment is endangered” (Source: ISO/IEC TS 5723:2022). Safe operation of AI systems is improved through:
responsible design, development, and deployment practices;
clear information to deployers on responsible use of the system;
responsible decision-making by deployers and end users; and
explanations and documentation of risks based on empirical evidence of incidents.’
Secure and Resilient: ‘AI systems, as well as the ecosystems in which they are deployed, may be said to be resilient if they can withstand unexpected adverse events or unexpected changes in their environment or use – or if they can maintain their functions and structure in the face of internal and external change and degrade safely and gracefully when this is necessary (Adapted from: ISO/IEC TS 5723:2022). Common security concerns relate to adversarial examples, data poisoning, and the exfiltration of models, training data, or other intellectual property through AI system endpoints.’
Accountable and Transparent: ‘Transparency reflects the extent to which information about an AI system and its outputs is available to individuals interacting with such a system – regardless of whether they are even aware that they are doing so.’
Explainable and Interpretable: ‘Explainability refers to a representation of the mechanisms underlying AI systems’ operation, whereas interpretability refers to the meaning of AI systems’ output in the context of their designed functional purposes. Together, explainability and interpretability assist those operating or overseeing an AI system, as well as users of an AI system, to gain deeper insights into the functionality and trustworthiness of the system, including its outputs.’
Privacy-Enhanced: ‘Privacy refers generally to the norms and practices that help to safeguard human autonomy, identity, and dignity. These norms and practices typically address freedom from intrusion, limiting observation, or individuals’ agency to consent to disclosure or control of facets of their identities (e.g., body, data, reputation).’
Fair - with Harmful Bias Managed: ‘Fairness in AI includes concerns for equality and equity by addressing issues such as harmful bias and discrimination. Standards of fairness can be complex and difficult to define because perceptions of fairness differ among cultures and may shift depending on application. Organizations’ risk management efforts will be enhanced by recognizing and considering these differences.’
This list is written in the “positive space,” as a list of things that are right with good AI. To get the list of problems, simply invert each one into the “negative space.” (For example, “Valid and Reliable” may become “Invalid or Unreliable” if we wish to think about wrong/bad/incorrect AI.)
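As a toy illustration of that inversion, the sketch below maps each of the seven NIST characteristics to a negative-space counterpart. The characteristic names come from the RMF; the inverted phrasings are my own wording, not NIST’s.

```python
# Toy sketch of the positive-to-negative inversion described above.
# The seven characteristics are from the NIST AI RMF; the inverted
# "failure" phrasings are my own wording, not NIST's.

TRUSTWORTHY_TO_FAILURE = {
    "Valid and Reliable": "Invalid or Unreliable",
    "Safe": "Unsafe",
    "Secure and Resilient": "Insecure or Brittle",
    "Accountable and Transparent": "Unaccountable or Opaque",
    "Explainable and Interpretable": "Unexplainable or Uninterpretable",
    "Privacy-Enhanced": "Privacy-Violating",
    "Fair - with Harmful Bias Managed": "Unfair, with Unmanaged Harmful Bias",
}

# The negative-space terms can seed a top-level failure-mode list for an AI system.
for positive, negative in TRUSTWORTHY_TO_FAILURE.items():
    print(f"{positive:40s} -> {negative}")
```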
Anyone seeking engineering rigor may be disappointed in a list like this one. After all, the seven are broad generalities and not always crisply defined. But as noted before, in a world of rapid change, a flexible and even mutable list of problems can be very useful as a starting point.
The Risk Manager’s Guidebook: ISO/IEC 23894
Finally we come to a document written for the specific purpose of managing AI risks. Similar in rough outline to the NIST AI RMF, the ISO/IEC 23894 standard is targeted at broad-purpose AI usage.
The NIST guide uses what I think of as American-style language for conformity to a process. NIST says something like “Here is a framework. Use it to manage risk if you want to, or need to, or both.” As an organization, you can do with it what you like. The more European-style ISO/IEC 23894 is a bit less inviting, and more amenable to being formally policed. An ISO standard is technically voluntary on its own. But it often serves as a standard of reference in both regulation and in tort law, creating a basis for judgments on thorny topics such as “Was this person (or organization) liable for the harm that occurred?” ISO standards tend to bring out auditors, inspectors, assessors, etc.; and the writers of ISO standards often intend this in their writings.
With that said, the big ideas of the NIST AI RMF and ISO/IEC 23894 are at least somewhat aligned regarding the potential for AI to be wrong/bad/incorrect. Two lists are relevant in the 23894. One is a “positive space” list of Objectives for organizations using AI. The other is a “negative space” list of Risk Sources relevant to AI. (You can find the full lists, with supporting text and discussion, in Annexes A and B of the standard.) A sketch of how the two lists might be paired in practice follows them below.
Possible Objectives for AI as summarized in ISO/IEC 23894 Annex A:
Accountability
AI Expertise
Availability and Quality of Training and Test Data
Environmental Impact
Fairness
Maintainability
Privacy
Robustness
Safety
Security
Transparency and Explainability
Risk Sources which should be taken into account for AI, according to ISO/IEC 23894 Annex B:
Complexity of Environment
Lack of Transparency and Explainability
Level of Automation
Risk Sources Related to Machine Learning
System Hardware Issues
System Lifecycle Issues
Technology Readiness
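As promised above, here is a sketch of one way the two lists can be paired in practice: each Annex B risk source is reviewed against the Annex A objectives it could undermine, and the pairing is held in a lightweight risk register. The specific pairings and notes below are illustrative assumptions for a hypothetical system, not mappings given in the standard.

```python
# Sketch of a lightweight risk register pairing ISO/IEC 23894 Annex B risk
# sources with the Annex A objectives they could undermine. The pairings and
# notes are illustrative assumptions, not mappings defined in the standard.
from dataclasses import dataclass, field

@dataclass
class RiskEntry:
    risk_source: str                  # from the Annex B list
    threatened_objectives: list[str]  # from the Annex A list
    notes: str = ""
    mitigations: list[str] = field(default_factory=list)

register = [
    RiskEntry(
        risk_source="Lack of Transparency and Explainability",
        threatened_objectives=["Transparency and Explainability", "Accountability"],
        notes="Black-box model; decisions cannot be traced for audit.",
    ),
    RiskEntry(
        risk_source="Risk Sources Related to Machine Learning",
        threatened_objectives=["Robustness", "Safety", "Fairness"],
        notes="Training-data gaps may surface as unsafe or biased behavior.",
    ),
]

for entry in register:
    print(f"{entry.risk_source}: threatens {', '.join(entry.threatened_objectives)}")
```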
The ISO/IEC 23894 is extremely useful. It plugs into the ISO literature on risk management with a set of guidelines and frameworks suited to AI. At a glance it seems a bit more complete than the NIST AI RMF, incorporating more specifics on root causes and/or ‘risk sources’ which are implied but not listed by NIST.
As an English minor and systems engineer, I slightly cringe at the linguistic inconsistency in the ISO/IEC 23894. Broad, sweeping themes like “Fairness” are intermingled and co-listed with specific AI-nerd domains like “Availability and Quality of Training and Test Data.” I’m sure both topics are important. I doubt they belong on the exact same list.
This points to a broader issue with all the lists (from ISO and other sources): they confuse and confound terminology regarding AI risks, hazards, failure effects, failures, faults, causes, problems, etc., into one swirling mass of words. These can all be generalized as “AI-bad” words… but how they fit together into an analytical picture is still being worked out, and varies by use case and industry. How we name things is important. (See prior articles for more on this point.) But maybe the derivation of globally consistent terms across widely varying industries is a fool’s errand for now. As risk managers and failure analysts, we will need to work with the language we have, even when it’s a bit rough around the edges. So: the development of detailed and extensible fault models is very much a work in progress. But at least we have some places to start.