3 AI Fails and Why They Happened

In little more than a decade, AI has made leaps and bounds. Every day, new headlines showcase the latest advancement in AI. In fact, advancements are accelerating:

  • 2004 DARPA sponsors a driverless car grand challenge. Technology developed by the participants eventually allows Google to develop a driverless automobile and modify existing transportation laws;
  • 2005 Honda’s ASIMO humanoid robot is able to walk as fast as a human, delivering trays to customers in a restaurant setting. The same technology is now used in military robots;
  • 2007 Computers learned to play a perfect game of checkers, and in the process opened the door for algorithms capable of searching vast databases of information;
  • 2011 IBM’s Watson wins Jeopardy against top human champions. It is currently training to provide medical advice to doctors. It is capable of mastering any domain of knowledge;
  • 2012 Google releases its Knowledge Graph, a semantic search knowledge base, likely to be the first step toward true artificial intelligence;
  • 2013 Facebook releases Graph Search, a semantic search engine with intimate knowledge about Facebook’s users, essentially making it impossible for us to hide anything from the intelligent algorithms;
  • 2013 BRAIN initiative aimed at reverse engineering the human brain receives 3 billion US dollars in funding by the White House, following an earlier billion euro European initiative to accomplish the same;
  • 2014 Chatbot convinced 33% of the judges that it was human and by doing so passed a restricted version of a Turing Test;
  • 2015 Single piece of general software learns to outperform human players in dozens of Atari video games;
  • 2016 Go playing deep neural network beats world champion.

Source: Artificial Intelligence Safety and Cybersecurity: a Timeline of AI Failures, https://arxiv.org/pdf/1610.07997.pdf

However, little information is shared about failures in AI, and even less about why they happen.

Failure is king

Failure is at the core of human advancement. For example, the microwave oven was born of an accident during military radar work in WW2: Percy Spencer noticed a chocolate bar melting in his pocket while working on magnetrons for Raytheon, a major U.S. defense contractor.

“From that embarrassing accident came a multimillion dollar industry — and one of the great twin blessings and curses of the American kitchen.” – Wired

More recently, major corporations have begun to embrace the value of failure. Unsurprisingly, Gatorade’s brand strategy is all about winning; it’s been at the core of their marketing efforts for the past decade. What’s more surprising is their latest campaign, which celebrates failure.

AI is no different

As a recent field in which many efforts are still considered R&D investments, AI is producing notable failures. This is a normal path toward improvement. To list a few:

1959 AI designed to be a General Problem Solver failed to solve real world problems

1982 Software designed to make discoveries, discovered how to cheat instead

1983 Nuclear attack early warning system falsely claimed that an attack was taking place

2010 Complex AI stock trading software caused a trillion dollar flash crash

2011 E-Assistant told to “call me an ambulance” began to refer to the user as Ambulance

2013 Object recognition neural networks saw phantom objects in particular noise images

2015 A robot for grabbing auto parts grabbed and killed a man

2015 Image tagging software classified black people as gorillas

2015 Adult content filtering software failed to remove inappropriate content

2016 AI designed to predict recidivism acted racist

2016 Game NPCs designed unauthorized superweapons

2016 Patrol robot collided with a child

2016 World champion-level Go playing AI lost a game

2016 Self driving car had a deadly accident

While most of these accidents or mishaps are not directly caused by the AI itself failing, they are all linked to AI in some way. For example, in the self-driving car accident, it is impossible to say whether the AI was responsible, but we can ask the question: could the accident have been avoided if the AI had been better trained? It goes to show that as we move forward with AI development, we must be extremely confident in the algorithms making decisions on our behalf, particularly when those decisions involve complex variables and the consequences can be fatal, as in the case of driverless cars.

Let’s take three recent and widely covered examples of AI failures.

#1 Tay, Microsoft’s (racist and bigoted) chatbot

The most recognized failure in AI this past year has easily been Tay, “[…] an artificially intelligent chatbot developed by Microsoft’s Technology and Research and Bing teams to experiment with and conduct research on conversational understanding. Tay is designed to engage and entertain people where they connect with each other online through casual and playful conversation. The more you chat with Tay the smarter she gets, so the experience can be more personalized for you.”

Tay was an attempt at Natural Language Understanding (NLU). Basically, its learning algorithms were set to read, interpret, and adapt to the written content users fed it. The goal was to personalize and personify interactions with a robot, a key strategic advancement many tech giants would like to see accomplished. The target was something along the lines of Her, and we can clearly see why. In the high-tech sector, there are usually three pillars to commercial success: acquisition, engagement, and conversion. A fully human and personal experience that could pass a rigorous Turing test would redefine how we go about creating engagement.

#2 Alexa mistakenly offers porn to child

It is difficult not to crack a smile when listening to this one: when the child in the video tells Alexa to “play ‘Digger, Digger,’” Alexa answers, “You want to hear a station for porn detected…hot chick amateur girl sexy…”(full article).

Some would argue that this isn’t an AI failure but rather a voice-recognition one. While they would be right, keep in mind that Alexa’s voice recognition is trained with machine learning.

#3 InspiroBot gives questionable advice

My personal favorite, InspiroBot, is designed to provide you with a daily dose of inspiring quotes. Ironically, the fact that it regularly fails at creating motivational messages will most likely brighten your day. At Arcbees, we enjoy the occasional dark-humor joke. We had a couple of good laughs when these came up:

It is only appropriate to follow-up these quotes with another:

“Those who cannot learn from history are doomed to repeat it.” – George Santayana

In cases like these, AI fails can be entertaining, but I’m more interested in the reasons they failed and what we can learn from them.

Why Did They Fail?


AI has shown it can produce results in all industries: predicting insurance sales opportunities, reducing medical research times, automating production lines, optimizing transportation routes, and much more. While its domains of application are far-reaching, successfully putting AI into production requires a very specific problem to solve.

For example, fraud detection can be framed very narrowly when treated with a neural network that has few inputs and outputs. The output can be limited to two classes: each transaction is either fraudulent or legitimate. Remember, in this type of situation you’re training a model to correctly classify data into two classes. With such a limited number of possible outputs, it is easier to build an algorithm that classifies transactions efficiently.
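To make this concrete, here is a minimal sketch of that two-class framing: a single logistic neuron trained by gradient descent in plain Python. The two features (normalized transaction amount, odd-hour flag) and all the data are hypothetical; a real system would use far richer inputs, but the narrow output space is what makes the problem tractable.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical labeled transactions: ((amount, odd_hour), is_fraud)
data = [
    ((0.10, 0), 0), ((0.20, 0), 0), ((0.15, 0), 0), ((0.30, 0), 0),
    ((0.90, 1), 1), ((0.80, 1), 1), ((0.95, 0), 1), ((0.85, 1), 1),
]

w, b, lr = [0.0, 0.0], 0.0, 0.5
for _ in range(2000):
    for (x1, x2), y in data:
        p = sigmoid(w[0] * x1 + w[1] * x2 + b)
        err = p - y  # gradient of the log loss w.r.t. the pre-activation
        w[0] -= lr * err * x1
        w[1] -= lr * err * x2
        b -= lr * err

def classify(x1, x2):
    score = sigmoid(w[0] * x1 + w[1] * x2 + b)
    return "fraudulent" if score > 0.5 else "legitimate"

print(classify(0.92, 1))  # a large, odd-hour transaction
print(classify(0.12, 0))  # a small, daytime transaction
```

With only two possible answers, even this tiny model separates the toy data cleanly; the hard part in practice is the feature engineering, not the output space.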

Tay failed in part because of its lack of precision. The desired output, other than grammatically correct interactions, was not bound to clearly defined parameters. That’s the challenge, though: human interaction is not precise. People participating in the Tay experiment used different vocabulary and syntax, producing dispersed and highly variable input data, which made it difficult to build coherent results.


For all three examples and for AI in general, context remains a challenge. Context is an extension of precision in some regards but still merits a place in the conversation especially in cases where humans interact with AI. If you chat with Tay, ask Alexa for information, or look to InspiroBot for motivation, you are set in a context where time, place, emotion, weather, identity, company, etc. will impact how you interpret and appreciate the provided outcome.

A classic example would be: “Hey Siri, call me an ambulance”, and she replies: “OK, from now on I will call you Ambulance”. It succeeds in automating a task but fails in understanding the context in which the task is given.

Tay failed to act as a respectful conversational virtual agent because both its training and its interactions were subject to unlimited contexts. It was able to identify words and build minimally coherent responses but wasn’t able to understand the meaning of those words and the weight they carried throughout a conversation. Virtual agents do work, however, when they are specific to a context: when reporting an accident to your insurer, for example, the subject matter and the possible questions and answers are far less ambiguous. That said, businesses usually opt for a decision-tree model when creating these agents.
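As a sketch of that decision-tree approach, here is a minimal accident-reporting agent in plain Python. The questions, branches, and wording are all invented for illustration; the point is that every path through the conversation is fixed in advance, leaving no ambiguity for a learning algorithm to resolve.

```python
# Internal nodes: question plus a branch per allowed answer.
TREE = {
    "start": ("Was anyone injured?", {"yes": "emergency", "no": "vehicle"}),
    "vehicle": ("Is your vehicle drivable?", {"yes": "claim", "no": "tow"}),
}

# Leaves: final responses that end the conversation.
LEAVES = {
    "emergency": "Please call emergency services first, then contact us.",
    "claim": "Let's open a claim. What is your policy number?",
    "tow": "We will dispatch a tow truck. What is your location?",
}

def run_agent(answers):
    """Walk the tree with a list of user answers; return the transcript."""
    node, transcript = "start", []
    for answer in answers:
        question, branches = TREE[node]
        transcript.append(question)
        node = branches[answer]
        if node in LEAVES:
            transcript.append(LEAVES[node])
            return transcript
    return transcript

print(run_agent(["no", "yes"]))
```

Nothing here is learned; the trade-off is that the agent can only handle the situations its designers anticipated.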

Similarly, InspiroBot fails because positivity is in the realm of context. Its content, while generic, is sufficiently rich and descriptive to nourish our interpretation of the possible applications of its advice. It successfully creates quotes but lacks the intelligence to understand the content, meaning, and possible interpretation of its quotes.


While neural networks can use backpropagation to adjust themselves toward the desired result, they are bound to do so with the data and parameters they are trained with.
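The mechanics can be sketched in a few lines of plain Python: a one-hidden-layer network trained by backpropagation on XOR. The architecture, learning rate, and epoch count are arbitrary toy choices; the takeaway is that the error signal flowing backward can only adjust the weights within whatever data and parameters the network is given.

```python
import math
import random

random.seed(0)  # fixed seed so the toy run is reproducible

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# XOR: a task a single neuron cannot learn, but a hidden layer can.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

H = 3  # hidden units
w1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(H)]
b1 = [0.0] * H
w2 = [random.uniform(-1, 1) for _ in range(H)]
b2, lr = 0.0, 0.5

def forward(x):
    h = [sigmoid(w1[j][0] * x[0] + w1[j][1] * x[1] + b1[j]) for j in range(H)]
    return h, sigmoid(sum(w2[j] * h[j] for j in range(H)) + b2)

def total_loss():
    return sum((forward(x)[1] - y) ** 2 for x, y in data)

initial = total_loss()
for _ in range(5000):
    for x, y in data:
        h, p = forward(x)
        d_out = (p - y) * p * (1 - p)  # error at the output neuron
        for j in range(H):
            # Backpropagation: push the output error through w2 to the hidden layer.
            d_hid = d_out * w2[j] * h[j] * (1 - h[j])
            w2[j] -= lr * d_out * h[j]
            w1[j][0] -= lr * d_hid * x[0]
            w1[j][1] -= lr * d_hid * x[1]
            b1[j] -= lr * d_hid
        b2 -= lr * d_out

print(initial, total_loss())
```

The loss drops as the weights adjust, but feed the same loop biased or adversarial data and it will just as faithfully learn that instead; the mechanism has no notion of what it *should* learn.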

You’ve probably heard the expression: garbage in, garbage out. With Tay, this played a major role in its failure. Instead of training the chatbot behind closed doors in a controlled environment before releasing it to the world, Microsoft designed Tay to learn while interacting with the open public. Everything went haywire in less than 24 hours because tech-savvy communities (notably 4chan and 8chan) thought it would be interesting to feed the learning algorithms questionable content. Needless to say, they succeeded.

With Alexa, it’s a little different. Commands are already set to trigger the appropriate response. Alexa’s training aims to understand which commands to trigger according to the audio clip it captures. Its success lies in matching commands against a wide range of vocabulary, syntax, pitch, tone, rhythm, accent, and pronunciation. The hard part is balance: using a large enough variety of audio patterns to match the world’s diversity while being specific enough to match them with the correct commands. Pushing for this balance may also mean a bigger margin of error, and this is why Alexa failed in this case. With more training, Alexa could be taught to identify a child’s voice and prompt a parental control if necessary.

If InspiroBot used fewer words, template sentences, and a prevalidated optimistic vocabulary, it would be easier to improve its performance at creating what we consider to be motivational quotes. However, that would also defeat the purpose of using AI: oversimplifying the parameters negates the use of machine learning, because it becomes simpler to model the algorithm without it.
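To illustrate the trade-off, here is what that non-ML alternative might look like: a template generator drawing from a small prevalidated vocabulary (templates and word lists invented here for illustration). It can never produce an offensive or nonsensical quote, but there is also nothing for machine learning to improve.

```python
import random

random.seed(42)  # fixed seed so the example run is reproducible

# Prevalidated templates and optimistic vocabulary (hypothetical).
TEMPLATES = [
    "Your {noun} is the {adjective} part of your journey.",
    "Never let doubt keep your {noun} from being {adjective}.",
]
WORDS = {
    "noun": ["dream", "courage", "passion"],
    "adjective": ["brightest", "strongest", "truest"],
}

def make_quote():
    template = random.choice(TEMPLATES)
    return template.format(noun=random.choice(WORDS["noun"]),
                           adjective=random.choice(WORDS["adjective"]))

print(make_quote())
```

Every output is guaranteed safe by construction, which is exactly why it is no longer an AI: the interesting (and risky) part of InspiroBot is the open-ended generation.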

Embracing failure

We learn from our mistakes. It’s for this reason we should embrace our failures in AI. If we really do believe in the advancement of AI as a community, we should share, discuss, analyze, and experiment with failure. Have you seen or experienced any failures in AI? Feel free to share them in the comments section or hit me up on Twitter.