The State of AI 2019: Divergence

Chapter 1: What is AI?

Modern AI – ‘machine learning’ – enables software to perform difficult tasks more effectively by learning through training instead of following sets of rules. Deep learning, a subset of machine learning, is delivering breakthrough results in fields including computer vision and language processing.

Summary

  • ‘AI’ is a general term that refers to hardware or software that exhibits behaviour which appears intelligent.
  • Basic AI has existed since the 1950s, via rules-based programs that display rudimentary intelligence in limited contexts. Early forms of AI included ‘expert systems’ designed to mimic human specialists.
  • Rules-based systems are limited. Many real-world challenges, from making medical diagnoses to recognising objects in images, are too complex or subtle to be solved by programs that follow sets of rules written by people.
  • Excitement regarding modern AI relates to a set of techniques called machine learning, where advances have been rapid and significant. Machine learning is a subset of AI. All machine learning is AI, but not all AI is machine learning.
  • Machine learning enables programs to learn through training, instead of being programmed with rules. By processing training data, machine learning systems provide results that improve with experience.
  • Machine learning can be applied to a wide variety of prediction and optimisation challenges, from determining the probability of a credit card transaction being fraudulent to predicting when an industrial asset is likely to fail.
  • There are more than 15 approaches to machine learning. Popular methodologies include random forests, Bayesian networks and support vector machines.
  • Deep learning is a subset of machine learning that is delivering breakthrough results in fields including computer vision and language. All deep learning is machine learning, but not all machine learning is deep learning.
  • Deep learning emulates the way animals’ brains learn subtle tasks – it models the brain, not the world. Networks of artificial neurons process input data to extract features and optimise variables relevant to a problem, with results improving through training.

Recommendations

Executives

  • Familiarise yourself with the concepts of rules-based software, machine learning and deep learning.
  • Explore why AI is important and its many applications (Chapter 2).
  • Identify sources of AI expertise, and existing AI projects, within your organisation.

Entrepreneurs

  • To identify opportunities for value creation, explore the many applications for AI (Chapter 2) and AI’s implications for markets (Chapter 8).
  • Familiarise yourself with current developments in AI technology (Chapter 5). New approaches and novel techniques offer new possibilities.

Investors

  • Ensure that portfolio company executives are familiar with the concepts of machine learning and deep learning.
  • Explore how the limits of rules-based systems are inhibiting portfolio companies. What problems are too complex, or subtle, to be solved by rules-based systems?
  • Familiarise yourself with the different approaches to machine learning, to enable you to differentiate between companies deploying meaningful AI and pretenders.

Policy-makers

  • AI will impact every industry. Explore Chapter 2 to familiarise yourself with the many applications of AI.
  • Explore the positive implications of AI and the risks it poses to society (Chapter 8).

Explore our AI Playbook, a blueprint for developing and deploying AI, at www.mmcventures.com/research.

AI: the science of intelligent programs

Coined in 1956 by Dartmouth Assistant Professor John McCarthy, ‘Artificial Intelligence’ (AI) is a broad term that refers to hardware or software that exhibits behaviour which appears intelligent. AI is “the science and engineering of making intelligent machines, especially intelligent computer programs” (John McCarthy).

“AI is a general term that refers to hardware or software that exhibits behaviour which appears intelligent.”

Early AI: rules-based systems

Basic AI has existed for decades, via rules-based programs that display rudimentary intelligence in specific contexts.

‘Expert systems’ were a popular form of early AI. Programmers codified into software a body of knowledge regarding a specific field and a set of rules. Together, these components were designed to mimic a human expert’s decision-making process.

SRI International’s PROSPECTOR system of 1977 (Fig. 1) was intended to assist geologists’ mineral exploration work. Incorporating extensive subject matter information and over 1,000 rules, the system was designed to emulate the process followed by a geologist investigating the potential of a drilling site (Fig. 2).

While expert systems experienced some success (PROSPECTOR predicted the existence of an unknown molybdenum deposit in Washington State), their capabilities were typically limited.

Fig 1. PROSPECTOR Expert System: 1977 Technical Note (Cover)

Source: SRI International

Fig 2. PROSPECTOR Expert System: 1977 Technical Note (Detail: Decision Tree)

Source: SRI International

The limits of rules-based systems

Rules-based systems are limited because many real-world challenges are too complex, or too subtle, to be solved by programs that follow sets of rules written by people. Providing a medical diagnosis, operating a vehicle, optimising the performance of an industrial asset (Fig. 3) and developing an optimised investment portfolio are all complex problems. Each involves processing large volumes of data, with numerous variables and non-linear relationships between inputs and outputs. It is impractical, and frequently impossible, to write a set of rules – such as a set of ‘if…then’ statements – that will produce useful and consistent results.
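
To make the limitation concrete, consider a minimal rules-based fraud check written as ‘if…then’ statements. This is a hypothetical sketch; the field names and thresholds below are invented for illustration, not drawn from a real system.

```python
# A hand-written, rules-based fraud check (illustrative only; the
# field names and thresholds are hypothetical).
def is_fraudulent(transaction: dict) -> bool:
    # Every rule must be anticipated and maintained by a person.
    if transaction["amount"] > 5000:
        return True          # large transactions
    if transaction["country"] != transaction["home_country"]:
        return True          # foreign transactions
    if transaction["hour"] < 5:
        return True          # early-morning transactions
    return False

print(is_fraudulent(
    {"amount": 120, "country": "UK", "home_country": "UK", "hour": 14}))
```

Rules of this kind cannot capture interactions between variables – a small, foreign, early-morning transaction may be fraudulent while a large domestic one is legitimate – and every exception demands another hand-written rule.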

Fig 3. Industrial asset optimisation is a complex problem

Source: Alamy

Machine learning: software that learns through training

What if the burden of finding solutions to complex problems could be transferred from the programmer to their program? This is the promise of modern AI.

Excitement regarding modern AI relates to a set of techniques called machine learning, where advances have been significant and rapid. Machine learning is a subset of AI (Fig. 4). All machine learning is AI, but not all AI is machine learning.

Machine learning shifts much of the burden of writing intelligent software from the programmer to their program, enabling more complex and subtle problems to be solved. Instead of codifying rules for programs to follow, programmers enable programs to learn. Machine learning is the “field of study that gives computers the ability to learn without being explicitly programmed” (Arthur Samuel).

Machine learning algorithms learn through training. In a simplified example, an algorithm is fed inputs – training data – whose outputs are usually known in advance (‘supervised learning’). The algorithm processes the input data to produce a prediction or recommendation, and the difference between the algorithm’s output and the correct output is determined. If the algorithm’s output is incorrect, the processing function in the algorithm is adjusted to improve the accuracy of its predictions. Initially, the results of a machine learning algorithm will be poor. However, as larger volumes of training data are provided, a program’s predictions can become highly accurate (Fig. 5).
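
The loop below is a minimal sketch of this supervised-learning cycle, using scikit-learn and synthetic data (both illustrative choices, not prescribed by the report): a classifier is fitted to progressively larger training sets, and its accuracy is measured on held-out examples.

```python
# A minimal supervised-learning sketch: synthetic data, a simple
# classifier, and accuracy measured on examples held out of training.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit on progressively larger training sets: predictions typically
# improve with experience, then plateau (compare Fig. 5).
for n in (50, 500, 5_000):
    model = LogisticRegression(max_iter=1_000).fit(X_train[:n], y_train[:n])
    print(n, round(model.score(X_test, y_test), 3))
```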

Fig 4. The Evolution of AI: machine learning

Source: MMC Ventures

Fig 5. Large data sets enable effective machine learning

Source: Michael Nielsen. Note: the size of the data set required to train a machine learning algorithm is context-dependent and cannot be generalised

The defining characteristic of a machine learning algorithm, therefore, is that the quality of its predictions improves with experience. Typically, the more relevant data provided to a machine learning system, the more effective its predictions (up to a point).

By learning through practice, instead of following sets of rules, machine learning systems deliver better solutions than rules-based systems to numerous prediction and optimisation challenges.

There are many approaches to machine learning

There are more than 15 approaches to machine learning. Each uses a different form of algorithmic architecture to optimise predictions based on input data.

One of these, deep learning, is delivering breakthrough results in new domains; we explain it below. Others receive less attention but are widely used, given their utility across a broad range of use cases. Popular machine learning algorithms beyond deep learning include:

  • Random forests that create multitudes of decision trees to optimise predictions. Random forests are used by nearly half of data scientists (Kaggle).
  • Bayesian networks that use probabilistic approaches to analyse variables and the relationships between them. One third of data scientists use Bayesian networks (Kaggle).
  • Support vector machines that are fed categorised examples and create models to assign new inputs to one of the categories. A quarter of data scientists employ support vector machines (Kaggle).

Each approach offers advantages and disadvantages, and combinations are frequently used (an ‘ensemble’ method). In practice, developers often experiment to determine what is effective, as the sketch below illustrates.
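
As an illustration, the sketch below fits each of these algorithms to the same synthetic data and then combines them into a simple voting ensemble. It uses scikit-learn, and naive Bayes stands in here for the broader family of Bayesian methods; all names and parameters are our own choices.

```python
# Comparing several machine learning approaches, then combining them.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = make_classification(n_samples=2_000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = [
    ("rf", RandomForestClassifier(random_state=0)),  # random forest
    ("nb", GaussianNB()),                            # a simple Bayesian method
    ("svm", SVC()),                                  # support vector machine
]
for name, model in models:
    print(name, model.fit(X_train, y_train).score(X_test, y_test))

# An 'ensemble' combines several models into a single predictor.
ensemble = VotingClassifier(models).fit(X_train, y_train)
print("ensemble", ensemble.score(X_test, y_test))
```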

Machine learning can be applied to a wide variety of prediction and optimisation challenges. Examples include: assessing whether a credit card transaction is fraudulent; identifying products a person is likely to buy given their prior purchases; and predicting when an industrial asset is likely to experience mechanical failure.

“The defining characteristic of a machine learning algorithm is that the quality of its predictions improves with experience.”

Deep learning: offloading feature specification

Even with the power of general machine learning, it is difficult to develop programs that perform certain tasks well – such as understanding speech or recognising objects in images.

In these cases, programmers cannot specify the features in the input data to optimise. For example, it is difficult to write a program that identifies images of dogs. Dogs vary significantly in their visual appearance. These variations are too broad to be described by a set of rules that will consistently enable correct classification (Fig. 6). Even if an exhaustive set of rules could be created, the approach would not be scalable; a new set of rules would be required for every type of object we wished to classify.

Deep learning is delivering breakthrough results in these use cases. It is a subset of machine learning and one of many approaches to it (Fig. 7). All deep learning is machine learning, but not all machine learning is deep learning.

Fig 6. Identifying features can be difficult (‘Dalmatians or ice cream?’)

Source: Google images

Fig 7. The Evolution of AI: deep learning

Source: MMC Ventures

“Even with the power of general machine learning, it is difficult to develop programs that perform certain tasks well – such as understanding speech or recognising objects in images.”

Fig 8. Deep learning offloads the burden of feature extraction from a programmer to their program

Source: MMC Ventures

Deep learning is valuable because it transfers an additional burden – the process of feature extraction – from the programmer to their program (Fig. 8).

Humans learn to complete subtle tasks, such as recognising objects and understanding speech, not by following rules but through practice and feedback. As children, individuals experience the world (see a dog), make a prediction (‘dog’) and receive feedback. Humans learn through training.

Deep learning works by recreating the mechanism of the brain (Fig. 9) in software (Fig. 10). With deep learning we model the brain, not the world.

To undertake deep learning, developers create artificial neurons – software-based calculators that approximate, crudely, the function of neurons in a brain. Artificial neurons are connected to form a neural network. The network receives an input (such as a picture of a dog), extracts features and offers a determination. If the output of the neural network is incorrect, the connections between the neurons adjust to alter its future predictions. Initially, the network’s predictions will frequently be incorrect. However, as the network is fed many examples (potentially, millions) in a domain, the connections between neurons become finely tuned. When analysing new examples, the artificial neural network will then make consistently correct determinations.
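
The sketch below illustrates this adjustment process at the smallest possible scale: a single artificial neuron trained with the classic perceptron rule. This is a deliberately simple update, shown for intuition; real deep networks contain many neurons and are trained with more sophisticated methods such as backpropagation.

```python
import numpy as np

# One artificial neuron learning a simple task: when a prediction is
# wrong, the connection weights are nudged towards the correct answer.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))             # two input features per example
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # a target the neuron can learn

w, b, lr = np.zeros(2), 0.0, 0.1          # weights, bias, learning rate
for _ in range(10):                       # repeated passes over the data
    for xi, target in zip(X, y):
        prediction = int(xi @ w + b > 0)
        error = target - prediction       # zero when the output is correct
        w += lr * error * xi              # adjust the 'connections'
        b += lr * error

print(np.mean((X @ w + b > 0).astype(int) == y))  # accuracy after training
```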

“To undertake deep learning, developers create artificial neurons – software-based calculators that approximate, crudely, the function of neurons in a brain.”

Fig 9. A biological neural network

Source: iStock

Fig 10. An artificial neural network

Source: MMC Ventures

Deep learning has unlocked significant new capabilities, particularly in the domains of vision and language. Deep learning enables:

  • autonomous vehicles to recognise entities and features in the world around them (Fig. 11);
  • software to identify tumours in medical images;
  • Apple and Google to offer voice recognition systems in their smartphones;
  • voice-controlled devices, such as the Amazon Echo;
  • real-time language translation (Fig. 12);
  • sentiment analysis of text; and more.

Deep learning is not suited to every problem. Typically, it requires large data sets for training, and training and operating a neural network demand extensive processing power. Further, it can be difficult to identify how a neural network arrived at a specific prediction – a challenge of ‘explainability’.

However, by freeing programmers from the burden of feature extraction, deep learning has delivered effective prediction engines for a range of important use cases and is a powerful tool in the AI developer’s arsenal.

“Deep learning has delivered effective prediction engines for a range of important use cases and is a powerful tool in the AI developer’s arsenal.”

Fig 11. Deep learning enables autonomous vehicles to identify objects around them

Source: Museum of Computer Science, MTV, CA

Fig 12. Google’s Pixel Buds use deep learning to provide real-time language translation

Source: Google / Pixel Buds

How does deep learning work?

Deep learning involves creating artificial neural networks – software-based calculators (artificial neurons) that are connected to one another.

An artificial neuron (Fig. 13) has one or more inputs. The neuron performs a mathematical function on its inputs to deliver an output. The output depends on the weights given to each input and on the configuration of the input-output function in the neuron. The input-output function can vary; an artificial neuron may be a:

  • linear unit (the output is proportional to the total weighted input);
  • threshold unit (the output is set to one of two levels, depending on whether the total input is above a specified value);
  • sigmoid unit (the output varies continuously, but not linearly, as the input changes).
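
These three unit types can be written directly in code. The function below is our own minimal formulation: it computes a weighted sum of the inputs, adds a bias, and applies the chosen input-output function.

```python
import numpy as np

def neuron(inputs, weights, bias, kind="sigmoid"):
    """One artificial neuron: a weighted sum passed through a function."""
    total = np.dot(inputs, weights) + bias
    if kind == "linear":      # output proportional to total weighted input
        return total
    if kind == "threshold":   # one of two levels, set by the total input
        return 1.0 if total > 0 else 0.0
    return 1.0 / (1.0 + np.exp(-total))   # sigmoid: smooth and non-linear

x = np.array([0.5, -1.2, 3.0])
w = np.array([0.4, 0.1, -0.6])
for kind in ("linear", "threshold", "sigmoid"):
    print(kind, neuron(x, w, bias=0.2, kind=kind))
```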

An artificial neural network (Fig. 14) is created when artificial neurons are connected to each other. The output of one neuron becomes an input for another.

“An artificial neural network is created when artificial neurons are connected together. The output of one neuron becomes an input for another.”

Fig 13. An artificial neuron

Source: MMC Ventures

Fig 14. An artificial neural network

Source: MMC Ventures

Neural networks are organised into multiple layers of neurons (Fig. 15) – hence ‘deep’ learning. An input layer receives information to be processed, such as a set of pictures. An output layer delivers results. Between the input and output layers are layers referred to as ‘hidden layers’, where features are detected. Typically, the outputs of neurons on one layer of the network all serve as inputs to each neuron in the next layer.
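
In code, such a stack of layers reduces to repeated matrix multiplication: each layer's output vector becomes the next layer's input. The forward pass below is a minimal sketch, with layer sizes chosen arbitrarily for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
layer_sizes = [784, 64, 16, 1]   # input layer, two hidden layers, output
weights = [rng.normal(scale=0.1, size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

activation = rng.random(784)     # e.g. a flattened 28x28 pixel image
for W, b in zip(weights, biases):
    activation = sigmoid(activation @ W + b)   # each layer feeds the next

print(activation)                # the output layer's determination
```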

Fig 15. Deep learning: structuring an artificial neural network

Source: MMC Ventures

“Neural networks are organised into multiple layers of neurons – hence ‘deep’ learning. An input layer receives information to be processed, such as a set of pictures. An output layer delivers results.”

Fig. 16 illustrates a neural network designed to recognise pictures of human faces. When pictures are fed into the neural network, the first hidden layers identify patterns of local contrast (low-level features such as edges). As images traverse the hidden layers, progressively higher-level features are identified. Based on its training, at its output layer the neural network will deliver a probability that the picture is of a human face.

Typically, neural networks are trained by exposing them to a large number of labelled examples. As errors are detected, the weightings of the connections between neurons adjust to offer improved results. When the optimisation process has been repeated extensively, the system is deployed to assess unlabelled images.
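
In practice, the training loop is usually delegated to a framework. The sketch below uses Keras, one of several such frameworks; the data is synthetic and the architecture arbitrary, so it illustrates the cycle of exposure, error measurement and weight adjustment rather than a real image task.

```python
import numpy as np
import tensorflow as tf

# Stand-in data: random 'images' labelled by a rule the network can learn.
rng = np.random.default_rng(0)
X = rng.random((1000, 784)).astype("float32")
y = (X.mean(axis=1) > 0.5).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="sigmoid", input_shape=(784,)),
    tf.keras.layers.Dense(16, activation="sigmoid"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=5)           # each epoch adjusts the weightings
predictions = model.predict(X[:3])  # then assess new, unlabelled examples
```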

The structure and operation of the neural network below are simple (and simplified), but structures vary and most are more complex. Architectural variations include: connecting neurons on the same layer; varying the number of neurons per layer; and connecting neurons’ outputs into previous layers of the network (‘recurrent neural networks’).

It takes considerable skill to design and improve a neural network. AI professionals undertake multiple steps including: structuring the network for a particular application; providing suitable training data; adjusting the structure of the network according to progress; and combining multiple approaches to optimise results.

Fig 16. Deep learning: the process of feature extraction

Source: MMC Ventures, Andrew Ng