Advances in AI technology are creating new possibilities. Custom silicon is enabling a new generation of AI hardware. Emerging software techniques are delivering breakthroughs in multiple domains and decoupling progress from the constraints of human experience.
Explore our AI Playbook, a blueprint for developing and deploying AI, at www.mmcventures.com/research.
Training the neural networks that power many AI systems is computationally intensive. Graphical Processing Units (GPUs) – hardware that is efficient at performing the matrix mathematics required – have enabled extensive progress and transformed the field of AI (see Chapter 3). In the last decade, computing performance for AI has improved at a rate of 2.5x per year (IBM). The performance of GPUs will continue to increase.
However, GPUs were designed for graphics processing – not AI. Manufacturers exploited GPUs’ ability to perform matrix calculations when it became apparent that AI benefited from the same mathematics. Frequently, just a third of a GPU’s core area is used for AI.
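The matrix mathematics in question is the multiply-accumulate arithmetic at the heart of every neural network layer. As an illustrative sketch (hypothetical layer sizes; NumPy standing in for GPU kernels), a single dense layer reduces to one matrix product:

```python
import numpy as np

# A single dense (fully connected) layer: the core operation is a
# matrix-vector product -- exactly the arithmetic GPUs accelerate.
rng = np.random.default_rng(0)
W = rng.standard_normal((128, 784))   # layer weights (hypothetical sizes)
b = np.zeros(128)                     # biases
x = rng.standard_normal(784)          # one input sample (e.g. a flattened image)

y = np.maximum(0, W @ x + b)          # matrix multiply, then ReLU activation

print(y.shape)  # (128,)
```

Training repeats products like this billions of times, which is why hardware optimised for matrix arithmetic dominates AI workloads.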
As AI matures, greater demands are being placed on the hardware that powers it. Larger data sets, more model parameters, deeper networks, moving AI to ‘edge’ devices, and an ambition to tackle new challenges demand improved capability. “Current hardware was holding developers back.” (Nigel Toon, Graphcore)
Below, before describing breakthroughs in AI software techniques, we highlight three dynamics shaping AI hardware – the optimisation, customisation and reimagination of hardware for AI.
Competition among hardware providers is fierce. In response to recent industry benchmarking, which compared Google’s and NVIDIA’s processors (https://mlperf.org/results/), both parties claimed victory (https://bit.ly/2IgWK2T; https://bit.ly/2SYLEQd). Developers and consumers alike will benefit from this intense competition as new hardware emerges.
Deep learning AI continues to offer myriad breakthroughs and benefits – in domains including computer vision and language and applications ranging from autonomous vehicles to medical diagnosis and language translation.
In response, vendors are optimising or customising hardware to support the use of popular deep learning frameworks. While addressing a more limited set of instructions, this hardware enables faster system training and performance from common AI frameworks – with varying degrees of specificity.
NVIDIA has introduced GPUs with architectures optimised for deep learning on a range of frameworks. The Company’s Tesla GPUs contain hundreds of Tensor Cores that accelerate the matrix calculations at the heart of deep learning AI. Tesla GPUs deliver faster results with common AI frameworks, particularly convolutional neural networks used for computer vision systems.
Tesla GPUs enable suitable neural networks to be trained in a third of the time previously required (Fig. 43) and operate four times faster (Fig. 44). Compared with a traditional CPU, Tesla GPUs offer a 27-fold improvement.
Google’s Tensor Processing Unit (TPU) is an application-specific integrated circuit (ASIC) – a custom microchip – designed specifically to accelerate AI workloads on the popular TensorFlow framework.
After publicising its use of TPUs in May 2016, Google announced its second-generation TPU in May 2017 and third generation in May 2018. While first generation TPUs were limited to inferencing (processing queries through a trained network), subsequent generations accelerate system training as well as inference.
Optimised to process the mathematics required by TensorFlow, TPUs offer exceptional performance for TensorFlow applications. Even moving from Google’s second-generation TPU to its third reduced by nearly 40% the time required to train ResNet-50, an industry-standard image classification model.
Initially, Google used TPUs only within its own data centres, to accelerate Google services including Google Photos (one TPU can process 100 million photos per day), Google Street View and Google’s RankBrain search facility. TPUs are now accessible to general developers and researchers via the Google Cloud Platform.
Leading hardware manufacturers are diverging from architectures used in the past. In 2019 a new class of computer processors designed, from inception, for AI will emerge. Custom silicon, designed from first principles for AI, offers transformational performance, capability similar to existing systems for a fraction of the power or space, and greater versatility.
Incumbent microchip manufacturers, global technology companies and dozens of disruptive early stage companies including Cerebras, Graphcore and Mythic are developing next-generation processors for AI.
Graphcore, a privately-held ‘scale-up’ company in the UK that has attracted over $300m of venture capital funding, has developed an Intelligence Processing Unit (IPU) (Fig. 46). Graphcore’s IPU combines a bespoke, parallel architecture with custom software to offer greater performance than existing systems. Graphcore’s benchmarking suggests that its IPU can deliver 200-fold performance improvements in selected tasks, compared with GPUs (Fig. 47).
The IPU’s architecture and software enable large quantities of data to be consumed in parallel, instead of sequentially, and from multiple locations (‘graph computing’ in place of ‘linear addressing’). Data is transported across the IPU’s 1,000+ sub-processors more efficiently, and the IPU provides faster access to greater volumes of memory to reduce bandwidth limitations.
As well as enabling existing workloads to be processed more rapidly, new hardware architectures such as IPUs will enable developers to tackle previously intractable challenges.
While cloud computing proliferates, a ‘barbell’ effect is emerging as a new class of AI hardware is optimised for edge computing instead of the data centre.
Edge computing moves the processing of data from the cloud to the ‘edge’ of the internet – onto the devices where the data is created, such as autonomous vehicles, drones, sensors and IoT devices. Edge computing is increasingly required: as devices proliferate, connectivity and latency constraints demand on-device processing for many applications.
Numerous hardware manufacturers are developing custom silicon for AI at the edge. In October 2018, Google released Edge TPU – a custom processor to run TensorFlow Lite models on edge devices. A plethora of early stage companies, including Gyrfalcon, Mythic and Syntiant, are also developing custom silicon for the edge.
In 2019, as well as enabling next generation AI in the cloud, custom silicon will transform AI at the edge by coupling high performance with low power consumption and small size.
Quantum computing is a paradigm shift in computing that exploits the properties of nature – quantum mechanics – to offer profound new possibilities. While nascent, quantum computing hardware and software are advancing rapidly. 2019 may be the year of ‘quantum supremacy’ – the first time a quantum computer solves a problem a classical computer cannot.
Quantum hardware, and associated software to accelerate AI, are emerging. In addition to building quantum processors, Google is developing quantum neural networks. In November 2018, an Italian team of researchers developed a functioning quantum neural network on an IBM quantum computer (https://bit.ly/2Gx1pee). Rigetti, a manufacturer of quantum computers and software, has developed a method for quantum computers to run certain AI algorithms.
While quantum computing technology will take time to mature, in the decade ahead quantum-powered AI will enable humanity to address previously intractable problems – from climate change to personalised medicine.
While novel hardware will enable more powerful AI, recent breakthroughs in software development are delivering transformational results.
Below, we explain how advances in two alternative approaches to developing AI systems – RL and TL – are enabling the creation of programs with unrivalled capabilities. We also describe how a new AI software technique – the Generative Adversarial Network (GAN) – has reached a tipping point in capability that will reshape media and society.
Recent advances in RL, an alternative approach to developing AI systems, are delivering breakthrough results – and raising expectations regarding AI’s long-term potential.
Typically, an AI system analyses training data and develops a ‘function’ – a way of relating an output to an input – that is used to assess new samples provided to the system (‘supervised learning’).
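As a minimal, hypothetical sketch of supervised learning, the ‘function’ below is fitted to labelled training examples by least squares and then used to assess a new sample:

```python
import numpy as np

# Supervised learning in miniature: learn a function relating outputs
# to inputs from labelled examples, then apply it to unseen samples.
# Hypothetical data drawn from y = 3x + 2 with a little noise.
rng = np.random.default_rng(1)
x_train = rng.uniform(0, 10, size=50)
y_train = 3 * x_train + 2 + rng.normal(0, 0.1, size=50)

# Fit a linear 'function' to the training data by least squares.
slope, intercept = np.polyfit(x_train, y_train, deg=1)

# Assess a new sample with the learned function.
x_new = 4.0
prediction = slope * x_new + intercept
print(round(prediction, 1))  # close to 3*4 + 2 = 14
```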
RL is an alternative approach that uses principles of exploration and reward. Human parents encourage children’s development through emotional rewards (smiling, clapping and verbal encouragement) and physical prizes (toys or sweets). Similarly, after an RL system is presented with a goal, it experiments through trial and error and is rewarded for progress towards the goal. While the system will initially have no knowledge of the correct steps to take, through cycles of exploration RL systems can rapidly improve.
RL is an efficient approach for teaching an agent to interact with its environment. Developers begin by specifying a goal and elements within the agent’s control – for example, in robotics, the joints that a robot can move and the directions in which it can travel. By rewarding useful progress and negatively reinforcing failure, as early as 1997 it was demonstrated that RL could produce a robot that walked in a dynamic environment – without knowledge of the environment or how to walk (Benbrahim and Franklin).
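The exploration-and-reward loop described above can be sketched with tabular Q-learning – a standard RL algorithm – on a hypothetical one-dimensional corridor (all names and parameters here are illustrative):

```python
import random

# A minimal RL sketch: tabular Q-learning on a hypothetical 1-D corridor.
# The agent starts with no knowledge of the correct steps; through cycles
# of exploration and reward it learns to walk towards the goal.
N_STATES = 5          # positions 0..4; position 4 is the rewarded goal
ACTIONS = (-1, 1)     # step left or step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1
random.seed(0)

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def choose(s):
    # Explore occasionally; otherwise exploit current knowledge,
    # breaking ties at random.
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    best = max(Q[(s, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(s, a)] == best])

for episode in range(200):
    s = 0
    while s != N_STATES - 1:
        a = choose(s)
        s_next = min(max(s + a, 0), N_STATES - 1)
        reward = 1.0 if s_next == N_STATES - 1 else 0.0
        # Reward progress towards the goal; propagate value backwards.
        best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
        Q[(s, a)] += ALPHA * (reward + GAMMA * best_next - Q[(s, a)])
        s = s_next

# After training, the learned policy steps right from every non-goal state.
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # [1, 1, 1, 1]
```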
Developments in RL are enabling profound milestones in the training of individual AI agents and, by teaching cooperation, groups.
18 months ago AlphaGo Zero, an RL system developed by DeepMind to play the board game Go, outperformed DeepMind’s previous AI Go system that had been trained using traditional, supervised learning. Provided only with the rules of Go, and without knowledge of any prior games, by playing against itself AlphaGo Zero reached the level of AlphaGo Master in 21 days. After 40 days, AlphaGo Zero surpassed all prior versions of AlphaGo to become, arguably, the strongest Go player in the world (Fig. 48). “Humans seem redundant in front of its self-improvement” (Ke Jie, World No. 1 Go player).
15 months ago, DeepMind developed a more general program – AlphaZero – that could play Chess, Shogi and Go at levels surpassing existing programs.
RL is well suited to creating agents that can perform autonomously in environments for which we lack training data, and to enabling agents to adapt to dynamic environments. In 2019 RL will catalyse the development of autonomous vehicles. In the longer term, the exploration of space – where training data is limited and real-time adaptation is required – is likely to draw on RL.
Progress in RL is significant, more broadly, because it decouples system improvement from the constraints of human knowledge. RL enables researchers to “achieve superhuman performance in the most challenging domains with no human input” (DeepMind). We explore this profound implication of AI in Chapter 8.
Source: Google DeepMind
In 2019, developments in RL will also enable groups of agents to interact and collaborate with each other more effectively.
Games, which present a safe and bounded environment for learning, are valuable for training RL systems (Aditya Kaul). Defence of The Ancients 2 (Dota2) is a cooperative online game for teams of five players (Fig. 49). While previous environments required AI agents to optimise only for their own success when responding to the actions of other teams, Dota2 requires agents to consider the success of their team.
OpenAI 5 is a Dota2 team developed by OpenAI, a non-profit AI research company building safe artificial general intelligence. OpenAI used RL in a similar manner to DeepMind’s AlphaGo Zero to train its team.
OpenAI 5 agents initially played against themselves to learn individual and cooperative skills. Subsequently, they were able to improve rapidly (Fig. 50) and defeat all but the top professional human teams.
Developing RL systems remains challenging. Designing reward functions can be difficult because RL agents will ‘game the system’ to obtain the greatest reward. OpenAI discovered that when it offered agents rewards for collecting power-ups – intended to help them complete their task faster – the agents abandoned the task altogether to collect power-ups instead. Even with sound reward functions, it can be difficult to avoid ‘overfitting’ solutions to their local environment.
Traditional AI development requires either training systems from a standing start, which demands data and time, or accepting the outputs of existing, pre-trained networks whose training data is inaccessible. Accordingly, AI development is frequently inefficient or sub-optimal.
Transfer learning (TL) is an emerging approach for developing AI software, which enables programmers to create novel solutions by re-using structures or features of pre-trained networks with their own data. By drawing upon skills learned from a previous problem, and applying them to a different but related challenge, TL can deliver systems with stronger initial performance, more rapid system improvement, and better long-term results (Fig. 51).
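As an illustrative sketch of the idea (NumPy only; the ‘pre-trained’ weights below are hypothetical stand-ins for a genuinely learned network), TL amounts to freezing a feature extractor and fitting only a new output layer on the new task’s data:

```python
import numpy as np

# A transfer-learning sketch: re-use a frozen 'pre-trained' feature
# extractor and fit only a new output layer on a small, task-specific
# data set. In practice the frozen weights would come from a network
# trained on a large, related problem; here they are random stand-ins.
rng = np.random.default_rng(2)
W_pretrained = rng.standard_normal((16, 8))

def features(x):
    # Frozen feature extractor: its weights are never updated.
    return np.maximum(0, x @ W_pretrained.T)

# Small labelled data set for the new, related task.
X = rng.standard_normal((40, 8))
y = (X[:, 0] > 0).astype(float)

# Train only the new 'head' -- here, a least-squares linear read-out.
F = features(X)
head, *_ = np.linalg.lstsq(F, y, rcond=None)

predictions = (F @ head > 0.5).astype(float)
accuracy = (predictions == y).mean()
print(round(accuracy, 2))
```

Because only the small head is trained, far less task-specific data is needed than training the whole network from scratch.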
Source: Torrey, Shavlik
TL has been used to accelerate the development of AI computer vision systems for over a decade. In the last 24 months, however, interest in TL has grown 7-fold (Fig. 52). In 2019 TL is being applied to broader domains – particularly natural language processing.
Source: Google Trends
To date, natural language processing has operated at a shallow level, struggling to infer meaning at the level of sentences and paragraphs instead of words. Word embedding, an historically popular technique for inferring the meaning of a word based on the words that frequently appear near to it, is limited and susceptible to bias. The absence of extensive, labelled training data for natural language processing has compounded practitioners’ challenges.
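The intuition behind word embedding – a word is characterised by the words that appear near it – can be sketched with a toy co-occurrence count (the corpus and the whole-sentence context window here are illustrative):

```python
from collections import Counter
from itertools import combinations

# Word embedding in miniature: characterise a word by the words that
# frequently appear near it. Hypothetical toy corpus.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]

# Count co-occurrences within each sentence (a crude context window).
cooccur = Counter()
for sentence in corpus:
    words = sentence.split()
    for w1, w2 in combinations(words, 2):
        cooccur[(w1, w2)] += 1
        cooccur[(w2, w1)] += 1

# 'cat' and 'dog' end up with similar co-occurrence profiles ('sat',
# 'on', 'the', ...) -- the basis for treating them as similar words.
vocab = sorted({w for s in corpus for w in s.split()})
vec = lambda w: [cooccur[(w, v)] for v in vocab]
print(vec("cat"))
print(vec("dog"))
```

Counts like these capture only word-level similarity, which is why the technique struggles with sentence- and paragraph-level meaning and can absorb biases present in the corpus.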
By enabling better results with less training data, TL is offering transformational results. 2018 was a breakthrough year for the application of transfer learning in language processing:
New, TL-powered models “learn fundamental properties of language” (Matthew Peters, ELMo). By doing so, they may unlock higher-level capabilities in language processing with universal utility – including text classification, summarisation, text generation, question answering and sentiment analysis.
In many situations, gathering data to train AI systems is laborious, expensive or dangerous. Amassing data to train an autonomous vehicle, for example, could require millions of hours of labour, billions of dollars and considerable risk. Simulation, combined with transfer learning, offers a solution. Instead of capturing real-life data, environments are simulated. Using TL, learnings from the simulation can then be applied to the real-world asset.
In the field of robotics, similarly, training models on real-world robots is slow and costly. Learning from a simulation, and transferring the knowledge to a physical machine, can be preferable.
TL may be “a pre-requisite for large-scale machine learning projects that need to interact with the real world” (Sebastian Ruder). As a result, “transfer learning will be the next driver of machine learning commercial success” (Andrew Ng).
TL offers profound as well as pragmatic benefits.
By reducing the volume of training data required to solve a problem, TL enables humans to develop systems in domains where we lack large numbers of labelled data-points for system training.
By offering greater adaptability, TL also supports progress towards artificial general intelligence (AGI) – systems that can undertake any intellectual tasks a human can perform. While AGI is far from possible with current AI technology, developments in TL are enabling progress. “I think transfer learning is the key to general intelligence. And I think the key to doing transfer learning will be the acquisition of conceptual knowledge – knowledge that is abstracted away from perceptual details of where you learned it, so you can apply it to a new domain” (Demis Hassabis, DeepMind).
First proposed in 2014, Generative Adversarial Networks (GANs) are a novel, emerging AI software technique for the creation of lifelike media – including pictures, video, music and text. Exceptional recent progress in the development of GANs (Fig. 53) has enabled breakthrough results. Today, GANs can generate highly realistic media, which – despite being artificially generated – are virtually impossible to differentiate from real content (Fig. 54).
Source: Goodfellow et al, Radford et al, Liu and Tuzel, Karras et al, https://bit.ly/2GxTRot
While GANs are frequently used to create images, their utility is broader. Additional uses include:
GANs will deliver transformational benefits. The ability to generate lifelike images to a desired specification will reshape the media sector. Further, GANs will enable agencies to capture footage of brand ambassadors and then repurpose footage to create an infinite range of convincing variations. Ambassadors could appear to speak in foreign languages (to promote goods and services in international markets) and discuss new products – without recording any additional footage.
GANs also present profound ethical and societal risks. GANs can be used to: splice individuals’ faces onto existing video without their consent; develop video in which individuals appear to speak words they have not spoken; create counterfeit evidence for criminal cases; and generate or alter footage to create ‘fake news’. We discuss the implications of GANs for society in Chapter 8.
GANs operate by two networks – a ‘generator’ and ‘discriminator’ – working in opposition to create increasingly lifelike media.
For a visual GAN, a generator receives a random input, such as a matrix of numbers, and follows a series of mathematical transformations to convert the input into a picture. Initial results will be poor, resembling random sets of pixels (Fig. 55).
Source: Naoki Shibuya
The output of the generator is then passed to the discriminator. The discriminator is a separate convolutional neural network that has been trained to recognise counterfeit images of the type in question – in this example, handwritten digits. The discriminator assesses whether the image received from the generator is authentic or has been artificially generated. Following the discriminator’s decision, the correct answer is revealed.
If the discriminator correctly determines that the output is artificially generated: the weights in the generator responsible for the output recognised as counterfeit are adjusted; and the weights in the discriminator that led to the correct conclusion are reinforced.
If the discriminator incorrectly assesses the output from the generator: the weights in the generator, which led to a convincing image, are reinforced; and the features in the discriminator, which led to an incorrect result, are down-weighted to yield a better assessment in future.
As the two networks work in parallel, influencing one another, the output from the generator improves until the accuracy of the discriminator is no better than chance (a 50/50 probability of correctly determining the authenticity of the generated image).
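The adversarial loop described above can be sketched in miniature. In this hypothetical one-dimensional example, the ‘generator’ is a single learned offset applied to noise and the ‘discriminator’ a logistic classifier; the two are updated in opposition until generated samples resemble the real data:

```python
import numpy as np

# A minimal GAN sketch (hypothetical, one-dimensional). The 'generator'
# is a single learned offset mu applied to noise; the 'discriminator' is
# a logistic classifier D(x) = sigmoid(w*x + b). Each is updated in
# opposition until the discriminator cannot tell real from generated.
rng = np.random.default_rng(3)
sigmoid = lambda x: 1 / (1 + np.exp(-x))

mu = 0.0           # generator parameter: g(z) = z + mu
w, b = 0.1, 0.0    # discriminator parameters
lr = 0.05          # learning rate for both networks

for step in range(3000):
    real = rng.normal(4.0, 0.5, size=64)        # authentic samples
    fake = rng.normal(0.0, 0.5, size=64) + mu   # generated samples

    # Discriminator step: push D(real) towards 1 and D(fake) towards 0.
    d_real, d_fake = sigmoid(w * real + b), sigmoid(w * fake + b)
    w += lr * ((1 - d_real) * real - d_fake * fake).mean()
    b += lr * ((1 - d_real) - d_fake).mean()

    # Generator step: shift mu so the discriminator scores fakes as real.
    d_fake = sigmoid(w * fake + b)
    mu += lr * ((1 - d_fake) * w).mean()

# mu ends near 4.0: generated samples now resemble the real data, and
# the discriminator's accuracy approaches chance.
print(round(mu, 1))
```

Real GANs replace the single offset with a deep generator network and the logistic classifier with a convolutional discriminator, but the opposing update loop is the same.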