Draft chapter. This is a pre-publication draft. The final version appears in the Elgar Concise Encyclopedia of Artificial Intelligence and Law, available for purchase on Amazon.
Traditional computer programming requires an author to tell a machine exactly what to do. Machine learning flips this paradigm on its head, requiring the programmer to know only the correct output, not how that output was generated. This magic is made possible, in part, by representing problems as mathematical equations — networks that process input to arrive at correct output.
Neural networks are one of two fundamental features of machine learning that permit automated learning: the digitized ability to solve future, unknown problems given examples of currently known solutions. The neural network represents the "machine" of machine learning. Although a single artificial neuron amounts to intuitive mathematics, the layering and scale found in real-world neural networks, not to mention the relation to neuroscience, may make the concept seem unfamiliar at first blush.
Background
In the 1940s, Warren McCulloch, a neurophysiologist, and Walter Pitts, a logician, realized that one of the most advanced machines in existence, the human brain, relied on incredibly simple neurons that themselves could be represented as individual functions. A biological neuron receives input through its dendrites, processes that input in the cell body, and produces an output sent along an axon. By analogy, artificial neurons could be built to take input in the form of numbers, process the input via some type of algorithm, and pass the result forward. Although this sounds complicated, it relies on simple mathematics.
The following mathematical formula may be considered a single neuron — a single neural network — just before the activation function is applied:
bias + (input_x × weight₁) + (input_y × weight₂) = intermediate output
(The perceptron formula)
The intermediate output is then run through an activation function, such as a step function: if the intermediate output is less than zero, the final output is zero; if it is greater than or equal to zero, the final output is one. In short, if each of these variables is set just right, the artificial neuron will produce correct output for a specific, and potentially useful, problem.
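As a minimal sketch, the perceptron formula and the thresholding rule described above (a final output of one when the intermediate output is at least zero, and zero otherwise) can be written in a few lines of Python. The function name `neuron`, the parameter names, and the example values are illustrative assumptions, not part of any library.

```python
def neuron(input_x, input_y, weight_1, weight_2, bias):
    """A single artificial neuron with a threshold activation."""
    # The perceptron formula: a weighted sum of the inputs plus a bias.
    intermediate_output = bias + (input_x * weight_1) + (input_y * weight_2)
    # Threshold activation: one if the sum is at least zero, otherwise zero.
    return 1 if intermediate_output >= 0 else 0


# Illustrative values (assumed): the weighted sum is -1 + 2*1 + 3*(-1) = -2.
print(neuron(input_x=2, input_y=3, weight_1=1, weight_2=-1, bias=-1))  # prints 0
# Raising the bias to 2 gives a weighted sum of 1, so the output becomes one.
print(neuron(input_x=2, input_y=3, weight_1=1, weight_2=-1, bias=2))   # prints 1
```

Changing only the bias flips the output, which illustrates what it means for the variables to be "set just right" for a given problem.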
Interestingly, although the neuron just described is the brain behind some of the most complex artificial intelligence architectures today, the idea went largely dormant in the decades after its introduction. This dormancy persisted despite Frank Rosenblatt's work on automatically updating the neuron's variables to produce increasingly correct output. As a quote from 1958 put it: "We are now about to witness the birth of such a machine — a machine capable of perceiving, recognizing and identifying its surroundings without any human training or control."
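The idea of automatically updating the neuron's variables can be sketched with the classic perceptron learning rule: whenever the neuron's output is wrong, each weight and the bias are nudged in the direction of the correct answer. The code below is a simplified illustration, not Rosenblatt's original implementation. The function names, the learning rate, the cap on training passes, and the choice of the logical AND problem as training data are all assumptions made for the example.

```python
def step(value):
    """Threshold activation: one if the value is at least zero, otherwise zero."""
    return 1 if value >= 0 else 0


def train_perceptron(examples, learning_rate=1, max_epochs=100):
    """Learn two weights and a bias from ((x, y), target) examples."""
    weight_1, weight_2, bias = 0, 0, 0
    for _ in range(max_epochs):
        mistakes = 0
        for (x, y), target in examples:
            output = step(bias + x * weight_1 + y * weight_2)
            error = target - output  # -1, 0, or +1
            if error != 0:
                # Shift each variable toward the correct answer.
                weight_1 += learning_rate * error * x
                weight_2 += learning_rate * error * y
                bias += learning_rate * error
                mistakes += 1
        if mistakes == 0:  # every example is now answered correctly
            break
    return weight_1, weight_2, bias


# Assumed training data: logical AND (output one only when both inputs are one).
and_examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w1, w2, b = train_perceptron(and_examples)
predictions = [step(b + x * w1 + y * w2) for (x, y), _ in and_examples]
print(predictions)  # prints [0, 0, 0, 1]
```

The programmer supplies only the examples and their correct outputs; the weights and bias are found by the update loop rather than written by hand.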
The Universal Approximation Theorem
What Professor Rosenblatt was anticipating, and what was not practically possible until the recent artificial intelligence era, is what is now known as the universal approximation theorem: any real-world problem that can be mathematically mapped as a continuous function can be approximated with nearly perfect accuracy by a neural network. In other words, if a neural network is large enough, it can approximate the solution to any real-world problem that can be described mathematically in this way.
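For readers who want the claim in symbols, one common formulation of the theorem for a network with a single hidden layer is sketched below. The notation is an assumption introduced here: f is the continuous function to be approximated on a closed and bounded region K of possible inputs in n dimensions, σ is a continuous, non-polynomial activation function (a sigmoid, for example), N is the number of neurons, and ε is the tolerated error.

```latex
\forall \varepsilon > 0 \;\; \exists N \in \mathbb{N},\; v_i, b_i \in \mathbb{R},\; w_i \in \mathbb{R}^n :
\quad
\sup_{x \in K} \left| f(x) - \sum_{i=1}^{N} v_i \, \sigma\!\left( w_i \cdot x + b_i \right) \right| < \varepsilon
```

Each term inside the sum is a single neuron of the kind described earlier: a bias plus weighted inputs, passed through an activation function. The theorem guarantees that suitable weights exist, not that they are easy to find.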
Consider the AND gate. If the bias term is set to negative two and both weights to one, the network produces the right output for all combinations of input, from (0,0) to (1,1). But what if, instead of the AND gate, we wanted to represent the logical XOR gate? A single neuron does not work. What Rosenblatt was considering was the layering of one neuron with other neurons, which can, theoretically, answer problems like the XOR gate by feeding the outputs of a NAND neuron and an OR neuron into an AND neuron. These layers are where the concept of a "deep" neural network comes from. The more layers, the deeper the network.
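The gates discussed above can be checked directly. In the sketch below, the AND neuron uses the bias and weights given in the text. The values chosen for the OR and NAND neurons are assumptions picked for illustration, as are the function names. The XOR network feeds the outputs of a NAND neuron and an OR neuron into a final AND neuron, which is one standard way to combine them.

```python
def neuron(x, y, weight_1, weight_2, bias):
    """Single neuron: weighted sum plus bias, then a threshold at zero."""
    return 1 if bias + x * weight_1 + y * weight_2 >= 0 else 0


def and_gate(x, y):
    return neuron(x, y, weight_1=1, weight_2=1, bias=-2)   # values from the text


def or_gate(x, y):
    return neuron(x, y, weight_1=1, weight_2=1, bias=-1)   # assumed values


def nand_gate(x, y):
    return neuron(x, y, weight_1=-1, weight_2=-1, bias=1)  # assumed values


def xor_network(x, y):
    """Two layers: NAND and OR feed a final AND neuron."""
    return and_gate(nand_gate(x, y), or_gate(x, y))


inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
print([and_gate(x, y) for x, y in inputs])     # prints [0, 0, 0, 1]
print([xor_network(x, y) for x, y in inputs])  # prints [0, 1, 1, 0]
```

No single choice of two weights and a bias reproduces the XOR outputs, which is why the second layer is needed.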
"Because of the universal approximation theorem, adding more and more layers allows a machine to drive vehicles, recognize speech, and talk — from lethal autonomous weapons systems to synthetic data to the autonomous vehicles we have on the roads today."
Limitations
Input-output constraints
Neural networks operate by considering input and producing output. If a problem cannot be molded into that workflow, a mathematical mapping from input to output, then a neural network will be no good at solving the problem. Even when a problem can be molded this way, something may be lost. Color is one example: although an infinite number of colors exist in the real world, computers can represent colors only in a finite and limited way, using a series of red, green, and blue pixel values. Color itself is continuous in the natural world, but computers must downgrade those colors in the digital world to operate on them, something which, in some ways, is unavoidably reductionist.
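The loss of information can be made concrete with a small sketch. It assumes a common convention of 8 bits per channel, so each of red, green, and blue is stored as a whole number from 0 to 255. The function name `to_8bit_rgb` and the two example shades are hypothetical.

```python
def to_8bit_rgb(color):
    """Map continuous channel intensities in [0.0, 1.0] to whole numbers 0..255."""
    # Clamp each channel to the valid range, scale it, and round to an integer.
    return tuple(round(max(0.0, min(1.0, channel)) * 255) for channel in color)


# Two slightly different shades, expressed as continuous intensities (assumed).
shade_a = (0.501, 0.200, 0.800)
shade_b = (0.502, 0.201, 0.801)

print(to_8bit_rgb(shade_a))  # prints (128, 51, 204)
print(to_8bit_rgb(shade_b))  # prints (128, 51, 204)
```

The two shades differ before conversion but become identical afterward, so a network that only ever sees the pixel values has no way to tell them apart.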
Inscrutability and traceability
It can sometimes be useful to think of machine learning models as inscrutable, given the difficulty of tracing particular inputs to particular outputs. However, that thinking elides the fact that these networks are surely accessible in some fashion. Neural networks do not produce truly random output. A neural network taught to recognize hand-written digits will not suddenly be able to distinguish between images of cats and dogs. Even "random" and untoward outcomes of advanced neural networks that have learned undesired behaviors — like Microsoft's Tay — have a traceable path.
Finite mathematical goals
Neural networks have finite and mathematical goals in mind. The AND gate from above was told exactly what type of output was correct: zero or one. The world, according to that neural network, consisted of no more than zeros and ones. The same is true of neural networks used to drive vehicles or generate art. The network is approximating a problem by rudimentarily turning that problem into a mathematical equation. This means that unless the mathematical problem incorporates values we as a society care about — like equality — the network will simply not consider them.
Note: Developers must think carefully about how neural networks are used and what is — or is not — considered as input and output by a neural network.
Conclusion
It was not until recently that researchers began layering neurons in such a massive fashion that the aims of McCulloch, Pitts, and Rosenblatt came to fruition. These networks are beginning to match or outperform humans at a variety of tasks, though those breakthroughs are coming from scale rather than invention. Network architectures have also evolved in recent years: convolutional neural networks became popular for vision, generative adversarial networks started a trend toward generative models, and diffusion models have more recently taken hold. And yet, the core of these networks, so far, has remained the groundwork laid by McCulloch, Pitts, and Rosenblatt.
Systems that rely on "more compute" to achieve better performance may get an architectural update in the future, one that could unlock performance gains that scale alone cannot deliver once it plateaus. For the moment, however, neural networks can still be abstracted, in some sense, to the perceptron from the 1950s, meaning that the understanding laid out above will continue to capture the essence of artificial intelligence for the foreseeable future.