Ludwig Boltzmann's and Andrej Karpathy's AI Story

Part 17, April 30, 2021 - Ludwig Boltzmann's and Andrej Karpathy's AI Story (Reading time: 3 minutes)

Ludwig Boltzmann was one of the most influential physicists of all time, although few of us are aware of it. Science historians name him in the same breath as Newton and Einstein. We cannot list all his achievements across so many fields in this blog. Instead, we trace his visionary influence on artificial intelligence, and particularly his impact on Andrej Karpathy, Head of “Artificial Intelligence and Autopilot Vision” at Tesla. In the talk at the end of this post, Karpathy himself tells us his AI story.

Ludwig Boltzmann (photograph from 1898) and Andrej Karpathy (2019)

Ludwig Boltzmann (born in Vienna in 1844, died in Duino, Italy, in 1906) was a passionate advocate of Darwin's theory of evolution, which was by no means a matter of course among physicists at that time. Boltzmann argued that the concept of entropy, which he brought from thermodynamics, is crucial for the origin of life and for the evolution of plants, animals, and finally humans. In this context he described how nerve cells (neurons) form, link up into a network, and develop into a brain that eventually becomes conscious of itself.

And Boltzmann went one step further. In his book “Populäre Schriften” (“Popular Writings”), published in 1905, he describes machines that, like humans, can receive and process external stimuli (sound, light, etc.), feel pain and, ultimately, develop self-awareness (today called “strong AI”). Anyone who denies that such machines can have sensations and consciousness should ask himself whether he is confined to his own, too small world.

Excerpt on human-like machines from: Ludwig Boltzmann, Populäre Schriften, Johann Ambrosius Barth Verlag, Leipzig, 1905. https://archive.org/details/populreschrifte00boltgoog/page/n192/mode/2up

It is not known whether computer scientists David Ackley and Geoffrey Hinton of Carnegie Mellon University in Pittsburgh and biophysicist Terrence Sejnowski of Johns Hopkins University in Baltimore knew these words when they developed the “Boltzmann machine” in the 1980s, or whether they rather had in mind the statistical, energy-dependent Boltzmann distribution of states. In any case, Ludwig Boltzmann was present in spirit as a “spin doctor”.

The Boltzmann machine is an elementary, trainable, adaptive neural network. It consists of a few “neurons”, each of which is either activated or not activated. These are the two possible “energy” states, or, as Hinton paraphrased it, a neuron is “asleep” or “awake”. By communicating with one another via “synapses”, the cluster of neurons becomes a neural network. Scientists had started to make Ludwig Boltzmann's vision of intelligent machines come true. (D. Ackley, G. Hinton, T. Sejnowski: A Learning Algorithm for Boltzmann Machines, Cognitive Science, 1985)
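To make the “asleep” and “awake” picture concrete, here is a minimal sketch of a tiny Boltzmann machine in plain Python. It only shows the energy of a state and the stochastic update of single neurons, not the learning algorithm from the 1985 article. The three-neuron network, its weights and biases, and all function names are assumptions chosen purely for illustration.

```python
import math
import random


def energy(state, weights, biases):
    """Energy of a binary state: E = -sum_{i<j} w_ij s_i s_j - sum_i b_i s_i."""
    n = len(state)
    pair_term = sum(weights[i][j] * state[i] * state[j]
                    for i in range(n) for j in range(i + 1, n))
    bias_term = sum(biases[i] * state[i] for i in range(n))
    return -pair_term - bias_term


def prob_awake(i, state, weights, biases, temperature=1.0):
    """Probability that neuron i is "awake" (1), given all other neurons.

    The total input equals the energy gap E(s_i = 0) - E(s_i = 1), so this is
    the Boltzmann distribution restricted to the two states of neuron i.
    """
    total_input = biases[i] + sum(weights[i][j] * state[j]
                                  for j in range(len(state)) if j != i)
    return 1.0 / (1.0 + math.exp(-total_input / temperature))


def gibbs_sweep(state, weights, biases, rng, temperature=1.0):
    """Update every neuron once, in order, by sampling its new state in place."""
    for i in range(len(state)):
        p = prob_awake(i, state, weights, biases, temperature)
        state[i] = 1 if rng.random() < p else 0
    return state


# Hypothetical three-neuron network: symmetric "synapses", no self-connections.
weights = [[0.0, 2.0, -1.0],
           [2.0, 0.0, 0.5],
           [-1.0, 0.5, 0.0]]
biases = [0.1, -0.2, 0.0]

rng = random.Random(0)
state = [0, 0, 0]
for _ in range(10):
    gibbs_sweep(state, weights, biases, rng)
print(state, energy(state, weights, biases))
```

Over many sweeps, states with lower energy are visited more often. Raising `temperature` makes the neurons flip more randomly, while lowering it makes the network settle into low-energy states.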

Andrej Karpathy was born in 1986 in Košice (today Slovakia), one year after the publication of the Cognitive Science article on Boltzmann machines. When he was 15, his family emigrated to Canada. From 2007 to 2009, he studied computer science and physics at the University of Toronto. There he met Geoffrey Hinton, who had moved from Pittsburgh to Toronto back in 1987. Hinton gave a lecture on his special subject, the Boltzmann machine.

Karpathy himself described the deep impression that this first contact with the artificial intelligence of the Boltzmann machine made on him in a 2019 talk, available on the video platform bilibili.com, with Andrew Ng, British-American professor of AI at Stanford and co-founder of the learning platforms Coursera and deeplearning.ai. You’ll find a transcript of the first part of that interview at the end of this post.

After earning his bachelor's degree in Toronto, Karpathy's academic career took him to the University of British Columbia and later to Stanford University, where he received a PhD in 2016. His thesis was on “Connecting Images and Natural Language”. As an image processing expert, he joined Tesla in 2017, where he heads, as mentioned before, the AI and Autonomous Driving department.

Do Ludwig Boltzmann, Andrej Karpathy, Geoffrey Hinton and Andrew Ng inspire you? Get started with an industrial PC from Omtec! Great things begin with

“A flavor of something magical”

Andrew Ng: So welcome Andrej, I'm really glad you could join me today.
Andrej Karpathy: Yeah, thank you for having me.
Ng: So, a lot of people already know your work in deep learning, but not everybody knows your personal story. So, let's start with you telling us: how did you end up doing all this work in deep learning?
Karpathy: Yeah, absolutely. So I think my first exposure to deep learning was when I was an undergraduate at the University of Toronto. And so, Geoff Hinton was there, and he was teaching a class on deep learning. And at that time, it was restricted Boltzmann machines trained on MNIST digits.* And I just really liked the way Geoff talked about training the network, like the mind of the network, and he was using these terms. And I just thought there was a flavor of something magical happening when this was training on those digits. And so that's my first exposure to it, although I didn't get into it in a lot of detail at that time.
And then when I was doing my master's degree at the University of British Columbia, I took a class [Note: acoustically not understandable] and that was again on machine learning. And that's the first time I delved deeper into these networks and so on. And what was interesting is that I was very interested in artificial intelligence, and so I took classes in artificial intelligence. But a lot of what I was seeing there was just not very satisfying. It was a lot of depth-first search, breadth-first search, alpha-beta pruning, and all these things. And I was not understanding how, I was not satisfied. And so, when I was seeing neural networks for the first time in machine learning, which is this term that I think is more technical and not as well known, since most people talk about artificial intelligence. Machine learning was more kind of a technical term, I would almost say. And so I was dissatisfied with artificial intelligence.
When I saw machine learning, I was like, this is the AI that I want to kind of spend time on, this is what's really interesting. And what took me down those directions is that this is almost a new computing paradigm, I would say. Because normally, humans write code, but here in this case, the optimization writes code. And so, you're creating the input/output specification and then you have lots of examples of it, and the optimization writes code, and sometimes it can write code better than you. And so, I thought that was just a very new way of thinking about programming, and that's what intrigued me about it.
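Karpathy's remark that “the optimization writes code” can be illustrated with a deliberately tiny sketch. Instead of programming the rule y = 2x + 1 by hand, we only supply input/output examples and let gradient descent find the two numbers that define the “program”. The function name, the example data, the learning rate and the step count are assumptions made for this illustration and do not come from the interview.

```python
def fit_line(examples, learning_rate=0.05, steps=2000):
    """Find w and b so that w * x + b matches the examples (mean squared error)."""
    w, b = 0.0, 0.0
    n = len(examples)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in examples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in examples) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# The "specification": examples of the desired input/output behavior.
examples = [(x, 2 * x + 1) for x in range(5)]

w, b = fit_line(examples)
print(f"learned program: y = {w:.3f} * x + {b:.3f}")
```

Nobody wrote the rule into the function. The optimization recovered it from the examples, which is the shift in thinking about programming that Karpathy describes, here at the smallest possible scale.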

*MNIST: 60,000 training and 10,000 test images of handwritten digits, size-normalized and centered, derived from data of the U.S. National Institute of Standards and Technology (NIST) and used as a data set for developing OCR character recognition and document management.