Sir Roger Penrose, a Nobel Prize-winning physicist, has the distinction of being (I believe) the most famous and gifted adherent of the view that artificial intelligence *cannot and will not* reach the point of human equivalence. He is a sceptic—perhaps the principal sceptic—of strong AI. In pursuit of this view, he’s written two key books: *The Emperor’s New Mind* (1989) and *Shadows of the Mind: A Search for the Missing Science of Consciousness* (1994).

This post is a critique mainly of *Shadows of the Mind*, and only of Part I^{[1]}, since it’s there that Penrose sets out his arguments against strong AI in full detail. Much has been said already—this being nearly three decades after the fact—but the topic is important to me as an author of AI-focused science fiction.

Briefly, though, *The Emperor’s New Mind* contains a wealth of insights on mathematics, computing, physics, cosmology and the human brain. Penrose intends these to form the background to his argument, but they stand on their own as wonderful introductions to a range of complex scientific concepts. I learnt a lot!

However, when Penrose gets around to talking about artificial intelligence (mainly in *Shadows*), not everything seems to stack up.

## Chapter 2: “The Gödelian case”

Here, Penrose sets up a mathematical *reductio ad absurdum* (or, really, proof-by-contradiction). He aims to prove that you cannot encapsulate all human mathematical reasoning in algorithmic form, *if* it’s assumed to be an entirely sound algorithm.

A few background notes on this:

- Most people (of those who’ve thought about it) assume or proclaim that the entire universe is “computable” (or, equivalently, “algorithmic”). This means that any part of it—such as the human brain—could, in principle, be simulated on a sufficiently powerful computer. This view seems to be true for our understanding of physics *so far*, but we’re far from understanding all of physics. It’s possible that the deeper aspects of physics, beyond current understanding, may be non-computable (non-algorithmic). If so, they would be impossible to simulate accurately on anything resembling a conventional computer (or even a quantum computer as currently envisaged), no matter how powerful it was.
- All computation can be represented by the abstract notion of a Turing machine, which has input, output and internal state, and (once started) can either stop after some time or keep going forever.
- Penrose focuses on *mathematical* reasoning, because it’s a more constrained problem^{[2]}, and if he can show that AIs cannot compete with human minds here, then they can’t compete with human minds *in general*.
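A toy illustration of that Turing-machine picture (my own construction in Python, not anything from Penrose): a `run` function that steps a machine until it halts, or reports nothing if it’s still going after an optional step budget.

```python
# Minimal Turing-machine simulator: a program maps
# (state, symbol) -> (new_state, symbol_to_write, head_move).
def run(program, tape, state="start", head=0, max_steps=None):
    tape = dict(enumerate(tape))  # sparse tape; "_" is the blank symbol
    steps = 0
    while state != "HALT":
        symbol = tape.get(head, "_")
        state, tape[head], move = program[(state, symbol)]
        head += 1 if move == "R" else -1
        steps += 1
        if max_steps is not None and steps >= max_steps:
            return None  # still running: no (official) output yet
    # Output only becomes "official" once the machine stops:
    return "".join(tape[i] for i in sorted(tape)).strip("_")

# A three-rule machine that flips every bit, then halts at the first blank:
flip = {
    ("start", "0"): ("start", "1", "R"),
    ("start", "1"): ("start", "0", "R"),
    ("start", "_"): ("HALT", "_", "R"),
}

print(run(flip, "0110"))  # -> 1001
```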

Penrose begins by defining the algorithm:

> Suppose, then, that we have some computational procedure [algorithm] *A* which, when it terminates, provides us with a demonstration that a computation such as *C*(*n*) actually does not ever stop. We are going to try to imagine that *A* encapsulates *all* the procedures available to human mathematicians for convincingly demonstrating that computations do not stop. (p. 73)

That is, our hypothetical algorithm *A* takes as input *another* algorithm, and tries to determine whether that other algorithm stops or not. This is known to be impossible in the general case^{[3]}, but Penrose is only asking that (1) *A* stops only when it can be *sure* that its input does not stop, and (2) *A* keeps going forever in all other cases.

Penrose then applies *A* to itself (taking itself as input), and arrives at the semi-paradoxical result that, if *A* stops, then *A* does not stop. Therefore, *A* does not stop.
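This is essentially the diagonal trick behind the halting problem (footnote 3). A runnable sketch, entirely my own construction: given *any* purported halting-checker, we can build a program that does the opposite of whatever the checker predicts about it.

```python
def defeat(halts):
    """Given a purported checker halts(prog, arg) -> bool,
    build a program that falsifies its prediction about itself."""
    def contrarian(arg):
        if halts(contrarian, arg):
            while True:   # checker said "stops", so loop forever
                pass
        return "stopped"  # checker said "loops", so stop at once
    return contrarian

# A checker that always predicts "loops" is instantly refuted:
pessimist = lambda prog, arg: False
print(defeat(pessimist)(0))  # -> stopped

# A checker that always predicts "stops" is refuted too, though we can't
# print the refutation: the resulting program simply never returns.
```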

> Thus, we know something that *A* is unable to ascertain. It follows that *A* *cannot* encapsulate our understanding. (p. 75)

That is, *we humans* can see that algorithm *A* does not stop, but *A* cannot tell us this itself, because it’s too busy not stopping. And if we can see something that the algorithm cannot, then the algorithm cannot be in possession of all human reasoning after all.

(Note: this is Penrose’s “Gödelian” argument, in reference to Gödel’s theorems, but those are typically expressed quite differently, in terms of formal systems of mathematical logic. An algorithm *isn’t* such a system, in itself, though Penrose draws a link between the two (p. 92).)

Now, however, I bring up the three main problems I have with this line of reasoning (even allowing the assumption that the algorithm is sound).

### 1. Reporting is not knowing

We might envisage that *A* actually does *internally* conclude that it will not stop, when given itself as input. This knowledge might be encoded in *A’*s internal state (remembering that *A* is a Turing machine), but *A* just won’t be able to communicate it via the approved mechanism, without creating a contradiction.

In light of the broader discussion around AI “awareness”, it seems unfortunate to overlook the role of a Turing machine’s internal state.

In fact, *A* could even *output* the result—“*A* does not stop”—that it’s been accused of not being able to ascertain, without creating a contradiction. The only technicality is that a Turing machine’s output isn’t supposed to be considered “official” until the Turing machine has stopped, but this feels like a very legalistic viewpoint when we know that almost all *real* software outputs information all the time without needing to stop.
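As a toy illustration (mine, not Penrose’s), a Python generator can put a conclusion on its output channel immediately while never terminating, just like a long-running server:

```python
def a_on_itself():
    """A caricature of A run on its own code: it reports its
    conclusion as output, yet never (officially) stops."""
    yield "A does not stop"      # the supposedly unascertainable conclusion
    while True:
        yield "(still running)"  # ...emitted while never halting

out = a_on_itself()
print(next(out))  # -> A does not stop
```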

### 2. Why the non-stopping requirement?

It seems arbitrary, and even self-contradictory, that *A* should have to keep going forever if its input algorithm stops. Why? This doesn’t seem to be necessary for encapsulating human mathematical reasoning. No actual human mathematical procedure runs forever, and certainly not where the actual task is provably finite. If the input algorithm stops, *A* can definitely find out in finite time and then stop itself (or it can give up and stop even earlier).

A more plausible version of such an all-encompassing procedure *always* stops in finite time, returning “true”, “false” or “undecidable” (where, we’ll say, the latter occurs if/when the algorithm “gives up”, for whatever reason). When *this* algorithm is given itself as input, it would simply return “true” or “undecidable”, without any contradiction! (In fact, it never really needs to return “undecidable” in this case, because it can always just compare its input with its own code, at which point it can safely return “true”.)
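Here is a minimal runnable sketch of such an always-terminating analyser (my own construction; all the details are illustrative). It analyses step-by-step “programs” written as Python generators, uses a step budget so that it itself always stops, and recognises itself by identity rather than by comparing source code. Since proving non-termination would need real machinery, this toy version only ever answers “halts” or “undecidable”:

```python
HALTS, UNDECIDABLE = "halts", "undecidable"

def decide(fn, arg, budget=1000):
    """Always returns in finite time, with no self-referential trap."""
    if fn is decide:
        return HALTS  # it knows it always terminates, so no paradox
    gen = fn(arg)     # fn is a generator: one computation step per yield
    for _ in range(budget):
        try:
            next(gen)
        except StopIteration:
            return HALTS
    return UNDECIDABLE  # give up, in finite time, rather than loop forever

def quick(n):    # a program that stops after n steps
    for _ in range(n):
        yield

def forever(_):  # a program that never stops
    while True:
        yield

print(decide(quick, 5), decide(forever, 0), decide(decide, 0))
# -> halts undecidable halts
```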

Penrose is right that *A* cannot encapsulate our understanding, but only because he’s smuggled an extra factor into the definition of *A*.

Penrose anticipates this objection (or something like it), and tries to address it as part of a Q&A section, but I can’t make sense of his response. He says:

> The argument is supposed to be applied merely to insights that allow us to come to the conclusion that computations do *not* stop, not to those insights that allow us to conclude the opposite. (p. 79)

But we could conclude, from the very contradiction Penrose sets up, that such sets of insights cannot be separated from one another. It’s hard to see the utility of separating them, in any case, since ultimately we are concerned with the *sum total* of human reasoning. Whether human reasoning can be broken down into disparate sets of insights isn’t really the point.

And I think Penrose needs to explain *why* he’s introducing the non-stopping requirement. There has to be a reason *for* it, not just the equivalent of “don’t worry about it”.

He then tries to explain:

> …think of *A* in the following way: try to include *both* types of insight in *A*, but in those circumstances when the conclusion is that the computation *C_q*(*n*) [the input algorithm] does stop, deliberately put *A* into a loop (i.e. make *A* just repeat some operation over and over again endlessly). (p. 79)

This is true as far as it goes, but it means Penrose is aware of two things:

- This isn’t a natural way to write software, and certainly not the only way; and
- It absolutely is possible (in line with my first objection above) for the algorithm to “know” the correct conclusion, even if it cannot report it.

I feel that Penrose is explicitly instructing us in how to cover up the very result that he’s trying to prove cannot be obtained!

He then says:

> The argument has the form *reductio ad absurdum*, starting from the assumption that we use a knowably sound algorithm for ascertaining mathematical truth, and then deriving a contradiction. (p. 79)

Yes, *but* that isn’t the only assumption! The contradiction could alternatively be taken to refute the non-stopping requirement.

### 3. Humans can screw this up too

We can imagine a mirror-image of the situation Penrose sets up, only applying to humans rather than computers. Suppose a scientist is asked to analyse their own brain, or the exact informational content of it, with access to all conceivable tools, information, help (including from the entire human race) and whatever other external resources might exist. The scientist is then asked to give a “thumbs up” in the case where their brain *doesn’t* make a response at all (or at least in some subset of cases, but never inaccurately), and to give no response in all other cases.

The scientist can do no such thing, for the same reason *A* is forced to not stop. We aren’t restricting the scientist’s investigative tools in any way. If the scientist’s own brain has an internal truth-seeking apparatus that transcends mere algorithmic logic, then the scientist is free to bring this to bear. The problem (as a matter of principle, putting aside the complexity) is that the task itself is paradoxical for *whichever* entity is told to carry it out.

And in this revised scenario, at the same time that the human scientist is unable to produce an answer, an algorithm could produce the correct answer (even a very simple one, just by observing that the task is paradoxical). And so now it’s the *algorithm* that enjoys apparently greater understanding than the human brain!

### Have I understood this?

I’m not a mathematician, and I must admit to having a constant nagging feeling that I’ve missed some crucial aspect of Penrose’s reasoning or mathematical background. Of others who have responded to *Shadows of the Mind*, I can’t find any who dispute the argument in Chapter 2 (even if they dispute the overall case).

To put a foolhardy, non-mathematician’s foot forward, people seem to be thinking about this problem in terms of the “formal systems” of Gödel’s incompleteness theorems, and not in the terms Penrose has actually laid out. If you assume that a hypothetical AI mathematician has a sound formal system for proving mathematical theorems, then Gödel’s theorems say that certain truths are out of reach for the AI, whereas they would be *within* reach for a human mathematician, given they know which formal system is being used.

I’m certainly not going up against Gödel, but I also question the premise of an entity having “a” formal system in the first place. An algorithm isn’t a formal system (in the sense that Gödel means it), and a single algorithm for proving mathematical propositions may be capable of operating according to many, perhaps infinitely many, different formal systems. Thus, an AI mathematician may be able to see the truth of a “Gödel sentence” by switching between formal systems (as a human mathematician can).

## Chapter 3: “The case for non-computability in mathematical thought”

In this chapter, Penrose looks at “unsound” algorithms, and tries to demonstrate that they too (like the hypothetical algorithm from Chapter 2) are untenable.

This is the part that most other people take issue with, because “unsoundness” is a gigantic loophole in Gödelian reasoning. Penrose’s approach, it seems to me, is to insist that mathematical reasoning can’t *really* be unsound at all.

He begins with this categorisation:

> We must distinguish clearly between three distinct standpoints with regard to the knowability of a putative algorithmic procedure *F* underlying mathematical understanding, whether sound or not. For *F* must be:
>
> I. consciously knowable, where its role as the actual algorithm underlying mathematical understanding is also knowable;
>
> II. consciously knowable, but its role as the actual algorithm underlying mathematical understanding is unconscious and not knowable;
>
> III. unconscious and not knowable. (p. 130)

Penrose proceeds by arguing that I is impossible, II is highly implausible, and III can be reduced to one of the other two, and hence is also ruled out (p. 144). Disproving any one of these contentions would end the argument, and in fact I think *each* of the three cases is fatally flawed in its own separate way.

### Case I

Penrose says:

> Strangely enough, however, unsoundness does not help at all for a known formal system *F* which, as asserted in I, is actually known—and thus *believed*—by any mathematician to underlie his or her mathematical reasoning! For such a belief entails a (mistaken) belief in *F*’s soundness. (It would be an unreasonable mathematical standpoint that allows for a disbelief in the very basis of its own unassailable belief system!) (p. 131)

Penrose appears to say that mathematicians cannot have knowably unsound algorithmic reasoning, for no other reason than that they are sure it’s sound. To me this seems an obviously and fatally flawed argument. A given person might (a) be unaware of the algorithm governing their reasoning, or in denial about it, (b) be in denial about the *implications* of the algorithm, or (c) not hold 100% confidence in their own power of reason. *All* of these seem to be logically permitted under case I.

I would think that most people (even mathematicians!) accept, in general, that their mental faculties are not 100% reliable, and so wouldn’t really have a problem if this was somehow formally proven. This doesn’t make our reasoning irretrievably broken though; we can still make progress, with sufficient review and revision. Penrose’s discussion seems to conflate imperfection with complete unworkability, assuming that any level of “unsoundness” is absolutely fatal.

If there is some sense of “perfection” in mathematical reasoning (and admittedly I say this as a non-mathematician), surely it’s a perfection that we try to glimpse through our own imperfect cognition, not a perfection in our own selves.

I think Penrose simply cannot accept fallibility in human mathematical reasoning, almost as an article of faith. He isn’t prepared for the “unassailability” of human mathematical reasoning to be, well, assailed, even within the hypothetical discussion here.

### Case II

> But is it really plausible that our unassailable mathematical beliefs might rest on an unsound system—so unsound, indeed, that “1=2” is in principle part of those beliefs? Surely, if our mathematical reasoning cannot be trusted, then *none* of our reasoning about the workings of the world can be trusted. (p. 138)

To answer Penrose’s question, *yes*, I do find it plausible that “1=2” is hiding away in one grimy corner of the sum total of our reasoning capabilities! But it (and things like it) can be “tuned down” to an infinitesimal weight with training. We *learn* how to perform consistent mathematical reasoning, aided by the consistency (where it exists) of the world around us. But we are (loosely speaking) only *approaching* sound reasoning in the limit.

Regarding trust in our reasoning, in the uncertain, stochastic real world, this trust is a matter of degree, not a binary “yes” or “no”. We don’t instantly turn to nihilism just because of the *possibility* of an error. We just do the best we can!

Penrose goes on to argue that, although mathematicians obviously do commit errors, these are invariably “correctable” and therefore not evidence of fundamentally unsound reasoning capabilities.

This falls foul of Occam’s Razor. Error-correction is a clear and obvious consequence of an imperfect system, particularly in light of the process of natural selection by which our brains formed. If our faculties are ultimately completely sound, why do errors occur at all? You would have to suppose, for instance, that we have *two sets* of mental faculties, one imperfect set that lets us perform temporary working and leads to errors, and another perfect set that lets us make *conclusions* that are (to use Penrose’s oft-repeated phrase) “unassailably true”.

It seems unnecessary to entertain this when there’s a simpler workable theory at hand. Basically I think Penrose is asking us to dismiss clear evidence that his assumptions are simply wrong.

### Case III

Penrose devotes numerous whole sections to this final point, on the possibility of an unconscious and unknowable algorithm. His broader argument already seems a lost cause at this point, but it’s here that he gets around to discussing an evolutionary/machine-learning (“bottom-up”) approach for reproducing human reasoning, which is something I’m specifically interested in.

Referring to mathematical thinking in the context of human evolution, Penrose at first says:

> It would be very reasonable to suppose that the selective advantages that our ancestors enjoyed were qualities that were valuable for all these things [subsequent achievements], and, as an *incidental* feature, turned out, much later, to be just what was needed for the carrying out of mathematical reasoning. This, indeed, is more or less what I believe myself. (p. 148)

But Penrose is convinced that this doesn’t apply to a potential *algorithmic* version of human reasoning. He argues that, if human reasoning was algorithmic, then from the moment it first evolved, it must have had built into it (somehow) all the conclusions subsequently reached throughout all of mathematics:

> This putative, unknowable, or incomprehensible algorithm would have to have, coded within itself, a power to do all this, yet we are being asked to believe that it arose solely by a natural selection geared to the circumstances in which our remote ancestors struggled for survival. A particular ability to do obscure mathematics can have had no direct selective advantage for its possessor, and I would argue that there can be no reason for such an algorithm to have arisen. (p. 149)

This is an extravagantly passionate splitting of hairs. Why does non-computability make any difference here? Why is *computable* determinism any more difficult to swallow, philosophically, than non-computable determinism?

I also think Penrose is missing that the “human algorithm” can gain information from its environment^{[4]}. A lot of scientific and mathematical progress has followed from real-world experimentation. The algorithm did not need to incorporate all future human achievements purely within its own code. What the algorithm gave us, presumably, was the general ability to conduct abstract reasoning, and not specifically all the individual theorems mathematicians have arrived at, stored up and just waiting to be unzipped. Since we can reason abstractly, we can create *models*—abstract concepts grounded in aspects of the physical world. We use these for making things, predicting nature, organising society, etc. And, once such immediately-practical modelling proves its worth, it’s not surprising (to me) that it would eventually lead to people tinkering with pure abstraction.

Later, Penrose comes to another issue. If we suppose an AI can reproduce the mathematical understanding of humanity (and perhaps beyond), then we can’t actually know this unless the AI uses our language:

> Now, for the interpretation of the formal system *Q*(*M*) [the system of mathematics underlying the robot’s reasoning], it needs to be clear that, as the robot develops, the imprimatur ‘\*’ [used by the robot to label what it considers “unassailably true” theorems] actually *does* mean—and will continue to mean—that the thing that is being asserted is indeed to be taken as unassailably established. Without input from the human teachers (in some form) we cannot be sure that the robot will not develop for itself some different language in which ‘\*’ has some entirely other meaning, if it has any meaning at all. To ensure that the robot’s language is consistent with our own specifications in the definition of *Q*(*M*), we must make sure that, as part of the robot’s training (say, by the human teacher), the meaning that is to be attached to ‘\*’ is indeed what we intend it to be. Likewise, we must make sure that the actual notation that the robot uses … is the same as (or explicitly translatable into) the notation that we use ourselves. (p. 161)

Penrose’s requirement here seems practically impossible to fulfil^{[5]}. First, machine learning techniques are not able to follow *absolute* rules regarding the very things they’re supposed to be learning. Neural networks and genetic algorithms can only really learn approximations of rules.

Second, how would we encode the meaning of the phrase “unassailably true” into an AI? This seems an insurmountably complicated notion *either* to learn *or* to explicitly program in. It’s not even clear to me *as a human* that “unassailably true” has any truly objective meaning.

And more broadly, how could we possibly ensure the AI uses our language (including mathematical notations) to express itself? We could just let the AI explore the world completely on its own terms, and thus come up with its own systems for representing mathematical theorems (assuming it gets to that point). It might learn our notations by reading Wikipedia, or we might train it explicitly. However, we’d have to hobble it in quite strange and counterproductive ways to ensure that it *only* uses our notations, given that we’re already assuming it to be capable of arbitrary abstract reasoning.

I think this is important, because Penrose is trying to show that any algorithm underlying mathematical reasoning *can actually be known*. But to do this, he needs to set specific requirements on the AIs that embody these algorithms, requirements that are almost certainly impossible to guarantee. In a sense, Penrose needs the AIs’ hypothetical cooperation in order to prove that they cannot exist.

## Computational intelligence?

Hopefully it’s clear that I’m not trying to *prove* that AGI (strong AI) is definitely possible. I just don’t think that Penrose has successfully argued against it. I’m a very late entrant to this discussion, but I wanted to engage directly with the argument myself.

It remains to be seen whether AGI can be achieved practically. But there don’t seem to be any fundamental, theoretical roadblocks to it, and I can get on with writing sci-fi about it!

## References

1. Part II discusses the new physical theories Penrose thinks will be necessary to explain human consciousness, but that isn’t my real concern.
2. He’s a mathematician after all.
3. It’s called the “halting problem”. Algorithm X cannot find out for all cases whether some other algorithm Y stops or not. The reason is that some algorithm Y might actually invoke a copy of X to work out whether X thinks Y will stop, and then just do the opposite, thus making X wrong.
4. Technically, a Turing machine is the wrong way to model this, but in general it’s a trivial task.
5. It could even be impossible *in principle*, though I’m not properly equipped to determine that.