Algorithms & Complexity Theory
Matei Popovici
Computer Science Department
POLITEHNICA University of Bucharest
January 9, 2015
0.1 Introduction

0.1.1 The Apollo example
The story. On the 11th of April 1970, the Apollo 13 mission takes off, with the intention of landing on the Moon. Two days into the mission, an explosion in the Service Module forces the crew to abort the mission and attempt a return to Earth. Facing extreme hardship, especially due to loss of power, the crew finally succeeds in returning to Earth. Both the crew and the support team from the Space Center were forced to find fast and simple solutions to apparently insurmountable problems1. Drawing inspiration from this scenario, we consider the following inspirational example. The space mission S takes off having the Moon as destination. Some time after take-off, S notices a malfunction in the main engine, which fails to respond to command inputs. While trying to hack-control the engine, the following circuit is identified: ϕ
≡ (¬A ∨ B ∨ D) ∧ (¬B ∨ C ∨ D) ∧ (A ∨ ¬C ∨ D) ∧ (A ∨ ¬B ∨ ¬D) ∧ (B ∨ ¬C ∨ ¬D) ∧ (¬A ∨ C ∨ ¬D) ∧ (A ∨ B ∨ C) ∧ (¬A ∨ ¬B ∨ ¬C)
If ϕ can be made to output 1, then the engine could be manually started. S however notices that no apparent input for A, B, C and D makes the circuit yield 1. S requires advice from Control as to how to proceed in this situation. The solution which Control must prompt depends on some key "background" information: (i) The position of S allows only a 5-minute window for a decision to be made. After 5 minutes, a safe mission abort is no longer possible.2 (ii) Trying to find the solution "by hand" is unfeasible and likely to produce errors.

1 One example is the famous improvisation which made the Command Module's square CO2 filters operable in the Lunar Module, which required such filters of round shape.
2 This is highly similar to the real situation of Apollo 13: their position and malfunctions made a direct abort and return to Earth impossible. Instead, their return trajectory involved entering Moon orbit.
(iii) The actual problem to be solved consists in finding a certain "input" I, which assigns to each of A ... D a value from {0, 1}, such that the circuit ϕ yields 1 (which we denote as ϕ(I) = 1). We call this problem S(ϕ).

(iv) To solve S, one needs to find an algorithm for computing ϕ(I), given an input I.

Computing ϕ(I). It is helpful to examine the structure of ϕ first, which shows a certain regularity. ϕ is of the form:

(L^1_1 ∨ ... ∨ L^1_{n1}) ∧ ... ∧ (L^k_1 ∨ ... ∨ L^k_{nk})

where each (L^i_1 ∨ ... ∨ L^i_{ni}) is a clause consisting of n_i literals, and each literal L^i_j is of the form X or ¬X, where X is a variable (input).

Next, we identify a suitable representation from which our algorithm can benefit. Assume I is represented as a vector which, for convenience, is indexed by variable names instead of 0 ... n−1 (n is the number of variables). Assume ϕ is represented as a matrix, where each column ϕ[i] represents a vector of literals. The value ϕ[i, X] = 1 indicates that X appears as itself in clause i, ϕ[i, X] = 0 indicates that X appears in a literal ¬X, while ϕ[i, X] = ⊥ indicates that X is not present in clause i. The algorithm to solve our problem, entitled CHECK(ϕ, I), proceeds as follows:

a) Set v = 0
b) for i = 1, k:
   re-set v = 0
   for each variable X from I: set v = v + (ϕ[i, X] ⊕ I[X])
   if v = 0, stop and return 0; otherwise, continue;
c) return 1

The operation ⊕ is a customized XNOR, which behaves as follows: a ⊕ b is always 0 if either of a or b is ⊥. Hence, if a variable X does not appear in a clause i, then ϕ[i, X] = ⊥ will not influence the value of v. a ⊕ b is 1 if both operands are the same. Hence ϕ[i, X] ⊕ I[X] = 1 if X occurs as itself in clause i and is given value 1 in I, or if X occurs as ¬X in clause i and is given value 0 in I.
At the end of the inner loop, CHECK will have computed the value of some clause (L^i_1 ∨ ... ∨ L^i_{ni}), which is 1 if at least one literal is 1. If some clause has the computed value 0, then ϕ is 0 for the input I. Finally, note that CHECK performs at most n * k computations, where n is the number of variables, and k is the number of clauses in ϕ.
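The CHECK procedure can be sketched in Python. This is a sketch under assumptions of my own: each clause is represented as a dictionary mapping a variable to its polarity, so an absent key plays the role of ⊥, and the pseudocode's `+` accumulation is taken to be a boolean OR. The names (`xnor`, `check`, `PHI`) are illustrative, not from the text.

```python
BOTTOM = None   # stands for the matrix entry "⊥" (variable absent from clause)

def xnor(a, b):
    """Customized XNOR: always 0 if an operand is BOTTOM, else 1 iff a == b."""
    if a is BOTTOM or b is BOTTOM:
        return 0
    return 1 if a == b else 0

def check(phi, assignment):
    """CHECK(phi, I): returns 1 iff every clause of phi evaluates to 1 under I."""
    for clause in phi:
        v = 0
        for var, value in assignment.items():
            v = v | xnor(clause.get(var, BOTTOM), value)  # accumulate clause value
        if v == 0:
            return 0        # this clause is falsified: phi is 0 for input I
    return 1

# The engine formula from the text; 1 = positive occurrence, 0 = negated
# occurrence, and an absent variable plays the role of ⊥.
PHI = [
    {'A': 0, 'B': 1, 'D': 1},   # (¬A ∨ B ∨ D)
    {'B': 0, 'C': 1, 'D': 1},   # (¬B ∨ C ∨ D)
    {'A': 1, 'C': 0, 'D': 1},   # (A ∨ ¬C ∨ D)
    {'A': 1, 'B': 0, 'D': 0},   # (A ∨ ¬B ∨ ¬D)
    {'B': 1, 'C': 0, 'D': 0},   # (B ∨ ¬C ∨ ¬D)
    {'A': 0, 'C': 1, 'D': 0},   # (¬A ∨ C ∨ ¬D)
    {'A': 1, 'B': 1, 'C': 1},   # (A ∨ B ∨ C)
    {'A': 0, 'B': 0, 'C': 0},   # (¬A ∨ ¬B ∨ ¬C)
]
```

As in the analysis above, the nested loops perform at most n * k elementary operations.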
Computing S(ϕ). To solve S, it is sufficient to take all possible inputs I over the variables which occur in ϕ, and for each one, perform CHECK(ϕ, I). This can be achieved by viewing the vector I as a binary counter. Hence, the operation I++ is implemented as follows, where the variables X are iterated in some arbitrary order, which is fixed with respect to the algorithm:

a) for each variable X in I:
   if I[X] = 1, make I[X] = 0 and continue; otherwise make I[X] = 1 and stop;
b) overflow;

The operation I++ is said to overflow if instruction b) is reached, namely if all variables are set to 0 upon a traversal of all variables. Now, we implement FIND(ϕ):

a) Set I to be the input where all variables are set to 0;
b) while overflow was not signalled:
   if CHECK(ϕ, I) = 1 then return 1, otherwise I++;
c) return 0

We now have a procedure for solving S, one which Control may use in order to assist the Mission. However, several issues arise:

(V) is FIND(ϕ) correct, that is, bug-free and returning an appropriate result?
(M) how long does FIND(ϕ) take to run, with respect to the size of the input (i.e. the size of the matrix ϕ)?
(C) is FIND(ϕ) an optimal algorithm? Are there ones which perform better?
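The counter I++ and FIND can likewise be sketched in Python, under the same assumed clause representation (a dict from variable to polarity); here CHECK is inlined as a direct clause evaluation rather than the XNOR formulation, and the names `increment` and `find` are illustrative.

```python
def increment(assignment, order):
    """I++ as a binary counter; returns True when the counter overflows."""
    for var in order:
        if assignment[var] == 1:
            assignment[var] = 0     # flip 1 -> 0 and carry on
        else:
            assignment[var] = 1     # flip 0 -> 1 and stop
            return False
    return True                     # every variable wrapped around to 0

def find(phi, variables):
    """FIND(phi): try all 2^n inputs; 1 if some input satisfies phi, else 0."""
    assignment = {var: 0 for var in variables}
    while True:
        # CHECK, written directly: every clause needs one matching literal
        if all(any(assignment[v] == pol for v, pol in clause.items())
               for clause in phi):
            return 1
        if increment(assignment, variables):
            return 0                # overflow: all 2^n inputs tried, none works
```

Run on the engine formula of the example, `find` returns 0: no input makes ϕ yield 1, consistent with S's observation in the story.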
Next, we discuss each issue in more detail:

(V) denotes the Verification problem. Verification of computer programs (that is, of algorithm implementations) is essential nowadays, as we rely more and more on software to perform critical tasks: assist us in driving cars and flying planes, guiding missiles and performing surgeries. Verification is equally important for the mission.

(M) identifies a measurement problem, namely: in what way are critical resources consumed by an algorithm? Traditionally, these resources are time, i.e. the number of CPU operations, and space, i.e. computer memory. We note that, for our given formula, CHECK(ϕ, I) will run at most 8 * 4 = 32 CPU operations, where 8 is the number of clauses, and 4 is the number of variables of any input. FIND(ϕ) will build at most 2^n assignments I, where n is the number of variables. In our example, n = 4, thus we have 16 different ways of assigning A, B, C and D values from {0, 1}. For each, we run CHECK(ϕ, I), yielding a total of 16 * 32 = 512 CPU operations. For the sake of our example, let us assume Mission's computers ran one CPU instruction per second, which is not unlikely, given the hardware available in the '70s. Thus, in 5 minutes, we can perform 5 * 60 = 300 CPU operations! Note that FIND(ϕ) does not finish in sufficient time: if Control used this algorithm, it would waste Mission's time.

Finally, (C) designates a complexity problem, one specific to Complexity Theory: "Do efficient algorithms exist, for a given problem?".
There are also sub-questions which spawn from the former:
• is there a better encoding of the input and the variables from I, which makes the algorithm run faster?
• is there a certain machine which allows performing computations in a more efficient (fast) way?

Unfortunately, the problem which we denoted by S is traditionally called SAT: the Satisfiability problem for boolean formulae. It has long been known that efficient algorithms to solve SAT are unlikely to exist, no matter the machine of choice. Translated into simple terms, in the general case, we cannot do substantially better than an exponential number of steps, with respect to the size of the formula. If Control had this knowledge, it would take no trouble in looking for specific algorithms. It would recognize the problem as impossible to solve within the given time constraints, and would recommend the Mission to return home. To recap, (V), (M) and (C) play an important role in decision making, both at the software level and beyond. Naturally hard problems (re)occur
in every field of science. Having at least a minimal knowledge of their nature is a critical step in attempting to solve them.

Disclaimer. SAT is probably one of the most studied problems in Computer Science. SAT solvers are key components of many algorithms which solve even harder problems: program verification, code optimisation, cryptography, to name only a few. While reading this, it is highly probable that the reader employs some technology which relies on SAT solvers, directly or indirectly. Are these solvers efficient, in the sense that they run in less than exponential time? For the general case, the answer is no. However, these solvers will run fast enough for most formulae. This doesn't really help Control, unless they are lucky enough to stumble upon a "nice" formula. The intuition behind SAT solvers is that they rely on an efficient graph-like structure to represent a formula. This structure is called an OBDD (Ordered Binary Decision Diagram). The efficiency of an OBDD is unfortunately dependent on finding a "suitable" ordering for the variables. The latter is, in itself, a hard problem. However, there are good heuristics which come close. The overall result is that SAT solvers perform well in "many" cases, and the "many" is measurable with controlled precision.

0.1.2 Which is harder?
Consider the following problems:

A telecommunications company T owns radios across the country, and needs to monitor all links, to record performance. Monitoring each radio is expensive, thus T would like a minimal number of k radios to be monitored. Is this possible?

An organism consists of a number of cells, and connections between cells. Some cells are good, while some are bad. The bad cells can be detected by examining their links: they form a complete mesh: each cell is connected to all others. Are there (k) bad cells in the organism?

It is easy to notice that both problems can be cast into graph problems. Let G = (V, E) be an undirected graph. If we interpret each node as a radio, and each edge as a radio link, then solving the former problem boils down to finding a subset S ⊆ V such that, for all (a, b) ∈ E, at least one of the following holds: a ∈ S or b ∈ S. Hence, S "covers" all edges from E.
If we interpret nodes as cells, and edges as connections between cells, then the latter problem consists in finding a subset S ⊆ V (of size |S| = k) such that (a, b) ∈ E for each a, b ∈ S. Hence S is a clique (of size k).

One interesting question which may be raised is whether one problem is harder than the other. We note that a graph capturing the first problem can be transformed into one capturing the latter, and vice-versa. If we start from the "telecommunications" graph, we can build a "cell" graph by: (i) creating one cell per radio; (ii) if two radios do not share a link, then the corresponding cells will share a connection. Thus, if some subset S of radios covers all links, then all the cells corresponding to radios outside S must share connections, hence they are bad. We note that the transformation can be easily adjusted to work in the other direction. Thus, if one has an algorithm to solve the telecommunications problem, then, via a transformation, we can solve the cell problem, and vice-versa. This observation highlights several issues:
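The radio-to-cell transformation described above is graph complementation, and can be sketched in a few lines of Python (the function names are my own; the property exercised is the one from the text: if S covers all links, the cells outside S form a complete mesh):

```python
from itertools import combinations

def to_cell_graph(nodes, edges):
    """Radios -> cells: two cells share a connection exactly when the two
    corresponding radios do NOT share a link (the complement graph)."""
    present = {frozenset(e) for e in edges}
    return [(a, b) for a, b in combinations(nodes, 2)
            if frozenset((a, b)) not in present]

def is_cover(edges, s):
    """Does the set s touch every edge (covers all links)?"""
    return all(a in s or b in s for a, b in edges)

def is_clique(edges, t):
    """Is t a complete mesh within the given edge set?"""
    present = {frozenset(e) for e in edges}
    return all(frozenset(p) in present for p in combinations(t, 2))
```

On a small example, one can check both directions of the correspondence: S covers the radio graph exactly when the nodes outside S form a clique in the cell graph.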
• If such a transformation exists between two problems, what does it say about the complexity of solving the problems?
• Is it always the case that transformations are bijective?
• What are the properties of the transformation, such that it is effective (can be used for problem solving)? For instance, if some problem A can be transformed to B in exponential time3, and A can be solved in polynomial time, then solving B via A yields an algorithm in exponential time.
• Can (appropriate) transformations be used to characterize problems which are equally hard?

In the previous section, we illustrated a problem which can be solved in exponential time. We claimed, without arguing, that this problem cannot be solved in polynomial time. Given some problem P, if we can appropriately transform the former problem into P, we can also argue that P cannot be solved in polynomial time: if this were possible, an algorithm for P could also solve our former problem. Developing a formal methodology from the above intuition is a critical tool for assessing problem hardness.

3 with respect to the size of the input
0.2 Computability Theory

0.2.1 Problems and problem instances
In the previous section, we illustrated the problem SAT, as well as pseudocode which describes a solution running in exponential time. We have seen that such a solution is infeasible in practice, and also that no (predictable) technological advance can help. The main question we asked (and also answered) is whether there exists a faster procedure to solve SAT. (We have conjectured that the answer is no, or, more precisely, not likely.) To generalize a little, we are interested in the following question:

Can a given problem Q be solved in "efficient" time?

For now, we sidestep the currently absent definition of "efficient", and note that such a question spawns another (which is actually more straightforward to ask):

Can a given problem Q be "solved" at all?

In this chapter, we shall focus on answering this latter question first, and to do so, we need to settle the following issues:
• What exactly is a "problem"?
• What does it mean to "solve" a problem?

Definition 0.2.1.1 (Abstract problem, problem instance) A problem instance is a mathematical object of which we ask a question and expect an answer. An (abstract) problem is a mapping P : I → O, where I is a set of problem instances of which we ask the same question, and O is a set of answers. P assigns to each problem instance i ∈ I the answer P(i) ∈ O.
It is often the case that the answers we seek are also mathematical objects. For instance, the vector sorting problem must be answered by a sorted vector. However, many other problems prompt yes/no answers. Whenever O = {0, 1}, we say that P is a decision problem. Many other problems can be cast into decision problems. The vector sorting problem may be seen as a decision problem, if we simply ask whether the problem instance (i.e. the vector) is sorted. The original sorting problem and its decision counterpart may not seem equivalent in terms of hardness. For instance, sorting
is solved in n log n time (using "standard" algorithms), while deciding whether a vector is sorted can be done in linear time. We shall see that, from the point of view of Complexity Theory, both problems are equally hard.

Definition 0.2.1.1 may seem abstract and unusable for the following reason: the set I is hard to characterize. One solution may be to assign types to problem instances. For example, graph may be a problem instance type. However, such a choice forces us to reason about problems separately, based on the type of their problem instances. Also, types themselves form an infinite set, which is also difficult to characterize.

Another approach is to level out problem instances, starting from the following key observations: (i) each i ∈ I must be, in some sense, finite. For instance, vectors have a finite length, hence a finite number of elements. Graphs (of which we ask our questions) also have a finite set of nodes, hence a finite set of edges, etc. (ii) I must be countable (but not necessarily finite). For instance, the problem P : R × R → {0, 1}, where P(x, y) returns 1 if x and y are equal, makes no sense from the point of view of computability theory. Assume we would like to answer P(π, √2). Simply storing π and √2, which takes infinite space, is impossible on machines, and also takes us back to point (i).

These observations suggest that problem instances can be represented via a finite encoding, which may be assumed to be uniform over all possible mathematical objects we consider.
Definition 0.2.1.2 (Encoding problem instances) Let Σ be a finite set which we call an alphabet. A one-letter word is a member of Σ. A two-letter word is any member of Σ × Σ = Σ². For instance, if Σ = {a, b, ...}, then (a, a) ∈ Σ² is a two-letter word. An i-letter word is a member of Σ^i. We denote by:

Σ* = {ε} ∪ Σ ∪ Σ² ∪ ... ∪ Σ^i ∪ ...

the set of finite words which can be formed over Σ. ε is a special word which we call the empty word. Instead of writing, e.g. (a, b, b, a, a) for a 5-letter word, we simply write abbaa. Concatenation of two words is defined as usual.
Remark 0.2.1.1 We shall henceforth consider that a problem P : I → O has the following property: if I is infinite, then I ≃ Σ* (I is isomorphic with Σ*). Thus, each problem instance i can be represented as a finite word enc(i) ∈ Σ*, for some alphabet Σ.
We shall postpone, for now, the question of choosing the appropriate Σ for our problem (the above remark simply states that such a Σ must exist).
Definition 0.2.1.2 and Remark 0.2.1.1 can be easily recognized from practice. A programmer always employs the same predefined mechanisms of his programming language (the available datatypes) to represent his program inputs. Moreover, these objects ultimately become streams of bits when they are actually processed by the machine. Making one step further, we can observe the following property of alphabets, which conforms with (ii):

Proposition 0.2.1.1 For any finite Σ, Σ* is infinitely countable.

Proof: We show Σ* ≃ N. We build a bijective function h which assigns to each word a unique natural number. We assign 0 to ε. Assume |Σ| = n. We assign to each one-letter word the numbers from 1 to n. Next, we assign to each k ≥ 2-letter word w = w'x the number n · h(w') + h(x). If n = 2, we easily recognise that each binary word is assigned to its natural counterpart. Hence, we have the following diagram:

i ∈ I ↔ enc(i) ∈ Σ* ↔ h(enc(i)) ∈ N
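The function h from the proof can be written out directly; the construction is known as bijective base-n numeration. This is a sketch: `h_inverse` is added (it is not in the text) to exhibit that h is indeed a bijection.

```python
def h(word, alphabet):
    """The bijection from the proof: h(empty) = 0, one-letter words get
    1..n, and h(wx) = n * h(w) + h(x)."""
    n = len(alphabet)
    value = {c: i + 1 for i, c in enumerate(alphabet)}
    result = 0
    for c in word:
        result = n * result + value[c]
    return result

def h_inverse(m, alphabet):
    """Recover the word from its number (bijective base-n numeration)."""
    n = len(alphabet)
    letters = []
    while m > 0:
        x = m % n
        if x == 0:      # remainder 0 stands for the n-th (last) letter
            x = n
        letters.append(alphabet[x - 1])
        m = (m - x) // n
    return ''.join(reversed(letters))
```

For Σ = {a, b}, the words ε, a, b, aa, ab, ba, bb map to 0, 1, 2, 3, 4, 5, 6, matching the "binary word to natural" picture from the proof.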
Hence, we can view a problem instance as a natural number, without losing the ability to uniquely identify the instance at hand. Thus:

Definition 0.2.1.3 (Problem) A problem is a function f : N → N. If some n encodes a problem input, then f(n) encodes its answer. A decision problem is a function f : N → {0, 1}.
To conclude: when trying to solve concrete problems, the encoding issue is fundamental, and it depends on the type of problem instances we tackle. From the perspective of Computability Theory, which deals with problems in general, the encoding is unessential, and can be abstracted without "loss of information" by a natural number.

0.2.2 Algorithms as Turing Machines
Algorithms are usually described as pseudo-code, and intended as abstractions over concrete programming language operations. The level of abstraction is usually not specified rigorously, and is decided in an ad-hoc manner by the writer. From the author's experience, pseudo-code is often dependent on (some future) implementation, and only abstracts from language syntax, possibly including data initialization and subsequent handling. Thus, some pseudo-code can be easily implemented in different languages only to the
extent to which the languages are the same, or at least follow the same programming principles.

The above observation is not intended as a criticism of pseudo-code and pseudo-code writing. It is indeed difficult, for instance, to write pseudo-code which does not seem vague, and which can be naturally implemented in an imperative language (using assignments and iterations) as well as in a purely functional language (where iterations are possible only through recursion).

As before, we require a means for leveling out different programming styles and programming languages, in order to come up with a uniform, straightforward and simple definition of an algorithm. The key observation here is that programming languages, especially the newest and most popular, are quite restrictive w.r.t. what the programmer can do. This may seem counter-intuitive at first. Consider typed languages, for instance. Enforcing each variable to have a type is obviously a restriction, and has a definite purpose: it helps the programmer write cleaner code, code which is less likely to crash at runtime. However, this issue is irrelevant from the point of view of Computability Theory. If we search for less restrictive languages, we find the assembly languages. The restrictions are minimal here (as is the "programming structure"). The formal definition of an algorithm which we propose can be seen as an abstract assembly language, where all technical aspects are put aside. We call such a language the Turing Machine.

Definition 0.2.2.1 (Deterministic Turing Machine) A Deterministic Turing Machine (abbreviated DTM) is a tuple M = (K, F, Σ, δ, s0) where:

• Σ = {a, b, c, ...} is a finite set of symbols which we call the alphabet;
• K is a set of states, and F ⊆ K is a set of accepting/final states;
• δ : K × Σ → K × Σ × {L, H, R} is a transition function which assigns to each state s ∈ K and symbol c ∈ Σ the triple δ(s, c) = (s', c', pos);
• s0 ∈ K is the initial state.
The Turing Machine has a tape which contains infinitely many cells in both directions; each tape cell holds a symbol from Σ. The Turing Machine has a tape head, which is able to read the symbol from the current cell. Also, the Turing Machine is always in a given state. Initially (before the machine has started), the state is s0. From a given state s, the Turing Machine reads the symbol c from the current cell, and performs a transition. The transition is given by δ(s, c) = (s', c', pos). Performing the transition means that the TM moves from state s to s', overwrites the symbol c with c' on the tape cell and: (i) if pos = L, moves the tape head to the next cell to the left, (ii) if pos = R, moves the tape head to the next cell to the right, and (iii) if pos = H, leaves the tape head on the current cell. The Turing Machine performs transitions according to δ. Whenever the TM reaches an accepting/final state, we say it halts. If a TM reaches a non-accepting state where no further transition is possible, we say it clings/hangs.
• the input of a Turing Machine is a finite word which is contained on its otherwise empty tape;
• the output of a TM is the contents of the tape (not including empty cells) after the Machine has halted. We also write M(w) to refer to the output of M, given input w.

Example 0.2.2.1 (Turing Machine) Consider the alphabet Σ = {#, >, 0, 1}, the set of states K = {s0, s1, s2}, the set of final states F = {s2} and the transition function:

δ(s0, 0) = (s0, 0, R)
δ(s0, 1) = (s0, 1, R)
δ(s0, #) = (s1, #, L)
δ(s1, 1) = (s1, 0, L)
δ(s1, 0) = (s2, 1, H)
δ(s1, >) = (s2, 1, H)

The Turing Machine M = (K, F, Σ, δ, s0) reads a number encoded in binary on the tape, and increments it by 1. The symbol # encodes the empty tape cell.4 Initially, the tape head is positioned at the most significant bit of the number. The Machine first goes over all bits, from left to right. When the first empty cell is detected, the machine goes into state s1, and starts flipping 1s to 0s, until the first 0 (or the initial position, marked by >) is detected. Finally, the machine places 1 on the current cell, and enters its final state.

The behaviour of the transition function can be represented more intuitively as in Figure 2.2.1. Each node represents a state, and each edge a transition. The label on each edge is of the form c/c', pos, where c is the symbol read from the current tape cell, c' is the symbol written on the current tape cell and pos is a tape head position. The label should be read as: the machine replaces c with c' on the current tape cell and moves in the direction indicated by pos.

4 We shall use # to refer to the empty cell throughout the text.
[Figure 2.2.1: The binary increment Turing Machine (state diagram with nodes s0, s1, s2 and edges labeled c/c', pos)]

Let us consider that, initially, the tape contains >0111, the representation of the number 7. The evolution of the tape is shown below. Each line shows the TM configuration at step i, that is, the tape and the current state after transition i. For convenience, we show only two empty cells in each direction.

Transition no    Tape         Current state
0                ##>0111##    s0
1                ##>0111##    s0
2                ##>0111##    s0
3                ##>0111##    s0
4                ##>0111##    s0
5                ##>0111##    s1
6                ##>0110##    s1
7                ##>0100##    s1
8                ##>0000##    s1
9                ##>1000##    s2
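The example machine can be simulated in a few lines of Python. This is a sketch under assumptions of my own: the tape is a dictionary from cell position to symbol, with '#' standing for unwritten cells, and `run_tm` and `DELTA` are illustrative names.

```python
def run_tm(delta, finals, tape, state, head):
    """Run a DTM until it halts (reaches a final state) or hangs (no
    transition applies). tape: dict position -> symbol; absent cells are '#'."""
    while state not in finals:
        symbol = tape.get(head, '#')
        if (state, symbol) not in delta:
            return None                            # the machine clings/hangs
        state, tape[head], pos = delta[(state, symbol)]
        head += {'L': -1, 'H': 0, 'R': 1}[pos]
    return tape

# The transition function of the binary-increment machine, Example 0.2.2.1
DELTA = {
    ('s0', '0'): ('s0', '0', 'R'),
    ('s0', '1'): ('s0', '1', 'R'),
    ('s0', '#'): ('s1', '#', 'L'),
    ('s1', '1'): ('s1', '0', 'L'),
    ('s1', '0'): ('s2', '1', 'H'),
    ('s1', '>'): ('s2', '1', 'H'),
}
```

Starting from the tape >0111 with the head on the most significant bit reproduces the trace above: the machine halts in s2 with >1000 on the tape.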
In order to better understand the Turing Machine, it is useful to establish some similarities with, e.g., assembly languages. As specified in Definition 0.2.2.1, a Turing Machine M specifies a clearly defined behaviour, which is captured by δ. Thus, M is quite similar to a specific program, performing a definite task. If programs (algorithms) are abstracted by Turing Machines, then what is the abstraction for the programming language? The answer is, again, the Turing Machine. This implies that a Turing Machine acting as a programming language can be fed another Turing Machine acting as a program, and execute it. In the following Proposition, we show how Turing Machines can be encoded as words:

Proposition 0.2.2.1 (TMs as words) Any Turing Machine M = (K, F, Σ, δ, s0) can be encoded as a word over Σ. We write enc(M) to refer to this
word.

Proof (sketch): Intuitively, we encode states and positions as integers n ∈ N, transitions as tuples of integers, etc., and subsequently "convert" each integer to its word counterpart in Σ*, cf. Proposition 0.2.1.1.

Let NonFin = K \ F \ {s0} be the set of non-final states, excluding the initial one. We encode each state in NonFin as an integer in {1, 2, ..., |NonFin|} and each final state as an integer in {|NonFin| + 1, ..., |NonFin| + |F|}. We encode the initial state s0 as |NonFin| + |F| + 1, and L, H, R as |NonFin| + |F| + i with i ∈ {2, 3, 4}. Each of the above integers is represented as a word using ⌈log_|Σ|(|NonFin| + |F| + 4)⌉ symbols. Each transition δ(s, c) = (s', c', pos) is encoded as:

enc(s)#c#enc(s')#c'#enc(pos)

where enc(·) is the encoding described above. The entire δ is encoded as a sequence of encoded transitions, separated by #. The encoding of M is:

enc(M) = enc(|NonFin|)#enc(|F|)#enc(δ)
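One possible concretization of this encoding is sketched below. It follows the proof's numbering of states and positions, but writes integers in decimal rather than as fixed-width words over Σ, a simplification of the proof's last step; the function name and argument layout are my own.

```python
def encode_tm(states, finals, initial, delta):
    """enc(M) with the proof's numbering: non-final states get 1..|NonFin|,
    final states the next |F| numbers, the initial state the next one, and
    L, H, R the following three numbers."""
    nonfin = [s for s in states if s not in finals and s != initial]
    number = {s: i + 1 for i, s in enumerate(nonfin)}
    number.update({s: len(nonfin) + 1 + i for i, s in enumerate(finals)})
    base = len(nonfin) + len(finals)
    number[initial] = base + 1
    number.update({'L': base + 2, 'H': base + 3, 'R': base + 4})
    enc = str        # simplification: decimal digits, not base-|Σ| words
    transitions = ['#'.join([enc(number[s]), c, enc(number[t]), d, enc(number[p])])
                   for (s, c), (t, d, p) in delta.items()]
    return '#'.join([enc(len(nonfin)), enc(len(finals))] + transitions)
```

For a machine with one non-final, one final and one initial state, and a single transition δ(s1, 0) = (s2, 1, H), this yields the word 1#1#1#0#2#1#5.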
Thus, enc(M) is a word, which can be fed to another Turing Machine. The latter should have the ability to execute (or simulate) M. This is indeed possible:

Proposition 0.2.2.2 (The Universal Turing Machine) There exists a TM U which, for any TM M and every word w ∈ Σ*, takes enc(M) and w as input and outputs 1 whenever M(w) = 1 and 0 whenever M(w) = 0. We call U the Universal Turing Machine, and say that U simulates M.
Proof: Let M be a TM and w = c1 c2 ... cn be a word built from the alphabet of M. We build the Universal Turing Machine U as follows:
• The input of U is enc(M)#enc(s0)#c1#c2 ... cn. Note that enc(s0) encodes the initial state of M, while c1 is the first symbol from w. The portion of the tape enc(s0)#c1#c2 ... cn will be used to mark the current configuration of M, namely the current state of M (initially s0), the contents of M's tape, and M's current head position. More generally, this portion of the tape is of the form enc(si)#u#v, with u, v ∈ Σ_b* and si being the current state of M. The last symbol of u marks the current symbol, while v is the word to the right of the head. Initially, the current symbol is the first one, namely c1.
0.2. COMPUTABILITY THEORY
• U will scan the current state of M, then it will move to the current symbol from w, and finally will move to the portion of enc(M) where transitions are encoded. Once a valid transition is found, it will execute it: 1. U will change the current state, according to the transition; 2. U will change the current symbol in w, according to the transition; 3. U will change the current head position, according to pos, from the transition;
• U will repeat this process until an accepting state of M is detected, or until no transition can be performed.
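The simulation loop described by the bullets above can be mirrored in a few lines. The sketch below is an illustration, not the tape-level construction: it keeps the configuration (state, tape, head) explicitly and repeatedly looks up δ, just as U scans enc(M) for a matching transition; the toy machine at the end is hypothetical.

```python
def simulate(delta, s0, finals, word, blank="_", max_steps=10_000):
    """Step through M's configurations the way U would: find the
    transition for (state, current symbol), rewrite, move the head."""
    tape, head, state = list(word), 0, s0
    for _ in range(max_steps):
        if state in finals:
            return "".join(tape)
        key = (state, tape[head] if head < len(tape) else blank)
        if key not in delta:           # no transition: M blocks
            return None
        state, symbol, move = delta[key]
        if head >= len(tape):
            tape.append(blank)
        tape[head] = symbol
        head += {"L": -1, "H": 0, "R": 1}[move]
    return None                        # did not halt within the budget

# Toy machine: flip every 0 to 1, moving right until the blank.
delta = {("s", "0"): ("s", "1", "R"),
         ("s", "1"): ("s", "1", "R"),
         ("s", "_"): ("f", "_", "H")}
print(simulate(delta, "s", {"f"}, "0010"))
```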
Propositions 0.2.2.2 and 0.2.2.1 show that TMs have the capability to characterize both algorithms, as well as the computational framework to execute them. One question remains: what can TMs actually compute? Can they be used to sort vectors, solve SAT, etc.? The answer, which is positive, is given by the following hypothesis:

Conjecture 0.2.2.1 (Church-Turing) Any problem which can be solved with the Turing Machine is "universally solvable".

The term "universally solvable" cannot be given a precise mathematical definition. We only know solvability w.r.t. concrete means, e.g. computers, programming languages, etc. It can be (and has been) shown that the Turing Machine can solve any problem which known programming languages can solve.⁵ The Turing Machine, in itself, describes a model of computation based on side-effects: each transition may modify the tape in some way. Computation can be described differently, for instance: as function application, or as term rewriting. However, all other known computational models are equivalent to the Turing Machine, in the sense that they solve precisely the same problems. This observation prompted the aforementioned conjecture. It is strongly believed to hold (as evidence suggests), but it cannot be formally proved.
5 To be fair to the TM, one would formulate this statement as: "all programming languages are Turing-complete, i.e. they can solve everything the TM can solve".
0.2.3 Decidability
The existence of the Universal Turing Machine (U) inevitably leads to interesting questions. Assume M is a Turing Machine and w is a word. By convention, the input of U consists of the encoding of a TM, followed by a special separator symbol, and then M's input w; we write this input as enc(M) w.

(?) Does U halt for all inputs?

If the answer is positive, then U can be used to tell whether any machine halts, for a given input. We already have some reasons to believe we cannot answer (?) positively, if we examine the proof of Proposition 0.2.2.2. Actually, (?) is a decision problem, one that is quite interesting and useful. As before, we try to lift our setting to a more general one: can any problem be solved by some Turing Machine? The following propositions indicate that this is not likely the case:

Proposition 0.2.3.1 The set TM of Turing Machines is countably infinite.

Proof: The proof follows immediately from Proposition 0.2.2.1. Any Turing Machine can be uniquely encoded by a string, hence the set of Turing Machines is isomorphic to a subset of Σ*, which in turn is countably infinite, since Σ* is countably infinite, for any Σ.

Proposition 0.2.3.2 The set Hom(N, N) of functions f : N → N is uncountably infinite.
Proof: It is sufficient to show that Hom(N, {0, 1}) is uncountably infinite. We build a proof by contradiction. We assume Hom(N, {0, 1}) is countably infinite. Hence, each natural number n ∈ N corresponds to a function f_n ∈ Hom(N, {0, 1}). We build a matrix as follows: columns describe functions f_n : n ∈ N. Rows describe inputs k ∈ N. Each matrix element m_{i,j} is the value of f_j(i) (hence, the expected output for input i, from function f_j). In Figure 0.2.3, we have illustrated our matrix. We now devise a problem f* as follows:
f*(x) = 1, iff f_x(x) = 0
        0, iff f_x(x) = 1
Since f* ∈ Hom(N, {0, 1}), it must also have a number assigned to it: f* = f_α for some α ∈ N. Then f*(α) = 1 iff f_α(α) = 0. But f_α(α) = f*(α). Contradiction. On the other hand, f*(α) = 0 iff f_α(α) = 1. As before, we obtain a contradiction.
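The diagonal argument can be played out on any concrete list of functions. In the sketch below, the enumeration f_0, f_1, ... is an arbitrary, made-up finite sample; f_star flips the diagonal entry m_{x,x}, and therefore differs from every listed f_n at input n, so it can appear nowhere in the list:

```python
# A hypothetical (finite, illustrative) enumeration f_0, f_1, ... of 0/1 functions:
fs = [lambda n: 1, lambda n: n % 2, lambda n: 0, lambda n: (n + 1) % 2]

def f_star(x):
    # flip the diagonal entry m_{x,x}
    return 1 if fs[x](x) == 0 else 0

# f_star disagrees with every f_alpha at input alpha:
for alpha in range(len(fs)):
    assert f_star(alpha) != fs[alpha](alpha)
print("f_star differs from each listed function on the diagonal")
```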
      f_0  f_1  f_2  ...  f_n  ...
0      1    1    0   ...   0   ...
1      0    1    1   ...   0   ...
2      1    0    1   ...   1   ...
...   ...  ...  ...  ...  ...  ...
n      1    1    0   ...   1   ...
...   ...  ...  ...  ...  ...  ...
Figure 0.2.3: An example of the matrix from the proof of Proposition 0.2.3.2. The values m_{i,j} have been filled in purely for illustration.

Propositions 0.2.3.2 and 0.2.3.1 tell us that there are infinitely more functions (decision problems) than means of computing them (Turing Machines). Our next step is to look at solvable and unsolvable problems, and devise a method for separating the first from the latter. In other words, we are looking for a tool which allows us to identify those problems which are solvable, and those which are not. We start by observing that Turing Machines may never halt. We write M(w) = ⊥ to designate that M loops infinitely for input w. Also, we write n_w ∈ N to refer to the number which corresponds to w, according to Proposition 0.2.1.1. Next, we refine the notion of problem solving:
Definition 0.2.3.1 (Decision, acceptance) Let M be a Turing Machine and f ∈ Hom(N, {0, 1}). We say that:

• M decides f, iff for all n_w ∈ N: M(w) = 1 whenever f(n_w) = 1, and M(w) = 0 whenever f(n_w) = 0.

• M accepts f, iff for all n_w ∈ N: M(w) = 1 iff f(n_w) = 1, and M(w) = ⊥ iff f(n_w) = 0.
Note that, in contrast with acceptance, decision is, intuitively, a stronger means of computing a function (i.e. solving a problem). In the latter case, the TM at hand can provide both a yes and a no answer to any problem instance, while in the former, the TM can only provide an answer of yes. If the answer to the problem instance at hand is no, the TM will not halt. Based on the two types of problem solving, we can classify problems (functions) as follows:
Definition 0.2.3.2 Let f ∈ Hom(N, {0, 1}) be a decision problem.

• f is recursive (decidable) iff there exists a TM M which decides f. The set of recursive functions is

R = { f ∈ Hom(N, {0, 1}) | f is recursive }

• f is recursively enumerable (semi-decidable) iff there exists a TM M which accepts f. The set of recursively-enumerable functions is

RE = { f ∈ Hom(N, {0, 1}) | f is recursively-enumerable }
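The difference between deciding and accepting can be made concrete. In the sketch below, machines are modelled, hypothetically, as Python functions for a toy problem (f(n) = 1 iff n is even): the decider always halts with an answer, while the acceptor diverges on no-instances, mirroring the ⊥ case of the definition:

```python
# f(n) = 1 iff n is even, as a toy decision problem.
def decider(n):
    return 1 if n % 2 == 0 else 0      # always halts: decides f

def acceptor(n):
    if n % 2 == 0:
        return 1                       # answers 1 on yes-instances
    while True:                        # diverges (⊥) on no-instances
        pass

assert decider(4) == 1 and decider(7) == 0
assert acceptor(4) == 1                # acceptor(7) would never return
print("decider answers both ways; acceptor only answers yes")
```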
Now, let us turn our attention to question (?), which we shall formulate as a problem:

f_h(n_{enc(M) w}) = 1, iff M(w) halts
                    0, iff M(w) = ⊥

Hence, the input of f_h is a natural number which encodes a Turing Machine M and an input word w. The first question we ask is whether f_h ∈ R.

Proposition 0.2.3.3 f_h ∉ R.
Proof: Assume f_h ∈ R and denote by M_h the Turing Machine which decides f_h. We build the Turing Machine D, as follows:

D(enc(M)) = 1, iff M_h(enc(M) enc(M)) = 0
            ⊥, iff M_h(enc(M) enc(M)) = 1

The existence of the Universal Turing Machine guarantees that D can indeed be built, since D simulates M_h. We note that M_h(enc(M) enc(M)) decides if the TM M halts with "itself" as input (namely enc(M)).

Assume D(enc(D)) = 1. Hence M_h(enc(D) enc(D)) = 0, that is, machine D does not halt for input enc(D). Hence D(enc(D)) = ⊥. Contradiction. Assume D(enc(D)) = ⊥. Hence M_h(enc(D) enc(D)) = 1, and thus D(enc(D)) halts. Contradiction.

We note that the construction of D mimics the technique which we applied in the proof of Proposition 0.2.3.2, which is called diagonalization.
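The construction of D has a well-known programming analogue. Assuming, hypothetically, a total function halts(prog, arg) that decides halting (no such function exists; it is a stand-in for M_h), the program below inverts its verdict on itself, exactly as D does:

```python
def make_D(halts):
    """Given any claimed halting decider (the hypothetical M_h),
    build the machine D from the proof."""
    def D(prog):
        if halts(prog, prog):   # M_h(enc(M) enc(M)) = 1 ...
            while True:         # ... then loop forever (output ⊥)
                pass
        return 1                # otherwise halt with 1
    return D

# Any candidate decider is refuted on input D itself:
candidate = lambda prog, arg: True      # a (necessarily wrong) hypothetical decider
D = make_D(candidate)
# candidate claims D(D) halts, but by construction D(D) would loop forever.
print(candidate(D, D), "-> yet D(D) loops; the candidate is wrong")
```

Whatever the candidate answers about D(D), the construction makes the opposite happen, which is the contradiction of the proof.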
Exercise 0.2.3.1 Apply the diagonalization technique from the proof of Proposition 0.2.3.2, in order to prove Proposition 0.2.3.3.

Proposition 0.2.3.4 f_h ∈ RE.
Proof: We build a Turing Machine M_h which accepts f_h. Essentially, M_h is the Universal Turing Machine. M_h(enc(M) w) simulates M, and if M(w) halts, then it outputs 1. If M(w) does not halt, M_h(enc(M) w) = ⊥.

Propositions 0.2.3.3 and 0.2.3.4 produce a classification for f_h. The question which we shall answer is how to classify any problem f, by establishing membership in R and RE, respectively. We start with a simple proposition:
Proposition 0.2.3.5 R ⊊ RE.

Proof: R ⊆ RE is straightforward from Definition 0.2.3.2. Let f ∈ R, and M_f be the TM which decides f. We build the TM M' such that M'(w) = 1 iff M_f(w) = 1, and M'(w) = ⊥ iff M_f(w) = 0. M' simulates M_f but enters an infinite loop whenever M_f(w) = 0. M' accepts f, hence f ∈ RE. R ≠ RE has already been shown by Propositions 0.2.3.3 and 0.2.3.4: f_h ∈ RE but f_h ∉ R.

Thus, R and RE should be interpreted as a "scale" for solvability: R membership is complete solvability, RE membership is partial solvability, while non-membership in RE is "complete" unsolvability.
Remark 0.2.3.1 We note that R and RE are not the only sets of functions which are used in Computability Theory. It has been shown that there are "degrees" of unsolvability, of "higher level" than R and RE. These degrees are intuitively obtained as follows: we assume we live in a world where f_h is decidable (recursive). Now, as before, we ask which problems are recursive and which are recursively-enumerable. It turns out that, also in this ideal case, there still exist recursive and recursively-enumerable problems, as well as some which are neither. This could be imagined as "undecidability level 1". Now, we take some problem which is in RE on level 1, and repeat the same assumption: that it is decidable. Again, under this assumption, we find problems in R, in RE and outside the two, which form "undecidability level 2". This process can be repeated ad infinitum.

Returning to our simpler classification, we must observe an interesting feature of recursively-enumerable functions, which is also the reason they are called this way.

Proposition 0.2.3.6 A function f ∈ Hom(N, {0, 1}) is recursively enumerable iff there exists a Turing Machine which can enumerate/generate all elements of A_f = {w ∈ Σ* | f(n_w) = 1}. Intuitively, A_f is the set of inputs of f for which the answer is yes.
Proof: (⇒) Suppose f is recursively-enumerable and M accepts f. We write w_i to refer to the i-th word from Σ*. We specify the TM generating A_f by the following pseudocode:

Algorithm 1: GEN()
static A_f = ∅; static k = 0;
while True do
    for 0 ≤ i ≤ k do
        run M(w_i) for k steps;
        if M(w_i) halts before k steps and w_i ∉ A_f then
            A_f = A_f ∪ {w_i};
            return w_i
        end
    end
    k = k + 1;
end
The value of k from the for loop has a two-fold usage. First, it is used to explore all inputs w_i : 0 ≤ i ≤ k. Second, it is used as a time-limit for M. For each w_i, we run M(w_i) for precisely k steps. If M(w_i) = 1 in at most k steps, then w_i is added to A_f, and then returned (written on the tape). Also, w_i is stored for a future execution of GEN. If M(w_i) = 1 for some w_i, then there must exist a k ≥ i such that M(w_i) halts after k steps. Thus, such a k will eventually be reached.

(⇐) Assume we have the Turing Machine GEN which generates A_f. We construct a Turing Machine M which accepts f. M works as follows:
Algorithm 2: M(w)

A_f = ∅;
while w ∉ A_f do
    w' = GEN();
    A_f = A_f ∪ {w'}
end
return 1

M simply uses GEN to generate elements of A_f. If w ∈ A_f, it will eventually be generated, and M will output 1. Otherwise, M will loop. Thus M accepts f.

Proposition 0.2.3.6 is useful since, in many cases, it is easier to find a generator for f, instead of a Turing Machine which accepts f.
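The pseudocode of GEN translates directly into an executable dovetailing loop. In the sketch below, step-bounded simulation is modelled by a hypothetical helper run_for(w, k) that returns 1 if the acceptor accepts w within k steps and None otherwise; the sample acceptor (even-length words, looping on odd-length ones) is purely illustrative:

```python
def gen(words, run_for):
    """Enumerate A_f by dovetailing: at stage k, run the acceptor on
    w_0..w_k for k steps each."""
    A, k = set(), 0
    while True:
        for i in range(k + 1):
            if i < len(words) and words[i] not in A and run_for(words[i], k) == 1:
                A.add(words[i])
                yield words[i]
        k += 1

# Hypothetical acceptor: accepts a word of even length after len(w) steps;
# "loops forever" (returns None for every budget) on odd-length words.
run_for = lambda w, k: 1 if len(w) % 2 == 0 and len(w) <= k else None

words = ["", "a", "ab", "abc", "abcd"]
g = gen(words, run_for)
print([next(g) for _ in range(3)])   # the three even-length words
```

Note how the two uses of k from the text (input index bound and time budget) appear as the range bound and the argument of run_for.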
Finally, in what follows, we shall take a few decision problems, and apply a reduction technique, in order to prove they are not decidable.

Halting on all inputs. Let:

f_all(n_{enc(M)}) = 1, iff M halts for all inputs
                    0, otherwise

The technique we use to show f_all ∉ R is called a reduction (from f_h). It proceeds as follows. We assume f_all ∈ R. Starting from the TM M_all which decides f_all, we build a TM which decides f_h. Thus, if f_all is decidable, then f_h is decidable, which leads to a contradiction.

First, for each fixed TM M and fixed input w ∈ Σ*, we build the TM Π_{M,w}(ω) = "replace ω by w and then simulate M(w)". It is easy to see that (∀ω ∈ Σ*: Π_{M,w}(ω) halts) iff M(w) halts. Now, we build the TM M_h which decides f_h. The input of M_h is enc(M) w. We construct Π_{M,w} and run M_all(enc(Π_{M,w})). By assumption, M_all must always halt. If the output is 1, then Π_{M,w}(ω) halts for all inputs, hence M(w) halts. We output 1. If the output is 0, then Π_{M,w}(ω) does not halt for all inputs, hence M(w) does not halt. We output 0.

We have built a reduction from f_h to f_all: using the TM which decides f_all, we have constructed a machine which decides f_h. Since f_h is not recursive, we obtain a contradiction.
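The transformation at the heart of the reduction is itself simple, computable code. As a sketch, Pi(M, w) below builds Π_{M,w}, the machine that ignores its actual argument and runs M on the fixed w; machines are modelled, hypothetically, as Python functions:

```python
def Pi(M, w):
    """Build Pi_{M,w}: discard the actual input, then run M on the fixed w."""
    def machine(omega):        # omega is ignored
        return M(w)
    return machine

# Illustration with a hypothetical machine that halts only on non-empty inputs:
M = lambda x: 1 if x else exec("while True: pass")
P = Pi(M, "abc")
# Pi_{M,w} behaves identically on *every* input: it halts iff M halts on w.
print(P("anything"), P(""))    # both calls run M("abc"), which halts
```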
Definition 0.2.3.3 (Turing-reducibility) Let f_A, f_B ∈ Hom(N, {0, 1}). We say f_A is Turing reducible to f_B, and write f_A ≤_T f_B, iff there exists a decidable transformation T ∈ Hom(N, N) such that f_A(n) = 1 iff f_B(T(n)) = 1.
Remark 0.2.3.2 (Reducibility) We note that the transformation T must be computable, in the sense that a Turing Machine should be able to compute T, for any possible valid input. When proving f_h ≤_T f_all, we have taken n_{enc(M) w} — an instance of f_h — and shown that it could be transformed into n_{enc(Π_{M,w})} — an instance of f_all — such that f_h(n_{enc(M) w}) = 1 iff f_all(n_{enc(Π_{M,w})}) = 1. A Turing Machine can easily perform the transformation of enc(M) w into enc(Π_{M,w}), since it involves adding some states and transitions which precede the start-state of M; hence T is computable.
Halting on 111. We define:

f_111(n_{enc(M)}) = 1, iff M(111) halts
                    0, otherwise
We reduce f_h to f_111. Assume M_111 decides f_111. Given a Turing Machine M and a word w, we construct the machine:

Π_{M,w}(ω) = if ω = 111 then M(w) else loop

We observe that (i) the transformation from enc(M) w to enc(Π_{M,w}) is decidable, since it involves adding precisely three states to M: these states check the input ω and, if it is 111, replace it with w and run M; (ii) M_111(enc(Π_{M,w})) = 1 iff M(w) halts. The reduction is complete: f_111 ∉ R.
Halting on some input. We define:

f_any(n_{enc(M)}) = 1, iff M(w) halts for some w ∈ Σ*
                    0, otherwise

We reduce f_111 to f_any. We assume f_any is decided by M_any. We construct:

Π_M(ω) = replace ω by 111 and run M(111)

Now, M_any(enc(Π_M)) = 1 iff M(111) halts, hence we can use M_any to build a machine which decides f_111. Contradiction: f_any ∉ R.
Machine halt equivalence. We define:

f_eq(n_{enc(M_1) enc(M_2)}) = 1, iff for all w ∈ Σ*: M_1(w) halts iff M_2(w) halts
                              0, otherwise

We reduce f_all to f_eq. Let M_triv be a one-state Turing Machine which halts on every input, and M_eq be the Turing Machine which decides f_eq. Then M_eq(enc(M) enc(M_triv)) = 1 iff M halts on all inputs. We have shown that we can use M_eq in order to build a machine which decides f_all. Contradiction: f_eq ∉ R.

So far, we have used reductions in order to establish problem non-membership in R. There are other properties of R and RE which can be of use for this task. First, we define:
Definition 0.2.3.4 (Complement of a problem) Let f ∈ Hom(N, {0, 1}). We denote by f̄ the problem:

f̄(n) = 1, iff f(n) = 0
       0, iff f(n) = 1

We call f̄ the complement of f.
For instance, the complement of f_h is the problem which asks if a Turing Machine M does not halt for input w. We also note that the complement of f̄ is f itself. Next, we define the class:

coRE = { f ∈ Hom(N, {0, 1}) | f̄ ∈ RE }

coRE contains the set of all problems whose complement is in RE. We establish that:
Proposition 0.2.3.7 RE ∩ coRE = R.

Proof: Assume f ∈ RE ∩ coRE. Hence, there exists a Turing Machine M which accepts f, and a Turing Machine M̄ which accepts f̄. We build the Turing Machine:

M*(w) = for i ∈ N:
    run M(w) for i steps. If M(w) = 1, return 1. Otherwise:
    run M̄(w) for i steps. If M̄(w) = 1, return 0.

Since M and M̄ will always halt when the expected result is 1, exactly one of them eventually answers, so together they can be used to decide f. Hence f ∈ R. For the converse inclusion, note that f ∈ R implies f ∈ RE by Proposition 0.2.3.5, and f ∈ coRE by combining Proposition 0.2.3.8 below with Proposition 0.2.3.5.
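The machine M* from the proof alternates step-bounded runs of the two acceptors. In the sketch below, the acceptors are modelled by hypothetical helpers run_M(w, i) / run_Mbar(w, i) that report 1 only once they have accepted within i steps; the sample problem (words starting with 'a') is illustrative:

```python
from itertools import count

def decide(run_M, run_Mbar, w):
    """Interleave step-bounded runs of M (accepting f) and M-bar
    (accepting the complement). Exactly one eventually answers 1."""
    for i in count():
        if run_M(w, i) == 1:
            return 1
        if run_Mbar(w, i) == 1:
            return 0

# Hypothetical acceptors for f(w) = 1 iff w starts with 'a',
# each "accepting" only after len(w) simulated steps:
run_M    = lambda w, i: 1 if w.startswith("a") and i >= len(w) else None
run_Mbar = lambda w, i: 1 if not w.startswith("a") and i >= len(w) else None

print(decide(run_M, run_Mbar, "abc"), decide(run_M, run_Mbar, "xyz"))
```

The increasing budget i is essential: running either acceptor without a bound could diverge before the other ever gets a chance to answer.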
Proposition 0.2.3.8 f ∈ R iff f̄ ∈ R.

Proof: The proposition follows immediately, since the Turing Machine which decides f can be used to decide f̄, by simply switching its output from 0 to 1 and from 1 to 0. The same holds for the other direction.

We conclude this chapter with a very powerful result, which states that an entire category/type of problems does not belong to R.
Theorem 0.2.3.1 (Rice) Let C ⊆ RE be non-empty, such that the trivial problem f(n) = 0 is not in C. Given a Turing Machine M, we ask: "Is the problem accepted by M in C?". Answering this question is not in R.

Proof: By hypothesis, the trivial problem f(n) = 0 is not in C. Since C is non-empty, suppose f* ∈ C, and, since f* is recursively-enumerable, let M* be the Turing Machine which accepts f*.

We apply a reduction from a variant of f_111, namely f_x. f_x asks if a Turing Machine halts for input x. We assume we can decide membership f ∈ C by some Turing Machine. Based on the latter, we construct a Turing Machine which decides f_x (i.e. solves the halting problem for a particular input). Let M_x be the Turing Machine which accepts f_x. Let:

Π_w(ω) = if M_x(w) then M*(ω) else loop

If f_{Π_w} is the problem accepted by Π_w, we show that:

f_{Π_w} ∈ C iff M_x(w) halts
(⇒) Suppose f_{Π_w} ∈ C. Then Π_w(ω) cannot loop for every input ω ∈ Σ*. If it were so, then f_{Π_w} would be the trivial function always returning 0 for any input, which we have assumed is not in C. Thus, M_x(w) halts.

(⇐) Suppose M_x(w) halts. Then the behaviour of Π_w(ω) is precisely that of M*(ω): Π_w(ω) will return 1 whenever M*(ω) returns 1, and Π_w(ω) = ⊥ whenever M*(ω) = ⊥. Since f* ∈ C, then also f_{Π_w} ∈ C.

In Theorem 0.2.3.1, the set C should be interpreted as a property of problems, and subsequently of the Turing Machines which accept them. Checking whether some Turing Machine satisfies the given property is undecidable. Consider the property informally described as: the set of Turing Machines (/computer programs) that behave as viruses. Deciding whether a Turing Machine behaves as a virus (i.e. belongs to the former set) is undecidable, via Rice's Theorem.

0.3 Complexity Theory

0.3.1 Measuring time and space
In Computability Theory, we have classified problems (e.g. in the classes R and RE) based on the Turing Machine's ability to decide/accept them. In order to classify problems based on hardness, we need to account for the number of steps (time) and tape cells (space) which are employed by a Turing Machine. The amount of resources (time/space) spent by a Turing Machine M may be expressed as functions:

T_M, S_M : Σ* → N

where T_M(w) (resp. S_M(w)) is the number of steps performed (resp. tape cells used) by M, when running on input w. This definition suffers from unnecessary overhead, which makes time and space analysis difficult. We formulate some examples to illustrate why this is the case:
Alg(n)
    while n < 100
        n = n + 1
    return 1

We note that Alg runs 100 steps for n = 0, while only one step for n ≥ 100. However, in practice, it is often considered that each input is as likely to occur as any other.⁶ For this reason, we shall adopt the following convention:

(?) We always consider the most expensive/unfavourable case, given inputs of a certain size.

In our previous example, we consider the running time of Alg as being 100, since this is the most expensive case. Consider the following example:
Sum(v, n)
    s = 0, i = 0
    while i < n
        s = s + v[i]
        i = i + 1
    return s

Unlike Alg, Sum does not have a universal upper limit on its running time. The number of steps Sum executes depends on the number of variables from v, namely n, and is equal to 2n + 3, if we consider each variable initialisation and the return statement as computing steps. Thus, we observe that:

(??) The running time (resp. consumed space) of an algorithm will grow as the size of the input grows.

We can now merge (?) and (??) into a definition:

Definition 0.3.1.1 (Running time of a TM) The running time of a Turing Machine M is given by T_M : N → N iff:

∀w ∈ Σ*: the number of transitions performed by M is at most T_M(|w|).
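The 2n + 3 count for Sum above can be checked by instrumenting the pseudocode. A small sketch, counting (as in the text) each initialisation, each loop-body statement and the return as one step, while condition checks are not counted:

```python
def sum_steps(v):
    """Run Sum and count its steps under the convention from the text."""
    steps = 0
    s, i = 0, 0; steps += 2          # two initialisations
    while i < len(v):
        s = s + v[i]; steps += 1     # two counted statements
        i = i + 1;   steps += 1      # per loop iteration
    steps += 1                       # the return statement
    return s, steps

for n in range(5):
    total, steps = sum_steps(list(range(n)))
    assert steps == 2 * n + 3        # matches the count in the text
print("T_Sum(n) = 2n + 3 confirmed for small n")
```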
Remark 0.3.1.1 (Consumed space for a TM) A naive definition for the consumed space of a Turing Machine would state that S_M(|w|) is the number of tape cells which M employs. This definition is imprecise. Consider the Turing Machine which receives a binary word as input and computes whether the word is a power of 2. Aside from reading its input, the machine consumes no space. Thus, we might refine our definition into: "S_M(|w|) is the number of tape writes which M performs". This definition is also imprecise. Consider the binary counter Turing Machine from the former chapter. It performs a number of writes proportional to the number of consecutive 1's found at the end of the string. However, the counter does not use additional space; it only processes the input. Thus, the consumed space S_M(|w|) is the number of written cells, except those of the input. Consider a Turing Machine which receives n numbers encoded as binary words, each having at most 4 bits, and which computes the sum of the numbers, modulo 2⁴. Apart from reading the 4n bits and the n − 1 word separators, the machine employs another 4 cells to hold a temporary sum. Thus, the consumed space for this machine is 4.⁷

A formal definition for the consumed space of a TM is outside the scope of this course, since it involves multi-tape Turing Machines. The basic idea is to separate the input from the rest of the space used for computation. Thus, when assessing the consumed space of an algorithm, we shall never account for the space consumed by the input.

6 This is not often the case. There are numerous algorithms which rely on some probability that the input is of some particular type. For instance, efficient SAT solvers rely on a particular ordering of variables, when interpretations are generated and verified. On certain orderings and certain formulae, the algorithm runs in polynomial time. The key to the efficiency of SAT solvers is that programmers choose an efficient ordering, based on some expectancy regarding the input formula. The algorithm may be exponentially costly for some formulae, but runs in close-to-polynomial time for most of the inputs.
Recall that, as in Computability Theory, our primary agenda is to produce a classification of problems. To this end, it makes sense to first introduce a classification of Turing Machine running times.

Asymptotic notations

Remark 0.3.1.2 (Running times vs. arbitrary functions) In the previous section, we have defined running times of a Turing Machine as functions T : N → N, and we have seen that they are often monotonically increasing (n ≤ m ⟹ T(n) ≤ T(m)). While monotonicity is common among the running times of conventional algorithms, it is not hard to find examples (more-or-less realistic) where it does not hold. For instance, an algorithm may simply return, if its input exceeds a given size. Thus, we shall not, in general, assume that running times are monotonic. Furthermore, we shall extend our classification to arbitrary functions f : R → R, since there is no technical reason to consider only functions over naturals. In support of this, we shall also add that asymptotic notations are
7 We can also build another machine which simply uses the first number to hold the temporary sum, and thus use no additional space.
useful in other fields outside complexity theory, where the assumption that functions are defined over natural numbers only is not justified.

Definition 0.3.1.2 (Θ (theta) notation) Let g : R → R. Then Θ(g(n)) is the class of functions:

Θ(g(n)) = { f : R → R | ∃c1, c2 ∈ R⁺, ∃n0 ∈ N, ∀n ≥ n0: c1·g(n) ≤ f(n) ≤ c2·g(n) }

Thus, Θ(f(n)) is the class of all functions with the same asymptotic growth as f(n). We can easily observe that, for all continuous g ∈ Hom(R, R) such that g ∈ Θ(f(n)), we have lim_{n→∞} f(n)/g(n) = c, where c ≠ 0. There is an infinite number of classes Θ(f(n)), one for each function f. However, if g(n) ∈ Θ(f(n)), then Θ(g(n)) = Θ(f(n)). It makes sense to consider classes which describe functions with inferior/superior asymptotic growth:
Definition 0.3.1.3 (O, Ω notations) Let g : R → R. Then:

O(g(n)) = { f : R → R | ∃c ∈ R⁺, ∃n0 ∈ N, ∀n ≥ n0: 0 ≤ f(n) ≤ c·g(n) }

Ω(g(n)) = { f : R → R | ∃c ∈ R⁺, ∃n0 ∈ N, ∀n ≥ n0: 0 ≤ c·g(n) ≤ f(n) }

Note that g ∈ O(f(n)) ⟹ O(g(n)) ⊆ O(f(n)), while g ∈ Ω(f(n)) ⟹ Ω(g(n)) ⊆ Ω(f(n)). Finally, Ω(f(n)) ∩ O(f(n)) = Θ(f(n)). Each of the above propositions can be easily proved using the respective definitions of the notations.
O and Ω offer relaxed bounds on asymptotic function growth. Thus, g ∈ O(f(n)) should be read as: the function g grows asymptotically at most as much as f. It makes sense to also consider strict bounds:
Definition 0.3.1.4 (o, ω notations) Let g : R → R. Then:

o(g(n)) = { f : R → R | ∀c ∈ R⁺, ∃n0 ∈ N, ∀n ≥ n0: 0 ≤ f(n) ≤ c·g(n) }

ω(g(n)) = { f : R → R | ∀c ∈ R⁺, ∃n0 ∈ N, ∀n ≥ n0: 0 ≤ c·g(n) ≤ f(n) }
Thus, g ∈ o(f(n)) should be read: g grows asymptotically strictly less than f. We have o(f(n)) ∩ ω(f(n)) = ∅, O(f(n)) ∩ Ω(f(n)) = Θ(f(n)), and ω(f(n)) ∪ Θ(f(n)) = Ω(f(n)).
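Membership in all of these classes is certified by concrete witnesses c and n0. The sketch below checks a claimed witness pair numerically on a sample of points; this is a sanity check of the definitions, not a proof, and the functions f, g are illustrative:

```python
def certifies_O(f, g, c, n0, samples=range(1, 1000)):
    """Check 0 <= f(n) <= c*g(n) for all sampled n >= n0."""
    return all(0 <= f(n) <= c * g(n) for n in samples if n >= n0)

f = lambda n: 3 * n * n + 10 * n
g = lambda n: n * n

# f in O(n^2): c = 4, n0 = 10 works, since 3n^2 + 10n <= 4n^2 for n >= 10.
print(certifies_O(f, g, 4, 10))              # True
# But no witnesses put f in O(n): 3n^2 + 10n <= 100n fails beyond n = 30.
print(certifies_O(f, lambda n: n, 100, 1))   # False
```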
Exercise 0.3.1.1
If f(n) ∈ Ω(n²) and g(n) ∈ O(n³) then f(n)/g(n) ∈ ...
If f(n) ∈ o(n²) and g(n) ∈ Θ(n³) then f(n) · g(n) ∈ ...
If f(n) ∈ Θ(n³) and g(n) ∈ o(n²) then f(n) − g(n) ∈ ...
Exercise 0.3.1.2 Prove or disprove the following implications:
• f(n) = O(log n) ⟹ 2^f(n) = O(n)
• f(n) = O(n²) and g(n) = O(n) ⟹ f(g(n)) = O(n³)
• f(n) = O(n) and g(n) = 1 + √f(n) ⟹ g(n) = Ω(log n)
Syntactic sugars This section closely follows Lecture 2 from [1]. Quite often, asymptotic notations are used to refer to arbitrary functions having certain properties related to their order of growth. For instance, in:

⌈f(x)⌉ = f(x) + O(1)

applying "rounding" to f(x) may be expressed as the original f(x) to which we add a function bounded by a constant. Similarly:

1/(1 − x) = 1 + x + x² + x³ + Ω(x⁴), for −1 < x < 1
The above notation allows us to "formally disregard" terms from the expansion, by replacing them with an asymptotic notation which characterises their order of growth. One should make a distinction between the usage of asymptotic notations in arithmetic expressions, such as the ones previously illustrated, and in equations. Consider the following example:

f(x) = O(1/x)

which should be read: there exists a function h ∈ O(1/x) such that f(x) = h(x). Similarly:

f(x) = O(log x) + O(1/x)

should be read: there exist functions h ∈ O(1/x) and w ∈ O(log x) such that f(x) = w(x) + h(x). In equations such as:

O(x) = O(log x) + O(1/x)

the equality is not symmetric, and should be read from left to right: for any function f ∈ O(x), there exist functions h ∈ O(1/x) and w ∈ O(log x) such that f(x) = w(x) + h(x). In order to avoid mistakes, the following algorithmic rule should be applied. When reading an equation of the form:

left = right

• each occurrence of an asymptotic notation in left should be replaced by a universally quantified function belonging to the corresponding class.
• each occurrence of an asymptotic notation in right should be replaced by an existentially quantified function from the corresponding class.

0.3.2 Running time in complexity theory
Using asymptotic notations, we can distinguish between running times (of algorithms) with different asymptotic growths. Experience has shown that it is unfeasible to develop a theory which uses asymptotic notations in order to classify problems based on their difficulty. Thus, in complexity theory, we make an even stronger assumption: the exponent of a polynomial function is unimportant. Recall that, with asymptotic notations, we do not differentiate between n² and 2n² + n + 1 (and denote either of the two by Θ(n²)). In Complexity Theory, we do not distinguish between, e.g., n² and n³, and thus we write n^O(1), referring to a polynomial of arbitrary degree. Before introducing a classification of problems, there is a question which must be addressed: How does the encoding of a problem instance affect the running time of the subsequent algorithm? To see why this issue is important, consider the encoding of numbers using a single digit (e.g. IIIIIIII encodes the number 8). A Turing Machine M which starts with the tape:

> I I I I I I # #

and increments the represented number by shifting the head to the first empty cell, where it places I, will perform a number of steps which is linear with respect to the size of the input. Thus, the running time of M is O(n), where n = |w| and w is M's input. The Turing Machine which uses the binary alphabet, and encodes numbers as binary words, will also run in linear time w.r.t. the size of the input, but in this case, there is an exponential gap between the two representations.
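The gap can be seen by simply comparing the number of tape cells the two encodings consume; a small Python sketch (the function names are our own):

```python
import math

def unary_cells(x):
    # the single-digit (unary) encoding uses one cell per unit: x cells
    return x

def binary_cells(x):
    # the binary encoding uses about ceil(log2 x) cells
    return max(1, math.ceil(math.log2(x + 1)))

x = 2**20
print(unary_cells(x), binary_cells(x))  # 1048576 cells vs. 21 cells
```

The unary tape is exponentially longer than the binary one, which is exactly the gap discussed above.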
The representation of a natural x consumes n = ⌈log x⌉ cells in the second machine, and x cells in the first. Note that x = 2^n. This is one of the rare cases [2] where a bad choice of representation may lead to an exponential increase in the number of steps. In what follows, we assume problem instances are encoded in some default way: e.g. graphs are represented as adjacency matrices or as adjacency lists, etc. When appropriate representations are chosen, the computational gap between them is at most polynomial. As an example, let us compare a matrix representation of a graph with that of adjacency lists. Assume the graph is directed, contains n nodes and only one edge (u, v). The matrix representation will consume n² positions (out of which n² − 1 are equal to 0), while the list representation will consume only one position (corresponding to the unique element from the adjacency list of u). However, the gap between the two representations is polynomial. Thus, from the point of view of Complexity Theory, it is irrelevant whether we choose matrices or adjacency lists to represent graphs. This observation is highlighted by the following Proposition:
Proposition 0.3.2.1 (The encoding does not matter) Let f : N → {0, 1} be a problem which is decided by a Turing Machine M in time T. M is defined over alphabet Σ. Then, there exists a Turing Machine M′, defined over alphabet Σ′ = {0, 1, #, >}, which decides f and runs in time O(log |Σ|) ∗ T(n).

Proof: (sketch) We build M′ = (K′, F′, Σ′, δ′, s′0) from M as follows:
• Σ′ = {0, 1, #, >}. We encode each symbol different from # (the empty cell) and > (the marker of the beginning of the input) as a word w ∈ (Σ′)∗ with |w| = ⌈log |Σ|⌉. We use k to refer to the length |w| of the word w. We write encΣ(x), with x ∈ Σ, to refer to the encoding of symbol x ∈ Σ.

• For each state s ∈ K, we build 2^(k+1) − 1 states q1, . . . , q_(2^(k+1)−1) ∈ K′, organized as a full binary tree of height k. The purpose of the tree is to recognize a word encΣ(x) of length k from the tape. Thus, the unique state at the root of the tree, namely q1, is responsible for recognising the first bit. If it is 0, M′ will transition to q2, and if it is 1, to q3. q2 and q3 must each recognize the second bit. After their transitions, we shall be in one of the states q4 to q7, which give us information about the first two bits of the word. The states from level i recognize the first i bits of the encoded symbol encΣ(x). The states from the last level are 2^k in number, and recognize the last bit of the encoded symbol. Thus, each of the 2^k leaf-states in the tree corresponds to one possible symbol x ∈ Σ which is encoded as encΣ(x). We connect all these states by transitions, as described above.
• For each transition δ(s, x) = (s′, x′, pos) of M, the machine M′ must: (i) recognize x, (ii) override it with x′, (iii) move according to pos and go to state s′. Thus:
– (i) is done by the procedure described at the above point.
– for (ii), we use k states to go back (k cells) to the beginning of encΣ(x) and write encΣ(x′), cell by cell. Finally, we connect the state corresponding to encΣ(x) from the tree to the first of the above-described k states.
– for (iii), if pos = L/R, we use another k states to go either left or right. If pos = H, we need not use these states. Finally, we make a transition to the root of the state tree corresponding to s′.

For each transition δ(s, x) = (s′, x′, pos) of M, M′ performs k transitions for reading the encoded symbol x, k transitions for writing x′ and possibly k transitions for moving the tape head. Thus, for all w ∈ (Σ′)∗, the number of transitions performed by M′ is at most 3k ∗ T(|w|). Hence, the running time of M′ is O(k) ∗ T(n).

The proof of Proposition 0.3.2.1 shows that any Turing Machine using an arbitrary alphabet Σ can be transformed into one using the binary alphabet. The overhead of this transformation is logarithmic: O(⌈log |Σ|⌉). Thus, if the original Turing Machine runs in some polynomial time T(n), then the transformed TM will run in O(⌈log |Σ|⌉) ∗ T(n) time, which is bounded by a polynomial. Similarly, if the original TM runs in supra-polynomial time, the transformed TM will also run in supra-polynomial time. In what follows, we shall assume all Turing Machines use the binary alphabet Σb = {0, 1, #, >}.

0.3.3 Complexity classes
In the previous section, we have stated that, in complexity theory, we shall make no distinction between polynomial running times with different asymptotic growths (e.g. between n² and n³). With this in mind, we construct a classification of problems. First, we say that f is decidable in time T(n) iff there exists a Turing Machine M which decides f, and whose running time is T. We interchangeably use the terms decidable and solvable, since, in this chapter, there is no ambiguity on what "solvability" means, and it cannot be mistaken for acceptability. All problems considered in this section are members of R. The following definition characterizes problems with a specific running time:

DTIME(T(n)) = { f : N → {0, 1} | f is decidable in time O(T(n)) }
Note that, unlike asymptotic notations, DTIME(T(n)) is a class of problems, not of running times. Also, note that our characterization does not provide a strict upper bound. Hence, e.g., DTIME(n) ⊆ DTIME(n²). In words: a problem which is decidable in linear time is also decidable in quadratic time. Next, we introduce the class:
PTIME = ∪_{d∈N} DTIME(n^d)
PTIME is often abbreviated P. It is the class of all problems which are decidable in polynomial time. Also, note that if some problem f is decidable in log(n) time, then f ∈ DTIME(log(n)) ⊆ DTIME(n) ⊆ P. Hence, even problems which are solvable in sub-linear time belong to the class P. Further on, we introduce the class:

EXPTIME = ∪_{d∈N} DTIME(2^(n^d))
which contains all the problems which are decidable in exponential time. Naturally, we have:

P ⊆ EXPTIME ⊆ R
There are two interesting questions which can be raised at this point:

1. Is the inclusion P ⊆ EXPTIME strict?
2. Are all problems in EXPTIME \ P "equal", in terms of difficulty?8

In the following section, we shall focus on the latter question.

Nondeterminism and Nondeterministic Turing Machines

We recall the problem SAT, which takes as input a boolean formula ψ in Conjunctive Normal Form (CNF). More precisely:

ψ = C1 ∧ C2 ∧ . . . ∧ Cn

8 Later in this chapter, we shall see that EXPTIME \ P is a set whose members are currently unknown.
where, for each i : 1 ≤ i ≤ n, we have Ci = Li1 ∨ Li2 ∨ . . . ∨ Limi and, for each j : 1 ≤ j ≤ mi, we have Lij = x or Lij = ¬x, where x is a variable. Recall that SAT can be solved in exponential time, hence SAT ∈ EXPTIME. The major source of complexity consists in generating all possible interpretations, on which a verification is subsequently done. In order to answer question 2 from the previous section, suppose that SAT were solvable in polynomial time. Could we find problems (possibly related to SAT) which are still solvable in exponential time under our assumption? The answer is yes. Let γ be the formula:

γ = ∀x1 ∀x2 . . . ∀xk ψ
∈
where ψ is a formula in CNF containing variables x1 , . . . , xk , and k N is arbitrar arbitrarly ly fixed. fixed. Check Checking ing if γ is satisfiable is the problem SA SAT. T. An algorithm for SAT must build build all combinations of 0/1 values for each each xi with i : 1 i k and for each one, must solve an instance of the SAT with problem. In total, we have 2k combinations, and since k since k is is part of the input, the algorithm runs in exponential time, provided time, provided that we have an algorithm for SAT for SAT which runs in polynomial time . In order to study the di fficulty of problems, in Complexity Theory, we generalise the above-presented approach, in order to determine degr degrees ees of hardness . In order order to do so, we need a genera generall ve versi rsion on of our assumpt assumption ion::
∀ ≤ ≤
∀
“SAT is “SAT is solvable in polynomial time ”. ”. In other words, we need a theoretical tool to make make exponential exponential search (seem) (seem) polynom p olynomial. ial. This tool will have have no relation relati on to reality reality, and no implications implications for real problem solving. It should not be unders understood tood as a techni technique que for deciding deciding hard problem problems. s. It’s mere mere purpose pur pose is theore theoretic tical: al: it allows allows us to abstra abstract ct one source source of com comple plexit xity y (that of SAT in our example) in order to explore others (e.g. SAT). The tool that we mentioned is the Nondeterministic Turing Machine :
∀
Definition 0.3.3.1 (Non-deterministic TM) A non-deterministic Turing machine (NTM for short) is a tuple M = (K, F, Σ, δ, s0) over alphabet Σ, with K, Σ and s0 defined as before, F = {s_yes, s_no}, and δ ⊆ K × Σ × K × Σ × {L, H, R} a transition relation.
A NTM terminates iff it reaches a final state, hence a state in F. A NTM M decides a function f : N → {0, 1} iff f(n_w) = 0 ⟹ M(w) reaches state s_no on all possible sequences of transitions, and f(n_w) = 1 ⟹ M(w) reaches state s_yes on at least one sequence of transitions, where n_w is the number encoded by the input w. We say the running time of a NTM M is T iff, for all w ∈ Σ∗, all sequences of transitions of M(w) contain at most T(|w|) steps.
We start with a few technical observations. First, note that the NTM is specifically tailored for decision problems. It has only two final states, which correspond to the yes/no answers. Also, the machine does not produce an output, and the usage of the tape is merely for internal computations. In essence, these "design choices" for the NTM are purely for convenience, and alternatives are possible. Whereas the conventional Turing Machine assigned, to each combination of state and symbol, a unique next state, overriding symbol and head movement, a nondeterministic machine assigns a collection of such elements. The current configuration of a conventional Turing Machine was characterized by the current state, the current contents of the tape, and the head position. A configuration of the nondeterministic machine corresponds to a set of conventional TM configurations. The intuition is that the NTM can simultaneously process a set of conventional configurations, in one single step. While the execution of a Turing Machine can be represented as a sequence, that of the NTM can be represented as a tree. A path in the tree corresponds to one sequence of transitions which the NTM performs. Now, notice the conditions under which a NTM decides a function: if at least one sequence of transitions leads to s_yes, we can interpret the answer of the NTM as yes. Conversely, if all sequences of transitions lead to s_no, then the machine returns no. Finally, when accounting for the running time of a NTM, we do not count all performed transitions (as it would seem reasonable), but only the length of the longest transition sequence performed by the machine. The intuition is that all members of the current configuration are processed in parallel, during a single step. We illustrate the NTM in the following example:

Example 0.3.3.1 (Nondeterministic Turing Machine) We build the NTM M_SAT which solves the SAT problem discussed previously. First, we assume the existence of M_chk(I, ψ), which takes an interpretation and a formula, both encoded as a unique string, and checks whether I |= ψ. M_chk is a conventional TM, thus upon termination it leaves 0 or 1 on the tape.
Step 1: M_SAT computes the number of variables in ψ (henceforth referred to as n), and prepends the encoding of ψ with the encoding of n in unary (as a sequence of 1's). This step takes O(n) transitions.

Step 2: During the former step, M_SAT has created a context for generating interpretations. In this step, M_SAT goes over each cell from the encoding of n, and non-deterministically places 0 or 1 in that cell. Thus, after 1 such transition, there are 2 possible conventional configurations. In the first, bit 0 is placed on the first cell of the encoding of n. In the second, bit 1 is placed in the same position. After i transitions, we have 2^i possible configurations. At the end of this step, we have 2^n possible configurations, and in each one, we have a binary word of length n at the beginning of the tape, which corresponds to one possible interpretation. All sequences of transitions have the same length; thus, the execution of this part of M_SAT takes O(n).

Step 3: At the end of each sequence illustrated above, we run M_chk(I, ψ), where I and ψ are already conveniently placed on the tape. This step takes O(n ∗ m), where m is the maximal number of literals in ψ.
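The guess-and-check structure of M_SAT can be sketched in Python, with the nondeterministic Step 2 simulated by deterministically enumerating all interpretations. The representation of formulas as lists of clauses, and all function names, are our own assumptions:

```python
from itertools import product

def check(phi, interp):
    # plays the role of M_chk(I, psi): a clause is a list of
    # (variable index, sign) pairs, where sign 1 means a positive literal
    return all(any(interp[v] == sign for v, sign in clause) for clause in phi)

def solve_sat(phi, n):
    # Step 2, deterministically: enumerate all 2^n interpretations
    # instead of branching into 2^n configurations at once
    for interp in product([0, 1], repeat=n):
        if check(phi, interp):   # Step 3: run the checker on each branch
            return True          # some branch reaches s_yes
    return False                 # every branch reaches s_no

# (A or not B) and (B): satisfiable, e.g. with A = 1, B = 1
phi = [[(0, 1), (1, 0)], [(1, 1)]]
print(solve_sat(phi, 2))  # True
```

The deterministic loop costs 2^n iterations, which is exactly the exponential blow-up that the NTM hides inside a single nondeterministic step.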
If we add up the running times of the three steps, we obtain O(n ∗ m). An important issue is whether the NTM has more expressive power than the conventional TM:

Proposition 0.3.3.1 Every function which is decidable by an NTM in polynomial running time is also decidable by a TM which runs in exponential time.

The proof is left as an exercise. Intuitively, we can simulate a NTM with a classic TM, by a backtracking procedure. The proposition shows that the NTM only offers a gain in speed and not in expressive power. It solves precisely the same problems which the conventional TM solves.

Convention for describing NTMs in pseudocode In the previous chapters, we often resorted to traditional pseudocode in order to describe algorithms, that is, Turing Machines. It is occasionally useful to be able to do the same thing for NTMs. With this in mind, we introduce some notational conventions. The instruction:

v = choice(A)

where v is a variable and A is a set of values, behaves as follows:
• the current (non-deterministic) configuration of the NTM shall contain |A| conventional configurations.

• each conventional configuration corresponds to a distinct value a ∈ A, and it should be interpreted that v = a in that particular configuration.

• the running time of the instruction is O(1).

We also note that it is not possible to achieve some form of "communication" between conventional configurations. Thus, it is intuitive to think that the processing (execution of a transition) of a conventional configuration is done independently of all other conventional configurations. We add two additional instructions: success and fail. They correspond to transitions into the states s_yes and s_no, respectively. We illustrate NTM pseudocode by re-writing the SAT algorithm described above. We adopt the same representational conventions from the first Chapter, and also re-use the procedure CHECK.

Example 0.3.3.2 (Pseudocode)

SolveSAT(ϕ):
    Let n be the number of variables in ϕ
    Let I be a vector with n components, initialised with 0
    for i = 0, n − 1:
        I[i] = choice({0, 1})
    if CHECK(ϕ, I) = 0
        fail
    else
        success

As illustrated in the previous example, the NTM has the power of taming down the complexity which results from the search over an exponential number of candidates (in our example, interpretations). Note that, in the NTM, the main source of complexity is given by the verification procedure M_chk(I, ψ). By using the NTM, we have managed to find the source of complexity of SAT, which is the exponential search over possible interpretations. We have seen that there are other types of exponential explosion. Thus, we are
in the position for refining our classification, by introducing new complexity classes:
NTIME(T(n)) = { f : N → {0, 1} | f is decidable by a NTM in time O(T(n)) }

and

NPTIME = ∪_{d∈N} NTIME(n^d)
We usually abbreviate the class NPTIME by NP. Note that SAT ∈ NP; however, it seems unlikely that ∀SAT ∈ NP. We shall discuss this issue in the next section. Let us relate NP with our other classes. First, note that NP ⊆ EXPTIME. This result is essentially given by Proposition 0.3.3.1: every problem solved by a NTM can be solved in exponential time by a TM. Also, we trivially have P ⊆ NP. The concise argument is that the NTM is a generalization of the TM, "minus" some technical details. Thus:

P ⊆ NP ⊆ EXPTIME ⊆ R
The fundamental property of problems from NP is that solution candidates can be verified in polynomial time. An analogy with solving crosswords is possible. Generating all possible solutions (by "lexicographically" generating all possible words) is obviously exponential. But since verifying whether a solution is correct can be done in polynomial time w.r.t. the size of the crossword, the problem is in NP. We shall soon see that all algorithms which solve hard problems from NP can be split up into an exponential "generating" procedure, and a polynomial "verification" procedure.

0.3.4 Hard and complete problems for a class
Recall that proving f ∉ R for a given problem f was done using contraposition. Essentially, the proof relied on finding a Turing reduction f∗ ≤_T f from a problem f∗ for which f∗ ∉ R is known. First, we note that R could easily be replaced with any complexity class X. Thus, a more general proof scheme could be described as:

f∗ ∉ X, f∗ ≤_X f ⟹ f ∉ X

where f∗ and X are given in advance. We observe that we have replaced ≤_T by ≤_X, in order to highlight that the reduction ≤_X depends on the choice of X. We shall later see that we cannot always use Turing Reductions, and that there are X's for which more restrictions on the reduction must be in place. Returning to the proof scheme, we make a second note: to prove f ∉ R, we must first find some f∗ ∉ R. But if this very fact is shown by the same technique, then we need another problem f∗∗ ∉ R, and so on. This situation is similar to: "Which came first, the hen or the egg?". For the case of R, this issue was settled by the proof f_h ∉ R using diagonalization. This introduced an initial undecidable problem, and the aforementioned technique could be employed. Now, let us turn our attention to X = NP. First, let us examine the properties which the reduction should have, in order to make our technique work in this particular case. Let f∗ ∉ NP. We need to show that f ∉ NP. The first part consists of assuming f ∈ NP. Next, we need a reduction T : I_f∗ → I_f, where I_f∗ and I_f are the inputs of the problem from the subscript.9 We recall that the reduction needs to satisfy:
∀i ∈ I_f∗ : f∗(i) = 1 ⟺ f(T(i)) = 1

In words: "we can solve all instances i of I_f∗ by solving instances T(i) of I_f". This is done as follows:
a. receive i ∈ I_f∗

b. compute T(i) ∈ I_f

c. run the NTM M_f on T(i) (M_f must exist since f ∈ NP) and return its answer.
After a careful look, we observe that the conclusion f∗ ∈ NP (which is our objective, in order to complete the contrapositive argument) is not immediate. If, for instance, the computation of T(i) takes exponential time, then the proof does not work: the NTM which performs a., b., c. runs in (non-deterministic) exponential time. Thus, the restriction that T is decidable is insufficient. We further need that T is computable in polynomial time. We write f∗ ≤_p f iff there exists a polynomial-time transformation which allows solving f∗ via f, as illustrated by the scheme above.

9 In order to make explicit the direction of the transformation, we ignore the fact that both I_f∗ and I_f are (subsets of) naturals.

We now turn our attention to the "hen and egg" issue. We need an initial problem which is known not to be in NP. Unfortunately, such a
problem is not known, although there have been major (unsuccessful) efforts in trying to find one. The same holds for the class P. Hence, the question whether P ⊊ NP is currently open; it is generally believed that the inclusion is strict, but all proof attempts have failed. The good news is that our effort in classifying problems is not fruitless. Transformations:

f∗ ≤_p f        f∗ ≤_T f

establish a relation between problems, which is denoted as hardness. No matter the reduction type (polynomial or Turing), f is at least as hard as f∗. With the machine M_f (together with computing a transformation) we can solve all instances of f∗. It may be possible that T is bijective: hence each input of f∗ is uniquely mapped to an input of f and vice-versa. Then, f and f∗ are equally hard. However, this is not generally the case, hence the term "at least" as hard. We can naturally extend hardness to complexity classes:

Definition 0.3.4.1 A problem f is called NP-hard iff for all f′ ∈ NP, f′ ≤_p f.
Thus, a problem is hard w.r.t. a class iff it is at least as hard as any problem in the class at hand. Note that hardness can be defined w.r.t. any complexity class and not just NP, provided that the appropriate type of transformation is employed.

Definition 0.3.4.2 A problem f is called NP-complete iff it is NP-hard and f ∈ NP.
Informally, NP-complete problems are the hardest problems in NP. In the more general case, complete problems w.r.t. a class are the hardest of that class. It is likely that if f is NP-complete, then f ∉ P; however, this is not a proven fact. Thus, instead of trying to disprove membership of a class (f ∉ P), in complexity theory, we prove completeness for the immediate upper (or greater) class (f is NP-complete). The intuition is that class membership provides an upper bound, while hardness provides a lower bound, for the difficulty of a problem. We illustrate this by an abstract example. Recall:

P ⊆ NP ⊆ EXPTIME
Let f be a problem. Suppose we find an algorithm for f which runs in exponential time; however, we cannot find one which runs in polynomial time on a NTM. At this point, we have f ∈ EXPTIME. Suppose we show f is P-hard; thus, f can be used to solve any problem in P. We now know that f can be solved exponentially, and it is unlikely that f can be solved in sub-polynomial (e.g. logarithmic) time. Thus, the likely variants are: (i) f may be solved in polynomial time by a conventional TM (f ∈ P), (ii) in polynomial time by a NTM (f ∈ NP), or (iii) in exponential time, again by a conventional TM (f ∈ EXPTIME). In the best case, f is polynomially solvable. In the worst, it is exponentially solvable. Suppose now we also find that f is NP-hard. We cannot rule out f ∈ P by a proof, but Complexity Theory predicts that such a membership is not likely. Hence, the feasible variants remain (ii) and (iii). Finally, if we manage to improve our exponential algorithm for f and turn it into a non-deterministic polynomial algorithm, then f ∈ NP and, hence, it is NP-complete. Case (iii) remains of course true, but it does not carry useful information. At this point, we have an exact characterisation of the difficulty of f.

Proving NP-completeness For a problem f to be NP-complete, it must satisfy two conditions. The first, f ∈ NP, is shown by finding a NTM which decides f in polynomial time. For the second part (f is NP-hard), we can employ precisely the "reduction-finding" technique illustrated at the beginning of this section.
Proposition 0.3.4.1 Proposition 0.3.4.1 A problem f f is is NP NP-hard -hard i ff there there exists a problem g which is is NP-hard, NP-hard, such that g p f f ..
≤
Proof: Suppose g ≤p f and g is NP-hard. Hence, for all h ∈ NP, h ≤p g. By transitivity, we also have that h ≤p f. It follows that f is NP-hard.

In the former proof we have made use of the transitivity of ≤p, without showing it. We now state several properties of ≤p, including transitivity, and leave the proofs as exercises.
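The transitivity used in the proof above amounts to composing the two reduction functions: if h ≤p g via r1 and g ≤p f via r2, then h ≤p f via r2 ∘ r1, and the composition of two polynomial-time functions is again polynomial. A minimal sketch, with toy problems and reductions invented purely for illustration:

```python
# Toy problems (hypothetical, for illustration only):
#   h: "does a list of ints have an even sum?"
#   g: "is an integer even?"
#   f: "does a binary string end in '0'?"

def r1(xs):        # reduction h -> g: a list maps to its sum
    return sum(xs)

def r2(n):         # reduction g -> f: an integer maps to its binary form
    return bin(n)

def solve_f(s):    # a decider for f
    return s.endswith("0")

def solve_h(xs):   # h is decided by composing both reductions with f's decider
    return solve_f(r2(r1(xs)))

print(solve_h([1, 3, 4]))  # True  (sum 8, even)
print(solve_h([1, 2]))     # False (sum 3, odd)
```

Each reduction preserves yes/no answers, so chaining them does too; this is exactly the argument behind Proposition 0.3.4.1.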
Proposition 0.3.4.2 ≤p is reflexive and transitive.

Proposition 0.3.4.3 The set of NP-hard problems is closed under ≤p.

Proposition 0.3.4.4 The set of NP-complete problems together with ≤p is an equivalence class.
Proposition 0.3.4.5 Assume f is NP-complete. If f ∈ P, then P = NP.
The former proposition, whose proof follows immediately from the underlying definitions, makes the case for the common belief that P ≠ NP: if some efficient algorithm could be found for some NP-complete problem, then all problems in NP could be solved in polynomial time. The P = NP issue can also be given another intuitive interpretation: “The verification of a solution candidate is as difficult as generating it” or, alternatively: “Verifying a given proof P for A is as difficult as finding a proof for A”. Finally, to better understand the implications of P = NP, consider several facts which would arguably be true, in case the former equality held:
• We can provide a solution to the astronauts’ problem (see the first chapter).

• Partial program correctness can be solved efficiently. Techniques such as model checking can be applied to a wide range of applications (including operating system kernels). Bugs are almost entirely removed; Windows bluescreens no longer happen.

• Generation of exponentially many training sets would make tasks such as voice recognition, computer vision and natural language processing computationally easy.

• Mathematical proofs (of, say, 100 pages) can be generated efficiently. Computers can be used to find proofs for some open problems.

• We can search exponentially large spaces to find passwords, or to break encryption keys, in polynomial time. Internet privacy is no longer possible using encryption (e.g. using SSH). Internet commerce and banking is no longer possible. Safe communication is no longer possible (at all levels). Any computer-controlled facility (public, military, etc.) which is connected to the Internet has considerable potential of being compromised.

Remark 0.3.4.1 (Practical applications of reductions) As illustrated before, reductions of the type ≤p are a theoretical tool which is useful for proving NP-hardness. Reductions also have practical applications. For instance, most NP-complete problems are solved by employing SAT solvers, which, as discussed in the former chapters, may be quite fast in the general case. Thus, a specific problem instance is cast (via an appropriate transformation) into a formula ϕ, such that ϕ is satisfiable iff the answer to the instance is yes.

SAT. The first NP-complete problem. We observe that the “chicken and egg” issue still holds in our scenario: to apply our technique, we need an initial NP-hard problem in the first place. This problem is provided by Cook’s Theorem, which proves that SAT is NP-complete. The technique for the proof relies on building, for each NTM M (and hence, for each problem in NP), a formula ϕM such that ϕM is satisfiable iff there exists a sequence in the computation tree leading to success.
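To make the casting into a formula concrete, we can revisit the engine circuit ϕ from the introductory example and test it by exhaustive search. This is a brute-force sketch, not a real SAT solver; it also assumes the two abbreviated clauses in the introduction, (A ∨ ¬B¬D) and (¬A ∨ C¬D), are to be read as (A ∨ ¬B ∨ ¬D) and (¬A ∨ C ∨ ¬D):

```python
from itertools import product

# The engine circuit phi from the introductory example,
# as a Boolean function of the inputs A, B, C, D.
def phi(A, B, C, D):
    return ((not A or B or D) and (not B or C or D) and
            (A or not C or D) and (A or not B or not D) and
            (B or not C or not D) and (not A or C or not D) and
            (A or B or C) and (not A or not B or not C))

def brute_force_sat(f, n):
    """Try all 2^n assignments; return a satisfying one, or None."""
    for bits in product([False, True], repeat=n):
        if f(*bits):
            return bits
    return None

print(brute_force_sat(phi, 4))  # None: phi is unsatisfiable
```

The search confirms what the crew observed: no input makes the circuit output 1. Of course, trying all 2^n assignments is exactly the exponential behaviour that practical SAT solvers try to avoid through clause learning and propagation.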