Why Philosophy is Going to Stay

In a nutshell, philosophy deals with those subjects that cannot be completely formalized. The sciences are about areas of knowledge for which complete formal theories are possible. Scientism includes the belief that what is formalizable and what is real is the same, i.e. everything can be described in terms of formal theories. Analytical philosophy is trying to turn philosophy into a science, but if everything can be formalized, philosophy is (as some scientists state) unnecessary, so analytical philosophy is making itself obsolete.

However, if, as I think, reality cannot be completely formalized, science is inherently incomplete and philosophy is not going away. In particular, human cognitive processes cannot, in principle, be completely formalized. Each formal description of such processes is incomplete and partial. Cognition develops historically (something formalizable systems don’t do). Cognitive science then turns out not to be a science but a historical discipline. Human thinking does not follow fixed laws.

As a result, there is no formal description of cognition that is both complete and exact, and the products of cognition, like society, culture, and even science itself, cannot be described completely in terms of a single formal theory. Philosophy is not going away. As long as you do “normal science” in the Kuhnian sense, you don’t need philosophy, but if you are working in any field of the humanities, or psychology or “social sciences”, you need philosophy all the time. Here, you do not have a fixed methodology. You have to be reflexive and look at what you are doing from a meta-level (and a meta-meta-level, and so on) all of the time. You have to look at what you are doing critically all of the time. In the sciences, you also have to do that, but only occasionally, when you bump into anomalies and have to shift your paradigm.

In mathematics, there are entities for which we can prove that a complete formal description is impossible. If such entities exist in mathematics, there is no a priori reason why they should not also exist in physical reality. Human beings and their societies and cultures seem to be such entities, for which a complete formalization is impossible. If that is so, philosophy is not going to go away.

The Core of Philosophy

In a way (that I am going to explore in some articles on Creativistic Philosophy), one could say that computability theory (which could be called “formalizability theory”), as one can find it in the works of Post, Kleene, Turing, Church and Gödel, forms the very core of philosophy. From here, one can investigate why philosophy still exists, why it will not go away, and what the nature of the analytic/continental divide and the science/humanities divide is.

Project Sketch

Sketch of the line of argumentation, to be developed in a sequence of articles. The plan is to write each article in such a way that it appears to be almost trivial. The argument is broken up into very small steps that can be understood without special knowledge of mathematics or computer science. The line of thought should be presented in a form that shows it is actually simple and trivial (which it is).

Programs as finite texts over finite alphabets. Each program only contains a finite amount of information.

Programming languages – Interpreters – Special purpose languages – Universal programming languages – Turing machines and other mathematical “programming languages”

Computable functions. Programs computing functions. Functions as (infinite) lists of input-output pairs. Programs of computable functions as compressed representations of such lists. Regularity in such lists expressed by the programs.

Representation of arbitrary data as natural numbers. Representation of Programs by natural numbers. Gödel numbers. Results valid for functions (and programs) of natural numbers are valid for functions (and programs) of arbitrary data.
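The idea of representing arbitrary data as natural numbers can be sketched in a few lines. The following is only an illustration, not the classical prime-power Gödel numbering: it simply reads the bytes of a text as one large number. The function names `encode` and `decode` are made up for this sketch.

```python
def encode(text: str) -> int:
    """Map a text (for example, a program) to a natural number."""
    # The leading 0x01 byte makes sure no leading zero bytes get lost,
    # so different texts always receive different numbers.
    return int.from_bytes(b"\x01" + text.encode("utf-8"), "big")


def decode(number: int) -> str:
    """Recover the text from its number."""
    raw = number.to_bytes((number.bit_length() + 7) // 8, "big")
    return raw[1:].decode("utf-8")  # drop the 0x01 marker byte


program = "def double(n): return 2 * n"
number = encode(program)
assert decode(number) == program
```

Since the mapping can be undone, anything that holds for functions of natural numbers carries over to functions of texts, and with a suitable serialization to other kinds of data as well.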

Turing-enumerability.

Programs computing total functions of natural numbers are not Turing-enumerable. Proof of this by the diagonal method. Constructive nature of this proof. So every algorithm producing programs computing total functions is incomplete. The diagonalization method can always be used to produce another computable function and the program computing it, but although this operation is Turing-computable itself, integrating it into an algorithm yields an incomplete program again. So it must be applied “from the outside”, not under the control of the algorithm itself.
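The diagonal construction can be sketched as follows. This is a toy illustration under an assumption: `enumerate_total_functions` is a hypothetical stand-in for any algorithm that lists programs computing total functions (here it lists only the very simple functions f_i(n) = i*n + i). The helper names `diagonal` and `extend` are also invented for this sketch.

```python
from typing import Callable

TotalFunction = Callable[[int], int]
Enumerator = Callable[[int], TotalFunction]


def enumerate_total_functions(i: int) -> TotalFunction:
    """Hypothetical enumerator: returns the i-th function of a toy listing."""
    return lambda n, i=i: i * n + i


def diagonal(enumerator: Enumerator) -> TotalFunction:
    """Build a total function that differs from the n-th listed function at input n."""
    return lambda n: enumerator(n)(n) + 1


def extend(enumerator: Enumerator, new_function: TotalFunction) -> Enumerator:
    """Integrate a new function into the listing (at index 0, shifting the rest)."""
    return lambda i: new_function if i == 0 else enumerator(i - 1)


d = diagonal(enumerate_total_functions)
# d is computable and total, but it is not in the listing:
assert all(d(i) != enumerate_total_functions(i)(i) for i in range(1000))

# Building the new function into the enumerator does not help:
extended = extend(enumerate_total_functions, d)
d2 = diagonal(extended)
assert all(d2(i) != extended(i)(i) for i in range(1000))
```

The step from `enumerate_total_functions` to `extended` is itself computable, but the extended listing is again incomplete, and the same construction produces yet another function outside of it.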

Side-step: Turing-enumerability of programs of a programming language (programming languages are decidable). Halting-problem for Turing machines. Impossibility to prove equivalence of arbitrary programs with an algorithm. Impossibility to prove correctness of arbitrary software by an algorithm. Programming is always risky and error-prone.

Set of Programs producing programs computing total functions is again not Turing-enumerable. Sketch of Proof. Productive sets and productive functions. The set of such programs is a productive set. Trying to integrate the productive function into the algorithm does yield an incomplete program. So again, the extension process must be applied from the outside, not under the control of the algorithm itself.

Definition of creative systems. Creative systems cannot be algorithms.

Because of the possibility of Gödelization (mapping of data onto natural numbers) all these results are valid for programs processing arbitrary types of data.

Any kind of knowledge can be viewed as programs calculating total functions or programs producing such programs. Declarative knowledge can be viewed as programs formulated in a special purpose programming language and interpreted by some procedures that act as the interpreter. Applying such knowledge can be viewed as the production and subsequent execution of programs. All these programs halt after some time, so they can be viewed as programs computing total functions.

Creativity (adding new programs to a set of programs that is not Turing-enumerable) is the core of general intelligence. A generally intelligent system cannot be an algorithm but must be a creative system. Any algorithm (even an algorithm producing programs) is limited. It contains a limited amount of knowledge that has a limited reach. General intelligence requires a mechanism to extend the set of programs (the knowledge) but this cannot be part of the system as far as it can be viewed as an algorithm.

Algorithms and formal theories are equivalent notions. There cannot be formal theories of creative systems. If science is about describing systems with fixed laws, creative systems are outside its scope. They are inside the scope of a wider area of “Wissenschaft”, however.

Artificial intelligence may be possible but truly intelligent systems cannot be algorithms. They must contain an extension mechanism not under the control of their algorithmic part.

It is interesting to note that the basic results from computability theory were already known in the 1950s and 1960s (and even earlier), when the traditional AI paradigm was created. The traditional AI paradigm ignored these insights. This is the reason it ended up in a dead end. All contemporary “AI” systems can be described as algorithms. Where they contain learning mechanisms, these are limited. It would be interesting to work out the history of early AI to see how this happened. Why were the results of people like Gödel, Turing, Kleene etc. ignored by AI, instead of being turned into the core of the discipline, with the aim of the discipline defined as developing creative systems, i.e. systems that can go beyond algorithms? Has this been worked out by any historian of science already?

Estimating the Complexity of Innate Knowledge

File:GeneticCode21-version-2.svg

The following is a very crude estimate of the informational complexity of the innate knowledge of human beings. To be more exact, it is a crude estimate of an upper limit to the information content of this knowledge. It might be off by an order of magnitude or so. So this is a “back of an envelope” or “back of a napkin” kind of calculation. It just gives a direction into which to try to get a more accurate calculation.

According to the human proteome project (http://www.proteinatlas.org/humanproteome/brain), about 67 % of the human genes are expressed in the brain. Most of these genes are also expressed in other parts of the body, so they probably form part of the general biochemical machinery of all cells. However, 1223 genes have an elevated level of expression in the brain. In one way or the other, the brain-specific structures must be encoded in these genes, or mainly in these genes.

There are about 20,000 genes in the human genome, so the 1223 genes with elevated expression in the brain make up about 6.115 % of our genes. Probably, we share many of these with primates and other animals, like rodents, so the really human-specific part of the brain-specific genes might be much smaller. However, I am only interested here in an order-of-magnitude result for an upper limit.

I have no information about the total length of these brain-specific genes, so I can only assume that they have average length.

According to https://en.wikipedia.org/wiki/Human_genome, the human genome has 3,095,693,981 base pairs (of course, there is variation here). Only about 2 % of this is coding DNA. There is also some non-coding DNA that has a function (in regulation, or in the production of some types of RNA), but let us assume that the functional part of the genome is maybe 3 %. That makes something in the order of 92 – 93 million base pairs with a function (probably less), or 30 million to 31 million triplets. If the brain genes have average length, 6.115 % of this would be brain specific. That is something like 1.89 million triplets.

The triplets code for 20 amino acids. There are also start and stop signals. The exact information content of a triplet would depend on how often it appears, and they are definitely not equally distributed, but let us assume that each of them codes for one out of 20 possibilities (calculating the exact information content of a triplet would require much more sophisticated reasoning and specific information, but for our purposes, this is enough). The information content of a triplet can then be estimated as the base-2 logarithm of 20 (you need 4 bits to encode 16 possibilities and 5 bits to encode 32 possibilities, so this should be between 4 and 5 bits). The base-2 logarithm of 20 is 4.322. So we multiply this by the number of triplets and get 8,200,549 bits. This is 1,025,069 bytes, or roughly a megabyte (something like 200 – 300 times the length of this blog article).
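The chain of multiplications above can be checked with a short script. It uses only the figures given in the text, and the variable names are invented for this sketch. Because it does not round the intermediate values, its result differs slightly from the numbers quoted above, but it lands at the same order of magnitude of about a megabyte.

```python
import math

# Figures taken from the text above
GENOME_BASE_PAIRS = 3_095_693_981   # total length of the human genome
FUNCTIONAL_FRACTION = 0.03          # assumed functional share of the genome
TOTAL_GENES = 20_000                # approximate number of human genes
BRAIN_ELEVATED_GENES = 1223         # genes with elevated expression in the brain
POSSIBILITIES_PER_TRIPLET = 20      # simplification: one of 20 amino acids

brain_fraction = BRAIN_ELEVATED_GENES / TOTAL_GENES
functional_base_pairs = GENOME_BASE_PAIRS * FUNCTIONAL_FRACTION
functional_triplets = functional_base_pairs / 3
brain_triplets = functional_triplets * brain_fraction

bits_per_triplet = math.log2(POSSIBILITIES_PER_TRIPLET)
total_bits = brain_triplets * bits_per_triplet
total_bytes = total_bits / 8

print(f"brain-specific share of genes: {brain_fraction:.5%}")
print(f"brain-specific triplets:       {brain_triplets:,.0f}")
print(f"bits per triplet:              {bits_per_triplet:.3f}")
print(f"upper limit in bytes:          {total_bytes:,.0f}")
```

Changing the assumed functional fraction or the gene count in the constants shows directly how sensitive the estimate is to these inputs.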

So the information content of the brain coding genes that determine the structure of the brain is in the order of a megabyte (smaller than many pieces of commercial software). The structure of the brain is generated by the information contained in these genes. This is probably an overestimate because many of these genes might not be involved in the encoding of the connective pattern of the neurons, but, for example, in the glial immune system of the brain or other brain specific, “non-neuronal” stuff.

If the brain’s structure is encoded in these genes, the information content of these structures cannot be larger than the information content of these genes. Since there are many more neurons, a lot of their connectivity must be repetitive. Indeed, the cortex consists of neuronal columns that show a lot of repetitive structure. If one described the innate brain circuitry, i.e. that found in a newborn (or developing in the small child in processes of maturation), and compressed that information to the smallest possible size, determining its information content, that information content could not be larger than the information content of the genes involved in its generation. The process of transcribing those genes and building the brain structures as a result can be viewed as a process of information transformation, but it cannot create new information not contained in those genes. The brain structure might contain random elements (i.e. new information created by random processes) and information taken up from the environment through processes of perception, experimentation and learning, but this additional information is, by definition, not part of the innate structures. So the complexity of the innate structures or the innate knowledge, i.e. the complexity of the innate developmental core of cognition, must be limited by the information content of the genes involved in generating the brain.

The above calculation shows that this should be in the order of magnitude of a megabyte or so.

This means also that the minimum complexity of an artificial intelligent system capable of reaching human-type general intelligence cannot be larger than that.

We should note, however, that human beings who learn and develop their intelligence are embedded in a world they can interact with through their bodies and senses and that they are embedded into societies. These societies are the carriers of cultures whose information content is larger by many orders of magnitude. The question would be if it is possible to embed an artificial intelligent system into a world and a culture or society in a way to enable it to reach human-like intelligence. (This also raises ethical questions.)

In other words, the total complexity of the innate knowledge of humans can hardly exceed the amount of information contained in an average book, and is probably much smaller. It cannot be very sophisticated or complex.

(The picture, displaying the genetic code, is from https://commons.wikimedia.org/wiki/File:GeneticCode21-version-2.svg)

Thoughts about Intelligence and Creativity

Some unordered notes (to be worked out further) on some general principles and limits of intelligence.

Reality has more features than we can perceive. What we perceive is more than what we understand. And our understanding has several levels, from perceiving shapes to conceptual interpretation and deep analysis. On each level, we can capture only a fraction of the information of the level before it. (See also https://creativisticphilosophy.wordpress.com/2015/02/19/dividing-the-stream-of-perceptions/)

The primary sense data are processed quickly, by neuronal systems having a high degree of parallelism. However, the level of analysis is rather shallow. To process large amounts of data quickly, you have to have an algorithm, a fixed way of processing the data. Such an algorithm can only recognize a limited range of structures. An algorithm limits the ways in which the bits of data are combined. An algorithm is a restriction. It prevents universality. The data could be combined in so many ways that you would get what is known as a combinatorial explosion if you would not limit it somehow. The system, having only a limited processing capacity, would be overwhelmed by the hyper-astronomically growing number of possibilities. Therefore a system processing a large amount of data must restrict the way it combines the data. As a result, it can process large amounts of data quickly but will be blind to a lot of the regularity that is contained in the data and could theoretically be discovered.

In order to discover such hidden features, you cannot process large amounts of information at once because this would lead to a combinatorial explosion. Instead, you would have to process small amounts of information at any given time, trying to find some pattern. Only when you have discovered a pattern can you try to scan large amounts of data for it, essentially applying a newly found algorithm to the data. But that algorithm will in turn be blind to other regularity the data might contain. Each algorithm you may use to analyze data is incomplete, because it has to limit the way data is combined, or it will not be efficient, leading to combinatorial explosions again.
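How quickly the number of possible combinations grows can be made concrete with a small calculation. The sketch below just counts the ways of grouping data items. The function names are invented for this illustration, and counting subsets is only one simple way of measuring "ways to combine the data".

```python
import math


def count_groups(n_items: int, group_size: int) -> int:
    """Number of ways to pick group_size items out of n_items."""
    return math.comb(n_items, group_size)


def count_all_subsets(n_items: int) -> int:
    """Number of all possible subsets of n_items."""
    return 2 ** n_items


for n in (10, 20, 50, 100, 300):
    print(f"{n:>4} items: {count_groups(n, 2):>6} pairs, "
          f"{count_all_subsets(n):.3e} subsets")
```

Restricting the analysis to pairs keeps the numbers small, which corresponds to a fixed algorithm that only looks at certain combinations. Without such a restriction, a few hundred items already give more subsets than any physical system could enumerate.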

Intelligence could be defined as the ability to find new instances of regularity in data, regularity that was not known before. It can therefore be defined as the ability to construct new knowledge (new algorithms). This is only possible, in principle, by analyzing small amounts of data at any given time. Any algorithm you may use to analyze larger amounts of data will be limited and may be missing some of the structure that is there (i.e. it will restrict the generality of the intelligence). (See also https://creativisticphilosophy.wordpress.com/2015/05/16/how-intelligent-can-artificial-intelligence-systems-become/ and https://denkblasen.wordpress.com/2015/05/25/a-note-on-cognitive-limits/).

This limit to intelligence should be valid for single human beings but also for groups of human beings, like scientific communities or cultures. It would also hold for any artificial intelligent system. Such systems cannot be made arbitrarily intelligent. One could try to do so by putting many small intelligent systems in parallel (something like an artificial intelligent community), but since such systems would not be limited by any algorithm (or formal theory), they could develop in totally different directions, disagree with each other and suffer from misunderstandings if one tried to connect them together. If you connect them in a way that limits the possibility of misunderstandings in their communication or that stops them from disagreeing or from developing in totally different directions, you end up with a parallel algorithm again that can harmoniously analyze large amounts of data but is limited in what it can do.

You either get shallow processing of large amounts of data or deep analysis of small amounts of data with the potential of new discoveries, but you cannot have both at once. As a result, there is a limit to how intelligent a system can become.

There is no limit to what can be discovered by an intelligent system: if a structure is present in a set of data, it can be found if the system doing the analysis is not an algorithm, i.e. not a system describable in terms of a finite formal theory (an algorithmic system will necessarily be systematically blind to some structures). However, an artificial superintelligence is not possible. Processes of intelligent data analysis in such a system might be faster than they are in a human being, but they will not be much more sophisticated. Higher sophistication by adding smart algorithms leads to limitations, i.e. to systematic blind spots. Higher sophistication by attempting to process more data at a time leads to combinatorial explosions which cannot be managed by whatever additional speed or processing power one would add. (See also http://asifoscope.org/2013/01/18/on-algorithmic-and-creative-animals/ and also http://asifoscope.org/2015/01/20/turning-the-other-way/)

For shallow analysis you need algorithms. Speed in terms of amount of data (bits) processed per time (seconds) may be high, but the depth of processing is limited. If the goal of cognition is to find regularity (and thus compress data), the algorithmic system will not find all regularity that is there. It cannot compress data optimally in all instances. Such a system will have blind spots.
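A blind spot of this kind can be demonstrated with an ordinary compression program. In the sketch below, zlib (from the Python standard library) stands in for "a fixed algorithm that looks for regularity"; this choice is only an example. The second data stream is produced by a pseudo-random generator, so it is completely determined by a few lines of code and a seed, which means it is highly regular in the sense used here. The names `SIZE`, `repetitive` and `pseudo_random` are made up for the illustration.

```python
import random
import zlib

SIZE = 100_000

# Regularity that zlib is built to find: literal repetition.
repetitive = b"abcdefgh" * (SIZE // 8)

# Regularity that zlib cannot find: the output of a small deterministic program.
rng = random.Random(42)
pseudo_random = bytes(rng.getrandbits(8) for _ in range(SIZE))

compressed_repetitive = len(zlib.compress(repetitive, 9))
compressed_pseudo_random = len(zlib.compress(pseudo_random, 9))

print(f"repetitive data:    {SIZE} -> {compressed_repetitive} bytes")
print(f"pseudo-random data: {SIZE} -> {compressed_pseudo_random} bytes")
```

The repetitive stream shrinks to a tiny fraction of its size, while the pseudo-random stream does not shrink at all, although a program of a few lines can reproduce it exactly. The compressor is fast on large amounts of data, but this regularity lies outside of what it can see.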

Finding all regularity may be viewed as the ability to find the smallest self-extracting program that can produce the data (i.e. an optimal compression of the data). If an algorithm analyzes a stream of data, i.e. it parses the data, and the stream of data is longer than the algorithm itself, the algorithm may be seen as a compression of the data. If the compression is lossless, i.e. the algorithm can reproduce the original data, then the data must contain some redundancy if it is longer than the algorithm. The data will then not exhaust the information-carrying capacity of the information channel. Therefore, it must be possible to add some information to that channel that is not parsed by the given algorithm. Hence the algorithm must be incomplete, since there is data it cannot parse. It systematically has a blind spot.

Therefore, an intelligent system able to find arbitrary regularity cannot itself be an algorithm. Instead it must be a system that can produce new knowledge (and thus does not have a fixed representation as a finite text, and does not have a Gödel number). It must be changing over time, incorporating information that enters it from the analyzed information stream. This information reprograms the system, so it changes the way the system works. The system cannot have a fixed way in which it is working because then it would be an algorithm and would have a blind spot.

The possibility that the system self-destructs (becomes mad) cannot be excluded. That is a risk involved in intelligence/creativity.

Sophisticated knowledge has a high efficiency but a low universality. It is special and will “miss” many features of the data it processes (i.e. it has blind spots). On the other hand, it is efficient, which means that it allows large amounts of data to be processed. The processing of large amounts of data in a short time means that only a limited subset of the properties of that data can be considered, making analysis shallow.

Simple knowledge, on the other hand, has a high universality but a low efficiency. It allows for new features of data to be discovered. It therefore has the potential of a deep analysis that does not miss properties, but it has a low efficiency and can only process small amounts of data at a time, since applying it to large sets of data leads to combinatorial explosions.

The simple knowledge is what is called “reflection basis” in K. Ammon’s dissertation (see Ammon, Kurt: “The Automatic Development of Concepts and Methods”, Doctoral Dissertation, University of Hamburg, 1987).

New knowledge forms by incorporating information from data into the knowledge base. This might occasionally happen through the application of sophisticated knowledge, but most of the time it is the result of applying simple knowledge to small amounts of data, leading to the discovery of novel (from the system’s point of view) properties of the data. As a result, new, more sophisticated knowledge forms. This knowledge is special and more efficient.

The small amounts of data that are processed by simple knowledge might be input data from the input stream, but might also be chunks of knowledge that are experimentally plugged together in different ways and then experimentally applied to the input stream (perhaps in its entirety). This might occasionally lead to sudden changes of perception (e.g. changing from two-dimensional vision to three-dimensional vision). Successful (i.e. efficient) structures are then retained. This is a way of incorporating information from the environment into the system.

The total universality of a creative system lies in the emptiness of its core (i.e. there is no fixed, i.e. unchangeable, special knowledge restricting what it can do).

The trade-off between efficiency and generality is a special case of (or another way of expressing) the trade-off between explicitness/exactness and generality described in https://creativisticphilosophy.wordpress.com/2013/10/12/the-trade-off-between-explicitness-and-generality/. A result of it is that there is a fundamental limit to how intelligent a system can become.

Sophisticated knowledge can be used to filter out expected components from the data stream, leaving the surprising parts that can then be analyzed by less sophisticated knowledge. The end result might be more comprehensive sophisticated knowledge where the previously surprising data becomes expected data.

(A lot of this is already contained in K. Ammon’s dissertation in one form or another).

A Note on Computational Models of Cognition

File:Nyhavn lego.jpg

Another set of draft notes to be worked into more elaborate articles:

In https://creativisticphilosophy.wordpress.com/2014/06/23/a-note-on-analytic-philosophy/ I have already stated what I generally think about analytic philosophy. Cognition can always work in more different ways than any of the formalisms developed inside analytic philosophy or “Artificial Intelligence” (AI) describes.

The AI people are trying to develop computational models of human cognition. But their idea of “computation” is very limited. I have seen a lot of software during my life (I am myself a programmer), but the only software I have ever seen that was working according to AI principles was, well, a piece of artificial intelligence software (as far as I remember, it was based on what they called “semantic networks”, and the results were not very impressive; I turned away from that field of research). There is a lot of different software working in many different ways. For example, there is software that controls airplanes or the brakes of your car. There is image processing software processing your photographs. There is software playing music to you. There is the software of internet applications like the WordPress blogging platform. And so on and so on. None of this software works in terms of conceptual hierarchies, semantic networks, etc.

I could describe my own ideas about cognition as “computational”, but the approaches I see in AI and analytic philosophy are rather ridiculous. Computation (or software) is a much more ductile, pliable, plastic “material” than these people think. It is not even restricted to fixed representational languages and fixed algorithms. The models of AI and analytic philosophy look like somebody is trying to model the whole world from Lego bricks. Reality is far more complex. It simply does not work that way. These models seem to come from a philosophical tradition that started in the 17th century (or even earlier?), a tradition providing a simplistic model of how thinking and language work.

It is obvious that our processes of perception, thinking and acting are just that: processes. Something is happening. And one can describe them as processes in which information is processed and stored. In that very general sense, one can think of cognition as information processing or computation (although not necessarily digital). In this sense, it makes sense to me to think about it in computational terms. However, we should not buy into the simplistic models as described in http://plato.stanford.edu/entries/mental-representation/. If we buy into those limited and restricted notions of computation, we have, in a sense, already fallen prey to those theories. There might be some thinking processes that work in terms of concepts, propositions and logical inferences and stuff like that, but that is just a fraction of the whole story (just as in the case of our computers, where the majority of applications does not work in such ways).

I would classify my own approach of thinking about cognition as “computational”, but not in the sense this term is used in classical AI.

(The picture, showing a scene from Legoland in Denmark, is from https://commons.wikimedia.org/wiki/File:Nyhavn_lego.jpg).

Notes on Language and the Semiotic Revolution

Some draft notes in connection to my recent article http://asifoscope.org/2015/09/09/creativity-and-language/, to be worked out into proper articles:

The case of the Piraha language (see, for example, http://www1.icsi.berkeley.edu/~kay/Everett.CA.Piraha.pdf) shows that a simple culture can do without a lot of the logico-semantic machinery that was long thought to be both universal and essential. “…Pirahã culture constrains communication to nonabstract subjects which fall within the immediate experience of interlocutors. This constraint explains a number of very surprising features of Pirahã grammar and culture: the absence of numbers of any kind or a concept of counting and of any terms for quantification, the absence of color terms, the absence of embedding, the simplest pronoun inventory known, the absence of “relative tenses,” the simplest kinship system yet documented, the absence of creation myths and fiction, the absence of any individual or collective memory of more than two generations past, the absence of drawing or other art…”

It seems likely that the ancestors of this small group had a more elaborate language and that this language might be the result of some process of simplification (maybe caused by cultural factors, maybe by a disaster that was survived only by some children, I don’t know) and is not a remnant from an earlier time. Still, it is interesting that the human brain is capable of so simple a culture and language, lacking all of these things. For example, there is no universal quantification in Piraha (i.e. no possibility to express sentences about all instances of some set, like “All people are mortal”). This indicates that universal quantification (for example) is not “hard-wired” into our brains. It is part of our culture. If that is so, it must have been invented at some point in history. (I think Kant thought of it as something a priori, and Hamann thought Kant was wrong because we get it through language; to be explored in more detail…).

If such semantic or logical devices are not genetically hard-wired into our nervous systems, they must have been historically invented at some point and are completely part of culture, and there must have been a time when all human cultures and languages were as simple as that of the Piraha, or even simpler. So the “semiotic revolution” that seems to show up in the archaeological record, a sudden increase in the complexity of cultures (around 100,000 years ago in southern Africa and then spreading), could have been a completely cultural development.

Anthropologists often seem to assume that it was a biological/genetic change. The assumption seems to be that cognition evolved biologically step by step, with genetic changes in the brain enabling hominids to think in novel ways. Instead, the bulk of this development could have been completely cultural. Once a certain intelligence threshold is passed (at a far earlier point in time, I think even before the development of Homo erectus), language is invented and then, bit by bit, new semantic and syntactic constructs are (culturally) invented. During this development, the cognitive capabilities, i.e. the range and types of thoughts that were possible, were extended, not by biological changes but by cultural inventions. At some point (and this might well have been the invention of universal quantification) cognition became markedly more complex because the expressive power of language increased.

The Piraha show that this is possible. The assumption that these changes were genetic then turns out to be pure speculation. It is just as possible that the semiotic revolution was purely cultural. There is also no reason to believe that other populations of humans (e.g. the Neanderthals or the Denisovans) had inferior cognitive abilities in genetic terms. They might have had simpler cultures. And the fact that they mixed with the people coming out of Africa indicates that it does not make much sense to view them as separate species. (The assumption that these were separate “species” then appears totally arbitrary and a remnant of 19th century “scientific” racism).