- Type: NiTe
- Development: ll-l
- Attitude: Unseelie
There has recently been an explosion of interest in all fields about computation – with computational models for the mind propping up left and right, the question of what computation is has not ever been more important to consider than it is now. Especially relevant to CT is the bearing that the definition of computation has on the ontological status of Model 2; we will discover today that the dichotomy between discrete and continuous (or digital and analog) processing is of foundational importance to understanding the nature and methods of modern computation (along with its limitations). Computational complexity theory will also be explained, or the study of how “long” certain algorithms could take, measured in terms of the increase of the cardinality in the set of possible outcomes for a given algorithm, combined by the operational difficulty to reach the answer that fits the conditions outlined by the problem (or the algorithm, if the implementation is erroneous). Hopefully the post can serve as a useful introduction to the basics for newcomers and experienced members alike – given the possible change in direction for the whole theory. The map to computation begins at the most basic of standpoints; the simple act of differentiating things from one another.
Discreteness & Continuity
Taken at face value, the world can seem like it is a constant stream of overwhelming sensory input – mercilessly never ceasing to stop for any person or thing that comes towards it. However, both due to our minds and the underlying structure of the universe – we are able to deduce patterns in events and discern certain regularities that emerge from them. The belief that this set of regularities can be measured and tested (and put to pragmatic use) is what is at the foundation of the scientific method. There are many ontological presuppositions people tend to make after spotting this regularities – one is that some people believe not only that they can discern events about to happen, but that they can predict events that will happen. The physicist Pierre-Simon de Laplace (and many others of this time) expressed this articulately – Laplace’s Demon (as it is now summarized) was as follows:
“We may regard the present state of the universe as the effect of its past and the cause of its future. An intellect which at any given moment knew all of the forces that animate nature and the mutual positions of the beings that compose it, if this intellect were vast enough to submit the data to analysis, could condense into a single formula the movement of the greatest bodies of the universe and that of the lightest atom; for such an intellect nothing could be uncertain and the future just like the past would be present before its eyes.”
Given a knowledge of every possible state for every possible particle within Laplace’s universe, we could articulate a precise casual chain of exactly what is to happen at each given moment. This is the deterministic universe, one where each action could theoretically, if there were a system to contain it, be predicted – and also theoretically, be modified and optimized to certain results. If there was the capability to monitor the state of every possible object within a universe at any given time – and if there was the ability to store and manipulate that state, with reference to previous states, the current operating state of the machine could be changed. This would have been an ideal world in which to answer the important problems of science, philosophy and engineering; they would simply slip into place once we had the knowledge of every single state. Unfortunately, it turned out that reality does not quite work like that – the uncertainty principle puts a cap on properties that can be measured simultaneously to one another, based on rather intuitive constraints. The momentum of an object can be found through the product of its mass and its velocity; the more accurately its position can be measured at some given moment, the less accurately that its movement can be determined (due to the fact that measuring the position “clamps” the observer to a certain view of the object). Switching frames to measure momentum after measuring position takes an ever-so-small amount of time; thus while position and momentum are both possibly measurable properties of some object, measuring one means the other can only be seen in lower fidelity. More formally, a stationary particle and a particle moving at a certain velocity (and thus gaining momentum) form two distinct states within a given state-space. Since some states contradict one another, the extent our probabilistic calculus can extend is towards those state-spaces. We may be able to measure the wave function for a given object (or the function which outputs a “wave”, or the space of possible states that the object could don), but we cannot collapse the wave function arbitrarily until it does collapse – there is a probability for each outcome, and some are higher than others, but it can only be reasoned in likelihoods as to what may happen. This idea will be returned to much later on in the analysis – but for now, we consider reality with these constraints on it.
We must now consider another related idea – we implicitly introduced the idea of probability above, or the likelihood for certain sets of events to occur. The question we must seek to answer is one relating to just how ambiguous something is to us. Clearly there is a greater volume of something needed to understand the probability of different sorts of events. We can begin to construct a model for this with simple components; let us say that I have a hurricane monitoring system that logs one of two states: either HURRICANE or NO HURRICANE, abbreviated to H and NH. The logging system can be in precisely one state at a time. Knowing one of those states resolves the ambiguity about the weather outside; and if we were to say the weather had a 50% chance to be in either H or NH, we know precisely one state-space (of two states) to resolve the ambiguity about the situation. We can call this sort of state space a bit. If we were to say that there were four equally likely states, being precisely one degree higher than the previous situation (at 2^2 rather than 2^1), we can say there are two bits of information contained within the state message. This allows us to qualify that information a system contains is also a measure of entropy, or the “ambiguity” within any given system. Given that we have based our measure of information on a two-state system, we can also imagine this as the amount of “coin-flips” taken to represent the entropy of a system – it could be formalized as follows (source: Wikipedia, logarithms are to base two):
This function encodes the entropy of a given state space X by enumerating the entropy of each and every event by calculating the sum over the product of the probability of each event and its logarithm to base two (effectively forcing it into a H-NH situation, or a coin-flip situation).
This allows us to unlock an incredibly powerful ability; any event (including events recording information as we understand it in the colloquial sense of the word) can be recorded into discrete units known as bits, with the encoded information potentially passed onto other functions for manipulation. Given that the inputs are discrete and can be put into a unit (and thus can be operated upon in the same manner) – functions which deal with specific domains of inputs in specially adjusted manners may be constructed. Video, text, audio, everything, can be represented in this discrete manner.
The return of determinism
Given that the entropy for any event can be encoded in a manner wherein it can be deterministically passed as a component of a domain to any function, it is possible to construct functions around entropy-values (or bits) as inputs. Assuming we want the value of one function to be usable as input to the value of another function, we must impose the restriction that the ranges of all of these functions also be bit-encoded. The formal definition of such a function could be as follows: the function f(B) -> B’ takes a bit B as an input and outputs bit B’ as an output. The operations within the function must therefore be capable of transforming any bit into another bit. Given that a bit may represent any sort of information (provided it can be calculated with the function provided above), the function f(B) -> B’ may represent any sort of operation with an integer (given n-base number systems and their interoperability with the formula above). It may also represent rational numbers (as the ratio of two numbers) and irrational numbers which can be approximated by the limiting process – such results may lose fidelity if represented within a single string of bits, but they may be encoded with multiple strings of bits to retain more information. There are also numbers which may not be represented by this apparatus – the set of irrational numbers which cannot be approximated can never be approximated through f(B) -> B’, because there is no set of operations through which we could encode them, let alone manipulate them (as encoding also requires a discrete representation, which may require approximation). Most importantly, however, this means that a computable function allows for the return of a sort of determinism – given its input, its output may be precisely calculated via a translation (or the translation, or possible translations, may be deducible given the inputs and outputs).
This arrangement is very powerful on its own, but it is not yet fully capable. To conduct “branching”, dictating the order of composition of functions – a sort of storage is needed. Imagine a machine M which has as its input an infinitely long tape of bits, which can be in precisely one system at a time. The machine M may store its current state alongside the history of states it has been in. This allows for a comparison of states and conditional calling of different functions of the form f(B) -> B, provided that the states themselves are encoded in a medium which allows for bitwise comparison. This is a rudimentary description of Turing’s machine, or the prototype for modern computers – a modern computer may deterministically execute certain instructions with reference to present and past states of the system, allowing for conditional branching. This allows us not only to compute things deterministically; but to compute things in a certain order deterministically, or run various sorts of simulations to emulate a score of phenomena. With advances in digital technology throughout the last century, at least a portion of this theoretical device was able to come to actuality. Due to constraints of the laws of physics, an infinitely long memory capacity (and infinitely long tape) are not possible. However, very long tapes are, and semiconductors allow for reliably deterministic behavior.
These rather basic operations can be composed on embedded circuits to form “higher” or more complex operations. This is known as the process of abstraction. Generally speaking, abstraction entails a sort of repetition; the simplest example being the conditional loop, which in an unoptimized form, simply repeats the operation contained within it until the condition for the repetition is broken. Abstraction is the foundation of modern computing, but it is also a gateway to one of the areas of computing most relevant to research today – the study of computational complexity.
Recall your high school math class; most likely you were taught some information about different “kinds” of functions. Let us say that you have two functions, nx and x^n. If the values of n are equal, than the derivative of x^n is always larger than that of nx, regardless of which starting value is used. Similarly, assuming the same value of n is used as the base for raising, the derivative of n^x being n^x * log(n) means that it is always larger than the derivative of x^n (all of this is assuming x > 1, when x = 1, the values are the same, but we are modeling input growth, and an input of one bit is very unlikely. x < 1 does not need to be considered as you cannot have fractional bits). Given that a program can be modeled as a function f(B) -> B’ over a collection of bits, certain algorithms may take longer to operate on bit series of greater lengths than others. For example, an algorithm which has to apply a bit mask to each and every element of a bit list of length n would have to enumerate at least n steps (one for each bit mask applied to each element). The function of the number of steps taken may simply be written as n, and the derivative of n is 1. If it were to run through a list of length 2n or 3n, the derivatives would be 2 or 3, but they would belong to the same class of functions as one another – and so do their parent functions. The function which represents the number of steps a program takes to run for each increase in input size is known as the time complexity of the program. This allows for an abstracted measure of runtime independent of any given machine. Logical branching and conditions may also be measured as the worse of the two complexities; if one branch has a complexity of n^2, and the other branch has a complexity of n, the worst option is picked out as a maximum value for the complexity of the algorithm – which can be written as O(n) (or O(n^2), O(n log n), etc…). Each complexity function can be stripped of its coefficients and constant factors to simply denote which class of functions it belongs to. This provides a robust way to model the amount of steps an algorithm may take to run, but the same notation may be used for other purposes – like to calculate the space in memory needed to store the instance of an algorithm (which may not be important for modern computers with extremely high working memory, but is important for systems with low working memory such as human beings).
The greatest unsolved problem of computer science has to do with complexity theory. This problem is known as P=NP, and it asks the question of whether problems whose solutions can be verified in polynomial time can also be solved within polynomial time (and not other complexities such as exponential time). There are a number of problems known as NP-complete problems; where it is uncertain if there is a solution within polynomial time, but where there is no rigorous proof that they are unequivocally outside of the range of polynomial time. One such problem is known as the traveling salesman problem. A salesman has to go from one town to the other by crossing a number of other towns in the middle; the question is, passing all towns, what is the most efficient route? The route between one town and another town may be represented by either a series of straight line or a triangle – with nodes from every other point added to each insertion of a point already there. Computationally, these problems are very hard to solve, and efficient algorithms have yet to be derived. Most of the time, tasks involving a high degree of processing power are delegated either to parallel computing or other forms of solvers – like neural networks or genetic algorithms, which use statistical clustering to solve for these sorts of solutions. These have their own distinct problems. Neural networks can have a very high time complexity for training the dataset, and given that genetic algorithms have to enumerate thousands of possible algorithms for the “cost” they would take, there is time overhead there too. But there appears to be another solver for NP-complete problems: the human brain. Examples of NP-complete problems that they solve daily actually include myriad instances of the Traveling Salesman Problem – including deciding when to avoid the annoying superior colleague at work (and when to stand through what they’re talking and stomach it), also to figure out which clothes to buy (for which pragmatic gauges of “distance” have to be made). These are often, from the standpoint of the phenomenology of the human, solved distinctively by not enumeration – and by different approaches, which include narrowing down straight to a specific domain and solving from there on. More on that now.
Computation and the mind
In attempting to construct a model of the mind which is predicated upon computation, one has to first consider what “computation” is. The definition of computation in this post; where any computable function can be expressed as a transformation over a sequence of bits which range also being a sequence of bits for each input does not encompass the whole range of human thought. For one, there are phenomena that cannot be expressed computationally. I do not have to resort to vague symbolism or spiritual hand-waving to assert this; one only has to consider that Alan Turing’s original thesis (which is the model of computation this essay uses, and is also the model of computation that text written in any modern language presupposes) existed to prove the result that there indeed exist problems which cannot be computed; problems for which there are no formulae, such as irrational numbers which cannot easily be approximated by rational numbers (easy to remember when you consider a computational function f(B) -> B’ may only also return a computable value if it is to be composable with other computable functions). It is hard to know how numbers could be composed to create the very complex thoughts we have as people; and given the lack of rigor for most approximations made by us, it is reasonable to assume that the foundation may not be strictly logical as it is for a computer – our minds may be too concurrent to have a definitive “call” stack.
Another stumbling block is the boundary between consciousness and unconsciousness, which has not fully been explored (and in fact may be essential to understanding the parallelism behind our minds) – every step has to be explained as an explicit step, with full computational power put into it, without qualifiers for how the code could be parallelized or scheduled between different states (see the endnote for more detail about a potential divider between conscious/unconscious representation). Consider the following bit of code from Auburn’s recent dive into quantifying the functions into Turing-complete pseudocode. This line, is seemingly a benign sorting algorithm which looks through a set of mental association and restructures them according to location:
SORT allInstances.child-objects by spatial-coordinates
First it is important to note that it is not exactly clear what “sorting” here means; putting the coordinates of each object in ascending order, per object? Given that objects can be composed abstractedly in the mind, presumably an object also has nodes which connect it to “child” objects and “parent” objects, and if there are in total n different layers of objects, and a different number of branches a, b, c, d…. for each individual layer, extremely high polynomial complexities (nondeterministic, potentially, making these problems NP-complete) can be reached – for a complex structure with many different branches, the final result would very likely be a composition of O(n log n) (which is where most of the efficient algorithms are) to a power of several thousands (if composing each layer involves raising to a power). A simple example involving just two layers, one where each layer has an equal number of elements, for n layers and m branches would likely be around O(mn log m) – approaching quadratic complexity (which is generally considered doable, but slow). The architecture here explicitly presupposes thousands of cycles, and thus thousands of layers, and so if this code were to be written for a Turing-complete system – it would be considered vastly ineffective for its purpose. It is not even possible to say the physical component of the brain has the computational power to be able to conduct this sort of investigation, because it does not – the brain’s signal speed is around 10 million times slower than the average computer. It is clear that there is a massive architectural difference at play, and likely a much greater reliance on memory.
Specifically, Turing’s “tape” model of memory and sequential reading are unlikely to operate accurately at the level of the brain. It is far more likely that the brain operates on a much more parallel architecture (to compensate for its much slower processing power) and be able to index and tag memory with much more depth than Turing’s tape, allowing for complex approximation and sorting algorithms which presuppose an entirely different model of computer architecture from what the CT posts so far have been attempting to reach. Why is this important? At the ontological level, representing functions via pseudocode already assumes that the composition of many tiny processes which deal exclusively with computable quantities can give rise to non-computable thoughts – but until this link is established, not only is that tenuous, but also appears to be inefficient under most models of computation – far more inefficient than our brains are. The problem here likely goes back to encoding and information theory; it is doubtful that transformations on series of encoded bits form the basis of our navigation capacity; the likelihood is high that internally, we have to convert between several bases and number systems, and that the standard encoding patterns reflect that.
As a metaphor, the computational model helps people understand what functions may be working like. But it continues to be a vague and confusing way of understanding the mind. How do non-computable quantities emerge from computable transformations? How does the brain execute supposedly extremely complex algorithms, with ridiculous time and space requirements, so efficiently? These are some questions I would like to leave.
There are a class of problems known as scheduling problems which may be useful to consider – these problems dictate the best way to execute procedures in a distributed, parallel or sequential manner.
A basic example of a scheduling algorithm is the Round Robin mechanism for implementing operating system schedulers which allocate space and time to programs for execution. It is a flat scheduling algorithm, and orders processes by time – time is handed out to any process which requests it, and once the request is granted, it cycles to the next possibility. This requires a consistent slice to set as the required “time” for any given operation. A step more complex are prioritized scheduling algorithms; which calculate a weight for each process requesting centralized resources and grant slots for time based on that, asymmetrically; the vast majority of operating system schedulers use a form of prioritization to streamline functioning.
It is my personal belief (which is why this is put at the endnote and not in a previous section of this essay) that the brain may operate in a decentralized manner and that the “scheduling” of resources does not have to be done through a request to the centralized components of the brain – this would explain why we do not automatically tend to draw analogies between different kinds of thoughts, they exist discretely in separate parts of the brain. As they do come together, they may be composed with one another (perhaps implying a uniform information format), but the actual allocation of process “time” may be highly asymmetric and section-dependent. Consciousness and unconsciousness were mentioned earlier because they may potentially represent two kinds of states for two different sorts of problems. The explicit language of computation may represent a “conscious” and “unconscious” version of each possible state through multiple functions; but this appears difficult to do, much more than a trivial problem.
- Type: Unknown
- Attitude: Unknown
Hopefully the post can serve as a useful introduction to the basics for newcomers and experienced members alike – given the possible change in direction for the whole theory.
I appreciate your insight but I find your explanation overcomplicated and hard to understand in a way you put it into words. The main points aren’t clear as they could.
The idea have a lot of details behind itself but is somewhat simple actually.
Computational models belong to early cognitive science becouse we known almost nothing about the mind and we theoretized not how mind work but how mind could work. Computer like computations was convinient idea to use at the beginning. But with time we start to realize that if the mind is a tool for adaptation then idea of computation doesn’t work well becouse centalized computations are ineffiecient, and don’t match enviromental and biological realities.
This is something strange to me in CT – to use that way of conceptualize things becouse computational model seems to don’t go well with hipothesis of mind embodiment.
- Type: NiTe
- Development: ll-l
- Attitude: Unseelie
The point of my post was not to express the idea behind why exactly a computational model does not work generally – that’s been done many times (and I saw your thread on it, where I think the argument is good, and I err on the side of agreeing with you). The reason I have tried to provide every single possible detail here is to show that not only is the computational model incompatible with cognitive science, but that it violates the very fundamental principles of computer science too (mainly, the separation of the complexity classes), and so this was meant to be a technically focused post (since we’ve had many “ordinary” arguments put forth against it, I wanted to put a specifically computer science based argument against it).
I appreciate your point of view too, however, and think it is correct.
- Type: Unknown
- Attitude: Unknown
It seems to me that cognitive science went to another direction for the same reasons as computer science have problems with creating this algorithms and execiuting this computations.
As you wrote:
(how) composition of many tiny processes which deal exclusively with computable quantities can give rise to non-computable thoughts
The question is good but framework for answer must shift to another paradigm if we want obtain useful answer. Here we comes to emobodied cognition and robotics.
We know now from experiments that we can have intelligent behaviour without this computations. That put whole computational hipothesis into question. Also this have more philosophical consequeses: Do cognition (with non-compututable thoughts) is even possible without acting in the world? Algorithm can execute (imitate) the same task as acting entity. For example chess program can play chess as chess player (human) can. But computer computations don’t equal chess player thought process, they look different.
- You must be logged in to reply to this topic.