Meaning from Data

Intelligence: Artificial vs. Real

Here is the problem David Cope solved many years ago:

Given the enormous musical library of compositions left to us by the venerable Johann Sebastian Bach, can new compositions be manufactured as if his signature were upon them? As if they were long-lost Bach manuscripts only recently discovered?

Cope reincarnated Bach by giving us compositions Bach didn’t produce but very well might have produced.

Now consider the problem every baby must solve in learning to talk. Baby Kate is presented with an enormous database of words that she hears over, perhaps, two years. They come at her in various combinations – statements. From this library she constructs her own statements. Her elders can understand them. But Kate’s statements are original creations, manufactured from the library held in her memory.

The parallel is evident. It is not precisely known how Baby Kate does this. But something analogous is, in fact, done on digital computers. Perhaps elements of the way computers are programmed to do it hint at how the brain might do it.

In a memorable presentation to our Concept Exchange Society, Cope showed us how he did it with Bach’s pieces.

To portray the essentials, we model the process with a simple toy system, the analogue of the simpleton-tic-tac-toe of an earlier post, Robots’ Feelings.

Both music and language are constructed of elementary units from which phrases or statements are built. We call the units of a statement ‘words’. Imagine a ‘language’ whose statements each consist of three linked entities: three words make a statement. The words are Q, R and S.

Here is the database – the library of examples. This library includes four statements. (In the whole of this toy language there are only 27 possible statements: any of the three words may occupy each of the three positions, and 3 × 3 × 3 = 27.)

  • RSR
  • QQR
  • SRQ
  • RSQ

Using these as exemplars we wish to create new statements. How can we create new valid statements in this language using as a basis the four examples we have in the database?

The essential role in this quest is not that of the computer. There is no need for a computer when the database is small. That is precisely why we examine this toy model: to avoid computer jargon and thus expose the essentials of the matter. A computer is programmed with an algorithm, a set of detailed instructions on what to do with the data in the database. Because it carries out these instructions at such extraordinary speed, it can accomplish large database tasks that were formerly insurmountable.

But the computer cannot decide what constitutes a valid statement. It cannot know what fits naturally among the examples we have. Is RRR a valid new statement in this language? Maybe not. How do we separate gibberish from valid statements?

This decision must be coded as an algorithm by the programmer. He decides, on the basis of his personal convictions, what constitutes fitness. If his belief is that statements should never end in S, he instructs the computer to discard all statements, among the 27 possible ones, ending in S.
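Such a conviction becomes a few lines of code. This minimal sketch applies the example rule from the text (statements must never end in S) to all 27 possible statements. The function names are hypothetical, chosen for illustration.

```python
from itertools import product

def all_statements(words="QRS", length=3):
    """All 27 possible three-word statements of the toy language."""
    return ["".join(p) for p in product(words, repeat=length)]

def never_ends_in_s(statement):
    """Hypothetical fitness rule encoding one programmer's conviction."""
    return not statement.endswith("S")

fit = [s for s in all_statements() if never_ends_in_s(s)]
print(len(fit))  # 18: the 9 statements ending in S are discarded
```

Note that this particular rule keeps RRR. A programmer who regards RRR as gibberish would have to encode a different conviction; the code only carries out whatever belief is written into it.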

Here is the essence of Cope’s scheme for identifying fitness. He noted how Bach arranged his chord progressions beat by beat. Looking at the three-beat musical statement RSR in the database, he asked what chord, other than the one here tagged by S, might follow that first R. To what may R progress? Searching the database, he found that Q may follow R; it does so in the SRQ example. So for the first two chords of RSR we can replace RS.. by RQ..

Now we must query the library for examples of what might follow Q. In the database, Q is followed by another Q (the first two words of QQR) and also by R (the last two words of QQR). Taking the first option, we conclude that Bach might well have written RQQ. This statement isn’t in the library but fits into it.
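The pairwise idea can be sketched in a few lines of Python. This is a simplified illustration of the recombination described here, not Cope’s actual program, which imposed many further musical constraints. It records which word directly follows which anywhere in the library, then extends a starting fragment one word at a time using only observed pairs. The names successor_table and recombine are hypothetical.

```python
from collections import defaultdict

DATABASE = ["RSR", "QQR", "SRQ", "RSQ"]

def successor_table(library):
    """Map each word to the set of words that directly follow it in the library."""
    table = defaultdict(set)
    for statement in library:
        for current, following in zip(statement, statement[1:]):
            table[current].add(following)
    return table

def recombine(start, table, length=3):
    """Extend `start` to full length using only word pairs seen in the library."""
    results = [start]
    for _ in range(length - len(start)):
        results = [s + nxt for s in results for nxt in sorted(table[s[-1]])]
    return results

table = successor_table(DATABASE)
print(recombine("RQ", table))  # ['RQQ', 'RQR']

# Statements starting with R that the pairs allow but the library lacks.
new = [s for s in recombine("R", table) if s not in DATABASE]
print(new)  # ['RQQ', 'RQR']
```

Read mechanically, the table lets Q be followed by Q or by R, so RQR is produced alongside RQQ. Choosing among such candidates is where the programmer’s further constraints come in.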

Needless to say, Cope imposed other constraints, reflecting his knowledgeable musicianship, in order to produce the finished musical works he extracted from his computer.

What may we gather from this history?

1. Computers Descend from their Engineers

The computer’s behavior is imposed by the programmer, drawing on his experience of the world. The rules for querying the database reflect the programmer’s partiality as to how things work. The programmer’s world view is contained in his algorithm for mining the data. A future programmer may use a different algorithm to do the mining.

2. Nothing is New

In this scheme there is a limit to the novelty of outcomes. Nothing beyond the prejudices inserted by the programmer can arise. In creating new things from fragments, what the programmer does is program his whims. He extracts from the library what fits his world view.

Remarkably, when large databases are involved, there is a surprising variety and richness in the program’s output. Effects emerge that aren’t foreseen. Because of the variety of output it appears as if originality is manufactured.

3. Life has no Programmer

Living organisms don’t have a programmer. They are self-programming. They must operate on a different basis. A living organism’s computer must grow its own intelligence.

4. Context is Everything

What can we learn from the instructions that programmers use to extract information from data? How universal is Cope’s algorithm of reorganizing close pairs in the data? Is proximity – in time and in space – fundamental to the processing of information from sampling?

Cope says that his algorithm, using what he calls the recombinance described above, is “very much a part of the evolutionary process inherent in life.” Proximity, says Cope, is exactly such a fundamental element.

What I seek is this: an explanation, on an elemental and non-technical level, of how a computer is currently made to respond to questions posed by an arbitrary person. It doesn’t itself acquire an ability to answer; rather, humans pre-program answers. How? What organization chart of human experience is embedded in this activity? How does one think about the program that lets a computer dissect what is spoken to it and produce an appropriate answer? By what algorithm is the right answer retrieved from stored data?

It seems to me that the compartmentalization of questions so as to retrieve answers expresses a way of seeing the world. What is that world view? It is the world view of a machine that, without processing meaning, appears smart.

5. What does it all Mean?

If it can appear smart without processing meaning, what does smart mean? Alternatively put: from where does meaning come? What is the meaning of meaning?