Thinking about how we understand art helps us understand what is required for real machine intelligence.
Current machine intelligence always arrives at a definite conclusion, even if its conclusion is probabilistic. Art often requires you to juggle several possibilities at the same time, or even to suspend judgement.
To understand music, poetry, painting, sculpture, dance, writing, or any other art form, you often must be able to juggle multiple associations, indeterminate relationships, or even contradictory meanings. Good art is often ambiguous. Humans can handle ambiguity; current machine intelligence cannot.
One of the most analyzed paintings in history is Las Meninas. Scholars and observers cannot even agree on what is being depicted in the painting.
There are three layers of questions about Las Meninas. The following analysis might look laborious to us because humans do it so naturally, but this is what an intelligent agent would have to do.
What is being seen?
What would a machine intelligence see in the art?
It would recognize several objects: some children, some adults, a dog. Perhaps it could recognize the paintings in the background. Would it classify the dwarf as a child given the relative sizes? Would it recognize one of the adults as a painter?
What about the two people in the image in the background? Would it think the image was a painting? Or would the machine intelligence recognize it as a reflective surface with a beveled edge of glass, and therefore a mirror?
Would the software understand the relationship between the three children in the center of the picture where one child is clearly the center of attention?
Would it be able to determine the vanishing point of the picture?
Would the painting be recognized as a court scene?
Who is in the picture?
Facial or image recognition technology could not be used to identify the people in the picture because there is an insufficient number of images available for it to use.
The algorithm would have to associate written historical information with objects in the picture in order to identify people in the picture. This is currently impossible. Software algorithms have made very little progress in real comprehension, especially deducing things that are implied in a text, much less associating it with other data.
Take a simple example of a class of problems called Winograd schemas. Computers have difficulty answering these types of questions, which involve understanding sentences, not just processing them. [1]
- The city council refused the demonstrators a permit because they feared violence. Who feared violence, the demonstrators or the city council?
- The city council refused the demonstrators a permit because they advocated violence. Who advocated violence, the demonstrators or the city council?
The difference between these two sentences is a single word. Humans find these questions easy to answer because they rely on their knowledge of the world. Computers find them hard. For example, the correct answer for the first sentence depends on your realizing that you have to ignore the standard rules in English for the antecedents of pronouns.
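The pair of sentences above can be written down as data to make the difficulty concrete. The following is a minimal sketch, not a real benchmark or library API: the names `WinogradSchema`, `nearest_antecedent`, `first_mentioned`, and `evaluate` are hypothetical and introduced only for illustration. It shows that a purely syntactic rule for picking the antecedent gives the same answer for both sentences, because it never looks at the one word that differs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WinogradSchema:
    """One schema: a sentence template plus the correct referent per verb (hypothetical type)."""
    template: str
    candidates: tuple
    answers: dict  # the differing word -> the correct referent of "they"


SCHEMA = WinogradSchema(
    template="The city council refused the demonstrators a permit "
             "because they {verb} violence.",
    candidates=("the city council", "the demonstrators"),
    answers={"feared": "the city council", "advocated": "the demonstrators"},
)


def nearest_antecedent(sentence, candidates):
    """Syntactic rule: choose the candidate mentioned closest before the pronoun."""
    lowered = sentence.lower()
    pronoun_pos = lowered.index(" they ")
    return max(candidates, key=lambda c: lowered.rfind(c, 0, pronoun_pos))


def first_mentioned(sentence, candidates):
    """Syntactic rule: choose the candidate mentioned first (the subject here)."""
    lowered = sentence.lower()
    return min(candidates, key=lowered.find)


def evaluate(schema, resolver):
    """Return, for each verb, whether the resolver picked the correct referent."""
    results = {}
    for verb, expected in schema.answers.items():
        sentence = schema.template.format(verb=verb)
        results[verb] = resolver(sentence, schema.candidates) == expected
    return results


if __name__ == "__main__":
    print(evaluate(SCHEMA, nearest_antecedent))
    print(evaluate(SCHEMA, first_mentioned))
```

Each rule gets exactly one of the two sentences right, and which one depends only on the rule, not on the meaning of "feared" or "advocated". Getting both right requires knowing something about councils, demonstrators, and violence, which neither function has.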
You need to be able to understand the sentences, not just use algorithms to process them. Understanding a work of art, or the relevant information about the work of art, is far beyond this level of difficulty. The more associations made, the more pieces of the puzzle recognized.
A machine intelligence would have to realize that the picture is set in the Alcázar palace in Madrid.
Would it realize this is a painting by Diego Velázquez? Would it associate him with the artist in the picture?
Could it recognize King Philip IV and Queen Mariana in the mirror? Could it realize this was a mirror by reading information about the painting if it was unable to detect it as an object?
The child in the center is the princess, Margaret Theresa. [2] What could it deduce about the painting given that at the time of the painting the child was the royal couple’s only surviving child?
The man in the back is José Nieto Velázquez, the queen’s chamberlain. He may or may not have been related to the painter.
What about the radiographic analysis of the painting that possibly suggests that Velázquez was missing from the original? If the painting was revised, it might be the case that by that time Margaret Theresa had a male sibling. The machine intelligence would have to understand royal succession rules. Would that change its conclusions?
What about the Order of Santiago on Velázquez’s chest? He did not receive that honor until after the painting was supposed to have been finished.
How do you make sense out of the painting?
How would a machine intelligence understand the meaning of the painting?
Here are some of the types of questions that most interpretations revolve around. A truly intelligent software agent would have to be able to attempt to answer them, or argue that they are irrelevant.
First, what is everyone doing here? What is this scene about? Why is everyone looking where they are looking, and what do they see?
Do the pictures on the wall help us make sense of the painting, or do they just help us locate the room where this takes place? In the latter case, does the room of the setting help us decipher anything?
What is the person in the stairway in the background indicating? As the chamberlain for the queen, he would be responsible for, among other things, opening doors for her. Is he coming or going? What is he doing holding the curtain? What difference does that make for the king or queen?
Since the king and queen are in the mirror, where are they sitting? Or is that just a reflection of the painting? If that is what the artist is painting, are the king and queen currently sitting for it? If so, are we (the viewers) sitting with the royalty? Would a learning machine ever consider it might be part of the work of art?
Why is the princess here? Why is she highlighted by the light? Why is the light also on the palette? Why are these emphasized?
Is Velázquez then painting the picture we see?
This picture is large: 318 cm × 276 cm (125.2 in × 108.7 in). But then Velázquez would be less than half the size of the canvas, and therefore the size of a dwarf, which he was not.
Is Velázquez then painting the king and queen?
Why then is the princess there? To amuse the royal couple while they sit for a portrait? But the king and queen never sat long for paintings. The artist would make some sketches and then paint when they were not present.
Is Velázquez then painting the princess?
Why would he do that?
Is the royal family visiting Velázquez while he is painting?
Maybe the whole scene was made up by the painter – it would be an artistic fiction. If so, what is the painting about?
Some say this painting is about painting itself. The light illuminates the artist’s palette, the raw material of painting. We see the artist in the midst of thinking what he will paint next. At the same time, while the artist seems ready to use the paint brush, we do not know what he is looking at, or what he intends to paint.
Is this then a painting about an artist doing a painting? Does the artist see us? Do we (the viewers, including the machine intelligence agent) see the artist? Given the presence of the mirror, does this imply some sort of recursion? Would machine intelligence understand art that contains recursion?
A Definite Conclusion is Not Necessary
Much has been written about Las Meninas, and these questions only scratch the surface. Answering them is not necessary to make clear that today’s machine intelligence is nowhere near being able to understand art that has multiple, even contradictory, interpretations, much less to juggle them back and forth as it tries each one out. Could it even change its interpretation based on its current mood, as humans do?
Humans do all this analysis quite naturally; we are nowhere near doing it with current technology.
All these layers of meaning might be there, or just some of them, or perhaps none of them. How does deep learning deal with an indeterminate answer, something humans can handle even if it is sometimes hard to do?
As long as algorithmic reasoning cannot think about conflicting interpretations, or stacked levels of analysis with wide uncertainty at each level, it will never comprehend art.
Often you comprehend art in the course of living your life. You might come, without knowing it, to an answer. Until then you just live with the contradictions, uncertainties, and incompatible ideas rather than throwing them away. Current AI just throws away what it cannot use because it does not understand the questions being asked, questions that transcend yes-or-no answers.
Current machine intelligence will never understand art, even kitsch or bad art.
[1] Melanie Mitchell, “Artificial Intelligence: A Guide for Thinking Humans,” pp. 225-228.
[2] The ladies-in-waiting (the meninas of the title) are Isabel de Velasco and María Agustina Sarmiento de Sotomayor. The dwarfs are Maria Barbola and Nicolas Pertusato. Behind them is Marcela de Ulloa, the princess’s chaperone.