Things Humans Still Do Better Than AI: Understanding Flowers

While it might feel as though artificial intelligence is getting dangerously smart, there are still some basic concepts that AI doesn’t comprehend as well as humans do.
Back in March, we reported that popular large language models (LLMs) struggle to tell time and interpret calendars. Now, a study published earlier this week in Nature Human Behaviour reveals that AI tools like ChatGPT also can't understand familiar concepts, such as flowers, as well as humans do. According to the paper, accurately representing physical concepts is challenging for machine-learning models trained solely on text (and, in some cases, images).
“A large language model can’t smell a rose, touch the petals of a daisy or walk through a field of wildflowers,” Qihui Xu, lead author of the study and a postdoctoral researcher in psychology at Ohio State University, said in a university statement. “Without those sensory and motor experiences, it can’t truly represent what a flower is in all its richness. The same is true of some other human concepts.”
The team tested humans and four AI models—OpenAI’s GPT-3.5 and GPT-4, and Google’s PaLM and Gemini—on their conceptual understanding of 4,442 words, including terms like flower, hoof, humorous, and swing. Xu and her colleagues compared the outcomes to two standard psycholinguistic ratings: the Glasgow Norms, which rate words on dimensions such as arousal, dominance, and familiarity, and the Lancaster Norms, which rate words based on sensory perceptions and bodily actions.
For the Glasgow Norms, the researchers asked questions like how emotionally arousing a flower is and how easy it is to imagine one. For the Lancaster Norms, the questions included how much one can experience a flower through smell, and how much a person can experience a flower with their torso.
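To give a rough sense of how this kind of comparison can work, here is a minimal sketch in Python. The word list, the rating values, and the use of Spearman rank correlation are illustrative assumptions for this sketch, not the study's actual data or analysis pipeline.

```python
# Minimal sketch (hypothetical data): comparing a model's word ratings to human norms.
# The numbers below are invented for illustration; Spearman correlation is one
# plausible way to measure how closely a model's ratings track human norm ratings.
from scipy.stats import spearmanr

# Hypothetical Lancaster-style human ratings: how strongly each word is
# experienced through smell, on an arbitrary 0-5 scale.
human_smell_ratings = {"flower": 4.6, "hoof": 1.2, "humorous": 0.3, "swing": 0.8}

# Hypothetical ratings elicited from a language model for the same words.
model_smell_ratings = {"flower": 3.9, "hoof": 0.9, "humorous": 0.5, "swing": 1.1}

words = sorted(human_smell_ratings)
human = [human_smell_ratings[w] for w in words]
model = [model_smell_ratings[w] for w in words]

# Rank correlation: higher values mean the model orders words by "smellability"
# more like humans do.
rho, p_value = spearmanr(human, model)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```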
In comparison to humans, LLMs demonstrated a strong understanding of words without sensorimotor associations (concepts like “justice”), but they struggled with words linked to physical concepts (like “flower,” which we can see, smell, touch, etc.). The reason is rather straightforward: ChatGPT doesn’t have eyes, a nose, or sensory neurons (yet), so it can’t learn through those senses. The best these models can do is approximate, even though they train on more text than a person experiences in an entire lifetime, Xu explained.
“From the intense aroma of a flower, the vivid silky touch when we caress petals, to the profound visual aesthetic sensation, human representation of ‘flower’ binds these diverse experiences and interactions into a coherent category,” the researchers wrote in the study. “This type of associative perceptual learning, where a concept becomes a nexus of interconnected meanings and sensation strengths, may be difficult to achieve through language alone.”
In fact, the LLMs trained on both text and images demonstrated a better understanding of visual concepts than their text-only counterparts. That’s not to say, however, that AI will forever be limited to language and visual information. LLMs are constantly improving, and they might one day be able to better represent physical concepts via sensorimotor data and/or robotics, according to Xu. The research by Xu and her colleagues carries important implications for AI-human interactions, which are becoming increasingly (and, let’s be honest, worryingly) intimate.
For now, however, one thing is certain: “The human experience is far richer than words alone can hold,” Xu concluded.