In a new study, an AI model was trained to learn words and concepts through the eyes and ears of a single child, using headcam video recorded from when the child was six months old until their second birthday.
Researchers have shown that an artificial intelligence (AI) model can learn a substantial number of words and concepts from the limited slices of experience a child has. Although the video captured only about one percent of the child's waking hours, that is sufficient for genuine language learning, they say.
“By using AI models to study a real language-learning problem that children face, we can resolve classic debates about what ingredients children need to learn words: do they need language-specific biases, innate knowledge, or just associative learning?” said Brenden Lake, an assistant professor in NYU's Center for Data Science and Department of Psychology and senior author of the study, published in the journal Science.
To develop the model, the researchers first analyzed a single child's learning process captured on first-person video, recorded with a light, head-mounted camera on a weekly basis from six months through 25 months of age.
Across more than 60 hours of footage, the team found nearly a quarter of a million word instances (that is, the number of words communicated, many of them repeated), each linked to the video frames of what the child saw when the word was spoken.
The footage also includes a variety of activities such as mealtimes, reading books and the child playing, the team said.
The researchers then trained a multimodal neural network with two separate modules: one that takes in single frames of video and another that takes in transcripts of the speech directed at the child.
These modules were combined and trained using an algorithm called contrastive learning, which aims to learn useful features of the input and the associations between the two kinds of input, they said.
For example, when a parent says something in view of the child, some of the words used are likely to refer to something the child can see, they explained, meaning that comprehension is built by linking visual and linguistic cues.
“This gives the model a clue as to which words to associate with which objects,” said Wai Keen Vong, a research scientist at NYU's Center for Data Science.
“Combining these cues allows contrastive learning to gradually identify which words go with which visuals and capture children's first word learning,” Vong said.
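The idea behind contrastive learning can be illustrated with a minimal sketch. This is not the study's code: the two-dimensional embeddings, the temperature value and the function names below are all assumptions chosen for illustration. The loss is low when each video frame's embedding sits closest to the embedding of the utterance that was spoken alongside it, and high when the pairings are mixed up.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def contrastive_loss(frame_embs, text_embs, temperature=0.1):
    """Symmetric InfoNCE-style loss over a batch of (frame, utterance) pairs.

    Pair i (frame i with utterance i) is treated as the co-occurring pair;
    every other frame/utterance combination in the batch acts as a negative.
    The temperature of 0.1 is an illustrative assumption.
    """
    frames = [normalize(f) for f in frame_embs]
    texts = [normalize(t) for t in text_embs]
    n = len(frames)
    # Scaled cosine similarity between every frame and every utterance.
    sims = [[dot(f, t) / temperature for t in texts] for f in frames]

    def cross_entropy(rows):
        # Average of -log softmax(row)[i], computed in a numerically stable way.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            log_z = m + math.log(sum(math.exp(s - m) for s in row))
            total += log_z - row[i]
        return total / n

    cols = [list(c) for c in zip(*sims)]  # utterance-to-frame direction
    return 0.5 * (cross_entropy(sims) + cross_entropy(cols))

# Toy embeddings (hypothetical): two frames and two utterances.
frames = [[1.0, 0.0], [0.0, 1.0]]
matching_texts = [[0.9, 0.1], [0.1, 0.9]]   # utterance i resembles frame i
swapped_texts = [[0.1, 0.9], [0.9, 0.1]]    # pairings mixed up

loss_matching = contrastive_loss(frames, matching_texts)
loss_swapped = contrastive_loss(frames, swapped_texts)
print(loss_matching < loss_swapped)
```

In a real system the embeddings would come from trainable vision and language modules, and training would adjust them to push this loss down across many batches.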
After training, the team tested the model by presenting it with a target word and an array of four different picture options and asking it to select the picture that matched the target word.
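A test of this kind can be sketched as follows. The scoring rule (cosine similarity between a word embedding and each picture embedding), the toy vectors and the function names are assumptions for illustration, not details reported in the article. With four options per trial, a model guessing at random would be right about 25 percent of the time.

```python
import math

def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den

def choose_image(word_emb, image_embs):
    """Return the index of the picture whose embedding best matches the word."""
    scores = [cosine(word_emb, img) for img in image_embs]
    return max(range(len(scores)), key=scores.__getitem__)

def accuracy(trials):
    """Each trial is (word embedding, four picture embeddings, correct index)."""
    correct = sum(choose_image(w, imgs) == target for w, imgs, target in trials)
    return correct / len(trials)

# Two hypothetical trials with made-up three-dimensional embeddings.
trials = [
    ([1.0, 0.0, 0.0],
     [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.1, 0.7, 0.7]],
     0),
    ([0.0, 1.0, 0.0],
     [[1.0, 0.0, 0.0], [0.2, 0.1, 0.9], [0.1, 0.9, 0.1], [0.7, 0.0, 0.7]],
     2),
]
score = accuracy(trials)
print(score)
```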
The researchers said the model was able to learn a “significant” number of words and concepts that children experience in everyday life.
Furthermore, for some of the words the model learned, it was observed to generalize them to visual contexts different from those seen in its training data.
This reflects an aspect of generalization that is also seen in children when they are tested in the lab, the researchers said.