Yann LeCun, Meta's chief AI scientist, sees promise in the V-JEPA model, suggesting it could be a step toward artificial general intelligence.
Meta’s AI researchers have unveiled a novel model that diverges from the traditional methods of training large language models (LLMs). Instead of relying on written text, this new model learns from video footage, marking a significant departure in AI development.
Typically, LLMs are trained on vast datasets of sentences or phrases with certain words masked, compelling the model to fill in the missing words. Through this process, they gain a basic understanding of the world. Yann LeCun, the head of Meta's FAIR (Fundamental AI Research) group, envisions a more efficient learning approach for AI models by applying a similar masking technique to video content.
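The masking objective described above can be sketched in a few lines. This is a minimal, illustrative sketch of how training examples are produced for masked pretraining, not any production pipeline; the function name and mask rate are assumptions for illustration.

```python
import random

def mask_tokens(tokens, mask_rate=0.3, mask_token="[MASK]"):
    """Randomly hide a fraction of tokens, returning the masked
    sequence and the (position, original word) pairs the model
    would be trained to reconstruct."""
    masked, targets = [], []
    for i, tok in enumerate(tokens):
        if random.random() < mask_rate:
            masked.append(mask_token)
            targets.append((i, tok))
        else:
            masked.append(tok)
    return masked, targets

random.seed(0)
sentence = "the cat sat on the mat".split()
masked, targets = mask_tokens(sentence)
# The model sees `masked` and is scored on recovering `targets`.
```

The same idea carries over to video: hide parts of the input and train the model to infer what is missing.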
LeCun articulated the ambition behind this endeavor, stating, “Our goal is to build advanced machine intelligence that can learn more like humans do, forming internal models of the world around them to learn, adapt, and forge plans efficiently in the service of completing complex tasks.”
At the core of LeCun's vision lies a research model named Video Joint Embedding Predictive Architecture (V-JEPA). It operates by analyzing unlabeled video and predicting what probably happened in the masked-out portions of the footage.
It's important to note that V-JEPA isn't a generative model; rather, it builds an internal conceptual model of the world. Meta researchers report that after pretraining with video masking, V-JEPA excels at detecting and understanding fine-grained interactions between objects.
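The distinction above — predicting representations rather than generating pixels — is the heart of the architecture. The sketch below illustrates only the shape of that objective: the encoder here is a fixed random projection standing in for V-JEPA's learned vision transformer, and the "predictor" output is random; nothing here is Meta's actual code.

```python
import random

random.seed(0)
PATCH_DIM, EMB_DIM = 16, 4

# Stand-in "encoder": a fixed random linear projection into an
# embedding space. In V-JEPA this would be a learned network.
PROJ = [[random.gauss(0, 1) for _ in range(PATCH_DIM)] for _ in range(EMB_DIM)]

def encode(patch):
    return [sum(w * x for w, x in zip(row, patch)) for row in PROJ]

# A tiny "video" of 8 patches; two are hidden from the predictor.
patches = [[random.gauss(0, 1) for _ in range(PATCH_DIM)] for _ in range(8)]
masked_idx = [2, 5]

# Key idea: the targets are *embeddings* of the hidden patches,
# not the pixels themselves -- hence "non-generative".
targets = [encode(patches[i]) for i in masked_idx]

# Stand-in for the predictor network's guesses.
predictions = [[random.gauss(0, 1) for _ in range(EMB_DIM)] for _ in masked_idx]

# Training would minimize the distance in embedding space.
loss = sum((p - t) ** 2
           for pred, tgt in zip(predictions, targets)
           for p, t in zip(pred, tgt)) / (len(masked_idx) * EMB_DIM)
```

Scoring in embedding space lets the model ignore unpredictable pixel-level detail and focus on the conceptual content of the scene.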
The implications of this research extend beyond Meta, potentially reshaping the broader AI landscape. Meta has previously discussed the concept of a “world model” in the context of augmented reality glasses, envisioning an AI assistant that anticipates user needs and preferences based on an audio-visual understanding of the surroundings.
Moreover, V-JEPA could revolutionize AI model training methodologies. Current pretraining methods for foundational models necessitate substantial time and computational resources, often limiting access to larger organizations. However, with more efficient training techniques, the barrier to entry could lower, aligning with Meta’s ethos of open-source research dissemination.
LeCun highlights a key limitation of today's LLMs: they cannot learn from visual and auditory stimuli, which he argues hinders progress toward artificial general intelligence.
Meta's next phase involves adding audio to the video data, giving the model an additional sensory channel, much as a child watching television learns from both sight and sound.
Meta intends to release the V-JEPA model under a Creative Commons noncommercial license, fostering collaboration and further exploration of its capabilities by researchers.