Thursday, June 4, 2026

Tech companies Apple and Nvidia utilized YouTube videos to train artificial intelligence

Tech Companies Under Fire for Using Swiped YouTube Videos to Train AI Models

Generative artificial intelligence (AI) is on the rise, and tech companies are constantly seeking training data to improve their models. A recent investigation by Proof News revealed that several companies, including Apple, Nvidia, and Anthropic, used transcripts of YouTube videos to train their AI models without the creators' permission.

The investigation found that these companies used a dataset called YouTube Subtitles, which contains transcripts of over 173,000 YouTube videos from various channels. The videos range from educational content and news outlets to popular creators such as MrBeast and Marques Brownlee. YouTube’s rules prohibit downloading and using content without permission, yet the companies used the data to train their AI models.

Marques Brownlee, a popular tech YouTuber, addressed the issue on social media, stating that Apple had sourced data from companies that scraped data/transcripts from YouTube videos, including his own. While Apple may not be directly responsible for the scraping, this revelation raises concerns about the ethical implications of using unauthorized data for AI training.

Proof News also created a tool for creators to search for their content in the dataset, allowing them to see if their videos were included without permission. While the dataset does not include imagery from the videos, it does contain translated subtitles in multiple languages.

The dataset in question was created by EleutherAI, a non-profit AI research lab focused on promoting open science norms. It is part of a larger compilation known as the Pile, which also includes material from other sources, such as the European Parliament and English Wikipedia, and was released under a permissive license for academic and research purposes.

This investigation highlights the ongoing challenges surrounding data privacy and ethics in the AI industry. Companies must be held accountable for their data practices and ensure that they are obtaining data ethically and with proper permissions. As the use of AI continues to grow, it is crucial for tech companies to prioritize transparency and ethical data usage to build trust with users and creators.
