Tech Companies Under Fire for Using Swiped YouTube Videos to Train AI Models

The use of generative artificial intelligence (AI) has been on the rise, with tech companies constantly seeking training data to improve their models. However, a recent investigation by Proof News has revealed that some companies, including Apple, Nvidia, and Anthropic, have been using YouTube videos without permission to train their AI models.

The investigation found that these companies used a dataset called YouTube Subtitles, which contains transcripts of more than 173,000 YouTube videos from a wide range of channels. The videos span educational content, news outlets, and popular creators such as MrBeast and Marques Brownlee. YouTube’s rules prohibit downloading and using content without permission, yet the companies used the data to train their AI models anyway.

Marques Brownlee, a popular tech YouTuber, addressed the issue on social media, stating that Apple had sourced data from companies that scraped data and transcripts from YouTube videos, including his own. While Apple may not be directly responsible for the scraping, the revelation raises concerns about the ethics of using unauthorized data for AI training.

Proof News also created a tool for creators to search for their content in the dataset, allowing them to see if their videos were included without permission. While the dataset does not include imagery from the videos, it does contain translated subtitles in multiple languages.

The YouTube Subtitles dataset is part of the Pile, a larger compilation released by EleutherAI, a non-profit AI research lab focused on promoting open science norms. The Pile also includes material from other sources, including the European Parliament and English Wikipedia, and was released under a permissive license for academic and research purposes.

This investigation highlights the ongoing challenges surrounding data privacy and ethics in the AI industry. Companies must be held accountable for their data practices and ensure that they are obtaining data ethically and with proper permissions. As the use of AI continues to grow, it is crucial for tech companies to prioritize transparency and ethical data usage to build trust with users and creators.

