• Home
  • Blog
  • Android
  • Cars
  • Gadgets
  • Gaming
  • Internet
  • Mobile
  • Sci-Fi
Tech News, Magazine & Review WordPress Theme 2017
Blog - Creative Collaboration

Apple and Salesforce AI training datasets co-opt MrBeast, Marques Brownlee videos

July 16, 2024

A new investigation claims that tech companies used subtitles from more than 48,000 YouTube channels, including those of top creators like MrBeast and Marques Brownlee and higher-education institutions like MIT and Harvard, to train their AI models, even though YouTube prohibits the harvesting of platform content without permission.

The investigation, conducted by Proof News and published in conjunction with Wired, found that companies like Anthropic, Nvidia, Apple, and Salesforce used a dataset of subtitles from 173,536 YouTube videos, including those from Khan Academy, MIT, Harvard, The Wall Street Journal, NPR, the BBC, and late-night shows like The Late Show With Stephen Colbert, Last Week Tonight With John Oliver, and Jimmy Kimmel Live.

Marques Brownlee posted an Instagram Reel noting that, in his opinion, “the real story is Apple and a whole bunch of other tech companies are training their AI models using data that they buy from third party data scraping companies some of which get their data in slightly illegal ways… Apple can technically say they’re not at fault for this.”

Wired says that representatives for EleutherAI, the nonprofit AI research lab that scraped and disseminated the YouTube dataset, did not respond to the publication’s requests for comment. The dataset is part of a compilation the nonprofit calls The Pile, which also includes material from the European Parliament, English Wikipedia, and emails from employees of the Enron Corporation that were released during the federal investigation into the company in the early 2000s.


Wired reports that most of the collections that make up The Pile are accessible to “anyone on the internet with enough space and computing power to access them.” Companies that have done so include Apple, Nvidia, Salesforce, Bloomberg, and Databricks, all of which have publicly acknowledged their use of The Pile to train AI models.

Jennifer Martinez, a spokesperson for AI startup Anthropic, said in a statement that while the company had used The Pile to train its generative AI assistant, “YouTube’s terms cover direct use of its platform, which is distinct from use of the Pile dataset. On the point about potential violations of YouTube’s terms of service, we’d have to refer you to the Pile authors.”

In his Instagram Reel, Brownlee added, “The double whammy is that I actually pay for more accurate manual transcriptions on every video that we put out… so that means the stolen transcriptions specifically are paid content that’s being stolen more than once.”

His concerns echo those of creators around the world who worry that their work will be consumed or exploited by AI without compensation or permission. Many are currently suing tech companies over the unapproved use of their work.

Wired reports that The Pile is still available on file-sharing services but has been removed from its official download site. Proof News has created a tool to search for creators in the YouTube AI training dataset.

Topics
Artificial Intelligence

    © CC Startup, Powered by Creative Collaboration. © 2020 Creative Collaboration, LLC. All Rights Reserved.
