Salesforce Sued for Allegedly 'Pirating' Books for AI Training

The TL;DR 📝

The lawsuit claims Salesforce ‘pirated’ hundreds of thousands of books.
Salesforce allegedly scrubbed references to the datasets after backlash.
Authors are seeking damages and accountability for the alleged unauthorized use of their work.
This case highlights the blurry lines of AI training and copyright.
Market reaction? Lots of side-eye and ‘I told you so’ vibes.


Okay, let’s talk about something that might make your head spin faster than a TikTok challenge gone wrong. Imagine you’ve spent years writing a novel, pouring your soul into every chapter, only to find out it’s been used as fuel for some AI’s language model. That’s exactly the kind of drama going down with Salesforce right now, and trust me, it’s a wild ride.

Here’s the tea: Authors E. Molly Tanzer and Jennifer Gilmore have filed a class action lawsuit against Salesforce, claiming the company “pirated hundreds of thousands of copyrighted books” to train its XGen AI models. Yes, you heard that right. This isn’t just your average corporate misstep; it’s a full-blown legal showdown. They’re alleging that Salesforce pulled a fast one by initially listing the “RedPajama-Books” dataset as a source for its AI in June 2023, but then, poof! Those references vanished like your favorite limited-edition sneakers on drop day.

So, why should you care? Well, this isn’t just about some authors getting riled up. It speaks to a much bigger issue in the tech world: the blurred lines of copyright when it comes to AI. As AI continues to grow like a weed in a garden nobody’s tending, the question of where its training data comes from is becoming increasingly vital. If companies can just swipe content without a second thought, what does that mean for creators everywhere? It’s like if Netflix started using your TikToks as their next big series without asking. Major yikes.

Now, let’s break down what actually happened. According to the lawsuit filed in a San Francisco federal court, Salesforce relied on datasets like “RedPajama” and “The Pile,” the latter of which includes a whopping collection known as Books3: over 196,000 books allegedly copied straight from a private tracker. In June 2023, Salesforce was all about being transparent, naming its sources clearly. Fast forward to September, and suddenly those references disappear, replaced by vague terms like “publicly available data.” Sounds sus, right?

Salesforce CEO Marc Benioff has even chimed in on the drama, saying that AI companies have “ripped off” training data. It’s like he’s calling out the entire industry, but also, aren’t you the captain of this ship, Marc?

What does this mean for the average Joe or Jane who might not be knee-deep in the crypto or tech world? If you’re a content creator, artist, or really anyone who puts their work out there, this case could set some serious precedents. If Salesforce gets away with this, what’s stopping other companies from doing the same? It’s like opening Pandora’s box, except instead of chaos, it’s just a bunch of copyright violations.

The market reaction to this news has been a mix of eye rolls and ‘I told you so’s. On the one hand, some people are panicking about the implications for the AI industry. On the other hand, there’s a sense of schadenfreude—kind of like watching your ex trip at a party. People are feeling that maybe, just maybe, accountability is coming for tech giants who play fast and loose with intellectual property.

But wait, it gets better (or worse, depending on how you look at it). Legal experts are weighing in, and it’s not looking easy for the plaintiffs. They have to prove real financial harm, not just that their work was used. Recent court rulings have favored companies like OpenAI, making it a tough uphill battle for authors. So, while we’re all rooting for the little guy here, it’s clear this is going to be a complicated legal mess.

In conclusion, the Salesforce saga is just another reminder of the wild west we’re living in when it comes to technology and copyright law. So next time you’re scrolling through Twitter, keep an eye on this case. It might just reshape how we think about AI and the rights of creators. And who knows, maybe it’ll even lead to some much-needed guidelines in this chaotic digital landscape.

What do you think? Can AI companies really claim ‘innocence’ when it comes to using copyrighted material? Let’s chat about it in the comments.

Sources
Decrypt

This is just news, not financial advice. DYOR and maybe don’t bet the farm on magic internet money.
