As I understand it, the lawsuit is not about Reddit's content or copyright infringement as such. It is specifically about using proxy farms and other measures to pull Reddit content out of Google's search results, thereby violating the DMCA, and it goes after the services that provide the infrastructure to do so.
It's a bit like how pressure was put on payment providers to restrict sites with extreme adult content.
Side note: I find it super interesting that, according to the filing (https://fingfx.thomsonreuters.com/gfx/legaldocs/xmpjezjawvr/...), Google was able to provide them with the exact number of automated requests, broken down by company.
Perplexity's response to Reddit's lawsuit:
"Dear Reddit community,
You might’ve read Perplexity was named in a lawsuit filed by Reddit this morning. We know companies usually dodge questions during lawsuits, but we’d rather be up front.
Perplexity believes this is a sad example of what happens when public data becomes a big part of a public company’s business model.
Selling access to training data is an increasingly important revenue stream for Reddit, especially now that model makers are cutting back on deals with Reddit or walking away completely. (A trend Reddit has acknowledged in recent earnings reports).
So, why sue Perplexity? Our guess: it’s about a show of force in Reddit’s training data negotiations with Google and OpenAI. (Perplexity doesn’t train foundation models!)
Here’s where we push back. Reddit told the press we ignored them when they asked about licensing. Untrue. Whenever anyone asks us about content licensing, we explain that Perplexity, as an application-layer company, does not train AI models on content. Never has. So it is impossible for us to sign a license agreement to do so.
A year ago, after explaining this, Reddit insisted we pay anyway, despite lawfully accessing Reddit data. Bowing to strong arm tactics just isn’t how we do business.
What does Perplexity actually do with Reddit content? We summarize Reddit discussions, and we cite Reddit threads in answers, just like people share links to posts here all the time. Perplexity invented citations in AI for two reasons: so that you can verify the accuracy of the AI-generated answers, and so you can follow the citation to learn more and expand your journey of curiosity.
And that’s what people use Perplexity for: journeys of curiosity and learning. When they visit Reddit to read your content it’s because they want to read it, and they read more than they would have from a Google search.
Reddit changed its mind this week on whether they want Perplexity users to find your public content on their journeys of learning. Reddit thinks that’s their right. But it is the opposite of an open internet.
In any case, we won’t be extorted, and we won’t help Reddit extort Google, even if they’re our (huge) competitor. Perplexity will play fair, but we won’t cave. And we won’t let bigger companies use us in shell games.
We’re here to keep helping people pursue wisdom of any kind, cite our sources, and always have more questions than answers. Thanks for reading."
>Here’s where we push back. Reddit told the press we ignored them when they asked about licensing. Untrue. Whenever anyone asks us about content licensing, we explain that Perplexity, as an application-layer company, does not train AI models on content. Never has. So it is impossible for us to sign a license agreement to do so.
I wish they had told reddit to go fuck itself and taken that to court.
unlike the new york times lawsuit - where the platform owns their content and training is a gray area - reddit doesn't own shit. and if they insist otherwise - bye bye section 230 protections, no? they now retroactively own every post in r/jailbait and r/coontown.
Without gating AI scraper access, Reddit’s enterprise value, based only on ad revenue, is greatly diminished. If the AI folks impair Reddit’s economics through their maneuvers, that might not be so bad (as Reddit’s behavior of late has been “all this user-generated content belongs to us to monetize as we see fit”).
The AI companies could just pull the content from Reddit mirrors like https://arctic-shift.photon-reddit.com/search/ and https://search.pullpush.io/. It's not difficult to scrape Reddit, nor to acquire archives of all Reddit posts and comments.
They would most likely use the browsers they offer users to scrape the content and stream it back to an endpoint for ingest and processing as users browse Reddit. Think of the RECAP extension for PACER (which scrapes PACER while a user browses it and ships the data to the Internet Archive) or ArchiveTeam’s Warrior VM. You can’t defend against scraping when every user’s browser, which looks like a human because it is a human, is a crawler node.
At least, this is how I would engineer a public browser operating as an adversarial distributed crawler network.
Related, the lawsuit in question:
Reddit Accuses 'Data Scraper' Companies of Theft
https://www.nytimes.com/2025/10/22/technology/reddit-data-sc...
(https://news.ycombinator.com/item?id=45671679)