I tried spinning up the local approach with docker compose, but it fails.
There's no `.env.example` file to copy from. And even if the env vars are set manually, there are issues with the mentioned volumes not existing locally.
Seems like this needs more polish.
Cool way to self-host archives.
What I'd really like is a plugin that automatically pulls from archives somewhere and replaces deleted comments and those bot-overwritten comments with the original context.
Reddit is becoming maddening to use because half the old links I click have comments overwritten with garbage in protest of something. Ironically, the original content is available in these archives (which are used for AI training) but is now missing for actual users like me who are just trying to figure out how someone fixed their printer driver two years ago.
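For what it's worth, the matching step of a plugin like that could be sketched roughly as below. This is only a sketch under assumptions: the `Comment` shape, the `[deleted]`/`[removed]` markers, the `overwrite_phrases` list, and the `archive` dict keyed by comment ID are all hypothetical stand-ins, not anything taken from redd-archiver or from Reddit's API. Actually fetching the archived text is left out entirely.

```python
from dataclasses import dataclass, replace

# Assumed placeholder bodies for deleted/removed comments (hypothetical).
DELETED_MARKERS = {"[deleted]", "[removed]"}


@dataclass(frozen=True)
class Comment:
    id: str    # comment ID shared by the live page and the archive (assumption)
    body: str  # comment text as currently shown


def looks_missing(body: str, overwrite_phrases=()) -> bool:
    """Return True if the body looks deleted or bot-overwritten."""
    text = body.strip()
    if text in DELETED_MARKERS:
        return True
    lowered = text.lower()
    # Overwrite tools leave varying boilerplate; the phrases are caller-supplied.
    return any(phrase.lower() in lowered for phrase in overwrite_phrases)


def restore_comments(live, archive, overwrite_phrases=()):
    """Swap in archived text for comments that look deleted or overwritten.

    `archive` maps comment ID to the original body. Comments that still look
    intact, or that have no archived copy, are returned unchanged.
    """
    restored = []
    for comment in live:
        original = archive.get(comment.id)
        if original is not None and looks_missing(comment.body, overwrite_phrases):
            restored.append(replace(comment, body=original))
        else:
            restored.append(comment)
    return restored
```

Keying on comment ID rather than on position in the thread means the lookup keeps working even when sibling comments have vanished from the live page. Detecting overwritten comments is the fuzzy part, which is why the phrase list is left to the caller.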
That would only really be ironic if people were overwriting their comments to protest LLM training, but by far the biggest wave of deletions was a response to Reddit locking down their API. If the result of their protest is that the site is less useful for you, the user, then in fact it served its purpose, as the entire point was an attempt to boycott Reddit, i.e. get people to stop using it by removing the user contributions that give the site its only value in the first place.
> If the result of their protest is that the site is less useful for you, the user, then in fact it served its purpose, as the entire point was an attempt to boycott Reddit, i.e. get people to stop using it by removing the user contributions that give the site its only value in the first place.
In practice I just give them more page views because I have to view more threads before I find the answer.
Reddit's DAU numbers have only gone up since the protest.
I did phrase it as "an attempt". In the end the protest probably wasn't as effective as protestors might have hoped, and it didn't get Reddit to change course on their enshittification decisions. I do think it was good that there was an attempt at pushback, at least, when most software users just accept enshittification as normal and continue tolerating whatever abuse their masters throw at them.
I wonder if this can be hooked up with the now-dead Apollo app in some way, to get back a slice of time that is forever lost now?
The API should allow for a lot of different integrations.
Data is available via torrent in this section: https://github.com/19-84/redd-archiver?tab=readme-ov-file#-g...
I have also published sub statistics and profiling for each platform. These can be used to help identify which subs to prioritize for archiving.
reddit: https://github.com/19-84/redd-archiver/blob/main/tools/subre...
voat: https://github.com/19-84/redd-archiver/blob/main/tools/subve...
ruqqus: https://github.com/19-84/redd-archiver/blob/main/tools/guild...
I want to do the same thing for TikTok. I have 5k videos downloaded, starting from the pandemic, and I want to find a way to use AI to tag and categorize them so I can scroll through them locally.
_Hacker News collectively grabs the dataset to train their models on how to become effective reddit trolls_
Don’t we have enough of those already? ;)
The API and MCP server are very powerful ;)
Did you pay all the people who created its content?
I have no problem with this being downloaded for personal use, in fact that's a good thing. But of course we both know it'll be used to train AI.
>Voat
Gross. Why would anyone want to have an archive of Reddit For Neonazis?
Might be good for researchers to be able to perform studies on.
Thank you for your comment. I will support any platform that has a complete dataset available, and I will take submissions for complete datasets through GitHub issues: https://github.com/19-84/redd-archiver/blob/main/.github/ISS...
Wat?
It sold itself as a healthier alternative to Reddit, but by the end of its run virtually every post sitewide was some flavor of virulently racist, misogynistic, anti-semitic, fringe conspiratorial, etc.
So what? It's written by humans, and those people are just as human as you. Just because you don't agree with it doesn't mean they are wrong.
There are a billion Muslims that worship a guy that married a prepubescent child. Who are you to say that they are wrong to do so? Or that not liking a group of people (like you are doing yourself) is wrong?