This will kill open source. Anything of value will be derived and re-derived and re-re-derived by bad players until no-one knows which package or library to trust.
The fatal flaw of the open internet is that bad players can exploit with impunity. It happened with email, it happened with websites, it happened with search, and now it's happening with code. Greedy people spoil good things.
Yup, a fundamental side effect of freedom is that some people are assholes and will abuse it.
No, it won't kill open source, just as it hasn't killed the Internet.
This was always the case with open source. It's not that hard to obfuscate code in compiled binaries.
If this were true, why hasn't it happened in the last... 30 or 40 years that FOSS code has been published on the internet?
Copyright was the base protection layer. Not in the "I own it" sense, but in the "you can't take it and run with it" sense.
Its current weakening opens the door to abuses that we don't have the proper tools to deal with yet. Perhaps new ones will emerge, but we'll have to see.
Same reason fake images and videos are more common now. Photoshop existed 30 years ago.
Before LLMs you needed time and skill to do it; with AI you need less of both.
Until now, people had the leverage/cost asymmetry in their favor: they could easily differentiate and make rational choices.
AI has tipped that nuanced balance in a way that is both destructive and unsustainable, just like any other fraud or Ponzi scheme.
The cost/loss constraint function now favors the unskilled, blind, destructive individual running an LLM who spits on all those who act in good faith. Quite twisted.
Last I checked, LLMs didn't exist until only a few years ago.
Hopefully the spread of AI will make more people realise that everything is a derivative work. If it wasn't an AI, it was a human standing on the shoulders of giants.
The offending repository is copying files verbatim while stripping the license header from those files. That's not "standing on the shoulders of giants".
That doesn't even seem like AI, just direct copy-pasting lol
Looks to me like they are using AI to refactor the code, not to generate it. Even if we allow code to be used to train AI to generate new code, copying code and refactoring it is something entirely different.
This is a very intriguing statement because it looks like it contains a truism, but something is off. Yes, everything is a derivative work of some kind; what matters is the amount of added value. If it gets close to zero, as in this case, we've got plain plagiarism.
[As a side note, the problem with LLMs (sorry, the term "AI" became so muddy I prefer not to use it) is that they tend to be extremely uncreative and just regress to the mean. So I wouldn't expect added value in creativity itself, just help for humans with more menial tasks, which is exactly how antirez uses them.]
AI makes it easy for people to claim they did the work, so fewer people will do the real work. That means the giants won't grow.
Yeah, this is where I find the copyright argument a little weak. Because how do artisans learn their craft? By observing others' work.
Instead, I feel like the objections are (rightly) these two issues:
1. GenAI operates at a much larger scale than an individual artist. I don't think artists would have an issue with someone commissioning a portrait in, say, the style of Van Gogh (the copyright argument). They would have an issue if that artist painted 100,000 pictures a day in the style of Van Gogh.
2. Lack of giving back: some of the greatest artists have internalized great art from previous generations, and then something miraculous happens. An entirely new style emerges. They have now given back to the community that incubated them. I don't really see this same giving back with GenAI.
Edit: one other thought. Adobe used their own legally created art to train their model, and people still complain about it, so I don't buy the copyright argument if they're upset about Adobe's GenAI.
Edit 2: I'm not condoning blatant copyright infringement like is detailed in this post.
1. If I wanted the "style of Van Gogh" I would simply download Van Gogh; why waste time and money on an approximative AI? But if I want something else, then I can use AI. GenAI is really the worst infringement tool. For example, would anyone try to read a bootleg Harry Potter out of an LLM to avoid paying? I don't think so.
2. LLMs will give back what you put in plus what they learned; it's your job to put in the original parts. But every so often this interaction will spark some new ideas. The LLM+human team can get where neither would get alone, building on each other's ideas.
No, this "observing" argument has already been beaten to death by a multitude of creatives explaining, far better than I could, how they actually learn and operate.
If you really think all they do is observe, form a gradient from millions of samples, and spit out some approximations, you are deeply mistaken.
You cannot equate human learning with how GenAI learns (and if you could, we'd have AGI already, imo).
> Because how do artisans learn their craft? By observing others' work
I don't think that computer systems of any kind should have the same right to fair use that humans have
I think humans should get fair use carve outs for fanart and derivative work, but AI should not
>Lack of giving back
I disagree. There is a ton of AI-generated text, code, images, and video available completely free for people to learn from.
Which is just laundered from real material that real humans put work into creating, only to be regurgitated by a crass homunculus of 1s and 0s for free, without any mention of the real work that went into creating that information.
I'm not a big fan of the copyright system we have myself, but there's a reason it exists. AI companies illegally training their AI on copyrighted content to reap the spoils of other people's hard work, while those people never get recognition for it, is the opposite of "giving back".
Copyright is a nightmare. It's just that it sounds like a gentler nightmare than hyperscaled algorithms controlled by a few.
This. AI is a magnificent way to make the entire world's codebase available as a giant, cross-platform, standard library.
I welcome AI to copy my crap if that's going to help anyone in the future.
Except closed-source software, which it isn't trained on.
You forgot to mention that if things continue as they are, a very small group of people will have complete control over this giant library.
It's a concern. But there are open source models.
Open source model, created at great expense… by a still small cohort of people.
There are like a dozen organizations globally creating anything close to state of the art models. The fact that you can use some for free on your own hardware doesn’t change that those weights were trained by a small cohort of people, with training data selected by those people, and fine-tuning and “alignment” created by those people.
Sure, you can fine-tune the smaller ones yourself, but that still leaves you at the mercy of the original creator.
I find it odd that any LLM could be considered open source. Sure, the weights are available to download and use, but you can't reasonably reconstruct the output model, as it's impractical for an individual to gather a useful dataset or spend $5,000,000+ on GPU time for training.
Distillation can extract the knowledge from an existing model into a newly trained one. That doesn't solve the cost problem, but costs are steadily coming down.
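To make "distillation" concrete: the student model is trained to match the teacher's output distribution instead of re-learning everything from raw data. A minimal sketch of the standard distillation loss, assuming a PyTorch-style setup (the function, temperature, and mixing weight here are illustrative, not any lab's actual recipe):

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
        # Soft part: pull the student's temperature-softened distribution
        # toward the teacher's. The T*T factor is the usual rescaling.
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        # Hard part: ordinary supervised loss on the ground-truth labels.
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1 - alpha) * hard

    # Per batch, with the teacher frozen:
    #   with torch.no_grad():
    #       teacher_logits = teacher(inputs)
    #   loss = distillation_loss(student(inputs), teacher_logits, labels)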
That's still a crude repurposing of an inscrutable artifact. Open source requires you to share the source data from which that artifact (the model parameters) was created.
Are you able to build these models from source?
No, there aren't.
There is open source training and inference software. And there are open weights.
Those things are not enough to reproduce the training.
Even if you had the hardware, you would not be able to recreate llama (for example) because you don't know what data went into the training.
That's a very weird library. You can get their summaries, but you don't have access to the original works used when creating it. Sounds terrible, open source or not.
Nothing subverts my defense of human creativity more than the cliched human defenses of AI.
For those of us who exceed the AI, it raises our value enormously. You see it in the pay of AI engineers. And in this high-interest-rate world, those of us who remain employed are commanding higher wages, as far as I can tell. It is a culling of the lesser-than.
One unfortunate side effect is that junior engineers who cannot immediately exceed the AI are not being hired as often. But this era echoes the dotcom boom, when very low-skilled people commanded very high wages. Universities, which have always been white-collar job training but pretended they weren't, are being impacted greatly.
https://registrar.mit.edu/stats-reports/majors-count
24% of undergraduate MIT students this year are in majors with Computer Science in the title (I asked ChatGPT to calculate this from the difficult-to-parse website). A quarter of all MIT undergraduates are not being trained to be future PhD researchers; MIT, like all other schools, is training the vast majority of its students for private-sector workforce jobs.
The culling is happening all over. We will likely go from 4,000 colleges in America today to fewer than 1,000 over the next 15 years.
This is a good thing. The cost of university degrees is far too high. We are in the midst of a vast transition. College should return to being the purview of the truly intelligent and the children of the rich, as it was for all time before WW2. This very weird experiment in human history is ending, and it cannot happen soon enough.
> College should return to being the purview of the truly intelligent and the children of the rich, as it was for all time before WW2.
You're likely correct that we're witnessing a reconsolidation of wealth and the extinction of the middle class in society, but you seem happy about this? Be careful what you wish for...
They probably think they're one of the "truly intelligent", or a child (or parent) of the rich, lol
I would not want the unintelligent and non-rich to go into debt to spend 4 years at a university getting a degree in an absurd subject.
https://www.sps.nyu.edu/explore/degrees-and-programs/bs-in-h...
Please, tell me how going $300k in debt for an undergraduate degree in Tourism Studies benefits society, or the student
Sounds like a US problem tbh
Nearly every European university offers a degree in Tourism. The difference is the debt. But socializing the cost of a degree in tourism does not mean the cost isn't borne by society. I believe deep in my bones that one can learn the ropes of managing a hotel outside the illustrious grounds of a university.
Tbh it just sounds like you don't value other people and their work when it isn't in a "hard" field.
Sure, people can learn hotel management outside of university. But outside of nepotism, who will trust random strangers with no qualifications enough to give them a foot in the door?
And you make socializing the cost of improving outcomes for the next generations sound like a negative. What is the point of society if not that? Even from a purely selfish perspective, the next generation will take care of me when I am too old to do it myself. I'd want them to be in a good state by then.
Do you think you got to wherever you are now without society socializing some part of the cost of getting you there?
"Tourism studies" isn't a field. It's not an academic discipline. Requiring someone to spend years "studying" this is completely absurd. The reality is that university is finishing school: young adults want four years of screwing around while officially getting a degree, and society subsidizes it.
You have missed my point entirely. These degrees have no value. I would argue they have negative value when factoring in their cost in resources and wasted time.
Alternatively, middle-class jobs do not actually require a college degree. Perhaps a college degree is primarily a signalling mechanism for adherence to a bygone era of societal norms. But the price is far too high to justify it, and the market will create alternative proof of those norms at a far cheaper price. Which is happening as we debate.
My concern now is the large number of under-employed college graduates who are indebted for worthless degrees, feeling pinched because the debt far surpasses their market value. This has been the case for a long time, but it has now reached the upper echelons of academia, where even Ivy League grads cannot find employment. You need to recalibrate your ire toward the correct target.
If AI and other societal shifts eliminate many white-collar jobs in developed countries, degree-seekers will eventually notice and the demand and perceived value of a college education may greatly diminish. I got my degree at a time when it was actually useful as a signalling mechanism. Now students might not benefit much from a college degree and internships might be hard to find. This is too bad and grossly unfair.
I hope that new societal avenues are created to help young people start their careers, even if those careers are in fields like plumbing, nursing and hospitality. I also hope efforts are made to help white collar workers transition into other (lesser-paying) careers when AI really starts to permanently reduce the size of the white-collar workforce.
> You need to re-calibrate your ire to the correct target
Who do you think is the correct target? Big institutions? The college system?
Definitely the colleges, which charge more year after year, burdening the young with debt for worthless pieces of paper.
Yeah, sure, not every job should require a degree, but that doesn't justify keeping The Poors from pursuing education.
Some of us value education for its own sake, not as a prerequisite for employment.
You are assuming the only avenue to "education" is through the university experience
Some people learn best in structured class settings.
35%, ignoring "secondary majors" which may or may not coincide with primary majors that also have CS in the title.
(Also ignoring the thousand first-years at the end of the list.)
The various 0.5 half-student quantities throw some doubt on the measurement too.
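For anyone who wants to redo that arithmetic without asking ChatGPT: a quick sketch, assuming you paste the registrar's table into a CSV with (hypothetical) "major" and "count" columns; counts are read as floats because of those 0.5 joint-major entries.

    import csv

    # "mit_majors.csv" is a hypothetical export of the registrar's table.
    with open("mit_majors.csv", newline="") as f:
        rows = [(r["major"], float(r["count"])) for r in csv.DictReader(f)]

    total = sum(count for _, count in rows)
    cs = sum(count for major, count in rows if "computer science" in major.lower())
    print(f"{cs / total:.1%} have Computer Science in the major title")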
> College should return to being the purview of the truly intelligent and the children of the rich, as it was for all time before WW2.
Yeah, the world was a better place when it was mostly white males having that chance.
/s
I'll give you the only upvote you'll probably get for that sentiment around here. Enjoy your trip to -4 (Dead)!
The license was MIT until two months ago.
That gives anyone the right to get the source code of that commit and do whatever.
The article does not specify whether the company is still using the code AFTER the license change.
The rest of the points are still valid.
MIT places the following condition on the licensee if they wish to redistribute the code:
> The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
Which the other party was not doing.
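For what compliance looks like in practice: a hypothetical file header in the downstream copy (names and dates invented for illustration). Keeping this block costs nothing; deleting it is what turns reuse into infringement.

    # Copyright (c) 2024 Jane Original
    #
    # Permission is hereby granted, free of charge, to any person obtaining
    # a copy of this software... [the rest of the standard MIT permission
    # notice, retained verbatim from the upstream file]
    #
    # Portions modified 2025 by Downstream Co. (refactored with LLM assistance).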
I looked at their codebase and it seems the other party was doing that. I'm seeing a pattern here where either this is not really a copyright problem but possibly a marketing stunt, or, if it's not, it may well be an emotional spiral or lash-out over one person extending another's open source logic even with attribution clearly given. If so, this is not healthy for the open-source community.
Also, is it legal to start with MIT and change to Apache midway? The laws around open-source licensing are so tricky and cutthroat at this point.
Also, does anyone know what this Intentional License from the other party is? I have never seen it before. It seems that's what their main package uses, while the other packages are Apache. If it's custom, is it even legal to just create a new OSS license out of nothing?
There's too much gray area with OSS, especially when it comes to legalities; we almost need a standard.
If we step back and examine LLMs more broadly (beyond our personal use cases, beyond "economic impact", beyond the underlying computer science) what we are largely looking at is an emerging means of collaboration. I am not an expert computer scientist, and yet I can "collaborate" (I almost feel bad using this term) with expert computer scientists when my LLM helps me design my particular algorithm. I am not an expert on Indonesian surf breaks, yet I tap into an existing knowledge base when I query my LLM while planning the trip. I am very naive about a lot of things and thankfully there are numerous ways to integrate with experts and improve my capacity to engage in whatever I am naive about, LLMs offering the latest ground-breaking method.
This is the most appropriate lens through which to assess AI and its impact on open source, intellectual property, and other proprietary assets. Alongside this new form of collaboration comes a restructuring of power. It's not clear to me how our various societies will design this restructuring (so far we are collectively doing nearly nothing) but the restructuring of these power structures is not a technical process; it is cultural and political. Engineers will only offer so much help here.
For the most part, it is up to us to collectively orchestrate the new power structure, and I am still seeing very little literature on the topic. If anyone has a reading list, please share!
Me copying and pasting your post verbatim to put on my blog under my name: "I greatly enjoyed collaborating with arthurofbabylon on this piece."
Forget about copying and pasting – the act of reading it is on the spectrum of collaboration. But thanks for the citation.
> what we are largely looking at is an emerging means of collaboration.
They surpass open source, they "out-open-source" open source, by learning skills everywhere and opening them up for anyone who needs them later.
It's owned by a few rich corporations and individuals. It isn't available to anyone; only to those they choose, who are ready to pay them. And it isn't open source at all, because open source is not reuse without any obligations (even under permissive licenses). And let's not forget that they "open" only FOSS works and individual works. They never expose proprietary IP belonging to rich corporations. It isn't an emerging method of collaboration; it's another method of wealth consolidation.
> Please DO NOT TURST ANY WORD THEY SAY. They're very good at lingual manipulation.
I don't know if this was intentional misspelling or not but it's damn funny
It is likely intentional, as the author is battling AI by every means possible. However, it leans towards funny and hopeless at the same time.
Not hard to believe. I've been using Claude Code and am hesitant to publish publicly because I'm concerned about copyright violations. It would be nice if there were a registry (besides GitHub) where I could compare "new" code against public repositories.
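As far as I know no such registry exists, but the core comparison is a solved problem: MOSS-style fingerprinting hashes overlapping token windows, so copying survives renaming and reformatting. A toy sketch (the normalization and window size are arbitrary choices on my part):

    import hashlib
    import re

    def fingerprints(source: str, k: int = 8) -> set[int]:
        # Tokenize, then blind all identifiers so renames can't hide copying.
        tokens = re.findall(r"[A-Za-z_]\w*|\S", source)
        norm = ["ID" if re.match(r"[A-Za-z_]", t) else t for t in tokens]
        # Hash every k-token window into a compact fingerprint set.
        return {
            int(hashlib.sha1(" ".join(norm[i:i + k]).encode()).hexdigest()[:12], 16)
            for i in range(max(0, len(norm) - k + 1))
        }

    def similarity(a: str, b: str) -> float:
        fa, fb = fingerprints(a), fingerprints(b)
        return len(fa & fb) / max(1, min(len(fa), len(fb)))

Real detectors (MOSS, JPlag) winnow these hashes down to a stable subset, but even this toy version lights up on verbatim copies with the headers stripped.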
Why? That’s not how copyright works.
I'm interested in a new kind of license, which I'm calling "relational source": not about money or whether a product is commercial, but about whether there's an actual person who wants to use the code, with some kind of AGPL-esque mechanism to ensure no mindless ingestion. Perhaps this could never work, but it also breaks the spirit of everything I love about OSS to have AI erasing the contributions of the people who put their time into doing the work.
Is the allegation here that an LLM generated code that was very similar to the author's copyright-protected code, or that they copied the code and then tried to use AI to hide that fact?
The allegation is that the party in question copied the author's code verbatim and stripped off the copyright / licence notices.
Wouldn't “Pretending It's Mine” be a better name for the project?
Sorry to say but this is going to be the new normal, and it's going to be quite difficult to stop. Your moat as a creator is your personal brand and the community you build around your tools.
I just hope that means we're all allowed to feed leaked source code to our own AIs then. This is mandatory if we're to have any sort of coherent legal precedent.
Game crackers can just claim they generated a completely different game using AI that just so happens to look very close to another game?
They could copy the core game mechanics and have AI launder the source and generate new art assets. Proving infringement is going to be basically impossible for all but the most trivial of cases.
The same could be done for movies too I guess. Probably easier.
One could set up a site to crowdsource laundering 8-10 second sections of an entire movie and then stitch it back together.
This is a blatant attempt to normalize. "Bad people do unethical things, I guess we'll have to live with it and shut up" is the vibe.
The author is doing good work. It's not a new normal until everybody goes quiet.
This is a very bad faith comment from a throwaway account.
Recognition of realities is different from wishing for things to occur. If you think you can stop unethical people from AI washing your software, feel free to try, you will fail.
Bad faith = trying to normalize bad faith behavior.
> If you think you can stop unethical people from AI washing your software, feel free to try, you will fail.
Posts like these = trying to stop unethical people from copyright (copyleft) washing. Telling the people writing these posts that it's the new normal is basically telling them they're doing a pointless thing, when they're actually doing something very good.
Stop being a coward and have a discussion with me with your real identity.
You can whine about something till you're dead, but incentives drive actions, full stop. Instead of whining about normalization, lobby lawmakers to make actual change and build tools to help creators detect the issue.
1. Stop telling me what to do.
2. The incentive is to steal and rampage and rape. Some incentives deserve to die in fire, period. To change incentives we punish people or change people. Posts like these help.
I think trying to shut down these efforts by saying this is "normal" is only done by people in that industry, or invested in it, who profit from this disaster.
How about this... If I was afraid to attach my public identity to my views on the internet, that would probably trigger some reflection in me as to the root cause of that emotion. I'm guessing shame, but people are complex so YMMV.
If the incentive were to steal and rampage and rape, people would be doing it more. The incentives are reversed, though: 10 years in prison for rape, 6 years for stealing, and the odds are you will get caught eventually. If we removed law enforcement, there would be a lot more stealing and raping, though people would take justice into their own hands, as they did in cases of theft and rape in olden times.
Ultimately this is a dumb back and forth because we agree about the wrongness of the act, we just disagree about the best course of action and the ultimate value of certain approaches. You keep doing you, but if you really care about this I'd suggest you do something more productive than trying to high horse people on the internet about norms.
You agree with me: there are incentives against rape and robbery because of laws and punishment. Sadly, copyright laws are not made for LLMs yet. But we can still name and shame.
> this is a blatant try to normalize.
This doesn't mean anything. You have no ability to "normalize" anything. It's not an action that somebody can take.
> it's not a new normal until everybody goes quiet
Real "let me speak to your manager" energy. Nobody is waiting for you to go quiet before getting on with things.
> You have no ability to "normalize" anything.
You can if you convince everyone to stop making a fuss because it's the new normal. The comment literally said "it's the new normal".
> You have no ability to "normalize" anything.
Normalisation isn't something that one person by themselves can achieve. It only happens when public opinion is swayed. How is it swayed? By people deliberately trying to sway it, like GP here.
If you are instead arguing that normalisation is not really a thing at all: What do you call the change in attitudes to people who are left-handed, disabled, or homosexual?
This is the new reality. Information as raw entropy encoded in weights: it doesn't matter if it's text, image, video, or 3D. Assets (or what were formerly known as assets) now belong to the big labs, if they're on the internet.
Internet plus AI implies the tragedy of the commons manifested in the digital world.