License Laundering and the Death of Clean Room

A canary died in the open source coal mine and a hundred people showed up to argue about the autopsy.

Last week, a Python library called chardet became the most contested piece of open source software on the internet. And not because it does anything glamorous – it detects character encodings and figures out whether your file is UTF-8 or Shift-JIS.

It’s plumbing. The kind of thing that used to sit inside requests (and still does if you have it installed), which means it’s probably somewhere in your dependency tree whether you know it or not.
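For a sense of what that plumbing involves, here is a deliberately naive sketch of encoding detection by trial decoding. It is not how chardet works; real detectors lean on statistics rather than trial and error. The function name `guess_encoding` and the candidate list are made up for illustration.

```python
def guess_encoding(data: bytes, candidates=("utf-8", "shift_jis", "latin-1")):
    """Return the first candidate encoding that decodes `data` without error.

    Hypothetical helper for illustration only: order matters, and latin-1
    accepts any byte sequence, so it acts as a catch-all at the end.
    """
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # Not valid in this encoding; try the next one.
        return name
    return None


# The same three characters, stored in two different encodings.
sample_utf8 = "日本語".encode("utf-8")
sample_sjis = "日本語".encode("shift_jis")

if __name__ == "__main__":
    print(guess_encoding(sample_utf8))
    print(guess_encoding(sample_sjis))
```

Trial decoding breaks down quickly, because many byte sequences decode cleanly under several encodings. That ambiguity is why a real detector is harder to write than it looks.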

Here’s what happened:

The maintainer who kept this plumbing running for twelve years used Claude to rewrite it from scratch, then published the result under MIT instead of LGPL. The original author, who disappeared from public life in 2011, came back from the dead to object. More than two hundred and forty comments followed. Most of them were unhelpful.

What makes this interesting isn’t the law. It’s that every single participant in this fight chose a position that protects their ego over one that would have actually fixed anything.

Three licenses walk into a bar…

If you don’t live in licensing land, here’s the short version – three licenses matter for this story: MIT, GPL, and LGPL. They all let you use, modify, and distribute the code. The difference is what they demand in return.

MIT says: do whatever you want. Use it in your startup, your side project, your proprietary product. Keep the copyright notice, and we’re done. No further obligations. This is why companies love it – their lawyers read it once, nod, and move on.

GPL (General Public License) says: you can use and modify this code, but if you distribute the result, you have to release your entire program’s source code under the same license. That’s the “copyleft” mechanism, keeping free software free. The trade-off: it’s viral – if GPL code touches your code, your code inherits the obligation. Most companies treat GPL like radioactive material; legal teams won’t approve it for anything shipped to customers.

LGPL (Lesser GPL) is the compromise. It says: you can use this library inside proprietary software without the copyleft spreading to your code (that’s the “Lesser” part). But if you modify the library itself, those modifications must stay under LGPL. You must include the license, provide the library’s source, and let users swap in their own modified version. That last part matters: it’s meant for libraries like chardet – shared plumbing usable everywhere, but with improvements flowing back to the community.

The practical consequence: MIT code can go anywhere. GPL code can only go into other GPL projects. LGPL sits in between: any project can use it, but nobody can take improvements private.

The Python standard library requires permissive licensing (MIT, BSD, or similar) because anything in stdlib ships with every Python installation on earth, including inside proprietary products. LGPL’s requirements (source disclosure, relinkability, copyleft on modifications) make that impossible. That one restriction is what set this entire fight in motion.

The maintainer should have forked

Dan Blanchard has been primary maintainer of chardet since 2013, contributing nearly seven hundred commits versus forty-eight from the next person. He aimed to relicense the library for Python standard library inclusion – a goal on record since 2014. It’s reasonable: LGPL prevents stdlib inclusion, and chardet is one of Python’s most widely used packages.

So, he used Claude Code to write a clean reimplementation. Ran JPlag plagiarism detection. Got 0.04% average similarity to the old codebase. Published the design documents. Published the implementation plans. Released it as version 7.0.0 under MIT.

And he did all of this under the same package name, in the same repository, on the same PyPI listing.

That’s the mistake – not the rewrite, not the AI. Not even the license change in isolation. The mistake is claiming your code is “an independent work, not a derivative” while simultaneously shipping it as the next version of the thing you say it’s independent from.

You don’t get to have it both ways. If 7.0.0 is an independent work, it’s not chardet. Call it chardet-ng. Call it chardetect. Call it encoding-detector. Ship it as a new package. Let people migrate on their own terms. The Python standard library doesn’t care what the package is called on PyPI; it cares about the license and the code quality.

There’s a real question buried here that most people skipped over: who actually controls the PyPI listing? Dan has been the sole active maintainer for over a decade. Could he even publish a fresh package under the chardet name without owning the existing listing? PyPI doesn’t have a “fork the namespace” button. The infrastructure assumes continuity.

That’s a genuine constraint, not just a convenience play. But it doesn’t change the outcome – if the code is independent, it deserves an independent identity, even if the migration is harder.

But Dan didn’t do that, because the value isn’t in the code – the value is in the name. In the twelve years of trust built by that name. In the fact that thousands of requirements.txt files already have chardet in them.

He knows this. He said as much in the issue: “It’s not like this was a thing I just popped into last week.” Right. That’s the point. The name carries weight that the code, by itself, does not. And the name isn’t his to relicense.

The mob should have shown up years ago

More than two hundred and forty comments. People comparing Dan to a sex offender. People offering to fund lawsuits. People from a group called “Monadic Sheep” volunteering to take over the project. The FSF was invoked. DMCA was invoked incorrectly. Someone brought up trademark law. Someone posted a Rust rewrite just to prove a point.

Where were all these people for the last twelve years?

Dan maintained this library alone. No funding. No co-maintainers. No help. The other two people on the chardet team haven’t committed since 2017 at the latest – one of them not since 2012. The original author deleted his entire internet presence in 2011. This is one of the most depended-upon packages in the Python ecosystem, and it was held together by a single person on their own time.

Now that person does something people don’t like, and suddenly everyone has opinions about governance and stewardship and the spirit of free software.

Mike Hoye nailed it in the thread:

If the end state of open source projects is that devs are left to work alone for years on the keystone projects of this jenga tower we’re calling modern infrastructure, and then we collectively jump all over them when they turn to the kind of help that, however reprehensible it might be, actually shows up to help, then this entire FOSS project is just a popularity contest where the losers join a slow, lonely suicide pact.

That’s not comfortable to read. It’s accurate.

The people most outraged by this license change are people who benefited from Dan’s work for a decade without lifting a finger. Consumed the output of a copyleft license without contributing back. Relied on a single maintainer without offering support. And now they’re furious he made a unilateral decision without consulting them.

If you want a voice in how a project is governed, you have to be present when the project needs governing. Not just when it does something you don’t like.

The last time anyone other than Dan Blanchard contributed to chardet, Donald Trump was just starting his first term.

The AI optimists should stop celebrating

Armin Ronacher wrote a blog post about this called “AI And The Ship of Theseus.” He’s excited. He sees AI rewrites as a way to finally escape the GPL, which he views as a restriction on sharing:

If you throw away all code and start from scratch, even if the end result behaves the same, it’s a new ship.

With respect to Armin, whose work I deeply admire, this framing is dangerous.

What he’s describing is license laundering. Take copyleft code. Feed it to a model that was trained on that code. Ask the model to produce something functionally equivalent. Point at the output and say “look, no similarity.” The fact that a plagiarism detector can’t find matching tokens doesn’t mean the work is independent. It means the laundering was effective.

If this technique is legitimate, every copyleft project in existence is one Claude session away from becoming MIT. Or proprietary. The same trick works in both directions.
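To see why a low similarity score says little about independence, here is a toy sketch. It is not JPlag and does not reproduce its algorithm; it uses Python’s difflib over token streams as a stand-in, and the two snippets and the helper names are invented for illustration. Both snippets do the same job, yet they share only part of their tokens.

```python
import difflib
import io
import tokenize

# Token types that carry layout rather than content.
_SKIP = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}


def token_stream(source: str) -> list[str]:
    """Split Python source into its token strings, ignoring layout tokens."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tok.string for tok in tokens if tok.type not in _SKIP]


def similarity(a: str, b: str) -> float:
    """Ratio in [0, 1] of matching tokens between two sources (toy metric)."""
    matcher = difflib.SequenceMatcher(
        None, token_stream(a), token_stream(b), autojunk=False
    )
    return matcher.ratio()


# Two invented snippets with the same behavior and different surface form.
ORIGINAL = '''
def is_ascii(data):
    for byte in data:
        if byte > 127:
            return False
    return True
'''

REWRITE = '''
def is_ascii(data):
    return all(b < 0x80 for b in data)
'''

if __name__ == "__main__":
    print(f"identical: {similarity(ORIGINAL, ORIGINAL):.2f}")
    print(f"rewritten: {similarity(ORIGINAL, REWRITE):.2f}")
```

A token-level metric measures surface form. It cannot say whether the second snippet was written by someone who had never seen the first, which is the only thing a clean-room claim is about.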

Someone in the GitHub thread made the sharpest observation of the entire debate: take a leaked Windows source code dump, run it through an LLM, and release the output as open source. Is that acceptable? If not, explain why chardet is different. The mechanism is identical. The only variable is whether you sympathize with the copyright holder.

Ronacher also points out that:

Vercel happily reimplemented bash but got visibly upset when someone reimplemented Next.js in the same way.

He means this as a critique of hypocrisy. It’s actually a critique of his own position. Everyone is fine with license laundering when it benefits them. Nobody is fine with it when it’s their code being laundered. That’s not a principled stance. That’s convenience.

The celebration of AI-assisted relicensing as “exciting” or “progress” only works if you assume that copyleft licenses are a mistake that needs a technological workaround. If you think authors should have the right to choose how their work is used, this should terrify you. Not because the law is clear, but because it isn’t. And the people with the resources to push the boundaries are the ones who benefit from copyleft disappearing.

The man who defined open source says you already lost

Bruce Perens showed up in a separate chardet issue. If you don’t know the name: he co-founded the Open Source Initiative and wrote the Open Source Definition. He’s the person who decided what “open source” means.

His position should make everyone uncomfortable.

To the copyleft defenders, he said:

The courts have not sided with plaintiffs in finding AI work to be infringing so far, because the law as it stands today is built primarily around the concept of literal copying, cut and paste of the actual text.

The AI doesn’t do that. It produces statistically probable output from a blended model of everything it’s been trained on. The result is “unrecognizable as derived from any one source.”

Then he added this:

I was hoping the courts would go a different way than they have so far. My present conclusion is that the wrong side may have won, but they won.

Read that again. The man who defined open source licensing thinks the wrong side won. And he’s telling you to act accordingly.

To the AI enthusiasts celebrating, he offered no comfort either:

I am not evangelizing this. This might not be the world I would have liked to have, but it’s the one we got.

And to the companies wondering what to do, he was blunt:

I do not recommend rejecting an AI-mediated Open Source program with verified low-similarity to other works on the basis of legal risk at this time.

Not because it’s right. Because the law, as it currently works, doesn’t have the tools to stop it. “So, yes, it’s copying, but it’s not the kind of copying the court is going to prosecute as a copyright violation.” (He expanded on this in an interview with The Register.)

Perens is describing a world where copyright law was built for photocopiers and the technology outran it. The copyleft purists are right on principle. The AI laundering crowd is right on current law. And the gap between those two things is where every open source project now lives.

Nobody in the chardet thread wants to sit with this. The purists want to believe the law will catch up. The optimists want to believe the law was always wrong. Perens is telling both sides that the law is what it is, it’s probably not going to change fast enough to matter, and you should plan accordingly.

That’s a f*****g eulogy, not a victory speech.

The copyleft purists should stop pretending they’re owed

On the other side of the thread, people are arguing that Dan owes the LGPL, the FSF, and the original author something that goes beyond what the license actually says.

The LGPL says derivative works must be released under the same license. The core legal question is whether 7.0.0 is a derivative work. That’s a question for a judge, not for a GitHub comment thread. And it’s genuinely unclear. The JPlag numbers suggest structural independence. The fact that Claude was trained on the original code suggests something murkier.

Some commenters went further than the legal point: Dan should step down, never be trusted, his work is a supply chain risk, and twelve years of maintenance entitle him to nothing.

That’s not a principled defense of copyleft. That’s resentment wearing a license as a mask.

A license is a legal instrument. It grants and restricts specific rights. It does not create a moral hierarchy where the original author has permanent authority over a project they abandoned fifteen years ago, simply because they chose the license. The LGPL doesn’t say “the maintainer must defer to the original author in perpetuity.” It says derivative works must carry the same license.

If the code is truly independent, the LGPL doesn’t apply. If it’s not, the LGPL requires the license to be reverted. Those are the only two outcomes the license contemplates. “The maintainer should resign in shame” is not one of them.

Dan’s mistake was strategic, not moral. He should have made a new project. He didn’t, and now he’s in a mess. But treating him like he committed a crime against the commons, when he’s the only person who actually showed up to maintain the commons for over a decade, is selective outrage at its worst.

The only person who did their job right

Mark Pilgrim opened the issue: four paragraphs, no insults, no legal threats, no grandstanding.

I respectfully insist that they revert the project to its original license.

Then he stopped talking.

He made his position clear, provided his reasoning, and let others respond. He didn’t engage with the two hundred comments that followed. Didn’t threaten lawsuits or moralize. He stated a fact as he understands it and left.

Mark Pilgrim was a hero of mine. I grew up on Dive Into Python. His blog was one of those places where you went to learn how to think about the web, not just how to build for it. When he deleted his entire internet presence in 2011, it hit a lot of us hard. People called it an “infosuicide.” Whatever his reasons were, the web lost something real that day. Seeing his name show up in a GitHub issue in 2026 was – I don’t know. Not closure. But something.

Jason Scott vouched for Mark’s identity. And then he did something remarkable: he stayed out of it. Jason Scott is the kind of person who could give a two-hour talk on this topic without repeating himself. He’s an archivist, a historian, and nobody’s fool. The fact that he showed up, confirmed Mark was Mark, and then chose not to add his voice to the pile is restraint that most people in that thread couldn’t manage. Sometimes the most useful thing you can do is not talk.

The person with the most legitimate claim to outrage was the least outraged person in the thread. The person most qualified to add commentary chose silence. There’s a lesson in that, if anyone’s paying attention.

Meanwhile, in the real world…

While the GitHub thread debated philosophy, someone who works at NVIDIA opened a separate issue with a different framing entirely. No ideology. No license theory. Just a practical assessment from someone who has to get software approved by a legal team before it ships. (They explicitly noted their opinions don’t represent NVIDIA’s.)

The title: “v7.0.0 presents unacceptable legal risk to users due to copyright controversy.”

The conclusion:

chardet v7.0.0 is absolutely toxic. If my employer’s open source review legal people got wind of it, I seriously doubt that they’d approve v7.0.0 and up for any use under any circumstances whatsoever.

This is the part that should have stopped everyone cold.

Dan’s stated goal was to make chardet more widely adopted. Get it into the Python standard library. Remove the LGPL barrier that kept companies from contributing.

Instead, the rewrite made chardet less usable than it was before. Under LGPL, any company could use it freely – LGPL is specifically designed to be non-viral for library consumers. Under the disputed MIT, no company with a functioning legal team will touch it. Not because MIT is worse than LGPL, but because the provenance is radioactive. The license dispute itself is the contamination.

Dan’s own user is telling him, in effect:

I can’t use this anymore. The license isn’t the problem. The uncertainty is. And uncertainty is worse than the restriction ever was.

This is what happens when you optimize for a theoretical audience (stdlib committee, hypothetical future contributors) instead of the actual one (the thousands of projects that depend on your library right now). You trade a known constraint for an unknown risk. And in enterprise software, unknown risk is the one thing nobody can accept.

What this is actually about

This fight isn’t about chardet. It’s about three things crashing into each other at once, and nobody wants to untangle them because each thread, pulled separately, leads somewhere uncomfortable.

Thread 1: who owns a project? Open source has no good answer for what happens when the original author leaves and a sole maintainer carries the project for a decade. The license governs the code. Nothing governs the name, the reputation, the PyPI listing, the trust. Dan accumulated something real over twelve years. It’s not copyright. It’s not a trademark. But it’s not nothing, and the current system has no framework for recognizing it or limiting it.

Thread 2: AI makes clean-room arguments meaningless. The entire concept of a clean-room implementation (where you build something from scratch without ever looking at the original code, so you can prove your version is independent) assumes that knowledge contamination is binary. Either you’ve seen the code or you haven’t. LLMs break this model completely. The model has “seen” the code during training. The developer has seen the code during years of maintenance. The output has near-zero structural similarity.

Is that independence, or is it effective laundering? Nobody knows. No court has ruled. The first ruling will set precedent for every copyleft project in every language.

Thread 3: copyleft has a sustainability problem. LGPL kept chardet from entering the Python standard library for over a decade. It kept a sole maintainer trapped in a licensing box that actively prevented the project from growing. The license did exactly what it was designed to do, and the result was a critical dependency maintained by one exhausted person. If your license’s primary effect is preventing adoption and discouraging contribution, you should at least acknowledge that outcome before invoking the license as sacred.

None of these threads have clean answers. But here’s what would have worked: Dan creates a new project called chardetect under MIT. He announces that chardet 6.x is the final LGPL release and will receive security fixes only. He points the community to the new project. Mark has no grounds to object because the new project doesn’t claim to be chardet. The Python stdlib gets its MIT implementation. Everyone who depends on chardet can migrate on their own schedule. Nobody’s trust gets violated.

That didn’t happen because it would have required Dan to give up the one thing that made the rewrite valuable: the name.

The uncomfortable question

Pull up your dependency tree. Find the packages maintained by a single person. Check when they last committed. Check who else has commit access.

That’s your chardet. It’s sitting there right now. And when the maintainer finally snaps, burns out, or makes a decision you don’t like, you’ll have opinions about governance too.

The question is whether you’ll have earned them.

Further reading: Simon Willison’s analysis of the chardet dispute, Armin Ronacher’s “AI And The Ship of Theseus”, Bruce Perens in The Register, the original GitHub issue, and the legal risk issue.

Background: the original Mozilla research on universal character set detection that predates all of this, the US Copyright Office report on AI and copyrightability, and Google v. Oracle on API copyrightability and fair use.