tedivm 9 hours ago

I think calling this "scraping" is hiding an important point. Scraping tends to mean crawling the web and pulling down stuff people put up. In the US (and yes, I know that this is a UK article) it is legal to scrape, with the idea being that if people responded to the request that's on them. This is why lots of sites have login gates to access data, since to log in you have to agree to a TOS.

AI companies haven't just been scraping, they've been pirating. I think it's important because people consent to put their comments on websites, but authors don't consent to have their work stolen. Mixing the two together as if they are the same is an attempt by AI companies to muddy the ethical waters a bit more than I'm comfortable with.

  • graemep 9 hours ago

    It's not necessarily OK with comments. A lot of people posted comments before LLMs, and a lot of people posted comments in places with T & Cs that said the comments would be used in connection with the service.

    Because I consented to a comment being posted on a website to be read by people does not mean I consented to it being used to train LLMs.

  • 1vuio0pswjnm7 7 hours ago

    FWIW, the proposed amendment never uses the word "scraping". It's only used in the title from The Register.

  • alganet 8 hours ago

    Terms of service cannot override the law.

    For example, lots of TOS make the works of users property of the platform owners. All of that can stop being relevant if that practice becomes illegal.

    People consented to put their comments up publicly; they didn't consent to feeding a machine that steals their ideas. One ruling establishing that the wording on the terms is not good enough, and all of that can fall.

    • tedivm 7 hours ago

      No one said that TOS's override the law.

      If you shout something in public, someone might hear you. If you shout in a locked house they won't hear you unless they break in.

      If you post something publicly on the internet, then people can look at it. If you post something privately on the internet, people have to agree to your terms to view it.

      That's it, that's the logic the government uses to allow website scraping.

Waterluvian 9 hours ago

It feels like a data laundering problem. Within the models the data still exists in some form, but it’s been laundered with so much other data that they feel it’s sufficiently untrackable and clean.

  • sixtyj 8 hours ago

    It is data laundering. When the OpenStreetMap project started, it was the same situation. You take data, use them, clean them, and voila… they are open source

ksymph 9 hours ago

The neatest outcome IMO would be to grandfather in data from before ~2022, but strictly require consent for anything new going forward.

Enforcement would be kind of impossible though. How would you prove the age of the data? And what about for things like websites - if a website from 2015 gets scraped in 2025, where does that fall?

So, maybe not a very practical solution. But I think the ideal outcome is one where AI companies don't have to throw away progress from the past few years, but artists have control over their participation going forward. It's hard to say if there's any solution that can respect both.

More likely than not, one party is going to be shafted; and it's probably not going to be the one with the money behind it.

  • herbturbo 8 hours ago

    I personally don’t see why AI companies should be allowed to hold on to any of the “progress” that came from theft of intellectual property.

f1codz 9 hours ago

"you can't enforce the law if you can't see the crime taking place"

Isn't this the key?

If AI can learn how to write a song like the Beatles, is it a crime that it has learnt it, or is it a crime that someone can use AI to produce work that resembles those artists' creations?

Maybe the control should be put on preventing AI from plagiarizing, instead of on measures to prevent scraping specific content from the open web.

  • kstrauser 8 hours ago

    The obvious difference would be that a human doesn’t regurgitate someone else’s work as-is and represent it as their own, at least not without enormous reputational damage.

  • herbturbo 8 hours ago

    It wasn’t scraped from the open web though was it?

realo 9 hours ago

I am a human. I can listen to music, look at videos, movies, read books etc etc...

If I were a musician, I could pull upon my life's experience of music to create new, unheard before, music in any style I chose.

My music would belong to me because I created it.

How is it different for an "innocent" AI who does exactly the same thing?

  • awkwardpotato 8 hours ago

    A musician doesn't just pull from their life's experience "of music" to create music. They pull from their entire lived experience, their emotions, the experiences and emotions of their community and world.

    An "innocent" AI cannot and does not have any of that. To say all a musician does to create new music is "listen to [existing] music" is reductive and ignores the inherently human expression that is music.

  • mihaic 7 hours ago

    The question itself is wrong, since it personifies the AI, when in practice there's always a human entering prompts, and at the same time it ignores the consequences of the generated material on the ecosystem from which it was generated.

    Or, consider an analogy: how would you feel if some machine could use DNA from strands of hair to clone entire humans? Wouldn't it feel wrong?

    • realo 7 hours ago

      Oh yes. Very wrong.

      But how are we going to have the trans-humans required to be able to keep humanity's existence after AGI takes control of the world!

      Don't take this too seriously... ;)

  • sceptic123 8 hours ago

    It's different because you haven't been given free access to all of the music/videos/movies/books in the world to learn from.

    AI also doesn't have life experience to pull from.

    Who does the music that AI creates belong to?

  • qwery 6 hours ago

    Your entire line of questioning is disconnected from reality. How could you possibly think that the machine can do that? I'm sorry that that sounds so severe, but it's the best way I could describe it.

    Creation doesn't work that way. Music(ian) doesn't work that way. Your innocent AI lifeform doesn't exist. The technology does not experience in any way, let alone like a human does.

    When you read the bitstream of an mp3 does it sound like the song you hear if you play it in Winamp?

  • herbturbo 8 hours ago

    Because what you listened to was actively offered to you by its creator, and you paid for it, or you were supposed to.

    You would also only hear the songs you had discovered or had been introduced to, at the speed they were meant to be listened to. The music would also often be part of a scene that might include different underlying philosophies or associated fashion.

    Your new music would be the outcome of all of those individual musical experiences, coupled with your creative and technical ability.

    None of that is true for AI. It has just harvested everything without anyone’s permission and now makes artless slop from it for anyone who wants it 24/7.

  • alganet 8 hours ago

    AI is not human. It doesn't have feelings, children, pupils, family, goals, ideals.

    It's very different.

    • realo 8 hours ago

      Oh?

      Then please bear with me and extend the comparison just a tiny bit.

      Replace the AI with an advanced species from Mars. Or even better yet an augmented human ... a trans-human if you will.

      Both the Martian and the trans-human have kids, families, feelings, etc... AND both can absorb the entire internet in, say, a week or two. They are musicians and create new music from their internet experience.

      Does their music belong to them?

      Do they have to pay rights to all the authors they listened to so far on the internet?

      And finally ... what is the difference between my martian and a human today?

      • awkwardpotato 8 hours ago

        What a wild false equivalency...

        This article isn't even a discussion on AI copyright ownership. It's about training.

        What if an evil robot race came along and began draining your Martians of all their history, expressions, and entertainment? Would the Martians not be within their rights to go "hey, don't do that"?

        Parent edited rather than responded:

        Your "martians" already did pay for the rights to access the music (streaming services, cds, etc)! And if it's highly derivative they'll continue to have to pay (royalties). OpenAI/Meta/Anthropic did not pay, that's the inherent issue. They took when they should not have had the right to access, and trained against it.

        The question is not the difference between your mythical Martians and humans, it's the difference between your Martians and AI. Your Martians, like humans, have lived experience. They have emotions, fears, beliefs. They have family, children, loved ones. This all affects how they would express themselves. An "AI" has none of this.

      • alganet 8 hours ago

        You only made my argument better, thanks.

    • TiredOfLife 8 hours ago

      I don't have like half of the listed things. That means I am not human?

      • alganet 8 hours ago

        You can understand those things, even if you don't have them. Anyone that sees you can understand that you are human and capable of sharing those experiences.

hu3 8 hours ago

Everything humans create is a product of:

- a pre-trained model: DNA

- and a prompt: environment/input

Is that much different from AI?

  • qwery 6 hours ago

    What is this supposed to mean? You could replace "humans" and "AI", in either place, with "plants", "E. Coli", or "cane toads", &c. and it would make just as much sense.

  • the_snooze 7 hours ago

    If it's all the same, then that implies AI has no value-add and thus there's no point in developing it.

    • hu3 7 hours ago

      Is it?

      Humans have hit a hard wall of intelligence limitation for a long time now.

      AIs, on the other hand, are just beginning.

  • herbturbo 8 hours ago

    Real music comes from the soul.

DarkWiiPlayer 10 hours ago

My opinion continues to be that AI companies should have to prove that they have consent to use any and all data their models are trained on.

That is, be able to prove a) that their models were actually trained on the data they claim, b) that they have consent to use said data for AI training, and c) that this consent was given by the actual author or with the author's consent.

I want platforms like soundcloud, youtube, etc. to be required to actually send out an e-mail to all of its users "hey we will be using your content for AI training, please click here to give permission".

  • rafaelmn 9 hours ago

    Even if you can enforce this somehow, other countries will not. Unlike copyright and patent law in consumer products and content, getting an upper hand in the AI race could have huge implications down the line. So the only government that would enforce this is the one that has no chance of competing in this space in the first place (EU)

    • rapind 9 hours ago

      AI poisoning might be the answer, but it needs a business case. Some sort of SaaS that artists can pay for to process their content that will flood and poison the crawlers.

      • ronsor 9 hours ago

        AI poisoning—or rather how artists think AI poisoning works—is largely a myth that doesn't work in practice with these large foundation models.

    • dopidopHN 9 hours ago

      Not with that attitude for sure. If the US and/or the European Union do that, it’s already a big chunk

    • dbg31415 9 hours ago

      Let’s be honest - this is an argument that “the ends justify the means.” But that kind of reasoning should make all of us uneasy. Where do we draw the line? If we eliminated a third of the world’s population to stop global warming, would the noble goal make it acceptable? Clearly not.

      We can’t ignore the ethical cost of how AI is being developed - especially when it relies on taking other people’s work without permission. Many of today’s most powerful AI systems were trained on vast datasets filled with human-made content: art, writing, music, code, and more. Much of it was used without consent, credit, or compensation. This isn’t conjecture - it’s been thoroughly documented.

      That approach isn’t just legally murky - it’s ethically indefensible. We cannot build the future on a foundation of stolen labor and creativity. Artists, writers, musicians, and other creators deserve both recognition and fair compensation. No matter how impactful the tools become, we cannot accept theft as a business model.

      https://arstechnica.com/tech-policy/2025/02/meta-torrented-o...

    • sofixa 8 hours ago

      > So the only government that would enforce this is the one that has no chance of competing in this space in the first place (EU)

      Mistral waves hello. They're alive and well, and competing well.

      Also, while the AI Act and copyright are handled at the EU level, I always get the impression that anyone talking about a "EU government" simply doesn't understand the EU. If you think Germans or Slovaks are rooting for Mistral just because they're European you'd be wrong - they'd be more accepting of it, maybe, due to higher trust in them respecting privacy and related rights, but that's.

  • docdeek 9 hours ago

    > I want platforms like soundcloud, youtube, etc. to be required to actually send out an e-mail to all of its users "hey we will be using your content for AI training, please click here to give permission”.

    Wouldn’t sites like YouTube already have a license to make money off your content anyway? This might be a little out of date but it notes that even though you own the material you upload to YouTube, by uploading it you grant them a license to make money off it, sub-license it to others for commercial gain, make derivative works etc. IANAL but this suggests to me that if you upload it to YouTube, YouTube can license it to OpenAI without needing to inform you or get additional consent. [0]

    [0]: https://www.theguardian.com/money/2012/dec/20/who-owns-conte...

    • lawlessone 9 hours ago

      citing an article from 2012? I don't think much of this kind of training was happening then

      • docdeek 8 hours ago

        I agree - though I also imagine that the T&C's were deliberately broad enough to ensure that they could adapt to what has emerged.

  • simonw 9 hours ago

    Should an AI model be able to answer the question "which team won the superbowl in 2023" if there are thousands of articles out there containing that information but not a single one of them has been licensed for use by AI?

    • DarkWiiPlayer 9 hours ago

      If you could separate the information from the intellectual property, sure; but if the model is also capable of generating a similar article, that's the point where it starts infringing on the IP of all the authors whose articles were fed into the model.

      So in practice, no, it shouldn't. Not because that information itself is bad, but because it probably isn't limited to just that answer.

      In summary, I think it is definitely a problem when:

      1. The model is trained on a certain type of intellectual property
      2. The model is then asked to produce content of the same type
      3. The authors of the training data did not consent

      And slightly less so, but still questionable when instead:

      2. The IP becomes an integral part of the new product

      which, arguably, is the case for any and all AI training data; individually you could take any of them out and not much would happen, but remove them all and the entire product is gone.

    • dingnuts 9 hours ago

      No.

      That's a funny example since broadcasters have to pay a fee to say "The Super Bowl" in the first place. If they don't, they have to use some euphemism like "the big game."

      The answer is definitely no. You cannot use something that you don't have a license for unless it belongs to you.

      • simonw 9 hours ago

        I didn't know that about euphemisms, that's a great little detail - makes this hypothetical question even more interesting!

        (For what it's worth, Claude disagrees and claims that news organizations ARE allowed to use the term Super Bowl, but companies that aren't official sponsors can't use it in their ads. But Claude is not a lawyer so <shrug>)

  • amelius 9 hours ago

    > please click here to give permission

    I want "please mail back this physical form, signed".

    It's way too easy with dark-patterns to make people inadvertently click buttons. Or to pretend that people did.

  • detectivestory 10 hours ago

    I'm pretty sure soundcloud has already done this. I don't believe they gave an option to opt-out though.

    • DarkWiiPlayer 9 hours ago

      Then they are stealing people's content and imho should be punished for it. It is baffling that we let companies get away with "if you don't opt out you agree" or even "you can't opt out, delete your account or you agree" and often hide that in generic sounding terms & conditions updates.

      Again, I think we should require companies to get the user to actively give their consent to these things. Platforms are free to lock or terminate accounts that don't, but they shouldn't be allowed to steal content because someone didn't read an e-mail.

alganet 8 hours ago

What a dilemma. Isn't it?

Fight for big copyright, or let it all be scraped?

It's not just about famous works now. We live in a gray area era. Every post on social media, tap on a keyboard, click, can be used to train AI.

Who owns that intelligence? It feels like AI products should acknowledge the names of billions of contributors. Common people, feeding the machine, mostly unknowingly.

So, copyright could be an unusual ally. It's often a villain for common folk, but again, we live in a gray area era regarding those rights. Copyright has a precedent with pedigree. Most scraped content is not famous songs and movies, it's random internet stuff.

We need to remember we want AI, but we also want good songs and movies for entertainment, and we also want to have our lesser works protected (internet banal stuff). There is a fine line connecting all that stuff somewhere.

  • qwery 6 hours ago

    I don't see the dilemma. Content laundering as a service shouldn't be incentivised by the government.

  • sceptic123 8 hours ago

    Who actually wants AI though?

    • alganet 2 hours ago

      Scraping affects those who want and those who do not want AI, it's a commons issue.

    • senordevnyc 7 hours ago

      Presumably myself and the other 20 million people paying for ChatGPT alone, not to mention all the other AI products out there. And then the hundreds of millions using it for free. I'm guessing they all want it, but idk.

empath75 9 hours ago

I think we would all like to think that policy is formed from a dispassionate view of the issues at hand and everyone works to produce fair outcomes for everyone, but what is going to happen with regard to AI is that whichever policy produces the most economic "value", however that is defined, is going to win out.

For example, pretty much everyone agrees that the current copyright regime that allows large corporations to hold copyrights on vast libraries of content for near perpetuity isn't what's best for society, but it has earned a lot of companies and a few people a lot of money with which they can influence politics, and so it remains.

Now, it seems that there is a lot of money to be made through training AI on these vast libraries of material, and the companies making that money can use it to influence politics to allow it.

There is, of course, the remote possibility that policy on this topic is going to be formed by rich and famous people persuading people of the rightness of their cause, but I don't have high hopes for that outcome -- and the even remoter chance that the much larger number of "creatives" on the lower rungs manage to band together and lobby over it.

This whole conversation is basically just a public negotiation between corporations on who gets to make the most money from "content" that they didn't create.

  • sorcerer-mar 9 hours ago

    I don't think your framing nor your conclusion is correct. Copyright law wasn't written to make Party A or B wealthy, it was written to protect the incentive to produce things that society values.

    It is very clear that a huge portion of AI's value is specifically the destruction of incentive to the original creator of the work, ergo courts will over time find that AI will have to pay for that right. Or courts will have to decide we no longer value music, art, literature, etc. made by humans -- seems like a long shot to me.

    • thomastjeffery 8 hours ago

      Or we can recognize that virtue is not equivalent to value.

      Just because copyright was made with good intentions does not mean it's actually helping. It's pretty clear that copyright is used almost entirely to hoard wealth.

    • empath75 8 hours ago

      > I don't think your framing nor your conclusion is correct. Copyright law wasn't written to make Party A or B wealthy, it was written to protect the incentive to produce things that society values.

      That's not really different from what I said. "What society values" is what people spend money on, and that money is used to influence policy.

      > Or courts will have to decide we no longer value music, art, literature, etc. made by humans -- seems like a long shot to me.

      The courts only interpret the law and it's pretty clear to me the law is going to change.

hndownfall 9 hours ago

Hahaha now pull the other one.

sam_lowry_ 8 hours ago

Paul McCartney does not have the rights to some of his songs, including the most popular ones.

He still sucks up to the copyright lobby.

Pathetic old man (

superkuh 10 hours ago

I can't think of a group of people less qualified and less informed to comment on the issues at hand except politicians. But like usual the weavers will get upset about mechanical looms. And these weavers have big enough money to rock the boat.

  • prophesi 10 hours ago

    > And these weavers have big enough money to rock the boat.

    But they're up against multi-billion dollar companies, and the richest man in the world appointed by the US president to dismantle federal oversight.

    • bilbo0s 9 hours ago

      The weavers don’t care.

      In four years there will be yet another president. Probably even from a different party the way things are going. They know the system swings back and forth like that. These people have been working that system for over a hundred years now.

      How do you think they got d@mn near a century of IP protection on a fricken mouse? Who’s in power doesn’t matter to these people.

      • prophesi 8 hours ago

        I'd argue the same goes with Big Tech. I agree the length of copyright protection has gotten out of hand, and I'm normally against any expansion of copyright law, but there are several aspects of training AI that seem blatantly problematic in the way it's currently carried out.

  • evrimoztamur 10 hours ago

    I wanted to see if you yourself were qualified to make this statement but your SSL certificate is expired.

    • superkuh 9 hours ago

      Nah, your browser is a corporate one with only corporate use cases that only accepts certs with extremely short lifetimes. I self-signed the cert so it's good till 2050. You can always turn off JS and access the HTTP version of the site.

      As for my qualifications, I've trained a very small (~0.8B) LLM myself on 3GB of IRC logs. So I kind of know what I'm doing and have a basic understanding of the theory involved and the pragmatic issues.

      • sorcerer-mar 9 hours ago

        Is that meant to be a humorous claim to authority?

      • cush 9 hours ago

        "My opinion is more important than the opinions of the greatest musicians of all time, because I once built an LLM myself on 3GB of IRC logs" - superkuh, 2025

        • superkuh 3 hours ago

          It's like thinking the greatest musicians of all time have reliable expert opinions on audio electronics. Sure, it's technically music related but the skill set doesn't really transfer to, say, ADC design.

          I'm not saying I'm great. If we go back to my original comment which started this thread, I'm saying this group has few relevant qualifications besides celebrity and profit lust. Which means even I know more than them about generative and LLM AI training, and the regulatory implications, despite just being a "hobbyist" in the domain.

      • evrimoztamur 9 hours ago

        Stock iOS Safari, and I do not accept your beginner-hobbyist credentials and superficial comments on genuine political issues concerning copyright law and fair labour compensation, not copy-pasted Python tutorials.

        • hu3 9 hours ago

          > Stock iOS Safari

          to be fair that's not a bastion of quality either.

          • evrimoztamur 8 hours ago

            What do you mean by quality, exactly?

            My up to date list of recognised certificate authorities in one of the two major smartphone OSs does not validate his certificate. This is not a matter of 'quality' of Safari, or iOS.

        • superkuh 3 hours ago

          That's fine. I understand. Only corporations matter. And this is a corporation with celebrity names associated! Surely that means the people involved know what they're talking about more than a random person with direct experience. After all, that person isn't trying to make money. Their opinion can't possibly matter if they're not trying to make money.

  • detectivestory 10 hours ago

    Yes, they should have run with "More than 400 of the UK's leading media and arts professionals" as the headline. But realistically, they are probably trying to appeal to a certain reader who won't be found in the comments section of a HN article.

  • SirFatty 10 hours ago

    Sounds like you're stereotyping... do you say that because of their age or their profession?

    • superkuh 9 hours ago

      Their profession. I doubt they understand what generative AI or large language model training actually involves. It's obvious from the ideology which comes up with an absurd law like, "If your computer has something I've made in ram at any time you must tell me about it."

      I do admit there are many quite informed and technically capable professional musicians but I doubt they'd support a law like this.

  • mrkeen 10 hours ago

    > The group [...] said the amendments tabled for the Lords debate would create a requirement for AI firms to tell copyright owners which individual works they have ingested.

    They are demanding to be informed.

    • cosmotic 10 hours ago

      That list would be practically endless. It might make more sense to list what wasn't used.

      • kstrauser 10 hours ago

        That sounds like a them problem. Sorry if it’s inconvenient to cite your sources, but that’s life.

  • pantulis 10 hours ago

    What are the necessary qualifications?

  • amelius 9 hours ago

    > But like usual the weavers will get upset about mechanical looms.

    Except the mechanical loom folks are dependent on the IP of the weavers.

    In other words, not a good analogy.

  • 9283409232 10 hours ago

    Fans of AI see artists like icemen complaining about refrigeration but it is not the same thing. Generative AI doesn't exist without their prior work and it is used to copy it 1 to 1. Make them say things they didn't say or make art they didn't make. It is fair that they want transparency, consent, and payment.

    • imgabe 9 hours ago

      If someone uses AI to produce an infringing copy of an existing work, the person who prompted the AI to do that can be sued just like if someone hand drew a copy or used a photocopier to make duplicates. That isn’t unique to AI.

    • DarkWiiPlayer 9 hours ago

      Indeed; this is like stealing the icemen's ice to fuel refrigeration. If some technology makes your job (partly) obsolete, too bad for you. But you shouldn't be forced to contribute to this technology against your will.

  • surfingdino 10 hours ago

    Are you saying that people who are affected by the AI's impact on their income/wealth are the least qualified to be concerned about it?

    • gloxkiqcza 10 hours ago

      AI is a tool. Just like with a drum machine or a DAW, you need to be a musician to be able to use it to create something worthwhile. And just like sampling, drum machines and DJing didn’t kill acoustic music, AI won’t either. It will merely create a new type of music that will coexist along with all of the other types of music just fine.

      AI just raises the level of abstraction and therefore the capabilities of an individual.

      • etblg 35 minutes ago

        Since you brought up sampling in music and I feel compelled to point this out in any AI thread when that gets mentioned:

        sampling machines don't give you a free pass to sample music all willy nilly, if you're going to publish the result for commercial gain you have to clear the sample, the original artist is getting royalties from it. This is something that was fought for and won by musicians: https://en.wikipedia.org/wiki/Grand_Upright_Music,_Ltd._v._W....

        (Not that you implied otherwise, I just want to point that out)

  • sp527 10 hours ago

    This is a very primitive, pre-genAI type of thought process.

    People like you can only see copyright infringement when it's blatantly staring you in the face, like Studio Ghibli style AI images. Why is it obvious in that case? Because we have a large enough preexisting sample of Studio Ghibli style frames from anime to make it obvious.

    But move closer to zero-shot generation, which anyone with a modicum of knowledge on this subject understands was the directional impetus for generative AI development, and the monkey brain short circuits and suddenly creatives who want to protect their art can go fuck themselves, because money.

    You may not find common cause with multi-millionaire artists trying to protect their work now, but you certainly will in hindsight if the only fiscally-sustainable engines of creativity left to us in the future are slop-vomiting AI models.

Dumblydorr 10 hours ago

Multi-millionaires who got very lucky demand that tech companies come clean. Yeah, I doubt they’ll find allies in the populace, or find most of tech sympathetic. Use your 100s to wipe your tears, Paul.

  • jakeinspace 9 hours ago

    You know the general populace hates big tech far more than big-time artists, right? The profession is moving in the direction of lawyers and investment bankers in terms of public trust.

francisofascii 9 hours ago

We all "stand on the shoulders of giants." Much of our creative work is a derivative of earlier work. Humans and AI are no different.

  • 4d4m 7 hours ago

    Lol but laws are laws, and copyright law is exceedingly clear.

    The vibe that all creative work is based on something from earlier doesn't hold up in court. Breaking copyright isn't legal, it's theft.