Borrowed knowledge
I didn’t build the industry I work in, nor did I originate the concepts I use daily. I am paid to curate vast amounts of borrowed knowledge and reconfigure it to solve specific, novel problems. Generative AI does the same thing — but it broke the deal.
The ways I have obtained this knowledge vary — thinking in AI terms, my “training data” is the collection of all the inputs that have made their way to me over the years. Some of them I can specifically recall and credit. After some reflection I can even ascribe a certain level of importance to them. But the majority of inputs have happened, changed me, and been forgotten. I couldn’t tell you every source that taught me how to speak English, nor could I create a weighting describing the relative credit each source deserves for the voice that is “uniquely mine.”
But there is a critical distinction between my training data and the data powering the current wave of Artificial Intelligence.
My inputs were paid for. As social creatures, we understand that while some inputs are shared freely (family, friends), valuable expertise costs something — whether it is tuition, the price of a book, or the social currency of a conversation. We have built an entire economy around the exchange of knowledge.
Generative AI has broken this economy. It is ingesting the sum of human experience without upholding the social or financial contract of the past. By treating human creativity as a free raw material, we are not just cheating creators; we are sprinting toward an intellectual heat death — a closed loop where the internet feeds on its own exhaust.
Time and value
It is worth pausing on the concept of time to understand what we are about to lose. Industrialisation, and Henry Ford in particular, instilled in us the conviction that each unit of our time has a distinct value and can be “owned” by someone other than ourselves. I work eight hours a day for someone else, and during that time I am expected to deliver value only to the organisation paying for my time.
With the internet, people began freely sharing information created in their “free time.” Because we hold this “time is valuable” mindset, it was only natural that ad-supported content would arise. I desire new inputs, but I have a limited budget; I get free access, and the creator gets paid by an advertiser. It was a neat way of getting a third party to pay for the “free time” the creator gave up in order to do the valuable work of sharing their knowledge.
By 2021, the state of knowledge acquisition was, at best, a tense truce. While search engines like Google held a near-monopoly on attention — forcing publishers to optimise their work for an opaque algorithm in a lottery for clicks — the basic economic loop remained intact. They scraped the content, but they sent the traffic back to the original source.
The arrival of the black box
In 2022, OpenAI released ChatGPT, and the logic of the internet changed overnight.
The underlying technologies of Generative AI and LLMs are trained on trillions of inputs that humanity has created over the years. This threatens the established patterns of knowledge creation, sharing, and crediting detailed above.
Part of the fascination with LLMs comes from how much they resemble us. We use anthropomorphic language to cement this. Models are “trained.” They “learn.” They claim to “create.” Because they generate content that looks and sounds like a human wrote it, we naturally start to trust the output as a valid input.
But anyone who has interacted with GenAI for any length of time has had the experience where, after the initial awe, a vague disappointment sets in. The built-in perkiness and over-confident answers are a facade, like talking to a boorish know-it-all after a couple of martinis. Underneath that facade we start to intuitively recognise a technical truth: GenAI is boring.
It regresses to the mean. It returns the response most likely to be the “right” combination of words based on the average of its training data. It flattens.
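That flattening can be sketched in a few lines. The corpus and counts below are invented for illustration, and a real model works over billions of parameters rather than a word table, but the greedy logic is the same in spirit: always emit the statistically safest continuation.

```python
# Toy illustration of "regressing to the mean": a model that always emits
# the most frequent continuation seen in its training data flattens every
# voice into the average one. The corpus is made up.
from collections import Counter, defaultdict

corpus = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "the cat slept on the mat",
]

# Count which word follows each word across the corpus.
follows = defaultdict(Counter)
for line in corpus:
    words = line.split()
    for a, b in zip(words, words[1:]):
        follows[a][b] += 1

def most_likely_next(word):
    """Greedy choice: the single most common continuation -- the 'average'."""
    return follows[word].most_common(1)[0][0]

print(most_likely_next("sat"))  # always "on"
print(most_likely_next("the"))  # "cat" beats "mat" and "sofa"
```

The interesting minority answers ("sofa", "slept") are never produced; the mean always wins.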
This feels eerily familiar. Go on LinkedIn or any social media platform pre-2020 and you will find plenty of real humans creating “unique” content with the same flaws of mistruth and over-confidence. Certain countries even elected presidents for whom this was the main calling card.
The problem with GenAI
The problem isn’t GenAI itself. It is the way it has been sold and adopted as a source of truth despite its known limitations. It is being applied to as many domains as possible, as fast as possible.
Even in the face of demonstrable, frequent inaccuracies, we keep going back to ChatGPT, Gemini, and Claude. They tell us upfront — “We make mistakes, please double check responses” — but they put it in small print below convincing-sounding answers. They employ thousands of engineers, paid unheard-of amounts of money, to create an engaging experience that keeps the user coming back for another turn in the conversation rather than taking the information away and fact-checking it.
This article is not intended to convince you of the dangers of AI. The fact is that Generative AI is earning its owners a lot of money. Because these tools only exist due to the massive amount of human-generated content available to them, we need to have the conversation about crediting and rewarding the creators of that content. The old structures won’t get us through what is developing now.
Originality and fairness
We all have a sense of fairness that gets triggered when someone copies work. But this usually only applies if the work is deemed “creative” or “new.” No one gets offended when I copy everyone else and stand on the right side of the escalator.
“New” is tricky. The sentence I just wrote hasn’t existed before in this exact form. In that narrow sense, it is new. But the concepts and word patterns were learned from the inputs I was trained on. I have just reconfigured them in a novel way.
I would argue that the issue of fairness only raises its head when there is something to be gained from the content. We don’t like it when someone copies an answer for a good grade, or when someone tries to look smart by claiming an idea that isn’t theirs.
Getting what we pay for
Let’s get back to the original idea. I am paid because of what I know and how I apply it. That money is mine. I don’t have to pay a royalty to my high school teachers who ensured I could form decent sentences. I don’t have to tip the person who trained me at my first bartending job every time I make a cocktail.
All of these inputs were paid for at the time. They were either purchased or freely given as part of the social contract of being human. By our commonly accepted values, my “debt” to the people who trained me is paid.
Here lies the issue. Massive amounts of human-generated information have been leveraged to create products that profit at never-before-seen rates. The companies profiting did not generate the information, and we have disregarded the frameworks that dictate equitable credit. Our systems worked when the “intelligence” being trained was human. We cannot assume those same systems apply to Artificial Intelligence just because there is a superficial resemblance between us and them.
A grain of salt
We instinctively annotate anything another person says to us with what we know about that person. If both Robert De Niro and Selena Gomez tell me something about being a celebrity, I trust them because I know they have experience. But I also take into account their differences in age and background.
As humans, we rely on these impressions. This latent filtering of information is integral to how we operate in the world. We rely on it to know who to trust. We know when to take things with a grain of salt.
LLMs negate this faculty. In their current state, it requires an unreasonable level of technical research to understand “where this model is coming from.” Once that LLM is used to power an application like ChatGPT, it becomes impossible to consider the source.
When meeting a stranger, we slowly build knowledge and expect a consistent backstory. The “strangers” of ChatGPT are backed by billions of advertising dollars. They are trained by psychologists recruited from addictive technologies like social media and mobile games. They are motivated by goals dictated by their programming, goals that can change constantly and without warning.
If I introduced you to someone at the pub who fit that description, you would correctly not trust a single word they said. Yet here we are.
We have a problem
I am raising two distinct problems that are intrinsically related:
- LLMs drive significant revenue for their owners without rewarding the creators of the inputs they were trained on.
- LLMs operate as a black box, giving users no indication of the source of their information, biases, or intentions.
Because these resemble problems we have known for thousands of years, it is easy to think our existing systems can handle them. Copyright laws should cover problem 1. Education and experience should help us with problem 2.
We are wrong. Fair use laws have no precedent for the scale at which AI is ingesting human knowledge. And as for human instinct? Tech companies have decades of experience wielding psychology as a weapon. They are better at it than we are.
Transparency and attribution
It would be absurd to ask a human to wear all their inputs on their sleeves. It would be impossible to have a human articulate their agenda for every conversation.
But LLMs are not human. The tech companies claim that attribution is impossible, that the “neural network” is a black box that cannot cite its sources. This is a choice, not a technical limitation.
Architectures like Retrieval Augmented Generation (RAG) exist today. These systems force the AI to look up information in a trusted database before answering, effectively showing its homework. We can build machines that cite their sources. The industry simply chooses not to because it is cheaper and faster to feed us the “average” of the internet without accountability.
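A minimal sketch of the idea follows. The two-document corpus, the source URLs, and the naive keyword scoring are all invented stand-ins for a real vector database, but the shape is the real one: the answer is assembled only from retrieved passages, and every passage carries its source.

```python
# Minimal sketch of retrieval-augmented answering with attribution.
# Corpus, sources, and scoring are hypothetical; the point is only
# that every answer carries the citations it was built from.

CORPUS = [
    {"source": "alice-blog.example/espresso",
     "text": "Espresso is brewed by forcing hot water through finely ground coffee."},
    {"source": "bob-wiki.example/teapots",
     "text": "A teapot is a vessel for steeping tea leaves in hot water."},
]

def retrieve(query, corpus, k=1):
    """Rank documents by naive keyword overlap with the query."""
    words = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda doc: len(words & set(doc["text"].lower().split())),
        reverse=True,
    )
    return scored[:k]

def answer(query, corpus):
    """Answer from retrieved passages only, citing each source used."""
    docs = retrieve(query, corpus)
    return {
        "answer": " ".join(doc["text"] for doc in docs),
        "sources": [doc["source"] for doc in docs],  # the shown homework
    }

result = answer("How is espresso brewed?", CORPUS)
print(result["sources"])
```

Production systems replace the keyword overlap with embedding search, but the attribution step costs nothing extra: the sources are already in hand at the moment the answer is generated.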
The political headwinds
The political landscape is shifting against us. In July 2025, President Trump stated that forcing AI companies to pay for training data is “not doable”. Federal judges have recently ruled that training on copyrighted books is “spectacularly transformative”.
That word, “transformative,” is doing a lot of heavy lifting. The legal argument is that because the AI doesn’t reproduce a book verbatim, but rather absorbs it into a statistical model alongside millions of other works, the output is something new. This is the same logic that would let me photocopy every cookbook ever written, blend the pages into pulp, and sell the resulting paper as “transformative art.” The original recipes are unrecognisable, so no harm done.
When the people in power accept this framing, they are saying the economic value of human creativity is zero once it can be consumed at scale.
What needs to happen
The patterns being established now will determine whether human creativity remains an economically viable activity. Without attribution and compensation, we risk creating a future where the only rational choice is to stop producing original content.
The technology to implement attribution exists. We use it for advertising and analytics every day. The question is whether we will demand its application to AI before current practices become too entrenched.
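One way that could look in practice, sketched below: the same counting machinery ad networks use for impressions, repurposed to meter how often each creator’s work informs an answer. The class, the source names, and the proportional revenue split are all hypothetical.

```python
# Hypothetical sketch: ad-style impression counting repurposed as a
# royalty meter. Every time a source is cited in an answer, it earns
# an "impression"; a revenue pool is then split in proportion.
from collections import Counter

class RoyaltyMeter:
    def __init__(self):
        self.usage = Counter()

    def record(self, sources):
        """Count one impression per source cited in an answer."""
        self.usage.update(sources)

    def payouts(self, pool):
        """Split a revenue pool proportionally to usage, like ad-revenue sharing."""
        total = sum(self.usage.values())
        return {src: pool * n / total for src, n in self.usage.items()}

meter = RoyaltyMeter()
meter.record(["alice.example"])                  # one answer cited Alice
meter.record(["alice.example", "bob.example"])   # another cited both
print(meter.payouts(90.0))
```

This is exactly the kind of bookkeeping the industry already runs at planetary scale for advertisers; applying it to creators is a matter of will, not capability.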
We are wrong to think this is unprecedented. We have actually been here before. In 1908, the Supreme Court ruled that piano rolls — those perforated sheets that let pianos play themselves — were not “copies” of music because humans couldn’t read them. Composers got paid nothing.
It took Congress passing a law creating a “compulsory licence” to fix it. They didn’t ban the player piano; they simply forced the robot makers to pay a standardised fee for the music they fed into it. We are in the 1908 moment again, but this time we are letting the machine ingest the entire internet for free.
If enough people recognise what is at stake and demand transparency, we might still get a system that works for creators and consumers alike. But we are running out of time to make that demand before “this is just how it works” becomes the permanent answer.