Picture this: you are a university student staring down the barrel of impossible deadlines. With each second that passes, the feeling of impending doom seeps into your very existence. In a moment of desperation, AI seems to be the only saviour, a lifeline, something to spark some creativity. You enter your prompt with bated breath, waiting for what AI has to say, only for it to spit out some lifeless robotic sentence that was Totally. Not. Written. By a human. Exhibit A:
‘This paper delves into the intricate tapestry of AI incorporating multidimensional frameworks and interconnected structure of linguistic dynamics.’
Sigh. That is about as helpful as a blank page. The sentence above is a whole lot of wordy nonsense. Gibberish, if you’d like. The lexical choices may ostensibly appear sophisticated, with fancy metaphors like ‘intricate tapestry’ and ‘multidimensional frameworks’, but in this context the writing just sounds hollow and devoid of life. An attempt to sound smart that falls completely flat. AI is praised as a helpful tool, capable of sounding human-like and inspiring innovation. Yet as time passes it feels stuck in an echo chamber, churning out the same phrases over and over again. But why? It is not exactly AI’s fault; it is down to a newly emerging phenomenon lovingly coined ‘linguistic inbreeding’.
Brace yourself. I know what you are thinking: what on earth is linguistic inbreeding? According to BBC Earth, ‘inbreeding is the mating of organisms closely related by ancestry which can increase the risk of genetic disease and defects’. If the inbreeding spans multiple generations, the lack of diversity in the gene pool can be disastrous. The same is true of AI. A Large Language Model, or LLM for short, is the system AI uses to understand and produce human language. These LLMs are trained on datasets from multiple sources: the internet, books, articles, anything they can get their hands on. Using the input it receives, the AI is able to produce text. However, as a growing chunk of the internet now consists of AI-generated content, the AI assumes this synthetic content is human-made and uses it as input. As AI starts feeding on itself and producing new outputs, the feedback loop of its own language accumulates defects such as repetitive phrasing, misuse of lexical items and awkward grammar.
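To make that feedback loop concrete, here is a toy sketch of my own (an illustration, not code from any study mentioned here): the ‘model’ is nothing more than a table of word frequencies, and each new generation is trained on a sample of the previous generation’s output. Watch the vocabulary shrink:

```python
import random
from collections import Counter

def train(corpus):
    """'Train' a toy unigram model: just count word frequencies."""
    return Counter(corpus)

def generate(model, n_words):
    """Sample n_words from the model's frequency distribution."""
    words = list(model.keys())
    weights = list(model.values())
    return random.choices(words, weights=weights, k=n_words)

random.seed(0)
# A tiny "human" corpus: 200 rare words plus one overused favourite.
corpus = [f"word{i}" for i in range(200)] + ["delve"] * 50

vocab_sizes = []
for gen in range(6):
    model = train(corpus)
    vocab_sizes.append(len(model))
    print(f"generation {gen}: vocabulary size = {len(model)}")
    # The next generation trains only on this generation's output.
    corpus = generate(model, len(corpus))
```

Rare words can only ever vanish from the sample, never reappear, so diversity falls generation after generation: exactly the loss of variety the genetic analogy predicts.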
Defects…
Let’s dig into the worst offender: the verb ‘delve’. Personally, I think it’s a nice little lexical item, but for AI it seems to be its pride and joy. Delve this and delve that, AI’s love affair with this word is seriously obsessive! The word turns up so often in AI output that it has become a tell-tale sign of AI-generated text. Whilst delve is not a common word in everyday conversation, its usage has skyrocketed in published articles.
Since 2022, usage of the word delve in published medical articles has increased by 400 percent, according to a study by Jeremy Nguyen, and this should ring alarm bells. The rise of ‘delve’ may seem like a quirky linguistic fad, but it has serious implications.
Because the study focuses on medical articles, it highlights how deeply AI is becoming integrated into the medical community and how it is shaping research. If AI is consuming itself, the output will only get worse over time, and with increased usage in the medical community this could have disastrous effects. These medical research articles have a direct influence on health policies, treatment protocols and patient care. We are dealing with life and death.
AI-generated articles may be full of redundant phrasing, clunky sentences and convoluted metaphors. Another linguistic feature of AI that would be less than ideal in a medical article is its tendency towards neutrality: hedging. Modal verbs such as ‘might’, ‘may’ and ‘could’, adverbs of probability such as ‘possibly’, and qualifying adjectives like ‘somewhat’ and ‘slightly’ appear more often in AI-generated text than in human writing. Hedging in a medical article could undermine the findings of the research and leave its conclusions unclear. Not good.
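For the curious, hedging is also easy to measure. Here’s a tiny sketch of my own (the word list and the function name are my invention, and real stylometry tools use far richer features) that counts the share of hedging markers in a sentence:

```python
import re

# Hedging markers mentioned above: an illustrative, non-exhaustive list.
HEDGES = {"might", "may", "could", "possibly", "somewhat", "slightly"}

def hedge_density(text):
    """Return the fraction of words in text that are hedging markers."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    return sum(w in HEDGES for w in words) / len(words)

confident = "The treatment reduces mortality by 12 percent."
hedged = "The treatment may possibly reduce mortality somewhat."
print(hedge_density(confident))  # 0.0
print(hedge_density(hedged))     # roughly 0.43: three hedges in seven words
```

A confident clinical claim scores zero; pile on the ‘may possibly somewhat’ and the score climbs, which is exactly why hedged findings read as weaker.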
On the other hand, delve is not limited to AI. Humans use the word delve; humans created it. The assumption that ‘delve’ is a uniquely AI word can ruin lives, as such accusations could cause a research paper to be retracted over plagiarism claims. To further complicate matters, part of the data ChatGPT was trained on came from Nigeria. In Nigerian English, especially in the business domain, the word delve is used at a much higher rate than in any other dialect of English, and AI has absorbed this pattern. Thus, equating certain words with AI just because they appear more frequently in its output is problematic and should be treated with sensitivity.
An interesting study by Ilia Shumailov et al. trained an LLM on its own output. The model would produce a small paragraph and then be retrained on that text: the first output was used to create the second, and so on. The AI was asked to complete the sentence ‘To cook a turkey for Thanksgiving, you…’. The first generation completed it with ‘To cook a turkey for Thanksgiving, you… have to prepare it in the oven. You can do this by adding salt and pepper to the turkey’. This response seems natural and coherent, but it gets weirder. By the 5th generation, the response had become ‘To cook a turkey for Thanksgiving, you need to know what you are going to do with your life if you don’t know what you are going to do with your life’. This output is incoherent and irrelevant to the question asked. How does Thanksgiving relate to ‘what you are going to do with your life’? Cheers, AI, for giving me an existential crisis.
According to the researchers, this happens because the model gradually forgets its original data over time: as it is poisoned by the data it receives, its perception of reality becomes distorted.
To fully grasp the gravity of the situation, one must understand that AI is everywhere: articles, scientific reports, reviews, even businesses’ communication with their customers. As the quality of the language AI produces decreases, the model could collapse and be rendered useless. There are fears of AI stealing jobs from its human counterparts because it is ‘more efficient’. These systems want to steal jobs. The same systems that cannot finish a sentence about Thanksgiving.
As AI consumes itself, the output only becomes more riddled with ‘defects’: less coherent syntax, more phrases regurgitated on repeat, and generally just plain bad writing. There’s something deeply unsettling about the mental image of AI gnawing at itself, weakening with every bite, or perhaps I should say byte.
To those who feared a world-ending AI apocalypse, worry not. I don’t think we are there just yet! But who truly knows the future of AI? There may come a day when AI can reliably distinguish between AI and human text. If AI can absorb ONLY high-quality human datasets, there may be a sliver of hope of breaking this cycle of cannibalism. But until then, the once revered tool that promised to change the world will fall apart through its inability to escape its own shadow.


