Thoughts on the Wittgensteinian evolution of Natural Language Processing

Herin Kim
4 min read · Mar 18, 2024

A shift to semantic search with AI can help us regain our ability to converse contextually

Since the arrival of ChatGPT, talk of AI has dominated conversations at work and among family and friends. At first there was the rather ambiguous, existential discourse on the wellbeing of humanity led by AI doomers and boomers, playing around with dystopian scenarios that have intrigued great minds as far back as Aldous Huxley. Then companies jumped into building their own AI models, shifting the debate to the more practical question of how AI will transform businesses and human labour. Without wishing to undermine the groundbreaking potential of AI, as an average user of digital services who is neither involved in computer science or machine learning nor a business owner seeking to cut operating costs, I have been more interested in how users actually communicate with the internet using AI and how that may shape our day-to-day linguistic patterns.

Natural Language and LLMs

A large language model (LLM) is a deep learning model trained on large amounts of text data. By now, most of us have had some experience of conversing with AI chatbots: writing prompts, asking questions, pointing out errors and tuning responses to generate a desired text. Chatbots like ChatGPT are built on LLMs such as GPT-4 that enable generative text output. In a broader context, LLMs belong to the field of Natural Language Processing (NLP), the computational techniques that allow computers to interact with humans through natural language. NLP encompasses a wider range of technologies such as text translation, search engines, or text categorisation in social media filtering.

Language of Search

Until now, our linguistic approach to searching has been tailored to the lexical search system used in typical engines like Google or Bing. With lexical search, results are based on word-to-word matches between the database and the query. Lexical search concentrates on keywords and ignores words that carry little value, such as the stopwords ‘are’ or ‘the’. So rather than using natural language like “How many Asians live in New York City?”, it has been more efficient to type something like “Asian population NYC”. If I want to find pictures of the Amazon forest and type “amazon photo” in Google, most results are related to the photo services of Amazon (the company), including the Google Play link to download Amazon Photos on my phone. Only when I tweak my query to “Amazon forest photo” do I find images of the rainforest. In this way, our brains have become wired to interact with the internet through keywords.
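For readers who like to see the mechanics, here is a minimal sketch of the keyword matching described above. It is an illustration under stated assumptions, not how Google or Bing actually rank results: the stopword list, the three sample documents and the function names `keywords` and `lexical_search` are all made up for this example.

```python
import re

# Hypothetical, deliberately tiny stopword list; real systems use longer ones.
STOPWORDS = {"a", "an", "are", "how", "in", "is", "many", "of", "the", "to"}

# Made-up "database" of three documents, for illustration only.
DOCUMENTS = [
    "Amazon Photos: photo storage from Amazon",
    "Photo gallery of the Amazon forest",
    "Asian population statistics for NYC boroughs",
]


def keywords(text):
    """Lowercase the text, split it into words, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def lexical_search(query, documents):
    """Rank documents by how many distinct query keywords they contain."""
    query_terms = set(keywords(query))
    scored = []
    for doc in documents:
        overlap = query_terms & set(keywords(doc))
        if overlap:  # documents sharing no keyword are never returned
            scored.append((len(overlap), doc))
    # Highest keyword overlap first; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]


if __name__ == "__main__":
    for q in ["How many Asians live in New York City?",
              "Asian population NYC",
              "amazon photo",
              "Amazon forest photo"]:
        print(q, "->", lexical_search(q, DOCUMENTS))
```

In this toy setup the natural-language question finds nothing, because “Asians” is not an exact match for “Asian”, while the keyword query finds the statistics page. The query “amazon photo” matches the company page and the rainforest page equally well, and it takes the extra keyword “forest” to push the rainforest page to the top.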

Wittgensteinian LLM

Ludwig Wittgenstein was an Austrian philosopher whose book Tractatus Logico-Philosophicus (1921) signalled the linguistic turn in Western philosophy. In his short lifetime (1889–1951), he presented ideas that forever changed the way we perceive language and logical thinking. One of these, developed in his later work, was his view on the contextual and participatory nature of language - that language is not a combination of words according to syntactic rules or “universal grammar”, as Chomsky would say, but a human activity that is part of the broader “form of life”. While declarative sentences like “The apple is red” or “There is a house across the street” can make language appear to be a mere representation of our perception of reality, Wittgenstein’s observations unearthed the non-representational, ambiguous aspects of language. Depending on the community, culture, and context, the same word or sentence can become a question, an exclamation, or a command. Among the many possibilities of what it can mean, it is the collective consensus in which every social agent participates that determines the intention of that word or sentence. Wittgenstein used the term “language game” to describe this activity. Reality, words and symbols form these language games, and because every culture has its own way of seeing things, there isn’t one authoritative language game that rules everything else. Instead, there are multiple games that are interwoven.

Language is woven into human practice (gif by Herin Kim)

This is almost antithetical to the linguistic approach used in lexical search, which depends on keywords and definitions. Like LLMs, the Wittgensteinian theory of language goes beyond this, retaining the value of stopwords and ‘trivial’ words that are typically filtered out in the preprocessing stage of traditional NLP, and processing them to understand the full context of the information. The future of NLP, a market projected to grow to $29.19bn this year, will be much more natural, adopting the semantic and contextual communication methods already used in GPT-4.

How we talk on the Internet

Internet culture and social media have already drastically changed the way we speak and communicate, even think. I’m not just talking about information overload, infinite scrolling or our diminishing attention span. Internet culture continues to churn out neologisms and portmanteaus every day while, at the same time, pushing other terms and words into redundancy. Barbenheimer, rizz, tradwife, slay, “how often do you think about the Roman Empire?”; these are only some of the terms recently born online. Oftentimes they are harmless, but there is an entire dictionary of political and social vocabulary that is exploited to divide people. Polarisation becomes much easier when there are certain keywords upheld by the ‘isms’ and ‘ists’ that you can align yourself with. The more extreme the keywords are, the more controversial and abused they become, leading to our failure to hold coherent debates and logical conversations online.

You don’t have to dig too deep into his work to know that Wittgenstein wasn’t the happiest man alive. But I don’t see him as a stoic or a misanthrope. In fact, it seems reasonable to infer that his frustration in life was partly due to his empathy for others: he could see how reality could be perceived and communicated differently according to one’s culture and language, and how so many failed to understand this.

The return to contextual communication through the semantic search capabilities of large language models may help us regain our ability to think comprehensively. If language is, in fact, human practice, our participation in it should reflect our values in life.

Written by Herin Kim

Creative researcher and writer specialising in art, urban space and technology.
