Edward Tian’s AI Detection – The Dawn of the Race Against ChatGPT

This is the dawn of the race against ChatGPT: college students are rapidly building tools to spot AI-generated text, even as others craft tools to evade detection.

Edward Tian never considered himself a writer. While pursuing a computer science major at Princeton, he enrolled in a few journalism courses to grasp the fundamentals of reporting. His warm demeanor and inquisitive nature earned him the admiration of his professors and peers.

Yet, Tian acknowledges that his writing style during that period was “fairly subpar,” characterized by formulaic and awkward prose. A journalism professor recognized his talent for “pattern recognition,” a valuable skill in news writing. Thus, Tian was taken aback when, during his sophomore year, he was granted a place in John McPhee’s prestigious non-fiction writing seminar.

Tian’s Experience in John McPhee’s Writing Seminar

Each week, a group of 16 students assembled to listen to the renowned New Yorker writer analyze his writing techniques. McPhee tasked them with exercises that compelled them to carefully consider their choice of words. They had to describe a piece of contemporary campus art or condense the Gettysburg Address.

McPhee used a projector and slides to display hand-drawn diagrams, illustrating various structures he employed in his essays, such as a straight line, a triangle, or a spiral. Tian recalls McPhee explaining that he couldn’t dictate how his students should write, but he could assist them in discovering their individual voices.

While McPhee inspired a romantic appreciation of language in Tian, his computer science studies presented a contrasting viewpoint: language as a matter of statistics. When the pandemic hit, he took a year off to work at the BBC and intern with Bellingcat, an open-source journalism initiative. During this time, he developed code to identify Twitter bots. As a junior, he enrolled in courses on machine learning and natural language processing. Then, in the fall of 2022, he embarked on his senior thesis, focusing on distinguishing between AI-generated and human-written text.

Tian Experiments with a ChatGPT Detector

In November, when ChatGPT was first introduced, Tian found himself in a unique position. While the world was buzzing with excitement about this greatly enhanced chatbot, Tian was already well acquainted with the underlying GPT-3 technology. As a journalist with experience in uncovering disinformation campaigns, he grasped the significance of AI-generated content for the industry.

While spending his winter break in Toronto, Tian began experimenting with a new program: a ChatGPT detector. He set up shop at his favorite café, sipping jasmine tea, and stayed up late coding in his bedroom. His concept was straightforward. The software would analyze a piece of text for two factors: “perplexity,” which measures the randomness of word choice, and “burstiness,” indicating the complexity and variation of sentences.
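To make those two signals concrete, here is a minimal sketch of how a detector along these lines might score a passage, assuming Python with the Hugging Face transformers library and GPT-2 as the scoring model. The sentence splitting, the use of sentence-level variation as a stand-in for burstiness, and the function names are illustrative assumptions, not GPTZero’s actual implementation.

```python
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# A small public language model stands in for the scorer (illustrative choice).
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """How 'surprised' the model is by the text; predictable prose scores lower."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels supplied, the model returns mean cross-entropy over the tokens.
        loss = model(ids, labels=ids).loss
    return math.exp(loss.item())

def burstiness(text: str) -> float:
    """Spread of sentence-level perplexity; human writing tends to vary more."""
    sentences = [s.strip() for s in text.split(".") if len(s.split()) > 3]
    scores = [perplexity(s) for s in sentences]
    if len(scores) < 2:
        return 0.0
    mean = sum(scores) / len(scores)
    return math.sqrt(sum((x - mean) ** 2 for x in scores) / len(scores))

sample = "The quick brown fox jumps over the lazy dog. It was the best of times, it was the worst of times."
print(perplexity(sample), burstiness(sample))
```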

GPTZero Is Introduced to Combat AI Plagiarism

Typically, human writing scores higher on both metrics compared to AI-generated writing, enabling Tian to make educated guesses about the origin of a text. Tian named the tool GPTZero, with “zero” symbolizing truth and a return to fundamentals. He released it online on the evening of January 2 and shared a link on Twitter along with a brief introduction. His aim was to combat “the growing problem of AI plagiarism.” He expressed doubts about high school teachers endorsing the use of ChatGPT for students’ history essays. Then, he called it a night.

The following morning, Tian awoke to find hundreds of retweets and replies flooding his notifications. The traffic to the hosting server was so overwhelming that many users couldn’t access it. Tian described it as “completely crazy” and noted that his phone was constantly buzzing with notifications.

A friend playfully congratulated him on conquering the internet, while teenagers on TikTok jokingly labeled him a “narc.” Tian recalled some of the initial negative comments, such as people accusing him of being a snitch and suggesting he had no life or girlfriend, all with a good-natured smile. (He does, in fact, have a girlfriend.)

Tian Receives Global Recognition for His Invention

In a matter of days, he started receiving inquiries from journalists worldwide and ended up making appearances on various media outlets, including NPR, the South China Morning Post, and Anderson Cooper 360. In less than a week, his original tweet had garnered over 7 million views.

GPTZero introduced a fresh perspective to the ongoing media discussions about ChatGPT, which had triggered concerns across the industry and an influx of AI-generated headlines. (Although researchers had previously developed a detector for GPT-2 text in 2019, Tian’s was the first designed specifically for ChatGPT.) Teachers expressed their gratitude to Tian for his creation, as it allowed them to validate their suspicions about questionable student essays. Could this be humanity’s savior from the impending robot takeover?

Tian’s program served as a catalyst for action. The competition had officially begun to create the ultimate AI detection tool. In a world increasingly inundated with AI-generated content, the prevailing belief was that we must be capable of distinguishing between machine-generated and human-generated content. GPTZero embodied the promise that this distinction could indeed be made, underlining the importance of doing so.

The Two Sides of the Internet and Keyword Stuffing

Life on the internet has perpetually involved a struggle between those who deceive and those who detect deception, with both sides reaping benefits from this ongoing conflict. In the early days, spam filters would sift through emails, searching for specific keywords like “FREE!” or “be over 21,” eventually expanding to recognize particular writing styles. In response, spammers adopted a tactic known as “litspam,” where they incorporated fragments of human-sounding text from old books into their messages.

This approach evolved into a unique genre of spam. Meanwhile, as search engines gained popularity, individuals seeking to boost their website rankings resorted to “keyword stuffing”—repetitively using the same word to gain priority. In response, search engines penalized such sites. With the introduction of Google’s PageRank algorithm, which favored websites with numerous inbound links, spammers established interconnected networks of mutually supporting pages.

The New GPT and How It Can Beat Captchas

Around the turn of the millennium, the captcha tool emerged as a means of distinguishing humans from bots, relying on their capacity to interpret distorted text in images. When some bots became proficient at this, captchas added further detection techniques, such as asking users to identify images of motorbikes and trains and monitoring mouse movements and other user behavior.

(In a recent trial, an early version of GPT-4 demonstrated its ability to hire a TaskRabbit worker to complete a captcha on its behalf.) The fortunes of entire companies have hinged on the ability to identify fakes. For instance, Elon Musk, in an effort to escape his Twitter purchase agreement, invoked a bot detector to support his argument that Twitter had misrepresented the quantity of bots on its platform.

Generative AI has raised the stakes. While large language models and text-to-image generators have been gradually advancing over the last decade, 2022 witnessed an explosion of user-friendly tools like ChatGPT and DALL-E. Critics suggest that we might soon find ourselves overwhelmed by a deluge of synthetic media. New York Times technology columnist Kevin Roose cautioned that “in a few years, the vast majority of the photos, videos, and text we encounter on the internet could be AI-generated.”

The Atlantic envisioned an impending “textpocalypse” as we grapple with separating generated content from the genuine. Political campaigns are harnessing AI tools to craft advertisements, and Amazon is inundated with books authored by ChatGPT (many of them covering AI topics). Scanning through product reviews already resembles the world’s most exasperating Turing test. The next phase is becoming apparent: if you thought Nigerian prince emails were a nuisance, wait until you encounter Nigerian prince chatbots.

Similar Products Begin to Emerge

Shortly after Tian introduced GPTZero, a wave of similar products emerged. OpenAI introduced its detection tool at the end of January, while Turnitin, a prominent anti-plagiarism platform, revealed its classifier in April. While they all shared a fundamental approach, each model underwent training on distinct datasets. (For instance, Turnitin primarily focused on student-written content.)

Consequently, claimed accuracy varied significantly, from OpenAI’s figure of 26 percent for detecting AI-generated text to the most optimistic assertion by a company named Winston AI, which boasted 99.6 percent. To maintain a competitive edge, Tian would need to continually enhance GPTZero, conceive its successor, and complete his college education in the interim.

Without delay, Tian enlisted his high school friend Alex Cui as the Chief Technology Officer (CTO). Over the subsequent weeks, he brought in several programmers from Princeton and Canada. In the spring, he also welcomed aboard a trio of coders from Uganda, whom he had met four years earlier during his tenure at a startup that provided training to engineers in Africa.
