Credit: LinkedIn

In 1950, the brilliant computer scientist Alan Turing proposed a thought experiment he called the Imitation Game. An interviewer questions two subjects through typed messages, knowing that one is a person and the other is a machine. Turing suggested that if a machine could repeatedly fool the interviewer into believing it was the human, we could reasonably describe it as capable of something like thinking.

Turing himself considered the question of whether machines can think “too meaningless to deserve discussion.” Even so, the “Turing test” became a benchmark for artificial intelligence, and over the years many computer programs have tried to pass it using cheap conversational tricks, with varying degrees of success.

In recent years, deep-pocketed tech companies such as Google, Facebook, and OpenAI have built a new class of programs known as “large language models,” whose conversational abilities go far beyond the simple chatbots of the past. One of those models, Google’s LaMDA, convinced Google engineer Blake Lemoine that it was not just clever but conscious and sentient.

If Lemoine could be taken in by LaMDA’s lifelike responses, it stands to reason that many people with far less understanding of artificial intelligence (AI) could be fooled as well. That illustrates how dangerous AI can be when it is misused as a tool of deception and manipulation.

That is why, to many in the field, LaMDA’s remarkable skill at Turing’s Imitation Game is not a feat to be celebrated. If anything, it shows that the venerable test has outlived its usefulness as a compass for artificial intelligence.

These tests “aren’t really getting at intelligence,” said Gary Marcus, a cognitive scientist and co-author of the book “Rebooting AI.” What they measure is a software program’s ability to pass for a person, at least under certain conditions. “Which, now that I think about it, may not be that helpful for society.”

Marcus argued that the ability of systems like LaMDA to produce humanlike prose or conversation is not a breakthrough in intelligence. It is a step forward in convincing people that something is intelligent.

Lemoine may be an outlier among his industry peers. According to Google and other AI specialists, the software does not, and could not possibly, contain anything like the inner life he imagines. We need not worry that LaMDA is about to become Skynet, the malevolent artificial intelligence from the Terminator movies.

But now that we live in something like the future Turing foresaw, there is reason to worry about a different set of problems: one in which computer programs are so advanced that they can give the impression of having agency even when they don’t.

Cutting-edge artificial intelligence systems such as OpenAI’s GPT-3 text generator and DALL-E 2 image generator draw on enormous data sets and vast computing power to produce uncannily humanlike output.

They represent a far more powerful and sophisticated approach to software than was possible in the 1960s, when programmers tried to fool human interlocutors by giving a chatbot named ELIZA prefabricated replies keyed to different linguistic cues.
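
For readers curious what those canned-response tricks looked like, here is a minimal, hypothetical sketch in Python. The rules and replies are invented for illustration and are not taken from the original ELIZA script; the point is only that a handful of keyword patterns map a user’s phrasing onto a prewritten response, with no understanding behind it.

```python
import re

# Hypothetical ELIZA-style rules: each pattern maps to a canned reply template.
# The real ELIZA used a richer script of keywords and reassembly rules; this is only a sketch.
RULES = [
    (re.compile(r"\bI need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bI am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]

def eliza_reply(utterance: str) -> str:
    """Return a canned response triggered by the first matching linguistic cue."""
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(*match.groups())
    return "Please, go on."  # default stalling response when no cue matches

if __name__ == "__main__":
    print(eliza_reply("I am feeling anxious about work"))
    # -> "How long have you been feeling anxious about work?"
```

A few such rules can feel surprisingly conversational in short exchanges, which is exactly the kind of surface-level mimicry that modern language models go far beyond.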

Those modern models are also finding commercial use in commonplace tools such as search engines, autocomplete suggestions, and voice assistants like Apple’s Siri and Amazon’s Alexa.

It’s also worth noting that the AI industry has largely moved on from the Turing test as a standard. Large language models are now designed instead to score well on benchmarks such as the General Language Understanding Evaluation (GLUE) and the Stanford Question Answering Dataset (SQuAD).

Unlike ELIZA, LaMDA wasn’t created with the express purpose of passing for a person; rather, it is simply very adept at piecing together and producing plausible-sounding answers to all kinds of prompts.

Despite their sophistication, today’s models and the tests used to evaluate them still share the Turing test’s underlying goal: producing output that is as humanlike as possible. That “arms race,” as AI ethicist Margaret Mitchell described it in a Twitter Spaces conversation with Washington Post reporters on Wednesday, has come at the expense of other possible goals for language models. These include making sure their workings are transparent, that they don’t mislead users, and that they don’t inadvertently reinforce harmful biases.

Mitchell and her former colleague Timnit Gebru were fired by Google in 2021 and 2020, respectively, after co-authoring a paper highlighting those and other risks of large language models.

Even as Google disputes Lemoine’s claims, it and other industry giants have at times boasted of their systems’ ability to fool people, as Jeremy Kahn noted this week in his Fortune newsletter, “Eye on A.I.” At a public event in 2018, for instance, the company proudly played recordings of its voice assistant Duplex, complete with verbal tics like “umm” and “mm-hm,” fooling receptionists into thinking the caller was a person as it phoned to book appointments.

The Turing test is fundamentally about deception, which is why Kahn called this its “most troubling legacy.” And in this case, the test’s effects on the field have been both very real and disturbing.

Kahn echoed a call often made by AI critics and commentators: to retire the Turing test and move on. In a sense, the industry already has, having replaced the Imitation Game with more precise benchmarks.

The Lemoine case suggests, however, that the Turing test may have a new use at a time when machines are getting ever better at sounding human: not as an aspirational benchmark, but as an ethical warning sign. Any technology capable of passing it carries the risk of deceiving people.
