In a preprint research paper titled “Does GPT-4 Pass the Turing Test?”, two researchers from UC San Diego decided to have a little fun by pitting OpenAI’s GPT-4 AI language model against human participants, GPT-3.5, and the ancient ELIZA computer program. The goal was to see which could most often fool people into thinking it was human. Surprisingly, the study found that interrogators correctly identified other humans in only 63 percent of their conversations, and that ELIZA, a program from the 1960s, outperformed GPT-3.5. GPT-4 came in second place, just behind actual humans. Looks like the old saying “age before algorithms” holds true here, folks.
The Turing test, first proposed by Alan Turing in 1950, is a benchmark for whether a machine can imitate human conversation convincingly enough to pass as a person. In this study, the UC San Diego researchers set up a website called turingtest.live, where human interrogators chatted with AI models and other humans without knowing who was who. The results? ELIZA, with its conservative responses and its lack of the tell-tale cues people have come to associate with AI assistants, scored a solid 27 percent success rate, meaning interrogators judged it human in 27 percent of its conversations. GPT-3.5 struggled with a 14 percent success rate, while GPT-4, despite not quite passing the test, did manage a respectable 41 percent.
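For readers who want to see the arithmetic, here is a minimal sketch of how a success rate like those above could be tallied from game records. The data structure and field names are illustrative assumptions, not the study’s actual format: each record simply pairs a witness type with the interrogator’s verdict.

```python
from collections import defaultdict

# Hypothetical game records (not the study's data): each entry notes which
# witness type the interrogator spoke with and whether that witness was
# judged to be human at the end of the conversation.
games = [
    {"witness": "Human",   "judged_human": True},
    {"witness": "GPT-4",   "judged_human": True},
    {"witness": "GPT-4",   "judged_human": False},
    {"witness": "GPT-3.5", "judged_human": False},
    {"witness": "ELIZA",   "judged_human": False},
]

def success_rates(records):
    """Return, per witness type, the fraction of games judged 'human'."""
    wins, totals = defaultdict(int), defaultdict(int)
    for game in records:
        totals[game["witness"]] += 1
        wins[game["witness"]] += int(game["judged_human"])
    return {witness: wins[witness] / totals[witness] for witness in totals}

print(success_rates(games))
# e.g. {'Human': 1.0, 'GPT-4': 0.5, 'GPT-3.5': 0.0, 'ELIZA': 0.0}
```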
But let’s not be too hard on GPT-4. It turns out that, like GPT-3.5, it was specifically designed not to present itself as human, so maybe it’s not fair to judge it solely on its ability to fool us. After all, behavior in a test doesn’t necessarily reflect underlying capability. And as for the humans who failed to convince others they were real, well, maybe they just need to work on their trolling skills.
So, while GPT-4 may not have aced the Turing test, there’s still hope for future AI models. With the right prompts and a little more fine-tuning, who knows? Maybe one day we’ll be fooled by a chatbot and have no idea we’re talking to a machine. The future of AI deception awaits!