
# ChatGPT, Now with Plugins – O’Reilly

A few months ago, I wrote about some experiments with prime numbers. I generated a 16-digit non-prime number by multiplying two 8-digit prime numbers, and asked ChatGPT (using GPT-3.5) whether the larger number was prime. It answered correctly that the number was non-prime, but when it told me the number’s prime factors, it was clearly wrong. It also generated a short program that implemented the widely used Miller-Rabin primality test. After fixing some obvious errors, I ran the program; it told me (correctly) that my number was non-prime, but compared against a known-good implementation of Miller-Rabin, ChatGPT’s code contained many mistakes. When it became available, GPT-4 gave me similar results. And the verdict itself could have been a good guess: there’s roughly a 97% chance that a randomly chosen 16-digit number will be non-prime (by the prime number theorem, the density of primes near 10^16 is about 1/ln(10^16), or roughly 2.7%).
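For reference, a correct Miller-Rabin test really is only a few lines of Python. This is a minimal sketch of the standard probabilistic algorithm, not a reconstruction of ChatGPT's buggy version:

```python
import random

def is_probable_prime(n: int, rounds: int = 20) -> bool:
    """Miller-Rabin probabilistic primality test."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2^s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)          # modular exponentiation: a^d mod n
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False          # a is a witness: n is definitely composite
    return True                   # probably prime (error ≤ 4^-rounds)
```

A composite number can slip through only with probability at most 4^-rounds, which is why the test is the workhorse for numbers far larger than 16 digits.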

OpenAI recently opened their long-awaited Plugins feature to users of ChatGPT Plus (the paid version) using the GPT-4 model. One of the first plugins came from Wolfram, the makers of Mathematica and Wolfram Alpha. I had to try this! Specifically, I was compelled to re-try my prime test. And everything worked: ChatGPT sent the problem to Wolfram, which determined that the number was not prime and returned the correct prime factors. It didn’t generate any code, but provided a link to the Wolfram Alpha result page describing how to test for primality. The round trip from ChatGPT to Wolfram and back was also painfully slow, much slower than using Wolfram Alpha directly or writing a few lines of Python. But it worked and, for fans of prime numbers, that’s a plus.
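Those "few lines of Python" really are few: Pollard's rho algorithm factors a 16-digit semiprime almost instantly. A minimal sketch, where the two 8-digit factors (10000019 and 10000079) are my own example, not the number from the original experiment:

```python
import math
import random

def pollard_rho(n: int) -> int:
    """Return a nontrivial factor of a composite n (Pollard's rho, Floyd cycle-finding)."""
    if n % 2 == 0:
        return 2
    while True:
        c = random.randrange(1, n)
        x = y = random.randrange(2, n)
        d = 1
        while d == 1:
            x = (x * x + c) % n   # tortoise: one step
            y = (y * y + c) % n
            y = (y * y + c) % n   # hare: two steps
            d = math.gcd(abs(x - y), n)
        if d != n:                # d == n means this cycle failed; retry with new c
            return d

# Factor a 16-digit semiprime built from two 8-digit factors.
n = 10000019 * 10000079
d = pollard_rho(n)
print(d, n // d)
```

The running time scales with the fourth root of n, so for a number this size the whole thing takes milliseconds, far less than a ChatGPT-to-Wolfram round trip.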


I was still uncomfortable. How does ChatGPT decide what to offload to Wolfram Alpha, and what to handle on its own? I tried a few questions from calculus; unsurprisingly, they went to Wolfram. Then I got really simple: “How much is 3 + 5?”  No Wolfram, and I wasn’t surprised when ChatGPT told me the answer was 8. But that raised the question: what about more complex arithmetic? So I asked “How much is 123456789 + 98776543321?”, a problem that any elementary school student who has learned how to carry could solve. Again, no Wolfram, but this time the answer was incorrect.
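The sum itself is easy to check: Python’s integers are arbitrary-precision, and the carrying procedure that elementary-school students learn is only a few lines of code. A quick sketch:

```python
def add_with_carries(x: int, y: int) -> int:
    """Grade-school addition: sum digit pairs right to left, carrying as needed."""
    xs, ys = str(x)[::-1], str(y)[::-1]   # reversed digit strings, ones place first
    out, carry = [], 0
    for i in range(max(len(xs), len(ys))):
        dx = int(xs[i]) if i < len(xs) else 0
        dy = int(ys[i]) if i < len(ys) else 0
        carry, digit = divmod(dx + dy + carry, 10)
        out.append(str(digit))
    if carry:
        out.append(str(carry))
    return int("".join(reversed(out)))

print(add_with_carries(123456789, 98776543321))  # 98900000110, same as 123456789 + 98776543321
```

The deterministic carry loop is exactly what a language model lacks: ChatGPT predicts digits as tokens rather than executing the procedure, which is how long additions go wrong.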

We’ve long known that ChatGPT is poor at arithmetic, in addition to being poor at more advanced math. The Wolfram plugin solves such problems with ease. However, ChatGPT is still poor at arithmetic, and still attempts to do arithmetic on its own. The important question that I can’t answer is: when does a problem become complex enough to send to the plugin? The plugin is a big win, but not an unqualified one.

ChatGPT’s tendency to make up citations is another well-known problem. A few weeks ago, a story circulated about a lawyer who used ChatGPT to write a brief. ChatGPT cited a lot of case law, but made up all the citations. When a judge asked him to produce the actual case law, the lawyer went back to ChatGPT, which obediently made up the cases themselves. The judge was not pleased. ChatGPT has always been prone to inventing citations, but now there’s a plugin for that: ScholarAI searches academic databases for citations and returns links. It wouldn’t have helped this lawyer (I don’t yet see plugins from Westlaw or LexisNexis), but it’s worth asking: what about citations?

I first tried asking a medical question. I’m not a doctor, so the question was simple: what’s the latest research on antibiotic-resistant bacteria? ChatGPT sent the question to ScholarAI, and I got back a long list of relevant citations. (The plugin appeared to get into a loop, so I eventually terminated the output.) While I’m not competent to evaluate the quality or relevance of the papers, all the links were valid: the papers were real, and the author names were correct. No hallucinations here.
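Checking whether a citation points at a real paper is itself mechanical. As a hedged sketch (the DOI syntax and the Crossref `/works` endpoint are real; `check_doi` is my own illustrative helper, not anything ScholarAI exposes):

```python
import json
import re
import urllib.request

# Loose syntactic pattern for a DOI: "10.", a registrant number, a slash, a suffix.
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")

def looks_like_doi(s: str) -> bool:
    """Cheap check: is this string even shaped like a DOI?"""
    return bool(DOI_RE.match(s))

def check_doi(doi: str) -> dict:
    """Look the DOI up in the Crossref registry (requires network).

    Returns the bibliographic record; a made-up DOI raises an HTTPError (404),
    which is exactly the hallucination signal we want.
    """
    with urllib.request.urlopen(f"https://api.crossref.org/works/{doi}") as resp:
        return json.load(resp)["message"]   # title, authors, container-title, etc.
```

A tool like ScholarAI presumably does something in this spirit: retrieve from a real database rather than generate, so every link it returns resolves to an actual record.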

I followed up with some questions about English literature (I have a PhD, so I can make up real questions). I didn’t get as many citations in return, possibly because we don’t have preprint servers like arXiv, and have done little to protest journals’ proprietary lock on scholarship. However, the citations I got were valid: real books and articles, with the authors listed correctly.