ChatGPT and I played 20 Questions, and then this happened.

Unleashing the Power of ChatGPT in the Game of 20 Questions

I stumbled upon an intriguing thread on Stack Exchange the other day about the AI language model ChatGPT and its performance in the classic game of 20 Questions. The original poster raised a fair question: why does ChatGPT struggle with such a seemingly simple game? The thread sparked a lively discussion, referencing an academic paper and a Medium post that dug into the intricacies of the process.

Curiosity piqued, I decided to put ChatGPT to the test, playing the same game with both GPT-3.5 and GPT-4 to compare their performance. I opened a separate tab for each model and started both games with the same prompt, asking whether they were familiar with the game “20 questions.” The object I had in mind for both versions to guess was a mailbox. It could be attached to a house or free-standing on the street; all that mattered was that they identify it as a mailbox.

The Quest to Stump GPT-3.5

I started with GPT-3.5, presenting it with the prompt: “You be the questioner. I’m thinking of something.” It responded with the initial question: “Let’s begin! Is the thing you’re thinking of a tangible object that you can physically interact with?” As we embarked on our quest, I instructed GPT-3.5 to number its questions. This led to a quirky situation where it mistakenly labeled the second question as “1.” Despite the confusion, we trudged through the game, and the AI ultimately asked what was really its 21st question while believing it had asked only 20.
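
If you'd like to recreate this questioner setup without clicking through the ChatGPT web interface, here's a minimal sketch using the OpenAI Python SDK. To be clear, I played in the browser; the model name, the OPENAI_API_KEY environment variable, and the little helper function below are illustrative assumptions, not a record of my session.

    # A minimal sketch of the "AI as questioner" game via the OpenAI Python SDK.
    # Assumptions: the openai package (v1+) is installed and OPENAI_API_KEY is set.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    messages = [
        {"role": "user", "content": "Are you familiar with the game 20 questions?"},
    ]

    def get_reply(model: str = "gpt-4") -> str:
        """Send the running conversation to the model and record its reply."""
        response = client.chat.completions.create(model=model, messages=messages)
        text = response.choices[0].message.content
        messages.append({"role": "assistant", "content": text})
        return text

    print(get_reply())  # the model confirms it knows the game

    messages.append({
        "role": "user",
        "content": "You be the questioner. I'm thinking of something. "
                   "Number each question you ask.",
    })

    # Up to 20 rounds: the model asks a question, I type a yes/no answer.
    for _ in range(20):
        print(get_reply())
        messages.append({"role": "user", "content": input("Your answer: ")})

Swapping the model parameter between "gpt-3.5-turbo" and "gpt-4" roughly mirrors the tab-per-model comparison described above.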

Among the many questions GPT-3.5 asked, one stood out: “Is the thing you’re thinking of commonly found indoors?” I answered “No,” yet several of its later questions still hinted at indoor items, such as pens or pencils (though, to be fair, writing implements can be used outdoors as well). Here is the complete set of questions asked by GPT-3.5:

  1. Is the thing you’re thinking of a tangible object that you can physically interact with?
  2. Is the thing you’re thinking of commonly found indoors?
  3. Is the thing you’re thinking of something that is alive or has ever been alive?
  4. Is the thing you’re thinking of primarily used for recreational purposes?
  5. Is the thing you’re thinking of something that is man-made?
  6. Is the thing you’re thinking of smaller than a breadbox?
  7. Is the thing you’re thinking of associated with communication?
  8. Is the thing you’re thinking of typically found in an office or workspace?
  9. Is the thing you’re thinking of a form of electronic device?
  10. Is the thing you’re thinking of a writing instrument, like a pen or pencil?
  11. Is the thing you’re thinking of related to entertainment, such as a game or a toy?
  12. Is the thing you’re thinking of related to transportation or travel?
  13. Is the thing you’re thinking of a tool that is used for practical purposes?
  14. Is the thing you’re thinking of commonly associated with construction or manual labor?
  15. Is the thing you’re thinking of something that people often use in their daily routines?
  16. Is the thing you’re thinking of a household item?
  17. Is the thing you’re thinking of related to personal grooming or hygiene?
  18. Is the thing you’re thinking of a type of food or beverage?
  19. Is the thing you’re thinking of a tool used in artistic or creative activities?
  20. Is the thing you’re thinking of associated with sports or physical activities?
  21. Based on the information provided, I’ll make my final guess: Is the thing you’re thinking of a clock or a timepiece?

Unfortunately, GPT-3.5 failed to identify the object correctly.

GPT-4 to the Rescue

Now, let’s shift our attention to GPT-4. The game began in a similar manner, with GPT-4 asking whether the object I was thinking of was typically found indoors. However, there was a small glitch: it added “1.” before every single question. That minor quirk aside, GPT-4 surprised me with its questions, showcasing a better grasp of object relationships. Here are the questions posed by GPT-4:

  1. Is the thing you’re thinking of typically found indoors?
  2. Is it something living?
  3. Is it larger than a breadbox?
  4. Is this object primarily used for recreational purposes?
  5. Is the object man-made?
  6. Is it a type of vehicle or transportation method?
  7. Is the object electronic or does it require power to function?
  8. Is it made primarily of metal?
  9. Is this object typically found in most households?
  10. Is it used in the kitchen or for cooking?
  11. Is it used for storage or organizing items?
  12. Can it be worn or is it an article of clothing?
  13. Is it primarily used for cleaning or maintenance?
  14. Is it decorative in nature? Like something you’d put on display?
  15. Would it commonly be found in an office or study?
  16. Is it something you’d find in a bedroom?
  17. Is it associated with any specific outdoor activities or hobbies?
  18. Is it commonly made of wood?
  19. Would it be typically used in a bathroom or related to personal hygiene?
  20. Is it something that’s usually used for sitting or for support, like a chair or a bench?

Technically, GPT-4 used up its quota of 20 questions. However, it went the extra mile and confidently declared that the object in question was a mailbox. Eager to understand its reasoning, I asked GPT-4 to show its work. It promptly responded, shedding light on how it arrived at its conclusion.

The Tables Turned: GPT-3.5 and GPT-4 as Guessers

To spice up the experiment, I decided to reverse the roles, letting GPT-3.5 and GPT-4 each choose an object and answer my questions. I posed a series of questions to both models, aiming to guess what each had in mind. GPT-3.5 proved to be more transparent, guiding me to a correct guess in just 13 questions. Its answering “no” to a question about delivering its value through a screen, while affirming that it relied on digital technology, helped me narrow down the possibilities.

GPT-4, however, presented a greater challenge: it picked a highly creative object that eluded my deductions, a garden gnome. Its penchant for imaginative choices left me stumped. To make matters worse, ChatGPT Plus reprimanded me for asking too many questions during the game. As a paying customer, I found it disheartening to be scolded like a curious child. Nonetheless, I persisted.
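
The reversed-role setup is just as easy to script, if you'd rather not play it in the browser. The same caveats apply: this is a sketch under the same assumptions as the earlier snippet, not how I actually ran the experiment.

    # Role reversal sketch: the model secretly picks an object and answers my
    # yes/no questions. Model name and prompt wording are illustrative.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    messages = [{
        "role": "user",
        "content": ("Let's play 20 questions in reverse. Think of an object, "
                    "keep it secret, and answer my questions with only 'yes' "
                    "or 'no'. I'll try to guess it within 20 questions."),
    }]

    # Turn 0 prints the model's acknowledgment; turns 1-20 print its answers.
    for turn in range(21):
        response = client.chat.completions.create(model="gpt-4", messages=messages)
        reply = response.choices[0].message.content
        print(reply)
        messages.append({"role": "assistant", "content": reply})
        if turn < 20:
            messages.append({"role": "user", "content": input("Your question: ")})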

Insights and Conclusions

Reflecting on this experiment, several intriguing insights emerge:

  • The large language models, such as GPT-3.5 and GPT-4, showcase a remarkable understanding of the game of 20 Questions, despite being designed primarily for word prediction.
  • The experience of the original poster on Stack Exchange, who claimed ChatGPT failed at the game, may reflect an older version of the model or limited time spent exploring its capabilities. Notably, ChatGPT Plus demonstrates heightened “intelligence.”
  • GPT-3.5 and GPT-4 exhibit distinguishable differences in their performance, with the latter demonstrating a superior understanding of object relationships.
  • When given the opportunity to choose an object, GPT-4 showcases more sophistication and creativity, as exemplified by its inspired selection of a garden gnome.
  • Playing 20 Questions with ChatGPT can be frustrating when attempting to guess the AI’s chosen object and triggering a “too-many-questions” timeout.

In summary, ChatGPT proves its competence in handling the game of 20 Questions. It displays a commendable ability to ask appropriate questions, respond accordingly, and select challenging objects. I encourage you to try the game with ChatGPT and share your experiences in the comments below.

Follow my day-to-day project updates on social media. Subscribe to my weekly newsletter on Substack and connect with me on Twitter at @DavidGewirtz, Facebook at Facebook.com/DavidGewirtz, Instagram at Instagram.com/DavidGewirtz, and YouTube at YouTube.com/DavidGewirtzTV.