ChatGPT incorrectly answers over 50% of software engineering questions.

ChatGPT: A Conversational Chatbot with Limitations for Software Engineering Prompts

Imagine having access to a chatbot that can provide conversational answers to any question you have, whenever you need them. It seems like the perfect resource for all your information needs, right? Well, that’s precisely what ChatGPT offers. However, a recent study has found a limitation that might make you reconsider using ChatGPT for software engineering prompts.

Before the rise of AI chatbots, programmers relied on platforms like Stack Overflow for advice and answers to their programming-related questions. However, with Stack Overflow, you often had to wait for someone to respond to your query. This is where ChatGPT’s appeal became evident – it provides instant responses without the need for waiting.

As a result, an increasing number of software engineers and programmers turned to ChatGPT for answers to their questions. However, there was no clear data on how well ChatGPT answers software engineering prompts, so researchers at Purdue University set out to measure it.

For the study, researchers fed ChatGPT 517 Stack Overflow questions and examined the accuracy and quality of its answers. The results raised concerns about ChatGPT’s efficacy in this specific domain: 52% of its answers were incorrect, while only 48% were correct. Additionally, 77% of the answers were verbose.
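Percentages like these come from a simple tally: each answer gets a verdict, and each verdict's share is its count divided by the total. Here is a minimal sketch of that arithmetic in Python. The function name `label_shares` and the eight verdicts are made up for illustration; they are not the study's data or its analysis code.

```python
from collections import Counter


def label_shares(labels):
    """Return each label's share of the total, as a percentage rounded to one decimal."""
    total = len(labels)
    if total == 0:
        return {}  # nothing to tally
    counts = Counter(labels)
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Toy verdicts for illustration only (hypothetical, NOT the study's data).
verdicts = [
    "incorrect", "correct", "incorrect", "correct",
    "incorrect", "incorrect", "correct", "incorrect",
]

shares = label_shares(verdicts)
print(shares)
```

With a real set of reviewed answers, the same tally could be run once per quality measure (correctness, verbosity, comprehensiveness), since an answer can be both incorrect and verbose.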

Despite the concerning inaccuracy rate, the study did reveal that ChatGPT’s answers were comprehensive 65% of the time, addressing various aspects of the questions asked. To better assess the quality of ChatGPT’s responses, the researchers enlisted the help of 12 participants with different levels of programming expertise.

While the participants generally preferred Stack Overflow’s responses over ChatGPT’s across different categories, they struggled to spot the incorrect answers generated by ChatGPT. In fact, they failed to recognize the inaccuracies 39.34% of the time. The researchers attribute this oversight to the comprehensive and well-articulated nature of ChatGPT’s answers.

It is crucial to recognize that generating plausible yet incorrect answers is a significant issue for AI chatbots in general, and one that can inadvertently contribute to the spread of misinformation. Considering the low accuracy found in the study, it is advisable to reconsider using ChatGPT for software engineering prompts.

In conclusion, while ChatGPT proves to be a valuable resource for general information needs, its limitations become apparent in the context of software engineering prompts. The Purdue University study shows that ChatGPT often generates incorrect and verbose answers in this domain. Because those answers can be comprehensive and well-articulated, users often overlook the inaccuracies, which can help misinformation spread.

As the field of AI continues to evolve, it is crucial to acknowledge the strengths and limitations of different chatbot models. While ChatGPT can be a helpful tool, especially for casual inquiries, it is worth remembering that human expertise, such as that found on Stack Overflow, remains invaluable in specialized domains like software engineering.