Understanding and Improving the Use of Conversational LLMs in Software Engineering

Conversational LLMs (e.g., GPT, Gemini, Claude) have emerged as a pivotal resource for programming support, providing immediate assistance that enhances productivity and simplifies the learning process for developers. These models are particularly valued because they let software developers interact in natural language, which supports an interactive learning experience. Despite their popularity, conversational LLMs often omit crucial details or produce incorrect solutions, and these problems are hard or time-consuming for developers to identify. We found several instances in which conversational LLMs suggested fabricated information (e.g., non-existent APIs) or omitted warnings about potential security risks in their code suggestions. Our research aims to improve software quality and developer productivity by providing comprehensive support for developers who use conversational LLMs. This involves creating a framework that automatically reformulates queries and assesses the correctness and reliability of the generated information.

Mining Emotions from Software Engineering Communication

Emotions can strongly affect activities that are collaborative in nature and require creativity and problem-solving skills, such as software development. Research has shown that positive emotions (e.g., joy) are associated with increased productivity and job satisfaction in software engineering teams. Negative emotions (e.g., frustration), on the other hand, can cause developers to lose motivation and participate less, ultimately leading to team attrition. In this project, we aim to mine emotions and affect in software-related text in order to improve collaboration and productivity in software projects.

Mining Information from Developer Chat Conversations Towards Building Software Maintenance Tools

Popular chat platforms such as Slack host public chat communities that focus on specific software development topics such as Python or Ruby-on-Rails. Many of these chat conversations contain valuable information, such as descriptions of code snippets and APIs, opinions on good programming practices, and causes of common errors and exceptions. This project aims to develop analyses that automatically identify and extract information from developers’ chat conversations, with the goal of improving existing tools and building new ones to support software engineers.

Studying Developer Focus on Question and Answer (Q&A) Forums

Although popular Q&A forums such as Stack Overflow serve as a good knowledge resource, the abundance of information can cause developers to spend considerable time identifying relevant answers and suitable fixes. This project aims to help developers identify informative code and text in Q&A forums once they have narrowed their search down to a post relevant to their task.

Learning about Code Snippet Characteristics in Software Artifacts

Large corpora of software-related artifacts (e.g., blogs, bug reports, emails) offer a unique opportunity to learn from developers’ discussions of code snippets. The goal of this project is to gain insight into the potential value and difficulty of mining the natural language text associated with code snippets found in a variety of software-related documents, including blog posts, API documentation, code reviews, and public chats.

Mining Source Code Descriptions from Research Articles

Digital libraries of computer science research articles can be a rich source of code examples that are used to motivate or explain particular concepts or issues. In this project, we designed a technique to automatically identify natural language descriptions of code segments embedded within articles. Extracting these natural language descriptions alongside the code could enable new advances in areas such as code-based search, automatic code comment generation, and documentation generation.

Past and Present Collaborators

Research Funding