Research

LLMs and Security

Code generated by large language models often contains security vulnerabilities. This project investigates the security risks of using LLM-generated code and information in various software maintenance tasks such as bug resolution. We build tools to automatically uncover such risks and suggest potential solutions.

Improving LLM-Assisted Bug Resolution

Our research explores ways to enhance software bug resolution using large language models (LLMs). We analyze developer-LLM interactions to identify effective communication patterns and develop tools for refining prompts. Additionally, we investigate methods to help LLMs generate accurate bug fixes by leveraging key contextual information from real-world issue reports. This work aims to advance AI-driven tools for reliable and efficient debugging.

Understanding and Improving the Use of Conversational LLMs in Software Engineering

Conversational LLMs (e.g., GPT, Gemini, Claude) have emerged as a pivotal resource for programming support, providing immediate assistance that enhances productivity and simplifies the learning process for developers. These models are particularly valued for allowing software developers to interact in natural language, supporting an interactive learning experience. Despite their popularity, conversational LLMs often omit crucial details or produce incorrect solutions, which are hard or time-consuming for developers to identify. We found several instances where these conversational LLMs suggest fabricated information (e.g., non-existent APIs) or omit warnings about potential security risks in their code suggestions. Our research aims to improve software quality and developer productivity by providing comprehensive support for developers using conversational LLMs. This involves creating a framework to auto-reformulate queries and assess the correctness and reliability of the generated information.

Mining Emotions from Software Engineering Communication

Emotions can strongly impact activities that are collaborative in nature and require creativity and problem-solving skills, such as software development. Research has shown that positive emotions (e.g., Joy) are associated with increased productivity and job satisfaction in software engineering teams. On the other hand, negative emotions (e.g., Frustration) can cause developers to lose motivation and exhibit lower participation, ultimately leading to team attrition. In this project, we aim to mine emotions and affect in software-related text to improve collaboration and productivity in software projects.

Mining Information from Developer Chat Conversations Towards Building Software Maintenance Tools

Popular chat platforms such as Slack host public chat communities that focus on specific software development topics such as Python or Ruby-on-Rails. Many of those chat communications contain valuable information, such as descriptions of code snippets and APIs, opinions on good programming practices, and causes of common errors/exceptions. This project aims to develop analyses that automatically identify and extract information from developers’ chat communications, in order to improve existing tools and build new ones that support software engineers.

Studying Developer Focus on Question and Answer (Q&A) Forums

Although popular Q&A forums such as Stack Overflow serve as a good knowledge resource, the abundance of information can cause developers to spend considerable time identifying relevant answers and suitable fixes. This project aims to help developers identify informative code and text from Q&A forums, once they have narrowed down their search to a post relevant to their task.

Learning about Code Snippet Characteristics in Software Artifacts

Large corpora of software-related artifacts (e.g., blogs, bug reports, emails) offer the unique opportunity to learn from developers’ discussion about code snippets. The goal of this project is to gain insight into the potential value and difficulty of mining the natural language text associated with the code snippets found in a variety of software-related documents, including blog posts, API documentation, code reviews, and public chats.

Mining Source Code Descriptions from Research Articles

Digital libraries of computer science research articles can be a rich source for code examples that are used to motivate or explain particular concepts or issues. In this project, we designed a technique to automatically identify natural language descriptions of code segments embedded within articles. Extracting these natural language descriptions alongside code could enable new advances in areas including code-based search, automatic code comment generation, and documentation generation.

Past and Present Collaborators

Kostadin Damevski

Sonia Haiduc

Shadi Rezapour

Evan Forman

Olga Baysal

Yuanfang Cai

Research Funding

Faculty Summer Research Awards for Tenure/Tenure‐Track Faculty