Mia Mohammad Imran, Preetha Chatterjee, and Kostadin Damevski. “Uncovering the Causes of Emotions in Software Developer Communication Using Zero-Shot LLMs.” In The 46th International Conference on Software Engineering (ICSE), Research Track, 2024.
@inproceedings{imran2024uncover,
author = {Imran, Mia Mohammad and Chatterjee, Preetha and Damevski, Kostadin},
year = {2024},
month = mar,
title = {Uncovering the Causes of Emotions in Software Developer Communication Using Zero-shot LLMs},
booktitle = {The 46th International Conference on Software Engineering (ICSE), Research Track},
file = {ICSE_24_Emotion_Cause.pdf}
}
Understanding and identifying the causes behind developers’ emotions (e.g., Frustration caused by ‘delays in merging pull requests’) can be crucial to finding solutions to problems and fostering collaboration in open-source communities. Effectively identifying such information in the high volume of communications across different project channels, such as chats, emails, and issue comments, requires automated recognition of emotions and their causes. Enabling this automation requires large-scale software engineering-specific datasets on which accurate machine learning models can be trained. However, such datasets are expensive to create, given the variety and informal nature of software projects’ communication channels.
Ramtin Ehsani, Mia Mohammad Imran, Robert Zita, Kostadin Damevski, and Preetha Chatterjee. “Incivility in Open Source Projects: A Comprehensive Annotated Dataset of Locked GitHub Issue Threads.” In The 21st International Conference on Mining Software Repositories (MSR), Data Showcase Track, 2024.
@inproceedings{ehsani2024incivility,
author = {Ehsani, Ramtin and Imran, Mia Mohammad and Zita, Robert and Damevski, Kostadin and Chatterjee, Preetha},
year = {2024},
month = feb,
title = {Incivility in Open Source Projects: A Comprehensive Annotated Dataset of Locked GitHub Issue Threads},
booktitle = {The 21st International Conference on Mining Software Repositories (MSR), Data Showcase Track},
file = {incivility.pdf}
}
In the dynamic landscape of open source software (OSS) development, understanding and addressing incivility within issue discussions is crucial for fostering healthy and productive collaborations. This paper presents a curated dataset of 404 locked GitHub issue discussion threads and 5,961 individual comments, collected from 213 OSS projects. We annotated the comments with various categories of incivility using Tone Bearing Discussion Features (TBDFs), and, for each issue thread, we annotated the triggers, targets, and consequences of incivility. We observed that Bitter frustration, Impatience, and Mocking are the most prevalent TBDFs exhibited in our dataset. The most common triggers, targets, and consequences of incivility include Failed use of tool/code or error messages, People, and Discontinued further discussion, respectively. This dataset can serve as a valuable resource for analyzing incivility in OSS and improving automated tools to detect and mitigate such behavior.
Mia Mohammad Imran, Preetha Chatterjee, and Kostadin Damevski. “Shedding Light on Software Engineering-Specific Metaphors and Idioms.” In The 46th International Conference on Software Engineering (ICSE), Research Track, 2024.
@inproceedings{imran2024shedding,
author = {Imran, Mia Mohammad and Chatterjee, Preetha and Damevski, Kostadin},
year = {2024},
month = jan,
title = {Shedding Light on Software Engineering-specific Metaphors and Idioms},
booktitle = {The 46th International Conference on Software Engineering (ICSE), Research Track},
file = {shedding_light.pdf}
}
Use of figurative language, such as metaphors and idioms, is common in everyday communication, and it can also be found in Software Engineering (SE) channels, such as comments on GitHub. Automatically interpreting figurative language is a challenging task, even with modern Large Language Models (LLMs), as it often involves subtle nuances. This is particularly true in the SE domain, where figurative language is frequently used to convey technical concepts, often bearing developer affect (e.g., ‘spaghetti code’). Surprisingly, there is a lack of studies on how figurative language in SE communications impacts the performance of automatic tools that focus on understanding developer communications, e.g., bug prioritization, incivility detection. Furthermore, it is an open question to what extent state-of-the-art LLMs interpret figurative expressions in domain-specific communication such as software engineering. To address this gap, we study the prevalence and impact of figurative language in SE communication channels. This study contributes to understanding the role of figurative language in SE, the potential of LLMs in interpreting it, and its impact on automated SE communication analysis. Our results demonstrate the effectiveness of fine-tuning LLMs with figurative language in SE and its potential impact on automated tasks that involve affect. We found that, among three state-of-the-art LLMs, the best fine-tuned versions achieve average improvements of 6.66% on a GitHub emotion classification dataset, 7.07% on a GitHub incivility classification dataset, and 3.71% on a Bugzilla bug report prioritization dataset.
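The recipe behind these numbers is standard supervised fine-tuning of a pretrained transformer on labeled SE text that includes figurative expressions. A minimal sketch only; the model name, label ids, and example utterances below are illustrative placeholders, not the paper's exact setup:

# Hedged sketch: fine-tuning a pretrained transformer for emotion
# classification on SE text containing figurative language. Model,
# labels, and examples are placeholders, not the paper's configuration.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

examples = {
    "text": [
        "This spaghetti code needs a full rewrite.",   # figurative, negative affect
        "Thanks, the fix works like a charm!",         # figurative, positive affect
    ],
    "label": [0, 1],  # hypothetical label ids, e.g., 0 = anger, 1 = joy
}

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

dataset = Dataset.from_dict(examples).map(
    lambda batch: tokenizer(batch["text"], truncation=True,
                            padding="max_length", max_length=128),
    batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=dataset,
)
trainer.train()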
Shyamal Mishra and Preetha Chatterjee. “Exploring ChatGPT for Toxicity Detection in GitHub.” In The 46th International Conference on Software Engineering (ICSE), New Ideas and Emerging Results Track, 2024.
@inproceedings{mishra2024exploring,
author = {Mishra, Shyamal and Chatterjee, Preetha},
year = {2024},
title = {Exploring ChatGPT for Toxicity Detection in GitHub},
booktitle = {The 46th International Conference on Software Engineering (ICSE), New Ideas and Emerging Results Track},
file = {exploring_toxicity.pdf}
}
Fostering a collaborative and inclusive environment is crucial for the sustained progress of open source development. However, the prevalence of negative discourse, often manifested as toxic comments, poses significant challenges to developer well-being and productivity. To identify such negativity in project communications, especially within large projects, automated toxicity detection models are necessary. To train these models effectively, we need large software engineering-specific toxicity datasets. However, such datasets are limited in availability and often exhibit imbalance (e.g., only 6 in 1000 GitHub issues are toxic) [1], posing challenges for training effective toxicity detection models. To address this problem, we explore a zero-shot LLM (ChatGPT) that is pre-trained on massive datasets but without being fine-tuned specifically for the task of detecting toxicity in software-related text. Our preliminary evaluation indicates that ChatGPT shows promise in detecting toxicity in GitHub, and warrants further investigation. We experimented with various prompts, including those designed for justifying model outputs, thereby enhancing model interpretability and paving the way for potential integration of ChatGPT-enabled toxicity detection into developer communication channels.
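For flavor, a minimal sketch of the zero-shot setup; the prompt wording and model choice below are illustrative assumptions, and the paper's exact prompts are not reproduced here:

# Hedged sketch: zero-shot toxicity classification with a justification
# prompt. Prompt text and model name are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = (
    "You are analyzing GitHub comments.\n"
    "Label the following comment as 'toxic' or 'non-toxic', then justify "
    "your label in one sentence.\n\n"
    "Comment: {comment}"
)

def classify(comment: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,  # keep the classification output stable
        messages=[{"role": "user", "content": PROMPT.format(comment=comment)}],
    )
    return response.choices[0].message.content

print(classify("This patch is garbage, did you even test it?"))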
Amirali Sajadi, Kostadin Damevski, and Preetha Chatterjee. “Interpersonal Trust in OSS: Exploring Dimensions of Trust in GitHub Pull Requests.” In The 45th International Conference on Software Engineering (ICSE), New Ideas and Emerging Results Track, 2023.
@inproceedings{sajadi2023interpersonal,
author = {Sajadi, Amirali and Damevski, Kostadin and Chatterjee, Preetha},
year = {2023},
title = {Interpersonal Trust in OSS: Exploring Dimensions of Trust in GitHub Pull Requests},
booktitle = {The 45th International Conference on Software Engineering (ICSE), New Ideas and Emerging Results Track},
month = may,
file = {trust.pdf},
links = {
Preprint = 'https://preethac.github.io/files/ICSE_NIER_2023.pdf'
}
}
Interpersonal trust plays a crucial role in facilitating collaborative tasks, such as software development. While previous research recognizes the significance of trust in an organizational setting, there is a lack of understanding of how trust is exhibited in distributed OSS teams, where direct, in-person communication is absent. To foster trust and collaboration in OSS teams, we need to understand what trust is and how it is exhibited in written developer communications (e.g., pull requests, chats). In this paper, we first investigate various dimensions of trust to identify the ways trusting behavior can be observed in OSS. Next, we sample a set of 100 GitHub pull requests from Apache Software Foundation (ASF) projects to analyze and demonstrate how each dimension of trust can be exhibited. Our findings provide preliminary insights into cues that might be helpful to automatically assess team dynamics and establish interpersonal trust in OSS teams, leading to successful and sustainable OSS.
Layla Bouzoubaa, Ramtin Ehsani, Preetha Chatterjee, and Rezvaneh Rezapour. “The Evolution of Substance Use Coverage in the Philadelphia Inquirer.” In The 17th International AAAI Conference on Web and Social Media (ICWSM), Data Challenge, 2023.
@inproceedings{bouzoubaa2023evolution,
author = {Bouzoubaa, Layla and Ehsani, Ramtin and Chatterjee, Preetha and Rezapour, Rezvaneh},
year = {2023},
title = {The Evolution of Substance Use Coverage in the Philadelphia Inquirer},
booktitle = {The 17th International AAAI Conference on Web and Social Media (ICWSM), Data Challenge},
file = {substance.pdf}
}
The media’s representation of illicit substance use can lead to harmful stereotypes and stigmatization for individuals struggling with addiction, ultimately influencing public perception, policy, and public health outcomes. To explore how the discourse and coverage of illicit drug use changed over time, this study analyzes 157,476 articles published in the Philadelphia Inquirer over a decade. Specifically, the study focuses on articles that mentioned at least one commonly abused substance, resulting in a sample of 3,903 articles. Our analysis shows that cannabis and narcotics are the most frequently discussed classes of drugs. Hallucinogenic drugs are portrayed more positively than other categories, whereas narcotics are portrayed the most negatively. Our research aims to highlight the need for accurate and inclusive portrayals of substance use and addiction in the media.
Ramtin Ehsani, Rezvaneh Rezapour, and Preetha Chatterjee. “Exploring Moral Principles Exhibited in OSS: A Case Study on GitHub Heated Issues.” In The 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), Ideas, Visions and Reflections Track, 2023.
@inproceedings{ehsani2023exploring,
author = {Ehsani, Ramtin and Rezapour, Rezvaneh and Chatterjee, Preetha},
year = {2023},
title = {Exploring Moral Principles Exhibited in OSS: A Case Study on GitHub Heated Issues},
booktitle = {The 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), Ideas, Visions and Reflections Track},
file = {toxicity.pdf}
}
To foster collaboration and inclusivity in Open Source Software (OSS) projects, it is crucial to understand and detect patterns of toxic language that may drive contributors away, especially those from underrepresented communities. Although machine learning-based toxicity detection tools trained on domain-specific data have shown promise, their design lacks an understanding of the unique nature and triggers of toxicity in OSS discussions, highlighting the need for further investigation. In this study, we employ Moral Foundations Theory (MFT) to examine the relationship between moral principles and toxicity in OSS. Specifically, we analyze toxic communications in GitHub issue threads to identify and understand five types of moral principles exhibited in text, and explore their potential association with toxic behavior. Our preliminary findings suggest a possible link between moral principles and toxic comments in OSS communications, with each moral principle associated with at least one type of toxicity. The potential of MFT in toxicity detection warrants further investigation.
Amirali Sajadi, Kostadin Damevski, and Preetha Chatterjee. “Towards Understanding Emotions in Informal Developer Interactions: A Gitter Chat Study.” In The 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), Ideas, Visions and Reflections Track, 2023.
@inproceedings{sajadi2023towards,
author = {Sajadi, Amirali and Damevski, Kostadin and Chatterjee, Preetha},
year = {2023},
title = {Towards Understanding Emotions in Informal Developer Interactions: A Gitter Chat Study},
booktitle = {The 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), Ideas, Visions and Reflections Track},
file = {emotions.pdf}
}
Emotions play a significant role in teamwork and collaborative activities like software development. While researchers have analyzed developer emotions in various software artifacts (e.g., issues, pull requests), limited studies have focused on understanding the broad spectrum of emotions expressed in chats. As one of the most widely used means of communication, chats contain valuable information in the form of informal conversations, such as negative perspectives about adopting a tool. In this paper, we present a dataset of developer chat messages manually annotated with a wide range of emotion labels (and sub-labels), and analyze the type of information present in those messages. We also investigate the unique signals of emotions specific to chats and distinguish them from other forms of software communication. Our findings suggest that chats have fewer expressions of Approval and Fear but more expressions of Curiosity compared to GitHub comments. We also notice that Confusion is frequently observed when discussing programming-related information such as unexpected software behavior. Overall, our study highlights the potential of mining emotions in developer chats for supporting software maintenance and evolution tools.
Kostadin Damevski, Mia Mohammad Imran, Preetha Chatterjee, and Yashasvi Jain. “Data Augmentation for Improving Emotion Recognition in Software Engineering Communication.” In The 37th IEEE/ACM International Conference on Automated Software Engineering (ASE), Research Track, 2022.
@inproceedings{damevski2022data,
author = {Damevski, Kostadin and Imran, Mia Mohammad and Chatterjee, Preetha and Jain, Yashasvi},
year = {2022},
title = {Data Augmentation for Improving Emotion Recognition in Software Engineering Communication},
booktitle = {The 37th IEEE/ACM International Conference on Automated Software Engineering (ASE), Research Track},
month = oct,
file = {Data Augmentation.pdf}
}
Emotions (e.g., Joy, Anger) are prevalent in daily software engineering (SE) activities, and are known to be significant indicators of work productivity (e.g., bug fixing efficiency). Recent studies have shown that directly applying general-purpose emotion classification tools to SE corpora is not effective. Even within the SE domain, tool performance degrades significantly when trained on one communication channel and evaluated on another (e.g., StackOverflow vs. GitHub comments). Retraining a tool with channel-specific data takes significant effort since manually annotating a large dataset of ground truth data is expensive. In this paper, we address this data scarcity problem by automatically creating new training data using a data augmentation technique. Based on an analysis of the types of errors made by popular SE-specific emotion recognition tools, we specifically target our data augmentation strategy to improve the performance of emotion recognition. Our results show an average improvement of 9.3% in micro F1-Score for three existing emotion classification tools (ESEM-E, EMTk, SEntiMoji) when trained with our best augmentation strategy.
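The paper's augmentation strategies are targeted at the error types of the studied tools; purely as a generic illustration of text augmentation, here is an EDA-style random-swap augmenter that manufactures extra labeled variants of an utterance (example text and label are invented):

# Hedged sketch: EDA-style random-swap augmentation (a generic technique,
# not the paper's targeted strategy). Example utterance and label are invented.
import random

def random_swap(text: str, n_swaps: int = 2, seed: int = 0) -> str:
    """Return a copy of text with n_swaps random word-position swaps."""
    rng = random.Random(seed)
    words = text.split()
    for _ in range(n_swaps):
        if len(words) < 2:
            break
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return " ".join(words)

utterance, label = "I am so frustrated that this build keeps failing", "anger"
augmented = [(random_swap(utterance, seed=s), label) for s in range(3)]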
Keerthana Muthu Subash, Lakshmi Prasanna Kumar, Sri Lakshmi Vadlamani, Preetha Chatterjee, and Olga Baysal. “DISCO: A Dataset of Discord Chat Conversations for Software Engineering Research.” In The 19th International Conference on Mining Software Repositories (MSR), Data Showcase Track, 2022.
@inproceedings{subash2022disco,
author = {Subash, Keerthana Muthu and Kumar, Lakshmi Prasanna and Vadlamani, Sri Lakshmi and Chatterjee, Preetha and Baysal, Olga},
year = {2022},
title = {DISCO: A Dataset of Discord Chat Conversations for Software Engineering Research},
booktitle = {The 19th International Conference on Mining Software Repositories (MSR), Data Showcase Track},
month = may,
file = {DISCO.pdf}
}
Today, software developers work on complex and fast-moving projects that often require instant assistance from other domain and subject matter experts. Chat servers such as Discord facilitate live communication and collaboration among developers all over the world. With numerous topics discussed in parallel, mining and analyzing the chat data of these platforms would offer researchers and tool makers opportunities to develop software tools and services such as automated virtual assistants, chat bots, chat summarization techniques, Q&A thesaurus, and more. In this paper, we propose DISCO, a dataset of one year of public Discord chat conversations from four software development communities. We collected the chat data of channels containing general programming Q&A discussions from the four Discord servers, applied a disentanglement technique [13] to extract conversations from the chat transcripts, and manually validated a random sample of 500 conversations. Our dataset consists of 28,712 conversations comprising 1,508,093 messages posted by 323,562 users. As a case study on the dataset, we applied a topic modeling technique to extract the five most-discussed general topics in each Discord channel.
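As a minimal sketch of that kind of case study; the pipeline, parameters, and sample messages below are illustrative, not the paper's exact setup:

# Hedged sketch: topic modeling over chat messages with LDA.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

messages = [
    "how do I parse json in python",
    "my react component does not re-render",
    "pip install fails behind a proxy",
]

vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(messages)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(doc_term)

# Print the top words per topic.
terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[-5:][::-1]]
    print(f"topic {k}: {', '.join(top)}")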
Preetha Chatterjee. “Automatic Identification of Informative Code in Stack Overflow Posts.” In The 1st International Workshop on Natural Language-Based Software Engineering (NLBSE), co-located with ICSE, 2022.
@inproceedings{chatterjee2022automatic,
author = {Preetha Chatterjee},
year = {2022},
title = {Automatic Identification of Informative Code in Stack Overflow Posts},
booktitle = {The 1st International Workshop on Natural Language-based Software Engineering (NLBSE), co-located with ICSE},
month = may,
file = {Automatic Identification of Informative.pdf}
}
Despite Stack Overflow’s popularity as a resource for solving coding problems, identifying relevant information from an individual post remains a challenge. The overload of information in a post can make it difficult for developers to identify specific and targeted code fixes. In this paper, we aim to help users identify informative code segments, once they have narrowed down their search to a post relevant to their task. Specifically, we explore natural language-based approaches to extract problematic and suggested code pairs from a post. The goal of the study is to investigate the potential of designing a browser extension to draw the readers’ attention to relevant code segments and thus improve the experience of software engineers seeking help on Stack Overflow.
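A necessary pre-processing step for such a tool is isolating candidate code segments and nearby prose from a post body. A minimal sketch, assuming the post is available as HTML; the sample post is invented, and the informative-code classification itself (the paper's contribution) is not reproduced here:

# Hedged sketch: extract <pre> code blocks from a Stack Overflow post body
# and pair each with its preceding paragraph as candidate context.
from bs4 import BeautifulSoup

post_html = """
<p>My loop throws an IndexError:</p>
<pre><code>for i in range(len(xs) + 1): print(xs[i])</code></pre>
<p>Use the list directly instead:</p>
<pre><code>for x in xs: print(x)</code></pre>
"""

soup = BeautifulSoup(post_html, "html.parser")
for pre in soup.find_all("pre"):
    context = pre.find_previous("p").get_text()
    print(context, "->", pre.get_text().strip())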
Preetha Chatterjee, Tushar Sharma, and Paul Ralph. “Empirical Standards for Repository Mining.” In The 19th International Conference on Mining Software Repositories (MSR), Tutorial, 2022.
@inproceedings{chatterjee2022empirical,
author = {Chatterjee, Preetha and Sharma, Tushar and Ralph, Paul},
year = {2022},
title = {Empirical Standards for Repository Mining},
booktitle = {The 19th International Conference on Mining Software Repositories (MSR), Tutorial},
month = may,
file = {empirical standards.pdf}
}
The purpose of scholarly peer review is to evaluate the quality of scientific manuscripts. However, study after study demonstrates that peer review neither effectively nor reliably assesses research quality. Empirical standards attempt to address this problem by modelling a scientific community’s expectations for each kind of empirical study conducted in that community. This should enhance not only the quality of research but also the reliability and predictability of peer review, as scientists adopt the standards in both their researcher and reviewer roles. However, these improvements depend on the quality and adoption of the standards. This tutorial will therefore present the empirical standard for mining software repositories, both to communicate its contents and to get feedback from the attendees. The tutorial will be organized into three parts: (1) brief overview of the empirical standards project; (2) detailed presentation of the repository mining standard; (3) discussion and suggestions for improvement.
Preetha Chatterjee, Kostadin Damevski, Nicholas A. Kraft, and Lori Pollock. “Automatically Identifying the Quality of Developer Chats for Post Hoc Use.” ACM Transactions on Software Engineering and Methodology (TOSEM), 2021.
@article{chatterjee2021automatically,
author = {Chatterjee, Preetha and Damevski, Kostadin and Kraft, Nicholas A. and Pollock, Lori},
year = {2021},
title = {Automatically Identifying the Quality of Developer Chats for Post Hoc Use},
journal = {ACM Transactions on Software Engineering and Methodology (TOSEM)},
month = feb,
file = {Automatically Identifying the Quality of Developer Chats for Post Hoc Use.pdf},
links = {
Preprint = 'https://preethac.github.io/files/TOSEM21.pdf',
DOI = 'https://doi.org/10.1145/3450503'
}
}
Software engineers are crowdsourcing answers to their everyday challenges on Q&A forums (e.g., Stack Overflow) and more recently in public chat communities such as Slack, IRC, and Gitter. Many software-related chat conversations contain valuable expert knowledge that is useful for both mining to improve programming support tools and for readers who did not participate in the original chat conversations. However, most chat platforms and communities do not contain built-in quality indicators (e.g., accepted answers, vote counts). Therefore, it is difficult to identify conversations that contain useful information for mining or reading, i.e., conversations of post hoc quality. In this article, we investigate automatically detecting developer conversations of post hoc quality from public chat channels. We first describe an analysis of 400 developer conversations that indicate potential characteristics of post hoc quality, followed by a machine learning-based approach for automatically identifying conversations of post hoc quality. Our evaluation of 2,000 annotated Slack conversations in four programming communities (python, clojure, elm, and racket) indicates that our approach can achieve precision of 0.82, recall of 0.90, F-measure of 0.86, and MCC of 0.57. To our knowledge, this is the first automated technique for detecting developer conversations of post hoc quality.
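As a hedged baseline illustration of the classification step only (the paper trains over richer conversation-level characteristics; the texts and labels here are invented):

# Hedged sketch: a bag-of-words baseline for classifying conversations as
# post hoc quality (1) or not (0). Not the paper's feature set.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

conversations = [
    "Q: why does pip fail? A: upgrade setuptools, then reinstall",  # useful later
    "hey anyone around? ... guess not",                             # no reusable info
]
labels = [1, 0]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(conversations, labels)
print(clf.predict(["A: set PYTHONPATH before running the tests"]))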
Preetha Chatterjee, Kostadin Damevski, and Lori Pollock. “Automatic Extraction of Opinion-Based Q&A from Online Developer Chats.” In The 43rd International Conference on Software Engineering (ICSE), Technical Track, 2021.
@inproceedings{chatterjee2021automatic,
author = {Chatterjee, Preetha and Damevski, Kostadin and Pollock, Lori},
year = {2021},
title = {Automatic Extraction of Opinion-based Q&A from Online Developer Chats},
booktitle = {The 43rd International Conference on Software Engineering (ICSE), Technical Track},
month = may,
file = {Automatic Extraction.pdf}
}
Virtual conversational assistants designed specifically for software engineers could have a huge impact on the time it takes for software engineers to get help. Research efforts are focusing on virtual assistants that support specific software development tasks such as bug repair and pair programming. In this paper, we study the use of online chat platforms as a resource towards collecting developer opinions that could potentially help in building opinion Q&A systems, as a specialized instance of virtual assistants and chatbots for software engineers. Opinion Q&A has a stronger presence in chats than in other developer communications, thus mining them can provide a valuable resource for developers in quickly getting insight about a specific development topic (e.g., What is the best Java library for parsing JSON?). We address the problem of opinion Q&A extraction by developing automatic identification of opinion-asking questions and extraction of participants’ answers from public online developer chats. We evaluate our automatic approaches on chats spanning six programming communities and two platforms. Our results show that a heuristic approach to identifying opinion-asking questions works well (87% precision), and that a deep learning approach customized to the software domain outperforms heuristics-based and machine-learning-based approaches, as well as deep learning models for answer extraction in community question answering.
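A minimal sketch in the spirit of the heuristic component; the patterns below are illustrative assumptions, not the paper's rule set:

# Hedged sketch: pattern-based detection of opinion-asking questions.
import re

OPINION_PATTERNS = [
    r"\bwhat(?:'s| is) the best\b",
    r"\bwhich .* (?:do|would) you (?:recommend|prefer|suggest)\b",
    r"\bis it better to\b",
    r"\bany recommendations? for\b",
]

def is_opinion_asking(utterance: str) -> bool:
    text = utterance.lower()
    return text.endswith("?") and any(re.search(p, text) for p in OPINION_PATTERNS)

print(is_opinion_asking("What is the best Java library for parsing JSON?"))  # True
print(is_opinion_asking("Why does this regex not compile?"))                 # False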
Preetha Chatterjee, Minji Kong, and Lori Pollock. “Finding Help with Programming Errors: An Exploratory Study of Novice Software Engineers’ Focus in Stack Overflow Posts.” Journal of Systems and Software (JSS), 158:110454, 2020.
@article{chatterjee2020finding,
author = {Chatterjee, Preetha and Kong, Minji and Pollock, Lori},
year = {2020},
title = {Finding Help with Programming Errors: An Exploratory Study of Novice Software Engineers’ Focus in Stack Overflow Posts},
journal = {Journal of Systems and Software (JSS)},
month = jan,
volume = {158},
pages = {110454},
file = {Finding Help with Programming.pdf},
links = {
Preprint = 'https://preethac.github.io/files/JSS_19.pdf',
DOI = 'https://doi.org/10.1016/j.jss.2019.110454',
Slides = 'https://www.slideshare.net/PreethaChatterjee1/finding-help-with-programming-errors-an-exploratory-study-of-novice-software-engineers-focus-in-stack-overflow-posts'
}
}
Each month, 50 million users visit Stack Overflow, a popular Q&A forum used by software developers, to share and gather knowledge and seek help with coding problems. Although Q&A forums serve as a good resource for seeking help from developers beyond the local team, the abundance of information can cause developers, especially novice software engineers, to spend considerable time identifying relevant answers and suitable suggested fixes. This exploratory study aims to understand how novice software engineers direct their efforts and what kinds of information they focus on within a post selected from the results returned in response to a search query on Stack Overflow. The results can be leveraged to improve the Q&A forum interface, guide tools for mining forums, and potentially improve the granularity of traceability mappings involving forum posts. We qualitatively analyze the novice software engineers’ perceptions from a survey as well as their annotations of a set of Stack Overflow posts. Our results indicate that novice software engineers pay attention to only 27% of the code and 15–21% of the text in a Stack Overflow post to understand and determine how to apply the relevant information to their context. Our results also discern the kinds of information prominent in that focus.
Preetha Chatterjee, Kostadin Damevski, Nicholas A. Kraft, and Lori Pollock. “Software-Related Slack Chats with Disentangled Conversations.” In The 17th International Conference on Mining Software Repositories (MSR), Data Showcase Track, 2020.
@inproceedings{chatterjee2020software,
author = {Chatterjee, Preetha and Damevski, Kostadin and Kraft, Nicholas A. and Pollock, Lori},
year = {2020},
title = {Software-related Slack Chats with Disentangled Conversations},
booktitle = {The 17th International Conference on Mining Software Repositories (MSR), Data Showcase Track},
month = oct,
address = {Seoul, South Korea},
file = {Software-related Slack.pdf}
}
More than ever, developers are participating in public chat communities to ask and answer software development questions. With over ten million daily active users, Slack is one of the most popular chat platforms, hosting many active channels focused on software development technologies, e.g., python, react. Prior studies have shown that public Slack chat transcripts contain valuable information, which could provide support for improving automatic software maintenance tools or help researchers understand developer struggles or concerns. In this paper, we present a dataset of software-related Q&A chat conversations, curated for two years from three open Slack communities (python, clojure, elm). Our dataset consists of 38,955 conversations, 437,893 utterances, contributed by 12,171 users. We also share the code for a customized machine-learning based algorithm that automatically extracts (or disentangles) conversations from the downloaded chat transcripts.
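The released disentanglement code is a trained machine-learning model; as a toy illustration of the underlying idea only, a greedy heuristic can attach each message to a recent conversation based on author continuity, mentions, or word overlap:

# Hedged sketch: greedy heuristic disentanglement (illustrative only; the
# dataset's conversations were extracted with a trained ML model).
def disentangle(messages, window=5, overlap=0.3):
    conversations = []  # each conversation is a list of (author, text) pairs
    for author, text in messages:
        words = set(text.lower().split())
        best = None
        for conv in conversations[-window:]:
            last_author, last_text = conv[-1]
            last_words = set(last_text.lower().split())
            mentioned = f"@{last_author}" in text
            shared = len(words & last_words) / max(len(words | last_words), 1)
            if author == last_author or mentioned or shared >= overlap:
                best = conv
        if best is None:
            conversations.append([(author, text)])
        else:
            best.append((author, text))
    return conversations

chat = [("ann", "how do I undo a git commit?"),
        ("bob", "@ann git reset --soft HEAD~1"),
        ("cal", "anyone tried the new pytest release?")]
print(len(disentangle(chat)))  # 2 conversations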
Preetha Chatterjee. “Extracting Archival-Quality Information from Software-Related Chats.” In The 42nd International Conference on Software Engineering (ICSE), Doctoral Symposium Track, 2020.
@inproceedings{chatterjee2020extracting,
author = {Preetha Chatterjee},
year = {2020},
title = {Extracting Archival-Quality Information from Software-Related Chats},
booktitle = {The 42nd International Conference on Software Engineering (ICSE), Doctoral Symposium Track},
month = oct,
address = {Seoul, South Korea},
file = {Extracting Archival.pdf},
links = {
Preprint = 'https://preethac.github.io/files/ICSE_DocSymp_20.pdf',
DOI = 'https://dl.acm.org/doi/10.1145/3377812.3381391',
Slides = 'https://www.slideshare.net/PreethaChatterjee1/extracting-archivalquality-information-from-softwarerelated-chats-236867937'
}
}
Software developers are increasingly having conversations about software development via online chat services. Many of those chat communications contain valuable information, such as code descriptions, good programming practices, and causes of common errors/exceptions. However, the nature of chat community content is transient, as opposed to the archival nature of other developer communications such as email, bug reports and Q&A forums. As a result, important information and advice are lost over time. The focus of this dissertation is Extracting Archival Information from Software-Related Chats, specifically to (1) automatically identify conversations that contain archival-quality information, (2) accurately reduce the granularity of the information reported as archival information, and (3) conduct a case study to investigate how archival-quality information extracted from chats compares to related posts in Q&A forums. Archiving knowledge from developer chats could be used potentially in several applications such as creating a new archival mechanism available to a given chat community, augmenting Q&A forums, or facilitating the mining of specific information and improving software maintenance tools.
Preetha Chatterjee, Kostadin Damevski, Lori Pollock, Vinay Augustine, and Nicholas A. Kraft. “Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineering Tools.” In The 16th International Conference on Mining Software Repositories (MSR), Research Track, 2019.
@inproceedings{chatterjee2019exploratory,
author = {Chatterjee, Preetha and Damevski, Kostadin and Pollock, Lori and Augustine, Vinay and Kraft, Nicholas A.},
year = {2019},
title = {Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineering Tools},
booktitle = {The 16th International Conference on Mining Software Repositories (MSR), Research Track},
month = may,
address = {Montreal, Canada},
file = {Exploratory Study of Slack Q.pdf},
links = {
Preprint = 'https://preethac.github.io/files/MSR19.pdf',
DOI = 'https://dl.acm.org/citation.cfm?id=3341883.3341961',
Slides = 'https://www.slideshare.net/PreethaChatterjee1/exploratory-study-of-slack-qa-chats-as-a-mining-source-for-software-engineering-tools',
Media = 'https://new.abb.com/news/detail/26145/mining-slack'
}
}
Modern software development communities are increasingly social. Popular chat platforms such as Slack host public chat communities that focus on specific development topics such as Python or Ruby-on-Rails. Conversations in these public chats often follow a Q&A format, with someone seeking information and others providing answers in chat form. In this paper, we describe an exploratory study into the potential usefulness and challenges of mining developer Q&A conversations for supporting software maintenance and evolution tools. We designed the study to investigate the availability of information that has been successfully mined from other developer communications, particularly Stack Overflow. We also analyze characteristics of chat conversations that might inhibit accurate automated analysis. Our results indicate the prevalence of useful information, including API mentions and code snippets with descriptions, and several hurdles that need to be overcome to automate mining that information.
Preetha Chatterjee, Benjamin Gause, Hunter Hedinger, and Lori Pollock. “Extracting Code Segments and Their Descriptions from Research Articles.” In The 14th International Conference on Mining Software Repositories (MSR), Research Track, 2017.
@inproceedings{chatterjee2017extracting,
author = {Chatterjee, Preetha and Gause, Benjamin and Hedinger, Hunter and Pollock, Lori},
year = {2017},
title = {Extracting Code Segments and Their Descriptions from Research Articles},
booktitle = {The 14th International Conference on Mining Software Repositories (MSR), Research Track},
month = may,
address = {Buenos Aires, Argentina},
file = {Extracting Code Segments.pdf},
links = {
Preprint = 'https://preethac.github.io/files/MSR17.pdf',
DOI = 'https://ieeexplore.ieee.org/document/7962359',
Slides = 'https://www.slideshare.net/PreethaChatterjee1/extracting-code-segments-and-their-descriptions-from-research-articles'
}
}
The availability of large corpora of online software-related documents today presents an opportunity to use machine learning to improve integrated development environments by first automatically collecting code examples along with associated descriptions. Digital libraries of computer science research and education conference and journal articles can be a rich source for code examples that are used to motivate or explain particular concepts or issues. Because they are used as examples in an article, these code examples are accompanied by descriptions of their functionality, properties, or other associated information expressed in natural language text. Identifying code segments in these documents is relatively straightforward; this paper therefore tackles the problem of extracting the natural language text that is associated with each code segment in an article. We present and evaluate a set of heuristics that address the challenge that, unlike in developer communications such as online forums, the descriptive text is often not colocated with the code segment.
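One heuristic in this spirit exploits the fact that article code is usually labeled (“Listing 2”, “Figure 3”), so any paragraph citing that label is a candidate description even when it is far from the code. A minimal, hedged sketch; the details below are illustrative, not the paper's exact heuristics:

# Hedged sketch: collect paragraphs that reference a labeled code segment.
import re

def paragraphs_describing(label: str, paragraphs: list[str]) -> list[str]:
    pattern = re.compile(rf"\b{re.escape(label)}\b", re.IGNORECASE)
    return [p for p in paragraphs if pattern.search(p)]

article = [
    "Listing 2 shows a race condition in the logging module.",
    "We evaluated the patch on three benchmarks.",
    "The lock added in Listing 2 removes the race.",
]
print(paragraphs_describing("Listing 2", article))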
Preetha Chatterjee, Manziba Akanda Nishi, Kostadin Damevski, Vinay Augustine, Lori Pollock, and Nicholas A. Kraft. “What Information about Code Snippets Is Available in Different Software-Related Documents? An Exploratory Study.” In The 24th IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER), Early Research Achievements Track, 2017.
@inproceedings{chatterjee2017exploratory,
author = {Chatterjee, Preetha and Nishi, Manziba Akanda and Damevski, Kostadin and Augustine, Vinay and Pollock, Lori and Kraft, Nicholas A.},
year = {2017},
title = {What Information about Code Snippets Is Available in Different Software-Related Documents? An Exploratory Study},
booktitle = {The 24th IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER), Early Research Achievements Track},
month = feb,
address = {Klagenfurt, Austria},
file = {What Information.pdf},
links = {
Preprint = 'https://preethac.github.io/files/SANER17.pdf',
DOI = 'https://ieeexplore.ieee.org/document/7884638',
Slides = null,
Manuscript = null,
Dataset = null
}
}
A large corpus of software-related documents is available on the Web, and these documents offer a unique opportunity to learn from what developers are saying or asking about the code snippets that they are discussing. For example, the natural language in a bug report provides information about what is not functioning properly in a particular code snippet. Previous research has mined information about code snippets from bug reports, emails, and Q&A forums. This paper describes an exploratory study into the kinds of information that are embedded in different software-related documents. The goal of the study is to gain insight into the potential value and difficulty of mining the natural language text associated with the code snippets found in a variety of software-related documents, including blog posts, API documentation, code reviews, and public chats.