publications
2023
- CodeBERTScore: Evaluating Code Generation with Pretrained Models of Code
  Shuyan Zhou, Uri Alon, Sumit Agarwal, and Graham Neubig. Mar 2023.
Since the rise of neural models of code that can generate long expressions and statements rather than a single next token, one of the major problems has been reliably evaluating their generated output. In this paper, we propose CodeBERTScore: an automatic evaluation metric for code generation, which builds on BERTScore (Zhang et al., 2020). Instead of relying on exact token matching as BLEU does, CodeBERTScore computes a soft similarity score between each token in the generated code and in the reference code, using the contextual encodings of large pretrained models. Further, instead of encoding only the generated tokens as in BERTScore, CodeBERTScore also encodes the programmatic context surrounding the generated code. We perform an extensive evaluation of CodeBERTScore across four programming languages. We find that CodeBERTScore achieves a higher correlation with human preference and with functional correctness than all existing metrics. That is, generated code that receives a higher score from CodeBERTScore is more likely to be preferred by humans, as well as to function correctly when executed. Finally, while CodeBERTScore can be used with a multilingual CodeBERT as its base model, we release five language-specific pretrained models to use with our publicly available code at this https URL. Our language-specific models have been downloaded more than 25,000 times from the Huggingface Hub.
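The soft-similarity idea behind BERTScore-style metrics can be sketched in a few lines: given contextual token embeddings for a candidate and a reference, greedy-match each token to its most similar counterpart and combine the matches into precision, recall, and F1. The sketch below is a simplified illustration, not the released CodeBERTScore implementation; random vectors stand in for the contextual encodings a pretrained model would produce.

```python
import numpy as np

def soft_f1(cand_emb, ref_emb):
    # Normalize token embeddings so dot products are cosine similarities.
    c = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    r = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    sim = c @ r.T  # pairwise cosine similarity, shape (|cand|, |ref|)
    # Greedy matching: each token pairs with its most similar counterpart.
    precision = sim.max(axis=1).mean()  # best reference match per candidate token
    recall = sim.max(axis=0).mean()     # best candidate match per reference token
    return 2 * precision * recall / (precision + recall)

# Identical token sequences score a perfect 1.0.
emb = np.random.default_rng(0).normal(size=(5, 8))
print(round(soft_f1(emb, emb), 6))  # → 1.0
```

Because similarity is computed in embedding space rather than over surface tokens, semantically equivalent code with different identifier names can still score highly, which BLEU's exact n-gram matching cannot capture.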
2022
- PRO-CS: An Instance-Based Prompt Composition Technique for Code-Switched Tasks
  Sumit Agarwal, Srijan Bansal, Suraj Tripathi, Teruko Mitamura, and Eric Nyberg. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Dec 2022.
Code-switched (CS) data is ubiquitous in today’s globalized world, but the dearth of annotated datasets for code-switching poses a significant challenge for learning diverse tasks across different language pairs. Parameter-efficient prompt-tuning approaches conditioned on frozen language models have shown promise for transfer learning in limited-resource setups. In this paper, we propose PRO-CS, a novel instance-based prompt composition technique for CS tasks that combines language and task knowledge. We compare our approach with prompt-tuning and fine-tuning for code-switched tasks on 10 datasets across 4 language pairs. Our model outperforms the prompt-tuning approach by significant margins across all datasets and outperforms or remains on par with fine-tuning while using just 0.18% of the total parameters. We also achieve competitive results when compared with the fine-tuned model in the low-resource cross-lingual and cross-task setting, indicating the effectiveness of our approach in incorporating new code-switched tasks.
- Zero-shot cross-lingual open domain question answering
  Sumit Agarwal, Suraj Tripathi, Teruko Mitamura, and Carolyn Penstein Rose. In Proceedings of the Workshop on Multilingual Information Access (MIA), Jul 2022.
People speaking different languages search for information in a cross-lingual manner. They tend to ask questions in their own language and expect the answer in that same language, even when the evidence lies in another language. In this paper, we present our approach to this task of cross-lingual open-domain question answering. Our proposed method employs a passage reranker, the fusion-in-decoder technique for generation, and a Wikidata entity-based post-processing system to tackle the inability to generate entities across all languages. Our end-to-end pipeline shows an improvement of 3 and 4.6 points on the F1 and EM metrics respectively, when compared with the baseline CORA model on the XOR-TyDi dataset. We also evaluate the effectiveness of our proposed techniques in the zero-shot setting using the MKQA dataset and show an improvement of 5 F1 points for high-resource languages and 3 points for low-resource zero-shot languages. Our team CMUmQA’s submission to the MIA shared task ranked 1st in the constrained dev setting and 2nd in the test setting.
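The F1 and EM numbers above are the standard token-overlap metrics used in open-domain QA evaluation. As a rough illustration of how they are computed, here is a simplified sketch; the shared task's official scorer additionally handles multilingual normalization and answer aliases, which this omits.

```python
from collections import Counter

def exact_match(pred, gold):
    # EM: 1 if the normalized strings are identical, else 0.
    return int(pred.strip().lower() == gold.strip().lower())

def token_f1(pred, gold):
    # F1 over the multiset of whitespace tokens shared by prediction and gold.
    p, g = pred.lower().split(), gold.lower().split()
    common = Counter(p) & Counter(g)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)
    recall = overlap / len(g)
    return 2 * precision * recall / (precision + recall)

print(exact_match("the Eiffel Tower", "The Eiffel Tower"))          # → 1
print(round(token_f1("Eiffel Tower in Paris", "the Eiffel Tower"), 3))  # → 0.571
```

F1 gives partial credit for answers that overlap the reference without matching it exactly, which is why the two metrics can improve by different amounts.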
- R3: Refined Retriever-Reader Pipeline for MultiDoc2Dial
  Sumit Agarwal, Srijan Bansal, Suraj Tripathi, Sireesh Gururaja, Aditya Srikanth Veerubhotla, Ritam Dutt, Teruko Mitamura, and Eric Nyberg. In Proceedings of the Second DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering, May 2022.
In this paper, we present our submission to the DialDoc shared task based on the MultiDoc2Dial dataset. MultiDoc2Dial is a conversational question answering dataset that grounds dialogues in multiple documents. The task involves grounding a user’s query in a document followed by generating an appropriate response. We propose several improvements over the baseline’s retriever-reader architecture to aid in modeling goal-oriented dialogues grounded in multiple documents. Our proposed approach employs sparse representations for passage retrieval, a passage re-ranker, the fusion-in-decoder architecture for generation, and a curriculum learning training paradigm. Our approach shows a 12-point improvement in BLEU score over the baseline RAG model.
- Model Transfer for Event Tracking as Transcript Understanding for Videos of Small Group Interaction
  Sumit Agarwal, Rosanna Vitiello, and Carolyn Rosé. In Proceedings of the First Workshop on Transcript Understanding, Oct 2022.
Videos of group interactions contain a wealth of information beyond what is directly communicated in a transcript of the discussion. Tracking who has participated throughout an extended interaction, and what each of their trajectories has been in relation to one another, is the foundation for joint activity understanding, though it comes with some unique challenges in videos of tightly coupled group work. Motivated by insights into the properties of such scenarios, including group composition and the properties of task-oriented, goal-directed work, we present a successful proof-of-concept. In particular, we present a transfer experiment to a dyadic robot construction task, an ablation study, and a qualitative analysis.
2017
- SLANT+: A Nonlinear Model for Opinion Dynamics in Social Networks
  Bhushan Kulkarni, Sumit Agarwal, Abir De, Sourangshu Bhattacharya, and Niloy Ganguly. In 2017 IEEE International Conference on Data Mining (ICDM), Oct 2017.
Online Social Networks (OSNs) have emerged as a global medium for forming and shaping opinions on a broad spectrum of topics like politics, e-commerce, and sports. Consequently, research on understanding and predicting opinion dynamics in OSNs, especially using tractable linear models, abounds in the literature. However, these linear models are too simple to uncover the actual complex dynamics of opinion flow in social networks. In this paper, we propose SLANT+, a novel nonlinear generative model for opinion dynamics, extending our earlier linear opinion model SLANT. To design this model, we rely on a network-guided recurrent neural network architecture that learns a proper temporal representation of the messages as well as the underlying network. Furthermore, we probe various signals from real-life datasets and offer a conceptually interpretable nonlinear function that not only provides concrete clues about the opinion exchange process but also captures the coupled dynamics of message timings and opinion flow. As a result, on five real-life datasets crawled from Twitter, our proposal gives a significant accuracy boost over six state-of-the-art baselines.