
During my time at LTI, while working on a variety of NLP topics and learning about the prevailing challenges in the field, I developed a desire to address three basic language needs using current state-of-the-art NLP approaches: (a) Easy Information Access, (b) Efficient Information Access, and (c) Multilingual Information Access.

Easy Information Access

Information is often embedded in multiple modalities, such as natural language, structured data, images, and code. I want to make this information easily accessible using state-of-the-art multimodal and language models. As part of my current research under Prof. Teruko, I am working on Knowledge-directed Artificial Intelligence Reasoning Over Schemas (KAIROS), a DARPA-funded project that aims to identify complex events described in documents, generate event schemas, and use these schemas to infer missing events or predict future ones. I am currently interested in using code language models such as Codex to generate these schemas by converting text into Python snippets, thereby casting schema generation as a structured prediction task.

Efficient Information Access

State-of-the-art language models such as T5 and GPT encode vast amounts of information, which enables them to perform well on downstream tasks. However, retraining these models for ever-evolving tasks and languages is computationally expensive. To address this, I want to enable efficient information access through parameter-efficient techniques (PETs), which have recently gained popularity in NLP. In my EMNLP’22 work on prompt composition for code-switched languages, I showed that instance-based prompts can be composed from task prompts and language prompts learned separately on monolingual corpora of the two languages that make up the code-switched pair.

Multilingual Information Access

Having grown up multilingual, I have always cared about NLP for non-English languages. This is especially true for languages such as Hindi and Bengali, which rank among the most widely spoken languages in the world. State-of-the-art multilingual models still struggle with these languages on tasks like question answering and summarization. In my NAACL’22 work, I studied cross-lingual open-domain question answering over Wikipedia and showed improved performance in zero-shot settings.

For more details about these projects, please refer to my resume.