Even though current vision-language (V+L) models have achieved success in generating image captions, their captions often lack specificity and overlook various aspects of the image. Additionally, the attention learned through weak supervision operates opaquely and is difficult to control. To address these limitations, we propose the use of semantic roles as control signals in caption generation. Our hypothesis is that, by incorporating semantic roles as signals, the generated captions can be guided to follow specific predicate-argument structures. To validate the effectiveness of our approach, we conducted experiments using data and compared the results with a baseline model, VL-BART (CITATION). The experiments showed a significant improvement, with a gain of 45% in Smatch score (a standard NLP evaluation metric for semantic representations), demonstrating the efficacy of our approach. By focusing on specific objects and their associated semantic roles instead of providing a general description, our framework produces captions that exhibit enhanced quality, diversity, and controllability.
2023
CRAPES: Cross-modal Annotation Projection for Visual Semantic Role Labeling
Abhidip Bhattacharyya, Martha Palmer, and Christoffer Heckman
In The 12th Joint Conference on Lexical and Computational Semantics, Co-located with ACL, Jul 2023
In recent decades, there has been a significant push to leverage technology to aid both teachers and students in the classroom. Language processing advancements have been harnessed to provide better tutoring services, automated feedback to teachers, improved peer-to-peer feedback mechanisms, and measures of student comprehension for reading. Automated question generation systems have the potential to significantly reduce teachers’ workload in the latter. In this paper, we compare three different neural architectures for question generation across two types of reading material: narratives and textbooks. For each architecture, we explore the benefits of including question attributes in the input representation. Our models show that a T5 architecture has the best overall performance, with a RougeL score of 0.536 on a narrative corpus and 0.316 on a textbook corpus. We break down the results by attribute and discover that the attribute can improve the quality of some types of generated questions, including Action and Character, but this is not true for all models.
2022
Aligning Images and Text with Semantic Role Labels for Fine-Grained Cross-Modal Understanding
Abhidip Bhattacharyya, Cecilia Mauceri, Martha Palmer, and 1 more author
In Proceedings of the Thirteenth Language Resources and Evaluation Conference, Jun 2022
As vision processing and natural language processing continue to advance, there is increasing interest in multimodal applications, such as image retrieval, caption generation, and human-robot interaction. These tasks require close alignment between the information in the images and text. In this paper, we present a new multimodal dataset that combines state-of-the-art semantic annotation for language with the bounding boxes of corresponding images. This richer multimodal labeling supports cross-modal inference for applications in which such alignment is useful. Our semantic representations, developed in the natural language processing community, abstract away from the surface structure of the sentence, focusing on specific actions and the roles of their participants, a level that is equally relevant to images. We then utilize these representations in the form of semantic role labels in the captions and the images and demonstrate improvements in standard tasks such as image retrieval. The potential contributions of these additional labels are evaluated using a role-aware retrieval system based on graph convolutional and recurrent neural networks. The addition of semantic roles into this system provides a significant increase in capability and greater flexibility for these tasks, and could be extended to state-of-the-art techniques relying on transformers with larger amounts of annotated data.
2020
An Approach to Mine SBVR Vocabularies and Rules from Business Documents
Pavan Kumar Chittimalli, Chandan Prakash, Ravindra Naik, and 1 more author
In Proceedings of the 13th Innovations in Software Engineering Conference on Formerly Known as India Software Engineering Conference, Jun 2020
Enterprises model the behavior of their business to prepare a communication standard for business analysts and to specify requirements to Information Technology (IT) people. The communication gap between the IT group and business analysts, who lie on opposite ends of the business spectrum, exists due to the different terminologies used in their respective fields for the same context. This gap has led to major software failures, which prompted the OMG to come up with a new standard: Semantics of Business Vocabulary and Business Rules (SBVR). SBVR provides declarative models to represent Business Vocabulary and Business Rules that can be understood by everyone working across the business spectrum. Each business is governed by business rules, which are constrained by the organization’s policy guidelines and by government regulations imposed on the organization. Business rules are specified in documents like user guides, requirement documents, terms and conditions, and do’s and don’ts. Typically, a Business Analyst interprets the document and manually extracts rules based on his understanding, which leads to potential discrepancies, ambiguities, and quality issues in the software system. To minimize such errors, in this paper we present an unsupervised approach to automatically extract SBVR vocabularies and rules from domain-specific business documents. We also present our initial results and a comparative study with our earlier approach.
2019
SBVR-Based Business Rule Creation for Legacy Programs Using Variable Provenance
Pavan Kumar Chittimalli and Abhidip Bhattacharyya
In Proceedings of the 12th Innovations on Software Engineering Conference (Formerly Known as India Software Engineering Conference), Jun 2019
The functionality of a software system that implements business operations can be captured using business processes and rules. To understand the ‘as-is’ processes and rules, the source code is arguably the best source of knowledge. We present a novel method that combines program analysis and domain knowledge to create the descriptions for "IT rules", as a critical step towards extracting business rules automatically. We introduce and use the concept of ‘variable provenance’ to propagate the domain descriptions into the source code to create Semantics of Business Vocabularies and Rules (SBVR) rules. In our experiments on sample, near-real-life systems, we could successfully annotate a very large percentage (> 90%) of IT rules and enable the creation of SBVR rules. We present and describe the ProgAnnotator tool, which is based on variable provenance, generates descriptions for IT rules in the source code, and subsequently creates SBVR rules automatically.
2018
Relation Identification in Business Rules for Domain-Specific Documents
Abhidip Bhattacharyya, Pavan Kumar Chittimalli, and Ravindra Naik
In Proceedings of the 11th Innovations in Software Engineering Conference, Jun 2018
This paper focuses on an approach to mine business rules from documents and provides a methodology to represent them in a formal notation. Businesses operate by abiding by certain rules and complying with regulations and guidelines. The business rules are often written in English in operating procedures, terms and conditions, and various other supporting documents. The manual analysis of these rules for activities like impact analysis, maintenance, and business transformation leads to potential discrepancies, ambiguities, and quality issues. In this paper, we discuss our approach of mining relations among the rule intents (atomic facts) defined for business rules. We also present our preliminary studies on a couple of openly available documents.
2017
An Approach to Mine Business Rule Intents from Domain-Specific Documents
Abhidip Bhattacharyya, Pavan Kumar Chittimalli, and Ravindra Naik
In Proceedings of the 10th Innovations in Software Engineering Conference, Jun 2017
An enterprise system enables business by providing various services that are guided by a set of well-defined processes and adhere to certain business rules and constraints. The business rules are usually written in English in operating procedures, terms and conditions, and various other supporting documents. For implementing the business rules in a software system, or expressing them as UML use-case specifications, analysts manually interpret the documents, leading to potential discrepancies, ambiguities, and quality issues in the software system that can be resolved only after testing. To minimize such errors, we propose a novel method to mine the documents automatically to extract the fundamental atomic facts in every sentence, called business rule intents. We adopt a dependency tree parser to parse the rule sentences and extract rule intents from them. Our experiments using a few publicly available sample documents in the financial domain yielded very promising results, where rule intent extraction produced an average precision of 78% and recall of 80%.