This workshop aims to accelerate research at the intersection of symbolic knowledge and the statistical knowledge inherent in LLMs. The objective is to establish quantifiable methods and accepted metrics for addressing consistency, reliability, and safety in LLMs. In parallel, we seek unimodal and multimodal NeuroSymbolic solutions that mitigate LLM issues through context-aware explanations and reasoning. The workshop also focuses on critical applications of LLMs in health informatics, biomedical informatics, crisis informatics, cyber-physical systems, and the legal domain. We invite submissions that present novel developments and assessments of informatics methods, including those that showcase the strengths and weaknesses of using LLMs.
Time (GMT+2) | Event | Details |
---|---|---|
8:50 AM - 9:00 AM | Welcome Address and Introduction | Organizing Committee Members and Keynote Speakers |
9:00 AM - 9:40 AM | Keynote Talk #1 | Dr. Amit P. Sheth |
9:45 AM - 10:00 AM | Oral Presentation 1 | Vinicius Monteiro de Lira et al. |
10:00 AM - 10:15 AM | Oral Presentation 2 | Hannah Sansford et al. |
10:15 AM - 10:40 AM | Invited Talk #1 | Andrii Skomorokhov, Haltia.AI |
10:45 AM - 11:20 AM | Keynote Talk #2 | Dr. Alex Jaimes, Dataminr Inc. |
11:20 AM - 11:30 AM | BREAK (10 Mins) | |
11:30 AM - 11:35 AM | Lightning Talk 1 | Bing Hu et al. |
11:35 AM - 11:40 AM | Lightning Talk 2 | Amirhossein Ghaffari et al. |
11:40 AM - 11:45 AM | Lightning Talk 3 | Walid S. Saba |
11:45 AM - 11:50 AM | Lightning Talk 4 | Firuz Juraev et al. |
11:50 AM - 11:55 AM | BREAK (5 Mins) | |
11:55 AM - 12:30 PM | Keynote Talk #3 | Dr. Huzefa Rangwala, Amazon/George Mason University |
12:30 PM - 12:45 PM | Invited Talk #2 | Negar Foroutan Eghlidi, EPFL (Antoine Bosselut) |
12:45 PM - 1:00 PM | Oral Presentation #3 | Ishwar B Balappanawar |
1:00 PM - 1:15 PM | Oral Presentation #4 | Ziyi Shou et al. |
| Closing Remarks for the Workshop | |
Theme: Improving LLMs with Consistency, Reliability, Explainability, and Safety
NeuroSymbolic and Knowledge-infused Learning
Camera Ready Submission: ~~June 30, 2024~~ July 7, 2024
We welcome original research papers in four submission types:
A skilled and multidisciplinary program committee will evaluate all submitted papers, focusing on the originality of the work and its relevance to the workshop's theme. Submissions must follow the KDD 2024 conference template and will undergo a double-blind review process. More details regarding submission can also be found at https://kdd2024.kdd.org/research-track-call-for-papers/. Selected papers will be presented at the workshop and published open access in the workshop proceedings through CEUR, where they will be available as archival content.
University of Maryland Baltimore County, USA
(Primary Contact)
Email: manas@umbc.edu
Samsung Research, Cambridge, UK
Email: efi.tsamoura@samsung.com
Booz Allen Hamilton, USA
Email: Raff_Edward@bah.com
Amazon, USA
Email: veduln@amazon.com
Ohio State University, USA
Email: srini@cse.ohio-state.edu
Research Lead, NLP at Haltia.AI
Email: andrii.skomorokhov@haltia.ai
PhD student at EPFL
Email: negar.foroutan@epfl.ch
Abstract: Pedro Domingos's influential 2012 paper made a crucial point with the phrase "Data alone is not enough." I have long shared this belief, as reflected in our Semantic Search engine, commercialized in 2000 and detailed in a patent. We enhanced machine learning classifiers with a comprehensive WorldModel™, known today as a knowledge graph, to improve named entity recognition, relationship extraction, and semantic search. This early project highlighted the synergy between data-driven statistical learning and knowledge-supported symbolic AI methods, a key idea driving the fast-emerging field of NeuroSymbolic AI.
LLMs, while impressive in their ability to understand and generate human-like text, have limitations in reasoning. They excel at pattern recognition, language processing, and generating coherent text from input. However, their reasoning capabilities are limited by their lack of true understanding or awareness of concepts, contexts, and causal relationships beyond the statistical patterns in the data they were trained on. While they can perform certain types of reasoning tasks (e.g., simple logical deductions or basic arithmetic), they often struggle with more complex forms of reasoning that require deeper understanding, context awareness, or commonsense knowledge. They may produce responses that appear rational on the surface but lack genuine comprehension or logical consistency. Furthermore, their reasoning does not adapt well to the changing environment (where data and knowledge change) in which the AI model operates.
Solution: Neurosymbolic AI combined with Custom and Compact Models: AI models can be augmented with neurosymbolic methods and external knowledge sources, resulting in compact (small size, high performance) and custom (vertical, addressing specific application/use) models. They can support efficient adaptation to changing data and knowledge. By integrating neurosymbolic approaches, these models acquire a structured understanding of data, enhancing interpretability and reliability (e.g., through verifiability audits using reasoning traces). This structured understanding fosters safer and more consistent behavior and facilitates efficient adaptation to evolving information, ensuring agility in handling dynamic environments. Furthermore, incorporating external knowledge sources enriches the model's understanding and adaptability for the chosen domains, bolstering its efficiency in tackling varied specialized tasks. The small size of these models enables rapid deployment and contributes to computational efficiency, better management of constraints, and faster re-training/fine-tuning/inference. Our current work involves applications to health, autonomous vehicles, and smart manufacturing.
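To make the knowledge-infusion idea above concrete, here is a minimal illustrative sketch (not taken from the talk or any described system) of grounding an LLM prompt in a small symbolic knowledge graph and auditing the answer against it. The toy triples, the helper names, and the `call_llm` stub are assumptions introduced purely for illustration.

```python
from typing import List, Tuple

# Toy knowledge graph: (subject, relation, object) triples for one domain.
KG: List[Tuple[str, str, str]] = [
    ("metformin", "treats", "type 2 diabetes"),
    ("metformin", "contraindicated_with", "severe renal impairment"),
]

def retrieve_facts(entity: str) -> List[str]:
    """Symbolic retrieval: verbalize every triple that mentions the entity."""
    return [f"{s} {r.replace('_', ' ')} {o}" for s, r, o in KG if entity in (s, o)]

def build_grounded_prompt(question: str, entity: str) -> str:
    """Prepend retrieved facts so the LLM answers over curated knowledge."""
    facts = "\n".join(f"- {f}" for f in retrieve_facts(entity))
    return f"Use only these facts:\n{facts}\n\nQuestion: {question}"

def audit_answer(answer: str, entity: str) -> bool:
    """Crude verifiability audit: the answer should mention at least one
    object linked to the entity in the knowledge graph."""
    objects = [o for s, _, o in KG if s == entity]
    return any(o.lower() in answer.lower() for o in objects)

if __name__ == "__main__":
    prompt = build_grounded_prompt("What does metformin treat?", "metformin")
    print(prompt)
    # answer = call_llm(prompt)                # hypothetical LLM API call
    # print(audit_answer(answer, "metformin"))
```

The sketch separates the symbolic component (retrieval and audit over explicit triples) from the statistical component (the LLM call), which is the division of labor the abstract attributes to neurosymbolic augmentation; real systems would replace the toy graph and string matching with proper KG stores and reasoning over traces.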
Bio: Professor Amit Sheth is an educator, researcher, and entrepreneur. He founded the university-wide AI Institute at the University of South Carolina (AIISC) in 2019 and grew it to nearly 50 AI researchers in four years. He is a fellow of IEEE, AAAI, AAAS, ACM, and AIAA. His awards include the IEEE CS Wallace McDowell Award and the IEEE TCSVC Research Innovation Award. He has co-founded four companies and ran two of them. These include Taalee/Semagix (founded 1999), which pioneered Semantic Search; ezDI (founded 2014), which supported knowledge-infused clinical NLP/NLU; and Cognovi Labs (founded 2016), an emotion AI company. He is proud of the success of the more than 45 Ph.D. advisees and postdocs he has advised and mentored.
Abstract: Dataminr’s AI Platform discovers the earliest signals of events, risks, and threats from billions of multi-modal inputs from over one million public data sources. It uses predictive AI to detect events, generative AI to describe them, and regenerative AI to generate live briefs that continuously update as events unfold. The events discovered by the platform help first responders quickly respond to emergencies, they help corporate security teams respond to risks (including Cyber risks), and they help news organizations discover breaking events to provide fast and accurate coverage. Building and deploying a large-scale AI platform like Dataminr’s is fraught with research and technical challenges. This includes tackling the hardest problem in AI (determining the real-time value of information), which requires combining a multitude of AI approaches. In this talk, I will focus on the role of knowledge in the platform, particularly the role of Knowledge Graphs and how they can be used in conjunction with LLMs in critical real-time applications. I’ll point to the main research challenges in building and leveraging Knowledge Graphs in critical applications.
Bio: Alex Jaimes leads the AI efforts at Dataminr, focusing on leveraging AI to detect and respond to critical events in real-time. His work has significant impacts on first responders, corporate security teams, and news organizations, providing them with the necessary tools to act quickly and accurately in high-stakes situations.
Bio: At AWS AI/ML, Huzefa Rangwala leads a team of scientists and engineers advancing AWS services through graph machine learning, reinforcement learning, AutoML, low-code/no-code generative AI, and personalized AI solutions. His passion extends to transforming the analytical sciences with the power of generative AI. He is a Professor of Computer Science and the Lawrence Cranberg Faculty Fellow at George Mason University, where he also served as interim Chair from 2019 to 2020. He is the recipient of the National Science Foundation (NSF) CAREER Award, the 2014 university-wide Teaching Award, the Emerging Researcher/Creator/Scholar Award, and the 2018 Undergraduate Research Mentor Award. In 2022, he co-chaired the ACM SIGKDD conference in Washington, DC. His research interests include structured learning, federated learning, and ML fairness, intertwined with applying ML to problems in biology, biomedical engineering, and the learning sciences.