Understanding the role of knowledge graphs in large language model accuracy.
Source: Data.World Podcasts
Investing in knowledge graphs provides higher accuracy for LLM-powered question-answering systems. That's the conclusion of the latest research that Juan Sequeda, Dean Allemang, and Bryon Jacob have recently presented. In this episode, they dive into the details of this research and explain why, to succeed in this AI world, enterprises must treat business context and semantics as a first-class citizen.
Topic Summary
Investing in knowledge graphs significantly improves the accuracy of large language models (LLMs) for question-answering systems, especially in enterprise settings. The presenters argue that knowledge graphs are a requirement for successful generative AI applications in enterprises. The main study cited is research conducted by Juan Sequeda, Dean Allemang, and Bryon Jacob. Their experiment compared the accuracy of LLM-powered question-answering systems with and without knowledge graphs.
Bottom-Line Up-Front
Overall accuracy increased from 16% without a knowledge graph to 54% with a knowledge graph (a 3x improvement).
For easy questions on easy data, accuracy improved from 25% to 70%.
For complicated questions on easy data, accuracy improved from 37% to 67%.
For questions requiring more than five tables, accuracy improved from 0% to 35-38% with a knowledge graph.
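To make the with/without comparison behind these percentages concrete, here is a minimal sketch of how such a benchmark could be scored. Everything in it is an illustrative assumption, not the authors' actual evaluation harness: the BenchmarkQuestion fields, the stubbed ask_llm call, and the exact-match scoring rule are all placeholders.

```python
# Minimal sketch of an accuracy comparison like the one reported above.
# All names here (BenchmarkQuestion, ask_llm) are illustrative assumptions,
# not the authors' actual benchmark code.
from dataclasses import dataclass


@dataclass
class BenchmarkQuestion:
    text: str                # natural-language question
    gold_answer: frozenset   # expected result values
    difficulty: str          # e.g. "easy question / easy schema"


def ask_llm(question: str, use_knowledge_graph: bool) -> frozenset:
    """Hypothetical stand-in for the real pipeline: prompt an LLM with either
    the raw SQL schema or the ontology/knowledge-graph context, execute the
    generated query, and return its result set."""
    return frozenset()  # stubbed out so the sketch runs without an LLM


def accuracy(questions: list[BenchmarkQuestion], use_knowledge_graph: bool) -> float:
    correct = sum(
        ask_llm(q.text, use_knowledge_graph) == q.gold_answer for q in questions
    )
    return correct / len(questions) if questions else 0.0
```

Running the same question set twice, once with use_knowledge_graph=False and once with True, and grouping the results by difficulty is the kind of comparison that would yield per-category numbers like those above.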
Other Call-Outs
Defining knowledge graphs
Combination of Ontology and Data: A knowledge graph consists of both a semantic layer and a data layer. The semantic layer, also referred to as the ontology, represents the business context and relationships, while the data layer contains the actual data.
Explicit Relationships: Unlike SQL databases where relationships are implicit (e.g., through foreign keys), knowledge graphs make these relationships explicit, often using a subject-predicate-object structure similar to natural language.
Virtualization: In their experiments, the knowledge graph was virtualized, meaning the data remained in SQL databases but was accessed and interpreted as a graph through semantic virtualization technology.
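As a toy illustration of the "explicit relationships" point above, consider the same fact held as a foreign-key column versus as subject-predicate-object triples. The table, identifiers, and predicate names below are made up for the example and are not from the research.

```python
# Toy contrast (illustrative names only): an implicit SQL-style link vs.
# the explicit subject-predicate-object triples of a knowledge graph.

# Relational view: the relationship is implicit in a foreign-key column.
orders_table = [
    {"order_id": 1001, "customer_id": 42, "total": 250.0},
]

# Graph view: the same facts, with each relationship named explicitly.
triples = [
    ("order:1001",  "placedBy", "customer:42"),
    ("order:1001",  "hasTotal", 250.0),
    ("customer:42", "hasName",  "Acme Corp"),
]

# Because predicates carry business meaning, a query can ask for
# "orders placed by this customer" without knowing which column joins to which key.
orders_by_customer = [s for (s, p, o) in triples if p == "placedBy" and o == "customer:42"]
print(orders_by_customer)  # ['order:1001']
```

In the virtualized setup the presenters describe, triples like these would not be materialized; a semantic virtualization layer would expose the underlying SQL rows as if they were this graph.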
Key findings
Knowledge graphs significantly improve LLM accuracy: The research showed that using a knowledge graph increased overall accuracy from 16% to 54% (a 3x improvement) compared to using SQL databases alone.
A requirement for generative AI: Juan Sequeda states that "Knowledge graphs are a requirement for generative AI" and emphasizes that investing in knowledge graphs is necessary to succeed in the AI world.
Providing context and semantics: Knowledge graphs make relationships between data explicit, providing crucial context and semantics that LLMs can leverage.
Combining ontology and data: Dean Allemang describes a knowledge graph as consisting of both the semantic layer (ontology) and the data layer, whether physically represented as a graph or virtualized.
Improving complex queries: Knowledge graphs were especially effective for complex questions requiring multiple tables, where SQL-only approaches failed completely.
Enabling business-level querying: Dean mentions that knowledge graphs allow querying data from a business perspective, unifying data from various sources.
Evolving value proposition: The presenters note that the value of knowledge graphs has increased significantly with the advent of generative AI, compared to just a year ago.
Scientific approach: They emphasize the importance of empirically testing the effectiveness of knowledge graphs and invite others to build upon their research.
The presenters are excited about the potential for knowledge graphs to transform how enterprises interact with their data, and they single out four things in particular:
The potential for further improvements: Dean Allemang mentions that prompt engineering, multi-shot approaches, and RAG (Retrieval Augmented Generation) could significantly enhance accuracy (a toy sketch of the retrieval-augmented prompting idea follows this list).
The scientific approach: They emphasize the importance of empirical evidence and invite others to build upon their research. Dean says, "This framework can answer that same question for anything else that you think is valuable."
Turning ontology engineering into a science: Dean is excited about using this approach to measure the effectiveness of ontologies, potentially transforming ontology engineering from a craft into a more quantifiable science.
The productivity boost in enterprises: Juan Sequeda is enthusiastic about the potential for knowledge graphs combined with LLMs to significantly increase productivity in enterprise settings.
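Here is a minimal sketch of the retrieval-augmented prompting idea mentioned above, assuming a naive keyword retriever and a made-up store of ontology snippets; a production system would retrieve with embedding similarity and send the assembled prompt to an actual LLM.

```python
# Minimal sketch (assumptions throughout) of retrieval-augmented prompting
# over a knowledge graph: pull the ontology fragments relevant to the
# question into the prompt before asking an LLM to write a query.

ONTOLOGY_SNIPPETS = {
    "claims": "Claim --filedBy--> Policyholder ; Claim --coveredBy--> Policy",
    "agents": "Agent --sells--> Policy ; Agent --worksFor--> Agency",
}


def retrieve_context(question: str) -> str:
    """Naive keyword retrieval; a real system would use embedding similarity."""
    q = question.lower()
    return "\n".join(snippet for key, snippet in ONTOLOGY_SNIPPETS.items() if key in q)


def build_prompt(question: str) -> str:
    return (
        "You are a query generator. Use only the relationships listed below.\n"
        f"Ontology context:\n{retrieve_context(question)}\n"
        f"Question: {question}\n"
        "Write a SPARQL query that answers the question."
    )


print(build_prompt("How many claims were filed by each policyholder?"))
```

The design point is simply that the retrieved context comes from the ontology, so the model is steered toward relationships the business has already made explicit rather than guessing at joins.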
Real-world studies
Dean mentions a customer who initially was "dating knowledge graphs" but ended up "falling in love" with them after seeing the benefits in expressing complex business rules.
Juan discusses the potential for using LLMs to accelerate knowledge engineering tasks, extracting information from people's heads more efficiently.
They compare the current value proposition of knowledge graphs to what it was just a year ago, emphasizing how much more valuable they've become with the advent of generative AI.
About the Episode
Biographies
Juan Sequeda is a leading expert in data management and knowledge graphs, focusing on enhancing the accuracy of large language models (LLMs) in enterprise question-answering systems. His research demonstrates that knowledge graphs can significantly improve LLM performance, achieving a threefold increase in accuracy. Sequeda emphasizes the importance of integrating business context and semantics into AI applications and actively engages with the data community to share insights and findings.
Dean Allemang is a recognized authority in the semantic web and knowledge graphs, with extensive experience in the field. He is the author of "Semantic Web for the Working Ontologist" and has contributed significantly to the practical application of knowledge graphs in enterprise data management. Allemang's research highlights the critical role of knowledge graphs in improving LLM accuracy, particularly for complex queries. He advocates for a scientific approach to ontology engineering, aiming to transform it into a more empirical discipline while fostering community collaboration in exploring knowledge representation.
Bryon Jacob is a collaborator on the research presented by Sequeda and Allemang and is known for his work on knowledge graphs and their applications in the enterprise. Jacob focuses on leveraging data semantics to enhance the accuracy and efficiency of question-answering systems, aligning with the podcast's theme of the importance of knowledge graphs in generative AI.