- From structured data to actionable narratives: The power of narrativized embeddings
- How it works: From data to narrative to value
- Narrativized embeddings in action
- Developing narrativized embeddings
- Getting the templates right
- Narrativized embeddings versus knowledge graphs and Text2SQL
- Combining narrativized embeddings with knowledge graphs
- The advantages of narrativized embeddings
- A scalable solution for structuring and querying complex data
From structured data to actionable narratives: The power of narrativized embeddings
Extracting and applying semantic relationships from structured data is crucial for AI to deliver reliable results. Knowledge graphs are powerful but can take a lot of effort to build and maintain. Enter narrativized embeddings: an agile approach that uses templates to turn structured data into narratives.
Narrativized embeddings can boost semantic understanding and pave the way for more advanced applications such as retrieval-augmented generation (RAG) in conversational AI systems.
How it works: From data to narrative to value
Narrativized embeddings use templates to convert database rows into clear, context-rich narratives. These narratives can be short or span entire documents, capturing the relationships and meaning that matter most. For instance, an electronic medical record can be narrativized into a long document that describes a patient's history, mimicking the structure and details found in traditional records. Because the template controls the layout, related fields or measures can be placed close together in the text, which makes retrieval and contextual understanding easier.
Once created, these narratives are chunked, vectorized, and stored for fast, context-aware search. The result? When you ask a question, the system retrieves the most relevant, semantically connected answers – no complex SQL or graph queries required.
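The chunking step can be sketched as follows. This is a minimal illustration, assuming fixed-size word windows with some overlap; the function name `chunk_narrative` and the parameters `max_words` and `overlap` are hypothetical, and a real system might split on sentences or template sections instead.

```python
def chunk_narrative(text: str, max_words: int = 40, overlap: int = 10) -> list[str]:
    """Split a narrative into overlapping word windows (hypothetical helper)."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once a window has reached the end of the narrative.
        if start + max_words >= len(words):
            break
    return chunks


# Toy narrative of 100 placeholder words, just to show the windowing.
long_narrative = " ".join(f"w{i}" for i in range(100))
segments = chunk_narrative(long_narrative, max_words=40, overlap=10)
```

The overlap keeps fields that sit near a chunk boundary visible in both neighboring chunks, so a related pair of values is less likely to be split apart before vectorization.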
Narrativized embeddings in action
Imagine you have related tables of drug development data.
| Compound | IC50 | hERG | LD50 | T1/2 |
| Compound A | 1uM | 2uM | 5uM | 12h |
| Compound B | 6uM | 20uM | 20mM | 6h |
| Compound C | 10uM | 1mM | 10mM | 12h |
| Compound | Species | Cyp2D6 t1/2 | Cyp2D6 t1/2 |
| Compound A | Human | 2uM | 5uM |
| Compound B | Human | 20uM | 20mM |
| Compound C | Human | 1mM | 10mM |
With narrativized embeddings, you can use a template like this to generate a short narrative per row and then create embeddings. Although the template draws on fields from separate tables, the resulting narrative still carries their semantic meaning.
<Compound> inhibited <Target> at a concentration of <IC50>, with a half-life of <t1/2> in <species> when administered <route>. It inhibited hERG at a concentration of <hERG> and shows a toxicity (LD50) of <LD50> in <species>
This approach brings together data from multiple tables, making connections explicit and easy to find. Also, multiple templates can be built to capture different sections and connections within the dataset.
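As a rough sketch of how such a template might be filled, the snippet below joins two small tables on the compound name and substitutes `<Field>` placeholders. The helper names `join_on` and `render` are hypothetical, the template is a trimmed version of the one above (it omits `<Target>` and `<route>`, which do not appear in the sample tables), and only the species column is taken from the second table.

```python
import re

# Rows copied from the sample tables above (first two compounds only).
assays = [
    {"Compound": "Compound A", "IC50": "1uM", "hERG": "2uM", "LD50": "5uM", "T1/2": "12h"},
    {"Compound": "Compound B", "IC50": "6uM", "hERG": "20uM", "LD50": "20mM", "T1/2": "6h"},
]
species = [
    {"Compound": "Compound A", "Species": "Human"},
    {"Compound": "Compound B", "Species": "Human"},
]

# Trimmed template: an assumption made for this sketch, not the full original.
TEMPLATE = (
    "<Compound> showed an IC50 of <IC50>, with a half-life of <t1/2> in <species>. "
    "It inhibited hERG at a concentration of <hERG> and shows a toxicity (LD50) "
    "of <LD50> in <species>."
)


def join_on(left, right, key):
    """Inner-join two lists of row dicts on a shared key column."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left if row[key] in right_by_key]


def render(template, row):
    """Replace each <Field> placeholder with the row value (case-insensitive)."""
    lookup = {name.lower(): value for name, value in row.items()}
    # A placeholder with no matching column raises KeyError rather than passing silently.
    return re.sub(r"<([^<>]+)>", lambda m: str(lookup[m.group(1).lower()]), template)


narratives = [render(TEMPLATE, row) for row in join_on(assays, species, "Compound")]
```

Failing loudly on an unknown placeholder is a deliberate choice here: a silently empty field would produce a fluent but wrong narrative, which is harder to spot once it has been embedded.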
Although creating templates and connections takes time, the templates work directly on existing databases without data remodeling. This flexible approach lets organizations explore semantic and graph technologies without committing to a full knowledge graph.
Developing narrativized embeddings
Narrativized embeddings involve transforming each row of a structured database into a coherent narrative that encapsulates the data's inherent semantics. This process includes:
- Template-based narrative generation: Each database entry is converted into a narrative that not only describes the data but also integrates information from related tables.
- Embedding conversion: The narratives are then transformed into vector embeddings. These embeddings serve as dense, numerical representations of the narratives, capturing their semantic essence.
- Vector search and RAG integration: Embeddings enable efficient vector search and retrieval of relevant information. In RAG systems, they support context-aware responses, improving output accuracy and relevance.
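The embedding and retrieval steps above can be sketched in a few lines. This is a toy illustration only: `embed` is a bag-of-words stand-in for a real embedding model (which would return dense vectors), the list `index` stands in for a vector database, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a sparse bag-of-words vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query, index, top_k=2):
    """Return the top_k narratives most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


narratives = [
    "Compound A showed an IC50 of 1uM, with a half-life of 12h in Human.",
    "Compound B showed an IC50 of 6uM, with a half-life of 6h in Human.",
    "Compound C showed an IC50 of 10uM, with a half-life of 12h in Human.",
]
index = [(text, embed(text)) for text in narratives]  # stand-in for a vector store

question = "What is the half-life of Compound B?"
context = search(question, index, top_k=1)
# In a RAG system the retrieved narratives are passed to the language model as context.
prompt = "Answer using only this context:\n" + "\n".join(context) + "\nQuestion: " + question
```

Swapping `embed` for a real model and `index` for a vector database would leave the overall flow unchanged: narrate, embed, store, retrieve, then generate.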
Getting the templates right
The key is to start simple and then build complexity.
- Entity naming: Make sure entity names are clearly defined and included in the narratives. Clear naming is critical for telling one embedding apart from another.
- Understanding relationships: Begin with templates that use a basic set of relationships and then expand to incorporate deeper or more specific connections. Multiple templates can cater to various facets of the data, such as general overviews versus in-depth analyses.
- Negative embeddings: These can be used to define relationships that are invalid or disallowed. This adds a layer of sanitization and guardrails to minimize incorrect interpretations or queries.
- Iterative development: Start with simple templates and refine them over time. As you begin to understand your data better, update the templates to capture more nuanced relationships and target specific aspects of the process.
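One way to keep templates honest as they evolve is a small check that every template names its entity and only references known columns. The sketch below is an assumption about how such a check might look, not part of the approach as described; `check_template` and the example templates are hypothetical.

```python
import re

PLACEHOLDER = re.compile(r"<([^<>]+)>")


def check_template(template, columns, entity_field):
    """Return a list of problems found in a template (empty list means OK)."""
    fields = {name.lower() for name in PLACEHOLDER.findall(template)}
    known = {name.lower() for name in columns}
    problems = []
    # Entity naming: the narrative must say which entity it describes.
    if entity_field.lower() not in fields:
        problems.append(f"missing entity placeholder <{entity_field}>")
    # Every placeholder must map to a column that actually exists.
    for name in sorted(fields - known):
        problems.append(f"unknown field <{name}>")
    return problems


columns = ["Compound", "IC50", "hERG", "LD50", "T1/2", "Species"]

# Multiple templates can cover different facets of the same data.
templates = {
    "overview": "<Compound> has a half-life of <T1/2> in <Species>.",
    "draft": "Half-life of <T1/2> when administered <route>.",
}
report = {name: check_template(text, columns, "Compound") for name, text in templates.items()}
```

Running a check like this whenever a template is added or refined fits the start-simple, iterate-later workflow, since mistakes surface before any narratives are generated and embedded.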
Narrativized embeddings versus knowledge graphs and Text2SQL
Narrativized embeddings can be applied to a range of topics and queries, and multiple templates can represent different relationships or aspects of the data for context-sensitive retrieval. Once these templates are created, narratives and embeddings can be generated automatically, offering a practical middle ground between knowledge graphs and text-to-SQL (Text2SQL): less effort than constructing a comprehensive knowledge graph, but more semantic depth and contextual awareness than basic Text2SQL techniques.
Embeddings are especially useful in exploratory scenarios or where complex relationships are not fully understood, supporting flexible and incremental development.
Here's an in-depth comparison of Text2SQL, narrativized embeddings, and knowledge graphs:
| Aspect | Text-to-SQL | Narrativized embeddings | Knowledge graphs |
| Definition | Translates natural language queries into SQL statements for relational databases | Transforms structured data rows into narrative text, then into embeddings for vector-based retrieval | Represents data entities and their relationships in a graph structure, enabling complex queries and inferences |
| Implementation effort | Requires accurate mapping between natural language and DB schema; complex for intricate queries | Involves creating templates; once done, narrative and embedding generation can be automated | Demands significant resources to design and maintain the graph structure and relationships |
| Flexibility | Limited by schema understanding; less adaptable to evolving data relationships | Agile template definition; can easily adapt and refine narratives for various relationships | Comprehensive model that can be adapted and sliced as needed |
| Scalability | May face performance issues with complex joins or very large datasets | Efficient vector searches scale effectively with data size | Can scale well when properly designed |
| Semantic understanding | Relies on schema-defined relationships; may miss implicit or nuanced connections | Captures implicit and explicit relationships through narratives and language models | Explicitly defines relationships, providing clear semantics where defined; shines at governing enterprise data |
| Query complexity handling | Handles simple queries well; struggles with complex, multi-table, or ambiguous queries | Excels in complex queries by embedding rich contextual information; reduces ambiguity when templates approximate query context | Handles complex queries when relationships are well-defined |
| Maintenance | Ongoing effort to align with schema changes; can be labor-intensive | Templates can be updated as needed; automation reduces overhead | Requires continuous updates to reflect new data and relationships that need to be defined at the database level |
Combining narrativized embeddings with knowledge graphs
Narrativized embeddings can also be generated from existing knowledge graphs. This integration supports the direction of specific relationships and uses language-driven context to support RAG systems. It involves creating narrative templates to identify the graph's connections and then transforming them into embeddings. These embeddings encapsulate both the structured knowledge and the contextual richness of natural language, enhancing the retrieval and generation capabilities of AI systems.
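A minimal sketch of this idea, assuming the graph can be exported as (subject, predicate, object) triples: each predicate gets its own sentence template, and the sentences for one subject are joined into a narrative that can then be embedded. The triples, predicate names, and the `narrate` helper are all hypothetical examples.

```python
# Hypothetical triples exported from a knowledge graph.
TRIPLES = [
    ("Compound A", "INHIBITS", "hERG"),
    ("Compound A", "TESTED_IN", "Human"),
    ("Compound B", "INHIBITS", "hERG"),
]

# One sentence template per predicate; word order preserves edge direction.
PREDICATE_TEMPLATES = {
    "INHIBITS": "{subject} inhibits {object}.",
    "TESTED_IN": "{subject} was tested in {object}.",
}


def narrate(triples, templates):
    """Group triples by subject and render one narrative per subject."""
    sentences_by_subject = {}
    for subject, predicate, obj in triples:
        template = templates.get(predicate)
        if template is None:
            continue  # no template yet for this relationship type
        sentence = template.format(subject=subject, object=obj)
        sentences_by_subject.setdefault(subject, []).append(sentence)
    return {s: " ".join(parts) for s, parts in sentences_by_subject.items()}


graph_narratives = narrate(TRIPLES, PREDICATE_TEMPLATES)
```

Because each template fixes which side of the edge is the subject, the direction of a relationship survives the conversion to text.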
The advantages of narrativized embeddings
Narrativized embeddings can't solve every problem. But they can play a big role in multi-agent RAG systems and simplify agile chatbot development without major data engineering efforts. The wider benefits include:
- Comprehensive record representation: Narrativized embeddings turn complex records into detailed narratives that maintain data integrity and standardize formats for easier querying and analysis.
- Post-retrieval ranking: After retrieving embeddings, ranking mechanisms are applied to determine related narratives and identify additional narrative templates for searching. This approach delivers more relevant results and outlines connections among related data points.
- Rapid testing and development: Narrativized embeddings allow quick iteration and exploration without building a full knowledge graph, letting teams test ideas and refine templates easily.
- Clearer semantic links: Narrativizing data reveals implicit connections, aiding understanding and analysis.
- Resource efficiency: This approach provides an alternative to constructing extensive knowledge graphs, which can drain resources and often require expertise in graph query languages.
- Improved AI performance: Incorporating narrativized embeddings into AI systems, particularly those utilizing RAG, can lead to more accurate and contextually relevant responses.
- Scalability and flexibility: Narrativized embeddings can evolve alongside the system's requirements. Basic templates can be expanded to include more complex relationships, or new templates can be developed to address specific needs. This adaptability supports agile development and iterative improvements.
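The post-retrieval ranking step in the list above could take many forms. The sketch below shows one simple possibility, assuming each retrieved narrative carries its entity name, the template that produced it, and a similarity score. The `Hit` class, the boost value, and the scores are all hypothetical illustrations, not measured results.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    entity: str      # entity the narrative describes
    template: str    # name of the template that generated it
    text: str
    score: float     # similarity score from the first-pass vector search


def rerank(query, hits, boost=0.2):
    """Re-order hits, favoring narratives whose entity is named in the query."""
    q = query.lower()

    def adjusted(hit):
        return hit.score + (boost if hit.entity.lower() in q else 0.0)

    return sorted(hits, key=adjusted, reverse=True)


def follow_up_templates(top_hit, all_templates):
    """Suggest other templates to search for the same entity."""
    return [name for name in all_templates if name != top_hit.template]


# Hypothetical first-pass results.
hits = [
    Hit("Compound A", "overview", "Compound A showed an IC50 of 1uM.", 0.62),
    Hit("Compound B", "overview", "Compound B showed an IC50 of 6uM.", 0.58),
]
ranked = rerank("How toxic is compound B?", hits)
next_templates = follow_up_templates(ranked[0], ["overview", "toxicity"])
```

Here the second function points the system at other templates for the same entity, which is one way to read "identify additional narrative templates for searching".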
A scalable solution for structuring and querying complex data
Narrativized embeddings are a practical and efficient way to increase semantic understanding in structured data. By converting structured tables into narrative forms with context and transforming these narratives into embeddings, they support advanced applications like RAG in conversational AI systems. Proper template design, iterative development, and the use of negative embeddings as guardrails will help improve the system's flexibility and accuracy over time.