Transformer neural networks and large language models (LLMs) have ushered in an exciting time for Artificial Intelligence. Despite these incredible advancements, LLMs still face a number of challenges that limit the value they can provide to their users.
Foundation models can be fine-tuned on labeled, domain-specific data to address a variety of tailored tasks. Although fine-tuning can enhance a model's output, it does not address many of the other challenges, including cost, source citation, and keeping up with ever-changing, real-world information.
The image below depicts a prompt to an LLM that does not have access to specific information:
Source: Pinecone
In the example above, the LLM has no idea how to turn off reverse braking for that car model, yet it performs its generative task to the best of its ability anyway, producing an answer that sounds grammatically solid but is flatly incorrect.
In a 2020 paper, Meta introduced a framework called retrieval-augmented generation (RAG) to provide LLMs with access to information beyond their training data. RAG allows LLMs to build on a specialized body of knowledge and answer questions more accurately.
RAG Overview
The concept of Retrieval Augmented Generation was introduced to help LLMs overcome these issues. RAG works by fetching up-to-date or context-specific data from an external database and making that content available to an LLM to support the generation of a response. Grounding the model in external sources of knowledge, which supplement the LLM's internal representation of information, has proven to boost the performance and accuracy of LLM applications.
How RAG Works
RAG has two phases: retrieval and content generation. In the retrieval phase, algorithms search for and retrieve snippets of information relevant to the user's prompt or question. In an open-domain, consumer setting, those facts can come from indexed documents on the internet; in a closed-domain, enterprise setting, a narrower set of sources is typically used for added security and reliability.
For an LLM to leverage the supplemental data, the data must be in a format that the LLM can understand. Text is encoded into tokens and represented as vectors, which can be thought of as long lists of numbers. A vector captures the meaning of the input text, much as a human grasps the essence of a sentence spoken aloud. Leading models use vectors with 512 or more values.
As a simple example, the vector value for the word “apple” may be [24, 61, 68, 88, …, n]
When information is encoded into vectors this way, the result is termed an embedding. Embeddings are produced by a specialized embedding model that converts data into vectors (arrays, or groups, of numbers) and stores the values in a specialized database known as a vector database. The values of an embedding capture the relationships between the embedded data (e.g., instructions, news, words) across the 512+ dimensions of the vector, so that similar content yields similar vectors.
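To make this concrete, below is a minimal sketch of the encoding step. It assumes the sentence-transformers library and the all-MiniLM-L6-v2 model (which happens to produce 384-value vectors; larger models produce 512 or more), and it uses a plain NumPy array as a stand-in for a real vector database. The driver's-manual snippets are invented sample text for illustration only.

```python
# Minimal embedding sketch (assumes the sentence-transformers and numpy packages).
# A NumPy matrix stands in for a real vector database such as Pinecone.
import numpy as np
from sentence_transformers import SentenceTransformer

# Hypothetical chunks of a driver's manual to be embedded.
manual_chunks = [
    "To disable automatic emergency braking, open Settings > Driver Assistance.",
    "The tire pressure warning light indicates pressure below 30 PSI.",
    "Press and hold the start button for three seconds to restart the infotainment system.",
]

# The embedding model converts each chunk of text into a vector of numbers.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
vectors = embedder.encode(manual_chunks)   # shape: (3, 384)

print(vectors.shape)     # each row is one embedding
print(vectors[0][:5])    # first few values of the "braking" chunk's vector

# "Store" the embeddings: a real system would upsert these into a vector database.
vector_db = np.asarray(vectors)
```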
The image below illustrates at a high level the process by which the driver’s manual of an automobile is encoded into an embedding (vector) and stored within a vector database:
Source: Pinecone
When a user prompts an LLM-powered application, the system attempts to understand the true meaning of the query and retrieve relevant information instead of simply matching keywords in the user's query. This process, known as semantic search, aims to deliver results that fit the user's intent, not just their exact words.
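Continuing the sketch above (and reusing the embedder, vector_db, and manual_chunks it defined), semantic search embeds the user's question with the same model and ranks the stored chunks by cosine similarity, so the best match is chosen by meaning rather than by shared keywords. The query text is an illustrative assumption.

```python
# Semantic search sketch: rank stored chunks by cosine similarity to the query.
import numpy as np

query = "How do I turn off the car's automatic braking feature?"
query_vec = embedder.encode([query])[0]   # embed the question with the same model

# Cosine similarity between the query vector and every stored vector.
norms = np.linalg.norm(vector_db, axis=1) * np.linalg.norm(query_vec)
scores = vector_db @ query_vec / norms

# The chunk about disabling braking should score highest even though the wording differs.
best = int(np.argmax(scores))
print(manual_chunks[best], scores[best])
```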
The image below visualizes the RAG flow to support generation when a vector database is used to improve the response using relevant and current information:
Source: Pinecone
The vector database performs a “nearest neighbor” search, finding the vectors that most closely match the user's intent. When the vector database returns the relevant results, the application provides them to the LLM via its context window and prompts it to perform its generative task. Because the application knows exactly which data from the vector database was fed to the LLM to support the response, the LLM can cite those sources in its answer. If the generative AI's output is inaccurate, the document containing the erroneous information can be quickly identified, corrected, re-embedded, and fed back into the vector database.
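Putting the pieces together, a minimal sketch of the generation phase might look like the following. It reuses embedder, vector_db, and manual_chunks from the sketches above; call_llm is a hypothetical stand-in for whatever chat-completion client the application uses, not a real library function.

```python
# RAG generation sketch: nearest-neighbor retrieval, then prompt the LLM with the
# retrieved chunks in its context window so it can ground and cite its answer.
import numpy as np

def retrieve(query: str, top_k: int = 2) -> list[tuple[int, str]]:
    """Return the top_k most similar chunks (nearest neighbors) with their IDs."""
    q = embedder.encode([query])[0]
    scores = vector_db @ q / (np.linalg.norm(vector_db, axis=1) * np.linalg.norm(q))
    top = np.argsort(scores)[::-1][:top_k]
    return [(int(i), manual_chunks[int(i)]) for i in top]

def answer(query: str) -> str:
    hits = retrieve(query)
    # Feed the retrieved chunks to the LLM via its context window, asking it to cite them.
    context = "\n".join(f"[{i}] {text}" for i, text in hits)
    prompt = (
        "Answer the question using only the sources below and cite them by number.\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )
    return call_llm(prompt)   # hypothetical LLM client call

print(answer("How do I turn off the car's automatic braking feature?"))
```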