LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant

Version 1.0
LLM Fine Tuning with QLoRA - Evaluation vs RAG
Comparing our fine-tuned Llama 2 model with Retrieval-Augmented Generation (RAG) on top of base Llama 2, evaluated using statistical measures similar to those we used previously.
Obioma Anomnachi
Engineer @ Anant

RAG Overview
● What is Retrieval-Augmented Generation (RAG)?
○ Hybrid NLP Approach:
■ Combines information retrieval and text generation.
■ Creates more comprehensive and contextually accurate outputs.
○ Uses External Knowledge Sources:
■ Leverages large corpora or databases.
■ Augments generative capabilities of language models.
● How RAG Works:
○ Retrieval Stage:
■ Model retrieves relevant information from a pre-existing corpus or knowledge base.
○ Generation Stage:
■ Uses retrieved information as input.
■ Generates a coherent and contextually appropriate response.
● Produces more informed and accurate results.
● Especially effective for complex tasks requiring in-depth knowledge.
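The two stages above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in this presentation. The three-document KNOWLEDGE_BASE, the word-overlap scoring in retrieve, and the prompt template in build_prompt are all hypothetical. The generate argument is any callable that sends a prompt to an LLM; here it is a placeholder that echoes the prompt instead of calling Llama 2.

```python
import re
from typing import Callable, List

# Hypothetical toy knowledge base; a real deployment would query a document store.
KNOWLEDGE_BASE: List[str] = [
    "Apache Cassandra is a distributed NoSQL database.",
    "QLoRA fine-tunes a 4-bit quantized model through low-rank adapters.",
    "Retrieval-Augmented Generation adds retrieved passages to the prompt.",
]


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"\w+", text.lower())


def retrieve(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Retrieval stage: return up to k passages that share words with the query.

    Word overlap is a simplified stand-in for a real retriever (BM25 or embeddings).
    """
    query_terms = set(tokenize(query))
    scored = [(len(query_terms & set(tokenize(doc))), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop passages with no overlap at all.
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(query: str, passages: List[str]) -> str:
    """Augmentation: place the retrieved passages in the prompt as context."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\nAnswer:"
    )


def rag_answer(query: str, generate: Callable[[str], str], k: int = 2) -> str:
    """Generation stage: the LLM receives the augmented prompt, not the bare query."""
    passages = retrieve(query, KNOWLEDGE_BASE, k)
    return generate(build_prompt(query, passages))


if __name__ == "__main__":
    def echo_llm(prompt: str) -> str:
        # Placeholder for a real LLM call (e.g., base Llama 2); it just echoes the prompt.
        return prompt

    print(rag_answer("What does QLoRA fine-tune?", echo_llm))
```

Swapping echo_llm for a real model call leaves the retrieval and prompt-assembly steps unchanged. This separation is what lets a RAG system reuse an unmodified base model.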

RAG vs Language Models
● Traditional Language Models:
○ Data Dependency:
■ Rely solely on the data they were trained on.
○ Text Generation:
■ Generate high-quality text based on learned patterns.
○ Limitations:
■ Struggle with tasks requiring up-to-date information.
■ May lack specific factual knowledge not present in training data.
● RAG Models:
○ Enhanced Generative Process:
■ Incorporate real-time information retrieval.
○ Dynamic Information Retrieval:
■ Fetch and utilize the most relevant information available at the time of generation.
○ Improved Performance:
■ Better suited to tasks requiring recent, detailed, or domain-specific information.

Retrievers
● Knowledge Sources
○ External Corpora:
■ Large datasets, databases, and documents.
○ Domain-Specific Databases:
■ Specialized knowledge bases tailored to specific fields (e.g., medical, legal).
○ Real-Time Data:
■ Up-to-date information from live sources such as news feeds or databases.
● Search Mechanisms
○ Dense Vector Representations:
■ Utilize neural embeddings to find semantically similar documents.
○ Sparse Vector Representations:
■ Use traditional methods like TF-IDF or BM25 to retrieve relevant passages.
○ Hybrid Techniques:
■ Combine dense and sparse methods for more accurate retrieval.
○ Relevance Scoring:
■ Assign scores to documents based on relevance to the query.
○ Filtering and Ranking:
■ Select and rank the most pertinent information for generation.
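Here is one concrete example of sparse retrieval with relevance scoring, filtering, and ranking: a minimal Okapi BM25 sketch in plain Python. The class name BM25, the regex tokenizer, and the defaults k1 = 1.5 and b = 0.75 are assumptions chosen for illustration; these values are commonly used but not taken from this presentation. Production systems typically rely on a search engine or library implementation instead.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


class BM25:
    """Minimal Okapi BM25 sparse retriever over an in-memory list of passages."""

    def __init__(self, corpus: List[str], k1: float = 1.5, b: float = 0.75):
        self.corpus = corpus
        self.docs = [tokenize(doc) for doc in corpus]
        self.k1, self.b = k1, b
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        n = len(self.docs)
        # Document frequency: number of passages containing each term.
        df = Counter(term for doc in self.docs for term in set(doc))
        # IDF with +1 inside the log (Lucene-style) so it never goes negative.
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1) for t, f in df.items()}

    def score(self, query: str, index: int) -> float:
        """Relevance score of passage `index` for `query`."""
        doc = self.docs[index]
        tf = Counter(doc)
        # Length normalization: longer-than-average passages are penalized via b.
        norm = self.k1 * (1 - self.b + self.b * len(doc) / self.avgdl)
        total = 0.0
        for term in tokenize(query):
            if term in tf:
                f = tf[term]
                total += self.idf[term] * f * (self.k1 + 1) / (f + norm)
        return total

    def search(self, query: str, top_k: int = 3,
               min_score: float = 0.0) -> List[Tuple[float, str]]:
        """Score every passage, filter out those at or below min_score, rank the rest."""
        scored = [(self.score(query, i), doc) for i, doc in enumerate(self.corpus)]
        kept = [pair for pair in scored if pair[0] > min_score]
        kept.sort(key=lambda pair: pair[0], reverse=True)
        return kept[:top_k]


if __name__ == "__main__":
    passages = [
        "Cassandra stores data in a distributed, wide-column model.",
        "BM25 ranks passages by term frequency and inverse document frequency.",
        "Dense retrievers compare neural embeddings with cosine similarity.",
    ]
    retriever = BM25(passages)
    for score, passage in retriever.search("how does BM25 rank passages", top_k=2):
        print(f"{score:.3f}  {passage}")
```

Because BM25 matches exact tokens, "rank" does not match "ranks" here. This is the kind of gap that dense or hybrid retrieval is meant to cover.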

Retrievers - Embeddings and Similarity Search
● What are Neural Embeddings?
○ Definition:
■ Neural embeddings are dense vector representations of words, phrases, sentences, or documents, generated using neural network models.
■ They capture semantic meaning in a continuous vector space where similar items are placed closer together.
○ Purpose:
■ Semantic Similarity:
● Encodes semantic information, making it easier to measure similarity between different pieces of text.
● Allows models to understand and retrieve information based on meaning, not just exact word matching.
○ Output:
■ Generates dense vectors (embeddings) with a fixed number of dimensions, typically in the hundreds (e.g., 300 or 768).
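Similarity search over embeddings usually ranks documents by the cosine similarity between each document vector and the query vector. The sketch below assumes the embeddings have already been produced by some embedding model. The 4-dimensional vectors and document ids are made-up stand-ins for real 300- or 768-dimensional embeddings. The linear scan is also a simplification of the approximate nearest-neighbor indexes used at scale.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """cos(a, b) = (a . b) / (|a| * |b|); 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat a zero vector as dissimilar to everything.
    return dot / (norm_a * norm_b)


def nearest(query_vec: List[float], index: Dict[str, List[float]],
            k: int = 2) -> List[Tuple[float, str]]:
    """Brute-force similarity search: score every stored vector, return the top k."""
    scored = [(cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical toy embeddings; real ones would come from an embedding model.
    index = {
        "doc_cats": [0.9, 0.1, 0.0, 0.2],
        "doc_dogs": [0.8, 0.3, 0.1, 0.1],
        "doc_finance": [0.0, 0.1, 0.9, 0.4],
    }
    query = [0.85, 0.2, 0.05, 0.15]  # e.g., the embedding of a question about pets
    for sim, doc_id in nearest(query, index):
        print(f"{sim:.4f}  {doc_id}")
```

The two pet documents score close to 1.0 and the finance document scores far lower. Ranking by meaning rather than shared words works this way.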

RAG Advantages
● Enhanced Accuracy:
○ Incorporation of External Knowledge:
■ Leverages up-to-date and domain-specific information.
● Improved Factuality:
○ Accesses and integrates verified data sources.
■ Reduces the risk of generating incorrect or outdated information.
● Increased Relevance:
○ Context-Aware Responses:
■ Dynamic retrieval of pertinent information based on the query.
■ Helps keep responses relevant to the user's needs.
○ Domain-Specific Expertise:
■ Customizable to access specialized knowledge bases (e.g., medical, legal).
○ Real-Time Information:
■ Can retrieve the latest data in the knowledge base, adapting to changes and new developments.
■ Useful for applications requiring up-to-date information, like news or trend analysis.
● Versatile Applications:
○ Adapts to various tasks such as question answering, summarization, and conversational agents.

RAG vs Fine Tuning
RAG
● Enhanced Accuracy and Relevance:
○ Incorporates up-to-date, domain-specific information dynamically.
○ Provides contextually relevant responses by retrieving data at query time.
● Scalability and Flexibility:
○ Adaptable to various tasks without the need for extensive retraining.
○ The knowledge base is easy to update for different domains or new information.
● Cost Efficiency:
○ Reduces the need for large-scale dataset creation and extensive retraining.
○ Utilizes existing knowledge sources, lowering computational and resource expenses.
Fine Tuning
● Customization and Specialization:
○ Tailors the model to specific tasks or domains.
○ Results in highly specialized models fine-tuned to particular use cases.
● Improved Performance for Specific Tasks:
○ Fine-tuning on curated datasets produces models optimized for particular applications.
○ Enhances performance in narrow domains with specialized requirements.
● Control Over Output:
○ Fine-grained adjustments to the model improve accuracy and reduce errors.
○ Allows better control over the style of generated content.

Evaluation
● Because the answer is ultimately generated by an LLM, a RAG system is evaluated the same way as an LLM, whether fine-tuned or not.
● Domain-specific tests, benchmarks, statistical measures, and human and LLM-based evaluation all work the same as in the previous presentation.
● Performance depends on the sophistication of the retrieval mechanism, the capabilities of the LLM used, and the quality of the data backing it.
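The slide does not spell out which statistical measures were used previously. As an assumption, the sketch below uses two common question-answering measures: exact match and token-overlap F1, a simplified version of SQuAD-style scoring without article removal. It applies the same evaluate function to any question-to-answer callable, which is the point of the first bullet: a fine-tuned model and a RAG pipeline are scored identically. The dataset and both system functions are hypothetical placeholders.

```python
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple


def normalize(text: str) -> List[str]:
    """Lowercase, strip punctuation, and split into tokens."""
    return re.findall(r"\w+", text.lower())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall (multiset overlap)."""
    pred, ref = normalize(prediction), normalize(reference)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def evaluate(answer_fn: Callable[[str], str],
             dataset: List[Tuple[str, str]]) -> Dict[str, float]:
    """Average EM and F1 of any question -> answer system over (question, reference) pairs."""
    em = f1 = 0.0
    for question, reference in dataset:
        prediction = answer_fn(question)
        em += exact_match(prediction, reference)
        f1 += token_f1(prediction, reference)
    n = len(dataset)
    return {"exact_match": em / n, "f1": f1 / n}


if __name__ == "__main__":
    # Hypothetical evaluation set; real runs would use a held-out domain dataset.
    dataset = [("Which foundation maintains Cassandra?", "The Apache Software Foundation")]

    def fine_tuned_system(question: str) -> str:
        return "Apache Software Foundation"  # stand-in for the fine-tuned Llama 2 model

    def rag_system(question: str) -> str:
        return "the Apache Software Foundation"  # stand-in for base Llama 2 + RAG

    for name, system in [("fine-tuned", fine_tuned_system), ("rag", rag_system)]:
        print(name, evaluate(system, dataset))
```

Only answer_fn changes between the two systems. Any difference in scores therefore reflects the systems themselves, not the scoring procedure.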

Strategy: Scalable Fast Data
Architecture: Cassandra, Spark, Kafka
Engineering: Node, Python, JVM, CLR
Operations: Cloud, Container
Rescue: Downtime!! I need help.
www.anant.us | solutions@anant.us | (855) 262-6826
3 Washington Circle, NW | Suite 301 | Washington, DC 20037
