Being-Creative-with-AI/C4/Understanding-the-RAG-pipeline/English
Title of the Script: Understanding the RAG Pipeline.
Author: EduPyramids Team.
Keywords: RAG, Retrieval Augmented Generation, Artificial Intelligence, Embeddings, Vector Database, Vector Search, AI Chatbot, Document Chunking, AI Pipeline, Query Processing, Information Retrieval, Generative AI, EduPyramids, video tutorial.
| Visual Cue | Narration |
| Slide 1
Title Slide |
Welcome to this Spoken Tutorial on Understanding the RAG Pipeline. |
| Slide 2
Learning Objectives |
In this tutorial, we will learn:
|
| Slide 3
Disclaimer Slide As AI tools constantly evolve, if you are unable to locate any icon or encounter difficulty at any step, you may use any conversational AI Chatbot for guidance. |
As AI tools constantly evolve, if you are unable to locate any icon or encounter difficulty at any step, you may use any conversational AI Chatbot for guidance. |
| Slide 4
System Requirements |
To record this tutorial, I am using:
Learners will also need a working internet connection. |
| Slide 5
Prerequisites |
To follow this tutorial,
For the prerequisites of this tutorial, visit the website shown on your screen. |
| Let us get started. | |
| RAG pipeline diagram with Retrieval + Generation blocks | RAG stands for Retrieval Augmented Generation. |
| Flowchart showing sequence of stages | It is a sequence of steps used to answer a question. |
| AI system connected to external documents/database | It uses external data to generate more accurate answers. |
| AI brain with “Stored Knowledge” and “Retrieved Knowledge” labels | The system does not rely only on stored knowledge. |
| Search and retrieval animation from documents | It retrieves relevant information to generate answers. |
| “Step-by-step process” illustration | Let us understand this step by step. |
| Grocery shopping chatbot interface | Consider a grocery app chatbot. |
| User typing a question into chatbot | Ask this question: “Can I return vegetables after 2 days?” |
| Human brain vs computer comparison graphic | Now, the system cannot understand words directly the way humans do. |
| Flowchart showing processing stages | So it follows a series of steps to find the answer. |
| Text converting into vectors/numbers animation | First, the system converts the question into a machine-understandable form. |
| Highlight the term “Embedding” | This process is called embedding. |
| Binary numbers or vectors beside words | Computers work with numbers, not words. |
| Words converting into numeric arrays | So every word is represented as a list of numbers. |
| Example vector representation of “vegetables” | For example,
the word ‘vegetables’ may be converted into a list of numbers like: [0.21, 0.45, 0.78, …] |
| Side-by-side vectors for vegetables and fruits | Similarly,
the word ‘fruits’ will have a different set of numbers. |
| Highlight closeness between vectors | But its values will be close to those of ‘vegetables’ because the two words are related. |
| Vector comparison illustration | Now, instead of comparing words,
the system compares these numbers. |
| Similar vectors connected visually | If two sentences have similar vectors, their meanings are similar. |
| Question matched with stored documents | This lets the system compare the question with stored documents. |
| Similarity measurement graphic | It helps measure sentence similarity. |
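The comparison of number lists described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular AI system: it assumes cosine similarity as the measure (one common choice), and all three vectors are toy three-number values invented for this example. Real embeddings come from an embedding model and have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors, invented for illustration only (not output of a real model).
vegetables = [0.21, 0.45, 0.78]
fruits = [0.25, 0.40, 0.80]     # assumed to lie close to 'vegetables'
invoice = [0.90, 0.10, 0.05]    # assumed to be an unrelated word

sim_related = cosine_similarity(vegetables, fruits)
sim_unrelated = cosine_similarity(vegetables, invoice)

print(f"vegetables vs fruits:  {sim_related:.2f}")
print(f"vegetables vs invoice: {sim_unrelated:.2f}")
```

A score near 1 means the two vectors point in almost the same direction, which is how related meanings show up as numbers.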
| Company documents or policy PDF shown | Next, the system searches company documents such as the return policy. |
| Large document splitting into blocks | These documents are often divided into smaller parts called chunks. |
| Three text boxes labeled Chunk 1, Chunk 2, Chunk 3 | A return policy document may be split into chunks like this: |
| Three text boxes labeled Chunk 1, Chunk 2, Chunk 3 | Chunk 1: Return rules for fruits and vegetables
Chunk 2: Return rules for packaged items
Chunk 3: Refund process and timelines |
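As a rough sketch of chunking, the Python snippet below splits a short policy text into one chunk per paragraph. Splitting on blank lines is only one simple strategy, assumed here for illustration. The function name `chunk_by_paragraph` and the second and third policy rules are hypothetical. Only the vegetables rule comes from this tutorial's example.

```python
def chunk_by_paragraph(text):
    """Split text into chunks wherever a blank line separates paragraphs."""
    return [part.strip() for part in text.split("\n\n") if part.strip()]

# Hypothetical policy text; only the first rule is taken from the tutorial.
policy = """Fresh vegetables can be returned within 24 hours only.

Packaged items must be unopened to qualify for a return.

Refunds are processed within five working days."""

chunks = chunk_by_paragraph(policy)
for number, chunk in enumerate(chunks, start=1):
    print(f"Chunk {number}: {chunk}")
```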
| Chunks converting into embeddings | Now, each chunk is converted into numbers using the same embedding process. |
| Vector database illustration with stored vectors | These number representations are stored in a database called a vector database. |
| Search icon over vector database | A vector database stores data in the form of numbers.
It can quickly find entries with similar meanings. |
| User query converting into vectors | When you ask,
‘Can I return vegetables after 2 days?’, your question is also converted into numbers. |
| Query vector compared with stored chunk vectors | Then, the system compares your question with chunks in the vector database. |
| Best matching chunk highlighted | It finds the most similar chunk.
For example: ‘Fresh vegetables can be returned within 24 hours only.’ |
| Retrieved chunk sent to AI model | This relevant chunk is then used to generate the final answer. |
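The retrieval step above can be sketched as follows. This is a simplified illustration under stated assumptions: the `embed` function just counts words and stands in for a real embedding model, a plain Python list plays the role of the vector database, and the second and third policy chunks are hypothetical text written for this example.

```python
import math
import re
from collections import Counter

def embed(text):
    """Stand-in embedding: count each lowercase word (not a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Only the first chunk comes from the tutorial; the others are hypothetical.
chunks = [
    "Fresh vegetables can be returned within 24 hours only.",
    "Packaged items must be unopened to qualify for a return.",
    "Refunds are processed within five working days.",
]

# A plain list of (text, vector) pairs plays the role of the vector database.
vector_store = [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(question, store):
    """Return the stored chunk whose vector is most similar to the question."""
    query_vector = embed(question)
    best = max(store, key=lambda item: cosine_similarity(query_vector, item[1]))
    return best[0]

best_chunk = retrieve("Can I return vegetables after 2 days?", vector_store)
print(best_chunk)
```

Word counting only matches exact words, so it would miss a chunk that says "produce" instead of "vegetables". A real embedding model is used precisely because it places related words close together.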
| Multiple retrieved policy lines displayed | The system compares the question with chunks and retrieves relevant chunks. |
| Highlight retrieved policy statements | The system may retrieve statements such as:
|
| Good retrieval vs bad retrieval comparison | Note that the quality of the final answer depends on the retrieved information. |
| Relevant chunk highlighted among many results | The system selects the most relevant information from the retrieved results. |
| Question and retrieved chunk combined visually | This information is combined with the original question. |
| Two blocks labeled Question and Policy Line | The system now has two things:
Your question and the relevant policy line. |
| Context box formed from question + retrieved text | Together, these form the context.
The AI uses this background information to generate an answer. |
| Policy statement highlighted | For example, from the return policy, it may find this line:
‘Fresh vegetables can be returned within 24 hours only.’ |
| Original question shown beside retrieved statement | Now, this information is combined with your original question: |
| Question displayed again prominently | ‘Can I return vegetables after 2 days?’ |
| Combined input entering AI model | So the system now has both your question and the relevant policy information. |
| Label “Context” shown clearly | This combined input is called context. |
| Definition text animation
Context → AI → Response flowchart |
In simple terms, context is the background information given to the system. |
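Combining the question with the retrieved policy line can be sketched as simple text assembly. The function name `build_context` and the instruction wording are assumptions made for this illustration. Real systems phrase this input in many different ways, and the call to the AI model itself is left out.

```python
def build_context(question, retrieved_chunks):
    """Combine the retrieved policy lines and the question into one input."""
    policy_lines = "\n".join(f"- {chunk}" for chunk in retrieved_chunks)
    return (
        "Answer the question using only the policy lines below.\n"
        f"Policy lines:\n{policy_lines}\n"
        f"Question: {question}"
    )

question = "Can I return vegetables after 2 days?"
policy_line = "Fresh vegetables can be returned within 24 hours only."

# This combined text is what would be passed to the AI model.
context = build_context(question, [policy_line])
print(context)
```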
| Final chatbot answer displayed on screen | Using this context, the system can now generate a correct response, like this:
‘No, vegetables cannot be returned after 2 days, as the policy allows returns only within 24 hours.’ |
| AI model reading retrieved chunk and question | The AI model reads the question and the retrieved information. |
| Generated answer appearing | It then generates an answer based on this information. |
| Alternative generated answer example | For example:
“Vegetables are perishable items and cannot be returned after delivery.” |
| Incorrect guessing crossed out, retrieved answer ticked | The system reduces guesswork and bases its answer on retrieved information. |
| Relevant answer highlighted | This leads to a more relevant response.
Without retrieval, the answer may be generic or incorrect. |
| Three-step pipeline animation: Question → Retrieval → Answer | The process is as follows:
|
| Wrong retrieval leading to incorrect answer illustration | Wrong or unrelated retrieval may lead to an incorrect answer. |
| With this, we come to the end of this tutorial. | |
| Slide 6
Summary In this tutorial, we learnt:
|
In this tutorial, we learnt:
|
| Slide 7
Assignment Create a simple grocery return policy with at least three rules. Ask the question: “Can I return vegetables after 2 days?” Then ask the same question again using your policy as context. Compare both responses and observe how retrieval improves the final answer. Also identify:
|
We encourage you to do this assignment. |
| Slide 8
Acknowledgement Domain Inputs: Bhavani Shankar R and Saisudha Sugavanam; Script Writer: Ketki Naina; Admin Reviewer: Arthi Varadarajan; Quality Reviewer: Sakina Sidhwa; Novice Reviewer: Misbah Samir; AI Narration: Debosmita Mukherjee; AI Graphics: Arvind Pillai; Video Editor: Arvind Pillai; Web Developer: Ankita Singhal |
Thank you for joining. |
| Slide 9
Acknowledgement This Spoken Tutorial is brought to you by EduPyramids Educational Services Private Limited at SINE, IIT Bombay. |