RAG, Fine-Tuning & Embeddings
You've learned how LLMs work and how to prompt them. But what if you need the AI to know about YOUR specific data — your company's docs, your product manuals, your research? That's where fine-tuning and RAG come in.
After this lesson, you'll be able to:
- ✓ Explain the difference between fine-tuning and RAG
- ✓ Identify when to use fine-tuning vs. RAG for a given scenario
- ✓ Describe how embeddings represent meaning as numbers
Fine-Tuning: Teaching an Expert New Tricks
Fine-tuning means taking a pre-trained model and training it further on a specific, smaller dataset. The model has already learned the fundamentals of language from its massive pre-training — fine-tuning adapts that general knowledge to a specialized domain or task.
When to use fine-tuning: When you need the model to consistently behave a certain way or understand domain-specific patterns. For example, training a model on thousands of legal contracts so it writes in the right legal style, or fine-tuning a model on your company's support conversations so it matches your brand voice.
RAG: Retrieval-Augmented Generation
Instead of relying only on what the model learned during training, RAG lets the model retrieve relevant information from an external knowledge base before generating its response. It's like giving the AI access to a reference library.
How RAG Works
User asks a question — "What is our return policy for electronics?"
System searches knowledge base — finds the relevant return policy documents
Relevant docs retrieved — the matching policy sections are pulled out
LLM generates answer — uses those docs + the question to produce an accurate, grounded response
Why RAG matters: It reduces hallucinations (the model making things up), keeps answers current without retraining, and lets you ground responses in your actual data. There's no need to retrain the model when your documents change — just update the knowledge base.
Fine-Tuning vs. RAG
These are two different tools for different problems. Here's how they compare:
Fine-Tuning
- Permanently changes model weights
- Bakes knowledge into the model
- Good for behavior, style, and tone
- Requires retraining when data changes
- Higher upfront cost
RAG
- Keeps model weights as-is
- Provides info at query time
- Good for factual, changing info
- Just update the knowledge base
- Lower ongoing cost
In practice, many production systems use both together — fine-tuning to set the model's behavior and tone, and RAG to provide accurate, up-to-date factual information.
Embeddings: Meaning as Numbers
Embeddings are a way to represent words, sentences, or documents as numbers (vectors) so that similar meanings end up close together in mathematical space. This is how computers understand that words are related.
For example, "king" and "queen" would be close together because they share similar meaning. "King" and "bicycle" would be far apart. "Happy" and "joyful" would be neighbors, while "happy" and "concrete" would be distant.
Embedding Space (2D illustration)
A technical note: real embeddings live in hundreds to a few thousand dimensions depending on the model (OpenAI's text-embedding-3-large uses 3,072), not on a 2D map. The neighborhood analogy captures the principle (similar meanings cluster together), not the shape.
Why embeddings matter: Embeddings power the search step in RAG. When a user asks a question, the system converts the question into an embedding, then finds documents with similar embeddings. This is how it retrieves relevant information — not by matching keywords, but by matching meaning.
Pre-training an LLM is extraordinarily expensive. Training a frontier model from scratch costs millions of dollars in compute, takes months of processing time, and requires massive datasets. Only a handful of companies in the world can do it.
Fine-tuning is much cheaper. It typically takes hours or days instead of months, and costs thousands of dollars instead of millions. You're adjusting an existing model, not building one from the ground up.
RAG requires no retraining at all. You simply update the documents in your knowledge base. This makes it the most cost-effective and flexible option for keeping AI responses current and accurate. That's why RAG has become one of the most popular patterns in production AI systems.
Key Takeaway
Fine-tuning permanently adapts a model for a specific domain. RAG retrieves external information at query time. Fine-tuning is for behavior, RAG is for facts. Embeddings make this all work by converting meaning into searchable numbers.
Try This (Optional)
Your team wants to build an internal chatbot that answers HR policy questions. The policies are updated quarterly, and employees often phrase questions informally ("sick days" instead of "PTO"). Would you lean toward RAG, fine-tuning, or both — and which part of your choice depends on embeddings doing their job well?
Knowledge Check
Your company wants their AI chatbot to answer questions using the latest product documentation that changes weekly. Should they use fine-tuning or RAG?
What are embeddings?