Demystifying LLMs: From Theory to Hands-On with Llama-3
A Step-by-Step Workshop for Students
Session Goal
✅ Open a Google Colab notebook
✅ Load a 4-bit quantized Llama-3 or Phi-3 model
✅ Write a Python function to interact with the model
Focus: From understanding LLM internals to engineering reliable prompts.
10:00 – 11:00 AM: The Mechanics (How LLMs Work)
Goal: Understand that LLMs are not knowledge bases, but next-token predictors.
Visual: the Transformer architecture (encoder-decoder as in the original paper; Llama-3 itself uses the decoder-only variant).
Core idea: Context determines meaning (e.g., 'Bank' in different sentences).
Output: Predicts the most probable next word.
Tokenization & The Math Problem
Interactive Demo: Use the OpenAI Tokenizer (a small code sketch follows this list).
Example: 'Lollipop' is one token, not letters.
Reason for errors: models operate on tokens, not characters, which is why they miscount letters in words.
Lesson: Break problems down for better results.
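The same idea in code, using the tiktoken library as a stand-in for the OpenAI Tokenizer web tool (the workshop model's own tokenizer would split words slightly differently):

```python
# Minimal tokenization demo (pip install tiktoken)
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")   # the GPT-4 tokenizer
tokens = enc.encode("Lollipop")
print(tokens)                                 # integer token IDs
print([enc.decode([t]) for t in tokens])      # the chunks the model actually sees
```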
The Hardware Problem: Quantization
Challenge: Llama-3-8B needs ~16GB VRAM; Colab Free Tier gives 15GB.
Solution: Quantization, i.e., compressing model weights to lower precision (the arithmetic is sketched after this list).
Analogy: FP16 = high-res pizza; INT4 = pixelated pizza.
Takeaway: 4-bit quantization makes large models runnable in Colab.
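The back-of-the-envelope math behind that takeaway (weights only; real usage adds overhead for activations and the KV cache):

```python
# Approximate weight memory for an 8B-parameter model
params = 8e9
fp16_gb = params * 2 / 1e9    # FP16: 2 bytes per weight  -> ~16 GB
int4_gb = params * 0.5 / 1e9  # INT4: 4 bits per weight   -> ~4 GB
print(f"FP16: {fp16_gb:.0f} GB, INT4: {int4_gb:.0f} GB")
```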
11:00 AM – 12:00 PM: The Hello World (Hands-On)
Step 1: Change runtime → T4 GPU.
Step 2: Install dependencies: torch, transformers, bitsandbytes, accelerate.
Step 3: Configure quantization and load the pre-trained model (see the sketch after this list).
Troubleshoot: Common Colab errors and GPU disconnects.
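A minimal sketch of Steps 2–3, assuming the gated meta-llama/Meta-Llama-3-8B-Instruct checkpoint on Hugging Face (a Phi-3 checkpoint loads the same way):

```python
# Step 2 (in a Colab cell): !pip install -q torch transformers bitsandbytes accelerate
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

# Step 3: 4-bit quantization config (NF4 weights, FP16 compute)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",   # place the quantized layers on the T4 automatically
)
```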
First Inference: Why 'Hi' Repeats
Common mistake: calling model.generate on the raw prompt 'Hi' with no chat template (shown below).
Result: Repetitive or meaningless output.
Lesson: Models need chat templates for structure.
Next: Learn to use Llama-3βs chat format.
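What that mistake looks like in code, reusing the model and tokenizer loaded above (a sketch; the exact output varies by model and sampling settings):

```python
# Naive inference: raw text in, no chat template
inputs = tokenizer("Hi", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(output[0], skip_special_tokens=True))
# Typically rambles or repeats ("Hi Hi Hi ..."), because the instruct model
# was trained on conversations wrapped in special role tags, not bare text.
```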
12:00 – 12:20 PM: Engineering Prompts
Concept: Templates tell the model who is speaking.
Format includes system, user, and assistant tags.
Activity: Build ask_llama(prompt) with tokenizer.apply_chat_template (sketched after this list).
Outcome: Consistent, context-aware model responses.
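One way the ask_llama helper might look, again reusing the loaded model and tokenizer (a sketch; the default system prompt and max_new_tokens value are assumptions, not the workshop's exact choices):

```python
def ask_llama(prompt: str, system: str = "You are a helpful assistant.") -> str:
    messages = [
        {"role": "system", "content": system},
        {"role": "user", "content": prompt},
    ]
    # apply_chat_template wraps the messages in Llama-3's special role tags
    input_ids = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,   # append the assistant header so the model replies
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=256)
    # Decode only the newly generated tokens, not the echoed prompt
    return tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)

print(ask_llama("Explain tokenization in one sentence."))
```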
12:20 – 12:40 PM: Few-Shot Prompting
Challenge: Translate Telugu to English and return the result as JSON.
Bad prompt: the model wraps the answer in conversational fluff.
Fix: Provide examples in the prompt (few-shot; see the sketch after this list).
Output: Precise, code-ready JSON response.
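A hedged sketch of the few-shot fix via ask_llama; the Telugu words here (పుస్తకం "book", నీరు "water", ఇల్లు "house") are illustrative stand-ins, not the workshop's actual examples:

```python
few_shot_prompt = """Translate the Telugu word to English. Reply with JSON only.

Input: పుస్తకం
Output: {"english": "book"}

Input: నీరు
Output: {"english": "water"}

Input: ఇల్లు
Output:"""

print(ask_llama(few_shot_prompt, system="You are a translation engine. Output JSON only."))
# With the examples in place, the model tends to mirror the pattern:
# {"english": "house"}
```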
12:40 – 1:00 PM: Mini Hack – The Strict Librarian
Task: System prompt to return JSON with genre and year for a book.
Condition: If the book doesn't exist → return null.
Goal: Output must be parsable via json.loads() in Python (one possible solution is sketched after this list).
Why: Real-world AI apps depend on reliable JSON output.
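A possible solution sketch; the system-prompt wording and the test title are assumptions, and the json.loads call is the pass/fail check:

```python
import json

librarian_system = (
    "You are a strict librarian API. Given a book title, respond with JSON only, "
    'in the form {"genre": "...", "year": 1234}. '
    "If the book does not exist, respond with exactly: null"
)

reply = ask_llama("Nineteen Eighty-Four", system=librarian_system)
data = json.loads(reply)   # raises ValueError if the model added any fluff
print(data)                # e.g. {'genre': 'dystopian fiction', 'year': 1949}
# json.loads("null") returns None, satisfying the "book doesn't exist" condition
```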
Wrap-Up & Key Takeaways
✅ LLMs = Predictors, not databases.
✅ Tokenization explains model quirks.
✅ Quantization enables on-device AI.
✅ Prompt engineering = Real control over output.