Demystifying LLMs: From Theory to Hands-On with Llama-3
A Step-by-Step Workshop for Students
Session Goal
✅ Open a Google Colab notebook
✅ Load a 4-bit quantized Llama-3 or Phi-3 model
✅ Write a Python function to interact with the model
Focus: From understanding LLM internals to building prompt-ready models
10:00 – 11:00 AM: The Mechanics (How LLMs Work)
Goal: Understand that LLMs are not knowledge bases, but next-token predictors.
Visual: the Transformer architecture (Llama-3 is a decoder-only variant).
Core idea: Context determines meaning (e.g., 'Bank' in different sentences).
Output: Predicts the most probable next word.
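A minimal sketch of that idea, runnable in the notebook. It uses the small, ungated gpt2 checkpoint purely as a stand-in (the workshop model is loaded later and behaves the same way), and prints the model's top candidates for the next token:

```python
# Next-token prediction in miniature. "gpt2" is a stand-in checkpoint,
# chosen only because it is small and ungated.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("I went to the bank to deposit my", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, seq_len, vocab_size)

# The last position holds the probability distribution over the *next* token.
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(idx)!r}: {p.item():.3f}")
```

Swapping "deposit my" for "sit by the" shifts the top candidates completely, which is the 'Bank' context point in miniature.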
Tokenization & The Math Problem
Interactive Demo: Use the OpenAI Tokenizer (and the code sketch below).
Example: 'Lollipop' is split into a few multi-letter tokens, not individual letters.
Reason for errors: Models operate on tokens, not characters.
Lesson: Break problems down for better results.
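The same demo in code, again using the ungated gpt2 tokenizer as a stand-in (exact splits differ between the GPT and Llama tokenizers, but the lesson is identical):

```python
# Inspect how words break into tokens (gpt2 tokenizer as a stand-in;
# exact splits vary by model family, the lesson does not).
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
for word in ["Lollipop", "strawberry", "12345"]:
    ids = tok.encode(word)
    print(word, "->", tok.convert_ids_to_tokens(ids))
# Whole words map to one or a few multi-letter chunks, never single
# characters -- which is why letter-counting questions trip models up.
```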
The Hardware Problem: Quantization
Challenge: Llama-3-8B needs ~16GB VRAM; Colab Free Tier gives 15GB.
Solution: Quantization = compressing model weights into lower-precision numbers.
Analogy: FP16 = high-res pizza; INT4 = pixelated pizza.
Takeaway: 4-bit quantization makes large models runnable in Colab.
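The back-of-envelope arithmetic behind that takeaway (weights only; activations and the KV cache add overhead on top):

```python
# VRAM needed just to hold 8B weights at different precisions.
params = 8e9
for name, bytes_per_param in [("FP32", 4), ("FP16", 2), ("INT8", 1), ("INT4", 0.5)]:
    print(f"{name}: {params * bytes_per_param / 1e9:>4.0f} GB")
# FP16 -> 16 GB: does not fit the free T4 (~15 GB usable).
# INT4 ->  4 GB: fits with room to spare for activations and the KV cache.
```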
11:00 – 12:00 PM: The Hello World (Hands-On)
Step 1: Change the Colab runtime to a T4 GPU.
Step 2: Install dependencies: torch, transformers, bitsandbytes, accelerate.
Step 3: Configure quantization and load the pre-trained model (loading sketch below).
Troubleshoot: Common Colab errors and GPU disconnects.
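A sketch of Steps 2–3. It assumes the meta-llama/Meta-Llama-3-8B-Instruct checkpoint, which is gated on the Hub and needs a Hugging Face access token; microsoft/Phi-3-mini-4k-instruct is an ungated drop-in:

```python
# Step 2 (in a Colab cell): !pip install -q torch transformers bitsandbytes accelerate
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # gated; Phi-3-mini works too

# Step 3: 4-bit (NF4) quantization config, then load onto the T4.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # needs accelerate; places layers on the GPU
)
```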
First Inference: Why 'Hi' Repeats
Common mistake: tokenizing a bare 'Hi' and calling model.generate on it directly, with no chat template (reproduced below).
Result: Repetitive or meaningless output.
Lesson: Models need chat templates for structure.
Next: Learn to use Llama-3's chat format.
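The naive call, as a sketch reusing the tokenizer and model loaded above:

```python
# Feeding a raw string with no chat template: an instruction-tuned model
# never saw bare free-form prefixes in training, so the output often loops.
inputs = tokenizer("Hi", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(out[0], skip_special_tokens=True))
# Typical symptom: "Hi Hi Hi ..." or an unrelated continuation.
```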
12:00 – 12:20 PM: Engineering Prompts
Concept: Templates tell the model who is speaking.
Format includes system, user, and assistant tags.
Activity: Build ask_llama(prompt) with tokenizer.apply_chat_template (sketch below).
Outcome: Consistent, context-aware model responses.
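A sketch of ask_llama; the function name comes from the slide, while the default system prompt and generation settings here are illustrative choices:

```python
def ask_llama(prompt: str, system: str = "You are a helpful assistant.") -> str:
    messages = [
        {"role": "system", "content": system},
        {"role": "user", "content": prompt},
    ]
    # apply_chat_template wraps the turns in the model's special tags
    # (for Llama-3: <|start_header_id|> ... <|eot_id|>), so the model
    # knows who is speaking and where the assistant should begin.
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(input_ids, max_new_tokens=256)
    # Slice off the prompt tokens so only the assistant's reply comes back.
    return tokenizer.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True)

print(ask_llama("Explain tokenization in one sentence."))
```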
12:20 – 12:40 PM: Few-Shot Prompting
Challenge: Translate Telugu to English JSON.
Bad prompt: the model wraps the translation in conversational fluff.
Fix: Provide worked examples in the prompt (few-shot; sketch below).
Output: Precise, code-ready JSON response.
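A few-shot version of the task as a sketch; the Telugu phrases here are common stand-ins ("hello", "thank you", "how are you?"), not necessarily the workshop's exact examples:

```python
few_shot = [
    {"role": "system",
     "content": 'Translate Telugu to English. Reply with JSON only: {"english": "..."}'},
    # Worked examples teach the exact output shape.
    {"role": "user", "content": "నమస్కారం"},
    {"role": "assistant", "content": '{"english": "Hello"}'},
    {"role": "user", "content": "ధన్యవాదాలు"},
    {"role": "assistant", "content": '{"english": "Thank you"}'},
    # The real query goes last.
    {"role": "user", "content": "మీరు ఎలా ఉన్నారు?"},
]
input_ids = tokenizer.apply_chat_template(
    few_shot, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(input_ids, max_new_tokens=50)
print(tokenizer.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True))
# Expected shape: {"english": "How are you?"} -- no fluff.
```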
12:40 – 01:00 PM: Mini Hack – The Strict Librarian
Task: Write a system prompt that returns JSON with genre and year for a book (sketch below).
Condition: If the book doesn't exist → return null.
Goal: Output must be parsable via json.loads() in Python.
Why: Real-world AI apps depend on reliable JSON output.
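One way to wire the hack together, reusing the ask_llama helper from above; the prompt wording is illustrative:

```python
import json

SYSTEM = (
    "You are a strict librarian API. Given a book title, reply with JSON "
    'only, in the form {"genre": "...", "year": 1234}. '
    "If the book does not exist, reply with exactly: null"
)

reply = ask_llama("Title: Nineteen Eighty-Four", system=SYSTEM)
data = json.loads(reply)  # raises json.JSONDecodeError if the model added fluff
print(data)               # e.g. {"genre": "Dystopian fiction", "year": 1949}
# json.loads("null") returns None, so the not-found case parses cleanly too.
```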
Wrap-Up & Key Takeaways
✅ LLMs = Predictors, not databases.
✅ Tokenization explains model quirks.
✅ Quantization enables on-device AI.
✅ Prompt engineering = Real control over output.