Demystifying LLMs: From Theory to Hands-On with Llama-3

A Step-by-Step Workshop for Students

Session Goal

  • βœ… Open a Google Colab notebook
  • βœ… Load a 4-bit quantized Llama-3 or Phi-3 model
  • βœ… Write a Python function to interact with the model
  • Focus: From understanding LLM internals to building prompt-ready models

10:00 – 11:00 AM: The Mechanics (How LLMs Work)

  • Goal: Understand that LLMs are not knowledge bases, but next-token predictors.
  • Visual: Transformer Encoder-Decoder architecture.
  • Core idea: Context determines meaning (e.g., 'Bank' in different sentences).
  • Output: Predicts the most probable next word.

Tokenization & The Math Problem

  • Interactive Demo: Use the OpenAI Tokenizer.
  • Example: 'Lollipop' is one token, not letters.
  • Reason for errors: Models operate on tokens, not characters.
  • Lesson: Break problems down for better results.

The Hardware Problem: Quantization

  • Challenge: Llama-3-8B needs ~16GB VRAM; Colab Free Tier gives 15GB.
  • Solution: Quantization – compressing model weights.
  • Analogy: FP16 = high-res pizza; INT4 = pixelated pizza.
  • Takeaway: 4-bit quantization makes large models runnable in Colab.

11:00 – 12:00 PM: The Hello World (Hands-On)

  • Step 1: Change runtime β†’ T4 GPU.
  • Step 2: Install dependencies: torch, transformers, bitsandbytes, accelerate.
  • Step 3: Configure quantization and load pre-trained model.
  • Troubleshoot: Common Colab errors and GPU disconnects.

First Inference: Why β€˜Hi’ Repeats

  • Common mistake: Running model.generate('Hi') directly.
  • Result: Repetitive or meaningless output.
  • Lesson: Models need chat templates for structure.
  • Next: Learn to use Llama-3’s chat format.

12:00 – 12:20 PM: Engineering Prompts

  • Concept: Templates tell the model who is speaking.
  • Format includes system, user, and assistant tags.
  • Activity: Build ask_llama(prompt) with tokenizer.apply_chat_template.
  • Outcome: Consistent, context-aware model responses.

12:20 – 12:40 PM: Few-Shot Prompting

  • Challenge: Translate Telugu to English JSON.
  • Bad Prompt: Adds conversational fluff.
  • Fix: Provide examples in the prompt (few-shot).
  • Output: Precise, code-ready JSON response.

12:40 – 01:00 PM: Mini Hack – The Strict Librarian

  • Task: System prompt to return JSON with genre and year for a book.
  • Condition: If the book doesn’t exist β†’ return null.
  • Goal: Output must be parsable via json.loads() in Python.
  • Why: Real-world AI apps depend on reliable JSON output.

Wrap-Up & Key Takeaways

  • βœ” LLMs = Predictors, not databases.
  • βœ” Tokenization explains model quirks.
  • βœ” Quantization enables on-device AI.
  • βœ” Prompt engineering = Real control over output.

Other Free PPT Tools

Topic to PPT using AI

Generate engaging presentations quickly from just a keyword. Ideal for students and educators needing fast, content-rich slides.

Create PPT from Topic
AI

YouTube to PPT using AI

Turn YouTube videos into informative slide presentations. Excellent for marketers and creators looking to expand their video content's reach.

Create PPT from YouTube
AI

AI PitchDeck Generator

Turn Pitch Deck into informative slide presentations. Excellent for business and startup looking to present his business.

Create PPT from Pitch Deck
AI

Text to PPT using AI

Generate engaging presentations quickly from just a keyword. Ideal for students and educators needing fast, content-rich slides.

Create PPT from Text
AI

URL to PPT using AI

Effortlessly convert any web page into a comprehensive presentation. Perfect for professionals and researchers presenting web-based data.

Create PPT from URL
AI

PDF to PPT using AI

Convert PDF files to PowerPoint slides easily. Essential for analysts and consultants dealing with detailed reports.

Create PPT from PDF
AI