satorucommit/Baby-SLM
BabySLM — Tiny Transformer Language Model

A minimal PyTorch implementation of a Transformer-based language model for educational purposes.

What Does This Model Do?

BabySLM is a next-token prediction model: given a sequence of token IDs (integers that index into the vocabulary), it predicts, at each position, which token should come next. This is the same core task that powers models like GPT, in a much smaller, educational form.
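
To make "predicts what token should come next at each position" concrete, here is a minimal sketch of how training pairs are usually built for this task: the target sequence is the input shifted left by one. The helper name `next_token_pairs` and the example token IDs are hypothetical and are not taken from this repository.

```python
def next_token_pairs(token_ids):
    """Split a token sequence into model inputs and next-token targets."""
    inputs = token_ids[:-1]   # every token except the last
    targets = token_ids[1:]   # the token that follows each input position
    return inputs, targets


inputs, targets = next_token_pairs([5, 42, 7, 19])
print(inputs)   # [5, 42, 7]
print(targets)  # [42, 7, 19]
```

At position 0 the model sees token 5 and should predict 42; at position 1 it sees 5 and 42 and should predict 7, and so on.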

Architecture

  • Token & Position Embeddings: Converts word indices into vectors and adds positional information
  • Single Transformer Block: Uses multi-head attention (4 heads) to learn relationships between tokens
  • Output Head: Projects back to vocabulary space for next-token predictions
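
The three components above can be sketched in PyTorch as follows. This is an illustrative stand-in, not the contents of `model.py`: the class name `TinyLM`, the learned position embedding, the use of `nn.TransformerEncoderLayer` as the single block, the feed-forward width, and the causal mask are all assumptions made for the example.

```python
import torch
import torch.nn as nn


class TinyLM(nn.Module):
    """Hypothetical sketch of the described architecture (not the repo's code)."""

    def __init__(self, vocab_size=1000, embed_dim=32, context_length=16, num_heads=4):
        super().__init__()
        # embed_dim must be divisible by num_heads for multi-head attention.
        self.context_length = context_length
        self.tok_emb = nn.Embedding(vocab_size, embed_dim)       # token embeddings
        self.pos_emb = nn.Embedding(context_length, embed_dim)   # assumed: learned positions
        self.block = nn.TransformerEncoderLayer(                  # single transformer block
            d_model=embed_dim,
            nhead=num_heads,
            dim_feedforward=4 * embed_dim,  # assumed width
            dropout=0.0,
            batch_first=True,
        )
        self.head = nn.Linear(embed_dim, vocab_size)              # output head

    def forward(self, idx):
        # idx: (batch, seq_len) tensor of integer token IDs
        _, seq_len = idx.shape
        if seq_len > self.context_length:
            raise ValueError("sequence is longer than context_length")
        positions = torch.arange(seq_len, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(positions)
        # Assumed causal mask: True marks positions a token may NOT attend to.
        mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=idx.device),
            diagonal=1,
        )
        x = self.block(x, src_mask=mask)
        return self.head(x)  # (batch, seq_len, vocab_size) logits


torch.manual_seed(0)
model = TinyLM()
tokens = torch.randint(0, 1000, (2, 16))
logits = model(tokens)
print(logits.shape)  # torch.Size([2, 16, 1000])
```

Each position gets one score per vocabulary entry, so the last dimension of the output equals `vocab_size`.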

Key Parameters

  • vocab_size: Number of unique tokens the model can handle (e.g., 1000 words)
  • embed_dim: Dimensionality of token embeddings (e.g., 32 or 128)
  • context_length: Maximum sequence length the model can process (e.g., 16 or 64 tokens)
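
As a rough illustration of how these three parameters drive model size, the sketch below counts the weights in the embedding tables and the output head only. It assumes a learned position embedding and an output head with a bias term, neither of which is confirmed by this README, and it ignores the transformer block itself. The function name is hypothetical.

```python
def embedding_and_head_params(vocab_size, embed_dim, context_length):
    """Count parameters outside the transformer block (assumed layout)."""
    token_embedding = vocab_size * embed_dim         # one vector per token
    position_embedding = context_length * embed_dim  # one vector per position
    output_head = embed_dim * vocab_size + vocab_size  # weight matrix plus bias
    return token_embedding + position_embedding + output_head


small = embedding_and_head_params(vocab_size=1000, embed_dim=32, context_length=16)
large = embedding_and_head_params(vocab_size=1000, embed_dim=128, context_length=64)
print(small, large)  # 65512 265192
```

With a 1000-token vocabulary, almost all of these parameters come from the `vocab_size * embed_dim` terms, while `context_length` contributes comparatively little.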

Quick Start (Windows PowerShell)

python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
python model.py

Notes

  • Installing torch may require following the official instructions at https://pytorch.org/ if you need CUDA support or specific wheels for Windows.
  • This is a teaching example: the model is structurally similar to real LLMs but far smaller (production models have billions of parameters and typically stack dozens of transformer blocks).
  • The model is untrained: its weights are randomly initialized, so its predictions are effectively random until it is trained on actual data.
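
One way to quantify "random predictions" is the cross-entropy loss of a model that assigns equal probability to every token, which is ln(vocab_size). The sketch below computes that baseline; treating an untrained model as roughly uniform is an assumption, and the actual starting loss depends on the weight initialization.

```python
import math


def uniform_cross_entropy(vocab_size):
    """Cross-entropy (in nats) when every token gets probability 1 / vocab_size."""
    probability = 1.0 / vocab_size
    return -math.log(probability)


baseline = uniform_cross_entropy(1000)
print(round(baseline, 3))  # 6.908
```

A training loss that drops clearly below this value suggests the model is learning something beyond guessing uniformly.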
