A minimal PyTorch implementation of a Transformer-based language model for educational purposes.
BabySLM is a next-token prediction model: given a sequence of word tokens (represented as integer indices), it predicts which token should come next at each position. This is the same core task that powers models like GPT, at a much smaller scale.
- Token & Position Embeddings: Converts word indices into vectors and adds positional information
- Single Transformer Block: Uses multi-head attention (4 heads) to learn relationships between tokens
- Output Head: Projects back to vocabulary space for next-token predictions
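
The three components above can be sketched in PyTorch roughly as follows. This is an illustrative stand-in, not the contents of `model.py`: the class name `TinyLM`, the attribute names, the pre-norm layout, the 4x MLP width, and the causal mask are all assumptions made for the example.

```python
import torch
import torch.nn as nn


class TinyLM(nn.Module):
    """Hypothetical sketch of an embeddings + one block + output head model."""

    def __init__(self, vocab_size=1000, embed_dim=32, context_length=16, num_heads=4):
        super().__init__()
        # Token & position embeddings (learned position table is an assumption)
        self.tok_emb = nn.Embedding(vocab_size, embed_dim)
        self.pos_emb = nn.Embedding(context_length, embed_dim)
        # Single transformer block: attention + small MLP, each with a residual
        self.ln1 = nn.LayerNorm(embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(embed_dim)
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, 4 * embed_dim),
            nn.GELU(),
            nn.Linear(4 * embed_dim, embed_dim),
        )
        # Output head: project back to vocabulary space
        self.head = nn.Linear(embed_dim, vocab_size)

    def forward(self, idx):
        # idx: (batch, seq_len) tensor of token indices
        _, seq_len = idx.shape
        positions = torch.arange(seq_len, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(positions)
        # Causal mask: True marks future positions that must not be attended to
        mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=idx.device),
            diagonal=1,
        )
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + attn_out
        x = x + self.mlp(self.ln2(x))
        return self.head(x)  # (batch, seq_len, vocab_size)


model = TinyLM()
tokens = torch.randint(0, 1000, (2, 16))  # 2 sequences of 16 random token ids
logits = model(tokens)
print(logits.shape)  # torch.Size([2, 16, 1000])
```

Each position gets one score per vocabulary entry, so the output has shape `(batch, seq_len, vocab_size)`. The causal mask keeps the prediction at a position from depending on later tokens.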
- vocab_size: Number of unique tokens the model can handle (e.g., 1000 words)
- embed_dim: Dimensionality of token embeddings (e.g., 32 or 128)
- context_length: Maximum sequence length the model can process (e.g., 16 or 64 tokens)
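
To get a feel for how these three settings drive model size, here is a small back-of-the-envelope calculation. It assumes a learned position-embedding table and an output head that is a plain linear layer with a bias and no weight tying; the actual `model.py` may differ.

```python
# Example values taken from the parameter descriptions above
vocab_size = 1000
embed_dim = 32
context_length = 16

# One embed_dim-sized vector per token in the vocabulary
token_embedding = vocab_size * embed_dim
# One embed_dim-sized vector per position (assumes a learned table)
position_embedding = context_length * embed_dim
# Linear projection to the vocabulary: weights plus bias (assumes no weight tying)
output_head = embed_dim * vocab_size + vocab_size

print(token_embedding, position_embedding, output_head)  # 32000 512 33000
```

Under these assumptions the vocabulary-sized layers dominate: with a small `embed_dim`, most parameters sit in the token embedding and the output head rather than in the position table.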
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
python model.py

- Installing torch may require following the official instructions at https://pytorch.org/ if you need CUDA support or specific wheels for Windows.
- This is a teaching example: the model is structurally similar to real LLMs but much smaller (real models have billions of parameters and 30+ stacked transformer blocks).
- The model is untrained and will output random predictions until trained on actual data.
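
As a rough reference point for "random predictions": a model that spreads probability evenly over the vocabulary has a cross-entropy loss of ln(vocab_size). The sketch below computes that baseline, assuming the example vocabulary of 1000 tokens; a freshly initialized model will usually start somewhere near this value, though not exactly at it.

```python
import math

vocab_size = 1000  # example value; use the model's actual vocabulary size

# Cross-entropy when every token is given probability 1 / vocab_size
uniform_loss = -math.log(1.0 / vocab_size)

print(round(uniform_loss, 3))  # 6.908
```

A training loss that falls clearly below this number is a simple sign that the model has started to learn something from the data.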