This repository showcases a wide range of examples and implementations built with the Transformers library to highlight different aspects of modern deep learning models. It covers language models, vision transformers, multimodal architectures, and more.
- `Architecture/` – NEW! RoPE (Rotary Position Embedding) comparisons and transformer architecture explorations
- `Genel-1/` – Foundational transformer implementations and configuration examples
- `Genel-2/` – Advanced transformer models (vision transformers and multimodal demos)
- `Genel-3/` – Additional transformer variants and experiments
- `Genel-4/` – Performance comparisons and fine-tuning workflows
- `Genel-5/` – Cutting-edge techniques and model optimisations
- `Multi Modal/` – Multimodal transformer implementations for video, audio, and text
- `Vision Transformers/` – Vision transformer models and applications
- `Time series - Transformers/` – Time-series analysis with transformer models
- `Tokenizer/` – Custom tokenizer implementations and training scripts
- `llama/` – LLaMA model implementation and utilities
- `Qwen3/` – Qwen 3 model examples and usage guides
- `finetuned-llm/` – Fine-tuned language model checkpoints
- `archive/` – MMLU benchmark results and archived artefacts
- `test-time-scaling.py` – Test-time scaling implementation for language models
- `requirements.txt` – Core Python dependencies
- `requirements-jax.txt` – Additional dependencies for the JAX ecosystem
- `requirements-dev.txt` – Tooling for development and advanced training
- `setup.sh` – Automated setup script
- `.env.example` – Template for environment variables
- `CONTRIBUTING.md` – Contribution guidelines
Ensure that Python 3.7+ is installed on your system.
Automatic Setup (Recommended):

```bash
# Clone the repository
git clone https://github.com/emredeveloper/Transformers-Examples.git
cd Transformers-Examples

# Run the automated setup script (default profile: base)
chmod +x setup.sh
./setup.sh --venv

# To include JAX or development dependencies:
# ./setup.sh --profile jax
# ./setup.sh --profile dev
# ./setup.sh --profile all
```

Manual Setup:
- Clone the repository:

```bash
git clone https://github.com/emredeveloper/Transformers-Examples.git
cd Transformers-Examples
```

- Create a virtual environment (recommended):

```bash
python -m venv .venv

# Windows:
.venv\Scripts\activate

# Linux/macOS:
source .venv/bin/activate
```

- Install dependencies:

```bash
pip install -r requirements.txt

# Extra dependencies for JAX experiments:
# pip install -r requirements-jax.txt

# Development tooling:
# pip install -r requirements-dev.txt
```

Dependency profiles:

- Base (`requirements.txt`): Core packages required for PyTorch, Transformers, and most examples.
- JAX (`requirements-jax.txt`): Adds `jax`, `jaxlib`, and `flax` for JAX-based experiments.
- Development (`requirements-dev.txt`): Provides notebooks, large-scale training helpers, and advanced tooling (`jupyter`, `notebook`, `fairscale`, `deepspeed`).
The `setup.sh` script can install these profiles automatically with the `--profile` flag. The default profile is `base`.
- Configure environment variables:

```bash
# Copy the template to .env
copy .env.example .env   # Windows
cp .env.example .env     # Linux/macOS

# Edit .env and add your Hugging Face token
```

Running Examples (each snippet assumes you start from the repository root):

```bash
cd Architecture
python partial-rope.py
```

```bash
cd Genel-1
python app.py
```

```bash
cd "Vision Transformers"
jupyter notebook sglip2.ipynb
```

```bash
cd "Multi Modal"
python basic-multimodal.py
```

```bash
cd llama
python run_cpu.py
```

```bash
cd Tokenizer
python tokenizer.py
```

```bash
python test-time-scaling.py
```

Many examples can be configured via environment variables:
- `HUGGINGFACE_TOKEN`: Your Hugging Face API token
- `CUDA_VISIBLE_DEVICES`: GPU device selection
- `MODEL_CACHE_DIR`: Cache directory for downloaded models
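A script can pick these up at startup. Here is a minimal sketch, assuming the third-party `python-dotenv` package is available (`pip install python-dotenv`); the fallback cache path is illustrative:

```python
import os

from dotenv import load_dotenv  # third-party package: python-dotenv

load_dotenv()  # read variables from the .env file into the process environment

hf_token = os.getenv("HUGGINGFACE_TOKEN")
cache_dir = os.getenv("MODEL_CACHE_DIR", "./model-cache")  # illustrative fallback

if hf_token is None:
    raise RuntimeError("HUGGINGFACE_TOKEN is not set; copy .env.example to .env first")
```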
- GPT-2 configuration and fine-tuning
- DeepSeek transformer implementations
- Qwen 3 model usage
- Test-time scaling techniques (see the sketch after this list)
- RoPE (Rotary Position Embedding) comparisons
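The approach in `test-time-scaling.py` is defined by the script itself. As a rough illustration of the general idea (spending extra compute at inference time to improve output quality), here is a best-of-N sampling sketch; the checkpoint name, prompt, and scoring rule are illustrative assumptions:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # illustrative checkpoint; any causal LM should work
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("The capital of France is", return_tensors="pt")

# Sample N candidate completions in one batch.
outputs = model.generate(
    **inputs,
    do_sample=True,
    num_return_sequences=8,
    max_new_tokens=40,
    return_dict_in_generate=True,
    output_scores=True,
    pad_token_id=tokenizer.eos_token_id,
)

# Score each candidate by its summed token log-probability and keep the best one.
transition_scores = model.compute_transition_scores(
    outputs.sequences, outputs.scores, normalize_logits=True
)
best = transition_scores.sum(dim=-1).argmax()
print(tokenizer.decode(outputs.sequences[best], skip_special_tokens=True))
```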
- Vision Transformer (ViT) implementations (see the example after this list)
- SGLIP-2 multimodal understanding
- Image classification examples
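As a quick-start sketch, image classification with a pretrained ViT can look like this; the checkpoint and image URL are illustrative, and the notebooks in this directory may use different models:

```python
from transformers import pipeline

# Illustrative checkpoint; swap in whichever ViT variant you are exploring.
classifier = pipeline("image-classification", model="google/vit-base-patch16-224")

# The pipeline accepts local file paths, PIL images, or URLs.
predictions = classifier("http://images.cocodataset.org/val2017/000000039769.jpg")
for p in predictions:
    print(f"{p['label']}: {p['score']:.3f}")
```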
- Video, audio, and text processing
- Cross-modal attention mechanisms (see the sketch after this list)
- Multimodal fusion techniques
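As a minimal sketch of the cross-modal attention idea (the dimensions and module structure below are illustrative assumptions, not the exact code in this directory):

```python
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Text queries attend over image (or audio) features."""

    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
        # Queries come from one modality, keys/values from the other.
        fused, _ = self.attn(query=text, key=image, value=image)
        return self.norm(text + fused)  # residual connection

# Toy usage: 8 text tokens attend over 16 image patch embeddings.
text_feats = torch.randn(1, 8, 256)
image_feats = torch.randn(1, 16, 256)
print(CrossModalAttention()(text_feats, image_feats).shape)  # torch.Size([1, 8, 256])
```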
- Transformer-based time-series forecasting (see the sketch after this list)
- Sequence-to-sequence modelling
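As a minimal sketch of transformer-based forecasting, here is a tiny encoder-only model that predicts one step ahead; the architecture and sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

class TinyForecaster(nn.Module):
    """Minimal encoder-only transformer for one-step-ahead forecasting."""

    def __init__(self, d_model: int = 64, context_len: int = 32):
        super().__init__()
        self.input_proj = nn.Linear(1, d_model)  # project scalar observations
        self.pos = nn.Parameter(torch.zeros(context_len, d_model))  # learned positions
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 1)  # predict the next value

    def forward(self, series: torch.Tensor) -> torch.Tensor:
        # series: (batch, context_len) of past observations
        h = self.input_proj(series.unsqueeze(-1)) + self.pos
        return self.head(self.encoder(h)[:, -1])  # forecast from the last position

window = torch.randn(4, 32)  # 4 toy series, 32 past time steps each
print(TinyForecaster()(window).shape)  # torch.Size([4, 1])
```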
- Mixture of Experts (MoE) (see the sketch after this list)
- Cross-attention mechanisms
- Custom tokenisation strategies
- Model optimisation techniques
- Partial RoPE implementations
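As a minimal sketch of the MoE idea with top-1 routing (the sizes and structure below are illustrative assumptions):

```python
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    """Minimal top-1 mixture-of-experts feed-forward layer."""

    def __init__(self, dim: int = 64, num_experts: int = 4):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)  # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim); route each token to its highest-scoring expert
        gates = self.router(x).softmax(dim=-1)
        expert_idx = gates.argmax(dim=-1)
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                # Scale by the gate value so gradients flow through the router
                out[mask] = expert(x[mask]) * gates[mask, i].unsqueeze(-1)
        return out

tokens = torch.randn(10, 64)
print(TinyMoE()(tokens).shape)  # torch.Size([10, 64])
```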
This directory focuses on advanced transformer architecture examples:
- `partial-rope.py`: Partial RoPE vs. full RoPE performance comparison
- Detailed benchmark results and visualisations
- Memory usage analyses
- Ablation studies
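For the exact setup, see the script itself. As a hedged sketch of the core idea, partial RoPE applies rotary embeddings to only a fraction of each head's feature dimensions while the rest pass through unchanged; the dimension split and function names below are illustrative:

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embedding to x of shape (seq_len, dim)."""
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-torch.arange(0, half, dtype=torch.float32) / half)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs  # (seq, half)
    x1, x2 = x[..., :half], x[..., half:]
    # Rotate each (x1, x2) pair by a position-dependent angle.
    return torch.cat(
        [x1 * angles.cos() - x2 * angles.sin(),
         x1 * angles.sin() + x2 * angles.cos()], dim=-1
    )

def partial_rope(x: torch.Tensor, rotary_frac: float = 0.5) -> torch.Tensor:
    """Rotate only the first rotary_frac of the feature dimensions."""
    rot_dim = int(x.shape[-1] * rotary_frac)
    rot_dim -= rot_dim % 2  # keep the rotated slice even-sized
    return torch.cat([rope(x[..., :rot_dim]), x[..., rot_dim:]], dim=-1)

q = torch.randn(16, 64)       # 16 positions, 64 dims per head
print(partial_rope(q).shape)  # torch.Size([16, 64]): half rotated, half untouched
```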
Contributions are welcome! Feel free to open a Pull Request. For major changes, please start a discussion by opening an issue first.
See CONTRIBUTING.md for more information.
This project is open source and available under the MIT License. Some third-party examples may include their own licence texts (e.g., Apache 2.0) and are distributed under the terms specified in their respective directories.
- Certain examples require special access to hosted models
- A GPU is recommended for large-scale models
- Check the individual directory README files for specific requirements
- Ensure authentication is configured for Hugging Face models
- Remember to create the `.env` file and add your API tokens
- Import errors: Verify all dependencies are installed
- CUDA errors: Check GPU availability and CUDA installation
- Model access: Confirm you have permission to use private models
- Out of memory: Reduce batch sizes or switch to smaller model variants
- Token errors: Ensure your Hugging Face token is set correctly in `.env`
For deeper assistance, review the documentation in the relevant directory or open an issue.
The repository includes performance comparisons for multiple transformer variants:
- Speed and accuracy comparisons between RoPE implementations
- MMLU benchmark results (see the `archive/` directory)
- Analyses of model optimisation techniques
For detailed results, inspect the `Architecture/` directory and the generated PNG assets.
Install the optional development dependencies to run lightweight tests and quality checks:

```bash
pip install -r requirements-dev.txt
```

Then run:

```bash
pytest
ruff check tests
black --check tests
```

The continuous integration workflow executes these checks automatically.