PolyglotLite

Lightweight multilingual language models that run on consumer hardware.

PolyglotLite is a toolkit for training and running efficient multilingual language models (100M-500M parameters) without needing expensive cloud GPUs. It's designed for developers who want to experiment with multilingual NLP on their own machines.

Status

This is an early-stage project. The model architecture is implemented but pretrained weights are still in development. You can:

Train models from scratch on your own data
Fine-tune on custom datasets
Experiment with the architecture
Use the language detection utilities

Pretrained weights coming soon.

Quick Start

Installation

git clone https://github.com/IIIDman/polyglotlite.git
cd polyglotlite
pip install -e .

Basic Usage

from polyglotlite import PolyglotLite

# Initialize a model
model = PolyglotLite(model_name="polyglot-135m")

# Generate text (note: without pretrained weights, output will be random)
output = model.generate("Hello world", max_length=50)

Using Pretrained Models

For actual text generation with real pretrained weights:

from polyglotlite import PolyglotLiteHF

# Load pretrained model (downloads automatically)
model = PolyglotLiteHF("polyglot-135m")

# Generate text
output = model.generate("The future of AI is", max_length=50)
print(output)

This requires the transformers package: pip install transformers

Training on Your Data

from polyglotlite import PolyglotLite, Trainer

model = PolyglotLite(model_name="polyglot-135m")

trainer = Trainer(
    model=model,
    train_data="path/to/your/data.json",
    learning_rate=2e-4,
    batch_size=8
)
trainer.train()

model.save_pretrained("my-model")
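
The Trainer example above points at a JSON file, but this README does not describe its schema. As a minimal sketch, assuming the trainer accepts a JSON array of records with a "text" field (an assumption for illustration; check polyglotlite/training for the real format), such a file could be written with the standard library. The helper name write_training_file is hypothetical and not part of PolyglotLite.

```python
import json
from pathlib import Path


def write_training_file(texts, path):
    """Write text samples to a JSON file as a list of {"text": ...} records.

    NOTE: the {"text": ...} schema is an assumption for illustration;
    the format PolyglotLite's Trainer expects may differ.
    """
    # Drop empty or whitespace-only samples and trim the rest
    records = [{"text": t.strip()} for t in texts if t and t.strip()]
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # ensure_ascii=False keeps non-Latin scripts readable in the file
    with path.open("w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
    return len(records)
```

The resulting path would then be passed as train_data, if the assumed schema matches what the trainer reads.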

Language Detection

from polyglotlite import detect_language

detect_language("Bonjour le monde")  # returns 'fr'
detect_language("你好世界")  # returns 'zh'
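
One common use of a detector like this is splitting a mixed corpus by language before training. The sketch below shows that pattern; group_by_language is a hypothetical helper, not part of PolyglotLite, and it takes the detector as an argument so it only assumes a function that maps a string to a language code, as detect_language does above.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def group_by_language(
    texts: Iterable[str],
    detect: Callable[[str], str],
) -> Dict[str, List[str]]:
    """Bucket texts by the language code that `detect` returns for each."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for text in texts:
        groups[detect(text)].append(text)
    # Return a plain dict so missing keys raise instead of creating buckets
    return dict(groups)
```

With PolyglotLite installed, this would be called as group_by_language(corpus, detect_language).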

Model Sizes

Model	Parameters	Memory (FP16)
polyglot-135m	135M	~270MB
polyglot-360m	360M	~720MB
polyglot-500m	500M	~1GB
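
The memory figures in the table match a simple rule of thumb: FP16 stores each parameter in 2 bytes, so the weights alone take roughly 2 bytes times the parameter count. The sketch below reproduces that arithmetic. It assumes the table counts weights only; activations, optimizer state, and framework overhead come on top. The helper name weight_memory_mb is illustrative, not part of PolyglotLite.

```python
# Bytes needed to store one parameter at each precision
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_mb(num_params: int, dtype: str = "fp16") -> float:
    """Approximate memory for the weights alone, in decimal megabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


for name, params in [
    ("polyglot-135m", 135_000_000),
    ("polyglot-360m", 360_000_000),
    ("polyglot-500m", 500_000_000),
]:
    print(f"{name}: ~{weight_memory_mb(params):.0f} MB")
```

Training in FP32 would roughly double these numbers before gradients and optimizer state are counted.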

Supported Languages

The tokenizer and language detection support 50+ languages including English, Chinese, Spanish, French, German, Portuguese, Russian, Japanese, Korean, Arabic, Hindi, Vietnamese, Turkish, Polish, and many others. See polyglotlite/utils/language.py for the full list.

Compatibility

Platforms: macOS (Intel & Apple Silicon), Linux, Windows

Python: 3.8 - 3.13

Apple Silicon Note: For stability on M1/M2/M3/M4 Macs, the model defaults to CPU. You can opt in to MPS acceleration with device="mps", but some PyTorch operations may not work correctly on that backend.
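
If you want to opt in to MPS only when it is actually usable, a small guard can pick the device string for you. This is a sketch, not PolyglotLite's own logic: pick_device is a hypothetical helper, and it relies only on PyTorch's torch.backends.mps.is_available() check, falling back to CPU when PyTorch or the MPS backend is missing.

```python
def pick_device(prefer_mps: bool = False) -> str:
    """Return "mps" only when requested and usable; otherwise "cpu"."""
    if not prefer_mps:
        return "cpu"
    try:
        import torch
    except ImportError:
        # No PyTorch installed, so there is nothing to accelerate
        return "cpu"
    # Older PyTorch builds do not ship the mps backend module at all
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```

The returned string can then be passed as the device argument described in the note above.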

Project Structure

polyglotlite/
├── polyglotlite/
│   ├── models/          # Model architecture
│   ├── tokenizers/      # Tokenization  
│   ├── training/        # Training loop, configs
│   ├── inference/       # (planned) Optimized inference
│   └── utils/           # Language detection, helpers
├── examples/            
├── tests/               
└── scripts/

Troubleshooting

MPS errors on Mac: load the model on the CPU explicitly.

model = PolyglotLite.from_pretrained("polyglot-135m", device="cpu")

Import errors: Make sure you're in the directory containing pyproject.toml when running pip install -e .

License

MIT
