
quantization support #71

@neverclover


Requirements
Quantization Methods: Ensure compatibility with bitsandbytes to provide 8-bit and 4-bit quantization options within the existing model inference workflow.
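
A minimal sketch of how 8-bit and 4-bit options could plug into a model-loading step. It assumes the project loads models through Hugging Face transformers (`AutoModelForCausalLM` and `BitsAndBytesConfig`). The mode names `none`/`8bit`/`4bit` and the helpers `quantization_kwargs` and `load_model` are hypothetical and are not an existing API in this project.

```python
from typing import Any, Dict, Optional

# Hypothetical mode names; the issue does not define a flag or API for this.
SUPPORTED_MODES = ("none", "8bit", "4bit")


def quantization_kwargs(mode: str) -> Optional[Dict[str, Any]]:
    """Map a mode to keyword arguments for transformers' BitsAndBytesConfig.

    Returns None when quantization is disabled.
    """
    if mode not in SUPPORTED_MODES:
        raise ValueError(f"unknown quantization mode {mode!r}; expected one of {SUPPORTED_MODES}")
    if mode == "none":
        return None
    if mode == "8bit":
        return {"load_in_8bit": True}
    # 4-bit: NF4 weights plus nested ("double") quantization of the scaling constants.
    return {
        "load_in_4bit": True,
        "bnb_4bit_quant_type": "nf4",
        "bnb_4bit_use_double_quant": True,
    }


def load_model(model_id: str, mode: str = "none"):
    """Load a causal LM, optionally quantized with bitsandbytes.

    Imports are deferred so the mode mapping above can be used and tested
    without transformers, bitsandbytes, or a GPU installed.
    """
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    kwargs = quantization_kwargs(mode)
    if kwargs is None:
        return AutoModelForCausalLM.from_pretrained(model_id)
    return AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=BitsAndBytesConfig(**kwargs),
        device_map="auto",  # requires the accelerate package
    )
```

Keeping the mode-to-config mapping separate from loading lets it be validated without a GPU. The NF4 and double-quantization settings are one reasonable 4-bit default, not something this issue prescribes.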

Documentation: Provide clear instructions on how to toggle quantization modes, list necessary dependencies, and specify supported/unsupported model architectures.
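
For the dependency list, a sketch that checks packages up front so a missing install fails with a clear hint instead of a deep import error. The package sets are an assumption for a transformers + bitsandbytes route (`accelerate` is needed for `device_map="auto"`), and `missing_packages` and `check_mode` are hypothetical helpers.

```python
import importlib.util
from typing import Dict, List

# Assumed package sets per mode; adjust to the project's actual inference stack.
REQUIRED_PACKAGES: Dict[str, List[str]] = {
    "none": ["torch", "transformers"],
    "8bit": ["torch", "transformers", "accelerate", "bitsandbytes"],
    "4bit": ["torch", "transformers", "accelerate", "bitsandbytes"],
}


def missing_packages(mode: str) -> List[str]:
    """Return the packages required for `mode` that cannot be found."""
    if mode not in REQUIRED_PACKAGES:
        raise ValueError(f"unknown quantization mode {mode!r}")
    return [name for name in REQUIRED_PACKAGES[mode]
            if importlib.util.find_spec(name) is None]


def check_mode(mode: str) -> None:
    """Raise with an install hint if any required package is missing."""
    missing = missing_packages(mode)
    if missing:
        raise RuntimeError(
            f"quantization mode {mode!r} needs: {', '.join(missing)} "
            f"(try: pip install {' '.join(missing)})"
        )
```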

Examples & Benchmarks: Include integration examples and API usage code. Provide a comparative analysis of model accuracy, inference speed, and memory usage before and after quantization.
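
A sketch of two helpers a before/after comparison could start from: a weights-only memory estimate (parameters × bits ÷ 8) and a median wall-clock timer. Both helper names are hypothetical. The memory figure ignores quantization constants, activations, the KV cache, and any layers a backend keeps in higher precision, so measured usage will be higher. Accuracy still has to be measured on a task-specific evaluation set.

```python
import statistics
import time
from typing import Callable, Dict

BITS_PER_WEIGHT = {"fp16": 16, "8bit": 8, "4bit": 4}


def weight_memory_gb(n_params: float, bits: int) -> float:
    """Weights-only storage in GB (1e9 bytes): n_params * bits / 8."""
    return n_params * bits / 8 / 1e9


def memory_table(n_params: float) -> Dict[str, float]:
    """Weights-only estimate for each precision in BITS_PER_WEIGHT."""
    return {mode: weight_memory_gb(n_params, bits) for mode, bits in BITS_PER_WEIGHT.items()}


def median_latency_s(fn: Callable[[], object], warmup: int = 2, iters: int = 10) -> float:
    """Median wall-clock seconds per fn() call, after `warmup` untimed calls."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

For a 7-billion-parameter model, `memory_table(7e9)` gives 14.0 GB (fp16), 7.0 GB (8-bit), and 3.5 GB (4-bit) of weights. When timing GPU inference, `fn` should call `torch.cuda.synchronize()` before returning, because CUDA work is asynchronous. Quantized kernels are not always faster than fp16, so speed should be measured rather than assumed.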

Apple Silicon Support (Optional): Include compatibility notes or specific configurations required for running quantized models on Apple Silicon (M-series) hardware.
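
A sketch of a platform fallback for M-series Macs. It assumes the bitsandbytes path is used only with CUDA GPUs; bitsandbytes support for non-CUDA backends may differ between releases, so its current documentation should be checked before adopting this policy. `is_apple_silicon`, `choose_runtime`, and `detect_runtime` are hypothetical helpers; `torch.cuda.is_available()` and `torch.backends.mps.is_available()` are PyTorch calls.

```python
import platform
from typing import Tuple


def is_apple_silicon() -> bool:
    """True on an M-series Mac running a native arm64 Python."""
    return platform.system() == "Darwin" and platform.machine() == "arm64"


def choose_runtime(requested_mode: str, cuda_available: bool,
                   mps_available: bool) -> Tuple[str, str]:
    """Return (quantization_mode, device) under a hypothetical fallback policy.

    Assumption: bitsandbytes quantization is only used with CUDA, so without a
    CUDA GPU the model is loaded unquantized on MPS (Apple Silicon) or the CPU.
    """
    if cuda_available:
        return requested_mode, "cuda"
    return "none", ("mps" if mps_available else "cpu")


def detect_runtime(requested_mode: str) -> Tuple[str, str]:
    """Query PyTorch for available devices and apply choose_runtime()."""
    import torch  # deferred so choose_runtime() can be tested without torch

    return choose_runtime(
        requested_mode,
        cuda_available=torch.cuda.is_available(),
        mps_available=torch.backends.mps.is_available(),
    )
```

Falling back to unquantized weights keeps inference working on Macs at the cost of memory; an Apple-specific quantization route would be a separate design decision.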

Motivation
Resource Efficiency: Lower VRAM/RAM consumption to allow the deployment of larger models on hardware with limited resources.

Inference Speed: Improve throughput so that models can be deployed faster and real-world applications respond more quickly.
