
Commit d123006

New learning path: Deploy Arcee AFM-4.5B on Google Axion
1 parent 7cae6e4 commit d123006

6 files changed: +389 -0 lines changed
Lines changed: 102 additions & 0 deletions
@@ -0,0 +1,102 @@
---
title: Launching an Axion c4a instance
weight: 2

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## System Requirements

- A Google Cloud account with billing enabled
- Quota for c4a instances in your preferred region
- A Linux or macOS host
- A c4a-standard-4 or larger instance (this Learning Path uses c4a-standard-32)
- At least 128 GB of storage

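If you already have the gcloud CLI installed on your host, you can check which zones offer the machine type used in this Learning Path. This is an optional, hedged example:

```bash
# List the zones where the c4a-standard-32 machine type is offered
gcloud compute machine-types list --filter="name=c4a-standard-32" --format="value(zone)"
```
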
## Google Cloud Console Steps

Follow these steps to launch your Compute Engine instance using the Google Cloud Console:

### Step 1: Launch Compute Engine Instance

1. **Navigate to Google Cloud Console**

   - Go to the [Google Cloud Console](https://console.cloud.google.com)
   - Make sure you're in the correct project
   - In the left navigation menu, click "Compute Engine" > "VM instances"

2. **Create Instance**

   Click the "CREATE INSTANCE" button

3. **Configure Instance Details**

   - **Name**: Enter `arcee-axion-instance`
   - **Region**: Select a region where c4a instances are available (e.g., us-central1, us-east1, europe-west1)
   - **Zone**: Select any zone in the chosen region
   - **Machine family**: Select "General purpose"
   - **Series**: Select "C4A"
   - **Machine type**: Select `c4a-standard-32` or larger
     - This provides 32 vCPUs and 128 GB memory

4. **Configure OS and Storage**

   In the left menu, click on "OS and storage"

   - Click "Change"
   - **Size (GB)**: Set to `128`
   - Click "Select"

5. **Configure Networking**

   In the left menu, click on "Networking"

   - **Important**: We'll configure SSH access through IAP (Identity-Aware Proxy) for security

6. **Create Instance**

   - Review all settings
   - Click "Create" at the bottom of the screen

### Step 2: Connect to Your Instance

After a minute or so, the instance should be available.

- In the VM instances list, locate the instance name (`arcee-axion-instance`) and click on "SSH"
- This opens a browser-based SSH terminal. You may need to accept a security prompt
- No additional configuration is needed
- You should now be connected to your Ubuntu instance

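If you prefer the command line, the console workflow above can be approximated with the gcloud CLI. This is a sketch, not a verified recipe: the zone, image family, and boot disk type below are assumptions to adapt to your project.

```bash
# Sketch only: adjust the zone, image, and disk type to your environment
gcloud compute instances create arcee-axion-instance \
  --zone=us-central1-a \
  --machine-type=c4a-standard-32 \
  --image-family=ubuntu-2404-lts-arm64 \
  --image-project=ubuntu-os-cloud \
  --boot-disk-size=128GB \
  --boot-disk-type=hyperdisk-balanced

# Connect through IAP rather than a public IP (see the Security note below)
gcloud compute ssh arcee-axion-instance --zone=us-central1-a --tunnel-through-iap
```
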
### Important Notes

- **Region Selection**: Ensure you're in a region where c4a instances are available
- **Quota**: Make sure you have sufficient quota for c4a instances in your selected region
- **Security**: The browser-based SSH connection is more secure than exposing a public SSH port, as it uses Google's Identity-Aware Proxy
- **Storage**: The 128 GB boot disk is sufficient for the Arcee model and dependencies
- **Cost**: Monitor your usage in the Google Cloud Console billing section
- **Backup**: Consider creating snapshots for backup purposes, as shown below
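
For example, a one-off snapshot of the boot disk can be created with the gcloud CLI. The zone and snapshot name here are placeholders; by default, the boot disk shares the instance name:

```bash
# Snapshot the boot disk (named after the instance by default)
gcloud compute disks snapshot arcee-axion-instance \
  --zone=us-central1-a \
  --snapshot-names=arcee-axion-backup
```
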
Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
---
title: Setting up the instance
weight: 3

### FIXED, DO NOT MODIFY
layout: learningpathall
---

In this step, we'll set up the Axion c4a instance with all the necessary tools and dependencies required to build and run the Arcee Foundation Model. This includes installing the build tools and Python environment.

## Step 1: Update Package List

```bash
sudo apt-get update
```

This command updates the local package index from the repositories:

- Downloads the latest package lists from all configured APT repositories
- Ensures you have the most recent information about available packages and their versions
- This is a best practice before installing new packages to avoid potential conflicts
- The package index contains metadata about available packages, their dependencies, and version information

## Step 2: Install System Dependencies

```bash
sudo apt-get install cmake gcc g++ git python3 python3-pip python3-virtualenv libcurl4-openssl-dev unzip -y
```

This command installs all the essential development tools and dependencies:

- **cmake**: Cross-platform build system generator that we'll use to compile Llama.cpp
- **gcc & g++**: GNU C and C++ compilers for building native code
- **git**: Version control system for cloning repositories
- **python3**: Python interpreter for running Python-based tools and scripts
- **python3-pip**: Python package installer for managing Python dependencies
- **python3-virtualenv**: Tool for creating isolated Python environments
- **libcurl4-openssl-dev**: Development files for libcurl, the client-side URL transfer library
- **unzip**: Utility for extracting ZIP archives

The `-y` flag automatically answers "yes" to prompts, making the installation non-interactive.
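
As a quick sanity check, you can confirm that the main tools are on the PATH and print their versions:

```bash
cmake --version
gcc --version
git --version
python3 --version
```
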
## What's Ready Now

After completing these steps, your Axion c4a instance will have:

- A complete C/C++ development environment for building Llama.cpp
- Python 3 with pip for managing Python packages
- Git for cloning repositories
- All necessary build tools for compiling optimized ARM64 binaries

The system is now prepared for the next steps: building Llama.cpp and downloading the Arcee Foundation Model.
Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
---
title: Building Llama.cpp
weight: 4

### FIXED, DO NOT MODIFY
layout: learningpathall
---

In this step, we'll build Llama.cpp from source. Llama.cpp is a high-performance C++ inference engine, originally created for Meta's LLaMA models, that's optimized for a wide range of hardware platforms, including Arm-based processors like Google Axion.

Even though AFM-4.5B has a custom model architecture, we're able to use the vanilla version of llama.cpp, as the Arcee AI team has contributed the appropriate modeling code.

Here are all the steps.

## Step 1: Clone the Repository

```bash
git clone https://github.com/ggerganov/llama.cpp
```

This command clones the Llama.cpp repository from GitHub to your local machine. The repository contains the source code, build scripts, and documentation needed to compile the inference engine.

## Step 2: Navigate to the Project Directory

```bash
cd llama.cpp
```

Change into the llama.cpp directory where we'll perform the build process. This directory contains the CMakeLists.txt file and source code structure.

## Step 3: Configure the Build with CMake

```bash
cmake -B .
```

This command uses CMake to configure the build system:
- `-B .` specifies that the build files should be generated in the current directory
- CMake will detect your system's compiler, libraries, and hardware capabilities
- It will generate the appropriate build files (Makefiles on Linux) based on your system configuration

Note: The cmake output should include the information below, indicating that the build process will leverage the Neoverse V2 architecture's specialized instruction sets designed for AI/ML workloads. These optimizations are crucial for achieving optimal performance on Axion:

```bash
-- ARM feature DOTPROD enabled
-- ARM feature SVE enabled
-- ARM feature MATMUL_INT8 enabled
-- ARM feature FMA enabled
-- ARM feature FP16_VECTOR_ARITHMETIC enabled
-- Adding CPU backend variant ggml-cpu: -mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+dotprod+i8mm+sve
```

- **DOTPROD: Dot Product** - Hardware-accelerated dot product operations for neural network computations
- **SVE: Scalable Vector Extension** - Advanced vector processing capabilities that can handle variable-length vectors up to 2048 bits, providing significant performance improvements for matrix operations
- **MATMUL_INT8: Matrix multiplication units** - Dedicated hardware for efficient matrix operations common in transformer models, accelerating the core computations of large language models
- **FMA: Fused Multiply-Add** - Optimized floating-point operations that combine multiplication and addition in a single instruction
- **FP16_VECTOR_ARITHMETIC: FP16 Vector Arithmetic** - Hardware support for 16-bit floating-point vector operations, reducing memory usage while maintaining good numerical precision

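Optionally, you can cross-check these features against what the kernel reports. On Arm Linux, the `Features` line in `/proc/cpuinfo` lists flags such as `asimddp` (dot product), `sve`, and `i8mm` (INT8 matrix multiply):

```bash
# Print the CPU feature flags for the first core
grep -m1 Features /proc/cpuinfo
```
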
## Step 4: Compile the Project

```bash
cmake --build . --config Release -j16
```

This command compiles the Llama.cpp project:
- `--build .` tells CMake to build the project using the files in the current directory
- `--config Release` specifies a Release build configuration, which enables optimizations and removes debug symbols
- `-j16` runs the build with 16 parallel jobs, which speeds up compilation on multi-core systems like Axion

The build process will compile the C++ source code into executable binaries optimized for your ARM64 architecture. This should only take a minute.
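
When the build completes, a quick way to confirm the binaries were produced and run is to list the output directory and print the version string (paths are relative to the llama.cpp directory):

```bash
ls bin/
bin/llama-cli --version
```
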

## What Gets Built

After successful compilation, you'll have several key command-line executables in the `bin` directory:

- `llama-cli` - The main inference executable for running LLaMA models
- `llama-server` - A web server for serving model inference over HTTP
- `llama-quantize` - A tool for model quantization to reduce memory usage
- Various utility programs for model conversion and optimization

You can find more information in the llama.cpp [GitHub repository](https://github.com/ggml-org/llama.cpp/tree/master/tools).

These binaries are specifically optimized for ARM64 architecture and will provide excellent performance on your Google Axion instance.
Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
---
title: Installing Python dependencies for llama.cpp
weight: 5

### FIXED, DO NOT MODIFY
layout: learningpathall
---

In this step, we'll set up a Python virtual environment and install the required dependencies for working with Llama.cpp. This ensures we have a clean, isolated Python environment with all the necessary packages for model optimization.

Here are all the steps.

## Step 1: Create a Python Virtual Environment

```bash
virtualenv env-llama-cpp
```

This command creates a new Python virtual environment named `env-llama-cpp`:
- Virtual environments provide isolated Python environments that prevent conflicts between different projects
- The `env-llama-cpp` directory will contain its own Python interpreter and package installation space
- This isolation ensures that the Llama.cpp dependencies won't interfere with other Python projects on your system
- Virtual environments are essential for reproducible development environments

## Step 2: Activate the Virtual Environment

```bash
source env-llama-cpp/bin/activate
```

This command activates the virtual environment:
- The `source` command executes the activation script, which modifies your current shell environment
- Depending on your shell, your command prompt may change to show `(env-llama-cpp)` at the beginning, indicating the active environment. We will reflect this in the following commands.
- All subsequent `pip` commands will install packages into this isolated environment
- The `PATH` environment variable is updated to prioritize the virtual environment's Python interpreter

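To confirm the activation worked, check which interpreter is now first on the `PATH`; it should point inside the environment:

```bash
(env-llama-cpp) which python3
# Expected: a path ending in env-llama-cpp/bin/python3
```
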
## Step 3: Upgrade pip to the Latest Version

```bash
(env-llama-cpp) pip install --upgrade pip
```

This command ensures you have the latest version of pip:
- Upgrading pip helps avoid compatibility issues with newer packages
- The `--upgrade` flag tells pip to install the newest available version
- This is a best practice before installing project dependencies
- Newer pip versions often include security fixes and improved package resolution

## Step 4: Install Project Dependencies

```bash
(env-llama-cpp) pip install -r requirements.txt
```

This command installs all the Python packages specified in the requirements.txt file:
- The `-r` flag tells pip to read the package list from the specified file
- `requirements.txt` contains a list of Python packages and their version specifications
- This ensures everyone working on the project uses the same package versions
- The installation will include the packages needed by llama.cpp's Python tooling, such as the model conversion scripts

## What Gets Installed

After successful installation, your virtual environment will contain:

- **NumPy**: For numerical computations and array operations
- **Requests**: For HTTP operations and API calls
- **Other dependencies**: Specific packages needed for Llama.cpp Python integration

The virtual environment is now ready for running Python scripts that interact with the compiled Llama.cpp binaries. Remember to always activate the virtual environment (`source env-llama-cpp/bin/activate`) before running any Python code related to this project.
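
As a final sanity check — assuming NumPy and Requests are among the installed packages, as listed above — you can import them from within the environment:

```bash
(env-llama-cpp) python3 -c "import numpy, requests; print(numpy.__version__, requests.__version__)"
```
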
Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
---
title: Deploy Arcee AFM-4.5B on Google Axion

minutes_to_complete: 30

who_is_this_for: This is an introductory topic for developers and engineers who want to deploy the Arcee AFM-4.5B small language model on a Google Cloud Axion c4a instance. AFM-4.5B is a 4.5-billion-parameter frontier model that delivers excellent accuracy, strict compliance, and very high cost-efficiency. It was trained on almost 7 trillion tokens of clean, rigorously filtered data, and has been tested across a wide range of languages, including Arabic, English, French, German, Hindi, Italian, Korean, Mandarin, Portuguese, Russian, and Spanish.

learning_objectives:
- Launch and set up an Arm-based Axion c4a virtual machine on Google Cloud
- Build llama.cpp from source
- Download AFM-4.5B from Hugging Face
- Quantize AFM-4.5B with llama.cpp
- Deploy the model and run inference with llama.cpp
- Evaluate the quality of quantized models by measuring perplexity

prerequisites:
- A Google Cloud account with quota for c4a instances
- Basic familiarity with SSH

author: Julien Simon

### Tags
# Tagging metadata, see the Learning Path guide for the allowed values
skilllevels: Introductory
subjects: ML
arm_ips:
- Neoverse

tools_software_languages:
- Google Cloud
- Linux
- Python
- Llama.cpp

operatingsystems:
- Linux

further_reading:
- resource:
    title: Arcee AI
    link: https://www.arcee.ai
    type: Website
- resource:
    title: Announcing Arcee Foundation Models
    link: https://www.arcee.ai/blog/announcing-the-arcee-foundation-model-family
    type: Blog
- resource:
    title: AFM-4.5B, the First Arcee Foundation Model
    link: https://www.arcee.ai/blog/deep-dive-afm-4-5b-the-first-arcee-foundational-model
    type: Blog
- resource:
    title: Google Cloud c4a Instances
    link: https://cloud.google.com/blog/products/compute/try-c4a-the-first-google-axion-processor
    type: Documentation
- resource:
    title: Google Cloud Compute Engine
    link: https://cloud.google.com/compute/docs
    type: Documentation

### FIXED, DO NOT MODIFY
# ================================================================================
weight: 1 # _index.md always has weight of 1 to order correctly
layout: "learningpathall" # All files under learning paths have this same wrapper
learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
---
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
---
# ================================================================================
# FIXED, DO NOT MODIFY THIS FILE
# ================================================================================
weight: 21 # Set to always be larger than the content in this path to be at the end of the navigation.
title: "Next Steps" # Always the same, html page title.
layout: "learningpathall" # All files under learning paths have this same wrapper for Hugo processing.
---
