
Commit d123006

New learning path: Deploy Arcee AFM-4.5B on Google Axion
1 parent 7cae6e4 commit d123006

6 files changed: +389 -0 lines changed
Lines changed: 102 additions & 0 deletions
@@ -0,0 +1,102 @@
---
title: Launching an Axion c4a instance
weight: 2

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## System Requirements

- A Google Cloud account with billing enabled
- Quota for c4a instances in your preferred region
- A Linux or macOS host
- A c4a-standard-4 or larger instance (this Learning Path uses c4a-standard-32)
- At least 128 GB of storage

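If you already have the gcloud CLI installed on your host, you can check which zones offer the machine type used in this Learning Path. This is an optional, hedged example:

```bash
# List the zones where the c4a-standard-32 machine type is offered
gcloud compute machine-types list --filter="name=c4a-standard-32" --format="value(zone)"
```
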
## Google Cloud Console Steps

Follow these steps to launch your Compute Engine instance using the Google Cloud Console:

### Step 1: Launch Compute Engine Instance

1. **Navigate to Google Cloud Console**

   - Go to the [Google Cloud Console](https://console.cloud.google.com)
   - Make sure you're in the correct project
   - In the left navigation menu, click "Compute Engine" > "VM instances"

2. **Create Instance**

   Click the "CREATE INSTANCE" button

3. **Configure Instance Details**

   - **Name**: Enter `arcee-axion-instance`
   - **Region**: Select a region where c4a instances are available (e.g., us-central1, us-east1, europe-west1)
   - **Zone**: Select any zone in the chosen region
   - **Machine family**: Select "General purpose"
   - **Series**: Select "C4A"
   - **Machine type**: Select `c4a-standard-32` or larger
     - This provides 32 vCPUs and 128 GB memory

4. **Configure OS and Storage**

   In the left menu, click on "OS and storage"

   - Click "Change"
   - **Size (GB)**: Set to `128`
   - Click "Select"

5. **Configure Networking**

   In the left menu, click on "Networking"

   - **Important**: We'll configure SSH access through IAP (Identity-Aware Proxy) for security

6. **Create Instance**

   - Review all settings
   - Click "Create" at the bottom of the screen

### Step 2: Connect to Your Instance

After a minute or so, the instance should be available.

- In the VM instances list, locate the instance name (`arcee-axion-instance`) and click on "SSH"
- This opens a browser-based SSH terminal. You may need to accept a security prompt
- No additional configuration is needed
- You should now be connected to your Ubuntu instance

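If you prefer the command line, the console workflow above can be approximated with the gcloud CLI. This is a sketch, not a verified recipe: the zone, image family, and boot disk type below are assumptions to adapt to your project.

```bash
# Sketch only: adjust the zone, image, and disk type to your environment
gcloud compute instances create arcee-axion-instance \
  --zone=us-central1-a \
  --machine-type=c4a-standard-32 \
  --image-family=ubuntu-2404-lts-arm64 \
  --image-project=ubuntu-os-cloud \
  --boot-disk-size=128GB \
  --boot-disk-type=hyperdisk-balanced

# Connect through IAP rather than a public IP (see the Security note below)
gcloud compute ssh arcee-axion-instance --zone=us-central1-a --tunnel-through-iap
```
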
### Important Notes

- **Region Selection**: Ensure you're in a region where c4a instances are available
- **Quota**: Make sure you have sufficient quota for c4a instances in your selected region
- **Security**: The browser-based SSH connection is more secure than exposing a public SSH port, as it uses Google's Identity-Aware Proxy
- **Storage**: The 128 GB boot disk is sufficient for the Arcee model and dependencies
- **Cost**: Monitor your usage in the Google Cloud Console billing section
- **Backup**: Consider creating snapshots for backup purposes, as shown below
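
For example, a one-off snapshot of the boot disk can be created with the gcloud CLI. The zone and snapshot name here are placeholders; by default, the boot disk shares the instance name:

```bash
# Snapshot the boot disk (named after the instance by default)
gcloud compute disks snapshot arcee-axion-instance \
  --zone=us-central1-a \
  --snapshot-names=arcee-axion-backup
```
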
Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
---
title: Setting up the instance
weight: 3

### FIXED, DO NOT MODIFY
layout: learningpathall
---

In this step, we'll set up the Axion c4a instance with all the necessary tools and dependencies required to build and run the Arcee Foundation Model. This includes installing the build tools and Python environment.

## Step 1: Update Package List

```bash
sudo apt-get update
```

This command updates the local package index from the repositories:

- Downloads the latest package lists from all configured APT repositories
- Ensures you have the most recent information about available packages and their versions
- This is a best practice before installing new packages to avoid potential conflicts
- The package index contains metadata about available packages, their dependencies, and version information

## Step 2: Install System Dependencies

```bash
sudo apt-get install cmake gcc g++ git python3 python3-pip python3-virtualenv libcurl4-openssl-dev unzip -y
```

This command installs all the essential development tools and dependencies:

- **cmake**: Cross-platform build system generator that we'll use to compile Llama.cpp
- **gcc & g++**: GNU C and C++ compilers for building native code
- **git**: Version control system for cloning repositories
- **python3**: Python interpreter for running Python-based tools and scripts
- **python3-pip**: Python package installer for managing Python dependencies
- **python3-virtualenv**: Tool for creating isolated Python environments
- **libcurl4-openssl-dev**: Development files for libcurl, the client-side URL transfer library
- **unzip**: Utility for extracting ZIP archives

The `-y` flag automatically answers "yes" to prompts, making the installation non-interactive.
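
As a quick sanity check, you can confirm that the main tools are on the PATH and print their versions:

```bash
cmake --version
gcc --version
git --version
python3 --version
```
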
## What's Ready Now

After completing these steps, your Axion c4a instance will have:

- A complete C/C++ development environment for building Llama.cpp
- Python 3 with pip for managing Python packages
- Git for cloning repositories
- All necessary build tools for compiling optimized ARM64 binaries

The system is now prepared for the next steps: building Llama.cpp and downloading the Arcee Foundation Model.
Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
---
title: Building Llama.cpp
weight: 4

### FIXED, DO NOT MODIFY
layout: learningpathall
---

In this step, we'll build Llama.cpp from source. Llama.cpp is a high-performance C++ inference engine, originally created for Meta's LLaMA models, that's optimized for a wide range of hardware platforms, including Arm-based processors like Google Axion.

Even though AFM-4.5B has a custom model architecture, we're able to use the vanilla version of llama.cpp, as the Arcee AI team has contributed the appropriate modeling code.

Here are all the steps.

## Step 1: Clone the Repository

```bash
git clone https://github.com/ggerganov/llama.cpp
```

This command clones the Llama.cpp repository from GitHub to your local machine. The repository contains the source code, build scripts, and documentation needed to compile the inference engine.

## Step 2: Navigate to the Project Directory

```bash
cd llama.cpp
```

Change into the llama.cpp directory where we'll perform the build process. This directory contains the CMakeLists.txt file and source code structure.

## Step 3: Configure the Build with CMake

```bash
cmake -B .
```

This command uses CMake to configure the build system:
- `-B .` specifies that the build files should be generated in the current directory
- CMake will detect your system's compiler, libraries, and hardware capabilities
- It will generate the appropriate build files (Makefiles on Linux) based on your system configuration

Note: The cmake output should include the information below, indicating that the build process will leverage the Neoverse V2 architecture's specialized instruction sets designed for AI/ML workloads. These optimizations are crucial for achieving optimal performance on Axion:

```bash
-- ARM feature DOTPROD enabled
-- ARM feature SVE enabled
-- ARM feature MATMUL_INT8 enabled
-- ARM feature FMA enabled
-- ARM feature FP16_VECTOR_ARITHMETIC enabled
-- Adding CPU backend variant ggml-cpu: -mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+dotprod+i8mm+sve
```

- **DOTPROD: Dot Product** - Hardware-accelerated dot product operations for neural network computations
- **SVE: Scalable Vector Extension** - Advanced vector processing capabilities that can handle variable-length vectors up to 2048 bits, providing significant performance improvements for matrix operations
- **MATMUL_INT8: Matrix multiplication units** - Dedicated hardware for efficient matrix operations common in transformer models, accelerating the core computations of large language models
- **FMA: Fused Multiply-Add** - Optimized floating-point operations that combine multiplication and addition in a single instruction
- **FP16_VECTOR_ARITHMETIC: FP16 Vector Arithmetic** - Hardware support for 16-bit floating-point vector operations, reducing memory usage while maintaining good numerical precision

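Optionally, you can cross-check these features against what the kernel reports. On Arm Linux, the `Features` line in `/proc/cpuinfo` lists flags such as `asimddp` (dot product), `sve`, and `i8mm` (INT8 matrix multiply):

```bash
# Print the CPU feature flags for the first core
grep -m1 Features /proc/cpuinfo
```
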
## Step 4: Compile the Project

```bash
cmake --build . --config Release -j16
```

This command compiles the Llama.cpp project:
- `--build .` tells CMake to build the project using the files in the current directory
- `--config Release` specifies a Release build configuration, which enables optimizations and removes debug symbols
- `-j16` runs the build with 16 parallel jobs, which speeds up compilation on multi-core systems like Axion

The build process will compile the C++ source code into executable binaries optimized for your ARM64 architecture. This should only take a minute.
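
When the build completes, a quick way to confirm the binaries were produced and run is to list the output directory and print the version string (paths are relative to the llama.cpp directory):

```bash
ls bin/
bin/llama-cli --version
```
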

## What Gets Built

After successful compilation, you'll have several key command-line executables in the `bin` directory:

- `llama-cli` - The main inference executable for running LLaMA models
- `llama-server` - A web server for serving model inference over HTTP
- `llama-quantize` - A tool for model quantization to reduce memory usage
- Various utility programs for model conversion and optimization

You can find more information in the llama.cpp [GitHub repository](https://github.com/ggml-org/llama.cpp/tree/master/tools).

These binaries are specifically optimized for ARM64 architecture and will provide excellent performance on your Google Axion instance.
Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
---
title: Installing Python dependencies for llama.cpp
weight: 5

### FIXED, DO NOT MODIFY
layout: learningpathall
---

In this step, we'll set up a Python virtual environment and install the required dependencies for working with Llama.cpp. This ensures we have a clean, isolated Python environment with all the necessary packages for model optimization.

Here are all the steps.

## Step 1: Create a Python Virtual Environment

```bash
virtualenv env-llama-cpp
```

This command creates a new Python virtual environment named `env-llama-cpp`:
- Virtual environments provide isolated Python environments that prevent conflicts between different projects
- The `env-llama-cpp` directory will contain its own Python interpreter and package installation space
- This isolation ensures that the Llama.cpp dependencies won't interfere with other Python projects on your system
- Virtual environments are essential for reproducible development environments

## Step 2: Activate the Virtual Environment

```bash
source env-llama-cpp/bin/activate
```

This command activates the virtual environment:
- The `source` command executes the activation script, which modifies your current shell environment
- Depending on your shell, your command prompt may change to show `(env-llama-cpp)` at the beginning, indicating the active environment. We will reflect this in the following commands.
- All subsequent `pip` commands will install packages into this isolated environment
- The `PATH` environment variable is updated to prioritize the virtual environment's Python interpreter

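To confirm the activation worked, check which interpreter is now first on the `PATH`; it should point inside the environment:

```bash
(env-llama-cpp) which python3
# Expected: a path ending in env-llama-cpp/bin/python3
```
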
## Step 3: Upgrade pip to the Latest Version

```bash
(env-llama-cpp) pip install --upgrade pip
```

This command ensures you have the latest version of pip:
- Upgrading pip helps avoid compatibility issues with newer packages
- The `--upgrade` flag tells pip to install the newest available version
- This is a best practice before installing project dependencies
- Newer pip versions often include security fixes and improved package resolution

## Step 4: Install Project Dependencies

```bash
(env-llama-cpp) pip install -r requirements.txt
```

This command installs all the Python packages specified in the requirements.txt file:
- The `-r` flag tells pip to read the package list from the specified file
- `requirements.txt` contains a list of Python packages and their version specifications
- This ensures everyone working on the project uses the same package versions
- The installation will include the packages needed by llama.cpp's Python tooling, such as the model conversion scripts

## What Gets Installed

After successful installation, your virtual environment will contain:

- **NumPy**: For numerical computations and array operations
- **Requests**: For HTTP operations and API calls
- **Other dependencies**: Specific packages needed for Llama.cpp Python integration

The virtual environment is now ready for running Python scripts that interact with the compiled Llama.cpp binaries. Remember to always activate the virtual environment (`source env-llama-cpp/bin/activate`) before running any Python code related to this project.
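
As a final sanity check — assuming NumPy and Requests are among the installed packages, as listed above — you can import them from within the environment:

```bash
(env-llama-cpp) python3 -c "import numpy, requests; print(numpy.__version__, requests.__version__)"
```
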
Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
---
title: Deploy Arcee AFM-4.5B on Google Axion

minutes_to_complete: 30

who_is_this_for: This is an introductory topic for developers and engineers who want to deploy the Arcee AFM-4.5B small language model on a Google Cloud Axion c4a instance. AFM-4.5B is a 4.5-billion-parameter frontier model that delivers excellent accuracy, strict compliance, and very high cost-efficiency. It was trained on almost 7 trillion tokens of clean, rigorously filtered data, and has been tested across a wide range of languages, including Arabic, English, French, German, Hindi, Italian, Korean, Mandarin, Portuguese, Russian, and Spanish.

learning_objectives:
- Launch and set up an Arm-based Axion c4a virtual machine on Google Cloud
- Build llama.cpp from source
- Download AFM-4.5B from Hugging Face
- Quantize AFM-4.5B with llama.cpp
- Deploy the model and run inference with llama.cpp
- Evaluate the quality of quantized models by measuring perplexity

prerequisites:
- A Google Cloud account with quota for c4a instances
- Basic familiarity with SSH

author: Julien Simon

### Tags
# Tagging metadata, see the Learning Path guide for the allowed values
skilllevels: Introductory
subjects: ML
arm_ips:
- Neoverse

tools_software_languages:
- Google Cloud
- Linux
- Python
- Llama.cpp

operatingsystems:
- Linux

further_reading:
- resource:
    title: Arcee AI
    link: https://www.arcee.ai
    type: Website
- resource:
    title: Announcing Arcee Foundation Models
    link: https://www.arcee.ai/blog/announcing-the-arcee-foundation-model-family
    type: Blog
- resource:
    title: AFM-4.5B, the First Arcee Foundation Model
    link: https://www.arcee.ai/blog/deep-dive-afm-4-5b-the-first-arcee-foundational-model
    type: Blog
- resource:
    title: Google Cloud c4a Instances
    link: https://cloud.google.com/blog/products/compute/try-c4a-the-first-google-axion-processor
    type: Documentation
- resource:
    title: Google Cloud Compute Engine
    link: https://cloud.google.com/compute/docs
    type: Documentation

### FIXED, DO NOT MODIFY
# ================================================================================
weight: 1 # _index.md always has weight of 1 to order correctly
layout: "learningpathall" # All files under learning paths have this same wrapper
learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
---
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
---
# ================================================================================
# FIXED, DO NOT MODIFY THIS FILE
# ================================================================================
weight: 21 # Set to always be larger than the content in this path to be at the end of the navigation.
title: "Next Steps" # Always the same, html page title.
layout: "learningpathall" # All files under learning paths have this same wrapper for Hugo processing.
---
