
Commit 7cae6e4

A couple of clarifications
1 parent eebf70c commit 7cae6e4

File tree

2 files changed (+7, −4 lines)


content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/03_building_llama_cpp.md

Lines changed: 2 additions & 0 deletions

@@ -8,6 +8,8 @@ layout: learningpathall
 
 In this step, we'll build Llama.cpp from source. Llama.cpp is a high-performance C++ implementation of the LLaMA model that's optimized for inference on various hardware platforms, including ARM-based processors like Graviton4.
 
+Even though AFM-4.5B has a custom model architecture, we're able to use the vanilla version of llama.cpp as the Arcee AI team has contributed the appropriate modeling code.
+
 Here are all the steps.
 
 ## Step 1: Clone the Repository

content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/_index.md

Lines changed: 5 additions & 4 deletions

@@ -6,11 +6,12 @@ minutes_to_complete: 30
 who_is_this_for: This is an introductory topic for developers and engineers who want to deploy the Arcee AFM-4.5B small language model on an AWS Arm-based instance. AFM-4.5B is a 4.5-billion-parameter frontier model that delivers excellent accuracy, strict compliance, and very high cost-efficiency. It was trained on almost 7 trillion tokens of clean, rigorously filtered data, and has been tested across a wide range of languages, including Arabic, English, French, German, Hindi, Italian, Korean, Mandarin, Portuguese, Russian, and Spanish
 
 learning_objectives:
-- Deploy an Arm-based Graviton4 virtual machine on Amazon Web Services
-- Connect to the virtual machine using SSH
-- Download the AFM-4.5B model from Hugging Face
-- Quantize the model with llama.cpp
+- Launch and set up an Arm-based Graviton4 virtual machine on Amazon Web Services
+- Build llama.cpp from source
+- Download AFM-4.5B from Hugging Face
+- Quantize AFM-4.5B with llama.cpp
 - Deploy the model and run inference with llama.cpp
+- Evaluate the quality of quantized models by measuring perplexity
 
 prerequisites:
 - An Amazon Web Services account, with quota for c8g instances
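
The new learning objective about measuring perplexity can be illustrated with a short sketch (not part of the commit). Perplexity is the exponential of the average negative log-likelihood per token, so lower is better; a model that assigns probability 0.25 to every token has perplexity 4:

```python
import math

def perplexity(token_logprobs):
    """Perplexity from per-token natural-log probabilities:
    exp(-(1/N) * sum(log p_i))."""
    n = len(token_logprobs)
    return math.exp(-sum(token_logprobs) / n)

# Uniform probability 0.25 per token -> perplexity 4.0
uniform = [math.log(0.25)] * 100
print(perplexity(uniform))  # → 4.0 (within floating-point error)
```

In practice, the perplexity tool that ships with llama.cpp computes this metric over a text corpus; comparing the scores of the full-precision and quantized GGUF files gives a concrete measure of the quality loss introduced by quantization.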
