104 changes: 104 additions & 0 deletions pod.yaml
@@ -0,0 +1,104 @@
kind: Pod
apiVersion: v1
metadata:
  # this is a sample pod name.
  name: <user>-dev-e2e-1x-aiu

Please add a comment saying: # this is a sample pod name.

spec:
  restartPolicy: Always
  serviceAccountName: default
  imagePullSecrets:
    - name: <user>-secret
  priority: 0
  schedulerName: aiu-scheduler
  enableServiceLinks: true
  containers:
    - resources:
        limits:
          ibm.com/aiu_pf: '1'
        requests:
          ibm.com/aiu_pf: '1'
      terminationMessagePath: /dev/termination-log
      # Sample container name. Substitute with your own name.
      name: <user>-dev-e2e-1x-aiu

Add comment: # Sample container name. Substitute with your own name.

      command:
        - bash
        - '-c'
      env:
        - name: FLEX_COMPUTE
          value: SENTIENT
        - name: FLEX_DEVICE
          value: PF
        - name: FLEX_OVERWRITE_NMB_FRAME
          value: '1'
        - name: FLEX_UNLINK_DEVMEM
          value: 'false'
        - name: PYTHONUNBUFFERED
          value: '1'
        - name: HOME
          value: /home/senuser
        - name: HF_HUB_OFFLINE
          value: '1'
        # This can be changed to match your local home path.
        - name: HF_HOME
          value: /home/senuser/models/huggingface_cache

Add this comment: # This can be changed to match your local home path.

Since this will be exposed externally, we should state that we are only providing a sample YAML. Users can adjust the fields based on their AIU image and cluster environment.
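
One way to act on this is a header comment at the top of `pod.yaml`. The wording below is only a suggestion, and the list of fields to adjust is an assumption drawn from the placeholders and cluster-specific values visible in this diff.

```yaml
# Sample pod spec for an AIU development pod.
# This file is an example only. Adjust the fields to match your AIU image
# and cluster environment, for instance the pod and container names, the
# image pull secret, the scheduler name, the image, and the HF_* cache paths.
kind: Pod
apiVersion: v1
metadata:
  # this is a sample pod name.
  name: <user>-dev-e2e-1x-aiu
```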

        - name: HF_HUB_CACHE
          value: /home/senuser/models/huggingface_cache/hub
        - name: DTLOG_LEVEL
          value: error
        - name: TORCH_SENDNN_LOG
          value: CRITICAL
        - name: DT_DEEPRT_VERBOSE
          value: '-1'
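        # Same value as `image` below; a YAML alias cannot be used before its anchor is defined.
        - name: POD_IMAGE
          value: icr.io/ibmaiu_internal/x86_64/dd2/e2e_stable:latest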

I’ve never seen the POD_IMAGE and FMS_CHECKOUT env vars before. Where do they get used in the stack? Thanks.
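
Within this diff, both variables are read only by the startup script passed in `args`: `FMS_CHECKOUT` selects the git ref checked out from the foundation-model-stack clone, and `POD_IMAGE` is written to `/tmp/aiu-query-devices.txt`. The excerpt below pulls those lines together. `<image-reference>` is a placeholder, not a value from the file, and whether anything else in the stack reads these variables is not visible here.

```yaml
# Excerpt of the container spec, reduced to the lines that define and
# consume the two variables. All other fields are omitted.
env:
  - name: POD_IMAGE
    # Placeholder: in the file this is set to the same value as `image`.
    value: <image-reference>
  - name: FMS_CHECKOUT
    value: v1.1.0
args:
  - |
    cd foundation-model-stack
    # FMS_CHECKOUT picks the tag or branch of foundation-model-stack to use.
    git checkout $FMS_CHECKOUT
    # POD_IMAGE is recorded in the device report that ~/.bashrc prints.
    echo "POD_IMAGE:" $POD_IMAGE >> /tmp/aiu-query-devices.txt
```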

        - name: FMS_CHECKOUT
          value: v1.1.0
      securityContext:
        capabilities:
          drop:
            - ALL
        runAsUser: 1000810000
        runAsNonRoot: true
        allowPrivilegeEscalation: false
      imagePullPolicy: IfNotPresent
      volumeMounts:
        - name: dev-shm
          mountPath: /dev/shm
      terminationMessagePolicy: File
      # AIU software image
      image: &pod_image icr.io/ibmaiu_internal/x86_64/dd2/e2e_stable:latest

Add comment # AIU software image

This is an internal image. Can we remove this path, please?
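
If the internal path is removed, one option is a placeholder in the style of the `<user>` placeholders already used for the pod name and pull secret. The sketch below assumes a hypothetical placeholder, `<registry>/<aiu-image>:<tag>`. It also lists `image` before `env`, because a YAML alias such as `*pod_image` can only refer to an anchor that appears earlier in the file.

```yaml
containers:
  - name: <user>-dev-e2e-1x-aiu
    # AIU software image. Hypothetical placeholder; substitute your own image.
    image: &pod_image <registry>/<aiu-image>:<tag>
    env:
      - name: POD_IMAGE
        # Resolves to the image string anchored above.
        value: *pod_image
```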

      workingDir: /home/senuser
      args:
        - |
          source ~/.bashrc

To be in line with the README, we might not need all of these args.

          unset HF_HOME
          cd $HOME
          pip3 install -q -U transformers
          # Fetch foundation-model-stack at the ref given by FMS_CHECKOUT.
          git clone https://github.com/foundation-model-stack/foundation-model-stack.git
          cd foundation-model-stack
          git checkout $FMS_CHECKOUT
          # Copy the generated senlib config, then add the RISCV and SNT_MCI settings.
          cp ${AIU_AUTOGEN_SENLIB_CONFIG_FILE} /tmp/etc/aiu/senlib_config.json
          FILE=/tmp/etc/aiu/senlib_config.json
          jq '. += {"RISCV": {"DOOM": { "enable" : false}}, "SNT_MCI" : { "DCR": {"MCI_CTRL": {"ENABLE_RISCV": "0x0"} } }}' "$FILE" > "$FILE.jq"
          mv $FILE.jq $FILE
          cp /tmp/etc/aiu/senlib_config.json $HOME/.senlib.json
          # Record the image and device list, and print them in each new interactive shell.
          echo "POD_IMAGE:" $POD_IMAGE >> /tmp/aiu-query-devices.txt
          echo " " >> /tmp/aiu-query-devices.txt
          /opt/sentient/bin/aiu-query-devices >> /tmp/aiu-query-devices.txt
          echo " " >> ~/.bashrc
          echo "cat /tmp/aiu-query-devices.txt" >> ~/.bashrc
          echo 'FLEX_COMPUTE = ' $FLEX_COMPUTE
          echo 'FLEX_DEVICE = ' $FLEX_DEVICE
          echo 'DTLOG_LEVEL = ' $DTLOG_LEVEL
          echo 'TORCH_SENDNN_LOG = ' $TORCH_SENDNN_LOG
          echo 'DT_DEEPRT_VERBOSE = ' $DT_DEEPRT_VERBOSE
          echo 'INFER_SCRIPT = ' $INFER_SCRIPT
          echo 'MODEL = ' $MODEL
          # Keep the container running.
          tail -f /dev/null
  serviceAccount: default
  volumes:
    - name: dev-shm
      emptyDir:
        medium: Memory
        sizeLimit: 64Gi
  dnsPolicy: ClusterFirst