@@ -361,6 +361,39 @@ for obtaining these keys.
361361☝️ **Note** The same credentials can also be used for
362362[configuring cloud storage](/doc/cml-with-dvc#cloud-storage-provider-credentials).
363363
364+ The CML runner needs at least the following IAM permissions to
365+ deploy on EC2:
366+
367+ - `ec2:CreateSecurityGroup` -- _(Firewall and SSH Access Management)_
368+ - `ec2:AuthorizeSecurityGroupEgress`
369+ - `ec2:AuthorizeSecurityGroupIngress`
370+ - `ec2:DescribeSecurityGroups`
371+ - `ec2:DescribeSubnets`
372+ - `ec2:DescribeVpcs`
373+ - `ec2:ImportKeyPair`
374+ - `ec2:DeleteKeyPair`
375+ - `ec2:CreateTags` -- _(General Resource Management)_
376+ - `ec2:RunInstances` -- _(EC2 Instance Management)_
377+ - `ec2:DescribeImages`
378+ - `ec2:DescribeInstances`
379+ - `ec2:TerminateInstances`
380+ - `ec2:DescribeSpotInstanceRequests` -- _(Optionally needed for Spot Access)_
381+ - `ec2:RequestSpotInstances`
382+ - `ec2:CancelSpotInstanceRequests`
383+
384+ Beyond this list, you will need to add any extra permissions that your
385+ workflow requires. These extra permissions can either be added directly to
386+ the account used by `cml runner` or be specified when invoking
387+ `cml runner` with the following option:
388+ [`--cloud-permission-set`](https://cml.dev/doc/ref/runner#--cloud-permission-set)
389+
390+ For example, if you need to read and write data on S3, you may want to add:
391+
392+ - `s3:ListBucket`
393+ - `s3:PutObject`
394+ - `s3:GetObject`
395+ - `s3:DeleteObject`
396+
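The two permission lists above could be combined into a single IAM policy document. The following is only a sketch under stated assumptions: the file name `cml-runner-policy.json`, the policy name `cml-runner-minimal`, the bucket name `my-example-bucket`, and the `"Resource": "*"` scope for the EC2 statement are illustrative placeholders and not values prescribed by CML.

```shell
#!/usr/bin/env bash
# Sketch: write an IAM policy document with the permissions listed above.
# File name, policy name, bucket name and resource scopes are hypothetical.
set -euo pipefail

POLICY_FILE="cml-runner-policy.json"

cat > "$POLICY_FILE" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CmlRunnerEc2",
      "Effect": "Allow",
      "Action": [
        "ec2:CreateSecurityGroup",
        "ec2:AuthorizeSecurityGroupEgress",
        "ec2:AuthorizeSecurityGroupIngress",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeSubnets",
        "ec2:DescribeVpcs",
        "ec2:ImportKeyPair",
        "ec2:DeleteKeyPair",
        "ec2:CreateTags",
        "ec2:RunInstances",
        "ec2:DescribeImages",
        "ec2:DescribeInstances",
        "ec2:TerminateInstances",
        "ec2:DescribeSpotInstanceRequests",
        "ec2:RequestSpotInstances",
        "ec2:CancelSpotInstanceRequests"
      ],
      "Resource": "*"
    },
    {
      "Sid": "CmlRunnerS3",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::my-example-bucket",
        "arn:aws:s3:::my-example-bucket/*"
      ]
    }
  ]
}
EOF

echo "Wrote $POLICY_FILE"

# To register the policy (requires the AWS CLI and sufficient IAM rights):
# aws iam create-policy --policy-name cml-runner-minimal \
#   --policy-document "file://$POLICY_FILE"
```

The three Spot-related actions can be dropped from the first statement if you do not use Spot instances, and the second statement can be omitted entirely if your workflow does not touch S3.
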
364397</tab>
365398<tab title="Azure">
366399
@@ -391,6 +424,50 @@ provisioned through environment variables instead of files.
391424</tab>
392425</toggle>
393426
427+ #### Cloud Compute Resource Manual Cleanup
428+
429+ In very rare cases, you may need to clean up CML cloud resources manually.
430+ An example of such a problem can be seen
431+ [when an EC2 instance ran out of storage space](https://github.com/iterative/cml/issues/1006).
432+
433+ The following is a list of all the resources you may need to
434+ clean up manually in the case of a failure:
435+
436+ - The running instance (named with pattern `cml-{random-id}`)
437+ - The volume attached to the running instance
438+ (this should delete itself after terminating the instance)
439+ - The generated key-pair (named with pattern `cml-{random-id}`)
440+
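On AWS, the resources listed above could be located and removed with the AWS CLI. This is a hedged sketch, not an official CML procedure: it assumes the instance name is stored in the `Name` tag, that the key pair name follows the same `cml-*` pattern, and that `INSTANCE_ID` and `KEY_NAME` are placeholders you replace with values from the listing commands. By default it only prints the commands (`DRY_RUN=1`).

```shell
#!/usr/bin/env bash
# Sketch: find and remove leftover CML resources on AWS.
# INSTANCE_ID and KEY_NAME are hypothetical placeholders.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
INSTANCE_ID="${INSTANCE_ID:-i-0123456789abcdef0}"
KEY_NAME="${KEY_NAME:-cml-example}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# List running instances whose Name tag matches the CML pattern
# (assumption: the instance name is kept in the "Name" tag).
run aws ec2 describe-instances \
  --filters "Name=tag:Name,Values=cml-*" "Name=instance-state-name,Values=running" \
  --query "Reservations[].Instances[].InstanceId" --output text

# List key pairs whose name matches the CML pattern.
run aws ec2 describe-key-pairs \
  --filters "Name=key-name,Values=cml-*" \
  --query "KeyPairs[].KeyName" --output text

# Terminate the instance; its attached volume should delete itself afterwards.
run aws ec2 terminate-instances --instance-ids "$INSTANCE_ID"

# Delete the generated key pair.
run aws ec2 delete-key-pair --key-name "$KEY_NAME"
```

Run it once as-is to review the printed commands, then set `DRY_RUN=0` together with the real `INSTANCE_ID` and `KEY_NAME` to execute them.
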
441+ If you keep encountering issues, please try to pull the logs from the running
442+ instance before terminating it, and include them when opening a GitHub issue.
443+
444+ For easy SSH access and debugging on the `cml runner` instance, add:
445+
446+ > `--cloud-startup-script=$(echo 'echo "$(curl https://github.com/'"$GITHUB_ACTOR"'.keys)" >> /home/ubuntu/.ssh/authorized_keys' | base64 -w 0)`
447+
448+ If you encounter an error with the `cml runner` instance, retrieving logs
449+ with the following commands is helpful for diagnosing the issue:
450+
451+ ☝️ **Note** Please give your `cml.log` a visual scan before sharing it: entries
452+ like IP addresses and Git repository names may be present and can be sensitive.
453+
454+ ```bash
455+ ssh ubuntu@instance_public_ip
456+ sudo journalctl -n all -u cml.service --no-pager > cml.log
457+ sudo dmesg --ctime > system.log
458+ ```
459+
460+ You can then copy those logs to your local machine with:
461+
462+ ```bash
463+ scp ubuntu@instance_public_ip:~/cml.log .
464+ scp ubuntu@instance_public_ip:~/system.log .
465+ ```
466+
467+ If the SSH command hangs, the instance may be severely broken. If that
468+ happens, reboot it from the web console and try the commands
469+ again.
470+
394471#### On-premise (Local) Runners
395472
396473The `cml runner` command can also be used to manually set up a local machine,