AI Research (PhD) → ML / AI Engineering → Cloud Engineering + AIOps
(AWS · Terraform · Observability · FinOps)
Started in academia studying how systems fail: modeling SSD and HDD reliability at HPC scale, published at ARCS 2024, funded by the EU Horizon 2020 IO-SEA project. That work pushed me toward engineering resilient systems at scale.
From there I shipped AI/ML systems to production, then completed the Ironhack Cloud Engineering Bootcamp to own the layer underneath them: AWS infrastructure-as-code, CI/CD, observability, security and FinOps. Today I build AI-powered cloud platforms, where AIOps agents don't just watch infrastructure, they act on it under human-in-the-loop supervision.
The goal: a Cloud Engineer who speaks fluent AWS and fluent AI, owning the full path from research to production infrastructure.
Cloud & Infrastructure
AI & Machine Learning
Languages, Data & Tooling
AIOps Observability Platform · Cloud Engineering Capstone
A production AWS platform whose core value is an intelligent operations layer: CrewAI agents on Amazon Bedrock that analyze live infrastructure and take supervised actions on it under a 3-tier human-in-the-loop trust model (analysis · autonomous non-prod · approval-gated). Built on full observability (structured logging, custom CloudWatch metrics, Golden-Signal dashboards, tiered alerting), edge security (CloudFront + WAF + GuardDuty + Config/CloudTrail), an Auth0-authenticated operator console, and FinOps cost controls. All of it is Terraform IaC, shipped through GitHub Actions (OIDC, no static keys). Deployed and verified in us-east-1 across 3 AZs.
Highlight: a live operator console with Approve / Reject buttons, the human-in-the-loop step demoed end-to-end. ~$145/mo prod run-rate with deliberate FinOps trade-offs. 🖥️ Live console
Bedrock CrewAI ECS Fargate RDS CloudFront WAF CloudWatch Lambda Auth0 Terraform GitHub Actions
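The 3-tier trust model described above can be illustrated with a small decision function. This is a minimal sketch, not the platform's actual code: the names `Tier`, `Decision`, `ProposedAction` and `decide` are hypothetical, and the rule that an "autonomous" action aimed at prod falls back to approval gating is an assumption made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Tier(Enum):
    ANALYSIS = "analysis"                 # read-only: the agent reports, never acts
    AUTONOMOUS_NON_PROD = "autonomous"    # the agent may act alone, outside prod only
    APPROVAL_GATED = "approval_gated"     # the agent acts only after a human approves


class Decision(Enum):
    REPORT_ONLY = "report_only"
    EXECUTE = "execute"
    AWAIT_APPROVAL = "await_approval"
    DENY = "deny"


@dataclass(frozen=True)
class ProposedAction:
    name: str          # hypothetical action label, e.g. "restart_service"
    environment: str   # hypothetical environment label: "prod", "staging", ...
    tier: Tier


def decide(action: ProposedAction, approved: Optional[bool] = None) -> Decision:
    """Map a proposed agent action to what the platform should do with it."""
    if action.tier is Tier.ANALYSIS:
        return Decision.REPORT_ONLY
    if action.tier is Tier.AUTONOMOUS_NON_PROD and action.environment != "prod":
        return Decision.EXECUTE
    # Assumption: anything else (approval-gated, or autonomous aimed at prod)
    # needs an explicit human verdict from the operator console.
    if approved is None:
        return Decision.AWAIT_APPROVAL
    return Decision.EXECUTE if approved else Decision.DENY
```

In this sketch, the Approve / Reject buttons of an operator console would supply the `approved` argument; until a human responds, the action stays in `AWAIT_APPROVAL`.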
Production-grade AWS 3-tier architecture: internet-facing ALB → 6 Node.js EC2 instances across 2 AZs with Auto Scaling → isolated data tier. Custom VPC with network segmentation, security-group chaining, and CloudWatch monitoring.
Highlight: fully automated scaling and high availability across multiple availability zones.
AWS VPC EC2 Auto Scaling ALB CloudWatch Terraform
The observability foundation the capstone grew from. An order-processing API (FastAPI · ECS Fargate · RDS) with structured JSON logging, 8 custom CloudWatch metrics, Golden-Signal dashboards, tiered SNS alerting, Lambda auto-remediation, FinOps cost monitoring, and AI-powered incident analysis via CrewAI. All of it is Terraform IaC, shipped through GitHub Actions.
Highlight: three injected failure scenarios (error flood, high latency, CPU spike), each diagnosed from the CloudWatch correlation view and remediated automatically by Lambda, closing the alert loop with no human in it.
FastAPI ECS Fargate RDS CloudWatch Lambda SNS Terraform CrewAI
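For readers unfamiliar with structured JSON logging, the idea can be sketched with the Python standard library alone. This is an illustrative example, not the project's implementation: `JsonFormatter`, `build_logger`, the `context` key and the field names (`order_id`, `latency_ms`) are all hypothetical.

```python
import io
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Hypothetical convention: callers pass extra={"context": {...}} and
        # those key/value pairs are merged into the top-level JSON object.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload)


def build_logger(stream, name: str = "orders") -> logging.Logger:
    """Return a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()      # avoid duplicate handlers on repeated calls
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


# Usage: write one entry to an in-memory buffer and parse it back.
buffer = io.StringIO()
log = build_logger(buffer)
log.info("order processed", extra={"context": {"order_id": "demo-123", "latency_ms": 42}})
entry = json.loads(buffer.getvalue())
```

Because every line is a self-describing JSON object, fields such as `latency_ms` can be filtered and aggregated by a log backend without regex parsing.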
ML-driven reliability analysis of SSD and HDD failure in HPC burst buffers. Uses SMART telemetry from ~1M Alibaba SSDs and Backblaze HDDs to predict Mean Time to Failure with Random Forest and LSTM models.
Highlight: 94% prediction accuracy, published at ARCS 2024, funded by EU Horizon 2020 IO-SEA.
Python MongoDB scikit-learn XGBoost LSTM
🧪 Plus 30+ hands-on labs from the Ironhack Cloud Engineering Bootcamp: Terraform & GitOps workflows, FinOps cost optimization (EC2 scheduling −67%, VPC endpoints −$297/yr), and security & compliance (CIS, WAF, Config, Vault). Browse the ce-repos →
- Completed the Ironhack Cloud Engineering Bootcamp (capstone: the AIOps Observability Platform above)
- Preparing for the AWS Certified Solutions Architect – Associate and HashiCorp Terraform Associate certifications
- Building production-ready AI services on AWS: containerized, observable, infrastructure-as-code
- Going deeper on AIOps, FinOps and MLOps: agentic operations, cost optimization, and model serving on cloud
Open to Cloud Engineer · Cloud Ops · DevOps · AIOps roles, including internships
Last updated June 2026 · View all repositories →


