Senior infrastructure/platform engineer focused on AWS, Kubernetes, Terraform, CI/CD, observability, and production reliability.

I work on cloud infrastructure, deployment automation, distributed worker systems, and operational reliability for production workloads. My strongest experience is in AWS (EKS and ECS), Terraform-managed environments, CI/CD pipelines, Kubernetes operations, observability, and production incident recovery.

- AWS infrastructure: ECS, EKS, EC2, IAM, S3, SQS, VPC, ALB, Route53, CloudWatch
- Kubernetes operations, Helm, IAM/auth troubleshooting, cluster access, and deployment workflows
- Terraform-managed multi-environment infrastructure
- CI/CD automation with GitHub Actions and Azure Pipelines
- Observability with OpenTelemetry, Prometheus, and Grafana
- Distributed task processing with RabbitMQ, Celery, Redis, PostgreSQL, and worker-based systems
- Production troubleshooting, incident response, and recovery-oriented engineering

- Re-architected Celery/Redis queue-processing workflows toward RabbitMQ-backed durable task processing and safer recovery behavior.
- Led the migration from ECS Fargate to ECS on EC2, reducing deployment times by ~40% and improving deployment control.
- Implemented Kubernetes observability tooling using OpenTelemetry, Prometheus, and Grafana.
- Helped recover production database systems after an accidental deletion by coordinating the discovery and restoration of Azure snapshots and backups.
- Operated distributed ML competition infrastructure supporting high-volume submissions and long-running worker workloads.

I like the unglamorous parts of engineering: deployments that can be trusted, queues that recover cleanly, dashboards that answer real questions, and infrastructure that another engineer can safely operate at 2 AM.

Senior Platform Engineer · Infrastructure Engineer · Site Reliability Engineer · Cloud Infrastructure Engineer · Senior DevOps Engineer
Email: mr.tyler.thomas@gmail.com