
Initiative-taking technical professional with 4+ years of experience in IT and MSP environments, specializing in DevOps and cloud-native operations on AWS. Highly skilled in containerization (K8s, Docker), IaC (Terraform), CI/CD (Argo CD, GitHub Actions), and observability (ELK, Prometheus). Proven ability to streamline troubleshooting via runbooks and deliver high system availability through strong technical and collaborative communication.
- Engineered and deployed a Kubernetes monitoring stack POC using Argo CD for GitOps, Terraform for IaC, and GitHub Actions for CI/CD automation.
- Managed end-to-end observability workflows (log ingestion, monitoring, alerting) using the ELK Stack, Prometheus, and Grafana to proactively address system health.
- Created detailed runbooks for monitoring dashboards to guide incident response and troubleshooting, improving resolution speed and consistency.
- Ensured high system availability by effectively troubleshooting application, network, and container-level issues within Kubernetes and Docker environments.
- Improved data quality and anomaly detection by building a real-time system (Random Cut Forest model) in AWS SageMaker to automatically flag unusual product updates.
- Accelerated data processing and scoring via a serverless workflow (AWS Lambda) that prepared data, communicated with SageMaker, and managed alerting.
- Developed and optimized data pipelines for performance, reliability, and scalability in production.
- Bridged a critical gap in course registration during the LMS migration from iLearn to Brightspace by designing RESTful APIs using Python and FastAPI, enabling registration for over 2,000 courses, and reducing downtime by 50%.
- Migrated over 3,000 course sites from a legacy LMS to Brightspace in 180 days, performing data integrity, accessibility, regression, and smoke testing for a seamless transition.
- Enhanced platform functionality by creating 100+ website pages in Liferay 7, increasing user engagement.
- Developed and delivered six onboarding training modules for new team hires.
- Conducted WCAG-compliant keyboard accessibility testing in collaboration with the Sakai LMS community, ensuring compliance for over 1,700 test scripts.
- Monitored and maintained web application performance, identifying and resolving health issues that impacted user experience.
- Migrated over 6,000 pages from Liferay V2.x to V7.x, ensuring content integrity and a seamless user experience.
- Collaborated with cross-functional teams (designers, developers, and content creators) to integrate content, maintaining website functionality and aesthetic consistency, resulting in a 30% reduction in post-migration errors.
- Elevated website performance through detailed testing, contributing to improved user experience and accessibility.
- Managed critical incident recovery across backup infrastructure (tape, disk, and containerized storage), escalating unresolved issues via Salesforce.
- Identified and corrected faults impacting service health by performing detailed log monitoring and network diagnostics using tools like Wireshark and traceroute.
- Ensured SLA compliance by maintaining accurate ticket updates, following escalation protocols, and tracking performance against company policies.
- Significantly reduced downtime and improved operational stability by diagnosing and resolving complex network connectivity, SSL, VPN, and system access issues for global customers.
- Improved future issue resolution processes by 20% by streamlining knowledge base documentation and analyzing root causes while managing the incident lifecycle in ServiceNow.
AWS Certified SysOps Administrator – Associate