6+ years of experience in DevOps, Site Reliability Engineering (SRE), Linux administration, build/release management, change/incident management, and cloud infrastructure, including 4+ years as an SRE focused on system reliability, automation, and operational efficiency.
Extensive experience with AWS services such as VPC, EC2, S3, EBS, IAM, RDS, CloudFormation, Lambda, and CloudWatch for scalable, secure deployments.
Architected and managed highly available, fault-tolerant, disaster-recovery-ready environments on AWS using Auto Scaling, Elastic Load Balancing, and Route 53.
Automated infrastructure provisioning with AWS CloudFormation and Terraform, and configuration management with Chef, Puppet, and Ansible.
Established and maintained CI/CD pipelines using Jenkins, Maven, Bitbucket, Nexus, and GitLab.
Implemented and enforced AWS IAM policies for secure access management across environments, in line with security best practices.
Deployed and managed containerized microservices using Docker, Kubernetes, and AWS ECS/EKS, handling orchestration and scaling, and integrated Docker-based deployments into CI/CD pipelines.
Used AWS CloudWatch, Datadog, Splunk, and the ELK Stack for system and application monitoring, configuring alerts, dashboards, and log analysis for proactive incident response.
Led a Splunk implementation that reduced incident detection and resolution times by 30% through advanced dashboards, alerts, and data-driven insights.
Automated repetitive tasks and configuration management with Ansible and Ansible Tower, streamlining deployments and reducing human error.
Monitored and maintained service-level indicators (SLIs) to ensure systems meet defined service-level objectives (SLOs) for reliability, latency, and availability.
Proficient in Shell, Python, Ruby, Perl, and JavaScript scripting to automate tasks and manage infrastructure.
Deep knowledge of RDBMS such as Oracle, MySQL, and SQL Server, including writing queries and maintaining data integrity with SQL and Oracle PL/SQL.
Familiar with issue-tracking and development tools such as JIRA, HP Quality Center, IBM ClearQuest, and Fisheye.
Conducted root cause analysis (RCA) for production incidents and implemented corrective action plans.
Provided 24/7 on-call support for production systems, handling escalations, troubleshooting, and issue resolution.
Strong troubleshooting skills across build, deployment, and production support environments, maintaining system stability and uptime.