Data engineer with around 9 years of experience, committed to building and improving applications and data solutions.
Specialized expertise in crafting scalable data and analytics solutions on the AWS cloud platform, using services including EC2, S3, Redshift, Glue, CloudWatch, and Lambda.
Designed and implemented ETL pipelines for both batch and streaming data, using Spark and PySpark to optimize processing efficiency.
Adept in database architecture, excelling in performance tuning and complex query development while prioritizing data quality and consistency.
Demonstrated mastery of Infrastructure as Code (IaC) tools such as Terraform, streamlining resource provisioning and managing CI/CD pipeline workflows.
A strong foundation in designing, implementing, and optimizing data pipelines with Azure services such as Azure Synapse, Data Factory, Blob Storage, and Azure Key Vault to power data-driven insights and decision-making.
Proficiency in programming with Python and SQL, along with hands-on experience with Databricks for efficient data processing. A steadfast commitment to ensuring high availability and disaster recovery through vigilant cloud data center monitoring and servicing.
Acts as a recognized subject matter expert in cloud systems, overseeing integration efforts, ensuring compliance with industry standards, and automating governance and security measures for comprehensive data protection.
Proactively creates and manages data warehouses, with a primary focus on data quality, consistency, and security.
Expertise in data integration and data warehousing best practices, adeptly managing seamless data flows across diverse systems and platforms.
Proficient in the ongoing performance monitoring and optimization of data pipelines, consistently meeting stringent SLAs to ensure data reliability.
A dedicated commitment to delivering high-quality, scalable data solutions that catalyze data-informed decision-making and foster innovation.
Overview
9 years of professional experience
Work History
Data Engineer
Evicore
01.2023 - Current
Spearheaded the migration and transformation of an on-premises data warehouse to a cloud-based solution, using a stack that included Snowflake, AWS Glue, Amazon S3, Hive, Python, and SQL
Orchestrated the extraction, transformation, and ingestion of data from existing databases into Amazon S3, and configured Snowflake as the central cloud data warehouse
Used Python to optimize ETL pipelines for efficient data movement, with Hive and SQL providing user-friendly querying and reporting
Implemented and managed batch jobs using Snowflake Task objects, loading data from S3 into Snowflake's Raw Layer via the COPY command (see the Snowflake sketch after this history)
Designed and delivered a cost-effective, high-performance data warehousing solution on Snowflake, improving data accessibility and accelerating decision-making
Implemented data lineage tracking, providing a transparent audit trail of data changes for regulatory compliance
Automated data validation within the ETL process, identifying and fixing data quality issues early to guarantee reliable data for analytics
Demonstrated proficiency in monitoring Snowflake workloads, adeptly identifying and addressing performance bottlenecks, and maintaining the integrity of data pipelines and storage
Showcased expertise in utilizing data cataloging tools to streamline data discovery, facilitate lineage tracking, and effectively manage metadata for enhanced data governance
Successfully integrated Snowflake with a diverse range of data sources and tools, including data lakes, BI platforms, and data integration solutions, ensuring seamless data flow and accessibility
Fostered open communication and collaborated closely with business stakeholders throughout project lifecycles, assuming a leadership role in large-scale data processing initiatives, emphasizing accuracy, oversight, and innovation
Built and evolved serverless applications on AWS, using the Serverless Framework and Python's boto3 library to create efficient, cost-effective solutions
Developed serverless applications on AWS Lambda, API Gateway, and DynamoDB, substantially reducing infrastructure costs while improving scalability (see the Lambda sketch after this history)
Engineered end-to-end data pipelines with AWS Glue, Apache Airflow, and Apache Spark to extract, transform, and load data from a diverse array of sources into data repositories
Established data quality validations and governance protocols, ensuring the precision and uniformity of data across the entire organizational spectrum
Developed and tailored ETL processes within AWS Glue, streamlining the importation of data from external sources into AWS Redshift, and paving the way for comprehensive data analysis
Monitored and optimized the health and performance of Amazon Redshift clusters using monitoring tools and dashboards
Managed data backups and contingency strategies, safeguarding data integrity and business resilience
Implemented data encryption and hardened security protocols in line with industry standards to protect sensitive data
Designed and fine-tuned Snowflake data models, defining schemas and configuring indexes to improve storage efficiency and retrieval speed
Directed and tuned data pipelines using Apache Airflow with custom scripts, managing extraction, transformation, and loading (ETL) from a diverse array of sources into the Redshift data warehouse (see the Airflow sketch after this history)
Built proactive monitoring and alerting systems to identify and mitigate data load failures, query performance bottlenecks, and system health issues
Used Apache Spark for distributed data processing, substantially improving processing speed and efficiency
Oversaw the management and upkeep of Hadoop clusters, ensuring their continuous availability, security, and optimal performance
Developed MapReduce applications, crafted Hive queries, and authored Pig scripts for streamlined data processing
Built and maintained real-time data streaming applications with Kafka, managing topics and ensuring data reliability (see the Kafka sketch after this history)
Demonstrated expertise in using AWS Glue, Databricks, and Amazon Redshift analytics to architect complex data solutions
Crafted Spark applications for distributed data processing and analytics, encompassing both batch and real-time processing capabilities
Seamlessly integrated ETL operations, maintaining the integrity of data flowing from diverse sources into Amazon S3 and SQL Data Warehouse
Used Sqoop to transfer data between Hadoop and relational databases, ensuring data consistency and accuracy
Wrote SQL queries for Spark, enabling analysis of structured data within the Spark framework (see the Spark SQL sketch after this history)
Optimized Hive queries tailored for Hadoop data warehousing and analytics, enhancing efficiency and performance
Orchestrated large-scale data processing endeavors utilizing Databricks and PySpark for transformative operations
Enforced rigorous data governance, encompassing lineage, metadata management, and security protocols
Notably skilled in designing and optimizing SQL queries to achieve peak data solution performance
Developed applications using Scala, frequently in tandem with Spark, to execute data processing tasks effectively
Implemented Pig Latin scripts to drive data transformations and ETL procedures within the Hadoop ecosystem
Engineered and executed data pipelines using AWS Glue to facilitate data movement and transformation
Managed workflows for the coordination and scheduling of Hadoop jobs, alongside the administration and maintenance of HBase clusters for NoSQL data storage and retrieval
Proficiently harnessed Tableau to craft interactive and insightful data visualizations for in-depth data analysis and reporting
Crafted applications, scripts, and data analysis tools utilizing Python
Designed and maintained data pipelines using AWS services such as Glue, Lambda, Step Functions, CodeBuild, EventBridge, and Athena for various data processing requirements (see the Athena sketch after this history)
Formulated and administered shell scripts to streamline server management, automation, and system maintenance tasks
Held the responsibility for the establishment, configuration, and sustained management of Hadoop clusters
Actively developed MapReduce tasks and various applications tailored for processing extensive datasets
Skillfully engineered and maintained MapReduce applications, integral to data processing and analysis
Orchestrated data storage management to ensure continuous data availability and reliability
Expertly employed Pig and Hive for data querying and analysis, complemented by the development of Spark applications tailored for real-time data processing and advanced analytics
Proficiently governed and fine-tuned Kafka for real-time data streaming and processing, optimizing data flow and performance
Used IntelliJ IDEA to design, develop, test, and package applications in Java and other languages, with sbt for build management
Used Zeppelin to interactively explore and visualize data, while managing cluster resources and job scheduling
Demonstrated proficiency in PostgreSQL database administration and maintenance
Used SQL queries as the primary tool for data extraction and analysis, with SQL*Plus for command-line interaction with Oracle databases
Expertly developed stored procedures, functions, and triggers within Oracle databases, fostering optimized data processing
Used TOAD for Oracle database maintenance and development, and SQL*Loader to load data into Oracle databases from external sources
Expertly employed SQL Discoverer for robust business intelligence and data reporting capabilities
Crafted and executed ETL (Extract, Transform, Load) operations tailored for SQL Server using SSIS, concurrently managing and maintaining Oracle 9i databases
Successfully administered and maintained servers and systems running on both Windows and UNIX platforms, ensuring their continued reliability and performance
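Code Sketches
The minimal sketches below illustrate patterns referenced in the work history above. All names, credentials, schemas, buckets, and endpoints are hypothetical placeholders, not production details.
First, the Snowflake batch-load pattern: executing a COPY from an external S3 stage via the snowflake-connector-python client, as a scheduled Snowflake Task would run it. The account, stage, table, and file format are assumptions.
```python
# Hypothetical sketch: load staged S3 files into a Snowflake raw table
# with COPY, the statement a scheduled Snowflake Task would run.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",   # hypothetical account identifier
    user="etl_user",        # hypothetical user
    password="...",         # in practice, fetched from a secrets manager
    warehouse="ETL_WH",
    database="RAW",
    schema="LANDING",
)
try:
    # COPY pulls new files from an external stage pointing at S3;
    # files already loaded are skipped automatically by Snowflake.
    conn.cursor().execute("""
        COPY INTO RAW.LANDING.ORDERS
        FROM @RAW.LANDING.S3_ORDERS_STAGE
        FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
        ON_ERROR = 'ABORT_STATEMENT'
    """)
finally:
    conn.close()
```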
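A sketch of the serverless pattern: an AWS Lambda handler behind API Gateway writing a record to DynamoDB with boto3. The table name and payload schema are assumptions.
```python
# Hypothetical sketch of an AWS Lambda handler behind API Gateway
# that persists a record to DynamoDB using boto3.
import json
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("events")  # hypothetical table name

def handler(event, context):
    # API Gateway proxy integration delivers the request body as a string.
    body = json.loads(event.get("body") or "{}")
    table.put_item(Item={
        "event_id": body["event_id"],        # assumed partition key
        "payload": body.get("payload", {}),
    })
    return {"statusCode": 200, "body": json.dumps({"ok": True})}
```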
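A sketch of the Airflow-orchestrated ETL into Redshift: a Python extract task followed by a Redshift COPY issued through the Postgres operator (Redshift speaks the Postgres protocol). The DAG id, connection id, bucket, IAM role, and target table are assumptions.
```python
# Hypothetical sketch of an Airflow DAG: extract to S3, then COPY into Redshift.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.postgres.operators.postgres import PostgresOperator

def extract_to_s3(**_):
    # In a real pipeline this would pull from a source system and
    # write partitioned files to S3; elided here.
    pass

with DAG(
    dag_id="orders_to_redshift",       # hypothetical DAG name
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_to_s3", python_callable=extract_to_s3)
    load = PostgresOperator(
        task_id="copy_into_redshift",
        postgres_conn_id="redshift_default",  # hypothetical connection id
        sql="""
            COPY analytics.orders
            FROM 's3://my-bucket/orders/'
            IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy'
            FORMAT AS PARQUET;
        """,
    )
    extract >> load
```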
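A sketch of real-time publishing with the kafka-python client; the broker address, topic, and message shape are assumptions.
```python
# Hypothetical sketch of a Kafka producer publishing JSON events.
import json

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="broker:9092",                           # hypothetical broker
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    acks="all",  # wait for in-sync replicas before acknowledging
)
producer.send("claims-events", {"claim_id": "123", "status": "received"})
producer.flush()
```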
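A sketch of running SQL over structured data in Spark: register a dataset as a temporary view, then query it with spark.sql. The file path and columns are assumptions.
```python
# Hypothetical sketch of SQL over structured data in PySpark.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("orders-analysis").getOrCreate()

# Register a Parquet dataset (hypothetical path) as a temporary view.
spark.read.parquet("s3://my-bucket/orders/").createOrReplaceTempView("orders")

daily_totals = spark.sql("""
    SELECT order_date, SUM(amount) AS total_amount
    FROM orders
    GROUP BY order_date
    ORDER BY order_date
""")
daily_totals.show()
```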
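Finally, a sketch of kicking off an Athena query from a pipeline step with boto3; the query, database, and output location are assumptions.
```python
# Hypothetical sketch of starting an Athena query from a pipeline step.
import boto3

athena = boto3.client("athena")
response = athena.start_query_execution(
    QueryString="SELECT COUNT(*) FROM events WHERE dt = '2023-01-01'",  # hypothetical query
    QueryExecutionContext={"Database": "analytics"},                    # hypothetical database
    ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},
)
print(response["QueryExecutionId"])  # poll get_query_execution for completion
```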