
Raj Maganti

Summary

  • Data engineer with around 8 years of experience, committed to creating and improving applications and data solutions.
  • Specialized expertise in crafting scalable data and analytics solutions on the AWS cloud platform, proficiently utilizing a range of services including EC2, S3, Redshift, Glue, CloudWatch, and Lambda.
  • Designed and implemented ETL pipelines for both batch and streaming data, employing Spark and PySpark to optimize processes for maximum efficiency.
  • Adept in database architecture, excelling in performance tuning and complex query development while prioritizing data quality and consistency.
  • Demonstrated mastery of Infrastructure as Code (IaC) tools like Terraform, streamlining resource provisioning and effectively managing CI/CD pipeline workflows.
  • A strong foundation in designing, implementing, and optimizing data pipelines, harnessing Azure services such as Azure Synapse, Data Factory, Blob Storage, and Azure Key Vault to empower data-driven insights and decision-making.
  • Proficiency in programming with Python and SQL, along with hands-on experience with Databricks for efficient data processing. A steadfast commitment to ensuring high availability and disaster recovery through vigilant cloud data center monitoring and servicing.
  • Acts as a recognized subject matter expert in cloud systems, overseeing integration efforts, ensuring compliance with industry standards, and automating governance and security measures for comprehensive data protection.
  • Proactively creates and manages data warehouses, with a primary focus on data quality, consistency, and security.
  • Expertise in data integration and data warehousing best practices, adeptly managing seamless data flows across diverse systems and platforms.
  • Proficient in the ongoing performance monitoring and optimization of data pipelines, consistently meeting stringent SLAs to ensure data reliability.
  • A dedicated commitment to delivering high-quality, scalable data solutions that catalyze data-informed decision-making and foster innovation.

Overview

9 years of professional experience

Work History

Data Engineer

Evicore
01.2023 - Current
  • Spearheaded the migration and transformation of an on-premises data warehouse to a cutting-edge cloud-based solution, leveraging a tech stack that included Snowflake, AWS Glue, ETL processes, Amazon S3, Hive, Python, and SQL
  • Orchestrated the extraction, transformation, and ingestion of data from existing databases into Amazon S3, configuring Snowflake as the central cloud data warehouse
  • Utilized Python to optimize ETL pipelines, ensuring efficient data movement, while harnessing the power of Hive and SQL for user-friendly querying and reporting capabilities
  • Implemented and managed batch jobs using Snowflake Task objects, efficiently loading data from S3 into Snowflake's Raw Layer via the powerful COPY command
  • Pioneered the design and execution of a cost-effective, high-performance data warehousing solution, harnessing the full potential of Snowflake to enhance data accessibility and accelerate decision-making processes
  • Implemented comprehensive data lineage tracking mechanisms, providing a transparent audit trail of data changes and ensuring compliance with regulatory standards
  • Engineered automated data validation techniques within the ETL process, proactively identifying and rectifying data quality issues, thereby guaranteeing the availability of high-quality, reliable data for analytical purposes
  • Demonstrated proficiency in monitoring Snowflake workloads, adeptly identifying and addressing performance bottlenecks, and maintaining the integrity of data pipelines and storage
  • Showcased expertise in utilizing data cataloging tools to streamline data discovery, facilitate lineage tracking, and effectively manage metadata for enhanced data governance
  • Successfully integrated Snowflake with a diverse range of data sources and tools, including data lakes, BI platforms, and data integration solutions, ensuring seamless data flow and accessibility
  • Fostered open communication and collaborated closely with business stakeholders throughout project lifecycles, assuming a leadership role in large-scale data processing initiatives, emphasizing accuracy, oversight, and innovation
  • Environments: Snowflake, AWS Glue, ETL, S3, Hive, PySpark, Python and SQL

AWS Data Engineer

Neiman Marcus
01.2022 - 11.2022
  • Spearheaded the inception and evolution of serverless applications on AWS, utilizing the Serverless Framework and Python's boto3 library to create efficient, cost-effective solutions
  • Effectively crafted serverless applications, leveraging AWS Lambda, API Gateway, and DynamoDB to achieve substantial reductions in infrastructure expenses while enhancing scalability
  • Engineered and executed comprehensive data pipelines adeptly, employing AWS Glue, Apache Airflow, and Apache Spark to seamlessly harvest, transform, and load data from a diverse array of sources into data repositories
  • Established data quality validations and governance protocols, ensuring the precision and uniformity of data across the entire organizational spectrum
  • Developed and tailored ETL processes within AWS Glue, streamlining the importation of data from external sources into AWS Redshift, and paving the way for comprehensive data analysis
  • Diligently oversaw and optimized the health and performance of Amazon Redshift clusters through cutting-edge monitoring tools and intuitive dashboards
  • Orchestrated meticulous data backups and executed contingency strategies, guaranteeing uninterrupted data integrity and business resilience
  • Implemented stringent data encryption measures and fortified security protocols in line with industry standards, providing robust protection for sensitive data
  • Masterminded the design and fine-tuning of intricate data models, meticulously delineating schemas and configuring indexes to amplify data storage efficiency and retrieval speed within Snowflake
  • Directed and fine-tuned data pipelines using Apache Airflow in tandem with custom scripts, proficiently managing data extraction, transformation, and loading (ETL) processes from a diverse array of sources into the Redshift data warehouse
  • Pioneered proactive monitoring and alerting systems, expertly identifying and mitigating data load hiccups, query performance bottlenecks, and systemic health concerns
  • Harnessed the computational prowess of Apache Spark for distributed data processing tasks, dramatically boosting processing speeds and operational efficiency
  • Environments: AWS S3, Snowflake, DB2, Big Data, Spark, EMR, Python, MapReduce, Glue, CloudWatch

Data Engineer

Citizens Bank
11.2018 - 12.2021
  • Oversaw the management and upkeep of Hadoop clusters, ensuring their continuous availability, security, and optimal performance
  • Developed MapReduce applications, crafted Hive queries, and authored Pig scripts for streamlined data processing
  • Constructed and perpetually maintained real-time data streaming applications through Kafka, taking charge of topic management and guaranteeing data reliability
  • Demonstrated expertise in employing AWS Glue, Databricks, and Redshift Analytics to architect intricate data solutions
  • Crafted Spark applications for distributed data processing and analytics, encompassing both batch and real-time processing capabilities
  • Seamlessly integrated ETL operations, maintaining the integrity of data flowing from diverse sources into Amazon S3 and SQL Data Warehouse
  • Leveraged Sqoop to facilitate seamless data transfers between Hadoop and relational databases, thereby ensuring data consistency and accuracy
  • Expertly scripted SQL queries for Spark, enabling data analysis on structured data within the Spark framework
  • Optimized Hive queries tailored for Hadoop data warehousing and analytics, enhancing efficiency and performance
  • Orchestrated large-scale data processing endeavors utilizing Databricks and PySpark for transformative operations
  • Enforced rigorous data governance, encompassing lineage, metadata management, and security protocols
  • Employed AWS Glue to streamline intricate operations, significantly enhancing operational efficiency
  • Notably skilled in designing and optimizing SQL queries to achieve peak data solution performance
  • Developed applications using Scala, frequently in tandem with Spark, to execute data processing tasks effectively
  • Implemented Pig Latin scripts to drive data transformations and ETL procedures within the Hadoop ecosystem
  • Engineered and executed data pipelines using AWS Glue to facilitate data movement and transformation
  • Managed workflows for the coordination and scheduling of Hadoop jobs, alongside the administration and maintenance of HBase clusters for NoSQL data storage and retrieval
  • Proficiently harnessed Tableau to craft interactive and insightful data visualizations for in-depth data analysis and reporting
  • Crafted applications, scripts, and data analysis tools utilizing Python
  • Designed and maintained data pipelines, harnessing a suite of AWS services such as Glue, Lambda, Step Functions, CodeBuild, EventBridge, and Athena for various data processing requirements
  • Formulated and administered shell scripts to streamline server management, automation, and system maintenance tasks
  • Environments: Hadoop, Kafka, Spark, Sqoop, SQL, Hive, Scala, Pig, Oozie, HBase, Tableau, Python, AWS (Glue, Lambda, Step Functions, CodeBuild, CodePipeline, EventBridge, Athena), Linux Shell Scripting, Databricks

ETL Developer

Falcon Smart IT
01.2015 - 10.2018
  • Held the responsibility for the establishment, configuration, and sustained management of Hadoop clusters
  • Actively developed MapReduce tasks and various applications tailored for processing extensive datasets
  • Skillfully engineered and maintained MapReduce applications, integral to data processing and analysis
  • Orchestrated data storage management to ensure continuous data availability and reliability
  • Expertly employed Pig and Hive for data querying and analysis, complemented by the development of Spark applications tailored for real-time data processing and advanced analytics
  • Proficiently governed and fine-tuned Kafka for real-time data streaming and processing, optimizing data flow and performance
  • Leveraged IntelliJ IDEA to design, develop, test, and package applications in Java and other programming languages, utilizing sbt for streamlined build and dependency management
  • Effectively harnessed Zeppelin to interactively explore and visualize data, concurrently managing cluster resources and work scheduling
  • Demonstrated proficiency in PostgreSQL database administration and maintenance
  • Utilized SQL queries as the primary tool for data extraction and analysis, complemented by SQL*Plus for command-line interaction with Oracle databases
  • Expertly developed stored procedures, functions, and triggers within Oracle databases, fostering optimized data processing
  • Efficiently operated TOAD for Oracle database maintenance and development, using SQL*Loader to load data into Oracle databases from external sources
  • Expertly employed SQL Discoverer for robust business intelligence and data reporting capabilities
  • Crafted and executed ETL (Extract, Transform, Load) operations tailored for SQL Server using SSIS, concurrently managing and maintaining Oracle 9i databases
  • Successfully administered and maintained servers and systems running on both Windows and UNIX platforms, ensuring their continued reliability and performance
  • Environments: Hadoop, MapReduce, HDFS, Pig, Hive, PostgreSQL, Spark, Kafka, IntelliJ, sbt, YARN, SQL, Git, SQL*Plus, PL/SQL, TOAD, SSIS, Oracle 10g, and ETL

Education

Computer Science -

University of Bridgeport
Bridgeport, CT

Information Technology -

Prasad V Potluri Siddharth Institute of Technology (Affiliated to JNTUK)
Vijayawada, India

Skills

    • SQL
    • PL/SQL
    • Python
    • PySpark
    • HiveQL
    • Scala
    • Shell Scripting
    • Java
    • PostgreSQL
    • Power BI
    • Tableau
    • ADF
    • Snowflake
    • AWS Glue
    • Databricks
    • MySQL
    • SQL Server
    • MongoDB
    • Oracle
    • Azure Data Factory
    • Synapse
    • CVS
    • GitHub
    • GitLab
    • Azure DevOps
    • Hadoop

Timeline

Data Engineer

Evicore
01.2023 - Current

AWS Data Engineer

Neiman Marcus
01.2022 - 11.2022

Data Engineer

Citizens Bank
11.2018 - 12.2021

ETL Developer

Falcon Smart IT
01.2015 - 10.2018

Computer Science -

University of Bridgeport

Information Technology -

Prasad V Potluri Siddharth Institute of Technology (Affiliated to JNTUK)