Satya Vaibhav Ravuri

Sandy Springs

Summary

Accomplished Sr. Data Engineer at Wells Fargo specializing in Snowflake and Azure Data Factory. Expert in designing robust data pipelines and optimizing performance, achieving 99.9% uptime. Proficient in Python and Terraform, with a strong commitment to data governance and compliance, and adept at collaborating with cross-functional teams in fast-paced environments to deliver impactful data solutions. Background in data analysis, machine learning, predictive modeling, and big data analytics; skilled at distilling and analyzing large data sets, designing algorithms and data collection processes, and presenting data findings to stakeholders.

Overview

10 years of professional experience

Work History

Sr. Data Engineer

Wells Fargo
San Francisco
12.2023 - Current
  • Designed and implemented Snowflake-based data warehouses supporting business intelligence and analytics
  • Developed and optimized data pipelines using Spark, Scala, Python, and Snowflake
  • Built modular SQL queries in DBT, breaking down complex transformations
  • Created data ingestion and transformation pipelines using Apache Airflow, Informatica, and Automic
  • Implemented CDC and slowly changing dimension techniques in Snowflake, optimizing data for reporting in Power BI
  • Used Azure Data Factory (ADF) and Terraform for orchestration and automation of workflows
  • Developed Shell scripting and JavaScript-based automation scripts for monitoring and system reliability
  • Worked on CI/CD pipelines using Git, Terraform, and Jenkins for automated deployments
  • Led the design, development, and delivery of large-scale data ingestion, processing, and transformation projects on Azure
  • Led performance tuning of Spark applications, setting the right batch interval, the correct level of parallelism, and appropriate memory allocation
  • Developed DataFrames and case classes for the required input data and performed data transformations using Spark Core
  • Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data from different sources; scheduled triggers and mapping data flows in Azure Data Factory; and used Key Vault to store credentials
  • Defined data quality expectations as pipeline constraints and specified how to handle records that fail them
  • Used Spark Streaming APIs to perform transformations and actions on the fly for the common learner data model, which consumes data from Kafka in near real time and persists it into Cassandra (a minimal sketch follows this list)
  • Developed Terraform templates to automate Azure IaaS virtual machines using Terraform modules and deployed virtual machine scale sets to the production environment
  • Integrated Azure Log Analytics with Azure VMs to monitor log files, store them, and track metrics, provisioned with Terraform
  • Worked closely with security architects and engineers to integrate security into the DevOps process and collaborated on designing secure infrastructure architectures
  • Processed streaming and batch data in a single DLT pipeline
  • Used change data capture (CDC) in DLT to update tables based on changes in source data
  • Designed and implemented ETL/ELT pipelines in Azure Synapse, utilizing tools such as Azure Data Factory and Azure Databricks to orchestrate data movement, transformation, and loading processes
  • Monitored and improved NoSQL database performance, ensuring 99.9% uptime and real-time responsiveness
  • Updated and maintained comprehensive documentation for all NoSQL systems, aiding in efficient knowledge transfer in the team
  • Integrated Azure Synapse with various data sources, including Azure Blob Storage, Azure Data Lake Storage, Azure SQL Database, and on-premises databases, for seamless data ingestion and transformation
  • Developed API endpoints using Node.js that fetch real-time data from Snowflake, enabling seamless integration with the reporting portal and resulting in improved delivery planning for retail clients, optimized operations, and reduced delivery costs
  • Engineered ELT pipelines to ingest and transform segregated data from SQL Server into Snowflake, optimizing data storage, facilitating metric identification, and reducing storage costs
  • Implemented Change Data Capture and slowly changing dimension methodologies on Snowflake using Tasks and Streams, reducing reporting latency and enabling business intelligence in Tableau and Microsoft Power BI (see the Streams and Tasks sketch after this list)
  • Developed JSON definitions for deploying pipelines in Azure Data Factory (ADF) that process data using SQL activities
  • Worked with various file formats, including delimited text files, clickstream log files, Apache log files, Avro files, JSON files, and XML files
  • Experienced with columnar file formats such as Parquet, along with other compression and file formats including Avro and plain text
  • Worked on Azure Databricks cloud to organize the data into notebooks and make it easy to visualize data using dashboards
  • Developed and maintained ETL processes and data workflows using Scala and Apache Spark for real-time and batch data processing
  • Collaborated with data engineers and data scientists to integrate machine learning models and algorithms into Scala-based applications
  • Conducted performance tuning and optimization of Scala applications and Spark jobs to improve resource utilization and efficiency
  • Worked on CI/CD pipelines (GitHub) to migrate code across environments
  • Using Azure Databricks, created Spark clusters and configured high concurrency clusters to speed up the preparation of high-quality data
  • Used Azure Databricks for a fast, easy, and collaborative spark-based platform on Azure
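
The Spark Streaming bullet above describes consuming Kafka events in near real time and persisting them to Cassandra. Below is a minimal PySpark Structured Streaming sketch of that flow; the topic, broker, schema, keyspace, and table names are hypothetical placeholders, and the actual learner data model and cluster configuration are not reflected here.

```python
# Minimal PySpark Structured Streaming sketch: Kafka -> transform -> Cassandra.
# Requires the spark-sql-kafka and spark-cassandra-connector packages on the classpath.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = (
    SparkSession.builder
    .appName("learner-model-stream")
    .config("spark.cassandra.connection.host", "cassandra.internal")  # placeholder host
    .getOrCreate()
)

event_schema = StructType([
    StructField("learner_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder brokers
    .option("subscribe", "learner-events")               # placeholder topic
    .load()
)

# Parse the Kafka value payload as JSON and keep only well-formed records.
events = (
    raw.select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
    .filter(F.col("learner_id").isNotNull())
)

def write_to_cassandra(batch_df, batch_id):
    # foreachBatch writes each micro-batch with the batch DataFrame writer.
    (batch_df.write
     .format("org.apache.spark.sql.cassandra")
     .options(keyspace="learner", table="learner_events")  # placeholder keyspace/table
     .mode("append")
     .save())

query = (
    events.writeStream
    .foreachBatch(write_to_cassandra)
    .option("checkpointLocation", "/checkpoints/learner-events")  # placeholder path
    .start()
)
query.awaitTermination()
```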
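
The Snowflake CDC bullet above relies on Streams and Tasks. The sketch below shows the general pattern, driven from Python with snowflake-connector-python: a stream captures changes on a staging table and a scheduled task merges them into a reporting table. All object names and credentials are assumptions, and the simple Type 1 style upsert is for illustration; a full slowly changing dimension flow would add versioning columns and extra merge logic.

```python
# Minimal sketch of the Snowflake Streams + Tasks CDC pattern via the Python connector.
# Database, schema, table, and warehouse names are hypothetical placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",   # placeholder credentials
    user="etl_user",
    password="***",
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="STAGING",
)
cur = conn.cursor()

# 1. A stream captures inserts/updates/deletes on the staging table.
cur.execute("CREATE STREAM IF NOT EXISTS orders_stream ON TABLE STAGING.ORDERS")

# 2. A task runs on a schedule and merges pending changes into the reporting table.
cur.execute("""
    CREATE OR REPLACE TASK merge_orders_task
      WAREHOUSE = ETL_WH
      SCHEDULE = '5 MINUTE'
    WHEN SYSTEM$STREAM_HAS_DATA('orders_stream')
    AS
    MERGE INTO MARTS.ORDERS_CURRENT t
    USING orders_stream s
      ON t.order_id = s.order_id
    WHEN MATCHED THEN UPDATE SET t.status = s.status, t.updated_at = CURRENT_TIMESTAMP()
    WHEN NOT MATCHED THEN INSERT (order_id, status, updated_at)
      VALUES (s.order_id, s.status, CURRENT_TIMESTAMP())
""")

# 3. Tasks are created suspended; resume to start the schedule.
cur.execute("ALTER TASK merge_orders_task RESUME")
cur.close()
conn.close()
```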

Data Engineer

TP Vision India Private Limited
09.2019 - 06.2022
  • Designed and implemented data warehousing solutions on Snowflake and Azure Synapse
  • Optimized Spark applications for large-scale data transformations and performance tuning
  • Created ETL pipelines using Spark, Scala, Python, and SQL
  • Developed Azure Data Factory and Airflow workflows for orchestrating automated data processing
  • Ensured data governance, compliance, and security through best practices
  • Worked extensively on Azure SQL; wrote queries, stored procedures, and views
  • Processed claims in mainframes for all claim types, such as Professional, Institutional, Medicare/Medicaid, COB, and SNF, ensuring cost shares (deductible and coinsurance) and payments were made as expected per member benefits and provider discounts
  • Worked on resolving Anthem provider data migration from files into an RDBMS
  • Key member of the data engineering team that handled data cleaning, pre-processing, and mining, with visualization in Tableau
  • Managed high provider network claims by mining large data sets (IRS, CMS, HHSC, HEDIS, HIPAA, PHI) for large-scale quantitative analysis; the resulting data accuracy impacted 14 states of Blue Cross Blue Shield
  • Proficient in navigating EPIC and Clarity systems, with hands-on experience in extracting, analyzing, and interpreting healthcare data to drive strategic decision-making
  • Extensive experience in managing and manipulating large datasets within EPIC and Clarity platforms to generate actionable insights and improve healthcare delivery processes
  • Created a file system linked service to extract events from an on-premises network folder into the data lake
  • Used Data Factory monitoring to check the status of pipeline runs and triggers
  • Analyzed claims, provider, and utilization data and provided assessments of performance deviations and anomalies to the project consultant
  • Worked on performance tuning and on extracting data between Excel and servers using flows
  • Configured and implemented the Azure Data Factory Triggers and scheduled the Pipelines
  • Monitored the scheduled Azure Data Factory pipelines and configured the alerts to get notification of failure pipelines
  • Worked extensively on Azure Databricks, using Spark SQL to implement SCD Type 1, Type 2, and Type 3 approaches (a minimal SCD Type 2 sketch follows this list)
  • Maintained comprehensive documentation for Spark applications and adhered to best practices for code quality and readability
  • Integrated Apache Spark with other components of the big data ecosystem, such as Hadoop HDFS, Hive, and Kafka, for seamless data flow and processing
  • Successfully implemented Spark in scenarios demanding horizontal scalability, allowing for the efficient handling of growing data volumes
  • Regularly monitored Spark job performance, identified bottlenecks, and applied optimizations to improve job execution times
  • Trained and mentored team members on Apache Spark best practices, promoting knowledge sharing within the organization
  • Applied knowledge of Cache-Control headers and their configuration to control caching behavior, such as setting expiration times, enabling validation, and managing cache re-validation
  • Configured Logic Apps to send email notifications to end users and key stakeholders via the web service activity; created dynamic pipelines to handle extraction from multiple sources to multiple targets; extensively used Azure Key Vault to configure connections in linked services
  • Exposure to Azure Data Factory activities such as Lookup, Stored Procedure, If Condition, ForEach, Set Variable, Append Variable, Get Metadata, Filter, and Wait
  • Deployed code to multiple environments through the CI/CD process, worked on code defects during SIT and UAT testing, and provided support
  • Implemented end-to-end logging frameworks for Data Factory pipelines
  • Developed Spark (Python) notebooks to transform and partition data and organize files in ADLS
  • Involved in planning strategies to move existing applications from on-premises to the Azure cloud platform
  • Participated in daily Scrum meetings, bi-weekly sprint planning, and quarterly epic design and planning as part of the agile methodology
  • Created ARM templates for provisioning Azure infrastructure using the Gradle build in the Jenkins job
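
The SCD bullet above mentions implementing Type 1, 2, and 3 logic on Azure Databricks with Spark SQL. Below is a minimal Type 2 sketch using the Delta Lake Python API: changed rows have their current version closed out, then new versions are appended. Table names, key columns, and the change-detection condition are hypothetical placeholders, not the actual dimension model, and it assumes the update feed carries the same business columns as the dimension.

```python
# Minimal SCD Type 2 sketch on Azure Databricks using the Delta Lake Python API.
# Assumes a Databricks runtime where the delta module is available.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

updates = spark.table("staging.customer_updates")        # placeholder source table
dim = DeltaTable.forName(spark, "marts.dim_customer")    # placeholder target dimension

# Step 1: close out the current version of any customer whose attributes changed.
(dim.alias("t")
 .merge(updates.alias("s"), "t.customer_id = s.customer_id AND t.is_current = true")
 .whenMatchedUpdate(
     condition="t.address <> s.address",                 # placeholder change check
     set={"is_current": "false", "valid_to": "current_timestamp()"})
 .execute())

# Step 2: append new versions for changed or brand-new customers
# (anyone who no longer has an open, current row after step 1).
new_versions = (
    updates
    .join(spark.table("marts.dim_customer").filter("is_current = true"),
          on="customer_id", how="left_anti")
    .withColumn("valid_from", F.current_timestamp())
    .withColumn("valid_to", F.lit(None).cast("timestamp"))
    .withColumn("is_current", F.lit(True))
)
new_versions.write.format("delta").mode("append").saveAsTable("marts.dim_customer")
```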

Data Engineer

Universal Soft Tech India Private Limited
05.2015 - 08.2019
  • Implemented Terraform to manage the Azure and GCP Infrastructure using the configuration management tool Jenkins
  • Developed a Google Cloud Function in Python to load data into BigQuery for on-arrival CSV files in the GCS bucket (a minimal sketch follows this list)
  • Processed and loaded bounded and unbounded data from a Google Pub/Sub topic to BigQuery using Cloud Dataflow with Python
  • Migrated data from the on-premises database server to Azure SQL using Data Factory
  • Designed, planned, and created Azure virtual machines; implemented and managed virtual networking within Azure and connected it to on-premises environments
  • Migrated client files from a shared server to Azure SQL using Data Factory
  • Performed Fortify scans and continuously checked for bugs
  • Automated deployment and troubleshooting mechanisms for quick service
  • Managed and monitored Azure cloud resources
  • Worked on fixing Azure vulnerabilities
  • Worked on querying and analyzing logs of Azure resources
  • Implemented an Azure Automation account and created Azure Automation runbooks
  • Stored and used automation assets
  • Created and maintained data pipelines and ETL processes that leverage the parallel processing capabilities of MPP systems for large-scale data integration
  • Provided expertise in troubleshooting and resolving issues related to MPP architecture, ensuring uninterrupted data processing and system reliability
  • Created, edited, and deployed JSON templates using Visual Studio
  • Worked on implementing private endpoints for the storage account
  • Worked on implementing firewall rules for the storage account and Azure SQL to restrict access to particular VMs and IP addresses
  • Worked on storing on-premises files in the Azure storage account using Data Factory
  • Created automation scripts for deploying to higher environments
  • Created a Jenkins job to provision Azure infrastructure whenever there is a Git push to Bitbucket
  • Developed automation scripts for health checks of online servers
  • Scheduled server health checks using Task Scheduler
  • Loaded data into the database using SSIS
  • Debugged and resolved Azure SQL database errors
  • Worked on hybrid infrastructure and resolved vulnerabilities in both cloud and on-premises environments
  • Managed metrics on Azure resources
  • Implemented Azure Application Insights to store user activities and error logging
  • Worked on the Cost Management of Azure resources
  • Debugged and resolved access issues by reviewing logs, the domain controller, and Active Directory
  • Documented server lists and details across all environments
  • Worked on backup, restore and recovery of the Azure Databases
  • Implemented backup methodologies through PowerShell scripts for Azure services such as Azure SQL Database, Key Vault, storage blobs, and App Services
  • Improved existing code and fixed issues after a deep dive into the existing codebase
  • Migrated all the Service Requests from Jira to the Jira Service desk
  • Generated scan reports by performing Fortify scans on the existing code
  • Pulled data into Power BI from various sources, including SQL Server and Oracle
  • Involved in the installation of the Power BI remote server
  • Used the Query Editor in Power BI to perform operations such as fetching data from different files, merging and appending queries, removing and splitting columns, and choosing required columns
  • Used various data visualizations in Power BI
  • Managed version control and deployment of data applications using Git, Docker, and Jenkins.
  • Participated in agile development processes, contributing to sprint planning, stand-ups, and reviews to ensure timely delivery of data projects.
  • Collaborated with data scientists and analysts to understand data needs and implement appropriate data models and structures.
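
The first bullet in this list mentions a Python Cloud Function that loads newly arrived CSV files from GCS into BigQuery. A minimal sketch of that trigger-and-load pattern is below; the dataset and table name, schema auto-detection, and the finalize-event wiring are assumptions for illustration rather than the production configuration.

```python
# Minimal sketch of a GCS-triggered Cloud Function that loads arriving CSV files
# into BigQuery. Dataset/table names are hypothetical; real schemas and error
# handling are omitted.
from google.cloud import bigquery

def load_csv_to_bigquery(event, context):
    """Background Cloud Function triggered by a google.storage.object.finalize event."""
    bucket = event["bucket"]
    name = event["name"]
    if not name.endswith(".csv"):
        return  # ignore non-CSV objects

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,      # assume a header row
        autodetect=True,          # infer the schema for this sketch
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )

    uri = f"gs://{bucket}/{name}"
    load_job = client.load_table_from_uri(
        uri, "analytics.landing_csv", job_config=job_config  # placeholder dataset.table
    )
    load_job.result()  # wait for the load to complete
    print(f"Loaded {name} into analytics.landing_csv")
```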

Education

Master’s -

Rivier University
01.2024

B.Tech -

Kits Engineering College
01.2016

Skills

  • Snowflake
  • Azure Data Factory
  • Azure Databricks
  • Azure Synapse
  • Azure Data Lake
  • Python
  • Scala
  • SQL
  • Shell Scripting
  • JavaScript
  • Apache Airflow
  • Informatica
  • Automic
  • Azure Logic Apps
  • Spark
  • PySpark
  • Spark SQL
  • DBT
  • Data lineage
  • Security best practices
  • Compliance
  • HIPAA
  • GDPR
  • Terraform
  • Git
  • Jenkins
  • CI/CD pipeline automation
  • SQL Analytical Functions
  • Power BI
  • Tableau

Timeline

Sr. Data Engineer

Wells Fargo
12.2023 - Current

Data Engineer

TP Vision India Private Limited
09.2019 - 06.2022

Data Engineer

Universal Soft Tech India Private Limited
05.2015 - 08.2019

Master’s -

Rivier University

B.Tech -

Kits Engineering College