Satya Vaibhav Ravuri

Sandy Springs

Summary

Accomplished Sr. Data Engineer at Wells Fargo specializing in Snowflake and Azure Data Factory. Expert in designing robust data pipelines and optimizing performance, achieving 99.9% uptime. Proficient in Python and Terraform, with a strong commitment to data governance and compliance, and adept at collaborating with cross-functional teams in fast-paced environments to deliver impactful data solutions. Background in data analysis, machine learning, predictive modeling, and big data analytics; skilled at distilling and analyzing large data sets, designing algorithms and data collection processes, and presenting data findings to stakeholders.

Overview

10 years of professional experience

Work History

Sr. Data Engineer

Wells Fargo
San Francisco
12.2023 - Current
  • Designed and implemented Snowflake-based data warehouses supporting business intelligence and analytics
  • Developed and optimized data pipelines using Spark, Scala, Python, and Snowflake
  • Built modular SQL queries in DBT, breaking down complex transformations
  • Created data ingestion and transformation pipelines using Apache Airflow, Informatica, and Automic
  • Implemented CDC and slowly changing dimension techniques in Snowflake, optimizing data for reporting in Power BI
  • Used Azure Data Factory (ADF) and Terraform for orchestration and automation of workflows
  • Developed Shell scripting and JavaScript-based automation scripts for monitoring and system reliability
  • Worked on CI/CD pipelines using Git, Terraform, and Jenkins for automated deployments
  • Led the design, development, and delivery of large-scale data ingestion, processing, and transformation projects on Azure
  • Led performance tuning of Spark applications, setting the right batch interval, the correct level of parallelism, and appropriate memory allocation
  • Developed DataFrames and case classes for the required input data and performed data transformations using Spark Core
  • Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data from different sources; scheduled triggers and mapping data flows in Azure Data Factory; and used Key Vault to store credentials
  • Defined data quality expectations as pipeline constraints and specified how to handle records that fail them
  • Used Spark Streaming APIs to perform transformations and actions on the fly for the common learner data model, which consumes data from Kafka in near real time and persists it into Cassandra (a minimal sketch follows this list)
  • Developed Terraform templates to automate Azure IaaS virtual machines using Terraform modules and deployed virtual machine scale sets to the production environment
  • Integrated Azure Log Analytics with Azure VMs to monitor log files, store them, and track metrics, provisioned with Terraform
  • Worked closely with security architects and engineers to integrate security into the DevOps process and collaborated on designing secure infrastructure architectures
  • Processed streaming and batch data in a single DLT pipeline
  • Used change data capture (CDC) in DLT to update tables based on changes in source data
  • Designed and implemented ETL/ELT pipelines in Azure Synapse, utilizing tools such as Azure Data Factory and Azure Databricks to orchestrate data movement, transformation, and loading processes
  • Monitored and improved NoSQL database performance, ensuring 99.9% uptime and real-time responsiveness
  • Updated and maintained comprehensive documentation for all NoSQL systems, aiding in efficient knowledge transfer in the team
  • Integrated Azure Synapse with various data sources, including Azure Blob Storage, Azure Data Lake Storage, Azure SQL Database, and on-premises databases, for seamless data ingestion and transformation
  • Developed API endpoints using Node.js that fetch real-time data from Snowflake, enabling seamless integration with the reporting portal and resulting in improved delivery planning for retail clients, optimized operations, and reduced delivery costs
  • Engineered ELT pipelines to ingest and transform segregated data from SQL Server into Snowflake, optimizing data storage, facilitating metric identification, and reducing storage costs
  • Implemented Change Data Capture and slowly changing dimension methodologies on Snowflake using Tasks and Streams, reducing reporting latency and enabling business intelligence in Tableau and Microsoft Power BI (see the Streams and Tasks sketch after this list)
  • Developed JSON definitions for deploying pipelines in Azure Data Factory (ADF) that process data using SQL activities
  • Worked with various file formats, including delimited text files, clickstream log files, Apache log files, Avro files, JSON files, and XML files
  • Experienced with columnar file formats such as Parquet, along with other compression and file formats including Avro and plain text
  • Worked on Azure Databricks cloud to organize the data into notebooks and make it easy to visualize data using dashboards
  • Developed and maintained ETL processes and data workflows using Scala and Apache Spark for real-time and batch data processing
  • Collaborated with data engineers and data scientists to integrate machine learning models and algorithms into Scala-based applications
  • Conducted performance tuning and optimization of Scala applications and Spark jobs to improve resource utilization and efficiency
  • Worked on CI/CD pipelines (GitHub) to migrate code across environments
  • Using Azure Databricks, created Spark clusters and configured high concurrency clusters to speed up the preparation of high-quality data
  • Used Azure Databricks for a fast, easy, and collaborative spark-based platform on Azure
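
The Spark Streaming bullet above describes consuming Kafka events in near real time and persisting them to Cassandra. Below is a minimal PySpark Structured Streaming sketch of that flow; the topic, broker, schema, keyspace, and table names are hypothetical placeholders, and the actual learner data model and cluster configuration are not reflected here.

```python
# Minimal PySpark Structured Streaming sketch: Kafka -> transform -> Cassandra.
# Requires the spark-sql-kafka and spark-cassandra-connector packages on the classpath.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = (
    SparkSession.builder
    .appName("learner-model-stream")
    .config("spark.cassandra.connection.host", "cassandra.internal")  # placeholder host
    .getOrCreate()
)

event_schema = StructType([
    StructField("learner_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder brokers
    .option("subscribe", "learner-events")               # placeholder topic
    .load()
)

# Parse the Kafka value payload as JSON and keep only well-formed records.
events = (
    raw.select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
    .filter(F.col("learner_id").isNotNull())
)

def write_to_cassandra(batch_df, batch_id):
    # foreachBatch writes each micro-batch with the batch DataFrame writer.
    (batch_df.write
     .format("org.apache.spark.sql.cassandra")
     .options(keyspace="learner", table="learner_events")  # placeholder keyspace/table
     .mode("append")
     .save())

query = (
    events.writeStream
    .foreachBatch(write_to_cassandra)
    .option("checkpointLocation", "/checkpoints/learner-events")  # placeholder path
    .start()
)
query.awaitTermination()
```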
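
The Snowflake CDC bullet above relies on Streams and Tasks. The sketch below shows the general pattern, driven from Python with snowflake-connector-python: a stream captures changes on a staging table and a scheduled task merges them into a reporting table. All object names and credentials are assumptions, and the simple Type 1 style upsert is for illustration; a full slowly changing dimension flow would add versioning columns and extra merge logic.

```python
# Minimal sketch of the Snowflake Streams + Tasks CDC pattern via the Python connector.
# Database, schema, table, and warehouse names are hypothetical placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",   # placeholder credentials
    user="etl_user",
    password="***",
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="STAGING",
)
cur = conn.cursor()

# 1. A stream captures inserts/updates/deletes on the staging table.
cur.execute("CREATE STREAM IF NOT EXISTS orders_stream ON TABLE STAGING.ORDERS")

# 2. A task runs on a schedule and merges pending changes into the reporting table.
cur.execute("""
    CREATE OR REPLACE TASK merge_orders_task
      WAREHOUSE = ETL_WH
      SCHEDULE = '5 MINUTE'
    WHEN SYSTEM$STREAM_HAS_DATA('orders_stream')
    AS
    MERGE INTO MARTS.ORDERS_CURRENT t
    USING orders_stream s
      ON t.order_id = s.order_id
    WHEN MATCHED THEN UPDATE SET t.status = s.status, t.updated_at = CURRENT_TIMESTAMP()
    WHEN NOT MATCHED THEN INSERT (order_id, status, updated_at)
      VALUES (s.order_id, s.status, CURRENT_TIMESTAMP())
""")

# 3. Tasks are created suspended; resume to start the schedule.
cur.execute("ALTER TASK merge_orders_task RESUME")
cur.close()
conn.close()
```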

Data Engineer

TP Vision India Private Limited
09.2019 - 06.2022
  • Designed and implemented data warehousing solutions on Snowflake and Azure Synapse
  • Optimized Spark applications for large-scale data transformations and performance tuning
  • Created ETL pipelines using Spark, Scala, Python, and SQL
  • Developed Azure Data Factory and Airflow workflows for orchestrating automated data processing
  • Ensured data governance, compliance, and security through best practices
  • Worked extensively on Azure SQL; wrote queries, stored procedures, and views
  • Processed claims in mainframes for all claim types, such as Professional, Institutional, Medicare/Medicaid, COB, and SNF, ensuring cost shares (deductible and coinsurance) and payments were made as expected per member benefits and provider discounts
  • Worked on resolving Anthem provider data migration from files into an RDBMS
  • Key member of the data engineering team that handled data cleaning, pre-processing, and mining, with visualization in Tableau
  • Managed high provider network claims by mining large data sets (IRS, CMS, HHSC, HEDIS, HIPAA, PHI) for large-scale quantitative analysis; the resulting data accuracy impacted 14 states of Blue Cross Blue Shield
  • Proficient in navigating EPIC and Clarity systems, with hands-on experience in extracting, analyzing, and interpreting healthcare data to drive strategic decision-making
  • Extensive experience in managing and manipulating large datasets within EPIC and Clarity platforms to generate actionable insights and improve healthcare delivery processes
  • Created a file system linked service to extract events from an on-premises network folder into the data lake
  • Used Data Factory monitoring to check the status of pipeline runs and triggers
  • Analyzed claims, provider, and utilization data and provided assessments of performance deviations and anomalies to the project consultant
  • Worked on performance tuning and on extracting data between Excel and servers using flows
  • Configured and implemented the Azure Data Factory Triggers and scheduled the Pipelines
  • Monitored the scheduled Azure Data Factory pipelines and configured the alerts to get notification of failure pipelines
  • Worked extensively on Azure Databricks, using Spark SQL to implement SCD Type 1, Type 2, and Type 3 approaches (a minimal SCD Type 2 sketch follows this list)
  • Maintained comprehensive documentation for Spark applications and adhered to best practices for code quality and readability
  • Integrated Apache Spark with other components of the big data ecosystem, such as Hadoop HDFS, Hive, and Kafka, for seamless data flow and processing
  • Successfully implemented Spark in scenarios demanding horizontal scalability, allowing for the efficient handling of growing data volumes
  • Regularly monitored Spark job performance, identified bottlenecks, and applied optimizations to improve job execution times
  • Trained and mentored team members on Apache Spark best practices, promoting knowledge sharing within the organization
  • Applied knowledge of Cache-Control headers and their configuration to control caching behavior, such as setting expiration times, enabling validation, and managing cache re-validation
  • Configured Logic Apps to send email notifications to end users and key stakeholders via the web service activity; created dynamic pipelines to handle extraction from multiple sources to multiple targets; extensively used Azure Key Vault to configure connections in linked services
  • Exposure to Azure Data Factory activities such as Lookup, Stored Procedure, If Condition, ForEach, Set Variable, Append Variable, Get Metadata, Filter, and Wait
  • Deployed code to multiple environments through the CI/CD process, worked on code defects during SIT and UAT testing, and provided support
  • Implemented end-to-end logging frameworks for Data Factory pipelines
  • Developed Spark (Python) notebooks to transform and partition data and organize files in ADLS
  • Involved in planning strategies to move existing applications from on-premises to the Azure cloud platform
  • Participated in daily Scrum meetings, bi-weekly sprint planning, and quarterly epic design and planning as part of the agile methodology
  • Created ARM templates for provisioning Azure infrastructure using the Gradle build in the Jenkins job
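
The SCD bullet above mentions implementing Type 1, 2, and 3 logic on Azure Databricks with Spark SQL. Below is a minimal Type 2 sketch using the Delta Lake Python API: changed rows have their current version closed out, then new versions are appended. Table names, key columns, and the change-detection condition are hypothetical placeholders, not the actual dimension model, and it assumes the update feed carries the same business columns as the dimension.

```python
# Minimal SCD Type 2 sketch on Azure Databricks using the Delta Lake Python API.
# Assumes a Databricks runtime where the delta module is available.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

updates = spark.table("staging.customer_updates")        # placeholder source table
dim = DeltaTable.forName(spark, "marts.dim_customer")    # placeholder target dimension

# Step 1: close out the current version of any customer whose attributes changed.
(dim.alias("t")
 .merge(updates.alias("s"), "t.customer_id = s.customer_id AND t.is_current = true")
 .whenMatchedUpdate(
     condition="t.address <> s.address",                 # placeholder change check
     set={"is_current": "false", "valid_to": "current_timestamp()"})
 .execute())

# Step 2: append new versions for changed or brand-new customers
# (anyone who no longer has an open, current row after step 1).
new_versions = (
    updates
    .join(spark.table("marts.dim_customer").filter("is_current = true"),
          on="customer_id", how="left_anti")
    .withColumn("valid_from", F.current_timestamp())
    .withColumn("valid_to", F.lit(None).cast("timestamp"))
    .withColumn("is_current", F.lit(True))
)
new_versions.write.format("delta").mode("append").saveAsTable("marts.dim_customer")
```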

Data Engineer

Universal Soft Tech India Private Limited
05.2015 - 08.2019
  • Implemented Terraform to manage the Azure and GCP Infrastructure using the configuration management tool Jenkins
  • Developed a Google Cloud Function in Python to load data into BigQuery for on-arrival CSV files in the GCS bucket (a minimal sketch follows this list)
  • Processed and loaded bounded and unbounded data from a Google Pub/Sub topic to BigQuery using Cloud Dataflow with Python
  • Migrated data from the on-premises database server to Azure SQL using Data Factory
  • Designed, planned, and created Azure virtual machines; implemented and managed virtual networking within Azure and connected it to on-premises environments
  • Migrated client files from a shared server to Azure SQL using Data Factory
  • Performed Fortify scans and continuously checked for bugs
  • Automated deployment and troubleshooting mechanisms for quick service
  • Managed and monitored Azure cloud resources
  • Worked on fixing Azure vulnerabilities
  • Worked on querying and analyzing logs of Azure resources
  • Implemented an Azure Automation account and created Azure Automation runbooks
  • Stored and used automation assets
  • Created and maintained data pipelines and ETL processes that leverage the parallel processing capabilities of MPP systems for large-scale data integration
  • Provided expertise in troubleshooting and resolving issues related to MPP architecture, ensuring uninterrupted data processing and system reliability
  • Created, edited, and deployed JSON templates using Visual Studio
  • Worked on implementing private endpoints for the storage account
  • Worked on implementing firewall rules for the storage account and Azure SQL to restrict access to particular VMs and IP addresses
  • Worked on storing on-premises files in the Azure storage account using Data Factory
  • Created automation scripts for deploying to higher environments
  • Created a Jenkins job to provision Azure infrastructure whenever there is a Git push to Bitbucket
  • Developed automation scripts for health checks of online servers
  • Scheduled server health checks using Task Scheduler
  • Loaded data into the database using SSIS
  • Debugged and resolved Azure SQL database errors
  • Worked on hybrid infrastructure and resolved vulnerabilities in both cloud and on-premises environments
  • Managed metrics on Azure resources
  • Implemented Azure Application Insights to store user activities and error logging
  • Worked on the Cost Management of Azure resources
  • Debugged and resolved access issues by reviewing logs, the domain controller, and Active Directory
  • Documented server lists and details across all environments
  • Worked on backup, restore and recovery of the Azure Databases
  • Implemented backup methodologies through PowerShell scripts for Azure services such as Azure SQL Database, Key Vault, storage blobs, and App Services
  • Improved existing code and fixed issues after a deep dive into the existing codebase
  • Migrated all the Service Requests from Jira to the Jira Service desk
  • Generated scan reports by performing Fortify scans on the existing code
  • Pulled data into Power BI from various sources, including SQL Server and Oracle
  • Involved in the installation of the Power BI remote server
  • Used the Query Editor in Power BI to perform operations such as fetching data from different files, merging and appending queries, removing and splitting columns, and choosing required columns
  • Used various data visualizations in Power BI
  • Managed version control and deployment of data applications using Git, Docker, and Jenkins.
  • Participated in agile development processes, contributing to sprint planning, stand-ups, and reviews to ensure timely delivery of data projects.
  • Collaborated with data scientists and analysts to understand data needs and implement appropriate data models and structures.
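
The first bullet in this list mentions a Python Cloud Function that loads newly arrived CSV files from GCS into BigQuery. A minimal sketch of that trigger-and-load pattern is below; the dataset and table name, schema auto-detection, and the finalize-event wiring are assumptions for illustration rather than the production configuration.

```python
# Minimal sketch of a GCS-triggered Cloud Function that loads arriving CSV files
# into BigQuery. Dataset/table names are hypothetical; real schemas and error
# handling are omitted.
from google.cloud import bigquery

def load_csv_to_bigquery(event, context):
    """Background Cloud Function triggered by a google.storage.object.finalize event."""
    bucket = event["bucket"]
    name = event["name"]
    if not name.endswith(".csv"):
        return  # ignore non-CSV objects

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,      # assume a header row
        autodetect=True,          # infer the schema for this sketch
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )

    uri = f"gs://{bucket}/{name}"
    load_job = client.load_table_from_uri(
        uri, "analytics.landing_csv", job_config=job_config  # placeholder dataset.table
    )
    load_job.result()  # wait for the load to complete
    print(f"Loaded {name} into analytics.landing_csv")
```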

Education

Master’s -

Rivier University
01.2024

B.Tech -

Kits Engineering College
01.2016

Skills

  • Snowflake
  • Azure Data Factory
  • Azure Databricks
  • Azure Synapse
  • Azure Data Lake
  • Python
  • Scala
  • SQL
  • Shell Scripting
  • JavaScript
  • Apache Airflow
  • Informatica
  • Automic
  • Azure Logic Apps
  • Spark
  • PySpark
  • Spark SQL
  • DBT
  • Data lineage
  • Security best practices
  • Compliance
  • HIPAA
  • GDPR
  • Terraform
  • Git
  • Jenkins
  • CI/CD pipeline automation
  • SQL Analytical Functions
  • Power BI
  • Tableau

Timeline

Sr. Data Engineer

Wells Fargo
12.2023 - Current

Data Engineer

TP Vision India Private Limited
09.2019 - 06.2022

Data Engineer

Universal Soft Tech India Private Limited
05.2015 - 08.2019

Master’s -

Rivier University

B.Tech -

Kits Engineering College