Lakshman Dhullipalla

Cumming

Summary

Principal Consultant with extensive experience in developing metadata-driven ETL frameworks and optimizing cloud data solutions across AWS, GCP, and Azure. Proficient in Python, PySpark, Databricks, and Kafka, with a strong focus on enhancing data processing efficiency. Proven ability to lead cross-functional teams and improve project delivery timelines through strategic initiatives.

Overview

19 years of professional experience
1 Certification

Work History

Principal Consultant - DE

Factspan Inc.
Orlando
09.2022 - Current
  • DPDF is a platform that migrates the existing Enterprise Data Lake functionality from AWS to the Google Cloud platform, adding features such as metadata-driven no-code ETL pipeline generation, which standardizes incoming data, enables onboarding of new data sources into the Data Lake, and accelerates ETL development. It facilitates automation with real-time schema management. The metadata-driven ETL framework centers on templates for rule management, data migration controls, exception handling, integration, and transformation rules. Within the framework, data schemas, data source locations, job control parameters, and error-handling logic are stored in configuration files, enabling quick replication and addition of new sources. The key element at the center is the metadata portion, which allows mappings, targets and sources, and transformation rules to be defined once and reused.
  • Roles & Responsibilities:
  • Developed reduce, reuse, recycle (RRR) utilities and accelerators to improve data engineering project delivery timelines.
  • Designed and led a team to build a metadata-driven ETL/Ingestion framework on the AWS and GCP cloud data platforms.
  • Coordinated with the full-stack development team to create an ETL tool simulation and user interfaces that enable data engineering teams to build their no-code ETL/ingestion pipelines.
  • Troubleshot cloud platform architecture issues and created alternative fixes to resolve performance degradation.
  • Designed and developed a streaming pipeline for Primacy Determination using Kafka to publish/consume messages to/from vendor systems, updating the outcomes in Snowflake tables that support Coordination of Benefits (COB) applications.
  • Developed a cloud file exchange (CFX) utility layer that transfers files to an intermediate S3 File Gateway and invokes an AWS Lambda function to apply the Protegrity data tokenization API to PHI/PII-sensitive data, collecting vault access tokens to place the processed files in S3/GCS buckets.
  • Created an orchestration layer initiated by file event triggers from the GCS bucket: Pub/Sub invokes a Cloud Function that reads metadata entries in a MySQL Cloud SQL table and triggers the respective Airflow ETL orchestrator job, which runs file ingestion to Snowflake (SFCopy)/PostgreSQL and post-load DML executions using GKE images or Dataproc with Spark SQL to generate the extracts.
  • Created Dockerfiles to build images for the GCS copy, execute-SQL, and execute-py jobs, bundling the respective source code with Kubernetes deployment and service configuration files, and triggered job execution from the ingestion framework using the GKE images.
  • Developed and deployed data processing solutions with Azure Databricks.
  • Converted existing business logic formulas from PL/SQL procedures to Snowflake SQL scripting stored procedures.
  • Developed and modified Terraform scripts to deploy code enhancements and configuration changes to serverless cloud resources, i.e., Cloud Functions and Lambda functions.

Solution Architect

Factspan Analytics Pvt ltd.
Bengaluru
05.2017 - 09.2022
  • Conducted root cause analysis on long-running Hadoop jobs and optimized performance through session-level options.
  • Defined coding best practices, enhancing Hive and Spark job performance.
  • Resolved performance issues in Hive and Pig scripts by applying knowledge of Joins, Group, and aggregation.
  • Proposed and initiated migration of Hadoop cluster from on-premises infrastructure to GCP cloud.
  • Developed technical design documents for ETL pipeline conversion from Hadoop to GCP using Airflow.
  • Migrated critical executive reporting jobs from Hadoop to BigQuery, scheduling ETL and Tableau reporting in GCP.
  • Monitored GCP resource usage, troubleshooting production ETL job failures across 40+ applications.
  • Converted Adobe clickstream ingestion framework from Scala to PySpark.

Solution Architect

Factspan
Bengaluru
02.2021 - 08.2022
  • Provisioned SSIS Catalog and Azure-SSIS Integration runtime for executing migrated SSIS packages.
  • Converted SSIS packages and T-SQL procedures into PySpark programs to enhance data processing efficiency.
  • Utilized Databricks to design data orchestration workflows, improving operational capabilities.
  • Developed an implementation plan for optimizing distributed workspaces in Databricks using Unity Catalog for enhanced data governance.
  • Applied expertise in Delta Lake, medallion architecture, and Lakehouse architecture to manage Delta Live Tables.
  • Implemented Slowly Changing Dimension Type 2 on Databricks with Delta Lake for improved change tracking.
  • Led project on ADLS Gen1 to Gen2 migration, focusing on decommissioning strategies and leveraging new technologies.
  • Executed ETL refactoring from SSIS to ADF while migrating the data warehouse to Synapse Analytics.

Lead Consultant

Genpact India Pvt Ltd
05.2015 - 08.2016
  • Trip Optimizer
  • Environment: JDK 1.7, Talend for Big Data DI, Eclipse IDE, SQL Developer, pgAdmin, GPDB, Java, Rally, Linux scripts, Hortonworks Distribution for Hadoop, HDFS, MapReduce, Pig, Hive, Sqoop, Spark, Maven, SVN, and Agile.

Programmer Analyst

System Soft Technologies – Comcast Cables
10.2013 - 11.2014
  • Comcast Cables
  • Environment: Red Hat Enterprise Linux 6, Informatica Power Center 9.0.1, Oracle 11g, Teradata 13.10, HDP 2.x, HDFS, MapReduce, PIG, Hive, Sqoop, Flume, HBase, Golden Gate, Flat Files, XML Files, Cognos 10.2.1, SQL Server 2008, Control-m, Tortoise SVN, Agile - Rally, JIRA bug tracking tool, Apache Spark, Talend open studio for Data Integration 5.4, Informatica PDO, Teradata BTEQ, Teradata parallel transporter.

Senior Associate - Projects

Cognizant Technology Solutions – JP Morgan Chase
10.2012 - 10.2013
  • ICDW
  • Environment: AIX 6.1, Informatica Power Center 9.0.1, PowerExchange9.0.1, Oracle 11g, Flat Files, XML Files, Mainframe, Teradata 13.10, Cognos 10.2, SQL Server 2008 SSIS, SSAS, SSRS, Control-m 6.4 & SVN.

Programmer Analyst

Reliable Software Resources Inc – Fidelity Investments
09.2011 - 10.2012
  • Finance Data Hub
  • Environment: RHEL 5.5 (Tikanga), Informatica Power Center 9.0.1, PowerExchange 9.0.1, Oracle 10g, SQL Server 2008, Flat Files, XML Files, NZ 1000 -TF12, OBIEE 10g & Control-m 6.4 & SVN.

Software Engineer

Great Falls Software Solutions – PNC Bank
01.2009 - 05.2011
  • EIP- Basel II
  • Environment: AIX 5.3.0.0, Z/OS 390, VSAM KSDS Files, Informatica PowerCenter 8.6.1, PowerExchange 8.6.1, Oracle 10g with Exadata, Toad, Oracle BI EE 10.x, CA Autosys Engine 11.x, CA Erwin, CA SCM (Harvest) 12.x, ChangeMan DS (Serena) 5.x, Harvest.

Software Engineer

Prodigy Software Group
08.2008 - 01.2009
  • HR - Human Capital Management
  • Environment: Oracle 9i/10g, PL/SQL, PERL, UNIX Shell, Solaris 10, Informatica 7.X.

Software Engineer

HCL Technologies, Chennai – Merck Pharmaceuticals
03.2007 - 04.2008
  • OMCDS - Order Management Control Decision Support System
  • Environment: Solaris 9, Oracle 10G, Toad, Erwin, DB2, SAP R/3, XML Files, Teradata V2R5, Informatica 8.1 Power Center, Power Exchange Change Data Capture (CDC), Trillium & Cognos 8.2.

Software Engineer

Infinite Computer Solutions (India) Ltd
09.2006 - 03.2007
  • COMFIN HYPERION
  • Environment: Informatica 7.1, Oracle 9i, Flat Files, SQL Server 2000, Appworx, Perl, Hyperion & Solaris.

Education

Bachelor of Engineering - Computer Science

University of Madras
Chepauk, Chennai
06-2004

Skills

  • Expertise in:
  • Building batch ingestion triggers using AWS S3 file events with Lambda and GCP GCS file events with Pub/Sub triggers
  • Data Lake and Data Warehouse/Data Mart development using the Hadoop framework, Databricks, Talend Big Data Integration, Informatica PowerCenter 6.2/7.1/8.1.6, PowerExchange, T-SQL, PL/SQL, Teradata utilities, UNIX, SED/AWK, and Perl
  • Designing and developing ETL processes for loading data from heterogeneous source systems such as Flat Files, XML, Oracle, SQL Server, MQSeries 5.2, VSAM Files, SAP R/3, DB2, and Teradata using Talend DI, Informatica, BTEQ, T-SQL, PL/SQL, UNIX & Perl
  • Writing Teradata MultiLoad, FastLoad & BTEQ scripts and using the TPump utility to load near-real-time data into the Data Warehouse
  • Well-versed with Oracle, SQL Server & Teradata Database Architecture
  • Deft in writing PL/SQL stored procedures, functions, packages & triggers, UNIX scripts, BTEQ scripts, SQL and SQL*Loader (LDR) scripts, and Perl scripts using the Net::FTP, Net::Telnet & DBI modules
  • Conversant with PL/SQL subroutines using Bulk Collect, Dynamic SQL, UTL_FILE, REF CURSORs, SQL analytical functions, and query optimization
  • Fine-tuning complex mappings, PL/SQL Stored Procedures, and SQL to obtain optimal performance and throughput
  • RDBMS: Teradata 13/V2R5, Oracle 10g/9i/8i/7.x, Amazon Redshift, SQL Server 6.5/7.0/2000/2005, MS Access 7.0 & DB2 8.0
  • ETL Tools: Talend Big Data DI, Informatica PowerCenter 8.6/8.1/7.x/6.x, PowerConnect, PowerExchange, SQL Server SSIS & ODI, Apache Kafka
  • Operating Systems: UNIX, Linux, Mac & Windows XP/NT/2000
  • Data Modelling: Erwin 3.5.2/4.0
  • Languages/Utilities: MapReduce, Pig, Sqoop, Hive, Oozie, TPT, BTEQ, SQL, PL/SQL, Perl, Core Java, XML
  • Other Software: TOAD, Crystal Reports 8.0/6.0/4.6, AutoSys, ClearQuest, ClearCase, Mercury & Appworx

Certification

  • Oracle Database 10g Administrator Certified Associate
  • Informatica Certified Developer
  • Cloudera Certified Developer for Apache Hadoop CCDH
  • Talend Data Integration for Developer 6.0
  • Hortonworks HDP Certified Administrator
  • SnowPro Core certification
  • AWS Certified Developer
  • AWS Certified SysOps Administrator Associate
  • AWS Certified DevOps Engineer Professional
  • dbt Fundamentals

Timeline

Principal Consultant - DE

Factspan Inc.
09.2022 - Current

Solution Architect

Factspan
02.2021 - 08.2022

Solution Architect

Factspan Analytics Pvt ltd.
05.2017 - 09.2022

Lead Consultant

Genpact India Pvt Ltd
05.2015 - 08.2016

Programmer Analyst

System Soft Technologies – Comcast Cables
10.2013 - 11.2014

Senior Associate - Projects

Cognizant Technology Solutions – JP Morgan Chase
10.2012 - 10.2013

Programmer Analyst

Reliable Software Resources Inc – Fidelity Investments
09.2011 - 10.2012

Software Engineer

Great Falls Software Solutions – PNC Bank
01.2009 - 05.2011

Software Engineer

Prodigy Software Group
08.2008 - 01.2009

Software Engineer

HCL Technologies, Chennai – Merck Pharmaceuticals
03.2007 - 04.2008

Software Engineer

Infinite Computer Solutions (India) Ltd
09.2006 - 03.2007

Bachelor of Engineering - Computer Science

University of Madras