Lakshman Dhullipalla

Cumming

Summary

Accomplished Data Engineer with expertise in building metadata-driven ETL frameworks at Elevance Health. Proven ability to optimize data processing solutions using Informatica IDMC, AWS, and GCP, enhancing project delivery timelines. Strong analytical skills complemented by effective team leadership, driving successful migrations and performance improvements in complex data environments.

Overview

16 years of professional experience
1 Certification

Work History

Data Engineer

Elevance Health - Contract
Atlanta
09.2022 - Current
  • DPDF is a platform that migrates the existing Enterprise Data Lake functionality from AWS to the Google Cloud platform and adds features such as metadata-driven, no-code ETL pipeline generation, which standardizes incoming data, allows onboarding of different data sources into the Data Lake, and accelerates ETL development with automated, real-time schema management. The metadata-driven ETL framework is built on templates for rule management, data migration controls, exception handling, integration, and transformation rules. Data schemas, data source locations, job control parameters, and error-handling logic are stored in configuration files, enabling quick replication and addition of new sources. The central metadata layer enables the definition and reuse of mappings, sources and targets, and transformation rules (a hypothetical configuration entry is sketched after this list).
  • Roles & Responsibilities:
  • Developing reduce, reuse, recycle (RRR) Utilities and accelerators to improve the data engineering project delivery timelines.
  • Designed and led a team to build a metadata-driven ETL/Ingestion framework on the AWS and GCP cloud data platforms.
  • Troubleshooting pushdown optimization (PDO) between Informatica and Teradata across the Staging, Integration, and Semantic layers.
  • Coordinated with the full-stack development team to create an ETL tool simulation and user interfaces that enable data engineering teams to build their own no-code ETL/ingestion pipelines.
  • Troubleshooting cloud platform architecture issues and creating alternative fixes to resolve performance degradation.
  • Designed and developed a streaming pipeline for Primacy Determination that publishes/consumes Kafka messages to/from vendor systems and writes the outcomes to Snowflake tables supporting Coordination of Benefits (COB) applications.
  • Developed a cloud file exchange (CFX) utility layer that transfers files to an intermediate S3 file gateway and invokes an AWS Lambda function, which applies the Protegrity data tokenization API to tokenize PHI/PII-sensitive data, collects vault access tokens, and places the processed files in S3/GCS buckets.
  • Created an orchestration layer initiated by file event triggers: GCS bucket → Pub/Sub → Cloud Function reads the metadata entries in a MySQL Cloud SQL table to invoke the respective Airflow ETL orchestrator job, which triggers file ingestion to Snowflake (SF Copy), PostgreSQL, and post-load DML executions using GKE images/Dataproc with Spark SQL to generate the extracts… (a minimal sketch of this trigger follows this list).
  • Created Dockerfiles to build images for the GCS copy, execute-SQL, and execute-Python… tasks, bundling the respective source code with Kubernetes deployment and service configuration files to generate the images and trigger job execution with the GKE images from the ingestion framework.
  • Developed and deployed data processing solutions with Azure Databricks.
  • Converted existing business logic formulas from PL/SQL procedures to Snowflake SQL scripting stored procedures.
  • Develop/modify Terraform scripts to deploy code enhancements and config changes to serverless cloud resources, i.e. Cloud Functions and Lambda functions.
  • Environment: Informatica, GCP (GCS, Dataproc, Cloud Composer, Cloud SQL, Cloud Functions, BigQuery), AWS (Step Functions, RDS, Lambda, Glue, EMR, S3, EC2), Kafka, Hive, Sqoop, PySpark, Scala, Python 3.8, Oracle, Netezza, DB2, SQL Server, Teradata, TPT, Snowflake, Terraform, Kubernetes.
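
The configuration-file approach in the first bullet can be illustrated with a minimal sketch; the source name, field names, and locations below are hypothetical assumptions for illustration, not the actual DPDF metadata schema.

```python
# Hypothetical illustration of a metadata-driven onboarding entry; field names,
# source name, and locations are assumptions for the sketch, not the real DPDF schema.
import json

SOURCE_METADATA = {
    "source_name": "claims_feed",
    "source_location": "gs://landing-bucket/claims/",   # data source location
    "target_table": "edl_staging.claims",                # target definition
    "schema": [
        {"name": "claim_id", "type": "STRING"},
        {"name": "member_id", "type": "STRING"},
        {"name": "paid_amount", "type": "NUMERIC"},
    ],
    "job_control": {"load_type": "incremental", "watermark_column": "load_ts"},
    "error_handling": {"on_reject": "quarantine", "max_errors": 100},
    "transformation_rules": ["trim_strings", "standardize_dates"],
}

def onboard_source(metadata: dict) -> None:
    """Generic driver: the framework reads the entry and builds the ingestion job
    from configuration instead of hand-written ETL code."""
    print(f"Loading {metadata['source_name']} from {metadata['source_location']} "
          f"into {metadata['target_table']} ({metadata['job_control']['load_type']})")

if __name__ == "__main__":
    onboard_source(SOURCE_METADATA)
    # Persisting the entry as a config file is what makes adding a new source a
    # configuration change rather than new pipeline code.
    print(json.dumps(SOURCE_METADATA, indent=2))
```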
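A minimal sketch of the Pub/Sub-triggered routing function from the orchestration bullet, assuming a hypothetical file_routing metadata table, environment-variable connection settings, and the Airflow 2 stable REST API; Composer authentication and error handling are omitted for brevity.

```python
# Hedged sketch: Cloud Function fired by a GCS file-event Pub/Sub message looks up the
# owning DAG in a Cloud SQL (MySQL) metadata table and triggers the Airflow run.
import base64
import json
import os

import pymysql
import requests


def route_file_event(event, context):
    """Entry point for a Pub/Sub-triggered Cloud Function (GCS notification payload)."""
    message = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    bucket, object_name = message["bucket"], message["name"]

    # Look up which ingestion DAG owns this file pattern (hypothetical table layout).
    conn = pymysql.connect(
        host=os.environ["CLOUDSQL_HOST"],
        user=os.environ["CLOUDSQL_USER"],
        password=os.environ["CLOUDSQL_PASSWORD"],
        database="ingestion_metadata",
    )
    try:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT dag_id FROM file_routing WHERE %s LIKE file_pattern LIMIT 1",
                (object_name,),
            )
            row = cur.fetchone()
    finally:
        conn.close()

    if row is None:
        print(f"No metadata entry for gs://{bucket}/{object_name}; skipping")
        return

    dag_id = row[0]
    # Trigger the Airflow/Composer DAG via the stable REST API (in practice Composer
    # requires an IAM identity token; omitted here).
    requests.post(
        f"{os.environ['AIRFLOW_WEB_URL']}/api/v1/dags/{dag_id}/dagRuns",
        json={"conf": {"bucket": bucket, "object": object_name}},
        timeout=30,
    )
    print(f"Triggered {dag_id} for gs://{bucket}/{object_name}")
```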

Data Engineer

Macy's - Contract
Atlanta
05.2017 - 09.2022
  • Project Title: Marketing Analytics.
  • Period: May 2017 – September 2022
  • Roles and Responsibilities:
  • Performed root cause analysis of long-running Hadoop jobs and optimized them with different session-level option settings.
  • Defined guidelines and implemented coding best practices to improve the performance of Hive and Spark jobs.
  • As part of the pilot project implementation, converted a combination of moderate to complex Informatica workflows and mappings from Informatica PowerCenter to IICS, including regression and performance testing between the data generated by the PowerCenter version and that generated by IICS.
  • Developed Informatica mappings using the Teradata Parallel Transporter API to import data from various source systems, Mainframe VSAM, XML, and flat files into the Teradata staging database.
  • Worked with Informatica command line utilities infacmd, pmcmd, and pmrep for upgrades, running workflows, and creating and enabling repository services.
  • Developed reusable transformations and mapplets using various Informatica transformations such as Aggregator, Joiner, Router, Expression, Lookup, Update Strategy, Sequence Generator, and Source Qualifier; created Control-M jobs and Autosys JILs according to business needs and ETL support group requirements to schedule nightly Informatica ETL workflow executions.
  • Solved performance issues in Hive and Pig scripts with an understanding of Joins, Group, and aggregation, and how they translate to MapReduce jobs.
  • Proposed and initiated Hadoop cluster migration from on-premises bare metal data center infrastructure to GCP cloud.
  • Created technical design documents to convert the ETL pipelines from the Hadoop platform to the GCP cloud platform using Airflow/Composer DAGs and operators.
  • Migrated executive reporting critical jobs from the Hadoop cluster to BigQuery and scheduled the ETL and Tableau reporting jobs using the GCP cloud composer environment.
  • Responsible for GCP resource usage monitoring, troubleshooting/fixing production ETL job failures for 40+ applications.
  • Converted the Adobe clickstream ingestion framework from Scala to PySpark.
  • Peer-reviewing and approving the code check-ins by team members.
  • Configured CI/CD pipelines using Jenkins, Bitbucket, GitLab CI to deploy the ETL pipeline code changes to higher environments – SIT, UAT, Pre-prod, Prod.
  • Provided technical guidance to the development team to implement/execute the pykafka code on containerized Spark in the YARN environment.
  • Good understanding of Partitions and bucketing concepts in Hive and designed both Managed and External tables in Hive to optimize performance.
  • Created and managed Sqoop jobs with incremental load to populate Hive External tables.
  • Designed and developed Sqoop import/export scripts to transfer data from Teradata to HDFS and HDFS to Teradata using Teradata Hadoop Connector.
  • Created reusable Linux scripts to back up/restore HDFS files between HDFS and an S3 bucket using distcp, passing the HDFS path, S3 keys, S3 bucket location, and user values as input parameters (a sketch of this wrapper follows this list).
  • Environment: Informatica – IDQ, Enterprise Data Catalog (EDC), AXON, IICS, Hortonworks HDP 2.6.3, MySQL, Oracle, Teradata 15, RHEL 6.7/7.4, TensorFlow 1.3.1, Python 3.6.1, h2o-3.10.5.4-hdp2.6, JupyterHub, Hive, Sqoop, PySpark, GCP, and DevOps tools – Terraform, Ansible, Grafana, Prometheus, Kubernetes.
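
A minimal Python rendering of the HDFS/S3 backup-restore wrapper described above (the original was a Linux shell script); the argument names and S3A property usage are illustrative assumptions, not the production script.

```python
# Hedged sketch: wrapper around hadoop distcp for backing up HDFS paths to S3 (or restoring),
# with the HDFS path, S3 location, and keys passed as input parameters.
import argparse
import subprocess


def run_distcp(src: str, dest: str, access_key: str, secret_key: str) -> None:
    """Invoke hadoop distcp with S3A credentials passed as Hadoop properties."""
    cmd = [
        "hadoop", "distcp",
        f"-Dfs.s3a.access.key={access_key}",
        f"-Dfs.s3a.secret.key={secret_key}",
        src, dest,
    ]
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Backup/restore files between HDFS and S3")
    parser.add_argument("--hdfs-path", required=True)   # e.g. hdfs:///data/marketing
    parser.add_argument("--s3-path", required=True)     # e.g. s3a://backup-bucket/marketing
    parser.add_argument("--access-key", required=True)
    parser.add_argument("--secret-key", required=True)
    parser.add_argument("--restore", action="store_true",
                        help="copy S3 -> HDFS instead of HDFS -> S3")
    args = parser.parse_args()

    if args.restore:
        run_distcp(args.s3_path, args.hdfs_path, args.access_key, args.secret_key)
    else:
        run_distcp(args.hdfs_path, args.s3_path, args.access_key, args.secret_key)
```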

Lead Consultant

Genpact Headstrong
05.2015 - 08.2016

Trip Optimizer

Environment: JDK 1.7, Talend for Big Data DI, Eclipse IDE, SQL Developer, pgAdmin, GPDB, Java, Rally, Linux scripts, Hortonworks Distribution for Hadoop, HDFS, MapReduce, Python, PySpark, Pig, Hive, Sqoop, Spark, Maven, SVN, Agile.

Big Data Engineer

Comcast Cable - Contract
Westchester
10.2013 - 11.2014

National Data Warehouse

Environment: Red Hat Enterprise Linux 6, Informatica Power Center 9.0.1, Oracle 11g, Teradata 13.10, HDP 2.x, HDFS, MapReduce, Pig, Hive, Sqoop, Flume, HBase, Golden Gate, Flat Files, XML Files, Cognos 10.2.1, SQL Server 2008, Control-M, Tortoise SVN, Agile - Rally, JIRA bug tracking tool, Apache Spark, Talend Open Studio for Data Integration 5.4, Informatica PDO, Teradata BTEQ, Teradata Parallel Transporter

ETL Lead Developer

JPMorgan Chase - Contract
Dallas
10.2012 - 10.2013

ICDW

Environment: AIX 6.1, Informatica Power Center 9.0.1, PowerExchange 9.0.1, Oracle 11g, Flat Files, XML Files, Mainframe, Teradata 13.10, Cognos 10.2, SQL Server 2008 SSIS, SSAS, SSRS, Control-M 6.4 & SVN

Programmer Analyst

Fidelity Investments - Contract
Boston
09.2011 - 10.2012

Finance Data Hub

Environment: RHEL 5.5 (Tikanga), Informatica Power Center 9.0.1, PowerExchange 9.0.1, Oracle 10g, SQL Server 2008, Flat Files, XML Files, NZ 1000-TF12, OBIEE 10g & Control-M 6.4 & SVN

ETL Informatica Developer

PNC Bank - Contract
Cleveland
03.2010 - 04.2011

EIP- Basel II

Environment: AIX 5.3.0.0, Z/OS 390, VSAM KSDS Files, Informatica PowerCenter 8.6.1, PowerExchange 8.6.1, Oracle 10g with Exadata, Toad, Oracle BI EE 10.x, CA Autosys Engine 11.x, CA Erwin, CA SCM (Harvest) 12.x, ChangeMan DS (Serena) 5.x

Software Engineer

Great Falls Software Solutions
Reston
01.2009 - 03.2010

Corporate Executive Board, Rosslyn, VA (Jun 2009 - Feb 2010)
Role: ETL Lead Developer
EDW

Environment: Solaris 9, Informatica 8.6, Oracle 10g, Windows 2003 R2, SQL Server 2005, SQL Server 2005 BIDS, Management Studio, SSIS, .NET Framework, Business Objects XI, SourceGear Vault 3.1.8, Dragnet, Autosys

Education

BE - Computer Science

University of Madras
Chepauk, Chennai
06-2004

Skills

  • Expertise in:
  • Building batch ingestion triggers using AWS S3 file events with Lambda and GCP GCS file events with Pub/Sub triggers
  • Data Lake, Data Warehouse/Data Mart development using the Hadoop framework, Databricks, Talend Big Data Integration, Informatica PowerCenter 6.2/7.1/8.1/8.6, Power Exchange, T-SQL, PL/SQL, Teradata utilities, UNIX, SED/AWK, and Perl
  • Designing & developing the ETL process for loading data from heterogeneous source systems like Flat Files, XML, Oracle, SQL Server, MQSeries 5.2, VSAM Files, SAP R/3, DB2, and Teradata using Talend DI, Informatica, BTEQ, T-SQL, PL/SQL, UNIX & Perl
  • Writing Teradata Multiload, Fast Load & BTEQ Scripts and using Tpump Utility to load near-real-time data into Data Warehouse
  • Well-versed with Oracle, SQL Server & Teradata Database Architecture
  • Deft in writing PL/SQL stored procedures, functions, packages & triggers, UNIX Scripts, BTEQ scripts and SQL
  • LDR Scripts, Perl Scripts using Net::FTP, Net::Telnet & DBI modules
  • Conversant with PL/SQL subroutines using Bulk Collects, Dynamic SQL, Utl_file, Ref Cursors, SQL Analytical Functions, and Query Optimization
  • Fine-tuning of Informatica mappings, PL/SQL Stored Procedures, and SQL to obtain optimal performance and throughput
  • RDBMS: Teradata 13, V2R5, Oracle 10g/9i/8i/7.x, Amazon Redshift, SQL Server 6.5/7.0/2000/2005, MS Access 7.0 & DB2 8.0
  • ETL Tools: Talend Big Data DI, Informatica Power Center 8.6/8.1/7.x/6.x, Power Connect, Power Exchange, SQL Server SSIS & ODI, Apache Kafka
  • Operating System: UNIX, Linux, Mac & Windows XP/NT/2000
  • Data Modelling: Erwin 35/24/0
  • Languages/Utilities: MapReduce, Pig, Sqoop, Hive, Oozie, TPT, BTEQ, SQL, PL/SQL, Perl, Core Java, XML
  • Other Software: TOAD, Crystal Reports 8.0/6.0/4.6, AutoSys, ClearQuest, ClearCase, Mercury & Appworx

Certification

  • Oracle Database 10g Administrator Certified Associate
  • Informatica Certified Developer
  • Cloudera Certified Developer for Apache Hadoop CCDH
  • Talend Data Integration for Developer 6.0
  • Hortonworks HDP Certified Administrator
  • SnowPro Core certification
  • AWS Certified Developer
  • AWS Certified SysOps Administrator Associate
  • AWS Certified DevOps Engineer Professional
  • Databricks Fundamentals

References

References available upon request.

Timeline

Data Engineer

Elevance Health - Contract
09.2022 - Current

Data Engineer

Macy's - Contract
05.2017 - 09.2022

Lead Consultant

Genpact Headstrong
05.2015 - 08.2016

Big Data Engineer

Comcast Cable - Contract
10.2013 - 11.2014

ETL Lead Developer

JPMorgan Chase - Contract
10.2012 - 10.2013

Programmer Analyst

Fidelity Investments - Contract
09.2011 - 10.2012

ETL Informatica Developer

PNC Bank - Contract
03.2010 - 04.2011

Software Engineer

Great Falls Software Solutions
01.2009 - 03.2010

BE - Computer Science

University of Madras