Lakshman Dhullipalla

Cumming

Summary

Accomplished Data Engineer with expertise in building metadata-driven ETL frameworks at Elevance Health. Proven ability to optimize data processing solutions using Informatica IDMC, AWS, and GCP, enhancing project delivery timelines. Strong analytical skills complemented by effective team leadership, driving successful migrations and performance improvements in complex data environments.

Overview

16 years of professional experience
1 Certification

Work History

Data Engineer

Elevance Health - Contract
Atlanta
09.2022 - Current
  • DPDF is a platform that migrates the existing Enterprise Data Lake functionality from AWS to the Google Cloud platform and adds features such as metadata-driven, no-code ETL pipeline generation, which standardizes incoming data, allows onboarding of different data sources into the Data Lake, and accelerates ETL development with automated, real-time schema management. The metadata-driven ETL framework is built on templates for rule management, data migration controls, exception handling, integration, and transformation rules. Data schemas, data source locations, job control parameters, and error-handling logic are stored in configuration files, enabling quick replication and addition of new sources. The central metadata layer enables the definition and reuse of mappings, sources and targets, and transformation rules (a hypothetical configuration entry is sketched after this list).
  • Roles & Responsibilities:
  • Developing reduce, reuse, recycle (RRR) Utilities and accelerators to improve the data engineering project delivery timelines.
  • Designed and led a team to build a metadata-driven ETL/Ingestion framework on the AWS and GCP cloud data platforms.
  • Troubleshooting pushdown optimization (PDO) between Informatica and Teradata across the Staging, Integration, and Semantic layers.
  • Coordinated with the full-stack development team to create an ETL tool simulation and user interfaces that enable data engineering teams to build their own no-code ETL/ingestion pipelines.
  • Troubleshooting cloud platform architecture issues and creating alternative fixes to resolve performance degradation.
  • Designed and developed a streaming pipeline for Primacy Determination that publishes/consumes Kafka messages to/from vendor systems and writes the outcomes to Snowflake tables supporting Coordination of Benefits (COB) applications.
  • Developed a cloud file exchange (CFX) utility layer that transfers files to an intermediate S3 file gateway and invokes an AWS Lambda function, which applies the Protegrity data tokenization API to tokenize PHI/PII-sensitive data, collects vault access tokens, and places the processed files in S3/GCS buckets.
  • Created an orchestration layer initiated by file event triggers: GCS bucket → Pub/Sub → Cloud Function reads the metadata entries in a MySQL Cloud SQL table to invoke the respective Airflow ETL orchestrator job, which triggers file ingestion to Snowflake (SF Copy), PostgreSQL, and post-load DML executions using GKE images/Dataproc with Spark SQL to generate the extracts… (a minimal sketch of this trigger follows this list).
  • Created Dockerfiles to build images for the GCS copy, execute-SQL, and execute-Python… tasks, bundling the respective source code with Kubernetes deployment and service configuration files to generate the images and trigger job execution with the GKE images from the ingestion framework.
  • Developed and deployed data processing solutions with Azure Databricks.
  • Converted existing business logic formulas from PL/SQL procedures to Snowflake SQL scripting stored procedures.
  • Develop/modify Terraform scripts to deploy code enhancements and config changes to serverless cloud resources, i.e. Cloud Functions and Lambda functions.
  • Environment: Informatica, GCP (GCS, Dataproc, Cloud Composer, Cloud SQL, Cloud Functions, BigQuery), AWS (Step Functions, RDS, Lambda, Glue, EMR, S3, EC2), Kafka, Hive, Sqoop, PySpark, Scala, Python 3.8, Oracle, Netezza, DB2, SQL Server, Teradata, TPT, Snowflake, Terraform, Kubernetes.
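
The configuration-file approach in the first bullet can be illustrated with a minimal sketch; the source name, field names, and locations below are hypothetical assumptions for illustration, not the actual DPDF metadata schema.

```python
# Hypothetical illustration of a metadata-driven onboarding entry; field names,
# source name, and locations are assumptions for the sketch, not the real DPDF schema.
import json

SOURCE_METADATA = {
    "source_name": "claims_feed",
    "source_location": "gs://landing-bucket/claims/",   # data source location
    "target_table": "edl_staging.claims",                # target definition
    "schema": [
        {"name": "claim_id", "type": "STRING"},
        {"name": "member_id", "type": "STRING"},
        {"name": "paid_amount", "type": "NUMERIC"},
    ],
    "job_control": {"load_type": "incremental", "watermark_column": "load_ts"},
    "error_handling": {"on_reject": "quarantine", "max_errors": 100},
    "transformation_rules": ["trim_strings", "standardize_dates"],
}

def onboard_source(metadata: dict) -> None:
    """Generic driver: the framework reads the entry and builds the ingestion job
    from configuration instead of hand-written ETL code."""
    print(f"Loading {metadata['source_name']} from {metadata['source_location']} "
          f"into {metadata['target_table']} ({metadata['job_control']['load_type']})")

if __name__ == "__main__":
    onboard_source(SOURCE_METADATA)
    # Persisting the entry as a config file is what makes adding a new source a
    # configuration change rather than new pipeline code.
    print(json.dumps(SOURCE_METADATA, indent=2))
```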
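A minimal sketch of the Pub/Sub-triggered routing function from the orchestration bullet, assuming a hypothetical file_routing metadata table, environment-variable connection settings, and the Airflow 2 stable REST API; Composer authentication and error handling are omitted for brevity.

```python
# Hedged sketch: Cloud Function fired by a GCS file-event Pub/Sub message looks up the
# owning DAG in a Cloud SQL (MySQL) metadata table and triggers the Airflow run.
import base64
import json
import os

import pymysql
import requests


def route_file_event(event, context):
    """Entry point for a Pub/Sub-triggered Cloud Function (GCS notification payload)."""
    message = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    bucket, object_name = message["bucket"], message["name"]

    # Look up which ingestion DAG owns this file pattern (hypothetical table layout).
    conn = pymysql.connect(
        host=os.environ["CLOUDSQL_HOST"],
        user=os.environ["CLOUDSQL_USER"],
        password=os.environ["CLOUDSQL_PASSWORD"],
        database="ingestion_metadata",
    )
    try:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT dag_id FROM file_routing WHERE %s LIKE file_pattern LIMIT 1",
                (object_name,),
            )
            row = cur.fetchone()
    finally:
        conn.close()

    if row is None:
        print(f"No metadata entry for gs://{bucket}/{object_name}; skipping")
        return

    dag_id = row[0]
    # Trigger the Airflow/Composer DAG via the stable REST API (in practice Composer
    # requires an IAM identity token; omitted here).
    requests.post(
        f"{os.environ['AIRFLOW_WEB_URL']}/api/v1/dags/{dag_id}/dagRuns",
        json={"conf": {"bucket": bucket, "object": object_name}},
        timeout=30,
    )
    print(f"Triggered {dag_id} for gs://{bucket}/{object_name}")
```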

Data Engineer

Macy's - Contract
Atlanta
05.2017 - 09.2022
  • Project Title: Marketing Analytics.
  • Period: May 2017 – September 2022
  • Roles and Responsibilities:
  • Performed root cause analysis of long-running Hadoop jobs and optimized them with different session-level option settings.
  • Defined guidelines and implemented coding best practices to improve the performance of Hive and Spark jobs.
  • As part of the pilot project implementation, converted a combination of moderate to complex Informatica workflows and mappings from Informatica PowerCenter to IICS, including regression and performance testing between the data generated by the PowerCenter version and that generated by IICS.
  • Developed Informatica mappings using the Teradata Parallel Transporter API to import data from various source systems, Mainframe VSAM, XML, and flat files into the Teradata staging database.
  • Worked with Informatica command line utilities infacmd, pmcmd, and pmrep for upgrades, running workflows, and creating and enabling repository services.
  • Developed reusable transformations and mapplets using various Informatica transformations such as Aggregator, Joiner, Router, Expression, Lookup, Update Strategy, Sequence Generator, and Source Qualifier; created Control-M jobs and Autosys JILs according to business needs and ETL support group requirements to schedule nightly Informatica ETL workflow executions.
  • Solved performance issues in Hive and Pig scripts with an understanding of Joins, Group, and aggregation, and how they translate to MapReduce jobs.
  • Proposed and initiated Hadoop cluster migration from on-premises bare metal data center infrastructure to GCP cloud.
  • Created technical design documents to convert the ETL pipelines from the Hadoop platform to the GCP cloud platform using Airflow/Composer DAGs and operators.
  • Migrated executive reporting critical jobs from the Hadoop cluster to BigQuery and scheduled the ETL and Tableau reporting jobs using the GCP cloud composer environment.
  • Responsible for GCP resource usage monitoring, troubleshooting/fixing production ETL job failures for 40+ applications.
  • Converted the Adobe clickstream ingestion framework from Scala to PySpark.
  • Peer-reviewing and approving the code check-ins by team members.
  • Configured CI/CD pipelines using Jenkins, Bitbucket, GitLab CI to deploy the ETL pipeline code changes to higher environments – SIT, UAT, Pre-prod, Prod.
  • Provided technical guidance to the development team to implement/execute the pykafka code on containerized Spark in the YARN environment.
  • Good understanding of Partitions and bucketing concepts in Hive and designed both Managed and External tables in Hive to optimize performance.
  • Created and managed Sqoop jobs with incremental load to populate Hive External tables.
  • Designed and developed Sqoop import/export scripts to transfer data from Teradata to HDFS and HDFS to Teradata using Teradata Hadoop Connector.
  • Created reusable Linux scripts to back up/restore HDFS files between HDFS and an S3 bucket using distcp, passing the HDFS path, S3 keys, S3 bucket location, and user values as input parameters (a sketch of this wrapper follows this list).
  • Environment: Informatica – IDQ, Enterprise Data Catalog (EDC), AXON, IICS, Hortonworks HDP 2.6.3, MySQL, Oracle, Teradata 15, RHEL 6.7/7.4, TensorFlow 1.3.1, Python 3.6.1, h2o-3.10.5.4-hdp2.6, JupyterHub, Hive, Sqoop, PySpark, GCP, and DevOps tools – Terraform, Ansible, Grafana, Prometheus, Kubernetes.
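
A minimal Python rendering of the HDFS/S3 backup-restore wrapper described above (the original was a Linux shell script); the argument names and S3A property usage are illustrative assumptions, not the production script.

```python
# Hedged sketch: wrapper around hadoop distcp for backing up HDFS paths to S3 (or restoring),
# with the HDFS path, S3 location, and keys passed as input parameters.
import argparse
import subprocess


def run_distcp(src: str, dest: str, access_key: str, secret_key: str) -> None:
    """Invoke hadoop distcp with S3A credentials passed as Hadoop properties."""
    cmd = [
        "hadoop", "distcp",
        f"-Dfs.s3a.access.key={access_key}",
        f"-Dfs.s3a.secret.key={secret_key}",
        src, dest,
    ]
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Backup/restore files between HDFS and S3")
    parser.add_argument("--hdfs-path", required=True)   # e.g. hdfs:///data/marketing
    parser.add_argument("--s3-path", required=True)     # e.g. s3a://backup-bucket/marketing
    parser.add_argument("--access-key", required=True)
    parser.add_argument("--secret-key", required=True)
    parser.add_argument("--restore", action="store_true",
                        help="copy S3 -> HDFS instead of HDFS -> S3")
    args = parser.parse_args()

    if args.restore:
        run_distcp(args.s3_path, args.hdfs_path, args.access_key, args.secret_key)
    else:
        run_distcp(args.hdfs_path, args.s3_path, args.access_key, args.secret_key)
```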

Lead Consultant

Genpact Headstrong
05.2015 - 08.2016

Trip Optimizer

Environment: JDK 1.7, Talend for Big Data DI, Eclipse IDE, SQL Developer, pgAdmin, GPDB, Java, Rally, Linux scripts, Hortonworks Distribution for Hadoop, HDFS, MapReduce, Python, PySpark, Pig, Hive, Sqoop, Spark, Maven, SVN, Agile.

Big Data Engineer

Comcast Cable - Contract
Westchester
10.2013 - 11.2014

National Data Warehouse

Environment: Red Hat Enterprise Linux 6, Informatica Power Center 9.0.1, Oracle 11g, Teradata 13.10, HDP 2.x, HDFS, MapReduce, Pig, Hive, Sqoop, Flume, HBase, Golden Gate, Flat Files, XML Files, Cognos 10.2.1, SQL Server 2008, Control-M, Tortoise SVN, Agile - Rally, JIRA bug tracking tool, Apache Spark, Talend Open Studio for Data Integration 5.4, Informatica PDO, Teradata BTEQ, Teradata Parallel Transporter

ETL Lead Developer

JPMorgan Chase - Contract
Dallas
10.2012 - 10.2013

ICDW

Environment: AIX 6.1, Informatica Power Center 9.0.1, PowerExchange 9.0.1, Oracle 11g, Flat Files, XML Files, Mainframe, Teradata 13.10, Cognos 10.2, SQL Server 2008 SSIS, SSAS, SSRS, Control-M 6.4 & SVN

Programmer Analyst

Fidelity Investments - Contract
Boston
09.2011 - 10.2012

Finance Data Hub

Environment: RHEL 5.5 (Tikanga), Informatica Power Center 9.0.1, PowerExchange 9.0.1, Oracle 10g, SQL Server 2008, Flat Files, XML Files, NZ 1000-TF12, OBIEE 10g & Control-M 6.4 & SVN

ETL Informatica Developer

PNC Bank - Contract
Cleveland
03.2010 - 04.2011

EIP- Basel II

Environment: AIX 5.3.0.0, Z/OS 390, VSAM KSDS Files, Informatica PowerCenter 8.6.1, PowerExchange 8.6.1, Oracle 10g with Exadata, Toad, Oracle BI EE 10.x, CA Autosys Engine 11.x, CA Erwin, CA SCM (Harvest) 12.x, ChangeMan DS (Serena) 5.x

Software Engineer

Great Falls Software Solutions
Reston
01.2009 - 03.2010

Corporate Executive Board, Rosslyn, VA (Jun 2009 - Feb 2010)
Role: ETL Lead Developer
EDW

Environment: Solaris 9, Informatica 8.6, Oracle 10g, Windows 2003 R2, SQL Server 2005, SQL Server 2005 BIDS, Management Studio, SSIS, .NET Framework, Business Objects XI, SourceGear Vault 3.1.8, Dragnet, Autosys

Education

BE - Computer Science

University of Madras
Chepauk, Chennai
06-2004

Skills

  • Expertise in:
  • Building batch ingestion triggers using AWS S3 file events with Lambda and GCP GCS file events with Pub/Sub triggers
  • Data Lake, Data Warehouse/Data Mart development using the Hadoop framework, Databricks, Talend Big Data Integration, Informatica PowerCenter 6.2/7.1/8.1/8.6, Power Exchange, T-SQL, PL/SQL, Teradata utilities, UNIX, SED/AWK, and Perl
  • Designing & developing the ETL process for loading data from heterogeneous source systems like Flat Files, XML, Oracle, SQL Server, MQSeries 5.2, VSAM Files, SAP R/3, DB2, and Teradata using Talend DI, Informatica, BTEQ, T-SQL, PL/SQL, UNIX & Perl
  • Writing Teradata Multiload, Fast Load & BTEQ Scripts and using Tpump Utility to load near-real-time data into Data Warehouse
  • Well-versed with Oracle, SQL Server & Teradata Database Architecture
  • Deft in writing PL/SQL stored procedures, functions, packages & triggers, UNIX Scripts, BTEQ scripts and SQL
  • LDR Scripts, Perl Scripts using Net::FTP, Net::Telnet & DBI modules
  • Conversant with PL/SQL subroutines using Bulk Collects, Dynamic SQL, Utl_file, Ref Cursors, SQL Analytical Functions, and Query Optimization
  • Fine-tuning of Informatica mappings, PL/SQL Stored Procedures, and SQL to obtain optimal performance and throughput
  • RDBMS: Teradata 13, V2R5, Oracle 10g/9i/8i/7.x, Amazon Redshift, SQL Server 6.5/7.0/2000/2005, MS Access 7.0 & DB2 8.0
  • ETL Tools: Talend Big Data DI, Informatica Power Center 8.6/8.1/7.x/6.x, Power Connect, Power Exchange, SQL Server SSIS & ODI, Apache Kafka
  • Operating System: UNIX, Linux, Mac & Windows XP/NT/2000
  • Data Modelling: Erwin 35/24/0
  • Languages/Utilities: MapReduce, Pig, Sqoop, Hive, Oozie, TPT, BTEQ, SQL, PL/SQL, Perl, Core Java, XML
  • Other Software: TOAD, Crystal Reports 8.0/6.0/4.6, AutoSys, ClearQuest, ClearCase, Mercury & Appworx

Certification

  • Oracle Database 10g Administrator Certified Associate
  • Informatica Certified Developer
  • Cloudera Certified Developer for Apache Hadoop CCDH
  • Talend Data Integration for Developer 6.0
  • Hortonworks HDP Certified Administrator
  • SnowPro Core certification
  • AWS Certified Developer
  • AWS Certified SysOps Administrator Associate
  • AWS Certified DevOps Engineer Professional
  • Databricks Fundamentals

References

References available upon request.

Timeline

Data Engineer

Elevance Health - Contract
09.2022 - Current

Data Engineer

Macy's - Contract
05.2017 - 09.2022

Lead Consultant

Genpact Headstrong
05.2015 - 08.2016

Big Data Engineer

Comcast Cable - Contract
10.2013 - 11.2014

ETL Lead Developer

JPMorgan Chase - Contract
10.2012 - 10.2013

Programmer Analyst

Fidelity Investments - Contract
09.2011 - 10.2012

ETL Informatica Developer

PNC Bank - Contract
03.2010 - 04.2011

Software Engineer

Great Falls Software Solutions
01.2009 - 03.2010

BE - Computer Science

University of Madras