Senior Data Engineer (Python, AWS)
About the Role
We are looking for a Senior Data Engineer to design and build optimized data pipelines, in on-prem or cloud environments, that drive analytic insights.
Responsibilities
- Create conceptual, logical, and physical data models.
- Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of sources, using technologies such as Hadoop, Spark, and AWS Lambda.
- Lead and/or mentor a small team of data engineers.
- Design, develop, test, deploy, maintain, and improve data integration pipelines.
- Develop pipeline objects using Apache Spark (PySpark) with Python or Scala.
- Design and develop data pipeline architectures using Hadoop, Spark, and related AWS services.
- Load-test and performance-test data pipelines built with the above technologies.
- Communicate effectively with client leadership and business stakeholders.
- Participate in proposal and/or statement of work (SOW) development.
Requirements
- Professional work experience in a strategic or management consulting (customer-facing) role, in an onshore capacity, is highly preferred.
- 5+ years of professional work experience designing and implementing data pipelines in on-prem and cloud environments is required.
- 5+ years of experience building conceptual, logical, and/or physical database designs using tools such as ERwin, Visio, or Enterprise Architect.
- Strong hands-on experience implementing big-data solutions in the Hadoop ecosystem (Apache Hadoop, MapReduce, Hive, Pig, Sqoop, NoSQL, etc.) and/or Databricks is required.
- 3+ years of experience with cloud platforms (AWS preferred; GCP or Azure) and with Python programming and frameworks (e.g., Django, Flask, Bottle) is required.
- 5+ years of experience working with one or more databases such as Snowflake, AWS Redshift, Oracle, SQL Server, Teradata, Netezza, Hadoop, MongoDB, or Cassandra is required.
- Expert-level knowledge of using SQL to write complex, highly optimized queries across large volumes of data is required.
- 3+ years of professional hands-on experience with one or more ETL tools (e.g., Talend Big Data, Informatica, DataStage, Ab Initio) to build data pipelines/data warehouses is highly preferred.
- 3+ years of hands-on programming experience using Scala, Python, R, or Java is required.
- 2+ years of professional work experience implementing ETL pipelines using AWS services (e.g., Glue, Lambda, EMR, Athena, S3, SNS, Kinesis, Data Pipeline) and PySpark.
- 2+ years of professional work experience with real-time streaming systems (Kafka/Kafka Connect, Spark Streaming, Flink, or AWS Kinesis) is required.
- Knowledge of or experience with architectural best practices for building data lakes is required.
- Strong problem-solving and troubleshooting skills, with the ability to exercise mature judgment.
- Ability to work independently and provide guidance to junior data engineers.
- Ability to build and maintain strong customer relationships.
- Bachelor's or Master's degree in Computer Science, Engineering, Information Systems, or a related field is required.
- Open to up to 25% domestic travel to the client location, as required by the client.