Principal Data Engineer
- You will provide architectural and engineering leadership to create and enhance data solutions. These solutions will enable seamless integration and flow of data across our data ecosystem, which includes AWS. Additionally, you will provide senior-level technical consulting to peer data engineers during design and development for highly complex and critical data projects.
- Some of these projects will include designing and developing data ingestion and processing/transformation frameworks leveraging AWS services and open source tools such as Lambda, Step Functions, Java, Python, Spark, etc.
- You will interact with data stores such as S3, Redshift, Snowflake, and NoSQL DBs. Additionally, you will work on real-time processing solutions using tools such as Kafka and Kinesis.
- You will deploy application code using CI/CD tools and techniques and provide support for deployed data applications and analytical models.
In this role, you will have the following duties:
- Develop data-driven solutions utilizing current and next-generation technologies to meet evolving business needs.
- Quickly identify opportunities and recommend possible technical solutions.
- Utilize multiple development languages/tools such as Python, Spark (in Scala), and Java to build prototypes and evaluate results for effectiveness and feasibility.
- Work heavily within the AWS ecosystem, using AWS services.
- Operationalize open source data-analytic tools for enterprise use.
- Develop real-time data ingestion and stream-analytic solutions leveraging technologies such as Kafka, Apache Spark, NiFi, Python, Kinesis, and Hadoop/EMR.
- Develop custom data pipelines (cloud and locally hosted).
- Work heavily within the Hadoop ecosystem.
- Support deployed data applications and analytical models by acting as a trusted advisor to Data Scientists and other data consumers, identifying data problems, and guiding issue resolution with partner Data Engineers and source data providers.
- Provide subject matter expertise in analysis and in the preparation of specifications and plans for the development of data processes.
- Ensure proper data governance policies are followed by implementing or validating data lineage, quality checks, classification, etc.
Technical Skills / Experience
- AWS data services (Lambda, Glue, EMR, Kinesis, Step Functions, Data Pipeline)
- Deep understanding of the Hadoop technology stack
- Building custom NiFi processors
- Data pipeline development
- Experience in developing Python / R applications
- Spark application coding in Scala / Python (PySpark)
- Deep knowledge of SQL and relational databases
- Strong Unix / shell scripting skills
- Experience creating highly efficient HiveQL and Spark SQL queries, and the ability to educate peers on the topic
- Leadership skills: 7+ years of experience as a lead engineer, with the ability to coach and guide peer and junior engineers
- Excellent written and verbal communication, presentation, and professional speaking skills