Seeking a skilled Data Engineer with a robust background in PySpark and extensive experience with AWS services, including Athena and EMR. The ideal candidate will be responsible for designing, developing, and optimizing large-scale data processing systems, ensuring efficient and reliable data flow and transformation.
Key Responsibilities :
- Data Pipeline Development : Design, develop, and maintain scalable data pipelines using PySpark to process and transform large datasets.
- AWS Integration : Utilize AWS services, including Athena and EMR, to manage and optimize data workflows and storage solutions.
- Data Management : Implement data quality, data governance, and data security best practices to ensure the integrity and confidentiality of data.
- Performance Optimization : Optimize and troubleshoot data processing workflows for performance, reliability, and scalability.
- Collaboration : Work closely with data scientists, analysts, and other stakeholders to understand data requirements and deliver solutions that meet business needs.
- Documentation : Create and maintain comprehensive documentation of data pipelines, ETL processes, and data architecture.
Required Skills and Qualifications :
- Education : Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- Experience : 5+ years of experience as a Data Engineer or in a similar role, with a strong emphasis on PySpark.
- Technical Expertise :
  o Proficient in PySpark for data processing and transformation.
  o Extensive experience with AWS services, specifically Athena and EMR.
  o Strong knowledge of SQL and database technologies.
  o Experience with Apache Airflow is a plus.
  o Familiarity with other AWS services such as S3, Lambda, and Redshift.
- Programming : Proficiency in Python; experience with other programming languages is a plus.
- Problem-Solving : Excellent analytical and problem-solving skills with attention to detail.
- Communication : Strong verbal and written communication skills to effectively collaborate with team members and stakeholders.
- Agility : Ability to work in a fast-paced, dynamic environment and adapt to changing priorities.