Talent.com
Principal ML Engineer (Infra / hardware)

Neurons Lab · Valencia, Comunidad Valenciana, Spain
3 days ago
Job description

About the project

We're looking for an experienced ML Infrastructure Engineer who has successfully delivered large-scale ML infrastructure optimization projects. The primary focus is migrating and optimizing computer vision models from NVIDIA GPU-based infrastructure to AWS Inferentia/Trainium while achieving both a performance boost and a cost reduction.

Current Infrastructure:

ML Models: RetinaFace, OpenPose, CLIP, and other CV models

Hardware: A10/T4 GPUs on EKS

Serving: Triton Inference Server

Orchestration: Mix of Kubernetes and Ray

Stage: Presale and Delivery

Duration: 2 months (preliminary)

Capacity: part-time (20h/week)

Areas of Responsibility

Technical Leadership:

Lead the architecture design for ML infrastructure modernization

Define compilation and optimization strategies for model migration

Establish performance benchmarking framework

Set up monitoring and alerting for the new infrastructure

Performance Optimization:

Implement efficient model compilation pipelines for Inferentia2

Optimize batch processing and memory layouts

Fine-tune model serving configurations

Ensure latency requirements are met across all services
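For illustration of the batch-processing work above: accelerators like Inferentia2 are typically compiled for static input shapes, so serving code pads the final partial batch to a fixed size and discards the padded outputs. A minimal, framework-free sketch (function and variable names are hypothetical, not part of any stated stack):

```python
from typing import List, Sequence, Tuple


def pad_to_batch(requests: Sequence, batch_size: int) -> List[Tuple[list, int]]:
    """Group incoming requests into fixed-size batches.

    Models compiled for static shapes expect every batch to be exactly
    `batch_size` long, so the last, partially filled batch is padded by
    repeating its final element; `n_real` records how many outputs to keep.
    """
    batches = []
    for i in range(0, len(requests), batch_size):
        chunk = list(requests[i:i + batch_size])
        n_real = len(chunk)
        while len(chunk) < batch_size:
            chunk.append(chunk[-1])  # pad with a copy to keep the shape static
        batches.append((chunk, n_real))
    return batches


# Example: 10 requests at batch size 4 -> two full batches plus one padded batch.
batches = pad_to_batch(list(range(10)), batch_size=4)
```

In production this logic usually lives behind a dynamic batcher (e.g. the one built into Triton Inference Server) rather than being hand-rolled, but the padding/unpadding contract is the same.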

Cost Optimization:

Analyze and optimize infrastructure costs

Implement efficient resource allocation strategies

Set up cost monitoring and reporting

Achieve target cost reduction while maintaining performance
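The cost-reduction target above ultimately reduces to cost per inference. A tiny sketch of that comparison; the instance rates and throughputs below are illustrative placeholders, not benchmarked figures for this project:

```python
def cost_per_million(images_per_sec: float, hourly_rate: float) -> float:
    """USD to process one million images at a sustained throughput."""
    seconds = 1_000_000 / images_per_sec
    return hourly_rate * seconds / 3600


# Illustrative numbers only; real rates and throughputs must be benchmarked.
gpu_cost = cost_per_million(images_per_sec=400, hourly_rate=1.006)   # e.g. an A10G instance
inf2_cost = cost_per_million(images_per_sec=500, hourly_rate=0.758)  # e.g. an inf2 instance
savings = 1 - inf2_cost / gpu_cost
```

The point of the exercise: a migration only "achieves target cost reduction while maintaining performance" if both the denominator (throughput at acceptable latency) and the numerator (instance price) are measured, not assumed.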

Skills

Proven track record of ML infrastructure optimization projects

Hands-on experience with AWS Neuron SDK and Inferentia/Trainium deployment

Deep expertise in PyTorch model optimization and compilation

Experience with high-throughput computer vision model serving

Production experience with both Kubernetes and Ray for ML workloads

Knowledge

Model Optimization Expertise:

Deep understanding of ML model architecture optimization

Experience with model compilation techniques for specialized hardware (Inferentia/Trainium)

Proficiency in optimizing computer vision models (CNN architectures)

Knowledge of model serving optimization patterns

Performance Optimization:

Advanced understanding of ML model inference optimization

Expertise in batch processing strategies

Memory layout optimization for vision models

Experience with pipeline parallelism implementation

Proficiency in latency/throughput optimization techniques
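Latency/throughput work of the kind listed above starts from percentile summaries (p50/p95/p99) rather than averages, since tail latency is what breaks SLOs. A stdlib-only sketch using the nearest-rank percentile definition (one of several common conventions):

```python
import statistics
from typing import Dict, Sequence


def latency_report(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Summarize per-request latencies into the usual SLO percentiles."""
    ordered = sorted(samples_ms)

    def pct(p: float) -> float:
        # Nearest-rank percentile: the smallest value with at least p% of
        # samples at or below it, clamped to valid indices.
        idx = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
        return ordered[idx]

    return {
        "p50": pct(50),
        "p95": pct(95),
        "p99": pct(99),
        "mean": statistics.fmean(ordered),
    }
```

Comparing these percentiles before and after a change (batch size, compilation flags, instance type) is the basic loop behind "ensure latency requirements are met across all services."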

Hardware Acceleration:

Deep knowledge of ML accelerator architectures

Understanding of hardware-specific optimizations

Experience with model compilation for specialized chips

Proficiency in memory access pattern optimization

Performance Analysis:

Proficiency in ML model profiling tools

Experience with performance bottleneck identification

Knowledge of performance monitoring techniques

Ability to analyze and optimize inference patterns

Nice to Have:

Experience with Ray architecture for ML serving

Knowledge of distributed ML systems

Understanding of ML pipeline optimization

Experience with model quantization techniques
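On the quantization point above: the core idea behind common int8 schemes is mapping float weights to an 8-bit range via a scale factor. A toy symmetric per-tensor version in plain Python (real deployments would use the framework's or compiler's quantization tooling, not hand-rolled code):

```python
from typing import List, Sequence, Tuple


def quantize_int8(weights: Sequence[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization: q = round(w / scale)."""
    m = max(abs(w) for w in weights) or 1.0  # guard against an all-zero tensor
    scale = m / 127
    q = [max(-128, min(127, round(w * 127 / m))) for w in weights]
    return q, scale


def dequantize(q: Sequence[int], scale: float) -> List[float]:
    """Recover approximate float values from int8 codes."""
    return [v * scale for v in q]
```

The round trip loses at most half a quantization step per weight, which is the accuracy/throughput trade-off the "model quantization techniques" bullet refers to.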

Experience

Model Optimization (4+ years):

Proven track record of optimizing large-scale ML inference systems

Successfully implemented hardware-specific model optimizations

Demonstrated experience with computer vision model optimization

Led projects achieving significant performance improvements

Proven Results (Examples):

Successfully optimized computer vision models similar to RetinaFace/CLIP

Achieved significant cost reduction while maintaining performance

Implemented efficient batch processing strategies

Developed performance monitoring and optimization frameworks
