At Roche, you can show up as yourself, embraced for the unique qualities you bring. Our culture encourages personal expression, open dialogue, and genuine connections, where you are valued, accepted, and respected for who you are, allowing you to thrive both personally and professionally. This is how we aim to prevent, stop, and cure diseases and ensure everyone has access to healthcare today and for generations to come. Join Roche, where every voice matters.
The Position
We are looking for a highly skilled Data Scientist with expertise in building AI-powered applications. We will be building GenAI solutions end-to-end: from concept through prototyping and productization to operations. The ideal candidate will bring technical expertise in Natural Language Processing (NLP), especially in leveraging Large Language Models (LLMs), and proficiency in prompt engineering techniques.
Key Responsibilities :
Generative AI Application Development : Collaborate with AI engineers, product owners, business analysts, and other developers in Agile teams to integrate LLMs into scalable, robust, fair, and ethical end-user applications, focusing on user experience, relevance, and real-time performance
Algorithm Development : Design, develop, customize, optimize, and fine-tune LLM-based and other AI-infused algorithms tailored to specific use cases such as text generation, summarization, information extraction, chatbots, AI agents, code generation, document analysis, sentiment analysis, data analysis, etc.
Data Curation for LLMs : Design data pipelines to curate, preprocess, and structure datasets that improve the performance of LLM-based algorithms and reduce biases, with a focus on data quality and diversity
Exploratory Data Analysis (EDA) : Perform thorough data exploration to understand dataset characteristics, uncover patterns, detect biases, and identify data quality issues; use statistical and visualization techniques to inform feature engineering, model selection, and optimization of LLM-based applications
Support in Prompt Engineering : Support prompt engineers, business analysts, and subject matter experts in crafting and optimizing prompts to guide LLM outputs, enhancing performance for specific tasks; be ready to participate in prompt engineering when necessary
Experimentation and Validation : Conduct rigorous experimentation, including A/B testing, to evaluate algorithm performance against benchmarks and control groups; use metrics specific to generative AI as well as pre-GenAI techniques as required
Software Development : Apply software development best practices, including writing unit tests; contribute to configuring CI/CD pipelines, containerizing applications, setting up APIs, and ensuring robust logging, experiment tracking, and model monitoring
Continuous Improvement : Collaborate with other developers to monitor deployed algorithms, identify areas for improvement, and deliver updates that enhance performance
Stakeholder Communication : Translate complex technical results into clear, actionable insights for stakeholders, driving data-driven decision-making across the organization
Ethical AI and Bias Mitigation : Implement techniques to identify and mitigate biases in LLM outputs, ensuring responsible and ethical AI deployment
Pre-generative AI Application Development : Design and implement classical machine learning and NLP models (e.g. regression, classification, clustering, sequence modeling) when they provide a more efficient, interpretable, or cost-effective solution compared to LLMs; integrate these models into AI applications as needed
Requirements :
Experience : 3 years working with advanced machine learning algorithms
3 years of hands-on experience working with language models, especially those based on Transformer architectures (e.g. BERT, T5, RoBERTa), and at least 1 year of experience with generative large language models (e.g. GPT, LLaMA, Claude, Cohere, etc.)
Technical Skills : Advanced proficiency in Python and experience with deep learning frameworks such as PyTorch or TensorFlow; expertise with Transformer architectures; hands-on experience with LangChain or similar LLM frameworks
Experience designing end-to-end RAG systems using state-of-the-art orchestration frameworks (hands-on experience with fine-tuning LLMs for specific tasks and use cases is considered an additional advantage)
Practical knowledge of and experience with AWS services for designing cloud solutions (familiarity with Azure is a plus); experience working with GenAI-specific services such as Azure OpenAI, Amazon Bedrock, Amazon SageMaker JumpStart, etc.
Data Skills : Strong skills in data manipulation, annotation, and crafting datasets that maximize LLM effectiveness; experience working with data stores such as vector, relational, and NoSQL databases and data lakes through APIs; experience with data augmentation techniques or synthetic data generation in the context of LLMs is considered a plus
Prompt Engineering : Hands-on experience with prompt design and with zero-shot and few-shot learning paradigms to optimize LLM performance without extensive training or fine-tuning
Evaluation Metrics : Deep understanding of generative model and pre-GenAI evaluation techniques
NLP Expertise : Solid foundation in natural language processing, including tokenization, embeddings, attention mechanisms, and transfer learning specific to LLMs
Statistical Knowledge : Strong background in statistics, machine learning algorithms, and optimization techniques
Classical Machine Learning & NLP : Experience with traditional NLP techniques and classical machine learning algorithms (e.g. decision trees, SVMs, random forests, gradient boosting) for text analysis and structured data applications
Pre-LLM Model Development : Hands-on experience developing and deploying machine learning models for tasks such as classification, clustering, regression, and sequence modeling, using frameworks like scikit-learn, XGBoost, or traditional NLP pipelines
Feature Engineering & Data Preprocessing : Strong skills in feature engineering, dimensionality reduction, text preprocessing, and structured data transformation to improve model performance
Deployment : Experience deploying LLMs on cloud platforms (AWS, Azure) and machine learning workbenches for robust and scalable productization
Proficiency in software engineering best practices
Problem Solving : Excellent analytical skills and the ability to tackle complex challenges with innovative solutions
Communication : Strong verbal and written communication skills, with the ability to present complex findings clearly to both technical and non-technical audiences
The successful candidate should also :
be passionate about AI and stay up-to-date with the latest developments in LLMs, GenAI, and AI in general
be team-oriented, proactive, and collaborative
be an excellent problem solver and analytical thinker
be detail-oriented and highly organized
be willing to learn and expand their skill set
have the ability to work collaboratively in a fast-paced, dynamic environment
be able to communicate in English at C1 level
be located near the Central European time zone or willing to work at a time consistent with the Central European time zone
Who we are
A healthier future drives us to innovate. Together, more than 100,000 employees across the globe are dedicated to advancing science, ensuring everyone has access to healthcare today and for generations to come. Our efforts result in more than 26 million people treated with our medicines and over 30 billion tests conducted using our Diagnostics products. We empower each other to explore new possibilities, foster creativity, and keep our ambitions high, so we can deliver life-changing healthcare solutions that make a global impact.
Let's build a healthier future together.
Roche is an Equal Opportunity Employer.
Key Skills
Laboratory Experience,Immunoassays,Machine Learning,Biochemistry,Assays,Research Experience,Spectroscopy,Research & Development,cGMP,Cell Culture,Molecular Biology,Data Analysis Skills
Employment Type : Full-Time
Experience : years
Vacancy : 1
Data Scientist • Madrid, Madrid, Spain