Your mission
Define how AI models for materials discovery are evaluated, compared, and trusted
Dunia is building AI for one of the hardest unsolved problems in science: turning materials discovery from an academic, trial-and-error process into a programmable, scalable discipline.
As our models grow more complex and our experimental throughput increases, the limiting factor is no longer generating predictions, but knowing which ones to believe.
As our Materials Informatics Scientist (Evaluation-focused), you will own the evaluation and validation of AI models applied to materials discovery. Your role is to ensure that model performance claims are meaningful, comparable, and decision-relevant, and that progress in AI for Materials reflects real improvements in discovery, not artifacts of metrics or datasets.
This role is not about building new models. It is about defining the standards by which models are judged.
Your tasks will include:
Own evaluation as a scientific discipline
- Design, implement, and maintain evaluation frameworks for AI models across materials discovery tasks
- Define metrics and protocols that reflect generalization, robustness, uncertainty, and experimental relevance (see the uncertainty-coverage sketch after this list)
- Identify failure modes, dataset leakage, and misleading performance signals (see the grouped-split sketch after this list)
- Systematically benchmark different model classes, training regimes, and representations
- Evaluate tradeoffs between accuracy, uncertainty, data efficiency, and usability
- Provide clear, defensible recommendations on which models to trust, deploy, or retire
- Link model behavior to experimental results and program-level objectives
- Distinguish improvements that change decisions from those that only improve abstract scores
- Help research and program teams understand what current models can and cannot reliably do
- Develop and maintain professional-grade scripts and analysis pipelines for evaluation and benchmarking
- Visualize complex, high-dimensional results in ways that surface real insight
- Ensure disciplined, reproducible handling of data, code, and results
- Present findings clearly to AI researchers, materials scientists, and leadership
- Produce concise summaries that align the organization around a shared view of evidence and uncertainty
- Act as an independent scientific reference point when claims require validation
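To make the dataset-leakage bullet concrete: below is a minimal, illustrative sketch of the kind of check this work involves. It is not Dunia's actual tooling; the data, features, and model are synthetic placeholders, and it assumes only NumPy and scikit-learn. The point it demonstrates is that a naive random split can overstate performance when repeated measurements of the same material land in both train and test, while a composition-grouped split gives the honest number.

```python
# Minimal sketch: composition-grouped cross-validation vs. a naive random
# split. All data here is synthetic; "material" identity is the group key.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GroupKFold, KFold, cross_val_score

rng = np.random.default_rng(0)
n_materials, repeats = 50, 4

base = rng.normal(size=(n_materials, 2))      # one feature vector per material
X = np.repeat(base, repeats, axis=0)          # repeated measurements per material
groups = np.repeat(np.arange(n_materials), repeats)

# Target = feature signal + a per-material offset + measurement noise.
# The offset is memorizable whenever the same material appears in both
# train and test, which is exactly what a random split allows.
offset = np.repeat(rng.normal(size=n_materials), repeats)
y = X[:, 0] + offset + rng.normal(scale=0.1, size=n_materials * repeats)

model = RandomForestRegressor(n_estimators=200, random_state=0)

# Naive split: duplicates of each material leak across folds -> inflated R^2.
naive = cross_val_score(model, X, y, cv=KFold(5, shuffle=True, random_state=0))
# Grouped split: each material is confined to a single fold -> honest R^2.
grouped = cross_val_score(model, X, y, groups=groups, cv=GroupKFold(5))

print(f"naive R^2:   {naive.mean():.3f}")
print(f"grouped R^2: {grouped.mean():.3f}")
```

The gap between the two scores is precisely the kind of misleading performance signal this role exists to catch before it drives a deployment decision.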
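On the uncertainty side of the metrics bullet, here is an equally toy sketch of an empirical-coverage check for ensemble prediction intervals. Everything in it (the ensemble, the noise scales, the data) is a synthetic assumption for illustration only; in practice the same check would run against real held-out measurements.

```python
# Toy sketch: do nominal prediction intervals achieve their stated coverage?
# A calibrated ensemble's central intervals should cover roughly the nominal
# fraction of true values; large deviations signal over- or under-confidence.
import numpy as np

rng = np.random.default_rng(1)
n_points, n_members, noise = 2000, 20, 0.5

signal = rng.normal(size=n_points)
# Synthetic "ensemble": each member sees the signal plus independent noise,
# and the observed truth carries noise of the same scale (a calibrated case).
preds = signal[None, :] + rng.normal(scale=noise, size=(n_members, n_points))
y_true = signal + rng.normal(scale=noise, size=n_points)

mean = preds.mean(axis=0)
std = preds.std(axis=0, ddof=1)

# Standard-normal half-widths for 50% / 80% / 95% central intervals.
for nominal, z in ((0.50, 0.674), (0.80, 1.282), (0.95, 1.960)):
    covered = np.abs(y_true - mean) <= z * std
    print(f"nominal {nominal:.0%}: empirical coverage {covered.mean():.1%}")
```

Nominal-versus-empirical coverage is one of the simplest decision-relevant uncertainty metrics: it tells a program team whether "95% confident" actually means 95%.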