
AI/ML Research Engineer, LLM Post-Training & Evaluation
- Remote
- Ridgefield Park, New Jersey, United States
- Innodata Services LLC
Job description
Who we are:
Innodata (NASDAQ: INOD) is a leading data engineering company. With more than 2,000 customers and operations in 13 cities around the world, we are the AI technology solutions provider of choice for 4 out of 5 of the world’s biggest technology companies, as well as leading companies across financial services, insurance, technology, law, and medicine.
By combining advanced machine learning and artificial intelligence (ML/AI) technologies, a global workforce of subject matter experts, and a high-security infrastructure, we’re helping usher in the promise of clean, optimized digital data for all industries. Innodata offers a powerful combination of digital data solutions and easy-to-use, high-quality platforms.
Our global workforce includes over 3,000 employees in the United States, Canada, the United Kingdom, the Philippines, India, Sri Lanka, Israel, and Germany. We’re poised for a period of explosive growth over the next few years.
Position Summary:
Innodata is expanding its team of technical experts in LLM training, post-training, and evaluation systems. As an AI/ML Research Engineer, LLM Post-Training & Evaluation, you will build and optimize the technical foundations that power model improvement for foundation model builders and leading labs.
This role is ideal for someone who has hands-on experience fine-tuning and evaluating large language models (and ideally multimodal models), and who can bridge research and engineering in real-world customer environments. You will work closely with Language Data Scientists, Applied Research Scientists, data engineers, and client technical stakeholders to design and implement robust training/evaluation pipelines using both human-in-the-loop and AI-augmented methods.
The ideal candidate brings a strong computer science / machine learning engineering background, experience with modern LLM post-training workflows, and the ability to engage credibly with technical counterparts at leading AI organizations.
Who We’re Looking For:
You have at least 2-3 years of relevant experience in machine learning engineering, applied ML systems, or research engineering, with substantial hands-on work in LLMs and multimodal foundation models. You have built, adapted, or optimized model training and evaluation pipelines, and you understand the practical realities of experimentation at scale: reproducibility, debugging, metrics quality, and iteration speed.
You are comfortable operating in ambiguous, high-complexity environments and can move from problem framing to implementation. You can collaborate effectively with both researchers and engineers, and you are credible in technical conversations with sophisticated customer stakeholders (e.g., AI researchers, ML engineers, technical product leads).
You bring a builder mindset and strong engineering judgment, while also understanding that evaluation quality and data quality are central to model improvement. You are excited to partner with human evaluation experts and language data scientists to create integrated post-training and evaluation systems.
Tell Me More:
As an AI/ML Research Engineer, LLM Post-Training & Evaluation, you will design and implement the pipelines and tooling that connect data, evaluation, and post-training. You will help customers and internal teams move from evaluation findings to measurable model improvements.
Your work may include building fine-tuning workflows (e.g., supervised fine-tuning and preference-based optimization), integrating evaluation harnesses into model development loops, improving experiment reliability and throughput, and supporting advanced evaluation scenarios such as long-context, cross-modal, and dynamic multi-turn interactions.
You will also contribute to Innodata’s internal R&D efforts, including benchmark datasets, evaluation frameworks, and reusable infrastructure for model assessment and post-training experimentation.
Responsibilities:
Lead or co-lead technically complex ML engineering projects from initial customer discussions through implementation and delivery
Design, build, and improve LLM training and post-training pipelines, including data ingestion, preprocessing, fine-tuning, evaluation, and experiment tracking
Implement and optimize evaluation systems for LLMs and multimodal models, including offline benchmarks and task-specific test harnesses
Integrate human-in-the-loop and AI-augmented evaluation signals into model development workflows
Build robust infrastructure and tooling for reproducible experimentation, metrics logging, and regression monitoring
Diagnose model behavior and pipeline failures, including data issues, training instability, metric inconsistencies, and evaluation drift
Collaborate with Language Data Scientists and Applied Research Scientists to translate evaluation frameworks into executable systems
Work closely with customer technical stakeholders to understand goals, constraints, and success criteria; propose and implement technically sound solutions
Contribute to internal research and platform development, including benchmark frameworks, evaluation tooling, and post-training workflow improvements
Contribute to best practices and standards for LLM training, evaluation, and quality assurance across projects
Mentor junior engineers and contribute to technical design reviews, documentation, and engineering rigor across the team
Job requirements
BS/MS/PhD in Computer Science, Machine Learning, AI, Applied Mathematics, or a related quantitative technical field (MS/PhD preferred)
2-3 years of relevant industry or research engineering experience in ML/AI systems
Hands-on experience with LLM training / fine-tuning / post-training, including at least one of:
- supervised fine-tuning (SFT)
- preference optimization (e.g., DPO or related methods)
- RLHF / RLAIF-style workflows
- task- or domain-adaptation of foundation models
Strong programming skills in Python and experience building production-quality ML code
Experience with modern ML frameworks (e.g., PyTorch, JAX, TensorFlow) and model libraries/tooling (e.g., Hugging Face ecosystem, vLLM, distributed training stacks)
Experience designing and implementing evaluation pipelines for LLM/ML systems, including metrics computation, dataset handling, and experiment comparisons
Strong understanding of data pipelines and ML systems engineering, including reproducibility, observability, and debugging
Experience with large-scale distributed ML systems and performance optimization for training/evaluation workloads (GPU/accelerator environments preferred)
Experience with large-scale data processing and workflow orchestration in support of model training/evaluation
Ability to collaborate directly with technical stakeholders including research scientists, ML engineers, data engineers, and customer technical leads
Strong written and verbal communication skills, including the ability to explain complex technical tradeoffs to both technical and non-technical audiences
Technical Skills
ML / LLM Engineering
Experience training, fine-tuning, and evaluating transformer-based models
Understanding of post-training workflows and model iteration loops
Familiarity with inference-time considerations (latency, throughput, memory/performance tradeoffs) where relevant to evaluation or deployment
Evaluation & Experimentation
Experience implementing automated evaluation pipelines and test harnesses
Experience with experiment tracking, versioning, and reproducibility practices
Ability to assess metric quality and ensure consistency across model comparisons
Software / Data Engineering
Proficiency in Python and strong software engineering fundamentals
Experience with data processing pipelines, storage formats, and scalable dataset workflows
Familiarity with CI/CD, testing, and engineering quality practices for ML systems
Preferred Skills
Experience with multimodal model training/evaluation (text + image/audio/video)
Experience with long-context evaluation and/or model adaptation for long-context tasks
Experience with agentic or multi-turn evaluation harnesses, tool-use simulation, or interactive environment testing
Experience working in customer-facing technical consulting, solutions engineering, or applied research delivery
Familiarity with LLM safety, alignment, robustness, or red-teaming evaluation approaches
Contributions to open-source ML/LLM tooling or published technical work in relevant areas
How this role partners with the team
This role works closely with:
Language Data Scientists, who lead human evaluation design, language/data process excellence, and annotation/synthetic workflows
Applied Research Scientists, who lead evaluation methodology, benchmarking research, and experimental design
Data Engineers / Platform Teams, who support scalable data and infrastructure foundations
Customer Technical Stakeholders, who rely on Innodata for expert guidance and implementation support in advanced GenAI development
Please be aware of recruitment scams involving individuals or organizations falsely claiming to represent employers. Innodata will never ask for payment, banking details, or sensitive personal information during the application process. To learn more on how to recognize job scams, please visit the Federal Trade Commission’s guide at https://consumer.ftc.gov/articles/job-scams.
If you believe you’ve been targeted by a recruitment scam, please report it to Innodata at verifyjoboffer@innodata.com and consider reporting it to the FTC at ReportFraud.ftc.gov.
#LI-NS1