Research Engineer at OpenAI

What to expect when interviewing as a Research Engineer at OpenAI, pulled from live postings, the official interview guide, and Levels.fyi.

10 min read · Updated 2026-04-28 · Expert
Location: San Francisco

Seniority: L3 to L6

Total comp: $337K – $1.28M+

Primary stack: Python, PyTorch, CUDA

Overview

Research Engineer is OpenAI's senior engineering title inside the Research org. The role builds the distributed training systems, neural net training code, and infrastructure that produces frontier models. Engineering and research are treated as equal disciplines; the job selects for people who can move between CUDA kernels, distributed training, model code, and product surface in the same week.

The interview process is published openly. The skills assessment and the final loop are calibrated to the team you'll join, so the technical content you'll see is decentralized even though the structure is consistent.

Responsibilities

  • Take research from notebook to a production system serving millions of concurrent requests.
  • Build and operate distributed training infrastructure that survives 10x or 100x scale jumps without rewrites.
  • Implement and validate post-training techniques: SFT, distillation, RLHF, DPO, policy optimization.
  • Debug non-learning training loops, performance regressions, and reliability incidents end-to-end.
  • Define what "good enough" looks like on surfaces where the spec is loose by design.

What they want

Pulled from live job postings and LCA filings. Inline links jump to the curriculum.

Must-haves

  • Strong CS fundamentals: data structures, algorithms, software engineering principles. (System Design)
  • Production Python and fluency in PyTorch. (Python)
  • Experience architecting, building, and debugging production distributed systems. (Distributed Systems)
  • Cross-stack range and self-direction. Comfort picking up missing knowledge end-to-end.

Nice-to-haves

  • Familiarity with LLM training methods: distillation, SFT, policy optimization, RLHF, DPO. (Transformers & LLMs)
  • CUDA, NCCL, NVLink, InfiniBand, MPI for inference and performance-adjacent loops.
  • Master's or PhD in CS, ML, or a related field. Required on the Applied AI variant; a nice-to-have elsewhere.
  • Track record rebuilding production systems through 10x or 100x scale jumps.

Compensation

OpenAI's equity is Profit Participation Units (PPUs), not standard public-company RSUs. Levels.fyi total comp bakes in OpenAI's PPU mark — the company's valuation, not a guaranteed liquid figure. Cash component is closer to the LCA-disclosed median (~$310K base for MTS in 2025). Negotiate on cash, cliff, and PPU strike, not on the headline TC.

Level               Total comp   Base     Equity
L3 (Mid)            $337K        $210K    $127K (PPUs)
L4 (Senior)         $569K        $255K    $315K (PPUs)
L5 (Staff)          $1.15M       $336K    $774K (PPUs)
L6 (Senior Staff)   $1.28M+      n/a      n/a

Interview process

About one week per stage. End-to-end typically 6 to 8 weeks.

OpenAI publishes the process explicitly. Five stages, with the recruiting team aiming for one week of turnaround between each. The skills assessment format depends on the team — pair coding, take-home, or technical test — sometimes more than one. The final loop is 4 to 6 hours over 1 to 2 days with 4 to 6 interviewers.

01

Application & resume review

~1 week

Recruiting reviews your application. They are explicit about not being credential-driven; they want to understand your background and what you'd contribute.

02

Introductory call

30 to 45 min · Recruiter or hiring manager

A conversation about your work, academic experience, motivations, and goals. OpenAI explicitly recommends familiarizing yourself with their recent blog posts before this call, especially anything from the team you are interviewing for.

2 tips for this stage
  • Read 2 or 3 recent posts from the team you are interviewing for. Have one specific question ready that ties your background to a tradeoff they made.
  • Be specific about why OpenAI in particular, not why frontier AI in general. Generic mission answers tend to screen out.
03

Skills-based assessment

~1 week to next stage · Pair coding, take-home, or technical test

Format varies by team. For ML roles this is typically ML coding in NumPy or PyTorch. Reported tasks include implementing multi-head attention, KV cache, layer norm, and small training loops. Some teams run two assessments back to back. The recruiter sends prep materials before this stage.

2 tips for this stage
  • Practice MHA, KV cache, and a tiny training loop in pure NumPy with broadcasting. Higher-level autograd is sometimes disabled in the screen.
  • Read the prep materials your recruiter sends. They are not optional.
04

Final interviews

4 to 6 hours over 1 to 2 days · Virtual default, SF onsite optional

4 to 6 interviews with 4 to 6 interviewers. Reported components: production-quality coding, ML system design (such as 'design a distributed training platform' or 'serve a GPT-4-class model to millions'), ML debugging on a non-learning training loop, a project deep-dive presentation, and behavioral with mission alignment.

3 tips for this stage
  • Bring slides for the project deep-dive. Treat it as a job-talk format defending technical choices on past work.
  • ML debugging is its own round. Practice gradient diagnostics, label leakage, batch-norm in eval mode, and learning-rate misconfiguration.
  • Composition is decentralized. Ask your recruiter for the specific layout for your team before you prepare.
05

Decision and references

~1 week

OpenAI commits to responding within a week of the final loop. Recruiter may ask for references at this stage.

What to study

Each topic links to a lesson in the SWE Quiz curriculum. The external links are the canonical references.

Three buckets: ML internals, ML systems, and behavioral and mission alignment. Weighting depends on team. The coding bar is constant.

Transformer internals

MHA, KV cache, layer norm, and the canonical variants (RoPE, GQA, RMSNorm, SwiGLU) come up across nearly every ML role. The skills assessment often asks you to implement these from NumPy or PyTorch primitives.
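
As a concrete target, here is a minimal sketch of causal multi-head self-attention in pure NumPy, the kind of implementation the screens reportedly ask for. Shapes, the absence of biases, and the single-sequence (unbatched) setup are simplifying assumptions, not OpenAI's actual test.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, Wq, Wk, Wv, Wo, n_heads):
    """Causal multi-head self-attention for one sequence.

    x: (seq, d_model); Wq/Wk/Wv/Wo: (d_model, d_model). No biases.
    """
    seq, d_model = x.shape
    d_head = d_model // n_heads

    # Project, then split into heads: (n_heads, seq, d_head).
    def split(W):
        return (x @ W).reshape(seq, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(Wq), split(Wk), split(Wv)

    # Scaled dot-product scores with a causal (upper-triangular) mask.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    mask = np.triu(np.ones((seq, seq), dtype=bool), k=1)
    scores = np.where(mask, -1e9, scores)

    out = softmax(scores) @ v                       # (n_heads, seq, d_head)
    out = out.transpose(1, 0, 2).reshape(seq, d_model)  # merge heads
    return out @ Wo
```

A quick self-check worth doing under interview conditions: perturb the last token and confirm earlier positions' outputs are unchanged, which verifies the causal mask.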

RLHF and post-training

The Applied AI and Integrity postings explicitly call out distillation, SFT, and policy optimization. Know PPO mechanics, the role of the KL penalty, what reward hacking looks like, and how DPO simplifies the pipeline.
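
To make "how DPO simplifies the pipeline" concrete, here is a minimal sketch of the DPO loss in NumPy: no reward model and no sampling, just sequence log-probs from the policy and a frozen reference. The argument names and beta default are illustrative assumptions.

```python
import numpy as np

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct Preference Optimization loss over a batch of preference pairs.

    Each argument is an array of summed token log-probs for a full response
    under the trained policy (logp_*) or the frozen reference (ref_logp_*).
    """
    # DPO's implicit reward is beta * log-ratio of policy to reference.
    chosen_reward = beta * (logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (logp_rejected - ref_logp_rejected)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) == log(1 + exp(-margin)); logaddexp is stable.
    return np.logaddexp(0.0, -margin).mean()
```

When the policy matches the reference exactly the margin is zero and the loss is log 2; shifting probability mass toward the chosen response drives it down. Being able to state that sanity check is a cheap way to show you understand the objective, not just its name.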

Distributed training systems

ML system design rounds explicitly include 'design a distributed training platform.' Know data parallel vs tensor parallel vs pipeline parallel, all-reduce vs parameter server, ZeRO sharding, optimizer state, and fault tolerance.
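
The core invariant behind data parallelism is easy to demonstrate and worth being able to whiteboard: all workers all-reduce their gradients, so every replica applies the identical update and the weights never diverge. A toy single-process simulation (sum-then-broadcast standing in for NCCL's ring all-reduce):

```python
import numpy as np

def allreduce_mean(shards):
    """Toy all-reduce: every worker ends up holding the mean gradient.

    shards: list of per-worker gradient arrays, all the same shape.
    """
    total = np.zeros_like(shards[0])
    for g in shards:                        # reduce phase
        total += g
    mean = total / len(shards)
    return [mean.copy() for _ in shards]    # broadcast phase

def data_parallel_step(params, per_worker_grads, lr=0.1):
    """One data-parallel SGD step: synced grads, identical update on each replica."""
    synced = allreduce_mean(per_worker_grads)
    return [params - lr * g for g in synced]
```

In a design round the interesting follow-ups are what this toy hides: overlap of communication with the backward pass, gradient bucketing, and where ZeRO shards optimizer state to cut per-worker memory.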

Inference at scale

The Model Inference posting calls out NCCL, CUDA, NVLink, InfiniBand, and MPI. Final loops include 'serve a GPT-4-class model to millions concurrently.' Know batch scheduling, continuous batching, paged KV cache, speculative decoding, and the prefill-vs-decode tradeoff.
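
The prefill-vs-decode tradeoff is easiest to explain with the KV cache itself: prefill computes K/V for the whole prompt once; each decode step then appends one row and attends over the cache instead of recomputing the prefix. A single-head NumPy sketch (shapes and the unbatched setup are simplifying assumptions):

```python
import numpy as np

def attend(q, K, V):
    # Single-head scaled dot-product attention for one query vector.
    s = (K @ q) / np.sqrt(q.shape[-1])
    w = np.exp(s - s.max())
    w /= w.sum()
    return w @ V

def decode_with_kv_cache(cache, new_k, new_v, q):
    """One decode step: append this token's K/V, attend over the full cache.

    cache: (K_cache, V_cache) built once during prefill; decode grows it by
    one row per generated token, turning O(n^2) recompute into O(n) per step.
    """
    K_cache, V_cache = cache
    K_cache = np.vstack([K_cache, new_k])
    V_cache = np.vstack([V_cache, new_v])
    return attend(q, K_cache, V_cache), (K_cache, V_cache)
```

This is also the cleanest way to motivate paged KV cache in an interview: the cache is the dominant memory consumer at serving time, so vLLM-style paging exists to pack many such growing caches onto one GPU without fragmentation.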

ML system design

Reported prompts: design unsafe-content detection, RAG with eval framework, recommender at scale. These rounds reward pipeline thinking, observability, and grounded eval over buzzwords.

ML debugging

A distinct round on some loops: debug a training run that is not learning. Exploding or vanishing gradients, label leakage, batch-norm in eval mode, dataset misalignment, learning-rate misconfiguration.
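
A practical first move in this round is a per-layer gradient-norm report, since it separates exploding, vanishing, and NaN gradients in one pass. A minimal sketch; the thresholds and dict-of-arrays interface are illustrative assumptions, not a standard API:

```python
import numpy as np

def gradient_report(grads, explode_thresh=1e3, vanish_thresh=1e-7):
    """Per-layer gradient-norm triage for a run that is not learning.

    grads: dict mapping layer name -> gradient array.
    Returns dict mapping layer name -> (L2 norm, status string).
    """
    report = {}
    for name, g in grads.items():
        norm = float(np.linalg.norm(g))
        if not np.isfinite(norm):
            status = "nan/inf"          # numerical blow-up upstream
        elif norm > explode_thresh:
            status = "exploding"        # consider clipping / lower LR
        elif norm < vanish_thresh:
            status = "vanishing"        # dead units, bad init, or saturation
        else:
            status = "ok"
        report[name] = (norm, status)
    return report
```

If every layer reports "ok", the bug is more likely in the data or eval path (label leakage, dataset misalignment, batch-norm left in train mode) than in the optimization itself, which is a useful way to narrate your search out loud.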

Insider tips

  • Do the prep your recruiter sends. OpenAI explicitly says they provide prep for the assessment. First-person accounts cite candidates failing because they skipped or skimmed it.
  • Practice MHA, KV cache, and small training loops in pure NumPy with broadcasting. Higher-level autograd is sometimes disabled in the screen.
  • OpenAI's equity is PPUs, not RSUs. The Levels.fyi total comp number is the company's PPU mark, not a liquid figure. Negotiate on cash, cliff, and PPU strike.
  • Bring slides to the project deep-dive. Treat it as a job-talk: ~10 slides defending technical choices on a past project.
  • Ask your recruiter for the final loop composition for your team. The process is decentralized — what others report on Glassdoor may not match yours.

FAQs

Is 'Research Engineer' the same as 'ML Engineer' at OpenAI?
Roughly, but not exactly. OpenAI uses 'Research Engineer' as the senior engineering title inside OpenAI Research — the team that builds the systems that train frontier models. Most other companies call this an ML Engineer. OpenAI also has adjacent titles for production ML: Software Engineer (Model Inference), Software Engineer (ML Performance), Research Engineer (Applied AI Engineering), and ML Engineer (Integrity). Comp converges at the senior end across these titles.
How long does the OpenAI interview process take?
OpenAI's published guide says about a week per stage, putting end-to-end at roughly 6 to 8 weeks. Senior loops with scheduling can run 8 to 12 weeks. The recruiter screen is fast; the gap between the final loop and the decision is rarely more than a week per OpenAI's own commitment.
Do I need a PhD to be a Research Engineer at OpenAI?
Not for engineering-track roles. The Research Engineer (Applied AI Engineering) posting lists 'Master's/PhD' as expected. Core Research Engineer roles select hard for research output regardless of credential, and the Software Engineer (Model Inference) variant requires '5 years of professional software engineering experience' with no PhD listed.
How does OpenAI compensation compare to peer labs?
Roughly comparable at senior levels. Levels.fyi medians sit near $555K SWE TC at OpenAI versus $525K at Anthropic. The structural difference is that OpenAI uses Profit Participation Units (PPUs) instead of public-company RSUs, so the equity number is the company's mark rather than a liquid figure. Cash base at the senior tier is around $300K to $336K.
What languages besides Python should I know?
Production Python and PyTorch are universal. C++ matters for Training Performance Engineer and parts of ML Performance. CUDA matters for Model Inference. Rust is a bonus on training-infra roles. Node.js and React surface on Applied AI and ChatGPT product roles.
