AI Researcher vs. AI Engineer vs. ML Engineer Explained

In July 2025, Elon Musk announced that xAI was deleting the word "researcher."

This false nomenclature of "researcher" and "engineer," which is a thinly-masked way of describing a two-tier engineering system, is being deleted from @xAI today. There are only engineers. Researcher is a relic term from academia.
— Elon Musk, July 2025

Yann LeCun pushed back.

An engineer, given a problem, invents and tries multiple solutions and stops when the solution is good enough. The goal is product innovation and shipping. A scientist asks new questions, proposes various new solutions, compares them, and writes about them. The goal is scientific breakthroughs and technological progress. Both can be called "researchers." Many people can do both: these are activities, not identities.
— Yann LeCun

It's a fun fight, and two smart people genuinely disagree. But two tweets don't settle org design, and the framing hides the more interesting fact. The argument is really about a third role that already sits between the two they're naming. Most AI work isn't a clean split between scientists who think and engineers who build. It runs along a spectrum, and the role in the middle is where the action is.

The spectrum, not the binary

A more honest picture has three points on a line.

At one end is the AI researcher (formally a research scientist): comes up with original ideas, frames new questions, designs the experiments, and writes up what happened. The output is a result that didn't exist before, usually a paper or a prototype that beats the prior best. These roles look like academia and tend to hire that way, often a PhD with first-author papers at top venues. Success is scientific impact: novelty, citations, benchmarks, and whether the work translates into capability later.

At the other end is the AI engineer (also called an ML or product AI engineer): takes models and turns them into systems that run reliably, scale, and make money. The day job is picking and adapting existing models for a use case, building the data pipelines and serving stack, cutting latency and cost, wiring everything into apps, and iterating on A/B tests. The skill set is software engineering: Python, distributed systems, cloud, CI/CD, containers. Reviews are about what shipped, how reliable it is, and what it did to the numbers.

In the middle is the AI research engineer, and this is the role the Musk/LeCun argument keeps stepping around. A research engineer uses strong engineering skills to set up and run the experiments behind someone's research idea: building infrastructure, curating data, designing metrics, scaling training so the team can experiment fast. At DeepMind, research engineers "lead their efforts in developing and scaling novel algorithmic methods that push the frontier of AI." They co-author papers as equal contributors. The line between RS and RE is real but thin, and people cross it constantly.

That middle role is the whole point. When Musk says "there are only engineers," he's partly describing how frontier work already happens: heavy on infrastructure, data, and scaling, with the "research" and the "engineering" done by the same people. When LeCun says these are activities and not identities, he's describing the same overlap from the other side. They're both circling the research engineer without naming it.

In practice, the spectrum tilts toward the engineering end, and often informally. Foundation models commoditized a lot of capability, so work that once needed a research team now needs a product engineer with API docs and a spare afternoon. That is the argument Shawn "swyx" Wang made in his 2023 essay, The Rise of the AI Engineer. Andrej Karpathy agreed:

There's probably going to be significantly more AI Engineers than there are ML engineers. One can be quite successful in this role without ever training anything.
— Andrej Karpathy

So even where a company has a formal research or R&D group, a lot of the actual R&D (the evals, fine-tuning, retrieval, and inference tuning that used to sit with research) now gets picked up by people whose title says product engineer. Anthropic describes the same thing from the lab side: the boundary "has dissolved with the advent of large models," and engineers author its papers.

Where the roles actually differ

The differences that matter aren't about prestige. They're about what each role optimizes for and what you'd grade them on.

Dimension	AI researcher (research scientist)	Research engineer	AI engineer (ML / product)
Core question	What new thing is true or possible?	How do we build the system to find out?	How do we ship this reliably at scale?
Output	Papers, novel methods, new benchmarks	Training infra, experiment frameworks, scaled methods	Deployed services, features, A/B wins
Optimizes for	Novelty, scientific impact	Experiment velocity, reproducibility	Uptime, latency, cost, KPIs
Time horizon	Years	Months to a couple of years	Weeks to months
Typical background	PhD, publication record	Strong CS plus ML systems experience	Software engineering, production experience

The trap is grading someone on the wrong column. Put a research scientist on uptime and GPU-utilization targets, and you choke off the exploration you hired them for. Grade a product engineer on paper count, and you've measured nothing that ships. The research engineer is hard to slot at all, which is exactly why some companies stopped trying.
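As a rough illustration of the wrong-column trap, the check below flags review metrics that belong to a different role's "Optimizes for" cell. It is a minimal sketch, not a review framework: the role keys, metric names, and the function `misgraded_metrics` are hypothetical labels paraphrased from the table above.

```python
# Illustrative only: metric names paraphrase the "Optimizes for" row of the
# table above. Role keys and the function name are hypothetical.
ROLE_METRICS = {
    "research_scientist": {"novelty", "scientific_impact"},
    "research_engineer": {"experiment_velocity", "reproducibility"},
    "ai_engineer": {"uptime", "latency", "cost", "kpis"},
}


def misgraded_metrics(role: str, review_metrics: set[str]) -> set[str]:
    """Return the review metrics that come from another role's column."""
    own = ROLE_METRICS[role]
    # Every metric that appears in some other role's column.
    others = set().union(*(m for r, m in ROLE_METRICS.items() if r != role))
    return (review_metrics - own) & others


# A research scientist reviewed on novelty plus two production metrics.
flagged = misgraded_metrics("research_scientist", {"novelty", "uptime", "cost"})
print(sorted(flagged))  # prints ['cost', 'uptime']
```

Metrics that appear in no column are ignored rather than flagged, so the check only catches the specific mistake described here: borrowing another role's yardstick.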

What a "Member of Technical Staff" is actually doing

OpenAI and Anthropic give research and engineering hires the same internal title: Member of Technical Staff. The term comes from Bell Labs, and it's a deliberate move, not a quirk. Bob McGrew, OpenAI's former Chief Research Officer, has said the title was chosen to dismantle the usual split between theoretical researchers and the people who implement their work. One title, everyone expected to work across the stack.

xAI's "only engineers" is the same instinct with louder branding. Musk argues that a researcher/engineer split is really a two-tier system in disguise, and he points to SpaceX, which, in his telling, does more cutting-edge work than most academic labs without anyone calling themselves a researcher.

It's worth being clear about what flattening buys you, because the usual story is half of it. The obvious win is execution: shared responsibility for shipping, no caste system, metrics tied to the business. The less obvious win, and the one the "fast execution" framing misses, is that flattening can protect researchers. There is no second-class engineer who does the unglamorous scaling work while a separate elite gets the credit. Everyone owns the stack, and the person curating data isn't structurally beneath the person who had the idea. That's a real argument for flat structures that has nothing to do with speed.

The flip side is just as real. Flatten without actually valuing the research, and it goes invisible: the talent and the ideas are there, but research lives in a few rooms instead of being part of what the company thinks it is, so it captures little of the value. That's why Bell Labs keeps coming up. There, research was the institution's identity, not an activity tucked inside an engineering org, and that's what translated into real technology again and again.

When separate tracks still make sense (and still ship fast)

The standard knock on separate RS/RE tracks is that they're slow: silos, painful handoffs, prototypes that die on the way to production. Sometimes true. But the cleanest counterexample is also the most famous result of the last decade.

AlphaFold came out of DeepMind, which runs a hard split between research scientists and research engineers. It didn't start in a pure research group; it started in DeepMind's Applied team, which was mostly research engineers. For roughly the first nine months, the team mostly did engineering: curating data, building the right metrics, and standing up infrastructure that let them run experiments quickly. The separate-tracks structure didn't prevent a breakthrough or slow the shipping. The engineering-heavy front end was the breakthrough's foundation.

So "separate roles are slow" and "flat roles ship faster" are both too simple. DeepMind shipped a Nobel-winning result with a strict split. OpenAI shipped products at speed with no split at all. The structure didn't decide the outcome. What you were building did.

Map the work first, then pick the titles

Here's the move that makes the AlphaFold-versus-OpenAI comparison click. As Roberto Pieraccini, who has led technical teams at Bell Labs, IBM Research, and Google, argues, research, R&D, and product engineering aren't the same job at different speeds. They have different definitions of success: respectively, knowledge that others build on, a research result translated into something a team can ship, and a working feature in users' hands. The kind of work is the upper floor, fixed by what you're trying to build. How you title the people (separate tracks or flat) is the floor below, and that part you choose. The two cuts overlap, since his "R&D" is close to what a research engineer does, but they aren't the same cut. Put them on a grid, and the choice gets concrete.

Dominant work	Separate tracks (RS / RE / engineer)	Flat (MTS / "only engineers")
Research-heavy (long, uncertain bets)	DeepMind: distinct RS/RE, research protected, still shipped AlphaFold	Risky: open-ended work gets squeezed by the roadmap
R&D-heavy (translating results into systems)	A named research-engineer role owns the bridge	Works, because everyone is a research engineer in practice
Product-heavy (known methods, scaling)	Overhead and status friction for little gain	OpenAI, xAI: shared ownership, fast shipping

Read across the row that matches the work you actually do most. AlphaFold and OpenAI both worked because each matched its titles to its dominant work, not because separate or flat is correct in the abstract. The failure mode isn't picking the "wrong" regime in general. It's picking one that doesn't fit the work that fills most of your week. That's what turns the vague "it depends" into a question you can answer: where is your portfolio's center of gravity?
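One way to make "center of gravity" concrete is to treat the grid as a lookup keyed by the kind of work that takes the largest share of the team's time. The sketch below assumes you can estimate that share. The cell text is condensed from the table above, and the key names, the `dominant_work` and `read_grid` helpers, and the example fractions are all hypothetical.

```python
# Hypothetical sketch: the grid above as a lookup, plus a "center of gravity"
# helper. Cell text is condensed from the table; all names are illustrative.
GRID = {
    ("research", "separate"): "Fits: research protected, still ships (DeepMind, AlphaFold)",
    ("research", "flat"): "Risky: open-ended work gets squeezed by the roadmap",
    ("r_and_d", "separate"): "Fits: a named research-engineer role owns the bridge",
    ("r_and_d", "flat"): "Fits: everyone is a research engineer in practice",
    ("product", "separate"): "Poor fit: overhead and status friction for little gain",
    ("product", "flat"): "Fits: shared ownership, fast shipping (OpenAI, xAI)",
}


def dominant_work(portfolio: dict[str, float]) -> str:
    """Return the kind of work with the largest share of the team's time.

    On a tie, the first key in the dict wins.
    """
    return max(portfolio, key=portfolio.get)


def read_grid(portfolio: dict[str, float]) -> dict[str, str]:
    """Look up both titling regimes for the portfolio's dominant work."""
    work = dominant_work(portfolio)
    return {regime: GRID[(work, regime)] for regime in ("separate", "flat")}


# Made-up example: half the week goes to applying known methods to a product.
portfolio = {"research": 0.2, "r_and_d": 0.3, "product": 0.5}
print(dominant_work(portfolio))  # prints product
for regime, verdict in read_grid(portfolio).items():
    print(f"{regime}: {verdict}")
```

The lookup does not choose for you. It only narrows the decision to the two cells that apply to your dominant work, which is the comparison the grid is meant to force.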

How to choose

So reason from objectives, not titles.

Lean flat (MTS or "only engineers") when the objective is mostly shipping: known methods applied hard to a product, speed and shared ownership valued over a long research bet, and no status gap between the people with ideas and the people who scale them. The risk you accept: genuinely open-ended research can get squeezed out by the next deadline, and a pure scientist may not want to live on production metrics.

Lean toward separate tracks when the objective runs long: multi-year, high-uncertainty research, recruiting academic talent who want a real research environment, and work with an extended engineering-and-data phase that deserves its own role and its own metrics instead of being treated as someone's overflow. The risk you accept: handoffs and possible status friction, which you manage with explicit interfaces between the groups rather than by hoping it sorts out.

Either way, the research engineer is the load-bearing role and the one to staff deliberately. Flatten the titles, and you've just made everyone a research engineer in practice. Keep them separate, and the RE is the bridge that decides whether research ever reaches production. The title fight is mostly noise. The question worth answering is what you're actually trying to build, how much of it lives in that middle, and whether your structure and your metrics pay for it.

FAQ

What's the difference between an AI researcher, a research engineer, and an AI engineer?

An AI researcher (research scientist) frames new questions and invents methods; the output is knowledge, usually a paper or a novel result. An AI engineer (also called an ML or product engineer) turns existing models into reliable systems that ship and scale. A research engineer sits between them, building the infrastructure, data, and experiments that turn a research idea into something testable, and often co-authoring the results. The three are points on a spectrum, not separate species.

What does "Member of Technical Staff" mean at OpenAI and Anthropic?

It's a single internal title given to both research and engineering hires. The term traces back to Bell Labs, and labs like OpenAI and Anthropic use it on purpose: one title, no formal split between people who have ideas and people who build them, everyone expected to work across the stack. It's the flat alternative to separate "researcher" and "engineer" ladders.

Why did xAI drop the "researcher" title?

In July 2025, Elon Musk said xAI was deleting the researcher/engineer distinction, calling "researcher" a relic from academia and the split a "thinly-masked" two-tier system. The reasoning is the same as the Member of Technical Staff idea, just with louder branding: one engineering standard, shared accountability for what ships.

Is research just engineering done at a slower speed?

No. As Roberto Pieraccini argues, research, R&D, and product engineering have different definitions of success. Engineering succeeds when a feature works and ships; research succeeds when it produces knowledge others build on, whether or not it ever ships; R&D succeeds when it translates a research result into something buildable. Treating them as the same job at different speeds is what leads orgs to fund and measure them wrong.

Do flat team structures hurt research?

They can, but not always. A flat structure risks letting open-ended research get squeezed out by the next deadline, or making it invisible because it's never treated as a real identity. But flattening can also protect researchers, since there's no second-class engineer scaling someone else's ideas for less credit. It depends on whether the company actually values and measures the research work, not just on the title.

Separate research and engineering tracks, or one unified title: which should we pick?

Start from the work, not the title. If most of what you do is applying known methods to a product, lean flat for speed and shared ownership. If you're making multi-year, high-uncertainty research bets, separate tracks give that work its own metrics and protect it from production pressure. Match the titling regime to whichever kind of work fills most of your week.

Do you need a research team to do AI R&D?

Often not, at least for applied R&D. Foundation models commoditized core capability, so much of the experimentation that once required a research team, like fine-tuning, evaluation, retrieval, and inference optimization, is now done by product or AI engineers as part of building the product. swyx's The Rise of the AI Engineer and Andrej Karpathy both noted you can be effective here without ever training a model. Dedicated research still matters for long-horizon, novel work, but the applied middle increasingly sits with engineering.