The conversation that keeps going wrong
A procurement call goes like this. The buyer says we need help with AI. The vendor says we do AI. Six weeks and a statement of work later, both sides are frustrated — because the buyer wanted an LLM-powered feature inside their SaaS product and the vendor was scoped to deliver reinforcement learning from human feedback on a base model. Those are not the same thing. They share two letters and nothing else.
The industry has not drawn the line cleanly, so buyers have to draw it themselves. Here is the one that matters.
AI development — putting models into products
AI development is building applications that use AI models. You are not changing the model. You are integrating it: retrieval-augmented generation over your private data, an agent that takes actions in your tool chain, a classifier that routes support tickets, an LLM-powered editor inside your SaaS.
The skill set is software engineering plus LLM architecture. The deliverable is a feature in your product that your users touch. The evaluations are user-facing — latency, hallucination rate, task completion — and the model is a commodity input, often swapped across vendors as pricing and capability shift.
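To make the commodity-input point concrete, here is a minimal sketch of that application layer. The names are hypothetical (ChatModel, SupportTicketRouter, and complete are illustrative, not any vendor's SDK); the point is the shape: the feature logic, fallback handling, and interface are the deliverable, and the model plugs in behind a thin seam.

```python
from typing import Protocol

class ChatModel(Protocol):
    """Anything that turns a prompt into text. The application codes
    to this interface, never to a specific vendor SDK."""
    def complete(self, prompt: str) -> str: ...

class SupportTicketRouter:
    """Product feature: routes a support ticket to a queue. The model
    is an input; routing logic, fallbacks, and evals are the deliverable."""
    QUEUES = ("billing", "bug", "account", "other")

    def __init__(self, model: ChatModel):
        self.model = model  # swap vendors here; nothing downstream changes

    def route(self, ticket_text: str) -> str:
        prompt = (
            "Classify this support ticket into exactly one of "
            f"{', '.join(self.QUEUES)}.\n"
            f"Ticket: {ticket_text}\nQueue:"
        )
        answer = self.model.complete(prompt).strip().lower()
        return answer if answer in self.QUEUES else "other"  # fallback logic
```

The design choice is the seam: when pricing or capability shifts, a new adapter implements complete and the feature ships unchanged.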
Who buys this: product engineering teams with an AI feature on a roadmap. The contract is a development engagement that happens to use a model instead of a hardcoded rule set. The right vendor is a software shop that is comfortable with the LLM-specific pieces — retrieval design, evaluation harnesses, prompt caching, tool use, fallback logic — but whose job is fundamentally to ship software.
AI training — improving the model itself
AI training is the human-in-the-loop work that makes a base model better at a task. You are producing data: supervised fine-tuning demonstrations, preference rankings that feed reward models, adversarial prompts for red-teaming, custom evaluation sets for domains where public benchmarks do not exist. The deliverable is not software. It is a dataset — with rubrics, provenance, and inter-rater agreement metrics — that a training run will consume.
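As a sketch of what that deliverable looks like on disk, here is one hypothetical row of a preference dataset. The field names are illustrative rather than a standard schema, but the load-bearing parts are the ones named above: a rubric version, provenance, and room for a second rater.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PreferencePair:
    """One row of a preference dataset destined for a reward model.
    Field names are illustrative, not a standard schema."""
    prompt: str
    chosen: str                             # response the rater preferred
    rejected: str                           # response the rater ranked lower
    rubric_version: str                     # which rubric governed the judgment
    rater_id: str                           # provenance: who produced the label
    second_rater_id: Optional[str] = None   # set when double-rated for agreement checks
    rationale: str = ""                     # free-text justification, consumed by quality audits
```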
The skill set is subject-matter expertise plus calibrated judgment under a rubric. A physician authoring SFT demonstrations for a medical model. A criminal defense attorney writing preference pairs for a legal assistant. An engineer writing adversarial prompts to probe a model's security reasoning. The labelers are senior professionals, not entry-level annotators, because the data quality ceiling is the expertise floor of the people producing it.
Who buys this: frontier AI labs, domain-specific model builders, and enterprises fine-tuning open-weights models on regulated content they cannot send to a public API. The contract is a managed program — rubric design, calibration rounds, reviewer selection, ongoing quality audits — that ends with a dataset and an audit trail, not a running service.
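Those quality audits lean on inter-rater agreement. Here is a minimal sketch of Cohen's kappa for a double-rated batch of categorical labels, written in pure Python for illustration; in practice a library call such as scikit-learn's cohen_kappa_score does the same job.

```python
from collections import Counter

def cohen_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa: observed agreement corrected for chance,
    kappa = (p_o - p_e) / (1 - p_e)."""
    assert labels_a and len(labels_a) == len(labels_b)
    n = len(labels_a)
    # p_o: fraction of items where the two raters agree outright.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # p_e: agreement expected by chance from each rater's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in freq_a | freq_b)
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)

# cohen_kappa(["good", "bad", "good"], ["good", "bad", "bad"])
#   -> (2/3 - 4/9) / (1 - 4/9) ≈ 0.4
```

Low kappa after a calibration round is the signal to tighten the rubric before scaling the reviewer pool.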
How to tell which one you need
Three questions resolve most of the ambiguity; the toy sketch after the list encodes them:
- Do we want to ship a feature, or produce training data? — Features mean AI development. Training data means AI training. Teams that cannot answer this cleanly have not scoped the problem yet.
- Does the work end when the model changes? — If yes, you are doing AI development; the value is in the application layer that survives model swaps. If no, you are doing AI training; the output is data for a specific model family.
- Who is the consumer of the deliverable — our users, or our ML team? — User-facing deliverables are development work. ML-team-facing deliverables are training work. The answer dictates which vendor capability set matters.
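The same triage as code. This is a toy sketch with hypothetical names (engagement_shape and its flags are illustrative, not a real framework), useful mainly because a disagreement among the three answers is itself diagnostic.

```python
def engagement_shape(ships_a_feature: bool,
                     survives_model_swap: bool,
                     consumer_is_end_user: bool) -> str:
    """Toy triage of the three questions above (names are illustrative).
    Unanimous answers give a clean scope; a split vote means the
    problem has not been scoped yet."""
    votes = [ships_a_feature, survives_model_swap, consumer_is_end_user]
    if all(votes):
        return "AI development"  # feature, survives model swaps, user-facing
    if not any(votes):
        return "AI training"     # data, tied to a model family, ML-team-facing
    return "unscoped: split the program before picking a vendor"
```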
When both happen inside the same organization
Large AI-native companies run both programs in parallel, staffed by different teams, with different vendors, and usually in different reporting lines. This is correct. The program management discipline for an RLHF run is closer to academic study design than to product engineering, and the product engineering discipline for shipping a RAG feature is closer to standard SaaS engineering than to ML research. A single team trying to be excellent at both is almost always excellent at one and adequate at the other.
If your organization is smaller and running both programs, name them differently inside the org. AI platform team for the development side. Applied AI data team (or model quality team) for the training side. The linguistic collapse of both programs under AI is the first step toward scoping mistakes downstream.
The procurement read
When you are shopping for help:
- A vendor who pitches we do AI without asking which has not drawn the line either. That is the signal to draw it for them — and see which half of the answer they have real expertise in.
- A vendor with a case study that is an LLM-powered product feature is an AI development vendor, regardless of what the deck says.
- A vendor with a case study that is an RLHF program, a custom eval set, or a dedicated expert pool is an AI training vendor.
- A vendor who has shipped both, inside separate programs, with separate teams is unusual — and worth paying for when the work spans both.
Pick the shape of the engagement first, then pick the vendor. The other order is how the week-six frustration happens.

