We used LangChain and LangGraph for about eight months. They got us to our first production deployment. And then we ripped them out and rewrote everything in Pydantic AI.
Here's why.
✨ What We Switched To: Pydantic AI
Let me show you what sold us. Here's a Pydantic AI agent:
```python
import asyncio
from dataclasses import dataclass

from pydantic import BaseModel
from pydantic_ai import Agent


class WhiskeyRecommendation(BaseModel):
    name: str
    distillery: str
    region: str
    age_years: int
    tasting_notes: str
    price_usd: float


@dataclass
class WhiskeyDeps:
    db_connection: str
    api_key: str


agent = Agent(
    'anthropic:claude-sonnet-4-5',
    deps_type=WhiskeyDeps,
    output_type=WhiskeyRecommendation,  # ✨ Guaranteed structure!
    system_prompt='You are a Japanese whiskey sommelier.',
)


async def main():
    deps = WhiskeyDeps(db_connection='...', api_key='...')
    # ✨ result.output is a WhiskeyRecommendation, not dict[str, Any]
    # Your IDE knows. Your AI assistant knows. Everyone knows.
    result = await agent.run('Recommend a smoky whiskey under $200', deps=deps)
    print(result.output.distillery)  # ✨ Autocomplete works!


if __name__ == '__main__':
    asyncio.run(main())
```
result.output is a WhiskeyRecommendation, not a dict[str, Any]. Your IDE autocompletes .distillery. Claude knows what fields exist. When you refactor, your type checker catches the breakage.
This matters 10x more now that AI writes half our code. Give Claude a well-typed codebase and it writes correct code. Give it Any soup and it hallucinates.
✨ Proper Durable Execution
Pydantic AI has native integration with Temporal. Not bolted on. Designed in.
When your agent needs to survive process restarts, handle retries with exponential backoff, and maintain state across failures—Temporal is the industry standard for that. Pydantic AI speaks Temporal natively.
For our regulated business workflows that might run for hours or days, this isn't optional. It's table stakes.
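To make that concrete, here's a minimal sketch of the shape this takes, written against the plain temporalio SDK rather than Pydantic AI's own wrapper: the agent call goes inside an activity so Temporal can persist and retry it. The names (recommend_whiskey, RecommendationWorkflow) and the retry numbers are ours, invented for illustration.

```python
from datetime import timedelta

from temporalio import activity, workflow
from temporalio.common import RetryPolicy


@activity.defn
async def recommend_whiskey(prompt: str) -> str:
    # Reuses agent and WhiskeyDeps from the example above (hypothetical wiring).
    deps = WhiskeyDeps(db_connection='...', api_key='...')
    result = await agent.run(prompt, deps=deps)
    return result.output.model_dump_json()


@workflow.defn
class RecommendationWorkflow:
    @workflow.run
    async def run(self, prompt: str) -> str:
        # Temporal persists this step: if the process dies mid-call, the
        # workflow resumes here on another worker instead of starting over.
        return await workflow.execute_activity(
            recommend_whiskey,
            prompt,
            start_to_close_timeout=timedelta(minutes=5),
            retry_policy=RetryPolicy(
                initial_interval=timedelta(seconds=1),
                backoff_coefficient=2.0,  # exponential backoff
                maximum_attempts=5,
            ),
        )
```

The native integration handles this plumbing for you; the sketch just shows what's underneath.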
✨ Documentation You Can Actually Read
The Pydantic AI docs start simple and build complexity progressively. Each concept gets a clear explanation with annotated code examples. The API reference is complete.
✨ Just Enough Framework
Pydantic AI is minimal by design. It doesn't try to abstract away Python—it embraces it. Want a conditional? Use an if statement. Want a loop? Use a for loop. Want error handling? Use try/except.
There's no framework-specific way to do basic programming. You write Python. The framework handles the AI parts and gets out of your way for everything else.
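Concretely, here's the kind of orchestration we mean, reusing the agent from above. The budget rule and attempt count are invented for illustration; the point is that it's just a function.

```python
async def recommend_within_budget(
    deps: WhiskeyDeps, budget: float
) -> WhiskeyRecommendation:
    # Ordinary Python control flow: a loop, an if, a try/except.
    last_error: Exception | None = None
    for _attempt in range(3):  # plain for loop
        try:
            result = await agent.run(
                f'Recommend a smoky whiskey under ${budget:.0f}', deps=deps
            )
            if result.output.price_usd <= budget:  # plain if statement
                return result.output
        except Exception as exc:  # plain try/except
            last_error = exc
    raise RuntimeError('no in-budget recommendation after 3 attempts') from last_error
```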
💩 What We Switched From: LangChain
Let me be clear: LangChain isn't unusable. Lots of teams ship with it. But as our agents got more complex and our reliability requirements got stricter, the cracks became impossible to ignore.
💩 No Real Type Safety
This was the big one. LangChain's codebase is riddled with Any types. Your IDE can't help you. Autocomplete is guessing. And when you inevitably pass the wrong thing to the wrong function, you find out at runtime—usually in production.
Here's what a typical LangChain tool looks like:
```python
from typing import Any

from langchain_core.messages import HumanMessage
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI


@tool
def get_cafe_info(arrondissement: int) -> dict[str, Any]:
    """Get cafe recommendations in a Paris arrondissement."""
    # 💩 Returns dict[str, Any] - what's actually in here? Who knows!
    return {'name': '...', 'address': '...', 'rating': '...'}


llm = ChatOpenAI(model='gpt-5')

# 💩 What does this return? The type signature says BaseMessage | Any
# Good luck getting your IDE to help you here
result = llm.invoke([HumanMessage(content='Best cafe in Le Marais?')])
```
The return type is dict[str, Any]. What fields are in that dict? What are their types? Hope you remembered, because your tooling won't tell you.
Claude, Cursor, Copilot—they're all flying blind when the types are Any. They hallucinate field names. They pass wrong types. And you don't catch it until runtime.
💩 LangGraph Is The Wrong Abstraction
LangGraph models agent workflows as state machines with nodes and edges. On paper, it sounds elegant. In practice, it's the wrong abstraction for long-running tasks.
Real agent workflows aren't graphs. They're programs. They have loops, conditionals, error handling, retries. Forcing this into a graph means fighting the abstraction constantly. You end up with spaghetti of nodes and edges that's harder to understand than just... writing Python.
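For a feel of what "fighting the abstraction" means, here's a contrived sketch of a three-attempt retry loop as a LangGraph graph (the node names and state shape are ours): a for loop becomes a node, a router function, and a conditional edge pointing back at itself.

```python
from typing import TypedDict

from langgraph.graph import END, START, StateGraph


class State(TypedDict):
    prompt: str
    attempts: int
    answer: str | None


def call_model(state: State) -> State:
    # Imagine the LLM call here; leave answer as None on failure.
    return {**state, 'attempts': state['attempts'] + 1}


def should_retry(state: State) -> str:
    # 💩 The "for loop": a router that sends an edge back to the node.
    if state['answer'] is None and state['attempts'] < 3:
        return 'call_model'
    return END


graph = StateGraph(State)
graph.add_node('call_model', call_model)
graph.add_edge(START, 'call_model')
graph.add_conditional_edges('call_model', should_retry)
app = graph.compile()
```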
And durable execution? LangGraph bolted it on with "checkpointing." It's not native. It's not battle-tested. When your workflow needs to survive process restarts and API failures, you want something designed for that from the ground up.
💩 Proprietary Logging With LangSmith
Want observability? LangChain pushes you toward LangSmith—their proprietary tracing platform.
What about OpenTelemetry, the industry standard that works with every observability tool on the planet? It's there, technically, but it's clearly an afterthought. The first-class experience is LangSmith.
In regulated industries, we can't just ship our traces to some third-party SaaS. We need OTEL export to our own infrastructure. LangChain makes this harder than it should be.
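For contrast, here's roughly what this takes on the Pydantic AI side, sketched from our setup. The collector endpoint is illustrative, and instrument=True is how we understand its OpenTelemetry switch; check the instrumentation docs for your version.

```python
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from pydantic_ai import Agent

# Ship spans to our own collector - no third-party SaaS in the loop.
provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint='http://otel-collector:4317'))
)
trace.set_tracer_provider(provider)

# Pydantic AI emits standard OTel spans for agent runs and model calls.
agent = Agent('anthropic:claude-sonnet-4-5', instrument=True)
```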
💩 The Package Dependency Mess
This one might sound petty, but it drove us crazy. Look at the dependency graph:
- langgraph depends on langchain-core
- langchain depends on langgraph
- langchain depends on langsmith
Interlocking dependencies. You can't install one without pulling in the others. Our Docker images bloated. Our dependency conflicts multiplied.
I'm going to say something uncharitable: I suspect this is intentional. Every package that depends on langchain-core shows up in LangChain's download numbers. When you're pitching investors, 50 million downloads looks better than 10 million—even if 40 million are transitive dependencies you didn't ask for.
The Migration
Rewriting took us about three weeks. Not because Pydantic AI was hard—it's simpler than LangChain—but because we had a lot of agents.
The result: fewer lines of code, better type coverage, easier testing, and agents that actually work reliably in production. Our on-call incidents dropped. Our AI assistants got dramatically better at writing agent code.
When LangChain Might Still Make Sense
Look, if you're prototyping, doing a hackathon, or building something that doesn't need to be reliable—LangChain is fine. It has more integrations. The community is bigger. You'll find more Stack Overflow answers.
But if you're building production AI for regulated industries, where reliability matters, where observability matters, where your AI assistants need to write correct code—Pydantic AI is the better choice.
We made the switch six months ago. We're not going back.
Now if you'll excuse me, I have a bottle of Yamazaki 12 calling my name. À la prochaine. 🥃