AI That Actually Ships.
Agents, RAG, MCP servers, and AI-powered automation, built with evals, cost control, and production reliability from day one.
What I Build
The full stack of AI work, from a single LLM call to multi-agent systems.
AI Agents & Workflows
Multi-step agents that act on real systems: Claude Code, custom SDK agents, and autonomous workflows tuned for reliability.
- Claude Agent SDK
- Tool use & function calling
- Multi-agent orchestration
- Long-running workflows
LLM Integration
Production-grade integration with Claude, GPT, and open-source models, including caching, streaming, and cost control.
- Anthropic & OpenAI SDKs
- Prompt caching
- Streaming responses
- Token & cost optimization
RAG & Knowledge Systems
Retrieval-augmented generation built on your data: embeddings, vector stores, and grounded answers with citations.
- Vector databases
- Hybrid search
- Citation & grounding
- Document ingestion pipelines
MCP Servers
Custom Model Context Protocol servers that expose your tools and data to Claude and other MCP-compatible clients.
- MCP server design
- Tool & resource APIs
- Auth & access control
- Local & remote deployment
Prompt Engineering
Systematic prompt design with evals, A/B testing, and version control, not vibes-based prompting.
- Prompt evaluation suites
- Few-shot & CoT patterns
- Output schema enforcement
- Regression testing
AI-Powered Automation
Replace repetitive workflows with AI: document processing, classification, summarization, and human-in-the-loop pipelines.
- Document & email processing
- Classification & extraction
- Approval workflows
- Slack/Notion integrations
Stack & Tooling
Modern AI tooling, picked based on the problem, not the hype cycle.
Claude (Anthropic)
Primary LLM for agents and reasoning (Opus, Sonnet, Haiku), with prompt caching and extended thinking.
MCP & Agent SDK
Model Context Protocol servers and the Claude Agent SDK for production-grade autonomous systems.
Vector Stores
Pinecone, Qdrant, or pgvector, chosen based on scale, latency, and ops requirements.
TypeScript & Python
TypeScript for product integration, Python for data pipelines and ML-adjacent work.
From Prompt to Production
An opinionated process built around evals, because AI without measurement is theatre.
Discovery
Map the actual problem. Most 'AI projects' are really data, workflow, or UX problems wearing an AI hat.
Prototype with Evals
A working prototype plus a small eval set: measure quality from day one, not after launch.
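For a sense of scale, a first eval set can be this small. The sketch below is illustrative only: `EvalCase`, `run_evals`, `fake_model`, and the two sample cases are hypothetical names and data, and the substring check is one simple scoring choice among many. In a real project the stand-in model would be replaced by an actual LLM call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    expected: str  # text the model's answer must contain to pass


def run_evals(model: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Return the fraction of cases whose output contains the expected text."""
    if not cases:
        raise ValueError("eval set is empty")
    passed = sum(
        1 for case in cases
        if case.expected.lower() in model(case.prompt).lower()
    )
    return passed / len(cases)


def fake_model(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM call, so the sketch runs offline.
    if "invoice" in prompt.lower():
        return "Category: billing"
    return "Category: other"


# Hypothetical sample cases for a ticket classifier.
CASES = [
    EvalCase("Classify: My invoice is wrong", "billing"),
    EvalCase("Classify: I can't log in", "account"),
]

score = run_evals(fake_model, CASES)
print(f"pass rate: {score:.0%}")
```

Tracking that one number across prompt changes is enough to catch regressions early; richer scoring can come later.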
Harden
Cost optimization, prompt caching, fallbacks, observability, and guardrails for the real world.
Ship & Iterate
Deploy with monitoring, then iterate based on real usage, not synthetic test cases.
Common Use Cases
Where AI actually pays off, and where I've seen it work.
Product Features
AI-powered features inside your existing product: chat, summarization, classification, generation.
Internal Automation
Replace manual workflows: triage support tickets, process documents, draft responses, sync across tools.
Developer Tooling
Custom Claude Code workflows, MCP servers, and agent setups that make your engineering team faster.
AI Strategy & Audit
Already have AI in production? I review prompts, costs, evals, and architecture, and tell you what to fix.
Have an AI Project?
From a one-off automation to a full agent system, let's talk about the right approach for your use case.